Quick start

ollama run deepseek-ocr

Available sizes

Tag	Size	Quantization	Context	Min RAM
deepseek-ocr:latest	6.7GB	q4_k_m	8K context	8.4 GB

Strengths & Limitations

Strengths

Token-efficient OCR
Vision-language capabilities
Performs Optical Character Recognition

Related models

gemma3General

The current, most capable model that runs on a single GPU.

32.1M pulls

llavaMultimodal

🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

12.9M pulls

minicpm-vMultimodal

A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

4.6M pulls

llama3.2-visionVision

Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

3.8M pulls