Quick start

```shell
ollama run llama3.2-vision
```

Available sizes
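The command above starts an interactive session. To call the model programmatically, you can POST to the Ollama server's REST API, attaching images as base64 strings in the message. A minimal sketch of building such a payload, assuming the documented `/api/chat` message shape (the image bytes here are a stand-in for a real file):

```python
import base64
import json


def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a /api/chat payload; Ollama expects images as base64 strings
    inside the user message's "images" list."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # base64-encode the raw image bytes for JSON transport
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
        "stream": False,
    }


# Placeholder bytes stand in for a real image file's contents.
payload = build_vision_request("llama3.2-vision", "What is in this picture?", b"\x89PNG")
print(json.dumps(payload)[:40])
```

Send the resulting JSON with any HTTP client to `http://localhost:11434/api/chat` while the Ollama server is running.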
| Tag | Size | Quantization | Context | Min RAM |
|---|---|---|---|---|
| llama3.2-vision:latest | 7.8 GB | Q4_K_M | 128K | 9.8 GB |
| llama3.2-vision:90b | 55 GB | Q4_K_M | 128K | 68.8 GB |
Strengths & Limitations
Strengths
- Image reasoning
- Generative modeling
- Instruction following
Related models

- gemma3 (general) — The current, most capable model that runs on a single GPU. 32.1M pulls
- llava (multimodal) — 🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6. 12.9M pulls
- minicpm-v (multimodal) — A series of multimodal LLMs (MLLMs) designed for vision-language understanding. 4.6M pulls
- llava-llama3 (multimodal) — A LLaVA model fine-tuned from Llama 3 Instruct with better scores in several benchmarks. 2.1M pulls