Quick start
ollama run qwen2.5vl
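Since qwen2.5vl accepts images as well as text, you can include the path to a local image in the prompt and the CLI will attach it. The path below is a placeholder for an image on your machine.

ollama run qwen2.5vl "Describe this image: ./photo.jpg"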
Available sizes

| Tag | Size | Quantization | Context | Min RAM |
|---|---|---|---|---|
| qwen2.5vl:3b | 3.2 GB | q4_k_m | 125K | 4 GB |
| qwen2.5vl:latest | 6.0 GB | q4_k_m | 125K | 7.5 GB |
| qwen2.5vl:32b | 21 GB | q4_k_m | 125K | 26.2 GB |
| qwen2.5vl:72b | 49 GB | q4_k_m | 125K | 61.2 GB |
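To use a size other than the default, pull or run a specific tag from the table above, for example the 3B variant:

ollama pull qwen2.5vl:3b
ollama run qwen2.5vl:3b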
Strengths & Limitations
Strengths
- Vision-language capabilities: accepts both images and text prompts (see the API sketch after this list)
- Significant improvements over the previous Qwen2-VL generation
- Flagship vision-language model of the Qwen family
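As a minimal sketch of the vision-language workflow over Ollama's REST API (assuming a local server on the default port 11434), send a prompt together with a base64-encoded image in the images field; the image data below is a placeholder.

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5vl",
  "prompt": "Describe this image.",
  "images": ["<base64-encoded image data>"],
  "stream": false
}'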
Related models
- llava (Multimodal, 12.9M pulls): 🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.
- minicpm-v (Multimodal, 4.6M pulls): A series of multimodal LLMs (MLLMs) designed for vision-language understanding.
- llava-llama3 (Multimodal, 2.1M pulls): A LLaVA model fine-tuned from Llama 3 Instruct with better scores in several benchmarks.
- qwen3-vl (Multimodal, 1.6M pulls): The most powerful vision-language model in the Qwen model family to date.