Quick start

ollama run qwen3-vl

Available sizes
| Tag | Size | Quantization | Context | Min RAM |
|---|---|---|---|---|
| qwen3-vl:2b | 1.9 GB | q4_k_m | 256K | 2.4 GB |
| qwen3-vl:4b | 3.3 GB | q4_k_m | 256K | 4.1 GB |
| qwen3-vl:latest | 6.1 GB | q4_k_m | 256K | 7.6 GB |
| qwen3-vl:30b | 20 GB | q4_k_m | 256K | 25 GB |
| qwen3-vl:32b | 21 GB | q4_k_m | 256K | 26.2 GB |
| qwen3-vl:235b | 143 GB | q4_k_m | 256K | 178.8 GB |
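Because qwen3-vl accepts images, requests against Ollama's local HTTP API carry them as base64 strings in the `images` field of the `/api/generate` payload. A minimal sketch using only the Python standard library (the image bytes here are a placeholder, not a real image):

```python
import base64
import json

def build_generate_request(prompt: str, image_bytes: bytes,
                           model: str = "qwen3-vl") -> str:
    """Build the JSON body for POST http://localhost:11434/api/generate.

    Ollama expects vision inputs as base64-encoded strings in "images".
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one complete response instead of a stream
    }
    return json.dumps(payload)

body = build_generate_request("Describe this image.", b"\x89PNG...")
```

Send `body` to a running Ollama server with `urllib.request` or `curl`; the response JSON carries the model's description in its `response` field.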
Run with

Claude Code:

ollama launch claude --model qwen3-vl

Codex:

ollama launch codex --model qwen3-vl

OpenCode:

ollama launch opencode --model qwen3-vl

OpenClaw:

ollama launch openclaw --model qwen3-vl

Strengths & Limitations
Strengths
- Strong vision-language capabilities
- High performance within the Qwen family
- Handles complex visual and textual inputs
Related models

- llava (Multimodal): 🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6. 12.9M pulls.
- minicpm-v (Multimodal): A series of multimodal LLMs (MLLMs) designed for vision-language understanding. 4.6M pulls.
- llava-llama3 (Multimodal): A LLaVA model fine-tuned from Llama 3 Instruct with better scores in several benchmarks. 2.1M pulls.
- qwen2.5vl (Multimodal): Flagship vision-language model of Qwen and also a significant leap from the previous Qwen2-VL. 1.3M pulls.