Quick start

```shell
ollama run minicpm-v
```

Available sizes
| Tag | Size | Quantization | Context | Min RAM |
|---|---|---|---|---|
| minicpm-v:latest | 5.5 GB | q4_k_m | 32K | 6.9 GB |
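Beyond the interactive CLI above, a vision model like minicpm-v can be called through Ollama's local REST API, where images are passed as base64-encoded strings in the `images` field of the `/api/generate` request. The sketch below assumes a default Ollama server on `localhost:11434`; the helper name `build_payload` and the image path `example.png` are illustrative, not part of Ollama itself.

```python
import base64
import json
import urllib.request


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a request body for Ollama's /api/generate endpoint.

    Ollama expects images as base64-encoded strings in the "images" list;
    "stream": False asks for a single JSON response instead of chunks.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


if __name__ == "__main__":
    # Assumes a running Ollama server and a local file named example.png
    # (hypothetical path; replace with your own image).
    with open("example.png", "rb") as f:
        body = json.dumps(
            build_payload("minicpm-v", "Describe this image.", f.read())
        )
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```

The same base64 `images` convention applies to the `/api/chat` endpoint, so the payload builder transfers directly if you prefer chat-style requests.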
Strengths & Limitations

Strengths
- Strong vision-language understanding
- Multimodal capabilities (image and text input)
- Designed as an efficient multimodal LLM (MLLM) for end-side deployment
Related models

- llava (Multimodal, 12.9M pulls): 🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.
- llava-llama3 (Multimodal, 2.1M pulls): A LLaVA model fine-tuned from Llama 3 Instruct with better scores in several benchmarks.
- qwen3-vl (Multimodal, 1.6M pulls): The most powerful vision-language model in the Qwen model family to date.
- qwen2.5vl (Multimodal, 1.3M pulls): Flagship vision-language model of Qwen, and a significant leap from the previous Qwen2-VL.