Multimodal · Advanced · Vision

LLaVA

🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

12.9M pulls · Updated Feb 26, 2026 · 98 tags · 32K context

Quick start

ollama run llava
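
The same model can be queried over Ollama's local REST API, which accepts base64-encoded images alongside the prompt. Below is a minimal Python sketch, assuming the Ollama server is running on its default port (11434) and that photo.jpg is a placeholder path to a local image:

import base64
import json
import urllib.request

# Read a local image and base64-encode it for the /api/generate endpoint.
# "photo.jpg" is a placeholder; substitute any image on disk.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = json.dumps({
    "model": "llava",
    "prompt": "Describe what is in this picture.",
    "images": [image_b64],   # multimodal prompts attach images as base64 strings
    "stream": False,         # return one JSON object instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])

Setting "stream" to false returns the full answer in a single response; omit it to receive tokens as they are generated.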

Available sizes

Tag            Size    Quantization  Context  Min RAM
llava:latest   4.7 GB  q4_k_m        32K      5.9 GB
llava:13b      8.0 GB  q4_k_m        4K       10 GB
llava:34b      20 GB   q4_k_m        4K       25 GB
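
The larger tags generally answer visual questions more accurately, at the cost of download size and RAM. A short sketch of pulling and querying a specific tag, assuming the official ollama Python client is installed (pip install ollama) and photo.jpg is a placeholder image path:

import ollama

# Pull a specific tag from the table above (roughly an 8 GB download for the 13B variant).
ollama.pull("llava:13b")

# Ask a question about a local image; the client base64-encodes the file for you.
response = ollama.chat(
    model="llava:13b",
    messages=[{
        "role": "user",
        "content": "What objects are in this picture?",
        "images": ["photo.jpg"],
    }],
)
print(response["message"]["content"])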

Strengths & Limitations

Strengths

  • Visual and language understanding
  • End-to-end training
  • Combines vision and language models

Related models