Quick start
ollama run glm-ocr

Available sizes
| Tag | Size | Quantization | Context | Min RAM |
|---|---|---|---|---|
| glm-ocr:q8_0 | 1.6 GB | q8_0 | 128K | 2 GB |
| glm-ocr:latest | 2.2 GB | q4_k_m | 128K | 2.8 GB |
| glm-ocr:bf16 | 2.2 GB | bf16 | 128K | 2.8 GB |
Run with

Claude Code

ollama launch claude --model glm-ocr

Codex

ollama launch codex --model glm-ocr

OpenCode

ollama launch opencode --model glm-ocr

OpenClaw

ollama launch openclaw --model glm-ocr

Strengths & Limitations
Strengths
- Complex document understanding
- Multimodal OCR
- GLM-V architecture
Related models
- llava (Multimodal, 12.9M pulls): 🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.
- minicpm-v (Multimodal, 4.6M pulls): A series of multimodal LLMs (MLLMs) designed for vision-language understanding.
- llava-llama3 (Multimodal, 2.1M pulls): A LLaVA model fine-tuned from Llama 3 Instruct with better scores in several benchmarks.
- qwen3-vl (Multimodal, 1.6M pulls): The most powerful vision-language model in the Qwen model family to date.