Quick start

ollama run glm-4.7-flash
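
Once the model is available locally, it can also be queried from a script rather than the interactive prompt. The following is a minimal sketch, assuming an Ollama server is listening on its default local address (http://localhost:11434) and exposes the /api/generate endpoint. The helper names build_request and generate are hypothetical, chosen only for illustration.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "glm-4.7-flash") -> urllib.request.Request:
    """Build a non-streaming generate request (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "glm-4.7-flash") -> str:
    """Send the prompt to the server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        # With "stream": False the server replies with a single JSON object.
        return json.loads(resp.read())["response"]
```

Calling generate("Summarize what a mutex is in one sentence.") would return the model's reply as a string, provided the server is running and the model has been pulled.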

Available sizes

Tag	Size	Quantization	Context	Min RAM
glm-4.7-flash:latest	19 GB	q4_k_m	198K	23.8 GB
glm-4.7-flash:q4_K_M	19 GB	q4_k_m	198K	23.8 GB
glm-4.7-flash:q8_0	32 GB	q8_0	198K	40 GB
glm-4.7-flash:bf16	60 GB	bf16	198K	75 GB
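
The Min RAM column can be used to choose a tag for a given machine. The sketch below is one possible way to do that: it picks the highest-precision tag whose minimum RAM fits. The function name pick_tag is hypothetical, and the thresholds are copied from the table above rather than queried from Ollama.

```python
from typing import Optional

# (tag, minimum RAM in GB), copied from the table above and
# ordered from highest to lowest precision.
TAGS = [
    ("glm-4.7-flash:bf16", 75.0),
    ("glm-4.7-flash:q8_0", 40.0),
    ("glm-4.7-flash:q4_K_M", 23.8),
]


def pick_tag(available_ram_gb: float) -> Optional[str]:
    """Return the highest-precision tag that fits, or None if none does."""
    for tag, min_ram_gb in TAGS:
        if available_ram_gb >= min_ram_gb:
            return tag
    return None
```

For example, a machine with 64 GB of RAM would get glm-4.7-flash:q8_0, while one with 16 GB would get None, since it is below the smallest listed minimum.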

Run with

Claude Code

ollama launch claude --model glm-4.7-flash

Codex

ollama launch codex --model glm-4.7-flash

OpenCode

ollama launch opencode --model glm-4.7-flash

OpenClaw

ollama launch openclaw --model glm-4.7-flash

Strengths & Limitations

Strengths

Strong performance for a model in the 30B class
Lightweight deployment option
Balances performance and efficiency

Benchmarks

Benchmark	Score	Unit
AIME	25	—
GPQA	75.2	—
SWE-bench Verified	59.2	—

Related models

llama3.1 (Language)

Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes.

110.5M pulls

llama3.2 (Language)

Meta's Llama 3.2 goes small with 1B and 3B models.

58.0M pulls

mistral (Language)

The 7B model released by Mistral AI, updated to version 0.3.

25.6M pulls

qwen2.5 (Language)

Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The models support context lengths of up to 128K tokens and are multilingual.

22.0M pulls