Browse AI Models | Ollama Model Explorer

Showing 214 of 214 models

llama3.1

Language

110.5M pulls1 year ago

Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes.

8b70b405b

Tools

Min RAM: 4.9 GBCtx: 128Kmedium

MetaLlama

ollama run llama3.1

deepseek-r1

Reasoning

78.6M pulls7 months ago

DeepSeek-R1 is a family of open reasoning models with performance approaching that of leading models, such as O3 and Gemini 2.5 Pro.

1.5b7b8b14b32b+2

ToolsThinking

Min RAM: 1.1 GBCtx: 160Kfast

DeepSeekDeepSeek

ollama run deepseek-r1

llama3.2

Language

58.0M pulls1 year ago

Meta's Llama 3.2 goes small with 1B and 3B models.

1b3b

Tools

Min RAM: 1.3 GBCtx: 128Kfast

MetaLlama

ollama run llama3.2

nomic-embed-text

Embedding

54.8M pulls2 years ago

A high-performing open embedding model with a large token context window.

Embedding

Min RAM: 0 MB

Nomic AINomic

ollama run nomic-embed-text

gemma3

General

32.1M pulls2 months ago

The current, most capable model that runs on a single GPU.

270m1b4b12b27b

VisionCloud

Min RAM: 3.3 GBCtx: 128Kmedium

Google DeepMindGemma

ollama run gemma3

mistral

Language

25.6M pulls7 months ago

The 7B model released by Mistral AI, updated to version 0.3.

Tools

Min RAM: 4.4 GBCtx: 32Kmedium

Mistral AIMistral

ollama run mistral

qwen2.5

Language

22.0M pulls1 year ago

Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The model supports up to 128K tokens and has multilingual support.

0.5b1.5b3b7b14b+2

Tools

Min RAM: 1.9 GBCtx: 32Kfast

Alibaba CloudQwen

ollama run qwen2.5

qwen3

Language

19.8M pulls4 months ago

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.

0.6b1.7b4b8b14b+3

ToolsThinking

Min RAM: 1.4 GBCtx: 256Kfast

Alibaba CloudQwen

ollama run qwen3

gemma2

Language

16.4M pulls1 year ago

Google Gemma 2 is a high-performing and efficient model available in three sizes: 2B, 9B, and 27B.

2b9b27b

Min RAM: 1.6 GBCtx: 8Kfast

GoogleGemma

ollama run gemma2

llama3

General

16.1M pulls1 year ago

Meta Llama 3: The most capable openly available LLM to date

8b70b

Min RAM: 4.7 GBCtx: 8Kmedium

MetaLlama

ollama run llama3

phi3

Language

16.0M pulls1 year ago

Phi-3 is a family of lightweight 3B (Mini) and 14B (Medium) state-of-the-art open models by Microsoft.

3.8b14b

Min RAM: 2.2 GBCtx: 128Kfast

MicrosoftPhi

ollama run phi3

llava

Multimodal

12.9M pullsto version 1.6.

🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

7b13b34b

Vision

Min RAM: 4.7 GBCtx: 32Kmedium

LLaVA

ollama run llava

qwen2.5-coder

Code

11.3M pulls9 months ago

The latest series of Code-Specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing.

0.5b1.5b3b7b14b+1

Tools

Min RAM: 1.9 GBCtx: 32Kfast

Alibaba CloudQwen

ollama run qwen2.5-coder

mxbai-embed-large

Embedding

7.6M pulls1 year ago

State-of-the-art large embedding model from mixedbread.ai

335m

Embedding

Min RAM: 0 MB

mixedbread.aimxbai

ollama run mxbai-embed-large

phi4

Language

7.2M pulls1 year ago

Phi-4 is a 14B parameter, state-of-the-art open model from Microsoft.

14b

Min RAM: 9.1 GBCtx: 16Kslow

MicrosoftPhi

ollama run phi4

gpt-oss

General

7.1M pulls4 months ago

OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.

20b120b

ToolsThinkingCloud

Min RAM: 14 GBCtx: 128Kslow

OpenAILlama

ollama run gpt-oss

gemma

Language

5.9M pullsto version 1.1

Gemma is a family of lightweight, state-of-the-art open models built by Google DeepMind. Updated to version 1.1

2b7b

Min RAM: 1.7 GBCtx: 8Kfast

Google DeepMindGemma

ollama run gemma

qwen

Language

5.6M pulls1 year ago

Qwen 1.5 is a series of large language models by Alibaba Cloud spanning from 0.5B to 110B parameters

0.5b1.8b4b7b14b+3

Min RAM: 1.1 GBCtx: 32Kfast

Alibaba CloudQwen

ollama run qwen

llama2

Language

5.5M pulls2 years ago

Llama 2 is a collection of foundation language models ranging from 7B to 70B parameters.

7b13b70b

Min RAM: 3.8 GBCtx: 4Kmedium

MetaLlama

ollama run llama2

qwen2

Language

4.9M pulls1 year ago

Qwen2 is a new series of large language models from Alibaba group

0.5b1.5b7b72b

Tools

Min RAM: 4.4 GBCtx: 32Kmedium

Alibaba CloudQwen

ollama run qwen2

minicpm-v

Multimodal

4.6M pulls1 year ago

A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

Vision

Min RAM: 5.5 GBCtx: 32Kmedium

Other

ollama run minicpm-v

codellama

Code

4.3M pulls1 year ago

A large language model that can use text prompts to generate and discuss code.

7b13b34b70b

Min RAM: 3.8 GBCtx: 16Kmedium

MetaCodeLlama

ollama run codellama

llama3.2-vision

Vision

3.8M pulls9 months ago

Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.

11b90b

Vision

Min RAM: 7.8 GBCtx: 128Kslow

MetaLlama

ollama run llama3.2-vision

tinyllama

Language

3.7M pulls2 years ago

The TinyLlama project is an open endeavor to train a compact 1.1B Llama model on 3 trillion tokens.

1.1b

Min RAM: 0 MB

Llama

ollama run tinyllama

dolphin3

General

3.6M pulls1 year ago

Dolphin 3.0 Llama 3.1 8B 🐬 is the next generation of the Dolphin series of instruct-tuned models designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases.

Min RAM: 4.9 GBCtx: 128Kmedium

Cognitive ComputationsLlama

ollama run dolphin3

deepseek-v3

Language

3.5M pulls1 year ago

A strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.

671b

Min RAM: 404 GBCtx: 160Kslow

DeepSeekDeepSeek

ollama run deepseek-v3

olmo2

Language

3.5M pulls1 year ago

OLMo 2 is a new family of 7B and 13B models trained on up to 5T tokens. These models are on par with or better than equivalently sized fully open models, and competitive with open-weight models such as Llama 3.1 on English academic benchmarks.

7b13b

Min RAM: 4.5 GBCtx: 4Kmedium

Nomic AIOther

ollama run olmo2

mistral-nemo

Language

3.4M pulls7 months ago

A state-of-the-art 12B model with 128k context length, built by Mistral AI in collaboration with NVIDIA.

12b

Tools

Min RAM: 7.1 GBCtx: 1Mslow

Mistral AIMistral

ollama run mistral-nemo

bge-m3

Embedding

3.3M pulls1 year ago

BGE-M3 is a new model from BAAI distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.

567m

Embedding

Min RAM: 1.2 GBCtx: 8Kfast

BAAIOther

ollama run bge-m3

llama3.3

Language

3.3M pulls1 year ago

New state of the art 70B model. Llama 3.3 70B offers similar performance compared to the Llama 3.1 405B model.

70b

Tools

Min RAM: 43 GBCtx: 128Kslow

MetaLlama

ollama run llama3.3

qwen3-coder

Code

3.2M pulls5 months ago

Alibaba's performant long context models for agentic and coding tasks.

30b480b

ToolsCloud

Min RAM: 19 GBCtx: 256Kslow

Alibaba CloudQwen

ollama run qwen3-coder

deepseek-coder

Code

3.1M pulls2 years ago

DeepSeek Coder is a capable coding model trained on two trillion code and natural language tokens.

1.3b6.7b33b

Min RAM: 3.8 GBCtx: 16Kmedium

DeepSeekDeepSeek

ollama run deepseek-coder

smollm2

Language

2.5M pulls1 year ago

SmolLM2 is a family of compact language models available in three size: 135M, 360M, and 1.7B parameters.

135m360m1.7b

Tools

Min RAM: 1.8 GBCtx: 8Kfast

Other

ollama run smollm2

all-minilm

Embedding

2.5M pulls1 year ago

Embedding models on very large sentence level datasets.

22m33m

Embedding

Min RAM: 0 MB

Sentence TransformersBERT

ollama run all-minilm

mistral-small

Language

2.4M pulls1 year ago

Mistral Small 3 sets a new benchmark in the “small” Large Language Models category below 70B.

22b24b

Tools

Min RAM: 13 GBCtx: 128Kslow

Mistral AIMistral

ollama run mistral-small

codegemma

Code

2.2M pulls1 year ago

CodeGemma is a collection of powerful, lightweight models that can perform a variety of coding tasks like fill-in-the-middle code completion, code generation, natural language understanding, mathematical reasoning, and instruction following.

2b7b

Min RAM: 1.6 GBCtx: 8Kfast

Google DeepMindGemma

ollama run codegemma

granite3.1-moe

Language

2.2M pulls1 year ago

The IBM Granite 1B and 3B models are long-context mixture of experts (MoE) Granite models from IBM designed for low latency usage.

1b3b

Tools

Min RAM: 1.4 GBCtx: 128Kfast

IBMGranite

ollama run granite3.1-moe

falcon3

Science

2.2M pulls1 year ago

A family of efficient AI models under 10B parameters performant in science, math, and coding through innovative training techniques.

1b3b7b10b

Min RAM: 1.8 GBCtx: 32Kfast

TIIFalcon

ollama run falcon3

llava-llama3

Multimodal

2.1M pulls1 year ago

A LLaVA model fine-tuned from Llama 3 Instruct with better scores in several benchmarks.

Vision

Min RAM: 5.5 GBCtx: 8Kmedium

MetaLlama

ollama run llava-llama3

snowflake-arctic-embed

Embedding

2.1M pulls1 year ago

A suite of text embedding models by Snowflake, optimized for performance.

22m33m110m137m335m

Embedding

Min RAM: 0 MB

SnowflakeOther

ollama run snowflake-arctic-embed

starcoder2

Code

2.1M pulls1 year ago

StarCoder2 is the next generation of transparently trained open code LLMs that comes in three sizes: 3B, 7B and 15B parameters.

3b7b15b

Min RAM: 1.7 GBCtx: 16Kfast

BigCodeStarCoder

ollama run starcoder2

qwq

Reasoning

2.1M pulls11 months ago

QwQ is the reasoning model of the Qwen series.

32b

Tools

Min RAM: 20 GBCtx: 40Kslow

Alibaba CloudQwen

ollama run qwq

orca-mini

General

2.0M pulls2 years ago

A general-purpose model ranging from 3 billion parameters to 70 billion, suitable for entry-level hardware.

3b7b13b70b

Min RAM: 2 GBCtx: 4Kfast

Orca

ollama run orca-mini

mixtral

Language

1.9M pulls1 year ago

A set of Mixture of Experts (MoE) model with open weights by Mistral AI in 8x7b and 8x22b parameter sizes.

7B22B

Tools

Min RAM: 26 GBCtx: 64Kslow

Mistral AIMistral

ollama run mixtral

llama2-uncensored

Language

1.8M pulls2 years ago

Uncensored Llama 2 model by George Sung and Jarrad Hope.

7b70b

Min RAM: 3.8 GBCtx: 2Kmedium

Llama

ollama run llama2-uncensored

lfm2

Language

1.6M pulls2 days ago

LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.

24b

Tools

Min RAM: 14 GBCtx: 32Kslow

Llama

ollama run lfm2

deepseek-coder-v2

Code

1.6M pulls1 year ago

An open-source Mixture-of-Experts code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks.

16b236b

Min RAM: 8.9 GBCtx: 160Kslow

DeepSeekDeepSeek

ollama run deepseek-coder-v2

qwen3-vl

Multimodal

1.6M pulls3 months ago

The most powerful vision-language model in the Qwen model family to date.

2b4b8b30b32b+1

VisionToolsThinkingCloud

Min RAM: 1.9 GBCtx: 256Kfast

Alibaba CloudQwen

ollama run qwen3-vl

cogito

Reasoning

1.4M pulls10 months ago

Cogito v1 Preview is a family of hybrid reasoning models by Deep Cogito that outperform the best available open models of the same size, including counterparts from LLaMA, DeepSeek, and Qwen across most standard benchmarks.

3b8b14b32b70b

Tools

Min RAM: 2.2 GBCtx: 128Kfast

Deep CogitoOther

ollama run cogito

qwen2.5vl

Multimodal

1.3M pulls9 months ago

Flagship vision-language model of Qwen and also a significant leap from the previous Qwen2-VL.

3b7b32b72b

Vision

Min RAM: 3.2 GBCtx: 125Kmedium

Alibaba CloudQwen

ollama run qwen2.5vl

mistral-small3.2

Language

1.3M pulls8 months ago

An update to Mistral Small that improves on function calling, instruction following, and less repetition errors.

24b

VisionTools

Min RAM: 15 GBCtx: 128Kslow

Mistral AIMistral

ollama run mistral-small3.2

gemma3n

General

1.2M pulls8 months ago

Gemma 3n models are designed for efficient execution on everyday devices such as laptops, tablets or phones.

e2be4b

Min RAM: 5.6 GBCtx: 32Kmedium

Google DeepMindGemma

ollama run gemma3n

llama4

Multimodal

1.2M pulls8 months ago

Meta's latest collection of multimodal models.

16x17b128x17b

VisionTools

Min RAM: 67 GBCtx: 10Mslow

MetaLlama

ollama run llama4

phi4-reasoning

Reasoning

1.2M pulls10 months ago

Phi 4 reasoning and reasoning plus are 14-billion parameter open-weight reasoning models that rival much larger models on complex reasoning tasks.

14b

Min RAM: 11 GBCtx: 32Kslow

MicrosoftPhi

ollama run phi4-reasoning

dolphin-phi

Language

1.2M pulls2 years ago

2.7B uncensored Dolphin model by Eric Hartford, based on the Phi language model by Microsoft Research.

2.7b

Min RAM: 1.6 GBCtx: 2Kfast

MicrosoftPhi

ollama run dolphin-phi

deepscaler

Math

1.1M pulls1 year ago

A fine-tuned version of Deepseek-R1-Distilled-Qwen-1.5B that surpasses the performance of OpenAI’s o1-preview with just 1.5B parameters on popular math evaluations.

1.5b

Min RAM: 3.6 GBCtx: 128Kmedium

DeepSeekQwen

ollama run deepscaler

magistral

Reasoning

1.1M pulls8 months ago

Magistral is a small, efficient reasoning model with 24B parameters.

24b

ToolsThinking

Min RAM: 14 GBCtx: 39Kslow

Mistral AIMistral

ollama run magistral

dolphin-llama3

General

1.1M pulls1 year ago

Dolphin 2.9 is a new model with 8B and 70B sizes by Eric Hartford based on Llama 3 that has a variety of instruction, conversational, and coding skills.

8b70b

Min RAM: 4.7 GBCtx: 8Kmedium

Eric HartfordLlama

ollama run dolphin-llama3

dolphin-mixtral

Code

1.1M pulls1 year ago

Uncensored, 8x7b and 8x22b fine-tuned models based on the Mixtral mixture of experts models that excels at coding tasks. Created by Eric Hartford.

7B22B

Min RAM: 26 GBCtx: 64Kslow

Dolphin

ollama run dolphin-mixtral

phi

Language

1.0M pulls2 years ago

Phi-2: a 2.7B language model by Microsoft Research that demonstrates outstanding reasoning and language understanding capabilities.

2.7b

Min RAM: 1.6 GBCtx: 2Kfast

MicrosoftPhi

ollama run phi

smollm

Language

970K pulls1 year ago

🪐 A family of small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset.

135m360m1.7b

Min RAM: 0 MB

HuggingFaceOther

ollama run smollm

qwen3-embedding

Embedding

946K pulls5 months ago

Building upon the foundational models of the Qwen3 series, Qwen3 Embedding provides a comprehensive range of text embeddings models in various sizes

0.6b4b8b

Embedding

Min RAM: 2.5 GBCtx: 40Kfast

Alibaba CloudQwen

ollama run qwen3-embedding

phi4-mini

Language

906K pulls12 months ago

Phi-4-mini brings significant enhancements in multilingual support, reasoning, and mathematics, and now, the long-awaited function calling feature is finally supported.

3.8b

Tools

Min RAM: 2.5 GBCtx: 128Kfast

MicrosoftPhi

ollama run phi4-mini

granite3.3

Language

903K pulls10 months ago

IBM Granite 2B and 8B models are 128K context length language models that have been fine-tuned for improved reasoning and instruction-following capabilities.

2b8b

Tools

Min RAM: 1.5 GBCtx: 128Kfast

IBMGranite

ollama run granite3.3

codestral

Code

834K pulls1 year ago

Codestral is Mistral AI’s first-ever code model designed for code generation tasks.

22b

Min RAM: 13 GBCtx: 32Kslow

Mistral AIMistral

ollama run codestral

openthinker

Reasoning

797K pulls10 months ago

A fully open-source family of reasoning models built using a dataset derived by distilling DeepSeek-R1.

7b32b

Min RAM: 4.7 GBCtx: 32Kmedium

DeepSeekDeepSeek

ollama run openthinker

granite3.2-vision

Vision

767K pulls12 months ago

A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

VisionTools

Min RAM: 2.4 GBCtx: 16Kfast

IBMGranite

ollama run granite3.2-vision

devstral

Code

761K pulls7 months ago

Devstral: the best open source model for coding agents

24b

Tools

Min RAM: 14 GBCtx: 128Kslow

CodeLlama

ollama run devstral

dolphin-mistral

Code

747K pullsto version 2.8.

The uncensored Dolphin model based on Mistral that excels at coding tasks. Updated to version 2.8.

Min RAM: 4.1 GBCtx: 32Kmedium

Cognitive ComputationsMistral

ollama run dolphin-mistral

granite4

Language

733K pulls4 months ago

Granite 4 features improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

350m1b3b

Tools

Min RAM: 2.1 GBCtx: 128Kfast

Granite

ollama run granite4

command-r

Language

720K pulls1 year ago

Command R is a Large Language Model optimized for conversational interaction and long context tasks.

35b

Tools

Min RAM: 19 GBCtx: 128Kslow

CohereCommand

ollama run command-r

qwen3-coder-next

Code

691K pulls2 weeks ago

Qwen3-Coder-Next is a coding-focused language model from Alibaba's Qwen team, optimized for agentic coding workflows and local development.

ToolsCloud

Min RAM: 52 GBCtx: 256Kslow

Alibaba CloudQwen

ollama run qwen3-coder-next

granite-code

Code

689K pulls1 year ago

A family of open foundation models by IBM for Code Intelligence

3b8b20b34b

Min RAM: 2 GBCtx: 125Kfast

IBMGranite

ollama run granite-code

wizardlm2

Language

667K pulls1 year ago

State of the art large language model from Microsoft AI with improved performance on complex chat, multilingual, reasoning and agent use cases.

7b8x22b

Min RAM: 4.1 GBCtx: 64Kmedium

MicrosoftWizardLM

ollama run wizardlm2

deepcoder

Code

652K pulls10 months ago

DeepCoder is a fully open-Source 14B coder model at O3-mini level, with a 1.5B version also available.

1.5b14b

Min RAM: 1.1 GBCtx: 128Kfast

DeepSeek

ollama run deepcoder

moondream

Multimodal

649K pulls1 year ago

moondream2 is a small vision language model designed to run efficiently on edge devices.

1.8b

Vision

Min RAM: 1.7 GBCtx: 2Kfast

Moondream

ollama run moondream

hermes3

Language

631K pulls1 year ago

Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research

3b8b70b405b

Tools

Min RAM: 2 GBCtx: 128Kfast

Nous ResearchOpenHermes

ollama run hermes3

mistral-small3.1

Multimodal

609K pulls10 months ago

Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.

24b

VisionTools

Min RAM: 15 GBCtx: 128Kslow

Mistral AIMistral

ollama run mistral-small3.1

lfm2.5-thinking

Language

598K pulls1 month ago

LFM2.5 is a new family of hybrid models designed for on-device deployment.

1.2b

Tools

Min RAM: 0 MB

Llama

ollama run lfm2.5-thinking

Language

587K pulls1 year ago

Yi 1.5 is a high-performing, bilingual language model.

6b9b34b

Min RAM: 3.5 GBCtx: 4Kmedium

01-AIYi

ollama run yi

zephyr

Language

567K pulls1 year ago

Zephyr is a series of fine-tuned versions of the Mistral and Mixtral models that are trained to act as helpful assistants.

7b141b

Min RAM: 4.1 GBCtx: 64Kmedium

HuggingFaceMistral

ollama run zephyr

mistral-large

Reasoning

558K pulls1 year ago

Mistral Large 2 is Mistral's new flagship model that is significantly more capable in code generation, mathematics, and reasoning with 128k context window and support for dozens of languages.

123b

Tools

Min RAM: 73 GBCtx: 128Kslow

Mistral AIMistral

ollama run mistral-large

phi3.5

Language

555K pulls1 year ago

A lightweight AI model with 3.8 billion parameters with performance overtaking similarly and larger sized models.

3.8b

Min RAM: 2.2 GBCtx: 4Kfast

MicrosoftPhi

ollama run phi3.5

wizard-vicuna-uncensored

Language

549K pulls2 years ago

Wizard Vicuna Uncensored is a 7B, 13B, and 30B parameter model based on Llama 2 uncensored by Eric Hartford.

7b13b30b

Min RAM: 3.8 GBCtx: 2Kmedium

Vicuna

ollama run wizard-vicuna-uncensored

embeddinggemma

Embedding

544K pulls5 months ago

EmbeddingGemma is a 300M parameter embedding model from Google.

300m

Embedding

Min RAM: 0 MB

GoogleGemma

ollama run embeddinggemma

bakllava

Multimodal

524K pulls2 years ago

BakLLaVA is a multimodal model consisting of the Mistral 7B base model augmented with the LLaVA architecture.

Vision

Min RAM: 4.7 GBCtx: 32Kmedium

SkunkworksAIBakLLaVA

ollama run bakllava

paraphrase-multilingual

Embedding

510K pulls1 year ago

Sentence-transformers model that can be used for tasks like clustering or semantic search.

278m

Embedding

Min RAM: 0 MB

Nomic AIBERT

ollama run paraphrase-multilingual

starcoder

Code

505K pulls2 years ago

StarCoder is a code generation model trained on 80+ programming languages.

1b3b7b15b

Min RAM: 1.8 GBCtx: 8Kfast

BigCodeStarCoder

ollama run starcoder

nous-hermes

General

497K pulls2 years ago

General use models based on Llama and Llama 2 from Nous Research.

7b13b

Min RAM: 3.8 GBCtx: 4Kmedium

Nous ResearchLlama

ollama run nous-hermes

exaone-deep

Reasoning

495K pulls11 months ago

EXAONE Deep exhibits superior capabilities in various reasoning tasks including math and coding benchmarks, ranging from 2.4B to 32B parameters developed and released by LG AI Research.

2.4b7.8b32b

Min RAM: 1.6 GBCtx: 32Kfast

LG AI ResearchOther

ollama run exaone-deep

deepseek-llm

Language

491K pulls2 years ago

An advanced language model crafted with 2 trillion bilingual tokens.

7b67b

Min RAM: 4 GBCtx: 4Kmedium

DeepSeekDeepSeek

ollama run deepseek-llm

falcon

Language

488K pulls2 years ago

A large language model built by the Technology Innovation Institute (TII) for use in summarization, text generation, and chat bots.

7b40b180b

Min RAM: 4.2 GBCtx: 2Kmedium

TIIFalcon

ollama run falcon

deepseek-v2

Language

487K pulls1 year ago

A strong, economical, and efficient Mixture-of-Experts language model.

16b236b

Min RAM: 8.9 GBCtx: 160Kslow

DeepSeekDeepSeek

ollama run deepseek-v2

ministral-3

Language

487K pulls2 months ago

The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.

3b8b14b

VisionToolsCloud

Min RAM: 3 GBCtx: 256Kmedium

Mistral AIMistral

ollama run ministral-3

openchat

Language

486K pullsto version 3.5-0106.

A family of open-source models trained on a wide variety of data, surpassing ChatGPT on various benchmarks. Updated to version 3.5-0106.

Min RAM: 4.1 GBCtx: 8Kmedium

Llama

ollama run openchat

vicuna

General

482K pulls2 years ago

General use chat model based on Llama and Llama 2 with 2K to 16K context sizes.

7b13b33b

Min RAM: 3.8 GBCtx: 4Kmedium

MetaVicuna

ollama run vicuna

glm4

Language

473K pulls1 year ago

A strong multi-lingual general language model with competitive performance to Llama 3.

Min RAM: 5.5 GBCtx: 128Kmedium

Alibaba CloudOther

ollama run glm4

openhermes

Language

462K pulls2 years ago

OpenHermes 2.5 is a 7B model fine-tuned by Teknium on Mistral with fully open datasets.

Min RAM: 3.1 GBCtx: 32Kmedium

TekniumOpenHermes

ollama run openhermes

codeqwen

Code

462K pulls1 year ago

CodeQwen1.5 is a large language model pretrained on a large amount of code data.

Min RAM: 4.2 GBCtx: 64Kmedium

Alibaba CloudQwen

ollama run codeqwen

qwen2-math

Math

450K pulls1 year ago

Qwen2 Math is a series of specialized math language models built upon the Qwen2 LLMs, which significantly outperforms the mathematical capabilities of open-source models and even closed-source models (e.g., GPT4o).

1.5b7b72b

Min RAM: 4.4 GBCtx: 4Kmedium

Alibaba CloudQwen

ollama run qwen2-math

aya

Language

444K pulls1 year ago

Aya 23, released by Cohere, is a new family of state-of-the-art, multilingual models that support 23 languages.

8b35b

Min RAM: 4.8 GBCtx: 8Kmedium

CohereOther

ollama run aya

llama2-chinese

Language

438K pulls2 years ago

Llama 2 based model fine tuned to improve Chinese dialogue ability.

7b13b

Min RAM: 3.8 GBCtx: 4Kmedium

MetaLlama

ollama run llama2-chinese

stable-code

Code

432K pulls1 year ago

Stable Code 3B is a coding model with instruct and code completion variants on par with models such as Code Llama 7B that are 2.5x larger.

Min RAM: 1.6 GBCtx: 16Kfast

Stability AICodeLlama

ollama run stable-code

neural-chat

Language

431K pulls2 years ago

A fine-tuned model based on Mistral with good coverage of domain and language.

Min RAM: 4.1 GBCtx: 32Kmedium

IntelMistral

ollama run neural-chat

nous-hermes2

Science

427K pulls2 years ago

The powerful family of models by Nous Research that excels at scientific discussion and coding tasks.

10.7b34b

Min RAM: 6.1 GBCtx: 4Kslow

Nous ResearchOpenHermes

ollama run nous-hermes2

opencoder

Code

420K pulls1 year ago

OpenCoder is an open and reproducible code LLM family which includes 1.5B and 8B models, supporting chat in English and Chinese languages.

1.5b8b

Min RAM: 1.4 GBCtx: 8Kfast

StarCoder

ollama run opencoder

sqlcoder

Code

420K pulls2 years ago

SQLCoder is a code completion model fined-tuned on StarCoder for SQL generation tasks

7b15b

Min RAM: 4.1 GBCtx: 32Kmedium

StarCoder

ollama run sqlcoder

wizardcoder

Code

420K pulls2 years ago

State-of-the-art code generation model

33b

Min RAM: 3.8 GBCtx: 16Kmedium

WizardLMWizardLM

ollama run wizardcoder

yi-coder

Code

413K pulls1 year ago

Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters.

1.5b9b

Min RAM: 5 GBCtx: 128Kmedium

01-AIYi

ollama run yi-coder

stablelm2

Language

407K pulls1 year ago

Stable LM 2 is a state-of-the-art 1.6B and 12B parameter language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch.

1.6b12b

Min RAM: 7 GBCtx: 4Kslow

Stability AIOther

ollama run stablelm2

llama3-chatqa

Language

405K pulls1 year ago

A model from NVIDIA based on Llama 3 that excels at conversational question answering (QA) and retrieval-augmented generation (RAG).

8b70b

Min RAM: 4.7 GBCtx: 8Kmedium

NVIDIALlama

ollama run llama3-chatqa

granite3-dense

Code

403K pulls1 year ago

The IBM Granite 2B and 8B models are designed to support tool-based use cases and support for retrieval augmented generation (RAG), streamlining code generation, translation and bug fixing.

2b8b

Tools

Min RAM: 1.6 GBCtx: 4Kfast

IBMGranite

ollama run granite3-dense

granite3.1-dense

Language

402K pulls1 year ago

The IBM Granite 2B and 8B models are text-only dense LLMs trained on over 12 trillion tokens of data, demonstrated significant improvements over their predecessors in performance and speed in IBM’s initial testing.

2b8b

Tools

Min RAM: 1.6 GBCtx: 128Kfast

IBMGranite

ollama run granite3.1-dense

dolphincoder

Code

392K pulls1 year ago

A 7B and 15B uncensored variant of the Dolphin model family that excels at coding, based on StarCoder2.

7b15b

Min RAM: 4.2 GBCtx: 16Kmedium

Cognitive ComputationsDolphin

ollama run dolphincoder

wizard-math

Math

392K pulls2 years ago

Model focused on math and logic problems

7b13b70b

Min RAM: 4.1 GBCtx: 32Kmedium

WizardLM

ollama run wizard-math

translategemma

Language

391K pulls1 month ago

A new collection of open translation models built on Gemma 3, helping people communicate across 55 languages.

4b12b27b

Vision

Min RAM: 3.3 GBCtx: 128Kmedium

Google DeepMindGemma

ollama run translategemma

llama3-gradient

Language

391K pulls1 year ago

This model extends LLama-3 8B's context length from 8k to over 1m tokens.

8b70b

Min RAM: 4.7 GBCtx: 1Mmedium

MetaLlama

ollama run llama3-gradient

samantha-mistral

Language

387K pulls2 years ago

A companion assistant trained in philosophy, psychology, and personal relationships. Based on Mistral.

Min RAM: 4.1 GBCtx: 32Kmedium

Mistral AIMistral

ollama run samantha-mistral

deepseek-v3.1

Code

385K pulls5 months ago

DeepSeek-V3.1-Terminus is a hybrid model that supports both thinking mode and non-thinking mode.

671b

ToolsThinkingCloud

Min RAM: 404 GBCtx: 160Kslow

DeepSeekDeepSeek

ollama run deepseek-v3.1

command-r-plus

Language

382K pulls1 year ago

Command R+ is a powerful, scalable large language model purpose-built to excel at real-world enterprise use cases.

104b

Tools

Min RAM: 59 GBCtx: 128Kslow

CohereCommand

ollama run command-r-plus

internlm2

Reasoning

381K pulls1 year ago

InternLM2.5 is a 7B parameter model tailored for practical scenarios with outstanding reasoning capability.

1m1.8b7b20b

Min RAM: 1.1 GBCtx: 256Kfast

Alibaba CloudOther

ollama run internlm2

llama3-groq-tool-use

Language

379K pulls1 year ago

A series of models from Groq that represent a significant advancement in open-source AI capabilities for tool use/function calling.

8b70b

Tools

Min RAM: 4.7 GBCtx: 8Kmedium

GroqLlama

ollama run llama3-groq-tool-use

llama-guard3

Language

375K pulls1 year ago

Llama Guard 3 is a series of models fine-tuned for content safety classification of LLM inputs and responses.

1b8b

Min RAM: 1.6 GBCtx: 128Kfast

MetaLlama

ollama run llama-guard3

starling-lm

Language

374K pulls1 year ago

Starling is a large language model trained by reinforcement learning from AI feedback focused on improving chatbot helpfulness.

Min RAM: 4.1 GBCtx: 8Kmedium

Berkeley NestOther

ollama run starling-lm

phind-codellama

Code

372K pulls2 years ago

Code generation model based on Code Llama.

34b

Min RAM: 19 GBCtx: 16Kslow

MetaCodeLlama

ollama run phind-codellama

solar

Language

372K pulls2 years ago

A compact, yet powerful 10.7B large language model designed for single-turn conversation.

10.7b

Min RAM: 6.1 GBCtx: 4Kslow

UpstageLlama

ollama run solar

xwinlm

Language

369K pulls2 years ago

Conversational model based on Llama 2 that performs competitively on various benchmarks.

7b13b

Min RAM: 3.8 GBCtx: 4Kmedium

Llama

ollama run xwinlm

aya-expanse

Language

367K pulls1 year ago

Cohere For AI's language models trained to perform well across 23 different languages.

8b32b

Tools

Min RAM: 5.1 GBCtx: 8Kmedium

CohereOther

ollama run aya-expanse

granite3-moe

Language

363K pulls1 year ago

The IBM Granite 1B and 3B models are the first mixture of experts (MoE) Granite models from IBM designed for low latency usage.

1b3b

Tools

Min RAM: 2.1 GBCtx: 4Kfast

IBMGranite

ollama run granite3-moe

yarn-llama2

Language

363K pulls2 years ago

An extension of Llama 2 that supports a context of up to 128k tokens.

7b13b

Min RAM: 3.8 GBCtx: 64Kmedium

MetaLlama

ollama run yarn-llama2

codegeex4

Code

355K pulls1 year ago

A versatile model for AI software development scenarios, including code completion.

Min RAM: 5.5 GBCtx: 128Kmedium

THUDMCodeLlama

ollama run codegeex4

mistral-openorca

Language

354K pulls2 years ago

Mistral OpenOrca is a 7 billion parameter model, fine-tuned on top of the Mistral 7B model using the OpenOrca dataset.

Min RAM: 4.1 GBCtx: 32Kmedium

Mistral AIMistral

ollama run mistral-openorca

tinydolphin

Language

353K pulls2 years ago

An experimental 1.1B parameter model trained on the new Dolphin 2.8 dataset by Eric Hartford and based on TinyLlama.

1.1b

Min RAM: 0 MB

Dolphin

ollama run tinydolphin

orca2

Reasoning

350K pulls2 years ago

Orca 2 is built by Microsoft research, and are a fine-tuned version of Meta's Llama 2 models. The model is designed to excel particularly in reasoning.

7b13b

Min RAM: 3.8 GBCtx: 4Kmedium

MicrosoftOrca

ollama run orca2

stable-beluga

Language

343K pulls2 years ago

Llama 2 based model fine tuned on an Orca-style dataset. Originally called Free Willy.

7b13b70b

Min RAM: 3.8 GBCtx: 4Kmedium

Stability AILlama

ollama run stable-beluga

qwen3-next

Language

342K pulls2 months ago

The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed.

80b

ToolsThinkingCloud

Min RAM: 50 GBCtx: 256Kslow

Alibaba CloudQwen

ollama run qwen3-next

reader-lm

Language

337K pulls1 year ago

A series of models that convert HTML content to Markdown content, which is useful for content conversion tasks.

0.5b1.5b

Min RAM: 0 MB

Jina AIOther

ollama run reader-lm

shieldgemma

Language

333K pulls1 year ago

ShieldGemma is set of instruction tuned models for evaluating the safety of text prompt input and text output responses against a set of defined safety policies.

2b9b27b

Min RAM: 1.7 GBCtx: 8Kfast

Google DeepMindGemma

ollama run shieldgemma

llama-pro

Math

329K pulls2 years ago

An expansion of Llama 2 that specializes in integrating both general language understanding and domain-specific knowledge, particularly in programming and mathematics.

Min RAM: 3.5 GBCtx: 4Kmedium

MetaLlama

ollama run llama-pro

rnj-1

Code

326K pulls2 months ago

Rnj-1 is a family of 8B parameter open-weight, dense models trained from scratch by Essential AI, optimized for code and STEM with capabilities on par with SOTA open-weight models.

ToolsCloud

Min RAM: 5.1 GBCtx: 32Kmedium

Essential AIOther

ollama run rnj-1

yarn-mistral

Language

326K pulls2 years ago

An extension of Mistral to support context windows of 64K or 128K.

Min RAM: 4.1 GBCtx: 32Kmedium

Nous ResearchMistral

ollama run yarn-mistral

nexusraven

Language

323K pulls2 years ago

Nexus Raven is a 13B instruction tuned model for function calling tasks.

13b

Min RAM: 7.4 GBCtx: 16Kslow

NexusflowLlama

ollama run nexusraven

wizardlm

General

318K pulls2 years ago

General use model based on Llama 2.

70B

Min RAM: 2.8 GBCtx: 4Kmedium

MetaWizardLM

ollama run wizardlm

meditron

Medical

303K pulls2 years ago

Open-source medical large language model adapted from Llama 2 to the medical domain.

7b70b

Min RAM: 3.8 GBCtx: 4Kmedium

EPFLLlama

ollama run meditron

glm-4.7-flash

Language

301K pulls1 month ago

As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.

ToolsThinking

Min RAM: 19 GBCtx: 198Kslow

Alibaba CloudOther

ollama run glm-4.7-flash

reflection

Reasoning

287K pulls1 year ago

A high-performing model trained with a new technique called Reflection-tuning that teaches a LLM to detect mistakes in its reasoning and correct course.

70b

Min RAM: 40 GBCtx: 128Kslow

Other

ollama run reflection

nemotron-mini

Language

280K pulls1 year ago

A commercial-friendly small language model by NVIDIA optimized for roleplay, RAG QA, and function calling.

Tools

Min RAM: 2.7 GBCtx: 4Kmedium

NVIDIAOther

ollama run nemotron-mini

granite3.2

Reasoning

273K pulls1 year ago

Granite-3.2 is a family of long-context AI models from IBM Granite fine-tuned for thinking capabilities.

2b8b

Tools

Min RAM: 1.5 GBCtx: 128Kfast

IBMGranite

ollama run granite3.2

wizardlm-uncensored

Language

269K pulls2 years ago

Uncensored version of Wizard LM model

13b

Min RAM: 7.4 GBCtx: 4Kslow

WizardLM

ollama run wizardlm-uncensored

athene-v2

Code

267K pulls1 year ago

Athene-V2 is a 72B parameter model which excels at code completion, mathematics, and log extraction tasks.

72b

Tools

Min RAM: 47 GBCtx: 32Kslow

NexusflowLlama

ollama run athene-v2

nemotron

General

265K pulls1 year ago

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.

70b

Tools

Min RAM: 43 GBCtx: 128Kslow

NVIDIALlama

ollama run nemotron

exaone3.5

Language

264K pulls1 year ago

EXAONE 3.5 is a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research.

2.4b7.8b32b

Min RAM: 1.6 GBCtx: 32Kfast

LG AI ResearchOther

ollama run exaone3.5

snowflake-arctic-embed2

Embedding

258K pulls1 year ago

Snowflake's frontier embedding model. Arctic Embed 2.0 adds multilingual support without sacrificing English performance or scalability.

568m

Embedding

Min RAM: 1.2 GBCtx: 8Kfast

SnowflakeOther

ollama run snowflake-arctic-embed2

nous-hermes2-mixtral

Language

250K pulls1 year ago

The Nous Hermes 2 model from Nous Research, now trained over Mixtral.

8x7b

Min RAM: 26 GBCtx: 32Kslow

Nous ResearchMistral

ollama run nous-hermes2-mixtral

r1-1776

Language

247K pulls1 year ago

A version of the DeepSeek-R1 model that has been post trained to provide unbiased, accurate, and factual information by Perplexity.

70b671b

Min RAM: 43 GBCtx: 128Kslow

DeepSeekDeepSeek

ollama run r1-1776

medllama2

Medical

244K pulls2 years ago

Fine-tuned Llama 2 model to answer medical questions based on an open source medical dataset.

Min RAM: 3.8 GBCtx: 4Kmedium

MetaLlama

ollama run medllama2

codeup

Code

234K pulls2 years ago

Great code generation model based on Llama2.

13b

Min RAM: 7.4 GBCtx: 4Kslow

DeepSeekLlama

ollama run codeup

everythinglm

Language

228K pulls2 years ago

Uncensored Llama2 based model with support for a 16K context window.

13b

Min RAM: 7.4 GBCtx: 16Kslow

Llama

ollama run everythinglm

mathstral

Math

224K pulls1 year ago

MathΣtral: a 7B model designed for math reasoning and scientific discovery by Mistral AI.

Min RAM: 4.1 GBCtx: 32Kmedium

Mistral AIMistral

ollama run mathstral

solar-pro

Language

222K pulls1 year ago

Solar Pro Preview: an advanced large language model (LLM) with 22 billion parameters designed to fit into a single GPU

22b

Min RAM: 13 GBCtx: 4Kslow

UpstageOther

ollama run solar-pro

magicoder

Code

218K pulls2 years ago

🎩 Magicoder is a family of 7B parameter models trained on 75K synthetic instruction data using OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets.

Min RAM: 3.8 GBCtx: 16Kmedium

UIUCCodeLlama

ollama run magicoder

falcon2

Language

216K pulls1 year ago

Falcon2 is an 11B parameters causal decoder-only model built by TII and trained over 5T tokens.

11b

Min RAM: 6.4 GBCtx: 2Kslow

TIIFalcon

ollama run falcon2

stablelm-zephyr

Language

216K pulls2 years ago

A lightweight chat model allowing accurate, and responsive output without requiring high-end hardware.

Min RAM: 1.6 GBCtx: 4Kfast

Stability AILlama

ollama run stablelm-zephyr

megadolphin

Language

215K pulls2 years ago

MegaDolphin-2.2-120b is a transformation of Dolphin-2.2-70b created by interleaving the model with itself.

120b

Min RAM: 68 GBCtx: 4Kslow

Dolphin

ollama run megadolphin

granite-embedding

Embedding

212K pulls1 year ago

The IBM Granite Embedding 30M and 278M models models are text-only dense biencoder embedding models, with 30M available in English only and 278M serving multilingual use cases.

30m278m

Embedding

Min RAM: 0 MB

IBMGranite

ollama run granite-embedding

duckdb-nsql

Code

212K pulls2 years ago

7B parameter text-to-SQL model made by MotherDuck and Numbers Station.

Min RAM: 3.8 GBCtx: 16Kmedium

MotherDuckSQLCoder

ollama run duckdb-nsql

nuextract

Language

210K pulls1 year ago

A 3.8B model fine-tuned on a private high-quality synthetic dataset for information extraction, based on Phi-3.

3.8b

Min RAM: 2.2 GBCtx: 4Kfast

NumindPhi

ollama run nuextract

mistrallite

Language

210K pulls2 years ago

MistralLite is a fine-tuned model based on Mistral with enhanced capabilities of processing long contexts.

Min RAM: 4.1 GBCtx: 32Kmedium

Mistral AIMistral

ollama run mistrallite

bespoke-minicheck

Language

209K pulls1 year ago

A state-of-the-art fact-checking model developed by Bespoke Labs.

Min RAM: 4.7 GBCtx: 32Kmedium

Bespoke LabsOther

ollama run bespoke-minicheck

tulu3

Language

208K pulls1 year ago

Tülu 3 is a leading instruction following model family, offering fully open-source data, code, and recipes by the The Allen Institute for AI.

8b70b

Min RAM: 4.9 GBCtx: 128Kmedium

The Allen Institute for AIOther

ollama run tulu3

notux

Language

206K pulls2 years ago

A top-performing mixture of experts model, fine-tuned with high-quality data.

8x7b

Min RAM: 26 GBCtx: 32Kslow

ArgillaOther

ollama run notux

notus

Language

206K pulls2 years ago

A 7B chat model fine-tuned with high-quality data and based on Zephyr.

Min RAM: 4.1 GBCtx: 32Kmedium

ArgillaLlama

ollama run notus

deepseek-ocr

Vision

206K pulls3 months ago

DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.

Vision

Min RAM: 6.7 GBCtx: 8Kslow

DeepSeekDeepSeek

ollama run deepseek-ocr

wizard-vicuna

Language

205K pulls2 years ago

Wizard Vicuna is a 13B parameter model based on Llama 2 trained by MelodysDreamj.

13b

Min RAM: 7.4 GBCtx: 2Kslow

MelodysDreamjVicuna

ollama run wizard-vicuna

firefunction-v2

General

202K pulls1 year ago

An open weights function calling model based on Llama 3, competitive with GPT-4o function calling capabilities.

70B

Tools

Min RAM: 40 GBCtx: 8Kslow

fireworks-aiLlama

ollama run firefunction-v2

codebooga

Code

201K pulls2 years ago

A high-performing code instruct model created by merging two existing code models.

34b

Min RAM: 19 GBCtx: 16Kslow

oobaboogaCodeLlama

ollama run codebooga

bge-large

Embedding

200K pulls1 year ago

Embedding model from BAAI mapping texts to vectors.

335m

Embedding

Min RAM: 0 MB

BAAIOther

ollama run bge-large

llava-phi3

Multimodal

199K pulls1 year ago

A new small LLaVA model fine-tuned from Phi 3 Mini.

3.8b

Vision

Min RAM: 2.9 GBCtx: 4Kmedium

LLaVA

ollama run llava-phi3

open-orca-platypus2

Code

198K pulls2 years ago

Merge of the Open Orca OpenChat model and the Garage-bAInd Platypus 2 model. Designed for chat and code generation.

13b

Min RAM: 7.4 GBCtx: 4Kslow

Open-OrcaOrca

ollama run open-orca-platypus2

dbrx

General

192K pulls1 year ago

DBRX is an open, general-purpose LLM created by Databricks.

132b

Min RAM: 74 GBCtx: 32Kslow

DatabricksOther

ollama run dbrx

goliath

Language

187K pulls2 years ago

A language model created by combining two fine-tuned Llama 2 70B models into one.

Min RAM: 50 GBCtx: 4Kslow

Llama

ollama run goliath

nemotron-3-nano

Language

178K pulls2 months ago

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

30b

ToolsThinkingCloud

Min RAM: 24 GBCtx: 1Mslow

MetaOther

ollama run nemotron-3-nano

sailor2

Language

167K pulls1 year ago

Sailor2 are multilingual language models made for South-East Asia. Available in 1B, 8B, and 20B parameter sizes.

1b8b20b

Min RAM: 1.1 GBCtx: 32Kfast

Other

ollama run sailor2

olmo-3

Language

162K pulls2 months ago

Olmo is a series of Open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.

7b32b

Min RAM: 4.5 GBCtx: 64Kmedium

Other

ollama run olmo-3

devstral-small-2

Code

161K pulls2 months ago

24B model that excels at using tools to explore codebases, editing multiple files and power software engineering agents.

24b

VisionToolsCloud

Min RAM: 15 GBCtx: 384Kslow

DeepSeekCodeLlama

ollama run devstral-small-2

smallthinker

Reasoning

156K pulls1 year ago

A new small reasoning model fine-tuned from the Qwen 2.5 3B Instruct model.

Min RAM: 3.6 GBCtx: 32Kmedium

Alibaba CloudQwen

ollama run smallthinker

command-r7b

Language

155K pulls1 year ago

The smallest model in Cohere's R series delivers top-tier speed, efficiency, and quality to build powerful AI applications on commodity GPUs and edge devices.

Tools

Min RAM: 5.1 GBCtx: 8Kmedium

CohereCommand

ollama run command-r7b

phi4-mini-reasoning

Reasoning

153K pulls10 months ago

Phi 4 mini reasoning is a lightweight open model that balances efficiency with advanced reasoning ability.

3.8b

Min RAM: 3.2 GBCtx: 128Kmedium

MicrosoftPhi

ollama run phi4-mini-reasoning

deepseek-v2.5

Code

153K pulls1 year ago

An upgraded version of DeekSeek-V2 that integrates the general and coding abilities of both DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.

236b

Min RAM: 133 GBCtx: 4Kslow

DeepSeekDeepSeek

ollama run deepseek-v2.5

granite3-guardian

General

147K pulls1 year ago

The IBM Granite Guardian 3.0 2B and 8B models are designed to detect risks in prompts and/or responses.

2b8b

Min RAM: 2.7 GBCtx: 8Kmedium

IBMGranite

ollama run granite3-guardian

command-a

General

130K pulls11 months ago

111 billion parameter model optimized for demanding enterprises that require fast, secure, and high-quality AI

111b

Tools

Min RAM: 67 GBCtx: 16Kslow

Command

ollama run command-a

marco-o1

Reasoning

119K pulls1 year ago

An open large reasoning model for real-world solutions by the Alibaba International Digital Commerce Group (AIDC-AI).

Min RAM: 4.7 GBCtx: 32Kmedium

Alibaba CloudQwen

ollama run marco-o1

alfred

Language

110K pulls2 years ago

A robust conversational model designed to be used for both chat and instruct use cases.

40b

Min RAM: 24 GBCtx: 2Kslow

LightOn AILlama

ollama run alfred

olmo-3.1

Language

98K pulls2 months ago

Olmo is a series of Open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.

32b

Tools

Min RAM: 19 GBCtx: 64Kslow

Nomic AIOther

ollama run olmo-3.1

kimi-k2.5

Multimodal

91K pulls1 month ago

Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.

VisionToolsThinkingCloud

Min RAM: 0 MB

Other

ollama run kimi-k2.5

devstral-2

Code

90K pulls2 months ago

123B model that excels at using tools to explore codebases, editing multiple files and power software engineering agents.

123b

ToolsCloud

Min RAM: 75 GBCtx: 256Kslow

DeepSeekCodeLlama

ollama run devstral-2

command-r7b-arabic

Language

86K pulls12 months ago

A new state-of-the-art version of the lightweight Command R7B model that excels in advanced Arabic language capabilities for enterprises in the Middle East and Northern Africa.

Tools

Min RAM: 5.1 GBCtx: 16Kmedium

CohereCommand

ollama run command-r7b-arabic

cogito-2.1

Language

74K pulls3 months ago

The Cogito v2.1 LLMs are instruction tuned generative models. All models are released under MIT license for commercial use.

671b

Cloud

Min RAM: 0 MB

Other

ollama run cogito-2.1

gpt-oss-safeguard

Reasoning

73K pulls4 months ago

gpt-oss-safeguard-20b and gpt-oss-safeguard-120b are safety reasoning models built-upon gpt-oss

20b120b

ToolsThinking

Min RAM: 14 GBCtx: 128Kslow

Llama

ollama run gpt-oss-safeguard

functiongemma

Language

72K pulls2 months ago

FunctionGemma is a specialized version of Google's Gemma 3 270M model fine-tuned explicitly for function calling.

270m

Tools

Min RAM: 0 MB

Google DeepMindGemma

ollama run functiongemma

glm-4.6

Code

71K pulls4 months ago

Advanced agentic, reasoning and coding capabilities.

ToolsThinkingCloud

Min RAM: 0 MB

Alibaba CloudOther

ollama run glm-4.6

gemini-3-flash-preview

General

66K pulls2 months ago

Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.

VisionToolsThinkingCloud

Min RAM: 0 MB

Google DeepMindGemma

ollama run gemini-3-flash-preview

minimax-m2

Code

64K pulls4 months ago

MiniMax M2 is a high-efficiency large language model built for coding and agentic workflows.

ToolsThinkingCloud

Min RAM: 0 MB

Other

ollama run minimax-m2

glm-5

Reasoning

58K pulls2 weeks ago

A strong reasoning and agentic model from Z.ai with 744B total parameters (40B active), built for complex systems engineering and long-horizon tasks.

744B

ToolsThinkingCloud

Min RAM: 0 MB

Z.aiOther

ollama run glm-5

glm-4.7

Code

53K pulls2 months ago

Advancing the Coding Capability

ToolsThinkingCloud

Min RAM: 0 MB

Alibaba CloudCodeLlama

ollama run glm-4.7

minimax-m2.5

Code

50K pulls2 weeks ago

MiniMax-M2.5 is a state-of-the-art large language model designed for real-world productivity and coding tasks.

ToolsThinkingCloud

Min RAM: 0 MB

Other

ollama run minimax-m2.5

qwen3.5

Multimodal

48K pulls19 hours ago

Qwen 3.5 is a family of open-source multimodal models that delivers exceptional utility and performance.

35b122b

VisionToolsThinkingCloud

Min RAM: 24 GBCtx: 256Kslow

Alibaba CloudQwen

ollama run qwen3.5

glm-ocr

Multimodal

46K pulls3 weeks ago

GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture.

VisionTools

Min RAM: 1.6 GBCtx: 128Kfast

Alibaba CloudOther

ollama run glm-ocr

nomic-embed-text-v2-moe

Embedding

43K pulls2 months ago

nomic-embed-text-v2-moe is a multilingual MoE text embedding model that excels at multilingual retrieval.

Embedding

Min RAM: 0 MB

Nomic AINomic

ollama run nomic-embed-text-v2-moe

kimi-k2

General

40K pulls5 months ago

A state-of-the-art mixture-of-experts (MoE) language model. Kimi K2-Instruct-0905 demonstrates significant improvements in performance on public benchmarks and real-world coding agent tasks.

ToolsCloud

Min RAM: 0 MB

Other

ollama run kimi-k2

deepseek-v3.2

Reasoning

35K pulls2 months ago

DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance.

ToolsThinkingCloud

Min RAM: 0 MB

DeepSeekDeepSeek

ollama run deepseek-v3.2

kimi-k2-thinking

Language

32K pulls3 months ago

Kimi K2 Thinking, Moonshot AI's best open-source thinking model.

ToolsThinkingCloud

Min RAM: 0 MB

Moonshot AIOther

ollama run kimi-k2-thinking

mistral-large-3

Multimodal

20K pulls2 months ago

A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.

VisionToolsCloud

Min RAM: 0 MB

Mistral AIMistral

ollama run mistral-large-3

minimax-m2.1

Code

20K pulls2 months ago

Exceptional multilingual capabilities to elevate code engineering

ToolsCloud

Min RAM: 0 MB

DeepSeekCodeLlama

ollama run minimax-m2.1