Models Directory
Browse all 51 available language models and their capabilities
Free Models (19)
These models are available to all users without any subscription or pay-as-you-go charges.
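Every entry below is keyed by a `provider/model` slug. As a minimal sketch of that convention (the helper below is purely illustrative and not part of any provider API; it assumes the first `/` separates provider from model name):

```python
def parse_model_id(model_id: str) -> tuple[str, str]:
    """Split a 'provider/model' slug into its two parts.

    Illustrative only: assumes the first '/' separates the provider
    prefix from the model name, as in the listings below.
    """
    provider, _, name = model_id.partition("/")
    if not name:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, name

# Example with a slug from this directory:
print(parse_model_id("mistralai/mistral-small-3.2-24b-instruct"))
# → ('mistralai', 'mistral-small-3.2-24b-instruct')
```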
mistralai/mistral-small-3.2-24b-instruct
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...
Context: 128000 tokens
Max output: N/A tokens
z-ai/glm-4-32b
GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It...
Context: 128000 tokens
Max output: N/A tokens
baidu/ernie-4.5-21b-a3b
A sophisticated text-based Mixture-of-Experts (MoE) model featuring 21B total parameters with 3B activated per token, delivering exceptional multimodal understanding and generation through heterogeneous MoE structures and modality-isolated routing. Supporting an...
Context: 120000 tokens
Max output: 8000 tokens
mistralai/ministral-3b-2512
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
Context: 131072 tokens
Max output: N/A tokens
mistralai/ministral-8b-2512
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
Context: 262144 tokens
Max output: N/A tokens
ibm-granite/granite-4.0-h-micro
Granite-4.0-H-Micro is a 3B parameter model from the Granite 4 family, the latest series released by IBM. These models are fine-tuned for long...
Context: 131000 tokens
Max output: N/A tokens
nousresearch/hermes-4-70b
Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...
Context: 131072 tokens
Max output: N/A tokens
openai/gpt-5-nano
GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger...
Context: 400000 tokens
Max output: 128000 tokens
openai/gpt-oss-20b
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...
Context: 131072 tokens
Max output: 131072 tokens
google/gemini-2.5-flash-lite
Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...
Context: 1048576 tokens
Max output: 65535 tokens
meta-llama/llama-4-scout
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input...
Context: 327680 tokens
Max output: 16384 tokens
nvidia/nemotron-nano-9b-v2
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...
Context: 131072 tokens
Max output: N/A tokens
qwen/qwen3-30b-a3b
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique...
Context: 40960 tokens
Max output: 40960 tokens
qwen/qwen3-8b
Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math,...
Context: 40960 tokens
Max output: 8192 tokens
qwen/qwen3-14b
Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...
Context: 40960 tokens
Max output: 40960 tokens
qwen/qwen3-32b
Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...
Context: 40960 tokens
Max output: 40960 tokens
google/gemma-3-4b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Context: 131072 tokens
Max output: N/A tokens
google/gemma-3-12b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Context: 131072 tokens
Max output: N/A tokens
essentialai/rnj-1-instruct
Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming, math, and scientific reasoning. The model demonstrates strong performance...
Context: 32768 tokens
Max output: N/A tokens
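Each listing above reports a context window and, where known, a max output. When budgeting a request, the prompt plus the reserved completion must fit inside the context window. A rough sketch of that check (the helper is our own, not a provider API; real tokenizers may add special tokens, so treat it as an upper bound):

```python
def fits_context(prompt_tokens: int, max_completion: int, context_window: int) -> bool:
    """Return True if the prompt plus reserved completion fit in the window.

    Illustrative only: providers may count special tokens differently.
    """
    return prompt_tokens + max_completion <= context_window

# qwen/qwen3-8b: 40960-token context, 8192-token max output
print(fits_context(30000, 8192, 40960))   # True:  38192 <= 40960
print(fits_context(35000, 8192, 40960))   # False: 43192 >  40960
```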
Pro Models (20)
These models are available to Pro subscribers with unlimited usage included in the subscription.
z-ai/glm-4.5-air
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter...
Context: 131072 tokens
Max output: 98304 tokens
mistralai/mistral-small-creative
Mistral Small Creative is an experimental small model designed for creative writing, narrative generation, roleplay and character-driven dialogue, general-purpose instruction following, and conversational agents.
Context: 32768 tokens
Max output: N/A tokens
mistralai/ministral-14b-2512
The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language...
Context: 262144 tokens
Max output: N/A tokens
x-ai/grok-4-fast
Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read more about the model...
Context: 2000000 tokens
Max output: 30000 tokens
x-ai/grok-4.1-fast
Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window. Reasoning can be enabled/disabled using...
Context: 2000000 tokens
Max output: 30000 tokens
minimax/minimax-m2
MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,...
Context: 196608 tokens
Max output: 196608 tokens
minimax/minimax-m2.1
MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...
Context: 196608 tokens
Max output: N/A tokens
openai/gpt-oss-120b
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...
Context: 131072 tokens
Max output: N/A tokens
google/gemma-3-27b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Context: 131072 tokens
Max output: 16384 tokens
meta-llama/llama-4-maverick
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...
Context: 1048576 tokens
Max output: 16384 tokens
meta-llama/llama-3.3-70b-instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned text-only model...
Context: 131072 tokens
Max output: 16384 tokens
deepseek/deepseek-chat-v3-0324
DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 model and performs really well...
Context: 163840 tokens
Max output: N/A tokens
deepseek/deepseek-chat-v3.1
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context...
Context: 32768 tokens
Max output: 7168 tokens
deepseek/deepseek-v3.2
DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism...
Context: 163840 tokens
Max output: N/A tokens
minimax/minimax-m2-her
MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message...
Context: 65536 tokens
Max output: 2048 tokens
nvidia/llama-3.3-nemotron-super-49b-v1.5
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...
Context: 131072 tokens
Max output: N/A tokens
nousresearch/hermes-4-405b
Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...
Context: 131072 tokens
Max output: N/A tokens
qwen/qwen3-next-80b-a3b-instruct
Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...
Context: 262144 tokens
Max output: N/A tokens
qwen/qwen3-235b-a22b-2507
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,...
Context: 262144 tokens
Max output: N/A tokens
qwen/qwen3-235b-a22b
Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and...
Context: 131072 tokens
Max output: 8192 tokens
Pro Metered Models (12)
These premium models are available on a pay-as-you-go basis with per-token pricing.
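Per-token prices make totals easy to compute: multiply the input price by prompt tokens and the output price by completion tokens, then sum. A small sketch using the openai/gpt-5.1-chat rates listed below (the function itself is illustrative, not part of any billing API):

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Pay-as-you-go cost in dollars for a single request.

    Illustrative helper: prices are per token, as quoted in the
    entries below.
    """
    return prompt_tokens * in_price + completion_tokens * out_price

# openai/gpt-5.1-chat: $0.00000125 per input token, $0.00001 per output token
cost = request_cost(10_000, 2_000, 0.00000125, 0.00001)
print(f"${cost:.4f}")  # $0.0325
```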
openai/gpt-5.1-chat
Input: $0.00000125 per token ($1.25 per 1M tokens)
Output: $0.00001 per token ($10.00 per 1M tokens)
GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on...
Context: 128000 tokens
Max output: 16384 tokens
✓ Moderated
z-ai/glm-4.6
Input: $0.00000039 per token ($0.39 per 1M tokens)
Output: $0.0000019 per token ($1.90 per 1M tokens)
Compared with GLM-4.5, this generation brings several key improvements. Longer context window: the context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex...
Context: 204800 tokens
Max output: 204800 tokens
✗ Unmoderated
mistralai/mistral-large-2512
Input: $0.0000005 per token ($0.50 per 1M tokens)
Output: $0.0000015 per token ($1.50 per 1M tokens)
Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license.
Context: 262144 tokens
Max output: N/A tokens
✗ Unmoderated
anthropic/claude-sonnet-4.5
Input: $0.000003 per token ($3.00 per 1M tokens)
Output: $0.000015 per token ($15.00 per 1M tokens)
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...
Context: 1000000 tokens
Max output: 64000 tokens
✗ Unmoderated
anthropic/claude-haiku-4.5
Input: $0.000001 per token ($1.00 per 1M tokens)
Output: $0.000005 per token ($5.00 per 1M tokens)
Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance...
Context: 200000 tokens
Max output: 64000 tokens
✓ Moderated
google/gemini-2.5-pro
Input: $0.00000125 per token ($1.25 per 1M tokens)
Output: $0.00001 per token ($10.00 per 1M tokens)
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...
Context: 1048576 tokens
Max output: 65536 tokens
✗ Unmoderated
google/gemini-2.5-flash
Input: $0.0000003 per token ($0.30 per 1M tokens)
Output: $0.0000025 per token ($2.50 per 1M tokens)
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...
Context: 1048576 tokens
Max output: 65535 tokens
✗ Unmoderated
google/gemini-3-flash-preview
Input: $0.0000005 per token ($0.50 per 1M tokens)
Output: $0.000003 per token ($3.00 per 1M tokens)
Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool...
Context: 1048576 tokens
Max output: 65536 tokens
✗ Unmoderated
amazon/nova-premier-v1
Input: $0.0000025 per token ($2.50 per 1M tokens)
Output: $0.0000125 per token ($12.50 per 1M tokens)
Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.
Context: 1000000 tokens
Max output: 32000 tokens
✓ Moderated
mistralai/mistral-medium-3.1
Input: $0.0000004 per token ($0.40 per 1M tokens)
Output: $0.000002 per token ($2.00 per 1M tokens)
Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances...
Context: 131072 tokens
Max output: N/A tokens
✗ Unmoderated
deepseek/deepseek-r1-0528
Input: $0.00000045 per token ($0.45 per 1M tokens)
Output: $0.00000215 per token ($2.15 per 1M tokens)
May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...
Context: 163840 tokens
Max output: 65536 tokens
✗ Unmoderated
moonshotai/kimi-k2-0905
Input: $0.0000004 per token ($0.40 per 1M tokens)
Output: $0.000002 per token ($2.00 per 1M tokens)
Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...
Context: 131072 tokens
Max output: N/A tokens
✗ Unmoderated