OpenCharacter
ChatsCommunityReferral

Models Directory

Browse all 75 available language models and their capabilities

Free Models23

These models are available to all users without any subscription or pay-as-you-go charges.

liquid/lfm-7b

LFM-7B, a new best-in-class language model. LFM-7B is designed for exceptional chat capabilities, including languages like Arabic and Japanese. Powered by the Liquid Foundation Model (LFM) architecture, it exhibits unique features like low memory footprint and fast inference speed.

LFM-7B is the world’s best-in-class multilingual language model in English, Arabic, and Japanese.

See the launch announcement for benchmarks and more info.

Context: 32768 tokens

Max output: N/A tokens

liquid/lfm-3b

Liquid's LFM 3B delivers incredible performance for its size. It positions itself as first place among 3B parameter transformers, hybrids, and RNN models It is also on par with Phi-3.5-mini on multiple benchmarks, while being 18.4% smaller.

LFM-3B is the ideal choice for mobile and other edge text-based applications.

See the launch announcement for benchmarks and more info.

Context: 32768 tokens

Max output: N/A tokens

mistralai/ministral-3b

Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference.

Context: 128000 tokens

Max output: N/A tokens

mistralai/ministral-8b

Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications.

Context: 128000 tokens

Max output: N/A tokens

gryphe/mythomax-l2-13b

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

Context: 4096 tokens

Max output: 4096 tokens

amazon/nova-micro-v1

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities.

Context: 128000 tokens

Max output: 5120 tokens

microsoft/phi-4

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed.

At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs.

For more information, please see Phi-4 Technical Report

Context: 16384 tokens

Max output: 8192 tokens

microsoft/wizardlm-2-7b

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves comparable performance with existing 10x larger opensource leading models

It is a finetune of Mistral 7B Instruct, using the same technique as WizardLM-2 8x22B.

To read more about the model release, click here.

#moe

Context: 32000 tokens

Max output: N/A tokens

google/gemini-flash-1.5-8b

Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results.

Click here to learn more about this model.

Usage of Gemini is subject to Google's Gemini Terms of Use.

Context: 1000000 tokens

Max output: 8192 tokens

mistralai/mistral-7b-instruct

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.

Context: 32768 tokens

Max output: 8192 tokens

google/gemma-2-9b-it

Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.

Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.

See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.

Context: 8192 tokens

Max output: 8192 tokens

meta-llama/llama-3.2-3b-instruct

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.

Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.

Click here for the original model card.

Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 131000 tokens

Max output: 131000 tokens

meta-llama/llama-3.2-1b-instruct

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance.

Supporting eight core languages and fine-tunable for more, Llama 1.3B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models.

Click here for the original model card.

Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 131072 tokens

Max output: N/A tokens

meta-llama/llama-3.1-8b-instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 131072 tokens

Max output: 8192 tokens

qwen/qwen-2-7b-instruct

Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.

For more details, see this blog post and GitHub repo.

Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.

Context: 32768 tokens

Max output: N/A tokens

mistralai/mistral-7b-instruct-v0.3

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

An improved version of Mistral 7B Instruct v0.2, with the following changes:

  • Extended vocabulary to 32768
  • Supports v3 Tokenizer
  • Supports function calling

NOTE: Support for function calling depends on the provider.

Context: 32768 tokens

Max output: 8192 tokens

meta-llama/llama-3-8b-instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 8192 tokens

Max output: 8192 tokens

mistralai/mistral-nemo

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.

The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

It supports function calling and is released under the Apache 2.0 license.

Context: 131072 tokens

Max output: 8192 tokens

sao10k/l3-lunaris-8b

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge.

Created by Sao10k, this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning.

For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1.

Context: 8192 tokens

Max output: 8192 tokens

nousresearch/hermes-2-pro-llama-3-8b

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Context: 131000 tokens

Max output: 131000 tokens

openchat/openchat-7b

OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.

  • For OpenChat fine-tuned on Mistral 7B, check out OpenChat 7B.
  • For OpenChat fine-tuned on Llama 8B, check out OpenChat 8B.

#open-source

Context: 8192 tokens

Max output: 8192 tokens

undi95/toppy-m-7b:nitro

amazon/nova-lite-v1

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy.

With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input.

Context: 300000 tokens

Max output: 5120 tokens

Pro Models31

These models are available to Pro subscribers with unlimited usage included in the subscription.

thedrummer/unslopnemo-12b

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.

Context: 32000 tokens

Max output: N/A tokens

meta-llama/llama-3.1-70b-instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 131072 tokens

Max output: 8192 tokens

nousresearch/hermes-3-llama-3.1-70b

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

Hermes 3 70B is a competitive, if not superior finetune of the Llama-3.1 70B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.

The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

Context: 131000 tokens

Max output: 131000 tokens

deepseek/deepseek-chat

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.

For model details, please visit the DeepSeek-V3 repo for more information, or see the launch announcement.

Context: 163840 tokens

Max output: 163840 tokens

microsoft/phi-3.5-mini-128k-instruct

Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.

The models underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters.

Context: 128000 tokens

Max output: N/A tokens

ai21/jamba-1-5-mini

Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.

It works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.

This model uses less computer memory and works faster with longer texts than previous designs.

Read their announcement to learn more.

Context: 256000 tokens

Max output: 4096 tokens

mistralai/codestral-mamba

A 7.3B parameter Mamba-based model designed for code and reasoning tasks.

  • Linear time inference, allowing for theoretically infinite sequence lengths
  • 256k token context window
  • Optimized for quick responses, especially beneficial for code productivity
  • Performs comparably to state-of-the-art transformer models in code and reasoning tasks
  • Available under the Apache 2.0 license for free use, modification, and distribution

Context: 256000 tokens

Max output: N/A tokens

openai/gpt-4o-mini

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs.

As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than GPT-3.5 Turbo. It maintains SOTA intelligence, while being significantly more cost-effective.

GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences common leaderboards.

Check out the launch announcement to learn more.

#multimodal

Context: 128000 tokens

Max output: 16384 tokens

anthropic/claude-3-haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance.

See the launch announcement and benchmark results here

#multimodal

Context: 200000 tokens

Max output: 4096 tokens

cognitivecomputations/dolphin-mixtral-8x22b

Dolphin 2.9 is designed for instruction following, conversational, and coding. This model is a finetune of Mixtral 8x22B Instruct. It features a 64k context length and was fine-tuned with a 16k sequence length using ChatML templates.

This model is a successor to Dolphin Mixtral 8x7B.

The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at erichartford.com/uncensored-models.

#moe #uncensored

Context: 16000 tokens

Max output: N/A tokens

google/gemma-2-27b-it

Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models.

Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.

See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.

Context: 8192 tokens

Max output: 8192 tokens

mistralai/mixtral-8x7b-instruct

Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.

Instruct model fine-tuned by Mistral. #moe

Context: 32768 tokens

Max output: 8192 tokens

mistralai/mistral-small-24b-instruct-2501

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.

The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. Read the blog post about the model here.

Context: 32768 tokens

Max output: 8192 tokens

gryphe/mythomist-7b

anthropic/claude-instant-1:beta

nvidia/llama-3.1-nemotron-70b-instruct

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains.

Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 131000 tokens

Max output: 131000 tokens

thedrummer/rocinante-12b

Rocinante 12B is designed for engaging storytelling and rich prose.

Early testers have reported:

  • Expanded vocabulary with unique and expressive word choices
  • Enhanced creativity for vivid narratives
  • Adventure-filled and captivating stories

Context: 32768 tokens

Max output: N/A tokens

eva-unit-01/eva-qwen-2.5-14b

mistralai/mistral-tiny

This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than Mistral 7B, inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.

Context: 32000 tokens

Max output: N/A tokens

mistralai/mistral-small

With 22 billion parameters, Mistral Small v24.09 offers a convenient mid-point between (Mistral NeMo 12B)[/mistralai/mistral-nemo] and (Mistral Large 2)[/mistralai/mistral-large], providing a cost-effective solution that can be deployed across various platforms and environments. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multiligual, supporting English, French, German, Italian, and Spanish.

Context: 32000 tokens

Max output: N/A tokens

qwen/qwen-turbo

Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks.

Context: 1000000 tokens

Max output: 8192 tokens

qwen/qwen-plus

Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.

Context: 131072 tokens

Max output: 8192 tokens

deepseek/deepseek-r1-distill-qwen-1.5b

DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on Qwen 2.5 Math 1.5B, using outputs from DeepSeek R1. It's a very small and efficient model which outperforms GPT 4o 0513 on Math Benchmarks.

Other benchmark results include:

  • AIME 2024 pass@1: 28.9
  • AIME 2024 cons@64: 52.7
  • MATH-500 pass@1: 83.9

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Context: 131072 tokens

Max output: 32768 tokens

deepseek/deepseek-r1-distill-qwen-32b

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

Other benchmark results include:

  • AIME 2024 pass@1: 72.6
  • MATH-500 pass@1: 94.3
  • CodeForces Rating: 1691

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Context: 131072 tokens

Max output: 8192 tokens

deepseek/deepseek-r1-distill-llama-70b

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:

  • AIME 2024 pass@1: 70.0
  • MATH-500 pass@1: 94.5
  • CodeForces Rating: 1633

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Context: 131072 tokens

Max output: 8192 tokens

qwen/qvq-72b-preview

qwen/qwq-32b-preview

QwQ-32B-Preview is an experimental research model focused on AI reasoning capabilities developed by the Qwen Team. As a preview release, it demonstrates promising analytical abilities while having several important limitations:

  1. Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, affecting response clarity.
  2. Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.
  3. Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
  4. Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.

Context: 32768 tokens

Max output: N/A tokens

qwen/qwen-2.5-coder-32b-instruct

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

  • Significantly improvements in code generation, code reasoning and code fixing.
  • A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.

To read more about its evaluation results, check out Qwen 2.5 Coder's blog.

Context: 33000 tokens

Max output: 3000 tokens

mistralai/codestral-2501

Mistral's cutting-edge language model for coding. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation.

Learn more on their blog post: https://mistral.ai/news/codestral-2501/

Context: 256000 tokens

Max output: N/A tokens

meta-llama/llama-3.3-70b-instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.

Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Model Card

Context: 131072 tokens

Max output: N/A tokens

deepseek/deepseek-r1-distill-llama-3.1-70b

Pro Metered Models21

These premium models are available on a pay-as-you-go basis with per-token pricing.

anthropic/claude-3.7-sonnet

Input: $0.000003 per token

Output: $0.000015 per token

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes.

Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.

Read more at the blog post here

Context: 200000 tokens

Max output: 128000 tokens

✓ Moderated

anthropic/claude-3.7-sonnet:thinking

Input: $0.000003 per token

Output: $0.000015 per token

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes.

Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.

Read more at the blog post here

Context: 200000 tokens

Max output: 128000 tokens

✓ Moderated

deepseek/deepseek-r1

Input: $0.00000055 per token

Output: $0.00000219 per token

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.

Fully open-source model & technical report.

MIT licensed: Distill & commercialize freely!

Context: 163840 tokens

Max output: 163840 tokens

✗ Unmoderated

openai/gpt-4o-2024-11-20

Input: $0.0000025 per token

Output: $0.00001 per token

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses.

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

Context: 128000 tokens

Max output: 16384 tokens

✓ Moderated

openai/o3-mini-high

Input: $0.0000011 per token

Output: $0.0000044 per token

OpenAI o3-mini-high is the same model as o3-mini with reasoning_effort set to high.

o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.

The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost.

Context: 200000 tokens

Max output: 100000 tokens

✓ Moderated

allenai/llama-3.1-tulu-3-405b

Input: $0.000005 per token

Output: $0.00001 per token

Tülu 3 405B is the largest model in the Tülu 3 family, applying fully open post-training recipes at a 405B parameter scale. Built on the Llama 3.1 405B base, it leverages Reinforcement Learning with Verifiable Rewards (RLVR) to enhance instruction following, MATH, GSM8K, and IFEval performance. As part of Tülu 3’s fully open-source approach, it offers state-of-the-art capabilities while surpassing prior open-weight models like Llama 3.1 405B Instruct and Nous Hermes 3 405B on multiple benchmarks. To read more, click here.

Context: 16384 tokens

Max output: 4096 tokens

✗ Unmoderated

aion-labs/aion-1.0

Input: $0.000004 per token

Output: $0.000008 per token

Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model.

Context: 32768 tokens

Max output: 32768 tokens

✗ Unmoderated

qwen/qwen-max

Input: $0.0000016 per token

Output: $0.0000064 per token

Qwen-Max, based on Qwen2.5, provides the best inference performance among Qwen models, especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. The parameter count is unknown.

Context: 32768 tokens

Max output: 8192 tokens

✗ Unmoderated

openai/o1

Input: $0.000015 per token

Output: $0.00006 per token

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.

The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the launch announcement.

Context: 200000 tokens

Max output: 100000 tokens

✓ Moderated

x-ai/grok-2-1212

Input: $0.000002 per token

Output: $0.00001 per token

Grok 2 1212 introduces significant enhancements to accuracy, instruction adherence, and multilingual support, making it a powerful and flexible choice for developers seeking a highly steerable, intelligent model.

Context: 131072 tokens

Max output: N/A tokens

✗ Unmoderated

mistralai/mistral-large-2411

Input: $0.000002 per token

Output: $0.000006 per token

Mistral Large 2 2411 is an update of Mistral Large 2 released together with Pixtral Large 2411

It provides a significant upgrade on the previous Mistral Large 24.07, with notable improvements in long context understanding, a new system prompt, and more accurate function calling.

Context: 128000 tokens

Max output: N/A tokens

✗ Unmoderated

neversleep/llama-3.1-lumimaid-70b

Input: $0.000003375 per token

Output: $0.0000045 per token

Lumimaid v0.2 70B is a finetune of Llama 3.1 70B with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chats output were purged.

Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 16384 tokens

Max output: 2048 tokens

✗ Unmoderated

x-ai/grok-beta

Input: $0.000005 per token

Output: $0.000015 per token

Grok Beta is xAI's experimental language model with state-of-the-art reasoning capabilities, best for complex and multi-step use cases.

It is the successor of Grok 2 with enhanced context length.

Context: 131072 tokens

Max output: N/A tokens

✗ Unmoderated

inflection/inflection-3-pi

Input: $0.0000025 per token

Output: $0.00001 per token

Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay.

Pi has been trained to mirror your tone and style, if you use more emojis, so will Pi! Try experimenting with various prompts and conversation styles.

Context: 8000 tokens

Max output: 1024 tokens

✗ Unmoderated

cohere/command-r-plus-08-2024

Input: $0.000002375 per token

Output: $0.0000095 per token

command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same.

Read the launch post here.

Use of this model is subject to Cohere's Usage Policy and SaaS Agreement.

Context: 128000 tokens

Max output: 4000 tokens

✗ Unmoderated

ai21/jamba-1-5-large

Input: $0.000002 per token

Output: $0.000008 per token

Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.

It features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.

Built on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.

Read their announcement to learn more.

Context: 256000 tokens

Max output: 4096 tokens

✗ Unmoderated

01-ai/yi-large

Input: $0.000003 per token

Output: $0.000003 per token

The Yi Large model was designed by 01.AI with the following usecases in mind: knowledge search, data classification, human-like chat bots, and customer service.

It stands out for its multilingual proficiency, particularly in Spanish, Chinese, Japanese, German, and French.

Check out the launch announcement to learn more.

Context: 32768 tokens

Max output: 4096 tokens

✗ Unmoderated

neversleep/llama-3-lumimaid-70b

Input: $0.000003375 per token

Output: $0.0000045 per token

The NeverSleep team is back, with a Llama 3 70B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary.

To enhance it's overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength.

Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 8192 tokens

Max output: 2048 tokens

✗ Unmoderated

anthropic/claude-3-opus

Input: $0.000015 per token

Output: $0.000075 per token

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding.

See the launch announcement and benchmark results here

#multimodal

Context: 200000 tokens

Max output: 4096 tokens

✓ Moderated

anthropic/claude-3-sonnet

Input: $0.000003 per token

Output: $0.000015 per token

Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments.

See the launch announcement and benchmark results here

#multimodal

Context: 200000 tokens

Max output: 4096 tokens

✓ Moderated

alpindale/goliath-120b

Input: $0.000009375 per token

Output: $0.000009375 per token

A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale.

Credits to

#merge

Context: 6144 tokens

Max output: 512 tokens

✗ Unmoderated