Models Directory
Browse all 75 available language models and their capabilities
Free Models23
These models are available to all users without any subscription or pay-as-you-go charges.
liquid/lfm-7b
LFM-7B, a new best-in-class language model. LFM-7B is designed for exceptional chat capabilities, including languages like Arabic and Japanese. Powered by the Liquid Foundation Model (LFM) architecture, it exhibits unique features like low memory footprint and fast inference speed.
LFM-7B is the world’s best-in-class multilingual language model in English, Arabic, and Japanese.
See the launch announcement for benchmarks and more info.
Context: 32768 tokens
Max output: N/A tokens
liquid/lfm-3b
Liquid's LFM 3B delivers incredible performance for its size. It positions itself as first place among 3B parameter transformers, hybrids, and RNN models It is also on par with Phi-3.5-mini on multiple benchmarks, while being 18.4% smaller.
LFM-3B is the ideal choice for mobile and other edge text-based applications.
See the launch announcement for benchmarks and more info.
Context: 32768 tokens
Max output: N/A tokens
mistralai/ministral-3b
Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference.
Context: 128000 tokens
Max output: N/A tokens
mistralai/ministral-8b
Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications.
Context: 128000 tokens
Max output: N/A tokens
gryphe/mythomax-l2-13b
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
Context: 4096 tokens
Max output: 4096 tokens
amazon/nova-micro-v1
Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities.
Context: 128000 tokens
Max output: 5120 tokens
microsoft/phi-4
Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed.
At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs.
For more information, please see Phi-4 Technical Report
Context: 16384 tokens
Max output: 8192 tokens
microsoft/wizardlm-2-7b
WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves comparable performance with existing 10x larger opensource leading models
It is a finetune of Mistral 7B Instruct, using the same technique as WizardLM-2 8x22B.
To read more about the model release, click here.
#moe
Context: 32000 tokens
Max output: N/A tokens
google/gemini-flash-1.5-8b
Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results.
Click here to learn more about this model.
Usage of Gemini is subject to Google's Gemini Terms of Use.
Context: 1000000 tokens
Max output: 8192 tokens
mistralai/mistral-7b-instruct
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.
Context: 32768 tokens
Max output: 8192 tokens
google/gemma-2-9b-it
Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.
Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.
See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.
Context: 8192 tokens
Max output: 8192 tokens
meta-llama/llama-3.2-3b-instruct
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.
Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.
Click here for the original model card.
Usage of this model is subject to Meta's Acceptable Use Policy.
Context: 131000 tokens
Max output: 131000 tokens
meta-llama/llama-3.2-1b-instruct
Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance.
Supporting eight core languages and fine-tunable for more, Llama 1.3B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models.
Click here for the original model card.
Usage of this model is subject to Meta's Acceptable Use Policy.
Context: 131072 tokens
Max output: N/A tokens
meta-llama/llama-3.1-8b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.
Context: 131072 tokens
Max output: 8192 tokens
qwen/qwen-2-7b-instruct
Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.
It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.
For more details, see this blog post and GitHub repo.
Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.
Context: 32768 tokens
Max output: N/A tokens
mistralai/mistral-7b-instruct-v0.3
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
An improved version of Mistral 7B Instruct v0.2, with the following changes:
- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling
NOTE: Support for function calling depends on the provider.
Context: 32768 tokens
Max output: 8192 tokens
meta-llama/llama-3-8b-instruct
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.
Context: 8192 tokens
Max output: 8192 tokens
mistralai/mistral-nemo
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.
The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
It supports function calling and is released under the Apache 2.0 license.
Context: 131072 tokens
Max output: 8192 tokens
sao10k/l3-lunaris-8b
Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge.
Created by Sao10k, this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning.
For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1.
Context: 8192 tokens
Max output: 8192 tokens
nousresearch/hermes-2-pro-llama-3-8b
Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.
Context: 131000 tokens
Max output: 131000 tokens
openchat/openchat-7b
OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.
- For OpenChat fine-tuned on Mistral 7B, check out OpenChat 7B.
- For OpenChat fine-tuned on Llama 8B, check out OpenChat 8B.
#open-source
Context: 8192 tokens
Max output: 8192 tokens
undi95/toppy-m-7b:nitro
amazon/nova-lite-v1
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy.
With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input.
Context: 300000 tokens
Max output: 5120 tokens
Pro Models31
These models are available to Pro subscribers with unlimited usage included in the subscription.
thedrummer/unslopnemo-12b
UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.
Context: 32000 tokens
Max output: N/A tokens
meta-llama/llama-3.1-70b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.
Context: 131072 tokens
Max output: 8192 tokens
nousresearch/hermes-3-llama-3.1-70b
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
Hermes 3 70B is a competitive, if not superior finetune of the Llama-3.1 70B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
Context: 131000 tokens
Max output: 131000 tokens
deepseek/deepseek-chat
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.
For model details, please visit the DeepSeek-V3 repo for more information, or see the launch announcement.
Context: 163840 tokens
Max output: 163840 tokens
microsoft/phi-3.5-mini-128k-instruct
Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.
The models underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters.
Context: 128000 tokens
Max output: N/A tokens
ai21/jamba-1-5-mini
Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.
It works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.
This model uses less computer memory and works faster with longer texts than previous designs.
Read their announcement to learn more.
Context: 256000 tokens
Max output: 4096 tokens
mistralai/codestral-mamba
A 7.3B parameter Mamba-based model designed for code and reasoning tasks.
- Linear time inference, allowing for theoretically infinite sequence lengths
- 256k token context window
- Optimized for quick responses, especially beneficial for code productivity
- Performs comparably to state-of-the-art transformer models in code and reasoning tasks
- Available under the Apache 2.0 license for free use, modification, and distribution
Context: 256000 tokens
Max output: N/A tokens
openai/gpt-4o-mini
GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs.
As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than GPT-3.5 Turbo. It maintains SOTA intelligence, while being significantly more cost-effective.
GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences common leaderboards.
Check out the launch announcement to learn more.
#multimodal
Context: 128000 tokens
Max output: 16384 tokens
anthropic/claude-3-haiku
Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance.
See the launch announcement and benchmark results here
#multimodal
Context: 200000 tokens
Max output: 4096 tokens
cognitivecomputations/dolphin-mixtral-8x22b
Dolphin 2.9 is designed for instruction following, conversational, and coding. This model is a finetune of Mixtral 8x22B Instruct. It features a 64k context length and was fine-tuned with a 16k sequence length using ChatML templates.
This model is a successor to Dolphin Mixtral 8x7B.
The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at erichartford.com/uncensored-models.
#moe #uncensored
Context: 16000 tokens
Max output: N/A tokens
google/gemma-2-27b-it
Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models.
Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.
See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.
Context: 8192 tokens
Max output: 8192 tokens
mistralai/mixtral-8x7b-instruct
Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.
Instruct model fine-tuned by Mistral. #moe
Context: 32768 tokens
Max output: 8192 tokens
mistralai/mistral-small-24b-instruct-2501
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.
The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. Read the blog post about the model here.
Context: 32768 tokens
Max output: 8192 tokens
gryphe/mythomist-7b
anthropic/claude-instant-1:beta
nvidia/llama-3.1-nemotron-70b-instruct
NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains.
Usage of this model is subject to Meta's Acceptable Use Policy.
Context: 131000 tokens
Max output: 131000 tokens
thedrummer/rocinante-12b
Rocinante 12B is designed for engaging storytelling and rich prose.
Early testers have reported:
- Expanded vocabulary with unique and expressive word choices
- Enhanced creativity for vivid narratives
- Adventure-filled and captivating stories
Context: 32768 tokens
Max output: N/A tokens
eva-unit-01/eva-qwen-2.5-14b
mistralai/mistral-tiny
This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than Mistral 7B, inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.
Context: 32000 tokens
Max output: N/A tokens
mistralai/mistral-small
With 22 billion parameters, Mistral Small v24.09 offers a convenient mid-point between (Mistral NeMo 12B)[/mistralai/mistral-nemo] and (Mistral Large 2)[/mistralai/mistral-large], providing a cost-effective solution that can be deployed across various platforms and environments. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multiligual, supporting English, French, German, Italian, and Spanish.
Context: 32000 tokens
Max output: N/A tokens
qwen/qwen-turbo
Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks.
Context: 1000000 tokens
Max output: 8192 tokens
qwen/qwen-plus
Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.
Context: 131072 tokens
Max output: 8192 tokens
deepseek/deepseek-r1-distill-qwen-1.5b
DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on Qwen 2.5 Math 1.5B, using outputs from DeepSeek R1. It's a very small and efficient model which outperforms GPT 4o 0513 on Math Benchmarks.
Other benchmark results include:
- AIME 2024 pass@1: 28.9
- AIME 2024 cons@64: 52.7
- MATH-500 pass@1: 83.9
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
Context: 131072 tokens
Max output: 32768 tokens
deepseek/deepseek-r1-distill-qwen-32b
DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Other benchmark results include:
- AIME 2024 pass@1: 72.6
- MATH-500 pass@1: 94.3
- CodeForces Rating: 1691
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
Context: 131072 tokens
Max output: 8192 tokens
deepseek/deepseek-r1-distill-llama-70b
DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
Context: 131072 tokens
Max output: 8192 tokens
qwen/qvq-72b-preview
qwen/qwq-32b-preview
QwQ-32B-Preview is an experimental research model focused on AI reasoning capabilities developed by the Qwen Team. As a preview release, it demonstrates promising analytical abilities while having several important limitations:
- Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, affecting response clarity.
- Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.
- Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
- Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.
Context: 32768 tokens
Max output: N/A tokens
qwen/qwen-2.5-coder-32b-instruct
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
- Significantly improvements in code generation, code reasoning and code fixing.
- A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.
To read more about its evaluation results, check out Qwen 2.5 Coder's blog.
Context: 33000 tokens
Max output: 3000 tokens
mistralai/codestral-2501
Mistral's cutting-edge language model for coding. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation.
Learn more on their blog post: https://mistral.ai/news/codestral-2501/
Context: 256000 tokens
Max output: N/A tokens
meta-llama/llama-3.3-70b-instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Context: 131072 tokens
Max output: N/A tokens
deepseek/deepseek-r1-distill-llama-3.1-70b
Pro Metered Models21
These premium models are available on a pay-as-you-go basis with per-token pricing.
anthropic/claude-3.7-sonnet
Input: $0.000003 per token
Output: $0.000015 per token
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes.
Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.
Read more at the blog post here
Context: 200000 tokens
Max output: 128000 tokens
✓ Moderated
anthropic/claude-3.7-sonnet:thinking
Input: $0.000003 per token
Output: $0.000015 per token
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes.
Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.
Read more at the blog post here
Context: 200000 tokens
Max output: 128000 tokens
✓ Moderated
deepseek/deepseek-r1
Input: $0.00000055 per token
Output: $0.00000219 per token
DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
Fully open-source model & technical report.
MIT licensed: Distill & commercialize freely!
Context: 163840 tokens
Max output: 163840 tokens
✗ Unmoderated
openai/gpt-4o-2024-11-20
Input: $0.0000025 per token
Output: $0.00001 per token
The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses.
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
Context: 128000 tokens
Max output: 16384 tokens
✓ Moderated
openai/o3-mini-high
Input: $0.0000011 per token
Output: $0.0000044 per token
OpenAI o3-mini-high is the same model as o3-mini with reasoning_effort set to high.
o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.
The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost.
Context: 200000 tokens
Max output: 100000 tokens
✓ Moderated
allenai/llama-3.1-tulu-3-405b
Input: $0.000005 per token
Output: $0.00001 per token
Tülu 3 405B is the largest model in the Tülu 3 family, applying fully open post-training recipes at a 405B parameter scale. Built on the Llama 3.1 405B base, it leverages Reinforcement Learning with Verifiable Rewards (RLVR) to enhance instruction following, MATH, GSM8K, and IFEval performance. As part of Tülu 3’s fully open-source approach, it offers state-of-the-art capabilities while surpassing prior open-weight models like Llama 3.1 405B Instruct and Nous Hermes 3 405B on multiple benchmarks. To read more, click here.
Context: 16384 tokens
Max output: 4096 tokens
✗ Unmoderated
aion-labs/aion-1.0
Input: $0.000004 per token
Output: $0.000008 per token
Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model.
Context: 32768 tokens
Max output: 32768 tokens
✗ Unmoderated
qwen/qwen-max
Input: $0.0000016 per token
Output: $0.0000064 per token
Qwen-Max, based on Qwen2.5, provides the best inference performance among Qwen models, especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. The parameter count is unknown.
Context: 32768 tokens
Max output: 8192 tokens
✗ Unmoderated
openai/o1
Input: $0.000015 per token
Output: $0.00006 per token
The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.
The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the launch announcement.
Context: 200000 tokens
Max output: 100000 tokens
✓ Moderated
x-ai/grok-2-1212
Input: $0.000002 per token
Output: $0.00001 per token
Grok 2 1212 introduces significant enhancements to accuracy, instruction adherence, and multilingual support, making it a powerful and flexible choice for developers seeking a highly steerable, intelligent model.
Context: 131072 tokens
Max output: N/A tokens
✗ Unmoderated
mistralai/mistral-large-2411
Input: $0.000002 per token
Output: $0.000006 per token
Mistral Large 2 2411 is an update of Mistral Large 2 released together with Pixtral Large 2411
It provides a significant upgrade on the previous Mistral Large 24.07, with notable improvements in long context understanding, a new system prompt, and more accurate function calling.
Context: 128000 tokens
Max output: N/A tokens
✗ Unmoderated
neversleep/llama-3.1-lumimaid-70b
Input: $0.000003375 per token
Output: $0.0000045 per token
Lumimaid v0.2 70B is a finetune of Llama 3.1 70B with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chats output were purged.
Usage of this model is subject to Meta's Acceptable Use Policy.
Context: 16384 tokens
Max output: 2048 tokens
✗ Unmoderated
x-ai/grok-beta
Input: $0.000005 per token
Output: $0.000015 per token
Grok Beta is xAI's experimental language model with state-of-the-art reasoning capabilities, best for complex and multi-step use cases.
It is the successor of Grok 2 with enhanced context length.
Context: 131072 tokens
Max output: N/A tokens
✗ Unmoderated
inflection/inflection-3-pi
Input: $0.0000025 per token
Output: $0.00001 per token
Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay.
Pi has been trained to mirror your tone and style, if you use more emojis, so will Pi! Try experimenting with various prompts and conversation styles.
Context: 8000 tokens
Max output: 1024 tokens
✗ Unmoderated
cohere/command-r-plus-08-2024
Input: $0.000002375 per token
Output: $0.0000095 per token
command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same.
Read the launch post here.
Use of this model is subject to Cohere's Usage Policy and SaaS Agreement.
Context: 128000 tokens
Max output: 4000 tokens
✗ Unmoderated
ai21/jamba-1-5-large
Input: $0.000002 per token
Output: $0.000008 per token
Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.
It features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.
Built on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.
Read their announcement to learn more.
Context: 256000 tokens
Max output: 4096 tokens
✗ Unmoderated
01-ai/yi-large
Input: $0.000003 per token
Output: $0.000003 per token
The Yi Large model was designed by 01.AI with the following usecases in mind: knowledge search, data classification, human-like chat bots, and customer service.
It stands out for its multilingual proficiency, particularly in Spanish, Chinese, Japanese, German, and French.
Check out the launch announcement to learn more.
Context: 32768 tokens
Max output: 4096 tokens
✗ Unmoderated
neversleep/llama-3-lumimaid-70b
Input: $0.000003375 per token
Output: $0.0000045 per token
The NeverSleep team is back, with a Llama 3 70B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary.
To enhance it's overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength.
Usage of this model is subject to Meta's Acceptable Use Policy.
Context: 8192 tokens
Max output: 2048 tokens
✗ Unmoderated
anthropic/claude-3-opus
Input: $0.000015 per token
Output: $0.000075 per token
Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding.
See the launch announcement and benchmark results here
#multimodal
Context: 200000 tokens
Max output: 4096 tokens
✓ Moderated
anthropic/claude-3-sonnet
Input: $0.000003 per token
Output: $0.000015 per token
Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments.
See the launch announcement and benchmark results here
#multimodal
Context: 200000 tokens
Max output: 4096 tokens
✓ Moderated
alpindale/goliath-120b
Input: $0.000009375 per token
Output: $0.000009375 per token
A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale.
Credits to
- @chargoddard for developing the framework used to merge the model - mergekit.
- @Undi95 for helping with the merge ratios.
#merge
Context: 6144 tokens
Max output: 512 tokens
✗ Unmoderated