Our Chat API has built-in support for many popular models hosted on our serverless endpoints, as well as any model that you configure and host yourself using our dedicated GPU infrastructure. When using one of our serverless models, you’ll be charged based on the number of tokens your queries use. For dedicated models that you configure and run yourself, you’ll be charged per minute for as long as your endpoint is running; you can start or stop your endpoint at any time using our online playground. To learn more about the pricing for our serverless endpoints, check out our pricing page.
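The two billing models above can be compared with a quick back-of-the-envelope calculation. Every rate in this sketch is a made-up placeholder, not one of our real prices (see the pricing page for those); only the arithmetic is the point.

```python
# Back-of-the-envelope comparison of the two billing models:
# serverless bills per token, dedicated bills for wall-clock uptime.
# All rates below are invented placeholders, not real prices.

def serverless_cost(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Serverless: pay per token used across prompts and completions."""
    return total_tokens / 1_000_000 * usd_per_million_tokens

def dedicated_cost(minutes_running: int, usd_per_minute: float) -> float:
    """Dedicated: pay per minute while the endpoint is running."""
    return minutes_running * usd_per_minute

# 5M tokens at a placeholder $0.20/M vs. an endpoint left up for
# 8 hours at a placeholder $0.05/min:
print(serverless_cost(5_000_000, 0.20))  # → 1.0
print(dedicated_cost(8 * 60, 0.05))      # → 24.0
```

Since dedicated instances bill for uptime whether or not requests arrive, stopping idle endpoints is the main cost lever there, while serverless costs track usage directly.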

Hosted models

In the table below, models marked as “Turbo” are quantized to FP8 and those marked as “Lite” are quantized to INT4. All our other models run at full precision (FP16).
If you’re not sure which chat model to use, we currently recommend Llama 3.1 8B Turbo (meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo) to get started.
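Each model string from the table below goes in the "model" field of a chat completion request. Here is a minimal sketch in Python, assuming the OpenAI-compatible v1/chat/completions endpoint and an API key in the TOGETHER_API_KEY environment variable; treat the URL and payload shape as assumptions to verify against the API reference.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible chat endpoint; verify against the API reference.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # recommended starter model
    "Briefly explain FP8 quantization.",
)

api_key = os.environ.get("TOGETHER_API_KEY")
if api_key:  # only hit the network when a key is configured
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Swapping in any other model string from the table is the only change needed to target a different hosted model.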
| Organization | Model Name | API Model String | Context length | Quantization |
|--------------|------------|------------------|----------------|--------------|
| Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 128000 | FP8 |
| Meta | Llama 3.1 70B Instruct Turbo | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 128000 | FP8 |
| Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 4096 | FP8 |
| Meta | Llama 3 8B Instruct Turbo | meta-llama/Meta-Llama-3-8B-Instruct-Turbo | 8192 | FP8 |
| Meta | Llama 3 70B Instruct Turbo | meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 8192 | FP8 |
| Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
| Meta | Llama 3 70B Instruct Lite | meta-llama/Meta-Llama-3-70B-Instruct-Lite | 8192 | INT4 |
| Google | Gemma 2 27B | google/gemma-2-27b-it | 8192 | FP16 |
| Google | Gemma 2 9B | google/gemma-2-9b-it | 8192 | FP16 |
| Allen AI | OLMo Instruct (7B) | allenai/OLMo-7B-Instruct | 2048 | FP16 |
| 01.AI | 01-ai Yi Chat (34B) | zero-one-ai/Yi-34B-Chat | 4096 | FP16 |
| Allen AI | OLMo Twin-2T (7B) | allenai/OLMo-7B-Twin-2T | 2048 | FP16 |
| Allen AI | OLMo (7B) | allenai/OLMo-7B | 2048 | FP16 |
| Austism | Chronos Hermes (13B) | Austism/chronos-hermes-13b | 2048 | FP16 |
| Cognitive | Dolphin 2.5 Mixtral 8x7b | cognitivecomputations/dolphin-2.5-mixtral-8x7b | 32768 | FP16 |
| databricks | DBRX Instruct | databricks/dbrx-instruct | 32768 | FP16 |
| DeepSeek | Deepseek Coder Instruct (33B) | deepseek-ai/deepseek-coder-33b-instruct | 16384 | FP16 |
| DeepSeek | DeepSeek LLM Chat (67B) | deepseek-ai/deepseek-llm-67b-chat | 4096 | FP16 |
| garage-bAInd | Platypus2 Instruct (70B) | garage-bAInd/Platypus2-70B-instruct | 4096 | FP16 |
| Google | Gemma Instruct (2B) | google/gemma-2b-it | 8192 | FP16 |
| Google | Gemma Instruct (7B) | google/gemma-7b-it | 8192 | FP16 |
| Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b | 4096 | FP16 |
| LM Sys | Vicuna v1.5 (13B) | lmsys/vicuna-13b-v1.5 | 4096 | FP16 |
| LM Sys | Vicuna v1.5 (7B) | lmsys/vicuna-7b-v1.5 | 4096 | FP16 |
| Meta | Code Llama Instruct (13B) | codellama/CodeLlama-13b-Instruct-hf | 16384 | FP16 |
| Meta | Code Llama Instruct (34B) | codellama/CodeLlama-34b-Instruct-hf | 16384 | FP16 |
| Meta | Code Llama Instruct (70B) | codellama/CodeLlama-70b-Instruct-hf | 4096 | FP16 |
| Meta | Code Llama Instruct (7B) | codellama/CodeLlama-7b-Instruct-hf | 16384 | FP16 |
| Meta | LLaMA-2 Chat (70B) | meta-llama/Llama-2-70b-chat-hf | 4096 | FP16 |
| Meta | LLaMA-2 Chat (13B) | meta-llama/Llama-2-13b-chat-hf | 4096 | FP16 |
| Meta | LLaMA-2 Chat (7B) | meta-llama/Llama-2-7b-chat-hf | 4096 | FP16 |
| Meta | LLaMA-3 Chat (8B) | meta-llama/Llama-3-8b-chat-hf | 8192 | FP16 |
| Meta | LLaMA-3 Chat (70B) | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
| mistralai | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | FP16 |
| mistralai | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
| mistralai | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
| mistralai | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | FP16 |
| mistralai | Mixtral-8x22B Instruct (141B) | mistralai/Mixtral-8x22B-Instruct-v0.1 | 65536 | FP16 |
| NousResearch | Nous Capybara v1.9 (7B) | NousResearch/Nous-Capybara-7B-V1p9 | 8192 | FP16 |
| NousResearch | Nous Hermes 2 - Mistral DPO (7B) | NousResearch/Nous-Hermes-2-Mistral-7B-DPO | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-SFT (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | 32768 | FP16 |
| NousResearch | Nous Hermes LLaMA-2 (7B) | NousResearch/Nous-Hermes-llama-2-7b | 4096 | FP16 |
| NousResearch | Nous Hermes Llama-2 (13B) | NousResearch/Nous-Hermes-Llama2-13b | 4096 | FP16 |
| NousResearch | Nous Hermes-2 Yi (34B) | NousResearch/Nous-Hermes-2-Yi-34B | 4096 | FP16 |
| OpenChat | OpenChat 3.5 (7B) | openchat/openchat-3.5-1210 | 8192 | FP16 |
| OpenOrca | OpenOrca Mistral (7B) 8K | Open-Orca/Mistral-7B-OpenOrca | 8192 | FP16 |
| Qwen | Qwen 1.5 Chat (0.5B) | Qwen/Qwen1.5-0.5B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (1.8B) | Qwen/Qwen1.5-1.8B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (4B) | Qwen/Qwen1.5-4B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (7B) | Qwen/Qwen1.5-7B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (14B) | Qwen/Qwen1.5-14B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (32B) | Qwen/Qwen1.5-32B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (72B) | Qwen/Qwen1.5-72B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (110B) | Qwen/Qwen1.5-110B-Chat | 32768 | FP16 |
| Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 | FP16 |
| Snorkel AI | Snorkel Mistral PairRM DPO (7B) | snorkelai/Snorkel-Mistral-PairRM-DPO | 32768 | FP16 |
| Snowflake | Snowflake Arctic Instruct | Snowflake/snowflake-arctic-instruct | 4096 | FP16 |
| Stanford | Alpaca (7B) | togethercomputer/alpaca-7b | 2048 | FP16 |
| Teknium | OpenHermes-2-Mistral (7B) | teknium/OpenHermes-2-Mistral-7B | 8192 | FP16 |
| Teknium | OpenHermes-2.5-Mistral (7B) | teknium/OpenHermes-2p5-Mistral-7B | 8192 | FP16 |
| Together | LLaMA-2-7B-32K-Instruct (7B) | togethercomputer/Llama-2-7B-32K-Instruct | 32768 | FP16 |
| Together | RedPajama-INCITE Chat (3B) | togethercomputer/RedPajama-INCITE-Chat-3B-v1 | 2048 | FP16 |
| Together | RedPajama-INCITE Chat (7B) | togethercomputer/RedPajama-INCITE-7B-Chat | 2048 | FP16 |
| Together | StripedHyena Nous (7B) | togethercomputer/StripedHyena-Nous-7B | 32768 | FP16 |
| Undi95 | ReMM SLERP L2 (13B) | Undi95/ReMM-SLERP-L2-13B | 4096 | FP16 |
| Undi95 | Toppy M (7B) | Undi95/Toppy-M-7B | 4096 | FP16 |
| WizardLM | WizardLM v1.2 (13B) | WizardLM/WizardLM-13B-V1.2 | 4096 | FP16 |
| upstage | Upstage SOLAR Instruct v1 (11B) | upstage/SOLAR-10.7B-Instruct-v1.0 | 4096 | FP16 |
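The context length column above caps the combined prompt and completion tokens of a request. A rough pre-flight check is easy to sketch; the ~4-characters-per-token heuristic below is a common approximation, not the models' real tokenizers, and the two context lengths are copied from the table.

```python
# Context lengths (tokens, prompt + completion) copied from the table above.
CONTEXT_LENGTHS = {
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo": 128000,
    "allenai/OLMo-7B-Instruct": 2048,
}

def fits_in_context(model: str, prompt: str, max_tokens: int) -> bool:
    """Rough check that prompt plus completion budget fits the window.

    Uses the ~4-characters-per-token rule of thumb, not a real tokenizer,
    so treat the answer as an estimate only.
    """
    estimated_prompt_tokens = len(prompt) // 4 + 1
    return estimated_prompt_tokens + max_tokens <= CONTEXT_LENGTHS[model]

# A 12,000-character prompt blows past OLMo's 2048-token window:
print(fits_in_context("allenai/OLMo-7B-Instruct", "hi " * 4000, 256))  # → False
```

For anything close to the limit, count tokens with the model's actual tokenizer rather than a character heuristic.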

Dedicated Instances

Customizable model instances that you deploy on demand, priced by the hour while hosted. All models on the serverless endpoints are also available for hosting as private dedicated instances. In addition, the models below are available for hosting as private dedicated instances.
| Organization | Model Name | API Model String | Context length | Quantization |
|--------------|------------|------------------|----------------|--------------|
| Databricks | Dolly v2 (12B) | databricks/dolly-v2-12b | 2048 | FP16 |
| Databricks | Dolly v2 (3B) | databricks/dolly-v2-3b | 2048 | FP16 |
| Databricks | Dolly v2 (7B) | databricks/dolly-v2-7b | 2048 | FP16 |
| DiscoResearch | DiscoLM Mixtral 8x7b (46.7B) | DiscoResearch/DiscoLM-mixtral-8x7b-v2 | 32768 | FP16 |
| HuggingFace | Zephyr-7B-ß | HuggingFaceH4/zephyr-7b-beta | 32768 | FP16 |
| HuggingFaceH4 | StarCoderChat Alpha (16B) | HuggingFaceH4/starchat-alpha | 8192 | FP16 |
| LAION | Open-Assistant Pythia SFT-4 (12B) | OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 | 2048 | FP16 |
| LAION | Open-Assistant StableLM SFT-7 (7B) | OpenAssistant/stablelm-7b-sft-v7-epoch-3 | 4096 | FP16 |
| LM Sys | Koala (13B) | togethercomputer/Koala-13B | 2048 | FP16 |
| LM Sys | Koala (7B) | togethercomputer/Koala-7B | 2048 | FP16 |
| LM Sys | Vicuna v1.3 (13B) | lmsys/vicuna-13b-v1.3 | 2048 | FP16 |
| LM Sys | Vicuna v1.3 (7B) | lmsys/vicuna-7b-v1.3 | 2048 | FP16 |
| LM Sys | Vicuna-FastChat-T5 (3B) | lmsys/fastchat-t5-3b-v1.0 | 512 | FP16 |
| Mosaic ML | MPT-Chat (30B) | togethercomputer/mpt-30b-chat | 2048 | FP16 |
| Mosaic ML | MPT-Chat (7B) | togethercomputer/mpt-7b-chat | 2048 | FP16 |
| NousResearch | Nous Hermes LLaMA-2 (70B) | NousResearch/Nous-Hermes-Llama2-70b | 4096 | FP16 |
| Qwen | Qwen Chat (7B) | Qwen/Qwen-7B-Chat | 2048 | FP16 |
| Qwen | Qwen Chat (14B) | Qwen/Qwen-14B-Chat | 2048 | FP16 |
| TII | Falcon Instruct (7B) | tiiuae/falcon-7b-instruct | 2048 | FP16 |
| TII | Falcon Instruct (40B) | tiiuae/falcon-40b-instruct | 2048 | FP16 |
| Tim Dettmers | Guanaco (13B) | togethercomputer/guanaco-13b | 2048 | FP16 |
| Tim Dettmers | Guanaco (33B) | togethercomputer/guanaco-33b | 2048 | FP16 |
| Tim Dettmers | Guanaco (65B) | togethercomputer/guanaco-65b | 2048 | FP16 |
| Tim Dettmers | Guanaco (7B) | togethercomputer/guanaco-7b | 2048 | FP16 |
| Together | GPT-NeoXT-Chat-Base (20B) | togethercomputer/GPT-NeoXT-Chat-Base-20B | 2048 | FP16 |
| Together | Pythia-Chat-Base (7B) | togethercomputer/Pythia-Chat-Base-7B-v0.16 | 2048 | FP16 |

Request a model

Don’t see a model you want to use? Send us a Model Request here.