Our Chat API has built-in support for many popular models hosted on our serverless endpoints, as well as any model that you configure and host yourself using our dedicated GPU infrastructure. When using one of our serverless models, you’ll be charged based on the number of tokens your queries use. For dedicated models that you configure and run yourself, you’ll be charged per minute for as long as your endpoint is running; you can start or stop your endpoint at any time using our online playground. To learn more about the pricing for our serverless endpoints, check out our pricing page.
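The two billing models above can be compared with a quick back-of-the-envelope calculation. Every rate in this sketch is a made-up placeholder, not one of our real prices (see the pricing page for those); only the arithmetic is the point.

```python
# Back-of-the-envelope comparison of the two billing models:
# serverless bills per token, dedicated bills for wall-clock uptime.
# All rates below are invented placeholders, not real prices.

def serverless_cost(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Serverless: pay per token used across prompts and completions."""
    return total_tokens / 1_000_000 * usd_per_million_tokens

def dedicated_cost(minutes_running: int, usd_per_minute: float) -> float:
    """Dedicated: pay per minute while the endpoint is running."""
    return minutes_running * usd_per_minute

# 5M tokens at a placeholder $0.20/M vs. an endpoint left up for
# 8 hours at a placeholder $0.05/min:
print(serverless_cost(5_000_000, 0.20))  # → 1.0
print(dedicated_cost(8 * 60, 0.05))      # → 24.0
```

Since dedicated instances bill for uptime whether or not requests arrive, stopping idle endpoints is the main cost lever there, while serverless costs track usage directly.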

Hosted models

In the table below, models marked as “Turbo” are quantized to FP8 and those marked as “Lite” are quantized to INT4. All our other models run at full precision (FP16).
If you’re not sure which chat model to use, we currently recommend Llama 3.1 8B Turbo (meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo) to get started.
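Each model string from the table below goes in the "model" field of a chat completion request. Here is a minimal sketch in Python, assuming the OpenAI-compatible v1/chat/completions endpoint and an API key in the TOGETHER_API_KEY environment variable; treat the URL and payload shape as assumptions to verify against the API reference.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible chat endpoint; verify against the API reference.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # recommended starter model
    "Briefly explain FP8 quantization.",
)

api_key = os.environ.get("TOGETHER_API_KEY")
if api_key:  # only hit the network when a key is configured
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Swapping in any other model string from the table is the only change needed to target a different hosted model.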
| Organization | Model Name | API Model String | Context length | Quantization |
|--------------|------------|------------------|----------------|--------------|
| Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 128000 | FP8 |
| Meta | Llama 3.1 70B Instruct Turbo | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 128000 | FP8 |
| Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 4096 | FP8 |
| Meta | Llama 3 8B Instruct Turbo | meta-llama/Meta-Llama-3-8B-Instruct-Turbo | 8192 | FP8 |
| Meta | Llama 3 70B Instruct Turbo | meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 8192 | FP8 |
| Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
| Meta | Llama 3 70B Instruct Lite | meta-llama/Meta-Llama-3-70B-Instruct-Lite | 8192 | INT4 |
| Google | Gemma 2 27B | google/gemma-2-27b-it | 8192 | FP16 |
| Google | Gemma 2 9B | google/gemma-2-9b-it | 8192 | FP16 |
| Allen AI | OLMo Instruct (7B) | allenai/OLMo-7B-Instruct | 2048 | FP16 |
| 01.AI | 01-ai Yi Chat (34B) | zero-one-ai/Yi-34B-Chat | 4096 | FP16 |
| Allen AI | OLMo Twin-2T (7B) | allenai/OLMo-7B-Twin-2T | 2048 | FP16 |
| Allen AI | OLMo (7B) | allenai/OLMo-7B | 2048 | FP16 |
| Austism | Chronos Hermes (13B) | Austism/chronos-hermes-13b | 2048 | FP16 |
| Cognitive | Dolphin 2.5 Mixtral 8x7b | cognitivecomputations/dolphin-2.5-mixtral-8x7b | 32768 | FP16 |
| databricks | DBRX Instruct | databricks/dbrx-instruct | 32768 | FP16 |
| DeepSeek | Deepseek Coder Instruct (33B) | deepseek-ai/deepseek-coder-33b-instruct | 16384 | FP16 |
| DeepSeek | DeepSeek LLM Chat (67B) | deepseek-ai/deepseek-llm-67b-chat | 4096 | FP16 |
| garage-bAInd | Platypus2 Instruct (70B) | garage-bAInd/Platypus2-70B-instruct | 4096 | FP16 |
| Google | Gemma Instruct (2B) | google/gemma-2b-it | 8192 | FP16 |
| Google | Gemma Instruct (7B) | google/gemma-7b-it | 8192 | FP16 |
| Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b | 4096 | FP16 |
| LM Sys | Vicuna v1.5 (13B) | lmsys/vicuna-13b-v1.5 | 4096 | FP16 |
| LM Sys | Vicuna v1.5 (7B) | lmsys/vicuna-7b-v1.5 | 4096 | FP16 |
| Meta | Code Llama Instruct (13B) | codellama/CodeLlama-13b-Instruct-hf | 16384 | FP16 |
| Meta | Code Llama Instruct (34B) | codellama/CodeLlama-34b-Instruct-hf | 16384 | FP16 |
| Meta | Code Llama Instruct (70B) | codellama/CodeLlama-70b-Instruct-hf | 4096 | FP16 |
| Meta | Code Llama Instruct (7B) | codellama/CodeLlama-7b-Instruct-hf | 16384 | FP16 |
| Meta | LLaMA-2 Chat (70B) | meta-llama/Llama-2-70b-chat-hf | 4096 | FP16 |
| Meta | LLaMA-2 Chat (13B) | meta-llama/Llama-2-13b-chat-hf | 4096 | FP16 |
| Meta | LLaMA-2 Chat (7B) | meta-llama/Llama-2-7b-chat-hf | 4096 | FP16 |
| Meta | LLaMA-3 Chat (8B) | meta-llama/Llama-3-8b-chat-hf | 8192 | FP16 |
| Meta | LLaMA-3 Chat (70B) | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
| mistralai | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | FP16 |
| mistralai | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
| mistralai | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
| mistralai | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | FP16 |
| mistralai | Mixtral-8x22B Instruct (141B) | mistralai/Mixtral-8x22B-Instruct-v0.1 | 65536 | FP16 |
| NousResearch | Nous Capybara v1.9 (7B) | NousResearch/Nous-Capybara-7B-V1p9 | 8192 | FP16 |
| NousResearch | Nous Hermes 2 - Mistral DPO (7B) | NousResearch/Nous-Hermes-2-Mistral-7B-DPO | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-SFT (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | 32768 | FP16 |
| NousResearch | Nous Hermes LLaMA-2 (7B) | NousResearch/Nous-Hermes-llama-2-7b | 4096 | FP16 |
| NousResearch | Nous Hermes Llama-2 (13B) | NousResearch/Nous-Hermes-Llama2-13b | 4096 | FP16 |
| NousResearch | Nous Hermes-2 Yi (34B) | NousResearch/Nous-Hermes-2-Yi-34B | 4096 | FP16 |
| OpenChat | OpenChat 3.5 (7B) | openchat/openchat-3.5-1210 | 8192 | FP16 |
| OpenOrca | OpenOrca Mistral (7B) 8K | Open-Orca/Mistral-7B-OpenOrca | 8192 | FP16 |
| Qwen | Qwen 1.5 Chat (0.5B) | Qwen/Qwen1.5-0.5B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (1.8B) | Qwen/Qwen1.5-1.8B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (4B) | Qwen/Qwen1.5-4B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (7B) | Qwen/Qwen1.5-7B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (14B) | Qwen/Qwen1.5-14B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (32B) | Qwen/Qwen1.5-32B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (72B) | Qwen/Qwen1.5-72B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (110B) | Qwen/Qwen1.5-110B-Chat | 32768 | FP16 |
| Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 | FP16 |
| Snorkel AI | Snorkel Mistral PairRM DPO (7B) | snorkelai/Snorkel-Mistral-PairRM-DPO | 32768 | FP16 |
| Snowflake | Snowflake Arctic Instruct | Snowflake/snowflake-arctic-instruct | 4096 | FP16 |
| Stanford | Alpaca (7B) | togethercomputer/alpaca-7b | 2048 | FP16 |
| Teknium | OpenHermes-2-Mistral (7B) | teknium/OpenHermes-2-Mistral-7B | 8192 | FP16 |
| Teknium | OpenHermes-2.5-Mistral (7B) | teknium/OpenHermes-2p5-Mistral-7B | 8192 | FP16 |
| Together | LLaMA-2-7B-32K-Instruct (7B) | togethercomputer/Llama-2-7B-32K-Instruct | 32768 | FP16 |
| Together | RedPajama-INCITE Chat (3B) | togethercomputer/RedPajama-INCITE-Chat-3B-v1 | 2048 | FP16 |
| Together | RedPajama-INCITE Chat (7B) | togethercomputer/RedPajama-INCITE-7B-Chat | 2048 | FP16 |
| Together | StripedHyena Nous (7B) | togethercomputer/StripedHyena-Nous-7B | 32768 | FP16 |
| Undi95 | ReMM SLERP L2 (13B) | Undi95/ReMM-SLERP-L2-13B | 4096 | FP16 |
| Undi95 | Toppy M (7B) | Undi95/Toppy-M-7B | 4096 | FP16 |
| WizardLM | WizardLM v1.2 (13B) | WizardLM/WizardLM-13B-V1.2 | 4096 | FP16 |
| upstage | Upstage SOLAR Instruct v1 (11B) | upstage/SOLAR-10.7B-Instruct-v1.0 | 4096 | FP16 |
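The context length column above caps the combined prompt and completion tokens of a request. A rough pre-flight check is easy to sketch; the ~4-characters-per-token heuristic below is a common approximation, not the models' real tokenizers, and the two context lengths are copied from the table.

```python
# Context lengths (tokens, prompt + completion) copied from the table above.
CONTEXT_LENGTHS = {
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo": 128000,
    "allenai/OLMo-7B-Instruct": 2048,
}

def fits_in_context(model: str, prompt: str, max_tokens: int) -> bool:
    """Rough check that prompt plus completion budget fits the window.

    Uses the ~4-characters-per-token rule of thumb, not a real tokenizer,
    so treat the answer as an estimate only.
    """
    estimated_prompt_tokens = len(prompt) // 4 + 1
    return estimated_prompt_tokens + max_tokens <= CONTEXT_LENGTHS[model]

# A 12,000-character prompt blows past OLMo's 2048-token window:
print(fits_in_context("allenai/OLMo-7B-Instruct", "hi " * 4000, 256))  # → False
```

For anything close to the limit, count tokens with the model's actual tokenizer rather than a character heuristic.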

Dedicated Instances

Customizable model instances that you deploy on demand, priced by the hour while hosted. All models on the serverless endpoints are also available for hosting as private dedicated instances. In addition, the models below are available for hosting as private dedicated instances.
| Organization | Model Name | API Model String | Context length | Quantization |
|--------------|------------|------------------|----------------|--------------|
| Databricks | Dolly v2 (12B) | databricks/dolly-v2-12b | 2048 | FP16 |
| Databricks | Dolly v2 (3B) | databricks/dolly-v2-3b | 2048 | FP16 |
| Databricks | Dolly v2 (7B) | databricks/dolly-v2-7b | 2048 | FP16 |
| DiscoResearch | DiscoLM Mixtral 8x7b (46.7B) | DiscoResearch/DiscoLM-mixtral-8x7b-v2 | 32768 | FP16 |
| HuggingFace | Zephyr-7B-ß | HuggingFaceH4/zephyr-7b-beta | 32768 | FP16 |
| HuggingFaceH4 | StarCoderChat Alpha (16B) | HuggingFaceH4/starchat-alpha | 8192 | FP16 |
| LAION | Open-Assistant Pythia SFT-4 (12B) | OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 | 2048 | FP16 |
| LAION | Open-Assistant StableLM SFT-7 (7B) | OpenAssistant/stablelm-7b-sft-v7-epoch-3 | 4096 | FP16 |
| LM Sys | Koala (13B) | togethercomputer/Koala-13B | 2048 | FP16 |
| LM Sys | Koala (7B) | togethercomputer/Koala-7B | 2048 | FP16 |
| LM Sys | Vicuna v1.3 (13B) | lmsys/vicuna-13b-v1.3 | 2048 | FP16 |
| LM Sys | Vicuna v1.3 (7B) | lmsys/vicuna-7b-v1.3 | 2048 | FP16 |
| LM Sys | Vicuna-FastChat-T5 (3B) | lmsys/fastchat-t5-3b-v1.0 | 512 | FP16 |
| Mosaic ML | MPT-Chat (30B) | togethercomputer/mpt-30b-chat | 2048 | FP16 |
| Mosaic ML | MPT-Chat (7B) | togethercomputer/mpt-7b-chat | 2048 | FP16 |
| NousResearch | Nous Hermes LLaMA-2 (70B) | NousResearch/Nous-Hermes-Llama2-70b | 4096 | FP16 |
| Qwen | Qwen Chat (7B) | Qwen/Qwen-7B-Chat | 2048 | FP16 |
| Qwen | Qwen Chat (14B) | Qwen/Qwen-14B-Chat | 2048 | FP16 |
| TII | Falcon Instruct (7B) | tiiuae/falcon-7b-instruct | 2048 | FP16 |
| TII | Falcon Instruct (40B) | tiiuae/falcon-40b-instruct | 2048 | FP16 |
| Tim Dettmers | Guanaco (13B) | togethercomputer/guanaco-13b | 2048 | FP16 |
| Tim Dettmers | Guanaco (33B) | togethercomputer/guanaco-33b | 2048 | FP16 |
| Tim Dettmers | Guanaco (65B) | togethercomputer/guanaco-65b | 2048 | FP16 |
| Tim Dettmers | Guanaco (7B) | togethercomputer/guanaco-7b | 2048 | FP16 |
| Together | GPT-NeoXT-Chat-Base (20B) | togethercomputer/GPT-NeoXT-Chat-Base-20B | 2048 | FP16 |
| Together | Pythia-Chat-Base (7B) | togethercomputer/Pythia-Chat-Base-7B-v0.16 | 2048 | FP16 |

Request a model

Don’t see a model you want to use? Send us a Model Request here.