Our Chat API has built-in support for many popular models hosted on our serverless endpoints, as well as any model you configure and host yourself on our dedicated GPU infrastructure. Serverless models are billed by the number of tokens you use in your queries. Dedicated models that you configure and run yourself are billed per minute for as long as your endpoint is running; you can start or stop your endpoint at any time from our online playground. To learn more about pricing for our serverless endpoints, see our pricing page.
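As a sketch of how a serverless model is queried, the snippet below builds a chat completion request using only the Python standard library. It assumes the Chat API is OpenAI-compatible at `api.together.xyz/v1/chat/completions` with Bearer-token auth; the exact endpoint path, payload shape, and `YOUR_API_KEY` placeholder are assumptions to verify against the API reference.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the current API reference.
API_URL = "https://api.together.xyz/v1/chat/completions"


def build_request(api_key: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a chat completion request for a serverless model.

    Token usage in the prompt and completion is what gets billed
    for serverless models.
    """
    payload = {
        "model": model,  # an API Model String from the table below
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 128,  # caps billable completion tokens
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_request(
    "YOUR_API_KEY",
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "Hello!",
)
# resp = urllib.request.urlopen(req)  # uncomment to send; this call is billed per token
```

Swapping the model string for any other entry in the table below targets a different serverless model with no other code changes.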
## Hosted models
In the table below, models marked “Turbo” are quantized to FP8, and models marked “Lite” are quantized to INT4. All our other models run at full precision (FP16).

If you’re not sure which chat model to use, we currently recommend Llama 3.1 8B Turbo (`meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo`) to get started.
| Organization | Model Name | API Model String | Context length | Quantization |
|---|---|---|---|---|
| Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 128000 | FP8 |
| Meta | Llama 3.1 70B Instruct Turbo | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 128000 | FP8 |
| Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 4096 | FP8 |
| Meta | Llama 3 8B Instruct Turbo | meta-llama/Meta-Llama-3-8B-Instruct-Turbo | 8192 | FP8 |
| Meta | Llama 3 70B Instruct Turbo | meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 8192 | FP8 |
| Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
| Meta | Llama 3 70B Instruct Lite | meta-llama/Meta-Llama-3-70B-Instruct-Lite | 8192 | INT4 |
| Google | Gemma 2 27B | google/gemma-2-27b-it | 8192 | FP16 |
| Google | Gemma 2 9B | google/gemma-2-9b-it | 8192 | FP16 |
| Allen AI | OLMo Instruct (7B) | allenai/OLMo-7B-Instruct | 2048 | FP16 |
| 01.AI | 01-ai Yi Chat (34B) | zero-one-ai/Yi-34B-Chat | 4096 | FP16 |
| Allen AI | OLMo Twin-2T (7B) | allenai/OLMo-7B-Twin-2T | 2048 | FP16 |
| Allen AI | OLMo (7B) | allenai/OLMo-7B | 2048 | FP16 |
| Austism | Chronos Hermes (13B) | Austism/chronos-hermes-13b | 2048 | FP16 |
| Cognitive Computations | Dolphin 2.5 Mixtral 8x7b | cognitivecomputations/dolphin-2.5-mixtral-8x7b | 32768 | FP16 |
| databricks | DBRX Instruct | databricks/dbrx-instruct | 32768 | FP16 |
| DeepSeek | Deepseek Coder Instruct (33B) | deepseek-ai/deepseek-coder-33b-instruct | 16384 | FP16 |
| DeepSeek | DeepSeek LLM Chat (67B) | deepseek-ai/deepseek-llm-67b-chat | 4096 | FP16 |
| garage-bAInd | Platypus2 Instruct (70B) | garage-bAInd/Platypus2-70B-instruct | 4096 | FP16 |
| Google | Gemma Instruct (2B) | google/gemma-2b-it | 8192 | FP16 |
| Google | Gemma Instruct (7B) | google/gemma-7b-it | 8192 | FP16 |
| Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b | 4096 | FP16 |
| LM Sys | Vicuna v1.5 (13B) | lmsys/vicuna-13b-v1.5 | 4096 | FP16 |
| LM Sys | Vicuna v1.5 (7B) | lmsys/vicuna-7b-v1.5 | 4096 | FP16 |
| Meta | Code Llama Instruct (13B) | codellama/CodeLlama-13b-Instruct-hf | 16384 | FP16 |
| Meta | Code Llama Instruct (34B) | codellama/CodeLlama-34b-Instruct-hf | 16384 | FP16 |
| Meta | Code Llama Instruct (70B) | codellama/CodeLlama-70b-Instruct-hf | 4096 | FP16 |
| Meta | Code Llama Instruct (7B) | codellama/CodeLlama-7b-Instruct-hf | 16384 | FP16 |
| Meta | LLaMA-2 Chat (70B) | meta-llama/Llama-2-70b-chat-hf | 4096 | FP16 |
| Meta | LLaMA-2 Chat (13B) | meta-llama/Llama-2-13b-chat-hf | 4096 | FP16 |
| Meta | LLaMA-2 Chat (7B) | meta-llama/Llama-2-7b-chat-hf | 4096 | FP16 |
| Meta | LLaMA-3 Chat (8B) | meta-llama/Llama-3-8b-chat-hf | 8192 | FP16 |
| Meta | LLaMA-3 Chat (70B) | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
| mistralai | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | FP16 |
| mistralai | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
| mistralai | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
| mistralai | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | FP16 |
| mistralai | Mixtral-8x22B Instruct (141B) | mistralai/Mixtral-8x22B-Instruct-v0.1 | 65536 | FP16 |
| NousResearch | Nous Capybara v1.9 (7B) | NousResearch/Nous-Capybara-7B-V1p9 | 8192 | FP16 |
| NousResearch | Nous Hermes 2 - Mistral DPO (7B) | NousResearch/Nous-Hermes-2-Mistral-7B-DPO | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-SFT (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | 32768 | FP16 |
| NousResearch | Nous Hermes LLaMA-2 (7B) | NousResearch/Nous-Hermes-llama-2-7b | 4096 | FP16 |
| NousResearch | Nous Hermes Llama-2 (13B) | NousResearch/Nous-Hermes-Llama2-13b | 4096 | FP16 |
| NousResearch | Nous Hermes-2 Yi (34B) | NousResearch/Nous-Hermes-2-Yi-34B | 4096 | FP16 |
| OpenChat | OpenChat 3.5 (7B) | openchat/openchat-3.5-1210 | 8192 | FP16 |
| OpenOrca | OpenOrca Mistral (7B) 8K | Open-Orca/Mistral-7B-OpenOrca | 8192 | FP16 |
| Qwen | Qwen 1.5 Chat (0.5B) | Qwen/Qwen1.5-0.5B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (1.8B) | Qwen/Qwen1.5-1.8B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (4B) | Qwen/Qwen1.5-4B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (7B) | Qwen/Qwen1.5-7B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (14B) | Qwen/Qwen1.5-14B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (32B) | Qwen/Qwen1.5-32B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (72B) | Qwen/Qwen1.5-72B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (110B) | Qwen/Qwen1.5-110B-Chat | 32768 | FP16 |
| Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 | FP16 |
| Snorkel AI | Snorkel Mistral PairRM DPO (7B) | snorkelai/Snorkel-Mistral-PairRM-DPO | 32768 | FP16 |
| Snowflake | Snowflake Arctic Instruct | Snowflake/snowflake-arctic-instruct | 4096 | FP16 |
| Stanford | Alpaca (7B) | togethercomputer/alpaca-7b | 2048 | FP16 |
| Teknium | OpenHermes-2-Mistral (7B) | teknium/OpenHermes-2-Mistral-7B | 8192 | FP16 |
| Teknium | OpenHermes-2.5-Mistral (7B) | teknium/OpenHermes-2p5-Mistral-7B | 8192 | FP16 |
| Together | LLaMA-2-7B-32K-Instruct (7B) | togethercomputer/Llama-2-7B-32K-Instruct | 32768 | FP16 |
| Together | RedPajama-INCITE Chat (3B) | togethercomputer/RedPajama-INCITE-Chat-3B-v1 | 2048 | FP16 |
| Together | RedPajama-INCITE Chat (7B) | togethercomputer/RedPajama-INCITE-7B-Chat | 2048 | FP16 |
| Together | StripedHyena Nous (7B) | togethercomputer/StripedHyena-Nous-7B | 32768 | FP16 |
| Undi95 | ReMM SLERP L2 (13B) | Undi95/ReMM-SLERP-L2-13B | 4096 | FP16 |
| Undi95 | Toppy M (7B) | Undi95/Toppy-M-7B | 4096 | FP16 |
| WizardLM | WizardLM v1.2 (13B) | WizardLM/WizardLM-13B-V1.2 | 4096 | FP16 |
| upstage | Upstage SOLAR Instruct v1 (11B) | upstage/SOLAR-10.7B-Instruct-v1.0 | 4096 | FP16 |
## Dedicated Instances
Dedicated instances are customizable, on-demand model deployments, priced per hour while hosted. All models available on our serverless endpoints can also be hosted as private dedicated instances. In addition, the models below are available for hosting as private dedicated instances.

| Organization | Model Name | API Model String | Context length | Quantization |
|---|---|---|---|---|
| Databricks | Dolly v2 (12B) | databricks/dolly-v2-12b | 2048 | FP16 |
| Databricks | Dolly v2 (3B) | databricks/dolly-v2-3b | 2048 | FP16 |
| Databricks | Dolly v2 (7B) | databricks/dolly-v2-7b | 2048 | FP16 |
| DiscoResearch | DiscoLM Mixtral 8x7b (46.7B) | DiscoResearch/DiscoLM-mixtral-8x7b-v2 | 32768 | FP16 |
| HuggingFace | Zephyr-7B-ß | HuggingFaceH4/zephyr-7b-beta | 32768 | FP16 |
| HuggingFaceH4 | StarCoderChat Alpha (16B) | HuggingFaceH4/starchat-alpha | 8192 | FP16 |
| LAION | Open-Assistant Pythia SFT-4 (12B) | OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 | 2048 | FP16 |
| LAION | Open-Assistant StableLM SFT-7 (7B) | OpenAssistant/stablelm-7b-sft-v7-epoch-3 | 4096 | FP16 |
| LM Sys | Koala (13B) | togethercomputer/Koala-13B | 2048 | FP16 |
| LM Sys | Koala (7B) | togethercomputer/Koala-7B | 2048 | FP16 |
| LM Sys | Vicuna v1.3 (13B) | lmsys/vicuna-13b-v1.3 | 2048 | FP16 |
| LM Sys | Vicuna v1.3 (7B) | lmsys/vicuna-7b-v1.3 | 2048 | FP16 |
| LM Sys | Vicuna-FastChat-T5 (3B) | lmsys/fastchat-t5-3b-v1.0 | 512 | FP16 |
| Mosaic ML | MPT-Chat (30B) | togethercomputer/mpt-30b-chat | 2048 | FP16 |
| Mosaic ML | MPT-Chat (7B) | togethercomputer/mpt-7b-chat | 2048 | FP16 |
| NousResearch | Nous Hermes LLaMA-2 (70B) | NousResearch/Nous-Hermes-Llama2-70b | 4096 | FP16 |
| Qwen | Qwen Chat (7B) | Qwen/Qwen-7B-Chat | 2048 | FP16 |
| Qwen | Qwen Chat (14B) | Qwen/Qwen-14B-Chat | 2048 | FP16 |
| TII | Falcon Instruct (7B) | tiiuae/falcon-7b-instruct | 2048 | FP16 |
| TII | Falcon Instruct (40B) | tiiuae/falcon-40b-instruct | 2048 | FP16 |
| Tim Dettmers | Guanaco (13B) | togethercomputer/guanaco-13b | 2048 | FP16 |
| Tim Dettmers | Guanaco (33B) | togethercomputer/guanaco-33b | 2048 | FP16 |
| Tim Dettmers | Guanaco (65B) | togethercomputer/guanaco-65b | 2048 | FP16 |
| Tim Dettmers | Guanaco (7B) | togethercomputer/guanaco-7b | 2048 | FP16 |
| Together | GPT-NeoXT-Chat-Base (20B) | togethercomputer/GPT-NeoXT-Chat-Base-20B | 2048 | FP16 |
| Together | Pythia-Chat-Base (7B) | togethercomputer/Pythia-Chat-Base-7B-v0.16 | 2048 | FP16 |