Introduction

👋 Welcome to the Together AI docs! Together AI makes it easy to run or fine-tune leading open-source models with only a few lines of code. We offer a variety of generative AI services:

Serverless models

Use the API or playground to evaluate 100+ models that run out of the box on our Inference Engine. You pay only per token or per image.

On-demand dedicated endpoints

Run models on your own private GPU, with a pay-per-second usage model. Start dedicated endpoints here and review our docs.

Monthly reserved dedicated endpoints

Larger-capacity reserved instances with a one-month minimum, including VPC options for large deployments. Contact us.

Fine-Tuning

Fine-tune with a few commands and deploy your fine-tuned model for inference.

GPU Clusters

If you’re interested in private, state-of-the-art clusters with A100 or H100 GPUs, contact us.

Quickstart

See our full quickstart for how to get started with our API in 1 minute.

Python

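from together import Together

# The client reads your API key from the TOGETHER_API_KEY environment variable.
client = Together()

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What are the top 3 things to do in New York?"}],
)

# Print the text of the model's reply.
print(completion.choices[0].message.content)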

Which model should I use?

Together hosts many popular models on our serverless endpoints. You can also use our dedicated GPU infrastructure to configure and host your own model.

With a hosted serverless model, you are charged based on the number of tokens in your queries. With a dedicated model that you configure and run yourself, you are charged per minute for as long as your endpoint is running. You can start or stop your endpoint at any time from our online playground.

To learn more about pricing for both serverless and dedicated endpoints, visit our pricing page. For the current list of available models, see our models pages.
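The two billing modes described above can be compared with a little arithmetic. The sketch below is illustrative only: the function names and both example prices are hypothetical and are not taken from the pricing page, so substitute the real rates before relying on the numbers.

```python
def serverless_cost(tokens_used: int, usd_per_million_tokens: float) -> float:
    """Token-based billing: cost scales with the number of tokens processed."""
    return tokens_used / 1_000_000 * usd_per_million_tokens


def dedicated_cost(minutes_running: float, usd_per_minute: float) -> float:
    """Time-based billing: cost accrues for as long as the endpoint is running."""
    return minutes_running * usd_per_minute


def break_even_tokens_per_minute(usd_per_minute: float,
                                 usd_per_million_tokens: float) -> float:
    """Sustained token throughput at which both billing modes cost the same."""
    return usd_per_minute / usd_per_million_tokens * 1_000_000


# HYPOTHETICAL prices, for illustration only; check the pricing page for real rates.
EXAMPLE_USD_PER_MILLION_TOKENS = 0.20
EXAMPLE_USD_PER_MINUTE = 0.05

if __name__ == "__main__":
    print("serverless, 2M tokens:",
          serverless_cost(2_000_000, EXAMPLE_USD_PER_MILLION_TOKENS))
    print("dedicated, 90 minutes:",
          dedicated_cost(90, EXAMPLE_USD_PER_MINUTE))
    print("break-even tokens/minute:",
          break_even_tokens_per_minute(EXAMPLE_USD_PER_MINUTE,
                                       EXAMPLE_USD_PER_MILLION_TOKENS))
```

Under these assumptions, a workload that stays well below the break-even throughput is likely cheaper on serverless, while a steadily busy workload may favor a dedicated endpoint. The comparison ignores image pricing and any differences in per-model rates.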

Don’t see a model you want to use? Send us a request to add it, or upvote a model you’d love to see on our serverless infrastructure.

Next steps

Check out the Together AI playground to try out different models.
Learn how to stream responses back to your applications.
Explore our examples to learn about various use cases.
See our integrations with leading LLM frameworks.
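As a preview of the streaming step listed above, here is a minimal sketch that reuses the quickstart request with streaming turned on. It assumes the Python client accepts `stream=True` on `chat.completions.create` and yields chunks whose text sits at `choices[0].delta.content`; the helper names `iter_text` and `stream_chat` are hypothetical and not part of the SDK.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text fragment carried by each streamed chunk, skipping empty ones."""
    for chunk in chunks:
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            yield text


def stream_chat(client, prompt: str,
                model: str = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo") -> str:
    """Send one user message, print the reply as it arrives, and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # assumption: asks the API to send the reply incrementally
    )
    parts = []
    for text in iter_text(stream):
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)


# Usage (requires the `together` package and TOGETHER_API_KEY to be set):
#   from together import Together
#   stream_chat(Together(), "What are the top 3 things to do in New York?")
```

Passing the client in as an argument keeps the helper easy to exercise without network access, for example with a stub object that returns a list of fake chunks.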

Resources

Updated 10 days ago
