Query a language, code, or image model.
A string providing context for the model to complete.
"<s>[INST] What is the capital of France? [/INST]"
The name of the model to query.
"mistralai/Mixtral-8x7B-Instruct-v0.1"
The maximum number of tokens to generate.
A list of string sequences that will truncate (stop) inference text output.
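As an illustration, a minimal request combining the fields above might look like the following Python sketch. The endpoint URL, the TOGETHER_API_KEY environment variable, and the use of the requests library are assumptions for the example, not part of this reference.

```python
import os
import requests

# Assumed endpoint and bearer-token auth scheme; adjust for your deployment.
API_URL = "https://api.together.xyz/v1/completions"
headers = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "prompt": "<s>[INST] What is the capital of France? [/INST]",
    "max_tokens": 64,
    "stop": ["</s>"],  # stop sequences are model specific; this one is illustrative
}

resp = requests.post(API_URL, headers=headers, json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

The response shape (choices[0].text) is also an assumption here; check the response schema for the exact fields.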
Determines the degree of randomness in the response.
The top_p (nucleus) parameter dynamically adjusts the number of token choices considered at each step, based on their cumulative probability.
The top_k parameter limits the number of candidate tokens considered for the next prediction.
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
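The four sampling controls above are plain request fields. A sketch of how they might be combined, with illustrative values rather than recommendations:

```python
# Sampling controls added to the request payload from the earlier sketch.
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "prompt": "<s>[INST] Write a haiku about Paris. [/INST]",
    "max_tokens": 64,
    "temperature": 0.7,         # higher values -> more random output
    "top_p": 0.9,               # keep tokens within 90% cumulative probability
    "top_k": 50,                # consider only the 50 most likely next tokens
    "repetition_penalty": 1.1,  # values above 1 discourage repeated sequences
}
```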
If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with data: [DONE].
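A sketch of consuming the stream with the requests library, assuming the same endpoint and headers as the first example, and assuming each event line carries a JSON chunk shaped like the non-streaming response:

```python
import json
import os
import requests

API_URL = "https://api.together.xyz/v1/completions"  # assumed endpoint
headers = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "prompt": "<s>[INST] What is the capital of France? [/INST]",
    "max_tokens": 64,
    "stream": True,
}

with requests.post(API_URL, headers=headers, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        data = line.decode("utf-8").removeprefix("data: ")
        if data == "[DONE]":  # the stream terminates with data: [DONE]
            break
        chunk = json.loads(data)
        print(chunk["choices"][0]["text"], end="", flush=True)
```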
Determines the number of most likely tokens, along with their log probabilities, to return at each token position.
0 <= x <= 1
If set, the response will contain the prompt, and will also return prompt log probabilities when used together with logprobs.
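For example, requesting log probabilities together with the echoed prompt might look like this (field shapes as described above; values illustrative):

```python
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "prompt": "<s>[INST] What is the capital of France? [/INST]",
    "max_tokens": 16,
    "logprobs": 1,  # return the top log probability at each token position
    "echo": True,   # include the prompt (and its logprobs) in the response
}
```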
Number of generations to return.
1 <= x <= 128
The name of the safety model to use.
"safety_model_name"
The min_p parameter is a number between 0 and 1 and an alternative to sampling with temperature: tokens whose probability falls below min_p relative to the most likely token are filtered out.
The presence_penalty parameter is a number between -2.0 and 2.0 where a positive value increases the model's likelihood of talking about new topics.
The frequency_penalty parameter is a number between -2.0 and 2.0 where a positive value decreases the likelihood of repeating tokens that have already appeared in the output.
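A sketch showing these three parameters together (values illustrative, not recommendations):

```python
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "prompt": "<s>[INST] Brainstorm five blog post ideas. [/INST]",
    "max_tokens": 128,
    "min_p": 0.05,             # drop tokens far less likely than the top token
    "presence_penalty": 0.6,   # positive: nudge the model toward new topics
    "frequency_penalty": 0.4,  # positive: discourage already-used tokens
}
```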
The logit_bias parameter adjusts the likelihood of specific tokens appearing in the generated output.
{ "105": 21.4, "1024": -10.5 }