Authorizations
Body
A string providing context for the model to complete.
"<s>[INST] What is the capital of France? [/INST]"
The name of the model to query.
"mistralai/Mixtral-8x7B-Instruct-v0.1"
The maximum number of tokens to generate.
A list of string sequences that will truncate (stop) inference text output.
Determines the degree of randomness in the response.
The top_p (nucleus) parameter is used to dynamically adjust the number of choices for each predicted token based on the cumulative probabilities.
The top_k parameter is used to limit the number of choices for the next predicted word or token.
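The top_p and top_k rules can be sketched in a few lines: rank tokens by probability, then keep either the smallest prefix whose cumulative mass reaches top_p, or simply the k highest-ranked tokens. This is a minimal illustration of the sampling rules described above, not the provider's implementation; the function names are hypothetical.

```python
def nucleus_filter(probs, top_p):
    # top_p rule: keep the smallest set of probability-ranked tokens
    # whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return kept

def top_k_filter(probs, k):
    # top_k rule: keep only the k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]
```

With probabilities [0.5, 0.3, 0.15, 0.05], top_p=0.8 keeps the first two tokens, while top_k=3 keeps the first three; sampling then proceeds over the kept set only.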
A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition.
If set, tokens are returned as Server-Sent Events as they are made available. The stream terminates with data: [DONE]
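When streaming is enabled, each event arrives as a "data: ..." line and the stream ends with the data: [DONE] sentinel described above. A minimal sketch of parsing such a stream (the helper name and event shape are illustrative assumptions):

```python
import json

def iter_sse_events(lines):
    # Yield the decoded JSON payload of each Server-Sent Event,
    # stopping at the terminating "data: [DONE]" sentinel.
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and comments between events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)
```

In practice the lines would come from the HTTP response body of a streaming request; the generator stops as soon as the sentinel is seen.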
Determines the number of most likely tokens, and their log probabilities, to return at each token position.
0 <= x <= 1
If set, the response will contain the prompt, and will also return prompt logprobs when used together with logprobs.
Number of generations to return
1 <= x <= 128
The name of the safety model to use.
"safety_model_name"
The min_p parameter is a number between 0 and 1 and an alternative to temperature.
The presence_penalty parameter is a number between -2.0 and 2.0, where a positive value increases the likelihood of the model talking about new topics.
The frequency_penalty parameter is a number between -2.0 and 2.0, where a positive value decreases the likelihood of repeating tokens that appeared earlier.
The logit_bias parameter adjusts the likelihood of specific tokens appearing in the generated output.
{ "105": 21.4, "1024": -10.5 }
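Putting the parameters above together, a request body might look like the following sketch. The model name, prompt, and logit_bias values are the examples given in this reference; the remaining values (max_tokens, stop sequences, sampling settings) are illustrative assumptions, and the actual endpoint URL and authorization header are not shown here.

```python
import json

# Illustrative request body built from the parameters documented above.
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "prompt": "<s>[INST] What is the capital of France? [/INST]",
    "max_tokens": 128,            # assumed value
    "stop": ["</s>", "[/INST]"],  # assumed stop sequences
    "temperature": 0.7,           # assumed sampling settings
    "top_p": 0.9,
    "top_k": 50,
    "n": 1,
    "logit_bias": {"105": 21.4, "1024": -10.5},
}
body = json.dumps(payload)  # serialized JSON to send as the request body
```

The body would then be POSTed to the completions endpoint with an Authorization header carrying the API key.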