Chat Completion API Documentation

Endpoint

POST /api/chat/completions

This endpoint generates chat completions using the specified model and conversation history. It supports both non-streaming and streaming responses.


Request Format

Body Parameters

The request body should be in JSON format with the following fields:

model (string): The model name to use (e.g., brogevity-mini).
messages (List<Dict>): The conversation history, including system, user, and assistant roles. Each entry must have a role and content field.
temperature (float, default: 1.0): Controls the randomness of the output. Lower values (e.g., 0.2) make output more deterministic.
top_p (float, default: 1.0): Nucleus sampling, an alternative to temperature. Only tokens within the top cumulative probability mass top_p are considered.
n (int, default: 1): Number of completions to generate.
stream (bool, default: false): If true, streams the response in chunks.
stop (string or List<string>, optional): Sequence(s) at which generation stops.
max_tokens (int, optional): Maximum number of tokens to generate in the completion.
presence_penalty (float, default: 0.0): Penalizes new tokens based on their presence in the text so far, encouraging new topics.
frequency_penalty (float, default: 0.0): Penalizes tokens based on their frequency in the text so far.
logit_bias (Dict<string, float>, optional): Modifies the likelihood of specific tokens appearing in the completion.
user (string, optional): A unique identifier for tracking the user making the request.
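As a minimal sketch, the parameters above can be assembled into a request body in Python. The build_chat_request helper below is hypothetical (not part of the API); it applies the documented defaults client-side for clarity and includes optional fields only when supplied.

```python
def build_chat_request(model, messages, **options):
    """Assemble a request body for POST /api/chat/completions.

    Hypothetical helper for illustration; the server applies these
    defaults itself if the fields are omitted.
    """
    payload = {
        "model": model,
        "messages": messages,
        # Defaults as documented in the parameter table above.
        "temperature": options.get("temperature", 1.0),
        "top_p": options.get("top_p", 1.0),
        "n": options.get("n", 1),
        "stream": options.get("stream", False),
        "presence_penalty": options.get("presence_penalty", 0.0),
        "frequency_penalty": options.get("frequency_penalty", 0.0),
    }
    # Optional fields have no documented default, so include them
    # only when the caller provides them.
    for key in ("stop", "max_tokens", "logit_bias", "user"):
        if key in options:
            payload[key] = options[key]
    return payload
```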

Example Request

{
    "model": "brogevity-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What does Andrew Huberman think about sleep?"}
    ],
    "temperature": 0.7,
    "stream": false
}
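The example request above could be sent with Python's standard library. The base URL and bearer-token auth header below are placeholder assumptions, not documented behavior; substitute the values for your deployment.

```python
import json
import urllib.request

# Placeholder host; replace with your actual API base URL.
API_URL = "https://api.example.com/api/chat/completions"

def build_http_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap the JSON body in a POST request with JSON and auth headers.

    The Authorization scheme here is an assumption for illustration.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def send_chat_completion(payload: dict, api_key: str) -> dict:
    """POST the payload and decode the JSON response."""
    with urllib.request.urlopen(build_http_request(payload, api_key)) as resp:
        return json.load(resp)
```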


Response Format

Non-Streaming Response

When stream is set to false, the response is returned as a single JSON object:

id (string): Unique identifier for the completion.
object (string): The object type (e.g., chat.completion).
created (int): Unix timestamp (seconds) when the completion was created.
model (string): The model used for the completion.
choices (List<Dict>): The generated messages. Each choice includes an index, a message, and a finish_reason.
usage (Dict): Token usage statistics (prompt_tokens, completion_tokens, total_tokens).

Example Non-Streaming Response

{
    "id": "bro5e804520a51740f39bc321f0",
    "object": "chat.completion",
    "created": 1732906423,
    "model": "brogevity-mini",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Andrew Huberman emphasizes the critical role of sleep in cognitive and physical health..."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 5319,
        "completion_tokens": 492,
        "total_tokens": 5811
    }
}
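A non-streaming response is consumed by reading the first entry in choices; a short Python sketch using the example response above:

```python
import json

# The example response above, as it would arrive from the server.
raw = '''{
    "id": "bro5e804520a51740f39bc321f0",
    "object": "chat.completion",
    "created": 1732906423,
    "model": "brogevity-mini",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant",
                     "content": "Andrew Huberman emphasizes the critical role of sleep in cognitive and physical health..."},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 5319, "completion_tokens": 492, "total_tokens": 5811}
}'''

response = json.loads(raw)

# The generated text lives at choices[0].message.content.
answer = response["choices"][0]["message"]["content"]
finish = response["choices"][0]["finish_reason"]
tokens_used = response["usage"]["total_tokens"]
```

Checking finish_reason distinguishes a natural stop from a truncated completion (e.g., one cut off by max_tokens).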


Streaming Response