Context Cache Chat Completion¶
1. Overview¶
Call this endpoint to send a request that references cached context to the model. Before using it, call Context Cache Creation to obtain a cache ID, then pass that ID to this endpoint in the context_id field.
2. Request¶
- Method: POST
- Endpoint: https://gateway.serevixai.ai/v1/context/chat/completions
3. Parameters¶
3.1 Header Parameters¶
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| Content-Type | string | Yes | Sets the request content type. It must be application/json. | application/json |
| Accept | string | Yes | Sets the response content type. The recommended value is application/json. | application/json |
| Authorization | string | Yes | API key required for authentication, in the format Bearer $YOUR_API_KEY. | Bearer $YOUR_API_KEY |
3.2 Body Parameters (application/json)¶
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| context_id | string | Yes | The cache ID used to reference the stored context. | ctx-20241211104333-12345 |
| model | string | Yes | The model ID to use. See Model List for available versions, such as Doubao-1.5-pro-32k. | Doubao-1.5-pro-32k |
| messages | array | Yes | A chat message list in an OpenAI-compatible format. Each object contains role and content. | [{"role": "user","content": "Hello"}] |
| role | string | No | Message role. Supported values: system, user, and assistant. | user |
| content | string | No | The message content. | Hello, tell me a joke. |
| temperature | number | No | Sampling temperature in the range 0-2. Higher values make the output more random, while lower values make it more focused and deterministic. | 0.7 |
| top_p | number | No | Another way to control the sampling distribution, in the range 0-1. It is usually used instead of temperature. | 0.9 |
| n | number | No | How many completions to generate for each input message. | 1 |
| stream | boolean | No | Whether to enable streaming output. When set to true, the API returns ChatGPT-style streamed data. | false |
| stop | string | No | You can specify up to 4 stop strings. Generation stops when one of them appears in the output. | "\n" |
| max_tokens | number | No | The maximum number of tokens that can be generated in a single reply, subject to the model context window. | 1024 |
| presence_penalty | number | No | -2.0 to 2.0. Positive values encourage the model to introduce new topics, while negative values reduce that tendency. | 0 |
| frequency_penalty | number | No | -2.0 to 2.0. Positive values reduce repetition, while negative values increase it. | 0 |
4. Request Examples¶
POST /v1/context/chat/completions
Content-Type: application/json
Accept: application/json
Authorization: Bearer $YOUR_API_KEY
{
  "context_id": "ctx-20241211104333-12345",
  "model": "Doubao-1.5-pro-32k",
  "messages": [
    {
      "role": "user",
      "content": "Hello, can you explain quantum mechanics to me?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1024
}
curl https://gateway.serevixai.ai/v1/context/chat/completions \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer $YOUR_API_KEY" \
-d "{
\"context_id\": \"ctx-20241211104333-12345\",
\"model\": \"Doubao-1.5-pro-32k\",
\"messages\": [{
\"role\": \"user\",
\"content\": \"Hello, can you explain quantum mechanics to me?\"
}],
\"temperature\": 0.7,
\"max_tokens\": 1024
}"
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

const (
	YOUR_API_KEY = "sk-123456789012345678901234567890123456789012345678"

	REQUEST_PAYLOAD = `{
  "context_id": "ctx-20241211104333-12345",
  "model": "Doubao-1.5-pro-32k",
  "messages": [{
    "role": "user",
    "content": "Hello, can you explain quantum mechanics to me?"
  }],
  "temperature": 0.7,
  "max_tokens": 1024
}`
)

func main() {
	requestURL := "https://gateway.serevixai.ai/v1/context/chat/completions"

	req, err := http.NewRequest("POST", requestURL, strings.NewReader(REQUEST_PAYLOAD))
	if err != nil {
		fmt.Println("Create request failed, err:", err)
		return
	}
	req.Header.Add("Content-Type", "application/json")
	req.Header.Add("Accept", "application/json")
	req.Header.Add("Authorization", "Bearer "+YOUR_API_KEY)

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Do request failed, err:", err)
		return
	}
	defer resp.Body.Close()

	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
	respBodyBytes, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Read response body failed, err:", err)
		return
	}
	fmt.Println(string(respBodyBytes))
}
5. Response Example¶
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "Doubao-1.5-pro-32k",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Quantum mechanics is the branch of physics that studies the microscopic world..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 64,
    "completion_tokens": 13,
    "total_tokens": 77,
    "prompt_tokens_details": {
      "cached_tokens": 50
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}