
Context Cache Chat Completion

1. Overview

Call this endpoint to send a request that reuses cached context. Before using it, call Context Cache Creation to obtain a cache ID, then reference that cache through this endpoint's context_id field.

2. Request

  • Method: POST
  • Endpoint:

    https://gateway.serevixai.ai/v1/context/chat/completions
    

3. Parameters

3.1 Header Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Content-Type | string | Yes | Sets the request content type. It must be application/json. | application/json |
| Accept | string | Yes | Sets the response content type. The recommended value is application/json. | application/json |
| Authorization | string | Yes | API key required for authentication, in the format Bearer $YOUR_API_KEY. | Bearer $YOUR_API_KEY |

3.2 Body Parameters (application/json)

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| context_id | string | Yes | The cache ID used to reference the stored context. | ctx-20241211104333-12345 |
| model | string | Yes | The model ID to use. See Model List for available versions, such as Doubao-1.5-pro-32k. | Doubao-1.5-pro-32k |
| messages | array | Yes | A chat message list in an OpenAI-compatible format. Each object contains role and content. | [{"role": "user", "content": "Hello"}] |
| messages[].role | string | No | Role of a message object. Supported values: system, user, and assistant. | user |
| messages[].content | string | No | The message content. | Hello, tell me a joke. |
| temperature | number | No | Sampling temperature in the range 0-2. Higher values make the output more random; lower values make it more focused and deterministic. | 0.7 |
| top_p | number | No | Nucleus sampling parameter in the range 0-1, usually used instead of temperature. | 0.9 |
| n | number | No | How many completions to generate for each input message. | 1 |
| stream | boolean | No | Whether to enable streaming output. When set to true, the API returns OpenAI-style streamed data. | false |
| stop | string | No | Up to 4 stop sequences. Generation stops when one of them appears in the output. | "\n" |
| max_tokens | number | No | The maximum number of tokens that can be generated in a single reply, subject to the model context window. | 1024 |
| presence_penalty | number | No | -2.0 to 2.0. Positive values encourage the model to introduce new topics; negative values reduce that tendency. | 0 |
| frequency_penalty | number | No | -2.0 to 2.0. Positive values reduce repetition; negative values increase it. | 0 |

4. Request Examples

HTTP:

POST /v1/context/chat/completions
Content-Type: application/json
Accept: application/json
Authorization: Bearer $YOUR_API_KEY

{
    "context_id": "ctx-20241211104333-12345",
    "model": "Doubao-1.5-pro-32k",
    "messages": [
        {
            "role": "user",
            "content": "Hello, can you explain quantum mechanics to me?"
        }
    ],
    "temperature": 0.7,
    "max_tokens": 1024
}

cURL:

curl https://gateway.serevixai.ai/v1/context/chat/completions \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    -H "Authorization: Bearer $YOUR_API_KEY" \
    -d "{
    \"context_id\": \"ctx-20241211104333-12345\",
    \"model\": \"Doubao-1.5-pro-32k\",
    \"messages\": [{
        \"role\": \"user\",
        \"content\": \"Hello, can you explain quantum mechanics to me?\"
    }],
    \"temperature\": 0.7,
    \"max_tokens\": 1024
}"

Go:

package main

import (
    "fmt"
    "io"
    "net/http"
    "strings"
)

const (
    YOUR_API_KEY    = "sk-123456789012345678901234567890123456789012345678"
    REQUEST_PAYLOAD = `{
    "context_id": "ctx-20241211104333-12345",
    "model": "Doubao-1.5-pro-32k",
    "messages": [{
        "role": "user",
        "content": "Hello, can you explain quantum mechanics to me?"
    }],
    "temperature": 0.7,
    "max_tokens": 1024
}`
)

func main() {

    requestURL := "https://gateway.serevixai.ai/v1/context/chat/completions"
    requestPayload := strings.NewReader(REQUEST_PAYLOAD)

    req, err := http.NewRequest(http.MethodPost, requestURL, requestPayload)
    if err != nil {
        fmt.Println("Create request failed, err:", err)
        return
    }

    req.Header.Add("Content-Type", "application/json")
    req.Header.Add("Accept", "application/json")
    req.Header.Add("Authorization", "Bearer "+YOUR_API_KEY)

    client := &http.Client{}

    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("Do request failed, err:", err)
        return
    }
    defer resp.Body.Close()

    respBodyBytes, err := io.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("Read response body failed, err:", err)
        return
    }
    fmt.Println(string(respBodyBytes))
}
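When stream is set to true, the response arrives as a stream rather than a single JSON body. The parser below assumes the OpenAI-compatible server-sent-events framing (`data: {...}` chunks terminated by `data: [DONE]`); the parameter table only says the output is OpenAI-style, so treat this as a sketch and check the gateway's streaming documentation for the authoritative shape:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// streamChunk models one streamed delta in the assumed
// OpenAI-compatible format.
type streamChunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// extractStreamContent walks an SSE body line by line and collects the
// content deltas, stopping at the [DONE] sentinel.
func extractStreamContent(body string) ([]string, error) {
	var parts []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "data:") {
			continue // skip blank lines and comments
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var chunk streamChunk
		if err := json.Unmarshal([]byte(payload), &chunk); err != nil {
			return nil, err
		}
		for _, c := range chunk.Choices {
			parts = append(parts, c.Delta.Content)
		}
	}
	return parts, sc.Err()
}

func main() {
	// Sample stream standing in for a real response body.
	sample := "data: {\"choices\":[{\"delta\":{\"content\":\"Hello\"}}]}\n" +
		"data: {\"choices\":[{\"delta\":{\"content\":\" world\"}}]}\n" +
		"data: [DONE]\n"
	parts, err := extractStreamContent(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(parts, "")) // Hello world
}
```

In a real client, the same loop would read from resp.Body instead of a string.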

5. Response Example

{
    "id": "chatcmpl-1234567890",
    "object": "chat.completion",
    "created": 1699999999,
    "model": "Doubao-1.5-pro-32k",
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Quantum mechanics is the branch of physics that studies the microscopic world..."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 64,
        "completion_tokens": 13,
        "total_tokens": 77,
        "prompt_tokens_details": {
            "cached_tokens": 50
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
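The response above can be decoded with a small set of structs; encoding/json ignores fields that are not declared, so the sketch below keeps only what a client typically reads. Field names mirror the example response, not an official SDK:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatResponse models the parts of the completion response shown above.
type chatResponse struct {
	ID      string `json:"id"`
	Model   string `json:"model"`
	Choices []struct {
		Message struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"message"`
		FinishReason string `json:"finish_reason"`
	} `json:"choices"`
	Usage struct {
		PromptTokens        int `json:"prompt_tokens"`
		CompletionTokens    int `json:"completion_tokens"`
		TotalTokens         int `json:"total_tokens"`
		PromptTokensDetails struct {
			// Tokens served from the context cache rather than reprocessed.
			CachedTokens int `json:"cached_tokens"`
		} `json:"prompt_tokens_details"`
	} `json:"usage"`
}

// parseChatResponse unmarshals a raw completion body.
func parseChatResponse(raw []byte) (chatResponse, error) {
	var r chatResponse
	err := json.Unmarshal(raw, &r)
	return r, err
}

func main() {
	sample := []byte(`{"id":"chatcmpl-1234567890","model":"Doubao-1.5-pro-32k",` +
		`"choices":[{"message":{"role":"assistant","content":"Quantum mechanics..."},"finish_reason":"stop"}],` +
		`"usage":{"prompt_tokens":64,"completion_tokens":13,"total_tokens":77,` +
		`"prompt_tokens_details":{"cached_tokens":50}}}`)
	r, err := parseChatResponse(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Choices[0].Message.Content)
	fmt.Println("cached tokens:", r.Usage.PromptTokensDetails.CachedTokens)
}
```

Comparing usage.prompt_tokens_details.cached_tokens against usage.prompt_tokens shows how much of the prompt was served from the cache.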