Context Cache Chat Completion¶
1. Overview¶
Call this endpoint to send a request that references cached context to the model. Before using it, call Context Cache Creation to obtain a cache ID, then pass that ID to this endpoint in the context_id field.
2. Request¶
- Method: POST
- Endpoint: https://gateway.serevixai.ai/v1/context/chat/completions
3. Parameters¶
3.1 Header Parameters¶
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| Content-Type | string | Yes | Sets the request content type. It must be application/json. | application/json |
| Accept | string | Yes | Sets the response content type. The recommended value is application/json. | application/json |
| Authorization | string | Yes | API key required for authentication, in the format Bearer $YOUR_API_KEY. | Bearer $YOUR_API_KEY |
3.2 Body Parameters (application/json)¶
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| context_id | string | Yes | The cache ID used to reference the stored context. | ctx-20241211104333-12345 |
| model | string | Yes | The model ID to use. See Model List for available versions, such as Doubao-1.5-pro-32k. | Doubao-1.5-pro-32k |
| messages | array | Yes | A chat message list in an OpenAI-compatible format. Each object contains role and content. | [{"role": "user","content": "Hello"}] |
| role | string | No | Message role. Supported values: system, user, and assistant. | user |
| content | string | No | The message content. | Hello, tell me a joke. |
| temperature | number | No | Sampling temperature in the range 0-2. Higher values make the output more random, while lower values make it more focused and deterministic. | 0.7 |
| top_p | number | No | Another way to control the sampling distribution, in the range 0-1. It is usually used instead of temperature. | 0.9 |
| n | number | No | How many completions to generate for each input message. | 1 |
| stream | boolean | No | Whether to enable streaming output. When set to true, the API returns ChatGPT-style streamed data. | false |
| stop | string | No | You can specify up to 4 stop strings. Generation stops when one of them appears in the output. | "\n" |
| max_tokens | number | No | The maximum number of tokens that can be generated in a single reply, subject to the model context window. | 1024 |
| presence_penalty | number | No | -2.0 to 2.0. Positive values encourage the model to introduce new topics, while negative values reduce that tendency. | 0 |
| frequency_penalty | number | No | -2.0 to 2.0. Positive values reduce repetition, while negative values increase it. | 0 |
4. Request Examples¶
POST /v1/context/chat/completions
Content-Type: application/json
Accept: application/json
Authorization: Bearer $YOUR_API_KEY
{
  "context_id": "ctx-20241211104333-12345",
  "model": "Doubao-1.5-pro-32k",
  "messages": [
    {
      "role": "user",
      "content": "Hello, can you explain quantum mechanics to me?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1024
}
curl https://gateway.serevixai.ai/v1/context/chat/completions \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer $YOUR_API_KEY" \
-d "{
\"context_id\": \"ctx-20241211104333-12345\",
\"model\": \"Doubao-1.5-pro-32k\",
\"messages\": [{
\"role\": \"user\",
\"content\": \"Hello, can you explain quantum mechanics to me?\"
}],
\"temperature\": 0.7,
\"max_tokens\": 1024
}"
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

const (
	YOUR_API_KEY = "sk-123456789012345678901234567890123456789012345678"

	REQUEST_PAYLOAD = `{
  "context_id": "ctx-20241211104333-12345",
  "model": "Doubao-1.5-pro-32k",
  "messages": [{
    "role": "user",
    "content": "Hello, can you explain quantum mechanics to me?"
  }],
  "temperature": 0.7,
  "max_tokens": 1024
}`
)

func main() {
	requestURL := "https://gateway.serevixai.ai/v1/context/chat/completions"

	req, err := http.NewRequest("POST", requestURL, strings.NewReader(REQUEST_PAYLOAD))
	if err != nil {
		fmt.Println("Create request failed, err:", err)
		return
	}
	req.Header.Add("Content-Type", "application/json")
	req.Header.Add("Accept", "application/json")
	req.Header.Add("Authorization", "Bearer "+YOUR_API_KEY)

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Do request failed, err:", err)
		return
	}
	defer resp.Body.Close()

	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
	respBodyBytes, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Read response body failed, err:", err)
		return
	}
	fmt.Println(string(respBodyBytes))
}
5. Response Example¶
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "Doubao-1.5-pro-32k",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Quantum mechanics is the branch of physics that studies the microscopic world..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 64,
    "completion_tokens": 13,
    "total_tokens": 77,
    "prompt_tokens_details": {
      "cached_tokens": 50
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}