API Reference

The Unified AI API is fully OpenAI-compatible. If your code already targets OpenAI, change the base URL and the API key; no other changes are needed.

Base URL

https://kadegate.com/api/v1

Authentication

All endpoints require a Bearer token:

Authorization: Bearer uai_xxxxxxxxxxxxxxxxxxxxxxxx

POST /chat/completions

Create a chat completion. Mirrors the OpenAI Chat Completions API.

Request

POST https://kadegate.com/api/v1/chat/completions
Content-Type: application/json
Authorization: Bearer YOUR_KADEGATE_KEY

{
  "model": "google/gemma-4-26b-a4b-it",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false
}
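As a rough sketch, the request above could be assembled with Python's standard library as follows. The helper name build_chat_request is hypothetical and not part of the API. The code only builds the request object; the send step is left commented out because it needs a valid key and network access.

```python
import json
import urllib.request

BASE_URL = "https://kadegate.com/api/v1"  # taken from the Base URL section


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "model": "google/gemma-4-26b-a4b-it",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
    "stream": False,
}

request = build_chat_request("YOUR_KADEGATE_KEY", payload)

# To send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as resp:
#     body = json.load(resp)
```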

Parameters

Field        Type          Description
model        string        Required. One of the supported model ids; see the Models section.
messages     array         Required. Conversation history.
temperature  number        0 to 2. Defaults to 1.
max_tokens   integer       Maximum tokens to generate.
top_p        number        Nucleus sampling.
stream       boolean       Server-sent events streaming.
stop         string|array  Stop sequences.
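The parameter table can be mirrored by a small client-side builder. This is a minimal sketch: build_chat_payload is a hypothetical helper, and the checks reflect only what the table states (required fields, the 0 to 2 temperature range, the default of 1). How the server itself validates input is not documented here.

```python
from typing import Optional, Sequence, Union


def build_chat_payload(
    model: str,
    messages: list,
    temperature: float = 1.0,          # documented default
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    stream: bool = False,
    stop: Union[str, Sequence[str], None] = None,
) -> dict:
    """Assemble a /chat/completions body, omitting unset optional fields."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
    }
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if top_p is not None:
        payload["top_p"] = top_p
    if stop is not None:
        # A single string is sent as-is; any other sequence becomes a list.
        payload["stop"] = stop if isinstance(stop, str) else list(stop)
    return payload


example = build_chat_payload(
    "google/gemma-4-26b-a4b-it",
    [{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=256,
    stop=("\n\n",),
)
```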

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "google/gemma-4-26b-a4b-it",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Paris."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 18, "completion_tokens": 2, "total_tokens": 20}
}
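To illustrate how the response above might be consumed, the following sketch parses the sample body and pulls out the assistant text and token usage. The function name first_message_text is hypothetical.

```python
import json

# The sample response from this section, as a raw JSON string.
raw = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "google/gemma-4-26b-a4b-it",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Paris."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 18, "completion_tokens": 2, "total_tokens": 20}
}
"""


def first_message_text(response: dict) -> str:
    """Return the assistant text of the first choice."""
    return response["choices"][0]["message"]["content"]


response = json.loads(raw)
text = first_message_text(response)                   # "Paris."
finish_reason = response["choices"][0]["finish_reason"]
usage = response["usage"]
```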

POST /completions

Legacy text completion endpoint. Prefer /chat/completions for new code.

POST https://kadegate.com/api/v1/completions
Authorization: Bearer YOUR_KADEGATE_KEY
Content-Type: application/json

{"model":"google/gemma-4-26b-a4b-it","prompt":"The capital of France is","max_tokens":8}
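Since /chat/completions is preferred for new code, one possible migration path is to wrap the legacy prompt in a single user message. The helper legacy_to_chat below is hypothetical and only a sketch. A chat model may answer the prompt rather than continue it, so the output is not guaranteed to match the legacy endpoint.

```python
def legacy_to_chat(legacy_payload: dict) -> dict:
    """Convert a /completions body into a /chat/completions body.

    Hypothetical helper: the prompt becomes one user message and all
    other fields (model, max_tokens, ...) are copied unchanged.
    """
    chat_payload = {k: v for k, v in legacy_payload.items() if k != "prompt"}
    chat_payload["messages"] = [
        {"role": "user", "content": legacy_payload["prompt"]}
    ]
    return chat_payload


legacy = {
    "model": "google/gemma-4-26b-a4b-it",
    "prompt": "The capital of France is",
    "max_tokens": 8,
}
chat = legacy_to_chat(legacy)
```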

GET /models

Lists the models available to your account. Pass any of these id values as "model" in your request; they are forwarded as-is.

Model ID                          Context
google/gemma-4-26b-a4b-it         262K
google/gemma-4-31b-it             256K
meta-llama/llama-4-maverick       1M
minimax/minimax-m2.7              196K
minimax/minimax-m3                524K
qwen/qwen3.5-plus-20260420        1K
xiaomi/mimo-v2-flash              262K
z-ai/glm-4.7-flash                262K
anthropic/claude-opus-4.6         1M
google/gemma-4-26b-a4b-it:free    262K
google/gemma-4-31b-it:free        262K
claude-opus-4-8
claude-opus-4-7                   256K
gpt-5.4-nano                      256K

curl https://kadegate.com/api/v1/models \
  -H "Authorization: Bearer YOUR_KADEGATE_KEY"
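The response body of GET /models is not shown in this reference. Assuming it follows the OpenAI list shape (an object with a "data" array whose items each carry an "id"), the ids could be extracted as sketched below. Both that shape and the sample body are assumptions for illustration, and model_ids is a hypothetical helper.

```python
import json


def model_ids(models_response: dict) -> list:
    """Return the id of every model in an OpenAI-style list response.

    Assumes the shape {"object": "list", "data": [{"id": ...}, ...]},
    which this reference does not confirm.
    """
    return [entry["id"] for entry in models_response.get("data", [])]


# Hypothetical sample body, using ids from the table above.
sample_body = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "google/gemma-4-26b-a4b-it", "object": "model"},
    {"id": "meta-llama/llama-4-maverick", "object": "model"}
  ]
}
""")

available = model_ids(sample_body)
is_supported = "google/gemma-4-26b-a4b-it" in available
```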

Streaming

Set "stream": true to receive Server-Sent Events. The response stream uses the OpenAI SSE format with data: lines and a final [DONE] sentinel.

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" there"},"index":0}]}
data: [DONE]
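A minimal sketch of consuming the stream: read each data: line, stop at the [DONE] sentinel, and concatenate the delta.content fragments. The function iter_content_deltas is a hypothetical helper, and the sample lines are the ones shown above. A real client would feed it lines read from the HTTP response.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample_stream = [
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" there"},"index":0}]}',
    "data: [DONE]",
]

text = "".join(iter_content_deltas(sample_stream))  # "Hello there"
```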

SDKs

Use any OpenAI SDK by overriding the base URL:

# Python
from openai import OpenAI
client = OpenAI(base_url="https://kadegate.com/api/v1", api_key="...")

// Node
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "https://kadegate.com/api/v1", apiKey: "..." });