Authenticated · Scoped · Audited
Smart-Routing API.
A small, opinionated JSON API on top of the same routing brain that powers chat.bcat.app. Send a question, get the right specialist's answer back — no model selection, no orchestration code, no infrastructure.
Requests without a valid credential return 401; out-of-scope calls return 403. All access is audited.
Base URL
https://api.bcat.app/api/v1
api.bcat.app is a dedicated vhost on a post-quantum (PQC) nginx: TLS 1.3 first, with hybrid X25519MLKEM768 key exchange. The chat UI lives on a separate vhost (chat.bcat.app) and does not proxy /api/*, so clients must call api.bcat.app directly.
api.bcat.app is additionally IP-allowlisted at the edge. Approved caller IPs are added to the allowlist before a token is issued; a request must pass both the IP gate and the Bearer token check.
Authentication
Every request must include an Authorization: Bearer <token> header.
Three credential types are accepted, in order of precedence:
1. Service token
A long-lived secret reserved for trusted internal callers. Service tokens have no scope restriction and no rate limit, and rotation is supported via overlapping values. Not issued to external clients.
2. App-client token bck_*
Provisioned by an admin on request. Each token has its own scopes and per-minute rate limit. Tokens are shown once at creation; only a SHA-256 hash is stored, so a leaked database cannot leak the secret. Disabling or revoking takes effect on the next request.
Authorization: Bearer bck_<your-token-here>
3. User JWT (Google OAuth)
Sign in to chat.bcat.app with Google; the resulting session JWT can call the API. Approved users get the scopes ask, embed, models, vision, and extract; admins additionally get infer. The per-user rate limit is 30/min (60/min for admins).
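To illustrate why storing only a SHA-256 hash keeps a leaked database from exposing app-client secrets, here is a minimal sketch of hash-only token storage. It is not the service's actual implementation: the function names, the random part after the documented bck_ prefix, and the hex encoding of the digest are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def issue_token() -> tuple[str, str]:
    """Return (plaintext token, hex SHA-256 digest).

    Hypothetical helper: the plaintext is shown to the caller once,
    and only the digest would be persisted.
    """
    token = "bck_" + secrets.token_urlsafe(32)  # token body format is assumed
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, digest


def verify_token(presented: str, stored_digest: str) -> bool:
    """Hash the presented token and compare it to the stored digest in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)
```

Because the digest cannot be reversed into the token, a database row alone is not enough to authenticate; the caller must still present the original secret.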
Scopes & rate limits
| Scope | Endpoints | Default cap (req/min) |
|---|---|---|
| ask | POST /ask | 30 |
| embed | POST /embed | 60 |
| models | GET /models, GET /specialists | 120 |
| vision | POST /vision (multipart image) | 20 |
| extract | POST /extract (multipart document) | 60 |
| infer | POST /infer (raw passthrough) | 60 (admin) |
| * | wildcard — every non-admin scope | per-token cap |
Per-token caps override the defaults; service tokens are unlimited. Exceeding
a cap returns 429 rate_limit_exceeded with a retry_after field.
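The scope and cap rules in the table can be expressed as two small checks. This is a sketch under stated assumptions rather than the server's logic: it treats infer as the only admin scope (the table marks it "admin"), models a per-token cap as a single optional number, and uses hypothetical function names.

```python
from typing import Optional

# Default caps (requests per minute), copied from the table above.
DEFAULT_CAPS = {
    "ask": 30,
    "embed": 60,
    "models": 120,
    "vision": 20,
    "extract": 60,
    "infer": 60,
}

# Assumption: infer is the only scope the wildcard does not cover.
ADMIN_SCOPES = {"infer"}


def scope_allows(granted: set, required: str) -> bool:
    """True if the granted scopes cover the required one.

    The wildcard "*" stands for every non-admin scope.
    """
    if required in granted:
        return True
    return "*" in granted and required not in ADMIN_SCOPES


def effective_cap(scope: str, token_cap: Optional[int] = None) -> int:
    """A per-token cap, when set, overrides the scope's default cap."""
    return token_cap if token_cap is not None else DEFAULT_CAPS[scope]
```

For example, a token holding only "*" would pass the check for ask but not for infer, which matches the 403 behaviour described for out-of-scope calls.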
Endpoints
POST /api/v1/ask scope: ask
The flagship endpoint. Routes the question to the best specialist (code, math, biology, translation, …), runs it, and returns the answer plus which specialist served it.
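For callers working in Python rather than curl, the request can be sketched with the standard library alone. The field names question and max_tokens come from the curl example in this section; the helper names, the default of 120 for max_tokens, and the timeout are assumptions.

```python
import json
import urllib.request

BASE_URL = "https://api.bcat.app/api/v1"


def build_ask_request(question: str, token: str, max_tokens: int = 120) -> urllib.request.Request:
    """Build (but do not send) a POST /ask request with a Bearer token."""
    body = json.dumps({"question": question, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/ask",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def ask(question: str, token: str, max_tokens: int = 120, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    req = build_ask_request(question, token, max_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the header and body logic easy to inspect without touching the network. Note that the call only succeeds from an allowlisted IP.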
curl -X POST https://api.bcat.app/api/v1/ask \
  -H "Authorization: Bearer $BCAT_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question":"What is the integral of sin(x)?","max_tokens":120}'
{
"answer": "The integral of sin(x) is -cos(x) + C.",
"specialist_used": "math",
"model": "mathstral",
"tokens": { "input": 18, "output": 14 }
}
POST /api/v1/embed scope: embed
Generate an embedding vector. Defaults to nomic-embed-text.
GET /api/v1/models scope: models
List the models currently loaded in the local Ollama instance.
GET /api/v1/specialists scope: models
Self-describing capability discovery. Returns the live specialist registry, the active main and vision models, every installed Ollama tag, and the full scope list. Use this to feature-detect at runtime instead of hardcoding what bcat can do.
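Runtime feature detection against this endpoint might look like the sketch below. It assumes the response has already been fetched and decoded into a dict with the keys shown in this section's sample (specialists, installed_models); the helper names are hypothetical, and the registry in the usage lines is a trimmed stand-in, not real output.

```python
def has_specialist(registry: dict, name: str) -> bool:
    """True if the live specialist registry lists the given specialist key."""
    return name in registry.get("specialists", {})


def has_model(registry: dict, tag: str) -> bool:
    """True if the given model tag appears among the installed models."""
    return tag in registry.get("installed_models", [])


# Trimmed stand-in for a decoded GET /specialists response.
sample = {
    "specialists": {
        "code": {"label": "Coding / software / AI", "summary": "..."},
        "math": {"label": "Mathematics / physics", "summary": "..."},
    },
    "installed_models": ["aya-expanse:8b", "deepseek-coder-v2:16b"],
}

if has_specialist(sample, "math"):
    print("math specialist available")
```

Using .get with a default keeps the check from raising if a key is absent, so a client degrades to "feature not available" instead of failing.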
{
"specialists": {
"code": {"label": "Coding / software / AI", "summary": "..."},
"math": {"label": "Mathematics / physics", "summary": "..."}
},
"ask_targets": ["ancient", "arxiv", "bio", "code", ...],
"active_model": "phi4:14b",
"vision_model": "gemma4:e2b",
"installed_models": ["aya-expanse:8b", "deepseek-coder-v2:16b", ...],
"scopes": ["ask","embed","models","vision","extract","infer","*"]
}
POST /api/v1/vision scope: vision
Image understanding. Multipart POST with form field image (PNG / JPEG / GIF / WebP / BMP, ≤ 8 MB) and an optional prompt. Forwarded to the configured vision model.
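A client can avoid a wasted upload by checking the file before sending it. The sketch below is a client-side pre-flight check only, with several assumptions: it judges the format by file extension (the server may inspect the content type instead), it reads "8 MB" conservatively as 8,000,000 bytes, and the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Conservative reading of the documented 8 MB limit (assumption).
MAX_IMAGE_BYTES = 8_000_000

# Extensions for the documented formats: PNG, JPEG, GIF, WebP, BMP.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}


def check_image(filename: str, size_bytes: int) -> Optional[str]:
    """Return None if the image looks acceptable, else a short reason string."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        return f"unsupported image type: {suffix or '(none)'}"
    if size_bytes > MAX_IMAGE_BYTES:
        return f"image too large: {size_bytes} bytes"
    return None
```

A caller would run this on the path and os.path.getsize result before building the multipart request, and surface the reason string to the user if it is not None.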
POST /api/v1/extract scope: extract
Plain-text extraction from documents. Multipart POST with form field file. Supports text-bearing PDFs, text/*, JSON, XML, HTML, markdown, and common source-code MIME types. Output is capped at 50 000 characters.
POST /api/v1/infer scope: infer (admin-only)
Raw model passthrough. Skips routing — you supply the prompt and target model directly. Reserved for trusted callers; not granted to standard app clients.
Errors
| Code | Meaning |
|---|---|
| 401 | Missing, malformed, expired, or revoked credential |
| 403 | Credential is valid but lacks the required scope |
| 429 | Per-token rate limit exceeded; honour retry_after |
| 5xx | Backend (Ollama / specialist) failure; safe to retry |
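The table maps naturally onto a small retry policy. This sketch assumes retry_after is a number of seconds in the JSON error body (the unit is not stated above), and the backoff schedule for 5xx responses is an arbitrary illustrative choice; the function name is hypothetical.

```python
def next_action(status: int, body: dict, attempt: int) -> tuple:
    """Decide what to do with a response: returns (action, delay_seconds).

    action is "ok", "retry", or "fail"; attempt counts retries so far, from 0.
    """
    if 200 <= status < 300:
        return ("ok", 0.0)
    if status == 429:
        # Honour the server's hint; fall back to 1 second if it is missing.
        return ("retry", float(body.get("retry_after", 1.0)))
    if 500 <= status < 600:
        # Backend failure is safe to retry: exponential backoff capped at 30 s.
        return ("retry", min(30.0, 2.0 ** attempt))
    # 401 and 403 (and any other status) will not succeed on a blind retry.
    return ("fail", 0.0)
```

Retrying a 401 or 403 without changing the credential or scope only consumes rate-limit budget, which is why they are treated as terminal here.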
What is not exposed
- The agent's tool surface (code execution, page fetching, web search, memory writes) is chat-only. It is never reachable through the API, even with a service token.
- User identities, conversations, and stored memories are isolated per-account. App-client tokens have no ambient user identity.
- Administrative endpoints require an admin user JWT; they are not reachable with an app-client token.
Getting a token
Tokens are not self-serve. Open a GitHub issue with the name of your app, the scopes you need, and an estimated request volume. Tokens are revocable at any time and rotated on a fixed schedule.