Authenticated · Scoped · Audited

Smart-Routing API.

A small, opinionated JSON API on top of the same routing brain that powers chat.bcat.app. Send a question, get the right specialist's answer back — no model selection, no orchestration code, no infrastructure.

Authorization required. The API is not a public service. Every request must carry a valid Bearer token. Unauthenticated calls return 401; out-of-scope calls return 403. All access is audited.

Base URL

https://api.bcat.app/api/v1

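The API is served from a dedicated api.bcat.app vhost on an nginx build with post-quantum cryptography (PQC) enabled: TLS 1.3 is preferred, with the hybrid X25519MLKEM768 key exchange. The chat UI lives on a separate vhost (chat.bcat.app) and does not proxy /api/*, so clients must call api.bcat.app directly.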

api.bcat.app is additionally IP-allowlisted at the edge. Approved caller IPs are added to the allowlist before a token is issued; both the IP gate and the Bearer token must pass.

Authentication

Every request must include an Authorization: Bearer <token> header. Three credential types are accepted, in order of precedence:

1. Service token

A long-lived secret reserved for trusted internal callers. Service tokens have no scope restriction and no rate limit. Rotation is supported by accepting overlapping old and new values. Not issued to external clients.

2. App-client token `bck_*`

Provisioned by an admin on request. Each token has its own scopes and per-minute rate limit. Tokens are shown once at creation; only a SHA-256 hash is stored, so a leaked database does not expose the token itself. Disabling or revoking a token takes effect on the next request.

Authorization: Bearer bck_<your-token-here>
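
The hash-only storage described above can be illustrated with a minimal sketch. It is not bcat's actual implementation: the function names, the random token length, and the idea of comparing hex digests are assumptions chosen for illustration. Only the `bck_` prefix and the use of SHA-256 come from this page.

```python
import hashlib
import hmac
import secrets


def issue_token() -> tuple[str, str]:
    """Return (plaintext_token, sha256_hex).

    The plaintext is shown to the client once; only the digest would be stored.
    """
    # Hypothetical format: the documented bck_ prefix plus random URL-safe text.
    token = "bck_" + secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, digest


def verify_token(presented: str, stored_digest: str) -> bool:
    """Hash the presented token and compare it to the stored digest."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(candidate, stored_digest)
```

Because the stored value is a one-way digest of a high-entropy random string, reading the database does not yield a usable Bearer token.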

3. User JWT (Google OAuth)

Sign in to chat.bcat.app with Google; the resulting session JWT can be used as the Bearer token. Approved users get the scopes ask, embed, models, vision, and extract; admins additionally get infer. The per-user rate limit is 30/min (60/min for admins).

Scopes & rate limits

Scope	Endpoints	Default cap (req/min)
ask	`POST /ask`	30
embed	`POST /embed`	60
models	`GET /models`, `GET /specialists`	120
vision	`POST /vision` (multipart image)	20
extract	`POST /extract` (multipart document)	60
infer	`POST /infer` (raw passthrough)	60 (admin)
*	wildcard — every non-admin scope	per-token cap

Per-token caps override the defaults; service tokens are unlimited. Exceeding a cap returns 429 rate_limit_exceeded with a retry_after field.
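
A client can avoid most 429 responses by pacing its own calls. The sketch below spaces requests at least 60/cap seconds apart. The `Pacer` class is hypothetical and not part of any bcat SDK, and how the server actually windows its counters is not documented here, so treat this as a conservative approximation.

```python
import time


class Pacer:
    """Keep successive calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.interval
        return delay
```

For the default `ask` cap of 30 req/min, `Pacer(30).wait()` before each call keeps requests two seconds apart.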

Endpoints

`POST` /api/v1/ask scope: ask

The flagship endpoint. Routes the question to the best specialist (code, math, biology, translation, …), runs it, and returns the answer plus which specialist served it.

curl -X POST https://api.bcat.app/api/v1/ask \
  -H "Authorization: Bearer $BCAT_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question":"What is the integral of sin(x)?","max_tokens":120}'

{
  "answer": "The integral of sin(x) is -cos(x) + C.",
  "specialist_used": "math",
  "model": "mathstral",
  "tokens": { "input": 18, "output": 14 }
}
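
The same call can be made from Python using only the standard library. This is a minimal sketch: the helper names `build_ask_request` and `ask` and the 30-second timeout are assumptions, while the URL, headers, and body fields mirror the curl example.

```python
import json
import urllib.request

BASE_URL = "https://api.bcat.app/api/v1"


def build_ask_request(token, question, max_tokens=120):
    """Build (but do not send) the POST /ask request."""
    body = json.dumps({"question": question, "max_tokens": max_tokens})
    return urllib.request.Request(
        f"{BASE_URL}/ask",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def ask(token, question, max_tokens=120, timeout=30):
    """Send the request and return the decoded JSON response."""
    req = build_ask_request(token, question, max_tokens)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `ask(token, "What is the integral of sin(x)?")` from an allowlisted IP should return a dictionary with the `answer`, `specialist_used`, `model`, and `tokens` keys shown in the response above.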

`POST` /api/v1/embed scope: embed

Generate an embedding vector. Defaults to nomic-embed-text.

`GET` /api/v1/models scope: models

List the models currently loaded in the local Ollama instance.

`GET` /api/v1/specialists scope: models

Self-describing capability discovery. Returns the live specialist registry, the active main and vision models, every installed Ollama tag, and the full scope list. Use this to feature-detect at runtime instead of hardcoding what bcat can do.

{
  "specialists": {
    "code": {"label": "Coding / software / AI", "summary": "..."},
    "math": {"label": "Mathematics / physics",  "summary": "..."}
  },
  "ask_targets": ["ancient", "arxiv", "bio", "code", ...],
  "active_model":   "phi4:14b",
  "vision_model":   "gemma4:e2b",
  "installed_models": ["aya-expanse:8b", "deepseek-coder-v2:16b", ...],
  "scopes": ["ask","embed","models","vision","extract","infer","*"]
}
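
Runtime feature detection against this response might look like the sketch below. The helper names are hypothetical; the keys they read (`specialists`, `installed_models`, `scopes`) are taken from the example response, and `registry` is assumed to be that response already decoded into a dictionary.

```python
def has_specialist(registry, name):
    """True if the live registry lists a specialist under this key."""
    return name in registry.get("specialists", {})


def has_model(registry, tag):
    """True if the given Ollama tag is currently installed."""
    return tag in registry.get("installed_models", [])


def has_scope(registry, scope):
    """True if the server advertises this scope name."""
    return scope in registry.get("scopes", [])
```

A client could, for example, show a math tool only when `has_specialist(registry, "math")` is true rather than assuming the specialist exists.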

`POST` /api/v1/vision scope: vision

Image understanding. Multipart POST with form field image (PNG / JPEG / GIF / WebP / BMP, ≤ 8 MB) and an optional prompt. Forwarded to the configured vision model.
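
The multipart body can be assembled with the standard library alone. The sketch below checks the documented type and size limits on the client before building the body. The function name is hypothetical, the extension-to-type map is an assumption about how the formats are identified, and the 8 MB limit is read as 8,000,000 bytes, the stricter of the two common interpretations.

```python
import os
import uuid

MAX_IMAGE_BYTES = 8 * 1000 * 1000  # assumption: stricter reading of "8 MB"
IMAGE_TYPES = {
    ".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
    ".gif": "image/gif", ".webp": "image/webp", ".bmp": "image/bmp",
}


def build_vision_body(filename, image_bytes, prompt=None):
    """Return (body, content_type) for a multipart POST to /vision."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {filename}")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 8 MB limit")

    boundary = uuid.uuid4().hex
    name = os.path.basename(filename)  # sketch: assumes no quotes in the name
    parts = []
    if prompt is not None:
        parts.append((
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="prompt"\r\n\r\n'
            f"{prompt}\r\n"
        ).encode("utf-8"))
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{name}"\r\n'
        f"Content-Type: {IMAGE_TYPES[ext]}\r\n\r\n"
    ).encode("utf-8") + image_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type goes in the `Content-Type` header alongside the usual `Authorization: Bearer` header.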

`POST` /api/v1/extract scope: extract

Plain-text extraction from documents. Multipart POST with form field file. Supports text-bearing PDFs, text/*, JSON, XML, HTML, markdown, and common source-code MIME types. Output is capped at 50 000 characters.

`POST` /api/v1/infer scope: infer (admin-only)

Raw model passthrough. Skips routing — you supply the prompt and target model directly. Reserved for trusted callers; not granted to standard app clients.

Errors

Code	Meaning
401	Missing, malformed, expired, or revoked credential
403	Credential is valid but lacks the required scope
429	Rate limit exceeded (per-token or per-user); honour `retry_after`
5xx	Backend (Ollama / specialist) failure; safe to retry
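
One way to act on these codes is sketched below. It is an illustration rather than an official client: `send` is a caller-supplied function returning `(status, body_dict)`, and the code assumes that `retry_after` is a top-level JSON field measured in seconds, which this page does not specify. The exponential backoff for 5xx responses is likewise a design choice, not a documented requirement.

```python
import time


def call_with_retries(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds or a non-retryable error occurs."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        body = body or {}
        if status < 400:
            return body
        if status == 401:
            raise PermissionError("credential missing, malformed, expired, or revoked")
        if status == 403:
            raise PermissionError("credential lacks the required scope")

        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status}")

        if status == 429:
            # Assumed: retry_after is given in seconds in the JSON body.
            delay = float(body.get("retry_after", base_delay))
        else:
            # 5xx: back off exponentially (1s, 2s, 4s, ... by default).
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay)
```

Retrying 401 and 403 would not help, since the credential itself has to change, so those are raised immediately.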

What is not exposed

The agent's tool surface (code execution, page fetching, web search, memory writes) is chat-only. It is never reachable through the API, even with a service token.
User identities, conversations, and stored memories are isolated per-account. App-client tokens have no ambient user identity.
Administrative endpoints require an admin user JWT; they are not reachable with an app-client token.

Getting a token

Tokens are not self-serve. Open a GitHub issue with the name of your app, the scopes you need, and an estimated request volume. Tokens are revocable at any time and rotated on a fixed schedule.

Security report? See /.well-known/security.txt. Please report responsibly — do not load-test, scrape, or attempt unauthorized access.
