AI proxy
Call Anthropic and OpenAI via amba so provider keys stay server-side. Prompts are managed in the console.
`amba.ai.*` proxies LLM requests through amba so your provider keys never ship to a client. You write the prompt once in the console, reference it by `prompt_slug` from any SDK, and the server supplies the system prompt, variable substitution, model selection, rate limits, and usage tracking.
Two providers are wired up today: Anthropic (`messages.create`) and OpenAI (`chat.completions.create`). Both return the upstream response shape verbatim, plus a usage event so you can attribute cost per user.
Quick start
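A minimal call from a client, sketched under two assumptions: the import path and constructor shape (check your platform quickstart for the exact setup), and a `support_assistant` slug already defined in the console.

```ts
// The import path and constructor are assumptions: see your platform quickstart.
import { Amba } from "@amba/sdk";

const amba = new Amba({ projectKey: "pk_..." }); // public project key, never a provider key

// "support_assistant" is an example slug; define it in the console first.
const message = await amba.ai.anthropic.messages.create({
  prompt_slug: "support_assistant",
  variables: { customer_name: "Ada" }, // fills the {{customer_name}} placeholder
  max_tokens: 512,
});

console.log(message.content[0].text); // response mirrors Anthropic's Message object
```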
Operations
ai.anthropic.messages.create({ prompt_slug, variables, max_tokens })
Sends a prompt to Anthropic via amba.
| Field | Required | Notes |
|---|---|---|
| `prompt_slug` | yes | Reference to a prompt defined in the console. Server resolves the system prompt and model. |
| `variables` | optional | Object whose keys substitute into the prompt's `{{variable}}` placeholders. |
| `max_tokens` | optional | Cap output length. Server enforces a project-wide ceiling. |
| `temperature` | optional | Float; default set per-prompt in the console. |
| `enable_prompt_cache` | optional | Pass `true` to opt into Anthropic's prompt caching for that slug. |
The response shape mirrors Anthropic's Message object verbatim. A sketch of the fields you'll typically read, with illustrative values:
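```ts
const message = await amba.ai.anthropic.messages.create({
  prompt_slug: "summarize_review", // example slug
  variables: { review_text: "Great product, slow shipping." },
});

// message resembles (field names follow Anthropic's documented Message shape):
// {
//   id: "msg_...",
//   type: "message",
//   role: "assistant",
//   content: [{ type: "text", text: "Positive overall; complaint about shipping speed." }],
//   model: "claude-...",            // whatever model the slug resolves to in the console
//   stop_reason: "end_turn",
//   usage: { input_tokens: 42, output_tokens: 11 }
// }
const text = message.content[0].text;
```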
ai.openai.chat.completions.create({ prompt_slug, variables, max_tokens })
Same request shape, but routed to OpenAI. The response mirrors OpenAI's ChatCompletion object; a sketch of reading it, with an example slug:
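```ts
const completion = await amba.ai.openai.chat.completions.create({
  prompt_slug: "summarize_review", // example slug
  variables: { review_text: "Great product, slow shipping." },
  max_tokens: 256,
});

// Field names follow OpenAI's documented ChatCompletion shape.
console.log(completion.choices[0].message.content);
console.log(completion.usage); // { prompt_tokens, completion_tokens, total_tokens }
```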
Patterns
Prompt slugs
Prompts live in the console — system prompt, variable schema, default model, default temperature, default max_tokens. The client passes the slug; the server fills the rest. This means you can:
- Change the model behind a slug without redeploying.
- A/B test prompt variants by routing a slug through a feature flag (sketched after this list).
- Audit which user sent which prompt via the per-call usage event.
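A sketch of the feature-flag pattern from the list above. The flag client and the `_v2` variant slug are assumptions; both variants must exist in the console:

```ts
// `flags.isEnabled` stands in for whatever flag provider you use.
const slug = flags.isEnabled("support_assistant_v2", user.id)
  ? "support_assistant_v2" // variant slug (hypothetical)
  : "support_assistant";

const message = await amba.ai.anthropic.messages.create({
  prompt_slug: slug,
  variables: { customer_name: user.name },
});
```

Because the model and system prompt resolve server-side, shifting traffic between variants requires no client redeploy.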
Slug naming: lowercase, underscore-separated, up to 64 characters. Group by feature: `support_assistant`, `summarize_review`, `onboarding_recommend`.
Variable substitution
The prompt template uses `{{name}}` placeholders; the client's `variables` object fills them. As an illustration, suppose a `support_assistant` prompt's template contains `{{customer_name}}` and `{{ticket_body}}`:
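```ts
const message = await amba.ai.anthropic.messages.create({
  prompt_slug: "support_assistant", // example slug
  variables: {
    customer_name: "Ada", // replaces {{customer_name}} server-side
    ticket_body: "My export job has been stuck at 99% since yesterday.",
  },
});
```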
Variables that aren't in the template are silently ignored; missing required variables cause a `400` from the server, with the offending key listed.
Streaming
Streaming responses (Anthropic's `stream: true`, OpenAI's chunked completions) are exposed by the SDK as an async iterator on platforms that support it. See client API — ai for the canonical stream shape.
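A sketch of consuming the iterator in Node. The `stream: true` flag follows the provider's own API, but the chunk field names here are assumptions; client API — ai has the canonical shape:

```ts
const stream = await amba.ai.anthropic.messages.create({
  prompt_slug: "support_assistant", // example slug
  variables: { customer_name: "Ada" },
  stream: true, // assumption: the proxy passes the provider's streaming flag through
});

for await (const chunk of stream) {
  // Chunk shape is illustrative; see client API — ai for the real contract.
  process.stdout.write(chunk.delta?.text ?? "");
}
```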
Cost attribution
Every `ai.*.create` call automatically emits an `ai_usage` event — same events namespace as everything else, but with `usage.input_tokens` and `usage.output_tokens` attached. Query the event stream per-user to attribute cost without instrumenting your own counter.
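An illustrative `ai_usage` event and a per-user rollup. The `usage` token fields are documented above; the envelope fields (`user_id`, `prompt_slug`) are assumptions about the events namespace:

```ts
// Only the usage token fields are guaranteed by this page; the rest is illustrative.
type AiUsageEvent = {
  type: "ai_usage";
  user_id: string;     // assumed envelope field
  prompt_slug: string; // assumed envelope field
  usage: { input_tokens: number; output_tokens: number };
};

// Sum output tokens per user over a batch of fetched events.
function outputTokensPerUser(events: AiUsageEvent[]): Map<string, number> {
  const perUser = new Map<string, number>();
  for (const e of events) {
    perUser.set(e.user_id, (perUser.get(e.user_id) ?? 0) + e.usage.output_tokens);
  }
  return perUser;
}
```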
Limits
- Prompt slug must exist: the server returns `404 prompt_slug_not_found` for unknown slugs. Define the prompt in the console before referencing it.
- `max_tokens` ceiling: per-project hard cap (default 4096); the prompt's console default applies if you don't pass one.
- Rate limits: per-prompt-slug rate limits are configured in the console. Defaults are conservative; raise them per-slug as your usage grows.
- No client-side keys: client SDKs cannot pass an `api_key`. The server's provider key is the only credential in play.
- No tool-use roundtrip from clients: tool calls (Anthropic) and function calling (OpenAI) are accepted in the response, but the client SDK doesn't auto-execute tools. Run tool dispatch in a server function and only return the final text to the client (see the sketch after this list).
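A sketch of the server-side tool loop the last bullet describes. Anthropic's `stop_reason === "tool_use"` signal is the provider's documented behavior; how tool results are threaded back through the proxy is an assumption, so verify the continuation contract against the server-function docs:

```ts
// Hypothetical dispatcher: map a tool name to your own server-side implementation.
declare function runTool(name: string, input: unknown): Promise<unknown>;

// Runs in a server function, never on the client.
async function answerWithTools(slug: string, variables: Record<string, string>) {
  let response = await amba.ai.anthropic.messages.create({ prompt_slug: slug, variables });

  while (response.stop_reason === "tool_use") {
    const toolUse = response.content.find((b: any) => b.type === "tool_use");
    const result = await runTool(toolUse.name, toolUse.input);

    // ASSUMPTION: a `tool_results` continuation field; check the real API before relying on it.
    response = await amba.ai.anthropic.messages.create({
      prompt_slug: slug,
      variables,
      tool_results: [{ tool_use_id: toolUse.id, content: JSON.stringify(result) }],
    });
  }

  // Only the final text ever reaches the client.
  return response.content
    .filter((b: any) => b.type === "text")
    .map((b: any) => b.text)
    .join("");
}
```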
Reference
- Client API — ai — endpoint reference.
- CLI: `amba ai prompts` — manage prompt slugs.
- Auth feature — prerequisite.
- Per-platform quickstarts: Web, Node, iOS, Android, Flutter, Unity.