bus-api-provider-llm — OpenAI-compatible LLM provider
bus-api-provider-llm implements the OpenAI-compatible /v1/* model proxy.
It lets end users call Bus-hosted LLM services with standard
OpenAI-compatible clients while Bus handles authentication, billing
entitlement, runtime wake-up, streaming, and usage metering.
The provider validates Bus API JWTs issued by bus auth. Tools such as
bus-agent can use the same OpenAI-compatible API while keeping their
existing non-Bus model providers and credential flows available.
Authentication
Execution endpoints require a Bearer JWT with audience ai.hg.fi/api and
scope llm:proxy. GET /v1/models also requires a valid bearer token so the
catalog is not public, but it does not check billing entitlement or wake the
runtime.
The JWT sub is the account UUID used for billing and usage records.
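As an illustration of the claims described above, a minimal sketch of reading the sub claim from a JWT payload (decoding only, no signature verification; the provider itself verifies signature, audience, and scope before trusting any claim, and the token below is a toy value):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature.

    Illustration only: the provider verifies signature, audience, and
    scope before trusting any claim.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# A toy token carrying the claims the provider checks.
header = base64.urlsafe_b64encode(b'{"alg":"none"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(json.dumps({
    "sub": "11111111-2222-3333-4444-555555555555",
    "aud": "ai.hg.fi/api",
    "scope": "llm:proxy",
}).encode()).rstrip(b"=").decode()

claims = jwt_claims(f"{header}.{payload}.sig")
print(claims["sub"])  # the account UUID used for billing and usage records
```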
GET /v1/models
Returns the model catalog shown to end users.
By default this endpoint uses the configured local catalog. It does not wake GPU runtimes, check billing entitlement, or probe the backend.
Use proxy mode only when the deployment intentionally wants model listing to be forwarded to the backend.
POST /v1/chat/completions
Proxies OpenAI-compatible chat completion requests to the configured backend,
or publishes a provider-neutral LLM execution event when
--execution-backend events is selected.
Streaming requests are forwarded chunk by chunk. When possible, the provider
adds stream_options.include_usage=true so streamed responses can include
token usage for billing.
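The rewrite the provider applies to streaming requests can be sketched as follows; the helper name is hypothetical and the real provider may apply more checks before rewriting:

```python
import json

def add_usage_to_stream(body_bytes: bytes) -> bytes:
    """If the request is a streaming completion, ask the backend to append
    a final usage chunk so streamed responses can still be metered.

    Sketch only; the helper name and exact rewrite rules are assumptions.
    """
    body = json.loads(body_bytes)
    if body.get("stream"):
        # setdefault keeps any include_usage choice the client already made.
        body.setdefault("stream_options", {}).setdefault("include_usage", True)
    return json.dumps(body).encode()

request = {"model": "codex-chatgpt", "stream": True,
           "messages": [{"role": "user", "content": "Say OK"}]}
rewritten = json.loads(add_usage_to_stream(json.dumps(request).encode()))
print(rewritten["stream_options"])  # {'include_usage': True}
```

Non-streaming requests pass through unchanged, since the final response object already carries a usage field.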
POST /v1/completions
Proxies OpenAI-compatible text completion requests or sends the matching Bus LLM execution event when event-backed execution is enabled.
The provider applies the same authentication, billing, runtime readiness, and usage recording behavior as chat completions.
POST /v1/responses
Proxies OpenAI-compatible Responses API requests or sends the matching Bus LLM execution event when event-backed execution is enabled.
Use this endpoint for clients that target the newer OpenAI-compatible response shape.
POST /v1/embeddings
Proxies OpenAI-compatible embedding requests.
Embedding requests are authenticated and metered under the same account as other execution requests.
GET /readyz
Reports provider readiness.
When Events response listeners are required (see BUS_EVENTS_LISTENER_REQUIRED),
readiness stays unhealthy until the runtime, usage, and billing response
streams are connected.
Billing Enforcement
When --billing-backend events is enabled, execution endpoints check
entitlement before runtime wake-up or backend proxying.
Denied access returns billing_required or quota_exceeded with guidance from
the billing system.
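A sketch of how a client might recognize these denials. The section above names only the codes billing_required and quota_exceeded; the OpenAI-style error envelope and message text below are assumptions for illustration:

```python
import json

# Hypothetical denial payload; the envelope shape is an assumption.
denied = json.loads('{"error": {"code": "quota_exceeded", '
                    '"message": "Monthly LLM quota exhausted."}}')

BILLING_CODES = {"billing_required", "quota_exceeded"}

def is_billing_denial(response_body: dict) -> bool:
    """Return True when the provider refused the request for billing reasons."""
    return response_body.get("error", {}).get("code") in BILLING_CODES

if is_billing_denial(denied):
    # Surface the billing guidance to the user instead of retrying.
    print(denied["error"]["message"])
```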
Runtime Wake-Up
When --runtime-backend events is enabled, the provider uses VM runtime events
to make the backend available before forwarding execution requests.
Model catalog reads do not trigger runtime wake-up.
Usage Recording
The provider records request lifecycle and token-usage events through direct
storage or bus-integration-usage.
A client disconnect during streaming cancels upstream work and records a
terminal abort/failure event when backend work may have started.
--addr <addr>
Selects the listen address for the provider.
Default is 127.0.0.1:8080; use --addr 0.0.0.0:<port> inside containers
that must accept traffic from other services.
--backend-url <url>
Sets the OpenAI-compatible backend URL used for execution requests when
--execution-backend http is selected.
Default is http://127.0.0.1:11434. Set it explicitly for HTTP execution
unless the deployment intentionally targets the default local backend.
Use the provider root as the base URL, without appending /v1; the LLM
provider appends the incoming /v1/* request path itself. For example:
bus-api-provider-llm --backend-url http://127.0.0.1:11434
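The path-joining rule above can be sketched as a small helper (the function name is illustrative, not the provider's internal API):

```python
def backend_target(backend_url: str, request_path: str) -> str:
    """Join the configured backend root with the incoming /v1/* path.

    Sketch of the rule described above: the provider appends the request
    path itself, so the base URL must not already end in /v1.
    """
    return backend_url.rstrip("/") + request_path

print(backend_target("http://127.0.0.1:11434", "/v1/chat/completions"))
# http://127.0.0.1:11434/v1/chat/completions

# Appending /v1 to the base URL would double the prefix:
print(backend_target("http://127.0.0.1:11434/v1", "/v1/chat/completions"))
# http://127.0.0.1:11434/v1/v1/chat/completions
```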
--execution-backend <http|events>
Selects where model execution runs.
Default is http; accepted values are http and events.
Use http to proxy requests directly to an OpenAI-compatible backend at
--backend-url.
Use events to publish provider-neutral bus.llm.* execution events. This is
the preferred local Bus architecture for Codex-backed development because
bus-api-provider-llm stays responsible for REST compatibility, JWTs, billing,
runtime readiness, and usage records while integrations such as
bus-integration-codex own provider-specific model
execution.
When events is selected, the provider listens for correlated response events
and does not require --backend-url. The provider service token must have
llm:proxy for publishing and listening to bus.llm.* events.
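The correlated request/response pattern this implies can be sketched generically. The bus.llm.* event names and payload fields are not specified here, so everything below is an assumed illustration of correlation-ID matching, not the provider's actual event schema:

```python
import queue
import uuid

class EventExecution:
    """Generic request/response correlation over an event bus.

    Sketch only: stands in for publishing a bus.llm.* request event and
    waiting for the correlated response event.
    """
    def __init__(self):
        self.pending: dict[str, queue.Queue] = {}

    def publish_request(self, payload: dict) -> str:
        correlation_id = str(uuid.uuid4())
        self.pending[correlation_id] = queue.Queue(maxsize=1)
        # A real implementation would publish the request event here.
        return correlation_id

    def on_response_event(self, correlation_id: str, body: dict) -> None:
        # Called by the Events listener; hand the body to the waiting request.
        if correlation_id in self.pending:
            self.pending[correlation_id].put(body)

    def wait(self, correlation_id: str, timeout: float = 60.0) -> dict:
        body = self.pending[correlation_id].get(timeout=timeout)
        del self.pending[correlation_id]
        return body

exec_bus = EventExecution()
cid = exec_bus.publish_request({"model": "codex-chatgpt"})
exec_bus.on_response_event(cid, {"choices": [{"message": {"content": "OK"}}]})
result = exec_bus.wait(cid)
print(result["choices"][0]["message"]["content"])  # OK
```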
--model-catalog <path>
Loads the local /v1/models catalog from a JSON file.
Required when --models-backend catalog is used and the deployment should
serve a custom catalog.
The matching environment variable is BUS_LLM_MODEL_CATALOG.
The file uses the OpenAI-compatible model-list shape. Each entry needs at
least id, object, created, and owned_by:
{
  "object": "list",
  "data": [
    {"id": "codex-chatgpt", "object": "model", "created": 0, "owned_by": "bus-codex"}
  ]
}
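A catalog file can be checked against the required fields before deployment; this validator is a sketch, not a tool shipped with the provider:

```python
import json

REQUIRED = ("id", "object", "created", "owned_by")

def check_catalog(text: str) -> list[str]:
    """Return the model ids in an OpenAI-compatible model-list document,
    raising ValueError if any entry is missing a required field."""
    doc = json.loads(text)
    if doc.get("object") != "list":
        raise ValueError('top-level "object" must be "list"')
    for entry in doc.get("data", []):
        missing = [k for k in REQUIRED if k not in entry]
        if missing:
            raise ValueError(f"model entry missing fields: {missing}")
    return [entry["id"] for entry in doc["data"]]

catalog = """
{
  "object": "list",
  "data": [
    {"id": "codex-chatgpt", "object": "model", "created": 0, "owned_by": "bus-codex"}
  ]
}
"""
print(check_catalog(catalog))  # ['codex-chatgpt']
```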
--models-backend <catalog|proxy>
Selects how /v1/models is served.
Default is catalog; accepted values are catalog and proxy.
Use catalog for production deployments that should not wake GPU backends on
model listing. Use proxy only when backend model listing is intended.
--runtime-backend <none|events>
Controls runtime wake-up.
Default is none; accepted values are none and events.
Use events when the provider should ask the Bus VM/runtime layer to start or
verify the backend before execution requests.
--usage-backend <postgres|events|memory>
Controls usage recording.
Default is postgres. Use events when usage should be collected by
bus-integration-usage; use memory only for deterministic local checks.
With the postgres default, set BUS_USAGE_DATABASE_URL to a reachable
PostgreSQL URL before starting the provider. The provider creates or uses its
minimal usage tables at startup; missing or unreachable storage makes usage
recording unavailable instead of silently using memory.
--billing-backend <none|events>
Controls billing entitlement checks.
Default is none; accepted values are none and events.
Use events for paid LLM plans.
--events-url <url>
Sets the Bus Events API URL used by runtime, usage, and billing event backends. Required when any selected backend uses Events.
Provide the provider’s Events token through deployment-managed configuration,
such as BUS_API_TOKEN. Do not pass bearer tokens as command-line arguments.
When --execution-backend events is enabled, the token must be able to send
and receive bus.llm.* events with llm:proxy. When --runtime-backend
events is enabled, the token must be able to send VM start/status requests and
receive the correlated responses, typically vm:write and vm:read. When
--usage-backend events is enabled, it needs usage write permissions such as
usage:write. When --billing-backend events is enabled, it needs
entitlement-check permission such as billing:entitlement:check. Deployments
may use an internal service token for these provider-to-provider calls.
--backend-ready-path <path>
Sets the backend readiness path checked after runtime wake-up.
No default is set. Configure it when --runtime-backend events should poll a
specific backend readiness endpoint after wake-up.
Common values are /v1/models for OpenAI-compatible backends and /api/tags
for Ollama-compatible backends.
--backend-ready-timeout <duration>
Sets the maximum time to wait for backend readiness.
Default is 30s; use a Go duration such as 5s, 30s, or 2m.
--backend-ready-poll-interval <duration>
Sets the delay between backend readiness attempts.
Default is 1s; use a Go duration.
--backend-ready-statuses <codes>
Sets the HTTP status codes that count as backend-ready. Use comma-separated
integer status codes, such as 200,204. The default ready status set is
200,204.
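The readiness loop these flags imply can be sketched as follows; probe stands in for an HTTP GET against --backend-ready-path, and the parameters mirror the --backend-ready-* options (a sketch under those assumptions, not the provider's implementation):

```python
import time

def wait_for_backend(probe, timeout=30.0, interval=1.0,
                     ready_statuses=frozenset({200, 204})):
    """Poll probe() until it returns a ready status code or the timeout
    expires. Returns True when the backend became ready in time."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe() in ready_statuses:
            return True
        time.sleep(interval)
    return False

# Simulated backend that becomes ready on the third probe.
attempts = iter([503, 503, 200])
ready = wait_for_backend(lambda: next(attempts), timeout=5.0, interval=0.01)
print(ready)  # True
```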
--timeout <duration>
Sets backend proxy and Events request timeouts. Default is 60s; use a Go
duration such as 15s, 60s, or 2m.
BUS_EVENTS_LISTENER_REQUIRED
When set to 1, readiness requires the Events response listeners needed by the
enabled backends.
The default is unset/false.
Local Compose Stack
The BusDK superproject compose.yaml starts this provider as bus-llm with
--execution-backend events, --usage-backend events, --runtime-backend
none, and --events-url http://bus-events:8081. Nginx exposes the
OpenAI-compatible API at /v1/* on the local API port. The model catalog is
loaded from deploy/local-ai-platform/model-catalog.json; the smoke check
expects GET /v1/models to include codex-chatgpt.
Start and verify the local stack from the superproject root:
cd /path/to/busdk
docker compose up --build -d
TOKEN="$(docker compose exec -T testing-agent cat /root/.config/bus/auth/api-token)"
curl -fsS -H "Authorization: Bearer $TOKEN" \
  http://127.0.0.1:${LOCAL_AI_PLATFORM_PORT:-8080}/v1/models
The response should include a model with "id":"codex-chatgpt".
LLM execution requests are sent as bus.llm.* events to
bus-integration-codex. Set BUS_LOCAL_AI_PLATFORM_LIVE_CODEX=1 for the
compose smoke script only after Codex credentials are available to the
bus-codex container. Set BUS_LLM_BILLING_BACKEND=events when local LLM
requests should require billing entitlement checks.
End-User Access
Approved users request an API token with LLM scope:
bus auth token --audience ai.hg.fi/api --scope "llm:proxy"
llm:proxy is the required scope for model execution. Add billing:read only
when the same token will also call billing status or setup APIs.
By default, bus auth token writes the issued API token to
~/.config/bus/auth/api-token or ${BUS_CONFIG_DIR}/auth/api-token, which the
curl example below reads. If a deployment configures token output differently,
write the token to that file or set TOKEN from the command output explicitly.
They can then use the token with OpenAI-compatible clients by setting the base
URL to the Bus LLM endpoint and using the Bus API token as the bearer token.
For a hosted AI Platform deployment this is commonly the /v1 API base URL.
For example:
TOKEN="$(cat ~/.config/bus/auth/api-token)"
curl -fsS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://ai.hg.fi/v1/chat/completions \
  -d '{"model":"codex-chatgpt","messages":[{"role":"user","content":"Say OK"}]}'
Success returns an OpenAI-compatible chat completion response. Billing setup
is required only when the deployment enforces billing for this feature; if
billing is missing or quota is exhausted, the provider returns a deterministic
billing error with guidance instead of waking the runtime or forwarding the
request to the backend.
Usage And Billing
The provider records lifecycle events for request starts, runtime readiness,
backend starts and finishes, successful token usage, missing usage,
request failures, and client aborts. bus-integration-usage can export
successful token usage to bus-integration-billing, which counts quota buckets
and records payment-provider meter events such as Stripe meter events.
Streaming clients that disconnect early cancel upstream work and record a terminal failure/abort usage event when backend work may have started. This keeps billing and operational records aligned with actual work attempted by the service.
For Stripe-backed deployments, configure billing and Stripe integrations before enabling paid LLM access for users. Keep Stripe keys and webhook secrets in deployment secrets or untracked local operator configuration.
Using from .bus files
Inside a .bus file, write the module target without the bus prefix:
# same as: bus api provider llm --addr 127.0.0.1:8088 --execution-backend events
api provider llm --addr 127.0.0.1:8088 --execution-backend events --events-url "$BUS_EVENTS_API_URL"