bus-api-provider-llm — OpenAI-compatible LLM provider

bus-api-provider-llm implements the OpenAI-compatible /v1/* model proxy. It lets end users call Bus-hosted LLM services with standard OpenAI-compatible clients while Bus handles authentication, billing entitlement, runtime wake-up, streaming, and usage metering.

The provider validates Bus API JWTs issued by bus auth. Tools such as bus-agent can use the same OpenAI-compatible API while keeping their existing non-Bus model providers and credential flows available.

Authentication

Execution endpoints require a Bearer JWT with audience ai.hg.fi/api and scope llm:proxy. GET /v1/models also requires a valid bearer token so the catalog is not public, but it does not check billing entitlement or wake the runtime.

The JWT sub is the account UUID used for billing and usage records.
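
A sketch of the claims an accepted token carries; sub, aud, and scope are documented above, while the exact claim names and any additional claims are deployment-defined:

{
  "sub": "00000000-0000-0000-0000-000000000000",
  "aud": "ai.hg.fi/api",
  "scope": "llm:proxy"
}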

GET /v1/models

Returns the model catalog shown to end users.

By default this endpoint uses the configured local catalog. It does not wake GPU runtimes, check billing entitlement, or probe the backend.

Use proxy mode only when the deployment intentionally wants model listing to be forwarded to the backend.
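
For example, against a provider on the default listen address, with a token obtained as described under End-User Access below:

curl -fsS -H "Authorization: Bearer $TOKEN" \
  http://127.0.0.1:8080/v1/models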

POST /v1/chat/completions

Proxies OpenAI-compatible chat completion requests to the configured backend, or publishes a provider-neutral LLM execution event when --execution-backend events is selected.

Streaming requests are forwarded chunk by chunk. When possible, the provider adds stream_options.include_usage=true so streamed responses can include token usage for billing.
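
A sketch of a streaming request that sets stream_options.include_usage explicitly; when the client omits it, the provider adds it where possible:

curl -N -fsS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://ai.hg.fi/v1/chat/completions \
  -d '{"model":"codex-chatgpt","stream":true,"stream_options":{"include_usage":true},"messages":[{"role":"user","content":"Say OK"}]}'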

POST /v1/completions

Proxies OpenAI-compatible text completion requests or sends the matching Bus LLM execution event when event-backed execution is enabled.

The provider applies the same authentication, billing, runtime readiness, and usage recording behavior as chat completions.
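
A sketch using the standard OpenAI text-completion shape; whether a given model id supports this endpoint depends on the backend:

curl -fsS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://ai.hg.fi/v1/completions \
  -d '{"model":"codex-chatgpt","prompt":"Say OK","max_tokens":8}'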

POST /v1/responses

Proxies OpenAI-compatible Responses API requests or sends the matching Bus LLM execution event when event-backed execution is enabled.

Use this endpoint for clients that target the newer OpenAI-compatible response shape.
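
A sketch using the standard Responses API request shape:

curl -fsS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://ai.hg.fi/v1/responses \
  -d '{"model":"codex-chatgpt","input":"Say OK"}'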

POST /v1/embeddings

Proxies OpenAI-compatible embedding requests.

Embedding requests are authenticated and metered under the same account as other execution requests.
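
A sketch using the standard embeddings request shape; the model id is a placeholder for whatever embedding-capable model the deployment's catalog exposes:

curl -fsS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://ai.hg.fi/v1/embeddings \
  -d '{"model":"<embedding-model-id>","input":"Say OK"}'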

GET /readyz

Reports provider readiness.

When Events listeners are required (see BUS_EVENTS_LISTENER_REQUIRED below), readiness stays unhealthy until the runtime, usage, and billing response streams are connected.
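
For example, against the default listen address:

curl -fsS http://127.0.0.1:8080/readyz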

Billing Enforcement

When --billing-backend events is enabled, execution endpoints check entitlement before runtime wake-up or backend proxying.

Denied access returns billing_required or quota_exceeded with guidance from the billing system.

Runtime Wake-Up

When --runtime-backend events is enabled, the provider uses VM runtime events to make the backend available before forwarding execution requests.

Model catalog reads do not trigger runtime wake-up.

Usage Recording

The provider records request lifecycle and token-usage events through direct storage or bus-integration-usage.

A client disconnect during streaming cancels upstream work and records a terminal abort/failure event when backend work may have started.

--addr <addr>

Selects the listen address for the provider. Default is 127.0.0.1:8080; use --addr 0.0.0.0:<port> inside containers that must accept traffic from other services.

--backend-url <url>

Sets the OpenAI-compatible backend URL used for execution requests when --execution-backend http is selected. Default is http://127.0.0.1:11434. It is required for HTTP execution unless the deployment intentionally uses the default local backend.

Use the provider root as the base URL, without appending /v1; the LLM provider appends the incoming /v1/* request path itself. For example:

bus-api-provider-llm --backend-url http://127.0.0.1:11434

--execution-backend <http|events>

Selects where model execution runs. Default is http; accepted values are http and events.

Use http to proxy requests directly to an OpenAI-compatible backend at --backend-url.

Use events to publish provider-neutral bus.llm.* execution events. This is the preferred local Bus architecture for Codex-backed development: bus-api-provider-llm stays responsible for REST compatibility, JWTs, billing, runtime readiness, and usage records, while integrations such as bus-integration-codex own provider-specific model execution.

When events is selected, the provider listens for correlated response events and does not require --backend-url. The provider service token must have llm:proxy for publishing and listening to bus.llm.* events.
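
A minimal events-mode invocation might look like the following; the token file path is hypothetical, and the Events URL matches the local compose stack described below:

export BUS_API_TOKEN="$(cat /run/secrets/bus-llm-events-token)"
bus-api-provider-llm \
  --execution-backend events \
  --events-url http://bus-events:8081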

--model-catalog <path>

Loads the local /v1/models catalog from a JSON file. Required when --models-backend catalog is used and the deployment should serve a custom catalog.

The matching environment variable is BUS_LLM_MODEL_CATALOG. The file uses the OpenAI-compatible model-list shape. Each entry needs at least id, object, created, and owned_by:

{
  "object": "list",
  "data": [
    {"id": "codex-chatgpt", "object": "model", "created": 0, "owned_by": "bus-codex"}
  ]
}
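
For example, serving a custom catalog file (the path is a placeholder):

bus-api-provider-llm --models-backend catalog \
  --model-catalog ./model-catalog.json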

--models-backend <catalog|proxy>

Selects how /v1/models is served. Default is catalog; accepted values are catalog and proxy.

Use catalog for production deployments that should not wake GPU backends on model listing. Use proxy only when backend model listing is intended.

--runtime-backend <none|events>

Controls runtime wake-up. Default is none; accepted values are none and events.

Use events when the provider should ask the Bus VM/runtime layer to start or verify the backend before execution requests.

--usage-backend <postgres|events|memory>

Controls usage recording.

Default is postgres. Use events when usage should be collected by bus-integration-usage; use memory only for deterministic local checks.

With the postgres default, set BUS_USAGE_DATABASE_URL to a reachable PostgreSQL URL before starting the provider. The provider creates or uses its minimal usage tables at startup; missing or unreachable storage makes usage recording unavailable instead of silently falling back to memory.
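
For example, with a placeholder connection string:

export BUS_USAGE_DATABASE_URL="postgres://bus:bus@127.0.0.1:5432/bus_usage?sslmode=disable"
bus-api-provider-llm --usage-backend postgres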

--billing-backend <none|events>

Controls billing entitlement checks. Default is none; accepted values are none and events.

Use events for paid LLM plans.

--events-url <url>

Sets the Bus Events API URL used by runtime, usage, and billing event backends. Required when any selected backend uses Events.

Provide the provider’s Events token through deployment-managed configuration, such as BUS_API_TOKEN. Do not pass bearer tokens as command-line arguments. Deployments may use an internal service token for these provider-to-provider calls.

The token permissions depend on the enabled backends. When --execution-backend events is enabled, the token must be able to send and receive bus.llm.* events with llm:proxy. When --runtime-backend events is enabled, it must be able to send VM start/status requests and receive the correlated responses, typically vm:write and vm:read. When --usage-backend events is enabled, it needs usage write permission such as usage:write. When --billing-backend events is enabled, it needs entitlement-check permission such as billing:entitlement:check.

--backend-ready-path <path>

Sets the backend readiness path checked after runtime wake-up. No default is set. Configure it when --runtime-backend events should poll a specific backend readiness endpoint after wake-up.

Common values are /v1/models for OpenAI-compatible backends and /api/tags for Ollama-compatible backends.

--backend-ready-timeout <duration>

Sets the maximum time to wait for backend readiness. Default is 30s; use a Go duration such as 5s, 30s, or 2m.

--backend-ready-poll-interval <duration>

Sets the delay between backend readiness attempts. Default is 1s; use a Go duration.

--backend-ready-statuses <codes>

Sets the HTTP status codes that count as backend-ready. Use comma-separated integer status codes, such as 200,204. The default ready status set is 200,204.
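
A sketch combining the readiness flags above for an events-driven runtime fronting an OpenAI-compatible backend:

bus-api-provider-llm \
  --runtime-backend events \
  --events-url http://bus-events:8081 \
  --backend-url http://127.0.0.1:11434 \
  --backend-ready-path /v1/models \
  --backend-ready-timeout 2m \
  --backend-ready-poll-interval 2s \
  --backend-ready-statuses 200,204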

--timeout <duration>

Sets backend proxy and Events request timeouts. Default is 60s; use a Go duration such as 15s, 60s, or 2m.

BUS_EVENTS_LISTENER_REQUIRED

When set to 1, readiness requires the Events response listeners needed by the enabled backends. The default is unset/false.

Local Compose Stack

The BusDK superproject compose.yaml starts this provider as bus-llm with --execution-backend events, --usage-backend events, --runtime-backend none, and --events-url http://bus-events:8081. Nginx exposes the OpenAI-compatible API at /v1/* on the local API port. The model catalog is loaded from deploy/local-ai-platform/model-catalog.json; the smoke check expects GET /v1/models to include codex-chatgpt.

Start and verify the local stack from the superproject root:

cd /path/to/busdk
docker compose up --build -d
TOKEN="$(docker compose exec -T testing-agent cat /root/.config/bus/auth/api-token)"
curl -fsS -H "Authorization: Bearer $TOKEN" \
  http://127.0.0.1:${LOCAL_AI_PLATFORM_PORT:-8080}/v1/models

The response should include a model with "id":"codex-chatgpt".

LLM execution requests are sent as bus.llm.* events to bus-integration-codex. Set BUS_LOCAL_AI_PLATFORM_LIVE_CODEX=1 for the compose smoke script only after Codex credentials are available to the bus-codex container. Set BUS_LLM_BILLING_BACKEND=events when local LLM requests should require billing entitlement checks.
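
Once live Codex execution is available, a local chat completion can be exercised the same way (a sketch reusing the token and port variable above):

curl -fsS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  http://127.0.0.1:${LOCAL_AI_PLATFORM_PORT:-8080}/v1/chat/completions \
  -d '{"model":"codex-chatgpt","messages":[{"role":"user","content":"Say OK"}]}'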

End-User Access

Approved users request an API token with LLM scope:

bus auth token --audience ai.hg.fi/api --scope "llm:proxy"

llm:proxy is the required scope for model execution. Add billing:read only when the same token will also call billing status or setup APIs. By default, bus auth token writes the issued API token to ~/.config/bus/auth/api-token or ${BUS_CONFIG_DIR}/auth/api-token, which the curl example below reads. If a deployment configures token output differently, write the token to that file or set TOKEN from the command output explicitly.

They can then use the token with OpenAI-compatible clients by setting the base URL to the Bus LLM endpoint and using the Bus API token as the bearer token. For a hosted AI Platform deployment this is commonly the /v1 API base URL. For example:

TOKEN="$(cat ~/.config/bus/auth/api-token)"
curl -fsS \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://ai.hg.fi/v1/chat/completions \
  -d '{"model":"codex-chatgpt","messages":[{"role":"user","content":"Say OK"}]}'

Success returns an OpenAI-compatible chat completion response.

Billing setup is required only when the deployment enforces billing for the feature. If billing is missing or quota is exhausted, the provider returns a deterministic error with guidance instead of waking the runtime or forwarding the request to the backend.

Usage And Billing

The provider records lifecycle events for request starts, runtime readiness, backend starts and finishes, successful token usage, missing usage, request failures, and client aborts. bus-integration-usage can export successful token usage to bus-integration-billing, which counts quota buckets and records payment-provider meter events such as Stripe meter events.

Streaming clients that disconnect early cancel upstream work and record a terminal failure/abort usage event when backend work may have started. This keeps billing and operational records aligned with actual work attempted by the service.

For Stripe-backed deployments, configure billing and Stripe integrations before enabling paid LLM access for users. Keep Stripe keys and webhook secrets in deployment secrets or untracked local operator configuration.

Using from .bus files

Inside a .bus file, write the module target without the bus prefix:

# same as: bus api provider llm --addr 127.0.0.1:8088 --execution-backend events
api provider llm --addr 127.0.0.1:8088 --execution-backend events --events-url "$BUS_EVENTS_API_URL"