OpenAI-compatible HTTP API · model pinned to operator · every token receipted
Take any model in the Concord registry, ask the operator in your jurisdiction to
serve it, and get back an HTTPS endpoint that speaks the OpenAI HTTP API. It has
the same /v1/chat/completions, /v1/embeddings, and /v1/audio/transcriptions shape
your client already knows, but the inference happens on operator hardware in the
operator's jurisdiction, with a signed per-request receipt naming the model version,
operator, region, and the puller's institutional key.
Use cases: regulated finance running RAG against models without exporting prompts; national health systems on sovereign weights; defence procurement that needs a chain-of-custody from manifest to token; reproducible eval pipelines pinned to a specific model + operator + date.
OpenAI-shaped · base URL https://serve.eu.concordfaces.org/v1
| Method | Path | Description | Status |
|---|---|---|---|
| POST | /v1/chat/completions | OpenAI-compatible chat completion. stream=true supported. | α Q1 2027 |
| POST | /v1/completions | Legacy text completion shape, for older clients. | α Q1 2027 |
| POST | /v1/embeddings | Vector embeddings — encoder + sentence-transformer models. | α Q1 2027 |
| POST | /v1/audio/transcriptions | Speech-to-text (Whisper-class). | β Q2 2027 |
| POST | /v1/images/generations | Diffusion image generation (FLUX, SDXL, SD3.5). | β Q2 2027 |
| GET | /v1/models | List models hosted by this operator endpoint. | α Q1 2027 |
| GET | /v1/receipts/{id} | Fetch the signed receipt for a prior request id. | α Q1 2027 |
Inference does not cross a border without a signed token
The operator hosts the model, runs the GPU, and emits the tokens — all inside one jurisdiction. Prompts are not forwarded to other operators. A request from outside the operator's region either resolves at the puller's continental operator (if it also hosts the model), or carries a signed cross-border token that countersigns the receipt. There is no shadow back-end on a foreign cloud.
    [receipt]
    id        = "rcp_01HZ2A…7P3K"
    issued_at = 2027-01-14T11:22:09Z
    operator  = "eu:concord-eu"
    node      = "eu:ams-3"

    [model]
    name     = "deepseek-ai/DeepSeek-R1"
    version  = "v1"
    manifest = "b3:1f8c…ab02"   # content-addressed

    [request]
    endpoint   = "/v1/chat/completions"
    puller_key = "eu:acme-bank:k/2027-01"
    residency  = "eu → eu"      # inferred jurisdiction → served jurisdiction
    in_tokens  = 2_134
    out_tokens = 418

    [signature]
    alg = "ed25519"
    key = "eu:europa:k/2027-01"
    sig = "5f3c…d091"
Best-of-breed open source · operator picks the engine, the protocol stays the same
LLM
vLLM. Default text-generation back-end. PagedAttention, prefix caching, speculative decoding. Operator-selectable per model.
LLM (alt)
HuggingFace text-generation-inference. Selected for models with custom tokenizers or unusual quantizations vLLM doesn't yet support.
Embeddings
HuggingFace text-embeddings-inference (TEI) for sentence-transformers, BGE, nomic, and cross-encoders. Sub-millisecond batched embeddings.
Speech
CTranslate2-backed Whisper variants. Streaming transcription with per-segment receipts.
Image
FLUX, SDXL, SD3.5 generation. Operator may enforce a default safety guard depending on its jurisdiction's regulator.
Custom
Any runtime that speaks the OpenAI HTTP shape and produces a signed receipt to spec. Operators register their runtimes in the federation gossip layer.
Cost-passthrough + sovereignty premium
Each operator publishes its own per-token rate per model. Phase-0 pricing
targets cost-passthrough on the GPU hour plus a small premium that funds
the operator's audit, compliance, and storage obligations. Rates are
posted at serve.<op>.concordfaces.org/v1/pricing
in machine-readable form and indexed in the federation gossip layer so
a client can shop the federation under a residency constraint.
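Shopping the federation under a residency constraint amounts to filtering offers by region and then taking the cheapest. The sketch below assumes the rates have already been fetched and parsed; the `Offer` fields, the `rate_per_million_tokens` unit, the operator names other than `eu:concord-eu`, and all numbers are made up for illustration, since the pricing schema is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offer:
    """One operator's posted rate for one model (hypothetical schema)."""
    operator: str
    region: str
    model: str
    rate_per_million_tokens: float  # unit and currency are assumptions


def cheapest_offer(
    offers: Iterable[Offer], model: str, allowed_regions: set[str]
) -> Optional[Offer]:
    """Return the lowest-rate offer for `model` inside `allowed_regions`, or None."""
    candidates = [
        o for o in offers if o.model == model and o.region in allowed_regions
    ]
    return min(candidates, key=lambda o: o.rate_per_million_tokens, default=None)


# Illustrative data only; not real operators or prices.
offers = [
    Offer("eu:concord-eu", "eu", "deepseek-ai/DeepSeek-R1", 2.40),
    Offer("eu:example-op", "eu", "deepseek-ai/DeepSeek-R1", 2.10),
    Offer("us:example-op", "us", "deepseek-ai/DeepSeek-R1", 1.50),
]

best = cheapest_offer(offers, "deepseek-ai/DeepSeek-R1", {"eu"})
```

The residency filter is applied before the price comparison, so a cheaper out-of-region operator never wins.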
Operator-published · per-model · per-region
Each operator publishes per-model SLO targets (p50 / p95 / p99 first-token
latency, tokens-per-second steady-state, monthly availability) and a public
status page at status.<op>.concordfaces.org.
A breach of a posted SLO results in a credit owed by the operator under its
standard terms. The federation does not arbitrate operator SLOs; each operator
is accountable under its seat's law.
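A client can compare its own measurements against the posted percentile targets. The following sketch uses the nearest-rank percentile definition and assumes first-token latencies are collected client-side in milliseconds; the sample values and targets are invented, and operators may compute their SLOs differently.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def slo_breaches(
    samples_ms: list[float], targets_ms: dict[int, float]
) -> dict[int, tuple[float, float]]:
    """Map each breached percentile to (observed, target)."""
    breaches = {}
    for pct, target in targets_ms.items():
        observed = percentile(samples_ms, pct)
        if observed > target:
            breaches[pct] = (observed, target)
    return breaches


# Invented first-token latencies (ms) and targets, for illustration only.
samples = [120, 135, 150, 160, 180, 210, 240, 300, 450, 900]
targets = {50: 200, 95: 800, 99: 1000}
breaches = slo_breaches(samples, targets)
```

With so few samples the p95 and p99 both fall on the single slowest request, so a real check would want a much larger window before claiming a breach.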
Drop-in OpenAI client · point base_url at the operator
    $ curl https://serve.eu.concordfaces.org/v1/chat/completions \
        -H "Authorization: Bearer $CONCORD_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "model": "deepseek-ai/DeepSeek-R1",
          "messages": [
            {"role": "user", "content": "Hello"}
          ]
        }'
    >>> import os
    >>> from openai import OpenAI
    >>> client = OpenAI(
    ...     base_url="https://serve.eu.concordfaces.org/v1",
    ...     api_key=os.environ["CONCORD_API_KEY"],
    ... )
    >>> r = client.chat.completions.create(
    ...     model="meta-llama/Llama-3.1-8B-Instruct",
    ...     messages=[{"role": "user", "content": "Hello"}],
    ... )
    >>> r.id  # signed receipt id
The id on every response maps to a signed receipt at /v1/receipts/{id}.
Pull and store receipts to discharge audit obligations; they are admissible
in the operator's jurisdiction.
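Pulling and storing a receipt can be done with the standard library alone. This is a sketch under assumptions: it supposes the receipts endpoint accepts the same Bearer key as the completion endpoints and stores the response body as raw bytes without interpreting its format; `receipt_url`, `fetch_receipt`, and `store_receipt` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path
from urllib.parse import quote

BASE_URL = "https://serve.eu.concordfaces.org/v1"


def receipt_url(base_url: str, request_id: str) -> str:
    """Build the receipt URL for a response id, percent-encoding the id."""
    return f"{base_url.rstrip('/')}/receipts/{quote(request_id, safe='')}"


def fetch_receipt(
    base_url: str, request_id: str, api_key: str, timeout: float = 10.0
) -> bytes:
    """GET the receipt body. Bearer auth on this endpoint is an assumption."""
    req = urllib.request.Request(
        receipt_url(base_url, request_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def store_receipt(directory: Path, request_id: str, body: bytes) -> Path:
    """Write the raw receipt bytes to <directory>/<encoded id>.receipt."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{quote(request_id, safe='')}.receipt"
    path.write_bytes(body)
    return path
```

Typical use would be `store_receipt(Path("receipts"), r.id, fetch_receipt(BASE_URL, r.id, api_key))` right after each completion. Keeping the bytes unmodified matters because any re-serialization could invalidate the signature.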