CH·INFAPI Reference

Inference API

Run LLM chat and embedding inference through an OpenAI-compatible REST surface — no instances to manage.

§Getting started¶

The inference API is OpenAI-compatible. Point any OpenAI SDK at https://api.gpu.ai/v1 with one of your API keys and call /v1/chat/completions or /v1/embeddings. Browse the available models in the model catalog.