CH·INFAPI Reference
Inference API
Run LLM chat and embedding inference through an OpenAI-compatible REST surface — no instances to manage.
§Getting started¶
The inference API is OpenAI-compatible. Point any OpenAI SDK at https://api.gpu.ai/v1 with one of your API keys and call /v1/chat/completions or /v1/embeddings. Browse the available models in the model catalog.