praisonai serve openai starts a FastAPI server that exposes OpenAI-compatible HTTP endpoints. Point existing OpenAI SDK code at it by changing base_url, and the server acts as a drop-in replacement for the OpenAI API.

Quick Start

1. Start the server

praisonai serve openai
The server starts on http://localhost:8765 by default.

2. Use from any OpenAI client

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8765/v1", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

3. Expose a PraisonAI agent

from praisonaiagents import Agent

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant"
)
# Then run these two commands in a shell:
# 1. Start the agents API:          praisonai serve agents --file agents.yaml --port 8000
# 2. Start the OpenAI compat layer: praisonai serve openai --agents-url http://localhost:8000

Endpoints

All endpoints follow the OpenAI REST specification.
| Method | Path | Description | Streaming |
| --- | --- | --- | --- |
| POST | /v1/chat/completions | Chat completions | SSE (stream: true) with [DONE] sentinel |
| POST | /v1/completions | Legacy text completions | No |
| GET | /v1/models | List available models | No |
| POST | /v1/tools/invoke | PraisonAI extension: invoke an agent tool | No |
| GET | /__praisonai__/discovery | Provider discovery | No |
| GET | /health | Health check | No |
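
The endpoint table can be captured as a small lookup for clients that do not use the OpenAI SDK. This is a minimal sketch using only the standard library: the names `ENDPOINTS`, `endpoint`, and `is_healthy` are hypothetical helpers, and the assumption that a healthy server answers `/health` with HTTP 200 is not stated in this page.

```python
import urllib.request

# Method and path for each row of the endpoints table.
ENDPOINTS = {
    "chat": ("POST", "/v1/chat/completions"),
    "completions": ("POST", "/v1/completions"),
    "models": ("GET", "/v1/models"),
    "tools_invoke": ("POST", "/v1/tools/invoke"),
    "discovery": ("GET", "/__praisonai__/discovery"),
    "health": ("GET", "/health"),
}


def endpoint(name, base_url="http://localhost:8765"):
    """Return (method, absolute URL) for a named endpoint."""
    method, path = ENDPOINTS[name]
    return method, base_url.rstrip("/") + path


def is_healthy(base_url="http://localhost:8765", timeout=5):
    """Return True if GET /health answers with HTTP 200.

    Needs a running server. Treating 200 as "healthy" is an assumption.
    """
    _, url = endpoint("health", base_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts all derive from OSError
        return False
```

With the server from the Quick Start running, `is_healthy()` can gate a test suite or a deployment script before any chat request is sent.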

Streaming

Set stream=True to receive tokens via Server-Sent Events:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8765/v1", api_key="anything")

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about agents"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
The stream ends with a data: [DONE] sentinel. Errors during streaming are also sent as SSE events:
data: {"error": {"message": "...", "type": "stream_error"}}

data: [DONE]
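
For clients that read the raw SSE stream instead of using the OpenAI SDK, the three kinds of `data:` lines described above (content chunks, error events, and the `[DONE]` sentinel) can be handled roughly as follows. This is a sketch, not the server's own code: `parse_sse_line` and `collect_text` are hypothetical names, and the chunk layout (`choices[0].delta.content`) is assumed to match the OpenAI streaming format used by the SDK example.

```python
import json


def parse_sse_line(line):
    """Classify one SSE line as ('skip' | 'done' | 'error' | 'delta', value)."""
    line = line.strip()
    if not line.startswith("data:"):
        return ("skip", None)  # blank separators and non-data fields
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return ("done", None)
    event = json.loads(payload)
    if "error" in event:
        return ("error", event["error"].get("message"))
    choices = event.get("choices") or []
    delta = choices[0].get("delta", {}) if choices else {}
    return ("delta", delta.get("content") or "")


def collect_text(lines):
    """Join streamed content until [DONE]; raise if an error event arrives."""
    parts = []
    for line in lines:
        kind, value = parse_sse_line(line)
        if kind == "done":
            break
        if kind == "error":
            raise RuntimeError(value)
        if kind == "delta":
            parts.append(value)
    return "".join(parts)
```

Checking for an `error` key before reading `choices` matters because an error event carries no `choices` field.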

Tool Invocation

/v1/tools/invoke is a PraisonAI extension that calls an agent tool by name. It requires --agents-url to point to a running agents server.
curl -X POST http://localhost:8765/v1/tools/invoke \
  -H "Content-Type: application/json" \
  -d '{"tool": "search", "arguments": {"query": "PraisonAI features"}}'
Start the agents server first:
# Terminal 1 — agents API
praisonai serve agents --file agents.yaml --port 8000

# Terminal 2 — OpenAI compat layer
praisonai serve openai --agents-url http://localhost:8000
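
The curl call above translates to Python with the standard library alone. In this sketch, `build_tool_invoke_request` and `invoke_tool` are hypothetical helper names, and decoding the response as JSON is an assumption, since the response schema is not documented in this page.

```python
import json
import urllib.request


def build_tool_invoke_request(tool, arguments, base_url="http://localhost:8765"):
    """Build (but do not send) the POST request for /v1/tools/invoke."""
    body = json.dumps({"tool": tool, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/tools/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_tool(tool, arguments, base_url="http://localhost:8765", timeout=30):
    """Send the request; needs both servers running.

    Assumes the body is JSON; the exact fields are not specified here.
    """
    req = build_tool_invoke_request(tool, arguments, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With both terminals from the example running, `invoke_tool("search", {"query": "PraisonAI features"})` sends the same payload as the curl command.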

Authentication

Pass --api-key to require authentication:
praisonai serve openai --api-key my-secret-key
Clients must send the key in the Authorization header. The OpenAI SDK does this automatically, sending api_key as a Bearer token:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8765/v1",
    api_key="my-secret-key"
)
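
Clients that do not use the OpenAI SDK need to build the header themselves. The sketch below mirrors what the SDK sends (`Authorization: Bearer <key>`). The helper name `auth_headers` is hypothetical, and reading the key from a `PRAISONAI_API_KEY` environment variable is a convention borrowed from the shell example under Best Practices, not something the server is documented to read on its own.

```python
import os


def auth_headers(api_key=None, env_var="PRAISONAI_API_KEY"):
    """Return request headers carrying the key as a Bearer token.

    Falls back to the environment variable when no key is passed, and
    returns an empty dict when no key is available at all.
    """
    key = api_key if api_key is not None else os.environ.get(env_var)
    if not key:
        return {}
    return {"Authorization": f"Bearer {key}"}
```

Keeping the key in the environment avoids hard-coding the secret in client source files.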

CLI Reference

praisonai serve openai [OPTIONS]
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --host, -h | str | "127.0.0.1" | Host to bind to |
| --port, -p | int | 8765 | Port to bind to |
| --agents-url | str | "http://127.0.0.1:8000" | URL of the running Agents API server |
| --api-key | str | None | Optional API key for authentication |
| --reload | bool | False | Enable auto-reload (dev) |
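
When the server is launched from a script or a test fixture, the options above can be assembled into an argument list. This is a sketch under two assumptions: `build_serve_command` is a hypothetical helper, and `--reload` is treated as a bare flag with no value, which the table implies but does not state.

```python
import shlex


def build_serve_command(
    host="127.0.0.1",
    port=8765,
    agents_url="http://127.0.0.1:8000",
    api_key=None,
    reload=False,
):
    """Return the argv list for `praisonai serve openai` using the table's defaults."""
    cmd = [
        "praisonai", "serve", "openai",
        "--host", host,
        "--port", str(port),
        "--agents-url", agents_url,
    ]
    if api_key:
        cmd += ["--api-key", api_key]
    if reload:
        cmd.append("--reload")  # assumed to be a value-less flag
    return cmd


# shlex.join renders the list as a copy-pasteable shell command.
print(shlex.join(build_serve_command(port=9000)))
```

Passing the list to `subprocess.Popen` rather than a single string avoids shell quoting issues with the API key.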

Choosing Between serve openai and serve rag --openai-compat

| Surface | Command | Scope |
| --- | --- | --- |
| OpenAI server | praisonai serve openai | Standalone: chat, completions, models, tool invocation |
| RAG compat flag | praisonai serve rag --openai-compat | RAG microservice only, chat endpoint only |
| Unified | praisonai serve unified | All providers in a single server |
For OpenAI-compatible RAG queries, use praisonai serve rag --openai-compat instead. See RAG CLI for details.

Error Shape

All errors follow the OpenAI error format:
{
  "error": {
    "message": "Model not found",
    "type": "api_error"
  }
}
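
A client working with raw HTTP responses can detect this envelope before trying to read `choices`. The sketch below assumes only the two fields shown above (`message` and `type`); `parse_error` is a hypothetical helper name.

```python
import json


def parse_error(body):
    """Return (message, type) if body is an error envelope, else None."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):  # None input or invalid JSON
        return None
    err = data.get("error") if isinstance(data, dict) else None
    if not isinstance(err, dict):
        return None
    return err.get("message"), err.get("type")
```

Returning `None` for anything that is not an error envelope lets the caller fall through to normal response handling.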

Best Practices

Always pass --api-key when running on a public or shared network. The default configuration accepts unauthenticated requests.
praisonai serve openai --api-key $PRAISONAI_API_KEY --host 0.0.0.0
The default host, 127.0.0.1, only accepts local connections. Use --host 0.0.0.0 to accept connections from other machines, and always pair it with --api-key.
praisonai serve openai --host 0.0.0.0 --port 8765 --api-key $SECRET
The server passes model directly to the underlying LLM. Use LiteLLM-style model names (ollama/llama3, anthropic/claude-3-5-sonnet-20241022) to route to any provider.
Use praisonai serve unified when you need multiple protocol surfaces (MCP, A2A, agents) alongside the OpenAI-compat layer. Use serve openai for a lightweight standalone deployment that only exposes OpenAI endpoints.
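
The LiteLLM-style model names mentioned above follow a `provider/model` convention, with bare names such as `gpt-4o-mini` carrying no prefix. The sketch below only illustrates that naming convention on the client side; `split_model_name` is a hypothetical helper and is not how the server or LiteLLM performs routing.

```python
def split_model_name(model):
    """Split 'provider/model' into (provider, model); provider is None if absent."""
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name


for model in ("ollama/llama3", "anthropic/claude-3-5-sonnet-20241022", "gpt-4o-mini"):
    print(model, "->", split_model_name(model))
```

The full string, prefix included, is what goes in the `model` field of the request.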

Serve Commands

All praisonai serve subcommands

Gateway

Unified gateway and control plane

RAG OpenAI Compat

OpenAI-compatible RAG server

LLM Endpoint Config

Configure LLM provider endpoints