Stream AI responses token-by-token as they’re generated, instead of waiting for the complete response.

Quick Start

1. Install

pip install praisonaiagents

2. Stream Responses

from praisonaiagents import Agent

agent = Agent(instructions="You are a helpful assistant")

for chunk in agent.start("Write a short story", stream=True):
    print(chunk, end="", flush=True)

Choosing the Right Method

| Method | Streams | Display | Best For |
|---|---|---|---|
| start(stream=True) | ✅ Yes | ✅ Auto | Terminal, interactive chat |
| iter_stream() | ✅ Always | ❌ No | App integration, custom UIs |
| run() | ❌ No | ❌ No | Production, batch processing |
| chat(stream=True) | Configurable | Configurable | Low-level control |

Common Patterns

Terminal Streaming

from praisonaiagents import Agent

agent = Agent(instructions="You are a helpful assistant")

# Tokens appear as they arrive
for chunk in agent.start("Explain quantum computing", stream=True):
    print(chunk, end="", flush=True)

App Integration with iter_stream()

Best for integrating into your own application — yields raw chunks with no display overhead.
from praisonaiagents import Agent

agent = Agent(instructions="You are a helpful assistant")

full_response = ""
for chunk in agent.iter_stream("Write a haiku"):
    full_response += chunk
    # Send to your UI, WebSocket, or processing pipeline

print(full_response)

Streaming with Callbacks

Hook into every streaming event for fine-grained control.
from praisonaiagents import Agent
from praisonaiagents.streaming import StreamEvent, StreamEventType

def on_event(event: StreamEvent):
    if event.type == StreamEventType.DELTA_TEXT:
        print(event.content, end="", flush=True)
    elif event.type == StreamEventType.FIRST_TOKEN:
        print("⚡ First token received!")
    elif event.type == StreamEventType.STREAM_END:
        print("\n✅ Done!")

agent = Agent(instructions="You are a helpful assistant")
agent.stream_emitter.add_callback(on_event)
agent.start("Tell me a joke", stream=True)

FastAPI SSE Integration

Pipe streaming tokens directly to a web client using Server-Sent Events.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from praisonaiagents import Agent

app = FastAPI()

@app.get("/stream")
async def stream_response(prompt: str):
    agent = Agent(instructions="You are a helpful assistant")
    
    def generate():
        for chunk in agent.iter_stream(prompt):
            # Note: per the SSE spec, a chunk containing "\n" must be split
            # into multiple "data:" lines; plain-text chunks are fine as-is.
            yield f"data: {chunk}\n\n"
        yield "data: [DONE]\n\n"
    
    return StreamingResponse(generate(), media_type="text/event-stream")

Async Streaming

import asyncio
from praisonaiagents import Agent

async def main():
    agent = Agent(instructions="You are a helpful assistant")
    result = await agent.astart("Write a poem", stream=True)
    print(result)

asyncio.run(main())

StreamEvent Protocol

Every streaming chunk emits a StreamEvent with full context.
| Event | When |
|---|---|
| REQUEST_START | Before API call |
| HEADERS_RECEIVED | HTTP 200 arrives |
| FIRST_TOKEN | First content delta (TTFT marker) |
| DELTA_TEXT | Each text chunk |
| DELTA_TOOL_CALL | Tool call streaming |
| LAST_TOKEN | Final content delta |
| STREAM_END | Stream completed |
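As the callback example above shows, an if/elif chain over event types works fine; once you handle many event types, a dispatch table can be tidier. A minimal sketch of the pattern, using stand-in enum and event classes since only the type names come from the table above (in practice you would import StreamEvent and StreamEventType from praisonaiagents.streaming):

```python
from collections import namedtuple
from enum import Enum, auto

# Stand-ins for the real streaming types; member names taken from the table.
class StreamEventType(Enum):
    FIRST_TOKEN = auto()
    DELTA_TEXT = auto()
    STREAM_END = auto()

Event = namedtuple("Event", ["type", "content"])

def make_dispatcher(handlers):
    """Return a callback that routes each event to its type's handler."""
    def on_event(event):
        handler = handlers.get(event.type)
        if handler is not None:
            handler(event)
    return on_event

# Collect streamed text; ignore event types with no registered handler.
parts = []
on_event = make_dispatcher({
    StreamEventType.DELTA_TEXT: lambda e: parts.append(e.content),
    StreamEventType.STREAM_END: lambda e: parts.append("!"),
})
for ev in [Event(StreamEventType.FIRST_TOKEN, None),
           Event(StreamEventType.DELTA_TEXT, "Hi"),
           Event(StreamEventType.STREAM_END, None)]:
    on_event(ev)
print("".join(parts))  # Hi!
```

The resulting on_event function has the same shape as the callback passed to agent.stream_emitter.add_callback() earlier.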

Metrics

Track Time To First Token (TTFT) and throughput.
from praisonaiagents import Agent
from praisonaiagents.streaming import StreamEvent, StreamEventType, StreamMetrics

metrics = StreamMetrics()

def on_event(event: StreamEvent):
    metrics.update_from_event(event)
    if event.type == StreamEventType.DELTA_TEXT:
        print(event.content, end="", flush=True)

agent = Agent(instructions="You are a helpful assistant")
agent.stream_emitter.add_callback(on_event)
agent.start("Explain AI briefly", stream=True)

print(metrics.format_summary())
# Output: TTFT: 245ms | Stream: 1200ms | Total: 1445ms | Tokens: 150 (125.0/s)
| Metric | Description |
|---|---|
| TTFT | Time from request to first token (provider latency) |
| Stream Duration | From first to last token |
| Total Time | End-to-end request time |
| Tokens/s | Token generation rate |

Key Concepts

Time To First Token (TTFT)

Request → [TTFT] → First Token → [Streaming] → Last Token → Done
TTFT is the time before the first token arrives. This is provider latency — the model must process your prompt before generating. Streaming does NOT reduce TTFT, but it shows progress immediately.
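If you want to observe TTFT without the StreamMetrics helper, you can time the first chunk yourself. A small sketch over any chunk iterator; the fake_stream generator below simulates output, and in practice you would pass agent.iter_stream(prompt) instead:

```python
import time

def measure_ttft(chunks):
    """Consume a chunk iterator; return (full_text, seconds_to_first_chunk)."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            # Time elapsed before the first chunk arrived.
            ttft = time.perf_counter() - start
        parts.append(chunk)
    return "".join(parts), ttft

# Simulated stream standing in for agent.iter_stream(prompt).
def fake_stream():
    for token in ["Hello", ", ", "world"]:
        time.sleep(0.01)
        yield token

text, ttft = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.0f}ms")
```

Because streaming surfaces the first token as soon as it exists, this measurement reflects provider latency, not anything you can tune in the client.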

Streaming vs Non-Streaming

| Mode | Behavior | Use Case |
|---|---|---|
| stream=True | Tokens appear as generated | Interactive chat, real-time display |
| stream=False | Complete response at once | Batch processing, structured output |

CLI Usage

# Stream responses in terminal
praisonai chat --stream "Tell me a joke"

# With verbose output
praisonai chat --stream --verbose "Explain quantum computing"

Best Practices

• iter_stream() yields raw chunks with zero display overhead, ideal for piping into FastAPI, WebSocket, or custom UIs.
• start() handles display automatically. Pass stream=True for real-time token output in interactive sessions.
• High TTFT indicates model or network issues. Use StreamMetrics to track and optimize.
• The emitter catches callback exceptions silently to avoid breaking the stream. Log errors inside your callback.
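Since the emitter swallows callback exceptions, a wrapper that logs them is a cheap safeguard. A minimal sketch; the logged helper is illustrative, not part of the library:

```python
import logging

logger = logging.getLogger("stream")

def logged(handler):
    """Wrap a stream callback so failures are logged rather than lost."""
    def wrapped(event):
        try:
            handler(event)
        except Exception:
            # logger.exception records the full traceback.
            logger.exception("stream callback failed for %r", event)
    return wrapped

# Usage: agent.stream_emitter.add_callback(logged(on_event))
```

The wrapped callback never raises, so it is safe to register even when the inner handler is flaky.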

Troubleshooting

“Streaming seems to buffer before showing anything”

This is TTFT, not buffering. The model is generating the first token. Check:
  • Model complexity (larger models have higher TTFT)
  • Prompt length (longer prompts take longer to process)
  • Network latency to the API

“Tokens appear in chunks, not one at a time”

This is normal. Providers may batch several tokens into one network chunk for efficiency.
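If you want character-at-a-time display anyway, you can re-chunk on the client side. A small sketch that works over any chunk iterator, including iter_stream():

```python
def rechunk_chars(chunks):
    """Re-emit a chunk stream one character at a time for smoother display."""
    for chunk in chunks:
        yield from chunk

# Prints "Hello" one character per iteration.
for ch in rechunk_chars(["Hel", "lo"]):
    print(ch, end="", flush=True)
```

This only smooths display pacing; it cannot make tokens arrive sooner.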