
Overview

Streaming enables real-time token delivery from LLM providers, displaying responses as they’re generated rather than waiting for the complete response. This creates a more responsive user experience.

Key Concepts

Time To First Token (TTFT)

TTFT is the time between sending a request and receiving the first token. This delay is inherent to LLM generation—you cannot stream tokens before the model produces them.
Request → [TTFT] → First Token → [Streaming] → Last Token → Done

| Metric | Description |
|---|---|
| TTFT | Time from request to first token (provider latency) |
| Stream Duration | Time from first to last token |
| Total Time | End-to-end request time |
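
To make these metrics concrete, here is a minimal timing sketch that calls the OpenAI Python SDK directly rather than going through PraisonAI (it assumes the openai package is installed and OPENAI_API_KEY is set; the model and prompt are placeholders):

import time
from openai import OpenAI

client = OpenAI()

request_start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True,
)

first_token_at = None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # TTFT marker
        print(chunk.choices[0].delta.content, end="", flush=True)
last_token_at = time.perf_counter()

print(f"\nTTFT: {(first_token_at - request_start) * 1000:.0f}ms")
print(f"Stream duration: {(last_token_at - first_token_at) * 1000:.0f}ms")
print(f"Total time: {(last_token_at - request_start) * 1000:.0f}ms")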

Streaming vs Non-Streaming

| Mode | Behavior | Use Case |
|---|---|---|
| stream=True | Tokens appear as generated | Interactive chat, real-time display |
| stream=False | Complete response returned at once | Batch processing, structured output |
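
In code the switch is a single flag. The sketch below assumes OutputConfig accepts stream=False as well as stream=True; only stream=True appears elsewhere on this page, so treat the stream=False form as an assumption to verify.

from praisonaiagents import Agent, OutputConfig

# Interactive chat: tokens render as they arrive
chat_agent = Agent(
    instructions="You are a helpful assistant",
    output=OutputConfig(stream=True),
)

# Batch processing: stream=False (assumed) returns the complete response at once
batch_agent = Agent(
    instructions="You are a helpful assistant",
    output=OutputConfig(stream=False),
)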

Basic Usage

Enable Streaming

from praisonaiagents import Agent

# Create agent with streaming enabled
agent = Agent(
    instructions="You are a helpful assistant",
    llm="gpt-4o-mini",
    output="verbose"  # Use output= for display settings (includes streaming)
)

# Tokens appear as they arrive
result = agent.start("Write a short story")

Using OutputConfig

from praisonaiagents import Agent, OutputConfig

agent = Agent(
    instructions="You are a helpful assistant",
    output=OutputConfig(
        stream=True,
        output="verbose",
        metrics=True  # Show timing metrics
    )
)

# Run the agent; with metrics=True, timing metrics are also shown
result = agent.start("Explain streaming in one sentence")

CLI Usage

# Stream responses in terminal
praisonai chat --stream "Tell me a joke"

# With verbose output
praisonai chat --stream --verbose "Explain quantum computing"

Advanced: StreamEvent Protocol

For programmatic access to streaming events, use the StreamEvent protocol:
from praisonaiagents.streaming import (
    StreamEvent,
    StreamEventType,
    StreamMetrics,
    create_logging_callback
)

# Create a metrics-tracking callback
stream_logger, callback = create_logging_callback(
    output="verbose",
    metrics=True
)

# Events emitted during streaming:
# - REQUEST_START: Before API call
# - HEADERS_RECEIVED: When HTTP 200 arrives
# - FIRST_TOKEN: First content delta (TTFT marker)
# - DELTA_TEXT: Each text chunk
# - DELTA_TOOL_CALL: Tool call streaming
# - LAST_TOKEN: Final content delta
# - STREAM_END: Stream completed

# After streaming, get metrics
print(stream_logger.get_metrics_summary())
# Output: TTFT: 450ms | Stream: 2100ms | Total: 2550ms | Tokens: 150 (71.4/s)
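
Beyond the built-in logging callback, you can react to individual events yourself. The sketch below is illustrative only: it assumes the callback receives one StreamEvent per emitted event and that each event exposes a type attribute; check the praisonaiagents.streaming source for the actual callback signature before relying on it.

import time
from praisonaiagents.streaming import StreamEvent, StreamEventType

# Illustrative only: assumes the callback is invoked once per emitted event
# and that each event exposes a `type` attribute matching StreamEventType.
class TTFTTracker:
    def __init__(self) -> None:
        self.request_start: float | None = None

    def __call__(self, event: StreamEvent) -> None:
        now = time.perf_counter()
        if event.type == StreamEventType.REQUEST_START:
            self.request_start = now
        elif event.type == StreamEventType.FIRST_TOKEN and self.request_start:
            print(f"TTFT: {(now - self.request_start) * 1000:.0f}ms")
        elif event.type == StreamEventType.STREAM_END and self.request_start:
            print(f"Total: {(now - self.request_start) * 1000:.0f}ms")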

Understanding Perceived Delays

Why does streaming seem slow?

  1. TTFT is provider-dependent: The model must process your prompt and begin generating before any tokens arrive. This is not buffering—it’s generation time.
  2. Network latency: Round-trip time to the API adds to TTFT.
  3. Response length: Longer responses take longer to stream, but you see progress immediately.

What streaming does NOT do

  • Stream tokens before the provider generates them
  • Eliminate TTFT (this is inherent to LLM generation)
  • Make the total response time faster

What streaming DOES do

  • Show tokens immediately as they arrive
  • Provide visual feedback during generation
  • Enable early termination if needed (see the sketch after this list)
  • Improve perceived responsiveness
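
Early termination, for example, is just stopping consumption of the stream. A minimal sketch using the OpenAI Python SDK directly (not PraisonAI-specific; requires an OPENAI_API_KEY):

from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a very long story"}],
    stream=True,
)

collected = ""
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        collected += chunk.choices[0].delta.content
    if len(collected) > 500:
        break  # stop reading; the rest of the response is never consumed

Breaking out of the loop only stops the client-side read; it does not make the model generate the earlier tokens any faster.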

Best Practices

  • Use stream=True for chat: interactive conversations benefit from immediate feedback.
  • Use stream=False for batch: batch processing doesn’t need streaming overhead.
  • Monitor TTFT: high TTFT may indicate model or network issues.
  • Enable metrics: use metrics=True to track streaming performance.

Timing Glossary

| Term | Definition |
|---|---|
| Request Start | Timestamp when API call is initiated |
| Headers Received | When HTTP response headers arrive (200 OK) |
| First Token | First content delta received (TTFT marker) |
| Token Cadence | Rate of token delivery (tokens/second) |
| Last Token | Final content delta received |
| Stream End | Stream processing completed |

Troubleshooting

“Streaming seems to buffer before showing anything”

This is TTFT, not buffering. The model is generating the first token. Check:
  • Model complexity (larger models have higher TTFT)
  • Prompt length (longer prompts take longer to process)
  • Network latency to the API

“Tokens appear in chunks, not one at a time”

This is normal. Providers may batch tokens for efficiency, and the refresh_per_second setting of Rich’s Live display limits how often the terminal repaints, so several tokens can land in a single visual update.
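
As a standalone illustration (using the Rich library directly, not anything PraisonAI-specific), the snippet below appends "tokens" faster than the display refreshes, so several of them appear in one visual update:

import time
from rich.live import Live
from rich.text import Text

text = Text()
# Rich repaints at most refresh_per_second times per second, so tokens that
# arrive between repaints show up together in a single visual update.
with Live(text, refresh_per_second=4):
    for token in ["Streaming ", "looks ", "chunky ", "at ", "low ", "refresh ", "rates."]:
        text.append(token)
        time.sleep(0.05)  # tokens arrive every 50ms, faster than the 4 Hz repaint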

“Non-streaming is faster than streaming”

Total time is similar, but streaming shows progress. Non-streaming waits for the complete response, which can feel slower even if the total time is the same.