TL;DR — When agent.chat() fails (LLM 5xx, timeout, rate-limit) the user’s message is normally lost. Set dlq=InboundDLQ(...) and PraisonAI persists the failed message so you can replay it later.

Why you want this

No silent data loss

Failed inbound messages are persisted to a SQLite file before the exception bubbles up.

Operator-friendly replay

A single CLI command (praisonai bot dlq replay) re-runs failed messages through the agent.

Bounded by design

TTL + max_size keep the queue from growing without bound; the oldest entries are evicted first.

Zero new dependency

Uses only stdlib sqlite3. Default OFF — your existing bots are untouched.
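
To make "bounded by design" concrete, here is a minimal sketch of a TTL- and size-bounded queue built on stdlib sqlite3. It is not PraisonAI's InboundDLQ code: the class name TinyBoundedQueue, the table layout, and the `now` parameter (injected so the example is deterministic) are all assumptions made for illustration.

```python
import sqlite3
import time


class TinyBoundedQueue:
    """Illustrative stand-in only; NOT the real InboundDLQ implementation."""

    def __init__(self, path=":memory:", max_size=10_000, ttl_seconds=604_800):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS dlq ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "created_at REAL NOT NULL, "
            "payload TEXT NOT NULL)"
        )

    def evict_expired(self, now=None):
        """Delete rows older than ttl_seconds."""
        now = time.time() if now is None else now
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "DELETE FROM dlq WHERE created_at < ?",
                (now - self.ttl_seconds,),
            )

    def enqueue(self, payload, now=None):
        """Insert a row, then trim the table back to max_size (oldest first)."""
        now = time.time() if now is None else now
        self.evict_expired(now)
        with self.db:
            self.db.execute(
                "INSERT INTO dlq (created_at, payload) VALUES (?, ?)",
                (now, payload),
            )
            # Keep only the newest max_size rows.
            self.db.execute(
                "DELETE FROM dlq WHERE id NOT IN "
                "(SELECT id FROM dlq ORDER BY id DESC LIMIT ?)",
                (self.max_size,),
            )

    def size(self):
        return self.db.execute("SELECT COUNT(*) FROM dlq").fetchone()[0]

    def payloads(self):
        """Return payloads newest first."""
        rows = self.db.execute("SELECT payload FROM dlq ORDER BY id DESC")
        return [row[0] for row in rows]


q = TinyBoundedQueue(max_size=2, ttl_seconds=60)
q.enqueue("a", now=0)
q.enqueue("b", now=10)
q.enqueue("c", now=20)   # "a" is dropped because the queue exceeds max_size
q.enqueue("d", now=100)  # "b" and "c" are dropped because they are older than 60 s
```

Evicting inside enqueue() means no background thread is needed, at the cost of a small amount of extra work on each write.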

How it flows

Quick start (3 lines)

from praisonai.bots import BotSessionManager, InboundDLQ

dlq = InboundDLQ(path="~/.praisonai/dlq.sqlite")
mgr = BotSessionManager(platform="telegram", dlq=dlq)
# ↑ that's it — failed agent.chat() now lands in the DLQ
Default behaviour is unchanged when no dlq= is passed. This is fully opt-in.
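
Conceptually, "lands in the DLQ" means the message is recorded before the exception reaches the caller. The sketch below shows that persist-then-re-raise pattern in isolation. The helper chat_with_dead_letter, the failing_chat function, and the plain list used as a store are hypothetical stand-ins, not PraisonAI APIs.

```python
def chat_with_dead_letter(chat, message, dead_letters):
    """Call chat(message); on failure, record the message and re-raise."""
    try:
        return chat(message)
    except Exception as exc:
        # Record first, so the message survives even if the caller
        # does not handle the exception.
        dead_letters.append({"message": message, "error": str(exc)})
        raise


def failing_chat(message):
    # Hypothetical agent call that always fails, standing in for an LLM 503.
    raise RuntimeError("simulated LLM 503")


dead_letters = []
try:
    chat_with_dead_letter(failing_chat, "What is 2 plus 2?", dead_letters)
except RuntimeError:
    pass  # the caller still sees the original error

print(len(dead_letters))
```

Re-raising keeps existing error handling intact: callers that already catch the exception behave as before, and the only new effect is the stored record.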

CLI

# List failed messages (newest first)
praisonai bot dlq list

# List from a custom path
praisonai bot dlq list --path /var/lib/myapp/dlq.sqlite --limit 50

# Replay through your bot's configured agent
praisonai bot dlq replay --config bot.yaml

# Purge everything (asks for confirmation)
praisonai bot dlq purge
praisonai bot dlq purge --yes  # skip confirmation

API reference

path (str | Path, required)
Where the SQLite file lives. Parent directories are created automatically.

max_size (int, default: 10_000)
Maximum number of entries kept. When exceeded, oldest entries are dropped first.

ttl_seconds (int, default: 604800, i.e. 7 days)
Entries older than this are evicted on the next enqueue() or evict_expired().

DLQEntry

Methods

dlq.size()                  # int
dlq.list(limit=100)         # list[DLQEntry], newest first

Real LLM smoke test

[1] Sending failing message: 'What is 2 plus 2? Answer with a single digit.'
   Caught expected error: simulated LLM 503
   DLQ size after fail: 1  ✅

[2] Replaying DLQ via real LLM …
   succeeded=1, failed=0, remaining=0

[Real LLM reply] 4

PASS: DLQ → replay → real LLM produced expected '4'.
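
The succeeded / failed / remaining counters in the output above suggest a simple replay loop. The sketch below shows one way such a loop could work. The replay function and its return shape are assumptions for illustration, and the source does not state how the real command treats entries that fail again; here they are simply kept.

```python
def replay(entries, handler):
    """Run handler on each entry; keep the entries that fail again."""
    succeeded = 0
    failed = 0
    remaining = []
    for entry in entries:
        try:
            handler(entry)
        except Exception:
            failed += 1
            remaining.append(entry)  # still undelivered, keep for a later replay
        else:
            succeeded += 1
    return {"succeeded": succeeded, "failed": failed, "remaining": remaining}


def handler(message):
    # Hypothetical handler: rejects empty messages, accepts everything else.
    if not message:
        raise ValueError("empty message")
    return message


result = replay(["What is 2 plus 2?", ""], handler)
print(result["succeeded"], result["failed"], len(result["remaining"]))
```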

Operational notes

Disk usage: every failed message and its prompt is written to disk. With chronic LLM outages this can grow fast. Tune max_size and ttl_seconds for your retention policy.
Thread safety: every write is guarded by an internal threading.Lock, and SQLite WAL is enabled. It is safe to share one InboundDLQ instance across threads.
Backward compatible: BotSessionManager(...) without dlq= behaves exactly as before, so existing bots are unaffected.
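
The thread-safety note can be illustrated with a minimal sketch that combines a threading.Lock with SQLite's WAL journal mode. This is an assumption about how such a store might be built, not the actual InboundDLQ internals. The LockedStore class and the demo function are hypothetical names.

```python
import os
import sqlite3
import tempfile
import threading


class LockedStore:
    """Illustrative only: one shared connection, serialised by a lock."""

    def __init__(self, path):
        # check_same_thread=False allows use from several threads;
        # the lock below is what actually serialises access.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        # The PRAGMA returns the journal mode now in effect.
        self.journal_mode = self._db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        with self._db:
            self._db.execute("CREATE TABLE IF NOT EXISTS dlq (payload TEXT NOT NULL)")

    def enqueue(self, payload):
        with self._lock, self._db:  # lock first, then commit or roll back
            self._db.execute("INSERT INTO dlq (payload) VALUES (?)", (payload,))

    def size(self):
        with self._lock:
            return self._db.execute("SELECT COUNT(*) FROM dlq").fetchone()[0]

    def close(self):
        with self._lock:
            self._db.close()


def demo(workers=8, per_worker=25):
    """Write from several threads and return (journal_mode, row_count)."""
    with tempfile.TemporaryDirectory() as tmp:
        store = LockedStore(os.path.join(tmp, "dlq.sqlite"))

        def worker(n):
            for i in range(per_worker):
                store.enqueue(f"{n}-{i}")

        threads = [threading.Thread(target=worker, args=(n,)) for n in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        result = (store.journal_mode, store.size())
        store.close()
        return result
```

Calling demo() should report every row written by every thread, since each insert runs under the lock.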

Combining with other features

The DLQ records platform and user_id. If W1’s IdentityResolver is wired, the same user_id resolves to the same human across platforms. Replay restores the exact session.
For transient failures use praisonai.bots._resilience.BackoffPolicy to retry inline before falling back to the DLQ. The DLQ is the last resort, not the first.
Wrap dlq.enqueue() with your tracer (e.g. OTEL span) to alert on DLQ growth. A non-zero dlq.size() is a great SLO trip-wire.
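
The "retry inline first, DLQ as last resort" ordering can be sketched as follows. This does not use praisonai.bots._resilience.BackoffPolicy, whose signature is not shown here. The helper call_with_backoff, its parameters, and the list-based dead-letter store are hypothetical stand-ins for illustration.

```python
import time


def call_with_backoff(fn, message, dead_letters, attempts=3, base_delay=0.5,
                      sleep=time.sleep):
    """Retry fn(message) with exponential backoff; dead-letter on final failure."""
    for attempt in range(attempts):
        try:
            return fn(message)
        except Exception as exc:
            if attempt == attempts - 1:
                # Retries exhausted: record the message, then surface the error.
                dead_letters.append({"message": message, "error": str(exc)})
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...


def make_flaky(failures):
    """Return a function that fails `failures` times, then echoes its input."""
    state = {"left": failures}

    def flaky(message):
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("simulated timeout")
        return f"reply to {message}"

    return flaky


dead_letters = []
delays = []
# Record the delays instead of sleeping so the example runs instantly.
reply = call_with_backoff(make_flaky(2), "hi", dead_letters, sleep=delays.append)
```

With two transient failures and three attempts, the call succeeds on the third try and nothing reaches the dead-letter store; only a message that fails on every attempt is recorded.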
OSS (now): File-backed SQLite DLQ for single-host deploys.
Cloud (planned): Multi-region replicated DLQ with web dashboard, automatic alerting, and one-click bulk replay.