Rate Limiter caps how fast your agents call the LLM, keeping you inside provider rate limits and protecting your budget, even when many agents share the same limiter. The limiter is shared by both the initial LLM call and the follow-up call that runs after tool execution in streaming mode, so you don't need to configure them separately.
## Quick Start
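A minimal sketch of wiring a limiter into an agent. `RateLimiter`, `ExecutionConfig`, and `requests_per_minute` come from this page; the `praisonaiagents.execution` import path and the `execution_config` parameter are assumptions for illustration.

```python
from praisonaiagents import Agent
from praisonaiagents.execution import ExecutionConfig, RateLimiter  # assumed module path

limiter = RateLimiter(requests_per_minute=60)  # 60 requests per rolling minute

agent = Agent(
    instructions="You are a helpful assistant.",
    execution_config=ExecutionConfig(rate_limiter=limiter),  # assumed parameter name
)

agent.chat("Summarize the latest run logs.")
```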
### Share one limiter across multiple agents

The same `RateLimiter` instance can be shared across any number of agents and threads; the combined throughput stays inside the configured budget.
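For illustration, a sketch of two agents drawing on one limiter, under the same assumed import path and wiring as the Quick Start:

```python
from praisonaiagents import Agent
from praisonaiagents.execution import ExecutionConfig, RateLimiter  # assumed module path

shared = RateLimiter(requests_per_minute=30)

researcher = Agent(
    instructions="Research the topic.",
    execution_config=ExecutionConfig(rate_limiter=shared),  # assumed parameter name
)
writer = Agent(
    instructions="Write the report.",
    execution_config=ExecutionConfig(rate_limiter=shared),  # same instance
)
# Both agents draw from the same bucket, so together they never
# exceed 30 requests per rolling minute.
```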
## How It Works

| Step | What happens |
|---|---|
| Refill | Tokens regenerate based on elapsed time and `requests_per_minute` / `tokens_per_minute`. |
| Acquire | A thread reserves a token; under contention, only one thread mutates state at a time. |
| Wait | If no tokens are available, the caller sleeps (sync) or awaits (async) until the next refill. |
| Release | No explicit release — tokens refill automatically on a rolling window. |
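The steps above describe a classic token bucket. Here is a self-contained sketch of that refill-and-acquire logic, for intuition only and not the library's actual implementation:

```python
import threading
import time

class TokenBucketSketch:
    """Illustrative token bucket; not the library's implementation."""

    def __init__(self, requests_per_minute: int, burst: int = 1):
        self.rate = requests_per_minute / 60.0  # tokens regenerated per second
        self.capacity = burst                   # max tokens held at once
        self.tokens = float(burst)
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()            # one thread mutates state at a time

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def acquire(self) -> None:
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= 1:
                    self.tokens -= 1  # reserve a token; no explicit release
                    return
                wait = (1 - self.tokens) / self.rate  # time until the next token
            time.sleep(wait)  # sleep outside the lock so other threads can proceed
```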
## Choose Your Mode
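Use the sync methods from threads and the async methods from coroutines (both are listed under Thread Safety below). A hedged sketch of the two paths, assuming the import path from the Quick Start:

```python
import asyncio
from praisonaiagents.execution import RateLimiter  # assumed module path

limiter = RateLimiter(requests_per_minute=60)

# Sync path: blocks the calling thread until a slot is free.
limiter.acquire()
# ... call the LLM synchronously ...

# Async path: awaits instead of blocking the event loop.
async def main():
    await limiter.acquire_async()
    # ... await the LLM call, e.g. agent.achat(...) ...

asyncio.run(main())
```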
## Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| `requests_per_minute` | int | Required | Max LLM requests per rolling 60-second window. |
| `tokens_per_minute` | int | None | Optional token-budget limit (for TPM-quoted providers). |
| `burst` | int | 1 | Max burst size: requests allowed back-to-back before the rate limit kicks in. |
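A sketch combining all three options, under the same assumed import path:

```python
from praisonaiagents.execution import RateLimiter  # assumed module path

limiter = RateLimiter(
    requests_per_minute=120,   # required: RPM ceiling on a rolling window
    tokens_per_minute=90_000,  # optional: also enforce the provider's TPM quota
    burst=5,                   # allow up to 5 back-to-back requests
)
```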
## Thread Safety & Multi-Agent Use
Every method on `RateLimiter`, both sync (`acquire`, `acquire_tokens`, `try_acquire`, `reset`) and async (`acquire_async`, `acquire_tokens_async`), is safe to call concurrently. You can share a single `RateLimiter` across threads, `AgentTeam` members, `PraisonAIAgents`, and `ParallelToolCallExecutor` workers without exceeding the configured budget.

### Thread pool with shared limiter
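A hedged sketch of fanning work out over a thread pool while one limiter caps combined throughput; the import path is assumed and the worker body is a stand-in for a real LLM call:

```python
from concurrent.futures import ThreadPoolExecutor
from praisonaiagents.execution import RateLimiter  # assumed module path

limiter = RateLimiter(requests_per_minute=120, burst=5)

def worker(prompt: str) -> str:
    limiter.acquire()  # blocks until the shared budget allows another request
    # Stand-in for the real LLM call, e.g. agent.chat(prompt).
    return f"handled: {prompt}"

prompts = [f"task {i}" for i in range(20)]
with ThreadPoolExecutor(max_workers=8) as pool:
    # Eight threads share one limiter: combined they stay under 120 RPM.
    results = list(pool.map(worker, prompts))
```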
### Monitoring available budget
`available_tokens` and `available_api_tokens` are safe to read from any thread; they acquire the same locks as `acquire()` internally.
## Manual Usage

When not using `ExecutionConfig`, you can acquire tokens directly:
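A sketch using the method names from this page; the `acquire_tokens` argument shape, the `try_acquire` return value, and the `reset` semantics are assumptions:

```python
from praisonaiagents.execution import RateLimiter  # assumed module path

limiter = RateLimiter(requests_per_minute=60, tokens_per_minute=90_000)

limiter.acquire()              # block until a request slot is free
limiter.acquire_tokens(1_500)  # reserve estimated tokens (argument shape assumed)

if limiter.try_acquire():      # non-blocking variant; boolean return assumed
    pass                       # proceed with the LLM call

# Monitoring properties from this page; safe to read from any thread.
print(limiter.available_tokens, limiter.available_api_tokens)

limiter.reset()                # per the method list above; exact semantics assumed
```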
## Best Practices
### Share one limiter across related agents

One limiter enforces a single combined budget; separate per-agent limiters can together exceed the provider quota.
### Match burst to your workload

A low burst (1–5) smooths traffic; a high burst tolerates spiky demand.
### Use `tokens_per_minute` when the provider charges by tokens

OpenAI and Anthropic quote both RPM and TPM limits; limiting only on RPM can still trip 429s.
### Prefer async paths in async flows

`agent.achat(...)` automatically calls `acquire_async()`; avoid mixing sync and async limiters in one workflow.

## CLI
## Related
- Thread-safe chat history and caches
- Limit parallel agent runs

