

Rate Limiter caps how fast your agents call the LLM, keeping you inside provider rate limits and protecting your budget even when many agents share the same limiter. In streaming mode, the limiter is shared by the initial LLM call and the follow-up call that runs after tool execution, so you don't need to configure them separately.

Quick Start

1. Simple RPM limit on one agent

from praisonaiagents import Agent
from praisonaiagents.config.feature_configs import ExecutionConfig

agent = Agent(
    name="Researcher",
    instructions="You research topics on the web.",
    execution=ExecutionConfig(max_rpm=60)
)

agent.start("Summarise the latest Mars rover news")

2. Share one limiter across multiple agents

from praisonaiagents import Agent, PraisonAIAgents
from praisonaiagents.config.feature_configs import ExecutionConfig
from praisonaiagents.llm import RateLimiter

shared = RateLimiter(requests_per_minute=60, burst=5)

researcher = Agent(
    name="Researcher",
    instructions="Research topics",
    execution=ExecutionConfig(rate_limiter=shared)
)
writer = Agent(
    name="Writer",
    instructions="Write articles",
    execution=ExecutionConfig(rate_limiter=shared)
)

team = PraisonAIAgents(agents=[researcher, writer])
team.start()

The same RateLimiter instance can be shared across any number of agents and threads; the combined throughput stays inside the configured budget.

3. Token-based limiting (for TPM-quoted providers)

from praisonaiagents import Agent
from praisonaiagents.config.feature_configs import ExecutionConfig
from praisonaiagents.llm import RateLimiter

limiter = RateLimiter(
    requests_per_minute=60,
    tokens_per_minute=90_000,
    burst=5,
)

agent = Agent(
    name="Analyst",
    instructions="Analyse long documents",
    execution=ExecutionConfig(rate_limiter=limiter)
)

How It Works

Step | What happens
--- | ---
Refill | Tokens regenerate based on elapsed time and requests_per_minute / tokens_per_minute.
Acquire | A thread reserves a token; under contention, only one thread mutates state at a time.
Wait | If no tokens are available, the caller sleeps (sync) or awaits (async) until the next refill.
Release | No explicit release; tokens refill automatically on a rolling window.
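
Conceptually, this is a token bucket. The sketch below illustrates the refill/acquire cycle; the class name and internals are assumptions for illustration, not RateLimiter's actual implementation.

import threading
import time

class TokenBucketSketch:
    """Illustrative token bucket, not the real RateLimiter internals."""

    def __init__(self, requests_per_minute: int, burst: int = 1):
        self.rate = requests_per_minute / 60.0  # tokens regenerated per second
        self.capacity = float(burst)            # max tokens held at once
        self.tokens = float(burst)
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()            # only one thread mutates state at a time

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def acquire(self) -> None:
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= 1:
                    self.tokens -= 1  # reserve a token for this request
                    return
                wait = (1 - self.tokens) / self.rate  # time until the next whole token
            time.sleep(wait)  # sleep outside the lock so other threads can proceed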

Configuration Options

Option | Type | Default | Description
--- | --- | --- | ---
requests_per_minute | int | Required | Max LLM requests per rolling 60-second window.
tokens_per_minute | int | None | Optional token-budget limit (for TPM-quoted providers).
burst | int | 1 | Max burst size: requests allowed back-to-back before the rate kicks in.

Thread Safety & Multi-Agent Use

Every method on RateLimiter — both sync (acquire, acquire_tokens, try_acquire, reset) and async (acquire_async, acquire_tokens_async) — is safe to call concurrently. You can share a single RateLimiter across threads, AgentTeam members, PraisonAIAgents, and ParallelToolCallExecutor workers without exceeding the configured budget.

Thread pool with shared limiter

from concurrent.futures import ThreadPoolExecutor
from praisonaiagents import Agent
from praisonaiagents.config.feature_configs import ExecutionConfig
from praisonaiagents.llm import RateLimiter

limiter = RateLimiter(requests_per_minute=60, burst=5)

def run_agent(question: str) -> str:
    agent = Agent(
        name="Worker",
        instructions="Answer concisely",
        execution=ExecutionConfig(rate_limiter=limiter),
    )
    return agent.start(question)

with ThreadPoolExecutor(max_workers=10) as pool:
    answers = list(pool.map(run_agent, [f"Q{i}" for i in range(50)]))
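
The async methods can be shared the same way. Below is a sketch using asyncio; agent.achat(question) is assumed to accept the prompt string (achat is mentioned under Best Practices, but its exact signature is an assumption here):

import asyncio
from praisonaiagents import Agent
from praisonaiagents.config.feature_configs import ExecutionConfig
from praisonaiagents.llm import RateLimiter

limiter = RateLimiter(requests_per_minute=60, burst=5)

async def run_agent(question: str) -> str:
    agent = Agent(
        name="Worker",
        instructions="Answer concisely",
        execution=ExecutionConfig(rate_limiter=limiter),
    )
    # achat() calls acquire_async() internally, so concurrent tasks queue
    # instead of bursting past the shared budget (signature assumed)
    return await agent.achat(question)

async def main() -> None:
    answers = await asyncio.gather(*(run_agent(f"Q{i}") for i in range(50)))

asyncio.run(main())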

Monitoring available budget

limiter = RateLimiter(requests_per_minute=60, tokens_per_minute=90_000)

print(f"Requests left: {limiter.available_tokens:.1f}")
print(f"API tokens left: {limiter.available_api_tokens:.1f}")

available_tokens and available_api_tokens are safe to read from any thread; they acquire the same locks as acquire() internally.
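
For example, a dispatcher can consult these gauges before queueing more work. dispatch_next_job below is a hypothetical helper, shown only to illustrate the pattern:

import time

# Illustrative back-pressure loop: wait until at least one request token is free
while limiter.available_tokens < 1:
    time.sleep(0.5)
dispatch_next_job()  # hypothetical helper, not part of praisonaiagents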

Manual Usage

When not using ExecutionConfig, you can acquire tokens directly:
limiter = RateLimiter(requests_per_minute=60)

# Sync
limiter.acquire()  # Blocks if rate exceeded

# Async
await limiter.acquire_async()

# Non-blocking
if limiter.try_acquire():
    # Token acquired
    pass
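
acquire_tokens and acquire_tokens_async (listed under Thread Safety & Multi-Agent Use) draw from the tokens_per_minute budget. The sketch below assumes acquire_tokens accepts an estimated token count; treat the signature as illustrative:

limiter = RateLimiter(requests_per_minute=60, tokens_per_minute=90_000)

prompt = "Summarise this quarterly report..."
estimated_tokens = len(prompt) // 4  # rough heuristic: ~4 characters per token

limiter.acquire()                         # reserve one request from the RPM budget
limiter.acquire_tokens(estimated_tokens)  # reserve from the TPM budget (signature assumed)
# ...make the LLM call here...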

Best Practices

- A low burst (1–5) smooths traffic; a high burst tolerates spiky demand (see the sketch after this list).
- OpenAI / Anthropic quote both RPM and TPM; limiting only on RPM can still trip 429s.
- agent.achat(...) automatically calls acquire_async(); avoid mixing sync and async limiters in one workflow.
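
For example, the same 60 RPM budget behaves very differently depending on burst; both limiters below use only the constructor options documented above:

from praisonaiagents.llm import RateLimiter

# Smooth, steady traffic: roughly one request per second, no back-to-back calls
steady = RateLimiter(requests_per_minute=60, burst=1)

# Spiky workloads: up to 10 requests fire immediately, then throttle to 60 RPM
spiky = RateLimiter(requests_per_minute=60, burst=10)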

CLI

praisonai "task" --rpm 60
