Token Estimation

PraisonAI provides fast, offline token estimation that works without API calls. This enables real-time context budget tracking and optimization decisions.

Quick Start

from praisonaiagents.context import (
    estimate_tokens_heuristic,
    estimate_messages_tokens,
    estimate_tool_schema_tokens,
)

# Estimate tokens for text
text = "Hello, how are you today?"
tokens = estimate_tokens_heuristic(text)
print(f"Estimated: {tokens} tokens")

# Estimate tokens for messages
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What is Python?"},
]
tokens = estimate_messages_tokens(messages)
print(f"Messages: {tokens} tokens")

# Estimate tool schema tokens
tools = [
    {"name": "read_file", "description": "Read a file"},
    {"name": "write_file", "description": "Write to a file"},
]
tokens = estimate_tool_schema_tokens(tools)
print(f"Tools: {tokens} tokens")

Estimation Algorithm

The heuristic estimator uses character-based rules optimized for typical LLM tokenization:
| Character Type | Tokens per Character |
| --- | --- |
| ASCII text | ~0.25 (4 chars = 1 token) |
| Non-ASCII (Unicode) | ~1.3 |
| Whitespace | Counted normally |
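
The rules above can be sketched as a single character-class pass. This is a minimal illustration of the heuristic, not the library's actual implementation:

```python
def estimate_tokens_sketch(text: str) -> int:
    """Character-based token estimate: ~4 ASCII chars per token,
    ~1.3 tokens per non-ASCII character."""
    if not text:
        return 0
    ascii_chars = sum(1 for c in text if ord(c) < 128)
    non_ascii_chars = len(text) - ascii_chars
    estimate = ascii_chars * 0.25 + non_ascii_chars * 1.3
    return max(1, round(estimate))

print(estimate_tokens_sketch("Hello world!"))  # 12 ASCII chars -> 3
```

Because the pass only counts characters, it runs in a single O(n) scan with no tokenizer model loaded.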

Message Overhead

Each message includes overhead for role markers and formatting:
  • Base overhead: 4 tokens per message
  • Role tokens: ~2 tokens
  • Content: Estimated via heuristic
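
Putting the pieces together, a per-message estimate looks roughly like this. The sketch below is illustrative: it inlines the character heuristic described above rather than calling the library's internals:

```python
def estimate_messages_tokens_sketch(messages: list[dict]) -> int:
    """Sum per-message overhead (4 base + ~2 role tokens) plus a
    character-based estimate of each message's content."""
    def content_tokens(text: str) -> int:
        ascii_chars = sum(1 for c in text if ord(c) < 128)
        return round(ascii_chars * 0.25 + (len(text) - ascii_chars) * 1.3)

    total = 0
    for msg in messages:
        total += 4 + 2  # base overhead + role tokens
        total += content_tokens(msg.get("content", ""))
    return total

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What is Python?"},
]
print(estimate_messages_tokens_sketch(messages))  # 20
```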

API Reference

estimate_tokens_heuristic(text: str) -> int

Estimate tokens for a string using character-based heuristics.
tokens = estimate_tokens_heuristic("Hello world!")
# Returns: ~3 tokens

estimate_messages_tokens(messages: List[Dict]) -> int

Estimate total tokens for a list of chat messages.
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]
tokens = estimate_messages_tokens(messages)

estimate_tool_schema_tokens(tools: List[Dict]) -> int

Estimate tokens for tool/function schemas.
tools = [{"name": "search", "description": "Search the web"}]
tokens = estimate_tool_schema_tokens(tools)

TokenEstimatorImpl

Class-based estimator with caching:
from praisonaiagents.context import TokenEstimatorImpl

estimator = TokenEstimatorImpl()
tokens = estimator.estimate("Some text")

get_estimator() -> TokenEstimatorImpl

Get a singleton estimator instance:
from praisonaiagents.context import get_estimator

estimator = get_estimator()
tokens = estimator.estimate("Text to estimate")

Accuracy Considerations

The heuristic estimator is designed for speed over perfect accuracy:
| Scenario | Accuracy |
| --- | --- |
| English text | ~90-95% |
| Code | ~85-90% |
| Mixed content | ~85-90% |
| Non-ASCII heavy | ~80-85% |
For budget decisions, the estimator adds a small safety margin to prevent underestimation.
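
For example, a budget check might inflate the raw estimate before comparing it to the limit. The 10% margin below is an illustrative value, not the library's actual figure:

```python
SAFETY_MARGIN = 1.10  # illustrative 10% cushion against underestimation

def fits_in_budget(estimated_tokens: int, budget_tokens: int) -> bool:
    """Treat the estimate pessimistically so near-limit prompts are flagged."""
    return estimated_tokens * SAFETY_MARGIN <= budget_tokens

print(fits_in_budget(900, 1000))  # True:  990 <= 1000
print(fits_in_budget(950, 1000))  # False: 1045 > 1000
```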

Performance

  • Speed: < 1ms for 100K characters
  • Memory: O(1) - no caching required
  • No API calls: Works completely offline

Integration with Budgeter

from praisonaiagents.context import (
    ContextBudgeter,
    estimate_messages_tokens,
)

budgeter = ContextBudgeter(model="gpt-4o-mini")
budget = budgeter.allocate()

# Check if messages fit in budget
messages = [...]  # Your conversation
tokens = estimate_messages_tokens(messages)

if tokens > budget.usable * 0.8:
    print("Warning: Approaching context limit!")

Next Steps