Token Estimation
PraisonAI provides fast, offline token estimation that works without API calls. This enables real-time context-budget tracking and optimization decisions.
Estimation Algorithm
The heuristic estimator uses character-based rules optimized for typical LLM tokenization:

| Character Type | Tokens per Character |
|---|---|
| ASCII text | ~0.25 (4 chars = 1 token) |
| Non-ASCII (Unicode) | ~1.3 |
| Whitespace | ~0.25 (counted as ASCII) |
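The rules in the table can be sketched in a few lines of Python. This is an illustrative reimplementation, not PraisonAI's actual code; the round-half-up behavior is an assumption:

```python
def estimate_tokens_heuristic(text: str) -> int:
    """Character-based estimate: ~0.25 tokens per ASCII character,
    ~1.3 tokens per non-ASCII character. Whitespace counts as ASCII."""
    if not text:
        return 0
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    non_ascii = len(text) - ascii_chars
    # Round half up; the real implementation may round differently.
    return int(ascii_chars * 0.25 + non_ascii * 1.3 + 0.5)

print(estimate_tokens_heuristic("hello world"))  # 11 ASCII chars -> ~3 tokens
```

Because the estimate is a single pass over the characters, it scales linearly with input length and never touches the network.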
Message Overhead
Each message includes overhead for role markers and formatting:

- Base overhead: 4 tokens per message
- Role tokens: ~2 tokens
- Content: Estimated via heuristic
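Putting the overhead rules together, per-message estimation might look like the sketch below. It assumes the character heuristic from the table above, and treats the "~2 tokens" role cost as a flat constant (an assumption):

```python
from typing import Dict, List

BASE_OVERHEAD = 4   # role markers and formatting, per message
ROLE_TOKENS = 2     # flat approximation of the "~2 tokens" role cost

def estimate_tokens_heuristic(text: str) -> int:
    # ~0.25 tokens per ASCII char, ~1.3 per non-ASCII char, rounded half up
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return int(ascii_chars * 0.25 + (len(text) - ascii_chars) * 1.3 + 0.5) if text else 0

def estimate_messages_tokens(messages: List[Dict]) -> int:
    return sum(
        BASE_OVERHEAD + ROLE_TOKENS + estimate_tokens_heuristic(m.get("content", ""))
        for m in messages
    )

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello world"},
]
print(estimate_messages_tokens(msgs))
```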
API Reference
estimate_tokens_heuristic(text: str) -> int
Estimate tokens for a string using character-based heuristics.
estimate_messages_tokens(messages: List[Dict]) -> int
Estimate total tokens for a list of chat messages.
estimate_tool_schema_tokens(tools: List[Dict]) -> int
Estimate tokens for tool/function schemas.
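One plausible way to estimate schema cost is to serialize each schema to JSON and run the character heuristic over the result. The serialization strategy here is an assumption; PraisonAI may weight schemas differently:

```python
import json
from typing import Dict, List

def estimate_tokens_heuristic(text: str) -> int:
    # ~0.25 tokens per ASCII char, ~1.3 per non-ASCII char, rounded half up
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return int(ascii_chars * 0.25 + (len(text) - ascii_chars) * 1.3 + 0.5) if text else 0

def estimate_tool_schema_tokens(tools: List[Dict]) -> int:
    # Serialize compactly so whitespace separators do not inflate the estimate.
    return sum(
        estimate_tokens_heuristic(json.dumps(t, separators=(",", ":")))
        for t in tools
    )

tool = {"name": "get_weather", "parameters": {"city": {"type": "string"}}}
print(estimate_tool_schema_tokens([tool]))
```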
TokenEstimatorImpl
Class-based estimator that caches repeated estimates.
get_estimator() -> TokenEstimatorImpl
Get a singleton estimator instance.
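A minimal sketch of a caching estimator and its singleton accessor, assuming a per-instance LRU cache; PraisonAI's actual internals may differ:

```python
from functools import lru_cache
from typing import Optional

class TokenEstimatorImpl:
    """Character-heuristic estimator that memoizes results per string."""

    def __init__(self, cache_size: int = 1024):
        # Bind lru_cache per instance so each estimator has its own cache.
        self._estimate = lru_cache(maxsize=cache_size)(self._estimate_uncached)

    @staticmethod
    def _estimate_uncached(text: str) -> int:
        # ~0.25 tokens per ASCII char, ~1.3 per non-ASCII char, rounded half up
        ascii_chars = sum(1 for ch in text if ord(ch) < 128)
        return int(ascii_chars * 0.25 + (len(text) - ascii_chars) * 1.3 + 0.5) if text else 0

    def estimate(self, text: str) -> int:
        return self._estimate(text)

_singleton: Optional[TokenEstimatorImpl] = None

def get_estimator() -> TokenEstimatorImpl:
    """Return the process-wide estimator, creating it on first use."""
    global _singleton
    if _singleton is None:
        _singleton = TokenEstimatorImpl()
    return _singleton
```

Repeated estimates of the same prompt then hit the cache instead of re-scanning the string.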
Accuracy Considerations
The heuristic estimator is designed for speed over perfect accuracy:

| Scenario | Accuracy |
|---|---|
| English text | ~90-95% |
| Code | ~85-90% |
| Mixed content | ~85-90% |
| Non-ASCII heavy | ~80-85% |
Performance
- Speed: < 1ms for 100K characters
- Memory: O(1); the heuristic itself requires no caching
- No API calls: Works completely offline
Integration with Budgeter
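As an illustration of how estimates feed budget decisions, here is a sketch of checking messages against a context budget. The `fits_budget` helper and its parameters are hypothetical; see the Context Budgeter page for the real interface:

```python
from typing import Dict, List

def estimate_tokens_heuristic(text: str) -> int:
    # ~0.25 tokens per ASCII char, ~1.3 per non-ASCII char, rounded half up
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return int(ascii_chars * 0.25 + (len(text) - ascii_chars) * 1.3 + 0.5) if text else 0

def estimate_messages_tokens(messages: List[Dict]) -> int:
    # 4 tokens base overhead + ~2 role tokens per message, plus content
    return sum(4 + 2 + estimate_tokens_heuristic(m.get("content", "")) for m in messages)

def fits_budget(messages: List[Dict], context_window: int, reserve_for_output: int) -> bool:
    # Hypothetical helper: keep the estimated prompt under the context
    # window minus whatever we reserve for the model's reply.
    return estimate_messages_tokens(messages) <= context_window - reserve_for_output

msgs = [{"role": "user", "content": "hello world"}]
print(fits_budget(msgs, context_window=4096, reserve_for_output=1024))
```

Because the estimate is offline and sub-millisecond, this check can run before every model call without adding noticeable latency.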
Next Steps
- Context Ledger - Track tokens by segment
- Context Budgeter - Allocate token budgets

