# Token Estimation Validation

> Validate token estimates and track estimation accuracy

Token estimation validation compares fast heuristic estimates against accurate tokenizer counts and logs any mismatch above a configurable threshold, which helps when debugging context budgeting.

## Quick Start

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
from praisonaiagents import ContextManager, ManagerConfig, EstimationMode

config = ManagerConfig(
    estimation_mode=EstimationMode.VALIDATED,
    log_estimation_mismatch=True,
    mismatch_threshold_pct=15.0,
)

manager = ContextManager(model="gpt-4o-mini", config=config)

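# Sample input; replace with your own content
text = "The quick brown fox jumps over the lazy dog."

# Estimate with validation
tokens, metrics = manager.estimate_tokens(text, validate=True)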

if metrics:
    print(f"Heuristic: {metrics.heuristic_estimate}")
    print(f"Accurate: {metrics.accurate_estimate}")
    print(f"Error: {metrics.error_pct:.1f}%")
```

## Estimation Modes

| Mode        | Description                                              | Performance |
| ----------- | -------------------------------------------------------- | ----------- |
| `HEURISTIC` | Fast character-based estimate                            | Fastest     |
| `ACCURATE`  | Uses tiktoken if available, otherwise uses the heuristic | Slower      |
| `VALIDATED` | Computes both, compares them, and logs mismatches        | Slowest     |

## Configuration

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
config = ManagerConfig(
    estimation_mode=EstimationMode.VALIDATED,
    log_estimation_mismatch=True,      # Log when mismatch > threshold
    mismatch_threshold_pct=15.0,       # 15% threshold
)
```

### Environment Variables

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
export PRAISONAI_CONTEXT_ESTIMATION_MODE=validated
export PRAISONAI_CONTEXT_LOG_MISMATCH=true
```

## EstimationMetrics

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
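from dataclasses import dataclass

from praisonaiagents import EstimationMode

@dataclass
class EstimationMetrics:
    heuristic_estimate: int    # Fast character-based estimate
    accurate_estimate: int     # tiktoken count
    error_pct: float           # Percentage error between the two
    estimator_used: EstimationMode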

## Mismatch Logging

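When `log_estimation_mismatch=True` and the error exceeds `mismatch_threshold_pct`, a warning is logged. The sample below has an error of 13.6%, so it would appear only with a threshold lower than that (for instance `mismatch_threshold_pct=10.0`), not with the 15% used earlier on this page: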

```
WARNING: Token estimation mismatch: heuristic=1250, accurate=1100, error=13.6%
```
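The numbers in the sample warning are consistent with an error measured relative to the accurate count (150 / 1100 ≈ 13.6%). The following is a minimal sketch of such a check under that assumption; `error_pct` and `check_mismatch` are hypothetical helper names, not part of the library's API, and the library's actual formula may differ.

```python
import logging

logger = logging.getLogger("estimation_sketch")


def error_pct(heuristic: int, accurate: int) -> float:
    """Relative error of the heuristic, as a percentage of the accurate count.

    Assumption: the error is normalised by the accurate count.
    """
    if accurate == 0:
        return 0.0 if heuristic == 0 else 100.0
    return abs(heuristic - accurate) / accurate * 100.0


def check_mismatch(heuristic: int, accurate: int, threshold_pct: float = 15.0) -> bool:
    """Log a warning and return True when the error exceeds the threshold."""
    pct = error_pct(heuristic, accurate)
    if pct > threshold_pct:
        logger.warning(
            "Token estimation mismatch: heuristic=%d, accurate=%d, error=%.1f%%",
            heuristic, accurate, pct,
        )
        return True
    return False
```

With these helpers, `check_mismatch(1250, 1100, threshold_pct=10.0)` logs a warning, while the same counts stay silent at a threshold of 15.0.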

## Estimation Caching

Estimates are cached by content hash, so repeated calls with identical text skip recomputation:

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# First call - computes estimate
tokens1, _ = manager.estimate_tokens(text)

# Second call - uses cache
tokens2, _ = manager.estimate_tokens(text)

# Cache key is MD5 hash of text
```
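To illustrate the idea of hash-keyed caching, here is a small standalone sketch. `EstimateCache` is a hypothetical class written for this example and does not reflect the library's internal implementation; the only detail taken from the text above is that the key is an MD5 hash of the content.

```python
import hashlib
from typing import Callable, Dict


class EstimateCache:
    """Hypothetical cache keyed by the MD5 hex digest of the text."""

    def __init__(self, estimator: Callable[[str], int]) -> None:
        self._estimator = estimator
        self._cache: Dict[str, int] = {}
        self.misses = 0  # Number of times the estimator actually ran

    def estimate(self, text: str) -> int:
        key = hashlib.md5(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._estimator(text)
        return self._cache[key]


# Stand-in estimator (about 4 characters per token), used only for the demo
cache = EstimateCache(lambda s: len(s) // 4)
first = cache.estimate("hello world, this is a test")
second = cache.estimate("hello world, this is a test")  # Served from the cache
```

Hashing the text keeps keys at a fixed size regardless of input length, at the cost of one hash computation per lookup.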

## Heuristic Algorithm

The heuristic uses character-based estimation:

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# ASCII characters: ~0.25 tokens per char
# Non-ASCII: ~1.3 tokens per char
# Plus overhead for message structure
```
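The ratios above can be turned into a short function. This is a sketch rather than the library's code: `heuristic_tokens` is a hypothetical name, the rounding rule is an assumption, and because the size of the message-structure overhead is not stated here it is left as a caller-supplied parameter.

```python
def heuristic_tokens(text: str, overhead: int = 0) -> int:
    """Estimate tokens from character counts.

    ASCII characters count as ~0.25 tokens each and non-ASCII characters
    as ~1.3 tokens each. `overhead` is a placeholder for per-message
    structure cost (assumed; no value is documented here).
    """
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    non_ascii_chars = len(text) - ascii_chars
    estimate = ascii_chars * 0.25 + non_ascii_chars * 1.3
    # Round half up to the nearest whole token (an assumption of this sketch)
    return int(estimate + 0.5) + overhead
```

For example, eight ASCII characters with an overhead of 4 give an estimate of 6 tokens under these assumptions.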

## Accurate Estimation

When tiktoken is installed, accurate mode counts tokens with the model's tokenizer:

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# Uses model-specific tokenizer
# Falls back to heuristic if unavailable
```
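The fallback pattern can be sketched with tiktoken's public API (`encoding_for_model`, `get_encoding`, `Encoding.encode`). The function name `count_tokens`, the choice of `cl100k_base` for unknown models, and the simplified fallback of roughly four characters per token are assumptions made for this example; the library's own logic may differ.

```python
def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    """Count tokens with tiktoken, falling back to a rough estimate."""
    try:
        import tiktoken

        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            # Model name unknown to this tiktoken version (assumed default)
            encoding = tiktoken.get_encoding("cl100k_base")
        # disallowed_special=() treats special-token text as ordinary text
        return len(encoding.encode(text, disallowed_special=()))
    except Exception:
        # tiktoken is not installed, or its vocabulary could not be loaded
        return (len(text) + 3) // 4  # Ceiling of len(text) / 4
```

Importing tiktoken inside the function keeps it an optional dependency: the module loads without it, and only the accuracy of the count changes.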

## CLI Usage

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# View estimation mode in config
praisonai chat
> /context config

# Shows:
# Estimation:
#   estimation_mode:        validated
#   log_mismatch:           True
```

## Best Practices

1. **Use heuristic mode in production** - It is the fastest mode and usually good enough
2. **Use validated mode for debugging** - It surfaces estimation issues
3. **Set a reasonable threshold** - 15-20% is typical
4. **Monitor mismatch logs** - They identify content the heuristic handles poorly
