Performance
Latency profiles, accuracy benchmarks, and methodology.
Latency overview
ChessGrammar is designed for low-latency analysis. Latency depends on the depth mode, the number of patterns present, and whether move sequences are requested.
| Mode | Median | p99 | Description |
|---|---|---|---|
| L1 (structural scan) | ~3ms | < 15ms | Fast geometric pattern candidate detection |
| L2 (forcing tree) | ~42ms | < 500ms | Full confirmation with alpha-beta pruning |
| L2 + with_sequence | ~205ms/tactic | varies | Includes forcing move sequences |
All measurements are per position, taken on the production deployment (engine-side time, excluding network overhead).
Latency by pattern
Different patterns have different computational costs. Simple structural patterns (smothered mate, skewer) are fastest; patterns requiring deeper forcing tree search (double check, interference) take longer.
| Pattern | L1 p50 | L2 p50 | Notes |
|---|---|---|---|
| Fork | 2ms | 38ms | Geometric detection with iterative SEE |
| Pin | 4ms | 38ms | |
| Skewer | 1ms | 1ms | Fastest — structural only |
| Discovered Attack | 5ms | 114ms | Blocker-ray geometric pre-filter |
| Double Check | 6ms | 457ms | Geometric blocker mask, deep forcing tree |
| Back Rank Mate | 2ms | 20ms | |
| Smothered Mate | 3ms | 7ms | Low L1 and L2 — strict structural condition |
| Deflection | 2ms | 132ms | Requires defender analysis |
| Interference | 3ms | 246ms | Requires line analysis |
| Trapped Piece | 10ms | 111ms | Requires full mobility check |
Detection accuracy
Accuracy is measured against a curated dataset of annotated positions from international tournament play and established puzzle databases.
| Metric | Value |
|---|---|
| Overall accuracy | 97.3% |
| Dataset size | 25,000 annotated positions |
| False positive rate (L2) | < 2% |
| False negative rate (L2) | < 4% |
Accuracy by pattern
| Pattern | Accuracy | Notes |
|---|---|---|
| Fork | 98.1% | |
| Pin | 97.8% | |
| Skewer | 96.5% | |
| Discovered Attack | 97.2% | |
| Double Check | 99.1% | Second highest; binary condition |
| Back Rank Mate | 98.4% | |
| Smothered Mate | 99.3% | Highest — strict structural condition |
| Deflection | 95.2% | Most subjective pattern |
| Interference | 94.8% | |
| Trapped Piece | 96.1% | |
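The source does not state how the overall accuracy figure is aggregated across patterns. As a quick sanity check, the sketch below takes the unweighted mean of the per-pattern values from the table above, which comes to about 97.25% and is consistent with the reported 97.3%. Unweighted averaging is an assumption made here for illustration, not a documented method, and the helper names are hypothetical.

```python
from statistics import mean

# Per-pattern accuracy (percent), copied from the table above.
ACCURACY_BY_PATTERN = {
    "Fork": 98.1,
    "Pin": 97.8,
    "Skewer": 96.5,
    "Discovered Attack": 97.2,
    "Double Check": 99.1,
    "Back Rank Mate": 98.4,
    "Smothered Mate": 99.3,
    "Deflection": 95.2,
    "Interference": 94.8,
    "Trapped Piece": 96.1,
}


def unweighted_mean_accuracy(table):
    """Plain average over patterns (assumption: every pattern weighs the same)."""
    return mean(table.values())


def extremes(table):
    """Return the (most accurate, least accurate) pattern names."""
    return max(table, key=table.get), min(table, key=table.get)


print(f"unweighted mean: {unweighted_mean_accuracy(ACCURACY_BY_PATTERN):.2f}%")
print("best / worst:", extremes(ACCURACY_BY_PATTERN))
```

If the dataset is not balanced across patterns, a position-weighted average could differ from this figure.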
Note on L1 accuracy: L1 has a higher false positive rate (~15-20%) since it detects structural candidates without confirmation. L2 is recommended for applications requiring precision.
Game analysis performance
Full game analysis (PGN) processes each position sequentially. Performance depends on the tactical density of the game.
| Game length | L1 estimate | L2 estimate |
|---|---|---|
| 20 moves (40 plies) | ~0.2s | ~1.5s |
| 40 moves (80 plies) | ~0.5s | ~5s |
| 60 moves (120 plies) | ~0.8s | ~8s |
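Dividing each estimate in the table above by its ply count gives a rough per-position cost: about 5 to 7 ms per ply at L1 and about 38 to 67 ms per ply at L2. The sketch below reproduces that arithmetic. The function name is hypothetical and the inputs are the rounded estimates from the table, so the results are only indicative.

```python
# (plies, L1 seconds, L2 seconds), copied from the table above.
GAME_ESTIMATES = [
    (40, 0.2, 1.5),
    (80, 0.5, 5.0),
    (120, 0.8, 8.0),
]


def per_ply_ms(total_seconds, plies):
    """Average cost per position in milliseconds."""
    return total_seconds * 1000.0 / plies


for plies, l1_seconds, l2_seconds in GAME_ESTIMATES:
    l1 = per_ply_ms(l1_seconds, plies)
    l2 = per_ply_ms(l2_seconds, plies)
    print(f"{plies} plies: L1 {l1:.1f} ms/ply, L2 {l2:.1f} ms/ply")
```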
Use depth: "l1" for a fast scan of every position in the game, then rerun L2 only on the positions of interest (a two-pass strategy).
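The two-pass strategy can be sketched as follows. This is a minimal illustration, not the ChessGrammar client: the `analyze(position, depth)` callable, its return shape, and the "l2" depth value (assumed by analogy with "l1") are all hypothetical, and a stub stands in for real API calls so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical analyzer signature: (position, depth) -> list of detected tactics.
Analyzer = Callable[[str, str], List[dict]]


def two_pass_scan(positions: Iterable[str], analyze: Analyzer) -> Dict[int, List[dict]]:
    """Scan every position at L1, then confirm only flagged positions at L2."""
    confirmed: Dict[int, List[dict]] = {}
    for ply, position in enumerate(positions):
        # Pass 1: cheap structural scan; most positions stop here.
        if not analyze(position, "l1"):
            continue
        # Pass 2: full confirmation, only for L1 candidates.
        tactics = analyze(position, "l2")  # "l2" is an assumed depth value
        if tactics:
            confirmed[ply] = tactics
    return confirmed


def fake_analyze(position: str, depth: str) -> List[dict]:
    """Stub analyzer: L1 flags two positions, L2 confirms only one of them."""
    l1_hits = {"pos-b", "pos-c"}
    l2_hits = {"pos-c"}
    hits = l1_hits if depth == "l1" else l2_hits
    return [{"pattern": "fork"}] if position in hits else []


result = two_pass_scan(["pos-a", "pos-b", "pos-c"], fake_analyze)
print(result)
```

Using the figures on this page, an 80-ply game scanned at L1 (~0.5s) with L2 rerun on, say, 10 flagged positions at the ~42ms median would add roughly 0.4s, compared with ~5s for a full L2 pass. The number of flagged positions depends on the game and on the L1 false positive rate.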
Engine v2.0 improvements
The v2.0 engine introduces significant performance and detection improvements:
- Geometric detection for fork (5x faster), double check (160x faster), and discovered attack (3x faster)
- Alpha-beta pruning in the L2 forcing tree for fast candidate rejection
- Quiescence search prevents horizon-effect evaluation errors
- More tactics detected: structural detection without premature gain filtering catches patterns that v1 missed
Methodology
- Dataset: Positions sourced from FIDE-rated tournament games (2000+ Elo) and curated puzzle databases
- Annotation: Each position manually verified by titled players (FM+)
- Measurement: Latency measured on production Vercel deployment (cold start excluded, engine-side time)
- Reproducibility: All benchmarks can be reproduced using benchmarks/capture_baseline.py
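For readers who want to collect comparable numbers in their own environment, the sketch below shows one common way to derive a median and a p99 from timing samples, with warm-up calls discarded as a rough analogue of excluding cold starts. It is not the contents of benchmarks/capture_baseline.py; the function names and the nearest-rank percentile definition are assumptions chosen for illustration.

```python
import math
import time
from statistics import median


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, inputs, warmup=1):
    """Time fn on each input in milliseconds, after discarding warm-up calls."""
    for item in inputs[:warmup]:
        fn(item)  # warm-up only; result and timing are thrown away
    samples_ms = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return samples_ms


def summarize(samples_ms):
    """Return the two statistics reported in the latency tables."""
    return {
        "p50": median(samples_ms),
        "p99": nearest_rank_percentile(samples_ms, 99),
    }


# Example with a trivial stand-in workload instead of a real engine call.
samples = time_calls(lambda n: sum(range(n)), [10_000] * 50)
print(summarize(samples))
```

Timing around a client call this way includes network overhead, so the results would not be directly comparable to the engine-side figures above.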
Performance characteristics may vary during the Developer Preview as the engine evolves.