Benchmarks

We test frontier models against human-verified data drawn from real conversations and real work: the conditions models actually meet in production, not curated test sets.

BENCH-001

Speech-to-Text

Fifteen frontier transcription models tested against real multilingual conversation. The best-performing model still misses every second word of dialectal Arabic.

15 models · 4 languages

BENCH-002

In progress.