ElevenLabs Scribe replaces local Whisper as the default transcription backend

- 96.7% accuracy on English; 2.4% WER on Indonesian (vs. 7.7% for Whisper)
- ~18x faster: a 200 s song transcribes in 11 s vs. 3-5 min with local Whisper on CPU
- No hallucinations on songs (Whisper invented 'Pony und Kleid' for 'Bonnie und Clyde')
- 99 languages supported, including SLO/HR/BS/SR (Slovenian, Croatian, Bosnian, Serbian)
- $0.40/h pricing, ~$0.022 per 200 s song

Implementation:
- transcribe_with_elevenlabs() calls Scribe v1
- ISO 639-1 ↔ 639-3 language-code mapping (Scribe needs 'deu', not 'de')
- Word-level timestamps are grouped into pseudo-segments; a segment closes on a 0.6 s pause or after 6 s
- 24 MB upload-limit guard with automatic fallback to local faster-whisper

Default whisper_provider='auto':
- If ELEVENLABS_API_KEY is set → use Scribe
- Otherwise → fall back to local faster-whisper
- 'elevenlabs' strict mode: no fallback
- 'local' strict mode: skip Scribe entirely

Tested on Ben Zucker - "Ohne dich": Scribe correctly transcribed 'Wir sind Bonnie und Clyde, zu allem bereit' where local Whisper hallucinated.
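The per-song price follows directly from the hourly rate. A minimal sketch of the arithmetic, assuming billing is linear in audio duration with no minimum charge or rounding (an assumption; `scribe_cost_usd` is a hypothetical helper, not part of main.py):

```python
SCRIBE_USD_PER_HOUR = 0.40  # price quoted in this commit message


def scribe_cost_usd(duration_s: float, usd_per_hour: float = SCRIBE_USD_PER_HOUR) -> float:
    """Estimated Scribe cost for audio of the given length, assuming linear per-second billing."""
    return usd_per_hour * duration_s / 3600.0


# 0.40 USD/h * 200 s / 3600 s/h = 0.0222 USD
print(f"${scribe_cost_usd(200):.3f}")  # prints $0.022
```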
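The language-code mapping can be as small as a lookup table plus its reverse. A sketch under stated assumptions: `ISO639_1_TO_3`, `to_iso639_3` and `to_iso639_1` are hypothetical names, the table covers only languages mentioned in this commit, and raising on unknown codes is one possible policy (the real table in main.py is not shown here):

```python
from typing import Optional

# Hypothetical subset of the ISO 639-1 -> ISO 639-3 table; main.py's table may differ.
ISO639_1_TO_3 = {
    "de": "deu",  # German
    "en": "eng",  # English
    "id": "ind",  # Indonesian
    "sl": "slv",  # Slovenian
    "hr": "hrv",  # Croatian
    "bs": "bos",  # Bosnian
    "sr": "srp",  # Serbian
}
# Reverse table: map a 639-3 code back to the 639-1 code used elsewhere.
ISO639_3_TO_1 = {v: k for k, v in ISO639_1_TO_3.items()}


def to_iso639_3(code: Optional[str]) -> Optional[str]:
    """Convert a 639-1 code ('de') to 639-3 ('deu'); None means let Scribe auto-detect."""
    if code is None:
        return None
    code = code.strip().lower()
    if len(code) == 3:  # already a 639-3 code: pass it through unchanged
        return code
    try:
        return ISO639_1_TO_3[code]
    except KeyError:
        raise ValueError(f"no ISO 639-3 mapping for {code!r}") from None


def to_iso639_1(code: str) -> str:
    """Convert a 639-3 code back to 639-1, leaving unmapped codes unchanged."""
    return ISO639_3_TO_1.get(code.lower(), code)
```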
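The pseudo-segment rule (close on a 0.6 s pause or 6 s duration) can be sketched as a single pass over the words. This assumes each word is a dict with `text`, `start` and `end` in seconds, and that non-word entries (e.g. spacing) carry a `type` other than `"word"`; the field names, `words_to_segments`, and the use of `>=` at both thresholds are assumptions, not confirmed details of main.py:

```python
from typing import Any, Dict, List, Optional

PAUSE_S = 0.6        # a silence at least this long before the next word closes the segment
MAX_SEGMENT_S = 6.0  # a segment that already spans at least this long is closed


def words_to_segments(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group word-level timestamps into whisper-style pseudo-segments."""
    segments: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None

    for w in words:
        # Skip non-word entries (e.g. spacing); entries without "type" count as words.
        if w.get("type", "word") != "word":
            continue
        start, end = float(w["start"]), float(w["end"])
        text = str(w["text"]).strip()

        if current is not None:
            pause = start - current["end"]
            duration = current["end"] - current["start"]
            if pause >= PAUSE_S or duration >= MAX_SEGMENT_S:
                segments.append(current)
                current = None

        if current is None:
            current = {"start": start, "end": end, "words": [text]}
        else:
            current["end"] = end
            current["words"].append(text)

    if current is not None:
        segments.append(current)

    # Join each segment's words into a single text field.
    return [
        {"start": s["start"], "end": s["end"], "text": " ".join(s["words"])}
        for s in segments
    ]
```

Because the duration check runs before the next word is appended, a segment can overrun 6 s by at most one word.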
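The whisper_provider rules and the upload guard combine into one decision. A sketch, assuming "24MB" means 24 MiB and that strict 'elevenlabs' mode raises rather than falls back when the key is missing or the file is too large; `choose_provider`, `ScribeUnavailableError` and `MAX_UPLOAD_BYTES` are hypothetical names, and fallback on runtime API errors is not modeled:

```python
import os
from typing import Optional

# Assumed reading of the "24MB" guard as 24 MiB; the exact constant in main.py may differ.
MAX_UPLOAD_BYTES = 24 * 1024 * 1024


class ScribeUnavailableError(RuntimeError):
    """Strict 'elevenlabs' mode was requested but Scribe cannot be used."""


def choose_provider(whisper_provider: str = "auto",
                    audio_size_bytes: int = 0,
                    api_key: Optional[str] = None) -> str:
    """Return 'elevenlabs' or 'local' following the whisper_provider rules."""
    provider = whisper_provider.strip().lower()
    if provider == "local":
        return "local"  # strict local: never touch Scribe
    if provider not in ("auto", "elevenlabs"):
        raise ValueError(f"unknown whisper_provider: {whisper_provider!r}")

    if api_key is None:
        api_key = os.environ.get("ELEVENLABS_API_KEY")

    reason: Optional[str] = None
    if not api_key:
        reason = "ELEVENLABS_API_KEY is not set"
    elif audio_size_bytes > MAX_UPLOAD_BYTES:
        reason = f"audio is {audio_size_bytes} bytes, over the upload limit"

    if reason is None:
        return "elevenlabs"
    if provider == "elevenlabs":
        raise ScribeUnavailableError(reason)  # strict mode: no fallback
    return "local"  # auto mode: fall back to local faster-whisper
```

Passing `api_key` explicitly keeps the decision testable without touching the environment.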