reels-app/app
Sebastjan Artič 68247bb84c Integrate ElevenLabs Scribe (best multilingual STT 2026)
ElevenLabs Scribe replaces local Whisper as default transcription:
- 96.7% accuracy English, 2.4% WER Indonesian (vs Whisper 7.7%)
- 18x faster (200s song = 11s vs 3-5 min on CPU)
- No hallucinations on songs (Whisper invented 'Pony und Kleid' for 'Bonnie und Clyde')
- 99 languages supported, including SLO/HR/BS/SR
- $0.40/h pricing, ~$0.022 per 200s song

Implementation:
- transcribe_with_elevenlabs() function uses Scribe v1
- ISO 639-1 ↔ 639-3 mapping (Scribe needs 'deu' not 'de')
- Word-level timestamps converted to pseudo-segments (close on 0.6s pause or 6s duration)
- 24MB upload limit guard with auto-fallback to local

Default whisper_provider='auto':
- If ELEVENLABS_API_KEY set → use Scribe
- Otherwise → fallback to local faster-whisper
- 'elevenlabs' strict mode: no fallback
- 'local' strict mode: skip Scribe entirely

Tested on Ben Zucker - Ohne dich: Scribe correctly transcribed
'Wir sind Bonnie und Clyde, zu allem bereit' where local Whisper hallucinated.
2026-04-29 12:03:40 +00:00
..
main.py Integrate ElevenLabs Scribe (best multilingual STT 2026) 2026-04-29 12:03:40 +00:00