reels-app/scripts
Sebastjan Artič 865e21fe1a Integrate Soniox stt-async-v4 as primary STT provider
Test results comparing all providers on Slovenian folk-pop:

CVETELE SO MALINE:
- Scribe: HALLUCINATED ('finančni moduli...') 
- Gemini 3 Pro: correct lyrics, ~100s 
- Soniox: PERFECT lyrics in 4 seconds 

PA PA:
- Scribe: 'se mu pomahala' (wrong: missing 'm' in 'sem')
- Soniox: 'sem mu pomahala' (correct) + caught 'pa-pa-ra-pa' fillers

ŽENA ME TEPE:
- Scribe: hallucinations + word errors
- Soniox: PERFECT 'Žena me tepe, mi prazni žepe, da vidi, kje in s kom sem bil'

Soniox advantages:
- 4x cheaper than Scribe ($0.10/h vs $0.40/h)
- 5x faster (4-15s vs 10-15s for 180s audio)
- 50x cheaper than Gemini 3 Pro
- 25x faster than Gemini
- Native-level Slovenian quality, on par with Gemini
- Word-level timestamps + diacritics + punctuation

Implementation:

1. transcribe_with_soniox() function:
   - Multipart upload to /v1/files (no SDK dependency)
   - Create transcription with stt-async-v4 model
   - Auto language hint based on filename (NZ → 'sl')
   - Multilingual fallback ['en', 'sl', 'de', 'hr', 'es', 'fr', 'it']
   - Poll status, fetch transcript
   - Group subword tokens into words → segments
   - Auto-cleanup files after transcription
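
The language-hint and token-grouping steps above can be sketched roughly as follows. This is a minimal illustration, not the code in analyze.py. It assumes each Soniox token is a dict with `text`, `start_ms` and `end_ms` keys and that leading whitespace in `text` marks the start of a new word. The helper names, the way the `NZ` tag is detected in the filename, and the 1000 ms pause threshold are all hypothetical.

```python
import re
from pathlib import Path

# Fallback hint list taken from the commit message.
MULTILINGUAL_HINTS = ["en", "sl", "de", "hr", "es", "fr", "it"]


def language_hints_for(filename):
    """Return ['sl'] when the file name carries an 'NZ' tag, else the fallback list.

    Assumption: the tag is a separate, case-insensitive token in the file stem,
    e.g. 'NZ_pa_pa.mp4'.
    """
    parts = re.split(r"[^A-Za-z0-9]+", Path(filename).stem)
    if "NZ" in (p.upper() for p in parts):
        return ["sl"]
    return list(MULTILINGUAL_HINTS)


def group_tokens_into_words(tokens):
    """Merge subword tokens into words, keeping start/end times in milliseconds."""
    words = []
    new_word = True  # the very first token always opens a word
    for tok in tokens:
        text = tok["text"]
        if text[:1].isspace():
            new_word = True
        piece = text.strip()
        if not piece:
            continue  # whitespace-only token: only acts as a word boundary
        if new_word:
            words.append({"text": piece,
                          "start_ms": tok["start_ms"],
                          "end_ms": tok["end_ms"]})
        else:
            words[-1]["text"] += piece
            words[-1]["end_ms"] = tok["end_ms"]
        new_word = text[-1:].isspace()
    return words


def words_to_segments(words, max_gap_ms=1000):
    """Start a new segment whenever the pause between two words exceeds max_gap_ms."""
    segments = []
    for word in words:
        if segments and word["start_ms"] - segments[-1]["end_ms"] <= max_gap_ms:
            segments[-1]["text"] += " " + word["text"]
            segments[-1]["end_ms"] = word["end_ms"]
        else:
            segments.append(dict(word))
    return segments
```

Splitting on pauses is only one possible segmentation rule; splitting on punctuation or on a maximum segment length would fit the same structure.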

2. New 'soniox_chain' provider mode (default for 'auto'):
   - Soniox primary (fast + cheap + accurate)
   - Scribe fallback (rare cases when Soniox fails)
   - Gemini fallback (last resort, slow but bulletproof)
   - Quality gate: coverage >= 50%, no hallucinations
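
The fallback order and quality gate described above might look like the sketch below. It is an illustration under stated assumptions, not the implementation in analyze.py: coverage is assumed to mean the fraction of the audio duration covered by transcript segments, segments are assumed to be dicts with `start`, `end` (seconds) and `text`, the hallucination check is passed in as a hypothetical `looks_hallucinated` callable, and every provider (including the last one) is assumed to go through the same gate.

```python
def coverage(segments, audio_duration_s):
    """Fraction of the audio covered by segments (assumed definition of coverage)."""
    if audio_duration_s <= 0:
        return 0.0
    covered = sum(max(0.0, s["end"] - s["start"]) for s in segments)
    return min(1.0, covered / audio_duration_s)


def passes_quality_gate(segments, audio_duration_s, looks_hallucinated,
                        min_coverage=0.5):
    """Accept a transcript only if coverage >= 50% and no segment is flagged."""
    if coverage(segments, audio_duration_s) < min_coverage:
        return False
    return not any(looks_hallucinated(s["text"]) for s in segments)


def transcribe_with_chain(audio_path, audio_duration_s, providers,
                          looks_hallucinated):
    """Try (name, transcribe_fn) pairs in order; return the first accepted result."""
    last_error = None
    for name, transcribe in providers:
        try:
            segments = transcribe(audio_path)
        except Exception as exc:  # a failing provider falls through to the next one
            last_error = exc
            continue
        if passes_quality_gate(segments, audio_duration_s, looks_hallucinated):
            return name, segments
    raise RuntimeError(f"all providers failed for {audio_path}") from last_error
```

With `providers` ordered as Soniox, Scribe, Gemini, the slower and more expensive services are only called when an earlier result is rejected or the call raises.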

3. Provider modes: auto, soniox, elevenlabs, gemini, hybrid, local
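
A minimal sketch of how these modes could be exposed on the command line, with 'auto' resolving to the 'soniox_chain' mode from item 2. The `--provider` flag name and the `resolve_provider` helper are hypothetical; the actual argument handling in analyze.py may differ.

```python
import argparse

# Mode names as listed in the commit message.
PROVIDER_MODES = ["auto", "soniox", "elevenlabs", "gemini", "hybrid", "local"]


def resolve_provider(mode):
    """Map the user-facing mode to the internal one: 'auto' means 'soniox_chain'."""
    if mode not in PROVIDER_MODES:
        raise ValueError(f"unknown provider mode: {mode!r}")
    return "soniox_chain" if mode == "auto" else mode


def build_parser():
    parser = argparse.ArgumentParser(description="Transcription provider selection")
    parser.add_argument("--provider", choices=PROVIDER_MODES, default="auto",
                        help="STT provider mode (default: auto)")
    return parser
```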

This makes the pipeline reliable for ALL music genres, including
Slovenian narodno-zabavna glasba (folk-pop), which Scribe consistently failed on.
2026-04-30 03:06:38 +00:00
acr_recognize.py | MXF/MPG broadcast format support: handle multichannel audio properly | 2026-04-29 14:38:48 +00:00
analyze.py | Integrate Soniox stt-async-v4 as primary STT provider | 2026-04-30 03:06:38 +00:00
clip.py | Upgrade default Whisper model: small/medium → large-v3 for much better Slovenian/Slavic transcription accuracy | 2026-04-29 08:20:18 +00:00
find_chorus.py | Find chorus: weight repetitive short phrases (like 'Ohne dich x5') as strong chorus signal | 2026-04-28 16:57:45 +00:00
reframe.py | MXF/MPG broadcast format support: handle multichannel audio properly | 2026-04-29 14:38:48 +00:00
subtitle.py | Upgrade default Whisper model: small/medium → large-v3 for much better Slovenian/Slavic transcription accuracy | 2026-04-29 08:20:18 +00:00
yt_download.py | Add cookies support to yt_download.py for YouTube bot detection bypass | 2026-04-28 15:47:59 +00:00