reels-app

Sebastjan Artič 865e21fe1a	2026-04-30 03:06:38 +00:00

Integrate Soniox stt-async-v4 as primary STT provider

Test results comparing all providers on Slovenian folk-pop:

CVETELE SO MALINE:
- Scribe: HALLUCINATED ('finančni moduli...') ❌
- Gemini 3 Pro: correct lyrics, ~100s ✅
- Soniox: PERFECT lyrics in 4 seconds ✅✅

PA PA:
- Scribe: 'se mu pomahala' (wrong: missing M) ❌
- Soniox: 'sem mu pomahala' ✅ + caught 'pa-pa-ra-pa' fillers

ŽENA ME TEPE:
- Scribe: hallucinations + word errors
- Soniox: PERFECT 'Žena me tepe, mi prazni žepe, da vidi, kje in s kom sem bil'

Soniox advantages:
- 4x cheaper than Scribe ($0.10/h vs $0.40/h)
- 5x faster (4-15s vs 10-15s for 180s audio)
- 50x cheaper than Gemini 3 Pro
- 25x faster than Gemini
- Slovenian native quality matches Gemini
- Word-level timestamps + diacritics + punctuation

Implementation:

1. transcribe_with_soniox() function:
   - Multipart upload to /v1/files (no SDK dependency)
   - Create transcription with stt-async-v4 model
   - Auto language hint based on filename (NZ → 'sl')
   - Multilingual fallback ['en', 'sl', 'de', 'hr', 'es', 'fr', 'it']
   - Poll status, fetch transcript
   - Group subword tokens into words → segments
   - Auto-cleanup files after transcription

2. New 'soniox_chain' provider mode (default for 'auto'):
   - Soniox primary (fast + cheap + accurate)
   - Scribe fallback (rare cases when Soniox fails)
   - Gemini fallback (last resort, slow but bulletproof)
   - Quality gate: coverage >= 50%, no hallucinations

3. Provider modes: auto, soniox, elevenlabs, gemini, hybrid, local

This makes the pipeline reliable for ALL music genres including Slovenian narodno-zabavni glasbi which Scribe consistently failed on.
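The 'soniox_chain' fallback order and quality gate described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in analyze.py. The names `Segment`, `coverage`, `passes_quality_gate`, and `transcribe_with_chain` are hypothetical. The definition of coverage (share of the audio duration covered by transcribed segments) and the pluggable `looks_hallucinated` check are assumptions, since the commit message does not say how either is computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


# A provider is any callable that takes an audio path and returns segments.
Provider = Callable[[str], List[Segment]]


def coverage(segments: Sequence[Segment], audio_duration: float) -> float:
    """Fraction of the audio covered by segments (assumed definition).

    Assumes segments do not overlap; the result is clamped to 1.0.
    """
    if audio_duration <= 0:
        return 0.0
    covered = sum(max(0.0, s.end - s.start) for s in segments)
    return min(1.0, covered / audio_duration)


def passes_quality_gate(
    segments: Sequence[Segment],
    audio_duration: float,
    looks_hallucinated: Callable[[str], bool],
    min_coverage: float = 0.5,  # "coverage >= 50%" from the commit message
) -> bool:
    """Accept a transcript only if it covers enough audio and no segment is flagged."""
    if not segments:
        return False
    if coverage(segments, audio_duration) < min_coverage:
        return False
    return not any(looks_hallucinated(s.text) for s in segments)


def transcribe_with_chain(
    audio_path: str,
    audio_duration: float,
    providers: Sequence[Tuple[str, Provider]],
    looks_hallucinated: Callable[[str], bool],
) -> Optional[Tuple[str, List[Segment]]]:
    """Try providers in order; return (name, segments) for the first that passes the gate."""
    for name, transcribe in providers:
        try:
            segments = transcribe(audio_path)
        except Exception:
            # A failing provider is skipped so the next one gets a chance.
            continue
        if passes_quality_gate(segments, audio_duration, looks_hallucinated):
            return name, segments
    return None  # every provider failed or was rejected by the gate
```

In this sketch the chain would be called with the providers in the order the commit lists them (Soniox, then Scribe, then Gemini), each wrapped as a callable. Keeping the hallucination check as a parameter leaves the chain independent of whatever heuristic the real pipeline uses.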
acr_recognize.py	MXF/MPG broadcast format support: handle multichannel audio properly	2026-04-29 14:38:48 +00:00
analyze.py	Integrate Soniox stt-async-v4 as primary STT provider	2026-04-30 03:06:38 +00:00
clip.py	Upgrade default Whisper model: small/medium → large-v3 for much better Slovenian/Slavic transcription accuracy	2026-04-29 08:20:18 +00:00
find_chorus.py	Find chorus: weight repetitive short phrases (like 'Ohne dich x5') as strong chorus signal	2026-04-28 16:57:45 +00:00
reframe.py	MXF/MPG broadcast format support: handle multichannel audio properly	2026-04-29 14:38:48 +00:00
subtitle.py	Upgrade default Whisper model: small/medium → large-v3 for much better Slovenian/Slavic transcription accuracy	2026-04-29 08:20:18 +00:00
yt_download.py	Add cookies support to yt_download.py for YouTube bot detection bypass	2026-04-28 15:47:59 +00:00