reels-app/scripts
Sebastjan Artič 0dd33c16f3 Hybrid transcription: Scribe primary + Gemini 3 Pro fallback
Real-world test confirmed Gemini 3 Pro can transcribe Slovenian folk-pop
songs accurately where ElevenLabs Scribe hallucinates:

Test: FEHTARJI - GORENJSKA LJUBLJENA (120s sample)
- Scribe result: 'finančni moduli...' (total hallucination, wrong content)
- Gemini 3 Pro: 'Zunaj srečo sem iskal, planet prepotoval' (CORRECT lyrics)

Implementation:

1. New transcribe_with_gemini() function:
   - Uploads audio via Gemini Files API (resumable upload)
   - Calls gemini-3-pro-preview with structured prompt
   - Parses JSON response with word-level timestamps
   - Computes coverage_pct and hallucination_count
   - Returns same format as Scribe (compatible)
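
   A minimal sketch of the parse-and-score step, for illustration only. The
   commit does not show the response schema or how the two metrics are defined,
   so the following are assumptions: the helper name summarize_words, a JSON
   object with a "words" list of {"text", "start", "end"} entries (seconds),
   coverage as the share of audio duration spanned by word intervals, and a
   placeholder hallucination heuristic that only counts words whose timestamps
   are malformed or fall outside the audio.

```python
import json


def summarize_words(response_text: str, duration_s: float) -> dict:
    """Parse a JSON word list and compute simple quality metrics.

    Assumed schema (not confirmed by the commit):
    {"words": [{"text": str, "start": float, "end": float}, ...]}
    """
    words = json.loads(response_text)["words"]

    # Split words into well-formed intervals and suspect ones.
    # Placeholder heuristic: a word is suspect if its interval is empty,
    # reversed, or extends outside the audio.
    valid, suspect = [], 0
    for w in words:
        start, end = float(w["start"]), float(w["end"])
        if 0.0 <= start < end <= duration_s:
            valid.append((start, end))
        else:
            suspect += 1

    # Merge overlapping intervals so overlapping words are not counted twice.
    covered = 0.0
    cur_start = cur_end = None
    for start, end in sorted(valid):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start

    coverage_pct = 100.0 * covered / duration_s if duration_s > 0 else 0.0
    return {
        "words": words,
        "coverage_pct": round(coverage_pct, 1),
        "hallucination_count": suspect,
    }
```

   The returned dict carries the two fields that the hybrid mode's quality
   check relies on; the real function may compute them differently.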

2. New 'hybrid' provider mode (now the default for 'auto'):
   - Try Scribe first (fast, cheap: 8-10s, $0.013)
   - If quality OK (coverage >= 50%, no hallucinations) → return Scribe
   - Else retry Scribe once
   - If still bad → fallback to Gemini 3 Pro (slow, more expensive: 100s, $0.20)
   - Compare results, return whichever is better
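
   The flow above can be sketched roughly as follows. This is not the code
   from analyze.py: the names transcribe_hybrid, quality_ok and better are
   hypothetical, the two providers are passed in as plain callables, and the
   ranking used for "whichever is better" (fewer hallucinations first, then
   higher coverage) is an assumption, since the commit does not state the
   comparison rule.

```python
from typing import Callable

Result = dict  # expected keys: "coverage_pct", "hallucination_count"


def quality_ok(result: Result, min_coverage: float = 50.0) -> bool:
    """Quality gate from the commit: coverage >= 50% and no hallucinations."""
    return (
        result["coverage_pct"] >= min_coverage
        and result["hallucination_count"] == 0
    )


def better(a: Result, b: Result) -> Result:
    """Assumed ranking: fewer hallucinations first, then higher coverage."""
    def key(r: Result):
        return (-r["hallucination_count"], r["coverage_pct"])
    return a if key(a) >= key(b) else b


def transcribe_hybrid(
    audio_path: str,
    scribe: Callable[[str], Result],
    gemini: Callable[[str], Result],
) -> Result:
    # 1. Try the fast, cheap provider first.
    first = scribe(audio_path)
    if quality_ok(first):
        return first

    # 2. Retry it once.
    second = scribe(audio_path)
    if quality_ok(second):
        return second

    # 3. Fall back to the slow provider and keep the best result overall.
    best_scribe = better(first, second)
    return better(best_scribe, gemini(audio_path))
```

   Passing the providers in as callables keeps the decision logic testable
   without network access or API keys.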

3. Provider modes:
   - 'auto'      → hybrid if both keys, else elevenlabs, else local
   - 'hybrid'    → explicit Scribe + Gemini fallback
   - 'elevenlabs'→ Scribe only (with auto-retry)
   - 'gemini'    → Gemini only
   - 'local'     → faster-whisper on CPU
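
   One way the 'auto' resolution could look, as a sketch. The function name
   resolve_provider and the environment variable names ELEVENLABS_API_KEY and
   GEMINI_API_KEY are assumptions; the commit only says that 'auto' depends on
   which keys are available.

```python
import os
from typing import Mapping, Optional

VALID_PROVIDERS = {"auto", "hybrid", "elevenlabs", "gemini", "local"}


def resolve_provider(
    requested: str = "auto",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Map a requested provider mode to the mode that will actually run."""
    env = os.environ if env is None else env
    if requested not in VALID_PROVIDERS:
        raise ValueError(f"unknown provider: {requested!r}")
    if requested != "auto":
        return requested  # explicit modes are passed through unchanged

    # Variable names are assumptions; empty values count as "not set".
    has_scribe = bool(env.get("ELEVENLABS_API_KEY"))
    has_gemini = bool(env.get("GEMINI_API_KEY"))

    if has_scribe and has_gemini:
        return "hybrid"
    if has_scribe:
        return "elevenlabs"
    return "local"
```

   Following the list above, a Gemini key on its own still resolves to
   'local' under 'auto'; 'gemini' has to be requested explicitly.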

Cost analysis (10 reels/day):
- Pure Scribe: $0.13/day, ~5-10% of reels unusable
- Hybrid: ~$0.55/day, 100% usable
- Pure Gemini: $2/day

Hybrid is the clear winner: +$0.42/day for 100% reliability.
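
The arithmetic behind these figures can be checked with a few lines. The
per-reel prices and the $0.55/day hybrid estimate are taken from this message;
the helper names are hypothetical, and the calculation ignores the cost of the
extra Scribe retry.

```python
REELS_PER_DAY = 10
SCRIBE_COST_PER_REEL = 0.013   # USD, from the commit message
GEMINI_COST_PER_REEL = 0.20    # USD, from the commit message
HYBRID_DAILY_ESTIMATE = 0.55   # USD/day, from the commit message


def daily_cost(reels: int, cost_per_reel: float) -> float:
    """Cost of sending every reel to a single provider."""
    return reels * cost_per_reel


def implied_fallbacks(hybrid_daily: float, reels: int) -> float:
    """Gemini calls per day implied by the hybrid estimate (retries ignored)."""
    extra = hybrid_daily - daily_cost(reels, SCRIBE_COST_PER_REEL)
    return extra / GEMINI_COST_PER_REEL


if __name__ == "__main__":
    scribe = daily_cost(REELS_PER_DAY, SCRIBE_COST_PER_REEL)
    gemini = daily_cost(REELS_PER_DAY, GEMINI_COST_PER_REEL)
    print(f"Pure Scribe: ${scribe:.2f}/day")
    print(f"Pure Gemini: ${gemini:.2f}/day")
    print(f"Hybrid premium: ${HYBRID_DAILY_ESTIMATE - scribe:.2f}/day")
    print(f"Implied Gemini fallbacks: "
          f"{implied_fallbacks(HYBRID_DAILY_ESTIMATE, REELS_PER_DAY):.1f}/day")
```

Under these assumptions the hybrid estimate corresponds to about two Gemini
fallbacks per day.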
2026-04-29 18:38:27 +00:00
acr_recognize.py | MXF/MPG broadcast format support: handle multichannel audio properly | 2026-04-29 14:38:48 +00:00
analyze.py | Hybrid transcription: Scribe primary + Gemini 3 Pro fallback | 2026-04-29 18:38:27 +00:00
clip.py | Upgrade default Whisper model: small/medium → large-v3 for much better Slovenian/Slavic transcription accuracy | 2026-04-29 08:20:18 +00:00
find_chorus.py | Find chorus: weight repetitive short phrases (like 'Ohne dich x5') as strong chorus signal | 2026-04-28 16:57:45 +00:00
reframe.py | MXF/MPG broadcast format support: handle multichannel audio properly | 2026-04-29 14:38:48 +00:00
subtitle.py | Upgrade default Whisper model: small/medium → large-v3 for much better Slovenian/Slavic transcription accuracy | 2026-04-29 08:20:18 +00:00
yt_download.py | Add cookies support to yt_download.py for YouTube bot detection bypass | 2026-04-28 15:47:59 +00:00