Real-world test confirmed Gemini 3 Pro can transcribe Slovenian folk-pop
songs accurately where ElevenLabs Scribe hallucinates:
Test: FEHTARJI - GORENJSKA LJUBLJENA (120s sample)
- Scribe result: 'finančni moduli...' (total hallucination, wrong content)
- Gemini 3 Pro: 'Zunaj srečo sem iskal, planet prepotoval' (CORRECT lyrics)
Implementation:
1. New transcribe_with_gemini() function:
- Uploads audio via Gemini Files API (resumable upload)
- Calls gemini-3-pro-preview with structured prompt
- Parses JSON response with word-level timestamps
- Computes coverage_pct and hallucination_count
- Returns same format as Scribe (compatible)
2. New 'hybrid' provider mode (now the default for 'auto'):
- Try Scribe first (fast, cheap: 8-10s, $0.013)
- If quality OK (coverage >= 50%, no hallucinations) → return Scribe
- Else retry Scribe once
- If still bad → fallback to Gemini 3 Pro (slow, more expensive: 100s, $0.20)
- Compare results, return whichever is better
3. Provider modes:
- 'auto' → hybrid if both keys, else elevenlabs, else local
- 'hybrid' → explicit Scribe + Gemini fallback
- 'elevenlabs'→ Scribe only (with auto-retry)
- 'gemini' → Gemini only
- 'local' → faster-whisper on CPU
Cost analysis (10 reels/day):
- Pure Scribe: $0.13/day, ~5-10% reels unusable
- Hybrid: ~$0.55/day, 100% usable
- Pure Gemini: $2/day
Hybrid is the clear winner: +$0.42/day for 100% reliability.