reels-app/app
Sebastjan Artič df6011c3cf Detect Scribe hallucinations + filter from SRT + auto-retry
Bug found in the third re-test of Žena ME TEPE:
- Scribe transcribed only verse 1 (0-33s) correctly
- It then returned a single 98s segment [34.7-133.2] containing just one word, 'sam'
- This is a known Scribe hallucination on instrumental sections
- Result: the SRT showed 'SAM SAM SAM SAM...' 14 times across the chorus
- The chorus audio itself was correct, but the subtitles repeated 'SAM'
  instead of the lyrics, so the output looked completely wrong

Three-part fix:

1. SRT GENERATOR: skip segments longer than 15s that contain fewer than 5 words.
   These are treated as hallucinations and have no real transcription value.

2. SCRIBE TRANSCRIBE: detect hallucinations in returned segments.
   - Mark segments > 15s with < 5 words as hallucinations
   - Compute true coverage % (excluding hallucinations)
   - Add _hallucination_count and _coverage_pct to result

3. TRANSCRIBE_FULL: auto-retry Scribe if quality is poor.
   - If hallucinations detected OR coverage < 50%, retry once
   - Keep retry result only if it has better stats
   - Otherwise fall back to first attempt (still better than nothing)
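
The detection rule and coverage calculation from parts 1 and 2 could look roughly like the sketch below. It is not the code in main.py. The segment shape (dicts with "start", "end", "text"), the function names, and the constant names are assumptions made for illustration; only the thresholds (15s, 5 words) and the result keys _hallucination_count and _coverage_pct come from this message. Coverage here is the summed duration of kept segments divided by the audio length, which assumes segments do not overlap.

```python
from typing import Dict, List

# Thresholds from the commit message; the constant names are hypothetical.
MAX_SPARSE_DURATION_S = 15.0
MIN_WORDS = 5


def is_hallucination(segment: Dict) -> bool:
    """True for a segment longer than 15 s that holds fewer than 5 words."""
    duration = segment["end"] - segment["start"]
    word_count = len(segment["text"].split())
    return duration > MAX_SPARSE_DURATION_S and word_count < MIN_WORDS


def summarize(segments: List[Dict], audio_duration_s: float) -> Dict:
    """Drop hallucinated segments and compute coverage from what remains."""
    kept = [s for s in segments if not is_hallucination(s)]
    covered = sum(s["end"] - s["start"] for s in kept)
    coverage_pct = 100.0 * covered / audio_duration_s if audio_duration_s > 0 else 0.0
    return {
        "segments": kept,  # what an SRT generator would render
        "_hallucination_count": len(segments) - len(kept),
        "_coverage_pct": round(coverage_pct, 1),
    }


# Example modeled on the failure described above (verse text is made up).
segments = [
    {"start": 0.0, "end": 33.0, "text": "placeholder words standing in for verse one"},
    {"start": 34.7, "end": 133.2, "text": "sam"},
]
stats = summarize(segments, audio_duration_s=133.2)
```

With this input the 98s 'sam' segment is dropped, so an SRT built from stats["segments"] would contain only the verse, and the low coverage value would trigger the retry in part 3.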

This makes the pipeline robust against Scribe's occasional bad transcripts
on songs with long instrumental breaks. Most second attempts succeed
where the first failed, because Scribe's output varies from run to run.
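
A minimal sketch of the retry-once policy, under stated assumptions: the function names are hypothetical, the transcription call is passed in as a plain callable, and "better stats" is interpreted here as fewer hallucinations first, then higher coverage. The message does not define that ordering, so treat it as one possible choice.

```python
from typing import Callable, Dict

MIN_COVERAGE_PCT = 50.0  # threshold from the commit message


def needs_retry(result: Dict) -> bool:
    """Retry when any hallucination was detected or coverage is below 50%."""
    return result["_hallucination_count"] > 0 or result["_coverage_pct"] < MIN_COVERAGE_PCT


def is_better(candidate: Dict, baseline: Dict) -> bool:
    """Assumed ordering: fewer hallucinations wins, ties broken by higher coverage."""
    cand_key = (candidate["_hallucination_count"], -candidate["_coverage_pct"])
    base_key = (baseline["_hallucination_count"], -baseline["_coverage_pct"])
    return cand_key < base_key


def transcribe_with_retry(transcribe: Callable[[], Dict]) -> Dict:
    """Run the transcription once, and at most one more time if quality is poor."""
    first = transcribe()
    if not needs_retry(first):
        return first
    second = transcribe()
    # Keep the retry only if it improved; otherwise fall back to the first attempt.
    return second if is_better(second, first) else first


# Usage with canned results standing in for two Scribe calls.
attempts = iter([
    {"_hallucination_count": 1, "_coverage_pct": 24.8},
    {"_hallucination_count": 0, "_coverage_pct": 91.0},
])
final = transcribe_with_retry(lambda: next(attempts))
```

Capping the loop at one retry keeps the extra cost bounded, and comparing the two results means a worse second attempt never replaces the first.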
2026-04-29 18:08:35 +00:00
main.py Detect Scribe hallucinations + filter from SRT + auto-retry 2026-04-29 18:08:35 +00:00
telegram.py Multi-upload batch queue + Telegram notifications 2026-04-29 15:12:38 +00:00