Real-world failure: 'Ansambel Saša Avsenika - ŽENA ME TEPE'
- Chorus starts with 'Žena me tepe' at 78.0s
- Scribe's segment boundary: 'Žena' was the last word of the previous segment
(73.9-78.2s), while the next segment, 'tepe, mi prazni žepe', started at 78.3s
- Claude picked clip start = 78.3s (the segment boundary)
- A 0.4s fade-in on a vocal start made 'Že-' inaudible
- User hears: '...na me tepe' (cut)
Three-part fix:
1. PROMPT: instruct Claude to start the clip ~0.3s BEFORE the first chorus word
(not exactly on it), with a concrete example of the timing math.
2. POST-LLM EXTENSION: scan corrected_segments for boundary cases:
- If clip start falls MID-segment → extend back to segment start - 0.2s
- If a previous segment ended within 0.5s of clip start → check whether its
last word is actually the first chorus word and, if so, extend back to it
- Uses word-level timestamps when available (Scribe provides these)
3. FADE-IN: was 0.4s when the clip starts on vocals, which is too long and
audibly cuts the first word. Reduced to 0.05s (click prevention only, not
audible). Still 0.2s for instrumental intros, where a fade is musically appropriate.
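The two boundary rules in item 2 can be sketched in Python roughly as below. This is a minimal sketch, not the actual implementation: it assumes each entry in corrected_segments has `start`/`end` in seconds plus an optional list of word timestamps, and the `Segment`/`Word` classes, `extend_clip_start`, and the `chorus_first_word` parameter are hypothetical names. The 0.2s and 0.5s values come from item 2; reusing the ~0.3s prompt pre-roll (`WORD_PAD`) when extending back to a word is an assumption. The example timings are illustrative, and only 73.9, 78.0, 78.2, and 78.3s come from the failure report.

```python
import string
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Segment:
    text: str
    start: float
    end: float
    words: list[Word] = field(default_factory=list)  # empty if no word timestamps


MID_SEGMENT_PAD = 0.2   # item 2: extend back to segment start - 0.2s
BOUNDARY_WINDOW = 0.5   # item 2: previous segment ended within 0.5s of clip start
WORD_PAD = 0.3          # assumption: reuse the ~0.3s pre-roll from the prompt


def _norm(token: str) -> str:
    """Lowercase and strip surrounding ASCII punctuation so 'Žena,' matches 'žena'."""
    return token.strip(string.punctuation).casefold()


def extend_clip_start(clip_start: float,
                      corrected_segments: list[Segment],
                      chorus_first_word: str) -> float:
    """Hypothetical helper: move the clip start earlier if it sits on a risky boundary."""
    start = clip_start
    for seg in corrected_segments:
        # Case 1: clip start lands strictly inside a segment.
        if seg.start < clip_start < seg.end:
            start = min(start, seg.start - MID_SEGMENT_PAD)
        # Case 2: a segment ended just before the clip start; its last word may
        # be the first chorus word (only checkable with word-level timestamps).
        gap = clip_start - seg.end
        if 0.0 <= gap <= BOUNDARY_WINDOW and seg.words:
            last = seg.words[-1]
            if _norm(last.text) == _norm(chorus_first_word):
                start = min(start, last.start - WORD_PAD)
    return max(0.0, start)


# Illustrative replay of the failure case (only the last word's timing is filled in).
segments = [
    Segment("... Žena", 73.9, 78.2, [Word("Žena", 78.0, 78.2)]),
    Segment("tepe, mi prazni žepe", 78.3, 80.5, [Word("tepe,", 78.3, 78.6)]),
]
new_start = extend_clip_start(78.3, segments, "Žena")
print(round(new_start, 2))  # 77.7
```

Taking the earliest candidate with `min` means that if both rules fire, the clip errs toward starting earlier, which matches the goal of never clipping the first word.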
Now 'Žena' is heard in full: the clip starts at ~77.5-77.7s and the word at
78.0s, leaving a 0.3-0.5s buffer.
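The fade rule from item 3 and the buffer claim can be checked with a small Python sketch. The 0.05s and 0.2s constants are the values from the note; `choose_fade_in`, `apply_linear_fade_in`, and `fade_done_before_word` are hypothetical helpers, and the linear ramp over a plain list of float samples is an assumption, since the note does not say which fade curve the clipper uses.

```python
VOCAL_FADE_IN = 0.05        # click prevention only, per item 3
INSTRUMENTAL_FADE_IN = 0.2  # fade is musically appropriate on instrumental intros


def choose_fade_in(starts_on_vocal: bool) -> float:
    """Hypothetical helper: pick the fade length for a clip."""
    return VOCAL_FADE_IN if starts_on_vocal else INSTRUMENTAL_FADE_IN


def apply_linear_fade_in(samples: list[float], sample_rate: int,
                         fade_s: float) -> list[float]:
    """Scale the first fade_s seconds linearly from 0 up toward full gain (assumed curve)."""
    n = int(round(fade_s * sample_rate))
    out = list(samples)
    for i in range(min(n, len(out))):
        out[i] *= i / n
    return out


def fade_done_before_word(clip_start: float, fade_s: float,
                          word_start: float) -> bool:
    """True if the fade has fully ramped up before the first word begins."""
    return clip_start + fade_s <= word_start


fade = choose_fade_in(starts_on_vocal=True)
print(fade_done_before_word(77.7, fade, 78.0))  # True: fade ends at 77.75s
print(fade_done_before_word(78.3, 0.4, 78.0))   # False: old clip start and fade
```

With a 0.05s fade, even the latest new start (77.7s) leaves about 0.25s of full-gain audio before 'Žena' at 78.0s.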