Commit Graph

22 Commits

Author SHA1 Message Date
4bc5ac6756 Major: Claude post-processing of Whisper transcript
- Claude now corrects transcription errors (Slavic languages, dialects, mixed langs)
- Returns corrected_segments with same timestamps but cleaner text
- Pipeline generates SRT from Claude-corrected transcript and passes to subtitle.py via --srt
- subtitle.py supports --srt to skip Whisper re-transcription on the trimmed clip
- clip.py propagates --srt through to subtitle.py
- Whisper still runs once (in analyze.py); subtitle.py reuses corrected output instead of re-running
- This means: Whisper's mistakes (mixed langs, hallucinations, wrong words) are fixed by Claude before becoming visible subtitles
2026-04-29 08:13:33 +00:00
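A minimal sketch of the SRT generation step described in 4bc5ac6756, assuming each entry in corrected_segments is a dict with `start` and `end` in seconds plus a `text` string. Those key names and both function names are hypothetical; the commit only states that the segments keep their original timestamps.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (assumed keys: start, end, text) as SRT cues numbered from 1."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(cues)
```

The resulting string could be written to a file and handed to subtitle.py through the --srt option mentioned above.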
af3c933c78 Robust language detection + anti-hallucination
- 3-sample voting for auto-detect (start/middle/end of song) prevents lang switching mid-song
- Lock detected language for full transcription
- Anti-hallucination: condition_on_previous_text=False, temperature=0.0
- compression_ratio_threshold=2.4 (rejects repetitive hallucinations)
- log_prob_threshold=-1.0 (rejects low-confidence segments)
- no_speech_threshold=0.6 (more aggressive silence detection)
- Default Whisper model changed: small → medium (better for all langs incl. Slavic)
2026-04-29 07:59:20 +00:00
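The voting and decoding settings from af3c933c78 might be sketched as follows. The option names and values are copied from the commit message; whether they match the keyword arguments of the installed Whisper wrapper is an assumption to verify. `vote_language` and its tie-break rule are hypothetical.

```python
from collections import Counter

# Decoding options as listed in the commit message (names unverified against
# any particular Whisper library version).
ANTI_HALLUCINATION_OPTIONS = {
    "condition_on_previous_text": False,
    "temperature": 0.0,
    "compression_ratio_threshold": 2.4,
    "log_prob_threshold": -1.0,
    "no_speech_threshold": 0.6,
}


def vote_language(detections: list[tuple[str, float]]) -> str:
    """Pick the majority language; break ties by summed confidence."""
    if not detections:
        raise ValueError("need at least one detection")
    counts = Counter(lang for lang, _ in detections)
    confidence = Counter()
    for lang, prob in detections:
        confidence[lang] += prob
    return max(counts, key=lambda lang: (counts[lang], confidence[lang]))
```

With detections taken at the start, middle, and end of a song, `vote_language([("sl", 0.6), ("hr", 0.9), ("sl", 0.5)])` returns `"sl"`, which would then be locked for the full transcription.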
c870d80726 Fix: extend clip if ends mid-vocal (no chorus cut-off), DejaVu Sans font (supports SLO/HR/BS chars), auto-upgrade to medium Whisper model for Slavic languages 2026-04-29 07:35:00 +00:00
5d5e169f9d Disable Whisper VAD filter — was dropping vocal segments in songs creating gaps in subtitles 2026-04-29 07:07:29 +00:00
a04811bdc9 Add Claude LLM analysis: sends full transcript to Claude API for true song structure understanding (refrain detection across all repetitions, not just local heuristic) 2026-04-29 06:55:41 +00:00
e072eec362 Fix: handle Whisper transcribe failure for instrumental-only audio (fallback to empty transcript) 2026-04-29 06:33:52 +00:00
33a138af9e Fix: force native Python bool/float for JSON serialization (numpy types) 2026-04-29 06:23:41 +00:00
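One possible way to handle the NumPy serialization problem from 33a138af9e is a `default` hook for `json.dumps`. This is an alternative sketch, not necessarily what the commit does (the message suggests explicit bool/float casts); `to_native` is a hypothetical name.

```python
import json


def to_native(value):
    """json.dumps `default` hook: unwrap NumPy scalars and arrays via .tolist()."""
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        # NumPy scalars return a plain Python bool/int/float here,
        # arrays return nested Python lists.
        return tolist()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")
```

Usage would be `json.dumps(result, default=to_native)`, where `result` stands for the analysis dict.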
8512076b91 Major: smart selection pipeline (analyze.py) + audio fade + multi-lang auto-detect
- New analyze.py: full transcript + energy + structural analysis
- Smart clip range: includes pre-chorus, can exceed 30s up to max_duration (default 45s)
- Audio fade in/out: auto-detected from vocal boundaries
- Instrumental detection: auto-disables subs if vocals < 10% of duration
- Multi-language: auto-detect via Whisper or explicit (DE/SL/HR/BS/SR/EN/IT/ES/FR)
- Frontend: cleaner UX, added bs language, smart selection description
- reframe.py: --fade-in --fade-out args
- clip.py: propagates fade params
- app/main.py: replaces find_chorus.py call with analyze.py
2026-04-29 06:21:35 +00:00
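The instrumental check from 8512076b91 (subtitles disabled when vocals cover less than 10% of the duration) could be approximated like this. The function name and the `(start, end)` tuple layout are assumptions, and the segments are assumed not to overlap.

```python
def is_instrumental(vocal_segments: list[tuple[float, float]],
                    duration: float,
                    min_vocal_ratio: float = 0.10) -> bool:
    """Return True when vocals cover less than min_vocal_ratio of the track."""
    if duration <= 0:
        return True  # nothing to subtitle
    voiced = sum(max(0.0, end - start) for start, end in vocal_segments)
    return voiced / duration < min_vocal_ratio
```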
81edd24ca3 Subtitles: smaller font 56px (was 84), higher position MarginV=400, side margins 80px for safe zone 2026-04-29 06:09:26 +00:00
ba787744a6 Subtitles: cap chunk duration at 2.5s, split long lines into multiple time slices for faster reels pacing 2026-04-29 05:59:36 +00:00
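A sketch of the chunk cap from ba787744a6, assuming a cue is split into equal time slices with its words distributed evenly across them. Only the 2.5 s cap comes from the commit; the splitting rule and `split_cue` are hypothetical.

```python
import math

MAX_CHUNK_SECONDS = 2.5  # cap from the commit message


def split_cue(start: float, end: float, text: str,
              max_seconds: float = MAX_CHUNK_SECONDS) -> list[tuple[float, float, str]]:
    """Split one cue into equal time slices of at most max_seconds where possible."""
    words = text.split()
    duration = end - start
    if duration <= max_seconds or len(words) < 2:
        return [(start, end, text.strip())]
    # Enough slices to respect the cap, but never more slices than words,
    # so a very long cue with few words can still exceed the cap.
    n = min(math.ceil(duration / max_seconds), len(words))
    slice_len = duration / n
    total = len(words)
    return [
        (start + i * slice_len,
         start + (i + 1) * slice_len,
         " ".join(words[i * total // n:(i + 1) * total // n]))
        for i in range(n)
    ]
```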
e001387a89 Subtitles: convert SRT to ASS directly with PlayResY=1920 for predictable scaling instead of unreliable force_style 2026-04-28 18:09:53 +00:00
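The direct SRT to ASS conversion from e001387a89 might look roughly like the sketch below. PlayResY=1920 comes from the commit; PlayResX=1080 follows the 1080x1920 frame named elsewhere in this log. The style values are placeholders borrowed from other commits, and the function names are hypothetical.

```python
import re

ASS_HEADER = """[Script Info]
ScriptType: v4.00+
PlayResX: 1080
PlayResY: 1920

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,DejaVu Sans,56,&H00FFFFFF,&H000000FF,&H00000000,&H64000000,-1,0,0,0,100,100,0,0,1,3,0,2,80,80,400,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""

_SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def srt_time_to_ass(stamp: str) -> str:
    """Convert HH:MM:SS,mmm to the ASS form H:MM:SS.cc (centiseconds, truncated)."""
    match = _SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return f"{hours}:{minutes:02d}:{seconds:02d}.{millis // 10:02d}"


def srt_to_ass(srt_text: str) -> str:
    """Convert SRT cues to ASS Dialogue lines under a fixed-resolution header."""
    lines = [ASS_HEADER.rstrip("\n")]
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = block.splitlines()
        if len(rows) < 3 or "-->" not in rows[1]:
            continue  # skip malformed cues
        start, end = (srt_time_to_ass(t) for t in rows[1].split("-->"))
        text = r"\N".join(row.strip() for row in rows[2:])  # \N is the ASS line break
        lines.append(f"Dialogue: 0,{start},{end},Default,,0,0,0,,{text}")
    return "\n".join(lines) + "\n"
```

Because font size and margins are interpreted relative to PlayResX/PlayResY, fixing them in the header is what makes the scaling predictable.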
28d933c916 Subtitles: UPPERCASE + position lower (MarginV=320 for 1080x1920) + bigger font 2026-04-28 17:40:48 +00:00
15ef4888a1 Debug: log exact clip.py cmd in job + clip.py logs run_clip args 2026-04-28 17:28:10 +00:00
bc3fe1f9d4 Add explicit FFmpeg trim command logging + duration verification 2026-04-28 17:17:11 +00:00
8eaef029e2 Find chorus: weight repetitive short phrases (like 'Ohne dich x5') as strong chorus signal 2026-04-28 16:57:45 +00:00
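The repetition signal from 8eaef029e2 could be approximated by counting short transcript lines that recur. The four-word limit and the function name are assumptions; how the count is weighted against energy is not shown in this log.

```python
from collections import Counter


def repetition_scores(lines: list[str], max_words: int = 4) -> dict[str, int]:
    """Count short phrases (up to max_words words) that occur more than once."""
    normalized = [" ".join(line.lower().split()) for line in lines]
    counts = Counter(p for p in normalized if 0 < len(p.split()) <= max_words)
    return {phrase: n for phrase, n in counts.items() if n > 1}
```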
c17578521a Fix find_chorus: RMS energy parser was broken (no pts_time available), now synthesizes timestamps; energy weight x10 (the chorus is louder) 2026-04-28 16:55:51 +00:00
64e8854cea Track mode: more sensitive face detection + longer smoothing window 2026-04-28 16:45:13 +00:00
400f6dbb6d Fix: limit FFmpeg crop expression to 20 sample points (was overflowing 4KB limit) 2026-04-28 16:32:26 +00:00
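Limiting the crop expression to 20 sample points, as in 400f6dbb6d, amounts to downsampling the tracked positions before building the FFmpeg expression. A minimal sketch, assuming evenly spaced selection that keeps the first and last points; `downsample` is a hypothetical helper.

```python
def downsample(points: list, max_points: int = 20) -> list:
    """Keep at most max_points evenly spaced items, including first and last."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    n = len(points)
    if n <= max_points:
        return list(points)
    step = (n - 1) / (max_points - 1)
    return [points[round(i * step)] for i in range(max_points)]
```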
2e337ff079 Fix: shutil import was inside finally block, causing NameError when shutil.move was called 2026-04-28 16:22:39 +00:00
6e2a13d8a3 Fix cross-device link error: use shutil.move instead of os.replace 2026-04-28 16:15:20 +00:00
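For context on 6e2a13d8a3: `os.replace` fails with a cross-device link error when source and destination are on different filesystems, while `shutil.move` falls back to copy-and-delete. A minimal sketch with a module-level import; `move_into_place` is a hypothetical name.

```python
import shutil  # imported at module level so it is always bound before use
from pathlib import Path


def move_into_place(src: Path, dst: Path) -> Path:
    """Move src to dst, working across filesystems (e.g. tmpfs to a mounted volume)."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(dst)))
```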
47509b4f06 Add cookies support to yt_download.py for YouTube bot detection bypass 2026-04-28 15:47:59 +00:00
30b969e4b8 Initial: reels clipper app
- FastAPI backend (auth, jobs, SSE, download)
- Frontend: drag&drop + YouTube URL + jobs panel
- Pipeline: yt_download → find_chorus → reframe → subtitle
- Modes: track (face follow), center, blur
- Whisper for SL/DE/EN subtitles
- Auto-chorus detection via Whisper + RMS energy
- Docker + Coolify ready
2026-04-28 15:28:22 +00:00