reels-app

Author	SHA1	Message	Date
Sebastjan Artič	a30137f1f2	Strict 'chorus only' mode: respect include_prebuild in LLM prompt Bug: 'Vključi pre-chorus' checkbox in UI was sent to backend but ignored by Claude/Gemini analysis prompt. Both modes used same lenient rules saying 'pre-chorus is optional' — Claude often included pre-chorus even when user wanted just chorus. Real-world failure: Lady Gaga 'Abracadabra' picked 54.7-84.6s, but actual chorus 'Abracadabra, amor, ooh-na-na' starts at 85.2s. Claude included the entire pre-chorus block ('Hold me in your heart tonight', 'Like a poem said by a lady in red', 'With a haunting dance') and missed the actual chorus completely. Fix: include_prebuild parameter now flows all the way to the prompt: - main.py → analyze.py CLI args → analyze_with_llm() → prompt builder - Two distinct prompt rule sets: CHORUS ONLY (default, include_prebuild=False): - Strict: 'clip starts on FIRST WORD of chorus, never before' - Length: 12-25s typically - Explicit examples for pop songs (Abracadabra, Despacito, Shape of You) - List of common mistakes to avoid CHORUS + PRE-CHORUS (include_prebuild=True): - Optional pre-chorus before chorus, 4-10s - Length: 18-35s This fixes the most common failure mode where Claude rationalizes including verse/pre-chorus content even when user explicitly wants just the chorus.	2026-04-29 14:03:40 +00:00
Sebastjan Artič	90cdad516b	Universal chorus selection: chorus mandatory, pre-chorus only natural extension User feedback: 'the CHORUS is mandatory, pre-chorus optional' + 'the system must be stable for all languages, including Spanish and Romanian'. Two changes: 1. Web search is now MANDATORY first step (was: optional fallback): - Even if Claude thinks it knows the song, must search lyrics first - Universal lyrics sources by language: SLO: besedila.com, lyricstranslate.com DE: songtexte.com HR/SR/BS: tekstovi.net ES: letras.com, musica.com RO: versuri.ro IT: angolotesti.it FR: paroles.net EN: genius.com, azlyrics.com Universal: lyricstranslate.com (any language) - Search strategy: artist+title first, then transcript snippet fallback - Without lyrics, Claude cannot reliably identify chorus boundaries 2. Simplified selection rules - chorus is THE priority: - Chorus (full first occurrence) = MANDATORY - Pre-chorus = ONLY if 1-2 verse lines tightly connected to chorus - In doubt: just take chorus alone (12-25s) - Outro fillers explicitly multi-language: SLO 'aj ja ja' / 'ej ej ej' EN 'yeah' / 'oh oh' ES 'ay ay ay' RO 'hei hei' JA 'la la la' - 12-35s total range (was 15-35s, now allows shorter chorus-only clips) This makes the system language-agnostic: works the same way for Slovenian narodno-zabavna, Spanish reggaeton, Romanian manele, German Schlager, etc. The lyrics lookup is what makes it stable across languages.	2026-04-29 13:36:34 +00:00
Sebastjan Artič	4efd726176	Extend clip end past chorus to capture outro/sustained notes Problem: Claude was cutting clip exactly at last transcribed word of chorus, but in real songs: - Singer holds last note 1-3s longer (still meaningful) - Outro 'ej-ej-ej' / 'oh' / 'yeah' may not be transcribed as words - Result felt like 'incomplete chorus' even though SRT was correct Fix has two parts: 1. Prompt enhancement: - Ask Claude to add 1-2s padding AFTER last chorus word - Explicit example with timing math - Mention outro fillers (ej-ej-ej, oh, yeah) 2. Post-LLM extension logic: - After Claude returns clip range, scan corrected_segments for segments overlapping or starting just after current end - If next segment is within 1s pause and ends within max_duration+5s, extend clip to include it (with 0.3s breathing room) - Hard cap at max_duration + 5s to prevent unbounded extension This ensures chorus naturally trails off rather than being cut mid-emotional-peak.	2026-04-29 13:12:28 +00:00
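The post-LLM extension step described in 4efd726176 could look roughly like the sketch below. The function name, the segment dict shape (`start`/`end` in seconds) and the 35s default for `max_duration` are assumptions for illustration; only the 1s pause, the 0.3s breathing room and the `max_duration + 5s` cap come from the commit message.

```python
from typing import Dict, List


def extend_clip_end(
    clip_start: float,
    clip_end: float,
    segments: List[Dict[str, float]],
    max_duration: float = 35.0,   # assumed default, not confirmed by the log
    max_pause: float = 1.0,       # "within 1s pause"
    breathing_room: float = 0.3,  # "0.3s breathing room"
    overshoot: float = 5.0,       # "max_duration + 5s" hard cap
) -> float:
    """Return a possibly later clip end that includes trailing segments."""
    hard_cap = clip_start + max_duration + overshoot
    new_end = clip_end
    for seg in sorted(segments, key=lambda s: s["start"]):
        if seg["end"] <= new_end:
            continue  # segment already lies inside the clip
        if seg["start"] - new_end > max_pause:
            break  # pause too long: the chorus has really ended
        candidate = seg["end"] + breathing_room
        if candidate > hard_cap:
            break  # never extend past the hard cap
        new_end = candidate
    return round(new_end, 2)
```

With a clip of 30.0-45.0s and a following segment at 45.5-47.2s, the end moves to 47.5s; a segment starting 2.5s later is left out.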
Sebastjan Artič	81bae81401	Fix Scribe stopping mid-song: enable tag_audio_events=true + filter events out ROOT CAUSE FOUND: tag_audio_events=false caused Scribe to stop transcribing when instrumental music dominates (polka harmonica taking over from vocals). Real-world test on Avseniki - Ena bolha za pomoč (186s polka): - tag_audio_events=false: 20% coverage (37s only) — fails - tag_audio_events=true: 100% coverage (186s full) — works When tag_audio_events=true, Scribe inserts placeholder markers like '(glasba)' / '(plesalna glasba)' for instrumental sections instead of giving up. We then filter these out so they don't appear in subtitles. Filtering logic: - Skip word.type != 'word' (audio_event types) - Skip parenthesized text legacy fallback like '(music)', '(applause)' This is the core fix — no longer reliant on filename for transcription completeness. Even untitled files like '12345.mp4' now get full coverage.	2026-04-29 13:04:19 +00:00
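A minimal sketch of the filtering logic from 81bae81401, assuming each Scribe word arrives as a dict with `text` and `type` keys (the commit only mentions `word.type`). The helper name `filter_sung_words` is hypothetical.

```python
import re
from typing import Dict, List

# Legacy fallback: text that is nothing but a parenthesized marker,
# e.g. "(music)" or "(glasba)".
_PAREN_ONLY = re.compile(r"^\s*\(.*\)\s*$")


def filter_sung_words(words: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only real words; drop audio events and parenthesized markers."""
    kept = []
    for w in words:
        if w.get("type", "word") != "word":
            continue  # audio_event (and any other non-word type) is skipped
        if _PAREN_ONLY.match(w.get("text", "")):
            continue  # parenthesized placeholder that slipped through as a word
        kept.append(w)
    return kept
```

Filtering on `type` first and on parentheses second mirrors the two rules listed in the commit, so placeholder markers never reach the subtitles even if the type field is missing.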
Sebastjan Artič	7d00730051	Auto-detect language from filename for Scribe (no manual UI selection needed) Problem: Scribe was failing on Slovenian narodno-zabavna songs (Avseniki, Modrijani) because: - User doesn't manually pick language (everything is auto) - Scribe auto-detect had low confidence (0.58) on harmonika-heavy polka - Result: only 37s transcribed instead of full 186s song Solution: detect_language_from_filename() function: - Recognizes 60+ Slovenian artists (Avseniki, Modrijani, Veseli Dolenjci, ...) - Recognizes 30+ German artists (Ben Zucker, Helene Fischer, ...) - Recognizes 20+ Croatian/Serbian artists (Thompson, Severina, Lepa Brena, ...) - Falls back to keyword matching (volim, liebe, srce, herz, ...) - Detects character set (č/ž/š → SL, ä/ö/ü/ß → DE, đ → HR) - Score-based: 5pts for artist match, 1-2pts for keywords/chars When detected, sends language_code to Scribe explicitly: - Avseniki → 'slv' lock → no more half-transcribed songs - Ben Zucker → 'deu' lock → consistent German transcription - User still doesn't need to manually pick anything filename_hint flows: main.py → analyze.py CLI → transcribe_full → Scribe	2026-04-29 12:57:19 +00:00
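The score-based detection in 7d00730051 can be sketched as follows. The artist, keyword and character tables are tiny illustrative subsets taken from the commit message, the `hrv` code and the 2-point keyword / 1-point character weights are assumptions (the log only says "1-2pts"), and the real function covers far more names.

```python
from typing import Dict, Optional, Tuple

# Illustrative subsets only; the commit mentions 60+/30+/20+ artists.
ARTISTS: Dict[str, Tuple[str, ...]] = {
    "slv": ("avseniki", "modrijani", "veseli dolenjci"),
    "deu": ("ben zucker", "helene fischer"),
    "hrv": ("thompson", "severina"),
}
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "slv": ("srce",),
    "deu": ("liebe", "herz"),
    "hrv": ("volim",),
}
CHARS: Dict[str, str] = {"slv": "čžš", "deu": "äöüß", "hrv": "đ"}


def detect_language_from_filename(filename: str) -> Optional[str]:
    """Return an ISO 639-3 code guessed from the filename, or None."""
    name = filename.lower()
    scores = {lang: 0 for lang in ARTISTS}
    for lang, artists in ARTISTS.items():
        scores[lang] += 5 * sum(a in name for a in artists)   # 5 pts per artist
    for lang, words in KEYWORDS.items():
        scores[lang] += 2 * sum(w in name for w in words)     # assumed 2 pts
    for lang, chars in CHARS.items():
        scores[lang] += sum(c in name for c in chars)         # assumed 1 pt
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Returning `None` for an unrecognized filename leaves Scribe's own auto-detection in charge, which matches the "user still doesn't need to manually pick anything" goal.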
Sebastjan Artič	40acad26f3	Crystal-clear chorus selection rules: pre-chorus build-up + FIRST chorus Previous rules were ambiguous and Claude was sometimes picking: - Just the chorus (no build-up) - Second chorus instance (lower energy than first) - Random verse + later chorus combinations New explicit priority order: 1. PRIMARY: pre-chorus verse (build-up) + first chorus (~20-35s total) 2. FALLBACK: just first chorus alone 3. LAST RESORT: dramatic peak section Strict rules: - ALWAYS first chorus (highest energy/recognition) - NEVER second/third chorus instances - NEVER skip between verses - NEVER extend over 35 seconds - Concrete example given: chorus@32s,16s long → pick 20-48s This fixes Veseli Dolenjci picking second chorus + post-chorus verse instead of natural pre-chorus build-up + first chorus.	2026-04-29 12:42:54 +00:00
Sebastjan Artič	5f90085981	Add Claude web_search tool for lyrics lookup + tighter subtitle timing 1. Claude API web_search tool integration: - Claude can now search web for actual lyrics when STT text is wrong - Especially useful for SLO/HR/BS/SR songs (Modrijani, Veseli Dolenjci) where Claude doesn't know lyrics from training data - Agentic loop: tool_use → server-side search → continuation → final text - Max 3 searches per job ($0.03 cost limit) - Hint sources: besedila.com, lyricstranslate.com, tekstovi.net, songtexte.com 2. Tighter subtitle segmentation from Scribe word timestamps: - Phrase boundaries on shorter pauses (0.4s vs 0.6s) - Sentence-ending punctuation triggers segment break - Max segment 4s (was 6s) for natural readable subtitles - Hard cap at 5.5s to prevent very long lines This fixes 'ples to noč' → 'ples pojoč' for Modrijani songs that Scribe transcribed phonetically wrong but Claude can fix via web lookup.	2026-04-29 12:24:17 +00:00
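The tighter segmentation rules in 5f90085981 might be implemented along these lines. The word dict shape (`text`, `start`, `end`) and the function name are assumptions, and so is the way the 4s soft limit and the 5.5s hard cap interact: here a segment closes once it reaches 4s, and a word is never appended if it would push the segment past 5.5s.

```python
from typing import Dict, List


def words_to_segments(
    words: List[Dict],
    pause: float = 0.4,      # phrase boundary on pauses longer than this
    soft_max: float = 4.0,   # preferred maximum segment length
    hard_cap: float = 5.5,   # absolute maximum segment length
) -> List[Dict]:
    """Group word-level timestamps into subtitle segments."""
    segments: List[Dict] = []
    current: List[Dict] = []

    def flush() -> None:
        if current:
            segments.append({
                "start": current[0]["start"],
                "end": current[-1]["end"],
                "text": " ".join(w["text"] for w in current),
            })
            current.clear()

    for w in words:
        if current:
            gap = w["start"] - current[-1]["end"]
            would_be = w["end"] - current[0]["start"]
            if gap > pause or would_be > hard_cap:
                flush()  # close before adding this word
        current.append(w)
        duration = current[-1]["end"] - current[0]["start"]
        ends_sentence = w["text"].rstrip().endswith((".", "!", "?"))
        if ends_sentence or duration >= soft_max:
            flush()  # close after adding this word
    flush()
    return segments
```

A single word longer than the hard cap still becomes its own segment, since there is nothing shorter to split it into.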
Sebastjan Artič	68247bb84c	Integrate ElevenLabs Scribe (best multilingual STT 2026) ElevenLabs Scribe replaces local Whisper as default transcription: - 96.7% accuracy English, 2.4% WER Indonesian (vs Whisper 7.7%) - 18x faster (200s song = 11s vs 3-5 min on CPU) - No hallucinations on songs (Whisper invented 'Pony und Kleid' for 'Bonnie und Clyde') - 99 languages supported, including SLO/HR/BS/SR - $0.40/h pricing, ~$0.022 per 200s song Implementation: - transcribe_with_elevenlabs() function uses Scribe v1 - ISO 639-1 ↔ 639-3 mapping (Scribe needs 'deu' not 'de') - Word-level timestamps converted to pseudo-segments (close on 0.6s pause or 6s duration) - 24MB upload limit guard with auto-fallback to local Default whisper_provider='auto': - If ELEVENLABS_API_KEY set → use Scribe - Otherwise → fallback to local faster-whisper - 'elevenlabs' strict mode: no fallback - 'local' strict mode: skip Scribe entirely Tested on Ben Zucker - Ohne dich: Scribe correctly transcribed 'Wir sind Bonnie und Clyde, zu allem bereit' where local Whisper hallucinated.	2026-04-29 12:03:40 +00:00
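The `whisper_provider` modes listed in 68247bb84c reduce to a small decision function. This is a sketch under assumptions: the helper name is hypothetical, and raising an error in strict `elevenlabs` mode when no key is set is one possible reading of "no fallback".

```python
import os
from typing import Mapping, Optional


def resolve_stt_provider(
    whisper_provider: str = "auto",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick 'elevenlabs' or 'local' from the requested mode and environment."""
    env = os.environ if env is None else env
    has_key = bool(env.get("ELEVENLABS_API_KEY"))
    if whisper_provider == "local":
        return "local"  # strict: skip Scribe entirely
    if whisper_provider == "elevenlabs":
        if not has_key:
            # Assumed behaviour: strict mode fails instead of falling back.
            raise RuntimeError("ELEVENLABS_API_KEY is not set")
        return "elevenlabs"
    if whisper_provider == "auto":
        return "elevenlabs" if has_key else "local"
    raise ValueError(f"unknown whisper_provider: {whisper_provider!r}")
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.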
Sebastjan Artič	3ffa9740f0	Revert "Add Groq Whisper API integration (200x faster than local CPU)" This reverts commit `5c53a27d33`.	2026-04-29 11:19:31 +00:00
Sebastjan Artič	6a8f87b4a2	Revert "Filler detection: trim clip before la-la-la / instrumental medbridge" This reverts commit `4488717f6f`.	2026-04-29 11:19:31 +00:00
Sebastjan Artič	4488717f6f	Filler detection: trim clip before la-la-la / instrumental medbridge Problem: When a song has chorus → la-la-la medbridge → chorus structure, Claude was including the whole 40s+ block, with 18 seconds of la-la-la making the reel feel artificially extended. Fix: 1. Prompt enhancement: explicitly tell Claude NEVER to include la-la-la / ooh ooh / yeah yeah / instrumental fillers 2. Post-LLM detection: scan corrected_segments for repetitive content (>70% repeated words) and trim clip before that segment 3. Max duration guidance reduced from 45s → 35s in prompt This means: clip will end at the first chorus, not extend through fillers.	2026-04-29 11:17:16 +00:00
Sebastjan Artič	5c53a27d33	Add Groq Whisper API integration (200x faster than local CPU) Pipeline: - New transcribe_with_groq() function uses Groq's whisper-large-v3-turbo - 30s audio transcribed in ~0.5s (vs 30s+ on CPU) - Same quality as local Whisper (it's the same OpenAI model) - Cloudflare bypass via custom User-Agent header - 24MB upload limit guard with auto-fallback to local - Language auto-detect works (Groq returns full lang name, mapped to ISO codes) Default whisper_provider='auto': - If GROQ_API_KEY is set → use Groq (200x faster) - Otherwise → fallback to local faster-whisper - Strict 'groq' mode: no fallback (returns empty if Groq fails) - Strict 'local' mode: skip Groq entirely CLI: --whisper-provider {auto,groq,local} API: whisper_provider field in StartJobIn Cost: $0.04/h with whisper-large-v3-turbo ($0.002 per 200s song)	2026-04-29 11:08:15 +00:00
Sebastjan Artič	60765ad84c	Anti-hallucination: filename hint to LLM + beam search + silence threshold When Whisper hallucinates (generates fake lyrics not matching the audio), LLM can now use the original filename as a hint to recognize the song and override the false transcript with the actual lyrics. Pipeline: 1. Pass filename (e.g. 'Ben Zucker - Bonnie und Clyde') as hint 2. Whisper transcribes (may hallucinate) 3. Claude/Gemini reads filename + transcript: - Recognizes song from filename hint - Compares Whisper output to known lyrics - Replaces hallucinated text with real lyrics (preserves timestamps) - If can't fix, removes segment (better silent than wrong) Also added Whisper anti-hallucination params: - beam_size=5 (more careful decoding vs greedy) - hallucination_silence_threshold=2.0 (skip text in long silences)	2026-04-29 10:48:55 +00:00
OpenClaw Agent	0ca33be6ac	Fix: clip_range source dynamic from LLM result instead of hardcoded 'claude' Diagnosis: - analyze.py historically had only Claude support - when Gemini was added, clip_range.source stayed hardcoded as 'claude' - likewise the logs 'Whisper segmenti zamenjani s Claude' and 'Generated SRT from Claude' - the API result in the job showed source='claude' even when Gemini was actually used - this is only a COSMETIC bug; functionally everything worked correctly - Gemini WAS actually called (confirmed: '🤖 Gemini (gemini-3.1-pro-preview) izbral: 172.5-201.8s') and returned a correct result; only the logging was wrong Fixes: 1. clip_range['source'] = claude_result['source'] (actually 'gemini:...' or 'claude:...') 2. clip_range['reason'] prefix from hardcoded 'claude_llm:' to dynamic '{source}:' 3. Log 'Whisper segmenti zamenjani s Claude' → 'z {llm_label}' 4. Log 'Claude je popravil jezik' → 'LLM je popravil' 5. main.py 'Generated SRT from Claude' → 'from {llm_src}' Test (Zlati Muzikanti - Le prijatelja bodiva, waltz, 246s): ✓ Gemini actually picks the chorus (172.5-201.8s) ✓ Whisper detects sl (p=0.97 across 3 samples) ✓ All 18 segments corrected ✓ Pipeline works end-to-end Backward compat: - transcript['claude_corrected'] and the srt_from_claude variable name are kept because they already exist in old job state files	2026-04-29 09:49:58 +00:00
OpenClaw Agent	e350352883	Fix: Gemini 3.1 Pro thinking model needs 32k maxOutputTokens (was 4096 → MAX_TOKENS truncation) Diagnosis: - Gemini 3.x Pro is a thinking model (it has internal reasoning, thoughtsTokenCount) - For large transcripts (songs with 60+ segments): * thoughts ~ 1500-3000 tokens * output JSON with corrected_segments ~ 3000-7000 tokens * total ~ 4500-10000 tokens - With maxOutputTokens=4096 the response was truncated (finishReason: MAX_TOKENS), the JSON was cut off midway, and _parse_llm_response threw json.JSONDecodeError - Result: 'Gemini vrnil prazen string' in the logs Fixes: 1. Gemini maxOutputTokens 4096 → 32768 (enough for thinking + long JSON) 2. Diagnostics for finishReason==MAX_TOKENS and usage tokens in the logs 3. Detection of empty text (not just an empty parts array) 4. Claude max_tokens 4096 → 8192 (headroom for long songs) 5. Claude detection of stop_reason==max_tokens Test (60 segments, 5631 char prompt): - 4096 → finishReason=MAX_TOKENS, thoughts=2594, output=1488, JSON truncated ❌ - 16384 → finishReason=STOP, thoughts=1445, output=3040, JSON complete ✅ - 32768 → safe default ✅	2026-04-29 09:03:53 +00:00
Sebastjan Artič	ec71c54570	Upgrade to Sonnet 4.6 + add Gemini 3.1 Pro support - Refactored analyze_with_claude into shared _build_analysis_prompt + _parse_llm_response helpers - New analyze_with_gemini() using Gemini 3.1 Pro ($2/M in, MMMLU 92.6% — best multilingual) - Unified analyze_with_llm(provider) dispatcher with auto-fallback (Claude → Gemini) - API endpoint accepts llm_provider in StartJobIn (claude/gemini/auto) - Frontend dropdown to pick LLM - Default model is now Sonnet 4.6 (was Haiku 4.5) — 3x quality at 3x price (~3 cents/video) - Gemini support is opt-in: needs GEMINI_API_KEY env var to activate	2026-04-29 08:26:27 +00:00
Sebastjan Artič	9faa224885	Upgrade Claude model: Haiku 4.5 → Sonnet 4.6 for better Slavic language transcript correction	2026-04-29 08:22:10 +00:00
Sebastjan Artič	69fb2f5ce8	Upgrade default Whisper model: small/medium → large-v3 for much better Slovenian/Slavic transcription accuracy	2026-04-29 08:20:18 +00:00
Sebastjan Artič	4bc5ac6756	Major: Claude post-processing of Whisper transcript - Claude now corrects transcription errors (Slavic languages, dialects, mixed langs) - Returns corrected_segments with same timestamps but cleaner text - Pipeline generates SRT from Claude-corrected transcript and passes to subtitle.py via --srt - subtitle.py supports --srt to skip Whisper re-transcription on the trimmed clip - clip.py propagates --srt through to subtitle.py - Whisper still runs once (in analyze.py); subtitle.py reuses corrected output instead of re-running - This means: Whisper's mistakes (mixed langs, hallucinations, wrong words) are fixed by Claude before becoming visible subtitles	2026-04-29 08:13:33 +00:00
Sebastjan Artič	af3c933c78	Robust language detection + anti-hallucination - 3-sample voting for auto-detect (start/middle/end of song) prevents lang switching mid-song - Lock detected language for full transcription - Anti-hallucination: condition_on_previous_text=False, temperature=0.0 - compression_ratio_threshold=2.4 (rejects repetitive hallucinations) - log_prob_threshold=-1.0 (rejects low-confidence segments) - no_speech_threshold=0.6 (more aggressive silence detection) - Default Whisper model changed: small → medium (better for all langs incl. Slavic)	2026-04-29 07:59:20 +00:00
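The 3-sample voting from af3c933c78 can be illustrated with a short majority vote. The `(language, probability)` tuple input and the tie-break on summed probability are assumptions; the commit only states that start, middle and end samples are used.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple


def vote_language(samples: Iterable[Tuple[str, float]]) -> str:
    """Pick the language detected most often; break ties by total confidence."""
    votes: Counter = Counter()
    confidence = defaultdict(float)
    for lang, prob in samples:
        votes[lang] += 1
        confidence[lang] += prob
    if not votes:
        raise ValueError("no language samples given")
    return max(votes, key=lambda lang: (votes[lang], confidence[lang]))
```

The winning code would then be locked for the full transcription, so a stray detection in one section cannot switch languages mid-song.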
Sebastjan Artič	c870d80726	Fix: extend clip if ends mid-vocal (no chorus cut-off), DejaVu Sans font (supports SLO/HR/BS chars), auto-upgrade to medium Whisper model for Slavic languages	2026-04-29 07:35:00 +00:00
Sebastjan Artič	5d5e169f9d	Disable Whisper VAD filter — was dropping vocal segments in songs creating gaps in subtitles	2026-04-29 07:07:29 +00:00
Sebastjan Artič	a04811bdc9	Add Claude LLM analysis: sends full transcript to Claude API for true song structure understanding (refrain detection across all repetitions, not just local heuristic)	2026-04-29 06:55:41 +00:00
Sebastjan Artič	e072eec362	Fix: handle Whisper transcribe failure for instrumental-only audio (fallback to empty transcript)	2026-04-29 06:33:52 +00:00
Sebastjan Artič	33a138af9e	Fix: force native Python bool/float for JSON serialization (numpy types)	2026-04-29 06:23:41 +00:00
Sebastjan Artič	8512076b91	Major: smart selection pipeline (analyze.py) + audio fade + multi-lang auto-detect - New analyze.py: full transcript + energy + structural analysis - Smart clip range: includes pre-chorus, can exceed 30s up to max_duration (default 45s) - Audio fade in/out: auto-detected from vocal boundaries - Instrumental detection: auto-disables subs if vocals < 10% of duration - Multi-language: auto-detect via Whisper or explicit (DE/SL/HR/BS/SR/EN/IT/ES/FR) - Frontend: cleaner UX, added bs language, smart selection description - reframe.py: --fade-in --fade-out args - clip.py: propagates fade params - app/main.py: replaces find_chorus.py call with analyze.py	2026-04-29 06:21:35 +00:00
Sebastjan Artič	81edd24ca3	Subtitles: smaller font 56px (was 84), higher position MarginV=400, side margins 80px for safe zone	2026-04-29 06:09:26 +00:00
Sebastjan Artič	ba787744a6	Subtitles: cap chunk duration at 2.5s, split long lines into multiple time slices for faster reels pacing	2026-04-29 05:59:36 +00:00
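One way to cap subtitle chunks at 2.5s, as ba787744a6 describes, is to split a long cue into equal time slices with the words spread across them. This is only a sketch: the function name, the tuple layout and the even distribution of words are assumptions, and a cue with very few words may still produce slices longer than the cap.

```python
import math
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start, end, text)


def split_cue(start: float, end: float, text: str, max_chunk: float = 2.5) -> List[Cue]:
    """Split one subtitle cue into time slices of at most max_chunk seconds."""
    words = text.split()
    duration = end - start
    wanted = max(1, math.ceil(duration / max_chunk))
    n = min(wanted, max(1, len(words)))  # cannot have more slices than words
    if n == 1:
        return [(start, end, text)]
    per_slice = math.ceil(len(words) / n)
    groups = [words[i:i + per_slice] for i in range(0, len(words), per_slice)]
    slice_len = duration / len(groups)
    return [
        (round(start + i * slice_len, 3), round(start + (i + 1) * slice_len, 3), " ".join(g))
        for i, g in enumerate(groups)
    ]
```

Equal-length slices are a simplification; when word-level timestamps are available, cutting at the actual word boundaries would keep text and audio better aligned.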
Sebastjan Artič	e001387a89	Subtitles: convert SRT to ASS directly with PlayResY=1920 for predictable scaling instead of unreliable force_style	2026-04-28 18:09:53 +00:00
Sebastjan Artič	28d933c916	Subtitles: UPPERCASE + position lower (MarginV=320 for 1080x1920) + bigger font	2026-04-28 17:40:48 +00:00
Sebastjan Artič	15ef4888a1	Debug: log exact clip.py cmd in job + clip.py logs run_clip args	2026-04-28 17:28:10 +00:00
Sebastjan Artič	bc3fe1f9d4	Add explicit FFmpeg trim command logging + duration verification	2026-04-28 17:17:11 +00:00
Sebastjan Artič	8eaef029e2	Find chorus: weight repetitive short phrases (like 'Ohne dich x5') as strong chorus signal	2026-04-28 16:57:45 +00:00
Sebastjan Artič	c17578521a	Fix find_chorus: RMS energy parser was broken (no pts_time available), now synthesizes timestamps; energy weight x10 (the chorus is louder)	2026-04-28 16:55:51 +00:00
Sebastjan Artič	64e8854cea	Track mode: more sensitive face detection + longer smoothing window	2026-04-28 16:45:13 +00:00
Sebastjan Artič	400f6dbb6d	Fix: limit FFmpeg crop expression to 20 sample points (was overflowing 4KB limit)	2026-04-28 16:32:26 +00:00
Sebastjan Artič	2e337ff079	Fix: shutil import was inside finally block, causing NameError when shutil.move was called	2026-04-28 16:22:39 +00:00
Sebastjan Artič	6e2a13d8a3	Fix cross-device link error: use shutil.move instead of os.replace	2026-04-28 16:15:20 +00:00
Sebastjan Artič	47509b4f06	Add cookies support to yt_download.py for YouTube bot detection bypass	2026-04-28 15:47:59 +00:00
Sebastjan Artič	30b969e4b8	Initial: reels clipper app - FastAPI backend (auth, jobs, SSE, download) - Frontend: drag&drop + YouTube URL + jobs panel - Pipeline: yt_download → find_chorus → reframe → subtitle - Modes: track (face follow), center, blur - Whisper for SI/DE/EN subtitles - Auto-chorus detection via Whisper + RMS energy - Docker + Coolify ready	2026-04-28 15:28:22 +00:00

40 Commits