reels-app

Author	SHA1	Message	Date
Sebastjan Artič	d73453fe50	Fix SRT subtitles: word-level clipping for partial segments Bug found in Žena ME TEPE re-test: - Clip start: 76.73s (correct, captures full 'Žena' word) - But SRT subtitle #1 showed: 'SAJ ŠE DOMA MI VEČ NOČJO VERJET.' - That text is from the PREVIOUS verse, not the chorus! Why: previous segment (73.9-78.2s) contained 'saj še doma mi več nočjo verjet. Žena me'. Clip start fell at 76.73s (mid-segment). Old SRT logic: max(s_start, clip_start) just clipped TIMING but kept ALL the text from that segment, including text from before the clip. Fix: when a segment partially falls outside clip range AND has word-level timestamps (Scribe provides these), reconstruct the segment using only the words that actually fall within [clip_start, clip_end]. Audio (clipped at clip_start) only contains those words anyway, so the subtitle should match. Result for Žena chorus: - Old: 'SAJ ŠE DOMA MI VEČ NOČJO VERJET.' (wrong, that text is silent in clip) - New: 'ŽENA ME' (only words actually heard at 76.73-78.16s)	2026-04-29 16:48:39 +00:00
Sebastjan Artič	91cc03658d	Multi-upload batch queue + Telegram notifications Changes: 1. Frontend multi-upload: - File input now has 'multiple' attribute, drag-drop accepts multiple - File queue list with per-file artist/title preview + remove button - 'Pošlji vse' uploads sequentially (one at a time to avoid network saturation) - Each file gets same batch_id for Telegram batch summary - After upload, queue clears, jobs appear in right sidebar 2. Backend queue worker: - New _queue_worker() background thread processes 'queued' jobs sequentially - Only 1 job at a time to keep openclaw stable (avoid CPU/RAM thrash) - FIFO order by created_at - Auto-starts on app startup after job resume 3. Job submission flow change: - /api/process and /api/youtube no longer call background.add_task directly - Just mark status='queued', queue worker picks up - This means upload completes fast, processing happens in background - User can close browser, jobs continue 4. Telegram notifications (FOLX Alerts bot): - Per-job: 'Reel pripravljen: Lady Gaga - Abracadabra (29s, 30 MB)' - Per-job failed: 'Reel ni uspel: <name> + error message' - Batch summary: 'Batch končan: 10/10 reels pripravljeni' (only if >1 in batch) - Uses existing TELEGRAM_TOKEN + TELEGRAM_CHAT_ID env vars - app/telegram.py module with notify_job_done(), notify_job_failed(), notify_batch_complete() 5. batch_id field: - Added to Job model + StartJobIn pydantic - Saved during upload + process - Used to count batch progress and trigger summary notification User experience: - Drag 20 videos at once - Click 'Pošlji' - Close browser, go grab coffee - Telegram sends 'Reel pripravljen' for each - After all done: 'Batch končan: 20/20 reels pripravljeni' summary - Open app to download all	2026-04-29 15:12:38 +00:00
Sebastjan Artič	b543057cee	ACRCloud auto-recognition: never block uploads, fall back to fingerprinting Changes: 1. UI: removed blocking prompt() that asked for artist+title on filename that didn't match 'Artist - Title' pattern. Upload always proceeds. Instead shows yellow warning saying 'server will try to recognize'. 2. Backend: added scripts/acr_recognize.py — extracts 20s audio sample from video (at 15s and 60s offsets for robustness), computes ACRCloud fingerprint via native binary (3KB payload), sends to identify API. 3. Pipeline: process_job() now runs ACR recognition step before analysis IF parsed_artist or parsed_title is missing. Result is saved to job metadata and used for download filename + Scribe/Claude filename hint. 4. Credentials: ACR_HOST + ACR_ACCESS_KEY + ACR_SECRET_KEY env vars added to Coolify (using existing keys from openclaw fb-agent metka). 5. requirements.txt: added pyacrcloud==1.0.11 for native fingerprinting. This unblocks future automation/cron upload pipelines — files don't need to be perfectly named, ACRCloud will identify them automatically. Fallback chain: 1. Filename parsing (Artist - Title.mp4) 2. ACRCloud audio fingerprint (works even for '12345.mp4', 'IMG_001.mp4') 3. If both fail: download filename uses 'reel_<id>.mp4' (still works)	2026-04-29 14:24:53 +00:00
Sebastjan Artič	3877b822ff	Smart download filenames: 'Artist - Title - REEL.mp4' + validation Two improvements: 1. DOWNLOAD FILENAME: instead of 'reel_<job-id>.mp4' (e.g. reel_25e076af7600.mp4), downloads now have descriptive names like: - 'Lady Gaga - Abracadabra - REEL.mp4' - 'Modrijani - S teboj - REEL.mp4' - 'Sarah Connor - FICKA - REEL.mp4' 2. PRE-UPLOAD VALIDATION: when filename doesn't follow 'Artist - Title' format, browser prompts user for both fields. Without them, upload is blocked. This prevents files with names like '12345.mp4' or 'video_final.mp4' from being processed without identifying info. Implementation: - parse_artist_title() helper handles common formats: - 'Artist - Title.mp4' / 'Artist – Title' (en-dash) - 'Artist | Title' / 'Artist : Title' - Strips noise: '(Official Music Video)', '(Audio)', '(HD)', '[Lyric Video]' - Client-side parser mirrors backend (validation before upload) - Backend accepts artist + title form fields (override parsed) - Job stored with parsed_artist + parsed_title + has_clean_name fields - YouTube jobs auto-fetch title via yt-dlp --info-only and parse it - Filename hint to Scribe/Claude uses parsed values (cleaner than raw filename) - Download endpoint uses build_download_filename() for content-disposition - Jobs list shows 'Artist — Title' instead of raw filename Result: downloaded reels are auto-named correctly for Facebook/Instagram upload, no more renaming files manually.	2026-04-29 14:15:18 +00:00
Sebastjan Artič	68247bb84c	Integrate ElevenLabs Scribe (best multilingual STT 2026) ElevenLabs Scribe replaces local Whisper as default transcription: - 96.7% accuracy English, 2.4% WER Indonesian (vs Whisper 7.7%) - 18x faster (200s song = 11s vs 3-5 min on CPU) - No hallucinations on songs (Whisper invented 'Pony und Kleid' for 'Bonnie und Clyde') - 99 languages supported, including SLO/HR/BS/SR - $0.40/h pricing, ~$0.022 per 200s song Implementation: - transcribe_with_elevenlabs() function uses Scribe v1 - ISO 639-1 ↔ 639-3 mapping (Scribe needs 'deu' not 'de') - Word-level timestamps converted to pseudo-segments (close on 0.6s pause or 6s duration) - 24MB upload limit guard with auto-fallback to local Default whisper_provider='auto': - If ELEVENLABS_API_KEY set → use Scribe - Otherwise → fallback to local faster-whisper - 'elevenlabs' strict mode: no fallback - 'local' strict mode: skip Scribe entirely Tested on Ben Zucker - Ohne dich: Scribe correctly transcribed 'Wir sind Bonnie und Clyde, zu allem bereit' where local Whisper hallucinated.	2026-04-29 12:03:40 +00:00
Sebastjan Artič	3ffa9740f0	Revert "Add Groq Whisper API integration (200x faster than local CPU)" This reverts commit `5c53a27d33`.	2026-04-29 11:19:31 +00:00
Sebastjan Artič	5c53a27d33	Add Groq Whisper API integration (200x faster than local CPU) Pipeline: - New transcribe_with_groq() function uses Groq's whisper-large-v3-turbo - 30s audio transcribed in ~0.5s (vs 30s+ on CPU) - Same quality as local Whisper (it's the same OpenAI model) - Cloudflare bypass via custom User-Agent header - 24MB upload limit guard with auto-fallback to local - Language auto-detect works (Groq returns full lang name, mapped to ISO codes) Default whisper_provider='auto': - If GROQ_API_KEY is set → use Groq (200x faster) - Otherwise → fallback to local faster-whisper - Strict 'groq' mode: no fallback (returns empty if Groq fails) - Strict 'local' mode: skip Groq entirely CLI: --whisper-provider {auto,groq,local} API: whisper_provider field in StartJobIn Cost: $0.04/h with whisper-large-v3-turbo ($0.002 per 200s song)	2026-04-29 11:08:15 +00:00
Sebastjan Artič	60765ad84c	Anti-hallucination: filename hint to LLM + beam search + silence threshold When Whisper hallucinates (generates fake lyrics not matching the audio), LLM can now use the original filename as a hint to recognize the song and override the false transcript with the actual lyrics. Pipeline: 1. Pass filename (e.g. 'Ben Zucker - Bonnie und Clyde') as hint 2. Whisper transcribes (may hallucinate) 3. Claude/Gemini reads filename + transcript: - Recognizes song from filename hint - Compares Whisper output to known lyrics - Replaces hallucinated text with real lyrics (preserves timestamps) - If can't fix, removes segment (better silent than wrong) Also added Whisper anti-hallucination params: - beam_size=5 (more careful decoding vs greedy) - hallucination_silence_threshold=2.0 (skip text in long silences)	2026-04-29 10:48:55 +00:00
Sebastjan Artič	05fb0081c6	Fix preview cutoff + sticky left panel 1. Preview endpoint now supports HTTP Range requests (HTTP 206 Partial) - HTML5 video player needs Range support to seek/buffer properly - Without it, video would cut off after a few seconds - Returns chunks of 64KB on demand 2. Left panel (upload form) is now sticky (position: sticky) - Stays in view while right panel (jobs list) scrolls - On mobile (<800px) reverts to normal flow	2026-04-29 10:24:32 +00:00
OpenClaw Agent	0ca33be6ac	Fix: clip_range source dynamic from LLM result instead of hardcoded 'claude' Diagnosis: - analyze.py historically supported only Claude - when Gemini was added, clip_range.source stayed hardcoded as 'claude' - the same applies to the logs 'Whisper segmenti zamenjani s Claude' (Whisper segments replaced by Claude) and 'Generated SRT from Claude' - the API result in the job showed source='claude' even when Gemini was actually used - this is only a COSMETIC bug; functionally everything worked correctly - Gemini WAS ACTUALLY called (confirmed: '🤖 Gemini (gemini-3.1-pro-preview) izbral: 172.5-201.8s') and returned the correct result; only the logging was wrong Fixes: 1. clip_range['source'] = claude_result['source'] (the actual 'gemini:...' or 'claude:...') 2. clip_range['reason'] prefix changed from hardcoded 'claude_llm:' to dynamic '{source}:' 3. Log 'Whisper segmenti zamenjani s Claude' → 'z {llm_label}' 4. Log 'Claude je popravil jezik' (Claude corrected the language) → 'LLM je popravil' 5. main.py 'Generated SRT from Claude' → 'from {llm_src}' Test (Zlati Muzikanti - Le prijatelja bodiva, waltz, 246s): ✓ Gemini actually picks the chorus (172.5-201.8s) ✓ Whisper detects sl (p=0.97 across 3 samples) ✓ All 18 segments corrected ✓ Pipeline works end-to-end Backward compat: - transcript['claude_corrected'] and the srt_from_claude variable name are kept because they already exist in old job state files	2026-04-29 09:49:58 +00:00
Sebastjan Artič	534d710e8a	Auto-resume jobs interrupted by container restart When Coolify redeploys, the container is killed mid-job. Now on FastAPI startup: - Detect status=processing jobs from JOBS_DIR - If input file exists and resume_attempts < 3, restart pipeline (status=queued) - After 3 failed attempts, mark as error - If input is missing, mark error immediately - Track resume_attempts and last_resume_at for diagnostics Run actual process_job in asyncio executor (sync function in thread) so startup completes quickly and resume happens in background. Resolves: 'Veseli Dolenci stuck' issue	2026-04-29 08:52:16 +00:00
Sebastjan Artič	32baf9cd45	Auto-resume: cleanup stuck jobs on container startup + GEMINI_API_KEY env - @app.on_event(startup) marks all status=processing jobs as error after restart - Process endpoint now clears chorus_error/interrupted_at on retry (retry-friendly) - GEMINI_API_KEY added to Coolify env (Gemini 3.1 Pro now active) - User can now choose Gemini in UI dropdown for analysis	2026-04-29 08:43:31 +00:00
Sebastjan Artič	ec71c54570	Upgrade to Sonnet 4.6 + add Gemini 3.1 Pro support - Refactored analyze_with_claude into shared _build_analysis_prompt + _parse_llm_response helpers - New analyze_with_gemini() using Gemini 3.1 Pro ($2/M in, MMMLU 92.6% — best multilingual) - Unified analyze_with_llm(provider) dispatcher with auto-fallback (Claude → Gemini) - API endpoint accepts llm_provider in StartJobIn (claude/gemini/auto) - Frontend dropdown to pick LLM - Default model is now Sonnet 4.6 (was Haiku 4.5) — 3x quality at 3x price (~3 cents/video) - Gemini support is opt-in: needs GEMINI_API_KEY env var to activate	2026-04-29 08:26:27 +00:00
Sebastjan Artič	69fb2f5ce8	Upgrade default Whisper model: small/medium → large-v3 for much better Slovenian/Slavic transcription accuracy	2026-04-29 08:20:18 +00:00
Sebastjan Artič	4bc5ac6756	Major: Claude post-processing of Whisper transcript - Claude now corrects transcription errors (Slavic languages, dialects, mixed langs) - Returns corrected_segments with same timestamps but cleaner text - Pipeline generates SRT from Claude-corrected transcript and passes to subtitle.py via --srt - subtitle.py supports --srt to skip Whisper re-transcription on the trimmed clip - clip.py propagates --srt through to subtitle.py - Whisper still runs once (in analyze.py); subtitle.py reuses corrected output instead of re-running - This means: Whisper's mistakes (mixed langs, hallucinations, wrong words) are fixed by Claude before becoming visible subtitles	2026-04-29 08:13:33 +00:00
Sebastjan Artič	c870d80726	Fix: extend clip if ends mid-vocal (no chorus cut-off), DejaVu Sans font (supports SLO/HR/BS chars), auto-upgrade to medium Whisper model for Slavic languages	2026-04-29 07:35:00 +00:00
Sebastjan Artič	8512076b91	Major: smart selection pipeline (analyze.py) + audio fade + multi-lang auto-detect - New analyze.py: full transcript + energy + structural analysis - Smart clip range: includes pre-chorus, can exceed 30s up to max_duration (default 45s) - Audio fade in/out: auto-detected from vocal boundaries - Instrumental detection: auto-disables subs if vocals < 10% of duration - Multi-language: auto-detect via Whisper or explicit (DE/SL/HR/BS/SR/EN/IT/ES/FR) - Frontend: cleaner UX, added bs language, smart selection description - reframe.py: --fade-in --fade-out args - clip.py: propagates fade params - app/main.py: replaces find_chorus.py call with analyze.py	2026-04-29 06:21:35 +00:00
Sebastjan Artič	d36893bf2d	FIX CRITICAL: reload job dict after find_chorus update so reframe gets new start/duration values	2026-04-28 17:33:11 +00:00
Sebastjan Artič	15ef4888a1	Debug: log exact clip.py cmd in job + clip.py logs run_clip args	2026-04-28 17:28:10 +00:00
Sebastjan Artič	2e337ff079	Fix: shutil import was inside finally block, causing NameError when shutil.move was called	2026-04-28 16:22:39 +00:00
Sebastjan Artič	30b969e4b8	Initial: reels clipper app - FastAPI backend (auth, jobs, SSE, download) - Frontend: drag&drop + YouTube URL + jobs panel - Pipeline: yt_download → find_chorus → reframe → subtitle - Modes: track (face follow), center, blur - Whisper for SI/DE/EN subtitles - Auto-chorus detection via Whisper + RMS energy - Docker + Coolify ready	2026-04-28 15:28:22 +00:00
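
The word-level clipping described in commit d73453fe50 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `clip_segment_to_range`, the dictionary keys (`start`, `end`, `text`, `words`), and the rule that a word is kept only if it lies fully inside the clip are all assumptions made for the example.

```python
from typing import Dict, Optional


def clip_segment_to_range(segment: Dict, clip_start: float, clip_end: float) -> Optional[Dict]:
    """Re-time one transcript segment relative to the clip (hypothetical helper).

    Returns None when nothing from the segment is audible inside the clip.
    """
    s_start, s_end = segment["start"], segment["end"]
    if s_end <= clip_start or s_start >= clip_end:
        return None  # segment lies entirely outside the clip

    words = segment.get("words") or []
    partial = s_start < clip_start or s_end > clip_end

    if partial and words:
        # Keep only the words that fall inside [clip_start, clip_end]
        # (assumed inclusion rule: the whole word must be inside).
        kept = [w for w in words if w["start"] >= clip_start and w["end"] <= clip_end]
        if not kept:
            return None
        text = " ".join(w["text"] for w in kept)
        start, end = kept[0]["start"], kept[-1]["end"]
    else:
        # No word timestamps, or the segment is fully inside: clamp timing only.
        text = segment["text"]
        start, end = max(s_start, clip_start), min(s_end, clip_end)

    # SRT times are relative to the start of the clipped audio.
    return {"start": start - clip_start, "end": end - clip_start, "text": text}
```

With a segment spanning 73.9-78.2s and a clip starting at 76.73s, only the words from 'Žena' onward survive, which matches the 'ŽENA ME' result reported in the commit message (uppercasing is assumed to happen elsewhere).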

21 Commits
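
Commit 68247bb84c mentions that Scribe's word-level timestamps are converted to pseudo-segments, closing a segment on a 0.6s pause or a 6s duration. A minimal sketch of that grouping rule is shown below. The function name `words_to_segments`, the word dictionary keys, and the exact comparison operators are assumptions for illustration; the actual implementation may differ.

```python
from typing import Dict, List


def _close(words: List[Dict]) -> Dict:
    """Build one pseudo-segment from a non-empty run of words."""
    return {
        "start": words[0]["start"],
        "end": words[-1]["end"],
        "text": " ".join(w["text"] for w in words),
        "words": list(words),
    }


def words_to_segments(
    words: List[Dict], max_pause: float = 0.6, max_duration: float = 6.0
) -> List[Dict]:
    """Group word-level timestamps into pseudo-segments (hypothetical helper).

    The current segment is closed before a word when the silence preceding
    that word reaches max_pause, or when adding the word would stretch the
    segment beyond max_duration.
    """
    segments: List[Dict] = []
    current: List[Dict] = []
    for word in words:
        if current:
            pause = word["start"] - current[-1]["end"]
            duration = word["end"] - current[0]["start"]
            if pause >= max_pause or duration > max_duration:
                segments.append(_close(current))
                current = []
        current.append(word)
    if current:
        segments.append(_close(current))
    return segments
```

Keeping the original words inside each pseudo-segment means a later step, such as word-level subtitle clipping, can still work from per-word timing.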