Changes:
1. Frontend multi-upload:
- File input now has 'multiple' attribute, drag-drop accepts multiple
- File queue list with per-file artist/title preview + remove button
- 'Pošlji vse' ("Send all") uploads sequentially (one at a time to avoid network saturation)
- Each file gets same batch_id for Telegram batch summary
- After upload, queue clears, jobs appear in right sidebar
2. Backend queue worker:
- New _queue_worker() background thread processes 'queued' jobs sequentially
- Only 1 job at a time to keep openclaw stable (avoid CPU/RAM thrash)
- FIFO order by created_at
- Auto-starts on app startup after job resume
3. Job submission flow change:
- /api/process and /api/youtube no longer call background.add_task directly
- Just mark status='queued', queue worker picks up
- This means upload completes fast, processing happens in background
- User can close browser, jobs continue
4. Telegram notifications (FOLX Alerts bot):
- Per-job: 'Reel pripravljen: Lady Gaga - Abracadabra (29s, 30 MB)' ("Reel ready")
- Per-job failed: 'Reel ni uspel: <name>' ("Reel failed") plus the error message
- Batch summary: 'Batch končan: 10/10 reels pripravljeni' ("Batch finished: 10/10 reels ready"), only if >1 in batch
- Uses existing TELEGRAM_TOKEN + TELEGRAM_CHAT_ID env vars
- app/telegram.py module with notify_job_done(), notify_job_failed(),
notify_batch_complete()
5. batch_id field:
- Added to Job model + StartJobIn pydantic
- Saved during upload + process
- Used to count batch progress and trigger summary notification
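The single-job FIFO behaviour described in items 2 and 3 can be sketched as follows. This is a minimal illustration, not the actual _queue_worker(): the Job dataclass, the status names 'processing', 'done' and 'failed', and the helper names are assumptions made for the example.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    # Hypothetical stand-in for the real Job model.
    id: str
    created_at: float
    status: str = "queued"
    batch_id: Optional[str] = None


def next_queued_job(jobs):
    """Return the oldest job with status 'queued' (FIFO by created_at), or None."""
    queued = [j for j in jobs if j.status == "queued"]
    return min(queued, key=lambda j: j.created_at) if queued else None


def run_queue_once(jobs, process):
    """Process at most one queued job; return True if a job was handled."""
    job = next_queued_job(jobs)
    if job is None:
        return False
    job.status = "processing"
    try:
        process(job)
        job.status = "done"
    except Exception:
        job.status = "failed"
    return True


def queue_worker(jobs, process, stop_event, poll_seconds=1.0):
    """Loop until stop_event is set, handling one job at a time."""
    while not stop_event.is_set():
        if not run_queue_once(jobs, process):
            # Nothing queued: sleep, but wake up early if asked to stop.
            stop_event.wait(poll_seconds)


# Usage sketch: start the worker as a daemon thread on app startup.
# stop = threading.Event()
# threading.Thread(target=queue_worker, args=(jobs, process_job, stop), daemon=True).start()
```

Because the endpoints only set status='queued', the upload request returns as soon as the file is stored, and the worker thread is the only place where processing starts.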
User experience:
- Drag 20 videos at once
- Click 'Pošlji vse'
- Close browser, go grab coffee
- Telegram sends 'Reel pripravljen' for each
- After all done: 'Batch končan: 20/20 reels pripravljeni' summary
- Open app to download all
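The batch summary rule (sent only when the batch has more than one job and all of them have finished) could look roughly like this. The dict-based job records, the status names and the function name batch_summary are assumptions for illustration; the real notify_batch_complete() may be structured differently.

```python
def batch_summary(jobs, batch_id):
    """Return the summary text once every job in the batch is finished, else None.

    jobs: list of dicts with 'batch_id' and 'status' keys (hypothetical shape).
    """
    batch = [j for j in jobs if j.get("batch_id") == batch_id]
    if len(batch) <= 1:
        # Single uploads only get the per-job notification.
        return None
    if any(j["status"] not in ("done", "failed") for j in batch):
        # Some jobs are still queued or processing.
        return None
    ok = sum(1 for j in batch if j["status"] == "done")
    return f"Batch končan: {ok}/{len(batch)} reels pripravljeni"
```

Calling this after every finished job and sending the message only when the result is not None gives exactly one summary per batch, provided the last job to finish triggers the check.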
The previous fix relied on segment boundaries: it only applied to
segments shorter than 3s (type 1) or 4s (type 2). But 'Žena' sat in a
4.3s segment ('saj še doma mi več noč'jo verjet'. Žena me'), so the
condition was not met and the clip start stayed at 77.7s, exactly at
the end of the word 'Žena' (76.88-77.70s).
New approach: scan word-level timestamps directly:
1. If clip start falls MID-WORD → extend back to word start - 0.15s
2. If a word ends 0-0.5s BEFORE clip start AND next word is at clip start
→ that word is suspect (may be first word of chorus that Scribe put
in previous segment), extend back to its start - 0.15s
Word-level timestamps are always available from Scribe (timestamps_granularity=word).
Falls back to segment-level for local Whisper without word timing.
This handles arbitrary segment lengths and is universal — works for any
language where the chorus starts on a word that the STT placed in the
previous segment.
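A minimal sketch of the two word-level rules above. The function name, the (text, start, end) tuple shape and the 0.05s tolerance for "next word is at clip start" are assumptions; the 0.15s padding and the 0.5s window come from the description.

```python
PAD = 0.15            # seconds to extend before the word start
MAX_GAP = 0.5         # a word ending this close before clip start is suspect
AT_START_TOL = 0.05   # assumed tolerance for "next word is at clip start"


def adjust_clip_start(clip_start, words):
    """Return an adjusted clip start.

    words: list of (text, start, end) tuples sorted by start time.
    """
    for i, (_, start, end) in enumerate(words):
        # Rule 1: clip start falls strictly inside a word.
        if start < clip_start < end:
            return max(0.0, start - PAD)
        # Rule 2: word ends just before clip start and the next word
        # begins at clip start, so this word may belong to the chorus.
        gap = clip_start - end
        if 0.0 <= gap <= MAX_GAP and i + 1 < len(words):
            next_start = words[i + 1][1]
            if abs(next_start - clip_start) <= AT_START_TOL:
                return max(0.0, start - PAD)
    return clip_start
```

With the 'Žena' word from the example (76.88-77.70s) and a following word starting at 77.70s, a clip start of 77.7s moves back to about 76.73s under rule 2.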
Problem: MXF and MPG files (TV broadcast formats) often contain:
- Multiple audio streams (4-8 streams for different language tracks)
- Multichannel layouts (5.1, 7.1) instead of stereo
- The previous ffmpeg call used -c:a aac with no channel limit, so
multichannel input was encoded as multichannel AAC instead of the
clean stereo the output should contain
Solution:
1. get_audio_streams() helper probes all audio streams with ffprobe
- Returns codec, channels, sample_rate, language, layout for each
2. build_audio_args() picks best stream + downmix:
- Prefers first 2-channel stereo stream (usually main mix)
- Falls back to first stream if none are 2-ch
- Always: -ac 2 (force stereo downmix), -ar 48000, -c:a aac, -b:a 192k
- Bitrate raised from 128k to 192k for music quality
3. Smart trim path now detects broadcast formats:
- .mxf, .mpg, .mpeg, .ts, .m2ts, .mts → transcode (not stream copy)
- Standard MP4/MOV → stream copy (faster, lossless)
4. Pre-conversion step for broadcast files without trim:
- Even without --start/--duration, MXF/MPG get converted to MP4
- Same audio handling as trim path
5. Main render adds explicit -map 0:v:0 -map 0:a:0? -ac 2 to ensure
only first video and first audio stream get encoded, with stereo
6. ACR recognize also gets -map 0:a:0 -ac 2 for MXF compatibility
7. UI accepts: video/*,.mxf,.mpg,.mpeg,.ts,.m2ts,.mts
8. Upload limit raised: 2GB → 10GB (MXF files are large)
This means a TV broadcast MXF with SLO/EN/DE language tracks now
correctly outputs stereo MP4 with the main language track preserved.
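The stream selection and broadcast-format check can be sketched as below. This is an illustration under assumptions: the stream dicts only carry a 'channels' key, the function needs_transcode is a hypothetical name, and returning -an for files without audio is an assumption not stated above.

```python
from pathlib import Path

# Extensions treated as broadcast formats (transcode instead of stream copy).
BROADCAST_EXTS = {".mxf", ".mpg", ".mpeg", ".ts", ".m2ts", ".mts"}


def needs_transcode(filename):
    """True for broadcast containers, False for standard MP4/MOV."""
    return Path(filename).suffix.lower() in BROADCAST_EXTS


def build_audio_args(streams):
    """Build ffmpeg audio arguments from probed audio streams.

    streams: list of dicts in ffprobe order, each with a 'channels' key.
    """
    if not streams:
        return ["-an"]  # assumption: no audio stream means no audio output
    # Prefer the first 2-channel stream, else fall back to the first stream.
    index = next((i for i, s in enumerate(streams) if s.get("channels") == 2), 0)
    return [
        "-map", f"0:a:{index}",
        "-ac", "2",        # force stereo downmix
        "-ar", "48000",
        "-c:a", "aac",
        "-b:a", "192k",
    ]
```

Keeping -ac 2 even when a stereo stream is selected makes the output layout independent of what ffprobe reported.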
Changes:
1. UI: removed the blocking prompt() that asked for artist+title when the
filename didn't match the 'Artist - Title' pattern. Upload always proceeds;
a yellow warning now says 'server will try to recognize' instead.
2. Backend: added scripts/acr_recognize.py — extracts 20s audio sample
from video (at 15s and 60s offsets for robustness), computes ACRCloud
fingerprint via native binary (3KB payload), sends to identify API.
3. Pipeline: process_job() now runs ACR recognition step before analysis
IF parsed_artist or parsed_title is missing. Result is saved to job
metadata and used for download filename + Scribe/Claude filename hint.
4. Credentials: ACR_HOST + ACR_ACCESS_KEY + ACR_SECRET_KEY env vars
added to Coolify (using existing keys from openclaw fb-agent metka).
5. requirements.txt: added pyacrcloud==1.0.11 for native fingerprinting.
This unblocks future automation/cron upload pipelines — files don't need
to be perfectly named, ACRCloud will identify them automatically.
Fallback chain:
1. Filename parsing (Artist - Title.mp4)
2. ACRCloud audio fingerprint (works even for '12345.mp4', 'IMG_001.mp4')
3. If both fail: download filename uses 'reel_<id>.mp4' (still works)
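The fallback chain can be expressed as one small function. The name resolve_download_name and the dict shape for the parsed and ACR results are hypothetical; the output patterns follow the filenames quoted in these notes.

```python
def resolve_download_name(job_id, parsed=None, acr=None):
    """Pick the download filename from the first source with artist and title.

    parsed: result of filename parsing, e.g. {"artist": ..., "title": ...}
    acr:    result of ACRCloud recognition in the same shape (assumed)
    """
    for source in (parsed, acr):
        if source and source.get("artist") and source.get("title"):
            return f"{source['artist']} - {source['title']} - REEL.mp4"
    # Both sources failed: fall back to the job id.
    return f"reel_{job_id}.mp4"
```

Ordering the sources this way means the fingerprint lookup only matters when filename parsing leaves artist or title empty.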
Two improvements:
1. DOWNLOAD FILENAME: instead of 'reel_<job-id>.mp4' (e.g. reel_25e076af7600.mp4),
downloads now have descriptive names like:
- 'Lady Gaga - Abracadabra - REEL.mp4'
- 'Modrijani - S teboj - REEL.mp4'
- 'Sarah Connor - FICKA - REEL.mp4'
2. PRE-UPLOAD VALIDATION: when filename doesn't follow 'Artist - Title' format,
browser prompts user for both fields. Without them, upload is blocked.
This prevents files with names like '12345.mp4' or 'video_final.mp4' from
being processed without identifying info.
Implementation:
- parse_artist_title() helper handles common formats:
- 'Artist - Title.mp4' / 'Artist – Title' (en dash)
- 'Artist | Title' / 'Artist : Title'
- Strips noise: '(Official Music Video)', '(Audio)', '(HD)', '[Lyric Video]'
- Client-side parser mirrors backend (validation before upload)
- Backend accepts artist + title form fields (override parsed)
- Job stored with parsed_artist + parsed_title + has_clean_name fields
- YouTube jobs auto-fetch title via yt-dlp --info-only and parse it
- Filename hint to Scribe/Claude uses parsed values (cleaner than raw filename)
- Download endpoint uses build_download_filename() for content-disposition
- Jobs list shows 'Artist — Title' instead of raw filename
Result: downloaded reels are auto-named correctly for Facebook/Instagram
upload, no more renaming files manually.
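A rough sketch of what a parser like parse_artist_title() might do for the formats listed above. The regular expressions, the extension list and the (None, None) return for unparseable names are assumptions, not the actual implementation.

```python
import re

# Known video extensions to strip (assumed list).
EXTENSION = re.compile(r"\.(mp4|mov|mkv|mxf|mpg|mpeg|ts|m2ts|mts)$", re.IGNORECASE)
# Bracketed noise such as '(Official Music Video)' or '[Lyric Video]'.
NOISE = re.compile(
    r"\s*[\(\[](?:official music video|audio|hd|lyric video)[\)\]]",
    re.IGNORECASE,
)
# Separators surrounded by whitespace: hyphen, en dash, em dash, pipe, colon.
SEPARATOR = re.compile(r"\s+[-\u2013\u2014|:]\s+")


def parse_artist_title(name):
    """Return (artist, title), or (None, None) if the name has no clean split."""
    stem = EXTENSION.sub("", name)
    stem = NOISE.sub("", stem).strip()
    parts = SEPARATOR.split(stem, maxsplit=1)
    if len(parts) != 2:
        return None, None
    artist, title = parts[0].strip(), parts[1].strip()
    if not artist or not title:
        return None, None
    return artist, title
```

Requiring whitespace around the separator keeps hyphenated names such as 'Jay-Z' intact, at the cost of rejecting 'Artist-Title' without spaces.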
Root cause: inline onclick with JSON.stringify(title) broke when title
contained quotes, special chars, or was empty. The HTML attribute parser
got confused by mismatched quotes, so click handler never fired.
Fix:
- Replaced inline onclick handlers with data-action attributes
- Added single delegated click listener at document level
- Title stored in element dataset (no HTML quoting issues)
- Added escapeHtml() helper for safe rendering of titles/errors
Now clicking Preview in the right sidebar opens the fullscreen modal
correctly, regardless of filename characters.
Previously: clicking Preview in jobs list showed a small inline video
within the job card row.
Now: clicking Preview opens a centered fullscreen modal with:
- Large video player (up to 95vw × 85vh) — same experience as bottom
live-preview but accessible from jobs list
- Auto-play, controls, native HTML5 video player
- Title shown below video for context
- Download button + Close button
- Click outside or ESC key to close
- Backdrop blur for focus
Removes the obsolete inline <video> element that was rendered hidden
in each job card. Body scroll locked while modal open.
1. Preview endpoint now supports HTTP Range requests (HTTP 206 Partial Content)
- HTML5 video player needs Range support to seek/buffer properly
- Without it, video would cut off after a few seconds
- Returns chunks of 64KB on demand
2. Left panel (upload form) is now sticky (position: sticky)
- Stays in view while right panel (jobs list) scrolls
- On mobile (<800px) reverts to normal flow
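For the Range support in item 1, the header parsing and 64KB chunking can be sketched independently of the web framework. The function names are hypothetical and only single 'bytes=start-end' ranges are handled; multi-range requests are treated as invalid here.

```python
import re

CHUNK = 64 * 1024  # 64KB, as in the description


def parse_range(header, file_size):
    """Return (start, end) inclusive for a single byte range, or None if unusable."""
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip()) if header else None
    if not m or (m.group(1) == "" and m.group(2) == ""):
        return None
    if m.group(1) == "":
        # Suffix form 'bytes=-N': the last N bytes of the file.
        length = int(m.group(2))
        if length == 0:
            return None
        start, end = max(0, file_size - length), file_size - 1
    else:
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else file_size - 1
        end = min(end, file_size - 1)
    if start > end or start >= file_size:
        return None
    return start, end


def content_range(start, end, file_size):
    """Value for the Content-Range header of a 206 response."""
    return f"bytes {start}-{end}/{file_size}"


def chunk_spans(start, end, chunk_size=CHUNK):
    """Yield (offset, length) pairs covering start..end in chunk_size pieces."""
    pos = start
    while pos <= end:
        length = min(chunk_size, end - pos + 1)
        yield pos, length
        pos += length
```

A handler would respond with status 206 and the Content-Range value when parse_range returns a span, and typically with 416 when a Range header is present but unusable.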
- Refactored analyze_with_claude into shared _build_analysis_prompt + _parse_llm_response helpers
- New analyze_with_gemini() using Gemini 3.1 Pro ($2/M in, MMMLU 92.6% — best multilingual)
- Unified analyze_with_llm(provider) dispatcher with auto-fallback (Claude → Gemini)
- API endpoint accepts llm_provider in StartJobIn (claude/gemini/auto)
- Frontend dropdown to pick LLM
- Default model is now Sonnet 4.6 (was Haiku 4.5) — 3x quality at 3x price (~3 cents/video)
- Gemini support is opt-in: needs GEMINI_API_KEY env var to activate
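The auto-fallback in the dispatcher might be structured as in this sketch. It is not the real analyze_with_llm(): the function name dispatch_analysis and the backends mapping of provider name to callable are assumptions that keep the example free of API clients.

```python
def dispatch_analysis(prompt, provider, backends):
    """Run the analysis with the chosen provider, falling back in 'auto' mode.

    backends: dict mapping 'claude' / 'gemini' to a callable taking the prompt.
    A provider missing from the dict is treated as not configured, for example
    Gemini without GEMINI_API_KEY.
    Returns (provider_name, result).
    """
    order = ["claude", "gemini"] if provider == "auto" else [provider]
    errors = []
    for name in order:
        fn = backends.get(name)
        if fn is None:
            errors.append(f"{name}: not configured")
            continue
        try:
            return name, fn(prompt)
        except Exception as exc:
            # Remember the failure and try the next provider, if any.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all LLM providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the result lets the job record show which model actually produced the analysis.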
- 3-sample voting for auto-detect (start/middle/end of song) prevents lang switching mid-song
- Lock detected language for full transcription
- Anti-hallucination: condition_on_previous_text=False, temperature=0.0
- compression_ratio_threshold=2.4 (rejects repetitive hallucinations)
- log_prob_threshold=-1.0 (rejects low-confidence segments)
- no_speech_threshold=0.6 (more aggressive silence detection)
- Default Whisper model changed: small → medium (better for all langs incl. Slavic)
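The 3-sample vote and the decoding options can be sketched as follows. vote_language is a hypothetical helper; the option names and values are copied from the list above, and which transcription call accepts them depends on the Whisper library in use, which is not stated here.

```python
from collections import Counter

# Anti-hallucination settings as listed above.
WHISPER_OPTIONS = {
    "condition_on_previous_text": False,
    "temperature": 0.0,
    "compression_ratio_threshold": 2.4,
    "log_prob_threshold": -1.0,
    "no_speech_threshold": 0.6,
}


def vote_language(samples):
    """Return the majority language among the start/middle/end detections.

    On a tie the earliest detected language wins, because Counter.most_common
    keeps first-seen order for equal counts.
    """
    if not samples:
        raise ValueError("at least one language sample is required")
    return Counter(samples).most_common(1)[0][0]
```

The winning code is then passed as the fixed language for the full transcription, so a stray detection in one sample cannot switch languages mid-song.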