Fix: Gemini 3.1 Pro thinking model needs 32k maxOutputTokens (was 4096 → MAX_TOKENS truncation)
Diagnosis:
- Gemini 3.x Pro is a thinking model (it does internal reasoning, reported as thoughtsTokenCount)
- For large transcripts (60+ song segments):
* thoughts ~ 1500-3000 tokens
* output JSON with corrected_segments ~ 3000-7000 tokens
* total ~ 4500-10000 tokens
- With maxOutputTokens=4096 the response was cut off (finishReason: MAX_TOKENS),
the JSON was truncated midway, and _parse_llm_response raised json.JSONDecodeError
- Result: 'Gemini vrnil prazen string' ("Gemini returned an empty string") in the logs
Fixes:
1. Gemini maxOutputTokens 4096 → 32768 (enough for thinking plus a long JSON)
2. Log diagnostics for finishReason==MAX_TOKENS and token usage
3. Detect empty text (not just an empty parts array)
4. Claude max_tokens 4096 → 8192 (headroom for long songs)
5. Claude: detect stop_reason==max_tokens
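
A minimal sketch of fixes 2, 3 and 5, assuming the responses are handled as plain dicts decoded from the REST JSON. The helper names (extract_gemini_text, extract_claude_text, TruncatedResponseError) are hypothetical and are not the functions changed in this commit; only the response field names (finishReason, usageMetadata, stop_reason, usage) come from the respective APIs.

```python
class TruncatedResponseError(RuntimeError):
    """Hypothetical error: the model stopped at its output token limit."""


def extract_gemini_text(response: dict) -> str:
    """Return the answer text from a Gemini generateContent-style dict."""
    candidates = response.get("candidates") or []
    if not candidates:
        raise ValueError("Gemini response has no candidates")
    candidate = candidates[0]
    usage = response.get("usageMetadata") or {}

    # Fix 2: surface truncation together with the token usage.
    if candidate.get("finishReason") == "MAX_TOKENS":
        raise TruncatedResponseError(
            "finishReason=MAX_TOKENS "
            f"thoughts={usage.get('thoughtsTokenCount')} "
            f"output={usage.get('candidatesTokenCount')}"
        )

    # Fix 3: a non-empty parts array can still carry no usable text.
    parts = (candidate.get("content") or {}).get("parts") or []
    text = "".join(p.get("text", "") for p in parts if not p.get("thought"))
    if not text.strip():
        raise ValueError("Gemini returned empty text")
    return text


def extract_claude_text(response: dict) -> str:
    """Return the answer text from a Claude Messages-style dict."""
    # Fix 5: Claude reports truncation through stop_reason.
    if response.get("stop_reason") == "max_tokens":
        usage = response.get("usage") or {}
        raise TruncatedResponseError(
            f"stop_reason=max_tokens output={usage.get('output_tokens')}"
        )
    blocks = response.get("content") or []
    text = "".join(b.get("text", "") for b in blocks if b.get("type") == "text")
    if not text.strip():
        raise ValueError("Claude returned empty text")
    return text
```

Raising a dedicated error before parsing keeps a truncated response from reaching the JSON parser, so the log shows the real cause instead of a decode failure.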
Test (60 segments, 5631-char prompt):
- 4096 → finishReason=MAX_TOKENS, thoughts=2594, output=1488, JSON truncated ❌
- 16384 → finishReason=STOP, thoughts=1445, output=3040, JSON complete ✅
- 32768 → safe default ✅
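
The test numbers can be checked with a little arithmetic. This sketch assumes thinking tokens count against maxOutputTokens, which the 4096 run suggests but this commit does not prove; token_headroom is a hypothetical helper for illustration only.

```python
def token_headroom(limit: int, thoughts: int, output: int) -> int:
    """Tokens left under the limit, assuming thoughts and output share it."""
    return limit - (thoughts + output)


# (thoughts, output) per maxOutputTokens value, taken from the test runs.
runs = {4096: (2594, 1488), 16384: (1445, 3040)}

for limit, (thoughts, output) in runs.items():
    print(limit, token_headroom(limit, thoughts, output))
# prints:
# 4096 14
# 16384 11899
```

At 4096 the run used all but 14 tokens of the budget, which matches the MAX_TOKENS stop; at 16384 it used under a third of it.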