chahinebrini 685782b538 fix(coach): dynamic language (text detection + app-locale fallback)
LLM prompt (message.post + sos-stream):
- Removed the LANG_INSTRUCTIONS map; replaced it with a dynamic instruction
  'Reply in {detectedFromUser} ... fallback: {appLang}'
- Lyra now matches the language of the last user message (via
  detectLang Unicode detection); the app locale is only a fallback
- Instruction is attached twice (start + end of the system prompt)
  to counter recency bias in long German prompts
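
The double placement described above could be sketched roughly as follows. The helper names `languageInstruction` and `buildSystemPrompt` and the exact instruction wording are assumptions made for illustration; the commit only shows the template 'Reply in {detectedFromUser} ... fallback: {appLang}'.

```typescript
// Hypothetical helpers: the real prompt code in message.post / sos-stream
// is not part of this file, so names and wording here are assumptions.

/** Build the language instruction from the detected code and the app locale. */
function languageInstruction(detected: string | null, appLang: string): string {
  return detected
    ? `Reply in "${detected}", the language of the user's last message (fallback: "${appLang}").`
    : `Reply in "${appLang}".`;
}

/** Attach the instruction at both the start and the end of the system prompt. */
function buildSystemPrompt(
  basePrompt: string,
  detected: string | null,
  appLang: string,
): string {
  const instruction = languageInstruction(detected, appLang);
  // Repeating the instruction after the long base prompt keeps it close to
  // the end of the context, which is the point of the recency-bias fix.
  return [instruction, basePrompt, instruction].join("\n\n");
}
```

Here `detected` would be the result of `detectLang` on the last user message, and `appLang` the app locale.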

TTS (speak dispatcher + speak-cartesia + speak-elevenlabs):
- No more 'de' default for language. detectLang(text, locale) derives the
  language primarily from the reply text (Arabic/Cyrillic/CJK/Turkish
  letters), with the locale as fallback
- Cartesia + ElevenLabs: send language/language_code only when it can be
  derived; otherwise let the provider auto-detect instead of forcing 'de'
- speak-cartesia: sonic-2 → sonic-3 (multi-language; was missed in
  yesterday's dispatcher fix)
- Google: en-US as a neutral fallback instead of the de-DE bias
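
The "only send when derivable" rule could look roughly like this. This is a minimal sketch: `withOptionalLanguage` is a hypothetical helper, and the body shape is a placeholder rather than the actual Cartesia or ElevenLabs request format; only the field names `language` and `language_code` come from the commit message.

```typescript
// Field names taken from the commit message; everything else is assumed.
type LanguageField = "language" | "language_code";

/**
 * Return a copy of the request body, adding the language field only when a
 * language could be derived. With null the field is omitted entirely, so the
 * provider can fall back to its own auto-detection.
 */
function withOptionalLanguage(
  body: Record<string, unknown>,
  field: LanguageField,
  lang: string | null,
): Record<string, unknown> {
  return lang ? { ...body, [field]: lang } : { ...body };
}
```

Omitting the key, rather than sending an empty string or a default, is what removes the forced 'de'.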

New: server/utils/detect-lang.ts
2026-05-31 00:12:40 +02:00

37 lines
1.4 KiB
TypeScript

/**
 * Detect language from text using Unicode script ranges.
 *
 * Non-Latin scripts are detected from a single matching character. Each
 * script maps to one representative language (any Cyrillic text yields "ru").
 * For Latin scripts (de/en/fr/tr/es/it/pt …) we fall back to the supplied
 * locale hint, since distinguishing them needs a real NLP library and the
 * user-facing app language is a perfectly good signal.
 *
 * Returns a lowercase base language code (e.g. "ar", "de"), or null if
 * neither detection nor hint applies.
 */
export function detectLang(
  text: string,
  localeHint?: string | null,
): string | null {
  if (text) {
    // Sample a window: the first 300 chars are plenty, and testing a short
    // prefix is cheaper than scanning multi-KB Lyra replies.
    const sample = text.slice(0, 300);
    if (/[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF]/.test(sample)) return "ar"; // Arabic
    if (/[\u0400-\u04FF]/.test(sample)) return "ru"; // Cyrillic
    // Kana must be checked before Han: Japanese text mixes kana with kanji.
    if (/[\u3040-\u309F\u30A0-\u30FF]/.test(sample)) return "ja"; // Hiragana/Katakana
    if (/[\uAC00-\uD7AF]/.test(sample)) return "ko"; // Hangul
    if (/[\u4E00-\u9FFF]/.test(sample)) return "zh"; // CJK Unified Ideographs
    if (/[\u0590-\u05FF]/.test(sample)) return "he"; // Hebrew
    if (/[\u0E00-\u0E7F]/.test(sample)) return "th"; // Thai
    // Turkish-specific Latin letters: a strong hint without an NLP lib.
    if (/[ğĞıİşŞ]/.test(sample)) return "tr";
  }
  if (localeHint) {
    // Accept both "de-DE" and "de_DE" and keep only the base language.
    const base = localeHint.split(/[-_]/)[0].toLowerCase();
    if (base) return base;
  }
  return null;
}
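
For comparison, the same script checks can be written as an ordered rule table, which makes the order dependence explicit. This is an illustrative variant, not the code in the repository: `SCRIPT_RULES` and `detectScript` are hypothetical names, and the locale-hint fallback is left out to keep the sketch short.

```typescript
// Ordered list of [pattern, language]. The first matching rule wins, so kana
// has to come before Han ideographs for Japanese to be recognised.
const SCRIPT_RULES: ReadonlyArray<readonly [RegExp, string]> = [
  [/[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF]/, "ar"], // Arabic
  [/[\u0400-\u04FF]/, "ru"], // Cyrillic
  [/[\u3040-\u309F\u30A0-\u30FF]/, "ja"], // Hiragana/Katakana
  [/[\uAC00-\uD7AF]/, "ko"], // Hangul
  [/[\u4E00-\u9FFF]/, "zh"], // CJK Unified Ideographs
  [/[\u0590-\u05FF]/, "he"], // Hebrew
  [/[\u0E00-\u0E7F]/, "th"], // Thai
  [/[ğĞıİşŞ]/, "tr"], // Turkish-specific Latin letters
];

/** Return the language of the first matching script rule, or null. */
function detectScript(text: string): string | null {
  // Same 300-character window as the original function.
  const sample = text.slice(0, 300);
  for (const [pattern, lang] of SCRIPT_RULES) {
    // The patterns carry no /g flag, so test() keeps no state between calls.
    if (pattern.test(sample)) return lang;
  }
  return null;
}

console.log(detectScript("こんにちは、世界")); // "ja" (kana rule matches before Han)
console.log(detectScript("你好")); // "zh"
console.log(detectScript("Hello")); // null (Latin text needs the locale hint)
```

A table keeps the ranges in one place if more scripts are added later, at the cost of one extra loop compared with the chain of `if` statements.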