rebreak-monorepo

Author	SHA1	Message	Date
chahinebrini	b9c48dfd63	test: update stale comments in test fixes	2026-06-18 10:19:48 +02:00
chahinebrini	eb3fb129e9	test: update mail classifier score expectations	2026-06-18 10:12:37 +02:00
chahinebrini	38811820e6	feat(backend): Public-Domain-Guard + Mail-Detection (spins/%-pattern) Public-Domain-Guard (icloud.com/gmail.com etc. can never be blocked or published): - new utils/public-email-domains.ts (shared freemail list) - custom-domains/index.post + custom-domains/suggest + curated-domains/suggest reject public domains with 400 PUBLIC_DOMAIN (defense-in-depth) Mail-Detection (mo): add "spins" to GAMBLING_KEYWORDS + subject %-pattern (score 10) → catches "Spins + 400% Bonus" spam from freemail senders. 61/61 tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 01:06:06 +02:00
chahinebrini	c3de7055a5	feat(mail): Sucht-compound rule + Phase-1 training foundation Task B (linguistic FP fix): - mail-classifier.ts: the subject keyword loop skips the keyword score when the subject contains the keyword as a Sucht (addiction) compound (e.g. "glücksspiel" in "Glücksspielsucht" → no +50 score). Global linguistic invariant for German: gambling marketers never write "Glücksspielsucht-Bonus". - gambling-keywords.mjs: GAMBLING_WHITELIST extended with stem variants (wettsucht, spielsucht, suchtberatung, suchthilfe) as a fallback for compounds where the keyword ≠ the exact stem. - 4 new tests: Forum Glücksspielsucht → PASS, Hilfe bei Spielsucht → PASS, Wettsucht-Selbsthilfe → PASS, Glücksspiel-Bonus 100€ → BLOCK. Task C (Phase-1 data foundation): - mail-training-utils.ts: sanitizeSubjectForTraining() (PII stripping via regex: EMAIL/URL/NUM/Greeting/ALL-CAPS) + detectSubjectLanguage() via franc (iso639-3). 26 unit tests. - franc@6.2.0 installed (~50KB ESM). - mail.ts insertMailClassificationSample(): calls sanitizeSubjectForTraining() and writes detectedLang + subjectSanitized into the features JSON (interim until the schema migration). - mail-retention-cron.ts: subject nullification after 30 days (daily) + sample purge after 12 months (monthly). GDPR Art. 5(1)(e). 105 tests green (58 classifier + 26 training-utils + 11 display-name + 10 gmail). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 08:14:57 +02:00
chahinebrini	4573d16e1a	refactor(mail-classifier): remove display-name from the score path (v1.0) Removed SENDER_NAME_GAMBLING_KEYWORD (+30) and SENDER_NAME_BRAND_MATCH (+20) from SCORE_WEIGHTS. The Layer-2.5 brand match now checks only the domain root and the relay domain, no longer displayNameNorm. Removed the sender-name keywords block in computeScore(). keywordHitsName stays in the interface for v1.1. Tests: Brand+Random tests that had the display name as their only brand source were rewritten to the new v1.0 behaviour (PASS). Two new tests: display-name-only casino signal → Score=0 → PASS verified. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 05:18:00 +02:00
chahinebrini	00ec716694	fix(mail): skip Gmail system folders in scan + raise subject-keyword score to 50 Fix 1 (scan-internal): filter out \All, \Drafts, \Sent, \Trash, \Flagged via specialUse — stops [Gmail]/All Mail from consuming the SCAN_LIMIT=200 and blocking new INBOX mails from reaching fetch range. \Junk/\Spam stay in scope. Folders without specialUse (iCloud, GMX) pass through untouched — no false exclusions without confirmed metadata. Fix 2 (mail-classifier): raise SUBJECT_GAMBLING_KEYWORD from 35 to 50 so a single unambiguous casino/jackpot/freispiel subject hit alone reaches the SCORE_BLOCK_MIDRANGE threshold and triggers a block. Previously 35 pts fell short when sender domain was generic and display name empty. Tests: 9 new cases added (2 Fix-2 classifier + 4 Fix-1 folder-filter unit + 1 computeScore score=50 exact assertion). All 265 tests green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 05:12:14 +02:00
chahinebrini	f2e3c00943	refactor(mail): remove groq llm layer, deterministic pipeline only User directive: the mail filter stays on the deterministic Score+Layer-2.5 stack. Groq LLM borderline call (Layer 4) removed. Layer 2.5 Brand+Random still cleanly catches the Apple Hide-My-Email case (icloud.com addresses with cryptic local parts + brand display name) via hard block. The score mid-range 25-79 now decides deterministically: ≥50 → BLOCK, otherwise PASS. This also makes the GDPR P0 items from the Hans-Müller review obsolete (DPA annex for Groq, third-country (USA) consent toggle, privacy policy paragraph). - mail-classifier.ts: callGroqClassifier + redactLocalPartForLLM + groq field removed - scan.post.ts + scan-internal.post.ts: groqApiKey param removed, groq sample fields removed - mail-classifier.test.ts: Groq tests + redactLocalPart tests removed, 46 tests green DB columns in mail_classification_samples (groq_) remain as legacy nullable; cleanup migration optional in a later sprint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 22:15:32 +02:00
chahinebrini	bdd93668ae	feat(mail): multi-layer classifier: Brand+Random, Relay-Decoder, Score, Groq + ML-Sampling Layer 0–4 classification pipeline in mail-classifier.ts: - Layer 2: domain hard block + relay decoder (=domain.tld from SendGrid/Mailchimp bounces) - Layer 2.5: Brand+Random-token hard block (gambling brand normalization + random-token detection), avoids the LLM call for known gambling relayers (Gamblezen, BetandPlay etc.) - Layer 3: score 0–100 (TS weights: domain keywords, subject keywords, name match, money pattern, urgency, all-caps, short random domain, brand/random additions) - Layer 4: Groq Llama 3.3 70B borderline classification (score 25–75) with local-part redaction (GDPR: keep it only if the local part itself contains a keyword) - Layer 5: MailClassificationSample insert after every classification (ML phase 3) Migrations: - 20260514_add_mail_blocked_trigger_source: ADD COLUMN trigger_source on mail_blocked - 20260514_add_mail_classification_sample: CREATE TABLE mail_classification_samples 50 new tests (mail-classifier.test.ts): all layers, both screenshot examples (Gamblezen + BetandPlay) confirmed as Layer-2.5 hard block without an LLM call, whitelist, score, redaction. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 22:05:35 +02:00
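
The Sucht-compound rule from commit c3de7055a5 can be sketched roughly as follows. This is a minimal illustration, not the code in mail-classifier.ts: the keyword subset, the function names, and the choice to let a whitelist hit zero the whole subject score are assumptions. Only the whitelist stems, the +50 weight and the example subjects come from the commit messages.

```typescript
// Hypothetical subset; the real list lives in gambling-keywords.mjs.
const GAMBLING_KEYWORDS: string[] = ["glücksspiel", "casino", "jackpot"];
// Stem variants named in commit c3de7055a5.
const GAMBLING_WHITELIST: string[] = ["wettsucht", "spielsucht", "suchtberatung", "suchthilfe"];
// Subject keyword weight after commit 00ec716694.
const SUBJECT_GAMBLING_KEYWORD = 50;

// True if the keyword occurs somewhere other than directly in front of "sucht".
function hasNonCompoundHit(subject: string, keyword: string): boolean {
  // Drop every "<keyword>sucht" compound, then look for a remaining plain hit.
  const withoutCompounds = subject.split(keyword + "sucht").join(" ");
  return withoutCompounds.includes(keyword);
}

// Assumed behaviour: a whitelist hit zeroes the subject score, and each
// distinct keyword counts at most once.
function subjectKeywordScore(subject: string): number {
  const s = subject.toLowerCase();
  if (GAMBLING_WHITELIST.some((w) => s.includes(w))) return 0;
  let score = 0;
  for (const kw of GAMBLING_KEYWORDS) {
    if (hasNonCompoundHit(s, kw)) score += SUBJECT_GAMBLING_KEYWORD;
  }
  return score;
}
```

With these assumptions, subjectKeywordScore("Forum Glücksspielsucht") yields 0 while subjectKeywordScore("Glücksspiel-Bonus 100€") yields 50, matching the PASS/BLOCK expectations listed in the commit.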

8 Commits
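
The folder filter from commit 00ec716694 (Fix 1) could look roughly like the sketch below. The MailboxInfo shape and the function name are hypothetical; the commit only states that folders are filtered by specialUse, that \All, \Drafts, \Sent, \Trash and \Flagged are skipped, and that \Junk and folders without specialUse stay in scope.

```typescript
// Hypothetical mailbox shape; only the `specialUse` attribute is named in the commit.
interface MailboxInfo {
  path: string;
  specialUse?: string;
}

// Special-use flags excluded from the scan according to the commit message.
const EXCLUDED_SPECIAL_USE = new Set<string>([
  "\\All",
  "\\Drafts",
  "\\Sent",
  "\\Trash",
  "\\Flagged",
]);

// Keep folders that have no specialUse metadata or whose flag is not excluded.
function selectScannableFolders(folders: MailboxInfo[]): MailboxInfo[] {
  return folders.filter(
    (f) => f.specialUse === undefined || !EXCLUDED_SPECIAL_USE.has(f.specialUse),
  );
}

// Example input (made-up folder names for illustration).
const exampleFolders: MailboxInfo[] = [
  { path: "INBOX" },
  { path: "[Gmail]/All Mail", specialUse: "\\All" },
  { path: "[Gmail]/Spam", specialUse: "\\Junk" },
  { path: "[Gmail]/Sent Mail", specialUse: "\\Sent" },
  { path: "Archive" },
];
const scannablePaths = selectScannableFolders(exampleFolders).map((f) => f.path);
```

Treating a missing specialUse as "keep" mirrors the commit's stated intent of making no exclusions without confirmed metadata (iCloud, GMX).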