4 Commits

Author SHA1 Message Date
chahinebrini
4573d16e1a refactor(mail-classifier): display-name aus Score-Pfad entfernen (v1.0)
SENDER_NAME_GAMBLING_KEYWORD (+30) und SENDER_NAME_BRAND_MATCH (+20) aus
SCORE_WEIGHTS entfernt. Layer-2.5-Brand-Match prüft nur noch Domain-Root
und Relay-Domain, nicht mehr displayNameNorm. Sender-Name-Keywords-Block
in computeScore() entfernt. keywordHitsName bleibt im Interface für v1.1.

Tests: Brand+Random-Tests die Display-Name als einzige Brand-Source hatten
auf neues v1.0-Verhalten (PASS) umgeschrieben. Zwei neue Tests: Display-Name-
only Casino-Signal → Score=0 → PASS verifiziert.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 05:18:00 +02:00
chahinebrini
00ec716694 fix(mail): skip Gmail system folders in scan + raise subject-keyword score to 50
Fix 1 (scan-internal): filter out \All, \Drafts, \Sent, \Trash, \Flagged via
specialUse — stops [Gmail]/All Mail from consuming the SCAN_LIMIT=200 and
blocking new INBOX mails from reaching fetch range. \Junk/\Spam stay in scope.
Folders without specialUse (iCloud, GMX) pass through untouched — no false
exclusions without confirmed metadata.

Fix 2 (mail-classifier): raise SUBJECT_GAMBLING_KEYWORD from 35 to 50 so a
single unambiguous casino/jackpot/freispiel subject hit alone reaches the
SCORE_BLOCK_MIDRANGE threshold and triggers a block. Previously 35 pts fell
short when sender domain was generic and display name empty.

Tests: 9 new cases added (2 Fix-2 classifier + 4 Fix-1 folder-filter unit +
1 computeScore score=50 exact assertion). All 265 tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 05:12:14 +02:00
chahinebrini
f2e3c00943 refactor(mail): remove groq llm layer — deterministic pipeline only
User-Direktive: Mail-Filter bleibt auf dem deterministischen Score+Layer-2.5-Stack.
Groq-LLM Borderline-Call (Layer 4) entfernt. Layer 2.5 Brand+Random fängt den
Apple Hide-My-Email Fall (icloud.com-Adressen mit kryptischen Local-Parts +
Brand-DisplayName) weiterhin sauber via Hard-Block.

Score-Mid-Range 25-79 entscheidet jetzt deterministisch: ≥50 → BLOCK, sonst PASS.

Damit auch DSGVO-P0-Items aus dem Hans-Müller-Review obsolet
(AVV-Annex Groq, Drittland-USA-Consent-Toggle, Datenschutzerklärung-Absatz).

- mail-classifier.ts: callGroqClassifier + redactLocalPartForLLM + groq-Feld raus
- scan.post.ts + scan-internal.post.ts: groqApiKey-Param raus, groq*-Sample-Felder raus
- mail-classifier.test.ts: Groq-Tests + redactLocalPart-Tests entfernt, 46 Tests grün

DB-Spalten in mail_classification_samples (groq_*) bleiben als legacy nullable —
Cleanup-Migration optional in späterem Sprint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 22:15:32 +02:00
chahinebrini
bdd93668ae feat(mail): multi-layer classifier — Brand+Random, Relay-Decoder, Score, Groq + ML-Sampling
Layer 0–4 Klassifikations-Pipeline in mail-classifier.ts:
- Layer 2: Domain-Hard-Block + Relay-Decoder (=domain.tld aus SendGrid/Mailchimp-Bounces)
- Layer 2.5: Brand+Random-Token-Hard-Block (Gambling-Brand-Normalisierung + Random-Token-Detection)
  verhindert LLM-Call für bekannte Gambling-Relayer (Gamblezen, BetandPlay etc.)
- Layer 3: Score 0–100 (TS-Gewichte: Domain-Keywords, Subject-Keywords, Name-Match,
  Geld-Pattern, Urgency, All-Caps, Short-Random-Domain, Brand/Random-Ergänzungen)
- Layer 4: Groq Llama 3.3 70B Borderline-Klassifikation (Score 25–75)
  mit Local-Part-Redaction (DSGVO: nur behalten wenn local-part selbst Keyword enthält)
- Layer 5: MailClassificationSample-Insert nach jeder Klassifikation (ML-Phase 3)

Migrations:
- 20260514_add_mail_blocked_trigger_source: ADD COLUMN trigger_source auf mail_blocked
- 20260514_add_mail_classification_sample: CREATE TABLE mail_classification_samples

50 neue Tests (mail-classifier.test.ts): alle Layer, beide Screenshot-Beispiele (Gamblezen +
BetandPlay) bestätigt als Layer-2.5-Hard-Block ohne LLM-Call, Whitelist, Score, Redaction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 22:05:35 +02:00