feat: v4 refactor — driver architecture, multi-language, caching#45

Merged
deemonic merged 22 commits into main from feature/v4-refactor on Feb 18, 2026

Conversation

@deemonic
Collaborator

@deemonic deemonic commented Feb 13, 2026

Summary

Ground-up rewrite for v4 with a clean, extensible architecture:

  • Driver-based detection: regex (obfuscation-aware), pattern (fast exact match), phonetic (sound-alike via metaphone + Levenshtein), and pipeline (chains multiple drivers)
  • Multi-language support: English, Spanish, French, German with language-specific normalizers and severity maps
  • Severity scoring: words categorised as mild/moderate/high/extreme with 0-100 scoring. Severity maps added for all non-English languages so withSeverity() filtering works correctly
  • Result caching: check() results cached by content hash (text + driver + language + severity + allow/block + mask). Bypassed for CallbackMask. Configurable via cache.results
  • Masking strategies: character mask, grawlix, or custom callback
  • Eloquent integration: Blaspable trait for auto-sanitize/reject on model save
  • Middleware, validation rules, Blade directive, Str macros
  • Fluent API: Blasp::in('spanish')->mask('#')->withSeverity(Severity::High)->check($text)
  • Testing utilities: Blasp::fake() with assertions
  • Full backward compatibility: all v3 methods work as deprecated aliases
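
A minimal sketch of the result-caching idea from the list above, for readers who want to see how a content-hash key behaves. The `resultCacheKey()` helper, the `blasp_result_` prefix, and the choice to sort the allow/block lists are assumptions for illustration only; the actual key derivation in PendingCheck may differ.

```php
<?php

/**
 * Hypothetical helper: derive a cache key from every input that can change a
 * check() result. The function name and key prefix are illustrative only.
 */
function resultCacheKey(
    string $text,
    string $driver,
    string $language,
    ?string $severity,
    array $allow,
    array $block,
    string $mask
): string {
    // Assumption: list order should not matter, so sort before hashing.
    sort($allow);
    sort($block);

    // serialize() always returns a string, even for non-UTF-8 input.
    $payload = serialize([$text, $driver, $language, $severity, $allow, $block, $mask]);

    return 'blasp_result_' . hash('sha256', $payload);
}

$a = resultCacheKey('some text', 'regex', 'english', 'high', ['x', 'y'], [], '*');
$b = resultCacheKey('some text', 'regex', 'english', 'high', ['y', 'x'], [], '*');
$c = resultCacheKey('some text', 'regex', 'spanish', 'high', ['x', 'y'], [], '*');

var_dump($a === $b); // same inputs, allow list in a different order
var_dump($a === $c); // different language
```

Because every parameter feeds the hash, changing any one of them (here the language) yields a different key, which is the "key variation" behaviour the test plan mentions.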

Test plan

  • tests/SeverityMapTest.php — severity filtering works for Spanish, French, German (19 tests)
  • tests/ResultCachingTest.php — caching hits, key variation, CallbackMask bypass, cache clearing, config toggle (12 tests)
  • Full suite passes: 282 tests, 929 assertions, zero failures

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Fluent checking API with driver/pipeline selection, masking options, language/severity controls, and richer Result (original, clean, score, count, words).
    • New detection drivers (regex, pattern, phonetic, pipeline), severity maps per language, Eloquent trait, middleware, Blade @clean, Str/Stringable helpers, and artisan commands (blasp:clear, blasp:languages, blasp:test).
  • Refactor
    • Major README rewrite, reorganized configuration and language files, new service provider/facade/manager architecture.
  • Tests
    • Expanded and updated test suites covering drivers, phonetic matching, pipelines, caching, model integration, and directives.

deemonic and others added 13 commits February 12, 2026 19:37
…ction

Adds a `Blaspable` trait that hooks into the Eloquent `saving` event to
automatically check and sanitize (or reject) profanity on specified model
attributes. Supports per-model language, mask, and mode overrides.

- Blaspable trait with sanitize/reject modes and helper methods
- ProfanityRejectedException for reject mode
- ModelProfanityDetected event fired on detection
- `model.mode` config key in blasp.php
- 21 tests covering all trait functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all v3-era source files that have been replaced by the new v4
architecture: Abstracts, Config, Contracts, Facades, Generators,
Normalizers, Registries, and the monolithic BlaspService/ProfanityDetector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New modular core with Analyzer, Dictionary, Result, and driver-based
detection (RegexDriver, PatternDriver). Includes normalizers per language,
configurable masking strategies, severity levels, and false positive filtering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BlaspManager with fluent PendingCheck API, Facade, ServiceProvider,
middleware, validation rule, artisan commands (clear, test, languages),
events (ProfanityDetected, ContentBlocked), and BlaspFake for testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update composer.json laravel extra to point to new BlaspServiceProvider
and Facade namespaces. Add severity tiers to English language config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migrate all tests to use the new v4 Facade, PendingCheck fluent API,
and Result methods. Simplify TestCase base class to use BlaspServiceProvider.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Full rewrite covering the new driver architecture, fluent API, Result
object, Blaspable trait, middleware, validation rules, testing utilities,
events, artisan commands, configuration reference, and v3 migration guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register 'blasp' as a short middleware alias, add @clean Blade directive
for XSS-safe profanity masking in views, and register isProfane/cleanProfanity
macros on Str and Stringable for fluent usage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move all classes from Blaspsoft\Blasp\Laravel\* to Blaspsoft\Blasp\* and
update imports across src and tests to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Catches sound-alike profanity evasions (e.g. "phuck", "fuk", "sheit")
that bypass the regex and pattern drivers. Uses PHP's metaphone() for
indexing and levenshtein() for confirmation, with a curated false-positive
list to protect common words like "fork", "duck", and "beach".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows combining regex, pattern, and phonetic drivers so a single
check() call catches obfuscated text, exact matches, and sound-alikes
in one pass. Supports config-based (`driver('pipeline')`) and ad-hoc
(`pipeline('regex', 'phonetic')`) usage with union merge semantics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add severity maps (mild/moderate/extreme) for Spanish, French, and German
so withSeverity() filtering works correctly for all languages instead of
defaulting everything to High.

Implement result caching in PendingCheck — check() results are cached by
a hash of all parameters (text, driver, language, severity, allow/block
lists, mask strategy). CallbackMask bypasses cache since closures can't
serialize. Add Result::fromArray() for deserialization, extend
Dictionary::clearCache() to also clear result cache, and add
cache.results config toggle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Feb 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
Walkthrough

Replaces the v3 monolithic profanity service with a v4 modular system: adds BlaspManager, PendingCheck, Dictionary, DriverInterface and drivers (regex, pattern, phonetic, pipeline), Result/Severity models, mask strategies, normalizers, Laravel integrations (provider, middleware, trait, validation), and removes legacy loaders, registries, and the old service classes.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Docs & Config | `README.md`, `composer.json`, `config/blasp.php`, `config/config.php`, `config/languages/*` | README rewritten for v4; new consolidated `config/blasp.php` added and legacy `config/config.php` removed; language files gain severity maps and expanded lists; composer provider class name updated. |
| Manager & Fluent API | `src/BlaspManager.php`, `src/PendingCheck.php`, `src/Facades/Blasp.php` | Introduces BlaspManager, fluent PendingCheck builder, facade refactor (new static API, fakes/testing helpers) and facade method surface changes. |
| Core & Dictionary | `src/Core/Dictionary.php`, `src/Core/Analyzer.php`, `src/Core/Result.php`, `src/Core/Score.php` | Adds Dictionary for language/config aggregation, Analyzer as thin orchestration layer, Result/Score models and new result API (with v3 compatibility wrappers). |
| Drivers | `src/Drivers/... (RegexDriver.php, PatternDriver.php, PhoneticDriver.php, PipelineDriver.php)` | Adds DriverInterface and four drivers implementing detect(...) with new matching, masking, deduplication and scoring flows. |
| Matching & Filters | `src/Core/Matchers/*` (FalsePositiveFilter, CompoundWordDetector, PhoneticMatcher, RegexMatcher) | New matcher/filter classes centralize false-positive logic, compound-word checks, phonetic indexing and regex expression generation. |
| Masking & Models | `src/Core/Masking/*`, `src/Core/MatchedWord.php`, `src/Enums/Severity.php` | Adds mask strategies (Character, Grawlix, Callback), MatchedWord VO, and Severity enum with weights and comparison helpers. |
| Normalizers | `src/Core/Normalizers/*`, `src/Core/Normalizers/StringNormalizer.php` | Introduces StringNormalizer interface and language normalizers (English, French, German, Spanish, Null); legacy normalizer abstractions removed. |
| Laravel Integration | `src/BlaspServiceProvider.php`, `src/Blaspable.php`, `src/Middleware/CheckProfanity.php`, `src/Rules/Profanity.php` | New service provider, Blaspable trait for Eloquent, middleware, validation rule, Blade/Str macros and event wiring. |
| Events, Exceptions & Testing | `src/Events/*`, `src/Exceptions/*`, `src/Testing/BlaspFake.php` | Adds ProfanityDetected/ModelProfanityDetected/ContentBlocked events, ProfanityRejectedException, and BlaspFake test helper. |
| Console & Commands | `src/Console/*` | Adds artisan commands: blasp:clear, blasp:languages, and blasp:test (replaces old command). |
| Removed legacy subsystems | `src/BlaspService.php`, `src/ProfanityDetector.php`, `src/ServiceProvider.php`, `src/Config/*`, `src/Contracts/*`, `src/Registries/*`, `src/Abstracts/*`, `src/Normalizers/*` | Removes old BlaspService, ConfigurationLoader/DetectionConfig/MultiLanguageDetectionConfig, registries, contracts and legacy normalizers; public API surface substantially changed. |
| Tests | `tests/*` | Extensive test updates and additions to exercise new API (facade/manager), drivers, pipeline, phonetic matcher, caching, Blaspable trait, severity maps, Blade directive and string macros. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Manager as BlaspManager
    participant Pending as PendingCheck
    participant Dict as Dictionary
    participant Driver as Driver (Regex/Pattern/Phonetic/Pipeline)
    participant Analyzer as Analyzer
    participant Result as Result

    Client->>Manager: in('english') / driver('regex') / pipeline(...)
    Manager->>Pending: newPendingCheck()
    Client->>Pending: mask('*')->withSeverity(High)->check(text)
    Pending->>Dict: forLanguage('english') / forAllLanguages()
    Dict-->>Pending: Dictionary
    Pending->>Manager: resolveDriver('regex')
    Manager-->>Driver: create/return DriverInterface
    Pending->>Analyzer: analyze(text, driver, dictionary, mask, options)
    Analyzer->>Driver: detect(text, dictionary, mask, options)
    Driver->>Driver: tokenize → match → filter false-positives → build MatchedWord[]
    Driver->>Driver: apply MaskStrategy → produce clean text
    Driver->>Result: build Result(original, clean, matches, score)
    Result-->>Analyzer: Result
    Analyzer-->>Pending: Result
    Pending-->>Client: Result (isOffensive(), clean(), score(), words())

sequenceDiagram
    participant Model as EloquentModel
    participant Trait as Blaspable
    participant Pending as PendingCheck
    participant Result as Result
    participant Event as EventDispatcher

    Model->>Trait: saving event triggers
    Trait->>Pending: app('blasp')->in(language)->mask('#')->check(value)
    Pending-->>Result: Result
    alt Result is offensive
        Trait->>Event: dispatch(ModelProfanityDetected(model, attribute, result))
        alt model mode == 'reject'
            Trait->>Model: throw ProfanityRejectedException
        else
            Trait->>Model: replace attribute with Result.clean()
        end
    end
    Model->>DB: persist (if not rejected)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I hopped from monolith to modular plains,
Drivers stitched pipelines, masks in my chains,
Dictionary hummed words by tongue and tongue's art,
Events sprung, tests leapt — a cleaner start,
A little rabbit applauds v4's smart.

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.27% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main architectural change: a v4 refactor introducing driver-based architecture, multi-language support, and result caching functionality. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/v4-refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
config/languages/french.php (1)

39-1533: ⚠️ Potential issue | 🔴 Critical

Critical: French profanities list contains hundreds of common, non-profane words.

Starting around line ~526, the list drifts into completely legitimate vocabulary: farming terms (pâturage, fourrage), biology (cellule, chromosome, génome), technology (ordinateur, smartphone, internet, facebook, google), daily life (famille, diamant, information, maison), etc. This appears to be an accidental thesaurus/vocabulary dump that was never pruned.

This will cause massive false positives for virtually any French-language text. The list should be truncated to actual profanities and vulgar expressions — roughly lines 39–525 appear legitimate; everything after needs careful review and likely removal.

#!/bin/bash
# Count how many French "profanities" overlap with the false_positives list in the same file
python3 << 'EOF'
import re

with open("config/languages/french.php", "r") as f:
    content = f.read()

prof_match = re.search(r"'profanities'\s*=>\s*\[(.*?)\],\s*\n\s*'false_positives'", content, re.DOTALL)
fp_match = re.search(r"'false_positives'\s*=>\s*\[(.*?)\],\s*\n\s*'substitutions'", content, re.DOTALL)

if not prof_match or not fp_match:
    print("Could not parse sections")
    exit(1)

def extract_words(text):
    return set(re.findall(r"'([^']+)'", text))

profanities = extract_words(prof_match.group(1))
false_positives = extract_words(fp_match.group(1))

overlap = profanities & false_positives
print(f"Total profanities: {len(profanities)}")
print(f"Total false_positives: {len(false_positives)}")
print(f"Overlap (in both lists): {len(overlap)}")
if overlap:
    print("Examples of overlap:")
    for w in sorted(overlap)[:30]:
        print(f"  - {w}")
EOF
config/languages/spanish.php (1)

29-420: ⚠️ Potential issue | 🟡 Minor

Profanities list includes many common, innocuous words that will cause false positives.

Words like 'triste' (sad), 'malo'/'mala' (bad), 'adios'/'adiós', 'bye', 'goodbye', 'hasta luego', 'hasta pronto', 'despedida' (farewell), 'partida' (departure/game), 'amargo' (bitter — a taste), and 'soso'/'sosa' (bland) are standard vocabulary, not profanities. Including them will flag ordinary text as offensive and erode user trust.

Consider pruning clearly non-profane entries or moving them to a separate "sensitive context" tier if needed.

config/languages/german.php (1)

34-1048: ⚠️ Potential issue | 🟠 Major

Profanities list contains extremely common German words that will flag virtually all German text as offensive.

Words like 'zu' (to/too — one of the most frequent German words), 'weg' (path/away), 'warm' (warm), 'breit' (wide), 'voll' (full), 'dicht' (dense/tight), 'matt' (matte), 'platt' (flat), 'oral' (oral), 'stramm' (tight), 'schritt' (step), and 'roh' (raw) are basic everyday vocabulary. Including them will produce an unacceptable false-positive rate, rendering the German detection unusable for real content.

Additionally, there are duplicate entries: 'dicht'/'dichte'/'dichter'/'dichtes' (Lines 998–1001 and 1040–1043), 'verbrannt' group (Lines 921–924 and 949–952), 'verlogen' group (Lines 554–557 and 574–577), and 'hinüber'/'hinueber' (Lines 753–754 and 1006–1007). Duplicates cause redundant matches in the PatternDriver.

🤖 Fix all issues with AI agents
In `@config/languages/english.php`:
- Around line 4-35: The severity arrays under the 'severity' key contain ~67
words that are not present in the main profanities list, so update the
profanities array to include every token referenced in severity (e.g. add
'shit', 'pussy', 'slut', 'whore', 'nigga', 'niggers', 'bloody', 'piss', 'twat',
'wanker', etc.), and ensure every entry in the profanities array has a
corresponding severity classification; modify the profanities definition and
reconcile it with the 'severity' arrays so no severity-classified word is
missing and consider auditing remaining profanities (the ~749 unclassified) to
assign appropriate severity levels or remove false positives.

In `@config/languages/french.php`:
- Around line 4-37: The French severity map under the 'severity' array is
missing the 'high' category (it currently has 'mild', 'moderate', 'extreme'),
which breaks parity with the English config; add a new 'high' key in the
'severity' array (between 'moderate' and 'extreme') and populate it with the
equivalent French terms used in the English config's high severity list so the
code paths that call withSeverity('high') or index by 'high' work consistently;
update the array in the same format as 'mild'/'moderate'/'extreme' (string
entries, singular/plural and accent variants) to match localization behavior.

In `@config/languages/spanish.php`:
- Around line 4-27: The Spanish language config's 'severity' array (keys 'mild',
'moderate', 'extreme') is missing the 'high' tier required by the Severity enum
(Mild, Moderate, High, Extreme); update the 'severity' array in
config/languages/spanish.php to include a 'high' key with appropriate
high-severity words (matching the pattern used in English) so Severity::High
maps consistently across locales, or explicitly document the intentional
omission if you choose not to add it.

In `@src/Console/TestCommand.php`:
- Line 9: The command signature currently re-declares the built-in
Artisan/Symfony `--verbose` flag via the protected `$signature` (in
`TestCommand`) which conflicts with the base `Command`; rename the custom flag
(e.g., `--detailed` or `--show-matches`) in the `$signature` string and update
all usages that call `$this->option('verbose')` (such as in the `handle` method
of `TestCommand`) to use the new option name so you no longer shadow the
framework verbosity option.

In `@src/Core/Dictionary.php`:
- Around line 237-264: The loadLanguageConfig method currently interpolates
$language into $possiblePaths and then require()s the chosen file, so first
validate/sanitize $language before building paths: enforce a strict whitelist or
regex (e.g. only letters, digits, hyphen and underscore, no dots or slashes) or
normalize with basename to prevent path traversal, and if validation fails
return the default ['profanities'=>[], 'false_positives'=>[]]; apply this check
at the top of loadLanguageConfig (before building $possiblePaths and before
require) and ensure $languageFile is only chosen from the sanitized/validated
input.

In `@src/Drivers/PatternDriver.php`:
- Around line 34-56: preg_match_all with PREG_OFFSET_CAPTURE gives byte offsets
but the code uses mb_substr (character offsets) and performs left-to-right
replacements causing multibyte corruption and wrong positions; fix by converting
each match's byte offset ($match[1] stored to $start) to a character offset
before using mb_substr or creating MatchedWord (e.g., compute $charStart =
mb_strlen(substr($lowerText, 0, $start), 'UTF-8')), collect matches into
$matchedWords without mutating $cleanText, then perform replacements
right-to-left (sort matches by $charStart descending) and apply mask->mask on
originalMatch when splicing $cleanText so masking and MatchedWord::position
(used in MatchedWord and returned by functions like PipelineDriver) use correct
character offsets and avoid position drift.

In `@src/Drivers/PhoneticDriver.php`:
- Around line 105-117: The severity filter is being applied to $matchedWords
after masking, causing cleanText to still contain masks for words that were
filtered out; in PhoneticDriver (the method that builds $matchedWords and
$cleanText) either move the masking step so it runs after applying the severity
filter to $matchedWords, or reconstruct cleanText from the surviving MatchedWord
instances (using their offsets/lengths) before returning the Result; ensure
Score::calculate still receives the filtered $matchedWords and the returned
Result($text, $cleanText, $matchedWords, $scoreValue) reflects the post-filtered
state.

In `@src/Drivers/PipelineDriver.php`:
- Around line 63-69: The masking loop in PipelineDriver (variables $cleanText,
$reversed, and $match->position) assumes $match->position is a character offset
but sub-drivers may supply byte offsets; before using mb_substr with
$match->position convert the offset to a character index (e.g., compute
character count of the prefix up to the byte offset using a binary-safe
substring and mb_strlen with UTF-8) or, alternatively, ensure the producing
driver (e.g., PatternDriver) returns character offsets instead; update the loop
to use the converted character offset when slicing and applying $mask->mask so
multibyte strings are handled correctly.

In `@src/Drivers/RegexDriver.php`:
- Around line 117-129: The severity filter is applied to $matchedWords after
$workingCleanString has already been masked, causing clean() to still show
masked low-severity words while isOffensive()/count() exclude them; fix by
applying the severity filter (check $minimumSeverity instanceof Severity and
filter $matchedWords accordingly) before performing the masking that produces
$workingCleanString (or alternatively construct a separate masked string from
the filtered $matchedWords), then compute $totalWords, call Score::calculate
with the filtered $matchedWords, and return the Result so clean(), isOffensive()
and count() are consistent.
- Around line 58-95: preg_match_all returns byte/char offsets relative to
$normalizedString but the code uses $start directly on $workingCleanString and
$originalNormalized, causing misaligned masks after normalizers like
GermanNormalizer/SpanishNormalizer change lengths; fix by mapping
normalized-string match offsets back to original-string character offsets before
any substring/mask/replacement: build a position map between $normalizedString
and the original input (or compute $startOrig = mapNormalizedToOriginal($start)
and $lengthOrig = mb_strlen($matchedText, 'UTF-8') measured in original
coordinates), then use those original offsets for getFullWordContext,
$mask->mask application, and mb_substr on $workingCleanString and
$normalizedString (or alternatively perform matching on the original string and
validate with normalized checks via compoundDetector->isPureAlphaSubstring and
filter methods) so $start/$length always refer to the same coordinate system
across RegexDriver methods and updates.

In `@src/Facades/Blasp.php`:
- Around line 64-78: The current assertChecked() and assertCheckedTimes(int
$times) silently no-op when the facade root isn't a BlaspFake; update both
methods so that after obtaining $instance = static::getFacadeRoot() they check
if $instance is a BlaspFake and call the fake's methods as before, but if not,
throw a LogicException (or appropriate runtime exception) with a clear message
instructing the developer to call Blasp::fake() in their test; reference these
methods (assertChecked, assertCheckedTimes) and the BlaspFake type so the thrown
error is specific and actionable.

In `@src/Rules/Profanity.php`:
- Around line 15-34: The static factory methods in Profanity (in, maxScore,
severity) are non-composable because each creates a new self; change the design
so that in(string $language) remains a static factory that returns a new
Profanity instance, and convert maxScore(int $score) and severity(Severity
$severity) into instance fluent methods that set $this->maxScore and
$this->minimumSeverity respectively and return $this, allowing calls like
Profanity::in('es')->maxScore(50)->severity(Severity::High); if you still want
standalone static entry points, add separate static constructors (e.g., static
maxScoreFactory(int $score)) rather than making the mutators static.
🟡 Minor comments (18)
config/blasp.php-158-185 (1)

158-185: ⚠️ Potential issue | 🟡 Minor

Duplicate entries in substitution arrays.

A few substitution arrays contain duplicate characters:

  • Line 159 (/a/): 'Â' appears twice.
  • Line 173 (/o/): 'ø' appears twice.

These won't cause functional issues but add unnecessary noise. Consider deduplicating them.

src/Testing/BlaspFake.php-6-6 (1)

6-6: ⚠️ Potential issue | 🟡 Minor

Unused import: PendingCheck.

PendingCheck is imported but never referenced in this class.

Proposed fix
 use Blaspsoft\Blasp\Core\Result;
-use Blaspsoft\Blasp\PendingCheck;
 use PHPUnit\Framework\Assert;
README.md-30-33 (1)

30-33: ⚠️ Potential issue | 🟡 Minor

Clarify minimum Laravel version vs tested version.

The README states "Laravel 8.0+" but require-dev in composer.json pins orchestra/testbench: ^10.0 (Laravel 11+). While illuminate/support: ^8.0|... technically allows Laravel 8, the test suite doesn't verify compatibility with Laravel 8–10. Consider noting that Laravel 8–10 support is best-effort/untested, or aligning the claim with what's actually tested.

README.md-226-230 (1)

226-230: ⚠️ Potential issue | 🟡 Minor

README.md example has incorrect property types for nullable properties.

Lines 228–229 declare $blaspLanguage and $blaspMask as non-nullable string, but the Blaspable trait defines them as string|null (see trait docblock lines 16–17) and the implementation uses null coalescing operators (lines 47, 51) to check for null values. The comments "// null = config default" further indicate these should be nullable. Users following this example and setting values to null would encounter a TypeError.

The example should use ?string to match the trait's actual type hints:

-    protected string $blaspLanguage = 'spanish'; // null = config default
-    protected string $blaspMask = '#';           // null = config default
+    protected ?string $blaspLanguage = 'spanish'; // null = config default
+    protected ?string $blaspMask = '#';           // null = config default
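
To illustrate why the nullable type matters, here is a small sketch using two hypothetical standalone classes (they do not use the real Blaspable trait). Assigning null to a property typed `string` throws a TypeError, while `?string` accepts it.

```php
<?php

// Hypothetical classes for illustration only; not part of the package.
class StrictExample
{
    public string $blaspLanguage = 'spanish';
}

class NullableExample
{
    public ?string $blaspLanguage = 'spanish';
}

$nullable = new NullableExample();
$nullable->blaspLanguage = null; // allowed: null can signal "use the config default"

$strict = new StrictExample();
$threw = false;

try {
    $strict->blaspLanguage = null; // rejected: the property is a non-nullable string
} catch (TypeError $e) {
    $threw = true;
}

var_dump($threw);
```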
src/Drivers/PatternDriver.php-25-25 (1)

25-25: ⚠️ Potential issue | 🟡 Minor

Use mb_strtolower for false positives to handle multibyte characters correctly.

strtolower won't correctly lowercase multibyte characters (e.g., 'Ñ' → 'ñ'). Since the profanity list uses mb_strtolower (Line 31), the comparison on Line 41 may fail for multibyte false positives.

-        $falsePositives = array_map('strtolower', $dictionary->getFalsePositives());
+        $falsePositives = array_map(fn($fp) => mb_strtolower($fp, 'UTF-8'), $dictionary->getFalsePositives());
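
A short sketch of the multibyte-aware comparison the fix enables. The word list here is made up for the example and requires the mbstring extension; only `mb_strtolower` with an explicit `'UTF-8'` encoding comes from the proposed change.

```php
<?php

// Illustrative false-positive list containing a multibyte uppercase character.
$falsePositives = ['AÑO', 'Scunthorpe'];

// Multibyte-aware lowercasing, as in the proposed fix.
$lowered = array_map(
    fn (string $fp): string => mb_strtolower($fp, 'UTF-8'),
    $falsePositives
);

// A candidate lowercased the same way now compares equal to the list entry.
$candidate = mb_strtolower('Año', 'UTF-8');
$isFalsePositive = in_array($candidate, $lowered, true);

var_dump($isFalsePositive);
```

Lowercasing both sides with the same multibyte-aware function keeps the comparison consistent regardless of how the entries are capitalised in the language file.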
src/Core/Matchers/CompoundWordDetector.php-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor

SUFFIXES is English-only — misses language-specific morphology.

The hardcoded suffix list (s, es, ed, ing, etc.) is English-centric. In a multi-language system, German (e.g., -en, -ung, -lich), Spanish (e.g., -ado, -ción), and French (e.g., -ment, -tion) suffixes are not covered. This means legitimate profanity+suffix forms in non-English languages won't be recognized, causing false negatives.

Consider making the suffix list configurable per language or per-dictionary.

tests/MultiLanguageProfanityTest.php-43-49 (1)

43-49: ⚠️ Potential issue | 🟡 Minor

Duplicate array key 'scheisse' on lines 44–45.

Both entries have identical key and value, so the second silently overwrites the first and the array effectively has only 4 entries instead of 5. Remove the duplicate or replace with a different profanity (e.g., 'Scheiße' to test ß normalization).

Proposed fix
         $testCases = [
-            'scheisse' => 'Das ist scheisse',
             'scheisse' => 'Das ist scheisse',
             'arsch' => 'Du bist ein arsch',
             'ficken' => 'Ich will ficken',
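
The overwrite behaviour can be seen in isolation with a small sketch; the keys and values below are placeholders, not the test's real data. PHP raises no warning for a repeated literal key, so the only symptom is a smaller array.

```php
<?php

$testCases = [
    'alpha' => 'first sentence',
    'alpha' => 'first sentence', // same key: silently replaces the entry above
    'beta'  => 'second sentence',
];

// Three literal entries were written, but only two distinct keys survive.
$entries = count($testCases);

var_dump($entries);
```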
src/Middleware/CheckProfanity.php-18-21 (1)

18-21: ⚠️ Potential issue | 🟡 Minor

Potential null severity when config value is invalid.

In the else branch (no $severity parameter), Severity::tryFrom(config('blasp.middleware.severity', 'mild')) can return null if the configured string doesn't match a valid Severity case. Line 36 guards against null, but then no severity filter is applied — which silently degrades to no filtering rather than failing with a clear error. Consider adding a fallback like the ?? Severity::Mild used in the if-branch:

Proposed fix
-        $minimumSeverity = $severity ? (Severity::tryFrom($severity) ?? Severity::Mild) : Severity::tryFrom(config('blasp.middleware.severity', 'mild'));
+        $minimumSeverity = $severity
+            ? (Severity::tryFrom($severity) ?? Severity::Mild)
+            : (Severity::tryFrom(config('blasp.middleware.severity', 'mild')) ?? Severity::Mild);
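
A self-contained sketch of the fallback pattern, assuming PHP 8.1+ and a stand-in `Severity` enum whose string values are guessed from the `'mild'` default in the config call; the package's real enum may define its cases differently. `resolveSeverity()` is a hypothetical helper, not a method from the middleware.

```php
<?php

// Stand-in for the package's Severity enum; case values are assumptions.
enum Severity: string
{
    case Mild = 'mild';
    case Moderate = 'moderate';
    case High = 'high';
    case Extreme = 'extreme';
}

// Hypothetical helper showing the null-safe resolution.
function resolveSeverity(?string $configured): Severity
{
    // tryFrom() returns null for unknown values, so fall back explicitly.
    return Severity::tryFrom($configured ?? 'mild') ?? Severity::Mild;
}

var_dump(resolveSeverity('high'));
var_dump(resolveSeverity('bogus'));
```

With the fallback in place, a misspelled config value degrades to the mildest threshold instead of silently disabling severity filtering.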
src/Middleware/CheckProfanity.php-63-72 (1)

63-72: ⚠️ Potential issue | 🟡 Minor

extractTextFields skips nested input arrays, which could leave nested form data unchecked.

When request data contains nested arrays (e.g., address[street], user[name]), the current implementation silently ignores them since it only processes string values. If comprehensive profanity checking across all user-submitted text is required, consider flattening nested arrays or recursively traversing them in extractTextFields.

Note: No tests currently exercise nested input with this middleware. Verify whether your application uses nested form data with the blasp middleware before implementing this.
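
One possible shape for a recursive traversal, as a sketch only. `flattenTextFields()` and the dot-notation keys are assumptions introduced here; the middleware's actual `extractTextFields` signature is not shown in this review.

```php
<?php

/**
 * Hypothetical helper: collect every string value from nested input,
 * keyed by a dot-notation path such as "address.street".
 */
function flattenTextFields(array $input, string $prefix = ''): array
{
    $fields = [];

    foreach ($input as $key => $value) {
        $path = $prefix === '' ? (string) $key : $prefix . '.' . $key;

        if (is_array($value)) {
            // Recurse into nested arrays and merge their string fields.
            $fields += flattenTextFields($value, $path);
        } elseif (is_string($value)) {
            $fields[$path] = $value;
        }
        // Non-string scalars (numbers, booleans, null) are skipped.
    }

    return $fields;
}

$fields = flattenTextFields([
    'title' => 'Hello',
    'address' => ['street' => 'Main St', 'geo' => ['lat' => 51.5]],
    'tags' => ['one', 'two'],
]);

print_r($fields);
```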

src/Drivers/PhoneticDriver.php-62-93 (1)

62-93: ⚠️ Potential issue | 🟡 Minor

Byte-offset vs. character-offset mismatch in masking and offset drift from normalization.

Two related concerns with the offset handling:

  1. PREG_OFFSET_CAPTURE (line 66) returns byte offsets, but mb_substr on line 93 expects character offsets. For ASCII this is identical, but any multi-byte character (emoji, accented char) before a match will cause misaligned masking.

  2. Tokenization runs on $normalized text, but masking is applied to $cleanText (initialized from the original $text). If normalization changes character positions (e.g., accent stripping reduces multi-byte chars to single-byte), offsets won't correspond.

Since the phonetic driver is currently English-only and the English normalizer likely preserves ASCII positions, this is low-risk today but will break if supportedLanguages is expanded.
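
The byte-versus-character mismatch can be reproduced outside the driver with a short sketch. The sample text and the `*` mask are illustrative, and the mbstring extension is assumed; the conversion uses the same `mb_strlen(substr(...))` approach suggested for PatternDriver elsewhere in this review.

```php
<?php

$text = 'café darn café';

// PREG_OFFSET_CAPTURE reports offsets in bytes, even with the /u modifier.
preg_match('/darn/u', $text, $match, PREG_OFFSET_CAPTURE);
[$word, $byteOffset] = $match[0];

// Convert to a character offset by measuring the prefix up to the byte offset.
$charOffset = mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');

// Mask using character-based functions and the converted offset.
$length = mb_strlen($word, 'UTF-8');
$masked = mb_substr($text, 0, $charOffset, 'UTF-8')
    . str_repeat('*', $length)
    . mb_substr($text, $charOffset + $length, null, 'UTF-8');

var_dump($byteOffset, $charOffset, $masked);
```

Here the two-byte "é" pushes the byte offset one position past the character offset, so passing the raw byte offset to `mb_substr` would mask the wrong span.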

src/BlaspManager.php-76-87 (1)

76-87: ⚠️ Potential issue | 🟡 Minor

Potential infinite recursion if pipeline config includes 'pipeline' as a sub-driver.

createPipelineDriver() calls resolveDriver() for each configured sub-driver name. If blasp.drivers.pipeline.drivers contains 'pipeline', this recurses infinitely. A guard clause would prevent this misconfiguration from causing a stack overflow.

🛡️ Proposed guard
     public function createPipelineDriver(): DriverInterface
     {
         $config = $this->app['config']->get('blasp.drivers.pipeline', []);
         $driverNames = $config['drivers'] ?? ['regex', 'phonetic'];
+        $driverNames = array_filter($driverNames, fn (string $name) => $name !== 'pipeline');

         $resolvedDrivers = array_map(
             fn (string $name) => $this->resolveDriver($name),
             $driverNames,
         );

         return new PipelineDriver($resolvedDrivers);
     }
tests/BlaspCheckTest.php-163-163 (1)

163-163: ⚠️ Potential issue | 🟡 Minor

Typo in method name: test_word_boudary → test_word_boundary.

Proposed fix
-    public function test_word_boudary()
+    public function test_word_boundary()
tests/BlaspCheckTest.php-175-175 (1)

175-175: ⚠️ Potential issue | 🟡 Minor

Typo in method name: test_pural_profanity → test_plural_profanity.

Proposed fix
-    public function test_pural_profanity()
+    public function test_plural_profanity()
src/PendingCheck.php-149-157 (1)

149-157: ⚠️ Potential issue | 🟡 Minor

$falsePositives parameter is accepted but silently ignored.

The PHPMD hint is valid. This backward-compat method accepts $falsePositives but never uses it — callers may expect false positives to be applied. Either wire it into the dictionary (e.g., add to an internal falsePositivesList) or document why it's intentionally a no-op.

If intentionally a no-op, annotate to suppress the warning and document the reason
+    /**
+     * @deprecated Backward-compat method. Only $profanities (→ blockList) is supported;
+     *             false positives are now managed via language config files.
+     */
     public function configure(?array $profanities = null, ?array $falsePositives = null): self
     {
         if ($profanities !== null) {
             $this->blockList = array_merge($this->blockList, $profanities);
         }
+        // $falsePositives intentionally unused — false positives are managed per-language in config.
         return $this;
     }
tests/ResultCachingTest.php-103-119 (1)

103-119: ⚠️ Potential issue | 🟡 Minor

Bug: $keys is overwritten before the verification loop, so the loop body never executes.

On line 107, $keys captures the original cache keys. On line 112, $keys is reassigned to the result of a fresh Cache::get(...) call — which returns [] after the cache was cleared. The foreach on line 116 then iterates over this empty array, so it never verifies that the individual cached results were actually deleted.

Proposed fix: preserve the original keys for verification
     public function test_clear_cache_wipes_result_cache(): void
     {
         Blasp::check('This is a fucking sentence');

-        $keys = Cache::get('blasp_result_cache_keys', []);
+        $originalKeys = Cache::get('blasp_result_cache_keys', []);
-        $this->assertNotEmpty($keys);
+        $this->assertNotEmpty($originalKeys);

         Dictionary::clearCache();

-        $keys = Cache::get('blasp_result_cache_keys', []);
         $this->assertNull(Cache::get('blasp_result_cache_keys'));

         // Verify the cached result data was also cleared
-        foreach ($keys as $key) {
+        foreach ($originalKeys as $key) {
             $this->assertNull(Cache::get($key));
         }
     }
src/Core/Dictionary.php-49-50 (1)

49-50: ⚠️ Potential issue | 🟡 Minor

Use mb_strtolower instead of strtolower for multi-language support.

strtolower is not Unicode-safe. For languages like German (Ärsch → should be ärsch), accented characters may not be lowered correctly. Since v4 explicitly supports non-English languages, use mb_strtolower throughout for consistency.

Proposed fix
-        $this->allowList = array_map('strtolower', $allowList);
-        $this->blockList = array_map('strtolower', $blockList);
+        $this->allowList = array_map('mb_strtolower', $allowList);
+        $this->blockList = array_map('mb_strtolower', $blockList);

Also apply to line 65 and line 181:

-                fn($p) => !in_array(strtolower($p), $this->allowList)
+                fn($p) => !in_array(mb_strtolower($p), $this->allowList)
-        $lower = strtolower($word);
+        $lower = mb_strtolower($word);
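
A small standalone check of the behaviour this fix relies on, assuming UTF-8 word lists. The sample words are illustrative only.

```php
<?php

declare(strict_types=1);

$blockList = ['ÄRSCH', 'Scheiße', 'CABRÓN'];

// mb_strtolower lowercases non-ASCII letters as well. The encoding is passed
// explicitly, so a closure is used instead of the bare function name.
$normalized = array_map(
    static fn (string $word): string => mb_strtolower($word, 'UTF-8'),
    $blockList,
);

// A mixed-case lookup now matches the normalized list.
$found = in_array(mb_strtolower('Ärsch', 'UTF-8'), $normalized, true);
```

Passing the encoding explicitly avoids depending on the runtime's default internal encoding.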
src/Core/Result.php-156-156 (1)

156-156: ⚠️ Potential issue | 🟡 Minor

str_word_count() is unreliable for non-English/multi-byte text.

str_word_count uses locale-dependent rules and may miscount words in languages with accented characters, CJK scripts, or other non-ASCII text. Since v4 explicitly supports Spanish, French, and German, this could produce incorrect score calculations.

Consider using a Unicode-aware word count, e.g., preg_match_all('/\S+/u', $text) or count(preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY)).

Proposed fix
-        $totalWords = max(1, str_word_count($originalText ?: implode(' ', $words)));
+        $totalWords = max(1, count(preg_split('/\s+/u', trim($originalText ?: implode(' ', $words)), -1, PREG_SPLIT_NO_EMPTY)));
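
The whitespace-based count can be tried on its own. `countWords` is a hypothetical wrapper around the `preg_split` call suggested above, with an added guard for the `false` that `preg_split` returns on invalid UTF-8.

```php
<?php

declare(strict_types=1);

/**
 * Count whitespace-separated tokens in a UTF-8 string.
 * Hypothetical helper; mirrors the preg_split approach proposed above.
 */
function countWords(string $text): int
{
    $parts = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false when the subject is not valid UTF-8.
    return $parts === false ? 0 : count($parts);
}

$german = countWords('Das ist große Scheiße');
$spanish = countWords("  hola   qué tal ");
$empty = countWords('');
```

Because tokens are split on whitespace only, punctuation stays attached to its word, which is usually acceptable for a ratio-style score.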
src/PendingCheck.php-39-43 (1)

39-43: ⚠️ Potential issue | 🟡 Minor

strict() mode is set but never consumed during analysis—only affects cache invalidation.

strictMode is toggled via strict() (line 89) and included in the cache key (line 301), but unlike lenientMode (which forces the 'pattern' driver on line 254), strictMode does not influence driver selection or analysis behavior. The $options array passed to the analyzer contains only severity (if set), and strictMode is never passed to the driver's detect() method. This appears to be incomplete implementation. Either use strictMode to influence driver selection or behavior, or remove it to avoid confusion.

🧹 Nitpick comments (37)
src/Core/Masking/GrawlixMask.php (1)

11-18: $word parameter is unused — expected given the interface contract, but worth a brief note.

The static analysis tool flags $word as unused. This is inherent to the MaskStrategyInterface contract (other implementations like CallbackMask do use it). Consider suppressing with @SuppressWarnings or a brief inline comment for clarity.

config/blasp.php (1)

26-27: Backward-compat aliases can drift from primary keys.

The aliases (default_language, mask_character, cache_driver) duplicate the values of the primary keys (language, mask, cache.driver). If a user edits one in a published config without updating the other, consumers reading different keys will see inconsistent values. Consider having the aliases reference the primary keys instead (e.g., 'default_language' => null, // deprecated, use 'language') or resolve them at runtime in the service provider.

Also applies to: 39-40, 76-77
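
Runtime resolution of the aliases might look like the sketch below. It assumes the deprecated keys default to null in the config file and that an explicitly set alias should win over the primary key; both are assumptions, and `resolveDeprecatedAliases` is a hypothetical function, not part of the service provider.

```php
<?php

declare(strict_types=1);

/**
 * Reconcile deprecated flat keys with their primary counterparts so that
 * readers of either key see the same value. Hypothetical helper.
 */
function resolveDeprecatedAliases(array $config): array
{
    // A non-null alias means the user set the legacy key explicitly.
    if (($config['default_language'] ?? null) !== null) {
        $config['language'] = $config['default_language'];
    }
    if (($config['mask_character'] ?? null) !== null) {
        $config['mask'] = $config['mask_character'];
    }
    if (($config['cache_driver'] ?? null) !== null) {
        $config['cache']['driver'] = $config['cache_driver'];
    }

    // Mirror the primaries back so both spellings stay in sync.
    $config['default_language'] = $config['language'] ?? null;
    $config['mask_character'] = $config['mask'] ?? null;
    $config['cache_driver'] = $config['cache']['driver'] ?? null;

    return $config;
}

$resolved = resolveDeprecatedAliases([
    'language' => 'english',
    'default_language' => null,
    'mask' => '*',
    'mask_character' => '#',
    'cache' => ['driver' => null],
    'cache_driver' => null,
]);
```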

src/Core/Matchers/PhoneticMatcher.php (1)

63-66: Inconsistent string length functions: strlen vs mb_strlen.

Line 65 uses strlen() to compute $maxLen for the Levenshtein threshold, while lines 25 and 47 use mb_strlen() for the minimum word length check. Since levenshtein() operates on bytes, using strlen() here is arguably correct for threshold consistency with the distance value. However, mixing the two could produce surprising results if non-ASCII input reaches this code path (e.g., a 3-character word containing multi-byte chars passes the mb_strlen >= 3 check, but strlen returns a larger value and inflates the threshold).

Given the phonetic driver is English-only per config, this is low-risk, but worth aligning for defensive correctness.

tests/DetectionStrategyRegistryTest.php (1)

14-14: Test class name no longer reflects what it tests.

DetectionStrategyRegistryTest now tests BlaspManager rather than a registry. Consider renaming to BlaspManagerTest to match the new architecture and avoid confusion.

src/Core/Matchers/FalsePositiveFilter.php (1)

125-139: getFullWordContext is not multibyte-safe, unlike sibling methods.

This method uses byte-level string access ($string[$left - 1], strlen(), substr()) which can break on multi-byte UTF-8 characters — splitting mid-character or producing corrupted substrings. In contrast, isSpanningWordBoundary in the same class correctly uses mb_strlen/mb_substr.

If this code only processes ASCII-normalized text, it's fine in practice, but it's an inconsistency worth noting.

♻️ Multibyte-safe version
 public function getFullWordContext(string $string, int $start, int $length): string
 {
     $left = $start;
     $right = $start + $length;
+    $strLen = mb_strlen($string, 'UTF-8');
 
-    while ($left > 0 && preg_match('/\w/', $string[$left - 1])) {
+    while ($left > 0 && preg_match('/\w/u', mb_substr($string, $left - 1, 1, 'UTF-8'))) {
         $left--;
     }
 
-    while ($right < strlen($string) && preg_match('/\w/', $string[$right])) {
+    while ($right < $strLen && preg_match('/\w/u', mb_substr($string, $right, 1, 'UTF-8'))) {
         $right++;
     }
 
-    return substr($string, $left, $right - $left);
+    return mb_substr($string, $left, $right - $left, 'UTF-8');
 }
tests/ProfanityExpressionGeneratorTest.php (1)

7-7: Test class/file name doesn't match the class under test.

ProfanityExpressionGeneratorTest now tests RegexMatcher. Consider renaming to RegexMatcherTest for discoverability and consistency.

src/Testing/BlaspFake.php (1)

59-63: __call magic method may silently swallow typos and invalid method calls.

The catch-all __call means any misspelled method in tests (e.g., ->diver('regex') instead of ->driver('regex')) will silently return $this instead of failing. Since you already have explicit no-op methods for the known fluent API surface, consider removing __call or at least logging/tracking unknown calls to aid debugging.
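
If `__call` is kept, one option is to let it accept only a known list of fluent methods and fail loudly on anything else. The class below is a hypothetical stand-in for the fake, and the method list is an assumption taken from the fluent API shown in the PR summary.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a test fake whose __call rejects unknown methods.
 */
final class StrictFake
{
    /** Fluent methods the fake knowingly ignores (assumed list). */
    private const NO_OP_METHODS = ['driver', 'mask', 'in'];

    public function __call(string $name, array $arguments): self
    {
        if (!in_array($name, self::NO_OP_METHODS, true)) {
            // A typo such as ->diver('regex') now fails the test immediately.
            throw new BadMethodCallException(
                sprintf('Method %s::%s does not exist.', self::class, $name)
            );
        }

        return $this;
    }
}

$fake = new StrictFake();
$chained = $fake->driver('regex')->mask('#');
```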

tests/EdgeCaseTest.php (1)

9-41: Tests still use deprecated v3 method names.

These tests call hasProfanity() and getUniqueProfanitiesFound() which are preserved as deprecated aliases. Since this is a v4 rewrite, consider migrating the test assertions to the v4 equivalents (isOffensive(), uniqueWords()) to lead by example and avoid deprecation noise in future tooling.

src/Core/Normalizers/EnglishNormalizer.php (1)

1-11: EnglishNormalizer is functionally identical to NullNormalizer.

Both classes implement StringNormalizer and return the input unchanged. Additionally, src/Normalizers/EnglishStringNormalizer (extending the abstract StringNormalizer) does the same thing. Consider whether all three are needed, or if EnglishNormalizer could simply be an alias or removed in favor of NullNormalizer.

composer.json (1)

18-24: orchestra/testbench ^10.0 limits tested Laravel versions.

illuminate/support supports ^8.0 through ^12.0, but orchestra/testbench: ^10.0 only tests against Laravel 11/12. Earlier Laravel versions (8–10) are untested in CI. If those are officially supported, consider a test matrix with multiple testbench versions, or narrow the illuminate/support constraint to match what's actually tested.

tests/BladeDirectiveTest.php (1)

9-17: renderBlade helper is fine for test-only usage but could be fragile.

Using eval() on compiled Blade output is a common pattern for Blade directive testing in Orchestra Testbench. However, if Blade::compileString throws or returns malformed PHP, the ob_start() buffer will leak. Consider wrapping in a try/finally to ensure ob_end_clean() on failure.

🛡️ Suggested defensive improvement
     protected function renderBlade(string $template, array $data = []): string
     {
         $compiled = Blade::compileString($template);
 
         ob_start();
-        extract($data);
-        eval('?>' . $compiled);
-        return ob_get_clean();
+        try {
+            extract($data);
+            eval('?>' . $compiled);
+            return ob_get_clean();
+        } catch (\Throwable $e) {
+            ob_end_clean();
+            throw $e;
+        }
     }
src/Events/ProfanityDetected.php (1)

7-13: Minor: originalText may be redundant with $result->original().

Result already exposes original() which returns the original text. The $originalText property duplicates this. If they're always the same, consider removing the extra property to keep the event lean. Not blocking.

src/Drivers/PatternDriver.php (1)

17-19: Null coalescing on a non-nullable string parameter is redundant.

$text is typed string so $text ?? '' can never trigger the fallback.

-            return new Result($text ?? '', $text ?? '', [], 0);
+            return new Result($text, $text, [], 0);
src/Drivers/PipelineDriver.php (1)

19-21: Redundant null coalescing on non-nullable string parameter.

Same minor nit as PatternDriver: $text is typed string, so $text ?? '' is always $text.

-            return new Result($text ?? '', $text ?? '', [], 0);
+            return new Result($text, $text, [], 0);
tests/PipelineDriverTest.php (1)

5-5: Unused import: PipelineDriver.

PipelineDriver is imported but never referenced — all tests interact through the Blasp facade.

Proposed fix
-use Blaspsoft\Blasp\Drivers\PipelineDriver;
 use Blaspsoft\Blasp\Enums\Severity;
src/Middleware/CheckProfanity.php (1)

25-29: $except filter is silently bypassed when $fields is explicitly configured.

When $fields !== ['*'], line 28 overwrites $input with $request->only($fields), discarding the $except exclusion from line 25. If someone accidentally adds password to the fields config, it will be scanned. Consider intersecting:

Proposed fix
-        $input = $request->except($except);
-
-        if ($fields !== ['*']) {
-            $input = $request->only($fields);
-        }
+        if ($fields !== ['*']) {
+            $input = $request->only(array_diff($fields, $except));
+        } else {
+            $input = $request->except($except);
+        }
tests/TestCase.php (1)

22-30: Several Config::set calls are no-ops (setting a key to its own value).

Lines like Config::set('blasp.profanities', config('blasp.profanities')) read the merged config and write the same value back — these are effectively no-ops since the service provider's mergeConfigFrom has already loaded them. Consider removing the redundant lines and keeping only the overrides that set explicit test values (like lines 22, 27, 28).

Also, both blasp.cache.driver (line 29) and blasp.cache_driver (line 30) are present — if v4 uses the nested key, the flat key may be a leftover.

src/Core/Normalizers/SpanishNormalizer.php (1)

19-26: Accented vowels in the ll lookahead are dead after normalization.

By line 19, strtr has already replaced all accented vowels (á → a, etc.). The accented characters in the lookahead (?=[aeiouáéíóúü]) on line 21 will never match. They're harmless but misleading; the character class could be simplified to (?=[aeiou]).

src/Drivers/RegexDriver.php (2)

22-24: Redundant null coalescing on typed string parameter.

$text is typed as string (non-nullable) on line 20, so $text ?? '' on line 23 can never trigger the fallback.

Proposed fix
         if (empty($text)) {
-            return new Result($text ?? '', $text ?? '', [], 0);
+            return new Result($text, $text, [], 0);
         }

30-31: Per-call objects stored as instance properties — thread-unsafe code smell.

FalsePositiveFilter and CompoundWordDetector are created per detect() call but stored as instance fields ($this->filter, $this->compoundDetector). If the RegexDriver instance is reused across calls (e.g., as a singleton in the container), this overwrites shared state. Consider using local variables instead.

Proposed fix
 class RegexDriver implements DriverInterface
 {
-    private FalsePositiveFilter $filter;
-    private CompoundWordDetector $compoundDetector;
-
     public function detect(string $text, Dictionary $dictionary, MaskStrategyInterface $mask, array $options = []): Result
     {
         // ...
-        $this->filter = new FalsePositiveFilter($dictionary->getFalsePositives());
-        $this->compoundDetector = new CompoundWordDetector();
+        $filter = new FalsePositiveFilter($dictionary->getFalsePositives());
+        $compoundDetector = new CompoundWordDetector();

Then replace all $this->filter / $this->compoundDetector references with the local variables.

tests/AllLanguagesDetectionTest.php (2)

67-70: Duplicate test variant 'scheisse' in the German list.

The array on line 69 contains 'scheisse' twice (positions 3 and 5). This doesn't break anything but wastes a test iteration.

-                'scheisse' => ['SCHEISSE', 'Scheisse', 'scheisse', 'ScHeIsSe', 'scheisse']
+                'scheisse' => ['SCHEISSE', 'Scheisse', 'scheisse', 'ScHeIsSe']

100-122: German normalizer tests don't exercise umlauts or ß.

The German normalizer presumably handles ö, ü, ä, ß (e.g., Scheiße → scheisse). The test only verifies scheisse without diacritics, which means the normalizer's umlaut/ß handling is untested here. Consider adding variants like 'Scheiße' and 'Ärsch'.

tests/MultiLanguageProfanityTest.php (1)

123-158: test_comprehensive_language_coverage invokes a full check per profanity word — potentially slow.

This iterates over every profanity in every language dictionary and creates a new PendingCheck + driver execution for each one. Depending on dictionary size, this could be hundreds or thousands of invocations. Consider whether this is intended as a development/CI-only smoke test, and if so, annotating it with a PHPUnit group (e.g., #[Group('slow')]) so it can be excluded from fast feedback loops.

src/Core/Matchers/CompoundWordDetector.php (1)

9-47: Method name isPureAlphaSubstring is misleading about return semantics.

Returning true means "this match is a pure alpha substring of a larger word and should be suppressed." The caller in RegexDriver (line 77) uses it in a continue guard, so true → skip the match. The name reads like a predicate about the string's nature, but it's actually a "should suppress" signal. Consider renaming to something like shouldSuppressAsSubstring for clarity.

src/Blaspable.php (1)

93-102: withoutBlaspChecking — consider concurrency with queued/async saves.

The static flag pattern is correct for synchronous request-scoped use. However, if a model save is deferred or dispatched to a queue inside the callback, the flag will have already been re-enabled by the time the queued job actually saves. This is a known limitation of this pattern (Laravel's own Model::withoutEvents has the same caveat), but worth a note in the docblock for users.
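
The limitation is easy to see in a reduced model of the pattern. `CheckToggle` below is a hypothetical class that only mimics the static flag and the try/finally reset; it does not use Eloquent or queues.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical reduction of the static "disable checks" flag pattern.
 */
final class CheckToggle
{
    private static bool $disabled = false;

    public static function without(callable $callback): mixed
    {
        self::$disabled = true;

        try {
            return $callback();
        } finally {
            // Runs on normal return and on exceptions alike.
            self::$disabled = false;
        }
    }

    public static function isDisabled(): bool
    {
        return self::$disabled;
    }
}

// Work done inside the callback sees the flag set.
$insideCallback = CheckToggle::without(fn () => CheckToggle::isDisabled());

// Work merely scheduled inside the callback runs later, after the reset.
$deferred = CheckToggle::without(fn () => fn () => CheckToggle::isDisabled());
$seenByDeferredWork = $deferred();
```

The deferred closure stands in for a queued job: by the time it runs, the flag has already been restored.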

src/Drivers/PhoneticDriver.php (1)

27-29: Null coalesce on non-nullable string parameter is redundant.

$text ?? '' on line 28 is unnecessary since $text is typed as string (non-nullable). Minor nit — no functional impact.

Suggested simplification
         if (empty($text)) {
-            return new Result($text ?? '', $text ?? '', [], 0);
+            return new Result($text, $text, [], 0);
         }
src/BlaspServiceProvider.php (1)

53-64: PHPMD $attribute unused warning is a false positive — Laravel's validator callback signature requires it.

The $attribute parameter is part of Laravel's Validator::extend() callback contract ($attribute, $value, $parameters, $validator). Suppressing the warning with a docblock annotation would be the cleanest approach if PHPMD noise is a concern.

tests/PhoneticDriverTest.php (1)

145-153: Add a clean() assertion to fully validate severity filtering.

This test verifies that isClean() returns true when the severity threshold filters out "fuck", but it doesn't assert on $result->clean(). Given the masking-before-filtering bug in PhoneticDriver (raised separately), the clean text may still contain asterisks even though isClean() is true. Adding $this->assertSame('What the fuck', $result->clean()) would catch this inconsistency.

Proposed additional assertion
         // "fuck" is typically High severity, not Extreme, so should be filtered out
         $this->assertTrue($result->isClean());
+        $this->assertSame('What the fuck', $result->clean());
tests/ConfigurationLoaderLanguageTest.php (1)

116-118: Inconsistent import: uses FQCN inline instead of a use statement.

Line 117 references \Blaspsoft\Blasp\Facades\Blasp as a fully-qualified class name, while the rest of the test files in this PR import the facade via a use statement at the top.

Proposed fix

Add to the imports at the top of the file:

 use Blaspsoft\Blasp\Core\Dictionary;
+use Blaspsoft\Blasp\Facades\Blasp;
 use Blaspsoft\Blasp\Core\Normalizers\EnglishNormalizer;

Then update line 117:

-        $result = \Blaspsoft\Blasp\Facades\Blasp::french()->check('connard');
+        $result = Blasp::french()->check('connard');
tests/ResultCachingTest.php (1)

141-155: Misleading comment: no PHP state is actually cleared between the two calls.

The comment on line 145 says "Clear PHP state but keep cache" but no action follows — the second check() call runs immediately. The test still validates that a cached result deserializes correctly (the second call hits cache and returns via Result::fromArray), but the comment is misleading. Consider either removing it or adding an actual state reset (e.g., clearing the resolved facade instance) between the two calls to make the intent explicit.

src/PendingCheck.php (2)

252-258: Lenient mode silently overrides an explicitly set driver name.

If a caller chains ->driver('regex')->lenient()->check(...), the explicit 'regex' driver is silently replaced with 'pattern' at line 255. strict() mode has no analogous override. This asymmetry could surprise users.

Consider either documenting this precedence clearly or throwing/warning when both an explicit driver and lenient mode are set.


316-322: Unbounded growth of blasp_result_cache_keys tracking array.

Every unique check appends to this array (stored forever). Over time in high-throughput applications, this list can grow large, degrading the performance of clearCache() and the storage cost of the tracking key itself.

Consider bounding the list (e.g., capping at a configurable max), using a time-based eviction strategy, or switching to a cache tag-based approach if the underlying cache driver supports tags.
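
A capped key list could be maintained along these lines. The function name, the return shape, and the idea of returning evicted keys so the caller can forget them from the cache are all assumptions for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Append a cache key to the tracking list while keeping at most $maxKeys
 * entries. Returns the new list plus the keys that fell off the front.
 * Hypothetical helper; not part of the package.
 */
function trackCacheKey(array $keys, string $key, int $maxKeys): array
{
    if ($maxKeys < 1) {
        throw new InvalidArgumentException('maxKeys must be at least 1.');
    }

    // Remove an existing occurrence so the key moves to the newest position.
    $keys = array_values(array_diff($keys, [$key]));
    $keys[] = $key;

    $overflow = max(0, count($keys) - $maxKeys);

    return [
        'keys' => array_slice($keys, $overflow),
        'evicted' => array_slice($keys, 0, $overflow),
    ];
}

$full = trackCacheKey(['a', 'b', 'c'], 'd', 3);
$repeat = trackCacheKey(['a', 'b'], 'a', 3);
```

The caller would delete each evicted key from the cache store before persisting the trimmed list.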

src/Core/Result.php (1)

117-137: fromArray silently tolerates missing keys in word data — could mask upstream bugs.

If a cached entry is corrupted or schema-drifts (e.g., missing 'text' or 'position'), the constructor will throw an unrelated error deep inside MatchedWord. Consider adding a guard or at minimum a null coalesce on the required fields to produce a more informative failure.
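
A guard of the kind described might be as small as the sketch below. The required key names come from the comment above; the function name and exception type are assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Fail early, with a descriptive message, when a cached word entry lacks
 * required keys. Hypothetical helper for illustration.
 */
function assertWordShape(array $word): void
{
    $required = ['text', 'position'];
    $missing = array_diff($required, array_keys($word));

    if ($missing !== []) {
        throw new UnexpectedValueException(
            'Cached word entry is missing: ' . implode(', ', $missing)
        );
    }
}

// A well-formed entry passes silently.
assertWordShape(['text' => 'darn', 'position' => 4]);

// A drifted entry produces a targeted error message.
$message = null;
try {
    assertWordShape(['text' => 'darn']);
} catch (UnexpectedValueException $e) {
    $message = $e->getMessage();
}
```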

src/Core/Dictionary.php (3)

114-156: Multi-language dictionary always uses EnglishNormalizer.

forLanguages() on line 151 hard-codes self::getNormalizerForLanguage('english'). When checking text in e.g. German + Spanish, the English normalizer may not handle language-specific normalization correctly (e.g., ß, ñ, accented chars).

Consider either accepting a normalizer parameter, using a composite normalizer that chains per-language normalizers, or selecting based on the first/primary language.
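
The composite option can be sketched as follows. The interface, its `normalize` method, and the replacement maps are all hypothetical and only illustrate chaining; the package's real normalizers may differ.

```php
<?php

declare(strict_types=1);

/** Hypothetical normalizer contract used only for this sketch. */
interface TextNormalizer
{
    public function normalize(string $text): string;
}

/** Replaces substrings according to a fixed map (illustrative only). */
final class MapNormalizer implements TextNormalizer
{
    public function __construct(private array $map)
    {
    }

    public function normalize(string $text): string
    {
        return strtr($text, $this->map);
    }
}

/** Applies each wrapped normalizer in order. */
final class CompositeNormalizer implements TextNormalizer
{
    /** @param TextNormalizer[] $normalizers */
    public function __construct(private array $normalizers)
    {
    }

    public function normalize(string $text): string
    {
        foreach ($this->normalizers as $normalizer) {
            $text = $normalizer->normalize($text);
        }

        return $text;
    }
}

$german = new MapNormalizer(['ß' => 'ss']);
$spanish = new MapNormalizer(['ñ' => 'n']);

$normalized = (new CompositeNormalizer([$german, $spanish]))
    ->normalize('Scheiße señor');
```

Order matters when two languages rewrite the same character differently, so a real implementation would need a rule for such conflicts.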


339-344: getCache() logic is duplicated with PendingCheck::getCache() (lines 309-314 in PendingCheck.php).

Both methods resolve the cache store identically. Extract into a shared utility or trait to avoid drift.


301-314: Normalizer cache key doesn't normalize case but match does.

If getNormalizerForLanguage('English') is called, the cache stores it under key 'English', but the match lowercases to find the correct class. A subsequent call with 'english' creates a separate instance cached under 'english'. Normalizing the key would be more robust.

Proposed fix
     public static function getNormalizerForLanguage(string $language): StringNormalizer
     {
-        if (!isset(self::$normalizers[$language])) {
-            self::$normalizers[$language] = match (strtolower($language)) {
+        $key = strtolower($language);
+        if (!isset(self::$normalizers[$key])) {
+            self::$normalizers[$key] = match ($key) {
                 'english' => new EnglishNormalizer(),
                 'spanish' => new SpanishNormalizer(),
                 'german' => new GermanNormalizer(),
                 'french' => new FrenchNormalizer(),
                 default => new EnglishNormalizer(),
             };
         }

-        return self::$normalizers[$language];
+        return self::$normalizers[$key];
     }
tests/BlaspableTest.php (1)

230-240: Consider adding a test for withoutBlaspChecking exception safety.

The withoutBlaspChecking implementation uses a finally block to re-enable checking even when the callback throws. There's no test verifying that $blaspCheckingDisabled is properly reset after an exception inside the callback. If the finally clause were accidentally removed, subsequent tests could silently pass with checking disabled.

💡 Suggested additional test
public function test_without_blasp_checking_resets_after_exception()
{
    try {
        BlaspableTestModel::withoutBlaspChecking(function () {
            throw new \RuntimeException('boom');
        });
    } catch (\RuntimeException) {
        // expected
    }

    // Checking should be re-enabled — profanity should be masked
    $model = BlaspableTestModel::create([
        'body' => 'This is a fucking sentence',
    ]);

    $this->assertStringNotContainsString('fucking', $model->body);
}

deemonic and others added 2 commits February 13, 2026 15:37
…lists

Non-English severity maps (Spanish, French, German) only had 3 tiers
(mild, moderate, extreme) while English had 4. Added 'high' tier with
representative strong profanity words to each.

Also added 39 words that appeared in severity maps but were missing
from profanities arrays (21 English, 5 French, 13 German), which
meant they could never be detected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Dictionary: sanitize language parameter to prevent path traversal
  via loadLanguageConfig(), forLanguage(), and forLanguages()
- TestCommand: rename --verbose to --detail to avoid conflict with
  Symfony Console's built-in -v|--verbose flag
- PatternDriver, PhoneticDriver, RegexDriver: convert PREG_OFFSET_CAPTURE
  byte offsets to character offsets for correct multibyte string handling
- PatternDriver, PhoneticDriver, RegexDriver: apply severity filter before
  masking so low-severity words aren't masked in cleanText when filtered out
- Blasp facade: throw RuntimeException in assertChecked() and
  assertCheckedTimes() when fake() hasn't been called, instead of silently
  passing
- Profanity rule: convert static factory methods to instance methods with
  __callStatic for backward compat, enabling chaining like
  Profanity::in('spanish')->severity(Severity::High)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
config/languages/spanish.php (1)

370-426: ⚠️ Potential issue | 🟠 Major

Spanish profanities list contains common non-vulgar words that will cause false positives.

The tail end of the profanities array includes everyday Spanish words like triste (sad), deprimido/deprimida (depressed), morir/muerte (to die/death), fallecido (deceased), adiós/adios, hasta luego, hasta pronto, chao, bye, goodbye, and many others. These are not profanities or vulgar expressions — they are standard vocabulary that will be flagged and masked in normal Spanish text.

This will severely degrade user experience for Spanish-language content. Audit and trim the profanities list to actual vulgar/offensive terms.

🤖 Fix all issues with AI agents
In `@config/languages/english.php`:
- Line 1337: The word "knob" is present in both the profanities array and the
false_positives array in config/languages/english.php, causing conflicting
behavior; remove "knob" from one of those arrays to make intent explicit (either
keep it only in profanities if you intend it to be flagged, or only in
false_positives if you intend it to be ignored), update the corresponding array
(profanities or false_positives) where "knob" appears, and ensure the change is
reflected consistently so PatternDriver/PhoneticDriver logic no longer has to
reconcile the duplicate.

In `@src/Drivers/PatternDriver.php`:
- Around line 29-54: PatternDriver currently adds MatchedWord entries for every
independent match against $lowerText which allows overlapping matches (e.g.,
"motherfucker" and "fuck") and causes double-masking and corrupted positions;
fix this by filtering/deduplicating overlapping ranges before masking: when
iterating matches in the foreach over $profanities/$matches, compute the
candidate range (start, length) and skip creating a MatchedWord if that range
overlaps any already-recorded MatchedWord ranges in $matchedWords (or
alternatively build a dedicated list of reserved character ranges and check
against it), ensuring you still prefer longer matches (keep existing
longest-first order) so shorter matches that fall inside an existing range are
ignored prior to the right-to-left masking step.

In `@src/Drivers/PhoneticDriver.php`:
- Around line 80-84: The check in PhoneticDriver (and similarly in RegexDriver)
passes character offsets ($start, $length calculated with mb_* functions) into
FalsePositiveFilter::isInsideHexToken which uses byte-level functions; convert
the character offsets to byte offsets before calling isInsideHexToken (or change
isInsideHexToken to accept character offsets and use mb_* internally).
Specifically, compute the byte-based start and length for the substring (e.g.
use mb_strlen/mb_substr to determine the byte position or use mb_strpos combined
with utf8 byte offset conversion) and then call isInsideHexToken($normalized,
$byteStart, $byteLength); ensure the referenced symbols are updated:
PhoneticDriver::isInsideHexToken call sites (and the analogous RegexDriver call)
or update FalsePositiveFilter::isInsideHexToken to operate on multibyte-aware
offsets.
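
The overlap filtering requested for PatternDriver above can be sketched independently of the driver. The array shape (`word`, `start`, `length`) and the function name are assumptions; the driver's own MatchedWord objects are not used here.

```php
<?php

declare(strict_types=1);

/**
 * Drop matches whose character range overlaps a longer match that was
 * already accepted. Hypothetical helper operating on plain arrays.
 */
function removeOverlaps(array $matches): array
{
    // Longest first, so a shorter match inside a longer one always loses.
    usort($matches, fn (array $a, array $b): int => $b['length'] <=> $a['length']);

    $kept = [];

    foreach ($matches as $candidate) {
        $start = $candidate['start'];
        $end = $start + $candidate['length'];

        foreach ($kept as $existing) {
            $existingEnd = $existing['start'] + $existing['length'];

            // Half-open ranges overlap when each starts before the other ends.
            if ($start < $existingEnd && $existing['start'] < $end) {
                continue 2;
            }
        }

        $kept[] = $candidate;
    }

    // Restore document order for the later right-to-left masking pass.
    usort($kept, fn (array $a, array $b): int => $a['start'] <=> $b['start']);

    return $kept;
}

// Matches found in the text "motherfucker damn".
$result = removeOverlaps([
    ['word' => 'fuck', 'start' => 6, 'length' => 4],
    ['word' => 'motherfucker', 'start' => 0, 'length' => 12],
    ['word' => 'damn', 'start' => 13, 'length' => 4],
]);
```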
🧹 Nitpick comments (6)
src/Rules/Profanity.php (2)

38-41: Consider guarding __callStatic against invalid method names.

Currently any undefined static call (e.g., Profanity::foo()) will trigger a confusing Call to undefined method error on the temporary instance created inside __callStatic. A whitelist or method_exists check would produce a clearer error message.

♻️ Optional improvement
 public static function __callStatic(string $name, array $arguments): self
 {
-    return (new self())->$name(...$arguments);
+    $instance = new self();
+
+    if (! method_exists($instance, $name)) {
+        throw new \BadMethodCallException(sprintf('Method %s::%s does not exist.', static::class, $name));
+    }
+
+    return $instance->$name(...$arguments);
 }

62-71: maxScore mode silently skips the isOffensive() check — document or unify the behavior.

When maxScore is set, the early return on line 66 means isOffensive() is never evaluated. This creates two mutually exclusive validation modes: score-threshold vs. binary-offensive. If a user sets both maxScore and severity, the severity filtering still applies to the check (line 56-58), but only the score is used for the pass/fail decision.

This is likely intentional, but it could surprise users who expect both constraints to be enforced simultaneously. Consider either:

  • Documenting this behavior, or
  • Combining both checks when both are configured (fail if score exceeds threshold or result is offensive).
config/languages/spanish.php (1)

149-149: Duplicate entries in profanities array.

cabronazo appears at both line 149 and line 186. cochino/cochina also appear at lines 158–159 and 233–234. While harmless at runtime (just slightly more memory), it indicates a lack of deduplication.

Also applies to: 186-186
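
One way to catch such entries is a small check that could run in a test or a one-off script. This is a sketch only: `findDuplicates` is a hypothetical helper, and the sample list is a stand-in for the real profanities array.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the entries that occur more than once,
 * compared case-insensitively.
 *
 * @param  list<string> $words
 * @return list<string>
 */
function findDuplicates(array $words): array
{
    $lower = array_map(
        fn (string $w): string => mb_strtolower($w, 'UTF-8'),
        $words
    );

    // array_count_values() keeps first-occurrence order of the keys.
    $counts = array_count_values($lower);

    return array_keys(array_filter($counts, fn (int $n): bool => $n > 1));
}

$duplicates = findDuplicates(['cabronazo', 'cochino', 'cochina', 'cabronazo', 'Cochino']);
```

Wrapping the list in array_unique at load time would also hide the problem, but a failing check keeps the config file itself clean.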

src/Drivers/PhoneticDriver.php (1)

27-29: Redundant null-coalesce on a non-nullable string parameter.

$text is typed as string (line 25), so $text ?? '' on line 28 is a no-op. Use $text directly.

Proposed fix
         if (empty($text)) {
-            return new Result($text ?? '', $text ?? '', [], 0);
+            return new Result($text, $text, [], 0);
         }
src/Drivers/PatternDriver.php (1)

17-19: Redundant null-coalesce — same as PhoneticDriver.

$text is typed string, so $text ?? '' is a no-op.

src/Drivers/RegexDriver.php (1)

22-24: Redundant null-coalesce — same pattern as the other drivers.

$text is typed string; $text ?? '' is always a no-op.

- Remove 'knob' from false_positives list (conflicts with profanities)
- PatternDriver: deduplicate overlapping matches before masking to
  prevent double-masking (e.g., "motherfucker" matching both
  "motherfucker" and "fuck")
- PhoneticDriver, RegexDriver: pass byte offsets to FalsePositiveFilter
  methods (isInsideHexToken, isSpanningWordBoundary, getFullWordContext)
  which use byte-level operations, while keeping character offsets for
  MatchedWord positions and mb_substr masking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/Drivers/PatternDriver.php`:
- Line 24: The false-positive normalization uses strtolower which fails on
multibyte characters; change the mapping that builds $falsePositives in
PatternDriver (where $falsePositives = array_map('strtolower',
$dictionary->getFalsePositives())) to use mb_strtolower with the same encoding
used elsewhere (e.g., 'UTF-8') so its casing matches the other mb_strtolower
calls (see the earlier uses around the code and the in_array check that compares
against these values).
- Around line 56-66: The deduplication can keep a shorter match when two entries
share the same position, because the usort comparator orders by position only and
ties therefore keep their input order (usort was also not stable before PHP 8.0);
change the comparator used on $matchedWords to sort by position ascending and then
by length descending so longer matches at the same position come first, ensuring
the dedup loop (which uses $coveredEnd, $deduplicated, $mw->position and
$mw->length) always retains the longest overlapping match; update the comparator
passed to usort in PatternDriver (the $matchedWords sort) accordingly.
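
The ordering described in the second item can be sketched as follows. Matches are modelled as plain arrays rather than MatchedWord objects, and `dedupeOverlaps` is a hypothetical name, so this illustrates the technique and is not the driver's actual code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: keep only the longest match among overlapping ones.
 *
 * @param  list<array{position: int, length: int, text: string}> $matches
 * @return list<array{position: int, length: int, text: string}>
 */
function dedupeOverlaps(array $matches): array
{
    // Position ascending, then length descending, so the longest match at a
    // given position is always seen first.
    usort($matches, fn (array $a, array $b): int =>
        $a['position'] <=> $b['position'] ?: $b['length'] <=> $a['length']
    );

    $kept = [];
    $coveredEnd = 0;

    foreach ($matches as $match) {
        // Skip anything that starts inside a region already covered.
        if ($match['position'] >= $coveredEnd) {
            $kept[] = $match;
            $coveredEnd = $match['position'] + $match['length'];
        }
    }

    return $kept;
}

$kept = dedupeOverlaps([
    ['position' => 0,  'length' => 5,  'text' => 'grand'],
    ['position' => 0,  'length' => 11, 'text' => 'grandfather'],
    ['position' => 5,  'length' => 6,  'text' => 'father'],
    ['position' => 12, 'length' => 5,  'text' => 'clock'],
]);
$texts = array_column($kept, 'text');
```

Because the tie-break is part of the comparator, the result does not depend on the order in which the dictionary produced the matches.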
🧹 Nitpick comments (7)
config/languages/english.php (1)

1331-1351: Appended words break the alphabetical order of the profanities array.

The existing list is alphabetically sorted, but these 21 additions are appended at the end. This makes it harder to spot duplicates and maintain the list over time. Consider inserting them at their sorted positions.

src/Drivers/RegexDriver.php (2)

22-24: Redundant null-coalescing on a typed string parameter.

$text is declared as string in the method signature, so it can never be null. The ?? '' on line 23 is dead code. Same applies to PhoneticDriver and PatternDriver.

Suggested fix
         if (empty($text)) {
-            return new Result($text ?? '', $text ?? '', [], 0);
+            return new Result($text, $text, [], 0);
         }

86-109: No guard against the same text region being matched multiple times within one while-loop pass.

Within a single pass of the while loop, preg_match_all for a given profanity captures all occurrences up-front (line 52), and then the inner foreach masks them one by one (line 89). However, a subsequent profanity expression in the same outer foreach iteration can still match a region that was just masked in the current pass (since preg_match_all runs against the updated $normalizedString). The longest-first sort mitigates this in the common case, but it doesn't handle every scenario—for example, two profanities of equal key length where one is a substring of a match region already processed by the other.

In practice this is low-risk because asterisks are unlikely to match profanity regexes, but the PatternDriver explicitly deduplicates and this driver doesn't. Consider adding a covered-range check similar to PatternDriver if robustness is important.

src/Drivers/PhoneticDriver.php (2)

27-29: Same redundant ?? '' as noted in RegexDriver.

$text is typed string; ?? '' is unreachable.


35-50: Language-support check could be streamlined.

Minor readability improvement — the loop + flag can be collapsed with array_intersect.

Suggested simplification
-        $languages = array_map('strtolower', explode(',', $language));
-        $supported = array_map('strtolower', $this->supportedLanguages);
-
-        $isSupported = false;
-        foreach ($languages as $lang) {
-            if (in_array(trim($lang), $supported, true)) {
-                $isSupported = true;
-                break;
-            }
-        }
-
-        if (!$isSupported) {
+        $languages = array_map(fn($l) => strtolower(trim($l)), explode(',', $language));
+        $supported = array_map('strtolower', $this->supportedLanguages);
+
+        if (empty(array_intersect($languages, $supported))) {
             return new Result($text, $text, [], 0);
         }
src/Drivers/PatternDriver.php (2)

17-18: Same $text ?? '' nit as the other drivers.


29-54: False-positive check is per-profanity, not per-match — consider hoisting it.

in_array($lowerProfanity, $falsePositives) on line 40 doesn't depend on the individual match, only on the profanity word itself. Moving this check before preg_match_all avoids the regex work entirely for false-positive profanities.

Suggested restructuring
         foreach ($profanities as $profanity) {
             $lowerProfanity = mb_strtolower($profanity, 'UTF-8');
+
+            // Skip profanities that are known false positives
+            if (in_array($lowerProfanity, $falsePositives)) {
+                continue;
+            }
+
             $pattern = '/\b' . preg_quote($lowerProfanity, '/') . '\b/iu';
 
             if (preg_match_all($pattern, $lowerText, $matches, PREG_OFFSET_CAPTURE)) {
                 foreach ($matches[0] as $match) {
                     $start = mb_strlen(substr($lowerText, 0, $match[1]), 'UTF-8');
                     $length = mb_strlen($match[0], 'UTF-8');
                     $originalMatch = mb_substr($text, $start, $length);
 
-                    // Skip false positives
-                    if (in_array($lowerProfanity, $falsePositives)) {
-                        continue;
-                    }
-
                     $matchedWords[] = new MatchedWord(

…n PatternDriver

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Drivers/PatternDriver.php`:
- Line 88: The use of str_word_count in PatternDriver.php can miscount multibyte
words; replace it with a Unicode-aware word count using preg_match_all with the
\p{L}+ pattern and the 'u' modifier (e.g., count matches into $totalWords) and
wrap with max(1, ...) to preserve the minimum-one behavior; update the code
where $totalWords is computed so it uses preg_match_all('/\p{L}+/u', $text,
$matches) and sets $totalWords = max(1, count($matches[0])) to correctly handle
accented and non-ASCII letters.
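
A short sketch of the suggested replacement, assuming UTF-8 input. The wrapper name `countWords` is hypothetical; the pattern and the max(1, ...) floor come from the comment above.

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper around the suggested Unicode-aware word count.
function countWords(string $text): int
{
    // \p{L}+ matches runs of letters in any script; the "u" modifier makes
    // PCRE treat the subject as UTF-8. preg_match_all() returns the number
    // of matches, or false on failure (cast to 0 here).
    $count = (int) preg_match_all('/\p{L}+/u', $text);

    // Preserve the minimum-one behavior so later divisions stay safe.
    return max(1, $count);
}

$spanish = countWords('niño años café');
$empty = countWords('');
```

One difference worth keeping in mind: str_word_count treats apostrophes and hyphens as part of a word, while \p{L}+ splits on them, so totals for English text with contractions may shift slightly.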
🧹 Nitpick comments (2)
src/Drivers/PatternDriver.php (2)

17-18: Redundant null-coalescing on a non-nullable string parameter.

$text is typed as string, so the ?? '' fallback can never be taken. This is harmless but misleading, since it suggests $text could be null.

Proposed simplification
-            return new Result($text ?? '', $text ?? '', [], 0);
+            return new Result($text, $text, [], 0);

29-53: False-positive check can be hoisted before the regex call.

The in_array($lowerProfanity, $falsePositives) check on line 40 depends only on the outer $profanity loop variable, not on the individual match. Moving it before preg_match_all avoids running the regex entirely for profanities that will always be skipped.

Proposed optimization
         foreach ($profanities as $profanity) {
             $lowerProfanity = mb_strtolower($profanity, 'UTF-8');
+
+            if (in_array($lowerProfanity, $falsePositives)) {
+                continue;
+            }
+
             $pattern = '/\b' . preg_quote($lowerProfanity, '/') . '\b/iu';
 
             if (preg_match_all($pattern, $lowerText, $matches, PREG_OFFSET_CAPTURE)) {
                 foreach ($matches[0] as $match) {
                     $start = mb_strlen(substr($lowerText, 0, $match[1]), 'UTF-8');
                     $length = mb_strlen($match[0], 'UTF-8');
                     $originalMatch = mb_substr($text, $start, $length);
 
-                    // Skip false positives
-                    if (in_array($lowerProfanity, $falsePositives)) {
-                        continue;
-                    }
-
                     $matchedWords[] = new MatchedWord(

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/Drivers/PatternDriver.php`:
- Around line 15-19: PatternDriver::detect currently omits UTF-8 validation but
uses mb_* functions; add the same guard used in RegexDriver/PhoneticDriver at
the top of detect(): check encoding with mb_check_encoding($text, 'UTF-8') and,
if it fails, normalise the input with mb_convert_encoding($text, 'UTF-8',
'UTF-8') before continuing (ensuring the early-return branch that constructs
Result uses the normalized text). This keeps encoding-consistency with
RegexDriver/PhoneticDriver and avoids garbled output when detect() relies on
mb_* calls.

In `@src/Drivers/PhoneticDriver.php`:
- Around line 93-100: The MatchedWord in PhoneticDriver is being populated with
the normalized token ($word) which causes inconsistent mixed normalized/original
text compared to PatternDriver; update the construction of MatchedWord in
PhoneticDriver to set text to the substring extracted from the original $text
using the same approach as PatternDriver (use mb_substr($text, $start, $length)
with proper encoding) while keeping base, severity, position, length, and
language unchanged so downstream PipelineDriver/CallbackMask receive the
original-cased matched text.
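
The encoding guard from the first item can be sketched like this. The function name `ensureUtf8` is hypothetical; the two mb_* calls are the ones named in the comment, and the exact substitute character used for invalid bytes depends on the mbstring configuration.

```php
<?php
declare(strict_types=1);

// Hypothetical helper mirroring the guard described for the other drivers.
function ensureUtf8(string $text): string
{
    // Valid input passes through untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Converting UTF-8 to UTF-8 replaces invalid byte sequences with the
    // configured substitute character, yielding a well-formed string.
    return mb_convert_encoding($text, 'UTF-8', 'UTF-8');
}

$valid = ensureUtf8('héllo');
$repaired = ensureUtf8("abc\xFFdef");
```

Running the guard once at the top of detect() means every later mb_* call, including the empty-text early return, works on the same normalized string.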
🧹 Nitpick comments (4)
src/Drivers/RegexDriver.php (2)

22-24: LGTM — empty-text guard is correct.

Tiny nit: $text ?? '' is redundant since $text is typed as string and cannot be null. empty('') is true, so this path works, but the null-coalesce is a no-op.


42-106: Dead variables: $uniqueMap and $profanitiesCount are never read.

$profanitiesCount (line 44, incremented on line 93) and $uniqueMap (line 43, populated on lines 104-106) are tracked but never consumed — they don't feed into the Result, score calculation, or any return value. Remove them to avoid confusion.

♻️ Suggested cleanup
         $matchedWords = [];
-        $uniqueMap = [];
-        $profanitiesCount = 0;
         $continue = true;

         while ($continue) {
             ...
                         $continue = true;
 
                         // Mask in normalizedString only (needed for loop termination)
                         ...
-                        // Track match
-                        $profanitiesCount++;

                         $matchedWords[] = new MatchedWord(
                             ...
                         );
-
-                        if (!isset($uniqueMap[$profanity])) {
-                            $uniqueMap[$profanity] = true;
-                        }
                     }
src/Drivers/PatternDriver.php (1)

29-54: False-positive check can be hoisted outside the inner match loop.

The in_array($lowerProfanity, $falsePositives) on line 40 doesn't depend on the individual match — it depends only on the profanity word. Moving it before preg_match_all avoids running the regex entirely for profanities that are also false positives.

♻️ Suggested change
         foreach ($profanities as $profanity) {
             $lowerProfanity = mb_strtolower($profanity, 'UTF-8');
+
+            // Skip profanities that are also false positives
+            if (in_array($lowerProfanity, $falsePositives)) {
+                continue;
+            }
+
             $pattern = '/\b' . preg_quote($lowerProfanity, '/') . '\b/iu';

             if (preg_match_all($pattern, $lowerText, $matches, PREG_OFFSET_CAPTURE)) {
                 foreach ($matches[0] as $match) {
                     $start = mb_strlen(substr($lowerText, 0, $match[1]), 'UTF-8');
                     $length = mb_strlen($match[0], 'UTF-8');
                     $originalMatch = mb_substr($text, $start, $length);

-                    // Skip false positives
-                    if (in_array($lowerProfanity, $falsePositives)) {
-                        continue;
-                    }
-
                     $matchedWords[] = new MatchedWord(
src/Drivers/PipelineDriver.php (1)

23-30: Sub-driver results include wasted masking work.

Each sub-driver's detect() builds a fully masked cleanText (including right-to-left masking), but PipelineDriver only reads result->words() and discards the clean text, re-masking from scratch on lines 64-69. This means every mask invocation inside a sub-driver is thrown away.

For CharacterMask/GrawlixMask this is negligible, but for CallbackMask with side effects, the callback fires for every match in every sub-driver and again in PipelineDriver. Consider adding a lightweight "detect-only" mode or a no-op mask sentinel for sub-drivers when running inside a pipeline.
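
A rough sketch of the no-op sentinel idea follows. Everything here is hypothetical: the `MaskStrategy` interface, `NullMask`, `CountingMask`, and the `detectWords` stand-in for a sub-driver are invented for illustration and do not reflect the package's actual contracts.

```php
<?php
declare(strict_types=1);

// Hypothetical interface for illustration; not the package's real contract.
interface MaskStrategy
{
    public function mask(string $word): string;
}

// No-op sentinel: sub-drivers can call it freely without side effects.
final class NullMask implements MaskStrategy
{
    public function mask(string $word): string
    {
        return $word;
    }
}

// Stand-in for a user callback mask; counts how often it fires.
final class CountingMask implements MaskStrategy
{
    public int $calls = 0;

    public function mask(string $word): string
    {
        $this->calls++;

        return str_repeat('*', mb_strlen($word, 'UTF-8'));
    }
}

// Stand-in for a sub-driver: finds whole-word matches and masks each one.
function detectWords(string $text, array $banned, MaskStrategy $mask): array
{
    $found = [];
    foreach ($banned as $word) {
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/iu', $text, $m)) {
            $mask->mask($m[0]); // discarded by the pipeline anyway
            $found[] = $m[0];
        }
    }

    return $found;
}

$text = 'a darn heck';
$userMask = new CountingMask();

// Sub-drivers run with the sentinel; only the final pass uses the user mask.
$words = array_unique(array_merge(
    detectWords($text, ['darn'], new NullMask()),
    detectWords($text, ['darn', 'heck'], new NullMask()),
));

$clean = $text;
foreach ($words as $word) {
    $clean = str_replace($word, $userMask->mask($word), $clean);
}
```

In this arrangement the user-supplied mask fires once per unique match in the final pass, instead of once per match per sub-driver plus once more in the pipeline.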

deemonic and others added 4 commits February 14, 2026 15:05
…icDriver matches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces unbounded lazy quantifier (*?) with {0,3} in the separator
expression between profanity characters. This prevents PHP-FPM worker
segfaults caused by PCRE JIT stack overflow when processing 1,300+
complex patterns with nested lazy quantifiers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tack overflow

Each branch in the separator group now matches exactly one character,
with the outer {0,3}? handling repetition. Removes redundant (?:\s)
alternative since \s is already in the character class.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
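
For illustration, a bounded separator between letters could be built as in the sketch below. The separator character class and the function name `buildObfuscationPattern` are assumptions; the commits above only state that each branch matches a single character and that the outer quantifier is {0,3}?.

```php
<?php
declare(strict_types=1);

// Hypothetical builder: the separator class below is an assumption, not the
// expression used in RegexDriver.
function buildObfuscationPattern(string $word): string
{
    // One separator character per repetition, at most three in a row, lazy.
    $separator = '[\s._*\-]{0,3}?';

    $letters = array_map(
        fn (string $char): string => preg_quote($char, '/'),
        mb_str_split($word, 1, 'UTF-8')
    );

    return '/' . implode($separator, $letters) . '/iu';
}

$pattern = buildObfuscationPattern('heck');
$spaced = preg_match($pattern, 'h.e-c k');   // separators within the bound
$tooFar = preg_match($pattern, 'h....eck');  // four separators exceed {0,3}
```

Bounding the repetition caps how much backtracking each pattern can do, at the cost of missing words split by more than three separator characters.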
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@deemonic deemonic merged commit b863354 into main Feb 18, 2026
3 checks passed