feat: v4 refactor — driver architecture, multi-language, caching#45
…ction Adds a `Blaspable` trait that hooks into the Eloquent `saving` event to automatically check and sanitize (or reject) profanity on specified model attributes. Supports per-model language, mask, and mode overrides.
- Blaspable trait with sanitize/reject modes and helper methods
- ProfanityRejectedException for reject mode
- ModelProfanityDetected event fired on detection
- `model.mode` config key in blasp.php
- 21 tests covering all trait functionality
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all v3-era source files that have been replaced by the new v4 architecture: Abstracts, Config, Contracts, Facades, Generators, Normalizers, Registries, and the monolithic BlaspService/ProfanityDetector. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New modular core with Analyzer, Dictionary, Result, and driver-based detection (RegexDriver, PatternDriver). Includes normalizers per language, configurable masking strategies, severity levels, and false positive filtering. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BlaspManager with fluent PendingCheck API, Facade, ServiceProvider, middleware, validation rule, artisan commands (clear, test, languages), events (ProfanityDetected, ContentBlocked), and BlaspFake for testing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update composer.json laravel extra to point to new BlaspServiceProvider and Facade namespaces. Add severity tiers to English language config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migrate all tests to use the new v4 Facade, PendingCheck fluent API, and Result methods. Simplify TestCase base class to use BlaspServiceProvider. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Full rewrite covering the new driver architecture, fluent API, Result object, Blaspable trait, middleware, validation rules, testing utilities, events, artisan commands, configuration reference, and v3 migration guide. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register 'blasp' as a short middleware alias, add @clean Blade directive for XSS-safe profanity masking in views, and register isProfane/cleanProfanity macros on Str and Stringable for fluent usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move all classes from Blaspsoft\Blasp\Laravel\* to Blaspsoft\Blasp\* and update imports across src and tests to match. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Catches sound-alike profanity evasions (e.g. "phuck", "fuk", "sheit") that bypass the regex and pattern drivers. Uses PHP's metaphone() for indexing and levenshtein() for confirmation, with a curated false-positive list to protect common words like "fork", "duck", and "beach". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
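The metaphone-plus-Levenshtein idea in this commit can be sketched roughly as follows. This is an illustration only, not the package's PhoneticDriver: the function name `findPhoneticMatch`, the distance threshold of 2, and the sample word lists are assumptions. Only `metaphone()` and `levenshtein()` are PHP built-ins.

```php
<?php
// Hypothetical helper: returns the dictionary word that $candidate sounds like, or null.
function findPhoneticMatch(string $candidate, array $dictionary, array $falsePositives, int $maxDistance = 2): ?string
{
    $candidate = strtolower($candidate);

    // Words on the curated false-positive list are never flagged.
    if (in_array($candidate, $falsePositives, true)) {
        return null;
    }

    // Index the dictionary by metaphone key.
    $index = [];
    foreach ($dictionary as $word) {
        $index[metaphone($word)][] = $word;
    }

    // Only words sharing the candidate's phonetic key are considered;
    // edit distance then confirms the match.
    foreach ($index[metaphone($candidate)] ?? [] as $word) {
        if (levenshtein($candidate, $word) <= $maxDistance) {
            return $word;
        }
    }

    return null;
}
```

Using the metaphone key as a lookup index keeps the more expensive `levenshtein()` call limited to a handful of candidates per token.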
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows combining regex, pattern, and phonetic drivers so a single
check() call catches obfuscated text, exact matches, and sound-alikes
in one pass. Supports config-based (`driver('pipeline')`) and ad-hoc
(`pipeline('regex', 'phonetic')`) usage with union merge semantics.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add severity maps (mild/moderate/extreme) for Spanish, French, and German so withSeverity() filtering works correctly for all languages instead of defaulting everything to High. Implement result caching in PendingCheck — check() results are cached by a hash of all parameters (text, driver, language, severity, allow/block lists, mask strategy). CallbackMask bypasses cache since closures can't serialize. Add Result::fromArray() for deserialization, extend Dictionary::clearCache() to also clear result cache, and add cache.results config toggle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
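Caching by "a hash of all parameters" might look something like the sketch below. The function name `resultCacheKey`, the `blasp_result_` prefix, and the shape of the `$params` array are assumptions for illustration; the commit does not show the real key format.

```php
<?php
// Hypothetical: build a deterministic cache key from every parameter that affects the result.
function resultCacheKey(string $text, array $params): string
{
    // Sort by parameter name so call order does not change the hash.
    ksort($params);

    // Order inside allow/block lists does not affect detection, so normalise it too.
    foreach (['allow', 'block'] as $list) {
        if (isset($params[$list]) && is_array($params[$list])) {
            sort($params[$list]);
        }
    }

    $payload = json_encode([$text, $params], JSON_INVALID_UTF8_SUBSTITUTE);

    return 'blasp_result_' . hash('sha256', $payload);
}
```

A closure-based mask cannot be represented in such a payload, which is consistent with the commit's note that CallbackMask bypasses the cache.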
Note
Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Replaces the v3 monolithic profanity service with a v4 modular system: adds BlaspManager, PendingCheck, Dictionary, DriverInterface and drivers (regex, pattern, phonetic, pipeline), Result/Severity models, mask strategies, normalizers, Laravel integrations (provider, middleware, trait, validation), and removes legacy loaders, registries, and the old service classes.
Sequence Diagram(s)
sequenceDiagram
participant Client as Client
participant Manager as BlaspManager
participant Pending as PendingCheck
participant Dict as Dictionary
participant Driver as Driver (Regex/Pattern/Phonetic/Pipeline)
participant Analyzer as Analyzer
participant Result as Result
Client->>Manager: in('english') / driver('regex') / pipeline(...)
Manager->>Pending: newPendingCheck()
Client->>Pending: mask('*')->withSeverity(High)->check(text)
Pending->>Dict: forLanguage('english') / forAllLanguages()
Dict-->>Pending: Dictionary
Pending->>Manager: resolveDriver('regex')
Manager-->>Driver: create/return DriverInterface
Pending->>Analyzer: analyze(text, driver, dictionary, mask, options)
Analyzer->>Driver: detect(text, dictionary, mask, options)
Driver->>Driver: tokenize → match → filter false-positives → build MatchedWord[]
Driver->>Driver: apply MaskStrategy → produce clean text
Driver->>Result: build Result(original, clean, matches, score)
Result-->>Analyzer: Result
Analyzer-->>Pending: Result
Pending-->>Client: Result (isOffensive(), clean(), score(), words())
sequenceDiagram
participant Model as EloquentModel
participant Trait as Blaspable
participant Pending as PendingCheck
participant Result as Result
participant Event as EventDispatcher
Model->>Trait: saving event triggers
Trait->>Pending: app('blasp')->in(language)->mask('#')->check(value)
Pending-->>Result: Result
alt Result is offensive
Trait->>Event: dispatch(ModelProfanityDetected(model, attribute, result))
alt model mode == 'reject'
Trait->>Model: throw ProfanityRejectedException
else
Trait->>Model: replace attribute with Result.clean()
end
end
Model->>DB: persist (if not rejected)
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
config/languages/french.php (1)
39-1533: ⚠️ Potential issue | 🔴 Critical
Critical: French profanities list contains hundreds of common, non-profane words.
Starting around line 526, the list drifts into completely legitimate vocabulary: farming terms (`pâturage`, `fourrage`), biology (`cellule`, `chromosome`, `génome`), technology (`ordinateur`, `smartphone`, `internet`), and everyday words (`famille`, `diamant`, `information`, `maison`), etc. This appears to be an accidental thesaurus/vocabulary dump that was never pruned.
This will cause massive false positives for virtually any French-language text. The list should be truncated to actual profanities and vulgar expressions: roughly lines 39–525 appear legitimate; everything after needs careful review and likely removal.
    #!/bin/bash
    # Count how many French "profanities" overlap with the false_positives list in the same file
    python3 << 'EOF'
    import re

    with open("config/languages/french.php", "r") as f:
        content = f.read()

    prof_match = re.search(r"'profanities'\s*=>\s*\[(.*?)\],\s*\n\s*'false_positives'", content, re.DOTALL)
    fp_match = re.search(r"'false_positives'\s*=>\s*\[(.*?)\],\s*\n\s*'substitutions'", content, re.DOTALL)

    if not prof_match or not fp_match:
        print("Could not parse sections")
        exit(1)

    def extract_words(text):
        return set(re.findall(r"'([^']+)'", text))

    profanities = extract_words(prof_match.group(1))
    false_positives = extract_words(fp_match.group(1))
    overlap = profanities & false_positives

    print(f"Total profanities: {len(profanities)}")
    print(f"Total false_positives: {len(false_positives)}")
    print(f"Overlap (in both lists): {len(overlap)}")
    if overlap:
        print("Examples of overlap:")
        for w in sorted(overlap)[:30]:
            print(f" - {w}")
    EOF
config/languages/spanish.php (1)
29-420: ⚠️ Potential issue | 🟡 Minor
Profanities list includes many common, innocuous words that will cause false positives.
Words like `'triste'` (sad), `'malo'`/`'mala'` (bad), `'adios'`/`'adiós'`, `'bye'`, `'goodbye'`, `'hasta luego'`, `'hasta pronto'`, `'despedida'` (farewell), `'partida'` (departure/game), `'amargo'` (bitter, a taste), and `'soso'`/`'sosa'` (bland) are standard vocabulary, not profanities. Including them will flag ordinary text as offensive and erode user trust.
Consider pruning clearly non-profane entries or moving them to a separate "sensitive context" tier if needed.
config/languages/german.php (1)
34-1048: ⚠️ Potential issue | 🟠 Major
Profanities list contains extremely common German words that will flag virtually all German text as offensive.
Words like `'zu'` (to/too, one of the most frequent German words), `'weg'` (path/away), `'warm'` (warm), `'breit'` (wide), `'voll'` (full), `'dicht'` (dense/tight), `'matt'` (matte), `'platt'` (flat), `'oral'` (oral), `'stramm'` (tight), `'schritt'` (step), and `'roh'` (raw) are basic everyday vocabulary. Including them will produce an unacceptable false-positive rate, rendering the German detection unusable for real content.
Additionally, there are duplicate entries: `'dicht'`/`'dichte'`/`'dichter'`/`'dichtes'` (Lines 998–1001 and 1040–1043), the `'verbrannt'` group (Lines 921–924 and 949–952), the `'verlogen'` group (Lines 554–557 and 574–577), and `'hinüber'`/`'hinueber'` (Lines 753–754 and 1006–1007). Duplicates cause redundant matches in the `PatternDriver`.
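Duplicates of this kind are easy to surface mechanically. The helper below is a hypothetical sketch (the name `findDuplicateEntries` is not part of the package) that reports entries occurring more than once in a word list, ignoring case.

```php
<?php
// Hypothetical helper: list entries that occur more than once in a word list.
function findDuplicateEntries(array $words): array
{
    // Compare case-insensitively, using a multibyte-safe lowercase.
    $lowered = array_map(fn (string $w) => mb_strtolower($w, 'UTF-8'), $words);

    // Count occurrences and keep only the entries seen more than once.
    $counts = array_count_values($lowered);

    return array_keys(array_filter($counts, fn (int $n) => $n > 1));
}
```

Running something like this over each language's `profanities` array in a test would catch repeated groups before they reach a release.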
🤖 Fix all issues with AI agents
In `@config/languages/english.php`:
- Around line 4-35: The severity arrays under the 'severity' key contain ~67
words that are not present in the main profanities list, so update the
profanities array to include every token referenced in severity (e.g. add
'shit', 'pussy', 'slut', 'whore', 'nigga', 'niggers', 'bloody', 'piss', 'twat',
'wanker', etc.), and ensure every entry in the profanities array has a
corresponding severity classification; modify the profanities definition and
reconcile it with the 'severity' arrays so no severity-classified word is
missing and consider auditing remaining profanities (the ~749 unclassified) to
assign appropriate severity levels or remove false positives.
In `@config/languages/french.php`:
- Around line 4-37: The French severity map under the 'severity' array is
missing the 'high' category (it currently has 'mild', 'moderate', 'extreme'),
which breaks parity with the English config; add a new 'high' key in the
'severity' array (between 'moderate' and 'extreme') and populate it with the
equivalent French terms used in the English config's high severity list so the
code paths that call withSeverity('high') or index by 'high' work consistently;
update the array in the same format as 'mild'/'moderate'/'extreme' (string
entries, singular/plural and accent variants) to match localization behavior.
In `@config/languages/spanish.php`:
- Around line 4-27: The Spanish language config's 'severity' array (keys 'mild',
'moderate', 'extreme') is missing the 'high' tier required by the Severity enum
(Mild, Moderate, High, Extreme); update the 'severity' array in
config/languages/spanish.php to include a 'high' key with appropriate
high-severity words (matching the pattern used in English) so Severity::High
maps consistently across locales, or explicitly document the intentional
omission if you choose not to add it.
In `@src/Console/TestCommand.php`:
- Line 9: The command signature currently re-declares the built-in
Artisan/Symfony `--verbose` flag via the protected `$signature` (in
`TestCommand`) which conflicts with the base `Command`; rename the custom flag
(e.g., `--detailed` or `--show-matches`) in the `$signature` string and update
all usages that call `$this->option('verbose')` (such as in the `handle` method
of `TestCommand`) to use the new option name so you no longer shadow the
framework verbosity option.
In `@src/Core/Dictionary.php`:
- Around line 237-264: The loadLanguageConfig method currently interpolates
$language into $possiblePaths and then require()s the chosen file, so first
validate/sanitize $language before building paths: enforce a strict whitelist or
regex (e.g. only letters, digits, hyphen and underscore, no dots or slashes) or
normalize with basename to prevent path traversal, and if validation fails
return the default ['profanities'=>[], 'false_positives'=>[]]; apply this check
at the top of loadLanguageConfig (before building $possiblePaths and before
require) and ensure $languageFile is only chosen from the sanitized/validated
input.
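A minimal sketch of the whitelist check described above, assuming a standalone helper named `isSafeLanguageName` (hypothetical; the real fix would live inside `loadLanguageConfig`):

```php
<?php
// Hypothetical guard: accept only simple language identifiers before building a file path.
function isSafeLanguageName(string $language): bool
{
    // Letters, digits, hyphen and underscore only. Dots, slashes and the
    // empty string are rejected; \A and \z anchor the whole input.
    return preg_match('/\A[A-Za-z0-9_-]+\z/', $language) === 1;
}
```

When the check fails, the caller would return the empty default config instead of touching the filesystem.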
In `@src/Drivers/PatternDriver.php`:
- Around line 34-56: preg_match_all with PREG_OFFSET_CAPTURE gives byte offsets
but the code uses mb_substr (character offsets) and performs left-to-right
replacements causing multibyte corruption and wrong positions; fix by converting
each match's byte offset ($match[1] stored to $start) to a character offset
before using mb_substr or creating MatchedWord (e.g., compute $charStart =
mb_strlen(substr($lowerText, 0, $start), 'UTF-8')), collect matches into
$matchedWords without mutating $cleanText, then perform replacements
right-to-left (sort matches by $charStart descending) and apply mask->mask on
originalMatch when splicing $cleanText so masking and MatchedWord::position
(used in MatchedWord and returned by functions like PipelineDriver) use correct
character offsets and avoid position drift.
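The byte-to-character conversion and right-to-left replacement can be sketched in isolation. The helper names `byteToCharOffset` and `maskMatches` are hypothetical, and the sketch masks with a repeated character instead of the package's MaskStrategy.

```php
<?php
// Convert a byte offset (as returned with PREG_OFFSET_CAPTURE) to a character offset.
function byteToCharOffset(string $text, int $byteOffset): int
{
    return mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
}

// Hypothetical helper: mask every match of $pattern, working right to left.
function maskMatches(string $text, string $pattern, string $maskChar = '*'): string
{
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    // Collect [charStart, charLength] pairs without mutating the text yet.
    $found = [];
    foreach ($matches[0] as [$matched, $byteOffset]) {
        $found[] = [byteToCharOffset($text, $byteOffset), mb_strlen($matched, 'UTF-8')];
    }

    // Replacing from the end keeps the offsets of earlier matches valid.
    usort($found, fn (array $a, array $b) => $b[0] <=> $a[0]);

    foreach ($found as [$start, $length]) {
        $text = mb_substr($text, 0, $start, 'UTF-8')
            . str_repeat($maskChar, $length)
            . mb_substr($text, $start + $length, null, 'UTF-8');
    }

    return $text;
}
```

With a same-length mask the ordering does not matter, but it becomes essential as soon as a mask can change the length of the text.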
In `@src/Drivers/PhoneticDriver.php`:
- Around line 105-117: The severity filter is being applied to $matchedWords
after masking, causing cleanText to still contain masks for words that were
filtered out; in PhoneticDriver (the method that builds $matchedWords and
$cleanText) either move the masking step so it runs after applying the severity
filter to $matchedWords, or reconstruct cleanText from the surviving MatchedWord
instances (using their offsets/lengths) before returning the Result; ensure
Score::calculate still receives the filtered $matchedWords and the returned
Result($text, $cleanText, $matchedWords, $scoreValue) reflects the post-filtered
state.
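The ordering issue is easier to see on a reduced example. In the sketch below the match shape (`start`, `length`, `severity` as an integer rank) and the function name `maskBySeverity` are assumptions; the point is only that filtering runs before masking, so the clean text and the match list agree.

```php
<?php
// Hypothetical match shape: ['start' => int, 'length' => int, 'severity' => int],
// with offsets and lengths measured in characters.
function maskBySeverity(string $text, array $matches, int $minimumSeverity, string $maskChar = '*'): array
{
    // 1. Filter first, so only surviving matches are masked.
    $kept = array_values(array_filter(
        $matches,
        fn (array $m) => $m['severity'] >= $minimumSeverity
    ));

    // 2. Mask right to left so earlier offsets stay valid.
    $ordered = $kept;
    usort($ordered, fn (array $a, array $b) => $b['start'] <=> $a['start']);

    $clean = $text;
    foreach ($ordered as $m) {
        $clean = mb_substr($clean, 0, $m['start'], 'UTF-8')
            . str_repeat($maskChar, $m['length'])
            . mb_substr($clean, $m['start'] + $m['length'], null, 'UTF-8');
    }

    return ['clean' => $clean, 'matches' => $kept];
}
```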
In `@src/Drivers/PipelineDriver.php`:
- Around line 63-69: The masking loop in PipelineDriver (variables $cleanText,
$reversed, and $match->position) assumes $match->position is a character offset
but sub-drivers may supply byte offsets; before using mb_substr with
$match->position convert the offset to a character index (e.g., compute
character count of the prefix up to the byte offset using a binary-safe
substring and mb_strlen with UTF-8) or, alternatively, ensure the producing
driver (e.g., PatternDriver) returns character offsets instead; update the loop
to use the converted character offset when slicing and applying $mask->mask so
multibyte strings are handled correctly.
In `@src/Drivers/RegexDriver.php`:
- Around line 117-129: The severity filter is applied to $matchedWords after
$workingCleanString has already been masked, causing clean() to still show
masked low-severity words while isOffensive()/count() exclude them; fix by
applying the severity filter (check $minimumSeverity instanceof Severity and
filter $matchedWords accordingly) before performing the masking that produces
$workingCleanString (or alternatively construct a separate masked string from
the filtered $matchedWords), then compute $totalWords, call Score::calculate
with the filtered $matchedWords, and return the Result so clean(), isOffensive()
and count() are consistent.
- Around line 58-95: preg_match_all returns byte/char offsets relative to
$normalizedString but the code uses $start directly on $workingCleanString and
$originalNormalized, causing misaligned masks after normalizers like
GermanNormalizer/SpanishNormalizer change lengths; fix by mapping
normalized-string match offsets back to original-string character offsets before
any substring/mask/replacement: build a position map between $normalizedString
and the original input (or compute $startOrig = mapNormalizedToOriginal($start)
and $lengthOrig = mb_strlen($matchedText, 'UTF-8') measured in original
coordinates), then use those original offsets for getFullWordContext,
$mask->mask application, and mb_substr on $workingCleanString and
$normalizedString (or alternatively perform matching on the original string and
validate with normalized checks via compoundDetector->isPureAlphaSubstring and
filter methods) so $start/$length always refer to the same coordinate system
across RegexDriver methods and updates.
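One way to build such a position map is to normalise character by character and record where each output character came from. This is a sketch under assumptions: the names `normalizeWithMap` and `toOriginalSpan` are hypothetical, the replacement table stands in for whatever the language normalizers do, and `mb_str_split()` requires PHP 7.4 or later.

```php
<?php
// Hypothetical: normalise $text one character at a time and remember, for every
// character of the normalised string, the index of the original character.
function normalizeWithMap(string $text, array $replacements): array
{
    $normalized = '';
    $map = [];

    foreach (mb_str_split($text, 1, 'UTF-8') as $index => $char) {
        $replacement = $replacements[$char] ?? $char;
        $normalized .= $replacement;

        // One map entry per produced character (a replacement may be longer than one).
        $produced = mb_strlen($replacement, 'UTF-8');
        for ($i = 0; $i < $produced; $i++) {
            $map[] = $index;
        }
    }

    return [$normalized, $map];
}

// Translate a [start, length] span (length >= 1) from normalised to original characters.
function toOriginalSpan(array $map, int $start, int $length): array
{
    $origStart = $map[$start];
    $origEnd = $map[$start + $length - 1];

    return [$origStart, $origEnd - $origStart + 1];
}
```

Matching then runs on the normalised string, while masking uses the translated span on the original text.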
In `@src/Facades/Blasp.php`:
- Around line 64-78: The current assertChecked() and assertCheckedTimes(int
$times) silently no-op when the facade root isn't a BlaspFake; update both
methods so that after obtaining $instance = static::getFacadeRoot() they check
if $instance is a BlaspFake and call the fake's methods as before, but if not,
throw a LogicException (or appropriate runtime exception) with a clear message
instructing the developer to call Blasp::fake() in their test; reference these
methods (assertChecked, assertCheckedTimes) and the BlaspFake type so the thrown
error is specific and actionable.
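The fail-loudly behaviour can be shown without Laravel. In this sketch `FakeChecker` is a stand-in for `BlaspFake` and `requireFake` is a hypothetical helper; the real change would sit inside `assertChecked()` and `assertCheckedTimes()` on the facade.

```php
<?php
// Stand-in for the package's BlaspFake; only what the sketch needs.
final class FakeChecker
{
    public int $checks = 0;
}

// Hypothetical guard: return the fake, or fail loudly when no fake was installed.
function requireFake(object $root): FakeChecker
{
    if (!$root instanceof FakeChecker) {
        throw new LogicException(
            'Assertions are only available after calling Blasp::fake() in the test.'
        );
    }

    return $root;
}
```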
In `@src/Rules/Profanity.php`:
- Around line 15-34: The static factory methods in Profanity (in, maxScore,
severity) are non-composable because each creates a new self; change the design
so that in(string $language) remains a static factory that returns a new
Profanity instance, and convert maxScore(int $score) and severity(Severity
$severity) into instance fluent methods that set $this->maxScore and
$this->minimumSeverity respectively and return $this, allowing calls like
Profanity::in('es')->maxScore(50)->severity(Severity::High); if you still want
standalone static entry points, add separate static constructors (e.g., static
maxScoreFactory(int $score)) rather than making the mutators static.
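The composable shape suggested for the validation rule could look roughly like this. `ProfanityRuleSketch` and `SeverityLevel` are stand-ins with assumed names and cases, not the package's `Profanity` rule or `Severity` enum, and the sketch needs PHP 8.1 for enums.

```php
<?php
// Stand-in for the package's Severity enum; the cases here are assumptions.
enum SeverityLevel: string
{
    case Mild = 'mild';
    case High = 'high';
}

// Hypothetical rule shape: one static entry point, instance mutators that return $this.
final class ProfanityRuleSketch
{
    public ?int $maxScore = null;
    public ?SeverityLevel $minimumSeverity = null;

    private function __construct(public string $language)
    {
    }

    public static function in(string $language): self
    {
        return new self($language);
    }

    public function maxScore(int $score): self
    {
        $this->maxScore = $score;

        return $this;
    }

    public function severity(SeverityLevel $severity): self
    {
        $this->minimumSeverity = $severity;

        return $this;
    }
}
```

Because only `in()` creates an instance, a chain such as `ProfanityRuleSketch::in('es')->maxScore(50)->severity(SeverityLevel::High)` keeps every setting on the same object.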
🟡 Minor comments (18)
config/blasp.php-158-185 (1)
158-185: ⚠️ Potential issue | 🟡 Minor
Duplicate entries in substitution arrays.
A few substitution arrays contain duplicate characters:
- Line 159 (`/a/`): `'Â'` appears twice.
- Line 173 (`/o/`): `'ø'` appears twice.
These won't cause functional issues but add unnecessary noise. Consider deduplicating them.
src/Testing/BlaspFake.php-6-6 (1)
6-6: ⚠️ Potential issue | 🟡 Minor
Unused import: `PendingCheck`.
`PendingCheck` is imported but never referenced in this class.
Proposed fix
     use Blaspsoft\Blasp\Core\Result;
    -use Blaspsoft\Blasp\PendingCheck;
     use PHPUnit\Framework\Assert;
README.md-30-33 (1)
30-33: ⚠️ Potential issue | 🟡 Minor
Clarify minimum Laravel version vs tested version.
The README states "Laravel 8.0+" but `require-dev` in `composer.json` pins `orchestra/testbench: ^10.0` (Laravel 11+). While `illuminate/support: ^8.0|...` technically allows Laravel 8, the test suite doesn't verify compatibility with Laravel 8–10. Consider noting that Laravel 8–10 support is best-effort/untested, or aligning the claim with what's actually tested.
README.md-226-230 (1)
226-230: ⚠️ Potential issue | 🟡 Minor
README.md example has incorrect property types for nullable properties.
Lines 228–229 declare `$blaspLanguage` and `$blaspMask` as non-nullable `string`, but the Blaspable trait defines them as `string|null` (see trait docblock lines 16–17) and the implementation uses null coalescing operators (lines 47, 51) to check for null values. The comments "// null = config default" further indicate these should be nullable. Users following this example and setting values to `null` would encounter a TypeError.
The example should use `?string` to match the trait's actual type hints:
    -    protected string $blaspLanguage = 'spanish'; // null = config default
    -    protected string $blaspMask = '#'; // null = config default
    +    protected ?string $blaspLanguage = 'spanish'; // null = config default
    +    protected ?string $blaspMask = '#'; // null = config default
src/Drivers/PatternDriver.php-25-25 (1)
25-25: ⚠️ Potential issue | 🟡 Minor
Use `mb_strtolower` for false positives to handle multibyte characters correctly.
`strtolower` won't correctly lowercase multibyte characters (e.g., `'Ñ'` → `'ñ'`). Since the profanity list uses `mb_strtolower` (Line 31), the comparison on Line 41 may fail for multibyte false positives.
    -        $falsePositives = array_map('strtolower', $dictionary->getFalsePositives());
    +        $falsePositives = array_map(fn($fp) => mb_strtolower($fp, 'UTF-8'), $dictionary->getFalsePositives());
src/Core/Matchers/CompoundWordDetector.php-7-7 (1)
7-7: ⚠️ Potential issue | 🟡 Minor
SUFFIXES is English-only and misses language-specific morphology.
The hardcoded suffix list (`s`, `es`, `ed`, `ing`, etc.) is English-centric. In a multi-language system, German (e.g., `-en`, `-ung`, `-lich`), Spanish (e.g., `-ado`, `-ción`), and French (e.g., `-ment`, `-tion`) suffixes are not covered. This means legitimate profanity+suffix forms in non-English languages won't be recognized, causing false negatives.
Consider making the suffix list configurable per language or per-dictionary.
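A per-language suffix table might be sketched as below. The constant and function names are hypothetical, and the suffix entries are just the examples named in this comment, not a vetted morphological list.

```php
<?php
// Hypothetical per-language suffix table, seeded with the examples from the review comment.
const SUFFIXES_BY_LANGUAGE = [
    'english' => ['s', 'es', 'ed', 'ing'],
    'german'  => ['en', 'ung', 'lich'],
    'spanish' => ['ado', 'ción'],
    'french'  => ['ment', 'tion'],
];

// True when $word is $base followed by one of the suffixes configured for $language.
function isBasePlusSuffix(string $word, string $base, string $language): bool
{
    foreach (SUFFIXES_BY_LANGUAGE[$language] ?? [] as $suffix) {
        if ($word === $base . $suffix) {
            return true;
        }
    }

    return false;
}
```

In the package the table would more plausibly come from each language's config file, so that adding a language does not require code changes.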
tests/MultiLanguageProfanityTest.php-43-49 (1)
43-49: ⚠️ Potential issue | 🟡 Minor
Duplicate array key `'scheisse'` on lines 44–45.
Both entries have identical key and value, so the second silently overwrites the first and the array effectively has only 4 entries instead of 5. Remove the duplicate or replace it with a different profanity (e.g., `'Scheiße'` to test ß normalization).
Proposed fix
     $testCases = [
    -    'scheisse' => 'Das ist scheisse',
         'scheisse' => 'Das ist scheisse',
         'arsch' => 'Du bist ein arsch',
         'ficken' => 'Ich will ficken',
src/Middleware/CheckProfanity.php-18-21 (1)
18-21: ⚠️ Potential issue | 🟡 Minor
Potential `null` severity when config value is invalid.
In the else branch (no `$severity` parameter), `Severity::tryFrom(config('blasp.middleware.severity', 'mild'))` can return `null` if the configured string doesn't match a valid `Severity` case. Line 36 guards against `null`, but then no severity filter is applied, which silently degrades to no filtering rather than failing with a clear error. Consider adding a fallback like the `?? Severity::Mild` used in the if-branch:
Proposed fix
    -        $minimumSeverity = $severity ? (Severity::tryFrom($severity) ?? Severity::Mild) : Severity::tryFrom(config('blasp.middleware.severity', 'mild'));
    +        $minimumSeverity = $severity
    +            ? (Severity::tryFrom($severity) ?? Severity::Mild)
    +            : (Severity::tryFrom(config('blasp.middleware.severity', 'mild')) ?? Severity::Mild);
src/Middleware/CheckProfanity.php-63-72 (1)
63-72: ⚠️ Potential issue | 🟡 Minor
`extractTextFields` skips nested input arrays, which could leave nested form data unchecked.
When request data contains nested arrays (e.g., `address[street]`, `user[name]`), the current implementation silently ignores them since it only processes string values. If comprehensive profanity checking across all user-submitted text is required, consider flattening nested arrays or recursively traversing them in `extractTextFields`.
Note: No tests currently exercise nested input with this middleware. Verify whether your application uses nested form data with the `blasp` middleware before implementing this.
src/Drivers/PhoneticDriver.php-62-93 (1)
62-93: ⚠️ Potential issue | 🟡 Minor
Byte-offset vs. character-offset mismatch in masking and offset drift from normalization.
Two related concerns with the offset handling:
- `PREG_OFFSET_CAPTURE` (line 66) returns byte offsets, but `mb_substr` on line 93 expects character offsets. For ASCII this is identical, but any multi-byte character (emoji, accented char) before a match will cause misaligned masking.
- Tokenization runs on `$normalized` text, but masking is applied to `$cleanText` (initialized from the original `$text`). If normalization changes character positions (e.g., accent stripping reduces multi-byte chars to single-byte), offsets won't correspond.
Since the phonetic driver is currently English-only and the English normalizer likely preserves ASCII positions, this is low-risk today but will break if `supportedLanguages` is expanded.
src/BlaspManager.php-76-87 (1)
76-87: ⚠️ Potential issue | 🟡 Minor
Potential infinite recursion if pipeline config includes `'pipeline'` as a sub-driver.
`createPipelineDriver()` calls `resolveDriver()` for each configured sub-driver name. If `blasp.drivers.pipeline.drivers` contains `'pipeline'`, this recurses infinitely. A guard clause would prevent this misconfiguration from causing a stack overflow.
🛡️ Proposed guard
     public function createPipelineDriver(): DriverInterface
     {
         $config = $this->app['config']->get('blasp.drivers.pipeline', []);
         $driverNames = $config['drivers'] ?? ['regex', 'phonetic'];
    +    $driverNames = array_filter($driverNames, fn (string $name) => $name !== 'pipeline');
         $resolvedDrivers = array_map(
             fn (string $name) => $this->resolveDriver($name),
             $driverNames,
         );
         return new PipelineDriver($resolvedDrivers);
     }
tests/BlaspCheckTest.php-163-163 (1)
163-163: ⚠️ Potential issue | 🟡 Minor
Typo in method name: `test_word_boudary` → `test_word_boundary`.
Proposed fix
    -    public function test_word_boudary()
    +    public function test_word_boundary()
tests/BlaspCheckTest.php-175-175 (1)
175-175: ⚠️ Potential issue | 🟡 Minor
Typo in method name: `test_pural_profanity` → `test_plural_profanity`.
Proposed fix
    -    public function test_pural_profanity()
    +    public function test_plural_profanity()
src/PendingCheck.php-149-157 (1)
149-157: ⚠️ Potential issue | 🟡 Minor
`$falsePositives` parameter is accepted but silently ignored.
The PHPMD hint is valid. This backward-compat method accepts `$falsePositives` but never uses it, so callers may expect false positives to be applied. Either wire it into the dictionary (e.g., add to an internal `falsePositivesList`) or document why it's intentionally a no-op.
If intentionally a no-op, annotate to suppress the warning and document the reason
    +    /**
    +     * @deprecated Backward-compat method. Only $profanities (→ blockList) is supported;
    +     * false positives are now managed via language config files.
    +     */
         public function configure(?array $profanities = null, ?array $falsePositives = null): self
         {
             if ($profanities !== null) {
                 $this->blockList = array_merge($this->blockList, $profanities);
             }
    +        // $falsePositives intentionally unused: false positives are managed per-language in config.
             return $this;
         }
tests/ResultCachingTest.php-103-119 (1)
103-119: ⚠️ Potential issue | 🟡 Minor
Bug: `$keys` is overwritten before the verification loop, so the loop body never executes.
On line 107, `$keys` captures the original cache keys. On line 112, `$keys` is reassigned to the result of a fresh `Cache::get(...)` call, which returns `[]` after the cache was cleared. The `foreach` on line 116 then iterates over this empty array, so it never verifies that the individual cached results were actually deleted.
Proposed fix: preserve the original keys for verification
     public function test_clear_cache_wipes_result_cache(): void
     {
         Blasp::check('This is a fucking sentence');
    -    $keys = Cache::get('blasp_result_cache_keys', []);
    +    $originalKeys = Cache::get('blasp_result_cache_keys', []);
    -    $this->assertNotEmpty($keys);
    +    $this->assertNotEmpty($originalKeys);
         Dictionary::clearCache();
    -    $keys = Cache::get('blasp_result_cache_keys', []);
         $this->assertNull(Cache::get('blasp_result_cache_keys'));
         // Verify the cached result data was also cleared
    -    foreach ($keys as $key) {
    +    foreach ($originalKeys as $key) {
             $this->assertNull(Cache::get($key));
         }
     }
src/Core/Dictionary.php-49-50 (1)
49-50: ⚠️ Potential issue | 🟡 Minor
Use `mb_strtolower` instead of `strtolower` for multi-language support.
`strtolower` is not Unicode-safe. For languages like German (`Ärsch` should become `ärsch`), accented characters may not be lowercased correctly. Since v4 explicitly supports non-English languages, use `mb_strtolower` throughout for consistency.
Proposed fix
    -        $this->allowList = array_map('strtolower', $allowList);
    -        $this->blockList = array_map('strtolower', $blockList);
    +        $this->allowList = array_map('mb_strtolower', $allowList);
    +        $this->blockList = array_map('mb_strtolower', $blockList);
Also apply to line 65 and line 181:
    -            fn($p) => !in_array(strtolower($p), $this->allowList)
    +            fn($p) => !in_array(mb_strtolower($p), $this->allowList)
    -        $lower = strtolower($word);
    +        $lower = mb_strtolower($word);
src/Core/Result.php-156-156 (1)
156-156: ⚠️ Potential issue | 🟡 Minor
`str_word_count()` is unreliable for non-English/multi-byte text.
`str_word_count` uses locale-dependent rules and may miscount words in languages with accented characters, CJK scripts, or other non-ASCII text. Since v4 explicitly supports Spanish, French, and German, this could produce incorrect score calculations.
Consider using a Unicode-aware word count, e.g., `preg_match_all('/\S+/u', $text)` or `count(preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY))`.
Proposed fix
    -        $totalWords = max(1, str_word_count($originalText ?: implode(' ', $words)));
    +        $totalWords = max(1, count(preg_split('/\s+/u', trim($originalText ?: implode(' ', $words)), -1, PREG_SPLIT_NO_EMPTY)));
src/PendingCheck.php-39-43 (1)
39-43: ⚠️ Potential issue | 🟡 Minor
`strict()` mode is set but never consumed during analysis; it only changes the cache key.
`strictMode` is toggled via `strict()` (line 89) and included in the cache key (line 301), but unlike `lenientMode` (which forces the 'pattern' driver on line 254), `strictMode` does not influence driver selection or analysis behavior. The `$options` array passed to the analyzer contains only `severity` (if set), and `strictMode` is never passed to the driver's `detect()` method. This appears to be an incomplete implementation. Either use `strictMode` to influence driver selection or behavior, or remove it to avoid confusion.
🧹 Nitpick comments (37)
src/Core/Masking/GrawlixMask.php (1)
11-18: `$word` parameter is unused; expected given the interface contract, but worth a brief note.
The static analysis tool flags `$word` as unused. This is inherent to the `MaskStrategyInterface` contract (other implementations like `CallbackMask` do use it). Consider suppressing with `@SuppressWarnings` or a brief inline comment for clarity.
config/blasp.php (1)
26-27: Backward-compat aliases can drift from primary keys.
The aliases (`default_language`, `mask_character`, `cache_driver`) duplicate the values of the primary keys (`language`, `mask`, `cache.driver`). If a user edits one in a published config without updating the other, consumers reading different keys will see inconsistent values. Consider having the aliases reference the primary keys instead (e.g., `'default_language' => null, // deprecated, use 'language'`) or resolve them at runtime in the service provider.
Also applies to: 39-40, 76-77
src/Core/Matchers/PhoneticMatcher.php (1)
63-66: Inconsistent string length functions: `strlen` vs `mb_strlen`.
Line 65 uses `strlen()` to compute `$maxLen` for the Levenshtein threshold, while lines 25 and 47 use `mb_strlen()` for the minimum word length check. Since `levenshtein()` operates on bytes, using `strlen()` here is arguably correct for threshold consistency with the distance value. However, mixing the two could produce surprising results if non-ASCII input reaches this code path (e.g., a 2-character word made of multi-byte chars has an `mb_strlen` of 2 but a larger `strlen`, so the two length checks disagree).
Given the phonetic driver is English-only per config, this is low-risk, but worth aligning for defensive correctness.
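The byte-versus-character gap is small to demonstrate. The sample strings below are arbitrary and the snippet assumes the source file is saved as UTF-8.

```php
<?php
// Byte length and character length differ for non-ASCII input.
$word = 'äß';
$bytes = strlen($word);              // counts UTF-8 bytes
$chars = mb_strlen($word, 'UTF-8');  // counts characters

// levenshtein() compares bytes, so changing one multi-byte character
// to an ASCII one costs more than a single edit.
$distance = levenshtein('ä', 'a');
```

Here `$bytes` is 4, `$chars` is 2, and `$distance` is 2, which is why a threshold derived from `strlen()` and a minimum length derived from `mb_strlen()` can disagree about the same word.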
tests/DetectionStrategyRegistryTest.php (1)
14-14: Test class name no longer reflects what it tests.
`DetectionStrategyRegistryTest` now tests `BlaspManager` rather than a registry. Consider renaming to `BlaspManagerTest` to match the new architecture and avoid confusion.
src/Core/Matchers/FalsePositiveFilter.php (1)
125-139:getFullWordContextis not multibyte-safe, unlike sibling methods.This method uses byte-level string access (
$string[$left - 1],strlen(),substr()) which can break on multi-byte UTF-8 characters — splitting mid-character or producing corrupted substrings. In contrast,isSpanningWordBoundaryin the same class correctly usesmb_strlen/mb_substr.If this code only processes ASCII-normalized text, it's fine in practice, but it's an inconsistency worth noting.
♻️ Multibyte-safe version
 public function getFullWordContext(string $string, int $start, int $length): string
 {
     $left = $start;
     $right = $start + $length;
+    $strLen = mb_strlen($string, 'UTF-8');
-    while ($left > 0 && preg_match('/\w/', $string[$left - 1])) {
+    while ($left > 0 && preg_match('/\w/u', mb_substr($string, $left - 1, 1, 'UTF-8'))) {
         $left--;
     }
-    while ($right < strlen($string) && preg_match('/\w/', $string[$right])) {
+    while ($right < $strLen && preg_match('/\w/u', mb_substr($string, $right, 1, 'UTF-8'))) {
         $right++;
     }
-    return substr($string, $left, $right - $left);
+    return mb_substr($string, $left, $right - $left, 'UTF-8');
 }
tests/ProfanityExpressionGeneratorTest.php (1)
7-7: Test class/file name doesn't match the class under test.
`ProfanityExpressionGeneratorTest` now tests `RegexMatcher`. Consider renaming to `RegexMatcherTest` for discoverability and consistency.
src/Testing/BlaspFake.php (1)
59-63: `__call` magic method may silently swallow typos and invalid method calls.
The catch-all `__call` means any misspelled method in tests (e.g., `->diver('regex')` instead of `->driver('regex')`) will silently return `$this` instead of failing. Since you already have explicit no-op methods for the known fluent API surface, consider removing `__call` or at least logging/tracking unknown calls to aid debugging.
tests/EdgeCaseTest.php (1)
9-41: Tests still use deprecated v3 method names.
These tests call `hasProfanity()` and `getUniqueProfanitiesFound()`, which are preserved as deprecated aliases. Since this is a v4 rewrite, consider migrating the test assertions to the v4 equivalents (`isOffensive()`, `uniqueWords()`) to lead by example and avoid deprecation noise in future tooling.
src/Core/Normalizers/EnglishNormalizer.php (1)
1-11: `EnglishNormalizer` is functionally identical to `NullNormalizer`.
Both classes implement `StringNormalizer` and return the input unchanged. Additionally, `src/Normalizers/EnglishStringNormalizer` (extending the abstract `StringNormalizer`) does the same thing. Consider whether all three are needed, or if `EnglishNormalizer` could simply be an alias or removed in favor of `NullNormalizer`.
composer.json (1)
18-24: `orchestra/testbench` `^10.0` limits tested Laravel versions.
`illuminate/support` supports `^8.0` through `^12.0`, but `orchestra/testbench: ^10.0` only tests against Laravel 11/12. Earlier Laravel versions (8–10) are untested in CI. If those are officially supported, consider a test matrix with multiple testbench versions, or narrow the `illuminate/support` constraint to match what's actually tested.
tests/BladeDirectiveTest.php (1)
9-17: `renderBlade` helper is fine for test-only usage but could be fragile.
Using `eval()` on compiled Blade output is a common pattern for Blade directive testing in Orchestra Testbench. However, if `Blade::compileString` throws or returns malformed PHP, the `ob_start()` buffer will leak. Consider wrapping in a try/finally to ensure `ob_end_clean()` on failure.
🛡️ Suggested defensive improvement
 protected function renderBlade(string $template, array $data = []): string
 {
     $compiled = Blade::compileString($template);
     ob_start();
-    extract($data);
-    eval('?>' . $compiled);
-    return ob_get_clean();
+    try {
+        extract($data);
+        eval('?>' . $compiled);
+        return ob_get_clean();
+    } catch (\Throwable $e) {
+        ob_end_clean();
+        throw $e;
+    }
 }
src/Events/ProfanityDetected.php (1)
7-13: Minor: `originalText` may be redundant with `$result->original()`.
`Result` already exposes `original()`, which returns the original text. The `$originalText` property duplicates this. If they're always the same, consider removing the extra property to keep the event lean. Not blocking.
src/Drivers/PatternDriver.php (1)
17-19: Null coalescing on a non-nullable `string` parameter is redundant.
`$text` is typed `string`, so `$text ?? ''` can never trigger the fallback.
- return new Result($text ?? '', $text ?? '', [], 0);
+ return new Result($text, $text, [], 0);
src/Drivers/PipelineDriver.php (1)
19-21: Redundant null coalescing on non-nullable `string` parameter.
Same minor nit as `PatternDriver`: `$text` is typed `string`, so `$text ?? ''` is always `$text`.
- return new Result($text ?? '', $text ?? '', [], 0);
+ return new Result($text, $text, [], 0);
tests/PipelineDriverTest.php (1)
5-5: Unused import: `PipelineDriver`.
`PipelineDriver` is imported but never referenced; all tests interact through the `Blasp` facade.
Proposed fix
-use Blaspsoft\Blasp\Drivers\PipelineDriver;
 use Blaspsoft\Blasp\Enums\Severity;
src/Middleware/CheckProfanity.php (1)
25-29: `$except` filter is silently bypassed when `$fields` is explicitly configured.
When `$fields !== ['*']`, line 28 overwrites `$input` with `$request->only($fields)`, discarding the `$except` exclusion from line 25. If someone accidentally adds `password` to the `fields` config, it will be scanned. Consider intersecting:
Proposed fix
- $input = $request->except($except);
-
- if ($fields !== ['*']) {
-     $input = $request->only($fields);
- }
+ if ($fields !== ['*']) {
+     $input = $request->only(array_diff($fields, $except));
+ } else {
+     $input = $request->except($except);
+ }
tests/TestCase.php (1)
22-30: Several `Config::set` calls are no-ops (setting a key to its own value).
Lines like `Config::set('blasp.profanities', config('blasp.profanities'))` read the merged config and write the same value back; these are effectively no-ops since the service provider's `mergeConfigFrom` has already loaded them. Consider removing the redundant lines and keeping only the overrides that set explicit test values (like lines 22, 27, 28).
Also, both `blasp.cache.driver` (line 29) and `blasp.cache_driver` (line 30) are present; if v4 uses the nested key, the flat key may be a leftover.
src/Core/Normalizers/SpanishNormalizer.php (1)
19-26: Accented vowels in the `ll` lookahead are dead after normalization.
By line 19, `strtr` has already replaced all accented vowels (á→a, etc.). The accented characters in the lookahead `(?=[aeiouáéíóúü])` on line 21 will never match. They're harmless but misleading; the character class could be simplified to `(?=[aeiou])`.
src/Drivers/RegexDriver.php (2)
22-24: Redundant null coalescing on typed `string` parameter.
`$text` is typed as `string` (non-nullable) on line 20, so `$text ?? ''` on line 23 can never trigger the fallback.
Proposed fix
 if (empty($text)) {
-    return new Result($text ?? '', $text ?? '', [], 0);
+    return new Result($text, $text, [], 0);
 }
30-31: Per-call objects stored as instance properties: thread-unsafe code smell.
`FalsePositiveFilter` and `CompoundWordDetector` are created per `detect()` call but stored as instance fields (`$this->filter`, `$this->compoundDetector`). If the `RegexDriver` instance is reused across calls (e.g., as a singleton in the container), this overwrites shared state. Consider using local variables instead.
Proposed fix
 class RegexDriver implements DriverInterface
 {
-    private FalsePositiveFilter $filter;
-    private CompoundWordDetector $compoundDetector;
-
     public function detect(string $text, Dictionary $dictionary, MaskStrategyInterface $mask, array $options = []): Result
     {
         // ...
-        $this->filter = new FalsePositiveFilter($dictionary->getFalsePositives());
-        $this->compoundDetector = new CompoundWordDetector();
+        $filter = new FalsePositiveFilter($dictionary->getFalsePositives());
+        $compoundDetector = new CompoundWordDetector();
Then replace all `$this->filter`/`$this->compoundDetector` references with the local variables.
tests/AllLanguagesDetectionTest.php (2)
67-70: Duplicate test variant `'scheisse'` in the German list.
The array on line 69 contains `'scheisse'` twice (positions 3 and 5). This doesn't break anything but wastes a test iteration.
- 'scheisse' => ['SCHEISSE', 'Scheisse', 'scheisse', 'ScHeIsSe', 'scheisse']
+ 'scheisse' => ['SCHEISSE', 'Scheisse', 'scheisse', 'ScHeIsSe']
100-122: German normalizer tests don't exercise umlauts or `ß`.
The German normalizer presumably handles `ö`, `ü`, `ä`, `ß` (e.g., `Scheiße`→`scheisse`). The test only verifies `scheisse` without diacritics, which means the normalizer's umlaut/ß handling is untested here. Consider adding variants like `'Scheiße'` and `'Ärsch'`.
tests/MultiLanguageProfanityTest.php (1)
123-158: `test_comprehensive_language_coverage` invokes a full check per profanity word, which is potentially slow.
This iterates over every profanity in every language dictionary and creates a new `PendingCheck` + driver execution for each one. Depending on dictionary size, this could be hundreds or thousands of invocations. Consider whether this is intended as a development/CI-only smoke test, and if so, annotating it with a PHPUnit group (e.g., `#[Group('slow')]`) so it can be excluded from fast feedback loops.
src/Core/Matchers/CompoundWordDetector.php (1)
9-47: Method name `isPureAlphaSubstring` is misleading about return semantics.
Returning `true` means "this match is a pure alpha substring of a larger word and should be suppressed." The caller in `RegexDriver` (line 77) uses it in a `continue` guard, so `true` → skip the match. The name reads like a predicate about the string's nature, but it's actually a "should suppress" signal. Consider renaming to something like `shouldSuppressAsSubstring` for clarity.
src/Blaspable.php (1)
93-102: `withoutBlaspChecking`: consider concurrency with queued/async saves.
The `static` flag pattern is correct for synchronous request-scoped use. However, if a model save is deferred or dispatched to a queue inside the callback, the flag will have already been re-enabled by the time the queued job actually saves. This is a known limitation of this pattern (Laravel's own `Model::withoutEvents` has the same caveat), but worth a note in the docblock for users.
src/Drivers/PhoneticDriver.php (1)
27-29: Null coalesce on non-nullable `string` parameter is redundant.
`$text ?? ''` on line 28 is unnecessary since `$text` is typed as `string` (non-nullable). Minor nit; no functional impact.
Suggested simplification
 if (empty($text)) {
-    return new Result($text ?? '', $text ?? '', [], 0);
+    return new Result($text, $text, [], 0);
 }
src/BlaspServiceProvider.php (1)
53-64: PHPMD `$attribute` unused warning is a false positive; Laravel's validator callback signature requires it.
The `$attribute` parameter is part of Laravel's `Validator::extend()` callback contract `($attribute, $value, $parameters, $validator)`. Suppressing the warning with a docblock annotation would be the cleanest approach if PHPMD noise is a concern.
tests/PhoneticDriverTest.php (1)
145-153: Add a `clean()` assertion to fully validate severity filtering.
This test verifies that `isClean()` returns `true` when the severity threshold filters out "fuck", but it doesn't assert on `$result->clean()`. Given the masking-before-filtering bug in `PhoneticDriver` (raised separately), the clean text may still contain asterisks even though `isClean()` is `true`. Adding `$this->assertSame('What the fuck', $result->clean())` would catch this inconsistency.
Proposed additional assertion
 // "fuck" is typically High severity, not Extreme, so should be filtered out
 $this->assertTrue($result->isClean());
+$this->assertSame('What the fuck', $result->clean());
tests/ConfigurationLoaderLanguageTest.php (1)
116-118: Inconsistent import: uses FQCN inline instead of a `use` statement.
Line 117 references `\Blaspsoft\Blasp\Facades\Blasp` as a fully-qualified class name, while the rest of the test files in this PR import the facade via a `use` statement at the top.
Proposed fix
Add to the imports at the top of the file:
 use Blaspsoft\Blasp\Core\Dictionary;
+use Blaspsoft\Blasp\Facades\Blasp;
 use Blaspsoft\Blasp\Core\Normalizers\EnglishNormalizer;
Then update line 117:
- $result = \Blaspsoft\Blasp\Facades\Blasp::french()->check('connard');
+ $result = Blasp::french()->check('connard');
tests/ResultCachingTest.php (1)
141-155: Misleading comment: no PHP state is actually cleared between the two calls.
The comment on line 145 says "Clear PHP state but keep cache" but no action follows; the second `check()` call runs immediately. The test still validates that a cached result deserializes correctly (the second call hits cache and returns via `Result::fromArray`), but the comment is misleading. Consider either removing it or adding an actual state reset (e.g., clearing the resolved facade instance) between the two calls to make the intent explicit.
src/PendingCheck.php (2)
252-258: Lenient mode silently overrides an explicitly set driver name.
If a caller chains `->driver('regex')->lenient()->check(...)`, the explicit `'regex'` driver is silently replaced with `'pattern'` at line 255. `strict()` mode has no analogous override. This asymmetry could surprise users.
Consider either documenting this precedence clearly or throwing/warning when both an explicit driver and lenient mode are set.
316-322: Unbounded growth of `blasp_result_cache_keys` tracking array.
Every unique check appends to this array (stored `forever`). Over time in high-throughput applications, this list can grow large, degrading the performance of `clearCache()` and the storage cost of the tracking key itself.
Consider bounding the list (e.g., capping at a configurable max), using a time-based eviction strategy, or switching to a cache tag-based approach if the underlying cache driver supports tags.
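One way the "cap at a configurable max" option could look is sketched below. `trackCacheKey()` and its `$max` parameter are hypothetical names for illustration; the sketch operates on a plain array rather than the real cache store, and assumes `$max >= 1`.

```php
<?php
// Hypothetical helper: appends a cache key to a tracking list while keeping
// at most $max entries. Not the package's implementation.
function trackCacheKey(array $keys, string $key, int $max = 1000): array
{
    // Drop an existing copy so the key moves to the most-recent end.
    $keys = array_values(array_diff($keys, [$key]));
    $keys[] = $key;

    // Keep only the newest $max keys (assumes $max >= 1).
    return array_slice($keys, -$max);
}

$keys = [];
foreach (['a', 'b', 'c', 'a', 'd'] as $k) {
    $keys = trackCacheKey($keys, $k, 3);
}

print_r($keys); // the three most recently used keys: c, a, d
```

A trade-off to keep in mind: keys evicted from the tracking list would no longer be visible to a clear operation that iterates the list, so this only works if the cached results themselves carry a TTL.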
src/Core/Result.php (1)
117-137: `fromArray` silently tolerates missing keys in word data, which could mask upstream bugs.
If a cached entry is corrupted or schema-drifts (e.g., missing `'text'` or `'position'`), the constructor will throw an unrelated error deep inside `MatchedWord`. Consider adding a guard or at minimum a null coalesce on the required fields to produce a more informative failure.
src/Core/Dictionary.php (3)
114-156: Multi-language dictionary always uses `EnglishNormalizer`.
`forLanguages()` on line 151 hard-codes `self::getNormalizerForLanguage('english')`. When checking text in e.g. German + Spanish, the English normalizer may not handle language-specific normalization correctly (e.g., ß, ñ, accented chars).
Consider either accepting a normalizer parameter, using a composite normalizer that chains per-language normalizers, or selecting based on the first/primary language.
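The composite-normalizer option might be sketched as follows. Everything here is hypothetical: `NormalizerSketch`, its `normalize()` method, and the two toy per-language classes are stand-ins, since the package's real `StringNormalizer` contract and character mappings are not shown in this thread.

```php
<?php
// Hypothetical interface; the package's real StringNormalizer contract may differ.
interface NormalizerSketch
{
    public function normalize(string $text): string;
}

// Toy per-language rules, for illustration only.
final class GermanSketch implements NormalizerSketch
{
    public function normalize(string $text): string
    {
        return strtr($text, ['ß' => 'ss']);
    }
}

final class SpanishSketch implements NormalizerSketch
{
    public function normalize(string $text): string
    {
        return strtr($text, ['á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u']);
    }
}

// Chains the per-language normalizers in the order given.
final class CompositeSketch implements NormalizerSketch
{
    /** @param NormalizerSketch[] $normalizers */
    public function __construct(private array $normalizers)
    {
    }

    public function normalize(string $text): string
    {
        foreach ($this->normalizers as $normalizer) {
            $text = $normalizer->normalize($text);
        }

        return $text;
    }
}

$composite = new CompositeSketch([new GermanSketch(), new SpanishSketch()]);
echo $composite->normalize('Scheiße está'), "\n"; // prints "Scheisse esta"
```

Chaining is order-sensitive once two languages rewrite the same character differently, so a real implementation would need a documented precedence.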
339-344: `getCache()` logic is duplicated with `PendingCheck::getCache()` (lines 309-314 in PendingCheck.php).
Both methods resolve the cache store identically. Extract into a shared utility or trait to avoid drift.
301-314: Normalizer cache key doesn't normalize case but `match` does.
If `getNormalizerForLanguage('English')` is called, the cache stores it under key `'English'`, but the `match` lowercases to find the correct class. A subsequent call with `'english'` creates a separate instance cached under `'english'`. Normalizing the key would be more robust.
Proposed fix
 public static function getNormalizerForLanguage(string $language): StringNormalizer
 {
-    if (!isset(self::$normalizers[$language])) {
-        self::$normalizers[$language] = match (strtolower($language)) {
+    $key = strtolower($language);
+    if (!isset(self::$normalizers[$key])) {
+        self::$normalizers[$key] = match ($key) {
             'english' => new EnglishNormalizer(),
             'spanish' => new SpanishNormalizer(),
             'german' => new GermanNormalizer(),
             'french' => new FrenchNormalizer(),
             default => new EnglishNormalizer(),
         };
     }
-    return self::$normalizers[$language];
+    return self::$normalizers[$key];
 }
tests/BlaspableTest.php (1)
230-240: Consider adding a test for `withoutBlaspChecking` exception safety.
The `withoutBlaspChecking` implementation uses a `finally` block to re-enable checking even when the callback throws. There's no test verifying that `$blaspCheckingDisabled` is properly reset after an exception inside the callback. If the `finally` clause were accidentally removed, subsequent tests could silently pass with checking disabled.
💡 Suggested additional test
public function test_without_blasp_checking_resets_after_exception()
{
    try {
        BlaspableTestModel::withoutBlaspChecking(function () {
            throw new \RuntimeException('boom');
        });
    } catch (\RuntimeException) {
        // expected
    }

    // Checking should be re-enabled, so profanity should be masked
    $model = BlaspableTestModel::create([
        'body' => 'This is a fucking sentence',
    ]);

    $this->assertStringNotContainsString('fucking', $model->body);
}
…lists
Non-English severity maps (Spanish, French, German) only had 3 tiers (mild, moderate, extreme) while English had 4. Added 'high' tier with representative strong profanity words to each.
Also added 39 words that appeared in severity maps but were missing from profanities arrays (21 English, 5 French, 13 German), which meant they could never be detected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Dictionary: sanitize language parameter to prevent path traversal
via loadLanguageConfig(), forLanguage(), and forLanguages()
- TestCommand: rename --verbose to --detail to avoid conflict with
Symfony Console's built-in -v|--verbose flag
- PatternDriver, PhoneticDriver, RegexDriver: convert PREG_OFFSET_CAPTURE
byte offsets to character offsets for correct multibyte string handling
- PatternDriver, PhoneticDriver, RegexDriver: apply severity filter before
masking so low-severity words aren't masked in cleanText when filtered out
- Blasp facade: throw RuntimeException in assertChecked() and
assertCheckedTimes() when fake() hasn't been called, instead of silently
passing
- Profanity rule: convert static factory methods to instance methods with
__callStatic for backward compat, enabling chaining like
Profanity::in('spanish')->severity(Severity::High)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
config/languages/spanish.php (1)
370-426: ⚠️ Potential issue | 🟠 Major
Spanish profanities list contains common non-vulgar words that will cause false positives.
The tail end of the profanities array includes everyday Spanish words like `triste` (sad), `deprimido`/`deprimida` (depressed), `morir`/`muerte` (to die/death), `fallecido` (deceased), `adiós`/`adios`, `hasta luego`, `hasta pronto`, `chao`, `bye`, `goodbye`, and many others. These are not profanities or vulgar expressions; they are standard vocabulary that will be flagged and masked in normal Spanish text.
This will severely degrade user experience for Spanish-language content. Audit and trim the profanities list to actual vulgar/offensive terms.
🤖 Fix all issues with AI agents
In `@config/languages/english.php`:
- Line 1337: The word "knob" is present in both the profanities array and the
false_positives array in config/languages/english.php, causing conflicting
behavior; remove "knob" from one of those arrays to make intent explicit (either
keep it only in profanities if you intend it to be flagged, or only in
false_positives if you intend it to be ignored), update the corresponding array
(profanities or false_positives) where "knob" appears, and ensure the change is
reflected consistently so PatternDriver/PhoneticDriver logic no longer has to
reconcile the duplicate.
In `@src/Drivers/PatternDriver.php`:
- Around line 29-54: PatternDriver currently adds MatchedWord entries for every
independent match against $lowerText which allows overlapping matches (e.g.,
"motherfucker" and "fuck") and causes double-masking and corrupted positions;
fix this by filtering/deduplicating overlapping ranges before masking: when
iterating matches in the foreach over $profanities/$matches, compute the
candidate range (start, length) and skip creating a MatchedWord if that range
overlaps any already-recorded MatchedWord ranges in $matchedWords (or
alternatively build a dedicated list of reserved character ranges and check
against it), ensuring you still prefer longer matches (keep existing
longest-first order) so shorter matches that fall inside an existing range are
ignored prior to the right-to-left masking step.
In `@src/Drivers/PhoneticDriver.php`:
- Around line 80-84: The check in PhoneticDriver (and similarly in RegexDriver)
passes character offsets ($start, $length calculated with mb_* functions) into
FalsePositiveFilter::isInsideHexToken which uses byte-level functions; convert
the character offsets to byte offsets before calling isInsideHexToken (or change
isInsideHexToken to accept character offsets and use mb_* internally).
Specifically, compute the byte-based start and length for the substring (e.g.
use mb_strlen/mb_substr to determine the byte position or use mb_strpos combined
with utf8 byte offset conversion) and then call isInsideHexToken($normalized,
$byteStart, $byteLength); ensure the referenced symbols are updated:
PhoneticDriver::isInsideHexToken call sites (and the analogous RegexDriver call)
or update FalsePositiveFilter::isInsideHexToken to operate on multibyte-aware
offsets.
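The offset conversion described above can be sketched with two small helpers. `byteToCharOffset()` and `charToByteOffset()` are hypothetical names, and the sample string and pattern are made up for illustration.

```php
<?php
// Hypothetical helpers for converting between byte and character offsets
// in a UTF-8 string. Not the package's implementation.
function byteToCharOffset(string $s, int $byteOffset): int
{
    // Count the characters in the byte prefix that precedes the offset.
    return mb_strlen(substr($s, 0, $byteOffset), 'UTF-8');
}

function charToByteOffset(string $s, int $charOffset): int
{
    // Count the bytes in the character prefix that precedes the offset.
    return strlen(mb_substr($s, 0, $charOffset, 'UTF-8'));
}

$text = 'é deadbeef'; // "é" occupies two bytes in UTF-8

// PREG_OFFSET_CAPTURE reports byte offsets, even with the /u modifier.
preg_match('/beef/u', $text, $m, PREG_OFFSET_CAPTURE);
$byteStart = $m[0][1];
$charStart = byteToCharOffset($text, $byteStart);

printf("byte=%d char=%d roundtrip=%d\n", $byteStart, $charStart, charToByteOffset($text, $charStart));
// prints "byte=7 char=6 roundtrip=7"
```

Keeping character offsets for `mb_*` calls and converting to byte offsets only at the boundary with byte-level code avoids mixing the two within one method.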
🧹 Nitpick comments (6)
src/Rules/Profanity.php (2)
38-41: Consider guarding `__callStatic` against invalid method names.
Currently any undefined static call (e.g., `Profanity::foo()`) will trigger a confusing `Call to undefined method` error on the anonymous instance. A whitelist or `method_exists` check would produce a clearer error message.
♻️ Optional improvement
 public static function __callStatic(string $name, array $arguments): self
 {
-    return (new self())->$name(...$arguments);
+    $instance = new self();
+
+    if (! method_exists($instance, $name)) {
+        throw new \BadMethodCallException(sprintf('Method %s::%s does not exist.', static::class, $name));
+    }
+
+    return $instance->$name(...$arguments);
 }
62-71: `maxScore` mode silently skips the `isOffensive()` check; document or unify the behavior.
When `maxScore` is set, the early return on line 66 means `isOffensive()` is never evaluated. This creates two mutually exclusive validation modes: score-threshold vs. binary-offensive. If a user sets both `maxScore` and `severity`, the severity filtering still applies to the check (line 56-58), but only the score is used for the pass/fail decision.
This is likely intentional, but it could surprise users who expect both constraints to be enforced simultaneously. Consider either:
- Documenting this behavior, or
- Combining both checks when both are configured (fail if score exceeds threshold or result is offensive).
config/languages/spanish.php (1)
149-149: Duplicate entries in profanities array.
`cabronazo` appears at both line 149 and line 186. `cochino`/`cochina` also appear at lines 158–159 and 233–234. While harmless at runtime (just slightly more memory), it indicates a lack of deduplication.
Also applies to: 186-186
src/Drivers/PhoneticDriver.php (1)
27-29: Redundant null-coalesce on a non-nullable `string` parameter.
`$text` is typed as `string` (line 25), so `$text ?? ''` on line 28 is a no-op. Use `$text` directly.
Proposed fix
 if (empty($text)) {
-    return new Result($text ?? '', $text ?? '', [], 0);
+    return new Result($text, $text, [], 0);
 }
src/Drivers/PatternDriver.php (1)
17-19: Redundant null-coalesce, same as PhoneticDriver.
`$text` is typed `string`, so `$text ?? ''` is a no-op.
src/Drivers/RegexDriver.php (1)
22-24: Redundant null-coalesce, same pattern as the other drivers.
`$text` is typed `string`; `$text ?? ''` is always a no-op.
- Remove 'knob' from false_positives list (conflicts with profanities)
- PatternDriver: deduplicate overlapping matches before masking to prevent
  double-masking (e.g., "motherfucker" matching both "motherfucker" and "fuck")
- PhoneticDriver, RegexDriver: pass byte offsets to FalsePositiveFilter methods
  (isInsideHexToken, isSpanningWordBoundary, getFullWordContext) which use
  byte-level operations, while keeping character offsets for MatchedWord
  positions and mb_substr masking
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/Drivers/PatternDriver.php`:
- Line 24: The false-positive normalization uses strtolower which fails on
multibyte characters; change the mapping that builds $falsePositives in
PatternDriver (where $falsePositives = array_map('strtolower',
$dictionary->getFalsePositives())) to use mb_strtolower with the same encoding
used elsewhere (e.g., 'UTF-8') so its casing matches the other mb_strtolower
calls (see the earlier uses around the code and the in_array check that compares
against these values).
- Around line 56-66: The deduplication can keep a shorter match when two entries
share the same position because usort is unstable; change the usort comparator
used on $matchedWords to sort by position ascending and then by length
descending so longer matches at the same position come first, ensuring the dedup
loop (which uses $coveredEnd, $deduplicated, $mw->position and $mw->length)
always retains the longest overlapping match; update the comparator passed to
usort in PatternDriver (the $matchedWords sort) accordingly.
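The sort-then-sweep deduplication from the second item above can be sketched in isolation. Plain arrays stand in for `MatchedWord` objects, `dedupeOverlaps()` is a hypothetical name, and the sample words are placeholders.

```php
<?php
// Hypothetical stand-in for the dedup step: each match is an array with
// 'word', 'position' and 'length' (in characters) instead of a MatchedWord.
function dedupeOverlaps(array $matches): array
{
    // Position ascending, then length descending, so the longest match
    // at any given position is visited first.
    usort($matches, function (array $a, array $b): int {
        return $a['position'] <=> $b['position'] ?: $b['length'] <=> $a['length'];
    });

    $kept = [];
    $coveredEnd = 0;

    foreach ($matches as $match) {
        // Keep a match only if it starts at or after the end of the last kept range.
        if ($match['position'] >= $coveredEnd) {
            $kept[] = $match;
            $coveredEnd = $match['position'] + $match['length'];
        }
    }

    return $kept;
}

$result = dedupeOverlaps([
    ['word' => 'bar',    'position' => 3,  'length' => 3],
    ['word' => 'foo',    'position' => 0,  'length' => 3],
    ['word' => 'foobar', 'position' => 0,  'length' => 6],
    ['word' => 'baz',    'position' => 10, 'length' => 3],
]);

echo implode(',', array_column($result, 'word')), "\n"; // prints "foobar,baz"
```

Because the comparator fully orders ties on position, the outcome does not depend on the input order or on sort stability.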
🧹 Nitpick comments (7)
config/languages/english.php (1)
1331-1351: Appended words break the alphabetical order of the profanities array.
The existing list is alphabetically sorted, but these 21 additions are appended at the end. This makes it harder to spot duplicates and maintain the list over time. Consider inserting them at their sorted positions.
src/Drivers/RegexDriver.php (2)
22-24: Redundant null-coalescing on a typed `string` parameter.
`$text` is declared as `string` in the method signature, so it can never be `null`. The `?? ''` on line 23 is dead code. Same applies to `PhoneticDriver` and `PatternDriver`.
Suggested fix
 if (empty($text)) {
-    return new Result($text ?? '', $text ?? '', [], 0);
+    return new Result($text, $text, [], 0);
 }
86-109: No guard against the same text region being matched multiple times within one while-loop pass.
Within a single pass of the while loop, `preg_match_all` for a given profanity captures all occurrences up-front (line 52), and then the inner `foreach` masks them one by one (line 89). However, a subsequent profanity expression in the same outer `foreach` iteration can still match a region that was just masked in the current pass (since `preg_match_all` runs against the updated `$normalizedString`). The longest-first sort mitigates this in the common case, but it doesn't handle every scenario: for example, two profanities of equal key length where one is a substring of a match region already processed by the other.
In practice this is low-risk because asterisks are unlikely to match profanity regexes, but the PatternDriver explicitly deduplicates and this driver doesn't. Consider adding a covered-range check similar to PatternDriver if robustness is important.
src/Drivers/PhoneticDriver.php (2)
27-29: Same redundant `?? ''` as noted in RegexDriver.
`$text` is typed `string`, so `?? ''` is unreachable.
35-50: Language-support check could be streamlined.
Minor readability improvement: the loop + flag can be collapsed with `array_intersect`.
Suggested simplification
- $languages = array_map('strtolower', explode(',', $language));
- $supported = array_map('strtolower', $this->supportedLanguages);
-
- $isSupported = false;
- foreach ($languages as $lang) {
-     if (in_array(trim($lang), $supported, true)) {
-         $isSupported = true;
-         break;
-     }
- }
-
- if (!$isSupported) {
+ $languages = array_map(fn($l) => strtolower(trim($l)), explode(',', $language));
+ $supported = array_map('strtolower', $this->supportedLanguages);
+
+ if (empty(array_intersect($languages, $supported))) {
     return new Result($text, $text, [], 0);
 }
src/Drivers/PatternDriver.php (2)
17-18: Same `$text ?? ''` nit as the other drivers.
29-54: False-positive check is per-profanity, not per-match; consider hoisting it.
`in_array($lowerProfanity, $falsePositives)` on line 40 doesn't depend on the individual match, only on the profanity word itself. Moving this check before `preg_match_all` avoids the regex work entirely for false-positive profanities.
Suggested restructuring
 foreach ($profanities as $profanity) {
     $lowerProfanity = mb_strtolower($profanity, 'UTF-8');
+
+    // Skip profanities that are known false positives
+    if (in_array($lowerProfanity, $falsePositives)) {
+        continue;
+    }
+
     $pattern = '/\b' . preg_quote($lowerProfanity, '/') . '\b/iu';
     if (preg_match_all($pattern, $lowerText, $matches, PREG_OFFSET_CAPTURE)) {
         foreach ($matches[0] as $match) {
             $start = mb_strlen(substr($lowerText, 0, $match[1]), 'UTF-8');
             $length = mb_strlen($match[0], 'UTF-8');
             $originalMatch = mb_substr($text, $start, $length);
-            // Skip false positives
-            if (in_array($lowerProfanity, $falsePositives)) {
-                continue;
-            }
-
             $matchedWords[] = new MatchedWord(
…n PatternDriver
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Drivers/PatternDriver.php`:
- Line 88: The use of str_word_count in PatternDriver.php can miscount multibyte
words; replace it with a Unicode-aware word count using preg_match_all with the
\p{L}+ pattern and the 'u' modifier (e.g., count matches into $totalWords) and
wrap with max(1, ...) to preserve the minimum-one behavior; update the code
where $totalWords is computed so it uses preg_match_all('/\p{L}+/u', $text,
$matches) and sets $totalWords = max(1, count($matches[0])) to correctly handle
accented and non-ASCII letters.
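A standalone sketch of the Unicode-aware count described above follows. The wrapper name `countWordsUnicode()` is hypothetical; the `preg_match_all('/\p{L}+/u', ...)` call and the `max(1, ...)` floor come from the suggestion itself.

```php
<?php
// Hypothetical wrapper around the suggested Unicode-aware word count.
function countWordsUnicode(string $text): int
{
    // \p{L}+ matches runs of letters in any script; the "u" modifier
    // makes the pattern operate on UTF-8 characters instead of bytes.
    $count = preg_match_all('/\p{L}+/u', $text, $matches);

    // preg_match_all() returns false on error (e.g. invalid UTF-8),
    // which the cast turns into 0 before the minimum-of-one floor applies.
    return max(1, (int) $count);
}

echo countWordsUnicode('El niño está aquí'), "\n"; // prints 4
echo countWordsUnicode(''), "\n";                  // prints 1
```

One behavioural difference worth knowing: `\p{L}+` splits on apostrophes and hyphens, so a contraction such as "don't" counts as two words here, whereas `str_word_count()` treats it as one.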
🧹 Nitpick comments (2)
src/Drivers/PatternDriver.php (2)
17-18: Redundant null-coalescing on a non-nullable `string` parameter.
`$text` is typed as `string`, so `$text ?? ''` can never evaluate to `''`. This is harmless but misleading, since it suggests `$text` could be null.
Proposed simplification
- return new Result($text ?? '', $text ?? '', [], 0);
+ return new Result($text, $text, [], 0);
29-53: False-positive check can be hoisted before the regex call.
The `in_array($lowerProfanity, $falsePositives)` check on line 40 depends only on the outer `$profanity` loop variable, not on the individual match. Moving it before `preg_match_all` avoids running the regex entirely for profanities that will always be skipped.
Proposed optimization
 foreach ($profanities as $profanity) {
     $lowerProfanity = mb_strtolower($profanity, 'UTF-8');
+
+    if (in_array($lowerProfanity, $falsePositives)) {
+        continue;
+    }
+
     $pattern = '/\b' . preg_quote($lowerProfanity, '/') . '\b/iu';
     if (preg_match_all($pattern, $lowerText, $matches, PREG_OFFSET_CAPTURE)) {
         foreach ($matches[0] as $match) {
             $start = mb_strlen(substr($lowerText, 0, $match[1]), 'UTF-8');
             $length = mb_strlen($match[0], 'UTF-8');
             $originalMatch = mb_substr($text, $start, $length);
-            // Skip false positives
-            if (in_array($lowerProfanity, $falsePositives)) {
-                continue;
-            }
-
             $matchedWords[] = new MatchedWord(
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/Drivers/PatternDriver.php`:
- Around line 15-19: PatternDriver::detect currently omits UTF-8 validation but
uses mb_* functions; add the same guard used in RegexDriver/PhoneticDriver at
the top of detect(): check encoding with mb_check_encoding($text, 'UTF-8') and,
if it fails, normalise the input with mb_convert_encoding($text, 'UTF-8',
'UTF-8') before continuing (ensuring the early-return branch that constructs
Result uses the normalized text). This keeps encoding-consistency with
RegexDriver/PhoneticDriver and avoids garbled output when detect() relies on
mb_* calls.
In `@src/Drivers/PhoneticDriver.php`:
- Around line 93-100: The MatchedWord in PhoneticDriver is being populated with
the normalized token ($word) which causes inconsistent mixed normalized/original
text compared to PatternDriver; update the construction of MatchedWord in
PhoneticDriver to set text to the substring extracted from the original $text
using the same approach as PatternDriver (use mb_substr($text, $start, $length)
with proper encoding) while keeping base, severity, position, length, and
language unchanged so downstream PipelineDriver/CallbackMask receive the
original-cased matched text.
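The UTF-8 guard from the first item above could look roughly like this when pulled out into a helper. `ensureUtf8()` is a hypothetical name; the two `mb_*` calls are the ones named in the suggestion.

```php
<?php
// Hypothetical helper wrapping the suggested encoding guard.
function ensureUtf8(string $text): string
{
    // Valid input passes through untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Converting UTF-8 to UTF-8 substitutes invalid byte sequences,
    // so later mb_* calls operate on well-formed input.
    return mb_convert_encoding($text, 'UTF-8', 'UTF-8');
}

$valid = ensureUtf8('héllo');
$repaired = ensureUtf8("ok \xC3\x28 text"); // contains an invalid two-byte sequence

var_dump($valid === 'héllo');                    // bool(true)
var_dump(mb_check_encoding($repaired, 'UTF-8')); // bool(true)
```

Running the guard before the empty-text early return means the `Result` built on that path also carries the normalized text.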
🧹 Nitpick comments (4)
src/Drivers/RegexDriver.php (2)
22-24: LGTM — empty-text guard is correct.Tiny nit:
$text ?? ''is redundant since$textis typed asstringand cannot benull.empty('')istrue, so this path works, but the null-coalesce is a no-op.
42-106: Dead variables: `$uniqueMap` and `$profanitiesCount` are never read.
`$profanitiesCount` (line 44, incremented on line 93) and `$uniqueMap` (line 43, populated on lines 104-106) are tracked but never consumed: they don't feed into the `Result`, score calculation, or any return value. Remove them to avoid confusion.
♻️ Suggested cleanup
 $matchedWords = [];
-$uniqueMap = [];
-$profanitiesCount = 0;
 $continue = true;
 while ($continue) {
     ...
-    $continue = true;
-
-    // Mask in normalizedString only (needed for loop termination)
     ...
-    // Track match
-    $profanitiesCount++;
     $matchedWords[] = new MatchedWord(
         ...
     );
-
-    if (!isset($uniqueMap[$profanity])) {
-        $uniqueMap[$profanity] = true;
-    }
 }
src/Drivers/PatternDriver.php (1)
29-54: False-positive check can be hoisted outside the inner match loop.
The `in_array($lowerProfanity, $falsePositives)` on line 40 doesn't depend on the individual match; it depends only on the profanity word. Moving it before `preg_match_all` avoids running the regex entirely for profanities that are also false positives.
♻️ Suggested change
 foreach ($profanities as $profanity) {
     $lowerProfanity = mb_strtolower($profanity, 'UTF-8');
+
+    // Skip profanities that are also false positives
+    if (in_array($lowerProfanity, $falsePositives)) {
+        continue;
+    }
+
     $pattern = '/\b' . preg_quote($lowerProfanity, '/') . '\b/iu';
     if (preg_match_all($pattern, $lowerText, $matches, PREG_OFFSET_CAPTURE)) {
         foreach ($matches[0] as $match) {
             $start = mb_strlen(substr($lowerText, 0, $match[1]), 'UTF-8');
             $length = mb_strlen($match[0], 'UTF-8');
             $originalMatch = mb_substr($text, $start, $length);
-            // Skip false positives
-            if (in_array($lowerProfanity, $falsePositives)) {
-                continue;
-            }
-
             $matchedWords[] = new MatchedWord(
src/Drivers/PipelineDriver.php (1)
23-30: Sub-driver results include wasted masking work.
Each sub-driver's `detect()` builds a fully masked `cleanText` (including right-to-left masking), but `PipelineDriver` only reads `result->words()` and discards the clean text, re-masking from scratch on lines 64-69. This means every mask invocation inside a sub-driver is thrown away.
For `CharacterMask`/`GrawlixMask` this is negligible, but for `CallbackMask` with side effects, the callback fires for every match in every sub-driver and again in `PipelineDriver`. Consider adding a lightweight "detect-only" mode or a no-op mask sentinel for sub-drivers when running inside a pipeline.
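One way to picture the no-op mask sentinel suggested above is the sketch below. Everything in it is an assumption made for illustration: the `MaskStrategy` interface, its `mask()` signature, the `NullMask` class, and the toy `subDriverDetect()`/`pipelineDetect()` functions are hypothetical and do not mirror the package's real classes. The `CallbackMask` here is a simplified stand-in for the one named in the review.

```php
<?php
declare(strict_types=1);

// Hypothetical mask contract; the real interface in the package may differ.
interface MaskStrategy
{
    public function mask(string $word, int $length): string;
}

// Simplified stand-in for a callback-based mask.
final class CallbackMask implements MaskStrategy
{
    private \Closure $callback;

    public function __construct(callable $callback)
    {
        $this->callback = \Closure::fromCallable($callback);
    }

    public function mask(string $word, int $length): string
    {
        return ($this->callback)($word, $length);
    }
}

// Hypothetical sentinel: does no masking work and has no side effects.
final class NullMask implements MaskStrategy
{
    public function mask(string $word, int $length): string
    {
        return $word;
    }
}

// Toy sub-driver: reports which dictionary words occur and masks each one,
// which stands in for building a clean text that the pipeline later discards.
function subDriverDetect(string $text, array $dictionary, MaskStrategy $mask): array
{
    $found = [];
    foreach ($dictionary as $word) {
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/iu', $text) === 1) {
            $mask->mask($word, mb_strlen($word, 'UTF-8'));
            $found[] = $word;
        }
    }

    return $found;
}

// Toy pipeline: sub-drivers get the sentinel, the real mask runs once per word.
function pipelineDetect(string $text, array $dictionaries, MaskStrategy $mask): string
{
    $words = [];
    foreach ($dictionaries as $dictionary) {
        $words = array_merge($words, subDriverDetect($text, $dictionary, new NullMask()));
    }

    $clean = $text;
    foreach (array_unique($words) as $word) {
        $replacement = $mask->mask($word, mb_strlen($word, 'UTF-8'));
        $clean = preg_replace_callback(
            '/\b' . preg_quote($word, '/') . '\b/iu',
            static fn (array $m): string => $replacement,
            $clean
        );
    }

    return $clean;
}
```

With this arrangement a side-effecting callback runs only during the pipeline's final masking pass, not inside each sub-driver.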
…icDriver matches Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces unbounded lazy quantifier (*?) with {0,3} in the separator
expression between profanity characters. This prevents PHP-FPM worker
segfaults caused by PCRE JIT stack overflow when processing 1,300+
complex patterns with nested lazy quantifiers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tack overflow
Each branch in the separator group now matches exactly one character,
with the outer {0,3}? handling repetition. Removes redundant (?:\s)
alternative since \s is already in the character class.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
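As a rough illustration of the bounded separator described in these two commits, the sketch below builds a pattern in which each letter of a word may be followed by at most three separator characters. The helper name `buildObfuscationPattern()` and the separator class `[\W_]` are assumptions; the real separator expression in `RegexDriver` is not shown on this page and is likely more elaborate.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: join the letters of $word with a bounded, lazy separator.
function buildObfuscationPattern(string $word): string
{
    // Assumed separator: each repetition consumes exactly one non-word
    // character or underscore; {0,3}? caps the run at three, lazily.
    $separator = '[\W_]{0,3}?';

    $chars = array_map(
        static fn (string $c): string => preg_quote($c, '/'),
        mb_str_split(mb_strtolower($word, 'UTF-8'), 1, 'UTF-8')
    );

    return '/' . implode($separator, $chars) . '/iu';
}

$pattern = buildObfuscationPattern('darn');

var_dump(preg_match($pattern, 'd.a.r.n'));   // int(1)
var_dump(preg_match($pattern, 'd-----arn')); // int(0), five separators exceed the cap
```

Capping the repetition bounds how far the engine can backtrack between two letters, which is the property the commit message relies on to keep PCRE's stack usage in check.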
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Ground-up rewrite for v4 with a clean, extensible architecture:
- `regex` (obfuscation-aware), `pattern` (fast exact match), `phonetic` (sound-alike via metaphone + Levenshtein), and `pipeline` (chains multiple drivers)
- `withSeverity()` filtering works correctly
- `check()` results cached by content hash (text + driver + language + severity + allow/block + mask). Bypassed for `CallbackMask`. Configurable via `cache.results`
- `Blaspable` trait for auto-sanitize/reject on model save
- `Blasp::in('spanish')->mask('#')->withSeverity(Severity::High)->check($text)`
- `Blasp::fake()` with assertions
Test plan
- `tests/SeverityMapTest.php`: severity filtering works for Spanish, French, German (19 tests)
- `tests/ResultCachingTest.php`: caching hits, key variation, CallbackMask bypass, cache clearing, config toggle (12 tests)
🤖 Generated with Claude Code
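The "key variation" behaviour covered by the caching tests follows from hashing every input that can change a result. A minimal sketch of such a key, assuming a hypothetical `resultCacheKey()` helper, a made-up `blasp:result:` prefix, and SHA-256 over a serialized tuple (the package's actual key format is not shown here):

```php
<?php
declare(strict_types=1);

// Hypothetical helper: derive a cache key from everything that affects the result.
function resultCacheKey(
    string $text,
    string $driver,
    string $language,
    ?string $severity,
    array $allow,
    array $block,
    string $mask
): string {
    // Sort the lists so that the same set of words always yields the same key.
    sort($allow);
    sort($block);

    $payload = serialize([$text, $driver, $language, $severity, $allow, $block, $mask]);

    return 'blasp:result:' . hash('sha256', $payload);
}

$a = resultCacheKey('some text', 'regex', 'english', 'high', ['x', 'y'], [], '*');
$b = resultCacheKey('some text', 'regex', 'english', 'high', ['y', 'x'], [], '*');
$c = resultCacheKey('some text', 'regex', 'english', 'high', ['x', 'y'], [], '#');

var_dump($a === $b); // bool(true): allow-list order does not matter
var_dump($a === $c); // bool(false): a different mask yields a different key
```

A callback-based mask has no stable value to hash, which is consistent with the summary's note that caching is bypassed for `CallbackMask`.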
Summary by CodeRabbit