perf(#1470): cache compiled PathMatchers in GlobFilter constructor #1471

Open
idrisoffrinat-cpu wants to merge 2 commits into objectionary:master from idrisoffrinat-cpu:1470-cache-glob-matchers
@idrisoffrinat-cpu idrisoffrinat-cpu commented Apr 24, 2026

Closes #1470.

Problem

GlobFilter.test() recompiled every include/exclude glob into a PathMatcher
on every invocation, even though includes and excludes are final and
immutable. For a plugin run that filters thousands of .class files, each
pattern was compiled N times instead of once.

Fix

Move the PathMatcher compilation to the constructor and store the resulting
sets as final fields. test() now reads precomputed whitelist / blacklist
instead of rebuilding them per call. The matcher-building logic itself is
unchanged (same GlobFilter::matcher helper, same Collectors.toSet()).

This also aligns with the immutable-object principle the class already follows
for its string fields — all state is established at construction time.
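
For readers without the diff at hand, here is a minimal sketch of the shape described above. It is not the actual `GlobFilter` source: the class name `CachedGlobFilter` is hypothetical, and the matching semantics (an empty include set admits everything, excludes win over includes) are an assumption made for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for GlobFilter; the real class may differ in details.
final class CachedGlobFilter implements Predicate<Path> {

    // Matchers compiled once, at construction time.
    private final Set<PathMatcher> whitelist;
    private final Set<PathMatcher> blacklist;

    CachedGlobFilter(final Set<String> includes, final Set<String> excludes) {
        this.whitelist = includes.stream()
            .map(CachedGlobFilter::matcher)
            .collect(Collectors.toSet());
        this.blacklist = excludes.stream()
            .map(CachedGlobFilter::matcher)
            .collect(Collectors.toSet());
    }

    @Override
    public boolean test(final Path path) {
        // Assumed semantics: no includes means "include all"; excludes take priority.
        final boolean included = this.whitelist.isEmpty()
            || this.whitelist.stream().anyMatch(m -> m.matches(path));
        final boolean excluded = this.blacklist.stream()
            .anyMatch(m -> m.matches(path));
        return included && !excluded;
    }

    // Compiles a single glob pattern using the default file system.
    private static PathMatcher matcher(final String glob) {
        return FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }
}
```

With this shape, `test()` only iterates over already compiled matchers, so the compilation cost is paid once per filter instance rather than once per file.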

Testing

mvn test -Dtest=GlobFilterTest — 12/12 green, no new or modified tests
needed since behaviour is identical.

[INFO] Running org.eolang.jeo.GlobFilterTest
[INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0
[INFO] BUILD SUCCESS

Summary by CodeRabbit

Release Notes

  • Performance Improvements
    • Optimized glob pattern matching performance by precompiling patterns during initialization, reducing computational overhead on subsequent operations.

…nstructor

GlobFilter.test() used to compile the include/exclude glob patterns
into PathMatcher objects on every invocation, despite includes and
excludes being final. In a typical run over thousands of .class files
that meant every pattern was compiled N times instead of once.

Move the compilation to the constructor and store the resulting
matchers as final fields. test() now reads the precomputed sets.

coderabbitai Bot commented Apr 24, 2026

Walkthrough

The GlobFilter class now precompiles glob patterns into cached PathMatcher sets during object construction rather than recompiling them on every test() invocation. The filtering logic remains unchanged, but performance is improved by eliminating redundant pattern compilation for repeated file filtering operations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Pattern Compilation Caching: `src/main/java/org/eolang/jeo/GlobFilter.java` | Moved PathMatcher compilation from the test(Path) method to the constructor, storing precompiled matchers in private whitelist and blacklist fields. The filter logic remains functionally identical but eliminates repeated pattern compilation for each file tested. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • yegor256


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly summarizes the main change: caching compiled PathMatchers in the GlobFilter constructor for performance improvement, addressing issue #1470. |
| Linked Issues check | ✅ Passed | The PR fulfills all coding requirements from #1470: PathMatchers are now compiled once in the constructor and cached in final whitelist/blacklist fields, preserving existing matching behavior and aligning with immutable object principles. |
| Out of Scope Changes check | ✅ Passed | All changes are directly in scope: only the GlobFilter class was modified with the addition of whitelist/blacklist fields and constructor refactoring to address the performance issue, with no extraneous modifications. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/main/java/org/eolang/jeo/GlobFilter.java (1)

54-59: Optional: consider List<PathMatcher> instead of Set<PathMatcher>.

PathMatcher implementations don't override equals/hashCode, so the Set deduplicates by identity only — every compiled matcher is distinct regardless of pattern. Since the source includes/excludes are already Set<String> (patterns are unique), you gain no dedup but pay for hashing on insert and iteration. A List<PathMatcher> would be a touch more efficient and semantically clearer.

♻️ Proposed refactor
-    private final Set<PathMatcher> whitelist;
+    private final List<PathMatcher> whitelist;
@@
-    private final Set<PathMatcher> blacklist;
+    private final List<PathMatcher> blacklist;
@@
-        this.whitelist = includes.stream()
-            .map(GlobFilter::matcher)
-            .collect(Collectors.toSet());
-        this.blacklist = excludes.stream()
-            .map(GlobFilter::matcher)
-            .collect(Collectors.toSet());
+        this.whitelist = includes.stream()
+            .map(GlobFilter::matcher)
+            .collect(Collectors.toUnmodifiableList());
+        this.blacklist = excludes.stream()
+            .map(GlobFilter::matcher)
+            .collect(Collectors.toUnmodifiableList());
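
To make the identity point concrete, here is a small sketch. The class `MatcherIdentityDemo` and its helpers are hypothetical names, not part of the PR. It compiles the same glob twice, puts both matchers into a `HashSet`, and prints whether they compare equal. The `PathMatcher` interface does not specify `equals`/`hashCode`, so the result depends on the file system provider; no particular output is claimed here.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the PR.
final class MatcherIdentityDemo {

    // Compiles one glob with the default file system.
    static PathMatcher compile(final String glob) {
        return FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    // Builds an unmodifiable list of matchers, as the proposed refactor does.
    static List<PathMatcher> compileAll(final Set<String> globs) {
        return globs.stream()
            .map(MatcherIdentityDemo::compile)
            .collect(Collectors.toUnmodifiableList());
    }

    public static void main(final String[] args) {
        final PathMatcher first = compile("*.class");
        final PathMatcher second = compile("*.class");
        final Set<PathMatcher> set = new HashSet<>(List.of(first, second));
        // If equals() is not overridden by the provider, these are distinct entries.
        System.out.println("equal: " + first.equals(second));
        System.out.println("set size: " + set.size());
    }
}
```

Since the source patterns already arrive as a `Set<String>`, uniqueness is guaranteed before compilation, which is why a list is enough on the matcher side.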
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/org/eolang/jeo/GlobFilter.java` around lines 54 - 59, The
whitelist and blacklist are built as Set<PathMatcher> but PathMatcher doesn't
implement equals/hashCode so the Set provides no real dedup and wastes hashing;
change the fields in GlobFilter from Set<PathMatcher> to List<PathMatcher>,
build them with
includes.stream().map(GlobFilter::matcher).collect(Collectors.toList()) and
excludes.stream().map(GlobFilter::matcher).collect(Collectors.toList()), and
update any methods that iterate or test matchers (referencing whitelist,
blacklist and the matcher(...) method) to use List semantics (e.g., iterate with
for/stream anyMatch) rather than Set-specific operations.

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between 8f15f26 and 9287a91.

📒 Files selected for processing (1)
  • src/main/java/org/eolang/jeo/GlobFilter.java

Member

@volodya-lombrozo volodya-lombrozo left a comment

@idrisoffrinat-cpu many thanks for the contribution; just one small suggestion.

GlobFilter(final Set<String> includes, final Set<String> excludes) {
this.includes = includes;
this.excludes = excludes;
this.whitelist = includes.stream()

@idrisoffrinat-cpu Can we create a separate constructor for GlobFilter that accepts all four params? GlobFilter(final Set<String> includes, final Set<String> excludes, final Set<PathMatcher> whitelist, final Set<PathMatcher> blacklist)
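
One way to read this suggestion, sketched under assumptions: the two-argument constructor only compiles the globs and delegates to a four-argument primary constructor that does nothing but assign fields. The class name `ChainedGlobFilter`, the `compile` helper, and the matching semantics in `test()` are hypothetical and not taken from the PR.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration of the constructor-chaining idea; not the real GlobFilter.
final class ChainedGlobFilter implements Predicate<Path> {

    private final Set<String> includes;
    private final Set<String> excludes;
    private final Set<PathMatcher> whitelist;
    private final Set<PathMatcher> blacklist;

    // Secondary constructor: compiles the globs, then delegates.
    ChainedGlobFilter(final Set<String> includes, final Set<String> excludes) {
        this(includes, excludes, compile(includes), compile(excludes));
    }

    // Primary constructor: assignments only, no computation.
    ChainedGlobFilter(
        final Set<String> includes,
        final Set<String> excludes,
        final Set<PathMatcher> whitelist,
        final Set<PathMatcher> blacklist
    ) {
        this.includes = includes;
        this.excludes = excludes;
        this.whitelist = whitelist;
        this.blacklist = blacklist;
    }

    @Override
    public boolean test(final Path path) {
        // Assumed semantics: no includes means "include all"; excludes take priority.
        final boolean included = this.whitelist.isEmpty()
            || this.whitelist.stream().anyMatch(m -> m.matches(path));
        return included
            && this.blacklist.stream().noneMatch(m -> m.matches(path));
    }

    // Compiles every glob in the set with the default file system.
    private static Set<PathMatcher> compile(final Set<String> globs) {
        return globs.stream()
            .map(glob -> FileSystems.getDefault().getPathMatcher("glob:" + glob))
            .collect(Collectors.toSet());
    }
}
```

A side effect of this layout is that tests can pass hand-built matchers straight to the four-argument constructor without going through glob compilation.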

@github-actions
Contributor

🚀 Performance Analysis

All benchmarks are within the acceptable range. No critical degradation detected (threshold is 100%). Please refer to the detailed report for more information.

| Test | Base Score | PR Score | Change | % Change | Unit | Mode |
|---|---|---|---|---|---|---|
| benchmark.AssembleBenchmark.assemble | 4641.792 | 3667.194 | -974.598 | -21.00% | ops/s | Throughput |
| benchmark.DisassembleBenchmark.disassemble | 1647.849 | 1547.300 | -100.549 | -6.10% | ops/s | Throughput |

⚠️ Performance loss: benchmark.AssembleBenchmark.assemble is slower by 974.598 ops/s (21.00%)
⚠️ Performance loss: benchmark.DisassembleBenchmark.disassemble is slower by 100.549 ops/s (6.10%)
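
For reference, the Change and % Change columns appear to follow the usual relative-difference formula, `(pr - base) / base * 100`. The sketch below recomputes them from the reported scores. `BenchmarkDelta` is a hypothetical helper, not the reporting tool's actual code.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the report's delta columns.
final class BenchmarkDelta {

    // Relative change of the PR score against the base score, in percent.
    static double percentChange(final double base, final double pr) {
        return (pr - base) / base * 100.0;
    }

    // Formats the absolute and relative change the way the report prints them.
    static String format(final double base, final double pr) {
        return String.format(
            Locale.ROOT,
            "%.3f ops/s (%.2f%%)",
            pr - base,
            percentChange(base, pr)
        );
    }

    public static void main(final String[] args) {
        System.out.println(format(4641.792, 3667.194));
        System.out.println(format(1647.849, 1547.300));
    }
}
```

Both values are throughput scores, so a negative change means fewer operations per second on the PR branch.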

PathMatcher does not override equals/hashCode, so Set deduplicates
only by identity — providing no real dedup since the source includes
and excludes are already Set<String> with unique patterns. Switch to
List<PathMatcher> built with Collectors.toUnmodifiableList() to drop
the unnecessary hashing on insert and to express intent more clearly.

Per CodeRabbit nitpick on PR objectionary#1471.

Development

Successfully merging this pull request may close these issues.

GlobFilter.test() recompiles glob patterns on every call

2 participants