perf(parser): cache regex predicates with rc using router attribute #251
Force-pushed from 3e66902 to 2e13e8f.
I prefer this solution:
- LRU cache makes the performance less stable.
- The thread_local solution should save more memory when there is more than one router instance, but that is not the typical usage of the router, and I personally don't like keeping global state in a standalone library.
I think this approach is fine, but I'd like to discuss one thing. The current implementation leaks the caching logic into parsing. It works, but it is not as elegant as it could be in terms of type design, since the AST should just describe the syntax. Do you have any ideas for making it better?
Maybe we can just store the regex string in AST and build that later.
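For illustration, a minimal sketch of what "store the regex string in the AST and build it later" could look like. Everything here is hypothetical and not taken from the atc-router code: `AstValue`, `LoweredValue`, `lower`, and `CompiledRegex` (a stand-in for a real compiled regex type such as `regex::Regex`) are assumed names. The point is only that the AST stays purely syntactic while a later lowering step owns the cache.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a compiled regex (e.g. `regex::Regex`).
#[derive(Debug)]
struct CompiledRegex {
    pattern: String,
}

/// Stand-in for the expensive compilation step.
fn compile(pattern: &str) -> CompiledRegex {
    CompiledRegex { pattern: pattern.to_owned() }
}

/// AST value: describes syntax only, so it keeps just the regex source text.
#[derive(Debug, Clone)]
enum AstValue {
    Int(i64),
    RegexSource(String),
}

/// Lowered value: what a matcher would execute. Compiled regexes are shared.
#[derive(Debug)]
enum LoweredValue {
    Int(i64),
    Regex(Arc<CompiledRegex>),
}

/// Builds regexes after parsing, reusing one compiled instance per pattern.
fn lower(
    values: &[AstValue],
    cache: &mut HashMap<String, Arc<CompiledRegex>>,
) -> Vec<LoweredValue> {
    values
        .iter()
        .map(|value| match value {
            AstValue::Int(i) => LoweredValue::Int(*i),
            AstValue::RegexSource(src) => {
                // Compile only on the first occurrence of this pattern.
                let shared = cache
                    .entry(src.clone())
                    .or_insert_with(|| Arc::new(compile(src)));
                LoweredValue::Regex(Arc::clone(shared))
            }
        })
        .collect()
}
```

With this split, the parser needs no access to the cache at all; only the lowering step takes it as a parameter.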
Do we need to take things like this into account? For example, the Lua PCRE regex cache has an upper bound.
Let's say the customer configured 10 regexes and our upper bound is 5. That means we have to build regexes at match time, which slows down router matching. I think OpenResty sets an upper bound because it doesn't know the lifetime of each regex but doesn't want to build a regex every time. For atc-router, we know the lifetime of each regex, so the upper bound is not required. It makes sense to keep N regexes in memory if the customer configured N regexes; that is an acceptable cost and makes the matching speed more stable.
Force-pushed from 2e13e8f to e78ee56.
Hey @ADD-SP :)! The only other approaches I saw were in the other PRs linked in the description at the top.
@nowNick Could you rebase this PR?
@nowNick Could you resolve merge conflicts?
@ADD-SP I need a little bit more time to resolve it. The changes with CPU locality optimizations are a little bit tricky to incorporate in this PR.
Force-pushed from c6957bb to 5d6cf68.
src/router.rs
Outdated
}

fn release_cache(cir: &CirProgram, router: &mut Router) {
    cir.instructions.iter().for_each(|instruction| {
Since instructions are now just in an array I can simply iterate through it and release regexes when I encounter them. Previously I had to traverse the tree of AST.
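A rough sketch of that flat traversal, under stated assumptions: the `Instruction`, `Router`, and `CompiledRegex` types below are simplified hypothetical stand-ins, not the real CIR types, and the function takes the program by value, unlike the `&CirProgram` signature in the diff above. The eviction rule (drop a cache entry once the cache holds the only remaining reference) is one possible policy, not necessarily the one used in this PR.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a compiled regex that remembers its source.
#[derive(Debug)]
struct CompiledRegex {
    pattern: String,
}

/// Simplified stand-in for a flat CIR instruction list entry.
enum Instruction {
    Regex(Arc<CompiledRegex>),
    Other,
}

/// Simplified stand-in for the router; only the cache matters here.
struct Router {
    regex_cache: HashMap<String, Arc<CompiledRegex>>,
}

/// Drops a program and evicts cache entries that nothing else references.
fn release_cache(program: Vec<Instruction>, router: &mut Router) {
    // One linear pass over the instructions; no tree traversal needed.
    let patterns: Vec<String> = program
        .iter()
        .filter_map(|instruction| match instruction {
            Instruction::Regex(rx) => Some(rx.pattern.clone()),
            Instruction::Other => None,
        })
        .collect();

    // Release the program's own references before inspecting the counts.
    drop(program);

    for pattern in patterns {
        // A strong count of 1 means only the cache still holds this regex.
        let unused = router
            .regex_cache
            .get(&pattern)
            .map_or(false, |rx| Arc::strong_count(rx) == 1);
        if unused {
            router.regex_cache.remove(&pattern);
        }
    }
}
```

Checking the count only after the program is dropped keeps the rule correct when the same regex appears several times within one program.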
Force-pushed from 5d6cf68 to de4143b.
I've rebased the branch to use the CIR program. From what I can see, the memory improvements remained the same (~19 times less memory for this benchmark), and on top of that we achieved further CPU optimizations: from 80ms to 45ms! (46% better)
@xianghai2 @ADD-SP Could you help @nowNick review this when you get a chance? Not a release blocker.
Force-pushed from de4143b to 7aad44b.
Force-pushed from 02ee011 to 3be0480.
Changes LGTM.
(There is a bit of re-inventing of Weak in the removing logic, but I think it makes sense as they would be slightly annoying to use here anyway)
Just pointing out that this changes some public signatures, so it would need a v2 bump. I'm not sure how those are coordinated here.
  #[allow(clippy::result_large_err)] // it's fine as parsing is not the hot path
- pub fn parse(source: &str) -> ParseResult<Expression> {
-     ATCParser::new().parse_matcher(source)
+ pub fn parse(
Self-note: this changes a public signature.
src/ast.rs
Outdated
  Int(i64),
  #[cfg_attr(feature = "serde", serde(with = "serde_regex"))]
- Regex(Regex),
+ Regex(Rc<Regex>),
Self-note: this changes a public type.
Will Rc make the entire router !Send?
I think so.
And the new regex_cache: HashMap<String, Rc<Regex>> field will definitely have that effect too.
Does this need to be used in a multi-threaded context?
(I didn't see any static assertions about Send/Sync)
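For reference, a minimal sketch of a compile-time Send/Sync assertion using only the standard library. `ArcRouter` and `CompiledRegex` are hypothetical stand-ins for the real types, and `assert_send_sync` is an assumed helper name; the sketch only shows the general pattern.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a compiled regex type.
struct CompiledRegex;

/// Router-like struct whose cache shares values through `Arc`.
struct ArcRouter {
    regex_cache: HashMap<String, Arc<CompiledRegex>>,
}

/// Compiles only if `T` can be moved to and shared between threads.
const fn assert_send_sync<T: Send + Sync>() {}

// Checked when the crate is compiled; there is no runtime cost.
const _: () = assert_send_sync::<ArcRouter>();

// If the cache held `std::rc::Rc<CompiledRegex>` instead, the assertion
// above would fail to compile, because `Rc<T>` is neither `Send` nor `Sync`.
```

An assertion like this turns an accidental loss of Send or Sync into a build failure instead of a surprise for downstream users.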
Force-pushed from 3be0480 to c148ec4.
Force-pushed from c148ec4 to fd5c470.
Force-pushed from fd5c470 to 43f0a4e.
LGTM
switch Rc to Arc and add static assertions
Force-pushed from 43f0a4e to 8e65ead.
Description
This PR is one of three proposed approaches for optimizing memory consumption in a specific edge case with ATC Router. The scenario under consideration is a Router defined with the same regular expression in several different predicates. Currently the Router has no way to remember a regex it has already been given, which results in many copies of the same regex.
Approach in this PR
This PR proposes adding a special attribute to the Router struct called regex_cache. It is passed down to the parser and allows the parser to either retrieve the Value::Regex from the cache or create a new one and store it there. This approach does not use any singleton pattern. The upside of this solution is that state is easy to track: there is no global state. The downside is that many functions have to change in order to "drill" the regex_cache property down to the place where it needs to be used.
Benchmarks
The benchmarking method was to apply the commit from PR #253 on top of each of these PRs. Memory was measured with the dhat crate and performance with the criterion crate.
Memory consumption:
Performance:
Other PRs Links:
Issue reference:
KAG-3182