Skip to content

Conversation

@nowNick
Copy link

@nowNick nowNick commented Sep 19, 2024

Description

This PR is one out of 3 proposed approaches how to optimize memory consumption for specific edge case with ATC Router. The scenario that is being considered is a Router that's defined with the same regular expression in different predicates. Currently Router does not have any ability to remember the regex passed resulting in a lot of copies of the same regex.

Approach in this PR

This PR proposes adding a special attribute to Router struct called regex_cache. It is passed down to the parser and allows the parser to either retrieve the Value::Regex from cache or create a new one and store it there. This approach does not use any singleton pattern. The upside of this solution is easy to track state - no global state. The downside is the requirement to change a lot of functions to "drill" down the regex_cache property to the place where it needs to be used.

Benchmarks

The benchmarking method was to use the commit in this PR: #253 on top of each of these PRs. The memory benchmark was done using dhat crate and performance was measured with criterion crate.

Memory consumption:

Baseline (main) Router attribute (this) Thread local (#250) Memoize (#252)
Total (human-readable) 4.54 GB 🔴 242.5 MB 🏆 242.5 MB 🏆 299.6 MB
Total (exact) 4,544,275,926 bytes in 6,762,329 blocks 242,508,176 bytes in 1,922,359 blocks 242,508,176 bytes in 1,922,359 blocks 299,608,085 bytes in 1,962,821 blocks
At t-gmax 194,867,606 bytes in 382,216 blocks 4,713,936 bytes in 41,863 blocks 4,713,936 bytes in 41,863 block 60,538,606 bytes in 72,258 blocks
At t-end 0 bytes in 0 blocks 0 bytes in 0 blocks 108,034 bytes in 168 blocks 35,671,000 bytes in 42 blocks

Performance:

Baseline (main) Router attribute (this) Thread local (#250) Memoize (#252)
660.68 ms 🔴 78.834 ms 🏆 80.695 ms 86.169 ms

Other PRs Links:

Issue reference:

KAG-3182

@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch from 3e66902 to 2e13e8f Compare September 19, 2024 12:45
@nowNick nowNick changed the title perf(*): cache regex predicates with rc using router attribute perf(parser): cache regex predicates with rc using router attribute Sep 19, 2024
@nowNick nowNick marked this pull request as ready for review September 19, 2024 15:17
@nowNick nowNick requested review from dndx and samugi September 19, 2024 15:19
Copy link
Member

@ADD-SP ADD-SP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer this solution:

  1. LRU cache makes the performance less stable.
  2. The thread_local solution should save more memory for more than one router instance, this is not the typical usage of the router, and I personally don't like keeping a global state in a standalone library.

I think this approach is fine, I'd like to discuss one thing. The current implementation leaks the caching logical during the parsing, It works, but it is not as elegant as it could be for the type designing as AST should just describe the syntax. Do you have any idea to make it better?

Maybe we can just store the regex string in AST and build that later.

@bungle
Copy link
Member

bungle commented Sep 20, 2024

Do we need to take into account things like this:
https://github.com/openresty/lua-nginx-module?tab=readme-ov-file#lua_regex_cache_max_entries

Aka in Lua PCRE regex caches we have a upper bound.

@ADD-SP
Copy link
Member

ADD-SP commented Sep 23, 2024

Do we need to take into account things like this: https://github.com/openresty/lua-nginx-module?tab=readme-ov-file#lua_regex_cache_max_entries

Aka in Lua PCRE regex caches we have a upper bound.

@bungle

Let's say the customer configured 10 regexes, and our upper bound is 5, which means that we have to build regex on matching; it slows down the router-matching.

I think the reason OpenResty sets an upper bound is that OpenResty doesn't know the lifetime of each regex, but it doesn't want to build a regex each time, so it has to set an upper bound.

For atc-router, we know the lifetime of each regex, so the upper bound is not required.

It makes sense to build N regexes in memory if the customer configured N regexes, it is an acceptable cost and makes the matching speed more stable.

@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch from 2e13e8f to e78ee56 Compare September 26, 2024 17:11
@nowNick nowNick requested a review from dndx September 27, 2024 06:48
@nowNick
Copy link
Author

nowNick commented Sep 27, 2024

Hey @ADD-SP :) !
I was thinking if there's a way to refactor it but unfortunately this is the only solution that comes to my mind. We have to find the first place that can hold an object that will be shared across add_matcher calls and that's the Router. We cannot store it in ast since it creates a new instance on each add_matcher call. Actually when you think of it regex_cache shares some similar logic to fields. They both store general information about the matchers that are shared across add_matcher calls. The first one remembers regexes the other one predicates.

The only other approaches I saw were in the other PRs linked in the description at the top.

@nowNick nowNick requested a review from ADD-SP October 8, 2024 09:11
@dndx
Copy link
Member

dndx commented Oct 16, 2024

@nowNick Could you rebase this PR?

@ADD-SP
Copy link
Member

ADD-SP commented Oct 31, 2024

@nowNick Could you resolve merge conflicts?

@nowNick
Copy link
Author

nowNick commented Oct 31, 2024

@ADD-SP I need a little bit more time to resolve it. The changes with CPU locality optimizations are a little bit tricky to incorporate in this PR.

@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch from c6957bb to 5d6cf68 Compare November 21, 2024 16:13
src/router.rs Outdated
}

fn release_cache(cir: &CirProgram, router: &mut Router) {
cir.instructions.iter().for_each(|instruction| {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since instructions are now just in an array I can simply iterate through it and release regexes when I encounter them. Previously I had to traverse the tree of AST.

@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch from 5d6cf68 to de4143b Compare November 21, 2024 16:17
@nowNick
Copy link
Author

nowNick commented Nov 21, 2024

I've rebased the branch to use CIR program. From what I can see the memory improvents remained the same ~19times less memory for this benchmark but what is more we achieved further CPU optimizations. From 80ms to 45ms! (46% better)

@dndx
Copy link
Member

dndx commented Nov 22, 2024

@xianghai2 @ADD-SP Could you help @nowNick review this once you have some chance? Not a release blocker.

@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch from de4143b to 7aad44b Compare May 5, 2025 12:02
@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch 4 times, most recently from 02ee011 to 3be0480 Compare May 5, 2025 12:21
Copy link
Contributor

@lucab lucab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM.
(There is a bit of re-inventing of Weak in the removing logic, but I think it makes sense as they would be slightly annoying to use here anyway)

Just pointing out that this changes some public signatures, so it would need a v2 bump. I'm not sure how those are coordinated here.

#[allow(clippy::result_large_err)] // it's fine as parsing is not the hot path
pub fn parse(source: &str) -> ParseResult<Expression> {
ATCParser::new().parse_matcher(source)
pub fn parse(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-note: this changes a public signature.

src/ast.rs Outdated
Int(i64),
#[cfg_attr(feature = "serde", serde(with = "serde_regex"))]
Regex(Regex),
Regex(Rc<Regex>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-note: this changes a public type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Rc makes the entire router !Send?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so.
And the new regex_cache: HashMap<String, Rc<Regex>> field will definitely have that effect too.
Does this need to be used in a multi-threaded context?
(I didn't see any static assertions about Send/Sync)

@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch from 3be0480 to c148ec4 Compare May 6, 2025 08:41
@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch from c148ec4 to fd5c470 Compare June 23, 2025 12:15
@lucab lucab self-requested a review June 23, 2025 14:27
@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch from fd5c470 to 43f0a4e Compare June 23, 2025 14:43
@nowNick
Copy link
Author

nowNick commented Jun 23, 2025

@lucab @dndx @ADD-SP - I've switched from Rc to Arc and added static assertions to make sure Router is Send and Sync. I've rerun my benchmarks and there was no change. Could you guys give it another look?

@ProBrian
Copy link

ProBrian commented Jul 7, 2025

LGTM

@nowNick nowNick force-pushed the chore/cache-regex-predicates-with-rc---simple branch from 43f0a4e to 8e65ead Compare September 24, 2025 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants