Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
impl: substantially reduce regex stack size
This commit fixes a fairly large regression in the stack size of a Regex
introduced in regex 1.4.4. When I dropped thread_local and replaced it
with Pool, it turned out that Pool inlined a T into its struct and a
Regex in turn had Pool inlined into itself. It further turns out that
the T=ProgramCache is itself quite large.

We fix this by introducing an indirection in the inner regex type. That
is, we use a Box<Pool> instead of a Pool. This shrinks the size of a
Regex from 856 bytes to 16 bytes.

Interestingly, prior to regex 1.4.4, a Regex was still quite substantial
in size, coming in at around 552 bytes. So it looks like the 1.4.4
release didn't dramatically increase it, but it increased it enough that
folks started experiencing real problems: stack overflows.

Since indirection can lead to worse locality and performance loss, I did
run the benchmark suite. I couldn't see any measurable difference. This
is generally what I would expect. This is an indirection at a fairly
high level. There's lots of other indirection already, and this
indirection isn't accessed in a hot path. (The regex cache itself is of
course used in hot paths, but by the time we get there, we have already
followed this particular pointer.)

We also include a regression test that asserts a Regex (and company) are
16 bytes in size. While this isn't an API guarantee, it at least means
that increasing the size of Regex will be an intentional thing in the
future and not an accidental leakage of implementation details.

Fixes #750, Fixes #751

Ref servo/servo#28269
  • Loading branch information
BurntSushi committed Mar 14, 2021
commit 081d4300af960413c9269ad5f64447e650110138
14 changes: 14 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,17 @@
1.4.5 (2021-03-14)
==================
This is a small patch release that fixes a regression in the size of a `Regex`
in the 1.4.4 release. Prior to 1.4.4, a `Regex` was 552 bytes. In the 1.4.4
release, it was 856 bytes due to internal changes. In this release, a `Regex`
is now 16 bytes. In general, the size of a `Regex` was never something that was
on my radar, but this increased size in the 1.4.4 release seems to have crossed
a threshold and resulted in stack overflows in some programs.

* [BUG #750](https://github.com/rust-lang/regex/pull/750):
Fixes stack overflows seemingly caused by a large `Regex` size by decreasing
its size.


1.4.4 (2021-03-11)
==================
This is a small patch release that contains some bug fixes. Notably, it also
Expand Down
15 changes: 11 additions & 4 deletions src/exec.rs
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,14 @@ pub struct Exec {
/// All read only state.
ro: Arc<ExecReadOnly>,
/// A pool of reusable values for the various matching engines.
pool: Pool<ProgramCache>,
///
/// Note that boxing this value is not strictly necessary, but it is an
/// easy way to ensure that T does not bloat the stack sized used by a pool
/// in the case where T is big. And this turns out to be the case at the
/// time of writing for regex's use of this pool. At the time of writing,
/// the size of a Regex on the stack is 856 bytes. Boxing this value
/// reduces that size to 16 bytes.
pool: Box<Pool<ProgramCache>>,
}

/// `ExecNoSync` is like `Exec`, except it embeds a reference to a cache. This
Expand Down Expand Up @@ -1446,11 +1453,11 @@ impl ExecReadOnly {
lcs_len >= 3 && lcs_len > self.dfa.prefixes.lcp().char_len()
}

fn new_pool(ro: &Arc<ExecReadOnly>) -> Pool<ProgramCache> {
fn new_pool(ro: &Arc<ExecReadOnly>) -> Box<Pool<ProgramCache>> {
let ro = ro.clone();
Pool::new(Box::new(move || {
Box::new(Pool::new(Box::new(move || {
AssertUnwindSafe(RefCell::new(ProgramCacheInner::new(&ro)))
}))
})))
}
}

Expand Down
15 changes: 15 additions & 0 deletions tests/test_default.rs
Original file line number Diff line number Diff line change
Expand Up @@ -136,3 +136,18 @@ fn oibits_regression() {

let _ = panic::catch_unwind(|| Regex::new("a").unwrap());
}

// See: https://github.com/rust-lang/regex/issues/750
#[test]
#[cfg(target_pointer_width = "64")]
fn regex_is_reasonably_small() {
use std::mem::size_of;

use regex::bytes;
use regex::{Regex, RegexSet};

assert_eq!(16, size_of::<Regex>());
assert_eq!(16, size_of::<RegexSet>());
assert_eq!(16, size_of::<bytes::Regex>());
assert_eq!(16, size_of::<bytes::RegexSet>());
}