@BurntSushi
Member

@BurntSushi BurntSushi commented May 15, 2016

This uses the "Teddy" algorithm, as learned from the Hyperscan regular
expression library.

This support is optional, subject to the following:

  1. A nightly compiler.
  2. Enabling the simd-accel feature.
  3. Adding RUSTFLAGS="-C target-feature=+ssse3" when compiling.
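
As a rough illustration of how conditions 2 and 3 could gate a code path at compile time, here is a minimal sketch. The function name `backend` and the returned labels are hypothetical and not taken from this PR; only the `simd-accel` feature name and the `ssse3` target feature come from the list above.

```rust
/// Hypothetical helper (not part of this PR): reports which code path
/// would be compiled in, based on the Cargo feature and target feature
/// named in the list above.
fn backend() -> &'static str {
    // `cfg!` is evaluated at compile time. Both conditions must hold:
    // the crate feature is enabled AND the compiler was told SSSE3 is
    // available (e.g. via RUSTFLAGS="-C target-feature=+ssse3").
    if cfg!(all(feature = "simd-accel", target_feature = "ssse3")) {
        "simd"
    } else {
        "fallback"
    }
}

fn main() {
    println!("selected backend: {}", backend());
}
```

Built without the feature or without the target-feature flag, the sketch selects the fallback branch, which is why the support is opt-in.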

@BurntSushi
Member Author

Note that this PR is blocked on a new release of simd making its way to crates.io. :-)

cc @huonw @alexcrichton

@killercup I may be able to carve out some interesting projects worth mentoring from this. I left quite a number of TODOs. Feel like learning SIMD? :-)

@BurntSushi
Member Author

Relevant benchmarks:

name                                   rust.master ns/iter   rust.simd ns/iter       diff ns/iter   diff %
sherlock::name_alt3                    1,153,246 (515 MB/s)  187,304 (3,176 MB/s)        -965,942  -83.76%
sherlock::name_alt4_nocase             1,223,618 (486 MB/s)  293,523 (2,026 MB/s)        -930,095  -76.01%
sherlock::name_alt5                    319,736 (1,860 MB/s)  182,599 (3,258 MB/s)        -137,137  -42.89%
sherlock::name_alt5_nocase             1,223,311 (486 MB/s)  726,282 (819 MB/s)          -497,029  -40.63%
sherlock::name_holmes_nocase           1,108,772 (536 MB/s)  258,606 (2,300 MB/s)        -850,166  -76.68%
sherlock::name_sherlock_holmes_nocase  1,159,518 (513 MB/s)  239,155 (2,487 MB/s)        -920,363  -79.37%
sherlock::name_sherlock_nocase         1,160,342 (512 MB/s)  235,768 (2,523 MB/s)        -924,574  -79.68%
sherlock::the_nocase                   1,643,616 (361 MB/s)  461,669 (1,288 MB/s)      -1,181,947  -71.91%
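
For readers checking the table, the last two columns follow from the first two: the diff is the new time minus the old time, and the percentage is that diff relative to the old time. A small sketch of the arithmetic, using the `sherlock::name_alt3` row (the function name `bench_diff` is made up for illustration):

```rust
/// Returns (diff in ns/iter, diff as a percentage of the baseline).
/// Negative values mean the new build is faster.
fn bench_diff(old_ns: u64, new_ns: u64) -> (i64, f64) {
    let diff = new_ns as i64 - old_ns as i64;
    let pct = diff as f64 / old_ns as f64 * 100.0;
    (diff, pct)
}

fn main() {
    // sherlock::name_alt3: rust.master vs rust.simd, in ns/iter.
    let (diff, pct) = bench_diff(1_153_246, 187_304);
    println!("{} ns/iter, {:.2}%", diff, pct); // prints "-965942 ns/iter, -83.76%"
}
```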

@killercup
Contributor

Cool! Thank you for thinking of me! I'll have a look at this :)

@killercup
Contributor

killercup commented May 16, 2016

Fiddling with this a bit, I noticed (aside from the stuff I did in #232) that there seems to be a genuine underflow exposed by the fowler::match_basic_81 test case: in src/simd_accel/teddy128.rs:573, you call verify_128 with pos - 2. But pos is a usize and doesn't appear to be guaranteed to be >= 2 here.

Edit: I added a simple check to prevent the underflow in #232.
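
To illustrate the kind of guard being described, here is a minimal sketch using `usize::checked_sub`. It is not the actual check from #232 or from this PR; the helper name `verify_start` and its parameters are hypothetical.

```rust
/// Hypothetical helper: computes the offset at which verification would
/// start, `back` bytes before `pos`. Returns None instead of underflowing
/// when `pos < back`.
fn verify_start(pos: usize, back: usize) -> Option<usize> {
    // A plain `pos - back` on usize panics in debug builds and wraps
    // around in release builds when pos < back.
    pos.checked_sub(back)
}

fn main() {
    assert_eq!(verify_start(5, 2), Some(3));
    // pos < 2: no valid start offset, so the caller can skip or clamp.
    assert_eq!(verify_start(1, 2), None);
}
```

Whether to skip the candidate or clamp to offset 0 when `None` comes back depends on what the verification step expects, which the comment above does not specify.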

@BurntSushi
Copy link
Member Author

@killercup Thanks! I've fixed that in this PR. (I didn't see your edit.)

@BurntSushi BurntSushi force-pushed the simd-teddy branch 6 times, most recently from 36faa2c to 6f2bb0f May 18, 2016 14:24
This uses the "Teddy" algorithm, as learned from the Hyperscan regular
expression library: https://01.org/hyperscan

This support is optional, subject to the following:

1. A nightly compiler.
2. Enabling the `simd-accel` feature.
3. Adding `RUSTFLAGS="-C target-feature=+ssse3"` when compiling.
@BurntSushi BurntSushi merged commit 05e4a02 into master May 18, 2016
@BurntSushi
Member Author

@llogiq FYI, this PR impacts the regex-dna benchmark. It makes the multithreaded version slightly faster (0.69s down to 0.63s on my system), but it makes the single-threaded version about twice as fast (2.55s down to 1.23s). We'll have to wait for simd on stable to get this into the benchmark game though!
