Merged
Simplify u8::to_ascii_{upp,low}ercase while keeping it fast
SimonSapin committed Mar 18, 2019
commit 0ad91f73d92c3b8d3978f8f54c04b8efe3d2e673
24 changes: 23 additions & 1 deletion src/libcore/benches/ascii.rs
@@ -1,4 +1,26 @@
// See comments in `u8::to_ascii_uppercase` in `src/libcore/num/mod.rs`.
// Lower-case ASCII 'a' is the first byte that has its highest bit set
// after wrap-adding 0x1F:
//
// b'a' + 0x1F == 0x80 == 0b1000_0000
// b'z' + 0x1F == 0x99 == 0b1001_1001
//
// Lower-case ASCII 'z' is the last byte that has its highest bit unset
// after wrap-adding 0x05:
//
// b'a' + 0x05 == 0x66 == 0b0110_0110
// b'z' + 0x05 == 0x7F == 0b0111_1111
//
// … except for 0xFB to 0xFF, but those are in the range of bytes
// that have the highest bit unset again after adding 0x1F.
//
// So `(byte + 0x1f) & !(byte + 5)` has its highest bit set
// iff `byte` is a lower-case ASCII letter.
//
// Lower-case ASCII letters all have the 0x20 bit set.
// (Two positions right of 0x80, the highest bit.)
// Unsetting that bit produces the same letter, in upper-case.
//
// Therefore:
fn branchless_to_ascii_upper_case(byte: u8) -> u8 {
byte &
!(
Expand Down
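The comment above claims that `(byte + 0x1f) & !(byte + 5)` has its highest bit set exactly when `byte` is a lower-case ASCII letter. Since a `u8` has only 256 values, the claim can be checked exhaustively. The following is a minimal sketch, not the benchmark file itself: the function body is assumed to match the expression shown in the `src/libcore/num/mod.rs` hunk of this commit (the bench function is truncated here), and the `flag` variable and the `main` loop are illustrative additions.

```rust
/// Branchless upper-casing using the wrap-add range trick from the comment.
/// The body is an assumption: it mirrors the expression in the
/// `src/libcore/num/mod.rs` hunk, since the bench function is cut off here.
fn branchless_to_ascii_upper_case(byte: u8) -> u8 {
    // Bit 0x80 of `is_lower` is set iff `byte` is in b'a'..=b'z'.
    let is_lower = byte.wrapping_add(0x1f) & !byte.wrapping_add(0x05) & 0x80;
    // Move the flag from 0x80 down to 0x20, then clear that bit in `byte`.
    byte & !(is_lower >> 2)
}

fn main() {
    for byte in 0..=u8::MAX {
        // The range test itself: highest bit set iff lower-case letter.
        let flag = byte.wrapping_add(0x1f) & !byte.wrapping_add(0x05) & 0x80;
        assert_eq!(flag != 0, byte.is_ascii_lowercase());

        // The full conversion agrees with the standard library.
        assert_eq!(branchless_to_ascii_upper_case(byte), byte.to_ascii_uppercase());
    }
    println!("range trick verified for all 256 byte values");
}
```

The loop also covers the 0xFB to 0xFF corner case mentioned in the comment, where adding 0x05 wraps around but adding 0x1F leaves the highest bit unset.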
46 changes: 4 additions & 42 deletions src/libcore/num/mod.rs
@@ -3794,39 +3794,8 @@ impl u8 {
#[stable(feature = "ascii_methods_on_intrinsics", since = "1.23.0")]
#[inline]
pub fn to_ascii_uppercase(&self) -> u8 {
// See benchmarks in src/libcore/benches/ascii_case.rs

// Lower-case ASCII 'a' is the first byte that has its highest bit set
// after wrap-adding 0x1F:
//
// b'a' + 0x1F == 0x80 == 0b1000_0000
// b'z' + 0x1F == 0x99 == 0b1001_1001
//
// Lower-case ASCII 'z' is the last byte that has its highest bit unset
// after wrap-adding 0x05:
//
// b'a' + 0x05 == 0x66 == 0b0110_0110
// b'z' + 0x05 == 0x7F == 0b0111_1111
//
// … except for 0xFB to 0xFF, but those are in the range of bytes
// that have the highest bit unset again after adding 0x1F.
//
// So `(byte + 0x1f) & !(byte + 5)` has its highest bit set
// iff `byte` is a lower-case ASCII letter.
//
// Lower-case ASCII letters all have the 0x20 bit set.
// (Two positions right of 0x80, the highest bit.)
// Unsetting that bit produces the same letter, in upper-case.
//
// Therefore:
*self &
!(
(
self.wrapping_add(0x1f) &
!self.wrapping_add(0x05) &
0x80
) >> 2
)
// Unset the fifth bit if this is a lowercase letter
*self & !((self.is_ascii_lowercase() as u8) << 5)
Contributor

Suggested change
- *self & !((self.is_ascii_lowercase() as u8) << 5)
+ *self - ((self.is_ascii_lowercase() as u8) << 5)

Using subtraction instead of the mask is slightly faster for me:

test long::case12_mask_shifted_bool_match_range         ... bench:         776 ns/iter (+/- 26) = 9007 MB/s
test long::case13_sub_shifted_bool_match_range          ... bench:         734 ns/iter (+/- 49) = 9523 MB/s

Contributor Author

This is also an improvement for me, but smaller:

test ascii::long::case12_mask_shifted_bool_match_range         ... bench:         352 ns/iter (+/- 0) = 19857 MB/s
test ascii::long::case13_subtract_shifted_bool_match_range     ... bench:         350 ns/iter (+/- 1) = 19971 MB/s
test ascii::medium::case12_mask_shifted_bool_match_range       ... bench:          15 ns/iter (+/- 0) = 2133 MB/s
test ascii::medium::case13_subtract_shifted_bool_match_range   ... bench:          15 ns/iter (+/- 0) = 2133 MB/s
test ascii::short::case12_mask_shifted_bool_match_range        ... bench:          19 ns/iter (+/- 0) = 368 MB/s
test ascii::short::case13_subtract_shifted_bool_match_range    ... bench:          18 ns/iter (+/- 0) = 388 MB/s

}
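The review thread above compares two ways of clearing the 0x20 bit: masking it off and subtracting it. They produce the same result because every lower-case ASCII letter already has the 0x20 bit set, so subtracting 0x20 never borrows from a higher bit, and for all other bytes the subtrahend is zero. A minimal sketch checking this, along with equivalence to the removed wrap-add expression; the function names `upper_mask`, `upper_sub` and `upper_old` are hypothetical and do not appear in the PR:

```rust
/// Mask variant, as in the new `to_ascii_uppercase` body.
fn upper_mask(byte: u8) -> u8 {
    byte & !((byte.is_ascii_lowercase() as u8) << 5)
}

/// Subtract variant, as in the reviewer's suggested change.
fn upper_sub(byte: u8) -> u8 {
    // Never underflows: 0x20 is subtracted only when byte >= b'a'.
    byte - ((byte.is_ascii_lowercase() as u8) << 5)
}

/// The removed wrap-add expression, kept for comparison.
fn upper_old(byte: u8) -> u8 {
    byte & !((byte.wrapping_add(0x1f) & !byte.wrapping_add(0x05) & 0x80) >> 2)
}

fn main() {
    for byte in 0..=u8::MAX {
        let expected = byte.to_ascii_uppercase();
        assert_eq!(upper_mask(byte), expected);
        assert_eq!(upper_sub(byte), expected);
        assert_eq!(upper_old(byte), expected);
    }
    println!("mask, subtract and wrap-add variants agree on all 256 bytes");
}
```

This only shows that the variants are interchangeable in behavior. Which one is faster depends on the generated code and the machine, as the differing benchmark numbers in the thread suggest.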

/// Makes a copy of the value in its ASCII lower case equivalent.
@@ -3848,15 +3817,8 @@ impl u8 {
#[stable(feature = "ascii_methods_on_intrinsics", since = "1.23.0")]
#[inline]
pub fn to_ascii_lowercase(&self) -> u8 {
// See comments in to_ascii_uppercase above.
*self |
(
(
self.wrapping_add(0x3f) &
!self.wrapping_add(0x25) &
0x80
) >> 2
)
// Set the fifth bit if this is an uppercase letter
*self | ((self.is_ascii_uppercase() as u8) << 5)
}
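The lower-case direction mirrors the upper-case one: the removed expression shifts the range test down by 0x20 (constants 0x3f and 0x25 instead of 0x1f and 0x05) so that it matches b'A'..=b'Z', and the new body sets the 0x20 bit instead of clearing it. A minimal sketch checking that both forms agree with the standard library for every byte; `lower_old` and `lower_new` are hypothetical names used only for this illustration:

```rust
/// The removed wrap-add expression for lower-casing.
fn lower_old(byte: u8) -> u8 {
    // Bit 0x80 is set iff `byte` is in b'A'..=b'Z'; shift it down to 0x20.
    byte | ((byte.wrapping_add(0x3f) & !byte.wrapping_add(0x25) & 0x80) >> 2)
}

/// The new body: set the 0x20 bit when the byte is an upper-case letter.
fn lower_new(byte: u8) -> u8 {
    byte | ((byte.is_ascii_uppercase() as u8) << 5)
}

fn main() {
    for byte in 0..=u8::MAX {
        let expected = byte.to_ascii_lowercase();
        assert_eq!(lower_old(byte), expected);
        assert_eq!(lower_new(byte), expected);
    }
    println!("old and new lower-casing agree on all 256 bytes");
}
```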

/// Checks that two values are an ASCII case-insensitive match.
Expand Down