Merged
only split by newlines
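To reduce the overhead of the Extractor itself, we can chunk the work by
lines instead of by every whitespace-separated chunk. Splitting only on
`\n` and `\r` produces fewer, larger chunks than splitting on every ASCII
whitespace byte, so `Extractor::unique` is called, and its result merged
in the `reduce` step, far less often.

This seems to reduce the overall cost even further.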
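A minimal sketch of why this helps. It uses std's `<[u8]>::split` as a sequential stand-in for rayon's `par_split` (both take a byte predicate) and only counts how many chunks each predicate would hand to the extractor. The function names are hypothetical and not part of the crate.

```rust
// Count how many chunks each split predicate produces for the same blob.
// std's `<[u8]>::split` stands in for rayon's `par_split` here; both
// call the predicate on each byte and cut the slice where it returns true.

/// Old predicate: cut on every ASCII whitespace byte.
fn count_whitespace_chunks(blob: &[u8]) -> usize {
    blob.split(|x| x.is_ascii_whitespace()).count()
}

/// New predicate: cut only on line terminators.
fn count_line_chunks(blob: &[u8]) -> usize {
    blob.split(|x| matches!(x, b'\n' | b'\r')).count()
}
```

For the three-line input `<div class="flex items-center p-4">`, `  <span class="text-sm font-bold">Hi</span>`, `</div>`, the whitespace predicate yields 10 chunks (two of them empty, from the indentation) while the line predicate yields 3, so the extractor runs 3 times instead of 10.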

Co-authored-by: Jordan Pittman <[email protected]>
RobinMalfait and thecrypticace committed Dec 2, 2024
commit 8fe397717d11e85737220049ebc8fcdc02744628
crates/oxide/src/lib.rs (2 changes: 1 addition, 1 deletion)
@@ -456,7 +456,7 @@ fn read_all_files(changed_content: Vec<ChangedContent>) -> Vec<Vec<u8>> {
 fn parse_all_blobs(blobs: Vec<Vec<u8>>) -> Vec<String> {
     let mut result: Vec<_> = blobs
         .par_iter()
-        .flat_map(|blob| blob.par_split(|x| x.is_ascii_whitespace()))
+        .flat_map(|blob| blob.par_split(|x| matches!(x, b'\n' | b'\r')))
         .map(|blob| Extractor::unique(blob, Default::default()))
         .reduce(Default::default, |mut a, b| {
             a.extend(b);
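The changed pipeline (split, extract per chunk, merge the per-chunk sets) can be sketched sequentially as below. This is an assumption-laden illustration, not the crate's code: `extract_candidates` is a hypothetical stand-in for `Extractor::unique` that merely collects whitespace-separated tokens, `fold` replaces rayon's `reduce`, and the final sorted `Vec<String>` conversion is invented because the rest of `parse_all_blobs` is not shown in the diff.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `Extractor::unique`: collects the non-empty
/// whitespace-separated tokens of one chunk. The real extractor is far more
/// involved; this only exists so the pipeline shape can run.
fn extract_candidates(chunk: &[u8]) -> HashSet<&[u8]> {
    chunk
        .split(|x| x.is_ascii_whitespace())
        .filter(|token| !token.is_empty())
        .collect()
}

/// Sequential sketch of the diffed pipeline: split every blob on line
/// terminators, extract candidates per line, and merge the per-line sets.
fn parse_all_blobs_sketch(blobs: &[Vec<u8>]) -> Vec<String> {
    let merged: HashSet<&[u8]> = blobs
        .iter()
        .flat_map(|blob| blob.split(|x| matches!(x, b'\n' | b'\r')))
        .map(extract_candidates)
        .fold(HashSet::<&[u8]>::new(), |mut a, b| {
            // Same merge step as the `reduce` closure in the diff.
            a.extend(b);
            a
        });

    // Hypothetical final step: owned, sorted strings for stable output.
    let mut result: Vec<String> = merged
        .into_iter()
        .map(|bytes| String::from_utf8_lossy(bytes).into_owned())
        .collect();
    result.sort();
    result
}
```

Because the predicate matches both `\r` and `\n`, a CRLF line ending yields an empty chunk between the two bytes; in this sketch that chunk simply produces an empty set, which the merge step absorbs.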