Merged
extract used CSS variables from CSS files
We will track CSS files while traversing the folder structure, but we won't
extract any normal candidates from these CSS files. We will also not
include these files in any of the returned globs.

Instead, we will run the CSS extractor on these CSS files, and every time we
find a CSS variable, we will check whether it is used.

For now, "used" just means that the variable appears inside of `var(…)`.
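
As a rough illustration of the `var(…)` rule described in the commit message, here is a minimal standalone sketch. It is not the PR's implementation: the real code relies on `CssVariableMachine`, while the helper name `used_css_variables` and its simplified notion of a variable name (ASCII letters, digits, `-`, `_`) are assumptions made for this example only.

```rust
/// Hypothetical helper (not part of the PR): returns every `--name` token
/// that is immediately preceded by `var(`.
fn used_css_variables(input: &[u8]) -> Vec<&[u8]> {
    let mut used = Vec::new();
    let mut i = 0;

    while i + 1 < input.len() {
        if input[i] == b'-' && input[i + 1] == b'-' {
            let start = i;
            let mut end = i + 2;

            // Simplified assumption: a name consists of ASCII alphanumerics, `-`, and `_`.
            while end < input.len()
                && (input[end].is_ascii_alphanumeric()
                    || input[end] == b'-'
                    || input[end] == b'_')
            {
                end += 1;
            }

            // A definition such as `--color-red: #ff0000;` has no `var(` in front
            // of it, so only usages pass this check.
            if end > start + 2 && input[..start].ends_with(b"var(") {
                used.push(&input[start..end]);
            }

            i = end;
        } else {
            i += 1;
        }
    }

    used
}
```

Calling `used_css_variables(b"var(--color-used-at-start)")` yields the single variable name, while a definition at the very start of the input yields nothing. Checking `ends_with` on the prefix slice also covers a variable that sits within the first four bytes of the input, so this sketch needs no separate bounds check.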
RobinMalfait committed Mar 28, 2025
commit 9f7d0f96963d5d19d0971161f2c0bb8c768f4dd7
36 changes: 36 additions & 0 deletions crates/oxide/src/extractor/mod.rs
@@ -1,5 +1,6 @@
use crate::cursor;
use crate::extractor::machine::Span;
use bstr::ByteSlice;
use candidate_machine::CandidateMachine;
use css_variable_machine::CssVariableMachine;
use machine::{Machine, MachineState};
@@ -139,6 +140,41 @@ impl<'a> Extractor<'a> {

        extracted
    }

    pub fn extract_css_variables_from_css_files(&mut self) -> Vec<Extracted<'a>> {
Member Author

Bit of a mouthful; open to suggestions. Note: this is a completely internal API.

Contributor

`extract_variables_from_css` if you want a shorter name, but this is fine imo

Member Author

Ah that's nicer, and shorter, thanks!

        let mut extracted = Vec::with_capacity(100);

        let len = self.cursor.input.len();

        let cursor = &mut self.cursor.clone();
        while cursor.pos < len {
            if cursor.curr.is_ascii_whitespace() {
                cursor.advance();
                continue;
            }

            if let MachineState::Done(span) = self.css_variable_machine.next(cursor) {
                // We are only interested in variables that are used, not defined. Therefore we
                // need to ensure that the variable is prefixed with `var(`.
                if span.start < 4 {
                    cursor.advance();
                    continue;
                }

                let slice_before = Span::new(span.start - 4, span.start - 1);
                if !slice_before.slice(self.cursor.input).starts_with(b"var(") {
                    cursor.advance();
                    continue;
                }

                extracted.push(Extracted::CssVariable(span.slice(self.cursor.input)));
            }

            cursor.advance();
        }

        extracted
    }
}

// Extract sub-candidates from a given range.
56 changes: 54 additions & 2 deletions crates/oxide/src/scanner/mod.rs
@@ -215,11 +215,25 @@ impl Scanner {
     fn extract_candidates(&mut self) -> Vec<String> {
         let changed_content = self.changed_content.drain(..).collect::<Vec<_>>();

-        let candidates = parse_all_blobs(read_all_files(changed_content));
+        // Extract all candidates from the changed content
+        let mut new_candidates = parse_all_blobs(read_all_files(changed_content));
+
+        // Extract all CSS variables from the CSS files
+        let css_files = self.css_files.drain(..).collect::<Vec<_>>();
+        if !css_files.is_empty() {
+            let css_variables = extract_css_variables(read_all_files(
+                css_files
+                    .into_iter()
+                    .map(|file| ChangedContent::File(file, "css".into()))
+                    .collect(),
+            ));
+
+            new_candidates.extend(css_variables);
+        }

         // Only compute the new candidates and ignore the ones we already have. This is for
         // subsequent calls to prevent serializing the entire set of candidates every time.
-        let mut new_candidates = candidates
+        let mut new_candidates = new_candidates
             .into_par_iter()
             .filter(|candidate| !self.candidates.contains(candidate))
             .collect::<Vec<_>>();
@@ -411,6 +425,44 @@ fn read_all_files(changed_content: Vec<ChangedContent>) -> Vec<Vec<u8>> {
        .collect()
}

#[tracing::instrument(skip_all)]
fn extract_css_variables(blobs: Vec<Vec<u8>>) -> Vec<String> {
Member Author

I started passing options to the other `parse_all_blobs` function, but it looked a bit messy, so I duplicated it instead (even though that leaves a good chunk of duplication).

Member

I feel like the only difference really should be that we no-op the Candidate machine though? Can you elaborate on what you meant by messy?

Member Author

I started passing the contents and the extension through, and it leaked everywhere. You can see that in this commit: 7dc7878

Then I was thinking about an `ExtractorOptions` struct that we could pass in as well, but that also required checking the options in the `extract()` call at runtime. While that's not the end of the world, it's an additional check we have to perform, and it's a useless check for 99% of the files we scan. To make things a bit worse, the `extract()` function is called for every line in every file, so there could be a lot of unnecessary checks.

So instead of checking it in this hot path, I created a separate function.
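
The trade-off described in this thread can be sketched roughly as follows. Every name here (`ExtractorOptions` as a concrete struct, `LineExtractor`, `css_variables_only`, `extract_with_options`) is hypothetical, and whitespace-token splitting is only a stand-in for the real state machines; the sketch just contrasts a per-call runtime branch with a dedicated entry point.

```rust
/// Hypothetical options struct; the PR deliberately did not add one.
#[derive(Clone, Copy, Default)]
struct ExtractorOptions {
    css_variables_only: bool,
}

/// Hypothetical per-line extractor used only for this illustration.
struct LineExtractor<'a> {
    line: &'a str,
}

impl<'a> LineExtractor<'a> {
    /// Variant 1: a single entry point that checks the options on every call,
    /// i.e. once per line of every scanned file.
    fn extract_with_options(&self, options: ExtractorOptions) -> Vec<&'a str> {
        if options.css_variables_only {
            return self.extract_css_variables();
        }
        // Stand-in for normal candidate extraction.
        self.line.split_whitespace().collect()
    }

    /// Variant 2: a dedicated entry point. The caller decides once, per file
    /// type, which function to call, so the common path carries no extra branch.
    fn extract_css_variables(&self) -> Vec<&'a str> {
        self.line
            .split_whitespace()
            .filter(|token| token.starts_with("var(--"))
            .collect()
    }
}
```

Both variants return the same result for CSS input; the difference is only where the decision is made.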

    let mut result: Vec<_> = blobs
        .par_iter()
        .flat_map(|blob| blob.par_split(|x| *x == b'\n'))
        .filter_map(|blob| {
            if blob.is_empty() {
                return None;
            }

            let extracted =
                crate::extractor::Extractor::new(blob).extract_css_variables_from_css_files();
            if extracted.is_empty() {
                return None;
            }

            Some(FxHashSet::from_iter(extracted.into_iter().map(
                |x| match x {
                    Extracted::CssVariable(bytes) => bytes,
                    _ => &[],
                },
            )))
        })
        .reduce(Default::default, |mut a, b| {
            a.extend(b);
            a
        })
        .into_iter()
        .map(|s| unsafe { String::from_utf8_unchecked(s.to_vec()) })
        .collect();

    // SAFETY: Unstable sort is faster and in this scenario it's also safe because we are
    // guaranteed to have unique candidates.
    result.par_sort_unstable();

    result
}
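
The shape of the function above (split each blob into lines, extract per line, deduplicate through a set, then sort) can be sketched with the standard library alone. This is a sequential approximation, not the PR's code: the name `collect_unique_sorted` and the `extract_line` callback are assumptions for illustration, `HashSet` stands in for `FxHashSet`, plain iteration stands in for the parallel iterators, and `String::from_utf8_lossy` replaces the unchecked conversion.

```rust
use std::collections::HashSet;

/// Hypothetical, sequential stand-in for the parallel pipeline.
/// `extract_line` is whatever per-line extractor the caller supplies.
fn collect_unique_sorted<'a, F>(blobs: &'a [Vec<u8>], extract_line: F) -> Vec<String>
where
    F: Fn(&'a [u8]) -> Vec<&'a [u8]>,
{
    let mut unique: HashSet<&[u8]> = HashSet::new();

    for blob in blobs {
        // Each line is processed independently, mirroring the per-line split.
        for line in blob.split(|byte| *byte == b'\n') {
            if line.is_empty() {
                continue;
            }
            unique.extend(extract_line(line));
        }
    }

    let mut result: Vec<String> = unique
        .into_iter()
        .map(|bytes| String::from_utf8_lossy(bytes).into_owned())
        .collect();

    // Stability only matters for equal elements; the set guarantees there are
    // none, so an unstable sort gives the same order as a stable one.
    result.sort_unstable();
    result
}
```

Because every element passes through the set first, the sort never sees duplicates, which is the property the comment in the diff relies on.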

#[tracing::instrument(skip_all)]
fn parse_all_blobs(blobs: Vec<Vec<u8>>) -> Vec<String> {
let mut result: Vec<_> = blobs
35 changes: 35 additions & 0 deletions crates/oxide/tests/scanner.rs
@@ -1735,4 +1735,39 @@ mod scanner {

        assert_eq!(candidates, vec!["content-['abcd/xyz.html']"]);
    }

    #[test]
    fn test_extract_used_css_variables_from_css() {
        let dir = tempdir().unwrap().into_path();
        create_files_in(
            &dir,
            &[
                (
                    "src/index.css",
                    r#"
                        @theme {
                            --color-red: #ff0000; /* Not used, so don't extract */
                            --color-green: #00ff00; /* Not used, so don't extract */
                        }

                        .button {
                            color: var(--color-red); /* Used, so extract */
                        }
                    "#,
                ),
                ("src/used-at-start.css", "var(--color-used-at-start)"),
                // Here to verify that we don't crash when trying to find `var(` in front of the
                // variable.
                ("src/defined-at-start.css", "--color-defined-at-start: red;"),
            ],
        );

        let mut scanner = Scanner::new(vec![public_source_entry_from_pattern(
            dir.clone(),
            "@source './'",
        )]);
        let candidates = scanner.scan();

        assert_eq!(candidates, vec!["--color-red", "--color-used-at-start"]);
    }
}