Merged
fix(parser): handle invalid surrogate pair as lossy (#9964)
- Closes #3526

It looks like an invalid surrogate pair wasn't preserved properly. For example, `tasks/coverage/test262/test/built-ins/RegExp/escape/escaped-surrogates.js` contains code like this:

https://playground.oxc.rs/#eNplkU1TAjEMhv8K0wsXcAEFEceDCnjUgSuX0g1LtdvsJC0fw/Df7RKWccbT0ybp+ybNSRk1UQY9hxZ6aL202qtVnI57vfbzyks87PFPXNCv0yufZa2g+YczgzsgXUAWgMNgNLgwW0frQtd6zhZQzA5VBmx0BVfkXY5EWOhUe/fNjd3WFtvlLfFBGKv+f3/BQHAveBAMBSPBo2AseBK8Ct4E74KpYCaY1xOqjkI1OSmKvgYffdAHNQkUoaOc9UFNNtpxurDBCpoMH8s1uuYWSHveIJXX4nNHVZoYqJbUzuF+ASGS/4yBbQ7z6E2wmAzleUWQanfwpQk8SzRJ1O4XiXS+OYhm0FRAak0BD3r9YRrCMrr0m/kUjNOka/mk1HRjMIcCLhOC12sHS4xkoNTVza203m5s45cWFQjdPLVeP0qrXyOn6UXwfP4FSqrNaw==

input

```js
const one = '\uD800';
const two = '\uD800\uD801';

// tasks/coverage/test262/test/built-ins/RegExp/escape/escaped-surrogates.js
const highSurrogatesGroup1 = '\uD800\uD801\uD802\uD803\uD804\uD805\uD806\uD807\uD808\uD809\uD80A\uD80B\uD80C\uD80D\uD80E\uD80F';
```

output (before this fix)

```js
const one = '\uD800';
const two = "\\ud800\\ud801";
const highSurrogatesGroup1 = "\\ud800\\ud801\\ud802\\ud803\\ud804\\ud805\\ud806\\ud807\\ud808\\ud809\\ud80a\\ud80b\\ud80c\\ud80d\\ud80e\\ud80f";
```
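For context, `\uD800\uD801` is not a valid pair because both code units are high surrogates; a well-formed pair is a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF). The following is a minimal sketch of that check, not the lexer's actual code; the function names are hypothetical and chosen only for illustration.

```rust
/// High (leading) surrogates occupy U+D800..=U+DBFF.
fn is_high_surrogate(unit: u16) -> bool {
    (0xD800..=0xDBFF).contains(&unit)
}

/// Low (trailing) surrogates occupy U+DC00..=U+DFFF.
fn is_low_surrogate(unit: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&unit)
}

/// Combine a well-formed high/low pair into a single `char`.
/// Returns `None` when the two units do not form a valid pair.
/// (Hypothetical helper, not part of oxc.)
fn combine_surrogates(first: u16, second: u16) -> Option<char> {
    if is_high_surrogate(first) && is_low_surrogate(second) {
        let code_point =
            0x10000 + ((u32::from(first) - 0xD800) << 10) + (u32::from(second) - 0xDC00);
        char::from_u32(code_point)
    } else {
        None
    }
}
```

With this sketch, `combine_surrogates(0xD83D, 0xDE00)` yields U+1F600, while `combine_surrogates(0xD800, 0xD801)` yields `None`, which is the case this PR is about.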
hi-ogawa committed Mar 23, 2025
commit eaea5fd4c2f61c0c77eb8ddf3a133b7604e86cd4
1 change: 1 addition & 0 deletions crates/oxc_codegen/tests/integration/unit.rs
@@ -143,6 +143,7 @@ fn unicode_escape() {
     test("console.log('こんにちは');", "console.log(\"こんにちは\");\n");
     test("console.log('안녕하세요');", "console.log(\"안녕하세요\");\n");
     test("console.log('🧑‍🤝‍🧑');", "console.log(\"🧑‍🤝‍🧑\");\n");
+    test("console.log(\"\\uD800\\uD801\")", "console.log(\"\\uD800\\uD801\");\n");
 }
 
 #[test]
8 changes: 3 additions & 5 deletions crates/oxc_parser/src/lexer/unicode.rs
@@ -136,11 +136,9 @@ impl<'a> Lexer<'a> {
                 self.token.lossy = true;
             }
         }
-        SurrogatePair::HighLow(high, low) => {
-            text.push_str("\\u");
-            text.push_str(format!("{high:x}").as_str());
-            text.push_str("\\u");
-            text.push_str(format!("{low:x}").as_str());
+        SurrogatePair::HighLow(_high, _low) => {
+            text.push_str("\u{FFFD}\u{FFFD}");
+            self.token.lossy = true;
         }
     }
 }
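
The change above replaces each unpaired surrogate with U+FFFD and marks the token as lossy, presumably so that later stages can fall back to the raw source text instead of the decoded value. A rough, standalone sketch of the same idea using only the standard library follows; `Decoded` and `decode_units_lossy` are hypothetical names, and this is not how the oxc lexer itself is structured.

```rust
/// Hypothetical result type: the decoded text plus a flag recording
/// whether any code unit had to be replaced.
struct Decoded {
    text: String,
    lossy: bool,
}

/// Decode UTF-16 code units, substituting U+FFFD for every unpaired
/// surrogate and remembering that information was lost.
fn decode_units_lossy(units: &[u16]) -> Decoded {
    let mut text = String::new();
    let mut lossy = false;
    for result in char::decode_utf16(units.iter().copied()) {
        match result {
            Ok(ch) => text.push(ch),
            Err(_) => {
                // An unpaired surrogate cannot be stored in a Rust `String`.
                text.push(char::REPLACEMENT_CHARACTER);
                lossy = true;
            }
        }
    }
    Decoded { text, lossy }
}
```

For the input from the new test, `decode_units_lossy(&[0xD800, 0xD801])` produces two replacement characters with `lossy` set, matching the `"\u{FFFD}\u{FFFD}"` pushed in the patch. A caller that sees `lossy == true` would need to keep the original escape text around in order to print it back unchanged.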