src: implement Windows-1252 encoding support and update related tests #60893
I think this test type is flaky: https://github.com/nodejs/node/actions/runs/19785758751/job/56691976357?pr=60893
mcollina left a comment
lgtm
ChALkeR left a comment
Is there any reason to keep DecodeLatin1? It seems unused.
Force-pushed from 709e9ce to 25319d5
I removed DecodeLatin1, thanks.
Force-pushed from 848e480 to 59fd7cb
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@           Coverage Diff           @@
##             main   #60893   +/-   ##
=======================================
  Coverage   88.56%   88.56%
=======================================
  Files         703      703
  Lines      208291   208302     +11
  Branches    40170    40160     -10
=======================================
+ Hits       184472   184482     +10
- Misses      15837    15841      +4
+ Partials     7982     7979      -3
Landed in f65b0bc
PR-URL: #60893
Fixes: #60888
Fixes: #59515
Fixes: #56542
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
I see no performance improvement from this over #60889, and a significant performance degradation instead. cc @nodejs/performance
  // Check if byte is in the special Windows-1252 range (128-159)
  if (byte >= 0x80 && byte <= 0x9F) {
    codepoint = windows1252_mapping[byte - 0x80];
  } else {
    // For all other bytes, Windows-1252 is identical to Latin-1
    codepoint = byte;
  }

  if (has_fatal && written == 0) {
    return node::THROW_ERR_ENCODING_INVALID_ENCODED_DATA(
        env->isolate(), "The encoded data was not valid for encoding latin1");
  // Convert codepoint to UTF-8
  if (codepoint < 0x80) {
    result.push_back(static_cast<char>(codepoint));
  } else if (codepoint < 0x800) {
    result.push_back(static_cast<char>(0xC0 | (codepoint >> 6)));
    result.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
  } else {
    result.push_back(static_cast<char>(0xE0 | (codepoint >> 12)));
    result.push_back(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
    result.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
  }
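A self-contained sketch of the per-byte logic in the hunk above, for anyone who wants to run it in isolation. It assumes `windows1252_mapping` holds the WHATWG index-windows-1252 code points for bytes 0x80 to 0x9F (the table itself is not shown in the hunk), and `DecodeWindows1252` is a hypothetical free function, not Node's actual binding.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed contents of windows1252_mapping: WHATWG index-windows-1252 code
// points for bytes 0x80..0x9F. Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D map to
// the C1 control with the same value.
static const char16_t windows1252_mapping[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Hypothetical helper: decodes Windows-1252 bytes into a UTF-8 string one
// byte at a time, mirroring the branches in the reviewed hunk.
std::string DecodeWindows1252(const std::vector<uint8_t>& input) {
  std::string result;
  result.reserve(input.size());
  for (uint8_t byte : input) {
    uint32_t codepoint;
    if (byte >= 0x80 && byte <= 0x9F) {
      codepoint = windows1252_mapping[byte - 0x80];
    } else {
      codepoint = byte;  // identical to Latin-1 outside 0x80..0x9F
    }
    // Every mapped code point is below U+10000, so at most 3 UTF-8 bytes.
    if (codepoint < 0x80) {
      result.push_back(static_cast<char>(codepoint));
    } else if (codepoint < 0x800) {
      result.push_back(static_cast<char>(0xC0 | (codepoint >> 6)));
      result.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
    } else {
      result.push_back(static_cast<char>(0xE0 | (codepoint >> 12)));
      result.push_back(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
      result.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
    }
  }
  return result;
}
```

For example, the bytes `{0x41, 0x80, 0xE9}` decode to the UTF-8 bytes `41 E2 82 AC C3 A9` ("A€é"). Because no mapped code point reaches U+10000, the 4-byte UTF-8 branch is never needed.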
This set of ifs on each char is extremely slow.
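One common way to keep a chain of branches out of the per-byte hot loop is to precompute the UTF-8 encoding of all 256 possible byte values once, then bulk-copy ASCII runs and do a single table lookup for each non-ASCII byte. The sketch below only illustrates that idea: it is not the code from #60889 or from this PR, `DecodeWindows1252Table` and the other names are hypothetical, and the table assumes the WHATWG index-windows-1252 code points for 0x80 to 0x9F. Whether it actually beats the reviewed loop would need to be confirmed with a benchmark.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

namespace {

// A precomputed UTF-8 encoding (1 to 3 bytes) for one input byte value.
struct Utf8Seq {
  uint8_t len;
  char bytes[3];
};

// Assumed WHATWG index-windows-1252 code points for bytes 0x80..0x9F.
constexpr char16_t kC1Range[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Runs the branchy code-point-to-UTF-8 logic once per byte value
// (256 times) instead of once per input byte.
std::array<Utf8Seq, 256> BuildTable() {
  std::array<Utf8Seq, 256> table{};
  for (uint32_t b = 0; b < 256; ++b) {
    uint32_t cp = (b >= 0x80 && b <= 0x9F) ? kC1Range[b - 0x80] : b;
    Utf8Seq& s = table[b];
    if (cp < 0x80) {
      s.len = 1;
      s.bytes[0] = static_cast<char>(cp);
    } else if (cp < 0x800) {
      s.len = 2;
      s.bytes[0] = static_cast<char>(0xC0 | (cp >> 6));
      s.bytes[1] = static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      s.len = 3;
      s.bytes[0] = static_cast<char>(0xE0 | (cp >> 12));
      s.bytes[1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      s.bytes[2] = static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return table;
}

const std::array<Utf8Seq, 256> kTable = BuildTable();

}  // namespace

// Hypothetical table-driven decoder: copies ASCII runs in one append, then
// does a single table lookup for each non-ASCII byte.
std::string DecodeWindows1252Table(std::string_view input) {
  std::string out;
  out.reserve(input.size());
  std::size_t i = 0;
  while (i < input.size()) {
    std::size_t end = i;
    while (end < input.size() && static_cast<uint8_t>(input[end]) < 0x80) {
      ++end;
    }
    out.append(input.data() + i, end - i);  // ASCII run, copied as-is
    if (end == input.size()) break;
    const Utf8Seq& s = kTable[static_cast<uint8_t>(input[end])];
    out.append(s.bytes, s.len);
    i = end + 1;
  }
  return out;
}
```

The design trade-off is a 1 KiB static table in exchange for a loop body with no per-byte range checks; production decoders often go further and use SIMD for the ASCII scan.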
Looking back, #55275 should have been reverted, then re-added only if it showed a performance improvement over the revert.

What happened instead is that #55275 showed perf improvements in some cases but broke the decoder; then the fix for that breakage here negated all of those performance improvements and instead made the decoder 4-20x slower than it originally was.

This is a process problem: perf improvements should not be merged without benchmarks, and this should have been treated as a perf path.
I will look into this problem today.
Alternative to this PR: #60889
This PR deletes DecodeLatin1 and adds Windows-1252 encoding support.
Problem: #60888
Thanks for the repair and the detailed info 🙏 @ChALkeR
cc @mcollina
Fixes: #60888
Fixes: #59515
Fixes: #56542
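The thread does not spell out why a "latin1" decoder needs Windows-1252 semantics rather than plain Latin-1. One likely reason is that in the WHATWG Encoding Standard, labels such as "latin1", "iso-8859-1" and "ascii" all resolve to the windows-1252 encoding, so bytes 0x80 to 0x9F are expected to decode through the Windows-1252 table. The sketch below shows only that label-matching step; `ResolvesToWindows1252` is a hypothetical name, and only a subset of the standard's windows-1252 labels is listed.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical helper: reports whether an encoding label resolves to
// windows-1252 under the WHATWG Encoding Standard. Only a subset of the
// standard's label list for that encoding is included here.
bool ResolvesToWindows1252(std::string_view label) {
  // Strip leading and trailing ASCII whitespace (TAB, LF, FF, CR, SPACE).
  constexpr std::string_view kWhitespace = "\t\n\f\r ";
  const auto start = label.find_first_not_of(kWhitespace);
  if (start == std::string_view::npos) return false;
  const auto end = label.find_last_not_of(kWhitespace);
  std::string key(label.substr(start, end - start + 1));
  // Labels are matched ASCII case-insensitively.
  std::transform(key.begin(), key.end(), key.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  static constexpr std::string_view kLabels[] = {
      "ascii",  "cp1252",   "iso-8859-1",   "l1",
      "latin1", "us-ascii", "windows-1252", "x-cp1252"};
  return std::find(std::begin(kLabels), std::end(kLabels), key) !=
         std::end(kLabels);
}
```

With this mapping, a decoder opened as "latin1" and one opened as "windows-1252" should produce identical output, which is consistent with replacing DecodeLatin1 by a single Windows-1252 path.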