Skip to content

Conversation

@mertcanaltin
Copy link
Member

@mertcanaltin mertcanaltin commented Nov 29, 2025

Alternative for this pr: #60889

this pr deleted decodeLatin1, and added Windows-1252 encoding support

problem: #60888

Thanks for repair & info detail 🙏 @ChALkeR

cc @mcollina

Fixes: #60888
Fixes: #59515
Fixes: #56542

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. needs-ci PRs that need a full CI run. typings labels Nov 29, 2025
@mertcanaltin
Copy link
Member Author

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@ChALkeR ChALkeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any reasons to keep DecodeLatin1? It seems unused

@mertcanaltin mertcanaltin force-pushed the mert/fix/windows-1252-decoding branch from 709e9ce to 25319d5 Compare November 29, 2025 18:11
@mertcanaltin
Copy link
Member Author

Are there any reasons to keep DecodeLatin1? It seems unused

I removed DecodeLatin1, thanks

@mertcanaltin mertcanaltin force-pushed the mert/fix/windows-1252-decoding branch from 848e480 to 59fd7cb Compare November 29, 2025 18:28
@codecov
Copy link

codecov bot commented Nov 29, 2025

Codecov Report

❌ Patch coverage is 86.20690% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.56%. Comparing base (6c22420) to head (59fd7cb).
⚠️ Report is 41 commits behind head on main.

Files with missing lines Patch % Lines
src/encoding_binding.cc 81.81% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #60893   +/-   ##
=======================================
  Coverage   88.56%   88.56%           
=======================================
  Files         703      703           
  Lines      208291   208302   +11     
  Branches    40170    40160   -10     
=======================================
+ Hits       184472   184482   +10     
- Misses      15837    15841    +4     
+ Partials     7982     7979    -3     
Files with missing lines Coverage Δ
lib/internal/encoding.js 99.51% <100.00%> (ø)
src/encoding_binding.h 100.00% <ø> (ø)
src/encoding_binding.cc 80.95% <81.81%> (+1.73%) ⬆️

... and 31 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@aduh95 aduh95 added lts-watch-v20.x PRs that may need to be released in v20.x lts-watch-v22.x PRs that may need to be released in v22.x labels Nov 30, 2025
@RafaelGSS RafaelGSS added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. request-ci Add this label to start a Jenkins CI on a PR. labels Nov 30, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Nov 30, 2025
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@aduh95 aduh95 added the commit-queue Add this label to land a pull request using GitHub Actions. label Dec 4, 2025
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Dec 4, 2025
@nodejs-github-bot nodejs-github-bot merged commit f65b0bc into nodejs:main Dec 4, 2025
78 of 79 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in f65b0bc

targos pushed a commit that referenced this pull request Dec 5, 2025
PR-URL: #60893
Fixes: #60888
Fixes: #59515
Fixes: #56542
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
@ChALkeR
Copy link
Member

ChALkeR commented Dec 17, 2025

I see no performance improvements from this over #60889, and a significant performance degradation instead
For what reason do we want to have that native path?

cc @nodejs/performance

Comment on lines +289 to +307
// Check if byte is in the special Windows-1252 range (128-159)
if (byte >= 0x80 && byte <= 0x9F) {
codepoint = windows1252_mapping[byte - 0x80];
} else {
// For all other bytes, Windows-1252 is identical to Latin-1
codepoint = byte;
}

if (has_fatal && written == 0) {
return node::THROW_ERR_ENCODING_INVALID_ENCODED_DATA(
env->isolate(), "The encoded data was not valid for encoding latin1");
// Convert codepoint to UTF-8
if (codepoint < 0x80) {
result.push_back(static_cast<char>(codepoint));
} else if (codepoint < 0x800) {
result.push_back(static_cast<char>(0xC0 | (codepoint >> 6)));
result.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
} else {
result.push_back(static_cast<char>(0xE0 | (codepoint >> 12)));
result.push_back(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
result.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
}
Copy link
Member

@ChALkeR ChALkeR Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This set of ifs on each char is extremely slow

@ChALkeR
Copy link
Member

ChALkeR commented Dec 17, 2025

Looking back, #55275 should have been reverted then re-added only if it shows a performance improvement over revert

What instead happened is that #55275 showed perf improvements in some cases but broke the decoder, then a fix for the breakage here negated all those performance improvements and instead made decoder 4-20x slower than it was originally

This is a process problem, perf improvements should not be merged without benchmarks, and this should have been treated as a perf path

@mertcanaltin
Copy link
Member Author

Looking back, #55275 should have been reverted then re-added only if it shows a performance improvement over revert

What instead happened is that #55275 showed perf improvements in some cases but broke the decoder, then a fix for the breakage here negated all those performance improvements and instead made decoder 4-20x slower than it was originally

This is a process problem, perf improvements should not be merged without benchmarks, and this should have been treated as a perf path

I will inspect this problem today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. lts-watch-v20.x PRs that may need to be released in v20.x lts-watch-v22.x PRs that may need to be released in v22.x needs-ci PRs that need a full CI run. typings

Projects

None yet

6 participants