Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Oct 8, 2025

Summary

This PR updates the Sanitize method in LogHelper.cs to utilize SearchValues<char> for improved performance on .NET 8.0 and later, while maintaining backward compatibility with older target frameworks.

Changes

Performance Optimization

  • For .NET 8.0+: Uses SearchValues<char> for vectorized search of ASCII control characters, providing significant performance improvements
  • For earlier frameworks: Maintains the original implementation using char.IsControl() and CharUnicodeInfo.GetUnicodeCategory()
  • Fast path optimization: Returns input string unchanged when no sanitizable characters are detected, avoiding unnecessary StringBuilder allocations

Implementation Details

The SearchValues collection includes all ASCII control characters that require sanitization:

  • U+0000-U+0008 (null through backspace)
  • U+000B-U+000C (vertical tab, form feed)
  • U+000E-U+001F (shift out through unit separator)
  • U+007F-U+009F (delete through application program command)

Special characters \r, \n, and \t are handled separately to maintain their readable escape sequences ("\\r", "\\n", "\\t"), while all other control and Unicode format characters are encoded as "\\uXXXX".

Character Coverage

All required characters are properly sanitized:

  • Carriage Return (U+000D) → "\\r"
  • Line Feed (U+000A) → "\\n"
  • Tab (U+0009) → "\\t"
  • ASCII control characters (U+0000-U+0008, U+000B-U+000C, U+000E-U+001F, U+007F-U+009F) → "\\uXXXX"
  • Unicode format characters (Cf category: U+200B, U+200C, U+200D, U+200E, U+200F, U+202A-U+202E, U+2060-U+206F, U+FEFF, etc.) → "\\uXXXX"

Test Coverage

Added comprehensive test cases covering:

  • Empty and null string handling
  • Strings with no sanitizable characters (fast path verification)
  • Strings with only ASCII control characters
  • Extended ASCII control characters (U+0080-U+009F)
  • Unicode format characters (zero-width spaces, directional marks, etc.)
  • Very long strings for performance validation
  • Mixed control and format characters

All 37 tests pass on both net8.0 and net9.0. The implementation produces identical output to the original method, ensuring no breaking changes.

Testing

# Build succeeded on all target frameworks
dotnet build src/Microsoft.IdentityModel.Logging/Microsoft.IdentityModel.Logging.csproj -c Release

# All tests passing
dotnet test test/Microsoft.IdentityModel.Logging.Tests/Microsoft.IdentityModel.Logging.Tests.csproj --filter "FullyQualifiedName~LogHelperTests"
# Result: Passed: 37, Failed: 0

Example

// Before and after produce identical output:
LogHelper.FormatInvariant("Value: {0}", LogHelper.MarkAsNonPII("A\rB\nC\tD" + (char)0x00));
// Output: "Value: A\\rB\\nC\\tD\\u0000"

LogHelper.FormatInvariant("Value: {0}", LogHelper.MarkAsNonPII("Test\u200B\u202A"));
// Output: "Value: Test\\u200B\\u202A"

Fixes #2962

Original prompt

This section details on the original issue you should resolve

<issue_title>Log sanitization with use of SearchValues v2</issue_title>
<issue_description>Problem Statement.
We want to update the Sanitize method:

to utilize SearchValues. All values which need to be sanitized can be added to a SearchValues collection.

Specific Characters to Sanitize:

  • '\r' (Carriage Return, U+000D)
  • '\n' (Line Feed, U+000A)
  • '\t' (Tab, U+0009)
  • All other ASCII control characters: U+0000-U+0008, U+000B-U+000C, U+000E-U+001F, U+007F-U+009F
  • All Unicode characters where char.IsControl(c) is true (Unicode category Cc)
  • All Unicode characters where CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format (Unicode category Cf), e.g.:
    • U+200B (Zero Width Space)
    • U+200C (Zero Width Non-Joiner)
    • U+200D (Zero Width Joiner)
    • U+200E (Left-to-Right Mark)
    • U+200F (Right-to-Left Mark)
    • U+202A-U+202E (Directional formatting)
    • U+2060-U+206F (Various format characters)
    • U+FEFF (Zero Width No-Break Space)

Implementation Clarifications:

  • The referenced SanitizeEntryFromFilePath method from dotnet/runtime should be used as a performance pattern reference only. The actual character set for sanitization should follow this issue's explicit enumeration.
  • The new implementation must produce identical output to the current method, including string formatting (e.g., using "\u{(int)c:X4}" for control/format characters).

Steps:

  1. Update the Sanitize method to have a SearchValues collection that includes the above values. Create a SearchValues for the common ASCII control characters and handle Unicode format characters separately with a fallback to char.IsControl() and CharUnicodeInfo.GetUnicodeCategory()
  2. Update the Sanitize method to follow the structure of the method called SanitizeEntryFromFilePath: dotnet/runtime/src/libraries/Common/src/System/IO/Archiving.Utils.Windows.cs#L11
  3. Update tests accordingly. If tests to cover the following edge cases are not present, add them: Strings with no sanitizable characters (performance fast path), Strings with only ASCII control characters, Strings with Unicode format characters. Very long strings, and Null/empty string handling

Only update the Sanitize method as described above; do not update any other methods.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #3340

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI changed the title [WIP] Update log sanitization to use SearchValues v2 Optimize log sanitization using SearchValues for .NET 8.0+ Oct 8, 2025
Copilot AI requested a review from mdchennu October 8, 2025 18:59
@mdchennu mdchennu marked this pull request as ready for review October 8, 2025 20:43
@mdchennu mdchennu requested a review from a team as a code owner October 8, 2025 20:43
@github-actions
Copy link

github-actions bot commented Oct 8, 2025

Summary

Summary
Generated on: 10/8/2025 - 8:56:49 PM
Coverage date: 10/8/2025 - 8:48:21 PM - 10/8/2025 - 8:56:16 PM
Parser: MultiReport (60x Cobertura)
Assemblies: 1
Classes: 7
Files: 2
Line coverage: 80.3% (620 of 772)
Covered lines: 620
Uncovered lines: 152
Coverable lines: 772
Total lines: 483
Branch coverage: 67.8% (228 of 336)
Covered branches: 228
Total branches: 336
Method coverage: Feature is only available for sponsors

Coverage

Microsoft.IdentityModel.JsonWebTokens - 80.3%
Name Line Branch
Microsoft.IdentityModel.JsonWebTokens 80.3% 67.8%
Microsoft.IdentityModel.JsonWebTokens.JwtTokenUtilities 100%
System.Text.RegularExpressions.Generated 80.3% 67.8%
System.Text.RegularExpressions.Generated 80.3% 67.8%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA
DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJweRegex_1
79.2% 68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA
DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJwsRegex_0
81.4% 67.6%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3
CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJweRegex_1
79.2% 68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3
CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJwsRegex_0
81.4% 67.6%

@mdchennu
Copy link
Contributor

mdchennu commented Oct 10, 2025

here are baseline JsonWebTokenHandler_ValidateTokenAsyncWithTVP

from Peter's PR: #3316

Sanitize Method Mean Error StdDev P90 P95 P100 Ratio Gen0 Gen1 Allocated Alloc Ratio
False JsonWebTokenHandler_ValidateTokenAsyncWithTVP 36.49 μs 0.182 μs 0.399 μs 36.97 μs 37.01 μs 37.17 μs 1.00 0.6104 0.3052 10.2 KB 1.00
True JsonWebTokenHandler_ValidateTokenAsyncWithTVP 36.70 μs 0.126 μs 0.267 μs 37.05 μs 37.12 μs 37.44 μs 1.00 0.6104 0.1831 10.6 KB 1.04

@mdchennu
Copy link
Contributor

mdchennu commented Oct 10, 2025

here are JsonWebTokenHandler_ValidateTokenAsyncWithTVP with SANITIZE == FALSE

image

here is log sanitzation-update JsonWebTokenHandler_ValidateTokenAsyncWithTVP SANITIZE == TRUE

BenchmarkDotNet v0.13.12, Windows 11 (10.0.26100.6584) (Hyper-V)
Unknown processor
.NET SDK 9.0.305
[Host] : .NET 8.0.20 (8.0.2025.41914), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
MediumRun : .NET 8.0.20 (8.0.2025.41914), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Job=MediumRun MaxAbsoluteError=10.0000 ms IterationCount=15
LaunchCount=4 WarmupCount=10

| Method                                        | Mean     | Error    | StdDev   | P90      | P95      | P100     | Ratio | Gen0   | Allocated | Alloc Ratio |
|---------------------------------------------- |---------:|---------:|---------:|---------:|---------:|---------:|------:|-------:|----------:|------------:|
| JsonWebTokenHandler_ValidateTokenAsyncWithTVP | 32.82 μs | 0.570 μs | 1.239 μs | 34.82 μs | 35.62 μs | 36.38 μs |  1.00 | 0.3662 |   9.98 KB |        1.00 |

@mdchennu
Copy link
Contributor

here are baseline LogSanitization benchmarks (with no log sanitization changes):


BenchmarkDotNet v0.13.12, Windows 11 (10.0.26100.6584) (Hyper-V)
Unknown processor
.NET SDK 9.0.305
  [Host]    : .NET 9.0.9 (9.0.925.41916), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  MediumRun : .NET 9.0.9 (9.0.925.41916), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Job=MediumRun  MaxAbsoluteError=10.0000 ms  IterationCount=15  
LaunchCount=4  WarmupCount=10  

Method Mean Error StdDev P90 P95 P100 Ratio RatioSD Gen0 Allocated Alloc Ratio
Sanitize_LongStringNoSpecialChars 1,514.6 ns 10.75 ns 22.92 ns 1,543.5 ns 1,555.9 ns 1,580.9 ns 4.53 0.11 0.0992 2504 B 3.82
Sanitize_LongStringWithSparseChars 1,136.4 ns 8.15 ns 17.71 ns 1,158.4 ns 1,172.5 ns 1,180.5 ns 3.40 0.06 0.0896 2264 B 3.45
Sanitize_NoSpecialChars 334.7 ns 2.47 ns 5.36 ns 342.3 ns 344.0 ns 349.5 ns 1.00 0.00 0.0257 656 B 1.00
Sanitize_WithCommonControlChars 274.0 ns 3.35 ns 7.41 ns 283.8 ns 287.3 ns 295.9 ns 0.82 0.03 0.0257 648 B 0.99
Sanitize_WithMixedChars 376.8 ns 2.90 ns 6.43 ns 384.5 ns 387.6 ns 391.8 ns 1.13 0.03 0.0315 792 B 1.21
Sanitize_WithUnicodeFormatChars 453.4 ns 2.80 ns 5.90 ns 460.5 ns 464.1 ns 470.9 ns 1.36 0.03 0.0358 904 B 1.38

@mdchennu
Copy link
Contributor

here are LogSanitization benchmarks (with log sanitization changes):


BenchmarkDotNet v0.13.12, Windows 11 (10.0.26100.6584) (Hyper-V)
Unknown processor
.NET SDK 9.0.305
  [Host]    : .NET 9.0.9 (9.0.925.41916), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  MediumRun : .NET 9.0.9 (9.0.925.41916), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Job=MediumRun  MaxAbsoluteError=10.0000 ms  IterationCount=15  
LaunchCount=4  WarmupCount=10  

Method Mean Error StdDev Median P90 P95 P100 Ratio RatioSD Gen0 Allocated Alloc Ratio
Sanitize_LongStringNoSpecialChars 747.8 ns 10.43 ns 22.22 ns 744.2 ns 759.4 ns 800.2 ns 837.5 ns 3.84 0.09 0.0362 920 B 3.03
Sanitize_LongStringWithSparseChars 1,243.1 ns 4.79 ns 10.52 ns 1,241.0 ns 1,255.9 ns 1,265.1 ns 1,279.5 ns 6.39 0.15 0.0896 2264 B 7.45
Sanitize_NoSpecialChars 194.4 ns 1.70 ns 3.80 ns 193.3 ns 200.3 ns 201.4 ns 203.6 ns 1.00 0.00 0.0119 304 B 1.00
Sanitize_WithCommonControlChars 282.6 ns 2.99 ns 6.51 ns 281.5 ns 291.2 ns 292.8 ns 299.9 ns 1.45 0.04 0.0257 648 B 2.13
Sanitize_WithMixedChars 415.4 ns 15.78 ns 33.62 ns 402.9 ns 459.3 ns 496.5 ns 549.9 ns 2.14 0.20 0.0315 792 B 2.61
Sanitize_WithUnicodeFormatChars 525.2 ns 7.46 ns 16.54 ns 530.7 ns 543.2 ns 545.1 ns 548.2 ns 2.70 0.09 0.0353 904 B 2.97

@pmaytak
Copy link
Contributor

pmaytak commented Oct 10, 2025

I think this can be closed in favour of #3341

@keegan-caruso
Copy link
Contributor

#3341 was used instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Log sanitization with use of SearchValues v2

4 participants