@markusicu markusicu commented Sep 5, 2024

  • I compared the ICU implementation with the UTS46 changes in Unicode 15.1 & 16.0.
    • The code and the spec are not fully parallel because (a) ICU makes some optimizations and (b) where the spec makes certain checks optional, ICU always performs them but records any errors in an output where callers can ignore them.
    • I checked some cases via temporary test code to make sure that the outcome was as desired.
  • Added a specific check for "starts with xn--" to the code path after Punycode-decoding a label, so that we can set an invalid-ACE-label error rather than a vanilla has-hyphens-at-offsets-3-and-4 error.
    • In IDNA2008, this is checked via round-trip conversion back to Punycode. So it makes sense to me to treat this as an ACE label error.
  • Added support for the new test file syntax for explicitly empty strings. (Blank fields mean "same as another field".)
  • Worked around test cases where the only failure is from an empty root label. ICU always performs label length checks, but deliberately does not report the empty root label as failing that check. I filed ICU-22882 for adding a separate error flag for that.
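
A minimal sketch of the post-decoding check described above, not the ICU code itself. It assumes Python's built-in `punycode` codec as a stand-in for ICU's decoder, and the names `LabelError` and `process_label` are hypothetical. It also mirrors the "always check, record errors in an output" style: errors come back as flags that the caller is free to ignore.

```python
from enum import Flag
from typing import Tuple

ACE_PREFIX = "xn--"


class LabelError(Flag):
    """Hypothetical error flags; ICU's real error set is larger."""
    NONE = 0
    HYPHEN_3_4 = 1
    INVALID_ACE_LABEL = 2


def process_label(label: str) -> Tuple[str, LabelError]:
    """Decode an ACE label if needed and return (label, error flags)."""
    errors = LabelError.NONE
    if label.lower().startswith(ACE_PREFIX):
        try:
            # Python's codec stands in for ICU's Punycode decoder here.
            decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        except UnicodeError:
            return label, LabelError.INVALID_ACE_LABEL
        # The check this PR describes: a decoded label that itself starts
        # with "xn--" is reported as an invalid ACE label, not merely as a
        # label with hyphens in the third and fourth positions.
        if decoded.lower().startswith(ACE_PREFIX):
            return label, LabelError.INVALID_ACE_LABEL
        label = decoded
    # Generic check: hyphens in both the third and fourth positions.
    if label[2:4] == "--":
        errors |= LabelError.HYPHEN_3_4
    return label, errors


if __name__ == "__main__":
    for sample in ("xn--bcher-kva", "xn--xn--a-", "ab--c"):
        print(sample, process_label(sample))
```

Without the extra prefix check, the second sample would fall through to the generic hyphen check and only report the hyphen error, which is the behavior this change replaces.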
Checklist
  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22707
  • Required: The PR title must be prefixed with a JIRA Issue number.
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number.
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

macchiati previously approved these changes Sep 5, 2024
@markusicu
Member Author

Hi @macchiati, thanks for the early review. I have now ported the changes to C++ (very parallel) and filled out the PR description. PTAL

@markusicu markusicu marked this pull request as ready for review September 5, 2024 23:41
@markusicu
Member Author

PS: The best part is that, with the code changes and the test workaround, ICU is back to passing the UTS46 conformance test cases.
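
For readers unfamiliar with the test-file convention mentioned in the PR description (blank fields mean "same as another field", with a separate syntax for explicitly empty strings), here is a rough sketch of how a test reader might resolve fields. The three-column layout, the `""` marker for an explicitly empty string, and the names `resolve_field` and `parse_line` are assumptions made for illustration; this is not the actual ICU test harness or the full conformance file format.

```python
from typing import Tuple

EXPLICIT_EMPTY = '""'  # assumed marker for an explicitly empty string


def resolve_field(raw: str, fallback: str) -> str:
    """Blank means 'same as fallback'; the marker means a truly empty string."""
    value = raw.strip()
    if value == EXPLICIT_EMPTY:
        return ""
    if value == "":
        return fallback
    return value


def parse_line(line: str) -> Tuple[str, str, str]:
    """Parse a simplified 'source; toUnicode; toAscii' line (hypothetical layout)."""
    content = line.split("#", 1)[0]  # drop a trailing comment
    fields = [f.strip() for f in content.split(";")]
    fields += [""] * (3 - len(fields))  # pad missing columns as blank
    source = resolve_field(fields[0], "")
    to_unicode = resolve_field(fields[1], source)
    to_ascii = resolve_field(fields[2], to_unicode)
    return source, to_unicode, to_ascii


if __name__ == "__main__":
    print(parse_line('a.b; ; ""  # toAscii is explicitly empty'))
    print(parse_line("x; y;"))
```

The point of the distinction is that a reader that treats every blank field as an empty string would misread inherited values, while one that treats the explicit marker as blank would lose the cases where the expected output really is empty.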

@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@markusicu markusicu merged commit 415a7ac into unicode-org:main Sep 6, 2024
@markusicu markusicu deleted the uni16-uts46 branch September 6, 2024 17:00
3 participants