Skip to content

Conversation

@sevenzing
Copy link
Owner

@sevenzing sevenzing commented Dec 5, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a badge for the Crates.io version of the ens-normalize-rs library in the README.
    • Introduced new methods for normalizing and beautifying ENS names.
    • Added new structures for enhanced token representation in ENS names.
  • Bug Fixes

    • Improved error handling in the tokenization and validation processes for ENS names.
  • Documentation

    • Updated README to include version information.
  • Refactor

    • Simplified error handling in various methods related to ENS name processing.
    • Enhanced context-aware error reporting in validation functions.

@coderabbitai
Copy link

coderabbitai bot commented Dec 5, 2024

Walkthrough

The changes in this pull request include several updates across multiple files, primarily focusing on error handling, method signatures, and the introduction of new structures and enums. Notable modifications include the addition of a new badge in README.md, improvements to the error handling in the tokenize methods, the introduction of new structs for ENS name normalization, and enhanced visibility control in lib.rs. Overall, the changes aim to refine the functionality, structure, and clarity of the codebase while maintaining existing logic.

Changes

File Path Change Summary
README.md Added a new badge for the Crates.io version of the ens-normalize-rs library.
examples/tokens.rs Updated the tokenize method to return a Result type instead of unwrapping, improving error handling.
src/code_points/specs.rs Changed return type of finditer_emoji method to improve lifetime specification.
src/code_points/types.rs Added #[allow(dead_code)] to Number variant in ParsedWholeValue enum and v field in ParsedWholeObject struct.
src/error.rs Removed Confused and ConfusedGroups variants from ProcessError, added CurrableError and DisallowedSequence enums with new variants for better error categorization.
src/lib.rs Changed visibility of code_points module exports from public to crate-level.
src/normalizer.rs Introduced EnsNameNormalizer and ProcessedName structs, modified tokenize method to return TokenizedName directly, added normalize and beautify methods.
src/tokens/tokenize.rs Updated several methods to return types directly instead of Result, simplifying error handling.
src/tokens/types.rs Added new structs (TokenValid, TokenMapped, etc.) and an enum (CollapsedEnsNameToken) to enhance token representation.
src/validate.rs Updated determine_group function signature for better error handling, enhanced error reporting in check_group and check_and_get_group functions.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Normalizer
    participant ProcessedName

    User->>Normalizer: tokenize(input)
    Normalizer-->>User: TokenizedName
    User->>Normalizer: process(input)
    Normalizer->>ProcessedName: Create ProcessedName
    Normalizer-->>User: Result<ProcessedName>
    User->>Normalizer: normalize(input)
    Normalizer->>ProcessedName: Call normalize
    Normalizer-->>User: Result<String>
    User->>Normalizer: beautify(input)
    Normalizer->>ProcessedName: Call beautify
    Normalizer-->>User: Result<String>
Loading

🐇 In the code we play,
New badges come to stay,
With tokens now refined,
Errors less confined.
A rabbit's joyful cheer,
For changes that are here! 🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (10)
src/tokens/types.rs (3)

75-79: Consider adding documentation comments to struct fields

To enhance code maintainability and clarity for users of the API, consider adding documentation comments for the public fields in the TokenValid struct. This practice helps others understand the purpose and usage of each field.


81-112: Consider adding documentation comments to newly added structs

The newly introduced structs (TokenMapped, TokenIgnored, TokenDisallowed, TokenStop, TokenNfc) have public fields without documentation comments. Adding comments to these fields will improve code readability and assist developers in understanding their roles.


Line range hint 113-124: Add explanations for complex fields in TokenEmoji struct

The TokenEmoji struct contains multiple fields like input, emoji, cps_input, and cps_no_fe0f. Providing detailed documentation comments for these fields will help clarify their purposes and how they interact within the struct.

src/validate.rs (2)

215-215: Cloning ParsedGroup may have performance implications

At line 215, cloning the ParsedGroup using .cloned() may introduce unnecessary overhead if ParsedGroup is large. If possible, refactor the code to operate on references or consider implementing Clone more efficiently for ParsedGroup.


226-236: Optimizing error creation by avoiding unnecessary allocations

When creating the ProcessError::CurrableError, group.name.to_string() allocates a new String. If group.name is already a String or can be made to borrow a &str, consider passing references to avoid unnecessary allocations.

src/tokens/tokenize.rs (1)

Line range hint 134-179: Restoring error handling in tokenization functions

The functions tokenize_name and tokenize_input no longer return Result types, which eliminates error propagation from these functions. This change might hide issues that should be reported to the caller. It's advisable to retain error handling to ensure that any processing errors are correctly managed.

src/error.rs (3)

Line range hint 3-12: Documentation could be more detailed for ProcessError

While the documentation is present, it would be helpful to add more details about:

  • When each variant occurs
  • How to handle or resolve these errors
  • Examples of scenarios that trigger these errors

Line range hint 17-36: Well-structured error handling for currable errors

The CurrableError enum effectively categorizes errors that can be automatically fixed by the normalizer. The error messages are clear and descriptive.

Consider adding examples in the documentation for each variant to help users understand when these errors occur.


Line range hint 38-53: Good separation of disallowed sequence errors

The DisallowedSequence enum effectively separates permanent errors from currable ones. The error messages are clear and actionable.

Consider adding documentation about:

  • Whether these errors are permanent or can be resolved
  • What actions users should take when encountering these errors
src/normalizer.rs (1)

6-11: Documentation could be more detailed for EnsNameNormalizer

While the high-level documentation is good, consider adding:

  • Examples of usage
  • When to use this vs the no-cache versions
  • Performance implications
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8d1aad0 and 4fb080d.

📒 Files selected for processing (10)
  • README.md (1 hunks)
  • examples/tokens.rs (1 hunks)
  • src/code_points/specs.rs (2 hunks)
  • src/code_points/types.rs (2 hunks)
  • src/error.rs (4 hunks)
  • src/lib.rs (1 hunks)
  • src/normalizer.rs (3 hunks)
  • src/tokens/tokenize.rs (6 hunks)
  • src/tokens/types.rs (3 hunks)
  • src/validate.rs (5 hunks)
✅ Files skipped from review due to trivial changes (2)
  • README.md
  • src/code_points/types.rs
🔇 Additional comments (5)
src/tokens/tokenize.rs (1)

457-457: Ensure test coverage reflects changes in error handling

With the changes in function signatures, update the tests to check for scenarios that previously resulted in errors. This will help maintain test coverage and ensure that all functionalities are working as expected.

examples/tokens.rs (1)

7-7: Confirm behavior after removing .unwrap() from tokenize method

By removing .unwrap(), the code assumes that tokenize no longer returns a Result. Ensure that this change aligns with the new behavior of tokenize and that any potential errors are properly handled elsewhere in the code.

src/normalizer.rs (2)

Line range hint 41-50: Well-designed normalize and beautify methods

The methods follow a good pattern:

  • They reuse the process method
  • They have clear, single responsibilities
  • They maintain consistent error handling

27-30: Verify impact of removing Result from tokenize

The tokenize method no longer returns a Result. This could potentially hide errors that were previously being handled explicitly.

src/code_points/specs.rs (1)

99-101: Good improvement in lifetime specification

The explicit lifetime parameter in finditer_emoji improves clarity and ensures the returned iterator's lifetime is properly tied to both the struct and input string.

src/validate.rs Outdated
Comment on lines 333 to 361
let mut maybe_group = None;
for last_cp_index in 0..unique_cps.len() {
let cps = &unique_cps[..last_cp_index + 1];
if let Some(group) = specs.groups_for_cps(cps).next() {
maybe_group = Some(group);
continue;
} else {
let cp = unique_cps[last_cp_index];
let index_of_cp = label
.iter_cps()
.position(|cp_in_label| cp_in_label == cp)
.expect("cp must exist in label");
match maybe_group {
Some(group) => {
return Err(ProcessError::CurrableError {
inner: CurrableError::Confused {
group_name: group.name.to_string(),
cp,
},
index: index_of_cp,
sequence: utils::cp2str(cp),
maybe_suggest: Some("".to_string()),
});
}
None => unreachable!(),
}
}
}
unreachable!("")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid using unreachable!() macro; handle all match arms explicitly

In the determine_group function, the use of unreachable!() within the match statement can lead to panics if assumptions about the data change. To make the code more robust, consider explicitly handling the None case or adding logic to ensure that all possible cases are accounted for.

specs: &CodePointsSpecs,
apply_nfc: bool,
) -> Result<Self, ProcessError> {
pub fn from_input(input: impl AsRef<str>, specs: &CodePointsSpecs, apply_nfc: bool) -> Self {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Reconsider removing error handling from from_input method

Changing from_input to return Self instead of Result<Self, ProcessError> might suppress errors that can occur during tokenization. To prevent unexpected behaviors, consider maintaining error propagation to handle invalid inputs gracefully.

mod validate;

pub use code_points::*;
pub(crate) use code_points::*;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Changing visibility to pub(crate) may impact external users

Modifying pub use code_points::*; to pub(crate) use code_points::*; restricts the visibility of code_points exports to within the crate. If any external code depends on these exports, this change can introduce breaking changes. Please verify that no public APIs are affected or consider deprecating before removal.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/error.rs (1)

Line range hint 17-38: Consider providing meaningful suggestions for curable errors

While the error structure is well-designed, the maybe_suggest field in ProcessError is consistently set to Some("") in the validation code. Consider providing actual suggestions for fixing the errors where possible.

src/validate.rs (1)

421-421: Consider handling potential errors in test setup

Using unwrap() on TokenizedName::from_input could make test failures less clear if the input is invalid. Consider handling the error case explicitly or using expect() with a descriptive message.

-        let name = TokenizedName::from_input(input, specs, true);
-        let label = name.iter_labels().next().unwrap();
+        let name = TokenizedName::from_input(input, specs, true)
+            .expect("Failed to create TokenizedName from input");
+        let label = name.iter_labels()
+            .next()
+            .expect("Expected at least one label");
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4fb080d and 0784b0e.

📒 Files selected for processing (2)
  • src/error.rs (4 hunks)
  • src/validate.rs (5 hunks)
🔇 Additional comments (5)
src/error.rs (2)

Line range hint 3-15: Well-structured error handling with rich context!

The ProcessError enum effectively wraps errors with additional context (index, sequence, suggestion) that will help users understand and fix issues.


Line range hint 40-55: Clear categorization of disallowed sequences!

The DisallowedSequence enum effectively categorizes different types of invalid sequences with clear error messages.

src/validate.rs (3)

326-361: Avoid using unreachable!() macro

The use of unreachable!() can lead to runtime panics if the assumptions about the code change. Consider handling this case explicitly.

Consider simplifying the group determination logic

The nested loops and conditions could be simplified using iterators and early returns.

 fn determine_group<'a>(
     label: &TokenizedLabel,
     unique_cps: &'a [CodePoint],
     specs: &'a CodePointsSpecs,
 ) -> Result<&'a ParsedGroup, ProcessError> {
-    if let Some(group) = specs.groups_for_cps(unique_cps).next() {
-        Ok(group)
-    } else {
-        let mut maybe_group = None;
-        for last_cp_index in 0..unique_cps.len() {
-            let cps = &unique_cps[..last_cp_index + 1];
-            if let Some(group) = specs.groups_for_cps(cps).next() {
-                maybe_group = Some(group);
-                continue;
-            } else {
-                let cp = unique_cps[last_cp_index];
-                let index_of_cp = label
-                    .iter_cps()
-                    .position(|cp_in_label| cp_in_label == cp)
-                    .expect("cp must exist in label");
-                let inner = match maybe_group {
-                    Some(group) => CurrableError::Confused {
-                        group_name: group.name.to_string(),
-                        cp,
-                    },
-                    None => CurrableError::Disallowed,
-                };
-                return Err(ProcessError::CurrableError {
-                    inner,
-                    index: index_of_cp,
-                    sequence: utils::cp2str(cp),
-                    maybe_suggest: Some("".to_string()),
-                });
-            }
-        }
-        unreachable!()
+    // Try with all code points first
+    if let Some(group) = specs.groups_for_cps(unique_cps).next() {
+        return Ok(group);
+    }
+
+    // Find the first code point that doesn't fit in any group
+    for (last_cp_index, window) in unique_cps.iter().enumerate() {
+        let cps = &unique_cps[..=last_cp_index];
+        let current_group = specs.groups_for_cps(cps).next();
+        
+        if current_group.is_none() {
+            let cp = unique_cps[last_cp_index];
+            let index_of_cp = label
+                .iter_cps()
+                .position(|cp_in_label| cp_in_label == cp)
+                .expect("cp must exist in label");
+                
+            let previous_group = if last_cp_index > 0 {
+                specs.groups_for_cps(&unique_cps[..last_cp_index]).next()
+            } else {
+                None
+            };
+            
+            let inner = match previous_group {
+                Some(group) => CurrableError::Confused {
+                    group_name: group.name.to_string(),
+                    cp,
+                },
+                None => CurrableError::Disallowed,
+            };
+            
+            return Err(ProcessError::CurrableError {
+                inner,
+                index: index_of_cp,
+                sequence: utils::cp2str(cp),
+                maybe_suggest: Some("".to_string()),
+            });
+        }
     }
+    
+    // This case is actually impossible as we checked all code points above
+    Ok(specs.groups_for_cps(unique_cps)
+        .next()
+        .expect("At least one group should exist"))
 }

226-236: Improved error handling with better context!

The error handling now provides better context by including the group name, code point, and position in the error message.


278-283: Clear error handling for confused groups!

The error handling properly identifies and reports confusion between different script groups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants