-
Notifications
You must be signed in to change notification settings - Fork 0
Change error handling and update comments for cargo doc #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThe changes in this pull request include several updates across multiple files, primarily focusing on error handling, method signatures, and the introduction of new structures and enums. Notable modifications include the addition of a new badge in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Normalizer
participant ProcessedName
User->>Normalizer: tokenize(input)
Normalizer-->>User: TokenizedName
User->>Normalizer: process(input)
Normalizer->>ProcessedName: Create ProcessedName
Normalizer-->>User: Result<ProcessedName>
User->>Normalizer: normalize(input)
Normalizer->>ProcessedName: Call normalize
Normalizer-->>User: Result<String>
User->>Normalizer: beautify(input)
Normalizer->>ProcessedName: Call beautify
Normalizer-->>User: Result<String>
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (10)
src/tokens/types.rs (3)
75-79: Consider adding documentation comments to struct fieldsTo enhance code maintainability and clarity for users of the API, consider adding documentation comments for the public fields in the
TokenValidstruct. This practice helps others understand the purpose and usage of each field.
81-112: Consider adding documentation comments to newly added structsThe newly introduced structs (
TokenMapped,TokenIgnored,TokenDisallowed,TokenStop,TokenNfc) have public fields without documentation comments. Adding comments to these fields will improve code readability and assist developers in understanding their roles.
Line range hint
113-124: Add explanations for complex fields inTokenEmojistructThe
TokenEmojistruct contains multiple fields likeinput,emoji,cps_input, andcps_no_fe0f. Providing detailed documentation comments for these fields will help clarify their purposes and how they interact within the struct.src/validate.rs (2)
215-215: CloningParsedGroupmay have performance implicationsAt line 215, cloning the
ParsedGroupusing.cloned()may introduce unnecessary overhead ifParsedGroupis large. If possible, refactor the code to operate on references or consider implementingClonemore efficiently forParsedGroup.
226-236: Optimizing error creation by avoiding unnecessary allocationsWhen creating the
ProcessError::CurrableError,group.name.to_string()allocates a newString. Ifgroup.nameis already aStringor can be made to borrow a&str, consider passing references to avoid unnecessary allocations.src/tokens/tokenize.rs (1)
Line range hint
134-179: Restoring error handling in tokenization functionsThe functions
tokenize_nameandtokenize_inputno longer returnResulttypes, which eliminates error propagation from these functions. This change might hide issues that should be reported to the caller. It's advisable to retain error handling to ensure that any processing errors are correctly managed.src/error.rs (3)
Line range hint
3-12: Documentation could be more detailed for ProcessErrorWhile the documentation is present, it would be helpful to add more details about:
- When each variant occurs
- How to handle or resolve these errors
- Examples of scenarios that trigger these errors
Line range hint
17-36: Well-structured error handling for currable errorsThe
CurrableErrorenum effectively categorizes errors that can be automatically fixed by the normalizer. The error messages are clear and descriptive.Consider adding examples in the documentation for each variant to help users understand when these errors occur.
Line range hint
38-53: Good separation of disallowed sequence errorsThe
DisallowedSequenceenum effectively separates permanent errors from currable ones. The error messages are clear and actionable.Consider adding documentation about:
- Whether these errors are permanent or can be resolved
- What actions users should take when encountering these errors
src/normalizer.rs (1)
6-11: Documentation could be more detailed for EnsNameNormalizerWhile the high-level documentation is good, consider adding:
- Examples of usage
- When to use this vs the no-cache versions
- Performance implications
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (10)
README.md(1 hunks)examples/tokens.rs(1 hunks)src/code_points/specs.rs(2 hunks)src/code_points/types.rs(2 hunks)src/error.rs(4 hunks)src/lib.rs(1 hunks)src/normalizer.rs(3 hunks)src/tokens/tokenize.rs(6 hunks)src/tokens/types.rs(3 hunks)src/validate.rs(5 hunks)
✅ Files skipped from review due to trivial changes (2)
- README.md
- src/code_points/types.rs
🔇 Additional comments (5)
src/tokens/tokenize.rs (1)
457-457: Ensure test coverage reflects changes in error handling
With the changes in function signatures, update the tests to check for scenarios that previously resulted in errors. This will help maintain test coverage and ensure that all functionalities are working as expected.
examples/tokens.rs (1)
7-7: Confirm behavior after removing .unwrap() from tokenize method
By removing .unwrap(), the code assumes that tokenize no longer returns a Result. Ensure that this change aligns with the new behavior of tokenize and that any potential errors are properly handled elsewhere in the code.
src/normalizer.rs (2)
Line range hint 41-50: Well-designed normalize and beautify methods
The methods follow a good pattern:
- They reuse the process method
- They have clear, single responsibilities
- They maintain consistent error handling
27-30: Verify impact of removing Result from tokenize
The tokenize method no longer returns a Result. This could potentially hide errors that were previously being handled explicitly.
src/code_points/specs.rs (1)
99-101: Good improvement in lifetime specification
The explicit lifetime parameter in finditer_emoji improves clarity and ensures the returned iterator's lifetime is properly tied to both the struct and input string.
src/validate.rs
Outdated
| let mut maybe_group = None; | ||
| for last_cp_index in 0..unique_cps.len() { | ||
| let cps = &unique_cps[..last_cp_index + 1]; | ||
| if let Some(group) = specs.groups_for_cps(cps).next() { | ||
| maybe_group = Some(group); | ||
| continue; | ||
| } else { | ||
| let cp = unique_cps[last_cp_index]; | ||
| let index_of_cp = label | ||
| .iter_cps() | ||
| .position(|cp_in_label| cp_in_label == cp) | ||
| .expect("cp must exist in label"); | ||
| match maybe_group { | ||
| Some(group) => { | ||
| return Err(ProcessError::CurrableError { | ||
| inner: CurrableError::Confused { | ||
| group_name: group.name.to_string(), | ||
| cp, | ||
| }, | ||
| index: index_of_cp, | ||
| sequence: utils::cp2str(cp), | ||
| maybe_suggest: Some("".to_string()), | ||
| }); | ||
| } | ||
| None => unreachable!(), | ||
| } | ||
| } | ||
| } | ||
| unreachable!("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using unreachable!() macro; handle all match arms explicitly
In the determine_group function, the use of unreachable!() within the match statement can lead to panics if assumptions about the data change. To make the code more robust, consider explicitly handling the None case or adding logic to ensure that all possible cases are accounted for.
| specs: &CodePointsSpecs, | ||
| apply_nfc: bool, | ||
| ) -> Result<Self, ProcessError> { | ||
| pub fn from_input(input: impl AsRef<str>, specs: &CodePointsSpecs, apply_nfc: bool) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reconsider removing error handling from from_input method
Changing from_input to return Self instead of Result<Self, ProcessError> might suppress errors that can occur during tokenization. To prevent unexpected behaviors, consider maintaining error propagation to handle invalid inputs gracefully.
| mod validate; | ||
|
|
||
| pub use code_points::*; | ||
| pub(crate) use code_points::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing visibility to pub(crate) may impact external users
Modifying pub use code_points::*; to pub(crate) use code_points::*; restricts the visibility of code_points exports to within the crate. If any external code depends on these exports, this change can introduce breaking changes. Please verify that no public APIs are affected or consider deprecating before removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/error.rs (1)
Line range hint
17-38: Consider providing meaningful suggestions for curable errorsWhile the error structure is well-designed, the
maybe_suggestfield inProcessErroris consistently set toSome("")in the validation code. Consider providing actual suggestions for fixing the errors where possible.src/validate.rs (1)
421-421: Consider handling potential errors in test setupUsing
unwrap()onTokenizedName::from_inputcould make test failures less clear if the input is invalid. Consider handling the error case explicitly or usingexpect()with a descriptive message.- let name = TokenizedName::from_input(input, specs, true); - let label = name.iter_labels().next().unwrap(); + let name = TokenizedName::from_input(input, specs, true) + .expect("Failed to create TokenizedName from input"); + let label = name.iter_labels() + .next() + .expect("Expected at least one label");
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
src/error.rs(4 hunks)src/validate.rs(5 hunks)
🔇 Additional comments (5)
src/error.rs (2)
Line range hint 3-15: Well-structured error handling with rich context!
The ProcessError enum effectively wraps errors with additional context (index, sequence, suggestion) that will help users understand and fix issues.
Line range hint 40-55: Clear categorization of disallowed sequences!
The DisallowedSequence enum effectively categorizes different types of invalid sequences with clear error messages.
src/validate.rs (3)
326-361: Avoid using unreachable!() macro
The use of unreachable!() can lead to runtime panics if the assumptions about the code change. Consider handling this case explicitly.
Consider simplifying the group determination logic
The nested loops and conditions could be simplified using iterators and early returns.
fn determine_group<'a>(
label: &TokenizedLabel,
unique_cps: &'a [CodePoint],
specs: &'a CodePointsSpecs,
) -> Result<&'a ParsedGroup, ProcessError> {
- if let Some(group) = specs.groups_for_cps(unique_cps).next() {
- Ok(group)
- } else {
- let mut maybe_group = None;
- for last_cp_index in 0..unique_cps.len() {
- let cps = &unique_cps[..last_cp_index + 1];
- if let Some(group) = specs.groups_for_cps(cps).next() {
- maybe_group = Some(group);
- continue;
- } else {
- let cp = unique_cps[last_cp_index];
- let index_of_cp = label
- .iter_cps()
- .position(|cp_in_label| cp_in_label == cp)
- .expect("cp must exist in label");
- let inner = match maybe_group {
- Some(group) => CurrableError::Confused {
- group_name: group.name.to_string(),
- cp,
- },
- None => CurrableError::Disallowed,
- };
- return Err(ProcessError::CurrableError {
- inner,
- index: index_of_cp,
- sequence: utils::cp2str(cp),
- maybe_suggest: Some("".to_string()),
- });
- }
- }
- unreachable!()
+ // Try with all code points first
+ if let Some(group) = specs.groups_for_cps(unique_cps).next() {
+ return Ok(group);
+ }
+
+ // Find the first code point that doesn't fit in any group
+ for (last_cp_index, window) in unique_cps.iter().enumerate() {
+ let cps = &unique_cps[..=last_cp_index];
+ let current_group = specs.groups_for_cps(cps).next();
+
+ if current_group.is_none() {
+ let cp = unique_cps[last_cp_index];
+ let index_of_cp = label
+ .iter_cps()
+ .position(|cp_in_label| cp_in_label == cp)
+ .expect("cp must exist in label");
+
+ let previous_group = if last_cp_index > 0 {
+ specs.groups_for_cps(&unique_cps[..last_cp_index]).next()
+ } else {
+ None
+ };
+
+ let inner = match previous_group {
+ Some(group) => CurrableError::Confused {
+ group_name: group.name.to_string(),
+ cp,
+ },
+ None => CurrableError::Disallowed,
+ };
+
+ return Err(ProcessError::CurrableError {
+ inner,
+ index: index_of_cp,
+ sequence: utils::cp2str(cp),
+ maybe_suggest: Some("".to_string()),
+ });
+ }
}
+
+ // This case is actually impossible as we checked all code points above
+ Ok(specs.groups_for_cps(unique_cps)
+ .next()
+ .expect("At least one group should exist"))
}226-236: Improved error handling with better context!
The error handling now provides better context by including the group name, code point, and position in the error message.
278-283: Clear error handling for confused groups!
The error handling properly identifies and reports confusion between different script groups.
Summary by CodeRabbit
Release Notes
New Features
ens-normalize-rslibrary in the README.Bug Fixes
Documentation
Refactor