Closed
What version of regex are you using?
1.5.6
Describe the bug at a high level.
The letters of the Vithkuqi script, a script for writing the Albanian language, were added in Unicode 14.0. The corresponding Unicode block spans U+10570 to U+105BF. I discovered that the regex \w+ does not match the letters in this block. Additionally, case-insensitive matching with the (?i) flag does not treat Vithkuqi uppercase and lowercase letters as equivalent: a pattern written with the uppercase letter matches only the uppercase letter, not its lowercase counterpart.
What are the steps to reproduce the behavior?
use regex::Regex;

fn main() {
    let upper = "\u{10570}"; // Vithkuqi Capital Letter A
    let lower = upper.to_lowercase(); // Vithkuqi Small Letter A (U+10597)
    // r1: case-insensitive match on the uppercase letter.
    let r1 = Regex::new("(?i)^\u{10570}$").unwrap();
    // r2: one or more word characters.
    let r2 = Regex::new("^\\w+$").unwrap();
    println!("{}", r1.is_match(upper));
    println!("{}", r1.is_match(&lower));
    println!("{}", r2.is_match(upper));
    println!("{}", r2.is_match(&lower));
}
What is the actual behavior?
The actual output is:
true
false
false
false
What is the expected behavior?
The expected output is:
true
true
true
true
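As a sanity check on the inputs used in the reproduction, here is a minimal sketch that uses only the standard library to confirm that U+10570 is treated as an alphabetic, uppercase character whose lowercase mapping is U+10597. The helper name simple_lowercase is hypothetical and is not part of regex or std. The result depends on the Unicode version bundled with the Rust toolchain in use; this sketch assumes a toolchain whose standard library already includes the Unicode 14.0 data.

```rust
/// Hypothetical helper: returns the lowercase mapping of `c` when that
/// mapping consists of exactly one character, and `None` otherwise.
fn simple_lowercase(c: char) -> Option<char> {
    let mut mapped = c.to_lowercase();
    match (mapped.next(), mapped.next()) {
        (Some(lower), None) => Some(lower),
        _ => None,
    }
}

fn main() {
    let upper = '\u{10570}'; // Vithkuqi Capital Letter A

    // These properties come from std's own Unicode tables, which are
    // independent of the tables shipped with the regex crate.
    println!("alphabetic: {}", upper.is_alphabetic());
    println!("uppercase:  {}", upper.is_uppercase());

    match simple_lowercase(upper) {
        Some(lower) => println!("lowercase:  U+{:04X}", lower as u32),
        None => println!("lowercase:  no single-character mapping"),
    }
}
```

If std reports the character as alphabetic with a lowercase mapping while \w and (?i) still fail to match, the difference would point to the Unicode tables used by regex rather than to the test inputs.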