-It is possible to use `\p` in a regular expression to match all characters to which the Unicode standard assigns a given property. This allows us to match things like letters in a more cosmopolitan way. However, again due to compatibility with the original language standards, those are only recognized when you put a `u` character (for ((Unicode))) after the regular expression.
+It is possible to use `\p` in a regular expression to match all characters to which the Unicode standard assigns a given property. This allows us to match things like letters in a more cosmopolitan way. However, again due to compatibility with the original language standards, those are recognized only when you put a `u` character (for ((Unicode))) after the regular expression.

 {{table {cols: [1, 5]}}}

 | `\p{L}` | Any letter
 | `\p{N}` | Any numeric character
 | `\p{P}` | Any punctuation character
-| `\P{L}` | Any non-letter (uppercase P inverts)
+| `\P{L}` | Any nonletter (uppercase P inverts)
 | `\p{Script=Hangul}` | Any character from the given script (see [Chapter ?](higher_order#scripts))

 Using `\w` for text processing that may need to handle non-English text (or even English text with borrowed words like “cliché”) is a liability, since it won't treat characters like “é” as letters. Though they tend to be a bit more verbose, `\p` property groups are more robust.
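As a rough illustration of the paragraph above, the following sketch compares `\w` with `\p{L}` on a borrowed word. The sample strings and variable names are assumptions chosen for the example and do not come from the chapter.

```javascript
// Sample word chosen for illustration; "\u00e9" is the precomposed letter é.
const word = "clich\u00e9";

// \w covers only ASCII letters, digits, and underscore, so the match stops before é.
const asciiMatch = word.match(/\w+/)[0];

// \p{L} is recognized only with the u flag; it matches any Unicode letter.
const unicodeMatch = word.match(/\p{L}+/u)[0];

console.log(asciiMatch);   // → clich
console.log(unicodeMatch); // → cliché

// Uppercase \P inverts the property: here it finds the first nonletter.
const firstNonLetter = "abc-def".match(/\P{L}/u)[0];
console.log(firstNonLetter); // → -
```

Without the `u` flag, `\p{L}` is not treated as a property escape, which is why the flag appears on both property-based patterns.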
@@ -251,7 +251,7 @@ The first and second `+` characters apply only to the second `o` in `boo` and `h
-The `i` at the end of the expression in the example makes this regular expression case-insensitive, allowing it to match the uppercase _B_ in the input string, even though the pattern is itself all lowercase.
+The `i` at the end of the expression in the example makes this regular expression case insensitive, allowing it to match the uppercase _B_ in the input string, even though the pattern is itself all lowercase.

 ## Matches and groups
@@ -359,7 +359,7 @@ If you give the `Date` constructor a single argument, that argument is treated a
-Date objects provide methods such as `getFullYear`, `getMonth`, `getDate`, `getHours`, `getMinutes`, and `getSeconds` to extract their components. Besides `getFullYear` there's also `getYear`, which gives you the year minus 1900 (`98` or `119`) and is mostly useless.
+Date objects provide methods such as `getFullYear`, `getMonth`, `getDate`, `getHours`, `getMinutes`, and `getSeconds` to extract their components. Besides `getFullYear` there's also `getYear`, which gives you the year minus 1900 (such as `98` or `125`) and is mostly useless.

 {{index "capture group", "getDate method", [parentheses, "in regular expressions"]}}
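A minimal sketch of the accessor methods named in this hunk, using an arbitrary example date (the date and the variable names are assumptions for illustration):

```javascript
// Arbitrary example date; months are zero-based, so 0 means January.
const day = new Date(2025, 0, 15);

const fullYear = day.getFullYear();
const legacyYear = day.getYear(); // the year minus 1900
const month = day.getMonth();
const dayOfMonth = day.getDate();

console.log(fullYear);   // → 2025
console.log(legacyYear); // → 125
console.log(month);      // → 0
console.log(dayOfMonth); // → 15
```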
@@ -391,7 +391,7 @@ If we want to enforce that the match must span the whole string, we can add the
 {{index "word boundary", "word character"}}

-There is also a `\b` marker that matches _word boundaries_, positions that have a word character on one side, and a non-word character on the other. Unfortunately, these use the same simplistic concept of word characters as `\w`, and are therefore not very reliable.
+There is also a `\b` marker that matches _word boundaries_, positions that have a word character on one side, and a nonword character on the other. Unfortunately, these use the same simplistic concept of word characters as `\w` and are therefore not very reliable.

 Note that these boundary markers don't match any actual characters. They just enforce that a given condition holds at the place where it appears in the pattern.
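The following sketch shows both the intended use of `\b` and the kind of unreliability the text warns about. The test strings and variable names are illustrative assumptions.

```javascript
// \b sits between a word character (as defined by \w) and a nonword character.
const wholeWord = /\bcat\b/.test("a cat sat");
const insideWord = /\bcat\b/.test("concatenate");

console.log(wholeWord);  // → true
console.log(insideWord); // → false

// Because é ("\u00e9") is not a \w character, \b reports a boundary
// in the middle of the word, right after "clich".
const falseBoundary = /\bclich\b/.test("clich\u00e9");
console.log(falseBoundary); // → true
```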
-The `e` in the first example is necessary to match, but is not part of the matched string. The `(?! )` notation expresses a _negative_ look-ahead. This only matches if the pattern in the parentheses _doesn't_ match, causing the second example to only match `a` characters that don't have a space after them.
+The `e` in the first example is necessary to match, but is not part of the matched string. The `(?! )` notation expresses a _negative_ look-ahead. This matches only if the pattern in the parentheses _doesn't_ match, causing the second example to match only `a` characters that don't have a space after them.
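The examples this paragraph refers to are not part of the hunk, so here is a small stand-in sketch of both look-ahead forms. The input strings and variable names are assumptions.

```javascript
// Positive look-ahead: the "e" must follow, but it is not part of the match.
const positive = /a(?=e)/.exec("braeburn");
console.log(positive[0], positive.index); // → a 2

// Negative look-ahead: skip any "a" that is followed by a space.
const negative = /a(?! )/.exec("a ba");
console.log(negative[0], negative.index); // → a 3
```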
 This code takes a string, finds all occurrences of a number followed by an alphanumeric word, and returns a string that has one less of every such quantity.

-The `(\d+)` group ends up as the `amount` argument to the function, and the `(\p{L}+)` group gets bound to `unit`. The function converts `amount` to a number—which always works since it matched `\d+` earlier—and makes some adjustments in case there is only one or zero left.
+The `(\d+)` group ends up as the `amount` argument to the function, and the `(\p{L}+)` group gets bound to `unit`. The function converts `amount` to a number—which always works, since it matched `\d+` earlier—and makes some adjustments in case there is only one or zero left.

 ## Greed
@@ -597,7 +597,7 @@ console.log(regexp.test("Harry is a dodgy character."));
 {{index ["regular expression", flags], ["backslash character", "in regular expressions"]}}

-When creating the `\s` part of the string, we have to use two backslashes because we are writing them in a normal string, not a slash-enclosed regular expression. The second argument to the `RegExp` constructor contains the options for the regular expression—in this case, `"gi"` for global and case-insensitive.
+When creating the `\s` part of the string, we have to use two backslashes because we are writing them in a normal string, not a slash-enclosed regular expression. The second argument to the `RegExp` constructor contains the options for the regular expression—in this case, `"gi"` for global and case insensitive.

 But what if the name is `"dea+hl[]rd"` because our user is a ((nerd))y teenager? That would result in a nonsensical regular expression that won't actually match the user's name.
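One common way around this problem is to put a backslash before every character that has a special meaning before building the expression. The sketch below assumes a hypothetical helper named `escapeRegExp` and an invented sample sentence; neither appears in the text of this hunk.

```javascript
// Hypothetical helper: escape characters that are special in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const name = "dea+hl[]rd";
const text = "This dea+hl[]rd guy is super annoying.";

// Built directly from the name: "+" and "[]" are read as regexp syntax.
const naive = new RegExp("(^|\\s)" + name + "($|\\s)", "gi");
const naiveFound = naive.test(text);

// Built from the escaped name: every character is matched literally.
const safe = new RegExp("(^|\\s)" + escapeRegExp(name) + "($|\\s)", "gi");
const safeFound = safe.test(text);

console.log(naiveFound); // → false
console.log(safeFound);  // → true
```

The results are stored in variables because calling `test` on a global regular expression moves its `lastIndex`, so a second call on the same object could give a different answer.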
-If the match was successful, the call to `exec` automatically updates the `lastIndex` property to point after the match. If no match was found, `lastIndex` is set back to zero, which is also the value it has in a newly constructed regular expression object.
+If the match was successful, the call to `exec` automatically updates the `lastIndex` property to point after the match. If no match was found, `lastIndex` is set back to 0, which is also the value it has in a newly constructed regular expression object.

 The difference between the global and the sticky options is that when sticky is enabled, the match will succeed only if it starts directly at `lastIndex`, whereas with global, it will search ahead for a position where a match can start.
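The difference can be seen in a short sketch. The pattern, the input string, and the variable names here are assumptions made for illustration.

```javascript
const input = "xyz abc";

// Global: exec searches ahead from lastIndex (0) and finds "abc" at index 4.
const globalRe = /abc/g;
const globalMatch = globalRe.exec(input);
console.log(globalMatch.index, globalRe.lastIndex); // → 4 7

// Sticky: the match must start exactly at lastIndex (0), so it fails
// and lastIndex is set back to 0.
const stickyRe = /abc/y;
const stickyMiss = stickyRe.exec(input);
console.log(stickyMiss, stickyRe.lastIndex); // → null 0

// Pointing lastIndex at the right position lets the sticky match succeed.
stickyRe.lastIndex = 4;
const stickyHit = stickyRe.exec(input);
console.log(stickyHit.index, stickyRe.lastIndex); // → 4 7
```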
@@ -794,11 +794,11 @@ The pattern `if (match = string.match(...))` makes use of the fact that the valu
 {{index [parentheses, "in regular expressions"]}}

-If a line is not a section header or a property, the function checks whether it is a comment or an empty line using the expression `/^\s*(;|$)/` to match lines that either contain only space, or space followed by a semicolon (making the rest of the line a comment). When a line doesn't match any of the expected forms, the function throws an exception.
+If a line is not a section header or a property, the function checks whether it is a comment or an empty line using the expression `/^\s*(;|$)/` to match lines that either contain only whitespace, or whitespace followed by a semicolon (making the rest of the line a comment). When a line doesn't match any of the expected forms, the function throws an exception.

 ## Code units and characters

-Another design mistake that's been standardized in JavaScript regular expressions is that by default, operators like `.` or `?` work on code units, as discussed in [Chapter ?](higher_order#code_units), not actual characters. This means characters that are composed of two code units behave strangely.
+Another design mistake that's been standardized in JavaScript regular expressions is that by default, operators like `.` or `?` work on code units (as discussed in [Chapter ?](higher_order#code_units)), not actual characters. This means characters that are composed of two code units behave strangely.

 ```
 console.log(/🍎{3}/.test("🍎🍎🍎"));
@@ -858,7 +858,7 @@ Regular expressions are a sharp ((tool)) with an awkward handle. They simplify s
 {{index debugging, bug}}

-It is almost unavoidable that, in the course of working on these exercises, you will get confused and frustrated by some regular expression's inexplicable ((behavior)). Sometimes it helps to enter your expression into an online tool like [_debuggex.com_](https://www.debuggex.com/) to see whether its visualization corresponds to what you intended and to ((experiment)) with the way it responds to various input strings.
+It is almost unavoidable that, in the course of working on these exercises, you will get confused and frustrated by some regular expression's inexplicable ((behavior)). Sometimes it helps to enter your expression into an online tool like [_debuggex.com_](https://www.debuggex.com) to see whether its visualization corresponds to what you intended and to ((experiment)) with the way it responds to various input strings.