  * E.g. match having class `1<"2` needs to recognize `class="1&lt;&quot;2"`.
  * @TODO: Decode character references in `get_attribute()`
  * @TODO: Properly escape attribute value in `set_attribute()`
+ * @TODO: Add slow mode to escape character entities in CSS class names?
+ * (This requires a custom decoder since `html_entity_decode()`
+ * doesn't handle attribute character reference decoding rules.)
  *
  * @package WordPress
  * @subpackage HTML
  * of patches to that input. Tokenizes HTML but does not fully
  * parse the input document.
  *
+ * ## Usage
+ *
+ * Use of this class requires three steps:
+ *
+ * 1. Create a new class instance with your input HTML document.
+ * 2. Find the tag(s) you are looking for.
+ * 3. Request changes to the attributes in those tag(s).
+ *
+ * Example:
+ * ```php
+ * $tags = new WP_HTML_Tag_Processor( $html );
+ * if ( $tags->next_tag( [ 'tag_name' => 'option' ] ) ) {
+ *     $tags->set_attribute( 'selected', true );
+ * }
+ * ```
+ *
+ * ### Finding tags
+ *
+ * The `next_tag()` function moves the internal cursor through
+ * your input HTML document until it finds a tag meeting all of
+ * the supplied restrictions in the optional query argument. If
+ * no argument is provided then it will find the next HTML tag,
+ * regardless of what kind it is.
+ *
+ * To _find whatever the next tag is_:
+ * ```php
+ * $tags->next_tag();
+ * ```
+ *
+ * | Goal                                                      | Query                                                                      |
+ * |-----------------------------------------------------------|----------------------------------------------------------------------------|
+ * | Find any tag.                                             | `$tags->next_tag();`                                                       |
+ * | Find next image tag.                                      | `$tags->next_tag( [ 'tag_name' => 'img' ] );`                              |
+ * | Find next tag containing the `fullwidth` CSS class.       | `$tags->next_tag( [ 'class_name' => 'fullwidth' ] );`                      |
+ * | Find next image tag containing the `fullwidth` CSS class. | `$tags->next_tag( [ 'tag_name' => 'img', 'class_name' => 'fullwidth' ] );` |
+ *
+ * If a tag was found meeting your criteria then `next_tag()`
+ * will return `true` and you can proceed to modify it. If it
+ * returns `false`, however, it failed to find the tag and
+ * moved the cursor to the end of the document.
+ *
+ * Once the cursor reaches the end of the document the processor
+ * is done and if you want to reach an earlier tag you will
+ * need to recreate the processor and start over. The internal
+ * cursor can only move forward; it never backs up.
+ *
+ * #### Custom queries
+ *
+ * Sometimes it's necessary to inspect an HTML tag more closely
+ * than the query syntax here permits. In these cases, combine
+ * the search results with the read-only functions provided by
+ * the processor, or with your own external state or variables.
+ *
+ * Example:
+ * ```php
+ * // Paint up to the first five DIV or SPAN tags marked with the "jazzy" style.
+ * $remaining_count = 5;
+ * while ( $remaining_count > 0 && $tags->next_tag() ) {
+ *     if (
+ *         ( 'DIV' === $tags->get_tag() || 'SPAN' === $tags->get_tag() ) &&
+ *         'jazzy' === $tags->get_attribute( 'data-style' )
+ *     ) {
+ *         $tags->add_class( 'theme-style-everest-jazz' );
+ *         $remaining_count--;
+ *     }
+ * }
+ * ```
+ *
+ * `get_attribute()` will return `null` if the attribute isn't present
+ * on the tag. It may return `""` (the empty string) in cases where
+ * the attribute is present but its value is empty. For boolean
+ * attributes, those whose name is present but no value is given,
+ * it will return `true` (the only way to set `false` for an
+ * attribute is to remove it).
+ *
+ * ### Modifying HTML attributes for a found tag
+ *
+ * Once you've found the start of an opening tag you can modify
+ * any number of the attributes on that tag. You can set a new
+ * value for an attribute, remove the entire attribute, or do
+ * nothing and move on to the next opening tag.
+ *
+ * Example:
+ * ```php
+ * if ( $tags->next_tag( [ 'class_name' => 'wp-group-block' ] ) ) {
+ *     $tags->set_attribute( 'title', 'This groups the contained content.' );
+ *     $tags->remove_attribute( 'data-test-id' );
+ * }
+ * ```
+ *
+ * If `set_attribute()` is called for an existing attribute it will
+ * overwrite the existing value. Similarly, calling `remove_attribute()`
+ * for a non-existing attribute has no effect on the document. Both
+ * of these methods are safe to call without knowing if a given attribute
+ * exists beforehand.
+ *
+ * ### Modifying CSS classes for a found tag
+ *
+ * The tag processor treats the `class` attribute as a special case.
+ * Because adding or removing CSS classes is a common operation, you
+ * can do so with the dedicated `add_class()` and `remove_class()`
+ * methods.
+ *
+ * As with attribute values, adding or removing CSS classes is a safe
+ * operation that doesn't require checking if the attribute or class
+ * exists before making changes. Removing the only class removes the
+ * entire `class` attribute.
+ *
+ * Example:
+ * ```php
+ * // from `<span>Yippee!</span>`
+ * //   to `<span class="is-active">Yippee!</span>`
+ * $tags->add_class( 'is-active' );
+ *
+ * // from `<span class="excited">Yippee!</span>`
+ * //   to `<span class="excited is-active">Yippee!</span>`
+ * $tags->add_class( 'is-active' );
+ *
+ * // from `<span class="is-active heavy-accent">Yippee!</span>`
+ * //   to `<span class="is-active heavy-accent">Yippee!</span>`
+ * $tags->add_class( 'is-active' );
+ *
+ * // from `<input type="text" class="is-active rugby not-disabled" length="24">`
+ * //   to `<input type="text" class="is-active not-disabled" length="24">`
+ * $tags->remove_class( 'rugby' );
+ *
+ * // from `<input type="text" class="rugby" length="24">`
+ * //   to `<input type="text" length="24">`
+ * $tags->remove_class( 'rugby' );
+ *
+ * // from `<input type="text" length="24">`
+ * //   to `<input type="text" length="24">`
+ * $tags->remove_class( 'rugby' );
+ * ```
+ *
+ * ## Design limitations
+ *
+ * @TODO: Expand this section
+ *
+ * - no nesting: cannot match opening and closing tags
+ * - only move forward, never backward
+ * - class names are not decoded if they contain character references
+ * - only secures against HTML escaping issues; requires
+ *   manually sanitizing or escaping values based on the needs of
+ *   each individual attribute, since different attributes have
+ *   different needs.
+ *
  * @since 6.2.0
  */
 class WP_HTML_Tag_Processor {
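The class-name examples in the docblock above describe a small set-like update. As a rough standalone sketch, and not the processor's actual implementation, the hypothetical `sketch_update_class_list()` applies the same rules to a raw `class` value: an existing class is not added twice, removing an absent class does nothing, and a `null` result means the whole attribute should be dropped. It assumes exact, case-sensitive comparison of class names and splits on ASCII whitespace.

```php
<?php
/**
 * Hypothetical sketch, not the processor's implementation: applies the
 * documented add_class()/remove_class() rules to a raw `class` value.
 * Returns null when no classes remain, i.e. the attribute should be dropped.
 */
function sketch_update_class_list( ?string $class_value, array $add, array $remove ): ?string {
	// Split on ASCII whitespace; a missing attribute means no classes.
	$classes = null === $class_value
		? []
		: preg_split( '/[ \t\n\f\r]+/', $class_value, -1, PREG_SPLIT_NO_EMPTY );

	foreach ( $add as $name ) {
		// Adding a class that is already present changes nothing.
		if ( ! in_array( $name, $classes, true ) ) {
			$classes[] = $name;
		}
	}

	// Removing a class that isn't present changes nothing either.
	$classes = array_values( array_diff( $classes, $remove ) );

	return [] === $classes ? null : implode( ' ', $classes );
}
```

Returning `null` instead of `''` keeps "attribute removed" distinct from "attribute present but empty", the same distinction `get_attribute()` draws.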
@@ -136,16 +285,16 @@ class WP_HTML_Tag_Processor {
 	 * // and stops after recognizing the `id` attribute
 	 * // <div id="test-4" class=outline title="data:text/plain;base64=asdk3nk1j3fo8">
 	 * //                 ^ parsing will continue from this point
-	 * $this->attributes = array(
+	 * $this->attributes = [
 	 *     'id' => new WP_HTML_Attribute_Match( 'id', null, 6, 17 )
-	 * ) ;
+	 * ];
 	 *
 	 * // when picking up parsing again, or when asking to find the
 	 * // `class` attribute we will continue and add to this array
-	 * $this->attributes = array(
-	 *     'id' => new WP_HTML_Attribute_Match( 'id', null, 6, 17 ),
+	 * $this->attributes = [
+	 *     'id'    => new WP_HTML_Attribute_Match( 'id', null, 6, 17 ),
 	 *     'class' => new WP_HTML_Attribute_Match( 'class', 'outline', 18, 32 )
-	 * ) ;
+	 * ];
 	 *
 	 * // Note that only the `class` attribute value is stored in the index.
 	 * // That's because it is the only value used by this class at the moment.
@@ -170,11 +319,11 @@ class WP_HTML_Tag_Processor {
 	 * Example:
 	 * <code>
 	 * // Add the `WP-block-group` class, remove the `WP-group` class.
-	 * $class_changes = array(
+	 * $class_changes = [
 	 *     // Indexed by a comparable class name
 	 *     'wp-block-group' => new WP_Class_Name_Operation( 'WP-block-group', WP_Class_Name_Operation::ADD ),
 	 *     'wp-group'       => new WP_Class_Name_Operation( 'WP-group', WP_Class_Name_Operation::REMOVE )
-	 * ) ;
+	 * ];
 	 * </code>
 	 *
 	 * @since 6.2.0
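The `$class_changes` example keys each operation by a lowercased copy of the class name. Assuming that is what "comparable" means here (an inference from the example keys, not something the docblock states), a later operation on the same class simply overwrites the earlier one. In this sketch the hypothetical `Sketch_Class_Changes` class and its `SKETCH_ADD`/`SKETCH_REMOVE` constants stand in for `WP_Class_Name_Operation`:

```php
<?php
// Hypothetical stand-ins for WP_Class_Name_Operation::ADD / ::REMOVE.
const SKETCH_ADD    = 'add';
const SKETCH_REMOVE = 'remove';

/**
 * Hypothetical sketch: records class-name operations indexed by a
 * comparable (here: lowercased, an assumption) class name, so a later
 * operation on the same class replaces an earlier one.
 */
final class Sketch_Class_Changes {
	/** @var array<string, array{name: string, op: string}> */
	private array $changes = [];

	public function add( string $name ): void {
		$this->changes[ strtolower( $name ) ] = [ 'name' => $name, 'op' => SKETCH_ADD ];
	}

	public function remove( string $name ): void {
		$this->changes[ strtolower( $name ) ] = [ 'name' => $name, 'op' => SKETCH_REMOVE ];
	}

	/** @return array<string, array{name: string, op: string}> */
	public function all(): array {
		return $this->changes;
	}
}
```

Keying by the comparable name means the queue never holds two conflicting instructions for one class, so applying it later needs no conflict resolution.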
@@ -206,9 +355,9 @@ class WP_HTML_Tag_Processor {
 	 *
 	 * // Correspondingly, something like this
 	 * // will appear in the replacements array.
-	 * $replacements = array(
+	 * $replacements = [
 	 *     WP_HTML_Text_Replacement( 14, 28, 'https://my-site.my-domain/wp-content/uploads/2014/08/kittens.jpg' )
-	 * ) ;
+	 * ];
 	 * </code>
 	 *
 	 * @since 6.2.0
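The replacements array records byte ranges of the original input that should be swapped for new text. A minimal sketch of applying such a list, assuming each entry is `[ start, end, text ]` with an exclusive `end` offset and that ranges never overlap (both are assumptions; `sketch_apply_replacements()` is hypothetical and does not use `WP_HTML_Text_Replacement`):

```php
<?php
/**
 * Hypothetical sketch: applies non-overlapping text replacements, each given
 * as [ start, end, text ] where `end` is assumed to be an exclusive byte offset.
 */
function sketch_apply_replacements( string $html, array $replacements ): string {
	// Process left to right so every offset refers to the original input.
	usort( $replacements, static fn ( array $a, array $b ): int => $a[0] <=> $b[0] );

	$output = '';
	$cursor = 0;
	foreach ( $replacements as [ $start, $end, $text ] ) {
		// Copy the untouched bytes before this range, then the new text.
		$output .= substr( $html, $cursor, $start - $cursor ) . $text;
		$cursor  = $end;
	}

	// Copy whatever follows the last replaced range.
	return $output . substr( $html, $cursor );
}
```

Because output is built from a single forward cursor over the unmodified input, an earlier replacement that changes length never shifts the offsets of a later one.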
@@ -270,9 +419,9 @@ public function next_tag( $query = null ) {
 			if ( 's' === $t || 'S' === $t || 't' === $t || 'T' === $t ) {
 				$tag_name = $this->get_tag();
 
-				if ( 'script' === $tag_name ) {
+				if ( 'SCRIPT' === $tag_name ) {
 					$this->skip_script_data();
-				} elseif ( 'textarea' === $tag_name || 'title' === $tag_name ) {
+				} elseif ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) {
 					$this->skip_rcdata( $tag_name );
 				}
 			}
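After this change `get_tag()` returns upper-case names, so the special-element checks compare against `'SCRIPT'`, `'TEXTAREA'` and `'TITLE'`. The first-byte test lets most tags skip the `get_tag()` call entirely, presumably to avoid building a substring for every tag. A standalone sketch of that dispatch, assuming `$t` holds the first byte of the tag name; the function name and the `'script-data'`/`'rcdata'` return labels are hypothetical:

```php
<?php
/**
 * Hypothetical sketch of the dispatch in next_tag(): decides which skipping
 * routine, if any, applies to the tag name starting at $name_start.
 */
function sketch_special_content_mode( string $html, int $name_start, int $name_length ): ?string {
	// Assumed: $t is the first byte of the tag name.
	$t = $html[ $name_start ];
	if ( 's' !== $t && 'S' !== $t && 't' !== $t && 'T' !== $t ) {
		// Cannot be SCRIPT, TEXTAREA or TITLE; no substring is built.
		return null;
	}

	// Upper-case, matching what get_tag() returns after this change.
	$tag_name = strtoupper( substr( $html, $name_start, $name_length ) );

	if ( 'SCRIPT' === $tag_name ) {
		return 'script-data';
	}

	if ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) {
		return 'rcdata';
	}

	return null;
}
```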
@@ -318,7 +467,7 @@ private function skip_rcdata( $tag_name ) {
 				$tag_char  = $tag_name[ $i ];
 				$html_char = $html[ $at + $i ];
 
-				if ( $html_char !== $tag_char && strtolower( $html_char ) !== $tag_char ) {
+				if ( $html_char !== $tag_char && strtoupper( $html_char ) !== $tag_char ) {
 					$at += $i;
 					continue 2;
 				}
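`skip_rcdata()` walks forward looking for the matching closing tag, comparing each byte of the upper-case tag name against the input and upper-casing an input byte only when the two differ. A simplified standalone sketch of that search follows. It additionally requires the name to be followed by whitespace, `/` or `>` (so `</titles` does not close a TITLE element), mirroring the HTML tokenizer's end-tag rules; that check, the function name and the `?int` return value are assumptions, and the real method's signature may differ.

```php
<?php
/**
 * Hypothetical sketch of RCDATA skipping: returns the byte offset of the
 * `</` that starts the closing tag for $tag_name, searching from $at, or
 * null if none exists. $tag_name must already be upper-case.
 */
function sketch_find_rcdata_closer( string $html, int $at, string $tag_name ): ?int {
	$doc_length = strlen( $html );
	$tag_length = strlen( $tag_name );

	while ( false !== ( $at = strpos( $html, '</', $at ) ) ) {
		$name_at = $at + 2;

		// The name plus one terminating byte must fit in the input.
		if ( $name_at + $tag_length >= $doc_length ) {
			return null;
		}

		for ( $i = 0; $i < $tag_length; $i++ ) {
			$html_char = $html[ $name_at + $i ];
			$tag_char  = $tag_name[ $i ];

			// Exact byte test first; only upper-case the input byte if needed.
			if ( $html_char !== $tag_char && strtoupper( $html_char ) !== $tag_char ) {
				$at += 2;
				continue 2;
			}
		}

		// Assumed rule: the tag name must end right here.
		$next = $html[ $name_at + $tag_length ];
		if ( false !== strpos( " \t\n\f\r/>", $next ) ) {
			return $at;
		}

		$at += 2;
	}

	return null;
}
```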
@@ -937,7 +1086,7 @@ public function get_tag() {
 
 		$tag_name = substr( $this->html, $this->tag_name_starts_at, $this->tag_name_length );
 
-		return strtolower( $tag_name );
+		return strtoupper( $tag_name );
 	}
 
 	/**
@@ -1189,7 +1338,7 @@ private function matches() {
 
 		/*
 		 * Otherwise we have to check for each character if they
-		 * are the same, and only `strtolower()` if we have to.
+		 * are the same, and only `strtoupper()` if we have to.
 		 * Presuming that most people will supply lowercase tag
 		 * names and most HTML will contain lowercase tag names,
 		 * most of the time this runs we shouldn't expect to
@@ -1199,7 +1348,7 @@ private function matches() {
 			$html_char = $this->html[ $this->tag_name_starts_at + $i ];
 			$tag_char  = $this->sought_tag_name[ $i ];
 
-			if ( $html_char !== $tag_char && strtolower( $html_char ) !== $tag_char ) {
+			if ( $html_char !== $tag_char && strtoupper( $html_char ) !== $tag_char ) {
 				return false;
 			}
 		}
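`matches()` performs the same byte-wise comparison against `$this->sought_tag_name`. Because only the input byte is upper-cased, the sought name has to be upper-case already: a lowercase sought name would still match lowercase markup through the exact-byte check, but would miss `<DIV>`. With mostly lowercase markup the exact-byte check now usually fails for letters, so most bytes go through `strtoupper()`. In this standalone sketch `sketch_tag_name_matches()` is hypothetical, and the up-front length check is an added assumption, since the excerpt doesn't show how lengths are compared:

```php
<?php
/**
 * Hypothetical sketch of the tag-name test in matches(): compares the tag
 * name in $html against an upper-case $sought_tag_name byte by byte,
 * upper-casing an input byte only when it differs, so no substring is built.
 */
function sketch_tag_name_matches( string $html, int $name_start, int $name_length, string $sought_tag_name ): bool {
	// Names of different lengths can never match.
	if ( strlen( $sought_tag_name ) !== $name_length ) {
		return false;
	}

	for ( $i = 0; $i < $name_length; $i++ ) {
		$html_char = $html[ $name_start + $i ];
		$tag_char  = $sought_tag_name[ $i ];

		if ( $html_char !== $tag_char && strtoupper( $html_char ) !== $tag_char ) {
			return false;
		}
	}

	return true;
}
```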