Skip to content
Closed
Prev Previous commit
Next Next commit
Update escaping function to use simple strtr()
  • Loading branch information
sirreal committed Oct 8, 2025
commit 1435b6e1dbd24fdfa28b0992ef12b53ddeb930e9
22 changes: 20 additions & 2 deletions src/wp-includes/html-api/class-wp-html-tag-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -3757,7 +3757,16 @@ public function set_modifiable_text( string $plaintext_content ): bool {
$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
$this->text_starts_at,
$this->text_length,
htmlspecialchars( $plaintext_content, ENT_QUOTES | ENT_HTML5 )
strtr(
$plaintext_content,
array(
'<' => '&lt;',
'>' => '&gt;',
'&' => '&amp;',
'"' => '&quot;',
"'" => '&apos;',
)
)
);

return true;
Expand Down Expand Up @@ -3957,7 +3966,16 @@ public function set_attribute( $name, $value ): bool {
*/
$escaped_new_value = in_array( $comparable_name, wp_kses_uri_attributes(), true )
? esc_url( $value )
Copy link
Member

@westonruter westonruter Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The esc_url() function also does some stuff with entities, namely:

// Replace ampersands and single quotes only when displaying.
if ( 'display' === $_context ) {
$url = wp_kses_normalize_entities( $url );
$url = str_replace( '&amp;', '&#038;', $url );
$url = str_replace( "'", '&#039;', $url );
}

To avoid that, I think using esc_url_raw() is preferable:

Suggested change
? esc_url( $value )
? htmlspecialchars( esc_url_raw( $value ) )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is only a formatting change, I broke the ternary onto three lines.

Can we consider that suggestion in a separate ticket and PR?

Copy link
Member

@westonruter westonruter Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sure, but it it is related to the double-escaping issue, because esc_url() does wp_kses_normalize_entities() which includes converting all & to &amp; among other things. For example, these:

var_dump( esc_url( 'http://example.com/?foo&amp;bar' ) );
var_dump( esc_url( 'http://example.com/?foo&#038;bar' ) );
var_dump( esc_url( 'http://example.com/?foo&bar' ) );

All result in:

string(32) "http://example.com/?foo&#038;bar"
string(32) "http://example.com/?foo&#038;bar"
string(32) "http://example.com/?foo&#038;bar"

I don't really see why a URI attribute should be treated differently at the HTML API layer. So this could instead just be:

$escaped_new_value = strtr(
	$value,
	array(
		'<' => '&lt;',
		'>' => '&gt;',
		'&' => '&amp;',
		'"' => '&quot;',
		"'" => '&apos;',
	)
);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right that the double-escaping issue also exists in esc_url(), that's unfortunate. esc_url() does a lot of things, it doesn't seem as easily replaced as esc_attr() or esc_html(). I suspect more discussion will be necessary regarding URL-related changes which is why I'm reluctant to include those changes here.

See r58472 and r58473.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good.

: htmlspecialchars( $value, ENT_QUOTES | ENT_HTML5 );
: strtr(
$value,
array(
'<' => '&lt;',
'>' => '&gt;',
'&' => '&amp;',
'"' => '&quot;',
"'" => '&apos;',
)
);

// If the escaping functions wiped out the update, reject it and indicate it was rejected.
if ( '' === $escaped_new_value && '' !== $value ) {
Expand Down
Loading