Skip to content
Closed
Changes from 1 commit
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
7f1f6a0
Introduce data provider to allow extending test coverage
ockham Aug 13, 2025
054756d
Add test coverage for Button block's text attribute
ockham Aug 13, 2025
b328dee
Block Bindings: Simplify replace_html() method
ockham Aug 13, 2025
21e80e2
Add WP_Block_Bindings_Processor class
ockham Aug 18, 2025
1c70bc3
Use WP_Block_Bindings_Processor for block bindings
ockham Aug 18, 2025
8f76759
Add kses back :/
ockham Aug 18, 2025
b7b7ca5
WPCS
ockham Aug 18, 2025
bb7c906
Remove obsolete var
ockham Aug 18, 2025
103a5c4
Indentation
ockham Aug 18, 2025
3f2e32b
Return true upon success
ockham Aug 18, 2025
85b6354
Add basic PHPDoc
ockham Aug 18, 2025
b67890e
Add more PHPDoc
ockham Aug 19, 2025
6bc4fd5
Add basic test coverage
ockham Aug 19, 2025
d943cd8
Allow setting attributes
ockham Aug 19, 2025
527c5d8
Increase test coverage
ockham Aug 19, 2025
86e836d
Increase test coverage
ockham Aug 20, 2025
dad7380
Add more commentary and warnings
ockham Aug 20, 2025
88af5ff
Do not explose block bindings processor class
sirreal Aug 21, 2025
a56f978
Remove block bindings processor class
sirreal Aug 21, 2025
7c3fc45
wpcs
sirreal Aug 21, 2025
66fef38
Make the hidden class instance static
sirreal Aug 21, 2025
24a0b0d
Use Reflection to make tests work again
ockham Aug 25, 2025
2e7df73
Block Bindings Processor: Tweak test to break with current implementa…
ockham Aug 25, 2025
e2f0a38
Block Bindings Processor: Base implementation on WP_HTML_Text_Replace…
ockham Aug 25, 2025
b045050
Remove now-obsolete build() alias method
ockham Aug 25, 2025
e739c6c
Rename test file
ockham Aug 25, 2025
6a4a3a3
Correct @since PHPDoc
ockham Aug 25, 2025
4dbb0e5
More descriptive variable names
ockham Aug 26, 2025
a18e435
Remove static var
ockham Aug 26, 2025
3410f40
Make sure we're not stopped on atomic/void/self-closing element
ockham Aug 26, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add basic PHPDoc
  • Loading branch information
ockham committed Aug 26, 2025
commit 85b635419cffd7dbff1d65b74d533fd99106f751
16 changes: 16 additions & 0 deletions src/wp-includes/class-wp-block-bindings-processor.php
Original file line number Diff line number Diff line change
@@ -1,5 +1,12 @@
<?php

/**
* WP_Block_Bindings_Processor class.
*
* This class can be used to perform the sort of structural
* changes to an HTML document that are required by
* Block Bindings.
*/
class WP_Block_Bindings_Processor extends WP_HTML_Processor {
private $output = '';
private $end_of_flushed = 0;
Expand All @@ -8,6 +15,15 @@ public function build() {
return $this->output . substr( $this->html, $this->end_of_flushed );
}

/**
* Replace the rich text content between a tag opener and matching closer.
*
* When stopped on a tag opener, replace the content enclosed by it and its
* matching closer with the provided rich text.
*
* @param string $rich_text The rich text to replace the original content with.
* @return bool True on success.
*/
public function replace_rich_text( $rich_text ) {
Copy link
Member

@sirreal sirreal Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should rely on WP_HTML_Text_Replacement and lexical_updates, is there a reason it does not?

The algorithm should be something like:

  • Ensure the processor is stopped on an open tag that is not atomic (like SCRIPT), void (like BR), nor foreign content with a self-closing flag (like G in (<svg><g /></svg>).
  • It may be necessary to flush updates with get_updated_html, I'm not sure off the top of my head.
  • Find the open tag, use a bookmark to get its closing position.
  • Find the matching close tag, use a bookmark to find the byte length of the replacement.
  • Push a lexical update replacement like this:
$this->lexical_updates[] = new WP_HTML_Text_Replacement(
	$start,
	$length,
	$replacement_html
);

I'd like if the function mentioned that the replacement is not checked for proper HTML nesting, it's unsafe by nature. The method provides raw HTML replacement between tags. There's no guarantee that the HTML will nest as expected after the replacement is applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should rely on WP_HTML_Text_Replacement and lexical_updates, is there a reason it does not?

The algorithm should be something like: [...]

I think that would amount to a basic version of set_inner_html(), no? The idea for WP_Block_Bindings_Processor was that it would give weaker guarantees -- e.g. you can't currently replace_rich_text(), then seek() to an earlier position, and make another change. This simpler and less safe implementation should however be sufficient for the problem domain -- i.e. block bindings -- where we can make a few assumptions on what kind of operations are needed.

The current approach is based on a recommendation and an earlier experiment by @dmsnell, which is similar in spirit (with a string buffer that's totally separate from the one that's kept by WP_HTML_Processor).

That said, I can spin up another PR to try out the lexical_updates-based approach.


I'd like if the function mentioned that the replacement is not checked for proper HTML nesting, it's unsafe by nature. The method provides raw HTML replacement between tags. There's no guarantee that the HTML will nest as expected after the replacement is applied.

Good point. Something like that is mentioned in the PHPDoc of the class, but it's a good idea to also include it in the method's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I can spin up another PR to try out the lexical_updates-based approach.

ockham#6

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would amount to a basic version of set_inner_html(), no?

Yes, but an unsafe version of set_inner_html(). Similar to what's implemented here.

The idea for WP_Block_Bindings_Processor was that it would give weaker guarantees -- e.g. you can't currently replace_rich_text(), then seek() to an earlier position, and make another change. This simpler and less safe implementation should however be sufficient for the problem domain -- i.e. block bindings -- where we can make a few assumptions on what kind of operations are needed.

The current approach is based on a recommendation and an earlier experiment by @dmsnell, which is similar in spirit (with a string buffer that's totally separate from the one that's kept by WP_HTML_Processor).

It's unclear to me why this would be safer. The unsafe operation is intending to put HTML inside a tag that potentially "leaks" to change external structure. Once we do that here, it feels like the unsafe part has happened. Taking the result of this and putting it back into another processor easily circumvents the prevention on seeks and such.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for context I was using substr() because we controlled the entire execution environment and only moved forward after making any changes.

while I have no problem moving to using lexical updates, it may be valuable to wait until the end to do things with them given that the HTML Processor is already relatively slow.

what I’ve done in the past is keep an external array of updates, then at the end dump them into a new Tag Processor for batch replacement. but still, I think that moving forward only is important here.

$this->replacements[] = new WP_HTML_Text_Replacment(…);

…

$replacer = new class( $html ) extends WP_HTML_Tag_Processor {
	public function swap_replacements( $replacements ) {
		$this->lexical_updates = $replacements;
	}
}

$replacer->swap_replacements( $this->replacements );
return $replacer->get_updated_html();

@ockham and I did discuss safety and I was suggesting that this serialization builder is probably good enough for now, providing comparable security to what Dom\HtmlDocument provides. everything above that is even better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Ensure the processor is stopped on an open tag that is not atomic (like SCRIPT), void (like BR), nor foreign content with a self-closing flag (like G in (<svg><g /></svg>).

Looks like expects_closer() (negated) should cover those cases: 3410f40

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the way that the attribute replacing in block bindings currently works is as follows:

  • For each bound block attribute, we create a new processor for the block markup. The type of processor we create depends on the attribute source:
    • If the attribute source is rich-text, we create a block bindings processor; if the attribute source is an HTML attribute, we create an HTML Tag Processor.
  • We use the processor to locate the sourced attribute's content in the block markup, and replace it with its new value.
  • We then call get_updated_html() to get the resulting HTML.

This means that there's currently only ever one attribute replaced per processor created. We also only move forward within that processor, with one exception: If we have a comma-separated selector like e.g. the Button block's text attribute has, we iterate over those individual tag selectors, and reset the processor's cursor to the first tag (by seek()ing a bookmark that we set initially).

In practice, a comma-separated selector such as a,button is there because the user can choose whether they want their button block to use an a tag or a button. These choices are of course mutually exclusive -- one button block will either contain an a or a button, but never both. As a result, the seek will only happen before the block markup is modified, which AFAICT should be fine ✅


It is however arguable that instantiating a new processor for each attribute isn't great in terms of performance. If we were ever going to change that and instead use one processor to update all bound attributes, we'd have to deal with cases like the following -- an image block whose url and caption attributes need to be bound:

<!-- wp:image {"metadata":{"bindings":{"url":{"source":"test/source"}}}} -->
<figure class="wp-block-image">
<img src="fallback.jpg" alt=""/>
<figcaption class="wp-element-caption">Fallback caption</figcaption></figure>
<!-- /wp:image -->

I guess we'd change the algorithm to advance through the markup tag-by-tag, check if the current tag matches any of the sourced attribute selectors, and replace it if yes. So we'd probably still be able to ensure forward movement.


So what's the bottom line? I'd say that we can guarantee forward-only movement right now, meaning that the serialization builder would probably be good enough indeed. I'm also fine with the WP_HTML_Text_Replacment-based one.

@dmsnell As for the extra optimization of keeping a separate stack of WP_HTML_Text_Replacments, and feeding them to a tag processor at the end for better performance, I'm wondering if that makes a difference for our use case? We stop traversing the document and apply the (only) change via get_updated_html() once we've located the portion that needs replacing anyway.

Copy link
Contributor Author

@ockham ockham Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've applied Dennis' suggestion locally:

Patch
diff --git a/src/wp-includes/class-wp-block.php b/src/wp-includes/class-wp-block.php
index e2b75f6643..54f175bb38 100644
--- a/src/wp-includes/class-wp-block.php
+++ b/src/wp-includes/class-wp-block.php
@@ -435,7 +435,13 @@ class WP_Block {
 						// TODO: Use `WP_HTML_Processor::set_inner_html` method once it's available.
 						$block_reader->release_bookmark( 'iterate-selectors' );
 						$block_reader->replace_rich_text( wp_kses_post( $source_value ) );
-						return $block_reader->get_updated_html();
+						$replacer = new class( $block_content ) extends WP_HTML_Tag_Processor {
+							public function swap_replacements( $replacements ) {
+								$this->lexical_updates = $replacements;
+							}
+						};
+						$replacer->swap_replacements( $block_reader->my_lexical_updates );
+						return $replacer->get_updated_html();
 					} else {
 						$block_reader->seek( 'iterate-selectors' );
 					}
@@ -463,6 +469,8 @@ class WP_Block {
 
 	private static function get_block_bindings_processor( string $block_content ) {
 		$internal_processor_class = new class('', WP_HTML_Processor::CONSTRUCTOR_UNLOCK_CODE) extends WP_HTML_Processor {
+			public $my_lexical_updates = array();
+
 			/**
 			 * Replace the rich text content between a tag opener and matching closer.
 			 *
@@ -494,7 +502,7 @@ class WP_Block {
 				$end         = $tag_closer->start;
 				$this->release_bookmark( '_wp_block_bindings_tag_closer' );
 
-				$this->lexical_updates[] = new WP_HTML_Text_Replacement(
+				$this->my_lexical_updates[] = new WP_HTML_Text_Replacement(
 					$start,
 					$end - $start,
 					$rich_text

One thing I've noted based on the unit tests I wrote for this PR is that it becomes harder to mix attribute changes (that rely on methods that are native to the HTML (Tag) Processor) with rich text replacements that rely on the block bindings processor -- essentially requiring an extra get_updated_html() to apply the former before the latter:

Patch
diff --git a/tests/phpunit/tests/block-bindings/wp-block-get-block-bindings-processor.php b/tests/phpunit/tests/block-bindings/wp-block-get-block-bindings-processor.php
index b199e10b0e..1714734d5a 100644
--- a/tests/phpunit/tests/block-bindings/wp-block-get-block-bindings-processor.php
+++ b/tests/phpunit/tests/block-bindings/wp-block-get-block-bindings-processor.php
@@ -45,26 +52,35 @@ class Tests_Blocks_GetBlockBindingsProcessor extends WP_UnitTestCase {
 		$figure_opener = '<figure class="wp-block-image">';
 		$img           = '<img src="breakfast.jpg" alt="" class="wp-image-1"/>';
 		$figure_closer = '</figure>';
-		$processor     = self::$get_block_bindings_processor_method->invoke(
-			null,
-			$figure_opener .
+
+		$html = $figure_opener .
 			$img .
 			'<figcaption class="wp-element-caption">Breakfast at a <em>café</em> in Berlin</figcaption>' .
-			$figure_closer
-		);
+			$figure_closer;
+
+		$processor = self::$get_block_bindings_processor_method->invoke( null, $html );
 
 		$processor->next_tag( array( 'tag_name' => 'figure' ) );
 		$processor->add_class( 'size-large' );
+		$html = $processor->get_updated_html(); // This right here.
 
 		$processor->next_tag( array( 'tag_name' => 'figcaption' ) );
 
 		$this->assertTrue( $processor->replace_rich_text( '<strong>New</strong> image caption' ) );
+
+		$replacer = new class( $html ) extends WP_HTML_Tag_Processor {
+			public function swap_replacements( $replacements ) {
+				$this->lexical_updates = $replacements;
+			}
+		};
+		$replacer->swap_replacements( $processor->my_lexical_updates );
+
 		$this->assertEquals(
 			'<figure class="wp-block-image size-large">' .
 			$img .
 			'<figcaption class="wp-element-caption"><strong>New</strong> image caption</figcaption>' .
 			$figure_closer,
-			$processor->get_updated_html()
+			$replacer->get_updated_html()
 		);
 	}
 

It gets of course even messier if the attribute change is made after the rich text change, and at an earlier position than the rich text change.

if ( $this->is_tag_closer() ) {
return false;
Expand Down