RichText: Add a format prop to allow HTML string values to be used in RichText components #6034
62653d6 to
3877b17
Compare
switch ( format ) {
	case 'string':
		return this.editor.getContent();
Will this be executed many times? This will be much slower than this.editor.getContent( { format: 'raw' } ) or one loop over this.editor.getBody().childNodes.
It will be executed on each change (each keystroke). It's probably slower, but since it's limited to a single RichText, maybe it's not that important.
raw is not great because we don't want to save the TinyMCE temporary nodes and attributes. We could probably write an alternative using childNodes and stripping these nodes, but I'm not sure that would be any different from what TinyMCE does internally.
Yeah, I think this is fine as long as it's used on inline RichText instances. Once we get lists to move over to nested blocks we can even get rid of multiline instances altogether.
Just wondering, but what's the point of adding it if it isn't used anywhere?
@iseulde As I said in the description, it's an extensibility API that has been asked for in order to save HTML to meta. And it could be used in a follow-up PR as an alternative to the
Would it be okay to add a use case here to demonstrate and to ensure it works as expected? Cool if it's too complicated, but that might be a sign more work is needed elsewhere if it's not just a one line change.
Sure, here's a small patch to apply to test it on the Audio block (caption):
diff --git a/blocks/library/audio/index.js b/blocks/library/audio/index.js
index 803bc907f..0565d0b3e 100644
--- a/blocks/library/audio/index.js
+++ b/blocks/library/audio/index.js
@@ -13,7 +13,7 @@ import {
Placeholder,
Toolbar,
} from '@wordpress/components';
-import { Component } from '@wordpress/element';
+import { Component, RawHTML } from '@wordpress/element';
import { mediaUpload } from '@wordpress/utils';
/**
@@ -44,8 +44,8 @@ export const settings = {
attribute: 'src',
},
caption: {
- type: 'array',
- source: 'children',
+ type: 'string',
+ source: 'html',
selector: 'figcaption',
},
id: {
@@ -161,6 +161,7 @@ export const settings = {
onChange={ ( value ) => setAttributes( { caption: value } ) }
isSelected={ isSelected }
inlineToolbar
+ format="string"
/>
) }
</figure>,
@@ -174,7 +175,7 @@ export const settings = {
return (
<figure>
<audio controls="controls" src={ src } />
- { caption && caption.length > 0 && <figcaption>{ caption }</figcaption> }
+ { caption && caption.length > 0 && <figcaption><RawHTML>{ caption }</RawHTML></figcaption> }
</figure>
);
},
aduth
left a comment
Through my various failures in trying to drop the children source (#5380), I ultimately landed at the same conclusion that this type of format option would be the best path for those who need to work with content as an HTML string, so I'm generally on board with it. I think it does raise the maintenance burden in needing to support multiple formats, particularly when none of the core blocks has shown a need for it.
One thing which I considered in #5380 is the idea of a RichText.Content component for use in save, so that a developer doesn't need to do the RawHTML rendering themselves. Do you have any thoughts on this? I guess if they need to explicitly assign format, there's already some awareness of the data format. And it would be unfortunate if this needed to be repeated for the RichText.Content element (unless we otherwise "infer" format from the shape of the value).
blocks/rich-text/format.js
Outdated
 *
 * @param {WPElement} value Element.
 *
 * @return {string} HTML.
Minor: I don't think we try to align return and param descriptions now that they're separated with newline.
blocks/rich-text/format.js
Outdated
 *
 * @return {WPElement} Element.
 */
export function stringToElement( value ) {
This function gives me bad vibes, for two reasons:
- The implementation is susceptible to XSS
  - As used, probably not in a way that would be of any practical concern
- We're performing two levels of transformation, first from string to DOM, then from DOM to element
Since its use is quite limited (for default value), maybe not of huge concern at the moment.
Though it makes me wonder, why don't we just apply the default value as dangerouslySetInnerHTML ?
Good idea, it allows us to factor out more logic using valueToString( value, format )
blocks/rich-text/index.js
Outdated
this.restoreContentAndSplit( beforeElement, afterElement );
const { format } = this.props;
const before = format === 'string' ? domToString( beforeNodes ) : domToElement( beforeNodes );
There are a lot of ternaries in this file now, and it seems like it'd be easy to make mistakes in future maintenance.
I wonder if a simple improvement might just be to encapsulate logic to a single function:
domToFormat( nodes, format )
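To make the suggestion concrete, here is a minimal sketch of what such a helper could look like. It is not the code from this PR: the converter table, the escapeText helper, and the plain-object output used for the 'element' format are all assumptions made for illustration (the real element format would produce WPElements).

```javascript
const TEXT_NODE = 3; // Same numeric value as Node.TEXT_NODE in the DOM.

// Escape characters in text that would otherwise be parsed as markup.
function escapeText( text ) {
	return text.replace( /&/g, '&amp;' ).replace( /</g, '&lt;' ).replace( />/g, '&gt;' );
}

// Hypothetical converters, keyed by format name.
const converters = {
	// HTML string: text nodes have no outerHTML, so they are escaped separately.
	string: ( nodes ) =>
		nodes
			.map( ( node ) =>
				node.nodeType === TEXT_NODE ? escapeText( node.textContent ) : node.outerHTML
			)
			.join( '' ),
	// Simplified stand-in for the element format: plain objects, not WPElements.
	element: ( nodes ) =>
		nodes.map( ( node ) =>
			node.nodeType === TEXT_NODE ?
				node.textContent :
				{
					type: node.nodeName.toLowerCase(),
					children: converters.element( Array.from( node.childNodes ) ),
				}
		),
};

// Single entry point, so call sites no longer need a ternary on the format.
function domToFormat( nodes, format ) {
	if ( ! Object.prototype.hasOwnProperty.call( converters, format ) ) {
		throw new Error( `Unknown format: ${ format }` );
	}
	return converters[ format ]( Array.from( nodes ) );
}
```

With something like this, the call sites would read `domToFormat( beforeNodes, format )`, and supporting another format would only mean adding an entry to the table.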
I actually like the Let's introduce |
Can we get this in?
onChange() {
	this.isEmpty = this.editor.dom.isEmpty( this.editor.getBody() );
	this.savedContent = this.isEmpty ? [] : this.getContent();
	this.savedContent = this.getContent();
Any way to prevent the empty check from running twice here?
Yes, maybe it could be inferred from the content.
blocks/rich-text/format.js
Outdated
 * @return {string} HTML.
 */
export function domToString( value ) {
	return map( value, element => element.outerHTML ).join( '' );
In domToElement, we're always stripping any TinyMCE internal element and attributes, but here we're not, even though there might be cases where we receive some in the case of splitting.
Not sure how to address this in an optimal way. Don't we have tools in TinyMCE to do this for us instead of adding custom logic? Is it a requirement since we're not using this when getting the whole content of the editor but only when splitting?
( new tinymce.dom.Serializer( {} ) ).serialize( el ) ? (Likely with some reuse of a serializer instance)
Thanks Andrew, but unfortunately it's not working in my testing. Most likely we'd have to specify the same rules that TinyMCE uses internally.
> Is it a requirement since we're not using this when getting the whole content of the editor but only when splitting?
Well, the content will end up getting passed to the store, so I guess so. On consecutive edits we may pass getContent again, but this doesn't necessarily happen. Say you split a paragraph in two and the first part contains some elements with internal attributes. This will be passed on eventually to (block) setAttributes.
Not sure immediately how to fix either but I guess @aduth goes in the right direction. Maybe parse and serialise if you want to work on the string? I'll have a look if we can work on the fragment.
Did a little test here: 39d9966
Splitting is currently entirely broken with the HTML format. The diff fixes it and should also get rid of internal stuff.
After the splitting fix, but without serialising, you'd get this:
The goal of this new editor is to make adding rich content to WordPress simple and enjoyable. This whole post is composed of <em data-mce-selected="inline-boundary">pieces of</em>
when splitting the first paragraph of the demo content.
(The splitting is broken because element.outerHTML doesn't work on text nodes that are passed.)
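As a rough illustration of the "internal stuff" problem, the leftover data-mce-selected attribute in the output above could be dropped from an HTML string along these lines. This is a deliberately naive, regex-based sketch, not what the linked commit does: stripInternalAttributes is a hypothetical name, it only handles double-quoted data-mce-* attributes, it would also rewrite matching text outside of tags, and it does not remove any temporary nodes. Parsing and re-serialising, as suggested earlier in the thread, would be the more robust route.

```javascript
// Hypothetical helper: remove data-mce-* attributes from an HTML string.
// Naive on purpose: assumes well-formed markup and double-quoted values.
function stripInternalAttributes( html ) {
	return html.replace( /\s+data-mce-[\w-]+="[^"]*"/g, '' );
}

// Input taken from the split output quoted in this thread.
const dirty = 'composed of <em data-mce-selected="inline-boundary">pieces of</em>';
const clean = stripInternalAttributes( dirty );
```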
<p
	a-prop="hi"
>
	<p>
Hm, isn't this invalid HTML? Won't React warn on nesting paragraphs? Or do we not care here for these tests?
Our new non-React serializer is much less noisy about issues it encounters. Not to say we shouldn't be using valid HTML here though.
aduth
left a comment
This feels like a good refactoring of RichText, even without the new feature. Nice one 👍
	case 'string':
		return this.editor.getContent();
	default:
		return this.editor.dom.isEmpty( this.editor.getBody() ) ?
Aside: Is the isEmpty check here meant to be an optimization for the easy case of generating an empty string? Is isEmpty itself actually performant? Wondering if we're actually shooting ourselves in the foot with this for the more common case that RichText is non-empty.
It's not an optimization, it's to ensure the paragraph block is considered empty consistently. If we remove it we can end up with content like [''] which will break the default appender.
I get the sense this has been discussed before, but if we have control over the elements we're producing, why are we keeping the empty string?
I can't tell honestly :) maybe @iseulde can tell more here.
> but if we have control over the elements we're producing
Do we have control over that? This one is getting the nodes from a contenteditable field, which can contain all sorts of combinations. I've seen <p></p>, <p>''</p>, <p><br></p>, <p><br>''</p>, <p><strong></strong></p> etc., all of which are empty.
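To illustrate why a dedicated emptiness check is needed, the cases listed above can all be caught by looking for any non-empty text node. The sketch below is a simplification and not TinyMCE's isEmpty: hasText and isEmptyBody are hypothetical names, and treating every element without text as empty would wrongly include content such as images.

```javascript
const TEXT_NODE = 3; // Same numeric value as Node.TEXT_NODE in the DOM.

// True if the node or any of its descendants is a non-empty text node.
function hasText( node ) {
	if ( node.nodeType === TEXT_NODE ) {
		return node.textContent.length > 0;
	}
	return Array.from( node.childNodes || [] ).some( hasText );
}

// Hypothetical check: a body is "empty" when it contains no text at all,
// regardless of how many wrapper elements or <br> nodes it holds.
function isEmptyBody( body ) {
	return ! hasText( body );
}
```

The point is only that emptiness has to be decided by walking the content, since the markup alone comes in too many shapes to compare against a single empty value.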
blocks/rich-text/index.js
Outdated
	default:
		return this.editor.dom.isEmpty( this.editor.getBody() ) ?
			[] :
			domToFormat( this.editor.getBody().childNodes || [], 'element' );
Aside: In the future, I might imagine we avoid working with the DOM as a source and instead defer to TinyMCE's getContent( { format: 'tree' } ) to convert from their tree form to the desired target format. This would avoid the need for us to maintain our own cleaning logic. At which point, domToFormat doesn't make as much sense, might just be toFormat( value, targetFormat, sourceFormat ) (maybe infer source format?).
I tend to always prefer explicitness over implicitness. I don't really care about the name of the function, but I tend to be suspicious (maybe I shouldn't be) every time I hear "infer from value".
Eh, I can see it, but it's kinda like if one had to pass a second argument to e.g. Array.from:
Array.from( new Set( [ 1, 2, 3 ] ), 'Set' );
... one could say: "Well, duh, you passed me a set, I know it's a set"
But sometimes it's not obvious, the easy example is that the tree structure and the element structure can't be distinguished easily because both can be plain objects.
Looks good to me, but would love to see issues in #6034 (comment) fixed before merge.
… RichText components
17e6011 to
db26521
Compare
@youknowriad Fixed the splitting. There are still errors on merging though, where the merge function seems to assume React values.
@iseulde I tried splitting and merging and it works well for me with the last changes. The
Oh, I didn't update that. Sorry. Looks good to go then.
Noting that we should probably wait to merge this until after the pending release.
Just want to throw in here that this would be super helpful and is something that has been requested by VIP. Especially for migrating data from ACF or CMB2 fields and porting them over to blocks, or at least saving data in a way that is more consistent with what folks are used to. Would love to see this rolled in, as I have been struggling with workarounds for this exact thing. Thanks @youknowriad!
… RichText components (WordPress#6034)
* RichText: Add a format prop to allow HTML string values to be used in RichText components
* Don't align params and returns docs
* Add domToFormat function to factorize ternaries
* Remove useless stringToElement
* Add valueToString to avoid ternaries when converting to strings
* Avoid double isEmpty check
* Use valid HTML in unit tests
* Fix splitting
Noting that our gutenberg/packages/editor/src/components/rich-text/patterns.js Lines 193 to 196 in 751a2fc
We discussed several times whether we want to treat RichText values as strings, elements or raw trees. In this PR I'm introducing a format prop for the RichText component for two reasons:
- If someone wants to store HTML in a meta attribute, the RichText component should produce a string output and not an array of elements. This was raised a while ago, but I'm not able to find the corresponding issue.
- We know we want to move away from the element tree as the default representation of the RichText value (because it's not properly serializable on the server), but we're not certain yet about the alternative. Either way, changing the format requires a migration plan: this format prop will allow us to introduce warnings, at some point switch the default format (which is element at the moment), and at a later point maybe remove the element format entirely.
Testing instructions
- Right now, the newly introduced "string" format is not used anywhere, so this should not change anything for the existing blocks.
- You can set up a meta block with a "string" format for the RichText component.