RichText: Add a format prop to allow HTML string values to be used in RichText components #6034

youknowriad · 2018-04-06T11:20:03Z

We discussed several times whether we want to treat RichText values as strings, elements or raw trees. In this PR I'm introducing a format prop for the RichText component for two reasons:

If someone wants to store HTML in a meta attribute, the RichText component should produce a string output and not an array of element. This has been raised a while ago but I'm not able to find the corresponding issue.
We know we want to move away from the element's tree as default representation of RichText value (because it's not serializable properly on the server), we're not certain yet about the alternative. But anyway in order to do so (change the format) we need a migration plan, this format will allow us to introduce warnings and at some point switch the default format (which is element at the moment) and at a later point maybe remove the element format entirely.

Testing instructions

Right now, the newly introduce "string" format is not used anywhere, so this should not change anything for the existing blocks
You can setup a meta block with a "string" format for the RichText component.

ellatrix · 2018-04-06T13:29:57Z

blocks/rich-text/index.js

+
+		switch ( format ) {
+			case 'string':
+				return this.editor.getContent();


Will this be executed many times? This will be much slower than this.editor.getContent( { format: 'raw' } ) or one loop over this.editor.getBody().childNodes.

It will be executed on each change (type), It's probably slower but since it's limited to a single RichText, maybe it's not that important.

raw is not great because we don't want to save the TinyMCE temporary nodes and attributes. We could probably write an alternative using childNodes and stripping these nodes, but not sure if this is different of what's done internally in TinyMCE

Yeah, I think this is fine as long as it's used on inline RichText instances. Once we get lists to move over to nested blocks we can even get rid of multiline instances altogether.

ellatrix · 2018-04-06T13:32:59Z

Just wondering, but what's the point of adding it if it is used anywhere?

youknowriad · 2018-04-06T13:54:44Z

@iseulde As a said in the description, it's an extensibility API that have been asked for to save HTML to meta.

And it could be used on a follow-up PR as alternative to the children source for some blocks.

ellatrix · 2018-04-06T15:38:25Z

Would it be okay to add a use case here to demonstrate and to ensure it works as expected? Cool if it's too complicated, but that might be a sign more work is needed elsewhere if it's not just a one line change.

youknowriad · 2018-04-06T15:57:27Z

Sure, here's a small patch to apply to test it on the Audio block (caption)

diff --git a/blocks/library/audio/index.js b/blocks/library/audio/index.js
index 803bc907f..0565d0b3e 100644
--- a/blocks/library/audio/index.js
+++ b/blocks/library/audio/index.js
@@ -13,7 +13,7 @@ import {
        Placeholder,
        Toolbar,
 } from '@wordpress/components';
-import { Component } from '@wordpress/element';
+import { Component, RawHTML } from '@wordpress/element';
 import { mediaUpload } from '@wordpress/utils';
 
 /**
@@ -44,8 +44,8 @@ export const settings = {
                        attribute: 'src',
                },
                caption: {
-                       type: 'array',
-                       source: 'children',
+                       type: 'string',
+                       source: 'html',
                        selector: 'figcaption',
                },
                id: {
@@ -161,6 +161,7 @@ export const settings = {
                                                        onChange={ ( value ) => setAttributes( { caption: value } ) }
                                                        isSelected={ isSelected }
                                                        inlineToolbar
+                                                       format="string"
                                                />
                                        ) }
                                </figure>,
@@ -174,7 +175,7 @@ export const settings = {
                return (
                        <figure>
                                <audio controls="controls" src={ src } />
-                               { caption && caption.length > 0 && <figcaption>{ caption }</figcaption> }
+                               { caption && caption.length > 0 && <figcaption><RawHTML>{ caption }</RawHTML></figcaption> }
                        </figure>
                );
        },

aduth

Through my various failures in trying to drop the children source (#5380), I ultimately landed at the same conclusion that this type of format option would be the best path for those who need to work with content as an HTML string, so I'm generally on board with it. I think it does raise the maintenance burden in needing to support multiple formats, particularly when none of the core blocks have shown to have need for it.

One thing which I considered in #5380 is the idea of a RichText.Content component for use in save, so that a developer doesn't need to do the RawHTML rendering themselves. Do you have any thoughts on this? I guess if they need to explicitly assign format, there's already some awareness of the data format. And it would be unfortunate if this needed to be repeated for the RichText.Content element (unless we otherwise "infer" format from the shape of the value).

aduth · 2018-04-10T17:51:15Z

blocks/rich-text/format.js

+ *
+ * @param {WPElement} value Element.
+ *
+ * @return {string}         HTML.


Minor: I don't think we try to align return and param descriptions now that they're separated with newline.

aduth · 2018-04-10T17:52:47Z

blocks/rich-text/format.js

+ *
+ * @return {WPElement}       Element.
+ */
+export function stringToElement( value ) {


This function gives me bad vibes, for two reasons:

The implementation is susceptible to XSS

As used, probably not in a way that would be of any practical concern

We're performing two levels of transformation, first from string to DOM, then from DOM to element

Since its use is quite limited (for default value), maybe not of huge concern at the moment.

Though it makes me wonder, why don't we just apply the default value as dangerouslySetInnerHTML ?

Good idea it allows us to factorize more logic using valueToString(value, format)

aduth · 2018-04-10T17:57:30Z

blocks/rich-text/index.js


-				this.restoreContentAndSplit( beforeElement, afterElement );
+				const { format } = this.props;
+				const before = format === 'string' ? domToString( beforeNodes ) : domToElement( beforeNodes );


There's a lot of ternaries in this file now and it seems like it'd be easy to make mistakes in future maintenance.

I wonder if a simple improvement might just be to encapsulate logic to a single function:

domToFormat( nodes, format )

youknowriad · 2018-04-11T08:41:31Z

One thing which I considered in #5380 is the idea of a RichText.Content component for use in save, so that a developer doesn't need to do the RawHTML rendering themselves. Do you have any thoughts on this? I guess if they need to explicitly assign format

I actually like the RichText.Content and I think we should consider using it even if the value is an element (for consistency). But I personally prefer explicitly assigning the format if it's different that the default format to avoid ambiguities. For example an element could be an object and a potential tree format is an object as well.

Let's introduce RichText.Content in a separate PR

youknowriad · 2018-04-13T08:18:19Z

Can we get this in?

ellatrix · 2018-04-13T10:16:57Z

blocks/rich-text/index.js

 	onChange() {
 		this.isEmpty = this.editor.dom.isEmpty( this.editor.getBody() );
-		this.savedContent = this.isEmpty ? [] : this.getContent();
+		this.savedContent = this.getContent();


Any way to prevent the empty check from running twice here?

Yes, maybe it could be inferred from the content.

ellatrix · 2018-04-13T10:36:15Z

blocks/rich-text/format.js

+ * @return {string} HTML.
+ */
+export function domToString( value ) {
+	return map( value, element => element.outerHTML ).join( '' );


In domToElement, we're always stripping any TinyMCE internal element and attributes, but here we're not, even though there might be cases where we receive some in the case of splitting.

Not sure how to address this in an optimal way. Don't we have tools in TinyMCE to do this for us instead of adding custom logic? Is it a requirement since we're not using this when getting the whole content of the editor but only when splitting?

( new tinymce.dom.Serializer( {} ) ).serialize( el ) ? (Likely with some reuse of a serializer instance)

Thanks Andrew but unfortunately it's not working in my testing, we more likely have to specify the same rules used by TinyMCE internally

Is it a requirement since we're not using this when getting the whole content of the editor but only when splitting?

Well, the content will end up getting passed to the store, so I guess so. On consecutive edits we may pass getContent again, but this doesn't necessarily happen. Say you split a paragraph in two and the first part contains some elements with internal attributes. This will be passed on eventually to (block) setAttributes.

Not sure immediately how to fix either but I guess @aduth goes in the right direction. Maybe parse and serialise if you want to work on the string? I'll have a look if we can work on the fragment.

Did a little test here: 39d9966

Splitting is currently entirely broken with the HTML format. The diff fixes it and should also get rid of internal stuff.

After the splitting fix, but without serialising, you'd get this:

The goal of this new editor is to make adding rich content to WordPress simple and enjoyable. This whole post is composed of pieces of

when splitting the first paragraph of the demo content.

(The splitting is broken because element.outerHTML doesn't work on text nodes that are passed.)

ellatrix · 2018-04-13T10:39:58Z

blocks/rich-text/test/__snapshots__/format.js.snap

+<p
+  a-prop="hi"
+>
+  <p>


Hm, isn't this invalid HTML? Won't React warn on nesting paragraphs? Or do we not care here for these tests?

Our new non-React serializer is much less noisy about issues it encounters. Not to say we shouldn't be using valid HTML here though.

aduth

This feels like a good refactoring of RichText, even without the new feature. Nice one 👍

aduth · 2018-04-13T13:23:01Z

blocks/rich-text/index.js

+			case 'string':
+				return this.editor.getContent();
+			default:
+				return this.editor.dom.isEmpty( this.editor.getBody() ) ?


Aside: Is the isEmpty check here meant to be an optimization for the easy case of generating an empty string? Is isEmpty itself actually performant? Wondering if we're actually shooting ourselves in the foot with this for the more common case that RichText is non-empty.

It's not an optimization, it's to ensure the paragraph block is considered empty consistently. If we remove it we can end up with content like [''] which will break the default appender.

I get the sense this has been discussed before, but if we have control over the elements we're producing, why are we keeping the empty string ?

I can't tell honestly :) maybe @iseulde can tell more here.

but if we have control over the elements we're producing

Do we have control over that? This one is getting the nodes from a contenteditable field which can contain all sorts combination. I've seen ,'',  ,  '',  etc., all of which are empty.

aduth · 2018-04-13T13:24:26Z

blocks/rich-text/index.js

+			default:
+				return this.editor.dom.isEmpty( this.editor.getBody() ) ?
+					[] :
+					domToFormat( this.editor.getBody().childNodes || [], 'element' );


Aside: In the future, I might imagine we avoid working with the DOM as a source and instead defer to TinyMCE's getContent( { format: 'tree' } ) to convert from their tree form to the desired target format. This would avoid the need for us to maintain our own cleaning logic. At which point, domToFormat doesn't make as much sense, might just be toFormat( value, targetFormat, sourceFormat ) (maybe infer source format?).

I tend to always prefer explicitness over implicitness. I don't really care about the name of the function but I tend to be suspicious (maybe I shoudn't) every time I hear "infer from value"

Eh, I can see it, but it's kinda like if one had to pass a second argument to e.g. Array.from:

Array.from( new Set( [ 1, 2, 3 ] ), 'Set' );

... one could say: "Well, duh, you passed me a set, I know it's a set"

But sometimes it's not obvious, the easy example is that the tree structure and the element structure can't be distinguished easily because both can be plain objects.

aduth · 2018-04-13T13:25:31Z

blocks/rich-text/test/__snapshots__/format.js.snap

+<p
+  a-prop="hi"
+>
+  <p>


Our new non-React serializer is much less noisy about issues it encounters. Not to say we shouldn't be using valid HTML here though.

ellatrix

Looks good to me, but would love to see issues in #6034 (comment) fixed before merge.

… RichText components

ellatrix · 2018-04-16T13:50:58Z

@youknowriad Fixed the splitting. There are still errors on merging though, where the merge function seems to assume React values.

youknowriad · 2018-04-18T08:05:30Z

@iseulde I tried splitting and merging and it works well for me with the last changes. The merge function is assume React values and that's fine because it's defined on the block level and the block author can update its behavior depending on the format used. I tried updating the function for paragraph block and it works great for me.

ellatrix · 2018-04-18T14:35:54Z

Oh, I didn't update that. Sorry. Looks good to go then.

aduth · 2018-04-18T15:23:40Z

Noting that we should probably wait to merge this until after the pending release.

zgordon · 2018-04-20T21:11:44Z

Just want to throw in here that this would be super helpful and something have gotten requested from VIP. Especially for migrating data from ACF or CMB2 fields and porting them over to blocks, or at least saving data in a more consistent way with what folks are used to. Would love to see this rolled it, have been struggling with workarounds for this exact thing. Thanks @youknowriad!

… RichText components (WordPress#6034) * RichText: Add a format prop to allow HTML string values to be used in RichText components * Don't align params and returns docs * Add domToFormat function to factorize ternaries * Remove useless stringToElement * Add valueToString to avoid ternaries when converting to strings * Avoid double isEmpty check * Use valid HTML in unit tests * Fix splitting

aduth · 2018-08-13T20:34:34Z

Noting that our patterns transforms make assumptions that the format would be of the children shape. Would this be unexpected for someone who otherwise uses the HTML format for a RichText element?

gutenberg/packages/editor/src/components/rich-text/patterns.js

Lines 193 to 196 in 751a2fc

    
           const block = transformation.transform( { 
        
           	content: [ remainingText, ...drop( content ) ], 
        
           	match: result, 
        
           } );

youknowriad added the [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable label Apr 6, 2018

youknowriad self-assigned this Apr 6, 2018

youknowriad requested review from aduth, ellatrix and mcsf April 6, 2018 11:20

youknowriad force-pushed the try/richtext-format-prop branch from 62653d6 to 3877b17 Compare April 6, 2018 11:22

ellatrix reviewed Apr 6, 2018

View reviewed changes

youknowriad mentioned this pull request Apr 6, 2018

Post Meta is not saved for custom post types #5622

Closed

aduth reviewed Apr 10, 2018

View reviewed changes

ellatrix reviewed Apr 13, 2018

View reviewed changes

aduth approved these changes Apr 13, 2018

View reviewed changes

ellatrix requested changes Apr 13, 2018

View reviewed changes

youknowriad and others added 8 commits April 16, 2018 14:27

RichText: Add a format prop to allow HTML string values to be used in…

cf196aa

… RichText components

Don't align params and returns docs

d85b718

Add domToFormat function to factorize ternaries

a3f4726

Remove useless stringToElement

0bec108

Add valueToString to avoid ternaries when converting to strings

dd9e099

Avoid double isEmpty check

2963b65

Use valid HTML in unit tests

665c59e

Fix splitting

db26521

ellatrix force-pushed the try/richtext-format-prop branch from 17e6011 to db26521 Compare April 16, 2018 13:48

ellatrix approved these changes Apr 18, 2018

View reviewed changes

youknowriad merged commit 1dfb30f into master Apr 19, 2018

youknowriad deleted the try/richtext-format-prop branch April 19, 2018 08:53

youknowriad mentioned this pull request Apr 19, 2018

RichText: Add the RichText.Content component to be used in conjunction with RichText #6274

Merged

mtias added this to the 2.8 milestone May 3, 2018

RichText: Add a format prop to allow HTML string values to be used in RichText components #6034

RichText: Add a format prop to allow HTML string values to be used in RichText components #6034

Uh oh!

Conversation

youknowriad commented Apr 6, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ellatrix commented Apr 6, 2018

Uh oh!

youknowriad commented Apr 6, 2018

Uh oh!

ellatrix commented Apr 6, 2018

Uh oh!

youknowriad commented Apr 6, 2018

Uh oh!

aduth left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

youknowriad commented Apr 11, 2018

Uh oh!

youknowriad commented Apr 13, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aduth Apr 13, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ellatrix Apr 13, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aduth left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

youknowriad commented Apr 6, 2018 •

edited

Loading

aduth Apr 13, 2018 •

edited

Loading

ellatrix Apr 13, 2018 •

edited

Loading

ellatrix left a comment •

edited

Loading