Skip to content

Conversation

@youknowriad
Copy link
Contributor

@youknowriad youknowriad commented Apr 6, 2018

We discussed several times whether we want to treat RichText values as strings, elements or raw trees. In this PR I'm introducing a format prop for the RichText component for two reasons:

  • If someone wants to store HTML in a meta attribute, the RichText component should produce a string output and not an array of element. This has been raised a while ago but I'm not able to find the corresponding issue.

  • We know we want to move away from the element's tree as default representation of RichText value (because it's not serializable properly on the server), we're not certain yet about the alternative. But anyway in order to do so (change the format) we need a migration plan, this format will allow us to introduce warnings and at some point switch the default format (which is element at the moment) and at a later point maybe remove the element format entirely.

Testing instructions

  • Right now, the newly introduce "string" format is not used anywhere, so this should not change anything for the existing blocks

  • You can setup a meta block with a "string" format for the RichText component.

@youknowriad youknowriad added the [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable label Apr 6, 2018
@youknowriad youknowriad self-assigned this Apr 6, 2018
@youknowriad youknowriad requested review from aduth, ellatrix and mcsf April 6, 2018 11:20
@youknowriad youknowriad force-pushed the try/richtext-format-prop branch from 62653d6 to 3877b17 Compare April 6, 2018 11:22

switch ( format ) {
case 'string':
return this.editor.getContent();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be executed many times? This will be much slower than this.editor.getContent( { format: 'raw' } ) or one loop over this.editor.getBody().childNodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be executed on each change (type), It's probably slower but since it's limited to a single RichText, maybe it's not that important.

raw is not great because we don't want to save the TinyMCE temporary nodes and attributes. We could probably write an alternative using childNodes and stripping these nodes, but not sure if this is different of what's done internally in TinyMCE

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is fine as long as it's used on inline RichText instances. Once we get lists to move over to nested blocks we can even get rid of multiline instances altogether.

@ellatrix
Copy link
Member

ellatrix commented Apr 6, 2018

Just wondering, but what's the point of adding it if it is used anywhere?

@youknowriad
Copy link
Contributor Author

@iseulde As a said in the description, it's an extensibility API that have been asked for to save HTML to meta.

And it could be used on a follow-up PR as alternative to the children source for some blocks.

@ellatrix
Copy link
Member

ellatrix commented Apr 6, 2018

Would it be okay to add a use case here to demonstrate and to ensure it works as expected? Cool if it's too complicated, but that might be a sign more work is needed elsewhere if it's not just a one line change.

@youknowriad
Copy link
Contributor Author

Sure, here's a small patch to apply to test it on the Audio block (caption)

diff --git a/blocks/library/audio/index.js b/blocks/library/audio/index.js
index 803bc907f..0565d0b3e 100644
--- a/blocks/library/audio/index.js
+++ b/blocks/library/audio/index.js
@@ -13,7 +13,7 @@ import {
        Placeholder,
        Toolbar,
 } from '@wordpress/components';
-import { Component } from '@wordpress/element';
+import { Component, RawHTML } from '@wordpress/element';
 import { mediaUpload } from '@wordpress/utils';
 
 /**
@@ -44,8 +44,8 @@ export const settings = {
                        attribute: 'src',
                },
                caption: {
-                       type: 'array',
-                       source: 'children',
+                       type: 'string',
+                       source: 'html',
                        selector: 'figcaption',
                },
                id: {
@@ -161,6 +161,7 @@ export const settings = {
                                                        onChange={ ( value ) => setAttributes( { caption: value } ) }
                                                        isSelected={ isSelected }
                                                        inlineToolbar
+                                                       format="string"
                                                />
                                        ) }
                                </figure>,
@@ -174,7 +175,7 @@ export const settings = {
                return (
                        <figure>
                                <audio controls="controls" src={ src } />
-                               { caption && caption.length > 0 && <figcaption>{ caption }</figcaption> }
+                               { caption && caption.length > 0 && <figcaption><RawHTML>{ caption }</RawHTML></figcaption> }
                        </figure>
                );
        },

Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Through my various failures in trying to drop the children source (#5380), I ultimately landed at the same conclusion that this type of format option would be the best path for those who need to work with content as an HTML string, so I'm generally on board with it. I think it does raise the maintenance burden in needing to support multiple formats, particularly when none of the core blocks have shown to have need for it.

One thing which I considered in #5380 is the idea of a RichText.Content component for use in save, so that a developer doesn't need to do the RawHTML rendering themselves. Do you have any thoughts on this? I guess if they need to explicitly assign format, there's already some awareness of the data format. And it would be unfortunate if this needed to be repeated for the RichText.Content element (unless we otherwise "infer" format from the shape of the value).

*
* @param {WPElement} value Element.
*
* @return {string} HTML.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I don't think we try to align return and param descriptions now that they're separated with newline.

*
* @return {WPElement} Element.
*/
export function stringToElement( value ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function gives me bad vibes, for two reasons:

  • The implementation is susceptible to XSS
    • As used, probably not in a way that would be of any practical concern
  • We're performing two levels of transformation, first from string to DOM, then from DOM to element

Since its use is quite limited (for default value), maybe not of huge concern at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it makes me wonder, why don't we just apply the default value as dangerouslySetInnerHTML ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea it allows us to factorize more logic using valueToString(value, format)


this.restoreContentAndSplit( beforeElement, afterElement );
const { format } = this.props;
const before = format === 'string' ? domToString( beforeNodes ) : domToElement( beforeNodes );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of ternaries in this file now and it seems like it'd be easy to make mistakes in future maintenance.

I wonder if a simple improvement might just be to encapsulate logic to a single function:

domToFormat( nodes, format )

@youknowriad
Copy link
Contributor Author

One thing which I considered in #5380 is the idea of a RichText.Content component for use in save, so that a developer doesn't need to do the RawHTML rendering themselves. Do you have any thoughts on this? I guess if they need to explicitly assign format

I actually like the RichText.Content and I think we should consider using it even if the value is an element (for consistency). But I personally prefer explicitly assigning the format if it's different that the default format to avoid ambiguities. For example an element could be an object and a potential tree format is an object as well.

Let's introduce RichText.Content in a separate PR

@youknowriad
Copy link
Contributor Author

Can we get this in?

onChange() {
this.isEmpty = this.editor.dom.isEmpty( this.editor.getBody() );
this.savedContent = this.isEmpty ? [] : this.getContent();
this.savedContent = this.getContent();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to prevent the empty check from running twice here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, maybe it could be inferred from the content.

* @return {string} HTML.
*/
export function domToString( value ) {
return map( value, element => element.outerHTML ).join( '' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In domToElement, we're always stripping any TinyMCE internal element and attributes, but here we're not, even though there might be cases where we receive some in the case of splitting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to address this in an optimal way. Don't we have tools in TinyMCE to do this for us instead of adding custom logic? Is it a requirement since we're not using this when getting the whole content of the editor but only when splitting?

Copy link
Member

@aduth aduth Apr 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( new tinymce.dom.Serializer( {} ) ).serialize( el ) ? (Likely with some reuse of a serializer instance)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andrew but unfortunately it's not working in my testing, we more likely have to specify the same rules used by TinyMCE internally

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a requirement since we're not using this when getting the whole content of the editor but only when splitting?

Well, the content will end up getting passed to the store, so I guess so. On consecutive edits we may pass getContent again, but this doesn't necessarily happen. Say you split a paragraph in two and the first part contains some elements with internal attributes. This will be passed on eventually to (block) setAttributes.

Not sure immediately how to fix either but I guess @aduth goes in the right direction. Maybe parse and serialise if you want to work on the string? I'll have a look if we can work on the fragment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a little test here: 39d9966

Splitting is currently entirely broken with the HTML format. The diff fixes it and should also get rid of internal stuff.

Copy link
Member

@ellatrix ellatrix Apr 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the splitting fix, but without serialising, you'd get this:

The goal of this new editor is to make adding rich content to WordPress simple and enjoyable. This whole post is composed of <em data-mce-selected="inline-boundary">pieces of</em>

when splitting the first paragraph of the demo content.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The splitting is broken because element.outerHTML doesn't work on text nodes that are passed.)

<p
a-prop="hi"
>
<p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, isn't this invalid HTML? Won't React warn on nesting paragraphs? Or do we not care here for these tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our new non-React serializer is much less noisy about issues it encounters. Not to say we shouldn't be using valid HTML here though.

Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a good refactoring of RichText, even without the new feature. Nice one 👍

case 'string':
return this.editor.getContent();
default:
return this.editor.dom.isEmpty( this.editor.getBody() ) ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: Is the isEmpty check here meant to be an optimization for the easy case of generating an empty string? Is isEmpty itself actually performant? Wondering if we're actually shooting ourselves in the foot with this for the more common case that RichText is non-empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an optimization, it's to ensure the paragraph block is considered empty consistently. If we remove it we can end up with content like [''] which will break the default appender.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the sense this has been discussed before, but if we have control over the elements we're producing, why are we keeping the empty string ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell honestly :) maybe @iseulde can tell more here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if we have control over the elements we're producing

Do we have control over that? This one is getting the nodes from a contenteditable field which can contain all sorts combination. I've seen <p></p>,<p>''</p>, <p><br></p>, <p><br>''</p>, <p><strong></strong></p> etc., all of which are empty.

default:
return this.editor.dom.isEmpty( this.editor.getBody() ) ?
[] :
domToFormat( this.editor.getBody().childNodes || [], 'element' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: In the future, I might imagine we avoid working with the DOM as a source and instead defer to TinyMCE's getContent( { format: 'tree' } ) to convert from their tree form to the desired target format. This would avoid the need for us to maintain our own cleaning logic. At which point, domToFormat doesn't make as much sense, might just be toFormat( value, targetFormat, sourceFormat ) (maybe infer source format?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to always prefer explicitness over implicitness. I don't really care about the name of the function but I tend to be suspicious (maybe I shoudn't) every time I hear "infer from value"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, I can see it, but it's kinda like if one had to pass a second argument to e.g. Array.from:

Array.from( new Set( [ 1, 2, 3 ] ), 'Set' );

... one could say: "Well, duh, you passed me a set, I know it's a set"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But sometimes it's not obvious, the easy example is that the tree structure and the element structure can't be distinguished easily because both can be plain objects.

<p
a-prop="hi"
>
<p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our new non-React serializer is much less noisy about issues it encounters. Not to say we shouldn't be using valid HTML here though.

Copy link
Member

@ellatrix ellatrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but would love to see issues in #6034 (comment) fixed before merge.

@ellatrix ellatrix force-pushed the try/richtext-format-prop branch from 17e6011 to db26521 Compare April 16, 2018 13:48
@ellatrix
Copy link
Member

@youknowriad Fixed the splitting. There are still errors on merging though, where the merge function seems to assume React values.

@youknowriad
Copy link
Contributor Author

@iseulde I tried splitting and merging and it works well for me with the last changes. The merge function is assume React values and that's fine because it's defined on the block level and the block author can update its behavior depending on the format used. I tried updating the function for paragraph block and it works great for me.

@ellatrix
Copy link
Member

Oh, I didn't update that. Sorry. Looks good to go then.

@aduth
Copy link
Member

aduth commented Apr 18, 2018

Noting that we should probably wait to merge this until after the pending release.

@zgordon
Copy link

zgordon commented Apr 20, 2018

Just want to throw in here that this would be super helpful and something have gotten requested from VIP. Especially for migrating data from ACF or CMB2 fields and porting them over to blocks, or at least saving data in a more consistent way with what folks are used to. Would love to see this rolled it, have been struggling with workarounds for this exact thing. Thanks @youknowriad!

nuzzio pushed a commit to nuzzio/gutenberg that referenced this pull request Apr 25, 2018
… RichText components (WordPress#6034)

* RichText: Add a format prop to allow HTML string values to be used in RichText components

* Don't align params and returns docs

* Add domToFormat function to factorize ternaries

* Remove useless stringToElement

* Add valueToString to avoid ternaries when converting to strings

* Avoid double isEmpty check

* Use valid HTML in unit tests

* Fix splitting
@mtias mtias added this to the 2.8 milestone May 3, 2018
@aduth
Copy link
Member

aduth commented Aug 13, 2018

Noting that our patterns transforms make assumptions that the format would be of the children shape. Would this be unexpected for someone who otherwise uses the HTML format for a RichText element?

const block = transformation.transform( {
content: [ remainingText, ...drop( content ) ],
match: result,
} );

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

[Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants