Markdown: Add source:markdown element to RSS feeds #46968
- Use get_post() directly instead of get_the_ID() + get_post($id)
- Use WPCom_Markdown::IS_MD_META constant instead of hardcoded string
- Add assertNotEmpty on CDATA extraction in test to prevent false passes
- Replace printf with echo concatenation to prevent garbled output when Markdown content contains %s, %d, or other format specifiers
- Add test for printf format specifier preservation
- Strengthen no-meta test by providing non-empty post_content_filtered so it truly validates the meta check independently
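The output-building concern in this commit can be sketched in plain PHP. This is a minimal sketch, not the PR's rss.php: example_cdata() and example_source_markdown_element() are hypothetical names, and splitting "]]>" as "]]]]><![CDATA[>" is the standard CDATA technique, assumed rather than confirmed to match the PR's escaping.

```php
<?php
// Hypothetical helper, not the PR's code: wraps text in a CDATA section and
// splits any literal "]]>" so it cannot terminate the section early.
function example_cdata( string $content ): string {
	// "]]>" becomes "]]]]><![CDATA[>": the first section ends after "]]"
	// and a new section starts with ">", so an XML parser reassembles "]]>".
	$escaped = str_replace( ']]>', ']]]]><![CDATA[>', $content );
	return '<![CDATA[' . $escaped . ']]>';
}

// Hypothetical helper: builds the element with plain concatenation. Passing the
// Markdown through printf() instead would treat "%s" or "%d" in the post as
// format specifiers and garble (or fail on) the output.
function example_source_markdown_element( string $markdown ): string {
	return '<source:markdown>' . example_cdata( $markdown ) . "</source:markdown>\n";
}

$markdown = "Use %s and %d literally.\nA stray ]]> is kept.";
echo example_source_markdown_element( $markdown );
```

Concatenation keeps the post text opaque: nothing inside $markdown is ever interpreted, which is exactly what the added format-specifier test is meant to guard.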
Move the source namespace xmlns declaration from an inline anonymous function in markdown.php to a named jetpack_markdown_rss_namespace() function in rss.php alongside the existing RSS output function.
RSS readers expect the element to be consistently present. Instead of only emitting source:markdown for Markdown posts, always include it: use raw Markdown from post_content_filtered when available, otherwise fall back to the rendered post_content.
Apply the_content filters when falling back to post_content so Gutenberg block markup is rendered into clean HTML instead of serving raw block comments to RSS readers.
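The selection order described in these two commits can be sketched without WordPress. This is a hypothetical, framework-free sketch: example_select_feed_source() is not a function in the PR, $is_markdown stands in for the '_wpcom_is_markdown' meta check, and $render stands in for apply_filters( 'the_content', ... ). Returning null for a post with no content is an assumption.

```php
<?php
// Hypothetical sketch of the legacy-path selection, not the PR's code.
// Returns the text to place inside <source:markdown>, or null to emit nothing.
function example_select_feed_source(
	bool $is_markdown,        // Stands in for the '_wpcom_is_markdown' post meta.
	string $content_filtered, // post_content_filtered (raw Markdown for Markdown posts).
	string $content,          // post_content (may contain Gutenberg block markup).
	callable $render          // Stands in for the_content filtering.
): ?string {
	if ( $is_markdown && ! empty( $content_filtered ) ) {
		return $content_filtered; // Raw Markdown, untouched.
	}
	if ( ! empty( $content ) ) {
		return $render( $content ); // Rendered HTML fallback.
	}
	return null; // Assumption: nothing to emit for an empty post.
}

// Toy renderer for illustration only: drops block comments and wraps in <p>.
$render = function ( string $raw ): string {
	$html = preg_replace( '/<!--.*?-->/s', '', $raw );
	return '<p>' . trim( $html ) . '</p>';
};

var_dump( example_select_feed_source( true, "# Hello\n\nThis is **bold**.", '<h1>Hello</h1>', $render ) );
var_dump( example_select_feed_source( false, 'stale', '<!-- wp:paragraph -->Hi<!-- /wp:paragraph -->', $render ) );
```

The second call mirrors the strengthened no-meta test: non-empty post_content_filtered alone does not select the raw branch.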
Pull request overview
This pull request adds a <source:markdown> element to RSS feeds when the Jetpack Markdown module is active, following the convention described at source.scripting.com. For posts written with the legacy Markdown module, the raw Markdown source is extracted from post_content_filtered. For all other posts, the rendered HTML content (with the_content filters applied and Gutenberg block comments stripped) is used as a fallback, ensuring the element is always present for RSS readers.
Changes:
- New RSS library file providing namespace declaration and source:markdown output functions
- Module initialization updated to register RSS hooks when Markdown is active
- Comprehensive PHPUnit test suite covering edge cases and escaping scenarios
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| _inc/lib/markdown/rss.php | New library file implementing the RSS namespace declaration and source:markdown output logic with proper CDATA escaping |
| modules/markdown.php | Module initialization updated to require the RSS library and register hooks for RSS feeds (rss2_ns, rss_item, rss1_item, rss2_item) |
| tests/php/modules/markdown/Markdown_RSS_Test.php | Comprehensive test suite covering Markdown posts, non-Markdown fallback, empty content, CDATA escaping, printf specifiers, and namespace declaration |
| changelog/add-markdown-rss-source-element | Changelog entry following project conventions, describing the enhancement from a user perspective |
Code Coverage Summary
Coverage changed in 4 files. 1 file is newly checked for coverage.
Coverage check overridden by the label "I don't care about code coverage for this PR".
Replace WPCom_Markdown::IS_MD_META with the string literal '_wpcom_is_markdown' so rss.php works when loaded from the Markdown block entry point where the legacy class is absent. Remove the now-unnecessary easy-markdown.php require from tests.
Complements the existing bail-guard test by verifying that the block function produces correct output for a post that has both _wpcom_is_markdown meta and jetpack/markdown blocks.
The duplicate call to jetpack_markdown_rss_namespace() is intentional — it verifies the static deduplication guard.
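The static deduplication guard exercised by that duplicate call can be sketched as follows. example_rss_namespace() is a hypothetical stand-in for jetpack_markdown_rss_namespace(); the attribute string comes from the PR description, while the exact whitespace of the real output is an assumption. The guard matters because XML forbids repeating an attribute on one element, so a second xmlns:source on <rss> would make the feed ill-formed.

```php
<?php
// Hypothetical stand-in for jetpack_markdown_rss_namespace(), which the PR
// hooks to rss2_ns so that it echoes inside the opening <rss> tag.
function example_rss_namespace(): void {
	static $printed = false; // Persists across calls within one request.
	if ( $printed ) {
		return; // Already declared: never repeat the attribute.
	}
	$printed = true;
	echo ' xmlns:source="https://source.scripting.com/"';
}

ob_start();
example_rss_namespace();
example_rss_namespace(); // Mirrors the test's intentional duplicate call.
$attributes = ob_get_clean();
echo '<rss version="2.0"' . $attributes . ">\n";
```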
@anomiex How would I go about solving those Phan issues? Should I create a stub for the class, since it's only present in WordPress 6.9? I thought the class_exists and the Phan comments would be enough, but it doesn't seem to solve the issues. Thank you!
Unfortunately Phan doesn't realize that In this case, because the suppression comments are only needed for the run with WP 6.8 stubs, you have to get even more complicated and suppress the unused suppression error for the WP 6.9 run too, with something like
You could, but that might mask the problem if code anywhere else starts trying to use the class without checking it exists first.
WP_Block_Processor is only available in WP 6.9+, so Phan flags it as undeclared when running against WP 6.8 stubs. Add dual suppression comments that also suppress the resulting UnusedSuppression warning on the WP 6.9 run.
```php
// Render all non-Markdown content through the standard pipeline.
$rendered = apply_filters( 'the_content', $modified_content );
```
Since this output is specifically for RSS2 feeds, consider applying the feed-specific filter after rendering (i.e., run the_content_feed for 'rss2' similar to how core prepares feed content). That helps keep output consistent with what WordPress generates in feeds and avoids surprises from feed-only transforms.
```php
	$content = $post->post_content_filtered;
} elseif ( ! empty( $post->post_content ) ) {
	// Apply the_content filters to render Gutenberg blocks and shortcodes into clean HTML.
	$content = apply_filters( 'the_content', $post->post_content );
```
For the rendered-HTML fallback in feeds, it may be better to mirror core feed preparation by applying the_content_feed for 'rss2' after the_content (see e.g. modules/sitemaps/sitemap-builder.php where content is prepared like get_the_content_feed()). This keeps the custom element aligned with feed output expectations.
Suggested change:

```diff
 $content = apply_filters( 'the_content', $post->post_content );
+// Mirror core feed preparation by applying the_content_feed for RSS2 feeds.
+$content = apply_filters( 'the_content_feed', $content, 'rss2' );
```
```php
$this->assertLessThan( $pos_para, $pos_md1, 'First markdown should appear before the paragraph.' );
$this->assertLessThan( $pos_md2, $pos_para, 'Paragraph should appear before the second markdown.' );
```
In this mixed-blocks test, the order assertions are inverted: assertLessThan( $pos_para, $pos_md1 ) currently asserts the paragraph appears before the first markdown, but the message says the opposite. Same for the paragraph vs second markdown assertion. This will either fail or validate the wrong ordering—swap the arguments so the assertions match the intended order (md1 < paragraph < md2).
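For reference, PHPUnit's assertLessThan( $expected, $actual ) passes when $actual is less than $expected, so assertLessThan( $pos_para, $pos_md1 ) checks $pos_md1 < $pos_para. The same ordering check can be mirrored in plain PHP; example_assert_less_than() and the sample $output string are hypothetical.

```php
<?php
// Plain-PHP mirror of PHPUnit's assertLessThan( $expected, $actual, $message ):
// it passes when $actual < $expected and throws otherwise.
function example_assert_less_than( int $expected, int $actual, string $message ): void {
	if ( ! ( $actual < $expected ) ) {
		throw new RuntimeException( $message );
	}
}

// Hypothetical feed output: Markdown, then a rendered paragraph, then Markdown.
$output   = "# One\n<p>Middle</p>\n# Two";
$pos_md1  = strpos( $output, '# One' );
$pos_para = strpos( $output, '<p>Middle</p>' );
$pos_md2  = strpos( $output, '# Two' );

// Same argument order as the test: each call checks that the second value is smaller.
example_assert_less_than( $pos_para, $pos_md1, 'First markdown should appear before the paragraph.' );
example_assert_less_than( $pos_md2, $pos_para, 'Paragraph should appear before the second markdown.' );
echo "Order is md1 < paragraph < md2\n";
```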
```php
if ( ! jetpack_markdown_rss_post_has_markdown_block( $post->post_content ) ) {
	return;
}

// First pass: find Markdown blocks, extract sources, record byte offsets.
// @phan-suppress-next-line PhanUndeclaredClassMethod @phan-suppress-current-line UnusedSuppression -- We checked that the class exists above. @todo Remove when we drop WP <6.9.
$processor = new WP_Block_Processor( $post->post_content );
$sources = array();
$regions = array(); // Each entry: array( 'start' => int, 'end' => int ).
$index = 0;

// @phan-suppress-next-line PhanUndeclaredClassMethod @phan-suppress-current-line UnusedSuppression -- We checked that the class exists above. @todo Remove when we drop WP <6.9.
while ( $processor->next_block( 'jetpack/markdown' ) ) {
```
jetpack_markdown_block_rss_output_source_markdown() scans for Markdown blocks twice: first via jetpack_markdown_rss_post_has_markdown_block() (which instantiates/scans a WP_Block_Processor), then again by creating a new processor and walking all markdown blocks. You can avoid the extra pass by dropping the initial helper check and just running the extraction loop; if no sources are found, return early.
```php
// Build modified content with placeholders replacing Markdown blocks.
$modified_content = '';
$cursor = 0;

foreach ( $regions as $i => $region ) {
	// Append content before this block.
	$modified_content .= substr( $post->post_content, $cursor, $region['start'] - $cursor );
	// Insert placeholder.
	$modified_content .= '%%JETPACK_MARKDOWN_' . $i . '%%';
	$cursor = $region['end'];
}

// Append any remaining content after the last block.
$modified_content .= substr( $post->post_content, $cursor );

// Render all non-Markdown content through the standard pipeline.
$rendered = apply_filters( 'the_content', $modified_content );

// Substitute placeholders with raw Markdown sources.
foreach ( $sources as $i => $source ) {
	$rendered = str_replace( '%%JETPACK_MARKDOWN_' . $i . '%%', $source, $rendered );
```
The placeholder approach (%%JETPACK_MARKDOWN_%d%% + apply_filters( 'the_content', ... ) + str_replace) can result in the raw Markdown source being wrapped/altered by the_content filters (notably wpautop), since the placeholder is plain text in the content stream. After replacement, the Markdown can end up inside generated HTML (e.g., <p>...</p>), which means it’s no longer truly “raw Markdown” in the output. Consider building the output by concatenating rendered non-markdown segments/blocks with raw markdown segments (e.g., render non-markdown blocks via block rendering, and insert markdown sources directly) instead of relying on placeholders that go through the_content.
Suggested change:

```diff
-// Build modified content with placeholders replacing Markdown blocks.
-$modified_content = '';
-$cursor = 0;
-foreach ( $regions as $i => $region ) {
-	// Append content before this block.
-	$modified_content .= substr( $post->post_content, $cursor, $region['start'] - $cursor );
-	// Insert placeholder.
-	$modified_content .= '%%JETPACK_MARKDOWN_' . $i . '%%';
-	$cursor = $region['end'];
-}
-// Append any remaining content after the last block.
-$modified_content .= substr( $post->post_content, $cursor );
-// Render all non-Markdown content through the standard pipeline.
-$rendered = apply_filters( 'the_content', $modified_content );
-// Substitute placeholders with raw Markdown sources.
-foreach ( $sources as $i => $source ) {
-	$rendered = str_replace( '%%JETPACK_MARKDOWN_' . $i . '%%', $source, $rendered );
+// Build content by concatenating rendered non-Markdown segments with raw Markdown sources.
+$rendered = '';
+$cursor = 0;
+foreach ( $regions as $i => $region ) {
+	// Render and append content before this Markdown block.
+	$before = substr( $post->post_content, $cursor, $region['start'] - $cursor );
+	if ( '' !== $before ) {
+		$rendered .= apply_filters( 'the_content', $before );
+	}
+	// Append the raw Markdown source for this block without running it through the_content filters.
+	if ( isset( $sources[ $i ] ) ) {
+		$rendered .= $sources[ $i ];
+	}
+	$cursor = $region['end'];
+}
+// Render and append any remaining non-Markdown content after the last block.
+$after = substr( $post->post_content, $cursor );
+if ( '' !== $after ) {
+	$rendered .= apply_filters( 'the_content', $after );
```
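The concatenation approach suggested for the block path can be exercised outside WordPress. In this hypothetical sketch, example_build_hybrid() is not a PR function, $regions holds the byte ranges recorded by the first pass (end exclusive, as the cursor arithmetic implies), $sources holds the raw Markdown per block, and $render stands in for apply_filters( 'the_content', ... ). Returning null when $sources is empty mirrors the early return described for the no-blocks case.

```php
<?php
// Hypothetical sketch of the suggested approach, not the PR's code.
// $regions: list of array( 'start' => int, 'end' => int ) byte offsets, end exclusive.
// $sources: raw Markdown, keyed like $regions.
// $render:  stands in for apply_filters( 'the_content', ... ).
function example_build_hybrid( string $content, array $regions, array $sources, callable $render ): ?string {
	if ( empty( $sources ) ) {
		return null; // No Markdown blocks found: skip the element entirely.
	}
	$output = '';
	$cursor = 0;
	foreach ( $regions as $i => $region ) {
		// Render only the non-Markdown text before this block.
		$before = substr( $content, $cursor, $region['start'] - $cursor );
		if ( '' !== $before ) {
			$output .= $render( $before );
		}
		// Splice the raw Markdown in directly, so no filter can wrap it.
		$output .= $sources[ $i ] ?? '';
		$cursor  = $region['end'];
	}
	// Render whatever follows the last Markdown block.
	$after = substr( $content, $cursor );
	if ( '' !== $after ) {
		$output .= $render( $after );
	}
	return $output;
}

$render  = function ( string $html ): string {
	return '<p>' . $html . '</p>';
};
$content = 'AA[md0]BB[md1]CC'; // "[mdN]" marks where each Markdown block sat.
$regions = array(
	array( 'start' => 2, 'end' => 7 ),
	array( 'start' => 9, 'end' => 14 ),
);
$sources = array( '# Zero', '# One' );
echo example_build_hybrid( $content, $regions, $sources, $render ), "\n";
```

One trade-off: each non-Markdown segment is filtered on its own, so filters that expect to see the whole post at once may behave differently than they do on the full content.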
```php
// Add source:markdown element to RSS feeds for posts containing the Markdown block.
require_once JETPACK__PLUGIN_DIR . '_inc/lib/markdown/rss.php';
if ( class_exists( 'WP_Block_Processor' ) ) {
	add_action( 'rss2_ns', 'jetpack_markdown_rss_namespace' );
```
This will register jetpack_markdown_rss_namespace on rss2_ns even if it was already added by the Markdown module, causing the callback to run twice per feed render (the static guard prevents duplicate output but still adds overhead). Mirror the has_action( 'rss2_ns', ... ) guard used in modules/markdown.php here as well.
Suggested change:

```diff
-add_action( 'rss2_ns', 'jetpack_markdown_rss_namespace' );
+if ( ! has_action( 'rss2_ns', 'jetpack_markdown_rss_namespace' ) ) {
+	add_action( 'rss2_ns', 'jetpack_markdown_rss_namespace' );
+}
```
Match WordPress core's get_the_content_feed() two-step sequence: first the_content, then the_content_feed. This ensures feed-specific transforms (e.g. relative-to-absolute URL conversion) are applied to the rendered HTML in the source:markdown element.
The extraction loop already handles the "no blocks" case via an early return on empty $sources, making the prior jetpack_markdown_rss_post_has_markdown_block() check a wasted processor instantiation + scan.
```php
foreach ( $regions as $i => $region ) {
	// Append content before this block.
	$modified_content .= substr( $post->post_content, $cursor, $region['start'] - $cursor );
	// Insert placeholder.
	$modified_content .= '%%JETPACK_MARKDOWN_' . $i . '%%';
	$cursor = $region['end'];
```
The placeholder token format (e.g. %%JETPACK_MARKDOWN_0%%) can collide with real post content. If a post contains the same string, str_replace() will replace user content unexpectedly. Consider using a per-request unique placeholder prefix (e.g. UUID/random bytes) and doing a single pass replacement (e.g. strtr) to avoid accidental collisions.
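The collision fix proposed here can be sketched as follows. example_placeholder_prefix() and example_fill_placeholders() are hypothetical names, and reusing the JETPACK_MARKDOWN_ stem with random hex appended is an assumption; random_bytes(), bin2hex(), and strtr() are standard PHP. This addresses token collisions only; the wpautop wrapping concern raised earlier about placeholders is unaffected.

```php
<?php
// Hypothetical: a per-request prefix that real post content will not contain.
function example_placeholder_prefix(): string {
	return 'JETPACK_MARKDOWN_' . bin2hex( random_bytes( 8 ) ) . '_';
}

// Hypothetical: swaps every placeholder for its raw Markdown in one pass.
function example_fill_placeholders( string $rendered, array $sources, string $prefix ): string {
	$map = array();
	foreach ( $sources as $i => $source ) {
		$map[ '%%' . $prefix . $i . '%%' ] = $source;
	}
	// strtr() with an array never rescans replaced text, so a Markdown source
	// that happens to contain a placeholder string is left as written.
	return strtr( $rendered, $map );
}

$prefix   = example_placeholder_prefix();
$rendered = '<p>Literal %%JETPACK_MARKDOWN_0%% in the post.</p>' . '%%' . $prefix . '0%%';
echo example_fill_placeholders( $rendered, array( '# Hello' ), $prefix ), "\n";
```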
```php
// If the post contains Markdown blocks, let the block function handle it.
if (
	class_exists( 'WP_Block_Processor' )
	&& jetpack_markdown_rss_post_has_markdown_block( $post->post_content )
) {
	return;
}
```
On WP versions where WP_Block_Processor is unavailable, the legacy rss2_item handler will not bail for posts containing jetpack/markdown blocks (because the block-detection helper always returns false). If the Markdown module is active, those block-based posts will still get a <source:markdown> element containing rendered HTML, which conflicts with the PR description’s stated behavior (“no element is emitted for block-based posts” on <6.9). If the intent is truly to emit nothing for block posts on <6.9, consider falling back to has_block( 'jetpack/markdown', $post->post_content ) for detection when WP_Block_Processor is missing, and bail in that case too (or update the PR description to match the actual fallback behavior).
Use regex assertions to account for class attributes (e.g. wp-block-paragraph) that the block renderer adds to <p> tags when apply_filters('the_content') runs.
Warning
This is on hold for now, pending more general discussion and potential change of direction.
pgle0O-1sG-p2
Fixes CM-533
Proposed changes:
This adds a <source:markdown> element to RSS2 feed items for posts written with either the legacy Markdown module or the Markdown block (jetpack/markdown). The element provides the raw Markdown source alongside the rendered HTML, following the convention described at source.scripting.com/markdown.

A source XML namespace declaration (xmlns:source="https://source.scripting.com/") is added to RSS2 feeds via the rss2_ns hook.

Legacy Markdown module (modules/markdown/)

For posts written with the legacy Markdown module, the raw Markdown is read from post_content_filtered. For all other posts, the rendered post_content (with the_content filters applied) is used as a fallback, so the element is always present for RSS readers to consume.

Markdown block (extensions/blocks/markdown/)

For posts containing jetpack/markdown blocks, the function uses WP_Block_Processor (WP 6.9+) to scan the post content and extract raw Markdown from each block's source attribute. Non-Markdown blocks are rendered through the_content filters, producing a hybrid document where Markdown blocks contribute raw source and everything else contributes rendered HTML.

On WordPress versions older than 6.9 (where WP_Block_Processor is unavailable), the block-level RSS output gracefully degrades: no source:markdown element is emitted for block-based posts.

Architecture

- _inc/lib/markdown/rss.php with four functions:
  - jetpack_markdown_rss_namespace(): outputs the xmlns:source namespace declaration once (static dedup guard).
  - jetpack_markdown_rss_post_has_markdown_block(): shared helper using WP_Block_Processor to detect jetpack/markdown blocks.
  - jetpack_markdown_block_rss_output_source_markdown(): block path (placeholder substitution + content rendering + raw MD splice).
  - jetpack_markdown_rss_output_source_markdown(): legacy path (raw MD from post_content_filtered, or rendered fallback).
- modules/markdown.php: loads rss.php (via require_once) and hooks the legacy function.
- extensions/blocks/markdown/markdown.php: loads rss.php (via require_once) and hooks the block function, guarded by class_exists( 'WP_Block_Processor' ).
- The legacy function bails early when jetpack/markdown blocks are detected, deferring to the block function.
- rss.php is decoupled from the WPCom_Markdown class (it uses the string literal '_wpcom_is_markdown' instead of the class constant) so it works from both entry points.

Alternative
For a standalone implementation that works outside the Jetpack plugin, see a8cteam51/team51-markdown-rss.
Other information:
Does this pull request change what data or activity we track or use?
No. This only adds output to existing RSS feeds; no new data is collected or stored.
Testing instructions:
Automated tests
Markdown_RSS_Test tests.

Manual testing: Legacy Markdown module
# Hello\n\nThis is **bold**.). Publish it.
/feed.
<rss> element should include xmlns:source="https://source.scripting.com/".
<item> should contain a <source:markdown> element with the raw Markdown inside a CDATA section.
<item> should contain a <source:markdown> element with the rendered HTML content (no Gutenberg block comments).

Manual testing: Markdown block
jetpack/markdown) and write some Markdown content. Optionally add other blocks (paragraphs, headings) around it. Publish it.
/feed.
<item> should contain a <source:markdown> element.

Edge cases
]]> and verify it appears escaped as ]]> in the CDATA output.