@dsas
Contributor

@dsas dsas commented Jun 11, 2025

Fixes https://linear.app/a8c/issue/DOTCOM-13020

Proposed changes:

Warning

This code is overly broad and a hack. Maybe that's ok. 😞

In the past a fallback navigation was created when one didn't exist. This used to be created in render_block_core_navigation (see WordPress/gutenberg#47684) but no longer happens.

This has led to a number of blogs having this fallback navigation created when an a12 visited their site. This means that the a12's details (wpcom username, email address, etc.) are in the generated export file.

It feels unhelpful to have this in the export file, partly because it might need explaining if the legal team shares an export as part of their operating procedures and fails to remove the non-material data, and partly because a12s' data is shared with anyone exporting an affected blog.

This change tries to remove the information when generating the export file, but is somewhat hampered by a lack of relevant hooks and by the lack of a way to identify current and former a12s. Instead it picks a relevant SQL generation step, hackily identifies whether the query is the relevant part of the export, and then modifies it to add some SQL excluding wp_navigation posts that were created by users who are not currently part of the blog.

Other information:

  • Have you written new tests for your changes, if applicable?
  • Have you checked the E2E test CI results, and verified that your changes do not break them?
  • Have you tested your changes on WordPress.com, if applicable (if so, you'll see a generated comment below with a script to run)?

Jetpack product discussion

Does this pull request change what data or activity we track or use?

Testing instructions:

  1. go to .wordpress.com/wp-admin/export.php for the second site mentioned in this P2 pcgXQF-2um-p2
  2. Download the file
  3. Apply this patch to your wpcom-sandbox, sandbox the site.
  4. Download the file again

You should see that the wp_navigation CPT (and its author) don't show up in the export, while the About page and its author do.

@github-actions
Contributor

github-actions bot commented Jun 11, 2025

Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.

  • To test on WoA, go to the Plugins menu on a WoA dev site. Click on the "Upload" button and follow the upgrade flow to be able to upload, install, and activate the Jetpack Beta plugin. Once the plugin is active, go to Jetpack > Jetpack Beta, select your plugin (WordPress.com Site Helper), and enable the dotcom-13020-a11n-data-appearing-in-docstorage-wxr-file branch.
  • To test on Simple, run the following command on your sandbox:
bin/jetpack-downloader test jetpack-mu-wpcom-plugin dotcom-13020-a11n-data-appearing-in-docstorage-wxr-file

Interested in more tips and information?

  • In your local development environment, use the jetpack rsync command to sync your changes to a WoA dev blog.
  • Read more about our development workflow here: PCYsg-eg0-p2
  • Figure out when your changes will be shipped to customers here: PCYsg-eg5-p2

@github-actions
Contributor

github-actions bot commented Jun 11, 2025

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Review, ...).
  • 🔴 Add a "[Type]" label (Bug, Enhancement, Janitorial, Task).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖


Follow this PR Review Process:

  1. Ensure all required checks appearing at the bottom of this PR are passing.
  2. Make sure to test your changes on all platforms that it applies to. You're responsible for the quality of the code you ship.
  3. You can use GitHub's Reviewers functionality to request a review.
  4. When it's reviewed and merged, you will be pinged in Slack to deploy the changes to WordPress.com simple once the build is done.

If you have questions about anything, reach out in #jetpack-developers for guidance!

@github-actions github-actions bot added the [Status] Needs Author Reply We need more details from you. This label will be auto-added until the PR meets all requirements. label Jun 11, 2025
@jp-launch-control

jp-launch-control bot commented Jun 11, 2025

Code Coverage Summary

Coverage changed in 25 files. Only the first 5 are listed here.

| File | Coverage | Δ% | Δ Uncovered |
| --- | --- | --- | --- |
| projects/packages/jetpack-mu-wpcom/src/features/admin-color-schemes/admin-color-schemes.php | 1/2 (50.00%) | 50.00% | -1 💚 |
| projects/packages/jetpack-mu-wpcom/src/features/cloudflare-analytics/cloudflare-analytics.php | 1/11 (9.09%) | 9.09% | -1 💚 |
| projects/packages/jetpack-mu-wpcom/src/features/css-monkey-patches/index.php | 1/3 (33.33%) | 33.33% | -1 💚 |
| projects/packages/jetpack-mu-wpcom/src/features/first-posts-stream/first-posts-stream-helpers.php | 1/11 (9.09%) | 9.09% | -1 💚 |
| projects/packages/jetpack-mu-wpcom/src/features/import-customizations/import-customizations.php | 1/29 (3.45%) | 3.45% | -1 💚 |

1 file is newly checked for coverage.

| File | Coverage |
| --- | --- |
| projects/packages/jetpack-mu-wpcom/src/features/wpcom-navigation-export-filter/class-export-filter.php | 16/23 (69.57%) 💚 |

Full summary · PHP report

@dsas dsas self-assigned this Jun 12, 2025
@dsas
Contributor Author

dsas commented Jun 12, 2025

The code "works" now. It means that wp_navigation posts will not be exported if they were authored by a user no longer on the site. I am somewhat concerned that this means people are going to export their site and then importing it isn't going to work.

I don't think this can be a problem on WoA sites. I'm going to port this change to wpcom so it limits the impact a tad.

@dsas
Contributor Author

dsas commented Jun 20, 2025

> I don't think this can be a problem on WoA sites. I'm going to port this change to wpcom so it limits the impact a tad.

Except WoA sites will not necessarily have been WoA sites during the period when these posts were being created. If we want to eliminate that problem then the solution needs to work on both WoA and Simple sites.

dsas added 5 commits June 20, 2025 17:23
This code doesn't work, and even if it did it's a hack that is probably
overly broad.

In the past a fallback navigation was created when one didn't exist.
This used to be created in `render_block_core_navigation` (see
WordPress/gutenberg#47684) but no longer happens.

This has led to a number of blogs having this fallback navigation
created when an a12 visited their site. This means that the a12's
details (wpcom username, email address, etc.) are in the generated
export file.

It feels unhelpful to have this in the export file, partly because it
might need explaining if the legal team shares an export as part of a
legal request, and partly because a12s' data is shared with anyone
exporting an affected blog.

This change tries to remove the information when generating the export
file, but is somewhat hampered by a lack of relevant hooks and by the
lack of a way to identify current and former a12s. Instead it picks a
relevant SQL generation step, hackily identifies whether the query is
the relevant part of the export, and then modifies it to add some SQL
excluding wp_navigation posts that were created by users who are not
currently part of the blog.

See also https://linear.app/a8c/issue/DOTCOM-13020

Rather than using a subquery, rely on get_users to get the 'allowed'
users.
@dsas dsas force-pushed the dotcom-13020-a11n-data-appearing-in-docstorage-wxr-file branch from 79df023 to bd2971b Compare June 20, 2025 16:26
@dsas dsas requested a review from Copons June 20, 2025 16:41
@dsas dsas marked this pull request as ready for review June 20, 2025 16:41
@mmtr
Member

mmtr commented Jun 25, 2025

> It means that wp_navigation posts will not be exported if they were authored by a user no longer on the site. I am somewhat concerned that this means people are going to export their site and then importing it isn't going to work.

To alleviate that concern, would it be possible to change the author of the wp_navigation posts to the owner of the site? To have a more targeted approach, we can only change the wp_navigation posts that have been created by users with a @automattic.com or @a8c.com address who are not members of the site.
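
The targeted check suggested here could be sketched roughly as follows. This is only an illustration under stated assumptions: `is_a8c_email` and `should_reassign_navigation_author` are hypothetical helpers (not part of Jetpack or WordPress), and site membership is passed in as a plain boolean rather than looked up.

```php
<?php
/**
 * Hypothetical helper: true when the address belongs to one of the two
 * domains named in the suggestion above.
 */
function is_a8c_email( string $email ): bool {
	$at = strrpos( $email, '@' );
	if ( false === $at ) {
		return false;
	}
	// Compare the domain case-insensitively.
	$domain = strtolower( substr( $email, $at + 1 ) );
	return in_array( $domain, array( 'automattic.com', 'a8c.com' ), true );
}

/**
 * Hypothetical helper: decide whether a post's author should be replaced.
 * Only wp_navigation posts by non-members with a matching email qualify.
 */
function should_reassign_navigation_author( string $post_type, string $author_email, bool $author_is_member ): bool {
	return 'wp_navigation' === $post_type
		&& ! $author_is_member
		&& is_a8c_email( $author_email );
}
```

A check like this matches on the email domain only, so it would not catch accounts registered with any other address.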

@dsas
Contributor Author

dsas commented Jun 25, 2025

> It means that wp_navigation posts will not be exported if they were authored by a user no longer on the site. I am somewhat concerned that this means people are going to export their site and then importing it isn't going to work.
>
> To alleviate that concern, would it be possible to change the author of the wp_navigation posts to the owner of the site? To have a more targeted approach, we can only change the wp_navigation posts that have been created by users with a @automattic.com or @a8c.com address who are not members of the site.

That's a great suggestion, seems better than just omitting it.

We discussed the more focused targeting previously; the concern there is that we don't enforce that a12 wpcom accounts use an Automattic email. Including the personal email is arguably less good than including an a8c email.

@rcrdortiz
Contributor

rcrdortiz commented Jun 25, 2025

> The code "works" now. It means that wp_navigation posts will not be exported if they were authored by a user no longer on the site. I am somewhat concerned that this means people are going to export their site and then importing it isn't going to work.
>
> I don't think this can be a problem on WoA sites. I'm going to port this change to wpcom so it limits the impact a tad.

To mitigate your concern, could we continue to export the wp_navigation posts but change the author to an existing user somehow?

Oh, sorry, I didn't continue reading the discussion haha. Miguel basically suggested the same thing.

\Automattic\Jetpack\Jetpack_Mu_Wpcom\Holiday_Snow::init();

// Initialize WPCOM Navigation Export Filter
if ( class_exists( '\Automattic\Jetpack\Jetpack_Mu_Wpcom\Wpcom_Navigation_Export_Filter\Export_Filter' ) ) {
Contributor

The class is defined in Jetpack; this caller is also in Jetpack. I'm failing to understand under what circumstances the class won't exist while having Jetpack loaded.

I also think that instantiating an object and not storing its reference indicates that something isn't right here. Calling a static method also isn't the best solution since static methods aren't mockable. Maybe we could change the __construct to an init method or something similar? Since most of this file uses static ::init, I would also consider it valid.

Member

> The class is defined in Jetpack; this caller is also in Jetpack. I'm failing to understand under what circumstances the class won't exist while having Jetpack loaded.

I'm not a PHP ninja, but if the require_once on line 287 succeeds, I don't see how this could fail.

*
* @return void
*/
public function stop_export_filtering(): void {
Contributor

@rcrdortiz rcrdortiz Jun 25, 2025

I don't understand why we need the is_exporting state. From what I can see, the code that runs in prod (not test) never changes this state after initialization, and the state is always going to be is_exporting = true. Am I missing something here? Aren't we always running the transformation for all queries once the class is instantiated?

Contributor Author

We definitely don't need the class var. We don't really need the method, I guess. It kinda feels like we should have it for symmetry, and in case we start exporting elsewhere, but YAGNI, I'll remove it.

AND {$wpdb->posts}.post_author > 0
AND {$wpdb->posts}.post_author NOT IN ({$current_users})
)";
}
Contributor

@rcrdortiz rcrdortiz Jun 25, 2025

This here is a personal preference, so feel free to ignore this comment completely if you want.

I think it's a bit simpler to follow if we merge both conditions, something like this resonates with me:

// Fetch current users (IDs)
$users = get_users( [ 'fields' => 'ID' ] );

// Build a comma-list of ints (or an empty string if no users)
$current_users = $users ? implode( ',', array_map( 'intval', $users ) ) : '';

// Now one assignment, tacking on the “NOT IN” bit only when $current_users isn’t empty
$additional_where = "
  AND NOT (
      {$wpdb->posts}.post_type   = 'wp_navigation'
      AND {$wpdb->posts}.post_author > 0" .
      ( $current_users ? " AND {$wpdb->posts}.post_author NOT IN ({$current_users})" : '' ) . "
  )";

Contributor

@rcrdortiz rcrdortiz left a comment

I've tested the changes and they work as described. If we change to including the removed posts under an existing user, I'll re-review the changes.

@dsas
Contributor Author

dsas commented Jun 26, 2025

> To alleviate that concern, would it be possible to change the author of the wp_navigation posts to the owner of the site?

A constraint here is that there are no good places to hook, so the modifications have to be in SQL land. The SQL query to do that is more complicated than the current approach, so I'm more worried about its fragility. I'll finish it off and share it on this PR in a while.

@mmtr
Member

mmtr commented Jun 26, 2025

> > To alleviate that concern, would it be possible to change the author of the wp_navigation posts to the owner of the site?
>
> A constraint here is that there are no good places to hook, so the modifications have to be in SQL land.

Can we do that outside of the export action?

It seems to me that the DB is polluted to some extent due to a bug, and some posts have a wrong author. So rather than waiting for the export to fix that, can we run a script to change the author or do it during a normal hook like admin_init?

Something like the logic that exempts some sites from paying for GS styles if they used it before becoming a paid feature:

  • Normal hook like admin_init triggers
  • If the site doesn't have a certain blog sticker, check if it has wp_navigation posts created by non-members
  • If there are, update the post author and add the blog sticker
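
The three steps above could be sketched as a pure function, to make the intended behaviour concrete. This is a minimal sketch under assumptions, not the proposed implementation: `reassign_orphaned_navigation_authors` is a hypothetical name, the blog sticker is modelled as a boolean, and posts and members are passed in as plain arrays instead of being read from the database on `admin_init`.

```php
<?php
/**
 * Hypothetical one-time cleanup of wp_navigation post authors.
 *
 * @param array $nav_posts   Map of wp_navigation post ID => author ID.
 * @param int[] $member_ids  IDs of users who are currently site members.
 * @param int   $owner_id    ID of the site owner.
 * @param bool  $has_sticker Whether the cleanup has already been recorded.
 * @return array Updated 'posts' map and resulting 'has_sticker' flag.
 */
function reassign_orphaned_navigation_authors( array $nav_posts, array $member_ids, int $owner_id, bool $has_sticker ): array {
	// Step 2: sites that already carry the sticker are skipped.
	if ( $has_sticker ) {
		return array( 'posts' => $nav_posts, 'has_sticker' => true );
	}

	$changed = false;
	foreach ( $nav_posts as $post_id => $author_id ) {
		// Authorless posts (author 0) are left alone; only non-members are replaced.
		if ( $author_id > 0 && ! in_array( $author_id, $member_ids, true ) ) {
			$nav_posts[ $post_id ] = $owner_id;
			$changed               = true;
		}
	}

	// Step 3: the sticker is added only when something was actually updated.
	return array( 'posts' => $nav_posts, 'has_sticker' => $changed );
}
```

For example, with posts `array( 10 => 5, 11 => 2 )`, members `array( 2, 3 )` and owner `3`, post 10 would be reassigned to user 3 and the sticker flag set.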

@dsas
Contributor Author

dsas commented Jun 27, 2025

> It seems to me that the DB is polluted to some extent due to a bug, and some posts have a wrong author. So rather than waiting for the export to fix that, can we run a script to change the author or do it during a normal hook like admin_init?
>
> Something like the logic that exempts some sites from paying for GS styles if they used it before becoming a paid feature:
>
> * Normal hook like `admin_init` triggers
> * If the site doesn't have a certain blog sticker, check if it has wp_navigation posts created by non-members
> * If there are, update the post author and add the blog sticker

I remember that originally we discussed and discarded the idea of running a script over "every" site to fix the db, though I can't remember specifically why.

Depolluting the DB seems like a fine idea though. We could do it on the export_wp hook if we want to be sure it's done whenever an export happens.

Member

@alshakero alshakero left a comment

Just drive-by comments.

require_once __DIR__ . '/features/wpcom-post-list/wpcom-post-types-tracking.php';
require_once __DIR__ . '/features/wpcom-widgets/wpcom-widgets.php';
require_once __DIR__ . '/features/wpcom-wpadmin-page-view/wpcom-wpadmin-page-view.php';
require_once __DIR__ . '/features/wpcom-navigation-export-filter/class-export-filter.php';
Member

This may create a few hundred fatals during the first deployment.

Contributor Author

@dsas dsas Jun 30, 2025

Not with a jetpack sun/moon deployment it shouldn't

> avoid mid-deploy fatals due to files being removed before in-use files referencing them are updated, but sun/moon achieves the same end by only updating code when it’s not currently in use in the first place.

PCYsg-Jjm-p2

Member

Oh yeah good point!

/**
* Test that export filter hooks are properly registered.
*/
public function test_export_filter_hooks_are_registered(): void {
Member

Is it reasonable to include wp-admin/includes/export.php as a text file and check if $post_ids = $wpdb->get_col( "SELECT ID FROM {$wpdb->posts} $join WHERE $where" ); string occurs in it?

We have no guarantee this query will remain the same and we'll never know if Core changes it.
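
Such a guard could look roughly like the sketch below. It is only an illustration: `export_query_still_present` and `normalize_whitespace` are hypothetical helpers, and the file contents are passed in as a string (in a real test they would presumably come from reading `wp-admin/includes/export.php`). Whitespace is normalised so that a pure reformatting of Core's source does not trip the check.

```php
<?php
/**
 * Hypothetical helper: collapse every run of whitespace to a single space.
 */
function normalize_whitespace( string $text ): string {
	return trim( preg_replace( '/\s+/', ' ', $text ) );
}

/**
 * Hypothetical helper: true when the expected query text still occurs in
 * the given source code, ignoring differences in whitespace.
 */
function export_query_still_present( string $export_source, string $expected_query ): bool {
	return strpos(
		normalize_whitespace( $export_source ),
		normalize_whitespace( $expected_query )
	) !== false;
}

// Single quotes keep the variable names literal, as they appear in the source file.
$expected = '$post_ids = $wpdb->get_col( "SELECT ID FROM {$wpdb->posts} $join WHERE $where" );';
```

A test built on this would fail as soon as Core rewrites the query, which is the signal the filter needs re-checking; it would not by itself prove the filter still behaves correctly.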

@dsas dsas closed this Jun 30, 2025
@dsas
Contributor Author

dsas commented Jun 30, 2025

Marking as abandoned for now - I'll investigate an alternative solution.

@github-actions github-actions bot removed [Status] Needs Author Reply We need more details from you. This label will be auto-added until the PR meets all requirements. [Status] In Progress labels Jun 30, 2025
5 participants