Jetpack Sync: Checksums - convert blacklist to allowlist #47468

Open
coder-karen wants to merge 9 commits into trunk from update/sync-checksums-convert-blacklist-to-allowlist

Conversation

Contributor

@coder-karen coder-karen commented Mar 5, 2026

Fixes SYNC-278

Proposed changes:

  • This PR converts the existing checksum SQL range checks from blacklists to allowlists. It replaces post_type NOT IN (long blacklist) with post_type IN (allowlist of allowed post types) for better performance (fewer string comparisons per row, better MySQL query optimisation with IN).
  • This in turn should result in better Sync performance on sites with a lot of data being updated.
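
As a rough illustration of the change described above, the sketch below builds both WHERE fragments side by side. The helper name quote_post_types(), the sample post types, and the use of addslashes() in place of WordPress's esc_sql() are assumptions made so the snippet runs standalone; it is not the code from this PR.

```php
<?php
// Hypothetical helper: quote a list of post type names for use in SQL.
// addslashes() stands in for WordPress's esc_sql() so the sketch runs standalone.
function quote_post_types( array $types ): string {
	return "'" . implode( "', '", array_map( 'addslashes', $types ) ) . "'";
}

// Assumed example data, not the real Jetpack blacklist.
$registered = array( 'post', 'page', 'attachment', 'revision', 'nav_menu_item' );
$blacklist  = array( 'revision', 'nav_menu_item', 'some_plugin_log' );

// Before: exclude everything on the blacklist.
$old_where = 'post_type NOT IN (' . quote_post_types( $blacklist ) . ')';

// After: include only registered types that are not blacklisted.
$allowed   = array_values( array_diff( $registered, $blacklist ) );
$new_where = 'post_type IN (' . quote_post_types( $allowed ) . ')';

echo $old_where, PHP_EOL;
echo $new_where, PHP_EOL;
```

With this sample data the allowlist ends up with three entries (post, page, attachment), which is the shape of query the PR description is aiming for.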

Other information:

  • Have you written new tests for your changes, if applicable?
  • Have you checked the E2E test CI results, and verified that your changes do not break them?
  • Have you tested your changes on WordPress.com, if applicable (if so, you'll see a generated comment below with a script to run)?

Jetpack product discussion

n/a

Does this pull request change what data or activity we track or use?

No.

Testing instructions:

  • On a self-hosted Jetpack test site, with this PR applied (using the Jetpack Beta tester plugin, or locally):
  • Modify a post field in the remote site's database, for example the post_modified_gmt field
  • In your WPcom sandbox, confirm the value has not been updated (via wpsh): gpr select * from wp_YOURBLOGID_posts where ID=POSTID
  • Then, run a fix checksum:
switch_to_blog(YOURBLOGID);
$v = new Jetpack_Sync_Validator(get_current_blog_id());
return $v->perform_cache_site_audit( ['posts'], true );
  • The field should be updated.

To test the other two modified files:

  • For class-posts.php: run a full sync of posts via the Jetpack debugger - that will hit class-posts.php. Do this after following the same steps as earlier to modify a value on the remote DB, and confirm the value has changed.
  • For the class-full-sync.php file, the test file should be enough here. The legacy-full-sync tests should run for PHP 7.0 latest - in PR checks, so PR checks should pass.

Changelog

  • Generate changelog entries for this PR (using AI).

Contributor

github-actions bot commented Mar 5, 2026

Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.

  • To test on WoA, go to the Plugins menu on a WoA dev site. Click on the "Upload" button and follow the upgrade flow to be able to upload, install, and activate the Jetpack Beta plugin. Once the plugin is active, go to Jetpack > Jetpack Beta, select your plugin (Jetpack or WordPress.com Site Helper), and enable the update/sync-checksums-convert-blacklist-to-allowlist branch.
  • To test on Simple, run the following command on your sandbox:
bin/jetpack-downloader test jetpack update/sync-checksums-convert-blacklist-to-allowlist
bin/jetpack-downloader test jetpack-mu-wpcom-plugin update/sync-checksums-convert-blacklist-to-allowlist

Interested in more tips and information?

  • In your local development environment, use the jetpack rsync command to sync your changes to a WoA dev blog.
  • Read more about our development workflow here: PCYsg-eg0-p2
  • Figure out when your changes will be shipped to customers here: PCYsg-eg5-p2

@github-actions github-actions bot added [Plugin] Jetpack Issues about the Jetpack plugin. https://wordpress.org/plugins/jetpack/ [Tests] Includes Tests labels Mar 5, 2026
Contributor

github-actions bot commented Mar 5, 2026

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Review, ...).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for your cooperation 🤖


Follow this PR Review Process:

  1. Ensure all required checks appearing at the bottom of this PR are passing.
  2. Make sure to test your changes on all platforms that it applies to. You're responsible for the quality of the code you ship.
  3. You can use GitHub's Reviewers functionality to request a review.
  4. When it's reviewed and merged, you will be pinged in Slack to deploy the changes to WordPress.com simple once the build is done.

If you have questions about anything, reach out in #jetpack-developers for guidance!


Jetpack plugin:

The Jetpack plugin has different release cadences depending on the platform:

  • WordPress.com Simple releases happen as soon as you deploy your changes after merging this PR (PCYsg-Jjm-p2).
  • WoA releases happen weekly.
  • Releases to self-hosted sites happen monthly:
    • Scheduled release: March 31, 2026
    • Code freeze: March 31, 2026

If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack.

jp-launch-control bot commented Mar 5, 2026

Code Coverage Summary

Coverage changed in 2 files.

File | Coverage | Δ% | Δ Uncovered
projects/packages/sync/src/class-settings.php | 67/175 (38.29%) | -1.84% | 11 💔
projects/packages/sync/src/replicastore/class-table-checksum.php | 0/426 (0.00%) | 0.00% | 8 💔

Coverage check overridden by the "I don't care about code coverage for this PR" label.

@coder-karen coder-karen marked this pull request as ready for review March 6, 2026 07:53
@coder-karen coder-karen requested a review from a team as a code owner March 6, 2026 07:53
@coder-karen coder-karen requested review from Copilot and removed request for a team March 6, 2026 07:53
Contributor

Copilot AI left a comment

Pull request overview

Converts Jetpack Sync’s posts checksum filtering from a long NOT IN blacklist to a smaller IN allowlist derived from “all registered post types minus blacklist”, aiming to improve SQL performance during checksum and full sync operations.

Changes:

  • Introduces Settings::get_allowed_post_types_sql() / Settings::get_allowed_post_types_for_checksum() (with per-request caching) and swaps callers over from the prior blacklist SQL.
  • Updates posts checksum default table config to use structured filter_values with operator => 'IN'.
  • Updates related PHP tests and adds changelogger entries for both touched projects.
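
The per-request caching and invalidation mentioned in the first bullet can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the class Allowed_Post_Types_Cache, its reset() method, and the $lookups counter are hypothetical names used for illustration, and the list of post types is passed in as a callable rather than read from get_post_types().

```php
<?php
// Hypothetical class; names are illustrative and not taken from the PR.
final class Allowed_Post_Types_Cache {
	/** @var string[]|null Null means "not computed yet". */
	private static $cached = null;

	/** @var int Counts how often the lookup actually ran (for demonstration only). */
	public static $lookups = 0;

	public static function get( callable $list_types, array $blacklist ): array {
		if ( null !== self::$cached ) {
			return self::$cached; // Served from the request-level cache.
		}
		++self::$lookups;
		self::$cached = array_values( array_diff( $list_types(), $blacklist ) );
		return self::$cached;
	}

	// Call this whenever the blacklist setting changes, so stale results are dropped.
	public static function reset(): void {
		self::$cached = null;
	}
}

// Assumed sample data standing in for the registered post types.
$list_types = function () {
	return array( 'post', 'page', 'revision' );
};

$first  = Allowed_Post_Types_Cache::get( $list_types, array( 'revision' ) );
$second = Allowed_Post_Types_Cache::get( $list_types, array( 'revision' ) );
Allowed_Post_Types_Cache::reset();
$third  = Allowed_Post_Types_Cache::get( $list_types, array( 'page', 'revision' ) );
```

The second call is answered from the cache; only after reset() is the list recomputed, which is why invalidation has to accompany any change to the blacklist setting.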

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file:
File | Description
projects/plugins/jetpack/tests/php/sync/Jetpack_Sync_Checksum_Test.php | Updates checksum test provider to use the new allowlist SQL helper.
projects/plugins/jetpack/changelog/update-sync-checksums-convert-blacklist-to-allowlist | Adds a Jetpack plugin changelog entry (currently not in the standard format).
projects/packages/sync/src/replicastore/class-table-checksum.php | Switches posts checksum filter_values to IN allowlist; fixes a docblock typo.
projects/packages/sync/src/modules/class-posts.php | Uses allowlist SQL in posts module full-sync WHERE clause.
projects/packages/sync/src/modules/class-full-sync.php | Uses allowlist SQL when computing posts range for full sync.
projects/packages/sync/src/class-settings.php | Adds allowlist helpers + request cache and invalidation; replaces prior blacklist SQL helper.
projects/packages/sync/changelog/posts-checksum-in-allowlist | Adds Sync package changelog entry describing the allowlist change.

@coder-karen coder-karen added I don't care about code coverage for this PR Use this label to ignore the check for insufficient code coveage. [Status] Needs Review This PR is ready for review. and removed [Status] In Progress labels Mar 6, 2026
@coder-karen coder-karen requested a review from fgiannar March 6, 2026 08:56
Copilot AI review requested due to automatic review settings March 6, 2026 11:19
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.


Comment on lines +404 to +417
 * Get allowed post types for the posts checksum (all registered minus blacklist).
 * Used so the checksum query can use IN (allowed) instead of NOT IN (blacklist).
 *
 * Result is cached for the request to prevent unnecessary get_post_types() calls.
 *
 * @return array Allowed post type names (no DB query; get_post_types() is used).
 */
public static function get_allowed_post_types_for_checksum() {
	if ( null !== self::$cached_allowed_post_types_for_checksum ) {
		return self::$cached_allowed_post_types_for_checksum;
	}
	$all_types = get_post_types( array(), 'names' );
	$blacklist = static::get_setting( 'post_types_blacklist' );
	$allowed   = array_diff( $all_types, $blacklist );
Copilot AI Mar 6, 2026

get_allowed_post_types_for_checksum() builds the allowlist from get_post_types(), which only returns registered post types. This changes behavior vs the prior post_type NOT IN ( blacklist ) approach: posts whose post_type exists in the DB but is no longer registered (e.g. plugin removed leaving orphan rows) will now be excluded from checksums/full-sync ranges and may never be repaired/synced. If that behavior change isn’t intended, consider deriving the allowlist from the DB’s distinct post_type values (then subtract the blacklist) or keeping the old NOT IN behavior for full-sync/range queries while using the allowlist only for the checksum path.
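
To make the behaviour difference concrete, here is a small sketch comparing the three filters for a single orphaned post type. All data is assumed for illustration ('old_plugin_cpt' is a made-up type whose plugin has been removed), and plain arrays stand in for get_post_types() and the DISTINCT query.

```php
<?php
// Assumed example data: 'old_plugin_cpt' has rows in the DB but is no longer registered.
$registered_types = array( 'post', 'page', 'revision' );
$types_in_db      = array( 'post', 'page', 'revision', 'old_plugin_cpt' );
$blacklist        = array( 'revision' );

// Allowlist built from registered types (the approach under review).
$allowed_from_registry = array_values( array_diff( $registered_types, $blacklist ) );

// Allowlist built from distinct DB values (the alternative suggested in this comment).
$allowed_from_db = array_values( array_diff( $types_in_db, $blacklist ) );

// Would a row with the orphaned type pass each filter?
$orphan                  = 'old_plugin_cpt';
$passes_not_in_blacklist = ! in_array( $orphan, $blacklist, true ); // Prior NOT IN behaviour.
$passes_registry_allow   = in_array( $orphan, $allowed_from_registry, true );
$passes_db_allow         = in_array( $orphan, $allowed_from_db, true );

var_dump( $passes_not_in_blacklist, $passes_registry_allow, $passes_db_allow );
```

With this data the orphaned row passes the old NOT IN filter and the DB-derived allowlist, but not the registry-derived allowlist, which is the exclusion the comment describes.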

Suggested change
 * Get allowed post types for the posts checksum (all post types in the DB minus blacklist).
 * Used so the checksum query can use IN (allowed) instead of NOT IN (blacklist).
 *
 * Result is cached for the request to prevent unnecessary database calls.
 *
 * @return array Allowed post type names.
 */
public static function get_allowed_post_types_for_checksum() {
	if ( null !== self::$cached_allowed_post_types_for_checksum ) {
		return self::$cached_allowed_post_types_for_checksum;
	}
	global $wpdb;
	// Fetch all distinct post_type values present in the posts table.
	$all_types = $wpdb->get_col( "SELECT DISTINCT post_type FROM {$wpdb->posts}" );
	if ( ! is_array( $all_types ) ) {
		$all_types = array();
	}
	$blacklist = static::get_setting( 'post_types_blacklist' );
	$allowed   = array_diff( $all_types, $blacklist );

Contributor Author

That is very valid.

Contributor

posts whose post_type exists in the DB but is no longer registered (e.g. plugin removed leaving orphan rows) will now be excluded from checksums/full-sync ranges

True, but I believe the current approach makes more sense actually :)

Contributor Author

@coder-karen coder-karen Mar 6, 2026

Right. But I think we'd also need to change how incremental sync handles non-registered post types at the same time. In testing, a post with a non-registered post type gets synced by incremental sync and then removed during a fix checksum, whereas on trunk it isn't removed.
Stopping incremental sync from syncing these non-existent posts (or rather, posts with no post type) would have a knock-on effect: in the last month there have been 56 million synced jetpack_sync_save_post actions where the post_status is jetpack_sync_non_registered_post_type, across approx. 44,000 sites. For what reason do we need to sync 'non-existent posts' anyway?

Comment on lines +378 to +383
public static function get_allowed_post_types_sql() {
	$allowed = static::get_allowed_post_types_for_checksum();
	if ( empty( $allowed ) ) {
		return '1 = 0'; // This is an SQL condition that is always false.
	}
	return 'post_type IN (\'' . implode( '\', \'', array_map( 'esc_sql', $allowed ) ) . '\')';
Copilot AI Mar 6, 2026

get_allowed_post_types_sql() relies on get_allowed_post_types_for_checksum(), but the helper’s name/doc comment imply it’s checksum-specific while it’s now also the source of truth for full-sync/range filtering. Consider introducing a more general helper name (and delegating the checksum-named method to it) to avoid confusion about intended scope.

Contributor Author

Also valid.

Comment on lines 535 to +539
$filter_values_count = is_countable( $filter['values'] ) ? count( $filter['values'] ) : 0;
if ( 0 === $filter_values_count ) {
	$result[] = 'IN' === $filter['operator'] ? '1 = 0' : '1 = 1';
	break;
}
Copilot AI Mar 6, 2026

The new empty-list handling for IN/NOT IN is good defensive behavior, but it’s not covered by the existing Table_Checksum tests. Please add a unit/integration test that exercises an empty values array and asserts the generated filter SQL is valid (e.g. 1 = 0/1 = 1) and that checksum calculation does not error.
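
A test along these lines could look roughly like the sketch below. build_filter_sql() is a hypothetical standalone stand-in for the filter-building logic quoted above, not the actual Table_Checksum method, and addslashes() is used instead of the WordPress escaping helpers so the snippet runs outside WordPress.

```php
<?php
// Hypothetical stand-in for the filter builder; not the real Table_Checksum method.
function build_filter_sql( string $column, string $operator, array $values ): string {
	if ( 0 === count( $values ) ) {
		// An empty IN list matches nothing; an empty NOT IN list excludes nothing.
		return 'IN' === $operator ? '1 = 0' : '1 = 1';
	}
	$quoted = array_map(
		function ( $value ) {
			return "'" . addslashes( (string) $value ) . "'";
		},
		$values
	);
	return $column . ' ' . $operator . ' (' . implode( ', ', $quoted ) . ')';
}

// The cases an empty-values test would want to pin down.
$empty_in     = build_filter_sql( 'post_type', 'IN', array() );
$empty_not_in = build_filter_sql( 'post_type', 'NOT IN', array() );
$non_empty    = build_filter_sql( 'post_type', 'IN', array( 'post', 'page' ) );
```

In a real PHPUnit test the three results would be asserted against the expected fragments, and a further case would run the checksum query itself to confirm it does not error.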

@coder-karen coder-karen added [Status] In Progress and removed [Status] Needs Review This PR is ready for review. labels Mar 6, 2026

Labels

I don't care about code coverage for this PR Use this label to ignore the check for insufficient code coveage. [Package] Sync [Plugin] Jetpack Issues about the Jetpack plugin. https://wordpress.org/plugins/jetpack/ [Status] In Progress [Tests] Includes Tests

3 participants