Jetpack Sync: Checksums - convert blacklist to allowlist#47468
Jetpack Sync: Checksums - convert blacklist to allowlist#47468coder-karen wants to merge 9 commits intotrunkfrom
Conversation
|
Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.
Interested in more tips and information?
|
|
Thank you for your PR! When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:
This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖 Follow this PR Review Process:
If you have questions about anything, reach out in #jetpack-developers for guidance! Jetpack plugin: The Jetpack plugin has different release cadences depending on the platform:
If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack. |
Code Coverage SummaryCoverage changed in 2 files.
Full summary · PHP report · JS report Coverage check overridden by
I don't care about code coverage for this PR
|
There was a problem hiding this comment.
Pull request overview
Converts Jetpack Sync’s posts checksum filtering from a long NOT IN blacklist to a smaller IN allowlist derived from “all registered post types minus blacklist”, aiming to improve SQL performance during checksum and full sync operations.
Changes:
- Introduces
Settings::get_allowed_post_types_sql()/Settings::get_allowed_post_types_for_checksum()(with per-request caching) and swaps callers over from the prior blacklist SQL. - Updates posts checksum default table config to use structured
filter_valueswithoperator => 'IN'. - Updates related PHP tests and adds changelogger entries for both touched projects.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| projects/plugins/jetpack/tests/php/sync/Jetpack_Sync_Checksum_Test.php | Updates checksum test provider to use the new allowlist SQL helper. |
| projects/plugins/jetpack/changelog/update-sync-checksums-convert-blacklist-to-allowlist | Adds a Jetpack plugin changelog entry (currently not in the standard format). |
| projects/packages/sync/src/replicastore/class-table-checksum.php | Switches posts checksum filter_values to IN allowlist; fixes a docblock typo. |
| projects/packages/sync/src/modules/class-posts.php | Uses allowlist SQL in posts module full-sync WHERE clause. |
| projects/packages/sync/src/modules/class-full-sync.php | Uses allowlist SQL when computing posts range for full sync. |
| projects/packages/sync/src/class-settings.php | Adds allowlist helpers + request cache and invalidation; replaces prior blacklist SQL helper. |
| projects/packages/sync/changelog/posts-checksum-in-allowlist | Adds Sync package changelog entry describing the allowlist change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
projects/plugins/jetpack/changelog/update-sync-checksums-convert-blacklist-to-allowlist
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
projects/plugins/jetpack/changelog/update-sync-checksums-convert-blacklist-to-allowlist
Show resolved
Hide resolved
| * Get allowed post types for the posts checksum (all registered minus blacklist). | ||
| * Used so the checksum query can use IN (allowed) instead of NOT IN (blacklist). | ||
| * | ||
| * Result is cached for the request to prevent unnecessary get_post_types() calls. | ||
| * | ||
| * @return array Allowed post type names (no DB query; get_post_types() is used). | ||
| */ | ||
| public static function get_allowed_post_types_for_checksum() { | ||
| if ( null !== self::$cached_allowed_post_types_for_checksum ) { | ||
| return self::$cached_allowed_post_types_for_checksum; | ||
| } | ||
| $all_types = get_post_types( array(), 'names' ); | ||
| $blacklist = static::get_setting( 'post_types_blacklist' ); | ||
| $allowed = array_diff( $all_types, $blacklist ); |
There was a problem hiding this comment.
get_allowed_post_types_for_checksum() builds the allowlist from get_post_types(), which only returns registered post types. This changes behavior vs the prior post_type NOT IN ( blacklist ) approach: posts whose post_type exists in the DB but is no longer registered (e.g. plugin removed leaving orphan rows) will now be excluded from checksums/full-sync ranges and may never be repaired/synced. If that behavior change isn’t intended, consider deriving the allowlist from the DB’s distinct post_type values (then subtract the blacklist) or keeping the old NOT IN behavior for full-sync/range queries while using the allowlist only for the checksum path.
| * Get allowed post types for the posts checksum (all registered minus blacklist). | |
| * Used so the checksum query can use IN (allowed) instead of NOT IN (blacklist). | |
| * | |
| * Result is cached for the request to prevent unnecessary get_post_types() calls. | |
| * | |
| * @return array Allowed post type names (no DB query; get_post_types() is used). | |
| */ | |
| public static function get_allowed_post_types_for_checksum() { | |
| if ( null !== self::$cached_allowed_post_types_for_checksum ) { | |
| return self::$cached_allowed_post_types_for_checksum; | |
| } | |
| $all_types = get_post_types( array(), 'names' ); | |
| $blacklist = static::get_setting( 'post_types_blacklist' ); | |
| $allowed = array_diff( $all_types, $blacklist ); | |
| * Get allowed post types for the posts checksum (all post types in the DB minus blacklist). | |
| * Used so the checksum query can use IN (allowed) instead of NOT IN (blacklist). | |
| * | |
| * Result is cached for the request to prevent unnecessary database calls. | |
| * | |
| * @return array Allowed post type names. | |
| */ | |
| public static function get_allowed_post_types_for_checksum() { | |
| if ( null !== self::$cached_allowed_post_types_for_checksum ) { | |
| return self::$cached_allowed_post_types_for_checksum; | |
| } | |
| global $wpdb; | |
| // Fetch all distinct post_type values present in the posts table. | |
| $all_types = $wpdb->get_col( "SELECT DISTINCT post_type FROM {$wpdb->posts}" ); | |
| if ( ! is_array( $all_types ) ) { | |
| $all_types = array(); | |
| } | |
| $blacklist = static::get_setting( 'post_types_blacklist' ); | |
| $allowed = array_diff( $all_types, $blacklist ); |
There was a problem hiding this comment.
That is very valid.
There was a problem hiding this comment.
posts whose post_type exists in the DB but is no longer registered (e.g. plugin removed leaving orphan rows) will now be excluded from checksums/full-sync ranges
True, but I believe the current approach makes more sense actually :)
There was a problem hiding this comment.
Right. But I think we'd need to change how incremental sync syncs non-registered post types at the same time here. In testing a non-registered post type will get synced in incremental sync, and removed during a fix checksum, whereas on trunk it won't be removed.
Removing the non existent posts (or rather posts with no post type) from being synced via incremental sync would have a knock on effect - there have been 56 million synced jetpack_sync_save_post actions in the last month where the post_status is jetpack_sync_non_registered_post_type. Constituting approx 44000 sites]. For what reason do we need to sync 'non existent posts' anyway?
| public static function get_allowed_post_types_sql() { | ||
| $allowed = static::get_allowed_post_types_for_checksum(); | ||
| if ( empty( $allowed ) ) { | ||
| return '1 = 0'; // This is an SQL condition that is always false. | ||
| } | ||
| return 'post_type IN (\'' . implode( '\', \'', array_map( 'esc_sql', $allowed ) ) . '\')'; |
There was a problem hiding this comment.
get_allowed_post_types_sql() relies on get_allowed_post_types_for_checksum(), but the helper’s name/doc comment imply it’s checksum-specific while it’s now also the source of truth for full-sync/range filtering. Consider introducing a more general helper name (and delegating the checksum-named method to it) to avoid confusion about intended scope.
| $filter_values_count = is_countable( $filter['values'] ) ? count( $filter['values'] ) : 0; | ||
| if ( 0 === $filter_values_count ) { | ||
| $result[] = 'IN' === $filter['operator'] ? '1 = 0' : '1 = 1'; | ||
| break; | ||
| } |
There was a problem hiding this comment.
The new empty-list handling for IN/NOT IN is good defensive behavior, but it’s not covered by the existing Table_Checksum tests. Please add a unit/integration test that exercises an empty values array and asserts the generated filter SQL is valid (e.g. 1 = 0/1 = 1) and that checksum calculation does not error.
Fixes SYNC-278
Proposed changes:
post_type NOT IN(long blacklist) withpost_type IN(allowlist of allowed post types), for better performance (fewer string comparisons per row, better MySQL query optimisation withIN).Other information:
Jetpack product discussion
n/a
Does this pull request change what data or activity we track or use?
No.
Testing instructions:
post_modified_gmtfieldwpsh):gpr select * from wp_YOURBLOGID_posts where ID=POSTIDTo test the other two modified files:
class-posts.php: run a full sync of posts via the Jetpack debugger - that will hitclass-posts.php. Do this after following the same steps as earlier to modify a value on the remote DB, and confirm the value has changed.class-full-sync.phpfile, the test file should be enough here. Thelegacy-full-sync testsshould run for PHP 7.0 latest - in PR checks, so PR checks should pass.Changelog