
@cammonro (Contributor) commented Jun 10, 2025

Summary

This PR aims to reduce the overhead of calculating collection size. As catalogs grow, the size query becomes increasingly slow. Eliminating redundant COUNT() operations shaves some processing time here.
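To illustrate what eliminating redundant COUNT() operations means, here is a minimal, language-neutral sketch in Python (the extension itself is PHP/Magento). It assumes a hypothetical `count_fn` standing in for the collection's size query and runs it once per pass, deriving every page offset from that single result instead of re-counting per page. `CountingQuery` and `plan_pages` are illustrative names, not part of the extension.

```python
import math
from typing import Callable, List, Tuple


class CountingQuery:
    """Hypothetical stub for an expensive COUNT() query; records how often it runs."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.calls = 0

    def __call__(self) -> int:
        self.calls += 1
        return self.size


def plan_pages(count_fn: Callable[[], int], page_size: int) -> List[Tuple[int, int, int]]:
    """Run the size query once and derive every (page, offset, limit) from it.

    The final page may return fewer than `limit` rows.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total = count_fn()  # the only COUNT() issued for this pass
    page_count = math.ceil(total / page_size)
    return [(page, (page - 1) * page_size, page_size) for page in range(1, page_count + 1)]


query = CountingQuery(2500)
print(plan_pages(query, 1000), query.calls)
# prints [(1, 0, 1000), (2, 1000, 1000), (3, 2000, 1000)] 1
```

The page plan also carries the page number, which is what a log line such as "page 2 of 3" would need.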

This PR includes the following changes:

  • Removed rebuildEntityIds(), a relic of Algolia\AlgoliaSearch\Helper\Data::rebuildStoreProductIndex()
    • Re-paging an already-chunked array of entityIds is unnecessary
    • The product set on a delta will never exceed the maximum page size because of potential parent aggregation for configurables, etc.
  • Removed collection cloning: it was only needed for in-process collection paging on deltas, which we no longer do
  • Removed collection reset operations for memory management: collection use is now limited to function scope
  • Paging information is now included on deltas for observability: previously, logs misreported page 1 for every job on a large delta
    [screenshot: 2025-06-10_10-44-57]
  • Removed the areParentsLoaded member boolean: this internal state was observed to cause problems when processing jobs across stores (parents would not be loaded for subsequent stores), and behavior should now be consistent
    [screenshot: 2025-06-10_10-50-23]
  • Moved emulation to encompass the entire collection build, ensuring full alignment with storefront behavior
  • Introduced an optional cache for storing collection size in Redis
  • Added a plugin that purges the cached collection size when products are added, deleted, or edited in a way that affects storefront visibility
  • Added unit tests
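The optional size cache and its purge plugin follow a cache-aside pattern. The sketch below is a hypothetical Python illustration, not the extension's PHP code: a dict stands in for Redis, and the key format, `VISIBILITY_FIELDS`, and the `on_product_change` hook are all assumptions. A size is computed on a cache miss and reused afterwards; adding or deleting a product, or changing a visibility-relevant field, purges the cached values.

```python
from typing import Callable, Dict, Optional

# Hypothetical set of attributes whose change can alter storefront visibility.
VISIBILITY_FIELDS = ("status", "visibility")


class CollectionSizeCache:
    """Cache-aside store for per-store collection sizes (dict stands in for Redis)."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self._store: Dict[str, int] = {}

    @staticmethod
    def _key(store_id: int) -> str:
        return f"collection_size_{store_id}"  # hypothetical key format

    def get_size(self, store_id: int, count_fn: Callable[[], int]) -> int:
        if not self.enabled:
            return count_fn()  # cache is optional: fall back to a live COUNT()
        key = self._key(store_id)
        cached = self._store.get(key)
        if cached is None:  # `is None` so a cached size of 0 still counts as a hit
            cached = count_fn()
            self._store[key] = cached
        return cached

    def purge(self, store_id: Optional[int] = None) -> None:
        """Drop one store's entry, or every entry when no store is given."""
        if store_id is None:
            self._store.clear()
        else:
            self._store.pop(self._key(store_id), None)


def affects_storefront(before: Optional[dict], after: Optional[dict]) -> bool:
    """True for an add (no `before`), a delete (no `after`), or a visibility-field change."""
    if before is None or after is None:
        return True
    return any(before.get(f) != after.get(f) for f in VISIBILITY_FIELDS)


def on_product_change(cache: CollectionSizeCache, before: Optional[dict], after: Optional[dict]) -> None:
    """Stand-in for the save/delete plugin: purge only when visibility may change."""
    if affects_storefront(before, after):
        cache.purge()  # coarse purge: a product may belong to several stores
```

Purging every store's entry on a relevant change is deliberately coarse, since one product can be assigned to several stores; a per-store purge is possible but needs extra bookkeeping.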

Please note that I chose not to address all stock scenarios for cache invalidation until we address MSI more thoroughly. That can be re-evaluated later.

Result

Unit tests: [screenshot]

Caching of collection size in Redis: [screenshot]

New indexing cache: [screenshot: 2025-06-10_11-22-57]

Full index job generation, EXTRA LARGE fixture, uncached: [screenshot]

Full index job generation, EXTRA LARGE fixture, CACHED: [screenshot]

@cammonro cammonro changed the base branch from release/3.16.0-dev to release/3.17.0-dev June 10, 2025 15:06
@cammonro cammonro marked this pull request as ready for review June 10, 2025 16:46
@cammonro cammonro requested a review from damcou June 10, 2025 16:46
@damcou (Contributor) left a comment

Great job! 🙌

Unit tests are passing, and so are the integration tests for product indexing: [screenshot]

Added my thoughts; let me know if they make sense :)

@cammonro (Contributor, Author)

Thanks for the great feedback, @damcou! I will implement the changes you've proposed (though some might go into a separate ticket).

@cammonro cammonro requested a review from damcou June 12, 2025 13:22
@damcou (Contributor) left a comment

Tests are passing, LGTM 🚀

@cammonro cammonro merged commit d667852 into release/3.17.0-dev Jun 12, 2025
3 checks passed
@cammonro cammonro deleted the feat/MAGE-1083-collection-size-handling branch June 12, 2025 14:45