Skip to content

Conversation

@dmsnell
Copy link
Member

@dmsnell dmsnell commented Nov 14, 2017

In this patch I have rewritten markCollisions with performance in
mind. Recently we were investigating freezes in Calypso (#19446) and
noticed that SitesList.sync() was getting called over and over again
and that this particular part of that process, a quadric operation in
terms of complexity, was potentially a big part of our CPU usage.

Although I could not reproduce the freezing behavior I went ahead and
tried to refactor this function so that it would remain linear with the
number of sites (well, it's O( 2 * N ) now because it iterates twice).

The major change is storing the known Jetpack sites by URL in a mutable
map for quick retrieval. A quick pass can add in the jetpack sites then
a second pass can quickly determine if there are non-matching sites
sharing the same URL.

That is to say, we have replaced loop iteration with key lookup.

In my tests this was more than a magnitude in improvement, but overall
insignificant because I was only seeing this function take up 4-5ms on
average to run, plus it only ran a few times. (the function after these
changes was taking about 0.25ms)

If, however, we get into a loop and call this function a few hundred
times (as appears to have happened in a performance trace) then we could
end up shaving off a few hundred milliseconds of freeze time which
would indeed be noticeable.

Testing

When I was running this locally I actually compared the two algorithms to
make sure that I was marking the identical set of sites for the old one as
I was with the new one. I removed the code from the checked-in code for
the obvious reason that it doesn't belong in production.

Scan through the list of sites and make sure that their domains look like
they are appropriate. I might lean on @enejb for some help understanding
how to best test this out and what setup we might need (like specific
combinations of sites/jetpack sites).

@dmsnell dmsnell added Framework [Type] Performance [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. labels Nov 14, 2017
@dmsnell dmsnell requested a review from blowery November 14, 2017 01:22
@matticbot
Copy link
Contributor

@dmsnell dmsnell requested review from enejb and klimeryk November 14, 2017 01:22
@klimeryk
Copy link
Contributor

🎉

@aidvu Do you have a case at hand to make sure that the changes from #17805 are not affected by this PR?

@dmsnell
Copy link
Member Author

dmsnell commented Nov 14, 2017

thanks for pulling in @aidvu @klimeryk - I should not that in its current form this persists the state across a Calypso session. if we have issues when sites become Jetpack during a session we can restart by clearing out the persistent data structures on each call to markCollisions - it should still be fast.

}

const jetpackSites = jetpackUrls.get( withoutHttp( site.URL ) );
const siteCount = jetpackSites.size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename siteCount to jetpackSitesCount.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see much value in it. Jetpack is almost a given considering the context around this line, but I don't care much one way or the other; found jetpackSiteCount needlessly verbose


// then find non-Jetpack sites sharing the same URL
sites.forEach( site => {
if ( ! site.jetpack ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this not be ( site.jetpack ) return? Since we would assume that site from then on should be a .com site and not a Jetpack site.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think you're right - thanks!

const jetpackSites = jetpackUrls.get( withoutHttp( site.URL ) );
const siteCount = jetpackSites.size;

if ( siteCount > 1 || ( siteCount === 1 || ! jetpackSites.has( site.ID ) ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to just test for siteCount > 0 here. No?
Since we are looking for jetpack sites that have the same url.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly I was trying to update the code without changing the behavior. The purpose of this function was unclear to me and I wasn't sure if we'd end up with sites sharing the same ids as the corresponding Jetpack sites. Therefore, I wrote it this way because if we have more than one site tracked we know there's another site with the same ID that isn't the one under inspection.

Honestly some context would be really helpful here! What is this function supposed to do, what example inputs would we expect to mark as collisions? what possibilities could we get here that we need to handle?

let siteSet = jetpackUrls.get( baseUrl );

if ( undefined === siteSet ) {
siteSet = new Set();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not just store the site url? Why do we need the ID? Is there a reason to think that they might not be unique?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below - I wrote this from a perspective of ignorance, so I have tried as hard as I can to preserve existing behavior and I think the existing behavior here depends on knowing what is stored in a URL. If we never get more than one and rely on that assumption then I'm all for simplifying.

// first make sure all Jetpack sites are accounted for
sites.forEach( site => site.jetpack && addJetpackSiteUrl( site ) );

// then find non-Jetpack sites sharing the same URL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably end early if there are no Jetpack sites in jetpackUrls

@enejb
Copy link
Member

enejb commented Nov 14, 2017

I though we had tests for this but I don't think we do anymore. Could we add some?

@aidvu
Copy link
Contributor

aidvu commented Nov 14, 2017

@enejb got here first I guess. I can't agree more on the tests part. The new code is a bit hard to read now.

@enejb
Copy link
Member

enejb commented Nov 16, 2017

Hey @dmsnell
Here is a bit more history and context.

We have a problem on .com where sites that used to be .com sites become Jetpack sites and vice versa. And when we try to show the ui of the site in calypso (which in url always uses the site url) we have a problem mapping the site -> to right site. Since a site url could really mean a .com site or a Jetpack site. In the ideal world that would never happen but sometimes users don't properly disconnect the Jetpack site or they don't properly map the domain. You can read more about the issue here p7rcWF-gm-p2.

We have done a number of things on the .com side to make it harder for the collision to occur but it still can happen. So this fix still needs to be in place the code in markCollisions function shouln't really be needed any more since it is based on the old siteslist. But I guess since it is still around we should try to make it faster/better. Here is the equivalent code for the (redux) state side which I think is easier to follow. And also has tests.

Hopefully some of the above makes things more clear. let me know if you have any questions.

@gwwar
Copy link
Contributor

gwwar commented Nov 16, 2017

Is there any reason why we couldn't move this work to the server?

@gwwar
Copy link
Contributor

gwwar commented Nov 16, 2017

Otherwise I +1 the recommendation for setting up tests that work with the old/new implementations.

@dmsnell
Copy link
Member Author

dmsnell commented Nov 21, 2017

We have a problem on .com where sites that used to be .com sites become Jetpack sites and vice versa. And when we try to show the ui of the site in calypso (which in url always uses the site url) we have a problem mapping the site -> to right site. Since a site url could really mean a .com site or a Jetpack site. In the ideal world that would never happen but sometimes users don't properly disconnect the Jetpack site or they don't properly map the domain. You can read more about the issue here p7rcWF-gm-p2.

We have done a number of things on the .com side to make it harder for the collision to occur but it still can happen. So this fix still needs to be in place the code in markCollisions function shouln't really be needed any more since it is based on the old siteslist. But I guess since it is still around we should try to make it faster/better. Here is the equivalent code for the (redux) state side which I think is easier to follow. And also has tests.

Thanks! I'm going to rephrase what you said @enejb as I try to understand the desired behavior:

For every site object marked as a dotcom site, we want to know if there is another site object (marked as a Jetpack site) which shares the same ID.

Is that accurate?

@enejb
Copy link
Member

enejb commented Nov 21, 2017

Is there any reason why we couldn't move this work to the server?

I think that would be a good idea. I think the reason why we didn't do it yet is because it would make the slow api endpoint slower. But I think it would be a good thing to test out.

Is that accurate?

Almost. For every site object marked as a dotcom site, we want to know if there is another site object (marked as a Jetpack site) which shares the same domain.

@aidvu
Copy link
Contributor

aidvu commented Nov 21, 2017

Almost. For every site object marked as a dotcom site, we want to know if there is another site object (marked as a Jetpack site) which shares the same domain.

Almost. 😛 At some point, we thought that the site_slug is unique, and that we can assign it whichever domain is set as Primary for that site.

This is problematic because we identify things using the site_slug in almost every link. What this code did was mark such sites as conflicting, and then some other part of the code would set the site_slug for conflicting WPCOM sites to the .wp.com subdomain. :)

You have an example in my account, if you want to check. A domain-only site blablatest1.com. Then I connected that domain to a Jetpack blog, and that blog got the same slug.

@aidvu
Copy link
Contributor

aidvu commented Nov 21, 2017

Is there any reason why we couldn't move this work to the server?

Love this idea, so we skip all the mumbo-jumbp mutating site objects.

@dmsnell dmsnell force-pushed the performance/sites-list-mark-collisions branch from e9430ce to 056a3e3 Compare November 21, 2017 18:37
@dmsnell
Copy link
Member Author

dmsnell commented Nov 21, 2017

@enejb and @aidvu: armed with more knowledge about what this ought to do I have updated to a much simpler and faster variant. I can update tests and add documentation if you like this approach

@dmsnell dmsnell force-pushed the performance/sites-list-mark-collisions branch 2 times, most recently from 8dfd6fb to ef29de0 Compare November 21, 2017 19:44
@gwwar
Copy link
Contributor

gwwar commented Nov 21, 2017

Thanks for adding tests @dmsnell! Code changes make sense to me.

@aidvu or @hoverduck do you happen to have conflicting sites handy for manual testing?

@aidvu
Copy link
Contributor

aidvu commented Nov 21, 2017

@gwwar my WP.com account.

@gwwar
Copy link
Contributor

gwwar commented Nov 21, 2017

To summarize a chat with @aidvu, to create a conflicting site:

  1. Purchase a domain from https://wordpress.com/domains/ (when purchasing just a domain this creates a site as a side-effect)
  2. Get hosting from someplace (eg Pressable)
  3. Set the primary of your site to this domain, install Jetpack and connect the domain you bought with it.

Had to move `markCollisions` itself out of the primary sites list file
in order to avoid ugly mocking of the `store` lib.
@dmsnell dmsnell force-pushed the performance/sites-list-mark-collisions branch from ef29de0 to 3a82de9 Compare November 23, 2017 01:19
@dmsnell
Copy link
Member Author

dmsnell commented Nov 23, 2017

to create a conflicting site:

@gwwar and @aidvu how will I know if this is right once I have a conflicting site?

@aidvu would you be able to do a quick test and confirm these changes?

@aidvu aidvu self-requested a review November 23, 2017 13:36
@aidvu aidvu added [Status] Ready to Merge and removed [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. labels Nov 23, 2017
Copy link
Contributor

@aidvu aidvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. :) Nice job Dennis.

Rewrote the test, ran it with the all code, looking good. Tested my own conflicting sites, looking good. 🙇‍♂️

hasCollision = some( collisions, function( someSite ) {
return (
someSite.jetpack &&
site.ID !== someSite.ID &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I'm thinking about is whether we need this or not? I don't think there's a chance for 2 sites to be jetpack and not-jetpack but share the same ID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the selector version which @enejb linked has done away with the IDs altogether, and after seeing his description it seemed like they were irrelevant.

@dmsnell dmsnell merged commit 7dbc8a7 into master Nov 23, 2017
@dmsnell dmsnell deleted the performance/sites-list-mark-collisions branch November 23, 2017 17:48
@enejb
Copy link
Member

enejb commented Nov 23, 2017

Thanks for working on this @dmsnell and making things faster!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants