-
Notifications
You must be signed in to change notification settings - Fork 2k
Performance: Optimize SitesList.markCollisions
#19735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
thanks for pulling in @aidvu @klimeryk - I should not that in its current form this persists the state across a Calypso session. if we have issues when sites become Jetpack during a session we can restart by clearing out the persistent data structures on each call to |
client/lib/sites-list/list.js
Outdated
| } | ||
|
|
||
| const jetpackSites = jetpackUrls.get( withoutHttp( site.URL ) ); | ||
| const siteCount = jetpackSites.size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename siteCount to jetpackSitesCount.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see much value in it. Jetpack is almost a given considering the context around this line, but I don't care much one way or the other; found jetpackSiteCount needlessly verbose
client/lib/sites-list/list.js
Outdated
|
|
||
| // then find non-Jetpack sites sharing the same URL | ||
| sites.forEach( site => { | ||
| if ( ! site.jetpack ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this not be ( site.jetpack ) return? Since we would assume that site from then on should be a .com site and not a Jetpack site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think you're right - thanks!
client/lib/sites-list/list.js
Outdated
| const jetpackSites = jetpackUrls.get( withoutHttp( site.URL ) ); | ||
| const siteCount = jetpackSites.size; | ||
|
|
||
| if ( siteCount > 1 || ( siteCount === 1 || ! jetpackSites.has( site.ID ) ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be able to just test for siteCount > 0 here. No?
Since we are looking for jetpack sites that have the same url.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly I was trying to update the code without changing the behavior. The purpose of this function was unclear to me and I wasn't sure if we'd end up with sites sharing the same ids as the corresponding Jetpack sites. Therefore, I wrote it this way because if we have more than one site tracked we know there's another site with the same ID that isn't the one under inspection.
Honestly some context would be really helpful here! What is this function supposed to do, what example inputs would we expect to mark as collisions? what possibilities could we get here that we need to handle?
client/lib/sites-list/list.js
Outdated
| let siteSet = jetpackUrls.get( baseUrl ); | ||
|
|
||
| if ( undefined === siteSet ) { | ||
| siteSet = new Set(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we not just store the site url? Why do we need the ID? Is there a reason to think that they might not be unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below - I wrote this from a perspective of ignorance, so I have tried as hard as I can to preserve existing behavior and I think the existing behavior here depends on knowing what is stored in a URL. If we never get more than one and rely on that assumption then I'm all for simplifying.
client/lib/sites-list/list.js
Outdated
| // first make sure all Jetpack sites are accounted for | ||
| sites.forEach( site => site.jetpack && addJetpackSiteUrl( site ) ); | ||
|
|
||
| // then find non-Jetpack sites sharing the same URL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably end early if there are no Jetpack sites in jetpackUrls
|
I though we had tests for this but I don't think we do anymore. Could we add some? |
|
@enejb got here first I guess. I can't agree more on the tests part. The new code is a bit hard to read now. |
|
Hey @dmsnell We have a problem on .com where sites that used to be .com sites become Jetpack sites and vice versa. And when we try to show the ui of the site in calypso (which in url always uses the site url) we have a problem mapping the site -> to right site. Since a site url could really mean a .com site or a Jetpack site. In the ideal world that would never happen but sometimes users don't properly disconnect the Jetpack site or they don't properly map the domain. You can read more about the issue here p7rcWF-gm-p2. We have done a number of things on the .com side to make it harder for the collision to occur but it still can happen. So this fix still needs to be in place the code in Hopefully some of the above makes things more clear. let me know if you have any questions. |
|
Is there any reason why we couldn't move this work to the server? |
|
Otherwise I +1 the recommendation for setting up tests that work with the old/new implementations. |
Thanks! I'm going to rephrase what you said @enejb as I try to understand the desired behavior:
Is that accurate? |
I think that would be a good idea. I think the reason why we didn't do it yet is because it would make the slow api endpoint slower. But I think it would be a good thing to test out.
Almost. For every site object marked as a dotcom site, we want to know if there is another site object (marked as a Jetpack site) which shares the same domain. |
Almost. 😛 At some point, we thought that the This is problematic because we identify things using the You have an example in my account, if you want to check. A domain-only site |
Love this idea, so we skip all the mumbo-jumbp mutating site objects. |
e9430ce to
056a3e3
Compare
8dfd6fb to
ef29de0
Compare
|
Thanks for adding tests @dmsnell! Code changes make sense to me. @aidvu or @hoverduck do you happen to have conflicting sites handy for manual testing? |
|
@gwwar my WP.com account. |
|
To summarize a chat with @aidvu, to create a conflicting site:
|
Had to move `markCollisions` itself out of the primary sites list file in order to avoid ugly mocking of the `store` lib.
ef29de0 to
3a82de9
Compare
aidvu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. :) Nice job Dennis.
Rewrote the test, ran it with the all code, looking good. Tested my own conflicting sites, looking good. 🙇♂️
| hasCollision = some( collisions, function( someSite ) { | ||
| return ( | ||
| someSite.jetpack && | ||
| site.ID !== someSite.ID && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing I'm thinking about is whether we need this or not? I don't think there's a chance for 2 sites to be jetpack and not-jetpack but share the same ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Thanks for working on this @dmsnell and making things faster! |
In this patch I have rewritten
markCollisionswith performance inmind. Recently we were investigating freezes in Calypso (#19446) and
noticed that
SitesList.sync()was getting called over and over againand that this particular part of that process, a quadric operation in
terms of complexity, was potentially a big part of our CPU usage.
Although I could not reproduce the freezing behavior I went ahead and
tried to refactor this function so that it would remain linear with the
number of sites (well, it's
O( 2 * N )now because it iterates twice).The major change is storing the known Jetpack sites by URL in a mutable
map for quick retrieval. A quick pass can add in the jetpack sites then
a second pass can quickly determine if there are non-matching sites
sharing the same URL.
That is to say, we have replaced loop iteration with key lookup.
In my tests this was more than a magnitude in improvement, but overall
insignificant because I was only seeing this function take up 4-5ms on
average to run, plus it only ran a few times. (the function after these
changes was taking about 0.25ms)
If, however, we get into a loop and call this function a few hundred
times (as appears to have happened in a performance trace) then we could
end up shaving off a few hundred milliseconds of freeze time which
would indeed be noticeable.
Testing
When I was running this locally I actually compared the two algorithms to
make sure that I was marking the identical set of sites for the old one as
I was with the new one. I removed the code from the checked-in code for
the obvious reason that it doesn't belong in production.
Scan through the list of sites and make sure that their domains look like
they are appropriate. I might lean on @enejb for some help understanding
how to best test this out and what setup we might need (like specific
combinations of sites/jetpack sites).