-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Reduce number of jump-stubs on ARM64 via smaller preserved space #63842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -66,7 +66,12 @@ void ExecutableAllocator::InitLazyPreferredRange(size_t base, size_t size, int r | |
| // to coreclr.dll. This avoids having to create jump stubs for calls to | ||
| // helpers and R2R images loaded close to coreclr.dll. | ||
| // | ||
| #ifdef TARGET_XARCH | ||
| SIZE_T reach = 0x7FFF0000u; | ||
| #else | ||
| // Smaller size for ARM64 where relative calls/jumps only work within 128MB | ||
| SIZE_T reach = 0x7FF0000u; | ||
|
||
| #endif | ||
|
|
||
| // We will choose the preferred code region based on the address of coreclr.dll. The JIT helpers | ||
| // in coreclr.dll are the most heavily called functions. | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much of this range gets consumed in a real-world ASP.NET app?
Note that we map all sorts of stuff into this range, including R2R images. I would expect that 128MB gets exhausted fairly quickly given how things work today.
I agree that this fix works well for micro-benchmarks that are very unlikely to exhaust the 128MB range.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will try to investigate, I guess the idea that the hot code (tier1) should better be closer to VM's FCalls by default?
For large apps I guess we still want to try to address your suggestion with emitting direct addresses without jump-stubs in tier1 #62302 (I have a draft)
For R2R-only it can be improved by pgo + method sorting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think R2R code today benefits from being close to coreclr as all 'external' calls/branches have to go through indirection cells anyway. This may have been different in the days of fragile ngen?
Please correct me if I'm wrong @jkotas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it is just for large apps. With TC enabled, all managed->managed method calls go through a precode that has the exact same instructions as jump stub and so it will introduce similar bottleneck as what you have identified.
R2R images are generally smaller than 128MB. You can only sort within the image, so the sorting won't help with jump stubs. (Sorting within image is still good for locality.)
Also, once we get this all fixed, we may want to look at retuning the inliner. My feeling is that the inliner expands the code too much these days. Some of it may be just compensating for the extra method call overhead that we are paying today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calls from runtime generated stubs and JITed code to R2R code still benefit from the two being close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these not go through an indirection when tiering is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - when tiering is enabled. No - when tiering is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll start from your suggestion to emit direct calls for T1 Caller calls T1 Callee (not as part of this PR)