Reduce number of jump-stubs on ARM64 via smaller preserved space #63842
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -191,10 +191,11 @@ class ExecutableMemoryAllocator | ||
| // This constant represent the max size of the virtual memory that this allocator | ||
| // will try to reserve during initialization. We want all JIT-ed code and the | ||
| // entire libcoreclr to be located in a 2GB range. | ||
| - static const int32_t MaxExecutableMemorySize = 0x7FFF0000; | ||
| + static const int32_t MaxExecutableMemorySize = 0x7FFF0000;// 2GB - 64KB | ||
| #else | ||
| - static const int32_t CoreClrLibrarySize = 16 * 1024 * 1024; | ||
| - static const int32_t MaxExecutableMemorySize = 128 * 1024 * 1024; | ||
| + // Smaller values for ARM64 where relative calls/jumps only work within 128MB | ||
| + static const int32_t CoreClrLibrarySize = 32 * 1024 * 1024; | ||
| + static const int32_t MaxExecutableMemorySize = 0x7FF0000; // 128MB - 64KB | ||
|
Member
How much of this range gets consumed in a real-world ASP.NET app? Note that we map all sorts of stuff into this range, including R2R images. I would expect that 128MB gets exhausted fairly quickly given how things work today. I agree that this fix works well for micro-benchmarks that are very unlikely to exhaust the 128MB range.
Member
Author
I'll try to investigate. I guess the idea is that hot code (tier1) should be close to the VM's FCalls by default? For R2R-only code, could this be improved by PGO plus method sorting?
Member
I do not think R2R code today benefits from being close to coreclr as all 'external' calls/branches have to go through indirection cells anyway. This may have been different in the days of fragile ngen?
Member
I do not think it is just for large apps. With TC enabled, all managed->managed method calls go through a precode that has the exact same instructions as a jump stub, so it will introduce a similar bottleneck to the one you have identified.
R2R images are generally smaller than 128MB. You can only sort within the image, so the sorting won't help with jump stubs. (Sorting within image is still good for locality.) Also, once we get this all fixed, we may want to look at retuning the inliner. My feeling is that the inliner expands the code too much these days. Some of it may be just compensating for the extra method call overhead that we are paying today.
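The jump-stub discussion above depends on one hardware fact: the ARM64 `B`/`BL` instructions encode a signed 26-bit word offset, so a direct branch reaches only targets within roughly ±128MB of the branch itself. The following is a minimal sketch of that reachability test, not the runtime's actual code; the helper names `fitsInDirectBranch` and `needsJumpStub` are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// ARM64 B/BL hold a signed 26-bit immediate scaled by 4, so the byte
// displacement must be a multiple of 4 in the range [-128MB, +128MB - 4].
constexpr int64_t kBranchRange = int64_t{1} << 27; // 128MB

// Hypothetical helper: true if a direct branch located at `from`
// can encode a jump to `to`.
constexpr bool fitsInDirectBranch(uint64_t from, uint64_t to)
{
    const int64_t delta = static_cast<int64_t>(to - from);
    return (delta % 4 == 0) && delta >= -kBranchRange && delta < kBranchRange;
}

// Hypothetical helper: a target outside the direct range needs an
// indirect hop, which is the role a jump stub plays in this discussion.
constexpr bool needsJumpStub(uint64_t from, uint64_t to)
{
    return !fitsInDirectBranch(from, to);
}
```

Under this sketch, any caller and callee that both sit inside one reserved region smaller than 128MB can always reach each other directly, which is the property the smaller reservation aims for.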
Member
Calls from runtime generated stubs and JITed code to R2R code still benefit from the two being close.
Member
Do these not go through an indirection when tiering is enabled?
Member
Yes - when tiering is enabled. No - when tiering is disabled.
Member
Author
I'll start with your suggestion to emit direct calls when a T1 caller calls a T1 callee (not as part of this PR). |
||
| #endif | ||
| static const int32_t MaxExecutableMemorySizeNearCoreClr = MaxExecutableMemorySize - CoreClrLibrarySize; | ||
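To make the arithmetic behind the new ARM64 values concrete, here is a small sketch that reproduces the three constants outside the class and checks them at compile time. The values are copied from the diff; the `KB` and `MB` helpers are assumptions added for readability, and reading `CoreClrLibrarySize` as space set aside for libcoreclr is inferred from its name.

```cpp
#include <cstdint>

// Illustrative helpers, not part of the original source.
constexpr int64_t KB = 1024;
constexpr int64_t MB = 1024 * KB;

// ARM64 values as they appear in the diff.
constexpr int32_t CoreClrLibrarySize = 32 * 1024 * 1024;
constexpr int32_t MaxExecutableMemorySize = 0x7FF0000; // 128MB - 64KB
constexpr int32_t MaxExecutableMemorySizeNearCoreClr =
    MaxExecutableMemorySize - CoreClrLibrarySize;

// The whole reservation stays just under the 128MB reach of a relative branch.
static_assert(MaxExecutableMemorySize == 128 * MB - 64 * KB, "128MB - 64KB");

// After subtracting the 32MB library size, 96MB - 64KB remains.
static_assert(MaxExecutableMemorySizeNearCoreClr == 96 * MB - 64 * KB,
              "96MB - 64KB");
```

So with these values the near-coreclr budget is 100,597,760 bytes, just under 96MB.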