-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Multicorejit unification #48326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multicorejit unification #48326
Conversation
9d4d725 to
956aa05
Compare
|
It may take me a couple days, but yes I'll take a look : ) Also @kouvel might be a good person to give it a look. Tiered compilation has some overlap with multicore JIT. |
|
Spent some time tonight learning about this code. Not done yet but haven't forgotten : ) |
|
@mangod9 yes, this is ready for review and merge |
noahfalk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Most of the comment are some stylistic suggestions but I do think removing the requirement for multiple processors should be evaluated by others.
@mangod9 - Ultimately if a bug was uncovered I wouldn't be the dev who owned fixing it. Whoever that dev would be its probably good if they get a chance to look at this before it is merged : )
src/coreclr/vm/multicorejitimpl.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
| // 5. Simple module name stored | |
| // 6. Simple module name stored |
src/coreclr/vm/multicorejit.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What enforces that this length assertion will be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an assertion before but the length limit was higher. I don't know how long signature lengths can get, this should probably be a real check and skip the method in that case.
src/coreclr/vm/multicorejit.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier the code enforced that sigSize < MAX_SIGNATURE_LENGTH, but does anything enforce that AlignUp(sigSize + 8, 4) < MAX_SIGNATURE_LENGTH?
src/coreclr/vm/multicorejit.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
| DWORD dwSize = ((DWORD)(dataSize + sizeof(DWORD) - 1) / sizeof(DWORD)) * sizeof(DWORD); | |
| DWORD dwSize = AlignUp(dataSize, sizeof(DWORD)); |
src/coreclr/vm/multicorejit.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On arm devices cpus might be enabled dynamically (cpu hotplug). Thus, during coreclr_initialize there might be only one enabled cpu, yet there'll be more when there's enough work to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gbalykov. My first thought is that if we're going to remove this check, we're going to want to limit the removal to Linux arm builds. It looks like per https://community.arm.com/developer/tools-software/oss-platforms/w/docs/529/cpuidle-hotplug, hotplug is only available on Linux.
My next thought is that I'm wanting to understand more about the hotplug behavior. Multi-Core JIT looks at the CPU count once when it gets invoked, and that's it. It sounds like we might need to modify Multi-Core JIT to be hotplug aware (if that's possible) so that we don't needlessly do aggressive assembly loading and jitting on the critical path if we want to support the hotplug scenario on arm. Is it possible to be hotplug aware?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option may be to make it configurable for now via clrconfigvalues.h and keep the default as it is currently. On CPU-limited environments where there is actually only one processor available, the background thread could otherwise steal a fair bit of CPU time from foreground work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option may be to make it configurable for now via clrconfigvalues.h
I've added this solution instead of check removal. By default 2 cpus will still be needed
src/coreclr/vm/multicorejitimpl.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The rest of this code uses PascalCasing for method names so I would align to that style
| static bool isMethod(unsigned data) | |
| static bool IsMethod(unsigned data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| MulticoreJitTrace(("ModuleRecord(%u) start module load", | |
| MulticoreJitTrace(("ModuleDependency(%u) start module load", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| MulticoreJitTrace(("ModuleRecord(%d) end module load, hr=%x", | |
| MulticoreJitTrace(("ModuleDependency(%d) end module load, hr=%x", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: if this is never false we may as well eliminate the if statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually would be part of the next PR, that's why initially I left it here even if it was dead code. Removed from this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block is dead code I think?
|
I'll take a look as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me, some comments below. Thanks!
src/coreclr/vm/multicorejitimpl.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to support this many modules? A large array of this many elements is allocated so if it's not necessary maybe something like 0x400 or 0x1000 would be reasonable. It looks like MAX_MODULES - 1 is also used as a mask, could update to use a separate constant MODULE_MASK that is specific to the packing format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that 0х1000 would be enough
src/coreclr/vm/multicorejitimpl.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the usage it looks like the max allowed signature length is actually 0xffff. To make it more clear I suggest changing the value to 0xffff (and usage to match), and maybe rename to SIGNATURE_LENGTH_MASK since it is also used as a mask.
src/coreclr/vm/multicorejitimpl.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be fairly fast to extract the record type with a shift and to check the ID, it doesn't seem to be worth making these powers of 2 just for that purpose (though I could have missed something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, what I wanted was to have simple bitfield check in RecorderInfo::IsMethod instead of shift and equality comparison. I agree that this is not critical at all, fixed
src/coreclr/vm/multicorejitimpl.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the uses of the fields data1, data2, and ptr are difficult to read/understand (the uses are spread out too much), I suggest encapsulating away the data fields by making them private, and use constructors and instance methods to operate on the data/packing/unpacking, would be more clear in usage and knowledge of the data formats would hopefully be limited to methods in the class. Using a union of structs as @noahfalk suggested would also be helpful.
src/coreclr/vm/multicorejitimpl.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_ModuleList could be allocated separately too for consistency, since that could also be a modestly sized array. My preference would be to keep that and m_JitInfoArray as inline arrays as before (MulticoreJitRecorder is new'ed anyway, and the behavior in some error conditions would be more consistent with before), though perhaps the inline arrays could be moved to be the last fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this could fail if count happened to be 0 and hr == COR_E_BADIMAGEFORMAT from above. Probably need a SUCCEEDED(hr) here and in the if expression below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should either be & (MAX_SIGNATURE_LENGTH - 1) (if its value is 0x10000), or & MAX_SIGNATURE_LENGTH (if its value is changed to 0xffff as I suggested in another comment). In any case it appears the length is not used, so it could just be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing! I've fixed this typo, but forgot to update it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is not used and could be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This'll become used in next PR. Removed from this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above regarding & MAX_SIGNATURE_LENGTH or & (MAX_SIGNATURE_LENGTH - 1), could be removed
src/coreclr/vm/multicorejit.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option may be to make it configurable for now via clrconfigvalues.h and keep the default as it is currently. On CPU-limited environments where there is actually only one processor available, the background thread could otherwise steal a fair bit of CPU time from foreground work.
956aa05 to
b5cb5ae
Compare
|
@noahfalk @kouvel Thanks for reviews! I've updated PR according to your notes. I've also measured mcj profile size with different modes and here are the results. For all cases there were no difference in performance between different modes. So, with current version of PR profile size increases on just 26-31%, instead of 100% increase on initial version of this PR. 1. x64 debug, console helloworld, no ni.dll:
2. tizen armel release, console helloworld, no ni.dll:
3. tizen armel release, crossgen2 for crossgen2.dll, no ni.dll:./corerun ./crossgen2/crossgen2.dll -r:`pwd`/*.dll -r:`pwd`/crossgen2/*.dll -O --parallelism=1 -o:/tmp/1.ni.dll `pwd`/crossgen2/crossgen2.dll
|
b5cb5ae to
9107365
Compare
Current multicorejit implementation in master has multiple flaws with NDirect methods: - exception might be thrown inside GetStubForInteropMethod at some point for NDirect method, which will kill background thread, thus, reducing effectiveness of multicorejit (for example, occurs when multicorejit is used with crossgen2) - some NDirect methods can lead to asserts during load inside GetStubForInteropMethod (for example, EvpMdCtxDestroy (0x6000044 token) from System.Security.Cryptography.Algorithms.dll)
9107365 to
3be92a6
Compare
…igure minimum allowed number of cpus for MultiCoreJit. On arm with cpu hotplug it should be set to 1.
3be92a6 to
339a561
Compare
This change reduces mcj profile size
This change reduces mcj profile size
339a561 to
4a3c5d2
Compare
kouvel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good to me, just one main comment and a couple of minor things. Thanks for sharing the data, that looks good to me as well.
src/coreclr/vm/multicorejit.cpp
Outdated
| for (LONG i = 0 ; i < m_JitInfoCount; i++) | ||
| { | ||
| for (LONG i = 0 ; i < m_GenericInfoCount; i++) | ||
| SigBuilder sigBuilder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be moved into the if block for generic methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| HRESULT hr = S_OK; | ||
|
|
||
| // Preprocessing Generic Methods | ||
| // Preprocessing Methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional comments below are no longer valid, could be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| } | ||
| else if (rcdTyp == MULTICOREJIT_GENERICMETHOD_RECORD_ID) | ||
| { | ||
| unsigned signatureLength = * (const unsigned short *) (((const unsigned *) pBuffer) + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be the intention in this part to check if there is enough space in the remaining buffer before decoding. From the loop condition we know that there is a DWORD worth of space but this is reading past that, so should check the remaining buffer length before dereferencing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
kouvel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
|
Failure looks unrelated, #52527 |
|
Thanks! |
* Fix assertion failure / crash in multi-core JIT - When the recorder times out it doesn't actually stop profiling, but writes out the profile - The app may later stop profiling, and then it tries to write the profile again - PR #48326 fairly expected that the profile is only written once (some state is mutated) - The non-timeout stop-profile path was also not stopping the timer - Fix for #53014 in main
* Add background type preloading based on multicorejit This is a second part of #48326 change, which enables handling of methods loaded from r2r images. Background thread of multicorejit now not only jits methods but also loads methods from R2R images. This allows to load types in background thread. This is required as part of #45748 change (specifically, #45748 (comment)), goal of which is to enable background type preloading using multicorejit.
This PR unifies non-generic/shared-generic methods with generic methods in multicorejit, allowing to keep the original jit order. This is required as part of #45748 change (specifically, #45748 (comment)), goal of which is to enable background type preloading using multicorejit.
Generic methods are stored in profile using their binary signature and non-generics methods are stored using method tokens. However, for non-generic methods record size has double (8 bytes now), which allowed to fit additional data. Thus, there's no limit on method token now (previously there was
METHODINDEX_MASK). Also, limit for module index has increased from 512 (MAX_MODULES) to 4096. Additional info can be stored for each method like JIT_BY_APP_THREAD_TAG or NO_JIT_TAG (NO_JIT_TAG will be added in further changes).Additionally, NDirect methods are removed, because their current implementation leads to asserts and exceptions. Also, few checks for number of cpus are configurable with env variables now, thus, allowing to use multicorejit on arm with cpu hotplug.
cc @alpencolt