Fix VS4Mac crash report and core dump generation perf problems #60205
Branch updated from 8e9de92 to 925ad52.
hoyosjs left a comment:
A couple of questions; otherwise LGTM.
Should we try to poison this in the case where the pointer is null? And did you find a case where it actually was null? It should only be null if BACKGROUND_GC is not defined.
I attached to the WebApp debuggee process from the SOS tests (they were segfaulting on this line), and the g_gcDacGlobals pointer was null. I'm not sure how this is happening unless BACKGROUND_GC is not defined, as you said. The actual GC variable it is supposed to point to (gc_heap::current_c_gc_state) was 2 (c_gc_state_free). I should probably set it to c_gc_state_free if it is null.
/cc: @Maoni0 @cshung, do you have any idea why g_gcDacGlobals->current_c_gc_state isn't initialized?
It looks like I accidentally moved that into the !MULTIPLE_HEAPS block, so it fails for server GC. It should work now that I've pushed the fix.
Is detailsData->current_c_gc_state initialized to NULL somewhere upstream, so that it doesn't end up with a random value if the condition is false?
Yes, I'm going to push a change that initializes it to c_gc_state_free.
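A minimal sketch of the defensive read this thread converges on: the state starts out as `c_gc_state_free` and is only overwritten when the pointer was actually populated, so a configuration that skipped the initialization (such as the server GC case caused by the `!MULTIPLE_HEAPS` placement) still reports a sane value. `GcDacVarsSketch` and `ReadCurrentGcState` are hypothetical names, not the runtime's real DAC code; only the value 2 for `c_gc_state_free` comes from the comments above.

```cpp
// Hypothetical model, not the runtime's real types. Only the numeric value
// c_gc_state_free == 2 comes from the debugging session described above.
enum c_gc_state_sketch
{
    c_gc_state_marking = 0,
    c_gc_state_planning = 1,
    c_gc_state_free = 2
};

// Simplified stand-in for the DAC globals table: the runtime is expected to
// store a pointer to the GC's current_c_gc_state variable here.
struct GcDacVarsSketch
{
    const int* current_c_gc_state = nullptr;
};

// Returns the background GC state the DAC would report. The result starts as
// c_gc_state_free, so it never holds an uninitialized value, and it is only
// overwritten when both the table and the field pointer were populated.
int ReadCurrentGcState(const GcDacVarsSketch* globals)
{
    int state = c_gc_state_free;
    if (globals != nullptr && globals->current_c_gc_state != nullptr)
    {
        state = *globals->current_c_gc_state;
    }
    return state;
}
```

For example, a default-constructed `GcDacVarsSketch` yields `c_gc_state_free`, while one whose pointer targets a variable holding `c_gc_state_planning` yields 1.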
Branch updated from bf3bfda to 83df56f.
@hoyosjs can you review the opt-in env var code in crashinfo.cpp line 295? Thanks.
This is so the crash report JSON is written and available before the core dump, which for VS4Mac currently takes 4 minutes.
by createdump itself for heap dumps, the sometimes slow (4 minutes for VS4Mac) heap enum memory region phase is changed to the faster normal one. It adds the necessary DAC globals, etc., without the costly assembly, module, class, and type runtime data structure enumeration.
…r to mitigate the risk of something missing from these dumps.
Branch updated from 83df56f to f93a60e.
```cpp
// enumeration.
if (minidumpType & MiniDumpWithPrivateReadWriteMemory)
{
    char* fastHeapDumps = getenv("COMPlus_DbgEnableFastHeapDumps");
```
The only part I don't like about this is that we are back to adding debt around `COMPlus_` vs `DOTNET_`. It's fixed in main, and I don't want to regress this. Granted, this is in 7.0, and I don't know whether it's better handled in a follow-up PR or in this one, ignoring that commit on the cherry-pick. Basically: `#include <clrconfignocache.h>`, then `CLRConfigNoCache fastHeapDumpCfg = CLRConfigNoCache::Get("DbgEnableFastHeapDumps", /*noprefix*/ false, &getenv);`, and then `if (fastHeapDumpCfg.IsSet() && fastHeapDumpCfg.TryAsInteger(10, x) && x == 1)`.
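To illustrate the prefix handling the reviewer is asking for, here is a minimal standalone sketch. It assumes `DOTNET_` takes precedence over the legacy `COMPlus_` prefix, which is my understanding of the convention in main rather than something stated in this thread. `GetRuntimeConfig` and `IsFastHeapDumpEnabled` are hypothetical helpers, not the real `CLRConfigNoCache` API, and the lookup function is injected so the logic can be tested without modifying the process environment.

```cpp
#include <cerrno>
#include <cstdlib>
#include <functional>
#include <string>

// Injectable environment lookup (std::getenv in production) so the logic can
// be exercised without touching the real process environment.
using EnvLookup = std::function<const char*(const char*)>;

// Hypothetical helper, not the real CLRConfigNoCache API: looks up `name`
// with the DOTNET_ prefix first and falls back to the legacy COMPlus_ prefix.
const char* GetRuntimeConfig(const char* name, const EnvLookup& lookup)
{
    const char* const prefixes[] = { "DOTNET_", "COMPlus_" };
    for (const char* prefix : prefixes)
    {
        const std::string key = std::string(prefix) + name;
        if (const char* value = lookup(key.c_str()))
        {
            return value;
        }
    }
    return nullptr;
}

// True only when the setting exists and parses completely as the base-10
// integer 1, similar in spirit to `TryAsInteger(10, x) && x == 1`.
bool IsFastHeapDumpEnabled(const EnvLookup& lookup)
{
    const char* value = GetRuntimeConfig("DbgEnableFastHeapDumps", lookup);
    if (value == nullptr || *value == '\0')
    {
        return false;
    }
    char* end = nullptr;
    errno = 0;
    const long parsed = std::strtol(value, &end, 10);
    return errno == 0 && *end == '\0' && parsed == 1;
}
```

In production the lookup would just forward to the C library, e.g. `IsFastHeapDumpEnabled([](const char* n) -> const char* { return std::getenv(n); })`. With this ordering, `DOTNET_DbgEnableFastHeapDumps=0` wins over `COMPlus_DbgEnableFastHeapDumps=1`.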
Yes, I don't like that either. After this gets backported to 6.0, I'm going to either move it to a command-line option and handle the env var in the PAL launch code, or use CLRConfigNoCache directly.
@mikem8361 is there a flag so we can run
@kdubau you would need to set the COMPlus_DbgEnableFastHeapDumps=1 env var before launching createdump. Are you aware that you will need to launch createdump as superuser? It may be better to use the Microsoft.Diagnostics.NETCore.Client.WriteDump() API from the diagnostics repo to generate these "hang" dumps, because it doesn't have that requirement: it sends a message over the diagnostic server IPC channel. I just tried to verify that this would work against the VS4Mac bundle that Greg sent me, and it seems this IPC channel isn't there or isn't enabled. I'm not sure what is going on there.
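A hypothetical sketch of launching createdump with the opt-in set, for a native host that wants to do this programmatically. `WithFastHeapDumps` and `RunCreatedump` are illustrative names, not part of createdump or the runtime. Only the target PID is passed on the command line, since this sketch does not assume any other createdump options, and per the comment above the process still has to run as superuser.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <string>
#include <vector>

// Returns a copy of `env` ("NAME=value" strings) with
// COMPlus_DbgEnableFastHeapDumps=1 set, replacing any existing value.
std::vector<std::string> WithFastHeapDumps(std::vector<std::string> env)
{
    const std::string prefix = "COMPlus_DbgEnableFastHeapDumps=";
    for (std::string& entry : env)
    {
        if (entry.compare(0, prefix.size(), prefix) == 0)
        {
            entry = prefix + "1";
            return env;
        }
    }
    env.push_back(prefix + "1");
    return env;
}

// Hypothetical launcher: runs `<createdumpPath> <pid>` with the given
// environment and returns createdump's exit code (or -1 on failure).
// Only the PID is passed; no other createdump options are assumed here.
int RunCreatedump(const std::string& createdumpPath, pid_t targetPid,
                  const std::vector<std::string>& env)
{
    std::string pidText = std::to_string(targetPid);
    std::vector<char*> argv = { const_cast<char*>(createdumpPath.c_str()),
                                const_cast<char*>(pidText.c_str()), nullptr };
    std::vector<char*> envp;
    for (const std::string& entry : env)
    {
        envp.push_back(const_cast<char*>(entry.c_str()));
    }
    envp.push_back(nullptr);

    pid_t child = 0;
    if (posix_spawn(&child, createdumpPath.c_str(), nullptr, nullptr,
                    argv.data(), envp.data()) != 0)
    {
        return -1;
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child)
    {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Building the child environment explicitly, rather than calling `setenv` in the parent, keeps the opt-in scoped to the createdump process.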
@mikem8361 we found this same issue. John and I have been looking into it and found it is related to fork/exec when the terminal pad is started. We have a working alternative solution, but with the build you have, just don't open a solution: start vsmac and then you can connect to the diagnostic pipes. I'll ping you with a newer build.
/backport to release/6.0
Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1334586191
@mikem8361 backporting to release/6.0 failed, the patch most likely resulted in conflicts:
$ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Refactor the DAC enumerate memory region phase out of gather crash info
.git/rebase-apply/patch:224: space before tab in indent.
TRACE("MODULE: %" PRIA PRIx64 " dyn %d inmem %d file %d pe %" PRIA PRIx64 " pdb %" PRIA PRIx64, (uint64_t)moduleData.LoadedPEAddress, moduleData.IsDynamic,
warning: 1 line adds whitespace errors.
Using index info to reconstruct a base tree...
M src/coreclr/debug/createdump/crashinfo.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/debug/createdump/crashinfo.cpp
CONFLICT (content): Merge conflict in src/coreclr/debug/createdump/crashinfo.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Refactor the DAC enumerate memory region phase out of gather crash info
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128
Please backport manually!
This is a VS4Mac show stopper. The performance (4 minutes or so) of taking a core dump when VS4Mac crashes or hangs is unacceptable.
Backport of dotnet#60205.
Refactor the DAC enumerate memory region phase out of gather crash info. This is so the crash report JSON is written and available before the core dump, which for VS4Mac currently takes 4 minutes.
Since on both Linux and macOS all the RW regions have already been added by createdump itself for heap dumps, the sometimes slow (4 minutes for VS4Mac) heap enum memory region phase is changed to the faster normal one. It adds the necessary DAC globals, etc., without the costly assembly, module, class, and type runtime data structure enumeration.
Fast heap dumps are opted into with the COMPlus_DbgEnableFastHeapDumps=1 env var to mitigate the risk of something missing from these dumps.
Tested creating a crash report/core dump against the VS4Mac process. Ran all the SOS tests on macOS and Linux against this change.
Risk is low, since the riskiest part is only enabled by an opt-in env var.
Refactor the DAC enumerate memory region phase out of gather crash info. This is so the crash report JSON is written and available before the core dump, which for VS4Mac currently takes 4 minutes. After this change, both the crash report and the core dump take 3-5 seconds.
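One reading of this refactoring, as a hypothetical sketch: the step names below are illustrative stand-ins, not createdump's actual functions, and the order is inferred from the description above (the crash report is written before the potentially slow DAC enumeration, which now runs only right before the dump is written). Each step records its name so the ordering can be checked.

```cpp
#include <string>
#include <vector>

// Hypothetical outline of the reordered flow; not createdump's real code.
struct DumpPipelineSketch
{
    std::vector<std::string> log;

    // Threads, modules, RW regions, etc. No DAC memory enumeration here anymore.
    void GatherCrashInfo() { log.push_back("gather crash info"); }
    void WriteCrashReportJson() { log.push_back("write crash report"); }
    // The phase that could take minutes for VS4Mac.
    void EnumerateDacMemory() { log.push_back("DAC enumerate memory regions"); }
    void WriteCoreDump() { log.push_back("write core dump"); }

    void Run(bool crashReportEnabled)
    {
        GatherCrashInfo();
        // The crash report no longer waits on the DAC enumeration phase...
        if (crashReportEnabled)
        {
            WriteCrashReportJson();
        }
        // ...which now runs separately, just before the dump is written.
        EnumerateDacMemory();
        WriteCoreDump();
    }
};
```

The point of the split is that anything consuming the JSON report can start as soon as it is written, regardless of how long the dump itself takes.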
Since on both Linux and macOS all the RW regions have already been added by createdump itself for heap dumps, the sometimes slow heap enum memory region phase is changed to the faster "normal" (MiniDumpNormal) one. It adds the necessary DAC globals, etc., without the costly assembly, module, class, and type runtime data structure enumeration.
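A minimal sketch of the selection logic described above. The `MiniDumpWithPrivateReadWriteMemory` check mirrors the diff excerpt quoted in the review; the bit values are the documented Windows `MINIDUMP_TYPE` ones, and that createdump uses identical values is an assumption. `DacEnumMode` and `SelectDacEnumMode` are hypothetical names, and `fastHeapDumpsOptIn` stands for the COMPlus_DbgEnableFastHeapDumps=1 opt-in mentioned in the backport description.

```cpp
#include <cstdint>

// Windows MINIDUMP_TYPE bit values (dbghelp). Assumed, not verified, to match
// what createdump uses on Linux/macOS.
constexpr std::uint32_t MiniDumpNormal = 0x00000000;
constexpr std::uint32_t MiniDumpWithPrivateReadWriteMemory = 0x00000200;

// Hypothetical names for the two DAC enumeration depths discussed above.
enum class DacEnumMode
{
    Normal, // DAC globals etc.; cheap
    Heap    // also walks assemblies, modules, classes, types; can be slow
};

// For heap dumps createdump has already added every RW region itself, so with
// the opt-in set the cheaper normal enumeration is enough. Other dump kinds
// (e.g. triage) are out of scope and simply map to Normal in this sketch.
DacEnumMode SelectDacEnumMode(std::uint32_t minidumpType, bool fastHeapDumpsOptIn)
{
    const bool isHeapDump = (minidumpType & MiniDumpWithPrivateReadWriteMemory) != 0;
    if (isHeapDump && !fastHeapDumpsOptIn)
    {
        return DacEnumMode::Heap;
    }
    return DacEnumMode::Normal;
}
```

Keeping the slow path as the default when the opt-in is absent matches the stated goal of limiting the risk of something missing from these dumps.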
This is a show stopper for the VS4Mac team.