Skip to content

Conversation

@mikem8361
Copy link
Contributor

@mikem8361 mikem8361 commented Oct 8, 2021

Refactor the DAC enumerate memory region phase out of gather crash info. This is so the crash report json is written and available before the core dump which for VS4Mac currently takes 4 minutes. After this change both the crash report and core dump take 3 - 5 secs.

Since on both Linux and MacOS all the RW regions have been already added by createdump itself for heap dumps, then the sometimes slow heap enum memory region is changed to the faster "normal" (MiniDumpNormal) one. It adds necessary DAC globals, etc. without the costly assembly, module, class, type runtime data structure enumeration.

This is a show stopper for the VS4Mac team.

@mikem8361 mikem8361 requested review from hoyosjs and janvorli October 8, 2021 22:45
@mikem8361 mikem8361 self-assigned this Oct 8, 2021
@mikem8361 mikem8361 changed the title Refactor the DAC enumerate memory region phase out of gather crash info Fix VS4Mac crash report and core dump generation perf problems Oct 9, 2021
@mikem8361
Copy link
Contributor Author

/cc: @noahfalk, @kdubau, @tommcdon

Copy link
Member

@hoyosjs hoyosjs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple questions, otherwise LGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to poison this in the case the pointer is null? And did you find a case where this was true? It should only be true if BACKGROUND_GC is not true.

Copy link
Contributor Author

@mikem8361 mikem8361 Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attached to the WebApp debuggee process from the SOS tests (they were segfault'ing on this line) and the g_gcDacGlobals pointer was null. I'm not sure how this is happening unless BACKGROUND_GC is true like you said. The actual GC variable (gc_heap::current_c_gc_state) it is suppose to be pointing to was 2 (c_gc_state_free). I probably should set it to c_gc_state_free if it is null.

/cc: @Maoni0 @cshung do you have any clue why g_dacDacGlobals->current_c_gc_state isn't initialized?

Copy link
Contributor

@cshung cshung Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like I accidentally moved that into !MULTIPLE_HEAP so it fails for server GC. It should work now after I pushed the fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the detailsData->current_c_gc_state initialized to NULL somewhere upstream so that it doesn't end up having a random valueif the condition is false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm going to push a change that initializes it to c_gc_state_free.

@mikem8361
Copy link
Contributor Author

@hoyosjs can you review the opt-in env var code in crashinfo.cpp line 295? Thanks.

Mike McLaughlin and others added 5 commits October 11, 2021 23:40
This is so the crash report json is written and available before the core
dump which for VS4Mac currently takes 4 minutes.
by createdump itself for heap dumps, then the sometimes slow (4 minutes for
VS4Mac) heap enum memory region is changed to the faster normal one. It adds
necessary DAC globals, etc. without the costly assembly, module, class, type
runtime data structure enumeration.
…r to mitigate the

risk of something missing from these dumps.
// enumeration.
if (minidumpType & MiniDumpWithPrivateReadWriteMemory)
{
char* fastHeapDumps = getenv("COMPlus_DbgEnableFastHeapDumps");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only part I don't like about this is we are back to adding debt around the COMPlus_ vs DOTNET_. It's fixed in main - and I don't want to regress this. Granted - this is in 7.0 and I don't know if it's better in a follow up PR or on this one and just ignore that commit on the cherrypick. Basically #include <clrconfignocache.h> and CLRConfigNoCache fastHeapDumpCfg = CLRConfigNoCache::Get("DbgEnableFastHeapDumps", /*noprefix*/ false, &getenv); and then if (fastHeapDumpCfg.IsSet() && fastHeapDumpCfg.TryAsInteger(10, x) && x == 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I don't like that either. After this gets backported to 6.0 I'm going to either move it to a command option and do the env var in the pal launch code or use the clrconfigNoCache directly.

@kdubau
Copy link
Member

kdubau commented Oct 12, 2021

@mikem8361 is there a flag so we can run createdump directly with FastHeapDumps enabled?

@mikem8361
Copy link
Contributor Author

@kdubau you would need to set the COMPlus_DbgEnableFastHeapDumps=1 env var before launching createdump. You are aware that you will need to launch createdump as superuser? It may be better to use the Microsoft.Diagnostics.NETCore.Client.WriteDump() API from the diagnostics repo to generate these "hang" dumps because it doesn't have that requirement. It sends a message via the diagnostic server IPC channel. I just attempted to verify that this would work against the VS4Mac bundle that Greg sent me and it seems the this IPC channel isn't there or enabled. I'm not sure what is going there.

@sgmunn
Copy link

sgmunn commented Oct 12, 2021

@mikem8361 we found this same issue, John and I have been looking into it and found it related to fork/exec when the terminal pad is started. We have a working alternate solution, but with the build you have, just don't open a solution - start vsmac and then you can connect to the diagnostic pipes. I'll ping you with a newer build

@mikem8361 mikem8361 merged commit e734fde into dotnet:main Oct 12, 2021
@mikem8361 mikem8361 deleted the macoscrashreport branch October 12, 2021 19:34
@mikem8361
Copy link
Contributor Author

/backport to release/6.0

@github-actions
Copy link
Contributor

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1334586191

@github-actions
Copy link
Contributor

@mikem8361 backporting to release/6.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Refactor the DAC enumerate memory region phase out of gather crash info
.git/rebase-apply/patch:224: space before tab in indent.
        	TRACE("MODULE: %" PRIA PRIx64 " dyn %d inmem %d file %d pe %" PRIA PRIx64 " pdb %" PRIA PRIx64, (uint64_t)moduleData.LoadedPEAddress, moduleData.IsDynamic,
warning: 1 line adds whitespace errors.
Using index info to reconstruct a base tree...
M	src/coreclr/debug/createdump/crashinfo.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/debug/createdump/crashinfo.cpp
CONFLICT (content): Merge conflict in src/coreclr/debug/createdump/crashinfo.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Refactor the DAC enumerate memory region phase out of gather crash info
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

mikem8361 pushed a commit to mikem8361/runtime that referenced this pull request Oct 12, 2021
This is a VS4Mac show stopper. The performance (4 min or so) of taking a core dump
when VS4Mac crashes or hangs is unacceptable.

Backport of dotnet#60205

Refactor the DAC enumerate memory region phase out of gather crash info

This is so the crash report json is written and available before the core
dump which for VS4Mac currently takes 4 minutes.

Since on both Linux and MacOS all the RW regions have been already added
by createdump itself for heap dumps, then the sometimes slow (4 minutes for
VS4Mac) heap enum memory region is changed to the faster normal one. It adds
necessary DAC globals, etc. without the costly assembly, module, class, type
runtime data structure enumeration.

This fast heap dumps is opt'ed in with COMPlus_DbgEnableFastHeapDumps=1 env var to mitigate the
risk of something missing from these dumps.

Tested creating a crash report/core dump against VS4Mac process. Ran all the SOS tests on MacOS and Linux against this change.

Low since there is an opt-in env var that enables the most risk part.
Anipik pushed a commit that referenced this pull request Oct 14, 2021
This is a VS4Mac show stopper. The performance (4 min or so) of taking a core dump
when VS4Mac crashes or hangs is unacceptable.

Backport of #60205

Refactor the DAC enumerate memory region phase out of gather crash info

This is so the crash report json is written and available before the core
dump which for VS4Mac currently takes 4 minutes.

Since on both Linux and MacOS all the RW regions have been already added
by createdump itself for heap dumps, then the sometimes slow (4 minutes for
VS4Mac) heap enum memory region is changed to the faster normal one. It adds
necessary DAC globals, etc. without the costly assembly, module, class, type
runtime data structure enumeration.

This fast heap dumps is opt'ed in with COMPlus_DbgEnableFastHeapDumps=1 env var to mitigate the
risk of something missing from these dumps.

Tested creating a crash report/core dump against VS4Mac process. Ran all the SOS tests on MacOS and Linux against this change.

Low since there is an opt-in env var that enables the most risk part.
@ghost ghost locked as resolved and limited conversation to collaborators Nov 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants