Skip to content

Conversation

@davmason
Copy link
Contributor

Using one global evacuation counter per profiler was causing lots of contention in hot calls to CORProfilerPresent, this change switches to using one per thread.

It also switches hot CORProfiler* methods to FORCEINLINE, and changes CORProfilerPresent to be a much cheaper check, especially when a notification profiler is present.

@davmason davmason requested review from a team and jkotas September 29, 2021 08:55
@ghost
Copy link

ghost commented Sep 29, 2021

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Using one global evacuation counter per profiler was causing lots of contention in hot calls to CORProfilerPresent, this change switches to using one per thread.

It also switches hot CORProfiler* methods to FORCEINLINE, and changes CORProfilerPresent to be a much cheaper check, especially when a notification profiler is present.

Author: davmason
Assignees: -
Labels:

area-Diagnostics-coreclr

Milestone: -

@davmason
Copy link
Contributor Author

Looks like there's also a CRST ordering issue: Assert failure(PID 9716 [0x000025f4], Thread: 12940 [0x328c]): Consistency check failed: Crst Level violation: Can't take level 12 lock CrstThreadStore because you already holding level 0 lock CrstProfilingAPIStatus, it should be fixed by adding the right relationship in crsttypes.def, I will update that later this afternoon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that the profiler detach can crash if it happens to interleave with a callback issued where GetThread is null?

How many of the callbacks are affected? Did we have this problem before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of desktop and before my change in coreclr the evacuation counters were on the Thread * and it would only increment/decrement if GetThread returned non-null. So this behavior should be identical to 5.0 and previous runtimes.

I have been thinking since last night about if there was a latent bug. Mostly I don't think so, but there is a couple places like EventPipe callbacks that may not have a managed thread

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I am pretty suspicious there is a latent bug here when it comes to EventPipe notifications - they can be triggered on threads where GetThread() == NULL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noahfalk thinking about it though, with the profiler they are synchronous on the thread that it happens. We may not actually have a bug, since I would expect any place EventPipe events happen to have an EE thread. Since we have the managed bridge from native events to EventPipe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue also affects server GC threads. Server GC threads do not have Thread*.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect any place EventPipe events happen to have an EE thread

I'm ~98% sure this isn't the case : ) Some of the EventPipe events are emitted where GetThread()==NULL (such as the events from Server GC threads @jkotas mentioned). Originally the code in EventPipe's WriteEvent() code refered to GetThread() for the thread-local write buffers but that was broken. We had to invent the new EventPipeThread object to work around the lack of Thread* for some events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I agree there is a latent issue here, but it also has been present ever since the EventPipe stuff was merged in 3.1. Given that and that there's also an easy workaround - stop the session before detaching - I propose I file a bug and don't try to solve this now.

#60081

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn't pushing to have you fix it now given that it was pre-existing, just confirming that it does exist.

@davmason davmason force-pushed the notification_profiler_perf branch from da6b788 to 080d379 Compare September 30, 2021 16:40
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

@davmason davmason force-pushed the notification_profiler_perf branch from 2257ec5 to e8e39fd Compare October 6, 2021 10:09
}

inline HRESULT ProfControlBlock::JITInlining(FunctionID callerId, FunctionID calleeId, BOOL *pfShouldInline)
{
LIMITED_METHOD_CONTRACT;

*pfShouldInline = TRUE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will initialize *pfShouldInline deterministically for the first profiler only.

Should this line rather be in JITInliningHelper instead? I think you can even delete pfShouldInline argument for the JITInliningHelper and have BOOL fShouldInline local in JITInliningHelper instead.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2021

There is still unresolved feedback: #59741 (comment) . Do you plan to address it before merging?

I have not seen the last commit for some reason.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@jkotas jkotas merged commit 240a64d into dotnet:main Oct 7, 2021
@jkotas
Copy link
Member

jkotas commented Oct 7, 2021

/backport to release/6.0

@github-actions
Copy link
Contributor

github-actions bot commented Oct 7, 2021

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1314583486

@github-actions
Copy link
Contributor

github-actions bot commented Oct 7, 2021

@jkotas backporting to release/6.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Change profilers to use thread local evacuation counters
.git/rebase-apply/patch:191: trailing whitespace.
    return (&g_profControlBlock)->mainProfilerInfo.pProfInterface.Load() != NULL 
.git/rebase-apply/patch:1056: trailing whitespace.
        DWORD newValue = 
.git/rebase-apply/patch:1067: trailing whitespace.
        DWORD newValue = 
warning: 3 lines add whitespace errors.
Using index info to reconstruct a base tree...
M	src/coreclr/inc/profilepriv.inl
M	src/coreclr/vm/profilinghelper.cpp
M	src/coreclr/vm/proftoeeinterfaceimpl.cpp
M	src/coreclr/vm/threads.cpp
M	src/coreclr/vm/threads.h
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/vm/threads.h
Auto-merging src/coreclr/vm/threads.cpp
Auto-merging src/coreclr/vm/proftoeeinterfaceimpl.cpp
Auto-merging src/coreclr/vm/profilinghelper.cpp
Auto-merging src/coreclr/inc/profilepriv.inl
Applying: get rid of lambdas
.git/rebase-apply/patch:153: trailing whitespace.
        
.git/rebase-apply/patch:166: trailing whitespace.
    return pProfilerInfo->pProfInterface->IsCallback3Supported(); 
.git/rebase-apply/patch:194: trailing whitespace.
    return AnyProfilerPassesCondition(&RequiresGenericsContextForEnterLeaveHelper); 
warning: 3 lines add whitespace errors.
Using index info to reconstruct a base tree...
M	src/coreclr/inc/profilepriv.inl
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/inc/profilepriv.inl
CONFLICT (content): Merge conflict in src/coreclr/inc/profilepriv.inl
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 get rid of lambdas
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

davmason added a commit to davmason/runtime that referenced this pull request Oct 7, 2021
* Change profilers to use thread local evacuation counters

Change to prefix increment

* get rid of lambdas

* Fix jit inlining, fix R2R too

* Remove VolatilePtr<> from helpers

* Get rid of additionalData argument
Anipik pushed a commit that referenced this pull request Oct 8, 2021
…0116)

* Change profilers to use thread local evacuation counters

Change to prefix increment

* get rid of lambdas

* Fix jit inlining, fix R2R too

* Remove VolatilePtr<> from helpers

* Get rid of additionalData argument
@ghost ghost locked as resolved and limited conversation to collaborators Nov 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants