Skip to content

Conversation

@jakobbotsch
Copy link
Member

For partially interruptible methods there is a mismatch between where GC
stress runs GCs and where normal return address hijacking would run GCs.
GC stress runs the GC on the first instruction after a call returns,
where there for partially interruptible methods is no GC info that says
to protect GC pointers in return registers (thanks @jkotas for the explanation).
This means GC stress needs to do some special work to figure out that these
need to be protected.

The way it does that is the following:

  • For direct call sites, in GCCoverageInfo::SprinkleBreakpoints it gets
    the target MD of each call site and places a special illegal
    instruction right after the call that the GC stress handler will use
    to figure out which (if any) registers have GC pointers that need
    protection

  • For indirect call sites, in GCCoverageInfo::SprinkleBreakpoint it will
    first place an illegal instruction on the call instruction so that the
    GC stress handler will break there. Once the GC stress handler breaks,
    it computes the target address and gets the target MD from that, and
    then places the illegal instruction on the next instruction like
    above.

GCCoverageInfo::SprinkleBreakpoints runs right after jitting and should
therefore not race with anything. However, the GC stress handler can
race with any other thread accessing the same function. That makes the
latter problematic on x86 where unwinding reads the epilog code to work.
In this particular case thread A is about to unwind through the epilog
when thread B stops on an indirect call right before the epilog. Thread
B then overwrites the first instruction of the epilog, causing thread A
to unwind incorrectly.

The issue seems to be a long-standing one, but we are hitting it after
#65738 where we started using indirect calls more often.

UnwindStackFrame already has access to the GC stress saved code and is
actually already using it for other unwinding. To fix the issue, make it
use the saved code for unwinding epilogs as well.

Fix #68431

For partially interruptible methods there is a mismatch between where GC
stress runs GCs and where normal return address hijacking would run GCs.
GC stress runs the GC on the first instruction after a call returns,
where there for partially interruptible methods is no GC info that says
to protect GC pointers in return registers. It means GC stress needs to
do some special work to figure out that these need to be protected.

The way it does that is the following:

* For direct call sites, in GCCoverageInfo::SprinkleBreakpoints it gets
  the target MD of each call site and places a special illegal
  instruction right after the call that the GC stress handler will use
  to figure out which (if any) registers have GC pointers that need
  protection

* For indirect call sites, in GCCoverageInfo::SprinkleBreakpoint it will
  first place an illegal instruction on the call instruction so that the
  GC stress handler will break there. Once the GC stress handler breaks,
  it computes the target address and gets the target MD from that, and
  then places the illegal instruction on the next instruction like
  above.

GCCoverageInfo::SprinkleBreakpoints runs right after jitting and should
therefore not race with anything. However, the GC stress handler can
race with any other thread accessing the same function. That makes the
latter problematic on x86 where unwinding reads the epilog code to work.
In this particular case thread A is about to unwind through the epilog
when thread B stops on an indirect call right before the epilog. Thread
B then overwrites the first instruction of the epilog, causing thread A
to unwind incorrectly.

UnwindStackFrame already has access to the GC stress saved code and is
actually already using it for other unwinding. To fix the issue, make it
use the saved code for unwinding epilogs as well.

Fix dotnet#68431
@ghost ghost added the area-VM-coreclr label Jul 21, 2022
@ghost ghost assigned jakobbotsch Jul 21, 2022
@jakobbotsch

This comment was marked as outdated.

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch jakobbotsch requested review from janvorli and jkotas July 21, 2022 13:37
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@jakobbotsch jakobbotsch merged commit e092687 into dotnet:main Jul 21, 2022
@jakobbotsch jakobbotsch deleted the fix-68431 branch July 21, 2022 16:41
@ghost ghost locked as resolved and limited conversation to collaborators Aug 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

3 participants