Skip to content

Conversation

@cfallin
Copy link
Member

@cfallin cfallin commented Feb 28, 2022

In #3859 we discovered that the unwind metadata info on Windows is
placed slightly incorrectly in the presence of explicit stack checks in
function prologues. In particular, if the stack check fails, then we
start the process of backtracing from a state where we have actually
updated rbp (the frame pointer) but we have not emitted the metadata
saying we have done so.

The fix is to emit the SetFPReg Win32 unwind op exactly at the offset
of the mov rbp, rsp instruction, not after the stack check. However,
actually achieving this output given our current unwind design (which
abstracts over SysV and Win32) is slightly tricky, because:

  • We previously had a single DefineNewFrame op that both indicated
    where the FP was set, and gave the size of the clobbers;
  • We don't know the size of the clobbers until we generate the
    clobber-save sequence; and
  • The clobber-save sequence is generated after any explicit stack check.

To resolve the problem, this PR splits the DefineNewFrame op into two
pieces: SetFrameReg and DefineNewFrame. On x86-64 and aarch64 this
is emitted in the appropriate place. (s390x does not use a frame
register, so has no need for this op.) This should resolve the issue we
were seeing with Windows-2022 and unwinding.

(Thanks to @alexcrichton, @iximeow and @peterhuene for help debugging
this!)

I've switched the main Windows CI jobs back to Windows-2022 (naming this explicitly rather than windows-latest to pin our CI on a known version -- happy to switch that to -latest though if we think that's better). I've also added a job that explicitly tests Windows-2019 to ensure we don't have any other regressions. If the overhead for this is too high then maybe we can remove it after seeing a clean pass at least on this fix.

Fixes #3859.

@cfallin cfallin requested a review from alexcrichton February 28, 2022 23:29
@cfallin cfallin force-pushed the windows-unwind branch 2 times, most recently from 1374eb0 to a30f092 Compare February 28, 2022 23:34
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Feb 28, 2022
@cfallin
Copy link
Member Author

cfallin commented Mar 1, 2022

Hmm, that didn't work on Windows-2022; a generic "stack overrun" crash. This will have to go to my back burner for a bit until I have a workable Windows machine to debug on; in the meantime if anyone else wants to poke at this, please feel free!

@alexcrichton
Copy link
Member

I was poking around at this again today. For a trivially recursive function (func $foo call $foo) I think that this is the machine code we generate with this PR

  0x0:  pushq   %rbp
  0x1:  unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  0x1:  movq    %rsp, %rbp
  0x4:  unwind SetFrameReg
  0x4:  movq    0(%rdi), %r10
  0x7:  movq    0(%r10), %r10
  0xa:  cmpq    %rsp, %r10
  0xd:  jbe ; ud2 stk_ovf ;
  0x15:  unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
block0:
  0x15:  movq    %rdi, %rsi
  0x18:  call    User { namespace: 0, index: 0 }
  0x1d:  jmp     label1
block1:
  0x1d:  movq    %rbp, %rsp
  0x20:  popq    %rbp
  0x21:  ret

where the prefixed hex number is the offset of that instruction within the MachBuffer

The UnwindInfo for windows generated by this PR is:

UnwindInfo {
    flags: 0,
    prologue_size: 21,
    frame_register: Some(
        5,
    ),
    frame_register_offset: 0,
    unwind_codes: [
        PushRegister {
            instruction_offset: 1,
            reg: 5,
        },
        SetFPReg {
            instruction_offset: 4,
        },
    ],
}

The diff from the unwind info from main is:

@@ -11,7 +11,7 @@
             reg: 5,
         },
         SetFPReg {
-            instruction_offset: 21,
+            instruction_offset: 4,
         },
     ],
 }

which is to be expected.

So the only other bit of unwind info we could really modify here seems to be prologue_size: 21, which does seem fishy because we're technically unwinding from within the prologue itself.

@peterhuene you may know more about windows unwinding here, does windows disallow unwinding from within the prologue itself?

@peterhuene
Copy link
Member

peterhuene commented May 9, 2022

Unwinding from within the prologue is permitted on windows.

It is why the unwind codes are expected to be sorted in descending order from prologue offset: the unwinder simply walks the codes, skipping over any where the prologue offset is greater than the function offset of the exception address.

So if we were originally emitting the frame pointer establishment unwind code with a prologue offset equal to the size of the prologue, then any exception from within the prologue would have skipped undoing that operation.

@alexcrichton
Copy link
Member

Hm ok so the unwinding logs I have from this PR look like:

...
2022-05-09T17:24:20.1618041Z ip=0x7ffc7462b592 sp=0xb9a2d7b1e0
2022-05-09T17:24:20.1618120Z ip=0x7ffc745e2022 sp=0xb9a2d7b280
2022-05-09T17:24:20.1618188Z ip=0x7ffc74652e1e sp=0xb9a2d7b4c0
2022-05-09T17:24:20.1618271Z ip=0x24c8b5b1013 sp=0xb9a2d7c2a0
2022-05-09T17:24:20.1618403Z ip=0xb9a2d7c2b0 sp=0xb9a2d7c2a8
2022-05-09T17:24:20.1618484Z ip=0x24c8b5b101d sp=0xb9a2d7c2b0
2022-05-09T17:24:20.1618561Z ip=0xb9a2d7c2c0 sp=0xb9a2d7c2b8
2022-05-09T17:24:20.1618639Z ip=0x24c8b5b101d sp=0xb9a2d7c2c0
2022-05-09T17:24:20.1618707Z ip=0xb9a2d7c2d0 sp=0xb9a2d7c2c8
2022-05-09T17:24:20.1618783Z ip=0x24c8b5b101d sp=0xb9a2d7c2d0
...

where I believe the first wasm pc is ip=0x24c8b5b1013 sp=0xb9a2d7c2a0. Since text sections are page-aligned I think this corresponds with the address at 0x13 above which is in:

  0xd:  jbe ; ud2 stk_ovf ;

which is indeed the ud2 at 0x13.

The "second wasm frame", however is bogus ip=0xb9a2d7c2b0 sp=0xb9a2d7c2a8. The problem here is that the ip is actually a stack pointer and the sp is not 16-byte aligned.

This looks like the unwinder did not actually unwind the pushq %rbp instruction which is in our prologue. The unwinder then popped the return address of the call, which got the previous value of %rbp and misinterpreted it as an instruction pointer. Then everything goes awry b/c presumably everything is busted at that point.

I'm not sure why the PushRegister instruction isn't being unwound...

@cfallin
Copy link
Member Author

cfallin commented May 9, 2022

Thanks @alexcrichton for digging into this more!

For my own potential debugging later: are you able to test this locally on a desktop Windows install (and which version if so)? I do have a Win10 VM I can dig up, which I suspect corresponds to windows-2019 (or earlier); I wonder if I would have to upgrade to Win11 to see this behavior...

@alexcrichton
Copy link
Member

Nah I'm just throwing things at CI and reading output from github actions, basically vicariously printf-debugging through github heh

cfallin added 2 commits May 9, 2022 16:19
…s to fix metadata bug related to stack checks.

In bytecodealliance#3859 we discovered that the unwind metadata info on Windows is
placed slightly incorrectly in the presence of explicit stack checks in
function prologues. In particular, if the stack check fails, then we
start the process of backtracing from a state where we have actually
updated rbp (the frame pointer) but we have not emitted the metadata
saying we have done so.

The fix is to emit the `SetFPReg` Win32 unwind op exactly at the offset
of the `mov rbp, rsp` instruction, not after the stack check. However,
actually achieving this output given our current unwind design (which
abstracts over SysV and Win32) is slightly tricky, because:

- We previously had a single `DefineNewFrame` op that both indicated
  where the FP was set, and gave the size of the clobbers;
- We don't know the size of the clobbers until we generate the
  clobber-save sequence; and
- The clobber-save sequence is generated after any explicit stack check.

To resolve the problem, this PR splits the `DefineNewFrame` op into two
pieces: `SetFrameReg` and `DefineNewFrame`. On x86-64 and aarch64 this
is emitted in the appropriate place. (`s390x` does not use a frame
register, so has no need for this op.) This should resolve the issue we
were seeing with Windows-2022 and unwinding.

(Thanks to @alexcrichton, @iximeow and @peterhuene for help debugging
this!)
@cfallin
Copy link
Member Author

cfallin commented May 10, 2022

(Pushed rebase onto current main)

I'm at a bit of a loss here: I re-read all of the documentation on fastcall prologues and unwind info, and nothing seems amiss.

I will list all of the assumptions that we're making to see if any might be wrong:

  • Unwind entries are emitted in reverse order (highest offset first);
  • Unwind entries are associated with the offset of the following instruction (e.g. above, the push rbp is at offset 0 and is a 1-byte instruction, so is emitted with prologue offset 1);
  • The unwinder, if unwinding from a function offset in the middle of the prologue, will revert the effects of unwind ops at that offset or prior;
  • To revert a UWOP_PUSH_NONVOL (opcode 0), the unwinder reads old rbp from *rsp and increments rsp by 8;
  • We are allowed to have other instructions in the prologue (within the first "prologue size" bytes of the function body), such as the stack check.

Any others? Does all of this sound reasonable? The most dubious is IMHO the last. However (i) if the unwinder claims to work by only reading the array of unwind ops, this shouldn't be a problem; and (ii) we need to make it so, because if there were clobber-saves, they would come after the stack limit check, and these are part of the prologue.

I suppose a high-level alternative design branch would be to do the stack limit check outside of the prologue. However this requires a bunch of re-engineering, and divergence from the current shared ABI code:

  • It implies two checks: "enough stack limit to save clobbers" before every call (this becomes the responsibility of the caller) and then the usual one, just after the prologue;
  • When the host calls Wasm, it implies a stack limit check analogous to the above (as part of the trampoline maybe?);
  • It implies that we decrement rsp twice, once to create space for clobber-saves and once to allocate the rest of the fixed frame (currently we collapse all this overhead into one sub instruction).

So that doesn't seem terribly tenable to me either.

It's really unfortunate that I don't have a way of reproducing this locally, and that the unwinder is so opaque. I'm not sure what to do currently and have no remaining strategies for solving this bug other than "ask Microsoft what changed in the Windows-2022 unwinder"; does anyone else have better ideas?

@alexcrichton
Copy link
Member

I did a "final" poking around at this last night as I was inspired again. I double-checked everything down to the binary encoding and API alls we're calling into Windows. Everything checked out and made sense except for the fact that this doesn't work. It definitely feels odd that the exact same code works on 2019 and doesn't work on 2022. At this point I think we'd need to enlist a Windows expert to figure this out.

@cfallin
Copy link
Member Author

cfallin commented Aug 2, 2022

I'm cleaning up old PRs and closing this now: since we no longer rely on unwind metadata for stack walking on Windows (now that #4431 is merged), this is no longer an issue.

@cfallin cfallin closed this Aug 2, 2022
@bjorn3
Copy link
Contributor

bjorn3 commented Aug 2, 2022

Isn't this still necessary for debuggers?

@cfallin
Copy link
Member Author

cfallin commented Aug 2, 2022

Yes, I suppose it would still be an issue for debuggers on Windows versions with the bug. However we weren't able to find the issue after extensive debugging (see whole thread above!) and this doesn't seem to solve the problem, so for that case I would say I'm closing the PR simply because it doesn't work...

At this point to get Windows' native unwinder to fully work in windows-2022 I think we need someone who is a deep expert in SEH and/or is willing to single-instruction-step through Windows internals. That's definitely not me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unwind info for Windows may not be correct for entire function prologue

4 participants