Skip to content

Conversation

@maryamariyan
Copy link
Owner

Context containing the common services on the . Some properties may be null until set by the IHost.

AddHostedService
HostOptions.ShutdownTimeout

Microsoft.Extensions.Hosting

Context containing the common services on the <see cref="IHost" />. Some properties may be null until set by the IHost.
maryamariyan pushed a commit that referenced this pull request Feb 1, 2022
…otnet#63598)

* Fix native frame unwind in syscall on arm64 for VS4Mac crash report.

Add arm64 version of StepWithCompactNoEncoding for syscall leaf node wrappers that have compact encoding of 0.

Fix ReadCompactEncodingRegister so it actually decrements the addr.

Change StepWithCompactEncodingArm64 to match what MacOS libunwind does for framed and frameless stepping.

arm64 can have frames with the same SP (but different IPs). Increment SP for this condition so createdump's unwind
loop doesn't break out on the "SP not increasing" check and the frames are added to the thread frame list in the
correct order.

Add getting the unwind info for tail called functions like this:

__ZL14PROCEndProcessPvji:
   36630:       f6 57 bd a9     stp     x22, x21, [sp, #-48]!
   36634:       f4 4f 01 a9     stp     x20, x19, [sp, #16]
   36638:       fd 7b 02 a9     stp     x29, x30, [sp, #32]
   3663c:       fd 83 00 91     add     x29, sp, #32
...
   367ac:       e9 01 80 52     mov     w9, #15
   367b0:       7f 3e 02 71     cmp     w19, dotnet#143
   367b4:       20 01 88 1a     csel    w0, w9, w8, eq
   367b8:       2e 00 00 94     bl      _PROCAbort
_TerminateProcess:
-> 367bc:       22 00 80 52     mov     w2, #1
   367c0:       9c ff ff 17     b       __ZL14PROCEndProcessPvji

The IP (367bc) returns the (incorrect) frameless encoding with nothing on the stack (uses an incorrect LR to unwind). To fix this
get the unwind info for PC -1 which points to PROCEndProcess with the correct unwind info. This matches how lldb unwinds this frame.

Always address module segment to IP lookup list instead of checking the module regions.

Strip pointer authentication bits on PC/LR.
maryamariyan pushed a commit that referenced this pull request Feb 3, 2022
* Fix the MacOS remote unwinder for VS4Mac

The wrong module was being passed to the remote unwinder because the load bias for shared modules
was being calculated incorrectly.

Issue: dotnet#63309

* Fix native frame unwind in syscall on arm64 for VS4Mac crash report

From PR in main: dotnet#63598

Add arm64 version of StepWithCompactNoEncoding for syscall leaf node wrappers that have compact encoding of 0.

Fix ReadCompactEncodingRegister so it actually decrements the addr.

Change StepWithCompactEncodingArm64 to match what MacOS libunwind does for framed and frameless stepping.

arm64 can have frames with the same SP (but different IPs). Increment SP for this condition so createdump's unwind
loop doesn't break out on the "SP not increasing" check and the frames are added to the thread frame list in the
correct order.

Add getting the unwind info for tail called functions like this:

__ZL14PROCEndProcessPvji:
   36630:       f6 57 bd a9     stp     x22, x21, [sp, #-48]!
   36634:       f4 4f 01 a9     stp     x20, x19, [sp, #16]
   36638:       fd 7b 02 a9     stp     x29, x30, [sp, #32]
   3663c:       fd 83 00 91     add     x29, sp, #32
...
   367ac:       e9 01 80 52     mov     w9, #15
   367b0:       7f 3e 02 71     cmp     w19, dotnet#143
   367b4:       20 01 88 1a     csel    w0, w9, w8, eq
   367b8:       2e 00 00 94     bl      _PROCAbort
_TerminateProcess:
-> 367bc:       22 00 80 52     mov     w2, #1
   367c0:       9c ff ff 17     b       __ZL14PROCEndProcessPvji

The IP (367bc) returns the (incorrect) frameless encoding with nothing on the stack (uses an incorrect LR to unwind). To fix this
get the unwind info for PC -1 which points to PROCEndProcess with the correct unwind info. This matches how lldb unwinds this frame.

Always address module segment to IP lookup list instead of checking the module regions.

Strip pointer authentication bits on PC/LR.
maryamariyan pushed a commit that referenced this pull request Mar 1, 2022
# Local heap optimizations on Arm64

1. When not required to zero the allocated space for local heap (for sizes up to 64 bytes) - do not emit zeroing sequence. Instead do stack probing and adjust stack pointer:

```diff
-            stp     xzr, xzr, [sp,#-16]!
-            stp     xzr, xzr, [sp,#-16]!
-            stp     xzr, xzr, [sp,#-16]!
-            stp     xzr, xzr, [sp,#-16]!
+            ldr     wzr, [sp],#-64
```

2. For sizes less than one `PAGE_SIZE` use `ldr wzr, [sp], #-amount` that does probing at `[sp]` and allocates the space at the same time. This saves one instruction for such local heap allocations:

```diff
-            ldr     wzr, [sp]
-            sub     sp, sp, dotnet#208
+            ldr     wzr, [sp],#-208
```

Use `ldp tmpReg, xzr, [sp], #-amount` when the offset not encodable by post-index variant of `ldr`:
```diff
-            ldr     wzr, [sp]
-            sub     sp, sp, dotnet#512
+            ldp     x0, xzr, [sp],#-512
```

3. Allow non-loop zeroing (i.e. unrolled sequence) for sizes up to 128 bytes (i.e. up to `LCLHEAP_UNROLL_LIMIT`). This frees up two internal integer registers for such cases:

```diff
-            mov     w11, dotnet#128
-                                               ;; bbWeight=0.50 PerfScore 0.25
-G_M44913_IG19:        ; gcrefRegs=00F9 {x0 x3 x4 x5 x6 x7}, byrefRegs=0000 {}, byref, isz
             stp     xzr, xzr, [sp,#-16]!
-            subs    x11, x11, #16
-            bne     G_M44913_IG19
+            stp     xzr, xzr, [sp,#-112]!
+            stp     xzr, xzr, [sp,#16]
+            stp     xzr, xzr, [sp,#32]
+            stp     xzr, xzr, [sp,dotnet#48]
+            stp     xzr, xzr, [sp,dotnet#64]
+            stp     xzr, xzr, [sp,dotnet#80]
+            stp     xzr, xzr, [sp,dotnet#96]
```

4. Do zeroing in ascending order of the effective address:

```diff
-            mov     w7, dotnet#96
-G_M49279_IG13:
             stp     xzr, xzr, [sp,#-16]!
-            subs    x7, x7, #16
-            bne     G_M49279_IG13
+            stp     xzr, xzr, [sp,#-80]!
+            stp     xzr, xzr, [sp,#16]
+            stp     xzr, xzr, [sp,#32]
+            stp     xzr, xzr, [sp,dotnet#48]
+            stp     xzr, xzr, [sp,dotnet#64]
```

In the example, the zeroing is done at `[initialSp-16], [initialSp-96], [initialSp-80], [initialSp-64], [initialSp-48], [initialSp-32]` addresses. The idea here is to allow a CPU to detect the sequential `memset` to `0` pattern and switch into write streaming mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants