Skip to content

Conversation

@xtqqczze
Copy link
Contributor

No description provided.

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Comment on lines 400 to +403
GCHandle handle = GCHandle.Alloc(tmpObject, GCHandleType.Pinned);
ref char stringData = ref Unsafe.Add(ref s.GetRawStringData(), _index);
return new MemoryHandle(Unsafe.AsPointer(ref stringData), handle);
ref char reference = ref Unsafe.Add(ref s.GetRawStringData(), (nint)(uint)_index);
// Unsafe.AsPointer is safe since the handle pins it
return new MemoryHandle(Unsafe.AsPointer(ref reference), handle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think the comment being moved down is "clearer", it just creates a larger diff.

The same with changing the name from stringData to reference.

For the actual Unsafe.Add, it can just be (uint)_index, you don't also need the cast to (nint). However, it's also worth noting that a lot of modern hardware has zero-cost sign-extension, so I'm not sure the extra churn/complexity is worthwhile.

This is unlikely to be a hot path as repeated pinning is itself expensive and the potential 1-cycle savings will get lost in the general noise due to the overhead of GCHandle.Alloc and freeing the handle.

Copy link
Contributor Author

@xtqqczze xtqqczze Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We see diffs like this on X64:

-       lea      rax, bword ptr [r14+0x10]
-       mov      ecx, dword ptr [r15+0x08]
-       and      ecx, 0xD1FFAB1E
-       movsxd   rcx, ecx
-       lea      rax, [rax+8*rcx]
+       mov      eax, dword ptr [r15+0x08]
+       and      eax, 0xD1FFAB1E
+       lea      rax, bword ptr [r14+8*rax+0x10]
        xor      rcx, rcx
        mov      gword ptr [rbx], rcx
        mov      qword ptr [rbx+0x08], rax
-						;; size=30 bbWeight=0.50 PerfScore 2.88
+						;; size=23 bbWeight=0.50 PerfScore 2.75

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's diffs on ARM64 as well. It seems like the primary improvement of the change is actually to addressing modes.

Copy link
Member

@tannergooding tannergooding Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter isn't "strictly" better. It may actually be more expensive due to using a more expensive lea mode which is restricted to a single port, rather than the simpler lea mode which works on multiple ports.

The former could also become (almost) the latter if it was better and looks like just a missing JIT opt. Namely, it could be (only differing by the movsxd):

       mov      eax, dword ptr [r15+0x08]
       and      eax, 0xD1FFAB1E
       movsxd   rax, eax
       lea      rax, [r14+8*rax+0x10]

That is, there shouldn't be anything here that blocks the r14+0x10 from being contained since it's just (x+y)+(8*z) and changing to x+(8*z)+y

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems far more meaningful to investigate what's "blocking" that transform right now and whether it is actually more beneficial to do in typical scenarios.

It would likely have much broader impact to ecosystem and not require manually fixing a bunch of C# code to explicitly cast to uint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems far more meaningful to investigate what's "blocking" that transform right now and whether it is actually more beneficial to do in typical scenarios.

@jakobbotsch Is this an example of the following issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems far more meaningful to investigate what's "blocking" that transform right now and whether it is actually more beneficial to do in typical scenarios.

Well currently it is not possible because the logic reads _index multiple times, so it cannot be proven to be never negative. We could read var @this = this at the beginning of the method, this would certainly help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory model allows the reads to be hoisted in various scenarios, so there’s really nothing that should be blocking if we aren’t explicitly writing from the same thread

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Memory community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants