-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Use zero-extension inMemory<T>.Pin
#119895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-memory |
| GCHandle handle = GCHandle.Alloc(tmpObject, GCHandleType.Pinned); | ||
| ref char stringData = ref Unsafe.Add(ref s.GetRawStringData(), _index); | ||
| return new MemoryHandle(Unsafe.AsPointer(ref stringData), handle); | ||
| ref char reference = ref Unsafe.Add(ref s.GetRawStringData(), (nint)(uint)_index); | ||
| // Unsafe.AsPointer is safe since the handle pins it | ||
| return new MemoryHandle(Unsafe.AsPointer(ref reference), handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think the comment being moved down is "clearer", it just creates a larger diff.
The same with changing the name from stringData to reference.
For the actual Unsafe.Add, it can just be (uint)_index, you don't also need the cast to (nint). However, it's also worth noting that a lot of modern hardware has zero-cost sign-extension, so I'm not sure the extra churn/complexity is worthwhile.
This is unlikely to be a hot path as repeated pinning is itself expensive and the potential 1-cycle savings will get lost in the general noise due to the overhead of GCHandle.Alloc and freeing the handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We see diffs like this on X64:
- lea rax, bword ptr [r14+0x10]
- mov ecx, dword ptr [r15+0x08]
- and ecx, 0xD1FFAB1E
- movsxd rcx, ecx
- lea rax, [rax+8*rcx]
+ mov eax, dword ptr [r15+0x08]
+ and eax, 0xD1FFAB1E
+ lea rax, bword ptr [r14+8*rax+0x10]
xor rcx, rcx
mov gword ptr [rbx], rcx
mov qword ptr [rbx+0x08], rax
- ;; size=30 bbWeight=0.50 PerfScore 2.88
+ ;; size=23 bbWeight=0.50 PerfScore 2.75There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's diffs on ARM64 as well. It seems like the primary improvement of the change is actually to addressing modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter isn't "strictly" better. It may actually be more expensive due to using a more expensive lea mode which is restricted to a single port, rather than the simpler lea mode which works on multiple ports.
The former could also become (almost) the latter if it was better and looks like just a missing JIT opt. Namely, it could be (only differing by the movsxd):
mov eax, dword ptr [r15+0x08]
and eax, 0xD1FFAB1E
movsxd rax, eax
lea rax, [r14+8*rax+0x10]That is, there shouldn't be anything here that blocks the r14+0x10 from being contained since it's just (x+y)+(8*z) and changing to x+(8*z)+y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems far more meaningful to investigate what's "blocking" that transform right now and whether it is actually more beneficial to do in typical scenarios.
It would likely have much broader impact to ecosystem and not require manually fixing a bunch of C# code to explicitly cast to uint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems far more meaningful to investigate what's "blocking" that transform right now and whether it is actually more beneficial to do in typical scenarios.
@jakobbotsch Is this an example of the following issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems far more meaningful to investigate what's "blocking" that transform right now and whether it is actually more beneficial to do in typical scenarios.
Well currently it is not possible because the logic reads _index multiple times, so it cannot be proven to be never negative. We could read var @this = this at the beginning of the method, this would certainly help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory model allows the reads to be hoisted in various scenarios, so there’s really nothing that should be blocking if we aren’t explicitly writing from the same thread
No description provided.