Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure this is the right fix....
I would have expected
EncodedBySSE38OrSSE3Ato be returningtruehere, because CRC32 isF2 0F 38 F0orF2 0F 38 F1.However, looking at https://github.com/dotnet/runtime/blob/745fa1c7b30f1b9107084f0baecc325527fa9561/src/coreclr/jit/emitxarch.cpp#L1583-L1611, it might be failing because the "prefix" is
0xF2and not0x66.0x66,0xF2, and0xF3are all encounterable "prefix" bytes here and this is probably something that was missed when instructions covering the other two were added. In particular:0x66is the most prominent and covers every instruction we support today, exceptCRC320xF2- Only used forSSE42.CRC320xF3- no instructions exposed here yetADX.ADOXAESKLE.AESDEC128KLAESKLE.AESDEC256KLAESKLEWIDE_KL.AESDECWIDE128KLAESKLEWIDE_KL.AESDECWIDE256KLAESKLE.AESENC128KLAESKLE.AESENC256KLAESKLEWIDE_KL.AESENCWIDE128KLAESKLEWIDE_KL.AESENCWIDE256KLAESKLE.ENCODEKEY128AESKLE.ENCODEKEY256KL.LOADIWKEYThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean you don't know if this fixes the issue? Or you think there might be a more general fix? (I did of course verify that it fixes the issue. It's also nice that it's very simple and contained, as I want to port this to .NET 6.)
More details:
EncodedBySSE38OrSSE3Areturnsfalsefor crc32 because crc32 is not in the SSE or AVX instruction range, soIsSSEOrAVXInstructionreturnsfalse. (Not sure why it's defined that way; tzcnt/lzcnt/popcnt also aren't in that set.)Looking through the code base, there are 17 places that use the condition
(EncodedBySSE38orSSE3A(ins) || (ins == INS_crc32))and only 8 that use the condition(EncodedBySSE38orSSE3A(ins)). In particular, the places without the|| (ins == INS_crc32)logic handle emission of:Since crc32 is a binary operator, none of these apply, so we could let
EncodedBySSE38OrSSE3Ahandle crc32 as well (and rename it to match).The only question then is whether
Is4ByteSSEInstructionshould also include it (that's the only other use ofEncodedBySSE38OrSSE3A). That's a little trickier to unravel; there a case inemitOutputRRI'm not sure about; a case inemitOutputInstrdealing withIF_RRW_RRW_CNSthat doesn't matter; and a case inemitGetAdjustedSizeto fix. Presumably we could define it as:as crc32 is not affected by VEX encoding.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant this as, there might be additional locations that need equivalent fixups; however, adjusting the existing method that handles
SSE38andSSE3Aencodings may be a better overall fix.Ah I see, that would impact this as well: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/instrsxarch.h#L615
For what its worth, it is actually an
SSE4.2instruction (and SSE38 based encoding), even if its not a vector instruction and should likely be in the list (much asBMI1andBMI2are in the overall AVX range). I'm guessing thatcrc32,tzcnt,lzcnt, andpopcntfell into some special category that forced them out of theSSE/AVXranges, possibly due to more complicated fixups that would've been required. In particular:crc32is explicitly SSE4.2tzcntis explicitly BMI1lzcntis implicitly BMI1 on Intel; its separate because its a separate bit for compat with AMD which exposed it as part of ABM ~2007 (so 6 years earlier)popcntis implicitly SSE4.2 on Intel; its separate because for the same reason aslzcntThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on your comments above; it sounds like this fix is fine for .NET 6. It's probably worth seeing if we should adjust the overall handling for .NET 7 to properly account for the SSE38 prefix difference (as that is likely required for future instruction expansion regardless).