Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/coreclr/jit/emitxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10292,7 +10292,7 @@ BYTE* emitter::emitOutputAM(BYTE* dst, instrDesc* id, code_t code, CnsVal* addc)
noway_assert((int)dsp == dsp);

// This requires, specifying a SIB byte after ModRM byte.
if (EncodedBySSE38orSSE3A(ins))
if (EncodedBySSE38orSSE3A(ins) || (ins == INS_crc32))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure this is the right fix....

I would have expected EncodedBySSE38OrSSE3A to be returning true here, because CRC32 is F2 0F 38 F0 or F2 0F 38 F1.

However, looking at https://github.com/dotnet/runtime/blob/745fa1c7b30f1b9107084f0baecc325527fa9561/src/coreclr/jit/emitxarch.cpp#L1583-L1611, it might be failing because the "prefix" is 0xF2 and not 0x66.

0x66, 0xF2, and 0xF3 are all encounterable "prefix" bytes here and this is probably something that was missed when instructions covering the other two were added. In particular:

  • 0x66 is the most prominent and covers every instruction we support today, except CRC32
  • 0xF2 - Only used for SSE42.CRC32
  • 0xF3 - no instructions exposed here yet
    • ADX.ADOX
    • AESKLE.AESDEC128KL
    • AESKLE.AESDEC256KL
    • AESKLEWIDE_KL.AESDECWIDE128KL
    • AESKLEWIDE_KL.AESDECWIDE256KL
    • AESKLE.AESENC128KL
    • AESKLE.AESENC256KL
    • AESKLEWIDE_KL.AESENCWIDE128KL
    • AESKLEWIDE_KL.AESENCWIDE256KL
    • AESKLE.ENCODEKEY128
    • AESKLE.ENCODEKEY256
    • KL.LOADIWKEY

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure this is the right fix

Do you mean you don't know if this fixes the issue? Or you think there might be a more general fix? (I did of course verify that it fixes the issue. It's also nice that it's very simple and contained, as I want to port this to .NET 6.)

More details:

EncodedBySSE38OrSSE3A returns false for crc32 because crc32 is not in the SSE or AVX instruction range, so IsSSEOrAVXInstruction returns false. (Not sure why it's defined that way; tzcnt/lzcnt/popcnt also aren't in that set.)

Looking through the code base, there are 17 places that use the condition (EncodedBySSE38orSSE3A(ins) || (ins == INS_crc32)) and only 8 that use the condition (EncodedBySSE38orSSE3A(ins)). In particular, the places without the || (ins == INS_crc32) logic handle emission of:

IF_RRW_ARD_CNS
IF_RWR_ARD_CNS
IF_RWR_RRD_ARD_CNS
IF_RWR_RRD_ARD_RRD
IF_RRW_SRD_CNS
IF_RWR_SRD_CNS
IF_RWR_RRD_SRD
IF_RWR_RRD_SRD_CNS
IF_RWR_RRD_SRD_RRD
IF_RRW_MRD_CNS
IF_RWR_MRD_CNS
IF_RWR_RRD_MRD
IF_RWR_RRD_MRD_CNS
IF_RWR_RRD_MRD_RRD

Since crc32 is a binary operator, none of these apply, so we could let EncodedBySSE38OrSSE3A handle crc32 as well (and rename it to match).

The only question then is whether Is4ByteSSEInstruction should also include it (that's the only other use of EncodedBySSE38OrSSE3A). That's a little trickier to unravel; there a case in emitOutputRR I'm not sure about; a case in emitOutputInstr dealing with IF_RRW_RRW_CNS that doesn't matter; and a case in emitGetAdjustedSize to fix. Presumably we could define it as:

bool emitter::Is4ByteSSEInstruction(instruction ins)
{
    return (!UseVEXEncoding() && EncodedBySSE38orSSE3A(ins)) || (ins == INS_crc32);
}

as crc32 is not affected by VEX encoding.

Copy link
Member

@tannergooding tannergooding Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean you don't know if this fixes the issue

I meant this as, there might be additional locations that need equivalent fixups; however, adjusting the existing method that handles SSE38 and SSE3A encodings may be a better overall fix.

EncodedBySSE38OrSSE3A returns false for crc32 because crc32 is not in the SSE or AVX instruction range

Ah I see, that would impact this as well: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/instrsxarch.h#L615

For what its worth, it is actually an SSE4.2 instruction (and SSE38 based encoding), even if its not a vector instruction and should likely be in the list (much as BMI1 and BMI2 are in the overall AVX range). I'm guessing that crc32, tzcnt, lzcnt, and popcnt fell into some special category that forced them out of the SSE/AVX ranges, possibly due to more complicated fixups that would've been required. In particular:

  • crc32 is explicitly SSE4.2
  • tzcnt is explicitly BMI1
  • lzcnt is implicitly BMI1 on Intel; its separate because its a separate bit for compat with AMD which exposed it as part of ABM ~2007 (so 6 years earlier)
  • popcnt is implicitly SSE4.2 on Intel; its separate because for the same reason as lzcnt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your comments above; it sounds like this fix is fine for .NET 6. It's probably worth seeing if we should adjust the overall handling for .NET 7 to properly account for the SSE38 prefix difference (as that is likely required for future instruction expansion regardless).

{
dst += emitOutputByte(dst, code | 0x04);
}
Expand Down