Skip to content

Conversation

@BruceForstall
Copy link
Contributor

On x64, when the crc32 instruction 2nd operand is a memory address
(such as for a static field), and that address is containable
(which normally doesn't happen, because the address will be above
the 4GB lower address space), then the instruction was being
improperly encoded.

Fixes #59714

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 13, 2021
@ghost
Copy link

ghost commented Oct 13, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

On x64, when the crc32 instruction 2nd operand is a memory address
(such as for a static field), and that address is containable
(which normally doesn't happen, because the address will be above
the 4GB lower address space), then the instruction was being
improperly encoded.

Fixes #59714

Author: BruceForstall
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@BruceForstall
Copy link
Contributor Author

@tannergooding @dotnet/jit-contrib PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure this is the right fix....

I would have expected EncodedBySSE38OrSSE3A to be returning true here, because CRC32 is F2 0F 38 F0 or F2 0F 38 F1.

However, looking at https://github.com/dotnet/runtime/blob/745fa1c7b30f1b9107084f0baecc325527fa9561/src/coreclr/jit/emitxarch.cpp#L1583-L1611, it might be failing because the "prefix" is 0xF2 and not 0x66.

0x66, 0xF2, and 0xF3 are all encounterable "prefix" bytes here and this is probably something that was missed when instructions covering the other two were added. In particular:

  • 0x66 is the most prominent and covers every instruction we support today, except CRC32
  • 0xF2 - Only used for SSE42.CRC32
  • 0xF3 - no instructions exposed here yet
    • ADX.ADOX
    • AESKLE.AESDEC128KL
    • AESKLE.AESDEC256KL
    • AESKLEWIDE_KL.AESDECWIDE128KL
    • AESKLEWIDE_KL.AESDECWIDE256KL
    • AESKLE.AESENC128KL
    • AESKLE.AESENC256KL
    • AESKLEWIDE_KL.AESENCWIDE128KL
    • AESKLEWIDE_KL.AESENCWIDE256KL
    • AESKLE.ENCODEKEY128
    • AESKLE.ENCODEKEY256
    • KL.LOADIWKEY

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure this is the right fix

Do you mean you don't know if this fixes the issue? Or you think there might be a more general fix? (I did of course verify that it fixes the issue. It's also nice that it's very simple and contained, as I want to port this to .NET 6.)

More details:

EncodedBySSE38OrSSE3A returns false for crc32 because crc32 is not in the SSE or AVX instruction range, so IsSSEOrAVXInstruction returns false. (Not sure why it's defined that way; tzcnt/lzcnt/popcnt also aren't in that set.)

Looking through the code base, there are 17 places that use the condition (EncodedBySSE38orSSE3A(ins) || (ins == INS_crc32)) and only 8 that use the condition (EncodedBySSE38orSSE3A(ins)). In particular, the places without the || (ins == INS_crc32) logic handle emission of:

IF_RRW_ARD_CNS
IF_RWR_ARD_CNS
IF_RWR_RRD_ARD_CNS
IF_RWR_RRD_ARD_RRD
IF_RRW_SRD_CNS
IF_RWR_SRD_CNS
IF_RWR_RRD_SRD
IF_RWR_RRD_SRD_CNS
IF_RWR_RRD_SRD_RRD
IF_RRW_MRD_CNS
IF_RWR_MRD_CNS
IF_RWR_RRD_MRD
IF_RWR_RRD_MRD_CNS
IF_RWR_RRD_MRD_RRD

Since crc32 is a binary operator, none of these apply, so we could let EncodedBySSE38OrSSE3A handle crc32 as well (and rename it to match).

The only question then is whether Is4ByteSSEInstruction should also include it (that's the only other use of EncodedBySSE38OrSSE3A). That's a little trickier to unravel; there a case in emitOutputRR I'm not sure about; a case in emitOutputInstr dealing with IF_RRW_RRW_CNS that doesn't matter; and a case in emitGetAdjustedSize to fix. Presumably we could define it as:

bool emitter::Is4ByteSSEInstruction(instruction ins)
{
    return (!UseVEXEncoding() && EncodedBySSE38orSSE3A(ins)) || (ins == INS_crc32);
}

as crc32 is not affected by VEX encoding.

Copy link
Member

@tannergooding tannergooding Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean you don't know if this fixes the issue

I meant this as, there might be additional locations that need equivalent fixups; however, adjusting the existing method that handles SSE38 and SSE3A encodings may be a better overall fix.

EncodedBySSE38OrSSE3A returns false for crc32 because crc32 is not in the SSE or AVX instruction range

Ah I see, that would impact this as well: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/instrsxarch.h#L615

For what its worth, it is actually an SSE4.2 instruction (and SSE38 based encoding), even if its not a vector instruction and should likely be in the list (much as BMI1 and BMI2 are in the overall AVX range). I'm guessing that crc32, tzcnt, lzcnt, and popcnt fell into some special category that forced them out of the SSE/AVX ranges, possibly due to more complicated fixups that would've been required. In particular:

  • crc32 is explicitly SSE4.2
  • tzcnt is explicitly BMI1
  • lzcnt is implicitly BMI1 on Intel; its separate because its a separate bit for compat with AMD which exposed it as part of ABM ~2007 (so 6 years earlier)
  • popcnt is implicitly SSE4.2 on Intel; its separate because for the same reason as lzcnt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your comments above; it sounds like this fix is fine for .NET 6. It's probably worth seeing if we should adjust the overall handling for .NET 7 to properly account for the SSE38 prefix difference (as that is likely required for future instruction expansion regardless).

On x64, when the crc32 instruction 2nd operand is a memory address
(such as for a static field), and that address is containable
(which normally doesn't happen, because the address will be above
the 4GB lower address space), then the instruction was being
improperly encoded.
@BruceForstall
Copy link
Contributor Author

@tannergooding Do you have any further change you think are required for this, or are you willing to approve this PR?

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good for a minimal fix.

I think, in terms of potential future instructions and the like we may need a more complex fix properly recognizing CRC32 as EncodedBySSE38

@BruceForstall BruceForstall merged commit 5b1ebf7 into dotnet:main Oct 13, 2021
@BruceForstall
Copy link
Contributor Author

/backport to release/6.0

@github-actions
Copy link
Contributor

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1338896860

@ghost ghost locked as resolved and limited conversation to collaborators Nov 13, 2021
@BruceForstall BruceForstall deleted the FixCrc32Encoding branch December 28, 2022 01:13
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JIT: crc32 bad codegen for contained constant address

2 participants