-
Notifications
You must be signed in to change notification settings - Fork 5.3k
[release/8.0-staging] Ensure that the Create(Dot(...)) optimization doesn't kick in for Vector2 pre SSE4.1 #97074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/8.0-staging] Ensure that the Create(Dot(...)) optimization doesn't kick in for Vector2 pre SSE4.1 #97074
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsBackport of #96951 to release/8.0-staging /cc @tannergooding Customer ImpactTestingRiskIMPORTANT: If this backport is for a servicing release, please verify that:
|
|
CC. @EgorBo |
|
Does this affect Arm? |
No.
[
vector[0] + vector[1],
vector[2] + vector[3],
vector[0] + vector[1],
vector[2] + vector[3]
]This is problematic for However, Arm64 supports V64 and so you get |
|
@jeffschwMSFT, @JulieLeeMSFT was this not approved in servicing review or was there some other issue blocking it? This was a customer reported codegen bug as indicated in the summary. |
|
This just needs a code review and then we can take it to tactics. I removed the servicing-consider as it was hanging out for multiple potential reviews. |
|
@EgorBo could you give the review here since you signed off on the original fix? |
jeffschwMSFT
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved. we will take for consideration in 8.0.x
|
CC. @EgorBo, @dotnet/jit-contrib for merging. Monday February 12th is the Code Complete deadline for the March Release, so it needs to be merged by then (Julie asked that I let one of the JIT reviewers do the actual merge here). |
Backport of #96951 to release/8.0-staging
/cc @tannergooding
Customer Impact
This can result in incorrect computations if you are on a machine that pre-dates SSE4.1 support (a machine with a CPU from before ~Nov 2007)
Regression
On older hardware (pre-SSE4.1) use of the
Create(Dot(...))optimization would kick in for Vector2. However, since there are two elements, a multiply followed by a single pairwise addition is emitted and the operation isn't naturally broadcasting on this older hardware. This is in contrast to newer hardware where it can utilize thedot productinstruction which can always broadcast the result.Testing
A regression test covering the scenario was added.
Risk
Low. The issue is well understood and the hardware this impacts is increasingly small. For cloud providers, most default to AVX2 capabable hardware or newer (~Jun 2013). For consumer hardware, places such as the Steam Hardware Survey (see under
Other Settings) report 99.57% of customers have SSE4.1 support.This wasn't surfacing in scenarios such as crossgen/r2r because SSE4.1 support is opportunistically enabled by default.