Skip to content

Conversation

@jakobbotsch
Copy link
Member

Backport of #94413 to release/8.0-staging

Customer Impact

  • Customer reported
  • Found internally

Expressions involving multiplications of VectorX.One with a scalar are constant folded to incorrect values. For example, Vector2.One * a, which should be folded to <a, a> is folded to <a, 0>. Customer reported in #98137; this backports a .NET 9 fix #94413 that noticed and fixed the issue.

Regression

  • Yes
  • No

Testing

Regression test included.

Risk

Low. The fix has been in main for several months already.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 8, 2024
@ghost ghost assigned jakobbotsch Feb 8, 2024
@ghost
Copy link

ghost commented Feb 8, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of #94413 to release/8.0-staging

Customer Impact

  • Customer reported
  • Found internally

Expressions involving multiplications of VectorX.One with a scalar are constant folded to incorrect values. For example, Vector2.One * a, which should be folded to <a, a> is folded to <a, 0>. Customer reported in #98137; this backports a .NET 9 fix #94413 that noticed and fixed the issue.

Regression

  • Yes
  • No

Testing

Regression test included.

Risk

Low. The fix has been in main for several months already.

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@carlossanlop
Copy link
Contributor

Friendly reminder that Monday February 12th is the Code Complete deadline for the March Release. Please make sure to add the servicing-consider label when ready and send the email to Tactics requesting approval.

@theofficialgman
Copy link

theofficialgman commented Feb 9, 2024

@jakobbotsch in the meantime, while this bug exists in release 8.0.x and likely will for at least a few weeks/months until this is merged and a new release is cut, can the problematic NI_AdvSimd_MultiplyByScalar and NI_AdvSimd_Arm64_MultiplyByScalar be forced to use the interpreter and never be jitted via an environment variable?

@jakobbotsch jakobbotsch added the Servicing-consider Issue for next servicing release review label Feb 9, 2024
@jakobbotsch
Copy link
Member Author

@jakobbotsch in the meantime, while this bug exists in release 8.0.x and likely will for at least a few weeks/months until this is merged and a new release is cut, can the problematic NI_AdvSimd_MultiplyByScalar and NI_AdvSimd_Arm64_MultiplyByScalar be forced to use the interpreter and never be jitted via an environment variable?

Not easily I think, @tannergooding do you know if we have any fine grained controls that would allow this?

A possible workaround is to extract the multiplication to your own uninlined function, something like

[MethodImpl(MethodImplOptions.NoInlining)]
static Vector2 MulByScalar(Vector2 v, float a) => v * a;

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved. we will take for consideration in 8.0.x

@tannergooding
Copy link
Member

do you know if we have any fine grained controls that would allow this?

We do not. You either get all intrinsics in an ISA, or none of them. There is no per method configuration (which would be a nightmare to maintain, given we have over 2000 Arm64 intrinsic APIs).

We likewise have no support for emulating the behavior of Arm64 hardware intrinsics. If the hardware doesn't support them, you don't get access to them and they throw.

@theofficialgman
Copy link

theofficialgman commented Feb 11, 2024

do you know if we have any fine grained controls that would allow this?

We do not. You either get all intrinsics in an ISA, or none of them. There is no per method configuration (which would be a nightmare to maintain, given we have over 2000 Arm64 intrinsic APIs).

@tannergooding understood. so disabling all arm64 hardware intrinsics is an option then in the meantime DOTNET_EnableArm64AdvSimd=0 (and is much more performant that disabling the whole jit)

Is there any way to set that environment variable as default at application compile time? It seems like it needs to be set before the dotnet runtime even starts inorder for the jit to respect it, so setting it in the main Program.cs doesn't seem to work as that is too late.

@tannergooding
Copy link
Member

so disabling all arm64 hardware intrinsics is an option then in the meantime DOTNET_EnableArm64AdvSimd=0 (and is much more performant that disabling the whole jit)

A lot of very core components in the BCL use hardware intrinsics for acceleration but very few of them rely on multiplication by a scalar and even fewer rely or expose any potential to hit multiplication of a scalar by a constant. As such, I expect you would see very significant performance regressions from globally disabling the feature and it would likely be significantly easier to just workaround the behavior in your own application instead.

For example, if you're needing to construct a Vector2 with every element being the same value, it's much better to simply do new Vector2(number). Multiplication by 1 is effectively a no-op, and so the optimization here in morph is really just a "nice to have" to help minimally improve codegen if you happen to hit this path after inlining. It's not expected to be some typical thing that a user would frequently hit in their applicaiton (much as you wouldn't expect to see x * 0 or x * 1 for scalar code in a library or application).

It seems like it needs to be set before the dotnet runtime even starts inorder for the jit to respect it

Yes

Is there any way to set that environment variable as default at application compile time

Not that I'm aware of. These configuration knobs are primarily meant for testing purposes, so dev's can trivially test things like their software fallback on a platform with SIMD support. They aren't really meant to be actually disabled in production scenarios, so they only support environment variables and don't have a corresponding RuntimeConfig.json switch or anything like that.

@jakobbotsch
Copy link
Member Author

Approved by tactics over email.

@jakobbotsch jakobbotsch added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Feb 12, 2024
@jakobbotsch jakobbotsch merged commit dedae5e into dotnet:release/8.0-staging Feb 12, 2024
@jakobbotsch jakobbotsch deleted the backport-94413 branch February 12, 2024 16:59
@theofficialgman
Copy link

theofficialgman commented Feb 12, 2024

A lot of very core components in the BCL use hardware intrinsics for acceleration but very few of them rely on multiplication by a scalar and even fewer rely or expose any potential to hit multiplication of a scalar by a constant. As such, I expect you would see very significant performance regressions from globally disabling the feature and it would likely be significantly easier to just workaround the behavior in your own application instead.

Sure, performance is poorer but the results are correct (which is what is important). Given the time/complexity/difficulty in working around the temporary problem (see below), I proposed the temporary solution. This issue existing at all even when a fix was in main for months I think highlights an issue in your development process for .NET. If there is a functional bug that can be trivially backported where there is a discrepancy between the interpreter and jit, why not mark that for backporting in every case?

For example, if you're needing to construct a Vector2 with every element being the same value, it's much better to simply do new Vector2(number). Multiplication by 1 is effectively a no-op, and so the optimization here in morph is really just a "nice to have" to help minimally improve codegen if you happen to hit this path after inlining. It's not expected to be some typical thing that a user would frequently hit in their applicaiton (much as you wouldn't expect to see x * 0 or x * 1 for scalar code in a library or application).

Sure, easy to say in theory, not feasible in practice where vector math is used across many files in the application and its framework (just two examples here: https://github.com/FosterFramework/Foster/blob/main/Framework/Utility/Calc.cs https://github.com/ExOK/Celeste64/blob/main/Source/Helpers/Utils.cs). Looking at it further, I assume that any use of the following constants multiplied by a vector would cause the bug https://github.com/FosterFramework/Foster/blob/dc065e0bfe1ed73e7285701c7128f5c6d1443b4b/Framework/Utility/Calc.cs#L18-L49 but I have not checked every file for user of vectors multiplied by constants... it would be great ordeal to do so and fix each one. edit: or maybe not since those are floats... in any case, this just highlights the ordeal that it would be to go through the application and all recursive dependencies to check for vector times scalar operations.

Its not my application either, I just was one of the ones that encountered the issue after building and testing the application on ARM64 linux and reported it to the application developer. Incase it wasn't clear, I am not a developer, just a hardware hacker/packager/porter/OS maintainer.

edit: this comment does not necessitate a response. it is just something to think about for the future.

@theofficialgman
Copy link

@jakobbotsch I'd like to confirm that this PR will get merged into the upcoming v8.0.3 tag by @dotnet-bot

It seems arbitrary to me what gets pulled from 8.0-staging into the release tag. Something like only commits that have an age >= 1 month seem to have gotten pulled in from 8.0-staging into the v8.0.2 release tag which is why this was missed last cycle.

@jakobbotsch
Copy link
Member Author

Yes, this will be in 8.0.3.

It seems arbitrary to me what gets pulled from 8.0-staging into the release tag. Something like only commits that have an age >= 1 month seem to have gotten pulled in from 8.0-staging into the v8.0.2 release tag which is why this was missed last cycle.

The servicing releases are finalized and validated several weeks before they are released. I've personally already manually validated that the reported issues around this bug are fixed with 8.0.3. It was never expected that this fix would make 8.0.2.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Servicing-approved Approved for servicing release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants