Minor fixes to write barriers #75478

AntonLapounov · 2022-09-12T19:55:58Z

Similar to #74325, uniformly exclude the g_GCShadowEnd, g_ephemeral_high, and g_highest_address upper bounds in the corresponding range checks.
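To illustrate the difference between the two kinds of checks, here is a minimal C sketch. It assumes the ranges are half-open, so an address equal to the upper bound lies outside the range. The function names are hypothetical and do not come from the runtime sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a runtime identifier.
   Half-open check: low is included, high is excluded. */
bool in_range_exclusive(uintptr_t addr, uintptr_t low, uintptr_t high)
{
    assert(low <= high);
    return addr >= low && addr < high;
}

/* Hypothetical off-by-one variant: also accepts addr == high. */
bool in_range_inclusive(uintptr_t addr, uintptr_t low, uintptr_t high)
{
    assert(low <= high);
    return addr >= low && addr <= high;
}
```

The two functions disagree only when `addr == high`, which is why such an issue would rarely show up in practice.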

Note that the JNB, JAE, and JNC mnemonics correspond to the same processor instruction (jump if CF = 0). I used JNB for amd64 and JAE for i386 for "local consistency" with other range checks in the corresponding files.
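As a rough model of why "jump if CF = 0" gives an exclusive upper bound: after an x86 `cmp a, b`, the carry flag is set when the unsigned subtraction `a - b` borrows, that is, when `a < b`. The C sketch below models this; the function names are illustrative only, and it assumes the address is the first operand of the compare.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models CF after x86 `cmp a, b`: set when a - b borrows. */
bool carry_after_cmp(uint64_t a, uint64_t b)
{
    uint64_t diff = a - b;  /* wraps modulo 2^64 */
    return diff > a;        /* the result wrapped exactly when b > a */
}

/* JNB, JAE and JNC all mean "jump if CF = 0",
   so the jump is taken when a >= b (unsigned). */
bool jnb_taken(uint64_t a, uint64_t b)
{
    return !carry_after_cmp(a, b);
}
```

Under this model, a `cmp addr, bound` followed by JNB leaves the barrier when `addr >= bound`, so an address equal to the bound is treated as out of range.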

The Windows ARM64 version would trash the x16 register in the CheckCardTable block. That was not reflected in the comments — neither in this file nor in src\coreclr\jit\targetarm64.h. I am not convinced that ldp improves performance here as it requires an additional adr instruction and increases the average number of retired instructions. The Linux ARM64 version uses two ldr instructions instead and I decided to use the same for Windows.

AntonLapounov · 2022-09-14T05:49:06Z

@VSadov Please take a look.

src/coreclr/vm/amd64/JitHelpers_Fast.asm

VSadov · 2022-09-14T16:37:17Z

src/coreclr/vm/arm64/asmhelpers.asm

-        bhi      Exit
+        ldr      x12,  wbs_ephemeral_high
+        cmp      x15,  x12
+        bhs      Exit


The .S version is using ccmp here. I think we should use the same pattern in both cases.

I wonder which would be better, though. In theory, the ephemeral check should fail more often than it passes: in a large heap most objects are tenured, and the ephemeral set is supposed to be small.

On the other hand, both patterns probably perform the same on a speculative CPU, but the predicated pattern is shorter by one instruction.

And on server GC the ephemeral check will always pass. Since we are touching this, I think it should be switched to the predicated form, like in the .S version.
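For readers less familiar with the two patterns, the following C sketch contrasts them. It is only an approximation: the real helpers are hand-written assembly, a C compiler is free to emit either shape for either function, and the function and parameter names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Branchy form: two compares, each followed by its own conditional exit. */
bool is_ephemeral_branchy(uintptr_t obj, uintptr_t lo, uintptr_t hi)
{
    if (obj < lo)
        return false;
    if (obj >= hi)
        return false;
    return true;
}

/* Predicated form: both compares are evaluated and combined,
   leaving a single decision point (roughly what cmp + ccmp achieves). */
bool is_ephemeral_predicated(uintptr_t obj, uintptr_t lo, uintptr_t hi)
{
    return (obj >= lo) & (obj < hi);
}
```

Both functions return the same answer for every input; they differ only in how many branches the check needs.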

mangod9 · 2023-03-06T16:05:43Z

@AntonLapounov @VSadov is this something we need to continue to fix?

VSadov · 2023-03-06T18:38:46Z

The changes from edge-inclusive to edge-exclusive compares are bug fixes. These kinds of issues would be difficult to hit, but it makes sense to have correct comparisons.

I am not very concerned about NoShadow code paths as that is debug-only code and I do not know if WRITE_BARRIER_CHECK is used regularly or at all.
The changes in CheckCardTable are in the "real" code.

My only comment on that was that, since we are using the predicated form in the Unix ARM64 helper, perhaps we should use the predicated form in the Windows counterpart as well. It likely does not matter much which form is used; the change would be mostly for consistency, and the predicated form is one instruction shorter.

I could do the change if @AntonLapounov is ok with that.

Otherwise this change LGTM

AntonLapounov · 2023-03-06T19:24:38Z

I will try to complete it this week.

trylek · 2023-04-10T21:52:17Z

Now that Anton has been reassigned to other work outside of .NET, can someone on @dotnet/gc and / or @dotnet/jit-contrib chime in who would be best suited to finalize and merge this PR?

VSadov · 2023-04-10T21:58:35Z

There was only one actionable suggestion - to use similar code in win-arm64 as in unix-arm64 in one place. There is no reason for the difference.
I will make the change.

Maoni0 · 2023-04-10T22:02:38Z

I think @cshung was making use of GC shadow at one point when he was debugging some WB stuff and jit folks probably use it for the most part.

VSadov · 2023-04-10T22:24:07Z

I've made the change to address my earlier concern - no need to use different ways to compare on windows and on unix.
With that the change looks good to me.

VSadov · 2023-04-10T23:54:54Z

I may need another signoff on this, or it would look like I am signing off on my own PR

Maoni0

LGTM

VSadov · 2023-04-11T19:28:00Z

Thanks!!

trylek · 2023-04-11T19:49:07Z

Thanks to everyone involved for getting this finished!

EgorBo · 2023-04-20T16:50:34Z

Improvements on arm64:

[Perf] Windows/arm64: 14 Improvements on 4/11/2023 8:14:56 PM perf-autofiling-issues#16766

VSadov · 2023-04-20T17:56:58Z

Switching to predicated bounds check resulted in measurable improvement. I mostly expected just more compact code.

I was considering making the same change to NativeAOT barriers, but just shortening the code was not enough motivation.
It is a trivial change. I will apply the same to NativeAOT.

VSadov · 2023-04-20T18:02:09Z

@EgorBo - in addition to the 14 improvements there was one regression in CastingPerf2.CastingPerf.FooObjIsNull. Was that a one-time outlier, or did it persist in later runs? Is there a way to check?

EgorBo · 2023-04-20T18:09:58Z

> @EgorBo - in addition to the 14 improvements there was one regression in CastingPerf2.CastingPerf.FooObjIsNull. Was that a one-time outlier, or did it persist in later runs? Is there a way to check?

We can revisit that one in a week, since there are not enough data points to judge yet.

DrewScoggins · 2023-08-28T22:07:14Z

This ended up being a lasting regression.

VSadov · 2023-08-28T23:10:24Z

> This ended up being a lasting regression.

The changed code performs two comparisons (as in a logical &&).
The change appears to slightly penalize cases where the first compare fails and the second would not have happened, while cases where both compares are performed are slightly faster. A null object is one case where the compare could short-circuit.
The difference is relatively small and is only visible because this is such a tight-loop benchmark.

Either flavor of this code would penalize one case or another. Considering we have many more improvements and the new variant is shorter, I think this regression is acceptable.
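The trade-off can be made concrete by counting compares. The C sketch below is only a model under stated assumptions: the names are hypothetical, the `compares` counter exists purely for illustration, and a null object is assumed to fall below the lower bound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Short-circuit form: the second compare is skipped when the first fails. */
bool check_short_circuit(uintptr_t obj, uintptr_t lo, uintptr_t hi, int *compares)
{
    *compares = 1;
    if (obj < lo)
        return false;
    *compares = 2;
    return obj < hi;
}

/* Predicated form: both compares are always performed. */
bool check_predicated(uintptr_t obj, uintptr_t lo, uintptr_t hi, int *compares)
{
    *compares = 2;
    return (obj >= lo) & (obj < hi);
}
```

With a null `obj`, the short-circuit form stops after one compare while the predicated form still does two; for an object inside the range, both forms do two compares and the predicated form avoids one branch.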

Minor fixes to write barriers

cb1509d

AntonLapounov requested a review from VSadov September 12, 2022 19:55

ghost added the area-VM-coreclr label Sep 12, 2022

ghost assigned AntonLapounov Sep 12, 2022

VSadov reviewed Sep 14, 2022

src/coreclr/vm/amd64/JitHelpers_Fast.asm

VSadov reviewed Sep 14, 2022

trylek assigned VSadov and unassigned AntonLapounov Apr 10, 2023

use the same predicated compare as on Unix when checking ephemeral lo…

da811d6

…w/high

VSadov approved these changes Apr 10, 2023

Maoni0 approved these changes Apr 11, 2023

VSadov merged commit dbe9f2a into dotnet:main Apr 11, 2023

EgorBo mentioned this pull request Apr 20, 2023

[Perf] Windows/arm64: 1 Regression on 4/11/2023 8:14:56 PM dotnet/perf-autofiling-issues#16759

Closed

ghost locked as resolved and limited conversation to collaborators May 20, 2023
