Skip to content

Conversation

@kunalspathak
Copy link
Contributor

Earlier, I added a limit mask to include few registers in case we are allocating during consecutive registers. However, the mask was not sufficient, and we were still running into situation where we won't have register to allocate to one of the consecutive registers or a refPosition that is live at the same location. Updated the handling of such cases:

  1. Fix issue where we might have more than one busy register range available to allocate and in prevRegOpt heuristic, teach it to handle the free registers.
  2. If we see that consecutive registers are not available in JitStressRegs mode, add back a register range so it can allocate for it.
  3. Likewise, if a refposition that is alive at the same location where consecutive registers are live, most of the registers are busy and hence, just do not further limit the registers for such refpositions.
  4. While calculating minRegCount, take into account the refpositions for consecutive registers.
  5. With all of that, we do not need LsraLimitFPSetForConsecutive.

Fixes: #84536

If we find out that there are no candidates free/busy for refPositions
that need consecutive registers, have at least one range of registers
in the candidates such that allocation is possible.
Intially, we were just returning RBM_NONE if we don't find any freeCandidates,
but instead should try if we can find out if there are any busy candidates
that we should try them out.
…osition

If consecutive registers are being allocated, other refpositions that are live
at the same location might not have enough registers left to be assigned because
all registers are busy. As such, introduce a way to track if we are assigning
at the location of consecutive registers, and if yes, do not take jitstressregs
limit into account.
For consecutive register, also include the register count needed for "minimum
register requirement" when limiting the registers.
With other conditions in place, no need to have LsraLimitFPSetForConsecutive.
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 10, 2023
@ghost ghost assigned kunalspathak Apr 10, 2023
@ghost
Copy link

ghost commented Apr 10, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Earlier, I added a limit mask to include few registers in case we are allocating during consecutive registers. However, the mask was not sufficient, and we were still running into situation where we won't have register to allocate to one of the consecutive registers or a refPosition that is live at the same location. Updated the handling of such cases:

  1. Fix issue where we might have more than one busy register range available to allocate and in prevRegOpt heuristic, teach it to handle the free registers.
  2. If we see that consecutive registers are not available in JitStressRegs mode, add back a register range so it can allocate for it.
  3. Likewise, if a refposition that is alive at the same location where consecutive registers are live, most of the registers are busy and hence, just do not further limit the registers for such refpositions.
  4. While calculating minRegCount, take into account the refpositions for consecutive registers.
  5. With all of that, we do not need LsraLimitFPSetForConsecutive.

Fixes: #84536

Author: kunalspathak
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr

Milestone: -

Copy link
Contributor

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple questions

candidates &= ~busyRegs;

#ifdef DEBUG
inUseOrBusyRegsMask |= ~busyRegs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be:

Suggested change
inUseOrBusyRegsMask |= ~busyRegs;
inUseOrBusyRegsMask |= busyRegs;

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below, I am using this as registerAssignment & inUseOrBusyRegsMask.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a separate one for just inUseRegsMask to help differentiate?

It's confusing to see clearing all the busy regs here given the local name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have incorporated Bruce's feedback.

{
candidates &= ~checkConflictBit;
#ifdef DEBUG
inUseOrBusyRegsMask |= ~checkConflictBit;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inUseOrBusyRegsMask |= ~checkConflictBit;
inUseOrBusyRegsMask |= checkConflictBit;

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below, I am using this as registerAssignment & inUseOrBusyRegsMask.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if you | together two different conflict bits? Then all the bits will be set and registerAssignment & inUseOrBusyRegsMask will do nothing. Is that right? Don't you need to | together all the bits first and then registerAssignment & ~inUseOrBusyRegsMask? At the very least, inUseOrBusyRegsMask as a name doesn't make sense since in your usage it's (kind of) the opposite of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true.

@kunalspathak
Copy link
Contributor Author

Seems build failures were fixed by #84595.

@kunalspathak
Copy link
Contributor Author

kunalspathak commented Apr 11, 2023

superpmi errors is are #84536 and #84631

@tannergooding
Copy link
Member

superpmi errors are #84536

Isn't this PR supposed to fix this failure in particular?

@kunalspathak
Copy link
Contributor Author

superpmi errors are #84536

Isn't this PR supposed to fix this failure in particular?

Ah, meant to have this comment for the other PR. Will update my comment.

@tannergooding
Copy link
Member

-- As with #84634 (comment), it looks like the SPMI asserts that are still present are from \base\.

This PR, compared to a previous PR has "half" the asserts with older PRs listing failures for both \base\ and \diff\.

@kunalspathak kunalspathak merged commit 7a57f6d into dotnet:main Apr 11, 2023
@kunalspathak kunalspathak deleted the consecutive_jitstressregs branch April 11, 2023 23:54
@ghost ghost locked as resolved and limited conversation to collaborators May 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SPMI replay failure for Arm64 VectorTableLookup

3 participants