Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion src/coreclr/jit/morph.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5624,7 +5624,11 @@ GenTree* Compiler::fgMorphField(GenTree* tree, MorphAddrContext* mac)
GenTree* lclVar = gtNewLclvNode(lclNum, objRefType);
nullchk = gtNewNullCheck(lclVar, compCurBB);

nullchk->gtFlags |= GTF_DONT_CSE; // Don't try to create a CSE for these TYP_BYTE indirections
if (lclNum != info.compThisArg)
{
// Don't try to create a CSE for these TYP_BYTE indirections unless it was a "this" pointer.
nullchk->gtFlags |= GTF_DONT_CSE;
}

if (asg)
{
Expand Down
8 changes: 8 additions & 0 deletions src/coreclr/jit/optcse.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3684,6 +3684,14 @@ bool Compiler::optIsCSEcandidate(GenTree* tree)
case GT_RETURN:
return false; // Currently the only special nodes that we hit
// that we know that we don't want to CSE
case GT_NULLCHECK:
Copy link
Contributor

@SingleAccretion SingleAccretion Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NULLCHECK does not produce values and gets the "void" VN, so, how does this work at all...?

Edit: I see how it works, this actually just unblocks hoisting, not CSE. It would be better to special-case this in IsNodeHoistable.

Also, I don't think there is a need to check for this here, we can hoist any NULLCHECKs (it is always profitable, which is also where the cost function should be adjusted for these non-value nodes).

Copy link
Contributor Author

@kunalspathak kunalspathak Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optIsCSEcandidate() has 2 purposes.

  1. During loop hoisting we check if it will be handled by CSE and if yes, we hoist it out of the loop. During CSE phase, when we actually have to do the expression elimination. By telling that it is CSE candidate, it helps hoisting the check out of loop (because it is an invariant). While working on Hoist the invariants out of multi-level nested loops #68061, I realized cases where we would hoist the field address calculation ahead of null-check although there might not be a visible functional difference. But this PR fixes that to some extent at least for nullchecks around this.

image

  1. While doing actual CSE, we use optIsCSEcandidate(). However, right after that, we check if has any reserved VN and if yes, skip CSEing it.

ValueNum valueVN = vnStore->VNNormalValue(tree->GetVN(VNK_Liberal));
if (ValueNumStore::isReservedVN(valueVN) && (valueVN != ValueNumStore::VNForNull()))
{
continue;
}

Later, global propagation phase takes care of eliminating the null-check that is present inside the loop.

Copy link
Contributor

@SingleAccretion SingleAccretion Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

Until now, it has been the case that a "hoistable tree" is (almost, modulo one struct pessimization) the same as "CSEable tree", and both must be values, as the design of hoisting is that we rely on CSE to "clean things up".

But if we start to hoist non-value trees (which is a perfectly valid thing to do), that's not really related to CSE, because: a) non-value trees will never be actual CSE candidates, b) the "cleanup" phase is assertion propagation.

So that's why I think it is confusing to put this check inside optIsCSECandidate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that's why I think it is confusing to put this check inside optIsCSECandidate.

Sure, I can probably put this in IsNodeHoistable() which is explicitly for hoisting. In future, we can add more non-value trees in it that can be hoisting candidates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think based on the above discussion it would be good to add a comment in IsNodeHoistable saying that we also can hoist null checks since assertion prop deals with the redundant check left in the loop. It sort of ties loop hoisting together with assertion prop in the same way it is tied together with CSE.

Also, as @SingleAccretion pointed out, I think you can allow this in more places than just for 'this' pointer. We hoist a tree only if its children are invariant which should be safe enough in general for GT_NULLCHECK. Of course that assumes/hopes assertion prop can pick up the additional cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried experimenting with relaxing the hoisting for other object's nullchecks and there are some wins, but I see some regressions of this sort.

image

Basically, we overcome the number of variables hoisted out of the loop as compared to the available registers. I have plans to revisit these heuristics (specially for arm64) in or after #68061 and I would look into "other object's nullcheck hoisting" at that time.

      [000167] hoistable    tree cost too low: 3 < 6 (loopVarCount 6 >= availableRegCount 6)
   ... not profitable to hoist

it would be good to add a comment

If there are no other changes to be done, I would rather do it in my #68061

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the new on the right? It looks like an improvement to me, both in size and perf score (the loop body is smaller and with fewer memory accesses). Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I was hoping to also get lea too out of the loop and it was not (because of reasons I stated above). I reran the jit-analyze with PerfScore and I did some improvements. Let me push the changes that enables this for other objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think there is still work to be done in hoisting, especially with regards to hoisting more invariant statements.

GenTreeLclVar* lclVar;
if (tree->AsUnOp()->gtGetOp1()->OperIs(GT_LCL_VAR))
{
lclVar = tree->AsUnOp()->gtGetOp1()->AsLclVar();
return lclVar->GetLclNum() == info.compThisArg;
}
return false;

default:
break; // Any new nodes that we might add later...
Expand Down