Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
ea2f3d5
Arm64: Add If conversion pass
a74nh Jul 15, 2022
8ee167f
Minor review fixups
a74nh Aug 10, 2022
039de48
Return a PhaseStatus
a74nh Aug 10, 2022
ddfad68
Fix formatting
a74nh Aug 11, 2022
41c3a9e
Check for side effects on NOPs
a74nh Aug 11, 2022
4c93a4c
Add function block comments for the phase
a74nh Aug 11, 2022
891510d
Remove creation of AND chains from if conversion pass
a74nh Aug 12, 2022
c1281b5
Update middleBlock flow
a74nh Aug 12, 2022
b900344
Check for order side effects
a74nh Aug 12, 2022
a3e89b9
Remove COLON_COND check
a74nh Aug 31, 2022
441a60c
Remove flag toggling
a74nh Aug 31, 2022
962c83b
Move the conditional assignment to the JTRUE block
a74nh Aug 31, 2022
90bff39
Fix formatting
a74nh Sep 2, 2022
36cf21d
Allow conditions with side effects
a74nh Sep 7, 2022
127f9a6
Fix formatting
a74nh Sep 7, 2022
d1f0862
Correct all moved SSA statements
a74nh Sep 15, 2022
852a5af
Add size costing check
a74nh Sep 15, 2022
6690566
Only move middle block ssa defs
a74nh Sep 16, 2022
58372e2
Fix formatting
a74nh Sep 16, 2022
10a8f1f
Fewer SSA assumptions
a74nh Sep 20, 2022
97b1af1
Use implicit func for value numbering
a74nh Sep 21, 2022
3f60851
Update header for gtFoldExprConditional
a74nh Oct 6, 2022
0f054fe
Cost based on speed
a74nh Oct 6, 2022
6f0abdf
Merge branch 'main'
a74nh Oct 6, 2022
393fd39
Add Stress mode for inner loops
a74nh Oct 7, 2022
09d62fe
Rework costings
a74nh Oct 7, 2022
73007b2
Check for invalid VNs
a74nh Oct 10, 2022
61af92d
Ignore compares against zero
a74nh Oct 10, 2022
4fdabea
Ensure float compares are contained
a74nh Oct 12, 2022
be60d30
Allow if conversion of test compares
a74nh Oct 12, 2022
806e061
Do not contain test compares within compare chains
a74nh Oct 13, 2022
a508e85
Add float versions of the JIT/opt/Compares tests
a74nh Oct 13, 2022
ac2568f
Fix formatting
a74nh Oct 13, 2022
bad8097
Compare chains use CmpCompares, selects use Compares
a74nh Oct 14, 2022
ed06f67
Merge main
a74nh Oct 17, 2022
935716c
Fix flow checking for empty blocks
a74nh Oct 20, 2022
12bfe0a
Merge main
a74nh Oct 20, 2022
f2f3a02
Fix to contexts setting JitStdOutFile
jakobbotsch Oct 26, 2022
2e183bd
Temporary fix for SPMI problem
jakobbotsch Oct 26, 2022
b2772ff
Fix attr and reg producing in select generation
a74nh Oct 28, 2022
4281fc0
Revert SPMI changes
jakobbotsch Oct 28, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add size costing check
  • Loading branch information
a74nh committed Sep 15, 2022
commit 852a5afdd66327d05b8abab13190da00ff72097d
3 changes: 3 additions & 0 deletions src/coreclr/jit/gentree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5804,6 +5804,9 @@ unsigned Compiler::gtSetEvalOrder(GenTree* tree)
level = max(level, lvl2);
costEx += tree->AsConditional()->gtOp2->GetCostEx();
costSz += tree->AsConditional()->gtOp2->GetCostSz();

costEx += 1;
costSz += 1;
break;

default:
Expand Down
21 changes: 18 additions & 3 deletions src/coreclr/jit/optimizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4755,18 +4755,33 @@ bool Compiler::optIfConvert(BasicBlock* block)
return false;
}

// Calculate size costs.
// First reclaculate existing costs to make sure they are up to date.
gtSetEvalOrder(last);
gtSetEvalOrder(asgNode);
// If the condition is against 0 then JTRUE will have no size cost as it'll be merged into the compare.
int currentCostSz =
asgNode->GetCostSz() + (cond->gtGetOp2()->IsIntegralConst(0) ? cond->GetCostSz() : last->GetCostSz());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would fix the costing instead. We should not be working around incorrect costing like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me that this is really the heuristic we want, barely anything in the JIT uses code size estimates when optimizing for speed (the places that do are more about code duplication -- which this transformation does not do).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would stick with Andy's suggestion here -- put a constant limit on GetCostEx() for the code that is now being evaluated eagerly. We can adjust the heuristic as needed once we get some mileage on the transformation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed optimising for speed is better than size.

But - by optimising for speed I always see regressions via spmidiff. That's due the costings not taking into account that comparison against 0 will get optimised away into cbnz.

With everything enabled:
[14:50:56] Total bytes of base: 138017016 (overridden on cmd)
[14:50:56] Total bytes of diff: 138022692 (overridden on cmd)
[14:50:56] Total bytes of delta: 5676 (0.00 % of base)
[14:50:56] 567 total methods with Code Size differences (87 improved, 480 regressed), 783 unchanged.

Limiting via costEx:
[14:54:47] Total bytes of base: 138017992 (overridden on cmd)
[14:54:47] Total bytes of diff: 138021404 (overridden on cmd)
[14:54:47] Total bytes of delta: 3412 (0.00 % of base)
[14:54:47] 281 total methods with Code Size differences (42 improved, 239 regressed), 572 unchanged.

This is using the following for costing:

    // Using SELECT nodes means that full assignment is always evaulated.
    // Put a limit on the original source and destination of the assignment.
    int costing = asgNode->gtGetOp1()->GetCostEx() + asgNode->gtGetOp2()->GetCostEx() - 2;
    JITDUMP("costing=%d\n", costing);

    if (costing > 0 && !compStressCompile(STRESS_IF_CONVERSION, 0))
    {
        JITDUMP("Aborting If Conversion: optimisation too expensive (+%d)\n", costing);
        return false;
    }

But, if speed is what we're looking for, then yes, checking spmidiff isn't that useful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's due the costings not taking into account that comparison against 0 will get optimised away into cbnz.

Sounds like something we ought to fix, if you have free cycles can you submit a separate PR for it?

int costing = asgNode->gtGetOp1()->GetCostEx() + asgNode->gtGetOp2()->GetCostEx() - 2;

Evaluating the cost of the LHS of assignments uncontextually is odd since it is treated specially ("no cost" if it ends up being a local in a register, for example). gtSetEvalOrder uses gtIsLikelyRegVar to estimate this, does it make sense to take it into account? Basically, in that case the cost of ASG(lcl, SELECT(...)) is only going to be the cost of SELECT when evaluated into a target register.

I was really just thinking of something like:

if (!compStressCompile(STRESS_IF_CONVERSION, weight))
{
  int writeCost = gtIsLikelyRegVar(asgNode->gtGetOp1()) ? 0 : 2;

  if (writeCost + asgNode->gtGetOp2()->GetCostEx() > 3)
  {
    JITDUMP("Skipping if-conversion that will evaluate expensive RHS unconditionally");
    return false;
  }
}

Taking cbnz into account as you mentioned would be great. Exact values need to be adjusted, for example maybe pick the number such that we allow conversion for x = cond ? a + b : x, but no more than that. By the way, clang seems really aggressive: https://godbolt.org/z/54GMGaTfY

I'm not that worried about the size wise regressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like something we ought to fix, if you have free cycles can you submit a separate PR for it?

Sure, wasn't sure if it was intentional. I'll do that and the incorrect size one too.

I was really just thinking of something like:

Will try this out.

By the way, clang seems really aggressive: https://godbolt.org/z/54GMGaTfY

I'm not certain on this, but, I think in later passes clang will sometimes turn the if conversions back into branches (maybe as part of auto vec).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated costings and set the Stress to 25%.

Limit of 7 to allow for:

N005 (  7,  5) [000016] -A------R--                         *  ASG       int    $VN.Void
N004 (  1,  1) [000015] D------N---                         +--*  LCL_VAR   int    V00 arg0         d:3 $VN.Void
N003 (  7,  5) [000014] -----------                         \--*  ADD       int    $102
N001 (  3,  2) [000012] -----------                            +--*  LCL_VAR   int    V02 arg2         u:1 (last use) $82
N002 (  3,  2) [000013] -----------                            \--*  LCL_VAR   int    V03 arg3         u:1 (last use) $83

int postOptCostSz =
asgNode->GetCostSz() + 1 /* GT_SELECT */ + cond->GetCostSz() + asgNode->AsOp()->gtOp1->GetCostSz();

#ifdef DEBUG
if (verbose)
{
JITDUMP("Conditionally executing " FMT_BB " inside " FMT_BB "\n", middleBlock->bbNum, block->bbNum);
JITDUMP("\nConditionally executing " FMT_BB " inside " FMT_BB "\n", middleBlock->bbNum, block->bbNum);
for (BasicBlock* dumpBlock = block; dumpBlock != middleBlock->bbNext; dumpBlock = dumpBlock->bbNext)
{
fgDumpBlock(dumpBlock);
}
JITDUMP("\n");
}
#endif

if (postOptCostSz > currentCostSz)
{
JITDUMP("Aborting If Conversion: optimisation too expensive (%d>%d)\n", postOptCostSz, currentCostSz);
return false;
}

// Duplicate the destination of the assign.
// This will be used as the false result of the select node.
assert(asgNode->AsOp()->gtOp1->IsLocal());
Expand Down Expand Up @@ -4850,7 +4865,7 @@ bool Compiler::optIfConvert(BasicBlock* block)
#ifdef DEBUG
if (verbose)
{
JITDUMP("After if conversion\n");
JITDUMP("\nAfter if conversion\n");
for (BasicBlock* dumpBlock = block; dumpBlock != middleBlock->bbNext; dumpBlock = dumpBlock->bbNext)
{
fgDumpBlock(dumpBlock);
Expand Down