Skip to content
Merged
Show file tree
Hide file tree
Changes from 34 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
ee2c0b6
Optimize FMA codegen base on the overwritten
weilinwa Jul 20, 2021
46d0011
Improve function/var names
weilinwa Aug 27, 2021
cce4bda
Add assertions
weilinwa Aug 27, 2021
b825291
Get use of FMA with TryGetUse
weilinwa Sep 7, 2021
f615e39
Decide FMA form with two conditions, OverwrittenOpNum and isContained
weilinwa Sep 8, 2021
b698036
Fix op reg error in codegen
weilinwa Sep 10, 2021
7d9c0d6
Decide form using lastUse and isContained in no overwritten case
weilinwa Sep 15, 2021
1344d92
Clean up code
weilinwa Sep 18, 2021
029a9b5
Separate default case overwrittenOpNum==0
weilinwa Sep 20, 2021
f2a371f
Apply format patch
weilinwa Sep 29, 2021
9955389
Change variable and function names
weilinwa Oct 1, 2021
7c56653
Update regOptional for op1 and resolve some other comments
weilinwa Oct 5, 2021
1d51caa
Optimize FMA codegen base on the overwritten
weilinwa Jul 20, 2021
091133e
Improve function/var names
weilinwa Aug 27, 2021
9a6ae44
Add assertions
weilinwa Aug 27, 2021
ffcff76
Get use of FMA with TryGetUse
weilinwa Sep 7, 2021
5641f8f
Decide FMA form with two conditions, OverwrittenOpNum and isContained
weilinwa Sep 8, 2021
b7312ac
Fix op reg error in codegen
weilinwa Sep 10, 2021
a325fe3
Decide form using lastUse and isContained in no overwritten case
weilinwa Sep 15, 2021
0f950dd
Clean up code
weilinwa Sep 18, 2021
33a596d
Separate default case overwrittenOpNum==0
weilinwa Sep 20, 2021
5da9368
Apply format patch
weilinwa Sep 29, 2021
c3a9f07
Change variable and function names
weilinwa Oct 1, 2021
9e356aa
Update regOptional for op1 and resolve some other comments
weilinwa Oct 5, 2021
f8159bc
Change var names
weilinwa Oct 13, 2021
18bbe4d
Resolve merge conflicts.
weilinwa Oct 13, 2021
2ca2524
Fix jit format
weilinwa Oct 13, 2021
17bd967
Fix build node error for op1 is regOptional
weilinwa Oct 14, 2021
eed5912
Use targetReg instead of GetResultOpNumForFMA in codegen
weilinwa Oct 28, 2021
43c5034
Update variable names
weilinwa Nov 2, 2021
5ef70a5
Refactor lsra to solve lastUse status changed caused assertion failure
weilinwa Nov 7, 2021
bfa6924
Add check to prioritize contained op in lsra
weilinwa Nov 7, 2021
12f260b
Update for jit format
weilinwa Nov 7, 2021
5ca658e
Simplify code
weilinwa Nov 17, 2021
ec4ef66
Resolve comments
weilinwa Nov 17, 2021
aa93a85
Comment out assert because of lastUse change
weilinwa Nov 19, 2021
c66a018
Fix some copiesUpperBits related errors
weilinwa Nov 22, 2021
ff5a433
Merge branch 'main' into fma_opt
weilinwa Nov 22, 2021
a4657c7
Update src/coreclr/jit/lsraxarch.cpp
weilinwa Nov 30, 2021
75d7a37
Add link to the new issue
weilinwa Nov 30, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 46 additions & 0 deletions src/coreclr/jit/gentree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21818,6 +21818,52 @@ uint16_t GenTreeLclVarCommon::GetLclOffs() const
}
}

#if defined(TARGET_XARCH) && defined(FEATURE_HW_INTRINSICS)
//------------------------------------------------------------------------
// GetResultOpNumForFMA: check if the result is written into one of the operands.
// In the case that none of the operand is overwritten, check if any of them is lastUse.
//
// Return Value:
// The operand number overwritten or lastUse. 0 is the default value, where the result is written into
// a destination that is not one of the source operands and there is no last use op.
//
unsigned GenTreeHWIntrinsic::GetResultOpNumForFMA(GenTree* use, GenTree* op1, GenTree* op2, GenTree* op3)
{
// only FMA intrinsic node should call into this function
assert(HWIntrinsicInfo::lookupIsa(gtHWIntrinsicId) == InstructionSet_FMA);
if (use != nullptr && use->OperIs(GT_STORE_LCL_VAR))
{
// For store_lcl_var, check if any op is overwritten

GenTreeLclVarCommon* overwritten = use->AsLclVarCommon();
unsigned overwrittenLclNum = overwritten->GetLclNum();
if (op1->IsLocal() && op1->AsLclVarCommon()->GetLclNum() == overwrittenLclNum)
{
return 1;
}
else if (op2->IsLocal() && op2->AsLclVarCommon()->GetLclNum() == overwrittenLclNum)
{
return 2;
}
else if (op3->IsLocal() && op3->AsLclVarCommon()->GetLclNum() == overwrittenLclNum)
{
return 3;
}
}

// If no overwritten op, check if there is any last use op

if (op1->OperIs(GT_LCL_VAR) && op1->IsLastUse(0))
return 1;
else if (op3->OperIs(GT_LCL_VAR) && op3->IsLastUse(0))
return 3;
else if (op2->OperIs(GT_LCL_VAR) && op2->IsLastUse(0))
return 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest swapping the ordering of op2/op3 here

Suggested change
else if (op3->OperIs(GT_LCL_VAR) && op3->IsLastUse(0))
return 3;
else if (op2->OperIs(GT_LCL_VAR) && op2->IsLastUse(0))
return 2;
else if (op2->OperIs(GT_LCL_VAR) && op2->IsLastUse(0))
return 2;
else if (op3->OperIs(GT_LCL_VAR) && op3->IsLastUse(0))
return 3;

The reasoning is that this method is picking a preference for "overwritten op".

Preferencing op1 as the first check here makes sense because scalar ops "copy upper bits" and therefore if we're in that scenario, op1 is the only operand that can "be the target" as it were, the others will have to be contained or delay free.

Preferencing op2 after that (as the secondary preference) simply keeps it consistent with the op1->IsLocal checks above and results in the "least" amount of operand swapping if we order the containment checks accordingly.


return 0;
}
#endif // TARGET_XARCH && FEATURE_HW_INTRINSICS

#ifdef TARGET_ARM
//------------------------------------------------------------------------
// IsOffsetMisaligned: check if the field needs a special handling on arm.
Expand Down
1 change: 1 addition & 0 deletions src/coreclr/jit/gentree.h
Original file line number Diff line number Diff line change
Expand Up @@ -5278,6 +5278,7 @@ struct GenTreeHWIntrinsic : public GenTreeJitIntrinsic
{
return (gtFlags & GTF_SIMDASHW_OP) != 0;
}
unsigned GetResultOpNumForFMA(GenTree* use, GenTree* op1, GenTree* op2, GenTree* op3);

#if DEBUGGABLE_GENTREE
GenTreeHWIntrinsic() : GenTreeJitIntrinsic()
Expand Down
83 changes: 58 additions & 25 deletions src/coreclr/jit/hwintrinsiccodegenxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2106,7 +2106,9 @@ void CodeGen::genFMAIntrinsic(GenTreeHWIntrinsic* node)
NamedIntrinsic intrinsicId = node->gtHWIntrinsicId;
var_types baseType = node->GetSimdBaseType();
emitAttr attr = emitActualTypeSize(Compiler::getSIMDTypeForSize(node->GetSimdSize()));
instruction ins = HWIntrinsicInfo::lookupIns(intrinsicId, baseType);
instruction ins = HWIntrinsicInfo::lookupIns(intrinsicId, baseType); // 213 form
instruction _132form = (instruction)(ins - 1);
instruction _231form = (instruction)(ins + 1);
GenTree* op1 = node->gtGetOp1();
regNumber targetReg = node->GetRegNum();

Expand All @@ -2122,43 +2124,75 @@ void CodeGen::genFMAIntrinsic(GenTreeHWIntrinsic* node)
argList = argList->Rest();
GenTree* op3 = argList->Current();

regNumber op1Reg;
regNumber op2Reg;
regNumber op1NodeReg = op1->GetRegNum();
regNumber op2NodeReg = op2->GetRegNum();
regNumber op3NodeReg = op3->GetRegNum();

GenTree* emitOp1 = op1;
GenTree* emitOp2 = op2;
GenTree* emitOp3 = op3;

bool isCommutative = false;
const bool copiesUpperBits = HWIntrinsicInfo::CopiesUpperBits(intrinsicId);

// Intrinsics with CopyUpperBits semantics cannot have op1 be contained
assert(!copiesUpperBits || !op1->isContained());

if (op2->isContained() || op2->isUsedFromSpillTemp())
if (op1->isContained() || op1->isUsedFromSpillTemp())
{
// 132 form: op1 = (op1 * op3) + [op2]

ins = (instruction)(ins - 1);
op1Reg = op1->GetRegNum();
op2Reg = op3->GetRegNum();
op3 = op2;
if (targetReg == op2NodeReg)
{
std::swap(emitOp1, emitOp2);
// op2 = ([op1] * op2) + op3
// 132 form: XMM1 = (XMM1 * [XMM3]) + XMM2
ins = _132form;
std::swap(emitOp2, emitOp3);
}
else
{
// targetReg == op3NodeReg or targetReg == ?
// op3 = ([op1] * op2) + op3
// 231 form: XMM1 = (XMM2 * [XMM3]) + XMM1
ins = _231form;
std::swap(emitOp1, emitOp3);
}
}
else if (op1->isContained() || op1->isUsedFromSpillTemp())
else if (op2->isContained() || op2->isUsedFromSpillTemp())
{
// 231 form: op3 = (op2 * op3) + [op1]

ins = (instruction)(ins + 1);
op1Reg = op3->GetRegNum();
op2Reg = op2->GetRegNum();
op3 = op1;
if (targetReg == op3NodeReg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be !copiesUpperBits && (targetReg == op3NodeReg)

Otherwise, copiesUpperBits can be true since op1 is not Contained or UsedFromSpillTemp and therefore swapping emitOp1 isn't correct.

{
// op3 = (op1 * [op2]) + op3
// 231 form: XMM1 = (XMM2 * [XMM3]) + XMM1
ins = _231form;
std::swap(emitOp1, emitOp3);
}
else
{
// targetReg == op1NodeReg or targetReg == ?
// op1 = (op1 * [op2]) + op3
// 132 form: XMM1 = (XMM1 * [XMM3]) + XMM2
ins = _132form;
}
std::swap(emitOp2, emitOp3);
}
else
{
// 213 form: op1 = (op2 * op1) + [op3]

op1Reg = op1->GetRegNum();
op2Reg = op2->GetRegNum();

isCommutative = !copiesUpperBits;
// targetReg could be op1NodeReg, op2NodeReg, or not equal to any op
// op1 = (op1 * op2) + [op3] or op2 = (op1 * op2) + [op3]
// ? = (op1 * op2) + [op3] or ? = (op1 * op2) + op3
// 213 form: XMM1 = (XMM2 * XMM1) + [XMM3]
isCommutative = copiesUpperBits;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't isCommutative be !copiesUpperBits? We can't swap anything if copiesUpperBits == true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I used it inaccurately here to barely control if we should enter the branch.

if (targetReg == op2NodeReg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, I think this needs to be if (!copiesUpperBits && (targetReg == op2NodeReg)) for the same reason.

I think we also don't need the below section doing if (!copiesUpperBits && (emitOp2->GetRegNum() == targetReg)) as it will have already been covered up here.

{
// op2 = (op1 * op2) + [op3]
// 213 form: XMM1 = (XMM2 * XMM1) + [XMM3]
std::swap(emitOp1, emitOp2);
}
}

regNumber op1Reg = emitOp1->GetRegNum();
regNumber op2Reg = emitOp2->GetRegNum();

if (isCommutative && (op1Reg != targetReg) && (op2Reg == targetReg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this block still needed given the above handling?

It feels like we should already be covering this under the last block, which is op3 or nothing is contained/spilled so:

if (!copiesUpperBits && (targetReg == op2Reg))
{
    std::swap(emitOp1, emitOp2);
}

Then everything should be in the right place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it if (!copiesUpperBits && (targetReg == op2Reg)) not if (copiesUpperBits && (targetReg == op2Reg))? I thought we need to ensure targetReg is op1Reg only when copiesUpperBits is true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because emitOp1 is already op1, so if copiesUpperBits == true, then we don't want to change anything.

When its false, we only need to swap if the target reg is op2Reg.

{
assert(node->isRMWHWIntrinsic(compiler));
Expand All @@ -2173,8 +2207,7 @@ void CodeGen::genFMAIntrinsic(GenTreeHWIntrinsic* node)
op2Reg = op1Reg;
op1Reg = targetReg;
}

genHWIntrinsic_R_R_R_RM(ins, attr, targetReg, op1Reg, op2Reg, op3);
genHWIntrinsic_R_R_R_RM(ins, attr, targetReg, op1Reg, op2Reg, emitOp3);
genProduceReg(node);
}

Expand Down
57 changes: 35 additions & 22 deletions src/coreclr/jit/lowerxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6335,38 +6335,51 @@ void Lowering::ContainCheckHWIntrinsic(GenTreeHWIntrinsic* node)
{
if ((intrinsicId >= NI_FMA_MultiplyAdd) && (intrinsicId <= NI_FMA_MultiplySubtractNegatedScalar))
{
bool supportsRegOptional = false;
bool supportsOp1RegOptional = false;
bool supportsOp2RegOptional = false;
bool supportsOp3RegOptional = false;
unsigned resultOpNum = 0;
LIR::Use use;
GenTree* user = nullptr;

if (BlockRange().TryGetUse(node, &use))
{
user = use.User();
}
resultOpNum = node->GetResultOpNumForFMA(user, op1, op2, op3);

// Prioritize Containable op. Check if any one of the op is containable first.
// Set op regOptional only if none of them is containable.

if (IsContainableHWIntrinsicOp(node, op3, &supportsRegOptional))
if (resultOpNum != 1 && IsContainableHWIntrinsicOp(node, op1, &supportsOp1RegOptional) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with https://github.com/dotnet/runtime/pull/58196/files#r751517074, I'd actually swap the check around containing op1 with op3.

op1 is the primary preference for target because it may be copiesUpperBits and it results in the least number of operand swaps; so it should be the least preferred for containment.

op3 becomes the "most preferred" for containment because its the "least preferred" (last checked) for resultOpNum and its "naturally" in the containment slot (op3 == emitOp3, if nothing is swapped).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we swap only the MakeSrcContained part or both this and the SetRegOptional part?

Copy link
Member

@tannergooding tannergooding Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would give us this:

if ((resultOpNum != 3) && IsContainableHWIntrinsicOp(node, op3, &supportsOp3RegOptional))
{
    // result = (op1 * op2) + [op3]
    MakeSrcContained(node, op3);
}
else if (resultOpNum != 2 && IsContainableHWIntrinsicOp(node, op2, &supportsOp2RegOptional))
{
    // result = (op1 * [op2]) + op3
    MakeSrcContained(node, op2);
}
else if (resultOpNum != 1 && !HWIntrinsicInfo::CopiesUpperBits(intrinsicId) && IsContainableHWIntrinsicOp(node, op1, &supportsOp1RegOptional))
{
    // result = ([op1] * op2) + op3
    MakeSrcContained(node, op1);
}
else if (supportsOp3RegOptional)
{
    assert(resultOpNum != 3);
    op3->SetRegOptional();
}
else if (supportsOp2RegOptional)
{
    assert(resultOpNum != 2);
    op2->SetRegOptional();
}
else if (supportsOp1RegOptional)
{
    op1->SetRegOptional();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we want/need to be doing if (supportsOp2RegOptional) with assert(resultOpNum != 2) rather than the inverse as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we swap only the MakeSrcContained part or both this and the SetRegOptional part?

Its probably more consistent to swap both, but I don't think there's a strong need to do so for RegOptional. I've updated the suggested code above to account for keeping it consistent though.

The primary reason for asking for these to be swapped is to reduce the number of checks we need to do in the "normal" case. Due to how we set resultOpNum and the limits around coppiesUpperBits, this meant that op1 is the "prime" candidate for the result register and so having that check first means it will also likely be false (unlikely to be taken) most of the time.

Once we get to RegOptional cases; we're mostly looking at it from a perspective of "if the register allocator has to spill something, can we tell it that some item is a better candidate for spilling than something else", in which case we're already in a "non ideal" situation and what we do probably doesn't have a lot of impact.

!HWIntrinsicInfo::CopiesUpperBits(intrinsicId))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just writing out my own thoughts here, I don't think there is anything that strictly needs to be done for the PR. It could instead be handled in a follow up if we thought it was beneficial.

Consider the scenario where more than one operand is last use. Today, https://github.com/dotnet/runtime/pull/58196/files#diff-fba867eda1c745875748370c59da2195c3823b85a17d199444068d93f91c749bR21856-R21861 will just return the index of the first operand that's last use.

For this example, lets consider that all are last use and only op1 is coming from memory. In this scenario, GetResultOpNumForFMA will return 1 and this check will skip the containment logic for op1 because it was also last use, where-as in practice we also have op2 and op3 that are last use and so should be accounting for that and allowing op1 to be contained.

{
// 213 form: op1 = (op2 * op1) + [op3]
MakeSrcContained(node, op3);
// result = ([op1] * op2) + op3
MakeSrcContained(node, op1);
}
else if (IsContainableHWIntrinsicOp(node, op2, &supportsRegOptional))
else if (resultOpNum != 2 && IsContainableHWIntrinsicOp(node, op2, &supportsOp2RegOptional))
{
// 132 form: op1 = (op1 * op3) + [op2]
// result = (op1 * [op2]) + op3
MakeSrcContained(node, op2);
}
else if (IsContainableHWIntrinsicOp(node, op1, &supportsRegOptional))
else if (resultOpNum != 3 && IsContainableHWIntrinsicOp(node, op3, &supportsOp3RegOptional))
{
// Intrinsics with CopyUpperBits semantics cannot have op1 be contained

if (!HWIntrinsicInfo::CopiesUpperBits(intrinsicId))
{
// 231 form: op3 = (op2 * op3) + [op1]
MakeSrcContained(node, op1);
}
// result = (op1 * op2) + [op3]
MakeSrcContained(node, op3);
}
else if (resultOpNum != 1 && !HWIntrinsicInfo::CopiesUpperBits(intrinsicId))
{
assert(supportsOp1RegOptional);
op1->SetRegOptional();
}
else if (resultOpNum != 2)
{
assert(supportsOp2RegOptional);
op2->SetRegOptional();
}
else
{
assert(supportsRegOptional);

// TODO-XArch-CQ: Technically any one of the three operands can
// be reg-optional. With a limitation on op1 where
// it can only be so if CopyUpperBits is off.
// https://github.com/dotnet/runtime/issues/6358

// 213 form: op1 = (op2 * op1) + op3
assert(supportsOp3RegOptional);
op3->SetRegOptional();
}
}
Expand Down
79 changes: 54 additions & 25 deletions src/coreclr/jit/lsraxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2328,48 +2328,77 @@ int LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree)

const bool copiesUpperBits = HWIntrinsicInfo::CopiesUpperBits(intrinsicId);

// Intrinsics with CopyUpperBits semantics cannot have op1 be contained
assert(!copiesUpperBits || !op1->isContained());
unsigned resultOpNum = 0;
LIR::Use use;
GenTree* user = nullptr;

if (op2->isContained())
if (LIR::AsRange(blockSequence[curBBSeqNum]).TryGetUse(intrinsicTree, &use))
{
// 132 form: op1 = (op1 * op3) + [op2]
user = use.User();
}
resultOpNum = intrinsicTree->GetResultOpNumForFMA(user, op1, op2, op3);

tgtPrefUse = BuildUse(op1);
// Intrinsics with CopyUpperBits semantics cannot have op1 be contained
assert(!copiesUpperBits || !op1->isContained());

srcCount += 1;
srcCount += BuildOperandUses(op2);
srcCount += BuildDelayFreeUses(op3, op1);
unsigned containedOpNum = 0;

if (op1->isContained() || op1->IsRegOptional())
{
containedOpNum = 1;
}
else if (op1->isContained())
else if (op2->isContained() || op2->IsRegOptional())
{
// 231 form: op3 = (op2 * op3) + [op1]

tgtPrefUse = BuildUse(op3);

srcCount += BuildOperandUses(op1);
srcCount += BuildDelayFreeUses(op2, op1);
srcCount += 1;
containedOpNum = 2;
}
else
{
// 213 form: op1 = (op2 * op1) + [op3]
assert(op3->isContained() || op3->IsRegOptional());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for us to have a value which cannot be contained and which is not reg optional?

I'd feel better if we handled that case explicitly.

containedOpNum = 3;
}

tgtPrefUse = BuildUse(op1);
srcCount += 1;
GenTree* emitOp1 = op1;
GenTree* emitOp2 = op2;
GenTree* emitOp3 = op3;

if (copiesUpperBits)
// Intrinsics with CopyUpperBits semantics must have op1 as target
if (containedOpNum == 1 && !copiesUpperBits)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the !copiesUpperBits check needed? If we are copiesUpperBits then containedOpNum shouldn't be 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be true. Do we need to add an assert before to ensure that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe replace it with assert(containedOpNum != 1 || !copiesUpperBits); to also cover the regOptional case

if (resultOpNum != 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this were the positive case and there were an assert that resultOpNum != containedOpNum

Therefore, if containedOpNum == 1 then resultOpNum can only be 0, 2, or 3

If it's 3, then swapping op1/op3 is sufficient
If it's 2, then swapping op2/op3 is needed first
If it's 0, then it doesn't matter what we do so its fine to not swap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that containedOpNum ==0 and resultOpNum==0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If none of the operands are overwritten and none are last use, then containedOpNum == 0.

I think we probably won't also get containedOpNum == 0 because VFMADD should support general-purpose loads as well and so RegOptional should probably be true for at least one case. But in general its better to check and account for possible future changes, scenarios, or nodes that are introduced

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because op lastUse could be updated after lowering, there are cases that we have resultOpNum == containedOpNum when they are not 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make multiple ops contained in lowering or change that in lsra?

{
srcCount += BuildDelayFreeUses(op2, op1);
// op2 = ([op1] * op2) + op3
std::swap(emitOp2, emitOp3);
}
else

// else: op3 = ([op1] * op2) + op3
std::swap(emitOp1, emitOp3);
}
else if (containedOpNum == 3)
{
if (resultOpNum == 2 && !copiesUpperBits)
{
tgtPrefUse2 = BuildUse(op2);
srcCount += 1;
// op2 = (op1 * op2) + [op3]
std::swap(emitOp1, emitOp2);
}
// else: op1 = (op1 * op2) + [op3]
}
else
{
assert(containedOpNum == 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its possible for containedOpNum to be 0 and so we should check for this explicitly.

// op1 = (op1 * [op2]) + op3
std::swap(emitOp2, emitOp3);

srcCount += op3->isContained() ? BuildOperandUses(op3) : BuildDelayFreeUses(op3, op1);
if (resultOpNum == 3 && !copiesUpperBits)
Copy link
Member

@tannergooding tannergooding Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just capturing a comment, I don't think we need to do anything in this PR.

I think the logic around copiesUpperBits could be simplified a bit so we don't need these extra checks everywhere. That is, if copiesUpperBits is true, then resultOpNum doesn't matter if its not 1 so maybe we should be forcing resultOpNum to be 0 in that case (that is if copiesUpperBits == true and resultOpNum != 1, then treat it as 0, because no matter what we do, op1 cannot be swapped or moved about and op2/op3 will be delay free or contained).

{
// op3 = (op1 * [op2]) + op3
std::swap(emitOp1, emitOp2);
}
}
tgtPrefUse = BuildUse(emitOp1);

srcCount += 1;
srcCount += BuildDelayFreeUses(emitOp2, emitOp1);
srcCount += emitOp3->isContained() ? BuildOperandUses(emitOp3) : BuildDelayFreeUses(emitOp3, emitOp1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot smaller and easier to follow now 🎉


buildUses = false;
break;
Expand Down