Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft for Min/Max intrinsics xarch (#65625) #65700

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions src/coreclr/jit/codegenxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7400,6 +7400,20 @@ void CodeGen::genIntrinsic(GenTree* treeNode)
// Handle intrinsics that can be implemented by target-specific instructions
switch (treeNode->AsIntrinsic()->gtIntrinsicName)
{
#ifdef TARGET_XARCH

case NI_System_Math_Max:
genConsumeOperands(treeNode->AsOp());
GetEmitter()->emitIns_R_R_R(INS_maxss, emitActualTypeSize(treeNode), treeNode->GetRegNum(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to use emitIns_SIMD_R_R_R which will take care of VEX vs non-VEX differences

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually also expect we want to handle things like containment/etc in which case lowering to a HWIntrinsic might be simpler overall

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, manual containment management is quite verbose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding Thank you for the advice. I'm sorry , but could you tell me more about the containments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally each node is processed by the JIT and will individually get register assignments and code generated for it. Containment is the term used to mean that a node should not be handled that way and instead is handled directly by its parent node.

So by default, if you have something like GT_HWINTRINSIC NI_System_Math_Max and it has a child node like GT_IND float you would end up generating

movss reg1, [mem]
maxss reg2, reg1

However, if you mark the GT_IND as "contained", then you can handle these together and emit just maxss reg1, [mem].


There is a lot of existing logic for supporting containment (https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L6030) and so it would likely be better to lower or import NI_System_Math_Max to be the relevant HWIntrinsic node(s) instead so that much of this handling implicitly lights up, rather than having to duplicate it just for the NI_System_Math_Max node.

If you were to do it the other way, by just handling NI_System_Math_Max, then it would likely be done around here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L5371-L5387

treeNode->gtGetOp1()->GetRegNum(), treeNode->gtGetOp2()->GetRegNum());
break;

case NI_System_Math_Min:
genConsumeOperands(treeNode->AsOp());
GetEmitter()->emitIns_R_R_R(INS_minss, emitActualTypeSize(treeNode), treeNode->GetRegNum(),
treeNode->gtGetOp1()->GetRegNum(), treeNode->gtGetOp2()->GetRegNum());
break;
#endif
case NI_System_Math_Abs:
genSSE2BitwiseOp(treeNode);
break;
Expand Down
5 changes: 4 additions & 1 deletion src/coreclr/jit/importer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4492,13 +4492,14 @@ GenTree* Compiler::impIntrinsic(GenTree* newobjThis,
case NI_System_Math_Log:
case NI_System_Math_Log2:
case NI_System_Math_Log10:
#ifdef TARGET_ARM64
#if defined TARGET_ARM64 || defined TARGET_XARCH
// ARM64 has fmax/fmin which are IEEE754:2019 minimum/maximum compatible
// TODO-XARCH-CQ: Enable this for XARCH when one of the arguments is a constant
// so we can then emit maxss/minss and avoid NaN/-0.0 handling
case NI_System_Math_Max:
case NI_System_Math_Min:
#endif

case NI_System_Math_Pow:
case NI_System_Math_Round:
case NI_System_Math_Sin:
Expand Down Expand Up @@ -20541,6 +20542,8 @@ bool Compiler::IsTargetIntrinsic(NamedIntrinsic intrinsicName)

case NI_System_Math_Abs:
case NI_System_Math_Sqrt:
case NI_System_Math_Max:
case NI_System_Math_Min:
return true;

case NI_System_Math_Ceiling:
Expand Down
16 changes: 14 additions & 2 deletions src/coreclr/jit/lsraxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1804,9 +1804,21 @@ int LinearScan::BuildIntrinsic(GenTree* tree)
assert(varTypeIsFloating(op1));
assert(op1->TypeGet() == tree->TypeGet());
RefPosition* internalFloatDef = nullptr;

int srcCount;
switch (tree->AsIntrinsic()->gtIntrinsicName)
{
#ifdef TARGET_XARCH
case NI_System_Math_Max:
case NI_System_Math_Min:
assert(varTypeIsFloating(tree->gtGetOp1()));
assert(varTypeIsFloating(tree->gtGetOp2()));
assert(tree->gtGetOp1()->TypeIs(tree->TypeGet()));

srcCount = BuildBinaryUses(tree->AsOp());

BuildDef(tree);
return srcCount;
#endif
case NI_System_Math_Abs:
// Abs(float x) = x & 0x7fffffff
// Abs(double x) = x & 0x7ffffff ffffffff
Expand Down Expand Up @@ -1837,7 +1849,7 @@ int LinearScan::BuildIntrinsic(GenTree* tree)
break;
}
assert(tree->gtGetOp2IfPresent() == nullptr);
int srcCount;

if (op1->isContained())
{
srcCount = BuildOperandUses(op1);
Expand Down