
Draft for Min/Max intrinsics xarch (#65625) #65700

Closed
wants to merge 2 commits

Conversation

SkiFoD
Contributor

@SkiFoD SkiFoD commented Feb 22, 2022

I made the changes using #65584 as a reference. It works with HelloWorld-style code, but I'm stuck building it via build.cmd. It doesn't compile and I can't figure out why.
I get these errors:

Assertion failed 'IsThreeOperandAVXInstruction(ins)' in 'System.MathF:Min(float,float):float' during 'Generate code' (IL size 13)
Assertion failed 'IsThreeOperandAVXInstruction(ins)' in 'System.Single:System.INumber<System.Single>.Min(float,float):float' during 'Generate code' (IL size 8)

@EgorBo Could you please help me to figure out what I implemented wrong?

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Feb 22, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed community-contribution Indicates that the PR has been added by a community member labels Feb 22, 2022
@ghost

ghost commented Feb 22, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.


@EgorBo
Member

EgorBo commented Feb 22, 2022

@EgorBo Could you please help me to figure out what I implemented wrong?

maxss and vmaxss are basically different instructions (vmaxss is the VEX encoding); only vmaxss can be used in the R_R_R (three-operand) form, and it's not available without AVX


case NI_System_Math_Max:
    genConsumeOperands(treeNode->AsOp());
    GetEmitter()->emitIns_R_R_R(INS_maxss, emitActualTypeSize(treeNode), treeNode->GetRegNum(),
Member

You want to use emitIns_SIMD_R_R_R which will take care of VEX vs non-VEX differences

Member

I'd actually also expect we want to handle things like containment/etc in which case lowering to a HWIntrinsic might be simpler overall

Member

agree, manual containment management is quite verbose

Contributor Author

@tannergooding Thank you for the advice. I'm sorry, but could you tell me more about containment?

Member

Normally each node is processed by the JIT and will individually get register assignments and code generated for it. Containment is the term used to mean that a node should not be handled that way and instead is handled directly by its parent node.

So by default, if you have something like GT_HWINTRINSIC NI_System_Math_Max and it has a child node like GT_IND float you would end up generating

movss reg1, [mem]
maxss reg2, reg1

However, if you mark the GT_IND as "contained", then you can handle these together and emit just maxss reg1, [mem].


There is a lot of existing logic for supporting containment (https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L6030) and so it would likely be better to lower or import NI_System_Math_Max to be the relevant HWIntrinsic node(s) instead so that much of this handling implicitly lights up, rather than having to duplicate it just for the NI_System_Math_Max node.

If you were to do it the other way, by just handling NI_System_Math_Max, then it would likely be done around here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L5371-L5387

@EgorBo
Member

EgorBo commented Mar 15, 2022

I assume the failures are related, e.g.:

Process terminated. Assertion failed.
   at System.Globalization.CalendricalCalculationsHelper.PersianNewYearOnOrBefore(Int64 numberOfDays) in /_/src/libraries/System.Private.CoreLib/src/System/Globalization/CalendricalCalculationsHelper.cs:line 396

https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-65700-merge-f505ab65165146859e/System.Runtime.Tests/1/console.e40913b6.log?sv=2019-07-07&se=2022-03-21T10%3A27%3A05Z&sr=c&sp=rl&sig=8MkYay3CV4QZ0iVUwk2q8n54366%2F0ilLMZ0UWu%2FbUms%3D

@SkiFoD
Contributor Author

SkiFoD commented Mar 15, 2022

@EgorBo Since I'm still learning to work with the project, I have a question. Let's say you have a piece of code failing (like the assertion in the log). How do you run the CLR against that piece of code to debug what is wrong? I believe there must be a better way than just copy-pasting the code into a simple "Hello world" project to reproduce the behaviour. Thank you in advance.

@SkiFoD
Contributor Author

SkiFoD commented Mar 31, 2022

@tannergooding Thank you for the explanation. Let's say I want to implement importing. I did some investigating, and HWIntrinsics are used if a type belongs to System.Numerics.Vector or System.Numerics. Is it OK to implement the import somewhere here

else if (strcmp(methodName, "Max") == 0)
{
    result = NI_System_Math_Max;
}
else if (strcmp(methodName, "Min") == 0)
{
    result = NI_System_Math_Min;
}

when the FEATURE_HW_INTRINSICS directive is turned on? The second question is: I can't figure out which ISA to use for converting NI_System_Math_Max to a relevant HW node. Could you give me a hand, please?

@ghost

ghost commented Apr 15, 2022

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@tannergooding
Member

The second question is: I can't figure out which ISA to use for converting NI_System_Math_Max to a relevant HW node. Could you give me a hand, please?

Sorry, I had missed this. You can generally see the two approaches if you look at NI_System_Math_Round and NI_System_Math_FusedMultiplyAdd. You'll see that Compiler::IsTargetIntrinsic reports if the given math call can be handled via special logic here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/importer.cpp#L20682-L20745

Compiler::impIntrinsic then specially handles importing Math_FusedMultiplyAdd directly as the relevant nodes here:

#ifdef FEATURE_HW_INTRINSICS
    case NI_System_Math_FusedMultiplyAdd:
    {
#ifdef TARGET_XARCH
        if (compExactlyDependsOn(InstructionSet_FMA))
        {
            assert(varTypeIsFloating(callType));

            // We are constructing a chain of intrinsics similar to:
            //    return FMA.MultiplyAddScalar(
            //        Vector128.CreateScalarUnsafe(x),
            //        Vector128.CreateScalarUnsafe(y),
            //        Vector128.CreateScalarUnsafe(z)
            //    ).ToScalar();

            GenTree* op3 = gtNewSimdHWIntrinsicNode(TYP_SIMD16, impPopStack().val,
                                                    NI_Vector128_CreateScalarUnsafe, callJitType, 16);
            GenTree* op2 = gtNewSimdHWIntrinsicNode(TYP_SIMD16, impPopStack().val,
                                                    NI_Vector128_CreateScalarUnsafe, callJitType, 16);
            GenTree* op1 = gtNewSimdHWIntrinsicNode(TYP_SIMD16, impPopStack().val,
                                                    NI_Vector128_CreateScalarUnsafe, callJitType, 16);
            GenTree* res =
                gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, op2, op3, NI_FMA_MultiplyAddScalar, callJitType, 16);

            retNode = gtNewSimdHWIntrinsicNode(callType, res, NI_Vector128_ToScalar, callJitType, 16);
            break;
        }
#elif defined(TARGET_ARM64)
        if (compExactlyDependsOn(InstructionSet_AdvSimd))
        {
            assert(varTypeIsFloating(callType));

            // We are constructing a chain of intrinsics similar to:
            //    return AdvSimd.FusedMultiplyAddScalar(
            //        Vector64.Create{ScalarUnsafe}(z),
            //        Vector64.Create{ScalarUnsafe}(y),
            //        Vector64.Create{ScalarUnsafe}(x)
            //    ).ToScalar();

            NamedIntrinsic createVector64 =
                (callType == TYP_DOUBLE) ? NI_Vector64_Create : NI_Vector64_CreateScalarUnsafe;

            constexpr unsigned int simdSize = 8;

            GenTree* op3 =
                gtNewSimdHWIntrinsicNode(TYP_SIMD8, impPopStack().val, createVector64, callJitType, simdSize);
            GenTree* op2 =
                gtNewSimdHWIntrinsicNode(TYP_SIMD8, impPopStack().val, createVector64, callJitType, simdSize);
            GenTree* op1 =
                gtNewSimdHWIntrinsicNode(TYP_SIMD8, impPopStack().val, createVector64, callJitType, simdSize);

            // Note that AdvSimd.FusedMultiplyAddScalar(op1,op2,op3) corresponds to op1 + op2 * op3
            // while Math{F}.FusedMultiplyAddScalar(op1,op2,op3) corresponds to op1 * op2 + op3
            retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD8, op3, op2, op1, NI_AdvSimd_FusedMultiplyAddScalar,
                                               callJitType, simdSize);
            retNode = gtNewSimdHWIntrinsicNode(callType, retNode, NI_Vector64_ToScalar, callJitType, simdSize);
            break;
        }
#endif

        // TODO-CQ-XArch: Ideally we would create a GT_INTRINSIC node for fma, however, that currently
        // requires more extensive changes to valuenum to support methods with 3 operands

        // We want to generate a GT_INTRINSIC node in the case the call can't be treated as
        // a target intrinsic so that we can still benefit from CSE and constant folding.

        break;
    }
#endif // FEATURE_HW_INTRINSICS

It does this because FusedMultiplyAdd is a slightly more complex sequence of nodes.

Alternatively, many of the other Math_* intrinsics defer to impMathIntrinsic here:

case NI_System_Math_Abs:
case NI_System_Math_Acos:
case NI_System_Math_Acosh:
case NI_System_Math_Asin:
case NI_System_Math_Asinh:
case NI_System_Math_Atan:
case NI_System_Math_Atanh:
case NI_System_Math_Atan2:
case NI_System_Math_Cbrt:
case NI_System_Math_Ceiling:
case NI_System_Math_Cos:
case NI_System_Math_Cosh:
case NI_System_Math_Exp:
case NI_System_Math_Floor:
case NI_System_Math_FMod:
case NI_System_Math_ILogB:
case NI_System_Math_Log:
case NI_System_Math_Log2:
case NI_System_Math_Log10:
#ifdef TARGET_ARM64
// ARM64 has fmax/fmin which are IEEE754:2019 minimum/maximum compatible
// TODO-XARCH-CQ: Enable this for XARCH when one of the arguments is a constant
// so we can then emit maxss/minss and avoid NaN/-0.0 handling
case NI_System_Math_Max:
case NI_System_Math_Min:
#endif
case NI_System_Math_Pow:
case NI_System_Math_Round:
case NI_System_Math_Sin:
case NI_System_Math_Sinh:
case NI_System_Math_Sqrt:
case NI_System_Math_Tan:
case NI_System_Math_Tanh:
case NI_System_Math_Truncate:
{
    retNode = impMathIntrinsic(method, sig, callType, ni, tailCall);
    break;
}

Those get handling in a few phases throughout the JIT, including containment handling here:
void Lowering::ContainCheckIntrinsic(GenTreeOp* node)
{
    assert(node->OperIs(GT_INTRINSIC));

    NamedIntrinsic intrinsicName = node->AsIntrinsic()->gtIntrinsicName;

    if ((intrinsicName == NI_System_Math_Ceiling) || (intrinsicName == NI_System_Math_Floor) ||
        (intrinsicName == NI_System_Math_Truncate) || (intrinsicName == NI_System_Math_Round) ||
        (intrinsicName == NI_System_Math_Sqrt))
    {
        GenTree* op1 = node->gtGetOp1();

        if ((IsContainableMemoryOp(op1) && IsSafeToContainMem(node, op1)) || op1->IsCnsNonZeroFltOrDbl())
        {
            MakeSrcContained(node, op1);
        }
        else
        {
            // Mark the operand as reg optional since codegen can still
            // generate code if op1 is on stack.
            op1->SetRegOptional();
        }
    }
}

And then code generation here:
void CodeGen::genIntrinsic(GenTree* treeNode)
{
    // Handle intrinsics that can be implemented by target-specific instructions
    switch (treeNode->AsIntrinsic()->gtIntrinsicName)
    {
        case NI_System_Math_Abs:
            genSSE2BitwiseOp(treeNode);
            break;

        case NI_System_Math_Ceiling:
        case NI_System_Math_Floor:
        case NI_System_Math_Truncate:
        case NI_System_Math_Round:
            genSSE41RoundOp(treeNode->AsOp());
            break;

        case NI_System_Math_Sqrt:
        {
            // Both operand and its result must be of the same floating point type.
            GenTree* srcNode = treeNode->AsOp()->gtOp1;
            assert(varTypeIsFloating(srcNode));
            assert(srcNode->TypeGet() == treeNode->TypeGet());

            genConsumeOperands(treeNode->AsOp());

            const instruction ins = (treeNode->TypeGet() == TYP_FLOAT) ? INS_sqrtss : INS_sqrtsd;
            GetEmitter()->emitInsBinary(ins, emitTypeSize(treeNode), treeNode, srcNode);
            break;
        }

        default:
            assert(!"genIntrinsic: Unsupported intrinsic");
            unreached();
    }

    genProduceReg(treeNode);
}


@ghost ghost removed the no-recent-activity label Apr 15, 2022
@ghost ghost added the no-recent-activity label Apr 30, 2022
@ghost

ghost commented Apr 30, 2022

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@ghost

ghost commented May 14, 2022

This pull request will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the pull request, but please note that it will be locked if it remains inactive for another 30 days.

@ghost ghost closed this May 14, 2022
@SkiFoD
Contributor Author

SkiFoD commented May 16, 2022

@EgorBo @tannergooding I'm sorry, I've been busy recently and the PR was closed. Can I reopen it?

@tannergooding
Member

Yep, feel free to reopen it whenever you have time again

@ghost ghost locked as resolved and limited conversation to collaborators Jun 15, 2022