Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization. Use Min/Max intrinsics if one of arguments is constant. #69434

Merged
merged 13 commits into from
Jun 23, 2022

Conversation

SkiFoD
Copy link
Contributor

@SkiFoD SkiFoD commented May 17, 2022

fixes #65625
@tannergooding I had spent some time and implemented HW intrinstics for Min/Max and as asmdiffs shows more regressions I assume I did it wrong. I also couldn't figured it how to force the containment, I've always gotten something like this:
image
if I understand the role of "containment" right then vmovss must go away.

Alternatively, I tried to investigagate the lowering way and change ContainCheckIntrinsic method, but ended up with the IsContainableMemoryOp check is never true for Math.Min/Math.Max. The only option I had was marking both op's with SetRegOptional which doesn't seem right.

I would be so happy if you pushed me to the right direction :)

This PR was created due #65700 had been closed.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 17, 2022
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label May 17, 2022
@ghost
Copy link

ghost commented May 17, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

@tannergooding I had spent some time and implemented HW intrinstics for Min/Max and as asmdiffs shows more regressions I assume I did it wrong. I also couldn't figured it how to force the containment, I've always gotten something like this:
image
if I understand the role of "containment" right then vmovss must go away.

Alternatively, I tried to investigagate the lowering way and change ContainCheckIntrinsic method, but ended up with the IsContainableMemoryOp check is never true for Math.Min/Math.Max. The only option I had was marking both op's with SetRegOptional which doesn't seem right.

I would be so happy if you pushed me to the right direction :)

This PR was created due #65700 had been closed.

Author: SkiFoD
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member

tannergooding commented May 17, 2022

I had spent some time and implemented HW intrinstics for Min/Max and as asmdiffs shows more regressions I assume I did it wrong. I also couldn't figured it how to force the containment, I've always gotten something like this:

Could you share what C# code you're using to get this. At first glance this looks like it might be a loop and so the relevant data may be getting hoisted.

A simple: Max(x, y[i]) for float x and float[] y should be enough to trigger containment here.

@SkiFoD
Copy link
Contributor Author

SkiFoD commented May 17, 2022

I had spent some time and implemented HW intrinstics for Min/Max and as asmdiffs shows more regressions I assume I did it wrong. I also couldn't figured it how to force the containment, I've always gotten something like this:

Could you share what C# code you're using to get this. At first glance this looks like it might be a loop and so the relevant data may be getting hoisted.

A simple: Max(x, y[i]) for float x and float[] y should be enough to trigger containment here.

Yes, you are right. I was using this loop to force it to be optimized:

 for(int i = 0; i <= 300000;i++)
  {
      result +=  Math.Min(x, y);
  }

There are not JitDisasm output when I use these flags: COMPlus_TieredCompilation=0 COMPlus_JITMinOpts=1, so I used the loop :)
Is there any way to fix this behaviour?

C# code as is return Math.Min(x, y[0]) produces this:
image

@jakobbotsch
Copy link
Member

COMPlus_JITMinOpts=1

This will disable all JIT optimizations so you should not set this.
If you are not seeing JitDisasm it's typically because your function was inlined, make sure to mark it with [MethodImpl(MethodImplOptions.NoInlining)] to prevent that. If you post the full disassembly and C# code instead of just snippets it's easier for us to figure out what might be wrong.

@SkiFoD
Copy link
Contributor Author

SkiFoD commented May 18, 2022

You are right, my code was inlined so there was not any output. Thank you, now It works 💯

@tannergooding @jakobbotsch What do you think of this implementation?
https://github.com/dotnet/runtime/blob/a9fbdbee8052bd14674d8712d1cd69b45dcdb0ef/src/coreclr/jit/importer.cpp#L4525-L4545
I chose SSE as minimum required for the optimization, is that right?

src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Jun 2, 2022
impPopStack().val;
impPopStack().val;

op1 = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, NI_Vector128_CreateScalarUnsafe, callJitType, 16);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the assert and that we now have GT_CNS_VEC, this should probably be:

GenTreeVecCon* vecCon = gtNewVconNode(TYP_SIMD16, callJitType);

if (callJitType == CORINFO_TYPE_FLOAT)
{
    vecCon->gtSimd16Val.f32[0] = (float)op1->AsDblCon()->gtDconVal;
}
else
{
    vecCon->gtSimd16Val.f64[0] = op1->AsDblCon()->gtDconVal;
}

op1 = vecCon;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise the JIT is just going to have to convert the CreateScalarUnsafe back to a GT_CNS_VEC later in morph.

@SkiFoD
Copy link
Contributor Author

SkiFoD commented Jun 17, 2022

@tannergooding Could you assist me with figuring out why I'm getting compilation failures during spmi replays. It feels like I'm missing something.

@tannergooding
Copy link
Member

Its not clear to me at a glance. Many methods are failing with missing key "key" in map ....

The PR isn't changing the JIT/EE version and isn't adding anything that should really cause an issue here.

Maybe try merging in latest dotnet/main and seeing if it was just a CI issue

@jakobbotsch
Copy link
Member

We don't do a good job of surfacing the assertion failures in superpmi-replay pipeline -- it takes some digging (usually jit devs run the replays locally, where it is more visible). In this case the failure is:

[18:57:25] ISSUE: <ASSERT> #45239 D:\a\_work\1\s\src\coreclr\jit\importer.cpp (4600) - Assertion failed 'ni == NI_System_Math_Min' in 'C:M(double):int' during 'Importation' (IL size 54; hash 0x97f702cc; FullOpts)

which looks related to the change.

@SkiFoD
Copy link
Contributor Author

SkiFoD commented Jun 21, 2022

@tannergooding @jakobbotsch I made some adjustments, but CI failed with:

Assert failure(PID 5132 [0x0000140c], Thread: 4600 [0x11f8]): m_alignpad == 0 
File: D:\a\_work\1\s\src\coreclr\vm\syncblk.cpp Line: 2952

#if defined(HOST_64BIT) && defined(_DEBUG)
void ObjHeader::IllegalAlignPad()
{
WRAPPER_NO_CONTRACT;
#ifdef LOGGING
void** object = ((void**) this) + 1;
LogSpewAlways("\n\n******** Illegal ObjHeader m_alignpad not 0, object" FMT_ADDR "\n\n",
DBG_ADDR(object));
#endif
_ASSERTE(m_alignpad == 0);
}
#endif // HOST_64BIT && _DEBUG

during System.Text.Json tests.

I run the tests on my local machine and couldn't reproduce the failure. I also run Fuzzlyn to find an issue, but got nothing.
Do you have any idea, how my code changes may cause the assert to fail?

@jakobbotsch
Copy link
Member

That one looks like #70231

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@SkiFoD
Copy link
Contributor Author

SkiFoD commented Jun 23, 2022

@jakobbotsch Is there a way to restart CI pipelines? It seems like the current run is stuck.

@jakobbotsch
Copy link
Member

@SkiFoD I've restarted the failing jobs.

@SkiFoD
Copy link
Contributor Author

SkiFoD commented Jun 23, 2022

@tannergooding @jakobbotsch Yay, looks like it is working. Thank you a lot for your help. I have learnt a lot 👍

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Assuming @jakobbotsch has no more feedback I think this can be merged

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to me too

@tannergooding tannergooding merged commit 5877e8b into dotnet:main Jun 23, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT: Faster Math.Max/Min for x64
4 participants