Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero-extend int #61

Merged
merged 1 commit into from
Feb 17, 2022
Merged

Zero-extend int #61

merged 1 commit into from
Feb 17, 2022

Conversation

buybackoff
Copy link
Contributor

@buybackoff buybackoff commented Feb 16, 2022

Following Mendel's link to "Peanut butter" https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/#peanut-butter.

This gives 2-3% on my machine on the same HT core. The bench with different physical cores became very unstable on my machine: jumping from 1xx to 280 (compared to my previous PR, which works as it did on the same machine). This is not related to this change, it jumps before the change as well.

Adding ExtraWork method, which was removed for some reason, every time is tedious so I have not, but it makes batching behavior more stable. So these numbers need to be double checked on different machines.

Following Mendel's link to "Peanut butter" https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/#peanut-butter.

This gives 2-3% on my machine on the same HT core. The bench with different physical cores became very unstable on my machine: jumping from 1xx to 280 (compared to my previous PR, which works as it did on the same machine).

Adding `ExtraWork` method, which was removed for some reason, every time is tedious so I have not, but it makes batching behavior more stable. So these numbers need to be double checked on different machines.
@MendelMonteiro
Copy link
Member

Yes, well spotted :)

The associated PR is dotnet/runtime#51190 for reference's sake.

@@ -58,10 +58,13 @@ public static T Read<T>(object array, int index)
Ldloc_0(); // load the object pointer as a byref

Ldarg(nameof(index));
Conv_U(); // zero extend

Sizeof(typeof(object));
Mul(); // index x sizeof(object)

Call(MethodRef.PropertyGet(typeof(InternalUtil), nameof(ArrayDataOffset)));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid zero extend at all by storing ArrayDataOffset as nint

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make the code easier to understand as well and is exactly what nint was intended for

@ocoanet
Copy link
Collaborator

ocoanet commented Feb 17, 2022

Thank you for spotting this.

|  Method |      Mean |     Error |    StdDev |
|-------- |----------:|----------:|----------:|
|   Read1 | 0.8264 ns | 0.0014 ns | 0.0013 ns |
|   Read2 | 0.7641 ns | 0.0013 ns | 0.0012 ns |

Not a huge win but this method method was already quite optimized.
I take every perf improvement that has a reasonable impact on complexity 🙂

I will add back ExtraWork to the perf test if it helps stabilizing results on your machine.
However, for such low level improvements, benchmarks might be more appropriate.

@buybackoff
Copy link
Contributor Author

buybackoff commented Feb 17, 2022

Adding ExtraWork is extra work 😄
I did not need it that much actually this time. It helps control batching, I could add it next time.

@ocoanet
Copy link
Collaborator

ocoanet commented Feb 17, 2022

I get the same perf improvements using nint (or nuint) for ArrayDataOffset and index. For some reason the Conv_U for ArrayDataOffset does not change anything, only the one for index is required, but it feels more coherent to apply the conversion for both.

I will accept the PR and then apply the same trick to other InternalUtil, unless you want to change them too?

@buybackoff
Copy link
Contributor Author

The change after adding the second Conv_U for ArrayDataOffset was quite small, close to noise, but after around 5 runs with and without it the result looked better with it. Maybe just noise. For JIT it a constant, probably it knows what to do itself.

apply the same trick to other InternalUtil, unless you want to change them too?

Go ahead!

@ocoanet ocoanet merged commit 1b2baf0 into disruptor-net:master Feb 17, 2022
@ocoanet
Copy link
Collaborator

ocoanet commented Feb 17, 2022

Just in time for V5.0!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants