Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Including FastFloat in parsing process #62301

Merged
merged 29 commits into from
Feb 8, 2022

Conversation

CarlVerret
Copy link
Contributor

@CarlVerret CarlVerret commented Dec 2, 2021

This PR adds a new fast path to the processing of the floating-point numbers. It is used when the existing fast path (Clinger's) fails to apply, but before the existing 'slow path' is used. There's a corresponding issue ##48646.

To assess the performance effect, we have three realistic data sets (canada.txt, mesh.txt and synthetic.txt).

' This PR 
|         Method |      FileName |      Mean |     Error |    StdDev |    Median |       Min |       Max | MFloat/s |
|--------------- |-------------- |----------:|----------:|----------:|----------:|----------:|----------:|---------:|
| Double.Parse() |    canada.txt | 14.301 ms | 0.0465 ms | 0.0389 ms | 14.308 ms | 14.239 ms | 14.372 ms |     7,80 |
| Double.Parse() |      mesh.txt |  5.081 ms | 0.0130 ms | 0.0122 ms |  5.086 ms |  5.058 ms |  5.102 ms |    14,44 |
| Double.Parse() | synthetic.txt | 18.323 ms | 0.0356 ms | 0.0315 ms | 18.325 ms | 18.265 ms | 18.364 ms |     8,21 |

' Upstream main
|         Method |      FileName |      Mean |     Error |    StdDev |    Median |       Min |       Max | MFloat/s |
|--------------- |-------------- |----------:|----------:|----------:|----------:|----------:|----------:|---------:|
| Double.Parse() |    canada.txt | 29.391 ms | 0.0505 ms | 0.0448 ms | 29.385 ms | 29.322 ms | 29.464 ms |     3,79 |
| Double.Parse() |      mesh.txt |  4.924 ms | 0.0153 ms | 0.0128 ms |  4.924 ms |  4.905 ms |  4.943 ms |    14,89 |
| Double.Parse() | synthetic.txt | 39.016 ms | 0.0423 ms | 0.0353 ms | 39.012 ms | 38.961 ms | 39.070 ms |     3,85 |

The mesh.txt data set is a scenario where we rely on Clinger's fast path so we would not expect this PR to improve performance in that case. And indeed, we see no significant difference.

For canada.txt and synthetic.txt, this PR brings substantial gains: about 2x. Further gains are possible as demonstrated by the csFastFloat library, but they would be maybe more invasive and are maybe best done by distinct pull requests.

links :
csFastFloat repo : https://github.com/CarlVerret/csFastFloat
Test data can be found here : https://github.com/CarlVerret/csFastFloat/tree/master/Benchmark/data

Benchmarks had been executed on my personnal computer
Intel Core i5-5900 2.90GHz / 12 GB Ram
under Windows 11

@EgorBo @tannergooding @lemire

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Dec 2, 2021
@ghost
Copy link

ghost commented Dec 2, 2021

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR is a contribution to runtime regarding issue ##48646 to insert Lemire's Fast-float algorithm to doube/single/half parsing process.

Steps in the draft will be :

  • Integration of fast-float algorithm with minimum impact on the actual code for the 3 data types.
  • Prevention of bound checking for table lookup (powers of 5/10)
  • String parsing optimisation with SIMD vectorization

@EgorBo @tannergooding @lemire

Author: CarlVerret
Assignees: -
Labels:

area-System.Numerics, community-contribution

Milestone: -

@dnfadmin
Copy link

dnfadmin commented Dec 2, 2021

CLA assistant check
All CLA requirements met.

@EgorBo
Copy link
Member

EgorBo commented Dec 3, 2021

Awesome!
Looks like some asserts are fired:

Process terminated. Assertion failed.
   at System.Number.DigitsToUInt64(Byte* p, Int32 count) in /_/src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs:line 1021

@CarlVerret
Copy link
Contributor Author

Awesome! Looks like some asserts are fired:

Process terminated. Assertion failed.
   at System.Number.DigitsToUInt64(Byte* p, Int32 count) in /_/src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs:line 1021

yes, it looks like something' trying to read above 19 digits...

CarlVerret and others added 2 commits December 4, 2021 16:29
…oFloatingPointBits.cs

Co-authored-by: Günther Foidl <gue@korporal.at>
…oFloatingPointBits.cs

Co-authored-by: Günther Foidl <gue@korporal.at>
@CarlVerret CarlVerret marked this pull request as ready for review December 5, 2021 13:56
@CarlVerret CarlVerret marked this pull request as draft December 5, 2021 18:54
@lemire
Copy link

lemire commented Dec 6, 2021

@EgorBo @CarlVerret

A lot of the original code has been transformed to look like heap allocations (new ...). This is especially true of the constants...

https://github.com/CarlVerret/csFastFloat/blob/master/csFastFloat/Constants/Constants.cs

I am no C# expert but representing constant arrays as heap-allocated content seems likely to trigger heap allocations during the execution of the code which will surely harm the performance.

Of course, maybe this gets optimized away. But are we sure it is the case?

@tannergooding
Copy link
Member

@lemire is correct and these should stay as static readonly until the language support for the ReadOnlySpan<T> pattern is added for other primitive types.

@lemire
Copy link

lemire commented Dec 6, 2021

Reading the comments, I understand that @EgorBo suggested writing the code for the upcoming language support. Of course, this might imply that the current draft PR should be expected to have suboptimal performance with the current language support.

@CarlVerret
Copy link
Contributor Author

CarlVerret commented Dec 7, 2021

I ran existing benchmarks on both double and float : comparing main branch with this PR

This PR:

| Method |                    value |      Mean |    Error |   StdDev |    Median |       Min |       Max | Allocated |
|------- |------------------------- |----------:|---------:|---------:|----------:|----------:|----------:|----------:|
|  Parse | -1,7976931348623157e+308 | 149.68 ns | 0.929 ns | 0.869 ns | 149.31 ns | 148.28 ns | 150.93 ns |         - |
|  Parse |  1,7976931348623157e+308 | 149.58 ns | 1.713 ns | 1.603 ns | 150.13 ns | 146.93 ns | 152.03 ns |         - |
|  Parse |                    12345 |  63.43 ns | 0.109 ns | 0.091 ns |  63.44 ns |  63.30 ns |  63.58 ns |         - |


Main :

| Method |                    value |      Mean |    Error |   StdDev |    Median |       Min |       Max | Allocated |
|------- |------------------------- |----------:|---------:|---------:|----------:|----------:|----------:|----------:|
|  Parse | -1,7976931348623157e+308 | 388.55 ns | 3.163 ns | 2.959 ns | 387.14 ns | 385.30 ns | 395.41 ns |         - |
|  Parse |  1,7976931348623157e+308 | 388.25 ns | 3.735 ns | 3.311 ns | 386.89 ns | 385.13 ns | 396.69 ns |         - |
|  Parse |                    12345 |  62.62 ns | 0.338 ns | 0.316 ns |  62.49 ns |  62.33 ns |  63.18 ns |         - |


Single

This PR :

| Method |          value |      Mean |    Error |   StdDev |    Median |       Min |       Max | Allocated |
|------- |--------------- |----------:|---------:|---------:|----------:|----------:|----------:|----------:|
|  Parse | -3,4028235E+38 | 111.35 ns | 0.886 ns | 0.829 ns | 111.33 ns | 110.06 ns | 113.02 ns |         - |
|  Parse |          12345 |  64.66 ns | 0.333 ns | 0.295 ns |  64.59 ns |  64.21 ns |  65.27 ns |         - |
|  Parse |  3,4028235E+38 | 111.72 ns | 0.838 ns | 0.784 ns | 111.52 ns | 110.68 ns | 112.92 ns |         - |

Main :

| Method |          value |      Mean |    Error |   StdDev |    Median |       Min |       Max | Allocated |
|------- |--------------- |----------:|---------:|---------:|----------:|----------:|----------:|----------:|
|  Parse | -3,4028235E+38 | 150.71 ns | 1.136 ns | 1.063 ns | 150.64 ns | 149.16 ns | 152.24 ns |         - |
|  Parse |          12345 |  63.86 ns | 0.550 ns | 0.515 ns |  63.85 ns |  63.31 ns |  64.95 ns |         - |
|  Parse |  3,4028235E+38 | 148.15 ns | 0.646 ns | 0.604 ns | 147.80 ns | 147.51 ns | 149.51 ns |         - |


@stephentoub
Copy link
Member

I understand that @EgorBo suggested writing the code for the upcoming language support. Of course, this might imply that the current draft PR should be expected to have suboptimal performance with the current language support.

I agree with @tannergooding and @lemire that we should stick with static readonly array fields until compiler support is actually in place. At that point, there are many other places we'll be updating, and we can just update these as part of that.

@CarlVerret
Copy link
Contributor Author

Thank you Tanner. It has been a nice learning occasion for me.

@tannergooding
Copy link
Member

Thank you as well, excited to get this in once CI is passing (the OSX failure was unrelated; I've not looked at the other 2 yet).

@CarlVerret
Copy link
Contributor Author

Thank you as well, excited to get this in once CI is passing (the OSX failure was unrelated; I've not looked at the other 2 yet).

is there something I can do to help with failing checks ?

@tannergooding
Copy link
Member

We need to ensure they aren't related and ideally get CI passing.

It looks like there is an arm32 timeout that should have been resolved in #63357
and a WASM failure that doesn't look familiar (possibly #64759 ?)

If you push a merge commit integrating the current dotnet/main then it should pick up the fixes that have gone in and will ensure the rest of CI can finish.


I'm decently confident that the remaining failures are not related to your changes here and are instead CI issues on our side, so I can also take care of pushing the relevant merge commit if you would prefer

@danmoseley
Copy link
Member

Simply clicking close then reopen will restart validation on rebased code of course. Assuming no conflicts.

@danmoseley
Copy link
Member

@tannergooding historically you ran I think an extra JavaScript test corpus - is that necessary too?

@tannergooding
Copy link
Member

@danmoseley, those run implicitly now: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Runtime/tests/System/DoubleTests.cs#L378-L419

There are always more test suites we could run on top, but I wouldn't expect they would catch anything additional here.

Simply clicking close then reopen will restart validation on rebased code of course. Assuming no conflicts.

I've seen issues with that in the past and have found pushing a merge commit to be much more reliable.

@CarlVerret
Copy link
Contributor Author

just to make sure I'm doing the right thing here : you mean pulling the main into my branch then push back my branch to update this PR?

@tannergooding
Copy link
Member

you mean pulling the main into my branch then push back my branch to update this PR?

Right, something roughly equivalent to:

git fetch --all
git merge dotnet/main
git push

Where dotnet/main could also be called upstream/main or something similar (whatever matches git remote -v and gives you https://github.com/dotnet/runtime)

@tannergooding
Copy link
Member

OSX failures are #65000.

Going to merge this, thanks a lot for the contribution @CarlVerret (and @lemire for the original algorithm/base implementation)!

@tannergooding tannergooding merged commit a702712 into dotnet:main Feb 8, 2022
@CarlVerret
Copy link
Contributor Author

@tannergooding maybe just a little question regarding this PR. Will this contribution be available only for a future version (7.x) of the .net core framework or it will be also included for previous versions ? (5.x, 6.x, etc...) Is there an ETA for a public release of the runtime including this PR ?

@CarlVerret CarlVerret deleted the 48646-FastFloat-integration branch February 8, 2022 21:20
@tannergooding
Copy link
Member

Only on future versions of .NET (7.x and later), we don't tend to backport perf improvements to past releases (even LTS) due to the risk.

This should be available in nightly runtime builds as soon as later today (https://github.com/dotnet/runtime/blob/main/docs/project/dogfooding.md has some more info on how to try out nightly builds) and in the nightly SDK likely in a few days (same link has info on this as well).

It should ship publicly in .NET 7 Preview 2

@CarlVerret
Copy link
Contributor Author

Fine, thanks! I was asking because csFastFloat was built upon .net core 5 and I'd like to run my benchmarks again to compare with 7.x. The goal is to evaluate the impact of another contribution we might suggest for the parsing process.

@danmoseley
Copy link
Member

I probably missed it, but do the inputs to our existing benchmarks in dotnet/performance suffice here? Or should we extend with some of the inputs that proved this change? (your point taken about functional tests)

@tannergooding
Copy link
Member

but do the inputs to our existing benchmarks in dotnet/performance suffice here?

There are lots of improvements we could potentially make to the perf tests, but the current should be at least somewhat representative.

I'd be more interested in what things like ML.NET show with this change.

@EgorBo
Copy link
Member

EgorBo commented Feb 15, 2022

Nice improvements: dotnet/perf-autofiling-issues#3531 and dotnet/perf-autofiling-issues#3543 cc @CarlVerret

@tannergooding
Copy link
Member

Noting that the improvement is for long strings and is going from 300ns to 100ns while the regression is for a short string and is going from 35ns to 37ns.

This regression is acceptable, but we should do some minimal investigation to see what is slower here (its probably just a branch or some other minor thing and if its trivially "fixable", then we should do so; but its not worth spending a lot of time on it).

@CarlVerret
Copy link
Contributor Author

While programming the PR, i've tried to identify what could cause this little regression without any luck. I am really interested if you can provide some insight on how to fix that kind of variation.

@tannergooding
Copy link
Member

I'll run this through AMD uProf (https://developer.amd.com/amd-uprof/) and see if I can see anything interesting.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Numerics community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants