GH-41541: [Go][Parquet] Fix writer performance regression #41638

Merged: 2 commits merged May 15, 2024


@zeroshade commented May 13, 2024

Rationale for this change

A performance regression in the Parquet writer was reported, present since v14. Profiling revealed excessive allocations. The cause was that we always added the current offset to the current capacity when reserving, so Reserve always performed a reallocation even when it didn't need to.

What changes are included in this PR?

PooledBufferWriter should only pass nbytes to the Reserve call, not byteoffset + nbytes. BitWriter should not be adding b.offset to the capacity when determining the new capacity.
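To make the mechanism concrete, here is a minimal Go sketch, not the arrow code. `pooledWriter`, `reserve` and `countAllocs` are hypothetical names, a plain `[]byte` stands in for the pooled `memory.Buffer`, growth is simple doubling, and the `buggy` flag assumes the regression can be modeled as adding the offset to the current capacity before deciding whether to grow.

```go
package main

import "fmt"

// pooledWriter is a hypothetical, heavily simplified stand-in for
// PooledBufferWriter. The slice capacity plays the role of the pooled
// buffer's capacity, pos is the write position, offset a fixed offset.
type pooledWriter struct {
	buf    []byte
	pos    int
	offset int
	buggy  bool // true reproduces the "capacity + offset" regression
	allocs int  // number of times the backing storage was reallocated
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// reserve ensures there is room for nbytes more bytes at pos.
func (w *pooledWriter) reserve(nbytes int) {
	base := cap(w.buf)
	if w.buggy {
		// Regression: with offset > 0, newCap always exceeds the current
		// capacity, so every call reallocates even if the data would fit.
		base += w.offset
	}
	newCap := maxInt(base, 256) // minimum capacity of 256 bytes
	for newCap < w.pos+nbytes {
		newCap *= 2
	}
	if newCap > cap(w.buf) {
		grown := make([]byte, len(w.buf), newCap)
		copy(grown, w.buf) // each reallocation copies the whole buffer
		w.buf = grown
		w.allocs++
	}
}

func (w *pooledWriter) write(p []byte) {
	w.reserve(len(p))
	w.buf = append(w.buf, p...) // fits within the reserved capacity
	w.pos += len(p)
}

// countAllocs performs `writes` writes of `chunk` bytes and returns how many
// reallocations the writer needed.
func countAllocs(buggy bool, offset, writes, chunk int) int {
	w := &pooledWriter{offset: offset, buggy: buggy}
	p := make([]byte, chunk)
	for i := 0; i < writes; i++ {
		w.write(p)
	}
	return w.allocs
}

func main() {
	fmt.Println("buggy:", countAllocs(true, 4, 1000, 8))
	fmt.Println("fixed:", countAllocs(false, 4, 1000, 8))
}
```

With an offset of 4 and 1000 eight-byte writes, the buggy path reallocates on every write (1000 times), while the fixed path reallocates 6 times (256, 512, 1024, 2048, 4096, 8192 bytes). Because each of those buggy reallocations also copies the whole buffer, this is one plausible way a small offset could turn into very large B/op figures.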

Are these changes tested?

Yes.

Are there any user-facing changes?

No, only performance changes:

Before:

```shell
goos: linux
goarch: amd64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkWriteColumn/int32_not_nullable-20         	     514	   2127175 ns/op	1971.77 MB/s	 5425676 B/op	     239 allocs/op
BenchmarkWriteColumn/int32_nullable-20             	      31	 467352621 ns/op	   8.97 MB/s	2210271923 B/op	    2350 allocs/op
BenchmarkWriteColumn/int64_not_nullable-20         	     326	   4132204 ns/op	2030.06 MB/s	 5442976 B/op	     265 allocs/op
BenchmarkWriteColumn/int64_nullable-20             	      33	 432764687 ns/op	  19.38 MB/s	2100068812 B/op	    2384 allocs/op
BenchmarkWriteColumn/float32_not_nullable-20       	     334	   3540566 ns/op	1184.64 MB/s	 5453079 B/op	    1263 allocs/op
BenchmarkWriteColumn/float32_nullable-20           	       6	 492103646 ns/op	   8.52 MB/s	2283305841 B/op	    3371 allocs/op
BenchmarkWriteColumn/float64_not_nullable-20       	     241	   4783268 ns/op	1753.74 MB/s	 5498759 B/op	    1292 allocs/op
BenchmarkWriteColumn/float64_nullable-20           	       4	 369619096 ns/op	  22.70 MB/s	1725354454 B/op	    3401 allocs/op
PASS
ok  	github.com/apache/arrow/go/v17/parquet/pqarrow	40.862s
```

After:

```shell
goos: linux
goarch: amd64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkWriteColumn/int32_not_nullable-20         	     500	   2136823 ns/op	1962.87 MB/s	 5410591 B/op	     240 allocs/op
BenchmarkWriteColumn/int32_nullable-20             	      48	  26604880 ns/op	 157.65 MB/s	12053510 B/op	     250 allocs/op
BenchmarkWriteColumn/int64_not_nullable-20         	     340	   3530509 ns/op	2376.03 MB/s	 5439578 B/op	     265 allocs/op
BenchmarkWriteColumn/int64_nullable-20             	      44	  27387334 ns/op	 306.30 MB/s	11870305 B/op	     260 allocs/op
BenchmarkWriteColumn/float32_not_nullable-20       	     316	   3479312 ns/op	1205.50 MB/s	 5456685 B/op	    1263 allocs/op
BenchmarkWriteColumn/float32_nullable-20           	      50	  25910872 ns/op	 161.87 MB/s	12054582 B/op	    1271 allocs/op
BenchmarkWriteColumn/float64_not_nullable-20       	     249	   4769664 ns/op	1758.74 MB/s	 5486020 B/op	    1292 allocs/op
BenchmarkWriteColumn/float64_nullable-20           	      51	  25496256 ns/op	 329.01 MB/s	12140753 B/op	    1284 allocs/op
PASS
ok  	github.com/apache/arrow/go/v17/parquet/pqarrow	11.492s
```

All of the nullable column cases average around a 16x-17x performance improvement.
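The 16x-17x figure can be checked directly from the ns/op columns of the two runs. This small Go program only does that arithmetic; the `result` type and `speedups` function are illustrative names, and the numbers are copied from the nullable rows of the Before and After benchmarks.

```go
package main

import "fmt"

// result holds ns/op for one benchmark case before and after the fix.
type result struct {
	name              string
	beforeNs, afterNs float64
}

// Nullable cases, copied from the Before/After benchmark output.
var nullable = []result{
	{"int32_nullable", 467352621, 26604880},
	{"int64_nullable", 432764687, 27387334},
	{"float32_nullable", 492103646, 25910872},
	{"float64_nullable", 369619096, 25496256},
}

// speedups returns before/after for each case and their arithmetic mean.
func speedups(rs []result) (each []float64, mean float64) {
	for _, r := range rs {
		s := r.beforeNs / r.afterNs
		each = append(each, s)
		mean += s
	}
	mean /= float64(len(rs))
	return each, mean
}

func main() {
	each, mean := speedups(nullable)
	for i, r := range nullable {
		fmt.Printf("%-17s %5.1fx\n", r.name, each[i])
	}
	fmt.Printf("%-17s %5.1fx\n", "mean", mean)
}
```

This gives roughly 17.6x, 15.8x, 19.0x and 14.5x for the four nullable cases, about 16.7x on average.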

@zeroshade requested review from bkietz and mapleFU May 13, 2024 14:59

⚠️ GitHub issue #41541 has been automatically assigned in GitHub to PR creator.

@zeroshade commented

@zhouyan, can you give this a test to confirm that it fixes the performance degradation you saw?

@ggodik commented May 13, 2024

Confirming this restored write performance to pre-v14 levels.

Thanks @zeroshade

@mapleFU left a comment

Generally this LGTM

```diff
@@ -185,7 +185,7 @@ func (b *PooledBufferWriter) Reserve(nbytes int) {
 		b.buf = bufferPool.Get().(*memory.Buffer)
 	}
 
-	newCap := utils.Max(b.buf.Cap()+b.offset, 256)
+	newCap := utils.Max(b.buf.Cap(), 256)
```
Reviewer (Member):

Why should at least 256 more bytes be reserved?

@zeroshade (Member Author):

It's not "256 more bytes", it's a minimum. If a buffer currently has a capacity of less than 256 bytes, we push its capacity to 256 in order to reduce future reallocations (since this is a buffer pool, we'll eventually reuse the buffer).

If this becomes an issue for anyone, we can definitely make this configurable or reduce it, but it does go a long way toward reducing small allocations if you're writing small row groups.
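The effect of the 256-byte floor on small writes can be sketched by counting how often a single buffer has to grow. This is a hypothetical model: `reallocs` is not an arrow function, growth is plain doubling, the pool itself is not modeled, and the floor is assumed to be applied on every reserve as in the diff.

```go
package main

import "fmt"

// reallocs feeds a sequence of write sizes into one buffer and counts how
// many times its capacity has to grow. floor is the minimum capacity enforced
// on every reserve; growth is plain doubling.
func reallocs(floor int, sizes []int) int {
	if floor < 1 {
		floor = 1 // doubling from zero would never terminate
	}
	capacity, pos, count := 0, 0, 0
	for _, n := range sizes {
		newCap := capacity
		if newCap < floor {
			newCap = floor
		}
		for newCap < pos+n {
			newCap *= 2
		}
		if newCap > capacity {
			capacity = newCap // one (re)allocation
			count++
		}
		pos += n
	}
	return count
}

func main() {
	sizes := make([]int, 20) // twenty small 8-byte writes
	for i := range sizes {
		sizes[i] = 8
	}
	fmt.Println("floor 1:  ", reallocs(1, sizes))
	fmt.Println("floor 256:", reallocs(256, sizes))
}
```

For twenty 8-byte writes, a floor of 1 grows the buffer six times (8, 16, 32, 64, 128, 256 bytes), while a floor of 256 allocates once.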

@github-actions bot added the awaiting changes label and removed the awaiting committer review label May 14, 2024
@zhouyan commented May 15, 2024 via email

@zeroshade merged commit e1de9c5 into apache:main May 15, 2024 (27 checks passed)
@zeroshade removed the awaiting changes label May 15, 2024

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit e1de9c5.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them.

@zeroshade deleted the parquet-performance-fix branch May 16, 2024 14:47
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…he#41638)

* GitHub Issue: apache#41541

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
zeroshade pushed a commit that referenced this pull request Jul 10, 2024
…42003)

### Rationale for this change

This PR is complementary to #41638.

The prior PR reduced reallocations in `PooledBufferWriter`. However, the
problematic formula it fixed is still used in other functions.

In addition, `(*PooledBufferWriter).Reserve()` simply doubles the
capacity of buffers regardless of its argument `nbytes`, which may
result in excessive allocations in some cases.

### What changes are included in this PR?

- Applied the fixed formula to `(*BufferWriter).Reserve()`.
- Updated the new capacity passed to `(*memory.Buffer).Reserve()`.
- Now using `bitutil.NextPowerOf2(b.pos + nbytes)` to avoid
reallocations when adding `nbytes`.
- Replaced `math.Max` with `utils.Max` in
`(*bufferWriteSeeker).Reserve()` to avoid unnecessary type conversions.
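The `NextPowerOf2(b.pos + nbytes)` change can be illustrated with a small capacity calculation. This is a sketch, not the upstream code: `reserveCap` and `nextPow2` are hypothetical helpers, `nextPow2` returns the smallest power of two that is at least n and may round exact powers of two differently from arrow's `bitutil.NextPowerOf2`, and the 256-byte minimum is carried over from #41638.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextPow2 returns the smallest power of two >= n (1 for n <= 1). It is a
// local helper for this sketch, not arrow's bitutil.NextPowerOf2.
func nextPow2(n int) int {
	if n <= 1 {
		return 1
	}
	return 1 << bits.Len(uint(n-1))
}

// reserveCap returns the capacity a writer at position pos would request in
// order to append nbytes more bytes.
func reserveCap(curCap, pos, nbytes int) int {
	newCap := curCap
	if newCap < 256 {
		newCap = 256 // minimum capacity, as in #41638
	}
	if newCap < pos+nbytes {
		// Size the buffer from the bytes actually needed, in one step,
		// instead of deriving it from the current capacity.
		newCap = nextPow2(pos + nbytes)
	}
	return newCap
}

func main() {
	fmt.Println(reserveCap(0, 0, 10))        // 256: small first write gets the minimum
	fmt.Println(reserveCap(256, 100, 1000))  // 2048: large write jumps straight to a fit
	fmt.Println(reserveCap(4096, 100, 1000)) // 4096: already large enough, unchanged
}
```

A writer at position 100 with a 256-byte buffer that reserves 1000 more bytes goes straight to 2048 in one step, while a buffer that is already large enough keeps its capacity.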

### Are these changes tested?

Yes. The following commands pass.

```
$ export PARQUET_TEST_DATA=$PWD/cpp/submodules/parquet-testing/data
$ (cd go && go test ./...)
```

### Are there any user-facing changes?

No, but it may reduce the number of allocations and improve the
throughput.

Before:

```
$ go test -test.run='^$' -test.bench='^BenchmarkWriteColumn$' -benchmem ./parquet/pqarrow/...
goos: linux
goarch: arm64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
BenchmarkWriteColumn/int32_not_nullable-10                  1190           1016705 ns/op        4125.39 MB/s     5443579 B/op        240 allocs/op
BenchmarkWriteColumn/int32_nullable-10                        52          24780561 ns/op         169.26 MB/s    12048944 B/op        249 allocs/op
BenchmarkWriteColumn/int64_not_nullable-10                   632           1717090 ns/op        4885.36 MB/s     5445954 B/op        265 allocs/op
BenchmarkWriteColumn/int64_nullable-10                        51          22949770 ns/op         365.52 MB/s    12209860 B/op        262 allocs/op
BenchmarkWriteColumn/float32_not_nullable-10                 519           2234718 ns/op        1876.88 MB/s     5452627 B/op       1263 allocs/op
BenchmarkWriteColumn/float32_nullable-10                      56          23423793 ns/op         179.06 MB/s    12057540 B/op       1272 allocs/op
BenchmarkWriteColumn/float64_not_nullable-10                 416           2761247 ns/op        3037.98 MB/s     5507068 B/op       1292 allocs/op
BenchmarkWriteColumn/float64_nullable-10                      51          25767881 ns/op         325.55 MB/s    12059614 B/op       1285 allocs/op
PASS
ok      github.com/apache/arrow/go/v17/parquet/pqarrow  10.592s
```

After:

```
$ go test -test.run='^$' -test.bench='^BenchmarkWriteColumn$' -benchmem ./parquet/pqarrow/...
goos: linux
goarch: arm64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
BenchmarkWriteColumn/int32_not_nullable-10                  1196            959528 ns/op        4371.22 MB/s     5420349 B/op        238 allocs/op
BenchmarkWriteColumn/int32_nullable-10                        51          23017598 ns/op         182.22 MB/s    14138480 B/op        248 allocs/op
BenchmarkWriteColumn/int64_not_nullable-10                   690           1671710 ns/op        5017.98 MB/s     5419878 B/op        263 allocs/op
BenchmarkWriteColumn/int64_nullable-10                        50          23196051 ns/op         361.64 MB/s    13728465 B/op        261 allocs/op
BenchmarkWriteColumn/float32_not_nullable-10                 540           2185075 ns/op        1919.52 MB/s     5459392 B/op       1261 allocs/op
BenchmarkWriteColumn/float32_nullable-10                      54          21796783 ns/op         192.43 MB/s    14150622 B/op       1271 allocs/op
BenchmarkWriteColumn/float64_not_nullable-10                 418           2708292 ns/op        3097.38 MB/s     5455095 B/op       1290 allocs/op
BenchmarkWriteColumn/float64_nullable-10                      51          22174952 ns/op         378.29 MB/s    14142791 B/op       1283 allocs/op
PASS
ok      github.com/apache/arrow/go/v17/parquet/pqarrow  10.210s
```

* GitHub Issue: #41541
raulcd pushed a commit that referenced this pull request Jul 11, 2024
…42003)

* GitHub Issue: #41541