
Make ZSTD default compression for Parquet writes #4726

Merged: 4 commits merged on Jun 5, 2023


clairemcginty
Contributor

@clairemcginty clairemcginty commented Feb 27, 2023

(fix #4698)

The Parquet Java library supports ZSTD with a default level of 3 (doc), based on the zstd-jni library; the level can be customized with the Hadoop Configuration key parquet.compression.codec.zstd.level.
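To illustrate the level setting described above, here is a minimal Scala sketch that models the lookup, with a plain Map standing in for Hadoop's Configuration. The object ZstdLevelSketch and its method resolveLevel are hypothetical and are not part of Parquet Java or Scio; only the key name and the default of 3 come from this PR description. As an assumption of this sketch, a non-integer value throws rather than falling back to the default.

```scala
import scala.util.Try

// Hypothetical model (not Parquet Java or Scio code) of resolving the
// ZSTD compression level from writer configuration. A Map[String, String]
// stands in for Hadoop's Configuration.
object ZstdLevelSketch {
  // Configuration key named in the PR description.
  val LevelKey: String = "parquet.compression.codec.zstd.level"

  // Default ZSTD level for Parquet Java, per the PR description.
  val DefaultLevel: Int = 3

  /** Returns the configured level, or DefaultLevel when the key is absent.
    * The value is trimmed; a non-integer value throws NumberFormatException
    * (a choice made by this sketch).
    */
  def resolveLevel(conf: Map[String, String]): Int =
    conf.get(LevelKey).map(_.trim.toInt).getOrElse(DefaultLevel)

  /** Non-throwing variant for callers that prefer to handle bad input. */
  def tryResolveLevel(conf: Map[String, String]): Try[Int] =
    Try(resolveLevel(conf))
}
```

For example, resolveLevel(Map.empty) yields 3, while resolveLevel(Map(ZstdLevelSketch.LevelKey -> "9")) yields 9. In a real job the key would instead be set on the Hadoop Configuration handed to the Parquet writer, for example with conf.setInt("parquet.compression.codec.zstd.level", 9).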

ZSTD has been shown to offer moderate performance improvements over GZIP and Snappy, and ZSTD-compressed Parquet files are supported for BigQuery loads.

Contributor

@regadas regadas left a comment


🙌

@codecov

codecov bot commented Feb 27, 2023

Codecov Report

Merging #4726 (fa7ac8f) into main (3776382) will increase coverage by 0.51%.
The diff coverage is 90.20%.

❗ Current head fa7ac8f differs from pull request most recent head 3eac71f. Consider uploading reports for the commit 3eac71f to get more accurate results

@@            Coverage Diff             @@
##             main    #4726      +/-   ##
==========================================
+ Coverage   61.26%   61.78%   +0.51%     
==========================================
  Files         286      288       +2     
  Lines       10572    10598      +26     
  Branches      776      758      -18     
==========================================
+ Hits         6477     6548      +71     
+ Misses       4095     4050      -45     
Impacted Files Coverage Δ
...otify/scio/coders/LowPriorityCoderDerivation.scala 97.43% <ø> (ø)
...com/spotify/scio/coders/instances/JavaCoders.scala 92.72% <ø> (ø)
...in/scala/com/spotify/scio/coders/CustomCoder.scala 84.90% <84.90%> (ø)
...n/scala/com/spotify/scio/coders/WrappedCoder.scala 88.46% <88.46%> (ø)
...src/main/scala/com/spotify/scio/coders/Coder.scala 93.47% <100.00%> (+8.21%) ⬆️
...com/spotify/scio/coders/instances/JodaCoders.scala 98.30% <100.00%> (+0.12%) ⬆️
...om/spotify/scio/coders/instances/ScalaCoders.scala 74.39% <100.00%> (+1.35%) ⬆️
.../com/spotify/scio/parquet/avro/ParquetAvroIO.scala 87.71% <100.00%> (ø)
...ify/scio/parquet/tensorflow/ParquetExampleIO.scala 89.02% <100.00%> (ø)
...com/spotify/scio/parquet/types/ParquetTypeIO.scala 98.59% <100.00%> (ø)
... and 2 more

... and 1 file with indirect coverage changes

@kellen kellen added this to the 0.13.0 milestone May 16, 2023
@RustedBones RustedBones changed the base branch from v0.13.x to main May 17, 2023 12:02
@RustedBones RustedBones changed the base branch from main to v0.13.x May 17, 2023 12:04
@RustedBones RustedBones changed the base branch from v0.13.x to main May 17, 2023 12:06
@RustedBones
Contributor

@clairemcginty what's the state of this one?

@clairemcginty
Contributor Author

@RustedBones ready to merge for Scio 0.13 👍

@RustedBones RustedBones merged commit 7459661 into main Jun 5, 2023
@RustedBones RustedBones deleted the zstd_default branch June 5, 2023 07:18
Development

Successfully merging this pull request may close these issues.

Support ZSTD compression for Parquet
4 participants