-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Iceberg split manager threads configurable #22754
Make Iceberg split manager threads configurable #22754
Conversation
Thanks for the change. I notice that Iceberg use java system property |
03762df
to
6bdf1c8
Compare
@hantangwangd Thank you for your review. I have added this configuration parameter to the document. |
6bdf1c8
to
5560bba
Compare
So that others can more easily know that they need to tune this value, can you please also export the metrics from this executor? See an example in HivePartitionManager#getExecutor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull branch, local docs build, looks good. Thanks!
Suggest revising the release note entry based on the Order of changes and Phrasing of the Release Notes Guidelines:
|
5e58ce2
to
57f8110
Compare
Hi @tdcmeehan Thank you for your review. I followed your suggestion and added the Executor information to JMX metrics.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix.
|
||
``iceberg.split-manager-threads`` Number of threads to use for generating iceberg splits. ``Number of available processors`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing, should configuration properties add here? See Configuration Properties
chapter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question, and good catch! I have a question for you:
Is the iceberg.split-manager-threads
property related to Manifest File Caching?
- If yes, then I think it should stay where it is.
- If no, then I suggest moving the
iceberg.split-manager-threads
property to the table in Configuration Properties that you noticed.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not related to Manifest File Caching
, so I think it's better to be placed in Configuration Properties
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hantangwangd @steveburnett I read the documentation and found that I had indeed added the wrong place, and I have corrected it. Thanks.
8ff1bb2
to
2af039a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wypb Thanks for your work! Just left a minor suggestion.
@@ -60,6 +60,7 @@ public class IcebergConfig | |||
private long maxManifestCacheSize = IO_MANIFEST_CACHE_MAX_TOTAL_BYTES_DEFAULT; | |||
private long manifestCacheExpireDuration = IO_MANIFEST_CACHE_EXPIRATION_INTERVAL_MS_DEFAULT; | |||
private long manifestCacheMaxContentLength = IO_MANIFEST_CACHE_MAX_CONTENT_LENGTH_DEFAULT; | |||
private int splitManagerThreads = Runtime.getRuntime().availableProcessors(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that they used Runtime.getRuntime().availableProcessors() * 2
as the default value, probably because each processor core commonly supports 2 threads (hyper-threading). Are there any specific reasons for your setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This default value is to be consistent with the default value of Iceberg's iceberg.worker.num-threads
attribute:https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SystemConfigs.java#L38-L43
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Thanks for the clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for moving the property to the best location! One final nit just noticed today, about capitalizing Iceberg.
@@ -247,6 +247,8 @@ Property Name Description | |||
metadata optimization is skipped. | |||
|
|||
Set to ``0`` to disable metadata optimization. | |||
|
|||
``iceberg.split-manager-threads`` Number of threads to use for generating iceberg splits. ``Number of available processors`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
``iceberg.split-manager-threads`` Number of threads to use for generating iceberg splits. ``Number of available processors`` | |
``iceberg.split-manager-threads`` Number of threads to use for generating Iceberg splits. ``Number of available processors`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch @steveburnett, I've already fixed it.
@@ -60,6 +60,7 @@ public class IcebergConfig | |||
private long maxManifestCacheSize = IO_MANIFEST_CACHE_MAX_TOTAL_BYTES_DEFAULT; | |||
private long manifestCacheExpireDuration = IO_MANIFEST_CACHE_EXPIRATION_INTERVAL_MS_DEFAULT; | |||
private long manifestCacheMaxContentLength = IO_MANIFEST_CACHE_MAX_CONTENT_LENGTH_DEFAULT; | |||
private int splitManagerThreads = Runtime.getRuntime().availableProcessors(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Thanks for the clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but since it is a cherry-pick from trino, I think you should credit the original author as the co-author in the commit.
cfc5671
d9591b8
to
070f748
Compare
|
Hi @ZacBlanco Thank you for your reply, I've added the co-author to the commit log. |
070f748
to
2d0fec1
Compare
Cherry-pick of trinodb/trino@57f0081 Co-authored-by: electrum
2d0fec1
to
51b619b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, local build, everything looks good. Thanks!
Description
We have an Iceberg table with a lot of metadata. Recently, we found that every time we query this table, the Coordinator's CPU usage will be very high. We checked and found that it is because when calling Iceberg's
tableScan#planFiles()
API, Iceberg's tableScan will use all cores available in the system (Runtime.getRuntime().availableProcessors()
) for calculations. This PR allows us to control the number of threads used by Iceberg's tableScan on the Presto side.BTW, this PR Cherry-pick from trinodb/trino@57f0081
The JMX metrics for executor used for Iceberg tableScan can be tracked at
com.facebook.presto.iceberg:name=iceberg,type=icebergsplitmanager
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
CC: @tdcmeehan @yingsu00 @ChunxuTang @hantangwangd