-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow different scheduling affinity for different sections of a file #22563
Allow different scheduling affinity for different sections of a file #22563
Conversation
8a693cd
to
47cb31d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an integration test that verifies this all works when scheduling by section?
Duration::toString), | ||
dataSizeSessionProperty( | ||
AFFINITY_SCHEDULING_FILE_SECTION_SIZE, | ||
"Size of file section for affinity scheduling. Each section of a give size may be assigned an independent affinity preference.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of a give size --> of a given size
Also, is that true? If there are five different sizes there can be five different affinity preferences? How does that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does sound confusing. Maybe you can help me with a concise and more intuitive description.
Let me try to elaborate more on what is going on here.
Currently preferred nodes for affinity scheduling are assigned based on the file path. However it is hard to guarantee the files being similar is size and relatively small. Hence we are running into a skew.
Here's a real example from one of our production systems:
One of the input tables consist of multiple partitions. Each partition consist of a single file (~30GB in size). Queries usually read a single partition at a time, and multiple queries are submitted hitting a single partition at the same time.
Now since there's only a single file per partition, the partition is always read by a single "preferred" node creating a major bottleneck.
The idea is to increase granularity by assigning different preferred nodes to different sections of a file (256MB by default). So a single file can be cached and served by more than a single node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I suspect right here you should simply delete "Each section of a give size may be assigned an independent affinity preference." "Size of file section for affinity scheduling" is all that's needed to describe this property. A detailed description of affinity scheduling can be added elsewhere in the prose docs. It's perhaps worthy of an entire page of its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
presto-hive/src/main/java/com/facebook/presto/hive/HiveFileSplit.java
Outdated
Show resolved
Hide resolved
Can we also add to the release note about the new config, session property ? |
If this PR adds a new config or session property, please add documentation for the new property. |
47cb31d
to
0046776
Compare
Codenotify: Notifying subscribers in CODENOTIFY files for diff 656d95a...0f300d9.
|
Updated documentation and release notes. @steveburnett, @NikhilCollooru could you please take an another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the documentation! I made a couple of rephrasing suggestions, let me know what you think.
I noticed in the release note draft the presence of both a hive.affinity-scheduling-file-section-size
configuration property and a affinity_scheduling_file_section_size
session property. This doc addition addresses the config property but there's no mention of the session property. Is that sufficient?
Following the Release Notes Guidelines Order of Changes and Phrasing, suggest the following revision for consideration for the release note entry:
or
depending on the relative importance of the feature to the property, as mentioned in Phrasing. |
I would probably not expect people to change this value much a certain session. But yeah, I think it might be worth adding for completeness. Updated. |
0046776
to
0f300d9
Compare
Thanks for the review @steveburnett . All great suggestions. Updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local build, everything looks good. Thanks!
Description
Increase granularity of affinity node selection (from per file - to per file section)
Motivation and Context
Currently node affinity is assigned for the entire file as a whole. When files are large it may create a significant scheduling skew.
Impact
Splitting file into sections and assigning affinity independently should help mitigate the skew
Test Plan
Unit test