Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow different scheduling affinity for different sections of a file #22563

Merged
merged 1 commit into from
Apr 23, 2024

Conversation

arhimondr
Copy link
Member

@arhimondr arhimondr commented Apr 18, 2024

Description

Increase granularity of affinity node selection (from per file - to per file section)

Motivation and Context

Currently node affinity is assigned for the entire file as a whole. When files are large it may create a significant scheduling skew.

Impact

Splitting file into sections and assigning affinity independently should help mitigate the skew

Test Plan

Unit test

== RELEASE NOTES ==

Hive Connector Changes
* Improve affinity scheduling granularity from a file to a section of a file by adding a `hive.affinity-scheduling-file-section-size` configuration property and `affinity_scheduling_file_section_size` session property. The default file size is 256MB. :pr:`22563`

Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an integration test that verifies this all works when scheduling by section?

Duration::toString),
dataSizeSessionProperty(
AFFINITY_SCHEDULING_FILE_SECTION_SIZE,
"Size of file section for affinity scheduling. Each section of a give size may be assigned an independent affinity preference.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of a give size --> of a given size
Also, is that true? If there are five different sizes there can be five different affinity preferences? How does that work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does sound confusing. Maybe you can help me with a concise and more intuitive description.

Let me try to elaborate more on what is going on here.

Currently preferred nodes for affinity scheduling are assigned based on the file path. However it is hard to guarantee the files being similar is size and relatively small. Hence we are running into a skew.

Here's a real example from one of our production systems:

One of the input tables consist of multiple partitions. Each partition consist of a single file (~30GB in size). Queries usually read a single partition at a time, and multiple queries are submitted hitting a single partition at the same time.

Now since there's only a single file per partition, the partition is always read by a single "preferred" node creating a major bottleneck.

The idea is to increase granularity by assigning different preferred nodes to different sections of a file (256MB by default). So a single file can be cached and served by more than a single node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I suspect right here you should simply delete "Each section of a give size may be assigned an independent affinity preference." "Size of file section for affinity scheduling" is all that's needed to describe this property. A detailed description of affinity scheduling can be added elsewhere in the prose docs. It's perhaps worthy of an entire page of its own.

NikhilCollooru
NikhilCollooru previously approved these changes Apr 21, 2024
Copy link
Contributor

@NikhilCollooru NikhilCollooru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NikhilCollooru
Copy link
Contributor

Can we also add to the release note about the new config, session property ?

@steveburnett
Copy link
Contributor

Can we also add to the release note about the new config, session property ?

If this PR adds a new config or session property, please add documentation for the new property.

Copy link

github-actions bot commented Apr 22, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff 656d95a...0f300d9.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/cache/local.rst

@arhimondr
Copy link
Member Author

Updated documentation and release notes.

@steveburnett, @NikhilCollooru could you please take an another look?

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation! I made a couple of rephrasing suggestions, let me know what you think.

I noticed in the release note draft the presence of both a hive.affinity-scheduling-file-section-size configuration property and a affinity_scheduling_file_section_size session property. This doc addition addresses the config property but there's no mention of the session property. Is that sufficient?

presto-docs/src/main/sphinx/cache/local.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/cache/local.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/cache/local.rst Outdated Show resolved Hide resolved
@steveburnett
Copy link
Contributor

Following the Release Notes Guidelines Order of Changes and Phrasing, suggest the following revision for consideration for the release note entry:

== RELEASE NOTES ==

Hive Connector Changes
* Improve affinity scheduling granularity from a file to a section of a file by adding a `hive.affinity-scheduling-file-section-size` configuration property and `affinity_scheduling_file_section_size` session property. The default file size is 256MB. :pr:`22563`

or

== RELEASE NOTES ==

Hive Connector Changes
* Add `hive.affinity-scheduling-file-section-size` configuration property and `affinity_scheduling_file_section_size` session property to improve affinity scheduling granularity from a file to a section of a file. The default file size is 256MB. :pr:`22563`

depending on the relative importance of the feature to the property, as mentioned in Phrasing.

@arhimondr
Copy link
Member Author

@steveburnett

This doc addition addresses the config property but there's no mention of the session property. Is that sufficient?

I would probably not expect people to change this value much a certain session. But yeah, I think it might be worth adding for completeness. Updated.

@arhimondr arhimondr force-pushed the preferred-node-block-selection branch from 0046776 to 0f300d9 Compare April 23, 2024 00:20
@arhimondr
Copy link
Member Author

Thanks for the review @steveburnett . All great suggestions. Updated.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)
Pull updated branch, new local build, everything looks good. Thanks!

@arhimondr arhimondr merged commit ff5282e into prestodb:master Apr 23, 2024
57 checks passed
@arhimondr arhimondr deleted the preferred-node-block-selection branch April 23, 2024 16:45
@wanglinsong wanglinsong mentioned this pull request Jun 25, 2024
36 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants