[ML] Fix incorrect assumption about minimum ML node size #91694

Merged: 1 commit merged into elastic:main on Nov 18, 2022

Conversation

droberts195 (Contributor) opened this pull request:

The ML autoscaling code assumed that all ML nodes in Cloud would be at least 1GB. This is not correct: after allowing for logging and metrics collection, it is possible for ML nodes to be smaller.

This PR updates the assumption to 0.5GB.

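For illustration only, here is a minimal Java sketch of the kind of change described above; the class and constant names are hypothetical, not the actual Elasticsearch source.

```java
// Hypothetical sketch: the autoscaling code's assumed minimum ML node size
// drops from 1GB to 0.5GB.
public final class MlAutoscalingMinimums {

    // Before: every Cloud ML node was assumed to have at least 1GB of memory.
    // static final long MINIMUM_ML_NODE_SIZE_BYTES = 1024L * 1024 * 1024;

    // After: logging and metrics collection overhead can leave ML nodes
    // smaller, so the safe lower bound is 0.5GB.
    static final long MINIMUM_ML_NODE_SIZE_BYTES = 512L * 1024 * 1024;

    public static void main(String[] args) {
        System.out.println("Assumed minimum ML node size: " + MINIMUM_ML_NODE_SIZE_BYTES + " bytes");
    }
}
```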
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/ml-core (Team:ML)

@droberts195 (Contributor, Author) commented:

Marked as >non-issue even though it's a bug fix, as it relates to an internal implementation detail of autoscaling in ESS.

@dimitris-athanasiou (Contributor) left a comment:

LGTM

droberts195 merged commit a0a743e into elastic:main Nov 18, 2022
droberts195 deleted the correct_min_ml_node_size branch November 18, 2022 12:43
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 18, 2022
@elasticsearchmachine (Collaborator) commented:

💔 Backport failed

Status    Branch    Result
          8.6
          7.17      Commit could not be cherrypicked due to conflicts
          8.5

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 91694

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 18, 2022
@droberts195 (Contributor, Author) commented:

On closer inspection, the code changed quite radically in 8.3, so I think it would be best not to backport this change to 7.x. Doing so might aggravate some of the other ML autoscaling discrepancies that were fixed in 8.3.

elasticsearchmachine pushed a commit that referenced this pull request Nov 18, 2022
…1696)

elasticsearchmachine pushed a commit that referenced this pull request Nov 18, 2022
…1697)

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 19, 2022
This change fixes a discrepancy that has existed for a long time
but was revealed by elastic#91694. The ML automatic node/JVM sizing code
contained a minimum node size but did not restrict the minimum
JVM size to the size that would be chosen on that minimum node
size. This could throw off calculations at small scale.

Fixes elastic#91728
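
As a rough illustration of the discrepancy described in that commit message, the sketch below derives the minimum JVM size from the minimum node size rather than keeping them as two unrelated constants; all names and the sizing rule are hypothetical and not the actual Elasticsearch code.

```java
// Hypothetical sketch: keep the minimum JVM size consistent with the JVM size
// that the automatic sizing rule would pick on the minimum node size.
public final class JvmSizingSketch {

    static final long MINIMUM_NODE_SIZE_BYTES = 512L * 1024 * 1024; // 0.5GB, per this PR

    // Stand-in for the automatic JVM sizing rule; the real rule is more involved.
    static long dynamicJvmSizeFor(long nodeSizeBytes) {
        return (long) (nodeSizeBytes * 0.4);
    }

    // Before the follow-up fix, an independent minimum JVM size constant could
    // disagree with dynamicJvmSizeFor(MINIMUM_NODE_SIZE_BYTES) at small scale.
    // After the fix, the minimum JVM size is derived from the minimum node size.
    static long minimumJvmSizeBytes() {
        return dynamicJvmSizeFor(MINIMUM_NODE_SIZE_BYTES);
    }

    public static void main(String[] args) {
        System.out.println("Minimum JVM size: " + minimumJvmSizeBytes() + " bytes");
    }
}
```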
droberts195 added a commit that referenced this pull request Nov 21, 2022
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 21, 2022
…ic#91732)

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 21, 2022
…ic#91732)

elasticsearchmachine pushed a commit that referenced this pull request Nov 21, 2022
… (#91742)

elasticsearchmachine pushed a commit that referenced this pull request Nov 21, 2022
… (#91741)

Labels
:ml (Machine learning), >non-issue, Team:ML (Meta label for the ML team), v8.5.3, v8.6.0, v8.7.0
3 participants