[ML] Fix incorrect assumption about minimum ML node size #91694

Merged: 1 commit merged into elastic:main on Nov 18, 2022

Conversation

droberts195 (Contributor) opened this pull request:

The ML autoscaling code assumed that all ML nodes in Cloud would be at least 1GB. This is not correct: after allowing for logging and metrics collection, it is possible for ML nodes to be smaller.

This PR updates the assumption to 0.5GB.

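For illustration only, here is a minimal Java sketch of the kind of change described above; the class and constant names are hypothetical, not the actual Elasticsearch source.

```java
// Hypothetical sketch: the autoscaling code's assumed minimum ML node size
// drops from 1GB to 0.5GB.
public final class MlAutoscalingMinimums {

    // Before: every Cloud ML node was assumed to have at least 1GB of memory.
    // static final long MINIMUM_ML_NODE_SIZE_BYTES = 1024L * 1024 * 1024;

    // After: logging and metrics collection overhead can leave ML nodes
    // smaller, so the safe lower bound is 0.5GB.
    static final long MINIMUM_ML_NODE_SIZE_BYTES = 512L * 1024 * 1024;

    public static void main(String[] args) {
        System.out.println("Assumed minimum ML node size: " + MINIMUM_ML_NODE_SIZE_BYTES + " bytes");
    }
}
```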
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/ml-core (Team:ML)

@droberts195 (Contributor, Author) commented:

Marked as >non-issue even though it's a bug fix, as it relates to an internal implementation detail of autoscaling in ESS.

@dimitris-athanasiou (Contributor) left a comment:

LGTM

droberts195 merged commit a0a743e into elastic:main Nov 18, 2022
droberts195 deleted the correct_min_ml_node_size branch November 18, 2022 12:43
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 18, 2022
@elasticsearchmachine (Collaborator) commented:

💔 Backport failed

Status    Branch    Result
          8.6
          7.17      Commit could not be cherrypicked due to conflicts
          8.5

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 91694

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 18, 2022
@droberts195 (Contributor, Author) commented:

On closer inspection, the code changed quite radically in 8.3, so I think it would be best not to backport this change to 7.x. Doing so might aggravate some of the other ML autoscaling discrepancies that were fixed in 8.3.

elasticsearchmachine pushed a commit that referenced this pull request Nov 18, 2022
…1696)

elasticsearchmachine pushed a commit that referenced this pull request Nov 18, 2022
…1697)

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 19, 2022
This change fixes a discrepancy that has existed for a long time
but was revealed by elastic#91694. The ML automatic node/JVM sizing code
contained a minimum node size but did not restrict the minimum
JVM size to the size that would be chosen on that minimum node
size. This could throw off calculations at small scale.

Fixes elastic#91728
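
As a rough illustration of the discrepancy described in that commit message, the sketch below derives the minimum JVM size from the minimum node size rather than keeping them as two unrelated constants; all names and the sizing rule are hypothetical and not the actual Elasticsearch code.

```java
// Hypothetical sketch: keep the minimum JVM size consistent with the JVM size
// that the automatic sizing rule would pick on the minimum node size.
public final class JvmSizingSketch {

    static final long MINIMUM_NODE_SIZE_BYTES = 512L * 1024 * 1024; // 0.5GB, per this PR

    // Stand-in for the automatic JVM sizing rule; the real rule is more involved.
    static long dynamicJvmSizeFor(long nodeSizeBytes) {
        return (long) (nodeSizeBytes * 0.4);
    }

    // Before the follow-up fix, an independent minimum JVM size constant could
    // disagree with dynamicJvmSizeFor(MINIMUM_NODE_SIZE_BYTES) at small scale.
    // After the fix, the minimum JVM size is derived from the minimum node size.
    static long minimumJvmSizeBytes() {
        return dynamicJvmSizeFor(MINIMUM_NODE_SIZE_BYTES);
    }

    public static void main(String[] args) {
        System.out.println("Minimum JVM size: " + minimumJvmSizeBytes() + " bytes");
    }
}
```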
droberts195 added a commit that referenced this pull request Nov 21, 2022
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 21, 2022
…ic#91732)

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 21, 2022
…ic#91732)

elasticsearchmachine pushed a commit that referenced this pull request Nov 21, 2022
… (#91742)

elasticsearchmachine pushed a commit that referenced this pull request Nov 21, 2022
… (#91741)

Labels
:ml (Machine learning), >non-issue, Team:ML (Meta label for the ML team), v8.5.3, v8.6.0, v8.7.0
3 participants