[Monitoring] Disk usage alerting #75419

igoristic · 2020-08-19T12:17:23Z

Resolves #74819

This is part of the "Additional Alerting" effort for Stack Monitoring.

The check evaluates each data node to make sure its disk usage is below the configured threshold.

Testing:

* Create a Stack Monitoring environment
* Through the regular Setup Mode > Alert Edit flow/ux, set the threshold to something low (like 2%)
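
For readers following along, the per-node check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `NodeDiskStats` shape, the field names, and both helper functions are hypothetical and are not taken from the plugin code.

```typescript
// Hypothetical shape for illustration only; not the plugin's actual types.
interface NodeDiskStats {
  nodeId: string;
  totalBytes: number;
  availableBytes: number;
}

// Percentage of disk in use, rounded to two decimal places.
function diskUsedPercent(stats: NodeDiskStats): number {
  if (stats.totalBytes <= 0) {
    return 0; // avoid dividing by zero when no size is reported
  }
  const usedBytes = stats.totalBytes - stats.availableBytes;
  return Math.round((usedBytes / stats.totalBytes) * 10000) / 100;
}

// Ids of the nodes whose usage is above the threshold (given in percent).
function nodesOverThreshold(nodes: NodeDiskStats[], thresholdPercent: number): string[] {
  return nodes
    .filter((node) => diskUsedPercent(node) > thresholdPercent)
    .map((node) => node.nodeId);
}

const sampleNodes: NodeDiskStats[] = [
  { nodeId: 'node-a', totalBytes: 1000, availableBytes: 990 }, // 1% used
  { nodeId: 'node-b', totalBytes: 1000, availableBytes: 500 }, // 50% used
];

// With a low threshold such as 2%, only node-b is over the limit.
const overTwoPercent = nodesOverThreshold(sampleNodes, 2);
```

Setting the threshold very low, as in the testing steps, makes almost any real node exceed it, which is why it is a convenient way to force the alert to fire.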

elasticmachine · 2020-09-02T13:49:04Z

Pinging @elastic/stack-monitoring (Team:Monitoring)

chrisronline

Overall, nice work so far!

Found a couple of things right away that are making it hard to continue testing

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts

chrisronline

Added another comment about the next steps; hopefully we can get feedback from Ravi

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts

chrisronline

We should also add the DISK alert to this list: https://github.com/elastic/kibana/blob/master/x-pack/plugins/monitoring/public/views/elasticsearch/nodes/index.js#L86

x-pack/plugins/monitoring/server/lib/alerts/fetch_disk_usage_node_stats.ts

chrisronline · 2020-09-04T20:07:13Z

Also, @hbharding came up with some changes to the panel that I think will make it look better. WDYT?

igoristic · 2020-09-04T22:51:21Z

> Also, @hbharding came up with some changes to the panel that I think will make it look better. WDYT?

This seems like it should be a separate issue, maybe?

Or I could try doing it here. I just feel like there'll be some back and forth if there isn't a picture of how it should look

chrisronline · 2020-09-08T13:15:58Z

> This seems like it should be a separate issue, maybe?

Yes, good point! Let's tackle it separately

chrisronline · 2020-09-17T17:17:53Z

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts

+        createLink(
+          'xpack.monitoring.alerts.diskUsage.ui.nextSteps.resizeYourDeployment',
+          'Resize your deployment (ECE)',
+          `{elasticWebsiteUrl}/guide/en/cloud-enterprise/{docLinkVersion}/ece-resize-deployment.html`


I don't think cloud docs work the same way. The correct url seems to be https://www.elastic.co/guide/en/cloud-enterprise/current/ece-resize-deployment.html whereas this code generates https://www.elastic.co/guide/en/cloud-enterprise/master/ece-resize-deployment.html
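
To make the difference concrete, here is a small sketch of the two URL shapes. It assumes the `{name}` placeholders are filled by a simple string substitution; `fillTemplate` and `linkVars` are hypothetical helpers written for this example, and how the plugin actually resolves the placeholders is not shown in this thread.

```typescript
// Replace `{name}` placeholders in a template with values from `values`.
// Placeholders without a matching value are left untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) => {
    const value = values[key];
    return value !== undefined ? value : match;
  });
}

// Assumed values, chosen to match the URLs quoted in the comment above.
const linkVars: Record<string, string> = {
  elasticWebsiteUrl: 'https://www.elastic.co',
  docLinkVersion: 'master',
};

// Versioned path: produces the `master` URL that the comment flags as wrong.
const versionedUrl = fillTemplate(
  '{elasticWebsiteUrl}/guide/en/cloud-enterprise/{docLinkVersion}/ece-resize-deployment.html',
  linkVars
);

// Path pinned to `current`: produces the URL the comment says is correct.
const pinnedUrl = fillTemplate(
  '{elasticWebsiteUrl}/guide/en/cloud-enterprise/current/ece-resize-deployment.html',
  linkVars
);
```

Pinning the path to `current` is one possible fix; the thread does not say which approach was finally taken.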

chrisronline · 2020-09-17T19:20:04Z

x-pack/plugins/monitoring/server/alerts/cpu_usage_alert.ts

-      }, [] as string[]);
-      firingNodeUuids.sort(); // It doesn't matter how we sort, but keep the order consistent
-      const instanceId = `${this.type}:${cluster.clusterUuid}:${firingNodeUuids.join(',')}`;
+      const instanceId = `.monitoring:${this.type}:${cluster.clusterUuid}`;


While this does help with maintaining instance state after an alert resolves, it introduces another problem. The unique instance id is where the throttling is enforced, meaning that if you create an instance with a previously used id, its actions are subject to the throttle period that started the first time you used that id.

In this case, since the instance id is based on the cluster id, it will never change for a given cluster, even if the set of firing nodes changes.

Imagine a 3 node cluster (A, B, C) where node A is firing an alert. The instance id will be .monitoring:cpu_usage:clusterUuid, the actions will fire, and the throttling will start for all .monitoring:cpu_usage:clusterUuid instances. Now imagine node A resolves itself, but node B starts firing an alert. The alert will run and try to fire the actions, but they will be subject to the throttle period (which by default is 1d), so users wouldn't see any messaging about it.

This is why I originally did it by firingNodeUuids to ensure we generated a unique instance id based on what was actually firing.

I'm not sure we can go in this direction because I worry that our alerting will miss valid cases where it should send actions and we will lose trust with our users.

WDYT?
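
The scenario above can be reproduced with a toy model of per-instance throttling. This is a minimal sketch, not the Kibana alerting framework: `ThrottledNotifier` and `perNodesId` are hypothetical names, and the only behaviour assumed is the one described in the comment (actions for an instance id are suppressed until the throttle period since the last run has passed).

```typescript
// Toy stand-in for per-instance throttling; NOT the Kibana alerting framework.
class ThrottledNotifier {
  private lastFiredAt = new Map<string, number>();
  private throttleMs: number;

  constructor(throttleMs: number) {
    this.throttleMs = throttleMs;
  }

  // Returns true if actions would run for this instance id at time `nowMs`.
  notify(instanceId: string, nowMs: number): boolean {
    const last = this.lastFiredAt.get(instanceId);
    if (last !== undefined && nowMs - last < this.throttleMs) {
      return false; // still inside the throttle window for this id
    }
    this.lastFiredAt.set(instanceId, nowMs);
    return true;
  }
}

const ONE_HOUR_MS = 60 * 60 * 1000;
const ONE_DAY_MS = 24 * ONE_HOUR_MS;

// Case 1: the id depends on the cluster only, as in the new code.
const clusterOnly = new ThrottledNotifier(ONE_DAY_MS);
const clusterOnlyId = '.monitoring:cpu_usage:clusterUuid';
const nodeAFiredClusterOnly = clusterOnly.notify(clusterOnlyId, 0);
// One hour later node A has resolved and node B fires, but the id is unchanged.
const nodeBFiredClusterOnly = clusterOnly.notify(clusterOnlyId, ONE_HOUR_MS);

// Case 2: the id also encodes which nodes are firing, as in the removed code.
function perNodesId(type: string, clusterUuid: string, firingNodeUuids: string[]): string {
  const sorted = [...firingNodeUuids].sort(); // keep the order consistent
  return `${type}:${clusterUuid}:${sorted.join(',')}`;
}

const perNodes = new ThrottledNotifier(ONE_DAY_MS);
const nodeAFiredPerNodes = perNodes.notify(perNodesId('cpu_usage', 'clusterUuid', ['A']), 0);
const nodeBFiredPerNodes = perNodes.notify(
  perNodesId('cpu_usage', 'clusterUuid', ['B']),
  ONE_HOUR_MS
);
```

In this model `nodeBFiredClusterOnly` is `false` (node B's actions are swallowed by the throttle), while `nodeBFiredPerNodes` is `true` because the id changed along with the set of firing nodes.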

chrisronline · 2020-09-23T14:17:54Z

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts

+          oldState.ui.isFiring !== newAlertState.ui.isFiring &&
+          oldState.ui.resolvedMS !== newAlertState.ui.resolvedMS
+      );
+      if (!relatedOldState) {


I'm not sure I understand this logic here.

If the old state is the same node as the new state, but isFiring flipped and resolvedMS is different, that means we need to fire a resolution? I'm not sure we even need to look at resolvedMS. If isFiring went from true to false, then we need to fire resolved actions.

Or am I just not reading this properly?
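
For comparison, the simpler rule suggested here (only look at isFiring going from true to false for the same node) might look like the sketch below. The `NodeAlertState` shape and `findResolvedNodes` are hypothetical and deliberately flatter than the `ui.isFiring` state in the diff; this is not the code from the PR.

```typescript
// Simplified, hypothetical per-node state; the real state nests this under `ui`.
interface NodeAlertState {
  nodeId: string;
  isFiring: boolean;
}

// Nodes that were firing in the previous run and are reported as not firing now.
// Nodes missing from the new states are not treated as resolved.
function findResolvedNodes(oldStates: NodeAlertState[], newStates: NodeAlertState[]): string[] {
  const firingNow = new Map<string, boolean>();
  for (const state of newStates) {
    firingNow.set(state.nodeId, state.isFiring);
  }
  return oldStates
    .filter((old) => old.isFiring && firingNow.get(old.nodeId) === false)
    .map((old) => old.nodeId);
}

const previousStates: NodeAlertState[] = [
  { nodeId: 'A', isFiring: true },
  { nodeId: 'B', isFiring: true },
  { nodeId: 'C', isFiring: false },
];
const currentStates: NodeAlertState[] = [
  { nodeId: 'A', isFiring: false }, // flipped from firing to not firing
  { nodeId: 'B', isFiring: true }, // still firing
  { nodeId: 'C', isFiring: false }, // never fired
];

const resolvedNodes = findResolvedNodes(previousStates, currentStates);
```

Here only node A counts as resolved, and no timestamp comparison is needed to decide that.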

chrisronline · 2020-09-23T14:24:15Z

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts

+    }
+
+    if (deltaFiringStates.length) {
+      const instance = services.alertInstanceFactory(`${deltaInstanceIdPrefix}:firing`);


I like this idea a lot. We can just execute actions off a unique instance id for firing and another for resolved
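
A rough sketch of that idea, under stated assumptions: the diff only shows the `:firing` suffix, so the `:resolved` suffix, the `PlannedInstance` shape, and `planDeltaInstances` are all hypothetical, and the prefix value is a placeholder.

```typescript
// Hypothetical description of one instance to schedule; not a Kibana type.
interface PlannedInstance {
  instanceId: string;
  nodeIds: string[];
}

// One instance id for nodes that just started firing, another for nodes that
// just resolved. Nothing is planned for an empty group.
function planDeltaInstances(
  idPrefix: string,
  newlyFiring: string[],
  newlyResolved: string[]
): PlannedInstance[] {
  const plannedInstances: PlannedInstance[] = [];
  if (newlyFiring.length > 0) {
    plannedInstances.push({ instanceId: `${idPrefix}:firing`, nodeIds: newlyFiring });
  }
  if (newlyResolved.length > 0) {
    // The ':resolved' suffix is an assumption; only ':firing' appears in the diff.
    plannedInstances.push({ instanceId: `${idPrefix}:resolved`, nodeIds: newlyResolved });
  }
  return plannedInstances;
}

const plannedInstanceIds = planDeltaInstances('example-prefix', ['node-b'], ['node-a']).map(
  (planned) => planned.instanceId
);
```

Keeping the two ids separate means a resolution is tracked under a different instance id than a new firing, which fits the throttling concern raised earlier in the review.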

chrisronline · 2020-09-23T14:58:43Z

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts

+    }
+  }
+
+  protected async processData(


What if we did something like this for processData to handle the resolutions?

https://gist.github.com/chrisronline/8cc094cd1876e895746d5e91db84be7c

Getting this to work properly actually uncovers a UX issue in the UI, where we don't really surface this well. IMO we should defer that to a separate PR.

chrisronline

Looking good! A couple of things I noticed

x-pack/plugins/monitoring/server/alerts/alerts_common.ts

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts

kibanamachine · 2020-09-30T05:36:57Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request
Commit: e56ea3e

Metrics [docs]

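@kbn/optimizer bundle module count

| id | value | diff | baseline |
| --- | --- | --- | --- |
| `monitoring` | 628 | +4 | 624 |

async chunks size

| id | value | diff | baseline |
| --- | --- | --- | --- |
| `monitoring` | 1.2MB | +137.0B | 1.2MB |

distributable file count

| id | value | diff | baseline |
| --- | --- | --- | --- |
| `default` | 45784 | +3 | 45781 |

page load bundle size

| id | value | diff | baseline |
| --- | --- | --- | --- |
| `monitoring` | 183.3KB | +22.7KB | 160.6KB |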

History

💚 Build #77725 succeeded 47711db
💔 Build #76129 failed b32226a
💔 Build #75276 failed 481f144
💔 Build #72621 failed add24ff
💚 Build #71876 succeeded 2ba9fbe

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

chrisronline

LGTM! Awesome job!

* Disk usage alert draft
* Fixed typings and defaults
* Fixed tests
* Fixed tests
* Addressed code feedback
* Fixed disk and cpu usage states
* Fixed resolve state and throttle
* CR feedback
* Fixed links

igoristic · 2020-09-30T18:22:46Z

Backport:
7.x: bf93191

* master: (97 commits)
  [Actions] Adds a "Test Connector" button on the Connectors List to make discovery of the Test tab easier (elastic#78746)
  [Discover] Fix functional time picker test permissions (elastic#78564)
  [ML] Fixing module datafeed overrides (elastic#78925)
  Adds some missing licenses to the CSV export (elastic#78719)
  [dev/cli] ensure plugins/ and all watch source dirs exist (elastic#78973)
  [Lens] Stop using scripted metric to collect telemetry (elastic#78687)
  [Lens] fix wrong message in fields accordion (elastic#78924)
  [Enterprise Search][App Search] Credentials Logic updates (elastic#78644)
  [Monitoring] Disk usage alerting (elastic#75419)
  [SECURITY_SOLUTION] Trusted apps list expand/collapse details (elastic#78601)
  Update content on interstitial page (elastic#78881)
  chore(NA): include hjson as a prod dependency (elastic#78941)
  Fix empty meta fields input in Advanced Settings (elastic#78576)
  [Lens] Maintain order of operations in dimension panel (elastic#78864)
  Fix plugin doc title (elastic#78880)
  load apm-rum agent lazily (elastic#78760)
  [ML] Skip full ML access permission test
  Optimize charts plugin (elastic#78922)
  ui_actions service initial docs (elastic#78902)
  skip failing suite (elastic#78942)
  ...

Disk usage alert draft

795218e

igoristic added the release_note:enhancement, Team:Monitoring, v8.0.0, Feature:Stack Monitoring, and v7.10.0 labels Aug 19, 2020

igoristic added 9 commits August 21, 2020 01:38

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

2e306fe

…usage-alerting

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

3e8a441

…usage-alerting

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

e46f3db

…usage-alerting

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

065c669

…usage-alerting

Fixed typings and defaults

98ad58d

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

a2e5e69

…usage-alerting

Fixed tests

26e1f84

Fixed tests

d7bca1c

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

2ba9fbe

…usage-alerting

igoristic requested a review from a team September 2, 2020 13:48

igoristic marked this pull request as ready for review September 2, 2020 13:49

chrisronline suggested changes Sep 2, 2020

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts (outdated)

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts (outdated)

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts

igoristic added 3 commits September 4, 2020 05:26

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

6f0007f

…usage-alerting

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

6f3e35f

…usage-alerting

Addressed code feedback

add24ff

igoristic requested a review from chrisronline September 4, 2020 14:05

chrisronline reviewed Sep 4, 2020

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts (outdated)

chrisronline reviewed Sep 4, 2020

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts (outdated)

chrisronline reviewed Sep 4, 2020

x-pack/plugins/monitoring/server/lib/alerts/fetch_disk_usage_node_stats.ts

igoristic added 3 commits September 17, 2020 11:42

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

f030b10

…usage-alerting

Fixed disk and cpu usage states

7bb5c06

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

481f144

…usage-alerting

chrisronline suggested changes Sep 17, 2020

igoristic added 2 commits September 22, 2020 01:50

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

663e7dd

…usage-alerting

Fixed resolve state and throttle

b32226a

chrisronline reviewed Sep 23, 2020

igoristic added 4 commits September 23, 2020 15:07

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

327db89

…usage-alerting

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

d164b33

…usage-alerting

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

c65b7b5

…usage-alerting

CR feedback

47711db

igoristic requested a review from chrisronline September 29, 2020 03:15

chrisronline suggested changes Sep 29, 2020

x-pack/plugins/monitoring/server/alerts/alerts_common.ts (outdated)

x-pack/plugins/monitoring/server/alerts/disk_usage_alert.ts

igoristic added 2 commits September 30, 2020 00:01

Fixed links

8693ed7

Merge branch 'master' of https://github.com/elastic/kibana into disk-…

e56ea3e

…usage-alerting

igoristic requested a review from chrisronline September 30, 2020 04:06

chrisronline approved these changes Sep 30, 2020

igoristic merged commit c49d546 into elastic:master Sep 30, 2020

igoristic deleted the disk-usage-alerting branch September 30, 2020 16:38

igoristic mentioned this pull request Sep 30, 2020

[7.x] [Monitoring] Disk usage alerting (#75419) #78990

Merged

igoristic added the backported label Sep 30, 2020

chrisronline mentioned this pull request Dec 14, 2020

[Stack Monitoring] [Test Scenario] Out of the box alerting #85841

Closed

23 tasks

chrisronline mentioned this pull request Mar 1, 2021

[Stack Monitoring] [Test Scenario] Out of the box alerting #93072

Closed

24 tasks

simianhacker mentioned this pull request Apr 29, 2021

[Stack Monitoring] [Test Scenario] Out of the box alerting #98765

Closed

24 tasks
