Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add target to event existence check #15719

Merged
merged 1 commit into from
Sep 29, 2017

Conversation

moolitayer
Copy link

@moolitayer moolitayer commented Aug 3, 2017

In Prometheus alerting if in one evaluation cycle, the same alert was first found to be firing
for different labels (==entities) we will get multiple alerts indistinguishable by timestamp[1]

[1]
Example, for both
"startsAt": "2017-08-02T10:16:30.457040707Z"
and there is no other time related difference:

{
  "generationID": "95b45589-ebee-49ca-a1bc-978c401dfde0",
  "messages": [
    {
      "index": 1,
      "timestamp": "2017-08-02T10:17:00.472823555Z",
      "data": {
        "alerts": [
          {
            "annotations": {
              "description": "ocp-compute01.10.35.48.142.nip.io is down_d",
              "message": "ocp-compute01.10.35.48.142.nip.io is down",
              "miqTarget": "ContainerNode",
              "severity": "HIGH",
              "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
            },
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://prometheus-1018048397-9stw4:9090/graph?g0.expr=up%7Bjob%3D%22kubernetes-nodes%22%7D+%3D%3D+1&g0.tab=0",
            "labels": {
              "alertname": "Node Down",
              "beta_kubernetes_io_arch": "amd64",
              "beta_kubernetes_io_os": "linux",
              "instance": "ocp-compute01.10.35.48.142.nip.io",
              "job": "kubernetes-nodes",
              "kubernetes_io_hostname": "ocp-compute01.10.35.48.142.nip.io",
              "logging_infra_fluentd": "true",
              "region": "primary",
              "zone": "default"
            },
            "startsAt": "2017-08-02T10:16:30.457040707Z",
            "status": "firing"
          },
          {
            "annotations": {
              "description": "ocp-compute02.10.35.48.143.nip.io is down_d",
              "message": "ocp-compute02.10.35.48.143.nip.io is down",
              "miqTarget": "ContainerNode",
              "severity": "HIGH",
              "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
            },
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://prometheus-1018048397-9stw4:9090/graph?g0.expr=up%7Bjob%3D%22kubernetes-nodes%22%7D+%3D%3D+1&g0.tab=0",
            "labels": {
              "alertname": "Node Down",
              "beta_kubernetes_io_arch": "amd64",
              "beta_kubernetes_io_os": "linux",
              "instance": "ocp-compute02.10.35.48.143.nip.io",
              "job": "kubernetes-nodes",
              "kubernetes_io_hostname": "ocp-compute02.10.35.48.143.nip.io",
              "logging_infra_fluentd": "true",
              "region": "primary",
              "zone": "default"
            },
            "startsAt": "2017-08-02T10:16:30.457040707Z",
            "status": "firing"
          },
        ]
      }
    }
}

@moolitayer
Copy link
Author

@miq-bot add_label events, enhancement

@moolitayer
Copy link
Author

@Fryguy @jrafanie does this make sense?

@moolitayer
Copy link
Author

@Fryguy @jrafanie Please take a look. Context explained in first comment

@jrafanie
Copy link
Member

This makes sense @moolitayer. Can you add a test that recreates the issue and demonstrates how your change fixes it?

@Fryguy
Copy link
Member

Fryguy commented Sep 6, 2017

Seems good to me. Agree on adding a spec.

@moolitayer
Copy link
Author

Thanks, done, PTAL @jrafanie @Fryguy

@@ -196,6 +197,27 @@
expect(new_event.availability_zone_id).to eq @availability_zone.id
end
end

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

".add" context is nested under "with availability zones"
Not sure if it's less confusing to add the new context here or create another top level ".add" context

@moolitayer moolitayer force-pushed the event_uniqueness branch 3 times, most recently from 2adfcc9 to 450e8c0 Compare September 28, 2017 20:10
@moolitayer
Copy link
Author

@jrafanie @Fryguy I think this is ready. PTAL
(rubocop issue ins't from this change)

@@ -196,6 +198,24 @@
expect(new_event.availability_zone_id).to eq @availability_zone.id
end
end

context "when two events have the same timestamp" do
context "when events belong to the same object" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you going to add another test here? If not, we can probably collapse these two contexts into one.

it "should add only one of them" do
EmsEvent.add(@ems.id, @event_hash)
second = EmsEvent.add(@ems.id, @event_hash)
expect(second).to be_nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that there is only one EmsEvent row for the event hash details instead of checking the return value from add?

it "should add them both" do
EmsEvent.add(@ems.id, @event_hash)
second = EmsEvent.add(@ems.id, @event_hash.update(:target_id => 1))
expect(second).not_to be_nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@jrafanie
Copy link
Member

jrafanie commented Sep 28, 2017

sorry @moolitayer, I didn't see this update. I had a few minor comments on the tests. Please keep mentioning me until I see it and respond 😉

@moolitayer moolitayer force-pushed the event_uniqueness branch 3 times, most recently from aa29dd7 to 1d4ba5a Compare September 28, 2017 22:50
@miq-bot
Copy link
Member

miq-bot commented Sep 28, 2017

Checked commit moolitayer@e368740 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
2 files checked, 1 offense detected

spec/models/ems_event_spec.rb

  • ❗ - Line 176, Col 32 - Rails/TimeZone - Do not use Time.now without zone. Use one of Time.zone.now, Time.current, Time.now.in_time_zone, Time.now.utc, Time.now.getlocal, Time.now.iso8601, Time.now.jisx0301, Time.now.rfc3339, Time.now.to_i, Time.now.to_f instead.

Copy link
Member

@jrafanie jrafanie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

@jrafanie jrafanie merged commit 39d657f into ManageIQ:master Sep 29, 2017
@jrafanie jrafanie added this to the Sprint 70 Ending Oct 2, 2017 milestone Sep 29, 2017
@jrafanie jrafanie assigned jrafanie and unassigned Fryguy Sep 29, 2017
@moolitayer moolitayer mentioned this pull request Oct 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants