[AO][SERVERLESS] Fix Custom Threshold rule tests for Serverless #167644

fkanout · 2023-09-29T13:56:21Z

Summary

Revert the revert for #166942 💪🏻

Fixes #165569
Fixes #166617
Fixes #166618
Fixes #166619
Fixes #166620

apmmachine · 2023-09-29T13:56:38Z

🤖 GitHub comments


Just comment with:

/oblt-deploy : Deploy a Kibana instance using the Observability test environments.
/oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)


elasticmachine · 2023-10-02T09:06:14Z

Pinging @elastic/actionable-observability (Team: Actionable Observability)

maryam-saeidi · 2023-10-02T09:30:11Z

...erverless/api_integration/test_suites/observability/custom_threshold_rule/avg_pct_no_data.ts

@@ -51,10 +48,6 @@ export default function ({ getService }: FtrProviderContext) {
        index: CUSTOM_THRESHOLD_RULE_ALERT_INDEX,
        query: { term: { 'kibana.alert.rule.uuid': ruleId } },
      });
-      await esClient.deleteByQuery({


Any particular reason to remove cleaning up the event log index?

@maryam-saeidi, good question. In this use case (the "no_data" case), we don't ingest any data, so there is nothing to delete.

But the event log index stores the results of rule executions, not the ingested data.

You can check the test server with and without the cleanup: without it, some logs remain in this index.

@maryam-saeidi, thank you for pointing that out. There is a race condition on this index because we use the wildcard *, which produces inconsistent results and sometimes throws an error:

"cause":{"type":"version_conflict_engine_exception","reason":"[S_D574oB8qizAIQCLOib]: version conflict, required seqNo [1], primary term [1]. but no document was found","index_uuid":"cZ1A-w-KS1qY34KrbPGHPA","shard":"0","index":".ds-.kibana-event-log-ds-2023.10.02-000001"},"status":409},{"index":".ds-.kibana-event-log-ds-2023.10.02-000001","id":"TPD574oB8qizAIQCMOiB","cause":{"type":"version_conflict_engine_exception","reason":"[TPD574oB8qizAIQCMOiB]: version conflict, required seqNo [2], primary term [1]. but no document was found","index_uuid":"cZ1A-w-KS1qY34KrbPGHPA","shard":"0","index":".ds-.kibana-event-log-ds-2023.10.02-000001"},"status":409}

So I changed the delete query to filter on rule.id instead of alert.consumer. This is more precise and avoids the issue.

Interesting, how does changing the field solve the issue?
The error says there is no data matching the query, so shouldn't it be the same for any field (including rule.id)?

@maryam-saeidi No, that error message is from before I changed it to rule.id.

The error says there is no data based on the query,

This is why I removed the deleteByQuery at first, but then I realized there is a race condition.
Look at the error type, version_conflict_engine_exception: it means two operations are acting on the same document at the same time.
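The version conflict can be illustrated with a toy model. This is a minimal sketch, not Elasticsearch's implementation: it assumes, in line with the Delete by query documentation, that a delete-by-query first snapshots the matching documents and then deletes each one only if it is unchanged since the snapshot. `ToyIndex`, `deleteIfSeqNo`, and the rule IDs are hypothetical names.

```typescript
// Toy model (assumption): a document can only be deleted if its sequence
// number still matches the one recorded in the caller's snapshot.
type Doc = { id: string; seqNo: number; ruleId: string };
type DeleteResult = "deleted" | "version_conflict";

class ToyIndex {
  private docs: Record<string, Doc> = {};
  private nextSeqNo = 0;

  indexDoc(id: string, ruleId: string): void {
    this.docs[id] = { id, seqNo: this.nextSeqNo++, ruleId };
  }

  // Point-in-time copy of the matching documents, taken before any delete.
  snapshot(match: (d: Doc) => boolean): Doc[] {
    return Object.keys(this.docs)
      .map((k) => this.docs[k])
      .filter(match)
      .map((d) => ({ id: d.id, seqNo: d.seqNo, ruleId: d.ruleId }));
  }

  // Fails when the document is gone or was rewritten since the snapshot,
  // similar to "required seqNo [n] ... but no document was found".
  deleteIfSeqNo(id: string, expectedSeqNo: number): DeleteResult {
    const current = this.docs[id];
    if (current === undefined || current.seqNo !== expectedSeqNo) {
      return "version_conflict";
    }
    delete this.docs[id];
    return "deleted";
  }

  size(): number {
    return Object.keys(this.docs).length;
  }
}

const idx = new ToyIndex();
idx.indexDoc("a", "rule-1");
idx.indexDoc("b", "rule-2");

// Two parallel test files run the same unscoped cleanup: both snapshot
// everything before either one starts deleting.
const matchAll = (_d: Doc) => true;
const snapA = idx.snapshot(matchAll);
const snapB = idx.snapshot(matchAll);

const resultsA = snapA.map((d) => idx.deleteIfSeqNo(d.id, d.seqNo));
const resultsB = snapB.map((d) => idx.deleteIfSeqNo(d.id, d.seqNo));

console.log(resultsA); // [ 'deleted', 'deleted' ]
console.log(resultsB); // [ 'version_conflict', 'version_conflict' ]
```

Whichever cleanup runs second finds every document in its snapshot already gone, which is the "but no document was found" case in the log above.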


I still don't understand: either the document exists in the index, in which case it should contain both fields (rule.id and kibana.alert.rule.consumer), or it does not exist. How does checking one field instead of the other help?
In which case would a document have rule.id but not kibana.alert.rule.consumer?

@maryam-saeidi, the test files run in parallel, so two test files can perform CRUD operations on the same index at the same time. That is what the error message I shared means by "version conflict, required seqNo [1], primary term [1]". For example, a rule from one test file is still running and generating events while a test from another file tries to delete everything using the wildcard * (a race condition).

So once I scoped the deletion, i.e. each rule/test file deletes ONLY the events it generated, filtered by rule.id, the issue went away.
We should have relied on rule.id instead of the consumer from the beginning.

Does that make sense?
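A toy in-memory model can illustrate why scoping the deletion by rule.id helps. This is a sketch under stated assumptions, not Elasticsearch internals: a cleanup snapshots the matching documents first and then deletes each one only if it is unchanged. When each test file filters on its own rule's ID, the two snapshots are disjoint, so neither cleanup can find a document already removed by the other. `eventLog`, `snapshotForRule`, `deleteIfSeqNo`, and the rule IDs are hypothetical.

```typescript
// Toy in-memory event log (assumption, not Elasticsearch internals).
type EventDoc = { id: string; seqNo: number; ruleId: string };

const eventLog: Record<string, EventDoc> = {};
let nextSeqNo = 0;

function indexEvent(id: string, ruleId: string): void {
  eventLog[id] = { id, seqNo: nextSeqNo++, ruleId };
}

// Snapshot only the events written by one rule (the "rule.id" term filter).
function snapshotForRule(ruleId: string): EventDoc[] {
  return Object.keys(eventLog)
    .map((k) => eventLog[k])
    .filter((d) => d.ruleId === ruleId)
    .map((d) => ({ id: d.id, seqNo: d.seqNo, ruleId: d.ruleId }));
}

// A delete succeeds only if the document is unchanged since the snapshot.
function deleteIfSeqNo(id: string, expectedSeqNo: number): boolean {
  const current = eventLog[id];
  if (current === undefined || current.seqNo !== expectedSeqNo) return false;
  delete eventLog[id];
  return true;
}

indexEvent("e1", "rule-from-file-a");
indexEvent("e2", "rule-from-file-a");
indexEvent("e3", "rule-from-file-b");

// Both test files snapshot before either deletes, as in a parallel run,
// but each one only sees its own rule's events.
const snapA = snapshotForRule("rule-from-file-a");
const snapB = snapshotForRule("rule-from-file-b");

const conflicts = snapA
  .concat(snapB)
  .filter((d) => !deleteIfSeqNo(d.id, d.seqNo)).length;

console.log(conflicts); // 0
console.log(Object.keys(eventLog).length); // 0
```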

Yes, super clear, thanks.

Then can we bring back the event log cleanup with the rule.id condition? Otherwise, data remains in the event log index after this test.

await esClient.deleteByQuery({
  index: '.kibana-event-log-*',
  query: { term: { 'rule.id': ruleId } },
});

Yes, I already brought it back!
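The cleanup agreed on in this thread could be packaged as a small helper. This is a hedged sketch, not the PR's code: `DeleteByQueryClient` is a hypothetical, minimal interface standing in for the Elasticsearch client's `deleteByQuery`, and `buildCleanupRequests`/`cleanupRuleData` are illustrative names. The alert index is passed in rather than hard-coded, because the value of `CUSTOM_THRESHOLD_RULE_ALERT_INDEX` is not shown here.

```typescript
// Hypothetical minimal shape of the request and client used by this sketch.
// The real Elasticsearch client's deleteByQuery accepts many more options.
interface DeleteByQueryParams {
  index: string;
  query: { term: Record<string, string> };
}

interface DeleteByQueryClient {
  deleteByQuery(params: DeleteByQueryParams): Promise<unknown>;
}

// Both deletions are scoped to a single rule, so parallel test files
// never try to delete each other's documents.
function buildCleanupRequests(alertIndex: string, ruleId: string): DeleteByQueryParams[] {
  return [
    // Alerts created by the rule (same term as in the existing test code).
    { index: alertIndex, query: { term: { "kibana.alert.rule.uuid": ruleId } } },
    // Event log entries written by the rule's executions.
    { index: ".kibana-event-log-*", query: { term: { "rule.id": ruleId } } },
  ];
}

async function cleanupRuleData(
  client: DeleteByQueryClient,
  alertIndex: string,
  ruleId: string
): Promise<void> {
  // Await each deletion in turn, as the existing test code does.
  for (const params of buildCleanupRequests(alertIndex, ruleId)) {
    await client.deleteByQuery(params);
  }
}
```

In a test, the client instance (esClient in this thread) would be passed as `client`, assuming its `deleteByQuery` signature is compatible with the interface above.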

fkanout · 2023-10-02T09:40:10Z

@elasticmachine merge upstream

kibana-ci · 2023-10-02T15:30:04Z

💚 Build Succeeded

Buildkite Build
Commit: 8c77d4f

Metrics

✅ unchanged

History

💚 Build #164499 succeeded bc252eb
💔 Build #164435 failed e94cc74
💔 Build #164359 failed 81af405
💚 Build #164085 succeeded 9bb86f6
💔 Build #164014 failed d0dc63b

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @fkanout

fkanout and others added 8 commits September 21, 2023 15:50

Fix serverless tests

3316118

Update file name

c42f17c

Fix metric threshold rule test

516ae30

Fix metric threshold rule test

dc4ec0a

Merge branch 'main' into fix-custom-threshold-rule-serverless-tests

5681a69

Update serverless tests after ResponseOps PR about consumer

6b6bbcd

Merge branch 'main' into fix-custom-threshold-rule-serverless-tests

94a0a8a

Merge branch 'main' into fix-custom-threshold-rule-serverless-tests

555cc28

Merge branch 'main' into fix-custom-threshold-rule-serverless-tests

be5d3a4

fkanout self-assigned this Sep 29, 2023

fkanout added backport:skip This commit does not require backporting Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge" v8.11.0 labels Sep 29, 2023

fkanout and others added 2 commits September 29, 2023 16:03

adding json files to the include

1405a53

Merge branch 'main' into fix-custom-threshold-rule-serverless-tests

d0dc63b

fkanout added the release_note:skip Skip the PR/issue when compiling release notes label Sep 29, 2023

fkanout and others added 3 commits September 29, 2023 17:29

Remove deleting kibana event in the no_data tests as we don't generate data in it

a5487da

Merge branch 'main' into fix-custom-threshold-rule-serverless-tests

3f96c6f

[CI] Auto-commit changed files from 'node scripts/eslint --no-cache -…
…-fix'

9bb86f6

fkanout marked this pull request as ready for review October 2, 2023 09:06

fkanout requested a review from a team as a code owner October 2, 2023 09:06

maryam-saeidi reviewed Oct 2, 2023


kibanamachine and others added 5 commits October 2, 2023 05:40

Merge branch 'main' into fix-custom-threshold-rule-serverless-tests

81af405

use rule.id term for deleteByQuery

55a1e7e

Merge branch 'main' into fix-custom-threshold-rule-serverless-tests

e94cc74

Merge branch 'main' into fix-custom-threshold-rule-serverless-tests

bc252eb

Fix mixing tests

8c77d4f

fkanout enabled auto-merge (squash) October 2, 2023 15:31

maryam-saeidi approved these changes Oct 2, 2023


fkanout merged commit b41dd04 into elastic:main Oct 2, 2023
19 checks passed

fkanout mentioned this pull request Oct 3, 2023

[AO] Add testing action variables to the Custom threshold rule API integration tests #167757

Merged
