Fix grok processors #35276

HiDAl · 2023-05-02T09:03:47Z

Elasticsearch made a behavioral change to the grok processor
(PR elastic/elasticsearch#92586): when a field is matched more than once,
the processor now returns a list of values. This change was introduced in
8.7. Some of the Filebeat pipelines were built on the assumption that only
the first match would be returned.

In the case of IIS, `source.address` is captured again in the second part
of the pattern. Since only the first match was ever used, it's safe to
remove the second `source.address`.
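To make the behavioral change concrete, here is a simplified Python model (not Elasticsearch code) of how repeated captures of one field end up in the document before and after 8.7. It assumes a grok match can be represented as an ordered list of `(field, value)` pairs. The `url.path` field and all values are made up for illustration, and the model ignores details such as how groups that did not participate in the match are handled.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical model: one grok match as an ordered list of
# (field_name, captured_value) pairs, in the order the pattern declares them.
Captures = List[Tuple[str, str]]


def fold_first_match(captures: Captures) -> Dict[str, Any]:
    """Model of the pre-8.7 assumption: only the first capture per field is kept."""
    doc: Dict[str, Any] = {}
    for field, value in captures:
        doc.setdefault(field, value)  # later captures of the same field are ignored
    return doc


def fold_all_matches(captures: Captures) -> Dict[str, Any]:
    """Model of the 8.7+ behavior: a field captured more than once becomes a list."""
    doc: Dict[str, Any] = {}
    for field, value in captures:
        if field not in doc:
            doc[field] = value
        elif isinstance(doc[field], list):
            doc[field].append(value)
        else:
            doc[field] = [doc[field], value]
    return doc


# Illustrative IIS-style match where source.address is captured twice
# (field url.path and all values are hypothetical).
before_fix = [
    ("source.address", "10.0.0.1"),
    ("url.path", "/index.html"),
    ("source.address", "10.0.0.1"),
]
# The fix: the second source.address capture is removed from the pattern.
after_fix = before_fix[:2]

print(fold_first_match(before_fix))  # {'source.address': '10.0.0.1', 'url.path': '/index.html'}
print(fold_all_matches(before_fix))  # {'source.address': ['10.0.0.1', '10.0.0.1'], 'url.path': '/index.html'}
print(fold_all_matches(after_fix))   # {'source.address': '10.0.0.1', 'url.path': '/index.html'}
```

Dropping the duplicate capture makes the 8.7+ behavior produce the same scalar `source.address` that the pipeline previously relied on.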

In the case of MySQL, according to the MySQL documentation:

- the schema is always defined at the beginning of the log, right after
  "Thread_id:", so it's safe to remove the second `mysql.schema`;
- `thread_id` is trickier, because it can be matched in different places.
  In this case, the potential matches are stored in 3 temporary fields, and
  a new `script` processor then uses the value of the correct temporary
  field and removes it.
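The `thread_id` handling can be sketched as follows. This is a Python model of what the new `script` processor is described as doing, not the pipeline's actual script. The three temporary field names, the target field `mysql.thread_id`, and the rule that the first non-empty temporary field wins are all assumptions for illustration; fields are modeled as flat dotted keys rather than the nested objects of a real ingest document.

```python
from typing import Any, Dict, Optional

# Hypothetical names for the three temporary fields (not taken from the PR).
TEMP_THREAD_ID_FIELDS = ("_tmp.thread_id_a", "_tmp.thread_id_b", "_tmp.thread_id_c")


def resolve_thread_id(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Model of the script step: copy the populated temporary field into
    mysql.thread_id (assumed target name), then delete the temporary fields.

    Assumption: the temporary fields are checked in a fixed priority order
    and the first non-empty one wins.
    """
    chosen: Optional[Any] = None
    for name in TEMP_THREAD_ID_FIELDS:
        value = doc.pop(name, None)  # always remove the temporary field
        if chosen is None and value not in (None, ""):
            chosen = value
    if chosen is not None:
        doc["mysql.thread_id"] = chosen
    return doc


print(resolve_thread_id({"_tmp.thread_id_b": "42", "_tmp.thread_id_c": "7"}))  # {'mysql.thread_id': '42'}
print(resolve_thread_id({"message": "no thread id"}))  # {'message': 'no thread id'}
```

Because every temporary field is popped whether or not it is chosen, no scratch fields leak into the indexed document.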

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

This reverts commit f5ace09.

botelastic · 2023-05-02T09:03:54Z

This pull request doesn't have a Team:<team> label.

mergify · 2023-05-02T09:04:22Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @HiDAl? 🙏.
For such, you'll need to label your PR with:

- The upcoming major version of the Elastic Stack
- The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

`backport-v8./d.0` is the label to automatically backport to the `8./d` branch. `/d` is the digit.

elasticmachine · 2023-05-02T10:21:52Z

💔 Tests Failed

Build stats

Start Time: 2023-05-02T09:25:56.574+0000
Duration: 66 min 31 sec

Test stats 🧪

Test	Results
Failed	13
Passed	27522
Skipped	2140
Total	29675