MYSQL/IIS log file issue in Integrations #6016

Closed
ishleenk17 opened this issue Apr 27, 2023 · 6 comments · Fixed by #6531 or #6610

@ishleenk17
Contributor

Some recent failures with IIS/MySQL have gone unnoticed after 8.7.0 (since CI/CD is broken).
These issues came up in Beats, and the same can be observed in Integrations as well.

The issue appears because ES changed its grok processing (8.7.0 onwards), so the pipelines now generate output that differs from the expected JSON files.

Fields that previously held a single string value now hold a list with that value duplicated.

For MYSQL:

Maria DB Log

Issue: {"root['mysql.slowlog.schema']": {'old_type': <class 'str'>, 'new_type': <class 'list'>, 'old_value': 'employees-test', 'new_value': ['employees-test', 'employees-test']}}

MYSQL Ubuntu Logs

Issue: {"root['mysql.thread_id']": {'old_type': <class 'str'>, 'new_type': <class 'list'>, 'old_value': '16', 'new_value': [16, '16']}}

Percona Ubuntu Logs

Issue: {"root['mysql.slowlog.schema']": {'old_type': <class 'str'>, 'new_type': <class 'list'>, 'old_value': 'employees', 'new_value': ['employees', 'employees']}}

For IIS:

IIS Logs 7.5

Issue: {"root['destination.address']": {'old_type': <class 'str'>, 'new_type': <class 'list'>, 'old_value': '10.100.220.70', 'new_value': ['10.100.220.70', '10.100.220.70']}}
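
The reports above have the shape of a type-change diff between the expected JSON and the newly generated output. A minimal sketch of such a comparison, assuming flat dictionaries keyed by field name (the function name `type_changes` and the sample documents are illustrative, not the actual test harness):

```python
def type_changes(expected, actual):
    """Report fields whose value type differs between two flat documents."""
    changes = {}
    for field, old_value in expected.items():
        if field not in actual:
            continue  # missing fields are a separate kind of failure
        new_value = actual[field]
        if type(old_value) is not type(new_value):
            changes[field] = {
                "old_type": type(old_value),
                "new_type": type(new_value),
                "old_value": old_value,
                "new_value": new_value,
            }
    return changes


# Sample values taken from the IIS 7.5 report above.
expected = {"destination.address": "10.100.220.70"}
actual = {"destination.address": ["10.100.220.70", "10.100.220.70"]}
report = type_changes(expected, actual)
```

Here `report` flags `destination.address` as having changed from `str` to `list`, which is the pattern seen in all four reports.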

The issue has been fixed in beats.
But since integrations don't follow a release cycle, we need to figure out a way to fix this in integrations.

  1. Are changes required in the pipeline?
  2. Does ES need to backport the change?
  3. Do we need to update the Kibana version of the integrations?

The issue raised in beats: elastic/beats#35133

@jsoriano
Member

Thanks @ishleenk17 for creating this follow-up issue.

The issue has been fixed in beats.

From my understanding the issue is still present in beats: we have only fixed the tests, but values are still duplicated. Some fields in the IIS module are also gone, which is more concerning (see destination.ip and source.ip in https://github.com/elastic/beats/pull/35221/files#diff-5fb5c8dbeb10416fe0cdbd4193a7a7973ab34a26047edd3faa224560aa24494fL5).

Are changes required in the pipeline?

Yes, I think that unexpectedly duplicated values need to be deduplicated. Something like what has been done here. Hopefully this will also fix the issue with the disappeared fields in IIS.

These changes should ideally be applied to Filebeat modules too.

Does ES need to backport the change?

Do you mean to revert elastic/elasticsearch#92586? Maybe this can be discussed, but it looks like this change was desired, to align the Logstash and Elasticsearch implementations of the grok processor. Also, the change has already been released, so I am afraid that we have to live with it. ccing @ruflin in case he thinks something can be done along these lines.

Do we need to update the Kibana version of the integrations?

It shouldn't be needed; the change in the pipelines should work both when values are duplicated and when they are not.
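
A rough sketch of the deduplication logic described above, written in Python rather than as an actual ingest pipeline processor. The function name `dedupe_field` is hypothetical, and comparing values by their string form is an assumption made so that cases like `[16, '16']` collapse too. Non-list values are left alone, so it behaves the same whether or not the values were duplicated:

```python
def dedupe_field(value):
    """Collapse duplicated captures back to a single value where possible."""
    if not isinstance(value, list):
        return value  # already a single value: nothing to do
    unique = []
    seen = set()
    for item in value:
        key = str(item)  # assumption: 16 and '16' count as the same capture
        if key not in seen:
            seen.add(key)
            unique.append(item)
    # A single surviving value is returned as a scalar instead of a list.
    return unique[0] if len(unique) == 1 else unique
```

With the values reported in this issue, `['employees', 'employees']` collapses to `'employees'`, and a plain string passes through unchanged. Note that for `[16, '16']` the first occurrence is kept, so the type of the result depends on the order of the captures.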

@ruflin
Member

ruflin commented Apr 27, 2023

@ishleenk17 I'm wondering what your take on this is, putting aside the potential breaking changes and issues: is the Elasticsearch change an improvement over the previous behaviour?

Does Elasticsearch have any functionality to simplify the removal of duplicates?

@ishleenk17
Contributor Author

@ishleenk17 I'm wondering what your take on this is, putting aside the potential breaking changes and issues: is the Elasticsearch change an improvement over the previous behaviour?

Does Elasticsearch have any functionality to simplify the removal of duplicates?

@ruflin: Yes, I think it is an improvement, as it stores all values when a field has multiple values. But that has also led to duplication of values, which doesn't look to be right.

Does Elasticsearch have any functionality to simplify the removal of duplicates?

I suppose we can add a check in ES: if a value already exists in the array, we need not add it again.
That would remove the duplication in one place, rather than handling it in every individual pipeline in integrations.
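
The check proposed here could look roughly like the following. This is only a model of the idea in Python, not Elasticsearch's grok implementation; `add_capture` and the document layout are made up for illustration.

```python
def add_capture(doc, field, value):
    """Store a capture, turning the field into a list only for distinct values."""
    if field not in doc:
        doc[field] = value
        return doc
    existing = doc[field]
    if isinstance(existing, list):
        if value not in existing:
            existing.append(value)
    elif existing != value:
        doc[field] = [existing, value]
    return doc


doc = {}
add_capture(doc, "mysql.slowlog.schema", "employees")
add_capture(doc, "mysql.slowlog.schema", "employees")  # duplicate is skipped
```

After both calls the field still holds the single string `'employees'`; it only becomes a list once a different value is captured for the same field.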

@ishleenk17
Contributor Author

we have only fixed the tests

That's right, we should remove the duplication of fields, and I suppose we should handle it in ES rather than in integrations. Details here.

That will automatically cover both Beats and Integrations.

@ruflin
Member

ruflin commented May 1, 2023

But that has also led to duplication of values, which doesn't look to be right.

Can you take this up with the Elasticsearch team?

@ishleenk17
Contributor Author

But that has also led to duplication of values, which doesn't look to be right.

Can you take this up with the Elasticsearch team?

Yes @ruflin, I have mentioned this issue to the ES team. Let's see what they come back with.
