Grok processor extracts only the first value if there are multiple matches #92092

Closed
gmarouli opened this issue Dec 5, 2022 · 1 comment · Fixed by #92586
Assignees
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team

Comments

@gmarouli
Contributor

gmarouli commented Dec 5, 2022

Elasticsearch Version

7.17.6, 8.x

Installed Plugins

No response

Java Version

bundled

OS Version

Doesn't depend on the OS Version

Problem Description

The Grok processor in Elasticsearch ingest pipelines does not extract every value when several captures share the same name; it keeps only the first match.

For example, the following Grok expression:

^%{IPORHOST:source.address} (%{IPORHOST:source.address} )?

when given the following input:

"127.0.0.1 127.0.0.2"

should extract both values, not only the first one.
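The expected behavior can be illustrated outside Elasticsearch with a minimal Python sketch. This is not the Grok implementation: `IPV4` is a simplified stand-in for `IPORHOST`, `FIELD_NAMES` and `extract` are hypothetical names, and the trailing space inside the optional group is made optional here so that the short sample input exercises both captures. Python's `re` rejects duplicate group names, so the sketch maps positional groups to field names instead.

```python
import re

# Simplified stand-in for IPORHOST: dotted IPv4 only (assumption).
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

# Two positional captures that both map to the same field name.
PATTERN = re.compile(rf"^({IPV4}) (?:({IPV4}) ?)?")
FIELD_NAMES = ["source.address", "source.address"]


def extract(text):
    """Collect captures per field; a repeated name yields a list of values."""
    match = PATTERN.search(text)
    if match is None:
        return None
    fields = {}
    for name, value in zip(FIELD_NAMES, match.groups()):
        if value is None:  # the optional group did not participate
            continue
        if name not in fields:
            fields[name] = value  # first value stays a scalar
        elif isinstance(fields[name], list):
            fields[name].append(value)
        else:
            fields[name] = [fields[name], value]  # promote to a list
    return fields
```

With this sketch, a line containing two addresses produces a list under `source.address`, while a line with a single address keeps a plain string, which is the shape the report asks for.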

Steps to Reproduce

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "_description",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "^%{IPORHOST:source.address} (%{IPORHOST:source.address} )?"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "message": "82.10.222.126 192.168.10.1"
      }
    }
  ]
}

The value of source.address in the response is:

{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_id": "id",
        "_version": "-3",
        "_source": {
          "message": "82.10.222.126 192.168.10.1",
          "apache": {
            "access": {
              "user": {
                "identity": "jean"
              }
            }
          },
          "source": {
            "address": "82.10.222.126"
          }
        },
        "_ingest": {
          "timestamp": "2022-12-01T09:10:24.439912308Z"
        }
      }
    }
  ]
}

The expected value of source.address is an array containing both matches:

{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_id": "id",
        "_version": "-3",
        "_source": {
          "message": "82.10.222.126 192.168.10.1",
          "apache": {
            "access": {
              "user": {
                "identity": "jean"
              }
            }
          },
          "source": {
            "address": [
              "82.10.222.126",
              "192.168.10.1"
            ]
          }
        },
        "_ingest": {
          "timestamp": "2022-12-01T09:10:24.439912308Z"
        }
      }
    }
  ]
}
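Since `source.address` is a string in the actual response and an array in the expected one, code that reads the field may want to tolerate both shapes. The following Python sketch is one way to do that; `addresses_from_simulate` is a hypothetical helper, and the two dictionaries are trimmed copies of the responses shown above rather than output from a live cluster.

```python
def addresses_from_simulate(response):
    """Return source.address of each simulated doc, always as a list."""
    results = []
    for entry in response.get("docs", []):
        source = entry.get("doc", {}).get("_source", {})
        value = source.get("source", {}).get("address")
        if value is None:
            results.append([])            # field missing
        elif isinstance(value, list):
            results.append(list(value))   # already a list of matches
        else:
            results.append([value])       # single match stored as a scalar
    return results


# Trimmed copies of the actual and expected responses from this report.
actual = {"docs": [{"doc": {"_source": {
    "source": {"address": "82.10.222.126"}}}}]}
expected = {"docs": [{"doc": {"_source": {
    "source": {"address": ["82.10.222.126", "192.168.10.1"]}}}}]}
```

Comparing `addresses_from_simulate(actual)` with `addresses_from_simulate(expected)` makes the missing second address visible without depending on whether the field is a string or an array.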

Logs (if relevant)

No response
@gmarouli gmarouli added >bug needs:triage Requires assignment of a team area label labels Dec 5, 2022
@michaelbaamonde michaelbaamonde added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed needs:triage Requires assignment of a team area label labels Dec 5, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Dec 5, 2022
@HiDAl HiDAl self-assigned this Dec 27, 2022
HiDAl added a commit to HiDAl/elasticsearch that referenced this issue Dec 28, 2022
This commit makes the Elasticsearch Grok processor behave the same as
Logstash's grok:

when Logstash's grok runs with a pattern that has repeated
pattern names (e.g.: `"%{IP:ip} %{IP:ip} %{anything else}"`), it returns
a list of matches instead of only the first match.

Closes elastic#92092
HiDAl added a commit that referenced this issue Jan 10, 2023

Grok returns a list of matches for repeated pattern names

This change makes the Elasticsearch Grok processor behave in the
same way as Logstash's grok when handling repeated pattern
names, returning a list of matches instead of only the first one.

Closes #92092
danielmitterdorfer pushed a commit to danielmitterdorfer/elasticsearch that referenced this issue Jan 12, 2023
… (elastic#92586)

Grok returns a list of matches for repeated pattern names

This change makes the Elasticsearch Grok processor behave in the
same way as Logstash's grok when handling repeated pattern
names, returning a list of matches instead of only the first one.

Closes elastic#92092