[TSDB] Enablement of TSDB fails for package with fields having flattened type #94113

Closed
agithomas opened this issue Feb 24, 2023 · 15 comments
Labels: >bug, :StorageEngine/TSDB, Team:Analytics

Comments

@agithomas

If the field mapping of a data stream contains a field of type flattened, enabling TSDB for the package in Kibana fails with the error below.

illegal_argument_exception Caused by: illegal_argument_exception: invalid composite mappings for [metrics-mysql.performance] Root causes: illegal_argument_exception: composable template [metrics-mysql.performance] template after composition with component templates [metrics-mysql.performance@package, metrics-mysql.performance@custom, .fleet_globals-1, .fleet_agent_id_verification-1] is invalid

In this case, none of the flattened fields is annotated with dimension: true.

Impact:

Many integration packages have fields of type flattened, so TSDB enablement will fail for them until this issue is fixed.

elasticsearchmachine added the needs:triage label Feb 24, 2023
martijnvg added the :StorageEngine/TSDB label and removed the needs:triage label Feb 24, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

elasticsearchmachine added the Team:Analytics label Feb 24, 2023
martijnvg added the >bug label Feb 24, 2023
@agithomas
Author

The problem described above appeared when TSDB enablement was attempted for MySQL's performance data stream.

The component template is shown below. It contains a field of type flattened. As part of TSDB enablement, I did not choose the flattened field as a dimension, understanding that dimension fields cannot be of type flattened. Other fields form the dimensions.

If the flattened field type mapping is removed, TSDB enablement works.

{
  "properties": {
    "cloud": {
      "properties": {
        "instance": {
          "properties": {
            "id": {
              "time_series_dimension": true,
              "type": "keyword"
            }
          }
        },
        "provider": {
          "time_series_dimension": true,
          "type": "keyword"
        },
        "project": {
          "properties": {
            "id": {
              "time_series_dimension": true,
              "type": "keyword"
            }
          }
        }
      }
    },
    "container": {
      "properties": {
        "id": {
          "time_series_dimension": true,
          "type": "keyword"
        }
      }
    },
    "agent": {
      "properties": {
        "id": {
          "time_series_dimension": true,
          "type": "keyword"
        }
      }
    },
    "@timestamp": {
      "type": "date"
    },
    "ecs": {
      "properties": {
        "version": {
          "ignore_above": 1024,
          "type": "keyword"
        }
      }
    },
    "data_stream": {
      "properties": {
        "namespace": {
          "type": "constant_keyword"
        },
        "type": {
          "type": "constant_keyword"
        },
        "dataset": {
          "type": "constant_keyword"
        }
      }
    },
    "service": {
      "properties": {
        "address": {
          "time_series_dimension": true,
          "type": "keyword"
        },
        "type": {
          "ignore_above": 1024,
          "type": "keyword"
        }
      }
    },
    "host": {
      "properties": {
        "ip": {
          "type": "ip"
        }
      }
    },
    "mysql": {
      "properties": {
        "performance": {
          "properties": {
            "table_io_waits": {
              "properties": {
                "count": {
                  "properties": {
                    "fetch": {
                      "type": "long"
                    }
                  }
                },
                "index": {
                  "properties": {
                    "name": {
                      "ignore_above": 1024,
                      "type": "keyword"
                    }
                  }
                },
                "object": {
                  "properties": {
                    "schema": {
                      "ignore_above": 1024,
                      "type": "keyword"
                    },
                    "name": {
                      "ignore_above": 1024,
                      "type": "keyword"
                    }
                  }
                }
              }
            },
            "events_statements": {
              "properties": {
                "avg": {
                  "properties": {
                    "timer": {
                      "properties": {
                        "wait": {
                          "type": "long"
                        }
                      }
                    }
                  }
                },
                "last": {
                  "properties": {
                    "seen": {
                      "type": "date"
                    }
                  }
                },
                "max": {
                  "properties": {
                    "timer": {
                      "properties": {
                        "wait": {
                          "type": "long"
                        }
                      }
                    }
                  }
                },
                "quantile": {
                  "properties": {
                    "95": {
                      "type": "long"
                    }
                  }
                },
                "digest": {
                  "type": "flattened"
                },
                "count": {
                  "properties": {
                    "star": {
                      "type": "long"
                    }
                  }
                },
                "statement_id": {
                  "ignore_above": 1024,
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}
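
For anyone who wants to check their own component template the same way, here is a minimal sketch (not part of the original report) that walks a mapping and lists the fields of type flattened and the fields marked with time_series_dimension. The helper names `iter_leaf_fields` and `summarize` are made up for illustration, and the mapping is a trimmed excerpt of the template above.

```python
from typing import Iterator, Tuple


def iter_leaf_fields(properties: dict, prefix: str = "") -> Iterator[Tuple[str, dict]]:
    """Yield (dotted_path, field_definition) for every leaf field in a mapping."""
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        if "properties" in definition:
            # Object field: recurse into its sub-fields.
            yield from iter_leaf_fields(definition["properties"], prefix=f"{path}.")
        else:
            yield path, definition


def summarize(mapping: dict) -> dict:
    """Collect the paths of flattened fields and of dimension fields."""
    flattened, dimensions = [], []
    for path, definition in iter_leaf_fields(mapping.get("properties", {})):
        if definition.get("type") == "flattened":
            flattened.append(path)
        if definition.get("time_series_dimension") is True:
            dimensions.append(path)
    return {"flattened": flattened, "dimensions": dimensions}


# Trimmed excerpt of the component template above (illustration only).
mapping = {
    "properties": {
        "agent": {"properties": {"id": {"time_series_dimension": True, "type": "keyword"}}},
        "service": {"properties": {"address": {"time_series_dimension": True, "type": "keyword"}}},
        "mysql": {"properties": {"performance": {"properties": {"events_statements": {"properties": {
            "digest": {"type": "flattened"},
            "statement_id": {"ignore_above": 1024, "type": "keyword"},
        }}}}}},
    }
}

summary = summarize(mapping)
print(summary)
```

In this excerpt the only flattened field is mysql.performance.events_statements.digest, and it is not among the dimension fields, which matches the situation described above.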

@agithomas
Author

agithomas commented Feb 24, 2023

My initial estimate is that the following package data streams would be impacted:

azure_application_insights/app_insights
azure_billing/billing
docker/event
elasticsearch/cluster_stats
linux/memory
mysql/performance

lalit-satapathy changed the title from "[TSDB] Enablement of TSDB fails if there exist a field having flattened type" to "[TSDB] Enablement of TSDB fails for package with fields having flattened type" Feb 24, 2023
@ruflin
Member

ruflin commented Feb 27, 2023

@agithomas Thanks for the list. The number is pretty low. We can wait with migrating these datasets until the bug is resolved.

@lalit-satapathy

Packages with (type: flattened) in fields.yml

akamai
atlassian_bitbucket
atlassian_confluence
atlassian_jira
auditd_manager
auth0
aws
azure
carbonblack_edr
cef
cisco_asa
cisco_duo
cisco_ftd
cisco_ise
cisco_meraki
cisco_secure_endpoint
citrix_waf
cloudflare
cloudflare_logpush
cyberarkpas
darktrace
docker
elasticsearch
gcp
google_workspace
hashicorp_vault
infoblox_bloxone_ddi
jamf_compliance_reporter
kubernetes
linux
logstash
m365_defender
modsecurity
mongodb
mysql
mysql_enterprise
okta
ping_one
sentinel_one
snyk
suricata
symantec_endpoint
tenable_sc
ti_recordedfuture
ti_threatq
trend_micro_vision_one
zerofox
zoom
zscaler_zpa

@lalit-satapathy

The list of packages above includes both logs and metrics data streams.

Currently there are two (possibly related) behaviours:

  • Enabling TSDB on package with (type: flattened) leads to error:

illegal_argument_exception Caused by: illegal_argument_exception: invalid composite mappings for [metrics-mysql.performance] Root causes: illegal_argument_exception: composable template [metrics-mysql.performance] template after composition with component templates [metrics-mysql.performance@package, metrics-mysql.performance@custom, .fleet_globals-1, .fleet_agent_id_verification-1] is invalid

  • Enabling Synthetic source on package with (type: flattened) leads to error:

illegal_argument_exception Root causes: illegal_argument_exception: field [mysql.performance.events_statements.digest] of type [flattened] doesn't support synthetic source

@martijnvg, let us know whether we need to track the synthetic source issue separately, or whether this issue will track both.

@martijnvg
Member

Let us know whether we need to track the synthetic source issue separately, or whether this issue will track both.

This should be tracked separately. The cause of this error is different than when dimension fields are part of flattened fields.

@agithomas
Author

@martijnvg, can you give a timeline for fixing this issue?

@martijnvg
Member

@agithomas We're working on it. We hope to resolve these two issues before the 8.8 feature freeze (FF).

@salvatore-campagna
Contributor

salvatore-campagna commented Apr 5, 2023

With PR #94842 we are ready to support synthetic source for flattened fields. This is required to be able to use flattened fields as dimensions in TSDB indices. I am going to work on that after clarifying a few things.

Right now dimension fields are identified by a boolean parameter time_series_dimension. For flattened fields that doesn't work since flattened fields include multiple fields, each of which might or might not be a dimension. As a result we need the complete list of fields and their full path (at least the full path from the flattened field root) to be able to identify dimension fields inside the flattened field.

My idea was to use a list, part of the flattened field mapping like in the following example:

"digest": {
    "type": "flattened",
    "time_series_dimensions": [ "a", "a.b", "c.z.y" ]
}

(I used made-up fields because I have no idea what sort of data is stored in this "digest" flattened field.)
NOTE: we expect all fields inside to be treated as keywords.

If that is ok for you I would ask the following questions:

  • do we know the list of fields upfront?
  • do we need the ability to add all fields in the flattened field as dimension? If yes maybe we can just re-use the time_series_dimension: true as a shortcut to mean "use all fields".
  • how do we deal with possible empty/null values? I guess we need to handle that situation especially if we need the ability to add all fields.
  • do we need any support for wildcards? something like "time_series_dimensions": [ "a.*", "c.*" ].
  • if we use a list of fields, is it possible that one of the fields in the list is not available in one or more documents?
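
To make the questions above a bit more concrete, here is a rough Python sketch of how a `time_series_dimensions` list could be resolved against the value of a flattened field. This only illustrates the proposal and is not Elasticsearch code: the function names are hypothetical, and skipping missing, null, and non-leaf entries is just one possible answer to the open questions, chosen here for illustration.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel: distinguishes "path not present" from an explicit null


def lookup(value: Any, dotted_path: str) -> Any:
    """Follow a dotted path such as "c.z.y" through nested dicts."""
    current = value
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def extract_dimensions(flattened_value: Dict[str, Any], paths: List[str]) -> Dict[str, str]:
    """Return {path: keyword_value} for every listed path that resolves to a leaf value."""
    dims: Dict[str, str] = {}
    for path in paths:
        leaf = lookup(flattened_value, path)
        # Assumption: missing paths, nulls, and intermediate objects are skipped.
        if leaf is _MISSING or leaf is None or isinstance(leaf, dict):
            continue
        # All values inside a flattened field are treated as keywords.
        dims[path] = str(leaf)
    return dims


# Made-up document value for the "digest" flattened field.
doc_digest = {"a": {"b": "select"}, "c": {"z": {"y": 42}}, "d": None}
dims = extract_dimensions(doc_digest, ["a", "a.b", "c.z.y", "d", "missing.path"])
print(dims)
```

With this choice, a document that lacks one of the listed fields simply contributes fewer dimension values. Whether that is acceptable is exactly what the last question asks.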

@agithomas
Author

@salvatore-campagna, there are two issues related to fields of type flattened:

  1. A field of type flattened exists in the data stream and is not a dimension, yet the error below appears on TSDB enablement:

illegal_argument_exception Caused by: illegal_argument_exception: invalid composite mappings for [metrics-mysql.performance] Root causes: illegal_argument_exception: composable template [metrics-mysql.performance] template after composition with component templates [metrics-mysql.performance@package, metrics-mysql.performance@custom, .fleet_globals-1, .fleet_agent_id_verification-1] is invalid

  2. A field of type flattened cannot be a dimension field.

Issue no. 1 is currently the blocker for me. Can you prioritise resolving this error? This GitHub issue was created to report issue no. 1.

I understand that your comment is more about issue no. 2: how to make a field of type flattened a dimension field.

I managed to find a workaround for the second problem: extract the required field from the flattened field in the ingest pipeline and annotate the extracted field as a dimension field. In my use case, the flattened field contains only one field, so the options you proposed, including regular expression matching, work for my use case.
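
A minimal sketch of what such a workaround could look like, written as Python dicts that can be serialised to JSON. The exact pipeline used here is not shown in this thread, so treat everything below as an assumption: the sub-key `digest.id` and the target field name `digest_id` are hypothetical, and the `set` processor with `copy_from` is just one way to copy a value inside an ingest pipeline.

```python
import json

# Hypothetical names: the real sub-key inside "digest" is not given in this thread.
SOURCE = "mysql.performance.events_statements.digest.id"
TARGET = "mysql.performance.events_statements.digest_id"

# Ingest pipeline body: copy one key out of the flattened field into its own field.
pipeline = {
    "description": "Copy one key out of the flattened digest field (sketch)",
    "processors": [
        {"set": {"field": TARGET, "copy_from": SOURCE}},
    ],
}

# Mapping fragment for the extracted field: a keyword marked as a dimension.
target_mapping = {"type": "keyword", "time_series_dimension": True}

print(json.dumps(pipeline, indent=2))
print(json.dumps(target_mapping, indent=2))
```

The flattened field itself stays in the mapping without a dimension annotation. Only the extracted keyword field takes part in the dimensions.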

@ruflin / @lalit-satapathy, considering the current use cases and potential use cases that may arise in future, do you have any recommendations?

@martijnvg
Member

Issue no. 1 is currently the blocker for me. Can you prioritise resolving this error? This GitHub issue was created to report issue no. 1.

@agithomas This should be fixed via #94842

@salvatore-campagna
Contributor

salvatore-campagna commented Apr 6, 2023

@agithomas, thanks for your answer.

Among all the options/questions above, there is only one thing I would advise avoiding unless there is a real reason for it: wildcards. Wildcards might cause issues when we have too many dimensions. You are aware that we currently have a limit on the number of dimension fields (because of a Lucene limitation on the maximum size of the tsid). With wildcards, we do not know exactly how many fields of a flattened field will be included in the tsid (all matching fields will be used), and indexing might fail because we exceed that limit.

Not using wildcards and specifying the fields in a list including full names makes it easier to see if we reached the limit or not.

For this reason I would advise not to support wildcards unless there is a strong reason to require that.
Probably the same reasoning should be applied for the option to include all fields.
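
The concern can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: `MAX_DIMENSIONS` is a made-up number rather than the real limit, the document is invented, and shell-style matching via `fnmatchcase` stands in for whatever wildcard semantics would actually be chosen.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterator, List

MAX_DIMENSIONS = 4  # hypothetical limit, for illustration only


def leaf_paths(value: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf inside a flattened field value."""
    for key, child in value.items():
        path = f"{prefix}{key}"
        if isinstance(child, dict):
            yield from leaf_paths(child, prefix=f"{path}.")
        else:
            yield path


def expand_patterns(flattened_value: Dict[str, Any], patterns: List[str]) -> List[str]:
    """Return every leaf path matched by at least one wildcard pattern."""
    return sorted(
        path
        for path in leaf_paths(flattened_value)
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    )


# An explicit list has a size that is known when the mapping is written.
explicit = ["a.b", "c.z.y"]

# With wildcards, the size depends on what each document happens to contain.
doc = {"a": {"b": "1", "x": "2", "y": "3"}, "c": {"z": {"y": "4"}, "w": "5"}}
expanded = expand_patterns(doc, ["a.*", "c.*"])

print(len(explicit) <= MAX_DIMENSIONS)  # can be checked up front
print(len(expanded) <= MAX_DIMENSIONS)  # only known once the document is seen
```

The explicit list can be validated against the limit once, at mapping time. The wildcard expansion can only be checked per document, which is where an indexing failure would surface.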

The easiest thing would be just to have a list with full names. Would this be enough?

@agithomas
Author

The easiest thing would be just to have a list with full names. Would this be enough?

For the use cases that I am aware of, a list with full names would suffice.

@agithomas
Author

Closing the issue. Test results after testing using 8.8-SNAPSHOT (image pull date: today)
