[ECS] [TSDB] Centralisation of Dimension Fields #5193

agithomas · 2023-02-07T05:42:44Z

Scope

Identification of common fields in packages that can be moved to ECS as dimensions.
Aggregation of fields related to service integrations and cloud-native deployments, if needed.
Identification of ECS repo changes and ECS testing.
Creation of a PR.

agithomas · 2023-02-08T10:38:49Z

For packages owned by the service integration team, certain fields are part of every package and are potential candidates for becoming dimension fields:

service.address
host.ip or host.name (Expect duplicates if the host.ip is not public and is a part of a subnet)
agent.id

agithomas · 2023-02-08T10:52:10Z

When metrics are collected from a resource running in the cloud or in a container, the fields below are potential candidates for becoming dimension fields:

cloud.instance.id or cloud.instance.name
cloud.provider: accounts for multi-cloud / hybrid infrastructure deployments.
cloud.project.id: one organisation can have multiple projects. This is to support multi-regional deployment.
container.id: container.name may not be apt here, since pods with the same container names may run on the same host.
host.name: expect duplicates if the host.ip is not public and is part of a subnet.
agent.id

Should the subnet / network name be included? TBD

ruflin · 2023-02-13T13:06:55Z

For apps running in k8s, do we need something like k8s.cluster.name or similar?
Let's assume we specify all the fields above in ECS as dimension fields. In the k8s case, the cloud provider fields would likely not be used. Do they then still count toward the max of 16 fields even if there are no values? And how does this look if the fields are set in dynamic templates? (@P1llus)

lalit-satapathy · 2023-02-13T14:31:40Z

CC @tommyers-elastic @gizas @felixbarny for any suggestions/comments on the TSDB ECS dimension fields. Once we close on these ECS fields, we can raise a PR for the same.

felixbarny · 2023-02-13T16:03:55Z

@kruskall has tried to define the dimensions for APM (elastic/apm-server#9730) but quickly hit the dimension limit (elastic/elasticsearch#93564).

ruflin · 2023-02-14T11:36:45Z

I like the direction elastic/elasticsearch#93564 is taking. An initial shortcut might be to just increase the limit to 32 which could already help.

Looking at elastic/apm-server#9730, it seems there is some overlap with the dimensions proposed here, but there are also quite a few dimensions which I would argue are potentially unique to APM data. It would be nice if all the default dimensions could be set in ECS (or a common base) but only be used / applied when there is actual data. This goes back to my question: is the limit reached when a document contains a value for the dimension field, or already when the field is in the mapping? If the mapping alone is enough, will it help to have it in a dynamic template?

For each default dimension defined, I would like us to have a note on why it is a dimension. A lot of thought went into which dimensions to pick, and that reasoning should be persisted and shared.

agithomas · 2023-02-14T13:45:42Z

Can we consider the above-mentioned lists for Service Integration and for Cloud (#5193 (comment))? We should expect new common fields to be added to these lists as we test new scenarios.

We know that, in such cases, explicit dimension field mapping can be done to avoid being blocked.

Is the above list good enough to work towards preparing an RFC in ECS?

agithomas · 2023-02-14T13:49:51Z

> For apps running in k8s, do we need something like k8s.cluster.name or similar?

I think having host.ip and container.id will cover the criteria for uniquely identifying a document. Don't you think so?

ruflin · 2023-02-15T08:57:44Z

> Is the above list good enough to work towards preparing an RFC in ECS?

I assume so, you know best :-)

> I think having host.ip and container.id will cover the criteria for uniquely identifying a document. Don't you think so?

Is a container.id globally unique? I assume the chance of conflicts is low enough for this to work.

agithomas · 2023-02-15T09:39:08Z

> Is a container.id globally unique? I assume the chance of conflicts is low enough for this to work.

I could not find any reference saying that a container.id can repeat within a cluster.

I understand the scenario you are referring to: the one where multiple k8s clusters are provisioned. The cluster name is a good logical segregation in such cases.

Reference: https://cloud.google.com/stackdriver/docs/solutions/gke/observing

It may then be checked:

Are the ways to get the cluster name in GKE / EKS / self-managed installations different or the same?
Is this information currently captured? If yes, which ECS field is currently used?

These are questions that can be put to the owners of the GKE / EKS integrations.

agithomas · 2023-02-15T10:09:58Z

> It may then be checked:
>
> Are the ways to get the cluster name in GKE / EKS / self-managed installations different or the same?
>
> Is this information currently captured? If yes, which ECS field is currently used?
>
> These are questions that can be put to the owners of the GKE / EKS integrations.

My preference would be not to modify the ECS dimension fields based on an application or cluster technology / deployment architecture. Every technology, such as aws.kinesis, GKE, or EKS, may have its own identifier for the unique resource it is monitoring. In aws.kinesis it is aws.dimensions.streamName.

As part of the integration enhancement to use TSDB, it is expected that these unique fields that represent a resource are carefully identified as dimension fields in the integration.

agithomas · 2023-02-18T10:42:47Z

data_stream.namespace is an important one too. Adding it to the confirmed list.

ruflin · 2023-02-20T07:08:02Z

Can you share some details why data_stream.namespace is an important dimension? If this value changes, the data goes into a different index.

agithomas · 2023-02-20T09:05:19Z

> Can you share some details why data_stream.namespace is an important dimension? If this value changes, the data goes into a different index.

As you rightly said, this is unnecessary.

lalit-satapathy · 2023-02-20T10:55:01Z

@agithomas,

Can we summarise below the final list of ECS fields which are dimensions, with a one-line description for each providing the rationale?

felixbarny · 2023-02-20T14:23:16Z

When the dimension limit is removed (elastic/elasticsearch#93564), can we just make any non-metric (keyword?) field a dimension by default? I don't see the value of spending our time on finding out what good dimension fields are. Other TSDBs only support two types of fields: metrics and dimensions. Can we just operate under the same mental model?

lalit-satapathy · 2023-02-21T04:00:51Z

@felixbarny,

It's a good point. In particular, it's not very clear what the difference is between non-dimension meta fields that are keywords and dimension fields. TSDB documents will primarily contain a combination of metric fields, dimension fields and meta fields. From a document query point of view, I assume dimension fields and meta fields behave the same.

We are currently missing specific details linking the number of dimension fields to TSDB size/performance. I am assuming there is a relationship. Hopefully someone from the ES team can provide more details on this.

In the meantime, we can just continue to annotate dimensions as is, since this is the ask for TSDB enablement.

felixbarny · 2023-02-21T06:26:02Z

@martijnvg could you give us some guidance on the impact of having a lot of dimensions, assuming the _tsid is a hash and there are no size restrictions. See also elastic/elasticsearch#93564 (comment).

What if any negative consequences do we need to expect if we declare too many dimensions or when making all non-metric fields a dimension by default?

Note that this is the default in other TSDBs so if there are negative consequences in ES when treating non-metric fields as dimensions by default, I'd be curious to have your thoughts on whether they're tolerable, and if not, what we could do to minimize the impact so that we can work with ES like with any other TSDB.

martijnvg · 2023-02-21T08:15:21Z

@felixbarny I need to think more about this.

> We might end up with a default dynamic mapping where every keyword field or every non-metric field (everything except for counter, gauge, or histogram) is a dimension, in order to support dynamic user-defined metrics.
(from elastic/elasticsearch#93564 (comment))

How are keyword labels modelled in this model?

felixbarny · 2023-02-21T09:37:48Z

Keyword labels would be mapped as a dimension. By default, everything except actual metrics would be a dimension.
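
A minimal sketch of that mental model, for illustration only: every field whose kind is not counter, gauge, or histogram is treated as a dimension. The function name `split_fields`, the kind labels other than the three metric types, and the `example.*` field names are hypothetical and not part of any Elasticsearch API.

```python
# Metric kinds named in the discussion; everything else is a dimension.
METRIC_KINDS = {"counter", "gauge", "histogram"}


def split_fields(field_kinds):
    """Split a {field name: kind} map into (metrics, dimensions).

    Any field whose kind is not a metric kind is classified as a dimension,
    mirroring the "only metrics and dimensions" model.
    """
    metrics = sorted(n for n, k in field_kinds.items() if k in METRIC_KINDS)
    dimensions = sorted(n for n, k in field_kinds.items() if k not in METRIC_KINDS)
    return metrics, dimensions


# Hypothetical fields of one metrics document.
EXAMPLE_FIELDS = {
    "example.requests.total": "counter",
    "example.memory.used": "gauge",
    "host.name": "keyword",
    "labels.team": "keyword",
}

if __name__ == "__main__":
    metrics, dimensions = split_fields(EXAMPLE_FIELDS)
    print("metrics:", metrics)
    print("dimensions:", dimensions)
```

Under this model keyword labels such as `labels.team` land in the dimension set without anyone having to annotate them.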

agithomas · 2023-02-21T10:14:55Z

While we discuss the limitations and the possible future enhancements, I would like to freeze the ECS fields which must be marked as dimension fields.

host.ip
service.address
agent.id
cloud.project.id
cloud.instance.id
cloud.provider
container.id

ruflin · 2023-02-22T12:30:01Z

Let's separate immediate changes from future plans. TSDB is to be released soonish and we want to adopt it in integrations to also make sure it all works as expected. This is where we need the list from @agithomas. These are all ECS fields, and if we add them to ECS, all integrations will have these dimensions by default as soon as ECS is updated. Everything using ECS will have these fields annotated as dimensions from then on, but as long as TSDB is not enabled, it won't have any effect. @agithomas List LGTM

Then there is the mid term and long term and I agree with @felixbarny , ideally we should not have to think about dimensions at all but this will likely not happen immediately. I suggest to keep the "no dimension" discussion in the Elasticsearch issue.

agithomas · 2023-03-08T13:25:41Z

> host.ip
> service.address
> agent.id
> cloud.project.id
> cloud.instance.id
> cloud.provider
> container.id

Based on the recommendations, host.ip will be replaced by host.name.

The new list will be

host.name
service.address
agent.id
cloud.project.id
cloud.instance.id
cloud.provider
container.id
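
As an illustration only, the list above could be turned into Elasticsearch mapping properties flagged with the `time_series_dimension` mapping parameter. This sketch assumes every field is mapped as `keyword`; the helper name `dimension_mappings` is hypothetical, and the output is not the actual ECS or integration mapping.

```python
import json

# Dimension candidates from the list above.
DIMENSION_FIELDS = [
    "host.name",
    "service.address",
    "agent.id",
    "cloud.project.id",
    "cloud.instance.id",
    "cloud.provider",
    "container.id",
]


def dimension_mappings(fields):
    """Build a nested `properties` mapping that flags each field as a dimension."""
    root = {}
    for field in fields:
        node = root
        parts = field.split(".")
        # Walk (or create) the object levels, e.g. cloud -> instance.
        for part in parts[:-1]:
            node = node.setdefault(part, {"properties": {}})["properties"]
        # Assumption: every dimension is a keyword field.
        node[parts[-1]] = {"type": "keyword", "time_series_dimension": True}
    return {"properties": root}


if __name__ == "__main__":
    print(json.dumps(dimension_mappings(DIMENSION_FIELDS), indent=2))
```

As noted earlier in the thread, such annotations have no effect until TSDB is enabled for the data stream.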

agithomas · 2023-03-23T08:56:34Z

@lalit-satapathy, can you please help by approving, if there are no further queries?

lalit-satapathy · 2023-03-23T23:53:41Z

@agithomas,

Let's update the TSDB migration document to change from host.ip to host.name.

tetianakravchenko · 2023-05-05T11:05:11Z

as discussed with @agithomas:

It is recommended to include all those fields below to be on the safe side.
But package developers can choose to pick a subset out of the recommended list after analyzing possible impacts.
For now this list is not enforced and not set as a default dimensions list. In the future it might be changed on the ECS side.

For packages that can be deployed on cloud/on-prem/k8s (examples - MongoDB, Nginx)

| Field name | Explanation/reasoning |
| --- | --- |
| host.name | The host where the agent is running. Mainly used when the integration is installed on-prem. Note: host.id is not used since it might not be unique. |
| service.address | Present when a concrete scrape target is provided, such as the IP:PORT of a service (for example MongoDB). In some cases this field might not be needed. |
| container.id | For now mainly used by the docker/containerd/kubernetes packages; the field is empty for other integrations. Here container.id is the ID of the monitored container. |
| cloud.account.id (new) | ID used to identify different entities in a multi-tenant environment. |
| cloud.provider | Guards against the small chance that the same account.id exists under different providers. |
| cloud.region (new) | For services that are region specific. |
| cloud.availability_zone (new) | For services that are zone specific. |
| cloud.instance.id | host.name (which can be defined manually by the customer) is not unique enough. For Azure, instance.id is globally unique; for AWS it is unique per region; for GCP, per availability zone. Technically, for Azure it would be enough to define cloud.instance.id only, but to keep the list unified all fields are included: cloud.region/zone. |
| agent.id | For cases when 2 data shippers are monitoring the same resource. |
| cloud.project.id (deleted) | |

For Cloud-only Integration Packages / Managed Services Packages ( examples - AWS S3 )

| Field name | Required | Explanation/reasoning |
| --- | --- | --- |
| cloud.account.id | Required | Cloud ID used to identify different entities in a multi-tenant environment. |
| cloud.region | Required | For managed services that are region specific. |
| cloud.availability_zone | Required | For managed services that are zone specific. |
| cloud.provider | Not needed | Package-specific fields already cover it. |
| agent.id | Required | For cases when 2 data shippers are monitoring the same resource. |
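
The two recommendation lists above can be captured as data, which makes it easy for a package developer to start from the full list and drop fields after analysis. This is a rough sketch: the names `recommended_dimensions` and `field_entries` are hypothetical, and the entry shape (`external: ecs`, `dimension: true`) is an assumption about how a package field definition would reference an ECS field and flag it as a dimension.

```python
# Recommended dimensions for packages that can run on cloud, on-prem, or k8s.
GENERAL_DIMENSIONS = [
    "host.name",
    "service.address",
    "container.id",
    "cloud.account.id",
    "cloud.provider",
    "cloud.region",
    "cloud.availability_zone",
    "cloud.instance.id",
    "agent.id",
]

# Recommended dimensions for cloud-only / managed-service packages.
# cloud.provider is left out: package-specific fields already cover it.
CLOUD_ONLY_DIMENSIONS = [
    "cloud.account.id",
    "cloud.region",
    "cloud.availability_zone",
    "agent.id",
]


def recommended_dimensions(cloud_only, exclude=()):
    """Return the recommended list, minus any fields the developer opts out of."""
    base = CLOUD_ONLY_DIMENSIONS if cloud_only else GENERAL_DIMENSIONS
    skipped = set(exclude)
    return [name for name in base if name not in skipped]


def field_entries(names):
    """Render field definitions (assumed shape) for the chosen dimensions."""
    return [{"name": n, "external": "ecs", "dimension": True} for n in names]


if __name__ == "__main__":
    # Example: a package with no scrape target drops service.address.
    chosen = recommended_dimensions(cloud_only=False, exclude=["service.address"])
    for entry in field_entries(chosen):
        print(entry)
```

Since the list is a recommendation and not enforced, the `exclude` argument is where a package-specific decision would be recorded.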

agithomas · 2023-05-06T03:45:24Z

The above recommended list is based on the understanding we presently have of

dimensions in managed services running in the cloud,
dimensions a service / product needs when running on-prem or in a public cloud, in a monolithic or microservice manner,
the fields available in ECS.

The above list may need to be revisited when more fields are added to ECS and used. For example: details of the subnet (for on-prem infrastructure).

The above list will be used to prepare Stage 1 of the RFC, following Stage 0.

@ruflin , @felixbarny , @martijnvg

Kindly help by reviewing the new list mentioned here

ruflin · 2023-05-08T07:20:14Z

I stumbled over the following line.

> agent.id: For cases when 2 data shippers are monitoring the same resource

If 2 agents are monitoring the same resource, shouldn't it be the same time series? Can you provide an example of where this happens? That would likely clarify things.

agithomas · 2023-05-08T08:52:31Z

> If 2 agents are monitoring the same resource, shouldn't it be the same time series?

We can have one policy deployed on any number of agents. This permits two agents to monitor the same resource. This may be done intentionally or accidentally by the customer.

Case 1: If it is intentional, it is important that agent.id is a dimension field so that the data is recorded as separate time series. A valid use case I can think of: a standalone elastic-agent runs on a single node monitoring several infra assets. On discovering a problem related to disk or over-utilisation, the admin chooses to migrate to a different system. As part of the cut-over, during the maintenance window, it is important that the user verifies that the data received from the new agent is consistent. Without agent.id, the data in ES from the new agent will be recorded in a staggered manner.

Case 2: If the agent policy is accidentally installed on more than one agent, is Elasticsearch expected to do the de-duplication by making use of the dimension field constraint (not a feature) of the time series database? We think it may be best that a datastore is a true representation of the data received from the upstream system, in this case the integration packages.
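
To make the two cases concrete, here is a toy model. It is not Elasticsearch's actual _tsid computation: it simply assumes a series key is a hash over the sorted dimension values and that two documents with the same series key and timestamp cannot coexist. The function names, host-independent sample documents, and agent IDs are hypothetical.

```python
import hashlib


def series_key(doc, dimensions):
    """Toy series key: hash of the sorted dimension name=value pairs."""
    parts = [f"{name}={doc[name]}" for name in sorted(dimensions) if name in doc]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()


def count_collisions(docs, dimensions):
    """Count documents that share a (series key, timestamp) with an earlier one."""
    seen = set()
    collisions = 0
    for doc in docs:
        key = (series_key(doc, dimensions), doc["@timestamp"])
        if key in seen:
            collisions += 1
        seen.add(key)
    return collisions


# Two agents scraping the same service at the same moment (hypothetical values).
DOCS = [
    {"@timestamp": "2023-05-08T08:00:00Z", "service.address": "10.0.0.5:27017", "agent.id": "agent-old"},
    {"@timestamp": "2023-05-08T08:00:00Z", "service.address": "10.0.0.5:27017", "agent.id": "agent-new"},
]

if __name__ == "__main__":
    print("without agent.id:", count_collisions(DOCS, ["service.address"]))
    print("with agent.id:", count_collisions(DOCS, ["service.address", "agent.id"]))
```

In this model the two documents collide when agent.id is left out and form two separate series when it is included, which is the behaviour Case 1 relies on during a cut-over.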

agithomas · 2023-05-08T14:29:04Z

@ruflin, I have described here the reasons why agent.id must be included.

Do you think these use cases and scenarios are valid reasons to include agent.id? Or should we treat them as exceptions and save a few bytes in the _tsid field by removing agent.id?

ruflin · 2023-05-09T07:31:04Z

At the moment, I would rather opt for too many than too few dimensions, so I'm good with the approach.


botelastic · 2024-05-08T07:38:16Z

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

agithomas self-assigned this Feb 7, 2023

agithomas changed the title ~~[Draft] [Research / Scoping] [ECS] Centralisation of Dimension Fields~~ [Draft] [ECS] [TSDB] Centralisation of Dimension Fields Feb 7, 2023

agithomas changed the title ~~[Draft] [ECS] [TSDB] Centralisation of Dimension Fields~~ [ECS] [TSDB] Centralisation of Dimension Fields Feb 15, 2023

This was referenced Mar 7, 2023

[Meta] Observability TSDB packages migration #5233

Closed

[RFC] Stage 0 - TSDB Dimensions elastic/ecs#2172

Merged

tetianakravchenko mentioned this issue May 8, 2023

[System] Add dimensions to system package metrics data_streams only, except core data_streams #6118

Merged

4 tasks

lalit-satapathy mentioned this issue May 9, 2023

[Nginx] Modify the dimension field mapping to support public cloud deployment #6033

Merged

4 tasks

This was referenced May 31, 2023

[System] [Network] Add dimensions and clean up duplicated fields definition #6405

Merged

[System][Process] Add dimensions metadata; remove duplicated fields #6407

Merged

tetianakravchenko added a commit to tetianakravchenko/integrations that referenced this issue Jun 7, 2023

merge, fix conflicts; add dimensions accordingly to elastic#5193 (com…

60ba5d5

…ment) Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

agithomas mentioned this issue Jun 8, 2023

RFC Stage 1: TSDB Dimensions elastic/ecs#2217

Open

This was referenced Jun 13, 2023

[Istio] Set dimension fields for TSDB migration #6551

Merged

[AWS] Add dimensions to S3 Storage Lens #6583

Merged

[Elasticsearch] Add dimensions fields for TSDB migration #6623

Merged

constanca-m mentioned this issue Jun 28, 2023

[AWS Farget] Set dimension fields #6733

Merged

4 tasks

agithomas mentioned this issue Jul 11, 2023

[Apache Tomcat] Add integration package with memory data stream #6527

Merged

7 tasks

This was referenced Jul 13, 2023

[AWS][API Gateway] Set dimension fields #6950

Merged

[AWS][EMR] Update metric type and set dimensions fields #6964

Merged

constanca-m mentioned this issue Aug 14, 2023

Updated TSDB documentation with additional details #5706

Merged

4 tasks

constanca-m mentioned this issue Aug 22, 2023

[AWS] Add dimensions to EC2 data stream. #7487

Merged

4 tasks

agithomas mentioned this issue Nov 24, 2023

[GCP] Add dimensions for metrics data streams #8314

Merged

16 tasks

gpop63 mentioned this issue Nov 28, 2023

[GCP] Enable TSDB #7555

Open

4 tasks

constanca-m mentioned this issue Dec 12, 2023

[Kubernetes] Reference ECS fields and add agent.id field #8697

Merged

4 tasks

botelastic bot added the Stalled label May 8, 2024
