Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chore][otel] Enable persistence in otel.yml #5549

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

VihasMakwana
Copy link
Contributor

@VihasMakwana VihasMakwana commented Sep 17, 2024

  • Enhancement
  • Docs

What does this PR do?

This PR adds the filestorage extension in otel.yaml and updates the documentation.
Updates otel dependencies to 0.110.0

Why is it important?

By default, the OpenTelemetry Collector operates in a stateless mode, meaning it does not save file offsets to disk. To resolve this, we enable persistence in the default configuration files.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

  • Build from this branch
  • Keep the config default and run elastic-agent otel --config otel.yaml
  • Ingest some data and stop the process.
  • Restart the process and ingest some more data.
  • You shouldn't see any data loss/duplication.

Related issues

@VihasMakwana VihasMakwana added documentation Improvements or additions to documentation enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team opentelemetry Related to the Elastic Distribution of the OpenTelemetry Collector labels Sep 17, 2024
@VihasMakwana VihasMakwana self-assigned this Sep 17, 2024
Copy link
Contributor

mergify bot commented Sep 17, 2024

This pull request does not have a backport label. Could you fix it @VihasMakwana? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Sep 17, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 17, 2024
@VihasMakwana VihasMakwana marked this pull request as ready for review September 18, 2024 10:19
@VihasMakwana VihasMakwana requested a review from a team as a code owner September 18, 2024 10:19
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@VihasMakwana
Copy link
Contributor Author

VihasMakwana commented Sep 18, 2024

@elastic/elastic-agent-control-plane

I propose two options for default directory:

  1. Set it as per current sample configurations and modify them during the onboarding process. Currently, the sample configs use STORAGE_DIR env as the default directory option
  2. Follow Filebeat's directory layout and set the default to a path under path.data.

For the latter option, we need to implement a config converter which maps path.data to /opt/Elastic/Agent/data (in case of Linux).

Let me know how do you feel about this proposal.

cc: @ycombinator @pierrehilbert @andrzej-stencel

@VihasMakwana
Copy link
Contributor Author

Introducing a new config converter would be straightforward and could be easily integrated into our distribution.

@cmacknz
Copy link
Member

cmacknz commented Sep 19, 2024

As discussed let's leave this as an environment variable. How the collector is run and installed (if it is installed as a service at all) is currently completely up to the user and this goes along with that.

The only place we likely want a default is in our reference Kubernetes configurations, which can set the environment variable or set a default in the configuration it includes directly.

In general we do not want to use any of the paths that are also used by Beats or Agent, because a standalone OTel collector must be able to run alongside them without issue.

In our OTel distribution I think we solve this by having an install command that puts this in whatever base install directory we pick (/opt/Elastic, C:\Program Files\Elastic, or /Library/Elastic) but we need an installer that runs as a privileged user to do that. You can't do it with config alone.

Copy link
Contributor

mergify bot commented Sep 27, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b enabl-file-storage-update-docs upstream/enabl-file-storage-update-docs
git merge upstream/main
git push upstream enabl-file-storage-update-docs

otel.yml Outdated Show resolved Hide resolved
internal/pkg/otel/README.md Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
Comment on lines +28 to +30
file_storage/filelogreceiver:
create_directory: true
directory: ${env:STORAGE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of the File Storage extension may cause problems for users using this configuration with read-only filesystems, like in Docker, in Kubernetes, unless they configure a writable storage layer in their containers/pods. Should we be concerned about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of the File Storage extension may cause problems for users using this configuration with read-only filesystems, like in Docker, in Kubernetes, unless they configure a writable storage layer in their containers/pods. Should we be concerned about this?

I believe we will have a separate recommendation in our k8s configs based on @cmacknz's comment.

Copy link
Contributor

@andrzej-stencel andrzej-stencel Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks running Elatic Agent as OTel Collector in Docker. I'd rather have this change applied in a way that doesn't break running the EDOT Collector in Docker. If we want to enable persistence by default, it should "just work", like it "just works" with the regular Elastic Agent.

$ docker run --rm -ti --entrypoint="elastic-agent" docker.elastic.co/beats/elastic-agent:9.0.0-SNAPSHOT otel
Starting in otel mode
2024-09-30T09:06:39.260Z	info	service@v0.110.0/service.go:137	Setting up own telemetry...
2024-09-30T09:06:39.261Z	info	service@v0.110.0/service.go:186	Skipped telemetry setup.
2024-09-30T09:06:39.261Z	info	builders/builders.go:26	Development component. May change in the future.	{"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-09-30T09:06:39.262Z	info	builders/extension.go:50	Development component. May change in the future.	{"kind": "extension", "name": "memory_limiter"}
2024-09-30T09:06:39.262Z	info	memorylimiter/memorylimiter.go:75	Memory limiter configured	{"kind": "extension", "name": "memory_limiter", "limit_mib": 700, "spike_limit_mib": 180, "check_interval": 1}
2024-09-30T09:06:39.252Z	warn	envprovider@v1.16.0/provider.go:59	Configuration references unset environment variable	{"name": "STORAGE_DIR"}
failed to build extensions: failed to create extension "file_storage/filelogreceiver": mkdir /var/lib/otelcol: permission denied

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrzej-stencel I see what you mean. The error is due to an empty env:STORAGE_DIR and hence, the filestorage extension defaults to /var/lib/otelcol. Since the default user in the Docker container is elastic-agent, this causes a permission denied error.

If I use any of following commands, it works.

docker run --rm -ti --entrypoint="elastic-agent" --mount type=bind,source=$(pwd)/data,target=/usr/share/otel -e STORAGE_DIR=/usr/share/otel/  docker.elastic.co/beats/elastic-agent:9.0.0-SNAPSHOT otel

or

docker run --user root --rm -ti --entrypoint="elastic-agent" docker.elastic.co/beats/elastic-agent:9.0.0-SNAPSHOT otel

I wonder how should we approach this. Maybe add documentation? or update EDOT according to @cmacknz's comment. Or do both.

In our OTel distribution I think we solve this by having an install command that puts this in whatever base install directory we pick (/opt/Elastic, C:\Program Files\Elastic, or /Library/Elastic) but we need an installer that runs as a privileged user to do that. You can't do it with config alone.

@cmacknz @andrzej-stencel I'd appreciate your opinions over this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that is should "just work".

Copy link
Member

@cmacknz cmacknz Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the container we control the paths, and can pick a default that is writable by our user in the container. Elastic agent uses the STATE_PATH environment variable and puts it under /usr/share/elastic-agent.

agentBaseDirectory = "/usr/share/elastic-agent" // directory that holds all elastic-agent related files
defaultStateDirectory = agentBaseDirectory + "/state" // directory that will hold the state data

We could add an /otelcol sub-directory to /usr/share/elastic-agent for one possibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly we can reuse the STATE_PATH variable and change which sub-directory we use based on whether the entrypoint is the otel command or the elastic-agent container command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

Copy link
Contributor

mergify bot commented Oct 1, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b enabl-file-storage-update-docs upstream/enabl-file-storage-update-docs
git merge upstream/main
git push upstream enabl-file-storage-update-docs

@VihasMakwana
Copy link
Contributor Author

@andrzej-stencel @cmacknz PTAL!

// The filestorage extension will handle directory creation since create_directory: true is set by default.
// If the user hasn’t specified the env:STORAGE_DIR in filestorage, they may have opted for a custom path, and the extension will create the directory accordingly.
// In this case, setting env:STORAGE_DIR will have no effect.
if err := os.Setenv("STORAGE_DIR", filepath.Join(paths.Top(), "otel_registry")); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this to avoid introducing a breaking change, as we expect the collector to function smoothly without issues.

Copy link
Contributor

mergify bot commented Oct 7, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b enabl-file-storage-update-docs upstream/enabl-file-storage-update-docs
git merge upstream/main
git push upstream enabl-file-storage-update-docs

@VihasMakwana
Copy link
Contributor Author

@andrzej-stencel @cmacknz PTAL 😅

Copy link

Quality Gate failed Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify documentation Improvements or additions to documentation enhancement New feature or request opentelemetry Related to the Elastic Distribution of the OpenTelemetry Collector Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants