
Compactors require high memory as traces combine and grow in the backend #976

Closed
annanay25 opened this issue Sep 22, 2021 · 1 comment · Fixed by #1317
@annanay25 (Contributor)

Describe the bug

Compactors require high memory as traces combine and grow in the backend. This opens the possibility of crafting a long-running trace with a low spans-per-second rate that keeps growing in the backend and eventually OOMs compactors.

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo (SHA or version): all versions up to e5f7ded
  2. Perform Operations (Read/Write/Others): Carefully craft super-long-running traces with a few spans every second; the compactors will eventually combine them into a MEGA trace (the largest we are seeing so far is 1.3GB)

Expected behavior

Compactors do not keep OOMing.

Environment:

  • Infrastructure: [e.g., Kubernetes, bare-metal, laptop]
  • Deployment tool: [e.g., helm, jsonnet]

Additional Context

Some possibilities considered (a rough size-cap sketch follows this list):

  • Write multiple splits of a trace into the same block (might be harder than it sounds)
  • Do not compact blocks that contain very large traces
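
A minimal sketch of how a size cap could be applied while combining trace segments. All names (combineWithLimit, maxBytesPerTrace) and the 50 MiB value are assumptions made for illustration, not Tempo's actual code or configuration:

```go
package main

import "fmt"

// maxBytesPerTrace is an assumed, illustrative cap on the combined trace size.
const maxBytesPerTrace = 50 * 1024 * 1024 // 50 MiB

// combineWithLimit appends trace segments in order until adding the next one
// would exceed the cap, then returns what was combined plus the leftover
// segments so they could be written to the block as separate splits.
func combineWithLimit(segments [][]byte, limit int) (combined []byte, leftover [][]byte) {
	for i, seg := range segments {
		if len(combined)+len(seg) > limit {
			return combined, segments[i:]
		}
		combined = append(combined, seg...)
	}
	return combined, nil
}

func main() {
	// Three segments of one trace: 30 MiB + 30 MiB + 10 MiB.
	segs := [][]byte{make([]byte, 30<<20), make([]byte, 30<<20), make([]byte, 10<<20)}
	combined, leftover := combineWithLimit(segs, maxBytesPerTrace)
	fmt.Printf("combined %d bytes, %d segment(s) left as separate splits\n", len(combined), len(leftover))
}
```

The leftover segments would then stay as separate splits, which is essentially the first bullet above combined with a cap so the compactor never has to hold the full MEGA trace in memory.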
@mdisibio (Contributor)

> a MEGA trace (the largest we are seeing so far is 1.3GB)

For both possibilities listed, a trace of this size would still be trouble for the queriers: even if the compactor writes multiple splits of the trace (or skips the block entirely), the querier is still expected to recombine all of the segments. A possible solution on the querier side is to limit the amount of data returned in a single call and add a new paged API to retrieve all the splits (a rough sketch of that idea is at the end of this comment). As a quick starting point for the per-call limit, 100MiB? Even a 100MiB trace is quite large and hard to work with.

Also, I propose to call any trace over 1GB a GIGA trace :)
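
A minimal sketch of the paged-API idea, assuming a hypothetical TracePage/fetchTracePage shape and the 100MiB per-call limit mentioned above. None of these names exist in Tempo; each call returns at most the byte cap worth of spans plus a continuation token for the next page:

```go
package main

import "fmt"

// pageSizeLimit is the 100 MiB per-call cap floated above; the value and all
// names here (TracePage, fetchTracePage) are illustrative, not a Tempo API.
const pageSizeLimit = 100 << 20

// TracePage is one size-bounded slice of a very large trace.
type TracePage struct {
	Spans     [][]byte // serialized spans returned in this call
	NextToken int      // index of the first span of the next page; -1 when done
}

// fetchTracePage returns at most pageSizeLimit bytes of spans starting at token,
// always returning at least one span so callers make progress.
func fetchTracePage(allSpans [][]byte, token int) TracePage {
	page := TracePage{NextToken: -1}
	size := 0
	for i := token; i < len(allSpans); i++ {
		if size+len(allSpans[i]) > pageSizeLimit && len(page.Spans) > 0 {
			page.NextToken = i
			break
		}
		page.Spans = append(page.Spans, allSpans[i])
		size += len(allSpans[i])
	}
	return page
}

func main() {
	// A 180 MiB trace split into three 60 MiB chunks of spans.
	spans := [][]byte{make([]byte, 60<<20), make([]byte, 60<<20), make([]byte, 60<<20)}
	for token := 0; token != -1; {
		page := fetchTracePage(spans, token)
		fmt.Printf("page with %d span(s), next token %d\n", len(page.Spans), page.NextToken)
		token = page.NextToken
	}
}
```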
