Tempo logs errors relating to multipart uploads with S3 backend #306

Closed
mdisibio opened this issue Nov 2, 2020 · 6 comments · Fixed by #325

Comments

@mdisibio
Contributor

mdisibio commented Nov 2, 2020

Describe the bug
Running the S3 (MinIO) docker-compose example produces errors relating to multipart uploads within a few minutes. The error message is:

msg="error during compaction cycle" err="error shipping block to backend: error completing multipart upload, object: single-tenant/f5337fd6-ba2c-45e6-bf3a-31496290075b/data, obj etag: : Your proposed upload is smaller than the minimum allowed object size."

To Reproduce
Steps to reproduce the behavior:

  1. Run docker-compose -f docker-compose.s3.minio.yaml up -d
  2. Check the Tempo logs after a few minutes: docker logs ...

Expected behavior
There are no errors.

Environment:
OSX, docker-compose, local

Additional Context
It seems that S3 multipart uploads require a minimum part size of 5 MB for every part except the last, documented here: https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html.

@joe-elliott
Member

Note that if you are seeing these issues, a current workaround is to increase these values:

traces_per_block: 100 # cut the head block when it hits this number of traces or ...
max_block_duration: 5m # this much time passes

until you are cutting blocks large enough to be compacted.
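
For reference, a minimal sketch of where those overrides sit, assuming they live under the ingester block as in the example docker-compose configs (the values below are illustrative, not recommendations):

ingester:
  traces_per_block: 10000   # illustrative: cut blocks at a higher trace count
  max_block_duration: 30m   # illustrative: or after a longer window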

@mdisibio
Contributor Author

mdisibio commented Nov 2, 2020

The traces_per_block and max_block_duration settings play into it; however, the root cause seems to be more closely related to the number of traces flushed in each multipart upload here:

recordsPerBatch = 1000

A batch of 1000 records produces parts smaller than 5 MB. Local testing shows that a value of 3000 works reliably.
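
As a rough back-of-the-envelope check (the per-record size is an illustrative assumption, not a measured value): at around 2 KB per flushed record, 1000 records come to roughly 2 MB per uploaded part, below the 5 MB multipart minimum, while 3000 records come to roughly 6 MB, comfortably above it.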

@joe-elliott
Member

Ah, that makes sense. We should really make this configurable. For history, it used to be 10k, but we saw OOMs in our compactors, which led us to reduce the value. Since this could be a sticking point for S3 users, we need to provide a way for them to configure it.

@annanay25
Contributor

Thanks for digging into this @mdisibio!

Is there a way to default to a higher value if we're using the S3 backend? It may be confusing to users if we have this very specific edge case for a config value that depends on the backend.

I've also created #312 to track the addition of an integration test.

joe-elliott pushed a commit that referenced this issue Nov 5, 2020
* Compactor flushes to backend based on buffer size instead of trace count. Expose configuration option. Default to 30MB

* Add compactor settings to docs

* Add entry to changelog
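
The change above switches the compactor from counting traces to flushing once an output buffer reaches a size threshold. A minimal sketch of the resulting configuration, assuming the option is exposed as flush_size_bytes under the compactor's compaction block (see the docs added in the commit for the exact key and default):

compactor:
  compaction:
    flush_size_bytes: 31457280   # ~30 MB; keep well above the 5 MB S3 multipart part minimum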
@joe-elliott
Member

joe-elliott commented Nov 13, 2020

Note that if you are seeing this issue, then you probably have a lot of failed blocks sitting in S3. These blocks are characterized by having only a partial data object and no meta.json or compacted.meta.json. It's quite possible, over time, to build up a lot of these blocks, so it is recommended to use S3 lifecycle rules or manual processes to clean them up.
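
A rough way to spot such blocks manually (a sketch only, assuming the aws CLI, a hypothetical bucket named tempo, and the single-tenant prefix from the log line above):

# Collect all object keys under the tenant prefix
aws s3 ls s3://tempo/single-tenant/ --recursive | awk '{print $4}' > keys.txt
# Block IDs that have any object vs. block IDs that have a meta.json / compacted.meta.json
awk -F/ '{print $2}' keys.txt | sort -u > all_blocks.txt
grep -E 'meta\.json$' keys.txt | awk -F/ '{print $2}' | sort -u > complete_blocks.txt
# Blocks missing a meta object; blocks still being written will also appear here briefly
comm -23 all_blocks.txt complete_blocks.txt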

We are discussing ways to alleviate this issue generally here: #59

Thanks to @chancez for bringing this to our attention.

@chancez
Contributor

chancez commented Nov 13, 2020

To be clear, I was seeing failed multipart uploads in S3, which take up space but are not visible as objects. You can see the space a bucket consumes in CloudWatch. Aborting them clears the space; this can also be done with an S3 lifecycle rule that aborts incomplete multipart uploads.
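
For completeness, here is what that looks like with the aws CLI (a sketch; the bucket name tempo and the 7-day window are placeholders):

# Show incomplete multipart uploads that are invisible to a normal object listing
aws s3api list-multipart-uploads --bucket tempo

# Lifecycle rule that aborts incomplete multipart uploads after 7 days
aws s3api put-bucket-lifecycle-configuration --bucket tempo \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "abort-incomplete-multipart",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
    }]
  }'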
