Fix closed writer panic by resetting buffers/writer correctly on error path #1379

Merged · 4 commits into grafana:main · Apr 19, 2022

Conversation

@annanay25 (Contributor) commented Apr 13, 2022

Signed-off-by: Annanay <annanayagarwal@gmail.com>

What this PR does:
Refer to a longer explanation in this comment: #1374 (comment)

There is a small section of code in our data writer where the compression writer is closed but not reset. If there are any errors, such as failures writing to disk or transient errors with Kubernetes PVCs, the function could return without resetting the compression writer. This PR moves the reset logic into a defer func so that it runs whenever the function returns.

I understand that working with defer funcs in this manner makes the logic trickier to follow, so I'm open to alternative implementations.

Edit: Offline discussions confirmed the fears mentioned above :), so I went ahead with a different strategy.
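A minimal Go sketch of the defer-based idea described above, assuming hypothetical names (`pageWriter`, `cutPage`, `newCompressor`); the real Tempo data writer differs, and the merged change ultimately took a different route than the defer shown here:

```go
// Hypothetical sketch, not Tempo's actual code: pages are compressed into an
// in-memory buffer and then flushed to the backing writer (disk).
package writer

import (
	"bytes"
	"fmt"
	"io"
)

type pageWriter struct {
	buf           *bytes.Buffer
	compressor    io.WriteCloser                 // wraps buf; e.g. a snappy/zstd writer
	newCompressor func(io.Writer) io.WriteCloser // rebuilds the compression writer
	out           io.Writer                      // the on-disk destination
}

// cutPage closes the compression writer and flushes the compressed page. The
// reset of the buffer and the compression writer sits in a defer so that it
// runs on every return path, including the error return after a failed write.
func (p *pageWriter) cutPage() (int, error) {
	defer func() {
		p.buf.Reset()
		p.compressor = p.newCompressor(p.buf)
	}()

	if err := p.compressor.Close(); err != nil {
		return 0, fmt.Errorf("error closing compression writer: %w", err)
	}

	n, err := p.out.Write(p.buf.Bytes())
	if err != nil {
		// Before the fix, returning here skipped the reset, leaving a closed
		// compressor and a dirty buffer for the next page.
		return 0, fmt.Errorf("error writing page to disk: %w", err)
	}
	return n, nil
}
```

With the reset registered in a defer before any early return, both the buffer and the compression writer are rebuilt whether the flush succeeds or fails, so a later page never writes into a closed compressor.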

cc @jvilhuber

Which issue(s) this PR fixes:
Fixes the red herring error message in #1374. But we still need to fix the fact that we drop traces without warning when there are issues writing to disk.

Checklist

  • N/A Tests updated
  • N/A Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Annanay <annanayagarwal@gmail.com>
@joe-elliott (Member) commented:

Is there an actual panic here? Are we just looking to guarantee we do all cleanup work even if marshalPageToWriter fails?

@annanay25 (Contributor, Author) commented:

Nope, there's no panic. We just want to make sure ingesters function normally after a transient failure like a full disk.

@joe-elliott (Member) commented Apr 13, 2022

Frankly, if the disk is full it might make sense to actually panic. This is a pretty dire failure mode for Tempo, as ingested data will be lost.

With this change how will the ingester behave if the disk is persistently full?

@annanay25 (Contributor, Author) commented Apr 15, 2022

Yes, I agree. If the disk is persistently full, the ingester will keep trying to write but fail on every attempt until the disk is replaced. I'm not sure whether it's worth bubbling that error up and failing on the write path, in which case it would trigger an ingester write alert. As of now, though, the failure to write to disk goes undetected unless someone checks the logs after noticing that no more blocks are being created.

Does it then make sense to panic if there's an error in these lines:
https://github.com/grafana/tempo/pull/1379/files#diff-5925637787bab0225dab0a4ab2bcf81bfdd3a6c841d051759b11eae011e37067R75-R78

I believe that should do the trick, WDYT?
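A hedged sketch of the alternative floated here, reusing the hypothetical `pageWriter` from the earlier sketch; the helper name and message are illustrative and are not taken from the linked diff:

```go
// flushPage is a hypothetical wrapper around cutPage from the earlier sketch;
// it illustrates failing loudly when a page cannot be written out.
func (p *pageWriter) flushPage() int {
	n, err := p.cutPage()
	if err != nil {
		// A full or failing disk means ingested data is being lost, so panicking
		// (or bubbling the error up into the write path) makes the failure
		// visible to alerting instead of leaving it buried in the logs.
		panic(fmt.Sprintf("failed to flush page to disk: %v", err))
	}
	return n
}
```

Whether to panic or to return the error into the write path is exactly the trade-off discussed above; the panic simply makes the failure impossible to miss.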

Signed-off-by: Annanay <annanayagarwal@gmail.com>
Signed-off-by: Annanay <annanayagarwal@gmail.com>
Signed-off-by: Annanay <annanayagarwal@gmail.com>
@annanay25 annanay25 changed the title Fix closed writer panic by resetting buffers/writer in a defer func Fix closed writer panic by resetting buffers/writer correctly on error path Apr 19, 2022
@joe-elliott (Member) left a review comment:

Thanks for the updates! Looks good.

@annanay25 annanay25 merged commit 214ceac into grafana:main Apr 19, 2022