Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close WAL block file before deleting everything #2152

Merged
merged 8 commits into from
Mar 8, 2023

Conversation

kostya9
Copy link
Contributor

@kostya9 kostya9 commented Feb 27, 2023

What this PR does: Close WAL block file before deleting everything

Also added some other files closing that I found during a look around the code.

PS: I not proficient in Go, so I could have misunderstood, so please check if it looks good

Which issue(s) this PR fixes:
Attempt to fix #2034
Fixes #2153
Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@CLAassistant
Copy link

CLAassistant commented Feb 27, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for taking a look at this 👍

tempodb/backend/local/local.go Show resolved Hide resolved
tempodb/encoding/vparquet/wal_block.go Outdated Show resolved Hide resolved
@@ -460,6 +463,8 @@ func (b *walBlock) FindTraceByID(ctx context.Context, id common.ID, _ common.Sea
}

r := parquet.NewReader(file)
defer r.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also defer close the file above? and all of the files returned from page.file() calls?

Copy link
Contributor Author

@kostya9 kostya9 Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parquet.File unfortunately does not implement a Close function, so we can not close it. I noticed that when I was figuring out what is going on and logged an issue - #2153

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i forgot this was a parquet.File and not an os.File. wdyt about returning a close() method from file() to close the underlying os file?

Copy link
Contributor Author

@kostya9 kostya9 Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds interesting.

What do you think of wrapping the parquet.File with another type that implements Close()? I feel like maybe that would be a more intuitive design. Also, it could later be used in other call sites theoretically (which would probably also need to close the file at some point).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so like:

type pageFile struct {
  parquet.File
  os.File
}

then the caller can interact with the parquet.file and eventually close the os.file? i'm good on that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added closing the file opened in file(). It resulted in a little bit more changes that I anticipated, but we should close everything opened in the wal_block module now

Copy link
Contributor Author

@kostya9 kostya9 Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, did all that work, but didn't call the final iterator.Close(). Fixed that

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies. i had a pending review that i did not submit from a couple days back.

tempodb/backend/local/local.go Show resolved Hide resolved
@@ -460,6 +463,8 @@ func (b *walBlock) FindTraceByID(ctx context.Context, id common.ID, _ common.Sea
}

r := parquet.NewReader(file)
defer r.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i forgot this was a parquet.File and not an os.File. wdyt about returning a close() method from file() to close the underlying os file?

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mapno I'm good on this, but before merging can you take a quick look? this does a lot of subtle changes with regard to the wal and searching. i'm concerned we may see some strange issues crop up b/c we relied on these files being opened, but i think explicitly closing these files is the right direction.

don't feel the need to do a line by line review (unless you want). mainly flagging you due to the changes along the SearchTags and TagValues paths.

@kostya9 thanks for the persistence and level of detail on this! after @mapno takes a look we will merge.

CHANGELOG.md Outdated Show resolved Hide resolved
…MetadataIterator (wrap it instead of spansetIterator)
Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This change touches a lot of code, so it's difficult to follow all the close() calls. Tests should cover most edge cases. Thanks!

@mapno mapno enabled auto-merge (squash) March 8, 2023 11:24
@kostya9
Copy link
Contributor Author

kostya9 commented Mar 8, 2023

I see failures in the generator storage tests, from my analysis, the generator doesn't use the code I changed here. Are these tests flaky, or is there a chance something is broken?

@joe-elliott
Copy link
Member

I see failures in the generator storage tests, from my analysis, the generator doesn't use the code I changed here. Are these tests flaky, or is there a chance something is broken?

Yeah. It's a flakey test. Will rerun.

@mapno mapno merged commit 40cc4c5 into grafana:main Mar 8, 2023
@joe-elliott
Copy link
Member

Thanks for the fixes @kostya9!

@kostya9 kostya9 deleted the close_files_after_opening branch March 8, 2023 15:29
mdisibio pushed a commit to mdisibio/tempo that referenced this pull request Apr 18, 2023
* WAL_BLOCK should close the file it opened before trying to remove stuff from the directory

Close other files after opening.

* Add changelog row

* Add nullcheck

* Add one more missed close

* Aggregate errors for wal.Clear

* Close file opened by WAL's file() by wrapping the parquet.File with os.File

* Add the acual close for the iterator

* Fix changelog entry position. Fix types after introduction of spansetMetadataIterator (wrap it instead of spansetIterator)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

walBlockFlush opened parquet files are never closed Tempo Ingester Flushes Failing
4 participants