-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GetMetrics use second pass #2765
Conversation
…messy memory pooling required to keep things from exploding
…xpected test block version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ci is mad at you, but lgtm.
this should improve memory usage on traceql as well?
Right, since it is all the same pipeline any query using |
* [DOC] Update screenshots tracing (#2767) * Change screenshots to use GCP * Remove conflict images * Update grafana/dskit dependency (#2773) * Update grafana/dskit dependency * go mod tidy all the go.mod's * Update dskit again to catch patch fix * Check in all of vendor/ * GetMetrics use second pass (#2765) * traceqlmetris use second pass for more correct output. However super messy memory pooling required to keep things from exploding * Expose parquetquery methods better * comment * Move Release() to the spanset instead of the Fetcher interface, much less churn * cleanup, simplification * cleanup * Small cleanup * Apply same pooling changes to vparquet3. Update benchmark to verify expected test block version * fix test * changelog * Refactor user-configurable overrides API and client; add detailed logging (#2755) * Refactor user-configurable overrides API and client * Refactor user-configurable overrides API and client * Bug smash * Replace weaveworks imports * 🤦 * Address why GCS is faling e2e test * Add patch to e2e test * Sprinkle some more println in tests * Update various OTel dependencies (#2778) * Update various OTel dependencies * Breaking changes, fix compilation issues --------- Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> Co-authored-by: Martin Disibio <martin.disibio@grafana.com>
What this PR does:
There is a subtle behavior issue in the metrics summary api (traceqlmetrics/GetMetrics) that needs to be fixed, but doing so isn't tenable without also addressing the memory inefficiencies in the second pass layer in the parquet encodings.
** Main goal **
The metrics summary api is currently putting the groupBy conditions in the first pass, so this isn't quite working as expected. If you group by
span.foo
(which doesn't exist), it should return a series with nil values (all the spans matching the query that don't have this attribute), but instead it returns nothing. You can think of it as the difference between these two queries:vs:
We need it to operate like the latter. So this updates
traceqlmetrics.GetMetrics
to move the groupBy to the second pass (the same asselect()
).** Memory impact **
Doing that starting hitting a lot of memory inefficiencies in the second pass layer. This isn't normally an issue because searches typically hit a smallish number of spansets (1000 or less) but the metrics summary api will hit millions. There is ~10x increase in memory in
BenchmarkBackendBlockGetMetrics
without these changes.There are two main changes:
parquetquery
to allow external iterators to get and put things back to the pool. TherebatchIterator
andbridgeIterator
need to do this. They were creating a lot of IteratorResults outside the pool.Release
method totraceql.Spanset
so the final user of the data (traceqlmetrics.GetMetrics
) can return the spansets back to the pools. I tried a couple of variations of this and I'm happy with how this ended up. It's optional and things fallback to GC if this is never used (and in fact I didn't update any other traceql areas to use it.)Which issue(s) this PR fixes:
Fixes n/a
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]