Skip to content

Traces & Metrics: Scan Plan #1502

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

sdd
Copy link
Contributor

@sdd sdd commented Jul 12, 2025

Which issue does this PR close?

What changes are included in this PR?

Only commit 2a02e55 is relevant - I branched off #482 which is unmerged and so the commit for that PR shows up in here too for now.

Implements an initial set of traces and metrics on the table scan plan phase.

Approach guided by prior discussions on #482 and on monthly Iceberg-rust community sync discussion a few months ago.

Are these changes tested?

An integration test has been added that also functions as a mini example. Jaeger has been added to the integration test docker compose stack, and the test exports traces to it which can be viewed by browsing to Jaeger. If you're running the integration tests, setting the env var ICEBERG_INTEG_TEST_PERSISTENT_DOCKER_STACK=1 will ensure that the docker stack is kept up between test runs so that you can view traces on the Jaeger container by browsing to http://localhost:16686

Configuration and Example Output

info level traces

Here's an example of how the traces look with RUST_LOG specifying iceberg=info. The aim at this level is for the trace to only contain spans that represent the calls made to the public API.

image

We see just a single span for plan_files. There are attributes on the span for the scan predicate, the field ids, case sensitivity, and the scan's snapshot id.

debug level traces

With RUST_LOG setting iceberg=debug, the trace is more verbose, but not overly so. This is intended as a good default for when you want to see how long the major sub-tasks of a file plan take, as well as some info on file pruning. Here's what the trace at this level looks like when expanded just one level:

image

Expanding one more level, we can see the individual data file manifest spans - useful for seeing how many manifests there are, how many are processed in parallel, and what kind of variance there is between the retrieval and processing times of each.

image

Expanding the detail for one of those manifests, we see attributes detailing the file path of the manifest in question and the number of entries within it.

image

Inspecting the object_cache.get_manifest span shows us that this was a cache miss (kind of obvious given the span duration, but useful when analysing traces in aggregate).

image

Finally, looking at the deepest levels in the debug level trace shows us more detail for a given manifest. We see the time taken to retrieve that manifest, plus individual spans representing each manifest entry / data file. The expanded details for each of these contain the data file path, plus info on the filtering applied. We see here for this file that it was filtered out because its partition did not match the predicate.

image

trace level traces

The most detailed level of trace is enabled by specifying iceberg=trace in RUST_LOG. This may generate traces that are too large for general use but could be useful when activated selectively for troubleshooting specific issues.

At this level we get spans for the eval calls on the expression evaluator and metrics evaluator, showing the time taken to evaluate the predicate. Not usually useful but I've found it occasionally useful when I've received pathologically large queries on very fragmented tables.

image

@sdd sdd force-pushed the scan-plan-instrumentation branch from af106a7 to d4f440a Compare July 19, 2025 08:36
@sdd
Copy link
Contributor Author

sdd commented Jul 19, 2025

Extended to add trace coverage to arrow reader too. Timings similar to the scan, plus per-data-file row selection / filtering stats:

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant