Search v0.1.0 (ingester only) #806

Merged
merged 118 commits into from
Sep 1, 2021

@mdisibio mdisibio commented Jul 9, 2021

What this PR does:
This PR adds basic trace search to the ingesters. The approach should be reasonably fast and powerful enough for a first pass. Flatbuffer-encoded metadata for each trace is stored in a new file alongside the block data. When searching, only the flatbuffer metadata is evaluated, and matching traces are returned; the OTLP/protobuf block data is not involved at all. A new search API is exposed in the query-frontend, querier, and ingester. It can be called directly (by an experimental Grafana UI that is in the works) or via tempo-query, which is also updated to translate the Jaeger search conditions. There are also APIs to look up attribute names and values (for autocomplete and Jaeger dropdowns).

Flatbuffers are quite fast: this basic implementation can search 3+ GB/s of files, based on the benchmark in /tempodb/search/. With additional tuning I believe we could increase this further, but it is already enough to saturate most disk and network I/O, so a better direction might be to look at indexing or compression.

The functionality must be enabled by setting search_enabled: true at the root of the YAML config for the query-frontend, querier, and distributor. This registers the search APIs and causes the distributor to start capturing search data from traces as they are received. The ingester is backwards compatible and works whether or not search data is present.

Query Details
This approach is tag-oriented. The search metadata effectively flattens a trace down to all unique key/value pairs for span and resource attributes, plus the min/max start and end times. All attributes are coerced to strings. This first version is therefore quite basic: it can answer questions about hits anywhere within a trace, or about the overall trace duration, but it cannot satisfy complex searches on individual spans other than the root.
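To make the flattening concrete, here is a minimal sketch in Go. It does not use the PR's flatbuffer types; `Span`, `SearchEntry`, `Flatten`, and `Values` are hypothetical names invented for illustration, and the only behavior taken from the description is: collect unique key/value pairs from span and resource attributes, coerce values to strings, and track the earliest start and latest end.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Span is a simplified, hypothetical stand-in for an OTLP span.
type Span struct {
	StartNanos uint64
	EndNanos   uint64
	Resource   map[string]interface{} // resource attributes
	Attrs      map[string]interface{} // span attributes
}

// SearchEntry is the flattened, tag-oriented view of one trace.
type SearchEntry struct {
	Tags       map[string]map[string]struct{} // key -> set of unique string values
	StartNanos uint64                         // earliest span start
	EndNanos   uint64                         // latest span end
}

// toString coerces an attribute value to a string.
func toString(v interface{}) string {
	switch x := v.(type) {
	case string:
		return x
	case int:
		return strconv.Itoa(x)
	case bool:
		return strconv.FormatBool(x)
	default:
		return fmt.Sprint(x)
	}
}

// Flatten reduces all spans of a trace to unique key/value pairs
// plus the overall time range.
func Flatten(spans []Span) SearchEntry {
	e := SearchEntry{Tags: map[string]map[string]struct{}{}}
	add := func(attrs map[string]interface{}) {
		for k, v := range attrs {
			if e.Tags[k] == nil {
				e.Tags[k] = map[string]struct{}{}
			}
			e.Tags[k][toString(v)] = struct{}{}
		}
	}
	for i, s := range spans {
		add(s.Resource)
		add(s.Attrs)
		if i == 0 || s.StartNanos < e.StartNanos {
			e.StartNanos = s.StartNanos
		}
		if s.EndNanos > e.EndNanos {
			e.EndNanos = s.EndNanos
		}
	}
	return e
}

// Values returns the unique values recorded for key, sorted.
func (e SearchEntry) Values(key string) []string {
	out := make([]string, 0, len(e.Tags[key]))
	for v := range e.Tags[key] {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	e := Flatten([]Span{
		{StartNanos: 10, EndNanos: 50,
			Resource: map[string]interface{}{"service.name": "myservice"},
			Attrs:    map[string]interface{}{"http.status_code": 500}},
		{StartNanos: 5, EndNanos: 30,
			Resource: map[string]interface{}{"service.name": "myservice"},
			Attrs:    map[string]interface{}{"http.status_code": 200}},
	})
	fmt.Println(e.Values("http.status_code"), e.EndNanos-e.StartNanos)
}
```

Note that the flattened entry no longer records which span a value came from, which is why per-span questions cannot be answered from this data.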

Examples:

  • service.name="myservice" // Find traces where any span had resource attribute service.name="myservice"
  • root.service.name = "myservice" // Find traces where the root span had resource attribute service.name="myservice"
  • http.status_code="500" // Find traces where any span had attribute http.status_code=500

Combining conditions matches multiple hits anywhere in the trace:

  • http.url="/api/thing" http.status_code="500" // Find traces where there exists both of these values, but not necessarily on the same span.
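A rough sketch of these matching semantics, assuming the trace has already been flattened to a key-to-values set. `TraceTags`, `Condition`, and `Matches` are hypothetical names, not the PR's API. Every condition must hit somewhere in the trace, but nothing ties two conditions to the same span.

```go
package main

import "fmt"

// TraceTags maps each attribute key to the set of values seen
// anywhere in the trace (span or resource attributes).
type TraceTags map[string]map[string]bool

// Condition is a single key="value" search term.
type Condition struct {
	Key, Value string
}

// Matches reports whether every condition is satisfied somewhere in
// the trace. Indexing a missing key yields a nil map, and reading a
// nil map returns false, so absent keys simply fail to match.
func Matches(tags TraceTags, conds []Condition) bool {
	for _, c := range conds {
		if !tags[c.Key][c.Value] {
			return false
		}
	}
	return true
}

func main() {
	// http.url came from one span and http.status_code from another;
	// after flattening, that distinction is gone.
	tags := TraceTags{
		"http.url":         {"/api/thing": true},
		"http.status_code": {"200": true, "500": true},
	}
	conds := []Condition{
		{Key: "http.url", Value: "/api/thing"},
		{Key: "http.status_code", Value: "500"},
	}
	fmt.Println(Matches(tags, conds))
}
```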

Implementation Details
Search data is extracted in the distributor, since this is the only location where the trace is available in "cleartext". Flatbuffer metadata is built there and sent as byte slices from distributor to ingester. The ingester stores this data alongside live traces; it is eventually flushed to the WAL (separate files in /wal) and then completed to the local backend (new files in /wal/blocks///search). When searching, the ingester checks all 3 locations.

The flatbuffer schema is compiled. A new package, pkg/tempofb, contains it, and make gen-flat compiles it. One key detail is the use of flatbuffers.CreateSharedString, which interns a string in the block of data, leading to efficient storage for common tags and values.
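The effect of string interning can be illustrated without the flatbuffers library. The sketch below is not the FlatBuffers wire format and does not call flatbuffers.CreateSharedString; `internBuffer` and its methods are hypothetical, standard-library-only stand-ins showing the idea that a repeated string is written once and later occurrences reuse its offset.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// internBuffer is an append-only byte buffer that stores each
// distinct string once and hands back its offset.
type internBuffer struct {
	data    []byte
	offsets map[string]uint32
}

func newInternBuffer() *internBuffer {
	return &internBuffer{offsets: map[string]uint32{}}
}

// SharedString returns the offset of s, writing a 4-byte length
// prefix plus the string bytes only the first time s is seen.
func (b *internBuffer) SharedString(s string) uint32 {
	if off, ok := b.offsets[s]; ok {
		return off // already stored: reuse, write nothing
	}
	off := uint32(len(b.data))
	var prefix [4]byte
	binary.LittleEndian.PutUint32(prefix[:], uint32(len(s)))
	b.data = append(b.data, prefix[:]...)
	b.data = append(b.data, s...)
	b.offsets[s] = off
	return off
}

// Read decodes the string stored at off.
func (b *internBuffer) Read(off uint32) string {
	n := binary.LittleEndian.Uint32(b.data[off : off+4])
	return string(b.data[off+4 : off+4+n])
}

func main() {
	b := newInternBuffer()
	first := b.SharedString("service.name")
	second := b.SharedString("service.name") // same offset, no growth
	fmt.Println(first == second, len(b.data), b.Read(first))
}
```

With many traces sharing keys such as service.name, the buffer grows with the number of distinct strings rather than the number of occurrences.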

Search results are trace headers, not entire trace bodies. Each includes basic details like ID, duration, service, and operation. The API response also includes some basic metrics for how many traces, blocks, and bytes were inspected. This will let us quickly gauge the performance of various queries.
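As a sketch of what such a response could look like, the following walks several search-data sources (for example live traces, the WAL, and completed blocks), returns matching headers, and accumulates inspection metrics. All type and field names here (`TraceHeader`, `Metrics`, `Results`, `source`, `Search`) are assumptions for illustration and are not taken from the PR's API.

```go
package main

import "fmt"

// TraceHeader carries basic details only, not the trace body.
type TraceHeader struct {
	TraceID       string
	RootService   string
	RootOperation string
	DurationMs    uint32
}

// Metrics counts how much data a search had to inspect.
type Metrics struct {
	InspectedTraces uint32
	InspectedBlocks uint32
	InspectedBytes  uint64
}

// Results is the hypothetical response: headers plus metrics.
type Results struct {
	Traces  []TraceHeader
	Metrics Metrics
}

// entry is one trace's search data within a source.
type entry struct {
	Header    TraceHeader
	Tags      map[string]string // simplified: one value per key
	SizeBytes uint64
}

// source is one place search data lives.
type source struct {
	Name    string
	IsBlock bool // WAL and completed blocks count as blocks here
	Entries []entry
}

// Search scans sources in order and stops once limit matches are
// found. A limit of 0 means no limit.
func Search(sources []source, key, value string, limit int) Results {
	var r Results
	for _, src := range sources {
		if src.IsBlock {
			r.Metrics.InspectedBlocks++
		}
		for _, e := range src.Entries {
			r.Metrics.InspectedTraces++
			r.Metrics.InspectedBytes += e.SizeBytes
			if e.Tags[key] != value {
				continue
			}
			r.Traces = append(r.Traces, e.Header)
			if limit > 0 && len(r.Traces) >= limit {
				return r
			}
		}
	}
	return r
}

func main() {
	sources := []source{
		{Name: "live", Entries: []entry{
			{Header: TraceHeader{TraceID: "1"}, Tags: map[string]string{"http.status_code": "500"}, SizeBytes: 100},
		}},
		{Name: "wal", IsBlock: true, Entries: []entry{
			{Header: TraceHeader{TraceID: "2"}, Tags: map[string]string{"http.status_code": "200"}, SizeBytes: 200},
		}},
	}
	r := Search(sources, "http.status_code", "500", 0)
	fmt.Println(len(r.Traces), r.Metrics.InspectedTraces, r.Metrics.InspectedBytes)
}
```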

Large remaining gaps / next steps
Consensus is that these are not needed for this first pass and will be addressed in the future.

  • All search data is abandoned on each restart. This will be resolved when versioning for search files is introduced; versioning is not included now so that we can continue to experiment with the file format. Versioning is a requirement if/when search data is pushed to the full backend (s3, gcs, etc).
  • Explore extracting common/well-known tags to their own fields. This will be even more efficient than the interned strings, and also faster to evaluate. When a field is not present there should be no overhead, because flatbuffers skips encoding for non-present fields. This is not done yet because it gets messy and repetitive quickly if we want to extract dozens or hundreds of well-known tags (which I think is a good direction).

Small remaining gaps / next steps. Not necessarily required in this PR but open to feedback

  • When searching live traces, don't hold the mutex the whole time. Maybe unlock/relock between each trace?
  • Change the signature of the search API? Maybe POST instead of GET.
  • Instrumentation: tracing, metrics, logging for the search itself.
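One possible shape for the unlock/relock idea in the first bullet, sketched with hypothetical types (`liveTraces`, `Push`, `Search`) rather than the ingester's real ones: snapshot the trace IDs under the lock, then take the lock again only briefly for each trace, so writers can interleave with a long search.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// liveTraces is a hypothetical stand-in for the ingester's in-memory
// traces: trace ID -> flattened tags (one value per key for brevity).
type liveTraces struct {
	mu     sync.Mutex
	traces map[string]map[string]string
}

func newLiveTraces() *liveTraces {
	return &liveTraces{traces: map[string]map[string]string{}}
}

// Push stores or replaces the tags for a trace.
func (l *liveTraces) Push(id string, tags map[string]string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.traces[id] = tags
}

// Search returns matching trace IDs, sorted. The lock is released
// between traces instead of being held for the whole scan.
func (l *liveTraces) Search(key, value string) []string {
	// Snapshot the IDs under the lock.
	l.mu.Lock()
	ids := make([]string, 0, len(l.traces))
	for id := range l.traces {
		ids = append(ids, id)
	}
	l.mu.Unlock()

	var matches []string
	for _, id := range ids {
		l.mu.Lock()
		tags, ok := l.traces[id] // the trace may have been removed meanwhile
		hit := ok && tags[key] == value
		l.mu.Unlock()
		if hit {
			matches = append(matches, id)
		}
	}
	sort.Strings(matches)
	return matches
}

func main() {
	l := newLiveTraces()
	l.Push("a", map[string]string{"http.status_code": "500"})
	l.Push("b", map[string]string{"http.status_code": "200"})
	fmt.Println(l.Search("http.status_code", "500"))
}
```

The trade-off is that the result is no longer a consistent point-in-time view: traces pushed after the snapshot are missed, and traces removed mid-search are skipped.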

Which issue(s) this PR fixes:
Fixes: #471

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

mdisibio and others added 30 commits June 23, 2021 15:50
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
…search everything generically

…ue value in-memory

…sults are found

…h bytes exceed per trace limit

@mdisibio mdisibio changed the title from "WIP: Search v0.0.00000001 (ingester only)" to "Search v0.1.0 (ingester only)" Aug 30, 2021

@joe-elliott joe-elliott left a comment

EL GEE TEE EM

@annanay25 annanay25 left a comment

LGTM 🚀

@mdisibio mdisibio merged commit 09d455b into grafana:main Sep 1, 2021
@mdisibio mdisibio deleted the search branch September 1, 2021 12:20
@mdisibio mdisibio mentioned this pull request Sep 2, 2021
12 tasks
@joe-elliott joe-elliott mentioned this pull request Sep 13, 2021
3 tasks
@mdisibio

We determined that this also fixed #216

Development

Successfully merging this pull request may close these issues.

Support FindService && FindOperations
5 participants