
Tenant Index #834

Merged · 39 commits · Aug 11, 2021
3a126d9
added tenant index and tests
joe-elliott Jul 14, 2021
108a6bc
note
joe-elliott Jul 14, 2021
02e8ec9
Added tenantindex functionality
joe-elliott Jul 14, 2021
2994512
Added poller functionality
joe-elliott Jul 14, 2021
26e9204
fixed poller tests
joe-elliott Jul 14, 2021
93ad4d2
EnablePolling
joe-elliott Jul 15, 2021
97045d3
log msgs
joe-elliott Jul 16, 2021
d02a061
removed incorrect comment
joe-elliott Jul 16, 2021
6f07a08
Split out list functionality
joe-elliott Jul 16, 2021
eebb6a7
patched up test
joe-elliott Jul 16, 2021
807d97a
tests
joe-elliott Jul 16, 2021
63327bb
TestApplyResults
joe-elliott Jul 21, 2021
004a590
made tenants return a []string
joe-elliott Jul 21, 2021
f0eeb2a
Added TestUpdateSaved
joe-elliott Jul 21, 2021
cb22bfa
Don't add blocks that are already in the lists
joe-elliott Jul 21, 2021
573f2bd
force polling on startup. don't poll querier in single binary
joe-elliott Jul 21, 2021
fb6a05b
Added fallback config option
joe-elliott Jul 21, 2021
16543a5
Added metrics
joe-elliott Jul 21, 2021
90047cf
removed tenant from index
joe-elliott Jul 21, 2021
354fa4a
cleanup
joe-elliott Jul 21, 2021
2ac8014
metrics, alerts, runbook
joe-elliott Jul 21, 2021
b50ff6b
Added tenant builders config option
joe-elliott Jul 22, 2021
7dd03cc
todos
joe-elliott Jul 22, 2021
7545cc8
tenant index too old
joe-elliott Jul 22, 2021
7c6bd57
fixed tests
joe-elliott Jul 22, 2021
e70caa9
changelog
joe-elliott Jul 23, 2021
7f32a8e
lint
joe-elliott Jul 23, 2021
cb4d15a
Merge branch 'main' into bucket-index
joe-elliott Jul 23, 2021
adb27ce
vendor check
joe-elliott Jul 23, 2021
619f545
Merge branch 'main' into bucket-index
joe-elliott Aug 9, 2021
2747fa9
Fixed e2e tests.
joe-elliott Aug 10, 2021
17867a6
PR comments
joe-elliott Aug 10, 2021
8c8c434
docs
joe-elliott Aug 10, 2021
ee5db85
Added testing
joe-elliott Aug 10, 2021
cfba991
removed jpe
joe-elliott Aug 10, 2021
05ae732
doc improvements
joe-elliott Aug 11, 2021
ebde24a
Improved tenant index poll tests
joe-elliott Aug 11, 2021
566be4d
Added fallback testing
joe-elliott Aug 11, 2021
339bb4c
Merge branch 'main' into bucket-index
joe-elliott Aug 11, 2021
8 changes: 5 additions & 3 deletions CHANGELOG.md
@@ -1,12 +1,15 @@
## main / unreleased

* [BUGFIX] Allow only valid trace ID characters when decoding [#854](https://github.com/grafana/tempo/pull/854) (@zalegrala)
* [BUGFIX] Queriers complete one polling cycle before finishing startup. [#834](https://github.com/grafana/tempo/pull/834) (@joe-elliott)
* [CHANGE] Upgrade Cortex from v1.9.0 to v1.9.0-131-ga4bf10354 [#841](https://github.com/grafana/tempo/pull/841) (@aknuds1)
* [CHANGE] Change default tempo port from 3100 to 3200 [#770](https://github.com/grafana/tempo/pull/809) (@MurzNN)
* [ENHANCEMENT] Added hedged request metric `tempodb_backend_hedged_roundtrips_total` and a new storage agnostic `tempodb_backend_request_duration_seconds` metric that
supersedes the soon-to-be deprecated storage specific metrics (`tempodb_azure_request_duration_seconds`, `tempodb_s3_request_duration_seconds` and `tempodb_gcs_request_duration_seconds`). [#790](https://github.com/grafana/tempo/pull/790) (@JosephWoodward)
* [CHANGE] Jsonnet: use dedicated configmaps for distributors and ingesters [#775](https://github.com/grafana/tempo/pull/775) (@kvrhdn)
* [CHANGE] Docker images are now prefixed by their branch name [#828](https://github.com/grafana/tempo/pull/828) (@jvrplmlmn)
* [FEATURE] Added the ability to hedge requests with all backends [#750](https://github.com/grafana/tempo/pull/750) (@joe-elliott)
* [FEATURE] Added a tenant index to reduce bucket polling. [#834](https://github.com/grafana/tempo/pull/834) (@joe-elliott)
* [ENHANCEMENT] Added hedged request metric `tempodb_backend_hedged_roundtrips_total` and a new storage agnostic `tempodb_backend_request_duration_seconds` metric that
supersedes the soon-to-be deprecated storage specific metrics (`tempodb_azure_request_duration_seconds`, `tempodb_s3_request_duration_seconds` and `tempodb_gcs_request_duration_seconds`). [#790](https://github.com/grafana/tempo/pull/790) (@JosephWoodward)
* [ENHANCEMENT] Performance: improve compaction speed with concurrent reads and writes [#754](https://github.com/grafana/tempo/pull/754) (@mdisibio)
* [ENHANCEMENT] Improve readability of cpu and memory metrics on operational dashboard [#764](https://github.com/grafana/tempo/pull/764) (@bboreham)
* [ENHANCEMENT] Add `azure_request_duration_seconds` metric. [#767](https://github.com/grafana/tempo/pull/767) (@JosephWoodward)
@@ -19,7 +22,6 @@
* [ENHANCEMENT] Add retry middleware in query-frontend. [#814](https://github.com/grafana/tempo/pull/814) (@kvrhdn)
* [ENHANCEMENT] Add `-use-otel-tracer` to use the OpenTelemetry tracer, this will also capture traces emitted by the gcs sdk. Experimental: not all features are supported (i.e. remote sampling). [#842](https://github.com/grafana/tempo/pull/842) (@kvrhdn)
* [ENHANCEMENT] Add `/services` endpoint. [#863](https://github.com/grafana/tempo/pull/863) (@kvrhdn)
* [CHANGE] Docker images are now prefixed by their branch name [#828](https://github.com/grafana/tempo/pull/828) (@jvrplmlmn)

## v1.0.1

5 changes: 4 additions & 1 deletion cmd/tempo/app/modules.go
@@ -146,8 +146,11 @@ func (t *App) initQuerier() (services.Service, error) {
}
}

// do not enable polling if this is the single binary. in that case the compactor will take care of polling
enablePolling := t.cfg.Target == Querier

// todo: make ingester client a module instead of passing config everywhere
querier, err := querier.New(t.cfg.Querier, t.cfg.IngesterClient, t.ring, t.store, t.overrides)
querier, err := querier.New(t.cfg.Querier, t.cfg.IngesterClient, t.ring, t.store, t.overrides, enablePolling)
if err != nil {
return nil, fmt.Errorf("failed to create querier %w", err)
}
9 changes: 9 additions & 0 deletions docs/tempo/website/configuration/_index.md
@@ -15,6 +15,7 @@ This document explains the configuration options for Tempo as well as the detail
- [compactor](#compactor)
- [storage](#storage)
- [memberlist](#memberlist)
- [polling](#polling)

#### Use environment variables in the configuration

@@ -368,6 +369,14 @@ storage:
# Number of blocks to process in parallel during polling. Default is 50.
[blocklist_poll_concurrency: <int>]

# By default, components will pull the blocklist from the tenant index. If that fails, the component can
# fall back to scanning the entire bucket. Set to false to disable this behavior. Default is true.
[blocklist_poll_fallback: <bool>]

# Maximum number of compactors that should build the tenant index. All other components will download
# the index. Default is 2.
[blocklist_poll_tenant_index_builders: <int>]

# Cache type to use. Should be one of "redis", "memcached"
# Example: "cache: memcached"
[cache: <string>]
2 changes: 2 additions & 0 deletions docs/tempo/website/configuration/manifest.md
@@ -300,6 +300,8 @@ storage:
encoding: zstd
blocklist_poll: 5m0s
blocklist_poll_concurrency: 50
blocklist_poll_fallback: true
blocklist_poll_tenant_index_builders: 2
backend: local
local:
path: /tmp/tempo/traces
47 changes: 47 additions & 0 deletions docs/tempo/website/configuration/polling.md
@@ -0,0 +1,47 @@
---
title: Polling
---

The polling cycle is controlled by a number of configuration options, detailed below.

```
storage:
trace:
# How often to repoll the backend for new blocks. Default is 5m
[blocklist_poll: <duration>]

# Number of blocks to process in parallel during polling. Default is 50.
[blocklist_poll_concurrency: <int>]

# By default, components will pull the blocklist from the tenant index. If that fails, the component can
# fall back to scanning the entire bucket. Set to false to disable this behavior. Default is true.
[blocklist_poll_fallback: <bool>]

# Maximum number of compactors that should build the tenant index. All other components will download
# the index. Default is 2.
[blocklist_poll_tenant_index_builders: <int>]
```

Due to the mechanics of the [tenant index]({{< relref "../operations/polling" >}}), the blocklist can be stale by
at most 2x the configured `blocklist_poll` duration. There are two configuration options that need to be balanced
against `blocklist_poll` to handle this:

The ingester `complete_block_timeout` is used to hold a block in the ingester for a given period of time after
it has been flushed. This allows the ingester to keep returning traces to queriers that are still unaware
of the newly flushed blocks.
```
ingester:
# How long to hold a complete block in the ingester after it has been flushed to the backend. Default is 15m
[complete_block_timeout: <duration>]
```

The compactor `compacted_block_retention` is used to keep a block in the backend for a given period of time
after it has been compacted and its data is no longer needed. This allows queriers with a stale blocklist to access
these blocks successfully until they complete their polling cycles and have up-to-date blocklists.

```
compactor:
compaction:
# How long to leave a block in the backend after it has been compacted successfully. Default is 1h
[compacted_block_retention: <duration>]
```
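
Putting the three options together, one plausible balance looks like the following. These are illustrative values only, not recommendations: with `blocklist_poll` at 5m the blocklist can be stale by up to roughly 10m, so both grace periods are set comfortably above that window.

```
storage:
  trace:
    # index rebuilt/downloaded every 5m; blocklist may be up to ~10m stale
    blocklist_poll: 5m
ingester:
  # hold flushed blocks well past the worst-case staleness window
  complete_block_timeout: 15m
compactor:
  compaction:
    # keep compacted blocks around long enough for stale queriers
    compacted_block_retention: 1h
```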
3 changes: 2 additions & 1 deletion docs/tempo/website/operations/_index.md
@@ -11,4 +11,5 @@ Resources for deploying and running Tempo:
- [Monitoring](monitoring)
- [Ingester PVCs](ingester_pvcs)
- [Caching](caching)
- [Tempo CLI](tempo_cli)
- [Tempo CLI](tempo_cli)
- [Polling/Tenant Index](polling)
42 changes: 42 additions & 0 deletions docs/tempo/website/operations/polling.md
@@ -0,0 +1,42 @@
---
title: Polling
weight: 8
---

# Polling

Tempo maintains knowledge of the state of the backend by polling it at regular intervals. There are currently
only two components that need this knowledge and, consequently, only two that poll the backend: compactors
and queriers.

To reduce calls to the backend, only a small subset of compactors actually list all blocks and build
what is called the tenant index. The tenant index is a gzipped JSON file located at `/<tenant>/index.json.gz` containing
an entry for every block and compacted block for that tenant. This is done once every `blocklist_poll` duration.

All other compactors and all queriers then rely on downloading this file, unzipping it and using the contained list.
Again, this is done once every `blocklist_poll` duration.

Due to this behavior, a given compactor or querier will often have an out-of-date blocklist. During normal operation
it will be stale by at most 2x the configured `blocklist_poll`. See [configuration]({{< relref "../configuration/polling" >}})
for more information.
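
The mechanics above amount to a small round-trip: a builder compactor gzips a JSON blocklist and uploads it, and every other component downloads and decodes it once per poll cycle. The sketch below illustrates that format; the type and field names (`tenantIndex`, `blockMeta`) are hypothetical, not Tempo's actual types.

```go
// Hypothetical sketch of the tenant index format: a gzipped JSON document
// listing block metadata, written to /<tenant>/index.json.gz by a builder
// compactor and downloaded by all other components.
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
)

type blockMeta struct {
	BlockID string `json:"block_id"`
}

type tenantIndex struct {
	Meta          []blockMeta `json:"meta"`           // active blocks
	CompactedMeta []blockMeta `json:"compacted_meta"` // compacted blocks
}

// marshalIndex gzips the JSON encoding, as a builder would before upload.
func marshalIndex(idx tenantIndex) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	if err := json.NewEncoder(gz).Encode(idx); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unmarshalIndex is what a non-builder component would do after downloading.
func unmarshalIndex(b []byte) (tenantIndex, error) {
	var idx tenantIndex
	gz, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return idx, err
	}
	defer gz.Close()
	err = json.NewDecoder(gz).Decode(&idx)
	return idx, err
}
```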

# Monitoring

See our jsonnet for example [alerts](https://github.com/grafana/tempo/blob/main/operations/tempo-mixin/alerts.libsonnet) and [runbook entries](https://github.com/grafana/tempo/blob/main/operations/tempo-mixin/runbook.md)
related to polling.

If you are building your own dashboards or alerts, here are a few relevant metrics:

- `tempodb_blocklist_poll_errors_total`
A holistic metric that increments for any error encountered while polling the blocklist. Any increase in this metric should be reviewed.
- `tempodb_blocklist_poll_duration_seconds`
Histogram recording the length of time in seconds to poll the entire blocklist.
- `tempodb_blocklist_length`
Total blocks as seen by this component.
- `tempodb_blocklist_tenant_index_errors_total`
A holistic metric that increments for any error building the tenant index. Any increase in this metric should be reviewed.
- `tempodb_blocklist_tenant_index_builder`
A gauge that has the value 1 if this compactor is attempting to build the tenant index and 0 if it is not. At least one compactor
must have this value set to 1 for the system to be working.
- `tempodb_blocklist_tenant_index_age_seconds`
The age of the last loaded tenant index. now() minus this value indicates how stale this component's view of the blocklist is.
1 change: 1 addition & 0 deletions go.mod
@@ -19,6 +19,7 @@ require (
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b
github.com/golang/protobuf v1.5.2
github.com/golang/snappy v0.0.3
github.com/google/go-cmp v0.5.6
github.com/google/uuid v1.2.0
github.com/gorilla/mux v1.8.0
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.1-0.20191002090509-6af20e3a5340 // indirect
6 changes: 5 additions & 1 deletion integration/e2e/e2e_test.go
@@ -222,14 +222,18 @@ func TestMicroservices(t *testing.T) {
require.NoError(t, err)
require.Equal(t, 204, res.StatusCode)

res, err = cortex_e2e.GetRequest("http://" + tempoIngester3.Endpoint(3200) + "/flush")
require.NoError(t, err)
require.Equal(t, 204, res.StatusCode)

// sleep for one maintenance cycle
time.Sleep(5 * time.Second)

// test metrics
for _, i := range []*cortex_e2e.HTTPService{tempoIngester1, tempoIngester2, tempoIngester3} {
require.NoError(t, i.WaitSumMetrics(cortex_e2e.Equals(1), "tempo_ingester_blocks_flushed_total"))
require.NoError(t, i.WaitSumMetrics(cortex_e2e.Equals(3), "tempodb_blocklist_length"))
}
require.NoError(t, tempoQuerier.WaitSumMetrics(cortex_e2e.Equals(3), "tempodb_blocklist_length"))
require.NoError(t, tempoQueryFrontend.WaitSumMetrics(cortex_e2e.Equals(1), "tempo_query_frontend_queries_total"))

// query trace - should fetch from backend
3 changes: 3 additions & 0 deletions modules/compactor/compactor.go
@@ -90,6 +90,9 @@ func (c *Compactor) starting(ctx context.Context) error {
}
}

// this will block until one poll cycle is complete
c.store.EnablePolling(c)

return nil
}

5 changes: 2 additions & 3 deletions modules/ingester/config.go
@@ -9,8 +9,7 @@ import (
"github.com/cortexproject/cortex/pkg/util/flagext"
cortex_util "github.com/cortexproject/cortex/pkg/util/log"
"github.com/go-kit/kit/log/level"

"github.com/grafana/tempo/modules/storage"
"github.com/grafana/tempo/tempodb"
)

// Config for an ingester.
@@ -42,7 +41,7 @@ func (cfg *Config) RegisterFlagsAndApplyDefaults(prefix string, f *flag.FlagSet)
f.DurationVar(&cfg.MaxTraceIdle, prefix+".trace-idle-period", 30*time.Second, "Duration after which to consider a trace complete if no spans have been received")
f.DurationVar(&cfg.MaxBlockDuration, prefix+".max-block-duration", time.Hour, "Maximum duration which the head block can be appended to before cutting it.")
f.Uint64Var(&cfg.MaxBlockBytes, prefix+".max-block-bytes", 1024*1024*1024, "Maximum size of the head block before cutting it.")
f.DurationVar(&cfg.CompleteBlockTimeout, prefix+".complete-block-timeout", time.Minute+storage.DefaultBlocklistPoll, "Duration to keep head blocks in the ingester after they have been cut.")
f.DurationVar(&cfg.CompleteBlockTimeout, prefix+".complete-block-timeout", 3*tempodb.DefaultBlocklistPoll, "Duration to keep head blocks in the ingester after they have been cut.")

hostname, err := os.Hostname()
if err != nil {
21 changes: 18 additions & 3 deletions modules/querier/querier.go
@@ -47,6 +47,8 @@ type Querier struct {

subservices *services.Manager
subservicesWatcher *services.FailureWatcher

enablePolling bool
}

type responseFromIngesters struct {
@@ -55,7 +57,7 @@
}

// New makes a new Querier.
func New(cfg Config, clientCfg ingester_client.Config, ring ring.ReadRing, store storage.Store, limits *overrides.Overrides) (*Querier, error) {
func New(cfg Config, clientCfg ingester_client.Config, ring ring.ReadRing, store storage.Store, limits *overrides.Overrides, enablePolling bool) (*Querier, error) {
factory := func(addr string) (ring_client.PoolClient, error) {
return ingester_client.New(addr, clientCfg)
}
@@ -69,8 +71,9 @@
factory,
metricIngesterClients,
log.Logger),
store: store,
limits: limits,
store: store,
limits: limits,
enablePolling: enablePolling,
}

q.Service = services.NewBasicService(q.starting, q.running, q.stopping)
@@ -107,6 +110,12 @@ func (q *Querier) starting(ctx context.Context) error {
return fmt.Errorf("failed to start subservices %w", err)
}
}

if q.enablePolling {
// this will block until one poll cycle is complete
q.store.EnablePolling(q)
}

return nil
}

@@ -246,3 +255,9 @@ func (q *Querier) forGivenIngesters(ctx context.Context, replicationSet ring.Rep

return responses, err
}

// implements blocklist.JobSharder. Queriers rely on compactors to build the tenant
// index which they then consume.
func (q *Querier) Owns(_ string) bool {
return false
}
12 changes: 7 additions & 5 deletions modules/querier/querier_test.go
@@ -64,17 +64,19 @@ func TestReturnAllHits(t *testing.T) {
WAL: &wal.Config{
Filepath: path.Join(tempDir, "wal"),
},
BlocklistPoll: 50 * time.Millisecond,
BlocklistPoll: 50 * time.Millisecond,
BlocklistPollFallback: true,
}, log.NewNopLogger())
assert.NoError(t, err, "unexpected error creating tempodb")
require.NoError(t, err, "unexpected error creating tempodb")

r.EnablePolling(&Querier{})

wal := w.WAL()
assert.NoError(t, err)

blockCount := 2
testTraceID := make([]byte, 16)
_, err = rand.Read(testTraceID)
assert.NoError(t, err)
require.NoError(t, err)

// keep track of traces sent
testTraces := make([]*tempopb.Trace, 0, blockCount)
@@ -98,7 +100,7 @@
}

// sleep for blocklist poll
time.Sleep(100 * time.Millisecond)
time.Sleep(200 * time.Millisecond)

// find should return both now
foundBytes, _, err := r.Find(context.Background(), util.FakeTenantID, testTraceID, tempodb.BlockIDMin, tempodb.BlockIDMax)
7 changes: 3 additions & 4 deletions modules/storage/config.go
@@ -2,7 +2,6 @@ package storage

import (
"flag"
"time"

cortex_cache "github.com/cortexproject/cortex/pkg/chunk/cache"

@@ -23,15 +22,15 @@ type Config struct {
Trace tempodb.Config `yaml:"trace"`
}

const DefaultBlocklistPoll = 5 * time.Minute

// RegisterFlagsAndApplyDefaults registers the flags.
func (cfg *Config) RegisterFlagsAndApplyDefaults(prefix string, f *flag.FlagSet) {

cfg.Trace.BlocklistPollFallback = true
cfg.Trace.BlocklistPollConcurrency = tempodb.DefaultBlocklistPollConcurrency
cfg.Trace.BlocklistPollTenantIndexBuilders = tempodb.DefaultTenantIndexBuilders

f.StringVar(&cfg.Trace.Backend, util.PrefixConfig(prefix, "trace.backend"), "", "Trace backend (s3, azure, gcs, local)")
f.DurationVar(&cfg.Trace.BlocklistPoll, util.PrefixConfig(prefix, "trace.maintenance-cycle"), DefaultBlocklistPoll, "Period at which to run the maintenance cycle.")
f.DurationVar(&cfg.Trace.BlocklistPoll, util.PrefixConfig(prefix, "trace.maintenance-cycle"), tempodb.DefaultBlocklistPoll, "Period at which to run the maintenance cycle.")

cfg.Trace.WAL = &wal.Config{}
f.StringVar(&cfg.Trace.WAL.Filepath, util.PrefixConfig(prefix, "trace.wal.path"), "/var/tempo/wal", "Path at which store WAL blocks.")
44 changes: 43 additions & 1 deletion operations/tempo-mixin/alerts.libsonnet
@@ -109,8 +109,50 @@
runbook_url: 'https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/runbook.md#TempoPollsFailing'
},
},
{
alert: 'TempoTenantIndexFailures',
expr: |||
sum by (cluster, namespace) (increase(tempodb_blocklist_tenant_index_errors_total{}[1h])) > %s and
sum by (cluster, namespace) (increase(tempodb_blocklist_tenant_index_errors_total{}[5m])) > 0
||| % $._config.alerts.polls_per_hour_failed,
labels: {
severity: 'critical',
},
annotations: {
message: 'More than %s tenant index failures in the past hour.' % $._config.alerts.polls_per_hour_failed,
runbook_url: 'https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/runbook.md#TempoTenantIndexFailures'
},
},
{
alert: 'TempoNoTenantIndexBuilders',
expr: |||
sum by (cluster, namespace) (tempodb_blocklist_tenant_index_builder{}) == 0
|||,
'for': '5m',
labels: {
severity: 'critical',
},
annotations: {
message: 'No tenant index builders. Tenant index is out of date.',
runbook_url: 'https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/runbook.md#TempoNoTenantIndexBuilders'
},
},
{
alert: 'TempoTenantIndexTooOld',
expr: |||
max by (cluster, namespace) (tempodb_blocklist_tenant_index_age_seconds{}) > %s
||| % $._config.alerts.max_tenant_index_age_seconds,
'for': '5m',
labels: {
severity: 'critical',
},
annotations: {
message: 'Tenant index is older than %s seconds.' % $._config.alerts.max_tenant_index_age_seconds,
runbook_url: 'https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/runbook.md#TempoTenantIndexTooOld'
},
},
],
},
],
},
}
}