
Tenant Index #834

Merged 39 commits, Aug 11, 2021

Changes from 30 commits

Commits (39)
3a126d9
added tenant index and tests
joe-elliott Jul 14, 2021
108a6bc
note
joe-elliott Jul 14, 2021
02e8ec9
Added tenantindex functionality
joe-elliott Jul 14, 2021
2994512
Added poller functionality
joe-elliott Jul 14, 2021
26e9204
fixed poller tests
joe-elliott Jul 14, 2021
93ad4d2
EnablePolling
joe-elliott Jul 15, 2021
97045d3
log msgs
joe-elliott Jul 16, 2021
d02a061
removed incorrect comment
joe-elliott Jul 16, 2021
6f07a08
Split out list functionality
joe-elliott Jul 16, 2021
eebb6a7
patched up test
joe-elliott Jul 16, 2021
807d97a
tests
joe-elliott Jul 16, 2021
63327bb
TestApplyResults
joe-elliott Jul 21, 2021
004a590
made tenants return a []string
joe-elliott Jul 21, 2021
f0eeb2a
Added TestUpdateSaved
joe-elliott Jul 21, 2021
cb22bfa
Don't add blocks that are already in the lists
joe-elliott Jul 21, 2021
573f2bd
force polling on startup. don't poll querier in single binary
joe-elliott Jul 21, 2021
fb6a05b
Added fallback config option
joe-elliott Jul 21, 2021
16543a5
Added metrics
joe-elliott Jul 21, 2021
90047cf
removed tenant from index
joe-elliott Jul 21, 2021
354fa4a
cleanup
joe-elliott Jul 21, 2021
2ac8014
metrics, alerts, runbook
joe-elliott Jul 21, 2021
b50ff6b
Added tenant builders config option
joe-elliott Jul 22, 2021
7dd03cc
todos
joe-elliott Jul 22, 2021
7545cc8
tenant index too old
joe-elliott Jul 22, 2021
7c6bd57
fixed tests
joe-elliott Jul 22, 2021
e70caa9
changelog
joe-elliott Jul 23, 2021
7f32a8e
lint
joe-elliott Jul 23, 2021
cb4d15a
Merge branch 'main' into bucket-index
joe-elliott Jul 23, 2021
adb27ce
vendor check
joe-elliott Jul 23, 2021
619f545
Merge branch 'main' into bucket-index
joe-elliott Aug 9, 2021
2747fa9
Fixed e2e tests.
joe-elliott Aug 10, 2021
17867a6
PR comments
joe-elliott Aug 10, 2021
8c8c434
docs
joe-elliott Aug 10, 2021
ee5db85
Added testing
joe-elliott Aug 10, 2021
cfba991
removed jpe
joe-elliott Aug 10, 2021
05ae732
doc improvements
joe-elliott Aug 11, 2021
ebde24a
Improved tenant index poll tests
joe-elliott Aug 11, 2021
566be4d
Added fallback testing
joe-elliott Aug 11, 2021
339bb4c
Merge branch 'main' into bucket-index
joe-elliott Aug 11, 2021
8 changes: 5 additions & 3 deletions CHANGELOG.md
@@ -1,12 +1,15 @@
## main / unreleased

* [BUGFIX] Allow only valid trace ID characters when decoding [#854](https://github.com/grafana/tempo/pull/854) (@zalegrala)
* [BUGFIX] Queriers complete one polling cycle before finishing startup. [#834](https://github.com/grafana/tempo/pull/834) (@joe-elliott)
* [CHANGE] Upgrade Cortex from v1.9.0 to v1.9.0-131-ga4bf10354 [#841](https://github.com/grafana/tempo/pull/841) (@aknuds1)
* [CHANGE] Change default tempo port from 3100 to 3200 [#770](https://github.com/grafana/tempo/pull/809) (@MurzNN)
* [ENHANCEMENT] Added hedged request metric `tempodb_backend_hedged_roundtrips_total` and a new storage agnostic `tempodb_backend_request_duration_seconds` metric that
supersedes the soon-to-be deprecated storage specific metrics (`tempodb_azure_request_duration_seconds`, `tempodb_s3_request_duration_seconds` and `tempodb_gcs_request_duration_seconds`). [#790](https://github.com/grafana/tempo/pull/790) (@JosephWoodward)
* [CHANGE] Jsonnet: use dedicated configmaps for distributors and ingesters [#775](https://github.com/grafana/tempo/pull/775) (@kvrhdn)
* [CHANGE] Docker images are now prefixed by their branch name [#828](https://github.com/grafana/tempo/pull/828) (@jvrplmlmn)
* [FEATURE] Added the ability to hedge requests with all backends [#750](https://github.com/grafana/tempo/pull/750) (@joe-elliott)
* [FEATURE] Added a tenant index to reduce bucket polling. [#834](https://github.com/grafana/tempo/pull/834) (@joe-elliott)
* [ENHANCEMENT] Performance: improve compaction speed with concurrent reads and writes [#754](https://github.com/grafana/tempo/pull/754) (@mdisibio)
* [ENHANCEMENT] Improve readability of cpu and memory metrics on operational dashboard [#764](https://github.com/grafana/tempo/pull/764) (@bboreham)
* [ENHANCEMENT] Add `azure_request_duration_seconds` metric. [#767](https://github.com/grafana/tempo/pull/767) (@JosephWoodward)
@@ -18,7 +21,6 @@
* [ENHANCEMENT] Emit traces for ingester flush operations. [#812](https://github.com/grafana/tempo/pull/812) (@bboreham)
* [ENHANCEMENT] Add retry middleware in query-frontend. [#814](https://github.com/grafana/tempo/pull/814) (@kvrhdn)
* [ENHANCEMENT] Add `-use-otel-tracer` to use the OpenTelemetry tracer, this will also capture traces emitted by the gcs sdk. Experimental: not all features are supported (i.e. remote sampling). [#842](https://github.com/grafana/tempo/pull/842) (@kvrhdn)

## v1.0.1

5 changes: 4 additions & 1 deletion cmd/tempo/app/modules.go
@@ -146,8 +146,11 @@ func (t *App) initQuerier() (services.Service, error) {
}
}

// do not enable polling if this is the single binary. in that case the compactor will take care of polling
enablePolling := t.cfg.Target == Querier

// todo: make ingester client a module instead of passing config everywhere
querier, err := querier.New(t.cfg.Querier, t.cfg.IngesterClient, t.ring, t.store, t.overrides)
querier, err := querier.New(t.cfg.Querier, t.cfg.IngesterClient, t.ring, t.store, t.overrides, enablePolling)
if err != nil {
return nil, fmt.Errorf("failed to create querier %w", err)
}
8 changes: 8 additions & 0 deletions docs/tempo/website/configuration/_index.md
@@ -368,6 +368,14 @@ storage:
# Number of blocks to process in parallel during polling. Default is 50.
[blocklist_poll_concurrency: <int>]

# By default components will pull the blocklist from the tenant index. If that fails the component can
# fallback to scanning the entire bucket. Set to false to disable this behavior. Default is true.
[blocklist_poll_fallback: <bool>]

# Maximum number of compactors that should build the tenant index. All other components will download
# the index. Default 2.
[blocklist_poll_tenant_index_builders: <int>]

# Cache type to use. Should be one of "redis", "memcached"
# Example: "cache: memcached"
[cache: <string>]
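Taken together, the two new options sit alongside the existing polling settings. A sketch of a storage block that uses them (the values shown are the documented defaults, not recommendations):

```yaml
storage:
  trace:
    blocklist_poll: 5m
    blocklist_poll_concurrency: 50
    # fall back to scanning the whole bucket if the tenant index can't be read
    blocklist_poll_fallback: true
    # at most this many compactors build the tenant index; all other components download it
    blocklist_poll_tenant_index_builders: 2
```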
2 changes: 2 additions & 0 deletions docs/tempo/website/configuration/manifest.md
@@ -300,6 +300,8 @@ storage:
encoding: zstd
blocklist_poll: 5m0s
blocklist_poll_concurrency: 50
blocklist_poll_fallback: true
blocklist_poll_tenant_index_builders: 2
backend: local
local:
path: /tmp/tempo/traces
5 changes: 4 additions & 1 deletion docs/tempo/website/operations/_index.md
@@ -11,4 +11,7 @@ Resources for deploying and running Tempo:
- [Monitoring](monitoring)
- [Ingester PVCs](ingester_pvcs)
- [Caching](caching)
- [Tempo CLI](tempo_cli)


# jpe tenant index
1 change: 1 addition & 0 deletions go.mod
@@ -19,6 +19,7 @@ require (
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b
github.com/golang/protobuf v1.5.2
github.com/golang/snappy v0.0.3
github.com/google/go-cmp v0.5.6
github.com/google/uuid v1.2.0
github.com/gorilla/mux v1.8.0
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.1-0.20191002090509-6af20e3a5340 // indirect
3 changes: 3 additions & 0 deletions modules/compactor/compactor.go
@@ -90,6 +90,9 @@ func (c *Compactor) starting(ctx context.Context) error {
}
}

// this will block until one poll cycle is complete
c.store.EnablePolling(c)

return nil
}

5 changes: 2 additions & 3 deletions modules/ingester/config.go
@@ -9,8 +9,7 @@ import (
"github.com/cortexproject/cortex/pkg/util/flagext"
cortex_util "github.com/cortexproject/cortex/pkg/util/log"
"github.com/go-kit/kit/log/level"

"github.com/grafana/tempo/tempodb"
)

// Config for an ingester.
@@ -42,7 +41,7 @@ func (cfg *Config) RegisterFlagsAndApplyDefaults(prefix string, f *flag.FlagSet)
f.DurationVar(&cfg.MaxTraceIdle, prefix+".trace-idle-period", 30*time.Second, "Duration after which to consider a trace complete if no spans have been received")
f.DurationVar(&cfg.MaxBlockDuration, prefix+".max-block-duration", time.Hour, "Maximum duration which the head block can be appended to before cutting it.")
f.Uint64Var(&cfg.MaxBlockBytes, prefix+".max-block-bytes", 1024*1024*1024, "Maximum size of the head block before cutting it.")
f.DurationVar(&cfg.CompleteBlockTimeout, prefix+".complete-block-timeout", time.Minute+tempodb.DefaultBlocklistPoll, "Duration to keep head blocks in the ingester after they have been cut.")

hostname, err := os.Hostname()
if err != nil {
21 changes: 18 additions & 3 deletions modules/querier/querier.go
@@ -47,6 +47,8 @@ type Querier struct {

subservices *services.Manager
subservicesWatcher *services.FailureWatcher

enablePolling bool
}

type responseFromIngesters struct {
@@ -55,7 +57,7 @@
}

// New makes a new Querier.
func New(cfg Config, clientCfg ingester_client.Config, ring ring.ReadRing, store storage.Store, limits *overrides.Overrides) (*Querier, error) {
func New(cfg Config, clientCfg ingester_client.Config, ring ring.ReadRing, store storage.Store, limits *overrides.Overrides, enablePolling bool) (*Querier, error) {
factory := func(addr string) (ring_client.PoolClient, error) {
return ingester_client.New(addr, clientCfg)
}
@@ -69,8 +71,9 @@ func New(cfg Config, clientCfg ingester_client.Config, ring ring.ReadRing, store
factory,
metricIngesterClients,
log.Logger),
store: store,
limits: limits,
enablePolling: enablePolling,
}

q.Service = services.NewBasicService(q.starting, q.running, q.stopping)
@@ -107,6 +110,12 @@ func (q *Querier) starting(ctx context.Context) error {
return fmt.Errorf("failed to start subservices %w", err)
}
}

if q.enablePolling {
// this will block until one poll cycle is complete
q.store.EnablePolling(q)
}

return nil
}

@@ -246,3 +255,9 @@ func (q *Querier) forGivenIngesters(ctx context.Context, replicationSet ring.Rep

return responses, err
}

// implements blocklist.JobSharder. Queriers rely on compactors to build the tenant
// index which they then consume.
func (q *Querier) Owns(_ string) bool {
return false
}
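A rough sketch of how that interface splits the two roles. The `JobSharder` name comes from the comment above; the hash-based sharding below is purely illustrative (Tempo's actual compactor sharding differs), while the querier side mirrors the real behavior of always returning false:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// JobSharder decides whether this instance should do the work for a given key.
type JobSharder interface {
	Owns(key string) bool
}

// querierSharder never owns anything: queriers only consume the tenant
// index that compactors build.
type querierSharder struct{}

func (querierSharder) Owns(string) bool { return false }

// compactorSharder owns a tenant if its hash lands on this instance.
// Illustrative only; treat the modulo scheme as an assumption.
type compactorSharder struct {
	instance, builders int
}

func (c compactorSharder) Owns(tenant string) bool {
	h := fnv.New32a()
	h.Write([]byte(tenant))
	return int(h.Sum32())%c.builders == c.instance
}

func main() {
	var q JobSharder = querierSharder{}
	fmt.Println(q.Owns("single-tenant")) // always false

	c0 := compactorSharder{instance: 0, builders: 2}
	c1 := compactorSharder{instance: 1, builders: 2}
	// exactly one of the two builders owns any given tenant
	fmt.Println(c0.Owns("single-tenant") != c1.Owns("single-tenant"))
}
```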
12 changes: 7 additions & 5 deletions modules/querier/querier_test.go
@@ -64,17 +64,19 @@ func TestReturnAllHits(t *testing.T) {
WAL: &wal.Config{
Filepath: path.Join(tempDir, "wal"),
},
BlocklistPoll: 50 * time.Millisecond,
BlocklistPollFallback: true,
}, log.NewNopLogger())
require.NoError(t, err, "unexpected error creating tempodb")

r.EnablePolling(&Querier{})

wal := w.WAL()
assert.NoError(t, err)

blockCount := 2
testTraceID := make([]byte, 16)
_, err = rand.Read(testTraceID)
require.NoError(t, err)

// keep track of traces sent
testTraces := make([]*tempopb.Trace, 0, blockCount)
@@ -98,7 +100,7 @@
}

// sleep for blocklist poll
time.Sleep(200 * time.Millisecond)

// find should return both now
foundBytes, _, err := r.Find(context.Background(), util.FakeTenantID, testTraceID, tempodb.BlockIDMin, tempodb.BlockIDMax)
7 changes: 3 additions & 4 deletions modules/storage/config.go
@@ -2,7 +2,6 @@ package storage

import (
"flag"
"time"

cortex_cache "github.com/cortexproject/cortex/pkg/chunk/cache"

@@ -23,15 +22,15 @@
Trace tempodb.Config `yaml:"trace"`
}

const DefaultBlocklistPoll = 5 * time.Minute

// RegisterFlagsAndApplyDefaults registers the flags.
func (cfg *Config) RegisterFlagsAndApplyDefaults(prefix string, f *flag.FlagSet) {

cfg.Trace.BlocklistPollFallback = true
cfg.Trace.BlocklistPollConcurrency = tempodb.DefaultBlocklistPollConcurrency
cfg.Trace.BlocklistPollTenantIndexBuilders = tempodb.DefaultTenantIndexBuilders

f.StringVar(&cfg.Trace.Backend, util.PrefixConfig(prefix, "trace.backend"), "", "Trace backend (s3, azure, gcs, local)")
f.DurationVar(&cfg.Trace.BlocklistPoll, util.PrefixConfig(prefix, "trace.maintenance-cycle"), tempodb.DefaultBlocklistPoll, "Period at which to run the maintenance cycle.")

cfg.Trace.WAL = &wal.Config{}
f.StringVar(&cfg.Trace.WAL.Filepath, util.PrefixConfig(prefix, "trace.wal.path"), "/var/tempo/wal", "Path at which store WAL blocks.")
44 changes: 43 additions & 1 deletion operations/tempo-mixin/alerts.libsonnet
@@ -109,8 +109,50 @@
runbook_url: 'https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/runbook.md#TempoPollsFailing'
},
},
{
alert: 'TempoTenantIndexFailures',
expr: |||
sum by (cluster, namespace) (increase(tempodb_blocklist_tenant_index_errors_total{}[1h])) > %s and
sum by (cluster, namespace) (increase(tempodb_blocklist_tenant_index_errors_total{}[5m])) > 0
||| % $._config.alerts.polls_per_hour_failed,
labels: {
severity: 'critical',
},
annotations: {
message: 'Greater than %s tenant index failures in the past hour.' % $._config.alerts.polls_per_hour_failed,
runbook_url: 'https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/runbook.md#TempoTenantIndexFailures'
},
},
{
alert: 'TempoNoTenantIndexBuilders',
expr: |||
sum by (cluster, namespace) (tempodb_blocklist_tenant_index_builder{}) == 0
|||,
'for': '5m',
labels: {
severity: 'critical',
},
annotations: {
message: 'No tenant index builders. Tenant index is out of date.',
runbook_url: 'https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/runbook.md#TempoNoTenantIndexBuilders'
},
},
{
alert: 'TempoTenantIndexTooOld',
expr: |||
max by (cluster, namespace) (tempodb_blocklist_tenant_index_age_seconds{}) > %s
||| % $._config.alerts.max_tenant_index_age_seconds,
'for': '5m',
labels: {
severity: 'critical',
},
annotations: {
message: 'Tenant index age is %s seconds old.' % $._config.alerts.max_tenant_index_age_seconds,
runbook_url: 'https://github.com/grafana/tempo/tree/main/operations/tempo-mixin/runbook.md#TempoTenantIndexTooOld'
},
},
],
},
],
},
}
1 change: 1 addition & 0 deletions operations/tempo-mixin/config.libsonnet
@@ -13,6 +13,7 @@
compactions_per_hour_failed: 2,
flushes_per_hour_failed: 2,
polls_per_hour_failed: 2,
max_tenant_index_age_seconds: 600,
},
},
}
61 changes: 58 additions & 3 deletions operations/tempo-mixin/runbook.md
@@ -107,11 +107,66 @@ remove them.

In the case of failed flushes your local WAL disk is now filling up. Tempo will continue to retry sending the blocks
until it succeeds, but at some point your WAL files will start failing to write due to out of disk issues. If the problem
persists, consider removing files from `/var/tempo/wal/blocks` and restarting the ingester, or increasing the amount of disk space
available to the ingester.

## TempoPollsFailing

See [Polling Issues](#polling-issues) below for general information.

If polls are failing, check the component that is raising this metric and look for any obvious logs that may indicate a quick fix.
We have only seen polling failures occur due to intermittent backend issues.

## TempoTenantIndexFailures

See [Polling Issues](#polling-issues) below for general information.

If the following is being logged, things are stable thanks to the polling fallback; review the logs to determine why
the index is failing and correct it.
```
failed to pull bucket index for tenant. falling back to polling
```

If the following (or other errors) are being logged repeatedly, then the tenant index is not being updated and more direct action is necessary.
If the core issue cannot be resolved, one option is to delete all tenant indexes, which forces the components to fall back to
scanning the entire bucket.
```
failed to write tenant index
```

## TempoNoTenantIndexBuilders

See [Polling Issues](#polling-issues) below for general information.

If a cluster has no tenant index builders then nothing is refreshing the per-tenant indexes. This is dangerous
because other components will not be aware there is an issue as they repeatedly download a stale tenant index. In Tempo the compactors
play the role of building the tenant index. Ways to address this issue, in order of preference:

- Find and forget all unhealthy compactors.
- Increase the number of compactors that attempt to build the index.
```
storage:
trace:
blocklist_poll_tenant_index_builders: 2 # <- increase this value
```
- Delete tenant index files to force other components to fallback to scanning the entire bucket. They are located at
`/<tenant>/index.json.gz`
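A hypothetical sketch of that deletion for a GCS backend. The bucket name is a placeholder and `gsutil` is an assumption about your tooling; use the equivalent object-delete command for s3/azure:

```shell
# Force fallback for one tenant by deleting its index object.
# The tenant index lives at <tenant>/index.json.gz in the bucket.
TENANT="single-tenant"
BUCKET="my-tempo-bucket"   # assumption: your backend bucket name
OBJECT="${TENANT}/index.json.gz"
echo "would delete: gs://${BUCKET}/${OBJECT}"
# gsutil rm "gs://${BUCKET}/${OBJECT}"
```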

## TempoTenantIndexTooOld

See [Polling Issues](#polling-issues) below for general information.

If the tenant indexes are too old, review the compactor logs to determine why they are failing to update. Compactors
with `tempodb_blocklist_tenant_index_builder` set to 1 are expected to be creating the tenant indexes and should be checked
first. If no compactors are creating tenant indexes, refer to [TempoNoTenantIndexBuilders](#temponotenantindexbuilders) above.

Additionally, the metric `tempodb_blocklist_tenant_index_age_seconds` can be grouped by the `tenant` label. If only one (or a few)
indexes are lagging, those index files can be deleted to force the affected components to fall back to scanning the bucket.
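For example, a query along these lines (illustrative PromQL, using the 600-second threshold from the mixin config) surfaces the lagging tenants:

```
max by (tenant) (tempodb_blocklist_tenant_index_age_seconds) > 600
```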

### Polling Issues

Intermittent polling issues are not concerning; sustained polling issues need to be addressed.

Failure to poll just means that the component is not aware of the current state of the backend but will continue working
otherwise. Queriers, for instance, will start returning 404s as their internal representation of the backend grows stale.
Compactors will attempt to compact blocks that don't exist.
4 changes: 4 additions & 0 deletions tempodb/backend/backend.go
@@ -29,6 +29,8 @@ type Writer interface {
Append(ctx context.Context, name string, blockID uuid.UUID, tenantID string, tracker AppendTracker, buffer []byte) (AppendTracker, error)
// Closes any resources associated with the AppendTracker
CloseAppend(ctx context.Context, tracker AppendTracker) error
// WriteTenantIndex writes the two meta slices as a tenant index
WriteTenantIndex(ctx context.Context, tenantID string, meta []*BlockMeta, compactedMeta []*CompactedBlockMeta) error
}

// Reader is a collection of methods to read data from tempodb backends
@@ -45,6 +47,8 @@
Blocks(ctx context.Context, tenantID string) ([]uuid.UUID, error)
// BlockMeta returns the blockmeta given a block and tenant id
BlockMeta(ctx context.Context, blockID uuid.UUID, tenantID string) (*BlockMeta, error)
// TenantIndex returns lists of all metas given a tenant
TenantIndex(ctx context.Context, tenantID string) (*TenantIndex, error)
// Shutdown shuts...down?
Shutdown()
}
2 changes: 1 addition & 1 deletion tempodb/backend/block_meta.go
@@ -10,7 +10,7 @@ import (
type CompactedBlockMeta struct {
BlockMeta

CompactedTime time.Time `json:"-"`
CompactedTime time.Time `json:"compactedTime"` // jpe test/consider this super scary change
}

type BlockMeta struct {