Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index out of range at completing block log message #967

Closed
zalegrala opened this issue Sep 17, 2021 · 8 comments · Fixed by #968
Closed

Index out of range at completing block log message #967

zalegrala opened this issue Sep 17, 2021 · 8 comments · Fixed by #968

Comments

@zalegrala
Copy link
Contributor

Describe the bug

It appears that when a block is completing, we have an index out of range panic.

level=info ts=2021-09-17T15:08:48.719429979Z caller=compactor.go:104 msg="enabling compaction"
level=info ts=2021-09-17T15:08:48.719564092Z caller=tempodb.go:368 msg="compaction and retention enabled."
level=info ts=2021-09-17T15:09:50.54346294Z caller=frontend.go:109 tenant=single-tenant method=GET traceID=49c4f409f3d3da11 url=/api/traces/5cfd75829f7636034774f39fd6294859 duration=58.658035ms response_size=12225 status=200
level=info ts=2021-09-17T15:10:20.505573808Z caller=frontend.go:109 tenant=single-tenant method=GET traceID=34b75bfdabe97a08 url=/api/traces/2640357a799d655265fe5791c5734dbb duration=23.859058ms response_size=7949 status=200
level=info ts=2021-09-17T15:10:50.522667639Z caller=frontend.go:109 tenant=single-tenant method=GET traceID=528a543f686516df url=/api/traces/0c360f4ee129bece3f5a3fab9884c732 duration=42.164202ms response_size=19854 status=200
level=info ts=2021-09-17T15:11:08.726815404Z caller=flush.go:157 msg="head block cut. enqueueing flush op" userid=single-tenant block=5ec589ad-0528-4b9c-84bd-13f99a994b5f
level=info ts=2021-09-17T15:11:17.624818984Z caller=flush.go:232 msg="completing block" userid=single-tenant blockID=5ec589ad-0528-4b9c-84bd-13f99a994b5f
panic: runtime error: index out of range [3] with length 0

goroutine 473 [running]:
github.com/google/flatbuffers/go.GetUint32(...)
	/home/zach/go/src/github.com/grafana/tempo/vendor/github.com/google/flatbuffers/go/encode.go:47
github.com/google/flatbuffers/go.GetUOffsetT(...)
	/home/zach/go/src/github.com/grafana/tempo/vendor/github.com/google/flatbuffers/go/encode.go:121
github.com/grafana/tempo/pkg/tempofb.GetRootAsSearchEntry(...)
	/home/zach/go/src/github.com/grafana/tempo/pkg/tempofb/SearchEntry.go:14
github.com/grafana/tempo/pkg/tempofb.SearchEntryFromBytes(...)
	/home/zach/go/src/github.com/grafana/tempo/pkg/tempofb/searchdata_util.go:257
github.com/grafana/tempo/tempodb/search.(*DataCombiner).Combine(0xc00257b230, {0x28, 0x266f400}, {0xc00257b260, 0x2, 0x185d9df})
	/home/zach/go/src/github.com/grafana/tempo/tempodb/search/data_combiner.go:28 +0x3a6
github.com/grafana/tempo/tempodb/encoding.(*dedupingIterator).Next(0xc0022bc660, {0x26b7098, 0xc000052038})
	/home/zach/go/src/github.com/grafana/tempo/tempodb/encoding/iterator_deduping.go:61 +0x237
github.com/grafana/tempo/tempodb/search.NewBackendSearchBlock(0xc0004fef60, 0xc00013c4d0, {0x5e, 0xc5, 0x89, 0xad, 0x5, 0x28, 0x4b, 0x9c, ...}, ...)
	/home/zach/go/src/github.com/grafana/tempo/tempodb/search/backend_search_block.go:64 +0x2f1
github.com/grafana/tempo/modules/ingester.(*instance).CompleteBlock(0xc0018b4480, {0x5e, 0xc5, 0x89, 0xad, 0x5, 0x28, 0x4b, 0x9c, 0x84, ...})
	/home/zach/go/src/github.com/grafana/tempo/modules/ingester/instance.go:272 +0x269
github.com/grafana/tempo/modules/ingester.(*Ingester).handleComplete(0xc0004c2c00, 0xc0017f7540)
	/home/zach/go/src/github.com/grafana/tempo/modules/ingester/flush.go:238 +0x25c
github.com/grafana/tempo/modules/ingester.(*Ingester).flushLoop(0xc0004c2c00, 0x1)
	/home/zach/go/src/github.com/grafana/tempo/modules/ingester/flush.go:195 +0xc7
created by github.com/grafana/tempo/modules/ingester.New
	/home/zach/go/src/github.com/grafana/tempo/modules/ingester/ingester.go:78 +0x1e9

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo 3108978d configured with memberlist and s3 backend pointed to minio
  2. Start Vulture to send and query traces
  3. Wait for completing block to happen

Expected behavior
Not panic.

Environment:

  • Infrastructure: laptop
  • Deployment tool: docker-compose

Additional Context

@zalegrala
Copy link
Contributor Author

I had memberlist configured, but due to the way the code is currently in main, I don't think it is taking effect. The logs seem to indicate that inmemory is in use.

level=info ts=2021-09-17T15:19:38.027865616Z caller=lifecycler.go:560 msg="instance not found in ring, adding with no tokens" ring=ingester
level=info ts=2021-09-17T15:19:38.028147558Z caller=lifecycler.go:407 msg="auto-joining cluster after timeout" ring=ingester

@annanay25
Copy link
Contributor

This is very likely related to #940. Will dig into it.

@annanay25
Copy link
Contributor

tempo_1                     | level=info ts=2021-09-17T15:32:00.8998625Z caller=main.go:189 msg="initialising OpenTracing tracer"
tempo_1                     | level=info ts=2021-09-17T15:32:00.9028193Z caller=main.go:108 msg="Starting Tempo" version="(version=main-3108978, branch=main, revision=3108978d)"
tempo_1                     | level=info ts=2021-09-17T15:32:00.9165684Z caller=server.go:260 http=[::]:3200 grpc=[::]:9095 msg="server listening on addresses"
tempo_1                     | level=info ts=2021-09-17T15:32:00.9183584Z caller=frontend.go:36 msg="creating tripperware in query frontend"
.
.
.
tempo_1                     | level=info ts=2021-09-17T15:36:00.7442605Z caller=compactor.go:97 msg="compaction cycle complete. No more blocks to compact" tenantID=single-tenant
tempo_1                     | level=info ts=2021-09-17T15:36:00.9597497Z caller=flush.go:157 msg="head block cut. enqueueing flush op" userid=single-tenant block=8e087d82-acd4-4df0-a0f9-bc9b3164f703
tempo_1                     | level=info ts=2021-09-17T15:36:07.0363883Z caller=flush.go:232 msg="completing block" userid=single-tenant blockID=8e087d82-acd4-4df0-a0f9-bc9b3164f703
tempo_1                     | level=info ts=2021-09-17T15:36:07.7684297Z caller=flush.go:239 msg="block completed" userid=single-tenant blockID=8e087d82-acd4-4df0-a0f9-bc9b3164f703 duration=732.0544ms
tempo_1                     | level=info ts=2021-09-17T15:36:07.7764808Z caller=flush.go:288 msg="flushing block" userid=single-tenant block=8e087d82-acd4-4df0-a0f9-bc9b3164f703
tempo_1                     | level=info ts=2021-09-17T15:36:30.7152507Z caller=compactor.go:93 msg="starting compaction cycle" tenantID=single-tenant offset=0
tempo_1                     | level=info ts=2021-09-17T15:36:30.7154144Z caller=compactor.go:97 msg="compaction cycle complete. No more blocks to compact" tenantID=single-tenant

Unable to reproduce. :/ Can you dump your version ^ like the first few lines of this snippet as well as mention which docker-compose you are using?

@zalegrala
Copy link
Contributor Author

Here is the full log for the tempo instance that crashed.

❯ docker logs 5e647fd27501
level=info ts=2021-09-17T15:48:32.757354067Z caller=main.go:189 msg="initialising OpenTracing tracer"
level=info ts=2021-09-17T15:48:32.763389678Z caller=main.go:108 msg="Starting Tempo" version="(version=main-3108978, branch=main, revision=3108978d)"
level=info ts=2021-09-17T15:48:32.764156026Z caller=server.go:260 http=[::]:3200 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2021-09-17T15:48:32.781912938Z caller=frontend.go:36 msg="creating tripperware in query frontend"
ts=2021-09-17T15:48:32Z level=info msg="OTel Shim Logger Initialized" component=tempo
level=warn ts=2021-09-17T15:48:32.785786345Z caller=modules.go:149 msg="Worker address is empty in single binary mode.  Attempting automatic worker configuration.  If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095
level=info ts=2021-09-17T15:48:32.787026032Z caller=worker.go:113 msg="Starting querier worker connected to query-frontend" frontend=127.0.0.1:9095
level=info ts=2021-09-17T15:48:32.790345221Z caller=module_service.go:64 msg=initialising module=store
level=info ts=2021-09-17T15:48:32.789030728Z caller=module_service.go:64 msg=initialising module=server
level=info ts=2021-09-17T15:48:32.791479596Z caller=module_service.go:64 msg=initialising module=query-frontend
level=info ts=2021-09-17T15:48:32.791480113Z caller=module_service.go:64 msg=initialising module=memberlist-kv
level=info ts=2021-09-17T15:48:32.792031476Z caller=module_service.go:64 msg=initialising module=ring
level=info ts=2021-09-17T15:48:32.792145668Z caller=module_service.go:64 msg=initialising module=overrides
level=info ts=2021-09-17T15:48:32.792424789Z caller=module_service.go:64 msg=initialising module=compactor
level=info ts=2021-09-17T15:48:32.792385338Z caller=module_service.go:64 msg=initialising module=ingester
level=info ts=2021-09-17T15:48:32.792865306Z caller=ring.go:276 msg="ring doesn't exist in KV store yet"
level=info ts=2021-09-17T15:48:32.793079239Z caller=client.go:254 msg="value is nil" key=collectors/ring index=1
level=info ts=2021-09-17T15:48:32.793191111Z caller=module_service.go:64 msg=initialising module=distributor
level=info ts=2021-09-17T15:48:32.793242288Z caller=module_service.go:64 msg=initialising module=querier
level=info ts=2021-09-17T15:48:32.793575128Z caller=worker.go:193 msg="adding connection" addr=127.0.0.1:9095
level=info ts=2021-09-17T15:48:32.792650062Z caller=tempodb.go:388 msg="polling enabled" interval=5m0s concurrency=50
level=info ts=2021-09-17T15:48:32.794305437Z caller=ingester.go:331 msg="beginning wal replay"
level=info ts=2021-09-17T15:48:32.794565429Z caller=ingester.go:366 msg="wal replay complete"
level=info ts=2021-09-17T15:48:32.794853775Z caller=ingester.go:380 msg="reloading local blocks" tenants=0
level=info ts=2021-09-17T15:48:32.795072127Z caller=lifecycler.go:531 msg="not loading tokens from file, tokens file path is empty"
level=info ts=2021-09-17T15:48:32.795673371Z caller=lifecycler.go:560 msg="instance not found in ring, adding with no tokens" ring=ingester
level=info ts=2021-09-17T15:48:32.796467603Z caller=lifecycler.go:407 msg="auto-joining cluster after timeout" ring=ingester
level=info ts=2021-09-17T15:48:32.805126117Z caller=compactor.go:102 msg="waiting for compaction ring to settle" waitDuration=1m0s
ts=2021-09-17T15:48:32Z level=info msg="No sampling strategies provided or URL is unavailable, using defaults" component=tempo
ts=2021-09-17T15:48:32Z level=info msg="Starting GRPC server on endpoint 0.0.0.0:4317" component=tempo
ts=2021-09-17T15:48:32Z level=info msg="Setting up a second GRPC listener on legacy endpoint 0.0.0.0:55680" component=tempo
ts=2021-09-17T15:48:32Z level=info msg="Starting GRPC server on endpoint 0.0.0.0:55680" component=tempo
ts=2021-09-17T15:48:32Z level=info msg="Starting HTTP server on endpoint 0.0.0.0:55681" component=tempo
level=info ts=2021-09-17T15:48:32.806643132Z caller=app.go:257 msg="Tempo started"
level=info ts=2021-09-17T15:49:32.806263787Z caller=compactor.go:104 msg="enabling compaction"
level=info ts=2021-09-17T15:49:32.806371343Z caller=tempodb.go:368 msg="compaction and retention enabled."
level=info ts=2021-09-17T15:51:05.294150726Z caller=frontend.go:109 tenant=single-tenant method=GET traceID=6fccd2c48b131822 url=/api/traces/2738b28affc3bd5c6d8f68946d106b52 duration=28.971397ms response_size=15131 status=200
level=info ts=2021-09-17T15:51:42.812531478Z caller=flush.go:157 msg="head block cut. enqueueing flush op" userid=single-tenant block=afa931f6-b332-4815-8416-e66afd6b1027
level=info ts=2021-09-17T15:51:48.569306563Z caller=flush.go:232 msg="completing block" userid=single-tenant blockID=afa931f6-b332-4815-8416-e66afd6b1027
panic: runtime error: index out of range [3] with length 0

goroutine 474 [running]:
github.com/google/flatbuffers/go.GetUint32(...)
	/home/zach/go/src/github.com/grafana/tempo/vendor/github.com/google/flatbuffers/go/encode.go:47
github.com/google/flatbuffers/go.GetUOffsetT(...)
	/home/zach/go/src/github.com/grafana/tempo/vendor/github.com/google/flatbuffers/go/encode.go:121
github.com/grafana/tempo/pkg/tempofb.GetRootAsSearchEntry(...)
	/home/zach/go/src/github.com/grafana/tempo/pkg/tempofb/SearchEntry.go:14
github.com/grafana/tempo/pkg/tempofb.SearchEntryFromBytes(...)
	/home/zach/go/src/github.com/grafana/tempo/pkg/tempofb/searchdata_util.go:257
github.com/grafana/tempo/tempodb/search.(*DataCombiner).Combine(0xc00189e660, {0x28, 0x266f3c0}, {0xc00189e6f0, 0x2, 0x185d9df})
	/home/zach/go/src/github.com/grafana/tempo/tempodb/search/data_combiner.go:28 +0x3a6
github.com/grafana/tempo/tempodb/encoding.(*dedupingIterator).Next(0xc001677ec0, {0x26b7058, 0xc0001a6008})
	/home/zach/go/src/github.com/grafana/tempo/tempodb/encoding/iterator_deduping.go:61 +0x237
github.com/grafana/tempo/tempodb/search.NewBackendSearchBlock(0xc000102e40, 0xc0001b4ea0, {0xaf, 0xa9, 0x31, 0xf6, 0xb3, 0x32, 0x48, 0x15, ...}, ...)
	/home/zach/go/src/github.com/grafana/tempo/tempodb/search/backend_search_block.go:64 +0x2f1
github.com/grafana/tempo/modules/ingester.(*instance).CompleteBlock(0xc000a92a20, {0xaf, 0xa9, 0x31, 0xf6, 0xb3, 0x32, 0x48, 0x15, 0x84, ...})
	/home/zach/go/src/github.com/grafana/tempo/modules/ingester/instance.go:272 +0x269
github.com/grafana/tempo/modules/ingester.(*Ingester).handleComplete(0xc000aa0000, 0xc00251cf00)
	/home/zach/go/src/github.com/grafana/tempo/modules/ingester/flush.go:238 +0x25c
github.com/grafana/tempo/modules/ingester.(*Ingester).flushLoop(0xc000aa0000, 0x1)
	/home/zach/go/src/github.com/grafana/tempo/modules/ingester/flush.go:195 +0xc7
created by github.com/grafana/tempo/modules/ingester.New
	/home/zach/go/src/github.com/grafana/tempo/modules/ingester/ingester.go:78 +0x1e9

I've just committed the docker-compose I'm using here: https://github.com/zalegrala/tempo/tree/haOperationalRunMode/example/docker-compose/ha

@baiwfg2
Copy link

baiwfg2 commented Sep 18, 2021

I met with the same situation ( commit: 3108978)

tempo_1                     | panic: runtime error: index out of range [3] with length 0
tempo_1                     | 
tempo_1                     | goroutine 444 [running]:
tempo_1                     | github.com/google/flatbuffers/go.GetUint32(...)
tempo_1                     |   /drone/src/vendor/github.com/google/flatbuffers/go/encode.go:47
tempo_1                     | github.com/google/flatbuffers/go.GetUOffsetT(...)
tempo_1                     |   /drone/src/vendor/github.com/google/flatbuffers/go/encode.go:121
tempo_1                     | github.com/grafana/tempo/pkg/tempofb.GetRootAsSearchEntry(...)
tempo_1                     |   /drone/src/pkg/tempofb/SearchEntry.go:14
tempo_1                     | github.com/grafana/tempo/pkg/tempofb.SearchEntryFromBytes(...)
tempo_1                     |   /drone/src/pkg/tempofb/searchdata_util.go:257
tempo_1                     | github.com/grafana/tempo/tempodb/search.(*DataCombiner).Combine(0xc0016e4b10, {0x28, 0x266c6e0}, {0xc0016e4b40, 0x2, 0x185d9df})
tempo_1                     |   /drone/src/tempodb/search/data_combiner.go:28 +0x3a6
tempo_1                     | github.com/grafana/tempo/tempodb/encoding.(*dedupingIterator).Next(0xc0019202a0, {0x26b4378, 0xc000054038})
tempo_1                     |   /drone/src/tempodb/encoding/iterator_deduping.go:61 +0x237
tempo_1                     | github.com/grafana/tempo/tempodb/search.NewBackendSearchBlock(0xc000b888a0, 0xc00058a408, {0x88, 0x45, 0x36, 0x46, 0xb5, 0x31, 0x49, 0x4c, ...}, ...)
tempo_1                     |   /drone/src/tempodb/search/backend_search_block.go:64 +0x2f1
tempo_1                     | github.com/grafana/tempo/modules/ingester.(*instance).CompleteBlock(0xc000cb2360, {0x88, 0x45, 0x36, 0x46, 0xb5, 0x31, 0x49, 0x4c, 0x80, ...})
tempo_1                     |   /drone/src/modules/ingester/instance.go:272 +0x269
tempo_1                     | github.com/grafana/tempo/modules/ingester.(*Ingester).handleComplete(0xc00040bb00, 0xc000627630)
tempo_1                     |   /drone/src/modules/ingester/flush.go:238 +0x25c
tempo_1                     | github.com/grafana/tempo/modules/ingester.(*Ingester).flushLoop(0xc00040bb00, 0x3)
tempo_1                     |   /drone/src/modules/ingester/flush.go:195 +0xc7
tempo_1                     | created by github.com/grafana/tempo/modules/ingester.New
tempo_1                     |   /drone/src/modules/ingester/ingester.go:78 +0x1e9

@annanay25
Copy link
Contributor

OK. Able to reproduce this error, its happening in the case where search is not enabled. I missed that. Will submit a patch soon.

@annanay25
Copy link
Contributor

Thanks for reporting this @zalegrala, caught an evasive bug!

@zalegrala
Copy link
Contributor Author

Thanks for the fix @annanay25!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants