
Add support for vulture sending long running traces #951

Merged (5 commits) Sep 22, 2021

Conversation

@zalegrala (Contributor) commented Sep 13, 2021

What this PR does:

Here we implement the ability for the vulture to send long-running traces, which will enable the vulture to verify the additional infrastructure components required to resolve a trace.

Which issue(s) this PR fixes:
Fixes #791

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala (Contributor, Author):

This will require #944

@zalegrala force-pushed the vultureLongWrite branch 3 times, most recently from 75a3c00 to 3c21291, on September 15, 2021 at 17:57
@zalegrala (Contributor, Author) commented Sep 15, 2021

This is working pretty well locally. There are two conditions that I currently know of that could result in false positives.

The first is that the number of bytes in a trace may exceed Tempo's configured limit, which is hard to know unless we query the API for the trace size. Perhaps some work on that API should precede this so we can work with JSON. In lieu of working on the API and reading the configured value from Tempo, I've reduced the chance that a trace will be extended with each future amendment. This seems like a reasonable approach, since with each extension of a trace the chance of extending it yet again continues to diminish.

The second is that when we go to read what we expect the result to be, we are effectively counting the number of writes that we expect have taken place. There is a margin of error where, if the timing worked out just right, the incorrect-result counter would rise falsely. I've added a dirty little sleep that should reduce the chance of that happening when it looks like we are within that margin of error. I don't feel great about it, but it probably works fine.

I'll leave it running locally for a while and see where we're at.
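A minimal sketch of the diminishing-extension idea described above, assuming a halving scheme and a hypothetical maybeExtend helper (neither is taken from the PR itself):

```go
package vulture // hypothetical package name for this sketch

import (
	"math"
	"math/rand"
)

// maybeExtend illustrates the diminishing-chance idea: the more times a
// trace has already been extended, the smaller the probability that it is
// extended again. The exact halving scheme is an assumption for illustration.
func maybeExtend(r *rand.Rand, extensions int) bool {
	// 1/2 chance of a first extension, 1/4 of a second, 1/8 of a third, ...
	return r.Float64() < 1.0/math.Pow(2, float64(extensions+1))
}
```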

@zalegrala force-pushed the vultureLongWrite branch 2 times, most recently from 11ff31c to 2688012, on September 15, 2021 at 20:09
@zalegrala marked this pull request as ready for review on September 15, 2021 at 21:38
zap.Int64("seed", info.timestamp.Unix()),
)

if maybe(info.r) {

Member:

i'd like to look at a better way to do this. perhaps we could generate a "total batches" int from the random number generator and store it in traceinfo. then we could decrement that value each time we emitted a batch?

Contributor Author:

Good call, I think that cleans up the readability a bit. I've made the update.
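A sketch of the total-batches approach under assumed names (the traceInfo fields, newTraceInfo, emitBatch, and the maxBatchesPerTrace value are all illustrative, not the PR's actual code):

```go
package vulture // hypothetical package name for this sketch

import (
	"math/rand"
	"time"
)

const maxBatchesPerTrace = 25 // placeholder cap, mirroring the 25-batch limit discussed below

// traceInfo stands in for the vulture's per-trace bookkeeping.
type traceInfo struct {
	timestamp        time.Time
	r                *rand.Rand
	totalBatches     int64
	batchesRemaining int64
}

// newTraceInfo seeds the RNG from the trace timestamp and draws the total
// batch count once, so the same seed always reproduces the same count.
func newTraceInfo(ts time.Time) *traceInfo {
	r := rand.New(rand.NewSource(ts.Unix()))
	total := r.Int63n(maxBatchesPerTrace) + 1 // at least one batch
	return &traceInfo{timestamp: ts, r: r, totalBatches: total, batchesRemaining: total}
}

// emitBatch reports whether another batch should be written and decrements
// the remaining count when it is.
func (t *traceInfo) emitBatch() bool {
	if t.batchesRemaining <= 0 {
		return false
	}
	t.batchesRemaining--
	return true
}
```

Because the count is drawn from the seeded generator, a reader that knows the seed can recompute the expected number of batches without tracking writer state.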

// If the last write has happened very recently, we'll wait a bit to make
// sure the write has taken place, and then add the batches that would have
// just been written.
if time.Since(lastWrite) < (1 * time.Second) {

Member:

this is pretty janky.

from a seed we should be able to determine when a trace was completed, and then either query the trace or skip it if we think it is not yet complete.

putting sleeps here is going to weirdly impact the timings of other loops.

Contributor Author:

Alright, I've washed off some of that jank. No more sleeps to construct the expected trace, and we only search/query when we expect that all writes have taken place.
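A sketch of how a seed-based completeness check might look, with assumed parameter names and an assumed fixed write interval (not the PR's actual function):

```go
package vulture // hypothetical package name for this sketch

import "time"

// writesComplete decides, from the trace's seed time, its total batch count,
// and the interval between long writes, whether every write should already
// have happened. The read path only queries Tempo once this returns true,
// so no sleeps are needed to line up with the writer.
func writesComplete(seed time.Time, totalBatches int64, writeInterval time.Duration, now time.Time) bool {
	// The final batch is expected at seed + (totalBatches-1) intervals.
	lastExpectedWrite := seed.Add(time.Duration(totalBatches-1) * writeInterval)
	return now.After(lastExpectedWrite)
}
```

The query loop can then skip any trace for which this returns false instead of sleeping, which avoids disturbing the timing of the other loops.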

// that we get the expected number of batches on a trace. A value larger
// than 25 here results in vulture writing traces that exceed the maximum
// trace size.
batchHighWaterMark int64 = 25

Member:

rename:

maxBatchesPerWrite
maxLongWritesPerTrace
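One way to read the suggestion is as two separate limits; a sketch under that assumption (the 25 comes from the original constant above, while the maxLongWritesPerTrace value is a placeholder):

```go
package vulture // hypothetical package name for this sketch

const (
	// maxBatchesPerWrite caps the batches emitted by a single write.
	// The 25 comes from the original batchHighWaterMark constant.
	maxBatchesPerWrite int64 = 25
	// maxLongWritesPerTrace caps how many long writes extend one trace.
	// The value here is a placeholder, not taken from the PR.
	maxLongWritesPerTrace int64 = 3
)
```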

@joe-elliott merged commit 2498d5b into grafana:main on Sep 22, 2021
@zalegrala deleted the vultureLongWrite branch on September 23, 2021 at 13:12