Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new option in debug tool to get a histogram of key and value sizes. #2844

Merged
merged 3 commits into from
Dec 27, 2018

Conversation

martinmr
Copy link
Contributor

@martinmr martinmr commented Dec 25, 2018

The new option can be accessed via the --histogram flag. It will print a
histogram of the key and value sizes, as well as the min, max, mean and
standard deviation of their distributions.


This change is Reviewable

@CLAassistant
Copy link

CLAassistant commented Dec 25, 2018

CLA assistant check
All committers have signed the CLA.

@martinmr
Copy link
Contributor Author

Example output below (fyi, it looks justified in the terminal but not when I pasted it here):

Opening DB: ./p
badger2018/12/25 02:41:37 INFO: Replaying file id: 2 at offset: 60334520
badger2018/12/25 02:41:37 INFO: Replay took: 21.41µs
prefix =
Found 1036362 keys

Histogram of key sizes (in bytes)
Min value: 6
Max value: 33
Mean: 23.51
Standard deviation: 5.69
Range Count
[ 0, 32) 910747
[ 32, 64) 125615

Histogram of value sizes (in bytes)
Min value: 0
Max value: 2432
Mean: 37.71
Standard deviation: 45.32
Range Count
[ 0, 32) 635819
[ 32, 64) 261183
[ 64, 128) 131757
[ 128, 256) 5804
[ 256, 512) 850
[ 512, 1024) 690
[ 1024, 2048) 250
[ 2048, 4096) 9

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 8 unresolved discussions (waiting on @martinmr and @manishrjain)


dgraph/cmd/debug/run.go, line 82 at r1 (raw file):

	flag.BoolVarP(&opt.keyHistory, "history", "y", false, "Show all versions of a key.")
	flag.StringVarP(&opt.pdir, "postings", "p", "", "Directory where posting lists are stored.")
	flag.BoolVar(&opt.sizeHistogram, "histogram", false, "Show a histogram of the key and value sizes.")

100 chars.


dgraph/cmd/debug/run.go, line 530 at r1 (raw file):

// Return a new instance of HistogramData with properly initialized fields.
func NewHistogramData(bounds []float64) *HistogramData {
	histogram := new(HistogramData)

Can also be written as:

return &HistogramData{
Bounds: bounds,
Min: math.MaxInt64,
}

Slightly easier to read.


dgraph/cmd/debug/run.go, line 539 at r1 (raw file):

// Update the Min and Max fields if value is less than or greater than the
// current values.
func (histogram *HistogramData) UpdateMinMax(value int64) {

Also, find the right bucket, and +1 it.


dgraph/cmd/debug/run.go, line 555 at r1 (raw file):

// Simple exporter function that copies the view data to the exporter.
func (exporter HistogramExporter) ExportView(viewData *view.Data) {

Remove all this.


dgraph/cmd/debug/run.go, line 611 at r1 (raw file):

func sizeHistogram(db *badger.DB) {
	txn := db.NewTransactionAt(math.MaxUint64, false)

Use opt.readTs instead of maxuint64.


dgraph/cmd/debug/run.go, line 628 at r1 (raw file):

	keySizeMeasure := stats.Int64("debug/key_sizes",
		"The size of the keys in bytes", "1")
	keySizeView := &view.View{

Get rid of the opencensus stuff.


dgraph/cmd/debug/run.go, line 673 at r1 (raw file):

		keySize := int64(len(item.Key()))
		valueSize := item.ValueSize()
		keySizeHistogram.UpdateMinMax(keySize)

.Update


dgraph/cmd/debug/run.go, line 684 at r1 (raw file):

	// printing.
	sleepDuration, _ := time.ParseDuration("500ms")
	time.Sleep(sleepDuration)

time.Sleep(500*time.Millisecond), or better still time.Sleep(time.Second).

var (
Debug x.SubCommand
opt flagOptions
keySizeViewName = "debug/key_size_histogram"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keySizeViewName is unused (from varcheck)

Debug x.SubCommand
opt flagOptions
keySizeViewName = "debug/key_size_histogram"
valueSizeViewName = "debug/value_size_histogram"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valueSizeViewName is unused (from varcheck)

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 10 unresolved discussions (waiting on @manishrjain and @golangcibot)


dgraph/cmd/debug/run.go, line 82 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

100 chars.

Done.


dgraph/cmd/debug/run.go, line 530 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Can also be written as:

return &HistogramData{
Bounds: bounds,
Min: math.MaxInt64,
}

Slightly easier to read.

Done.


dgraph/cmd/debug/run.go, line 539 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Also, find the right bucket, and +1 it.

Done.


dgraph/cmd/debug/run.go, line 555 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove all this.

Done.


dgraph/cmd/debug/run.go, line 611 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Use opt.readTs instead of maxuint64.

Done.


dgraph/cmd/debug/run.go, line 628 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Get rid of the opencensus stuff.

Done.


dgraph/cmd/debug/run.go, line 673 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

.Update

Done.


dgraph/cmd/debug/run.go, line 684 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

time.Sleep(500*time.Millisecond), or better still time.Sleep(time.Second).

Done . Got rid of all the opencensus stuff so this is no longer needed.


dgraph/cmd/debug/run.go, line 43 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

keySizeViewName is unused (from varcheck)

Done.


dgraph/cmd/debug/run.go, line 44 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

valueSizeViewName is unused (from varcheck)

Done.


var opt flagOptions
var (
Debug x.SubCommand

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File is not goimports-ed (from goimports)

@martinmr
Copy link
Contributor Author

I compared the output with the previous version using opencensus to verify that they are the same.

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 11 unresolved discussions (waiting on @manishrjain and @golangcibot)


dgraph/cmd/debug/run.go, line 41 at r3 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

File is not goimports-ed (from goimports)

This was due to a formatting issue. I formatted the file and the warning went away.

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 11 unresolved discussions (waiting on @manishrjain and @golangcibot)


dgraph/cmd/debug/run.go, line 41 at r3 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

This was due to a formatting issue. I formatted the file and the warning went away.

Done.

The new option can be accessed via the --histogram flag. It will print a
histogram of the key and value sizes, as well as the min, max, mean and
standard deviation of their distributions.
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: Looks good. Got a few comments. Address them and feel free to merge.

Reviewed 1 of 1 files at r5.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @manishrjain, @golangcibot, and @martinmr)


dgraph/cmd/debug/run.go, line 562 at r5 (raw file):

	fmt.Printf("Min value: %d\n", histogram.Min)
	fmt.Printf("Max value: %d\n", histogram.Max)
	fmt.Printf("Mean: %.2f\n", histogram.Mean)

calculate the mean here.


dgraph/cmd/debug/run.go, line 617 at r5 (raw file):

		item := itr.Item()

		keySize := int64(len(item.Key()))

keySize := ...
keyHistogram.Update(keySize)

valSize := ...
valHistogram.Update(valSize)

Define the variable as close as possible to usage.


dgraph/cmd/debug/run.go, line 620 at r5 (raw file):

		valueSize := item.ValueSize()
		keySizeHistogram.Update(keySize)
		valueSizeHistogram.Update(valueSize)

valueSizeHistogra

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @manishrjain and @golangcibot)


dgraph/cmd/debug/run.go, line 562 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

calculate the mean here.

Done.


dgraph/cmd/debug/run.go, line 617 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

keySize := ...
keyHistogram.Update(keySize)

valSize := ...
valHistogram.Update(valSize)

Define the variable as close as possible to usage.

Done.


dgraph/cmd/debug/run.go, line 620 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

valueSizeHistogra

Done.

@martinmr martinmr merged commit 320fb1c into master Dec 27, 2018
@martinmr martinmr deleted the martinmr/histogram branch December 27, 2018 23:53
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
…s. (dgraph-io#2844)

The new option can be accessed via the --histogram flag. It will print a
histogram of the key and value sizes, as well as the min, max, mean and
standard deviation of their distributions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants