Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[THOG-327] Resuse the strings builder when writing each fragment. #535

Closed
wants to merge 1 commit into from

Conversation

ahrav
Copy link
Collaborator

@ahrav ahrav commented May 10, 2022

Following up #534 reusing the strings.builder does provide a pretty nice speedup during the gitscan.

Rough test: (as a quick rough estimate using a decent sized repo)
Screen Shot 2022-05-09 at 7 45 46 PM

NOTE: ideally we can get some even better benchmarks, but for now this looks to be quite promising.

@ahrav ahrav requested a review from a team May 10, 2022 02:53
Copy link
Collaborator

@mcastorina mcastorina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -349,6 +349,7 @@ func (s *Git) ScanCommits(repo *git.Repository, path string, scanOptions *ScanOp
Data: []byte(sb.String()),
Verify: s.verify,
}
sb.Reset()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets the buffer to nil on strings.Builder so it doesn't actually prevent reallocation of the buffer like bytes.Buffer's Reset() does.

The total run time from your test is misleading. Notice the user CPU time is actually greater on the second test. Hard to say what the performance difference is with this change, but I'm guessing it's probably not much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I was curious why it doesn't do the same b.buf = b.buf[:0] for the string builder and found this: golang/go#24716

The returned string data is the []byte and so reusing it would mean modifying a previous returned string (which should be immutable).

Copy link
Collaborator Author

@ahrav ahrav May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why there isn't say a Clear() method on the Builder? I think that could be potentially useful in order to keep that underlying array in memory vs it getting re-allocated.

Alas i'll poke at this a bit more using bytes.buffer w/ reset see what we can conjure up. I'll instead also maybe benchmark only the method in question to see if we can get some more accurate benchmark values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it looks like a design tradoff in the strings builder. They opted for being able to output a string from bytes with zero allocations instead of being able to reuse the buffer: https://cs.opensource.google/go/go/+/refs/tags/go1.18.1:src/strings/builder.go;l=47

@ahrav ahrav closed this May 10, 2022
@ahrav ahrav deleted the THOG-327-reuse-strings-builder branch May 10, 2022 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants