-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[THOG-327] Resuse the strings builder when writing each fragment. #535
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -349,6 +349,7 @@ func (s *Git) ScanCommits(repo *git.Repository, path string, scanOptions *ScanOp | |||
Data: []byte(sb.String()), | |||
Verify: s.verify, | |||
} | |||
sb.Reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sets the buffer to nil on strings.Builder so it doesn't actually prevent reallocation of the buffer like bytes.Buffer's Reset() does.
The total run time from your test is misleading. Notice the user CPU time is actually greater on the second test. Hard to say what the performance difference is with this change, but I'm guessing it's probably not much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and for comparison, the bytes.Buffer Reset(): https://cs.opensource.google/go/go/+/refs/tags/go1.18.1:src/bytes/buffer.go;l=97
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I was curious why it doesn't do the same b.buf = b.buf[:0]
for the string builder and found this: golang/go#24716
The returned string
data is the []byte
and so reusing it would mean modifying a previous returned string
(which should be immutable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why there isn't say a Clear()
method on the Builder? I think that could be potentially useful in order to keep that underlying array in memory vs it getting re-allocated.
Alas i'll poke at this a bit more using bytes.buffer w/ reset see what we can conjure up. I'll instead also maybe benchmark only the method in question to see if we can get some more accurate benchmark values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it looks like a design tradoff in the strings builder. They opted for being able to output a string from bytes with zero allocations instead of being able to reuse the buffer: https://cs.opensource.google/go/go/+/refs/tags/go1.18.1:src/strings/builder.go;l=47
Following up #534 reusing the strings.builder does provide a pretty nice speedup during the gitscan.
Rough test: (as a quick rough estimate using a decent sized repo)
NOTE: ideally we can get some even better benchmarks, but for now this looks to be quite promising.