-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reuse Large Buffers in MigrateSession #623
base: main
Are you sure you want to change the base?
Conversation
29aa345
to
146da1a
Compare
e13374e
to
9d65b87
Compare
18e9c50
to
d34f522
Compare
2ab3868
to
c362324
Compare
c362324
to
dfc86d7
Compare
@@ -224,15 +225,15 @@ private void PopulateObjectStoreStats(StoreWrapper storeWrapper) | |||
]; | |||
} | |||
|
|||
public void PopulateStoreHashDistribution(StoreWrapper storeWrapper) => storeHashDistrInfo = [new("", storeWrapper.store.DumpDistribution())]; | |||
private void PopulateStoreHashDistribution(StoreWrapper storeWrapper) => storeHashDistrInfo = [new("", storeWrapper.store.DumpDistribution())]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to explicitly call out "private" members in the codebase.
libs/common/NetworkBuffers.cs
Outdated
/// <summary> | ||
/// Create a NetworkBuffers instance | ||
/// </summary> | ||
public struct NetworkBuffers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This struct does not do much, can create challenges with single-copy as it is a struct with value semantics, creates another level of indirection everywhere, and generally pervades across the codebase. The main thing it does is to use two allocation sizes, which can be maintained by callers or within LFBP itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments.
/// <summary> | ||
/// Used to free up buffer pool | ||
/// </summary> | ||
public void Purge() => networkBuffers.Purge(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be dangerous to call potentially mutating methods (purge sounds mutating, although its implementation here is safe as it probably passes through into the LFBP class). As this could be a potential copy of the struct, it can create subtle bugs, as well as correctness and maintenance challenges.
libs/common/LightClient.cs
Outdated
@@ -212,7 +211,7 @@ public override void Dispose() | |||
{ | |||
networkSender.ReturnResponseObject(); | |||
networkHandler?.Dispose(); | |||
networkPool.Dispose(); | |||
networkBuffers.Dispose(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't really reliably dispose structs.
@@ -77,6 +77,8 @@ private GarnetClient CreateConnection(string nodeId) | |||
address, | |||
port, | |||
clusterProvider.serverOptions.TlsOptions?.TlsClientOptions, | |||
sendPageSize: 1 << 17, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the 3rd time (so far) I've seen the magic # 17, so best to make it a const int somewhere
client = new GarnetClientSession( | ||
address, | ||
port, | ||
new(Math.Max(131072, opts.IntraThreadParallelism * opts.ValueLength)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just new() by itself (with or without args) requires an extra step to figure it out. Better to prefix with param name or include the type.
this.clusterProvider = clusterProvider; | ||
var bufferSize = 1 << clusterProvider.serverOptions.PageSizeBits(); | ||
this.networkBufferSettings = new NetworkBufferSettings(bufferSize, 1 << 12); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does 12 come from? These magic numbers should be a "const long" or a config
@@ -46,8 +46,10 @@ public AofSyncTaskInfo( | |||
|
|||
public void Dispose() | |||
{ | |||
iter?.Dispose(); | |||
cts.Cancel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a ?
this.clusterProvider = clusterProvider; | ||
this.storeWrapper = clusterProvider.storeWrapper; | ||
|
||
this.networkBufferSettings = new NetworkBufferSettings(1 << 22, 1 << 12); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic numberrrrrrrrs
@@ -61,8 +69,8 @@ public void Return(PoolEntry buffer) | |||
Interlocked.Decrement(ref pool[level].size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs commenting. My first thought on seeing .size was to wonder why we can't just use the queue.Count method. Now it looks like .size is total number of allocations? The PoolLevel field name comments are uninformative.
{ | ||
#if HANGDETECT | ||
if (++count % 10000 == 0) | ||
logger?.LogTrace("Dispose iteration {count}, {activeHandlerCount}", count, activeHandlerCount); | ||
#endif | ||
Thread.Yield(); | ||
} | ||
for (int i = 0; i < numLevels; i++) | ||
for (var i = 0; i < numLevels; i++) | ||
{ | ||
if (pool[i] == null) continue; | ||
while (pool[i].size > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly that .size is total # of allocations which may be > maxEntriesPerLevel, then this can spin forever if there are unReturned items. This might warrant an Assert or logging with an exit. (Holding an unReturned item and calling Dispose() (and Purge()?) is bad anyway, so let's make it easier to catch).
libs/common/NetworkBuffers.cs
Outdated
/// <summary> | ||
/// Initial allocation size for receive network buffer. | ||
/// (NOTE: Receive buffers can automatically grow to accomodate larger payloads.) | ||
/// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good comments. "Accommodate"
/// MigrationManager Buffer Pool | ||
/// </summary> | ||
MM, | ||
/// <summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not Migration, Replication, ServerSocket?
internal sealed unsafe partial class RespServerSession : ServerSessionBase | ||
{ | ||
private bool NetworkBurgeBP() | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"PurgeBP"
This PR tries to improve memory utilization and reduce fragmentation by reusing large buffers that were allocated across different migrate sessions.
The PR includes the following:
Notes:
There is an upper limit on the number of entries per level in the LimitedBufferPool which may cause fragmentation of the LOH due to the way we allocate and return buffers to the pool itself (shown below)
garnet/libs/common/Memory/LimitedFixedBufferPool.cs
Lines 55 to 61 in 68a7a85
Separate buffer pool from send and receive spec.
Remove unused parts of NetworkSenderBase.
Resize allocation for send/receive of replication code.