bigquery/storage/managedwriter: multiplexed writes #7103
Labels
api: bigquery
Issues related to the BigQuery API.
priority: p2
Moderately-important priority. Fix may not be included in next release.
type: feature request
‘Nice-to-have’ improvement, new feature or different behavior or design.
Comments
shollyman added the type: feature request, api: bigquery, and priority: p2 labels on Nov 29, 2022
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Nov 29, 2022
This PR adds a new internal mechanism to simplify duplicating flow controllers, and does some preliminary work to wire in a UUID-based ID for managed stream instances. Neither is used elsewhere. Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Dec 15, 2022
This PR establishes new unexported connection and connection pool abstractions. The implementation is partial, and key areas where implementation is missing are generally marked with TODOs. This PR does not alter existing functionality, but continues to lay groundwork for later refactors. Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Dec 16, 2022
This PR adds two references to live code:
* a pool reference on ManagedStream
* a reference to ManagedStream on pendingWrite

The ManagedStream->pool reference allows a writer to resolve where to find its associated connection, retries, lookups, etc. The reference on the pendingWrite is primarily in service of retries, particularly when we need to re-enqueue and thus potentially re-resolve which connection is associated with the writer.

This PR also moves some of the retry processing code onto the connectionPool in service of that goal. As before, this is new code that isn't yet referenced from existing functionality.

This PR also more substantially starts to carve out connection management in the pool, providing a basic connection resolver and eviction capabilities. This initial implementation is primitive, and aligns with our current behavior (single unshared connection per writer). We also add some testing of the mapping behavior to ensure we're consistently updating the map for resolution and eviction.

Towards: googleapis#7103
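The resolver and eviction behavior described in this commit message could look roughly like the sketch below. It is only an illustration under stated assumptions: `connectionPool` is named in the message, but the `resolve` and `evict` methods, the string writer ID used as the map key, and the `connection` fields are all hypothetical and not taken from the managedwriter source.

```go
package main

import (
	"strconv"
	"sync"
)

// connection is a hypothetical stand-in for the pool's connection type.
type connection struct {
	id string
}

// connectionPool maps writer IDs to connections. In this sketch each writer
// gets its own unshared connection, matching the behavior described above.
type connectionPool struct {
	mu      sync.Mutex
	conns   map[string]*connection // keyed by writer ID (assumed key type)
	created int                    // counter used only to label connections
}

func newConnectionPool() *connectionPool {
	return &connectionPool{conns: make(map[string]*connection)}
}

// resolve returns the connection associated with writerID, creating and
// recording a new one on first use.
func (p *connectionPool) resolve(writerID string) *connection {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[writerID]; ok {
		return c
	}
	p.created++
	c := &connection{id: "conn-" + strconv.Itoa(p.created)}
	p.conns[writerID] = c
	return c
}

// evict drops the mapping for writerID and reports whether one existed.
func (p *connectionPool) evict(writerID string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	_, ok := p.conns[writerID]
	delete(p.conns, writerID)
	return ok
}
```

Keeping both operations behind one mutex is what keeps the map consistent between resolution and eviction: a writer that is evicted and later resolves again simply receives a fresh connection.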
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Dec 28, 2022
This PR augments the base client with fuller support for view-based resolution of GetWriteStream metadata. This PR also adds an integration test that compares behaviors between different stream types (default vs explicitly created). Towards: googleapis#7103
shollyman added a commit that referenced this issue on Dec 29, 2022
This PR augments the base client with fuller support for view-based resolution of GetWriteStream metadata. This PR also adds an integration test that compares behaviors between different stream types (default vs explicitly created). Towards: #7103
codyoss pushed a commit that referenced this issue on Jan 9, 2023
This PR includes much of the rewiring of the existing ManagedStream abstraction, but doesn't cut over to the new implementation yet.

We add a reference to the origin writer as part of the pendingWrite, which retains information about a single write request and response. This allows us to resolve retry settings for a given write by checking whether the writer has a custom retry policy. In other cases, we use the default settings of the connection pool.

We introduce internal UUID identifiers to the core abstractions (pool, connection, writer) so that we can add observability later to see which components are responsible for processing requests.

We remove the notion of adding connections to the connectionPool contract. Instead, we introduce a new interface in the pool called a poolRouter. By interface contract, it's responsible for picking the correct connection for a given write. This allows us to abstract away different implementations for pool behavior and make it the responsibility of an individual router.

Further, this PR adds the most simplistic router we'll use for the initial migration to multiplexing (simpleRouter): it supports a single connection, and routes all traffic to it.

This PR also moves over more internal functionality from the ManagedStream, namely appendWithRetry() and lockingAppend(). The implementations still remain on the ManagedStream at this time; we'll remove most of the functionality when we cut over to using pools/connections.

Towards: #7103
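The router contract from this commit message can be sketched as follows. This is a minimal illustration rather than the library's code: `poolRouter`, `simpleRouter`, and `pendingWrite` are names from the message, while the `pickConnection` method name, the struct fields, and the error value are assumptions made for the example.

```go
package main

import "errors"

// connection is a hypothetical stand-in for the unexported connection type.
type connection struct {
	id string
}

// pendingWrite is a hypothetical stand-in; the real type tracks a single
// write request and its response.
type pendingWrite struct {
	writerID string
}

// poolRouter picks the connection that should carry a given write.
// The method name is an assumption made for this sketch.
type poolRouter interface {
	pickConnection(pw *pendingWrite) (*connection, error)
}

// errNoConnection is returned when the router has nothing to route to.
var errNoConnection = errors.New("router has no connection")

// simpleRouter holds at most one connection and routes all traffic to it.
type simpleRouter struct {
	conn *connection
}

// pickConnection ignores the write's details and always returns the single
// connection, or an error if none has been set.
func (r *simpleRouter) pickConnection(pw *pendingWrite) (*connection, error) {
	if r.conn == nil {
		return nil, errNoConnection
	}
	return r.conn, nil
}

// Compile-time check that simpleRouter satisfies poolRouter.
var _ poolRouter = (*simpleRouter)(nil)
```

Because callers depend only on the interface, a later router that balances writes across several connections could be swapped in without changing the pool's callers.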
This was referenced Jan 26, 2023
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Feb 7, 2023
This PR does more settings consolidation, and updates/adds new WriterOption options to propagate settings. In particular, this PR:
* moves the AppendRows call options into streamSettings
* adds a multiplex flag and option to streamSettings
* adds a call function option into streamSettings

This PR also updates managed stream to use the new option(s) as appropriate, but most of this is unused here and is in preparation for a larger cutover of functionality related to the new connection abstractions.

Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Feb 23, 2023
This PR revisits the expected behavior for config knobs in the client. Previously, all configuration was done when instantiating a writer (aka a ManagedStream). There are some chicken-and-egg problems related to multiplex settings, as connection options are decoupled from individual writers.

This PR adds the following unexported custom client options (but does not yet use them for anything):
* enableMultiplex
* defaultInflightRequests
* defaultInflightBytes
* defaultAppendRowsCallOption

This PR also removes the still-unexported enableMultiplex from the set of defined WriterOption options which can be passed when instantiating individual writers.

Towards: googleapis#7103
shollyman added a commit that referenced this issue on Feb 27, 2023
…7490)

* refactor(bigquery/storage/managedwriter): add custom client options

This PR revisits the expected behavior for config knobs in the client. Previously, all configuration was done when instantiating a writer (aka a ManagedStream). There are some chicken-and-egg problems related to multiplex settings, as connection options are decoupled from individual writers.

This PR adds the following unexported custom client options (but does not yet use them for anything):
* enableMultiplex
* defaultInflightRequests
* defaultInflightBytes
* defaultAppendRowsCallOption

This PR also removes the still-unexported enableMultiplex from the set of defined WriterOption options which can be passed when instantiating individual writers.

Additionally, this refactor includes a correctness fix for the traceID option that was causing the traceID to duplicate the initial token.

Towards: #7103
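Moving these knobs from the writer to the client can be pictured with a plain functional-options sketch. The option names `enableMultiplex`, `defaultInflightRequests`, and `defaultInflightBytes` come from the commit message; the `clientConfig` struct, the `clientOption` function type, and every signature below are assumptions. The library's actual options plug into its own client option mechanism, which is not reproduced here, and `defaultAppendRowsCallOption` is omitted because it depends on RPC call option types.

```go
package main

// clientConfig is a hypothetical holder for client-level defaults that apply
// to connections rather than to individual writers.
type clientConfig struct {
	enableMultiplex         bool
	defaultInflightRequests int
	defaultInflightBytes    int
}

// clientOption mutates a clientConfig. This function type is an assumption
// used only to keep the sketch self-contained.
type clientOption func(*clientConfig)

// enableMultiplex toggles connection sharing for the whole client.
func enableMultiplex(enable bool) clientOption {
	return func(c *clientConfig) { c.enableMultiplex = enable }
}

// defaultInflightRequests sets the default cap on outstanding requests.
func defaultInflightRequests(n int) clientOption {
	return func(c *clientConfig) { c.defaultInflightRequests = n }
}

// defaultInflightBytes sets the default cap on outstanding request bytes.
func defaultInflightBytes(n int) clientOption {
	return func(c *clientConfig) { c.defaultInflightBytes = n }
}

// newClientConfig applies the options in order to a zero-valued config.
func newClientConfig(opts ...clientOption) *clientConfig {
	cfg := &clientConfig{}
	for _, opt := range opts {
		opt(cfg)
	}
	return cfg
}
```

Resolving these settings once at client construction sidesteps the chicken-and-egg problem: a shared connection's limits are known before any writer is created.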
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Mar 2, 2023
This PR extends the poolRouter interface to allow writers to be registered and removed, and augments the existing simpleRouter to support the contract. It also adds a basic test of the router. A future refactor (when we wire up the new abstractions) will hook up the functionality properly. Towards: googleapis#7103
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Mar 17, 2023
This PR allows the flowcontroller to report bytes in flight for flow controllers with a bounded byte definition. The primary connection load signals for a connection are the inserts/bytes in flight as reported by the flow controller, and this makes the bytes in flight a signal we can use. Important note: an unbounded flow controller will not report any bytes in flight. This avoids introducing odd situations due to size normalization where bytes tracked and the actual capacity of the semaphore could get out of sync. Towards: googleapis#7103
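The bounded-versus-unbounded reporting rule can be illustrated with a simplified, non-blocking flow controller. This is a sketch under assumptions, not the library's implementation: the real controller is described as semaphore-based, whereas this one uses a mutex and a counter, and the names `tryAcquire`, `release`, `bytes`, and `bound` are hypothetical.

```go
package main

import "sync"

// flowController is a simplified, hypothetical byte-based flow controller.
type flowController struct {
	mu            sync.Mutex
	maxBytes      int // 0 means unbounded
	bytesInFlight int // only maintained when maxBytes > 0
}

// bound normalizes a request size to the configured limit so that a single
// oversized request can still be admitted. Only called for bounded controllers.
func (fc *flowController) bound(size int) int {
	if size > fc.maxBytes {
		return fc.maxBytes
	}
	return size
}

// tryAcquire reserves capacity for a request of the given size and reports
// whether it was admitted. Unbounded controllers admit without tracking.
func (fc *flowController) tryAcquire(size int) bool {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	if fc.maxBytes <= 0 {
		return true
	}
	size = fc.bound(size)
	if fc.bytesInFlight+size > fc.maxBytes {
		return false
	}
	fc.bytesInFlight += size
	return true
}

// release returns capacity using the same normalization as tryAcquire, so
// the tracked count and the capacity stay in sync.
func (fc *flowController) release(size int) {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	if fc.maxBytes <= 0 {
		return
	}
	fc.bytesInFlight -= fc.bound(size)
	if fc.bytesInFlight < 0 {
		fc.bytesInFlight = 0
	}
}

// bytes reports bytes in flight; it is always 0 for an unbounded controller.
func (fc *flowController) bytes() int {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	return fc.bytesInFlight
}
```

Skipping the bookkeeping entirely in the unbounded case is what avoids the mismatch the note warns about: with no limit there is nothing to normalize against, so any tracked number could drift from real capacity.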
This was referenced Mar 17, 2023
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Mar 24, 2023
This PR switches newConnection to a typed argument to make it harder to invoke incorrectly. Raised during a review on a related PR. Towards: googleapis#7103
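As a general illustration of why a typed argument helps, here is a small sketch. The commit message does not say which argument of newConnection became typed, so the `connectionMode` type, its constants, and the constructor signature below are purely assumptions for the example.

```go
package main

// connectionMode is a hypothetical defined type used instead of a raw string.
type connectionMode string

const (
	simplexMode   connectionMode = "SIMPLEX"
	multiplexMode connectionMode = "MULTIPLEX"
)

// connection is a hypothetical stand-in that records how it was created.
type connection struct {
	mode connectionMode
}

// newConnection takes a connectionMode rather than a plain string. Passing a
// variable of type string no longer compiles without an explicit conversion,
// which nudges callers toward the named constants. Untyped string literals
// are still accepted by the compiler.
func newConnection(mode connectionMode) *connection {
	return &connection{mode: mode}
}
```

A call site then reads `newConnection(multiplexMode)`, which is harder to get wrong than a constructor taking an arbitrary string.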
shollyman added a commit to shollyman/google-cloud-go that referenced this issue on Mar 31, 2023
With recent multiplex refactors, call options were not being propagated properly for non-multiplex writers as we formerly created a pool-per-writer. This allows the router to build exclusive connections using the writer's settings, namely overrides to flow control and call options propagated to the underlying AppendRows RPC. Towards: googleapis#7103
This was referenced Mar 31, 2023
At this point, experimental multiplexing is available at head, but not baked into a release version. The next release is 1.51.0, but it is likely that the release will be deferred until the week of April 17.
This has been released as part of bigquery/v1.51.0.
While there continue to be smaller features related to multiplexing, going forward we'll track those individually rather than via this umbrella issue.
This issue tracks the PRs related to supporting multiplex connections in managedwriter.