
Concurrency control in core-data #26325

Closed

adamziel opened this issue Oct 20, 2020 · 7 comments
Labels: [Package] Core data, [Package] Data, [Package] Edit Site, [Package] Edit Widgets, [Type] Bug


adamziel commented Oct 20, 2020

The problem

While investigating #22127, I discovered what appears to be a massive problem with API interactions in core-data.

There seems to be no concept of concurrency control.

A very simple example: if I call saveEntityRecord twice on the same record, it will spark two concurrent POST requests. One of them wins, but the client doesn't know which one. Similarly, I can trigger saveEntityRecord and deleteEntityRecord at the same time, and the result will vary depending on the exact timing. That's not very common at the moment, but it becomes obvious in #22127.
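To make that concrete, here is the kind of call pattern that triggers the race. The 'book' post type and the payloads are made up for illustration:

```js
// Illustrative only: assumes a hypothetical 'book' post type exposed at /wp/v2/books.
import { dispatch } from '@wordpress/data';

const { saveEntityRecord, deleteEntityRecord } = dispatch( 'core' );

// Both saves return immediately and issue their requests concurrently;
// the store ends up reflecting whichever response arrives last.
saveEntityRecord( 'postType', 'book', { id: 1, title: 'Draft A' } );
saveEntityRecord( 'postType', 'book', { id: 1, title: 'Draft B' } );

// Mixing a save and a delete is even less predictable:
saveEntityRecord( 'postType', 'book', { id: 1, title: 'Draft C' } );
deleteEntityRecord( 'postType', 'book', 1 );
```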

Let's talk about a common problem. Consider this minimal component that renders some data retrieved using getEntityRecords and saves changes using saveEntityRecord:

https://github.com/samueljseay/gutenberg/blob/c852bee7ce33498c3ee7faca743fac9e473bb03c/test-plugins/core-data/js/index.js#L1-L57
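For readers who don't want to follow the link, here is a rough sketch of what such a component could look like. It is not the linked file verbatim; the 'book' entity, its read meta flag, and the checkbox UI are assumptions:

```js
// Rough sketch, not the linked test plugin: renders books fetched with
// getEntityRecords and toggles a (hypothetical) meta flag with saveEntityRecord.
import { compose } from '@wordpress/compose';
import { withSelect, withDispatch } from '@wordpress/data';

function BookList( { books, toggleRead } ) {
	if ( ! books ) {
		return 'Loading…';
	}
	return books.map( ( book ) => (
		<label key={ book.id }>
			<input
				type="checkbox"
				checked={ !! book.meta.read }
				onChange={ () => toggleRead( book ) }
			/>
			{ book.title.rendered }
		</label>
	) );
}

export default compose( [
	withSelect( ( select ) => ( {
		books: select( 'core' ).getEntityRecords( 'postType', 'book' ),
	} ) ),
	withDispatch( ( dispatch ) => ( {
		toggleRead( book ) {
			dispatch( 'core' ).saveEntityRecord( 'postType', 'book', {
				id: book.id,
				meta: { read: ! book.meta.read },
			} );
		},
	} ) ),
] )( BookList );
```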

Fetching data

What does the resolution flow look like? To answer that question, I will use a chart, since reasoning about timing issues without one is just too hard. Time flows downwards:

[Screenshot: fetch/resolution flow chart (2020-10-21 11:07:15)]

Easy peasy: first, withSelect only gets an empty list because nothing is stored yet; then the resolver kicks in asynchronously, talks to the API, updates the store, and the store re-runs the withSelect handler once the data is available.

Saving data

Each time the user clicks a checkbox, the component dispatches saveEntityRecord() and triggers the following chain of events:

[Screenshot: save flow chart (2020-10-21 11:14:29)]

There's already a problem here: saveEntityRecord() calls getRawEntityRecord(), which doesn't consider the record with a specific id to be resolved even though a list of records from /wp/v2/books was loaded earlier. Instead, a resolver is triggered and an explicit GET request is issued to /wp/v2/books/1 - that happens around the same time as the PUT request that persists the changes. Depending on the timing of the two, the GET results could override the last RECEIVE_ITEMS triggered by saveEntityRecord(). In that case, the user would see some flickering on the screen, and the store would end up with stale data.

Fetching and saving data combined

What is really interesting, though, is what happens when we combine fetching AND saving. Let's take a look:

[Screenshot: combined fetch-and-save flow chart (2020-10-21 11:19:11)]

Woah, that's a lot of arrows and boxes! What happens on that chart is:

  1. We fetch the data first - just like on the very first chart in this issue.
  2. The user triggers a save after the data is loaded - same as the second chart.
  3. The transparent boxes again show the interactions with withSelect.

This is pretty fragile - there are multiple requests started around the same time, and they may be resolved in any order. If GET /wp/v2/books is resolved last, the store is stuck with stale data until the page is refreshed.

Batch processing is affected even more, as having more entity records in the mix means more timing issues.

Possible solutions

Atomic operations

I really don't like how saveEntityRecord() triggers a bunch of side effects while everything is still up in the air. And even if saveEntityRecord didn't, the user could. The point is that writes and reads mix together in unpredictable ways.

A quick, immediate solution

We could prevent concurrent conflicting operations:

  • By not resolving while a write operation is in progress
  • By using locking (a shared lock and an exclusive lock, maybe?). Selectors would "just work", but API reads (resolvers) and API writes would never run clashing operations at the same time (see the sketch below).
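To illustrate the locking idea, here is a minimal shared/exclusive lock sketch. The helper names are hypothetical and this is not meant as a final implementation:

```js
// Minimal read/write lock sketch. Resolvers (API reads) take the shared lock;
// API writes take the exclusive one, so reads never interleave with a write.
function createRwLock() {
	let readers = 0;
	let writeInProgress = null; // Promise of the in-flight write, if any.

	const tick = () => new Promise( ( resolve ) => setTimeout( resolve, 0 ) );

	return {
		async withSharedLock( task ) {
			// Wait for any in-flight write, then run alongside other readers.
			while ( writeInProgress ) {
				await writeInProgress.catch( () => {} );
			}
			readers++;
			try {
				return await task();
			} finally {
				readers--;
			}
		},
		async withExclusiveLock( task ) {
			// Wait for the previous writer, then let active readers drain.
			while ( writeInProgress ) {
				await writeInProgress.catch( () => {} );
			}
			writeInProgress = ( async () => {
				while ( readers > 0 ) {
					await tick();
				}
				try {
					return await task();
				} finally {
					writeInProgress = null;
				}
			} )();
			return writeInProgress;
		},
	};
}
```

A resolver would wrap its apiFetch() call in withSharedLock(), while saveEntityRecord() and deleteEntityRecord() would wrap their whole flow in withExclusiveLock(), so clashing operations on the same record never overlap.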

Also, to reduce the number of factors in the mix, we could apply optimistic updates a bit differently. Namely, instead of using receiveEntityRecords, we could leverage editedEntityRecord by adding one more edit signifying a "checkpoint". It would work like this:

  1. Edits made before the checkpoint are frozen for as long as the checkpoint is in place.
  2. A successful save discards all the edits before the checkpoint and replaces the entity record with the server response.
  3. A failed save simply removes the checkpoint; no rollback is needed.
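To make the checkpoint idea more concrete, here is a rough sketch over a simplified local model (a saved base record plus a stack of edits). The function names are illustrative, not existing core-data reducers:

```js
// Simplified model: state = { record, edits }, where edits is a list of
// partial objects applied on top of the saved record.
const CHECKPOINT = Symbol( 'checkpoint' );

function startSave( state ) {
	// Freeze everything edited so far behind a checkpoint marker.
	return { ...state, edits: [ ...state.edits, CHECKPOINT ] };
}

function finishSave( state, serverRecord ) {
	// Successful save: drop edits up to (and including) the checkpoint and
	// adopt the server response as the new saved record.
	const index = state.edits.indexOf( CHECKPOINT );
	return {
		record: serverRecord,
		edits: state.edits.slice( index + 1 ),
	};
}

function failSave( state ) {
	// Failed save: just remove the checkpoint; the pending edits stay as-is,
	// so no rollback is needed.
	const index = state.edits.indexOf( CHECKPOINT );
	return {
		...state,
		edits: [
			...state.edits.slice( 0, index ),
			...state.edits.slice( index + 1 ),
		],
	};
}

// Reading the "edited" record applies the pending edits on top of the base.
function getEditedRecord( { record, edits } ) {
	return edits
		.filter( ( edit ) => edit !== CHECKPOINT )
		.reduce( ( acc, edit ) => ( { ...acc, ...edit } ), record );
}
```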

One other solution is to assume core/data is a low-level API that just does what it's told and shift the burden of concurrency control onto the consumer. As in: assume it's the developer's responsibility to avoid conflicting select() and dispatch() calls. Even in this scenario, core/data gets in its own way by triggering resolvers when it shouldn't. Also, the pitfalls are very generic so the logic implemented by each and every consumer would be almost the same. An alternative would be to build a higher-level API on top of core/data that would understand locking and concurrency.

Long term considerations

The above would solve the problem for now and may even suffice for a bit longer. It has some downsides though:

  1. Stacking multiple conflicting operations may take a long time to process
  2. It doesn't address conflict resolution, e.g. the server receiving conflicting updates from another user. This would be really nice to address, but it doesn't seem to be a blocker for now. These explorations could be part of the multi-user concurrent editing project too - it's a closely related problem and could potentially require rethinking how core-data works. Here's a bunch of related acronyms that could come in handy later on: MVCC, OT, CRDT.

While 1 could potentially be addressed by 2, there is another simple solution: squash enqueued operations when possible. E.g. if there are two updates waiting to be processed, we could perform just one. Three updates and a delete? Perform only the delete.
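For illustration, squashing could look roughly like this, assuming a hypothetical per-record queue of { type, changes } operations:

```js
// Collapse a per-record queue of pending operations before processing it.
function squashQueue( queue ) {
	// A delete anywhere in the queue makes the preceding updates moot.
	if ( queue.some( ( op ) => op.type === 'delete' ) ) {
		return [ { type: 'delete' } ];
	}
	// Otherwise merge all pending updates into a single one.
	const changes = queue.reduce(
		( acc, op ) => ( { ...acc, ...op.changes } ),
		{}
	);
	return queue.length > 1 ? [ { type: 'update', changes } ] : queue;
}
```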

Re-use fetched data

  • Selecting a specific entity record could re-use one fetched earlier from the list API endpoint. As in: getEntityRecords( 'postType', 'book' ); followed by getEntityRecord( 'postType', 'book', 4 ); could ideally trigger just a single request - the first one to /wp/v2/books. Ideally that would be the case even if the list request is still in progress (a rough sketch follows after this list).
  • Alternatively, resolving a list of entity records could also resolve the specific records it contains, although I'm not sure if that would fix the problem entirely.
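A rough sketch of the first idea, using a hypothetical in-memory cache shared between a list resolver and a single-record resolver - not the actual core-data resolvers:

```js
import apiFetch from '@wordpress/api-fetch';

// Hypothetical cache shared by both resolvers.
const recordsById = new Map();

async function resolveBookList() {
	const books = await apiFetch( { path: '/wp/v2/books' } );
	books.forEach( ( book ) => recordsById.set( book.id, book ) );
	return books;
}

async function resolveBook( id ) {
	// Re-use a record already delivered by the list request instead of
	// issuing a second GET to /wp/v2/books/<id>.
	if ( recordsById.has( id ) ) {
		return recordsById.get( id );
	}
	const book = await apiFetch( { path: `/wp/v2/books/${ id }` } );
	recordsById.set( id, book );
	return book;
}
```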

cc @youknowriad @mcsf @gziolo @draganescu @noisysocks @talldan @tellthemachines @kevin940726 @jorgefilipecosta @mtias @samueljseay @ellatrix @TimothyBJacobs

adamziel added the [Package] Data and [Package] Core data labels Oct 20, 2020

adamziel commented Oct 20, 2020

Also, I'm acting as a reporter here, but I'm happy to spin up a PR (or many PRs) exploring a solution.

adamziel changed the title from "Race conditions in resolution logic" to "Race conditions in core/data" Oct 20, 2020
adamziel changed the title from "Race conditions in core/data" to "Race conditions in core-data" Oct 20, 2020

adamziel commented Oct 21, 2020

I considered different strategies for removing the interference between resolvers and did a pairwise analysis of how different types of API operations interact when they run concurrently:

Concurrent API operations on two entity records

Same type, different IDs

[Screenshot: pairwise analysis, same type, different IDs (2020-10-21 12:26:11)]

Same type, same ID

[Screenshot: pairwise analysis, same type, same ID (2020-10-21 12:58:17)]

Record operations vs list operations

[Screenshot: pairwise analysis, record vs list operations (2020-10-21 12:54:44)]

"Partial results" here means we need to re-request the data after the writes have finished:

[Screenshot: pairwise analysis, partial results (2020-10-21 12:27:14)]

Some record vs list scenarios are "optimistically negotiable" with various degrees of complexity. For example:

  • Updating a record and fetching a list of results concurrently could work if we optimistically mask any server results with local changes until the update is finished. Keeping track of version numbers or timestamps would make it more reliable.
  • Similarly, deleting a record and fetching a list concurrently could work if we keep track of deletions in progress and filter out the corresponding records in selectors (see the sketch after this list).
  • Deleting a record and fetching a page (number=2, size=10) of results concurrently could work if we're okay with having only 9 results for a moment and then re-fetching the page once the delete is complete. There is some added complexity if a request for page 3 was completed after the delete was finished but before re-requesting page 2 was completed. Using a cursor for pagination would make it easier.
  • Inserting a record while fetching a list of results - we could guess where the new record fits in the list of results returned by the server. I'm not sure how reliable this one would be, though.
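As an illustration of the second bullet above, a selector could mask deletions in progress roughly like this (the state shape is hypothetical, not an existing core-data selector):

```js
// Hypothetical state shape: state.records and state.deletionsInProgress are
// both keyed by entity kind and name.
function getVisibleRecords( state, kind, name ) {
	const records = state.records[ kind ]?.[ name ] ?? [];
	const deleting = state.deletionsInProgress[ kind ]?.[ name ] ?? new Set();
	// Hide records whose DELETE request is still in flight so a concurrent
	// list fetch cannot resurrect them in the UI.
	return records.filter( ( record ) => ! deleting.has( record.id ) );
}
```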

That being said, I don't think the initial fix needs to include any of that.

adamziel changed the title from "Race conditions in core-data" to "Race conditions and data consistency problems in core-data" Oct 21, 2020
adamziel changed the title from "Race conditions and data consistency problems in core-data" to "Race conditions and lack of data consistency in core-data" Oct 21, 2020
adamziel changed the title from "Race conditions and lack of data consistency in core-data" to "Concurrency control in core-data" Oct 21, 2020
adamziel added the [Package] Edit Widgets and [Package] Edit Site labels Oct 21, 2020
adamziel self-assigned this Oct 21, 2020

noisysocks commented:

Tough problem! Thanks for the clear write-up, though.

Are there any other libraries that we can look to for inspiration?

It sounds like we need to make it so that, instead of calling apiFetch() directly, actions like getEntityRecord() and saveEntityRecord() add an apiFetch() request to a queue. Each entity has its own queue. Requests in the queue are performed one at a time. Some subsequences of requests in the queue can be done in parallel, e.g. multiple GET requests. Some subsequences of requests in the queue can be collapsed into one request, e.g. multiple PUT /books/1 requests.
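A bare-bones sketch of what such a per-entity queue could look like; the shape is hypothetical, and collapsing duplicate requests or running adjacent GETs in parallel is left out for brevity:

```js
// Each entity record gets its own queue; queued requests run one at a time.
function createEntityQueue() {
	let tail = Promise.resolve();

	return {
		enqueue( makeRequest ) {
			// Chain each request after the previous one so operations on the
			// same entity never overlap, regardless of success or failure.
			const result = tail.then( makeRequest, makeRequest );
			tail = result.catch( () => {} );
			return result;
		},
	};
}

// One queue per entity record, e.g. keyed by 'postType/book/1'.
const queues = new Map();

function enqueueForRecord( key, makeRequest ) {
	if ( ! queues.has( key ) ) {
		queues.set( key, createEntityQueue() );
	}
	return queues.get( key ).enqueue( makeRequest );
}
```

Actions would then call something like enqueueForRecord( 'postType/book/1', () => apiFetch( /* … */ ) ) instead of calling apiFetch() directly.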

Regarding "Stacking multiple conflicting operations may take a long time to process":

How common is this? Maybe it's not such a bad trade-off.


adamziel commented Oct 22, 2020

@noisysocks my thoughts exactly! Only I wouldn't enqueue just network requests but everything asynchronous or affected by asynchronicity. If only the network requests themselves are stacked, there are fewer possible outcomes, but there are still many. For example:

[Screenshot: example interference chart with only network requests queued (2020-10-22 14:21:41)]

Note how the failed operation overwrote the result of the successful one. I can come up with more examples like that. The point is that to get rid of interference, I would consider treating entire segments of code as "critical sections" or "atomic operations":

[Screenshot: critical sections / atomic operations chart (2020-10-22 14:35:37)]

This makes things conceptually simple by taking all the asynchronicity out of the equation - interfering operations are always executed serially and the outcomes are easily predictable as nothing else updates parts of the state they depend on.
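As a sketch of what an "atomic" save could look like when wrapped in an exclusive lock (re-using the lock helper sketched earlier in this thread; the apiFetch call and the books endpoint are illustrative):

```js
import apiFetch from '@wordpress/api-fetch';

// 'lock' is the createRwLock() instance sketched earlier; /wp/v2/books is the
// hypothetical example entity.
async function saveBookAtomically( lock, edits ) {
	return lock.withExclusiveLock( async () => {
		// Nothing else - no resolver reads, no other writes on this record -
		// runs between entering this block and leaving it.
		return apiFetch( {
			path: `/wp/v2/books/${ edits.id }`,
			method: 'PUT',
			data: edits,
		} );
	} );
}
```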


adamziel commented:

I'm exploring the idea of atomic operations and locks in #26389


adamziel commented Nov 3, 2020

Surfacing this comment here:

#26627 and #26575 improved the situation here. There is much less flickering, although there is still some of it as seen below:

[Screen recording: "core data issue", showing the remaining flickering]

I think locks would be the ultimate solution here. The downside is that they would add a layer of complexity to an already complex core-data so I'm still trying to come up with some alternatives. But for reference, see the same interaction with #26389 applied:

[Screen recordings: the same interaction with #26389 applied (2020-11-03 12:41)]

gziolo added the [Type] Bug label Nov 8, 2020

adamziel commented:

#26389 addressed the bulk of this problem. It would still be amazing to implement a lock-less solution or be more "optimistic" about different operations, but the bug part of this issue is now fixed 🎉
