Fix concurrent access for SQL storage. #1370

kepkin · 2018-12-10T13:25:28Z

The SERIALIZABLE isolation level is not a viable solution, as it will slow down Postgres as a whole. And again, retries would still be needed, as you will always receive errors like "could not serialize access...".

Previous pull requests for retries, like #1356, were wrong because they resulted in a deadlock in the database (the conformance tests open a transaction inside another transaction).

In the SQL world, the better solution is to lock the row explicitly with SELECT FOR UPDATE.
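
A minimal sketch of the row-locking flow described above, assuming Postgres-style `$1` placeholders and the standard `database/sql` package. The names `lockQuery` and `updateLocked`, and the identifier check, are illustrative assumptions and not the code in this PR.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"regexp"
)

// identRe restricts table and column names to plain identifiers, because they
// are concatenated into the query text and cannot be bound as parameters.
var identRe = regexp.MustCompile(`^[a-z_][a-z0-9_]*$`)

// lockQuery builds a statement that takes a row-level lock on the matching
// row. The lock is held until the surrounding transaction ends.
func lockQuery(table, column string) (string, error) {
	if !identRe.MatchString(table) || !identRe.MatchString(column) {
		return "", fmt.Errorf("invalid identifier %q or %q", table, column)
	}
	return "SELECT 1 FROM " + table + " WHERE " + column + " = $1 FOR UPDATE", nil
}

// updateLocked (hypothetical helper) opens one transaction, locks the row,
// runs the caller's update inside the same transaction, and commits.
func updateLocked(ctx context.Context, db *sql.DB, table, column, value string, update func(*sql.Tx) error) error {
	q, err := lockQuery(table, column)
	if err != nil {
		return err
	}
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	// After a successful Commit this returns sql.ErrTxDone and changes nothing.
	defer tx.Rollback()

	if _, err := tx.ExecContext(ctx, q, value); err != nil {
		return fmt.Errorf("lock row: %w", err)
	}
	if err := update(tx); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	// No database is needed to inspect the generated statement.
	q, err := lockQuery("auth_request", "id")
	fmt.Println(q, err)
}
```

With a plain FOR UPDATE, a second transaction that touches the same row waits for the first to finish instead of failing with a serialization error.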

srenatus

Code-wise, this looks good. I'm curious to play with this a little, but I haven't gotten to it yet.

Thanks for digging in, and providing some domain-knowledge where it's needed! 😃

srenatus · 2018-12-11T11:31:37Z

storage/sql/crud.go

@@ -380,13 +386,23 @@ func (c *conn) UpdateKeys(updater func(old storage.Keys) (storage.Keys, error))
 		firstUpdate := false
 		// TODO(ericchiang): errors may cause a transaction be rolled back by the SQL
 		// server. Test this, and consider adding a COUNT() command beforehand.


🤔 Is that comment still accurate now?

I don't remember. Ha

srenatus · 2018-12-11T11:37:07Z

storage/sql/sql.go

@@ -22,6 +22,10 @@ type flavor struct {
 	// Optional function to create and finish a transaction.
 	executeTx func(db *sql.DB, fn func(*sql.Tx) error) error


This field could go away now, couldn't it?

srenatus · 2018-12-11T11:38:17Z

storage/sql/crud.go

@@ -140,10 +140,16 @@ func (c *conn) UpdateAuthRequest(id string, updater func(a storage.AuthRequest)
 			return err
 		}

+		err = c.flavor.lockForUpdate(tx, "auth_request", "id", r.ID)
+		if err != nil {
+			return fmt.Errorf("update auth request: %v", err)


💭 All these fmt.Errorf make me nervous -- they're hard to deal with in call sites -- but this is the style this package is written in, so it's fine to add a few. 👍

We can switch to github.com/pkg/errors in another PR :)
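
To illustrate the call-site concern: an error formatted with `%v` is flattened into a string, so callers can no longer match on the underlying cause. The sketch below uses the standard library's `%w` verb and `errors.Is` (available since Go 1.13, which postdates this thread) rather than github.com/pkg/errors. `errNotFound`, `wrapOpaque`, `wrapChained`, and `isNotFound` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a storage error.
var errNotFound = errors.New("not found")

// wrapOpaque mirrors the fmt.Errorf("...: %v", err) style used in the package:
// the cause is formatted into the message and the error chain is lost.
func wrapOpaque(err error) error {
	return fmt.Errorf("update auth request: %v", err)
}

// wrapChained uses %w, so the cause stays reachable through errors.Is.
func wrapChained(err error) error {
	return fmt.Errorf("update auth request: %w", err)
}

// isNotFound is what a call site would use to branch on the cause.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	fmt.Println(isNotFound(wrapOpaque(errNotFound)))  // false
	fmt.Println(isNotFound(wrapChained(errNotFound))) // true
}
```

Both variants produce the same message text; only the second lets the caller recover the original error.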

vito · 2018-12-11T16:27:12Z

storage/sql/sql.go

-			}
-			return tx.Commit()
+		lockForUpdate: func(tx *trans, table, column, value string) error {
+			_, err := tx.Exec("SELECT 1 FROM "+table+" WHERE "+column+" = $1 FOR UPDATE NOWAIT;", value)


(For my own understanding,) NOWAIT is used here because the intent of this PR is still to fail one transaction in the event of concurrent updates? i.e. same goal as before, just a different (less heavy-handed) method. And so the existing concurrency tests still pass with no changes necessary.

So this doesn't attempt to address #1341 (and associated PR #1342) as-is; that could be done in a separate PR and would also involve changes to the tests. I suppose that could either be done by removing the NOWAIT or adding explicit retry logic, similar to #1342.

I'm happy to submit that PR after this is merged if no one else has plans to already, not trying to increase scope of your PR. Just making sure I understand. 🙂

Thanks!
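
The explicit-retry alternative mentioned above could look roughly like this. It is a sketch only: `errLockNotAvailable` is a hypothetical sentinel (Postgres reports a failed NOWAIT lock as SQLSTATE 55P03, which the caller would have to map onto it), and `withRetry` is not part of dex.

```go
package main

import (
	"errors"
	"fmt"
)

// errLockNotAvailable is a hypothetical sentinel for "row is locked by
// another transaction" (what FOR UPDATE NOWAIT reports instead of waiting).
var errLockNotAvailable = errors.New("lock not available")

// withRetry calls fn up to attempts times. It retries only while fn reports
// that the lock could not be taken; any other result is returned at once.
func withRetry(attempts int, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if !errors.Is(err, errLockNotAvailable) {
			return err // success (nil) or a non-retryable error
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := withRetry(3, func() error {
		calls++
		if calls < 3 {
			return errLockNotAvailable // simulate a concurrent writer
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

Each attempt would need to run in its own fresh transaction; as noted at the top of this PR, opening a transaction inside another one is what deadlocked the earlier retry attempts.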

Probably better to add an inline comment :)

This works, but the lockForUpdate API seems a bit risky. Ideally it could be baked into ExecTx or something, but nothing's coming to mind at the moment.

Yes, NOWAIT is here only because of the tests. And yes, I think the tests need to be redesigned a little, since changing the same auth_request concurrently wouldn't actually harm a login flow, and the deadlock that the current tests produce is not possible in real-life situations.

We should probably just update/remove the tests then, right?

I haven't come up with an idea for how to test concurrency better yet. Maybe later.

What about the issue with interleaving connectors in the concurrent login flow mentioned in #1356 (comment)?

Failing one of the transactions seems to be the only way to handle it in the current implementation of handler/storage.

ericchiang · 2018-12-11T16:39:04Z

storage/sql/sql.go

-		//
-		// NOTE(ericchiang): For some reason using `SET SESSION CHARACTERISTICS AS TRANSACTION` at a
-		// session level didn't work for some edge cases. Might be something worth exploring.
-		executeTx: func(db *sql.DB, fn func(sqlTx *sql.Tx) error) error {


Does anything use this anymore? Can we nuke it?

srenatus

I'd rather fix the tests to remove the requirement for that NOWAIT (if I understand the situation correctly). I'll try to come up with something. 🤔

srenatus · 2018-12-14T07:44:09Z

storage/sql/crud.go

+			return fmt.Errorf("get keys: %v", err)
+		}
+
+		old, err = getKeys(tx)


[nit] Let's ditch line 389 and go with old, err := here, now that the code is rearranged.

srenatus · 2018-12-14T08:23:45Z

Wait a moment, I've misunderstood something there, around #1370 (comment).

So, what would be a better set of test cases? Which app-level conditions do we want to avoid, when it comes to concurrent actions?

(To be explicit -- don't wait for me based on #1370 (review), I have nothing up my sleeve)

bonifaido · 2019-08-03T13:22:53Z

Can we revive this PR? I'm happy to help. We have an issue with the MySQL storage backend: according to the old conformance tests (the ones currently in master), the SERIALIZABLE isolation level is needed, and it has quite a few issues on different MySQL implementations (as in #1511). Percona XtraDB doesn't let you use it by default, AWS Aurora doesn't have it at all, and on MariaDB transaction_isolation is called tx_isolation, so it is a pain TBH.

FYI: I have rebased this branch on the current master, copied the lockForUpdate method from the Postgres implementation, and removed the explicit SERIALIZABLE tx_isolation level setting. All the MySQL conformance tests passed on MariaDB. NOWAIT doesn't exist in MySQL 5.7; it was introduced in 8.0, so it works there. But if I understand correctly, NOWAIT is something we would like to see removed.
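
The version difference described above could be handled by making the lock clause depend on the backend. This is a sketch under stated assumptions: the `flavor` struct here and its fields (`placeholder`, `supportsNoWait`) are hypothetical and do not mirror dex's actual flavor type.

```go
package main

import "fmt"

// flavor is a hypothetical description of one SQL backend.
type flavor struct {
	name           string
	placeholder    string // "$1" for Postgres, "?" for MySQL
	supportsNoWait bool   // whether FOR UPDATE NOWAIT is accepted
}

// lockQuery builds the row-lock statement for this backend, appending
// NOWAIT only where the backend is assumed to understand it.
func (f flavor) lockQuery(table, column string) string {
	q := "SELECT 1 FROM " + table + " WHERE " + column + " = " + f.placeholder + " FOR UPDATE"
	if f.supportsNoWait {
		q += " NOWAIT"
	}
	return q
}

func main() {
	flavors := []flavor{
		{name: "postgres", placeholder: "$1", supportsNoWait: true},
		{name: "mysql 5.7", placeholder: "?", supportsNoWait: false},
		{name: "mysql 8.0", placeholder: "?", supportsNoWait: true},
	}
	for _, f := range flavors {
		fmt.Printf("%s: %s\n", f.name, f.lockQuery("auth_request", "id"))
	}
}
```

Dropping NOWAIT is not only a syntax change: without it the statement waits for the lock instead of failing immediately, so tests that expect one of two concurrent updates to fail would behave differently.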

srenatus · 2019-08-03T16:46:01Z

💯 I'd be very interested in getting this resolved. I think the main blocker here is that we have no test suite that makes the error appear -- our contributors, however, seem to have one. That would help evaluate the different approaches that were put forward when this issue was on the table last winter...

kepkin added 2 commits December 10, 2018 16:13

Fix concurrent access for SQL storage.

e8fe795

Fmt.

2d57695

kepkin mentioned this pull request Dec 11, 2018

postgres: fix tx leak on serialization failure #1363

Closed

srenatus approved these changes Dec 11, 2018

vito reviewed Dec 11, 2018

ericchiang reviewed Dec 11, 2018

ericchiang reviewed Dec 11, 2018

kepkin added 2 commits December 12, 2018 13:52

Remove not used executeTx, better documentation for locking function.

8f68b89

More sane code for getting first keys.

ec4e637

srenatus approved these changes Dec 14, 2018

Remove not needed line.

2e9a63b

This was referenced Dec 17, 2018

quickfix #1355: only try three times #1356

Closed

Release v2.14.0 #1362

Closed

Postgres: forcing transactions to be serializable breaks concurrent login #1341

Open

jwntrs mentioned this pull request Apr 3, 2019

Move to upstream dex concourse/concourse#3306

Closed

mkontani mentioned this pull request Aug 3, 2019

Fix for mysql xtraDB support #1511

Open

bonifaido mentioned this pull request Sep 23, 2019

MySQL storage - Take 2 #1485

Merged

jwntrs mentioned this pull request Apr 16, 2020

Investigate login errors and watsjs concourse/concourse#3890

Open

kepkin commented Dec 10, 2018

srenatus left a comment

srenatus Dec 11, 2018

ericchiang Dec 11, 2018

srenatus Dec 11, 2018

srenatus Dec 11, 2018

ericchiang Dec 11, 2018

vito Dec 11, 2018 (edited)

ericchiang Dec 11, 2018

kepkin Dec 11, 2018

ericchiang Dec 11, 2018

kepkin Dec 13, 2018

krylovsk Apr 16, 2019

ericchiang Dec 11, 2018

kepkin Dec 13, 2018

srenatus left a comment

srenatus Dec 14, 2018

srenatus commented Dec 14, 2018

bonifaido commented Aug 3, 2019 (edited)

srenatus commented Aug 3, 2019
