Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for gocql/gocql issue 1274 #22

Merged
merged 3 commits into from
Aug 14, 2019

Conversation

martin-sucha
Copy link

The first commit adds a reproducer test for apache#1274 which runs for 100ms. I'll let you decide if you want to include it or not.

The second commit is almost identical to apache#1332

Henrik Johansson and others added 3 commits July 2, 2019 16:17
Add option to fallback to other DC for given token
This test reproduces the race condition.
tokenAwareHostPolicy keeps track of hosts in the cluster, keeps the
token ring and a map of token to replicas that keep the data (which is
computed using a placement strategy).

One issue that is present in old the code is that it updates the host
list and token ring whenever a node goes up or down. While the replica
map is only updated in AddHost/RemoveHost, it could be computed based
on partial host list in case AddHost/RemoveHost is called whenever some
host is down. The placement strategy would not see all nodes in the
cluster in that case. In fact, HostUp is never called anywhere in the
code now and AddHost is called even in case when a node goes up. So this
commit changes tokenAwareHostPolicy to only update token ring and
replica map whenever hosts change in AddHost/RemoveHost and not
HostUp/HostDown.

There is also a race condition in updateKeyspaceMetadata when computing
replicas, as it accesses fields that are synchronized separately
(tokenRing and hosts), but the placement strategy assumes the values it
receives are consistent. This leads to panics triggered in replicaMap
in case the race is hit when hosts are rapidly added/removed or getting
up/down (such as when we lose connection to multiple nodes at the same
time).

One sequence of events that might lead to the race condition and panic
in networkTopology.replicaMap could be like below:

1. A: calls tokenAwareHostPolicy.AddHost
2. A: calls tokenAwareHostPolicy.HostUp
3. A: calls tokenAwareHostPolicy.updateKeyspaceMetadata
4. A: calls tr, _ := t.tokenRing.Load().(*tokenRing) to read token Ring
5. B: calls tokenAwareHostPolicy.RemoveHost
6. B: calls tokenAwareHostPolicy.HostDown which calls
      t.hosts.remove(host.ConnectAddress())
7. A: calls strat.replicaMap(t.hosts.get(), tr.tokens)

To fix the inconsistencies, we need to perform the entire state change
atomically.

I think we can just lock (for writing) during the entire state update
as host add/remove should be rare. The remaining reads in Pick still
use lock-free access, but we now copy and replace all the metadata with
a single atomic.Value store instead of multiple atomic.Value fields.

I left the hosts field as cowHostList type, but it could be replaced by
a simple unsynchronized type as all accesses to hosts are now protected
by `tokenAwareHostPolicy.mu`.

Fixes apache#1274
@dahankzter dahankzter merged commit 47e1472 into scylladb:master Aug 14, 2019
dahankzter pushed a commit that referenced this pull request Aug 30, 2019
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
dahankzter pushed a commit that referenced this pull request Oct 8, 2019
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
dahankzter pushed a commit that referenced this pull request Oct 9, 2019
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
dahankzter pushed a commit that referenced this pull request Oct 14, 2019
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
dahankzter pushed a commit that referenced this pull request Oct 16, 2019
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
dahankzter pushed a commit that referenced this pull request Nov 7, 2019
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
dahankzter pushed a commit that referenced this pull request Jan 2, 2020
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
martin-sucha pushed a commit to kiwicom/gocql that referenced this pull request Jan 31, 2020
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
scylladb#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
martin-sucha pushed a commit to kiwicom/gocql that referenced this pull request Feb 6, 2020
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
scylladb#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
dahankzter pushed a commit that referenced this pull request Feb 6, 2020
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
martin-sucha pushed a commit to kiwicom/gocql that referenced this pull request Mar 2, 2020
This commit fixes the next error:
```
type *tokenAwareHostPolicy has no field or method keyspaces
```

which was introduced with 47e1472 after merging
scylladb#22

Commit 9867ae9 removed field used in the test and later change was
not reflected during ongoing merges.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants