Rolling restart of k8s db cluster causes all nodes to remain down and not recover #1582
Just an update to this statement: per yugabyte/yugabyte-db#10182, it is still an active issue even with the forked client having been updated to gocql.
We're also hitting this in our production environment when doing rolling restarts. Here is a reproduction.

Create a minikube cluster:

$ minikube start --cni calico

Deploy a 3 node cassandra cluster to minikube:

```
$ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  ports:
    - port: 9042
      name: cql
  type: ClusterIP
  selector:
    app: cassandra
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  serviceName: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
        - name: cassandra
          image: cassandra:3.11.14
          readinessProbe:
            tcpSocket:
              port: 9042
          env:
            - name: CASSANDRA_SEEDS
              value: cassandra-0,cassandra-0.cassandra
          ports:
            - containerPort: 9042
              name: cql
            - containerPort: 7000
              name: intra-node
          lifecycle:
            preStop:
              exec:
                command: [ "nodetool", "drain" ]
          volumeMounts:
            - name: data
              mountPath: /var/lib/cassandra
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
EOF
```

Wait for cassandra to start. The progress may be monitored by calling:

$ kubectl get pod -l app=cassandra

Once all pods are marked as Running, we are good to go:

```
NAME          READY   STATUS    RESTARTS       AGE
cassandra-0   1/1     Running   0              2m28s
cassandra-1   1/1     Running   2 (2m2s ago)   2m8s
cassandra-2   1/1     Running   0              48s
```

On my system, cassandra-1 restarts a couple of times before bootstrapping.

Now we need to add a route to the system so that we can directly connect to the pod IPs:

$ sudo route add -net $(kubectl cluster-info dump | grep -m 1 cluster-cidr | cut -d = -f 2 | cut -d '"' -f 1) gw $(minikube ip)

After this, we can run the following test program:

```go
package main
import (
	"fmt"
	"os"
	"os/signal"
	"syscall"

	"github.com/gocql/gocql"
)

type connectObs struct{}

func (connectObs) ObserveConnect(c gocql.ObservedConnect) {
	if c.Err == nil {
		fmt.Printf(
			"Connected to host %s in %s\n",
			c.Host.HostnameAndPort(), c.End.Sub(c.Start),
		)
	} else {
		fmt.Printf(
			"Failed connecting to host %s after %s: %s\n",
			c.Host.HostnameAndPort(), c.End.Sub(c.Start), c.Err,
		)
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("Supply a cassandra host")
		os.Exit(1)
	}
	conf := gocql.NewCluster(os.Args[1])
	conf.ProtoVersion = 4
	_, err := gocql.NewSession(*conf)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Println("connected")
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGINT, syscall.SIGTERM)
	<-ch
}
```

Now we can start the test program and connect to a node:

$ go run -tags gocql_debug . $(kubectl get pod cassandra-0 --template '{{.status.podIP}}')

```
2022/11/11 10:28:14 gocql: Session.handleNodeConnected: 10.244.120.67:9042
2022/11/11 10:28:14 gocql: conns of pool after stopped "10.244.120.67": 2
2022/11/11 10:28:14 gocql: Session.handleNodeConnected: 10.244.120.68:9042
connected
2022/11/11 10:28:14 gocql: Session.handleNodeConnected: 10.244.120.69:9042
2022/11/11 10:28:14 gocql: conns of pool after stopped "10.244.120.68": 2
2022/11/11 10:28:14 gocql: conns of pool after stopped "10.244.120.69": 2 We can see that all three nodes were connected. Now we will restart one of the $ kubectl delete pod cassandra-2
pod "cassandra-2" deleted It will get a new IP. gocql should discover and connect to the new node, but
gocql misses the message
This will cause the I tried patching this so that the message doesn't get dropped, but the bug still Hope this helps. I'd be happy to assist further in fixing the bug :) |
I think that this broke at 64cda7b. If I revert to the commit before that (ae365fa), the restarted cassandra node is detected with its new IP.
I'm surprised this hasn't affected more users and gotten more attention. That suggests most people are not running on a container-orchestrated platform. Maybe the most common deployment is VMs or bare metal?
We have the same problem. I just noticed it when performing disaster recovery in a development env (restored a medusa backup, which causes a datacenter shutdown / restart). We are running k8ssandra-operator 0.38.2 (Cassandra 4.0.1) and gocql v1.2.1.

I was planning on fixing this by restarting pod(s) that have problems connecting to Cassandra, since these errors could be caused by actual network problems too. A never-ending reconnect loop in the pod is not a good idea. Maybe a few reconnect attempts and then change the pod health check response so that K8s will kill the pod and re-schedule it.
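As an illustration of that idea (not something prescribed by gocql or k8ssandra), a rough sketch in Go might look like the following; the probe query, interval, failure threshold, endpoint path, and the "cassandra" service name are all assumptions:

```go
package main

import (
	"net/http"
	"sync/atomic"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	// Assumption: the application reaches Cassandra through a k8s service named "cassandra".
	cluster := gocql.NewCluster("cassandra")
	cluster.ProtoVersion = 4
	session, err := gocql.NewSession(*cluster)
	if err != nil {
		panic(err)
	}
	defer session.Close()

	var consecutiveFailures int64
	const maxFailures = 3 // give up after a few failed probes instead of reconnecting forever

	// Periodically run a cheap query and count consecutive failures.
	go func() {
		for range time.Tick(10 * time.Second) {
			if err := session.Query("SELECT release_version FROM system.local").Exec(); err != nil {
				atomic.AddInt64(&consecutiveFailures, 1)
			} else {
				atomic.StoreInt64(&consecutiveFailures, 0)
			}
		}
	}()

	// Liveness endpoint: once the threshold is exceeded, report failure so that
	// Kubernetes kills the pod and re-schedules it with a fresh gocql session.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt64(&consecutiveFailures) >= maxFailures {
			http.Error(w, "cassandra unreachable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	http.ListenAndServe(":8080", nil)
}
```

Whether restarting the pod is the right reaction at all (versus waiting out a transient network problem) is exactly the trade-off mentioned above.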
Hi @justinfx, for context, we are experiencing a very similar situation here: we are running Cassandra 3.11.13 on many (hundreds of) clusters, and following a recent trend (on our side) of node replacements, with new IPs etc, we are seeing this happening daily. It is complex for us to reproduce, though, and we're working on a small isolated use-case which could expose the problem. Our GoCQL version is

The problem is definitely real, we witness it routinely, and we have thousands of Cassandra nodes running on k8s. I also think #1667 is related. We've been studying GoCQL code on our side, but have not identified the root cause yet. What we witness is basically clients dying with the error

As I mentioned, we're trying to reproduce the problem on a smaller scale.
@ufoot I'm glad to hear others are trying to look at this and narrow down repros. You seem to have quite a massive deployment, whereas my repro focused on what might be equivalent to an outage with all nodes restarting too quickly. If I recall, I did observe a slow but successful topology update when nodes are restarted one at a time with enough time between them. There is a bit of time on the client side where it still has the old node IP and can't connect to the new one yet. I'm just really concerned about more unexpected situations where nodes in a small cluster restart too quickly, leaving the cluster inaccessible from a running client because of the stale client pool.
@ufoot @sseidman and I all work together and have spent time investigating this issue. We've filed #1668 to document our findings. After initially thinking we were seeing a manifestation of what's reported here, I now think we're dealing with something different for the following reasons:
It may be that there's more than one underlying issue that can cause some kind of topology corruption in gocql, but it feels to me like the acute issue we're currently experiencing is likely to be different from the one originally described here.
Please answer these questions before submitting your issue. Thanks!
What version of Cassandra are you using?
N/A (Yugabyte 2.9.0)
What version of Gocql are you using?
gocql@v0.0.0-20200602185649-ef3952a45ff4
(https://github.com/yugabyte/gocql)
What version of Go are you using?
1.17.0
What did you do?
Scale all database nodes in the cluster down from 3 -> 0 and then back up from 0 -> 3
What did you expect to see?
I expected to see the gocql client log the connection failure for the 3 database node ip addresses, but then recover with 3 new ip addresses for the 3 new database pods.
What did you see instead?
The client continues to see the original 3 node ips as DOWN and never recovers with the new nodes.
refs #1575, yugabyte/yugabyte-db#10182
Because the forked client I am using is outdated, it is possible that this issue is resolved in a newer release of gocql, or that it is still outstanding as per #1575
My connection host list looks like the following:
In k8s these are 3 unique IP addresses in the cluster. When I scale down and back up, the pods get new IP addresses.
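The actual host list is not included above; purely as a hypothetical illustration, a configuration that passes three pod IPs straight to gocql might look like this (the addresses are made up):

```go
package main

import "github.com/gocql/gocql"

func main() {
	// Hypothetical pod IPs; the real host list from the report is not shown.
	cluster := gocql.NewCluster("10.1.0.11", "10.1.0.12", "10.1.0.13")
	cluster.ProtoVersion = 4
	session, err := gocql.NewSession(*cluster)
	if err != nil {
		panic(err)
	}
	defer session.Close()
}
```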
I am attaching the client application logs, using the build tag gocql_debug, showing two different scenarios. The first test is where I only scale from 3 -> 2 and then back up from 2 -> 3.
This log shows that it does seem to handle the old IP being down but then the new IP becoming part of the ring again.
cql_scale_1.log
The second test is where I scale all 3 nodes -> 0 and then back up to 3 again. I am using a policy that scales each node one at a time. The log shows the nodes going down, but they never recover and there is no sign of the new nodes.
cql_scale_all.log
If I manually kill the database pods, depending on the order I choose, I may or may not see this topology event: