Sseidman/ring update #1

Merged
merged 2 commits into from
Jan 25, 2023


@sseidman sseidman commented Jan 25, 2023

No events are currently dispatched for NEW_NODE topology changes. In a 3-node Cassandra cluster deployed on cloud infrastructure, a Cassandra instance was replaced using the replace_address flag, which moved it to a new cloud instance (new IP, new host ID). The following debug logs show how the client currently handles this change in topology.

2023/01/11 20:59:35 gocql: handling frame: [topology_change change=NEW_NODE host=10.128.191.122 port=9042]
2023/01/11 20:59:35 gocql: handling frame: [status_change change=UP host=10.128.191.122 port=9042]
2023/01/11 20:59:35 gocql: dispatching event: &{change:UP host:[10 128 191 122] port:9042}
2023/01/11 20:59:35 gocql: Session.handleNodeUp: 10.128.191.122:9042

No event is dispatched for the NEW_NODE frame; the client jumps straight to processing the UP event, so the new node is never added to the session ring. If all nodes in the cluster are replaced in this manner, clients eventually lose their connection to the cluster and begin reporting gocql: no hosts available in the pool.

After bisecting recent commits, I found this PR introduced the bug into the client. There appears to have been a fail-safe for when the NEW_NODE event was missed, but that function was removed in the previously mentioned PR.

This change proposes to refresh the hostSource ring on UP events when the host cannot be found in the ring. This ensures that the hostMap stays up to date even if NEW_NODE events are not processed.
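The idea can be sketched in isolation as follows. This is a minimal illustration, not the driver's actual code: `ring`, `hostInfo`, `refresh`, `handleNodeUp`, and the `fetch` callback are hypothetical stand-ins for the session's host map and for whatever query returns the current cluster membership, and the 10.0.0.x addresses are made up.

```go
package main

import (
	"fmt"
	"sync"
)

// hostInfo is a hypothetical stand-in for the driver's per-host record.
type hostInfo struct {
	addr string
	up   bool
}

// ring is a hypothetical stand-in for the session's host map.
type ring struct {
	mu    sync.Mutex
	hosts map[string]*hostInfo
}

func newRing(addrs ...string) *ring {
	r := &ring{hosts: make(map[string]*hostInfo)}
	for _, a := range addrs {
		r.hosts[a] = &hostInfo{addr: a, up: true}
	}
	return r
}

// refresh rebuilds the host map from the membership returned by fetch.
// Known hosts keep their record; hosts no longer reported are dropped.
// (Assumption: fetch models reading the cluster's current peer list.)
func (r *ring) refresh(fetch func() []string) {
	current := fetch()
	r.mu.Lock()
	defer r.mu.Unlock()
	next := make(map[string]*hostInfo, len(current))
	for _, a := range current {
		if h, ok := r.hosts[a]; ok {
			next[a] = h
		} else {
			next[a] = &hostInfo{addr: a}
		}
	}
	r.hosts = next
}

// has reports whether addr is currently in the ring.
func (r *ring) has(addr string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.hosts[addr]
	return ok
}

// handleNodeUp marks addr as up. If the host is unknown (for example
// because its NEW_NODE event was never dispatched), the ring is refreshed
// first. It returns false if the host is still unknown after the refresh.
func (r *ring) handleNodeUp(addr string, fetch func() []string) bool {
	if !r.has(addr) {
		r.refresh(fetch)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	h, ok := r.hosts[addr]
	if !ok {
		return false
	}
	h.up = true
	return true
}

func main() {
	r := newRing("10.0.0.1", "10.0.0.2", "10.0.0.3")
	// Membership after 10.0.0.3 was replaced by a new instance.
	cluster := func() []string {
		return []string{"10.0.0.1", "10.0.0.2", "10.128.191.122"}
	}
	ok := r.handleNodeUp("10.128.191.122", cluster)
	// prints: true true false
	fmt.Println(ok, r.has("10.128.191.122"), r.has("10.0.0.3"))
}
```

In this sketch the refresh runs only on the miss path, so UP events for hosts already in the ring do not trigger extra membership lookups.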

This should fix apache#1668, apache#1667, and apache#1582

@sseidman sseidman merged commit d5a64cf into master Jan 25, 2023

Successfully merging this pull request may close these issues.

Session hosts are not being updated