This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Metrictank does not rediscover cassandra node IPs correctly #1566

Closed
replay opened this issue Dec 7, 2019 · 8 comments · Fixed by #1579

@replay
Contributor

replay commented Dec 7, 2019

If the Cassandra cluster temporarily becomes unavailable and then comes back with different IPs, Metrictank does not rediscover the new IPs, so it has to be restarted.
This may involve updating gocql.

@replay replay added this to the sprint-5 milestone Dec 7, 2019
@Dieterbe
Contributor

Dieterbe commented Dec 9, 2019

You mean that when the configured hostname resolves to new IPs, MT should resolve the hostname again, right?

@fkaleo
Contributor

fkaleo commented Dec 9, 2019

Is it apache/cassandra-gocql-driver#915?

@replay
Contributor Author

replay commented Dec 9, 2019

@Dieterbe yes, it will have to do another DNS lookup

@robert-milan
Contributor

First, review the gocql changes and try to update it.

@robert-milan
Contributor

Do we want to implement this fix in Metrictank or gocql?

@woodsaj
Member

woodsaj commented Dec 20, 2019

I had a quick look over the gocql code, and from what I could see, gocql is already intended to do this: it continually retries connecting to nodes, and if all nodes are down it is supposed to try to reconnect to the address that was first passed in.

The problem is that during initialization, gocql resolves the passed hosts to IP addresses. It then uses these addresses in the reconnect code.

So personally, I think we should fix this in gocql and have it resolve the passed hostnames when retrying, not just at startup.

@robert-milan
Contributor

Yeah, I was reading over some of the tickets as well. As I started implementing the fix in Metrictank, it seemed to add a lot of complexity for one small thing, and then Florian suggested we just change it in gocql. I'll start looking at implementing it there instead.

@robert-milan
Contributor

Fixing it in gocql turned out to be more work than fixing it in Metrictank. For now we will fix it in Metrictank, and once it is fixed in gocql we can revert the PR and update our gocql dependency again.
