Auto reconnect or quickly exit when cassandra cluster down and then up. #1821

cmsxbc · 2020-03-19T03:47:32Z

Is your feature request related to a problem? Please describe.
We use a Cassandra cluster as storage. When the Cassandra cluster goes down, this error occurs, and Loki never recovers even after Cassandra is back up.
And if we try to restart Loki, it fails because it loops forever.

Describe the solution you'd like
gocql does not seem likely to fix this (e.g. gocql-915, created in 2017 and never closed).
Could Loki do something to reconnect? Or at least not retry the flush forever?

freggelicious · 2020-04-06T11:35:02Z

I'm experiencing the same issue. I run Cassandra as a StatefulSet within my K8s cluster. For some reason my Cassandra cluster became unhealthy. I initiated a restart of the Cassandra pods and the cluster was healthy again, but I got the same error you mention. I was able to get it working again by restarting the Loki pod, but that is not a preferred solution.

stale · 2020-05-06T11:43:32Z

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

liguozhong · 2022-07-20T09:41:11Z

I am also seeing this error on our production Loki.

liguozhong · 2022-07-20T09:42:25Z

**What this PR does / why we need it**:

`level=error ts=2022-07-20T04:07:11.881370946Z caller=flush.go:146 org_id=166256_8sxv2 msg="failed to flush user" err="store put chunk: gocql: no hosts available in the pool"`

A restart of Cassandra (on k8s) caused a production incident in which Loki had to be restarted. This has happened 4 times in our production environment in only half a year, and I hope the problem can be fixed. The code in this PR is not particularly complete, but it seems to work. If this PR is not merged, I hope other contributors in the Loki community can propose another PR to fix the problem. That a Cassandra restart makes Loki unavailable is already a conclusion recorded in our JIRA.

**Which issue(s) this PR fixes**: Fixes #1821 #7140


stale bot added the stale label May 6, 2020

stale bot closed this as completed May 13, 2020

liguozhong mentioned this issue Jul 20, 2022

[fix] cassandra: reconnection when k8s cassandra pods restart #6725

Merged

