
Auto-reconnect or exit quickly when the Cassandra cluster goes down and comes back up. #1821

Closed
cmsxbc opened this issue Mar 19, 2020 · 4 comments · Fixed by #6725
Labels
stale A stale issue or PR that will automatically be closed.

Comments

@cmsxbc

cmsxbc commented Mar 19, 2020

Is your feature request related to a problem? Please describe.
We use a Cassandra cluster as storage. When the Cassandra cluster goes down, this error occurs, and Loki never recovers, even after Cassandra is back up.
And if we try to restart Loki, it gets stuck because it keeps retrying the flush forever.

Describe the solution you'd like
Since gocql is unlikely to fix this (e.g. gocql-915, opened in 2017, has never been closed), could Loki do something to reconnect? Or at least stop retrying the flush forever?

Describe alternatives you've considered

Additional context

@freggelicious

I'm experiencing the same issue. I run Cassandra as a StatefulSet within my K8s cluster. For some reason my Cassandra cluster became unhealthy. I restarted the Cassandra pods and the cluster was healthy again, but I got the same error you mention. I was able to get it working again by restarting the Loki pod, but that is not ideal.

@stale

stale bot commented May 6, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label May 6, 2020
@stale stale bot closed this as completed May 13, 2020
@liguozhong
Contributor

I also see this error on our production Loki.


MichelHollands pushed a commit that referenced this issue Nov 8, 2022
**What this PR does / why we need it**:

level=error ts=2022-07-20T04:07:11.881370946Z caller=flush.go:146 org_id=166256_8sxv2 msg="failed to flush user" err="store put chunk: gocql: no hosts available in the pool"

Restarting Cassandra (on k8s) caused production incidents in which Loki also had to be restarted. This happened 4 times in our production environment in just six months.

I hope this problem can be fixed. The code in this PR is not entirely complete, but it appears to work.

If this PR is not merged, I hope other contributors in the Loki community can propose another PR to fix this problem.

That Cassandra restarts make Loki unavailable is already a conclusion recorded in our JIRA.

**Which issue(s) this PR fixes**:
Fixes #1821 #7140
changhyuni pushed a commit to changhyuni/loki that referenced this issue Nov 8, 2022
Abuelodelanada pushed a commit to canonical/loki that referenced this issue Dec 1, 2022

3 participants