kuberesolver didn't update endpoints when service changed #4

Closed
tomwilkie opened this issue Nov 20, 2017 · 3 comments

tomwilkie commented Nov 20, 2017

I have some frontends using kuberesolver to discover backends and talk to them. I deployed updated backends, and one of the frontends never picked up the new backends; it just started erroring with:

2017/11/19 13:02:10 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.60.2.10:9095: getsockopt: connection refused"; Reconnecting to {10.60.2.10:9095 <nil>}

On the other frontends, at around the same time, I get:

2017/11/19 13:02:10 kuberesolver: 10.60.2.10:9095 DELETED from querier
2017/11/19 13:02:10 Failed to dial 10.60.2.10:9095: context canceled; please retry.
2017/11/19 13:02:27 kuberesolver: 10.60.2.11:9095 ADDED to querier
2017/11/19 13:02:37 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.60.1.8:9095: getsockopt: connection refused"; Reconnecting to {10.60.1.8:9095 <nil>}
2017/11/19 13:02:37 kuberesolver: 10.60.1.8:9095 DELETED from querier
2017/11/19 13:02:37 Failed to dial 10.60.1.8:9095: context canceled; please retry.
2017/11/19 13:02:53 kuberesolver: 10.60.1.18:9095 ADDED to querier
2017/11/19 13:03:03 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.60.0.8:9095: getsockopt: connection refused"; Reconnecting to {10.60.0.8:9095 <nil>}
2017/11/19 13:03:03 kuberesolver: 10.60.0.8:9095 DELETED from querier

Looking at the goroutine dump for the frontend with the failures, I see that one of the watch goroutines is just sitting there (maybe it isn't getting events from the api-server?):

goroutine 30 [select, 2397 minutes]:
github.com/sercand/kuberesolver.(*kubeResolver).watch(0xc4201c3f20, 0xc4201c3f0d, 0x7, 0xc4201c4c60, 0xc4201c4c00, 0x0, 0x0)
	vendor/github.com/sercand/kuberesolver/resolver.go:73 +0x56b
github.com/sercand/kuberesolver.(*kubeResolver).Resolve.func1()
	vendor/github.com/sercand/kuberesolver/resolver.go:38 +0x52
github.com/sercand/kuberesolver.until.func1(0xc4201c1cb0)
	vendor/github.com/sercand/kuberesolver/util.go:20 +0x43
github.com/sercand/kuberesolver.until(0xc4201c1cb0, 0x3b9aca00, 0xc4201c4c60)
	vendor/github.com/sercand/kuberesolver/util.go:21 +0x73
created by github.com/sercand/kuberesolver.(*kubeResolver).Resolve
	vendor/github.com/sercand/kuberesolver/resolver.go:42 +0x1ac

The only other similar stack trace:

goroutine 50 [select, 28 minutes]:
github.com/sercand/kuberesolver.(*kubeResolver).watch(0xc420226740, 0xc42022672d, 0xb, 0xc4201c4f60, 0xc4201c4f00, 0x0, 0x0)
	vendor/github.com/sercand/kuberesolver/resolver.go:73 +0x56b
github.com/sercand/kuberesolver.(*kubeResolver).Resolve.func1()
	vendor/github.com/sercand/kuberesolver/resolver.go:38 +0x52
github.com/sercand/kuberesolver.until.func1(0xc420230060)
	vendor/github.com/sercand/kuberesolver/util.go:20 +0x43
github.com/sercand/kuberesolver.until(0xc420230060, 0x3b9aca00, 0xc4201c4f60)
	vendor/github.com/sercand/kuberesolver/util.go:21 +0x73
created by github.com/sercand/kuberesolver.(*kubeResolver).Resolve
	vendor/github.com/sercand/kuberesolver/resolver.go:42 +0x1ac

That corresponds nicely with the two backend services I resolve via kuberesolver.

Perhaps there should be a timeout on this watch, to catch intermittent failures like this? I think that is how the Kubernetes Go client behaves.
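
For illustration, here is a minimal sketch of what such a per-watch timeout could look like, assuming the resolver watches the Endpoints API over plain HTTP streaming; watchWithTimeout, the URL construction and the handle callback are hypothetical and not kuberesolver's actual code. The idea is the same as client-go's reflector: pass timeoutSeconds so the API server closes an idle stream after a while, then loop around and re-establish the watch.

// Sketch only: re-establish the Endpoints watch with a server-side timeout so a
// silently stalled stream is eventually torn down and retried. The function name,
// URL construction and handle callback are hypothetical.
package watchsketch

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func watchWithTimeout(ctx context.Context, client *http.Client, apiServer, namespace, service string, handle func(event json.RawMessage)) error {
	// timeoutSeconds is honoured by the Kubernetes API server: when it expires the
	// server closes the stream, and the outer loop starts a fresh watch.
	url := fmt.Sprintf("%s/api/v1/namespaces/%s/endpoints?watch=true&fieldSelector=metadata.name%%3D%s&timeoutSeconds=300",
		apiServer, namespace, service)
	for {
		req, err := http.NewRequest("GET", url, nil)
		if err != nil {
			return err
		}
		resp, err := client.Do(req.WithContext(ctx))
		if err != nil {
			// Transient error: back off briefly and retry, unless we were cancelled.
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(time.Second):
				continue
			}
		}
		dec := json.NewDecoder(resp.Body)
		for {
			var ev json.RawMessage
			if err := dec.Decode(&ev); err != nil {
				break // stream ended (timeout, EOF or error): restart the watch
			}
			handle(ev)
		}
		resp.Body.Close()
		if ctx.Err() != nil {
			return ctx.Err()
		}
	}
}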

sercand (Owner) commented Nov 28, 2017

Which gRPC version are you using? grpc-go added new resolver and balancer APIs, and I have not checked whether kuberesolver works correctly with the newer API.
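
For reference, a minimal sketch of the newer grpc-go resolver API (google.golang.org/grpc/resolver) being referred to here; the scheme, type names and the hard-coded address are illustrative only, and the exact option types have shifted between grpc-go releases, so treat the signatures as approximate rather than as kuberesolver's implementation.

// Illustrative only: how a custom resolver plugs into the newer grpc-go resolver API.
package resolversketch

import (
	"google.golang.org/grpc/resolver"
)

type exampleBuilder struct{}

// Scheme makes targets like "kubernetes:///my-service" route to this builder.
func (*exampleBuilder) Scheme() string { return "kubernetes" }

func (*exampleBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
	r := &exampleResolver{cc: cc}
	// A real implementation would start a watch here and push every change via cc.UpdateState.
	r.ResolveNow(resolver.ResolveNowOptions{})
	return r, nil
}

type exampleResolver struct {
	cc resolver.ClientConn
}

func (r *exampleResolver) ResolveNow(resolver.ResolveNowOptions) {
	// Hand the current backend set to gRPC; the address is a placeholder.
	r.cc.UpdateState(resolver.State{
		Addresses: []resolver.Address{{Addr: "10.60.2.11:9095"}},
	})
}

func (r *exampleResolver) Close() {}

func init() {
	resolver.Register(&exampleBuilder{})
}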

benley commented Feb 12, 2019

It looks like this was probably fixed in #6

sercand (Owner) commented Feb 13, 2019

@benley most probably

sercand closed this as completed Feb 13, 2019