Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsq_to_nsq: panic on shutdown #95

Merged
merged 1 commit into from
Dec 23, 2014
Merged

Conversation

mreiferson
Copy link
Member

My steps:

Using the following run file for nsq_to_nsq, have a local nsqd running. Do not have a destination nsqd running. sv stop nsq_to_nsq once (this hangs for 10 seconds). nsq_to_nsq continues backing off that it cannot connect to the remote nsqd. sv stop again, panic. If you need more details, please ask.

Run file:

#!/bin/sh
exec 2>&1

export GOMAXPROCS=2

exec chpst -u nsq:nsq /opt/nsq/bin/nsq_to_nsq \
    -topic=stuff \
    -destination-topic=stuff \
    -destination-nsqd-tcp-address=nsq-instance-1:1910 \
    -nsqd-tcp-address=localhost:1910 \
    -channel=nsq_to_nsq \
    -consumer-opt=tls_root_ca_file,/opt/nsq/certs/nsq-ca-cert.pem \
    -consumer-opt=tls_cert,/opt/nsq/certs/nsq-client-cert.pem \
    -consumer-opt=tls_key,/opt/nsq/certs/nsq-client-key.pem \
    -consumer-opt=tls_v1,true \
    -producer-opt=tls_root_ca_file,/opt/nsq/certs/nsq-ca-cert.pem \
    -producer-opt=tls_cert,/opt/nsq/certs/nsq-client-cert.pem \
    -producer-opt=tls_key,/opt/fnsq/certs/nsq-client-key.pem \
    -producer-opt=tls_v1,true

Log trace on panic:

2014-11-21_08:55:07.68147 2014/11/21 08:55:07 INF    2 (nsq-instance-1:1910) connecting to nsqd
2014-11-21_08:55:08.68159 2014/11/21 08:55:08 ERR    2 (nsq-instance-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_08:55:08.68162 2014/11/21 08:55:08 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041e875000
2014-11-21_08:55:08.68163 2014/11/21 08:55:08 INF    2 (nsq-instance-1:1910) connecting to nsqd
2014-11-21_08:55:09.68180 2014/11/21 08:55:09 ERR    2 (nsq-instance-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_08:55:09.68182 2014/11/21 08:55:09 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041e875001
2014-11-21_08:55:09.68183 2014/11/21 08:55:09 INF    2 (nsq-instance-1:1910) connecting to nsqd
2014-11-21_08:55:10.59484 2014/11/21 08:55:10 INF    1 [stuff/nsq_to_nsq] stopping handlers
2014-11-21_08:55:10.59814 panic: runtime error: send on closed channel
2014-11-21_08:55:10.59815 
2014-11-21_08:55:10.59816 goroutine 25 [running]:
2014-11-21_08:55:10.59816 runtime.panic(0x6fe900, 0x8eea1e)
2014-11-21_08:55:10.59816   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/runtime/panic.c:279 +0xf5
2014-11-21_08:55:10.59817 github.com/bitly/go-nsq.(*Consumer).onConnMessage(0xc20802e6c0, 0xc208003c20, 0xc2080c18b0)
2014-11-21_08:55:10.59817   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/consumer.go:490 +0x90
2014-11-21_08:55:10.59818 github.com/bitly/go-nsq.(*consumerConnDelegate).OnMessage(0xc208036040, 0xc208003c20, 0xc2080c18b0)
2014-11-21_08:55:10.59818   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/delegates.go:119 +0x3e
2014-11-21_08:55:10.59819 github.com/bitly/go-nsq.(*Conn).readLoop(0xc208003c20)
2014-11-21_08:55:10.59819   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/conn.go:495 +0xb5e
2014-11-21_08:55:10.59819 created by github.com/bitly/go-nsq.(*Conn).Connect
2014-11-21_08:55:10.59820   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/conn.go:168 +0x65c
2014-11-21_08:55:10.59821 
2014-11-21_08:55:10.59821 goroutine 16 [select]:
2014-11-21_08:55:10.59821 main.main()
2014-11-21_08:55:10.59822   /Users/mreiferson/dev/src/github.com/bitly/nsq/apps/nsq_to_nsq/nsq_to_nsq.go:396 +0x1586
2014-11-21_08:55:10.59822 
2014-11-21_08:55:10.59823 goroutine 19 [finalizer wait, 1 minutes]:
2014-11-21_08:55:10.59823 runtime.park(0x417500, 0x8f3788, 0x8f1a09)
2014-11-21_08:55:10.59823   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/runtime/proc.c:1369 +0x89
2014-11-21_08:55:10.59824 runtime.parkunlock(0x8f3788, 0x8f1a09)
2014-11-21_08:55:10.59824   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/runtime/proc.c:1385 +0x3b
2014-11-21_08:55:10.59825 runfinq()
2014-11-21_08:55:10.59825   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/runtime/mgc0.c:2644 +0xcf
2014-11-21_08:55:10.59825 runtime.goexit()
2014-11-21_08:55:10.59826   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/runtime/proc.c:1445
2014-11-21_08:55:10.59826 
2014-11-21_08:55:10.59826 goroutine 20 [syscall]:
2014-11-21_08:55:10.59827 os/signal.loop()
2014-11-21_08:55:10.59827   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/os/signal/signal_unix.go:21 +0x1e
2014-11-21_08:55:10.59828 created by os/signal.init·1
2014-11-21_08:55:10.59828   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/os/signal/signal_unix.go:27 +0x32
2014-11-21_08:55:10.59828 
2014-11-21_08:55:10.59829 goroutine 21 [select]:
2014-11-21_08:55:10.59829 github.com/bitly/go-nsq.(*Consumer).rdyLoop(0xc20802e6c0)
2014-11-21_08:55:10.59831   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/consumer.go:625 +0xee1
2014-11-21_08:55:10.59831 created by github.com/bitly/go-nsq.NewConsumer
2014-11-21_08:55:10.59832   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/consumer.go:155 +0x5ef
2014-11-21_08:55:10.59832 
2014-11-21_08:55:10.59833 goroutine 22 [IO wait]:
2014-11-21_08:55:10.59833 net.runtime_pollWait(0x7f08e63c7b20, 0x77, 0x0)
2014-11-21_08:55:10.59834   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/runtime/netpoll.goc:146 +0x66
2014-11-21_08:55:10.59834 net.(*pollDesc).Wait(0xc208132f40, 0x77, 0x0, 0x0)
2014-11-21_08:55:10.59834   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/fd_poll_runtime.go:84 +0x46
2014-11-21_08:55:10.59835 net.(*pollDesc).WaitWrite(0xc208132f40, 0x0, 0x0)
2014-11-21_08:55:10.59835   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/fd_poll_runtime.go:93 +0x42
2014-11-21_08:55:10.59836 net.(*netFD).connect(0xc208132ee0, 0x0, 0x0, 0x7f08e63c6c88, 0xc2080e5f60, 0xecc00f4ee, 0x28a53e51, 0x8f4de0, 0x0, 0x0)
2014-11-21_08:55:10.59836   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/fd_unix.go:114 +0x22e
2014-11-21_08:55:10.59837 net.(*netFD).dial(0xc208132ee0, 0x7f08e63c6c38, 0x0, 0x7f08e63c6c38, 0xc208010c60, 0xecc00f4ee, 0x28a53e51, 0x8f4de0, 0x7d8440, 0x0, ...)
2014-11-21_08:55:10.59837   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/sock_posix.go:115 +0x321
2014-11-21_08:55:10.59838 net.socket(0x744b90, 0x3, 0x2, 0x1, 0x0, 0xc208010c00, 0x7f08e63c6c38, 0x0, 0x7f08e63c6c38, 0xc208010c60, ...)
2014-11-21_08:55:10.59839   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/sock_posix.go:91 +0x424
2014-11-21_08:55:10.59839 net.internetSocket(0x744b90, 0x3, 0x7f08e63c6c38, 0x0, 0x7f08e63c6c38, 0xc208010c60, 0xecc00f4ee, 0x28a53e51, 0x8f4de0, 0x1, ...)
2014-11-21_08:55:10.59839   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/ipsock_posix.go:137 +0x161
2014-11-21_08:55:10.59840 net.dialTCP(0x744b90, 0x3, 0x0, 0xc208010c60, 0xecc00f4ee, 0x28a53e51, 0x8f4de0, 0x0, 0x0, 0x0)
2014-11-21_08:55:10.59840   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/tcpsock_posix.go:156 +0x13a
2014-11-21_08:55:10.59841 net.dialSingle(0x744b90, 0x3, 0x7fff12835d6f, 0x12, 0x0, 0x0, 0x7f08e63c6ba8, 0xc208010c60, 0xecc00f4ee, 0x28a53e51, ...)
2014-11-21_08:55:10.59841   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/dial.go:238 +0x3ef
2014-11-21_08:55:10.59842 net.func·016(0xecc00f4ee, 0x28a53e51, 0x8f4de0, 0x0, 0x0, 0x0, 0x0)
2014-11-21_08:55:10.59842   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/dial.go:164 +0x132
2014-11-21_08:55:10.59843 net.dial(0x744b90, 0x3, 0x7f08e63c6ba8, 0xc208010c60, 0x7f08e6238978, 0xecc00f4ee, 0x28a53e51, 0x8f4de0, 0x0, 0x0, ...)
2014-11-21_08:55:10.59843   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/fd_unix.go:40 +0x75
2014-11-21_08:55:10.59844 net.(*Dialer).Dial(0xc2080eb9c0, 0x744b90, 0x3, 0x7fff12835d6f, 0x12, 0x0, 0x0, 0x0, 0x0)
2014-11-21_08:55:10.59844   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/dial.go:171 +0x44f
2014-11-21_08:55:10.59845 net.DialTimeout(0x744b90, 0x3, 0x7fff12835d6f, 0x12, 0x3b9aca00, 0x0, 0x0, 0x0, 0x0)
2014-11-21_08:55:10.59845   /usr/local/Cellar/go/1.3.1/libexec/src/pkg/net/dial.go:151 +0xda
2014-11-21_08:55:10.59846 github.com/bitly/go-nsq.(*Conn).Connect(0xc20802dc20, 0x9, 0x0, 0x0)
2014-11-21_08:55:10.59846   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/conn.go:135 +0x7f
2014-11-21_08:55:10.59847 github.com/bitly/go-nsq.(*Producer).connect(0xc20809a000, 0x0, 0x0)
2014-11-21_08:55:10.59847   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/producer.go:220 +0x3f4
2014-11-21_08:55:10.59848 github.com/bitly/go-nsq.(*Producer).sendCommandAsync(0xc20809a000, 0xc2080c1860, 0xc2080d0900, 0xc208010ae0, 0x3, 0x3, 0x0, 0x0)
2014-11-21_08:55:10.59849   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/producer.go:182 +0xd7
2014-11-21_08:55:10.59850 github.com/bitly/go-nsq.(*Producer).PublishAsync(0xc20809a000, 0x7fff12835d47, 0x9, 0xc20807e000, 0xea, 0x600, 0xc2080d0900, 0xc208010ae0, 0x3, 0x3, ...)
2014-11-21_08:55:10.59850   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/producer.go:128 +0x1bb
2014-11-21_08:55:10.59851 main.(*PublishHandler).HandleMessage(0xc2080335e0, 0xc2080ef1d0, 0x0, 0x0)
2014-11-21_08:55:10.59851   /Users/mreiferson/dev/src/github.com/bitly/nsq/apps/nsq_to_nsq/nsq_to_nsq.go:244 +0x36e
2014-11-21_08:55:10.59852 github.com/bitly/go-nsq.(*Consumer).handlerLoop(0xc20802e6c0, 0x7f08e63c6ae8, 0xc2080335e0)
2014-11-21_08:55:10.59852   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/consumer.go:926 +0x1c5
2014-11-21_08:55:10.59853 created by github.com/bitly/go-nsq.(*Consumer).AddConcurrentHandlers
2014-11-21_08:55:10.59853   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/consumer.go:908 +0xf1
2014-11-21_08:55:10.59854 
2014-11-21_08:55:10.59854 goroutine 23 [chan receive, 1 minutes]:
2014-11-21_08:55:10.59855 main.(*PublishHandler).responder(0xc2080335e0)
2014-11-21_08:55:10.59856   /Users/mreiferson/dev/src/github.com/bitly/nsq/apps/nsq_to_nsq/nsq_to_nsq.go:92 +0xb9
2014-11-21_08:55:10.59856 created by main.main
2014-11-21_08:55:10.59856   /Users/mreiferson/dev/src/github.com/bitly/nsq/apps/nsq_to_nsq/nsq_to_nsq.go:382 +0x12f2
2014-11-21_08:55:10.59857 
2014-11-21_08:55:10.59857 goroutine 26 [select]:
2014-11-21_08:55:10.59858 github.com/bitly/go-nsq.(*Conn).writeLoop(0xc208003c20)
2014-11-21_08:55:10.59858   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/conn.go:523 +0x7bc
2014-11-21_08:55:10.59859 created by github.com/bitly/go-nsq.(*Conn).Connect
2014-11-21_08:55:10.59859   /var/folders/4q/k4wfn18j7v79zj7l2tqr_5p80000gn/T/godep/rev/ac/221df5bdb6d5bfc624a297b5b00b59d7065be2/src/github.com/bitly/go-nsq/conn.go:169 +0x677

@mreiferson mreiferson changed the title nsq_to_nsq panic on shutdown via go-nsq api. nsq_to_nsq: panic on shutdown Nov 21, 2014
@mreiferson
Copy link
Member

hi @twmb, thanks for the report.

I'm having trouble reproducing this locally - can you provide more of the logs before where you began to paste?

@twmb
Copy link
Contributor Author

twmb commented Nov 21, 2014

more logs before:

2014-11-21_09:00:30.76432 2014/11/21 09:00:30 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:31.76475 2014/11/21 09:00:31 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:31.76477 2014/11/21 09:00:31 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75125
2014-11-21_09:00:31.76478 2014/11/21 09:00:31 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:31.76479 2014/11/21 09:00:31 WRN    1 [stuff/nsq_to_nsq] backing off for 2.0000 seconds (backoff level 1), setting all to RDY 0
2014-11-21_09:00:32.76811 2014/11/21 09:00:32 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:32.76813 2014/11/21 09:00:32 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75126
2014-11-21_09:00:32.76815 2014/11/21 09:00:32 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:33.76090 2014/11/21 09:00:33 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: no route to host
2014-11-21_09:00:33.76093 2014/11/21 09:00:33 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: no route to host) for msg 076f3d041ec75127
2014-11-21_09:00:33.76093 2014/11/21 09:00:33 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:33.76476 2014/11/21 09:00:33 WRN    1 [stuff/nsq_to_nsq] (localhost:1910) backoff timeout expired, sending RDY 1
2014-11-21_09:00:34.76144 2014/11/21 09:00:34 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:34.76147 2014/11/21 09:00:34 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75128
2014-11-21_09:00:34.76148 2014/11/21 09:00:34 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:34.76148 2014/11/21 09:00:34 WRN    1 [stuff/nsq_to_nsq] backing off for 4.0000 seconds (backoff level 2), setting all to RDY 0
2014-11-21_09:00:35.76157 2014/11/21 09:00:35 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:35.76160 2014/11/21 09:00:35 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75129
2014-11-21_09:00:35.76161 2014/11/21 09:00:35 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:36.76194 2014/11/21 09:00:36 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:36.76197 2014/11/21 09:00:36 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7512a
2014-11-21_09:00:36.76198 2014/11/21 09:00:36 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:37.76239 2014/11/21 09:00:37 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:37.76242 2014/11/21 09:00:37 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7512b
2014-11-21_09:00:37.76242 2014/11/21 09:00:37 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:38.76155 2014/11/21 09:00:38 WRN    1 [stuff/nsq_to_nsq] (localhost:1910) backoff timeout expired, sending RDY 1
2014-11-21_09:00:38.76263 2014/11/21 09:00:38 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:38.76264 2014/11/21 09:00:38 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7512c
2014-11-21_09:00:38.76264 2014/11/21 09:00:38 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:38.76265 2014/11/21 09:00:38 WRN    1 [stuff/nsq_to_nsq] backing off for 8.0000 seconds (backoff level 3), setting all to RDY 0
2014-11-21_09:00:39.76270 2014/11/21 09:00:39 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:39.76273 2014/11/21 09:00:39 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7512d
2014-11-21_09:00:39.76273 2014/11/21 09:00:39 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:39.79121 2014/11/21 09:00:39 INF    1 [stuff/nsq_to_nsq] stopping...
2014-11-21_09:00:40.76310 2014/11/21 09:00:40 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:40.76312 2014/11/21 09:00:40 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7512e
2014-11-21_09:00:40.76312 2014/11/21 09:00:40 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:41.76313 2014/11/21 09:00:41 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:41.76317 2014/11/21 09:00:41 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7512f
2014-11-21_09:00:41.76318 2014/11/21 09:00:41 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:42.76356 2014/11/21 09:00:42 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:42.76358 2014/11/21 09:00:42 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75130
2014-11-21_09:00:42.76358 2014/11/21 09:00:42 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:43.76368 2014/11/21 09:00:43 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:43.76370 2014/11/21 09:00:43 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75131
2014-11-21_09:00:43.76371 2014/11/21 09:00:43 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:44.76407 2014/11/21 09:00:44 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:44.76409 2014/11/21 09:00:44 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75132
2014-11-21_09:00:44.76410 2014/11/21 09:00:44 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:45.76415 2014/11/21 09:00:45 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:45.76418 2014/11/21 09:00:45 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75133
2014-11-21_09:00:45.76418 2014/11/21 09:00:45 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:46.76269 2014/11/21 09:00:46 WRN    1 [stuff/nsq_to_nsq] (localhost:1910) backoff timeout expired, sending RDY 1
2014-11-21_09:00:46.76433 2014/11/21 09:00:46 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:46.76435 2014/11/21 09:00:46 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75134
2014-11-21_09:00:46.76445 2014/11/21 09:00:46 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:46.76446 2014/11/21 09:00:46 WRN    1 [stuff/nsq_to_nsq] backing off for 16.0000 seconds (backoff level 4), setting all to RDY 0
2014-11-21_09:00:47.76476 2014/11/21 09:00:47 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:47.76478 2014/11/21 09:00:47 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75135
2014-11-21_09:00:47.76478 2014/11/21 09:00:47 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:48.76089 2014/11/21 09:00:48 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: no route to host
2014-11-21_09:00:48.76092 2014/11/21 09:00:48 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: no route to host) for msg 076f3d041ec75136
2014-11-21_09:00:48.76092 2014/11/21 09:00:48 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:49.76140 2014/11/21 09:00:49 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:49.76143 2014/11/21 09:00:49 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75137
2014-11-21_09:00:49.76143 2014/11/21 09:00:49 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:50.76169 2014/11/21 09:00:50 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:50.76171 2014/11/21 09:00:50 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75138
2014-11-21_09:00:50.76174 2014/11/21 09:00:50 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:51.76091 2014/11/21 09:00:51 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: no route to host
2014-11-21_09:00:51.76092 2014/11/21 09:00:51 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: no route to host) for msg 076f3d041ec75139
2014-11-21_09:00:51.76094 2014/11/21 09:00:51 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:52.76119 2014/11/21 09:00:52 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:52.76122 2014/11/21 09:00:52 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7513a
2014-11-21_09:00:52.76122 2014/11/21 09:00:52 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:53.76156 2014/11/21 09:00:53 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:53.76158 2014/11/21 09:00:53 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7513b
2014-11-21_09:00:53.76158 2014/11/21 09:00:53 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:54.76164 2014/11/21 09:00:54 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:54.76167 2014/11/21 09:00:54 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7513c
2014-11-21_09:00:54.76167 2014/11/21 09:00:54 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:55.76205 2014/11/21 09:00:55 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:55.76207 2014/11/21 09:00:55 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7513d
2014-11-21_09:00:55.76207 2014/11/21 09:00:55 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:56.76234 2014/11/21 09:00:56 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:56.76236 2014/11/21 09:00:56 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7513e
2014-11-21_09:00:56.76237 2014/11/21 09:00:56 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:57.76241 2014/11/21 09:00:57 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:57.76243 2014/11/21 09:00:57 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7513f
2014-11-21_09:00:57.76243 2014/11/21 09:00:57 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:58.76089 2014/11/21 09:00:58 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: no route to host
2014-11-21_09:00:58.76093 2014/11/21 09:00:58 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: no route to host) for msg 076f3d041ec75140
2014-11-21_09:00:58.76093 2014/11/21 09:00:58 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:00:59.76126 2014/11/21 09:00:59 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:00:59.76128 2014/11/21 09:00:59 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75141
2014-11-21_09:00:59.76128 2014/11/21 09:00:59 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:00.76148 2014/11/21 09:01:00 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:01:00.76150 2014/11/21 09:01:00 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75142
2014-11-21_09:01:00.76152 2014/11/21 09:01:00 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:01.76507 2014/11/21 09:01:01 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:01:01.76509 2014/11/21 09:01:01 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75143
2014-11-21_09:01:01.76510 2014/11/21 09:01:01 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:02.76506 2014/11/21 09:01:02 WRN    1 [stuff/nsq_to_nsq] (localhost:1910) backoff timeout expired, sending RDY 1
2014-11-21_09:01:02.76546 2014/11/21 09:01:02 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:01:02.76547 2014/11/21 09:01:02 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75144
2014-11-21_09:01:02.76548 2014/11/21 09:01:02 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:02.76548 2014/11/21 09:01:02 WRN    1 [stuff/nsq_to_nsq] backing off for 32.0000 seconds (backoff level 5), setting all to RDY 0
2014-11-21_09:01:03.76315 2014/11/21 09:01:03 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: no route to host
2014-11-21_09:01:03.76318 2014/11/21 09:01:03 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: no route to host) for msg 076f3d041ec75145
2014-11-21_09:01:03.76318 2014/11/21 09:01:03 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:04.77337 2014/11/21 09:01:04 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:01:04.77339 2014/11/21 09:01:04 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75146
2014-11-21_09:01:04.77340 2014/11/21 09:01:04 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:05.76512 2014/11/21 09:01:05 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:01:05.76515 2014/11/21 09:01:05 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75147
2014-11-21_09:01:05.76515 2014/11/21 09:01:05 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:06.76102 2014/11/21 09:01:06 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: no route to host
2014-11-21_09:01:06.76105 2014/11/21 09:01:06 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: no route to host) for msg 076f3d041ec75148
2014-11-21_09:01:06.76106 2014/11/21 09:01:06 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:07.76126 2014/11/21 09:01:07 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:01:07.76129 2014/11/21 09:01:07 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec75149
2014-11-21_09:01:07.76129 2014/11/21 09:01:07 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:08.76151 2014/11/21 09:01:08 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:01:08.76154 2014/11/21 09:01:08 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7514a
2014-11-21_09:01:08.76155 2014/11/21 09:01:08 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:09.76190 2014/11/21 09:01:09 ERR    2 (nsq-control-1:1910) error connecting to nsqd - dial tcp 1.2.3.4:5678: i/o timeout
2014-11-21_09:01:09.76193 2014/11/21 09:01:09 ERR    1 [stuff/nsq_to_nsq] Handler returned error (dial tcp 1.2.3.4:5678: i/o timeout) for msg 076f3d041ec7514b
2014-11-21_09:01:09.76193 2014/11/21 09:01:09 INF    2 (nsq-control-1:1910) connecting to nsqd
2014-11-21_09:01:09.79134 2014/11/21 09:01:09 INF    1 [stuff/nsq_to_nsq] stopping handlers
2014-11-21_09:01:09.79403 panic: runtime error: send on closed channel
// panic, which I pasted

@twmb
Copy link
Contributor Author

twmb commented Nov 21, 2014

/opt/nsq/bin/nsq_to_nsq -version
nsq_to_nsq v0.2.31

@mreiferson
Copy link
Member

(from discussion on IRC)

nsq.Consumer tries to ensure a clean exit by waiting for all handlers to finish processing before exiting.

The problem is that in this case it has already ingested N (max_in_flight) messages (default of 200) and they're queued up inside nsq_to_nsq waiting to be pipelined to the destination.

Also, because there is only one destination host configured, nsq_to_nsq only configures a single handler internally (producers map 1:1 with a destination nsqd).

So, in this case, because the consumer has ingested N messages and because there is only one handler, we serially fail to connect to the destination nsqd (each failed connection timing out at 1s). This exceeds the consumer's maximum clean-stop wait time of 30s, forcefully closing the consumer, which apparently causes a panic.

In this example the problem is exacerbated by the fact that this destination address is like a black hole, where each connection blocks for 1s until timing out (i.e. it would only require 30 messages in flight to hit this bug under these conditions).

We obviously can/should fix the panic, but the practical problem of handlers blocking is a much larger issue. Thinking out loud, I guess nsq.Producer could implement a connection back off strategy where it would fail instantly on subsequent requests to PUB rather than trying to connect each time.

Thoughts?

@mreiferson
Copy link
Member

RFR, one possible "solution" @jehiah

@mreiferson
Copy link
Member

There's nothing we can do about the general case of a handler blocking and clean close, but I'm going to open an issue to discuss the Producer backoff I mentioned earlier for the specific case of nsq_to_nsq.

@mreiferson
Copy link
Member

ping @jehiah

jehiah added a commit that referenced this pull request Dec 23, 2014
@jehiah jehiah merged commit 4e74fa1 into nsqio:master Dec 23, 2014
@mreiferson mreiferson deleted the stop_panic_95 branch February 7, 2015 05:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants