Retry on socket timeouts #9

Closed

alevchuk
Contributor

Context: I have 3 nodes running in different environments, and occasionally the monitor crashes after a few days, on different nodes.

Crash log:
2020-01-10T14:37:07Z INFO Refresh took 0:00:00.654659 seconds, sleeping for 30.0 seconds
2020-01-10T14:37:38Z INFO Refresh took 0:00:00.596821 seconds, sleeping for 30.0 seconds
2020-01-10T14:38:08Z INFO Refresh took 0:00:00.600586 seconds, sleeping for 30.0 seconds
Traceback (most recent call last):
  File "/home/bitcoin/jvstein/bitcoin-prometheus-exporter/bitcoind-monitor.py", line 334, in <module>
    main()
  File "/home/bitcoin/jvstein/bitcoin-prometheus-exporter/bitcoind-monitor.py", line 318, in main
    refresh_metrics()
  File "/home/bitcoin/jvstein/bitcoin-prometheus-exporter/bitcoind-monitor.py", line 216, in refresh_metrics
    blockchaininfo = bitcoinrpc("getblockchaininfo")
  File "/home/bitcoin/monitoring-bitcoind/lib/python3.7/site-packages/riprova/retry.py", line 129, in wrapper
    return retrier.run(fn, *args, **_kw)
  File "/home/bitcoin/monitoring-bitcoind/lib/python3.7/site-packages/riprova/retrier.py", line 291, in run
    self._handle_error(err)
  File "/home/bitcoin/monitoring-bitcoind/lib/python3.7/site-packages/riprova/retrier.py", line 232, in _handle_error
    raise err
  File "/home/bitcoin/monitoring-bitcoind/lib/python3.7/site-packages/riprova/retrier.py", line 288, in run
    return self._call(fn, *args, **kw)
  File "/home/bitcoin/monitoring-bitcoind/lib/python3.7/site-packages/riprova/retrier.py", line 162, in _call
    res = fn(*args, **kw)
  File "/home/bitcoin/jvstein/bitcoin-prometheus-exporter/bitcoind-monitor.py", line 180, in bitcoinrpc
    result = rpc_client().call(*args)
  File "/home/bitcoin/monitoring-bitcoind/lib/python3.7/site-packages/bitcoin/rpc.py", line 352, in call
    return self._call(service_name, *args)
  File "/home/bitcoin/monitoring-bitcoind/lib/python3.7/site-packages/bitcoin/rpc.py", line 236, in _call
    response = self._get_response()
  File "/home/bitcoin/monitoring-bitcoind/lib/python3.7/site-packages/bitcoin/rpc.py", line 261, in _get_response
    http_response = self.__conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out
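
The traceback shows `socket.timeout` escaping the retry wrapper, which suggests it was not among the exceptions treated as retryable. The idea behind this PR can be sketched with the standard library alone. This is a hypothetical illustration, not the merged change and not riprova's API: `call_with_retry`, `max_elapsed`, `delay`, and `retry_on` are names invented for the example.

```python
import socket
import time


def call_with_retry(fn, *args, max_elapsed=30.0, delay=0.5,
                    retry_on=(socket.timeout, ConnectionError)):
    """Call fn(*args), retrying on the listed exceptions until max_elapsed seconds pass.

    Hypothetical helper: the names and defaults are assumptions for illustration.
    """
    deadline = time.monotonic() + max_elapsed
    while True:
        try:
            return fn(*args)
        except retry_on as err:
            # Give up (re-raise) once another sleep would run past the deadline.
            if time.monotonic() + delay > deadline:
                raise
            print(f"Retry after exception {type(err).__name__}: {err}")
            time.sleep(delay)
```

With a wrapper like this, a call such as `call_with_retry(rpc_client().call, "getblockchaininfo")` would be retried on a timeout rather than ending the process on the first one. The caller still has to catch the final exception if the deadline is exceeded.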
@alevchuk
Contributor Author

alevchuk commented Jan 10, 2020

btw, tested by sending SIGSTOP and then SIGCONT to the bitcoind process:

2020-01-10T16:55:13Z INFO Refresh took 0:00:01.985098 seconds, sleeping for 30.0 seconds
2020-01-10T16:55:45Z INFO Refresh took 0:00:01.893927 seconds, sleeping for 30.0 seconds
2020-01-10T16:56:17Z INFO Refresh took 0:00:01.951264 seconds, sleeping for 30.0 seconds
2020-01-10T16:56:49Z INFO Refresh took 0:00:01.933636 seconds, sleeping for 30.0 seconds
2020-01-10T16:57:21Z INFO Refresh took 0:00:01.927020 seconds, sleeping for 30.0 seconds

2020-01-10T16:58:21Z ERROR Retry after exception socket.timeout: timed out
2020-01-10T16:58:22Z ERROR Refresh failed during retry. Cause: max timeout exceeded while retrying task: 30s
2020-01-10T16:58:22Z INFO Refresh took 0:00:30.535688 seconds, sleeping for 30.0 seconds
2020-01-10T16:58:54Z INFO Refresh took 0:00:01.938800 seconds, sleeping for 30.0 seconds
2020-01-10T16:59:26Z INFO Refresh took 0:00:01.884686 seconds, sleeping for 30.0 seconds
2020-01-10T16:59:58Z INFO Refresh took 0:00:01.919117 seconds, sleeping for 30.0 seconds
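
For anyone who wants to see the same failure mode without pausing a real node: a stopped process keeps its listening socket open, so a client can still connect but never gets an answer, and the read times out. The sketch below imitates that with a local listener that never accepts or replies. It is an assumption-laden stand-in for the SIGSTOP test, not the test that was actually run; `read_from_silent_server` is a made-up name.

```python
import socket


def read_from_silent_server(timeout=0.5):
    """Connect to a local listener that never replies; return the timeout exception, if any."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    try:
        # The kernel completes the handshake even though accept() is never called.
        client = socket.create_connection(listener.getsockname(), timeout=timeout)
        try:
            client.sendall(b"ping")
            client.recv(1024)  # nothing ever answers, so this blocks until the timeout
        except socket.timeout as err:
            return err
        finally:
            client.close()
    finally:
        listener.close()
    return None
```

Calling `read_from_silent_server()` should return a `socket.timeout` instance, the same exception type that appears at the bottom of the crash log.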

jvstein pushed a commit that referenced this pull request Jan 10, 2020
@jvstein
Owner

jvstein commented Jan 10, 2020

Thanks! Merged.

@jvstein jvstein closed this Jan 10, 2020