Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI Failure in controller_upgrade_test.ControllerUpgradeTest.test_updating_cluster_when_executing_operations #5886

Closed
BenPope opened this issue Aug 8, 2022 · 5 comments
Assignees
Labels
ci-failure kind/bug Something isn't working

Comments

@BenPope
Copy link
Member

BenPope commented Aug 8, 2022

Version & Environment

Redpanda version: dev

https://buildkite.com/redpanda/redpanda/builds/13761#01827bd6-e3be-48dc-8851-3662e1bca36b

What went wrong?

CI Failure

What should have happened instead?

CI Success

How to reproduce the issue?

???

Additional information

[INFO  - 2022-08-08 05:52:40,466 - runner_client - log - lineno:278]: RunnerClient: rptest.tests.controller_upgrade_test.ControllerUpgradeTest.test_updating_cluster_when_executing_operations: FAIL: <BadLogLines nodes=docker-rp-10(4) example="ERROR 2022-08-08 05:48:52,100 [shard 0] rpc - Service handler threw an exception: std::exception (std::exception)">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 48, in wrapped
    self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  File "/root/tests/rptest/services/redpanda.py", line 1121, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-10(4) example="ERROR 2022-08-08 05:48:52,100 [shard 0] rpc - Service handler threw an exception: std::exception (std::exception)">

And from docker-rp-10, during shutdown:

TRACE 2022-08-08 05:48:52,099 [shard 0] kafka - fetch.cc:548 - handling fetch request: {replica_id=-1 max_wait_ms=500 min_bytes=1 max_bytes=52428800 isolation_level=0 session_id=1 session_epoch=2894 topics={{name={topic-gpwztznbjk} fetch_partitions={{partition_index=0 current_leader_epoch=-1 fetch_offset=15425 log_start_offset=-1 max_bytes=1048576}}}} forgotten={} rack_id=}
INFO  2022-08-08 05:48:52,099 [shard 0] redpanda::main - application.cc:200 - Stopping...
INFO  2022-08-08 05:48:52,099 [shard 0] rpc - server.cc:242 - kafka rpc protocol - Stopping 2 listeners
TRACE 2022-08-08 05:48:52,099 [shard 1] kafka - fetch.cc:337 - fetch_ntps_in_parallel: for 2 partitions returning 0 total bytes
DEBUG 2022-08-08 05:48:52,099 [shard 0] rpc - server.cc:246 - kafka rpc protocol - Service probes {connects: 16, current connections: 2, connection close errors: 0, requests completed: 2947, received bytes: 342679, sent bytes: 733673, corrupted headers: 0, method not found errors: 0, requests blocked by memory: 0}
INFO  2022-08-08 05:48:52,099 [shard 0] rpc - server.cc:251 - kafka rpc protocol - Shutting down 2 connections
INFO  2022-08-08 05:48:52,099 [shard 1] rpc - server.cc:242 - kafka rpc protocol - Stopping 2 listeners
DEBUG 2022-08-08 05:48:52,099 [shard 1] rpc - server.cc:246 - kafka rpc protocol - Service probes {connects: 17, current connections: 2, connection close errors: 0, requests completed: 3270, received bytes: 666172, sent bytes: 232632, corrupted headers: 0, method not found errors: 0, requests blocked by memory: 0}
INFO  2022-08-08 05:48:52,099 [shard 1] rpc - server.cc:251 - kafka rpc protocol - Shutting down 2 connections
TRACE 2022-08-08 05:48:52,099 [shard 0] rpc - conn_quota.cc:118 - put(172.16.16.23)
TRACE 2022-08-08 05:48:52,099 [shard 1] rpc - conn_quota.cc:118 - put(172.16.16.18)
TRACE 2022-08-08 05:48:52,099 [shard 1] rpc - conn_quota.cc:118 - put(172.16.16.23)
TRACE 2022-08-08 05:48:52,099 [shard 0] cluster - config_frontend.cc:118 - set_status from node 4: {cluster_status: node 4, version: 5, restart: true (0 invalid, 0 unknown)}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:186 - Self append entries - {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{85}, prev_log_term:{1}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:196 - Leader append result: {append_time:2656360, base_offset:{86}, last_offset:{86}, byte_size:95}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{85}, prev_log_term:{1}} to {id: {4}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{85}, prev_log_term:{1}} to {id: {2}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{85}, prev_log_term:{1}} to {id: {3}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{85}, prev_log_term:{1}} to {id: {5}, revision: {0}}
INFO  2022-08-08 05:48:52,099 [shard 0] rpc - server.cc:242 - vectorized internal rpc protocol - Stopping 1 listeners
DEBUG 2022-08-08 05:48:52,099 [shard 0] rpc - server.cc:246 - vectorized internal rpc protocol - Service probes {connects: 12, current connections: 4, connection close errors: 0, requests completed: 1851, received bytes: 672827, sent bytes: 171179, corrupted headers: 8, method not found errors: 0, requests blocked by memory: 0}
INFO  2022-08-08 05:48:52,099 [shard 0] rpc - server.cc:251 - vectorized internal rpc protocol - Shutting down 4 connections
INFO  2022-08-08 05:48:52,099 [shard 1] rpc - server.cc:242 - vectorized internal rpc protocol - Stopping 1 listeners
TRACE 2022-08-08 05:48:52,099 [shard 0] cluster - config_frontend.cc:118 - set_status from node 3: {cluster_status: node 3, version: 5, restart: true (0 invalid, 0 unknown)}
DEBUG 2022-08-08 05:48:52,099 [shard 1] rpc - server.cc:246 - vectorized internal rpc protocol - Service probes {connects: 3, current connections: 3, connection close errors: 0, requests completed: 5861, received bytes: 2335698, sent bytes: 525907, corrupted headers: 0, method not found errors: 0, requests blocked by memory: 0}
INFO  2022-08-08 05:48:52,099 [shard 1] rpc - server.cc:251 - vectorized internal rpc protocol - Shutting down 3 connections
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:186 - Self append entries - {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{86}, prev_log_term:{1}}
DEBUG 2022-08-08 05:48:52,099 [shard 1] rpc - could not parse header from client: 172.16.16.22:63683
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:196 - Leader append result: {append_time:2656360, base_offset:{87}, last_offset:{87}, byte_size:95}
DEBUG 2022-08-08 05:48:52,099 [shard 1] rpc - could not parse header from client: 172.16.16.10:59575
DEBUG 2022-08-08 05:48:52,099 [shard 1] rpc - could not parse header from client: 172.16.16.21:57833
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{86}, prev_log_term:{1}} to {id: {5}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{86}, prev_log_term:{1}} to {id: {2}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{86}, prev_log_term:{1}} to {id: {3}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{86}, prev_log_term:{1}} to {id: {4}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] cluster - config_frontend.cc:118 - set_status from node 2: {cluster_status: node 2, version: 5, restart: true (0 invalid, 0 unknown)}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:186 - Self append entries - {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{87}, prev_log_term:{1}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:196 - Leader append result: {append_time:2656360, base_offset:{88}, last_offset:{88}, byte_size:95}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{87}, prev_log_term:{1}} to {id: {3}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{87}, prev_log_term:{1}} to {id: {5}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{87}, prev_log_term:{1}} to {id: {4}, revision: {0}}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{87}, prev_log_term:{1}} to {id: {2}, revision: {0}}
DEBUG 2022-08-08 05:48:52,099 [shard 0] rpc - could not parse header from client: 172.16.16.21:64008
TRACE 2022-08-08 05:48:52,099 [shard 0] cluster - config_frontend.cc:118 - set_status from node 5: {cluster_status: node 5, version: 5, restart: true (0 invalid, 0 unknown)}
TRACE 2022-08-08 05:48:52,099 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:186 - Self append entries - {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{88}, prev_log_term:{1}}
TRACE 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:196 - Leader append result: {append_time:2656360, base_offset:{89}, last_offset:{89}, byte_size:95}
TRACE 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{88}, prev_log_term:{1}} to {id: {4}, revision: {0}}
TRACE 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{88}, prev_log_term:{1}} to {id: {2}, revision: {0}}
TRACE 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{88}, prev_log_term:{1}} to {id: {3}, revision: {0}}
TRACE 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:96 - Sending append entries request {raft_group:{0}, commit_index:{85}, term:{1}, prev_log_index:{88}, prev_log_term:{1}} to {id: {5}, revision: {0}}
DEBUG 2022-08-08 05:48:52,100 [shard 1] cluster - controller_backend.cc:401 - Abort requested while reconciling topics
DEBUG 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:357 - Replication of entries with last offset: 86 aborted - shutting down
DEBUG 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:357 - Replication of entries with last offset: 87 aborted - shutting down
DEBUG 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:357 - Replication of entries with last offset: 88 aborted - shutting down
DEBUG 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] replicate_entries_stm.cc:357 - Replication of entries with last offset: 89 aborted - shutting down
DEBUG 2022-08-08 05:48:52,100 [shard 0] cluster - controller_backend.cc:401 - Abort requested while reconciling topics
TRACE 2022-08-08 05:48:52,100 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2092 - flushed offset updated: 86
TRACE 2022-08-08 05:48:52,100 [shard 0] cluster - members_backend.cc:109 - error waiting for members updates - seastar::abort_requested_exception (abort requested)
ERROR 2022-08-08 05:48:52,100 [shard 0] rpc - Service handler threw an exception: std::exception (std::exception)
INFO  2022-08-08 05:48:52,100 [shard 0] rpc - simple_protocol.cc:85 - Error dispatching method: std::runtime_error (batched_output_stream is already closed. Ignoring: 26 bytes)
ERROR 2022-08-08 05:48:52,100 [shard 0] rpc - Service handler threw an exception: std::exception (std::exception)
INFO  2022-08-08 05:48:52,100 [shard 0] rpc - simple_protocol.cc:85 - Error dispatching method: std::runtime_error (batched_output_stream is already closed. Ignoring: 26 bytes)
ERROR 2022-08-08 05:48:52,100 [shard 0] rpc - Service handler threw an exception: std::exception (std::exception)
INFO  2022-08-08 05:48:52,100 [shard 0] rpc - simple_protocol.cc:85 - Error dispatching method: std::runtime_error (batched_output_stream is already closed. Ignoring: 26 bytes)
ERROR 2022-08-08 05:48:52,100 [shard 0] rpc - Service handler threw an exception: std::exception (std::exception)
INFO  2022-08-08 05:48:52,100 [shard 0] rpc - simple_protocol.cc:85 - Error dispatching method: std::runtime_error (batched_output_stream is already closed. Ignoring: 26 bytes)
@BenPope BenPope added kind/bug Something isn't working ci-failure labels Aug 8, 2022
@andrwng
Copy link
Contributor

andrwng commented Aug 8, 2022

It looks like the errors are being logged while the node is still on v22.1.4.

I suspect this is caused by the fact that #5097 seems to only live only on 22.2+, specifically the bits around adding a new exception type for disconnect errors 9ad09f0, d339b68, 07c8b7e

To boot, #4895 only lives on 22.1.6+, which likely explains the bare exception.

I think I'll go about bumping the version being upgraded from to v22.1.6+, though longer term this is a tricky and annoying problem to solve: the error-log-allow list is itself a sort of loosely defined protocol that evolves over time, but isn't versioned for upgrade tests on dev. Longer term I think we'll probably want to do away with "allowed errors" altogether, but until then there's a fine line between being hiding real bugs and avoiding noise, especially with mixed version tests.

@BenPope
Copy link
Member Author

BenPope commented Aug 9, 2022

@jcsp
Copy link
Contributor

jcsp commented Aug 18, 2022

I agree with @andrwng assessemnt, here's another example I experienced

grep -e ERROR -e Redpanda.v redpanda\ \(2\).log 
INFO   redpanda::main - application.cc:162 - Redpanda v22.1.4 - 491e56900d2316fcbb22aa1d37e7195897878309
ERROR 2022-08-18 17:21:37,026 [shard 0] rpc - Service handler threw an exception: std::exception (std::exception)
INFO   redpanda::main - application.cc:213 - Redpanda v22.2.1-rc3-194-g241a9dbee - 241a9dbee46c174e81ec3ef86d35c11337a074e3

The log allow list for the test should be extended to include this type of bogus exception.

@jcsp
Copy link
Contributor

jcsp commented Aug 18, 2022

Added a commit to update the allow list in https://github.com/redpanda-data/redpanda/pull/5972/commits

jcsp added a commit to jcsp/redpanda that referenced this issue Aug 19, 2022
@ajfabbri
Copy link
Contributor

Something similar here. I expected to see the newer named semaphore output, then realized the error may be from the pre-upgrade code.

@jcsp jcsp closed this as completed in dacdb3b Aug 20, 2022
felixguendling pushed a commit to felixguendling/redpanda that referenced this issue Aug 22, 2022
mmaslankaprv pushed a commit to mmaslankaprv/redpanda that referenced this issue Sep 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-failure kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants