CI Failure (BadLogLines exceptional future ignored) in PartitionMoveInterruption.test_cancelling_partition_move_x_core #14149

Closed
NyaliaLui opened this issue Oct 13, 2023 · 7 comments · May be fixed by #16534
Labels
area/kafka area/redpanda ci-failure kind/bug Something isn't working sev/medium Bugs that do not meet criteria for high or critical, but are more severe than low.

Comments

@NyaliaLui
Contributor

NyaliaLui commented Oct 13, 2023

https://buildkite.com/redpanda/redpanda/builds/38786

Module: rptest.tests.partition_move_interruption_test
Class: PartitionMoveInterruption
Method: test_cancelling_partition_move_x_core
Arguments: {
    "recovery": "restart_recovery",
    "compacted": false,
    "unclean_abort": true,
    "replication_factor": 1
}
test_id:    PartitionMoveInterruption.test_cancelling_partition_move_x_core
status:     FAIL
run time:   195.681 seconds

<BadLogLines nodes=docker-rp-20(1) example="WARN  2023-10-12 06:46:16,189 [shard 1:main] seastar - Exceptional future ignored: seastar::no_sharded_instance_exception (sharded instance does not exist: seastar::abort_source), backtrace: 0xee8b6 /opt/redpanda_installs/ci/lib/libseastar.so+0x58c7582 /opt/redpanda_installs/ci/lib/libseastar.so+0x58c7165 /opt/redpanda_installs/ci/lib/libseastar.so+0x58cb79e /opt/redpanda_installs/ci/lib/libseastar.so+0x58ce17f /opt/redpanda_installs/ci/lib/libseastar.so+0x424fa25 0x19bc22 0x19bab4 /opt/redpanda_installs/ci/lib/libv_v_application.so+0x60e6876 /opt/redpanda_installs/ci/lib/libseastar.so+0x425517d /opt/redpanda_installs/ci/lib/libseastar.so+0x4255747 /opt/redpanda_installs/ci/lib/libseastar.so+0x4613314 /opt/redpanda_installs/ci/lib/libseastar.so+0x4620591 /opt/redpanda_installs/ci/lib/libseastar.so+0x46267fb /opt/redpanda_installs/ci/lib/libseastar.so+0x47a3866 /opt/redpanda_installs/ci/lib/libseastar.so+0x47a20f0 /opt/redpanda_installs/ci/lib/libseastar.so+0x47a1fb0 /opt/redpanda_installs/ci/lib/libseastar.so+0x47a1f54 /opt/redpanda_installs/ci/lib/libseastar.so+0x479d580 /opt/redpanda_installs/ci/lib/libv_v_application.so+0x6b34351 /opt/redpanda_installs/ci/lib/libv_v_application.so+0x6b2feb8 /opt/redpanda_installs/ci/lib/libseastar.so+0x43f7c5c /opt/redpanda_installs/ci/lib/libc.so.6+0x91016 /opt/redpanda_installs/ci/lib/libc.so.6+0x1166cf">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 142, in wrapped
    redpanda.raise_on_bad_logs(
  File "/root/tests/rptest/services/redpanda.py", line 1244, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-20(1) example="WARN  2023-10-12 06:46:16,189 [shard 1:main] seastar - Exceptional future ignored: seastar::no_sharded_instance_exception (sharded instance does not exist: seastar::abort_source), backtrace: 0xee8b6 /opt/redpanda_installs/ci/lib/libseastar.so+0x58c7582 /opt/redpanda_installs/ci/lib/libseastar.so+0x58c7165 /opt/redpanda_installs/ci/lib/libseastar.so+0x58cb79e /opt/redpanda_installs/ci/lib/libseastar.so+0x58ce17f /opt/redpanda_installs/ci/lib/libseastar.so+0x424fa25 0x19bc22 0x19bab4 /opt/redpanda_installs/ci/lib/libv_v_application.so+0x60e6876 /opt/redpanda_installs/ci/lib/libseastar.so+0x425517d /opt/redpanda_installs/ci/lib/libseastar.so+0x4255747 /opt/redpanda_installs/ci/lib/libseastar.so+0x4613314 /opt/redpanda_installs/ci/lib/libseastar.so+0x4620591 /opt/redpanda_installs/ci/lib/libseastar.so+0x46267fb /opt/redpanda_installs/ci/lib/libseastar.so+0x47a3866 /opt/redpanda_installs/ci/lib/libseastar.so+0x47a20f0 /opt/redpanda_installs/ci/lib/libseastar.so+0x47a1fb0 /opt/redpanda_installs/ci/lib/libseastar.so+0x47a1f54 /opt/redpanda_installs/ci/lib/libseastar.so+0x479d580 /opt/redpanda_installs/ci/lib/libv_v_application.so+0x6b34351 /opt/redpanda_installs/ci/lib/libv_v_application.so+0x6b2feb8 /opt/redpanda_installs/ci/lib/libseastar.so+0x43f7c5c /opt/redpanda_installs/ci/lib/libc.so.6+0x91016 /opt/redpanda_installs/ci/lib/libc.so.6+0x1166cf">

JIRA Link: CORE-1505

NyaliaLui added the kind/bug (Something isn't working) and ci-failure labels Oct 13, 2023
@NyaliaLui
Contributor Author

The decoded backtrace points at a lambda inside seastar::data_source_impl::skip() that deals with a temporary_buffer<char>:

{./vbuild/debug/clang/dist/local/redpanda/libexec/redpanda} 0x19bc22: seastar::internal::repeat_until_value_state<seastar::data_source_impl::skip(unsigned long)::{lambda(unsigned long&)#1}::operator()(unsigned long&) const::{lambda()#1}, seastar::temporary_buffer<char> >::run_and_dispose() at ??:?
{./vbuild/debug/clang/dist/local/redpanda/libexec/redpanda} 0x19bab4: seastar::internal::repeat_until_value_state<seastar::data_source_impl::skip(unsigned long)::{lambda(unsigned long&)#1}::operator()(unsigned long&) const::{lambda()#1}, seastar::temporary_buffer<char> >::run_and_dispose() at ??:?
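For context, the frames above are in seastar's data_source_impl::skip(). As far as we can tell from the seastar sources, it skips n bytes by pulling buffers in a repeat_until_value loop and returns whatever is left of the last buffer. The sketch below is a synchronous, plain C++ approximation of that loop, not seastar code: chunk_source and skip are hypothetical names, and the early return on end of stream is our own assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <string>

// Hypothetical synchronous stand-in for a seastar data source: each call to
// get() hands back the next chunk; an empty string signals end of stream.
struct chunk_source {
    std::deque<std::string> chunks;

    std::string get() {
        if (chunks.empty()) {
            return {};  // end of stream
        }
        std::string front = std::move(chunks.front());
        chunks.pop_front();
        return front;
    }
};

// Approximation of the skip(n) pattern: keep pulling chunks until n bytes
// have been consumed, then return the unread tail of the last chunk.
// The lambda plays the role of the repeat_until_value body: std::nullopt
// means "run again", an engaged optional ends the loop with that value.
std::string skip(chunk_source& src, std::uint64_t n) {
    auto step = [&]() -> std::optional<std::string> {
        std::string buf = src.get();
        if (buf.empty() && n > 0) {
            return std::string{};  // assumption: stop at end of stream
        }
        if (buf.size() >= n) {
            return buf.substr(n);  // like temporary_buffer::trim_front(n)
        }
        n -= buf.size();  // whole chunk skipped, keep going
        return std::nullopt;
    };
    for (;;) {
        if (auto result = step()) {
            return *result;
        }
    }
}
```

Skipping 5 bytes over the chunks "abc", "defg", "hi" consumes all of "abc", two bytes of "defg", and returns "fg", leaving "hi" for the next read.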

dotnwat added the sev/medium label (Bugs that do not meet criteria for high or critical, but are more severe than low.) Oct 14, 2023
@dotnwat
Member

dotnwat commented Oct 14, 2023

"Exceptional future ignored" failures should always start at sev/medium. They often indicate a logic error where error handling is skipped.
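To make "error handling is skipped" concrete: a future that resolves with an exception has to be observed by someone. The sketch below models that rule with a hypothetical resolved struct (not seastar::future) holding a std::exception_ptr, and a hypothetical handle() that consumes the outcome and turns a failure into a message. Seastar reports the case where nothing like handle() ever runs before the failed future is destroyed as "Exceptional future ignored".

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an already-resolved future: either no error
// (success) or a stored exception. This is NOT seastar::future; it only
// models the "someone must look at the error" rule.
struct resolved {
    std::exception_ptr error;  // null on success
    bool observed = false;     // set once a handler has looked at the outcome
};

// Consume the outcome explicitly instead of dropping it on the floor.
std::string handle(resolved& r) {
    r.observed = true;
    if (!r.error) {
        return "ok";
    }
    try {
        std::rethrow_exception(r.error);
    } catch (const std::exception& e) {
        return std::string("handled: ") + e.what();
    } catch (...) {
        return "handled: unknown error";
    }
}
```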

In this particular case I think the biggest clue is the type of exception that is being ignored. Normally it can be hard to track down the ignored exception because it is something very generic like "gate_closed_exception".

But in this case it's pretty clear. A service is being accessed that was either not started or had been stopped.

Exceptional future ignored: seastar::no_sharded_instance_exception (sharded instance does not exist: seastar::abort_source)
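The failure mode described here can be modeled without seastar. The class below is a hypothetical, single-shard stand-in for seastar::sharded<T>: local() throws when the instance was never started or has already been stopped, and local_is_initialized() lets a caller check first. Seastar's sharded<T> has methods with these names, but this class is only an illustration; fake_abort_source and try_request_abort are likewise hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical analogue of seastar::no_sharded_instance_exception.
struct no_instance_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical single-shard model of a sharded service's lifecycle.
template <typename T>
class service_slot {
public:
    void start() { instance_ = std::make_unique<T>(); }
    void stop() { instance_.reset(); }
    bool local_is_initialized() const { return instance_ != nullptr; }
    T& local() {
        if (!instance_) {
            // Accessed before start() or after stop().
            throw no_instance_error("sharded instance does not exist");
        }
        return *instance_;
    }

private:
    std::unique_ptr<T> instance_;
};

// Hypothetical abort source: just a flag.
struct fake_abort_source {
    bool aborted = false;
    void request_abort() { aborted = true; }
};

// A caller that may run during shutdown checks first rather than letting
// the exception escape into a future nobody handles.
bool try_request_abort(service_slot<fake_abort_source>& as) {
    if (!as.local_is_initialized()) {
        return false;
    }
    as.local().request_abort();
    return true;
}
```

Because a seastar shard runs one task at a time, a check followed by synchronous use on the same shard cannot interleave with the local instance being torn down; once a continuation boundary sits between the check and the use, that guarantee is gone.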

@rockwotj I'm not sure this would involve any of your code if it isn't "hooked up" yet, but I recall a recent PR of yours in the admin server (IIRC) where you check whether a service has been initialized, presumably because an HTTP request might race with application startup?

@rockwotj
Contributor

It's not my code; that isn't running yet.

I believe you're referring to

throw ss::httpd::bad_request_exception("data transforms not enabled");
That check exists because the service is not created unless data transforms are enabled in the config. If we ever decide to always have Wasm on, we won't need that line. The admin server is not marked ready in the startup sequence until everything has started, so it shouldn't be that.
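The guard follows a simple shape: if the optional service was never constructed, reject the request instead of touching it. A minimal sketch, assuming a nullable pointer represents the disabled service; bad_request, transform_service and handle_list_transforms are hypothetical stand-ins, and the real handler throws ss::httpd::bad_request_exception instead.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ss::httpd::bad_request_exception.
struct bad_request : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical service that only exists when the feature is enabled.
struct transform_service {
    std::string list() const { return "[]"; }
};

// svc is null when data transforms are disabled in the config, so the
// handler answers with a client error rather than dereferencing a service
// that was never created.
std::string handle_list_transforms(const transform_service* svc) {
    if (svc == nullptr) {
        throw bad_request("data transforms not enabled");
    }
    return svc->list();
}
```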

Based on where that exception is thrown: https://github.com/scylladb/seastar/blob/4dc3871ed9f32816e7a03895e98c86a5502d980f/include/seastar/core/sharded.hh#L520C2-L520C2

I would venture to guess this was introduced in https://github.com/redpanda-data/redpanda/pull/12021/files#diff-397a29fc3f287cdaabfb356be3e66125970a6c66c437fa49259b08ef5289362a or that someone is using that sharded service incorrectly, accessing it before it has been started or (more likely) after it has been stopped.

@dotnwat
Copy link
Member

dotnwat commented Oct 14, 2023

> I would venture to guess this was introduced in https://github.com/redpanda-data/redpanda/pull/12021/files#diff-397a29fc3f287cdaabfb356be3e66125970a6c66c437fa49259b08ef5289362a or someone is using that sharded service wrong (accessing it before it's been started (or after it's stopped which is more likely).

Good call. Does this look suspicious, @BenPope?

@rockwotj
Contributor

Similar/same: #13278?

@dotnwat
Member

dotnwat commented Dec 15, 2023

Assigning to the enterprise team since sharded was Ben's thing? In reality, the culprit is probably a user of it during shutdown in application.cc, so it may need to be handed over somewhere else.
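One common way to make shutdown ordering hard to get wrong is to register each service's stop action as it starts and run those actions in reverse (LIFO) order, so a dependency such as an abort_source outlives everything started after it. This is a generic sketch, not redpanda's application.cc; shutdown_stack and the service names in the usage are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical LIFO shutdown helper: stop actions are pushed in start order
// and executed in reverse, so each service stops only after every service
// started after it (and possibly depending on it) has already stopped.
class shutdown_stack {
public:
    void on_stop(std::function<void()> fn) { stops_.push_back(std::move(fn)); }

    void run() {
        while (!stops_.empty()) {
            auto fn = std::move(stops_.back());
            stops_.pop_back();
            fn();
        }
    }

private:
    std::vector<std::function<void()>> stops_;
};
```

If the abort_source is registered first and a service that uses it second, run() stops the user first and the abort_source last, so the user never sees a missing instance during teardown.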

@piyushredpanda
Contributor

Not seen in at least two months; closing.

4 participants