Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic on varnishadm backend.set_health of dynamic backend #4183

Open
delthas opened this issue Sep 16, 2024 · 3 comments · May be fixed by #4186
Open

Panic on varnishadm backend.set_health of dynamic backend #4183

delthas opened this issue Sep 16, 2024 · 3 comments · May be fixed by #4186

Comments

@delthas
Copy link
Contributor

delthas commented Sep 16, 2024

Expected Behavior

No panic on the VTC.

Current Behavior

Panic:

  • Assert error in VBE_connwait_signal_all(), cache/cache_backend.c line 156:
  • Condition((bp)->magic == 0x64c4c7c6) not true.
  • cls_exec -> mgt_launch_child -> child_main -> CLI_Run -> VCLS_Poll -> cls_exec -> cli_backend_set_health -> VCL_IterDirector -> vcl_iterdir -> do_set_health -> VBE_connwait_signal_all -> VAS_Fail -> pan_ic
varnishtest "panic"

server s1 {
} -start

varnish v1 -vcl {
vcl 4.1;
import std;
import directors;
import dynamic;

backend foo none;

sub vcl_init {
    new cluster_default = directors.fallback();
    new main = dynamic.director(port = {"${s1_port}"});
    cluster_default.add_backend(main.backend(host={"${s1_addr}"}, port={"${s1_port}"}));
}
} -start

# Panic:
# Assert error in VBE_connwait_signal_all(), cache/cache_backend.c line 156:
#  Condition((bp)->magic == 0x64c4c7c6) not true.
# cls_exec -> mgt_launch_child -> child_main -> CLI_Run -> VCLS_Poll -> cls_exec -> cli_backend_set_health -> VCL_IterDirector -> vcl_iterdir -> do_set_health -> VBE_connwait_signal_all -> VAS_Fail -> pan_ic
varnish v1 -cliok "backend.set_health main(*) sick"

Even though my VTC involes libvmod-dynamic, this is likely an issue in Varnish, because this broke between Varnish 7.5.0 and Varnish 7.6.0

I ran a git bisect and the first failing commit is: e46f972 ("unsurprisingly" this is the commit adding the function which has the assert).

Context

I'm using dynamic backends inside directors.

@nigoroll
Copy link
Member

Thank you for the test case, this is reproducible immediately.

The problem here is that VBE_connwait_signal_all() gets called on a dynamic domain director.

looking... 👀

4  0x000055fa0d0376ec in pan_ic (func=0x55fa0d1299ec "VBE_connwait_signal_all", file=0x55fa0d1299af "cache/cache_backend.c", line=156, cond=0x55fa0d1299d2 "(bp)->magic == 0x64c4c7c6", kind=VAS_ASSERT) at cache/cache_panic.c:778
#5  0x000055fa0d0fead5 in VAS_Fail (func=0x55fa0d1299ec "VBE_connwait_signal_all", file=0x55fa0d1299af "cache/cache_backend.c", line=156, cond=0x55fa0d1299d2 "(bp)->magic == 0x64c4c7c6", kind=VAS_ASSERT) at vas.c:67
#6  0x000055fa0cff5340 in VBE_connwait_signal_all (bp=0x7ffa789d7000) at cache/cache_backend.c:156
#7  0x000055fa0d00e101 in do_set_health (cli=0x7ffa788d49d0, d=0x7ffa7881df30, priv=0x7ffcedc1dc90) at cache/cache_director.c:495
#8  0x000055fa0d04ef07 in vcl_iterdir (cli=0x7ffa788d49d0, pat=0x7ffa788202c0 "vcl1.main(*)", vcl=0x7ffa788471e0, func=0x55fa0d00def0 <do_set_health>, priv=0x7ffcedc1dc90) at cache/cache_vcl.c:385
#9  0x000055fa0d04ed4a in VCL_IterDirector (cli=0x7ffa788d49d0, pat=0x7ffa788283c8 "main(*)", func=0x55fa0d00def0 <do_set_health>, priv=0x7ffcedc1dc90) at cache/cache_vcl.c:421
#10 0x000055fa0d00d3fa in cli_backend_set_health (cli=0x7ffa788d49d0, av=0x7ffa7882c580, priv=0x0) at cache/cache_director.c:519
#11 0x000055fa0d103872 in cls_dispatch (cli=0x7ffa788d49d0, cs=0x7ffa78822d60, av=0x7ffa7882c580, ac=3) at vcli_serve.c:273
#12 0x000055fa0d10335d in cls_exec (cfd=0x7ffa788d49a0, av=0x7ffa7882c580, ac=3) at vcli_serve.c:324
#13 0x000055fa0d1021f6 in cls_feed (cfd=0x7ffa788d49a0, p=0x7ffcedc1de30 "\n54.947472/vgc.so 1auto\nhy", e=0x7ffcedc1de31 "54.947472/vgc.so 1auto\nhy") at vcli_serve.c:412
#14 0x000055fa0d101cba in VCLS_Poll (cs=0x7ffa78822d60, cli=0x7ffa788d49d0, timeout=-1) at vcli_serve.c:617
#15 0x000055fa0d0060b2 in CLI_Run () at cache/cache_cli.c:110
#16 0x000055fa0d031925 in child_main (sigmagic=1, altstksz=81920) at cache/cache_main.c:453
...
(gdb) fr 6
#6  0x000055fa0cff5340 in VBE_connwait_signal_all (bp=0x7ffa789d7000) at cache/cache_backend.c:156
156		CHECK_OBJ_NOTNULL(bp, BACKEND_MAGIC);
(gdb) p /x bp->magic
$2 = 0x1bfe1345

$ git grep 0x1bfe1345
src/vmod_dynamic.h:#define DYNAMIC_DOMAIN_MAGIC         0x1bfe1345

@nigoroll
Copy link
Member

I pointed out a layer violation at the time, and overlooked this one. The director code should really not call into VBE code directly...

@dridi
Copy link
Member

dridi commented Sep 16, 2024

This was bothering me but I couldn't put my finger on it at the time. I also found another wrong assumption (#4134 (review)) and I'm not entirely convinced by VBE_is_ah_auto().

nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 16, 2024
the idea here is to use the backend's healthy callback to signal to it a health
change from outside.

this fixes the regression from varnishcache#4183, but makes v74.vtc fail. And also I am not
happy about the locking involved and need to spend more time...
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 17, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 17, 2024
Pondering solutions for varnishcache#4183 it became apparent that we currently have no way
for core code to notify directors of changes, in particular changes of the
admin_health. I looked into re-using the healthy callback, but it is unsuitable
because VRT_Healthy() does not even call it for an "auto" admin_health.

So the cleanest option is another director callback, which is cheap and also
helps us maintain clean layering.
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 17, 2024
Using the new facility, we can now selectively notify interested backend types
(so far: VBE only) when the admin health changes.

This fixes the layer violation introduced with
e46f972, where a director private pointer was
used with a VBE specific function.

Fixes varnishcache#4183
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 17, 2024
Pondering solutions for varnishcache#4183 it became apparent that we currently have no way
for core code to notify directors of changes, in particular changes of the
admin_health. I looked into re-using the healthy callback, but it is unsuitable
because VRT_Healthy() does not even call it for an "auto" admin_health.

So the cleanest option is another director callback, which is cheap and also
helps us maintain clean layering.
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 17, 2024
Using the new facility, we can now selectively notify interested backend types
(so far: VBE only) when the admin health changes.

This fixes the layer violation introduced with
e46f972, where a director private pointer was
used with a VBE specific function.

Fixes varnishcache#4183
nigoroll added a commit to nigoroll/libvmod-dynamic that referenced this issue Sep 17, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 17, 2024
Pondering solutions for varnishcache#4183 it became apparent that we currently have no way
for core code to notify directors of changes, in particular changes of the
admin_health. I looked into re-using the healthy callback, but it is unsuitable
because VRT_Healthy() does not even call it for an "auto" admin_health.

So the cleanest option is another director callback, which is cheap and also
helps us maintain clean layering.
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 17, 2024
Using the new facility, we can now selectively notify interested backend types
(so far: VBE only) when the admin health changes.

This fixes the layer violation introduced with
e46f972, where a director private pointer was
used with a VBE specific function.

Fixes varnishcache#4183
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 17, 2024
Pondering solutions for varnishcache#4183 it became apparent that we currently have no way
for core code to notify directors of changes, in particular changes of the
admin_health. I looked into re-using the healthy callback, but it is unsuitable
because VRT_Healthy() does not even call it for an "auto" admin_health.

So the cleanest option is another director callback, which is cheap and also
helps us maintain clean layering.
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 17, 2024
Using the new facility, we can now selectively notify interested backend types
(so far: VBE only) when the admin health changes.

This fixes the layer violation introduced with
e46f972, where a director private pointer was
used with a VBE specific function.

Fixes varnishcache#4183
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants