-
Notifications
You must be signed in to change notification settings - Fork 348
Fix: fencer: Refresh CIB devices on change to nodes section #3855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@HideoYamauchi Can you test this if you have time and see if it fixes your issue? I made sure it compiles, but I haven't had time to set up a test for your issue yet, and I have to step away for a while. |
I'll check. Best Regards, |
Great! With this fix, we have confirmed that fenced correctly registers STONITH devices both when there is a node section in cib.xml and when there is not.
In both cases, STONITH with fence_scsi also worked correctly. Please merge this fix. Many thanks, |
Excellent! I'm about to push an edit that adds a sanity check, ensuring that the local node is not already part of a resource's allowed nodes table. (I think that should be impossible, but the scheduler is complex :) ).
I'll let the new CI build finish and then we'll merge it during NA hours if @clumens and @wenningerk have no objection. |
776441d
to
39aab7b
Compare
@clumens We'll need to make sure this gets into RHEL 10. If we can get it into 10.0 without a z-stream, that would be ideal. |
I'm going to bed soon, but I am starting to wonder if this approach is still too eager. If If that concern has merit, then another possibility might be:
I almost never think about clusters with BTW, I'm looking at |
Is this something we can add a scheduler regression test for? |
I don't think so.
|
* Remove some nesting. * Use const where possible. * Improve variable names and scope. * Use a macro for repeated string literal. * Use sizeof() - 1 instead of strlen() for string literal. * Remove unnecessary escaping of single quote within double quotes. * Copy only what is needed of the xpath string, if anything. * Log assertion on malformed patchset, since we created it in cib_perform_op(). * Add a comment to clarify why we're (apparently) doing all this in the first place. Keep the watchdog_device_update() comment as-is because I haven't looked into it. Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
An ID is required when schema validation is enabled anyway, and this allows us to be sure rsc_id will be non-NULL within the block. It also allows us to shorten some strstr() searches by starting later in the string, not that this will matter in practice. Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
The sole caller has already done the checking, we don't need event, and we don't need msg except as a way to get the patchset. Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
If the child of a primitive element representing a fencing device was deleted, and we didn't catch it earlier as instance_attributes or meta_attributes, then previously we would remove the fencing device. We should remove the device only if the primitive itself was deleted. By inspection, it looks like this would unregister a fencing device if one of its operations (op elements) were deleted. If we find that only a child of the primitive was deleted, fall through to the next if block -- that is, "continue" only if the primitive itself was deleted and we call stonith_device_remove(). Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
39aab7b
to
b3ece51
Compare
I hope this new version works, and that temporarily unregistering devices doesn't break anything. As mentioned in the commit message, I'm afraid to rely on the allowed nodes table being correct when a fake local node is involved. I'd prefer to use the cluster layer (via the controller) as the source of truth. |
This fixes a regression introduced by bf7ffcd. As of that commit, the fake local node is created after all resources have been unpacked. So it doesn't get added to resources' allowed_nodes tables. This prevents registration of fencing devices when the fencer receives a CIB diff that removes the local node. For example, the user may have replaced the CIB with a boilerplate configuration that has an empty nodes section. Registering a fencing device requires that the local node be in the resource's allowed nodes table. One option would be to add the fake local node to all resources' allowed nodes tables immediately after creating it. However, it shouldn't necessarily be an allowed node for all resources. For example, if symmetric-cluster=false, then a node should not be placed in a resource's allowed nodes table by default. It's difficult to ensure correctness of the allowed nodes tables when a fake local node is involved. It may involve repeated code or a fragile and confusing dependence on the order of unpacking. Since the fake local node is a hack in the first place, it seems better to avoid using it where possible. Currently the only code that even sets the local_node_name member of scheduler->priv is in: * the fencer * crm_mon when showing the "times" section This commit works as follows. If the fencer receives a CIB diff notification that contains the nodes section, it triggers a full refresh of fencing device registrations. In our example above, where a user has replaced the CIB with a configuration that has an empty nodes section, this means all fencing device registrations will be removed. However, the controller also has a CIB diff notification callback: do_cib_updated(). The controller's callback repopulates the nodes section with up-to-date information from the cluster layer (or its node cache) if it finds that an untrusted client like cibadmin has sent a modified the nodes section. Then it updates the CIB accordingly. The fencer will receive this updated CIB and refresh fencing device registrations again, re-registering the fencing devices that were just removed. Note that in the common case, we're not doing all this wasted work. The "remove and then re-register" sequence should happen only when a user has modified the CIB in a sloppy way (for example, by deleting nodes from the CIB's nodes section that have not been removed from the cluster). In short, it seems better to rely on the controller's maintenance of the CIB's node list, than to rely on a "fake local node" hack in the scheduler. See the following pull requests from Hideo Yamauchi and their discussions: ClusterLabs#3849 ClusterLabs#3852 Thanks to Hideo for the report and finding the cause. Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
b3ece51
to
757cce1
Compare
Pushed a new version with a FIXME comment. So I'm still not totally happy with this :( |
I have not yet confirmed that the fixes I made after I checked them work. However, there is something that concerns me. I haven't looked at the contents of your fix PR in detail, but I thought I'd comment just to be sure. Best Regards, |
I think this is okay, but I've never even heard of the fake local node thing until this PR came up. If @HideoYamauchi confirms it works for him, and ctslab testing doesn't show any other problems, I am okay with merging this. |
Okay! Many thanks, |
As a quick status update, these two registration flags ( For some reason, in 2009 we thought it was a good idea to allow registering fence devices without adding them to the CIB. Over the years, code has been added to handle corner cases stemming from that. For example, what should the fencer do when it finds a new fence device in the CIB with the same ID as a fence device that was already registered via the C API (including
I thought a simple fix would be to avoid deleting a fence device due to CIB change if it's registered via the API, but that has its own complications (like the PRs above). Additionally, the executor registers fence devices via the API as part of their start operation ("to avoid a race condition"). So a running fence device that exists in the CIB is registered via both the CIB and the API, and the fencer refuses to execute the monitor operation unless the device is registered via the API. Okay, so this status update wasn't quick after all :) I feel like this is all more complicated than it needs to be, and I'm trying to disentangle it. I appreciate your patience. |
iirc background was the ability to use fenced independently from pacemaker. Don't know if that is still relevant or even possible. |
It's possible via
I find it highly unlikely that anyone is registering fencing devices directly via the API, except for |
Hi @nrwahl2, Regarding the fix for this issue, what is the status of the fix since then? Best Regards, |
@HideoYamauchi This is my main project right now. However, I have not been able to work on it in the past 1 to 1.5 weeks. I'm hoping to return to it in this upcoming week. I don't have major updates. #3863 and #3866 were both prep work for this fix. I wanted to make the fencer code easier for me to read and to reason about, so that I can try to come up with a fix that I'm confident in. |
Hi @nrwahl2, Thanks for your comment. Many thanks, |
Hi @nrwahl2 I have one question. Best Regards, |
This fixes a regression introduced by bf7ffcd. As of that commit, the fake local node is created after all resources have been unpacked. So it doesn't get added to resources' allowed_nodes tables.
This prevents registration of fencing devices when the fencer receives a CIB diff that removes the local node. For example, the user may have replaced the CIB with a boilerplate configuration that has an empty nodes section. Registering a fencing device requires that the local node be in the resource's allowed nodes table.
One option would be to add the fake local node to all resources' allowed nodes tables immediately after creating it. However, it shouldn't necessarily be an allowed node for all resources. For example, if
symmetric-cluster="false"
, then a node should not be placed in a resource's allowed nodes table by default.It's difficult to ensure correctness of the allowed nodes tables when a fake local node is involved. It may involve repeated code or a fragile and confusing dependence on the order of unpacking.
Since the fake local node is a hack in the first place, it seems better to avoid using it where possible. Currently the only code that even sets the
local_node_name
member ofscheduler->priv
is in:crm_mon
when showing the "times" sectionThis commit works as follows. If the fencer receives a CIB diff notification that contains the nodes section, it triggers a full refresh of fencing device registrations. In our example above, where a user has replaced the CIB with a configuration that has an empty nodes section, this means all fencing device registrations will be removed.
However, the controller also has a CIB diff notification callback:
do_cib_updated()
. The controller's callback repopulates the nodes section with up-to-date information from the cluster layer (or its node cache) if it finds that an untrusted client like cibadmin has sent a modified the nodes section. Then it updates the CIB accordingly.The fencer will receive this updated CIB and refresh fencing device registrations again, re-registering the fencing devices that were just removed.
Note that in the common case, we're not doing all this wasted work. The "remove and then re-register" sequence should happen only when a user has modified the CIB in a sloppy way (for example, by deleting nodes from the CIB's nodes section that have not been removed from the cluster).
In short, it seems better to rely on the controller's maintenance of the CIB's node list, than to rely on a "fake local node" hack in the scheduler.
See the following pull requests from Hideo Yamauchi and their discussions:
Thanks to Hideo for the report and finding the cause.