Added changes to handle dependency check in FdbSyncd and FpmSyncd for warm-boot #1556

Merged
merged 14 commits into from
Mar 3, 2021

nkelapur
Contributor

What I did
Added changes to handle dependency check in FpmSyncd and FdbSyncd for warmreboot

Why I did it
This ensures that, during EVPN warm-reboot, the order in which data is replayed to the kernel is maintained across the various submodules, so that kernel programming succeeds.

How I verified it
Verified with EVPN warmreboot

Details if related
More details in warmreboot section of EVPN VXLAN HLD
sonic-net/SONiC#437

@nkelapur nkelapur marked this pull request as ready for review December 18, 2020 16:57
@@ -117,6 +117,16 @@ AppRestartAssist::cache_state_t AppRestartAssist::getCacheEntryState(const std::
throw std::logic_error("cache entry state is invalid");
}

void AppRestartAssist::appDataReplayed()
{
WarmStart::setWarmStartState(m_appName, WarmStart::REPLAYED);
Contributor

@qiluo-msft qiluo-msft Dec 24, 2020

Too much indentation. #Closed

Contributor Author

Done

else
{
SWSS_LOG_INFO("Module %s NOT Replayed or Reconciled %d",module.c_str(), (int) state);
//return false;
Contributor

@qiluo-msft qiluo-msft Dec 30, 2020

Remove unused code #WontFix

Contributor Author

This is stubbed code so that the basic functionality will not fail until all the warm-reboot changes are available in the code base. Once all the warm-reboot changes are available, the actual code will be uncommented and the stub will be deleted

Collaborator

I am not sure I follow. What other changes are needed to support warm-reboot? If this PR depends on them, just mark it as dependent on those PRs so that it is not merged before they are available. That makes the dependencies easier to track through automated testing, which can then guard correctness.

Contributor Author

Actually, this is the last PR to be merged; the other dependent PRs are already merged. Based on these comments, I will delete the stub and activate the actual code in this PR.

{
SWSS_LOG_INFO("Module %s NOT Replayed or Reconciled %d",module.c_str(), (int) state);
//return false;
//Return true till all the dependant code is checked in
Contributor

@qiluo-msft qiluo-msft Dec 30, 2020

Add a blank after // #Closed

Contributor Author

Will do.

{
vector<string> required_modules = {
"orchagent",
};
Contributor

@qiluo-msft qiluo-msft Dec 30, 2020

Too much indentation #Closed

Contributor Author

will fix

{
SWSS_LOG_INFO("Module %s NOT Reconciled %d",module.c_str(), (int) state);
//return false;
//Return True untill the dependant orchagent code is commited
Contributor

@qiluo-msft qiluo-msft Dec 30, 2020

The same #Closed

Contributor Author

@nkelapur nkelapur Dec 31, 2020

This code and comment will be deleted when all the dependent warm-reboot code is available in the code base. will add the space though

sync.getRestartAssist()->readTablesToMap();

while (!sync.isIntfRestoreDone())
Contributor

@qiluo-msft qiluo-msft Dec 30, 2020

CPU is wasted on waiting. Could you subscribe Redis? #WontFix

Contributor Author

As this is not a continuous busy wait ( sleep is present), this should not cause the cpu to be continuously busy. Also there is nothing for fdbsyncd to do until the interface info is populated to kernel after system warm-reboot, hence it needs to wait till such time.
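The sleep-based wait described in this reply can be sketched with the standard library alone. This is only an illustration under stated assumptions: `waitUntil`, its parameters, and the intervals are hypothetical names, while the PR's own loop is built around `sync.isIntfRestoreDone()` and is only partly visible in this thread.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `done` until it returns true or `maxWait` elapses.
// Sleeping between polls keeps the thread off the CPU while it waits.
bool waitUntil(const std::function<bool()> &done,
               std::chrono::milliseconds maxWait,
               std::chrono::milliseconds pollInterval)
{
    const auto deadline = std::chrono::steady_clock::now() + maxWait;
    while (!done())
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false; // timed out; the caller decides whether to log, raise, or continue
        }
        std::this_thread::sleep_for(pollInterval);
    }
    return true;
}
```

The trade-off raised by the reviewer still applies: a subscription wakes up as soon as the state changes, whereas a poll can be late by up to one `pollInterval`.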

replayCheckTimer.start();
s.addSelectable(&replayCheckTimer);

}
Contributor

@qiluo-msft qiluo-msft Dec 30, 2020

Remove extra blank line #Closed

Contributor Author

will do

@nkelapur
Contributor Author

retest this please

else
{
SWSS_LOG_INFO("Module %s NOT Reconciled %d",module.c_str(), (int) state);
//return false;
Contributor

@qiluo-msft qiluo-msft Dec 31, 2020

Remove unused code. #WontFix

Contributor Author

This is stubbed code so that the basic functionality will not fail until all the warm-reboot changes are available in the code base. Once all the warm-reboot changes are available, the actual code will be uncommented and the stub will be deleted

Collaborator

Same point here: the code should be correct, dependencies should be stated in the PR description, and the tests should determine whether the PR is ready.

Contributor Author

Will update the actual code flow as this is the last dependent PR.

@nkelapur
Contributor Author

nkelapur commented Jan 3, 2021

retest this please

@nkelapur
Contributor Author

nkelapur commented Jan 6, 2021

retest this please

@prsunny
Collaborator

prsunny commented Jan 8, 2021

retest this please

@liushilongbuaa
Contributor

retest vs please

@prsunny
Collaborator

prsunny commented Jan 8, 2021

retest this please

@nkelapur
Contributor Author

nkelapur commented Jan 9, 2021

retest this please

@nkelapur
Contributor Author

nkelapur commented Jan 9, 2021

retest this please

Collaborator

@prsunny prsunny left a comment

From the PR, it seems orchagent dependency is pending and the function is simply returning true. Based on the priority for this PR to be taken to release branch, suggest to remove the orchagent part from this and later add with the dependent orchagent changes.

{
bool readyToReconcile = false;
Collaborator

Alignment issue. Please fix

Contributor Author

Will Fix this

if (temps == &warmStartTimer)
{
readyToReconcile = true;
Collaborator

Alignment issue. Please fix

Contributor Author

Will Fix this

else
{
readyToReconcile = sync.isReadyToReconcile();
Collaborator

Alignment issue. Please fix

{
SWSS_LOG_INFO("Module %s NOT Reconciled %d",module.c_str(), (int) state);
//return false;
//Return true untill dependent module code is commited
Collaborator

what is the dependent module code? Are there any further changes expected?

Contributor Author

@nkelapur nkelapur Jan 14, 2021

Dependent code means all the PRs that add this dependency check. Since these are committed as separate PRs, I have stubbed out the actual dependency check. This ensures that the check does not fail if one of the PRs is not yet present. Once all the PRs related to this dependency check are merged, the stubs will be deleted and the actual check will be enabled.

Contributor Author

same applies to the above comment regarding orchagent code too. The orchagent code is already present. Once all the PRs ( now only 1556 is remaining) are merged, the actual code check will be activated

Contributor Author

will commit the actual code and remove the stubbed code as other PRs are already merged and this is the last PR
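For readers following the stub discussion, the difference between the stubbed and the actual check can be sketched as below. This is an illustration only: `ModuleState` and `allModulesReady` are hypothetical names, the map stands in for however the real code looks up warm-start state, and treating a missing module as not ready is an assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the warm-start states mentioned in the review.
enum class ModuleState { Initialized, Restored, Replayed, Reconciled };

// Returns true only when every required module has at least replayed its data.
// A module missing from `states` is treated as not ready (an assumption).
bool allModulesReady(const std::vector<std::string> &requiredModules,
                     const std::map<std::string, ModuleState> &states)
{
    for (const auto &module : requiredModules)
    {
        const auto it = states.find(module);
        if (it == states.end() ||
            (it->second != ModuleState::Replayed && it->second != ModuleState::Reconciled))
        {
            return false; // the stubbed version discussed above returned true here
        }
    }
    return true;
}
```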

SWSS_LOG_INFO("Module %s NOT Replayed or Reconciled %d",module.c_str(), (int) state);
//return false;
// Return true till all the dependant code is checked in
return true;
Contributor

Why return true here? if we do nothing here, the function will return true in the end anyways, why bother returning true here?

else
{
readyToReconcile = sync.isReadyToReconcile();
Contributor

Can you replace tab with white spaces?

@ben-gale
Collaborator

@prsunny and team - can this one advance now? Thx.

@prsunny
Collaborator

prsunny commented Jan 25, 2021

waiting on @zhenggen-xu to sign-off

@@ -10,7 +10,16 @@
#include "warmRestartAssist.h"

// The timeout value (in seconds) for fdbsyncd reconcilation logic
-#define DEFAULT_FDBSYNC_WARMSTART_TIMER 30
+#define DEFAULT_FDBSYNC_WARMSTART_TIMER 600
Collaborator

I am not sure why we increase this timer to such a large value. This is the same timer that was previously used for the FDB table reconciliation logic itself, so do we expect FDB reconciliation to take much longer because of the dependencies? We now have more timers: based on the code, we wait at least FDBSYNC_RECON_WAIT_TIME after replay before checking the orchagent reconciliation state and, if it is not ready, we check every second. Why is FDBSYNC_RECON_WAIT_TIME 120? That is also considerably large; we need some data to support this value. Also, if orchagent never reconciles, should we abort instead, i.e. treat the warm restart as failed? We need to define the behaviour of the timers mentioned above and document it.


if (pasttime > INTF_RESTORE_MAX_WAIT_TIME)
{
SWSS_LOG_INFO("timed-out before all interface data was replayed to kernel!!!");
Collaborator

What happens if intf is not restored after max_wait_time? Shouldn't we abort to avoid more issues?

Contributor Author

System will proceed further. Some mac programming to kernel might fail because underlying interface is not yet created.

Collaborator

If we could not restore interface, why we should proceed further and get into some limbo state that may or may not have critical issues. I would suggest we abort to bring user's attention.

Contributor Author

Interfaces will be eventually restored. The only impact will be that warm-reboot might not be hitless and there will be traffic loss seen. Not sure if we need to go for full abort and impact everything and all traffic. Requesting @prsunny to comment on this too

Collaborator

"Some mac programming to kernel might fail because underlying interface is not yet created." so this condition will be recovered by someone later? Again, if it is a critical condition, we should raise/abort so we don't get into limbo state.

{
if (sync.isReadyToReconcile())
{
reconcileHoldTimer.setInterval(timespec{2, 0});
Collaborator

What is the hold-timer for? Why do we need wait another 2 seconds before reconcile happens?

Contributor Author

Once fpmsyncd reconcile is done, this timer checks whether orchagent has reconciled and we are ready to start updating the reconciled fpmsyncd entries in APP-DB. Since I did not want to check too frequently, I kept the timer at 2 seconds.

Collaborator

reconcileHoldTimer is one-shot timer in the code, it just means we wait one time 2 seconds then do reconcile. Question was, why need wait another 2 seconds? what did we wait for?

Contributor Author

reconcileHoldTimer is a one-shot timer that is fired once control plane reconciliation is done. If orchagent has reconciled by then, we continue with fpmsyncd reconciliation. However, if orchagent is still not reconciled when reconcileHoldTimer expires, we fire reconcileHoldTimer for another 2 seconds and re-check whether orchagent has reconciled. This continues until orchagent is found to have reconciled. The idea is to wait for at least the control plane (BGP) reconciliation time (default 120 seconds) and also to check that orchagent has reconciled before reconciling fpmsyncd. Both conditions must be met. Orchagent reconciliation can occur earlier or later depending on system scale, but we need to wait for the minimum control plane (BGP) reconciliation time before reconciling fpmsyncd.

Contributor Author

A correction to the above explanation: reconcileCheckTimer (not reconcileHoldTimer) is fired again for 2 seconds to re-check whether orchagent has reconciled. Hence reconcileHoldTimer does not need 2 seconds. I will change it to 1 second, like the eoiu hold timer.
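The two conditions described in this thread (minimum control plane wait elapsed, and orchagent reconciled, with a 2-second re-check otherwise) can be written down as a small decision function. This is a sketch of the explanation only, not the PR's code: `ReconcileDecision` and `onCheckTimerExpired` are hypothetical names, and the 2-second default is taken from the comment above.

```cpp
#include <cstdint>

// Hypothetical result of one timer expiry: reconcile now, or re-arm the check timer.
struct ReconcileDecision
{
    bool reconcileNow;
    uint32_t recheckAfterSec; // 0 when reconcileNow is true
};

// Both conditions from the explanation must hold before reconciling:
// the minimum control-plane (BGP) wait has elapsed and orchagent has reconciled.
ReconcileDecision onCheckTimerExpired(uint32_t elapsedSec,
                                      uint32_t minControlPlaneWaitSec,
                                      bool orchagentReconciled,
                                      uint32_t recheckIntervalSec = 2)
{
    if (elapsedSec >= minControlPlaneWaitSec && orchagentReconciled)
    {
        return {true, 0};
    }
    return {false, recheckIntervalSec};
}
```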

else
{
readyToReconcile = sync.isReadyToReconcile();
Collaborator

This will basically invalidate the eoiu design for fast reconciliation. I understand that we probably have to rely on the orchagent reconciliation status, but then DEFAULT_ROUTING_RECON_CHECK_INTERVAL should be reduced to a much smaller value to take advantage of eoiu.

Contributor Author

The DEFAULT_ROUTING_RECON_CHECK_INTERVAL timer is only a fallback timer. It only comes into action when the replay/reconcile does not happen in the given time. Hence it should not affect eoiu handling.

Collaborator

eoiuHoldTimer is a one-shot timer in the code and could be triggered very early, for example after a few seconds. At that time readyToReconcile could be (and likely is) false, so the code falls back to reconcileCheckTimer (if the timer is not configured, DEFAULT_ROUTING_RECON_CHECK_INTERVAL = 120 seconds). This is what I meant by invalidating the eoiu design. Example: eoiu takes 10 seconds to finish and orchagent takes 15 seconds to reach the readyToReconcile state, yet we still need to wait at least 120 seconds to reconcile.

Contributor Author

Got your point. When eoiuHoldTimer expires, the control plane (BGP) has converged, so we just need to wait for orchagent to reconcile and proceed once that is done. To implement this, I will restart the reconCheck timer with 2 seconds when eoiuHoldTimer expires, to check whether orchagent has reconciled.
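The reviewer's numbers (eoiu done at 10 seconds, orchagent ready at 15 seconds, 120-second fallback) can be worked through with a deliberately simplified model. Both function names are hypothetical, and the model assumes that the EOIU hold timer expires no later than the fallback timer and that orchagent is ready by the time the fallback fires.

```cpp
#include <cstdint>

// Simplified model (an assumption): readiness is checked once when the EOIU hold
// timer expires, and otherwise only when the fallback timer fires.
uint32_t reconcileTimeWithFallbackOnly(uint32_t eoiuDoneSec,
                                       uint32_t orchagentReadySec,
                                       uint32_t fallbackSec)
{
    return (orchagentReadySec <= eoiuDoneSec) ? eoiuDoneSec : fallbackSec;
}

// Proposed behaviour: after the EOIU hold timer expires, re-check every
// `recheckSec` seconds (must be non-zero) until orchagent reports ready.
uint32_t reconcileTimeWithRecheck(uint32_t eoiuDoneSec,
                                  uint32_t orchagentReadySec,
                                  uint32_t recheckSec)
{
    uint32_t t = eoiuDoneSec;
    while (t < orchagentReadySec)
    {
        t += recheckSec;
    }
    return t;
}
```

With the values from the review, `reconcileTimeWithFallbackOnly(10, 15, 120)` gives 120 seconds, while `reconcileTimeWithRecheck(10, 15, 2)` gives 16 seconds (checks at 10, 12, 14 and 16).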

if (temps == &warmStartTimer)
{
readyToReconcile = true;
Collaborator

In case of orchagent still not ready, should we just abort?

Contributor Author

reconcile should take care of cleaning up stale entries. So not sure abort is necessary.

Collaborator

there was a reason we wait for "sync.isReadyToReconcile", I assume it was a must condition for reconcile to be working as expected. Again, if that condition is broken, we should make it visible to user. This probably won't happen in normal case, if it does, we should have information for user to debug, so raise or abort could help.

Contributor Author

again same as earlier, orchagent will eventually reconcile. If it does not, it would have its own mechanism to recover/abort. Here we are making effort to reconcile the system to pre-warm-reboot state. Hence if we continue, we would reconcile prematurely and hence warm-reboot may not be hitless. However if we abort, we would impact everything and the full traffic will be hit.

@@ -16,7 +16,9 @@ using namespace swss;
* Default warm-restart timer interval for routing-stack app. To be used only if
* no explicit value has been defined in configuration.
*/
-const uint32_t DEFAULT_ROUTING_RESTART_INTERVAL = 120;
+const uint32_t DEFAULT_ROUTING_RESTART_INTERVAL = 600;
Collaborator

Same as above, we should well define these two timers: DEFAULT_ROUTING_RESTART_INTERVAL and DEFAULT_ROUTING_RECON_CHECK_INTERVAL in document/code-comments.

@prsunny
Collaborator

prsunny commented Feb 3, 2021

@zhenggen-xu , could you please check the updated code?

@prsunny
Collaborator

prsunny commented Feb 4, 2021

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ben-gale
Collaborator

ben-gale commented Feb 5, 2021

@prsunny - what needs to happen to merge this? Are we just waiting for @zhenggen-xu approval?

@prsunny
Collaborator

prsunny commented Feb 5, 2021

Yes @ben-gale . @zhenggen-xu and @yxieca

When eoiuHoldTimer expires, that means control plane (BGP) has
converged. So now we just need to wait for orchagent reconcile and
proceed once that is done. To implement this, restart reconCheck timer
when eoiuHoldTimer expires, to check if orchagent has reconciled
@prsunny
Collaborator

prsunny commented Feb 12, 2021

Based on discussion with @qiluo-msft, in the current warmboot design it is not required for fdbsyncd (or any application) to wait for orchagent reconciliation. Once orchagent reads the APP_DB data during warmboot start, applications can write to the DB and the writes are treated as normal operations by orchagent post bake. Also, from a design perspective, it is not recommended for applications to wait for orchagent; they should be able to handle this independently. We suggest removing the orchagent reconcile section from the PR. Let us know if you have any further questions.

@nkelapur
Contributor Author

Sure .. will make the change and re-submit

Removed the dependency on orchagent reconciliation as per the review
discussion and conclusion.
Also added an exception when interfaces are not replayed to the kernel in the
given time.
@prsunny
Collaborator

prsunny commented Feb 18, 2021

@zhenggen-xu , the PR has been restructured. Could you please take a look?

if (sync.getFdbStateTable()->empty() && sync.getCfgEvpnNvoTable()->empty())
{
sync.getRestartAssist()->appDataReplayed();
SWSS_LOG_NOTICE("FDB Replay Complete");
Collaborator

removeSelectable for replayCheckTimer?

Collaborator

Also, since the replaychecktimer and reconciliation timer are in parallel, what is the consequence if reconciliation timer is up, but we haven't replayed? If replay is must, but not yet done after reconciliation timer, we should log the error and raise.

Contributor Author

@nkelapur nkelapur Feb 25, 2021

Will removeSelectable replayCheckTimer and start the reconciliation timer after replay is done.
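The ordering agreed on here (stop the replay check, then start the reconciliation timer) can be pictured as a tiny state machine. Everything in this sketch is hypothetical: the struct, its fields, and the `pendingEntries` count are illustrative stand-ins, and the boolean flags merely represent adding or removing a timer from the select loop.

```cpp
#include <cstdint>

// Hypothetical phases of the warm-start flow discussed in this thread.
enum class Phase { WaitingForReplay, WaitingForReconcile };

struct WarmStartFlow
{
    Phase phase = Phase::WaitingForReplay;
    bool replayCheckTimerActive = true;
    bool reconcileTimerActive = false;

    // Called on each replay-check tick; `pendingEntries` is the number of
    // cached entries still waiting to be replayed to the kernel.
    void onReplayCheck(uint32_t pendingEntries)
    {
        if (phase != Phase::WaitingForReplay || pendingEntries != 0)
        {
            return; // keep polling
        }
        replayCheckTimerActive = false; // stands in for removing the timer from the select loop
        reconcileTimerActive = true;    // the reconciliation timer starts only now
        phase = Phase::WaitingForReconcile;
    }
};
```

Because the reconciliation timer is started only after replay completes, the two timers no longer run in parallel, which addresses the reviewer's concern about reconciliation firing before replay.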

/*
* Default warm-restart timer interval for routing-stack app
*/
#define DEFAULT_FDBSYNC_WARMSTART_TIMER 120
Collaborator

As mentioned earlier, how is this fdb depending on routing stack? If user configures the routing stack warm-restart timer to a bigger value and it actually took that much time to reconcile for routing stack, what is the consequence?
If the dependency is must, We should probably also read the routing stack reconciliation status before we reconcile here for fdb.

Contributor Author

fdbsyncd reconciliation is dependent on BGP convergence time. I will change it to use the BGP warm-restart timer config value instead of a hardcoded value, so that reconciliation is tied to control plane convergence, the same way it is done in fpmsyncd. To further optimise reconciliation time, the EOIU feature is implemented for fpmsyncd to check for actual protocol convergence. This is not yet validated for fdbsyncd and will be implemented later. For now, fdbsyncd will only use the BGP warm-restart timer config value, as in fpmsyncd.
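The "configured value, else default" selection described in this reply might look like the following. This is a sketch only: `chooseReconcileTimerSec` is a hypothetical helper, and treating a configured value of 0 as "not configured" is an assumption rather than something stated in this thread.

```cpp
#include <cstdint>

// Mirrors the default shown in the diff; used only when no value is configured.
constexpr uint32_t DEFAULT_FDBSYNC_WARMSTART_TIMER = 120;

// Hypothetical helper. Treating 0 as "not configured" is an assumption.
uint32_t chooseReconcileTimerSec(uint32_t configuredBgpTimerSec)
{
    return configuredBgpTimerSec > 0 ? configuredBgpTimerSec
                                     : DEFAULT_FDBSYNC_WARMSTART_TIMER;
}
```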

// The timeout value (in seconds) for fdbsyncd reconcilation logic
#define DEFAULT_FDBSYNC_WARMSTART_TIMER 30
/*
* Default warm-restart timer interval for routing-stack app
Collaborator

change the comment to default timer for fdb reconciliation.

Contributor Author

Sure .. will do

SWSS_LOG_NOTICE("FDB Replay Complete");
s.removeSelectable(&replayCheckTimer);

/* Obtain warm-restart timer defined for routing application */
Collaborator

Add comments here for:

  • fdb dependencies on the routing application
  • A TBD comment for addressing the EOIU optimization etc. later. IMO, checking the BGP reconciliation state is a better way to handle the dependency. If we really have to do it later, let's track it with an issue and reference it in the code comment.

Contributor Author

Sure will add the comment

Contributor Author

Submitted issue #1657 to track the eoiu implementation for EVPN AF

@azure-pipelines

Commenter does not have sufficient privileges for PR 1556 in repo Azure/sonic-swss

@nkelapur
Contributor Author

nkelapur commented Mar 2, 2021

retest this please

@prsunny
Collaborator

prsunny commented Mar 2, 2021

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny prsunny merged commit 721f47d into sonic-net:master Mar 3, 2021
DavidZagury pushed a commit to DavidZagury/sonic-swss that referenced this pull request Mar 4, 2021
… warm-boot (sonic-net#1556)

Added changes to handle dependency check in FpmSyncd and FdbSyncd for warmreboot. 
This was done to ensure for EVPN warm-reboot the order of data replay to kernel is maintained across various submodules and the kernel programming will be successful.
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
… warm-boot (sonic-net#1556)

Added changes to handle dependency check in FpmSyncd and FdbSyncd for warmreboot. 
This was done to ensure for EVPN warm-reboot the order of data replay to kernel is maintained across various submodules and the kernel programming will be successful.
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
What I did
Added platform pre check support in reboot script.
Checking platform based changes before stopping dockers and sonic services.
Porting changes in master from 201911 branch sonic-net#1472
How I did it
On branch reboot_pre_check_master
Changes not staged for commit:
(use "git add ..." to update what will be committed)
(use "git checkout -- ..." to discard changes in working directory)

modified:   scripts/reboot
How to verify it
Write a platform pre check script(platform_reboot_pre_check) and place it in /usr/share/sonic/device// directory.
If the script exit with status 0, reboot will be proceeded.
If script exit with non-zero status, the reboot script gets stopped.