Fix error handling when failing to install a deb package #11846

saiarcot895 · 2022-08-25T16:40:38Z

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

Why I did it

The current error handling code for when a deb package fails to be
installed has a chain of commands linked together by `&&` that ends
with `exit 1`. The assumption is that these commands would succeed and
the final `exit 1` would return a non-zero exit code, fully failing the
target and stopping the build because of bash's -e flag.

However, if one of the commands before `exit 1` returns a non-zero
exit code, bash doesn't actually treat it as a terminating error.
From bash's man page:

-e      Exit immediately if a pipeline (which may consist of a single simple
        command), a list, or a compound command (see SHELL GRAMMAR above),
        exits with a non-zero status.  The shell does not exit if the
        command that fails is part of the command list immediately
        following a while or until keyword, part of the test following the
        if or elif reserved words, part of any command executed in a && or
        || list except the command following the final && or ||, any
        command in a pipeline but the last, or if the command's return
        value is being inverted with !.  If a compound command other than a
        subshell returns a non-zero status because a command failed while
        -e was being ignored, the shell does not exit.

The clause "part of any command executed in a && or || list except the
command following the final && or ||" means that if the failing command
is not the `exit 1` at the end, bash doesn't treat the failure as an
error and exit immediately. Additionally, since this is a compound
command that is not a subshell (subshells are marked by `(` and `)`,
whereas `{` and `}` just tell bash to run the commands in the current
environment), bash doesn't exit. As a result, if a package installation
fails in the deb-install target, the build may get stuck indefinitely in
that while-loop.
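A minimal sketch of how this turns into an endless retry, assuming a simplified loop shaped like the one described (this is not the actual deb-install recipe). `install_pkg` and `diagnose` are hypothetical stand-ins that always fail, `run_old_handler` is an illustrative wrapper, and the loop is capped at three attempts only so the demo terminates.

```shell
#!/usr/bin/env bash
# Minimal sketch of the retry loop getting stuck; not the real recipe.
# install_pkg and diagnose are hypothetical stand-ins that always fail.
run_old_handler() {
  bash -e -c '
    install_pkg() { return 1; }   # the package installation fails
    diagnose()    { return 1; }   # a debugging command that also fails
    attempts=0
    while [ "$attempts" -lt 3 ]; do   # cap added only for this demo
      attempts=$((attempts + 1))
      echo "attempt $attempts"
      # diagnose fails inside an && list, so -e ignores it, exit 1 is never
      # reached, and the brace group does not stop the shell either.
      { install_pkg && break; } || { diagnose && exit 1; }
    done
    echo "loop ended without reaching exit 1"
  '
}

run_old_handler
```

All three attempts run and the loop ends normally instead of failing; without the cap, it would keep retrying.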

This was seen after the snmpd package upgrade: builds were failing to
install the mismatching libsnmp-dev package, but instead of terminating
immediately, they retried the installation again and again, suggesting
they were stuck in an infinite loop. The build jobs finally terminated
only because of the timeout specified for the jobs.

How I did it

There are two fixes for this: use a subshell, or use `;` instead of
`&&`. Using a subshell would, I think, require exporting any shell
variables used in the subshell, so I chose to change the `&&` to `;`.
In addition, `set +e` is added at the start of the command group, which
turns off bash's exit-on-error handling. This ensures that all commands
are run (their output may help with debugging) and that the group still
exits with 1, which fully fails the target.
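The original handler and the two fixes can be compared with a one-attempt sketch. As before, `install_pkg` and `diagnose` are hypothetical stand-ins that always fail, and `try_handler` is an illustrative helper that splices a handler into the script and reports the exit status of `bash -e`; none of these names come from the actual change.

```shell
#!/usr/bin/env bash
# Minimal sketch comparing error handlers under `bash -e`.
# install_pkg and diagnose are hypothetical stand-ins that always fail.

# Prints whatever the script prints, then the exit status of `bash -e`.
try_handler() {
  local status=0
  bash -e -c '
    install_pkg() { return 1; }
    diagnose()    { return 1; }
    { install_pkg && exit 0; } || '"$1"'
    echo "handler fell through"
  ' || status=$?
  echo "status=$status"
}

# Original: diagnose fails inside the && chain, so exit 1 is never reached.
try_handler '{ diagnose && exit 1; }'
# Fix 1: a subshell that returns non-zero does stop a shell running with -e.
try_handler '( diagnose && exit 1 )'
# Fix 2 (the approach described above): set +e, separate with ;, exit 1.
try_handler '{ set +e; diagnose; diagnose; exit 1; }'
```

Only the original handler falls through with status 0; both fixes leave the shell with status 1. The `set +e` variant also runs every diagnostic command even when one of them fails, so their output remains available for debugging.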





xumia · 2022-11-20T04:49:37Z

This fix breaks the infinite while loop, which also occurs in the release branches (see the 202111 logs below), so I request that the patch be added to the release branches. The build had been waiting for 15 hours, and the logs are too long to open.

	Line   1877: 2022-11-17T14:49:25.3670141Z [ FAIL LOG START ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   1903: 2022-11-17T14:49:25.3832815Z [  FAIL LOG END  ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   1904: 2022-11-17T14:49:40.1761304Z [ finished ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ] 
	Line   1905: 2022-11-17T14:49:40.1762440Z [ FAIL LOG START ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   1943: 2022-11-17T14:49:40.1803303Z [  FAIL LOG END  ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   1944: 2022-11-17T14:49:54.9713950Z [ finished ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ] 
	Line   1945: 2022-11-17T14:49:54.9715063Z [ FAIL LOG START ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   1995: 2022-11-17T14:49:54.9775842Z [  FAIL LOG END  ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   1996: 2022-11-17T14:50:10.1359304Z [ finished ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ] 
	Line   1997: 2022-11-17T14:50:10.1360306Z [ FAIL LOG START ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   2059: 2022-11-17T14:50:10.1439661Z [  FAIL LOG END  ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   2060: 2022-11-17T14:50:24.8886666Z [ finished ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ] 
	Line   2061: 2022-11-17T14:50:24.8887827Z [ FAIL LOG START ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   2135: 2022-11-17T14:50:24.9155837Z [  FAIL LOG END  ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   2136: 2022-11-17T14:50:39.8263591Z [ finished ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ] 
	Line   2137: 2022-11-17T14:50:39.8264741Z [ FAIL LOG START ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]
	Line   2223: 2022-11-17T14:50:39.8390080Z [  FAIL LOG END  ] [ target/debs/bullseye/libsnmp-dev_5.9+dfsg-3+b1_amd64.deb-install ]

mssonicbld · 2022-11-21T04:25:31Z

@saiarcot895 PR conflicts with 202111 branch


…2777) Cherry-pick PR: #11846

yxieca · 2022-11-28T18:18:51Z

@saiarcot895 can you help raise a separate PR for the 202205 branch?



…1846) Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

…5087)

#### Why I did it
Fix endless build log issue. Cherry pick [PR#11846](#11846)

##### Work item tracking
- Microsoft ADO **(number only)**: 19299131

saiarcot895 requested review from qiluo-msft, xumia and lguohan as code owners August 25, 2022 16:40

saiarcot895 requested a review from liushilongbuaa August 25, 2022 16:59

qiluo-msft approved these changes Aug 26, 2022


saiarcot895 merged commit de54eec into sonic-net:master Aug 29, 2022

saiarcot895 deleted the fix-deb-install-exit-on-error branch August 29, 2022 18:35

xumia added the Request for 202111 Branch and Request for 202205 Branch labels Nov 20, 2022

xumia added the Approved for 202111 Branch label Nov 21, 2022

mssonicbld added the Cherry Pick Conflict_202111 label Nov 21, 2022

xumia mentioned this pull request Nov 21, 2022

Fix error handling when failing to install a deb package (#11846) #12777

Merged


yxieca added the Cherry Pick Conflict_202205 label Nov 28, 2022

saiarcot895 mentioned this pull request Nov 29, 2022

[202205] Fix error handling when failing to install a deb package (#11846) #12857

Merged


yxieca added the Included in 202205 Branch label Dec 7, 2022

liushilongbuaa mentioned this pull request May 18, 2023

Fix error handling when failing to install a deb package (#11846) #15087

Merged



saiarcot895 commented Aug 25, 2022

xumia commented Nov 20, 2022

mssonicbld commented Nov 21, 2022

yxieca commented Nov 28, 2022
