Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kdump] Fix OOM events in crashkernel #6447

Merged
merged 2 commits into from
Feb 2, 2021

Conversation

Staphylo
Copy link
Collaborator

@Staphylo Staphylo commented Jan 14, 2021

- Why I did it

A few issues where discovered with crashkernel on Arista platforms.

  1. platforms using docker_inram=on would end up OOM in kdump environment.
    This happens because the same initramfs is used by SONiC and the crashkernel.
    With docker_inram=on the dockerfs.tar.gz is extracted in a tmpfs created for the occasion.
    Since dockerfs.tar.gz weights more than 1.5G, it doesn't fit into the kdump environment and ends up OOM.
    This OOM event can in turn trigger a panic.

  2. Arista platforms with secureboot enabled would fail to load the crashkernel because the kernel parameter would be discarded on boot.
    This happens because the boot0 in secureboot mode is strict about kernel parameter injection.

  3. The secureboot path allowlist would remove kernel crash reports.

  4. The kdump service would fail on Arista products since /boot/ is empty in secureboot

- How I did it

  1. To prevent an OOM event in the crashkernel the fix is to avoid the codepaths in union-mount that create tmpfs and populate them. Some more codepath specific to Arista devices are also skipped to make the kdump process faster.
    This relies on detecting that the initramfs is starting in a kdump environment and skipping some initialization.
    The /usr/sbin/kdump-config tool appends a few kernel cmdline arguments when loading the crashkernel.
    The most unique one is systemd.unit=kdump-tools.service which is used in a few initramfs hooks to set in_kdump.

  2. To allow kdump to work in secureboot environment the cmdline generation in boot0 was slightly modified.
    The codepath to load kernel parameters changed by SONiC is now running for booting in secure mode.
    It was altered to prevent an append only behavior which would grow the kernel-cmdline at every reboot.
    This ever growing behavior would lead kexec to fail to load the kernel due to a too long cmdline.

  3. To get the kernel crash under /var/crash this path has to be added to allowlist_paths

  4. The /host/image-XXX/boot folder is now populated in secureboot mode but not used.

- How to verify it

Regular boot:

  • enable kdump
  • enable docker_inram=on via kernel-params
  • reboot
  • generate a crash echo c > /proc/sysrq-trigger
  • before: witness OOM events on the console
  • after: crash kernel works and crash available under /var/crash

Secure boot:

  • enable kdump
  • reboot
  • generate a crash echo c > /proc/sysrq-trigger
  • before: witness no kdump
  • after: crash kernel works and crash available under /var/crash

- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012

This bug is present in 202012 and needs to be addressed.

- Description for the changelog

Fix kdump crashkernel errors seen on Arista devices

@Staphylo Staphylo marked this pull request as ready for review January 14, 2021 15:58
@Staphylo
Copy link
Collaborator Author

retest vsimage please

@Staphylo
Copy link
Collaborator Author

@jleveque @qiluo-msft

write_cmdline() {
# use extra parameters from kernel-params hook if the file exists
if [ -f "$target_path/$kernel_params" ]; then
if $secureboot && $debug; then
Copy link
Collaborator

@qiluo-msft qiluo-msft Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if $secureboot && $debug; then [](start = 8, length = 30)

if $secureboot && $debug; then [](start = 8, length = 30)

If secureboot and not debug, you will not use extra parameter. This is surprising different than the case of secureboot and debug. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct.
The $target_path/$kernel_params file which is /host/kernel-params allows the user to inject more kernel parameters on the cmdline. It's especially useful during development to figure out which set of parameters should be used on a platform. But it can also be used to change some runtime behavior. Unfortunately some could have unfortunate behavior and end up compromising a system. To remove the risk of an attacker injecting harmful parameter we disable this feature in secureboot and print a message. It is still possible to boot the box in debug mode and inject parameters but that will show at runtime.
If you prefer I can disable this mechanism for secureboot entirely, including secureboot && debug.

@Staphylo
Copy link
Collaborator Author

retest vsimage please

@lguohan lguohan merged commit 0c4d4ac into sonic-net:master Feb 2, 2021
@lguohan
Copy link
Collaborator

lguohan commented Feb 3, 2021

is this needed for 202012?

@Staphylo
Copy link
Collaborator Author

Staphylo commented Feb 3, 2021

Yes it's needed for 202012

deran1980 pushed a commit to deran1980/sonic-buildimage that referenced this pull request Feb 4, 2021
A few issues where discovered with crashkernel on Arista platforms.

1) platforms using `docker_inram=on` would end up OOM in kdump environment.
This happens because the same initramfs is used by SONiC and the crashkernel.
With `docker_inram=on` the `dockerfs.tar.gz` is extracted in a `tmpfs` created for the occasion.
Since `dockerfs.tar.gz` weights more than 1.5G, it doesn't fit into the kdump environment and ends up OOM.
This OOM event can in turn trigger a panic.

2) Arista platforms with `secureboot` enabled would fail to load the crashkernel because the kernel parameter would be discarded on boot.
This happens because the `boot0` in secureboot mode is strict about kernel parameter injection.

3) The secureboot path allowlist would remove kernel crash reports.

4) The kdump service would fail on Arista products since `/boot/` is empty in `secureboot`

**- How I did it**

1) To prevent an OOM event in the crashkernel the fix is to avoid the codepaths in `union-mount` that create tmpfs and populate them. Some more codepath specific to Arista devices are also skipped to make the kdump process faster.
This relies on detecting that the initramfs is starting in a kdump environment and skipping some initialization.
The `/usr/sbin/kdump-config` tool appends a few kernel cmdline arguments when loading the crashkernel.
The most unique one is `systemd.unit=kdump-tools.service` which is used in a few initramfs hooks to set `in_kdump`.

2) To allow `kdump` to work in `secureboot` environment the cmdline generation in boot0 was slightly modified.
The codepath to load kernel parameters changed by SONiC is now running for booting in secure mode.
It was altered to prevent an append only behavior which would grow the `kernel-cmdline` at every reboot.
This ever growing behavior would lead `kexec` to fail to load the kernel due to a too long cmdline.

3) To get the kernel crash under /var/crash this path has to be added to `allowlist_paths`

4) The `/host/image-XXX/boot` folder is now populated in `secureboot` mode but not used.

**- How to verify it**

Regular boot:
 - enable kdump
 - enable docker_inram=on via kernel-params
 - reboot
 - generate a crash `echo c > /proc/sysrq-trigger`
 - before: witness OOM events on the console
 - after: crash kernel works and crash available under /var/crash

Secure boot:
 - enable kdump
 - reboot
 - generate a crash `echo c > /proc/sysrq-trigger`
 - before: witness no kdump
 - after: crash kernel works and crash available under /var/crash


Co-authored-by: Boyang Yu <byu@arista.com>
@Staphylo Staphylo deleted the master-crashkernel branch December 6, 2022 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants