Skip to content

[rocky10_0] History Rebuild to kernel-6.12.0-55.18.1.el10_0 #382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 160 commits into from
Jul 2, 2025

Conversation

PlaidCat
Copy link
Collaborator

This is the attempt at a re-builder built on Cron and some internal tools, but the same process is as follows as previous rebuilds.

FIRST REBUILD

  • Download all unprocessed src.rpm
  • for each src,pm
    • Find all commits in changelog up to last known tag ... in this case 6.12.0-55
    • Re-play commits in reverse order (oldest in change log to newest) with git cherry-pick
    • After replay replace ENTIRE code in branch with rpmbuild -bp from corresponding src.rpm.
    • Tag Rebuild branch
  • Use New Local Build with prodman and test (note test results will be different than usual)

Checking Rebuild Commits for potentially missing commits:

kernel-6.12.0-55.11.1.el10_0

Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 1459
Number of commits matched with upstream: 81 (5.55%)
Number of commits in upstream but not in rpm: 51933
Number of commits NOT found in upstream: 1378 (94.45%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.11.1.el10_0 for kernel-6.12.0-55.11.1.el10_0
Clean Cherry Picks: 74 (91.36%)
Empty Cherry Picks: 5 (6.17%)
_______________________________

__EMPTY COMMITS__________________________
35a0437d9f33071d81d51af70432ecab1e686078 scsi: mpi3mr: Update driver version to 8.12.1.0.50
f10593ad9bc36921f623361c9e3dd96bd52d85ee scsi: sg: Fix slab-use-after-free read in sg_release()
ad95bab0cd28ed77c2c0d0b6e76e03e031391064 nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu()
ebb2693f8fbdb444e01ac6b6b50282aaabc96e77 ice: Read SDP section from NVM for pin definitions
53ce7166cbffd2b8f3bd821fd3918be665afd4c6 ice: ensure periodic output start time is in the future

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 10, debranding and Rocky Linux branding'
Add partial riscv64 support for build root'
Provide basic VisionFive 2 support'
redhat: drop Y issues from changelog
redhat: kabi: update stablelist checksums
shrinker: include rh_kabi.h
redhat: pad the folio_batch struct
redhat: pad the lruvec structure for the kabi
redhat: pad the zone structures for kabi
redhat: pad the wait_page_queue for the kabi
redhat: pad the lru_gen functions for kabi

[SNIP]

kabi: add symbol acpi_disabled to stablelist
kabi: add symbol abort_creds to stablelist
rhel_files: ensure all qdiscs are in modules-core
configs: enable FW_CACHE on centos-stream/rhel 10 for nouveau
kernel.spec: add missing tools-libs on s390x
redhat/configs: Enable Mediatek Bluetooth USB drivers
CVE-2025-1272: security: Re-enable lockdown LSM in some setup_arch()
uki: enable FIPS mode
redhat: regenerate self-test data for new RHDISTGIT_BRANCH
redhat: adjust rhel branch name
redhat: set defaults for RHEL 10.0
Revert "x86/kvm: Override default caching mode for SEV-SNP and TDX"

kernel-6.12.0-55.12.1.el10_0

[jmaple@devbox ciq_backports]$ cat kernel-6.12.0-55.12.1.el10_0/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 9
Number of commits matched with upstream: 6 (66.67%)
Number of commits in upstream but not in rpm: 52006
Number of commits NOT found in upstream: 3 (33.33%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.12.1.el10_0 for kernel-6.12.0-55.12.1.el10_0
Clean Cherry Picks: 6 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

__EMPTY COMMITS__________________________

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 10, debranding and Rocky Linux branding'
Add partial riscv64 support for build root'
Provide basic VisionFive 2 support'

kernel-6.12.0-55.13.1.el10_0

[jmaple@devbox ciq_backports]$ cat kernel-6.12.0-55.12.1.el10_0/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 9
Number of commits matched with upstream: 6 (66.67%)
Number of commits in upstream but not in rpm: 52006
Number of commits NOT found in upstream: 3 (33.33%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.12.1.el10_0 for kernel-6.12.0-55.12.1.el10_0
Clean Cherry Picks: 6 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

__EMPTY COMMITS__________________________

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 10, debranding and Rocky Linux branding'
Add partial riscv64 support for build root'
Provide basic VisionFive 2 support'
[jmaple@devbox ciq_backports]$ cat kernel-6.12.0-55.13.1.el10_0/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 27
Number of commits matched with upstream: 20 (74.07%)
Number of commits in upstream but not in rpm: 51992
Number of commits NOT found in upstream: 7 (25.93%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.13.1.el10_0 for kernel-6.12.0-55.13.1.el10_0
Clean Cherry Picks: 20 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

__EMPTY COMMITS__________________________

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 10, debranding and Rocky Linux branding'
Add partial riscv64 support for build root'
Provide basic VisionFive 2 support'
gitlab-ci: use rhel10.0 builder image
redhat: enable CONFIG_WERROR=y
redhat: don't enforce WERROR for 3rd-party OOT kmods
redhat: make ENABLE_WERROR enable also KVM_WERROR

kernel-6.12.0-55.13.1.el10_0

[jmaple@devbox ciq_backports]$ cat kernel-6.12.0-55.13.1.el10_0/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 27
Number of commits matched with upstream: 20 (74.07%)
Number of commits in upstream but not in rpm: 51992
Number of commits NOT found in upstream: 7 (25.93%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.13.1.el10_0 for kernel-6.12.0-55.13.1.el10_0
Clean Cherry Picks: 20 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

__EMPTY COMMITS__________________________

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 10, debranding and Rocky Linux branding'
Add partial riscv64 support for build root'
Provide basic VisionFive 2 support'
gitlab-ci: use rhel10.0 builder image
redhat: enable CONFIG_WERROR=y
redhat: don't enforce WERROR for 3rd-party OOT kmods
redhat: make ENABLE_WERROR enable also KVM_WERROR

kernel-6.12.0-55.14.1.el10_0

[jmaple@devbox ciq_backports]$ cat kernel-6.12.0-55.14.1.el10_0/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 16
Number of commits matched with upstream: 13 (81.25%)
Number of commits in upstream but not in rpm: 52001
Number of commits NOT found in upstream: 3 (18.75%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.14.1.el10_0 for kernel-6.12.0-55.14.1.el10_0
Clean Cherry Picks: 9 (69.23%)
Empty Cherry Picks: 2 (15.38%)
_______________________________

__EMPTY COMMITS__________________________
f2e2092a979cd46b43445daf23628015ac776ac3 drm/i915/display: Use joined pipes in dsc helpers for slices, bpp
802a69b6b8a0502a9e2309afec7e1b77f67874f2 drm/i915/dp_mst: Handle error during DSC BW overhead/slice calculation

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 10, debranding and Rocky Linux branding'
Add partial riscv64 support for build root'
Provide basic VisionFive 2 support'

kernel-6.12.0-55.16.1.el10_0

[jmaple@devbox ciq_backports]$ cat kernel-6.12.0-55.16.1.el10_0/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 8
Number of commits matched with upstream: 5 (62.50%)
Number of commits in upstream but not in rpm: 52007
Number of commits NOT found in upstream: 3 (37.50%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.16.1.el10_0 for kernel-6.12.0-55.16.1.el10_0
Clean Cherry Picks: 5 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

__EMPTY COMMITS__________________________

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 10, debranding and Rocky Linux branding'
Add partial riscv64 support for build root'
Provide basic VisionFive 2 support'

kernel-6.12.0-55.17.1.el10_0

[jmaple@devbox ciq_backports]$ cat kernel-6.12.0-55.17.1.el10_0/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 24
Number of commits matched with upstream: 21 (87.50%)
Number of commits in upstream but not in rpm: 51991
Number of commits NOT found in upstream: 3 (12.50%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.17.1.el10_0 for kernel-6.12.0-55.17.1.el10_0
Clean Cherry Picks: 21 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

__EMPTY COMMITS__________________________

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 10, debranding and Rocky Linux branding'
Add partial riscv64 support for build root'
Provide basic VisionFive 2 support'

kernel-6.12.0-55.18.1.el10_0

[jmaple@devbox ciq_backports]$ cat kernel-6.12.0-55.18.1.el10_0/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 12
Number of commits matched with upstream: 9 (75.00%)
Number of commits in upstream but not in rpm: 52003
Number of commits NOT found in upstream: 3 (25.00%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.18.1.el10_0 for kernel-6.12.0-55.18.1.el10_0
Clean Cherry Picks: 7 (77.78%)
Empty Cherry Picks: 2 (22.22%)
_______________________________

__EMPTY COMMITS__________________________
fc0e982b8a3a169b1c654d9a1aa45bf292943ef2 block: make sure ->nr_integrity_segments is cloned in blk_rq_prep_clone
690e47d1403e90b7f2366f03b52ed3304194c793 sched/rt: Fix race in push_rt_task

__CHANGES NOT IN UPSTREAM________________
Porting to Rocky Linux 10, debranding and Rocky Linux branding'
Add partial riscv64 support for build root'
Provide basic VisionFive 2 support'

Build

[jmaple@devbox code]$ egrep -B 5 -A 5 "\[TIMER\]|^Starting Build" kbuild.resf_kernel-6.12.0-55.18.1.el10_0.log
/mnt/code/kernel-src-tree
no .config file found, moving on
[TIMER]{MRPROPER}: 0s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky10_0_rebuild-ddc9a4ae48f1"
Making olddefconfig
--
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYNC    include/config/auto.conf
  GEN     arch/x86/include/generated/asm/orc_hash.h
  UPD     include/generated/uapi/linux/version.h
  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/x86/include/generated/uapi/asm/errno.h
--
  LD [M]  net/qrtr/qrtr.ko
  LD [M]  net/qrtr/qrtr-mhi.ko
  BTF [M] net/qrtr/qrtr.ko
  BTF [M] net/hsr/hsr.ko
  BTF [M] net/qrtr/qrtr-mhi.ko
[TIMER]{BUILD}: 2271s
Making Modules
  SYMLINK /lib/modules/6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+/build
  INSTALL /lib/modules/6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+/modules.order
  INSTALL /lib/modules/6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+/modules.builtin
  INSTALL /lib/modules/6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+/modules.builtin.modinfo
--
  STRIP   /lib/modules/6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+/kernel/net/qrtr/qrtr-mhi.ko
  SIGN    /lib/modules/6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+/kernel/net/qrtr/qrtr-mhi.ko
  SIGN    /lib/modules/6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+/kernel/net/hsr/hsr.ko
  SIGN    /lib/modules/6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+/kernel/net/qrtr/qrtr.ko
  DEPMOD  /lib/modules/6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+
[TIMER]{MODULES}: 10s
Making Install
  INSTALL /boot
[TIMER]{INSTALL}: 20s
Checking kABI
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+ and Index to 0
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 0s
[TIMER]{BUILD}: 2271s
[TIMER]{MODULES}: 10s
[TIMER]{INSTALL}: 20s
[TIMER]{TOTAL} 2306s
Rebooting in 10 seconds

KSelfTest

[jmaple@devbox code]$ grep '^ok ' kselftest.6.12.0-rocky10_0_rebuild-ddc9a4ae48f1+.log  | wc -l
498

@PlaidCat PlaidCat self-assigned this Jun 30, 2025
@PlaidCat PlaidCat force-pushed the rocky10_0_rebuild branch 2 times, most recently from bc1efce to 1bf1201 Compare June 30, 2025 21:48
@thefossguy-ciq
Copy link

The configs/ directory is devoid of the configs and as a result, all workflows are failing.

PlaidCat added 22 commits July 1, 2025 17:01
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Ranjan Kumar <ranjan.kumar@broadcom.com>
commit 367ac16

The driver serializes ioctls through a mutex lock but access to the
ioctl data buffer is not guarded by the mutex. This results in multiple
user threads being able to write to the driver's ioctl buffer
simultaneously.

Protect the ioctl buffer with the ioctl mutex.

	Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
	Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20241110194405.10108-2-ranjan.kumar@broadcom.com
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 367ac16)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
cve CVE-2024-57804
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Ranjan Kumar <ranjan.kumar@broadcom.com>
commit 711201a

The driver, through the SAS transport, exposes a sysfs interface to
enable/disable PHYs in a controller/expander setup.  When multiple PHYs
are disabled and enabled in rapid succession, the persistent and current
config pages related to SAS IO unit/SAS Expander pages could get
corrupted.

Use separate memory for each config request.

	Signed-off-by: Prayas Patel <prayas.patel@broadcom.com>
	Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20241110194405.10108-3-ranjan.kumar@broadcom.com
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 711201a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Ranjan Kumar <ranjan.kumar@broadcom.com>
commit 0d32014

Instead of displaying the controller index starting from '1' make the
driver display the controller index starting from '0'.

	Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
	Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20241110194405.10108-4-ranjan.kumar@broadcom.com
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0d32014)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Ranjan Kumar <ranjan.kumar@broadcom.com>
commit fb6eb98

Before retrying initialization, check and abort if the fault code
indicates insufficient power. Also mark the controller as unrecoverable
instead of issuing reset in the watch dog timer if the fault code
indicates insufficient power.

	Signed-off-by: Prayas Patel <prayas.patel@broadcom.com>
	Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20241110194405.10108-5-ranjan.kumar@broadcom.com
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit fb6eb98)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Guixin Liu <kanie@linux.alibaba.com>
commit 295006f

If bsg_setup_queue() fails, the bsg_queue is assigned a non-NULL value.
Consequently, in mpi3mr_bsg_exit(), the condition "if(!mrioc->bsg_queue)"
will not be satisfied, preventing execution from entering
bsg_remove_queue(), which could lead to the following crash:

BUG: kernel NULL pointer dereference, address: 000000000000041c
Call Trace:
  <TASK>
  mpi3mr_bsg_exit+0x1f/0x50 [mpi3mr]
  mpi3mr_remove+0x6f/0x340 [mpi3mr]
  pci_device_remove+0x3f/0xb0
  device_release_driver_internal+0x19d/0x220
  unbind_store+0xa4/0xb0
  kernfs_fop_write_iter+0x11f/0x200
  vfs_write+0x1fc/0x3e0
  ksys_write+0x67/0xe0
  do_syscall_64+0x38/0x80
  entry_SYSCALL_64_after_hwframe+0x78/0xe2

Fixes: 4268fa7 ("scsi: mpi3mr: Add bsg device support")
	Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250107022032.24006-1-kanie@linux.alibaba.com
	Reviewed-by: Bart Van Assche <bvanassche@acm.org>
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 295006f)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Ranjan Kumar <ranjan.kumar@broadcom.com>
commit f08b24d

To avoid reply queue full condition, update the driver to check IOCFacts
capabilities for qfull.

Update the operational reply queue's Consumer Index after processing 100
replies. If pending I/Os on a reply queue exceeds a threshold
(reply_queue_depth - 200), then return I/O back to OS to retry.

Also increase default admin reply queue size to 2K.

	Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
	Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250129100850.25430-2-ranjan.kumar@broadcom.com
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f08b24d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Ranjan Kumar <ranjan.kumar@broadcom.com>
commit f195fc0

When the task management thread processes reply queues while the reset
thread resets them, the task management thread accesses an invalid queue ID
(0xFFFF), set by the reset thread, which points to unallocated memory,
causing a crash.

Add flag 'io_admin_reset_sync' to synchronize access between the reset,
I/O, and admin threads. Before a reset, the reset handler sets this flag to
block I/O and admin processing threads. If any thread bypasses the initial
check, the reset thread waits up to 10 seconds for processing to finish. If
the wait exceeds 10 seconds, the controller is marked as unrecoverable.

	Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
	Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250129100850.25430-4-ranjan.kumar@broadcom.com
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f195fc0)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Ranjan Kumar <ranjan.kumar@broadcom.com>
commit 35a0437
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-6.12.0-55.11.1.el10_0/35a0437d.failed

Update driver version to 8.12.1.0.50

	Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250129100850.25430-5-ranjan.kumar@broadcom.com
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 35a0437)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	drivers/scsi/mpi3mr/mpi3mr.h
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Sourabh Jain <sourabhjain@linux.ibm.com>
commit 0bdd7ff

Commit 683eab9 ("powerpc/fadump: setup additional parameters for
dump capture kernel") introduced the additional parameter feature in
fadump for HASH MMU with the understanding that GRUB does not use the
memory area between 640MB and 768MB for its operation.

However, the third patch ("powerpc: increase MIN RMA size for CAS
negotiation") in this series is changing the MIN RMA size to 768MB,
allowing GRUB to use memory up to 768MB. This makes the fadump
reservation for the additional parameter feature for HASH MMU
unreliable.

To address this, export the MIN_RMA so that the next patch
("powerpc/fadump: fix additional param memory reservation for HASH MMU")
can identify the correct memory range for the additional parameter
feature in fadump for HASH MMU.

	Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
	Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
	Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250123114254.200527-2-sourabhjain@linux.ibm.com

(cherry picked from commit 0bdd7ff)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Sourabh Jain <sourabhjain@linux.ibm.com>
commit b7bb460

Commit 683eab9 ("powerpc/fadump: setup additional parameters for
dump capture kernel") introduced the additional parameter feature in
fadump for HASH MMU with the understanding that GRUB does not use the
memory area between 640MB and 768MB for its operation.

However, the third patch in this series ("powerpc: increase MIN RMA
size for CAS negotiation") changes the MIN RMA size to 768MB, allowing
GRUB to use memory up to 768MB. This makes the fadump reservation for
the additional parameter feature for HASH MMU unreliable.

To address this, adjust the memory range for the additional parameter in
fadump for HASH MMU. This will ensure that GRUB does not overwrite the
memory reserved for fadump's additional parameter in HASH MMU.

The new policy for the memory range for the additional parameter in HASH
MMU is that the first memory block must be larger than the MIN_RMA size,
as the bootloader can use memory up to the MIN_RMA size. The range
should be between MIN_RMA and the RMA size (ppc64_rma_size), and it must
not overlap with the fadump reserved area.

	Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
	Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
	Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
	Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250123114254.200527-3-sourabhjain@linux.ibm.com

(cherry picked from commit b7bb460)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Avnish Chouhan <avnish@linux.ibm.com>
commit fdc4453

Change RMA size from 512 MB to 768 MB which will result
in more RMA at boot time for PowerPC. When PowerPC LPAR use/uses vTPM,
Secure Boot or FADump, the 512 MB RMA memory is not sufficient for
booting. With this 512 MB RMA, GRUB2 run out of memory and unable to
load the necessary. Sometimes even usage of CDROM which requires more
memory for installation along with the options mentioned above troubles
the boot memory and result in boot failures. Increasing the RMA size
will resolves multiple out of memory issues observed in PowerPC.

Failure details:

1. GRUB2

kern/ieee1275/init.c:550: mm requested region of size 8513000, flags 1
kern/ieee1275/init.c:563: Cannot satisfy allocation and retain minimum runtime
space
kern/ieee1275/init.c:550: mm requested region of size 8513000, flags 0
kern/ieee1275/init.c:563: Cannot satisfy allocation and retain minimum runtime
space
kern/file.c:215: Closing `/ppc/ppc64/initrd.img' ...
kern/disk.c:297: Closing
`ieee1275//vdevice/v-scsi
@30000067/disk8300000000000000'...
kern/disk.c:311: Closing
`ieee1275//vdevice/v-scsi
@30000067/disk8300000000000000' succeeded.
kern/file.c:225: Closing `/ppc/ppc64/initrd.img' failed with 3.
kern/file.c:148: Opening `/ppc/ppc64/initrd.img' succeeded.
error: ../../grub-core/kern/mm.c:552:out of memory.

2. Kernel

[    0.777633] List of all partitions:
[    0.777639] No filesystem could mount root, tried:
[    0.777640]
[    0.777649] Kernel panic - not syncing: VFS: Unable to mount root fs on "" or unknown-block(0,0)
[    0.777658] CPU: 17 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-0.rc4.20.el10.ppc64le #1
[    0.777669] Hardware name: IBM,9009-22A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW950.B0 (VL950_149) hv:phyp pSeries
[    0.777678] Call Trace:
[    0.777682] [c000000003db7b60] [c000000001119714] dump_stack_lvl+0x88/0xc4 (unreliable)
[    0.777700] [c000000003db7b90] [c00000000016c274] panic+0x174/0x460
[    0.777711] [c000000003db7c30] [c00000000200631c] mount_root_generic+0x320/0x354
[    0.777724] [c000000003db7d00] [c0000000020066f8] prepare_namespace+0x27c/0x2f4
[    0.777735] [c000000003db7d90] [c000000002005824] kernel_init_freeable+0x254/0x294
[    0.777747] [c000000003db7df0] [c00000000001131c] kernel_init+0x30/0x1c4
[    0.777757] [c000000003db7e50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c
[    0.777768] --- interrupt: 0 at 0x0
[    0.784238] pstore: backend (nvram) writing error (-1)
[    0.790447] Rebooting in 10 seconds..

	Signed-off-by: Avnish Chouhan <avnish@linux.ibm.com>
	Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250123114254.200527-4-sourabhjain@linux.ibm.com

(cherry picked from commit fdc4453)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Sourabh Jain <sourabhjain@linux.ibm.com>
commit 61c403b

Update the fadump document to include details about the fadump
additional parameter feature.

The document includes the following:
- Significance of the feature
- How to use it
- Feature restrictions

No functional changes are introduced.

	Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
	Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
	Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250123114254.200527-5-sourabhjain@linux.ibm.com

(cherry picked from commit 61c403b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Yang Shi <yang@os.amperecomputing.com>
commit 56a7087

Commit ba0fb44 ("dma-mapping: replace zone_dma_bits by
zone_dma_limit") and subsequent patches changed how zone_dma_limit is
calculated to allow a reduced ZONE_DMA even when RAM starts above 4GB.
Commit 122c234 ("arm64: mm: keep low RAM dma zone") further fixed
this to ensure ZONE_DMA remains below U32_MAX if RAM starts below 4GB,
especially on platforms that do not have IORT or DT description of the
device DMA ranges. While zone boundaries calculation was fixed by the
latter commit, zone_dma_limit, used to determine the GFP_DMA flag in the
core code, was not updated. This results in excessive use of GFP_DMA and
unnecessary ZONE_DMA allocations on some platforms.

Update zone_dma_limit to match the actual upper bound of ZONE_DMA.

Fixes: ba0fb44 ("dma-mapping: replace zone_dma_bits by zone_dma_limit")
	Cc: <stable@vger.kernel.org> # 6.12.x
	Reported-by: Yutang Jiang <jiangyutang@os.amperecomputing.com>
	Tested-by: Yutang Jiang <jiangyutang@os.amperecomputing.com>
	Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20241125171650.77424-1-yang@os.amperecomputing.com
[catalin.marinas@arm.com: some tweaking of the commit log]
	Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 56a7087)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Yishai Hadas <yishaih@nvidia.com>
commit abc7b3f

Memory regions (MR) of type DM (device memory) do not have an associated
umem.

In the __mlx5_ib_dereg_mr() -> mlx5_free_priv_descs() flow, the code
incorrectly takes the wrong branch, attempting to call
dma_unmap_single() on a DMA address that is not mapped.

This results in a WARN [1], as shown below.

The issue is resolved by properly accounting for the DM type and
ensuring the correct branch is selected in mlx5_free_priv_descs().

[1]
WARNING: CPU: 12 PID: 1346 at drivers/iommu/dma-iommu.c:1230 iommu_dma_unmap_page+0x79/0x90
Modules linked in: ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry ovelay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core
CPU: 12 UID: 0 PID: 1346 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1631
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:iommu_dma_unmap_page+0x79/0x90
Code: 2b 49 3b 29 72 26 49 3b 69 08 73 20 4d 89 f0 44 89 e9 4c 89 e2 48 89 ee 48 89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 07 b8 88 ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 66 0f 1f 44 00
RSP: 0018:ffffc90001913a10 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810194b0a8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88810194b0a8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f537abdd740(0000) GS:ffff88885fb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f537aeb8000 CR3: 000000010c248001 CR4: 0000000000372eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __warn+0x84/0x190
? iommu_dma_unmap_page+0x79/0x90
? report_bug+0xf8/0x1c0
? handle_bug+0x55/0x90
? exc_invalid_op+0x13/0x60
? asm_exc_invalid_op+0x16/0x20
? iommu_dma_unmap_page+0x79/0x90
dma_unmap_page_attrs+0xe6/0x290
mlx5_free_priv_descs+0xb0/0xe0 [mlx5_ib]
__mlx5_ib_dereg_mr+0x37e/0x520 [mlx5_ib]
? _raw_spin_unlock_irq+0x24/0x40
? wait_for_completion+0xfe/0x130
? rdma_restrack_put+0x63/0xe0 [ib_core]
ib_dereg_mr_user+0x5f/0x120 [ib_core]
? lock_release+0xc6/0x280
destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs]
uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs]
uobj_destroy+0x3f/0x70 [ib_uverbs]
ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs]
? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs]
? lock_acquire+0xc1/0x2f0
? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs]
? ib_uverbs_ioctl+0x116/0x170 [ib_uverbs]
? lock_release+0xc6/0x280
ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs]
? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs]
__x64_sys_ioctl+0x1b0/0xa70
do_syscall_64+0x6b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f537adaf17b
Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffff218f0b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffff218f1d8 RCX: 00007f537adaf17b
RDX: 00007ffff218f1c0 RSI: 00000000c0181b01 RDI: 0000000000000003
RBP: 00007ffff218f1a0 R08: 00007f537aa8d010 R09: 0000561ee2e4f270
R10: 00007f537aace3a8 R11: 0000000000000246 R12: 00007ffff218f190
R13: 000000000000001c R14: 0000561ee2e4d7c0 R15: 00007ffff218f450
</TASK>

Fixes: f18ec42 ("RDMA/mlx5: Use a union inside mlx5_ib_mr")
	Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/2039c22cfc3df02378747ba4d623a558b53fc263.1738587076.git.leon@kernel.org
	Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit abc7b3f)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
cve CVE-2024-56631
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Suraj Sonawane <surajsonawane0215@gmail.com>
commit f10593a
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-6.12.0-55.11.1.el10_0/f10593ad.failed

Fix a use-after-free bug in sg_release(), detected by syzbot with KASAN:

BUG: KASAN: slab-use-after-free in lock_release+0x151/0xa30
kernel/locking/lockdep.c:5838
__mutex_unlock_slowpath+0xe2/0x750 kernel/locking/mutex.c:912
sg_release+0x1f4/0x2e0 drivers/scsi/sg.c:407

In sg_release(), the function kref_put(&sfp->f_ref, sg_remove_sfp) is
called before releasing the open_rel_lock mutex. The kref_put() call may
decrement the reference count of sfp to zero, triggering its cleanup
through sg_remove_sfp(). This cleanup includes scheduling deferred work
via sg_remove_sfp_usercontext(), which ultimately frees sfp.

After kref_put(), sg_release() continues to unlock open_rel_lock and may
reference sfp or sdp. If sfp has already been freed, this results in a
slab-use-after-free error.

Move the kref_put(&sfp->f_ref, sg_remove_sfp) call after unlocking the
open_rel_lock mutex. This ensures:

 - No references to sfp or sdp occur after the reference count is
   decremented.

 - Cleanup functions such as sg_remove_sfp() and
   sg_remove_sfp_usercontext() can safely execute without impacting the
   mutex handling in sg_release().

The fix has been tested and validated by syzbot. This patch closes the
bug reported at the following syzkaller link and ensures proper
sequencing of resource cleanup and mutex operations, eliminating the
risk of use-after-free errors in sg_release().

	Reported-by: syzbot+7efb5850a17ba6ce098b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7efb5850a17ba6ce098b
	Tested-by: syzbot+7efb5850a17ba6ce098b@syzkaller.appspotmail.com
Fixes: cc833ac ("sg: O_EXCL and other lock handling")
	Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com>
Link: https://lore.kernel.org/r/20241120125944.88095-1-surajsonawane0215@gmail.com
	Reviewed-by: Bart Van Assche <bvanassche@acm.org>
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f10593a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	drivers/scsi/sg.c
jira LE-3460
cve CVE-2024-56623
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Quinn Tran <qutran@marvell.com>
commit 07c903d

System crash is observed with stack trace warning of use after
free. There are 2 signals to tell dpc_thread to terminate (UNLOADING
flag and kthread_stop).

On setting the UNLOADING flag when dpc_thread happens to run at the time
and sees the flag, this causes dpc_thread to exit and clean up
itself. When kthread_stop is called for final cleanup, this causes use
after free.

Remove UNLOADING signal to terminate dpc_thread.  Use the kthread_stop
as the main signal to exit dpc_thread.

[596663.812935] kernel BUG at mm/slub.c:294!
[596663.812950] invalid opcode: 0000 [#1] SMP PTI
[596663.812957] CPU: 13 PID: 1475935 Comm: rmmod Kdump: loaded Tainted: G          IOE    --------- -  - 4.18.0-240.el8.x86_64 #1
[596663.812960] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/20/2012
[596663.812974] RIP: 0010:__slab_free+0x17d/0x360

...
[596663.813008] Call Trace:
[596663.813022]  ? __dentry_kill+0x121/0x170
[596663.813030]  ? _cond_resched+0x15/0x30
[596663.813034]  ? _cond_resched+0x15/0x30
[596663.813039]  ? wait_for_completion+0x35/0x190
[596663.813048]  ? try_to_wake_up+0x63/0x540
[596663.813055]  free_task+0x5a/0x60
[596663.813061]  kthread_stop+0xf3/0x100
[596663.813103]  qla2x00_remove_one+0x284/0x440 [qla2xxx]

	Cc: stable@vger.kernel.org
	Signed-off-by: Quinn Tran <qutran@marvell.com>
	Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20241115130313.46826-3-njavali@marvell.com
	Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 07c903d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Chris Lu <chris.lu@mediatek.com>
commit ad0c6f6

Move MediaTek Bluetooth power off command before releasing
usb ISO interface.

	Signed-off-by: Chris Lu <chris.lu@mediatek.com>
	Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit ad0c6f6)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Chris Lu <chris.lu@mediatek.com>
commit cea1805

Add disconnect callback function in btusb_disconnect which is reserved
for vendor specific usage before deregister hci in btusb_disconnect.

	Signed-off-by: Chris Lu <chris.lu@mediatek.com>
	Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit cea1805)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
cve CVE-2024-56757
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Chris Lu <chris.lu@mediatek.com>
commit 489304e

MediaTek claim an special usb intr interface for ISO data transmission.
The interface need to be released before unregistering hci device when
usb disconnect. Removing BT usb dongle without properly releasing the
interface may cause Kernel panic while unregister hci device.

	Signed-off-by: Chris Lu <chris.lu@mediatek.com>
	Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit 489304e)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Chris Lu <chris.lu@mediatek.com>
commit defc33b

Change conditions for Bluetooth driver claiming and releasing usb
ISO interface for MediaTek ISO data transmission.

	Signed-off-by: Chris Lu <chris.lu@mediatek.com>
	Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit defc33b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
cve CVE-2024-53238
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Chris Lu <chris.lu@mediatek.com>
commit 61c5a3d

MediaTek iso data anchor init should be moved to where MediaTek
claims iso data interface.
If there is an unexpected BT usb disconnect during setup flow,
it will cause a NULL pointer crash issue when releasing iso
anchor since the anchor wasn't been init yet. Adjust the position
to do iso data anchor init.

[   17.137991] pc : usb_kill_anchored_urbs+0x60/0x168
[   17.137998] lr : usb_kill_anchored_urbs+0x44/0x168
[   17.137999] sp : ffffffc0890cb5f0
[   17.138000] x29: ffffffc0890cb5f0 x28: ffffff80bb6c2e80
[   17.144081] gpio gpiochip0: registered chardev handle for 1 lines
[   17.148421]  x27: 0000000000000000
[   17.148422] x26: ffffffd301ff4298 x25: 0000000000000003 x24: 00000000000000f0
[   17.148424] x23: 0000000000000000 x22: 00000000ffffffff x21: 0000000000000001
[   17.148425] x20: ffffffffffffffd8 x19: ffffff80c0f25560 x18: 0000000000000000
[   17.148427] x17: ffffffd33864e408 x16: ffffffd33808f7c8 x15: 0000000000200000
[   17.232789] x14: e0cd73cf80ffffff x13: 50f2137c0a0338c9 x12: 0000000000000001
[   17.239912] x11: 0000000080150011 x10: 0000000000000002 x9 : 0000000000000001
[   17.247035] x8 : 0000000000000000 x7 : 0000000000008080 x6 : 8080000000000000
[   17.254158] x5 : ffffffd33808ebc0 x4 : fffffffe033dcf20 x3 : 0000000080150011
[   17.261281] x2 : ffffff8087a91400 x1 : 0000000000000000 x0 : ffffff80c0f25588
[   17.268404] Call trace:
[   17.270841]  usb_kill_anchored_urbs+0x60/0x168
[   17.275274]  btusb_mtk_release_iso_intf+0x2c/0xd8 [btusb (HASH:5afe 6)]
[   17.284226]  btusb_mtk_disconnect+0x14/0x28 [btusb (HASH:5afe 6)]
[   17.292652]  btusb_disconnect+0x70/0x140 [btusb (HASH:5afe 6)]
[   17.300818]  usb_unbind_interface+0xc4/0x240
[   17.305079]  device_release_driver_internal+0x18c/0x258
[   17.310296]  device_release_driver+0x1c/0x30
[   17.314557]  bus_remove_device+0x140/0x160
[   17.318643]  device_del+0x1c0/0x330
[   17.322121]  usb_disable_device+0x80/0x180
[   17.326207]  usb_disconnect+0xec/0x300
[   17.329948]  hub_quiesce+0x80/0xd0
[   17.333339]  hub_disconnect+0x44/0x190
[   17.337078]  usb_unbind_interface+0xc4/0x240
[   17.341337]  device_release_driver_internal+0x18c/0x258
[   17.346551]  device_release_driver+0x1c/0x30
[   17.350810]  usb_driver_release_interface+0x70/0x88
[   17.355677]  proc_ioctl+0x13c/0x228
[   17.359157]  proc_ioctl_default+0x50/0x80
[   17.363155]  usbdev_ioctl+0x830/0xd08
[   17.366808]  __arm64_sys_ioctl+0x94/0xd0
[   17.370723]  invoke_syscall+0x6c/0xf8
[   17.374377]  el0_svc_common+0x84/0xe0
[   17.378030]  do_el0_svc+0x20/0x30
[   17.381334]  el0_svc+0x34/0x60
[   17.384382]  el0t_64_sync_handler+0x88/0xf0
[   17.388554]  el0t_64_sync+0x180/0x188
[   17.392208] Code: f9400677 f100a2f4 54fffea0 d503201f (b8350288)
[   17.398289] ---[ end trace 0000000000000000 ]---

Fixes: ceac1cb ("Bluetooth: btusb: mediatek: add ISO data transmission functions")
	Signed-off-by: Chris Lu <chris.lu@mediatek.com>
	Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit 61c5a3d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
cve CVE-2024-56653
Rebuild_History Non-Buildable kernel-6.12.0-55.11.1.el10_0
commit-author Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
commit b548f5e

hci_devcd_append may lead to the release of the skb, so it cannot be
accessed once it is called.

==================================================================
BUG: KASAN: slab-use-after-free in btmtk_process_coredump+0x2a7/0x2d0 [btmtk]
Read of size 4 at addr ffff888033cfabb0 by task kworker/0:3/82

CPU: 0 PID: 82 Comm: kworker/0:3 Tainted: G     U             6.6.40-lockdep-03464-g1d8b4eb3060e #1 b0b3c1cc0c842735643fb411799d97921d1f688c
Hardware name: Google Yaviks_Ufs/Yaviks_Ufs, BIOS Google_Yaviks_Ufs.15217.552.0 05/07/2024
Workqueue: events btusb_rx_work [btusb]
Call Trace:
 <TASK>
 dump_stack_lvl+0xfd/0x150
 print_report+0x131/0x780
 kasan_report+0x177/0x1c0
 btmtk_process_coredump+0x2a7/0x2d0 [btmtk 03edd567dd71a65958807c95a65db31d433e1d01]
 btusb_recv_acl_mtk+0x11c/0x1a0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec]
 btusb_rx_work+0x9e/0xe0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec]
 worker_thread+0xe44/0x2cc0
 kthread+0x2ff/0x3a0
 ret_from_fork+0x51/0x80
 ret_from_fork_asm+0x1b/0x30
 </TASK>

Allocated by task 82:
 stack_trace_save+0xdc/0x190
 kasan_set_track+0x4e/0x80
 __kasan_slab_alloc+0x4e/0x60
 kmem_cache_alloc+0x19f/0x360
 skb_clone+0x132/0xf70
 btusb_recv_acl_mtk+0x104/0x1a0 [btusb]
 btusb_rx_work+0x9e/0xe0 [btusb]
 worker_thread+0xe44/0x2cc0
 kthread+0x2ff/0x3a0
 ret_from_fork+0x51/0x80
 ret_from_fork_asm+0x1b/0x30

Freed by task 1733:
 stack_trace_save+0xdc/0x190
 kasan_set_track+0x4e/0x80
 kasan_save_free_info+0x28/0xb0
 ____kasan_slab_free+0xfd/0x170
 kmem_cache_free+0x183/0x3f0
 hci_devcd_rx+0x91a/0x2060 [bluetooth]
 worker_thread+0xe44/0x2cc0
 kthread+0x2ff/0x3a0
 ret_from_fork+0x51/0x80
 ret_from_fork_asm+0x1b/0x30

The buggy address belongs to the object at ffff888033cfab40
 which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 112 bytes inside of
 freed 232-byte region [ffff888033cfab40, ffff888033cfac28)

The buggy address belongs to the physical page:
page:00000000a174ba93 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x33cfa
head:00000000a174ba93 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0x4000000000000840(slab|head|zone=1)
page_type: 0xffffffff()
raw: 4000000000000840 ffff888100848a00 0000000000000000 0000000000000001
raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888033cfaa80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
 ffff888033cfab00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
>ffff888033cfab80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                     ^
 ffff888033cfac00: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
 ffff888033cfac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Check if we need to call hci_devcd_complete before calling
hci_devcd_append. That requires that we check data->cd_info.cnt >=
MTK_COREDUMP_NUM instead of data->cd_info.cnt > MTK_COREDUMP_NUM, as we
increment data->cd_info.cnt only once the call to hci_devcd_append
succeeds.

Fixes: 0b70151 ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support")
	Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
	Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit b548f5e)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
PlaidCat added 15 commits July 1, 2025 17:12
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.17.1.el10_0
commit-author Jacob Keller <jacob.e.keller@intel.com>
commit a5c69d4

Newer versions of firmware support programming the PHY timer via the low
latency interface exposed over REG_LL_PROXY_L and REG_LL_PROXY_H. Add
support for checking the device capabilities for this feature.

Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
	Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
	Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
	Reviewed-by: Milena Olech <milena.olech@intel.com>
	Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
	Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit a5c69d4)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.17.1.el10_0
commit-author Jacob Keller <jacob.e.keller@intel.com>
commit ef9a64c

Programming the PHY registers in preparation for an increment value change
or a timer adjustment on E810 requires issuing Admin Queue commands for
each PHY register. It has been found that the firmware Admin Queue
processing occasionally has delays of tens or rarely up to hundreds of
milliseconds. This delay cascades to failures in the PTP applications which
depend on these updates being low latency.

Consider a standard PTP profile with a sync rate of 16 times per second.
This means there is ~62 milliseconds between sync messages. A complete
cycle of the PTP algorithm

1) Sync message (with Tx timestamp) from source
2) Follow-up message from source
3) Delay request (with Tx timestamp) from sink
4) Delay response (with Rx timestamp of request) from source
5) measure instantaneous clock offset
6) request time adjustment via CLOCK_ADJTIME systemcall

The Tx timestamps have a default maximum timeout of 10 milliseconds. If we
assume that the maximum possible time is used, this leaves us with ~42
milliseconds of processing time for a complete cycle.

The CLOCK_ADJTIME system call is synchronous and will block until the
driver completes its timer adjustment or frequency change.

If the writes to prepare the PHY timers get hit by a latency spike of 50
milliseconds, then the PTP application will be delayed past the point where
the next cycle should start. Packets from the next cycle may have already
arrived and are waiting on the socket.

In particular, LinuxPTP ptp4l may start complaining about missing an
announce message from the source, triggering a fault. In addition, the
clockcheck logic it uses may trigger. This clockcheck failure occurs
because the timestamp captured by hardware is compared against a reading of
CLOCK_MONOTONIC. It is assumed that the time when the Rx timestamp is
captured and the read from CLOCK_MONOTONIC are relatively close together.
This is not the case if there is a significant delay to processing the Rx
packet.

Newer firmware supports programming the PHY registers over a low latency
interface which bypasses the Admin Queue. Instead, software writes to the
REG_LL_PROXY_L and REG_LL_PROXY_H registers. Firmware reads these registers
and then programs the PHY timers.

Implement functions to use this interface when available to program the PHY
timers instead of using the Admin Queue. This avoids the Admin Queue
latency and ensures that adjustments happen within acceptable latency
bounds.

Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
	Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
	Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
	Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
	Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit ef9a64c)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.17.1.el10_0
commit-author Ming Lei <ming.lei@redhat.com>
commit b654f7a

Device mapper bioset often has big bio_slab size, which can be more than
1000, then 8byte can't hold the slab name any more, cause the kmem_cache
allocation warning of 'kmem_cache of name 'bio-108' already exists'.

Fix the warning by extending bio_slab->name to 12 bytes, but fix output
of /proc/slabinfo

	Reported-by: Guangwu Zhang <guazhang@redhat.com>
	Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250228132656.2838008-1-ming.lei@redhat.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit b654f7a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.17.1.el10_0
commit-author Sankararaman Jayaraman <sankararaman.jayaraman@broadcom.com>
commit 0dd765f

vmxnet3 does not unregister xdp rxq info in the
vmxnet3_reset_work() code path as vmxnet3_rq_destroy()
is not invoked in this code path. So, we get below message with a
backtrace.

Missing unregister, handled but fix driver
WARNING: CPU:48 PID: 500 at net/core/xdp.c:182
__xdp_rxq_info_reg+0x93/0xf0

This patch fixes the problem by moving the unregister
code of XDP from vmxnet3_rq_destroy() to vmxnet3_rq_cleanup().

Fixes: 54f00cc ("vmxnet3: Add XDP support.")
	Signed-off-by: Sankararaman Jayaraman <sankararaman.jayaraman@broadcom.com>
	Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com>
Link: https://patch.msgid.link/20250320045522.57892-1-sankararaman.jayaraman@broadcom.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 0dd765f)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 24
Number of commits matched with upstream: 21 (87.50%)
Number of commits in upstream but not in rpm: 51991
Number of commits NOT found in upstream: 3 (12.50%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.17.1.el10_0 for kernel-6.12.0-55.17.1.el10_0
Clean Cherry Picks: 21 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-6.12.0-55.17.1.el10_0/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.18.1.el10_0
commit-author ChunHao Lin <hau@realtek.com>
commit 3d9b8ac

This patch will enable RTL8168H/RTL8168EP/RTL8168FP ASPM support on
the platforms that have tested with ASPM enabled.

	Signed-off-by: ChunHao Lin <hau@realtek.com>
	Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-2-hau@realtek.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 3d9b8ac)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.18.1.el10_0
commit-author ChunHao Lin <hau@realtek.com>
commit b48688e

Disable it due to it dose not meet ZRX-DC specification. If it is enabled,
device will exit L1 substate every 100ms. Disable it for saving more power
in L1 substate.

	Signed-off-by: ChunHao Lin <hau@realtek.com>
	Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-3-hau@realtek.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit b48688e)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
cve CVE-2025-23150
Rebuild_History Non-Buildable kernel-6.12.0-55.18.1.el10_0
commit-author Artem Sadovnikov <a.sadovnikov@ispras.ru>
commit 94824ac

Syzkaller detected a use-after-free issue in ext4_insert_dentry that was
caused by out-of-bounds access due to incorrect splitting in do_split.

BUG: KASAN: use-after-free in ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109
Write of size 251 at addr ffff888074572f14 by task syz-executor335/5847

CPU: 0 UID: 0 PID: 5847 Comm: syz-executor335 Not tainted 6.12.0-rc6-syzkaller-00318-ga9cda7c0ffed #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
 __asan_memcpy+0x40/0x70 mm/kasan/shadow.c:106
 ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109
 add_dirent_to_buf+0x3d9/0x750 fs/ext4/namei.c:2154
 make_indexed_dir+0xf98/0x1600 fs/ext4/namei.c:2351
 ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2455
 ext4_add_nondir+0x8d/0x290 fs/ext4/namei.c:2796
 ext4_symlink+0x920/0xb50 fs/ext4/namei.c:3431
 vfs_symlink+0x137/0x2e0 fs/namei.c:4615
 do_symlinkat+0x222/0x3a0 fs/namei.c:4641
 __do_sys_symlink fs/namei.c:4662 [inline]
 __se_sys_symlink fs/namei.c:4660 [inline]
 __x64_sys_symlink+0x7a/0x90 fs/namei.c:4660
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
 </TASK>

The following loop is located right above 'if' statement.

for (i = count-1; i >= 0; i--) {
	/* is more than half of this entry in 2nd half of the block? */
	if (size + map[i].size/2 > blocksize/2)
		break;
	size += map[i].size;
	move++;
}

'i' in this case could go down to -1, in which case sum of active entries
wouldn't exceed half the block size, but previous behaviour would also do
split in half if sum would exceed at the very last block, which in case of
having too many long name files in a single block could lead to
out-of-bounds access and following use-after-free.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

	Cc: stable@vger.kernel.org
Fixes: 5872331 ("ext4: fix potential negative array index in do_split()")
	Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru>
	Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250404082804.2567-3-a.sadovnikov@ispras.ru
	Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 94824ac)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.18.1.el10_0
commit-author Maurizio Lombardi <mlombard@redhat.com>
commit 88c23a3

The NVMe specification states that MAXCMD is mandatory
for NVMe-over-Fabrics implementations. However, some NVMe/TCP
and NVMe/FC arrays from major vendors have buggy firmware
that reports MAXCMD as zero in the Identify Controller data structure.

Currently, the implementation closes the connection in such cases,
completely preventing the host from connecting to the target.

Fix the issue by printing a clear error message about the firmware bug
and allowing the connection to proceed. It assumes that the
target supports a MAXCMD value of SQSIZE + 1. If any issues arise,
the user can manually adjust SQSIZE to mitigate them.

Fixes: 4999568 ("nvme-fabrics: check max outstanding commands")
	Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
	Reviewed-by: Laurence Oberman <loberman@redhat.com>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
	Signed-off-by: Keith Busch <kbusch@kernel.org>
(cherry picked from commit 88c23a3)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
cve CVE-2025-37738
Rebuild_History Non-Buildable kernel-6.12.0-55.18.1.el10_0
commit-author Bhupesh <bhupesh@igalia.com>
commit c8e008b

Once inside 'ext4_xattr_inode_dec_ref_all' we should
ignore xattrs entries past the 'end' entry.

This fixes the following KASAN reported issue:

==================================================================
BUG: KASAN: slab-use-after-free in ext4_xattr_inode_dec_ref_all+0xb8c/0xe90
Read of size 4 at addr ffff888012c120c4 by task repro/2065

CPU: 1 UID: 0 PID: 2065 Comm: repro Not tainted 6.13.0-rc2+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x1fd/0x300
 ? tcp_gro_dev_warn+0x260/0x260
 ? _printk+0xc0/0x100
 ? read_lock_is_recursive+0x10/0x10
 ? irq_work_queue+0x72/0xf0
 ? __virt_addr_valid+0x17b/0x4b0
 print_address_description+0x78/0x390
 print_report+0x107/0x1f0
 ? __virt_addr_valid+0x17b/0x4b0
 ? __virt_addr_valid+0x3ff/0x4b0
 ? __phys_addr+0xb5/0x160
 ? ext4_xattr_inode_dec_ref_all+0xb8c/0xe90
 kasan_report+0xcc/0x100
 ? ext4_xattr_inode_dec_ref_all+0xb8c/0xe90
 ext4_xattr_inode_dec_ref_all+0xb8c/0xe90
 ? ext4_xattr_delete_inode+0xd30/0xd30
 ? __ext4_journal_ensure_credits+0x5f0/0x5f0
 ? __ext4_journal_ensure_credits+0x2b/0x5f0
 ? inode_update_timestamps+0x410/0x410
 ext4_xattr_delete_inode+0xb64/0xd30
 ? ext4_truncate+0xb70/0xdc0
 ? ext4_expand_extra_isize_ea+0x1d20/0x1d20
 ? __ext4_mark_inode_dirty+0x670/0x670
 ? ext4_journal_check_start+0x16f/0x240
 ? ext4_inode_is_fast_symlink+0x2f2/0x3a0
 ext4_evict_inode+0xc8c/0xff0
 ? ext4_inode_is_fast_symlink+0x3a0/0x3a0
 ? do_raw_spin_unlock+0x53/0x8a0
 ? ext4_inode_is_fast_symlink+0x3a0/0x3a0
 evict+0x4ac/0x950
 ? proc_nr_inodes+0x310/0x310
 ? trace_ext4_drop_inode+0xa2/0x220
 ? _raw_spin_unlock+0x1a/0x30
 ? iput+0x4cb/0x7e0
 do_unlinkat+0x495/0x7c0
 ? try_break_deleg+0x120/0x120
 ? 0xffffffff81000000
 ? __check_object_size+0x15a/0x210
 ? strncpy_from_user+0x13e/0x250
 ? getname_flags+0x1dc/0x530
 __x64_sys_unlinkat+0xc8/0xf0
 do_syscall_64+0x65/0x110
 entry_SYSCALL_64_after_hwframe+0x67/0x6f
RIP: 0033:0x434ffd
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
RSP: 002b:00007ffc50fa7b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
RAX: ffffffffffffffda RBX: 00007ffc50fa7e18 RCX: 0000000000434ffd
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 00007ffc50fa7be0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffc50fa7e08 R14: 00000000004bbf30 R15: 0000000000000001
 </TASK>

The buggy address belongs to the object at ffff888012c12000
 which belongs to the cache filp of size 360
The buggy address is located 196 bytes inside of
 freed 360-byte region [ffff888012c12000, ffff888012c12168)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12c12
head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x40(head|node=0|zone=0)
page_type: f5(slab)
raw: 0000000000000040 ffff888000ad7640 ffffea0000497a00 dead000000000004
raw: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000
head: 0000000000000040 ffff888000ad7640 ffffea0000497a00 dead000000000004
head: 0000000000000000 0000000000100010 00000001f5000000 0000000000000000
head: 0000000000000001 ffffea00004b0481 ffffffffffffffff 0000000000000000
head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888012c11f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888012c12000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff888012c12080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff888012c12100: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
 ffff888012c12180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

	Reported-by: syzbot+b244bda78289b00204ed@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b244bda78289b00204ed
	Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
	Signed-off-by: Bhupesh <bhupesh@igalia.com>
Link: https://patch.msgid.link/20250128082751.124948-2-bhupesh@igalia.com
	Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit c8e008b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
cve CVE-2025-21999
Rebuild_History Non-Buildable kernel-6.12.0-55.18.1.el10_0
commit-author Ye Bin <yebin10@huawei.com>
commit 654b33a

Fix race between rmmod and /proc/XXX's inode instantiation.

The bug is that pde->proc_ops don't belong to /proc, it belongs to a
module, therefore dereferencing it after /proc entry has been registered
is a bug unless use_pde/unuse_pde() pair has been used.

use_pde/unuse_pde can be avoided (2 atomic ops!) because pde->proc_ops
never changes so information necessary for inode instantiation can be
saved _before_ proc_register() in PDE itself and used later, avoiding
pde->proc_ops->...  dereference.

      rmmod                         lookup
sys_delete_module
                         proc_lookup_de
			   pde_get(de);
			   proc_get_inode(dir->i_sb, de);
  mod->exit()
    proc_remove
      remove_proc_subtree
       proc_entry_rundown(de);
  free_module(mod);

                               if (S_ISREG(inode->i_mode))
	                         if (de->proc_ops->proc_read_iter)
                           --> As module is already freed, will trigger UAF

BUG: unable to handle page fault for address: fffffbfff80a702b
PGD 817fc4067 P4D 817fc4067 PUD 817fc0067 PMD 102ef4067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 26 UID: 0 PID: 2667 Comm: ls Tainted: G
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:proc_get_inode+0x302/0x6e0
RSP: 0018:ffff88811c837998 EFLAGS: 00010a06
RAX: dffffc0000000000 RBX: ffffffffc0538140 RCX: 0000000000000007
RDX: 1ffffffff80a702b RSI: 0000000000000001 RDI: ffffffffc0538158
RBP: ffff8881299a6000 R08: 0000000067bbe1e5 R09: 1ffff11023906f20
R10: ffffffffb560ca07 R11: ffffffffb2b43a58 R12: ffff888105bb78f0
R13: ffff888100518048 R14: ffff8881299a6004 R15: 0000000000000001
FS:  00007f95b9686840(0000) GS:ffff8883af100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff80a702b CR3: 0000000117dd2000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 proc_lookup_de+0x11f/0x2e0
 __lookup_slow+0x188/0x350
 walk_component+0x2ab/0x4f0
 path_lookupat+0x120/0x660
 filename_lookup+0x1ce/0x560
 vfs_statx+0xac/0x150
 __do_sys_newstat+0x96/0x110
 do_syscall_64+0x5f/0x170
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

[adobriyan@gmail.com: don't do 2 atomic ops on the common path]
Link: https://lkml.kernel.org/r/3d25ded0-1739-447e-812b-e34da7990dcf@p183
Fixes: 778f3dd ("Fix procfs compat_ioctl regression")
	Signed-off-by: Ye Bin <yebin10@huawei.com>
	Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
	Cc: Al Viro <viro@zeniv.linux.org.uk>
	Cc: David S. Miller <davem@davemloft.net>
	Cc: <stable@vger.kernel.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 654b33a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.18.1.el10_0
commit-author Ming Lei <ming.lei@redhat.com>
commit fc0e982
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-6.12.0-55.18.1.el10_0/fc0e982b.failed

Make sure ->nr_integrity_segments is cloned in blk_rq_prep_clone(),
otherwise requests cloned by device-mapper multipath will not have the
proper nr_integrity_segments values set, then BUG() is hit from
sg_alloc_table_chained().

Fixes: b0fd271 ("block: add request clone interface (v2)")
	Cc: stable@vger.kernel.org
	Cc: Christoph Hellwig <hch@infradead.org>
	Signed-off-by: Ming Lei <ming.lei@redhat.com>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250310115453.2271109-1-ming.lei@redhat.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit fc0e982)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	block/blk-mq.c
…address into one operation

jira LE-3460
cve CVE-2024-56559
Rebuild_History Non-Buildable kernel-6.12.0-55.18.1.el10_0
commit-author Adrian Huang <ahuang12@lenovo.com>
commit 9e9e085

When compiling kernel source 'make -j $(nproc)' with the up-and-running
KASAN-enabled kernel on a 256-core machine, the following soft lockup is
shown:

watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760]
CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95
Workqueue: events drain_vmap_area_work
RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0
Code: 38 c8 7c 08 84 c9 0f 85 49 08 00 00 8b 45 08 a8 01 74 2e 48 89 f1 49 89 f7 48 c1 e9 03 41 83 e7 07 4c 01 e9 41 83 c7 03 f3 90 <0f> b6 01 41 38 c7 7c 08 84 c0 0f 85 d4 06 00 00 8b 45 08 a8 01 75
RSP: 0018:ffffc9000cb3fb60 EFLAGS: 00000202
RAX: 0000000000000011 RBX: ffff8883bc4469c0 RCX: ffffed10776e9949
RDX: 0000000000000002 RSI: ffff8883bb74ca48 RDI: ffffffff8434dc50
RBP: ffff8883bb74ca40 R08: ffff888103585dc0 R09: ffff8884533a1800
R10: 0000000000000004 R11: ffffffffffffffff R12: ffffed1077888d39
R13: dffffc0000000000 R14: ffffed1077888d38 R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff8883bc400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005577b5c8d158 CR3: 0000000004850000 CR4: 0000000000350ef0
Call Trace:
 <IRQ>
 ? watchdog_timer_fn+0x2cd/0x390
 ? __pfx_watchdog_timer_fn+0x10/0x10
 ? __hrtimer_run_queues+0x300/0x6d0
 ? sched_clock_cpu+0x69/0x4e0
 ? __pfx___hrtimer_run_queues+0x10/0x10
 ? srso_return_thunk+0x5/0x5f
 ? ktime_get_update_offsets_now+0x7f/0x2a0
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? hrtimer_interrupt+0x2ca/0x760
 ? __sysvec_apic_timer_interrupt+0x8c/0x2b0
 ? sysvec_apic_timer_interrupt+0x6a/0x90
 </IRQ>
 <TASK>
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? smp_call_function_many_cond+0x1d8/0xbb0
 ? __pfx_do_kernel_range_flush+0x10/0x10
 on_each_cpu_cond_mask+0x20/0x40
 flush_tlb_kernel_range+0x19b/0x250
 ? srso_return_thunk+0x5/0x5f
 ? kasan_release_vmalloc+0xa7/0xc0
 purge_vmap_node+0x357/0x820
 ? __pfx_purge_vmap_node+0x10/0x10
 __purge_vmap_area_lazy+0x5b8/0xa10
 drain_vmap_area_work+0x21/0x30
 process_one_work+0x661/0x10b0
 worker_thread+0x844/0x10e0
 ? srso_return_thunk+0x5/0x5f
 ? __kthread_parkme+0x82/0x140
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a5/0x370
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x30/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Debugging Analysis:

  1. The following ftrace log shows that the lockup CPU spends too much
     time iterating vmap_nodes and flushing TLB when purging vm_area
     structures. (Some info is trimmed).

     kworker: funcgraph_entry:              |  drain_vmap_area_work() {
     kworker: funcgraph_entry:              |   mutex_lock() {
     kworker: funcgraph_entry:  1.092 us    |     __cond_resched();
     kworker: funcgraph_exit:   3.306 us    |   }
     ...                                        ...
     kworker: funcgraph_entry:              |    flush_tlb_kernel_range() {
     ...                                          ...
     kworker: funcgraph_exit: # 7533.649 us |    }
     ...                                         ...
     kworker: funcgraph_entry:  2.344 us    |   mutex_unlock();
     kworker: funcgraph_exit: $ 23871554 us | }

     The drain_vmap_area_work() spends over 23 seconds.

     There are 2805 flush_tlb_kernel_range() calls in the ftrace log.
       * One is called in __purge_vmap_area_lazy().
       * Others are called by purge_vmap_node->kasan_release_vmalloc.
         purge_vmap_node() iteratively releases kasan vmalloc
         allocations and flushes TLB for each vmap_area.
           - [Rough calculation] Each flush_tlb_kernel_range() runs
             about 7.5ms.
               -- 2804 * 7.5ms = 21.03 seconds.
               -- That's why a soft lock is triggered.

  2. Extending the soft lockup time can work around the issue (For example,
     # echo 60 > /proc/sys/kernel/watchdog_thresh). This confirms the
     above-mentioned speculation: drain_vmap_area_work() spends too much
     time.

If we combine all TLB flush operations of the KASAN shadow virtual
address into one operation in the call path
'purge_vmap_node()->kasan_release_vmalloc()', the running time of
drain_vmap_area_work() can be saved greatly. The idea is from the
flush_tlb_kernel_range() call in __purge_vmap_area_lazy(). And, the
soft lockup won't be triggered.

Here is the test result based on 6.10:

[6.10 wo/ the patch]
  1. ftrace latency profiling (record a trace if the latency > 20s).
     echo 20000000 > /sys/kernel/debug/tracing/tracing_thresh
     echo drain_vmap_area_work > /sys/kernel/debug/tracing/set_graph_function
     echo function_graph > /sys/kernel/debug/tracing/current_tracer
     echo 1 > /sys/kernel/debug/tracing/tracing_on

  2. Run `make -j $(nproc)` to compile the kernel source

  3. Once the soft lockup is reproduced, check the ftrace log:
     cat /sys/kernel/debug/tracing/trace
        # tracer: function_graph
        #
        # CPU  DURATION                  FUNCTION CALLS
        # |     |   |                     |   |   |   |
          76) $ 50412985 us |    } /* __purge_vmap_area_lazy */
          76) $ 50412997 us |  } /* drain_vmap_area_work */
          76) $ 29165911 us |    } /* __purge_vmap_area_lazy */
          76) $ 29165926 us |  } /* drain_vmap_area_work */
          91) $ 53629423 us |    } /* __purge_vmap_area_lazy */
          91) $ 53629434 us |  } /* drain_vmap_area_work */
          91) $ 28121014 us |    } /* __purge_vmap_area_lazy */
          91) $ 28121026 us |  } /* drain_vmap_area_work */

[6.10 w/ the patch]
  1. Repeat step 1-2 in "[6.10 wo/ the patch]"

  2. The soft lockup is not triggered and ftrace log is empty.
     cat /sys/kernel/debug/tracing/trace
     # tracer: function_graph
     #
     # CPU  DURATION                  FUNCTION CALLS
     # |     |   |                     |   |   |   |

  3. Setting 'tracing_thresh' to 10/5 seconds does not get any ftrace
     log.

  4. Setting 'tracing_thresh' to 1 second gets ftrace log.
     cat /sys/kernel/debug/tracing/trace
     # tracer: function_graph
     #
     # CPU  DURATION                  FUNCTION CALLS
     # |     |   |                     |   |   |   |
       23) $ 1074942 us  |    } /* __purge_vmap_area_lazy */
       23) $ 1074950 us  |  } /* drain_vmap_area_work */

  The worst execution time of drain_vmap_area_work() is about 1 second.

Link: https://lore.kernel.org/lkml/ZqFlawuVnOMY2k3E@pc638.lan/
Link: https://lkml.kernel.org/r/20240726165246.31326-1-ahuang12@lenovo.com
Fixes: 282631c ("mm: vmalloc: remove global purge_vmap_area_root rb-tree")
	Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Co-developed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
	Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
	Tested-by: Jiwei Sun <sunjw10@lenovo.com>
	Reviewed-by: Baoquan He <bhe@redhat.com>
	Cc: Alexander Potapenko <glider@google.com>
	Cc: Andrey Konovalov <andreyknvl@gmail.com>
	Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
	Cc: Christoph Hellwig <hch@infradead.org>
	Cc: Dmitry Vyukov <dvyukov@google.com>
	Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
	Cc: <stable@vger.kernel.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 9e9e085)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3460
Rebuild_History Non-Buildable kernel-6.12.0-55.18.1.el10_0
commit-author Harshit Agarwal <harshit@nutanix.com>
commit 690e47d
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-6.12.0-55.18.1.el10_0/690e47d1.failed

Overview
========
When a CPU chooses to call push_rt_task and picks a task to push to
another CPU's runqueue then it will call find_lock_lowest_rq method
which would take a double lock on both CPUs' runqueues. If one of the
locks aren't readily available, it may lead to dropping the current
runqueue lock and reacquiring both the locks at once. During this window
it is possible that the task is already migrated and is running on some
other CPU. These cases are already handled. However, if the task is
migrated and has already been executed and another CPU is now trying to
wake it up (ttwu) such that it is queued again on the runqeue
(on_rq is 1) and also if the task was run by the same CPU, then the
current checks will pass even though the task was migrated out and is no
longer in the pushable tasks list.

Crashes
=======
This bug resulted in quite a few flavors of crashes triggering kernel
panics with various crash signatures such as assert failures, page
faults, null pointer dereferences, and queue corruption errors all
coming from scheduler itself.

Some of the crashes:
-> kernel BUG at kernel/sched/rt.c:1616! BUG_ON(idx >= MAX_RT_PRIO)
   Call Trace:
   ? __die_body+0x1a/0x60
   ? die+0x2a/0x50
   ? do_trap+0x85/0x100
   ? pick_next_task_rt+0x6e/0x1d0
   ? do_error_trap+0x64/0xa0
   ? pick_next_task_rt+0x6e/0x1d0
   ? exc_invalid_op+0x4c/0x60
   ? pick_next_task_rt+0x6e/0x1d0
   ? asm_exc_invalid_op+0x12/0x20
   ? pick_next_task_rt+0x6e/0x1d0
   __schedule+0x5cb/0x790
   ? update_ts_time_stats+0x55/0x70
   schedule_idle+0x1e/0x40
   do_idle+0x15e/0x200
   cpu_startup_entry+0x19/0x20
   start_secondary+0x117/0x160
   secondary_startup_64_no_verify+0xb0/0xbb

-> BUG: kernel NULL pointer dereference, address: 00000000000000c0
   Call Trace:
   ? __die_body+0x1a/0x60
   ? no_context+0x183/0x350
   ? __warn+0x8a/0xe0
   ? exc_page_fault+0x3d6/0x520
   ? asm_exc_page_fault+0x1e/0x30
   ? pick_next_task_rt+0xb5/0x1d0
   ? pick_next_task_rt+0x8c/0x1d0
   __schedule+0x583/0x7e0
   ? update_ts_time_stats+0x55/0x70
   schedule_idle+0x1e/0x40
   do_idle+0x15e/0x200
   cpu_startup_entry+0x19/0x20
   start_secondary+0x117/0x160
   secondary_startup_64_no_verify+0xb0/0xbb

-> BUG: unable to handle page fault for address: ffff9464daea5900
   kernel BUG at kernel/sched/rt.c:1861! BUG_ON(rq->cpu != task_cpu(p))

-> kernel BUG at kernel/sched/rt.c:1055! BUG_ON(!rq->nr_running)
   Call Trace:
   ? __die_body+0x1a/0x60
   ? die+0x2a/0x50
   ? do_trap+0x85/0x100
   ? dequeue_top_rt_rq+0xa2/0xb0
   ? do_error_trap+0x64/0xa0
   ? dequeue_top_rt_rq+0xa2/0xb0
   ? exc_invalid_op+0x4c/0x60
   ? dequeue_top_rt_rq+0xa2/0xb0
   ? asm_exc_invalid_op+0x12/0x20
   ? dequeue_top_rt_rq+0xa2/0xb0
   dequeue_rt_entity+0x1f/0x70
   dequeue_task_rt+0x2d/0x70
   __schedule+0x1a8/0x7e0
   ? blk_finish_plug+0x25/0x40
   schedule+0x3c/0xb0
   futex_wait_queue_me+0xb6/0x120
   futex_wait+0xd9/0x240
   do_futex+0x344/0xa90
   ? get_mm_exe_file+0x30/0x60
   ? audit_exe_compare+0x58/0x70
   ? audit_filter_rules.constprop.26+0x65e/0x1220
   __x64_sys_futex+0x148/0x1f0
   do_syscall_64+0x30/0x80
   entry_SYSCALL_64_after_hwframe+0x62/0xc7

-> BUG: unable to handle page fault for address: ffff8cf3608bc2c0
   Call Trace:
   ? __die_body+0x1a/0x60
   ? no_context+0x183/0x350
   ? spurious_kernel_fault+0x171/0x1c0
   ? exc_page_fault+0x3b6/0x520
   ? plist_check_list+0x15/0x40
   ? plist_check_list+0x2e/0x40
   ? asm_exc_page_fault+0x1e/0x30
   ? _cond_resched+0x15/0x30
   ? futex_wait_queue_me+0xc8/0x120
   ? futex_wait+0xd9/0x240
   ? try_to_wake_up+0x1b8/0x490
   ? futex_wake+0x78/0x160
   ? do_futex+0xcd/0xa90
   ? plist_check_list+0x15/0x40
   ? plist_check_list+0x2e/0x40
   ? plist_del+0x6a/0xd0
   ? plist_check_list+0x15/0x40
   ? plist_check_list+0x2e/0x40
   ? dequeue_pushable_task+0x20/0x70
   ? __schedule+0x382/0x7e0
   ? asm_sysvec_reschedule_ipi+0xa/0x20
   ? schedule+0x3c/0xb0
   ? exit_to_user_mode_prepare+0x9e/0x150
   ? irqentry_exit_to_user_mode+0x5/0x30
   ? asm_sysvec_reschedule_ipi+0x12/0x20

Above are some of the common examples of the crashes that were observed
due to this issue.

Details
=======
Let's look at the following scenario to understand this race.

1) CPU A enters push_rt_task
  a) CPU A has chosen next_task = task p.
  b) CPU A calls find_lock_lowest_rq(Task p, CPU Z’s rq).
  c) CPU A identifies CPU X as a destination CPU (X < Z).
  d) CPU A enters double_lock_balance(CPU Z’s rq, CPU X’s rq).
  e) Since X is lower than Z, CPU A unlocks CPU Z’s rq. Someone else has
     locked CPU X’s rq, and thus, CPU A must wait.

2) At CPU Z
  a) Previous task has completed execution and thus, CPU Z enters
     schedule, locks its own rq after CPU A releases it.
  b) CPU Z dequeues previous task and begins executing task p.
  c) CPU Z unlocks its rq.
  d) Task p yields the CPU (ex. by doing IO or waiting to acquire a
     lock) which triggers the schedule function on CPU Z.
  e) CPU Z enters schedule again, locks its own rq, and dequeues task p.
  f) As part of dequeue, it sets p.on_rq = 0 and unlocks its rq.

3) At CPU B
  a) CPU B enters try_to_wake_up with input task p.
  b) Since CPU Z dequeued task p, p.on_rq = 0, and CPU B updates
     B.state = WAKING.
  c) CPU B via select_task_rq determines CPU Y as the target CPU.

4) The race
  a) CPU A acquires CPU X’s lock and relocks CPU Z.
  b) CPU A reads task p.cpu = Z and incorrectly concludes task p is
     still on CPU Z.
  c) CPU A failed to notice task p had been dequeued from CPU Z while
     CPU A was waiting for locks in double_lock_balance. If CPU A knew
     that task p had been dequeued, it would return NULL forcing
     push_rt_task to give up the task p's migration.
  d) CPU B updates task p.cpu = Y and calls ttwu_queue.
  e) CPU B locks Ys rq. CPU B enqueues task p onto Y and sets task
     p.on_rq = 1.
  f) CPU B unlocks CPU Y, triggering memory synchronization.
  g) CPU A reads task p.on_rq = 1, cementing its assumption that task p
     has not migrated.
  h) CPU A decides to migrate p to CPU X.

This leads to A dequeuing p from Y's queue and various crashes down the
line.

Solution
========
The solution here is fairly simple. After obtaining the lock (at 4a),
the check is enhanced to make sure that the task is still at the head of
the pushable tasks list. If not, then it is anyway not suitable for
being pushed out.

Testing
=======
The fix is tested on a cluster of 3 nodes, where the panics due to this
are hit every couple of days. A fix similar to this was deployed on such
cluster and was stable for more than 30 days.

Co-developed-by: Jon Kohler <jon@nutanix.com>
	Signed-off-by: Jon Kohler <jon@nutanix.com>
Co-developed-by: Gauri Patwardhan <gauri.patwardhan@nutanix.com>
	Signed-off-by: Gauri Patwardhan <gauri.patwardhan@nutanix.com>
Co-developed-by: Rahul Chunduru <rahul.chunduru@nutanix.com>
	Signed-off-by: Rahul Chunduru <rahul.chunduru@nutanix.com>
	Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
	Reviewed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
	Reviewed-by: Phil Auld <pauld@redhat.com>
	Tested-by: Will Ton <william.ton@nutanix.com>
	Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250225180553.167995-1-harshit@nutanix.com
(cherry picked from commit 690e47d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	kernel/sched/rt.c
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v6.12~1..kernel-mainline: 52012
Number of commits in rpm: 12
Number of commits matched with upstream: 9 (75.00%)
Number of commits in upstream but not in rpm: 52003
Number of commits NOT found in upstream: 3 (25.00%)

Rebuilding Kernel on Branch rocky10_0_rebuild_kernel-6.12.0-55.18.1.el10_0 for kernel-6.12.0-55.18.1.el10_0
Clean Cherry Picks: 7 (77.78%)
Empty Cherry Picks: 2 (22.22%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-6.12.0-55.18.1.el10_0/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
@PlaidCat PlaidCat force-pushed the rocky10_0_rebuild branch from 1bf1201 to 4de01b5 Compare July 1, 2025 21:15
@PlaidCat PlaidCat force-pushed the rocky10_0_rebuild branch from 2112e5c to 54d36d9 Compare July 1, 2025 22:10
thefossguy-ciq
thefossguy-ciq previously approved these changes Jul 2, 2025
Copy link

@thefossguy-ciq thefossguy-ciq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚤

bmastbergen
bmastbergen previously approved these changes Jul 2, 2025
Copy link
Collaborator

@bmastbergen bmastbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥌

@PlaidCat PlaidCat dismissed stale reviews from bmastbergen and thefossguy-ciq July 2, 2025 13:23

The merge-base changed after approval.

@PlaidCat PlaidCat deleted the branch rocky10_0 July 2, 2025 13:25
@PlaidCat PlaidCat closed this Jul 2, 2025
@PlaidCat PlaidCat reopened this Jul 2, 2025
Copy link

@thefossguy-ciq thefossguy-ciq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚤

Copy link
Collaborator

@bmastbergen bmastbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥌

@PlaidCat
Copy link
Collaborator Author

PlaidCat commented Jul 2, 2025

Thanks for dealing with the kerfuffle.

@PlaidCat PlaidCat merged commit 54d36d9 into rocky10_0 Jul 2, 2025
8 checks passed
@PlaidCat PlaidCat deleted the rocky10_0_rebuild branch July 2, 2025 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants