[Intel-SIG] [Meteor Lake] Sync 6.6 EDAC driver patches to Deepin 6.6 #79

@shiqingd shiqingd commented Feb 23, 2024

This PR includes the 9 patches that add EDAC/igen6 support for Intel Meteor Lake-P SoCs.
Five of these patches have already been merged upstream in the Linux kernel.

Upstream:
commit 7f5b45e
6807434 ("EDAC/igen6: Add Intel Meteor Lake-P SoCs support")

commit 4681a10
3c77090 ("EDAC/igen6: Add Intel Meteor Lake-PS SoCs support").

commit 72597e7
a264f71 ("EDAC/igen6: Make get_mchbar() helper function").

commit cfa3e9a
d23627a ("EDAC/igen6: Add Intel Raptor Lake-P SoCs support").

commit 8f077b8
c4a5398 ("EDAC/igen6: Add Intel Alder Lake-N SoCs support").

Four patches have already been published in the public intel/linux-intel-lts repository on GitHub.

commit be12908
intel/linux-intel-lts@da1536bdff36 ("EDAC/igen6: Add registration APIs for In-Band ECC error notification")

commit 1673f29
intel/linux-intel-lts@39b1cbd8dc70 ("EDAC/ieh: Add I/O device EDAC support for Intel Tiger Lake-H SoC")

commit 15ecf04
intel/linux-intel-lts@9f721ed88580 ("EDAC/ieh: Add I/O device EDAC driver for Intel CPUs with IEH")

commit 22dbb4d
intel/linux-intel-lts@1afcf245df31 ("x86/mce: Add MCACOD code for generic I/O error")

Errors of some I/O devices can be signaled by MCE and logged in
IOMCA bank. Add MCACOD code of generic I/O error and related macros
for MCi_MISC to support IOMCA logging.

See Intel Software Developers' Manual, version 071, volume 3B,
section "IOMCA".
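
As a rough illustration of what "adding an MCACOD code" enables, the sketch below checks whether a machine-check status value reports a generic I/O error. MCACOD occupies bits 15:0 of IA32_MCi_STATUS and bit 63 is the valid bit. The value 0x0e0b is an assumption taken from the simple error code table in the SDM, and the macro and function names are hypothetical; they are not copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MCACOD is the architectural error code in bits 15:0 of IA32_MCi_STATUS. */
#define MCI_STATUS_MCACOD_MASK 0xffffULL

/* Bit 63 (VAL) tells whether the bank holds a valid error record. */
#define MCI_STATUS_VAL (1ULL << 63)

/*
 * Assumption: 0x0e0b is the "I/O error" simple error code from the SDM.
 * The macro name is illustrative and may differ from the one in the patch.
 */
#define EXAMPLE_MCACOD_IOERR 0x0e0bULL

static_assert(EXAMPLE_MCACOD_IOERR <= MCI_STATUS_MCACOD_MASK,
	      "error code must fit in the MCACOD field");

/* Return true if a valid status value carries the generic I/O error code. */
static bool mce_is_generic_io_error(uint64_t status)
{
	if (!(status & MCI_STATUS_VAL))
		return false;
	return (status & MCI_STATUS_MCACOD_MASK) == EXAMPLE_MCACOD_IOERR;
}
```

A caller would typically apply such a check to the status of each logged bank before deciding whether to hand the record to an I/O error decoder.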

Intel-SIG: x86/mce: Add MCACOD code for generic I/O error.
intel/linux-intel-lts@1afcf245df31

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
[ Qingdong Shi: amend commit log ]
Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Integrated Error Handlers (IEHs) are PCIe devices which aggregate and
report error events of different severities (correctable, non-fatal
uncorrectable, and fatal uncorrectable) from various I/O devices, e.g.,
PCIe devices, legacy PCI devices. Each error severity is notified by
one of {SMI, NMI, MCE} which is configured by BIOS/platform firmware.

The first IEH-supported platform is Intel Tiger Lake-U CPU. The driver
reads/prints the error severity and error source (bus/device/function)
logged in the IEH(s) and restarts the system on fatal I/O device error.
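
The behaviour described above (print the severity and the bus/device/function of the error source, restart on a fatal error) can be sketched in plain C as follows. This is not the ieh_edac code: the type and function names are hypothetical, and the only hardware fact relied on is the standard PCI encoding of a 16-bit bus/device/function identifier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* The three severities named in the commit message. */
enum io_err_severity { IO_ERR_CORRECTABLE, IO_ERR_NONFATAL, IO_ERR_FATAL };

struct io_err_source {
	uint8_t bus, dev, fn;
};

/* Standard PCI layout: bus in bits 15:8, device in bits 7:3, function in 2:0. */
static struct io_err_source decode_bdf(uint16_t bdf)
{
	struct io_err_source s = {
		.bus = (uint8_t)(bdf >> 8),
		.dev = (uint8_t)((bdf >> 3) & 0x1f),
		.fn  = (uint8_t)(bdf & 0x7),
	};
	return s;
}

static const char *severity_name(enum io_err_severity sev)
{
	switch (sev) {
	case IO_ERR_CORRECTABLE: return "correctable";
	case IO_ERR_NONFATAL:    return "non-fatal uncorrectable";
	case IO_ERR_FATAL:       return "fatal uncorrectable";
	}
	return "unknown";
}

/*
 * Format a report into buf and return 1 if the caller should restart the
 * system (fatal error), 0 otherwise.
 */
static int report_io_error(enum io_err_severity sev, uint16_t bdf,
			   char *buf, size_t len)
{
	struct io_err_source s = decode_bdf(bdf);

	assert(buf && len > 0);
	snprintf(buf, len, "%s I/O error from %02x:%02x.%x",
		 severity_name(sev), s.bus, s.dev, s.fn);
	return sev == IO_ERR_FATAL;
}
```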

Intel-SIG: EDAC/ieh: Add I/O device EDAC driver for Intel CPUs with IEH.
intel/linux-intel-lts@9f721ed88580

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
[ Qingdong Shi: amend commit log ]
Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Tiger Lake-H SoC shares the same Integrated Error Handler (IEH) architecture
with Tiger Lake-U, so it can use the same ieh_edac driver.

Add Tiger Lake-H IEH device ID for I/O device EDAC support.

Intel-SIG: EDAC/ieh: Add I/O device EDAC support for Intel Tiger Lake-H SoC.
intel/linux-intel-lts@39b1cbd8dc70

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
[ Qingdong Shi: amend commit log ]
Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
The igen6_edac driver is the root to capture the In-Band ECC error
event. There are some external modules which want to be notified about
the In-Band ECC errors for specific error handling. So add the
registration APIs for those external modules for the In-Band ECC errors.
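
The registration pattern described here can be pictured with a small userspace sketch: listeners add themselves to a list, and the error path walks the list. All names below are hypothetical and do not reflect the actual API added by the patch; a kernel implementation would more likely build on the notifier chain infrastructure and would need locking, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload handed to listeners on each In-Band ECC error. */
struct ibecc_err_info {
	uint64_t addr;
	int uncorrectable;
};

struct ibecc_listener {
	void (*notify)(const struct ibecc_err_info *info, void *ctx);
	void *ctx;
	struct ibecc_listener *next;
};

static struct ibecc_listener *listeners;

/* Add a listener at the head of the list. */
static void ibecc_listener_register(struct ibecc_listener *l)
{
	assert(l && l->notify);
	l->next = listeners;
	listeners = l;
}

/* Remove a listener; returns 0 on success, -1 if it was not registered. */
static int ibecc_listener_unregister(struct ibecc_listener *l)
{
	for (struct ibecc_listener **p = &listeners; *p; p = &(*p)->next) {
		if (*p == l) {
			*p = l->next;
			l->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* Called from the error path: fan the event out to every listener. */
static void ibecc_report_error(const struct ibecc_err_info *info)
{
	for (struct ibecc_listener *l = listeners; l; l = l->next)
		l->notify(info, l->ctx);
}

/* Example listener that only counts the errors it is told about. */
static void count_errors(const struct ibecc_err_info *info, void *ctx)
{
	(void)info;
	++*(unsigned int *)ctx;
}
```

An external module would register a listener such as `count_errors` at load time and unregister it before unloading, so the driver never calls into freed code.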

Intel-SIG: EDAC/igen6: Add registration APIs for In-Band ECC error notification.
intel/linux-intel-lts@da1536bdff36

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
[ Qingdong Shi: amend commit log ]
Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit c4a5398 ("EDAC/igen6: Add Intel Alder Lake-N SoCs support")
from: v6.8-rc1

Add several Intel Alder Lake-N SoC compute die IDs for EDAC support.
Alder Lake-N (one memory controller) is a cut-down derivative of
Alder Lake-P (two memory controllers).

Intel-SIG: Upstream commit c4a5398 ("EDAC/igen6: Add Intel Alder Lake-N SoCs support").

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Qingdong Shi: amend commit log ]
Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit d23627a ("EDAC/igen6: Add Intel Raptor Lake-P SoCs support")
from: v6.8-rc1

Add several Intel Raptor Lake-P SoC compute die IDs for EDAC support.
These Raptor Lake-P SoCs use similar memory controller and IBECC as
Alder Lake-P SoC but extend the most significant bit of error address
logged in IBECC from bit 38 to bit 45.
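
To see why the wider field matters, the sketch below extracts an error address from a log value with the most significant bit set to either 38 or 45. The helper names are hypothetical, and the low bit of the address field (5 here) is an assumption made only for illustration; the commit message specifies just the change of the upper bound.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of v, shifted down to bit 0. */
static uint64_t bits(uint64_t v, unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 64);
	uint64_t width = hi - lo + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return (v >> lo) & mask;
}

/* Assumption for this sketch: the address field starts at bit 5. */
#define EXAMPLE_ADDR_LSB 5u

/* Recover the error address from a log value given the field's top bit. */
static uint64_t err_addr(uint64_t log, unsigned int msb)
{
	return bits(log, EXAMPLE_ADDR_LSB, msb) << EXAMPLE_ADDR_LSB;
}
```

With a top bit of 38, any address at or above 512 GiB (bit 39 and up) would be silently truncated; a top bit of 45 keeps those address bits.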

Intel-SIG: Upstream commit d23627a ("EDAC/igen6: Add Intel Raptor Lake-P SoCs support").
upstream link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d23627a7688f&dt=2

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Qingdong Shi: amend commit log ]
Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit a264f71 ("EDAC/igen6: Make get_mchbar() helper function")
from: v6.8-rc1

Make get_mchbar() helper function to retrieve the BAR address of
the memory controller. No function changes.
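
A helper of this kind usually reads a 64-bit base address register from PCI configuration space, checks that it is enabled, and masks off the low control bits. The sketch below shows that shape only. The register offset 0x48, the enable bit in bit 0, the 64 KiB alignment mask, and the callback used to read configuration space are all assumptions for illustration, not details taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: register offset and enable bit, chosen for illustration. */
#define EX_MCHBAR_OFFSET 0x48u
#define EX_MCHBAR_EN     1ULL

/* Hypothetical accessor for 32-bit reads from a device's config space. */
typedef uint32_t (*cfg_read32_fn)(void *dev, unsigned int offset);

/* Store the base address in *mchbar; return 0 on success, -1 if disabled. */
static int get_mchbar_sketch(void *dev, cfg_read32_fn read32, uint64_t *mchbar)
{
	assert(read32 && mchbar);

	uint64_t lo = read32(dev, EX_MCHBAR_OFFSET);
	uint64_t hi = read32(dev, EX_MCHBAR_OFFSET + 4);
	uint64_t raw = (hi << 32) | lo;

	if (!(raw & EX_MCHBAR_EN))
		return -1;

	/* Drop the enable bit and low bits (assumed 64 KiB alignment). */
	*mchbar = raw & ~0xffffULL;
	return 0;
}

/* Fake config space for experiments: dev points to an array of dwords. */
static uint32_t fake_read32(void *dev, unsigned int offset)
{
	const uint32_t *regs = dev;
	return regs[offset / 4];
}
```

Factoring the lookup into one function lets every per-SoC setup path share the same enable check instead of repeating it.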

Intel-SIG: Upstream commit a264f71 ("EDAC/igen6: Make get_mchbar() helper function").

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
[ Qingdong Shi: amend commit log ]
Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit 3c77090 ("EDAC/igen6: Add Intel Meteor Lake-PS SoCs support")
from: v6.8-rc1

Add several Intel Meteor Lake-PS SoC compute die IDs for EDAC support.
These Meteor Lake-PS SoCs use similar memory controller and IBECC as
Alder Lake-P SoC.

Intel-SIG: Upstream commit 3c77090 ("EDAC/igen6: Add Intel Meteor Lake-PS SoCs support").

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
[ Qingdong Shi: amend commit log ]
Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
Mainline: commit 6807434 ("EDAC/igen6: Add Intel Meteor Lake-P SoCs support")
from: v6.8-rc1

Add several Intel Meteor Lake-P SoC compute die IDs for EDAC support.
These Meteor Lake-P SoCs use similar memory controller and IBECC as
Alder Lake-P SoC.

Intel-SIG: Upstream commit 6807434 ("EDAC/igen6: Add Intel Meteor Lake-P SoCs support").

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
[ Qingdong Shi: amend commit log ]
Signed-off-by: Qingdong Shi <qingdong.shi@intel.com>
@matrix-wsk matrix-wsk merged commit f4f7870 into deepin-community:linux-6.6.y Feb 23, 2024
opsiff pushed a commit to opsiff/UOS-kernel that referenced this pull request Jul 30, 2024
commit 667574e873b5f77a220b2a93329689f36fb56d5d upstream.

When tries to demote 1G hugetlb folios, a lockdep warning is observed:

============================================
WARNING: possible recursive locking detected
6.10.0-rc6-00452-ga4d0275fa660-dirty #79 Not tainted
--------------------------------------------
bash/710 is trying to acquire lock:
ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460

but task is already holding lock:
ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&h->resize_lock);
  lock(&h->resize_lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by bash/710:
 #0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
 #1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
 #2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
 #3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460

stack backtrace:
CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty #79
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xa0
 __lock_acquire+0x10f2/0x1ca0
 lock_acquire+0xbe/0x2d0
 __mutex_lock+0x6d/0x400
 demote_store+0x244/0x460
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xb9/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa61db14887
RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887
RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001
RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00
 </TASK>

Lockdep considers this an AA deadlock because the different resize_lock
mutexes reside in the same lockdep class, but this is a false positive.
Place them in distinct classes to avoid these warnings.

Link: https://lkml.kernel.org/r/20240712031314.2570452-1-linmiaohe@huawei.com
Fixes: 8531fc6 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b10c359e644235bc1c1899981d7203cbf5abf443)
opsiff pushed a commit to opsiff/UOS-kernel that referenced this pull request Aug 1, 2024
commit 667574e873b5f77a220b2a93329689f36fb56d5d upstream.

(cherry picked from commit 60c223ed1f6f47ad36da6b4032e3b45cad8fb41f)
opsiff pushed a commit to opsiff/UOS-kernel that referenced this pull request Aug 4, 2024
commit 667574e873b5f77a220b2a93329689f36fb56d5d upstream.

(cherry picked from commit 99a49b670ede46ad6f47eef2c93bb31f7db8dfd5)
Avenger-285714 pushed a commit that referenced this pull request Aug 12, 2024
commit 667574e873b5f77a220b2a93329689f36fb56d5d upstream.

(cherry picked from commit 99a49b670ede46ad6f47eef2c93bb31f7db8dfd5)