Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1116 #57

Closed
wants to merge 3 commits into from
Closed

1116 #57

wants to merge 3 commits into from

Conversation

coder280
Copy link

No description provided.

@coder280 coder280 closed this Nov 16, 2013
ruppi referenced this pull request in hardkernel/linux Nov 17, 2013
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ruppi referenced this pull request in hardkernel/linux Nov 17, 2013
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ruppi referenced this pull request in ruppi/linux Jan 7, 2014
The nm10 gpio driver is being modified to recognize the Panther Point
LPC device as one of valid GPIO controllers.

BUG=chrome-os-partner:8612
TEST=manual
 . build the new kernel
 . observe that the gpio driver gets installed:
    localhost ~ # dmesg | grep gpio
    [    7.020200] nm10_gpio version 0.04 built on Mar 22 2012 at 20:47:08
    [    7.020220] gpiochip_find_base: found new base at 160
 . enable the write protect GPIO (hardkernel#57)
    localhost ~ # echo 217 > /sys/class/gpio/export
 . examine its value
    localhost ~ # cat /sys/class/gpio/gpio217/value
    0
 . short the pins and examine the value again
   localhost ~ # cat /sys/class/gpio/gpio217/value
   1
 . observe value change when the pins status changes

Change-Id: Idf354a64d5b964a37ee72b8e14fcedd3aab83654
Signed-off-by: Vadim Bendebury <vbendeb@chromium.org>
Reviewed-on: https://gerrit.chromium.org/gerrit/18928
zeitgeist87 pushed a commit to zeitgeist87/linux that referenced this pull request Mar 14, 2014
…loop-during-umount-checkpatch-fixes

ERROR: code indent should use tabs where possible
torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087:
+                       if (tmp->type == DLM_MLE_MASTER) {$

WARNING: please, no spaces at the start of a line
torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087:
+                       if (tmp->type == DLM_MLE_MASTER) {$

WARNING: suspect code indent for conditional statements (23, 31)
torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087:
+                       if (tmp->type == DLM_MLE_MASTER) {
+                               ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;

ERROR: code indent should use tabs where possible
torvalds#57: FILE: fs/ocfs2/dlm/dlmmaster.c:3088:
+                               ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;$

WARNING: please, no spaces at the start of a line
torvalds#57: FILE: fs/ocfs2/dlm/dlmmaster.c:3088:
+                               ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;$

ERROR: code indent should use tabs where possible
#58: FILE: fs/ocfs2/dlm/dlmmaster.c:3089:
+                               mlog(0, "%s:%.*s: master=%u, newmaster=%u, "$

WARNING: please, no spaces at the start of a line
#58: FILE: fs/ocfs2/dlm/dlmmaster.c:3089:
+                               mlog(0, "%s:%.*s: master=%u, newmaster=%u, "$

WARNING: quoted string split across lines
torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090:
+                               mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
+                                               "telling master to get ref "

ERROR: code indent should use tabs where possible
torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090:
+                                               "telling master to get ref "$

WARNING: please, no spaces at the start of a line
torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090:
+                                               "telling master to get ref "$

WARNING: quoted string split across lines
torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091:
+                                               "telling master to get ref "
+                                               "for cleared out mle during "

ERROR: code indent should use tabs where possible
torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091:
+                                               "for cleared out mle during "$

WARNING: please, no spaces at the start of a line
torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091:
+                                               "for cleared out mle during "$

WARNING: quoted string split across lines
torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092:
+                                               "for cleared out mle during "
+                                               "migration\n", dlm->name,

ERROR: code indent should use tabs where possible
torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092:
+                                               "migration\n", dlm->name,$

WARNING: please, no spaces at the start of a line
torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092:
+                                               "migration\n", dlm->name,$

ERROR: code indent should use tabs where possible
torvalds#62: FILE: fs/ocfs2/dlm/dlmmaster.c:3093:
+                                               namelen, name, master,$

WARNING: please, no spaces at the start of a line
torvalds#62: FILE: fs/ocfs2/dlm/dlmmaster.c:3093:
+                                               namelen, name, master,$

ERROR: code indent should use tabs where possible
torvalds#63: FILE: fs/ocfs2/dlm/dlmmaster.c:3094:
+                                               new_master);$

WARNING: please, no spaces at the start of a line
torvalds#63: FILE: fs/ocfs2/dlm/dlmmaster.c:3094:
+                                               new_master);$

ERROR: code indent should use tabs where possible
torvalds#64: FILE: fs/ocfs2/dlm/dlmmaster.c:3095:
+                       }$

WARNING: please, no spaces at the start of a line
torvalds#64: FILE: fs/ocfs2/dlm/dlmmaster.c:3095:
+                       }$

total: 9 errors, 13 warnings, 20 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xue jiufei <xuejiufei@huawei.com>
Cc: jiangyiwen <jiangyiwen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Mar 19, 2014
…loop-during-umount-checkpatch-fixes

ERROR: code indent should use tabs where possible
torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087:
+                       if (tmp->type == DLM_MLE_MASTER) {$

WARNING: please, no spaces at the start of a line
torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087:
+                       if (tmp->type == DLM_MLE_MASTER) {$

WARNING: suspect code indent for conditional statements (23, 31)
torvalds#56: FILE: fs/ocfs2/dlm/dlmmaster.c:3087:
+                       if (tmp->type == DLM_MLE_MASTER) {
+                               ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;

ERROR: code indent should use tabs where possible
torvalds#57: FILE: fs/ocfs2/dlm/dlmmaster.c:3088:
+                               ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;$

WARNING: please, no spaces at the start of a line
torvalds#57: FILE: fs/ocfs2/dlm/dlmmaster.c:3088:
+                               ret = DLM_MIGRATE_RESPONSE_MASTERY_REF;$

ERROR: code indent should use tabs where possible
#58: FILE: fs/ocfs2/dlm/dlmmaster.c:3089:
+                               mlog(0, "%s:%.*s: master=%u, newmaster=%u, "$

WARNING: please, no spaces at the start of a line
#58: FILE: fs/ocfs2/dlm/dlmmaster.c:3089:
+                               mlog(0, "%s:%.*s: master=%u, newmaster=%u, "$

WARNING: quoted string split across lines
torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090:
+                               mlog(0, "%s:%.*s: master=%u, newmaster=%u, "
+                                               "telling master to get ref "

ERROR: code indent should use tabs where possible
torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090:
+                                               "telling master to get ref "$

WARNING: please, no spaces at the start of a line
torvalds#59: FILE: fs/ocfs2/dlm/dlmmaster.c:3090:
+                                               "telling master to get ref "$

WARNING: quoted string split across lines
torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091:
+                                               "telling master to get ref "
+                                               "for cleared out mle during "

ERROR: code indent should use tabs where possible
torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091:
+                                               "for cleared out mle during "$

WARNING: please, no spaces at the start of a line
torvalds#60: FILE: fs/ocfs2/dlm/dlmmaster.c:3091:
+                                               "for cleared out mle during "$

WARNING: quoted string split across lines
torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092:
+                                               "for cleared out mle during "
+                                               "migration\n", dlm->name,

ERROR: code indent should use tabs where possible
torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092:
+                                               "migration\n", dlm->name,$

WARNING: please, no spaces at the start of a line
torvalds#61: FILE: fs/ocfs2/dlm/dlmmaster.c:3092:
+                                               "migration\n", dlm->name,$

ERROR: code indent should use tabs where possible
torvalds#62: FILE: fs/ocfs2/dlm/dlmmaster.c:3093:
+                                               namelen, name, master,$

WARNING: please, no spaces at the start of a line
torvalds#62: FILE: fs/ocfs2/dlm/dlmmaster.c:3093:
+                                               namelen, name, master,$

ERROR: code indent should use tabs where possible
torvalds#63: FILE: fs/ocfs2/dlm/dlmmaster.c:3094:
+                                               new_master);$

WARNING: please, no spaces at the start of a line
torvalds#63: FILE: fs/ocfs2/dlm/dlmmaster.c:3094:
+                                               new_master);$

ERROR: code indent should use tabs where possible
torvalds#64: FILE: fs/ocfs2/dlm/dlmmaster.c:3095:
+                       }$

WARNING: please, no spaces at the start of a line
torvalds#64: FILE: fs/ocfs2/dlm/dlmmaster.c:3095:
+                       }$

total: 9 errors, 13 warnings, 20 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/ocfs2-do-not-return-dlm_migrate_response_mastery_ref-to-avoid-endlessloop-during-umount.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xue jiufei <xuejiufei@huawei.com>
Cc: jiangyiwen <jiangyiwen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Jun 23, 2014
WARNING: Missing a blank line after declarations
torvalds#46: FILE: mm/huge_memory.c:954:
+		struct page *endpage = page + HPAGE_PMD_NR;
+		atomic_add(HPAGE_PMD_NR, &page->_count);

WARNING: Missing a blank line after declarations
torvalds#57: FILE: mm/huge_memory.c:965:
+		struct page *endpage = page + HPAGE_PMD_NR;
+		while (page < endpage)

total: 0 errors, 2 warnings, 65 lines checked

./patches/mm-thp-fix-debug_pagealloc-oops-in-copy_page_rep.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
bengal pushed a commit to bengal/linux that referenced this pull request Nov 9, 2014
There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system.  Sometimes this is easy to do, other times (such as in
the case of a remote admin) it is not trivial to send new images to the
user.

A much easier method would be a switch to change the WARN() over to a
panic.  This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the warn_slowpath_common()
path.  The function will still print out the location of the warning.

An example of the panic_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s
location.  After that the panic() output is displayed.

WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
Kernel panic - not syncing: panic_on_warn set ...

CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ torvalds#57
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
 0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
 0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
 ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
Call Trace:
 [<ffffffff81665190>] dump_stack+0x46/0x58
 [<ffffffff8165e2ec>] panic+0xd0/0x204
 [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
 [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
 [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
 [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
 [<ffffffff81002144>] do_one_initcall+0xd4/0x210
 [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
 [<ffffffff810f8889>] load_module+0x16a9/0x1b30
 [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
 [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
 [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
 [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
aryabinin pushed a commit to aryabinin/linux that referenced this pull request Nov 18, 2014
There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system.  Sometimes this is easy to do, other times (such as in
the case of a remote admin) it is not trivial to send new images to the
user.

A much easier method would be a switch to change the WARN() over to a
panic.  This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the warn_slowpath_common()
path.  The function will still print out the location of the warning.

An example of the panic_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s
location.  After that the panic() output is displayed.

WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
Kernel panic - not syncing: panic_on_warn set ...

CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ torvalds#57
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
 0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
 0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
 ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
Call Trace:
 [<ffffffff81665190>] dump_stack+0x46/0x58
 [<ffffffff8165e2ec>] panic+0xd0/0x204
 [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
 [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
 [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
 [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
 [<ffffffff81002144>] do_one_initcall+0xd4/0x210
 [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
 [<ffffffff810f8889>] load_module+0x16a9/0x1b30
 [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
 [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
 [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
 [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
aryabinin pushed a commit to aryabinin/linux that referenced this pull request Nov 24, 2014
There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system.  Sometimes this is easy to do, other times (such as in
the case of a remote admin) it is not trivial to send new images to the
user.

A much easier method would be a switch to change the WARN() over to a
panic.  This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the warn_slowpath_common()
path.  The function will still print out the location of the warning.

An example of the panic_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s
location.  After that the panic() output is displayed.

WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
Kernel panic - not syncing: panic_on_warn set ...

CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ torvalds#57
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
 0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
 0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
 ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
Call Trace:
 [<ffffffff81665190>] dump_stack+0x46/0x58
 [<ffffffff8165e2ec>] panic+0xd0/0x204
 [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
 [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
 [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
 [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
 [<ffffffff81002144>] do_one_initcall+0xd4/0x210
 [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
 [<ffffffff810f8889>] load_module+0x16a9/0x1b30
 [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
 [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
 [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
 [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
aryabinin pushed a commit to aryabinin/linux that referenced this pull request Nov 27, 2014
There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system.  Sometimes this is easy to do, other times (such as in
the case of a remote admin) it is not trivial to send new images to the
user.

A much easier method would be a switch to change the WARN() over to a
panic.  This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the warn_slowpath_common()
path.  The function will still print out the location of the warning.

An example of the panic_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s
location.  After that the panic() output is displayed.

WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
Kernel panic - not syncing: panic_on_warn set ...

CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ torvalds#57
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
 0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
 0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
 ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
Call Trace:
 [<ffffffff81665190>] dump_stack+0x46/0x58
 [<ffffffff8165e2ec>] panic+0xd0/0x204
 [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
 [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
 [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
 [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
 [<ffffffff81002144>] do_one_initcall+0xd4/0x210
 [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
 [<ffffffff810f8889>] load_module+0x16a9/0x1b30
 [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
 [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
 [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
 [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
torvalds pushed a commit that referenced this pull request Dec 11, 2014
There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system.  Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to
the user.

A much easier method would be a switch to change the WARN() over to a
panic.  This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the
warn_slowpath_common() path.  The function will still print out the
location of the warning.

An example of the panic_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s
location.  After that the panic() output is displayed.

    WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ #57
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
     0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
     0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
     ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
    Call Trace:
     [<ffffffff81665190>] dump_stack+0x46/0x58
     [<ffffffff8165e2ec>] panic+0xd0/0x204
     [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
     [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
     [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
     [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
     [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
     [<ffffffff81002144>] do_one_initcall+0xd4/0x210
     [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
     [<ffffffff810f8889>] load_module+0x16a9/0x1b30
     [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
     [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
     [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
     [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
torvalds pushed a commit that referenced this pull request Apr 15, 2015
The perf core implicitly rejects events spanning multiple HW PMUs, as in
these cases the event->ctx will differ. However this validation is
performed after pmu::event_init() is called in perf_init_event(), and
thus pmu::event_init() may be called with a group leader from a
different HW PMU.

The ARM PMU driver does not take this fact into account, and when
validating groups assumes that it can call to_arm_pmu(event->pmu) for
any HW event. When the event in question is from another HW PMU this is
wrong, and results in dereferencing garbage.

This patch updates the ARM PMU driver to first test for and reject
events from other PMUs, moving the to_arm_pmu and related logic after
this test. Fixes a crash triggered by perf_fuzzer on Linux-4.0-rc2, with
a CCI PMU present:

 ---
CPU: 0 PID: 1527 Comm: perf_fuzzer Not tainted 4.0.0-rc2 #57
Hardware name: ARM-Versatile Express
task: bd8484c0 ti: be676000 task.ti: be676000
PC is at 0xbf1bbc90
LR is at validate_event+0x34/0x5c
pc : [<bf1bbc90>]    lr : [<80016060>]    psr: 00000013
...
[<80016060>] (validate_event) from [<80016198>] (validate_group+0x28/0x90)
[<80016198>] (validate_group) from [<80016398>] (armpmu_event_init+0x150/0x218)
[<80016398>] (armpmu_event_init) from [<800882e4>] (perf_try_init_event+0x30/0x48)
[<800882e4>] (perf_try_init_event) from [<8008f544>] (perf_init_event+0x5c/0xf4)
[<8008f544>] (perf_init_event) from [<8008f8a8>] (perf_event_alloc+0x2cc/0x35c)
[<8008f8a8>] (perf_event_alloc) from [<8009015c>] (SyS_perf_event_open+0x498/0xa70)
[<8009015c>] (SyS_perf_event_open) from [<8000e420>] (ret_fast_syscall+0x0/0x34)
Code: bf1be000 bf1bb380 802a2664 00000000 (00000002)
---[ end trace 01aff0ff00926a0a ]---

Also cleans up the code to use the arm_pmu only when we know that
we are dealing with an arm pmu event.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Ziljstra (Intel) <peterz@infradead.org>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
hzhuang1 pushed a commit to hzhuang1/linux that referenced this pull request May 22, 2015
aejsmith pushed a commit to aejsmith/linux that referenced this pull request Jul 28, 2015
torvalds pushed a commit that referenced this pull request Sep 1, 2015
On multi-function JMicron SATA/PATA/AHCI devices, the PATA controller at
function 1 doesn't work if it is powered on before the SATA controller at
function 0.  The result is that PATA doesn't work after resume, and we
print messages like this:

  pata_jmicron 0000:02:00.1: Refused to change power state, currently in D3
  irq 17: nobody cared (try booting with the "irqpoll" option)

Async resume was introduced in v3.15 by 76569fa ("PM / sleep:
Asynchronous threads for resume_noirq").  Prior to that, we powered on
the functions in order, so this problem shouldn't happen.

e6b7e41 ("ata: Disabling the async PM for JMicron chip 363/361")
solved the problem for JMicron 361 and 363 devices.  With async suspend
disabled, we always power on function 0 before function 1.

Barto then reported the same problem with a JMicron 368 (see comment #57 in
the bugzilla).

Rather than extending the blacklist piecemeal, disable async suspend for
all JMicron multi-function SATA/PATA/AHCI devices.

This quirk could stay in the ahci and pata_jmicron drivers, but it's likely
the problem will occur even if pata_jmicron isn't loaded until after the
suspend/resume.  Making it a PCI quirk ensures that we'll preserve the
power-on order even if the drivers aren't loaded.

[bhelgaas: changelog, limit to multi-function, limit to IDE/ATA]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=81551
Reported-and-tested-by: Barto <mister.freeman@laposte.net>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org	# v3.15+
quinte17 pushed a commit to quinte17/linux-stable that referenced this pull request Sep 21, 2015
commit 91f15fb upstream.

On multi-function JMicron SATA/PATA/AHCI devices, the PATA controller at
function 1 doesn't work if it is powered on before the SATA controller at
function 0.  The result is that PATA doesn't work after resume, and we
print messages like this:

  pata_jmicron 0000:02:00.1: Refused to change power state, currently in D3
  irq 17: nobody cared (try booting with the "irqpoll" option)

Async resume was introduced in v3.15 by 76569fa ("PM / sleep:
Asynchronous threads for resume_noirq").  Prior to that, we powered on
the functions in order, so this problem shouldn't happen.

e6b7e41 ("ata: Disabling the async PM for JMicron chip 363/361")
solved the problem for JMicron 361 and 363 devices.  With async suspend
disabled, we always power on function 0 before function 1.

Barto then reported the same problem with a JMicron 368 (see comment torvalds#57 in
the bugzilla).

Rather than extending the blacklist piecemeal, disable async suspend for
all JMicron multi-function SATA/PATA/AHCI devices.

This quirk could stay in the ahci and pata_jmicron drivers, but it's likely
the problem will occur even if pata_jmicron isn't loaded until after the
suspend/resume.  Making it a PCI quirk ensures that we'll preserve the
power-on order even if the drivers aren't loaded.

[bhelgaas: changelog, limit to multi-function, limit to IDE/ATA]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=81551
Reported-and-tested-by: Barto <mister.freeman@laposte.net>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
quinte17 pushed a commit to quinte17/linux-stable that referenced this pull request Sep 21, 2015
commit 91f15fb upstream.

On multi-function JMicron SATA/PATA/AHCI devices, the PATA controller at
function 1 doesn't work if it is powered on before the SATA controller at
function 0.  The result is that PATA doesn't work after resume, and we
print messages like this:

  pata_jmicron 0000:02:00.1: Refused to change power state, currently in D3
  irq 17: nobody cared (try booting with the "irqpoll" option)

Async resume was introduced in v3.15 by 76569fa ("PM / sleep:
Asynchronous threads for resume_noirq").  Prior to that, we powered on
the functions in order, so this problem shouldn't happen.

e6b7e41 ("ata: Disabling the async PM for JMicron chip 363/361")
solved the problem for JMicron 361 and 363 devices.  With async suspend
disabled, we always power on function 0 before function 1.

Barto then reported the same problem with a JMicron 368 (see comment torvalds#57 in
the bugzilla).

Rather than extending the blacklist piecemeal, disable async suspend for
all JMicron multi-function SATA/PATA/AHCI devices.

This quirk could stay in the ahci and pata_jmicron drivers, but it's likely
the problem will occur even if pata_jmicron isn't loaded until after the
suspend/resume.  Making it a PCI quirk ensures that we'll preserve the
power-on order even if the drivers aren't loaded.

[bhelgaas: changelog, limit to multi-function, limit to IDE/ATA]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=81551
Reported-and-tested-by: Barto <mister.freeman@laposte.net>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Nov 11, 2015
On 11/11/2015 03:35 AM, Dmitry Vyukov wrote:
> Hello,
>
> I've hit the following deadlock while running syzkaller on commit
> ce5c2d2 (Nov 7):
>
> [ INFO: possible circular locking dependency detected ]
> 4.3.0+ torvalds#57 Not tainted
> -------------------------------------------------------
> syzkaller_execu/24882 is trying to acquire lock:
>  (&tty->atomic_write_lock){+.+.+.}, at: [<ffffffff81774876>]
> tty_write_lock+0x46/0x70 drivers/tty/tty_io.c:1093
>
> but task is already holding lock:
>  (&tty->termios_rwsem){++++.+}, at: [<ffffffff817864d7>]
> n_tty_ioctl_helper+0x177/0x210 drivers/tty/tty_ioctl.c:1150
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&tty->termios_rwsem){++++.+}:
>        [<ffffffff81122501>] lock_acquire+0x101/0x1d0 kernel/locking/lockdep.c:3585
>        [<ffffffff821727a9>] down_read+0x39/0x50 kernel/locking/rwsem.c:22
>        [<ffffffff8177eb07>] n_tty_write+0x137/0xaa0 drivers/tty/n_tty.c:2362
>        [<     inline     >] do_tty_write drivers/tty/tty_io.c:1164
>        [<ffffffff8177832e>] tty_write+0x28e/0x4f0 drivers/tty/tty_io.c:1248
>        [<ffffffff81778631>] redirected_tty_write+0xa1/0xb0 drivers/tty/tty_io.c:1269
>        [<ffffffff812d788b>] __vfs_write+0xeb/0x2a0 fs/read_write.c:489
>        [<ffffffff812d8a63>] vfs_write+0x113/0x290 fs/read_write.c:538
>        [<     inline     >] SYSC_write fs/read_write.c:585
>        [<ffffffff812da27b>] SyS_write+0xbb/0x170 fs/read_write.c:577
>        [<ffffffff82175fd1>] entry_SYSCALL_64_fastpath+0x31/0x9a
> arch/x86/entry/entry_64.S:187
>
> -> #0 (&tty->atomic_write_lock){+.+.+.}:
>        [<     inline     >] __mutex_lock_common kernel/locking/mutex.c:518
>        [<ffffffff82170525>] mutex_lock_interruptible_nested+0xa5/0x660 kernel/locking/mutex.c:647
>        [<ffffffff81774876>] tty_write_lock+0x46/0x70 drivers/tty/tty_io.c:1093
>        [<ffffffff8177a5a4>] tty_send_xchar+0x94/0x130 drivers/tty/tty_io.c:1289
>        [<ffffffff81786507>] n_tty_ioctl_helper+0x1a7/0x210 drivers/tty/tty_ioctl.c:1158
>        [<ffffffff8177f6e9>] n_tty_ioctl+0xe9/0x1e0 drivers/tty/n_tty.c:2514
>        [<ffffffff8177979c>] tty_ioctl+0xa4c/0x1650 drivers/tty/tty_io.c:2945
>        [<     inline     >] vfs_ioctl fs/ioctl.c:43
>        [<ffffffff812fdf0f>] do_vfs_ioctl+0x53f/0x980 fs/ioctl.c:607
>        [<     inline     >] SYSC_ioctl fs/ioctl.c:622
>        [<ffffffff812fe3df>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
>        [<ffffffff82175fd1>] entry_SYSCALL_64_fastpath+0x31/0x9a
> arch/x86/entry/entry_64.S:187
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&tty->termios_rwsem);
>                                lock(&tty->atomic_write_lock);
>                                lock(&tty->termios_rwsem);
>   lock(&tty->atomic_write_lock);
>
>  *** DEADLOCK ***

Thanks for the report, Dmitry.
Please re-test with the accompanying patch.

Regards,
Peter Hurley
diaevd referenced this pull request in diaevd/zen-kernel Dec 30, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Jan 28, 2016
Starting in 4.5-rc1, I noticed this warning on ipmi_si driver removal:

% modprobe ipmi_si
% rmmod ipmi_si

bus: 'platform': driver_probe_device: matched device IPI0001:00 with driver ipmi_si
bus: 'platform': really_probe: probing driver ipmi_si with device IPI0001:00
ipmi_si IPI0001:00: ipmi_si: probing via ACPI
ipmi_si IPI0001:00: [io  0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0
ipmi_si: Adding ACPI-specified kcs state machine
driver: 'ipmi_si': driver_bound: bound to device 'IPI0001:00'
bus: 'platform': really_probe: bound device IPI0001:00 to driver ipmi_si
IPMI System Interface driver.
ipmi_si: probing via SMBIOS
ipmi_si: SMBIOS: io 0xda2 regsize 1 spacing 1 irq 0
ipmi_si: Adding SMBIOS-specified kcs state machine
ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
Registering platform device 'ipmi_bmc.0674.66'. Parent at platform
driver: 'ipmi': driver_bound: bound to device 'ipmi_bmc.0674.66'
ipmi_si IPI0001:00: Found new BMC (man_id: 0x000077, prod_id: 0x0674, dev_id: 0x42)
ipmi_si IPI0001:00: IPMI kcs interface initialized
------------[ cut here ]------------
WARNING: CPU: 39 PID: 3678 at drivers/base/power/common.c:150 dev_pm_domain_set+0x52/0x60()
PM domains can only be changed for unbound devices
[ ... snip ... ]
CPU: 39 PID: 3678 Comm: rmmod Tainted: G           OE   4.5.0-rc1+ torvalds#57
Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.1:61 09/10/2015
0000000000000000 000000003883b68d ffff8820334d7d30 ffffffff8132caf0
ffff8820334d7d78 ffff8820334d7d68 ffffffff8107f1b6 ffff8810351eca00
0000000000000000 0000000000000001 0000000001c2a330 0000000001c29010
Call Trace:
[<ffffffff8132caf0>] dump_stack+0x44/0x64
[<ffffffff8107f1b6>] warn_slowpath_common+0x86/0xc0
[<ffffffff8107f24c>] warn_slowpath_fmt+0x5c/0x80
[<ffffffff8145e152>] dev_pm_domain_set+0x52/0x60
[<ffffffff813a38ae>] acpi_dev_pm_detach+0x3f/0x84
[<ffffffff8145e0d7>] dev_pm_domain_detach+0x27/0x30
[<ffffffff814575f8>] platform_drv_remove+0x38/0x40
[<ffffffff814557ba>] __device_release_driver+0x9a/0x140
[<ffffffff81455968>] driver_detach+0xb8/0xc0
[<ffffffff814547b5>] bus_remove_driver+0x55/0xd0
[<ffffffff814560cc>] driver_unregister+0x2c/0x50
[<ffffffff814576b2>] platform_driver_unregister+0x12/0x20
[<ffffffffa02586c9>] cleanup_ipmi_si+0x29/0xa0 [ipmi_si]
[<ffffffff81102100>] SyS_delete_module+0x190/0x220
[<ffffffff8167ffee>] entry_SYSCALL_64_fastpath+0x12/0x71
---[ end trace 671ca97b9ac15462 ]---

My platform has two BMCs (perhaps this is messing with a refcount
somewhere), but I wonder about the ordering of this code:

__device_release_driver(struct device *dev)

  drv->remove(dev);
    [ platform_drv_remove ]
      ...
      dev_pm_domain_detach
        device_is_bound
          return dev->p && klist_node_attached(&dev->p->knode_driver)
  ...
  klist_remove(&dev->p->knode_driver);

Is the klist_remove at the bottom of __device_release_driver necessary
to satisfy the earlier check in dev_pm_domain_detach's device_is_bound
assertion?  If so, could these be out of order?

This is core driver code, so I'm assuming it's not something as simple
as the following (which avoided the warning on unload at least).  Any
suggestions or extra debugging ideas welcome!  This occurs on every
unload, so I'd be glad to test real solutions :)

Thanks,

-- Joe

-->8--
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Jan 28, 2016
WARNING: 'occuring' may be misspelled - perhaps 'occurring'?
torvalds#57: FILE: mm/Kconfig.debug:74:
+	   zeros. This makes it harder to detect when errors are occuring

total: 0 errors, 1 warnings, 47 lines checked

./patches/mm-page_poisoningc-allow-for-zero-poisoning.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Feb 4, 2016
WARNING: 'occuring' may be misspelled - perhaps 'occurring'?
torvalds#57: FILE: mm/Kconfig.debug:74:
+	   zeros. This makes it harder to detect when errors are occuring

total: 0 errors, 1 warnings, 47 lines checked

./patches/mm-page_poisoningc-allow-for-zero-poisoning.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Feb 22, 2016
WARNING: 'occuring' may be misspelled - perhaps 'occurring'?
torvalds#57: FILE: mm/Kconfig.debug:74:
+	   zeros. This makes it harder to detect when errors are occuring

total: 0 errors, 1 warnings, 47 lines checked

./patches/mm-page_poisoningc-allow-for-zero-poisoning.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Feb 29, 2016
WARNING: 'occuring' may be misspelled - perhaps 'occurring'?
torvalds#57: FILE: mm/Kconfig.debug:74:
+	   zeros. This makes it harder to detect when errors are occuring

total: 0 errors, 1 warnings, 47 lines checked

./patches/mm-page_poisoningc-allow-for-zero-poisoning.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Feb 29, 2016
WARNING: 'occuring' may be misspelled - perhaps 'occurring'?
torvalds#57: FILE: mm/Kconfig.debug:74:
+	   zeros. This makes it harder to detect when errors are occuring

total: 0 errors, 1 warnings, 47 lines checked

./patches/mm-page_poisoningc-allow-for-zero-poisoning.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request May 10, 2016
Hi,

Using checkpatch.pl on the forwarded patch results in:

WARNING: please write a paragraph that describes the config symbol fully
torvalds#57: FILE: mm/Kconfig:451:
+       config OVERCOMMIT_GUESS

WARNING: please write a paragraph that describes the config symbol fully
torvalds#64: FILE: mm/Kconfig:458:
+       config OVERCOMMIT_ALWAYS

but there is a 'help' section for those 'config' sections.
NOTE: I followed the same indentation than the code laying just above the place where I inserted mine.

I think it is a false positive, what do you think?

Best regards,

Sebastian

Message-ID: <5731CC6E.3080807@laposte.net>
Date: Tue, 10 May 2016 13:56:30 +0200
From: Sebastian Frias <sf84@laposte.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
To: <linux-mm@kvack.org>, Andrew Morton <akpm@linux-foundation.org>, "Michal
 Hocko" <mhocko@suse.com>, Linus Torvalds <torvalds@linux-foundation.org>
CC: LKML <linux-kernel@vger.kernel.org>, mason <slash.tmp@free.fr>
Subject: [PATCH] mm: add config option to select the initial overcommit mode
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

Currently the initial value of the overcommit mode is OVERCOMMIT_GUESS.
However, on embedded systems it is usually better to disable overcommit
to avoid waking up the OOM-killer and its well known undesirable
side-effects.

This config option allows to setup the initial overcommit mode to any of
the 3 available values, OVERCOMMIT_GUESS (which remains as default),
OVERCOMMIT_ALWAYS and OVERCOMMIT_NEVER.
The overcommit mode can still be changed thru sysctl after the system
boots up.

This config option depends on CONFIG_EXPERT.
This patch does not introduces functional changes.

Signed-off-by: Sebastian Frias <sf84@laposte.net>
prabhakarlad added a commit to prabhakarlad/linux that referenced this pull request Nov 21, 2023
We should be probing for IOCP during boot stage only. As we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing IOCP during boot stage only.

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 21, 2023
We should be probing for IOCP during boot stage only. As we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing IOCP during boot stage only.

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 22, 2023
When I perform the following test operations:
1.ip link add br0 type bridge
2.brctl addif br0 eth0
3.ip addr add 239.0.0.1/32 dev eth0
4.ip addr add 239.0.0.1/32 dev br0
5.ip addr add 224.0.0.1/32 dev br0
6.while ((1))
    do
        ifconfig br0 up
        ifconfig br0 down
    done
7.send IGMPv2 query packets to port eth0 continuously. For example,
./mausezahn ethX -c 0 "01 00 5e 00 00 01 00 72 19 88 aa 02 08 00 45 00 00
1c 00 01 00 00 01 02 0e 7f c0 a8 0a b7 e0 00 00 01 11 64 ee 9b 00 00 00 00"

The preceding tests may trigger the refcnt uaf issue of the mc list. The
stack is as follows:
	refcount_t: addition on 0; use-after-free.
	WARNING: CPU: 21 PID: 144 at lib/refcount.c:25 refcount_warn_saturate+0x78/0x110
	CPU: 21 PID: 144 Comm: ksoftirqd/21 Kdump: loaded Not tainted 6.7.0-rc1-next-20231117-dirty torvalds#57
	RIP: 0010:refcount_warn_saturate+0x78/0x110
	Call Trace:
	<TASK>
	__warn+0x83/0x130
	refcount_warn_saturate+0x78/0x110
	igmp_start_timer
	igmp_mod_timer
	igmp_heard_query+0x221/0x690
	igmp_rcv+0xea/0x2f0
	ip_protocol_deliver_rcu+0x156/0x160
	ip_local_deliver_finish+0x77/0xa0
	__netif_receive_skb_one_core+0x8b/0xa0
	netif_receive_skb_internal+0x80/0xd0
	netif_receive_skb+0x18/0xc0
	br_handle_frame_finish+0x340/0x5c0 [bridge]
	nf_hook_bridge_pre+0x117/0x130 [bridge]
	__netif_receive_skb_core+0x241/0x1090
	__netif_receive_skb_list_core+0x13f/0x2e0
	__netif_receive_skb_list+0xfc/0x190
	netif_receive_skb_list_internal+0x102/0x1e0
	napi_gro_receive+0xd7/0x220
	e1000_clean_rx_irq+0x1d4/0x4f0 [e1000]
	e1000_clean+0x5e/0xe0 [e1000]
	__napi_poll+0x2c/0x1b0
	net_rx_action+0x2cb/0x3a0
	__do_softirq+0xcd/0x2a7
	run_ksoftirqd+0x22/0x30
	smpboot_thread_fn+0xdb/0x1d0
	kthread+0xe2/0x110
	ret_from_fork+0x34/0x50
	ret_from_fork_asm+0x1a/0x30
	</TASK>

The root causes are as follows:
Thread A					Thread B
...						netif_receive_skb
br_dev_stop					...
    br_multicast_leave_snoopers			...
        __ip_mc_dec_group			...
            __igmp_group_dropped		igmp_rcv
                igmp_stop_timer			    igmp_heard_query         //ref = 1
                ip_ma_put			        igmp_mod_timer
                    refcount_dec_and_test	            igmp_start_timer //ref = 0
			...                                     refcount_inc //ref increases from 0
When the device receives an IGMPv2 Query message, it starts the timer
immediately, regardless of whether the device is running. If the device is
down and has left the multicast group, it will cause the mc list refcount
uaf issue.

Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
prabhakarlad added a commit to prabhakarlad/linux that referenced this pull request Nov 24, 2023
We should be probing for IOCP during boot stage only. As we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing IOCP during boot stage only.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
prabhakarlad added a commit to prabhakarlad/linux that referenced this pull request Nov 24, 2023
We should be probing for IOCP during boot stage only. As we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing IOCP during boot stage only.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
prabhakarlad added a commit to prabhakarlad/linux that referenced this pull request Nov 30, 2023
We should be probing for IOCP during boot stage only. As we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing IOCP during boot stage only.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 30, 2023
We need to probe for IOCP only once during boot stage, as we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing for IOCP only once in boot stage by having a boolean variable
is_iocp_probe_done which will be set to true upon IOCP probe in
errata_probe_iocp() and we bail out early if is_iocp_probe_done is set.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 30, 2023
We need to probe for IOCP only once during boot stage, as we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing for IOCP only once in boot stage by having a boolean variable
"done" which will be set to true upon IOCP probe in errata_probe_iocp()
and we bail out early if "done" is set to true.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
prabhakarlad added a commit to prabhakarlad/linux that referenced this pull request Dec 1, 2023
We need to probe for IOCP only once during boot stage, as we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing for IOCP only once in boot stage by having a boolean variable
"done" which will be set to true upon IOCP probe in errata_probe_iocp()
and we bail out early if "done" is set to true.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
prabhakarlad added a commit to prabhakarlad/linux that referenced this pull request Dec 1, 2023
We need to probe for IOCP only once during boot stage, as we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing for IOCP only once in boot stage by having a boolean variable
"done" which will be set to true upon IOCP probe in errata_probe_iocp()
and we bail out early if "done" is set to true.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
prabhakarlad added a commit to prabhakarlad/linux that referenced this pull request Dec 4, 2023
We need to probe for IOCP only once during boot stage, as we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing for IOCP only once in boot stage by having a boolean variable
"done" which will be set to true upon IOCP probe in errata_probe_iocp()
and we bail out early if "done" is set to true.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 8, 2023
We need to probe for IOCP only once during boot stage, as we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing for IOCP only once in boot stage by having a boolean variable
"done" which will be set to true upon IOCP probe in errata_probe_iocp()
and we bail out early if "done" is set to true.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com>
Link: https://lore.kernel.org/r/20231130212647.108746-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
esmil pushed a commit to esmil/linux that referenced this pull request Dec 10, 2023
We need to probe for IOCP only once during boot stage, as we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing for IOCP only once in boot stage by having a boolean variable
"done" which will be set to true upon IOCP probe in errata_probe_iocp()
and we bail out early if "done" is set to true.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com>
Link: https://lore.kernel.org/r/20231130212647.108746-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
(cherry picked from commit ed5b7cf)
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
BoukeHaarsma23 pushed a commit to ChimeraOS/linux that referenced this pull request Dec 13, 2023
[ Upstream commit ed5b7cf ]

We need to probe for IOCP only once during boot stage, as we were probing
for IOCP for all the stages this caused the below issue during module-init
stage,

[9.019104] Unable to handle kernel paging request at virtual address ffffffff8100d3a0
[9.027153] Oops [#1]
[9.029421] Modules linked in: rcar_canfd renesas_usbhs i2c_riic can_dev spi_rspi i2c_core
[9.037686] CPU: 0 PID: 90 Comm: udevd Not tainted 6.7.0-rc1+ torvalds#57
[9.043756] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[9.050339] epc : riscv_noncoherent_supported+0x10/0x3e
[9.055558]  ra : andes_errata_patch_func+0x4a/0x52
[9.060418] epc : ffffffff8000d8c2 ra : ffffffff8000d95c sp : ffffffc8003abb00
[9.067607]  gp : ffffffff814e25a0 tp : ffffffd80361e540 t0 : 0000000000000000
[9.074795]  t1 : 000000000900031e t2 : 0000000000000001 s0 : ffffffc8003abb20
[9.081984]  s1 : ffffffff015b57c7 a0 : 0000000000000000 a1 : 0000000000000001
[9.089172]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff8100d8be
[9.096360]  a5 : 0000000000000001 a6 : 0000000000000001 a7 : 000000000900031e
[9.103548]  s2 : ffffffff015b57d7 s3 : 0000000000000001 s4 : 000000000000031e
[9.110736]  s5 : 8000000000008a45 s6 : 0000000000000500 s7 : 000000000000003f
[9.117924]  s8 : ffffffc8003abd48 s9 : ffffffff015b1140 s10: ffffffff8151a1b0
[9.125113]  s11: ffffffff015b1000 t3 : 0000000000000001 t4 : fefefefefefefeff
[9.132301]  t5 : ffffffff015b57c7 t6 : ffffffd8b63a6000
[9.137587] status: 0000000200000120 badaddr: ffffffff8100d3a0 cause: 000000000000000f
[9.145468] [<ffffffff8000d8c2>] riscv_noncoherent_supported+0x10/0x3e
[9.151972] [<ffffffff800027e8>] _apply_alternatives+0x84/0x86
[9.157784] [<ffffffff800029be>] apply_module_alternatives+0x10/0x1a
[9.164113] [<ffffffff80008fcc>] module_finalize+0x5e/0x7a
[9.169583] [<ffffffff80085cd6>] load_module+0xfd8/0x179c
[9.174965] [<ffffffff80086630>] init_module_from_file+0x76/0xaa
[9.180948] [<ffffffff800867f6>] __riscv_sys_finit_module+0x176/0x2a8
[9.187365] [<ffffffff80889862>] do_trap_ecall_u+0xbe/0x130
[9.192922] [<ffffffff808920bc>] ret_from_exception+0x0/0x64
[9.198573] Code: 0009 b7e9 6797 014d a783 85a7 c799 4785 0717 0100 (0123) aef7
[9.205994] ---[ end trace 0000000000000000 ]---

This is because we called riscv_noncoherent_supported() for all the stages
during IOCP probe. riscv_noncoherent_supported() function sets
noncoherent_supported variable to true which has an annotation set to
"__ro_after_init" due to which we were seeing the above splat. Fix this by
probing for IOCP only once in boot stage by having a boolean variable
"done" which will be set to true upon IOCP probe in errata_probe_iocp()
and we bail out early if "done" is set to true.

While at it make return type of errata_probe_iocp() to void as we were
not checking the return value in andes_errata_patch_func().

Fixes: e021ae7 ("riscv: errata: Add Andes alternative ports")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com>
Link: https://lore.kernel.org/r/20231130212647.108746-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
gyroninja added a commit to gyroninja/linux that referenced this pull request Jan 28, 2024
KSAN calls into rcu code which then triggers a write that reenters into KSAN
getting the system stuck doing infinite recursion.

#0  kmsan_get_context () at mm/kmsan/kmsan.h:106
#1  __msan_get_context_state () at mm/kmsan/instrumentation.c:331
#2  0xffffffff81495671 in get_current () at ./arch/x86/include/asm/current.h:42
#3  rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
#4  __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
#5  0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#6  pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#7  kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#8  virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#9  0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#10 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#11 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#12 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#13 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#14 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#15 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#16 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#17 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#18 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#19 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#20 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#21 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#22 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#23 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#24 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#25 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#26 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#27 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#28 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#29 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#30 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#31 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#32 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#33 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#34 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#35 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#36 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#37 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#38 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#39 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#40 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#41 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#42 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#43 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#44 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#45 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#46 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#47 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#48 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#49 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#50 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#51 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
#52 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
#53 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#54 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#55 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#56 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#57 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
#58 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#59 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#60 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#61 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#62 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#63 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#64 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#65 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#66 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#67 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#68 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#69 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
#70 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#71 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#72 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#73 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#74 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#75 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#76 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#77 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff86203c90) at ./arch/x86/include/asm/kmsan.h:82
torvalds#78 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff86203c90) at mm/kmsan/shadow.c:75
torvalds#79 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff86203c90, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#80 kmsan_get_shadow_origin_ptr (address=0xffffffff86203c90, size=8, store=false) at mm/kmsan/shadow.c:97
torvalds#81 0xffffffff81b1dc72 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=8, store=false) at mm/kmsan/instrumentation.c:36
torvalds#82 __msan_metadata_ptr_for_load_8 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:92
torvalds#83 0xffffffff814fdb9e in filter_irq_stacks (entries=<optimized out>, nr_entries=4) at kernel/stacktrace.c:397
torvalds#84 0xffffffff829520e8 in stack_depot_save_flags (entries=0xffffffff8620d974 <init_task+1012>, nr_entries=4, alloc_flags=0, depot_flags=0) at lib/stackdepot.c:500
torvalds#85 0xffffffff81b1e560 in __msan_poison_alloca (address=0xffffffff86203da0, size=24, descr=<optimized out>) at mm/kmsan/instrumentation.c:285
torvalds#86 0xffffffff8562821c in _printk (fmt=0xffffffff85f191a5 "\0016Attempting lock1") at kernel/printk/printk.c:2324
torvalds#87 0xffffffff81942aa2 in kmem_cache_create_usercopy (name=0xffffffff85f18903 "mm_struct", size=1296, align=0, flags=270336, useroffset=<optimized out>, usersize=<optimized out>, ctor=0x0 <fixed_percpu_data>) at mm/slab_common.c:296
torvalds#88 0xffffffff86f337a0 in mm_cache_init () at kernel/fork.c:3262
torvalds#89 0xffffffff86eacb8e in start_kernel () at init/main.c:932
torvalds#90 0xffffffff86ecdf94 in x86_64_start_reservations (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:555
torvalds#91 0xffffffff86ecde9b in x86_64_start_kernel (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:536
torvalds#92 0xffffffff810001d3 in secondary_startup_64 () at /pool/workspace/linux/arch/x86/kernel/head_64.S:461
torvalds#93 0x0000000000000000 in ??
gyroninja added a commit to gyroninja/linux that referenced this pull request Jan 28, 2024
As of 5ec8e8e(mm/sparsemem: fix race in accessing memory_section->usage) KMSAN
now calls into RCU tree code during kmsan_get_metadata. This will trigger a
write that will reenter into KMSAN getting the system stuck doing infinite
recursion.

#0  kmsan_get_context () at mm/kmsan/kmsan.h:106
#1  __msan_get_context_state () at mm/kmsan/instrumentation.c:331
#2  0xffffffff81495671 in get_current () at ./arch/x86/include/asm/current.h:42
#3  rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
#4  __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
#5  0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#6  pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#7  kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#8  virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#9  0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#10 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#11 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#12 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#13 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#14 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#15 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#16 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#17 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#18 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#19 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#20 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#21 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#22 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#23 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#24 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#25 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#26 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#27 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#28 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#29 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#30 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#31 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#32 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#33 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#34 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#35 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#36 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#37 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#38 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#39 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#40 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#41 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#42 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#43 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#44 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#45 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#46 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#47 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#48 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#49 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#50 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#51 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
#52 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
#53 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#54 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#55 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#56 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#57 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
#58 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#59 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#60 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#61 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#62 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#63 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#64 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#65 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#66 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#67 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#68 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#69 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
#70 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#71 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#72 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#73 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#74 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#75 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#76 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#77 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff86203c90) at ./arch/x86/include/asm/kmsan.h:82
torvalds#78 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff86203c90) at mm/kmsan/shadow.c:75
torvalds#79 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff86203c90, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#80 kmsan_get_shadow_origin_ptr (address=0xffffffff86203c90, size=8, store=false) at mm/kmsan/shadow.c:97
torvalds#81 0xffffffff81b1dc72 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=8, store=false) at mm/kmsan/instrumentation.c:36
torvalds#82 __msan_metadata_ptr_for_load_8 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:92
torvalds#83 0xffffffff814fdb9e in filter_irq_stacks (entries=<optimized out>, nr_entries=4) at kernel/stacktrace.c:397
torvalds#84 0xffffffff829520e8 in stack_depot_save_flags (entries=0xffffffff8620d974 <init_task+1012>, nr_entries=4, alloc_flags=0, depot_flags=0) at lib/stackdepot.c:500
torvalds#85 0xffffffff81b1e560 in __msan_poison_alloca (address=0xffffffff86203da0, size=24, descr=<optimized out>) at mm/kmsan/instrumentation.c:285
torvalds#86 0xffffffff8562821c in _printk (fmt=0xffffffff85f191a5 "\0016Attempting lock1") at kernel/printk/printk.c:2324
torvalds#87 0xffffffff81942aa2 in kmem_cache_create_usercopy (name=0xffffffff85f18903 "mm_struct", size=1296, align=0, flags=270336, useroffset=<optimized out>, usersize=<optimized out>, ctor=0x0 <fixed_percpu_data>) at mm/slab_common.c:296
torvalds#88 0xffffffff86f337a0 in mm_cache_init () at kernel/fork.c:3262
torvalds#89 0xffffffff86eacb8e in start_kernel () at init/main.c:932
torvalds#90 0xffffffff86ecdf94 in x86_64_start_reservations (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:555
torvalds#91 0xffffffff86ecde9b in x86_64_start_kernel (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:536
torvalds#92 0xffffffff810001d3 in secondary_startup_64 () at /pool/workspace/linux/arch/x86/kernel/head_64.S:461
torvalds#93 0x0000000000000000 in ??
jonhunter pushed a commit to jonhunter/linux that referenced this pull request Apr 8, 2024
During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Apr 29, 2024
[ Upstream commit f221033 ]

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Apr 30, 2024
[ Upstream commit f221033 ]

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Apr 30, 2024
[ Upstream commit f221033 ]

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request May 1, 2024
[ Upstream commit f221033 ]

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request May 2, 2024
[ Upstream commit f221033 ]

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request May 2, 2024
[ Upstream commit f221033 ]

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request May 2, 2024
[ Upstream commit f221033 ]

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
ptr1337 pushed a commit to CachyOS/linux that referenced this pull request May 2, 2024
[ Upstream commit f221033 ]

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ torvalds#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 12, 2024
If there is a symbol link in the given patch, checkpatch.pl reports two
inaccurate warnings:

$ cat 0001-selftests-bpf-Add-mptcp-pm_nl_ctl-link.patch

 ... ...

 '''
 # diff --git a/tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c \
 #            b/tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c
 # new file mode 120000
 # index 000000000000..5a08c255b278
 # --- /dev/null
 # +++ b/tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c
 # @@ -0,0 +1 @@
 # +../net/mptcp/pm_nl_ctl.c
 # \ No newline at end of file
 '''

$ ./scripts/checkpatch.pl 0001-selftests-bpf-Add-mptcp-pm_nl_ctl-link.patch

 '''
 WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
 torvalds#57: FILE: tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c:1:
 +../net/mptcp/pm_nl_ctl.c

 WARNING: adding a line without newline at end of file
 torvalds#57: FILE: tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c:1:
 +../net/mptcp/pm_nl_ctl.c

 total: 0 errors, 2 warnings, 16 lines checked
 '''

This patch fixes this by adding a new variable $symbol_link in checkpatch
script, set it if the new file mode is 120000. Skip these two checks
"missing SPDX-License-Identifier" and "adding a line without newline at
end of file" if this variable is set.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 13, 2024
If there is a symbol link in the given patch, checkpatch.pl reports two
inaccurate warnings:

$ cat 0001-selftests-bpf-Add-mptcp-pm_nl_ctl-link.patch

 ... ...

 '''
 # diff --git a/tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c \
 #            b/tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c
 # new file mode 120000
 # index 000000000000..5a08c255b278
 # --- /dev/null
 # +++ b/tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c
 # @@ -0,0 +1 @@
 # +../net/mptcp/pm_nl_ctl.c
 # \ No newline at end of file
 '''

$ ./scripts/checkpatch.pl 0001-selftests-bpf-Add-mptcp-pm_nl_ctl-link.patch

 '''
 WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
 torvalds#57: FILE: tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c:1:
 +../net/mptcp/pm_nl_ctl.c

 WARNING: adding a line without newline at end of file
 torvalds#57: FILE: tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c:1:
 +../net/mptcp/pm_nl_ctl.c

 total: 0 errors, 2 warnings, 16 lines checked
 '''

This patch fixes this by adding a new variable $symbol_link in checkpatch
script, set it if the new file mode is 120000. Skip these two checks
"missing SPDX-License-Identifier" and "adding a line without newline at
end of file" if this variable is set.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Oct 2, 2024
[ Upstream commit 8206e4e ]

BPF CI reported that selftest deny_namespace failed with s390x.

  test_unpriv_userns_create_no_bpf:PASS:no-bpf unpriv new user ns 0 nsec
  test_deny_namespace:PASS:skel load 0 nsec
  libbpf: prog 'test_userns_create': failed to attach: ERROR: strerror_r(-524)=22
  libbpf: prog 'test_userns_create': failed to auto-attach: -524
  test_deny_namespace:FAIL:attach unexpected error: -524 (errno 524)
  torvalds#57/1    deny_namespace/unpriv_userns_create_no_bpf:FAIL
  torvalds#57      deny_namespace:FAIL

BPF program test_userns_create is a BPF LSM type program which is
based on trampoline and s390x does not support s390x. Let add the
test to x390x deny list to avoid this failure in BPF CI.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221006053429.3549165-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: aa8ebb2 ("selftests/bpf: Workaround strict bpf_lsm return value check.")
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Oct 8, 2024
[ Upstream commit 8206e4e ]

BPF CI reported that selftest deny_namespace failed with s390x.

  test_unpriv_userns_create_no_bpf:PASS:no-bpf unpriv new user ns 0 nsec
  test_deny_namespace:PASS:skel load 0 nsec
  libbpf: prog 'test_userns_create': failed to attach: ERROR: strerror_r(-524)=22
  libbpf: prog 'test_userns_create': failed to auto-attach: -524
  test_deny_namespace:FAIL:attach unexpected error: -524 (errno 524)
  torvalds#57/1    deny_namespace/unpriv_userns_create_no_bpf:FAIL
  torvalds#57      deny_namespace:FAIL

BPF program test_userns_create is a BPF LSM type program which is
based on trampoline and s390x does not support s390x. Let add the
test to x390x deny list to avoid this failure in BPF CI.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221006053429.3549165-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: aa8ebb2 ("selftests/bpf: Workaround strict bpf_lsm return value check.")
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Oct 8, 2024
[ Upstream commit 8206e4e ]

BPF CI reported that selftest deny_namespace failed with s390x.

  test_unpriv_userns_create_no_bpf:PASS:no-bpf unpriv new user ns 0 nsec
  test_deny_namespace:PASS:skel load 0 nsec
  libbpf: prog 'test_userns_create': failed to attach: ERROR: strerror_r(-524)=22
  libbpf: prog 'test_userns_create': failed to auto-attach: -524
  test_deny_namespace:FAIL:attach unexpected error: -524 (errno 524)
  torvalds#57/1    deny_namespace/unpriv_userns_create_no_bpf:FAIL
  torvalds#57      deny_namespace:FAIL

BPF program test_userns_create is a BPF LSM type program which is
based on trampoline and s390x does not support s390x. Let add the
test to x390x deny list to avoid this failure in BPF CI.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221006053429.3549165-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: aa8ebb2 ("selftests/bpf: Workaround strict bpf_lsm return value check.")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant