Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge pull request #1 from torvalds/master #85

Closed
wants to merge 1 commit into from

Conversation

e-mailky
Copy link

pull from torvalds

@e-mailky e-mailky closed this Mar 31, 2014
brianlilly pushed a commit to crystalfontz/cfa_10036_kernel that referenced this pull request Apr 1, 2014
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   torvalds#6   torvalds#7
[    0.603005] .... node   #1, CPUs:     torvalds#8   torvalds#9  torvalds#10  torvalds#11  torvalds#12  torvalds#13  torvalds#14  torvalds#15
[    1.200005] .... node   #2, CPUs:    torvalds#16  torvalds#17  torvalds#18  torvalds#19  torvalds#20  torvalds#21  torvalds#22  torvalds#23
[    1.796005] .... node   #3, CPUs:    torvalds#24  torvalds#25  torvalds#26  torvalds#27  torvalds#28  torvalds#29  torvalds#30  torvalds#31
[    2.393005] .... node   #4, CPUs:    torvalds#32  torvalds#33  torvalds#34  torvalds#35  torvalds#36  torvalds#37  torvalds#38  torvalds#39
[    2.996005] .... node   #5, CPUs:    torvalds#40  torvalds#41  torvalds#42  torvalds#43  torvalds#44  torvalds#45  torvalds#46  torvalds#47
[    3.600005] .... node   torvalds#6, CPUs:    torvalds#48  torvalds#49  torvalds#50  torvalds#51  #52  #53  torvalds#54  torvalds#55
[    4.202005] .... node   torvalds#7, CPUs:    torvalds#56  torvalds#57  #58  torvalds#59  torvalds#60  torvalds#61  torvalds#62  torvalds#63
[    4.811005] .... node   torvalds#8, CPUs:    torvalds#64  torvalds#65  torvalds#66  torvalds#67  torvalds#68  torvalds#69  #70  torvalds#71
[    5.421006] .... node   torvalds#9, CPUs:    torvalds#72  torvalds#73  torvalds#74  torvalds#75  torvalds#76  torvalds#77  torvalds#78  torvalds#79
[    6.032005] .... node  torvalds#10, CPUs:    torvalds#80  torvalds#81  torvalds#82  torvalds#83  torvalds#84  torvalds#85  torvalds#86  torvalds#87
[    6.648006] .... node  torvalds#11, CPUs:    torvalds#88  torvalds#89  torvalds#90  torvalds#91  torvalds#92  torvalds#93  torvalds#94  torvalds#95
[    7.262005] .... node  torvalds#12, CPUs:    torvalds#96  torvalds#97  torvalds#98  torvalds#99 torvalds#100 torvalds#101 torvalds#102 torvalds#103
[    7.865005] .... node  torvalds#13, CPUs:   torvalds#104 torvalds#105 torvalds#106 torvalds#107 torvalds#108 torvalds#109 torvalds#110 torvalds#111
[    8.466005] .... node  torvalds#14, CPUs:   torvalds#112 torvalds#113 torvalds#114 torvalds#115 torvalds#116 torvalds#117 torvalds#118 torvalds#119
[    9.073006] .... node  torvalds#15, CPUs:   torvalds#120 torvalds#121 torvalds#122 torvalds#123 torvalds#124 torvalds#125 torvalds#126 torvalds#127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
brianlilly pushed a commit to crystalfontz/cfa_10036_kernel that referenced this pull request Apr 1, 2014
Fixing the below dump:

root@freescale ~$ modprobe g_serial
g_serial gadget: Gadget Serial v2.4
g_serial gadget: g_serial ready
BUG: sleeping function called from invalid context at /home/b29397/work/projects/upstream/usb/usb/drivers/base/power/runtime.c:952
in_atomic(): 1, irqs_disabled(): 128, pid: 805, name: modprobe
2 locks held by modprobe/805:
 #0:  (udc_lock){+.+.+.}, at: [<7f000a74>] usb_gadget_probe_driver+0x44/0xb4 [udc_core]
 #1:  (&(&ci->lock)->rlock){......}, at: [<7f033488>] ci_udc_start+0x94/0x110 [ci_hdrc]
irq event stamp: 3878
hardirqs last  enabled at (3877): [<806b6720>] _raw_spin_unlock_irqrestore+0x40/0x6c
hardirqs last disabled at (3878): [<806b6474>] _raw_spin_lock_irqsave+0x2c/0xa8
softirqs last  enabled at (3872): [<8002ec0c>] __do_softirq+0x1c8/0x2e8
softirqs last disabled at (3857): [<8002f180>] irq_exit+0xbc/0x110
CPU: 0 PID: 805 Comm: modprobe Not tainted 3.11.0-next-20130910+ torvalds#85
[<80016b94>] (unwind_backtrace+0x0/0xf8) from [<80012e0c>] (show_stack+0x20/0x24)
[<80012e0c>] (show_stack+0x20/0x24) from [<806af554>] (dump_stack+0x9c/0xc4)
[<806af554>] (dump_stack+0x9c/0xc4) from [<8005940c>] (__might_sleep+0xf4/0x134)
[<8005940c>] (__might_sleep+0xf4/0x134) from [<803a04a4>] (__pm_runtime_resume+0x94/0xa0)
[<803a04a4>] (__pm_runtime_resume+0x94/0xa0) from [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc])
[<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) from [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core])
[<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) from [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core])
[<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) from [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite])
[<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) from [<7f04b168>] (init+0x8c/0xb4 [g_serial])
[<7f04b168>] (init+0x8c/0xb4 [g_serial]) from [<800088e8>] (do_one_initcall+0x108/0x16c)
[<800088e8>] (do_one_initcall+0x108/0x16c) from [<8008e518>] (load_module+0x1b00/0x20a4)
[<8008e518>] (load_module+0x1b00/0x20a4) from [<8008eba8>] (SyS_init_module+0xec/0x100)
[<8008eba8>] (SyS_init_module+0xec/0x100) from [<8000ec40>] (ret_fast_syscall+0x0/0x48)

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
liubogithub pushed a commit to liubogithub/btrfs-work that referenced this pull request Apr 2, 2014
The @cn is stay in @clk_notifier_list after it is freed, it cause
memory corruption.

Example, if @clk is registered(first), unregistered(first),
registered(second), unregistered(second).

The freed @cn will be used when @clk is registered(second),
and the bug will be happened when @clk is unregistered(second):

[  517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister()
[  517.040000] Unable to handle kernel paging request at virtual address 00df3008
[  517.050000] pgd = ed858000
[  517.050000] [00df3008] *pgd=00000000
[  517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg]
[  517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G           O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85
[  517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000
[  517.060000] PC is at srcu_readers_seq_idx+0x48/0x84
[  517.060000] LR is at srcu_readers_seq_idx+0x60/0x84
[  517.060000] pc : [<c0052720>]    lr : [<c0052738>]    psr: 80070013
[  517.060000] sp : ee3e7d48  ip : 00000000  fp : ee3e7d6c
[  517.060000] r10: 00000000  r9 : ee3e6000  r8 : 00000000
[  517.060000] r7 : ed84fe4c  r6 : c068ec90  r5 : c068e430  r4 : 00000000
[  517.060000] r3 : 00df3000  r2 : 00000000  r1 : 00000002  r0 : 00000000
[  517.060000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  517.060000] Control: 18c5387d  Table: 2d85804a  DAC: 00000015
[  517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238)
[  517.060000] Stack: (0xee3e7d48 to 0xee3e8000)
....
[  517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc)
[  517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114)
[  517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac)
[  517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34)
[  517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74)
[  517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0)
[  517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg])
[  517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28)
[  517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4)
[  517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4)
[  517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc)
[  517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78)
[  517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20)
[  517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg])
[  517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c)
[  517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
[  517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101)

Cc: stable@kernel.org
Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: shortened $SUBJECT]
gregnietsky pushed a commit to Distrotech/linux that referenced this pull request Apr 9, 2014
commit 72b5322 upstream.

The @cn is stay in @clk_notifier_list after it is freed, it cause
memory corruption.

Example, if @clk is registered(first), unregistered(first),
registered(second), unregistered(second).

The freed @cn will be used when @clk is registered(second),
and the bug will be happened when @clk is unregistered(second):

[  517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister()
[  517.040000] Unable to handle kernel paging request at virtual address 00df3008
[  517.050000] pgd = ed858000
[  517.050000] [00df3008] *pgd=00000000
[  517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg]
[  517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G           O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85
[  517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000
[  517.060000] PC is at srcu_readers_seq_idx+0x48/0x84
[  517.060000] LR is at srcu_readers_seq_idx+0x60/0x84
[  517.060000] pc : [<c0052720>]    lr : [<c0052738>]    psr: 80070013
[  517.060000] sp : ee3e7d48  ip : 00000000  fp : ee3e7d6c
[  517.060000] r10: 00000000  r9 : ee3e6000  r8 : 00000000
[  517.060000] r7 : ed84fe4c  r6 : c068ec90  r5 : c068e430  r4 : 00000000
[  517.060000] r3 : 00df3000  r2 : 00000000  r1 : 00000002  r0 : 00000000
[  517.060000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  517.060000] Control: 18c5387d  Table: 2d85804a  DAC: 00000015
[  517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238)
[  517.060000] Stack: (0xee3e7d48 to 0xee3e8000)
....
[  517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc)
[  517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114)
[  517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac)
[  517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34)
[  517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74)
[  517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0)
[  517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg])
[  517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28)
[  517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4)
[  517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4)
[  517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc)
[  517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78)
[  517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20)
[  517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg])
[  517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c)
[  517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
[  517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101)

Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: shortened $SUBJECT]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gregnietsky pushed a commit to Distrotech/linux that referenced this pull request Apr 9, 2014
commit 72b5322 upstream.

The @cn is stay in @clk_notifier_list after it is freed, it cause
memory corruption.

Example, if @clk is registered(first), unregistered(first),
registered(second), unregistered(second).

The freed @cn will be used when @clk is registered(second),
and the bug will be happened when @clk is unregistered(second):

[  517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister()
[  517.040000] Unable to handle kernel paging request at virtual address 00df3008
[  517.050000] pgd = ed858000
[  517.050000] [00df3008] *pgd=00000000
[  517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg]
[  517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G           O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85
[  517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000
[  517.060000] PC is at srcu_readers_seq_idx+0x48/0x84
[  517.060000] LR is at srcu_readers_seq_idx+0x60/0x84
[  517.060000] pc : [<c0052720>]    lr : [<c0052738>]    psr: 80070013
[  517.060000] sp : ee3e7d48  ip : 00000000  fp : ee3e7d6c
[  517.060000] r10: 00000000  r9 : ee3e6000  r8 : 00000000
[  517.060000] r7 : ed84fe4c  r6 : c068ec90  r5 : c068e430  r4 : 00000000
[  517.060000] r3 : 00df3000  r2 : 00000000  r1 : 00000002  r0 : 00000000
[  517.060000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  517.060000] Control: 18c5387d  Table: 2d85804a  DAC: 00000015
[  517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238)
[  517.060000] Stack: (0xee3e7d48 to 0xee3e8000)
....
[  517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc)
[  517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114)
[  517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac)
[  517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34)
[  517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74)
[  517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0)
[  517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg])
[  517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28)
[  517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4)
[  517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4)
[  517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc)
[  517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78)
[  517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20)
[  517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg])
[  517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c)
[  517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
[  517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101)

Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: shortened $SUBJECT]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sr105 referenced this pull request in RealDigitalMedia/linux-shuttle Jun 3, 2014
Fixing the below dump:

root@freescale ~$ modprobe g_serial
g_serial gadget: Gadget Serial v2.4
g_serial gadget: g_serial ready
BUG: sleeping function called from invalid context at /home/b29397/work/projects/upstream/usb/usb/drivers/base/power/runtime.c:952
in_atomic(): 1, irqs_disabled(): 128, pid: 805, name: modprobe
2 locks held by modprobe/805:
 #0:  (udc_lock){+.+.+.}, at: [<7f000a74>] usb_gadget_probe_driver+0x44/0xb4 [udc_core]
 Freescale#1:  (&(&ci->lock)->rlock){......}, at: [<7f033488>] ci_udc_start+0x94/0x110 [ci_hdrc]
irq event stamp: 3878
hardirqs last  enabled at (3877): [<806b6720>] _raw_spin_unlock_irqrestore+0x40/0x6c
hardirqs last disabled at (3878): [<806b6474>] _raw_spin_lock_irqsave+0x2c/0xa8
softirqs last  enabled at (3872): [<8002ec0c>] __do_softirq+0x1c8/0x2e8
softirqs last disabled at (3857): [<8002f180>] irq_exit+0xbc/0x110
CPU: 0 PID: 805 Comm: modprobe Not tainted 3.11.0-next-20130910+ Freescale#85
[<80016b94>] (unwind_backtrace+0x0/0xf8) from [<80012e0c>] (show_stack+0x20/0x24)
[<80012e0c>] (show_stack+0x20/0x24) from [<806af554>] (dump_stack+0x9c/0xc4)
[<806af554>] (dump_stack+0x9c/0xc4) from [<8005940c>] (__might_sleep+0xf4/0x134)
[<8005940c>] (__might_sleep+0xf4/0x134) from [<803a04a4>] (__pm_runtime_resume+0x94/0xa0)
[<803a04a4>] (__pm_runtime_resume+0x94/0xa0) from [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc])
[<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) from [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core])
[<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) from [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core])
[<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) from [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite])
[<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) from [<7f04b168>] (init+0x8c/0xb4 [g_serial])
[<7f04b168>] (init+0x8c/0xb4 [g_serial]) from [<800088e8>] (do_one_initcall+0x108/0x16c)
[<800088e8>] (do_one_initcall+0x108/0x16c) from [<8008e518>] (load_module+0x1b00/0x20a4)
[<8008e518>] (load_module+0x1b00/0x20a4) from [<8008eba8>] (SyS_init_module+0xec/0x100)
[<8008eba8>] (SyS_init_module+0xec/0x100) from [<8000ec40>] (ret_fast_syscall+0x0/0x48)

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gnurou pushed a commit to Gnurou/linux that referenced this pull request Jun 27, 2014
When runing with the kernel(3.15-rc7+), the follow bug occurs:
[ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
[ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
[ 9969.441175] INFO: lockdep is turned off.
[ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ torvalds#85
[ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
[ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
[ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
[ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
[ 9969.974071] Call Trace:
[ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
[ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
[ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
[ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
[ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
[ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
[ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
[ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
[ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
[ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
[ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
[ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
[ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
[ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
[ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b

The cause is that cpuset_mems_allowed() try to take
mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in
__mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is
under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock
protection region to protect the access to cpuset only in
current_cpuset_is_being_rebound(). So that we can avoid this bug.

This patch is a temporary solution that just addresses the bug
mentioned above, can not fix the long-standing issue about cpuset.mems
rebinding on fork():

"When the forker's task_struct is duplicated (which includes
 ->mems_allowed) and it races with an update to cpuset_being_rebound
 in update_tasks_nodemask() then the task's mems_allowed doesn't get
 updated. And the child task's mems_allowed can be wrong if the
 cpuset's nodemask changes before the child has been added to the
 cgroup's tasklist."

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable <stable@vger.kernel.org>
heftig referenced this pull request in zen-kernel/zen-kernel Jul 18, 2014
commit 391acf9 upstream.

When runing with the kernel(3.15-rc7+), the follow bug occurs:
[ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
[ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
[ 9969.441175] INFO: lockdep is turned off.
[ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
[ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
[ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
[ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
[ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
[ 9969.974071] Call Trace:
[ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
[ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
[ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
[ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
[ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
[ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
[ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
[ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
[ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
[ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
[ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
[ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
[ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
[ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
[ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b

The cause is that cpuset_mems_allowed() try to take
mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in
__mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is
under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock
protection region to protect the access to cpuset only in
current_cpuset_is_being_rebound(). So that we can avoid this bug.

This patch is a temporary solution that just addresses the bug
mentioned above, can not fix the long-standing issue about cpuset.mems
rebinding on fork():

"When the forker's task_struct is duplicated (which includes
 ->mems_allowed) and it races with an update to cpuset_being_rebound
 in update_tasks_nodemask() then the task's mems_allowed doesn't get
 updated. And the child task's mems_allowed can be wrong if the
 cpuset's nodemask changes before the child has been added to the
 cgroup's tasklist."

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
n-aizu pushed a commit to n-aizu/linux that referenced this pull request Sep 10, 2014
… context

Fixing the below dump:

root@freescale ~$ modprobe g_serial
g_serial gadget: Gadget Serial v2.4
g_serial gadget: g_serial ready
BUG: sleeping function called from invalid context at /home/b29397/work/projects/upstream/usb/usb/drivers/base/power/runtime.c:952
in_atomic(): 1, irqs_disabled(): 128, pid: 805, name: modprobe
2 locks held by modprobe/805:
 #0:  (udc_lock){+.+.+.}, at: [<7f000a74>] usb_gadget_probe_driver+0x44/0xb4 [udc_core]
 #1:  (&(&ci->lock)->rlock){......}, at: [<7f033488>] ci_udc_start+0x94/0x110 [ci_hdrc]
irq event stamp: 3878
hardirqs last  enabled at (3877): [<806b6720>] _raw_spin_unlock_irqrestore+0x40/0x6c
hardirqs last disabled at (3878): [<806b6474>] _raw_spin_lock_irqsave+0x2c/0xa8
softirqs last  enabled at (3872): [<8002ec0c>] __do_softirq+0x1c8/0x2e8
softirqs last disabled at (3857): [<8002f180>] irq_exit+0xbc/0x110
CPU: 0 PID: 805 Comm: modprobe Not tainted 3.11.0-next-20130910+ torvalds#85
[<80016b94>] (unwind_backtrace+0x0/0xf8) from [<80012e0c>] (show_stack+0x20/0x24)
[<80012e0c>] (show_stack+0x20/0x24) from [<806af554>] (dump_stack+0x9c/0xc4)
[<806af554>] (dump_stack+0x9c/0xc4) from [<8005940c>] (__might_sleep+0xf4/0x134)
[<8005940c>] (__might_sleep+0xf4/0x134) from [<803a04a4>] (__pm_runtime_resume+0x94/0xa0)
[<803a04a4>] (__pm_runtime_resume+0x94/0xa0) from [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc])
[<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) from [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core])
[<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) from [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core])
[<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) from [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite])
[<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) from [<7f04b168>] (init+0x8c/0xb4 [g_serial])
[<7f04b168>] (init+0x8c/0xb4 [g_serial]) from [<800088e8>] (do_one_initcall+0x108/0x16c)
[<800088e8>] (do_one_initcall+0x108/0x16c) from [<8008e518>] (load_module+0x1b00/0x20a4)
[<8008e518>] (load_module+0x1b00/0x20a4) from [<8008eba8>] (SyS_init_module+0xec/0x100)
[<8008eba8>] (SyS_init_module+0xec/0x100) from [<8000ec40>] (ret_fast_syscall+0x0/0x48)

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pstglia pushed a commit to pstglia/linux that referenced this pull request Oct 6, 2014
When runing with the kernel(3.15-rc7+), the follow bug occurs:
[ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
[ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
[ 9969.441175] INFO: lockdep is turned off.
[ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ torvalds#85
[ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
[ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
[ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
[ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
[ 9969.974071] Call Trace:
[ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
[ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
[ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
[ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
[ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
[ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
[ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
[ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
[ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
[ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
[ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
[ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
[ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
[ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
[ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b

The cause is that cpuset_mems_allowed() try to take
mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in
__mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is
under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock
protection region to protect the access to cpuset only in
current_cpuset_is_being_rebound(). So that we can avoid this bug.

This patch is a temporary solution that just addresses the bug
mentioned above, can not fix the long-standing issue about cpuset.mems
rebinding on fork():

"When the forker's task_struct is duplicated (which includes
 ->mems_allowed) and it races with an update to cpuset_being_rebound
 in update_tasks_nodemask() then the task's mems_allowed doesn't get
 updated. And the child task's mems_allowed can be wrong if the
 cpuset's nodemask changes before the child has been added to the
 cgroup's tasklist."

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable <stable@vger.kernel.org>
chrillomat pushed a commit to chrillomat/linux that referenced this pull request Oct 6, 2014
Fixing the below dump:

root@freescale ~$ modprobe g_serial
g_serial gadget: Gadget Serial v2.4
g_serial gadget: g_serial ready
BUG: sleeping function called from invalid context at /home/b29397/work/projects/upstream/usb/usb/drivers/base/power/runtime.c:952
in_atomic(): 1, irqs_disabled(): 128, pid: 805, name: modprobe
2 locks held by modprobe/805:
 #0:  (udc_lock){+.+.+.}, at: [<7f000a74>] usb_gadget_probe_driver+0x44/0xb4 [udc_core]
 wandboard-org#1:  (&(&ci->lock)->rlock){......}, at: [<7f033488>] ci_udc_start+0x94/0x110 [ci_hdrc]
irq event stamp: 3878
hardirqs last  enabled at (3877): [<806b6720>] _raw_spin_unlock_irqrestore+0x40/0x6c
hardirqs last disabled at (3878): [<806b6474>] _raw_spin_lock_irqsave+0x2c/0xa8
softirqs last  enabled at (3872): [<8002ec0c>] __do_softirq+0x1c8/0x2e8
softirqs last disabled at (3857): [<8002f180>] irq_exit+0xbc/0x110
CPU: 0 PID: 805 Comm: modprobe Not tainted 3.11.0-next-20130910+ torvalds#85
[<80016b94>] (unwind_backtrace+0x0/0xf8) from [<80012e0c>] (show_stack+0x20/0x24)
[<80012e0c>] (show_stack+0x20/0x24) from [<806af554>] (dump_stack+0x9c/0xc4)
[<806af554>] (dump_stack+0x9c/0xc4) from [<8005940c>] (__might_sleep+0xf4/0x134)
[<8005940c>] (__might_sleep+0xf4/0x134) from [<803a04a4>] (__pm_runtime_resume+0x94/0xa0)
[<803a04a4>] (__pm_runtime_resume+0x94/0xa0) from [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc])
[<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) from [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core])
[<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) from [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core])
[<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) from [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite])
[<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) from [<7f04b168>] (init+0x8c/0xb4 [g_serial])
[<7f04b168>] (init+0x8c/0xb4 [g_serial]) from [<800088e8>] (do_one_initcall+0x108/0x16c)
[<800088e8>] (do_one_initcall+0x108/0x16c) from [<8008e518>] (load_module+0x1b00/0x20a4)
[<8008e518>] (load_module+0x1b00/0x20a4) from [<8008eba8>] (SyS_init_module+0xec/0x100)
[<8008eba8>] (SyS_init_module+0xec/0x100) from [<8000ec40>] (ret_fast_syscall+0x0/0x48)

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kees pushed a commit to kees/linux that referenced this pull request Oct 9, 2014
BugLink: http://bugs.launchpad.net/bugs/1356913

commit 391acf9 upstream.

When runing with the kernel(3.15-rc7+), the follow bug occurs:
[ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
[ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
[ 9969.441175] INFO: lockdep is turned off.
[ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ torvalds#85
[ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
[ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
[ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
[ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
[ 9969.974071] Call Trace:
[ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
[ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
[ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
[ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
[ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
[ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
[ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
[ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
[ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
[ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
[ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
[ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
[ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
[ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
[ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b

The cause is that cpuset_mems_allowed() try to take
mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in
__mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is
under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock
protection region to protect the access to cpuset only in
current_cpuset_is_being_rebound(). So that we can avoid this bug.

This patch is a temporary solution that just addresses the bug
mentioned above, can not fix the long-standing issue about cpuset.mems
rebinding on fork():

"When the forker's task_struct is duplicated (which includes
 ->mems_allowed) and it races with an update to cpuset_being_rebound
 in update_tasks_nodemask() then the task's mems_allowed doesn't get
 updated. And the child task's mems_allowed can be wrong if the
 cpuset's nodemask changes before the child has been added to the
 cgroup's tasklist."

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
hzhuang1 pushed a commit to hzhuang1/linux that referenced this pull request Jul 6, 2015
enbable vblank event and reserve 128MB memory for graphic usage
torvalds pushed a commit that referenced this pull request Oct 2, 2015
This is due to  commit 86839c5
"xen/block: add multi-page ring support"

When using an guest under UEFI - after the domain is destroyed
the following warning comes from blkback.

------------[ cut here ]------------
WARNING: CPU: 2 PID: 95 at
/home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
xen_blkif_deferred_free+0x1f4/0x1f8()
Modules linked in:
CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G        W       4.2.0 #85
Hardware name: APM X-Gene Mustang board (DT)
Workqueue: events xen_blkif_deferred_free
Call trace:
[<ffff8000000890a8>] dump_backtrace+0x0/0x124
[<ffff8000000891dc>] show_stack+0x10/0x1c
[<ffff8000007653bc>] dump_stack+0x78/0x98
[<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4
[<ffff800000097f80>] warn_slowpath_null+0x14/0x20
[<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8
[<ffff8000000ad020>] process_one_work+0x160/0x3b4
[<ffff8000000ad3b4>] worker_thread+0x140/0x494
[<ffff8000000b2e34>] kthread+0xd8/0xf0
---[ end trace 6f859b7883c88cdd ]---

Request allocation has been moved to connect_ring, which is called every
time blkback connects to the frontend (this can happen multiple times during
a blkback instance life cycle). On the other hand, request freeing has not
been moved, so it's only called when destroying the backend instance. Due to
this mismatch, blkback can allocate the request pool multiple times, without
freeing it.

In order to fix it, move the freeing of requests to xen_blkif_disconnect to
restore the symmetry between request allocation and freeing.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org
CC: stable@vger.kernel.org # 4.2
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
heftig referenced this pull request in zen-kernel/zen-kernel Oct 22, 2015
commit f929d42 upstream.

This is due to  commit 86839c5
"xen/block: add multi-page ring support"

When using an guest under UEFI - after the domain is destroyed
the following warning comes from blkback.

------------[ cut here ]------------
WARNING: CPU: 2 PID: 95 at
/home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
xen_blkif_deferred_free+0x1f4/0x1f8()
Modules linked in:
CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G        W       4.2.0 #85
Hardware name: APM X-Gene Mustang board (DT)
Workqueue: events xen_blkif_deferred_free
Call trace:
[<ffff8000000890a8>] dump_backtrace+0x0/0x124
[<ffff8000000891dc>] show_stack+0x10/0x1c
[<ffff8000007653bc>] dump_stack+0x78/0x98
[<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4
[<ffff800000097f80>] warn_slowpath_null+0x14/0x20
[<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8
[<ffff8000000ad020>] process_one_work+0x160/0x3b4
[<ffff8000000ad3b4>] worker_thread+0x140/0x494
[<ffff8000000b2e34>] kthread+0xd8/0xf0
---[ end trace 6f859b7883c88cdd ]---

Request allocation has been moved to connect_ring, which is called every
time blkback connects to the frontend (this can happen multiple times during
a blkback instance life cycle). On the other hand, request freeing has not
been moved, so it's only called when destroying the backend instance. Due to
this mismatch, blkback can allocate the request pool multiple times, without
freeing it.

In order to fix it, move the freeing of requests to xen_blkif_disconnect to
restore the symmetry between request allocation and freeing.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ddstreet pushed a commit to ddstreet/linux that referenced this pull request Dec 1, 2015
BugLink: http://bugs.launchpad.net/bugs/1509886

commit f929d42 upstream.

This is due to  commit 86839c5
"xen/block: add multi-page ring support"

When using an guest under UEFI - after the domain is destroyed
the following warning comes from blkback.

------------[ cut here ]------------
WARNING: CPU: 2 PID: 95 at
/home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
xen_blkif_deferred_free+0x1f4/0x1f8()
Modules linked in:
CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G        W       4.2.0 torvalds#85
Hardware name: APM X-Gene Mustang board (DT)
Workqueue: events xen_blkif_deferred_free
Call trace:
[<ffff8000000890a8>] dump_backtrace+0x0/0x124
[<ffff8000000891dc>] show_stack+0x10/0x1c
[<ffff8000007653bc>] dump_stack+0x78/0x98
[<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4
[<ffff800000097f80>] warn_slowpath_null+0x14/0x20
[<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8
[<ffff8000000ad020>] process_one_work+0x160/0x3b4
[<ffff8000000ad3b4>] worker_thread+0x140/0x494
[<ffff8000000b2e34>] kthread+0xd8/0xf0
---[ end trace 6f859b7883c88cdd ]---

Request allocation has been moved to connect_ring, which is called every
time blkback connects to the frontend (this can happen multiple times during
a blkback instance life cycle). On the other hand, request freeing has not
been moved, so it's only called when destroying the backend instance. Due to
this mismatch, blkback can allocate the request pool multiple times, without
freeing it.

In order to fix it, move the freeing of requests to xen_blkif_disconnect to
restore the symmetry between request allocation and freeing.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Brad Figg <brad.figg@canonical.com>
nkskjames referenced this pull request in nkskjames/linux Jan 13, 2016
This is due to  commit 86839c5
"xen/block: add multi-page ring support"

When using an guest under UEFI - after the domain is destroyed
the following warning comes from blkback.

------------[ cut here ]------------
WARNING: CPU: 2 PID: 95 at
/home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
xen_blkif_deferred_free+0x1f4/0x1f8()
Modules linked in:
CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G        W       4.2.0 openbmc#85
Hardware name: APM X-Gene Mustang board (DT)
Workqueue: events xen_blkif_deferred_free
Call trace:
[<ffff8000000890a8>] dump_backtrace+0x0/0x124
[<ffff8000000891dc>] show_stack+0x10/0x1c
[<ffff8000007653bc>] dump_stack+0x78/0x98
[<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4
[<ffff800000097f80>] warn_slowpath_null+0x14/0x20
[<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8
[<ffff8000000ad020>] process_one_work+0x160/0x3b4
[<ffff8000000ad3b4>] worker_thread+0x140/0x494
[<ffff8000000b2e34>] kthread+0xd8/0xf0
---[ end trace 6f859b7883c88cdd ]---

Request allocation has been moved to connect_ring, which is called every
time blkback connects to the frontend (this can happen multiple times during
a blkback instance life cycle). On the other hand, request freeing has not
been moved, so it's only called when destroying the backend instance. Due to
this mismatch, blkback can allocate the request pool multiple times, without
freeing it.

In order to fix it, move the freeing of requests to xen_blkif_disconnect to
restore the symmetry between request allocation and freeing.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org
CC: stable@vger.kernel.org # 4.2
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Apr 15, 2016
This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
verifier behaviour.

  [...]
  torvalds#84 raw_stack: no skb_load_bytes OK
  torvalds#85 raw_stack: skb_load_bytes, no init OK
  torvalds#86 raw_stack: skb_load_bytes, init OK
  torvalds#87 raw_stack: skb_load_bytes, spilled regs around bounds OK
  torvalds#88 raw_stack: skb_load_bytes, spilled regs corruption OK
  torvalds#89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
  torvalds#90 raw_stack: skb_load_bytes, spilled regs + data OK
  torvalds#91 raw_stack: skb_load_bytes, invalid access 1 OK
  torvalds#92 raw_stack: skb_load_bytes, invalid access 2 OK
  torvalds#93 raw_stack: skb_load_bytes, invalid access 3 OK
  torvalds#94 raw_stack: skb_load_bytes, invalid access 4 OK
  torvalds#95 raw_stack: skb_load_bytes, invalid access 5 OK
  torvalds#96 raw_stack: skb_load_bytes, invalid access 6 OK
  torvalds#97 raw_stack: skb_load_bytes, large access OK
  Summary: 98 PASSED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sashalevin pushed a commit to sashalevin/linux-stable-security that referenced this pull request Apr 29, 2016
commit 72b5322 upstream.

The @cn is stay in @clk_notifier_list after it is freed, it cause
memory corruption.

Example, if @clk is registered(first), unregistered(first),
registered(second), unregistered(second).

The freed @cn will be used when @clk is registered(second),
and the bug will be happened when @clk is unregistered(second):

[  517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister()
[  517.040000] Unable to handle kernel paging request at virtual address 00df3008
[  517.050000] pgd = ed858000
[  517.050000] [00df3008] *pgd=00000000
[  517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg]
[  517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G           O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85
[  517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000
[  517.060000] PC is at srcu_readers_seq_idx+0x48/0x84
[  517.060000] LR is at srcu_readers_seq_idx+0x60/0x84
[  517.060000] pc : [<c0052720>]    lr : [<c0052738>]    psr: 80070013
[  517.060000] sp : ee3e7d48  ip : 00000000  fp : ee3e7d6c
[  517.060000] r10: 00000000  r9 : ee3e6000  r8 : 00000000
[  517.060000] r7 : ed84fe4c  r6 : c068ec90  r5 : c068e430  r4 : 00000000
[  517.060000] r3 : 00df3000  r2 : 00000000  r1 : 00000002  r0 : 00000000
[  517.060000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  517.060000] Control: 18c5387d  Table: 2d85804a  DAC: 00000015
[  517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238)
[  517.060000] Stack: (0xee3e7d48 to 0xee3e8000)
....
[  517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc)
[  517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114)
[  517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac)
[  517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34)
[  517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74)
[  517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0)
[  517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg])
[  517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28)
[  517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4)
[  517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4)
[  517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc)
[  517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78)
[  517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20)
[  517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg])
[  517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c)
[  517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
[  517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101)

Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: shortened $SUBJECT]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
sashalevin pushed a commit to sashalevin/linux-stable-security that referenced this pull request Apr 29, 2016
commit 72b5322 upstream.

The @cn is stay in @clk_notifier_list after it is freed, it cause
memory corruption.

Example, if @clk is registered(first), unregistered(first),
registered(second), unregistered(second).

The freed @cn will be used when @clk is registered(second),
and the bug will be happened when @clk is unregistered(second):

[  517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister()
[  517.040000] Unable to handle kernel paging request at virtual address 00df3008
[  517.050000] pgd = ed858000
[  517.050000] [00df3008] *pgd=00000000
[  517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg]
[  517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G           O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85
[  517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000
[  517.060000] PC is at srcu_readers_seq_idx+0x48/0x84
[  517.060000] LR is at srcu_readers_seq_idx+0x60/0x84
[  517.060000] pc : [<c0052720>]    lr : [<c0052738>]    psr: 80070013
[  517.060000] sp : ee3e7d48  ip : 00000000  fp : ee3e7d6c
[  517.060000] r10: 00000000  r9 : ee3e6000  r8 : 00000000
[  517.060000] r7 : ed84fe4c  r6 : c068ec90  r5 : c068e430  r4 : 00000000
[  517.060000] r3 : 00df3000  r2 : 00000000  r1 : 00000002  r0 : 00000000
[  517.060000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  517.060000] Control: 18c5387d  Table: 2d85804a  DAC: 00000015
[  517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238)
[  517.060000] Stack: (0xee3e7d48 to 0xee3e8000)
....
[  517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc)
[  517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114)
[  517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac)
[  517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34)
[  517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74)
[  517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0)
[  517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg])
[  517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28)
[  517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4)
[  517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4)
[  517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc)
[  517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78)
[  517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20)
[  517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg])
[  517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c)
[  517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
[  517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101)

Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: shortened $SUBJECT]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
sashalevin pushed a commit to sashalevin/linux-stable-security that referenced this pull request Apr 29, 2016
commit f929d42 upstream.

This is due to  commit 86839c5
"xen/block: add multi-page ring support"

When using an guest under UEFI - after the domain is destroyed
the following warning comes from blkback.

------------[ cut here ]------------
WARNING: CPU: 2 PID: 95 at
/home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
xen_blkif_deferred_free+0x1f4/0x1f8()
Modules linked in:
CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G        W       4.2.0 torvalds#85
Hardware name: APM X-Gene Mustang board (DT)
Workqueue: events xen_blkif_deferred_free
Call trace:
[<ffff8000000890a8>] dump_backtrace+0x0/0x124
[<ffff8000000891dc>] show_stack+0x10/0x1c
[<ffff8000007653bc>] dump_stack+0x78/0x98
[<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4
[<ffff800000097f80>] warn_slowpath_null+0x14/0x20
[<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8
[<ffff8000000ad020>] process_one_work+0x160/0x3b4
[<ffff8000000ad3b4>] worker_thread+0x140/0x494
[<ffff8000000b2e34>] kthread+0xd8/0xf0
---[ end trace 6f859b7883c88cdd ]---

Request allocation has been moved to connect_ring, which is called every
time blkback connects to the frontend (this can happen multiple times during
a blkback instance life cycle). On the other hand, request freeing has not
been moved, so it's only called when destroying the backend instance. Due to
this mismatch, blkback can allocate the request pool multiple times, without
freeing it.

In order to fix it, move the freeing of requests to xen_blkif_disconnect to
restore the symmetry between request allocation and freeing.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Nov 25, 2016
WARNING: line over 80 characters
torvalds#69: FILE: ipc/sem.c:1997:
+		 * fastpath: the semop has completed, either successfully or not, from

WARNING: line over 80 characters
#70: FILE: ipc/sem.c:1998:
+		 * the syscall pov, is quite irrelevant to us at this point; we're done.

WARNING: line over 80 characters
torvalds#73: FILE: ipc/sem.c:2001:
+		 * spuriously.  The queue.status is checked again in the slowpath (aka

WARNING: line over 80 characters
torvalds#74: FILE: ipc/sem.c:2002:
+		 * after taking sem_lock), such that we can detect scenarios where we

WARNING: line over 80 characters
torvalds#75: FILE: ipc/sem.c:2003:
+		 * were awakened externally, during the window between wake_q_add() and

WARNING: line over 80 characters
torvalds#84: FILE: ipc/sem.c:2009:
+			 * User space could assume that semop() is a memory barrier:

WARNING: line over 80 characters
torvalds#85: FILE: ipc/sem.c:2010:
+			 * Without the mb(), the cpu could speculatively read in user

WARNING: line over 80 characters
torvalds#86: FILE: ipc/sem.c:2011:
+			 * space stale data that was overwritten by the previous owner

total: 0 errors, 8 warnings, 127 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/ipc-sem-simplify-wait-wake-loop.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Nov 30, 2016
WARNING: line over 80 characters
torvalds#69: FILE: ipc/sem.c:1997:
+		 * fastpath: the semop has completed, either successfully or not, from

WARNING: line over 80 characters
#70: FILE: ipc/sem.c:1998:
+		 * the syscall pov, is quite irrelevant to us at this point; we're done.

WARNING: line over 80 characters
torvalds#73: FILE: ipc/sem.c:2001:
+		 * spuriously.  The queue.status is checked again in the slowpath (aka

WARNING: line over 80 characters
torvalds#74: FILE: ipc/sem.c:2002:
+		 * after taking sem_lock), such that we can detect scenarios where we

WARNING: line over 80 characters
torvalds#75: FILE: ipc/sem.c:2003:
+		 * were awakened externally, during the window between wake_q_add() and

WARNING: line over 80 characters
torvalds#84: FILE: ipc/sem.c:2009:
+			 * User space could assume that semop() is a memory barrier:

WARNING: line over 80 characters
torvalds#85: FILE: ipc/sem.c:2010:
+			 * Without the mb(), the cpu could speculatively read in user

WARNING: line over 80 characters
torvalds#86: FILE: ipc/sem.c:2011:
+			 * space stale data that was overwritten by the previous owner

total: 0 errors, 8 warnings, 127 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/ipc-sem-simplify-wait-wake-loop.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Dec 2, 2016
WARNING: line over 80 characters
torvalds#69: FILE: ipc/sem.c:1997:
+		 * fastpath: the semop has completed, either successfully or not, from

WARNING: line over 80 characters
#70: FILE: ipc/sem.c:1998:
+		 * the syscall pov, is quite irrelevant to us at this point; we're done.

WARNING: line over 80 characters
torvalds#73: FILE: ipc/sem.c:2001:
+		 * spuriously.  The queue.status is checked again in the slowpath (aka

WARNING: line over 80 characters
torvalds#74: FILE: ipc/sem.c:2002:
+		 * after taking sem_lock), such that we can detect scenarios where we

WARNING: line over 80 characters
torvalds#75: FILE: ipc/sem.c:2003:
+		 * were awakened externally, during the window between wake_q_add() and

WARNING: line over 80 characters
torvalds#84: FILE: ipc/sem.c:2009:
+			 * User space could assume that semop() is a memory barrier:

WARNING: line over 80 characters
torvalds#85: FILE: ipc/sem.c:2010:
+			 * Without the mb(), the cpu could speculatively read in user

WARNING: line over 80 characters
torvalds#86: FILE: ipc/sem.c:2011:
+			 * space stale data that was overwritten by the previous owner

total: 0 errors, 8 warnings, 127 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/ipc-sem-simplify-wait-wake-loop.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Dec 8, 2016
WARNING: line over 80 characters
torvalds#69: FILE: ipc/sem.c:1997:
+		 * fastpath: the semop has completed, either successfully or not, from

WARNING: line over 80 characters
#70: FILE: ipc/sem.c:1998:
+		 * the syscall pov, is quite irrelevant to us at this point; we're done.

WARNING: line over 80 characters
torvalds#73: FILE: ipc/sem.c:2001:
+		 * spuriously.  The queue.status is checked again in the slowpath (aka

WARNING: line over 80 characters
torvalds#74: FILE: ipc/sem.c:2002:
+		 * after taking sem_lock), such that we can detect scenarios where we

WARNING: line over 80 characters
torvalds#75: FILE: ipc/sem.c:2003:
+		 * were awakened externally, during the window between wake_q_add() and

WARNING: line over 80 characters
torvalds#84: FILE: ipc/sem.c:2009:
+			 * User space could assume that semop() is a memory barrier:

WARNING: line over 80 characters
torvalds#85: FILE: ipc/sem.c:2010:
+			 * Without the mb(), the cpu could speculatively read in user

WARNING: line over 80 characters
torvalds#86: FILE: ipc/sem.c:2011:
+			 * space stale data that was overwritten by the previous owner

total: 0 errors, 8 warnings, 127 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/ipc-sem-simplify-wait-wake-loop.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Dec 8, 2016
WARNING: line over 80 characters
torvalds#69: FILE: ipc/sem.c:1997:
+		 * fastpath: the semop has completed, either successfully or not, from

WARNING: line over 80 characters
#70: FILE: ipc/sem.c:1998:
+		 * the syscall pov, is quite irrelevant to us at this point; we're done.

WARNING: line over 80 characters
torvalds#73: FILE: ipc/sem.c:2001:
+		 * spuriously.  The queue.status is checked again in the slowpath (aka

WARNING: line over 80 characters
torvalds#74: FILE: ipc/sem.c:2002:
+		 * after taking sem_lock), such that we can detect scenarios where we

WARNING: line over 80 characters
torvalds#75: FILE: ipc/sem.c:2003:
+		 * were awakened externally, during the window between wake_q_add() and

WARNING: line over 80 characters
torvalds#84: FILE: ipc/sem.c:2009:
+			 * User space could assume that semop() is a memory barrier:

WARNING: line over 80 characters
torvalds#85: FILE: ipc/sem.c:2010:
+			 * Without the mb(), the cpu could speculatively read in user

WARNING: line over 80 characters
torvalds#86: FILE: ipc/sem.c:2011:
+			 * space stale data that was overwritten by the previous owner

total: 0 errors, 8 warnings, 127 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/ipc-sem-simplify-wait-wake-loop.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Nov 26, 2023
commit 745f17a upstream.

We got a WARNING in ext4_add_complete_io:
==================================================================
 WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250
 CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 torvalds#85
 RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4]
 [...]
 Call Trace:
  <TASK>
  ext4_end_bio+0xa8/0x240 [ext4]
  bio_endio+0x195/0x310
  blk_update_request+0x184/0x770
  scsi_end_request+0x2f/0x240
  scsi_io_completion+0x75/0x450
  scsi_finish_command+0xef/0x160
  scsi_complete+0xa3/0x180
  blk_complete_reqs+0x60/0x80
  blk_done_softirq+0x25/0x40
  __do_softirq+0x119/0x4c8
  run_ksoftirqd+0x42/0x70
  smpboot_thread_fn+0x136/0x3c0
  kthread+0x140/0x1a0
  ret_from_fork+0x2c/0x50
==================================================================

Above issue may happen as follows:

            cpu1                        cpu2
----------------------------|----------------------------
mount -o dioread_lock
ext4_writepages
 ext4_do_writepages
  *if (ext4_should_dioread_nolock(inode))*
    // rsv_blocks is not assigned here
                                 mount -o remount,dioread_nolock
  ext4_journal_start_with_reserve
   __ext4_journal_start
    __ext4_journal_start_sb
     jbd2__journal_start
      *if (rsv_blocks)*
        // h_rsv_handle is not initialized here
  mpage_map_and_submit_extent
    mpage_map_one_extent
      dioread_nolock = ext4_should_dioread_nolock(inode)
      if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN))
        mpd->io_submit.io_end->handle = handle->h_rsv_handle
        ext4_set_io_unwritten_flag
          io_end->flag |= EXT4_IO_END_UNWRITTEN
      // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag

scsi_finish_command
 scsi_io_completion
  scsi_io_completion_action
   scsi_end_request
    blk_update_request
     req_bio_endio
      bio_endio
       bio->bi_end_io  > ext4_end_bio
        ext4_put_io_end_defer
	 ext4_add_complete_io
	  // trigger WARN_ON(!io_end->handle && sbi->s_journal);

The immediate cause of this problem is that ext4_should_dioread_nolock()
function returns inconsistent values in the ext4_do_writepages() and
mpage_map_one_extent(). There are four conditions in this function that
can be changed at mount time to cause this problem. These four conditions
can be divided into two categories:

    (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl
    (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount

The two in the first category have been fixed by commit c8585c6
("ext4: fix races between changing inode journal mode and ext4_writepages")
and commit cb85f4d ("ext4: fix race between writepages and enabling
EXT4_EXTENTS_FL") respectively.

Two cases in the other category have not yet been fixed, and the above
issue is caused by this situation. We refer to the fix for the first
category, when applying options during remount, we grab s_writepages_rwsem
to avoid racing with writepages ops to trigger this problem.

Fixes: 6b523df ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Cc: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230524072538.2883391-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Nov 27, 2023
commit 745f17a upstream.

We got a WARNING in ext4_add_complete_io:
==================================================================
 WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250
 CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 torvalds#85
 RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4]
 [...]
 Call Trace:
  <TASK>
  ext4_end_bio+0xa8/0x240 [ext4]
  bio_endio+0x195/0x310
  blk_update_request+0x184/0x770
  scsi_end_request+0x2f/0x240
  scsi_io_completion+0x75/0x450
  scsi_finish_command+0xef/0x160
  scsi_complete+0xa3/0x180
  blk_complete_reqs+0x60/0x80
  blk_done_softirq+0x25/0x40
  __do_softirq+0x119/0x4c8
  run_ksoftirqd+0x42/0x70
  smpboot_thread_fn+0x136/0x3c0
  kthread+0x140/0x1a0
  ret_from_fork+0x2c/0x50
==================================================================

Above issue may happen as follows:

            cpu1                        cpu2
----------------------------|----------------------------
mount -o dioread_lock
ext4_writepages
 ext4_do_writepages
  *if (ext4_should_dioread_nolock(inode))*
    // rsv_blocks is not assigned here
                                 mount -o remount,dioread_nolock
  ext4_journal_start_with_reserve
   __ext4_journal_start
    __ext4_journal_start_sb
     jbd2__journal_start
      *if (rsv_blocks)*
        // h_rsv_handle is not initialized here
  mpage_map_and_submit_extent
    mpage_map_one_extent
      dioread_nolock = ext4_should_dioread_nolock(inode)
      if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN))
        mpd->io_submit.io_end->handle = handle->h_rsv_handle
        ext4_set_io_unwritten_flag
          io_end->flag |= EXT4_IO_END_UNWRITTEN
      // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag

scsi_finish_command
 scsi_io_completion
  scsi_io_completion_action
   scsi_end_request
    blk_update_request
     req_bio_endio
      bio_endio
       bio->bi_end_io  > ext4_end_bio
        ext4_put_io_end_defer
	 ext4_add_complete_io
	  // trigger WARN_ON(!io_end->handle && sbi->s_journal);

The immediate cause of this problem is that ext4_should_dioread_nolock()
function returns inconsistent values in the ext4_do_writepages() and
mpage_map_one_extent(). There are four conditions in this function that
can be changed at mount time to cause this problem. These four conditions
can be divided into two categories:

    (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl
    (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount

The two in the first category have been fixed by commit c8585c6
("ext4: fix races between changing inode journal mode and ext4_writepages")
and commit cb85f4d ("ext4: fix race between writepages and enabling
EXT4_EXTENTS_FL") respectively.

Two cases in the other category have not yet been fixed, and the above
issue is caused by this situation. We refer to the fix for the first
category, when applying options during remount, we grab s_writepages_rwsem
to avoid racing with writepages ops to trigger this problem.

Fixes: 6b523df ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Cc: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230524072538.2883391-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sirlucjan pushed a commit to CachyOS/linux that referenced this pull request Nov 28, 2023
This series is backport of sched_ext-v5 (597552f5447a) which is the split
version of devel branch commit e69323c ("Merge pull request torvalds#85 from
sched-ext/misc-fixes ").

This commit makes the following fixes for backport:

- s/wakeup_preempt_scx/check_preempt_curr_scx/ to revert the rename which
  happened post v6.6.

- Remove the use of BPF_F_TIMER_CPU_PIN which was introduced after v6.6.
BoukeHaarsma23 pushed a commit to ChimeraOS/linux that referenced this pull request Nov 28, 2023
commit 745f17a upstream.

We got a WARNING in ext4_add_complete_io:
==================================================================
 WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250
 CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 torvalds#85
 RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4]
 [...]
 Call Trace:
  <TASK>
  ext4_end_bio+0xa8/0x240 [ext4]
  bio_endio+0x195/0x310
  blk_update_request+0x184/0x770
  scsi_end_request+0x2f/0x240
  scsi_io_completion+0x75/0x450
  scsi_finish_command+0xef/0x160
  scsi_complete+0xa3/0x180
  blk_complete_reqs+0x60/0x80
  blk_done_softirq+0x25/0x40
  __do_softirq+0x119/0x4c8
  run_ksoftirqd+0x42/0x70
  smpboot_thread_fn+0x136/0x3c0
  kthread+0x140/0x1a0
  ret_from_fork+0x2c/0x50
==================================================================

Above issue may happen as follows:

            cpu1                        cpu2
----------------------------|----------------------------
mount -o dioread_lock
ext4_writepages
 ext4_do_writepages
  *if (ext4_should_dioread_nolock(inode))*
    // rsv_blocks is not assigned here
                                 mount -o remount,dioread_nolock
  ext4_journal_start_with_reserve
   __ext4_journal_start
    __ext4_journal_start_sb
     jbd2__journal_start
      *if (rsv_blocks)*
        // h_rsv_handle is not initialized here
  mpage_map_and_submit_extent
    mpage_map_one_extent
      dioread_nolock = ext4_should_dioread_nolock(inode)
      if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN))
        mpd->io_submit.io_end->handle = handle->h_rsv_handle
        ext4_set_io_unwritten_flag
          io_end->flag |= EXT4_IO_END_UNWRITTEN
      // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag

scsi_finish_command
 scsi_io_completion
  scsi_io_completion_action
   scsi_end_request
    blk_update_request
     req_bio_endio
      bio_endio
       bio->bi_end_io  > ext4_end_bio
        ext4_put_io_end_defer
	 ext4_add_complete_io
	  // trigger WARN_ON(!io_end->handle && sbi->s_journal);

The immediate cause of this problem is that ext4_should_dioread_nolock()
function returns inconsistent values in the ext4_do_writepages() and
mpage_map_one_extent(). There are four conditions in this function that
can be changed at mount time to cause this problem. These four conditions
can be divided into two categories:

    (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl
    (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount

The two in the first category have been fixed by commit c8585c6
("ext4: fix races between changing inode journal mode and ext4_writepages")
and commit cb85f4d ("ext4: fix race between writepages and enabling
EXT4_EXTENTS_FL") respectively.

Two cases in the other category have not yet been fixed, and the above
issue is caused by this situation. We refer to the fix for the first
category, when applying options during remount, we grab s_writepages_rwsem
to avoid racing with writepages ops to trigger this problem.

Fixes: 6b523df ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Cc: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230524072538.2883391-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Nov 28, 2023
commit 745f17a upstream.

We got a WARNING in ext4_add_complete_io:
==================================================================
 WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250
 CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 torvalds#85
 RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4]
 [...]
 Call Trace:
  <TASK>
  ext4_end_bio+0xa8/0x240 [ext4]
  bio_endio+0x195/0x310
  blk_update_request+0x184/0x770
  scsi_end_request+0x2f/0x240
  scsi_io_completion+0x75/0x450
  scsi_finish_command+0xef/0x160
  scsi_complete+0xa3/0x180
  blk_complete_reqs+0x60/0x80
  blk_done_softirq+0x25/0x40
  __do_softirq+0x119/0x4c8
  run_ksoftirqd+0x42/0x70
  smpboot_thread_fn+0x136/0x3c0
  kthread+0x140/0x1a0
  ret_from_fork+0x2c/0x50
==================================================================

Above issue may happen as follows:

            cpu1                        cpu2
----------------------------|----------------------------
mount -o dioread_lock
ext4_writepages
 ext4_do_writepages
  *if (ext4_should_dioread_nolock(inode))*
    // rsv_blocks is not assigned here
                                 mount -o remount,dioread_nolock
  ext4_journal_start_with_reserve
   __ext4_journal_start
    __ext4_journal_start_sb
     jbd2__journal_start
      *if (rsv_blocks)*
        // h_rsv_handle is not initialized here
  mpage_map_and_submit_extent
    mpage_map_one_extent
      dioread_nolock = ext4_should_dioread_nolock(inode)
      if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN))
        mpd->io_submit.io_end->handle = handle->h_rsv_handle
        ext4_set_io_unwritten_flag
          io_end->flag |= EXT4_IO_END_UNWRITTEN
      // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag

scsi_finish_command
 scsi_io_completion
  scsi_io_completion_action
   scsi_end_request
    blk_update_request
     req_bio_endio
      bio_endio
       bio->bi_end_io  > ext4_end_bio
        ext4_put_io_end_defer
	 ext4_add_complete_io
	  // trigger WARN_ON(!io_end->handle && sbi->s_journal);

The immediate cause of this problem is that ext4_should_dioread_nolock()
function returns inconsistent values in the ext4_do_writepages() and
mpage_map_one_extent(). There are four conditions in this function that
can be changed at mount time to cause this problem. These four conditions
can be divided into two categories:

    (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl
    (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount

The two in the first category have been fixed by commit c8585c6
("ext4: fix races between changing inode journal mode and ext4_writepages")
and commit cb85f4d ("ext4: fix race between writepages and enabling
EXT4_EXTENTS_FL") respectively.

Two cases in the other category have not yet been fixed, and the above
issue is caused by this situation. We refer to the fix for the first
category, when applying options during remount, we grab s_writepages_rwsem
to avoid racing with writepages ops to trigger this problem.

Fixes: 6b523df ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Cc: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230524072538.2883391-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AndrewFasano added a commit to panda-re/linux that referenced this pull request Jan 16, 2024
From: Kurt Kanzenbach <kurt@linutronix.de>
To: gregkh@linuxfoundation.org
Cc: kurt@linutronix.de, tglx@linutronix.de, jslaby@suse.com,
	linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] tty: serial: 8250: pass IRQ shared flag to UART ports
Date: Fri, 16 Mar 2018 12:31:58 +0100	[thread overview]
Message-ID: <20180316113158.10298-1-kurt@linutronix.de> (raw)

On some systems IRQ lines between multiple UARTs might be shared. If so, the
irqflags have to be configured accordingly. The reason is: The 8250 port startup
code performs IRQ tests *before* the IRQ handler for that particular port is
registered. This is performed in serial8250_do_startup(). This function checks
whether IRQF_SHARED is configured and only then disables the IRQ line while
testing.

This test is performed upon each open() of the UART device. Imagine two UARTs
share the same IRQ line: On is already opened and the IRQ is active. When the
second UART is opened, the IRQ line has to be disabled while performing IRQ
tests. Otherwise an IRQ might handler might be invoked, but the the IRQ itself
cannot be handled, because the corresponding handler isn't registered,
yet. That's because the 8250 code uses a chain-handler and invokes the
corresponding port's IRQ handling rountines himself.

Unfortunately this IRQF_SHARED flag isn't configured for UARTs probed via device
tree even if the IRQs are shared. This way, the actual and shared IRQ line isn't
disabled while performing tests and the kernel correctly detects a spurious
IRQ. So, adding this flag to the DT probe solves the issue.

Note: The UPF_SHARE_IRQ flag is configured unconditionally. Therefore, the
IRQF_SHARED flag can be set unconditionally as well.

Example stacktrace by performing echo 1 > /dev/ttyS2 on a non-patched system:

|irq 85: nobody cared (try booting with the "irqpoll" option)
| [...]
|handlers:
|[<ffff0000080fc628>] irq_default_primary_handler threaded [<ffff00000855fbb8>] serial8250_interrupt
|Disabling IRQ torvalds#85

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
roxell pushed a commit to roxell/linux that referenced this pull request Jan 17, 2024
When I was testing mongodb over bcachefs with compression,
there is a lockdep warning when snapshotting mongodb data volume.

$ cat test.sh
prog=bcachefs

$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots

while true;do
    $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
    sleep 1s
done

$ cat /etc/mongodb.conf
systemLog:
  destination: file
  logAppend: true
  path: /mnt/data/mongod.log

storage:
  dbPath: /mnt/data/

lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ torvalds#85 Tainted: G            E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
               but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
               which lock already depends on the new lock.

[ 3437.456553]
               the existing dependency chain (in reverse order) is:
[ 3437.457054]
               -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507]        down_read+0x3e/0x170
[ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
[ 3437.458498]        do_syscall_64+0x42/0xf0
[ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
               -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615]        down_read+0x3e/0x170
[ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686]        notify_change+0x1f1/0x4a0
[ 3437.461283]        do_truncate+0x7f/0xd0
[ 3437.461555]        path_openat+0xa57/0xce0
[ 3437.461836]        do_filp_open+0xb4/0x160
[ 3437.462116]        do_sys_openat2+0x91/0xc0
[ 3437.462402]        __x64_sys_openat+0x53/0xa0
[ 3437.462701]        do_syscall_64+0x42/0xf0
[ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
               -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843]        down_write+0x3b/0xc0
[ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493]        vfs_write+0x21b/0x4c0
[ 3437.464653]        ksys_write+0x69/0xf0
[ 3437.464839]        do_syscall_64+0x42/0xf0
[ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
               -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471]        __lock_acquire+0x1455/0x21b0
[ 3437.465656]        lock_acquire+0xc6/0x2b0
[ 3437.465822]        mnt_want_write+0x46/0x1a0
[ 3437.465996]        filename_create+0x62/0x190
[ 3437.466175]        user_path_create+0x2d/0x50
[ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
[ 3437.466791]        do_syscall_64+0x42/0xf0
[ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
               other info that might help us debug this:

[ 3437.469670] 2 locks held by bcachefs/35533:
               other info that might help us debug this:

[ 3437.467507] Chain exists of:
                 sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48

[ 3437.467979]  Possible unsafe locking scenario:

[ 3437.468223]        CPU0                    CPU1
[ 3437.468405]        ----                    ----
[ 3437.468585]   rlock(&type->s_umount_key#48);
[ 3437.468758]                                lock(&c->snapshot_create_lock);
[ 3437.469030]                                lock(&type->s_umount_key#48);
[ 3437.469291]   rlock(sb_writers#10);
[ 3437.469434]
                *** DEADLOCK ***

[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
               stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ torvalds#85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795]  <TASK>
[ 3437.471884]  dump_stack_lvl+0x57/0x90
[ 3437.472035]  check_noncircular+0x132/0x150
[ 3437.472202]  __lock_acquire+0x1455/0x21b0
[ 3437.472369]  lock_acquire+0xc6/0x2b0
[ 3437.472518]  ? filename_create+0x62/0x190
[ 3437.472683]  ? lock_is_held_type+0x97/0x110
[ 3437.472856]  mnt_want_write+0x46/0x1a0
[ 3437.473025]  ? filename_create+0x62/0x190
[ 3437.473204]  filename_create+0x62/0x190
[ 3437.473380]  user_path_create+0x2d/0x50
[ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819]  ? lock_acquire+0xc6/0x2b0
[ 3437.474002]  ? __fget_files+0x2a/0x190
[ 3437.474195]  ? __fget_files+0xbc/0x190
[ 3437.474380]  ? lock_release+0xc5/0x270
[ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
[ 3437.475277]  do_syscall_64+0x42/0xf0
[ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================

In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d2373 ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.

Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
logic10492 pushed a commit to logic10492/linux-amd-zen2 that referenced this pull request Jan 18, 2024
scx: Move scx_ops_enable_state_str[] outside CONFIG_SCHED_DEBUG
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 18, 2024
When I was testing mongodb over bcachefs with compression,
there is a lockdep warning when snapshotting mongodb data volume.

$ cat test.sh
prog=bcachefs

$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots

while true;do
    $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
    sleep 1s
done

$ cat /etc/mongodb.conf
systemLog:
  destination: file
  logAppend: true
  path: /mnt/data/mongod.log

storage:
  dbPath: /mnt/data/

lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ torvalds#85 Tainted: G            E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
               but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
               which lock already depends on the new lock.

[ 3437.456553]
               the existing dependency chain (in reverse order) is:
[ 3437.457054]
               -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507]        down_read+0x3e/0x170
[ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
[ 3437.458498]        do_syscall_64+0x42/0xf0
[ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
               -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615]        down_read+0x3e/0x170
[ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686]        notify_change+0x1f1/0x4a0
[ 3437.461283]        do_truncate+0x7f/0xd0
[ 3437.461555]        path_openat+0xa57/0xce0
[ 3437.461836]        do_filp_open+0xb4/0x160
[ 3437.462116]        do_sys_openat2+0x91/0xc0
[ 3437.462402]        __x64_sys_openat+0x53/0xa0
[ 3437.462701]        do_syscall_64+0x42/0xf0
[ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
               -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843]        down_write+0x3b/0xc0
[ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493]        vfs_write+0x21b/0x4c0
[ 3437.464653]        ksys_write+0x69/0xf0
[ 3437.464839]        do_syscall_64+0x42/0xf0
[ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
               -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471]        __lock_acquire+0x1455/0x21b0
[ 3437.465656]        lock_acquire+0xc6/0x2b0
[ 3437.465822]        mnt_want_write+0x46/0x1a0
[ 3437.465996]        filename_create+0x62/0x190
[ 3437.466175]        user_path_create+0x2d/0x50
[ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
[ 3437.466791]        do_syscall_64+0x42/0xf0
[ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
               other info that might help us debug this:

[ 3437.469670] 2 locks held by bcachefs/35533:
               other info that might help us debug this:

[ 3437.467507] Chain exists of:
                 sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48

[ 3437.467979]  Possible unsafe locking scenario:

[ 3437.468223]        CPU0                    CPU1
[ 3437.468405]        ----                    ----
[ 3437.468585]   rlock(&type->s_umount_key#48);
[ 3437.468758]                                lock(&c->snapshot_create_lock);
[ 3437.469030]                                lock(&type->s_umount_key#48);
[ 3437.469291]   rlock(sb_writers#10);
[ 3437.469434]
                *** DEADLOCK ***

[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
               stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ torvalds#85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795]  <TASK>
[ 3437.471884]  dump_stack_lvl+0x57/0x90
[ 3437.472035]  check_noncircular+0x132/0x150
[ 3437.472202]  __lock_acquire+0x1455/0x21b0
[ 3437.472369]  lock_acquire+0xc6/0x2b0
[ 3437.472518]  ? filename_create+0x62/0x190
[ 3437.472683]  ? lock_is_held_type+0x97/0x110
[ 3437.472856]  mnt_want_write+0x46/0x1a0
[ 3437.473025]  ? filename_create+0x62/0x190
[ 3437.473204]  filename_create+0x62/0x190
[ 3437.473380]  user_path_create+0x2d/0x50
[ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819]  ? lock_acquire+0xc6/0x2b0
[ 3437.474002]  ? __fget_files+0x2a/0x190
[ 3437.474195]  ? __fget_files+0xbc/0x190
[ 3437.474380]  ? lock_release+0xc5/0x270
[ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
[ 3437.475277]  do_syscall_64+0x42/0xf0
[ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================

In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d2373 ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.

Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
torvalds pushed a commit that referenced this pull request Jan 21, 2024
When I was testing mongodb over bcachefs with compression,
there is a lockdep warning when snapshotting mongodb data volume.

$ cat test.sh
prog=bcachefs

$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots

while true;do
    $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
    sleep 1s
done

$ cat /etc/mongodb.conf
systemLog:
  destination: file
  logAppend: true
  path: /mnt/data/mongod.log

storage:
  dbPath: /mnt/data/

lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G            E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
               but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
               which lock already depends on the new lock.

[ 3437.456553]
               the existing dependency chain (in reverse order) is:
[ 3437.457054]
               -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507]        down_read+0x3e/0x170
[ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
[ 3437.458498]        do_syscall_64+0x42/0xf0
[ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
               -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615]        down_read+0x3e/0x170
[ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686]        notify_change+0x1f1/0x4a0
[ 3437.461283]        do_truncate+0x7f/0xd0
[ 3437.461555]        path_openat+0xa57/0xce0
[ 3437.461836]        do_filp_open+0xb4/0x160
[ 3437.462116]        do_sys_openat2+0x91/0xc0
[ 3437.462402]        __x64_sys_openat+0x53/0xa0
[ 3437.462701]        do_syscall_64+0x42/0xf0
[ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
               -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843]        down_write+0x3b/0xc0
[ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493]        vfs_write+0x21b/0x4c0
[ 3437.464653]        ksys_write+0x69/0xf0
[ 3437.464839]        do_syscall_64+0x42/0xf0
[ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
               -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471]        __lock_acquire+0x1455/0x21b0
[ 3437.465656]        lock_acquire+0xc6/0x2b0
[ 3437.465822]        mnt_want_write+0x46/0x1a0
[ 3437.465996]        filename_create+0x62/0x190
[ 3437.466175]        user_path_create+0x2d/0x50
[ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
[ 3437.466791]        do_syscall_64+0x42/0xf0
[ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
               other info that might help us debug this:

[ 3437.469670] 2 locks held by bcachefs/35533:
               other info that might help us debug this:

[ 3437.467507] Chain exists of:
                 sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48

[ 3437.467979]  Possible unsafe locking scenario:

[ 3437.468223]        CPU0                    CPU1
[ 3437.468405]        ----                    ----
[ 3437.468585]   rlock(&type->s_umount_key#48);
[ 3437.468758]                                lock(&c->snapshot_create_lock);
[ 3437.469030]                                lock(&type->s_umount_key#48);
[ 3437.469291]   rlock(sb_writers#10);
[ 3437.469434]
                *** DEADLOCK ***

[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
               stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ #85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795]  <TASK>
[ 3437.471884]  dump_stack_lvl+0x57/0x90
[ 3437.472035]  check_noncircular+0x132/0x150
[ 3437.472202]  __lock_acquire+0x1455/0x21b0
[ 3437.472369]  lock_acquire+0xc6/0x2b0
[ 3437.472518]  ? filename_create+0x62/0x190
[ 3437.472683]  ? lock_is_held_type+0x97/0x110
[ 3437.472856]  mnt_want_write+0x46/0x1a0
[ 3437.473025]  ? filename_create+0x62/0x190
[ 3437.473204]  filename_create+0x62/0x190
[ 3437.473380]  user_path_create+0x2d/0x50
[ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819]  ? lock_acquire+0xc6/0x2b0
[ 3437.474002]  ? __fget_files+0x2a/0x190
[ 3437.474195]  ? __fget_files+0xbc/0x190
[ 3437.474380]  ? lock_release+0xc5/0x270
[ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
[ 3437.475277]  do_syscall_64+0x42/0xf0
[ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================

In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d2373 ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.

Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
gyroninja added a commit to gyroninja/linux that referenced this pull request Jan 28, 2024
KSAN calls into rcu code which then triggers a write that reenters into KSAN
getting the system stuck doing infinite recursion.

#0  kmsan_get_context () at mm/kmsan/kmsan.h:106
#1  __msan_get_context_state () at mm/kmsan/instrumentation.c:331
#2  0xffffffff81495671 in get_current () at ./arch/x86/include/asm/current.h:42
#3  rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
#4  __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
#5  0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#6  pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#7  kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#8  virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#9  0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#10 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#11 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#12 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#13 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#14 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#15 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#16 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#17 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#18 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#19 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#20 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#21 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#22 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#23 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#24 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#25 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#26 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#27 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#28 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#29 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#30 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#31 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#32 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#33 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#34 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#35 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#36 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#37 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#38 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#39 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#40 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#41 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#42 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#43 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#44 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#45 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#46 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#47 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#48 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#49 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#50 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#51 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
#52 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
#53 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#54 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#55 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#56 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#57 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
#58 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#59 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#60 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#61 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#62 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#63 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#64 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#65 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#66 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#67 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#68 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#69 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
#70 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#71 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#72 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#73 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#74 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#75 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#76 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#77 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff86203c90) at ./arch/x86/include/asm/kmsan.h:82
torvalds#78 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff86203c90) at mm/kmsan/shadow.c:75
torvalds#79 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff86203c90, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#80 kmsan_get_shadow_origin_ptr (address=0xffffffff86203c90, size=8, store=false) at mm/kmsan/shadow.c:97
torvalds#81 0xffffffff81b1dc72 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=8, store=false) at mm/kmsan/instrumentation.c:36
torvalds#82 __msan_metadata_ptr_for_load_8 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:92
torvalds#83 0xffffffff814fdb9e in filter_irq_stacks (entries=<optimized out>, nr_entries=4) at kernel/stacktrace.c:397
torvalds#84 0xffffffff829520e8 in stack_depot_save_flags (entries=0xffffffff8620d974 <init_task+1012>, nr_entries=4, alloc_flags=0, depot_flags=0) at lib/stackdepot.c:500
torvalds#85 0xffffffff81b1e560 in __msan_poison_alloca (address=0xffffffff86203da0, size=24, descr=<optimized out>) at mm/kmsan/instrumentation.c:285
torvalds#86 0xffffffff8562821c in _printk (fmt=0xffffffff85f191a5 "\0016Attempting lock1") at kernel/printk/printk.c:2324
torvalds#87 0xffffffff81942aa2 in kmem_cache_create_usercopy (name=0xffffffff85f18903 "mm_struct", size=1296, align=0, flags=270336, useroffset=<optimized out>, usersize=<optimized out>, ctor=0x0 <fixed_percpu_data>) at mm/slab_common.c:296
torvalds#88 0xffffffff86f337a0 in mm_cache_init () at kernel/fork.c:3262
torvalds#89 0xffffffff86eacb8e in start_kernel () at init/main.c:932
torvalds#90 0xffffffff86ecdf94 in x86_64_start_reservations (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:555
torvalds#91 0xffffffff86ecde9b in x86_64_start_kernel (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:536
torvalds#92 0xffffffff810001d3 in secondary_startup_64 () at /pool/workspace/linux/arch/x86/kernel/head_64.S:461
torvalds#93 0x0000000000000000 in ??
gyroninja added a commit to gyroninja/linux that referenced this pull request Jan 28, 2024
As of 5ec8e8e(mm/sparsemem: fix race in accessing memory_section->usage) KMSAN
now calls into RCU tree code during kmsan_get_metadata. This will trigger a
write that will reenter into KMSAN getting the system stuck doing infinite
recursion.

#0  kmsan_get_context () at mm/kmsan/kmsan.h:106
#1  __msan_get_context_state () at mm/kmsan/instrumentation.c:331
#2  0xffffffff81495671 in get_current () at ./arch/x86/include/asm/current.h:42
#3  rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
#4  __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
#5  0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#6  pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#7  kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#8  virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#9  0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#10 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#11 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#12 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#13 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#14 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#15 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#16 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#17 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#18 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#19 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#20 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#21 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#22 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#23 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#24 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#25 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#26 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#27 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#28 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#29 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#30 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#31 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#32 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#33 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#34 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#35 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#36 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#37 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#38 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#39 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#40 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#41 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#42 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#43 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#44 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#45 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#46 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#47 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#48 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#49 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#50 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#51 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
#52 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
#53 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#54 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#55 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#56 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#57 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
#58 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#59 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#60 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#61 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#62 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#63 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#64 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#65 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#66 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#67 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82
torvalds#68 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75
torvalds#69 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143
#70 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97
torvalds#71 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36
torvalds#72 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91
torvalds#73 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379
torvalds#74 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402
torvalds#75 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748
torvalds#76 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016
torvalds#77 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff86203c90) at ./arch/x86/include/asm/kmsan.h:82
torvalds#78 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff86203c90) at mm/kmsan/shadow.c:75
torvalds#79 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff86203c90, is_origin=false) at mm/kmsan/shadow.c:143
torvalds#80 kmsan_get_shadow_origin_ptr (address=0xffffffff86203c90, size=8, store=false) at mm/kmsan/shadow.c:97
torvalds#81 0xffffffff81b1dc72 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=8, store=false) at mm/kmsan/instrumentation.c:36
torvalds#82 __msan_metadata_ptr_for_load_8 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:92
torvalds#83 0xffffffff814fdb9e in filter_irq_stacks (entries=<optimized out>, nr_entries=4) at kernel/stacktrace.c:397
torvalds#84 0xffffffff829520e8 in stack_depot_save_flags (entries=0xffffffff8620d974 <init_task+1012>, nr_entries=4, alloc_flags=0, depot_flags=0) at lib/stackdepot.c:500
torvalds#85 0xffffffff81b1e560 in __msan_poison_alloca (address=0xffffffff86203da0, size=24, descr=<optimized out>) at mm/kmsan/instrumentation.c:285
torvalds#86 0xffffffff8562821c in _printk (fmt=0xffffffff85f191a5 "\0016Attempting lock1") at kernel/printk/printk.c:2324
torvalds#87 0xffffffff81942aa2 in kmem_cache_create_usercopy (name=0xffffffff85f18903 "mm_struct", size=1296, align=0, flags=270336, useroffset=<optimized out>, usersize=<optimized out>, ctor=0x0 <fixed_percpu_data>) at mm/slab_common.c:296
torvalds#88 0xffffffff86f337a0 in mm_cache_init () at kernel/fork.c:3262
torvalds#89 0xffffffff86eacb8e in start_kernel () at init/main.c:932
torvalds#90 0xffffffff86ecdf94 in x86_64_start_reservations (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:555
torvalds#91 0xffffffff86ecde9b in x86_64_start_kernel (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:536
torvalds#92 0xffffffff810001d3 in secondary_startup_64 () at /pool/workspace/linux/arch/x86/kernel/head_64.S:461
torvalds#93 0x0000000000000000 in ??
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Feb 16, 2024
commit 2acc59d upstream.

When I was testing mongodb over bcachefs with compression,
there is a lockdep warning when snapshotting mongodb data volume.

$ cat test.sh
prog=bcachefs

$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots

while true;do
    $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
    sleep 1s
done

$ cat /etc/mongodb.conf
systemLog:
  destination: file
  logAppend: true
  path: /mnt/data/mongod.log

storage:
  dbPath: /mnt/data/

lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ torvalds#85 Tainted: G            E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
               but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
               which lock already depends on the new lock.

[ 3437.456553]
               the existing dependency chain (in reverse order) is:
[ 3437.457054]
               -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507]        down_read+0x3e/0x170
[ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
[ 3437.458498]        do_syscall_64+0x42/0xf0
[ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
               -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615]        down_read+0x3e/0x170
[ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686]        notify_change+0x1f1/0x4a0
[ 3437.461283]        do_truncate+0x7f/0xd0
[ 3437.461555]        path_openat+0xa57/0xce0
[ 3437.461836]        do_filp_open+0xb4/0x160
[ 3437.462116]        do_sys_openat2+0x91/0xc0
[ 3437.462402]        __x64_sys_openat+0x53/0xa0
[ 3437.462701]        do_syscall_64+0x42/0xf0
[ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
               -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843]        down_write+0x3b/0xc0
[ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493]        vfs_write+0x21b/0x4c0
[ 3437.464653]        ksys_write+0x69/0xf0
[ 3437.464839]        do_syscall_64+0x42/0xf0
[ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
               -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471]        __lock_acquire+0x1455/0x21b0
[ 3437.465656]        lock_acquire+0xc6/0x2b0
[ 3437.465822]        mnt_want_write+0x46/0x1a0
[ 3437.465996]        filename_create+0x62/0x190
[ 3437.466175]        user_path_create+0x2d/0x50
[ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
[ 3437.466791]        do_syscall_64+0x42/0xf0
[ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
               other info that might help us debug this:

[ 3437.469670] 2 locks held by bcachefs/35533:
               other info that might help us debug this:

[ 3437.467507] Chain exists of:
                 sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48

[ 3437.467979]  Possible unsafe locking scenario:

[ 3437.468223]        CPU0                    CPU1
[ 3437.468405]        ----                    ----
[ 3437.468585]   rlock(&type->s_umount_key#48);
[ 3437.468758]                                lock(&c->snapshot_create_lock);
[ 3437.469030]                                lock(&type->s_umount_key#48);
[ 3437.469291]   rlock(sb_writers#10);
[ 3437.469434]
                *** DEADLOCK ***

[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
               stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ torvalds#85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795]  <TASK>
[ 3437.471884]  dump_stack_lvl+0x57/0x90
[ 3437.472035]  check_noncircular+0x132/0x150
[ 3437.472202]  __lock_acquire+0x1455/0x21b0
[ 3437.472369]  lock_acquire+0xc6/0x2b0
[ 3437.472518]  ? filename_create+0x62/0x190
[ 3437.472683]  ? lock_is_held_type+0x97/0x110
[ 3437.472856]  mnt_want_write+0x46/0x1a0
[ 3437.473025]  ? filename_create+0x62/0x190
[ 3437.473204]  filename_create+0x62/0x190
[ 3437.473380]  user_path_create+0x2d/0x50
[ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819]  ? lock_acquire+0xc6/0x2b0
[ 3437.474002]  ? __fget_files+0x2a/0x190
[ 3437.474195]  ? __fget_files+0xbc/0x190
[ 3437.474380]  ? lock_release+0xc5/0x270
[ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
[ 3437.475277]  do_syscall_64+0x42/0xf0
[ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================

In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d2373 ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.

Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mpe pushed a commit to linuxppc/linux that referenced this pull request May 7, 2024
Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:

  test_bpf: torvalds#82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: torvalds#165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: torvalds#166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: torvalds#169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: torvalds#170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: torvalds#172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: torvalds#313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: torvalds#314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: torvalds#315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: torvalds#316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: torvalds#317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: torvalds#318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: torvalds#319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: torvalds#320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: torvalds#344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: torvalds#345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: torvalds#346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: torvalds#490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Jun 3, 2024
[ Upstream commit 8ecf3c1 ]

Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:

  test_bpf: torvalds#82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: torvalds#165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: torvalds#166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: torvalds#169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: torvalds#170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: torvalds#172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: torvalds#313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: torvalds#314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: torvalds#315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: torvalds#316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: torvalds#317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: torvalds#318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: torvalds#319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: torvalds#320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: torvalds#344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: torvalds#345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: torvalds#346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: torvalds#490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Jun 5, 2024
[ Upstream commit 8ecf3c1 ]

Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:

  test_bpf: torvalds#82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: torvalds#165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: torvalds#166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: torvalds#169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: torvalds#170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: torvalds#172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: torvalds#313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: torvalds#314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: torvalds#315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: torvalds#316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: torvalds#317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: torvalds#318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: torvalds#319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: torvalds#320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: torvalds#344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: torvalds#345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: torvalds#346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: torvalds#490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
hdeller pushed a commit to hdeller/linux that referenced this pull request Jun 12, 2024
[ Upstream commit 8ecf3c1 ]

Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:

  test_bpf: torvalds#82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: torvalds#86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: torvalds#165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: torvalds#166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: torvalds#169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: torvalds#170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: torvalds#172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: torvalds#313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: torvalds#314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: torvalds#315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: torvalds#316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: torvalds#317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: torvalds#318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: torvalds#319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: torvalds#320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: torvalds#344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: torvalds#345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: torvalds#346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: torvalds#490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jun 14, 2024
Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
mpe pushed a commit to linuxppc/linux that referenced this pull request Jun 19, 2024
Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
mpe pushed a commit to linuxppc/linux that referenced this pull request Jun 23, 2024
Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jul 23, 2024
[ Upstream commit 1a14150 ]

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jul 23, 2024
[ Upstream commit 1a14150 ]

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Jul 25, 2024
[ Upstream commit 1a14150 ]

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Jul 25, 2024
[ Upstream commit 1a14150 ]

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
ericwoud pushed a commit to ericwoud/linux that referenced this pull request Jul 26, 2024
[ Upstream commit 1a14150 ]

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
tobetter pushed a commit to tobetter/linux that referenced this pull request Aug 7, 2024
[ Upstream commit 1a14150 ]

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
intelfx pushed a commit to intelfx/linux that referenced this pull request Aug 15, 2024
When I was testing mongodb over bcachefs with compression,
there is a lockdep warning when snapshotting mongodb data volume.

$ cat test.sh
prog=bcachefs

$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots

while true;do
    $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
    sleep 1s
done

$ cat /etc/mongodb.conf
systemLog:
  destination: file
  logAppend: true
  path: /mnt/data/mongod.log

storage:
  dbPath: /mnt/data/

lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ torvalds#85 Tainted: G            E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
               but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
               which lock already depends on the new lock.

[ 3437.456553]
               the existing dependency chain (in reverse order) is:
[ 3437.457054]
               -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507]        down_read+0x3e/0x170
[ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
[ 3437.458498]        do_syscall_64+0x42/0xf0
[ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
               -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615]        down_read+0x3e/0x170
[ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686]        notify_change+0x1f1/0x4a0
[ 3437.461283]        do_truncate+0x7f/0xd0
[ 3437.461555]        path_openat+0xa57/0xce0
[ 3437.461836]        do_filp_open+0xb4/0x160
[ 3437.462116]        do_sys_openat2+0x91/0xc0
[ 3437.462402]        __x64_sys_openat+0x53/0xa0
[ 3437.462701]        do_syscall_64+0x42/0xf0
[ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
               -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843]        down_write+0x3b/0xc0
[ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493]        vfs_write+0x21b/0x4c0
[ 3437.464653]        ksys_write+0x69/0xf0
[ 3437.464839]        do_syscall_64+0x42/0xf0
[ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
               -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471]        __lock_acquire+0x1455/0x21b0
[ 3437.465656]        lock_acquire+0xc6/0x2b0
[ 3437.465822]        mnt_want_write+0x46/0x1a0
[ 3437.465996]        filename_create+0x62/0x190
[ 3437.466175]        user_path_create+0x2d/0x50
[ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
[ 3437.466791]        do_syscall_64+0x42/0xf0
[ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
               other info that might help us debug this:

[ 3437.469670] 2 locks held by bcachefs/35533:
               other info that might help us debug this:

[ 3437.467507] Chain exists of:
                 sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48

[ 3437.467979]  Possible unsafe locking scenario:

[ 3437.468223]        CPU0                    CPU1
[ 3437.468405]        ----                    ----
[ 3437.468585]   rlock(&type->s_umount_key#48);
[ 3437.468758]                                lock(&c->snapshot_create_lock);
[ 3437.469030]                                lock(&type->s_umount_key#48);
[ 3437.469291]   rlock(sb_writers#10);
[ 3437.469434]
                *** DEADLOCK ***

[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
               stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ torvalds#85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795]  <TASK>
[ 3437.471884]  dump_stack_lvl+0x57/0x90
[ 3437.472035]  check_noncircular+0x132/0x150
[ 3437.472202]  __lock_acquire+0x1455/0x21b0
[ 3437.472369]  lock_acquire+0xc6/0x2b0
[ 3437.472518]  ? filename_create+0x62/0x190
[ 3437.472683]  ? lock_is_held_type+0x97/0x110
[ 3437.472856]  mnt_want_write+0x46/0x1a0
[ 3437.473025]  ? filename_create+0x62/0x190
[ 3437.473204]  filename_create+0x62/0x190
[ 3437.473380]  user_path_create+0x2d/0x50
[ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819]  ? lock_acquire+0xc6/0x2b0
[ 3437.474002]  ? __fget_files+0x2a/0x190
[ 3437.474195]  ? __fget_files+0xbc/0x190
[ 3437.474380]  ? lock_release+0xc5/0x270
[ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
[ 3437.475277]  do_syscall_64+0x42/0xf0
[ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================

In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d2373 ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.

Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
jhautbois pushed a commit to YoseliSAS/linux that referenced this pull request Aug 21, 2024
Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Sep 4, 2024
[ Upstream commit 1a14150 ]

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Sep 9, 2024
[ Upstream commit 1a14150 ]

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-*
results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as
shown below.

    kernel BUG at mm/usercopy.c:102!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc
    scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
    CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85
    Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries
    NIP:  c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8
    REGS: c000000120c078c0 TRAP: 0700   Not tainted  (6.10.0-rc3)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 2828220f  XER: 0000000e
    CFAR: c0000000001fdc80 IRQMASK: 0
    [ ... GPRs omitted ... ]
    NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0
    LR [c0000000005d23d0] usercopy_abort+0x74/0xb0
    Call Trace:
     usercopy_abort+0x74/0xb0 (unreliable)
     __check_heap_object+0xf8/0x120
     check_heap_object+0x218/0x240
     __check_object_size+0x84/0x1a4
     dtl_file_read+0x17c/0x2c4
     full_proxy_read+0x8c/0x110
     vfs_read+0xdc/0x3a0
     ksys_read+0x84/0x144
     system_call_exception+0x124/0x330
     system_call_vectored_common+0x15c/0x2ec
    --- interrupt: 3000 at 0x7fff81f3ab34

Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0")
requires that only whitelisted areas in slab/slub objects can be copied to
userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY.
Dtl contains hypervisor dispatch events which are expected to be read by
privileged users. Hence mark this safe for user access.
Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the
entire object.

Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant