Merge pull request #1 from torvalds/master #85

e-mailky · 2014-03-31T08:07:47Z

pull from torvalds

Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 [ 0.603005] .... node #1, CPUs: torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 [ 1.200005] .... node #2, CPUs: torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 [ 1.796005] .... node #3, CPUs: torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 [ 2.393005] .... node #4, CPUs: torvalds#32 torvalds#33 torvalds#34 torvalds#35 torvalds#36 torvalds#37 torvalds#38 torvalds#39 [ 2.996005] .... node #5, CPUs: torvalds#40 torvalds#41 torvalds#42 torvalds#43 torvalds#44 torvalds#45 torvalds#46 torvalds#47 [ 3.600005] .... node torvalds#6, CPUs: torvalds#48 torvalds#49 torvalds#50 torvalds#51 #52 #53 torvalds#54 torvalds#55 [ 4.202005] .... node torvalds#7, CPUs: torvalds#56 torvalds#57 #58 torvalds#59 torvalds#60 torvalds#61 torvalds#62 torvalds#63 [ 4.811005] .... node torvalds#8, CPUs: torvalds#64 torvalds#65 torvalds#66 torvalds#67 torvalds#68 torvalds#69 #70 torvalds#71 [ 5.421006] .... node torvalds#9, CPUs: torvalds#72 torvalds#73 torvalds#74 torvalds#75 torvalds#76 torvalds#77 torvalds#78 torvalds#79 [ 6.032005] .... node torvalds#10, CPUs: torvalds#80 torvalds#81 torvalds#82 torvalds#83 torvalds#84 torvalds#85 torvalds#86 torvalds#87 [ 6.648006] .... node torvalds#11, CPUs: torvalds#88 torvalds#89 torvalds#90 torvalds#91 torvalds#92 torvalds#93 torvalds#94 torvalds#95 [ 7.262005] .... node torvalds#12, CPUs: torvalds#96 torvalds#97 torvalds#98 torvalds#99 torvalds#100 torvalds#101 torvalds#102 torvalds#103 [ 7.865005] .... node torvalds#13, CPUs: torvalds#104 torvalds#105 torvalds#106 torvalds#107 torvalds#108 torvalds#109 torvalds#110 torvalds#111 [ 8.466005] .... node torvalds#14, CPUs: torvalds#112 torvalds#113 torvalds#114 torvalds#115 torvalds#116 torvalds#117 torvalds#118 torvalds#119 [ 9.073006] .... node torvalds#15, CPUs: torvalds#120 torvalds#121 torvalds#122 torvalds#123 torvalds#124 torvalds#125 torvalds#126 torvalds#127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: huawei.libin@huawei.com Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>

Fixing the below dump: root@freescale ~$ modprobe g_serial g_serial gadget: Gadget Serial v2.4 g_serial gadget: g_serial ready BUG: sleeping function called from invalid context at /home/b29397/work/projects/upstream/usb/usb/drivers/base/power/runtime.c:952 in_atomic(): 1, irqs_disabled(): 128, pid: 805, name: modprobe 2 locks held by modprobe/805: #0: (udc_lock){+.+.+.}, at: [<7f000a74>] usb_gadget_probe_driver+0x44/0xb4 [udc_core] #1: (&(&ci->lock)->rlock){......}, at: [<7f033488>] ci_udc_start+0x94/0x110 [ci_hdrc] irq event stamp: 3878 hardirqs last enabled at (3877): [<806b6720>] _raw_spin_unlock_irqrestore+0x40/0x6c hardirqs last disabled at (3878): [<806b6474>] _raw_spin_lock_irqsave+0x2c/0xa8 softirqs last enabled at (3872): [<8002ec0c>] __do_softirq+0x1c8/0x2e8 softirqs last disabled at (3857): [<8002f180>] irq_exit+0xbc/0x110 CPU: 0 PID: 805 Comm: modprobe Not tainted 3.11.0-next-20130910+ torvalds#85 [<80016b94>] (unwind_backtrace+0x0/0xf8) from [<80012e0c>] (show_stack+0x20/0x24) [<80012e0c>] (show_stack+0x20/0x24) from [<806af554>] (dump_stack+0x9c/0xc4) [<806af554>] (dump_stack+0x9c/0xc4) from [<8005940c>] (__might_sleep+0xf4/0x134) [<8005940c>] (__might_sleep+0xf4/0x134) from [<803a04a4>] (__pm_runtime_resume+0x94/0xa0) [<803a04a4>] (__pm_runtime_resume+0x94/0xa0) from [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) from [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) from [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) from [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) from [<7f04b168>] (init+0x8c/0xb4 [g_serial]) [<7f04b168>] (init+0x8c/0xb4 [g_serial]) from [<800088e8>] (do_one_initcall+0x108/0x16c) [<800088e8>] (do_one_initcall+0x108/0x16c) from [<8008e518>] (load_module+0x1b00/0x20a4) [<8008e518>] (load_module+0x1b00/0x20a4) from [<8008eba8>] (SyS_init_module+0xec/0x100) [<8008eba8>] (SyS_init_module+0xec/0x100) from [<8000ec40>] (ret_fast_syscall+0x0/0x48) Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

@clk

The @cn is stay in @clk_notifier_list after it is freed, it cause memory corruption. Example, if @clk is registered(first), unregistered(first), registered(second), unregistered(second). The freed @cn will be used when @clk is registered(second), and the bug will be happened when @clk is unregistered(second): [ 517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister() [ 517.040000] Unable to handle kernel paging request at virtual address 00df3008 [ 517.050000] pgd = ed858000 [ 517.050000] [00df3008] *pgd=00000000 [ 517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg] [ 517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85 [ 517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000 [ 517.060000] PC is at srcu_readers_seq_idx+0x48/0x84 [ 517.060000] LR is at srcu_readers_seq_idx+0x60/0x84 [ 517.060000] pc : [<c0052720>] lr : [<c0052738>] psr: 80070013 [ 517.060000] sp : ee3e7d48 ip : 00000000 fp : ee3e7d6c [ 517.060000] r10: 00000000 r9 : ee3e6000 r8 : 00000000 [ 517.060000] r7 : ed84fe4c r6 : c068ec90 r5 : c068e430 r4 : 00000000 [ 517.060000] r3 : 00df3000 r2 : 00000000 r1 : 00000002 r0 : 00000000 [ 517.060000] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 517.060000] Control: 18c5387d Table: 2d85804a DAC: 00000015 [ 517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238) [ 517.060000] Stack: (0xee3e7d48 to 0xee3e8000) .... [ 517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc) [ 517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114) [ 517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) [ 517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34) [ 517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) [ 517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) [ 517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) [ 517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28) [ 517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) [ 517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4) [ 517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) [ 517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78) [ 517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) [ 517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) [ 517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c) [ 517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48) [ 517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101) Cc: stable@kernel.org Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Mike Turquette <mturquette@linaro.org> [mturquette@linaro.org: shortened $SUBJECT]

@clk

commit 72b5322 upstream. The @cn is stay in @clk_notifier_list after it is freed, it cause memory corruption. Example, if @clk is registered(first), unregistered(first), registered(second), unregistered(second). The freed @cn will be used when @clk is registered(second), and the bug will be happened when @clk is unregistered(second): [ 517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister() [ 517.040000] Unable to handle kernel paging request at virtual address 00df3008 [ 517.050000] pgd = ed858000 [ 517.050000] [00df3008] *pgd=00000000 [ 517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg] [ 517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85 [ 517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000 [ 517.060000] PC is at srcu_readers_seq_idx+0x48/0x84 [ 517.060000] LR is at srcu_readers_seq_idx+0x60/0x84 [ 517.060000] pc : [<c0052720>] lr : [<c0052738>] psr: 80070013 [ 517.060000] sp : ee3e7d48 ip : 00000000 fp : ee3e7d6c [ 517.060000] r10: 00000000 r9 : ee3e6000 r8 : 00000000 [ 517.060000] r7 : ed84fe4c r6 : c068ec90 r5 : c068e430 r4 : 00000000 [ 517.060000] r3 : 00df3000 r2 : 00000000 r1 : 00000002 r0 : 00000000 [ 517.060000] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 517.060000] Control: 18c5387d Table: 2d85804a DAC: 00000015 [ 517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238) [ 517.060000] Stack: (0xee3e7d48 to 0xee3e8000) .... [ 517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc) [ 517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114) [ 517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) [ 517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34) [ 517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) [ 517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) [ 517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) [ 517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28) [ 517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) [ 517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4) [ 517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) [ 517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78) [ 517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) [ 517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) [ 517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c) [ 517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48) [ 517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101) Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Mike Turquette <mturquette@linaro.org> [mturquette@linaro.org: shortened $SUBJECT] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

@clk

commit 72b5322 upstream. The @cn is stay in @clk_notifier_list after it is freed, it cause memory corruption. Example, if @clk is registered(first), unregistered(first), registered(second), unregistered(second). The freed @cn will be used when @clk is registered(second), and the bug will be happened when @clk is unregistered(second): [ 517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister() [ 517.040000] Unable to handle kernel paging request at virtual address 00df3008 [ 517.050000] pgd = ed858000 [ 517.050000] [00df3008] *pgd=00000000 [ 517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg] [ 517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85 [ 517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000 [ 517.060000] PC is at srcu_readers_seq_idx+0x48/0x84 [ 517.060000] LR is at srcu_readers_seq_idx+0x60/0x84 [ 517.060000] pc : [<c0052720>] lr : [<c0052738>] psr: 80070013 [ 517.060000] sp : ee3e7d48 ip : 00000000 fp : ee3e7d6c [ 517.060000] r10: 00000000 r9 : ee3e6000 r8 : 00000000 [ 517.060000] r7 : ed84fe4c r6 : c068ec90 r5 : c068e430 r4 : 00000000 [ 517.060000] r3 : 00df3000 r2 : 00000000 r1 : 00000002 r0 : 00000000 [ 517.060000] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 517.060000] Control: 18c5387d Table: 2d85804a DAC: 00000015 [ 517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238) [ 517.060000] Stack: (0xee3e7d48 to 0xee3e8000) .... [ 517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc) [ 517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114) [ 517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) [ 517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34) [ 517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) [ 517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) [ 517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) [ 517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28) [ 517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) [ 517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4) [ 517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) [ 517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78) [ 517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) [ 517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) [ 517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c) [ 517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48) [ 517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101) Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Mike Turquette <mturquette@linaro.org> [mturquette@linaro.org: shortened $SUBJECT] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Fixing the below dump: root@freescale ~$ modprobe g_serial g_serial gadget: Gadget Serial v2.4 g_serial gadget: g_serial ready BUG: sleeping function called from invalid context at /home/b29397/work/projects/upstream/usb/usb/drivers/base/power/runtime.c:952 in_atomic(): 1, irqs_disabled(): 128, pid: 805, name: modprobe 2 locks held by modprobe/805: #0: (udc_lock){+.+.+.}, at: [<7f000a74>] usb_gadget_probe_driver+0x44/0xb4 [udc_core] Freescale#1: (&(&ci->lock)->rlock){......}, at: [<7f033488>] ci_udc_start+0x94/0x110 [ci_hdrc] irq event stamp: 3878 hardirqs last enabled at (3877): [<806b6720>] _raw_spin_unlock_irqrestore+0x40/0x6c hardirqs last disabled at (3878): [<806b6474>] _raw_spin_lock_irqsave+0x2c/0xa8 softirqs last enabled at (3872): [<8002ec0c>] __do_softirq+0x1c8/0x2e8 softirqs last disabled at (3857): [<8002f180>] irq_exit+0xbc/0x110 CPU: 0 PID: 805 Comm: modprobe Not tainted 3.11.0-next-20130910+ Freescale#85 [<80016b94>] (unwind_backtrace+0x0/0xf8) from [<80012e0c>] (show_stack+0x20/0x24) [<80012e0c>] (show_stack+0x20/0x24) from [<806af554>] (dump_stack+0x9c/0xc4) [<806af554>] (dump_stack+0x9c/0xc4) from [<8005940c>] (__might_sleep+0xf4/0x134) [<8005940c>] (__might_sleep+0xf4/0x134) from [<803a04a4>] (__pm_runtime_resume+0x94/0xa0) [<803a04a4>] (__pm_runtime_resume+0x94/0xa0) from [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) from [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) from [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) from [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) from [<7f04b168>] (init+0x8c/0xb4 [g_serial]) [<7f04b168>] (init+0x8c/0xb4 [g_serial]) from [<800088e8>] (do_one_initcall+0x108/0x16c) [<800088e8>] (do_one_initcall+0x108/0x16c) from [<8008e518>] (load_module+0x1b00/0x20a4) [<8008e518>] (load_module+0x1b00/0x20a4) from [<8008eba8>] (SyS_init_module+0xec/0x100) [<8008eba8>] (SyS_init_module+0xec/0x100) from [<8000ec40>] (ret_fast_syscall+0x0/0x48) Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

When runing with the kernel(3.15-rc7+), the follow bug occurs: [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586 [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python [ 9969.441175] INFO: lockdep is turned off. [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ torvalds#85 [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012 [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18 [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000 [ 9969.974071] Call Trace: [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66 [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130 [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0 [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210 [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140 [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150 [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40 [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10 [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50 [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40 [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380 [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0 [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20 [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90 [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock protection region to protect the access to cpuset only in current_cpuset_is_being_rebound(). So that we can avoid this bug. This patch is a temporary solution that just addresses the bug mentioned above, can not fix the long-standing issue about cpuset.mems rebinding on fork(): "When the forker's task_struct is duplicated (which includes ->mems_allowed) and it races with an update to cpuset_being_rebound in update_tasks_nodemask() then the task's mems_allowed doesn't get updated. And the child task's mems_allowed can be wrong if the cpuset's nodemask changes before the child has been added to the cgroup's tasklist." Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable <stable@vger.kernel.org>

commit 391acf9 upstream. When runing with the kernel(3.15-rc7+), the follow bug occurs: [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586 [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python [ 9969.441175] INFO: lockdep is turned off. [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85 [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012 [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18 [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000 [ 9969.974071] Call Trace: [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66 [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130 [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0 [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210 [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140 [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150 [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40 [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10 [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50 [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40 [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380 [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0 [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20 [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90 [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock protection region to protect the access to cpuset only in current_cpuset_is_being_rebound(). So that we can avoid this bug. This patch is a temporary solution that just addresses the bug mentioned above, can not fix the long-standing issue about cpuset.mems rebinding on fork(): "When the forker's task_struct is duplicated (which includes ->mems_allowed) and it races with an update to cpuset_being_rebound in update_tasks_nodemask() then the task's mems_allowed doesn't get updated. And the child task's mems_allowed can be wrong if the cpuset's nodemask changes before the child has been added to the cgroup's tasklist." Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

… context Fixing the below dump: root@freescale ~$ modprobe g_serial g_serial gadget: Gadget Serial v2.4 g_serial gadget: g_serial ready BUG: sleeping function called from invalid context at /home/b29397/work/projects/upstream/usb/usb/drivers/base/power/runtime.c:952 in_atomic(): 1, irqs_disabled(): 128, pid: 805, name: modprobe 2 locks held by modprobe/805: #0: (udc_lock){+.+.+.}, at: [<7f000a74>] usb_gadget_probe_driver+0x44/0xb4 [udc_core] #1: (&(&ci->lock)->rlock){......}, at: [<7f033488>] ci_udc_start+0x94/0x110 [ci_hdrc] irq event stamp: 3878 hardirqs last enabled at (3877): [<806b6720>] _raw_spin_unlock_irqrestore+0x40/0x6c hardirqs last disabled at (3878): [<806b6474>] _raw_spin_lock_irqsave+0x2c/0xa8 softirqs last enabled at (3872): [<8002ec0c>] __do_softirq+0x1c8/0x2e8 softirqs last disabled at (3857): [<8002f180>] irq_exit+0xbc/0x110 CPU: 0 PID: 805 Comm: modprobe Not tainted 3.11.0-next-20130910+ torvalds#85 [<80016b94>] (unwind_backtrace+0x0/0xf8) from [<80012e0c>] (show_stack+0x20/0x24) [<80012e0c>] (show_stack+0x20/0x24) from [<806af554>] (dump_stack+0x9c/0xc4) [<806af554>] (dump_stack+0x9c/0xc4) from [<8005940c>] (__might_sleep+0xf4/0x134) [<8005940c>] (__might_sleep+0xf4/0x134) from [<803a04a4>] (__pm_runtime_resume+0x94/0xa0) [<803a04a4>] (__pm_runtime_resume+0x94/0xa0) from [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) from [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) from [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) from [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) from [<7f04b168>] (init+0x8c/0xb4 [g_serial]) [<7f04b168>] (init+0x8c/0xb4 [g_serial]) from [<800088e8>] (do_one_initcall+0x108/0x16c) [<800088e8>] (do_one_initcall+0x108/0x16c) from [<8008e518>] (load_module+0x1b00/0x20a4) [<8008e518>] (load_module+0x1b00/0x20a4) from [<8008eba8>] (SyS_init_module+0xec/0x100) [<8008eba8>] (SyS_init_module+0xec/0x100) from [<8000ec40>] (ret_fast_syscall+0x0/0x48) Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

When runing with the kernel(3.15-rc7+), the follow bug occurs: [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586 [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python [ 9969.441175] INFO: lockdep is turned off. [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ torvalds#85 [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012 [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18 [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000 [ 9969.974071] Call Trace: [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66 [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130 [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0 [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210 [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140 [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150 [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40 [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10 [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50 [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40 [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380 [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0 [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20 [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90 [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock protection region to protect the access to cpuset only in current_cpuset_is_being_rebound(). So that we can avoid this bug. This patch is a temporary solution that just addresses the bug mentioned above, can not fix the long-standing issue about cpuset.mems rebinding on fork(): "When the forker's task_struct is duplicated (which includes ->mems_allowed) and it races with an update to cpuset_being_rebound in update_tasks_nodemask() then the task's mems_allowed doesn't get updated. And the child task's mems_allowed can be wrong if the cpuset's nodemask changes before the child has been added to the cgroup's tasklist." Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable <stable@vger.kernel.org>

Fixing the below dump: root@freescale ~$ modprobe g_serial g_serial gadget: Gadget Serial v2.4 g_serial gadget: g_serial ready BUG: sleeping function called from invalid context at /home/b29397/work/projects/upstream/usb/usb/drivers/base/power/runtime.c:952 in_atomic(): 1, irqs_disabled(): 128, pid: 805, name: modprobe 2 locks held by modprobe/805: #0: (udc_lock){+.+.+.}, at: [<7f000a74>] usb_gadget_probe_driver+0x44/0xb4 [udc_core] wandboard-org#1: (&(&ci->lock)->rlock){......}, at: [<7f033488>] ci_udc_start+0x94/0x110 [ci_hdrc] irq event stamp: 3878 hardirqs last enabled at (3877): [<806b6720>] _raw_spin_unlock_irqrestore+0x40/0x6c hardirqs last disabled at (3878): [<806b6474>] _raw_spin_lock_irqsave+0x2c/0xa8 softirqs last enabled at (3872): [<8002ec0c>] __do_softirq+0x1c8/0x2e8 softirqs last disabled at (3857): [<8002f180>] irq_exit+0xbc/0x110 CPU: 0 PID: 805 Comm: modprobe Not tainted 3.11.0-next-20130910+ torvalds#85 [<80016b94>] (unwind_backtrace+0x0/0xf8) from [<80012e0c>] (show_stack+0x20/0x24) [<80012e0c>] (show_stack+0x20/0x24) from [<806af554>] (dump_stack+0x9c/0xc4) [<806af554>] (dump_stack+0x9c/0xc4) from [<8005940c>] (__might_sleep+0xf4/0x134) [<8005940c>] (__might_sleep+0xf4/0x134) from [<803a04a4>] (__pm_runtime_resume+0x94/0xa0) [<803a04a4>] (__pm_runtime_resume+0x94/0xa0) from [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) [<7f0334a4>] (ci_udc_start+0xb0/0x110 [ci_hdrc]) from [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) [<7f0009b4>] (udc_bind_to_driver+0x5c/0xd8 [udc_core]) from [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) [<7f000ab0>] (usb_gadget_probe_driver+0x80/0xb4 [udc_core]) from [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) [<7f008618>] (usb_composite_probe+0xac/0xd8 [libcomposite]) from [<7f04b168>] (init+0x8c/0xb4 [g_serial]) [<7f04b168>] (init+0x8c/0xb4 [g_serial]) from [<800088e8>] (do_one_initcall+0x108/0x16c) [<800088e8>] (do_one_initcall+0x108/0x16c) from [<8008e518>] (load_module+0x1b00/0x20a4) [<8008e518>] (load_module+0x1b00/0x20a4) from [<8008eba8>] (SyS_init_module+0xec/0x100) [<8008eba8>] (SyS_init_module+0xec/0x100) from [<8000ec40>] (ret_fast_syscall+0x0/0x48) Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

BugLink: http://bugs.launchpad.net/bugs/1356913 commit 391acf9 upstream. When runing with the kernel(3.15-rc7+), the follow bug occurs: [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586 [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python [ 9969.441175] INFO: lockdep is turned off. [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ torvalds#85 [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012 [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18 [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000 [ 9969.974071] Call Trace: [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66 [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130 [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0 [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210 [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140 [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150 [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40 [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10 [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50 [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40 [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380 [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0 [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20 [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90 [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock protection region to protect the access to cpuset only in current_cpuset_is_being_rebound(). So that we can avoid this bug. This patch is a temporary solution that just addresses the bug mentioned above, can not fix the long-standing issue about cpuset.mems rebinding on fork(): "When the forker's task_struct is duplicated (which includes ->mems_allowed) and it races with an update to cpuset_being_rebound in update_tasks_nodemask() then the task's mems_allowed doesn't get updated. And the child task's mems_allowed can be wrong if the cpuset's nodemask changes before the child has been added to the cgroup's tasklist." Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>

enbable vblank event and reserve 128MB memory for graphic usage

This is due to commit 86839c5 "xen/block: add multi-page ring support" When using an guest under UEFI - after the domain is destroyed the following warning comes from blkback. ------------[ cut here ]------------ WARNING: CPU: 2 PID: 95 at /home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274 xen_blkif_deferred_free+0x1f4/0x1f8() Modules linked in: CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G W 4.2.0 #85 Hardware name: APM X-Gene Mustang board (DT) Workqueue: events xen_blkif_deferred_free Call trace: [<ffff8000000890a8>] dump_backtrace+0x0/0x124 [<ffff8000000891dc>] show_stack+0x10/0x1c [<ffff8000007653bc>] dump_stack+0x78/0x98 [<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4 [<ffff800000097f80>] warn_slowpath_null+0x14/0x20 [<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8 [<ffff8000000ad020>] process_one_work+0x160/0x3b4 [<ffff8000000ad3b4>] worker_thread+0x140/0x494 [<ffff8000000b2e34>] kthread+0xd8/0xf0 ---[ end trace 6f859b7883c88cdd ]--- Request allocation has been moved to connect_ring, which is called every time blkback connects to the frontend (this can happen multiple times during a blkback instance life cycle). On the other hand, request freeing has not been moved, so it's only called when destroying the backend instance. Due to this mismatch, blkback can allocate the request pool multiple times, without freeing it. In order to fix it, move the freeing of requests to xen_blkif_disconnect to restore the symmetry between request allocation and freeing. Reported-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Julien Grall <julien.grall@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: xen-devel@lists.xenproject.org CC: stable@vger.kernel.org # 4.2 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit f929d42 upstream. This is due to commit 86839c5 "xen/block: add multi-page ring support" When using an guest under UEFI - after the domain is destroyed the following warning comes from blkback. ------------[ cut here ]------------ WARNING: CPU: 2 PID: 95 at /home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274 xen_blkif_deferred_free+0x1f4/0x1f8() Modules linked in: CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G W 4.2.0 #85 Hardware name: APM X-Gene Mustang board (DT) Workqueue: events xen_blkif_deferred_free Call trace: [<ffff8000000890a8>] dump_backtrace+0x0/0x124 [<ffff8000000891dc>] show_stack+0x10/0x1c [<ffff8000007653bc>] dump_stack+0x78/0x98 [<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4 [<ffff800000097f80>] warn_slowpath_null+0x14/0x20 [<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8 [<ffff8000000ad020>] process_one_work+0x160/0x3b4 [<ffff8000000ad3b4>] worker_thread+0x140/0x494 [<ffff8000000b2e34>] kthread+0xd8/0xf0 ---[ end trace 6f859b7883c88cdd ]--- Request allocation has been moved to connect_ring, which is called every time blkback connects to the frontend (this can happen multiple times during a blkback instance life cycle). On the other hand, request freeing has not been moved, so it's only called when destroying the backend instance. Due to this mismatch, blkback can allocate the request pool multiple times, without freeing it. In order to fix it, move the freeing of requests to xen_blkif_disconnect to restore the symmetry between request allocation and freeing. Reported-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Julien Grall <julien.grall@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

BugLink: http://bugs.launchpad.net/bugs/1509886 commit f929d42 upstream. This is due to commit 86839c5 "xen/block: add multi-page ring support" When using an guest under UEFI - after the domain is destroyed the following warning comes from blkback. ------------[ cut here ]------------ WARNING: CPU: 2 PID: 95 at /home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274 xen_blkif_deferred_free+0x1f4/0x1f8() Modules linked in: CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G W 4.2.0 torvalds#85 Hardware name: APM X-Gene Mustang board (DT) Workqueue: events xen_blkif_deferred_free Call trace: [<ffff8000000890a8>] dump_backtrace+0x0/0x124 [<ffff8000000891dc>] show_stack+0x10/0x1c [<ffff8000007653bc>] dump_stack+0x78/0x98 [<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4 [<ffff800000097f80>] warn_slowpath_null+0x14/0x20 [<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8 [<ffff8000000ad020>] process_one_work+0x160/0x3b4 [<ffff8000000ad3b4>] worker_thread+0x140/0x494 [<ffff8000000b2e34>] kthread+0xd8/0xf0 ---[ end trace 6f859b7883c88cdd ]--- Request allocation has been moved to connect_ring, which is called every time blkback connects to the frontend (this can happen multiple times during a blkback instance life cycle). On the other hand, request freeing has not been moved, so it's only called when destroying the backend instance. Due to this mismatch, blkback can allocate the request pool multiple times, without freeing it. In order to fix it, move the freeing of requests to xen_blkif_disconnect to restore the symmetry between request allocation and freeing. Reported-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Julien Grall <julien.grall@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Brad Figg <brad.figg@canonical.com>

This is due to commit 86839c5 "xen/block: add multi-page ring support" When using an guest under UEFI - after the domain is destroyed the following warning comes from blkback. ------------[ cut here ]------------ WARNING: CPU: 2 PID: 95 at /home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274 xen_blkif_deferred_free+0x1f4/0x1f8() Modules linked in: CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G W 4.2.0 openbmc#85 Hardware name: APM X-Gene Mustang board (DT) Workqueue: events xen_blkif_deferred_free Call trace: [<ffff8000000890a8>] dump_backtrace+0x0/0x124 [<ffff8000000891dc>] show_stack+0x10/0x1c [<ffff8000007653bc>] dump_stack+0x78/0x98 [<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4 [<ffff800000097f80>] warn_slowpath_null+0x14/0x20 [<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8 [<ffff8000000ad020>] process_one_work+0x160/0x3b4 [<ffff8000000ad3b4>] worker_thread+0x140/0x494 [<ffff8000000b2e34>] kthread+0xd8/0xf0 ---[ end trace 6f859b7883c88cdd ]--- Request allocation has been moved to connect_ring, which is called every time blkback connects to the frontend (this can happen multiple times during a blkback instance life cycle). On the other hand, request freeing has not been moved, so it's only called when destroying the backend instance. Due to this mismatch, blkback can allocate the request pool multiple times, without freeing it. In order to fix it, move the freeing of requests to xen_blkif_disconnect to restore the symmetry between request allocation and freeing. Reported-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Julien Grall <julien.grall@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: xen-devel@lists.xenproject.org CC: stable@vger.kernel.org # 4.2 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the verifier behaviour. [...] torvalds#84 raw_stack: no skb_load_bytes OK torvalds#85 raw_stack: skb_load_bytes, no init OK torvalds#86 raw_stack: skb_load_bytes, init OK torvalds#87 raw_stack: skb_load_bytes, spilled regs around bounds OK torvalds#88 raw_stack: skb_load_bytes, spilled regs corruption OK torvalds#89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK torvalds#90 raw_stack: skb_load_bytes, spilled regs + data OK torvalds#91 raw_stack: skb_load_bytes, invalid access 1 OK torvalds#92 raw_stack: skb_load_bytes, invalid access 2 OK torvalds#93 raw_stack: skb_load_bytes, invalid access 3 OK torvalds#94 raw_stack: skb_load_bytes, invalid access 4 OK torvalds#95 raw_stack: skb_load_bytes, invalid access 5 OK torvalds#96 raw_stack: skb_load_bytes, invalid access 6 OK torvalds#97 raw_stack: skb_load_bytes, large access OK Summary: 98 PASSED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

@clk

commit 72b5322 upstream. The @cn is stay in @clk_notifier_list after it is freed, it cause memory corruption. Example, if @clk is registered(first), unregistered(first), registered(second), unregistered(second). The freed @cn will be used when @clk is registered(second), and the bug will be happened when @clk is unregistered(second): [ 517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister() [ 517.040000] Unable to handle kernel paging request at virtual address 00df3008 [ 517.050000] pgd = ed858000 [ 517.050000] [00df3008] *pgd=00000000 [ 517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg] [ 517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85 [ 517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000 [ 517.060000] PC is at srcu_readers_seq_idx+0x48/0x84 [ 517.060000] LR is at srcu_readers_seq_idx+0x60/0x84 [ 517.060000] pc : [<c0052720>] lr : [<c0052738>] psr: 80070013 [ 517.060000] sp : ee3e7d48 ip : 00000000 fp : ee3e7d6c [ 517.060000] r10: 00000000 r9 : ee3e6000 r8 : 00000000 [ 517.060000] r7 : ed84fe4c r6 : c068ec90 r5 : c068e430 r4 : 00000000 [ 517.060000] r3 : 00df3000 r2 : 00000000 r1 : 00000002 r0 : 00000000 [ 517.060000] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 517.060000] Control: 18c5387d Table: 2d85804a DAC: 00000015 [ 517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238) [ 517.060000] Stack: (0xee3e7d48 to 0xee3e8000) .... [ 517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc) [ 517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114) [ 517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) [ 517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34) [ 517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) [ 517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) [ 517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) [ 517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28) [ 517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) [ 517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4) [ 517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) [ 517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78) [ 517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) [ 517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) [ 517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c) [ 517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48) [ 517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101) Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Mike Turquette <mturquette@linaro.org> [mturquette@linaro.org: shortened $SUBJECT] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

@clk

commit 72b5322 upstream. The @cn is stay in @clk_notifier_list after it is freed, it cause memory corruption. Example, if @clk is registered(first), unregistered(first), registered(second), unregistered(second). The freed @cn will be used when @clk is registered(second), and the bug will be happened when @clk is unregistered(second): [ 517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister() [ 517.040000] Unable to handle kernel paging request at virtual address 00df3008 [ 517.050000] pgd = ed858000 [ 517.050000] [00df3008] *pgd=00000000 [ 517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg] [ 517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G O 3.10.0-rc3-00119-ga93cb29-dirty torvalds#85 [ 517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000 [ 517.060000] PC is at srcu_readers_seq_idx+0x48/0x84 [ 517.060000] LR is at srcu_readers_seq_idx+0x60/0x84 [ 517.060000] pc : [<c0052720>] lr : [<c0052738>] psr: 80070013 [ 517.060000] sp : ee3e7d48 ip : 00000000 fp : ee3e7d6c [ 517.060000] r10: 00000000 r9 : ee3e6000 r8 : 00000000 [ 517.060000] r7 : ed84fe4c r6 : c068ec90 r5 : c068e430 r4 : 00000000 [ 517.060000] r3 : 00df3000 r2 : 00000000 r1 : 00000002 r0 : 00000000 [ 517.060000] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 517.060000] Control: 18c5387d Table: 2d85804a DAC: 00000015 [ 517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238) [ 517.060000] Stack: (0xee3e7d48 to 0xee3e8000) .... [ 517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc) [ 517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114) [ 517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) [ 517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34) [ 517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) [ 517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) [ 517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) [ 517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28) [ 517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) [ 517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4) [ 517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) [ 517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78) [ 517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) [ 517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) [ 517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c) [ 517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48) [ 517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101) Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Mike Turquette <mturquette@linaro.org> [mturquette@linaro.org: shortened $SUBJECT] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

commit f929d42 upstream. This is due to commit 86839c5 "xen/block: add multi-page ring support" When using an guest under UEFI - after the domain is destroyed the following warning comes from blkback. ------------[ cut here ]------------ WARNING: CPU: 2 PID: 95 at /home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274 xen_blkif_deferred_free+0x1f4/0x1f8() Modules linked in: CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G W 4.2.0 torvalds#85 Hardware name: APM X-Gene Mustang board (DT) Workqueue: events xen_blkif_deferred_free Call trace: [<ffff8000000890a8>] dump_backtrace+0x0/0x124 [<ffff8000000891dc>] show_stack+0x10/0x1c [<ffff8000007653bc>] dump_stack+0x78/0x98 [<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4 [<ffff800000097f80>] warn_slowpath_null+0x14/0x20 [<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8 [<ffff8000000ad020>] process_one_work+0x160/0x3b4 [<ffff8000000ad3b4>] worker_thread+0x140/0x494 [<ffff8000000b2e34>] kthread+0xd8/0xf0 ---[ end trace 6f859b7883c88cdd ]--- Request allocation has been moved to connect_ring, which is called every time blkback connects to the frontend (this can happen multiple times during a blkback instance life cycle). On the other hand, request freeing has not been moved, so it's only called when destroying the backend instance. Due to this mismatch, blkback can allocate the request pool multiple times, without freeing it. In order to fix it, move the freeing of requests to xen_blkif_disconnect to restore the symmetry between request allocation and freeing. Reported-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Julien Grall <julien.grall@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

WARNING: line over 80 characters torvalds#69: FILE: ipc/sem.c:1997: + * fastpath: the semop has completed, either successfully or not, from WARNING: line over 80 characters #70: FILE: ipc/sem.c:1998: + * the syscall pov, is quite irrelevant to us at this point; we're done. WARNING: line over 80 characters torvalds#73: FILE: ipc/sem.c:2001: + * spuriously. The queue.status is checked again in the slowpath (aka WARNING: line over 80 characters torvalds#74: FILE: ipc/sem.c:2002: + * after taking sem_lock), such that we can detect scenarios where we WARNING: line over 80 characters torvalds#75: FILE: ipc/sem.c:2003: + * were awakened externally, during the window between wake_q_add() and WARNING: line over 80 characters torvalds#84: FILE: ipc/sem.c:2009: + * User space could assume that semop() is a memory barrier: WARNING: line over 80 characters torvalds#85: FILE: ipc/sem.c:2010: + * Without the mb(), the cpu could speculatively read in user WARNING: line over 80 characters torvalds#86: FILE: ipc/sem.c:2011: + * space stale data that was overwritten by the previous owner total: 0 errors, 8 warnings, 127 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/ipc-sem-simplify-wait-wake-loop.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit 745f17a upstream. We got a WARNING in ext4_add_complete_io: ================================================================== WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250 CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 torvalds#85 RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4] [...] Call Trace: <TASK> ext4_end_bio+0xa8/0x240 [ext4] bio_endio+0x195/0x310 blk_update_request+0x184/0x770 scsi_end_request+0x2f/0x240 scsi_io_completion+0x75/0x450 scsi_finish_command+0xef/0x160 scsi_complete+0xa3/0x180 blk_complete_reqs+0x60/0x80 blk_done_softirq+0x25/0x40 __do_softirq+0x119/0x4c8 run_ksoftirqd+0x42/0x70 smpboot_thread_fn+0x136/0x3c0 kthread+0x140/0x1a0 ret_from_fork+0x2c/0x50 ================================================================== Above issue may happen as follows: cpu1 cpu2 ----------------------------|---------------------------- mount -o dioread_lock ext4_writepages ext4_do_writepages *if (ext4_should_dioread_nolock(inode))* // rsv_blocks is not assigned here mount -o remount,dioread_nolock ext4_journal_start_with_reserve __ext4_journal_start __ext4_journal_start_sb jbd2__journal_start *if (rsv_blocks)* // h_rsv_handle is not initialized here mpage_map_and_submit_extent mpage_map_one_extent dioread_nolock = ext4_should_dioread_nolock(inode) if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN)) mpd->io_submit.io_end->handle = handle->h_rsv_handle ext4_set_io_unwritten_flag io_end->flag |= EXT4_IO_END_UNWRITTEN // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag scsi_finish_command scsi_io_completion scsi_io_completion_action scsi_end_request blk_update_request req_bio_endio bio_endio bio->bi_end_io > ext4_end_bio ext4_put_io_end_defer ext4_add_complete_io // trigger WARN_ON(!io_end->handle && sbi->s_journal); The immediate cause of this problem is that ext4_should_dioread_nolock() function returns inconsistent values in the ext4_do_writepages() and mpage_map_one_extent(). There are four conditions in this function that can be changed at mount time to cause this problem. These four conditions can be divided into two categories: (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount The two in the first category have been fixed by commit c8585c6 ("ext4: fix races between changing inode journal mode and ext4_writepages") and commit cb85f4d ("ext4: fix race between writepages and enabling EXT4_EXTENTS_FL") respectively. Two cases in the other category have not yet been fixed, and the above issue is caused by this situation. We refer to the fix for the first category, when applying options during remount, we grab s_writepages_rwsem to avoid racing with writepages ops to trigger this problem. Fixes: 6b523df ("ext4: use transaction reservation for extent conversion in ext4_end_io") Cc: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230524072538.2883391-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This series is backport of sched_ext-v5 (597552f5447a) which is the split version of devel branch commit e69323c ("Merge pull request torvalds#85 from sched-ext/misc-fixes "). This commit makes the following fixes for backport: - s/wakeup_preempt_scx/check_preempt_curr_scx/ to revert the rename which happened post v6.6. - Remove the use of BPF_F_TIMER_CPU_PIN which was introduced after v6.6.

commit 745f17a upstream. We got a WARNING in ext4_add_complete_io: ================================================================== WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250 CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 torvalds#85 RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4] [...] Call Trace: <TASK> ext4_end_bio+0xa8/0x240 [ext4] bio_endio+0x195/0x310 blk_update_request+0x184/0x770 scsi_end_request+0x2f/0x240 scsi_io_completion+0x75/0x450 scsi_finish_command+0xef/0x160 scsi_complete+0xa3/0x180 blk_complete_reqs+0x60/0x80 blk_done_softirq+0x25/0x40 __do_softirq+0x119/0x4c8 run_ksoftirqd+0x42/0x70 smpboot_thread_fn+0x136/0x3c0 kthread+0x140/0x1a0 ret_from_fork+0x2c/0x50 ================================================================== Above issue may happen as follows: cpu1 cpu2 ----------------------------|---------------------------- mount -o dioread_lock ext4_writepages ext4_do_writepages *if (ext4_should_dioread_nolock(inode))* // rsv_blocks is not assigned here mount -o remount,dioread_nolock ext4_journal_start_with_reserve __ext4_journal_start __ext4_journal_start_sb jbd2__journal_start *if (rsv_blocks)* // h_rsv_handle is not initialized here mpage_map_and_submit_extent mpage_map_one_extent dioread_nolock = ext4_should_dioread_nolock(inode) if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN)) mpd->io_submit.io_end->handle = handle->h_rsv_handle ext4_set_io_unwritten_flag io_end->flag |= EXT4_IO_END_UNWRITTEN // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag scsi_finish_command scsi_io_completion scsi_io_completion_action scsi_end_request blk_update_request req_bio_endio bio_endio bio->bi_end_io > ext4_end_bio ext4_put_io_end_defer ext4_add_complete_io // trigger WARN_ON(!io_end->handle && sbi->s_journal); The immediate cause of this problem is that ext4_should_dioread_nolock() function returns inconsistent values in the ext4_do_writepages() and mpage_map_one_extent(). There are four conditions in this function that can be changed at mount time to cause this problem. These four conditions can be divided into two categories: (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount The two in the first category have been fixed by commit c8585c6 ("ext4: fix races between changing inode journal mode and ext4_writepages") and commit cb85f4d ("ext4: fix race between writepages and enabling EXT4_EXTENTS_FL") respectively. Two cases in the other category have not yet been fixed, and the above issue is caused by this situation. We refer to the fix for the first category, when applying options during remount, we grab s_writepages_rwsem to avoid racing with writepages ops to trigger this problem. Fixes: 6b523df ("ext4: use transaction reservation for extent conversion in ext4_end_io") Cc: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230524072538.2883391-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

From: Kurt Kanzenbach <kurt@linutronix.de> To: gregkh@linuxfoundation.org Cc: kurt@linutronix.de, tglx@linutronix.de, jslaby@suse.com, linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] tty: serial: 8250: pass IRQ shared flag to UART ports Date: Fri, 16 Mar 2018 12:31:58 +0100 [thread overview] Message-ID: <20180316113158.10298-1-kurt@linutronix.de> (raw) On some systems IRQ lines between multiple UARTs might be shared. If so, the irqflags have to be configured accordingly. The reason is: The 8250 port startup code performs IRQ tests *before* the IRQ handler for that particular port is registered. This is performed in serial8250_do_startup(). This function checks whether IRQF_SHARED is configured and only then disables the IRQ line while testing. This test is performed upon each open() of the UART device. Imagine two UARTs share the same IRQ line: On is already opened and the IRQ is active. When the second UART is opened, the IRQ line has to be disabled while performing IRQ tests. Otherwise an IRQ might handler might be invoked, but the the IRQ itself cannot be handled, because the corresponding handler isn't registered, yet. That's because the 8250 code uses a chain-handler and invokes the corresponding port's IRQ handling rountines himself. Unfortunately this IRQF_SHARED flag isn't configured for UARTs probed via device tree even if the IRQs are shared. This way, the actual and shared IRQ line isn't disabled while performing tests and the kernel correctly detects a spurious IRQ. So, adding this flag to the DT probe solves the issue. Note: The UPF_SHARE_IRQ flag is configured unconditionally. Therefore, the IRQF_SHARED flag can be set unconditionally as well. Example stacktrace by performing echo 1 > /dev/ttyS2 on a non-patched system: |irq 85: nobody cared (try booting with the "irqpoll" option) | [...] |handlers: |[<ffff0000080fc628>] irq_default_primary_handler threaded [<ffff00000855fbb8>] serial8250_interrupt |Disabling IRQ torvalds#85 Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>

When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ torvalds#85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ torvalds#85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d2373 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

scx: Move scx_ops_enable_state_str[] outside CONFIG_SCHED_DEBUG

When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ torvalds#85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ torvalds#85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d2373 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d2373 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

KSAN calls into rcu code which then triggers a write that reenters into KSAN getting the system stuck doing infinite recursion. #0 kmsan_get_context () at mm/kmsan/kmsan.h:106 #1 __msan_get_context_state () at mm/kmsan/instrumentation.c:331 #2 0xffffffff81495671 in get_current () at ./arch/x86/include/asm/current.h:42 #3 rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 #4 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 #5 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#6 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#7 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#8 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#9 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#10 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#11 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#12 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#13 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#14 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#15 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#16 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#17 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#18 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#19 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#20 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#21 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#22 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#23 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#24 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#25 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#26 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#27 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#28 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#29 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#30 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#31 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#32 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#33 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#34 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#35 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#36 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#37 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#38 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#39 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#40 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#41 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#42 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#43 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#44 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#45 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#46 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#47 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#48 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#49 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#50 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#51 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 #52 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 #53 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#54 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#55 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#56 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#57 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 #58 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#59 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#60 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#61 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#62 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#63 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#64 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#65 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#66 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#67 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#68 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#69 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 #70 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#71 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#72 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#73 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#74 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#75 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#76 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#77 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff86203c90) at ./arch/x86/include/asm/kmsan.h:82 torvalds#78 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff86203c90) at mm/kmsan/shadow.c:75 torvalds#79 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff86203c90, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#80 kmsan_get_shadow_origin_ptr (address=0xffffffff86203c90, size=8, store=false) at mm/kmsan/shadow.c:97 torvalds#81 0xffffffff81b1dc72 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=8, store=false) at mm/kmsan/instrumentation.c:36 torvalds#82 __msan_metadata_ptr_for_load_8 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:92 torvalds#83 0xffffffff814fdb9e in filter_irq_stacks (entries=<optimized out>, nr_entries=4) at kernel/stacktrace.c:397 torvalds#84 0xffffffff829520e8 in stack_depot_save_flags (entries=0xffffffff8620d974 <init_task+1012>, nr_entries=4, alloc_flags=0, depot_flags=0) at lib/stackdepot.c:500 torvalds#85 0xffffffff81b1e560 in __msan_poison_alloca (address=0xffffffff86203da0, size=24, descr=<optimized out>) at mm/kmsan/instrumentation.c:285 torvalds#86 0xffffffff8562821c in _printk (fmt=0xffffffff85f191a5 "\0016Attempting lock1") at kernel/printk/printk.c:2324 torvalds#87 0xffffffff81942aa2 in kmem_cache_create_usercopy (name=0xffffffff85f18903 "mm_struct", size=1296, align=0, flags=270336, useroffset=<optimized out>, usersize=<optimized out>, ctor=0x0 <fixed_percpu_data>) at mm/slab_common.c:296 torvalds#88 0xffffffff86f337a0 in mm_cache_init () at kernel/fork.c:3262 torvalds#89 0xffffffff86eacb8e in start_kernel () at init/main.c:932 torvalds#90 0xffffffff86ecdf94 in x86_64_start_reservations (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:555 torvalds#91 0xffffffff86ecde9b in x86_64_start_kernel (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:536 torvalds#92 0xffffffff810001d3 in secondary_startup_64 () at /pool/workspace/linux/arch/x86/kernel/head_64.S:461 torvalds#93 0x0000000000000000 in ??

As of 5ec8e8e(mm/sparsemem: fix race in accessing memory_section->usage) KMSAN now calls into RCU tree code during kmsan_get_metadata. This will trigger a write that will reenter into KMSAN getting the system stuck doing infinite recursion. #0 kmsan_get_context () at mm/kmsan/kmsan.h:106 #1 __msan_get_context_state () at mm/kmsan/instrumentation.c:331 #2 0xffffffff81495671 in get_current () at ./arch/x86/include/asm/current.h:42 #3 rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 #4 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 #5 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#6 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#7 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#8 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#9 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#10 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#11 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#12 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#13 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#14 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#15 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#16 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#17 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#18 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#19 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#20 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#21 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#22 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#23 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#24 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#25 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#26 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#27 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#28 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#29 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#30 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#31 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#32 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#33 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#34 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#35 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#36 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#37 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#38 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#39 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#40 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#41 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#42 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#43 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#44 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#45 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#46 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#47 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#48 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#49 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#50 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#51 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 #52 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 #53 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#54 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#55 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#56 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#57 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 #58 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#59 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#60 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#61 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#62 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#63 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#64 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#65 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#66 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#67 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff8620d974 <init_task+1012>) at ./arch/x86/include/asm/kmsan.h:82 torvalds#68 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/shadow.c:75 torvalds#69 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff8620d974 <init_task+1012>, is_origin=false) at mm/kmsan/shadow.c:143 #70 kmsan_get_shadow_origin_ptr (address=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/shadow.c:97 torvalds#71 0xffffffff81b1dbd2 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=4, store=false) at mm/kmsan/instrumentation.c:36 torvalds#72 __msan_metadata_ptr_for_load_4 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:91 torvalds#73 0xffffffff8149568f in rcu_preempt_read_enter () at kernel/rcu/tree_plugin.h:379 torvalds#74 __rcu_read_lock () at kernel/rcu/tree_plugin.h:402 torvalds#75 0xffffffff81b2054b in rcu_read_lock () at ./include/linux/rcupdate.h:748 torvalds#76 pfn_valid (pfn=<optimized out>) at ./include/linux/mmzone.h:2016 torvalds#77 kmsan_virt_addr_valid (addr=addr@entry=0xffffffff86203c90) at ./arch/x86/include/asm/kmsan.h:82 torvalds#78 virt_to_page_or_null (vaddr=vaddr@entry=0xffffffff86203c90) at mm/kmsan/shadow.c:75 torvalds#79 0xffffffff81b2023c in kmsan_get_metadata (address=0xffffffff86203c90, is_origin=false) at mm/kmsan/shadow.c:143 torvalds#80 kmsan_get_shadow_origin_ptr (address=0xffffffff86203c90, size=8, store=false) at mm/kmsan/shadow.c:97 torvalds#81 0xffffffff81b1dc72 in get_shadow_origin_ptr (addr=0xffffffff8620d974 <init_task+1012>, size=8, store=false) at mm/kmsan/instrumentation.c:36 torvalds#82 __msan_metadata_ptr_for_load_8 (addr=0xffffffff8620d974 <init_task+1012>) at mm/kmsan/instrumentation.c:92 torvalds#83 0xffffffff814fdb9e in filter_irq_stacks (entries=<optimized out>, nr_entries=4) at kernel/stacktrace.c:397 torvalds#84 0xffffffff829520e8 in stack_depot_save_flags (entries=0xffffffff8620d974 <init_task+1012>, nr_entries=4, alloc_flags=0, depot_flags=0) at lib/stackdepot.c:500 torvalds#85 0xffffffff81b1e560 in __msan_poison_alloca (address=0xffffffff86203da0, size=24, descr=<optimized out>) at mm/kmsan/instrumentation.c:285 torvalds#86 0xffffffff8562821c in _printk (fmt=0xffffffff85f191a5 "\0016Attempting lock1") at kernel/printk/printk.c:2324 torvalds#87 0xffffffff81942aa2 in kmem_cache_create_usercopy (name=0xffffffff85f18903 "mm_struct", size=1296, align=0, flags=270336, useroffset=<optimized out>, usersize=<optimized out>, ctor=0x0 <fixed_percpu_data>) at mm/slab_common.c:296 torvalds#88 0xffffffff86f337a0 in mm_cache_init () at kernel/fork.c:3262 torvalds#89 0xffffffff86eacb8e in start_kernel () at init/main.c:932 torvalds#90 0xffffffff86ecdf94 in x86_64_start_reservations (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:555 torvalds#91 0xffffffff86ecde9b in x86_64_start_kernel (real_mode_data=0x140e0 <exception_stacks+28896> <error: Cannot access memory at address 0x140e0>) at arch/x86/kernel/head64.c:536 torvalds#92 0xffffffff810001d3 in secondary_startup_64 () at /pool/workspace/linux/arch/x86/kernel/head_64.S:461 torvalds#93 0x0000000000000000 in ??

commit 2acc59d upstream. When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ torvalds#85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ torvalds#85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d2373 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

@2

Recent additions in BPF like cpu v4 instructions, test_bpf module exhibits the following failures: test_bpf: torvalds#82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: torvalds#166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: torvalds#169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#313 BSWAP 16: 0x0123456789abcdef -> 0xefcd eBPF filter opcode 00d7 (@2) unsupported jited:0 301 PASS test_bpf: torvalds#314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 555 PASS test_bpf: torvalds#315 BSWAP 64: 0x0123456789abcdef -> 0x67452301 eBPF filter opcode 00d7 (@2) unsupported jited:0 268 PASS test_bpf: torvalds#316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 269 PASS test_bpf: torvalds#317 BSWAP 16: 0xfedcba9876543210 -> 0x1032 eBPF filter opcode 00d7 (@2) unsupported jited:0 460 PASS test_bpf: torvalds#318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 320 PASS test_bpf: torvalds#319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe eBPF filter opcode 00d7 (@2) unsupported jited:0 222 PASS test_bpf: torvalds#320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 273 PASS test_bpf: torvalds#344 BPF_LDX_MEMSX | BPF_B eBPF filter opcode 0091 (@5) unsupported jited:0 432 PASS test_bpf: torvalds#345 BPF_LDX_MEMSX | BPF_H eBPF filter opcode 0089 (@5) unsupported jited:0 381 PASS test_bpf: torvalds#346 BPF_LDX_MEMSX | BPF_W eBPF filter opcode 0081 (@5) unsupported jited:0 505 PASS test_bpf: torvalds#490 JMP32_JA: Unconditional jump: if (true) return 1 eBPF filter opcode 0006 (@1) unsupported jited:0 261 PASS test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed] Fix them by adding missing processing. Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu

@2

[ Upstream commit 8ecf3c1 ] Recent additions in BPF like cpu v4 instructions, test_bpf module exhibits the following failures: test_bpf: torvalds#82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: torvalds#166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: torvalds#169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#313 BSWAP 16: 0x0123456789abcdef -> 0xefcd eBPF filter opcode 00d7 (@2) unsupported jited:0 301 PASS test_bpf: torvalds#314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 555 PASS test_bpf: torvalds#315 BSWAP 64: 0x0123456789abcdef -> 0x67452301 eBPF filter opcode 00d7 (@2) unsupported jited:0 268 PASS test_bpf: torvalds#316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 269 PASS test_bpf: torvalds#317 BSWAP 16: 0xfedcba9876543210 -> 0x1032 eBPF filter opcode 00d7 (@2) unsupported jited:0 460 PASS test_bpf: torvalds#318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 320 PASS test_bpf: torvalds#319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe eBPF filter opcode 00d7 (@2) unsupported jited:0 222 PASS test_bpf: torvalds#320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 273 PASS test_bpf: torvalds#344 BPF_LDX_MEMSX | BPF_B eBPF filter opcode 0091 (@5) unsupported jited:0 432 PASS test_bpf: torvalds#345 BPF_LDX_MEMSX | BPF_H eBPF filter opcode 0089 (@5) unsupported jited:0 381 PASS test_bpf: torvalds#346 BPF_LDX_MEMSX | BPF_W eBPF filter opcode 0081 (@5) unsupported jited:0 505 PASS test_bpf: torvalds#490 JMP32_JA: Unconditional jump: if (true) return 1 eBPF filter opcode 0006 (@1) unsupported jited:0 261 PASS test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed] Fix them by adding missing processing. Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>

@2

[ Upstream commit 8ecf3c1 ] Recent additions in BPF like cpu v4 instructions, test_bpf module exhibits the following failures: test_bpf: torvalds#82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: torvalds#166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: torvalds#169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#313 BSWAP 16: 0x0123456789abcdef -> 0xefcd eBPF filter opcode 00d7 (@2) unsupported jited:0 301 PASS test_bpf: torvalds#314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 555 PASS test_bpf: torvalds#315 BSWAP 64: 0x0123456789abcdef -> 0x67452301 eBPF filter opcode 00d7 (@2) unsupported jited:0 268 PASS test_bpf: torvalds#316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 269 PASS test_bpf: torvalds#317 BSWAP 16: 0xfedcba9876543210 -> 0x1032 eBPF filter opcode 00d7 (@2) unsupported jited:0 460 PASS test_bpf: torvalds#318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 320 PASS test_bpf: torvalds#319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe eBPF filter opcode 00d7 (@2) unsupported jited:0 222 PASS test_bpf: torvalds#320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 273 PASS test_bpf: torvalds#344 BPF_LDX_MEMSX | BPF_B eBPF filter opcode 0091 (@5) unsupported jited:0 432 PASS test_bpf: torvalds#345 BPF_LDX_MEMSX | BPF_H eBPF filter opcode 0089 (@5) unsupported jited:0 381 PASS test_bpf: torvalds#346 BPF_LDX_MEMSX | BPF_W eBPF filter opcode 0081 (@5) unsupported jited:0 505 PASS test_bpf: torvalds#490 JMP32_JA: Unconditional jump: if (true) return 1 eBPF filter opcode 0006 (@1) unsupported jited:0 261 PASS test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed] Fix them by adding missing processing. Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>

@2

[ Upstream commit 8ecf3c1 ] Recent additions in BPF like cpu v4 instructions, test_bpf module exhibits the following failures: test_bpf: torvalds#82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: torvalds#165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: torvalds#166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: torvalds#169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: torvalds#313 BSWAP 16: 0x0123456789abcdef -> 0xefcd eBPF filter opcode 00d7 (@2) unsupported jited:0 301 PASS test_bpf: torvalds#314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 555 PASS test_bpf: torvalds#315 BSWAP 64: 0x0123456789abcdef -> 0x67452301 eBPF filter opcode 00d7 (@2) unsupported jited:0 268 PASS test_bpf: torvalds#316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 269 PASS test_bpf: torvalds#317 BSWAP 16: 0xfedcba9876543210 -> 0x1032 eBPF filter opcode 00d7 (@2) unsupported jited:0 460 PASS test_bpf: torvalds#318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 320 PASS test_bpf: torvalds#319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe eBPF filter opcode 00d7 (@2) unsupported jited:0 222 PASS test_bpf: torvalds#320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 273 PASS test_bpf: torvalds#344 BPF_LDX_MEMSX | BPF_B eBPF filter opcode 0091 (@5) unsupported jited:0 432 PASS test_bpf: torvalds#345 BPF_LDX_MEMSX | BPF_H eBPF filter opcode 0089 (@5) unsupported jited:0 381 PASS test_bpf: torvalds#346 BPF_LDX_MEMSX | BPF_W eBPF filter opcode 0081 (@5) unsupported jited:0 505 PASS test_bpf: torvalds#490 JMP32_JA: Unconditional jump: if (true) return 1 eBPF filter opcode 0006 (@1) unsupported jited:0 261 PASS test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed] Fix them by adding missing processing. Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-* results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as shown below. kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85 Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries NIP: c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8 REGS: c000000120c078c0 TRAP: 0700 Not tainted (6.10.0-rc3) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 2828220f XER: 0000000e CFAR: c0000000001fdc80 IRQMASK: 0 [ ... GPRs omitted ... ] NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0 LR [c0000000005d23d0] usercopy_abort+0x74/0xb0 Call Trace: usercopy_abort+0x74/0xb0 (unreliable) __check_heap_object+0xf8/0x120 check_heap_object+0x218/0x240 __check_object_size+0x84/0x1a4 dtl_file_read+0x17c/0x2c4 full_proxy_read+0x8c/0x110 vfs_read+0xdc/0x3a0 ksys_read+0x84/0x144 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec --- interrupt: 3000 at 0x7fff81f3ab34 Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0") requires that only whitelisted areas in slab/slub objects can be copied to userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY. Dtl contains hypervisor dispatch events which are expected to be read by privileged users. Hence mark this safe for user access. Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the entire object. Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Anjali K <anjalik@linux.ibm.com>

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-* results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as shown below. kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85 Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries NIP: c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8 REGS: c000000120c078c0 TRAP: 0700 Not tainted (6.10.0-rc3) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 2828220f XER: 0000000e CFAR: c0000000001fdc80 IRQMASK: 0 [ ... GPRs omitted ... ] NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0 LR [c0000000005d23d0] usercopy_abort+0x74/0xb0 Call Trace: usercopy_abort+0x74/0xb0 (unreliable) __check_heap_object+0xf8/0x120 check_heap_object+0x218/0x240 __check_object_size+0x84/0x1a4 dtl_file_read+0x17c/0x2c4 full_proxy_read+0x8c/0x110 vfs_read+0xdc/0x3a0 ksys_read+0x84/0x144 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec --- interrupt: 3000 at 0x7fff81f3ab34 Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0") requires that only whitelisted areas in slab/slub objects can be copied to userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY. Dtl contains hypervisor dispatch events which are expected to be read by privileged users. Hence mark this safe for user access. Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the entire object. Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Anjali K <anjalik@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com

[ Upstream commit 1a14150 ] Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-* results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as shown below. kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85 Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries NIP: c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8 REGS: c000000120c078c0 TRAP: 0700 Not tainted (6.10.0-rc3) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 2828220f XER: 0000000e CFAR: c0000000001fdc80 IRQMASK: 0 [ ... GPRs omitted ... ] NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0 LR [c0000000005d23d0] usercopy_abort+0x74/0xb0 Call Trace: usercopy_abort+0x74/0xb0 (unreliable) __check_heap_object+0xf8/0x120 check_heap_object+0x218/0x240 __check_object_size+0x84/0x1a4 dtl_file_read+0x17c/0x2c4 full_proxy_read+0x8c/0x110 vfs_read+0xdc/0x3a0 ksys_read+0x84/0x144 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec --- interrupt: 3000 at 0x7fff81f3ab34 Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0") requires that only whitelisted areas in slab/slub objects can be copied to userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY. Dtl contains hypervisor dispatch events which are expected to be read by privileged users. Hence mark this safe for user access. Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the entire object. Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Anjali K <anjalik@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>

When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ torvalds#85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ torvalds#85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d2373 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-* results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as shown below. kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85 Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries NIP: c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8 REGS: c000000120c078c0 TRAP: 0700 Not tainted (6.10.0-rc3) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 2828220f XER: 0000000e CFAR: c0000000001fdc80 IRQMASK: 0 [ ... GPRs omitted ... ] NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0 LR [c0000000005d23d0] usercopy_abort+0x74/0xb0 Call Trace: usercopy_abort+0x74/0xb0 (unreliable) __check_heap_object+0xf8/0x120 check_heap_object+0x218/0x240 __check_object_size+0x84/0x1a4 dtl_file_read+0x17c/0x2c4 full_proxy_read+0x8c/0x110 vfs_read+0xdc/0x3a0 ksys_read+0x84/0x144 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec --- interrupt: 3000 at 0x7fff81f3ab34 Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0") requires that only whitelisted areas in slab/slub objects can be copied to userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY. Dtl contains hypervisor dispatch events which are expected to be read by privileged users. Hence mark this safe for user access. Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the entire object. Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Anjali K <anjalik@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com

[ Upstream commit 1a14150 ] Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-* results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as shown below. kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 torvalds#85 Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries NIP: c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8 REGS: c000000120c078c0 TRAP: 0700 Not tainted (6.10.0-rc3) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 2828220f XER: 0000000e CFAR: c0000000001fdc80 IRQMASK: 0 [ ... GPRs omitted ... ] NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0 LR [c0000000005d23d0] usercopy_abort+0x74/0xb0 Call Trace: usercopy_abort+0x74/0xb0 (unreliable) __check_heap_object+0xf8/0x120 check_heap_object+0x218/0x240 __check_object_size+0x84/0x1a4 dtl_file_read+0x17c/0x2c4 full_proxy_read+0x8c/0x110 vfs_read+0xdc/0x3a0 ksys_read+0x84/0x144 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec --- interrupt: 3000 at 0x7fff81f3ab34 Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0") requires that only whitelisted areas in slab/slub objects can be copied to userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY. Dtl contains hypervisor dispatch events which are expected to be read by privileged users. Hence mark this safe for user access. Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the entire object. Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Anjali K <anjalik@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>

Merge pull request #1 from torvalds/master

e28d69f

pull from torvalds

e-mailky closed this Mar 31, 2014

hzhuang1 pushed a commit to hzhuang1/linux that referenced this pull request Jul 6, 2015

Merge pull request torvalds#85 from xin3liang/vblank

a50f5d0

enbable vblank event and reserve 128MB memory for graphic usage

logic10492 pushed a commit to logic10492/linux-amd-zen2 that referenced this pull request Jan 18, 2024

Merge pull request torvalds#85 from sched-ext/misc-fixes

e69323c

scx: Move scx_ops_enable_state_str[] outside CONFIG_SCHED_DEBUG

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge pull request #1 from torvalds/master #85

Merge pull request #1 from torvalds/master #85

e-mailky commented Mar 31, 2014

Merge pull request #1 from torvalds/master #85

Merge pull request #1 from torvalds/master #85

Conversation

e-mailky commented Mar 31, 2014