video: sun4i-lcd: add support for LCD SPI module (thanks hipboi) #50

Quarx2k

@Quarx2k Quarx2k commented Jul 4, 2012

It doesn't break devices without SPI.
Issue #38

@amery
Member

amery commented Jul 4, 2012

I'll try to merge the rest of hipboi's patch too

@amery
Member

amery commented Jul 4, 2012

Done... finally. Can you test 6c01b6c? It's the full patch from @hipboi, but I had to guess some parts because there is a gap between our codebase and the one this commit was made against.

@amery
Member

amery commented Jul 5, 2012

Can you redo this patch so it applies on top of 6c01b6c, moving lcd_spi from mid9742 and aurora to the default lcd0_panel_cfg?

@Quarx2k
Author

Quarx2k commented Jul 5, 2012

Hm, my commit does the same thing: it moves lcd_spi to lcd0_panel_cfg.

@amery
Member

amery commented Sep 7, 2012

@techn started splitting 6c01b6c. We already have lcd%d_bright (ebc95e8) and some cleanups (0165c7d, 0165c7d), but we need to find a decent way to handle lcd_spi without making it aurora-specific.

@amery
Member

amery commented Sep 10, 2012

As of 0a3d1af, 3.0-v2 finally has lcd_spi support.

@amery
Member

amery commented Sep 10, 2012

And the rest of the changes in the large patch are now on hold on the wip/linux-sunxi-3.0/disp branch until the greenish HDMI output effect is fixed, so I'm closing this ticket.

@amery amery closed this Sep 10, 2012
amery pushed a commit that referenced this pull request Sep 28, 2012
The result of the regulator_get() call is not properly checked for errors,
causing the kernel to panic:

[...]
[    2.010000] Unable to handle kernel paging request at virtual address fffffe2b
[    2.010000] pgd = c0004000
[    2.020000] [fffffe2b] *pgd=6f7fe821, *pte=00000000, *ppte=00000000
[    2.020000] Internal error: Oops: 17 [#1] PREEMPT ARM
[    2.020000] Modules linked in:
[    2.020000] CPU: 0    Not tainted  (3.6.0-rc6+ #50)
[    2.020000] PC is at regulator_get_voltage+0xc/0x3c
[    2.020000] LR is at sun4i_cpufreq_initcall+0xd0/0x12c
[...]

Signed-off-by: Aliaksei Katovich <aliaksei.katovich@gmail.com>
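The underlying rule is the kernel's error-pointer convention: on failure regulator_get() hands back an errno encoded as a pointer rather than NULL, so a plain NULL check does not catch it, and dereferencing it faults near the top of the address space (compare the fffffe2b address above). The userspace sketch below mirrors the ERR_PTR()/IS_ERR()/PTR_ERR() helpers from include/linux/err.h; `struct fake_regulator`, `fake_regulator_get()` and the voltage value are hypothetical stand-ins, not the real regulator API or the actual sun4i cpufreq code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace mirror of include/linux/err.h: the top MAX_ERRNO addresses
 * are reserved to carry negative errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct regulator and regulator_get(). */
struct fake_regulator { int microvolts; };

static struct fake_regulator *fake_regulator_get(bool present)
{
	static struct fake_regulator vdd = { .microvolts = 1400000 }; /* made-up value */

	return present ? &vdd : ERR_PTR(-ENODEV);
}

/* Returns the voltage, or a negative errno if the lookup failed. */
static int read_cpu_voltage(bool present)
{
	struct fake_regulator *reg = fake_regulator_get(present);

	/* reg is never NULL on failure, so "if (!reg)" would let the
	 * error pointer through and the dereference below would fault. */
	if (IS_ERR(reg))
		return (int)PTR_ERR(reg);
	return reg->microvolts;
}
```

The caller propagates the errno instead of touching the regulator, which is the behavior the fix is after.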
amery pushed a commit that referenced this pull request Oct 3, 2012
The result of the regulator_get() call is not properly checked for errors,
causing the kernel to panic:

[...]
[    2.010000] Unable to handle kernel paging request at virtual address fffffe2b
[    2.010000] pgd = c0004000
[    2.020000] [fffffe2b] *pgd=6f7fe821, *pte=00000000, *ppte=00000000
[    2.020000] Internal error: Oops: 17 [#1] PREEMPT ARM
[    2.020000] Modules linked in:
[    2.020000] CPU: 0    Not tainted  (3.6.0-rc6+ #50)
[    2.020000] PC is at regulator_get_voltage+0xc/0x3c
[    2.020000] LR is at sun4i_cpufreq_initcall+0xd0/0x12c
[...]

Signed-off-by: Aliaksei Katovich <aliaksei.katovich@gmail.com>
amery pushed a commit that referenced this pull request Oct 5, 2012
fix:
[  132.474633] 3.5.0-rc1+ #50 Not tainted
[  132.474634] -------------------------------
[  132.474635] include/linux/kvm_host.h:369 suspicious rcu_dereference_check() usage!
[  132.474636]
[  132.474636] other info that might help us debug this:
[  132.474636]
[  132.474638]
[  132.474638] rcu_scheduler_active = 1, debug_locks = 1
[  132.474640] 1 lock held by qemu-kvm/2832:
[  132.474657]  #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffa01e1636>] vcpu_load+0x1e/0x91 [kvm]
[  132.474658]
[  132.474658] stack backtrace:
[  132.474660] Pid: 2832, comm: qemu-kvm Not tainted 3.5.0-rc1+ #50
[  132.474661] Call Trace:
[  132.474665]  [<ffffffff81092f40>] lockdep_rcu_suspicious+0xfc/0x105
[  132.474675]  [<ffffffffa01e0c85>] kvm_memslots+0x6d/0x75 [kvm]
[  132.474683]  [<ffffffffa01e0ca1>] gfn_to_memslot+0x14/0x4c [kvm]
[  132.474693]  [<ffffffffa01e3575>] mark_page_dirty+0x17/0x2a [kvm]
[  132.474706]  [<ffffffffa01f21ea>] kvm_arch_vcpu_ioctl+0xbcf/0xc07 [kvm]

Actually, we do not write vcpu->arch.time at this point, so the
mark_page_dirty() call should be removed.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
amery pushed a commit that referenced this pull request Jun 22, 2013
I was playing with suspend and ran into this:

|BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:891
|in_atomic(): 1, irqs_disabled(): 0, pid: 1963, name: bash
|6 locks held by bash/1963:
|CPU: 0 PID: 1963 Comm: bash Not tainted 3.10.0-rc4+ #50
|[<c0014fdc>] (unwind_backtrace+0x0/0xf8) from [<c0011da4>] (show_stack+0x10/0x14)
|[<c0011da4>] (show_stack+0x10/0x14) from [<c02e8680>] (__pm_runtime_idle+0xa4/0xac)
|[<c02e8680>] (__pm_runtime_idle+0xa4/0xac) from [<c0341158>] (davinci_mdio_suspend+0x6c/0x9c)
|[<c0341158>] (davinci_mdio_suspend+0x6c/0x9c) from [<c02e0628>] (platform_pm_suspend+0x2c/0x54)
|[<c02e0628>] (platform_pm_suspend+0x2c/0x54) from [<c02e52bc>] (dpm_run_callback.isra.3+0x2c/0x64)
|[<c02e52bc>] (dpm_run_callback.isra.3+0x2c/0x64) from [<c02e57e4>] (__device_suspend+0x100/0x22c)
|[<c02e57e4>] (__device_suspend+0x100/0x22c) from [<c02e67e8>] (dpm_suspend+0x68/0x230)
|[<c02e67e8>] (dpm_suspend+0x68/0x230) from [<c0072a20>] (suspend_devices_and_enter+0x68/0x350)
|[<c0072a20>] (suspend_devices_and_enter+0x68/0x350) from [<c0072f18>] (pm_suspend+0x210/0x24c)
|[<c0072f18>] (pm_suspend+0x210/0x24c) from [<c0071c74>] (state_store+0x6c/0xbc)
|[<c0071c74>] (state_store+0x6c/0xbc) from [<c02714dc>] (kobj_attr_store+0x14/0x20)
|[<c02714dc>] (kobj_attr_store+0x14/0x20) from [<c01341a0>] (sysfs_write_file+0x16c/0x19c)
|[<c01341a0>] (sysfs_write_file+0x16c/0x19c) from [<c00ddfe4>] (vfs_write+0xb4/0x190)
|[<c00ddfe4>] (vfs_write+0xb4/0x190) from [<c00de3a4>] (SyS_write+0x3c/0x70)
|[<c00de3a4>] (SyS_write+0x3c/0x70) from [<c000e2c0>] (ret_fast_syscall+0x0/0x48)

I don't see a reason why the pm_runtime call must be under the lock.
Further, I don't understand why this is a spinlock and not a mutex.

Cc: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
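The pattern behind this fix is to finish the work that needs the spinlock, drop the lock, and only then make a call that may sleep. The userspace sketch below models it: a hypothetical `atomic_depth` counter stands in for the kernel's preempt count, and `might_sleep_check()` stands in for the might_sleep() debug check that produced the BUG above. All names are illustrative, not the actual davinci_mdio code.

```c
#include <stdbool.h>

/* Hypothetical model of atomic context: each held spinlock bumps the
 * depth, much like the kernel's preempt count. */
static int atomic_depth;
static int sleep_in_atomic_bugs;

static void fake_spin_lock(void)   { atomic_depth++; }
static void fake_spin_unlock(void) { atomic_depth--; }

/* Stand-in for might_sleep(): count a bug if we may sleep while a
 * spinlock is held. */
static void might_sleep_check(void)
{
	if (atomic_depth > 0)
		sleep_in_atomic_bugs++;
}

/* Stand-in for a runtime-PM call that may sleep. */
static void fake_pm_runtime_idle(void) { might_sleep_check(); }

static bool suspended;

/* Buggy ordering: the possibly-sleeping call runs under the lock. */
static void suspend_buggy(void)
{
	fake_spin_lock();
	suspended = true;
	fake_pm_runtime_idle();
	fake_spin_unlock();
}

/* Fixed ordering: update shared state under the lock, sleep after. */
static void suspend_fixed(void)
{
	fake_spin_lock();
	suspended = true;
	fake_spin_unlock();
	fake_pm_runtime_idle();
}
```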
amery pushed a commit that referenced this pull request Nov 12, 2013
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, clean up code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
amery pushed a commit that referenced this pull request Nov 12, 2013
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went to great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
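A division-avoiding digit count can be sketched as below, assuming it compares the value against growing powers of ten instead of dividing by 10 in a loop; this is an illustration, not necessarily the exact code that was merged. `format_cpu()` is a hypothetical helper showing how that count drives a right-hand alignment bound to the largest CPU number.

```c
#include <stddef.h>
#include <stdio.h>

/* Count decimal digits without division: compare against 10, 100, ...
 * A negative value gets one extra position for the '-' sign.
 * Assumes |val| < 1000000000 so that m cannot overflow. */
static int num_digits(int val)
{
	int m = 10;
	int d = 1;

	if (val < 0) {
		d++;
		val = -val;
	}
	while (val >= m) {
		m *= 10;
		d++;
	}
	return d;
}

/* Hypothetical helper: right-align "#cpu" in a field wide enough for
 * "#max_cpu", so the columns line up for any CPU count. */
static int format_cpu(char *buf, size_t len, int cpu, int max_cpu)
{
	int pad = num_digits(max_cpu) - num_digits(cpu);

	return snprintf(buf, len, "%*s#%d", pad, "", cpu);
}
```

With `max_cpu = 127`, CPU 1 is printed as `  #1` and CPU 100 as `#100`, both four characters wide.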
amery pushed a commit that referenced this pull request Jun 27, 2014
…ll_time_in_state

Commit 40cf2f8 (cpufreq: Persist cpufreq time in state data across hotplug)
causes the following call trace to be printed at boot:

BUG: sleeping function called from invalid context at mm/slub.c:936
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 3.10.9-20140624.172707-eng-gd6c0f69-dirty #50
Backtrace:
[<c0012270>] (dump_backtrace+0x0/0x10c) from [<c001256c>] (show_stack+0x18/0x1c)
 r6:ffff1788 r5:c0c020c0 r4:e609c000 r3:00000000
[<c0012554>] (show_stack+0x0/0x1c) from [<c07a2970>] (dump_stack+0x20/0x28)
[<c07a2950>] (dump_stack+0x0/0x28) from [<c0057678>] (__might_sleep+0x104/0x120)
[<c0057574>] (__might_sleep+0x0/0x120) from [<c00ff000>] (__kmalloc_track_caller+0x144/0x274)
 r6:00000000 r5:e609c000 r4:e6802140
[<c00feebc>] (__kmalloc_track_caller+0x0/0x274) from [<c00da098>] (krealloc+0x58/0xb0)
[<c00da040>] (krealloc+0x0/0xb0) from [<c050266c>] (cpufreq_allstats_create+0x120/0x204)
 r8:e4c4ff00 r7:c0d266b8 r6:0013d620 r5:e4c4e600 r4:00000001
r3:e535d6d0
[<c050254c>] (cpufreq_allstats_create+0x0/0x204) from [<c0502e38>] (cpufreq_stat_notifier_policy+0xb8/0xd0)
[<c0502d80>] (cpufreq_stat_notifier_policy+0x0/0xd0) from [<c00517cc>] (notifier_call_chain+0x4c/0x8c)
 r5:00000000 r4:fffffffe
[<c0051780>] (notifier_call_chain+0x0/0x8c) from [<c00519fc>] (__blocking_notifier_call_chain+0x50/0x68)
 r8:c0cd4d00 r7:00000002 r6:e609dd7c r5:ffffffff r4:c0d25a4c
r3:ffffffff
[<c00519ac>] (__blocking_notifier_call_chain+0x0/0x68) from [<c0051a34>] (blocking_notifier_call_chain+0x20/0x28)
 r7:c0e24f30 r6:00000000 r5:e53e1e00 r4:e609dd7c
[<c0051a14>] (blocking_notifier_call_chain+0x0/0x28) from [<c0500fec>] (__cpufreq_set_policy+0xc0/0x1d0)
[<c0500f2c>] (__cpufreq_set_policy+0x0/0x1d0) from [<c0501308>] (cpufreq_add_dev_interface+0x20c/0x270)
 r7:00000008 r6:00000000 r5:e53e1e00 r4:e53e1e58
[<c05010fc>] (cpufreq_add_dev_interface+0x0/0x270) from [<c05016a8>] (cpufreq_add_dev+0x33c/0x420)
[<c050136c>] (cpufreq_add_dev+0x0/0x420) from [<c03604a4>] (subsys_interface_register+0x80/0xbc)
[<c0360424>] (subsys_interface_register+0x0/0xbc) from [<c050035c>] (cpufreq_register_driver+0x8c/0x194)

Change-Id: If77a656d0ea60a8fc4083283d104509fa6c07f8f
Signed-off-by: Minsung Kim <ms925.kim@samsung.com>
wens pushed a commit that referenced this pull request Sep 29, 2015
During quick plug/removal of an OTG adapter in dual-role testing,
xhci_alloc_device() can be called for the newly detected device
after the DRD library has called xhci_stop() to remove the HCD.

If that is the case, just fail early to prevent the following warning.

[  154.732649] hub 4-0:1.0: USB hub found
[  154.742204] hub 4-0:1.0: 1 port detected
[  154.824458] hub 3-0:1.0: state 7 ports 1 chg 0002 evt 0000
[  154.854609] hub 4-0:1.0: state 7 ports 1 chg 0000 evt 0000
[  154.944430] usb 3-1: new high-speed USB device number 2 using xhci-hcd
[  154.951009] xhci-hcd xhci-hcd.0.auto: xhci_setup_device
[  155.038191] xhci-hcd xhci-hcd.0.auto: remove, state 4
[  155.043315] usb usb4: USB disconnect, device number 1
[  155.055270] xhci-hcd xhci-hcd.0.auto: xhci_stop
[  155.060094] xhci-hcd xhci-hcd.0.auto: USB bus 4 deregistered
[  155.066576] xhci-hcd xhci-hcd.0.auto: remove, state 1
[  155.071710] usb usb3: USB disconnect, device number 1
[  155.077124] xhci-hcd xhci-hcd.0.auto: xhci_setup_device
[  155.082389] ------------[ cut here ]------------
[  155.087690] WARNING: CPU: 0 PID: 72 at drivers/usb/host/xhci.c:3800 xhci_setup_device+0x410/0x484 [xhci_hcd]()
[  155.097861] Modules linked in: sd_mod usb_storage scsi_mod usb_f_ss_lb g_zero libcomposite ipv6 xhci_plat_hcd xhci_hcd usbcore dwc3 udc_core evdev ti_am335x_adc joydev kfifo_buf industrialio snd_soc_simple_cc
[  155.146734] CPU: 0 PID: 72 Comm: kworker/0:3 Tainted: G        W       4.1.4-00834-gcd9380b-dirty #50
[  155.156073] Hardware name: Generic AM43 (Flattened Device Tree)
[  155.162117] Workqueue: usb_hub_wq hub_event [usbcore]
[  155.167249] Backtrace:
[  155.169751] [<c0012af0>] (dump_backtrace) from [<c0012c8c>] (show_stack+0x18/0x1c)
[  155.177390]  r6:c089d4a4 r5:ffffffff r4:00000000 r3:ee46c000
[  155.183137] [<c0012c74>] (show_stack) from [<c05f7c14>] (dump_stack+0x84/0xd0)
[  155.190446] [<c05f7b90>] (dump_stack) from [<c00439ac>] (warn_slowpath_common+0x80/0xbc)
[  155.198605]  r7:00000009 r6:00000ed8 r5:bf27eb70 r4:00000000
[  155.204348] [<c004392c>] (warn_slowpath_common) from [<c0043a0c>] (warn_slowpath_null+0x24/0x2c)
[  155.213202]  r8:ee49f000 r7:ee7c0004 r6:00000000 r5:ee7c0158 r4:ee7c0000
[  155.220051] [<c00439e8>] (warn_slowpath_null) from [<bf27eb70>] (xhci_setup_device+0x410/0x484 [xhci_hcd])
[  155.229816] [<bf27e760>] (xhci_setup_device [xhci_hcd]) from [<bf27ec10>] (xhci_address_device+0x14/0x18 [xhci_hcd])
[  155.240415]  r10:ee598200 r9:00000001 r8:00000002 r7:00000001 r6:00000003 r5:00000002
[  155.248363]  r4:ee49f000
[  155.250978] [<bf27ebfc>] (xhci_address_device [xhci_hcd]) from [<bf20cb94>] (hub_port_init+0x1b8/0xa9c [usbcore])
[  155.261403] [<bf20c9dc>] (hub_port_init [usbcore]) from [<bf2101e0>] (hub_event+0x738/0x1020 [usbcore])
[  155.270874]  r10:ee598200 r9:ee7c0000 r8:ee7c0038 r7:ee518800 r6:ee49f000 r5:00000001
[  155.278822]  r4:00000000
[  155.281426] [<bf20faa8>] (hub_event [usbcore]) from [<c005754c>] (process_one_work+0x128/0x340)
[  155.290196]  r10:00000000 r9:00000003 r8:00000000 r7:fedfa000 r6:eeec5400 r5:ee598314
[  155.298151]  r4:ee434380
[  155.300718] [<c0057424>] (process_one_work) from [<c00578f8>] (worker_thread+0x158/0x49c)
[  155.308963]  r10:ee434380 r9:00000003 r8:eeec5400 r7:00000008 r6:ee434398 r5:eeec5400
[  155.316913]  r4:eeec5414
[  155.319482] [<c00577a0>] (worker_thread) from [<c005cc40>] (kthread+0xdc/0xf8)
[  155.326765]  r10:00000000 r9:00000000 r8:00000000 r7:c00577a0 r6:ee434380 r5:ee4441c0
[  155.334713]  r4:00000000 r3:00000000
[  155.338341] [<c005cb64>] (kthread) from [<c000fc08>] (ret_from_fork+0x14/0x2c)
[  155.345626]  r7:00000000 r6:00000000 r5:c005cb64 r4:ee4441c0
[  155.356108] ---[ end trace a58d34c223b190e6 ]---
[  155.360783] xhci-hcd xhci-hcd.0.auto: Virt dev invalid for slot_id 0x1!
[  155.574404] xhci-hcd xhci-hcd.0.auto: xhci_setup_device
[  155.579667] ------------[ cut here ]------------

Cc: <stable@vger.kernel.org>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
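Failing early here means checking whether the host controller has already been stopped before setting up a newly detected device on it. The sketch below only shows the shape of such a guard: `struct fake_xhci`, its `stopped` flag, `fake_alloc_dev()` and `fake_stop()` are hypothetical names, and a real driver would also have to serialize this check with the stop path, which the sketch omits.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified host-controller state. */
struct fake_xhci {
	bool stopped;      /* set once the stop path has torn the HCD down */
	int  slots_in_use; /* device slots handed out so far */
};

/* Returns 1 if a slot was allocated, 0 if we bailed out early. */
static int fake_alloc_dev(struct fake_xhci *xhci)
{
	/* The controller is already gone: setting up a device on it is
	 * what leads to the WARNING in xhci_setup_device() above. */
	if (xhci->stopped)
		return 0;

	xhci->slots_in_use++;
	return 1;
}

/* Hypothetical stop path, run when the dual-role code removes the HCD. */
static void fake_stop(struct fake_xhci *xhci)
{
	xhci->stopped = true;
}
```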
amery pushed a commit that referenced this pull request Nov 5, 2015
…ll_time_in_state

Commit 40cf2f8 (cpufreq: Persist cpufreq time in state data across hotplug)
causes the following call trace to be printed at boot:

BUG: sleeping function called from invalid context at mm/slub.c:936
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 3.10.9-20140624.172707-eng-gd6c0f69-dirty #50
Backtrace:
[<c0012270>] (dump_backtrace+0x0/0x10c) from [<c001256c>] (show_stack+0x18/0x1c)
 r6:ffff1788 r5:c0c020c0 r4:e609c000 r3:00000000
[<c0012554>] (show_stack+0x0/0x1c) from [<c07a2970>] (dump_stack+0x20/0x28)
[<c07a2950>] (dump_stack+0x0/0x28) from [<c0057678>] (__might_sleep+0x104/0x120)
[<c0057574>] (__might_sleep+0x0/0x120) from [<c00ff000>] (__kmalloc_track_caller+0x144/0x274)
 r6:00000000 r5:e609c000 r4:e6802140
[<c00feebc>] (__kmalloc_track_caller+0x0/0x274) from [<c00da098>] (krealloc+0x58/0xb0)
[<c00da040>] (krealloc+0x0/0xb0) from [<c050266c>] (cpufreq_allstats_create+0x120/0x204)
 r8:e4c4ff00 r7:c0d266b8 r6:0013d620 r5:e4c4e600 r4:00000001
r3:e535d6d0
[<c050254c>] (cpufreq_allstats_create+0x0/0x204) from [<c0502e38>] (cpufreq_stat_notifier_policy+0xb8/0xd0)
[<c0502d80>] (cpufreq_stat_notifier_policy+0x0/0xd0) from [<c00517cc>] (notifier_call_chain+0x4c/0x8c)
 r5:00000000 r4:fffffffe
[<c0051780>] (notifier_call_chain+0x0/0x8c) from [<c00519fc>] (__blocking_notifier_call_chain+0x50/0x68)
 r8:c0cd4d00 r7:00000002 r6:e609dd7c r5:ffffffff r4:c0d25a4c
r3:ffffffff
[<c00519ac>] (__blocking_notifier_call_chain+0x0/0x68) from [<c0051a34>] (blocking_notifier_call_chain+0x20/0x28)
 r7:c0e24f30 r6:00000000 r5:e53e1e00 r4:e609dd7c
[<c0051a14>] (blocking_notifier_call_chain+0x0/0x28) from [<c0500fec>] (__cpufreq_set_policy+0xc0/0x1d0)
[<c0500f2c>] (__cpufreq_set_policy+0x0/0x1d0) from [<c0501308>] (cpufreq_add_dev_interface+0x20c/0x270)
 r7:00000008 r6:00000000 r5:e53e1e00 r4:e53e1e58
[<c05010fc>] (cpufreq_add_dev_interface+0x0/0x270) from [<c05016a8>] (cpufreq_add_dev+0x33c/0x420)
[<c050136c>] (cpufreq_add_dev+0x0/0x420) from [<c03604a4>] (subsys_interface_register+0x80/0xbc)
[<c0360424>] (subsys_interface_register+0x0/0xbc) from [<c050035c>] (cpufreq_register_driver+0x8c/0x194)

Change-Id: If77a656d0ea60a8fc4083283d104509fa6c07f8f
Signed-off-by: Minsung Kim <ms925.kim@samsung.com>
amery pushed a commit that referenced this pull request Jul 27, 2016
…end()

The wait_event() call in dvb_unregister_frontend() waits synchronously
for other tasks to free a file descriptor, but it does that while
holding several mutexes.  That alone is a bad idea, but if one user
process happens to keep a (defunct) file descriptor open indefinitely,
the kernel will correctly detect a hung task:

    INFO: task kworker/0:1:314 blocked for more than 30 seconds.
          Not tainted 4.7.0-rc1-hosting+ #50
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/0:1     D ffff88003daf7a50     0   314      2 0x00000000
    Workqueue: usb_hub_wq hub_event
     ffff88003daf7a50 0000000000000296 ffff88003daf7a30 ffff88003fc13f98
     ffff88003dadce00 ffff88003daf8000 ffff88003e3fc010 ffff88003d48d4f8
     ffff88003e3b5030 ffff88003e3f8898 ffff88003daf7a68 ffffffff810cf860
    Call Trace:
     [<ffffffff810cf860>] schedule+0x30/0x80
     [<ffffffff812f88d3>] dvb_unregister_frontend+0x93/0xc0
     [<ffffffff8107a000>] ? __wake_up_common+0x80/0x80
     [<ffffffff813019c7>] dvb_usb_adapter_frontend_exit+0x37/0x70
     [<ffffffff81300614>] dvb_usb_exit+0x34/0xb0
     [<ffffffff81300d4a>] dvb_usb_device_exit+0x3a/0x50
     [<ffffffff81302dc2>] pctv452e_usb_disconnect+0x52/0x60
     [<ffffffff81295a07>] usb_unbind_interface+0x67/0x1e0
     [<ffffffff810609f3>] ? __blocking_notifier_call_chain+0x53/0x70
     [<ffffffff8127ba67>] __device_release_driver+0x77/0x110
     [<ffffffff8127c2d3>] device_release_driver+0x23/0x30
     [<ffffffff8127ab5d>] bus_remove_device+0x10d/0x150
     [<ffffffff8127879b>] device_del+0x13b/0x260
     [<ffffffff81299dea>] ? usb_remove_ep_devs+0x1a/0x30
     [<ffffffff8129468e>] usb_disable_device+0x9e/0x1e0
     [<ffffffff8128bb09>] usb_disconnect+0x89/0x260
     [<ffffffff8128db8d>] hub_event+0x30d/0xfc0
     [<ffffffff81059475>] process_one_work+0x1c5/0x4a0
     [<ffffffff8105940c>] ? process_one_work+0x15c/0x4a0
     [<ffffffff81059799>] worker_thread+0x49/0x480
     [<ffffffff81059750>] ? process_one_work+0x4a0/0x4a0
     [<ffffffff81059750>] ? process_one_work+0x4a0/0x4a0
     [<ffffffff8105f65e>] kthread+0xee/0x110
     [<ffffffff810400bf>] ret_from_fork+0x1f/0x40
     [<ffffffff8105f570>] ? __kthread_unpark+0x70/0x70
    5 locks held by kworker/0:1/314:
     #0:  ("usb_hub_wq"){......}, at: [<ffffffff8105940c>] process_one_work+0x15c/0x4a0
     #1:  ((&hub->events)){......}, at: [<ffffffff8105940c>] process_one_work+0x15c/0x4a0
     #2:  (&dev->mutex){......}, at: [<ffffffff8128d8cb>] hub_event+0x4b/0xfc0
     #3:  (&dev->mutex){......}, at: [<ffffffff8128bad2>] usb_disconnect+0x52/0x260
     #4:  (&dev->mutex){......}, at: [<ffffffff8127c2cb>] device_release_driver+0x1b/0x30

This patch removes the blocking wait, and postpones the kfree() call
until all file handles have been closed by using struct kref.

Signed-off-by: Max Kellermann <max@duempel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
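Deferring the kfree() with a reference count follows a simple rule: every open file handle holds a reference, unregistering drops the owner's reference, and whichever put brings the count to zero frees the object, so nobody has to wait. The userspace sketch below models that with C11 atomics; `struct ref`, `ref_get()`, `ref_put()` and `struct frontend` are hypothetical stand-ins for kref_get()/kref_put() and the DVB frontend, not the code from this patch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal kref-like counter (hypothetical userspace stand-in). */
struct ref { atomic_int count; };

static void ref_init(struct ref *r) { atomic_init(&r->count, 1); }
static void ref_get(struct ref *r)  { atomic_fetch_add(&r->count, 1); }

/* Drop a reference and call release() if it was the last one.
 * Returns 1 if the object was released, 0 otherwise. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Hypothetical frontend object with an embedded reference count. */
struct frontend {
	int id;
	struct ref ref;
};

static int frontends_freed;

static void frontend_release(struct ref *r)
{
	/* container_of(): step back from the member to the enclosing struct. */
	struct frontend *fe =
		(struct frontend *)((char *)r - offsetof(struct frontend, ref));

	free(fe);
	frontends_freed++;
}

static struct frontend *frontend_register(int id)
{
	struct frontend *fe = malloc(sizeof(*fe));

	if (!fe)
		return NULL;
	fe->id = id;
	ref_init(&fe->ref);            /* the registering owner's reference */
	return fe;
}

/* open() takes a reference; close() and unregister() each drop one. */
static void frontend_open(struct frontend *fe) { ref_get(&fe->ref); }
static int frontend_close(struct frontend *fe)
{
	return ref_put(&fe->ref, frontend_release);
}
static int frontend_unregister(struct frontend *fe)
{
	return ref_put(&fe->ref, frontend_release);
}
```

Unregistering while a handle is still open returns immediately; the memory is freed by the final close instead.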
amery pushed a commit that referenced this pull request Apr 1, 2017
[ Upstream commit 448116b ]

During quick plug/removal of an OTG adapter in dual-role testing,
xhci_alloc_device() can be called for the newly detected device
after the DRD library has called xhci_stop() to remove the HCD.

If that is the case, just fail early to prevent the following warning.

[  154.732649] hub 4-0:1.0: USB hub found
[  154.742204] hub 4-0:1.0: 1 port detected
[  154.824458] hub 3-0:1.0: state 7 ports 1 chg 0002 evt 0000
[  154.854609] hub 4-0:1.0: state 7 ports 1 chg 0000 evt 0000
[  154.944430] usb 3-1: new high-speed USB device number 2 using xhci-hcd
[  154.951009] xhci-hcd xhci-hcd.0.auto: xhci_setup_device
[  155.038191] xhci-hcd xhci-hcd.0.auto: remove, state 4
[  155.043315] usb usb4: USB disconnect, device number 1
[  155.055270] xhci-hcd xhci-hcd.0.auto: xhci_stop
[  155.060094] xhci-hcd xhci-hcd.0.auto: USB bus 4 deregistered
[  155.066576] xhci-hcd xhci-hcd.0.auto: remove, state 1
[  155.071710] usb usb3: USB disconnect, device number 1
[  155.077124] xhci-hcd xhci-hcd.0.auto: xhci_setup_device
[  155.082389] ------------[ cut here ]------------
[  155.087690] WARNING: CPU: 0 PID: 72 at drivers/usb/host/xhci.c:3800 xhci_setup_device+0x410/0x484 [xhci_hcd]()
[  155.097861] Modules linked in: sd_mod usb_storage scsi_mod usb_f_ss_lb g_zero libcomposite ipv6 xhci_plat_hcd xhci_hcd usbcore dwc3 udc_core evdev ti_am335x_adc joydev kfifo_buf industrialio snd_soc_simple_cc
[  155.146734] CPU: 0 PID: 72 Comm: kworker/0:3 Tainted: G        W       4.1.4-00834-gcd9380b-dirty #50
[  155.156073] Hardware name: Generic AM43 (Flattened Device Tree)
[  155.162117] Workqueue: usb_hub_wq hub_event [usbcore]
[  155.167249] Backtrace:
[  155.169751] [<c0012af0>] (dump_backtrace) from [<c0012c8c>] (show_stack+0x18/0x1c)
[  155.177390]  r6:c089d4a4 r5:ffffffff r4:00000000 r3:ee46c000
[  155.183137] [<c0012c74>] (show_stack) from [<c05f7c14>] (dump_stack+0x84/0xd0)
[  155.190446] [<c05f7b90>] (dump_stack) from [<c00439ac>] (warn_slowpath_common+0x80/0xbc)
[  155.198605]  r7:00000009 r6:00000ed8 r5:bf27eb70 r4:00000000
[  155.204348] [<c004392c>] (warn_slowpath_common) from [<c0043a0c>] (warn_slowpath_null+0x24/0x2c)
[  155.213202]  r8:ee49f000 r7:ee7c0004 r6:00000000 r5:ee7c0158 r4:ee7c0000
[  155.220051] [<c00439e8>] (warn_slowpath_null) from [<bf27eb70>] (xhci_setup_device+0x410/0x484 [xhci_hcd])
[  155.229816] [<bf27e760>] (xhci_setup_device [xhci_hcd]) from [<bf27ec10>] (xhci_address_device+0x14/0x18 [xhci_hcd])
[  155.240415]  r10:ee598200 r9:00000001 r8:00000002 r7:00000001 r6:00000003 r5:00000002
[  155.248363]  r4:ee49f000
[  155.250978] [<bf27ebfc>] (xhci_address_device [xhci_hcd]) from [<bf20cb94>] (hub_port_init+0x1b8/0xa9c [usbcore])
[  155.261403] [<bf20c9dc>] (hub_port_init [usbcore]) from [<bf2101e0>] (hub_event+0x738/0x1020 [usbcore])
[  155.270874]  r10:ee598200 r9:ee7c0000 r8:ee7c0038 r7:ee518800 r6:ee49f000 r5:00000001
[  155.278822]  r4:00000000
[  155.281426] [<bf20faa8>] (hub_event [usbcore]) from [<c005754c>] (process_one_work+0x128/0x340)
[  155.290196]  r10:00000000 r9:00000003 r8:00000000 r7:fedfa000 r6:eeec5400 r5:ee598314
[  155.298151]  r4:ee434380
[  155.300718] [<c0057424>] (process_one_work) from [<c00578f8>] (worker_thread+0x158/0x49c)
[  155.308963]  r10:ee434380 r9:00000003 r8:eeec5400 r7:00000008 r6:ee434398 r5:eeec5400
[  155.316913]  r4:eeec5414
[  155.319482] [<c00577a0>] (worker_thread) from [<c005cc40>] (kthread+0xdc/0xf8)
[  155.326765]  r10:00000000 r9:00000000 r8:00000000 r7:c00577a0 r6:ee434380 r5:ee4441c0
[  155.334713]  r4:00000000 r3:00000000
[  155.338341] [<c005cb64>] (kthread) from [<c000fc08>] (ret_from_fork+0x14/0x2c)
[  155.345626]  r7:00000000 r6:00000000 r5:c005cb64 r4:ee4441c0
[  155.356108] ---[ end trace a58d34c223b190e6 ]---
[  155.360783] xhci-hcd xhci-hcd.0.auto: Virt dev invalid for slot_id 0x1!
[  155.574404] xhci-hcd xhci-hcd.0.auto: xhci_setup_device
[  155.579667] ------------[ cut here ]------------

Cc: <stable@vger.kernel.org>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
amery pushed a commit that referenced this pull request Jun 4, 2018
…access

James Harvey reported that some corrupted compressed extent data can
lead to various kinds of kernel memory corruption.

Such corrupted extent data belongs to an inode with the NODATASUM flag,
so data checksums won't help us detect such a bug.

If we are lucky, KASAN can catch it, like this:

BUG: KASAN: slab-out-of-bounds in lzo_decompress_bio+0x384/0x7a0 [btrfs]
Write of size 4096 at addr ffff8800606cb0f8 by task kworker/u16:0/2338

CPU: 3 PID: 2338 Comm: kworker/u16:0 Tainted: G           O      4.17.0-rc5-custom+ #50
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
Call Trace:
 dump_stack+0xc2/0x16b
 print_address_description+0x6a/0x270
 kasan_report+0x260/0x380
 memcpy+0x34/0x50
 lzo_decompress_bio+0x384/0x7a0 [btrfs]
 end_compressed_bio_read+0x99f/0x10b0 [btrfs]
 bio_endio+0x32e/0x640
 normal_work_helper+0x15a/0xea0 [btrfs]
 process_one_work+0x7e3/0x1470
 worker_thread+0x1b0/0x1170
 kthread+0x2db/0x390
 ret_from_fork+0x22/0x40
...

The offending compressed data has the following info:

Header:			length 32768		(looks completely valid)
Segment 0 Header:	length 3472882419	(obviously out of bounds)

When handling segment 0, since it extends past the current page, we
need to copy the compressed data to a temporary buffer in the workspace;
such a large size then triggers an out-of-bounds memory access,
screwing up the whole kernel.

Fix it by adding extra checks on the header and segment headers to
ensure we never access memory out of bounds, and also check that the
decompressed data stays within bounds.

Reported-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ updated comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
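The extra checks amount to validating every length field against the bytes actually available before using it. The sketch below applies that to a simplified flat layout assumed for illustration: a 4-byte little-endian total length, followed by segments that each start with a 4-byte little-endian length. The real decompressor also has to handle segments that cross page boundaries (as the commit message describes), which this sketch ignores, and `validate_segments()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define LEN_SIZE 4 /* each length field is a 32-bit little-endian value */

static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the number of segments, or -1 if any length field would make
 * us read past the end of the compressed data. */
static int validate_segments(const uint8_t *buf, size_t buf_len)
{
	uint32_t total;
	size_t pos = LEN_SIZE;
	int nseg = 0;

	if (buf_len < LEN_SIZE)
		return -1;
	total = read_le32(buf);
	/* The header length must cover itself and fit in the buffer. */
	if (total < LEN_SIZE || total > buf_len)
		return -1;

	while (pos < total) {
		uint32_t seg_len;

		if (total - pos < LEN_SIZE)
			return -1;      /* truncated segment header */
		seg_len = read_le32(buf + pos);
		pos += LEN_SIZE;
		if (seg_len > total - pos)
			return -1;      /* segment would run past the end */
		pos += seg_len;
		nseg++;
	}
	return nseg;
}
```

A segment header carrying the reported length 3472882419 (bytes F3 FA FF CE) is rejected here instead of being used as a copy size.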
amery pushed a commit that referenced this pull request Nov 9, 2018
Increase the kasan-instrumented kernel stack size from 32k to 64k. Other
architectures seem to get away with just doubling the kernel stack size
under kasan, but on s390 this appears not to be enough due to the bigger
frame size. The particular pain point is kasan inlined checks
(CONFIG_KASAN_INLINE vs CONFIG_KASAN_OUTLINE). With inlined checks, one
particular case hitting a stack overflow is an fs sync on an xfs filesystem:

 #0 [9a0681e8]  704 bytes  check_usage at 34b1fc
 #1 [9a0684a8]  432 bytes  check_usage at 34c710
 #2 [9a068658]  1048 bytes  validate_chain at 35044a
 #3 [9a068a70]  312 bytes  __lock_acquire at 3559fe
 #4 [9a068ba8]  440 bytes  lock_acquire at 3576ee
 #5 [9a068d60]  104 bytes  _raw_spin_lock at 21b44e0
 #6 [9a068dc8]  1992 bytes  enqueue_entity at 2dbf72
 #7 [9a069590]  1496 bytes  enqueue_task_fair at 2df5f0
 #8 [9a069b68]  64 bytes  ttwu_do_activate at 28f438
 #9 [9a069ba8]  552 bytes  try_to_wake_up at 298c4c
 #10 [9a069dd0]  168 bytes  wake_up_worker at 23f97c
 #11 [9a069e78]  200 bytes  insert_work at 23fc2e
 #12 [9a069f40]  648 bytes  __queue_work at 2487c0
 #13 [9a06a1c8]  200 bytes  __queue_delayed_work at 24db28
 #14 [9a06a290]  248 bytes  mod_delayed_work_on at 24de84
 #15 [9a06a388]  24 bytes  kblockd_mod_delayed_work_on at 153e2a0
 #16 [9a06a3a0]  288 bytes  __blk_mq_delay_run_hw_queue at 158168c
 #17 [9a06a4c0]  192 bytes  blk_mq_run_hw_queue at 1581a3c
 #18 [9a06a580]  184 bytes  blk_mq_sched_insert_requests at 15a2192
 #19 [9a06a638]  1024 bytes  blk_mq_flush_plug_list at 1590f3a
 #20 [9a06aa38]  704 bytes  blk_flush_plug_list at 1555028
 #21 [9a06acf8]  320 bytes  schedule at 219e476
 #22 [9a06ae38]  760 bytes  schedule_timeout at 21b0aac
 #23 [9a06b130]  408 bytes  wait_for_common at 21a1706
 #24 [9a06b2c8]  360 bytes  xfs_buf_iowait at fa1540
 #25 [9a06b430]  256 bytes  __xfs_buf_submit at fadae6
 #26 [9a06b530]  264 bytes  xfs_buf_read_map at fae3f6
 #27 [9a06b638]  656 bytes  xfs_trans_read_buf_map at 10ac9a8
 #28 [9a06b8c8]  304 bytes  xfs_btree_kill_root at e72426
 #29 [9a06b9f8]  288 bytes  xfs_btree_lookup_get_block at e7bc5e
 #30 [9a06bb18]  624 bytes  xfs_btree_lookup at e7e1a6
 #31 [9a06bd88]  2664 bytes  xfs_alloc_ag_vextent_near at dfa070
 #32 [9a06c7f0]  144 bytes  xfs_alloc_ag_vextent at dff3ca
 #33 [9a06c880]  1128 bytes  xfs_alloc_vextent at e05fce
 #34 [9a06cce8]  584 bytes  xfs_bmap_btalloc at e58342
 #35 [9a06cf30]  1336 bytes  xfs_bmapi_write at e618de
 #36 [9a06d468]  776 bytes  xfs_iomap_write_allocate at ff678e
 #37 [9a06d770]  720 bytes  xfs_map_blocks at f82af8
 #38 [9a06da40]  928 bytes  xfs_writepage_map at f83cd6
 #39 [9a06dde0]  320 bytes  xfs_do_writepage at f85872
 #40 [9a06df20]  1320 bytes  write_cache_pages at 73dfe8
 #41 [9a06e448]  208 bytes  xfs_vm_writepages at f7f892
 #42 [9a06e518]  88 bytes  do_writepages at 73fe6a
 #43 [9a06e570]  872 bytes  __writeback_single_inode at a20cb6
 #44 [9a06e8d8]  664 bytes  writeback_sb_inodes at a23be2
 #45 [9a06eb70]  296 bytes  __writeback_inodes_wb at a242e0
 #46 [9a06ec98]  928 bytes  wb_writeback at a2500e
 #47 [9a06f038]  848 bytes  wb_do_writeback at a260ae
 #48 [9a06f388]  536 bytes  wb_workfn at a28228
 #49 [9a06f5a0]  1088 bytes  process_one_work at 24a234
 #50 [9a06f9e0]  1120 bytes  worker_thread at 24ba26
 #51 [9a06fe40]  104 bytes  kthread at 26545a
 #52 [9a06fea8]             kernel_thread_starter at 21b6b62

To be able to increase the stack size to 64k, reuse the LLILL instruction
in the __switch_to function to load the value 64k - STACK_FRAME_OVERHEAD -
__PT_SIZE (65192) as unsigned.
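The arithmetic behind the 65192 constant, and the reason it has to be loaded as unsigned, can be checked directly. This sketch assumes STACK_FRAME_OVERHEAD = 160 and __PT_SIZE = 184 on s390x; these values are consistent with the 65192 quoted above but are not stated in this message. The point is that 65192 exceeds 32767, the largest value a sign-extended 16-bit immediate can hold, yet fits in a zero-extended 16-bit immediate such as the one LLILL takes. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed s390x values, consistent with 65536 - 160 - 184 = 65192. */
#define STACK_SIZE_64K        65536u
#define STACK_FRAME_OVERHEAD  160u
#define PT_SIZE               184u

/* Offset computed for __switch_to: stack size minus frame overhead minus pt_regs. */
static unsigned int switch_to_offset(unsigned int stack_size)
{
	return stack_size - STACK_FRAME_OVERHEAD - PT_SIZE;
}

/* Does v fit a 16-bit immediate that the CPU sign-extends? */
static int fits_signed_imm16(unsigned int v)
{
	return v <= (unsigned int)INT16_MAX;
}

/* Does v fit a 16-bit immediate that the CPU zero-extends (as LLILL does)? */
static int fits_unsigned_imm16(unsigned int v)
{
	return v <= (unsigned int)UINT16_MAX;
}
```

With a 32k stack the same expression gives 32424, which still fits a signed 16-bit immediate; doubling the stack is what pushes the value past 32767.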

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Mar 4, 2019
scsi_device_quiesce() and scsi_device_resume() are called during
system-wide suspend and resume. scsi_device_quiesce() only succeeds for
SCSI devices that are in one of the RUNNING, OFFLINE or TRANSPORT_OFFLINE
states (see also scsi_device_set_state()). This patch prevents the
following warning from being triggered when resuming a system on which
quiescing a SCSI device failed:

WARNING: CPU: 2 PID: 11303 at drivers/scsi/scsi_lib.c:2600 scsi_device_resume+0x4f/0x58
CPU: 2 PID: 11303 Comm: kworker/u8:70 Not tainted 5.0.0-rc1+ #50
Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 08/04/2016
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 scsi_dev_type_resume+0x2e/0x60
 async_run_entry_fn+0x32/0xd8
 process_one_work+0x1f4/0x420
 worker_thread+0x28/0x3c0
 kthread+0x118/0x130
 ret_from_fork+0x22/0x40
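The idea behind this fix can be modeled as a resume that only undoes a quiesce that actually happened. This is a hypothetical user-space model, not the code in drivers/scsi/scsi_lib.c: `struct model_sdev`, its `quiesced` flag and both functions are illustrative names, and the states mirror the ones listed in the description of this commit.

```c
#include <stdbool.h>

enum model_state { MS_RUNNING, MS_OFFLINE, MS_TRANSPORT_OFFLINE, MS_QUIESCE, MS_BLOCKED };

struct model_sdev {
	enum model_state state;
	bool quiesced;          /* true only if model_quiesce() succeeded */
};

/* Quiesce succeeds only from RUNNING, OFFLINE or TRANSPORT_OFFLINE. */
static int model_quiesce(struct model_sdev *sdev)
{
	switch (sdev->state) {
	case MS_RUNNING:
	case MS_OFFLINE:
	case MS_TRANSPORT_OFFLINE:
		sdev->state = MS_QUIESCE;
		sdev->quiesced = true;
		return 0;
	default:
		return -1;      /* e.g. BLOCKED: leave the device untouched */
	}
}

/* Resume undoes a quiesce only if one happened, so a failed quiesce
 * during suspend does not trip a warning on resume. */
static void model_resume(struct model_sdev *sdev)
{
	if (!sdev->quiesced)
		return;
	sdev->quiesced = false;
	if (sdev->state == MS_QUIESCE)
		sdev->state = MS_RUNNING;
}
```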

Cc: Przemek Socha <soprwa@gmail.com>
Reported-by: Przemek Socha <soprwa@gmail.com>
Fixes: 3a0a529 ("block, scsi: Make SCSI quiesce and resume work reliably") # v4.15
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Jul 26, 2019
[ Upstream commit 3f167e1 ]

ipv4_pdp_add() is called in an RCU read-side critical section,
so GFP_KERNEL should not be used in that function.
This patch makes ipv4_pdp_add() use GFP_ATOMIC instead of GFP_KERNEL.
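The rule this fix follows can be modeled in user space: while a read-side critical section is open, only non-sleeping allocations are allowed. In this hypothetical sketch, `model_rcu_read_lock()`, `model_alloc()`, `model_pdp_add()` and the `MODEL_GFP_*` flags stand in for the kernel's rcu_read_lock(), kmalloc(), ipv4_pdp_add() and GFP_KERNEL/GFP_ATOMIC; the nesting counter plays the role of the kernel's might-sleep debug check that produces the splat in this message.

```c
#include <stdlib.h>

/* KERNEL may sleep, ATOMIC may not. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

static int rcu_depth;   /* nesting depth of read-side critical sections */
static int sleep_bugs;  /* count of "might sleep in atomic context" violations */

static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { rcu_depth--; }

/* Allocate, recording a bug if a sleeping allocation happens inside a read-side section. */
static void *model_alloc(size_t size, enum model_gfp gfp)
{
	if (gfp == MODEL_GFP_KERNEL && rcu_depth > 0)
		sleep_bugs++;   /* the kernel's debug check would warn here */
	return malloc(size);
}

/* Hypothetical stand-in for the fixed ipv4_pdp_add(): it runs under the
 * read lock, so it must use the non-sleeping flag. */
static void *model_pdp_add(void)
{
	void *pctx;

	model_rcu_read_lock();
	pctx = model_alloc(64, MODEL_GFP_ATOMIC);
	model_rcu_read_unlock();
	return pctx;
}
```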

Test commands:
gtp-link add gtp1 &
gtp-tunnel add gtp1 v1 100 200 1.1.1.1 2.2.2.2

Splat looks like:
[  130.618881] =============================
[  130.626382] WARNING: suspicious RCU usage
[  130.626994] 5.2.0-rc6+ #50 Not tainted
[  130.627622] -----------------------------
[  130.628223] ./include/linux/rcupdate.h:266 Illegal context switch in RCU read-side critical section!
[  130.629684]
[  130.629684] other info that might help us debug this:
[  130.629684]
[  130.631022]
[  130.631022] rcu_scheduler_active = 2, debug_locks = 1
[  130.632136] 4 locks held by gtp-tunnel/1025:
[  130.632925]  #0: 000000002b93c8b7 (cb_lock){++++}, at: genl_rcv+0x15/0x40
[  130.634159]  #1: 00000000f17bc999 (genl_mutex){+.+.}, at: genl_rcv_msg+0xfb/0x130
[  130.635487]  #2: 00000000c644ed8e (rtnl_mutex){+.+.}, at: gtp_genl_new_pdp+0x18c/0x1150 [gtp]
[  130.636936]  #3: 0000000007a1cde7 (rcu_read_lock){....}, at: gtp_genl_new_pdp+0x187/0x1150 [gtp]
[  130.638348]
[  130.638348] stack backtrace:
[  130.639062] CPU: 1 PID: 1025 Comm: gtp-tunnel Not tainted 5.2.0-rc6+ #50
[  130.641318] Call Trace:
[  130.641707]  dump_stack+0x7c/0xbb
[  130.642252]  ___might_sleep+0x2c0/0x3b0
[  130.642862]  kmem_cache_alloc_trace+0x1cd/0x2b0
[  130.643591]  gtp_genl_new_pdp+0x6c5/0x1150 [gtp]
[  130.644371]  genl_family_rcv_msg+0x63a/0x1030
[  130.645074]  ? mutex_lock_io_nested+0x1090/0x1090
[  130.645845]  ? genl_unregister_family+0x630/0x630
[  130.646592]  ? debug_show_all_locks+0x2d0/0x2d0
[  130.647293]  ? check_flags.part.40+0x440/0x440
[  130.648099]  genl_rcv_msg+0xa3/0x130
[ ... ]

Fixes: 459aa66 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Jul 26, 2019
[ Upstream commit 1788b85 ]

gtp_encap_destroy() is called twice:
1. When the interface is deleted.
2. When the UDP socket is destroyed.
Either gtp->sk0 or gtp->sk1u could be freed by sock_put() in
gtp_encap_destroy(), so when gtp_encap_destroy() is called again,
it would use a freed sk pointer.

This patch makes gtp_encap_destroy() set gtp->sk0 or gtp->sk1u to
NULL. In addition, because the gtp->sk0 and gtp->sk1u pointers are
protected by rtnl_lock, rtnl_lock() is added.
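The pattern described above can be sketched in user space: take the lock, detach the pointer and clear the slot, and only then drop the reference, so a second call finds NULL instead of a freed socket. This is a hypothetical model, not the gtp code; `struct model_sock`, `struct model_gtp`, `model_sock_put()` and `model_encap_destroy()` are illustrative names, and a pthread mutex stands in for rtnl_lock.

```c
#include <pthread.h>
#include <stdlib.h>

struct model_sock {
	int refcnt;
};

struct model_gtp {
	struct model_sock *sk0;
	struct model_sock *sk1u;
};

/* Stands in for rtnl_lock. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Drop a reference and free on the last one, like sock_put(). */
static void model_sock_put(struct model_sock *sk)
{
	if (--sk->refcnt == 0)
		free(sk);
}

/* Clear the slot under the lock before dropping the reference, so a
 * repeated call sees NULL and does nothing. */
static void model_encap_destroy(struct model_sock **skp)
{
	struct model_sock *sk;

	pthread_mutex_lock(&model_rtnl);
	sk = *skp;
	*skp = NULL;
	pthread_mutex_unlock(&model_rtnl);
	if (sk)
		model_sock_put(sk);
}
```

Calling `model_encap_destroy(&gtp.sk0)` once for interface deletion and again for socket destruction is then harmless: the second call observes NULL.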

Test command:
   gtp-link add gtp1 &
   killall gtp-link
   ip link del gtp1

Splat looks like:
[   83.182767] BUG: KASAN: use-after-free in __lock_acquire+0x3a20/0x46a0
[   83.184128] Read of size 8 at addr ffff8880cc7d5360 by task ip/1008
[   83.185567] CPU: 1 PID: 1008 Comm: ip Not tainted 5.2.0-rc6+ #50
[   83.188469] Call Trace:
[ ... ]
[   83.200126]  lock_acquire+0x141/0x380
[   83.200575]  ? lock_sock_nested+0x3a/0xf0
[   83.201069]  _raw_spin_lock_bh+0x38/0x70
[   83.201551]  ? lock_sock_nested+0x3a/0xf0
[   83.202044]  lock_sock_nested+0x3a/0xf0
[   83.202520]  gtp_encap_destroy+0x18/0xe0 [gtp]
[   83.203065]  gtp_encap_disable.isra.14+0x13/0x50 [gtp]
[   83.203687]  gtp_dellink+0x56/0x170 [gtp]
[   83.204190]  rtnl_delete_link+0xb4/0x100
[ ... ]
[   83.236513] Allocated by task 976:
[   83.236925]  save_stack+0x19/0x80
[   83.237332]  __kasan_kmalloc.constprop.3+0xa0/0xd0
[   83.237894]  kmem_cache_alloc+0xd8/0x280
[   83.238360]  sk_prot_alloc.isra.42+0x50/0x200
[   83.238874]  sk_alloc+0x32/0x940
[   83.239264]  inet_create+0x283/0xc20
[   83.239684]  __sock_create+0x2dd/0x540
[   83.240136]  __sys_socket+0xca/0x1a0
[   83.240550]  __x64_sys_socket+0x6f/0xb0
[   83.240998]  do_syscall_64+0x9c/0x450
[   83.241466]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   83.242061]
[   83.242249] Freed by task 0:
[   83.242616]  save_stack+0x19/0x80
[   83.243013]  __kasan_slab_free+0x111/0x150
[   83.243498]  kmem_cache_free+0x89/0x250
[   83.244444]  __sk_destruct+0x38f/0x5a0
[   83.245366]  rcu_core+0x7e9/0x1c20
[   83.245766]  __do_softirq+0x213/0x8fa

Fixes: 1e3a3ab ("gtp: make GTP sockets in gtp_newlink optional")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Jul 26, 2019
[ Upstream commit a2bed90 ]

Currently, gtp_newlink() can be called after unregister_pernet_subsys().
gtp_newlink() uses gtp_net, but gtp_net may already have been destroyed
by unregister_pernet_subsys().
So unregister_pernet_subsys() should be called after
rtnl_link_unregister().
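The ordering argument can be illustrated with a single-threaded model in which a concurrent gtp-link request is simulated between the two teardown steps. The flags and functions are hypothetical; they only mirror the roles of rtnl_link_unregister(), unregister_pernet_subsys() and gtp_newlink() described above.

```c
#include <stdbool.h>

static bool pernet_registered;   /* per-net data (gtp_net) exists */
static bool link_ops_registered; /* newlink requests can be dispatched */

/* Returns 0 on success, -1 if the request is rejected because the link
 * ops are gone, -2 if it would touch freed per-net data (the bug). */
static int model_newlink(void)
{
	if (!link_ops_registered)
		return -1;
	if (!pernet_registered)
		return -2;
	return 0;
}

/* Init registers the per-net data first, then exposes the entry point. */
static void model_init(void)
{
	pernet_registered = true;
	link_ops_registered = true;
}

/* Tear down, running one newlink request between the two steps to model
 * a race with module removal; returns what that request saw. */
static int model_exit(bool fixed_order)
{
	int seen;

	if (fixed_order) {
		link_ops_registered = false;  /* rtnl_link_unregister() first */
		seen = model_newlink();
		pernet_registered = false;    /* then unregister_pernet_subsys() */
	} else {
		pernet_registered = false;    /* old order */
		seen = model_newlink();
		link_ops_registered = false;
	}
	return seen;
}
```

The fixed order is simply the reverse of the init order, which closes the window in which the entry point is reachable but its backing data is gone.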

Test commands:
   #SHELL 1
   while :
   do
	   for i in {1..5}
	   do
		./gtp-link add gtp$i &
	   done
	   killall gtp-link
   done

   #SHELL 2
   while :
   do
	modprobe -rv gtp
   done

Splat looks like:
[  753.176631] BUG: KASAN: use-after-free in gtp_newlink+0x9b4/0xa5c [gtp]
[  753.177722] Read of size 8 at addr ffff8880d48f2458 by task gtp-link/7126
[  753.179082] CPU: 0 PID: 7126 Comm: gtp-link Tainted: G        W         5.2.0-rc6+ #50
[  753.185801] Call Trace:
[  753.186264]  dump_stack+0x7c/0xbb
[  753.186863]  ? gtp_newlink+0x9b4/0xa5c [gtp]
[  753.187583]  print_address_description+0xc7/0x240
[  753.188382]  ? gtp_newlink+0x9b4/0xa5c [gtp]
[  753.189097]  ? gtp_newlink+0x9b4/0xa5c [gtp]
[  753.189846]  __kasan_report+0x12a/0x16f
[  753.190542]  ? gtp_newlink+0x9b4/0xa5c [gtp]
[  753.191298]  kasan_report+0xe/0x20
[  753.191893]  gtp_newlink+0x9b4/0xa5c [gtp]
[  753.192580]  ? __netlink_ns_capable+0xc3/0xf0
[  753.193370]  __rtnl_newlink+0xb9f/0x11b0
[ ... ]
[  753.241201] Allocated by task 7186:
[  753.241844]  save_stack+0x19/0x80
[  753.242399]  __kasan_kmalloc.constprop.3+0xa0/0xd0
[  753.243192]  __kmalloc+0x13e/0x300
[  753.243764]  ops_init+0xd6/0x350
[  753.244314]  register_pernet_operations+0x249/0x6f0
[ ... ]
[  753.251770] Freed by task 7178:
[  753.252288]  save_stack+0x19/0x80
[  753.252833]  __kasan_slab_free+0x111/0x150
[  753.253962]  kfree+0xc7/0x280
[  753.254509]  ops_free_list.part.11+0x1c4/0x2d0
[  753.255241]  unregister_pernet_operations+0x262/0x390
[ ... ]
[  753.285883] list_add corruption. next->prev should be prev (ffff8880d48f2458), but was ffff8880d497d878. (next.
[  753.287241] ------------[ cut here ]------------
[  753.287794] kernel BUG at lib/list_debug.c:25!
[  753.288364] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  753.289099] CPU: 0 PID: 7126 Comm: gtp-link Tainted: G    B   W         5.2.0-rc6+ #50
[  753.291036] RIP: 0010:__list_add_valid+0x74/0xd0
[  753.291589] Code: 48 39 da 75 27 48 39 f5 74 36 48 39 dd 74 31 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 d9 48b
[  753.293779] RSP: 0018:ffff8880cae8f398 EFLAGS: 00010286
[  753.294401] RAX: 0000000000000075 RBX: ffff8880d497d878 RCX: 0000000000000000
[  753.296260] RDX: 0000000000000075 RSI: 0000000000000008 RDI: ffffed10195d1e69
[  753.297070] RBP: ffff8880cd250ae0 R08: ffffed101b4bff21 R09: ffffed101b4bff21
[  753.297899] R10: 0000000000000001 R11: ffffed101b4bff20 R12: ffff8880d497d878
[  753.298703] R13: 0000000000000000 R14: ffff8880cd250ae0 R15: ffff8880d48f2458
[  753.299564] FS:  00007f5f79805740(0000) GS:ffff8880da400000(0000) knlGS:0000000000000000
[  753.300533] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  753.301231] CR2: 00007fe8c7ef4f10 CR3: 00000000b71a6006 CR4: 00000000000606f0
[  753.302183] Call Trace:
[  753.302530]  gtp_newlink+0x5f6/0xa5c [gtp]
[  753.303037]  ? __netlink_ns_capable+0xc3/0xf0
[  753.303576]  __rtnl_newlink+0xb9f/0x11b0
[  753.304092]  ? rtnl_link_unregister+0x230/0x230

Fixes: 459aa66 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Jan 18, 2021
…tirq

Commit 39d42fa ("dm crypt: add flags to optionally bypass kcryptd
workqueues") made it possible for some code paths in dm-crypt to be
executed in softirq context, when the underlying driver processes IO
requests in interrupt/softirq context.

When the Crypto API backlogs a crypto request, dm-crypt uses
wait_for_completion to avoid sending further requests to an already
overloaded crypto driver. However, if the code is executing in softirq
context, we might get the following stacktrace:

[  210.235213][    C0] BUG: scheduling while atomic: fio/2602/0x00000102
[  210.236701][    C0] Modules linked in:
[  210.237566][    C0] CPU: 0 PID: 2602 Comm: fio Tainted: G        W         5.10.0+ #50
[  210.239292][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  210.241233][    C0] Call Trace:
[  210.241946][    C0]  <IRQ>
[  210.242561][    C0]  dump_stack+0x7d/0xa3
[  210.243466][    C0]  __schedule_bug.cold+0xb3/0xc2
[  210.244539][    C0]  __schedule+0x156f/0x20d0
[  210.245518][    C0]  ? io_schedule_timeout+0x140/0x140
[  210.246660][    C0]  schedule+0xd0/0x270
[  210.247541][    C0]  schedule_timeout+0x1fb/0x280
[  210.248586][    C0]  ? usleep_range+0x150/0x150
[  210.249624][    C0]  ? unpoison_range+0x3a/0x60
[  210.250632][    C0]  ? ____kasan_kmalloc.constprop.0+0x82/0xa0
[  210.251949][    C0]  ? unpoison_range+0x3a/0x60
[  210.252958][    C0]  ? __prepare_to_swait+0xa7/0x190
[  210.254067][    C0]  do_wait_for_common+0x2ab/0x370
[  210.255158][    C0]  ? usleep_range+0x150/0x150
[  210.256192][    C0]  ? bit_wait_io_timeout+0x160/0x160
[  210.257358][    C0]  ? blk_update_request+0x757/0x1150
[  210.258582][    C0]  ? _raw_spin_lock_irq+0x82/0xd0
[  210.259674][    C0]  ? _raw_read_unlock_irqrestore+0x30/0x30
[  210.260917][    C0]  wait_for_completion+0x4c/0x90
[  210.261971][    C0]  crypt_convert+0x19a6/0x4c00
[  210.263033][    C0]  ? _raw_spin_lock_irqsave+0x87/0xe0
[  210.264193][    C0]  ? kasan_set_track+0x1c/0x30
[  210.265191][    C0]  ? crypt_iv_tcw_ctr+0x4a0/0x4a0
[  210.266283][    C0]  ? kmem_cache_free+0x104/0x470
[  210.267363][    C0]  ? crypt_endio+0x91/0x180
[  210.268327][    C0]  kcryptd_crypt_read_convert+0x30e/0x420
[  210.269565][    C0]  blk_update_request+0x757/0x1150
[  210.270563][    C0]  blk_mq_end_request+0x4b/0x480
[  210.271680][    C0]  blk_done_softirq+0x21d/0x340
[  210.272775][    C0]  ? _raw_spin_lock+0x81/0xd0
[  210.273847][    C0]  ? blk_mq_stop_hw_queue+0x30/0x30
[  210.275031][    C0]  ? _raw_read_lock_irq+0x40/0x40
[  210.276182][    C0]  __do_softirq+0x190/0x611
[  210.277203][    C0]  ? handle_edge_irq+0x221/0xb60
[  210.278340][    C0]  asm_call_irq_on_stack+0x12/0x20
[  210.279514][    C0]  </IRQ>
[  210.280164][    C0]  do_softirq_own_stack+0x37/0x40
[  210.281281][    C0]  irq_exit_rcu+0x110/0x1b0
[  210.282286][    C0]  common_interrupt+0x74/0x120
[  210.283376][    C0]  asm_common_interrupt+0x1e/0x40
[  210.284496][    C0] RIP: 0010:_aesni_enc1+0x65/0xb0

Fix this by making the crypt_convert function reentrant from the point of
a single bio and making dm-crypt defer further bio processing to a
workqueue if the Crypto API backlogs a request in interrupt context.
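The deferral idea can be sketched as a resumable loop whose position is stored per bio. This is a hypothetical single-threaded model, not the dm-crypt code: `struct model_bio`, `model_submit()` and `model_convert()` are illustrative names, and one chosen sector simply gets its request marked as backlogged.

```c
#include <stdbool.h>

enum convert_status { CONVERT_DONE, CONVERT_DEFERRED };

/* Hypothetical per-bio state, kept across calls so conversion can resume. */
struct model_bio {
	unsigned int nr_sectors;
	unsigned int next;          /* first sector not yet submitted */
	unsigned int busy_sector;   /* sector whose request the driver backlogs */
};

/* Stand-in for submitting one crypto request; true means "backlogged". */
static bool model_submit(const struct model_bio *bio, unsigned int sector)
{
	return sector == bio->busy_sector;
}

/*
 * Submit sectors starting at bio->next. A backlogged request is still
 * queued, so the position advances past it. In atomic context this returns
 * CONVERT_DEFERRED instead of sleeping; the caller would then hand the bio
 * to a workqueue, which calls this again from process context.
 */
static enum convert_status model_convert(struct model_bio *bio, bool in_atomic)
{
	while (bio->next < bio->nr_sectors) {
		bool backlogged = model_submit(bio, bio->next);

		bio->next++;
		if (backlogged && in_atomic)
			return CONVERT_DEFERRED;
		/* In process context, waiting for the backlog to drain is allowed here. */
	}
	return CONVERT_DONE;
}
```

A 4-sector bio that backlogs at sector 2 stops after three submissions when called from softirq context, and a second call from the workqueue finishes the remaining sector without redoing the first three.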

Fixes: 39d42fa ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Jan 18, 2021
Commit 39d42fa ("dm crypt: add flags to optionally bypass kcryptd
workqueues") made it possible for some code paths in dm-crypt to be
executed in softirq context, when the underlying driver processes IO
requests in interrupt/softirq context.

In this case, allocating a new crypto request may sometimes produce
a stacktrace like the one below:

[  210.103008][    C0] BUG: sleeping function called from invalid context at mm/mempool.c:381
[  210.104746][    C0] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2602, name: fio
[  210.106599][    C0] CPU: 0 PID: 2602 Comm: fio Tainted: G        W         5.10.0+ #50
[  210.108331][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  210.110212][    C0] Call Trace:
[  210.110921][    C0]  <IRQ>
[  210.111527][    C0]  dump_stack+0x7d/0xa3
[  210.112411][    C0]  ___might_sleep.cold+0x122/0x151
[  210.113527][    C0]  mempool_alloc+0x16b/0x2f0
[  210.114524][    C0]  ? __queue_work+0x515/0xde0
[  210.115553][    C0]  ? mempool_resize+0x700/0x700
[  210.116586][    C0]  ? crypt_endio+0x91/0x180
[  210.117479][    C0]  ? blk_update_request+0x757/0x1150
[  210.118513][    C0]  ? blk_mq_end_request+0x4b/0x480
[  210.119572][    C0]  ? blk_done_softirq+0x21d/0x340
[  210.120628][    C0]  ? __do_softirq+0x190/0x611
[  210.121626][    C0]  crypt_convert+0x29f9/0x4c00
[  210.122668][    C0]  ? _raw_spin_lock_irqsave+0x87/0xe0
[  210.123824][    C0]  ? kasan_set_track+0x1c/0x30
[  210.124858][    C0]  ? crypt_iv_tcw_ctr+0x4a0/0x4a0
[  210.125930][    C0]  ? kmem_cache_free+0x104/0x470
[  210.126973][    C0]  ? crypt_endio+0x91/0x180
[  210.127947][    C0]  kcryptd_crypt_read_convert+0x30e/0x420
[  210.129165][    C0]  blk_update_request+0x757/0x1150
[  210.130231][    C0]  blk_mq_end_request+0x4b/0x480
[  210.131294][    C0]  blk_done_softirq+0x21d/0x340
[  210.132332][    C0]  ? _raw_spin_lock+0x81/0xd0
[  210.133289][    C0]  ? blk_mq_stop_hw_queue+0x30/0x30
[  210.134399][    C0]  ? _raw_read_lock_irq+0x40/0x40
[  210.135458][    C0]  __do_softirq+0x190/0x611
[  210.136409][    C0]  ? handle_edge_irq+0x221/0xb60
[  210.137447][    C0]  asm_call_irq_on_stack+0x12/0x20
[  210.138507][    C0]  </IRQ>
[  210.139118][    C0]  do_softirq_own_stack+0x37/0x40
[  210.140191][    C0]  irq_exit_rcu+0x110/0x1b0
[  210.141151][    C0]  common_interrupt+0x74/0x120
[  210.142171][    C0]  asm_common_interrupt+0x1e/0x40

Fix this by allocating crypto requests with GFP_ATOMIC mask in
interrupt context.
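Choosing the allocation mask by context, as this fix describes, can be sketched as follows. It is a hypothetical model: `pick_gfp()`, `struct req_pool` and `req_alloc()` are illustrative names, and the pool mimics only the property that matters here, namely that a non-sleeping allocation can fail when the pool is empty, so the caller has to check the result.

```c
#include <stdbool.h>

enum req_gfp { REQ_GFP_NOIO, REQ_GFP_ATOMIC };

/* Never use a sleeping mask in interrupt context. */
static enum req_gfp pick_gfp(bool in_interrupt)
{
	return in_interrupt ? REQ_GFP_ATOMIC : REQ_GFP_NOIO;
}

/* Hypothetical mempool stand-in with a fixed number of free elements. */
struct req_pool {
	int free_elems;
};

/* Returns 0 on success, -1 if a non-sleeping allocation fails. */
static int req_alloc(struct req_pool *pool, bool in_interrupt)
{
	if (pool->free_elems > 0) {
		pool->free_elems--;
		return 0;
	}
	if (pick_gfp(in_interrupt) == REQ_GFP_ATOMIC)
		return -1;   /* cannot wait: the caller must handle the failure */
	/* NOIO may sleep until an element is returned; modeled as success. */
	return 0;
}
```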

Fixes: 39d42fa ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Jan 21, 2021
commit d68b295 upstream.

Commit 39d42fa ("dm crypt: add flags to optionally bypass kcryptd
workqueues") made it possible for some code paths in dm-crypt to be
executed in softirq context, when the underlying driver processes IO
requests in interrupt/softirq context.

In this case, allocating a new crypto request may sometimes produce
a stacktrace like the one below:

[  210.103008][    C0] BUG: sleeping function called from invalid context at mm/mempool.c:381
[  210.104746][    C0] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2602, name: fio
[  210.106599][    C0] CPU: 0 PID: 2602 Comm: fio Tainted: G        W         5.10.0+ #50
[  210.108331][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  210.110212][    C0] Call Trace:
[  210.110921][    C0]  <IRQ>
[  210.111527][    C0]  dump_stack+0x7d/0xa3
[  210.112411][    C0]  ___might_sleep.cold+0x122/0x151
[  210.113527][    C0]  mempool_alloc+0x16b/0x2f0
[  210.114524][    C0]  ? __queue_work+0x515/0xde0
[  210.115553][    C0]  ? mempool_resize+0x700/0x700
[  210.116586][    C0]  ? crypt_endio+0x91/0x180
[  210.117479][    C0]  ? blk_update_request+0x757/0x1150
[  210.118513][    C0]  ? blk_mq_end_request+0x4b/0x480
[  210.119572][    C0]  ? blk_done_softirq+0x21d/0x340
[  210.120628][    C0]  ? __do_softirq+0x190/0x611
[  210.121626][    C0]  crypt_convert+0x29f9/0x4c00
[  210.122668][    C0]  ? _raw_spin_lock_irqsave+0x87/0xe0
[  210.123824][    C0]  ? kasan_set_track+0x1c/0x30
[  210.124858][    C0]  ? crypt_iv_tcw_ctr+0x4a0/0x4a0
[  210.125930][    C0]  ? kmem_cache_free+0x104/0x470
[  210.126973][    C0]  ? crypt_endio+0x91/0x180
[  210.127947][    C0]  kcryptd_crypt_read_convert+0x30e/0x420
[  210.129165][    C0]  blk_update_request+0x757/0x1150
[  210.130231][    C0]  blk_mq_end_request+0x4b/0x480
[  210.131294][    C0]  blk_done_softirq+0x21d/0x340
[  210.132332][    C0]  ? _raw_spin_lock+0x81/0xd0
[  210.133289][    C0]  ? blk_mq_stop_hw_queue+0x30/0x30
[  210.134399][    C0]  ? _raw_read_lock_irq+0x40/0x40
[  210.135458][    C0]  __do_softirq+0x190/0x611
[  210.136409][    C0]  ? handle_edge_irq+0x221/0xb60
[  210.137447][    C0]  asm_call_irq_on_stack+0x12/0x20
[  210.138507][    C0]  </IRQ>
[  210.139118][    C0]  do_softirq_own_stack+0x37/0x40
[  210.140191][    C0]  irq_exit_rcu+0x110/0x1b0
[  210.141151][    C0]  common_interrupt+0x74/0x120
[  210.142171][    C0]  asm_common_interrupt+0x1e/0x40

Fix this by allocating crypto requests with GFP_ATOMIC mask in
interrupt context.

Fixes: 39d42fa ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Jan 21, 2021
…tirq

commit 8abec36 upstream.

Commit 39d42fa ("dm crypt: add flags to optionally bypass kcryptd
workqueues") made it possible for some code paths in dm-crypt to be
executed in softirq context, when the underlying driver processes IO
requests in interrupt/softirq context.

When the Crypto API backlogs a crypto request, dm-crypt uses
wait_for_completion to avoid sending further requests to an already
overloaded crypto driver. However, if the code is executing in softirq
context, we might get the following stacktrace:

[  210.235213][    C0] BUG: scheduling while atomic: fio/2602/0x00000102
[  210.236701][    C0] Modules linked in:
[  210.237566][    C0] CPU: 0 PID: 2602 Comm: fio Tainted: G        W         5.10.0+ #50
[  210.239292][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  210.241233][    C0] Call Trace:
[  210.241946][    C0]  <IRQ>
[  210.242561][    C0]  dump_stack+0x7d/0xa3
[  210.243466][    C0]  __schedule_bug.cold+0xb3/0xc2
[  210.244539][    C0]  __schedule+0x156f/0x20d0
[  210.245518][    C0]  ? io_schedule_timeout+0x140/0x140
[  210.246660][    C0]  schedule+0xd0/0x270
[  210.247541][    C0]  schedule_timeout+0x1fb/0x280
[  210.248586][    C0]  ? usleep_range+0x150/0x150
[  210.249624][    C0]  ? unpoison_range+0x3a/0x60
[  210.250632][    C0]  ? ____kasan_kmalloc.constprop.0+0x82/0xa0
[  210.251949][    C0]  ? unpoison_range+0x3a/0x60
[  210.252958][    C0]  ? __prepare_to_swait+0xa7/0x190
[  210.254067][    C0]  do_wait_for_common+0x2ab/0x370
[  210.255158][    C0]  ? usleep_range+0x150/0x150
[  210.256192][    C0]  ? bit_wait_io_timeout+0x160/0x160
[  210.257358][    C0]  ? blk_update_request+0x757/0x1150
[  210.258582][    C0]  ? _raw_spin_lock_irq+0x82/0xd0
[  210.259674][    C0]  ? _raw_read_unlock_irqrestore+0x30/0x30
[  210.260917][    C0]  wait_for_completion+0x4c/0x90
[  210.261971][    C0]  crypt_convert+0x19a6/0x4c00
[  210.263033][    C0]  ? _raw_spin_lock_irqsave+0x87/0xe0
[  210.264193][    C0]  ? kasan_set_track+0x1c/0x30
[  210.265191][    C0]  ? crypt_iv_tcw_ctr+0x4a0/0x4a0
[  210.266283][    C0]  ? kmem_cache_free+0x104/0x470
[  210.267363][    C0]  ? crypt_endio+0x91/0x180
[  210.268327][    C0]  kcryptd_crypt_read_convert+0x30e/0x420
[  210.269565][    C0]  blk_update_request+0x757/0x1150
[  210.270563][    C0]  blk_mq_end_request+0x4b/0x480
[  210.271680][    C0]  blk_done_softirq+0x21d/0x340
[  210.272775][    C0]  ? _raw_spin_lock+0x81/0xd0
[  210.273847][    C0]  ? blk_mq_stop_hw_queue+0x30/0x30
[  210.275031][    C0]  ? _raw_read_lock_irq+0x40/0x40
[  210.276182][    C0]  __do_softirq+0x190/0x611
[  210.277203][    C0]  ? handle_edge_irq+0x221/0xb60
[  210.278340][    C0]  asm_call_irq_on_stack+0x12/0x20
[  210.279514][    C0]  </IRQ>
[  210.280164][    C0]  do_softirq_own_stack+0x37/0x40
[  210.281281][    C0]  irq_exit_rcu+0x110/0x1b0
[  210.282286][    C0]  common_interrupt+0x74/0x120
[  210.283376][    C0]  asm_common_interrupt+0x1e/0x40
[  210.284496][    C0] RIP: 0010:_aesni_enc1+0x65/0xb0

Fix this by making the crypt_convert function reentrant from the point of
a single bio and making dm-crypt defer further bio processing to a
workqueue if the Crypto API backlogs a request in interrupt context.

Fixes: 39d42fa ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request May 20, 2021
[ Upstream commit 5bbf219 ]

An out of bounds write happens when setting the default power state.
KASAN sees this as:

[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
==================================================================
BUG: KASAN: slab-out-of-bounds in
radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon]
Write of size 4 at addr ffff88810178d858 by task systemd-udevd/157

CPU: 0 PID: 157 Comm: systemd-udevd Not tainted 5.12.0-E620 #50
Hardware name: eMachines        eMachines E620  /Nile       , BIOS V1.03 09/30/2008
Call Trace:
 dump_stack+0xa5/0xe6
 print_address_description.constprop.0+0x18/0x239
 kasan_report+0x170/0x1a8
 radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon]
 radeon_atombios_get_power_modes+0x144/0x1888 [radeon]
 radeon_pm_init+0x1019/0x1904 [radeon]
 rs690_init+0x76e/0x84a [radeon]
 radeon_device_init+0x1c1a/0x21e5 [radeon]
 radeon_driver_load_kms+0xf5/0x30b [radeon]
 drm_dev_register+0x255/0x4a0 [drm]
 radeon_pci_probe+0x246/0x2f6 [radeon]
 pci_device_probe+0x1aa/0x294
 really_probe+0x30e/0x850
 driver_probe_device+0xe6/0x135
 device_driver_attach+0xc1/0xf8
 __driver_attach+0x13f/0x146
 bus_for_each_dev+0xfa/0x146
 bus_add_driver+0x2b3/0x447
 driver_register+0x242/0x2c1
 do_one_initcall+0x149/0x2fd
 do_init_module+0x1ae/0x573
 load_module+0x4dee/0x5cca
 __do_sys_finit_module+0xf1/0x140
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Without KASAN, this will manifest later when the kernel attempts to
allocate memory that was stomped, since it collides with the inline slab
freelist pointer:

invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 PID: 781 Comm: openrc-run.sh Tainted: G        W 5.10.12-gentoo-E620 #2
Hardware name: eMachines        eMachines E620  /Nile , BIOS V1.03       09/30/2008
RIP: 0010:kfree+0x115/0x230
Code: 89 c5 e8 75 ea ff ff 48 8b 00 0f ba e0 09 72 63 e8 1f f4 ff ff 41 89 c4 48 8b 45 00 0f ba e0 10 72 0a 48 8b 45 08 a8 01 75 02 <0f> 0b 44 89 e1 48 c7 c2 00 f0 ff ff be 06 00 00 00 48 d3 e2 48 c7
RSP: 0018:ffffb42f40267e10 EFLAGS: 00010246
RAX: ffffd61280ee8d88 RBX: 0000000000000004 RCX: 000000008010000d
RDX: 4000000000000000 RSI: ffffffffba1360b0 RDI: ffffd61280ee8d80
RBP: ffffd61280ee8d80 R08: ffffffffb91bebdf R09: 0000000000000000
R10: ffff8fe2c1047ac8 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000100
FS:  00007fe80eff6b68(0000) GS:ffff8fe339c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe80eec7bc0 CR3: 0000000038012000 CR4: 00000000000006f0
Call Trace:
 __free_fdtable+0x16/0x1f
 put_files_struct+0x81/0x9b
 do_exit+0x433/0x94d
 do_group_exit+0xa6/0xa6
 __x64_sys_exit_group+0xf/0xf
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe80ef64bea
Code: Unable to access opcode bytes at RIP 0x7fe80ef64bc0.
RSP: 002b:00007ffdb1c47528 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe80ef64bea
RDX: 00007fe80ef64f60 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 00007fe80ee2c620 R11: 0000000000000246 R12: 00007fe80eff41e0
R13: 00000000ffffffff R14: 0000000000000024 R15: 00007fe80edf9cd0
Modules linked in: radeon(+) ath5k(+) snd_hda_codec_realtek ...

Use a valid power_state index when initializing the "flags", "misc",
and "misc2" fields.
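A common way such an out-of-bounds write happens, assumed here for illustration only, is reusing a loop counter after the loop has finished, when it equals the number of entries. The hypothetical sketch below (`struct model_power_state`, `model_parse_states()` and `MODEL_MAX_STATES` are illustrative, not radeon code) writes through a saved, validated default index instead.

```c
#include <stddef.h>

#define MODEL_MAX_STATES 4   /* hypothetical table size */

struct model_power_state {
	unsigned int flags;
	unsigned int misc;
	unsigned int misc2;
};

/* Parse n states, then clear the default state's fields.
 * Returns the default index, or -1 on invalid input. */
static int model_parse_states(struct model_power_state *states, size_t n,
			      size_t default_idx)
{
	size_t state_index;

	if (n > MODEL_MAX_STATES || default_idx >= n)
		return -1;
	for (state_index = 0; state_index < n; state_index++)
		states[state_index].flags = 0x1u;   /* placeholder for real parsing */

	/* Here state_index == n, one past the last valid entry, so writing
	 * states[state_index] would overflow the table. Use the saved,
	 * validated default index instead. */
	states[default_idx].flags &= ~0x1u;
	states[default_idx].misc = 0;
	states[default_idx].misc2 = 0;
	return (int)default_idx;
}
```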

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211537
Reported-by: Erhard F. <erhard_f@mailbox.org>
Fixes: a48b9b4 ("drm/radeon/kms/pm: add asic specific callbacks for getting power state (v2)")
Fixes: 79daedc ("drm/radeon/kms: minor pm cleanups")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request May 22, 2021
[ Upstream commit 5bbf219 ]

An out of bounds write happens when setting the default power state.
KASAN sees this as:

[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
==================================================================
BUG: KASAN: slab-out-of-bounds in
radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon]
Write of size 4 at addr ffff88810178d858 by task systemd-udevd/157

CPU: 0 PID: 157 Comm: systemd-udevd Not tainted 5.12.0-E620 #50
Hardware name: eMachines        eMachines E620  /Nile       , BIOS V1.03 09/30/2008
Call Trace:
 dump_stack+0xa5/0xe6
 print_address_description.constprop.0+0x18/0x239
 kasan_report+0x170/0x1a8
 radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon]
 radeon_atombios_get_power_modes+0x144/0x1888 [radeon]
 radeon_pm_init+0x1019/0x1904 [radeon]
 rs690_init+0x76e/0x84a [radeon]
 radeon_device_init+0x1c1a/0x21e5 [radeon]
 radeon_driver_load_kms+0xf5/0x30b [radeon]
 drm_dev_register+0x255/0x4a0 [drm]
 radeon_pci_probe+0x246/0x2f6 [radeon]
 pci_device_probe+0x1aa/0x294
 really_probe+0x30e/0x850
 driver_probe_device+0xe6/0x135
 device_driver_attach+0xc1/0xf8
 __driver_attach+0x13f/0x146
 bus_for_each_dev+0xfa/0x146
 bus_add_driver+0x2b3/0x447
 driver_register+0x242/0x2c1
 do_one_initcall+0x149/0x2fd
 do_init_module+0x1ae/0x573
 load_module+0x4dee/0x5cca
 __do_sys_finit_module+0xf1/0x140
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Without KASAN, this will manifest later when the kernel attempts to
allocate memory that was stomped, since it collides with the inline slab
freelist pointer:

invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 PID: 781 Comm: openrc-run.sh Tainted: G        W 5.10.12-gentoo-E620 #2
Hardware name: eMachines        eMachines E620  /Nile , BIOS V1.03       09/30/2008
RIP: 0010:kfree+0x115/0x230
Code: 89 c5 e8 75 ea ff ff 48 8b 00 0f ba e0 09 72 63 e8 1f f4 ff ff 41 89 c4 48 8b 45 00 0f ba e0 10 72 0a 48 8b 45 08 a8 01 75 02 <0f> 0b 44 89 e1 48 c7 c2 00 f0 ff ff be 06 00 00 00 48 d3 e2 48 c7
RSP: 0018:ffffb42f40267e10 EFLAGS: 00010246
RAX: ffffd61280ee8d88 RBX: 0000000000000004 RCX: 000000008010000d
RDX: 4000000000000000 RSI: ffffffffba1360b0 RDI: ffffd61280ee8d80
RBP: ffffd61280ee8d80 R08: ffffffffb91bebdf R09: 0000000000000000
R10: ffff8fe2c1047ac8 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000100
FS:  00007fe80eff6b68(0000) GS:ffff8fe339c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe80eec7bc0 CR3: 0000000038012000 CR4: 00000000000006f0
Call Trace:
 __free_fdtable+0x16/0x1f
 put_files_struct+0x81/0x9b
 do_exit+0x433/0x94d
 do_group_exit+0xa6/0xa6
 __x64_sys_exit_group+0xf/0xf
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe80ef64bea
Code: Unable to access opcode bytes at RIP 0x7fe80ef64bc0.
RSP: 002b:00007ffdb1c47528 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe80ef64bea
RDX: 00007fe80ef64f60 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 00007fe80ee2c620 R11: 0000000000000246 R12: 00007fe80eff41e0
R13: 00000000ffffffff R14: 0000000000000024 R15: 00007fe80edf9cd0
Modules linked in: radeon(+) ath5k(+) snd_hda_codec_realtek ...

Use a valid power_state index when initializing the "flags", "misc"
and "misc2" fields.
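The fix is about indexing: every write into the power_state array must use an index below the number of entries that were actually allocated, or it lands in adjacent heap memory (the slab freelist pointer mentioned above). Below is a minimal sketch of that guard in plain C. The `struct pstate_sketch` type and the `init_state_fields()` helper are hypothetical and are not the radeon driver's code; only the field names come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-state record; only the field names come from the text. */
struct pstate_sketch {
	unsigned int flags;
	unsigned int misc;
	unsigned int misc2;
};

/*
 * Initialize the "flags", "misc" and "misc2" fields of entry 'idx', but only
 * if 'idx' is one of the 'num_states' entries that were actually allocated.
 * Returns 0 on success, or -1 (writing nothing) for an out-of-range index,
 * which would otherwise overwrite whatever follows the array on the heap.
 */
static int init_state_fields(struct pstate_sketch *states, size_t num_states,
			     size_t idx)
{
	assert(states != NULL);

	if (idx >= num_states)
		return -1;

	states[idx].flags = 0;
	states[idx].misc = 0;
	states[idx].misc2 = 0;
	return 0;
}
```

Calling `init_state_fields(s, 2, 2)` on a two-entry array returns -1 and leaves memory untouched, instead of silently writing one element past the end.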

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211537
Reported-by: Erhard F. <erhard_f@mailbox.org>
Fixes: a48b9b4 ("drm/radeon/kms/pm: add asic specific callbacks for getting power state (v2)")
Fixes: 79daedc ("drm/radeon/kms: minor pm cleanups")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request May 22, 2021
[ Upstream commit d5027ca ]

Ritesh reported a bug [1] against UML, noting that it crashed on
startup. The backtrace shows the following (heavily redacted):

(gdb) bt
...
 #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
 #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
 #28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72
...
 #40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359
...
 #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486
 #45 0x00007f8990968b85 in __getgrnam_r [...]
 #46 0x00007f89909d6b77 in grantpt [...]
 #47 0x00007f8990a9394e in __GI_openpty [...]
 #48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407
 #49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598
 #50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45
 #51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334
 #52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144

indicating that the UML function openpty_cb() calls openpty(),
which internally calls __getgrnam_r(), which causes the nsswitch
machinery to get started.

This loads, through lots of indirection that I snipped, the
libcom_err.so.2 library, which (in an unknown function, "??")
calls sem_init().

Now, of course it wants to get libpthread's sem_init(), since
it's linked against libpthread. However, the dynamic linker
looks up that symbol against the binary first, and gets the
kernel's sem_init().

Hajime Tazaki noted that "objcopy -L" can localize a symbol,
so the dynamic linker wouldn't do the lookup this way. I tried,
but for some reason that didn't seem to work.

Doing the same thing in the linker script instead does seem to
work, though I cannot entirely explain - it *also* works if I
just add "VERSION { { global: *; }; }" instead, indicating that
something else is happening that I don't really understand. It
may be that explicitly doing that marks them with some kind of
empty version, and that's different from the default.

Explicitly marking them with a version breaks kallsyms, so that
doesn't seem to be possible.

Marking all the symbols as local seems correct, and does seem
to address the issue, so do that. Also do it for static linking,
since nsswitch libraries could still be loaded there.

[1] https://bugs.debian.org/983379

Reported-by: Ritesh Raj Sarraf <rrs@debian.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Tested-By: Ritesh Raj Sarraf <rrs@debian.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request May 24, 2021
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Apr 17, 2022
commit 6c8e2a2 upstream.

Problem:
=======

Userspace might read the zero-page instead of actual data from a direct IO
read on a block device if madvise(MADV_FREE) was called on the buffers
earlier (this is discussed below), due to a race between page reclaim on
MADV_FREE and blkdev direct IO read.

- Race condition:
  ==============

During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks
whether the page is dirty: if it is not, its rmap PTE(s) are discarded (vs.
mapped back if the page is dirty).

However, after try_to_unmap_one() returns to shrink_page_list(), it might
keep the page _anyway_ if page_ref_freeze() fails (it expects exactly
_one_ page reference, from the isolation for page reclaim).

Well, blkdev_direct_IO() gets references for all pages, and on READ
operations it only sets them dirty _later_.

So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct
IO read from block devices, and page reclaim happens during
__blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages()
returns, but BEFORE the pages are set dirty, the race is hit.

The direct IO read eventually completes.  Now, when userspace reads the
buffers, the PTE is no longer there and the page fault handler
do_anonymous_page() services that with the zero-page, NOT the data!

A synthetic reproducer is provided.

- Page faults:
  ===========

If page reclaim happens BEFORE bio_iov_iter_get_pages(), the issue doesn't
happen, because that faults in all pages as writeable, so
do_anonymous_page() sets up a new page/rmap/PTE, and that is used by
direct IO.  The userspace reads don't fault as the PTE is there (thus the
zero-page is not used/set up).

But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE
is no longer there; the subsequent page faults can't help:

The data read from the block device probably won't generate faults due to
DMA (no MMU), but even if it didn't use DMA, it would happen on
different virtual addresses (not user-mapped addresses), because `struct
bio_vec` stores `struct page` to figure addresses out (which are different
from user-mapped addresses) for the read.

Thus userspace reads (to user-mapped addresses) still fault, then
do_anonymous_page() gets another `struct page` that maps to
other memory than the `struct page` used by `struct bio_vec` for the read.
(The original `struct page` is not available, since it wasn't freed, as
page_ref_freeze() failed due to more page refs.  And even if it were
available, its data cannot be trusted anymore.)
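The ordering that separates the harmless case from the buggy one can be replayed in a small, single-threaded toy model. Everything here (`struct toy_page`, `reclaim_pre_fix()`, `simulate_read()`) is hypothetical: it tracks only PTE presence, the dirty flag, the refcount and the page-frame contents, and mimics the pre-fix reclaim logic described above rather than reproducing the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of one MADV_FREE'd anonymous page during a direct IO read.
 * Not kernel code: it tracks only the facts the race depends on.
 */
struct toy_page {
	bool pte_present; /* the user PTE still maps this page frame */
	bool dirty;       /* false right after MADV_FREE */
	int refcount;     /* mapping ref + isolation ref + GUP pins */
	int data;         /* contents of the page frame */
};

/* Pre-fix reclaim: a clean lazyfree page always loses its PTE. */
static void reclaim_pre_fix(struct toy_page *p)
{
	p->refcount++;              /* isolation for reclaim */
	if (!p->dirty) {
		p->pte_present = false; /* try_to_unmap_one() discards the PTE */
		p->refcount--;          /* ...and drops the mapping reference */
	}
	if (p->refcount == 1)
		p->refcount = 0;        /* page_ref_freeze() succeeds: page freed */
	else
		p->refcount--;          /* freeze fails: page is kept */
}

/* Returns the value userspace reads back after the direct IO read. */
static int simulate_read(bool reclaim_after_pin)
{
	struct toy_page p = { .pte_present = true, .dirty = false,
			      .refcount = 1, .data = 1 };

	if (!reclaim_after_pin)
		reclaim_pre_fix(&p);

	/*
	 * bio_iov_iter_get_pages(): a missing PTE is handled by a write
	 * fault that maps a fresh (not lazyfree) page, then the page is pinned.
	 */
	if (!p.pte_present)
		p = (struct toy_page){ .pte_present = true, .dirty = true,
				       .refcount = 1, .data = 0 };
	p.refcount++;

	if (reclaim_after_pin)
		reclaim_pre_fix(&p);

	p.data = 0x79;  /* the device writes into the pinned page frame */
	p.dirty = true; /* pages are only set dirty now */
	p.refcount--;   /* unpin */

	/* A missing PTE makes the user read fault in the zero-page. */
	return p.pte_present ? p.data : 0;
}
```

`simulate_read(false)` (reclaim before the pages are pinned) yields 0x79, while `simulate_read(true)` (reclaim after pinning but before dirtying) yields 0. This matches the "good" and "bad" runs of the reproducer.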

Solution:
========

One solution is to check for the expected page reference count in
try_to_unmap_one().

There should be one reference from the isolation (that is also checked in
shrink_page_list() with page_ref_freeze()) plus one or more references
from page mapping(s) (put in discard: label).  Further references mean
that rmap/PTE cannot be unmapped/nuked.

(Note: there might be more than one reference from mapping due to
fork()/clone() without CLONE_VM, which use the same `struct page` for
references, until the copy-on-write page gets copied.)

So, additional page references (e.g., from direct IO read) now prevent the
rmap/PTE from being unmapped/dropped, similarly to how the page is not freed
per shrink_page_list()/page_ref_freeze().
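Reduced to plain integers, the new condition reads as follows. This is a sketch, not the kernel code: `lazyfree_may_discard()` is a hypothetical helper, and `ref_count`/`map_count` stand in for the page's reference count and mapcount as read under the PTE lock.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if a lazyfree (MADV_FREE'd) page's PTE may be discarded,
 * false if the PTE must be kept.
 */
static bool lazyfree_may_discard(int ref_count, int map_count, bool dirty)
{
	/* Expected refs: one from isolation plus one per mapping. */
	bool only_expected_refs = (ref_count == 1 + map_count);

	/* Any extra reference (e.g. a GUP pin for direct IO) blocks discard. */
	return only_expected_refs && !dirty;
}
```

For example, `lazyfree_may_discard(3, 1, false)` (isolation, one mapping, plus one pin from a direct IO read) returns false, whereas `lazyfree_may_discard(3, 2, false)` (a page shared with a fork()ed child, no pin) still returns true.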

- Races and Barriers:
  ==================

The new check in try_to_unmap_one() should be safe in races with
bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's
done under the PTE lock.

The fast path doesn't take the lock, but it checks if the PTE has changed
and if so, it drops the reference and leaves the page for the slow path
(which does take that lock).

The fast path requires synchronization with a full memory barrier: it writes
the page reference count first and then reads the PTE, while
try_to_unmap() writes the PTE first and then reads the page refcount.

And a second barrier is needed, as the page dirty flag should not be read
before the page reference count (as in __remove_mapping()).  (This can be
a load memory barrier only; no writes are involved.)

Call stack/comments:

- try_to_unmap_one()
  - page_vma_mapped_walk()
    - map_pte()			# see pte_offset_map_lock():
        pte_offset_map()
        spin_lock()

  - ptep_get_and_clear()	# write PTE
  - smp_mb()			# (new barrier) GUP fast path
  - page_ref_count()		# (new check) read refcount

  - page_vma_mapped_walk_done()	# see pte_unmap_unlock():
      pte_unmap()
      spin_unlock()

- bio_iov_iter_get_pages()
  - __bio_iov_iter_get_pages()
    - iov_iter_get_pages()
      - get_user_pages_fast()
        - internal_get_user_pages_fast()

          # fast path
          - lockless_pages_from_mm()
            - gup_{pgd,p4d,pud,pmd,pte}_range()
                ptep = pte_offset_map()		# not _lock()
                pte = ptep_get_lockless(ptep)

                page = pte_page(pte)
                try_grab_compound_head(page)	# inc refcount
                                            	# (RMW/barrier
                                             	#  on success)

                if (pte_val(pte) != pte_val(*ptep)) # read PTE
                        put_compound_head(page) # dec refcount
                        			# go slow path

          # slow path
          - __gup_longterm_unlocked()
            - get_user_pages_unlocked()
              - __get_user_pages_locked()
                - __get_user_pages()
                  - follow_{page,p4d,pud,pmd}_mask()
                    - follow_page_pte()
                        ptep = pte_offset_map_lock()
                        pte = *ptep
                        page = vm_normal_page(pte)
                        try_grab_page(page)	# inc refcount
                        pte_unmap_unlock()
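The pairing above is the classic store-buffering pattern: each side writes one location, issues a full barrier, then reads the other location, so at least one side is guaranteed to observe the other's write. Below is a minimal sketch with C11 atomics, assuming `atomic_thread_fence(memory_order_seq_cst)` as a stand-in for smp_mb() and for the fully ordered refcount RMW. The `struct toy_shared` model and both helpers are hypothetical, and the second (read) barrier ordering refcount before the dirty flag is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy shared state: a PTE word (0 = cleared) and the page refcount. */
struct toy_shared {
	atomic_int pte;
	atomic_int refcount;
};

/*
 * GUP fast-path side: increment the refcount first, full barrier, then
 * re-read the PTE. If it changed, drop the reference and report failure
 * so the caller can take the slow path.
 */
static bool gup_fast_try(struct toy_shared *s, int seen_pte)
{
	atomic_fetch_add_explicit(&s->refcount, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&s->pte, memory_order_relaxed) != seen_pte) {
		atomic_fetch_sub_explicit(&s->refcount, 1, memory_order_relaxed);
		return false;
	}
	return true;
}

/*
 * Reclaim side: clear the PTE first, full barrier (the new smp_mb()), then
 * read the refcount. Extra references mean the PTE is put back and the
 * page must not be discarded.
 */
static bool reclaim_try(struct toy_shared *s, int expected_refs)
{
	int old_pte = atomic_exchange_explicit(&s->pte, 0, memory_order_relaxed);

	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&s->refcount, memory_order_relaxed) !=
	    expected_refs) {
		atomic_store_explicit(&s->pte, old_pte, memory_order_relaxed);
		return false;
	}
	return true;
}
```

Run sequentially, either order leaves exactly one side succeeding: if the pin comes first, reclaim sees the extra reference and restores the PTE; if reclaim comes first, the fast path sees the cleared PTE and drops its reference. The fences only matter when both sides run concurrently, which sequential calls cannot demonstrate.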

- Huge Pages:
  ==========

Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE
(aka lazyfree) pages are PageAnon() && !PageSwapBacked()
(madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn())
thus should reach shrink_page_list() -> split_huge_page_to_list() before
try_to_unmap[_one](), so it deals with normal pages only.

(And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens,
which should not happen, or be rare, the page refcount should be greater than
mapcount: the head page is referenced by tail pages.  That also prevents
checking the head `page` and then incorrectly calling page_remove_rmap(subpage)
for a tail page that isn't even in the shrink_page_list()'s page_list (an
effect of split huge pmd/pmvw), as might happen today in this unlikely
scenario.)

MADV_FREE'd buffers:
===================

So, back to the "if MADV_FREE pages are used as buffers" note.  The case
is arguable, and subject to multiple interpretations.

The madvise(2) manual page on the MADV_FREE advice value says:

1) 'After a successful MADV_FREE ... data will be lost when
   the kernel frees the pages.'
2) 'the free operation will be canceled if the caller writes
   into the page' / 'subsequent writes ... will succeed and
   then [the] kernel cannot free those dirtied pages'
3) 'If there is no subsequent write, the kernel can free the
   pages at any time.'

Thoughts, questions, considerations... respectively:

1) Since the kernel didn't actually free the page (page_ref_freeze()
   failed), should the data not have been lost? (on userspace read.)
2) Should writes performed by the direct IO read be able to cancel
   the free operation?
   - Should the direct IO read be considered as 'the caller' too,
     as it's been requested by 'the caller'?
   - Should the bio technique to dirty pages on return to userspace
     (bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
     be considered in another/special way here?
3) Should an upcoming write from a previously requested direct IO
   read be considered as a subsequent write, so the kernel should
   not free the pages? (as it's known at the time of page reclaim.)

And lastly:

Technically, the last point would seem a reasonable consideration and
balance, as the madvise(2) manual page apparently (and fairly) seems to
assume that 'writes' are memory accesses from the userspace process (not
explicitly considering writes from the kernel or its corner cases; again,
fairly), plus the kernel fix implementation for the corner case of the
largely 'non-atomic write' encompassed by a direct IO read operation is
relatively simple; and it helps.

Reproducer:
==========

@ test.c (simplified, but works)

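	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/mman.h>

	/* DEV, BUF_SIZE and PAGE_SIZE are passed with -D (see @ shell). */
	int main(void) {
		int fd, i;
		char *buf;

		fd = open(DEV, O_RDONLY | O_DIRECT);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		/* mmap() returns page-aligned memory, as O_DIRECT needs. */
		buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (buf == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
			buf[i] = 1; // init to non-zero

		madvise(buf, BUF_SIZE, MADV_FREE);

		if (read(fd, buf, BUF_SIZE) < 0) {
			perror("read");
			return 1;
		}

		for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
			printf("%p: 0x%x\n", (void *)&buf[i], buf[i]);

		return 0;
	}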

@ block/fops.c (formerly fs/block_dev.c)

	+#include <linux/swap.h>
	...
	... __blkdev_direct_IO[_simple](...)
	{
	...
	+	if (!strcmp(current->comm, "good"))
	+		shrink_all_memory(ULONG_MAX);
	+
         	ret = bio_iov_iter_get_pages(...);
	+
	+	if (!strcmp(current->comm, "bad"))
	+		shrink_all_memory(ULONG_MAX);
	...
	}

@ shell

        # NUM_PAGES=4
        # PAGE_SIZE=$(getconf PAGE_SIZE)

        # yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES}
        # DEV=$(losetup -f --show test.img)

        # gcc -DDEV=\"$DEV\" \
              -DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \
              -DPAGE_SIZE=${PAGE_SIZE} \
               test.c -o test

        # od -tx1 $DEV
        0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a
        *
        0040000

        # mv test good
        # ./good
        0x7f7c10418000: 0x79
        0x7f7c10419000: 0x79
        0x7f7c1041a000: 0x79
        0x7f7c1041b000: 0x79

        # mv good bad
        # ./bad
        0x7fa1b8050000: 0x0
        0x7fa1b8051000: 0x0
        0x7fa1b8052000: 0x0
        0x7fa1b8053000: 0x0

Note: the issue is consistent on v5.17-rc3, but it's intermittent with the
support of MADV_FREE on v4.5 (60%-70% error; needs swap).  [On v4.5, place
the shrink_all_memory() calls around do_direct_IO() in
do_blockdev_direct_IO() @ fs/direct-io.c.]

- v5.17-rc3:

        # for i in {1..1000}; do ./good; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x79

        # mv good bad
        # for i in {1..1000}; do ./bad; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x0

        # free | grep Swap
        Swap:             0           0           0

- v4.5:

        # for i in {1..1000}; do ./good; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x79

        # mv good bad
        # for i in {1..1000}; do ./bad; done \
            | cut -d: -f2 | sort | uniq -c
           2702  0x0
           1298  0x79

        # swapoff -av
        swapoff /swap

        # for i in {1..1000}; do ./bad; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x79

Ceph/TCMalloc:
=============

For documentation purposes, the use case driving the analysis/fix is Ceph
on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to
release unused memory to the system from the mmap'ed page heap (which might
be committed back/used again; it's not munmap'ed):

- PageHeap::DecommitSpan() -> TCMalloc_SystemRelease() -> madvise()
- PageHeap::CommitSpan() -> TCMalloc_SystemCommit() -> do nothing.

Note: TCMalloc switched back to MADV_DONTNEED a few commits after the
release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue
just 'disappeared' on Ceph on later Ubuntu releases but is still present
in the kernel, and can be hit by other use cases.
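The decommit/commit flow described above can be sketched with plain madvise(2). The helpers `toy_decommit()` and `toy_commit()` are hypothetical and are not TCMalloc's code. The `use_free` switch only mirrors the choice between MADV_FREE and the MADV_DONTNEED behavior that TCMalloc went back to. The `EINVAL` fallback assumes a kernel older than v4.5, which does not support MADV_FREE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hand a page-aligned range back to the kernel but keep the mapping.
 * Returns 0 on success, -1 on error (errno set by madvise()).
 */
static int toy_decommit(void *addr, size_t len, int use_free)
{
#ifdef MADV_FREE
	if (use_free) {
		if (madvise(addr, len, MADV_FREE) == 0)
			return 0;
		if (errno != EINVAL)
			return -1;
		/* Kernel without MADV_FREE support: fall back below. */
	}
#else
	(void)use_free;
#endif
	return madvise(addr, len, MADV_DONTNEED);
}

/* The mapping was never removed, so committing needs no system call. */
static void toy_commit(void *addr, size_t len)
{
	(void)addr;
	(void)len;
}
```

With MADV_DONTNEED, a private anonymous range reads back as zeros immediately. With MADV_FREE, the old contents may or may not still be present until the kernel actually reclaims the pages, and that deferred reclaim is what exposes MADV_FREE'd buffers to the race described above.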

The observed issue seems to be the old Ceph bug #22464 [1], where checksum
mismatches are observed (and instrumentation with buffer dumps shows
zero-pages read from mmap'ed/MADV_FREE'd page ranges).

The issue in Ceph was reasonably deemed a kernel bug (comment #50) and
mostly worked around with a retry mechanism, but other parts of Ceph could
still hit it (rocksdb).  Anyway, it's less likely to be hit again as
TCMalloc switched away from MADV_FREE by default.

(Some kernel versions/reports from the Ceph bug, and relation with
the MADV_FREE introduction/changes; TCMalloc versions not checked.)
- 4.4 good
- 4.5 (madv_free: introduction)
- 4.9 bad
- 4.10 good? maybe a swapless system
- 4.12 (madv_free: no longer free instantly on swapless systems)
- 4.13 bad

[1] https://tracker.ceph.com/issues/22464

Thanks:
======

Several people contributed to analysis/discussions/tests/reproducers in
the first stages when drilling down on ceph/tcmalloc/linux kernel:

- Dan Hill
- Dan Streetman
- Dongdong Tao
- Gavin Guo
- Gerald Yang
- Heitor Alves de Siqueira
- Ioanna Alifieraki
- Jay Vosburgh
- Matthew Ruffell
- Ponnuvel Palaniyappan

Reviews, suggestions, corrections, comments:

- Minchan Kim
- Yu Zhao
- Huang, Ying
- John Hubbard
- Christoph Hellwig

[mfo@canonical.com: v4]
  Link: https://lkml.kernel.org/r/20220209202659.183418-1-mfo@canonical.com
Link: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com

Fixes: 802a3a9 ("mm: reclaim MADV_FREE pages")
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Dan Hill <daniel.hill@canonical.com>
Cc: Dan Streetman <dan.streetman@canonical.com>
Cc: Dongdong Tao <dongdong.tao@canonical.com>
Cc: Gavin Guo <gavin.guo@canonical.com>
Cc: Gerald Yang <gerald.yang@canonical.com>
Cc: Heitor Alves de Siqueira <halves@canonical.com>
Cc: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Ponnuvel Palaniyappan <ponnuvel.palaniyappan@canonical.com>
Cc: <stable@vger.kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[mfo: backport: replace folio/test_flag with page/flag equivalents;
 real Fixes: 854e9ed ("mm: support madvise(MADV_FREE)") in v4.]
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this pull request Apr 25, 2022
commit 6c8e2a2 upstream.

Problem:
=======

Userspace might read the zero-page instead of actual data from a direct IO
read on a block device if madvise(MADV_FREE) was called on the buffers
earlier (this is discussed below), due to a race between page reclaim on
MADV_FREE and blkdev direct IO read.

- Race condition:
  ==============

During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks
whether the page is dirty: if it is not, its rmap PTE(s) are discarded (vs.
mapped back if the page is dirty).

However, after try_to_unmap_one() returns to shrink_page_list(), it might
keep the page _anyway_ if page_ref_freeze() fails (it expects exactly
_one_ page reference, from the isolation for page reclaim).

Well, blkdev_direct_IO() gets references for all pages, and on READ
operations it only sets them dirty _later_.

So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct
IO read from block devices, and page reclaim happens during
__blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages()
returns, but BEFORE the pages are set dirty, the race is hit.

The direct IO read eventually completes.  Now, when userspace reads the
buffers, the PTE is no longer there and the page fault handler
do_anonymous_page() services that with the zero-page, NOT the data!

A synthetic reproducer is provided.

- Page faults:
  ===========

If page reclaim happens BEFORE bio_iov_iter_get_pages(), the issue doesn't
happen, because that faults in all pages as writeable, so
do_anonymous_page() sets up a new page/rmap/PTE, and that is used by
direct IO.  The userspace reads don't fault as the PTE is there (thus the
zero-page is not used/set up).

But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE
is no longer there; the subsequent page faults can't help:

The data read from the block device probably won't generate faults due to
DMA (no MMU), but even if it didn't use DMA, it would happen on
different virtual addresses (not user-mapped addresses), because `struct
bio_vec` stores `struct page` to figure addresses out (which are different
from user-mapped addresses) for the read.

Thus userspace reads (to user-mapped addresses) still fault, then
do_anonymous_page() gets another `struct page` that maps to
other memory than the `struct page` used by `struct bio_vec` for the read.
(The original `struct page` is not available, since it wasn't freed, as
page_ref_freeze() failed due to more page refs.  And even if it were
available, its data cannot be trusted anymore.)

Solution:
========

One solution is to check for the expected page reference count in
try_to_unmap_one().

There should be one reference from the isolation (that is also checked in
shrink_page_list() with page_ref_freeze()) plus one or more references
from page mapping(s) (put in discard: label).  Further references mean
that rmap/PTE cannot be unmapped/nuked.

(Note: there might be more than one reference from mapping due to
fork()/clone() without CLONE_VM, which use the same `struct page` for
references, until the copy-on-write page gets copied.)

So, additional page references (e.g., from direct IO read) now prevent the
rmap/PTE from being unmapped/dropped, similarly to how the page is not freed
per shrink_page_list()/page_ref_freeze().

- Races and Barriers:
  ==================

The new check in try_to_unmap_one() should be safe in races with
bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's
done under the PTE lock.

The fast path doesn't take the lock, but it checks if the PTE has changed
and if so, it drops the reference and leaves the page for the slow path
(which does take that lock).

The fast path requires synchronization with a full memory barrier: it writes
the page reference count first and then reads the PTE, while
try_to_unmap() writes the PTE first and then reads the page refcount.

And a second barrier is needed, as the page dirty flag should not be read
before the page reference count (as in __remove_mapping()).  (This can be
a load memory barrier only; no writes are involved.)

Call stack/comments:

- try_to_unmap_one()
  - page_vma_mapped_walk()
    - map_pte()			# see pte_offset_map_lock():
        pte_offset_map()
        spin_lock()

  - ptep_get_and_clear()	# write PTE
  - smp_mb()			# (new barrier) GUP fast path
  - page_ref_count()		# (new check) read refcount

  - page_vma_mapped_walk_done()	# see pte_unmap_unlock():
      pte_unmap()
      spin_unlock()

- bio_iov_iter_get_pages()
  - __bio_iov_iter_get_pages()
    - iov_iter_get_pages()
      - get_user_pages_fast()
        - internal_get_user_pages_fast()

          # fast path
          - lockless_pages_from_mm()
            - gup_{pgd,p4d,pud,pmd,pte}_range()
                ptep = pte_offset_map()		# not _lock()
                pte = ptep_get_lockless(ptep)

                page = pte_page(pte)
                try_grab_compound_head(page)	# inc refcount
                                            	# (RMW/barrier
                                             	#  on success)

                if (pte_val(pte) != pte_val(*ptep)) # read PTE
                        put_compound_head(page) # dec refcount
                        			# go slow path

          # slow path
          - __gup_longterm_unlocked()
            - get_user_pages_unlocked()
              - __get_user_pages_locked()
                - __get_user_pages()
                  - follow_{page,p4d,pud,pmd}_mask()
                    - follow_page_pte()
                        ptep = pte_offset_map_lock()
                        pte = *ptep
                        page = vm_normal_page(pte)
                        try_grab_page(page)	# inc refcount
                        pte_unmap_unlock()

- Huge Pages:
  ==========

Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE
(aka lazyfree) pages are PageAnon() && !PageSwapBacked()
(madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn())
thus should reach shrink_page_list() -> split_huge_page_to_list() before
try_to_unmap[_one](), so it deals with normal pages only.

(And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens,
which should not happen, or be rare, the page refcount should be greater than
mapcount: the head page is referenced by tail pages.  That also prevents
checking the head `page` and then incorrectly calling page_remove_rmap(subpage)
for a tail page that isn't even in the shrink_page_list()'s page_list (an
effect of split huge pmd/pmvw), as might happen today in this unlikely
scenario.)

MADV_FREE'd buffers:
===================

So, back to the "if MADV_FREE pages are used as buffers" note.  The case
is arguable, and subject to multiple interpretations.

The madvise(2) manual page on the MADV_FREE advice value says:

1) 'After a successful MADV_FREE ... data will be lost when
   the kernel frees the pages.'
2) 'the free operation will be canceled if the caller writes
   into the page' / 'subsequent writes ... will succeed and
   then [the] kernel cannot free those dirtied pages'
3) 'If there is no subsequent write, the kernel can free the
   pages at any time.'

Thoughts, questions, considerations... respectively:

1) Since the kernel didn't actually free the page (page_ref_freeze()
   failed), should the data not have been lost? (on userspace read.)
2) Should writes performed by the direct IO read be able to cancel
   the free operation?
   - Should the direct IO read be considered as 'the caller' too,
     as it's been requested by 'the caller'?
   - Should the bio technique to dirty pages on return to userspace
     (bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
     be considered in another/special way here?
3) Should an upcoming write from a previously requested direct IO
   read be considered as a subsequent write, so the kernel should
   not free the pages? (as it's known at the time of page reclaim.)

And lastly:

Technically, the last point would seem a reasonable consideration and
balance, as the madvise(2) manual page apparently (and fairly) seems to
assume that 'writes' are memory accesses from the userspace process (not
explicitly considering writes from the kernel or its corner cases; again,
fairly), plus the kernel fix implementation for the corner case of the
largely 'non-atomic write' encompassed by a direct IO read operation is
relatively simple; and it helps.

Reproducer:
==========

@ test.c (simplified, but works)

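	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/mman.h>

	/* DEV, BUF_SIZE and PAGE_SIZE are passed with -D (see @ shell). */
	int main(void) {
		int fd, i;
		char *buf;

		fd = open(DEV, O_RDONLY | O_DIRECT);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		/* mmap() returns page-aligned memory, as O_DIRECT needs. */
		buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (buf == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
			buf[i] = 1; // init to non-zero

		madvise(buf, BUF_SIZE, MADV_FREE);

		if (read(fd, buf, BUF_SIZE) < 0) {
			perror("read");
			return 1;
		}

		for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
			printf("%p: 0x%x\n", (void *)&buf[i], buf[i]);

		return 0;
	}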

@ block/fops.c (formerly fs/block_dev.c)

	+#include <linux/swap.h>
	...
	... __blkdev_direct_IO[_simple](...)
	{
	...
	+	if (!strcmp(current->comm, "good"))
	+		shrink_all_memory(ULONG_MAX);
	+
         	ret = bio_iov_iter_get_pages(...);
	+
	+	if (!strcmp(current->comm, "bad"))
	+		shrink_all_memory(ULONG_MAX);
	...
	}

@ shell

        # NUM_PAGES=4
        # PAGE_SIZE=$(getconf PAGE_SIZE)

        # yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES}
        # DEV=$(losetup -f --show test.img)

        # gcc -DDEV=\"$DEV\" \
              -DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \
              -DPAGE_SIZE=${PAGE_SIZE} \
               test.c -o test

        # od -tx1 $DEV
        0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a
        *
        0040000

        # mv test good
        # ./good
        0x7f7c10418000: 0x79
        0x7f7c10419000: 0x79
        0x7f7c1041a000: 0x79
        0x7f7c1041b000: 0x79

        # mv good bad
        # ./bad
        0x7fa1b8050000: 0x0
        0x7fa1b8051000: 0x0
        0x7fa1b8052000: 0x0
        0x7fa1b8053000: 0x0

Note: the issue is consistent on v5.17-rc3, but it's intermittent with the
support of MADV_FREE on v4.5 (60%-70% error; needs swap).  [On v4.5, place
the shrink_all_memory() calls around do_direct_IO() in
do_blockdev_direct_IO() @ fs/direct-io.c.]

- v5.17-rc3:

        # for i in {1..1000}; do ./good; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x79

        # mv good bad
        # for i in {1..1000}; do ./bad; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x0

        # free | grep Swap
        Swap:             0           0           0

- v4.5:

        # for i in {1..1000}; do ./good; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x79

        # mv good bad
        # for i in {1..1000}; do ./bad; done \
            | cut -d: -f2 | sort | uniq -c
           2702  0x0
           1298  0x79

        # swapoff -av
        swapoff /swap

        # for i in {1..1000}; do ./bad; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x79
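The 60%-70% figure can be checked against the v4.5 tallies above: each run prints NUM_PAGES=4 pages, so 1000 runs give 4000 samples, and 2702/4000 is about 67.6%. A trivial helper that makes the arithmetic explicit (the function name is hypothetical, for illustration only):

```c
/* Fraction of sampled pages that read back as zero (i.e., lost data). */
static double zero_page_rate(unsigned runs, unsigned pages_per_run,
			     unsigned zero_pages)
{
	return (double)zero_pages / ((double)runs * pages_per_run);
}
```

For the v5.17-rc3 "bad" runs, zero_page_rate(1000, 4, 4000) is 1.0, i.e. every sample was lost.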

Ceph/TCMalloc:
=============

For documentation purposes, the use case driving the analysis/fix is Ceph
on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to
release unused memory to the system from the mmap'ed page heap (it might
be committed back and used again; it's not munmap'ed):

- PageHeap::DecommitSpan() -> TCMalloc_SystemRelease() -> madvise()
- PageHeap::CommitSpan() -> TCMalloc_SystemCommit() -> do nothing
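A minimal sketch of that decommit/commit pattern, assuming a Linux system with MADV_FREE (v4.5+). The sketch_* names are hypothetical stand-ins, not TCMalloc's code. The point is that "commit" is a no-op, so a reused span only reliably holds data written to it after the MADV_FREE:

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for TCMalloc_SystemRelease(): release the span
 * lazily. The mapping stays valid, but the kernel may reclaim the page
 * contents at any time until they are written again. */
static bool sketch_system_release(void *start, size_t length)
{
	return madvise(start, length, MADV_FREE) == 0;
}

/* Hypothetical stand-in for TCMalloc_SystemCommit(): nothing to do,
 * since the span was never munmap'ed. */
static void sketch_system_commit(void *start, size_t length)
{
	(void)start;
	(void)length;
}

/* Use a span, decommit it, commit it again and reuse it.
 * Returns true if the data written after reuse reads back correctly. */
static bool sketch_span_lifecycle(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE) * 4;
	unsigned char *span = mmap(NULL, len, PROT_READ | PROT_WRITE,
				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (span == MAP_FAILED)
		return false;

	memset(span, 1, len);				/* span in use */
	bool ok = sketch_system_release(span, len);	/* DecommitSpan() */
	sketch_system_commit(span, len);		/* CommitSpan() */

	/* Until written, each page may read back as the old 1s or as zeros,
	 * depending on whether reclaim has freed it in the meantime. */
	memset(span, 2, len);
	ok = ok && span[0] == 2 && span[len - 1] == 2;

	munmap(span, len);
	return ok;
}
```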

Note: TCMalloc switched back to MADV_DONTNEED a few commits after the
release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue
just 'disappeared' on Ceph on later Ubuntu releases but is still present
in the kernel, and can be hit by other use cases.
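For contrast, a minimal sketch of the MADV_DONTNEED semantics that gperftools went back to: on a private anonymous mapping, the contents are dropped at madvise() time and later reads deterministically see zero-filled pages, instead of depending on whether reclaim happened to run. The helper name is hypothetical:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a private anonymous page reads back as zero after
 * MADV_DONTNEED, 0 if it does not, and -1 on error. */
static int dontneed_zero_fills(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;

	memset(page, 1, len);
	if (madvise(page, len, MADV_DONTNEED) != 0) {
		munmap(page, len);
		return -1;
	}

	/* The old contents are gone; the next access faults in a zero page. */
	int zeroed = page[0] == 0 && page[len - 1] == 0;
	munmap(page, len);
	return zeroed;
}
```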

The observed issue seems to be the old Ceph bug #22464 [1], where checksum
mismatches are observed (and instrumentation with buffer dumps shows
zero-pages read from mmap'ed/MADV_FREE'd page ranges).

The issue in Ceph was reasonably deemed a kernel bug (comment #50) and
mostly worked around with a retry mechanism, but other parts of Ceph could
still hit it (rocksdb).  Anyway, it's less likely to be hit again, as
TCMalloc switched away from MADV_FREE by default.

(Some kernel versions/reports from the Ceph bug, and relation with
the MADV_FREE introduction/changes; TCMalloc versions not checked.)
- 4.4 good
- 4.5 (madv_free: introduction)
- 4.9 bad
- 4.10 good? maybe a swapless system
- 4.12 (madv_free: no longer free instantly on swapless systems)
- 4.13 bad

[1] https://tracker.ceph.com/issues/22464

Thanks:
======

Several people contributed to analysis/discussions/tests/reproducers in
the first stages when drilling down on ceph/tcmalloc/linux kernel:

- Dan Hill
- Dan Streetman
- Dongdong Tao
- Gavin Guo
- Gerald Yang
- Heitor Alves de Siqueira
- Ioanna Alifieraki
- Jay Vosburgh
- Matthew Ruffell
- Ponnuvel Palaniyappan

Reviews, suggestions, corrections, comments:

- Minchan Kim
- Yu Zhao
- Huang, Ying
- John Hubbard
- Christoph Hellwig

[mfo@canonical.com: v4]
  Link: https://lkml.kernel.org/r/20220209202659.183418-1-mfo@canonical.com
Link: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com

Fixes: 802a3a9 ("mm: reclaim MADV_FREE pages")
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Dan Hill <daniel.hill@canonical.com>
Cc: Dan Streetman <dan.streetman@canonical.com>
Cc: Dongdong Tao <dongdong.tao@canonical.com>
Cc: Gavin Guo <gavin.guo@canonical.com>
Cc: Gerald Yang <gerald.yang@canonical.com>
Cc: Heitor Alves de Siqueira <halves@canonical.com>
Cc: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Ponnuvel Palaniyappan <ponnuvel.palaniyappan@canonical.com>
Cc: <stable@vger.kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[mfo: backport: replace folio/test_flag with page/flag equivalents;
 real Fixes: 854e9ed ("mm: support madvise(MADV_FREE)") in v4.]
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper pushed a commit to codekipper/linux-sunxi that referenced this pull request Mar 7, 2023
…not present

DG1/DG2 and MTL+ have added a new display-present HW flag. Check this
flag and, if it is cleared, disable the driver's display functionality.
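A minimal sketch of the check, following the v2 notes of this commit (zero only pipe_mask; check only on HAS_DISPLAY() platforms). The register bit, struct and function names are hypothetical placeholders, since the commit message doesn't name them; the caller is assumed to run this only on platforms that have the flag (DG1/DG2, MTL+), before the fused-off checks:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real register and bit are not named here. */
#define SKETCH_DISPLAY_PRESENT (1u << 0)

/* Hypothetical subset of the runtime device info. */
struct sketch_runtime_info {
	uint8_t pipe_mask;	/* bitmask of usable display pipes */
};

/*
 * If the HW flag says no display is present, zero only pipe_mask; other
 * display fields are assumed to be cleared later based on it.
 * Returns true if display functionality was disabled.
 */
static bool sketch_check_display_present(struct sketch_runtime_info *runtime,
					 bool has_display, uint32_t hw_flags)
{
	if (!has_display)	/* only relevant when HAS_DISPLAY() */
		return false;
	if (hw_flags & SKETCH_DISPLAY_PRESENT)
		return false;	/* display present: nothing to do */

	/* the driver would log "Display not present" here */
	runtime->pipe_mask = 0;
	return true;
}
```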

So far, the missing check resulted in running the display initialization
sequence and hitting the WARNs below, due to the display register accesses
timing out:

[    3.902843] ------------[ cut here ]------------
[    3.902848] i915 0000:03:00.0: drm_WARN_ON(intel_de_wait_for_set(dev_priv, ((const i915_reg_t){ .reg = (0x42000) }), (1 << (27 - (pg))), 1))
[    3.902879] WARNING: CPU: 6 PID: 462 at drivers/gpu/drm/i915/display/intel_display_power_well.c:326 gen9_wait_for_power_well_fuses+0x71/0x80 [i915]
[    3.903009] Modules linked in: hid_sensor_hub intel_ishtp_hid i915(+) rtsx_pci_sdmmc drm_buddy mmc_core drm_display_helper crct10dif_pclmul nvme cec crc32_pclmul intel_ish_ipc crc32c_intel ucsi_acpi hid_multitouch nvme_core ghash_clmulni_intel typec_ucsi rtsx_pci ttm sha512_ssse3 serio_raw intel_ishtp typec video i2c_hid_acpi i2c_hid wmi pinctrl_tigerlake ip6_tables ip_tables x_tables fuse
[    3.903021] CPU: 6 PID: 462 Comm: systemd-udevd Tainted: G     U             6.2.0-rc6+ #50
[    3.903023] Hardware name: LENOVO 82VB/LNVNB161216, BIOS KMCN09WW 04/26/2022
[    3.903023] RIP: 0010:gen9_wait_for_power_well_fuses+0x71/0x80 [i915]
[    3.903105] Code: 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 98 bb 0d e9 48 c7 c1 00 65 a1 c0 48 89 da 48 c7 c7 4b c5 a3 c0 48 89 c6 e8 e3 df 53 e9 <0f> 0b 5b c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
[    3.903106] RSP: 0018:ffffa7cec0b07a98 EFLAGS: 00010292
[    3.903107] RAX: 0000000000000080 RBX: ffff9a05430eaaa0 RCX: 0000000000000000
[    3.903108] RDX: 0000000000000001 RSI: ffffffffaa7ab69e RDI: 00000000ffffffff
[    3.903108] RBP: ffff9a0552ba2020 R08: ffffffffab062ce0 R09: 00000000abd3ffc2
[    3.903109] R10: ffffffffffffffff R11: 0000000000000081 R12: 0000000000000000
[    3.903109] R13: ffff9a05532a9cb0 R14: ffffffffc09e1670 R15: ffff9a0543132000
[    3.903110] FS:  00007f24d0fe5b40(0000) GS:ffff9a0ccf780000(0000) knlGS:0000000000000000
[    3.903110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.903111] CR2: 00005643d7a31a28 CR3: 0000000111614002 CR4: 0000000000770ee0
[    3.903112] PKRU: 55555554
[    3.903112] Call Trace:
[    3.903113]  <TASK>
[    3.903114]  hsw_power_well_enable+0x12f/0x1a0 [i915]
[    3.903191]  intel_power_well_enable+0x21/0x70 [i915]
[    3.903265]  icl_display_core_init+0x92/0x6a0 [i915]
[    3.903346]  intel_power_domains_init_hw+0x1da/0x5b0 [i915]
[    3.903422]  intel_modeset_init_noirq+0x60/0x250 [i915]
[    3.903497]  i915_driver_probe+0x562/0xe10 [i915]
[    3.903557]  ? i915_pci_probe+0x87/0x180 [i915]
[    3.903617]  local_pci_probe+0x3e/0x80
[    3.903621]  pci_device_probe+0xb3/0x210
[    3.903622]  really_probe+0xdb/0x380
[    3.903624]  ? pm_runtime_barrier+0x50/0x90
[    3.903626]  __driver_probe_device+0x78/0x170
[    3.903627]  driver_probe_device+0x1f/0x90
[    3.903628]  __driver_attach+0xce/0x1c0
[    3.903629]  ? __pfx___driver_attach+0x10/0x10
[    3.903630]  bus_for_each_dev+0x5f/0x90
[    3.903631]  bus_add_driver+0x1ae/0x200
[    3.903632]  driver_register+0x89/0xe0
[    3.903634]  i915_init+0x1f/0x7f [i915]
[    3.903695]  ? __pfx_init_module+0x10/0x10 [i915]
[    3.903751]  do_one_initcall+0x43/0x220
[    3.903753]  ? kmalloc_trace+0x26/0x90
[    3.903756]  do_init_module+0x4a/0x200
[    3.903758]  __do_sys_init_module+0x157/0x180
[    3.903760]  do_syscall_64+0x58/0xc0
[    3.903762]  ? do_syscall_64+0x67/0xc0
[    3.903762]  ? exc_page_fault+0x70/0x170
[    3.903764]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Bspec: 49189, 53112

v2: (Jani)
- Change "Display fused off" dmesg info to "Display not present".
- Zero only runtime->pipe_mask, other fields being zeroed based on this
  later.
- Detect display presence already before the fused-off checks and only for
  HAS_DISPLAY().
v3: Fix "preset" vs "present" typo.

Reported-and-tested-by: iczero <iczero@hellomouse.net>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8015
Cc: iczero <iczero@hellomouse.net>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230208114300.3123934-4-imre.deak@intel.com
jwrdegoede pushed a commit to jwrdegoede/linux-sunxi that referenced this pull request Jun 12, 2023
The commit 4f7e723 ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock()
deadlock") fixed the deadlock between cgroup_threadgroup_rwsem and
cpus_read_lock() by introducing cgroup_attach_{lock,unlock}() and removing
cpus_read_{lock,unlock}() from cpuset_attach(). But cgroup_transfer_tasks()
was missed and not handled, which causes the following warning:

 WARNING: CPU: 0 PID: 589 at kernel/cpu.c:526 lockdep_assert_cpus_held+0x32/0x40
 CPU: 0 PID: 589 Comm: kworker/1:4 Not tainted 6.4.0-rc2-next-20230517 #50
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
 Workqueue: events cpuset_hotplug_workfn
 RIP: 0010:lockdep_assert_cpus_held+0x32/0x40
 <...>
 Call Trace:
  <TASK>
  cpuset_attach+0x40/0x240
  cgroup_migrate_execute+0x452/0x5e0
  ? _raw_spin_unlock_irq+0x28/0x40
  cgroup_transfer_tasks+0x1f3/0x360
  ? find_held_lock+0x32/0x90
  ? cpuset_hotplug_workfn+0xc81/0xed0
  cpuset_hotplug_workfn+0xcb1/0xed0
  ? process_one_work+0x248/0x5b0
  process_one_work+0x2b9/0x5b0
  worker_thread+0x56/0x3b0
  ? process_one_work+0x5b0/0x5b0
  kthread+0xf1/0x120
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

So just use the cgroup_attach_{lock,unlock}() helper to fix it.
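A single-threaded model of why going through the helper fixes the warning. It assumes the helper takes cpus_read_lock() before cgroup_threadgroup_rwsem and releases them in reverse order. Counters stand in for the real locks, and all model_* names are hypothetical:

```c
#include <stdbool.h>

/* Counters standing in for cpus_read_lock() (the CPU hotplug lock) and
 * cgroup_threadgroup_rwsem. All names are hypothetical. */
static int model_cpus_read_depth;
static int model_threadgroup_depth;
static bool model_order_violation;

static void model_cpus_read_lock(void)
{
	/* Taking the CPU lock while holding the threadgroup rwsem is the
	 * inverted order behind the earlier deadlock. */
	if (model_threadgroup_depth > 0)
		model_order_violation = true;
	model_cpus_read_depth++;
}

static void model_cpus_read_unlock(void) { model_cpus_read_depth--; }

/* Roughly what lockdep_assert_cpus_held() checks in the warning above. */
static bool model_cpus_held(void) { return model_cpus_read_depth > 0; }

/* Helper pattern: CPU lock first, then threadgroup rwsem. */
static void model_attach_lock(bool lock_threadgroup)
{
	model_cpus_read_lock();
	if (lock_threadgroup)
		model_threadgroup_depth++;
}

/* Release in reverse order. */
static void model_attach_unlock(bool lock_threadgroup)
{
	if (lock_threadgroup)
		model_threadgroup_depth--;
	model_cpus_read_unlock();
}

/* Stand-in for cpuset_attach(), which expects the CPU lock to be held. */
static bool model_cpuset_attach(void) { return model_cpus_held(); }

/* Before the fix: only the threadgroup rwsem is taken, so the check fails. */
static bool model_transfer_tasks_old(void)
{
	model_threadgroup_depth++;
	bool ok = model_cpuset_attach();
	model_threadgroup_depth--;
	return ok;
}

/* After the fix: go through the attach helpers. */
static bool model_transfer_tasks_fixed(void)
{
	model_attach_lock(true);
	bool ok = model_cpuset_attach();
	model_attach_unlock(true);
	return ok;
}
```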

Reported-by: Zhao Gongyi <zhaogongyi@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Fixes: 05c7b7a ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Tejun Heo <tj@kernel.org>
jwrdegoede pushed a commit to jwrdegoede/linux-sunxi that referenced this pull request Sep 20, 2023
Commit a1d7671 ("md: use mddev->external to select holder in
export_rdev()") fixed the problem where 'claim_rdev' is used for
blkdev_get_by_dev() while 'rdev' is used for blkdev_put().

However, if mddev->external is changed from 0 to 1, then 'rdev' is used
for blkdev_get_by_dev() while 'claim_rdev' is used for blkdev_put(). This
problem can be reproduced reliably with the following:

New file: mdadm/tests/23rdev-lifetime

devname=${dev0##*/}
devt=`cat /sys/block/$devname/dev`
pid=""
runtime=2

clean_up_test() {
        kill -9 $pid
        echo clear > /sys/block/md0/md/array_state
}

trap 'clean_up_test' EXIT

add_by_sysfs() {
        while true; do
                echo $devt > /sys/block/md0/md/new_dev
        done
}

remove_by_sysfs(){
        while true; do
                echo remove > /sys/block/md0/md/dev-${devname}/state
        done
}

echo md0 > /sys/module/md_mod/parameters/new_array || die "create md0 failed"

add_by_sysfs &
pid="$pid $!"

remove_by_sysfs &
pid="$pid $!"

sleep $runtime
exit 0

Test cmd:

./test --save-logs --logdir=/tmp/ --keep-going --dev=loop --tests=23rdev-lifetime

Test result:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 960 at block/bdev.c:618 blkdev_put+0x27c/0x330
Modules linked in: multipath md_mod loop
CPU: 0 PID: 960 Comm: test Not tainted 6.5.0-rc2-00121-g01e55c376936-dirty #50
RIP: 0010:blkdev_put+0x27c/0x330
Call Trace:
 <TASK>
 export_rdev.isra.23+0x50/0xa0 [md_mod]
 mddev_unlock+0x19d/0x300 [md_mod]
 rdev_attr_store+0xec/0x190 [md_mod]
 sysfs_kf_write+0x52/0x70
 kernfs_fop_write_iter+0x19a/0x2a0
 vfs_write+0x3b5/0x770
 ksys_write+0x74/0x150
 __x64_sys_write+0x22/0x30
 do_syscall_64+0x40/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fix the problem by recording if 'rdev' is used as holder.
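A minimal model of the mismatch and the fix. It assumes, consistent with the 0-to-1 case described above, that the claim picks 'claim_rdev' for external arrays and 'rdev' otherwise. The structs, enum and functions are hypothetical, not md driver code:

```c
#include <stdbool.h>

/* Hypothetical model of the holder mismatch; not md driver code. */
enum model_holder { HOLDER_RDEV, HOLDER_CLAIM_RDEV };

struct model_mddev {
	bool external;
};

struct model_rdev {
	struct model_mddev *mddev;
	bool holder_is_rdev;	/* the fix: remember which holder was used */
};

static enum model_holder holder_for(bool external)
{
	return external ? HOLDER_CLAIM_RDEV : HOLDER_RDEV;
}

/* "blkdev_get_by_dev()" side: pick a holder and record the choice. */
static enum model_holder model_claim(struct model_rdev *rdev)
{
	enum model_holder h = holder_for(rdev->mddev->external);

	rdev->holder_is_rdev = (h == HOLDER_RDEV);
	return h;
}

/* "blkdev_put()" side before the fix: re-derive the holder from the
 * current mddev->external, which may have changed since the claim. */
static enum model_holder model_release_old(const struct model_rdev *rdev)
{
	return holder_for(rdev->mddev->external);
}

/* After the fix: use the holder recorded at claim time. */
static enum model_holder model_release_fixed(const struct model_rdev *rdev)
{
	return rdev->holder_is_rdev ? HOLDER_RDEV : HOLDER_CLAIM_RDEV;
}
```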

Fixes: a1d7671 ("md: use mddev->external to select holder in export_rdev()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825025532.1523008-3-yukuai1@huaweicloud.com