forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync with upstream. #1
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…on SKL Bspec tells us to keep bashing the PCU for up to 3ms when trying to inform it about an upcoming change in the cdclk frequency. Currently we only keep at it for 15*10usec (+ whatever delays gets added by the sandybridge_pcode_read() itself). Let's change the limit to 3ms. I decided to keep 10 usec delay per iteration for now, even though the spec doesn't really tell us to do that. Cc: stable@vger.kernel.org Fixes: 5d96d8a ("drm/i915/skl: Deinit/init the display at suspend/resume") Cc: David Weinehall <david.weinehall@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468416723-23440-1-git-send-email-ville.syrjala@linux.intel.com Tested-by: David Weinehall <david.weinehall@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (cherry picked from commit 848496e) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
SNB (and IVB too I suppose) starts to misbehave if the GPU gets stuck in an infinite batch buffer loop. The GPU apparently hogs something critical and CPUs start to lose interrupts and whatnot. We can keep the system limping along by unmasking some interrupts in GEN6_PMINTRMSK. The EI up interrupt has been previously chosen for that task, so let's never mask it. v2: s/gen6_rps_pm_mask/gen6_sanitize_rps_pm_mask/ (Chris) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93122 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1464014568-4529-1-git-send-email-ville.syrjala@linux.intel.com Cc: stable@vger.kernel.org (cherry picked from commit 12c100b) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we start an fsync we start ordered extents for all delalloc ranges. However before attempting to log the inode, we only wait for those ordered extents if we are not doing a full sync (bit BTRFS_INODE_NEEDS_FULL_SYNC is set in the inode's flags). This means that if an ordered extent completes with an IO error before we check if we can skip logging the inode, we will not catch and report the IO error to user space. This is because on an IO error, when the ordered extent completes we do not update the inode, so if the inode was not previously updated by the current transaction we end up not logging it through calls to fsync and therefore not check its mapping flags for the presence of IO errors. Fix this by checking for errors in the flags of the inode's mapping when we notice we can skip logging the inode. This caused sporadic failures in the test generic/331 (which explicitly tests for IO errors during an fsync call). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
When doing an incremental send we can end up not moving directories that have the same name. This happens when the same parent directory has different child directories with the same name in the parent and send snapshots. For example, consider the following scenario: Parent snapshot: . (ino 256) |---- d/ (ino 257) | |--- p1/ (ino 258) | |---- p1/ (ino 259) Send snapshot: . (ino 256) |--- d/ (ino 257) |--- p1/ (ino 259) |--- p1/ (ino 258) The directory named "d" (inode 257) has in both snapshots an entry with the name "p1" but it refers to different inodes in both snapshots (inode 258 in the parent snapshot and inode 259 in the send snapshot). When attempting to move inode 258, the operation is delayed because its new parent, inode 259, was not yet moved/renamed (as the stream is currently processing inode 258). Then when processing inode 259, we also end up delaying its move/rename operation so that it happens after inode 258 is moved/renamed. This decision to delay the move/rename rename operation of inode 259 is due to the fact that the new parent inode (257) still has inode 258 as its child, which has the same name has inode 259. So we end up with inode 258 move/rename operation waiting for inode's 259 move/rename operation, which in turn it waiting for inode's 258 move/rename. This results in ending the send stream without issuing move/rename operations for inodes 258 and 259 and generating the following warnings in syslog/dmesg: [148402.979747] ------------[ cut here ]------------ [148402.980588] WARNING: CPU: 14 PID: 4117 at fs/btrfs/send.c:6177 btrfs_ioctl_send+0xe03/0xe51 [btrfs] [148402.981928] Modules linked in: btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs] [148402.986999] CPU: 14 PID: 4117 Comm: btrfs Tainted: G W 4.6.0-rc7-btrfs-next-31+ #1 [148402.988136] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [148402.988136] 0000000000000000 ffff88022139fca8 ffffffff8126b42c 0000000000000000 [148402.988136] 0000000000000000 ffff88022139fce8 ffffffff81052b14 000018212139fac8 [148402.988136] ffff88022b0db400 0000000000000000 0000000000000001 0000000000000000 [148402.988136] Call Trace: [148402.988136] [<ffffffff8126b42c>] dump_stack+0x67/0x90 [148402.988136] [<ffffffff81052b14>] __warn+0xc2/0xdd [148402.988136] [<ffffffff81052beb>] warn_slowpath_null+0x1d/0x1f [148402.988136] [<ffffffffa04bc831>] btrfs_ioctl_send+0xe03/0xe51 [btrfs] [148402.988136] [<ffffffffa048b358>] btrfs_ioctl+0x14f/0x1f81 [btrfs] [148402.988136] [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc [148402.988136] [<ffffffff8108eb51>] ? __lock_is_held+0x3c/0x57 [148402.988136] [<ffffffff8118da05>] vfs_ioctl+0x18/0x34 [148402.988136] [<ffffffff8118e00c>] do_vfs_ioctl+0x550/0x5be [148402.988136] [<ffffffff81196f0c>] ? __fget+0x6b/0x77 [148402.988136] [<ffffffff81196fa1>] ? __fget_light+0x62/0x71 [148402.988136] [<ffffffff8118e0d1>] SyS_ioctl+0x57/0x79 [148402.988136] [<ffffffff8149e025>] entry_SYSCALL_64_fastpath+0x18/0xa8 [148402.988136] [<ffffffff8108e89d>] ? trace_hardirqs_off_caller+0x3f/0xaa [148403.011373] ---[ end trace a4539270c8056f8b ]--- [148403.012296] ------------[ cut here ]------------ [148403.013071] WARNING: CPU: 14 PID: 4117 at fs/btrfs/send.c:6194 btrfs_ioctl_send+0xe19/0xe51 [btrfs] [148403.014447] Modules linked in: btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs] [148403.019708] CPU: 14 PID: 4117 Comm: btrfs Tainted: G W 4.6.0-rc7-btrfs-next-31+ #1 [148403.020104] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [148403.020104] 0000000000000000 ffff88022139fca8 ffffffff8126b42c 0000000000000000 [148403.020104] 0000000000000000 ffff88022139fce8 ffffffff81052b14 000018322139fac8 [148403.020104] ffff88022b0db400 0000000000000000 0000000000000001 0000000000000000 [148403.020104] Call Trace: [148403.020104] [<ffffffff8126b42c>] dump_stack+0x67/0x90 [148403.020104] [<ffffffff81052b14>] __warn+0xc2/0xdd [148403.020104] [<ffffffff81052beb>] warn_slowpath_null+0x1d/0x1f [148403.020104] [<ffffffffa04bc847>] btrfs_ioctl_send+0xe19/0xe51 [btrfs] [148403.020104] [<ffffffffa048b358>] btrfs_ioctl+0x14f/0x1f81 [btrfs] [148403.020104] [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc [148403.020104] [<ffffffff8108eb51>] ? __lock_is_held+0x3c/0x57 [148403.020104] [<ffffffff8118da05>] vfs_ioctl+0x18/0x34 [148403.020104] [<ffffffff8118e00c>] do_vfs_ioctl+0x550/0x5be [148403.020104] [<ffffffff81196f0c>] ? __fget+0x6b/0x77 [148403.020104] [<ffffffff81196fa1>] ? __fget_light+0x62/0x71 [148403.020104] [<ffffffff8118e0d1>] SyS_ioctl+0x57/0x79 [148403.020104] [<ffffffff8149e025>] entry_SYSCALL_64_fastpath+0x18/0xa8 [148403.020104] [<ffffffff8108e89d>] ? trace_hardirqs_off_caller+0x3f/0xaa [148403.038981] ---[ end trace a4539270c8056f8c ]--- There's another issue caused by similar (but more complex) changes in the directory hierarchy that makes move/rename operations fail, described with the following example: Parent snapshot: . |---- a/ (ino 262) | |---- c/ (ino 268) | |---- d/ (ino 263) |---- ance/ (ino 267) |---- e/ (ino 264) |---- f/ (ino 265) |---- ance/ (ino 266) Send snapshot: . |---- a/ (ino 262) |---- c/ (ino 268) | |---- ance/ (ino 267) | |---- d/ (ino 263) | |---- ance/ (ino 266) | |---- f/ (ino 265) |---- e/ (ino 264) When the inode 265 is processed, the path for inode 267 is computed, which at that time corresponds to "d/ance", and it's stored in the names cache. Later on when processing inode 266, we end up orphanizing (renaming to a name matching the pattern o<ino>-<gen>-<seq>) inode 267 because it has the same name as inode 266 and it's currently a child of the new parent directory (inode 263) for inode 266. After the orphanization and while we are still processing inode 266, a rename operation for inode 266 is generated. However the source path for that rename operation is incorrect because it ends up using the old, pre-orphanization, name of inode 267. The no longer valid name for inode 267 was previously cached when processing inode 265 and it remains usable and considered valid until the inode currently being processed has a number greater than 267. This resulted in the receiving side failing with the following error: ERROR: rename d/ance/ance -> d/ance failed: No such file or directory So fix these issues by detecting such circular dependencies for rename operations and by clearing the cached name of an inode once the inode is orphanized. A test case for fstests will follow soon. Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> [Rewrote change log to be more detailed and organized, and improved comments] Signed-off-by: Filipe Manana <fdmanana@suse.com>
The function path_loop() can return a negative integer, signaling an error, 0 if there's no path loop and 1 if there's a path loop. We were treating any non zero values as meaning that a path loop exists. Fix this by explicitly checking for errors and gracefully return them to user space. Signed-off-by: Filipe Manana <fdmanana@suse.com>
Example scenario: Parent snapshot: . (ino 277) |---- tmp/ (ino 278) |---- pre/ (ino 280) | |---- wait_dir/ (ino 281) | |---- desc/ (ino 282) |---- ance/ (ino 283) | |---- below_ance/ (ino 279) | |---- other_dir/ (ino 284) Send snapshot: . (ino 277) |---- tmp/ (ino 278) |---- other_dir/ (ino 284) |---- below_ance/ (ino 279) | |---- pre/ (ino 280) | |---- wait_dir/ (ino 281) |---- desc/ (ino 282) |---- ance/ (ino 283) While computing the send stream the following steps happen: 1) While processing inode 279 we end up delaying its rename operation because its new parent in the send snapshot, inode 284, was not yet processed and therefore not yet renamed; 2) Later when processing inode 280 we end up renaming it immediately to "ance/below_once/pre" and not delay its rename operation because its new parent (inode 279 in the send snapshot) has its rename operation delayed and inode 280 is not an encestor of inode 279 (its parent in the send snapshot) in the parent snapshot; 3) When processing inode 281 we end up delaying its rename operation because its new parent in the send snapshot, inode 284, was not yet processed and therefore not yet renamed; 4) When processing inode 282 we do not delay its rename operation because its parent in the send snapshot, inode 281, already has its own rename operation delayed and our current inode (282) is not an ancestor of inode 281 in the parent snapshot. Therefore inode 282 is renamed to "ance/below_ance/pre/wait_dir"; 5) When processing inode 283 we realize that we can rename it because one of its ancestors in the send snapshot, inode 281, has its rename operation delayed and inode 283 is not an ancestor of inode 281 in the parent snapshot. So a rename operation to rename inode 283 to "ance/below_ance/pre/wait_dir/desc/ance" is issued. This path is invalid due to a missing path building loop that was undetected by the incremental send implementation, as inode 283 ends up getting included twice in the path (once with its path in the parent snapshot). Therefore its rename operation must wait before the ancestor inode 284 is renamed. Fix this by not terminating the rename dependency checks when we find an ancestor, in the send snapshot, that has its rename operation delayed. So that we continue doing the same checks if the current inode is not an ancestor, in the parent snapshot, of an ancestor in the send snapshot we are processing in the loop. The problem and reproducer were reported by Robbie Ko, as part of a patch titled "Btrfs: incremental send, avoid ancestor rename to descendant". However the fix was unnecessarily complicated and can be addressed with much less code and effort. Reported-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Under certain situations, an incremental send operation can contain a rmdir operation that will make the receiving end fail when attempting to execute it, because the target directory is not yet empty. Consider the following example: Parent snapshot: . (ino 256) |--- a/ (ino 257) | |--- c/ (ino 260) | |--- del/ (ino 259) |--- tmp/ (ino 258) |--- x/ (ino 261) Send snapshot: . (ino 256) |--- a/ (ino 257) | |--- x/ (ino 261) | |--- c/ (ino 260) |--- tmp/ (ino 258) 1) When processing inode 258, we delay its rename operation because inode 260 is its new parent in the send snapshot and it was not yet renamed (since 260 > 258, that is, beyond the current progress); 2) When processing inode 259, we realize we can not yet send an rmdir operation (against inode 259) because inode 258 was still not yet renamed/moved away from inode 259. Therefore we update data structures so that after inode 258 is renamed, we try again to see if we can finally send an rmdir operation for inode 259; 3) When we process inode 260, we send a rename operation for it followed by a rename operation for inode 258. Once we send the rename operation for inode 258 we then check if we can finally issue an rmdir for its previous parent, inode 259, by calling the can_rmdir() function with a value of sctx->cur_ino + 1 (260 + 1 = 261) for its "progress" argument. This makes can_rmdir() return true (value 1) because even though there's still a child inode of inode 259 that was not yet renamed/moved, which is inode 261, the given value of progress (261) is not lower then 261 (that is, not lower than the inode number of some child of inode 259). So we end up sending a rmdir operation for inode 259 before its child inode 261 is processed and renamed. So fix this by passing the correct progress value to the call to can_rmdir() from within apply_dir_move() (where we issue delayed rename operations), which should match stcx->cur_ino (the number of the inode currently being processed) and not sctx->cur_ino + 1. A test case for fstests follows soon. Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> [Rewrote change log to be more detailed, clear and well formatted] Signed-off-by: Filipe Manana <fdmanana@suse.com>
…tures Under certain situations, when doing an incremental send, we can end up not freeing orphan_dir_info structures as soon as they are no longer needed. Instead we end up freeing them only after finishing the send stream, which causes a warning to be emitted: [282735.229200] ------------[ cut here ]------------ [282735.229968] WARNING: CPU: 9 PID: 10588 at fs/btrfs/send.c:6298 btrfs_ioctl_send+0xe2f/0xe51 [btrfs] [282735.231282] Modules linked in: btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs] [282735.237130] CPU: 9 PID: 10588 Comm: btrfs Tainted: G W 4.6.0-rc7-btrfs-next-31+ #1 [282735.239309] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [282735.240160] 0000000000000000 ffff880224273ca8 ffffffff8126b42c 0000000000000000 [282735.240160] 0000000000000000 ffff880224273ce8 ffffffff81052b14 0000189a24273ac8 [282735.240160] ffff8802210c9800 0000000000000000 0000000000000001 0000000000000000 [282735.240160] Call Trace: [282735.240160] [<ffffffff8126b42c>] dump_stack+0x67/0x90 [282735.240160] [<ffffffff81052b14>] __warn+0xc2/0xdd [282735.240160] [<ffffffff81052beb>] warn_slowpath_null+0x1d/0x1f [282735.240160] [<ffffffffa03c99d5>] btrfs_ioctl_send+0xe2f/0xe51 [btrfs] [282735.240160] [<ffffffffa0398358>] btrfs_ioctl+0x14f/0x1f81 [btrfs] [282735.240160] [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc [282735.240160] [<ffffffff8118da05>] vfs_ioctl+0x18/0x34 [282735.240160] [<ffffffff8118e00c>] do_vfs_ioctl+0x550/0x5be [282735.240160] [<ffffffff81196f0c>] ? __fget+0x6b/0x77 [282735.240160] [<ffffffff81196fa1>] ? __fget_light+0x62/0x71 [282735.240160] [<ffffffff8118e0d1>] SyS_ioctl+0x57/0x79 [282735.240160] [<ffffffff8149e025>] entry_SYSCALL_64_fastpath+0x18/0xa8 [282735.240160] [<ffffffff81100c6b>] ? time_hardirqs_off+0x9/0x14 [282735.240160] [<ffffffff8108e87d>] ? trace_hardirqs_off_caller+0x1f/0xaa [282735.256343] ---[ end trace a4539270c8056f93 ]--- Consider the following example: Parent snapshot: . (ino 256) |--- a/ (ino 257) | |--- c/ (ino 260) | |--- del/ (ino 259) |--- tmp/ (ino 258) |--- x/ (ino 261) |--- y/ (ino 262) Send snapshot: . (ino 256) |--- a/ (ino 257) | |--- x/ (ino 261) | |--- y/ (ino 262) | |--- c/ (ino 260) |--- tmp/ (ino 258) 1) When processing inode 258, we end up delaying its rename operation because it has an ancestor (in the send snapshot) that has a higher inode number (inode 260) which was also renamed in the send snapshot, therefore we delay the rename of inode 258 so that it happens after inode 260 is renamed; 2) When processing inode 259, we end up delaying its deletion (rmdir operation) because it has a child inode (258) that has its rename operation delayed. At this point we allocate an orphan_dir_info structure and tag inode 258 so that we later attempt to see if we can delete (rmdir) inode 259 once inode 258 is renamed; 3) When we process inode 260, after renaming it we finally do the rename operation for inode 258. Once we issue the rename operation for inode 258 we notice that this inode was tagged so that we attempt to see if at this point we can delete (rmdir) inode 259. But at this point we can not still delete inode 259 because it has 2 children, inodes 261 and 262, that were not yet processed and therefore not yet moved (renamed) away from inode 259. We end up not freeing the orphan_dir_info structure allocated in step 2; 4) We process inodes 261 and 262, and once we move/rename inode 262 we issue the rmdir operation for inode 260; 5) We finish the send stream and notice that red black tree that contains orphan_dir_info structures is not empty, so we emit a warning and then free any orphan_dir_structures left. So fix this by freeing an orphan_dir_info structure once we try to apply a pending rename operation if we can not delete yet the tagged directory. A test case for fstests follows soon. Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> [Modified changelog to be more detailed and easier to understand]
…ions During an incremental send, if we have delayed rename operations for inodes that were children of directories which were removed in the send snapshot, we can end up accessing incorrect items in a leaf or accessing beyond the last item of the leaf due to issuing utimes operations for the removed inodes. Consider the following example: Parent snapshot: . (ino 256) |--- a/ (ino 257) | |--- c/ (ino 262) | |--- b/ (ino 258) | |--- d/ (ino 263) | |--- del/ (ino 261) |--- x/ (ino 259) |--- y/ (ino 260) Send snapshot: . (ino 256) |--- a/ (ino 257) | |--- b/ (ino 258) | |--- c/ (ino 262) | |--- y/ (ino 260) | |--- d/ (ino 263) |--- x/ (ino 259) 1) When processing inodes 259 and 260, we end up delaying their rename operations because their parents, inodes 263 and 262 respectively, were not yet processed and therefore not yet renamed; 2) When processing inode 262, its rename operation is issued and right after the rename operation for inode 260 is issued. However right after issuing the rename operation for inode 260, at send.c:apply_dir_move(), we issue utimes operations for all current and past parents of inode 260. This means we try to send a utimes operation for its old parent, inode 261 (deleted in the send snapshot), which does not cause any immediate and deterministic failure, because when the target inode is not found in the send snapshot, the send.c:send_utimes() function ignores it and uses the leaf region pointed to by path->slots[0], which can be any unrelated item (belonging to other inode) or it can be a region outside the leaf boundaries, if the leaf is full and path->slots[0] matches the number of items in the leaf. So we end up either successfully sending a utimes operation, which is fine and irrelevant because the old parent (inode 261) will end up being deleted later, or we end up doing an invalid memory access tha crashes the kernel. So fix this by making apply_dir_move() issue utimes operations only for parents that still exist in the send snapshot. In a separate patch we will make send_utimes() return an error (-ENOENT) if the given inode does not exists in the send snapshot. Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> [Rewrote change log to be more detailed and better organized] Signed-off-by: Filipe Manana <fdmanana@suse.com>
…ions The caller of send_utimes() is supposed to be sure that the inode number it passes to this function does actually exists in the send snapshot. However due to logic/algorithm bugs (such as the one fixed by the patch titled "Btrfs: send, fix invalid leaf accesses due to incorrect utimes operations"), this might not be the case and when that happens it makes send_utimes() access use an unrelated leaf item as the target inode item or access beyond a leaf's boundaries (when the leaf is full and path->slots[0] matches the number of items in the leaf). So if the call to btrfs_search_slot() done by send_utimes() does not find the inode item, just make sure send_utimes() returns -ENOENT and does not silently accesses unrelated leaf items or does invalid leaf accesses, also allowing us to easialy and deterministically catch such algorithmic/logic bugs. Signed-off-by: Filipe Manana <fdmanana@suse.com>
When doing an incremental send, if we find a new/modified/deleted extent, reference or xattr without having previously processed the corresponding inode item we end up exexuting a BUG_ON(). This is because whenever an extent, xattr or reference is added, modified or deleted, we always expect to have the corresponding inode item updated. However there are situations where this will not happen due to transient -ENOMEM or -ENOSPC errors when doing delayed inode updates. For example, when punching holes we can succeed in deleting and modifying (shrinking) extents but later fail to do the delayed inode update. So after such failure we close our transaction handle and right after a snapshot of the fs/subvol tree can be made and used later for a send operation. The same thing can happen during truncate, link, unlink, and xattr related operations. So instead of executing a BUG_ON, make send return an -EIO error and print an informative error message do dmesg/syslog. Signed-off-by: Filipe Manana <fdmanana@suse.com>
When we attempt to read an inode from disk, we end up always returning an -ESTALE error to the caller regardless of the actual failure reason, which can be an out of memory problem (when allocating a path), some error found when reading from the fs/subvolume btree (like a genuine IO error) or the inode does not exists. So lets start returning the real error code to the callers so that they don't treat all -ESTALE errors as meaning that the inode does not exists (such as during orphan cleanup). This will also be needed for a subsequent patch in the same series dealing with a special fsync case. Signed-off-by: Filipe Manana <fdmanana@suse.com>
…link With commit 56f23fd ("Btrfs: fix file/data loss caused by fsync after rename and new inode") we got simple fix for a functional issue when the following sequence of actions is done: at transaction N create file A at directory D at transaction N + M (where M >= 1) move/rename existing file A from directory D to directory E create a new file named A at directory D fsync the new file power fail The solution was to simply detect such scenario and fallback to a full transaction commit when we detect it. However this turned out to had a significant impact on throughput (and a bit on latency too) for benchmarks using the dbench tool, which simulates real workloads from smbd (Samba) servers. For example on a test vm (with a debug kernel): Unpatched: Throughput 19.1572 MB/sec 32 clients 32 procs max_latency=1005.229 ms Patched: Throughput 23.7015 MB/sec 32 clients 32 procs max_latency=809.206 ms The patched results (this patch is applied) are similar to the results of a kernel with the commit 56f23fd ("Btrfs: fix file/data loss caused by fsync after rename and new inode") reverted. This change avoids the fallback to a transaction commit and instead makes sure all the names of the conflicting inode (the one that had a name in a past transaction that matches the name of the new file in the same parent directory) are logged so that at log replay time we don't lose neither the new file nor the old file, and the old file gets the name it was renamed to. This also ends up avoiding a full transaction commit for a similar case that involves an unlink instead of a rename of the old file: at transaction N create file A at directory D at transaction N + M (where M >= 1) remove file A create a new file named A at directory D fsync the new file power fail Signed-off-by: Filipe Manana <fdmanana@suse.com>
Looks like this got missed when we ported the code from radeon. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
It's already covered by the default case, but add it for consistency. Reviewed-by: Alexandre Demers <alexandre.f.demers@gmail.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Eric Huang <JinHuiEric.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No longer used as of commit 5846a3c ("btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans"). Signed-off-by: Filipe Manana <fdmanana@suse.com>
If the fbdev probing fails, and in our error path we fail to clear the dev_priv->fbdev, then we can try and use a dangling fbdev pointer, and in particular a NULL fb. This could also happen in pathological cases where we try to operate on the fbdev prior to it being probed. Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468431285-28264-2-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> (cherry picked from commit 6bc2654) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Fixes hangs under memory pressure, e.g. running the piglit test tex3d-maxsize concurrently with other tests. Fixes: 17d33bc ("drm/ttm: drop waiting for idle in ttm_bo_evict.") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
…kernel/git/fdmanana/linux into for-linus-4.8
The conversion of the rcar-du driver from the I2C slave encoder to the DRM bridge API left the HDMI encoder's bridge pointer NULL, preventing the bridge from being handled automatically by the DRM core. Fix it. Fixes: 1d92611 ("drm: rcar-du: Remove i2c slave encoder interface for hdmi encoder") Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
drm_connector_register_all requires a few too many locks because our connector_list locking is busted. Add another FIXME+hack to work around this. This should address the below lockdep splat: ====================================================== [ INFO: possible circular locking dependency detected ] 4.7.0-rc5+ #524 Tainted: G O ------------------------------------------------------- kworker/u8:0/6 is trying to acquire lock: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffff815afde0>] drm_modeset_lock_all+0x40/0x120 but task is already holding lock: ((fb_notifier_list).rwsem){++++.+}, at: [<ffffffff810ac195>] __blocking_notifier_call_chain+0x35/0x70 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((fb_notifier_list).rwsem){++++.+}: [<ffffffff810df611>] lock_acquire+0xb1/0x200 [<ffffffff819a55b4>] down_write+0x44/0x80 [<ffffffff810abf91>] blocking_notifier_chain_register+0x21/0xb0 [<ffffffff814c7448>] fb_register_client+0x18/0x20 [<ffffffff814c6c86>] backlight_device_register+0x136/0x260 [<ffffffffa0127eb2>] intel_backlight_device_register+0xa2/0x160 [i915] [<ffffffffa00f46be>] intel_connector_register+0xe/0x10 [i915] [<ffffffffa0112bfb>] intel_dp_connector_register+0x1b/0x80 [i915] [<ffffffff8159dfea>] drm_connector_register+0x4a/0x80 [<ffffffff8159fe44>] drm_connector_register_all+0x64/0xf0 [<ffffffff815a2a64>] drm_modeset_register_all+0x174/0x1c0 [<ffffffff81599b72>] drm_dev_register+0xc2/0xd0 [<ffffffffa00621d7>] i915_driver_load+0x1547/0x2200 [i915] [<ffffffffa006d80f>] i915_pci_probe+0x4f/0x70 [i915] [<ffffffff814a2135>] local_pci_probe+0x45/0xa0 [<ffffffff814a349b>] pci_device_probe+0xdb/0x130 [<ffffffff815c07e3>] driver_probe_device+0x223/0x440 [<ffffffff815c0ad5>] __driver_attach+0xd5/0x100 [<ffffffff815be386>] bus_for_each_dev+0x66/0xa0 [<ffffffff815c002e>] driver_attach+0x1e/0x20 [<ffffffff815bf9be>] bus_add_driver+0x1ee/0x280 [<ffffffff815c1810>] driver_register+0x60/0xe0 [<ffffffff814a1a10>] __pci_register_driver+0x60/0x70 [<ffffffffa01a905b>] i915_init+0x5b/0x62 [i915] [<ffffffff8100042d>] do_one_initcall+0x3d/0x150 [<ffffffff811a935b>] do_init_module+0x5f/0x1d9 [<ffffffff81124416>] load_module+0x20e6/0x27e0 [<ffffffff81124d63>] SYSC_finit_module+0xc3/0xf0 [<ffffffff81124dae>] SyS_finit_module+0xe/0x10 [<ffffffff819a83a9>] entry_SYSCALL_64_fastpath+0x1c/0xac -> #0 (&dev->mode_config.mutex){+.+.+.}: [<ffffffff810df0ac>] __lock_acquire+0x10fc/0x1260 [<ffffffff810df611>] lock_acquire+0xb1/0x200 [<ffffffff819a3097>] mutex_lock_nested+0x67/0x3c0 [<ffffffff815afde0>] drm_modeset_lock_all+0x40/0x120 [<ffffffff8158f79b>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2b/0x80 [<ffffffff8158f81d>] drm_fb_helper_set_par+0x2d/0x50 [<ffffffffa0105f7a>] intel_fbdev_set_par+0x1a/0x60 [i915] [<ffffffff814c13c6>] fbcon_init+0x586/0x610 [<ffffffff8154d16a>] visual_init+0xca/0x130 [<ffffffff8154e611>] do_bind_con_driver+0x1c1/0x3a0 [<ffffffff8154eaf6>] do_take_over_console+0x116/0x180 [<ffffffff814bd3a7>] do_fbcon_takeover+0x57/0xb0 [<ffffffff814c1e48>] fbcon_event_notify+0x658/0x750 [<ffffffff810abcae>] notifier_call_chain+0x3e/0xb0 [<ffffffff810ac1ad>] __blocking_notifier_call_chain+0x4d/0x70 [<ffffffff810ac1e6>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff814c748b>] fb_notifier_call_chain+0x1b/0x20 [<ffffffff814c86b1>] register_framebuffer+0x251/0x330 [<ffffffff8158fa9f>] drm_fb_helper_initial_config+0x25f/0x3f0 [<ffffffffa0106b48>] intel_fbdev_initial_config+0x18/0x30 [i915] [<ffffffff810adfd8>] async_run_entry_fn+0x48/0x150 [<ffffffff810a3947>] process_one_work+0x1e7/0x750 [<ffffffff810a3efb>] worker_thread+0x4b/0x4f0 [<ffffffff810aad4f>] kthread+0xef/0x110 [<ffffffff819a85ef>] ret_from_fork+0x1f/0x40 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((fb_notifier_list).rwsem); lock(&dev->mode_config.mutex); lock((fb_notifier_list).rwsem); lock(&dev->mode_config.mutex); *** DEADLOCK *** 6 locks held by kworker/u8:0/6: #0: ("events_unbound"){.+.+.+}, at: [<ffffffff810a38c9>] process_one_work+0x169/0x750 #1: ((&entry->work)){+.+.+.}, at: [<ffffffff810a38c9>] process_one_work+0x169/0x750 #2: (registration_lock){+.+.+.}, at: [<ffffffff814c8487>] register_framebuffer+0x27/0x330 #3: (console_lock){+.+.+.}, at: [<ffffffff814c86ce>] register_framebuffer+0x26e/0x330 #4: (&fb_info->lock){+.+.+.}, at: [<ffffffff814c78dd>] lock_fb_info+0x1d/0x40 #5: ((fb_notifier_list).rwsem){++++.+}, at: [<ffffffff810ac195>] __blocking_notifier_call_chain+0x35/0x70 stack backtrace: CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: G O 4.7.0-rc5+ #524 Hardware name: Intel Corp. Broxton P/NOTEBOOK, BIOS APLKRVPA.X64.0138.B33.1606250842 06/25/2016 Workqueue: events_unbound async_run_entry_fn 0000000000000000 ffff8800758577f0 ffffffff814507a5 ffffffff828b9900 ffffffff828b9900 ffff880075857830 ffffffff810dc6fa ffff880075857880 ffff88007584d688 0000000000000005 0000000000000006 ffff88007584d6b0 Call Trace: [<ffffffff814507a5>] dump_stack+0x67/0x92 [<ffffffff810dc6fa>] print_circular_bug+0x1aa/0x200 [<ffffffff810df0ac>] __lock_acquire+0x10fc/0x1260 [<ffffffff810df611>] lock_acquire+0xb1/0x200 [<ffffffff815afde0>] ? drm_modeset_lock_all+0x40/0x120 [<ffffffff815afde0>] ? drm_modeset_lock_all+0x40/0x120 [<ffffffff819a3097>] mutex_lock_nested+0x67/0x3c0 [<ffffffff815afde0>] ? drm_modeset_lock_all+0x40/0x120 [<ffffffff810fa85f>] ? rcu_read_lock_sched_held+0x7f/0x90 [<ffffffff81208218>] ? kmem_cache_alloc_trace+0x248/0x2b0 [<ffffffff815afdc5>] ? drm_modeset_lock_all+0x25/0x120 [<ffffffff815afde0>] drm_modeset_lock_all+0x40/0x120 [<ffffffff8158f79b>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2b/0x80 [<ffffffff8158f81d>] drm_fb_helper_set_par+0x2d/0x50 [<ffffffffa0105f7a>] intel_fbdev_set_par+0x1a/0x60 [i915] [<ffffffff814c13c6>] fbcon_init+0x586/0x610 [<ffffffff8154d16a>] visual_init+0xca/0x130 [<ffffffff8154e611>] do_bind_con_driver+0x1c1/0x3a0 [<ffffffff8154eaf6>] do_take_over_console+0x116/0x180 [<ffffffff814bd3a7>] do_fbcon_takeover+0x57/0xb0 [<ffffffff814c1e48>] fbcon_event_notify+0x658/0x750 [<ffffffff810abcae>] notifier_call_chain+0x3e/0xb0 [<ffffffff810ac1ad>] __blocking_notifier_call_chain+0x4d/0x70 [<ffffffff810ac1e6>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff814c748b>] fb_notifier_call_chain+0x1b/0x20 [<ffffffff814c86b1>] register_framebuffer+0x251/0x330 [<ffffffff815b7e8d>] ? vga_switcheroo_client_fb_set+0x5d/0x70 [<ffffffff8158fa9f>] drm_fb_helper_initial_config+0x25f/0x3f0 [<ffffffffa0106b48>] intel_fbdev_initial_config+0x18/0x30 [i915] [<ffffffff810adfd8>] async_run_entry_fn+0x48/0x150 [<ffffffff810a3947>] process_one_work+0x1e7/0x750 [<ffffffff810a38c9>] ? process_one_work+0x169/0x750 [<ffffffff810a3efb>] worker_thread+0x4b/0x4f0 [<ffffffff810a3eb0>] ? process_one_work+0x750/0x750 [<ffffffff810aad4f>] kthread+0xef/0x110 [<ffffffff819a85ef>] ret_from_fork+0x1f/0x40 [<ffffffff810aac60>] ? kthread_stop+0x2e0/0x2e0 v2: Rebase onto the right branch (hand-editing patches ftw) and add more reporters. Reported-by: Imre Deak <imre.deak@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reported-by: Jiri Kosina <jikos@kernel.org> Cc: Jiri Kosina <jikos@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
…top.org/drm-intel into drm-next 3 intel fixes. * tag 'drm-intel-next-fixes-2016-08-05' of git://anongit.freedesktop.org/drm-intel: drm/i915/fbdev: Check for the framebuffer before use drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL
…nux into drm-next A few fixes for amdgpu and ttm for 4.8 - fix a ttm regression caused by the new pipelining code - fixes for mullins on amdgpu - updated golden settings for amdgpu * 'drm-next-4.8' of git://people.freedesktop.org/~agd5f/linux: drm/ttm: Wait for a BO to become idle before unbinding it from GTT drm/amdgpu: update golden setting of polaris10 drm/amdgpu: update golden setting of stoney drm/amdgpu: update golden setting of polaris11 drm/amdgpu: update golden setting of carrizo drm/amdgpu: update golden setting of iceland drm/amd/amdgpu: change pptable output format from ASCII to binary drm/amdgpu/ci: add mullins to default case for smc ucode drm/amdgpu/gmc7: add missing mullins case
WMI event 0xe00e is received when battery was removed or inserted. Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Commit b195d5e ("ipr: Wait to do async scan until scsi host is initialized") fixed async scan for ipr, but broke sync scan for ipr. This fixes sync scan back up. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…rlied/linux Pull drm fixes from Dave Airlie: "This contains a bunch of amdgpu fixes, and some i915 regression fixes. It also contains some fixes for an older regression with some EDID changes and some 6bpc panels. Then there are the lockdep, cirrus and rcar-du regression fixes from this window" * tag 'drm-fixes-for-4.8-rc2' of git://people.freedesktop.org/~airlied/linux: drm/cirrus: Fix NULL pointer dereference when registering the fbdev drm/edid: Set 8 bpc color depth for displays with "DFP 1.x compliant TMDS". drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink capability is unknown" drm/edid: Add 6 bpc quirk for display AEO model 0. drm: Paper over locking inversion after registration rework drm: rcar-du: Link HDMI encoder with bridge drm/ttm: Wait for a BO to become idle before unbinding it from GTT drm/i915/fbdev: Check for the framebuffer before use drm/amdgpu: update golden setting of polaris10 drm/amdgpu: update golden setting of stoney drm/amdgpu: update golden setting of polaris11 drm/amdgpu: update golden setting of carrizo drm/amdgpu: update golden setting of iceland drm/amd/amdgpu: change pptable output format from ASCII to binary drm/amdgpu/ci: add mullins to default case for smc ucode drm/amdgpu/gmc7: add missing mullins case drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL
…ers/dvhart/linux-platform-drivers-x86 Pull x86 platform driver update from Darren Hart: "dell-wmi: ignore battery remove/insert event" * tag 'platform-drivers-x86-v4.8-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: dell-wmi: Ignore WMI event 0xe00e
…g/pub/scm/linux/kernel/git/kees/linux Pull gcc plugin improvements from Kees Cook: "Several fixes/improvements for the gcc plugin infrastructure: - fix a problem with gcc plugins interfering with cc-option tests. - abort more gracefully when gcc plugin headers or compiler support is missing. - improve the gcc plugin rule generation to be more dynamic, pass arguments, and build from subdirectories" * tag 'gcc-plugin-infrastructure-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: Add support for plugin subdirectories gcc-plugins: Automate make rule generation gcc-plugins: Add support for passing plugin arguments gcc-plugins: abort builds cleanly when not supported kbuild: no gcc-plugins during cc-option tests
…/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix tick_stop tracepoint symbols for user export. Luiz Capitulino noticed that the tick_stop tracepoint wasn't being parsed properly by the tracing user space tools. This was due to the TRACE_DEFINE_ENUM() being set to a define, when it should have been set to the enum itself. The define was of the MASK that used the BIT to shift. The BIT was the enum and by adding that, everything gets converted nicely. The MASK is still kept just in case it gets converted to an enum in the future" * tag 'trace-v4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix tick_stop tracepoint symbols for user export
This reverts commit 874f9c7. Geert Uytterhoeven reports: "This change seems to have an (unintendent?) side-effect. Before, pr_*() calls without a trailing newline characters would be printed with a newline character appended, both on the console and in the output of the dmesg command. After this commit, no new line character is appended, and the output of the next pr_*() call of the same type may be appended, like in: - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000 - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM) + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)" Joe Perches says: "No, that is not intentional. The newline handling code inside vprintk_emit is a bit involved and for now I suggest a revert until this has all the same behavior as earlier" Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Requested-by: Joe Perches <joe@perches.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If get_maintainer is not given any filename arguments on the command line, the standard input is read for a patch. But checking if a VCS has a file named &STDIN is not a good idea and fails. Verify the nominal input file is not &STDIN before checking the VCS. Fixes: 4cad35a ("get_maintainer.pl: reduce need for command-line option -f") Reported-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…kernel/git/jhogan/metag Pull metag architecture fix from James Hogan: "A single fix for a boot crash since a commit in the merge window. Metag was unusual in calling show_mem() early, before setup_per_cpu_pageset(), which is no longer safe. It doesn't add much value to the log, so the fix just drops the call" * tag 'metag-for-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: metag: Drop show_mem() from mem_init()
…rnel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Some fixes for btrfs send/recv and fsync from Filipe and Robbie Ko. Bonus points to Filipe for already having xfstests in place for many of these" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve() Btrfs: improve performance on fsync against new inode after rename/unlink Btrfs: be more precise on errors when getting an inode from disk Btrfs: send, don't bug on inconsistent snapshots Btrfs: send, avoid incorrect leaf accesses when sending utimes operations Btrfs: send, fix invalid leaf accesses due to incorrect utimes operations Btrfs: send, fix warning due to late freeing of orphan_dir_info structures Btrfs: incremental send, fix premature rmdir operations Btrfs: incremental send, fix invalid paths for rename operations Btrfs: send, add missing error check for calls to path_loop() Btrfs: send, fix failure to move directories with the same name around Btrfs: add missing check for writeback errors on fsync
Add access checks to sys_oabi_epoll_wait() and sys_oabi_semtimedop(). This fixes CVE-2016-3857, a local privilege escalation under CONFIG_OABI_COMPAT. Cc: stable@vger.kernel.org Reported-by: Chiachih Wu <wuchiachih@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Dave Weinstein <olorin@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Original patch: https://lkml.org/lkml/2016/8/4/32 If riocm_ch_alloc() fails then we end up dereferencing the error pointer. The problem is that we're not unwinding in the reverse order from how we allocate things so it gets confusing. I've changed this around so now "ch" is NULL when we are done with it after we call riocm_put_channel(). That way we can check if it's NULL and avoid calling riocm_put_channel() on it twice. I renamed err_nodev to err_put_new_ch so that it better reflects what the goto does. Then because we had flipping things around, it means we don't neeed to initialize the pointers to NULL and we can remove an if statement and pull things in an indent level. Link: http://lkml.kernel.org/r/20160805152406.20713-1-alexandre.bounine@idt.com Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit 51d5d12 ("ARM: keystone: dts: add psci command definition"), which was inadvertently added twice. Cc: Russell King - ARM Linux <linux@armlinux.org.uk> Cc: Vitaly Andrianov <vitalya@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The newly introduced shmem_huge_enabled() function has two definitions, but neither of them is visible if CONFIG_SYSFS is disabled, leading to a build error: mm/khugepaged.o: In function `khugepaged': khugepaged.c:(.text.khugepaged+0x3ca): undefined reference to `shmem_huge_enabled' This changes the #ifdef guards around the definition to match those that are used in the header file. Fixes: e496cf3 ("thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE") Link: http://lkml.kernel.org/r/20160809123638.1357593-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…tio changes Before resetting min_unmapped_pages, we need to initialize min_unmapped_pages rather than min_slab_pages. Fixes: a5f5f91 (mm: convert zone_reclaim to node_reclaim) Link: http://lkml.kernel.org/r/1470724248-26780-1-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…emory Some of node threshold depends on number of managed pages in the node. When memory is going on/offline, it can be changed and we need to adjust them. Add recalculation to appropriate places and clean-up related functions for better maintenance. Link: http://lkml.kernel.org/r/1470724248-26780-2-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PageTransCompound() doesn't distinguish THP from from any other type of compound pages. This can lead to false-positive VM_BUG_ON() in page_add_file_rmap() if called on compound page from a driver[1]. I think we can exclude such cases by checking if the page belong to a mapping. The VM_BUG_ON_PAGE() is downgraded to VM_WARN_ON_ONCE(). This path should not cause any harm to non-THP page, but good to know if we step on anything else. [1] http://lkml.kernel.org/r/c711e067-0bff-a6cb-3c37-04dfe77d2db1@redhat.com Link: http://lkml.kernel.org/r/20160810161345.GA67522@black.fi.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Laura Abbott <labbott@redhat.com> Tested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In page_remove_file_rmap(.) we have the following check: VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page); This is meant to check for either HugeTLB pages or THP when a compound page is passed in. Unfortunately, if one disables CONFIG_TRANSPARENT_HUGEPAGE, then PageTransHuge(.) will always return false, provoking BUGs when one runs the libhugetlbfs test suite. This patch replaces PageTransHuge(), with PageHead() which will work for both HugeTLB and THP. Fixes: dd78fed ("rmap: support file thp") Link: http://lkml.kernel.org/r/1470838217-5889-1-git-send-email-steve.capper@arm.com Signed-off-by: Steve Capper <steve.capper@arm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Huang Shijie <shijie.huang@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With debugobjects enabled and using SLAB_DESTROY_BY_RCU, when a kmem_cache_node is destroyed the call_rcu() may trigger a slab allocation to fill the debug object pool (__debug_object_init:fill_pool). Everywhere but during kmem_cache_destroy(), discard_slab() is performed outside of the kmem_cache_node->list_lock and avoids a lockdep warning about potential recursion: ============================================= [ INFO: possible recursive locking detected ] 4.8.0-rc1-gfxbench+ #1 Tainted: G U --------------------------------------------- rmmod/8895 is trying to acquire lock: (&(&n->list_lock)->rlock){-.-...}, at: [<ffffffff811c80d7>] get_partial_node.isra.63+0x47/0x430 but task is already holding lock: (&(&n->list_lock)->rlock){-.-...}, at: [<ffffffff811cbda4>] __kmem_cache_shutdown+0x54/0x320 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&n->list_lock)->rlock); lock(&(&n->list_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 5 locks held by rmmod/8895: #0: (&dev->mutex){......}, at: driver_detach+0x42/0xc0 #1: (&dev->mutex){......}, at: driver_detach+0x50/0xc0 #2: (cpu_hotplug.dep_map){++++++}, at: get_online_cpus+0x2d/0x80 #3: (slab_mutex){+.+.+.}, at: kmem_cache_destroy+0x3c/0x220 #4: (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown+0x54/0x320 stack backtrace: CPU: 6 PID: 8895 Comm: rmmod Tainted: G U 4.8.0-rc1-gfxbench+ #1 Hardware name: Gigabyte Technology Co., Ltd. H87M-D3H/H87M-D3H, BIOS F11 08/18/2015 Call Trace: __lock_acquire+0x1646/0x1ad0 lock_acquire+0xb2/0x200 _raw_spin_lock+0x36/0x50 get_partial_node.isra.63+0x47/0x430 ___slab_alloc.constprop.67+0x1a7/0x3b0 __slab_alloc.isra.64.constprop.66+0x43/0x80 kmem_cache_alloc+0x236/0x2d0 __debug_object_init+0x2de/0x400 debug_object_activate+0x109/0x1e0 __call_rcu.constprop.63+0x32/0x2f0 call_rcu+0x12/0x20 discard_slab+0x3d/0x40 __kmem_cache_shutdown+0xdb/0x320 shutdown_cache+0x19/0x60 kmem_cache_destroy+0x1ae/0x220 i915_gem_load_cleanup+0x14/0x40 [i915] i915_driver_unload+0x151/0x180 [i915] i915_pci_remove+0x14/0x20 [i915] pci_device_remove+0x34/0xb0 __device_release_driver+0x95/0x140 driver_detach+0xb6/0xc0 bus_remove_driver+0x53/0xd0 driver_unregister+0x27/0x50 pci_unregister_driver+0x25/0x70 i915_exit+0x1a/0x1e2 [i915] SyS_delete_module+0x193/0x1f0 entry_SYSCALL_64_fastpath+0x1c/0xac Fixes: 52b4b95 ("mm: slab: free kmem_cache_node after destroy sysfs file") Link: http://lkml.kernel.org/r/1470759070-18743-1-git-send-email-chris@chris-wilson.co.uk Reported-by: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc fixes from Andrew Morton: "8 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/slub.c: run free_partial() outside of the kmem_cache_node->list_lock rmap: fix compound check logic in page_remove_file_rmap mm, rmap: fix false positive VM_BUG() in page_add_file_rmap() mm/page_alloc.c: recalculate some of node threshold when on/offline memory mm/page_alloc.c: fix wrong initialization when sysctl_min_unmapped_ratio changes thp: move shmem_huge_enabled() outside of SYSFS ifdef revert "ARM: keystone: dts: add psci command definition" rapidio: dereferencing an error pointer
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
Inside the kafs filesystem it is possible to occasionally have a call processed and terminated before we've had a chance to check whether we need to clean up the rx queue for that call because afs_send_simple_reply() ends the call when it is done, but this is done in a workqueue item that might happen to run to completion before afs_deliver_to_call() completes. Further, it is possible for rxrpc_kernel_send_data() to be called to send a reply before the last request-phase data skb is released. The rxrpc skb destructor is where the ACK processing is done and the call state is advanced upon release of the last skb. ACK generation is also deferred to a work item because it's possible that the skb destructor is not called in a context where kernel_sendmsg() can be invoked. To this end, the following changes are made: (1) kernel_rxrpc_data_consumed() is added. This should be called whenever an skb is emptied so as to crank the ACK and call states. This does not release the skb, however. kernel_rxrpc_free_skb() must now be called to achieve that. These together replace rxrpc_kernel_data_delivered(). (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed(). This makes afs_deliver_to_call() easier to work as the skb can simply be discarded unconditionally here without trying to work out what the return value of the ->deliver() function means. The ->deliver() functions can, via afs_data_complete(), afs_transfer_reply() and afs_extract_data() mark that an skb has been consumed (thereby cranking the state) without the need to conditionally free the skb to make sure the state is correct on an incoming call for when the call processor tries to send the reply. (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it has finished with a packet and MSG_PEEK isn't set. (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data(). Because of this, we no longer need to clear the destructor and put the call before we free the skb in cases where we don't want the ACK/call state to be cranked. (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather than 0 if they expect more data (afs_extract_data() returns -EAGAIN to the delivery function already), and the caller is now responsible for producing an abort if that was the last packet. (6) There are many bits of unmarshalling code where: ret = afs_extract_data(call, skb, last, ...); switch (ret) { case 0: break; case -EAGAIN: return 0; default: return ret; } is to be found. As -EAGAIN can now be passed back to the caller, we now just return if ret < 0: ret = afs_extract_data(call, skb, last, ...); if (ret < 0) return ret; (7) Checks for trailing data and empty final data packets has been consolidated as afs_data_complete(). So: if (skb->len > 0) return -EBADMSG; if (!last) return 0; becomes: ret = afs_data_complete(call, skb, last); if (ret < 0) return ret; (8) afs_transfer_reply() now checks the amount of data it has against the amount of data desired and the amount of data in the skb and returns an error to induce an abort if we don't get exactly what we want. Without these changes, the following oops can occasionally be observed, particularly if some printks are inserted into the delivery path: general protection fault: 0000 [#1] SMP Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc] CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Workqueue: kafsd afs_async_workfn [kafs] task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000 RIP: 0010:[<ffffffff8108fd3c>] [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1 RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002 RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710 RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0 Stack: 0000000000000006 000000000be04930 0000000000000000 ffff880400000000 ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446 ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38 Call Trace: [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74 [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1 [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189 [<ffffffff810915f4>] lock_acquire+0x122/0x1b6 [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff814c928f>] skb_dequeue+0x18/0x61 [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs] [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs] [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs] [<ffffffff81063a3a>] process_one_work+0x29d/0x57c [<ffffffff81064ac2>] worker_thread+0x24a/0x385 [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0 [<ffffffff810696f5>] kthread+0xf3/0xfb [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40 [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
Panic occurs when issuing "cat /proc/net/route" whilst populating FIB with > 1M routes. Use of cached node pointer in fib_route_get_idx is unsafe. BUG: unable to handle kernel paging request at ffffc90001630024 IP: [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0 Oops: 0000 [#1] SMP Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000 RIP: 0010:[<ffffffff814cf6a0>] [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 RSP: 0018:ffff88011a05fda0 EFLAGS: 00010202 RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20 RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950 RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000 R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950 R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480 FS: 00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0 Stack: ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00 ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000 ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80 Call Trace: [<ffffffff814d00d3>] ? fib_route_seq_start+0x93/0xc0 [<ffffffff811dd8b9>] ? seq_read+0x149/0x380 [<ffffffff811f8714>] ? fsnotify+0x3b4/0x500 [<ffffffff8138dce0>] ? process_echoes+0x70/0x70 [<ffffffff8121cfa7>] ? proc_reg_read+0x47/0x70 [<ffffffff811bb823>] ? __vfs_read+0x23/0xd0 [<ffffffff811bbd42>] ? rw_verify_area+0x52/0xf0 [<ffffffff811bbe61>] ? vfs_read+0x81/0x120 [<ffffffff811bcbc2>] ? SyS_read+0x42/0xa0 [<ffffffff81549ab2>] ? entry_SYSCALL_64_fastpath+0x16/0x75 Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89 RIP [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 RSP <ffff88011a05fda0> CR2: ffffc90001630024 Signed-off-by: Dave Forster <dforster@brocade.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
I have got a zero division error when disabling the forced idle injection from the intel powerclamp. I did echo 0 >/sys/class/thermal/cooling_device48/cur_state and got [ 986.072632] divide error: 0000 [#1] PREEMPT SMP [ 986.078989] Modules linked in: [ 986.083618] CPU: 17 PID: 24967 Comm: kidle_inject/17 Not tainted 4.7.0-1-default+ #3055 [ 986.093781] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013 [ 986.106227] task: ffff880430e1c080 task.stack: ffff880427ef0000 [ 986.114122] RIP: 0010:[<ffffffff81794859>] [<ffffffff81794859>] clamp_thread+0x1d9/0x600 [ 986.124609] RSP: 0018:ffff880427ef3e20 EFLAGS: 00010246 [ 986.131860] RAX: 0000000000000258 RBX: 0000000000000006 RCX: 0000000000000001 [ 986.141179] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018 [ 986.150478] RBP: ffff880427ef3ec8 R08: ffff880427ef0000 R09: 0000000000000002 [ 986.159779] R10: 0000000000003df2 R11: 0000000000000018 R12: 0000000000000002 [ 986.169089] R13: 0000000000000000 R14: ffff880427ef0000 R15: ffff880427ef0000 [ 986.178388] FS: 0000000000000000(0000) GS:ffff880435940000(0000) knlGS:0000000000000000 [ 986.188785] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 986.196559] CR2: 00007f1d0caf0000 CR3: 0000000002006000 CR4: 00000000001406e0 [ 986.205909] Stack: [ 986.209524] ffff8802be897b00 ffff880430e1c080 0000000000000011 0000006a35959780 [ 986.219236] 0000000000000011 ffff880427ef0008 0000000000000000 ffff8804359503d0 [ 986.228966] 0000000100029d93 ffffffff81794140 0000000000000000 ffffffff05000011 [ 986.238686] Call Trace: [ 986.242825] [<ffffffff81794140>] ? pkg_state_counter+0x80/0x80 [ 986.250866] [<ffffffff81794680>] ? powerclamp_set_cur_state+0x180/0x180 [ 986.259797] [<ffffffff8111d1a9>] kthread+0xc9/0xe0 [ 986.266682] [<ffffffff8193d69f>] ret_from_fork+0x1f/0x40 [ 986.274142] [<ffffffff8111d0e0>] ? kthread_create_on_node+0x180/0x180 [ 986.282869] Code: d1 ea 48 89 d6 80 3d 6a d0 d4 00 00 ba 64 00 00 00 89 d8 41 0f 45 f5 0f af c2 42 8d 14 2e be 31 00 00 00 83 fa 31 0f 42 f2 31 d2 <f7> f6 48 8b 15 9e 07 87 00 48 8b 3d 97 07 87 00 48 63 f0 83 e8 [ 986.307806] RIP [<ffffffff81794859>] clamp_thread+0x1d9/0x600 [ 986.315871] RSP <ffff880427ef3e20> RIP points to the following lines: compensation = get_compensation(target_ratio); interval = duration_jiffies*100/(target_ratio+compensation); A solution would be to switch the following two commands in powerclamp_set_cur_state(): set_target_ratio = 0; end_power_clamp(); But I think that the zero division might happen also when target_ratio is non-zero because the compensation might be negative. Therefore we also check the sum of target_ratio and compensation explicitly. Also the compensated_ratio variable is always set. Therefore there is no need to initialize it. Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
The memory allocated by iov_iter_get_pages_alloc() can be allocated with vmalloc() if kmalloc() failed -- see get_pages_array(). In that case we need to free it with vfree(), so let's use kvfree(). The bug manifests like this: BUG: unable to handle kernel paging request at ffffeb0400072da0 IP: [<ffffffff8139c67b>] kfree+0x4b/0x140 PGD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 2 PID: 675 Comm: trinity-c2 Not tainted 4.7.0-rc7+ torvalds#14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff8800badef2c0 ti: ffff880069208000 task.ti: ffff880069208000 RIP: 0010:[<ffffffff8139c67b>] [<ffffffff8139c67b>] kfree+0x4b/0x140 RSP: 0000:ffff88006920f3f0 EFLAGS: 00010282 RAX: ffffea0000000000 RBX: ffffc90001cb6000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffc90001cb6000 RBP: ffff88006920f410 R08: 0000000000000000 R09: dffffc0000000000 R10: ffff8800badefa30 R11: 0000056a3d3b0d9f R12: ffff88006920f620 R13: ffffeb0400072d80 R14: ffff8800baa94078 R15: 0000000000000000 FS: 00007fbd2b437700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffeb0400072da0 CR3: 000000006926d000 CR4: 00000000000006e0 Stack: 0000000000000001 ffff88006920f620 ffffed001755280f ffff8800baa94078 ffff88006920f6a8 ffffffff8310442b dffffc0000000000 ffff8800badefa30 ffff8800badefa28 ffff88011af1fba0 1ffff1000d241e98 ffff8800ba892150 Call Trace: [<ffffffff8310442b>] p9_virtio_zc_request+0x72b/0xdb0 [<ffffffff830f2116>] p9_client_zc_rpc.constprop.8+0x246/0xb10 [<ffffffff830f5d79>] p9_client_read+0x4c9/0x750 [<ffffffff8175ceac>] v9fs_fid_readpage+0x14c/0x320 [<ffffffff8175d0b6>] v9fs_vfs_readpage+0x36/0x50 [<ffffffff812c6f13>] filemap_fault+0x9a3/0xe60 [<ffffffff81331878>] __do_fault+0x158/0x300 [<ffffffff81339e01>] handle_mm_fault+0x1cf1/0x3c80 [<ffffffff810c0aaa>] __do_page_fault+0x30a/0x8e0 [<ffffffff810c10df>] do_page_fault+0x2f/0x80 [<ffffffff810b5b07>] do_async_page_fault+0x27/0xa0 [<ffffffff83296c48>] async_page_fault+0x28/0x30 Code: 00 80 41 54 53 49 01 fd 48 0f 42 05 b0 39 67 02 48 89 fb 49 01 c5 48 b8 00 00 00 00 00 ea ff ff 49 c1 ed 0c 49 c1 e5 06 49 01 c5 <49> 8b 45 20 48 8d 50 ff a8 01 4c 0f 45 ea 49 8b 55 20 48 8d 42 RIP [<ffffffff8139c67b>] kfree+0x4b/0x140 RSP <ffff88006920f3f0> CR2: ffffeb0400072da0 ---[ end trace f3d59a04bafec038 ]--- Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
Fix a use of a packet after it has been enqueued onto the packet processing queue in the data_ready handler. Once on a call's Rx queue, we mustn't touch it any more as it may be dequeued and freed by the call processor running on a work queue. Save the values we need before enqueuing. Without this, we can get an oops like the following: BUG: unable to handle kernel NULL pointer dereference at 000000000000009c IP: [<ffffffffa01854e8>] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc] PGD 0 Oops: 0000 [#1] SMP Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 4.7.0-fsdevel+ #1336 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 task: ffff88040d6863c0 task.stack: ffff88040d68c000 RIP: 0010:[<ffffffffa01854e8>] [<ffffffffa01854e8>] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc] RSP: 0018:ffff88041fb03a78 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffff8803ff195b00 RCX: 0000000000000001 RDX: ffffffffa01854d1 RSI: 0000000000000008 RDI: ffff8803ff195b00 RBP: ffff88041fb03ab0 R08: 0000000000000000 R09: 0000000000000001 R10: ffff88041fb038c8 R11: 0000000000000000 R12: ffff880406874800 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000009c CR3: 0000000001c14000 CR4: 00000000001406e0 Stack: ffff8803ff195ea0 ffff880408348800 ffff880406874800 ffff8803ff195b00 ffff880408348800 ffff8803ff195ed8 0000000000000000 ffff88041fb03af0 ffffffffa0186072 0000000000000000 ffff8804054da000 0000000000000000 Call Trace: <IRQ> [<ffffffffa0186072>] rxrpc_data_ready+0x89d/0xbae [af_rxrpc] [<ffffffff814c94d7>] __sock_queue_rcv_skb+0x24c/0x2b2 [<ffffffff8155c59a>] __udp_queue_rcv_skb+0x4b/0x1bd [<ffffffff8155e048>] udp_queue_rcv_skb+0x281/0x4db [<ffffffff8155ea8f>] __udp4_lib_rcv+0x7ed/0x963 [<ffffffff8155ef9a>] udp_rcv+0x15/0x17 [<ffffffff81531d86>] ip_local_deliver_finish+0x1c3/0x318 [<ffffffff81532544>] ip_local_deliver+0xbb/0xc4 [<ffffffff81531bc3>] ? inet_del_offload+0x40/0x40 [<ffffffff815322a9>] ip_rcv_finish+0x3ce/0x42c [<ffffffff81532851>] ip_rcv+0x304/0x33d [<ffffffff81531edb>] ? ip_local_deliver_finish+0x318/0x318 [<ffffffff814dff9d>] __netif_receive_skb_core+0x601/0x6e8 [<ffffffff814e072e>] __netif_receive_skb+0x13/0x54 [<ffffffff814e082a>] netif_receive_skb_internal+0xbb/0x17c [<ffffffff814e1838>] napi_gro_receive+0xf9/0x1bd [<ffffffff8144eb9f>] rtl8169_poll+0x32b/0x4a8 [<ffffffff814e1c7b>] net_rx_action+0xe8/0x357 [<ffffffff81051074>] __do_softirq+0x1aa/0x414 [<ffffffff810514ab>] irq_exit+0x3d/0xb0 [<ffffffff810184a2>] do_IRQ+0xe4/0xfc [<ffffffff81612053>] common_interrupt+0x93/0x93 <EOI> [<ffffffff814af837>] ? cpuidle_enter_state+0x1ad/0x2be [<ffffffff814af832>] ? cpuidle_enter_state+0x1a8/0x2be [<ffffffff814af96a>] cpuidle_enter+0x12/0x14 [<ffffffff8108956f>] call_cpuidle+0x39/0x3b [<ffffffff81089855>] cpu_startup_entry+0x230/0x35d [<ffffffff810312ea>] start_secondary+0xf4/0xf7 Signed-off-by: David Howells <dhowells@redhat.com>
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
Commit 8d460f6 ("powerpc/process: Add the function flush_tmregs_to_thread") added flush_tmregs_to_thread() and included the assumption that it would only be called for a task which is not current. Although this is correct for ptrace, when generating a core dump, some of the routines which call flush_tmregs_to_thread() are called. This leads to a WARNing such as: Not expecting ptrace on self: TM regs may be incorrect ------------[ cut here ]------------ WARNING: CPU: 123 PID: 7727 at arch/powerpc/kernel/process.c:1088 flush_tmregs_to_thread+0x78/0x80 CPU: 123 PID: 7727 Comm: libvirtd Not tainted 4.8.0-rc1-gcc6x-g61e8a0d #1 task: c000000fe631b600 task.stack: c000000fe63b0000 NIP: c00000000001a1a8 LR: c00000000001a1a4 CTR: c000000000717780 REGS: c000000fe63b3420 TRAP: 0700 Not tainted (4.8.0-rc1-gcc6x-g61e8a0d) MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28004222 XER: 20000000 ... NIP [c00000000001a1a8] flush_tmregs_to_thread+0x78/0x80 LR [c00000000001a1a4] flush_tmregs_to_thread+0x74/0x80 Call Trace: flush_tmregs_to_thread+0x74/0x80 (unreliable) vsr_get+0x64/0x1a0 elf_core_dump+0x604/0x1430 do_coredump+0x5fc/0x1200 get_signal+0x398/0x740 do_signal+0x54/0x2b0 do_notify_resume+0x98/0xb0 ret_from_except_lite+0x70/0x74 So fix flush_tmregs_to_thread() to detect the case where it is called on current, and a transaction is active, and in that case flush the TM regs to the thread_struct. This patch also moves flush_tmregs_to_thread() into ptrace.c as it is only called from that file. Fixes: 8d460f6 ("powerpc/process: Add the function flush_tmregs_to_thread") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
Both set_memory_ro() and set_memory_rw() will modify the page attributes of at least one page, even if the numpages parameter is zero. The author expected that calling these functions with numpages == zero would never happen. However with the new 444d13f ("modules: add ro_after_init support") feature this happens frequently. Therefore do the right thing and make these two functions return gracefully if nothing should be done. Fixes crashes on module load like this one: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 000003ff80008000 TEID: 000003ff80008407 Fault in home space mode while using kernel ASCE. AS:0000000000d18007 R3:00000001e6aa4007 S:00000001e6a10800 P:00000001e34ee21d Oops: 0004 ilc:3 [#1] SMP Modules linked in: x_tables CPU: 10 PID: 1 Comm: systemd Not tainted 4.7.0-11895-g3fa9045 #4 Hardware name: IBM 2964 N96 703 (LPAR) task: 00000001e9118000 task.stack: 00000001e9120000 Krnl PSW : 0704e00180000000 00000000005677f8 (rb_erase+0xf0/0x4d0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 000003ff80008b20 000003ff80008b20 000003ff80008b70 0000000000b9d608 000003ff80008b20 0000000000000000 00000001e9123e88 000003ff80008950 00000001e485ab40 000003ff00000000 000003ff80008b00 00000001e4858480 0000000100000000 000003ff80008b68 00000000001d5998 00000001e9123c28 Krnl Code: 00000000005677e8: ec1801c3007c cgij %r1,0,8,567b6e 00000000005677ee: e32010100020 cg %r2,16(%r1) #00000000005677f4: a78401c2 brc 8,567b78 >00000000005677f8: e35010080024 stg %r5,8(%r1) 00000000005677fe: ec5801af007c cgij %r5,0,8,567b5c 0000000000567804: e30050000024 stg %r0,0(%r5) 000000000056780a: ebacf0680004 lmg %r10,%r12,104(%r15) 0000000000567810: 07fe bcr 15,%r14 Call Trace: ([<000003ff80008900>] __this_module+0x0/0xffffffffffffd700 [x_tables]) ([<0000000000264fd4>] do_init_module+0x12c/0x220) ([<00000000001da14a>] load_module+0x24e2/0x2b10) ([<00000000001da976>] SyS_finit_module+0xbe/0xd8) ([<0000000000803b26>] system_call+0xd6/0x264) Last Breaking-Event-Address: [<000000000056771a>] rb_erase+0x12/0x4d0 Kernel panic - not syncing: Fatal exception: panic_on_oops Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Fixes: e8a97e4 ("s390/pageattr: allow kernel page table splitting") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
The following oops occurs after a pgdat is hotadded: Unable to handle kernel paging request for data at address 0x00c30001 Faulting instruction address: 0xc00000000022f8f4 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter nls_utf8 isofs sg virtio_balloon uio_pdrv_genirq uio ip_tables xfs libcrc32c sr_mod cdrom sd_mod virtio_net ibmvscsi scsi_transport_srp virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.8.0-rc1-device torvalds#110 task: c000000000ef3080 task.stack: c000000000f6c000 NIP: c00000000022f8f4 LR: c00000000022f948 CTR: 0000000000000000 REGS: c000000000f6fa50 TRAP: 0300 Tainted: G W (4.8.0-rc1-device) MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 84002028 XER: 20000000 CFAR: d000000001d2013c DAR: 0000000000c30001 DSISR: 40000000 SOFTE: 0 NIP refresh_cpu_vm_stats+0x1a4/0x2f0 LR refresh_cpu_vm_stats+0x1f8/0x2f0 Call Trace: refresh_cpu_vm_stats+0x1f8/0x2f0 (unreliable) Add per_cpu_nodestats initialization to the hotplug codepath. Link: http://lkml.kernel.org/r/1470931473-7090-1-git-send-email-arbab@linux.vnet.ibm.com Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
Executing from a non-executable area gives an ugly message: lkdtm: Performing direct entry EXEC_RODATA lkdtm: attempting ok execution at ffff0000084c0e08 lkdtm: attempting bad execution at ffff000008880700 Bad mode in Synchronous Abort handler detected on CPU2, code 0x8400000e -- IABT (current EL) CPU: 2 PID: 998 Comm: sh Not tainted 4.7.0-rc2+ torvalds#13 Hardware name: linux,dummy-virt (DT) task: ffff800077e35780 ti: ffff800077970000 task.ti: ffff800077970000 PC is at lkdtm_rodata_do_nothing+0x0/0x8 LR is at execute_location+0x74/0x88 The 'IABT (current EL)' indicates the error but it's a bit cryptic without knowledge of the ARM ARM. There is also no indication of the specific address which triggered the fault. The increase in kernel page permissions makes hitting this case more likely as well. Handling the case in the vectors gives a much more familiar looking error message: lkdtm: Performing direct entry EXEC_RODATA lkdtm: attempting ok execution at ffff0000084c0840 lkdtm: attempting bad execution at ffff000008880680 Unable to handle kernel paging request at virtual address ffff000008880680 pgd = ffff8000089b2000 [ffff000008880680] *pgd=00000000489b4003, *pud=0000000048904003, *pmd=0000000000000000 Internal error: Oops: 8400000e [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 997 Comm: sh Not tainted 4.7.0-rc1+ torvalds#24 Hardware name: linux,dummy-virt (DT) task: ffff800077f9f080 ti: ffff800008a1c000 task.ti: ffff800008a1c000 PC is at lkdtm_rodata_do_nothing+0x0/0x8 LR is at execute_location+0x74/0x88 Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
weener
pushed a commit
that referenced
this pull request
Aug 18, 2016
tipc_msg_create() can return a NULL skb and if so, we shouldn't try to call tipc_node_xmit_skb() on it. general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ torvalds#19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000 RIP: 0010:[<ffffffff830bb46b>] [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140 RSP: 0018:ffff8800595bfce8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0 RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580 RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000 R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000 FS: 00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0 DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602 Stack: 0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208 ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018 0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611 Call Trace: [<ffffffff830c84a3>] tipc_shutdown+0x553/0x880 [<ffffffff825b4a3b>] SyS_shutdown+0x14b/0x170 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 <80> 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00 RIP [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140 RSP <ffff8800595bfce8> ---[ end trace 57b0484e351e71f1 ]--- I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure userspace is equipped to handle that. Anyway, this is better than a GPF and looks somewhat consistent with other tipc_msg_create() callers. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.