Skip to content

Commit 2827e29

Browse files
fdmananakdave
authored andcommitted
btrfs: immediately drop extent maps after failed COW write
If a write path in COW mode fails, either before submitting a bio for the new extents or an actual IO error happens, we can end up allowing a fast fsync to log file extent items that point to unwritten extents. This is because the ordered extent completion for a failed write is executed in a work queue. This means that once the write path unlocks the inode, a fast fsync can come and log the extent maps created by the write attempt before the work queue completes the ordered extent. For example consider a direct IO write, in COW mode, that fails at btrfs_dio_submit_io() because btrfs_extract_ordered_extent() returned an error: 1) We call btrfs_finish_ordered_extent() with the 'uptodate' parameter set to false, meaning an error happened; 2) That results in marking the ordered extent with the BTRFS_ORDERED_IOERR flag; 3) btrfs_finish_ordered_extent() queues the completion of the ordered extent - so that btrfs_finish_one_ordered() will be executed later in a work queue. That function will drop extents maps in the range when it's executed, since the extent maps point to unwritten locations (signaled by the BTRFS_ORDERED_IOERR flag); 4) After calling btrfs_finish_ordered_extent() we keep going down the write path and unlock the inode; 5) After that a fast fsync starts and locks the inode; 6) Before the work queue executes btrfs_finish_one_ordered(), the fsync task sees the extent maps that point to the unwritten locations and logs file extent items based on them - it does not know they are unwritten, and the fast fsync path does not wait for ordered extents to complete in order to reduce latency. So to fix this make btrfs_finish_ordered_extent() drop the extent maps in the range if an error happened for a COW write. Note that this issues of using extent maps that point to unwritten locations can not happen for reads, because in read paths we start by locking the extent range and wait for any ordered extents in the range to complete before looking for extent maps. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
1 parent 9e82d3e commit 2827e29

File tree

1 file changed

+27
-0
lines changed

1 file changed

+27
-0
lines changed

fs/btrfs/ordered-data.c

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -388,6 +388,33 @@ bool btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
388388
ret = can_finish_ordered_extent(ordered, page, file_offset, len, uptodate);
389389
spin_unlock_irqrestore(&inode->ordered_tree_lock, flags);
390390

391+
/*
392+
* If this is a COW write it means we created new extent maps for the
393+
* range and they point to an unwritten location if we got an error
394+
* either before submitting a bio or during IO.
395+
*
396+
* We have marked the ordered extent with BTRFS_ORDERED_IOERR, and we
397+
* are queuing its completion below. During completion, at
398+
* btrfs_finish_one_ordered(), we will drop the extent maps for the
399+
* unwritten extents.
400+
*
401+
* However because completion runs in a work queue we can end up
402+
* unlocking the inode before the ordered extent is completed.
403+
*
404+
* That means that a fast fsync can happen before the work queue
405+
* executes the completion of the ordered extent, and in that case
406+
* the fsync will use the extent maps that point to unwritten extents,
407+
* resulting in logging file extent items that point to unwritten
408+
* locations. Unlike read paths, a fast fsync doesn't wait for ordered
409+
* extent completion before proceeding (intentional to reduce latency).
410+
*
411+
* To be safe drop the new extent maps in the range (if are doing COW)
412+
* right here before we unlock the inode and allow a fsync to run.
413+
*/
414+
if (!uptodate && !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
415+
btrfs_drop_extent_map_range(inode, file_offset,
416+
file_offset + len - 1, false);
417+
391418
if (ret)
392419
btrfs_queue_ordered_fn(ordered);
393420
return ret;

0 commit comments

Comments
 (0)