Commit d1051d6

lorddoskias authored and kdave committed
btrfs: Fix error handling in btrfs_cleanup_ordered_extents
Running btrfs/124 in a loop hung up on me sporadically with the following call trace:

    btrfs           D    0  5760   5324 0x00000000
    Call Trace:
     ? __schedule+0x243/0x800
     schedule+0x33/0x90
     btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
     ? wait_woken+0xa0/0xa0
     btrfs_wait_ordered_range+0xbb/0x100 [btrfs]
     btrfs_relocate_block_group+0x1ff/0x230 [btrfs]
     btrfs_relocate_chunk+0x49/0x100 [btrfs]
     btrfs_balance+0xbeb/0x1740 [btrfs]
     btrfs_ioctl_balance+0x2ee/0x380 [btrfs]
     btrfs_ioctl+0x1691/0x3110 [btrfs]
     ? lockdep_hardirqs_on+0xed/0x180
     ? __handle_mm_fault+0x8e7/0xfb0
     ? _raw_spin_unlock+0x24/0x30
     ? __handle_mm_fault+0x8e7/0xfb0
     ? do_vfs_ioctl+0xa5/0x6e0
     ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
     do_vfs_ioctl+0xa5/0x6e0
     ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
     ksys_ioctl+0x3a/0x70
     __x64_sys_ioctl+0x16/0x20
     do_syscall_64+0x60/0x1b0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

This happens because during page writeback it's valid for writepage_delalloc to instantiate a delalloc range which doesn't belong to the page currently being written back.

This case is valid because find_lock_delalloc_range returns any available range after the passed delalloc_start, ignoring whether the page under writeback is within that range. In turn, ordered extents (OE) are always created for the range returned from find_lock_delalloc_range. If, however, a failure occurs while the OEs are being created, the cleanup code in btrfs_cleanup_ordered_extents is called.

Unfortunately, btrfs_cleanup_ordered_extents doesn't consider the case of such a 'foreign' range being processed; it always assumes that the range the OEs are created for belongs to the page. As a result, the first page of such a foreign range is never cleaned up, because the current cleanup code deliberately skips it.

Fix this by checking whether the current page belongs to the range being instantiated and, if so, adjusting the range parameters passed for cleaning up. If it doesn't, just clean the whole OE range directly.

Fixes: 5242726 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
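
The range adjustment described above can be sketched in userspace C. This is an illustration only, not the kernel code: `SKETCH_PAGE_SIZE`, `struct cleanup_range`, `page_in_range` and `adjust_cleanup_range` are hypothetical names, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SIZE instead. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical holder for the range handed to the ordered extent cleanup. */
struct cleanup_range {
	uint64_t offset;
	uint64_t bytes;
};

/* True when the whole page starting at page_start lies inside the range. */
static bool page_in_range(uint64_t page_start, uint64_t offset, uint64_t bytes)
{
	uint64_t page_end = page_start + SKETCH_PAGE_SIZE - 1;

	return page_start >= offset && page_end <= offset + bytes - 1;
}

/*
 * Mirrors the check added by the fix: skip one page only when the locked
 * page belongs to the range; a 'foreign' range is returned unchanged so
 * that it is cleaned up in full.
 */
static struct cleanup_range adjust_cleanup_range(uint64_t page_start,
						 uint64_t offset,
						 uint64_t bytes)
{
	struct cleanup_range r = { offset, bytes };

	assert(bytes > 0);
	if (page_in_range(page_start, offset, bytes)) {
		r.offset += SKETCH_PAGE_SIZE;
		r.bytes -= SKETCH_PAGE_SIZE;
	}
	return r;
}
```

With a locked page at offset 0, a range starting at 0 loses its first page, while a range starting at 8192 is passed through untouched. The old code shrank the range in both cases, which is what left the first page of a foreign range uncleaned.
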
1 parent 3522e90 commit d1051d6

1 file changed: +20 -9 lines changed
fs/btrfs/inode.c

@@ -110,17 +110,17 @@ static void __endio_write_update_ordered(struct inode *inode,
  * extent_clear_unlock_delalloc() to clear both the bits EXTENT_DO_ACCOUNTING
  * and EXTENT_DELALLOC simultaneously, because that causes the reserved metadata
  * to be released, which we want to happen only when finishing the ordered
- * extent (btrfs_finish_ordered_io()). Also note that the caller of
- * btrfs_run_delalloc_range already does proper cleanup for the first page of
- * the range, that is, it invokes the callback writepage_end_io_hook() for the
- * range of the first page.
+ * extent (btrfs_finish_ordered_io()).
  */
 static inline void btrfs_cleanup_ordered_extents(struct inode *inode,
-						 const u64 offset,
-						 const u64 bytes)
+						 struct page *locked_page,
+						 u64 offset, u64 bytes)
 {
 	unsigned long index = offset >> PAGE_SHIFT;
 	unsigned long end_index = (offset + bytes - 1) >> PAGE_SHIFT;
+	u64 page_start = page_offset(locked_page);
+	u64 page_end = page_start + PAGE_SIZE - 1;
+
 	struct page *page;
 
 	while (index <= end_index) {
@@ -131,8 +131,18 @@ static inline void btrfs_cleanup_ordered_extents(struct inode *inode,
 		ClearPagePrivate2(page);
 		put_page(page);
 	}
-	return __endio_write_update_ordered(inode, offset + PAGE_SIZE,
-					    bytes - PAGE_SIZE, false);
+
+	/*
+	 * In case this page belongs to the delalloc range being instantiated
+	 * then skip it, since the first page of a range is going to be
+	 * properly cleaned up by the caller of run_delalloc_range
+	 */
+	if (page_start >= offset && page_end <= (offset + bytes - 1)) {
+		offset += PAGE_SIZE;
+		bytes -= PAGE_SIZE;
+	}
+
+	return __endio_write_update_ordered(inode, offset, bytes, false);
 }
 
 static int btrfs_dirty_inode(struct inode *inode);
@@ -1603,7 +1613,8 @@ int btrfs_run_delalloc_range(void *private_data, struct page *locked_page,
 					   write_flags);
 	}
 	if (ret)
-		btrfs_cleanup_ordered_extents(inode, start, end - start + 1);
+		btrfs_cleanup_ordered_extents(inode, locked_page, start,
+					      end - start + 1);
 	return ret;
 }
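
For readers unfamiliar with the `index` / `end_index` arithmetic at the top of btrfs_cleanup_ordered_extents, the following userspace sketch shows how a byte range maps to the inclusive span of page indexes that the `while (index <= end_index)` loop walks. It assumes 4 KiB pages (a shift of 12); `SKETCH_PAGE_SHIFT`, `struct index_span` and `range_to_page_indexes` are hypothetical names, not kernel identifiers.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the shift is 12; the kernel uses PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical holder for the first and last page index of a byte range. */
struct index_span {
	unsigned long first;
	unsigned long last;
};

/*
 * Same arithmetic as the kernel function: the first index comes from the
 * range start, the last index from the final byte of the range (inclusive).
 */
static struct index_span range_to_page_indexes(uint64_t offset, uint64_t bytes)
{
	struct index_span s;

	s.first = (unsigned long)(offset >> SKETCH_PAGE_SHIFT);
	s.last = (unsigned long)((offset + bytes - 1) >> SKETCH_PAGE_SHIFT);
	return s;
}
```

Under these assumptions, a 16 KiB range starting at byte 8192 covers page indexes 2 through 5, and a single-page range starting at 0 covers only index 0.
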
