
Commit 88c0cda

btrfs: fix double accounting of ordered extents during errors
[BUG]
Btrfs will fail generic/750 randomly if its sector size is smaller than
page size.

One of the warnings looks like this:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 90263 at fs/btrfs/ordered-data.c:360 can_finish_ordered_extent+0x33c/0x390 [btrfs]
 CPU: 1 UID: 0 PID: 90263 Comm: kworker/u18:1 Tainted: G OE 6.12.0-rc3-custom+ #79
 Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
 pc : can_finish_ordered_extent+0x33c/0x390 [btrfs]
 lr : can_finish_ordered_extent+0xdc/0x390 [btrfs]
 Call trace:
  can_finish_ordered_extent+0x33c/0x390 [btrfs]
  btrfs_mark_ordered_io_finished+0x130/0x2b8 [btrfs]
  extent_writepage+0xfc/0x338 [btrfs]
  extent_write_cache_pages+0x1d4/0x4b8 [btrfs]
  btrfs_writepages+0x94/0x158 [btrfs]
  do_writepages+0x74/0x190
  filemap_fdatawrite_wbc+0x88/0xc8
  start_delalloc_inodes+0x180/0x3b0 [btrfs]
  btrfs_start_delalloc_roots+0x17c/0x288 [btrfs]
  shrink_delalloc+0x11c/0x280 [btrfs]
  flush_space+0x27c/0x310 [btrfs]
  btrfs_async_reclaim_metadata_space+0xcc/0x208 [btrfs]
  process_one_work+0x228/0x670
  worker_thread+0x1bc/0x360
  kthread+0x100/0x118
  ret_from_fork+0x10/0x20
 irq event stamp: 9784200
 hardirqs last enabled at (9784199): [<ffffd21ec54dc01c>] _raw_spin_unlock_irqrestore+0x74/0x80
 hardirqs last disabled at (9784200): [<ffffd21ec54db374>] _raw_spin_lock_irqsave+0x8c/0xa0
 softirqs last enabled at (9784148): [<ffffd21ec472ff44>] handle_softirqs+0x45c/0x4b0
 softirqs last disabled at (9784141): [<ffffd21ec46d01e4>] __do_softirq+0x1c/0x28
 ---[ end trace 0000000000000000 ]---
 BTRFS critical (device dm-2): bad ordered extent accounting, root=5 ino=1492 OE offset=1654784 OE len=57344 to_dec=49152 left=0

[CAUSE]
Several error paths are not properly handled during folio writeback:

1) Partially submitted folio
   If extent_writepage_io() hits an error (the only possible case is
   submit_one_sector() failing to grab an extent map), the folio can be
   left partially submitted.

   Since extent_writepage_io() failed, we need to call
   btrfs_mark_ordered_io_finished() to clean up the range that was not
   submitted. But we call btrfs_mark_ordered_io_finished() for the
   already submitted range too, causing double accounting.

2) Partially created ordered extents
   We can also fail at writepage_delalloc(), which stops creating new
   ordered extents once it hits any error from
   btrfs_run_delalloc_range().

   In that case, we call btrfs_mark_ordered_io_finished() for ranges
   where there is no ordered extent at all.

Both bugs only affect sector size < page size cases.

[FIX]
- Introduce a new member btrfs_bio_ctrl::last_submitted
  This tracks the end of the last sector submitted through
  extent_writepage_io().

  So for the above extent_writepage() case, we know exactly which
  sectors are submitted and should not do the ordered extent accounting
  for them.

- Clear the submit_bitmap for ranges where no ordered extent is created
  So if btrfs_run_delalloc_range() failed for a range, it will not be
  cleaned up.

- Pass NULL to __unlock_for_delalloc() inside writepage_delalloc() for
  error handling
  Since the folio range will no longer be unlocked later (its
  submit_bitmap bits are cleared), we have to unlock the folio in the
  error handling.

- Introduce a helper cleanup_ordered_extents()
  This does a sector-by-sector cleanup, taking
  btrfs_bio_ctrl::last_submitted and btrfs_bio_ctrl::submit_bitmap into
  consideration.

  Using @last_submitted avoids double accounting on the submitted
  ranges.
  Meanwhile using @submit_bitmap avoids touching ranges going through
  compression.

cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu@suse.com>
1 parent 1652a49 commit 88c0cda

1 file changed, +48 -8 lines changed


fs/btrfs/extent_io.c

@@ -108,6 +108,14 @@ struct btrfs_bio_ctrl {
 	 * This is to avoid touching ranges covered by compression/inline.
 	 */
 	unsigned long submit_bitmap;
+
+	/*
+	 * The end (exclusive) of the last submitted range in the folio.
+	 *
+	 * This is for sector size < page size case where we may hit error
+	 * half way.
+	 */
+	u64 last_submitted;
 };
 
 static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
@@ -1247,18 +1255,25 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 			 */
 			unlock_extent(&inode->io_tree, found_start,
 				      found_start + found_len - 1, NULL);
-			__unlock_for_delalloc(&inode->vfs_inode, folio,
+			__unlock_for_delalloc(&inode->vfs_inode, NULL,
 					      found_start,
 					      found_start + found_len - 1);
 		}
 
 		/*
 		 * We have some ranges that's going to be submitted asynchronously
-		 * (compression or inline). These range have their own control
+		 * (compression or inline, ret > 0). These range have their own control
 		 * on when to unlock the pages. We should not touch them
-		 * anymore, so clear the range from the submission bitmap.
+		 * anymore.
+		 *
+		 * We can also have some ranges where we didn't even call
+		 * btrfs_run_delalloc_range() (as previous run failed, ret < 0).
+		 * These error ranges should not be submitted nor cleaned up as
+		 * there is no ordered extent allocated for them.
+		 *
+		 * For either cases, we should clear the submit_bitmap.
 		 */
-		if (ret > 0) {
+		if (ret) {
 			unsigned int start_bit = (found_start - page_start) >>
 						 fs_info->sectorsize_bits;
 			unsigned int end_bit = (min(page_end + 1, found_start + found_len) -
@@ -1435,6 +1450,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 		ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
 		if (ret < 0)
 			goto out;
+		bio_ctrl->last_submitted = cur + fs_info->sectorsize;
 		submitted_io = true;
 	}
 out:
@@ -1453,6 +1469,24 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 	return ret;
 }
 
+static void cleanup_ordered_extents(struct btrfs_inode *inode,
+				    struct folio *folio, u64 file_pos,
+				    u64 num_bytes, unsigned long *bitmap)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	unsigned int cur_bit = (file_pos - folio_pos(folio)) >> fs_info->sectorsize_bits;
+
+	for_each_set_bit_from(cur_bit, bitmap, fs_info->sectors_per_page) {
+		u64 cur_pos = folio_pos(folio) + (cur_bit << fs_info->sectorsize_bits);
+
+		if (cur_pos >= file_pos + num_bytes)
+			break;
+
+		btrfs_mark_ordered_io_finished(inode, folio, cur_pos,
+					       fs_info->sectorsize, false);
+	}
+}
+
 /*
  * the writepage semantics are similar to regular writepage. extent
  * records are inserted to lock ranges in the tree, and as dirty areas
@@ -1492,6 +1526,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 	 * The proper bitmap can only be initialized until writepage_delalloc().
 	 */
 	bio_ctrl->submit_bitmap = (unsigned long)-1;
+	bio_ctrl->last_submitted = page_start;
 	ret = set_folio_extent_mapped(folio);
 	if (ret < 0)
 		goto done;
@@ -1511,8 +1546,10 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 
 done:
 	if (ret) {
-		btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
-					       page_start, PAGE_SIZE, !ret);
+		cleanup_ordered_extents(BTRFS_I(inode), folio,
+					bio_ctrl->last_submitted,
+					page_start + PAGE_SIZE - bio_ctrl->last_submitted,
+					&bio_ctrl->submit_bitmap);
 		mapping_set_error(folio->mapping, ret);
 	}
 
@@ -2288,14 +2325,17 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
 		 * extent_writepage_io() will do the truncation correctly.
 		 */
 		bio_ctrl.submit_bitmap = (unsigned long)-1;
+		bio_ctrl.last_submitted = cur;
 		ret = extent_writepage_io(BTRFS_I(inode), folio, cur, cur_len,
 					  &bio_ctrl, i_size);
 		if (ret == 1)
 			goto next_page;
 
 		if (ret) {
-			btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
-						       cur, cur_len, !ret);
+			cleanup_ordered_extents(BTRFS_I(inode), folio,
+						bio_ctrl.last_submitted,
+						cur_end + 1 - bio_ctrl.last_submitted,
+						&bio_ctrl.submit_bitmap);
 			mapping_set_error(mapping, ret);
 		}
 		btrfs_folio_end_lock(fs_info, folio, cur, cur_len);
