Skip to content

Commit 8bf334b

Browse files
adam900710kdave
authored andcommitted
btrfs: fix double accounting race when extent_writepage_io() failed
[BUG] If submit_one_sector() failed inside extent_writepage_io() for sector size < page size cases (e.g. 4K sector size and 64K page size), then we can hit double ordered extent accounting error. This should be very rare, as submit_one_sector() only fails when we failed to grab the extent map, and such extent map should exist inside the memory and has been pinned. [CAUSE] For example we have the following folio layout: 0 4K 32K 48K 60K 64K |//| |//////| |///| Where |///| is the dirty range we need to writeback. The 3 different dirty ranges are submitted for regular COW. Now we hit the following sequence: - submit_one_sector() returned 0 for [0, 4K) - submit_one_sector() returned 0 for [32K, 48K) - submit_one_sector() returned error for [60K, 64K) - btrfs_mark_ordered_io_finished() called for the whole folio This will mark the following ranges as finished: * [0, 4K) * [32K, 48K) Both ranges have their IO already submitted, this cleanup will lead to double accounting. * [60K, 64K) That's the correct cleanup. The only good news is, this error is only theoretical, as the target extent map is always pinned, thus we should directly grab it from memory, other than reading it from the disk. [FIX] Instead of calling btrfs_mark_ordered_io_finished() for the whole folio range, which can touch ranges we should not touch, instead move the error handling inside extent_writepage_io(). So that we can cleanup exact sectors that ought to be submitted but failed. This provides much more accurate cleanup, avoiding the double accounting. CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
1 parent 72dad8e commit 8bf334b

File tree

1 file changed

+24
-13
lines changed

1 file changed

+24
-13
lines changed

fs/btrfs/extent_io.c

+24-13
Original file line numberDiff line numberDiff line change
@@ -1420,6 +1420,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
14201420
struct btrfs_fs_info *fs_info = inode->root->fs_info;
14211421
unsigned long range_bitmap = 0;
14221422
bool submitted_io = false;
1423+
bool error = false;
14231424
const u64 folio_start = folio_pos(folio);
14241425
u64 cur;
14251426
int bit;
@@ -1462,20 +1463,38 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
14621463
break;
14631464
}
14641465
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
1465-
if (ret < 0)
1466-
goto out;
1466+
if (unlikely(ret < 0)) {
1467+
/*
1468+
* bio_ctrl may contain a bio crossing several folios.
1469+
* Submit it immediately so that the bio has a chance
1470+
* to finish normally, other than marked as error.
1471+
*/
1472+
submit_one_bio(bio_ctrl);
1473+
/*
1474+
* Failed to grab the extent map which should be very rare.
1475+
* Since there is no bio submitted to finish the ordered
1476+
* extent, we have to manually finish this sector.
1477+
*/
1478+
btrfs_mark_ordered_io_finished(inode, folio, cur,
1479+
fs_info->sectorsize, false);
1480+
error = true;
1481+
continue;
1482+
}
14671483
submitted_io = true;
14681484
}
1469-
out:
1485+
14701486
/*
14711487
* If we didn't submitted any sector (>= i_size), folio dirty get
14721488
* cleared but PAGECACHE_TAG_DIRTY is not cleared (only cleared
14731489
* by folio_start_writeback() if the folio is not dirty).
14741490
*
14751491
* Here we set writeback and clear for the range. If the full folio
14761492
* is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag.
1493+
*
1494+
* If we hit any error, the corresponding sector will still be dirty
1495+
* thus no need to clear PAGECACHE_TAG_DIRTY.
14771496
*/
1478-
if (!submitted_io) {
1497+
if (!submitted_io && !error) {
14791498
btrfs_folio_set_writeback(fs_info, folio, start, len);
14801499
btrfs_folio_clear_writeback(fs_info, folio, start, len);
14811500
}
@@ -1495,7 +1514,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
14951514
{
14961515
struct btrfs_inode *inode = BTRFS_I(folio->mapping->host);
14971516
struct btrfs_fs_info *fs_info = inode->root->fs_info;
1498-
const u64 page_start = folio_pos(folio);
14991517
int ret;
15001518
size_t pg_offset;
15011519
loff_t i_size = i_size_read(&inode->vfs_inode);
@@ -1538,10 +1556,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
15381556

15391557
bio_ctrl->wbc->nr_to_write--;
15401558

1541-
if (ret)
1542-
btrfs_mark_ordered_io_finished(inode, folio,
1543-
page_start, PAGE_SIZE, !ret);
1544-
15451559
done:
15461560
if (ret < 0)
15471561
mapping_set_error(folio->mapping, ret);
@@ -2314,11 +2328,8 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
23142328
if (ret == 1)
23152329
goto next_page;
23162330

2317-
if (ret) {
2318-
btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
2319-
cur, cur_len, !ret);
2331+
if (ret)
23202332
mapping_set_error(mapping, ret);
2321-
}
23222333
btrfs_folio_end_lock(fs_info, folio, cur, cur_len);
23232334
if (ret < 0)
23242335
found_error = true;

0 commit comments

Comments
 (0)