From 14d5e800a52a2fdca550c0defa68b37e2dbb34d3 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 23 May 2013 10:37:18 +1000 Subject: [PATCH] mm-vmscan-block-kswapd-if-it-is-encountering-pages-under-writeback-fix Historically, kswapd used to congestion_wait() at higher priorities if it was not making forward progress. This made no sense as the failure to make progress could be completely independent of IO. It was later replaced by wait_iff_congested() and removed entirely by commit 258401a6 (mm: don't wait on congested zones in balance_pgdat()) as it was duplicating logic in shrink_inactive_list(). This is problematic. If kswapd encounters many pages under writeback and it continues to scan until it reaches the high watermark then it will quickly skip over the pages under writeback and reclaim clean young pages or push applications out to swap. The use of wait_iff_congested() is not suited to kswapd as it will only stall if the underlying BDI is really congested or a direct reclaimer was unable to write to the underlying BDI. kswapd bypasses the BDI congestion as it sets PF_SWAPWRITE but even if this was taken into account then it would cause direct reclaimers to stall on writeback which is not desirable. This patch sets a ZONE_WRITEBACK flag if direct reclaim or kswapd is encountering too many pages under writeback. If this flag is set and kswapd encounters a PageReclaim page under writeback then it'll assume that the LRU lists are being recycled too quickly before IO can complete and block waiting for some IO to complete. Signed-off-by: Mel Gorman Cc: Mel Gorman Cc: Michal Hocko Cc: Rik van Riel Cc: Johannes Weiner Cc: KAMEZAWA Hiroyuki Cc: Jiri Slaby Cc: Valdis Kletnieks Cc: Zlatko Calusic Cc: dormando Signed-off-by: Andrew Morton --- mm/vmscan.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 911c9cd3d81a..45aee36d9b93 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -732,9 +732,11 @@ static unsigned long shrink_page_list(struct list_head *page_list, * under writeback and this page is both under writeback and * PageReclaim then it indicates that pages are being queued * for IO but are being recycled through the LRU before the - * IO can complete. In this case, wait on the IO to complete - * and then clear the ZONE_WRITEBACK flag to recheck if the - * condition exists. + * IO can complete. Waiting on the page itself risks an + * indefinite stall if it is impossible to writeback the + * page due to IO error or disconnected storage so instead + * block for HZ/10 or until some IO completes then clear the + * ZONE_WRITEBACK flag to recheck if the condition exists. * * 2) Global reclaim encounters a page, memcg encounters a * page that is not marked for immediate reclaim or @@ -764,7 +766,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, if (current_is_kswapd() && PageReclaim(page) && zone_is_reclaim_writeback(zone)) { - wait_on_page_writeback(page); + congestion_wait(BLK_RW_ASYNC, HZ/10); zone_clear_flag(zone, ZONE_WRITEBACK); /* Case 2 above */ -- 2.39.5