mm: filemap: update find_get_pages_tag() to deal with shadow entries

Dave Jones reports the following crash when find_get_pages_tag() runs into
an exceptional entry:
kernel BUG at mm/filemap.c:1347!
RIP: 0010:[<ffffffffb815aeab>] [<ffffffffb815aeab>] find_get_pages_tag+0x1cb/0x220
Call Trace:
[<ffffffffb815ad16>] ? find_get_pages_tag+0x36/0x220
[<ffffffffb8168511>] pagevec_lookup_tag+0x21/0x30
[<ffffffffb81595de>] filemap_fdatawait_range+0xbe/0x1e0
[<ffffffffb8159727>] filemap_fdatawait+0x27/0x30
[<ffffffffb81f2fa4>] sync_inodes_sb+0x204/0x2a0
[<ffffffffb874d98f>] ? wait_for_completion+0xff/0x130
[<ffffffffb81fa5b0>] ? vfs_fsync+0x40/0x40
[<ffffffffb81fa5c9>] sync_inodes_one_sb+0x19/0x20
[<ffffffffb81caab2>] iterate_supers+0xb2/0x110
[<ffffffffb81fa864>] sys_sync+0x44/0xb0
[<ffffffffb875c4a9>] ia32_do_call+0x13/0x13
1343 /*
1344 * This function is never used on a shmem/tmpfs
1345 * mapping, so a swap entry won't be found here.
1346 */
1347 BUG();
After 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache
radix trees") this comment and BUG() are out of date because exceptional
entries can now appear in all mappings - as shadows of recently evicted
pages.
However, as Hugh Dickins notes,
"it is truly surprising for a PAGECACHE_TAG_WRITEBACK (and probably
any other PAGECACHE_TAG_*) to appear on an exceptional entry.
I expect it comes down to an occasional race in RCU lookup of the
radix_tree: lacking absolute synchronization, we might sometimes
catch an exceptional entry, with the tag which really belongs with
the unexceptional entry which was there an instant before."
And indeed, not only is the tree walk lockless, the tags are also read in
chunks, one radix tree node at a time. That leaves plenty of time for page
reclaim to swoop in and replace a page that was already looked up as
tagged, putting a shadow entry in its slot.
Remove the BUG() and update the comment. While reviewing all other lookup
sites for whether they properly deal with shadow entries of evicted pages,
update all the comments and fix memcg file charge moving to not miss
shmem/tmpfs swapcache pages.
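
For illustration only, here is a user-space sketch of the behavior the
lookup loop should have after this change: a slot that was seen as tagged
but now holds an exceptional entry is skipped instead of hitting BUG().
This is not the kernel code. All sketch_* names, the flat slot array, the
separate tag snapshot and the bit value used to mark an exceptional entry
are assumptions made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker bit for a non-page (exceptional) entry. */
#define SKETCH_EXCEPTIONAL_BIT 0x2UL

/* Hypothetical stand-in for struct page. */
struct sketch_page {
	int id;
};

/* Encode a shadow entry: a non-pointer cookie with the marker bit set. */
static void *sketch_make_shadow(unsigned long cookie)
{
	return (void *)(uintptr_t)((cookie << 2) | SKETCH_EXCEPTIONAL_BIT);
}

/* Nonzero if the slot content is an exceptional entry, not a page pointer. */
static int sketch_is_exceptional(const void *entry)
{
	return ((uintptr_t)entry & SKETCH_EXCEPTIONAL_BIT) != 0;
}

/*
 * Collect up to max pages from slots whose tag was seen set.
 *
 * tags[] models a tag snapshot taken before the slots are read, so a
 * tagged slot may by now be empty or hold a shadow entry left behind
 * by reclaim. Both cases are skipped rather than treated as a bug.
 */
static size_t sketch_gather_tagged(void *const slots[],
				   const unsigned char tags[],
				   size_t nr_slots,
				   struct sketch_page *out[],
				   size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < nr_slots && found < max; i++) {
		void *entry = slots[i];

		if (!tags[i] || entry == NULL)
			continue;
		/* Stale tag on a shadow entry: ignore it and move on. */
		if (sketch_is_exceptional(entry))
			continue;
		out[found++] = entry;
	}
	return found;
}
```

With four tagged slots holding a page, a shadow entry, nothing and another
page, the sketch returns the two pages and silently drops the rest.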
Fixes: 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix trees")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>