git.karo-electronics.de Git - karo-tx-linux.git/commit

The maximum size of a shmem/tmpfs file has been limited by the maximum
size of its triple-indirect swap vector.  With 4kB page size, maximum
filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of
that on a 64-bit kernel.  (With 8kB page size, maximum filesize was just
over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel, MAX_LFS_FILESIZE
being then more restrictive than swap vector layout.)
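
The figures above can be reproduced with a little arithmetic.  What follows
is only a userspace sketch, not the kernel code: it assumes the old layout
could index 16 direct entries plus (E*E/2)*(E+1) indirect ones, where E is
the number of unsigned longs in a page, and that MAX_LFS_FILESIZE is
(page size << 31) - 1 on 32-bit and the largest signed 64-bit value on
64-bit.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Direct entries held in the inode (assumed to be 16, as in the message). */
#define NR_DIRECT 16ULL

/* Largest byte size the assumed triple-indirect layout can index. */
static uint64_t swap_vector_max_bytes(uint64_t page_size, uint64_t long_size)
{
	assert(page_size != 0 && long_size != 0 && page_size % long_size == 0);

	uint64_t entries = page_size / long_size;	/* E: entries per page */
	uint64_t max_index = NR_DIRECT + (entries * entries / 2) * (entries + 1);

	return max_index * page_size;
}

/* MAX_LFS_FILESIZE as assumed in the lead-in above. */
static uint64_t max_lfs_filesize(uint64_t page_size, unsigned int bits_per_long)
{
	if (bits_per_long == 32)
		return (page_size << 31) - 1;
	return INT64_MAX;
}

/* The effective limit is whichever of the two bounds is smaller. */
static uint64_t tmpfs_max_bytes(uint64_t page_size, unsigned int bits_per_long)
{
	uint64_t vec = swap_vector_max_bytes(page_size, bits_per_long / 8);
	uint64_t lfs = max_lfs_filesize(page_size, bits_per_long);

	return vec < lfs ? vec : lfs;
}
```

Under those assumptions, tmpfs_max_bytes(4096, 32) comes to a little over
2TB and tmpfs_max_bytes(4096, 64) to roughly one eighth of that, while
tmpfs_max_bytes(8192, 32) is capped by the MAX_LFS_FILESIZE bound rather
than by the vector layout.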

It's a shame that tmpfs should be more restrictive than ramfs, and this
limitation has now been noticed.  Add another level to the swap vector?
No, it became obscure and hard to maintain, once I complicated it to make
use of highmem pages nine years ago: better choose another way.

Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then
tmpfs would never have invented its own peculiar radix tree: we would have
fitted swap entries into the common radix tree instead, in much the same
way as we fit swap entries into page tables.

And why should each file have a separate radix tree for its pages and for
its swap entries?  The swap entries are required precisely where and when
the pages are not.  We want to put them together in a single radix tree:
which can then avoid much of the locking which was needed to prevent them
from being exchanged underneath us.
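
To illustrate how one slot can hold either a page or a swap entry, here is
a minimal userspace sketch.  It assumes page pointers are at least 2-byte
aligned, so the low bit is free to mark a swap entry.  The tag bit, the
slot_t type and the helper names are all hypothetical, and this is not
claimed to be the encoding the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag: low bit set means "this slot holds a swap entry". */
#define SLOT_SWAP_TAG ((uintptr_t)1)

typedef uintptr_t slot_t;

/* Store a page pointer; alignment guarantees the tag bit is clear. */
static slot_t slot_from_page(void *page)
{
	assert(((uintptr_t)page & SLOT_SWAP_TAG) == 0);
	return (uintptr_t)page;
}

/* Store a swap value, shifted up to make room for the tag bit. */
static slot_t slot_from_swap(uintptr_t swap_val)
{
	assert(swap_val <= (UINTPTR_MAX >> 1));	/* must survive the shift */
	return (swap_val << 1) | SLOT_SWAP_TAG;
}

static bool slot_is_swap(slot_t slot)
{
	return (slot & SLOT_SWAP_TAG) != 0;
}

static void *slot_to_page(slot_t slot)
{
	assert(!slot_is_swap(slot));
	return (void *)slot;
}

static uintptr_t slot_to_swap(slot_t slot)
{
	assert(slot_is_swap(slot));
	return slot >> 1;
}
```

Because a slot is either one or the other, replacing a page by its swap
entry (or the reverse) is a single store into the same slot, rather than
an update of two separate structures that must be kept in step.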

This also avoids the waste of memory devoted to swap vectors, first in the
shmem_inode itself, then at least two more pages once a file grew beyond
16 data pages (pages accounted by df and du, but not by memcg).  Allocated
upfront, to avoid allocation when under swapping pressure, but pure waste
when CONFIG_SWAP is not set - I have never spattered around the ifdefs to
prevent that, preferring this move to sharing the common radix tree
instead.

There are three downsides to sharing the radix tree.  One, that it binds
tmpfs more tightly to the rest of mm, either requiring knowledge of swap
entries in radix tree there, or duplication of its code here in shmem.c.
I believe that the simplifications and memory savings (and probable higher
performance, not yet measured) justify that.

Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix
nodes that cannot be freed under memory pressure - whereas before it was
the less precious highmem swap vector pages that could not be freed.  I'm
hoping that 64-bit has now been accessible for long enough, that the
highmem argument has grown much less persuasive.

Three, that swapoff is slower than it used to be on tmpfs files, since
it's using a simple generic mechanism not tailored to it: I find this
noticeable, and shall want to improve, but maybe nobody else will notice.

So...  now remove most of the old swap vector code from shmem.c.  But, for
the moment, keep the simple i_direct vector of 16 pages, with simple
accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation
to help mark where swap needs to be handled in subsequent patches.
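
For orientation, a toy direct vector with two accessors might look roughly
like the userspace sketch below.  The toy_ names, the entry type and the
handling of indices beyond the 16 entries (reads return an empty entry,
writes are dropped) are assumptions of this sketch, not a copy of what the
patch does in shmem.c.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_DIRECT 16

/* Stand-in for a swap entry; val == 0 means "no swap entry here". */
typedef struct {
	unsigned long val;
} toy_swp_entry;

struct toy_inode_info {
	toy_swp_entry i_direct[TOY_NR_DIRECT];
};

/* Return the entry at index, or an empty entry when index is out of range. */
static toy_swp_entry toy_get_swap(const struct toy_inode_info *info, size_t index)
{
	toy_swp_entry none = { 0 };

	assert(info != NULL);
	return index < TOY_NR_DIRECT ? info->i_direct[index] : none;
}

/* Record an entry at index; out-of-range indices are ignored in this toy. */
static void toy_put_swap(struct toy_inode_info *info, size_t index,
			 toy_swp_entry swap)
{
	assert(info != NULL);
	if (index < TOY_NR_DIRECT)
		info->i_direct[index] = swap;
}
```

Keeping every access behind two small functions like these means that each
call site marks a place where swap handling has to be revisited later.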

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

author	Hugh Dickins <hughd@google.com>
	Wed, 3 Aug 2011 00:52:52 +0000 (10:52 +1000)
committer	Stephen Rothwell <sfr@canb.auug.org.au>
	Thu, 4 Aug 2011 02:50:40 +0000 (12:50 +1000)
commit	7afa43b3ce755296183bdfc55214d6df2ce1082a
tree	2b434dc6564b1cbcbd2c79d6b0f607da94ade91d
parent	682a889af670bc642183ff3c15a50217c0cfad67

include/linux/shmem_fs.h
mm/shmem.c