From: Eric W. Biederman
Date: Wed, 9 Apr 2014 22:16:50 +0000 (-0700)
Subject: vfs: In mntput run deactivate_super on a shallow stack.
X-Git-Tag: next-20140428~9^2~1
X-Git-Url: https://git.karo-electronics.de/?a=commitdiff_plain;h=36e5e6d62c03ba3347c651deabce86256f67ae3a;p=karo-tx-linux.git

vfs: In mntput run deactivate_super on a shallow stack.

mntput, as part of path_put, is called from all over the vfs, sometimes,
as in the case of symlink chasing, from rather deep call chains.  During
a lazy filesystem unmount, with the right set of races, those innocuous
little mntput calls (which themselves take very little stack space) can
call deactivate_super and wind up taking 3k of stack space or more
(David Chinner reports 5k for xfs).

Avoid deactivate_super being called from a deep stack by moving the
cleanup of the mount into a work queue.  To avoid semantic changes,
mntput waits for deactivate_super to complete before returning.

With this change all filesystem unmounting happens with about 7400
bytes free on the stack at the point where deactivate_super is called,
giving filesystems plenty of room to do I/O without overflowing the
kernel stack during unmounting.

Signed-off-by: "Eric W. Biederman"
---

diff --git a/fs/mount.h b/fs/mount.h
index c5e717542bbc..4104a3cca238 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -59,6 +59,8 @@ struct mount {
 	int mnt_expiry_mark;		/* true if marked for expiry */
 	int mnt_pinned;
 	struct path mnt_ex_mountpoint;
+	struct work_struct mnt_cleanup_work;
+	struct completion *mnt_undone;
 };
 
 #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */
diff --git a/fs/namespace.c b/fs/namespace.c
index 81086e46f1f7..1d92f888f4dc 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -953,8 +953,25 @@ static void delayed_free(struct rcu_head *head)
 	kmem_cache_free(mnt_cache, mnt);
 }
 
+static void cleanup_mnt(struct mount *mnt)
+{
+	fsnotify_vfsmount_delete(&mnt->mnt);
+	dput(mnt->mnt.mnt_root);
+	deactivate_super(mnt->mnt.mnt_sb);
+	mnt_free_id(mnt);
+	complete(mnt->mnt_undone);
+	call_rcu(&mnt->mnt_rcu, delayed_free);
+}
+
+static void cleanup_mnt_work(struct work_struct *work)
+{
+	cleanup_mnt(container_of(work, struct mount, mnt_cleanup_work));
+}
+
 static void mntput_no_expire(struct mount *mnt)
 {
+	struct completion undone;
+
 	rcu_read_lock();
 	mnt_add_count(mnt, -1);
 	if (likely(mnt->mnt_ns)) { /* shouldn't be the last one */
@@ -997,11 +1014,16 @@ static void mntput_no_expire(struct mount *mnt)
 	 * so mnt_get_writers() below is safe.
 	 */
 	WARN_ON(mnt_get_writers(mnt));
-	fsnotify_vfsmount_delete(&mnt->mnt);
-	dput(mnt->mnt.mnt_root);
-	deactivate_super(mnt->mnt.mnt_sb);
-	mnt_free_id(mnt);
-	call_rcu(&mnt->mnt_rcu, delayed_free);
+	/* The stack may be deep here, cleanup the mount on a work
+	 * queue where the stack is guaranteed to be shallow.
+	 */
+	init_completion(&undone);
+	mnt->mnt_undone = &undone;
+
+	INIT_WORK(&mnt->mnt_cleanup_work, cleanup_mnt_work);
+	schedule_work(&mnt->mnt_cleanup_work);
+
+	wait_for_completion(&undone);
 }
 
 void mntput(struct vfsmount *mnt)