Peng Tao [Thu, 18 Jul 2013 23:57:00 +0000 (09:57 +1000)]
staging/lustre: replace num_physpages with totalram_pages
The global variable num_physpages is going away. Replace it
with totalram_pages.
Signed-off-by: Peng Tao <tao.peng@emc.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peng Tao [Thu, 18 Jul 2013 23:57:00 +0000 (09:57 +1000)]
staging/lustre/libcfs: cleanup linux-mem.h
remove shrinker related wrappers.
Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peng Tao [Thu, 18 Jul 2013 23:56:59 +0000 (09:56 +1000)]
staging/lustre/ptlrpc: convert to new shrinker API
Convert sptlrpc encode pool shrinker to use scan/count API.
Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peng Tao [Thu, 18 Jul 2013 23:56:59 +0000 (09:56 +1000)]
staging/lustre/obdclass: convert lu_object shrinker to count/scan API
convert lu_object shrinker to new count/scan API.
Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peng Tao [Thu, 18 Jul 2013 23:56:59 +0000 (09:56 +1000)]
staging/lustre/ldlm: convert to shrinkers to count/scan API
convert ldlm shrinker to new count/scan API.
Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As suggested by Andrew, add a generic initial locking scheme used
throughout all sysv ipc mechanisms. Documenting the ids rwsem, how rcu
can be enough to do the initial checks and when to actually acquire the
kern_ipc_perm.lock spinlock.
I found that adding it to util.c was generic enough.
Clean up some of the messy do_shmat() spaghetti code, getting rid of
out_free and out_put_dentry labels. This makes shortening the critical
region of this function in the next patch a little easier to do and read.
With the *_INFO, *_STAT, IPC_RMID and IPC_SET commands already optimized,
deal with the remaining SHM_LOCK and SHM_UNLOCK commands. Take the
shm_perm lock after doing the initial auditing and security checks. The
rest of the logic remains unchanged.
While the INFO cmd doesn't take the ipc lock, the STAT commands do acquire
it unnecessarily. We can do the permissions and security checks only
holding the rcu lock.
Similar to semctl and msgctl, when calling msgctl, the *_INFO and *_STAT
commands can be performed without acquiring the ipc object.
Add a shmctl_nolock() function and move the logic of *_INFO and *_STAT out
of msgctl(). Since we are just moving functionality, this change still
takes the lock and it will be properly lockless in the next patch.
ipc,shm: introduce lockless functions to obtain the ipc object
This is the third and final patchset that deals with reducing the amount
of contention we impose on the ipc lock (kern_ipc_perm.lock). These
changes mostly deal with shared memory, previous work has already been
done for semaphores and message queues:
With these patches applied, a custom shm microbenchmark stressing shmctl
doing IPC_STAT with 4 threads a million times, reduces the execution time
by 50%. A similar run, this time with IPC_SET, reduces the execution time
from 3 mins and 35 secs to 27 seconds.
Patches 1-8: replaces blindly taking the ipc lock for a smarter combination
of rcu and ipc_obtain_object, only acquiring the spinlock when updating.
Patch 9: renames the ids rw_mutex to rwsem, which is what it already was.
Patch 10: is a trivial mqueue leftover cleanup
Patch 11: adds a brief lock scheme description, requested by Andrew.
This patch:
Add shm_obtain_object() and shm_obtain_object_check(), which will allow us
to get the ipc object without acquiring the lock. Just as with other
forms of ipc, these functions are basically wrappers around
ipc_obtain_object*().
When I'm using below ktap script to trace all event tracepoints, without
this patch, the system will hang in few seconds, the patch indeed fix the
problem as the changelog pointed.
This patch is old, I can found the original patch discussion in 2007.
http://marc.info/?l=linux-kernel&m=118544794717162&w=2 (In that mail
thread, the patch didn't fix that problem, but it fix the problem I
encountered now)
Ingo's original changelog:
Remove timer calls (!!!) from deep within the tracing infrastructure.
This was totally bogus code that can cause lockups and worse.
Poll the buffer every 2 jiffies for now.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
signals: eventpoll: set ->saved_sigmask at the start
task_struct->saved_sigmask has no meaning unless we return with
set_restore_sigmask() and nobody except current can use it.
This means that sys_epoll_pwait() doesn't need to save ->blocked
into the local var and then memcopy it into ->saved_sigmask, we
can simply set ->saved_sigmask right before set_current_blocked().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement preallocation via the fallocate syscall on VFAT partitions.
With FALLOC_FL_KEEP_SIZE, there is no way to distinguish if the mismatch
between i_size and no. of clusters allocated is a consequence of
fallocate or just plain corruption. When a non fallocate aware (old)
linux fat driver tries to write to such a file, it throws an error.Also,
fsck detects this as inconsistency and truncates the prealloc'd blocks.
To avoid this, as suggested by OGAWA, remove changes that make fallocate
persistent across mounts and restrict lifetime of blocks from fallocate(2)
to file release.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
autofs4: translate pids to the right namespace for the daemon
The PID and the TGID of the process triggering the mount are sent to the
daemon. Currently the global pid values are sent (ones valid in the
initial pid namespace) but this is wrong if the autofs daemon itself is
not running in the initial pid namespace.
So send the pid values that are valid in the namespace of the autofs daemon.
The namespace to use is taken from the oz_pgrp pid pointer, which was set
at mount time to the mounting process' pid namespace.
If the pid translation fails (the triggering process is in an unrelated
pid namespace) then the automount fails with ENOENT.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ian Kent <raven@themaw.net> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>