git.karo-electronics.de Git - karo-tx-linux.git/log

Cyrill Gorcunov [Thu, 3 May 2012 05:44:59 +0000 (15:44 +1000)]

c/r: prctl: extend PR_SET_MM to set up more mm_struct entries

During checkpoint we dump the whole process memory to a file, and the dump
includes the process stack memory. Besides the stack data itself, the stack
carries additional parameters such as command line arguments, environment
data and the auxiliary vector.

So during restore, once we've restored the stack data itself, we need to
set up mm_struct::arg_start/end and env_start/end so that the restored
process can find the command line arguments and environment data it had
at checkpoint time. The same applies to the auxiliary vector.

For this reason additional PR_SET_MM_(ARG_START | ARG_END | ENV_START |
ENV_END | AUXV) codes are introduced.
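
As a rough illustration of how a restorer might use the new codes, the
sketch below wraps prctl(PR_SET_MM, ...) in a small helper. The helper name
set_mm_field() is hypothetical, and the fallback constant values are the
ones found in linux/prctl.h of later mainline kernels. The call is expected
to fail without CAP_SYS_RESOURCE or on kernels built without
CONFIG_CHECKPOINT_RESTORE.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older libc headers; values as in linux/prctl.h. */
#ifndef PR_SET_MM
#define PR_SET_MM 35
#endif
#ifndef PR_SET_MM_ARG_START
#define PR_SET_MM_ARG_START 8
#define PR_SET_MM_ARG_END   9
#define PR_SET_MM_ENV_START 10
#define PR_SET_MM_ENV_END   11
#endif

/*
 * Hypothetical helper: set one mm_struct field of the calling process.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_mm_field(int field, unsigned long addr)
{
    if (prctl(PR_SET_MM, (unsigned long)field, addr, 0UL, 0UL) == -1)
        return -errno;
    return 0;
}
```

A restorer would call the helper once per saved value, for example
set_mm_field(PR_SET_MM_ARG_START, saved_arg_start), where saved_arg_start
is a placeholder for the address recorded at checkpoint time.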

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Cyrill Gorcunov [Thu, 3 May 2012 05:44:59 +0000 (15:44 +1000)]

c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat

We would like to be able to restore the command line argument and
program environment pointers, but first we need to obtain them somehow.
Thus we put these values into /proc/$pid/stat. The exit_code is needed to
restore zombie tasks.
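
A minimal sketch of how a checkpoint tool could pull one of these values
out of a stat line follows. The helper name stat_field() is hypothetical.
The field numbers are an assumption based on proc(5) for later kernels,
which lists arg_start, arg_end, env_start, env_end and exit_code as fields
48 to 52; the positions at the time of this patch may have differed.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Extract numeric field `n` (1-based, n >= 4) from one line of
 * /proc/<pid>/stat. The comm field (2) may itself contain spaces and
 * parentheses, so counting starts after the last ')'.
 * Returns 0 on success, -1 if the field is missing or not a number.
 */
static int stat_field(const char *line, int n, unsigned long long *out)
{
    const char *p = strrchr(line, ')');
    int field = 2;              /* p points at the end of field 2 */

    if (!p || n < 4)
        return -1;
    p++;
    while (*p) {
        while (*p == ' ')
            p++;
        if (!*p || *p == '\n')
            break;
        field++;                /* p is now at the start of `field` */
        if (field == n) {
            char *end;

            *out = strtoull(p, &end, 10);
            return end == p ? -1 : 0;
        }
        while (*p && *p != ' ' && *p != '\n')
            p++;
    }
    return -1;
}
```

With a line read from /proc/<pid>/stat, stat_field(line, 48, &v) would then
yield arg_start under the numbering assumed above.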

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Andrew Morton [Thu, 3 May 2012 05:44:59 +0000 (15:44 +1000)]

syscalls-x86-add-__nr_kcmp-syscall-v8-comment-update-fix

tweak comment text

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Cyrill Gorcunov [Thu, 3 May 2012 05:44:58 +0000 (15:44 +1000)]

syscalls-x86-add-__nr_kcmp-syscall-v8 comment update

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Cyrill Gorcunov [Thu, 3 May 2012 05:44:58 +0000 (15:44 +1000)]

syscalls, x86: add __NR_kcmp syscall

While doing checkpoint-restore in user space, one needs to determine
whether various kernel objects (like mm_struct-s or files_struct-s) are
shared between tasks, and to restore this state.

The 2nd step can be solved by using appropriate CLONE_ flags and the
unshare syscall, while there is currently no way to solve the 1st one.

One way to check whether two tasks share e.g. an mm_struct is to
expose some mm_struct ID of a task in its proc file, but exposing such
info is considered undesirable for security reasons.

Thus, after some debate, we concluded that a dedicated 'comparison'
syscall might be the best candidate. So here it is --
__NR_kcmp.

It takes up to 5 arguments - the pids of the two tasks (whose
characteristics should be compared), the comparison type and (in the case
of comparing files) two file descriptors.

Lookups for pids are done in the caller's PID namespace only.

At the moment only x86 is supported and tested.
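
For illustration, a user-space caller might wrap the syscall as sketched
below. The wrapper name same_file() is hypothetical. SYS_kcmp and KCMP_FILE
are assumed to come from the headers of a later kernel and libc, and the
argument order follows the description above (two pids, the type, then two
descriptors).

```c
#define _GNU_SOURCE
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Compare descriptor fd1 of task pid1 with descriptor fd2 of task pid2.
 * Returns 0 if both refer to the same open file, a positive value if
 * they differ, and -1 with errno set if the call fails (for example
 * when the kernel does not provide kcmp).
 */
static int same_file(pid_t pid1, pid_t pid2, int fd1, int fd2)
{
    return (int)syscall(SYS_kcmp, (long)pid1, (long)pid2, (long)KCMP_FILE,
                        (unsigned long)fd1, (unsigned long)fd2);
}
```

A descriptor and its dup() copy in the same task should compare as equal,
which is the kind of sharing a restorer has to recreate.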

[akpm@linux-foundation.org: fix up selftests, warnings]
[akpm@linux-foundation.org: include errno.h]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: Michal Marek <mmarek@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Cyrill Gorcunov [Thu, 3 May 2012 05:44:57 +0000 (15:44 +1000)]

c/r: fs, proc: Move children entry back to tid_base_stuff

While merging into linux-next, the "children" entry moved from
tid_base_stuff to tgid_base_stuff by mistake. Fix it by moving it back.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Cyrill Gorcunov [Thu, 3 May 2012 05:44:57 +0000 (15:44 +1000)]

fs, proc: introduce /proc/<pid>/task/<tid>/children entry

When we checkpoint a task we need to know the list of children the
task has, but there is no easy and fast way to generate the reverse
parent->children chain from an arbitrary <pid> (while a parent pid is
provided in the "PPid" field of /proc/<pid>/status).

So instead of walking over all pids in the system (creating one big
process tree in memory, just to figure out which children a task has) --
we add an explicit /proc/<pid>/task/<tid>/children entry, because the kernel
already has this kind of information but it is not yet exported.

This lists first-level children only, not the whole process tree.
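
A small sketch of a consumer of this entry is shown below. It assumes the
file holds a space-separated list of decimal pids, as proc(5) describes for
later kernels; the helper names parse_children() and read_children() and
the 4096-byte buffer are illustrative choices only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Parse up to `max` pids from `buf` into `out`; returns how many were found. */
static size_t parse_children(const char *buf, pid_t *out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        char *end;
        long pid = strtol(buf, &end, 10);

        if (end == buf)
            break;              /* no more numbers */
        out[n++] = (pid_t)pid;
        buf = end;
    }
    return n;
}

/* Read the children of one thread; output longer than the buffer is cut off. */
static size_t read_children(pid_t pid, pid_t tid, pid_t *out, size_t max)
{
    char path[64], buf[4096];
    size_t len;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/task/%d/children",
             (int)pid, (int)tid);
    f = fopen(path, "r");
    if (!f)
        return 0;
    len = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[len] = '\0';
    return parse_children(buf, out, max);
}
```

Walking the whole tree then means calling read_children() again for each
pid returned, since the entry only reports one level.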

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Cyrill Gorcunov [Thu, 3 May 2012 05:44:57 +0000 (15:44 +1000)]

sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE

For those who don't need C/R functionality there is no need to control
the last pid, i.e. the pid for the next fork() call.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Christopher Yeoh [Thu, 3 May 2012 05:44:56 +0000 (15:44 +1000)]

aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector()

A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
changes made to support CMA in an earlier patch.

Rather than having an additional check_access parameter to these
functions, the first parameter type is overloaded to allow the caller to
specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
are valid, but do not check the memory that they point to. This is used
by process_vm_readv/writev where we need to validate that an iovec passed
to the syscall is valid but do not want to check the memory that it points
to at this point because it refers to an address space in another process.

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

NeilBrown [Thu, 3 May 2012 05:44:56 +0000 (15:44 +1000)]

w1: introduce a slave mutex for serializing IO

w1 devices need a mutex to serialize IO. Most use master->mutex.
However, that is used for other purposes and they can conflict.

In particular master->mutex is held while w1_attach_slave_device is
called.

For bq27000, this registers a 'powersupply' device which tries to read the
current status. The attempt to read will cause a deadlock on
master->mutex.

So create a new per-slave mutex and use that for serializing IO for
bq27000.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Andrew Morton [Thu, 3 May 2012 05:44:55 +0000 (15:44 +1000)]

eventfd-change-int-to-__u64-in-eventfd_signal-fix

update interface documentation

Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Sha Zhengju <handai.szj@taobao.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Sha Zhengju [Thu, 3 May 2012 05:44:55 +0000 (15:44 +1000)]

eventfd: change int to __u64 in eventfd_signal()

eventfd_ctx->count is an __u64 counter which is allowed to reach
ULLONG_MAX. eventfd_write() adds a __u64 value to "count", but the kernel
side eventfd_signal() only adds an int value to it. Make them consistent.
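
The user-space side of this asymmetry can be observed with a short sketch.
It only exercises the write() path, not the in-kernel eventfd_signal(), and
the helper name eventfd_roundtrip() is hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Add `value` to a fresh eventfd counter and read the counter back.
 * Returns the value read, or 0 if any step fails.
 */
static uint64_t eventfd_roundtrip(uint64_t value)
{
    uint64_t got = 0;
    int fd = eventfd(0, 0);

    if (fd < 0)
        return 0;
    if (write(fd, &value, sizeof(value)) != (ssize_t)sizeof(value) ||
        read(fd, &got, sizeof(got)) != (ssize_t)sizeof(got))
        got = 0;
    close(fd);
    return got;
}
```

Passing a value such as 1ULL << 40, which does not fit in an int, shows
that the write() path already carries a full 64-bit increment.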

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Alexandre Bounine [Thu, 3 May 2012 05:44:54 +0000 (15:44 +1000)]

rapidio/tsi721: add DMA engine support

Adds support for DMA Engine API into Tsi721 mport driver.

Includes following changes for Tsi721 driver:
- Modifies BDMA register offset definitions to support per-channel handling
- Separates BDMA channel reserved for RIO Maintenance requests
- Adds DMA Engine callback routines

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Alexandre Bounine [Thu, 3 May 2012 05:44:54 +0000 (15:44 +1000)]

rapidio: add DMA engine support for RIO data transfers

Adds DMA Engine framework support into RapidIO subsystem.

Uses DMA Engine DMA_SLAVE interface to generate data transfers to/from
remote RapidIO target devices.

Introduces RapidIO-specific wrapper for prep_slave_sg() interface with an
extra parameter to pass target specific information.

Uses a scatterlist to describe the local data buffer. Addresses a flat
data buffer on the remote side.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vinod Koul <vinod.koul@linux.intel.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Andrew Morton [Thu, 3 May 2012 05:44:54 +0000 (15:44 +1000)]

tools-selftests-add-mq_perf_tests-checkpatch-fixes

Cc: Doug Ledford <dledford@redhat.com>
ERROR: space required after that ',' (ctx:VxV)
#117: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:73:
+#define min(a,b) ((a) < (b) ? (a) : (b))
              ^

ERROR: that open brace { should be on the previous line
#145: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:101:
+const struct poptOption options[] =
+{

WARNING: externs should be avoided in .c files
#196: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:152:
+void shutdown(int exit_val, char *err_cause, int line_no);

WARNING: externs should be avoided in .c files
#197: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:153:
+void sig_action_SIGUSR1(int signum, siginfo_t *info, void *context);

WARNING: externs should be avoided in .c files
#198: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:154:
+void sig_action(int signum, siginfo_t *info, void *context);

WARNING: externs should be avoided in .c files
#205: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:161:
+void increase_limits(void);

ERROR: do not initialise statics to 0 or NULL
#217: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:173:
+ static int in_shutdown = 0;

ERROR: spaces required around that '=' (ctx:VxV)
#225: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:181:
+ for (i=0; i < num_cpus_to_pin; i++)
      ^

WARNING: quoted string split across lines
#258: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:214:
+ fprintf(stderr, "Caught signal %d in SIGUSR1 handler, "
+ "exiting\n", signum);

ERROR: do not use assignment in if condition
#336: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:292:
+ if ((queue = mq_open(queue_path, flags, perms, attr)) == -1)

ERROR: spaces required around that '=' (ctx:VxV)
#352: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:308:
+ for (i=0; i < num_cpus_to_pin; i++)
      ^

ERROR: space required before the open parenthesis '('
#357: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:313:
+ while(1) ;

ERROR: trailing statements should be on next line
#357: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:313:
+ while(1) ;

ERROR: spaces required around that '=' (ctx:VxV)
#365: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:321:
+ for (i=0; i < num_cpus_to_pin; i++)
      ^

ERROR: space required before the open parenthesis '('
#370: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:326:
+ while(1) {

ERROR: space required before the open parenthesis '('
#371: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:327:
+ while(mq_send(queue, buff, sizeof(buff), 0) == 0);

ERROR: trailing statements should be on next line
#371: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:327:
+ while(mq_send(queue, buff, sizeof(buff), 0) == 0);

ERROR: Macros with complex values should be enclosed in parenthesis
#376: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:332:
+#define drain_queue() \
+ while (mq_receive(queue, buff, MSG_SIZE, &prio_in) == MSG_SIZE)

ERROR: space required before the open parenthesis '('
#383: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:339:
+ } while(0)

WARNING: Statements terminations use 1 semicolon
#423: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:379:
+ *prio = 0;;

ERROR: do not use assignment in if condition
#475: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:431:
+ if ((mq_prio_max = sysconf(_SC_MQ_PRIO_MAX)) == -1)

WARNING: quoted string split across lines
#490: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:446:
+ printf("\n\tTest #1: Time send/recv message, queue "
+        "empty\n");

ERROR: trailing statements should be on next line
#566: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:522:
+ while (try_set(max_msgs, cur_max_msgs += 10));

ERROR: trailing statements should be on next line
#568: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:524:
+ while (try_set(max_msgsize, cur_max_msgsize += 1024));

ERROR: do not use assignment in if condition
#593: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:549:
+ if ((cpu_set = CPU_ALLOC(cpus_online)) == NULL) {

WARNING: quoted string split across lines
#615: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:571:
+ fprintf(stderr, "CPU %d exceeds "
+ "cpus online, ignoring.\n",

WARNING: quoted string split across lines
#628: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:584:
+ fprintf(stderr, "Any given CPU may "
+ "only be given once.\n");

WARNING: quoted string split across lines
#660: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:616:
+ fprintf(stderr, "Must pass at least one CPU to continuous "
+ "mode.\n");

WARNING: quoted string split across lines
#670: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:626:
+ fprintf(stderr, "Not running as root, but almost all tests "
+ "require root in order to modify\nsystem settings.  "

ERROR: spaces required around that '=' (ctx:VxV)
#721: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:677:
+ for(cpu=1; cpu < num_cpus_to_pin; cpu++)
       ^

ERROR: space required before the open parenthesis '('
#721: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:677:
+ for(cpu=1; cpu < num_cpus_to_pin; cpu++)

ERROR: spaces required around that '=' (ctx:VxV)
#750: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:706:
+ for (i=0; i < num_cpus_to_pin; i++) {
      ^

ERROR: space required before the open parenthesis '('
#776: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:732:
+ while(1) sleep(1);

ERROR: trailing statements should be on next line
#776: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:732:
+ while(1) sleep(1);

ERROR: space required after that ',' (ctx:VxV)
#777: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:733:
+ shutdown(0,"",0);
          ^

ERROR: space required after that ',' (ctx:VxV)
#777: FILE: tools/testing/selftests/mqueue/mq_perf_tests.c:733:
+ shutdown(0,"",0);
             ^

total: 25 errors, 11 warnings, 747 lines checked

./patches/tools-selftests-add-mq_perf_tests.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Doug Ledford <dledford@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Doug Ledford [Thu, 3 May 2012 05:44:53 +0000 (15:44 +1000)]

tools/selftests: add mq_perf_tests

Add the mq_perf_tests tool I used when creating my mq performance patch.
Also add a local .gitignore to keep the binaries from showing up in git
status output.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Andrew Morton [Thu, 3 May 2012 05:44:53 +0000 (15:44 +1000)]

ipc-mqueue-strengthen-checks-on-mqueue-creation-fix

s/ENOMEM/EOVERFLOW/

Cc: Doug Ledford <dledford@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Doug Ledford [Thu, 3 May 2012 05:44:52 +0000 (15:44 +1000)]

ipc/mqueue: strengthen checks on mqueue creation

We already check the mq attr struct if it's passed in, but now that the
admin can set system wide defaults separate from maximums, it's actually
possible to set the defaults to something that would overflow. So, if
there is no attr struct passed in to the open call, check the default
values.

While we are at it, simplify mq_attr_ok() by making it return 0 or an
error condition, so that way if we add more tests to it later, we have the
option of what error should be returned instead of the calling location
having to pick a possibly inaccurate error code.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Andrew Morton [Thu, 3 May 2012 05:44:52 +0000 (15:44 +1000)]

ipc-mqueue-correct-mq_attr_ok-test-fix

add a local to simplify overflow-checking expression

Cc: Doug Ledford <dledford@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Doug Ledford [Thu, 3 May 2012 05:44:52 +0000 (15:44 +1000)]

ipc/mqueue: correct mq_attr_ok test

While working on the other parts of the mqueue stuff, I noticed that the
calculation for overflow in mq_attr_ok didn't actually match reality (this
is especially true since my last patch which changed how we account memory
slightly).

In particular, we used to test for overflow using:
msgs * msgsize + msgs * sizeof(struct msg_msg *)

That was never really correct because each message we allocate via
load_msg() is actually a struct msg_msg followed by the data for the
message (and if struct msg_msg + data exceeds PAGE_SIZE we end up
allocating struct msg_msgseg structs too, but accounting for them would
get really tedious, so let's ignore those...they're only a pointer in size
anyway). This patch updates the calculation to be more accurate in
regards to maximum possible memory consumption by the mqueue.
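
A sketch of the kind of check described here, counting one pointer-array
slot plus a message header plus the payload for every message, might look
like this. The function name mq_size_ok() and the MSG_HDR_SIZE value are
stand-ins chosen for illustration; the real header is struct msg_msg and
the patch's exact expression is not reproduced here.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for sizeof(struct msg_msg). */
#define MSG_HDR_SIZE 48UL

/*
 * Returns 1 if the worst-case memory for `msgs` messages of `msgsize`
 * bytes each can be computed without overflowing an unsigned long,
 * 0 otherwise.
 */
static int mq_size_ok(unsigned long msgs, unsigned long msgsize)
{
    unsigned long per_msg;

    if (msgs == 0 || msgsize == 0)
        return 0;
    if (msgsize > ULONG_MAX - MSG_HDR_SIZE - sizeof(void *))
        return 0;               /* per-message sum would wrap */
    per_msg = msgsize + MSG_HDR_SIZE + sizeof(void *);
    return msgs <= ULONG_MAX / per_msg;
}
```

Dividing instead of multiplying keeps the test itself from overflowing,
which is the usual way to write such a check.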

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Andrew Morton [Thu, 3 May 2012 05:44:51 +0000 (15:44 +1000)]

ipc-mqueue-improve-performance-of-send-recv-fix

fix typo in comment, remove stray semicolon

Cc: Doug Ledford <dledford@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Doug Ledford [Thu, 3 May 2012 05:44:51 +0000 (15:44 +1000)]

ipc/mqueue: improve performance of send/recv

The existing implementation of the POSIX message queue send and recv
functions is, well, abysmal.  Even worse than abysmal.  I submitted a
patch to increase the maximum POSIX message queue limit to 65536 due to
customer needs, however, upon looking over the send/recv implementation, I
realized that my customer needs help with that too even if they don't know
it.  The basic problem is that, given the fairly typical use case scenario
for a large queue of queueing lots of messages all at the same priority (I
verified with my customer that this is indeed what their app does), the
msg_insert routine is basically a frikkin' bubble sort.  I mean, whoa,
that's *so* middle school.

OK, OK, to not slam the original author too much, I'm sure they didn't
envision a queue depth of 50,000+ messages.  No one would think that
moving elements in an array, one at a time, and dereferencing each pointer
in that array to check priority of the message being pointed to, again
one at a time, for 50,000+ times would be good.  So let's assume that, as
is typical, the users have found a way to break our code simply by using
it in a way we didn't envision.  Fair enough.

"So, just how broken is it?", you ask.  I wondered the same thing, so I
wrote an app to let me know.  It's my next patch.  It gave me some
interesting results.  Here's what it tested:

Interference with other apps - In continuous mode, the app just sits there
and hits a message queue forever, while you go do something productive on
another terminal using other CPUs.  You then measure how long it takes you
to do that something productive.  Then you restart the app in fake
continuous mode, and it sits in a tight loop on a CPU while you repeat
your tests.  The whole point of this is to keep one CPU tied up (so it
can't be used in your other work) but in one case tied up hitting the
mqueue code so we can see the effect of walking that 65,528 element array
one pointer at a time on the global CPU cache.  If it's bad, then it will
slow down your app on the other CPUs just by polluting cache mercilessly.
In the fake case, it will be in a tight loop, but not polluting cache.

Testing the mqueue subsystem directly - Here we just run a number of tests
to see how the mqueue subsystem performs under different conditions.  A
couple conditions are known to be worst case for the old system, and some
routines, so this tests all of them.

So, on to the results already:

Subsystem/Test                  Old                         New

Time to compile linux
kernel (make -j12 on a
6 core CPU)
  Running mqueue test     user 49m10.744s             user 45m26.294s
                           sys  5m51.924s              sys  4m59.894s
                         total 55m02.668s            total 50m26.188s

  Running fake test       user 45m32.686s             user 45m18.552s
                           sys  5m12.465s              sys  4m56.468s
                         total 50m45.151s            total 50m15.020s

  % slowdown from mqueue
    cache thrashing            ~8%                         ~.5%

Avg time to send/recv (in nanoseconds per message)
  when queue empty            305/288                    349/318
  when queue full (65528 messages)
    constant priority      526589/823                    362/314
    increasing priority    403105/916                    495/445
    decreasing priority     73420/594                    482/409
    random priority        280147/920                    546/436

Time to fill/drain queue (65528 messages, in seconds)
  constant priority         17.37/.12                    .13/.12
  increasing priority        4.14/.14                    .21/.18
  decreasing priority       12.93/.13                    .21/.18
  random priority            8.88/.16                    .22/.17

So, I think the results speak for themselves.  It's possible this
implementation could be improved by caching at least one priority level
in the node tree (that would bring the queue empty performance more in
line with the old implementation), but this works and is *so* much better
than what we had, especially for the common case of a single priority in
use, that further refinements can be in follow-on patches.
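
To make the structural idea concrete, here is a user-space sketch that
keeps one node per priority level in use, each with its own FIFO of
messages. A sorted linked list stands in for the node tree mentioned
above, and every name in it (struct queue, q_send(), q_recv()) is
hypothetical. With a single priority in use, a send touches one node no
matter how many messages are queued.

```c
#include <stdlib.h>

struct msg {
    struct msg *next;
    int id;
};

struct prio_node {              /* one node per priority level in use */
    struct prio_node *next;     /* lower priorities follow */
    unsigned int prio;
    struct msg *head, *tail;    /* FIFO of messages at this priority */
};

struct queue {
    struct prio_node *top;      /* highest priority first */
};

/* Queue message `id` at `prio`. Returns 0, or -1 if allocation fails. */
static int q_send(struct queue *q, unsigned int prio, int id)
{
    struct prio_node **pp = &q->top, *n;
    struct msg *m = malloc(sizeof(*m));

    if (!m)
        return -1;
    m->next = NULL;
    m->id = id;

    while (*pp && (*pp)->prio > prio)
        pp = &(*pp)->next;
    n = *pp;
    if (!n || n->prio != prio) {        /* first message at this level */
        n = malloc(sizeof(*n));
        if (!n) {
            free(m);
            return -1;
        }
        n->prio = prio;
        n->head = n->tail = NULL;
        n->next = *pp;
        *pp = n;
    }
    if (n->tail)
        n->tail->next = m;
    else
        n->head = m;
    n->tail = m;
    return 0;
}

/* Remove the oldest message of the highest priority; -1 if empty. */
static int q_recv(struct queue *q)
{
    struct prio_node *n = q->top;
    struct msg *m;
    int id;

    if (!n)
        return -1;
    m = n->head;
    id = m->id;
    n->head = m->next;
    free(m);
    if (!n->head) {                     /* level drained, drop its node */
        q->top = n->next;
        free(n);
    }
    return id;
}
```

The cost of a send depends on the number of distinct priorities in use
rather than on the queue depth, which is the property the numbers above
reflect for the constant-priority case.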

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Doug Ledford [Thu, 3 May 2012 05:44:51 +0000 (15:44 +1000)]

selftests: add mq_open_tests

Add a directory to house POSIX message queue subsystem specific tests.
Add first test which checks the operation of mq_open() under various
corner conditions.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

KOSAKI Motohiro [Thu, 3 May 2012 05:44:50 +0000 (15:44 +1000)]

mqueue: separate mqueue default value from maximum value

commit b231cca438 ("message queues: increase range limits") changed the
mqueue default value used when the attr parameter is NULL from a
hard-coded value to the fs.mqueue.{msg,msgsize}_max sysctl value.

This has a large side effect. When a user runs two mqueue applications,
1) one using a non-NULL attr parameter and requiring a big message size,
and 2) one using a NULL attr parameter and needing only small messages,
app (1) requires raising fs.mqueue.msgsize_max, and app (2) then consumes
a large amount of memory even though it doesn't need it.

Doug Ledford proposed switching back to a static hard-coded value.
However, that also has a compatibility problem: some applications might
have started to depend on the default value being tunable.

The solution is to separate the default value from the maximum value.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Joe Korty <joe.korty@ccur.com>
Cc: Amerigo Wang <amwang@redhat.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

KOSAKI Motohiro [Thu, 3 May 2012 05:44:50 +0000 (15:44 +1000)]

mqueue: don't use kmalloc with KMALLOC_MAX_SIZE

KMALLOC_MAX_SIZE is not a good threshold.  It is extremely high and
problematic.  Unfortunately, some silly drivers depend on this and we
can't change it.  But new code shouldn't use such extremely ugly
high-order allocations.  They bring awful fragmentation issues and system
slowdown.

Signed-off-by: KOSAKI Motohiro <mkosaki@jp.fujitsu.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Joe Korty <joe.korty@ccur.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

KOSAKI Motohiro [Thu, 3 May 2012 05:44:49 +0000 (15:44 +1000)]

mqueue: revert bump up DFLT_*MAX

The mqueue limits are delicate parameters, like those of other IPC
mechanisms, because an unprivileged user can consume kernel memory
through IPC objects.

Thus, raising them too aggressively creates a security issue. For
example, the current setting allows a malicious unprivileged user to use
256GB (= 256 * 1024 * 1024 * 1024), which is large enough to make the
system unresponsive. Don't do that.

Instead, every admin should adjust the knobs for their own systems.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Joe Korty <joe.korty@ccur.com>
Cc: Amerigo Wang <amwang@redhat.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Stephen Rothwell [Thu, 3 May 2012 05:44:49 +0000 (15:44 +1000)]

ipc/mqueue: using vmalloc requires including vmalloc.h

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Doug Ledford [Thu, 3 May 2012 05:44:49 +0000 (15:44 +1000)]

ipc/mqueue: update maximums for the mqueue subsystem

Commit b231cca4381e ("message queues: increase range limits") changed the
maximum size of a message in a message queue from INT_MAX to 8192*128.
Unfortunately, we had customers that relied on a size much larger than
8192*128 on their production systems.  After reviewing POSIX, we found
that it is silent on the maximum message size.  We did find a couple other
areas in which it was not silent.  Fix up the mqueue maximums so that the
customer's system can continue to work, and document both the POSIX and
real world requirements in ipc_namespace.h so that we don't have this
issue crop back up.

Also, commit 9cf18e1dd74cd0 ("ipc: HARD_MSGMAX should be higher not lower
on 64bit") fiddled with HARD_MSGMAX without realizing that the number was
intentionally in place to limit the msg queue depth to one that was small
enough to kmalloc an array of pointers (hence why we divided 128k by
sizeof(long)).  If we wish to meet POSIX requirements, we have no choice
but to change our allocation to a vmalloc instead (at least for the large
queue size case).  With that, it's possible to increase our allowed
maximum to the POSIX requirements (or more if we choose).

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Doug Ledford [Thu, 3 May 2012 05:44:48 +0000 (15:44 +1000)]

ipc/mqueue: enforce hard limits

In two places we don't enforce the hard limits for CAP_SYS_RESOURCE apps.
In preparation for making more reasonable hard limits, start enforcing
them even on CAP_SYS_RESOURCE.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Doug Ledford [Thu, 3 May 2012 05:44:48 +0000 (15:44 +1000)]

ipc/mqueue: switch back to using non-max values on create

Commit b231cca4381 ("message queues: increase range limits") changed how
we create a queue that does not include an attr struct passed to open so
that it creates the queue with whatever the maximum values are.  However,
if the admin has set the maximums to allow flexibility in creating a queue
(aka, both a large size and large queue are allowed, but combined they
create a queue too large for the RLIMIT_MSGQUEUE of the user), then
attempts to create a queue without an attr struct will fail.  Switch back
to using acceptable defaults regardless of what the maximums are.

Note: so far, we only know of a few applications that rely on this
behavior (specifically, they set the maximums in /proc, then run the
application, which calls mq_open() without passing in an attr struct and
expects the newly created message queue to have the maximum sizes that
were set in /proc).  All of those applications that we know of are
actually part of regression test suites that were coded to do something
like this:

  for size in 4096 65536 $((1024 * 1024)) $((16 * 1024 * 1024)); do
    echo $size > /proc/sys/fs/mqueue/msgsize_max
    mq_open || echo "Error opening mq with size $size"
  done

Test suites that depend on behavior like this are broken.  The concept
that programs should rely upon the system wide maximum in order to get
their desired results instead of simply using an attr struct to specify
what they want is a fundamentally unfriendly programming practice for any
multi-tasking OS.

Fixing this will break those few apps that we know of (and those app
authors recognize the brokenness of their code and the need to fix it).
However, the following patch "mqueue: separate mqueue default value"
allows a workaround in the form of new knobs for the default msg queue
creation parameters for any software out there that we don't already know
about that might rely on this behavior at the moment.
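
For contrast, a minimal userspace sketch of the recommended approach:
passing an explicit attr struct to mq_open() so the queue geometry does
not depend on any system wide setting.  The helper names are hypothetical
and the 0600 mode is an assumption made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <mqueue.h>
#include <string.h>
#include <sys/stat.h>

/* Build an explicit attribute struct instead of relying on defaults. */
struct mq_attr make_queue_attr(long maxmsg, long msgsize)
{
	struct mq_attr attr;

	assert(maxmsg > 0 && msgsize > 0);
	memset(&attr, 0, sizeof(attr));
	attr.mq_maxmsg = maxmsg;	/* queue depth */
	attr.mq_msgsize = msgsize;	/* bytes per message */
	return attr;
}

/* Create (or open) a queue with the requested geometry.  Returns
 * (mqd_t)-1 with errno set on failure, exactly as mq_open() does. */
mqd_t open_sized_queue(const char *name, long maxmsg, long msgsize)
{
	struct mq_attr attr = make_queue_attr(maxmsg, msgsize);

	return mq_open(name, O_CREAT | O_RDWR, 0600, &attr);
}
```

A call such as open_sized_queue("/example", 10, 8192) asks for exactly
that geometry; the request is still checked against the configured
maximums and the caller's RLIMIT_MSGQUEUE.  Older glibc versions need
-lrt at link time.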

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Doug Ledford [Thu, 3 May 2012 05:44:48 +0000 (15:44 +1000)]

ipc/mqueue: cleanup definition names and locations

Since commit b231cca4381 ("message queues: increase range limits") on Oct
18, 2008, calls to mq_open() that did not pass in an attribute struct and
expected to get default values for the size of the queue and the max
message size now get the system wide maximums instead of hardwired
defaults like they used to get.

This was uncovered when one of the earlier patches in this patch set
increased the default system wide maximums at the same time it increased
the hard ceiling on the system wide maximums (a customer specifically
needed the hard ceiling brought back up, the new ceiling that commit
b231cca4381 introduced was too low for their production systems).  By
increasing the default maximums and not realising they were tied to any
attempt to create a message queue without an attribute struct, I had
inadvertently made it such that all message queue creation attempts
without an attribute struct were failing because the new default maximums
would create a queue that exceeded the default rlimit for message queue
bytes.
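
Why larger default maximums collide with the rlimit can be seen with a
rough calculation.  The sketch below uses a simplified accounting (one
pointer slot plus one payload buffer per message); the kernel's exact
bookkeeping differs between versions, and the function names and the
numbers in the usage note are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified footprint of a queue: per message, one pointer slot plus
 * the payload buffer.  An approximation, not the kernel's formula. */
unsigned long long queue_bytes(unsigned long maxmsg, unsigned long msgsize)
{
	return (unsigned long long)maxmsg * (sizeof(void *) + msgsize);
}

/* Would a queue of this geometry fit in the given byte budget? */
bool queue_fits_rlimit(unsigned long maxmsg, unsigned long msgsize,
		       unsigned long long rlimit_bytes)
{
	assert(rlimit_bytes > 0);
	return queue_bytes(maxmsg, msgsize) <= rlimit_bytes;
}
```

With an example budget of 819200 bytes, a queue of 10 messages of 8192
bytes fits easily, while 1024 messages of 8192 bytes (about 8 MB) does
not, so creating a queue at "maximum x maximum" fails.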

As a result, the system wide defaults were brought back down to their
previous levels, and the system wide ceilings on the maximums were raised
to meet the customer's needs.  However, the fact that the no attribute
struct behavior of mq_open() could be broken by changing the system wide
maximums for message queues was seen as fundamentally broken itself.  So
we hardwired the no attribute case back like it used to be.  But, then we
realized that on the very off chance that some piece of software in the
wild depended on that behavior, we could work around that issue by adding
two new knobs to /proc that allowed setting the defaults for message
queues created without an attr struct separately from the system wide
maximums.

What is not an option IMO is to leave the current behavior in place.  No
piece of software should ever rely on setting the system wide maximums in
order to get a desired message queue.  Such a reliance would be so
fundamentally multitasking OS unfriendly as to not really be tolerable.
Fortunately, we don't know of any software in the wild that uses this
except for a regression test program that caught the issue in the first
place.  If there is though, we have made accommodations with the two new
/proc knobs (and that's all the accommodations such fundamentally broken
software can be allowed).

This patch:

The various defines for minimums and maximums of the sysctl controllable
mqueue values are scattered amongst different files and named
inconsistently.  Move them all into ipc_namespace.h and make them have
consistent names.  Additionally, make the number of queues per namespace
also have a minimum and maximum and use the same sysctl function as the
other two settable variables.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Manfred Spraul [Thu, 3 May 2012 05:44:47 +0000 (15:44 +1000)]

ipc/sem.c: alternatives to preempt_disable()

ipc/sem.c uses a custom wakeup scheme that relies on preempt_disable().
On -RT, this causes increased latencies and debug warnings.

The patch adds two additional schemes next to the current one:
- one built around a completion - could be better for -RT kernels
- one built around a spinlock - unfortunately it's broken

My preferred solution would be the spinlock implementation: RT would use
preemptible spinlocks, mainline normal spinlocks. Thus both get the
optimal implementation without any special code in ipc/sem.c.
Unfortunately, I don't see how it could be fixed.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Anton Vorontsov [Thu, 3 May 2012 05:44:47 +0000 (15:44 +1000)]

um: properly check all process' threads for a live mm

kill_off_processes() might miss a valid process because checking for
process->mm is not enough: the process' main thread may exit or detach
its mm via use_mm(), while other threads still have a valid mm.

To catch this we use find_lock_task_mm(), which walks up all threads and
returns an appropriate task (with task lock held).

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Anton Vorontsov [Thu, 3 May 2012 05:44:47 +0000 (15:44 +1000)]

um: fix possible race on task->mm

Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so tasklist lock is not enough).

We can't use get_task_mm()/mmput() pair as mmput() might sleep, so let's
take the task lock while we care about its mm.

Note that we should also use find_lock_task_mm() to check all process'
threads for a valid mm, but for uml we'll do it in a separate patch.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Anton Vorontsov [Thu, 3 May 2012 05:44:46 +0000 (15:44 +1000)]

um: should hold tasklist_lock while traversing processes

Traversing the tasks requires holding tasklist_lock, otherwise it is
unsafe.

p.s. However, I'm not sure that calling os_kill_ptraced_process() in the
atomic context is correct. It seems to work, but please take a closer
look.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Anton Vorontsov [Thu, 3 May 2012 05:44:46 +0000 (15:44 +1000)]

blackfin: fix possible deadlock in decode_address()

Oleg Nesterov found an interesting deadlock possibility:

> sysrq_showregs_othercpus() does smp_call_function(showacpu)
> and showacpu() show_stack()->decode_address(). Now suppose that IPI
> interrupts the task holding read_lock(tasklist).

To fix this, blackfin should not grab the write_ variant of the
tasklist lock; the read_ one is enough.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Anton Vorontsov [Thu, 3 May 2012 05:44:45 +0000 (15:44 +1000)]

blackfin: a couple of task->mm handling fixes

The patch fixes two problems:

1. Working with task->mm w/o getting mm or grabbing the task lock is
   dangerous as ->mm might disappear (exit_mm() assigns NULL under
   task_lock(), so tasklist lock is not enough).

   We can't use get_task_mm()/mmput() pair as mmput() might sleep,
   so we have to take the task lock while handling its mm.

2. Checking for process->mm is not enough because process' main
   thread may exit or detach its mm via use_mm(), but other threads
   may still have a valid mm.

   To catch this we use find_lock_task_mm(), which walks up all
   threads and returns an appropriate task (with task lock held).

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Anton Vorontsov [Thu, 3 May 2012 05:44:45 +0000 (15:44 +1000)]

sh: use clear_tasks_mm_cpumask()

Checking for process->mm is not enough because process' main thread may
exit or detach its mm via use_mm(), but other threads may still have a
valid mm.

To fix this we would need to use find_lock_task_mm(), which would walk up
all threads and return an appropriate task (with task lock held).

clear_tasks_mm_cpumask() has the issue fixed, so let's use it.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Anton Vorontsov [Thu, 3 May 2012 05:44:45 +0000 (15:44 +1000)]

powerpc: use clear_tasks_mm_cpumask()

Current CPU hotplug code has some task->mm handling issues:

1. Working with task->mm w/o getting mm or grabbing the task lock is
   dangerous as ->mm might disappear (exit_mm() assigns NULL under
   task_lock(), so tasklist lock is not enough).

   We can't use get_task_mm()/mmput() pair as mmput() might sleep,
   so we must take the task lock while handling its mm.

2. Checking for process->mm is not enough because process' main
   thread may exit or detach its mm via use_mm(), but other threads
   may still have a valid mm.

   To fix this we would need to use find_lock_task_mm(), which would
   walk up all threads and return an appropriate task (with task
   lock held).

clear_tasks_mm_cpumask() has all the issues fixed, so let's use it.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Anton Vorontsov [Thu, 3 May 2012 05:44:44 +0000 (15:44 +1000)]

arm: use clear_tasks_mm_cpumask()

Checking for process->mm is not enough because process' main thread may
exit or detach its mm via use_mm(), but other threads may still have a
valid mm.

To fix this we would need to use find_lock_task_mm(), which would walk up
all threads and return an appropriate task (with task lock held).

clear_tasks_mm_cpumask() has this issue fixed, so let's use it.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Anton Vorontsov [Thu, 3 May 2012 05:44:44 +0000 (15:44 +1000)]

cpu: introduce clear_tasks_mm_cpumask() helper

Many architectures clear tasks' mm_cpumask like this:

	read_lock(&tasklist_lock);
	for_each_process(p) {
		if (p->mm)
			cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
	}
	read_unlock(&tasklist_lock);

Depending on the context, the code above may have several problems,
such as:

1. Working with task->mm w/o getting mm or grabbing the task lock is
   dangerous as ->mm might disappear (exit_mm() assigns NULL under
   task_lock(), so tasklist lock is not enough).

2. Checking for process->mm is not enough because process' main
   thread may exit or detach its mm via use_mm(), but other threads
   may still have a valid mm.

This patch implements a small helper function that does things
correctly, i.e.:

1. We take the task's lock while we handle its mm (we can't use
   get_task_mm()/mmput() pair as mmput() might sleep);

2. To catch exited main thread case, we use find_lock_task_mm(),
   which walks up all threads and returns an appropriate task
   (with task lock held).

Also, per Peter Zijlstra's idea, now we don't grab tasklist_lock in
the new helper, instead we take the rcu read lock. We can do this
because the function is called after the cpu is taken down and marked
offline, so no new tasks will get this cpu set in their mm mask.
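
The thread walk described above can be modelled in userspace C.  This is
only a sketch of the logic: every type and function name below is a
hypothetical stand-in, and the locking the real helper relies on (the
task lock and the rcu read lock) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for mm_struct and task_struct (hypothetical). */
struct model_mm { unsigned long cpumask; };
struct model_task { struct model_mm *mm; };
struct model_process {
	struct model_task *threads;	/* threads[0] is the main thread */
	size_t nthreads;
};

/* Models find_lock_task_mm(): return the first thread that still has
 * an mm, or NULL.  The real helper also returns with the lock held. */
struct model_task *model_find_task_mm(struct model_process *p)
{
	size_t i;

	for (i = 0; i < p->nthreads; i++)
		if (p->threads[i].mm)
			return &p->threads[i];
	return NULL;
}

/* Models the new helper: clear `cpu` in the mask of every live mm. */
void model_clear_tasks_mm_cpumask(struct model_process *procs, size_t n,
				  int cpu)
{
	size_t i;

	assert(cpu >= 0 && (size_t)cpu < 8 * sizeof(unsigned long));
	for (i = 0; i < n; i++) {
		struct model_task *t = model_find_task_mm(&procs[i]);

		if (!t)
			continue;	/* no thread has an mm left */
		t->mm->cpumask &= ~(1UL << cpu);
	}
}
```

A process whose main thread has already dropped its mm is still handled
here, because the walk continues to the remaining threads; a plain
"if (p->mm)" test on the main thread would have skipped it.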

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Konstantin Khlebnikov [Thu, 3 May 2012 05:44:44 +0000 (15:44 +1000)]

fork: call complete_vfork_done() after clearing child_tid and flushing rss-counters

Child should wake up the parent from vfork() only after finishing all
operations with shared mm. There is no sense in using
CLONE_CHILD_CLEARTID together with CLONE_VFORK, but it looks more accurate
now.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Cong Wang [Thu, 3 May 2012 05:44:43 +0000 (15:44 +1000)]

proc: use IS_ERR_OR_NULL()

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Konstantin Khlebnikov [Thu, 3 May 2012 05:44:43 +0000 (15:44 +1000)]

proc/smaps: show amount of hwpoison pages

Add the line "HWPoison: <size> kB" into /proc/pid/smaps if
CONFIG_MEMORY_FAILURE=y and some HWPoison pages were found. This may be
useful for finding applications which use broken memory.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Konstantin Khlebnikov [Thu, 3 May 2012 05:44:42 +0000 (15:44 +1000)]

proc/smaps: show amount of nonlinear ptes in vma

Currently, nonlinear mappings cannot be distinguished from ordinary
mappings. This patch adds the line "Nonlinear: <size> kB" to
/proc/pid/smaps, where size is the amount of nonlinear ptes in the vma.
The line appears only if VM_NONLINEAR is set. This information may be
useful beyond the checkpoint/restore project.
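
A consumer of this line could pick it out of smaps with a small parser.
The sketch below is a hypothetical userspace helper, not part of any
library; it assumes only the "<Key>: <size> kB" shape described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse a "<Key>: <size> kB" line.  Returns 1 and stores the size on a
 * match, 0 otherwise.  Hypothetical helper, for illustration only. */
int parse_smaps_kb(const char *line, const char *key, unsigned long *kb)
{
	size_t klen;

	assert(line && key && kb);
	klen = strlen(key);
	/* The key must be followed directly by ':' to count as a match. */
	if (strncmp(line, key, klen) != 0 || line[klen] != ':')
		return 0;
	return sscanf(line + klen + 1, "%lu kB", kb) == 1;
}
```

Since the line is only emitted for VM_NONLINEAR mappings, a caller should
treat "no match in this vma" as the normal case rather than as an error.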

Requested by Pavel Emelyanov.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Konstantin Khlebnikov [Thu, 3 May 2012 05:44:42 +0000 (15:44 +1000)]

proc/smaps: carefully handle migration entries

Currently smaps reports migration entries as "swap"; as a result, "swap"
can appear in shared mappings.

This patch converts migration entries into pages and handles them as usual.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Konstantin Khlebnikov [Thu, 3 May 2012 05:44:42 +0000 (15:44 +1000)]

proc: report file/anon bit in /proc/pid/pagemap

This is an implementation of Andrew's proposal to extend the pagemap file
bits to report what is missing about tasks' working set.

The problem with the working set detection is multilateral.  In the criu
(checkpoint/restore) project we dump the tasks' memory into image files
and to do it properly we need to detect which pages inside mappings are
really in use.  The mincore syscall I thought could help with this did not.
First, it doesn't report swapped pages, thus we cannot find out which
parts of anonymous mappings to dump.  Next, it does report pages from page
cache as present even if they are not mapped, and it doesn't tell whether
a page has been cow-ed.

Note that the issue with swap pages is critical -- we must dump swap pages
to the image file.  But the issues with file pages are an optimization --
we can take all file pages to the image, this would be correct, but if we
know that a page is not mapped or not cow-ed, we can remove it from the
dump file.  The
dump would still be self-consistent, though significantly smaller in size
(up to 10 times smaller on real apps).

Andrew noticed that the proc pagemap file solves 2 of the 3 issues above --
it reports whether a page is present or swapped and it doesn't report not
mapped page cache pages.  But, it doesn't distinguish cow-ed file pages
from not cow-ed.

I would like to use the last unused bit in this file to report whether the
page mapped into the respective pte is PageAnon or not.
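
How a dumper might consume such entries can be sketched in userspace C.
The bit positions are assumptions taken from the kernel's pagemap
documentation (bit 63 present, bit 62 swapped, bit 61 file-page or
shared-anon); the macro and function names are local to this sketch, and
the dump policy is only a reading of the reasoning above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the kernel's pagemap documentation. */
#define PM_PRESENT (UINT64_C(1) << 63)
#define PM_SWAP    (UINT64_C(1) << 62)
#define PM_FILE    (UINT64_C(1) << 61)	/* file-page or shared-anon */

/* Byte offset of the 64-bit entry for `vaddr` in /proc/<pid>/pagemap. */
uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size > 0);
	return (vaddr / page_size) * sizeof(uint64_t);
}

/* Sketch of the dump policy: swapped pages must be dumped, and so must
 * present pages that are not plain file pages (i.e. cow-ed or anon). */
bool must_dump(uint64_t entry)
{
	if (entry & PM_SWAP)
		return true;
	return (entry & PM_PRESENT) && !(entry & PM_FILE);
}
```

A present, file-backed page that was never cow-ed can be re-read from
the file on restore, which is where the size reduction comes from.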

[comment stolen from Pavel Emelyanov's v1 patch]

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Jan Engelhardt [Thu, 3 May 2012 05:44:41 +0000 (15:44 +1000)]

procfs: use more appropriate types when dumping /proc/N/stat

- use int for priority and nice, since task_nice()/task_prio() return that

- field 24: get_mm_rss() returns unsigned long

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Alexey Dobriyan [Thu, 3 May 2012 05:44:41 +0000 (15:44 +1000)]

proc: pass "fd" by value in /proc/*/{fd,fdinfo} code

Pass "fd" directly, not via pointer -- one less memory read.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Alexey Dobriyan [Thu, 3 May 2012 05:44:41 +0000 (15:44 +1000)]

proc: don't do dummy rcu_read_lock/rcu_read_unlock on error path

rcu_read_lock()/rcu_read_unlock() is a nop for TINY_RCU, but is not a nop
for, say, PREEMPT_RCU.

proc_fill_cache() is called without RCU lock, so there is no need to
lock/unlock on the error path; simply jump out of the loop.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Cong Wang [Thu, 3 May 2012 05:44:40 +0000 (15:44 +1000)]

proc: use mm_access() instead of ptrace_may_access()

mm_access() handles this much better, and avoids some race conditions.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Cong Wang [Thu, 3 May 2012 05:44:40 +0000 (15:44 +1000)]

proc: remove mm_for_maps()

mm_for_maps() is a simple wrapper for mm_access(), and the name is
misleading, so just remove it and use mm_access() directly.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Cong Wang [Thu, 3 May 2012 05:44:40 +0000 (15:44 +1000)]

proc: unify ptrace_may_access() locking code

Unify mutex_lock+ptrace_may_access code and rename lock_trace() to
task_access_lock(), which better describes what it does.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Andrew Morton [Thu, 3 May 2012 05:44:39 +0000 (15:44 +1000)]

proc-clean-up-proc-pid-environ-handling-checkpatch-fixes

ERROR: "foo* bar" should be "foo *bar"
#26: FILE: fs/proc/base.c:680:
+static int __mem_open(struct inode* inode, struct file* file, unsigned int mode)

ERROR: "foo* bar" should be "foo *bar"
#26: FILE: fs/proc/base.c:680:
+static int __mem_open(struct inode* inode, struct file* file, unsigned int mode)

ERROR: "foo* bar" should be "foo *bar"
#43: FILE: fs/proc/base.c:708:
+static int mem_open(struct inode* inode, struct file* file)

ERROR: "foo* bar" should be "foo *bar"
#43: FILE: fs/proc/base.c:708:
+static int mem_open(struct inode* inode, struct file* file)

ERROR: "foo* bar" should be "foo *bar"
#55: FILE: fs/proc/base.c:809:
+static int environ_open(struct inode* inode, struct file* file)

ERROR: "foo* bar" should be "foo *bar"
#55: FILE: fs/proc/base.c:809:
+static int environ_open(struct inode* inode, struct file* file)

total: 6 errors, 0 warnings, 100 lines checked

./patches/proc-clean-up-proc-pid-environ-handling.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Cong Wang [Thu, 3 May 2012 05:44:39 +0000 (15:44 +1000)]

proc: clean up /proc/<pid>/environ handling

Similar to e268337dfe2 ("proc: clean up and fix /proc/<pid>/mem
handling"), move the permission check to open(); this simplifies the
read() code.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Tim Bird [Thu, 3 May 2012 05:44:38 +0000 (15:44 +1000)]

stack usage: add pid to warning printk in check_stack_usage

In embedded systems, sometimes the same program (busybox) is the cause of
multiple warnings. Outputting the pid with the program name in the
warning printk helps distinguish which instances of a program are using
the stack most.

This is a small patch, but useful.

Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Oleg Nesterov [Thu, 3 May 2012 05:44:38 +0000 (15:44 +1000)]

cred: remove task_is_dead() from __task_cred() validation

commit 8f92054e ("CRED: Fix __task_cred()'s lockdep check and banner
comment"):

    add the following validation condition:

        task->exit_state >= 0

    to permit the access if the target task is dead and therefore
    unable to change its own credentials.

OK, but afaics currently this can only help wait_task_zombie() which calls
__task_cred() without rcu lock.

Remove this validation and change wait_task_zombie() to use task_uid()
instead.  This means we do rcu_read_lock() only to shut up the lockdep,
but we already do the same in, say, wait_task_stopped().

task_is_dead() should die, task->exit_state != 0 means that this task has
passed exit_notify(), only do_wait-like code paths should use this.

Unfortunately, we can't kill task_is_dead() right now, it has already
acquired buggy users in drivers/staging.  The fix already exists.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Tetsuo Handa [Thu, 3 May 2012 05:44:37 +0000 (15:44 +1000)]

kmod: avoid deadlock from recursive kmod call

The system deadlocks (at least since 2.6.10) when a
call_usermodehelper(UMH_WAIT_EXEC) request triggers a
call_usermodehelper(UMH_WAIT_PROC) request.

This is because "khelper thread is waiting for the worker thread at
wait_for_completion() in do_fork() since the worker thread was created
with CLONE_VFORK flag" and "the worker thread cannot call complete()
because do_execve() is blocked at UMH_WAIT_PROC request" and "the khelper
thread cannot start processing UMH_WAIT_PROC request because the khelper
thread is waiting for the worker thread at wait_for_completion() in
do_fork()".

In order to avoid deadlock, do not try to call wait_for_completion() in
call_usermodehelper_exec() if the worker thread was created by khelper
thread with CLONE_VFORK flag.

The easiest example to observe this deadlock is to use a corrupted
/sbin/hotplug binary (like shown below).

  # : > /tmp/dummy
  # chmod 755 /tmp/dummy
  # echo /tmp/dummy > /proc/sys/kernel/hotplug
  # modprobe whatever

call_usermodehelper("/tmp/dummy", UMH_WAIT_EXEC) is called from
kobject_uevent_env() in lib/kobject_uevent.c upon loading/unloading a
module.  do_execve("/tmp/dummy") triggers a call to
request_module("binfmt-0000") from search_binary_handler() which in turn
calls call_usermodehelper(UMH_WAIT_PROC).

There are various hooks called during do_execve() operation (e.g.
security_bprm_check(), audit_bprm(), "struct
linux_binfmt"->load_binary()).  If one of such hooks triggers
UMH_WAIT_EXEC, this deadlock will happen even if /sbin/hotplug is not
corrupted.

[akpm@linux-foundation.org: add comment to kmod_thread_locker]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Randy Dunlap [Thu, 3 May 2012 05:44:37 +0000 (15:44 +1000)]

kmod.c: fix kernel-doc warning

Warning(kernel/kmod.c:419): No description found for parameter 'depth'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Boaz Harrosh [Thu, 3 May 2012 05:44:37 +0000 (15:44 +1000)]

kmod: move call_usermodehelper_fns() to .c file and unexport all its helpers

If we move call_usermodehelper_fns() to kmod.c file and EXPORT_SYMBOL it
we can avoid exporting all its helper functions:
call_usermodehelper_setup
call_usermodehelper_setfns
call_usermodehelper_exec
And make all of them static to kmod.c

Since the optimizer will see all these as a single call site it will
inline them inside call_usermodehelper_fns(). So we lose the call to
_fns but gain 3 calls to the helpers. (Not that it matters)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Boaz Harrosh [Thu, 3 May 2012 05:44:36 +0000 (15:44 +1000)]

kmod: convert two call sites to call_usermodehelper_fns()

Both kernel/sys.c and security/keys/request_key.c were inlining the exact
same code as call_usermodehelper_fns(). So simply convert these sites to
directly use call_usermodehelper_fns().

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Boaz Harrosh [Thu, 3 May 2012 05:44:36 +0000 (15:44 +1000)]

kmod: unexport call_usermodehelper_freeinfo()

call_usermodehelper_freeinfo() is not used outside of kmod.c. So unexport
it, and make it static to kmod.c.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Dan Carpenter [Thu, 3 May 2012 05:44:35 +0000 (15:44 +1000)]

HPFS: remove PRINTK() macro

The PRINTK() macro isn't really used. Let's just remove it because it
is ugly and out of date.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Nikolaus Voss [Thu, 3 May 2012 05:44:35 +0000 (15:44 +1000)]

drivers/rtc/rtc-m41t93.c: don't let get_time() reset M41T93_FLAG_OF

If the rtc reports the time might be invalid due to oscillator failure,
M41T93_FLAG_OF flag must not be reset by get_time() as the read operation
doesn't make the time valid.

Without this patch, only the first get_time() reported an invalid time;
the second get_time() reported a valid time, although the reported time is
probably wrong due to oscillator failure.

Instead of resetting in get_time(), with this patch M41T93_FLAG_OF is
reset in set_time() when a valid time is to be written.
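
The intended behaviour can be modelled with a small userspace sketch.
Nothing here is the driver's code: the struct, the function names, the
flag value and the -EINVAL return are all assumptions made to illustrate
"reads never clear the flag, a valid write does".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_FLAG_OF 0x04	/* oscillator-fail bit; value assumed */

struct model_rtc {
	uint8_t flags;
	long seconds;
};

/* Reading never clears the flag, so every read reports the failure. */
int model_get_time(const struct model_rtc *rtc, long *out)
{
	assert(rtc && out);
	*out = rtc->seconds;
	return (rtc->flags & MODEL_FLAG_OF) ? -EINVAL : 0;
}

/* Writing a valid time is what makes the clock trustworthy again. */
int model_set_time(struct model_rtc *rtc, long seconds)
{
	assert(rtc);
	if (seconds < 0)
		return -EINVAL;	/* invalid time: leave the flag alone */
	rtc->seconds = seconds;
	rtc->flags &= (uint8_t)~MODEL_FLAG_OF;
	return 0;
}
```

In this model two consecutive reads after an oscillator failure both
fail, and reads only succeed again once a valid time has been written.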

Signed-off-by: Nikolaus Voss <n.voss@weinmann.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Wolfram Sang [Thu, 3 May 2012 05:44:35 +0000 (15:44 +1000)]

rtc: ds1307: add trickle charger support

Some DS13XX devices have "trickle chargers". The configuration register
is at different locations, but the setup is the same. Since the
configuration is board specific, introduce platform_data for this driver.
Tested with a DS1339 on a custom board.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Wolfram Sang [Thu, 3 May 2012 05:44:34 +0000 (15:44 +1000)]

rtc: ds1307: remove superfluous initialization

ds1307 was kzalloced, so no need to zero members of the struct.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Andrew Morton [Thu, 3 May 2012 05:44:34 +0000 (15:44 +1000)]

rtc-rename-config_rtc_mxc-to-config_rtc_drv_mxc-fix

Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Fabio Estevam [Thu, 3 May 2012 05:44:34 +0000 (15:44 +1000)]

rtc: rename CONFIG_RTC_MXC to CONFIG_RTC_DRV_MXC

In order to keep consistency with other rtc drivers, rename CONFIG_RTC_MXC
to CONFIG_RTC_DRV_MXC.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Fabio Estevam [Thu, 3 May 2012 05:44:33 +0000 (15:44 +1000)]

drivers/rtc/Kconfig: place RTC_DRV_IMXDI and RTC_MXC under "on-CPU RTC drivers"

RTC_DRV_IMXDI and RTC_MXC are on-chip RTC modules, so move them under
"on-CPU RTC drivers" selection menu.

While at it change the dependency of RTC_DRV_IMXDI from ARCH_MX25 to
SOC_IMX25.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Alexander Stein [Thu, 3 May 2012 05:44:33 +0000 (15:44 +1000)]

drivers/rtc/rtc-pcf8563.c: add RTC_VL_READ/RTC_VL_CLR ioctl feature

Changes are based on arch/cris/arch-v10/drivers/pcf8563.c

Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Alexander Stein [Thu, 3 May 2012 05:44:32 +0000 (15:44 +1000)]

rtc: add ioctl to get/clear battery low voltage status

Currently there is no generic way to get the RTC battery status within an
application. So add an ioctl to read the status bit. The idea is that
the bit is set once a low voltage is detected. It stays there until it is
reset using the RTC_VL_CLR ioctl.
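
A minimal user-space sketch of how an application might query the status
bit. The helper name rtc_voltage_low() and the device path passed to it
(for example /dev/rtc0) are assumptions for illustration; only the
RTC_VL_READ and RTC_VL_CLR ioctl names come from this patch, and the
driver behind the device has to implement them.

```c
#include <fcntl.h>
#include <linux/rtc.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the RTC reports a low-voltage
 * condition, 0 if it does not, and -1 if the device cannot be opened
 * or the driver does not support RTC_VL_READ.
 */
int rtc_voltage_low(const char *dev)
{
	unsigned int vl = 0;
	int fd;

	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return -1;

	if (ioctl(fd, RTC_VL_READ, &vl) < 0) {
		close(fd);
		return -1;
	}

	close(fd);
	return vl != 0;
}
```

After replacing the battery, an application would issue RTC_VL_CLR on the
same file descriptor to reset the bit.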

Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

H Hartley Sweeten [Thu, 3 May 2012 05:44:32 +0000 (15:44 +1000)]

drivers/rtc/rtc-ep93xx.c: convert to use module_platform_driver()

Use module_platform_driver() to remove the boilerplate code.

Also, change the probe and remove functions to __devinit/__devexit.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Viresh Kumar [Thu, 3 May 2012 05:44:32 +0000 (15:44 +1000)]

rtc/spear: add Device Tree probing capability

SPEAr platforms now support DT, so all drivers must be converted to support
DT. This patch adds DT probing support for rtc and updates its
documentation too.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: Stefan Roese <sr@denx.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Rajeev Kumar <rajeev-dlh.kumar@st.com>
Cc: Rob Herring <robherring2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

H Hartley Sweeten [Thu, 3 May 2012 05:44:31 +0000 (15:44 +1000)]

init: disable sparse checking of the mount.o source files

The init/mount.o source files produce a number of sparse warnings of the
type:

warning: incorrect type in argument 1 (different address spaces)
   expected char [noderef] <asn:1>*dev_name
   got char *name

This is due to the syscalls expecting some of the arguments to be user
pointers but they are being passed as kernel pointers.  This is harmless
but adds a lot of noise to a sparse build.

To limit the noise just disable the sparse checking in the relevant source
files, but still display a warning so that the user knows this has been
done.

Since the sparse checking has been disabled we can also remove the __user
__force casts that are scattered through the source.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Joe Perches [Thu, 3 May 2012 05:44:31 +0000 (15:44 +1000)]

checkpatch: suggest pr_<level> over printk(KERN_<LEVEL>

Suggest the shorter pr_<level> instead of printk(KERN_<LEVEL>.

Prefer to use pr_<level> over bare printks.
Prefer to use pr_warn over pr_warning.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Nick Piggin [Thu, 3 May 2012 05:44:30 +0000 (15:44 +1000)]

radix-tree: fix preload vector size

We are not preallocating a sufficient number of nodes.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Stephen Boyd [Thu, 3 May 2012 05:44:30 +0000 (15:44 +1000)]

spinlock_debug: print kallsyms name for lock

When a spinlock warning is printed we usually get

BUG: spinlock bad magic on CPU#0, modprobe/111
lock: 0xdff09f38, .magic: 00000000, .owner: /0, .owner_cpu: 0

but it's nicer to print the symbol for the lock if we have it so that we
can avoid 'grep dff09f38 /proc/kallsyms' to find out which lock it was.
Use kallsyms to print the symbol name so we get something a bit easier to
read:

BUG: spinlock bad magic on CPU#0, modprobe/112
lock: test_lock, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0

If the lock is not in kallsyms %ps will fall back to printing the address
directly.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Stephen Boyd [Thu, 3 May 2012 05:44:30 +0000 (15:44 +1000)]

vsprintf: fix %ps on non symbols when using kallsyms

Using %ps in a printk format will sometimes fail silently and print the
empty string if the address passed in does not match a symbol that
kallsyms knows about.  But using %pS will fall back to printing the full
address if kallsyms can't find the symbol.  Make %ps act the same as %pS
by falling back to printing the address.

While we're here also make %ps print the module that a symbol comes from
so that it matches what %pS already does.  Take this simple function for
example (in a module):

static void test_printk(void)
{
	int test;

	pr_info("with pS: %pS\n", &test);
	pr_info("with ps: %ps\n", &test);
}

Before this patch:

with pS: 0xdff7df44
with ps:

After this patch:

with pS: 0xdff7df44
with ps: 0xdff7df44
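
The fallback behaviour can be illustrated with a small user-space sketch.
This is not the vsprintf code: struct sym, format_symbol() and the lookup
table are hypothetical stand-ins for kallsyms, used only to show the
"print the name if known, otherwise print the address" rule.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one kallsyms entry. */
struct sym {
	uintptr_t addr;
	const char *name;
};

/*
 * Writes the symbol name for addr into buf if the table knows it,
 * otherwise falls back to the raw address in hex. Returns the
 * snprintf() result.
 */
int format_symbol(char *buf, size_t size, const struct sym *tab,
		  size_t n, uintptr_t addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].addr == addr)
			return snprintf(buf, size, "%s", tab[i].name);
	}

	/* No match: print the address instead of an empty string. */
	return snprintf(buf, size, "0x%lx", (unsigned long)addr);
}
```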

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Andrew Morton [Thu, 3 May 2012 05:44:29 +0000 (15:44 +1000)]

lib/bitmap.c: fix documentation for scnprintf() functions

The code comments for bscnl_emit() and bitmap_scnlistprintf() are
describing snprintf() return semantics, but these functions use
scnprintf() return semantics. Fix that, and document the
bitmap_scnprintf() return value as well.
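
The difference between the two return conventions can be sketched in user
space. my_scnprintf() below is a hypothetical approximation built on
vsnprintf(), not the kernel implementation: it returns the number of
characters actually stored (excluding the trailing NUL), whereas
snprintf() returns the length the output would have had with unlimited
space.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical user-space approximation of scnprintf() semantics:
 * the return value never exceeds size - 1, and is 0 when size is 0.
 */
int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}
```

With a 4-byte buffer and the 6-character string "abcdef", snprintf()
reports 6 while this helper reports 3, the number of characters that fit.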

Cc: Ryota Ozaki <ozaki.ryota@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Andrew Morton [Thu, 3 May 2012 05:44:29 +0000 (15:44 +1000)]

lib/string_helpers.c: make arrays static

Moving these arrays into static storage shrinks the kernel a bit:

   text    data     bss     dec     hex filename
    723     112      64     899     383 lib/string_helpers.o
    516     272      64     852     354 lib/string_helpers.o

Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Uwe Kleine-König [Thu, 3 May 2012 05:44:29 +0000 (15:44 +1000)]

lib/test-kstrtox.c: mark const init data with __initconst instead of __initdata

As long as there is no other non-const variable marked __initdata in the
same compilation unit it doesn't hurt. If there were one however
compilation would fail with

error: $variablename causes a section type conflict

because a section containing const variables is marked read only and so
cannot contain non-const variables.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Chris Metcalf [Thu, 3 May 2012 05:44:28 +0000 (15:44 +1000)]

list_debug: WARN for adding something already in the list

We were bitten by this at one point and added an additional sanity test
for DEBUG_LIST. You can't validly add a list_head to a list where either
prev or next is the same as the thing you're adding.
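
A user-space sketch of the kind of check described. checked_list_add() is
a hypothetical helper that returns false where the kernel would WARN; the
real DEBUG_LIST code may differ in detail.

```c
#include <stdbool.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical checked insert of entry between prev and next.
 * Refuses the insert (and reports it) when entry is already one of
 * its would-be neighbours, which would create a self-referencing node.
 */
bool checked_list_add(struct list_head *entry, struct list_head *prev,
		      struct list_head *next)
{
	if (entry == prev || entry == next) {
		fprintf(stderr, "list_add double add: entry=%p prev=%p next=%p\n",
			(void *)entry, (void *)prev, (void *)next);
		return false;
	}

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}
```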

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Shuah Khan [Thu, 3 May 2012 05:44:28 +0000 (15:44 +1000)]

leds: add new transient trigger for one shot timer activation

The leds timer trigger does not currently have an interface to activate a
one shot timer.  The current support allows for setting two timers, one
for specifying how long a state is to be on, and the second for how long the
state is to be off.  The delay_on value specifies the time period an LED
should stay in on state, followed by a delay_off value that specifies how
long the LED should stay in off state.  The on and off cycle repeats until
the trigger gets deactivated.  There is no provision for one time
activation to implement features that require an on or off state to be
held just once and then stay in the original state forever.

Without a one shot timer interface, user space can still use the timer
trigger to set a timer to hold a state. However, when the user space
application crashes or goes away without deactivating the timer, the
hardware will be left in that state permanently.

As a specific example of this use-case, let's look at vibrate feature on
phones.  Vibrate function on phones is implemented using PWM pins on SoC
or PMIC.  There is a need to activate one shot timer to control the
vibrate feature, to prevent user space crashes leaving the phone in
vibrate mode permanently causing the battery to drain.

This trigger exports three properties: activate, state, and duration. When
the transient trigger is activated these properties are set to default values.

- duration allows setting timer value in msecs. The initial value is 0.
- activate allows activating and deactivating the timer specified by
  duration as needed. The initial and default value is 0.  This will allow
  duration to be set after trigger activation.
- state allows user to specify a transient state to be held for the specified
  duration.

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Andrew Morton [Thu, 3 May 2012 05:44:27 +0000 (15:44 +1000)]

leds-heartbeat-stop-on-shutdown-checkpatch-fixes

ERROR: space prohibited after that '!' (ctx:WxW)
#45: FILE: drivers/leds/ledtrig-heartbeat.c:121:
+ if( ! rc )
^

ERROR: space prohibited after that open parenthesis '('
#45: FILE: drivers/leds/ledtrig-heartbeat.c:121:
+ if( ! rc )

ERROR: space prohibited before that close parenthesis ')'
#45: FILE: drivers/leds/ledtrig-heartbeat.c:121:
+ if( ! rc )

ERROR: space required before the open parenthesis '('
#45: FILE: drivers/leds/ledtrig-heartbeat.c:121:
+ if( ! rc )

total: 4 errors, 0 warnings, 36 lines checked

./patches/leds-heartbeat-stop-on-shutdown.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Alexander Holler <holler@ahsoftware.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Alexander Holler [Thu, 3 May 2012 05:44:27 +0000 (15:44 +1000)]

leds-heartbeat-stop-on-shutdown-v5

Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Cc: Shuah Khan <shuahkhan@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Alexander Holler [Thu, 3 May 2012 05:44:27 +0000 (15:44 +1000)]

leds: heartbeat: stop on shutdown

A halted kernel should not show a heartbeat.

Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Cc: Shuah Khan <shuahkhan@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Kim, Milo [Thu, 3 May 2012 05:44:26 +0000 (15:44 +1000)]

drivers/leds/leds-lm3530.c: simplify als configuration on initialization

For better code readability, ALS code is moved to a new function -
lm3530_als_configure()

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Shreshtha Kumar SAHU <shreshthakumar.sahu@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Kim, Milo [Thu, 3 May 2012 05:44:26 +0000 (15:44 +1000)]

include/linux/led-lm3530.h: comment correction about the range of brightness

max brightness is 127, so the range of brt_val should be from 0 to 127

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Shreshtha Kumar SAHU <shreshthakumar.sahu@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Shuah Khan [Thu, 3 May 2012 05:44:26 +0000 (15:44 +1000)]

leds: change ledtrig-timer to use activated flag

Change existing timer trigger to use the new ->activated flag to set
activate successful status in activate routine and check it in deactivate
routine to do cleanup.

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Shuah Khan [Thu, 3 May 2012 05:44:25 +0000 (15:44 +1000)]

leds: change existing triggers to use activated flag

Change existing triggers backlight, gpio, and heartbeat to use the new
->activated flag to set activate successful status in their activate
routines and check it in their deactivate routines to do cleanup.

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Shuah Khan [Thu, 3 May 2012 05:44:25 +0000 (15:44 +1000)]

leds: add new field to led_classdev struct to save activation state

Add a new field to led_classdev to save activation state after activate
routine is successful. This saved state is used in deactivate routine to
do cleanup such as removing device files, and free memory allocated during
activation. Currently trigger_data not being null is used for this
purpose.

Existing triggers will need changes to use this new field.

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Thomas Meyer [Thu, 3 May 2012 05:44:25 +0000 (15:44 +1000)]

leds: Use kcalloc instead of kzalloc to allocate array

The advantage of kcalloc is that it will prevent integer overflows which
could result from the multiplication of number of elements and size and it
is also a bit nicer to read.
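
The overflow check can be sketched in user space as follows.
checked_calloc() is a hypothetical illustration of the idea, not the
kernel's kcalloc(): it refuses the allocation when n * size would wrap
around, instead of silently allocating a buffer that is too small.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical overflow-checked array allocation: returns NULL if
 * n * size does not fit in size_t, otherwise a zeroed buffer.
 */
void *checked_calloc(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* multiplication would overflow */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```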

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Shuah Khan [Thu, 3 May 2012 05:44:24 +0000 (15:44 +1000)]

leds: simple_strtoul() cleanup

led-class.c and ledtrig-timer.c still use simple_strtoul(). Change them
to use kstrtoul() instead of obsolete simple_strtoul().

Also fix the existing int ret declaration to be ssize_t to match the
return type for _store functions in ledtrig-timer.c.
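
The practical difference is strictness: simple_strtoul() stops at the
first invalid character and has no way to report an error, while
kstrtoul() returns an error code for malformed input. A user-space sketch
of the stricter behaviour, assuming a hypothetical strict_strtoul()
wrapper around strtoul() (it only approximates kstrtoul()):

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for kstrtoul(): returns 0 and stores the value
 * in *res on success, or a negative errno value on failure. Accepts an
 * optional leading '+' and a single trailing newline, nothing else.
 */
int strict_strtoul(const char *s, int base, unsigned long *res)
{
	unsigned long val;
	char *end;

	if (*s == '+')
		s++;
	/* strtoul() would accept whitespace and '-'; reject them here. */
	if (!isalnum((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;	/* trailing garbage */

	*res = val;
	return 0;
}
```

A sysfs write of "250\n" parses to 250, while "250ms" is rejected rather
than being silently read as 250.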

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Axel Lin [Thu, 3 May 2012 05:44:24 +0000 (15:44 +1000)]

leds: lm3556: don't call kfree for the memory allocated by devm_kzalloc

The devm_* functions eliminate the need for manual resource releasing
and simplify error handling. Resources allocated by devm_* are freed
automatically on driver detach.

Thus adding kfree calls here will introduce a double free bug.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Geon Si Jeong <gshark.jeong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Andrew Morton [Thu, 3 May 2012 05:44:23 +0000 (15:44 +1000)]

leds-add-led-driver-for-lm3556-chip-checkpatch-fixes

WARNING: please write a paragraph that describes the config symbol fully
#35: FILE: drivers/leds/Kconfig:405:
+config LEDS_LM3556

ERROR: "foo * bar" should be "foo *bar"
#204: FILE: drivers/leds/leds-lm3556.c:142:
+static int lm3556_read_reg(struct i2c_client *client, u8 reg, u8 * val)

total: 1 errors, 1 warnings, 736 lines checked

./patches/leds-add-led-driver-for-lm3556-chip.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Geon Si Jeong <gshark.jeong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Geon Si Jeong [Thu, 3 May 2012 05:44:23 +0000 (15:44 +1000)]

leds: add LED driver for lm3556 chip

A simple driver for the Texas Instruments LM3556 chip.

The LM3556 is a 4 MHz fixed-frequency synchronous boost converter plus
1.5A constant current driver for a high-current white LED. Datasheet:
www.national.com/ds/LM/LM3556.pdf

Tested on OMAP4430

Signed-off-by: Geon Si Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Greg KH <greg@kroah.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Cc: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Andrew Morton [Thu, 3 May 2012 05:44:23 +0000 (15:44 +1000)]

leds-led-module-for-da9052-53-pmic-v2-fix

Cc: Ashish Jangam <ashish.jangam@kpitcummins.com>
Cc: David Dajun Chen <dchen@diasemi.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

David Dajun Chen [Thu, 3 May 2012 05:44:22 +0000 (15:44 +1000)]

leds: driver for DA9052/53 PMIC v2

LED Driver for Dialog Semiconductor DA9052/53 PMICs.

Signed-off-by: David Dajun Chen <dchen@diasemi.com>
Signed-off-by: Ashish Jangam <ashish.jangam@kpitcummins.com>
Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Dan Carpenter [Thu, 3 May 2012 05:44:22 +0000 (15:44 +1000)]

drivers/leds/leds-lp5521.c: fix lp5521_read() error handling

Gcc 4.6.2 complains that:
drivers/leds/leds-lp5521.c: In function `lp5521_load_program':
drivers/leds/leds-lp5521.c:214:21: warning: `mode' may be used uninitialized in this function [-Wuninitialized]
drivers/leds/leds-lp5521.c: In function `lp5521_probe':
drivers/leds/leds-lp5521.c:788:5: warning: `buf' may be used uninitialized in this function [-Wuninitialized]
drivers/leds/leds-lp5521.c:740:6: warning: `ret' may be used uninitialized in this function [-Wuninitialized]

These are real problems if lp5521_read() returns an error. When that
happens we should handle it, instead of ignoring it or doing a bitwise OR
with all the other error codes and continuing.
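
The pattern can be sketched in user space. fake_read() and
read_two_checked() are hypothetical names and do not reflect the driver's
register layout; the point is that each read is checked and the first
error is returned, so the output buffers are never used uninitialized and
the error code is not mangled by OR-ing several results together.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register read: fails with -EIO for addresses >= 0x80. */
int fake_read(uint8_t reg, uint8_t *val)
{
	if (reg >= 0x80)
		return -EIO;
	*val = (uint8_t)(reg ^ 0xff);
	return 0;
}

/*
 * Reads two registers, stopping at the first failure. On error the
 * remaining output is left untouched and the original error code is
 * returned unchanged.
 */
int read_two_checked(uint8_t reg_a, uint8_t reg_b, uint8_t *a, uint8_t *b)
{
	int ret;

	ret = fake_read(reg_a, a);
	if (ret)
		return ret;

	return fake_read(reg_b, b);
}
```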

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Milo <Milo.Kim@ti.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

commit | commitdiff | tree

Inki Dae [Thu, 3 May 2012 05:44:22 +0000 (15:44 +1000)]

fbdev: add events for early fb event support

Add support for the FB_EARLY_EVENT_BLANK and FB_R_EARLY_EVENT_BLANK events.
First, fb_notifier_call_chain() is called with FB_EARLY_EVENT_BLANK, then
fb_blank() of the specific fb driver is called, and then
fb_notifier_call_chain() is called again with FB_EVENT_BLANK in
fb_blank(). If the driver's fb_blank() fails, fb_notifier_call_chain()
is called with FB_R_EARLY_EVENT_BLANK to revert the previous
effects.
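
The ordering can be sketched with a small user-space simulation. All
names below (enum fb_event, struct event_log, blank_with_early_events())
are hypothetical, and the sketch assumes the normal blank event is sent
only when the driver call succeeds; the real fb_blank() may differ.

```c
/* Hypothetical event identifiers mirroring the three events above. */
enum fb_event { EARLY_EVENT_BLANK, EVENT_BLANK, R_EARLY_EVENT_BLANK };

#define MAX_LOG 8

/* Records the order in which notifications were sent. */
struct event_log {
	enum fb_event ev[MAX_LOG];
	int count;
};

static void notify(struct event_log *log, enum fb_event ev)
{
	if (log->count < MAX_LOG)
		log->ev[log->count++] = ev;
}

/* Example driver callbacks: one succeeds, one fails. */
int blank_ok(void) { return 0; }
int blank_fail(void) { return -1; }

/*
 * Early event first, then the driver's blank callback, then either the
 * normal blank event (success) or the revert event (failure).
 */
int blank_with_early_events(struct event_log *log, int (*driver_blank)(void))
{
	int ret;

	notify(log, EARLY_EVENT_BLANK);
	ret = driver_blank();
	if (ret == 0)
		notify(log, EVENT_BLANK);
	else
		notify(log, R_EARLY_EVENT_BLANK);
	return ret;
}
```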

Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Linux kernel for KaRo TX COM modules