git.karo-electronics.de Git - karo-tx-linux.git/log
12 years ago mm: fix page-faults detection in swap-token logic
Konstantin Khlebnikov [Thu, 8 Dec 2011 04:38:53 +0000 (15:38 +1100)]
mm: fix page-faults detection in swap-token logic

After commit v2.6.36-5896-gd065bd8 "mm: retry page fault when blocking on
disk transfer" we usually wait in page faults without mmap_sem held, so
all swap-token logic was broken, because it was based on using
rwsem_is_locked(&mm->mmap_sem) as a sign of in-progress page faults.

Add an atomic counter of in-progress page faults for the mm to the
mm_struct, alongside the swap-token state.
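
For illustration only, a minimal userspace sketch of the counting idea,
using C11 atomics rather than the kernel's atomic_t.  The struct, the
field name and the helper names are made up for this example and are not
taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of mm_struct. */
struct toy_mm {
	atomic_int faults_in_progress;	/* assumed field name */
};

/* Called on entry to the fault path. */
static void toy_fault_begin(struct toy_mm *mm)
{
	atomic_fetch_add(&mm->faults_in_progress, 1);
}

/* Called once the fault has been handled, however long it waited. */
static void toy_fault_end(struct toy_mm *mm)
{
	atomic_fetch_sub(&mm->faults_in_progress, 1);
}

/* Stands in for the old "is mmap_sem locked?" heuristic. */
static bool toy_mm_is_faulting(struct toy_mm *mm)
{
	return atomic_load(&mm->faults_in_progress) > 0;
}
```

The point of the counter is that it stays non-zero while a fault waits
on disk I/O, whether or not any lock is held during the wait.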

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-tracepoint: fix documentation and examples
Konstantin Khlebnikov [Thu, 8 Dec 2011 04:38:52 +0000 (15:38 +1100)]
mm-tracepoint: fix documentation and examples

We renamed the page-free mm tracepoints.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-tracepoint: rename page-free events
Konstantin Khlebnikov [Thu, 8 Dec 2011 04:38:52 +0000 (15:38 +1100)]
mm-tracepoint: rename page-free events

Rename mm_page_free_direct to mm_page_free and mm_pagevec_free to
mm_page_free_batched.

Since v2.6.33-5426-gc475dab the kernel triggers mm_page_free_direct for
all freed pages, not only for directly freed ones.  So, let's name it
properly.  For pages freed via a page list we also trigger the
mm_page_free_batched event.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: remove unused pagevec_free
Konstantin Khlebnikov [Thu, 8 Dec 2011 04:38:51 +0000 (15:38 +1100)]
mm: remove unused pagevec_free

It is not exported and now nobody uses it.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-add-free_hot_cold_page_list-helper-v3
Konstantin Khlebnikov [Thu, 8 Dec 2011 04:38:51 +0000 (15:38 +1100)]
mm-add-free_hot_cold_page_list-helper-v3

v3: Always free pages in reverse order.
    The most recently added struct page is the most likely to be hot.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-add-free_hot_cold_page_list-helper-v2
Konstantin Khlebnikov [Thu, 8 Dec 2011 04:38:51 +0000 (15:38 +1100)]
mm-add-free_hot_cold_page_list-helper-v2

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: add free_hot_cold_page_list() helper
Konstantin Khlebnikov [Thu, 8 Dec 2011 04:38:50 +0000 (15:38 +1100)]
mm: add free_hot_cold_page_list() helper

This patch adds the helper free_hot_cold_page_list() to free a list of
0-order pages.  It frees pages directly from the list without a temporary
page-vector.  It also calls trace_mm_pagevec_free() to simulate
pagevec_free() behaviour.

bloat-o-meter:

add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28)
function                                     old     new   delta
free_hot_cold_page_list                        -     264    +264
get_page_from_freelist                      2129    2132      +3
__pagevec_free                               243     239      -4
split_free_page                              380     373      -7
release_pages                                606     510     -96
free_page_list                               188       -    -188

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago vmscan: activate executable pages after first usage
Konstantin Khlebnikov [Thu, 8 Dec 2011 04:38:50 +0000 (15:38 +1100)]
vmscan: activate executable pages after first usage

Logic added in commit 8cab4754d24a0 ("vmscan: make mapped executable pages
the first class citizen") was noticeably weakened in commit
645747462435d84 ("vmscan: detect mapped file pages used only once").

Currently these pages can become "first class citizens" only after second
usage.  After this patch page_check_references() will activate them after
first usage, and executable code gets an even better chance to stay in
memory.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago vmscan: promote shared file mapped pages
Konstantin Khlebnikov [Thu, 8 Dec 2011 04:38:50 +0000 (15:38 +1100)]
vmscan: promote shared file mapped pages

Commit 645747462435 ("vmscan: detect mapped file pages used only once")
greatly decreases the lifetime of single-used mapped file pages.
Unfortunately it also decreases the lifetime of all shared mapped file
pages, because after commit bf3f3bc5e7347 ("mm: don't mark_page_accessed
in fault path") the page-fault handler does not mark the page active or
even referenced.

Thus page_check_references() activates a file page only if it was used
twice while it stayed on the inactive list, whereas it activates anon
pages after the first access.  The inactive list can be small, so the
reclaimer can accidentally throw away a widely used page if it wasn't
used twice within a short period.

After this patch page_check_references() also activates a file mapped
page at the first inactive list scan if this page is already used
multiple times via several ptes.
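
The decision described above can be modelled in userspace roughly as
follows.  This is a simplified sketch, not the kernel function: the
types and names are invented, and cases the text does not mention are
left out.

```c
#include <stdbool.h>

enum toy_pageref {
	TOY_PAGEREF_RECLAIM,
	TOY_PAGEREF_KEEP,
	TOY_PAGEREF_ACTIVATE,
};

/* Hypothetical summary of what one inactive-list scan sees for a page. */
struct toy_page {
	bool anon;		/* anonymous rather than file-backed */
	bool referenced_flag;	/* set by an earlier scan that saw a use */
	int referenced_ptes;	/* ptes found with the accessed bit set */
};

enum toy_pageref toy_check_references(struct toy_page *page)
{
	if (page->referenced_ptes == 0)
		return TOY_PAGEREF_RECLAIM;

	/* Anon pages are activated after the first access. */
	if (page->anon)
		return TOY_PAGEREF_ACTIVATE;

	/* File page seen in use for the second time. */
	if (page->referenced_flag)
		return TOY_PAGEREF_ACTIVATE;

	/* Rule described by this patch: used through several ptes at once. */
	if (page->referenced_ptes > 1)
		return TOY_PAGEREF_ACTIVATE;

	/* First use through a single pte: remember it, keep the page. */
	page->referenced_flag = true;
	return TOY_PAGEREF_KEEP;
}
```

With this model a file page mapped by three processes is promoted on
the first scan, while a page touched through one pte still needs a
second scan.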

I found this while trying to fix a degradation in rhel6 (~2.6.32) from
rhel5 (~2.6.18).  There is a complete mess with >100 web/mail/spam/ftp
containers; they share all their files but there are a lot of anonymous
pages: ~500MB of shared file mapped memory and 15-20GB of non-shared
anonymous memory.  In this situation major page faults are very costly,
because all containers share the same page.  Under my load the kernel
created disproportionate pressure on the file memory compared with the
anonymous memory; they equalized only when I raised swappiness up to
150 =)

These patches did not actually help much with my problem, but I saw a
noticeable (10-20 times) reduction in the count and average time of
major page faults in file-mapped areas.

Actually both patches are fixes for commit v2.6.33-5448-g6457474, because
it was aimed at one scenario (singly used pages), but it breaks the logic
in other scenarios (shared and/or executable pages).

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm/page-writeback.c: make determine_dirtyable_memory static again
Johannes Weiner [Thu, 8 Dec 2011 04:38:39 +0000 (15:38 +1100)]
mm/page-writeback.c: make determine_dirtyable_memory static again

The tracing ring-buffer used this function briefly, but not anymore.
Make it local to the writeback code again.

Also, move the function so that no forward declaration needs to be
reintroduced.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago MAINTAINERS: Staging: cx25821: Add L: linux-media
Joe Perches [Thu, 8 Dec 2011 04:33:20 +0000 (15:33 +1100)]
MAINTAINERS: Staging: cx25821: Add L: linux-media

Send patches to a mailing list.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago fs: remove unneeded plug in mpage_readpages()
Namjae Jeon [Thu, 8 Dec 2011 04:33:20 +0000 (15:33 +1100)]
fs: remove unneeded plug in mpage_readpages()

The block plug in mpage_readpages() duplicates the one in read_pages().

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/message/fusion/mptbase.c: ensure NUL-termination of MptCallbacksName elements
Ferenc Wagner [Thu, 8 Dec 2011 04:33:20 +0000 (15:33 +1100)]
drivers/message/fusion/mptbase.c: ensure NUL-termination of MptCallbacksName elements

I just stumbled upon this while pondering over
https://bugzilla.kernel.org/show_bug.cgi?id=26692 and thought this could
be made better.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ferenc Wagner <wferi@niif.hu>
Cc: Desai <kashyap.desai@lsi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/scsi/mpt2sas/mpt2sas_base.c: fix mismatch in mpt2sas_base_hard_reset_handler...
Alexey Khoroshilov [Thu, 8 Dec 2011 04:33:19 +0000 (15:33 +1100)]
drivers/scsi/mpt2sas/mpt2sas_base.c: fix mismatch in mpt2sas_base_hard_reset_handler() mutex lock-unlock

If ioc->pci_error_recovery is set, the goto out in
mpt2sas_base_hard_reset_handler() leads to unlocking the unheld
ioc->reset_in_progress_mutex.

Fix the issue by jumping to a point after the mutex_unlock() call.
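
A toy model of the control flow, for illustration.  The lock here is a
plain counter that records unbalanced unlocks, and the label name
out_unlocked is an assumption; neither is taken from the driver.

```c
#include <stdbool.h>

/* Toy lock that counts attempts to unlock while not held. */
struct toy_mutex {
	int depth;
	int bad_unlocks;
};

static void toy_mutex_lock(struct toy_mutex *m)
{
	m->depth++;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	if (m->depth == 0)
		m->bad_unlocks++;
	else
		m->depth--;
}

struct toy_ioc {
	bool pci_error_recovery;
	struct toy_mutex reset_in_progress_mutex;
};

int toy_hard_reset(struct toy_ioc *ioc)
{
	int r = 0;

	/* The lock is not taken yet, so the exit must skip the unlock. */
	if (ioc->pci_error_recovery)
		goto out_unlocked;

	toy_mutex_lock(&ioc->reset_in_progress_mutex);
	r = 1;	/* stands in for the real reset work */
	toy_mutex_unlock(&ioc->reset_in_progress_mutex);

out_unlocked:
	return r;
}
```

If the early exit jumped to a label placed before toy_mutex_unlock(),
bad_unlocks would become non-zero, which is the mismatch the message
describes.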

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/scsi/sg.c: convert to kstrtoul_from_user()
Stephen Boyd [Thu, 8 Dec 2011 04:33:19 +0000 (15:33 +1100)]
drivers/scsi/sg.c: convert to kstrtoul_from_user()

Instead of open-coding this function, use kstrtoul_from_user() directly.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/scsi/aacraid/commctrl.c: fix mem leak in aac_send_raw_srb()
Jesper Juhl [Thu, 8 Dec 2011 04:33:19 +0000 (15:33 +1100)]
drivers/scsi/aacraid/commctrl.c: fix mem leak in aac_send_raw_srb()

We leak in drivers/scsi/aacraid/commctrl.c::aac_send_raw_srb() :

We allocate memory:
        ...
                        struct user_sgmap* usg;
                        usg = kmalloc(actual_fibsize - sizeof(struct aac_srb)
                          + sizeof(struct sgmap), GFP_KERNEL);
and then neglect to free it:
        ...
                        for (i = 0; i < usg->count; i++) {
                                u64 addr;
                                void* p;
                                if (usg->sg[i].count >
                                    ((dev->adapter_info.options &
                                     AAC_OPT_NEW_COMM) ?
                                      (dev->scsi_host_ptr->max_sectors << 9) :
                                      65536)) {
                                        rcode = -EINVAL;
                                        goto cleanup;
        ... this 'goto' makes 'usg' go out of scope and leak the memory we
            allocated.
            Other exits properly kfree(usg), it's just here it is neglected.
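
One common way to rule out this class of leak is to route every exit
through a label that frees the buffer.  The userspace sketch below shows
that pattern; it is not necessarily how the patch fixes the driver, and
all names (including the allocation counter used to make a leak visible)
are invented for the example.

```c
#include <errno.h>
#include <stdlib.h>

static int toy_live_allocs;	/* hypothetical leak detector */

static void *toy_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		toy_live_allocs++;
	return p;
}

static void toy_free(void *p)
{
	if (p) {
		toy_live_allocs--;
		free(p);
	}
}

/* Copies segment lengths and rejects any that exceed max_len. */
int toy_check_segments(const unsigned *lens, unsigned count, unsigned max_len)
{
	int rcode = 0;
	unsigned i;
	unsigned *copy = toy_alloc(count * sizeof(*copy));

	if (!copy)
		return -ENOMEM;

	for (i = 0; i < count; i++) {
		copy[i] = lens[i];
		if (copy[i] > max_len) {
			rcode = -EINVAL;
			goto cleanup;	/* error path still frees copy */
		}
	}

cleanup:
	toy_free(copy);
	return rcode;
}
```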

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/scsi/megaraid.c: fix sparse warnings
Randy Dunlap [Thu, 8 Dec 2011 04:33:18 +0000 (15:33 +1100)]
drivers/scsi/megaraid.c: fix sparse warnings

Fix sparse warnings of right shift bigger than source value size:

drivers/scsi/megaraid.c:311:65: warning: right shift by bigger than source value
drivers/scsi/megaraid.c:313:65: warning: right shift by bigger than source value
drivers/scsi/megaraid.c:317:67: warning: right shift by bigger than source value
drivers/scsi/megaraid.c:319:67: warning: right shift by bigger than source value

Patch suggestion from email by Al Viro:

"Since both are claimed to be strings, I really suspect that this >> 8 is
misspelled >> 4 and they have a character followed by pair of two-digit
packed decimals in there..."
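
To make the suggestion concrete: shifting an 8-bit value right by 8
always yields 0, while shifting by 4 extracts the high decimal digit of
a packed-decimal byte.  The sketch below assumes the layout Al Viro
guesses at (one character plus two packed two-digit decimals); the
function names and the output format are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* What the warning is about: every bit of an 8-bit value is shifted out. */
static unsigned toy_shift_by_8(uint8_t b)
{
	return b >> 8;
}

/* High and low decimal digits of a packed-decimal byte. */
static unsigned toy_bcd_hi(uint8_t b)
{
	return b >> 4;
}

static unsigned toy_bcd_lo(uint8_t b)
{
	return b & 0x0f;
}

/* Formats a hypothetical 3-byte version field as "<char><dd>.<dd>". */
int toy_format_version(char *out, size_t len, const uint8_t v[3])
{
	return snprintf(out, len, "%c%u%u.%u%u", v[2],
			toy_bcd_hi(v[1]), toy_bcd_lo(v[1]),
			toy_bcd_hi(v[0]), toy_bcd_lo(v[0]));
}
```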

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Neela Syam Kolli <megaraidlinux@lsi.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago scsi: fix a header to include linux/types.h
Alexander Shishkin [Thu, 8 Dec 2011 04:33:18 +0000 (15:33 +1100)]
scsi: fix a header to include linux/types.h

For headers that get exported to userland and make use of u32 style
type names, it is advised to include linux/types.h.

This fixes a headers_check warning.

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago parisc, exec: remove redundant set_fs(USER_DS)
Mathias Krause [Thu, 8 Dec 2011 04:33:17 +0000 (15:33 +1100)]
parisc, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ext4: use proper little-endian bitops
Akinobu Mita [Thu, 8 Dec 2011 04:33:17 +0000 (15:33 +1100)]
ext4: use proper little-endian bitops

ext4_{set,clear}_bit() is defined as __test_and_{set,clear}_bit_le() for
ext4.  Only two ext4_{set,clear}_bit() calls check the return value.  The
rest of the calls ignore the return value and can be replaced with
__{set,clear}_bit_le().

This changes ext4_{set,clear}_bit() from __test_and_{set,clear}_bit_le()
to __{set,clear}_bit_le() and introduces ext4_test_and_{set,clear}_bit()
for the two places where the old bit value needs to be returned.

This ext4_{set,clear}_bit() change is considered safe, because if someone
uses these macros without noticing the change, the new
ext4_{set,clear}_bit() have no return value and cause compiler errors
wherever the return value is used.
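
The difference between the two families can be sketched in userspace
like this.  These are non-atomic byte-array helpers written for the
example, with invented names; they are not the kernel implementations.

```c
/* Little-endian numbering: bit nr lives in byte nr / 8, position nr % 8. */

static void toy_set_bit_le(unsigned nr, unsigned char *addr)
{
	addr[nr / 8] |= (unsigned char)(1u << (nr % 8));
}

static void toy_clear_bit_le(unsigned nr, unsigned char *addr)
{
	addr[nr / 8] &= (unsigned char)~(1u << (nr % 8));
}

/* Same update, but also reports whether the bit was set beforehand. */
static int toy_test_and_set_bit_le(unsigned nr, unsigned char *addr)
{
	int old = (addr[nr / 8] >> (nr % 8)) & 1;

	toy_set_bit_le(nr, addr);
	return old;
}

static int toy_test_and_clear_bit_le(unsigned nr, unsigned char *addr)
{
	int old = (addr[nr / 8] >> (nr % 8)) & 1;

	toy_clear_bit_le(nr, addr);
	return old;
}
```

Because the plain set/clear helpers return void, a caller that still
tries to use a return value fails to compile, which is the safety
argument made above.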

This also removes unused ext4_find_first_zero_bit().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ipc-mqueue-update-maximums-for-the-mqueue-subsystem-checkpatch-fixes
Andrew Morton [Thu, 8 Dec 2011 04:33:17 +0000 (15:33 +1100)]
ipc-mqueue-update-maximums-for-the-mqueue-subsystem-checkpatch-fixes

Cc: Amerigo Wang <amwang@redhat.com>
ERROR: Macros with complex values should be enclosed in parenthesis
#87: FILE: include/linux/ipc_namespace.h:126:
+#define DFLT_MSGSIZEMAX 1024*1024

ERROR: Macros with complex values should be enclosed in parenthesis
#88: FILE: include/linux/ipc_namespace.h:127:
+#define HARD_MSGSIZEMAX      16*1024*1024

total: 2 errors, 0 warnings, 75 lines checked

./patches/ipc-mqueue-update-maximums-for-the-mqueue-subsystem.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches
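
Why checkpatch insists on the parentheses: a macro body is pasted
textually, so an unparenthesised product interacts with the operators
around it.  A small demonstration, with macro and function names
invented for the example:

```c
#define TOY_UNSAFE_MSGSIZEMAX 1024*1024
#define TOY_SAFE_MSGSIZEMAX (1024*1024)

/* Expands to bytes / 1024 * 1024, which is not a division by 1 MiB. */
unsigned long toy_unsafe_quotient(unsigned long bytes)
{
	return bytes / TOY_UNSAFE_MSGSIZEMAX;
}

/* Expands to bytes / (1024*1024), as intended. */
unsigned long toy_safe_quotient(unsigned long bytes)
{
	return bytes / TOY_SAFE_MSGSIZEMAX;
}
```

For 2 MiB the unsafe variant hands back the input unchanged, while the
safe one returns 2.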

Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ipc-mqueue-update-maximums-for-the-mqueue-subsystem-fix
Stephen Rothwell [Thu, 8 Dec 2011 04:33:16 +0000 (15:33 +1100)]
ipc-mqueue-update-maximums-for-the-mqueue-subsystem-fix

ipc/mqueue.c: In function 'mqueue_get_inode':
ipc/mqueue.c:154:4: error: implicit declaration of function 'vmalloc'
ipc/mqueue.c:154:19: warning: assignment makes pointer from integer without a cast
ipc/mqueue.c: In function 'mqueue_evict_inode':
ipc/mqueue.c:278:3: error: implicit declaration of function 'vfree'

Caused by commit 8a53f9442429 ("ipc/mqueue: update maximums for the
mqueue subsystem").  See Rule 1 in Documentation/SubmitChecklist.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ipc/mqueue: update maximums for the mqueue subsystem
Doug Ledford [Thu, 8 Dec 2011 04:33:16 +0000 (15:33 +1100)]
ipc/mqueue: update maximums for the mqueue subsystem

Commit b231cca4381ee ("message queues: increase range limits") changed the
maximum size of a message in a message queue from INT_MAX to 8192*128.
Unfortunately, we had customers that relied on a size much larger than
8192*128 on their production systems.  After reviewing POSIX, we found
that it is silent on the maximum message size.  We did find a couple other
areas in which it was not silent.  Fix up the mqueue maximums so that the
customer's system can continue to work, and document both the POSIX and
real world requirements in ipc_namespace.h so that we don't have this
issue crop back up.

Also, commit 9cf18e1dd74c ("ipc: HARD_MSGMAX should be higher not lower on
64bit") fiddled with HARD_MSGMAX without realizing that the number was
intentionally in place to limit the msg queue depth to one that was small
enough to kmalloc an array of pointers (hence why we divided 128k by
sizeof(long)).  If we wish to meet POSIX requirements, we have no choice
but to change our allocation to a vmalloc instead (at least for the large
queue size case).  With that, it's possible to increase our allowed
maximum to the POSIX requirements (or more if we choose).
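
The arithmetic behind the old limit, as a sketch.  The 128K threshold is
the figure quoted above; the helper names are invented, sizeof(void *)
is used for the pointer array (the same as sizeof(long) on Linux), and
the rule for switching allocators is an assumption, not the patch's
actual test.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_KMALLOC_LIMIT (128 * 1024)	/* bytes, per the text above */

/* Deepest queue whose pointer array still fits in one 128K chunk. */
size_t toy_max_kmalloc_msgs(void)
{
	return TOY_KMALLOC_LIMIT / sizeof(void *);
}

/* True when the pointer array for maxmsg messages would exceed 128K. */
bool toy_table_needs_vmalloc(size_t maxmsg)
{
	return maxmsg > TOY_KMALLOC_LIMIT / sizeof(void *);
}
```

With 8-byte pointers the first function gives 16384 messages; any
deeper queue would need the vmalloc-style fallback.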

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ipc/mqueue: enforce hard limits
Doug Ledford [Thu, 8 Dec 2011 04:33:16 +0000 (15:33 +1100)]
ipc/mqueue: enforce hard limits

In two places we don't enforce the hard limits for CAP_SYS_RESOURCE apps.
In preparation for making more reasonable hard limits, start enforcing
them even on CAP_SYS_RESOURCE.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ipc/mqueue: switch back to using non-max values on create
Doug Ledford [Thu, 8 Dec 2011 04:33:15 +0000 (15:33 +1100)]
ipc/mqueue: switch back to using non-max values on create

Commit b231cca4381ee15e ("message queues: increase range limits") changed
how we create a queue that does not include an attr struct passed to open
so that it creates the queue with whatever the maximum values are.
However, if the admin has set the maximums to allow flexibility in
creating a queue (i.e., both a large size and large queue are allowed, but
combined they create a queue too large for the RLIMIT_MSGQUEUE of the
user), then attempts to create a queue without an attr struct will fail.
Switch back to using acceptable defaults regardless of what the maximums
are.
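
A rough model of why this fails.  It assumes a queue is charged one
pointer per message plus the message payload, which is my reading of how
the accounting works and should be treated as an assumption; the helper
names are invented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes charged for a queue of maxmsg messages of msgsize bytes each. */
unsigned long long toy_mq_bytes(unsigned long long maxmsg,
				unsigned long long msgsize)
{
	return maxmsg * sizeof(void *) + maxmsg * msgsize;
}

/* Would such a queue fit under the given RLIMIT_MSGQUEUE value? */
bool toy_mq_fits(unsigned long long maxmsg, unsigned long long msgsize,
		 unsigned long long limit)
{
	return toy_mq_bytes(maxmsg, msgsize) <= limit;
}
```

Taking a limit of 819200 bytes as an example value, a queue of 10
messages of 8192 bytes fits easily, whereas 1024 messages of 1 MiB each
do not, even though each maximum may be acceptable on its own.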

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ipc/mqueue: cleanup definition names and locations
Doug Ledford [Thu, 8 Dec 2011 04:33:15 +0000 (15:33 +1100)]
ipc/mqueue: cleanup definition names and locations

We had a customer come up with a problem while trying to upgrade from our
2.6.18 kernel to our 2.6.32 kernel.  In diagnosing their problem, it was
determined that when commit b231cca4 ("message queues: increase range
limits") changed the msg size max from INT_MAX to 8192*128, that's what
broke their setup.

While fixing this problem, testing showed that if you increase the max
values of a msg queue, then attempt to create one without an attr struct
passed in to the open call, it could fail because it sets the queue size
to the max of both the msg size and queue size.  If these are large
enough, they overrun the default RLIMIT_MSGQUEUE.  This change was also
introduced in the b231cca4 ("message queues: increase range limits")
commit.

We then found that the msg queue limits were not all being enforced on
CAP_SYS_RESOURCE apps.

Finally, we found that commit 9cf18e1d ("ipc: HARD_MSGMAX should be higher
not lower on 64bit") fiddled with HARD_MSGMAX without realizing that the
reason it was set to what it was, was to avoid trying to kmalloc a chunk
larger than 128K.

So this series of patches cleans up the various defines, takes us back to
having a larger HARD_MSGSIZEMAX, goes back to using a separate define for
the case where a user doesn't pass in an attr struct in case the maxes
have been raised too large for RLIMIT_MSGQUEUE, enforces the maximums on
CAP_SYS_RESOURCE apps, uses vmalloc instead of kmalloc when the msg
pointer array is too large, and documents all of this so it shouldn't
happen again.

This patch:

The various defines for minimums and maximums of the sysctl controllable
mqueue values are scattered amongst different files and named
inconsistently.  Move them all into ipc_namespace.h and make them have
consistent names.  Additionally, make the number of queues per namespace
also have a minimum and maximum and use the same sysctl function as the
other two settable variables.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ctags: remove struct forward declarations
Alexey Dobriyan [Thu, 8 Dec 2011 04:33:14 +0000 (15:33 +1100)]
ctags: remove struct forward declarations

They're quite pointless and obscure the location of the real structure
definition.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago merge_config.sh: fix bug in final check
John Stultz [Thu, 8 Dec 2011 04:33:14 +0000 (15:33 +1100)]
merge_config.sh: fix bug in final check

Arnaud Lacombe pointed out that the final check that the requested
configs were included in the final .config was broken.

The example was that if you had a fragment that disabled
CONFIG_DECOMPRESS_GZIP applied to a normal defconfig, there would be no
final warning that CONFIG_DECOMPRESS_GZIP was actually set in the final
.config.

This bug was introduced by me in v3 of the original patch, and the
following patch reverts the invalid change.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reported-by: Arnaud Lacombe <lacombar@gmail.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Arnaud Lacombe <lacombar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago merge_config.sh: whitespace cleanup
Darren Hart [Thu, 8 Dec 2011 04:33:14 +0000 (15:33 +1100)]
merge_config.sh: whitespace cleanup

Fix whitespace usage in the clean_up routine.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago merge_config.sh: use signal names compatible with dash and bash
Darren Hart [Thu, 8 Dec 2011 04:33:13 +0000 (15:33 +1100)]
merge_config.sh: use signal names compatible with dash and bash

The SIGHUP, SIGINT and SIGTERM names caused failures when running
merge_config.sh with the dash shell.  Dropping the "SIG" component makes
the script work in both bash and dash.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago kconfig: add merge_config.sh script
john stultz [Thu, 8 Dec 2011 04:33:04 +0000 (15:33 +1100)]
kconfig: add merge_config.sh script

After noticing almost every distro has their own method of managing config
fragments, I went looking at some best practices, and wanted to try to
consolidate some of the different approaches so this fairly simple
infrastructure can be shared (and new distros/build systems don't have to
implement yet another config fragment merge script).

This script is most influenced by the Windriver tools used in the Yocto
Project, reusing some portions found there.

This script merges multiple config fragments, warning on any overridden
values.  It then sets any unspecified values to their default, then
finally checks to make sure no specified value was dropped due to
unsatisfied dependencies.

I'm sure this implementation won't work for everyone, and I expect it will
need to evolve to adapt for various use cases.  But I think it's a
reasonable starting point.

Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: <tartler@cs.fau.de>
Cc: Dmitry Fink <Dmitry.Fink@palm.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Bruce Ashfield <Bruce.Ashfield@windriver.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ia64, exec: remove redundant set_fs(USER_DS)
Mathias Krause [Thu, 8 Dec 2011 04:32:14 +0000 (15:32 +1100)]
ia64, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago tick-sched: add specific do_timer_cpu value for nohz off mode
Dimitri Sivanich [Thu, 8 Dec 2011 04:32:13 +0000 (15:32 +1100)]
tick-sched: add specific do_timer_cpu value for nohz off mode

Show and modify the tick_do_timer_cpu via sysfs.  This determines the cpu
on which global time (jiffies) updates occur.  Modification can only be
done on systems with nohz mode turned off.

While not necessarily harmful, doing jiffies updates on an application cpu
does cause some extra overhead that HPC benchmarking people notice.  They
prefer to have OS activity isolated to certain cpus.  They like
reproducibility of results, and having jiffies updates bouncing around
introduces variability.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago hrtimers: Special-case zero length sleeps
Matthew Garrett [Thu, 8 Dec 2011 04:32:13 +0000 (15:32 +1100)]
hrtimers: Special-case zero length sleeps

sleep(0) is a common construct used by applications that want to trigger
the scheduler.  sched_yield() might make more sense, but only appeared in
POSIX.1-2001 and so plenty of example code still uses the sleep(0) form.

This wouldn't normally be a problem, but it means that event-driven
applications that are merely trying to avoid starving other processes may
actually end up sleeping due to having large timer_slack values.
Special-casing this seems reasonable.
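
For comparison, the sched_yield() form that the message says might make
more sense in application code.  A minimal sketch; the helper name is
invented.

```c
#include <sched.h>

/*
 * Hypothetical helper: let other runnable tasks go first, `rounds` times,
 * without asking for a timed sleep.  Returns 0 on success, -1 on error.
 */
int toy_be_polite(int rounds)
{
	int i;

	for (i = 0; i < rounds; i++) {
		if (sched_yield() != 0)
			return -1;
	}
	return 0;
}
```

Unlike sleep(0), this never enters the timer path, so timer slack
settings cannot turn it into a real sleep.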

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago arm, exec: remove redundant set_fs(USER_DS)
Mathias Krause [Thu, 8 Dec 2011 04:32:12 +0000 (15:32 +1100)]
arm, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago arch/arm/mach-ux500/mbox-db5500.c: world-writable sysfs fifo file
Vasiliy Kulikov [Thu, 8 Dec 2011 04:32:12 +0000 (15:32 +1100)]
arch/arm/mach-ux500/mbox-db5500.c: world-writable sysfs fifo file

Don't allow everybody to use a modem.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers-platform-x86-sony-laptopc-fix-scancodes-v2-checkpatch-fixes
Andrew Morton [Thu, 8 Dec 2011 04:32:12 +0000 (15:32 +1100)]
drivers-platform-x86-sony-laptopc-fix-scancodes-v2-checkpatch-fixes

Cc: Dan Carpenter <dan.carpenter@oracle.com>
ERROR: code indent should use tabs where possible
#34: FILE: drivers/platform/x86/sony-laptop.c:395:
+                   remap the key */$

total: 1 errors, 0 warnings, 28 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/drivers-platform-x86-sony-laptopc-fix-scancodes-v2.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: John Hughes <john@Calva.COM>
Cc: John Hughes <john@calva.com>
Cc: John Hughes <john@calvaedi.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Mattia Dongili <malattia@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers-platform-x86-sony-laptopc-fix-scancodes-v2
John Hughes [Thu, 8 Dec 2011 04:32:11 +0000 (15:32 +1100)]
drivers-platform-x86-sony-laptopc-fix-scancodes-v2

Signed-off-by: John Hughes <john@calva.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mattia Dongili <malattia@linux.it>
Cc: John Hughes <john@Calva.COM>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers-platform-x86-sony-laptopc-fix-scancodes-checkpatch-fixes
Andrew Morton [Thu, 8 Dec 2011 04:32:11 +0000 (15:32 +1100)]
drivers-platform-x86-sony-laptopc-fix-scancodes-checkpatch-fixes

ERROR: do not use assignment in if condition
#59: FILE: drivers/platform/x86/sony-laptop.c:384:
+ if ((scancode = sony_laptop_input_index[event]) != -1) {

total: 1 errors, 0 warnings, 39 lines checked

./patches/drivers-platform-x86-sony-laptopc-fix-scancodes.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches
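
For illustration, the pattern checkpatch flags above and the usual way to
restructure it can be sketched in plain userspace C. This is only a sketch
under assumptions, not the driver code: the table contents, N_EVENTS and
event_to_scancode() are hypothetical stand-ins for the
sony_laptop_input_index[event] lookup quoted in the error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sony_laptop_input_index[]; -1 means "unmapped". */
static const int input_index[] = { -1, 0, 1, -1, 2 };
#define N_EVENTS (sizeof(input_index) / sizeof(input_index[0]))

/*
 * Look up the scancode for an event.  Returns true and stores the
 * scancode in *out when the event is mapped, false otherwise.
 */
bool event_to_scancode(size_t event, int *out)
{
	int scancode;

	if (event >= N_EVENTS)
		return false;

	/*
	 * Assign first, then test.  This replaces the flagged form
	 * "if ((scancode = input_index[event]) != -1)".
	 */
	scancode = input_index[event];
	if (scancode != -1) {
		*out = scancode;
		return true;
	}

	return false;
}
```

Behaviour is unchanged by the restructuring; only the assignment moves out
of the condition.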

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/platform/x86/sony-laptop.c: fix scancodes
John Hughes [Thu, 8 Dec 2011 04:32:11 +0000 (15:32 +1100)]
drivers/platform/x86/sony-laptop.c: fix scancodes

The scancodes returned by the sony-laptop driver for function keys did not
match the scancodes used to remap keys.  Also, since the scancode was sent
to the input subsystem after the mapped keysym the /lib/udev/keymap
utility was confused about which scancode to report for which keysym.

This patch fixes the driver so the correct scancode is shown for each key.
It also adds to the documentation a description of where to find the
scancodes.

Before the patch, FN/E returned scancode 0x1B, but scancode 0x14 had to be
used to remap it.

Signed-off-by: John Hughes <john@calva.com>
Cc: Mattia Dongili <malattia@linux.it>
Cc: Matthew Garrett <mjg@redhat.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agox86, olpc: add debugfs interface for EC commands
Daniel Drake [Thu, 8 Dec 2011 04:32:10 +0000 (15:32 +1100)]
x86, olpc: add debugfs interface for EC commands

Add a debugfs interface for sending commands to the OLPC Embedded
Controller (EC) and reading the responses.  The EC provides functionality
for machine identification, battery and AC control, wakeup control, etc.

Having a debugfs interface available is useful for EC development and
debugging.

Based on code by Paul Fox.

Signed-off-by: Paul Fox <pgf@laptop.org>
Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andres Salomon <dilinger@queued.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm-vmallocc-eliminate-extra-loop-in-pcpu_get_vm_areas-error-path-fix
Andrew Morton [Thu, 8 Dec 2011 04:32:10 +0000 (15:32 +1100)]
mm-vmallocc-eliminate-extra-loop-in-pcpu_get_vm_areas-error-path-fix

remove now-unneeded tests

Cc: Kautuk Consul <consul.kautuk@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/vmalloc.c: eliminate extra loop in pcpu_get_vm_areas error path
Kautuk Consul [Thu, 8 Dec 2011 04:32:09 +0000 (15:32 +1100)]
mm/vmalloc.c: eliminate extra loop in pcpu_get_vm_areas error path

If either the vas or the vms array is not successfully kzalloced, the
code jumps to the err_free label.

The err_free label runs a loop to check and free each of the array members
of the vas and vms arrays which is not required for this situation as none
of the array members have been allocated till this point.

Eliminate the extra loop we have to go through by introducing a new label
err_free2 and then jumping to it.
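
The two-label error path described above can be sketched in userspace C as
follows. This is an illustrative sketch, not the mm/vmalloc.c code: struct
area, demo_alloc_areas() and the fail_arrays parameter (used only to
simulate an array allocation failure) are hypothetical, and calloc()/free()
stand in for the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the objects tracked in vas[] and vms[]. */
struct area { int id; };

/*
 * Allocate two pointer arrays and their members, then release everything
 * again (a real caller would keep them on success).  Returns 0 on success
 * and -1 on failure.
 */
int demo_alloc_areas(size_t nr, int fail_arrays)
{
	struct area **vas, **vms;
	size_t i;
	int ret = -1;

	vas = fail_arrays ? NULL : calloc(nr, sizeof(*vas));
	vms = calloc(nr, sizeof(*vms));
	if (!vas || !vms)
		goto err_free2;	/* no members exist yet: skip the loop */

	for (i = 0; i < nr; i++) {
		vas[i] = calloc(1, sizeof(**vas));
		vms[i] = calloc(1, sizeof(**vms));
		if (!vas[i] || !vms[i])
			goto err_free;
	}
	ret = 0;
	/* Demo only: fall through and free on success as well. */

err_free:
	/* Members may be partially allocated; free(NULL) is a no-op. */
	for (i = 0; i < nr; i++) {
		free(vas[i]);
		free(vms[i]);
	}
err_free2:
	/* Only the arrays themselves need releasing here. */
	free(vas);
	free(vms);
	return ret;
}
```

Jumping to err_free2 on the early failure avoids walking arrays that cannot
contain anything to free, and also avoids indexing an array pointer that
may itself be NULL.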

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agox86-olpc-xo15-sci-enable-lid-close-wakeup-control-through-sysfs-v2
Daniel Drake [Thu, 8 Dec 2011 04:32:09 +0000 (15:32 +1100)]
x86-olpc-xo15-sci-enable-lid-close-wakeup-control-through-sysfs-v2

v2: Fix sscanf usage error and add an explanatory comment in the code, both pointed out by Andrew Morton. Thanks!

Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agox86-olpc-xo15-sci-enable-lid-close-wakeup-control-through-sysfs-fix
Andrew Morton [Thu, 8 Dec 2011 04:32:09 +0000 (15:32 +1100)]
x86-olpc-xo15-sci-enable-lid-close-wakeup-control-through-sysfs-fix

fix sscanf checking

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Daniel Drake <dsd@laptop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agox86, olpc-xo15-sci: enable lid close wakeup control through sysfs
Daniel Drake [Thu, 8 Dec 2011 04:32:08 +0000 (15:32 +1100)]
x86, olpc-xo15-sci: enable lid close wakeup control through sysfs

Like most systems, OLPC's ACPI LID switch wakes up the system when the lid
is opened, but not when it is closed.

Under OLPC's opportunistic suspend model, the lid may be closed while the
system is opportunistically suspended with the screen running.  In this
event, we want to wake up to turn the screen off.

Enable control of normal ACPI wakeups on lid close events through a
new sysfs attribute "lid_wake_on_closed".  When set, and when LID wakeups
are enabled through ACPI, the system will wake up on both open and close
lid events.
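
As a rough sketch of how a value written to such an attribute might be
parsed with a checked sscanf() call (the point of the follow-up -fix and v2
patches), consider the userspace C below. The names
lid_wake_on_close_store() and lid_wake_on_close_get() and the return codes
are assumptions made for illustration; the driver's actual store routine is
not shown in this log.

```c
#include <stdio.h>

/* Hypothetical flag mirroring the state behind "lid_wake_on_closed". */
static int lid_wake_on_close;

/* Read back the current setting (0 or 1). */
int lid_wake_on_close_get(void)
{
	return lid_wake_on_close;
}

/*
 * Parse a value written by the user.  Returns 0 on success, or -1 if
 * buf does not start with an unsigned integer.
 */
int lid_wake_on_close_store(const char *buf)
{
	unsigned int enable;

	/*
	 * sscanf() returns the number of items converted (or EOF), so
	 * anything other than 1 means nothing usable was parsed.
	 */
	if (sscanf(buf, "%u", &enable) != 1)
		return -1;

	lid_wake_on_close = !!enable;	/* normalise to 0 or 1 */
	return 0;
}
```

Checking the return value against the expected conversion count keeps an
uninitialised variable from being used when the input is empty or
non-numeric.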

Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoarch/x86/platform/iris/iris.c: register a platform device and a platform driver
Shérab [Thu, 8 Dec 2011 04:32:08 +0000 (15:32 +1100)]
arch/x86/platform/iris/iris.c: register a platform device and a platform driver

This makes the iris driver use the platform API, so it is properly exposed
in /sys.

[akpm@linux-foundation.org: remove commented-out code, add missing space to printk, clean up code layout]
Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoacerhdf: lowered default temp fanon/fanoff values
Peter Feuerer [Thu, 8 Dec 2011 04:32:08 +0000 (15:32 +1100)]
acerhdf: lowered default temp fanon/fanoff values

Because the actual temperature limits of the processor, hard disk and other
components of the newly supported hardware are unknown, lower
fanon / fanoff settings feel safer.

It won't change much for most people already using acerhdf, as they use
their own fanon/fanoff variable settings when loading the module.

Furthermore, it seems that the kernel and userspace tools have been improved
to work more efficiently, and netbooks don't get so hot anymore.

Signed-off-by: Peter Feuerer <peter@piie.net>
Acked-by: Borislav Petkov <petkovbb@gmail.com>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoacerhdf: add support for new hardware
Peter Feuerer [Thu, 8 Dec 2011 04:32:07 +0000 (15:32 +1100)]
acerhdf: add support for new hardware

Add support for new hardware:
Acer Aspire LT-10Q/531/751/1810/1825,
Acer Travelmate 7730,
Packard Bell ENBFT/DOTVR46

Signed-off-by: Peter Feuerer <peter@piie.net>
Acked-by: Borislav Petkov <petkovbb@gmail.com>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoacerhdf: add support for Aspire 1410 BIOS v1.3314
Clay Carpenter [Thu, 8 Dec 2011 04:32:07 +0000 (15:32 +1100)]
acerhdf: add support for Aspire 1410 BIOS v1.3314

Add support for Aspire 1410 BIOS v1.3314.  Fixes the following error:

acerhdf: unknown (unsupported) BIOS version Acer/Aspire 1410/v1.3314,
please report, aborting!

Signed-off-by: Clay Carpenter <claycarpenter@gmail.com>
Signed-off-by: Peter Feuerer <peter@piie.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agonet/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy
Alex Bligh [Thu, 8 Dec 2011 04:32:07 +0000 (15:32 +1100)]
net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is called from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
oopses because net->nfnl is NULL.

The perl program generates the container through fork() then
clone(CLONE_NEWNET). It does not explicitly set up netlink, but I
presume it was set up, else net->nfnl would have been NULL earlier
(i.e. when an earlier connection timed out). This would thus suggest
that net->nfnl is made NULL during the destruction of the container,
which I think is done by nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the reverse
of the order in which the relevant register_pernet_subsys calls are made,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nf_conntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).

Patch:

The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the above perl script for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.

Applicability:

If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven that net->nfnl will never be NULL.

Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on
container destroy
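
The shape of the early check can be sketched in self-contained C as below.
This is a simplified illustration, not the netfilter code: struct nl_sock,
struct netns, has_listeners() and conntrack_event() are hypothetical
stand-ins for the kernel's structures and for ctnetlink_conntrack_event.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the netlink socket and namespace. */
struct nl_sock { int listeners; };
struct netns { struct nl_sock *nfnl; };

/* Mimics a helper that dereferences net->nfnl unconditionally. */
static int has_listeners(const struct netns *net)
{
	return net->nfnl->listeners > 0;
}

/* Returns 1 if an event would be delivered, 0 if it is skipped. */
int conntrack_event(const struct netns *net)
{
	/*
	 * Early check: a namespace that is being torn down may already
	 * have nfnl set to NULL, so drop the event instead of crashing.
	 */
	if (!net->nfnl)
		return 0;

	if (!has_listeners(net))
		return 0;

	return 1;
}
```

As the analysis notes, a check like this does not by itself close a race in
which the pointer becomes NULL right after being tested; it only covers the
case where teardown has already happened.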

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agocpusets-stall-when-updating-mems_allowed-for-mempolicy-or-disjoint-nodemask-fix
Andrew Morton [Thu, 8 Dec 2011 04:32:04 +0000 (15:32 +1100)]
cpusets-stall-when-updating-mems_allowed-for-mempolicy-or-disjoint-nodemask-fix

stupid temporary hack to make it build with CONFIG_NUMA=n

Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agocpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
David Rientjes [Thu, 8 Dec 2011 04:32:04 +0000 (15:32 +1100)]
cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask

c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread.  This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.

This stall is unnecessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes.  This was addressed by
89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind.  To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.

This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.

Also moves the check for nodemask intersection inside task_lock() so that
tsk->mems_allowed cannot change.  This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk->mempolicy.

Reported-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoMerge remote-tracking branch 'uapi/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 05:39:01 +0000 (16:39 +1100)]
Merge remote-tracking branch 'uapi/for-next'

12 years agoMerge remote-tracking branch 'kvmtool/master'
Stephen Rothwell [Wed, 14 Dec 2011 05:37:29 +0000 (16:37 +1100)]
Merge remote-tracking branch 'kvmtool/master'

Conflicts:
include/net/9p/9p.h
scripts/kconfig/Makefile

12 years agoMerge remote-tracking branch 'remoteproc/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 05:33:03 +0000 (16:33 +1100)]
Merge remote-tracking branch 'remoteproc/for-next'

12 years agoMerge remote-tracking branch 'memblock/memblock-kill-early_node_map'
Stephen Rothwell [Wed, 14 Dec 2011 05:26:09 +0000 (16:26 +1100)]
Merge remote-tracking branch 'memblock/memblock-kill-early_node_map'

Conflicts:
arch/arm/mm/init.c
arch/score/Kconfig

12 years agoMerge remote-tracking branch 'xshm/xshm-for-next'
Stephen Rothwell [Wed, 14 Dec 2011 05:24:38 +0000 (16:24 +1100)]
Merge remote-tracking branch 'xshm/xshm-for-next'

12 years agoMerge remote-tracking branch 'kmap_atomic/kmap_atomic'
Stephen Rothwell [Wed, 14 Dec 2011 05:21:30 +0000 (16:21 +1100)]
Merge remote-tracking branch 'kmap_atomic/kmap_atomic'

Conflicts:
Documentation/feature-removal-schedule.txt

12 years agoMerge remote-tracking branch 'vhost/linux-next'
Stephen Rothwell [Wed, 14 Dec 2011 05:14:49 +0000 (16:14 +1100)]
Merge remote-tracking branch 'vhost/linux-next'

Conflicts:
arch/hexagon/Kconfig
arch/m68k/Kconfig

12 years agoMerge remote-tracking branch 'pinctrl/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 05:13:15 +0000 (16:13 +1100)]
Merge remote-tracking branch 'pinctrl/for-next'

12 years agoMerge remote-tracking branch 'writeback/writeback-for-next'
Stephen Rothwell [Wed, 14 Dec 2011 05:06:50 +0000 (16:06 +1100)]
Merge remote-tracking branch 'writeback/writeback-for-next'

12 years agoMerge remote-tracking branch 'tmem/tmem'
Stephen Rothwell [Wed, 14 Dec 2011 05:04:59 +0000 (16:04 +1100)]
Merge remote-tracking branch 'tmem/tmem'

Conflicts:
mm/swapfile.c

12 years agoMerge remote-tracking branch 'char-misc/char-misc-next'
Stephen Rothwell [Wed, 14 Dec 2011 05:03:18 +0000 (16:03 +1100)]
Merge remote-tracking branch 'char-misc/char-misc-next'

12 years agoMerge remote-tracking branch 'staging/staging-next'
Stephen Rothwell [Wed, 14 Dec 2011 05:01:35 +0000 (16:01 +1100)]
Merge remote-tracking branch 'staging/staging-next'

Conflicts:
drivers/hid/hid-hyperv.c
drivers/staging/hv/Kconfig
drivers/staging/hv/Makefile
drivers/staging/iio/adc/ad799x_core.c

12 years agoMerge remote-tracking branch 'usb/usb-next'
Stephen Rothwell [Wed, 14 Dec 2011 04:54:37 +0000 (15:54 +1100)]
Merge remote-tracking branch 'usb/usb-next'

12 years agoMerge remote-tracking branch 'tty/tty-next'
Stephen Rothwell [Wed, 14 Dec 2011 04:52:40 +0000 (15:52 +1100)]
Merge remote-tracking branch 'tty/tty-next'

Conflicts:
drivers/tty/serial/Kconfig
drivers/tty/serial/Makefile

12 years agoMerge commit 'refs/next/20111213/driver-core'
Stephen Rothwell [Wed, 14 Dec 2011 04:45:29 +0000 (15:45 +1100)]
Merge commit 'refs/next/20111213/driver-core'

12 years agoMerge remote-tracking branch 'hsi/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 04:31:51 +0000 (15:31 +1100)]
Merge remote-tracking branch 'hsi/for-next'

12 years agoMerge remote-tracking branch 'regmap/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 04:30:23 +0000 (15:30 +1100)]
Merge remote-tracking branch 'regmap/for-next'

Conflicts:
drivers/base/regmap/regcache.c
drivers/base/regmap/regmap.c

12 years agoMerge remote-tracking branch 'namespace/master'
Stephen Rothwell [Wed, 14 Dec 2011 04:15:03 +0000 (15:15 +1100)]
Merge remote-tracking branch 'namespace/master'

12 years agoMerge remote-tracking branch 'sysctl/master'
Stephen Rothwell [Wed, 14 Dec 2011 04:13:25 +0000 (15:13 +1100)]
Merge remote-tracking branch 'sysctl/master'

12 years agoMerge remote-tracking branch 'xen-two/linux-next'
Stephen Rothwell [Wed, 14 Dec 2011 04:07:19 +0000 (15:07 +1100)]
Merge remote-tracking branch 'xen-two/linux-next'

Conflicts:
arch/x86/xen/Kconfig

12 years agoMerge remote-tracking branch 'xen/upstream/xen'
Stephen Rothwell [Wed, 14 Dec 2011 04:05:48 +0000 (15:05 +1100)]
Merge remote-tracking branch 'xen/upstream/xen'

Conflicts:
arch/x86/xen/Kconfig

12 years agoMerge remote-tracking branch 'oprofile/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 04:04:03 +0000 (15:04 +1100)]
Merge remote-tracking branch 'oprofile/for-next'

12 years agoMerge remote-tracking branch 'kmemleak/kmemleak'
Stephen Rothwell [Wed, 14 Dec 2011 03:57:33 +0000 (14:57 +1100)]
Merge remote-tracking branch 'kmemleak/kmemleak'

12 years agoMerge remote-tracking branch 'cgroup/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 03:51:16 +0000 (14:51 +1100)]
Merge remote-tracking branch 'cgroup/for-next'

12 years agoMerge remote-tracking branch 'uprobes/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 03:44:18 +0000 (14:44 +1100)]
Merge remote-tracking branch 'uprobes/for-next'

12 years agoMerge remote-tracking branch 'tip/auto-latest'
Stephen Rothwell [Wed, 14 Dec 2011 03:36:58 +0000 (14:36 +1100)]
Merge remote-tracking branch 'tip/auto-latest'

Conflicts:
drivers/cpufreq/cpufreq_conservative.c
drivers/cpufreq/cpufreq_ondemand.c
drivers/macintosh/rack-meter.c
fs/proc/stat.c
fs/proc/uptime.c
kernel/sched/core.c

12 years agoMerge remote-tracking branch 'gpio/gpio/next'
Stephen Rothwell [Wed, 14 Dec 2011 03:34:24 +0000 (14:34 +1100)]
Merge remote-tracking branch 'gpio/gpio/next'

12 years agoMerge remote-tracking branch 'edac-amd/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 03:32:47 +0000 (14:32 +1100)]
Merge remote-tracking branch 'edac-amd/for-next'

12 years agoMerge remote-tracking branch 'fsnotify/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 03:29:10 +0000 (14:29 +1100)]
Merge remote-tracking branch 'fsnotify/for-next'

12 years agoMerge remote-tracking branch 'apm/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 03:27:36 +0000 (14:27 +1100)]
Merge remote-tracking branch 'apm/for-next'

12 years agoMerge remote-tracking branch 'pm/linux-next'
Stephen Rothwell [Wed, 14 Dec 2011 03:20:44 +0000 (14:20 +1100)]
Merge remote-tracking branch 'pm/linux-next'

12 years agoMerge remote-tracking branch 'trivial/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 03:13:26 +0000 (14:13 +1100)]
Merge remote-tracking branch 'trivial/for-next'

Conflicts:
arch/powerpc/platforms/40x/Kconfig

12 years agoMerge remote-tracking branch 'osd/linux-next'
Stephen Rothwell [Wed, 14 Dec 2011 03:13:16 +0000 (14:13 +1100)]
Merge remote-tracking branch 'osd/linux-next'

12 years agoMerge remote-tracking branch 'cputime/cputime'
Stephen Rothwell [Wed, 14 Dec 2011 03:06:52 +0000 (14:06 +1100)]
Merge remote-tracking branch 'cputime/cputime'

12 years agoMerge remote-tracking branch 'iommu/next'
Stephen Rothwell [Wed, 14 Dec 2011 03:05:25 +0000 (14:05 +1100)]
Merge remote-tracking branch 'iommu/next'

12 years agoMerge remote-tracking branch 'watchdog/linux-next'
Stephen Rothwell [Wed, 14 Dec 2011 03:03:47 +0000 (14:03 +1100)]
Merge remote-tracking branch 'watchdog/linux-next'

12 years agoMerge remote-tracking branch 'security/next'
Stephen Rothwell [Wed, 14 Dec 2011 03:01:03 +0000 (14:01 +1100)]
Merge remote-tracking branch 'security/next'

Conflicts:
lib/Makefile

12 years agoMerge remote-tracking branch 'regulator/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 02:59:35 +0000 (13:59 +1100)]
Merge remote-tracking branch 'regulator/for-next'

12 years agoMerge remote-tracking branch 'fbdev/fbdev-next'
Stephen Rothwell [Wed, 14 Dec 2011 02:58:01 +0000 (13:58 +1100)]
Merge remote-tracking branch 'fbdev/fbdev-next'

12 years agoMerge remote-tracking branch 'drm/drm-next'
Stephen Rothwell [Wed, 14 Dec 2011 02:53:09 +0000 (13:53 +1100)]
Merge remote-tracking branch 'drm/drm-next'

Conflicts:
drivers/gpu/drm/nouveau/nouveau_sgdma.c

12 years agoMerge remote-tracking branch 'mfd/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 02:48:53 +0000 (13:48 +1100)]
Merge remote-tracking branch 'mfd/for-next'

12 years agoMerge remote-tracking branch 'md/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 02:47:24 +0000 (13:47 +1100)]
Merge remote-tracking branch 'md/for-next'

12 years agoMerge remote-tracking branch 'slab/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 02:45:55 +0000 (13:45 +1100)]
Merge remote-tracking branch 'slab/for-next'

12 years agoMerge remote-tracking branch 'kgdb/kgdb-next'
Stephen Rothwell [Wed, 14 Dec 2011 02:45:45 +0000 (13:45 +1100)]
Merge remote-tracking branch 'kgdb/kgdb-next'

12 years agoMerge remote-tracking branch 'mmc/mmc-next'
Stephen Rothwell [Wed, 14 Dec 2011 02:44:19 +0000 (13:44 +1100)]
Merge remote-tracking branch 'mmc/mmc-next'

Conflicts:
drivers/mmc/card/block.c

12 years agoMerge remote-tracking branch 'battery/master'
Stephen Rothwell [Wed, 14 Dec 2011 02:41:01 +0000 (13:41 +1100)]
Merge remote-tracking branch 'battery/master'

12 years agoMerge remote-tracking branch 'block/for-next'
Stephen Rothwell [Wed, 14 Dec 2011 02:38:34 +0000 (13:38 +1100)]
Merge remote-tracking branch 'block/for-next'