Doug Ledford [Mon, 24 Oct 2011 14:53:42 +0000 (01:53 +1100)]
ipc/mqueue: cleanup definition names and locations
We had a customer come up with a problem while trying to upgrade from our
2.6.18 kernel to our 2.6.32 kernel. In diagnosing their problem, it was
determined that when commit b231cca4 ("message queues: increase range
limits") changed the msg size max from INT_MAX to 8192*128, that's what
broke their setup.
While fixing this problem, testing showed that if you increase the max
values of a msg queue, then attempt to create one without an attr struct
passed in to the open call, it could fail because it sets the queue size
to the max of both the msg size and queue size. If these are large
enough, they over run the default RLIMIT_MSGQUEUE. This change was also
introduced in the b231cca4 ("message queues: increase range limits")
commit.
We then found that the msg queue limits were not all being enforced on
CAP_SYS_RESOURCE apps.
Finally, we found that commit 9cf18e1d ("ipc: HARD_MSGMAX should be higher
not lower on 64bit") fiddled with HARD_MSGMAX without realizing that the
reason it was set to what it was, was to avoid trying to kmalloc a chunk
larger than 128K.
So this series of patches cleans up the various defines, takes us back to
having a larger HARD_MSGSIZEMAX, goes back to using a separate define for
the case where a user doesn't pass in an attr struct in case the maxes
have been raised too large for RLIMIT_MSGQUEUE, enforces the maximums on
CAP_SYS_RESOURCE apps, uses vmalloc instead of kmalloc when the msg
pointer array is too large, and documents all of this so it shouldn't
happen again.
This patch:
The various defines for minimums and maximums of the sysctl controllable
mqueue values are scattered amongst different files and named
inconsistently. Move them all into ipc_namespace.h and make them have
consistent names. Additionally, make the number of queues per namespace
also have a minimum and maximum and use the same sysctl function as the
other two settable variables.
Signed-off-by: Doug Ledford <dledford@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Joe Korty <joe.korty@ccur.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andi Kleen [Mon, 24 Oct 2011 14:53:41 +0000 (01:53 +1100)]
brlocks/lglocks: clean up code
lglocks and brlocks are currently generated with some complicated macros
in lglock.h. But there's no reason I can see to not just use common
utility functions that get pointers to the lglock.
Since there are at least two users it makes sense to share this code in a
library.
This will also make it later possible to dynamically allocate lglocks.
In general the users now look more like normal function calls with
pointers, not magic macros.
The patch is rather large because I move over all users in one go to keep
it bisectable. This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Garrett [Mon, 24 Oct 2011 14:53:40 +0000 (01:53 +1100)]
hrtimers: Special-case zero length sleeps
sleep(0) is a common construct used by applications that want to trigger
the scheduler. sched_yield() might make more sense, but only appeared in
POSIX.1-2001 and so plenty of example code still uses the sleep(0) form.
This wouldn't normally be a problem, but it means that event-driven
applications that are merely trying to avoid starving other processes may
actually end up sleeping due to having large timer_slack values. Special-
casing this seems reasonable.
Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hugh Dickins [Mon, 24 Oct 2011 14:53:40 +0000 (01:53 +1100)]
drm: avoid switching to text console if there is no panic timeout
Add a check for panic_timeout in the drm_fb_helper_panic() notifier: if
we're going to reboot immediately, the user will not be able to see the
messages anyway, and messing with the video mode may display artifacts,
and certainly get into several layers of complexity (including mutexes and
memory allocations) which we shall be much safer to avoid.
[msb@chromium.org: edited commit message and modified to short-circuit panic_timeout < 0 instead of testing panic_timeout >= 0] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Dave Airlie <airlied@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Stéphane Marchesin <marcheu@chromium.org> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shaohui Xie [Mon, 24 Oct 2011 14:53:38 +0000 (01:53 +1100)]
drivers/edac/mpc85xx_edac.c: fix memory controller compatible for edac
compatible in dts has been changed, so the driver needs to be updated
accordingly.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
warning: symbol 'get_nonsnap_parent' was not declared. Should it be static?
warning: symbol 'done_closing_sessions' was not declared. Should it be static?
Local functions don't need external visability. Make them static.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Sage Weil <sage@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jesper Juhl [Mon, 24 Oct 2011 14:53:36 +0000 (01:53 +1100)]
audit: always follow va_copy() with va_end()
A call to va_copy() should always be followed by a call to va_end() in the
same function. In kernel/autit.c::audit_log_vformat() this is not always
done. This patch makes sure va_end() is always called.
Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mathias Krause [Mon, 24 Oct 2011 14:53:35 +0000 (01:53 +1100)]
arm, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
arch/x86/kernel/tsc.c: In function 'calibrate_delay_is_known':
arch/x86/kernel/tsc.c:1012: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'
arch/x86/kernel/tsc.c:1012: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'
arch/x86/kernel/tsc.c:1006: warning: unused variable 'cpu'
Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jack Steiner <steiner@sgi.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jack Steiner [Mon, 24 Oct 2011 14:53:34 +0000 (01:53 +1100)]
x86: reduce clock calibration time during slave cpu startup
Reduce the startup time for slave cpus.
Adds hooks for an arch-specific function for clock calibration. These
hooks are used on x86. If a newly started cpu has the same phys_proc_id
as a core already active, uses the TSC for the delay loop and has a
CONSTANT_TSC, use the already-calculated value of loops_per_jiffy.
This patch reduces the time required to start slave cpus on a 4096 cpu
system from: 465 sec OLD 62 sec NEW
This reduces boot time on a 4096p system by almost 7 minutes. Nice...
Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shaohua Li [Mon, 24 Oct 2011 14:53:34 +0000 (01:53 +1100)]
x86: tlb flush avoid superflous leave_mm()
If just one page VA tlb is required to be flushed and current task is in
lazy TLB state, doing leave_mm() is superfluous because it flushes the
whole TLB. This can reduce some TLB miss.
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer
The last parameter to sort() is a pointer to the function used to swap
items. This parameter should be NULL, not 0, when not used. This quiets
the following sparse warning:
warning: Using plain integer as NULL pointer
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
William Douglas [Mon, 24 Oct 2011 14:53:32 +0000 (01:53 +1100)]
x86,mrst: add mapping for bma023
There is now an upstream bma023 driver so instead of submitting ours we
use that one. The defaults are just fine so it's a simple mapping entry.
(Thanks go to Erik Andersson for incorporating the changes we needed into his
version)
Signed-off-by: William Douglas <william.douglas@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Mon, 24 Oct 2011 14:53:32 +0000 (01:53 +1100)]
drivers/power/intel_mid_battery.c: fix build
Seems that nobody's even trying any more.
Cc: Nithish Mahalingam <nithish.mahalingam@intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Anton Vorontsov <cbouatmailru@gmail.com> Cc: Major Lee <major_lee@wistron.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Feng Tang [Mon, 24 Oct 2011 14:53:30 +0000 (01:53 +1100)]
vrtc: change its year offset from 1960 to 1972
Real world year equals the value in vrtc YEAR register plus an offset. We
used 1960 for original developepment as the offset to make leap year
consistent, but for a device's first use, its YEAR register is 0 and the
system year will be parsed as 1960 which is not a valid UNIX time and will
cause many applications to fail mysteriously. Devices use 1972 instead to
fix this issue.
Updated patch which adds a sanity check suggested by Mathias
Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ludwig Nussel [Mon, 24 Oct 2011 14:53:30 +0000 (01:53 +1100)]
x86: fix mmap random address range
On x86_32 casting the unsigned int result of get_random_int() to long may
result in a negative value. On x86_32 the range of mmap_rnd() therefore
was -255 to 255. The 32bit mode on x86_64 used 0 to 255 as intended.
The bug was introduced by 675a081 ("x86: unify mmap_{32|64}.c") in January
2008.
Signed-off-by: Ludwig Nussel <ludwig.nussel@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shérab [Mon, 24 Oct 2011 14:53:29 +0000 (01:53 +1100)]
arch/x86/platform/iris/iris.c: register a platform device and a platform driver
This makes the iris driver use the platform API, so it is properly exposed
in /sys.
[akpm@linux-foundation.org: remove commented-out code, add missing space to printk, clean up code layout] Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org> Cc: Len Brown <lenb@kernel.org> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Whitcroft [Mon, 24 Oct 2011 14:53:27 +0000 (01:53 +1100)]
readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
Since the commit below which added O_PATH support to the *at() calls, the
error return for readlink/readlinkat for the empty pathname has switched
from ENOENT to EINVAL:
readlinkat(), fchownat() and fstatat() with empty relative pathnames
This is both unexpected for userspace and makes readlink/readlinkat
inconsistant with all other interfaces; and inconsistant with our stated
return for these pathnames.
As the readlinkat call does not have a flags parameter we cannot use the
AT_EMPTY_PATH approach used in the other calls. Therefore expose whether
the original path is infact entry via a new user_path_at_empty() path
lookup function. Use this to determine whether to default to EINVAL or
ENOENT for failures.
BugLink: http://bugs.launchpad.net/bugs/817187 Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alex Bligh [Mon, 24 Oct 2011 14:53:27 +0000 (01:53 +1100)]
net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy
Problem:
A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.
A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.
Analysis:
The oops is called from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
oopses because net->nfnl is NULL.
The perl program generates the container through fork() then
clone(NS_NEWNET). I does not explicitly set up netlink
explicitly set up netlink, but I presume it was set up else net->nfnl
would have been NULL earlier (i.e. when an earlier connection
timed out). This would thus suggest that net->nfnl is made NULL
during the destruction of the container, which I think is done by
nfnetlink_net_exit_batch.
I can see that the various subsystems are deinitialised in the opposite
order to which the relevant register_pernet_subsys calls are called,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nfconntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.
Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).
Patch:
The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the above perl script for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.
Applicability:
If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven than net->nfnl will never be NULL.
Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on
container destroy
Signed-off-by: Alex Bligh <alex@alex.org.uk> Cc: Patrick McHardy <kaber@trash.net> Cc: David Miller <davem@davemloft.net> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Mon, 24 Oct 2011 14:53:26 +0000 (01:53 +1100)]
drivers/net/ethernet/i825xx/3c505.c: fix build with dynamic debug
The `#define filename' screws up the expansion of
DEFINE_DYNAMIC_DEBUG_METADATA:
drivers/net/ethernet/i825xx/3c505.c: In function 'send_pcb':
drivers/net/ethernet/i825xx/3c505.c:390: error: expected identifier before string constant
drivers/net/ethernet/i825xx/3c505.c:390: error: expected '}' before '.' token
drivers/net/ethernet/i825xx/3c505.c:436: error: expected identifier before string constant
drivers/net/ethernet/i825xx/3c505.c:435: error: expected '}' before '.' token
drivers/net/ethernet/i825xx/3c505.c: In function 'start_receive':
drivers/net/ethernet/i825xx/3c505.c:557: error: expected identifier before string constant
drivers/net/ethernet/i825xx/3c505.c:557: error: expected '}' before '.' token
drivers/net/ethernet/i825xx/3c505.c: In function 'receive_packet':
drivers/net/ethernet/i825xx/3c505.c:629: error: expected identifier before string constant
etc
Cc: Philip Blundell <philb@gnu.org> Cc: David Miller <davem@davemloft.net> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In file included from arch/x86/kernel/pci-dma.c:3:
include/linux/dmar.h:248: warning: 'struct acpi_dmar_header' declared inside parameter list
include/linux/dmar.h:248: warning: its scope is only this definition or declaration, which is probably not what you want