Steven Rostedt [Fri, 29 Oct 2010 16:33:43 +0000 (12:33 -0400)]
jump label: Add work around to i386 gcc asm goto bug
On i386 (not x86_64) early implementations of gcc would have a bug
with asm goto causing it to produce code like the following:
(This was noticed by Peter Zijlstra)
56 pushl 0
67 nopl jmp 0x6f
popl
jmp 0x8c
6f mov
test
je 0x8c
8c mov
call *(%esp)
The jump added in the asm goto skipped over the popl that matched
the pushl 0, which lead up to a quick crash of the system when
the jump was enabled. The nopl is defined in the asm goto () statement
and when tracepoints are enabled, the nop changes to a jump to the label
that was specified by the asm goto. asm goto is suppose to tell gcc that
the code in the asm might jump to an external label. Here gcc obviously
fails to make that work.
The bug report for gcc is here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226
The bug only appears on x86 when not compiled with
-maccumulate-outgoing-args. This option is always set on x86_64 and it
is also the work around for a function graph tracer i386 bug.
(See commit: 746357d6a526d6da9d89a2ec645b28406e959c2e)
This explains why the bug only showed up on i386 when function graph
tracer was not enabled.
This patch now adds a CONFIG_JUMP_LABEL option that is default
off instead of using jump labels by default. When jump labels are
enabled, the -maccumulate-outgoing-args will be used (causing a
slightly larger kernel image on i386). This option will exist
until we have a way to detect if the gcc compiler in use is safe
to use on all configurations without the work around.
Note, there exists such a test, but for now we will keep the enabling
of jump label as a manual option.
Archs that know the compiler is safe with asm goto, may choose to
select JUMP_LABEL and enable it by default.
Reported-by: Ingo Molnar <mingo@elte.hu> Cause-discovered-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Baron <jbaron@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: David Miller <davem@davemloft.net> Cc: Richard Henderson <rth@redhat.com>
LKML-Reference: <1288028746.3673.11.camel@laptop> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
H. Peter Anvin [Thu, 28 Oct 2010 04:09:15 +0000 (21:09 -0700)]
x86, ftrace: Use safe noops, drop trap test
Always use a safe 5-byte noop sequence. Drop the trap test, since it
is known to return false negatives on some virtualization platforms on
32 bits. The resulting code is both simpler and safer.
Cc: Daniel Drake <dsd@laptop.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
David Miller [Sat, 23 Oct 2010 18:06:24 +0000 (11:06 -0700)]
jump_label: Fix unaligned traps on sparc.
The vmlinux.lds.h knobs to emit the __jump_table section in the main
kernel image takes care to align the section, but this doesn't help
for the __jump_table section that gets emitted into modules.
Fix the resulting lack of section alignment by explicitly specifying
it in the assembler.
Signed-off-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <20101023.110624.226758370.davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 29 Oct 2010 15:02:43 +0000 (11:02 -0400)]
jump label: Make arch_jump_label_text_poke_early() optional
Some archs do not need to do anything special for jump labels on
startup (like MIPS). This patch adds a weak function stub for
arch_jump_label_text_poke_early();
Cc: Jason Baron <jbaron@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: David Daney <ddaney@caviumnetworks.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <1286218615-24011-2-git-send-email-ddaney@caviumnetworks.com>
LKML-Reference: <20101015201037.703989993@goodmis.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 18 Oct 2010 14:38:58 +0000 (10:38 -0400)]
jump label: Fix error with preempt disable holding mutex
Kprobes and jump label were having a race between mutexes that
was fixed by reordering the jump label. But this reordering
moved the jump label mutex into a preempt disable location.
This patch does a little fiddling to move the grabbing of
the jump label mutex from inside the preempt disable section
and still keep the order correct between the mutex and the
kprobes lock.
Reported-by: Ingo Molnar <mingo@elte.hu> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Jason Baron [Fri, 1 Oct 2010 21:23:48 +0000 (17:23 -0400)]
jump label: Fix deadlock b/w jump_label_mutex vs. text_mutex
register_kprobe() downs the 'text_mutex' and then calls
jump_label_text_reserved(), which downs the 'jump_label_mutex'.
However, the jump label code takes those mutexes in the reverse
order.
Fix by requiring the caller of jump_label_text_reserved() to do
the jump label locking via the newly added: jump_label_lock(),
jump_label_unlock(). Currently, kprobes is the only user
of jump_label_text_reserved().
Reported-by: Ingo Molnar <mingo@elte.hu> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jason Baron <jbaron@redhat.com>
LKML-Reference: <759032c48d5e30c27f0bba003d09bffa8e9f28bb.1285965957.git.jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Jason Baron [Fri, 1 Oct 2010 21:23:41 +0000 (17:23 -0400)]
jump label: Fix module __init section race
Jump label uses is_module_text_address() to ensure that the module
__init sections are valid before updating them. However, between the
check for a valid module __init section and the subsequent jump
label update, the module's __init section could be freed out from under
us.
We fix this potential race by adding a notifier callback to the
MODULE_STATE_LIVE state. This notifier is called *after* the __init
section has been run but before it is going to be freed. In the
callback, the jump label code zeros the key value for any __init jump
code within the module, and we add a check for a non-zero key value when
we update jump labels. In this way we require no additional data
structures.
Thanks to Mathieu Desnoyers for pointing out this race condition.
Linus Torvalds [Thu, 28 Oct 2010 02:04:36 +0000 (19:04 -0700)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (48 commits)
DMAENGINE: move COH901318 to arch_initcall
dma: imx-dma: fix signedness bug
dma/timberdale: simplify conditional
ste_dma40: remove channel_type
ste_dma40: remove enum for endianess
ste_dma40: remove TIM_FOR_LINK option
ste_dma40: move mode_opt to separate config
ste_dma40: move channel mode to a separate field
ste_dma40: move priority to separate field
ste_dma40: add variable to indicate valid dma_cfg
async_tx: make async_tx channel switching opt-in
move async raid6 test to lib/Kconfig.debug
dmaengine: Add Freescale i.MX1/21/27 DMA driver
intel_mid_dma: change the slave interface
intel_mid_dma: fix the WARN_ONs
intel_mid_dma: Add sg list support to DMA driver
intel_mid_dma: Allow DMAC2 to share interrupt
intel_mid_dma: Allow IRQ sharing
intel_mid_dma: Add runtime PM support
DMAENGINE: define a dummy filter function for ste_dma40
...
Linus Torvalds [Thu, 28 Oct 2010 02:02:41 +0000 (19:02 -0700)]
Merge branch 'viafb-next' of git://github.com/schandinat/linux-2.6
* 'viafb-next' of git://github.com/schandinat/linux-2.6: (29 commits)
viafb: add initial VX900 support
viafb: fix hardware acceleration for suspend & resume
viafb: make suspend and resume work (on all machines?)
viafb: restore display on resume
Minimal support for viafb suspend/resume
viafb: use proper register for colour when doing fill ops
viafb: add documentation for proc interface
viafb: rename output devices
viafb: add a mapping of supported output devices
viafb: set sync polarity for all output devices
viafb: add function to change sync polarity per device
viafb: reduce I2C timeout and delay
viafb: enable I2C for CRT
viafb: fix i2c_transfer error handling
viafb: vt1636 cleanup
viafb: introduce per output device power management
viafb: limit LCD code impact
viafb: add interface for output device configuration
viafb: merge the remaining output path with enable functions
viafb: use new device routing
...
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300: (44 commits)
MN10300: Save frame pointer in thread_info struct rather than global var
MN10300: Change "Matsushita" to "Panasonic".
MN10300: Create a defconfig for the ASB2364 board
MN10300: Update the ASB2303 defconfig
MN10300: ASB2364: Add support for SMSC911X and SMC911X
MN10300: ASB2364: Handle the IRQ multiplexer in the FPGA
MN10300: Generic time support
MN10300: Specify an ELF HWCAP flag for MN10300 Atomic Operations Unit support
MN10300: Map userspace atomic op regs as a vmalloc page
MN10300: And Panasonic AM34 subarch and implement SMP
MN10300: Delete idle_timestamp from irq_cpustat_t
MN10300: Make various interrupt priority settings configurable
MN10300: Optimise do_csum()
MN10300: Implement atomic ops using atomic ops unit
MN10300: Make the FPU operate in non-lazy mode under SMP
MN10300: SMP TLB flushing
MN10300: Use the [ID]PTEL2 registers rather than [ID]PTEL for TLB control
MN10300: Make the use of PIDR to mark TLB entries controllable
MN10300: Rename __flush_tlb*() to local_flush_tlb*()
MN10300: AM34 erratum requires MMUCTR read and write on exception entry
...
Linus Torvalds [Thu, 28 Oct 2010 01:47:39 +0000 (18:47 -0700)]
Merge branch 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
NULL-terminate all pci_device_id tables
(trivial) Fix compiler warning in kernel/modules.c
Linus Torvalds [Thu, 28 Oct 2010 01:42:52 +0000 (18:42 -0700)]
Merge branch 'akpm-incoming-2'
* akpm-incoming-2: (139 commits)
epoll: make epoll_wait() use the hrtimer range feature
select: rename estimate_accuracy() to select_estimate_accuracy()
Remove duplicate includes from many files
ramoops: use the platform data structure instead of module params
kernel/resource.c: handle reinsertion of an already-inserted resource
kfifo: fix kfifo_alloc() to return a signed int value
w1: don't allow arbitrary users to remove w1 devices
alpha: remove dma64_addr_t usage
mips: remove dma64_addr_t usage
sparc: remove dma64_addr_t usage
fuse: use release_pages()
taskstats: use real microsecond granularity for CPU times
taskstats: split fill_pid function
taskstats: separate taskstats commands
delayacct: align to 8 byte boundary on 64-bit systems
delay-accounting: reimplement -c for getdelays.c to report information on a target command
namespaces Kconfig: move namespace menu location after the cgroup
namespaces Kconfig: remove the cgroup device whitelist experimental tag
namespaces Kconfig: remove pointless cgroup dependency
namespaces Kconfig: make namespace a submenu
...
Linus Torvalds [Thu, 28 Oct 2010 01:38:55 +0000 (18:38 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
percpu: Remove the multi-page alignment facility
x86-32: Allocate irq stacks seperate from percpu area
x86-32, mm: Remove duplicated #include
x86, printk: Get rid of <0> from stack output
x86, kexec: Make sure to stop all CPUs before exiting the kernel
x86/vsmp: Eliminate kconfig dependency warning
Linus Torvalds [Thu, 28 Oct 2010 01:34:59 +0000 (18:34 -0700)]
proc_bus_pci_ioctl: remove pointless BKL usage
The BKL was pushed into this function when it was converted to use the
unlocked_ioctl interface, but nothing that the function touches is
actually protected by the BKL. So just remove the BKL entirely, so that
we finally can get a realistic system build without the BKL being
enabled at all.
Linus Torvalds [Thu, 28 Oct 2010 01:17:02 +0000 (18:17 -0700)]
fasync: Fix placement of FASYNC flag comment
In commit f7347ce4ee7c ("fasync: re-organize fasync entry insertion to
allow it under a spinlock") Arnd took an earlier patch of mine that had
the comment about the FASYNC flag above the wrong function.
When the fasync_add_entry() function was split to introduce the new
fasync_insert_entry() helper function, the code that actually cares
about the FASYNC bit moved to that new helper.
Linus Torvalds [Thu, 28 Oct 2010 01:13:34 +0000 (18:13 -0700)]
Merge branch 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
locks: turn lock_flocks into a spinlock
fasync: re-organize fasync entry insertion to allow it under a spinlock
locks/nfsd: allocate file lock outside of spinlock
lockd: fix nlmsvc_notify_blocked locking
lockd: push lock_flocks down
Andrew Morton [Wed, 27 Oct 2010 22:34:53 +0000 (15:34 -0700)]
select: rename estimate_accuracy() to select_estimate_accuracy()
Make it a subsystem-specific identifier because we wish to amke it
non-static in the next patch ("epoll: make epoll_wait() use the hrtimer
range feature").
Cc: Shawn Bohrer <shawn.bohrer@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Shijie [Wed, 27 Oct 2010 22:34:52 +0000 (15:34 -0700)]
kernel/resource.c: handle reinsertion of an already-inserted resource
If the same resource is inserted to the resource tree (maybe not on
purpose), a dead loop will be created. In this situation, The kernel does
not report any warning or error :(
The command below will show a endless print.
#cat /proc/iomem
Michael Holzheu [Wed, 27 Oct 2010 22:34:45 +0000 (15:34 -0700)]
taskstats: use real microsecond granularity for CPU times
The taskstats interface uses microsecond granularity for the user and
system time values. The conversion from cputime to the taskstats values
uses the cputime_to_msecs primitive which effectively limits the
granularity to milliseconds. Add the cputime_to_usecs primitive for
architectures that have better, more precise CPU time values. Remove
cputime_to_msecs primitive because there are no more users left.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Luck Tony <tony.luck@intel.com> Cc: Shailabh Nagar <nagar1234@in.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Shailabh Nagar <nagar@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The start of the taskstats struct must be 8 byte aligned on IA64 (and
other systems with 8 byte alignment rules for 64-bit types) or runtime
alignment warnings will be issued.
This patch pads the pid/tgid field out to sizeof(long), which forces the
alignment of taskstats. The getdelays userspace code is ok with this
since it assumes 32-bit pid/tgid and then honors that header's length
field.
An array is used to avoid exposing kernel memory contents to userspace in
the response.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Wed, 27 Oct 2010 22:34:42 +0000 (15:34 -0700)]
delay-accounting: reimplement -c for getdelays.c to report information on a target command
Task delay-accounting was identified as one means of determining how long
a process spends in congestion_wait() without adding new statistics. For
example, if the workload should not be doing IO, delay-accounting could
reveal how long it was spending in unexpected IO or delays.
Unfortunately, on closer examination it was clear that getdelays does not
act as documented.
Commit a3baf649 ("per-task-delay-accounting: documentation") added
Documentation/accounting/getdelays.c with a -c switch that was documented
to fork/exec a child and report statistics on it but for reasons that are
unclear to me, commit 9e06d3f9 deleted support for this switch but did not
update the documentation. It might be an oversight or it might be because
the control flow of the program meant that accounting information would be
printed once early in the lifetime of the program making it of limited
use.
This patch reimplements -c for getdelays.c to act as documented. Unlike
the original version, it waits until the command completes before printing
any information on it. An example of it being used looks like
CPU count real total virtual total delay total
42779 50512320965164722692564207988
IO count delay total
41727 97804147758
SWAP count delay total
0 0
RECLAIM count delay total
0 0
Daniel Lezcano [Wed, 27 Oct 2010 22:34:41 +0000 (15:34 -0700)]
namespaces Kconfig: move namespace menu location after the cgroup
We have the namespaces as a menuconfig like the cgroup. The cgroup and
the namespace are two base bricks for the containers.
It is more logical to put the namespace menu right after the cgroup menu.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Lezcano [Wed, 27 Oct 2010 22:34:40 +0000 (15:34 -0700)]
namespaces Kconfig: remove the cgroup device whitelist experimental tag
This subsystem is merged since a long time now, I think we can consider it
mature enough.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The different cgroup subsystems are under the cgroup submenu. The
dependency between the cgroups and the menu subsystems is pointless.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Lezcano [Wed, 27 Oct 2010 22:34:38 +0000 (15:34 -0700)]
namespaces Kconfig: make namespace a submenu
Make the namespaces config option a submenu.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Lezcano [Wed, 27 Oct 2010 22:34:37 +0000 (15:34 -0700)]
namespaces: default all the namespaces to 'yes' when CONFIG_NAMESPACES is selected
As the different namespaces depend on 'CONFIG_NAMESPACES', it is logical
to enable all the namespaces when we enable NAMESPACES.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Miller <davem@davemloft.net> Acked-By: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Lezcano [Wed, 27 Oct 2010 22:34:37 +0000 (15:34 -0700)]
namespaces: remove pid_ns and net_ns experimental status
The pid namespace is in the kernel since 2.6.27 and the net_ns since
2.6.29. They are enabled in the distro by default and used by userspace
component. They are mature enough to remove the 'experimental' label.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: fix destructive port EM initialization for Tsi568
Replace possibly damaging broadcast writes into the per-port SP_MODE
registers with individual writes for each port. This will preserve
individual port configurations in case if ports are configured
differently.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RapidIO spec v.2.1 adds Idle Sequence 2 into LP-Serial Physical Layer.
The fix ensures that corresponding bits are not corrupted during error
handling.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Detects RIO link to the already enumerated device and properly sets links
between device objects. Changes to the enumeration/discovery logic:
1. Use Master Enable bit to signal end of the enumeration - agents may
start their discovery process as soon as they see this bit set
(Component Tag register was used before for this purpose).
2. Enumerator sets Component Tag (!= 0) immediately during device
setup. This allows to identify the device if the redundant route
exists in a RIO system.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: add device access check into the enumeration
Add explicit device access check before performing device enumeration.
This gives a chance to clear possible link error conditions by issuing
safe maintenance read request(s).
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: add handling of orphan port-write message
Add check for access to port-write (PW) message source device before
processing the PW message. If source RIO device is not available (power
down or RIO link failure) trace back to a last available switch/port on
the PW message route and service failure at that point.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: add relation links between RIO device structures
Create back and forward links between RIO devices. These links are
intended for use by error management and hot-plug extensions. Links for
redundant RIO connections between switches are not set (will be fixed in a
separate patch).
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This set of RapidIO patches extends support for standard error recovery
mechanism and adds new IDT Gen2 sRIO switch devices - CPS-1848 and
CPS-1616. Implementation of the standard error-stopped state recovery
mechanism (as defined by the RapidIO specification) is required for the
new switches.
Version 2 of this set of patches addresses received comments and fixes an
error notification setup issue found in the idt_gen2.c after the first
version was released.
This patch:
Make RapidIO devices appear in /sys/devices/rapidio directory instead of
top of /sys/devices directory.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rakib Mullick [Wed, 27 Oct 2010 22:34:26 +0000 (15:34 -0700)]
drivers/char/mxser.c: fix compilation warning in mxser.c
Both mxser_disable_must_enchance_mode() and mxser_get_must_hardware_id()
called from function CheckIsMoxaMust(), when CONFIG_PCI=y. So mark both
the functions under CONFIG_PCI.
We were warned by the following warning.
drivers/char/mxser.c:306: warning: `mxser_disable_must_enchance_mode' defined but not used
drivers/char/mxser.c:391: warning: `mxser_get_must_hardware_id'
defined but not used
Vasiliy Kulikov [Wed, 27 Oct 2010 22:34:21 +0000 (15:34 -0700)]
drivers/char/ppdev.c: fix information leak to userland
Structure par_timeout is copied to userland with some padding fields
unitialized. Field tv_usec has type __kernel_suseconds_t, it differs from
tv_sec's type on some architectures. It leads to leaking of stack memory.
Paul Fulghum [Wed, 27 Oct 2010 22:34:20 +0000 (15:34 -0700)]
synclink_gt: fix per device locking
Fix a long standing bug with per device locking.
Each device has an associated spinlock for synchronizing access to
hardware and state information with the ISR. A single hardware card has
one or more devices.
Bug: Non ISR code correctly acquires and releases the per device lock.
ISR incorrectly always acquires and releases the lock of the first device
on the card.
The decoupled and list based nature of the ISR and deferred processing
interaction allowed this to work in normal operation. Exceptional events
like an application forcing hardware shutdown, reset, or reconfiguration
while active can trigger the bug.
Fixed ISR to acquire and release the per device lock.
One exception is manipulation of the GPIO card resource which is global
and effectively owned by the first device of the card. Non-ISR access to
GPIO resource is changed to use lock of first device on card.
Signed-off-by: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dimitri Sivanich [Wed, 27 Oct 2010 22:34:20 +0000 (15:34 -0700)]
SGI Altix IA64 mmtimer: eliminate long interval timer holdoffs
This patch for SGI Altix/IA64 eliminates interval long timer holdoffs in
cases where we don't start an interval timer before the expiration time.
This sometimes happens when a number of interval timers on the same shub
with the same interval run simultaneously.
Cc: Timur Tabi <timur@freescale.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Amit Shah <amit.shah@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Kumar Gala <galak@gate.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Helge Deller [Wed, 27 Oct 2010 22:34:16 +0000 (15:34 -0700)]
ipc/shm.c: add RSS and swap size information to /proc/sysvipc/shm
The kernel currently provides no functionality to analyze the RSS and swap
space usage of each individual sysvipc shared memory segment.
This patch adds this info for each existing shm segment by extending the
output of /proc/sysvipc/shm by two columns for RSS and swap.
Since shmctl(SHM_INFO) already provides a similiar calculation (it
currently sums up all RSS/swap info for all segments), I did split out a
static function which is now used by the /proc/sysvipc/shm output and
shmctl(SHM_INFO).
SAP products (esp. the SAP Netweaver ABAP Kernel) uses lots of big shared
memory segments (we often have Linux systems with >= 16GB shm usage).
Sometimes we get customer reports about "slow" system responses and while
looking into their configurations we often find massive swapping activity
on the system. With this patch it's now easy to see from the command line
if and which shm segments gets swapped out (and how much) and can more
easily give recommendations for system tuning. Without the patch it's
currently not possible to do such shm analysis at all.
Also...
Add some spaces in front of the "size" field for 64bit kernels to get the
columns correct if you cat the contents of the file. In
sysvipc_shm_proc_show() the kernel prints the size value in "SPEC_SIZE"
format, which is defined like this:
KOSAKI Motohiro [Wed, 27 Oct 2010 22:34:16 +0000 (15:34 -0700)]
exec: don't turn PF_KTHREAD off when a target command was not found
Presently do_execve() turns PF_KTHREAD off before search_binary_handler().
THis has a theorical risk of PF_KTHREAD getting lost. We don't have to
turn PF_KTHREAD off in the ENOEXEC case.
This patch moves this flag modification to after the finding of the
executable file.
This is only a theorical issue because kthreads do not call do_execve()
directly. But fixing would be better.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In /proc/stat, the number of per-IRQ event is shown by making a sum each
irq's events on all cpus. But we can make use of kstat_irqs().
kstat_irqs() do the same calculation, If !CONFIG_GENERIC_HARDIRQ,
it's not a big cost. (Both of the number of cpus and irqs are small.)
If a system is very big and CONFIG_GENERIC_HARDIRQ, it does
for_each_irq()
for_each_cpu()
- look up a radix tree
- read desc->irq_stat[cpu]
This seems not efficient. This patch adds kstat_irqs() for
CONFIG_GENRIC_HARDIRQ and change the calculation as
for_each_irq()
look up radix tree
for_each_cpu()
- read desc->irq_stat[cpu]
This reduces cost.
A test on (4096cpusp, 256 nodes, 4592 irqs) host (by Jack Steiner)
%time cat /proc/stat > /dev/null
Before Patch: 2.459 sec
After Patch : .561 sec
[akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks]
[akpm@linux-foundation.org: fix unused variable 'per_irq_sum'] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Jack Steiner <steiner@sgi.com> Acked-by: Jack Steiner <steiner@sgi.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/stat shows the total number of all interrupts to each cpu. But when
the number of IRQs are very large, it take very long time and 'cat
/proc/stat' takes more than 10 secs. This is because sum of all irq
events are counted when /proc/stat is read. This patch adds "sum of all
irq" counter percpu and reduce read costs.
The cost of reading /proc/stat is important because it's used by major
applications as 'top', 'ps', 'w', etc....
A test on a mechin (4096cpu, 256 nodes, 4592 irqs) shows
%time cat /proc/stat > /dev/null
Before Patch: 12.627 sec
After Patch: 2.459 sec
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Jack Steiner <steiner@sgi.com> Acked-by: Jack Steiner <steiner@sgi.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/pid/smaps: export amount of anonymous memory in a mapping
Export the number of anonymous pages in a mapping via smaps.
Even the private pages in a mapping backed by a file, would be marked as
anonymous, when they are modified. Export this information to user-space via
smaps.
Exporting this count will help gdb to make a better decision on which
areas need to be dumped in its coredump; and should be useful to others
studying the memory usage of a process.
The userland ELF tools have been coping with partial-segments core files
for a few years now. Multiple distro builds are now setting this option.
It behooves everyone who ever deals with core files to have more info
dumped in there, especially as more and more people's compilers are
producing build IDs. Make it the default.
Anyone using older tools confused by these core files can configure this
option off, or just change /proc/PID/coredump_filter after boot.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xiaotian Feng [Wed, 27 Oct 2010 22:34:08 +0000 (15:34 -0700)]
core_pattern: fix truncation by core_pattern handler with long parameters
We met a parameter truncated issue, consider following:
> echo "|/root/core_pattern_pipe_test %p /usr/libexec/blah-blah-blah \
%s %c %p %u %g 11 12345678901234567890123456789012345678 %t" > \
/proc/sys/kernel/core_pattern
This is okay because the strings is less than CORENAME_MAX_SIZE. "cat
/proc/sys/kernel/core_pattern" shows the whole string. but after we run
core_pattern_pipe_test in man page, we found last parameter was truncated
like below:
The root cause is core_pattern allows % specifiers, which need to be
replaced during parse time, but the replace may expand the strings to
larger than CORENAME_MAX_SIZE. So if the last parameter is % specifiers,
the replace code is using snprintf(out_ptr, out_end - out_ptr, ...), this
will write out of corename array.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Wed, 27 Oct 2010 22:34:08 +0000 (15:34 -0700)]
signals: move cred_guard_mutex from task_struct to signal_struct
Oleg Nesterov pointed out we have to prevent multiple-threads-inside-exec
itself and we can reuse ->cred_guard_mutex for it. Yes, concurrent
execve() has no worth.
Let's move ->cred_guard_mutex from task_struct to signal_struct. It
naturally prevent multiple-threads-inside-exec.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 27 Oct 2010 22:34:06 +0000 (15:34 -0700)]
signals: annotate lock_task_sighand()
lock_task_sighand() grabs sighand->siglock in case of returning non-NULL
but unlock_task_sighand() releases it unconditionally. This leads sparse
to complain about the lock context imbalance. Rename and wrap
lock_task_sighand() using __cond_lock() macro to make sparse happy.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 27 Oct 2010 22:34:04 +0000 (15:34 -0700)]
ptrace: cleanup arch_ptrace() on um
Remove unnecessary castings using void pointer and fix copy_to_user()
return value. Also add missing __user markup on the argument of
arch_ptrctl().
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 27 Oct 2010 22:34:03 +0000 (15:34 -0700)]
ptrace: cleanup arch_ptrace() on sparc
Factor out struct fps and remove redundant castings.
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 27 Oct 2010 22:34:03 +0000 (15:34 -0700)]
ptrace: cleanup arch_ptrace() on sh
Remove unnecessary castings and get rid of dummy pointer in favor of
offsetof() macro in ptrace_32.c. Also use temporary variables and
break long lines in order to improve readability.
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 27 Oct 2010 22:34:01 +0000 (15:34 -0700)]
ptrace: cleanup arch_ptrace() on powerpc
Use new 'datavp' and 'datalp' variables in order to remove unnecessary
castings.
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 27 Oct 2010 22:33:56 +0000 (15:33 -0700)]
ptrace: cleanup arch_ptrace() on m68knommu
Use new 'regno', 'datap' variables in order to remove duplicated
expressions and unnecessary castings. Alse remove checking @addr less
than 0 because addr is now unsigned.
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 27 Oct 2010 22:33:53 +0000 (15:33 -0700)]
ptrace: cleanup arch_ptrace() on h8300
Use new 'regno', 'datap' variables in order to remove duplicated
expressions and unnecessary castings. Alse remove checking @addr
less than 0 because addr is now unsigned.
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 27 Oct 2010 22:33:52 +0000 (15:33 -0700)]
ptrace: cleanup arch_ptrace() on frv
Use new 'regno', 'datap' variables in order to remove duplicated
expressions and unnecessary castings. Alse remove checking @addr
less than 0 because addr is now unsigned.
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Daniel K." <dk@uw.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Wed, 27 Oct 2010 22:33:51 +0000 (15:33 -0700)]
ptrace: cleanup arch_ptrace() on cris
Use new 'regno' variable in order to remove redandunt expression and
remove checking @addr less than 0 because @addr is now unsigned. Also
update 'datap' on PTRACE_GET/SETREGS to fix a bug on arch-v10.
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>