Steven Rostedt [Tue, 4 Nov 2008 04:15:57 +0000 (23:15 -0500)]
ftrace: function tracer with irqs disabled
Impact: disable interrupts during trace entry creation (as opposed to preempt)
To help with performance, I set the ftracer to not disable interrupts,
and only to disable preemption. If an interrupt occurred, it would not
be traced, because the function tracer protects itself from recursion.
This may be faster, but the trace output might miss some traces.
This patch makes the function tracer disable interrupts, but it also
adds a runtime feature to disable preemption instead. It does this by
having two different tracer functions. When the function tracer is
enabled, it will check to see which version is requested (irqs disabled
or preemption disabled). Then it will use the corresponding function
as the tracer.
Irq disabling is the default behaviour, but if the user wants better
performance, with the chance of missing traces, then they can choose
the preempt disabled version.
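The selection described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: trace_irqs_off, trace_preempt_only, select_tracer and call_tracer are hypothetical names, not the identifiers used by the patch, and the "disable irqs" and "disable preemption" steps are reduced to counters so the example can run outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for real trace buffers. */
static unsigned long irq_variant_hits;
static unsigned long preempt_variant_hits;

typedef void (*tracer_fn)(unsigned long ip, unsigned long parent_ip);

/* Model of the default variant: would record with interrupts disabled. */
static void trace_irqs_off(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	irq_variant_hits++;
}

/* Model of the faster variant: would only disable preemption. */
static void trace_preempt_only(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	preempt_variant_hits++;
}

/* Pick the callback once, when tracing is enabled. */
static tracer_fn select_tracer(bool use_preempt_only)
{
	return use_preempt_only ? trace_preempt_only : trace_irqs_off;
}

/* Call-site model: the hot path only makes an indirect call. */
static void call_tracer(tracer_fn fn, unsigned long ip, unsigned long parent_ip)
{
	assert(fn != NULL);	/* a tracer must have been selected first */
	fn(ip, parent_ip);
}
```

Choosing the function once at enable time keeps the per-call path free of a mode check, which matters for a callback that runs on every traced function.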
Running hackbench 3 times with the irqs disabled and 3 times with
the preempt disabled function tracer yielded:
  tracing type     times      entries recorded
  ------------   --------     ----------------
  irq disabled    43.393         166433066
                  43.282         166172618
                  43.298         166256704
preempt is 10.8 percent faster than irqs disabled.
I wrote a patch to count function trace recursion and reran hackbench.
With irq disabled: 1,150 times the function tracer did not trace due to
recursion.
With preempt disabled: 5,117,718 times.
The thousand times with irq disabled could be due to NMIs, or simply a case
where it called a function that was not protected by notrace.
But we also see that a large amount of the trace is lost with the
preempt version.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Parts of the tracer need to be careful about schedule recursion.
If the NEED_RESCHED flag is set, a preempt_enable will call schedule.
Inside the schedule function, the NEED_RESCHED flag is cleared.
The problem arises when a trace happens in the schedule function but before
NEED_RESCHED is cleared. The race is as follows:
  schedule()
    >> tracer called
      trace_function()
        preempt_disable()
        [ record trace ]
        preempt_enable()  <<- here's the issue.
          [check NEED_RESCHED]
            schedule()
              [ Repeat the above, over and over again ]
The naive approach is simply to use preempt_enable_no_resched instead.
The problem with that approach is that, although we solve the schedule
recursion issue, we now might lose a preemption check when not in the
schedule function.
  trace_function()
    preempt_disable()
    [ record trace ]
    [Interrupt comes in and sets NEED_RESCHED]
    preempt_enable_no_resched()
    [continue without scheduling]
The way ftrace handles this problem is with the following approach:
This may seem like the opposite of what we want. If resched is set
then we call the "no_sched" version?? The reason we do this is because
if NEED_RESCHED is set before we disable preemption, there are two
possible reasons for that:
1) we are in an atomic code path
2) we are already on our way to the schedule function, and maybe even
in the schedule function, but have yet to clear the flag.
In both of the above cases we do not want to schedule.
This solution has already been implemented within the ftrace infrastructure.
But the problem is that it has been implemented several times. This patch
encapsulates this code into two nice functions.
  resched = ftrace_preempt_disable();
  [ record trace ]
  ftrace_preempt_enable(resched);
This way the tracers do not need to worry about getting it right.
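The rule described above (sample NEED_RESCHED before disabling preemption, then pick the enable variant from that sample) can be modelled in userspace. This is a sketch, not the kernel implementation: every model_* name, the need_resched_flag variable and the preempt_depth counter are assumptions made so the logic can run as an ordinary C program.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel state; every name here is a model. */
static bool need_resched_flag;	/* models NEED_RESCHED */
static int preempt_depth;	/* models the preempt count */
static int schedule_calls;	/* how often the model scheduler ran */

static void model_schedule(void)
{
	schedule_calls++;
	need_resched_flag = false;	/* schedule() clears NEED_RESCHED */
}

static void model_preempt_disable(void)
{
	preempt_depth++;
}

/* Drops the count without looking at NEED_RESCHED. */
static void model_preempt_enable_no_resched(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

/* Drops the count and reschedules if NEED_RESCHED is pending. */
static void model_preempt_enable(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
	if (preempt_depth == 0 && need_resched_flag)
		model_schedule();
}

/* Sample NEED_RESCHED *before* disabling preemption. */
static int model_ftrace_preempt_disable(void)
{
	int resched = need_resched_flag;

	model_preempt_disable();
	return resched;
}

/* If the flag was already set on entry, do not trigger a schedule here. */
static void model_ftrace_preempt_enable(int resched)
{
	if (resched)
		model_preempt_enable_no_resched();
	else
		model_preempt_enable();
}
```

In this model, a flag that was set before entry is left for the caller's own path to handle, while a flag that is set during the recording window still leads to a schedule on exit, so no preemption check is lost.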
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Tue, 28 Oct 2008 02:51:53 +0000 (10:51 +0800)]
tracepoint: introduce *_noupdate APIs.
Impact: add new tracepoint APIs to allow the batched registration of probes
The new APIs split tracepoint_probe_register() and
tracepoint_probe_unregister() into two steps. The first step
only updates the tracepoint_entry; it does not connect or disconnect.
This patch introduces tracepoint_probe_update_all() to update everything.
These APIs are very useful for registering lots of probes
while updating only once. Another very important point is that the
*_noupdate APIs do not require module_mutex.
Lai Jiangshan [Tue, 28 Oct 2008 02:51:49 +0000 (10:51 +0800)]
tracepoint: simplification for tracepoints using RCU
Impact: simplify implementation
Unused memory is now handled by struct tp_probes.
The old code used these three fields to handle unused memory:
struct tracepoint_entry {
	...
	struct rcu_head rcu;
	void *oldptr;
	unsigned char rcu_pending:1;
	...
};
In this way, unused memory was handled by struct tracepoint_entry.
This introduced a reentrancy bug (since fixed) and left tracepoint.c
full of ".*rcu.*" code statements. This patch removes all of these.
In addition:
rcu_barrier_sched() is removed.
There is no need to regain tracepoints_mutex after tracepoint_update_probes().
Several small cleanups.
Al Viro [Fri, 31 Oct 2008 19:50:41 +0000 (19:50 +0000)]
tracing, alpha: undefined reference to `save_stack_trace'
Impact: build fix on !stacktrace architectures
only select STACKTRACE on architectures that have STACKTRACE_SUPPORT
... since we also need to ifdef out the guts of ftrace_trace_stack().
We also want to disallow setting TRACE_ITER_STACKTRACE in trace_flags
on such configs, but that can wait.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
ide-gd: re-get capacity on revalidate
tx4938ide: Avoid underflow on calculation of a wait cycle
tx4938ide: Do not call devm_ioremap for whole 128KB
tx4938ide: Check minimum cycle time and SHWT range (v2)
ide: Switch to a common address
ide-cd: fix DMA alignment regression
Atsushi Nemoto [Sun, 2 Nov 2008 20:40:09 +0000 (21:40 +0100)]
tx4938ide: Check minimum cycle time and SHWT range (v2)
The SHWT value is used as address valid to -CSx assertion and -CSx to -DIOx
assertion setup time, and contrariwise, -DIOx to -CSx release and -CSx
release to address invalid hold time, so it actually applies 4 times and
so constitutes -DIOx recovery time. Check the requirements of the recovery
time and cycle time. Also check the SHWT maximum value.
Suggested-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: ralf@linux-mips.org Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Borislav Petkov [Sun, 2 Nov 2008 20:40:07 +0000 (21:40 +0100)]
ide-cd: fix DMA alignment regression
e5318b531b008c79d2a0c0df06a7b8628da38e2f ("ide: use the dma safe check for
REQ_TYPE_ATA_PC") introduced a regression which caused some ATAPI drives to
turn off DMA for REQ_TYPE_BLOCK_PC commands while burning and thus degrading
performance and ultimately causing an excessive amount of underruns.
The issue is documented also in:
http://bugzilla.kernel.org/show_bug.cgi?id=11742.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Tested-by: Valerio Passini <valerio.passini@unicam.it>
[bart: fixup patch description per comments from Sergei Shtylyov] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
af_unix: netns: fix problem of return value
IRDA: remove double inclusion of module.h
udp: multicast packets need to check namespace
net: add documentation for skb recycling
key: fix setkey(8) policy set breakage
bpa10x: free sk_buff with kfree_skb
xfrm: do not leak ESRCH to user space
net: Really remove all of LOOPBACK_TSO code.
netfilter: nf_conntrack_proto_gre: switch to register_pernet_gen_subsys()
netns: add register_pernet_gen_subsys/unregister_pernet_gen_subsys
net: delete excess kernel-doc notation
pppoe: Fix socket leak.
gianfar: Don't reset TBI<->SerDes link if it's already up
gianfar: Fix race in TBI/SerDes configuration
at91_ether: request/free GPIO for PHY interrupt
amd8111e: fix dma_free_coherent context
atl1: fix vlan tag regression
SMC91x: delete unused local variable "lp"
myri10ge: fix stop/go mmio ordering
bonding: fix panic when taking bond interface down before removing module
...
There is a problem discovered in recent versions of ATI Mach64 driver
in X.org on sparc64 architecture. In short, the driver fails to mmap
MMIO aperture (PCI resource #2).
I've found that kernel's __pci_mmap_make_offset() returns EINVAL. It
checks whether user attempts to mmap more than the resource length,
which is 0x1000 bytes in our case. But PAGE_SIZE on SPARC64 is 0x2000
and this is what is actually being mmapped. So __pci_mmap_make_offset()
failed for this PCI resource.
Signed-off-by: Max Dmitrichenko <dmitrmax@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Beregalov reports oops in __bzero() called from
copy_from_user_fixup() called from iov_iter_copy_from_user_atomic(),
when running dbench on tmpfs on sparc64: its __copy_from_user_inatomic
and __copy_to_user_inatomic should be avoiding, not calling, the fixups.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 04a4bb55bcf35b63d40fd2725e58599ff8310dd7 ("net: add
skb_recycle_check() to enable netdriver skb recycling") added a
method for network drivers to recycle skbuffs, but while use of
this mechanism was documented in the commit message, it should
really have been added as a docbook comment as well -- this
patch does that.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Sat, 1 Nov 2008 18:20:39 +0000 (18:20 +0000)]
section fixes for cirrusfb
cirrusfb_zorro_unmap() may be called both from __devexit and (on
cleanup path) from __devinit. So it needs to be a normal function,
same as for cirrusfb_pci_unmap()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sat, 1 Nov 2008 18:19:49 +0000 (18:19 +0000)]
oss: fix O_NONBLOCK in dmasound_core
We broke O_NONBLOCK handling in OSS dmasound_core in 2.3.11-pre3 - the
original code copied f_flags to open_mode and then checked for
O_NONBLOCK in there, but that got changed to copying f_mode and
O_NONBLOCK has not reached that field in any kernel version.
Since we do not care for any other bits, the fix is obvious...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 1 Nov 2008 17:36:30 +0000 (10:36 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix AMDC1E and XTOPOLOGY conflict in cpufeature
x86: build fix
by people adding the <linux/delay.h> include in two slightly different
places. Andrew's quilt scripts happily ignore the fuzz, and will
re-apply the patch even though they had conflicts.
Linus Torvalds [Sat, 1 Nov 2008 17:17:22 +0000 (10:17 -0700)]
x86: Clean up late e820 resource allocation
This makes the late e820 resources use 'insert_resource_expand_to_fit()'
instead of doing a 'reserve_region_with_split()', and also avoids
marking them as IORESOURCE_BUSY.
This results in us being perfectly happy to use pre-existing PCI
resources even if they were marked as being in a reserved region, while
still avoiding any _new_ allocations in the reserved regions. It also
makes for a simpler and more accurate resource tree.
Example resource allocation from Jonathan Corbet, who has firmware that
has an e820 reserved entry that covered a big range (e0000000-fed003ff),
and that had various PCI resources in it set up by firmware.
With old kernels, the reserved range would force us to re-allocate all
pre-existing PCI resources, and his reserved range would end up looking
like this:
and because the reserved entry had been split and moved into the
individual resources, and because it used the IORESOURCE_BUSY flag, the
drivers that actually wanted to _use_ those resources couldn't actually
attach to them:
e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff]
HDA Intel 0000:00:1b.0: BAR 0: can't reserve mem region [0xfe9dc000-0xfe9dffff]
with this patch, the resource tree instead becomes
ie the one reserved region now ends up surrounding all the PCI resources
that were allocated inside of it by firmware, and because it is not
marked BUSY, drivers have no problem attaching to the pre-allocated
resources.
Reported-and-tested-by: Jonathan Corbet <corbet@lwn.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 1 Nov 2008 16:53:58 +0000 (09:53 -0700)]
reserve_region_with_split: Fix GFP_KERNEL usage under spinlock
This one apparently doesn't generate any warnings, because the function
is only used during system bootup, when the warnings are disabled. But
it's still very wrong.
The __reserve_region_with_split() function is called with the
resource_lock held for writing, so it must only ever do GFP_ATOMIC
allocations.
Al Viro [Fri, 31 Oct 2008 23:28:30 +0000 (23:28 +0000)]
saner FASYNC handling on file close
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.
So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Serge Hallyn [Thu, 30 Oct 2008 16:52:23 +0000 (11:52 -0500)]
file caps: always start with clear bprm->caps_*
Linux doesn't honor setuid on scripts. However, it mistakenly
behaves differently for file capabilities.
This patch fixes that behavior by making sure that get_file_caps()
begins with empty bprm->caps_*. That way when a script is loaded,
its bprm->caps_* may be filled when binfmt_misc calls prepare_binprm(),
but they will be cleared again when binfmt_elf calls prepare_binprm()
next to read the interpreter's file capabilities.
Signed-off-by: Serge Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 31 Oct 2008 22:44:08 +0000 (15:44 -0700)]
Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux:
NLM: Set address family before calling nlm_host_rebooted()
nfsd: fix failure to set eof in readdir in some situations
Eric Paris [Fri, 31 Oct 2008 21:40:00 +0000 (17:40 -0400)]
SELinux: properly handle empty tty_files list
SELinux has had an incorrect test (since 2004) for an empty
tty->tty_files list. With an empty list selinux would be pointing to part
of the tty struct itself and would then proceed to dereference that value
and again dereference that result. An F10 change to plymouth on a ppc64
system is actually currently triggering this bug. This patch uses
list_empty() to handle empty lists rather than looking at a meaningless
location.
[note, this fixes the oops reported in
https://bugzilla.redhat.com/show_bug.cgi?id=469079]
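Why an empty list has to be tested explicitly can be seen in a minimal userspace model of a circular doubly linked list. This is a sketch under assumptions: struct list_node, struct file_model and the helper names are hypothetical and only mimic the shape of the kernel list API; they are not the SELinux code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a kernel-style circular doubly linked list. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical container; only the embedded node matters here. */
struct file_model {
	int id;
	struct list_node link;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_front(struct list_node *item, struct list_node *head)
{
	assert(item != NULL && head != NULL);
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* An empty list is one whose head points back at itself. */
static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/*
 * Return the first entry, or NULL for an empty list. Without the
 * emptiness check, the pointer arithmetic below would be applied to
 * the head itself and yield an address inside whatever structure
 * embeds the head, not a real entry.
 */
static struct file_model *first_file(struct list_node *head)
{
	if (list_is_empty(head))
		return NULL;
	return (struct file_model *)((char *)head->next -
				     offsetof(struct file_model, link));
}
```

Because the head of an empty list points at itself, its next pointer is never NULL, so a NULL test on the computed entry cannot detect emptiness; comparing against the head can.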
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
Jesper Nilsson [Wed, 22 Oct 2008 21:57:53 +0000 (23:57 +0200)]
[CRIS] Remove links from CRIS build
Remove the links to architecture and machine dependent directories
(boot, lib, drivers, arch, mach)
The links were created and used mostly from the arch/cris/Makefile,
so why not dispense with them altogether?
Changed $(ARCH) to "cris" in Makefile, it is easier to read this way.
The CRISv32 head.S common files for the kernel and compressed images
needed to be modified to use ifdefs instead of using the now removed
mach link. Since there are only two versions, this is not a huge loss
in readability.
The link to vmlinux.lds.S is also replaced with a merged version
which uses ifdefs to select the correct layout.
System.map before and after are identical.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Sam Ravnborg <sam@ravnborg.org>
Linus Torvalds [Fri, 31 Oct 2008 15:14:15 +0000 (08:14 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (23 commits)
Revert "powerpc: Sync RPA note in zImage with kernel's RPA note"
powerpc: Fix compile errors with CONFIG_BUG=n
powerpc: Fix format string warning in arch/powerpc/boot/main.c
powerpc: Fix bug in kernel copy of libfdt's fdt_subnode_offset_namelen()
powerpc: Remove duplicate DMA entry from mpc8313erdb device tree
powerpc/cell/OProfile: Fix on-stack array size in activate spu profiling function
powerpc/mpic: Fix regression caused by change of default IRQ affinity
powerpc: Update remaining dma_mapping_ops to use map/unmap_page
powerpc/pci: Fix unmapping of IO space on 64-bit
powerpc/pci: Properly allocate bus resources for hotplug PHBs
OF-device: Don't overwrite numa_node in device registration
powerpc: Fix swapcontext system for VSX + old ucontext size
powerpc: Fix compiler warning for the relocatable kernel
powerpc: Work around ld bug in older binutils
powerpc/ppc64/kdump: Better flag for running relocatable
powerpc: Use is_kdump_kernel()
powerpc: Kexec exit should not use magic numbers
powerpc/44x: Update 44x defconfigs
powerpc/40x: Update 40x defconfigs
powerpc: enable heap randomization for linkstations
...
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (21 commits)
sh: fix sh2a cache entry_mask
sh: Enable NFS root in Migo-R defconfig.
sh: FTRACE renamed to FUNCTION_TRACER.
sh: Fix up the shared IRQ demuxer's control bit testing logic.
Define SCSPTR1 for SH 7751R
sh: Add sci_rxd_in of SH4-202
Add support usb setting on sh 7366
sh: Change register name SCSPTR to SCSPTR2
sh: use the new byteorder headers.
sh: SHmedia ISA tuning fixups.
sh: Kill off long-dead HD64465 cchip support.
sh: Revert "SH 7366 needs SCIF_ONLY"
sh: Simplify and lock down the ISA tuning.
sh: sh7785lcr: Select uImage as default image target.
sh: Add on-chip RTC support for SH7722.
SH 7366 needs SCIF_ONLY
gdrom: Fix compile error
sh: Provide a sample defconfig for the UL2 (SH7366) board.
sh: Fix FPU tuning on toolchains with mismatched multilib targets.
sh: oprofile: Fix up the SH7750 performance counter name.
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Add missing null terminating entry to bq4802_match[].
sparc: use the new byteorder headers
rtc-m48t59: shift zero year to 1968 on sparc (rev 2)
dbri: check dma_alloc_coherent errors
sparc64: remove byteshifting from out* helpers
Linus Torvalds [Fri, 31 Oct 2008 14:52:12 +0000 (07:52 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
delay capable() check in ext4_has_free_blocks()
merge ext4_claim_free_blocks & ext4_has_free_blocks
jbd2: Call the commit callback before the transaction could get dropped
ext4: fix a bug accessing freed memory in ext4_abort
ext3: fix a bug accessing freed memory in ext3_abort
Linus Torvalds [Fri, 31 Oct 2008 14:47:57 +0000 (07:47 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: add whitelist for devices with known good pata-sata bridges
sata_via: fix support for 5287
libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127
ATA: remove excess kernel-doc notation
Paul Mackerras [Fri, 31 Oct 2008 10:34:09 +0000 (21:34 +1100)]
powerpc: Fix compile errors with CONFIG_BUG=n
This makes sure we don't try to call find_bug or is_warning_bug when
CONFIG_BUG=n and CONFIG_XMON=y. Otherwise we get these errors:
arch/powerpc/xmon/xmon.c: In function ‘print_bug_trap’:
arch/powerpc/xmon/xmon.c:1364: error: implicit declaration of function ‘find_bug’
arch/powerpc/xmon/xmon.c:1364: warning: assignment makes pointer from integer without a cast
arch/powerpc/xmon/xmon.c:1367: error: implicit declaration of function ‘is_warning_bug’
arch/powerpc/xmon/xmon.c:1374: error: dereferencing pointer to incomplete type
make[2]: *** [arch/powerpc/xmon/xmon.o] Error 1
make[1]: *** [arch/powerpc/xmon] Error 2
make: *** [sub-make] Error 2
Jon Smirl [Thu, 30 Oct 2008 16:51:32 +0000 (16:51 +0000)]
powerpc: Fix format string warning in arch/powerpc/boot/main.c
Fix format string warning in arch/powerpc/boot/main.c. Also correct
a typo ("uncomressed") on the same line.
BOOTCC arch/powerpc/boot/main.o
arch/powerpc/boot/main.c: In function 'prep_kernel':
arch/powerpc/boot/main.c:65: warning: format '%08x' expects type
'unsigned int', but argument 3 has type 'long unsigned int'
Signed-off-by: Jon Smirl <jonsmirl@gmail.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
David Gibson [Thu, 30 Oct 2008 16:37:05 +0000 (16:37 +0000)]
powerpc: Fix bug in kernel copy of libfdt's fdt_subnode_offset_namelen()
There's currently an off-by-one bug in fdt_subnode_offset_namelen()
which causes it to keep searching after it's finished the subnodes of
the given parent, and into the subnodes of siblings of the original
node which come after it in the tree. This bug was introduced in
commit ed95d7450dcbfeb45ffc9d39b1747aee82b49a51 ("powerpc: Update
in-kernel dtc and libfdt to version 1.2.0").
A patch has already been submitted to dtc/libfdt mainline. We don't
really want to pull in a new upstream version during the 2.6.28 cycle,
but we should still fix this bug, hence this standalone version of the
fix for the in-kernel libfdt.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Ingo Molnar [Fri, 31 Oct 2008 08:31:38 +0000 (09:31 +0100)]
x86: build fix
Impact: build fix on certain UP configs
fix:
arch/x86/kernel/cpu/common.c: In function 'cpu_init':
arch/x86/kernel/cpu/common.c:1141: error: 'boot_cpu_id' undeclared (first use in this function)
arch/x86/kernel/cpu/common.c:1141: error: (Each undeclared identifier is reported only once
arch/x86/kernel/cpu/common.c:1141: error: for each function it appears in.)
Pull in asm/smp.h on UP, so that we get the definition of
boot_cpu_id.
Ilpo Järvinen [Fri, 31 Oct 2008 07:40:19 +0000 (00:40 -0700)]
bpa10x: free sk_buff with kfree_skb
Inspired by Sergio Luis' similar patches, I finally found
a case which is trivial enough that spatch won't choke
on it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
sh: Fix up the shared IRQ demuxer's control bit testing logic.
Correct the interrupt handler in the sh4 serial device: return the correct
value and check for what is enabled in the SCSCR register. The sh7722
breaks when just sending a break using minicom.
Signed-off-by: Michael Trimarchi <trimarchimichael@yahoo.it> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Matt Fleming [Wed, 29 Oct 2008 07:16:02 +0000 (07:16 +0000)]
Define SCSPTR1 for SH 7751R
After the recent commit to kill off SCI/SCIF special casing SH 7751R
fails to compile with CONFIG_SH_RTS7751R2D set. This is because SCSPTR1
is undefined. Take the value for SCSPTR1 from the SH7751R Group Hardware
Manual.
Signed-off-by: Matt Fleming <mjf@gentoo.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
I noticed that, under certain conditions, ESRCH can be leaked from the
xfrm layer to user space through sys_connect. In particular, this seems
to happen reliably when the kernel fails to resolve a template either
because the AF_KEY receive buffer being used by racoon is full or
because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
state.
However, since this could be a transient issue it could be argued that
EAGAIN would be more appropriate. Besides this error code is not even
documented in the man page for sys_connect (as of man-pages 3.07).
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
Harvey Harrison [Fri, 31 Oct 2008 07:01:22 +0000 (16:01 +0900)]
sh: use the new byteorder headers.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
David S. Miller [Fri, 31 Oct 2008 07:00:33 +0000 (00:00 -0700)]
net: Really remove all of LOOPBACK_TSO code.
As noticed by Saikiran Madugula, commit 7447ef63cf2dfdc444f4c72ae13f604350b2e25f
("loopback: Remove rest of LOOPBACK_TSO code.") got rid of
emulate_large_send_offload() but didn't get rid of the call
site as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Fri, 31 Oct 2008 06:55:44 +0000 (23:55 -0700)]
netfilter: nf_conntrack_proto_gre: switch to register_pernet_gen_subsys()
register_pernet_gen_device() can't be used if the nf_conntrack_pptp module is
also used (compiled in or loaded).
Right now, proto_gre_net_exit() is called before nf_conntrack_pptp_net_exit().
The former shuts down and frees the GRE piece of netns, however the latter
absolutely needs it to flush the keymap. An oops is inevitable.
Switch to shiny new register_pernet_gen_subsys() to get correct ordering in
netns ops list.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
netns ops which are registered with register_pernet_gen_device() are
shut down strictly before those which are registered with
register_pernet_subsys(). Sometimes this leads to opposite (read: buggy)
shutdown ordering between two modules.
Add register_pernet_gen_subsys()/unregister_pernet_gen_subsys() for modules
which aren't elite enough for entry in struct net, and which can't use
register_pernet_gen_device(). The PPTP conntracking module is one such module.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Fri, 31 Oct 2008 06:54:35 +0000 (23:54 -0700)]
net: delete excess kernel-doc notation
Remove excess kernel-doc function parameters from networking header
& driver files:
Warning(include/net/sock.h:946): Excess function parameter or struct member 'sk' description in 'sk_filter_release'
Warning(include/linux/netdevice.h:1545): Excess function parameter or struct member 'cpu' description in 'netif_tx_lock'
Warning(drivers/net/wan/z85230.c:712): Excess function parameter or struct member 'regs' description in 'z8530_interrupt'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jens Axboe [Wed, 27 Aug 2008 13:23:18 +0000 (15:23 +0200)]
libata: add whitelist for devices with known good pata-sata bridges
libata currently imposes a UDMA5 max transfer rate and 200 sector max
transfer size for SATA devices that sit behind a pata-sata bridge. Lots
of devices have known good bridges that don't need this limit applied.
The MTRON SSD disks are such devices. Transfer rates are increased by
20-30% with the restriction removed.
So add a "blacklist" entry for the MTRON devices, with a flag indicating
that the bridge is known good.
Tejun Heo [Tue, 21 Oct 2008 15:46:36 +0000 (00:46 +0900)]
sata_via: fix support for 5287
5287 used to be treated as vt6420 but it didn't work. It's a new family
of controllers called vt8251 which hosts four SATA ports as M/S of the
two ATA ports. This configuration is rather peculiar in that although
the M/S devices are on the same port, each has its own SCR (or
equivalent link status/control) registers, which screws up the
port-link-device hierarchy assumed by libata. Another controller
which falls into this category is ata_piix w/ SIDPR access.
libata now has a facility to deal with this class of controllers, named
slave_link. A low level driver for such controllers can just call
ata_slave_link_init() on the respective ports and libata will handle
all the difficult parts like following up with a single SRST after
hardresetting both ports.
This patch creates new controller class vt8251, implements slave_link
aware init sequence and config space based SCR access for it and moves
5287 to the new class.
This patch is based on Joseph Chan's larger patch which was created
before slave_link was implemented in libata.
Roland Dreier [Tue, 28 Oct 2008 23:52:20 +0000 (16:52 -0700)]
libata: Avoid overflow in ata_tf_to_lba48() when tf->hba_lbal > 127
In ata_tf_to_lba48(), when evaluating
(tf->hob_lbal & 0xff) << 24
the expression is promoted to signed int (since int can hold all values
of u8). However, if hob_lbal is 128 or more, then it is treated as a
negative signed value and sign-extended when promoted to u64 to | into
sectors, which leads to the MSB 32 bits of sectors getting set
incorrectly.
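The promotion problem can be reproduced in a standalone program. The sketch below is an illustration, not the libata code: shift_as_int and shift_as_u64 are hypothetical helpers, and the buggy path is modelled with explicit casts so the example itself stays free of undefined behaviour. It assumes the usual two's-complement result for the unsigned-to-signed conversion, which is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models the problematic arithmetic: the byte is shifted as a 32-bit
 * value, reinterpreted as signed (as the promoted int would be), then
 * widened to 64 bits.
 */
static uint64_t shift_as_int(uint8_t hob_lbal)
{
	int32_t as_int = (int32_t)((uint32_t)hob_lbal << 24);

	return (uint64_t)as_int;	/* sign-extends when as_int < 0 */
}

/* Widening to 64 bits before the shift keeps the upper half clear. */
static uint64_t shift_as_u64(uint8_t hob_lbal)
{
	return (uint64_t)hob_lbal << 24;
}
```

With these definitions, shift_as_int(0x80) yields 0xFFFFFFFF80000000 while shift_as_u64(0x80) yields 0x80000000; for values up to 0x7f the two agree.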
For example, Phillip O'Donnell <phillip.odonnell@gmail.com> reported
that a 1.5GB drive caused:
Randy Dunlap [Thu, 30 Oct 2008 05:35:08 +0000 (22:35 -0700)]
ATA: remove excess kernel-doc notation
Remove excess kernel-doc function parameter notation from drivers/ata/:
Warning(drivers/ata/libata-core.c:1622): Excess function parameter or struct member 'fn' description in 'ata_pio_queue_task'
Warning(drivers/ata/libata-core.c:4655): Excess function parameter or struct member 'err_mask' description in 'ata_qc_complete'
Warning(drivers/ata/ata_piix.c:751): Excess function parameter or struct member 'udma' description in 'do_pata_set_dmamode'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Carl Love [Wed, 29 Oct 2008 05:06:45 +0000 (05:06 +0000)]
powerpc/cell/OProfile: Fix on-stack array size in activate spu profiling function
The size of the pm_signal_local array should be equal to the
number of SPUs being configured in the array. Currently, the
array is of size 4 (NR_PHYS_CTRS) but being indexed by a for
loop from 0 to 7 (NUM_SPUS_PER_NODE). This could potentially
cause an oops or random memory corruption since the pm_signal_local
array is on the stack. This fixes it.
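The mismatch between the array size and the loop bound can be sketched as follows. The two constants take their values from the text above; struct pm_signal_model and fill_signals are hypothetical and only illustrate sizing the array by the same constant that bounds the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Values as stated in the commit text; the struct layout is a model. */
#define NR_PHYS_CTRS		4
#define NUM_SPUS_PER_NODE	8

struct pm_signal_model {
	int spu;
};

/* Fill one entry per SPU; the caller's array must hold all of them. */
static size_t fill_signals(struct pm_signal_model *sig, size_t len)
{
	size_t i;

	assert(len >= NUM_SPUS_PER_NODE);	/* a 4-entry array would fail here */
	for (i = 0; i < NUM_SPUS_PER_NODE; i++)
		sig[i].spu = (int)i;
	return i;
}
```

A caller would declare the on-stack array as `struct pm_signal_model sig[NUM_SPUS_PER_NODE];` so that indices 0 to 7 all stay in bounds.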
Signed-off-by: Carl Love <carll@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Kumar Gala [Tue, 28 Oct 2008 18:01:39 +0000 (18:01 +0000)]
powerpc/mpic: Fix regression caused by change of default IRQ affinity
The Freescale implementation of MPIC only allows a single CPU destination
for non-IPI interrupts. We add a flag to the mpic_init to distinguish
these variants of MPIC. We pull in the irq_choose_cpu from sparc64 to
select a single CPU as the destination of the interrupt.
This is to deal with the fact that the default smp affinity was
changed by commit 18404756765c713a0be4eb1082920c04822ce588 ("genirq:
Expose default irq affinity mask (take 3)") to be all CPUs.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Mark Nelson [Mon, 27 Oct 2008 20:38:08 +0000 (20:38 +0000)]
powerpc: Update remaining dma_mapping_ops to use map/unmap_page
After the merge of the 32 and 64bit DMA code, dma_direct_ops lost
their map/unmap_single() functions but gained map/unmap_page(). This
caused a problem for Cell because Cell's dma_iommu_fixed_ops called
the dma_direct_ops if the fixed linear mapping was to be used or the
iommu ops if the dynamic window was to be used. So in order to fix
this problem we need to update the 64bit DMA code to use
map/unmap_page.
First, we update the generic IOMMU code so that iommu_map_single()
becomes iommu_map_page() and iommu_unmap_single() becomes
iommu_unmap_page(). Then we propagate these changes up through all
the callers of these two functions and in the process update all the
dma_mapping_ops so that they have map/unmap_page rather than
map/unmap_single. We can do this because on 64bit there is no HIGHMEM
memory so map/unmap_page ends up performing exactly the same function
as map/unmap_single, just taking different arguments.
This has no effect on drivers because the dma_map_single_attrs() just
ends up calling the map_page() function of the appropriate
dma_mapping_ops and similarly the dma_unmap_single_attrs() calls
unmap_page().
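
The forwarding described above might look roughly like the following. This is a user-space sketch, not the kernel's DMA code: struct dma_ops_sketch, the identity "direct" mapping, and the 4096-byte page size are assumptions, and a page is represented by its frame number rather than a struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical ops table that only provides the page-based hook. */
struct dma_ops_sketch {
    uint64_t (*map_page)(uint64_t page_frame, size_t offset, size_t size);
};

/* Identity "direct" mapping: the bus address equals the physical address. */
static uint64_t direct_map_page(uint64_t page_frame, size_t offset, size_t size)
{
    (void)size;  /* unused in an identity mapping */
    return page_frame * SKETCH_PAGE_SIZE + offset;
}

static const struct dma_ops_sketch direct_ops_sketch = {
    .map_page = direct_map_page,
};

/* Address-based entry point: split the address, then call map_page(). */
static uint64_t map_single_sketch(const struct dma_ops_sketch *ops,
                                  uint64_t addr, size_t size)
{
    /* Calling through a NULL hook is the kind of failure the commit fixes. */
    assert(ops->map_page != NULL);
    return ops->map_page(addr / SKETCH_PAGE_SIZE,
                         (size_t)(addr % SKETCH_PAGE_SIZE), size);
}
```

Because the single-address call is only a thin wrapper, callers keep their existing interface while every ops table needs to supply just the page-based functions.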
This fixes an oops on Cell blades, which oops on boot without this
because they call dma_direct_ops.map_single, which is NULL.
Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
A typo/thinko made us pass the wrong argument to __flush_hash_table_range
when unplugging bridges, thus not flushing all the translations for
the IO space on unplug. The third parameter to __flush_hash_table_range
is `end', not `size'.
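
To see why passing a size where an end address is expected flushes too little, consider this sketch. The function flush_range_sketch is a hypothetical stand-in that only counts pages (it also omits the real function's first parameter), and the 4096-byte page size is an assumption.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size */

/*
 * Hypothetical stand-in: counts the pages in [start, end).
 * The second argument is an end address, not a length.
 */
static unsigned long flush_range_sketch(unsigned long start, unsigned long end)
{
    unsigned long flushed = 0;
    unsigned long addr;

    for (addr = start; addr < end; addr += SKETCH_PAGE_SIZE)
        flushed++;
    return flushed;
}

/* Correct call: convert (start, size) to (start, end). */
static unsigned long flush_io_space(unsigned long start, unsigned long size)
{
    assert(start + size >= start);  /* no wrap-around */
    return flush_range_sketch(start, start + size);
}

/* The typo: size is passed where end is expected. */
static unsigned long flush_io_space_buggy(unsigned long start, unsigned long size)
{
    return flush_range_sketch(start, size);
}
```

For a range that starts above its own length, the buggy form sees an end below the start and flushes nothing at all.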
This causes the hypervisor to refuse unplugging slots.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Nathan Fontenot [Mon, 27 Oct 2008 19:48:17 +0000 (19:48 +0000)]
powerpc/pci: Properly allocate bus resources for hotplug PHBs
Resources for PHBs that are dynamically added to a system are not
properly allocated in the resource tree.
Not having these resources allocated causes an oops when removing
the PHB when we try to release them.
The diff appears a bit messy; this is mainly due to moving everything
one tab to the left in the pcibios_allocate_bus_resources routine.
The functionality change in this routine is only that the
list_for_each_entry() loop is pulled out and moved to the necessary
calling routine.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Jeremy Kerr [Sun, 26 Oct 2008 21:51:25 +0000 (21:51 +0000)]
OF-device: Don't overwrite numa_node in device registration
Currently, the numa_node of OF-devices will be overwritten during
device_register, which simply sets the node to -1. On cell machines,
this means that devices can't find their IOMMU, which is referenced
through the device's numa node.
Set the numa node for OF devices with no parent, and use the
lower-level device_initialize and device_add functions, so that the
node is preserved.
We can remove the call to set_dev_node in of_device_alloc, as it
will be overwritten during register.
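
The ordering issue can be modelled in user space as below. All names ending in _sketch and the two node_after_* helpers are hypothetical; the model only assumes what the text states, namely that initialization resets the node to -1 and that registration is initialization followed by add.

```c
#include <assert.h>

#define SKETCH_NO_NODE (-1)

/* Hypothetical, minimal model of a device. */
struct device_sketch {
    int numa_node;
    int initialized;
    int added;
};

/* Initialization resets the NUMA node, discarding any earlier value. */
static void device_initialize_sketch(struct device_sketch *dev)
{
    dev->numa_node = SKETCH_NO_NODE;
    dev->initialized = 1;
    dev->added = 0;
}

static void device_add_sketch(struct device_sketch *dev)
{
    assert(dev->initialized);
    dev->added = 1;
}

/* Combined form: a node set beforehand is overwritten. */
static int node_after_device_register(int node)
{
    struct device_sketch dev = { node, 0, 0 };

    device_initialize_sketch(&dev);
    device_add_sketch(&dev);
    return dev.numa_node;
}

/* Split form: set the node between the two steps so it is preserved. */
static int node_after_split_registration(int node)
{
    struct device_sketch dev = { SKETCH_NO_NODE, 0, 0 };

    device_initialize_sketch(&dev);
    dev.numa_node = node;
    device_add_sketch(&dev);
    return dev.numa_node;
}
```

Splitting registration into its two steps gives the caller a window in which to set fields that initialization would otherwise reset.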
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Michael Neuling [Thu, 23 Oct 2008 00:42:36 +0000 (00:42 +0000)]
powerpc: Fix swapcontext system for VSX + old ucontext size
Since VSX support was added, we now have two sizes of ucontext_t;
the older, smaller size without the extra VSX state, and the new
larger size with the extra VSX state. A program using the
sys_swapcontext system call and supplying smaller ucontext_t
structures will currently get an EINVAL error if the task has
used VSX (e.g. because of calling library code that uses VSX) and
the old_ctx argument is non-NULL (i.e. the program is asking for
its current context to be saved). Thus the program will start
getting EINVAL errors on calls that previously worked.
This commit changes this behaviour so that we don't send an EINVAL in
this case. It will now return the smaller context but the VSX MSR bit
will always be cleared to indicate that the ucontext_t doesn't include
the extra VSX state, even if the task has executed VSX instructions.
Both 32 and 64 bit cases are updated.
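
The new behaviour can be summarised in a small sketch. The bit value, the two context sizes, and the function name are all hypothetical placeholders; the real size checks and MSR layout are more involved and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MSR_VSX      0x1u   /* hypothetical bit position */
#define SKETCH_OLD_CTX_SIZE 1000u  /* hypothetical: context without VSX state */
#define SKETCH_NEW_CTX_SIZE 1200u  /* hypothetical: context with VSX state */

/*
 * Returns the MSR value to store in the saved context. A caller that
 * supplies the smaller structure gets the VSX bit cleared instead of
 * an EINVAL error, even if the task has used VSX.
 */
static unsigned int saved_msr_sketch(unsigned int task_msr, size_t ctx_size)
{
    /* Anything smaller than the old structure is still invalid. */
    assert(ctx_size >= SKETCH_OLD_CTX_SIZE);

    if (ctx_size < SKETCH_NEW_CTX_SIZE)
        return task_msr & ~SKETCH_MSR_VSX;  /* no room for VSX state */
    return task_msr;
}
```

Clearing the bit lets a reader of the saved context tell from the MSR alone whether the extra state is present.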
[paulus@samba.org - also fix some access_ok() and get_user() calls]
Thanks to Ben Herrenschmidt for noticing this problem.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Paul Mackerras [Wed, 22 Oct 2008 18:43:45 +0000 (18:43 +0000)]
powerpc: Work around ld bug in older binutils
Commit 549e8152de8039506f69c677a4546e5427aa6ae7 ("powerpc: Make the
64-bit kernel as a position-independent executable") added lines to
vmlinux.lds.S to add the extra sections needed to implement a
relocatable kernel. However, those lines seem to trigger a bug in
older versions of GNU ld (such as 2.16.1) when building a
non-relocatable kernel. Since ld 2.16.1 is still a popular choice for
cross-toolchains, this adds an #ifdef to vmlinux.lds.S so the added
lines are only included when building a relocatable kernel.
Milton Miller [Thu, 23 Oct 2008 18:41:09 +0000 (18:41 +0000)]
powerpc/ppc64/kdump: Better flag for running relocatable
The __kdump_flag ABI is overly constraining for future development.
As of 2.6.27, the kernel entry point has 4 constraints: Offset 0 is
the starting point for the master (boot) cpu (entered with r3 pointing
to the device tree structure), offset 0x60 is code for the slave cpus
(entered with r3 set to their device tree physical id), offset 0x20 is
used by the iseries hypervisor, and secondary cpus must be well behaved
when the first 256 bytes are copied to address 0.
Placing the __kdump_flag at 0x18 is bad because:
- It was taking the last 8 bytes before the iseries hypervisor data.
- It was 8 bytes for a boolean flag
- It had no way of identifying that the flag was present
- It does not leave any room for the master to add any additional code
before branching, which hurts debug.
- It will be unnecessarily hard for 32 bit code to be common (8 bytes)
Now that we have eliminated the use of __kdump_flag in favor of
the standard is_kdump_kernel(), this flag only controls whether the kernel
runs without being relocated to PHYSICAL_START (0), so rename it __run_at_load.
Move the flag to 0x5c, 1 word before the secondary cpu entry point at
0x60. Initialize it with "run0" to say it will run at 0 unless it is
set to 1. It only exists if we are relocatable.
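
The new layout and flag semantics can be written down as a short sketch. The macro and function names are made up for the example; the offsets, the "run0" initial value, and the "set to 1" rule come from the text above, and a word is taken to be 4 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define RUN_AT_LOAD_OFFSET     0x5cu        /* flag location */
#define SECONDARY_ENTRY_OFFSET 0x60u        /* secondary cpu entry point */
#define RUN_AT_LOAD_DEFAULT    0x72756e30u  /* ASCII "run0" */

/* Nonzero only when the flag has been set to 1; "run0" means run at 0. */
static int runs_at_load_address(uint32_t flag)
{
    return flag == 1u;
}

/* Offset of the first byte after the one-word flag. */
static unsigned int flag_end_offset(void)
{
    /* The flag must be word aligned to be read as a single word. */
    assert(RUN_AT_LOAD_OFFSET % sizeof(uint32_t) == 0);
    return RUN_AT_LOAD_OFFSET + (unsigned int)sizeof(uint32_t);
}
```

Using a recognisable ASCII value as the default, rather than 0, also gives a loader a way to tell that the flag is present at all, which the old 8-byte boolean did not offer.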
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>