Sven Schnelle [Fri, 28 Mar 2008 21:15:55 +0000 (14:15 -0700)]
afs: prevent double cell registration
kafs doesn't check if the cell already exists - so if you do an echo "add
newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell
again. kobject will also complain about a double registration. To prevent
such problems, return -EEXIST in that case.
Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitri Monakhov [Fri, 28 Mar 2008 21:15:52 +0000 (14:15 -0700)]
vfs: fix data leak in nobh_write_end()
Current nobh_write_end() implementation ignore partial writes(copied < len)
case if page was fully mapped and simply mark page as Uptodate, which is
totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in
previous write_begin call. It simply contains garbage from pagecache and
result in data leakage.
As you can see file's page contains garbage from pagecache instead of zeros.
#TEST_CASE_END
Attached patch:
- Add sanity check BUG_ON in order to prevent incorrect usage by caller,
This is function invariant because page can has buffers and in no zero
*fadata pointer at the same time.
- Always attach buffers to page is it is partial write case.
- Always switch back to generic_write_end if page has buffers.
This is reasonable because if page already has buffer then generic_write_begin
was called previously.
Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Reviewed-by: Nick Piggin <npiggin@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/char/drm/ati_pcigart.c: In function 'drm_ati_pcigart_init':
drivers/char/drm/ati_pcigart.c:125: warning: format '%08X' expects type 'unsigned int', but argument 3 has type 'dma_addr_t'
Cc: Dave Airlie <airlied@linux.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This has been forgotten in commit f5bbdacc419 ("[MTD] NAND Modularize
read function") and nobody compiled the driver.
Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joern Engel <joern@wh.fh-wedel.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 28 Mar 2008 18:47:34 +0000 (11:47 -0700)]
Avoid false positive warnings in kmap_atomic_prot() with DEBUG_HIGHMEM
I believe http://bugzilla.kernel.org/show_bug.cgi?id=10318 is a false
positive. There's no way in which networking will be using highmem pages
here, so it won't be taking the KM_USER0 kmap slot, so there's no point in
performing these checks.
Cc: Pawel Staszewski <pstaszewski@artcom.pl> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Really sad. We lose almost all real-life coverage of the debug tests
with this patch. Now it will only report problems for the cases where
people actually end up using a HIGHMEM page, not when they just _might_
use one. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland Dreier [Fri, 28 Mar 2008 17:28:17 +0000 (10:28 -0700)]
RDMA/cxgb3: Program hardware IRD with correct value
Because of a typo in iwch_accept_cr(), the cxgb3 connection handling
code programs the hardware IRD (incoming RDMA read queue depth) with
the value that is passed in for the ORD (outgoing RDMA read queue
depth). In particular this means that if an application passes in IRD
> 0 and ORD = 0 (which is a completely sane and valid thing to do for
an app that expects only incoming RDMA read requests), then the
hardware will end up programmed with IRD = 0 and the app will fail in
a mysterious way.
Fix this by using "ep->ird" instead of "ep->ord" in the intended place.
Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 28 Mar 2008 13:28:03 +0000 (14:28 +0100)]
revert "ACPI: drivers/acpi: elide a non-zero test on a result that is never 0"
Revert commit 1192aeb957402b45f311895f124e4ca41206843c ("ACPI:
drivers/acpi: elide a non-zero test on a result that is never 0")
because it turns out that thermal_cooling_device_register() does
actually return NULL if CONFIG_THERMAL is turned off (then the routine
turns into a dummy inline routine in the header files that returns NULL
unconditionally).
This was found with randconfig testing, causing a crash during bootup:
initcall 0x78878534 ran for 13 msecs: acpi_button_init+0x0/0x51()
Calling initcall 0x78878585: acpi_fan_init+0x0/0x2c()
BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<782b8ad0>] acpi_fan_add+0x7d/0xfd
*pde = 00000000
Oops: 0000 [#1]
Modules linked in:
Michael Ellerman [Fri, 28 Mar 2008 08:11:48 +0000 (19:11 +1100)]
[POWERPC] Fix missed hardware breakpoints across multiple threads
There is a bug in the powerpc DABR (data access breakpoint) handling,
which can result in us missing breakpoints if several threads are trying
to break on the same address.
The circumstances are that do_page_fault() calls do_dabr(), this clears
the DABR (sets it to 0) and sets up the signal which will report to
userspace that the DABR was hit. The do_signal() code will restore the DABR
value on the way out to userspace.
If we reschedule before calling do_signal(), __switch_to() will check the
cached DABR value and compare it to the new thread's value, if they match
we don't set the DABR in hardware.
So if two threads have the same DABR value, and we schedule from one to
the other after taking the interrupt for the first thread hitting the DABR,
the second thread will run without the DABR set in hardware.
The cleanest fix is to move the cache update into set_dabr(), that way we
can't forget to do it.
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
The masking was not at all useless, and it was sensible. We handle
GFP_ZERO in the caller, and passing it down to any page allocator logic
is buggy and wrong.
Linus Torvalds [Thu, 27 Mar 2008 16:14:07 +0000 (09:14 -0700)]
Merge branch 'avr32-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6
* 'avr32-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
avr32: Fix bug in early resource allocation code
avr32: Build fix for CONFIG_BUG=n
avr32: Work around byteswap bug in gcc < 4.2
We need to set up the shared_info pointer once we've mapped the real
shared_info into its fixmap slot. That needs to happen once the general
pagetable setup has been done. Previously, the UP shared_info was set
up one in xen_start_kernel, but that was left pointing to the dummy
shared info. Unfortunately there's no really good place to do a later
setup of the shared_info in UP, so just do it once the pagetable setup
has been done.
xen_irq_enable_direct and xen_sysexit were using "andw $0x00ff,
XEN_vcpu_info_pending(vcpu)" to unmask events and test for pending ones
in one instuction.
Unfortunately, the pending flag must be modified with a locked operation
since it can be set by another CPU, and the unlocked form of this
operation was causing the pending flag to get lost, allowing the processor
to return to usermode with pending events and ultimately deadlock.
The simple fix would be to make it a locked operation, but that's rather
costly and unnecessary. The fix here is to split the mask-clearing and
pending-testing into two instructions; the interrupt window between
them is of no concern because either way pending or new events will
be processed.
This should fix lingering bugs in using direct vcpu structure access too.
The first page of the compound page is determined in follow_huge_addr()
but then PageCompound() only checks if the page is part of a compound page.
PageHead() allows checking if this is indeed the first page of the
compound.
Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Florian Fainelli [Wed, 26 Mar 2008 21:39:15 +0000 (22:39 +0100)]
rdc321x: GPIO routines bugfixes
This patch fixes the use of GPIO routines which are in the PCI
configuration space of the RDC321x, therefore reading/writing
to this space without spinlock protection can be problematic.
We also now request and free GPIOs and support the MGB100
board, previous code was very AR525W-centric.
Andrew Morton [Tue, 4 Mar 2008 23:05:39 +0000 (15:05 -0800)]
x86: ptrace.c: fix defined-but-unused warnings
arch/x86/kernel/ptrace.c:548: warning: 'ptrace_bts_get_size' defined but not used
arch/x86/kernel/ptrace.c:558: warning: 'ptrace_bts_read_record' defined but not used
arch/x86/kernel/ptrace.c:607: warning: 'ptrace_bts_clear' defined but not used
arch/x86/kernel/ptrace.c:617: warning: 'ptrace_bts_drain' defined but not used
arch/x86/kernel/ptrace.c:720: warning: 'ptrace_bts_config' defined but not used
arch/x86/kernel/ptrace.c:788: warning: 'ptrace_bts_status' defined but not used
Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Thu, 27 Mar 2008 15:03:22 +0000 (08:03 -0700)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: drivers/acpi: elide a non-zero test on a result that is never 0
pnpacpi: reduce printk severity for "pnpacpi: exceeded the max number of ..."
cpuidle: fix 100% C0 statistics regression
cpuidle: fix cpuidle time and usage overflow
ACPI: fix mis-merge -- invoke acpi_unlazy_tlb() only on C3 entry
ACPI: fix a regression of ACPI device driver autoloading
ACPI: SBS: remove typo from sbchc.c
add_reserved_region() tries to keep the resource list sorted, so when
looking for a place to insert the new resource, it may break out
before the last entry.
When this happens, the list is broken in two because the sibling field
of the new entry doesn't point to the next resource. Fix it by
updating the new resource's sibling field appropriately.
Julia Lawall [Thu, 27 Mar 2008 05:48:22 +0000 (01:48 -0400)]
ACPI: drivers/acpi: elide a non-zero test on a result that is never 0
The function thermal_cooling_device_register always returns either a valid
pointer or a value made with ERR_PTR, so a test for non-zero on the result
will always succeed.
The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)
//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@
E = thermal_cooling_device_register(...)
... when != E = E1
if@p (E) S else S1
@n@
position a.p;
expression E,E1;
statement S,S1;
@@
E = NULL
... when != E = E1
if@p (E) S else S1
@depends on !n@
expression E;
statement S,S1;
position a.p;
@@
* if@p (E)
S else S1
//</smpl>
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Herbert Xu [Wed, 26 Mar 2008 23:51:09 +0000 (16:51 -0700)]
[IPSEC]: Fix BEET output
The IPv6 BEET output function is incorrectly including the inner
header in the payload to be protected. This causes a crash as
the packet doesn't actually have that many bytes for a second
header.
The IPv4 BEET output on the other hand is broken when it comes
to handling an inner IPv6 header since it always assumes an
inner IPv4 header.
This patch fixes both by making sure that neither BEET output
function touches the inner header at all. All access is now
done through the protocol-independent cb structure. Two new
attributes are added to make this work, the IP header length
and the IPv4 option length. They're filled in by the inner
mode's output function.
Thanks to Joakim Koskela for finding this problem.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 26 Mar 2008 22:02:12 +0000 (15:02 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86: fix performance drop for glx
x86: fix trim mtrr not to setup_memory two times
x86: GEODE: add missing module.h include
x86, cpufreq: fix Speedfreq-SMI call that clobbers ECX
x86: fix memoryless node oops during boot
x86: add dmi quirk for io_delay
x86: convert mtrr/generic.c to kernel-doc
x86: Documentation/i386/IO-APIC.txt: fix description
This was tracked down to a potential livelock in
return_unused_surplus_hugepages(). In the case where we have surplus
pages on some node, but no free pages on the same node, we may never
break out of the loop. To avoid this livelock, terminate the search if
we iterate a number of times equal to the number of online nodes without
freeing a page.
Thanks to Andy Whitcroft and Adam Litke for helping with debugging and
the patch.
hugetlb: indicate surplus huge page counts in per-node meminfo
Currently we show the surplus hugetlb pool state in /proc/meminfo, but
not in the per-node meminfo files, even though we track the information
on a per-node basis. Printing it there can help track down dynamic pool
bugs including the one in the follow-on patch.
Suresh Siddha [Wed, 26 Mar 2008 00:39:12 +0000 (17:39 -0700)]
x86: fix performance drop for glx
fix the 3D performance drop reported at:
http://bugzilla.kernel.org/show_bug.cgi?id=10328
fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with
WC attribute. Recent changes in page attribute code made both
ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This
breaks the graphics performance, as the effective memory type is UC instead
of expected WC.
The correct way to fix this is to add ioremap_wc() (which uses UC- in the
absence of PAT kernel support and WC with PAT) and change all the
fb drivers to use this new ioremap_wc() API.
We can take this correct and longer route for post 2.6.25. For now,
revert back to the UC- behavior for ioremap/ioremap_nocache.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Sun, 23 Mar 2008 07:16:49 +0000 (00:16 -0700)]
x86: fix trim mtrr not to setup_memory two times
we could call find_max_pfn() directly instead of setup_memory() to get
max_pfn needed for mtrr trimming.
otherwise setup_memory() is called two times... that is duplicated...
[ mingo@elte.hu: both Thomas and me simulated a double call to
setup_bootmem_allocator() and can confirm that it is a real bug
which can hang in certain configs. It's not been reported yet but
that is probably due to the relatively scarce nature of
MTRR-trimming systems. ]
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andres Salomon [Wed, 26 Mar 2008 18:13:01 +0000 (14:13 -0400)]
x86: GEODE: add missing module.h include
On Wed, 26 Mar 2008 11:56:22 -0600
Jordan Crouse <jordan.crouse@amd.com> wrote:
> On 26/03/08 14:31 +0100, Stefan Pfetzing wrote:
> > Hello Jordan,
> >
> > I just tried to build your geodwdt driver for the geode watchdog. Therefore
> > I pulled your repository from http://git.infradead.org/geode.git (or more,
> > the git url).
> >
> > I tried to build the geodewdt driver as a module - which didn't work, and
> > it failed with the same problem as earlier mentioned on lkmk [1]. I also
> > checked the fix [2], but that seems to be already in your (or linus) tree -
> > and so I'm unsure what the problem is.
> >
> > [1] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884074
> > [2] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884174
> >
> > Building directly into the kernel seems to work.
> >
> > Maybe you have some idea?
>
> Hmm - that is strange. Exporting the symbols should work. I recommend
> starting over with a clean tree.
>
> CCing Andres - any thoughts?
>
> Jordan
>
Er, yeah. The patch below should fix it. This should probably go into
2.6.25.
Oops, EXPORT_SYMBOL_GPL wasn't being declared due to this header
being missing.
Signed-off-by: Andres Salomon <dilinger@debian.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86, cpufreq: fix Speedfreq-SMI call that clobbers ECX
I have found that using SMI to change the cpu's frequency on my DELL
Latitude L400 clobbers the ECX register in speedstep_set_state, causing
unneccessary retries because the "state" variable has changed silently (GCC
assumes it is still present in ECX).
play safe and avoid gcc caching any register across IO port accesses
that trigger SMIs.
The description of the interrupt routing doesn't match the (nice) diagram.
Signed-off-by: Nick Andrew <nick@nick-andrew.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Wed, 26 Mar 2008 18:30:17 +0000 (11:30 -0700)]
Merge branch 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm
* 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
slab: fix cache_cache bootstrap in kmem_cache_init()
count_partial() is not used if !SLUB_DEBUG and !CONFIG_SLABINFO
Linus Torvalds [Wed, 26 Mar 2008 18:29:35 +0000 (11:29 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
NOHZ: reevaluate idle sleep length after add_timer_on()
clocksource: revert: use init_timer_deferrable for clocksource_watchdog
Tom Tucker [Wed, 26 Mar 2008 02:27:19 +0000 (22:27 -0400)]
SVCRDMA: Check num_sge when setting LAST_CTXT bit
The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly
when the last chunk in the read-list spanned multiple pages. This
resulted in a kernel panic when the wrong context was used to
build the RPC iovec page list.
RDMA_READ is used to fetch RPC data from the client for
NFS_WRITE requests. A scatter-gather is used to map the
advertised client side buffer to the server-side iovec and
associated page list.
WR contexts are used to convey which scatter-gather entries are
handled by each WR. When the write data is large, a single RPC may
require multiple RDMA_READ requests so the contexts for a single RPC
are chained together in a linked list. The last context in this list
is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes,
the CQ handler code can enqueue the RPC for processing.
The code in rdma_read_xdr was setting this bit on the last two
contexts on this list when the last read-list chunk spanned multiple
pages. This caused the svc_rdma_recvfrom logic to incorrectly build
the RPC and caused the kernel to crash because the second-to-last
context doesn't contain the iovec page list.
Modified the condition that sets this bit so that it correctly detects
the last context for the RPC.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Tested-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 26 Mar 2008 18:22:40 +0000 (11:22 -0700)]
Revert "PCI: remove transparent bridge sizing"
This reverts commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f, which
caused various interesting problems for people, including wrong resource
allocations. See for example bugzilla entry "2.6.25-rc2: ohci1394
problem (MMIO broken)" at
http://bugzilla.kernel.org/show_bug.cgi?id=10080
And Gary Hade says:
"The same change had also exposed an issue reported by Paul Martin that
has been causing an Oops while hotplugging ThinkPads to a ThinkPad
Dock II. See
I have a fix for the ThinkPad docking Oops but if the issue being
discussed here is caused by the transparent bridge sizing removal
change I totally agree that it should be reverted."
The transparent bridge sizing removal change was motivated by
insufficient PCI memory resource for a transparent bridge window that
was being created as a result of expansion ROM(s) being included in
the transparent bridge sizing calculations.
A later "PCI: Remove default PCI expansion ROM memory allocation"
change ( re: http://lkml.org/lkml/2007/12/11/361 ) removes the
expansion ROM(s) from the transparent bridge sizing calculations which
actually resolves the original issue in a different manner. So, even
if the "PCI: remove transparent bridge sizing" is not problematic it
is no longer needed anyway."
Identified-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Tested-by: Thomas Meyer <thomas@m3y3r.de> Acked-by: Gary Hade <garyhade@us.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Len Brown [Wed, 26 Mar 2008 17:29:32 +0000 (13:29 -0400)]
pnpacpi: reduce printk severity for "pnpacpi: exceeded the max number of ..."
We have been printing these messages at KERN_ERR since 2.6.24,
per http://bugzilla.kernel.org/show_bug.cgi?id=9535
But KERN_ERR pops up on a console booted with "quiet"
and causes users to get alarmed and file bugs
about the message itself:
https://bugzilla.redhat.com/show_bug.cgi?id=436589
So reduce the severity of these messages to
KERN_WARNING, which is not printed by "quiet".
This message will still be seen without "quiet",
but a lot of messages are printed in that mode
and it will be less likely to cause undue alarm.
We could go all the way to KERN_DEBUG, but this
is a real warning after all, so it seems prudent
not to require "debug" to see it.
Daniel Yeisley [Tue, 25 Mar 2008 21:59:08 +0000 (23:59 +0200)]
slab: fix cache_cache bootstrap in kmem_cache_init()
Commit 556a169dab38b5100df6f4a45b655dddd3db94c1 ("slab: fix bootstrap on
memoryless node") introduced bootstrap-time cache_cache list3s for all nodes
but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This
patch fixes list_add() corruption in mm/slab.c seen on the ES7000.
Cc: Mel Gorman <mel@csn.ul.ie> Cc: Olaf Hering <olaf@aepfle.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>
count_partial() is not used if !SLUB_DEBUG and !CONFIG_SLABINFO
Avoid warnings about unused functions if neither SLUB_DEBUG nor CONFIG_SLABINFO
is defined. This patch will be reversed when slab defrag is merged since slab
defrag requires count_partial() to determine the fragmentation status of
slab caches.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Pavel Emelyanov [Wed, 26 Mar 2008 09:27:09 +0000 (02:27 -0700)]
[ICMP]: Dst entry leak in icmp_send host re-lookup code (v2).
Commit 8b7817f3a959ed99d7443afc12f78a7e1fcc2063 ([IPSEC]: Add ICMP host
relookup support) introduced some dst leaks on error paths: the rt
pointer can be forgotten to be put. Fix it bu going to a proper label.
Found after net namespace's lo refused to unregister :) Many thanks to
Den for valuable help during debugging.
Herbert pointed out, that xfrm_lookup() will put the rtable in case
of error itself, so the first goto fix is redundant.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 26 Mar 2008 09:12:11 +0000 (02:12 -0700)]
[NET]: Fix multicast device ioctl checks
SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list
method to determine whether it supports multicast. Drivers implementing
secondary unicast support use set_rx_mode however.
Check for both dev->set_multicast_mode and dev->set_rx_mode to determine
multicast capabilities.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Mar 2008 08:43:29 +0000 (01:43 -0700)]
[SPARC64]: Fix most sparse warnings in arch/sparc64/kernel/sys_sparc.c
Sparse still doesn't like the funny cast we make from a scalar to a
"union semun" (which is correct by the C language and in particular
works with the sparc64 calling conventions, but sparse doesn't grok
that yet).
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Gleixner [Sat, 22 Mar 2008 08:20:24 +0000 (09:20 +0100)]
NOHZ: reevaluate idle sleep length after add_timer_on()
add_timer_on() can add a timer on a CPU which is currently in a long
idle sleep, but the timer wheel is not reevaluated by the nohz code on
that CPU. So a timer can be delayed for quite a long time. This
triggered a false positive in the clocksource watchdog code.
To avoid this we need to wake up the idle CPU and enforce the
reevaluation of the timer wheel for the next timer event.
Add a function, which checks a given CPU for idle state, marks the
idle task with NEED_RESCHED and sends a reschedule IPI to notify the
other CPU of the change in the timer wheel.
Patrick McHardy [Wed, 26 Mar 2008 07:16:29 +0000 (00:16 -0700)]
[UML]: uml-net: don't set IFF_ALLMULTI in set_multicast_list
IFF_ALLMULTI is an indication from the network stack to the driver
to disable multicast filters, drivers should never set it directly.
Since the UML networking device doesn't have any filtering capabilites,
it doesn't the set_multicast_list function at all, it is kept so userspace
can still issue SIOCADDMULTI/SIOCDELMULTI ioctls however.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 26 Mar 2008 07:15:17 +0000 (00:15 -0700)]
[VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device
Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity
or dev_change_flags. Setting it directly causes two unwanted effects:
- the next dev_change_flags call will notice a difference between
dev->gflags and the actual flags, enable promisc/allmulti
mode and incorrectly update dev->gflags
- this keeps the underlying device in promisc/allmulti mode until
the VLAN device is deleted
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Venki Pallipadi [Fri, 29 Feb 2008 18:24:32 +0000 (10:24 -0800)]
cpuidle: fix 100% C0 statistics regression
commit 9b12e18cdc1553de62d931e73443c806347cd974
'ACPI: cpuidle: Support C1 idle time accounting'
was implicated in a 100% C0 idle regression.
http://bugzilla.kernel.org/show_bug.cgi?id=10076
It pointed out a potential problem where the menu governor
may get confused by the C-state residency time from poll
idle or C1 idle, where this timing info is not accurate.
This inaccuracy is due to interrupts being handled
before we account for C-state exit.
Do not mark TIME_VALID for CO poll state.
Mark C1 time as valid only with the MWAIT (CSTATE_FFH) entry method.
This makes governors use the timing information only when it is correct and
eliminates any wrong policy decisions that may result from invalid timing
information.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
David S. Miller [Wed, 26 Mar 2008 04:51:40 +0000 (21:51 -0700)]
[SPARC64]: Fix sparse warnings in arch/sparc64/kernel/{cpu,setup}.c
We create a local header file entry.h, under arch/sparc64/kernel/,
that we can use to declare routines either defined in assembler
or only invoked from assembler. As well as other data objects
which are private to the inner sparc64 kernel arch code.
Signed-off-by: David S. Miller <davem@davemloft.net>
Yi Yang [Mon, 25 Feb 2008 00:46:12 +0000 (08:46 +0800)]
cpuidle: fix cpuidle time and usage overflow
cpuidle C-state sysfs node time and usage are very easy to overflow because
they are all of unsigned int type, time will overflow within about two hours,
usage will take longer time to overflow, but they are increasing for ever.
This patch will convert them to unsigned long long.
Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Venki Pallipadi [Mon, 24 Mar 2008 21:24:10 +0000 (14:24 -0700)]
ACPI: fix mis-merge -- invoke acpi_unlazy_tlb() only on C3 entry
This original patch
http://ussg.iu.edu/hypermail/linux/kernel/0712.2/1451.html
was intending to add acpi_unlazy_tlb() to acpi_idle_enter_bm(),
which is used for C3 entry.
Michael Buesch [Tue, 25 Mar 2008 17:04:46 +0000 (18:04 +0100)]
b44: Truncate PHY address
Some ROMs on embedded devices store incorrect values for
the PHY address of the ethernet device.
It looks like the number is sign-extended.
Truncate the value by applying the PHY-address mask to it.
The patch was tested on a bcm47xx embedded system (where the bug
triggers) and a bcm4400 PCI card.
Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jussi Kivilinna [Sun, 23 Mar 2008 10:45:44 +0000 (12:45 +0200)]
rndis_host: fix oops when query for OID_GEN_PHYSICAL_MEDIUM fails
When query for OID_GEN_PHYSICAL_MEDIUM fails, uninitialized pointer
'phym' is being accessed in generic_rndis_bind(), resulting OOPS.
Patch fixes phym to be initialized and setup correctly when
rndis_query() for physical medium fails.
Bug was introduced by following commit:
commit 039ee17d1baabaa21783a0d5ab3e8c6d8c794bdf
Author: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Date: Sun Jan 27 23:34:33 2008 +0200
Reported-by: Dmitri Monakhov <dmonakhov@openvz.org> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
The problem is that reg_lock is used with plain spin_lock() in
drivers/net/cxgb3/sge.c but is used with spin_lock_irqsave() in
drivers/net/cxgb3/cxgb3_offload.c. This is technically a false
positive, since the uses in sge.c are only in the initialization and
cleanup paths and cannot overlap with any use in interrupt context.
The best fix is probably just to use spin_lock_irq() with reg_lock in
sge.c. Even though it's not strictly required for correctness, it
avoids triggering lockdep and the extra overhead of disabling
interrupts is not important at all in the initialization and cleanup
slow paths.
Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alexandr Smirnov [Tue, 18 Mar 2008 21:37:24 +0000 (00:37 +0300)]
Marvell PHY m88e1111 driver fix
Marvell PHY m88e1111 (not sure about other models, but think they too)
works in two modes: fiber and copper. In Marvell PHY driver (that we
have in current community kernels) code supported only copper mode,
and this is not configurable, bits for copper mode are simply written
in registers during PHY initialization.
This patch adds support for both modes.
Signed-off-by: Alexandr Smirnov <asmirnov@ru.mvista.com> Acked-by: Andy Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Dhananjay Phadke [Tue, 18 Mar 2008 02:59:50 +0000 (19:59 -0700)]
netxen: remove low level tx lock
o eliminate tx lock in netxen adapter struct, instead pound on netdev
tx lock appropriately.
o remove old "concurrent transmit" code that unnecessarily drops and
reacquires tx lock in hard_xmit_frame(), this is already serialized
the netdev xmit lock.
o reduce scope of tx lock in tx cleanup. tx cleanup operates on
different section of the ring than transmitting cpus and is
guarded by producer and consumer indices. This fixes a race
caused by rx softirq preemption on realtime kernels.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com> Tested-by: Vernon Mauery <mauery@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Dhananjay Phadke [Tue, 18 Mar 2008 02:59:49 +0000 (19:59 -0700)]
netxen: napi and irq cleanup
o separate and simpler irq handler for msi interrupts, avoids few checks
than legacy mode.
o avoid redudant tx_has_work() and rx_has_work() checks in interrupt
and napi, which can uncork irq based on racy (lockless) access to tx
and rx ring indices. If we get interrupt, there's sufficient reason to
schedule napi.
o replenish rx ring more often, remove self-imposed threshold rcv_free
that prevents posting rx desc to card. This improves performance in
low memory.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com> Tested-by: Vernon Mauery <mauery@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Dhananjay Phadke [Tue, 18 Mar 2008 02:59:48 +0000 (19:59 -0700)]
netxen: improve msi support
Recent netxen firmware has new scheme of generating MSI interrupts, it
raises interrupt and blocks itself, waiting for driver to unmask. This
reduces chance of spurious interrupts.
The driver will be able to deal with older firmware as well.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com> Tested-by: Vernon Mauery <mauery@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>