git.karo-electronics.de Git - karo-tx-linux.git/log

rtc: fix reported IRQ rate for when HPET is enabled

commit 61ca9daa2ca3022dc9cb22bd98e69c1b61e412ad upstream

The IRQ rate reported back by the RTC is incorrect when HPET is enabled.

Newer hardware that has HPET to emulate the legacy RTC device gets this value
wrong since after it sets the rate, it returns before setting the variable
used to report the IRQ rate back to users of the device -- so the set rate and
the reported rate get out of sync.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Brownell <david-b@pacbell.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

slub: Fix use-after-preempt of per-CPU data structure

commit bdb21928512a860a60e6a24a849dc5b63cbaf96a upstream

Vegard Nossum reported a crash in kmem_cache_alloc():

BUG: unable to handle kernel paging request at da87d000
IP: [<c01991c7>] kmem_cache_alloc+0xc7/0xe0
*pde = 28180163 *pte = 1a87d160
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5)
EIP: 0060:[<c01991c7>] EFLAGS: 00210203 CPU: 0
EIP is at kmem_cache_alloc+0xc7/0xe0
EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b
ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

and analyzed it:

  "The register %ecx looks innocent but is very important here. The disassembly:

       mov    %edx,%ecx
       shr    $0x2,%ecx
       rep stos %eax,%es:(%edi) <-- the fault

   So %ecx has been loaded from %edx... which is 0x6b6b6b6b/POISON_FREE.
   (0x6b6b6b6b >> 2 == 0x1adadada.)

   %ecx is the counter for the memset, from here:

       memset(object, 0, c->objsize);

  i.e. %ecx was loaded from c->objsize, so "c" must have been freed.
  Where did "c" come from? Uh-oh...

       c = get_cpu_slab(s, smp_processor_id());

  This looks like it has very much to do with CPU hotplug/unplug. Is
  there a race between SLUB/hotplug since the CPU slab is used after it
  has been freed?"

Good analysis.

Yeah, it's possible that a caller of kmem_cache_alloc() -> slab_alloc()
can be migrated on another CPU right after local_irq_restore() and
before memset().  The inital cpu can become offline in the mean time (or
a migration is a consequence of the CPU going offline) so its
'kmem_cache_cpu' structure gets freed ( slab_cpuup_callback).

At some point of time the caller continues on another CPU having an
obsolete pointer...

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

exec: fix stack excutability without PT_GNU_STACK

commit 96a8e13ed44e380fc2bb6c711d74d5ba698c00b2 upstream

Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32)
exec'ing an ELF without a PT_GNU_STACK program header should default to an
executable stack; but this got broken by the unlimited argv feature because
stack vma is now created before the right personality has been established:
so breaking old binaries using nested function trampolines.

Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack
vm_flags used to be set, before the mprotect_fixup. Checking through
our existing VM_flags, none would have changed since insert_vm_struct:
so this seems safer than finding a way through the personality labyrinth.

Reported-by: pageexec@freemail.hu
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

zd1211rw: add ID for AirTies WUS-201

Commit 9dfd55008e3863dcd93219c74bf05b09e5c549e2 upstream

I would like to inform you of our zd1211 based usb wifi adapter (AirTies
WUS-201), which works with the zd1211rw driver with the following device
id definition.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Peter Nixon <listuser@peternixon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

netfilter: nf_conntrack_tcp: fixing to check the lower bound of valid ACK

Upstream commit 84ebe1c:

Lost connections was reported by Thomas Bätzler (running 2.6.25 kernel) on
the netfilter mailing list (see the thread "Weird nat/conntrack Problem
with PASV FTP upload"). He provided tcpdump recordings which helped to
find a long lingering bug in conntrack.

In TCP connection tracking, checking the lower bound of valid ACK could
lead to mark valid packets as INVALID because:

- We have got a "higher or equal" inequality, but the test checked
   the "higher" condition only; fixed.
- If the packet contains a SACK option, it could occur that the ACK
   value was before the left edge of our (S)ACK "window": if a previous
   packet from the other party intersected the right edge of the window
   of the receiver, we could move forward the window parameters beyond
   accepting a valid ack. Therefore in this patch we check the rightmost
   SACK edge instead of the ACK value in the lower bound of valid (S)ACK
   test.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

textsearch: fix Boyer-Moore text search bug

Upstream commit aebb6a849cfe7d89bcacaaecc20a480dfc1180e7

The current logic has a bug which cannot find matching pattern, if the
pattern is matched from the first character of target string.
for example:
pattern=abc, string=abcdefg
pattern=a, string=abcdefg
Searching algorithm should return 0 for those things.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

md: ensure all blocks are uptodate or locked when syncing

commit 7a1fc53c5adb910751a9b212af90302eb4ffb527 upstream

Remove the dubious attempt to prefer 'compute' over 'read'. Not only is it
wrong given commit c337869d (md: do not compute parity unless it is on a failed
drive), but it can trigger a BUG_ON in handle_parity_checks5().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sisusbvga: Fix oops on disconnect.

commit f15e39739a1d7dfaa2173a91707a74c11a246648 upstream

Remove dev_info call on disconnect. The sisusb_dev pointer may have been
set to zero by sisusb_delete at this point causing an oops.

The message does not provide any extra information over the standard USB
subsystem output so removing it does not affect functionality.

Signed-off-by: Will Newton <will.newton@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

can: add sanity checks

commit 7f2d38eb7a42bea1c1df51bbdaa2ca0f0bdda07f upstream

Even though the CAN netlayer only deals with CAN netdevices, the
netlayer interface to the userspace and to the device layer should
perform some sanity checks.

This patch adds several sanity checks that mainly prevent userspace apps
to send broken content into the system that may be misinterpreted by
some other userspace application.

Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Acked-by: Andre Naujoks <nautsch@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

serial: fix serial_match_port() for dynamic major tty-device numbers

commit 7ca796f492a11f9408e661c8f22cd8c4f486b8e5 upstream

As reported by Vipul Gandhi, the current serial_match_port() doesn't work
for tty-devices using dynamic major number allocation. Fix it.

It oopses if you suspend a serial port with _dynamic_ major number. ATM,
I think, there's only the drivers/serial/jsm/jsm_driver.c driver, that
does it in-tree.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Tested-by: Vipul Gandhi <vcgandhi1@aol.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cciss: read config to obtain max outstanding commands per controller

commit 491539982aa01fa71de93c2a06ac5d890d4cf1e2 upstream

This patch changes the way we determine the maximum number of outstanding
commands for each controller.

Most Smart Array controllers can support up to 1024 commands, the notable
exceptions are the E200 and E200i.

The next generation of controllers which were just added support a mode of
operation called Zero Memory Raid (ZMR).  In this mode they only support
64 outstanding commands.  In Full Function Raid (FFR) mode they support
1024.

We have been setting the queue depth by arbitrarily assigning some value
for each controller.  We needed a better way to set the queue depth to
avoid lots of annoying "fifo full" messages.  So we made the driver a
little smarter.  We now read the config table and subtract 4 from the
returned value.  The -4 is to allow some room for ioctl calls which are
not tracked the same way as io commands are tracked.

Please consider this for inclusion.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

reiserfs: discard prealloc in reiserfs_delete_inode

commit eb35c218d83ec0780d9db869310f2e333f628702 upstream

With the removal of struct file from the xattr code,
reiserfs_file_release() isn't used anymore, so the prealloc isn't
discarded. This causes hangs later down the line.

This patch adds it to reiserfs_delete_inode. In most cases it will be a
no-op due to it already having been called, but will avoid hangs with
xattrs.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mm: switch node meminfo Active & Inactive pages to Kbytes

commit 2d5c1be8870383622809c25935fff00d2630c7a5 upstream

There is a bug in the output of /sys/devices/system/node/node[n]/meminfo
where the Active and Inactive values are in pages instead of Kbytes.

Looks like this occurred back in 2.6.20 when the code was changed
over to use node_page_state().

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SCSI: ses: Fix timeout

commit c95e62ce8905aab62fed224eaaa9b8558a0ef652 upstream

Timeouts are measured in jiffies, not in seconds.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SCSI: esp: tidy up target reference counting

commit ec5e69f6d3f4350681d6f7eaae515cf014be9276 upstream

The esp driver currently does hand rolled reference counting of its
target. It's much easier to do what it needs to do if it's plugged into
the mid-layer callbacks (target_alloc and target_destroy) which were
designed for this case, so do it this way and get rid of the internal
target reference count.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SCSI: esp: Fix OOPS in esp_reset_cleanup().

commit eadc49b1a8d09480f14caea292142f103a89c77a upstream

OOPS reported by Friedrich Oslage <bluebird@porno-bullen.de>

The problem here is that tp->starget is set every time a lun
is allocated for a particular target so we can catch the
sdev_target parent value.

The reset handler uses the NULL'ness of this value to determine
which targets are active.

But esp_slave_destroy() does not NULL out this value when appropriate.

So for every target that doesn't respond, the SCSI bus scan causes
a stale pointer to be left here, with ensuing crashes like you're
seeing.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

netdrvr: 3c59x: remove irqs_disabled warning from local_bh_enable

commit c5643cab7bf663ae049b11be43de8819683176dd upstream

Original Author: Michael Buesch <mb@bu3sch.de>

net, vortex: fix lockup

Ingo Molnar reported:

-tip testing found that Johannes Berg's "softirq: remove irqs_disabled
warning from local_bh_enable" enhancement to lockdep triggers a new
warning on an old testbox that uses 3c59x vortex and netlogging:

----->
    calling  vortex_init+0x0/0xb0
    PCI: Found IRQ 10 for device 0000:00:0b.0
    PCI: Sharing IRQ 10 with 0000:00:0a.0
    PCI: Sharing IRQ 10 with 0000:00:0b.1
    3c59x: Donald Becker and others.
    0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
    PCI: Enabling bus mastering for device 0000:00:0b.0
    initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
..
    calling  init_netconsole+0x0/0x1b0
    netconsole: local port 4444
    netconsole: local IP 10.0.1.9
    netconsole: interface eth0
    netconsole: remote port 4444
    netconsole: remote IP 10.0.1.16
    netconsole: remote ethernet address 00:19:xx:xx:xx:xx
    netconsole: device eth0 not up yet, forcing it
    eth0:  setting half-duplex.
    eth0:  setting full-duplex.
------------[ cut here ]------------
    WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
    Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
     [<c0125ecf>] warn_on_slowpath+0x4f/0x70
     [<c0126834>] ? release_console_sem+0x1b4/0x1d0
     [<c0126d00>] ? vprintk+0x2a0/0x450
     [<c012fde5>] ? __mod_timer+0xa5/0xc0
     [<c046f7fd>] ? mdio_sync+0x3d/0x50
     [<c0160ef6>] ? marker_probe_cb+0x46/0xa0
     [<c0126ed7>] ? printk+0x27/0x50
     [<c046f4c3>] ? vortex_set_duplex+0x43/0xc0
     [<c046f521>] ? vortex_set_duplex+0xa1/0xc0
     [<c0471b92>] ? vortex_timer+0xe2/0x3e0
     [<c012b361>] local_bh_enable_ip+0xd1/0xe0
     [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
     [<c0471b92>] vortex_timer+0xe2/0x3e0
     [<c014743b>] ? trace_hardirqs_on+0xb/0x10
     [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
     [<c012f8b2>] run_timer_softirq+0x162/0x1c0
     [<c0471ab0>] ? vortex_timer+0x0/0x3e0
     [<c012b361>] local_bh_enable_ip+0xd1/0xe0
     [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
     [<c0471b92>] vortex_timer+0xe2/0x3e0
     [<c014743b>] ? trace_hardirqs_on+0xb/0x10
     [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
     [<c012f8b2>] run_timer_softirq+0x162/0x1c0
     [<c0471ab0>] ? vortex_timer+0x0/0x3e0
     [<c0471ab0>] ? vortex_timer+0x0/0x3e0
     [<c012b60a>] __do_softirq+0x9a/0x160
     [<c012b570>] ? __do_softirq+0x0/0x160
     [<c0106775>] call_on_stack+0x15/0x30
     [<c012b4f5>] ? irq_exit+0x55/0x60
     [<c0106e85>] ? do_IRQ+0x85/0xd0
     [<c0147391>] ? trace_hardirqs_on_caller+0xc1/0x160
     [<c0104888>] ? common_interrupt+0x28/0x30
     [<c08d8ac8>] ? mutex_unlock+0x8/0x10
     [<c08d8180>] ? _cond_resched+0x10/0x30
     [<c07a3be7>] ? netpoll_setup+0x117/0x390
     [<c0cbfcfe>] ? init_netconsole+0x14e/0x1b0
     [<c013d539>] ? ktime_get+0x19/0x40
     [<c0c9bab2>] ? kernel_init+0x1b2/0x2c0
     [<c0cbfbb0>] ? init_netconsole+0x0/0x1b0
     [<c0396aa4>] ? trace_hardirqs_on_thunk+0xc/0x10
     [<c0103f12>] ? restore_nocheck_notrace+0x0/0xe
     [<c0c9b900>] ? kernel_init+0x0/0x2c0
     [<c0c9b900>] ? kernel_init+0x0/0x2c0
     [<c0104aa7>] ? kernel_thread_helper+0x7/0x10
     =======================
---[ end trace 37f9c502aff112e0 ]---
    console [netcon0] enabled
    netconsole: network logging started
    initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs

looking at the driver I think the bug is real and the fix actually
is trivial.

vp->lock is also taken in hardware IRQ context, so we _have_ to always
use irqsafe locking. As we run in a timer with IRQs disabled,
we can simply use spin_lock.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b43legacy: Fix possible NULL pointer dereference in DMA code

commit 2f9ec47d0954f9d2e5a00209c2689cbc477a8c89 upstream

This fixes a possible NULL pointer dereference in an error path of the
DMA allocation error checking code. This is also necessary for a future
DMA API change that is on its way into the mainline kernel that adds
an additional dev parameter to dma_mapping_error().

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

hdaps: add support for various newer Lenovo thinkpads

commit 292d73551d0aa19526c3417e791c529b49ebadf3 upstream

Adds R61, T61p, X61s, X61, Z61m, Z61p models to whitelist.

Fixes this:

cullen@lenny:~$ sudo modprobe hdaps
FATAL: Error inserting hdaps (/lib/modules/2.6.22-10-generic/kernel/drivers/hwmon/hdaps.ko): No such device

[25192.888000] hdaps: supported laptop not found!
[25192.888000] hdaps: driver init failed (ret=-19)!

Originally based on an Ubuntu patch that got it wrong, the dmidecode
output of the corresponding laptops shows LENOVO as the manufacturer.
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/133636

tested on X61s:
[  184.893588] hdaps: inverting axis readings.
[  184.893588] hdaps: LENOVO ThinkPad X61s detected.
[  184.893588] input: hdaps as /class/input/input12
[  184.924326] hdaps: driver successfully loaded.

Cc: Klaus S. Madsen <ubuntu@hjernemadsen.org>
Cc: Chuck Short <zulcss@ubuntu.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: maximilian attems <max@stro.at>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: fix interrupt disabling for HCDs with shared interrupt handlers

commit de85422b94ddb23c021126815ea49414047c13dc upstream

As has been discussed several times on LKML, IRQF_SHARED | IRQF_DISABLED
doesn't work reliably, i.e. a shared interrupt handler CAN'T be certain to
be called with interrupts disabled. Most USB HCD handlers use IRQF_DISABLED
and therefore havoc can break out if they share their interrupt with a
handler that doesn't use it.

On my test machine the yenta_socket interrupt handler (no IRQF_DISABLED)
was registered before ehci_hcd and one uhci_hcd instance. Therefore all
usb_hcd_irq() invocations for ehci_hcd and for one uhci_hcd instance
happened with interrupts enabled. That led to random lockups as USB core
HCD functions that acquire the same spinlock could be called twice
from interrupt handlers.

This patch updates usb_hcd_irq() to always disable/restore interrupts.
usb_add_hcd() will silently remove any IRQF_DISABLED requested from HCD code.

Signed-off-by: Stefan Becker <stefan.becker@nokia.com>
Acked-by: David Brownell <david-b@pacbell.net>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: ohci - record data toggle after unlink

commit 29c8f6a727a683b5988877dd80dbdefd49e64a51 upstream

This patch fixes a problem with OHCI where canceling bulk or
interrupt URBs may lose track of the right data toggle.  This
seems to be a longstanding bug, possibly dating back to the
Linux 2.4 kernel, which stayed hidden because

(a) about half the time the data toggle bit was correct;
(b) canceling such URBs is unusual; and
(c) the few drivers which cancel these URBs either
      [1] do it only as part of shutting down, or
      [2] have fault recovery logic, which recovers.

For those transfer types, the toggle is normally written back
into the ED when each TD is retired.  But canceling bypasses
the mechanism used to retire TDs ... so on average, half the
time the toggle bit will be invalid after cancelation.

The fix is simple:  the toggle state of any canceled TDs are
propagated back to the ED in the finish_unlinks function.

(Issue found by leonidv11@gmail.com ...)

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Leonid <leonidv11@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: ehci - fix timer regression

commit 056761e55c8687ddf3db14226213f2e8dc2689bc upstream

This patch fixes a regression in the EHCI driver's TIMER_IO_WATCHDOG
behavior. The patch "USB: EHCI: add separate IAA watchdog timer" changed
how that timer is handled, so that short timeouts on the remaining
timer (unfortunately, overloaded) would never be used.

This takes a more direct approach, reorganizing the code slightly to
be explicit about only the I/O watchdog role now being overridable.
It also replaces a now-obsolete comment describing older timer behavior.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Leonid <leonidv11@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

OHCI: Fix problem if SM501 and another platform driver is selected

commit 3ee38d8bf46b364b1ca364ddb7c379a4afcd8bbb upstream

If the SM501 and another platform driver, such as the SM501
then we end up defining PLATFORM_DRIVER twice. This patch
seperated the SM501 onto a seperate define of SM501_OHCI_DRIVER
so that it can be selected without overwriting the original
definition.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

block: Properly notify block layer of sync writes

commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb upstream

fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.

This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large number of
synchronous IO in the system:

INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald D ffff810010820978 6712 20558 2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[<ffffffff803ba6f2>] kobject_get+0x12/0x17
[<ffffffff80247537>] getnstimeofday+0x2f/0x83
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d195>] io_schedule+0x5d/0x9f
[<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f
[<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff80243909>] wake_bit_function+0x0/0x23
[<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb
[<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6
[<ffffffff8023a676>] lock_timer_base+0x26/0x4b
[<ffffffff8030300a>] kjournald+0xc1/0x1fb
[<ffffffff802438db>] autoremove_wake_function+0x0/0x2e
[<ffffffff80302f49>] kjournald+0x0/0x1fb
[<ffffffff802437bb>] kthread+0x47/0x74
[<ffffffff8022de51>] schedule_tail+0x28/0x5d
[<ffffffff8020cac8>] child_rip+0xa/0x12
[<ffffffff80243774>] kthread+0x0/0x74
[<ffffffff8020cabe>] child_rip+0x0/0x12

Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

md: Ensure interrupted recovery completed properly (v1 metadata plus bitmap)

commit 8c2e870a625bd336b2e7a65a97c1836acef07322 upstream

If, while assembling an array, we find a device which is not fully
in-sync with the array, it is important to set the "fullsync" flags.
This is an exact analog to the setting of this flag in hot_add_disk
methods.

Currently, only v1.x metadata supports having devices in an array
which are not fully in-sync (it keep track of how in sync they are).
The 'fullsync' flag only makes a difference when a write-intent bitmap
is being used. In this case it tells recovery to ignore the bitmap
and recovery all blocks.

This fix is already in place for raid1, but not raid5/6 or raid10.

So without this fix, a raid1 ir raid4/5/6 array with version 1.x
metadata and a write intent bitmaps, that is stopped in the middle
of a recovery, will appear to complete the recovery instantly
after it is reassembled, but the recovery will not be correct.

If you might have an array like that, issueing
echo repair > /sys/block/mdXX/md/sync_action

will make sure recovery completes properly.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

md: Don't acknowlege that stripe-expand is complete until it really is.

commit efe311431869b40d67911820a309f9a1a41306f3 upstream

We shouldn't acknowledge that a stripe has been expanded (When
reshaping a raid5 by adding a device) until the moved data has
actually been written out. However we are currently
acknowledging (by calling md_done_sync) when the POST_XOR
is complete and before the write.

So track in s.locked whether there are pending writes, and don't
call md_done_sync yet if there are.

Note: we all set R5_LOCKED on devices which are are about to
read from. This probably isn't technically necessary, but is
usually done when writing a block, and justifies the use of
s.locked here.

This bug can lead to a crash if an array is stopped while an reshape
is in progress.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

md: Fix error paths if md_probe fails.

commit 9bbbca3a0ee09293108b67835c6bdf6196d7bcb3 upstream

md_probe can fail (e.g. alloc_disk could fail) without
returning an error (as it alway returns NULL).
So when we call mddev_find immediately afterwards, we need
to check that md_probe actually succeeded. This means checking
that mdev->gendisk is non-NULL.

Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

block: Fix the starving writes bug in the anticipatory IO scheduler

commit d585d0b9d73ed999cc7b8cf3cac4a5b01abb544e upstream

AS scheduler alternates between issuing read and write batches. It does
the batch switch only after all requests from the previous batch are
completed.

When switching to a write batch, if there is an on-going read request,
it waits for its completion and indicates its intention of switching by
setting ad->changed_batch and the new direction but does not update the
batch_expire_time for the new write batch which it does in the case of
no previous pending requests.
On completion of the read request, it sees that we were waiting for the
switch and schedules work for kblockd right away and resets the
ad->changed_data flag.
Now when kblockd enters dispatch_request where it is expected to pick
up a write request, it in turn ends the write batch because the
batch_expire_timer was not updated and shows the expire timestamp for
the previous batch.

This results in the write starvation for all the cases where there is
the intention for switching to a write batch, but there is a previous
in-flight read request and the batch gets reverted to a read_batch
right away.

This also holds true in the reverse case (switching from a write batch
to a read batch with an in-flight write request).

I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and
linux-2.6-block git HEAD. I've tested the fix on x86 platforms with
SCSI drives where the driver asks for the next request while a current
request is in-flight.

This patch is based off linux-2.6-block git HEAD.

Bug reproduction:
A simple scenario which reproduces this bug is:
- dd if=/dev/hda3 of=/dev/null &
- lilo
   The lilo takes forever to complete.

This can also be reproduced fairly easily with the earlier dd and
another test
program doing msync().

The example test program below should print out a message after every
iteration
but it simply hangs forever. With this bugfix it makes forward progress.

====
Example test program using msync() (thanks to suleiman AT google DOT
com)

inline uint64_t
rdtsc(void)
{
         int64_t tsc;

         __asm __volatile("rdtsc" : "=A" (tsc));
         return (tsc);
}

int
main(int argc, char **argv)
{
         struct stat st;
         uint64_t e, s, t;
         char *p, q;
         long i;
         int fd;

         if (argc < 2) {
                 printf("Usage: %s <file>\n", argv[0]);
                 return (1);
         }

         if ((fd = open(argv[1], O_RDWR | O_NOATIME)) < 0)
                 err(1, "open");

         if (fstat(fd, &st) < 0)
                 err(1, "fstat");

         p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);

         t = 0;
         for (i = 0; i < 1000; i++) {
                 *p = 0;
                 msync(p, 4096, MS_SYNC);
                 s = rdtsc();
                *p = 0;
                 __asm __volatile(""::: "memory");
                 e = rdtsc();
                 if (argc > 2)
                         printf("%d: %lld cycles %jd %jd\n",
                                i, e - s, (intmax_t)s, (intmax_t)e);
                 t += e - s;
         }
         printf("average time: %lld cycles\n", t / 1000);
         return (0);
}

Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mac80211: detect driver tx bugs

When a driver rejects a frame in it's ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.

This patch was added to mainline as
commit ef3a62d272f033989e83eb1f26505f93f93e3e69.

Thanks to Larry Finger <Larry.Finger@lwfinger.net> for doing the -stable
port.

Cc: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

b43: Fix possible MMIO access while device is down

This fixes a possible MMIO access while the device is still down
from a suspend cycle. MMIO accesses with the device powered down
may cause crashes on certain devices.

Upstream commit is
33598cf261e393f2b3349cb55509e358014bfd1f

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b43: Do not return TX_BUSY from op_tx

Never return TX_BUSY from op_tx. It doesn't make sense to return
TX_BUSY, if we can not transmit the packet.
Drop the packet and return TX_OK.
This will fix the resume hang.

Upstream commit is
66193a7cef2239bfd1b9b96e304770facf7a49c7

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b43legacy: Do not return TX_BUSY from op_tx

Never return TX_BUSY from op_tx. It doesn't make sense to return
TX_BUSY, if we can not transmit the packet.
Drop the packet and return TX_OK.

Upstream commit is
eb803e419ca6be06ece2e42027bb4ebd8ec09f91

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.25.11

x86: fix ldt limit for 64 bit

commit 5ac37f87ff18843aabab84cf75b2f8504c2d81fe upstream

Fix size of LDT entries. On x86-64, ldt_desc is a double-sized descriptor.

Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.25.10

x86: shift bits the right way in native_read_tscp

Commit 41aefdcc98fdba47459eab67630293d67e855fc3 upstream

x86: shift bits the right way in native_read_tscp

native_read_tscp shifts the bits in the high order value in the
wrong direction, the attached patch fixes that.

Signed-off-by: Max Asbock <masbock@linux.vnet.ibm.com>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: fix cpu hotplug crash

Commit fcb43042ef55d2f46b0efa5d7746967cef38f056 upstream

x86: fix cpu hotplug crash

Vegard Nossum reported crashes during cpu hotplug tests:

http://marc.info/?l=linux-kernel&m=121413950227884&w=4

In function _cpu_up, the panic happens when calling
__raw_notifier_call_chain at the second time. Kernel doesn't panic when
calling it at the first time. If just say because of nr_cpu_ids, that's
not right.

By checking the source code, I found that function do_boot_cpu is the culprit.
Consider below call chain:
_cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.

So do_boot_cpu is called in the end. In do_boot_cpu, if
boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So later
on, when _cpu_up calls __raw_notifier_call_chain at the second time to
report CPU_UP_CANCELED, because this cpu is already cleared from
cpu_possible_map, get_cpu_sysdev returns NULL.

Many resources are related to cpu_possible_map, so it's better not to
change it.

Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in
cpu_possible_map.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ptrace GET/SET FPXREGS broken

Commit 11dbc963a8f6128595d0f6ecf138dc369e144997 upstream

ptrace GET/SET FPXREGS broken

When I update kernel 2.6.25 from 2.6.24, gdb does not work.
On 2.6.25, ptrace(PTRACE_GETFPXREGS, ...) returns ENODEV.

But 2.6.24 kernel's ptrace() returns EIO.
It is issue of compatibility.

I attached test program as pt.c and patch for fix it.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>

struct user_fxsr_struct {
unsigned short cwd;
unsigned short swd;
unsigned short twd;
unsigned short fop;
long fip;
long fcs;
long foo;
long fos;
long mxcsr;
long reserved;
long st_space[32]; /* 8*16 bytes for each FP-reg = 128 bytes */
long xmm_space[32]; /* 8*16 bytes for each XMM-reg = 128 bytes */
long padding[56];
};

int main(void)
{
  pid_t pid;

  pid = fork();

  switch(pid){
  case -1:/*  error */
    break;
  case 0:/*  child */
    child();
    break;
  default:
    parent(pid);
    break;
  }
  return 0;
}

int child(void)
{
  ptrace(PTRACE_TRACEME);
  kill(getpid(), SIGSTOP);
  sleep(10);
  return 0;
}
int parent(pid_t pid)
{
  int ret;
  struct user_fxsr_struct fpxregs;

  ret = ptrace(PTRACE_GETFPXREGS, pid, 0, &fpxregs);
  if(ret < 0){
    printf("%d: %s.\n", errno, strerror(errno));
  }
  kill(pid, SIGCONT);
  wait(pid);
  return 0;
}

/* in the kerel, at kernel/i387.c get_fpxregs() */

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sched: fix cpu hotplug

Commit 79c537998d143b127c8c662a403c3356cb885f1c upstream

the CPU hotplug problems (crashes under high-volume unplug+replug
tests) seem to be related to migrate_dead_tasks().

Firstly I added traces to see all tasks being migrated with
migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
pops up (the one with "se == NULL" in the loop of
pick_next_task_fair()) shortly after the traces indicate that some has
been migrated with migrate_dead_tasks()). btw., I can reproduce it
much faster now with just a plain cpu down/up loop.

[disclaimer] Well, unless I'm really missing something important in
this late hour [/desclaimer] pick_next_task() is not something
appropriate for migrate_dead_tasks() :-)

the following change seems to eliminate the problem on my setup
(although, I kept it running only for a few minutes to get a few
messages indicating migrate_dead_tasks() does move tasks and the
system is still ok)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86_64 ptrace: fix sys32_ptrace task_struct leak

Commit 5a4646a4efed8c835f76c3b88f3155f6ab5b8d9b introduced a leak of
task_struct refs into sys32_ptrace. This bug has already gone away in
for 2.6.26 in commit 562b80bafffaf42a6d916b0a2ee3d684220a1c10.

Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

DRM: enable bus mastering on i915 at resume time

commit ea7b44c8e6baa1a4507f05ba2c0009ac21c3fe0b upstream

On 9xx chips, bus mastering needs to be enabled at resume time for much of the
chip to function. With this patch, vblank interrupts will work as expected
on resume, along with other chip functions. Fixes kernel bugzilla #10844.

Signed-off-by: Jie Luo <clotho67@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IB/mthca: Clear ICM pages before handing to FW

commit 87afd448b186c885d67a08b7417cd46253b6a9d6 upstream

Current memfree FW has a bug which in some cases, assumes that ICM
pages passed to it are cleared.  This patch uses __GFP_ZERO to
allocate all ICM pages passed to the FW.  Once firmware with a fix is
released, we can make the workaround conditional on firmware version.

This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here:
http://lists.openfabrics.org/pipermail/general/2008-May/050026.html

[ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing
  ICM memory and memset()ing it to 0. - Roland ]

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

futexes: fix fault handling in futex_lock_pi

commit 1b7558e457ed0de61023cfc913d2c342c7c3d9f2 upstream

This patch addresses a very sporadic pi-futex related failure in
highly threaded java apps on large SMP systems.

David Holmes reported that the pi_state consistency check in
lookup_pi_state triggered with his test application. This means that
the kernel internal pi_state and the user space futex variable are out
of sync. First we assumed that this is a user space data corruption,
but deeper investigation revieled that the problem happend because the
pi-futex code is not handling a fault in the futex_lock_pi path when
the user space variable needs to be fixed up.

The fault happens when a fork mapped the anon memory which contains
the futex readonly for COW or the page got swapped out exactly between
the unlock of the futex and the return of either the new futex owner
or the task which was the expected owner but failed to acquire the
kernel internal rtmutex. The current futex_lock_pi() code drops out
with an inconsistent in case it faults and returns -EFAULT to user
space. User space has no way to fixup that state.

When we wrote this code we thought that we could not drop the hash
bucket lock at this point to handle the fault.

After analysing the code again it turned out to be wrong because there
are only two tasks involved which might modify the pi_state and the
user space variable:

- the task which acquired the rtmutex
- the pending owner of the pi_state which did not get the rtmutex

Both tasks drop into the fixup_pi_state() function before returning to
user space. The first task which acquired the hash bucket lock faults
in the fixup of the user space variable, drops the spinlock and calls
futex_handle_fault() to fault in the page. Now the second task could
acquire the hash bucket lock and tries to fixup the user space
variable as well. It either faults as well or it succeeds because the
first task already faulted the page in.

One caveat is to avoid a double fixup. After returning from the fault
handling we reacquire the hash bucket lock and check whether the
pi_state owner has been modified already.

Reported-by: David Holmes <david.holmes@sun.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Holmes <david.holmes@sun.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

TTY: fix for tty operations bugs

This is fixed with the recent tty operations rewrite in mainline in a
different way, this is a selective backport of the relevant portions to
the -stable tree.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.25.9

Fix ZERO_PAGE breakage with vmware

commit 672ca28e300c17bf8d792a2a7a8631193e580c74 upstream

Commit 89f5b7da2a6bad2e84670422ab8192382a5aeb9f ("Reinstate ZERO_PAGE
optimization in 'get_user_pages()' and fix XIP") broke vmware, as
reported by Jeff Chua:

  "This broke vmware 6.0.4.
   Jun 22 14:53:03.845: vmx| NOT_IMPLEMENTED
   /build/mts/release/bora-93057/bora/vmx/main/vmmonPosix.c:774"

and the reason seems to be that there's an old bug in how we handle do
FOLL_ANON on VM_SHARED areas in get_user_pages(), but since it only
triggered if the whole page table was missing, nobody had apparently hit
it before.

The recent changes to 'follow_page()' made the FOLL_ANON logic trigger
not just for whole missing page tables, but for individual pages as
well, and exposed this problem.

This fixes it by making the test for when FOLL_ANON is used more
careful, and also makes the code easier to read and understand by moving
the logic to a separate inline function.

Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

hwmon: (adt7473) Initialize max_duty_at_overheat before use

commit ed4ec814e45ae8b1596aea0a29b92f6c3614acaa upstream

data->max_duty_at_overheat is not updated in adt7473_update_device,
so it might be used before it is initialized (if the user reads from
sysfs file max_duty_at_crit before writing to it.)

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

hwmon: (lm85) Fix function RANGE_TO_REG()

Function RANGE_TO_REG() is broken. For a requested range of 2000 (2
degrees C), it will return an index value of 15, i.e. 80.0 degrees C,
instead of the expected index value of 0. All other values are handled
properly, just 2000 isn't.

The bug was introduced back in November 2004 by this patch:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=1c28d80f1992240373099d863e4996cdd5d646d0

In Linus' kernel I decided to rewrite the whole function in a way
which was more obviously correct. But for -stable let's just do the
minimal fix.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

watchdog: hpwdt: fix use of inline assembly

commit 1f6ef2342972dc7fd623f360f84006e2304eb935 upstream

The inline assembly in drivers/watchdog/hpwdt.c was incredibly broken,
and included all the function prologue and epilogue stuff, even though
it was itself then inside a C function where the compiler would add its
own prologue and epilogue on top of it all.

This then just _happened_ to work if you had exactly the right compiler
version and exactly the right compiler flags, so that gcc just happened
to not create any prologue at all (the gcc-generated epilogue wouldn't
matter, since it would never be reached).

But the more proper way to fix it is to simply not do this. Move the
inline asm to the top level, with no surrounding function at all (the
better alternative would be to remove the prologue and make it actually
use proper description of the arguments to the inline asm, but that's a
bigger change than the one I'm willing to make right now).

Tested-by: S.Çağlar Onur <caglar@pardus.org.tr>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits.

commit ad524d46f36bbc32033bb72ba42958f12bf49b06 upstream

When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging").  This means, in theory,
we could have up to 52 bits of physical address in a pte.

The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
This means that it can only represent physical addresses up to 32+12=44
bits wide.  Rather than widening pfns everywhere, just set 2^44 as the
Linux x86_32-PAE architectural limit for physical address size.

This is a bugfix for two cases:
1. running a 32-bit PAE kernel on a machine with
  more than 64GB RAM.
2. running a 32-bit PAE Xen guest on a host machine with
  more than 64GB RAM

In both cases, a pte could need to have more than 36 bits of physical,
and masking it to 36-bits will cause fairly severe havoc.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: use BOOTMEM_EXCLUSIVE on 32-bit

commit d3942cff620bea073fc4e3c8ed878eb1e84615ce upstream

This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also for
i386 and prints a error message on failure.

The patch is still for 2.6.26 since it is only bug fixing. The unification
of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Add return value to reserve_bootmem_node()

commit 71c2742f5e6348d76ee62085cf0a13e5eff0f00e upstream

This patch changes the function reserve_bootmem_node() from void to int,
returning -ENOMEM if the allocation fails.

This fixes a build problem on x86 with CONFIG_KEXEC=y and
CONFIG_NEED_MULTIPLE_NODES=y

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sctp: Make sure N * sizeof(union sctp_addr) does not overflow.

commit 735ce972fbc8a65fb17788debd7bbe7b4383cc62 upstream

As noticed by Gabriel Campana, the kmalloc() length arg
passed in by sctp_getsockopt_local_addrs_old() can overflow
if ->addr_num is large enough.

Therefore, enforce an appropriate limit.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Reinstate ZERO_PAGE optimization in 'get_user_pages()' and fix XIP

commit 89f5b7da2a6bad2e84670422ab8192382a5aeb9f upstream

KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit
557ed1fa2620dc119adb86b34c614e152a629a80 ("remove ZERO_PAGE") removed
the ZERO_PAGE from the VM mappings, any users of get_user_pages() will
generally now populate the VM with real empty pages needlessly.

We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but
since fault handling no longer uses ZERO_PAGE for new anonymous pages,
we now need to handle that special case in follow_page() instead.

In particular, the removal of ZERO_PAGE effectively removed the core
file writing optimization where we would skip writing pages that had not
been populated at all, and increased memory pressure a lot by allocating
all those useless newly zeroed pages.

This reinstates the optimization by making the unmapped PTE case the
same as for a non-existent page table, which already did this correctly.

While at it, this also fixes the XIP case for follow_page(), where the
caller could not differentiate between the case of a page that simply
could not be used (because it had no "struct page" associated with it)
and a page that just wasn't mapped.

We do that by simply returning an error pointer for pages that could not
be turned into a "struct page *". The error is arbitrarily picked to be
EFAULT, since that was what get_user_pages() already used for the
equivalent IO-mapped page case.

[ Also removed an impossible test for pte_offset_map_lock() failing:
that's not how that function works ]

Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

atl1: relax eeprom mac address error check

upstream commit: 58c7821c4264a7ddd6f0c31c5caaf393b3897f10

The atl1 driver tries to determine the MAC address thusly:

- If an EEPROM exists, read the MAC address from EEPROM and
  validate it.
- If an EEPROM doesn't exist, try to read a MAC address from
  SPI flash.
- If that fails, try to read a MAC address directly from the
  MAC Station Address register.
- If that fails, assign a random MAC address provided by the
  kernel.

We now have a report of a system fitted with an EEPROM containing all
zeros where we expect the MAC address to be, and we currently handle
this as an error condition.  Turns out, on this system the BIOS writes
a valid MAC address to the NIC's MAC Station Address register, but we
never try to read it because we return an error when we find the all-
zeros address in EEPROM.

This patch relaxes the error check and continues looking for a MAC
address even if it finds an illegal one in EEPROM.

http://ubuntuforums.org/showthread.php?t=562617

[jacliburn@bellsouth.net: backport to 2.6.25.7]

Signed-off-by: Radu Cristescu <advantis@gmx.net>
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.25.8

x86: disable mwait for AMD family 10H/11H CPUs

back-ported from upstream commit e9623b35599fcdbc00c16535cbefbb4d5578f4ab by Vegard Nossum

The previous revert of 0c07ee38c9d4eb081758f5ad14bbffa7197e1aec left
out the mwait disable condition for AMD family 10H/11H CPUs.

Andreas Herrman said:

It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
depend on a clock divisor and current Pstate of the core.

If all cores of a processor are in halt state (C1) the processor can
enter the C1E (C1 enhanced) state. If mwait is used this will never
happen.

Thus HLT saves more power than MWAIT here.

It might be best to switch off the mwait flag for these AMD CPU
families like it was introduced with commit
f039b754714a422959027cb18bb33760eb8153f0 (x86: Don't use MWAIT on AMD
Family 10)

Re-add the AMD families 10H/11H check and disable the mwait usage for
those.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: remove mwait capability C-state check

back-ported from upstream commit a738d897b7b03b83488ae74a9bc03d26a2875dc6 by Vegard Nossum

Vegard Nossum reports:

| powertop shows between 200-400 wakeups/second with the description
| "<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g.
| I need to run two busy-loops on my 2-CPU system for this to show up).
|
| The bisect resulted in this commit:
|
| commit 0c07ee38c9d4eb081758f5ad14bbffa7197e1aec
| Date: Wed Jan 30 13:33:16 2008 +0100
|
| x86: use the correct cpuid method to detect MWAIT support for C states

remove the functional effects of this patch and make mwait unconditional.

A future patch will turn off mwait on specific CPUs where that causes
power to be wasted.

Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

nf_conntrack_h323: fix memory leak in module initialization error path

netfilter: nf_conntrack_h323: fix memory leak in module initialization error path

Upstream commit 8a548868db62422113104ebc658065e3fe976951

Properly free h323_buffer when helper registration fails.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

nf_conntrack_h323: fix module unload crash

netfilter: nf_conntrack_h323: fix module unload crash

Upstream commit a56b8f81580761c65e4d8d0c04ac1cb7a788bdf1

The H.245 helper is not registered/unregistered, but assigned to
connections manually from the Q.931 helper. This means on unload
existing expectations and connections using the helper are not
cleaned up, leading to the following oops on module unload:

CPU 0 Unable to handle kernel paging request at virtual address c00a6828, epc == 802224dc, ra == 801d4e7c
Oops[#1]:
Cpu 0
$ 0   : 00000000 00000000 00000004 c00a67f0
$ 4   : 802a5ad0 81657e00 00000000 00000000
$ 8   : 00000008 801461c8 00000000 80570050
$12   : 819b0280 819b04b0 00000006 00000000
$16   : 802a5a60 80000000 80b46000 80321010
$20   : 00000000 00000004 802a5ad0 00000001
$24   : 00000000 802257a8
$28   : 802a4000 802a59e8 00000004 801d4e7c
Hi    : 0000000b
Lo    : 00506320
epc   : 802224dc ip_conntrack_help+0x38/0x74     Tainted: P
ra    : 801d4e7c nf_iterate+0xbc/0x130
Status: 1000f403    KERNEL EXL IE
Cause : 00800008
BadVA : c00a6828
PrId  : 00019374
Modules linked in: ip_nat_pptp ip_conntrack_pptp ath_pktlog wlan_acl wlan_wep wlan_tkip wlan_ccmp wlan_xauth ath_pci ath_dev ath_dfs ath_rate_atheros wlan ath_hal ip_nat_tftp ip_conntrack_tftp ip_nat_ftp ip_conntrack_ftp pppoe ppp_async ppp_deflate ppp_mppe pppox ppp_generic slhc
Process swapper (pid: 0, threadinfo=802a4000, task=802a6000)
Stack : 801e7d98 00000004 802a5a60 80000000 801d4e7c 801d4e7c 802a5ad0 00000004
        00000000 00000000 801e7d98 00000000 00000004 802a5ad0 00000000 00000010
        801e7d98 80b46000 802a5a60 80320000 80000000 801d4f8c 802a5b00 00000002
        80063834 00000000 80b46000 802a5a60 801e7d98 80000000 802ba854 00000000
        81a02180 80b7e260 81a021b0 819b0000 819b0000 80570056 00000000 00000001
        ...
Call Trace:
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4f8c>] nf_hook_slow+0x9c/0x1a4

One way to fix this would be to split helper cleanup from the unregistration
function and invoke it for the H.245 helper, but since ctnetlink needs to be
able to find the helper for synchonization purposes, a better fix is to
register it normally and make sure its not assigned to connections during
helper lookup. The missing l3num initialization is enough for this, this
patch changes it to use AF_UNSPEC to make it more explicit though.

Reported-by: liannan <liannan@twsz.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()

netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()

Upstream commit ceeff7541e5a4ba8e8d97ffbae32b3f283cb7a3f

When creation of a new conntrack entry in ctnetlink fails after having
set up the NAT mappings, the conntrack has an extension area allocated
that is not getting properly destroyed when freeing the conntrack again.
This means the NAT extension is still in the bysource hash, causing a
crash when walking over the hash chain the next time:

BUG: unable to handle kernel paging request at 00120fbd
IP: [<c03d394b>] nf_nat_setup_info+0x221/0x58a
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP

Pid: 2795, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c03d394b>] EFLAGS: 00010206 CPU: 1
EIP is at nf_nat_setup_info+0x221/0x58a
EAX: 00120fbd EBX: 00120fbd ECX: 00000001 EDX: 00000000
ESI: 0000019e EDI: e853bbb4 EBP: e853bbc8 ESP: e853bb78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process conntrackd (pid: 2795, ti=e853a000 task=f7de10f0 task.ti=e853a000)
Stack: 00000000 e853bc2c e85672ec 00000008 c0561084 63c1db4a 00000000 00000000
00000000 0002e109 61d2b1c3 00000000 00000000 00000000 01114e22 61d2b1c3
00000000 00000000 f7444674 e853bc04 00000008 c038e728 0000000a f7444674
Call Trace:
[<c038e728>] nla_parse+0x5c/0xb0
[<c0397c1b>] ctnetlink_change_status+0x190/0x1c6
[<c0397eec>] ctnetlink_new_conntrack+0x189/0x61f
[<c0119aee>] update_curr+0x3d/0x52
[<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
[<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
[<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
[<c038d2ce>] netlink_rcv_skb+0x2d/0x71
[<c0390205>] nfnetlink_rcv+0x19/0x24
[<c038d0f5>] netlink_unicast+0x1b3/0x216
...

Move invocation of the extension destructors to nf_conntrack_free()
to fix this problem.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10875

Reported-and-Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SCSI: sr: fix corrupt CD data after media change and delay

commit: d1daeabf0da5bfa1943272ce508e2ba785730bf0 upstream

Reported-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
If you delay 30s or more before mounting a CD after inserting it then
the kernel has the wrong value for the CD size.

http://marc.info/?t=121276133000001

The problem is in sr_test_unit_ready(): the function eats unit
attentions without adjusting the sdev->changed status. This means
that when the CD signals changed media via unit attention, we can
ignore it. Fix by making sr_test_unit_ready() adjust the changed
status.

Reported-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ACPICA: Ignore ACPI table signature for Load() operator

upstream bc45b1d39a925b56796bebf8a397a0491489d85c

Without this patch booting with acpi_osi="!Windows 2006" is required
for several machines to function properly with cpufreq
due to failure to load a Vista specific table with a bad signature.

Only "SSDT" is acceptable to the ACPI spec, but tables are
seen with OEMx and null sigs. Therefore, signature validation
is worthless. Apparently MS ACPI accepts such signatures, ACPICA
must be compatible.

http://bugzilla.kernel.org/show_bug.cgi?id=9919
http://bugzilla.kernel.org/show_bug.cgi?id=10383
http://bugzilla.kernel.org/show_bug.cgi?id=10454
https://bugzilla.novell.com/show_bug.cgi?id=396311

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

scsi_host regression: fix scsi host leak

The patch is upstream as commit 3ed7897242b7efe977f3a8d06d4e5a4ebe28b10e

A different backport is necessary because of the class_device to device
conversion post 2.6.25.

commit 9c7701088a61cc0cf8a6e1c68d1e74e3cc2ee0b7
Author: Dave Young <hidave.darkstar@gmail.com>
Date:   Tue Jan 22 14:01:34 2008 +0800

    scsi: use class iteration api

Isn't a correct replacement for the original hand rolled host
lookup. The problem is that class_find_child would get a reference to
the host's class device which is never released.  Since the host class
device holds a reference to the host gendev, the host can never be
freed.

In 2.6.25 we started using class_find_device, and this function also
gets a reference to the device, so we end up with an extra ref
and the host will not get released.

This patch adds a class_put_device to balance the class_find_device()
get. I kept the scsi_host_get in scsi_host_lookup, because the target
layer is using scsi_host_lookup and it looks like it needs the SHOST_DEL
check.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b43: Fix possible NULL pointer dereference in DMA code

a cut-down version of commit 028118a5f09a9c807e6b43e2231efdff9f224c74 upstream

This fixes a possible NULL pointer dereference in an error path of the
DMA allocation error checking code. In case the DMA allocation address is invalid,
the dev pointer is dereferenced for unmapping of the buffer.

Reported-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b43: Fix noise calculation WARN_ON

commit 98a3b2fe435ae76170936c14f5c9e6a87548e3ef upstream.

This removes a WARN_ON that is responsible for the following koops:
http://www.kerneloops.org/searchweek.php?search=b43_generate_noise_sample

The comment in the patch describes why it's safe to simply remove
the check.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

virtio_net: Fix skb->csum_start computation

commit 23cde76d801246a702e7a84c3fe3d655b35c89a1 upstream.

hdr->csum_start is the offset from the start of the ethernet
header to the transport layer checksum field. skb->csum_start
is the offset from skb->head.

skb_partial_csum_set() assumes that skb->data points to the
ethernet header - i.e. it computes skb->csum_start by adding
the headroom to hdr->csum_start.

Since eth_type_trans() skb_pull()s the ethernet header,
skb_partial_csum_set() should be called before
eth_type_trans().

(Without this patch, GSO packets from a guest to the world outside the
host are corrupted).

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

opti621: remove DMA support

commit f361037631ba547ea88adf8d2359d810c1b2605a upstream

These controllers don't support DMA.

Based on a bugreport from Juergen Kosel & inspired by pata_opti.c code.

Tested-by: Juergen Kosel <juergen.kosel@gmx.de>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

opti621: disable read prefetch

commit 62128b2ca812c1266f4ff7bac068bf0b626c6179 upstream

This fixes 2.6.25 regression (kernel.org bugzilla bug #10723) caused by:

commit 912fb29a36a7269ac1c4a4df45bc0ac1d2637972
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Fri Oct 19 00:30:11 2007 +0200

opti621: always tune PIO
...

Based on a bugreport from Juergen Kosel & inspired by pata_opti.c code.

Bisected-by: Juergen Kosel <juergen.kosel@gmx.de>
Tested-by: Juergen Kosel <juergen.kosel@gmx.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Fix tty speed handling on 8250

commit e991a2bd4fa0b2f475b67dfe8f33e8ecbdcbb40b upstream.

We try and write the correct speed back but the serial midlayer already
mangles the speed on us and that means if we request B0 we report back B9600
when we should not. For now we'll hack around this in the drivers and serial
code, pending a better long term solution.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86-64: Fix "bytes left to copy" return value for copy_from_user()

commit 42a886af728c089df8da1b0017b0e7e6c81b5335 upstream

Most users by far do not care about the exact return value (they only
really care about whether the copy succeeded in its entirety or not),
but a few special core routines actually care deeply about exactly how
many bytes were copied from user space.

And the unrolled versions of the x86-64 user copy routines would
sometimes report that it had copied more bytes than it actually had.

Very few uses actually have partial copies to begin with, but to make
this bug even harder to trigger, most x86 CPU's use the "rep string"
instructions for normal user copies, and that version didn't have this
issue.

To make it even harder to hit, the one user of this that really cared
about the return value (and used the uncached version of the copy that
doesn't use the "rep string" instructions) was the generic write
routine, which pre-populated its source, once more hiding the problem by
avoiding the exception case that triggers the bug.

In other words, very special thanks to Bron Gondwana who not only
triggered this, but created a test-program to show it, and bisected the
behavior down to commit 08291429cfa6258c4cd95d8833beb40f828b194e ("mm:
fix pagecache write deadlocks") which changed the access pattern just
enough that you can now trigger it with 'writev()' with multiple
iovec's.

That commit itself was not the cause of the bug, it just allowed all the
stars to align just right that you could trigger the problem.

[ Side note: this is just the minimal fix to make the copy routines
  (with __copy_from_user_inatomic_nocache as the particular version that
  was involved in showing this) have the right return values.

  We really should improve on the exceptional case further - to make the
  copy do a byte-accurate copy up to the exact page limit that causes it
  to fail.  As it is, the callers have to do extra work to handle the
  limit case gracefully. ]

Reported-by: Bron Gondwana <brong@fastmail.fm>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.25.7

mac80211: send association event on IBSS create

patch 507b06d0622480f8026d49a94f86068bb0fd6ed6 upstream

Otherwise userspace has no idea the IBSS creation succeeded.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: fix recursive dependencies

commit 823c248e7cc75b4f22da914b01f8e5433cff197e in mainline

The proper dependency check uncovered a few dependency problems,
the subarchitecture used a mixture of selects and depends on SMP
and PCI dependency was messed up.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>

bttv: Fix a deadlock in the bttv driver

commit 81b2dbcad86732ffc02bad87aa25c4651199fc77 in mainline.

vidiocgmbuf() does this:
        mutex_lock(&fh->cap.vb_lock);
        retval = videobuf_mmap_setup(&fh->cap, gbuffers, gbufsize,
                                     V4L2_MEMORY_MMAP);

and videobuf_mmap_setup() then just does
        mutex_lock(&q->vb_lock);
        ret = __videobuf_mmap_setup(q, bcount, bsize, memory);
        mutex_unlock(&q->vb_lock);

which is an obvious double-take deadlock.

This patch fixes this by having vidiocgmbuf() just call the
__videobuf_mmap_setup function instead.

Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Reported-by: Koos Vriezen <koos.vriezen@gmail.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LIST

commit 73531905ed53576d9e8707659a761e7046a60497 in mainline.

init/Kconfig contains a list of configs that are searched
for if 'make *config' are used with no .config present.
Extend this list to look at the config identified by
ARCH_DEFCONFIG.

With this change we now try the defconfig targets last.

This fixes a regression reported
by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

serial: fix enable_irq_wake/disable_irq_wake imbalance in serial_core.c

commit 03a74dcc7eebe6edd778317e82fafdf71e68488c in mainline.

enable_irq_wake() and disable_irq_wake() need to be balanced. However,
serial_core.c calls these for different conditions during the suspend and
resume functions...

This is causing a regular WARN_ON() as found at
http://www.kerneloops.org/search.php?search=set_irq_wake

This patch makes the conditions for triggering the _wake enable/disable
sequence identical.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

CPUFREQ: Fix format string bug.

commit 326f6a5c9c9e1a62aec37bdc0c3f8d53adabe77b upstream

Format string bug. Not exploitable, as this is only writable by root,
but worth fixing all the same.

From: Chris Wright <chrisw@sous-sol.org>
Spotted-by: Ilja van Sprundel <ilja@netric.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cifs: fix oops on mount when CONFIG_CIFS_DFS_UPCALL is enabled

simple "mount -t cifs //xxx /mnt" oopsed on strlen of options
http://kerneloops.org/guilty.php?guilty=cifs_get_sb&version=2.6.25-release&start=16711 \
68&end=1703935&class=oops

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

m68k: Add ext2_find_{first,next}_bit() for ext4

commit 69c5ddf58a03da3686691ad2f293bc79fd977c10 upstream

Add ext2_find_{first,next}_bit(), which are needed for ext4.
They're derived out of the ext2_find_next_zero_bit found in the same file.
Compile tested with crosstools

[Reworked to preserve all symmetry with ext2_find_{first,next}_zero_bit()]

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=10393

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IB/umem: Avoid sign problems when demoting npages to integer

commit 8079ffa0e18baaf2940e52e0c118eef420a473a4 upstream

On a 64-bit architecture, if ib_umem_get() is called with a size value
that is so big that npages is negative when cast to int, then the
length of the page list passed to get_user_pages(), namely

min_t(int, npages, PAGE_SIZE / sizeof (struct page *))

will be negative, and get_user_pages() will immediately return 0 (at
least since 900cf086, "Be more robust about bad arguments in
get_user_pages()"). This leads to an infinite loop in ib_umem_get(),
since the code boils down to:

while (npages) {
ret = get_user_pages(...);
npages -= ret;
}

Fix this by taking the minimum as unsigned longs, so that the value of
npages is never truncated.

The impact of this bug isn't too severe, since the value of npages is
checked against RLIMIT_MEMLOCK, so a process would need to have an
astronomical limit or have CAP_IPC_LOCK to be able to trigger this,
and such a process could already cause lots of mischief. But it does
let buggy userspace code cause a kernel lock-up; for example I hit
this with code that passes a negative value into a memory registartion
function where it is promoted to a huge u64 value.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

tcp: Fix inconsistency source (CA_Open only when !tcp_left_out(tp))

[ upstream commit: 8aca6cb1179ed9bef9351028c8d8af852903eae2 ]

It is possible that this skip path causes TCP to end up into an
invalid state where ca_state was left to CA_Open while some
segments already came into sacked_out. If next valid ACK doesn't
contain new SACK information TCP fails to enter into
tcp_fastretrans_alert(). Thus at least high_seq is set
incorrectly to a too high seqno because some new data segments
could be sent in between (and also, limited transmit is not
being correctly invoked there). Reordering in both directions
can easily cause this situation to occur.

I guess we would want to use tcp_moderate_cwnd(tp) there as well
as it may be possible to use this to trigger oversized burst to
network by sending an old ACK with huge amount of SACK info, but
I'm a bit unsure about its effects (mainly to FlightSize), so to
be on the safe side I just currently fixed it minimally to keep
TCP's state consistent (obviously, such nasty ACKs have been
possible this far). Though it seems that FlightSize is already
underestimated by some amount, so probably on the long term we
might want to trigger recovery there too, if appropriate, to make
FlightSize calculation to resemble reality at the time when the
losses where discovered (but such change scares me too much now
and requires some more thinking anyway how to do that as it
likely involves some code shuffling).

This bug was found by Brian Vowell while running my TCP debug
patch to find cause of another TCP issue (fackets_out
miscount).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

forcedeth: msi interrupts

commit 4db0ee176e256444695ee2d7b004552e82fec987 upstream

Add a workaround for lost MSI interrupts. There is a race condition in
the HW in which future interrupts could be missed. The workaround is to
toggle the MSI irq mask.

Added cleanup based on comments from Andrew Morton.

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

hgafb: resource management fix

commit 630c270183133ac25bef8c8d726ac448df9b169a upstream
Date: Thu, 12 Jun 2008 15:21:29 -0700
Subject: hgafb: resource management fix

Release ports which are requested during detection which are not freed if
there is no hga card. Otherwise there is a crash during cat /proc/ioports
command.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cciss: add new hardware support

commit 24aac480e76c6f5d1391ac05c5e9c0eb9b0cd302 upstream
Date: Thu, 12 Jun 2008 15:21:34 -0700
Subject: cciss: add new hardware support

Add support for the next generation of HP Smart Array SAS/SATA
controllers. Shipping date is late Fall 2008.

Bump the driver version to 3.6.20 to reflect the new hardware support from
patch 1 of this set.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mmc: wbsd: initialize tasklets before requesting interrupt

commit cef33400d0349fb24b6f8b7dea79b66e3144fd8b upstream

With CONFIG_DEBUG_SHIRQ set we will get an interrupt as soon as we
allocate one. Tasklets may be scheduled in the interrupt handler but they
will be initialized after the handler returns, causing a BUG() in
kernel/softirq.c when they run.

Should fix this Fedora bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=449817

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Acked-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

tcp FRTO: work-around inorder receivers

[ upstream commit: 79d44516b4b178ffb6e2159c75584cfcfc097914 ]

If receiver consumes segments successfully only in-order, FRTO
fallback to conventional recovery produces RTO loop because
FRTO's forward transmissions will always get dropped and need to
be resent, yet by default they're not marked as lost (which are
the only segments we will retransmit in CA_Loss).

Price to pay about this is occassionally unnecessarily
retransmitting the forward transmission(s). SACK blocks help
a bit to avoid this, so it's mainly a concern for NewReno case
though SACK is not fully immune either.

This change has a side-effect of fixing SACKFRTO problem where
it didn't have snd_nxt of the RTO time available anymore when
fallback become necessary (this problem would have only occured
when RTO would occur for two or more segments and ECE arrives
in step 3; no need to figure out how to fix that unless the
TODO item of selective behavior is considered in future).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Damon L. Chesser <damon@damtek.com>
Tested-by: Damon L. Chesser <damon@damtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

tcp FRTO: SACK variant is errorneously used with NewReno

[ upstream commit: 62ab22278308a40bcb7f4079e9719ab8b7fe11b5 ]

Note: there's actually another bug in FRTO's SACK variant, which
is the causing failure in NewReno case because of the error
that's fixed here. I'll fix the SACK case separately (it's
a separate bug really, though related, but in order to fix that
I need to audit tp->snd_nxt usage a bit).

There were two places where SACK variant of FRTO is getting
incorrectly used even if SACK wasn't negotiated by the TCP flow.
This leads to incorrect setting of frto_highmark with NewReno
if a previous recovery was interrupted by another RTO.

An eventual fallback to conventional recovery then incorrectly
considers one or couple of segments as forward transmissions
though they weren't, which then are not LOST marked during
fallback making them "non-retransmittable" until the next RTO.
In a bad case, those segments are really lost and are the only
one left in the window. Thus TCP needs another RTO to continue.
The next FRTO, however, could again repeat the same events
making the progress of the TCP flow extremely slow.

In order for these events to occur at all, FRTO must occur
again in FRTOs step 3 while the key segments must be lost as
well, which is not too likely in practice. It seems to most
frequently with some small devices such as network printers
that *seem* to accept TCP segments only in-order. In cases
were key segments weren't lost, things get automatically
resolved because those wrongly marked segments don't need to be
retransmitted in order to continue.

I found a reproducer after digging up relevant reports (few
reports in total, none at netdev or lkml I know of), some
cases seemed to indicate middlebox issues which seems now
to be a false assumption some people had made. Bugzilla
#10063 _might_ be related. Damon L. Chesser <damon@damtek.com>
had a reproducable case and was kind enough to tcpdump it
for me. With the tcpdump log it was quite trivial to figure
out.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

tcp FRTO: Fix fallback to conventional recovery

[ upstream commit: a1c1f281b84a751fdb5ff919da3b09df7297619f ]

It seems that commit 009a2e3e4ec ("[TCP] FRTO: Improve
interoperability with other undo_marker users") run into
another land-mine which caused fallback to conventional
recovery to break:

1. Cumulative ACK arrives after FRTO retransmission
2. tcp_try_to_open sees zero retrans_out, clears retrans_stamp
   which should be kept like in CA_Loss state it would be
3. undo_marker change allowed tcp_packet_delayed to return
   true because of the cleared retrans_stamp once FRTO is
   terminated causing LossUndo to occur, which means all loss
   markings FRTO made are reverted.

This means that the conventional recovery basically recovered
one loss per RTT, which is not that efficient. It was quite
unobvious that the undo_marker change broken something like
this, I had a quite long session to track it down because of
the non-intuitiviness of the bug (luckily I had a trivial
reproducer at hand and I was also able to learn to use kprobes
in the process as well :-)).

This together with the NewReno+FRTO fix and FRTO in-order
workaround this fixes Damon's problems, this and the first
mentioned are enough to fix Bugzilla #10063.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Damon L. Chesser <damon@damtek.com>
Tested-by: Damon L. Chesser <damon@damtek.com>
Tested-by: Sebastian Hyrwall <zibbe@cisko.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

tcp: fix skb vs fack_count out-of-sync condition

[ upstream commit: a6604471db5e7a33474a7f16c64d6b118fae3e74 ]

This bug is able to corrupt fackets_out in very rare cases.
In order for this to cause corruption:
  1) DSACK in the middle of previous SACK block must be generated.
  2) In order to take that particular branch, part or all of the
     DSACKed segment must already be SACKed so that we have that
     in cache in the first place.
  3) The new info must be top enough so that fackets_out will be
     updated on this iteration.
...then fack_count is updated while skb wasn't, then we walk again
that particular segment thus updating fack_count twice for
a single skb and finally that value is assigned to fackets_out
by tcp_sacktag_one.

It is safe to call tcp_sacktag_one just once for a segment (at
DSACK), no need to call again for plain SACK.

Potential problem of the miscount are limited to premature entry
to recovery and to inflated reordering metric (which could even
cancel each other out in the most the luckiest scenarios :-)).
Both are quite insignificant in worst case too and there exists
also code to reset them (fackets_out once sacked_out becomes zero
and reordering metric on RTO).

This has been reported by a number of people, because it occurred
quite rarely, it has been very evasive. Andy Furniss was able to
get it to occur couple of times so that a bit more info was
collected about the problem using a debug patch, though it still
required lot of checking around. Thanks also to others who have
tried to help here.

This is listed as Bugzilla #10346. The bug was introduced by
me in commit 68f8353b48 ([TCP]: Rewrite SACK block processing &
sack_recv_cache use), I probably thought back then that there's
need to scan that entry twice or didn't dare to make it go
through it just once there. Going through twice would have
required restoring fack_count after the walk but as noted above,
I chose to drop the additional walk step altogether here.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

l2tp: Fix possible oops if transmitting or receiving when tunnel goes down

[ upstream commit: 24b95685ffcdb3dc28f64b9e8af6ea3e8360fbc5 ]

Some problems have been experienced in the field which cause an oops
in the pppol2tp driver if L2TP tunnels fail while passing data.

The pppol2tp driver uses private data that is referenced via the
sk->sk_user_data of its UDP and PPPoL2TP sockets. This patch makes
sure that the driver uses sock_hold() when it holds a reference to the
sk pointer. This affects its sendmsg(), recvmsg(), getname(),
[gs]etsockopt() and ioctl() handlers.

Tested by ISP where problem was seen. System has been up 10 days with
no oops since running this patch. Without the patch, an oops would
occur every 1-2 days.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

l2tp: Fix possible WARN_ON from socket code when UDP socket is closed

[ upstream commit: 199f7d24ae59894243687a234a909f44a8724506 ]

If an L2TP daemon closes a tunnel socket while packets are queued in
the tunnel's reorder queue, a kernel warning is logged because the
socket is closed while skbs are still referencing it. The fix is to
purge the queue in the socket's release handler.

WARNING: at include/net/sock.h:351 udp_lib_unhash+0x41/0x68()
Pid: 12998, comm: openl2tpd Not tainted 2.6.25 #8
[<c0423c58>] warn_on_slowpath+0x41/0x51
[<c05d33a7>] udp_lib_unhash+0x41/0x68
[<c059424d>] sk_common_release+0x23/0x90
[<c05d16be>] udp_lib_close+0x8/0xa
[<c05d8684>] inet_release+0x42/0x48
[<c0592599>] sock_release+0x14/0x60
[<c059299f>] sock_close+0x29/0x30
[<c046ef52>] __fput+0xad/0x15b
[<c046f1d9>] fput+0x17/0x19
[<c046c8c4>] filp_close+0x50/0x5a
[<c046da06>] sys_close+0x69/0x9f
[<c04048ce>] syscall_call+0x7/0xb

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

tcp: Limit cwnd growth when deferring for GSO

[ upstream commit: 246eb2af060fc32650f07203c02bdc0456ad76c7 ]

This fixes inappropriately large cwnd growth on sender-limited flows
when GSO is enabled, limiting cwnd growth to 64k.

[ Backport to 2.6.25 by replacing sk->sk_gso_max_size with 65536 -DaveM ]

Signed-off-by: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled

[ upstream commit: ce447eb91409225f8a488f6b7b2a1bdf7b2d884f ]

This changes the logic in tcp_is_cwnd_limited() so that cwnd may grow
up to tcp_max_burst() even when sk_can_gso() is false, or when
sysctl_tcp_tso_win_divisor != 0.

Signed-off-by: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

tcp: TCP connection times out if ICMP frag needed is delayed

[ upstream commit: 7d227cd235c809c36c847d6a597956ad9e9d2bae ]

We are seeing an issue with TCP in handling an ICMP frag needed
message that is received after net.ipv4.tcp_retries1 retransmits.
The default value of retries1 is 3. So if the path mtu changes
and ICMP frag needed is lost for the first 3 retransmits or if
it gets delayed until 3 retransmits are done, TCP doesn't update
MSS correctly and continues to retransmit the orginal message
until it timesout after tcp_retries2 retransmits.

I am seeing this issue even with the latest 2.6.25.4 kernel.

In tcp_retransmit_timer(), when retransmits counter exceeds
tcp_retries1 value, the dst cache entry of the socket is reset.
At this time, if we receive an ICMP frag needed message, the
dst entry gets updated with the new MTU, but the TCP sockets
dst_cache entry remains NULL.

So the next time when we try to retransmit after the ICMP frag
needed is received, tcp_retransmit_skb() gets called. Here the
cur_mss value is calculated at the start of the routine with
a NULL sk_dst_cache. Instead we should call tcp_current_mss after
the rebuild_header that caches the dst entry with the updated mtu.
Also the rebuild_header should be called before tcp_fragment
so that skb is fragmented if the mss goes down.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

vlan: Correctly handle device notifications for layered VLAN devices

[ upstream commit: 81d85346b3fcd8b3167eac8b5fb415a210bd4345 ]

Commit 30688a9 ([VLAN]: Handle vlan devices net namespace changing)
changed the device notifier to special-case notifications for VLAN
devices, effectively disabling state propagation to underlying VLAN
devices. This is needed for layered VLANs though, so restore the
original behaviour.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

l2tp: avoid skb truesize bug if headroom is increased

[ upstream commit: 090c48d3dd5ea90b37350334aaed9a93b0c1e0a1 ]

A user reported seeing occasional bugs such as the following when
using the L2TP driver.

SKB BUG: Invalid truesize (272) len=72, sizeof(sk_buff)=208

When L2TP adds its header in the transmit path, it might need to
increase the headroom of the skb. In some cases, the increased
headroom trips a kernel bug when the skb is freed because the skb has
grown beyond its truesize value. The fix is to increase the truesize
by the amount of headroom added, after orphaning the skb.

While here, fix a misleading comment.

Thanks to Iouri Kharon <bc-info@styx.cabel.net> for the initial
report and testing the fix.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

netlink: Fix nla_parse_nested_compat() to call nla_parse() directly

[ upstream commit: b9a2f2e450b0f770bb4347ae8d48eb2dea701e24 ]

The purpose of nla_parse_nested_compat() is to parse attributes which
contain a struct followed by a stream of nested attributes. So far,
it called nla_parse_nested() to parse the stream of nested attributes
which was wrong, as nla_parse_nested() expects a container attribute
as data which holds the attribute stream. It needs to call
nla_parse() directly while pointing at the next possible alignment
point after the struct in the beginning of the attribute.

With this patch, I can no longer reproduce the reported leftover
warnings.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

ipsec: Use the correct ip_local_out function

[ upstream commit: 1ac06e0306d0192a7a4d9ea1c9e06d355ce7e7d3 ]

Because the IPsec output function xfrm_output_resume does its
own dst_output call it should always call __ip_local_output
instead of ip_local_output as the latter may invoke dst_output
directly. Otherwise the return values from nf_hook and dst_output
may clash as they both use the value 1 but for different purposes.

When that clash occurs this can cause a packet to be used after
it has been freed which usually leads to a crash. Because the
offending value is only returned from dst_output with qdiscs
such as HTB, this bug is normally not visible.

Thanks to Marco Berizzi for his perseverance in tracking this
down.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

net_sched: cls_api: fix return value for non-existant classifiers

[ upstream commit: f2df824948d559ea818e03486a8583e42ea6ab37 ]

cls_api should return ENOENT when the requested classifier doesn't
exist.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>