git.karo-electronics.de Git - karo-tx-linux.git/log

CONNECTOR: Don't touch queue dev after decrement of ref count.

[CONNECTOR]: Don't touch queue dev after decrement of ref count.

[ Upstream commit: cf585ae8ae9ac7287a6d078425ea32f22bf7f1f7 ]

cn_queue_free_callback() will touch 'dev'(i.e. cbq->pdev), so it
should be called before atomic_dec(&dev->refcnt).

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

INET: Fix netdev renaming and inet address labels

[INET]: Fix netdev renaming and inet address labels

[ Upstream commit: 44344b2a85f03326c7047a8c861b0c625c674839 ]

When re-naming an interface, the previous secondary address
labels get lost e.g.

  $> brctl addbr foo
  $> ip addr add 192.168.0.1 dev foo
  $> ip addr add 192.168.0.2 dev foo label foo:00
  $> ip addr show dev foo | grep inet
    inet 192.168.0.1/32 scope global foo
    inet 192.168.0.2/32 scope global foo:00
  $> ip link set foo name bar
  $> ip addr show dev bar | grep inet
    inet 192.168.0.1/32 scope global bar
    inet 192.168.0.2/32 scope global bar:2

Turns out to be a simple thinko in inetdev_changename() - clearly we
want to look at the address label, rather than the device name, for
a suffix to retain.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IRDA: irda_create() nuke user triggable printk

[IRDA]: irda_create() nuke user triggable printk

[ Upstream commit: 9e8d6f8959c356d8294d45f11231331c3e1bcae6 ]

easy to trigger as user with sfuzz.

irda_create() is quiet on unknown sock->type,
match this behaviour for SOCK_DGRAM unknown protocol

Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

NET: kaweth was forgotten in msec switchover of usb_start_wait_urb

[NET]: kaweth was forgotten in msec switchover of usb_start_wait_urb

[ Upstream commit: 2b2b2e35b71e5be8bc06cc0ff38df15dfedda19b ]

Back in 2.6.12-pre, usb_start_wait_urb was switched over to take
milliseconds instead of jiffies. kaweth.c was never updated to match.

Signed-off-by: Russ Dill <Russ.Dill@asu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

NET: mcs7830 passes msecs instead of jiffies to usb_control_msg

[NET]: mcs7830 passes msecs instead of jiffies to usb_control_msg

[ Upstream commit 1d39da3dcaad4231f0fa75024b1d6d710a2ced74 ]

usb_control_msg was changed long ago (2.6.12-pre) to take milliseconds
instead of jiffies. Oddly, mcs7830 wasn't added until 2.6.19-rc3.

Signed-off-by: Russ Dill <Russ.Dill@asu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

X25: Add missing x25_neigh_put

[X25]: Add missing x25_neigh_put

[ Upstream commit: 76975f8a3186dae501584d0155ea410464f62815 ]

The function x25_get_neigh increments a reference count.  At the point of
the second goto out, the result of calling x25_get_neigh is only stored in
a local variable, and thus no one outside the function will be able to
decrease the reference count.  Thus, x25_neigh_put should be called before
the return in this case.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>

@@
type T,T1,T2;
identifier E;
statement S;
expression x1,x2,x3;
int ret;
@@

  T E;
  ...
* if ((E = x25_get_neigh(...)) == NULL)
  S
  ... when != x25_neigh_put(...,(T1)E,...)
      when != if (E != NULL) { ... x25_neigh_put(...,(T1)E,...); ...}
      when != x1 = (T1)E
      when != E = x3;
      when any
  if (...) {
    ... when != x25_neigh_put(...,(T2)E,...)
        when != if (E != NULL) { ... x25_neigh_put(...,(T2)E,...); ...}
        when != x2 = (T2)E
(
*   return;
|
*   return ret;
)
  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

POWERPC: Change fallocate to match unistd.h on powerpc

patch f2205fbb5a8933514fd343cc329df631802b4543 in mainline.

Fix the fallocate system call on powerpc to match its unistd.h.

This implies none of these system calls are currently working with the
unistd.h sys call values:
fallocate
signalfd
timerfd
eventfd
sync_file_range2

Signed-off-by: Patrick Mansfield <patmans@us.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sky2: RX lockup fix

Backport commit 798fdd07fcc131f396e521febb4a7d42559bf4b5

I'm using a Marvell 88E8062 on a custom PPC64 blade and ran into RX
lockups while validating the sky2 driver.  The receive MAC FIFO would
become stuck during testing with high traffic.  One port of the 88E8062
would lockup, while the other port remained functional.  Re-inserting
the sky2 module would not fix the problem - only a power cycle would.

I looked over Marvell's most recent sk98lin driver and it looks like
they had a "workaround" for the Yukon XL that the sky2 doesn't have yet.
The sk98lin driver disables the RX MAC FIFO flush feature for all
revisions of the Yukon XL.

According to skgeinit.c of the sk98lin driver, "Flushing must be enabled
(needed for ASF see dev. #4.29), but the flushing mask should be
disabled (see dev. #4.115)".  Nice. I implemented this same change in
the sky2 driver and verified that the RX lockup I was seeing was
resolved.

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sky2: disable rx checksum on Yukon XL

Backport of 8b31cfbcd1b54362ef06c85beb40e65a349169a2

The Marvell Yukon XL chipset appears to have a hardware glitch
where it will repeat the checksum of the last packet. Of course, this is
timing sensitive and only happens sometimes...

More info: http://bugzilla.kernel.org/show_bug.cgi?id=9381

As a workaround just disable hardware checksumming by default on
this chip version. The earlier workaround for PCIX, dual port
was also on Yukon XL so don't need to disable checksumming there.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IPV4 raw: Strengthen check on validity of iph->ihl

[IPV4] raw: Strengthen check on validity of iph->ihl

[ Upstream commit: f844c74fe07321953e2dd227fe35280075f18f60 ]

We currently check that iph->ihl is bounded by the real length and that
the real length is greater than the minimum IP header length. However,
we did not check the caes where iph->ihl is less than the minimum IP
header length.

This breaks because some ip_fast_csum implementations assume that which
is quite reasonable.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

tty: fix logic change introduced by wait_event_interruptible_timeout()

patch db99247ac68fc352100090ad7704fb5efb9327b6 in mainline.

Commit 5a52bd4a2dcb570333ce6fe2e16cd311650dbdc8 introduced a subtle logic
change in tty_wait_until_sent().  The original version would only error out
of the 'do { ...  } while (timeout)' loop if signal_pending() evaluated to
true; a timeout or break due to an empty buffer would fall out of the loop
and into the tty->driver->wait_until_sent handling.  The current
implementation will error out on either a pending signal or an empty
buffer, falling through to the tty->driver->wait_until_sent handling only
on a timeout.

The ->wait_until_sent() will not be reached if the buffer empties before
timeout jiffies have elapsed.  This behavior differs from that prior to commit
5a52bd4a2dcb570333ce6fe2e16cd311650dbdc8.

I turned this up while using a little serial download utility to bootstrap an
ARM-based eval board.  The util worked fine on 2.6.22.x, but consistently
failed on 2.6.23.x.  Once I'd determined that, I narrowed things down with git
bisect, and found the above difference in logic in tty_wait_until_sent() by
inspection.

This change reverts the logic flow in tty_wait_until_sent() to match that
prior to the aforementioned commit.

Signed-off-by: Cory T. Tusar <ctusar@videon-central.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

VLAN: Lost rtnl_unlock() in vlan_ioctl()

[VLAN]: Lost rtnl_unlock() in vlan_ioctl()

[ Upstream commit: e35de02615f97b785dc6f73cba421cea06bcbd10 ]

The SET_VLAN_NAME_TYPE_CMD command w/o CAP_NET_ADMIN capability
doesn't release the rtnl lock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IPSEC: Fix potential dst leak in xfrm_lookup

[IPSEC]: Fix potential dst leak in xfrm_lookup

[ Upstream commit: 75b8c133267053c9986a7c8db5131f0e7349e806 ]

If we get an error during the actual policy lookup we don't free the
original dst while the caller expects us to always free the original
dst in case of error.

This patch fixes that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SPARC64: Fix two kernel linear mapping setup bugs.

[SPARC64]: Fix two kernel linear mapping setup bugs.

[ Upstream commit: 8f361453d8e9a67c85b2cf9b93c642c2d8fe0462 ]

This was caught and identified by Greg Onufer.

Since we setup the 256M/4M bitmap table after taking over the trap
table, it's possible for some 4M mapping to get loaded in the TLB
beforhand which later will be 256M mappings.

This can cause illegal TLB multiple-match conditions. Fix this by
setting up the bitmap before we take over the trap table.

Next, __flush_tlb_all() was not doing anything on hypervisor
platforms. Fix by adding sun4v_mmu_demap_all() and calling it.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SPARC64: Fix memory controller register access when non-SMP.

[SPARC64]: Fix memory controller register access when non-SMP.

[ Upstream commit: b332b8bc9c67165eabdfc7d10b4a2e4cc9f937d0 ]

get_cpu() always returns zero on non-SMP builds, but we
really want the physical cpu number in this code in order
to do the right thing.

Based upon a non-SMP kernel boot failure report from Bernd Zeimetz.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ACPI: thinkpad-acpi: fix lenovo keymap for brightness

upstream commit 56a185b43be05e48da7428e6a1d3e2585b232b1d

Starting in 2.6.23...

Several reports from X60 users complained that the default Lenovo keymap
issuing EV_KEY KEY_BRIGHTNESS_UP/DOWN input events caused major issues when
the proper brightness support through ACPI video.c was loaded.

Therefore, remove the generation of these events by default, which is the
right thing for T60, X60, R60, T61, X61 and R61 with their latest BIOSes.

Distros that want to misuse these events into OSD reporting (which requires
an ugly hack from hell in HAL) are welcome to set up the key map they need
through HAL. That way, we don't break everyone else's systems.

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ACPI: video_device_list corruption

The ->cap fields of struct acpi_video_device and struct acpi_video_bus
are 1B each, not 4B. The oversized memset()'s corrupted the subsequent
list_head fields. This resulted in silent corruption without
CONFIG_DEBUG_LIST and BUG's with it. This patch uses sizeof() to pass
the proper bounds to the memset() calls and thereby correct the bugs.

upstream commit 98934def70b48dac74fac3738b78ab2d1a28edda

Signed-off-by: William Irwin <wli@holomorphy.com>
Acked-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm crypt: use bio_add_page

patch 91e106259214b40e992a58fb9417da46868e19b2 in mainline.

Fix possible max_phys_segments violation in cloned dm-crypt bio.

In write operation dm-crypt needs to allocate new bio request
and run crypto operation on this clone. Cloned request has always
the same size, but number of physical segments can be increased
and violate max_phys_segments restriction.

This can lead to data corruption and serious hardware malfunction.
This was observed when using XFS over dm-crypt and at least
two HBA controller drivers (arcmsr, cciss) recently.

Fix it by using bio_add_page() call (which tests for other
restrictions too) instead of constructing own biovec.

All versions of dm-crypt are affected by this bug.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm crypt: fix write endio

patch adfe47702c4726b3e045f9f83178def02833be4c in mainline.

Fix BIO_UPTODATE test for write io.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm: table detect io beyond device

Patch 512875bd9661368da6f993205a61213b79ba1df0 in mainline.

This patch fixes a panic on shrinking a DM device if there is
outstanding I/O to the part of the device that is being removed.
(Normally this doesn't happen - a filesystem would be resized first,
for example.)

The bug is that __clone_and_map() assumes dm_table_find_target()
always returns a valid pointer. It may fail if a bio arrives from the
block layer but its target sector is no longer included in the DM
btree.

This patch appends an empty entry to table->targets[] which will
be returned by a lookup beyond the end of the device.

After calling dm_table_find_target(), __clone_and_map() and target_message()
check for this condition using
dm_target_is_valid().

Sample test script to trigger oops:

#!/bin/bash

FILE=$(mktemp)
LODEV=$(losetup -f)
MAP=$(basename ${FILE})
SIZE=4M

dd if=/dev/zero of=${FILE} bs=${SIZE} count=1
losetup ${LODEV} ${FILE}

echo "0 $(blockdev --getsz ${LODEV}) linear ${LODEV} 0" |dmsetup create ${MAP}
dmsetup suspend ${MAP}
echo "0 1 linear ${LODEV} 0" |dmsetup load ${MAP}
dd if=/dev/zero of=/dev/mapper/${MAP} bs=${SIZE} count=1 &
echo "Wait til dd push some I/O"
sleep 5
dmsetup resume ${MAP}

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SPARC64: Fix sparc64 cpu cross call hangs.

[SPARC64]: Fix endless loop in cheetah_xcall_deliver().

[ Upsteam commit: 0de56d1ab83323d604d95ca193dcbd28388dbabb ]

We need to mask out the proper bits when testing the dispatch status
register else we can see unrelated NACK bits from previous cross call
sends.

Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 2.6.23.14

Use access mode instead of open flags to determine needed permissions (CVE-2008-0001)

patch 974a9f0b47da74e28f68b9c8645c3786aa5ace1a in mainline

Way back when (in commit 834f2a4a1554dc5b2598038b3fe8703defcbe467, aka
"VFS: Allow the filesystem to return a full file pointer on open intent"
to be exact), Trond changed the open logic to keep track of the original
flags to a file open, in order to pass down the the intent of a dentry
lookup to the low-level filesystem.

However, when doing that reorganization, it changed the meaning of
namei_flags, and thus inadvertently changed the test of access mode for
directories (and RO filesystem) to use the wrong flag. So fix those
test back to use access mode ("acc_mode") rather than the open flag
("flag").

Issue noticed by Bill Roman at Datalight.

Reported-and-tested-by: Bill Roman <bill.roman@datalight.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.23.13

hwmon: (w83627ehf) Be more careful when changing VID input level

Already in Linus' tree:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=58e6e78119da2bdade9f6f588155f0320072b76b

Fix for:
http://bugzilla.kernel.org/show_bug.cgi?id=9634

The VID input level change has been reported to cause trouble. Be more
careful in this respect:
* Only change the level on the W83627EHF/EHG. The W83627DHG is more
complex in this respect.
* Don't change the level if the VID pins are in output mode.
* Only set the level to TTL if VRM 9.x is used.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.23.12

Revert "PNP: increase the maximum number of resources"

This reverts commit fc175adc1c935ea8679d76a78d7a58df34af16eb.

There have been reports that it causes problems:
http://bugzilla.kernel.org/show_bug.cgi?id=9514
people are still debating for 2.6.24 if it should be reverted or not,
but as it causes a known problem, we will revert this for now.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.23.11

BRIDGE: Section fix.

WARNING: vmlinux.o(.init.text+0x204e2): Section mismatch: reference to .exit.text:br_fdb_fini (between 'br_init' and 'br_fdb_init')

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Revert "Freezer: Fix APM emulation breakage"

This reverts commit a6eda373a0fe1c4d169d0ec081518d68323428ab.

It causes a build breakage.

Thanks to Chuck Ebbert <cebbert@redhat.com> for pointing it out.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.23.10

XFS: Make xfsbufd threads freezable

patch 978c7b2ff49597ab76ff7529a933bd366941ac25 in mainline

Fix breakage caused by commit 831441862956fffa17b9801db37e6ea1650b0f69
that did not introduce the necessary call to set_freezable() in
xfs/linux-2.6/xfs_buf.c .

SGI-PV: 974224
SGI-Modid: xfs-linux-melb:xfs-kern:30203a

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Cc: Oliver Pintr <oliver.pntr@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

BRIDGE: Properly dereference the br_should_route_hook

[BRIDGE]: Properly dereference the br_should_route_hook

[ Upstream commit: 82de382ce8e1c7645984616728dc7aaa057821e4 ]

This hook is protected with the RCU, so simple

if (br_should_route_hook)
br_should_route_hook(...)

is not enough on some architectures.

Use the rcu_dereference/rcu_assign_pointer in this case.

Fixed Stephen's comment concerning using the typeof().

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

NETFILTER: xt_TCPMSS: remove network triggerable WARN_ON

[NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON

[ Upstream commit: 9dc0564e862b1b9a4677dec2c736b12169e03e99 ]

ipv6_skip_exthdr() returns -1 for invalid packets. don't WARN_ON
that.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

XFRM: Fix leak of expired xfrm_states

[XFRM]: Fix leak of expired xfrm_states

[ Upstream commit: 5dba4797115c8fa05c1a4d12927a6ae0b33ffc41 ]

The xfrm_timer calls __xfrm_state_delete, which drops the final reference
manually without triggering destruction of the state. Change it to use
xfrm_state_put to add the state to the gc list when we're dropping the
last reference. The timer function may still continue to use the state
safely since the final destruction does a del_timer_sync().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

libata: kill spurious NCQ completion detection

patch 459ad68893a84fb0881e57919340b97edbbc3dc7 in mainline.

Spurious NCQ completion detection implemented in ahci was incorrect.
On AHCI receving and processing FISes and raising interrupts are not
interlocked and spurious interrupts are expected.

For example, if an interrupt occurs while interrupt handler is running
and the running interrupt handler handles the event the new IRQ
indicated, after IRQ handler finishes, it will be executed again
because IRQ pending bit is set by the new interrupt but there won't be
anything to process.

Please read the following message for more information.

  http://article.gmane.org/gmane.linux.ide/26012

This patch...

* Removes all spurious IRQ whining from ahci.  Spurious NCQ completion
  detection was completely wrong.  Spurious D2H Register FIS taught us
  that some early drives send spurious D2H Register FIS with I bit set
  while NCQ commands are in progress but none of recent drives does
  that and even the ones which show such behavior can do NCQ fine.

* Kills all NCQ blacklist entries which were added because of spurious
  NCQ completions.  I tracked down each commit and verified all
  removed ones are actually added because of spurious completions.

  WD740ADFD-00NLR1 wasn't deleted but moved upward because the drive
  not only had spurious NCQ completions but also is slow on sequential
  data transfers if NCQ is enabled.

  Maxtor 7V300F0 was added by 0e3dbc01d53940fe10e5a5cfec15ede3e929c918
  from Alan Cox.  I can only find evidences that the drive only had
  troubles with spuruious completions by searching the mailing list.
  This entry needs to be verified and removed if it doesn't have other
  NCQ related problems.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

NETFILTER: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK

[NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK

[ Upstream commit: 67b4af297033f5f65999885542f95ba7b562848a ]

Fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK

When xt_CONNMARK is used outside the mangle table and the user specified
"--restore-mark", the connmark_tg_check() function will (correctly)
error out, but (incorrectly) forgets to release the L3 conntrack module.
Same for xt_CONNSECMARK.

Fix is to move the call to acquire the L3 module after the basic
constraint checks.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

UNIX: EOF on non-blocking SOCK_SEQPACKET

[UNIX]: EOF on non-blocking SOCK_SEQPACKET

[ Upstream commit: 0a11225887fe6cbccd882404dc36ddc50f47daf9 ]

I am not absolutely sure whether this actually is a bug (as in: I've got
no clue what the standards say or what other implementations do), but at
least I was pretty surprised when I noticed that a recv() on a
non-blocking unix domain socket of type SOCK_SEQPACKET (which is connection
oriented, after all) where the remote end has closed the connection
returned -1 (EAGAIN) rather than 0 to indicate end of file.

This is a test case:

| #include <sys/types.h>
| #include <unistd.h>
| #include <sys/socket.h>
| #include <sys/un.h>
| #include <fcntl.h>
| #include <string.h>
| #include <stdlib.h>
|
| int main(){
| int sock;
| struct sockaddr_un addr;
| char buf[4096];
| int pfds[2];
|
| pipe(pfds);
| sock=socket(PF_UNIX,SOCK_SEQPACKET,0);
| addr.sun_family=AF_UNIX;
| strcpy(addr.sun_path,"/tmp/foobar_testsock");
| bind(sock,(struct sockaddr *)&addr,sizeof(addr));
| listen(sock,1);
| if(fork()){
| close(sock);
| sock=socket(PF_UNIX,SOCK_SEQPACKET,0);
| connect(sock,(struct sockaddr *)&addr,sizeof(addr));
| fcntl(sock,F_SETFL,fcntl(sock,F_GETFL)|O_NONBLOCK);
| close(pfds[1]);
| read(pfds[0],buf,sizeof(buf));
| recv(sock,buf,sizeof(buf),0); // <-- this one
| }else accept(sock,NULL,NULL);
| exit(0);
| }

If you try it, make sure /tmp/foobar_testsock doesn't exist.

The marked recv() returns -1 (EAGAIN) on 2.6.23.9. Below you find a
patch that fixes that.

Signed-off-by: Florian Zumbiehl <florz@florz.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

TCP: illinois: Incorrect beta usage

[TCP] illinois: Incorrect beta usage

[ Upstream commit: a357dde9df33f28611e6a3d4f88265e39bcc8880 ]

Lachlan Andrew observed that my TCP-Illinois implementation uses the
beta value incorrectly:
The parameter  beta  in the paper specifies the amount to decrease
*by*:  that is, on loss,
W <-  W -  beta*W
but in   tcp_illinois_ssthresh() uses  beta  as the amount
to decrease  *to*: W <- beta*W

This bug makes the Linux TCP-Illinois get less-aggressive on uncongested network,
hurting performance. Note: since the base beta value is .5, it has no
impact on a congested network.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IPV6: Restore IPv6 when MTU is big enough

[IPV6]: Restore IPv6 when MTU is big enough

[ Upstream commit: d31c7b8fa303eb81311f27b80595b8d2cbeef950 ]

Avaid provided test application, so bug got fixed.

IPv6 addrconf removes ipv6 inner device from netdev each time cmu
changes and new value is less than IPV6_MIN_MTU (1280 bytes).
When mtu is changed and new value is greater than IPV6_MIN_MTU,
it does not add ipv6 addresses and inner device bac.

This patch fixes that.

Tested with Avaid's application, which works ok now.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

DECNET: dn_nl_deladdr() almost always returns no error

[DECNET]: dn_nl_deladdr() almost always returns no error

[ Upstream commit: 3ccd86241b277249d5ac08e91eddfade47184520 ]

As far as I see from the err variable initialization
the dn_nl_deladdr() routine was designed to report errors
like "EADDRNOTAVAIL" and probaby "ENODEV".

But the code sets this err to 0 after the first nlmsg_parse
and goes on, returning this 0 in any case.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

VLAN: Fix nested VLAN transmit bug

[VLAN]: Fix nested VLAN transmit bug

[ Upstream commit: 6ab3b487db77fa98a24560f11a5a8e744b98d877 ]

Fix misbehavior of vlan_dev_hard_start_xmit() for recursive encapsulations.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

TEXTSEARCH: Do not allow zero length patterns in the textsearch infrastructure

[TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure

[ Upstream commit: e03ba84adb62fbc6049325a5bc00ef6932fa5e39 ]

If a zero length pattern is passed then return EINVAL.
Avoids infinite loops (bm) or invalid memory accesses (kmp).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

RXRPC: Add missing select on CRYPTO

[RXRPC]: Add missing select on CRYPTO

[ Upstream commit: d5a784b3719ae364f49ecff12a0248f6e4252720 ]

AF_RXRPC uses the crypto services, so should depend on or select CRYPTO.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

BRIDGE: Lost call to br_fdb_fini() in br_init() error path

[BRIDGE]: Lost call to br_fdb_fini() in br_init() error path

[ Upstream commit: 17efdd45755c0eb8d1418a1368ef7c7ebbe98c6e ]

In case the br_netfilter_init() (or any subsequent call)
fails, the br_fdb_fini() must be called to free the allocated
in br_fdb_init() br_fdb_cache kmem cache.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

PFKEY: Sending an SADB_GET responds with an SADB_GET

[PFKEY]: Sending an SADB_GET responds with an SADB_GET

[ Upstream commit: 435000bebd94aae3a7a50078d142d11683d3b193 ]

Kernel needs to respond to an SADB_GET with the same message type to
conform to the RFC 2367 Section 3.1.5

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

TCP: MTUprobe: fix potential sk_send_head corruption

[TCP] MTUprobe: fix potential sk_send_head corruption

[ Upstream commit: 6e42141009ff18297fe19d19296738b742f861db ]

When the abstraction functions got added, conversion here was
made incorrectly. As a result, the skb may end up pointing
to skb which got included to the probe skb and then was freed.
For it to trigger, however, skb_transmit must fail sending as
well.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

TCP: Fix TCP header misalignment

[TCP]: Fix TCP header misalignment

[ Upstream commit: 21df56c6e2372e09c916111efb6c14c372a5ab2e ]

Indeed my previous change to alloc_pskb has made it possible
for the TCP header to be misaligned iff the MTU is not a multiple
of 4 (and less than a page).  So I suspect the optimised IPsec
MTU calculation is giving you just such an MTU :)

This patch fixes it by changing alloc_pskb to make sure that
the size is at least 32-bit aligned.  This does not cause the
problem fixed by the previous patch because max_header is always
32-bit aligned which means that in the SG/NOTSO case this will
be a no-op.

I thought about putting this in the callers but all the current
callers are from TCP.  If and when we get a non-TCP caller we
can always create a TCP wrapper for this function and move the
alignment over there.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

CRYPTO api: Fix potential race in crypto_remove_spawn

[CRYPTO] api: Fix potential race in crypto_remove_spawn

[ Upstream commit: 38cb2419f544ad413c7f7aa8c17fd7377610cdd8 ]

As it is crypto_remove_spawn may try to unregister an instance which is
yet to be registered. This patch fixes this by checking whether the
instance has been registered before attempting to remove it.

It also removes a bogus cra_destroy check in crypto_register_instance as
1) it's outside the mutex;
2) we have a check in __crypto_register_alg already.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

TCP: Problem bug with sysctl_tcp_congestion_control function

[TCP]: Problem bug with sysctl_tcp_congestion_control function

[ Upstream commit: 5487796f0c9475586277a0a7a91211ce5746fa6a ]

sysctl_tcp_congestion_control seems to have a bug that prevents it
from actually calling the tcp_set_default_congestion_control
function. This is not so apparent because it does not return an error
and generally the /proc interface is used to configure the default TCP
congestion control algorithm. This is present in 2.6.18 onwards and
probably earlier, though I have not inspected 2.6.15--2.6.17.

sysctl_tcp_congestion_control calls sysctl_string and expects a successful
return code of 0. In such a case it actually sets the congestion control
algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
returned 0 on success. However, sysctl_string was updated to return 1 on
success around about 2.6.15 and sysctl_tcp_congestion_control was not updated.
Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
converts this return code to '0', so the caller never notices the error.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ATM: [he] initialize lock and tasklet earlier

[ATM]: [he] initialize lock and tasklet earlier

[ Upstream commit: 8a8037ac9dbe4eb20ce50aa20244faf77444f4a3 ]

if you are lucky (unlucky?) enough to have shared interrupts, the
interrupt handler can be called before the tasklet and lock are ready
for use.

Signed-off-by: chas williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IPV4: Remove bogus ifdef mess in arp_process

[IPV4]: Remove bogus ifdef mess in arp_process

[ Upstream commit: 3660019e5f96fd9a8b7d4214a96523c0bf7b676d ]

The #ifdef's in arp_process() were not only a mess, they were also wrong
in the CONFIG_NET_ETHERNET=n and (CONFIG_NETDEV_1000=y or
CONFIG_NETDEV_10000=y) cases.

Since they are not required this patch removes them.

Also removed are some #ifdef's around #include's that caused compile
errors after this change.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

NET: Corrects a bug in ip_rt_acct_read()

[NET]: Corrects a bug in ip_rt_acct_read()

[ Upstream commit: 483b23ffa3a5f44767038b0a676d757e0668437e ]

It seems that stats of cpu 0 are counted twice, since
for_each_possible_cpu() is looping on all possible cpus, including 0

Before percpu conversion of ip_rt_acct, we should also remove the
assumption that CPU 0 is online (or even possible)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

create /sys/.../power when CONFIG_PM is set

patch dec13c15445fec29ca9087890895718450e80b95 in mainline.

The CONFIG_SUSPEND changes in 2.6.23 caused a regression under certain
configuration conditions (SUSPEND=n, USB_AUTOSUSPEND=y) where all USB
device attributes in sysfs (idVendor, idProduct, ...) silently disappeared,
causing udev breakage and more.

The cause of this is that the /sys/.../power subdirectory is now only
created when CONFIG_PM_SLEEP is set, however, it should be created whenever
CONFIG_PM is set to handle the above situation. The following patch fixes
the regression.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

netfilter: Fix kernel panic with REDIRECT target.

This patch fixes a NAT regression in 2.6.23, resulting in a
crash when a connection is NATed and matches a conntrack
helper after NAT.

Please apply, thanks.
[NETFILTER]: Fix kernel panic with REDIRECT target.

Upstream commit 1f305323ff5b9ddc1a4346d36072bcdb58f3f68a

When connection tracking entry (nf_conn) is about to copy itself it can
have some of its extension users (like nat) as being already freed and
thus not required to be copied.

Actually looking at this function I suspect it was copied from
nf_nat_setup_info() and thus bug was introduced.

Report and testing from David <david@unsolicited.net>.

[ Patrick McHardy states:

        I now understand whats happening:

        - new connection is allocated without helper
        - connection is REDIRECTed to localhost
        - nf_nat_setup_info adds NAT extension, but doesn't initialize it yet
        - nf_conntrack_alter_reply performs a helper lookup based on the
           new tuple, finds the SIP helper and allocates a helper extension,
           causing reallocation because of too little space
        - nf_nat_move_storage is called with the uninitialized nat extension

        So your fix is entirely correct, thanks a lot :)  ]

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

nf_nat: fix memset error

This patch fixes an incorrect memset in the NAT code, causing
misbehaviour when unloading and reloading the NAT module.
Applies to stable-2.6.22 and stable-2.6.23.

Please apply, thanks.
[NETFILTER]: nf_nat: fix memset error

Upstream commit e0bf9cf15fc30d300b7fbd821c6bc975531fab44

The size passing to memset is the size of a pointer. Fixes
misbehaviour when unloading and reloading the NAT module.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

esp_scsi: fix reset cleanup spinlock recursion

patch 522939d45c293388e6a360210905f9230298df16 in mainline.

The esp_reset_cleanup() function is called with the host lock held and
invokes starget_for_each_device() which wants to take it too. Here is a
fix along the lines of shost_for_each_device()/__shost_for_each_device()
adding a __starget_for_each_device() counterpart which assumes the lock
has already been taken.

Eventually, I think the driver should get modified so that more work is
done as a softirq rather than in the interrupt context, but for now it
fixes a bug that causes the spinlock debugger to fire.

While at it, it fixes a small number of cosmetic problems with
starget_for_each_device() too.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

revert "dpt_i2o: convert to SCSI hotplug model"

patch 24601bbcacb3356657747f2e64317923feb7a1a2 in mainline.

revert

    commit 55d9fcf57ba5ec427544fca7abc335cf3da78160
    Author: Matthew Wilcox <matthew@wil.cx>
    Date:   Mon Jul 30 15:19:18 2007 -0600

        [SCSI] dpt_i2o: convert to SCSI hotplug model

         - Delete refereces to HOSTS_C
         - Switch to module_init/module_exit instead of detect/release
         - Don't pass around the host template and rename it to adpt_template
         - Switch from scsi_register/scsi_unregister to scsi_host_alloc,
           scsi_add_host, scsi_scan_host and scsi_host_put.

Because it caused (for unknown reasons) Andres' all-data-reads-as-zeroes
problem, reported at
http://groups.google.com/group/fa.linux.kernel/msg/083a9acff0330234

Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Mark Salyzyn <mark_salyzyn@adaptec.com>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Anders Henke <anders.henke@1und1.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

fb_ddc: fix DDC lines quirk

patch b64d70825abbf706bbe80be1b11b09514b71f45e in mainline.

The code in fb_ddc_read() is said to be based on the implementation of the
radeon driver:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fc5891c8a3ba284f13994d7bc1f1bfa8283982de

However, comparing the old radeon driver code with the new fb_ddc code
reveals some differences.  Most notably, the I2C bus lines are held at the
end of the function, while the original code was releasing them (as the
comment above correctly says.)

There are a few other differences, which appear to be responsible for read
failures on my system.  While tracing low-level I2C code in i2c-algo-bit, I
noticed that the initial attempt to read the EDID always failed.  It takes
one retry for the read to succeed.  As we are about to remove this
automatic retry property from i2c-algo-bit, reading the EDID would really
fail.

As a summary, the I2C lines quirk which is supposedly needed to read EDID
on some older monitors is currently breaking the (first) read on all other
monitors (and might not even work with older ones - did anyone try since
October 2006?)

After applying the patch below, which makes the code in fb_ddc_read()
really similar to what the radeon driver used to have, the first EDID read
succeeds again.

On top of that, as it appears that this code has been broken for one year
now and nobody seems to have complained, I'm curious if it makes sense to
keep this quirk in place.  It makes the code more complex and slower just
for the sake of monitors which I guess nobody uses anymore.  Can't we just
get rid of it?

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Roger Leigh <rleigh@whinlatter.ukfsn.org>
Tested-by: Michael Buesch <mb@bu3sch.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

wait_task_stopped(): pass correct exit_code to wait_noreap_copyout()

patch e6ceb32aa25fc33f21af84cc7a32fe289b3e860c in mainline.

In wait_task_stopped() exit_code already contains the right value for the
si_status member of siginfo, and this is simply set in the non WNOWAIT
case.

If you call waitid() with a stopped or traced process, you'll get the signal
in siginfo.si_status as expected -- however if you call waitid(WNOWAIT) at the
same time, you'll get the signal << 8 | 0x7f

Pass it unchanged to wait_noreap_copyout(); we would only need to shift it
and add 0x7f if we were returning it in the user status field and that
isn't used for any function that permits WNOWAIT.

Signed-off-by: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

PNP: increase the maximum number of resources

patch a7839e960675b549f06209d18283d5cee2ce9261 in mainline.

On some systems the number of resources(IO,MEM) returnedy by PNP device is
greater than the PNP constant, for example motherboard devices. It brings
that some resources can't be reserved and resource confilicts. This will
cause PCI resources are assigned wrongly in some systems, and cause hang.
This is a regression since we deleted ACPI motherboard driver and use PNP
system driver.

[akpm@linux-foundation.org: fix text and coding-style a bit]
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Freezer: Fix APM emulation breakage

patch cb43c54ca05c01533c45e4d3abfe8f99b7acf624 in mainline.

The APM emulation is currently broken as a result of commit
831441862956fffa17b9801db37e6ea1650b0f69
"Freezer: make kernel threads nonfreezable by default"
that removed the PF_NOFREEZE annotations from apm_ioctl() without adding
the appropriate freezer hooks. Fix it and remove the unnecessary variable flags
from apm_ioctl().

Special thanks to Franck Bui-Huu <vagabon.xyz@gmail.com> for pointing out the
problem.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Franck Bui-Huu <vagabon.xyz@gmail.com>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

futex: fix for futex_wait signal stack corruption

From Steven Rostedt <srostedt@redhat.com>

patch ce6bd420f43b28038a2c6e8fbb86ad24014727b6 in mainline.

David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too.  The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.

Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.

Here's the relevant code from kernel/futex.c: (not in order in the file)

[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
                          struct timespec __user *utime, u32 __user *uaddr2,
                          u32 val3)
{
        struct timespec ts;
        ktime_t t, *tp = NULL;
        u32 val2 = 0;
        int cmd = op & FUTEX_CMD_MASK;

        if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
                if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
                        return -EFAULT;
                if (!timespec_valid(&ts))
                        return -EINVAL;

                t = timespec_to_ktime(ts);
                if (cmd == FUTEX_WAIT)
                        t = ktime_add(ktime_get(), t);
                tp = &t;
        }
[...]
        return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}

[...]

long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
                u32 __user *uaddr2, u32 val2, u32 val3)
{
        int ret;
        int cmd = op & FUTEX_CMD_MASK;
        struct rw_semaphore *fshared = NULL;

        if (!(op & FUTEX_PRIVATE_FLAG))
                fshared = &current->mm->mmap_sem;

        switch (cmd) {
        case FUTEX_WAIT:
                ret = futex_wait(uaddr, fshared, val, timeout);

[...]

static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
                      u32 val, ktime_t *abs_time)
{
[...]
               struct restart_block *restart;
                restart = &current_thread_info()->restart_block;
                restart->fn = futex_wait_restart;
                restart->arg0 = (unsigned long)uaddr;
                restart->arg1 = (unsigned long)val;
                restart->arg2 = (unsigned long)abs_time;
                restart->arg3 = 0;
                if (fshared)
                        restart->arg3 |= ARG3_SHARED;
                return -ERESTART_RESTARTBLOCK;
[...]

static long futex_wait_restart(struct restart_block *restart)
{
        u32 __user *uaddr = (u32 __user *)restart->arg0;
        u32 val = (u32)restart->arg1;
        ktime_t *abs_time = (ktime_t *)restart->arg2;
        struct rw_semaphore *fshared = NULL;

        restart->fn = do_no_restart_syscall;
        if (restart->arg3 & ARG3_SHARED)
                fshared = &current->mm->mmap_sem;
        return (long)futex_wait(uaddr, fshared, val, abs_time);
}

So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK.  The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.

This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.

I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.

The solution I did to solve this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters.  This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.

Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for.  If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

isdn: avoid copying overly-long strings

patch 0f13864e5b24d9cbe18d125d41bfa4b726a82e40 in mainline.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9416

Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86 setup: add a near jump to serialize %cr0 on 386/486

patch 7ed192906a2144ebc8ca2925a85d27b9c5355668 in mainline.

The 386 and 486 needs a jump immediately after setting %cr0 in order
to serialize the pipeline.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: VMX: Reset mmu context when entering real mode

patch 8668a3c468ed55d19514117a5a959d91d3d03823 in mainline.

Resetting an SMP guest will force AP enter real mode (RESET) with
paging enabled in protected mode. While current enter_rmode() can
only handle mode switch from nonpaging mode to real mode which leads
to SMP reboot failure.

Fix by reloading the mmu context on entering real mode.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: VMX: Force vm86 mode if setting flags during real mode

patch 78f7826868da8e27d097802139a3fec39f47f3b8 in mainline.

When resetting from userspace, we need to handle the flags being cleared
even after we are in real mode.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: Skip pio instruction when it is emulated, not executed

patch 0967b7bf1c22b55777aba46ff616547feed0b141 in mainline.

If we defer updating rip until pio instructions are executed, we have a
problem with reset: a pio reset updates rip, and when the instruction
completes we skip the emulated instruction, pointing rip somewhere completely
unrelated.

Fix by updating rip when we see decode the instruction, not after emulation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: SVM: Fix FPU leak while emulating clts

patch 404fb881b82cf0cf6981832f8d31a7484e4dee81 in mainline.

The clts code didn't use set_cr0 properly, so our lazy FPU
processing wasn't being done by the clts instruction at all.

(this isn't called on Intel as the hardware does the decode for us)

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: Fix hang on uniprocessor

This is not in mainline, as it was fixed differently in that tree.

first_cpu(cpus) returns the only CPU when NR_CPUS is 1 regardless of
the cpus mask. Therefore we avoid a kernel hang in
KVM_SET_MEMORY_REGION ioctl on uniprocessor by not entering the loop at
all.

Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: x86 emulator: Use emulator_write_emulated and not emulator_write_std

patch 00b2ef475d4728ca53a2bc788c7978042907e354 in mainline.

emulator_write_std() is not implemented, and calling write_emulated should
work just as well in place of write_std.

Fixes emulator failures with the push r/m instruction.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: SVM: Intercept the 'invd' and 'wbinvd' instructions

patch cf5a94d1331b411b84414c13e43f578260942d6b in mainline.

'invd' can destroy host data, and 'wbinvd' allows the guest to induce
long (milliseconds) latencies.

Noted by Ben Serebrin.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: x86 emulator: invd instruction

patch 651a3e29b3d19418d7a8a9787906061f9be7cc5f in mainline.

Emulate the 'invd' instruction (opcode 0f 08).

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: x86 emulator: fix access registers for instructions with ModR/M byte and Mod = 3

patch 4e62417bf317504c0b85e0d7abd236f334f54eaf in mainline.

The patch belows changes the access type to register from memory for
instructions that are declared as SrcMem or DstMem, but have a
ModR/M byte with Mod = 3.

It fixes (at least) the lmsw and smsw instructions on an AMD64 CPU,
which are needed for FreeBSD.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: x86 emulator: implement 'movnti mem, reg'

patch a012e65aee48379a7a87eadafa74f878b61522b9 in mainline.

Implement emulation of instruction:
movnti m32/m64, r32/r64
opcode: 0x0f 0xc3

Needed to support Linux 2.6.16 as guest (used for mmio).

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

hrtimers: avoid overflow for large relative timeouts (CVE-2007-5966)

patch 62f0f61e6673e67151a7c8c0f9a09c7ea43fe2b5 in mainline

Relative hrtimers with a large timeout value might end up as negative
timer values, when the current time is added in hrtimer_start().

This in turn is causing the clockevents_set_next() function to set an
huge timeout and sleep for quite a long time when we have a clock
source which is capable of long sleeps like HPET. With PIT this almost
goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code
sorts this out in the next timer interrupt, so we never noticed that
problem which has been there since the first day of hrtimers.

This bug became more apparent in 2.6.24 which activates HPET on more
hardware.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

forcedeth boot delay fix

patch 9e555930bd873d238f5f7b9d76d3bf31e6e3ce93 in mainline.

Fix a long boot delay in the forcedeth driver. During initialization, the
timeout for the handshake between mgmt unit and driver can be very long.
The patch reduces the timeout by eliminating a extra loop around the
timeout logic.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9308

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Cc: Alex Howells <astinus@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

forcedeth: new mcp79 pci ids

patch 490dde8990c55662596a4be71b5070bd7d382d4a in mainline.

This patch adds new device ids and features for mcp79 devices into the
forcedeth driver.

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
index 92ce2e3..f9ba0ac 100644

I4L: fix isdn_ioctl memory overrun vulnerability

patch eafe1aa37e6ec2d56f14732b5240c4dd09f0613a in mainline.

Fix possible memory overrun issue in the isdn ioctl code. Found by ADLAB
<adlab@venustech.com.cn>

Signed-off-by: Karsten Keil <kkeil@suse.de>
Cc: ADLAB <adlab@venustech.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

tmpfs: restore missing clear_highpage

patch e84e2e132c9c66d8498e7710d4ea532d1feaaac5 in mainline

tmpfs was misconverted to __GFP_ZERO in 2.6.11. There's an unusual case in
which shmem_getpage receives the page from its caller instead of allocating.
We must cover this case by clear_highpage before SetPageUptodate, as before.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: fix up EHCI startup synchronization

patch 1cb52658b4f5b10a9e91f8e1c21ca2bcc1b9a3ca in mainline.

A recent patch added software synchronization during EHCI startup,
so ports aren't switched away from the companion controllers after
resets have started. This patch adds a short delay letting hardware
finish that port switching before any new resets begin ... so both
ends of that hardware race window are closed.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Dave Miller <davem@davemloft.net>
Cc: Dely Sy <dely.l.sy@intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: make the microtek driver and HAL cooperate

patch 5cf1973a44bd298e3cfce6f6af8faa8c9d0a6d55 in mainline

to make HAL like the microtek driver's devices the parent must be
correctly set.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Input: ALPS - add support for model found in Dell Vostro 1400

changeset dac4ae0daa1be36ab015973ed9e9dc04a2684395 in mainline.

Input: ALPS - add support for model found in Dell Vostro 1400

Signed-off-by: William Pettersson <william.pettersson@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Fix synchronize_irq races with IRQ handler

patch a98ce5c6feead6bfedefabd46cb3d7f5be148d9a in mainline.

Fix synchronize_irq races with IRQ handler

As it is some callers of synchronize_irq rely on memory barriers
to provide synchronisation against the IRQ handlers. For example,
the tg3 driver does

tp->irq_sync = 1;
smp_mb();
synchronize_irq();

and then in the IRQ handler:

if (!tp->irq_sync)
netif_rx_schedule(dev, &tp->napi);

Unfortunately memory barriers only work well when they come in
pairs. Because we don't actually have memory barriers on the
IRQ path, the memory barrier before the synchronize_irq() doesn't
actually protect us.

In particular, synchronize_irq() may return followed by the
result of netif_rx_schedule being made visible.

This patch (mostly written by Linus) fixes this by using spin
locks instead of memory barries on the synchronize_irq() path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

PKT_SCHED: Check subqueue status before calling hard_start_xmit

[PKT_SCHED]: Check subqueue status before calling hard_start_xmit

[ Upstream commit: 5f1a485d5905aa641f33009019b3699076666a4c ]

The only qdiscs that check subqueue state before dequeue'ing are PRIO
and RR.  The other qdiscs, including the default pfifo_fast qdisc,
will allow traffic bound for subqueue 0 through to hard_start_xmit.
The check for netif_queue_stopped() is done above in pkt_sched.h, so
it is unnecessary for qdisc_restart().  However, if the underlying
driver is multiqueue capable, and only sets queue states on subqueues,
this will allow packets to enter the driver when it's currently unable
to process packets, resulting in expensive requeues and driver
entries.  This patch re-adds the check for the subqueue status before
calling hard_start_xmit, so we can try and avoid the driver entry when
the queues are stopped.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sched: some proc entries are missed in sched_domain sys_ctl debug code

patch ace8b3d633f93da8535921bf3e3679db3c619578 in mainline.

cache_nice_tries and flags entry do not appear in proc fs sched_domain
directory, because ctl_table entry is skipped.

This patch fixes the issue.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Future of Linux 2.6.22.y series

commit 5d0360ee96a5ef953dbea45873c2a8c87e77d59b upstream.

We have seen ramdisk based install systems, where some pages of mapped
libraries and programs were suddendly zeroed under memory pressure. This
should not happen, as the ramdisk avoids freeing its pages by keeping
them dirty all the time.

It turns out that there is a case, where the VM makes a ramdisk page
clean, without telling the ramdisk driver. On memory pressure
shrink_zone runs and it starts to run shrink_active_list. There is a
check for buffer_heads_over_limit, and if true, pagevec_strip is called.
pagevec_strip calls try_to_release_page. If the mapping has no
releasepage callback, try_to_free_buffers is called. try_to_free_buffers
has now a special logic for some file systems to make a dirty page
clean, if all buffers are clean. Thats what happened in our test case.

The simplest solution is to provide a noop-releasepage callback for the
ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

NETFILTER: Fix NULL pointer dereference in nf_nat_move_storage()

[NETFILTER]: Fix NULL pointer dereference in nf_nat_move_storage()

[ Upstream commit: 7799652557d966e49512479f4d3b9079bbc01fff ]

Reported by Chuck Ebbert as:

https://bugzilla.redhat.com/show_bug.cgi?id=259501#c14

This routine is called each time hash should be replaced, nf_conn has
extension list which contains pointers to connection tracking users
(like nat, which is right now the only such user), so when replace takes
place it should copy own extensions. Loop above checks for own
extension, but tries to move higer-layer one, which can lead to above
oops.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

NET: random : secure_tcp_sequence_number should not assume CONFIG_KTIME_SCALAR

[NET] random : secure_tcp_sequence_number should not assume CONFIG_KTIME_SCALAR

[ Upstream commit: 6dd10a62353a50b30b30e0c18653650975b29c71 ]

All 32 bits machines but i386 dont have CONFIG_KTIME_SCALAR. On these
machines, ktime.tv64 is more than 4 times the (correct) result given
by ktime_to_ns()

Again on these machines, using ktime_get_real().tv64 >> 6 give a
32bits rollover every 64 seconds, which is not wanted (less than the
120 s MSL)

Using ktime_to_ns() is the portable way to get nsecs from a ktime, and
have correct code.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

libertas: properly account for queue commands

patch 29f5f2a19b055feabfcc6f92e1d40ec092c373ea in mainline.

Properly account for queue commands, this fixes a problem reported
by Holger Schurig when using the debugfs interface.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.23.9

ipw2200: batch non-user-requested scan result notifications

patch 0b5316769774d1dc2fdd702e095f9e6992af269a in mainline.

ipw2200 makes extensive use of background scanning when unassociated or
down. Unfortunately, the firmware sends scan completed events many
times per second, which the driver pushes directly up to userspace.
This needlessly wakes up processes listening for wireless events many
times per second. Batch together scan completed events for
non-user-requested scans and send them up to userspace every 4 seconds.
Scan completed events resulting from an SIOCSIWSCAN call are pushed up
without delay.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Tobias Powalowski <t.powa@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: Nikon D40X unusual_devs entry

patch d466a9190ff1ceddfee50686e61d63590fc820d9 in mainline.

Not surprisingly the Nikon D40X DSC needs the same quirks as the D40,
but it has a separate ID.
See http://bugs.gentoo.org/show_bug.cgi?id=191431

From: Ortwin Glück <odi@odi.ch>
Cc: Tobias Powalowski <t.powa@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: unusual_devs modification for Nikon D200

patch 16eb345f4d9189b59bae576ae63cba7ca77817b2 in mainline.

Upgrade the unusual_devs.h file to support the Nikon D200

Signed-off-by: Mike Pagano <mpagano-kernel@mpagano.com>
Signed-off-by: Phil Dibowitz <phil@ipom.com>
Cc: Tobias Powalowski <t.powa@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

softlockup: use cpu_clock() instead of sched_clock()

patch a3b13c23f186ecb57204580cc1f2dbe9c284953a in mainline.

sched_clock() is not a reliable time-source, use cpu_clock() instead.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

softlockup watchdog fixes and cleanups

This is a merge of commits a5f2ce3c6024a5bb895647b6bd88ecae5001020a and
43581a10075492445f65234384210492ff333eba in mainline to fix a warning in
the 2.6.23.3 kernel release.

softlockup watchdog: style cleanups

kernel/softirq.c grew a few style uncleanlinesses in the past few
months, clean that up. No functional changes:

text    data     bss     dec     hex filename
1126      76       4    1206     4b6 softlockup.o.before
1129      76       4    1209     4b9 softlockup.o.after

( the 3 bytes .text increase is due to the "<1>" appended to one of
the printk messages. )

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
softlockup: improve debug output

Improve the debuggability of kernel lockups by enhancing the debug
output of the softlockup detector: print the task that causes the lockup
and try to print a more intelligent backtrace.

The old format was:

BUG: soft lockup detected on CPU#1!
[<c0105e4a>] show_trace_log_lvl+0x19/0x2e
[<c0105f43>] show_trace+0x12/0x14
[<c0105f59>] dump_stack+0x14/0x16
[<c015f6bc>] softlockup_tick+0xbe/0xd0
[<c013457d>] run_local_timers+0x12/0x14
[<c01346b8>] update_process_times+0x3e/0x63
[<c0145fb8>] tick_sched_timer+0x7c/0xc0
[<c0140a75>] hrtimer_interrupt+0x135/0x1ba
[<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
[<c0105aa3>] apic_timer_interrupt+0x33/0x38
[<c0104f8a>] syscall_call+0x7/0xb
=======================

The new format is:

BUG: soft lockup detected on CPU#1! [prctl:2363]

Pid: 2363, comm:                prctl
EIP: 0060:[<c013915f>] CPU: 1
EIP is at sys_prctl+0x24/0x18c
EFLAGS: 00000213    Not tainted  (2.6.22-cfs-v20 #26)
EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000
ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0
[<c0105e4a>] show_trace_log_lvl+0x19/0x2e
[<c0105f43>] show_trace+0x12/0x14
[<c01040be>] show_regs+0x1ab/0x1b3
[<c015f807>] softlockup_tick+0xef/0x108
[<c013457d>] run_local_timers+0x12/0x14
[<c01346b8>] update_process_times+0x3e/0x63
[<c0145fcc>] tick_sched_timer+0x7c/0xc0
[<c0140a89>] hrtimer_interrupt+0x135/0x1ba
[<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
[<c0105aa3>] apic_timer_interrupt+0x33/0x38
[<c0104f8a>] syscall_call+0x7/0xb
=======================

Note that in the old format we only knew that some system call locked
up, we didnt know _which_. With the new format we know that it's at a
specific place in sys_prctl(). [which was where i created an artificial
kernel lockup to test the new format.]

This is also useful if the lockup happens in user-space - the user-space
EIP (and other registers) will be printed too. (such a lockup would
either suggest that the task was running at SCHED_FIFO:99 and looping
for more than 10 seconds, or that the softlockup detector has a
false-positive.)

The task name is printed too first, just in case we dont manage to print
a useful backtrace.

[satyam@infradead.org: fix warning]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: fix freeze in x86_64 RTC update code in time_64.c

patch c399da0d97e06803e51085ec076b63a3168aad1b in mainline.

x86: fix freeze in x86_64 RTC update code in time_64.c

Fix hard freeze on x86_64 when the ntpd service calls
update_persistent_clock()

A repeatable but randomly timed freeze has been happening in Fedora 6
and 7 for the last year, whenever I run the ntpd service on my AMD64x2
HP Pavilion dv9000z laptop. This freeze is due to the use of
spin_lock(&rtc_lock) under the assumption (per a bad comment) that
set_rtc_mmss is called only with interrupts disabled. The call from
ntp.c to update_persistent_clock is made with interrupts enabled.

[ tglx@linutronix.de: ported to 2.6.23.stable ]

Signed-off-by: David P. Reed <dpreed@reed.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ntp: fix typo that makes sync_cmos_clock erratic

patch fa6a1a554b50cbb7763f6907e6fef927ead480d9 in mainline.

ntp: fix typo that makes sync_cmos_clock erratic

Fix a typo in ntp.c that has caused updating of the persistent (RTC)
clock when synced to NTP to behave erratically.

When debugging a freeze that arises on my AMD64 machines when I
run the ntpd service, I added a number of printk's to monitor the
sync_cmos_clock procedure.  I discovered that it was not syncing to
cmos RTC every 11 minutes as documented, but instead would keep trying
every second for hours at a time.  The reason turned out to be a typo
in sync_cmos_clock, where it attempts to ensure that
update_persistent_clock is called very close to 500 msec. after a 1
second boundary (required by the PC RTC's spec). That typo referred to
"xtime" in one spot, rather than "now", which is derived from "xtime"
but not equal to it.  This makes the test erratic, creating a
"coin-flip" that decides when update_persistent_clock is called - when
it is called, which is rarely, it may be at any time during the one
second period, rather than close to 500 msec, so the value written is
needlessly incorrect, too.

Signed-off-by: David P. Reed <dpreed@reed.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: return correct error code from child_rip in x86_64 entry.S

patch 1c5b5cfd290b6cb7c67020ef420e275f746a7236 in mainline.

x86: return correct error code from child_rip in x86_64 entry.S

Right now register edi is just cleared before calling do_exit.
That is wrong because correct return value will be ignored.
Value from rax should be copied to rdi instead of clearing edi.

AK: changed to 32bit move because it's strictly an int

[ tglx: arch/x86 adaptation ]

Signed-off-by: Andrey Mirkin <major@openvz.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: NX bit handling in change_page_attr()

patch 84e0fdb1754d066dd0a8b257de7299f392d1e727 in mainline.

x86: NX bit handling in change_page_attr()

This patch fixes a bug of change_page_attr/change_page_attr_addr on
Intel x86_64 CPUs. After changing page attribute to be executable with
these functions, the page remains un-executable on Intel x86_64 CPU.
Because on Intel x86_64 CPU, only if the "NX" bits of all four level
page tables are cleared, the corresponding page is executable (refer to
section 4.13.2 of Intel 64 and IA-32 Architectures Software Developer's
Manual). So, the bug is fixed through clearing the "NX" bit of PMD when
splitting the huge PMD.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>