git.karo-electronics.de Git - karo-tx-linux.git/log

vmscan: properly account for the number of page cache pages zone_reclaim() can reclaim

commit 90afa5de6f3fa89a733861e843377302479fcf7e upstream.

A bug was brought to my attention against a distro kernel but it affects
mainline and I believe problems like this have been reported in various
guises on the mailing lists although I don't have specific examples at the
moment.

The reported problem was that malloc() stalled for a long time (minutes in
some cases) if a large tmpfs mount was occupying a large percentage of
memory overall.  The pages did not get cleaned or reclaimed by
zone_reclaim() because the zone_reclaim_mode was unsuitable, but the lists
are uselessly scanned frequencly making the CPU spin at near 100%.

This patchset intends to address that bug and bring the behaviour of
zone_reclaim() more in line with expectations which were noticed during
investigation.  It is based on top of mmotm and takes advantage of
Kosaki's work with respect to zone_reclaim().

Patch 1 fixes the heuristics that zone_reclaim() uses to determine if the
scan should go ahead. The broken heuristic is what was causing the
malloc() stall as it uselessly scanned the LRU constantly. Currently,
zone_reclaim is assuming zone_reclaim_mode is 1 and historically it
could not deal with tmpfs pages at all. This fixes up the heuristic so
that an unnecessary scan is more likely to be correctly avoided.

Patch 2 notes that zone_reclaim() returning a failure automatically means
the zone is marked full. This is not always true. It could have
failed because the GFP mask or zone_reclaim_mode were unsuitable.

Patch 3 introduces a counter zreclaim_failed that will increment each
time the zone_reclaim scan-avoidance heuristics fail. If that
counter is rapidly increasing, then zone_reclaim_mode should be
set to 0 as a temporarily resolution and a bug reported because
the scan-avoidance heuristic is still broken.

This patch:

On NUMA machines, the administrator can configure zone_reclaim_mode that
is a more targetted form of direct reclaim.  On machines with large NUMA
distances for example, a zone_reclaim_mode defaults to 1 meaning that
clean unmapped pages will be reclaimed if the zone watermarks are not
being met.

There is a heuristic that determines if the scan is worthwhile but the
problem is that the heuristic is not being properly applied and is
basically assuming zone_reclaim_mode is 1 if it is enabled.  The lack of
proper detection can manfiest as high CPU usage as the LRU list is scanned
uselessly.

Historically, once enabled it was depending on NR_FILE_PAGES which may
include swapcache pages that the reclaim_mode cannot deal with.  Patch
vmscan-change-the-number-of-the-unmapped-files-in-zone-reclaim.patch by
Kosaki Motohiro noted that zone_page_state(zone, NR_FILE_PAGES) included
pages that were not file-backed such as swapcache and made a calculation
based on the inactive, active and mapped files.  This is far superior when
zone_reclaim==1 but if RECLAIM_SWAP is set, then NR_FILE_PAGES is a
reasonable starting figure.

This patch alters how zone_reclaim() works out how many pages it might be
able to reclaim given the current reclaim_mode.  If RECLAIM_SWAP is set in
the reclaim_mode it will either consider NR_FILE_PAGES as potential
candidates or else use NR_{IN}ACTIVE}_PAGES-NR_FILE_MAPPED to discount
swapcache and other non-file-backed pages.  If RECLAIM_WRITE is not set,
then NR_FILE_DIRTY number of pages are not candidates.  If RECLAIM_SWAP is
not set, then NR_FILE_MAPPED are not.

[kosaki.motohiro@jp.fujitsu.com: Estimate unmapped pages minus tmpfs pages]
[fengguang.wu@intel.com: Fix underflow problem in Kosaki's estimate]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm: use i_size_read

commit 5657e8fa45cf230df278040c420fb80e06309d8f upstream.

Use i_size_read() instead of reading i_size.

If someone changes the size of the device simultaneously, i_size_read
is guaranteed to return a valid value (either the old one or the new one).

i_size can return some intermediate invalid value (on 32-bit computers
with 64-bit i_size, the reads to both halves of i_size can be interleaved
with updates to i_size, resulting in garbage being returned).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm exception store: really fix type lookup

commit 874d2f61d31e596c36af7732dc1b3aa2dc233824 upstream.

Fix exception store name handling.

We need to reference exception store by zero terminated string.

Fixes regression introduced in commit f6bd4eb73cdf2a5bf954e497972842f39cabb7e3

Cc: Yi Yang <yi.y.yang@intel.com>
Cc: Jonathan Brassow <jbrassow@redhat.com>
Cc: stable@kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm exception store: fix exstore lookup to be case insensitive

commit f6bd4eb73cdf2a5bf954e497972842f39cabb7e3 upstream.

When snapshots are created using 'p' instead of 'P' as the
exception store type, the device-mapper table loading fails.

This patch makes the code case insensitive as intended and fixes some
regressions reported with device-mapper snapshots.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm mpath: flush keventd queue in destructor

commit 53b351f972a882ea8b6cdb19602535f1057c884a upstream.

The commit fe9cf30eb8186ef267d1868dc9f12f2d0f40835a moves dm table event
submission from kmultipath queue to kernel kevent queue to avoid a
deadlock.

There is a possibility of race condition because kevent queue is not flushed
in the multipath destructor. The scenario is:
- some event happens and is queued to keventd
- keventd thread is delayed due to scheuling latency or some other work
- multipath device is destroyed
- keventd now attempts to process work_struct that is residing in already
released memory.

The patch flushes the keventd queue in multipath constructor.
I've already fixed similar bug in dm-raid1.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm: sysfs skip output when device is being destroyed

commit 4d89b7b4e4726893453d0fb4ddbb5b3e16353994 upstream.

Do not process sysfs attributes when device is being destroyed.

Otherwise code can cause
BUG_ON(test_bit(DMF_FREEING, &md->flags));
in dm_put() call.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm mpath: validate table argument count

commit 0e0497c0c017664994819f4602dc07fd95896c52 upstream.

The parser reads the argument count as a number but doesn't check that
sufficient arguments are supplied. This command triggers the bug:

dmsetup create mpath --table "0 `blockdev --getsize /dev/mapper/cr0`
multipath 0 0 2 1 round-robin 1000 0 1 1 /dev/mapper/cr0
round-robin 0 1 1 /dev/mapper/cr1 1000"
kernel BUG at drivers/md/dm-mpath.c:530!

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dm mpath: validate hw_handler argument count

commit e094f4f15f5169526c7200b9bde44b900548a81e upstream.

Fix arg count parsing error in hw handlers.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mm: fix handling of pagesets for downed cpus

commit 364df0ebfbbb1330bfc6ca159f4d6020efc15a12 upstream.

After downing/upping a cpu, an attempt to set
/proc/sys/vm/percpu_pagelist_fraction results in an oops in
percpu_pagelist_fraction_sysctl_handler().

If a processor is downed then we need to set the pageset pointer back to
the boot pageset.

Updates of the high water marks should not access pagesets of unpopulated
zones (those pointer go to the boot pagesets which would be no longer
functional if their size would be increased beyond zero).

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

qla2xxx: Correct (again) overflow during dump-processing on large-memory ISP23xx parts.

commit e18e963b7e533149676b5d387d0a56160df9f111 upstream.

Commit 7b867cf76fbcc8d77867cbec6f509f71dce8a98f ([SCSI] qla2xxx:
Refactor qla data structures) inadvertently reverted
e792121ec85672c1fa48f79d13986a3f4f56c590 ([SCSI] qla2xxx: Correct
overflow during dump-processing on large-memory ISP23xx parts.).

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

PCI PM: Follow PCI_PM_CTRL_NO_SOFT_RESET during transitions from D3

commit f62795f1e892ca9269849fa83de97621da7e02c0 upstream.

According to the PCI PM specification (PCI Bus Power Management
Interface Specification, Rev. 1.2, Section 5.4.1) we are supposed to
reinitialize devices that have PCI_PM_CTRL_NO_SOFT_RESET clear during
all transitions from PCI_D3hot to PCI_D0, but we only do it if the
device's current_state field is equal to PCI_UNKNOWN.

This may lead to problems if a device with PCI_PM_CTRL_NO_SOFT_RESET
unset is put into PCI_D3hot at run time by its driver and
pci_set_power_state() is used to put it back into PCI_D0, because in
that case the device will remain uninitialized after
pci_set_power_state() has returned. Prevent that from happening by
modifying pci_raw_set_power_state() to reinitialize devices with
PCI_PM_CTRL_NO_SOFT_RESET unset during all transitions from D3 to D0.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

PCI PM: Fix handling of devices without PM support by pci_target_state()

commit d2abdf62882d982c58e7a6b09ecdcfcc28075e2e upstream.

If a PCI device is not power-manageable either by the platform, or
with the help of the native PCI PM interface, pci_target_state() will
return either PCI_D3hot, or PCI_POWER_ERROR for it, depending on
whether or not the device is configured to wake up the system. Alas,
none of these return values is correct, because each of them causes
pci_prepare_to_sleep() to return error code, although it should
complete successfully in such a case.

Fix this problem by making pci_target_state() always return PCI_D0
for devices that cannot be power managed.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sound: seq_midi_event: fix decoding of (N)RPN events

commit 6423f9ea8035138d70bae1a278d3b57b743f8b3e upstream.

When decoding (N)RPN sequencer events into raw MIDI commands, the
extra_decode_xrpn() function had accidentally swapped the MSB and LSB
controller values of both the parameter number and the data value.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath5k: avoid PCI FATAL interrupts by restoring RETRY_TIMEOUT disabling

commit 8451d22dad40a66416b8d9c0952efa09ec5398c5 upstream.

This reverts 'ath5k: remove dummy PCI "retry timeout" fix' on the
same theory as in 'ath9k: Fix PCI FATAL interrupts by restoring
RETRY_TIMEOUT disabling'.

Reported-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mv643xx_eth: fix unicast filter programming in promiscuous mode

commit 6877f54e6a3326c99aaf84b7bff6a3019da0b847 upstream.

The Unicast Promiscious Mode (UPM) bit in the mv643xx_eth port
configuration register doesn't do exactly what its name would suggest:
setting this bit merely enables reception of all unicast frames with a
destination address that differs from our local MAC address in bits
[47:4]. In particular, it doesn't have any effect on unicast frames
with a destination address that matches our MAC address in bits [47:4]
-- these will still be tested against the 16-entry unicast address
filter table.

Therefore, if the interface is set to promiscuous mode, just setting
the unicast promiscuous bit isn't enough -- we need to set all filter
bits in the unicast filter table to 1 as well.

Reported-by: Sachin Sanap <ssanap@marvell.com>
Signed-off-by: Prabhanjan Sarnaik <sarnaik@marvell.com>
Tested-by: Siddarth Gore <gores@marvell.com>
Tested-by: Mahavir Jain <mjain@marvell.com>
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

parport_pc: after superio probing restore original register values

commit e2434dc1c19412639dd047a4d4eff8ed0e5d0d50 upstream.

CONFIG_PARPORT_PC_SUPERIO probes for various superio chips by writing
byte sequences to a set of different potential I/O ranges.  But the
probed ranges are not exclusive to parallel ports.  Some of our boards
just happen to have a watchdog in one of them.  Took us almost a week
to figure out why some distros reboot without warning after running
flawlessly for 3 hours.  For exactly 170 = 0xAA minutes, that is ...

Fixed by restoring original values after probing.  Also fixed too small
request_region() in detect_and_report_it87().

Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

parport_pc: set properly the dma_mask for parport_pc device

commit dfa7c4d869b7d3d37b70f1de856f2901b6ebfcf0 upstream.

parport_pc_probe_port() creates the own 'parport_pc' device if the
device argument is NULL. Then parport_pc_probe_port() doesn't
initialize the dma_mask and coherent_dma_mask of the device and calls
dma_alloc_coherent with it. dma_alloc_coherent fails because
dma_alloc_coherent() doesn't accept the uninitialized dma_mask:

http://lkml.org/lkml/2009/6/16/150

Long ago, X86_32 and X86_64 had the own dma_alloc_coherent
implementations; X86_32 accepted a device having dma_mask that is not
initialized however X86_64 didn't. When we merged them, we chose to
prohibit a device having dma_mask that is not initialized. I think
that it's good to require drivers to set up dma_mask (and
coherent_dma_mask) properly if the drivers want DMA.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reported-by: Malcom Blaney <malcolm.blaney@maptek.com.au>
Tested-by: Malcom Blaney <malcolm.blaney@maptek.com.au>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

n_r3964: fix lock imbalance

commit eca41044268887838fa122aa24475df8f23d614c upstream.

There is omitted BKunL in r3964_read.

Centralize the paths to one point with one unlock.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

pcmcia/cm4000: fix lock imbalance

commit 69ae59d7d8df14413cf0a97b3e372d7dc8352563 upstream.

Don't return from switch/case, break instead, so that we unlock BKL.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

usb-serial: replace shutdown with disconnect, release

commit f9c99bb8b3a1ec81af68d484a551307326c2e933 upstream

This patch splits up the shutdown method of usb_serial_driver into a
disconnect and a release method.

The problem is that the usb-serial core was calling shutdown during
disconnect handling, but drivers didn't expect it to be called until
after all the open file references had been closed. The result was an
oops when the close method tried to use memory that had been
deallocated by shutdown.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: Set cpu_llc_id on AMD CPUs

commit 99bd0c0fc4b04da54cb311953ef9489931c19c63 upstream.

This counts when building sched domains in case NUMA information
is not available.

( See cpu_coregroup_mask() which uses llc_shared_map which in turn is
  created based on cpu_llc_id. )

Currently Linux builds domains as follows:
(example from a dual socket quad-core system)

CPU0 attaching sched-domain:
  domain 0: span 0-7 level CPU
   groups: 0 1 2 3 4 5 6 7

  ...

CPU7 attaching sched-domain:
  domain 0: span 0-7 level CPU
   groups: 7 0 1 2 3 4 5 6

Ever since that is borked for multi-core AMD CPU systems.
This patch fixes that and now we get a proper:

CPU0 attaching sched-domain:
  domain 0: span 0-3 level MC
   groups: 0 1 2 3
   domain 1: span 0-7 level CPU
    groups: 0-3 4-7

  ...

CPU7 attaching sched-domain:
  domain 0: span 4-7 level MC
   groups: 7 4 5 6
   domain 1: span 0-7 level CPU
    groups: 4-7 0-3

This allows scheduler to assign tasks to cores on different sockets
(i.e. that don't share last level cache) for performance reasons.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20090619085909.GJ5218@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: Fix non-lazy GS handling in sys_vm86()

commit 3aa6b186f86c5d06d6d92d14311ffed51f091f40 upstream.

This fixes a stack corruption panic or null dereference oops
due to a bad GS in resume_userspace() when returning from
sys_vm86() and calling lockdep_sys_exit().

Only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR
enabled.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1244384628.2323.4.camel@bimbo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

vt_ioctl: fix lock imbalance

commit a115902f67ef51fbbe83e214fb761aaa9734c1ce upstream.

Don't return from switch/case directly in vt_ioctl. Set ret and break
instead so that we unlock BKL.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cifs: fix fh_mutex locking in cifs_reopen_file

commit f0a71eb820596bd8f6abf64beb4cb181edaa2341 upstream.

Fixes a regression caused by commit a6ce4932fbdbcd8f8e8c6df76812014351c32892

When this lock was converted to a mutex, the locks were turned into
unlocks and vice-versa.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

tracing/urgent: fix unbalanced ftrace_start_up

commit c85a17e22695969aa24a7ffa40cf26d6e6fcfd50 upstream.

Perfcounter reports the following stats for a wide system
profiling:

#
# (2364 samples)
#
# Overhead  Symbol
# ........  ......
#
    15.40%  [k] mwait_idle_with_hints
     8.29%  [k] read_hpet
     5.75%  [k] ftrace_caller
     3.60%  [k] ftrace_call
     [...]

This snapshot has been taken while neither the function tracer nor
the function graph tracer was running.
With dynamic ftrace, such results show a wrong ftrace behaviour
because all calls to ftrace_caller or ftrace_graph_caller (the patched
calls to mcount) are supposed to be patched into nop if none of those
tracers are running.

The problem occurs after the first run of the function tracer. Once we
launch it a second time, the callsites will never be nopped back,
unless you set custom filters.
For example it happens during the self tests at boot time.
The function tracer selftest runs, and then the dynamic tracing is
tested too. After that, the callsites are left un-nopped.

This is because the reset callback of the function tracer tries to
unregister two ftrace callbacks in once: the common function tracer
and the function tracer with stack backtrace, regardless of which
one is currently in use.
It then creates an unbalance on ftrace_start_up value which is expected
to be zero when the last ftrace callback is unregistered. When it
reaches zero, the FTRACE_DISABLE_CALLS is set on the next ftrace
command, triggering the patching into nop. But since it becomes
unbalanced, ie becomes lower than zero, if the kernel functions
are patched again (as in every further function tracer runs), they
won't ever be nopped back.

Note that ftrace_call and ftrace_graph_call are still patched back
to ftrace_stub in the off case, but not the callers of ftrace_call
and ftrace_graph_caller. It means that the tracing is well deactivated
but we waste a useless call into every kernel function.

This patch just unregisters the right ftrace_ops for the function
tracer on its reset callback and ignores the other one which is
not registered, fixing the unbalance. The problem also happens
is .30

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

md/raid5: add missing call to schedule() after prepare_to_wait()

commit 7a3ab908948b6296ee7e81d42f7c176361c51975 upstream.

In the unlikely event that reshape progresses past the current request
while it is waiting for a stripe we need to schedule() before retrying
for 2 reasons:
1/ Prevent list corruption from duplicated list_add() calls without
intervening list_del().
2/ Give the reshape code a chance to make some progress to resolve the
conflict.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

crypto: aes-ni - Fix cbc mode IV saving

commit e6efaa025384f86a18814a6b9f4e5d54484ab9ff upstream.

Original implementation of aesni_cbc_dec do not save IV if input
length % 4 == 0. This will make decryption of next block failed.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath9k: Fix PCI FATAL interrupts by restoring RETRY_TIMEOUT disabling

commit f0214843ba23d9bf6dc6b8ad2c6ee27b60f0322e upstream.

An earlier commit, 'ath9k: remove dummy PCI "retry timeout" fix', removed
code that was documented to disable RETRY_TIMEOUT register (PCI reg
0x41) since it was claimed to be a no-op. However, it turns out that
there are some combinations of hosts and ath9k-supported cards for
which this is not a no-op (reg 0x41 has value 0x80, not 0) and this
code (or something similar) is needed. In such cases, the driver may
be next to unusable due to very frequent PCI FATAL interrupts from the
card.

Reverting the earlier commit, i.e., restoring the RETRY_TIMEOUT
disabling, seems to resolve the issue. Since the removal of this code
was not based on any known issue and was purely a cleanup change, the
safest option here is to just revert that commit. Should there be
desire to clean this up in the future, the change will need to be
tested with a more complete coverage of cards and host systems.

http://bugzilla.kernel.org/show_bug.cgi?id=13483

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath9k: Initialize ANI timers

commit 415f738ecf41b427921b503ecfd427e26f89dc23 upstream.

The various ANI timers have to be initialized properly when
starting the calibration timer.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath9k: Fix memleak on TX DMA failure

commit 675902ef822c114c0dac17ed10eed43eb8f5c9ec upstream.

The driver-specific region has to be freed in case
of a DMA mapping failure.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath9k: Fix bug in scan termination

commit 9c07a7777f44c7e39accec5ad8c4293d6a9b2a47 upstream.

A full HW reset needs to be done on termination of a scan run.
Not setting SC_OP_FULL_RESET resulted in doing a
fast channel change.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath9k: Fix bug in checking HT flag

commit db2f63f60a087ed29ae04310c1076c61f77a5d20 upstream.

The operating HT mode is stored in chanmode and
not channelFlags.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath9k: Fix bug in determining calibration support

commit a451aa66dcb14efcb7addf1d8edcac8df76a97b6 upstream.

ADC gain calibration has to be done for all non 2GHZ-HT20 channels.
Regression from "ath9k: use ieee80211_conf on ath9k_hw_iscal_supported()"

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath9k: Fix bug in calibration initialization

commit 04d19ddd254b404703151ab25aa5041e50ff40f7 upstream.

This patch fixes a bug in ath9k_hw_init_cal() where the wrong
calibration was being done for non-AR9285 chipsets.
Also add a few helpful comments.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath9k: Fix bug when using a card with a busted EEPROM

backport of commit 85efc86eb7c6cbb1c8ce8d99b10b948be033fbb9 upstream.

We fail if your EEPROM is busted but we were never propagated the
error back so such users could end up with a cryptic oops message
like:

IP: [<f883e1b9>] ath9k_reg_apply_world_flags+0x29/0x130 [ath9k]
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: ath9k(+) mac80211 cfg80211
Pid: 4284, comm: insmod Not tainted (2.6.29-wl #3) 7660A14
EIP: 0060:[<f883e1b9>] EFLAGS: 00010286 CPU: 1
EIP is at ath9k_reg_apply_world_flags+0x29/0x130 [ath9k]

Fix this by propagating the error and also lets not leave the
user in the dark and communicate what's going on. When this
happens you will now see this:

ath9k 0000:16:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ath9k: Invalid EEPROM contents

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cfg80211: fix in nl80211_set_reg()

commit 61405e97788b1bc4e7c5be5b4ec04a73fc11bac2 upstream.

There is a race on access to last_request and its alpha2
through reg_is_valid_request() and us possibly processing
first another regulatory request on another CPU. We avoid
this improbably race by locking with the cfg80211_mutex as
we should have done in the first place. While at it add
the assert on locking on reg_is_valid_request().

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cfg80211: return immediately if num reg rules > NL80211_MAX_SUPP_REG_RULES

commit 4776c6e7f66f853011bc1fd6fe37fa63f0b6982c upstream.

This has no functional change except we save a kfree(rd) and
allows us to clean this code up a bit after this. We do
avoid an unnecessary kfree(NULL) but calling that was OK too.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cfg80211: cleanup return calls on nl80211_set_reg()

commit d0e18f833d23afefb6751a21d14a2cd71d2d4d66 upstream.

This has no functional change, but it will make the race
fix easier to spot in my next patch.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cfg80211: fix for duplicate userspace replies

commit 729e9c7663190d71fe5e29831634df80f38199c1 upstream.

This fixes an incorrect assumption (BUG_ON) made in
cfg80211 when handling country IE regulatory requests.
The assumption was that we won't try to call_crda()
twice for the same event and therefore we will not
recieve two replies through nl80211 for the regulatory
request. As it turns out it is true we don't call_crda()
twice for the same event, however, kobject_uevent_env()
*might* send the udev event twice and/or userspace can
simply process the udev event twice. We remove the BUG_ON()
and simply ignore the duplicate request.

For details refer to this thread:

http://marc.info/?l=linux-wireless&m=124149987921337&w=2

Reported-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mac80211: fix minstrel single-rate memory corruption

commit 5ee58d7e6ad019675b4090582aec4fa1180d8703 upstream.

The minstrel rate controller periodically looks up rate indexes in
a sampling table.  When accessing a specific row and column, minstrel
correctly does a bounds check which, on the surface, appears to handle
the case where mi->n_rates < 2.  However, mi->sample_idx is actually
defined as an unsigned, so the right hand side is taken to be a huge
positive number when negative, and the check will always fail.

Consequently, the RC will overrun the array and cause random memory
corruption when communicating with a peer that has only a single rate.
The max value of mi->sample_idx is around 25 so casting to int should
have no ill effects.

Without the change, uptime is a few minutes under load with an AP
that has a single hard-coded rate, and both the AP and STA could
potentially crash.  With the change, both lasted 12 hours with a
steady load.

Thanks to Ognjen Maric for providing the single-rate clue so I could
reproduce this.

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=12490 on the
regression list (also http://bugzilla.kernel.org/show_bug.cgi?id=13000).

Reported-by: Sergey S. Kostyliov <rathamahata@gmail.com>
Reported-by: Ognjen Maric <ognjen.maric@gmail.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ramfs: ignore unknown mount options

commit 0a8eba9b7f7aa3ad0305627c99ad4d6deedd871d upstream.

On systems where CONFIG_SHMEM is disabled, mounting tmpfs filesystems can
fail when tmpfs options are used.  This is because tmpfs creates a small
wrapper around ramfs which rejects unknown options, and ramfs itself only
supports a tiny subset of what tmpfs supports.  This makes it pretty hard
to use the same userspace systems across different configuration systems.
As such, ramfs should ignore the tmpfs options when tmpfs is merely a
wrapper around ramfs.

This used to work before commit c3b1b1cbf0 as previously, ramfs would
ignore all options.  But now, we get:
ramfs: bad mount option: size=10M
mount: mounting mdev on /dev failed: Invalid argument

Another option might be to restore the previous behavior, where ramfs
simply ignored all unknown mount options ... which is what Hugh prefers.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Matt Mackall <mpm@selenic.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ASoC: Remove odd bit clock ratios for WM8903

commit ba2533a47865ec0dbc72834287a8a048e9337a95 upstream.

These are not supported since performance can not be guaranteed
when they are in use.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

lockdep: Select frame pointers on x86

commit 00540e5d54be972a94a3b2ce6da8621bebe731a2 upstream.

x86 stack traces are a piece of crap without frame pointers, and its not
like the 'performance gain' of not having stack pointers matters when you
selected lockdep.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

epoll: fix nested calls support

commit 3fe4a975d662f11037cb710f8b4b158a3e38f9c0 upstream.

This fixes a regression in 2.6.30.

I unfortunately accepted a patch time ago, to drop the "current" usage
from possible IRQ context, w/out proper thought over it. The patch
switched to using the CPU id by bounding the nested call callback with a
get_cpu()/put_cpu().

Unfortunately the ep_call_nested() function can be called with a callback
that grabs sleepy locks (from own f_op->poll()), that results in epic
fails. The following patch uses the proper "context" depending on the
path where it is called, and on the kind of callback.

This has been reported by Stefan Richter, that has also verified the patch
is his previously failing environment.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IB/mlx4: Add strong ordering to local inval and fast reg work requests

commit 2ac6bf4ddc87c3b6b609f8fa82f6ebbffeac12f4 upstream.

The ConnectX Programmer's Reference Manual states that the "SO" bit
must be set when posting Fast Register and Local Invalidate send work
requests. When this bit is set, the work request will be executed
only after all previous work requests on the send queue have been
executed. (If the bit is not set, Fast Register and Local Invalidate
WQEs may begin execution too early, which violates the defined
semantics for these operations)

This fixes the issue with NFS/RDMA reported in
<http://lists.openfabrics.org/pipermail/general/2009-April/059253.html>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ARM: 5545/2: add flush_kernel_dcache_page() for ARM

commit 73be1591579084a8103a7005dd3172f3e9dd7362 upstream.

Without this, the default implementation is a no op which is completely
wrong with a VIVT cache, and usage of sg_copy_buffer() produces
unpredictable results.

Tested-by: Sebastian Andrzej Siewior <bigeasy@breakpoint.cc>
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: hpet: Mark per cpu interrupts IRQF_TIMER to prevent resume failure

commit 507fa3a3d80365c595113a5ac3232309e3dbf5d8 upstream.

timer interrupts are excluded from being disabled during suspend. The
clock events code manages the disabling of clock events on its own
because the timer interrupt needs to be functional before the resume
code reenables the device interrupts.

The hpet per cpu timers request their interrupt without setting the
IRQF_TIMER flag so suspend_device_irqs() disables them as well which
results in a fatal resume failure on the boot CPU.

Adding IRQF_TIMER to the interupt flags when requesting the hpet per
cpu timer interrupts solves the problem.

Reported-by: Benjamin S. <sbenni@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Benjamin S. <sbenni@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ALSA: cmi8330: fix MPU-401 PnP init copy&paste bug

commit c2a30d711852e4f39c8a79135b3caa701f7a8e02 upstream.

Fix copy&paste bug in PnP MPU-401 initialization.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ALSA: hda - Add quirk for Sony VAIO Z21MN

commit 376b508ffde3b17e105265f89b83bdb044b1c1ae upstream.

It needs model=toshiba-s06 to work with the digital-mic.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ALSA: hda - Get back Input Source for ALC262 toshiba-s06 model

commit ae14ef68e8e67ca5b8b29f0eb640f7c106617f4e upstream.

The commit f9e336f65b666b8f1764d17e9b7c21c90748a37e
ALSA: hda - Unify capture mixer creation in realtek codes
removed the "Input Source" mixer element creation for toshiba-s06 model
because it contains a digital-mic input.

This patch take the control back.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ALSA: intel8x0 - Fix PCM position craziness

commit f708eb1d71dc8ffb184da9f0bc53461c6dc10653 upstream.

The PCM pointer callback sometimes returns invalid positions and this
screws up the hw_ptr updater in PCM core. Especially since now the
jiffies check is optional with xrun_debug, the invalid position is
handled as is, and causes serious sound skips, etc.

This patch simplifies the position-fix strategy in intel8x0 to be more
robust:
- just falls back to the last position if bogus position is detected
- another sanity check for the backward move of the position due to
a race of register update and the base-index update

This patch is applicable also for 2.6.30.

Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ALSA: ca0106 - Add missing registrations of vmaster controls

commit 601e1cc5df940b59e71c947726640811897d30df upstream.

Although the vmaster controls are created, they aren't registered thus
they don't appear in the real world. Added the missing snd_ctl_add()
calls.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: handle initrd that extends into unusable memory

commit 8c5dd8f43367f4f266dd616f11658005bc2d20ef upstream.

On a system where system memory (according e820) is not covered by
mtrr, mtrr_trim_memory converts a portion of memory to reserved, but
bootloader has already put the initrd in that range.

Thus, we need to have 64bit to use relocate_initrd too.

[ Impact: fix using initrd when mtrr_trim_memory happen ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: Add quirk for reboot stalls on a Dell Optiplex 360

commit 4a4aca641bc4598e77b866804f47c651ec4a764d upstream.

The Dell Optiplex 360 hangs on reboot, just like the Optiplex 330, so
the same quirk is needed.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Steve Conklin <steve.conklin@canonical.com>
Cc: Leann Ogasawara <leann.ogasawara@canonical.com>
LKML-Reference: <200906051202.38311.jdelvare@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: Fix uv bau sending buffer initialization

commit 9c26f52b900f7207135bafc8789e1a4f5d43e096 upstream.

The initialization of the UV Broadcast Assist Unit's sending
buffers was making an invalid assumption about the
initialization of an MMR that defines its address.

The BIOS will not be providing that MMR. So
uv_activation_descriptor_init() should unconditionally set it.

Tested on UV simulator.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1MJTfj-0005i1-W8@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: move rdtsc_barrier() into the TSC vread method

commit 7d96fd41cadc55f4e00231c8c71b8e25c779f122 upstream.

The *fence instructions were moved to vsyscall_64.c by commit
cb9e35dce94a1b9c59d46224e8a94377d673e204. But this breaks the
vDSO, because vread methods are also called from there.

Besides, the synchronization might be unnecessary for other
time sources than TSC.

[ Impact: fix potential time warp in VDSO ]

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
LKML-Reference: <9d0ea9ea0f866bdc1f4d76831221ae117f11ea67.1243241859.git.ptesarik@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: enable GART-IOMMU only after setting up protection methods

commit fe2245c905631a3a353504fc04388ce3dfaf9d9e upstream.

The current code to set up the GART as an IOMMU enables GART
translations before it removes the aperture from the kernel memory
map, sets the GART PTEs to UC, sets up the guard and scratch
pages, or does a wbinvd(). This leaves the possibility of cache
aliasing open and can cause system crashes.

Re-order the code so as to enable the GART translations only
after all safeguards are in place and the tlb has been flushed.

AMD has tested this patch on both Istanbul systems and 1st
generation Opteron systems with APG enabled and seen no adverse
effects. Istanbul systems with HT Assist enabled sometimes
see MCE errors due to cache artifacts with the unmodified
code.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: akpm@linux-foundation.org
Cc: jbarnes@virtuousgeek.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86, UV: Fix macros for multiple coherency domains

commit c4ed3f04ba9defe22aa729d1646f970f791c03d7 upstream.

Fix bug in the SGI UV macros that support systems with multiple
coherency domains. The macros used for referencing global MMR
(chipset registers) are failing to correctly "or" the NASID
(node identifier) bits that reside above M+N. These high bits
are supplied automatically by the chipset for memory accesses
coming from the processor socket.

However, the bits must be present for references to the special
global MMR space used to map chipset registers. (See uv_hub.h
for more details ...)

The bug results in references to invalid/incorrect nodes.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20090608154405.GA16395@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: Fix UV BAU activation descriptor init

commit 0e2595cdfd7df9f1128f7185152601ae5417483b upstream.

The UV tlb shootdown code has a serious initialization error.

An array of structures [32*8] is initialized as if it were [32].
The array is indexed by (cpu number on the blade)*8, so the short
initialization works for up to 4 cpus on a blade.
But above that, we provide an invalid opcode to the hub's
broadcast assist unit.

This patch changes the allocation of the array to use its symbolic
dimensions for better clarity. And initializes all 32*8 entries.

Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's
recommendation.

Tested on the UV simulator.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1M6lZR-0007kV-Aq@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: memtest: remove 64-bit division

commit c9690998ef48ffefeccb91c70a7739eebdea57f9 upstream.

Using gcc 3.3.5 a "make allmodconfig" + "CONFIG_KVM=n"
triggers a build error:

arch/x86/mm/built-in.o(.init.text+0x43f7): In function `__change_page_attr':
arch/x86/mm/pageattr.c:114: undefined reference to `__udivdi3'
make: *** [.tmp_vmlinux1] Error 1

The culprit turned out to be a division in arch/x86/mm/memtest.c
For more info see this thread:

http://marc.info/?l=linux-kernel&m=124416232620683

The patch entirely removes the division that caused the build
error.

[ Impact: build fix with certain GCC versions ]

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: xiyou.wangcong@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20090608170939.GB12431@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

serial: bfin_5xx: add missing spin_lock init

commit 9c529a3d76dffae943868ebad07b042d15764712 upstream.

The Blackfin serial driver never initialized the spin_lock that is part of
the serial core structure, but we never noticed because spin_lock's are
rarely enabled on UP systems. Yeah lockdep and friends.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

spi: takes size of a pointer to determine the size of the pointed-to type

commit 021415468c889979117b1a07b96f7e36de33e995 upstream.

Do not take the size of a pointer to determine the size of the pointed-to
type.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

jfs: fix regression preventing coalescing of extents

commit f7c52fd17a7dda42fc9e88c2b2678403419bfe63 upstream.

Commit fec1878fe952b994125a3be7c94b1322db586f3b caused a regression in
which contiguous blocks being allocated to the end of an extent were
getting a new extent created. This typically results in files entirely
made up of 1-block extents even though the blocks are contiguous on
disk.

Apparently grub doesn't handle a jfs file being fragmented into too many
extents, since it refuses to boot a kernel from jfs that was created by
the 2.6.30 kernel.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Reported-by: Alex <alevkovich@tut.by>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ISDN: Fix DMA alloc for hfcpci

commit 8a745b9d91962991ce87a649a4dc3af3206c2c8b upstream.

Replace wrong code with correct DMA API functions.

Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sym53c8xx: ratelimit parity errors

commit 75be63bcf73ebdd1fdc1d49f6bf2d1326a1ba7de upstream.

This makes a huge difference when you have a serial console on bootup to limit
these messages to a sane number.

Signed-off-by: John Stoffel <john@stoffel.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

V4L: i2c modules must be linked before the v4l2 drivers

(backported from commit df59f0b3df3cc35fa03ea395f5106d1625e3726a)

Please note that this patch attached has been BACKPORTED to fit kernel
2.6.30.y

Since i2c autoprobing is no longer supported by v4l2 we need to make sure
that the i2c modules are linked before the v4l2 modules. The v4l2 modules
now rely on the presence of the i2c modules, so these must have initialized
themselves before the v4l2 modules.

The exception is the ir-kbd-i2c module, which is the only one still using
autoprobing. This one should be loaded at the end of the v4l2 module. Loading
it earlier actually causes problems with tveeprom. Once ir-kbd-i2c is no
longer autoprobing, then it has to move up as well.

This is only an issue when everything is compiled into the kernel.

Thanks to Marcus Swoboda for reporting this and Udo Steinberg for testing
this patch.

Tested-by: Udo A. Steinberg <udo@hypervisor.org>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

V4L: pvrusb2: Re-fix hardware scaling on video standard change

(cherry picked from commit a6862da2f3c7ce3ec6644958bc8937b630b9e2c1)

The cx25840 module's VBI initialization logic uses the current video
standard as part of its internal algorithm.  This therefore means that
we probably need to make sure that the correct video standard has been
set before initializing VBI.  (Normally we would not care about VBI,
but as described in an earlier changeset, VBI must be initialized
correctly on the cx25840 in order for the chip's hardware scaler to
operate correctly.)

It's kind of messy to force the video standard to be set before
initializing VBI (mainly because we can't know what the app really
wants that early in the initialization process).  So this patch does
the next best thing: VBI is re-initialized after any point where the
video standard has been set.

Signed-off-by: Mike Isely <isely@pobox.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

V4L: pvrusb2: Fix hardware scaling when used with cx25840

(cherry picked from commit e17d787c513f41f59969247062561fff6340f211)

The cx25840 module requires that its VBI initialization entry point be
called in order for hardware-scaled video capture to work properly -
even if we don't care about VBI.  Making this behavior even more
subtle is that if the capture resolution is set to 720x480 - which is
the default that the pvrusb2 driver sets up - then the cx25840
bypasses the hardware scaler.  Therefore this problem does not
manifest itself until some other resolution, e.g. 640x480, is tried.
MythTV typically defaults to 640x480 or 480x480, which means that
things break whenever the driver is used with MythTV.

This all has been known for a while (since at least Nov 2006), but
recent changes in the pvrusb2 driver (specifically in regards to
sub-device support) caused this to break again.  VBI initialization
must happen *after* the chip's firmware is loaded, not before.  With
this fix, 24xxx devices work correctly again.

A related fix that is part of this changeset is that now we
re-initialize VBI any time after we issue a reset to the cx25840
driver.  Issuing a chip reset erases the state that the VBI setup
previously did.  Until the HVR-1950 came along this subtlety went
unnoticed, because the pvrusb2 driver previously never issued such a
reset.  But with the HVR-1950 we have to do that reset in order to
correctly transition from digital back to analog mode - and since the
HVR-1950 always starts in digital mode (required for the DVB side to
initialize correctly) then this device has never had a chance to work
correctly in analog mode!  Analog capture on the HVR-1950 has been
broken this *ENTIRE* time.  I had missed it until now because I've
usually been testing at the default 720x480 resolution which does not
require scaling...  What fun.  By re-initializing VBI after a cx25840
chip reset, correct behavior is restored.

Signed-off-by: Mike Isely <isely@pobox.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

V4L: ivtv/cx18: fix regression: class controls are no longer seen

(cherry picked from commit c6711c3e6d4976716633047c0f6bbd953d6831fb)

A previous change (v4l2-common: remove v4l2_ctrl_query_fill_std) broke
the handling of class controls in VIDIOC_QUERYCTRL. The MPEG class control
was broken for all drivers that use the cx2341x module and the USER class
control was broken for ivtv and cx18.

This change adds back proper class control support.

Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

DVB: lgdt3305: fix 64bit division in function lgdt3305_set_if

(cherry picked from commit 511da457340d3b30336f7a6731bad9bbe3ffaf08)

Signed-off-by: Michael Krufky <mkrufky@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: usbtmc: fix switch statment

commit a92b63e7e4c185b4dd9e87762e2cb716e54482d0 upstream.

Steve Holland pointed out that we forgot to call break; in the switch
statment. This probably resolves a lot of the bug reports I've gotten
for the driver lately.

Stupid me...

Reported-by: Steve Holland <sdh4@iastate.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: Detect use of extended APIC ID for AMD CPUs

commit 42937e81a82b6bbc51a309c83da140b3a7ca5945 upstream.

Booting a 32-bit kernel on Magny-Cours results in the following panic:

  ...
  Using APIC driver default
  ...
  Overriding APIC driver with bigsmp
  ...
  Getting VERSION: 80050010
  Getting VERSION: 80050010
  Getting ID: 10000000
  Getting ID: ef000000
  Getting LVT0: 700
  Getting LVT1: 10000
  Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (16 vs 0)
  Pid: 1, comm: swapper Not tainted 2.6.30-rcX #2
  Call Trace:
   [<c05194da>] ? panic+0x38/0xd3
   [<c0743102>] ? native_smp_prepare_cpus+0x259/0x31f
   [<c073b19d>] ? kernel_init+0x3e/0x141
   [<c073b15f>] ? kernel_init+0x0/0x141
   [<c020325f>] ? kernel_thread_helper+0x7/0x10

The reason is that default_get_apic_id handled extension of local APIC
ID field just in case of XAPIC.

Thus for this AMD CPU, default_get_apic_id() returns 0 and
bigsmp_get_apic_id() returns 16 which leads to the respective kernel
panic.

This patch introduces a Linux specific feature flag to indicate
support for extended APIC id (8 bits instead of 4 bits width) and sets
the flag on AMD CPUs if applicable.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20090608135509.GA12431@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

epca: fix test_bit parameters

commit c3301a5c04800bcf8afc8a815bf9e570a4e25a08 upstream.

Switch from ASYNC_* to ASYNCB_*, because test_bit expects
bit number, not mask.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

rocket: fix test_bit parameters

commit a391ad0f09014856bbc4eeea309593eba977b3b0 upstream.

Switch from ASYNC_* to ASYNCB_*, because {test,set}_bit expect
bit number, not mask.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

serial: refactor ASYNC_ flags

commit 70beaed22cbe12979e55d99b370e147e2e168562 upstream.

Define ASYNCB_* flags which are bit numbers of the ASYNC_* flags.
This is useful for {test,set,clear}_bit.

Also convert each ASYNC_% to be (1 << ASYNCB_%) and define masks
with the macros, not constants.

Tested with:
#include "PATH_TO_KERNEL/include/linux/serial.h"
static struct {
        unsigned int new, old;
} as[] = {
        { ASYNC_HUP_NOTIFY, 0x0001 },
        { ASYNC_FOURPORT, 0x0002 },
...
{ ASYNC_BOOT_ONLYMCA, 0x00400000 },
        { ASYNC_INTERNAL_FLAGS, 0xFFC00000 }
};
...
        for (a = 0; a < ARRAY_SIZE(as); a++)
                if (as[a].old != as[a].new)
                        printf("%.8x != %.8x\n", as[a].old, as[a].new);

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

char: moxa, prevent opening unavailable ports

commit a90b037583d5f1ae3e54e9c687c79df82d1d34a4 upstream.

In moxa.c there are 32 minor numbers reserved for each device.  The number
of ports actually available per device is stored in
moxa_board_conf->numPorts.  This number is not considered in moxa_open().
Opening a port that is not available results in a kernel oops.  This patch
adds a test to moxa_open() that prevents opening unavailable ports.

[akpm@linux-foundation.org: avoid multiple returns]
Signed-off-by: Dirk Eibach <eibach@gdsys.de>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dma-debug: change hash_bucket_find from first-fit to best-fit

commit 7caf6a49bb17d0377210693af5737563b31aa5ee upstream.

Some device drivers map the same physical address multiple times to a
dma address. Without an IOMMU this results in the same dma address being
put into the dma-debug hash multiple times. With a first-fit match in
hash_bucket_find() this function may return the wrong dma_debug_entry.

This can result in false positive warnings. This patch fixes it by
changing the first-fit behavior of hash_bucket_find() into a best-fit
algorithm.

Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Reported-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: lethal@linux-sh.org
Cc: just.for.lkml@googlemail.com
Cc: hancockrwd@gmail.com
Cc: jens.axboe@oracle.com
Cc: bharrosh@panasas.com
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20090605104132.GE24836@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

bonding: fix multiple module load problem

[ Upstream commit 130aa61a77b8518f1ea618e1b7d214d60b405f10 ]

Some users still load bond module multiple times to create bonding
devices.  This accidentally was broken by a later patch about
the time sysfs was fixed.  According to Jay, it was broken
by:
   commit b8a9787eddb0e4665f31dd1d64584732b2b5d051
   Author: Jay Vosburgh <fubar@us.ibm.com>
   Date:   Fri Jun 13 18:12:04 2008 -0700

     bonding: Allow setting max_bonds to zero

Note: sysfs and procfs still produce WARN() messages when this is done
so the sysfs method is the recommended API.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x25: Fix sleep from timer on socket destroy.

[ Upstream commit 14ebaf81e13ce66bff275380b246796fd16cbfa1 ]

If socket destuction gets delayed to a timer, we try to
lock_sock() from that timer which won't work.

Use bh_lock_sock() in that case.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

via-velocity: Fix velocity driver unmapping incorrect size.

[ Upstream commit f6b24caaf933a466397915a08e30e885a32f905a ]

When a packet is greater than ETH_ZLEN, we end up assigning the
boolean result of a comparison to the size we unmap.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

tun: Fix unregister race

[ Upstream commit f0a4d0e5b5bfd271e6737f7c095994835b70d450 ]

It is possible for tun_chr_close to race with dellink on the
a tun device. In which case if __tun_get runs before dellink
but dellink runs before tun_chr_close calls unregister_netdevice
we will attempt to unregister the netdevice after it is already
gone.

The two cases are already serialized on the rtnl_lock, so I have
gone for the cheap simple fix of moving rtnl_lock to cover __tun_get
in tun_chr_close. Eliminating the possibility of the tun device
being unregistered between __tun_get and unregister_netdevice in
tun_chr_close.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Tested-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

sky2: don't look for VPD size

[ Upstream commit 6cc90a5a6061428358d0f726a53fb44af5254111 ]

The code to compute VPD size didn't handle some systems that use
chip without VPD. Also some of the newer chips use some additional
registers to store the actual size, and wasn't worth putting the
additional complexity in, so just remove the code.

No big loss since the code to set the VPD size was only a
convenience so that utilities would not read the extra space past
the end of the available VPD.

Move the first PCI config read earlier to detect bad hardware
where it returns all ones and refuse loading driver before furthur
damage.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off

[ Upstream commit b6280b47a7a42970d098a3059f4ebe7e55e90d8d ]

When route caching is disabled (rt_caching returns false), We still use route
cache entries that are created and passed into rt_intern_hash once.  These
routes need to be made usable for the one call path that holds a reference to
them, and they need to be reclaimed when they're finished with their use.  To be
made usable, they need to be associated with a neighbor table entry (which they
currently are not), otherwise iproute_finish2 just discards the packet, since we
don't know which L2 peer to send the packet to.  To do this binding, we need to
follow the path a bit higher up in rt_intern_hash, which calls
arp_bind_neighbour, but not assign the route entry to the hash table.
Currently, if caching is off, we simply assign the route to the rp pointer and
are reutrn success.  This patch associates us with a neighbor entry first.

Secondly, we need to make sure that any single use routes like this are known to
the garbage collector when caching is off.  If caching is off, and we try to
hash in a route, it will leak when its refcount reaches zero.  To avoid this,
this patch calls rt_free on the route cache entry passed into rt_intern_hash.
This places us on the gc list for the route cache garbage collector, so that
when its refcount reaches zero, it will be reclaimed (Thanks to Alexey for this
suggestion).

I've tested this on a local system here, and with these patches in place, I'm
able to maintain routed connectivity to remote systems, even if I set
/proc/sys/net/ipv4/rt_cache_rebuild_count to -1, which forces rt_caching to
return false.

Signed-off-by: Neil Horman <nhorman@redhat.com>
Reported-by: Jarek Poplawski <jarkao2@gmail.com>
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ipv4: fix NULL pointer + success return in route lookup path

[ Upstream commit 73e42897e8e5619eacb787d2ce69be12f47cfc21 ]

Don't drop route if we're not caching

I recently got a report of an oops on a route lookup.  Maxime was
testing what would happen if route caching was turned off (doing so by setting
making rt_caching always return 0), and found that it triggered an oops.  I
looked at it and found that the problem stemmed from the fact that the route
lookup routines were returning success from their lookup paths (which is good),
but never set the **rp pointer to anything (which is bad).  This happens because
in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0.
This almost emulates slient success.  What we should be doing is assigning *rp =
rt and _not_ dropping the route.  This way, during slow path lookups, when we
create a new route cache entry, we don't immediately discard it, rather we just
don't add it into the cache hash table, but we let this one lookup use it for
the purpose of this route request.  Maxime has tested and reports it prevents
the oops.  There is still a subsequent routing issue that I'm looking into
further, but I'm confident that, even if its related to this same path, this
patch makes sense to take.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

pegasus usb-net: Fix endianness bugs

[ Upstream commit e3453f6342110d60edb37be92c4a4f668ca8b0c4 ]

This fixes various endianness bugs. Some harmless and some real ones.
This is tested on a PowerPC-64 machine.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e1000e: stop unnecessary polling when using msi-x

[ Upstream commit 679e8a0f0ae3333e94b1d374d07775fce9066025 ]

The last hunk of this commit:

    commit 12d04a3c12b420f23398b4d650127642469a60a6
    Author: Alexander Duyck <alexander.h.duyck@intel.com>
    Date:   Wed Mar 25 22:05:03 2009 +0000

        e1000e: commonize tx cleanup routine to match e1000 & igb

changed the logic for determining if we should call napi_complete or
not at then end of a napi poll.

If the NIC is using MSI-X with no work to do in ->poll, net_rx_action
can just spin indefinitely on older kernels and for 2 jiffies on newer
kernels since napi_complete is never called and budget isn't
decremented.

Discovered and verified while testing driver backport to an older
kernel.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IMA: open all files O_LARGEFILE

commit 1a62e958fa4aaeeb752311b4f5e16b2a86737b23 upstream.

If IMA tried to measure a file which was larger than 4G dentry_open would fail
with -EOVERFLOW since IMA wasn't passing O_LARGEFILE. This patch passes
O_LARGEFILE to all IMA opens to avoid this problem.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IMA: Handle dentry_open failures

commit f06dd16a03f6f7f72fab4db03be36e28c28c6fd6 upstream.

Currently IMA does not handle failures from dentry_open(). This means that we
leave a pointer set to ERR_PTR(errno) and then try to use it just a few lines
later in fput(). Oops.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

IMA: use current_cred() instead of current->cred

commit 37bcbf13d32e4e453e9def79ee72bd953b88302f upstream.

Proper invocation of the current credentials is to use current_cred() not
current->cred. This patches makes IMA use the new method.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: Fix dirty bit tracking for slots with large pages

commit e244584fe3a5c20deddeca246548ac86dbc6e1d1 upstream.

When slot is already allocated and being asked to be tracked we need
to break the large pages.

This code flush the mmu when someone ask a slot to start dirty bit
tracking.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: protect concurrent make_all_cpus_request

commit 84261923d3dddb766736023bead6fa07b7e218d5 upstream.

make_all_cpus_request contains a race condition which can
trigger false request completed status, as follows:

CPU0                                              CPU1

if (test_and_set_bit(req,&vcpu->requests))
   ....                                            if (test_and_set_bit(req,&vcpu->requests))
   ..                                                  return
proceed to smp_call_function_many(wait=1)

Use a spinlock to serialize concurrent CPUs.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: VMX: Handle vmx instruction vmexits

commit e3c7cb6ad7191e92ba89d00a7ae5f5dd1ca0c214 upstream.

IF a guest tries to use vmx instructions, inject a #UD to let it know the
instruction is not implemented, rather than crashing.

This prevents guest userspace from crashing the guest kernel.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: x86: check for cr3 validity in ioctl_set_sregs

commit 59839dfff5eabca01cc4e20b45797a60a80af8cb upstream.

Matt T. Yourst notes that kvm_arch_vcpu_ioctl_set_sregs lacks validity
checking for the new cr3 value:

"Userspace callers of KVM_SET_SREGS can pass a bogus value of cr3 to
the kernel. This will trigger a NULL pointer access in gfn_to_rmap()
when userspace next tries to call KVM_RUN on the affected VCPU and kvm
attempts to activate the new non-existent page table root.

This happens since kvm only validates that cr3 points to a valid guest
physical memory page when code *inside* the guest sets cr3. However, kvm
currently trusts the userspace caller (e.g. QEMU) on the host machine to
always supply a valid page table root, rather than properly validating
it along with the rest of the reloaded guest state."

http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2687641&group_id=180599

Check for a valid cr3 address in kvm_arch_vcpu_ioctl_set_sregs, triple
fault in case of failure.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: Prevent overflow in largepages calculation

commit 09f8ca74ae6c2d78b2c7f6c0751ed0cbe815a3d9 upstream.

If userspace specifies a memory slot that is larger than 8 petabytes, it
could overflow the largepages variable.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: Disable large pages on misaligned memory slots

commit ac04527f7947020c5890090b2ac87af4e98d977e upstream.

If a slots guest physical address and host virtual address unequal (mod
large page size), then we would erronously try to back guest large pages
with host large pages. Detect this misalignment and diable large page
support for the trouble slot.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: Add VT-x machine check support v4

(replaces commit a0861c02a981c943573478ea13b29b1fb958ee5b upstream in a
cleaner way for the 2.6.30 kernel tree)

VT-x needs an explicit MC vector intercept to handle machine checks in
the hypervisor.

It also has a special option to catch machine checks that happen
during VT entry.

Do these interceptions and forward them to the Linux machine check
handler. Make it always look like user space is interrupted because
the machine check handler treats kernel/user space differently.

Thanks to Huang Ying and Jiang Yunhong for help and testing.

Cc: ying.huang@intel.com
v2: Handle machine checks still in interrupt off context
to avoid problems on preemptible kernels.
v3: Handle old style 32bit and make fully standalone

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

PCI: disable ASPM on VIA root-port-under-bridge configurations

commit 8e822df700694ca6850d1e0c122fd7004b2778d8 upstream.

VIA has a strange chipset, it has root port under a bridge. Disable ASPM
for such strange chipset.

Tested-by: Wolfgang Denk <wd@denx.de>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

fs: remove incorrect I_NEW warnings

commit 545b9fd3d737afc0bb5203b1e79194a471605acd upstream.

Some filesystems can call in to sync an inode that is still in the
I_NEW state (eg. ext family, when mounted with -osync). This is OK
because the filesystem has sole access to the new inode, so it can
modify i_state without races (because no other thread should be
modifying it, by definition of I_NEW). Ie. a false positive, so
remove the warnings.

The races are described here 7ef0d7377cb287e08f3ae94cebc919448e1f5dff,
which is also where the warnings were introduced.

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

firmware_map: fix hang with x86/32bit

commit 3b0fde0fac19c180317eb0601b3504083f4b9bf5 upstream.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13484

Peer reported:
| The bug is introduced from kernel 2.6.27, if E820 table reserve the memory
| above 4G in 32bit OS(BIOS-e820: 00000000fff80000 - 0000000120000000
| (reserved)), system will report Int 6 error and hang up. The bug is caused by
| the following code in drivers/firmware/memmap.c, the resource_size_t is 32bit
| variable in 32bit OS, the BUG_ON() will be invoked to result in the Int 6
| error. I try the latest 32bit Ubuntu and Fedora distributions, all hit this
| bug.
|======
|static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
| const char *type,
| struct firmware_map_entry *entry)

and it only happen with CONFIG_PHYS_ADDR_T_64BIT is not set.

it turns out we need to pass u64 instead of resource_size_t for that.

[akpm@linux-foundation.org: add comment]
Reported-and-tested-by: Peer Chen <pchen@nvidia.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.30