git.karo-electronics.de Git - karo-tx-linux.git/log

smsc9420: prevent BUG() if ethtool is called with interface down

[ Upstream commit 6c53b1b15e222244358d3cbbefd2a13920faa352 ]

This patch fixes a null pointer dereference BUG() if ethtool is used on
an smsc9420 interface while it is down, because the phy_dev is only
allocated while the interface is up.

Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

NET: smc91x: Fix irq flags

[ Upstream commit d5ccd67bb77ced5249067d05171992a7d5020393 ]

smc91x.h defines SMC_IRQ_FLAGS to be -1 when it wants the interrupt
flags to be taken from the resource structure. However, d280ead
changed this to checking for non-zero resource flags.

Unfortunately, this means that on some platforms, we end up passing
'-1' to request_irq rather than the desired result. Combine the two
conditions into one so that the IRQ flags are taken from the resource
if either SMC_IRQ_FLAGS is -1 or the resource flags specify an
interrupt trigger.

This restores network on at least the Versatile platform.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

net: Fix the rollback test in dev_change_name()

[ Upstream commit 91e9c07bd635353d1a278bdb38dbb56ac371bcb8 ]

net: Fix the rollback test in dev_change_name()

In dev_change_name() an err variable is used for storing the original
call_netdevice_notifiers() errno (negative) and testing for a rollback
error later, but the test for non-zero is wrong, because the err might
have positive value as well - from dev_alloc_name(). It means the
rollback for a netdevice with a number > 0 will never happen. (The err
test is reordered btw. to make it more readable.)

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Revert "isdn: isdn_ppp: Use SKB list facilities instead of home-grown implementation."

[ Upstream commit e29d4363174949a7a4e46f670993d7ff43342c1c ]

This reverts commit 38783e671399b5405f1fd177d602c400a9577ae6.

It causes kernel bugzilla #14594

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ip_fragment: also adjust skb->truesize for packets not owned by a socket

[ Upstream commit b2722b1c3a893ec6021508da15b32282ec79f4da ]

When a large packet gets reassembled by ip_defrag(), the head skb
accounts for all the fragments in skb->truesize. If this packet is
refragmented again, skb->truesize is not re-adjusted to reflect only
the head size since its not owned by a socket. If the head fragment
then gets recycled and reused for another received fragment, it might
exceed the defragmentation limits due to its large truesize value.

skb_recycle_check() explicitly checks for linear skbs, so any recycled
skb should reflect its true size in skb->truesize. Change ip_fragment()
to also adjust the truesize value of skbs not owned by a socket.

Reported-and-tested-by: Ben Menchaca <ben@bigfootnetworks.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

gro: Fix illegal merging of trailer trash

[ Upstream commit 69c0cab120a85471054614418b447349caba22d7 ]

When we've merged skb's with page frags, and subsequently receive
a trailer skb (< MSS) that is not completely non-linear (this can
occur on Intel NICs if the packet size falls below the threshold),
GRO ends up producing an illegal GSO skb with a frag_list.

This is harmless unless the skb is then forwarded through an
interface that requires software GSO, whereupon the GSO code
will BUG.

This patch detects this case in GRO and avoids merging the
trailer skb.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b44: Fix wedge when using netconsole.

[ Upstream commit 0cae200eec6330cd2c20b24279597be1da50dc93 ]

Fixes kernel bugzilla #14691

Due to the way netpoll works, it is perfectly legal to see
NAPI already scheduled when new device events are pending
in b44_interrupt().

So logging a message about it is wrong and in fact harmful.

Based upon a patch by Andreas Mohr.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b44 WOL setup: one-bit-off stack corruption kernel panic fix

[ Upstream commit: e0188829cb724e7d12a2d4e343b368ff1d6e1471 ]

About 50% of shutdowns of b44 Ethernet adapter ends by kernel panic
with kernels compiled with stack-protector.

Checking b44_magic_pattern() return values, one call of
b44_magic_pattern() returns 127. It means, that set_bit(128, pmask)
was called on line 1509. It means that bit 0 of 17th byte of pmask was
overwritten. But pmask has only 16 bytes. Stack corruption happens.

It seems that set_bit() on line 1509 always writes one bit off.

The fix does not only solve the stack corruption, but also makes Wake
On LAN working on my onboard B44 on Asus A7V-333X mainboard.

It seems that this problem affects all kernel versions since commit
725ad800 ([PATCH] b44: add wol for old nic) on 2006-06-20.

Signed-off-by: Stanislav Brabec <sbrabec@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Au1x00: fix crash when trying register_netdev()

[ Upstream commit 63edaf647607795a065e6956a79c47f500dc8447 ]

Andreas Lohre reported that the driver crashes when trying
to register_netdev(), he sugessted to move dev->netdev_ops initialization
before calling register_netdev(), it worked for him.

Reported-by: Andreas Lohre <alohre@gmail.com>
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

slc90e66: fix UDMA handling

[ Upstream commit ee31527a02b0a8e1aa4a5e4084d2db5fa34737ed ]

Fix checking of the currently programmed UDMA mode.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Revert "ide: try to use PIO Mode 0 during probe if possible"

[ Upstream commit 0fb18c4777ff424c1db694af98443a201fa4fc30 ]

This reverts commit 6029336426a2b43e4bc6f4a84be8789a047d139e.

Ok, we really do need to revert this, even with Bart's sis5513.c
fix in there.

The problem is that several driver's ->set_pio_mode() method
depends upon the drive->media type being set properly. Most
of them use this to enable prefetching, which can only be done
for disk media.

But the commit being reverted here calls ->set_pio_mode() before
it's setup. Actually it considers everything disk because that
is the default media type set by ide_port_init_devices_data().

The set of drivers that depend upon the media type in their
->set_pio_method() are:

drivers/ide/alim15x3.c
drivers/ide/it8172.c
drivers/ide/it8213.c
drivers/ide/pdc202xx_old.c
drivers/ide/piix.c
drivers/ide/qd65xx.c
drivers/ide/sis5513.c
drivers/ide/slc90e66.c

And it is possible that we could fix this by guarding the prefetching
and other media dependent setting changes with a test on
IDE_PFLAG_PROBING in hwif->port_flags, that's simply too risky for
2.6.32-rcX and -stable.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ide: fix ioctl to pass requested transfer mode to ide_find_dma_mode instead of UDMA6

[ Upstream commit 28c1969ff887bc2a7df39272850dece01de03285 ]

Currently, ide_cmd_ioctl when invoked for setting DMA transfer mode calls
ide_find_dma_mode with requested mode as XFER_UDMA_6. This prevents setting DMA
mode to any other value than the default (maximum) supported by the device (or
UDMA6, if supported) irrespective of the actual requested transfer mode and
returns error.

For example, setting mode to UDMA2 using hdparm, where UDMA4 is the default
transfer mode gives following error:
# ./hdparm -d1 -Xudma2 /dev/hda
/dev/hda:hda: UDMA/66 mode selected
setting using_dma to 1 (on)
hda: UDMA/66 mode selected
setting xfermode to 66 (UltraDMA mode2)
HDIO_DRIVE_CMD(setxfermode) failed: Invalid argument
using_dma = 1 (on)

This patch fixes the issue.

Signed-off-by: Hemant Pedanekar <hemantp@ti.com>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ide: Serialize CMD643 and CMD646 to fix a hardware bug with SSD

[ Upstream commit 9bd7496f5dd488e109e91d9d5743915fb4dfbfde ]

CMD646 corrupts data on concurrent transfers on both channels when IDE SSD is
connected to one of the channels.

Setup that demonstrates this hardware bug: Ultra 5, onboard CMD646, rev 3.
/dev/hda is 8GB Seagate ST38410A in MWDMA2
/dev/hdd is 32GB SSD SiliconHardDisk in MWDMA2

- When reading /dev/hdd (for example with dd or fsck), reads from /dev/hda
  are corrupted, there are twiddled single bits 1->0 and some full 32-bit
  words corrupted, sometimes commands fail (which switches /dev/hda to
  PIO mode but the corruptions happen even in PIO).
- Reads from /dev/hdd don't seem to be corrupted (i.e. fsck passes fine).
- When I connected normal rotating harddisk to /dev/hdd, there was no
  corruption, so the corruption is something specific to SSD.
- I tried the same setup on a PCI card with CMD649 and saw no corruption.

This patch serializes the operation for CMD646 and 643 (I didn't test
CMD643 but it may have the same hw bug too because it's earlier design).
CMD649 is good. I don't know anything about CMD 648.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Frans Pop <elendil@planet.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Serial: Do not read IIR in serial8250_start_tx when UART_BUG_TXEN

commit 68cb4f8e246bbbc649980be0628cae9265870a91 upstream.

Do not read IIR in serial8250_start_tx when UART_BUG_TXEN

Reading the IIR clears some oustanding interrupts so it is not safe.
Instead, simply transmit immediately if the buffer is empty without
regard to IIR.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

powerpc: Fix usage of 64-bit instruction in 32-bit altivec code

commit e090aa80321b64c3b793f3b047e31ecf1af9538d upstream.

e821ea70f3b4873b50056a1e0f74befed1014c09 introduced a bug by copying
some 64-bit originated code as-is to be used by both 32 and 64-bit
but this code contains a 64-bit ony "cmpdi" instruction.

This changes it to cmpwi, which is fine since VRSAVE can only contains
a 32-bit value anyway.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mm: hugetlb: fix hugepage memory leak in walk_page_range()

commit d33b9f45bd24a6391bc05e2b5a13c1b5787ca9c2 upstream.

Most callers of pmd_none_or_clear_bad() check whether the target page is
in a hugepage or not, but walk_page_range() do not check it.  So if we
read /proc/pid/pagemap for the hugepage on x86 machine, the hugepage
memory is leaked as shown below.  This patch fixes it.

Details
=======
My test program (leak_pagemap) works as follows:
- creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
- read()/write() something on it,
- call page-types with option -p (walk around the page tables),
- munmap() and unlink() the file on hugetlbfs

Without my patches
------------------
$ cat /proc/meminfo |grep "HugePage"
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ./leak_pagemap
[snip output]
$ cat /proc/meminfo |grep "HugePage"
HugePages_Total:    1000
HugePages_Free:      900
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ls /hugetlbfs/
$

100 hugepages are accounted as used while there is no file on hugetlbfs.

With my patches
---------------
$ cat /proc/meminfo |grep "HugePage"
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ./leak_pagemap
[snip output]
$ cat /proc/meminfo |grep "HugePage"
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ls /hugetlbfs
$

No memory leaks.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mm: hugetlb: fix hugepage memory leak in mincore()

commit 4f16fc107d9c9b8a72aa19b189a9216e90a7aaef upstream.

Most callers of pmd_none_or_clear_bad() check whether the target page is
in a hugepage or not, but mincore() and walk_page_range() do not check it.
So if we use mincore() on a hugepage on x86 machine, the hugepage memory
is leaked as shown below.  This patch fixes it by extending mincore()
system call to support hugepages.

Details
=======
My test program (leak_mincore) works as follows:
- creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
- read()/write() something on it,
- call mincore() for first ten pages and printf() the values of *vec
- munmap() and unlink() the file on hugetlbfs

Without my patch
----------------
$ cat /proc/meminfo| grep "HugePage"
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ./leak_mincore
vec[0] 0
vec[1] 0
vec[2] 0
vec[3] 0
vec[4] 0
vec[5] 0
vec[6] 0
vec[7] 0
vec[8] 0
vec[9] 0
$ cat /proc/meminfo |grep "HugePage"
HugePages_Total:    1000
HugePages_Free:      999
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ls /hugetlbfs/
$

Return values in *vec from mincore() are set to 0, while the hugepage
should be in memory, and 1 hugepage is still accounted as used while
there is no file on hugetlbfs.

With my patch
-------------
$ cat /proc/meminfo| grep "HugePage"
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ./leak_mincore
vec[0] 1
vec[1] 1
vec[2] 1
vec[3] 1
vec[4] 1
vec[5] 1
vec[6] 1
vec[7] 1
vec[8] 1
vec[9] 1
$ cat /proc/meminfo |grep "HugePage"
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ls /hugetlbfs/
$

Return value in *vec set to 1 and no memory leaks.

[akpm@linux-foundation.org: cleanup]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

mac80211: Fix bug in computing crc over dynamic IEs in beacon

commit 1814077fd12a9cdf478c10076e9c42094e9d9250 upstream.

On a 32-bit machine, BIT() macro does not give the required
bit value if the bit is mroe than 31. In ieee802_11_parse_elems_crc(),
BIT() is suppossed to get the bit value more than 31 (42 (id of ERP_INFO_IE),
37 (CHANNEL_SWITCH_IE), (42), 32 (POWER_CONSTRAINT_IE), 45 (HT_CAP_IE),
61 (HT_INFO_IE)). As we do not get the required bit value for the above
IEs, crc over these IEs are never calculated, so any dynamic change in these
IEs after the association is not really handled on 32-bit platforms.
This patch fixes this issue.

Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

drm/radeon/kms: fix legacy crtc2 dpms

commit 8de21525439e6b5bb8d8c81e49094d867bf82f6d upstream.

noticed by Matthijs Kooijman on fdo bug 22140

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

drm/radeon/kms: Add quirk for HIS X1300 board

commit 4e3f9b78ff917cc5c833858fdb5d96bc262e0bf3 upstream.

Board is DVI+VGA, not DVI+DVI

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: GART: pci-gart_64.c: Use correct length in strncmp

commit 41855b77547fa18d90ed6a5d322983d3fdab1959 upstream.

Signed-off-by: Joe Perches <joe@perches.com>
LKML-Reference: <1257818330.12852.72.camel@Joe-Laptop.home>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: Fix typo in Intel CPU cache size descriptor

commit e02e0e1a130b9ca37c5186d38ad4b3aaf58bb149 upstream.

I double-checked the datasheet. One of the existing
descriptors has a typo: it should be 2MB not 2038 KB.

Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <20091110200120.GA27090@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: Fix iommu=nodac parameter handling

commit 2ae8bb75db1f3de422eb5898f2a063c46c36dba8 upstream.

iommu=nodac should forbid dac instead of enabling it. Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Matteo Frigo <athena@fftw.org>
LKML-Reference: <4AE5B52A.4050408@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree

commit 4528752f49c1f4025473d12bc5fa9181085c3f22 upstream.

On a multi-node x3950M2 system, there's a slight oddity in the
PCI device tree for all secondary nodes:

30:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-33:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
     \-34:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

...as compared to the primary node:

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-01:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
03:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
  \-04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

In both nodes, the LSI RAID controller hangs off a CalIOC2
device, but on the secondary nodes, the BIOS hides the VGA
device and substitutes the device tree ending with the disk
controller.

It would seem that Calgary devices don't necessarily appear at
the top of the PCI tree, which means that the current code to
find the Calgary IOMMU that goes with a particular device is
buggy.

Rather than walk all the way to the top of the PCI
device tree and try to match bus number with Calgary descriptor,
the code needs to examine each parent of the particular device;
if it encounters a Calgary with a matching bus number, simply
use that.

Otherwise, we BUG() when the bus number of the Calgary doesn't
match the bus number of whatever's at the top of the device tree.

Extra note: This patch appears to work correctly for the x3950
that came before the x3950 M2.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jon D. Mason <jdmason@kudzu.us>
Cc: Corinna Schultz <coschult@us.ibm.com>
LKML-Reference: <20091202230556.GG10295@tux1.beaverton.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: ASUS P4S800 reboot=bios quirk

commit 4832ddda2ec4df96ea1eed334ae2dbd65fc1f541 upstream.

Bug reporter noted their system with an ASUS P4S800 motherboard would
hang when rebooting unless reboot=b was specified.  Their dmidecode
didn't contain descriptive System Information for Manufacturer or
Product Name, so I used their Base Board Information to create a
reboot quirk patch.  The bug reporter confirmed this patch resolves
the reboot hang.

Handle 0x0001, DMI type 1, 25 bytes
System Information
       Manufacturer: System Manufacturer
       Product Name: System Name
       Version: System Version
       Serial Number: SYS-1234567890
       UUID: E0BFCD8B-7948-D911-A953-E486B4EEB67F
       Wake-up Type: Power Switch

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
     Manufacturer: ASUSTeK Computer INC.
     Product Name: P4S800
     Version: REV 1.xx
     Serial Number: xxxxxxxxxxx

BugLink: http://bugs.launchpad.net/bugs/366682
ASUS P4S800 will hang when rebooting unless reboot=b is specified.
Add a quirk to reboot through the bios.

Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
LKML-Reference: <1259972107.4629.275.camel@emiko>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86, apic: Enable lapic nmi watchdog on AMD Family 11h

commit 7d1849aff6687a135a8da3a75e32a00e3137a5e2 upstream.

The x86 lapic nmi watchdog does not recognize AMD Family 11h,
resulting in:

NMI watchdog: CPU not supported

As far as I can see from available documentation (the BKDM),
family 11h looks identical to family 10h as far as the PMU
is concerned.

Extending the check to accept family 11h results in:

Testing NMI watchdog ... OK.

I've been running with this change on a Turion X2 Ultra ZM-82
laptop for a couple of weeks now without problems.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <19223.53436.931768.278021@pilspetsen.it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86/amd-iommu: un__init iommu_setup_msi

commit 9f800de38b05d84809e89f16671d636a140eede7 upstream.

This function may be called on the resume path and can not
be dropped after booting.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86/amd-iommu: attach devices to pre-allocated domains early

commit be831297716036de5b24308447ecb69f1706a846 upstream.

For some devices the ACPI table may define unity map
requirements which must me met when the IOMMU is enabled. So
we need to attach devices to their domains as early as
possible so that these mappings are in place when needed.
This patch assigns the domains right after they are
allocated. Otherwise this can result in I/O page faults
before a driver binds to a device and BIOS is still using
it.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

x86: Add new Intel CPU cache size descriptors

commit 85160b92fbd35321104819283c91bfed2b553e3c upstream.

The latest rev of Intel doc AP-485 details new cache descriptors
that we don't yet support. 12MB, 18MB and 24MB 24-way assoc L3
caches.

Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <20091110184924.GA20337@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

V4L/DVB: Fix test in copy_reg_bits()

commit c95a419a5604ec8a23cd73f61e9bb151e8cbe89b upstream.

The reg_pair2[j].reg was tested twice.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: usbtmc: repeat usb_bulk_msg until whole message is transfered

commit ec412b92dbe3ea839716853eea058d1bcc5e6ca4 upstream.

usb_bulk_msg() transfers only bytes up to the maximum packet size.
It must be repeated by the usbtmc driver until all bytes of a TMC message
are transfered.

Without this patch, ETIMEDOUT is reported when writing TMC messages
larger than the maximum USB bulk size and the transfer remains incomplete.
The user will notice that the device hangs and must be reset by either closing
the application or pulling the plug.

Signed-off-by: Andre Herms <andre.herms@tec-venture.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: usb-storage: add BAD_SENSE flag

commit a0bb108112a872c0b0c4b3ef4974f95fb75b155d upstream.

This patch (as1311) fixes a problem in usb-storage: Some devices are
pretty broken when it comes to reporting sense data.  The information
they send back indicates that they have more than 18 bytes of sense
data available, but when the system asks for more than 18 they fail or
hang.  The symptom is that probing fails with multiple resets.

The patch adds a new BAD_SENSE flag to indicate that usb-storage
should never ask for more than 18 bytes of sense data.  The flag can
be set in an unusual_devs entry or via the "quirks=" module parameter,
and it is set automatically whenever a REQUEST SENSE command for more
than 18 bytes fails or times out.

An unusual_devs entry is added for the Agfa photo frame, which uses a
Prolific chip having this bug.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Daniel Kukula <daniel.kuku@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: option.c: add support for D-Link DWM-162-U5

commit 54a8e144acad6506920f385f4ef2779664f05b21 upstream.

Add D-Link DWM-162-U5 device id 1e0e:ce16 into option driver. The device
has 4 interfaces, of which 1 is handled by storage and the other 3 by
option driver.

The device appears first as CD-only 05c6:2100 device and must be switched
to 1e0e:ce16 mode either by using "eject CD" or usb_modeswitch.

The MessageContent for usb_modeswitch.conf is:
"55534243e0c26a85000000000000061b000000020000000000000000000000"

Signed-off-by: Zhang Le <r0bertz@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: musb_gadget_ep0: fix unhandled endpoint 0 IRQs, again

commit 196f1b7a387546f425df2f1fad26772e3d513aea upstream.

Commit a5073b52833e4df8e16c93dc4cbb7e0c558c74a2 (musb_gadget: fix
unhandled endpoint 0 IRQs) somehow missed its key change:

"The gadget EP0 code routinely ignores an interrupt at end of
the data phase because of musb_g_ep0_giveback() resetting the
state machine to "idle, waiting for SETUP" phase prematurely."

So, the majority of the cases of unhandled IRQs is still unfixed...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: Close usb_find_interface race v3

commit c2d284ee04ab6f6718de2ddcf1b43160e046c41d upstream.

USB drivers that create character devices call usb_register_dev in their
probe function. This associates the usb_interface device with that minor
number and creates the character device and announces it to the world.
However, the driver's probe function is called before the new
usb_interface is added to the driver's klist_devices.

This is a problem because userspace will respond to the character device
creation announcement by opening the character device. The driver's open
function will the call usb_find_interface to find the usb_interface
associated with that minor number. usb_find_interface will walk the
driver's list of devices and find the usb_interface with the matching
minor number.

Because the announcement happens before the usb_interface is added to the
driver's klist_devices, a race condition exists. A straightforward fix
is to walk the list of devices on usb_bus_type instead since the device
is added to that list before the announcement occurs.

bus_find_device calls get_device to bump the reference count on the found
device. It is arguable that the reference count should be dropped by the
caller of usb_find_interface instead of usb_find_interface, however,
the current users of usb_find_interface do not expect this.

The original version of this patch only matched against minor number
instead of driver and minor number. This version matches against both.

Signed-off-by: Russ Dill <Russ.Dill@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SUNRPC: IS_ERR/PTR_ERR confusion

commit 480e3243df156e39eea6c91057e2ae612a6bbe19 upstream.

IS_ERR returns 1 or 0, PTR_ERR returns the error value.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ssb: Fix range check in sprom write

commit e33761e6f23881de9f3ee77cc2204ab2e26f3d9a upstream.

The range check in the sprom image parser hex2sprom() is broken.
One sprom word is 4 hex characters.
This fixes the check and also adds much better sanity checks to the code.
We better make sure the image is OK by doing some sanity checks to avoid
bricking the device by accident.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

pxa/em-x270: fix usb hub power up/reset sequence

commit 1b82e4c32fba96d8805b1e2126ba5382e56fac32 upstream.

Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

pata_hpt{37x|3x2n}: fix timing register masks (take 2)

commit 5600c70e576199a7552e1c0fff43f3fe16f5566e upstream.

These drivers inherited from the older 'hpt366' IDE driver the buggy timing
register masks in their set_piomode() metods. As a result, too low command
cycle active time is programmed for slow PIO modes. Quite fortunately, it's
later "fixed up" by the set_dmamode() methods which also "helpfully" reprogram
the command timings, usually to PIO mode 4; unfortunately, setting an UltraDMA
mode #N also reprograms already set PIO data timings, usually to MWDMA mode #
max(N, 2) timings...

However, the drivers added some breakage of their own too: the bit that they
set/clear to control the FIFO is sometimes wrong -- it's actually the MSB of
the command cycle setup time; also, setting it in DMA mode is wrong as this
bit is only for PIO actually and clearing it for PIO modes is not needed as
no mode in any timing table has it set...

Fix all this, inverting the masks while at it, like in the 'hpt366' and
'pata_hpt366' drivers; bump the drivers' versions, accounting for recent
patches that forgot to do it...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

md/bitmap: protect against bitmap removal while being updated.

commit aa5cbd103887011b4830355f88fb055f9ad2d556 upstream.

A write intent bitmap can be removed from an array while the
array is active.
When this happens, all IO is suspended and flushed before the
bitmap is removed.
However it is possible that bitmap_daemon_work is still running to
clear old bits from the bitmap. If it is, it can dereference the
bitmap after it has been freed.

So introduce a new mutex to protect bitmap_daemon_work and get it
before destroying a bitmap.

This is suitable for any current -stable kernel.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

KVM: s390: Make psw available on all exits, not just a subset

commit d7b0b5eb3000c6fb902f08c619fcd673a23d8fab upstream.

This patch moves s390 processor status word into the base kvm_run
struct and keeps it up-to date on all userspace exits.

The userspace ABI is broken by this, however there are no applications
in the wild using this. A capability check is provided so users can
verify the updated API exists.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

jbd2: don't wipe the journal on a failed journal checksum

commit e6a47428de84e19fda52f21ab73fde2906c40d09 upstream.

If there is a failed journal checksum, don't reset the journal.  This
allows for userspace programs to decide how to recover from this
situation.  It may be that ignoring the journal checksum failure might
be a better way of recovering the file system.  Once we add per-block
checksums, we can definitely do better.  Until then, a system
administrator can try backing up the file system image (or taking a
snapshot) and and trying to determine experimentally whether ignoring
the checksum failure or aborting the journal replay results in less
data loss.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

hrtimer: Fix /proc/timer_list regression

commit 8629ea2eaba8ca0de2e38ce1b4a825e16255976e upstream.

commit 507e1231 (timer stats: Optimize by adding quick check to avoid
function calls) introduced a regression in /proc/timer_list.

/proc/timer_list shows now
#0: <c27d46b0>, tick_sched_timer, S:01, <(null)>, /-1
instead of
#0: <c27d46b0>, tick_sched_timer, S:01, hrtimer_start, swapper/0

Revert the hrtimer quick check for now. The optimization needs more
thought, but this is neither 2.6.32-rc7 nor stable material.

[ tglx: - Removed unrelated changes from the original patch
- Prevent unneccesary call to timer_stats_update_stats
- massaged the changelog ]

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <alpine.LFD.2.00.0911181933540.24119@localhost.localdomain>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

hfs: fix a potential buffer overflow

commit ec81aecb29668ad71f699f4e7b96ec46691895b6 upstream.

A specially-crafted Hierarchical File System (HFS) filesystem could cause
a buffer overflow to occur in a process's kernel stack during a memcpy()
call within the hfs_bnode_read() function (at fs/hfs/bnode.c:24). The
attacker can provide the source buffer and length, and the destination
buffer is a local variable of a fixed length. This local variable (passed
as "&entry" from fs/hfs/dir.c:112 and allocated on line 60) is stored in
the stack frame of hfs_bnode_read()'s caller, which is hfs_readdir().
Because the hfs_readdir() function executes upon any attempt to read a
directory on the filesystem, it gets called whenever a user attempts to
inspect any filesystem contents.

[amwang@redhat.com: modify this patch and fix coding style problems]
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Anderson <anderson@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

futex: Take mmap_sem for get_user_pages in fault_in_user_writeable

commit 722d0172377a5697919b9f7e5beb95165b1dec4e upstream.

get_user_pages() must be called with mmap_sem held.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linuxfoundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091208121942.GA21298@basil.fritz.box>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

devpts_get_tty() should validate inode

commit edfacdd6f81119b9005615593f2cbd94b8c7e2d8 upstream.

devpts_get_tty() assumes that the inode passed in is associated with a valid
pty. But if the only reference to the pty is via a bind-mount, the inode
passed to devpts_get_tty() while valid, would refer to a pty that no longer
exists.

With a lot of debug effort, Grzegorz Nosek developed a small program (see
below) to reproduce a crash on recent kernels. This crash is a regression
introduced by the commit:

commit 527b3e4773628b30d03323a2cb5fb0d84441990f
Author: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Date: Mon Oct 13 10:43:08 2008 +0100

To fix, ensure that the dentry associated with the inode has not yet been
deleted/unhashed by devpts_pty_kill().

See also:
https://lists.linux-foundation.org/pipermail/containers/2009-July/019273.html

tty-bug.c:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/signal.h>
#include <unistd.h>
#include <stdio.h>

#include <linux/fs.h>

void dummy(int sig)
{
}

static int child(void *unused)
{
int fd;

signal(SIGINT, dummy); signal(SIGHUP, dummy);
pause(); /* cheesy synchronisation to wait for /dev/pts/0 to appear */

mount("/dev/pts/0", "/dev/console", NULL, MS_BIND, NULL);
sleep(2);

fd = open("/dev/console", O_RDWR);
dup(0); dup(0);
write(1, "Hello world!\n", sizeof("Hello world!\n")-1);
return 0;
}

int main(void)
{
pid_t pid;
char *stack;

stack = malloc(16384);
pid = clone(child, stack+16384, CLONE_NEWNS|SIGCHLD, NULL);

open("/dev/ptmx", O_RDWR|O_NOCTTY|O_NONBLOCK);

unlockpt(fd); grantpt(fd);

sleep(2);
kill(pid, SIGHUP);
sleep(1);
return 0; /* exit before child opens /dev/console */
}

Reported-by: Grzegorz Nosek <root@localdomain.pl>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

debugfs: fix create mutex racy fops and private data

commit d3a3b0adad0865c12e39b712ca89efbd0a3a0dbc upstream.

Setting fops and private data outside of the mutex at debugfs file
creation introduces a race where the files can be opened with the wrong
file operations and private data. It is easy to trigger with a process
waiting on file creation notification.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

bsdacct: fix uid/gid misreporting

commit 4b731d50ff3df6b9141a6c12b088e8eb0109e83c upstream.

commit d8e180dcd5bbbab9cd3ff2e779efcf70692ef541 "bsdacct: switch
credentials for writing to the accounting file" introduced credential
switching during final acct data collecting. However, uid/gid pair
continued to be collected from current which became credentials of who
created acct file, not who exits.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14676

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Juho K. Juopperi <jkj@kapsi.fi>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath5k: enable EEPROM checksum check

commit 512414b0bed0d376ac4d5ec1dd6f0b1a3551febc upstream.

Without this we have no gaurantee of the integrity of the
EEPROM and are likely to encounter a lot of bogus bug reports
due to actual issues on the EEPROM. With the EEPROM checksum
check in place we can easily rule those issues out.

If you run patch during a revert *you* have a card with a busted
EEPROM and only older kernel will support that concoction. This
patch is a trade off between not accepitng bogus EEPROMs and
avoiding bogus bug reports allowing developers to focus instead
on real concrete issues.

If stable keeps bogus bug reports because of a possibly busted EEPROM
feel free to apply this there too.

Tested on an AR5414

Cc: jirislaby@gmail.com
Cc: akpm@linux-foundation.org
Cc: rjw@sisk.pl
Cc: me@bobcopeland.com
Cc: david.quan@atheros.com
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ath5k: allow setting txpower to 0

commit 2eb2fa67e5462a36e98172fb92c78bc405b3035f upstream.

As a holdover from earlier code when we used to set
the power limit to '0' after a reset to configure the
default transmit power, ath5k interprets txpower=0 as
12.5 dBm. Fix that by just passing 0 through.

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=14567

Reported-by: Daniel Folkers <daniel.folkers@task24.nl>
Tested-by: Daniel Folkers <daniel.folkers@task24.nl>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ALSA: hrtimer - Fix lock-up

commit fcfdebe70759c74e2e701f69aaa7f0e5e32cf5a6 upstream.

The timer stop callback can be called from snd_timer_interrupt(), which
is called from the hrtimer callback. Since hrtimer_cancel() waits for
the callback completion, this eventually results in a lock-up.

This patch fixes the problem by just toggling a flag at stop callback
and call hrtimer_cancel() later.

Reported-and-tested-by: Wojtek Zabolotny <W.Zabolotny@elka.pw.edu.pl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ALSA: hda - Terradici HDA controllers does not support 64-bit mode

commit 396087eaead95fcb29eb36f1e59517aeb58c545e upstream.

Confirmed from vendor and tests in RedHat bugzilla #536782 .

Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

firewire: ohci: handle receive packets with a data length of zero

commit 8c0c0cc2d9f4c523fde04bdfe41e4380dec8ee54 upstream.

Queueing to receive an ISO packet with a payload length of zero
silently does nothing in dualbuffer mode, and crashes the kernel in
packet-per-buffer mode. Return an error in dualbuffer mode, because
the DMA controller won't let us do what we want, and work correctly in
packet-per-buffer mode.

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: usb-storage: fix bug in fill_inquiry

commit f3f6faa9edf67c1018270793e0547b0f81abb47e upstream.

This patch (as1312) fixes a minor bug in usb-storage. The
fill_inquiry() routine neglects to pre-load the inquiry data buffer
with spaces. As a result, if the vendor name is shorter than 8
characters or the product name is shorter than 16, the remainder will
be filled with garbage.

The patch also removes some unnecessary calls to strlen().

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

USB: option: add pid for ZTE

commit 8d87cacda7c8db5c131bfcaaa1d90bfe918c2ebc upstream.

This patch adds ZTE modem devices.

Signed-off-by: Ming Zhao <zhao.ming9@zte.com.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Linux 2.6.31.8

ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)

(cherry picked from commit fab3a549e204172236779f502eccb4f9bf0dc87d)

Fix the following potential circular locking dependency between
mm->mmap_sem and ei->i_data_sem:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.32-04115-gec044c5 #37
    -------------------------------------------------------
    ureadahead/1855 is trying to acquire lock:
     (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac

    but task is already holding lock:
     (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&ei->i_data_sem){++++..}:
           [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f
           [<ffffffff81099e7e>] lock_acquire+0xdc/0x102
           [<ffffffff81516633>] down_read+0x51/0x84
           [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5
           [<ffffffff811a3453>] ext4_get_block+0xab/0xef
           [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d
           [<ffffffff81155360>] mpage_readpages+0xd0/0x114
           [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f
           [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc
           [<ffffffff810f86f2>] ra_submit+0x21/0x25
           [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c
           [<ffffffff81107b97>] __do_fault+0x55/0x3a2
           [<ffffffff81109db0>] handle_mm_fault+0x327/0x734
           [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa
           [<ffffffff81518205>] page_fault+0x25/0x30
           [<ffffffff812a34d8>] clear_user+0x38/0x3c
           [<ffffffff81167e16>] padzero+0x20/0x31
           [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed
           [<ffffffff81130e95>] search_binary_handler+0xc2/0x259
           [<ffffffff81166d64>] load_script+0x1b8/0x1cc
           [<ffffffff81130e95>] search_binary_handler+0xc2/0x259
           [<ffffffff8113255f>] do_execve+0x1ce/0x2cf
           [<ffffffff81027494>] sys_execve+0x43/0x5a
           [<ffffffff8102918a>] stub_execve+0x6a/0xc0

    -> #0 (&mm->mmap_sem){++++++}:
           [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f
           [<ffffffff81099e7e>] lock_acquire+0xdc/0x102
           [<ffffffff81107251>] might_fault+0x89/0xac
           [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda
           [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157
           [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1
           [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159
           [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6
           [<ffffffff811392ca>] sys_ioctl+0x56/0x79
           [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b

    other info that might help us debug this:

    1 lock held by ureadahead/1855:
     #0:  (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159

    stack backtrace:
    Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37
    Call Trace:
     [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7
     [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f
     [<ffffffff8102f229>] ? sched_clock+0x9/0xd
     [<ffffffff81099e7e>] lock_acquire+0xdc/0x102
     [<ffffffff81107224>] ? might_fault+0x5c/0xac
     [<ffffffff81107251>] might_fault+0x89/0xac
     [<ffffffff81107224>] ? might_fault+0x5c/0xac
     [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c
     [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda
     [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157
     [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157
     [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1
     [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159
     [<ffffffff81107224>] ? might_fault+0x5c/0xac
     [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6
     [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95
     [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b
     [<ffffffff811392ca>] sys_ioctl+0x56/0x79
     [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

signal: Fix alternate signal stack check

commit 2a855dd01bc1539111adb7233f587c5c468732ac upstream.

All architectures in the kernel increment/decrement the stack pointer
before storing values on the stack.

On architectures which have the stack grow down sas_ss_sp == sp is not
on the alternate signal stack while sas_ss_sp + sas_ss_size == sp is
on the alternate signal stack.

On architectures which have the stack grow up sas_ss_sp == sp is on
the alternate signal stack while sas_ss_sp + sas_ss_size == sp is not
on the alternate signal stack.

The current implementation fails for architectures which have the
stack grow down on the corner case where sas_ss_sp == sp.This was
reported as Debian bug #544905 on AMD64.
Simplified test case: http://download.breakpoint.cc/tc-sig-stack.c

The test case creates the following stack scenario:
   0xn0300 stack top
   0xn0200 alt stack pointer top (when switching to alt stack)
   0xn01ff alt stack end
   0xn0100 alt stack start == stack pointer

If the signal is sent the stack pointer is pointing to the base
address of the alt stack and the kernel erroneously decides that it
has already switched to the alternate stack because of the current
check for "sp - sas_ss_sp < sas_ss_size"

On parisc (stack grows up) the scenario would be:
   0xn0200 stack pointer
   0xn01ff alt stack end
   0xn0100 alt stack start = alt stack pointer base
              (when switching to alt stack)
   0xn0000 stack base

This is handled correctly by the current implementation.

[ tglx: Modified for archs which have the stack grow up (parisc) which
   would fail with the correct implementation for stack grows
   down. Added a check for sp >= current->sas_ss_sp which is
   strictly not necessary but makes the code symetric for both
   variants ]

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
LKML-Reference: <20091025143758.GA6653@Chamillionaire.breakpoint.cc>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SCSI: scsi_lib_dma: fix bug with dma maps on nested scsi objects

commit d139b9bd0e52dda14fd13412e7096e68b56d0076 upstream.

Some of our virtual SCSI hosts don't have a proper bus parent at the
top, which can be a problem for doing DMA on them

This patch makes the host device cache a pointer to the physical bus
device and provides an extra API for setting it (the normal API picks
it up from the parent). This patch also modifies the qla2xxx and lpfc
vport logic to use the new DMA host setting API.

Acked-By: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SCSI: osd_protocol.h: Add missing #include

commit 0899638688f223fd9e9fee60d662665e11693d12 upstream.

include/scsi/osd_protocol.h uses ALIGN() without an #include
<linux/kernel.h>, leading to:
| include/scsi/osd_protocol.h:362: error: implicit declaration of function 'ALIGN'

Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

SCSI: megaraid_sas: fix 64 bit sense pointer truncation

commit 7b2519afa1abd1b9f63aa1e90879307842422dae upstream.

The current sense pointer is cast to a u32 pointer, which can truncate
on 64 bits. Fix by using unsigned long instead.

Signed-off-by Bo Yang<bo.yang@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT

(cherry picked from commit 4a58579b9e4e2a35d57e6c9c8483e52f6f1b7fd6)

This patch fixes three problems in the handling of the
EXT4_IOC_MOVE_EXT ioctl:

1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for
original and donor files, but they allow the illegal write access to
donor file, since donor file is overwritten by original file data.  To
fix this problem, change access mode checks of original (r->r/w) and
donor (r->w) files.

2.  Disallow the use of donor files that have a setuid or setgid bits.

3.  Call mnt_want_write() and mnt_drop_write() before and after
ext4_move_extents() calling to get write access to a mount.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: Wait for proper transaction commit on fsync

(cherry picked from commit b436b9bef84de6893e86346d8fbf7104bc520645)

We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix incorrect block reservation on quota transfer.

(cherry picked from commit 194074acacebc169ded90a4657193f5180015051)

Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid
This means that we may end-up with transferring all quotas. Add
we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in
case of QUOTA_INIT_BLOCKS.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: quota macros cleanup

(cherry picked from commit 5aca07eb7d8f14d90c740834d15ca15277f4820c)

Currently all quota block reservation macros contains hard-coded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: ext4_get_reserved_space() must return bytes instead of blocks

(cherry picked from commit 8aa6790f876e81f5a2211fe1711a5fe3fe2d7b20)

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: remove blocks from inode prealloc list on failure

(cherry picked from commit b844167edc7fcafda9623955c05e4c1b3c32ebc7)

This fixes a leak of blocks in an inode prealloc list if device failures
cause ext4_mb_mark_diskspace_used() to fail.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: wait for log to commit when umounting

(cherry picked from commit d4edac314e9ad0b21ba20ba8bc61b61f186f79e1)

There is a potential race when a transaction is committing right when
the file system is being umounting. This could reduce in a race
because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super
before the commit code calls a callback so the mballoc code can
release freed blocks in the transaction, resulting in a panic trying
to access the freed s_group_info.

The fix is to wait for the transaction to finish committing before we
shutdown the multiblock allocator.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: Avoid data / filesystem corruption when write fails to copy data

(cherry picked from commit b9a4207d5e911b938f73079a83cc2ae10524ec7f)

When ext4_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks
already instantiated beyond i_size. Although these blocks were never
inside i_size, we have to truncate the pagecache of these blocks so
that corresponding buffers get unmapped. Otherwise subsequent
__block_prepare_write (called because we are retrying the write) will
find the buffers mapped, not call ->get_block, and thus the page will
be backed by already freed blocks leading to filesystem and data
corruption.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks()

(cherry picked from commit c09eef305dd43846360944ad072f051f964fa383)

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer()

(cherry picked from commit e6ec116b67f46e0e7808276476554727b2e6240b)

OOM happens.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: move_extent_per_page() cleanup

(cherry picked from commit ac48b0a1d068887141581bea8285de5fcab182b0)

Integrate duplicate lines (acquire/release semaphore and invalidate
extent cache in move_extent_per_page()) into mext_replace_branches(),
to reduce source and object code size.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: initialize moved_len before calling ext4_move_extents()

(cherry picked from commit 446aaa6e7e993b38a6f21c6acfa68f3f1af3dbe3)

The move_extent.moved_len is used to pass back the number of exchanged
blocks count to user space. Currently the caller must clear this
field; but we spend more code space checking for this requirement than
simply zeroing the field ourselves, so let's just make life easier for
everyone all around.

Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT

(cherry picked from commit 94d7c16cbbbd0e03841fcf272bcaf0620ad39618)

At the beginning of ext4_move_extent(), we call
ext4_discard_preallocations() to discard inode PAs of orig and donor
inodes.  But in the following case, blocks can be double freed, so
move ext4_discard_preallocations() to the end of ext4_move_extents().

1. Discard inode PAs of orig and donor inodes with
   ext4_discard_preallocations() in ext4_move_extents().

   orig : [ DATA1 ]
   donor: [ DATA2 ]

2. While data blocks are exchanging between orig and donor inodes, new
   inode PAs is created to orig by other process's block allocation.
   (Since there are semaphore gaps in ext4_move_extents().)  And new
   inode PAs is used partially (2-1).

   2-1 Create new inode PAs to orig inode
   orig : [ DATA1 | used PA1 | free PA1 ]
   donor: [ DATA2 ]

3. Donor inode which has old orig inode's blocks is deleted after
   EXT4_IOC_MOVE_EXT finished (3-1, 3-2).  So the block bitmap
   corresponds to old orig inode's blocks are freed.

   3-1 After EXT4_IOC_MOVE_EXT finished
   orig : [ DATA2 |  free PA1 ]
   donor: [ DATA1 |  used PA1 ]

   3-2 Delete donor inode
   orig : [ DATA2 |  free PA1 ]
   donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ]

4. The double-free of blocks is occurred, when close() is called to
   orig inode.  Because ext4_discard_preallocations() for orig inode
   frees used PA1 and free PA1, though used PA1 is already freed in 3.

   4-1 Double-free of blocks is occurred
   orig : [ DATA2 |  FREE SPACE(free PA1) ]
   donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ]

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: make "norecovery" an alias for "noload"

(cherry picked from commit e3bb52ae2bb9573e84c17b8e3560378d13a5c798)

Users on the linux-ext4 list recently complained about differences
across filesystems w.r.t. how to mount without a journal replay.

In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext4.

Also show this status in /proc/mounts

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: make trim/discard optional (and off by default)

(cherry picked from commit 5328e635315734d42080de9a5a1ee87bf4cae0a4)

It is anticipated that when sb_issue_discard starts doing
real work on trim-capable devices, we may see issues. Make
this mount-time optional, and default it to off until we know
that things are working out OK.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix error handling in ext4_ind_get_blocks()

(cherry picked from commit 2bba702d4f88d7b010ec37e2527b552588404ae7)

When an error happened in ext4_splice_branch we failed to notice that
in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem
by checking for error properly.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: avoid issuing unnecessary barriers

(cherry picked from commit 6b17d902fdd241adfa4ce780df20547b28bf5801)

We don't to issue an I/O barrier on an error or if we force commit
because we are doing data journaling.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix block validity checks so they work correctly with meta_bg

(cherry picked from commit 1032988c71f3f85483b2b4319684d1205a704c02)

The block validity checks used by ext4_data_block_valid() wasn't
correctly written to check file systems with the meta_bg feature. Fix
this.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix uninit block bitmap initialization when s_meta_first_bg is non-zero

(cherry picked from commit 8dadb198cb70ef811916668fe67eeec82e8858dd)

The number of old-style block group descriptor blocks is
s_meta_first_bg when the meta_bg feature flag is set.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: don't update the superblock in ext4_statfs()

(cherry picked from commit 3f8fb9490efbd300887470a2a880a64e04dcc3f5)

commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext4_statfs()
to update the on-disk superblock counters, but modified this buffer
directly without any journaling of the change. This is one of the
accesses that was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: journal all modifications in ext4_xattr_set_handle

(cherry picked from commit 86ebfd08a1930ccedb8eac0aeb1ed4b8b6a41dbc)

ext4_xattr_set_handle() was zeroing out an inode outside
of journaling constraints; this is one of the accesses that
was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.

Reviewed-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix i_flags access in ext4_da_writepages_trans_blocks()

(cherry picked from commit 30c6e07a92ea4cb87160d32ffa9bce172576ae4c)

We need to be testing the i_flags field in the ext4 specific portion
of the inode, instead of the (confusingly aliased) i_flags field in
the generic struct inode.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: make sure directory and symlink blocks are revoked

(cherry picked from commit 50689696867d95b38d9c7be640a311494a04fb86)

When an inode gets unlinked, the functions ext4_clear_blocks() and
ext4_remove_blocks() call ext4_forget() for all the buffer heads
corresponding to the deleted inode's data blocks. If the inode is a
directory or a symlink, the is_metadata parameter must be non-zero so
ext4_forget() will revoke them via jbd2_journal_revoke(). Otherwise,
if these blocks are reused for a data file, and the system crashes
before a journal checkpoint, the journal replay could end up
corrupting these data blocks.

Thanks to Curt Wohlgemuth for pointing out potential problems in this
area.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: plug a buffer_head leak in an error path of ext4_iget()

(cherry picked from commit 567f3e9a70d71e5c9be03701b8578be77857293b)

One of the invalid error paths in ext4_iget() forgot to brelse() the
inode buffer head. Fix it by adding a brelse() in the common error
return path, which also simplifies function.

Thanks to Andi Kleen <ak@linux.intel.com> reporting the problem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix possible recursive locking warning in EXT4_IOC_MOVE_EXT

(cherry picked from commit 49bd22bc4d603a2a4fc2a6a60e156cbea52eb494)

If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem()
will trigger a false-positive warning of a recursive lock. Since we
take i_data_sem for the two inodes ordered by their inode numbers,
this isn't a problem. Use of down_write_nested() will notify the lock
dependency checker machinery that there is no problem here.

This problem was reported by Brian Rogers:

http://marc.info/?l=linux-ext4&m=125115356928011&w=1

Reported-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix lock order problem in ext4_move_extents()

(cherry picked from commit fc04cb49a898c372a22b21fffc47f299d8710801)

ext4_move_extents() checks the logical block contiguousness
of original file with ext4_find_extent() and mext_next_extent().
Therefore the extent which ext4_ext_path structure indicates
must not be changed between above functions.

But in current implementation, there is no i_data_sem protection
between ext4_ext_find_extent() and mext_next_extent().  So the extent
which ext4_ext_path structure indicates may be overwritten by
delalloc.  As a result, ext4_move_extents() will exchange wrong blocks
between original and donor files.  I change the place where
acquire/release i_data_sem to solve this problem.

Moreover, I changed move_extent_per_page() to start transaction first,
and then acquire i_data_sem.  Without this change, there is a
possibility of the deadlock between mmap() and ext4_move_extents():

* NOTE: "A", "B" and "C" mean different processes

A-1: ext4_ext_move_extents() acquires i_data_sem of two inodes.

B:   do_page_fault() starts the transaction (T),
     and then tries to acquire i_data_sem.
     But process "A" is already holding it, so it is kept waiting.

C:   While "A" and "B" running, kjournald2 tries to commit transaction (T)
     but it is under updating, so kjournald2 waits for it.

A-2: Call ext4_journal_start with holding i_data_sem,
     but transaction (T) is locked.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix the returned block count if EXT4_IOC_MOVE_EXT fails

(cherry picked from commit f868a48d06f8886cb0367568a12367fa4f21ea0d)

If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were
exchanged before the failure should be returned to the userspace
caller. Unfortunately, currently if the block size is not the same as
the page size, the returned block count that is returned is the
page-aligned block count instead of the actual block count. This
commit addresses this bug.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: avoid divide by zero when trying to mount a corrupted file system

(cherry picked from commit 503358ae01b70ce6909d19dd01287093f6b6271c)

If s_log_groups_per_flex is greater than 31, then groups_per_flex will
will overflow and cause a divide by zero error. This can cause kernel
BUG if such a file system is mounted.

Thanks to Nageswara R Sastry for analyzing the failure and providing
an initial patch.

http://bugzilla.kernel.org/show_bug.cgi?id=14287

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix potential buffer head leak when add_dirent_to_buf() returns ENOSPC

(cherry picked from commit 2de770a406b06dfc619faabbf5d85c835ed3f2e1)

Previously add_dirent_to_buf() did not free its passed-in buffer head
in the case of ENOSPC, since in some cases the caller still needed it.
However, this led to potential buffer head leaks since not all callers
dealt with this correctly. Fix this by making simplifying the freeing
convention; now add_dirent_to_buf() *never* frees the passed-in buffer
head, and leaves that to the responsibility of its caller. This makes
things cleaner and easier to prove that the code is neither leaking
buffer heads or calling brelse() one time too many.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O

(cherry picked from commit ba230c3f6dc88ec008806adb27b12088486d508e)

To prepare for a direct I/O write, we need to split the unwritten
extents before submitting the I/O. When no extents needed to be
split, ext4_split_unwritten_extents() was incorrectly returning 0
instead of the size of uninitialized extents. This bug caused the
wrong return value sent back to VFS code when it gets called from
async IO path, leading to an unnecessary fall back to buffered IO.

This bug also hid the fact that the check to see whether or not a
split would be necessary was incorrect; we can only skip splitting the
extent if the write completely covers the uninitialized extent.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: code clean up for dio fallocate handling

(cherry picked from commit 4b70df181611012a3556f017b57dfcef7e1d279f)

The ext4_debug() call in ext4_end_io_dio() should be moved after the
check to make sure that io_end is non-NULL.

The comment above ext4_get_block_dio_write() ("Maximum number of
blocks...") is a duplicate; the original and correct comment is above
the #define DIO_MAX_BLOCKS up above.

Based on review comments from Curt Wohlgemuth.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: skip conversion of uninit extents after direct IO if there isn't any

(cherry picked from commit 5f5249507e4b5c4fc0f9c93f33d133d8c95f47e1)

At the end of direct I/O operation, ext4_ext_direct_IO() always called
ext4_convert_unwritten_extents(), regardless of whether there were any
unwritten extents involved in the I/O or not.

This commit adds a state flag so that ext4_ext_direct_IO() only calls
ext4_convert_unwritten_extents() when necessary.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents

(cherry picked from commit 109f55651954def97fa41ee71c464d268c512ab0)

After a direct I/O request covering an uninitalized extent (i.e.,
created using the fallocate system call) or a hole in a file, ext4
will convert the uninitialized extent so it is marked as initialized
by calling ext4_convert_unwritten_extents(). This function returns
zero on success.

This return value was getting returned by ext4_direct_IO(); however
the file system's direct_IO function is supposed to return the number
of bytes read or written on a success. By returning zero, it confused
the direct I/O code into falling back to buffered I/O unnecessarily.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: discard preallocation when restarting a transaction during truncate

(cherry picked from commit fa5d11133b07053270e18fa9c18560e66e79217e)

When restart a transaction during a truncate operation, we drop and
reacquire i_data_sem. After reacquiring i_data_sem, we need to
discard any inode-based preallocation that might have been grabbed
while we released i_data_sem (for example, if pdflush is allocating
blocks and racing against the truncate).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: retry failed direct IO allocations

(cherry picked from commit fbbf69456619de5d251cb9f1df609069178c62d5)

On a 256M filesystem, doing this in a loop:

        xfs_io -F -f -d -c 'pwrite 0 64m' test
        rm -f test

eventually leads to ENOSPC.  (the xfs_io command does a
64m direct IO write to the file "test")

As with other block allocation callers, it looks like we need to
potentially retry the allocations on the initial ENOSPC.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: fix a BUG_ON crash by checking that page has buffers attached to it

(cherry picked from commit 1f94533d9cd75f6d2826018d54a971b9cc085992)

In ext4_num_dirty_pages() we were calling page_buffers() before
checking to see if the page actually had pages attached to it; this
would cause a BUG check crash in the inline function page_buffers().

Thanks to Markus Trippelsdorf for reporting this bug.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: Fix time encoding with extra epoch bits

(cherry picked from commit c1fccc0696bcaff6008c11865091f5ec4b0937ab)

"Looking at ext4.h, I think the setting of extra time fields forgets to
mask the epoch bits so the epoch part overwrites nsec part. The second
change is only for coherency (2 -> EXT4_EPOCH_BITS)."

Thanks to Damien Guibouret for pointing out this problem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: Handle nested ext4_journal_start/stop calls without a journal

(cherry picked from commit d3d1faf6a74496ea4435fd057c6a2cad49f3e523)

This patch fixes a problem with handling nested calls to
ext4_journal_start/ext4_journal_stop, when there is no journal present.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode

(cherry picked from commit f3dc272fd5e2ae08244796bb39e7e1ce4b25d3b3)

This patch a problem that ext4_dirty_inode() was not calling
ext4_mark_inode_dirty() if the current_handle is not valid, which it
is the case in no journal mode.

It also removes a test for non-matching transaction which can never
happen.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>