This patch fixes a null pointer dereference BUG() if ethtool is used on
an smsc9420 interface while it is down, because the phy_dev is only
allocated while the interface is up.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
smc91x.h defines SMC_IRQ_FLAGS to be -1 when it wants the interrupt
flags to be taken from the resource structure. However, d280ead
changed this to checking for non-zero resource flags.
Unfortunately, this means that on some platforms, we end up passing
'-1' to request_irq rather than the desired result. Combine the two
conditions into one so that the IRQ flags are taken from the resource
if either SMC_IRQ_FLAGS is -1 or the resource flags specify an
interrupt trigger.
This restores network on at least the Versatile platform.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In dev_change_name() an err variable is used for storing the original
call_netdevice_notifiers() errno (negative) and testing for a rollback
error later, but the test for non-zero is wrong, because the err might
have positive value as well - from dev_alloc_name(). It means the
rollback for a netdevice with a number > 0 will never happen. (The err
test is reordered btw. to make it more readable.)
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a large packet gets reassembled by ip_defrag(), the head skb
accounts for all the fragments in skb->truesize. If this packet is
refragmented again, skb->truesize is not re-adjusted to reflect only
the head size since its not owned by a socket. If the head fragment
then gets recycled and reused for another received fragment, it might
exceed the defragmentation limits due to its large truesize value.
skb_recycle_check() explicitly checks for linear skbs, so any recycled
skb should reflect its true size in skb->truesize. Change ip_fragment()
to also adjust the truesize value of skbs not owned by a socket.
Reported-and-tested-by: Ben Menchaca <ben@bigfootnetworks.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we've merged skb's with page frags, and subsequently receive
a trailer skb (< MSS) that is not completely non-linear (this can
occur on Intel NICs if the packet size falls below the threshold),
GRO ends up producing an illegal GSO skb with a frag_list.
This is harmless unless the skb is then forwarded through an
interface that requires software GSO, whereupon the GSO code
will BUG.
This patch detects this case in GRO and avoids merging the
trailer skb.
Reported-by: Mark Wagner <mwagner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
About 50% of shutdowns of b44 Ethernet adapter ends by kernel panic
with kernels compiled with stack-protector.
Checking b44_magic_pattern() return values, one call of
b44_magic_pattern() returns 127. It means, that set_bit(128, pmask)
was called on line 1509. It means that bit 0 of 17th byte of pmask was
overwritten. But pmask has only 16 bytes. Stack corruption happens.
It seems that set_bit() on line 1509 always writes one bit off.
The fix does not only solve the stack corruption, but also makes Wake
On LAN working on my onboard B44 on Asus A7V-333X mainboard.
It seems that this problem affects all kernel versions since commit 725ad800 ([PATCH] b44: add wol for old nic) on 2006-06-20.
Signed-off-by: Stanislav Brabec <sbrabec@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andreas Lohre reported that the driver crashes when trying
to register_netdev(), he sugessted to move dev->netdev_ops initialization
before calling register_netdev(), it worked for him.
Reported-by: Andreas Lohre <alohre@gmail.com> Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix checking of the currently programmed UDMA mode.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ok, we really do need to revert this, even with Bart's sis5513.c
fix in there.
The problem is that several driver's ->set_pio_mode() method
depends upon the drive->media type being set properly. Most
of them use this to enable prefetching, which can only be done
for disk media.
But the commit being reverted here calls ->set_pio_mode() before
it's setup. Actually it considers everything disk because that
is the default media type set by ide_port_init_devices_data().
The set of drivers that depend upon the media type in their
->set_pio_method() are:
And it is possible that we could fix this by guarding the prefetching
and other media dependent setting changes with a test on
IDE_PFLAG_PROBING in hwif->port_flags, that's simply too risky for
2.6.32-rcX and -stable.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, ide_cmd_ioctl when invoked for setting DMA transfer mode calls
ide_find_dma_mode with requested mode as XFER_UDMA_6. This prevents setting DMA
mode to any other value than the default (maximum) supported by the device (or
UDMA6, if supported) irrespective of the actual requested transfer mode and
returns error.
For example, setting mode to UDMA2 using hdparm, where UDMA4 is the default
transfer mode gives following error:
# ./hdparm -d1 -Xudma2 /dev/hda
/dev/hda:hda: UDMA/66 mode selected
setting using_dma to 1 (on)
hda: UDMA/66 mode selected
setting xfermode to 66 (UltraDMA mode2)
HDIO_DRIVE_CMD(setxfermode) failed: Invalid argument
using_dma = 1 (on)
This patch fixes the issue.
Signed-off-by: Hemant Pedanekar <hemantp@ti.com> Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
CMD646 corrupts data on concurrent transfers on both channels when IDE SSD is
connected to one of the channels.
Setup that demonstrates this hardware bug: Ultra 5, onboard CMD646, rev 3.
/dev/hda is 8GB Seagate ST38410A in MWDMA2
/dev/hdd is 32GB SSD SiliconHardDisk in MWDMA2
- When reading /dev/hdd (for example with dd or fsck), reads from /dev/hda
are corrupted, there are twiddled single bits 1->0 and some full 32-bit
words corrupted, sometimes commands fail (which switches /dev/hda to
PIO mode but the corruptions happen even in PIO).
- Reads from /dev/hdd don't seem to be corrupted (i.e. fsck passes fine).
- When I connected normal rotating harddisk to /dev/hdd, there was no
corruption, so the corruption is something specific to SSD.
- I tried the same setup on a PCI card with CMD649 and saw no corruption.
This patch serializes the operation for CMD646 and 643 (I didn't test
CMD643 but it may have the same hw bug too because it's earlier design).
CMD649 is good. I don't know anything about CMD 648.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Do not read IIR in serial8250_start_tx when UART_BUG_TXEN
Reading the IIR clears some oustanding interrupts so it is not safe.
Instead, simply transmit immediately if the buffer is empty without
regard to IIR.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
e821ea70f3b4873b50056a1e0f74befed1014c09 introduced a bug by copying
some 64-bit originated code as-is to be used by both 32 and 64-bit
but this code contains a 64-bit ony "cmpdi" instruction.
This changes it to cmpwi, which is fine since VRSAVE can only contains
a 32-bit value anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Most callers of pmd_none_or_clear_bad() check whether the target page is
in a hugepage or not, but walk_page_range() do not check it. So if we
read /proc/pid/pagemap for the hugepage on x86 machine, the hugepage
memory is leaked as shown below. This patch fixes it.
Details
=======
My test program (leak_pagemap) works as follows:
- creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
- read()/write() something on it,
- call page-types with option -p (walk around the page tables),
- munmap() and unlink() the file on hugetlbfs
Most callers of pmd_none_or_clear_bad() check whether the target page is
in a hugepage or not, but mincore() and walk_page_range() do not check it.
So if we use mincore() on a hugepage on x86 machine, the hugepage memory
is leaked as shown below. This patch fixes it by extending mincore()
system call to support hugepages.
Details
=======
My test program (leak_mincore) works as follows:
- creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
- read()/write() something on it,
- call mincore() for first ten pages and printf() the values of *vec
- munmap() and unlink() the file on hugetlbfs
Return values in *vec from mincore() are set to 0, while the hugepage
should be in memory, and 1 hugepage is still accounted as used while
there is no file on hugetlbfs.
On a 32-bit machine, BIT() macro does not give the required
bit value if the bit is mroe than 31. In ieee802_11_parse_elems_crc(),
BIT() is suppossed to get the bit value more than 31 (42 (id of ERP_INFO_IE),
37 (CHANNEL_SWITCH_IE), (42), 32 (POWER_CONSTRAINT_IE), 45 (HT_CAP_IE),
61 (HT_INFO_IE)). As we do not get the required bit value for the above
IEs, crc over these IEs are never calculated, so any dynamic change in these
IEs after the association is not really handled on 32-bit platforms.
This patch fixes this issue.
Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On a multi-node x3950M2 system, there's a slight oddity in the
PCI device tree for all secondary nodes:
30:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
\-33:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
\-34:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
...as compared to the primary node:
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
\-01:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
03:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
\-04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
In both nodes, the LSI RAID controller hangs off a CalIOC2
device, but on the secondary nodes, the BIOS hides the VGA
device and substitutes the device tree ending with the disk
controller.
It would seem that Calgary devices don't necessarily appear at
the top of the PCI tree, which means that the current code to
find the Calgary IOMMU that goes with a particular device is
buggy.
Rather than walk all the way to the top of the PCI
device tree and try to match bus number with Calgary descriptor,
the code needs to examine each parent of the particular device;
if it encounters a Calgary with a matching bus number, simply
use that.
Otherwise, we BUG() when the bus number of the Calgary doesn't
match the bus number of whatever's at the top of the device tree.
Extra note: This patch appears to work correctly for the x3950
that came before the x3950 M2.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Jon D. Mason <jdmason@kudzu.us> Cc: Corinna Schultz <coschult@us.ibm.com>
LKML-Reference: <20091202230556.GG10295@tux1.beaverton.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Bug reporter noted their system with an ASUS P4S800 motherboard would
hang when rebooting unless reboot=b was specified. Their dmidecode
didn't contain descriptive System Information for Manufacturer or
Product Name, so I used their Base Board Information to create a
reboot quirk patch. The bug reporter confirmed this patch resolves
the reboot hang.
Handle 0x0001, DMI type 1, 25 bytes
System Information
Manufacturer: System Manufacturer
Product Name: System Name
Version: System Version
Serial Number: SYS-1234567890
UUID: E0BFCD8B-7948-D911-A953-E486B4EEB67F
Wake-up Type: Power Switch
Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
Manufacturer: ASUSTeK Computer INC.
Product Name: P4S800
Version: REV 1.xx
Serial Number: xxxxxxxxxxx
BugLink: http://bugs.launchpad.net/bugs/366682
ASUS P4S800 will hang when rebooting unless reboot=b is specified.
Add a quirk to reboot through the bios.
Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
LKML-Reference: <1259972107.4629.275.camel@emiko> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For some devices the ACPI table may define unity map
requirements which must me met when the IOMMU is enabled. So
we need to attach devices to their domains as early as
possible so that these mappings are in place when needed.
This patch assigns the domains right after they are
allocated. Otherwise this can result in I/O page faults
before a driver binds to a device and BIOS is still using
it.
usb_bulk_msg() transfers only bytes up to the maximum packet size.
It must be repeated by the usbtmc driver until all bytes of a TMC message
are transfered.
Without this patch, ETIMEDOUT is reported when writing TMC messages
larger than the maximum USB bulk size and the transfer remains incomplete.
The user will notice that the device hangs and must be reset by either closing
the application or pulling the plug.
Signed-off-by: Andre Herms <andre.herms@tec-venture.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1311) fixes a problem in usb-storage: Some devices are
pretty broken when it comes to reporting sense data. The information
they send back indicates that they have more than 18 bytes of sense
data available, but when the system asks for more than 18 they fail or
hang. The symptom is that probing fails with multiple resets.
The patch adds a new BAD_SENSE flag to indicate that usb-storage
should never ask for more than 18 bytes of sense data. The flag can
be set in an unusual_devs entry or via the "quirks=" module parameter,
and it is set automatically whenever a REQUEST SENSE command for more
than 18 bytes fails or times out.
An unusual_devs entry is added for the Agfa photo frame, which uses a
Prolific chip having this bug.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Daniel Kukula <daniel.kuku@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add D-Link DWM-162-U5 device id 1e0e:ce16 into option driver. The device
has 4 interfaces, of which 1 is handled by storage and the other 3 by
option driver.
The device appears first as CD-only 05c6:2100 device and must be switched
to 1e0e:ce16 mode either by using "eject CD" or usb_modeswitch.
"The gadget EP0 code routinely ignores an interrupt at end of
the data phase because of musb_g_ep0_giveback() resetting the
state machine to "idle, waiting for SETUP" phase prematurely."
So, the majority of the cases of unhandled IRQs is still unfixed...
USB drivers that create character devices call usb_register_dev in their
probe function. This associates the usb_interface device with that minor
number and creates the character device and announces it to the world.
However, the driver's probe function is called before the new
usb_interface is added to the driver's klist_devices.
This is a problem because userspace will respond to the character device
creation announcement by opening the character device. The driver's open
function will the call usb_find_interface to find the usb_interface
associated with that minor number. usb_find_interface will walk the
driver's list of devices and find the usb_interface with the matching
minor number.
Because the announcement happens before the usb_interface is added to the
driver's klist_devices, a race condition exists. A straightforward fix
is to walk the list of devices on usb_bus_type instead since the device
is added to that list before the announcement occurs.
bus_find_device calls get_device to bump the reference count on the found
device. It is arguable that the reference count should be dropped by the
caller of usb_find_interface instead of usb_find_interface, however,
the current users of usb_find_interface do not expect this.
The original version of this patch only matched against minor number
instead of driver and minor number. This version matches against both.
The range check in the sprom image parser hex2sprom() is broken.
One sprom word is 4 hex characters.
This fixes the check and also adds much better sanity checks to the code.
We better make sure the image is OK by doing some sanity checks to avoid
bricking the device by accident.
Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These drivers inherited from the older 'hpt366' IDE driver the buggy timing
register masks in their set_piomode() metods. As a result, too low command
cycle active time is programmed for slow PIO modes. Quite fortunately, it's
later "fixed up" by the set_dmamode() methods which also "helpfully" reprogram
the command timings, usually to PIO mode 4; unfortunately, setting an UltraDMA
mode #N also reprograms already set PIO data timings, usually to MWDMA mode #
max(N, 2) timings...
However, the drivers added some breakage of their own too: the bit that they
set/clear to control the FIFO is sometimes wrong -- it's actually the MSB of
the command cycle setup time; also, setting it in DMA mode is wrong as this
bit is only for PIO actually and clearing it for PIO modes is not needed as
no mode in any timing table has it set...
Fix all this, inverting the masks while at it, like in the 'hpt366' and
'pata_hpt366' drivers; bump the drivers' versions, accounting for recent
patches that forgot to do it...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A write intent bitmap can be removed from an array while the
array is active.
When this happens, all IO is suspended and flushed before the
bitmap is removed.
However it is possible that bitmap_daemon_work is still running to
clear old bits from the bitmap. If it is, it can dereference the
bitmap after it has been freed.
So introduce a new mutex to protect bitmap_daemon_work and get it
before destroying a bitmap.
This patch moves s390 processor status word into the base kvm_run
struct and keeps it up-to date on all userspace exits.
The userspace ABI is broken by this, however there are no applications
in the wild using this. A capability check is provided so users can
verify the updated API exists.
If there is a failed journal checksum, don't reset the journal. This
allows for userspace programs to decide how to recover from this
situation. It may be that ignoring the journal checksum failure might
be a better way of recovering the file system. Once we add per-block
checksums, we can definitely do better. Until then, a system
administrator can try backing up the file system image (or taking a
snapshot) and and trying to determine experimentally whether ignoring
the checksum failure or aborting the journal replay results in less
data loss.
A specially-crafted Hierarchical File System (HFS) filesystem could cause
a buffer overflow to occur in a process's kernel stack during a memcpy()
call within the hfs_bnode_read() function (at fs/hfs/bnode.c:24). The
attacker can provide the source buffer and length, and the destination
buffer is a local variable of a fixed length. This local variable (passed
as "&entry" from fs/hfs/dir.c:112 and allocated on line 60) is stored in
the stack frame of hfs_bnode_read()'s caller, which is hfs_readdir().
Because the hfs_readdir() function executes upon any attempt to read a
directory on the filesystem, it gets called whenever a user attempts to
inspect any filesystem contents.
[amwang@redhat.com: modify this patch and fix coding style problems] Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Eugene Teo <eteo@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Anderson <anderson@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
devpts_get_tty() assumes that the inode passed in is associated with a valid
pty. But if the only reference to the pty is via a bind-mount, the inode
passed to devpts_get_tty() while valid, would refer to a pty that no longer
exists.
With a lot of debug effort, Grzegorz Nosek developed a small program (see
below) to reproduce a crash on recent kernels. This crash is a regression
introduced by the commit:
Setting fops and private data outside of the mutex at debugfs file
creation introduces a race where the files can be opened with the wrong
file operations and private data. It is easy to trigger with a process
waiting on file creation notification.
commit d8e180dcd5bbbab9cd3ff2e779efcf70692ef541 "bsdacct: switch
credentials for writing to the accounting file" introduced credential
switching during final acct data collecting. However, uid/gid pair
continued to be collected from current which became credentials of who
created acct file, not who exits.
Without this we have no gaurantee of the integrity of the
EEPROM and are likely to encounter a lot of bogus bug reports
due to actual issues on the EEPROM. With the EEPROM checksum
check in place we can easily rule those issues out.
If you run patch during a revert *you* have a card with a busted
EEPROM and only older kernel will support that concoction. This
patch is a trade off between not accepitng bogus EEPROMs and
avoiding bogus bug reports allowing developers to focus instead
on real concrete issues.
If stable keeps bogus bug reports because of a possibly busted EEPROM
feel free to apply this there too.
Tested on an AR5414
Cc: jirislaby@gmail.com Cc: akpm@linux-foundation.org Cc: rjw@sisk.pl Cc: me@bobcopeland.com Cc: david.quan@atheros.com Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As a holdover from earlier code when we used to set
the power limit to '0' after a reset to configure the
default transmit power, ath5k interprets txpower=0 as
12.5 dBm. Fix that by just passing 0 through.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=14567
Reported-by: Daniel Folkers <daniel.folkers@task24.nl> Tested-by: Daniel Folkers <daniel.folkers@task24.nl> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The timer stop callback can be called from snd_timer_interrupt(), which
is called from the hrtimer callback. Since hrtimer_cancel() waits for
the callback completion, this eventually results in a lock-up.
This patch fixes the problem by just toggling a flag at stop callback
and call hrtimer_cancel() later.
Queueing to receive an ISO packet with a payload length of zero
silently does nothing in dualbuffer mode, and crashes the kernel in
packet-per-buffer mode. Return an error in dualbuffer mode, because
the DMA controller won't let us do what we want, and work correctly in
packet-per-buffer mode.
Signed-off-by: Jay Fenlason <fenlason@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1312) fixes a minor bug in usb-storage. The
fill_inquiry() routine neglects to pre-load the inquiry data buffer
with spaces. As a result, if the vendor name is shorter than 8
characters or the product name is shorter than 16, the remainder will
be filled with garbage.
The patch also removes some unnecessary calls to strlen().
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
All architectures in the kernel increment/decrement the stack pointer
before storing values on the stack.
On architectures which have the stack grow down sas_ss_sp == sp is not
on the alternate signal stack while sas_ss_sp + sas_ss_size == sp is
on the alternate signal stack.
On architectures which have the stack grow up sas_ss_sp == sp is on
the alternate signal stack while sas_ss_sp + sas_ss_size == sp is not
on the alternate signal stack.
The current implementation fails for architectures which have the
stack grow down on the corner case where sas_ss_sp == sp.This was
reported as Debian bug #544905 on AMD64.
Simplified test case: http://download.breakpoint.cc/tc-sig-stack.c
The test case creates the following stack scenario:
0xn0300 stack top
0xn0200 alt stack pointer top (when switching to alt stack)
0xn01ff alt stack end
0xn0100 alt stack start == stack pointer
If the signal is sent the stack pointer is pointing to the base
address of the alt stack and the kernel erroneously decides that it
has already switched to the alternate stack because of the current
check for "sp - sas_ss_sp < sas_ss_size"
On parisc (stack grows up) the scenario would be:
0xn0200 stack pointer
0xn01ff alt stack end
0xn0100 alt stack start = alt stack pointer base
(when switching to alt stack)
0xn0000 stack base
This is handled correctly by the current implementation.
[ tglx: Modified for archs which have the stack grow up (parisc) which
would fail with the correct implementation for stack grows
down. Added a check for sp >= current->sas_ss_sp which is
strictly not necessary but makes the code symetric for both
variants ]
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca>
LKML-Reference: <20091025143758.GA6653@Chamillionaire.breakpoint.cc> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some of our virtual SCSI hosts don't have a proper bus parent at the
top, which can be a problem for doing DMA on them
This patch makes the host device cache a pointer to the physical bus
device and provides an extra API for setting it (the normal API picks
it up from the parent). This patch also modifies the qla2xxx and lpfc
vport logic to use the new DMA host setting API.
Acked-By: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
include/scsi/osd_protocol.h uses ALIGN() without an #include
<linux/kernel.h>, leading to:
| include/scsi/osd_protocol.h:362: error: implicit declaration of function 'ALIGN'
Signed-off-by: Martin Michlmayr <tbm@cyrius.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes three problems in the handling of the
EXT4_IOC_MOVE_EXT ioctl:
1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for
original and donor files, but they allow the illegal write access to
donor file, since donor file is overwritten by original file data. To
fix this problem, change access mode checks of original (r->r/w) and
donor (r->w) files.
2. Disallow the use of donor files that have a setuid or setgid bits.
3. Call mnt_want_write() and mnt_drop_write() before and after
ext4_move_extents() calling to get write access to a mount.
We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid
This means that we may end-up with transferring all quotas. Add
we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in
case of QUOTA_INIT_BLOCKS.
Currently all quota block reservation macros contains hard-coded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.
There is a potential race when a transaction is committing right when
the file system is being umounting. This could reduce in a race
because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super
before the commit code calls a callback so the mballoc code can
release freed blocks in the transaction, resulting in a panic trying
to access the freed s_group_info.
The fix is to wait for the transaction to finish committing before we
shutdown the multiblock allocator.
When ext4_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks
already instantiated beyond i_size. Although these blocks were never
inside i_size, we have to truncate the pagecache of these blocks so
that corresponding buffers get unmapped. Otherwise subsequent
__block_prepare_write (called because we are retrying the write) will
find the buffers mapped, not call ->get_block, and thus the page will
be backed by already freed blocks leading to filesystem and data
corruption.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Integrate duplicate lines (acquire/release semaphore and invalidate
extent cache in move_extent_per_page()) into mext_replace_branches(),
to reduce source and object code size.
The move_extent.moved_len is used to pass back the number of exchanged
blocks count to user space. Currently the caller must clear this
field; but we spend more code space checking for this requirement than
simply zeroing the field ourselves, so let's just make life easier for
everyone all around.
At the beginning of ext4_move_extent(), we call
ext4_discard_preallocations() to discard inode PAs of orig and donor
inodes. But in the following case, blocks can be double freed, so
move ext4_discard_preallocations() to the end of ext4_move_extents().
1. Discard inode PAs of orig and donor inodes with
ext4_discard_preallocations() in ext4_move_extents().
orig : [ DATA1 ]
donor: [ DATA2 ]
2. While data blocks are exchanging between orig and donor inodes, new
inode PAs is created to orig by other process's block allocation.
(Since there are semaphore gaps in ext4_move_extents().) And new
inode PAs is used partially (2-1).
2-1 Create new inode PAs to orig inode
orig : [ DATA1 | used PA1 | free PA1 ]
donor: [ DATA2 ]
3. Donor inode which has old orig inode's blocks is deleted after
EXT4_IOC_MOVE_EXT finished (3-1, 3-2). So the block bitmap
corresponds to old orig inode's blocks are freed.
3-1 After EXT4_IOC_MOVE_EXT finished
orig : [ DATA2 | free PA1 ]
donor: [ DATA1 | used PA1 ]
4. The double-free of blocks is occurred, when close() is called to
orig inode. Because ext4_discard_preallocations() for orig inode
frees used PA1 and free PA1, though used PA1 is already freed in 3.
Users on the linux-ext4 list recently complained about differences
across filesystems w.r.t. how to mount without a journal replay.
In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext4.
It is anticipated that when sb_issue_discard starts doing
real work on trim-capable devices, we may see issues. Make
this mount-time optional, and default it to off until we know
that things are working out OK.
When an error happened in ext4_splice_branch we failed to notice that
in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem
by checking for error properly.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext4_statfs()
to update the on-disk superblock counters, but modified this buffer
directly without any journaling of the change. This is one of the
accesses that was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
ext4_xattr_set_handle() was zeroing out an inode outside
of journaling constraints; this is one of the accesses that
was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
Reviewed-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We need to be testing the i_flags field in the ext4 specific portion
of the inode, instead of the (confusingly aliased) i_flags field in
the generic struct inode.
When an inode gets unlinked, the functions ext4_clear_blocks() and
ext4_remove_blocks() call ext4_forget() for all the buffer heads
corresponding to the deleted inode's data blocks. If the inode is a
directory or a symlink, the is_metadata parameter must be non-zero so
ext4_forget() will revoke them via jbd2_journal_revoke(). Otherwise,
if these blocks are reused for a data file, and the system crashes
before a journal checkpoint, the journal replay could end up
corrupting these data blocks.
Thanks to Curt Wohlgemuth for pointing out potential problems in this
area.
One of the invalid error paths in ext4_iget() forgot to brelse() the
inode buffer head. Fix it by adding a brelse() in the common error
return path, which also simplifies function.
Thanks to Andi Kleen <ak@linux.intel.com> reporting the problem.
If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem()
will trigger a false-positive warning of a recursive lock. Since we
take i_data_sem for the two inodes ordered by their inode numbers,
this isn't a problem. Use of down_write_nested() will notify the lock
dependency checker machinery that there is no problem here.
ext4_move_extents() checks the logical block contiguousness
of original file with ext4_find_extent() and mext_next_extent().
Therefore the extent which ext4_ext_path structure indicates
must not be changed between above functions.
But in current implementation, there is no i_data_sem protection
between ext4_ext_find_extent() and mext_next_extent(). So the extent
which ext4_ext_path structure indicates may be overwritten by
delalloc. As a result, ext4_move_extents() will exchange wrong blocks
between original and donor files. I change the place where
acquire/release i_data_sem to solve this problem.
Moreover, I changed move_extent_per_page() to start transaction first,
and then acquire i_data_sem. Without this change, there is a
possibility of the deadlock between mmap() and ext4_move_extents():
* NOTE: "A", "B" and "C" mean different processes
A-1: ext4_ext_move_extents() acquires i_data_sem of two inodes.
B: do_page_fault() starts the transaction (T),
and then tries to acquire i_data_sem.
But process "A" is already holding it, so it is kept waiting.
C: While "A" and "B" running, kjournald2 tries to commit transaction (T)
but it is under updating, so kjournald2 waits for it.
A-2: Call ext4_journal_start with holding i_data_sem,
but transaction (T) is locked.
If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were
exchanged before the failure should be returned to the userspace
caller. Unfortunately, currently if the block size is not the same as
the page size, the returned block count that is returned is the
page-aligned block count instead of the actual block count. This
commit addresses this bug.
If s_log_groups_per_flex is greater than 31, then groups_per_flex will
will overflow and cause a divide by zero error. This can cause kernel
BUG if such a file system is mounted.
Thanks to Nageswara R Sastry for analyzing the failure and providing
an initial patch.
Previously add_dirent_to_buf() did not free its passed-in buffer head
in the case of ENOSPC, since in some cases the caller still needed it.
However, this led to potential buffer head leaks since not all callers
dealt with this correctly. Fix this by making simplifying the freeing
convention; now add_dirent_to_buf() *never* frees the passed-in buffer
head, and leaves that to the responsibility of its caller. This makes
things cleaner and easier to prove that the code is neither leaking
buffer heads or calling brelse() one time too many.
To prepare for a direct I/O write, we need to split the unwritten
extents before submitting the I/O. When no extents needed to be
split, ext4_split_unwritten_extents() was incorrectly returning 0
instead of the size of uninitialized extents. This bug caused the
wrong return value sent back to VFS code when it gets called from
async IO path, leading to an unnecessary fall back to buffered IO.
This bug also hid the fact that the check to see whether or not a
split would be necessary was incorrect; we can only skip splitting the
extent if the write completely covers the uninitialized extent.
The ext4_debug() call in ext4_end_io_dio() should be moved after the
check to make sure that io_end is non-NULL.
The comment above ext4_get_block_dio_write() ("Maximum number of
blocks...") is a duplicate; the original and correct comment is above
the #define DIO_MAX_BLOCKS up above.
At the end of direct I/O operation, ext4_ext_direct_IO() always called
ext4_convert_unwritten_extents(), regardless of whether there were any
unwritten extents involved in the I/O or not.
This commit adds a state flag so that ext4_ext_direct_IO() only calls
ext4_convert_unwritten_extents() when necessary.
After a direct I/O request covering an uninitalized extent (i.e.,
created using the fallocate system call) or a hole in a file, ext4
will convert the uninitialized extent so it is marked as initialized
by calling ext4_convert_unwritten_extents(). This function returns
zero on success.
This return value was getting returned by ext4_direct_IO(); however
the file system's direct_IO function is supposed to return the number
of bytes read or written on a success. By returning zero, it confused
the direct I/O code into falling back to buffered I/O unnecessarily.
When restart a transaction during a truncate operation, we drop and
reacquire i_data_sem. After reacquiring i_data_sem, we need to
discard any inode-based preallocation that might have been grabbed
while we released i_data_sem (for example, if pdflush is allocating
blocks and racing against the truncate).
In ext4_num_dirty_pages() we were calling page_buffers() before
checking to see if the page actually had pages attached to it; this
would cause a BUG check crash in the inline function page_buffers().
Thanks to Markus Trippelsdorf for reporting this bug.
"Looking at ext4.h, I think the setting of extra time fields forgets to
mask the epoch bits so the epoch part overwrites nsec part. The second
change is only for coherency (2 -> EXT4_EPOCH_BITS)."
Thanks to Damien Guibouret for pointing out this problem.
This patch a problem that ext4_dirty_inode() was not calling
ext4_mark_inode_dirty() if the current_handle is not valid, which it
is the case in no journal mode.
It also removes a test for non-matching transaction which can never
happen.