When an ICMPV6_PKT_TOOBIG message is received with a MTU below 1280,
all further packets include a fragment header.
Unlike regular defragmentation, conntrack also needs to "reassemble"
those fragments in order to obtain a packet without the fragment
header for connection tracking. Currently nf_conntrack_reasm checks
whether a fragment has either IP6_MF set or an offset != 0, which
makes it ignore those fragments.
Remove the invalid check and make reassembly handle fragment queues
containing only a single fragment.
There's a race window in xen_drop_mm_ref, where remote cpu may exit
dirty bitmap between the check on this cpu and the point where remote
cpu handles drop request. So in drop_other_mm_ref we need check
whether TLB state is still lazy before calling into leave_mm. This
bug is rarely observed in earlier kernel, but exaggerated by the
commit 831d52bc153971b70e64eccfbed2b232394f22f8
("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm")
which clears bitmap after changing the TLB state. the call trace is as below:
Tested-by: Maoxiaoyun<tinnycloud@hotmail.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
[v1: Fleshed out the git description a bit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
TI816X (common name for DM816x/C6A816x/AM389x family) devices configured
to boot as PCIe Endpoint have class code = 0. This makes kernel PCI bus
code to skip allocating BARs to these devices resulting into following
type of error when trying to enable them:
"Device 0000:01:00.0 not available because of resource collisions"
The device cannot be operated because of the above issue.
This patch adds a ID specific (TI VENDOR ID and 816X DEVICE ID based)
'early' fixup quirk to replace class code with
PCI_CLASS_MULTIMEDIA_VIDEO as class.
The TCP connection state code depends on the state_change() callback
being called when the SYN_SENT state is set. However the networking layer
doesn't actually call us back in that case.
When finding or allocating a ram disk device, brd_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example (I set CONFIG_BLK_DEV_RAM_COUNT=4
for simplicity) :
$ sudo modprobe brd max_part=15
$ ls -l /dev/ram*
brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0
brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1
brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2
brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3
$ sudo mknod /dev/ram4 b 1 64
$ sudo dd if=/dev/zero of=/dev/ram4 bs=4k count=256
256+0 records in
256+0 records out 1048576 bytes (1.0 MB) copied, 0.00215578 s, 486 MB/s
namhyung@leonhard:linux$ ls -l /dev/ram*
brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0
brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1
brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2
brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3
brw-r--r-- 1 root root 1, 64 2011-05-25 15:45 /dev/ram4
brw-rw---- 1 root disk 1, 1024 2011-05-25 15:44 /dev/ram64
After this patch, /dev/ram4 - instead of /dev/ram64 - was
accessed correctly.
In addition, 'range' passed to blk_register_region() should
include all range of dev_t that RAMDISK_MAJOR can address.
It does not need to be limited by partition numbers unless
'rd_nr' param was specified.
The 'max_part' parameter controls the number of maximum partition
a brd device can have. However if a user specifies very large
value it would exceed the limitation of device minor number and
can cause a kernel panic (or, at least, produce invalid device
nodes in some cases).
On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:
$ sudo modprobe brd max_part=100000
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff81110a9a>] sysfs_create_dir+0x2d/0xae
PGD 7af1067 PUD 7b19067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: brd(+)
It's currently exposed only through /proc which, besides requiring
screen-scraping, doesn't allow userspace to distinguish between two
identical ATM adapters with different ATM indexes. The ATM device index
is required when using PPPoATM on a system with multiple ATM adapters.
Signed-off-by: Dan Williams <dcbw@redhat.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds a check that a block device has a request function
defined before it is used. Otherwise, misconfiguration can cause an oops.
Because we are allowing devices with zero size e.g. an offline multipath
device as in commit 2cd54d9bedb79a97f014e86c0da393416b264eb3
("dm: allow offline devices") there needs to be an additional check
to ensure devices are initialised. Some block devices, like a loop
device without a backing file, exist but have no request function.
Reproducer is trivial: dm-mirror on unbound loop device
(no backing file on loop devices)
Cpuidle menu governor is using u32 as a temporary datatype for storing
nanosecond values which wrap around at 4.294 seconds. This causes errors
in predicted sleep times resulting in higher than should be C state
selection and increased power consumption. This also breaks cpuidle
state residency statistics.
Signed-off-by: Tero Kristo <tero.kristo@nokia.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i8k uses lahf to read the flag register in 64-bit code; early x86-64
CPUs, however, lack this instruction and we get an invalid opcode
exception at runtime.
Use pushf to load the flag register into the stack instead.
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com> Reported-by: Jeff Rickman <jrickman@myamigos.us> Tested-by: Jeff Rickman <jrickman@myamigos.us> Tested-by: Harry G McGavran Jr <w5pny@arrl.net> Cc: Massimo Dal Zotto <dz@debian.org> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When re-mounting from R/O mode to R/W mode and the LEB count in the superblock
is not up-to date, because for the underlying UBI volume became larger, we
re-write the superblock. We allocate RAM for these purposes, but never free it.
So this is a memory leak, although very rare one.
The buffers allocated while encrypting and decrypting long filenames can
sometimes straddle two pages. In this situation, virt_to_scatterlist()
will return -ENOMEM, causing the operation to fail and the user will get
scary error messages in their logs:
kernel: ecryptfs_write_tag_70_packet: Internal error whilst attempting
to convert filename memory to scatterlist; expected rc = 1; got rc =
[-12]. block_aligned_filename_size = [272]
kernel: ecryptfs_encrypt_filename: Error attempting to generate tag 70
packet; rc = [-12]
kernel: ecryptfs_encrypt_and_encode_filename: Error attempting to
encrypt filename; rc = [-12]
kernel: ecryptfs_lookup: Error attempting to encrypt and encode
filename; rc = [-12]
The solution is to allow up to 2 scatterlist entries to be used.
Reported-by: Mark Davis <marked86@gmail.com> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The original problem encountered by people using NVIDIA chipsets was
that USB devices were not turning off when the system shut down. For
example, the LED on an optical mouse would remain on, draining a
laptop's battery. The problem was caused by a bug in the chipset; an
OHCI controller in the Reset state would continue to drive a bus reset
signal even after system shutdown. The workaround was to put the
controllers into the Suspend state instead.
It turns out that later NVIDIA chipsets do not suffer from this bug.
Instead some have the opposite bug: If a system is shut down while an
OHCI controller is in the Suspend state, USB devices remain powered!
On other systems, shutting down with a Suspended controller causes the
system to reboot immediately. Thus, working around the original bug
on some machines exposes other bugs on other machines.
The best solution seems to be to limit the workaround to OHCI
controllers with a low-numbered PCI product ID. I don't know exactly
at what point NVIDIA changed their chipsets; the value used here is a
guess. So far it was worked out okay for all the people who have
tested it.
This fixes Bugzilla #35032.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Andre "Osku" Schmidt <andre.osku.schmidt@googlemail.com> Tested-by: Yury Siamashka <yurand2@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
introduced a bug. The USB 2.0 spec says that full speed isochronous endpoints'
bInterval must be decoded as an exponent to a power of two (e.g. interval =
2^(bInterval - 1)). Full speed interrupt endpoints, on the other hand, don't
use exponents, and the interval in frames is encoded straight into bInterval.
Dmitry's patch was supposed to fix up the full speed isochronous to parse
bInterval as an exponent, but instead it changed the *interrupt* endpoint
bInterval decoding. The isochronous endpoint encoding was the same.
This caused full speed devices with interrupt endpoints (including mice, hubs,
and USB to ethernet devices) to fail under NEC 0.96 xHCI host controllers:
[ 100.909818] xhci_hcd 0000:06:00.0: add ep 0x83, slot id 1, new drop flags = 0x0, new add flags = 0x99, new slot info = 0x38100000
[ 100.909821] xhci_hcd 0000:06:00.0: xhci_check_bandwidth called for udev ffff88011f0ea000
...
[ 100.910187] xhci_hcd 0000:06:00.0: ERROR: unexpected command completion code 0x11.
[ 100.910190] xhci_hcd 0000:06:00.0: xhci_reset_bandwidth called for udev ffff88011f0ea000
When the interrupt endpoint was added and a Configure Endpoint command was
issued to the host, the host controller would return a very odd error message
(0x11 means "Slot Not Enabled", which isn't true because the slot was enabled).
Probably the host controller was getting very confused with the bad encoding.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Cc: Dmitry Torokhov <dtor@vmware.com> Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com> Tested-by: Thomas Lindroth <thomas.lindroth@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
composite.c always sets req->length to zero
and expects function driver's setup handlers
to return the amount of bytes to be used
on req->length. If we test against req->length
w_length will always be greater than req->length
thus making us always stall that particular
SEND_ENCAPSULATED_COMMAND request.
Tested against a Windows XP SP3.
Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a problem where data received from the gps is sometimes
transferred incompletely to the serial port. If used in native mode now
all data received via the bulk queue will be forwarded to the serial
port.
Signed-off-by: Hermann Kneissel <herkne@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When finding or allocating a loop device, loop_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example:
$ sudo modprobe loop max_part=15
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
$ sudo mknod /dev/loop8 b 7 128
$ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
$ sudo losetup -a
/dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8
After this patch, /dev/loop8 - instead of /dev/loop128 - was
accessed correctly.
In addition, 'range' passed to blk_register_region() should
include all range of dev_t that LOOP_MAJOR can address. It does
not need to be limited by partition numbers unless 'max_loop'
param was specified.
The 'max_part' parameter controls the number of maximum partition
a loop block device can have. However if a user specifies very
large value it would exceed the limitation of device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).
On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:
$ sudo modprobe loop max_part0000
------------[ cut here ]------------
kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: loop(+)
I'm not entirely sure it needs to go into 32, but it's probably the right
thing to do. Another way of explaining the patch is:
- we currently pick the _first_ exactly matching bus resource entry, but
the _last_ inexactly matching one. Normally first/last shouldn't
matter, but bus resource entries aren't actually all created equal: in
a transparent bus, the last resources will be the parent resources,
which we should generally try to avoid unless we have no choice. So
"first matching" is the thing we should always aim for.
- the patch is a bit bigger than it needs to be, because I simplified the
logic at the same time. It used to be a fairly incomprehensible
if ((res->flags & IORESOURCE_PREFETCH) && !(r->flags & IORESOURCE_PREFETCH))
best = r; /* Approximating prefetchable by non-prefetchable */
and technically, all the patch did was to make that complex choice be
even more complex (it basically added a "&& !best" to say that if we
already gound a non-prefetchable window for the prefetchable resource,
then we won't override an earlier one with that later one: remember
"first matching").
- So instead of that complex one with three separate conditionals in one,
I split it up a bit, and am taking advantage of the fact that we
already handled the exact case, so if 'res->flags' has the PREFETCH
bit, then we already know that 'r->flags' will _not_ have it. So the
simplified code drops the redundant test, and does the new '!best' test
separately. It also uses 'continue' as a way to ignore the bus
resource we know doesn't work (ie a prefetchable bus resource is _not_
acceptable for anything but an exact match), so it turns into:
/* We can't insert a non-prefetch resource inside a prefetchable parent .. */
if (r->flags & IORESOURCE_PREFETCH)
continue;
/* .. but we can put a prefetchable resource inside a non-prefetchable one */
if (!best)
best = r;
instead. With the comments, it's now six lines instead of two, but it's
conceptually simpler, and I _could_ have written it as two lines:
if ((res->flags & IORESOURCE_PREFETCH) && !best)
best = r; /* Approximating prefetchable by non-prefetchable */
I believe I found a problem in __alloc_pages_slowpath, which allows a
process to get stuck endlessly looping, even when lots of memory is
available.
Running an I/O and memory intensive stress-test I see a 0-order page
allocation with __GFP_IO and __GFP_WAIT, running on a system with very
little free memory. Right about the same time that the stress-test gets
killed by the OOM-killer, the utility trying to allocate memory gets stuck
in __alloc_pages_slowpath even though most of the systems memory was freed
by the oom-kill of the stress-test.
The utility ends up looping from the rebalance label down through the
wait_iff_congested continiously. Because order=0,
__alloc_pages_direct_compact skips the call to get_page_from_freelist.
Because all of the reclaimable memory on the system has already been
reclaimed, __alloc_pages_direct_reclaim skips the call to
get_page_from_freelist. Since there is no __GFP_FS flag, the block with
__alloc_pages_may_oom is skipped. The loop hits the wait_iff_congested,
then jumps back to rebalance without ever trying to
get_page_from_freelist. This loop repeats infinitely.
The test case is pretty pathological. Running a mix of I/O stress-tests
that do a lot of fork() and consume all of the system memory, I can pretty
reliably hit this on 600 nodes, in about 12 hours. 32GB/node.
Signed-off-by: Andrew Barry <abarry@cray.com> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel<riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are no signs of a dmic at node 0x0b, so the user is left with
an additional internal mic which does not exist. This commit removes
that non-existing mic.
BugLink: http://bugs.launchpad.net/bugs/731706 Reported-by: James Page <james.page@canonical.com> Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move the smp_rmb after cpu_relax loop in read_seqlock and add
ACCESS_ONCE to make sure the test and return are consistent.
A multi-threaded core in the lab didn't like the update
from 2.6.35 to 2.6.36, to the point it would hang during
boot when multiple threads were active. Bisection showed af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
Remove the per cpu tick skew) as the culprit and it is
supported with stack traces showing xtime_lock waits including
tick_do_update_jiffies64 and/or update_vsyscall.
Experimentation showed the combination of cpu_relax and smp_rmb
was significantly slowing the progress of other threads sharing
the core, and this patch is effective in avoiding the hang.
A theory is the rmb is affecting the whole core while the
cpu_relax is causing a resource rebalance flush, together they
cause an interfernce cadance that is unbroken when the seqlock
reader has interrupts disabled.
At first I was confused why the refactor in 3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
seqlock) didn't affect this patch application, but after some
study that affected seqcount not seqlock. The new seqcount was
not factored back into the seqlock. I defer that the future.
While the removal of the timer interrupt offset created
contention for the xtime lock while a cpu does the
additonal work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and is just waiting for the right trigger.
Signed-off-by: Milton Miller <miltonm@bga.com> Cc: <linuxppc-dev@lists.ozlabs.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Anton Blanchard <anton@samba.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Link: http://lkml.kernel.org/r/%3Cseqlock-rmb%40mdm.bga.com%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As Ben Hutchings discovered [1], the patch for CVE-2011-1017 (buffer
overflow in ldm_frag_add) is not sufficient. The original patch in
commit c340b1d64000 ("fs/partitions/ldm.c: fix oops caused by corrupted
partition table") does not consider that, for subsequent fragments,
previously allocated memory is used.
[1] http://lkml.org/lkml/2011/5/6/407
Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
HARDIRQ_ENTER() maps to irq_enter() which calls rcu_irq_enter().
But HARDIRQ_EXIT() maps to __irq_exit() which doesn't call
rcu_irq_exit().
So for every locking selftest that simulates hardirq disabled,
we create an imbalance in the rcu extended quiescent state
internal state.
As a result, after the first missing rcu_irq_exit(), subsequent
irqs won't exit dyntick-idle mode after leaving the interrupt
handler. This means that RCU won't see the affected CPU as being
in an extended quiescent state, resulting in long grace-period
delays (as in grace periods extending for hours).
To fix this, just use __irq_enter() to simulate the hardirq
context. This is sufficient for the locking selftests as we
don't need to exit any extended quiescent state or perform
any check that irqs normally do when they wake up from idle.
As a side effect, this patch makes it possible to restore
"rcu: Decrease memory-barrier usage based on semi-formal proof",
which eventually helped finding this bug.
Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
introduced a read and a write to the MC4 mask msr.
Unfortunatly this MSR is not emulated by the KVM hypervisor
so that the kernel will get a #GP and crashes when applying
this workaround when running inside KVM.
This issue was reported as:
https://bugzilla.kernel.org/show_bug.cgi?id=35132
and is fixed with this patch. The change just let the kernel
ignore any #GP it gets while accessing this MSR by using the
_safe msr access methods.
Reported-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 added support for
ARAT (Always Running APIC timer) on AMD processors that are not
affected by erratum 400. This erratum is present on certain processor
families and prevents APIC timer from waking up the CPU when it
is in a deep C state, including C1E state.
Determining whether a processor is affected by this erratum may
have some corner cases and handling these cases is somewhat
complicated. In the interest of simplicity we won't claim ARAT
support on processor families below 0x12 and will go back to
broadcasting timer when going idle.
Signed-off-by: Boris Ostrovsky <ostr@amd64.org> Link: http://lkml.kernel.org/r/1306423192-19774-1-git-send-email-ostr@amd64.org Tested-by: Boris Petkov <borislav.petkov@amd.com> Cc: Hans Rosenfeld <Hans.Rosenfeld@amd.com> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
with comment "The following patch fixes it by using the '+' operator on
the (*field) operand, marking it as read-write to gcc."
'+' was actually forgotten. This really puts it.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: James Bottomley <jbottomley@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If an application program does not make any changes to the indirect
blocks or extent tree, i_datasync_tid will not get updated. If there
are enough commits (i.e., 2**31) such that tid_geq()'s calculations
wrap, and there isn't a currently active transaction at the time of
the fdatasync() call, this can end up triggering a BUG_ON in
fs/jbd/commit.c:
J_ASSERT(journal->j_running_transaction != NULL);
It's pretty rare that this can happen, since it requires the use of
fdatasync() plus *very* frequent and excessive use of fsync(). But
with the right workload, it can.
We fix this by replacing the use of tid_geq() with an equality test,
since there's only one valid transaction id that is valid for us to
start: namely, the currently running transaction (if it exists).
Reported-by: Martin_Zielinski@McAfee.com Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In do_get_write_access() we wait on BH_Unshadow bit for buffer to get
from shadow state. The waking code in journal_commit_transaction() has
a bug because it does not issue a memory barrier after the buffer is moved
from the shadow state and before wake_up_bit() is called. Thus a waitqueue
check can happen before the buffer is actually moved from the shadow state
and waiting process may never be woken. Fix the problem by issuing proper
barrier.
Reported-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated
block for index tree root, we did not properly mark all changed buffers dirty.
This lead to only some of these buffers being written out and thus effectively
corrupting the directory.
Fix the issue by marking all changed data dirty even in the error failure case.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
there's a kernel bug related to reading the last allowed page on x86_64.
The _copy_to_user() and _copy_from_user() functions use the following
check for address limit:
if (buf + size >= limit)
fail();
while it should be more permissive:
if (buf + size > limit)
fail();
That's because the size represents the number of bytes being
read/write from/to buf address AND including the buf address.
So the copy function will actually never touch the limit
address even if "buf + size == limit".
Following program fails to use the last page as buffer
due to the wrong limit check:
The other place checking the addr limit is the access_ok() function,
which is working properly. There's just a misleading comment
for the __range_not_ok() macro - which this patch fixes as well.
The last page of the user-space address range is a guard page and
Brian Gerst observed that the guard page itself due to an erratum on K8 cpus
(#121 Sequential Execution Across Non-Canonical Boundary Causes Processor
Hang).
However, the test code is using the last valid page before the guard page.
The bug is that the last byte before the guard page can't be read
because of the off-by-one error. The guard page is left in place.
This bug would normally not show up because the last page is
part of the process stack and never accessed via syscalls.
Currently mtdconcat is broken for NAND. An attemtpt to create
JFFS2 filesystem on concatenation of several NAND devices fails
with OOB write errors. This patch fixes that problem.
Signed-off-by: Felix Radensky <felix@embedded-sol.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
blk_cleanup_queue() calls elevator_exit() and after this, we can't
touch the elevator without oopsing. __elv_next_request() must check
for this state because in the refcounted queue model, we can still
call it after blk_cleanup_queue() has been called.
This was reported as causing an oops attributable to scsi.
__blkdev_get() doesn't rescan partitions if disk->fops->open() fails,
which leads to ghost partition devices lingering after medimum removal
is known to both the kernel and userland. The behavior also creates a
subtle inconsistency where O_NONBLOCK open, which doesn't fail even if
there's no medium, clears the ghots partitions, which is exploited to
work around the problem from userland.
Fix it by updating __blkdev_get() to issue partition rescan after
-ENOMEDIA too.
This was reported in the following bz.
https://bugzilla.kernel.org/show_bug.cgi?id=13029
Stable: 2.6.38
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Zeuthen <zeuthen@gmail.com> Reported-by: Martin Pitt <martin.pitt@ubuntu.com> Reported-by: Kay Sievers <kay.sievers@vrfy.org> Tested-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 0837e3242c73566fc1c0196b4ec61779c25ffc93 fixes a situation on POWER7
where events can roll back if a specualtive event doesn't actually complete.
This can raise a performance monitor exception. We need to catch this to ensure
that we reset the PMC. In all cases the PMC will be less than 256 cycles from
overflow.
This patch lifts Anton's fix for the problem in perf and applies it to oprofile
as well.
Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 1fc711f7ffb01089efc58042cfdbac8573d1b59a (powerpc/kexec: Fix race
in kexec shutdown) moved the write to signal the cpu had exited the kernel
from before the transition to real mode in kexec_smp_wait to kexec_wait.
Unfornately it missed that kexec_wait is used both by cpus leaving the
kernel and by secondary slave cpus that were not allocated a paca for
what ever reason -- they could be beyond nr_cpus or not described in
the current device tree for whatever reason (for example, kexec-load
was not refreshed after a cpu hotplug operation). Cpus coming through
that path they will write to paca[NR_CPUS] which is beyond the space
allocated for the paca data and overwrite memory not allocated to pacas
but very likely still real mode accessable).
Move the write back to kexec_smp_wait, which is used only by cpus that
found their paca, but after the transition to real mode.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a CPU is taken offline in an SMP system, cpufreq_remove_dev()
nulls out the per-cpu policy before cpufreq_stats_free_table() can
make use of it. cpufreq_stats_free_table() then skips the
call to sysfs_remove_group(), leaving about 100 bytes of sysfs-related
memory unclaimed each time a CPU-removal occurs. Break up
cpu_stats_free_table into sysfs and table portions, and
call the sysfs portion early.
Signed-off-by: Steven Finney <steven.finney@palm.com> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we discover CPUs that are affected by each other's
frequency/voltage transitions, the first CPU gets a sysfs directory
created, and rest of the siblings get symlinks. Currently, when we
hotplug off only the first CPU, all of the symlinks and the sysfs
directory gets removed. Even though rest of the siblings are still
online and functional, they are orphaned, and no longer governed by
cpufreq.
This patch, given the above scenario, creates a sysfs directory for
the first sibling and symlinks for the rest of the siblings.
Please note the recursive call, it was rather too ugly to roll it
out. And the removal of redundant NULL setting (it is already taken
care of near the top of the function).
Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Mark Langsdorf <mark.langsdorf@amd.com> Reviewed-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The kmemleak_seq_next() function tries to get an object (and increment
its use count) before returning it. If it could not get the last object
during list traversal (because it may have been freed), the function
should return NULL rather than a pointer to such object that it did not
get.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If function tracing is enabled, a read of the filter files will
cause the call to stop_machine to update the function trace sites.
It should only call stop_machine on write.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ben Hutchings [Tue, 17 May 2011 00:48:14 +0000 (01:48 +0100)]
netxen: Remove references to unified firmware file
Commit c23a103f0d9c2560c6839ed366feebec4cd5e556 wrongly introduced
references to the unified firmware file "phanfw.bin", which is not
supported by netxen in 2.6.32. The driver reports this filename when
loading firmware from flash, and includes a MODULE_FIRMWARE hint for
the filename even though it will never use it.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
During initialization of vmxnet3, the state of LRO
gets out of sync with netdev->features.
This leads to very poor TCP performance in a IP forwarding
setup and is hitting many VMware users.
Simplified call sequence:
1. vmxnet3_declare_features() initializes "adapter->lro" to true.
2. The kernel automatically disables LRO if IP forwarding is enabled,
so vmxnet3_set_flags() gets called. This also updates netdev->features.
3. Now vmxnet3_setup_driver_shared() is called. "adapter->lro" is still
set to true and LRO gets enabled again, even though
netdev->features shows it's disabled.
Fix it by updating "adapter->lro", too.
The private vmxnet3 adapter flags are scheduled for removal
in net-next, see commit a0d2730c9571aeba793cb5d3009094ee1d8fda35
"net: vmxnet3: convert to hw_features".
Patch applies to 2.6.37 / 2.6.38 and 2.6.39-rc6.
Please CC: comments.
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
b may be added to a list, but is not removed before being freed
in the case of an error. This is done in the corresponding
deallocation function, so the code here has been changed to
follow that.
The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression E,E1,E2;
identifier l;
@@
*list_add(&E->l,E1);
... when != E1
when != list_del(&E->l)
when != list_del_init(&E->l)
when != E = E2
*kfree(E);// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a bug reported by a customer, who found
that many unreasonable error interrupts reported on all
non-boot CPUs (APs) during the system boot stage.
According to Chapter 10 of Intel Software Developer Manual
Volume 3A, Local APIC may signal an illegal vector error when
an LVT entry is set as an illegal vector value (0~15) under
FIXED delivery mode (bits 8-11 is 0), regardless of whether
the mask bit is set or an interrupt actually happen. These
errors are seen as error interrupts.
The initial value of thermal LVT entries on all APs always reads
0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
sequence to them and LVT registers are reset to 0s except for
the mask bits which are set to 1s when APs receive INIT IPI.
When the BIOS takes over the thermal throttling interrupt,
the LVT thermal deliver mode should be SMI and it is required
from the kernel to keep AP's LVT thermal monitoring register
programmed as such as well.
This issue happens when BIOS does not take over thermal throttling
interrupt, AP's LVT thermal monitor register will be restored to
0x10000 which means vector 0 and fixed deliver mode, so all APs will
signal illegal vector error interrupts.
This patch check if interrupt delivery mode is not fixed mode before
restoring AP's LVT thermal monitor register.
The first cpu which switches from periodic to oneshot mode switches
also the broadcast device into oneshot mode. The broadcast device
serves as a backup for per cpu timers which stop in deeper
C-states. To avoid starvation of the cpus which might be in idle and
depend on broadcast mode it marks the other cpus as broadcast active
and sets the brodcast expiry value of those cpus to the next tick.
The oneshot mode broadcast bit for the other cpus is sticky and gets
only cleared when those cpus exit idle. If a cpu was not idle while
the bit got set in consequence the bit prevents that the broadcast
device is armed on behalf of that cpu when it enters idle for the
first time after it switched to oneshot mode.
In most cases that goes unnoticed as one of the other cpus has usually
a timer pending which keeps the broadcast device armed with a short
timeout. Now if the only cpu which has a short timer active has the
bit set then the broadcast device will not be armed on behalf of that
cpu and will fire way after the expected timer expiry. In the case of
Christians bug report it took ~145 seconds which is about half of the
wrap around time of HPET (the limit for that device) due to the fact
that all other cpus had no timers armed which expired before the 145
seconds timeframe.
The solution is simply to clear the broadcast active bit
unconditionally when a cpu switches to oneshot mode after the first
cpu switched the broadcast device over. It's not idle at that point
otherwise it would not be executing that code.
[ I fundamentally hate that broadcast crap. Why the heck thought some
folks that when going into deep idle it's a brilliant concept to
switch off the last device which brings the cpu back from that
state? ]
Thanks to Christian for providing all the valuable debug information!
Reported-and-tested-by: Christian Hoffmann <email@christianhoffmann.info> Cc: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Christian Hoffmann reported that the command line clocksource override
with acpi_pm timer fails:
Kernel command line: <SNIP> clocksource=acpi_pm
hpet clockevent registered
Switching to clocksource hpet
Override clocksource acpi_pm is not HRT compatible.
Cannot switch while in HRT/NOHZ mode.
The watchdog code is what enables CLOCK_SOURCE_VALID_FOR_HRES, but we
actually end up selecting the clocksource before we enqueue it into
the watchdog list, so that's why we see the warning and fail to switch
to acpi_pm timer as requested. That's particularly bad when we want to
debug timekeeping related problems in early boot.
Put the selection call last.
Reported-by: Christian Hoffmann <email@christianhoffmann.info> Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/%3C1304558210.2943.24.camel%40work-vm%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Trying to enable the local APIC timer on early K8 revisions
uncovers a number of other issues with it, in conjunction with
the C1E enter path on AMD. Fixing those causes much more churn
and troubles than the benefit of using that timer brings so
don't enable it on K8 at all, falling back to the original
functionality the kernel had wrt to that.
Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Nick Bowler <nbowler@elliptictech.com> Cc: Joerg-Volker-Peetz <jvpeetz@web.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Moving the lower endpoint of the Erratum 400 check to accomodate
earlier K8 revisions (A-E) opens a can of worms which is simply
not worth to fix properly by tweaking the errata checking
framework:
* missing IntPenging MSR on revisions < CG cause #GP:
The is_path_accessible check uses a QPathInfo call, which isn't
supported by ancient win9x era servers. Fall back to an older
SMBQueryInfo call if it fails with the magic error codes.
Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changeset b6114794a1c394534659f4a17420e48cf23aa922 ("zorro8390: convert to
net_device_ops") broke zorro8390 by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in zorro8390.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c.
Reported-by: Christian T. Steigies <cts@debian.org> Suggested-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Christian T. Steigies <cts@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We occasionally see list corruption using libertas.
While we haven't been able to diagnose this precisely, we have spotted
a possible cause: cmdpendingq is generally modified with driver_lock
held. However, there are a couple of points where this is not the case.
Fix up those operations to execute under the lock, it seems like
the correct thing to do and will hopefully improve the situation.
Signed-off-by: Paul Fox <pgf@laptop.org> Signed-off-by: Daniel Drake <dsd@laptop.org> Acked-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changeset 5618f0d1193d6b051da9b59b0e32ad24397f06a4 ("hydra: convert to
net_device_ops") broke hydra by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in hydra.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changeset dcd39c90290297f6e6ed8a04bb20da7ac2b043c5 ("ne-h8300: convert to
net_device_ops") broke ne-h8300 by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in ne-h8300.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
TTY layer expects 0 if the ldisc->open operation succeeded.
Signed-off-by : Matvejchikov Ilya <matvejchikov@gmail.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently EHEA reports to ethtool as supporting 10M, 100M, 1G and
10G and connected to FIBRE independent of the hardware configuration.
However, when connected to FIBRE the only supported speed is 10G
full-duplex, and the other speeds and modes are only supported
when connected to twisted pair.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Acked-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
john stultz [Wed, 11 May 2011 23:10:28 +0000 (16:10 -0700)]
Fix time() inconsistencies caused by intermediate xtime_cache values being read
Currently with 2.6.32-longterm, its possible for time() to occasionally
return values one second earlier then the previous time() call.
This happens because update_xtime_cache() does:
xtime_cache = xtime;
timespec_add_ns(&xtime_cache, nsec);
Its possible that xtime is 1sec,999msecs, and nsecs is 1ms, resulting in
a xtime_cache that is 2sec,0ms.
get_seconds() (which is used by sys_time()) does not take the
xtime_lock, which is ok as the xtime.tv_sec value is a long and can be
atomically read safely.
The problem occurs the next call to update_xtime_cache() if xtime has
not increased:
/* This sets xtime_cache back to 1sec, 999msec */
xtime_cache = xtime;
/* get_seconds, calls here, and sees a 1second inconsistency */
timespec_add_ns(&xtime_cache, nsec);
In order to resolve this, we could add locking to get_seconds(), but it
needs to be lock free, as it is called from the machine check handler,
opening a possible deadlock.
So instead, this patch introduces an intermediate value for the
calculations, so that we only assign xtime_cache once with the correct
time, using ACCESS_ONCE to make sure the compiler doesn't optimize out
any intermediate values.
The xtime_cache manipulations were removed with 2.6.35, so that kernel
and later do not need this change.
In 2.6.33 and 2.6.34 the logarithmic accumulation should make it so
xtime is updated each tick, so it is unlikely that two updates to
xtime_cache could occur while the difference between xtime and
xtime_cache crosses the second boundary. However, the paranoid might
want to pull this into 2.6.33/34-longterm just to be sure.
Thanks to Stephen for helping finally narrow down the root cause and
many hours of help with testing and validation. Also thanks to Max,
Andi, Eric and Paul for review of earlier attempts and helping clarify
what is possible with regard to out of order execution.
Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While password processing we can get out of options array bound if
the next character after array is delimiter. The patch adds a check
if we reach the end.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A length of zero (after subtracting two for the type and len fields) for
the DCCPO_{CHANGE,CONFIRM}_{L,R} options will cause an underflow due to
the subtraction. The subsequent code may read past the end of the
options value buffer when parsing. I'm unsure of what the consequences
of this might be, but it's probably not good.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It's possible that when we go to decode the string area in the
SESSION_SETUP response, that bytes_remaining will be 0. Decrementing it at
that point will mean that it can go "negative" and wrap. Check for a
bytes_remaining value of 0, and don't try to decode the string area if
that's the case.
Reported-and-Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
is causing a potential NULL deref in scsi_run_queue() because the
q->queuedata may already be NULL by the time this function is called.
Since we shouldn't be running a queue that is being torn down, simply
add a NULL check in scsi_run_queue() to forestall this.
Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We can get here with a NULL socket argument passed from userspace,
so we need to handle it accordingly.
Thanks to Dave Jones pointing at this issue in net/can/bcm.c
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If preempted after kvmclock values are updated, but before hardware
virtualization is entered, the last tsc time as read by the guest is
never set. It underflows the next time kvmclock is updated if there
has not yet been a successful entry / exit into hardware virt.
Fix this by simply setting last_tsc to the newly read tsc value so
that any computed nsec advance of kvmclock is nulled.
Kernel time, which advances in discrete steps may progress much slower
than TSC. As a result, when kvmclock is adjusted to a new base, the
apparent time to the guest, which runs at a much higher, nsec scaled
rate based on the current TSC, may have already been observed to have
a larger value (kernel_ns + scaled tsc) than the value to which we are
setting it (kernel_ns + 0).
We must instead compute the clock as potentially observed by the guest
for kernel_ns to make sure it does not go backwards.
MUSB is a non-standard host implementation which
can handle all speeds with the same core. We need
to set has_tt flag after commit d199c96d41d80a567493e12b8e96ea056a1350c1 (USB: prevent
buggy hubs from crashing the USB stack) in order for
MUSB HCD to continue working.
Signed-off-by: Felipe Balbi <balbi@ti.com> Cc: Alan Stern <stern@rowland.harvard.edu> Tested-by: Michael Jones <michael.jones@matrix-vision.de> Tested-by: Alexander Holler <holler@ahsoftware.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ata_pio_sectors() expects buffer for each sector to be contained in a
single page; otherwise, it ends up overrunning the first page. This
is achieved by setting queue DMA alignment. If sector_size is smaller
than PAGE_SIZE and all buffers are sector_size aligned, buffer for
each sector is always contained in a single page.
This wasn't applied to ATAPI devices but IDENTIFY_PACKET is executed
as ATA_PROT_PIO and thus uses ata_pio_sectors(). Newer versions of
udev issue IDENTIFY_PACKET with unaligned buffer triggering the
problem and causing oops.
This patch fixes the problem by setting sdev->sector_size to
ATA_SECT_SIZE on ATATPI devices and always setting DMA alignment to
sector_size. While at it, add a warning for the unlikely but still
possible scenario where sector_size is larger than PAGE_SIZE, in which
case the alignment wouldn't be enough.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: John Stanley <jpsinthemix@verizon.net> Tested-by: John Stanley <jpsinthemix@verizon.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Jonathan Liu <net147@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
My conversion of tehuti to use request_firmware() was confused about
the filename of the firmware blob. Change the driver to match the
blob.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When writing a disc on certain lite-on dvd-writers (also rebadged
as optiarc/LG/...) connected to a vt6420, the ATAPI CDB ends
up in the datastream and on the disc, causing silent corruption.
Delaying between sending the CDB and starting DMA seems to
prevent this.
I do not know if there are burners that do not suffer from
this, but the patch should be safe for those as well.
There are many reports of this issue, but AFAICT no solution was
found before. For example:
http://lkml.indiana.edu/hypermail/linux/kernel/0802.3/0561.html
Signed-off-by: Bart Hartgers <bart.hartgers@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
[bwh: Remove version bump for 2.6.32] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
HW crypto in rt2500usb does not seem to support keys with different ciphers,
which breaks TKIP+AES mode. Fall back to software encryption to fix it.
This should fix long-standing problems with rt2500usb and WPA, such as:
http://rt2x00.serialmonkey.com/phpBB/viewtopic.php?f=4&t=4834
https://bugzilla.redhat.com/show_bug.cgi?id=484888
Also tested that it does not break WEP, TKIP-only and AES-only modes.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Gertjan van Wingerde <gwingerde@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Adjust context for 2.6.32] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The dts-installed variable is initialised using a wildcard path that
will be expanded relative to the build directory. Use the existing
variable dtstree to generate an absolute wildcard path that will work
when building in a separate directory.
Reported-by: Gerhard Pircher <gerhard_pircher@gmx.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Tested-by: Gerhard Pircher <gerhard_pircher@gmx.net> [against 2.6.32] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
niu_get_ethtool_tcam_all() assumes that its output buffer is the right
size, and warns before returning if it is not. However, the output
buffer size is under user control and ETHTOOL_GRXCLSRLALL is an
unprivileged ethtool command. Therefore this is at least a local
denial-of-service vulnerability.
Change it to check before writing each entry and to return an error if
the buffer is already full.
Compile-tested only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[Adjusted to apply to 2.6.32 by dann frazier <dannf@debian.org>] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This only matters for ISA devices with a 24-bit DMA limit or for devices
with a 32-bit DMA limit on systems with ZONE_DMA32 enabled. The latter
currently only affects 32-bit PCI cards on Sibyte-based systems with more
than 1GB RAM installed.
This patch fixes UDP socket refcnt bugs in the pppol2tp driver.
A bug can cause a kernel stack trace when a tunnel socket is closed.
A way to reproduce the issue is to prepare the UDP socket for L2TP (by
opening a tunnel pppol2tp socket) and then close it before any L2TP
sessions are added to it. The sequence is
Create UDP socket
Create tunnel pppol2tp socket to prepare UDP socket for L2TP
pppol2tp_connect: session_id=0, peer_session_id=0
L2TP SCCRP control frame received (tunnel_id==0)
pppol2tp_recv_core: sock_hold()
pppol2tp_recv_core: sock_put
L2TP ZLB control frame received (tunnel_id=nnn)
pppol2tp_recv_core: sock_hold()
pppol2tp_recv_core: sock_put
Close tunnel management socket
pppol2tp_release: session_id=0, peer_session_id=0
Close UDP socket
udp_lib_close: BUG
The addition of sock_hold() in pppol2tp_connect() solves the problem.
For data frames, two sock_put() calls were added to plug a refcnt leak
per received data frame. The ref that is grabbed at the top of
pppol2tp_recv_core() must always be released, but this wasn't done for
accepted data frames or data frames discarded because of bad UDP
checksums. This leak meant that any UDP socket that had passed L2TP
data traffic (i.e. L2TP data frames, not just L2TP control frames)
using pppol2tp would not be released by the kernel.
Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a network namespace is created (via CLONE_NEWNET), the loopback
interface is automatically added to the new namespace, triggering a
printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set.
This is problematic for applications which use CLONE_NEWNET as
part of a sandbox, like Chromium's suid sandbox or recent versions of
vsftpd. On a busy machine, it can lead to thousands of useless
"lo: Disabled Privacy Extensions" messages appearing in dmesg.
It's easy enough to check the status of privacy extensions via the
use_tempaddr sysctl, so just removing the printk seems like the most
sensible solution.
Signed-off-by: Romain Francoise <romain@orebokech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch removes D-Link DGE-550T PCI ID (1186:4000) from the ipg
driver. The ipg driver is for IP2000-based cards and the DGE-550T is
a DL2000-based card. The driver loads and works for a few moments, but
once a real workload is applied it stops operating. The ipg driver
claimed this ID since it was introduced in 2.6.24 and it's forced many
users to blacklist it.
The correct driver for this hardware is the dl2k driver, which has been
claiming this PCI ID since the 2.4 days.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Certain revisions of this chipset appear to be broken. There is a shadow
GTT which mirrors the real GTT but contains pre-translated physical
addresses, for performance reasons. When a GTT update happens, the
translations are done once and the resulting physical addresses written
back to the shadow GTT.
Except sometimes, the physical address is actually written back to the
_real_ GTT, not the shadow GTT. Thus we start to see faults when that
physical address is fed through translation again.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Resulted from bonding driver registering packet handlers via dev_add_pack and
then trying to call pskb_may_pull. If another packet handler (like for AF_PACKET
sockets) gets called first, the delivered skb will have a user count > 1, which
causes pskb_may_pull to BUG halt when it does its skb_shared check. Fix this by
calling skb_share_check prior to the may_pull call sites in the bonding driver
to clone the skb when needed. Tested by myself and the reported successfully.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ilya reported that on a very slow machine he could reliably
reproduce a race between forking init and kthreadd. We first
fork init so that it obtains pid-1, however since the scheduler
is already fully running at this point it can preempt and run
the init thread before we spawn and set kthreadd_task.
The init thread can then attempt spawning kthreads without
kthreadd being present which results in an OOPS.
Vegard Nossum found a unix socket OOM was possible, posting an exploit
program.
My analysis is we can eat all LOWMEM memory before unix_gc() being
called from unix_release_sock(). Moreover, the thread blocked in
unix_gc() can consume huge amount of time to perform cleanup because of
huge working set.
One way to handle this is to have a sensible limit on unix_tot_inflight,
tested from wait_for_unix_gc() and to force a call to unix_gc() if this
limit is hit.
This solves the OOM and also reduce overall latencies, and should not
slowdown normal workloads.
Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Its easy to eat all kernel memory and trigger NMI watchdog, using an
exploit program that queues unix sockets on top of others.
lkml ref : http://lkml.org/lkml/2010/11/25/8
This mechanism is used in applications, one choice we have is to have a
recursion limit.
Other limits might be needed as well (if we queue other types of files),
since the passfd mechanism is currently limited by socket receive queue
sizes only.
Add a recursion_level to unix socket, allowing up to 4 levels.
Each time we send an unix socket through sendfd mechanism, we copy its
recursion level (plus one) to receiver. This recursion level is cleared
when socket receive queue is emptied.
Reported-by: Марк Коренберг <socketpair@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Adjust for 2.6.32] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire
filesystem and may run uninterruptibly for a long time. This does not
seem to be something that an unprivileged user should be able to do.
Reported-by: Aron Xu <happyaron.xu@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>