Commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12
(x86 quirk: Fix polarity for IRQ0 pin2 override on SB800
systems) introduced a regression. It removed some SB600 specific
code to determine the revision ID without adapting a
corresponding revision ID check for SB600.
This patch adapts the corresponding check to cover all SB600
revisions.
Tested-by: Wang Lei <f3d27b@gmail.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20110315143137.GD29499@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When processing a SIDR REQ, the ib_cm allocates a new cm_id. The
refcount of the cm_id is initialized to 1. However, cm_process_work
will decrement the refcount after invoking all callbacks. The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.
If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id. This can lead to a crash when the cm callback thread tries
to access the cm_id.
This problem was noticed as part of a larger investigation with kernel
crashes in the rdma_cm when running on a real time OS.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
They were able to reproduce the crash multiple times with the
following details:
Crash seems to always happen on the:
mutex_unlock(&conn_id->handler_mutex);
as conn_id looks to have been freed during this code path.
An examination of the code shows that a race exists in the request
handlers. When a new connection request is received, the rdma_cm
allocates a new connection identifier. This identifier has a single
reference count on it. If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory. However, the request
handlers may still be in the process of running. When control returns
to the request handlers, they can attempt to access the newly created
identifiers.
Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Emit warning when "mem=nopentium" is specified on any arch other
than x86_32 (the only that arch supports it).
Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Avoid removing all of memory and panicing when "mem={invalid}"
is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on
platforms other than x86_32).
Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When the fuction graph tracer starts, it needs to make a special
stack for each task to save the real return values of the tasks.
All running tasks have this stack created, as well as any new
tasks.
On CPU hot plug, the new idle task will allocate a stack as well
when init_idle() is called. The problem is that cpu hotplug does
not create a new idle_task. Instead it uses the idle task that
existed when the cpu went down.
ftrace_graph_init_task() will add a new ret_stack to the task
that is given to it. Because a clone will make the task
have a stack of its parent it does not check if the task's
ret_stack is already NULL or not. When the CPU hotplug code
starts a CPU up again, it will allocate a new stack even
though one already existed for it.
The solution is to treat the idle_task specially. In fact, the
function_graph code already does, just not at init_idle().
Instead of using the ftrace_graph_init_task() for the idle task,
which that function expects the task to be a clone, have a
separate ftrace_graph_init_idle_task(). Also, we will create a
per_cpu ret_stack that is used by the idle task. When we call
ftrace_graph_init_idle_task() it will check if the idle task's
ret_stack is NULL, if it is, then it will assign it the per_cpu
ret_stack.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
mm_fault_error() should not execute oom-killer, if page fault
occurs in kernel space. E.g. in copy_from_user()/copy_to_user().
This would happen if we find ourselves in OOM on a
copy_to_user(), or a copy_from_user() which faults.
Without this patch, the kernels hangs up in copy_from_user(),
because OOM killer sends SIG_KILL to current process, but it
can't handle a signal while in syscall, then the kernel returns
to copy_from_user(), reexcute current command and provokes
page_fault again.
With this patch the kernel return -EFAULT from copy_from_user().
The code, which checks that page fault occurred in kernel space,
has been copied from do_sigbus().
This situation is handled by the same way on powerpc, xtensa,
tile, ...
Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <201103092322.p29NMNPH001682@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When au1000_eth probes the MII bus for PHY address, if we do not set
au1000_eth platform data's phy_search_highest_address, the MII probing
logic will exit early and will assume a valid PHY is found at address 0.
For MTX-1, the PHY is at address 31, and without this patch, the link
detection/speed/duplex would not work correctly.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2111/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
ata_qc_complete() contains special handling for certain commands. For
example, it schedules EH for device revalidation after certain
configurations are changed. These shouldn't be applied to EH
commands but they were.
In most cases, it doesn't cause an actual problem because EH doesn't
issue any command which would trigger special handling; however, ACPI
can issue such commands via _GTF which can cause weird interactions.
Restructure ata_qc_complete() such that EH commands are always passed
on to __ata_qc_complete().
stable: Please apply to -stable only after 2.6.38 is released.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This avoids a possible race leading to trying to dereference NULL.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
generating RxFIFO overflow errors. The result is an infinite loop in
interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
With the workaround everything goes fine.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes <hayeswang@realtek.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
We must not use dummy for index.
After the first index, READ32(dummy) will change dummy!!!!
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: Trond points out READ_BUF alone is sufficient.] Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Like many other places, we have to check that the array index is
within allowed limits, or otherwise, a kernel oops and other nastiness
can ensue when we access memory beyond the end of the array.
The problem goes back to v2.6.30-rc1~1372~1342~31 where nf_log_bind
was decoupled from nf_log_register.
Reported-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>,
via irc.freenode.net/#netfilter Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed. The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck. On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are. The CPU will
either appear stuck, or will cause a crash as we replace its cede loop beneath
it.
This patch kicks each present offline CPU awake before the kexec, so that
none are forever lost to these assumptions in the subsequent kernel.
Now, the behaviour is that all available CPUs that were offlined are now
online & usable after the kexec. This mimics the behaviour of a full reboot
(on which all CPUs will be restarted).
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Currently for kexec the PTE tear down on 1TB segment systems normally
requires 3 hcalls for each PTE removal. On a machine with 32GB of
memory it can take around a minute to remove all the PTEs.
This optimises the path so that we only remove PTEs that are valid.
It also uses the read 4 PTEs at once HCALL. For the common case where
a PTEs is invalid in a 1TB segment, this turns the 3 HCALLs per PTE
down to 1 HCALL per 4 PTEs.
This gives an > 10x speedup in kexec times on PHYP, taking a 32GB
machine from around 1 minute down to a few seconds.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This adds plpar_pte_read_4_raw() which can be used read 4 PTEs from
PHYP at a time, while in real mode.
It also creates a new hcall9 which can be used in real mode. It's the
same as plpar_hcall9 but minus the tracing hcall statistics which may
require variables outside the RMO.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
On large machines we are running out of room below 256MB. In some cases we
only need to ensure the allocation is in the first segment, which may be
256MB or 1TB.
Add slb0_limit and use it to specify the upper limit for the irqstack and
emergency stacks.
On a large ppc64 box, this fixes a panic at boot when the crashkernel=
option is specified (previously we would run out of memory below 256MB).
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
IOMMU table initialized, virtual merging enabled
Interrupt 155954 (real) is invalid, disabling it.
Interrupt 155953 (real) is invalid, disabling it.
ie we took some spurious interrupts. default_machine_crash_shutdown tries
to disable all interrupt sources but uses chip->disable which maps to
the default action of:
static void default_disable(unsigned int irq)
{
}
If we use chip->shutdown, then we actually mask the IRQ:
Not sure why we don't implement a ->disable action for xics.c, or why
default_disable doesn't mask the interrupt.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
We wrap the crash_shutdown_handles[] calls with longjmp/setjmp, so if any
of them fault we can recover. The problem is we add a hook to the debugger
fault handler hook which calls longjmp unconditionally.
This first part of kdump is run before we marshall the other CPUs, so there
is a very good chance some CPU on the box is going to page fault. And when
it does it hits the longjmp code and assumes the context of the oopsing CPU.
The machine gets very confused when it has 10 CPUs all with the same stack,
all thinking they have the same CPU id. I get even more confused trying
to debug it.
The patch below adds crash_shutdown_cpu and uses it to specify which cpu is
in the protected region. Since it can only be -1 or the oopsing CPU, we don't
need to use memory barriers since it is only valid on the local CPU - no other
CPU will ever see a value that matches it's local CPU id.
Eventually we should switch the order and marshall all CPUs before doing the
crash_shutdown_handles[] calls, but that is a bigger fix.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Robert Swiecki reported a BUG_ON(page_mapped) from a fuzzer, punching
a hole with madvise(,, MADV_REMOVE). That path is under mutex, and
cannot be explained by lack of serialization in unmap_mapping_range().
Reviewing the code, I found one place where vm_truncate_count handling
should have been updated, when I switched at the last minute from one
way of managing the restart_addr to another: mremap move changes the
virtual addresses, so it ought to adjust the restart_addr.
But rather than exporting the notion of restart_addr from memory.c, or
converting to restart_pgoff throughout, simply reset vm_truncate_count
to 0 to force a rescan if mremap move races with preempted truncation.
We have no confirmation that this fixes Robert's BUG,
but it is a fix that's worth making anyway.
We have found a hardware erratum on 82599 hardware that can lead to
unpredictable behavior when Header Splitting mode is enabled. So
we are no longer enabling this feature on affected hardware.
Please see the 82599 Specification Update for more information.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
However the code in rxrpc_instantiate strips it away:
data += sizeof(kver);
datalen -= sizeof(kver);
Removing kif_version fixes my problem.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The "ct" variable should be an unsigned int. Both struct kbdiacrs
->kb_cnt and struct kbd_data ->accent_table_size are unsigned ints.
Making it signed causes a problem in KBDIACRUC because the user could
set the signed bit and cause a buffer overflow.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When we get oplock break notification we should set the appropriate
value of OplockLevel field in oplock break acknowledge according to
the oplock level held by the client in this time. As we only can have
level II oplock or no oplock in the case of oplock break, we should be
aware only about clientCanCacheRead field in cifsInodeInfo structure.
Also fix bug connected with wrong interpretation of OplockLevel field
during oplock break notification processing.
[PG: above OplockLevel bug only exists via. e66673e39a which didn't
appear until v2.6.37-rc2, so cifs/misc.c hunk dropped.]
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
NETDEV_NOTIFY_PEER is an explicit request by the driver to send a link
notification while NETDEV_UP/NETDEV_CHANGEADDR generate link
notifications as a sort of side effect.
In the later cases the sysctl option is present because link
notification events can have undesired effects e.g. if the link is
flapping. I don't think this applies in the case of an explicit
request from a driver.
This patch makes NETDEV_NOTIFY_PEER unconditional, if preferred we
could add a new sysctl for this case which defaults to on.
This change causes Xen post-migration ARP notifications (which cause
switches to relearn their MAC tables etc) to be sent by default.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
If the iowarrior devices in this case statement support more than 8 bytes
per report, it is possible to write past the end of a kernel heap allocation.
This will probably never be possible, but change the allocation to be more
defensive anyway.
For some time is known that ASPM is causing troubles on r8169, i.e. make
device randomly stop working without any errors in dmesg.
Currently Tomi Leppikangas reports that system with r8169 device hangs
with MCE errors when ASPM is enabled:
https://bugzilla.redhat.com/show_bug.cgi?id=642861#c4
Lets disable ASPM for r8169 devices at all, to avoid problems with
r8169 PCIe devices at least for some users.
Reported-by: Tomi Leppikangas <tomi.leppikangas@gmail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When support for 82577/82578 was added[1] in 2.6.31, PHY wakeup was in-
advertently enabled (even though it does not function properly) on ICH10
LOMs. This patch makes it so that the ICH10 LOMs use MAC wakeup instead
as was done with the initial support for those devices (i.e. 82567LM-3,
82567LF-3 and 82567V-4).
Reported-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This fixes a bug in the order of dccp_rcv_state_process() that still permitted
reception even after closing the socket. A Reset after close thus causes a NULL
pointer dereference by not preventing operations on an already torn-down socket.
dccp_v4_do_rcv()
|
| state other than OPEN
v
dccp_rcv_state_process()
|
| DCCP_PKT_RESET
v
dccp_rcv_reset()
|
v
dccp_time_wait()
WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128()
Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah
[<c0038850>] (unwind_backtrace+0x0/0xec) from [<c0055364>] (warn_slowpath_common)
[<c0055364>] (warn_slowpath_common+0x4c/0x64) from [<c0055398>] (warn_slowpath_n)
[<c0055398>] (warn_slowpath_null+0x1c/0x24) from [<c02b72d0>] (__inet_twsk_hashd)
[<c02b72d0>] (__inet_twsk_hashdance+0x48/0x128) from [<c031caa0>] (dccp_time_wai)
[<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces)
[<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_)
[<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0)
[<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380)
[<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70)
The fix is by testing the socket state first. Receiving a packet in Closed state
now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1.
Reported-and-tested-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Reported-by: Mark Davis Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
it as reported and analyzed by Josh.
In fact, there is no good reason to mess with i_nlink of the moved file.
We did it presumably to simulate linking into the new directory and unlinking
from an old one. But the practical effect of this is disputable because fsck
can possibly treat file as being properly linked into both directories without
writing any error which is confusing. So we just stop increment-decrement
games with i_nlink which also fixes the corruption.
CC: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only
can switch into oneshot mode, when the backup broadcast device
supports oneshot mode as well. Otherwise we would try to switch the
broadcast device into an unsupported mode unconditionally. This went
unnoticed so far as the current available broadcast devices support
oneshot mode. Seth unearthed this problem while debugging and working
around an hpet related BIOS wreckage.
Add the necessary check to tick_is_oneshot_available().
Reported-and-tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1102252231200.2701@localhost6.localdomain6> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
A customer of ours, complained that when setting the reset
vector back to 0, it trashed other data and hung their box.
They noticed when only 4 bytes were set to 0 instead of 8,
everything worked correctly.
Mathew pointed out:
|
| We're supposed to be resetting trampoline_phys_low and
| trampoline_phys_high here, which are two 16-bit values.
| Writing 64 bits is definitely going to overwrite space
| that we're not supposed to be touching.
|
So limit the area modified to u32.
Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Matthew Garrett <mjg@redhat.com>
LKML-Reference: <1297139100-424-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Current refcounttree codes actually didn't writeback the new pages out in
write-back mode, due to a bug of always passing a ZERO number of clusters
to 'ocfs2_cow_sync_writeback', the patch tries to pass a proper one in.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
does not take into account that the remaining data length can be less
than sg_dma_len(sg). In that case, running_total can end up being
greater than the total data length, so an extra TRB is counted.
Changing the expression to
while (running_total < sg_dma_len(sg) && running_total < temp)
fixes that.
This patch should be queued for stable kernels back to 2.6.31.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
are incorrect, because running_total can never be zero, so the if()
expression will never be true. I think the intention was that
running_total be in the range of 0 to TRB_MAX_BUFF_SIZE-1, not 1
to TRB_MAX_BUFF_SIZE. So adding a
running_total &= TRB_MAX_BUFF_SIZE - 1;
fixes the problem.
This patch should be queued for stable kernels back to 2.6.31.
[PG: 34 doesn't have count_isoc_trbs_needed() hence this has
only two of the three upstream running_total changes.]
Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This makes it easier to spot some problems, which will be fixed by the
next patch in the series. Also change dev_dbg to dev_err in
check_trb_math(), so any math errors will be visible even when running
with debug disabled.
Note: This patch changes the expressions containing
"((1 << TRB_MAX_BUFF_SHIFT) - 1)" to use the equivalent
"(TRB_MAX_BUFF_SIZE - 1)". No change in behavior is intended for
those expressions.
This patch should be queued for stable kernels back to 2.6.31.
[PG: 34 doesn't have count_isoc_trbs_needed() hence this has
one less instance to change vs. the originating release]
Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Change the BUGs in xhci_find_new_dequeue_state() to WARN_ONs, to avoid
bringing down the box if one of them is hit
This patch should be queued for stable kernels back to 2.6.31.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
On some SB800 systems polarity for IOAPIC pin2 is wrongly
specified as low active by BIOS. This caused system hangs after
resume from S3 when HPET was used in one-shot mode on such
systems because a timer interrupt was missed (HPET signal is
high active).
'mdp' devices are md devices with preallocated device numbers
for partitions. As such it is possible to mknod and open a partition
before opening the whole device.
this causes md_probe() to be called with a device number of a
partition, which in-turn calls mddev_find with such a number.
However mddev_find expects the number of a 'whole device' and
does the wrong thing with partition numbers.
So add code to mddev_find to remove the 'partition' part of
a device number and just work with the 'whole device'.
This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=28652
Reported-by: hkmaly@bigfoot.com Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
a bug that causes a kernel oops on certain corrupted LDM partitions. A
kernel subsystem seems to crash, because, after the oops, the kernel no
longer recognizes newly connected storage devices.
The patch changes ldm_parse_vmdb() to Validate the value of vblk_size.
Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Acked-by: Richard Russon <ldm@flatcap.org> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
In several places, an epoll fd can call another file's ->f_op->poll()
method with ep->mtx held. This is in general unsafe, because that other
file could itself be an epoll fd that contains the original epoll fd.
The code defends against this possibility in its own ->poll() method using
ep_call_nested, but there are several other unsafe calls to ->poll
elsewhere that can be made to deadlock. For example, the following simple
program causes the call in ep_insert recursively call the original fd's
->poll, leading to deadlock:
#include <unistd.h>
#include <sys/epoll.h>
int main(void) {
int e1, e2, p[2];
struct epoll_event evt = {
.events = EPOLLIN
};
On insertion, check whether the inserted file is itself a struct epoll,
and if so, do a recursive walk to detect whether inserting this file would
create a loop of epoll structures, which could lead to deadlock.
[nelhage@ksplice.com: Use epmutex to serialize concurrent inserts] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Reported-by: Nelson Elhage <nelhage@ksplice.com> Tested-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
I picked up a new Sierra usb 308 (At&t Shockwave) on 2/2011 and the vendor code
is 0x0f3d
Looking up vendor and product id's I see:
0f3d Airprime, Incorporated
0112 CDMA 1xEVDO PC Card, PC 5220
Sierra and Airprime are somehow related and I'm guessing the At&t usb 308 might
be have some common hardware with the AirPrime SL809x.
Signed-off-by: Jon Thomas <jthomas@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
"DMA transfers need to be synced properly in order for
the cpu and device to see the most uptodate and correct
copy of the DMA buffer."
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
My Galaxy Spica needs this quirk when in modem mode, otherwise
it causes endless USB bus resets and is unusable in this mode.
Unfortunately Samsung decided to reuse ID of its old CDMA phone SGH-I500
for the modem part.
That's why in addition to this patch the visor driver must be prevented
from binding to SPH-I500 ID, so ACM driver can do that.
Signed-off-by: Maciej Szmigiero <mhej@o2.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
[USB]Add Samsung SGH-I500/Android modem ID switch to visor driver
Samsung decided to reuse USB ID of its old CDMA phone SGH-I500 for the
modem part of some of their Android phones. At least Galaxy Spica
is affected.
This modem needs ACM driver and does not work with visor driver which
binds the conflicting ID for SGH-I500.
Because SGH-I500 is pretty an old hardware its best to add switch to
visor
driver in cause somebody still wants to use that phone with Linux.
Note that this is needed only when using the Android phone as modem,
not in USB storage or ADB mode.
Signed-off-by: Maciej Szmigiero <mhej@o2.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This patch (as1448) adds a quirks entry for the Keytouch QWERTY Panel
firmware, used in the IEC 60945 keyboard. This device crashes during
enumeration when the computer asks for its configuration string
descriptor.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: kholis <nur.kholis.majid@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The idle timer could trigger after clock had been disabled leading to
kernel panic when MUSB_DEVCTL is accessed in musb_do_idle on 2.6.37.
The fault below is no longer triggered on 2.6.38-rc4 (clock is disabled
later, and only if compiled as a module, and the offending memory access
has moved) but the timer should be cancelled nonetheless.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
With CONFIG_SHIRQ_DEBUG=y we call a newly installed interrupt handler
in request_threaded_irq().
The original implementation (commit a304e1b8) called the handler
_BEFORE_ it was installed, but that caused problems with handlers
calling disable_irq_nosync(). See commit 377bf1e4.
It's braindead in the first place to call disable_irq_nosync in shared
handlers, but ....
Moving this call after we installed the handler looks innocent, but it
is very subtle broken on SMP.
Interrupt handlers rely on the fact, that the irq core prevents
reentrancy.
Now this debug call violates that promise because we run the handler
w/o the IRQ_INPROGRESS protection - which we cannot apply here because
that would result in a possibly forever masked interrupt line.
A concurrent real hardware interrupt on a different CPU results in
handler reentrancy and can lead to complete wreckage, which was
unfortunately observed in reality and took a fricking long time to
debug.
Leave the code here for now. We want this debug feature, but that's
not easy to fix. We really should get rid of those
disable_irq_nosync() abusers and remove that function completely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Don't allow everybody to change ACPI settings. The comment says that it
is done deliberatelly, however, the comment before disp_proc_write()
says that at least one of these setting is experimental.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The lower filesystem may do some type of inode revalidation during a
getattr call. eCryptfs should take advantage of that by copying the
lower inode attributes to the eCryptfs inode after a call to
vfs_getattr() on the lower inode.
I originally wrote this fix while working on eCryptfs on nfsv3 support,
but discovered it also fixed an eCryptfs on ext4 nanosecond timestamp
bug that was reported.
https://bugs.launchpad.net/bugs/613873
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Carlos Corbacho <carlos@strangeworlds.co.uk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ensure a predictable endian state when entering signal handlers. This
avoids programs which use SETEND to momentarily switch their endian
state from having their signal handlers entered with an unpredictable
endian state.
Acked-by: Dave Martin <dave.martin@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Currently we return 0 in swsusp_alloc() when alloc_image_page() fails.
Fix that. Also remove unneeded "error" variable since the only
useful value of error is -ENOMEM.
[rjw: Fixed up the changelog and changed subject.]
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
task_show_regs used to be a debugging aid in the early bringup days
of Linux on s390. /proc/<pid>/status is a world readable file, it
is not a good idea to show the registers of a process. The only
correct fix is to remove task_show_regs.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode
numbers during lookup") changes the inode lookup code to do btree lookups for
untrusted inode numbers. This change made an invalid assumption about the
alignment of inodes and hence incorrectly calculated the first inode in the
cluster. As a result, some inode numbers were being incorrectly considered
invalid when they were actually valid.
The issue was not picked up by the xfstests suite because it always runs fsr
and dump (the two utilities that utilise the bulkstat interface) on cache hot
inodes and hence the lookup code in the cold cache path was not sufficiently
exercised to uncover this intermittent problem.
Fix the issue by relaxing the btree lookup criteria and then checking if the
record returned contains the inode number we are lookup for. If it we get an
incorrect record, then the inode number is invalid.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The block number comes from bulkstat based inode lookups to shortcut
the mapping calculations. We ar enot able to trust anything from
bulkstat, so drop the block number as well so that the correct
lookups and mappings are always done.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Inode numbers may come from somewhere external to the filesystem
(e.g. file handles, bulkstat information) and so are inherently
untrusted. Rename the flag we use for these lookups to make it
obvious we are doing a lookup of an untrusted inode number and need
to verify it completely before trying to read it from disk.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When we decode a handle or do a bulkstat lookup, we are using an
inode number we cannot trust to be valid. If we are deleting inode
chunks from disk (default noikeep mode), then we cannot trust the on
disk inode buffer for any given inode number to correctly reflect
whether the inode has been unlinked as the di_mode nor the
generation number may have been updated on disk.
This is due to the fact that when we delete an inode chunk, we do
not write the clusters back to disk when they are removed - instead
we mark them stale to avoid them being written back potentially over
the top of something that has been subsequently allocated at that
location. The result is that we can have locations of disk that look
like they contain valid inodes but in reality do not. Hence we
cannot simply convert the inode number to a block number and read
the location from disk to determine if the inode is valid or not.
As a result, and XFS_IGET_BULKSTAT lookup needs to actually look the
inode up in the inode allocation btree to determine if the inode
number is valid or not.
It should be noted even on ikeep filesystems, there is the
possibility that blocks on disk may look like valid inode clusters.
e.g. if there are filesystem images hosted on the filesystem. Hence
even for ikeep filesystems we really need to validate that the inode
number is valid before issuing the inode buffer read.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The non-coherent bulkstat versionsthat look directly at the inode
buffers causes various problems with performance optimizations that
make increased use of just logging inodes. This patch makes bulkstat
always use iget, which should be fast enough for normal use with the
radix-tree based inode cache introduced a while ago.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Reported-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
In prepare_kernel_cred() since 2.6.29, put_cred(new) is called without
assigning new->usage when security_prepare_creds() returned an error. As a
result, memory for new and refcount for new->{user,group_info,tgcred} are
leaked because put_cred(new) won't call __put_cred() unless old->usage == 1.
Fix these leaks by assigning new->usage (and new->subscribers which was added
in 2.6.32) before calling security_prepare_creds().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
In cred_alloc_blank() since 2.6.32, abort_creds(new) is called with
new->security == NULL and new->magic == 0 when security_cred_alloc_blank()
returns an error. As a result, BUG() will be triggered if SELinux is enabled
or CONFIG_DEBUG_CREDENTIALS=y.
If CONFIG_DEBUG_CREDENTIALS=y, BUG() is called from __invalid_creds() because
cred->magic == 0. Failing that, BUG() is called from selinux_cred_free()
because selinux_cred_free() is not expecting cred->security == NULL. This does
not affect smack_cred_free(), tomoyo_cred_free() or apparmor_cred_free().
Fix these bugs by
(1) Set new->magic before calling security_cred_alloc_blank().
(2) Handle null cred->security in creds_are_invalid() and selinux_cred_free().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL
when security_file_alloc() returned an error. As a result, kernel will panic()
due to put_cred(NULL) call within RCU callback.
Fix this bug by assigning f->f_cred before calling security_file_alloc().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
credentials by incrementing their usage count after their replacement by the
task being accessed.
What happens is that get_task_cred() can race with commit_creds():
However, since a tasks credentials are generally not changed very often, we can
reasonably make use of a loop involving reading the creds pointer and using
atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
If successful, we can safely return the credentials in the knowledge that, even
if the task we're accessing has released them, they haven't gone to the RCU
cleanup code.
We then change task_state() in procfs to use get_task_cred() rather than
calling get_cred() on the result of __task_cred(), as that suffers from the
same problem.
Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
tripped when it is noticed that the usage count is not zero as it ought to be,
for example:
If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The original problem encountered by people using NVIDIA chipsets was
that USB devices were not turning off when the system shut down. For
example, the LED on an optical mouse would remain on, draining a
laptop's battery. The problem was caused by a bug in the chipset; an
OHCI controller in the Reset state would continue to drive a bus reset
signal even after system shutdown. The workaround was to put the
controllers into the Suspend state instead.
It turns out that later NVIDIA chipsets do not suffer from this bug.
Instead some have the opposite bug: If a system is shut down while an
OHCI controller is in the Suspend state, USB devices remain powered!
On other systems, shutting down with a Suspended controller causes the
system to reboot immediately. Thus, working around the original bug
on some machines exposes other bugs on other machines.
The best solution seems to be to limit the workaround to OHCI
controllers with a low-numbered PCI product ID. I don't know exactly
at what point NVIDIA changed their chipsets; the value used here is a
guess. So far it was worked out okay for all the people who have
tested it.
This fixes Bugzilla #35032.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Andre "Osku" Schmidt <andre.osku.schmidt@googlemail.com> Tested-by: Yury Siamashka <yurand2@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This patch (as1417) fixes a problem affecting some (or all) nVidia
chipsets. When the computer is shut down, the OHCI controllers
continue to power the USB buses and evidently they drive a Reset
signal out all their ports. This prevents attached devices from going
to low power. Mouse LEDs stay on, for example, which is disconcerting
for users and a drain on laptop batteries.
The fix involves leaving each OHCI controller in the OPERATIONAL state
during system shutdown rather than putting it in the RESET state.
Although this nominally means the controller is running, in fact it's
not doing very much since all the schedules are all disabled. However
there is ongoing DMA to the Host Controller Communications Area, so
the patch also disables the bus-master capability of all PCI USB
controllers after the shutdown routine runs.
The fix is applied only to nVidia-based PCI OHCI controllers, so it
shouldn't cause problems on systems using other hardware. As an added
safety measure, in case the kernel encounters one of these running
controllers during boot, the patch changes quirk_usb_handoff_ohci()
(which runs early on during PCI discovery) to reset the controller
before anything bad can happen.
Reported-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: David Brownell <david-b@pacbell.net> Tested-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
If ocfs2_live_connection_list is empty, ocfs2_connection_find() will return
a pointer to the LIST_HEAD, cast as a ocfs2_live_connection. This can cause
an oops when ocfs2_control_send_down() dereferences c->oc_conn:
Fix by explicitly returning NULL if no match is found.
Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
There's a branch at the end of this function that
is supposed to normalize the return value with what
the mid-layer expects. In this one case, we get it wrong.
Also increase the verbosity of the INFO level printk
at the end of mptscsih_abort to include the actual return value
and the scmd->serial_number. The reason being success
or failure is actually determined by the state of
the internal tag list when a TMF is issued, and not the
return value of the TMF cmd. The serial_number is also
used in this decision, thus it's useful to know for debugging
purposes.
Reported-by: Peter M. Petrakis <peter.petrakis@canonical.com> Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Added missing release callback for file_operations mptctl_fops.
Without release callback there will be never freed. It remains on
mptctl's eent list even after the file is closed and released.
Relavent RHEL bugzilla is 660871
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
If nfsd fails to find an exported via NFS file in the readahead cache, it
should increment corresponding nfsdstats counter (ra_depth[10]), but due to a
bug it may instead write to ra_depth[11], corrupting the following field.
In a kernel with NFSDv4 compiled in the corruption takes the form of an
increment of a counter of the number of NFSv4 operation 0's received; since
there is no operation 0, this is harmless.
In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the
memory beyond nfsdstats.
Signed-off-by: Konstantin Khorenko <khorenko@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ben Hutchings [Mon, 7 Feb 2011 19:20:55 +0000 (19:20 +0000)]
bonding/vlan: Avoid mangled NAs on slaves without VLAN tag insertion
This is related to commit f88a4a9b65a6f3422b81be995535d0e69df11bb8
upstream, but the bug cannot be properly fixed without the other
changes to VLAN tagging in 2.6.37.
bond_na_send() attempts to insert a VLAN tag in between building and
sending packets of the respective formats. If the slave does not
implement hardware VLAN tag insertion then vlan_put_tag() will mangle
the network-layer header because the Ethernet header is not present at
this point (unlike in bond_arp_send()).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Mike reports that since e9e9250b (sched: Scale down cpu_power due to RT
tasks), wake_affine() goes funny on RT tasks due to them still having a
!0 weight and wake_affine() still subtracts that from the rq weight.
Since nobody should be using se->weight for RT tasks, set the value to
zero. Also, since we now use ->cpu_power to normalize rq weights to
account for RT cpu usage, add that factor into the imbalance computation.
Reported-by: Mike Galbraith <efault@gmx.de> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1275316109.27810.22969.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
[0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03
[0.052999] lockdep: fixing up alternatives.
[0.054105]
[0.054106] ===================================================
[0.054999] [ INFO: suspicious rcu_dereference_check() usage. ]
[0.054999] ---------------------------------------------------
[0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection!
[0.054999]
[0.054999] other info that might help us debug this:
[0.054999]
[0.054999]
[0.054999] rcu_scheduler_active = 1, debug_locks = 1
[0.054999] 3 locks held by swapper/1:
[0.054999] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a
[0.054999] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51
[0.054999] #2: (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113
[0.054999]
[0.054999] stack backtrace:
[0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1
[0.054999] Call Trace:
[0.054999] [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3
[0.054999] [<ffffffff810325c3>] task_group+0x7b/0x8a
[0.054999] [<ffffffff810325e5>] set_task_rq+0x13/0x40
[0.054999] [<ffffffff814be39a>] init_idle+0xd2/0x113
[0.054999] [<ffffffff814be78a>] fork_idle+0xb8/0xc7
[0.054999] [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b
[0.054999] [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b
[0.054999] [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724
[0.054999] [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b
[0.054999] [<ffffffff814be876>] _cpu_up+0xac/0x127
[0.054999] [<ffffffff814be946>] cpu_up+0x55/0x6a
[0.054999] [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff
[0.054999] [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
[0.054999] [<ffffffff814c353c>] ? restore_args+0x0/0x30
[0.054999] [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff
[0.054999] [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
[0.056074] Booting Node 0, Processors #1lockdep: fixing up alternatives.
[0.130045] #2lockdep: fixing up alternatives.
[0.203089] #3 Ok.
[0.275286] Brought up 4 CPUs
[0.276005] Total of 4 processors activated (16017.17 BogoMIPS).
The cgroup_subsys_state structures referenced by idle tasks are never
freed, because the idle tasks should be part of the root cgroup,
which is not removable.
The problem is that while we do in-fact hold rq->lock, the newly spawned
idle thread's cpu is not yet set to the correct cpu so the lockdep check
in task_group():
lockdep_is_held(&task_rq(p)->lock)
will fail.
But this is a chicken and egg problem. Setting the CPU's runqueue requires
that the CPU's runqueue already be set. ;-)
So insert an RCU read-side critical section to avoid the complaint.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Here a newly created task is having its runqueue assigned. The new task
is not yet on the tasklist, so cannot go away. This is therefore a false
positive, suppress with an RCU read-side critical section.
Reported-by: Ben Greear <greearb@candelatech.com Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Ben Greear <greearb@candelatech.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Currently sched_avg_update() (which updates rt_avg stats in the rq)
is getting called from scale_rt_power() (in the load balance context)
which doesn't take rq->lock.
Fix it by moving the sched_avg_update() to more appropriate
update_cpu_load() where the CFS load gets updated as well.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1282596171.2694.3.camel@sbsiddha-MOBL3> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When there's an xHCI host power loss after a suspend from memory, the USB
core attempts to reset and verify the USB devices that are attached to the
system. The xHCI driver has to reallocate those devices, since the
hardware lost all knowledge of them during the power loss.
When a hub is plugged in, and the host loses power, the xHCI hardware
structures are not updated to say the device is a hub. This is usually
done in hub_configure() when the USB hub is detected. That function is
skipped during a reset and verify by the USB core, since the core restores
the old configuration and alternate settings, and the hub driver has no
idea this happened. This bug makes the xHCI host controller reject the
enumeration of low speed devices under the resumed hub.
Therefore, make the USB core re-setup the internal xHCI hub device
information by calling update_hub_device() when hub_activate() is called
for a hub reset resume. After a host power loss, all devices under the
roothub get a reset-resume or a disconnect.
This patch should be queued for the 2.6.37 stable tree.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb
IPI's while the cr3 is still pointing to the prev mm. And this window
can lead to the possibility of bogus TLB fills resulting in strange
failures. One such problematic scenario is mentioned below.
T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI
etc between the point of clearing the cpu from the mm_cpumask(mm1)
and before reloading the cr3 with the new mm2.
T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with
flushing the TLB for mm1. It doesn't send the flush TLB to CPU-1
as it doesn't see that cpu listed in the mm_cpumask(mm1).
T3. After the TLB flush is complete, CPU-2 goes ahead and frees the
page-table pages associated with the removed vma mapping.
T4. CPU-2 now allocates those freed page-table pages for something
else.
T5. As the CR3 and TLB caches for mm1 is still active on CPU-1, CPU-1
can potentially speculate and walk through the page-table caches
and can insert new TLB entries. As the page-table pages are
already freed and being used on CPU-2, this page walk can
potentially insert a bogus global TLB entry depending on the
(random) contents of the page that is being used on CPU-2.
T6. This bogus TLB entry being global will be active across future CR3
changes and can result in weird memory corruption etc.
To avoid this issue, for the prev mm that is handing over the cpu to
another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is
changed.
Marking it for -stable, though we haven't seen any reported failure that
can be attributed to this.