Emit warning when "mem=nopentium" is specified on any arch other
than x86_32 (the only that arch supports it).
Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Avoid removing all of memory and panicing when "mem={invalid}"
is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on
platforms other than x86_32).
Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the fuction graph tracer starts, it needs to make a special
stack for each task to save the real return values of the tasks.
All running tasks have this stack created, as well as any new
tasks.
On CPU hot plug, the new idle task will allocate a stack as well
when init_idle() is called. The problem is that cpu hotplug does
not create a new idle_task. Instead it uses the idle task that
existed when the cpu went down.
ftrace_graph_init_task() will add a new ret_stack to the task
that is given to it. Because a clone will make the task
have a stack of its parent it does not check if the task's
ret_stack is already NULL or not. When the CPU hotplug code
starts a CPU up again, it will allocate a new stack even
though one already existed for it.
The solution is to treat the idle_task specially. In fact, the
function_graph code already does, just not at init_idle().
Instead of using the ftrace_graph_init_task() for the idle task,
which that function expects the task to be a clone, have a
separate ftrace_graph_init_idle_task(). Also, we will create a
per_cpu ret_stack that is used by the idle task. When we call
ftrace_graph_init_idle_task() it will check if the idle task's
ret_stack is NULL, if it is, then it will assign it the per_cpu
ret_stack.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
mm_fault_error() should not execute oom-killer, if page fault
occurs in kernel space. E.g. in copy_from_user()/copy_to_user().
This would happen if we find ourselves in OOM on a
copy_to_user(), or a copy_from_user() which faults.
Without this patch, the kernels hangs up in copy_from_user(),
because OOM killer sends SIG_KILL to current process, but it
can't handle a signal while in syscall, then the kernel returns
to copy_from_user(), reexcute current command and provokes
page_fault again.
With this patch the kernel return -EFAULT from copy_from_user().
The code, which checks that page fault occurred in kernel space,
has been copied from do_sigbus().
This situation is handled by the same way on powerpc, xtensa,
tile, ...
When au1000_eth probes the MII bus for PHY address, if we do not set
au1000_eth platform data's phy_search_highest_address, the MII probing
logic will exit early and will assume a valid PHY is found at address 0.
For MTX-1, the PHY is at address 31, and without this patch, the link
detection/speed/duplex would not work correctly.
ata_qc_complete() contains special handling for certain commands. For
example, it schedules EH for device revalidation after certain
configurations are changed. These shouldn't be applied to EH
commands but they were.
In most cases, it doesn't cause an actual problem because EH doesn't
issue any command which would trigger special handling; however, ACPI
can issue such commands via _GTF which can cause weird interactions.
Restructure ata_qc_complete() such that EH commands are always passed
on to __ata_qc_complete().
stable: Please apply to -stable only after 2.6.38 is released.
Add necessary alias to autoload ip6ip6 tunnel module.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with
CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean
that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
limited to /lib/modules/**. However, CAP_NET_ADMIN capability shouldn't
allow anybody load any module not related to networking.
This patch restricts an ability of autoloading modules to netdev modules
with explicit aliases. This fixes CVE-2011-1019.
Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior
of loading netdev modules by name (without any prefix) for processes
with CAP_SYS_MODULE to maintain the compatibility with network scripts
that use autoloading netdev modules by aliases like "eth0", "wlan0".
Currently there are only three users of the feature in the upstream
kernel: ipip, ip_gre and sit.
root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
root@albatros:~# grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: fffffff800001000
CapEff: fffffff800001000
CapBnd: fffffff800001000
root@albatros:~# modprobe xfs
FATAL: Error inserting xfs
(/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
root@albatros:~# lsmod | grep xfs
root@albatros:~# ifconfig xfs
xfs: error fetching interface information: Device not found
root@albatros:~# lsmod | grep xfs
root@albatros:~# lsmod | grep sit
root@albatros:~# ifconfig sit
sit: error fetching interface information: Device not found
root@albatros:~# lsmod | grep sit
root@albatros:~# ifconfig sit0
sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
root@albatros:~# lsmod | grep sit
sit 10457 0
tunnel4 2957 1 sit
For CAP_SYS_MODULE module loading is still relaxed:
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For the JR3/PCI cards, the size of the PCIBAR0 region depends on the
number of channels. Don't try and ioremap space for 4 channels if the
card has fewer channels. Also check for ioremap failure.
Thanks to Anders Blomdell for input and Sami Hussein for testing.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
generating RxFIFO overflow errors. The result is an infinite loop in
interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
With the workaround everything goes fine.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes <hayeswang@realtek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Like many other places, we have to check that the array index is
within allowed limits, or otherwise, a kernel oops and other nastiness
can ensue when we access memory beyond the end of the array.
The problem goes back to v2.6.30-rc1~1372~1342~31 where nf_log_bind
was decoupled from nf_log_register.
Reported-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>,
via irc.freenode.net/#netfilter Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed. The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck. On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are. The CPU will
either appear stuck, or will cause a crash as we replace its cede loop beneath
it.
This patch kicks each present offline CPU awake before the kexec, so that
none are forever lost to these assumptions in the subsequent kernel.
Now, the behaviour is that all available CPUs that were offlined are now
online & usable after the kexec. This mimics the behaviour of a full reboot
(on which all CPUs will be restarted).
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Kamalesh babulal <kamalesh@linux.vnet.ibm.com>
cc: Anton Blanchard <anton@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently for kexec the PTE tear down on 1TB segment systems normally
requires 3 hcalls for each PTE removal. On a machine with 32GB of
memory it can take around a minute to remove all the PTEs.
This optimises the path so that we only remove PTEs that are valid.
It also uses the read 4 PTEs at once HCALL. For the common case where
a PTEs is invalid in a 1TB segment, this turns the 3 HCALLs per PTE
down to 1 HCALL per 4 PTEs.
This gives an > 10x speedup in kexec times on PHYP, taking a 32GB
machine from around 1 minute down to a few seconds.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Kamalesh babulal <kamalesh@linux.vnet.ibm.com>
cc: Anton Blanchard <anton@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds plpar_pte_read_4_raw() which can be used read 4 PTEs from
PHYP at a time, while in real mode.
It also creates a new hcall9 which can be used in real mode. It's the
same as plpar_hcall9 but minus the tracing hcall statistics which may
require variables outside the RMO.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Kamalesh babulal <kamalesh@linux.vnet.ibm.com> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On large machines we are running out of room below 256MB. In some cases we
only need to ensure the allocation is in the first segment, which may be
256MB or 1TB.
Add slb0_limit and use it to specify the upper limit for the irqstack and
emergency stacks.
On a large ppc64 box, this fixes a panic at boot when the crashkernel=
option is specified (previously we would run out of memory below 256MB).
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
IOMMU table initialized, virtual merging enabled
Interrupt 155954 (real) is invalid, disabling it.
Interrupt 155953 (real) is invalid, disabling it.
ie we took some spurious interrupts. default_machine_crash_shutdown tries
to disable all interrupt sources but uses chip->disable which maps to
the default action of:
static void default_disable(unsigned int irq)
{
}
If we use chip->shutdown, then we actually mask the IRQ:
We wrap the crash_shutdown_handles[] calls with longjmp/setjmp, so if any
of them fault we can recover. The problem is we add a hook to the debugger
fault handler hook which calls longjmp unconditionally.
This first part of kdump is run before we marshall the other CPUs, so there
is a very good chance some CPU on the box is going to page fault. And when
it does it hits the longjmp code and assumes the context of the oopsing CPU.
The machine gets very confused when it has 10 CPUs all with the same stack,
all thinking they have the same CPU id. I get even more confused trying
to debug it.
The patch below adds crash_shutdown_cpu and uses it to specify which cpu is
in the protected region. Since it can only be -1 or the oopsing CPU, we don't
need to use memory barriers since it is only valid on the local CPU - no other
CPU will ever see a value that matches it's local CPU id.
Eventually we should switch the order and marshall all CPUs before doing the
crash_shutdown_handles[] calls, but that is a bigger fix.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kamalesh babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Robert Swiecki reported a BUG_ON(page_mapped) from a fuzzer, punching
a hole with madvise(,, MADV_REMOVE). That path is under mutex, and
cannot be explained by lack of serialization in unmap_mapping_range().
Reviewing the code, I found one place where vm_truncate_count handling
should have been updated, when I switched at the last minute from one
way of managing the restart_addr to another: mremap move changes the
virtual addresses, so it ought to adjust the restart_addr.
But rather than exporting the notion of restart_addr from memory.c, or
converting to restart_pgoff throughout, simply reset vm_truncate_count
to 0 to force a rescan if mremap move races with preempted truncation.
We have no confirmation that this fixes Robert's BUG,
but it is a fix that's worth making anyway.
We have found a hardware erratum on 82599 hardware that can lead to
unpredictable behavior when Header Splitting mode is enabled. So
we are no longer enabling this feature on affected hardware.
Please see the 82599 Specification Update for more information.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
However the code in rxrpc_instantiate strips it away:
data += sizeof(kver);
datalen -= sizeof(kver);
Removing kif_version fixes my problem.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we get oplock break notification we should set the appropriate
value of OplockLevel field in oplock break acknowledge according to
the oplock level held by the client in this time. As we only can have
level II oplock or no oplock in the case of oplock break, we should be
aware only about clientCanCacheRead field in cifsInodeInfo structure.
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
NETDEV_NOTIFY_PEER is an explicit request by the driver to send a link
notification while NETDEV_UP/NETDEV_CHANGEADDR generate link
notifications as a sort of side effect.
In the later cases the sysctl option is present because link
notification events can have undesired effects e.g. if the link is
flapping. I don't think this applies in the case of an explicit
request from a driver.
This patch makes NETDEV_NOTIFY_PEER unconditional, if preferred we
could add a new sysctl for this case which defaults to on.
This change causes Xen post-migration ARP notifications (which cause
switches to relearn their MAC tables etc) to be sent by default.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[reported to solve hyperv live migration problem - gkh] Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Mike Surcouf <mike@surcouf.co.uk> Cc: Hank Janssen <hjanssen@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the iowarrior devices in this case statement support more than 8 bytes
per report, it is possible to write past the end of a kernel heap allocation.
This will probably never be possible, but change the allocation to be more
defensive anyway.
For some time is known that ASPM is causing troubles on r8169, i.e. make
device randomly stop working without any errors in dmesg.
Currently Tomi Leppikangas reports that system with r8169 device hangs
with MCE errors when ASPM is enabled:
https://bugzilla.redhat.com/show_bug.cgi?id=642861#c4
Lets disable ASPM for r8169 devices at all, to avoid problems with
r8169 PCIe devices at least for some users.
Reported-by: Tomi Leppikangas <tomi.leppikangas@gmail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When support for 82577/82578 was added[1] in 2.6.31, PHY wakeup was in-
advertently enabled (even though it does not function properly) on ICH10
LOMs. This patch makes it so that the ICH10 LOMs use MAC wakeup instead
as was done with the initial support for those devices (i.e. 82567LM-3,
82567LF-3 and 82567V-4).
This fixes a bug in the order of dccp_rcv_state_process() that still permitted
reception even after closing the socket. A Reset after close thus causes a NULL
pointer dereference by not preventing operations on an already torn-down socket.
dccp_v4_do_rcv()
|
| state other than OPEN
v
dccp_rcv_state_process()
|
| DCCP_PKT_RESET
v
dccp_rcv_reset()
|
v
dccp_time_wait()
WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128()
Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah
[<c0038850>] (unwind_backtrace+0x0/0xec) from [<c0055364>] (warn_slowpath_common)
[<c0055364>] (warn_slowpath_common+0x4c/0x64) from [<c0055398>] (warn_slowpath_n)
[<c0055398>] (warn_slowpath_null+0x1c/0x24) from [<c02b72d0>] (__inet_twsk_hashd)
[<c02b72d0>] (__inet_twsk_hashdance+0x48/0x128) from [<c031caa0>] (dccp_time_wai)
[<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces)
[<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_)
[<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0)
[<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380)
[<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70)
The fix is by testing the socket state first. Receiving a packet in Closed state
now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1.
Reported-and-tested-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reported-by: Mark Davis Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Acan FG-8100 barcode reader (0x04b4/0xbca1) has vendor ID of
cypress and requires the same MIN/MAX swap descriptor quirk
as other barcode readers from cypress.
o If tx and rx resources are not available, during set mac request.
Then this request wont be passed to firmware and it will be added to
driver mac list and will never make it to firmware.
So if resources are not available, don't add it to driver mac list.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As all virtio devices perform DMA, we
must enable bus mastering for them to be
spec compliant.
This patch fixes hotplug of virtio devices
with Linux guests and qemu 0.11-0.12.
Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we finish processing ASCONF_ACK chunk, we try to send
the next queued ASCONF. This action runs the sctp state
machine recursively and it's not prepared to do so.
vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
it as reported and analyzed by Josh.
In fact, there is no good reason to mess with i_nlink of the moved file.
We did it presumably to simulate linking into the new directory and unlinking
from an old one. But the practical effect of this is disputable because fsck
can possibly treat file as being properly linked into both directories without
writing any error which is confusing. So we just stop increment-decrement
games with i_nlink which also fixes the corruption.
CC: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only
can switch into oneshot mode, when the backup broadcast device
supports oneshot mode as well. Otherwise we would try to switch the
broadcast device into an unsupported mode unconditionally. This went
unnoticed so far as the current available broadcast devices support
oneshot mode. Seth unearthed this problem while debugging and working
around an hpet related BIOS wreckage.
Add the necessary check to tick_is_oneshot_available().
A customer of ours, complained that when setting the reset
vector back to 0, it trashed other data and hung their box.
They noticed when only 4 bytes were set to 0 instead of 8,
everything worked correctly.
Mathew pointed out:
|
| We're supposed to be resetting trampoline_phys_low and
| trampoline_phys_high here, which are two 16-bit values.
| Writing 64 bits is definitely going to overwrite space
| that we're not supposed to be touching.
|
So limit the area modified to u32.
Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Matthew Garrett <mjg@redhat.com>
LKML-Reference: <1297139100-424-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Current refcounttree codes actually didn't writeback the new pages out in
write-back mode, due to a bug of always passing a ZERO number of clusters
to 'ocfs2_cow_sync_writeback', the patch tries to pass a proper one in.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In x25_link_free(), we destroy 'nb' before dereferencing
'nb->dev'. Don't do this, because 'nb' might be freed
by then.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
does not take into account that the remaining data length can be less
than sg_dma_len(sg). In that case, running_total can end up being
greater than the total data length, so an extra TRB is counted.
Changing the expression to
while (running_total < sg_dma_len(sg) && running_total < temp)
fixes that.
This patch should be queued for stable kernels back to 2.6.31.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
are incorrect, because running_total can never be zero, so the if()
expression will never be true. I think the intention was that
running_total be in the range of 0 to TRB_MAX_BUFF_SIZE-1, not 1
to TRB_MAX_BUFF_SIZE. So adding a
running_total &= TRB_MAX_BUFF_SIZE - 1;
fixes the problem.
This patch should be queued for stable kernels back to 2.6.31.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This makes it easier to spot some problems, which will be fixed by the
next patch in the series. Also change dev_dbg to dev_err in
check_trb_math(), so any math errors will be visible even when running
with debug disabled.
Note: This patch changes the expressions containing
"((1 << TRB_MAX_BUFF_SHIFT) - 1)" to use the equivalent
"(TRB_MAX_BUFF_SIZE - 1)". No change in behavior is intended for
those expressions.
This patch should be queued for stable kernels back to 2.6.31.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On some SB800 systems polarity for IOAPIC pin2 is wrongly
specified as low active by BIOS. This caused system hangs after
resume from S3 when HPET was used in one-shot mode on such
systems because a timer interrupt was missed (HPET signal is
high active).
'mdp' devices are md devices with preallocated device numbers
for partitions. As such it is possible to mknod and open a partition
before opening the whole device.
this causes md_probe() to be called with a device number of a
partition, which in-turn calls mddev_find with such a number.
However mddev_find expects the number of a 'whole device' and
does the wrong thing with partition numbers.
So add code to mddev_find to remove the 'partition' part of
a device number and just work with the 'whole device'.
This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=28652
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
a bug that causes a kernel oops on certain corrupted LDM partitions. A
kernel subsystem seems to crash, because, after the oops, the kernel no
longer recognizes newly connected storage devices.
The patch changes ldm_parse_vmdb() to Validate the value of vblk_size.
Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Acked-by: Richard Russon <ldm@flatcap.org> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In several places, an epoll fd can call another file's ->f_op->poll()
method with ep->mtx held. This is in general unsafe, because that other
file could itself be an epoll fd that contains the original epoll fd.
The code defends against this possibility in its own ->poll() method using
ep_call_nested, but there are several other unsafe calls to ->poll
elsewhere that can be made to deadlock. For example, the following simple
program causes the call in ep_insert recursively call the original fd's
->poll, leading to deadlock:
#include <unistd.h>
#include <sys/epoll.h>
int main(void) {
int e1, e2, p[2];
struct epoll_event evt = {
.events = EPOLLIN
};
On insertion, check whether the inserted file is itself a struct epoll,
and if so, do a recursive walk to detect whether inserting this file would
create a loop of epoll structures, which could lead to deadlock.
[nelhage@ksplice.com: Use epmutex to serialize concurrent inserts] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Reported-by: Nelson Elhage <nelhage@ksplice.com> Tested-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
"DMA transfers need to be synced properly in order for
the cpu and device to see the most uptodate and correct
copy of the DMA buffer."
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
My Galaxy Spica needs this quirk when in modem mode, otherwise
it causes endless USB bus resets and is unusable in this mode.
Unfortunately Samsung decided to reuse ID of its old CDMA phone SGH-I500
for the modem part.
That's why in addition to this patch the visor driver must be prevented
from binding to SPH-I500 ID, so ACM driver can do that.
Signed-off-by: Maciej Szmigiero <mhej@o2.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[USB]Add Samsung SGH-I500/Android modem ID switch to visor driver
Samsung decided to reuse USB ID of its old CDMA phone SGH-I500 for the
modem part of some of their Android phones. At least Galaxy Spica
is affected.
This modem needs ACM driver and does not work with visor driver which
binds the conflicting ID for SGH-I500.
Because SGH-I500 is pretty an old hardware its best to add switch to
visor
driver in cause somebody still wants to use that phone with Linux.
Note that this is needed only when using the Android phone as modem,
not in USB storage or ADB mode.
Signed-off-by: Maciej Szmigiero <mhej@o2.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1448) adds a quirks entry for the Keytouch QWERTY Panel
firmware, used in the IEC 60945 keyboard. This device crashes during
enumeration when the computer asks for its configuration string
descriptor.
The idle timer could trigger after clock had been disabled leading to
kernel panic when MUSB_DEVCTL is accessed in musb_do_idle on 2.6.37.
The fault below is no longer triggered on 2.6.38-rc4 (clock is disabled
later, and only if compiled as a module, and the offending memory access
has moved) but the timer should be cancelled nonetheless.
With CONFIG_SHIRQ_DEBUG=y we call a newly installed interrupt handler
in request_threaded_irq().
The original implementation (commit a304e1b8) called the handler
_BEFORE_ it was installed, but that caused problems with handlers
calling disable_irq_nosync(). See commit 377bf1e4.
It's braindead in the first place to call disable_irq_nosync in shared
handlers, but ....
Moving this call after we installed the handler looks innocent, but it
is very subtle broken on SMP.
Interrupt handlers rely on the fact, that the irq core prevents
reentrancy.
Now this debug call violates that promise because we run the handler
w/o the IRQ_INPROGRESS protection - which we cannot apply here because
that would result in a possibly forever masked interrupt line.
A concurrent real hardware interrupt on a different CPU results in
handler reentrancy and can lead to complete wreckage, which was
unfortunately observed in reality and took a fricking long time to
debug.
Leave the code here for now. We want this debug feature, but that's
not easy to fix. We really should get rid of those
disable_irq_nosync() abusers and remove that function completely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't allow everybody to change ACPI settings. The comment says that it
is done deliberatelly, however, the comment before disp_proc_write()
says that at least one of these setting is experimental.
The lower filesystem may do some type of inode revalidation during a
getattr call. eCryptfs should take advantage of that by copying the
lower inode attributes to the eCryptfs inode after a call to
vfs_getattr() on the lower inode.
I originally wrote this fix while working on eCryptfs on nfsv3 support,
but discovered it also fixed an eCryptfs on ext4 nanosecond timestamp
bug that was reported.
Ensure a predictable endian state when entering signal handlers. This
avoids programs which use SETEND to momentarily switch their endian
state from having their signal handlers entered with an unpredictable
endian state.
Acked-by: Dave Martin <dave.martin@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| drivers/media/radio/radio-aimslab.c: In function ‘rt_decvol’:
| drivers/media/radio/radio-aimslab.c:76: error: implicit declaration of function ‘msleep’
Currently we return 0 in swsusp_alloc() when alloc_image_page() fails.
Fix that. Also remove unneeded "error" variable since the only
useful value of error is -ENOMEM.
[rjw: Fixed up the changelog and changed subject.]
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
task_show_regs used to be a debugging aid in the early bringup days
of Linux on s390. /proc/<pid>/status is a world readable file, it
is not a good idea to show the registers of a process. The only
correct fix is to remove task_show_regs.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode
numbers during lookup") changes the inode lookup code to do btree lookups for
untrusted inode numbers. This change made an invalid assumption about the
alignment of inodes and hence incorrectly calculated the first inode in the
cluster. As a result, some inode numbers were being incorrectly considered
invalid when they were actually valid.
The issue was not picked up by the xfstests suite because it always runs fsr
and dump (the two utilities that utilise the bulkstat interface) on cache hot
inodes and hence the lookup code in the cold cache path was not sufficiently
exercised to uncover this intermittent problem.
Fix the issue by relaxing the btree lookup criteria and then checking if the
record returned contains the inode number we are lookup for. If it we get an
incorrect record, then the inode number is invalid.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
[dannf: Backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The block number comes from bulkstat based inode lookups to shortcut
the mapping calculations. We ar enot able to trust anything from
bulkstat, so drop the block number as well so that the correct
lookups and mappings are always done.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
[dannf: Backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Inode numbers may come from somewhere external to the filesystem
(e.g. file handles, bulkstat information) and so are inherently
untrusted. Rename the flag we use for these lookups to make it
obvious we are doing a lookup of an untrusted inode number and need
to verify it completely before trying to read it from disk.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
[dannf: backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we decode a handle or do a bulkstat lookup, we are using an
inode number we cannot trust to be valid. If we are deleting inode
chunks from disk (default noikeep mode), then we cannot trust the on
disk inode buffer for any given inode number to correctly reflect
whether the inode has been unlinked as the di_mode nor the
generation number may have been updated on disk.
This is due to the fact that when we delete an inode chunk, we do
not write the clusters back to disk when they are removed - instead
we mark them stale to avoid them being written back potentially over
the top of something that has been subsequently allocated at that
location. The result is that we can have locations of disk that look
like they contain valid inodes but in reality do not. Hence we
cannot simply convert the inode number to a block number and read
the location from disk to determine if the inode is valid or not.
As a result, and XFS_IGET_BULKSTAT lookup needs to actually look the
inode up in the inode allocation btree to determine if the inode
number is valid or not.
It should be noted even on ikeep filesystems, there is the
possibility that blocks on disk may look like valid inode clusters.
e.g. if there are filesystem images hosted on the filesystem. Hence
even for ikeep filesystems we really need to validate that the inode
number is valid before issuing the inode buffer read.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
[dannf: backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The non-coherent bulkstat versionsthat look directly at the inode
buffers causes various problems with performance optimizations that
make increased use of just logging inodes. This patch makes bulkstat
always use iget, which should be fast enough for normal use with the
radix-tree based inode cache introduced a while ago.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
[dannf: backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reported-by: Min Zhang <mzhang@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As noted by Steve Chen, since commit f5fff5dc8a7a3f395b0525c02ba92c95d42b7390 ("tcp: advertise MSS
requested by user") we can end up with a situation where
tcp_select_initial_window() does a divide by a zero (or
even negative) mss value.
The problem is that sometimes we effectively subtract
TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss.
Fix this by increasing the minimum from 8 to 64.
Reported-by: Steve Chen <schen@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For certain skews of the BE adapter, H/W Tx and Rx
counters could be common for more than one interface.
Add Tx and Rx counters in the adapter structure
(to maintain stats on a per interfae basis).
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't forget to release the module refcnt if seq_open() returns failure.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When suspending a failed mirror, bios are completed by mirror_end_io() and
__rh_lookup() in dm_rh_dec() returns NULL where a non-NULL return value is
required by design. Fix this by not changing the state of the recovery failed
region from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end().
Issue
On 2.6.33-rc1 kernel, I hit the bug when I suspended the failed
mirror by dmsetup command.
When recovery process of a region failed, dm_rh_recovery_end() function
changes the state of the region from RM_RH_RECOVERING to DM_RH_NOSYNC.
When recovery_complete() is executed between dm_rh_update_states() and
dm_writes() in do_mirror(), bios are processed with the region state,
DM_RH_NOSYNC. However, the region data is freed without checking its
pending count when dm_rh_update_states() is called next time.
When bios are finished by mirror_end_io(), __rh_lookup() in dm_rh_dec()
returns NULL even though a valid return value are expected.
Solution
Remove the state change of the recovery failed region from DM_RH_RECOVERING
to DM_RH_NOSYNC in dm_rh_recovery_end(). We can remove the state change
because:
- If the region data has been released by dm_rh_update_states(),
a new region data is created with the state of DM_RH_NOSYNC, and
bios are processed according to the DM_RH_NOSYNC state.
- If the region data has not been released by dm_rh_update_states(),
a state of the region is DM_RH_RECOVERING and bios are put in the
delayed_bio list.
The flag change from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end()
was added in the following commit:
dm raid1: handle resync failures
author Jonathan Brassow <jbrassow@redhat.com>
Thu, 12 Jul 2007 16:29:04 +0000 (17:29 +0100)
http://git.kernel.org/linus/f44db678edcc6f4c2779ac43f63f0b9dfa28b724
This patch solves a corner case during allocation which occurs if both
metadata (indirect) and data blocks are required but there is an
obstacle in the filesystem (e.g. a resource group header or another
allocated block) such that when the allocation is requested only
enough blocks for the metadata are returned.
By changing the exit condition of this loop, we ensure that a
minimum of one data block will always be returned.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the mirror log fails when the handle_errors option was not selected
and there is no remaining valid mirror leg, writes return success even
though they weren't actually written to any device. This patch
completes them with EIO instead.
This code path is taken:
do_writes:
bio_list_merge(&ms->failures, &sync);
do_failures:
if (!get_valid_mirror(ms)) (false)
else if (errors_handled(ms)) (false)
else bio_endio(bio, 0);
The logic in do_failures is based on presuming that the write was already
tried: if it succeeded at least on one leg (without handle_errors) it
is reported as success.
Adds IBM Power Virtual SCSI ALUA devices to the ALUA device handler.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The PowerPC architecture does not require loads to independent bytes to be
ordered without adding an explicit barrier.
In ixgbe_clean_rx_irq we load the status bit then load the packet data.
With packet split disabled if these loads go out of order we get a
stale packet, but we will notice the bad sequence numbers and drop it.
The problem occurs with packet split enabled where the TCP/IP header and data
are in different descriptors. If the reads go out of order we may have data
that doesn't match the TCP/IP header. Since we use hardware checksumming this
bad data is never verified and it makes it all the way to the application.
This bug was found during stress testing and adding this barrier has been shown
to fix it.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>