Currently EHEA reports to ethtool as supporting 10M, 100M, 1G and
10G and connected to FIBRE independent of the hardware configuration.
However, when connected to FIBRE the only supported speed is 10G
full-duplex, and the other speeds and modes are only supported
when connected to twisted pair.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Acked-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Using the vmxnet3 driver produces a lockdep warning because
vmxnet3_set_mc(), which is called with mc->mca_lock held, takes
adapter->cmd_lock. However, there are a couple of places where
adapter->cmd_lock is taken with softirqs enabled, lockdep warns that a
softirq that tries to take mc->mca_lock could happen while
adapter->cmd_lock is held, leading to an AB-BA deadlock.
I'm not sure if this is a real potential deadlock or not, but the
simplest and best fix seems to be simply to make sure we take cmd_lock
with spin_lock_irqsave() everywhere -- the places with plain spin_lock
just look like oversights.
The full enormous lockdep warning is:
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.39-rc6+ #1
---------------------------------------------------------
ifconfig/567 just changed the state of lock:
(&(&mc->mca_lock)->rlock){+.-...}, at: [<ffffffff81531e9f>] mld_ifc_timer_expire+0xff/0x280
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&(&adapter->cmd_lock)->rlock){+.+...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
4 locks held by ifconfig/567:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff8147d547>] rtnl_lock+0x17/0x20
#1: ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff810896cf>] __blocking_notifier_call_chain+0x5f/0xb0
#2: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8106f21b>] run_timer_softirq+0xeb/0x3f0
#3: (&ndev->lock){++.-..}, at: [<ffffffff81531dd2>] mld_ifc_timer_expire+0x32/0x280
Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com> Signed-off-by: Scott J. Goldman <scottjg@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The USB protocol this driver implements appears to require 2 bytes of
padding in front of each received packet. This used to be equal to
the value of NET_IP_ALIGN on x86, so the driver abused that constant
and mostly worked, but this is no longer the case. The driver also
mixed up the URB and packet lengths, resulting in 2 bytes of junk at
the end of the skb.
Introduce a private constant for the 2 bytes of padding; fix this
confusion and check for the under-length case.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
RTR frames do have a valid data length code on CAN.
The driver for SJA1000 did not handle that situation properly.
Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path)
added a bug in IP defragmentation handling, in case timeout is fired.
When a frame is defragmented, we use last skb dst field when building
final skb. Its dst is valid, since we are in rcu read section.
But if a timeout occurs, we take first queued fragment to build one ICMP
TIME EXCEEDED message. Problem is all queued skb have weak dst pointers,
since we escaped RCU critical section after their queueing. icmp_send()
might dereference a now freed (and possibly reused) part of memory.
Calling skb_dst_drop() and ip_route_input_noref() to revalidate route is
the only possible choice.
Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The SNAPSHOT_S2RAM ioctl used for implementing the feature allowing
one to suspend to RAM after creating a hibernation image is currently
broken, because it doesn't clear the "ready" flag in the struct
snapshot_data object handled by it. As a result, the
SNAPSHOT_UNFREEZE doesn't work correctly after SNAPSHOT_S2RAM has
returned and the user space hibernate task cannot thaw the other
processes as appropriate. Make SNAPSHOT_S2RAM clear data->ready
to fix this problem.
Tested-by: Alexandre Felipe Muller de Souza <alexandrefm@mandriva.com.br> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the process using the hibernate user space interface closes
/dev/snapshot after creating a hibernation image without thawing
tasks, snapshot_release() should call pm_restore_gfp_mask() to
restore the GFP mask used before the creation of the image. Make
that happen.
Tested-by: Alexandre Felipe Muller de Souza <alexandrefm@mandriva.com.br> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A warning is printed by pm_restrict_gfp_mask() while the
SNAPSHOT_S2RAM ioctl is being executed after creating a hibernation
image, because pm_restrict_gfp_mask() has been called once already
before the image creation and suspend_devices_and_enter() calls it
once again. This happens after commit 452aa6999e6703ffbddd7f6ea124d3
(mm/pm: force GFP_NOIO during suspend/hibernation and resume).
To avoid this issue, move pm_restrict_gfp_mask() and
pm_restore_gfp_mask() from suspend_devices_and_enter() to its caller
in kernel/power/suspend.c.
Reported-by: Alexandre Felipe Muller de Souza <alexandrefm@mandriva.com.br> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With ARMv5+ and EABI, the compiler expects a 64-bit aligned stack so
instructions like STRD and LDRD can be used. Without this, mysterious
boot failures were seen semi randomly with the LZMA decompressor.
While at it, let's align .bss as well.
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Tested-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The use of igrab() in swapoff's shmem_unuse_inode() is just as vulnerable
to umount as that in shmem_writepage().
Fix this instance by extending the protection of shmem_swaplist_mutex
right across shmem_unuse_inode(): while it's on the list, the inode cannot
be evicted (and the filesystem cannot be unmounted) without
shmem_evict_inode() taking that mutex to remove it from the list.
But since shmem_writepage() might take that mutex, we should avoid making
memory allocations or memcg charges while holding it: prepare them at the
outer level in shmem_unuse(). When mem_cgroup_cache_charge() was
originally placed, we didn't know until that point that the page from swap
was actually a shmem page; but nowadays it's noted in the swap_map, so
we're safe to charge upfront. For the radix_tree, do as is done in
shmem_getpage(): preload upfront, but don't pin to the cpu; so we make a
habit of refreshing the node pool, but might dip into GFP_NOWAIT reserves
on occasion if subsequently preempted.
With the allocation and charge moved out from shmem_unuse_inode(),
we can also hold index map and info->lock over from finding the entry.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While password processing we can get out of options array bound if
the next character after array is delimiter. The patch adds a check
if we reach the end.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A length of zero (after subtracting two for the type and len fields) for
the DCCPO_{CHANGE,CONFIRM}_{L,R} options will cause an underflow due to
the subtraction. The subsequent code may read past the end of the
options value buffer when parsing. I'm unsure of what the consequences
of this might be, but it's probably not good.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reported-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If we're using vga switcheroo, the device may be turned off
and poking it can return random state. This provokes an OOPS fixed
separately by 8ff887c847 (drm/i915/dp: Be paranoid in case we disable a
DP before it is attached). Trying to use and respond to events on a
device that has been turned off by the user is in principle a silly thing
to do.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Given that the hardware may be left in a random condition by the BIOS,
it is conceivable that we then attempt to clear the DP_PIPEB_SELECT bit
without us ever enabling/attaching the DP encoder to a pipe. Thus
causing a NULL deference when we attempt to wait for a vblank on that
crtc.
Reported-and-tested-by: Bryan Christ <bryan.christ@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36314 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36456 Reported-and-tested-by: Bo Wang <bo.b.wang@intel.com> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Linux kernel excludes guard page when performing mlock on a VMA with
down-growing stack. However, some architectures have up-growing stack
and locking the guard page should be excluded in this case too.
This patch fixes lvm2 on PA-RISC (and possibly other architectures with
up-growing stack). lvm2 calculates number of used pages when locking and
when unlocking and reports an internal error if the numbers mismatch.
[ Patch changed fairly extensively to also fix /proc/<pid>/maps for the
grows-up case, and to move things around a bit to clean it all up and
share the infrstructure with the /proc bits.
Tested on ia64 that has both grow-up and grow-down segments - Linus ]
Commit a626ca6a6564 ("vm: fix vm_pgoff wrap in stack expansion") fixed
the case of an expanding mapping causing vm_pgoff wrapping when you had
downward stack expansion. But there was another case where IA64 and
PA-RISC expand mappings: upward expansion.
Add module ack_check, and plcp_check parameters. Ack_check is disabled
by default since is proved that check ack health can cause troubles.
Plcp_check is enabled by default.
This prevent connection hangs with "low ack count detected" messages.
We make use of ptrace_get_breakpoints() / ptrace_put_breakpoints() to
protect ptrace_set_debugreg() even if CONFIG_HAVE_HW_BREAKPOINT if off.
However in this case, these APIs are not implemented.
To fix this, push the protection down inside the relevant ifdef.
Best would be to export the code inside
CONFIG_HAVE_HW_BREAKPOINT into a standalone function to cleanup
the ifdefury there and call the breakpoint ref API inside. But
as it is more invasive, this should be rather made in an -rc1.
Fixes this build error:
arch/powerpc/kernel/ptrace.c:1594: error: implicit declaration of function 'ptrace_get_breakpoints' make[2]: ***
When a task is traced and is in a stopped state, the tracer
may execute a ptrace request to examine the tracee state and
get its task struct. Right after, the tracee can be killed
and thus its breakpoints released.
This can happen concurrently when the tracer is in the middle
of reading or modifying these breakpoints, leading to dereferencing
a freed pointer.
Hence, to prepare the fix, create a generic breakpoint reference
holding API. When a reference on the breakpoints of a task is
held, the breakpoints won't be released until the last reference
is dropped. After that, no more ptrace request on the task's
breakpoints can be serviced for the tracer.
While the tracer accesses ptrace breakpoints, the child task may
concurrently exit due to a SIGKILL and thus release its breakpoints
at the same time. We can then dereference some freed pointers.
To fix this, hold a reference on the child breakpoints before
manipulating them.
The newer Lenovo ThinkPads have HKEY HID of LEN0068 instead
of IBM0068. Added new HID so that thinkpad_acpi module will
auto load on these newer Lenovo ThinkPads.
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Signed-off-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cifs_demultiplex_thread calls coalesce_t2 to try and merge follow-on t2
responses into the original mid buffer. coalesce_t2 however can return
errors, but the caller doesn't handle that situation properly. Fix the
thread to treat such a case as it would a malformed packet. Mark the
mid as being malformed and issue the callback.
Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
...to reduce the extreme indentation. This should introduce no
behavioral changes.
Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are a couple of places in this code where these values can wrap or
go negative, and that could potentially end up overflowing the buffer.
Ensure that that doesn't happen. Do all of the length calculation and
checks first, and only perform the memcpy after they pass.
Also, increase some stack variables to 32 bits to ensure that they don't
wrap without being detected.
Finally, change the error codes to be a bit more descriptive of any
problems detected. -EINVAL isn't very accurate.
Reported-and-Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It's possible that when we go to decode the string area in the
SESSION_SETUP response, that bytes_remaining will be 0. Decrementing it at
that point will mean that it can go "negative" and wrap. Check for a
bytes_remaining value of 0, and don't try to decode the string area if
that's the case.
Reported-and-Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The buffer length checks in this function depend on this value being a
signed data type, but 690c522fa converted it to an unsigned type.
Also, eliminate a problem with the null termination check in the same
function. cifs_strndup_from_ucs handles that situation correctly
already, and the existing check could potentially lead to a buffer
overrun since it increments bleft without checking to see whether it
falls off the end of the buffer.
Reported-and-Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The logic in __get_user_pages() used to skip the stack guard page lookup
whenever the caller wasn't interested in seeing what the actual page
was. But Michel Lespinasse points out that there are cases where we
don't care about the physical page itself (so 'pages' may be NULL), but
do want to make sure a page is mapped into the virtual address space.
So using the existence of the "pages" array as an indication of whether
to look up the guard page or not isn't actually so great, and we really
should just use the FOLL_MLOCK bit. But because that bit was only set
for the VM_LOCKED case (and not all vma's necessarily have it, even for
mlock()), we couldn't do that originally.
Fix that by moving the VM_LOCKED check deeper into the call-chain, which
actually simplifies many things. Now mlock() gets simpler, and we can
also check for FOLL_MLOCK in __get_user_pages() and the code ends up
much more straightforward.
is causing a potential NULL deref in scsi_run_queue() because the
q->queuedata may already be NULL by the time this function is called.
Since we shouldn't be running a queue that is being torn down, simply
add a NULL check in scsi_run_queue() to forestall this.
Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We can get here with a NULL socket argument passed from userspace,
so we need to handle it accordingly.
Thanks to Dave Jones pointing at this issue in net/can/bcm.c
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we enable an NMI window, we ask for an IRET intercept, since
the IRET re-enables NMIs. However, the IRET intercept happens before
the instruction executes, while the NMI window architecturally opens
afterwards.
To compensate for this mismatch, we only open the NMI window in the
following exit, assuming that the IRET has by then executed; however,
this assumption is not always correct; we may exit due to a host interrupt
or page fault, without having executed the instruction.
Fix by checking for forward progress by recording and comparing the IRET's
rip. This is somewhat of a hack, since an unchaging rip does not mean that
no forward progress has been made, but is the simplest fix for now.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
a bug that causes a kernel oops on certain corrupted LDM partitions.
A kernel subsystem seems to crash, because, after the oops, the kernel no
longer recognizes newly connected storage devices.
The patch validates the value of vblk_size.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Richard Russon <rich@flatcap.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
An open on a NFS4 share using the O_CREAT flag on an existing file for
which we have permissions to open but contained in a directory with no
write permissions will fail with EACCES.
A tcpdump shows that the client had set the open mode to UNCHECKED which
indicates that the file should be created if it doesn't exist and
encountering an existing flag is not an error. Since in this case the
file exists and can be opened by the user, the NFS server is wrong in
attempting to check create permissions on the parent directory.
The patch adds a conditional statement to check for create permissions
only if the file doesn't exist.
Signed-off-by: Sachin S. Prabhu <sprabhu@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The old code considered valid empty LZMA2 streams to be corrupt.
Note that a typical empty .xz file has no LZMA2 data at all,
and thus most .xz files having no uncompressed data are handled
correctly even without this fix.
When CONFIG_OABI_COMPAT is set, the wrapper for semtimedop does not
bound the nsops argument. A sufficiently large value will cause an
integer overflow in allocation size, followed by copying too much data
into the allocated buffer. Fix this by restricting nsops to SEMOPM.
Untested.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes the following oops discovered by Dan Aloni:
> Anyway, the following is the output of the Oops that I got on the
> Ubuntu kernel on which I first detected the problem
> (2.6.37-12-generic). The Oops that followed will be more useful, I
> guess.
>[ 5594.669852] BUG: unable to handle kernel NULL pointer dereference
> at      (null)
> [ 5594.681606] IP: [<ffffffff81550b7b>] unix_dgram_recvmsg+0x1fb/0x420
> [ 5594.687576] PGD 2a05d067 PUD 2b951067 PMD 0
> [ 5594.693720] Oops: 0002 [#1] SMP
> [ 5594.699888] last sysfs file:
The bug was that unix domain sockets use a pseduo packet for
connecting and accept uses that psudo packet to get the socket.
In the buggy seqpacket case we were allowing unconnected
sockets to call recvmsg and try to receive the pseudo packet.
That is always wrong and as of commit 7361c36c5 the pseudo
packet had become enough different from a normal packet
that the kernel started oopsing.
Do for seqpacket_recv what was done for seqpacket_send in 2.5
and only allow it on connected seqpacket sockets.
Tested-by: Dan Aloni <dan@aloni.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The locking with SMPS requests means that the
debugs file should lock the mgd mutex, not the
iflist mutex. Calls to __ieee80211_request_smps()
need to hold that mutex, so add an assertion.
This has always been wrong, but for some reason
never been noticed, probably because the locking
error only happens while unassociated.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The patch 'ath9k_hw: fix stopping rx DMA during resets' added code to detect
a condition where rx DMA was stopped, but the MAC failed to enter the idle
state. This condition requires a hardware reset, however the return value
of ath_stoprecv was 'true' in that case, which allowed it to skip the reset
when issuing a fast channel change.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Reported-by: Paul Stewart <pstew@google.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Older AMD K8 processors (Revisions A-E) are affected by erratum
400 (APIC timer interrupts don't occur in C states greater than
C1). This, for example, means that X86_FEATURE_ARAT flag should
not be set for these parts.
This addresses regression introduced by commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 ("x86, AMD: Set ARAT
feature on AMD processors") where the system may become
unresponsive until external interrupt (such as keyboard input)
occurs. This results, for example, in time not being reported
correctly, lack of progress on the system and other lockups.
Just like kmalloc will allow one to allocate a 0 length segment of memory
flex arrays should do the same thing. It should bomb if you try to use
something, but it should at least allow the allocation.
This is needed because when SELinux switched to using flex_arrays in 2.6.38
the inability to allocate a 0 length array resulted in SELinux policy load
returning -ENOSPC when previously it worked.
Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Eric Paris <eparis@redhat.com> Tested-by: Chris Richards <gizmo@giz-works.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change flex_array_prealloc to take the number of elements for which space
should be allocated instead of the last (inclusive) element. Users
and documentation are updated accordingly. flex_arrays got introduced before
they had users. When folks started using it, they ended up needing a
different API than was coded up originally. This swaps over to the API that
folks apparently need.
Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Eric Paris <eparis@redhat.com> Tested-by: Chris Richards <gizmo@giz-works.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The imon_ir_change_protocol function gets called two different ways, one
way is from rc_register_device, for initial protocol selection/setup,
and the other is via a userspace-initiated protocol change request,
either by direct sysfs prodding or by something like ir-keytable.
In the rc_register_device case, the imon context lock is already held,
but when initiated from userspace, it is not, so we must acquire it,
prior to calling send_packet, which requires that the lock is held.
Without this change, there's an easily reproduceable deadlock when
another function calls send_packet (such as either of the display write
fops) after a userspace-initiated change_protocol.
With a lock-debugging-enabled kernel, I was getting this:
[ 15.014153] =====================================
[ 15.015048] [ BUG: bad unlock balance detected! ]
[ 15.015048] -------------------------------------
[ 15.015048] ir-keytable/773 is trying to release lock (&ictx->lock) at:
[ 15.015048] [<ffffffff814c6297>] mutex_unlock+0xe/0x10
[ 15.015048] but there are no more locks to release!
[ 15.015048]
[ 15.015048] other info that might help us debug this:
[ 15.015048] 2 locks held by ir-keytable/773:
[ 15.015048] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8119d400>] sysfs_write_file+0x3c/0x144
[ 15.015048] #1: (s_active#87){.+.+.+}, at: [<ffffffff8119d4ab>] sysfs_write_file+0xe7/0x144
[ 15.015048]
[ 15.015048] stack backtrace:
[ 15.015048] Pid: 773, comm: ir-keytable Not tainted 2.6.38.4-20.fc15.x86_64.debug #1
[ 15.015048] Call Trace:
[ 15.015048] [<ffffffff81089715>] ? print_unlock_inbalance_bug+0xca/0xd5
[ 15.015048] [<ffffffff8108b35c>] ? lock_release_non_nested+0xc1/0x263
[ 15.015048] [<ffffffff814c6297>] ? mutex_unlock+0xe/0x10
[ 15.015048] [<ffffffff814c6297>] ? mutex_unlock+0xe/0x10
[ 15.015048] [<ffffffff8108b67b>] ? lock_release+0x17d/0x1a4
[ 15.015048] [<ffffffff814c6229>] ? __mutex_unlock_slowpath+0xc5/0x125
[ 15.015048] [<ffffffff814c6297>] ? mutex_unlock+0xe/0x10
[ 15.015048] [<ffffffffa02964b6>] ? send_packet+0x1c9/0x264 [imon]
[ 15.015048] [<ffffffff8108b376>] ? lock_release_non_nested+0xdb/0x263
[ 15.015048] [<ffffffffa0296731>] ? imon_ir_change_protocol+0x126/0x15e [imon]
[ 15.015048] [<ffffffffa024a334>] ? store_protocols+0x1c3/0x286 [rc_core]
[ 15.015048] [<ffffffff81326e4e>] ? dev_attr_store+0x20/0x22
[ 15.015048] [<ffffffff8119d4cc>] ? sysfs_write_file+0x108/0x144
...
The original report that led to the investigation was the following:
Some v4l drivers currently don't initialize their struct v4l2_subdev
with zeros, and this is a problem since some of the v4l2 code expects
this. One example is the addition of internal_ops in commit 45f6f84,
after that we are at risk of random oopses with these drivers when code
in v4l2_device_register_subdev tries to dereference sd->internal_ops->*,
as can be shown by the report at http://bugs.launchpad.net/bugs/745213
and analysis of its crash at https://lkml.org/lkml/2011/4/1/168
Use kzalloc within problematic drivers to ensure we have a zeroed struct
v4l2_subdev.
Some newer Huawei devices (T-Mobile Rocket, others) have cdc-ether
compatible ports, so recognize and expose them.
Signed-off-by: Dan Williams <dcbw@redhat.com> Acked-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Current implementation of ohci_set_config_rom() uses a deferred
bus reset via fw_schedule_bus_reset(). If clients add multiple
unit descriptors to the config_rom in quick succession, the
deferred bus reset may not have fired before succeeding update
requests have come in. This can lead to an incorrect partial
update of the config_rom for both addition and removal of
config_rom descriptors, as the ohci_set_config_rom() routine
will return -EBUSY if a previous pending update has not been
completed yet; the requested update just gets dropped on the floor.
This patch recognizes that the "in-flight" update can be modified
until it has been processed by the bus-reset, and the locking
in the bus_reset_tasklet ensures that the update is done atomically
with respect to modifications made by ohci_set_config_rom(). The
-EBUSY error case is simply removed.
[Stefan R: The bug always existed at least theoretically. But it
became easy to trigger since 2.6.36 commit 02d37bed188c "firewire: core:
integrate software-forced bus resets with bus management" which
introduced long mandatory delays between janitorial bus resets.]
Signed-off-by: Benjamin Buchalter <bj@mhlabs.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1460) fixes a regression in the usbip driver caused by
the new check for Transaction Translators in USB-2 hubs. The root hub
registered by vhci_hcd needs to have the has_tt flag set, because it
can connect to low- and full-speed devices as well as high-speed
devices.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It seems that under certain circumstances the sdhci_tasklet_finish()
call can be entered with mrq set to NULL, causing the system to crash
with a NULL pointer de-reference.
Seen on S3C6410 system. Based on a patch by Dimitris Papastamos.
It seems that under certain circumstances that the sdhci_tasklet_finish()
call can be entered with mrq->cmd set to NULL, causing the system to crash
with a NULL pointer de-reference.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
PC is at sdhci_tasklet_finish+0x34/0xe8
LR is at sdhci_tasklet_finish+0x24/0xe8
Seen on S3C6410 system.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If pci_ioremap_bar() fails during probe, we "goto release;" and free the
host, but then we return 0 -- which tells sdhci_pci_probe() that the probe
succeeded. Since we think the probe succeeded, when we unload sdhci we'll
go to sdhci_pci_remove_slot() and it will try to dereference slot->host,
which is now NULL because we freed it in the error path earlier.
The patch simply sets ret appropriately, so that sdhci_pci_probe() will
detect the failure immediately and bail out.
Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently there is a race in the MMC core between a card-detect
rescan work and the clock-gating work, scheduled from a command
completion. Fix it by removing the dedicated clock-gating mutex
and using the MMC standard locking mechanism instead.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Simon Horman <horms@verge.net.au> Cc: Magnus Damm <damm@opensource.se> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
UBIFS error (pid 34456): could not find an empty LEB
which sometimes happens after power cuts when we mount the file-system - UBIFS
refuses it with the above error message which comes from the
'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test
with the UBIFS power cut emulation enabled.
Analysis of the problem.
Currently UBIFS replay seeks the journal heads to the last _replayed_ bud.
But the buds are replayed out-of-order, so the replay basically seeks journal
heads to the "random" bud belonging to this head, and not to the _last_ one.
The result of this is that the GC head may be seeked to a full LEB with no free
space, or very little free space. And 'ubifs_rcvry_gc_commit()' tries to find a
fully or mostly dirty LEB to match the current GC head (because we need to
garbage-collect that dirty LEB at one go, because we do not have @c->gc_lnum).
So 'ubifs_find_dirty_leb()' fails and we fall back to finding an empty LEB and
also fail. As a result - recovery fails and mounting fails.
This patch teaches the replay to initialize the GC heads exactly to the latest
buds, i.e. the buds which have the largest sequence number in corresponding
log reference nodes.
Currently UBIFS has a small optimization - it frees write-buffers when it is
re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it
does not allocate write-buffers as well.
This optimization is nice but it leads to subtle problems and complications
in recovery, which I can reproduce using the integck test. The symptoms are
that after a power cut the file-system cannot be mounted if we first mount
it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints:
UBIFS error (pid 34456): could not find an empty LEB
Analysis of the problem.
When mounting R/W, the reply process sets journal heads to buds [1], but
when mounting R/O - it does not do this, because the write-buffers are not
allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the
same file-system but for the following 2 cases:
1. mounting R/W after a power cut and recover
2. mounting R/O after a power cut, re-mounting R/W and run deferred recovery
In the former case, we have journal heads seeked to the a bud, in the latter
case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not
try to recover the GC LEB by garbage-collecting to the GC head, but we just
try to find an empty LEB, and there may be no empty LEBs, so we just fail.
On the other hand, in the former case (mount R/W), we are able to make a GC LEB
(@c->gc_lnum) by garbage-collecting.
Thus, let's remove this small nice optimization and always allocate
write-buffers. This should not make too big difference - we have only 3
of them, each of max. write unit size, which is usually 2KiB. So this is
about 6KiB of RAM for the typical case, and only when mounted R/O.
[1]: Note, currently the replay process is setting (seeking) the journal heads
to _some_ buds, not necessarily to the buds which had been the journal heads
before the power cut happened. This will be fixed separately.
The mechanism used to initiate work events from the interrupt
handler has a classic read/modify/write race between the interrupt
handler that sets the condition, and the worker task that reads and
clears the condition. Close these races by using atomic
bit fields.
Cc: Jie Yang <jie.yang@atheros.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patch partially resolve:
https://bugzilla.kernel.org/show_bug.cgi?id=16691
However, there are still 11n performance problems on 4965 and 5xxx
devices that need to be investigated.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a rescuer and stop_machine() bringing down a CPU race with each
other, they may deadlock on non-preemptive kernel. The CPU won't
accept a new task, so the rescuer can't migrate to the target CPU,
while stop_machine() can't proceed because the rescuer is holding one
of the CPU retrying migration. GCWQ_DISASSOCIATED is never cleared
and worker_maybe_bind_and_lock() retries indefinitely.
This problem can be reproduced semi reliably while the system is
entering suspend.
Use a standard list with proper locking to handle the list of
adapters. Thankfully it only matters on systems with more than one
parallel port, which are very rare.
Thanks to Lukasz Kapiec for reporting the problem to me.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turned out that there are different pin configurations for this
PCI SSID, including multi-channel modes. And more proper fix for
allowing line-out mutes will come up in 2.6.40 tree, so we won't need
this fixup any more there.
The PCI SSID is 1025:031c and the codec SSID is 1025:031d,
so the driver mistakes this for a SKU value, but looking at
the numbers, this is obviously wrong.
SCSI uses request_queue->queuedata == NULL as a signal that the queue
is dying. We set this state in the sdev release function. However,
this allows a small window where we release the last reference but
haven't quite got to this stage yet and so something will try to take
a reference in scsi_request_fn and oops. It's very rare, but we had a
report here, so we're pushing this as a bug fix
The actual fix is to set request_queue->queuedata to NULL in
scsi_remove_device() before we drop the reference. This causes
correct automatic rejects from scsi_request_fn as people who hold
additional references try to submit work and prevents anything from
getting a new reference to the sdev that way.
Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit db422318cbca55168cf965f655471dbf8be82433 ([SCSI] scsi_dh:
propagate SCSI device deletion) introduced a regression where the device
reference is not dropped prior to scsi_dh_activate's early return from
the error path.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
At two points in handling device ioctls via /dev/mpt2ctl, user-supplied
length values are used to copy data from userspace into heap buffers
without bounds checking, allowing controllable heap corruption and
subsequently privilege escalation.
Additionally, user-supplied values are used to determine the size of a
copy_to_user() as well as the offset into the buffer to be read, with no
bounds checking, allowing users to read arbitrary kernel memory.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Eric Moore <eric.moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There's a code path in pmcraid that can be reached via device ioctl that
causes all sorts of ugliness, including heap corruption or triggering
the OOM killer due to consecutive allocation of large numbers of pages.
Not especially relevant from a security perspective, since users must
have CAP_SYS_ADMIN to open the character device.
First, the user can call pmcraid_chr_ioctl() with a type
PMCRAID_PASSTHROUGH_IOCTL. A pmcraid_passthrough_ioctl_buffer
is copied in, and the request_size variable is set to
buffer->ioarcb.data_transfer_length, which is an arbitrary 32-bit signed
value provided by the user.
If a negative value is provided here, bad things can happen. For
example, pmcraid_build_passthrough_ioadls() is called with this
request_size, which immediately calls pmcraid_alloc_sglist() with a
negative size. The resulting math on allocating a scatter list can
result in an overflow in the kzalloc() call (if num_elem is 0, the
sglist will be smaller than expected), or if num_elem is unexpectedly
large the subsequent loop will call alloc_pages() repeatedly, a high
number of pages will be allocated and the OOM killer might be invoked.
Prevent this value from being negative in pmcraid_ioctl_passthrough().
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mouse gets "stuck" after restore of PV guest but buttons are in working
condition.
If driver has been configured for ABS coordinates at start it will get
XENKBD_TYPE_POS events and then suddenly after restore it'll start getting
XENKBD_TYPE_MOTION events, that will be dropped later and they won't get
into user-space.
Regression was introduced by hunk 5 and 6 of 5ea5254aa0ad269cfbd2875c973ef25ab5b5e9db
("Input: xen-kbdfront - advertise either absolute or relative
coordinates").
Driver on restore should ask xen for request-abs-pointer again if it is
available. So restore parts that did it before 5ea5254.
Acked-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[v1: Expanded the commit description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
if maximum tx power read from the eeprom is smaller than default.
In consequence card is unable to initialize properly. Fix the problem
and cleanup tx power initialization.
Reported-and-tested-by: Robin Dong <hao.bigrat@gmail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After new NetworkManager 0.8.996 changes, hardware scanning is causing
microcode errors as reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=683571
and sometimes kernel crashes:
https://bugzilla.redhat.com/show_bug.cgi?id=688252
Also with hw scan there are very bad performance on some systems
as reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=671366
Since Intel no longer supports 3945, there is no chance to get proper
firmware fixes, we need workaround problems by disable hardware scanning
by default.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mac80211 can request for tx power and channel change in one ->config
call. If that happens, *_send_tx_power functions will try to setup tx
power for old channel, what can be not correct because we already change
the band. I.e error "Failed to get channel info for channel 140 [0]",
can be printed frequently when operating in software scanning mode.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
page_count is copied from userspace. agp_allocate_memory() tries to
check whether this number is too big, but doesn't take into account the
wrap case. Also agp_create_user_memory() doesn't check whether
alloc_size is calculated from num_agp_pages variable without overflow.
This may lead to allocation of too small buffer with following buffer
overflow.
Another problem in agp code is not addressed in the patch - kernel memory
exhaustion (AGPIOC_RESERVE and AGPIOC_ALLOCATE ioctls). It is not checked
whether requested pid is a pid of the caller (no check in agpioc_reserve_wrap()).
Each allocation is limited to 16KB, though, there is no per-process limit.
This might lead to OOM situation, which is not even solved in case of the
caller death by OOM killer - the memory is allocated for another (faked) process.
pg_start is copied from userspace on AGPIOC_BIND and AGPIOC_UNBIND ioctl
cmds of agp_ioctl() and passed to agpioc_bind_wrap(). As said in the
comment, (pg_start + mem->page_count) may wrap in case of AGPIOC_BIND,
and it is not checked at all in case of AGPIOC_UNBIND. As a result, user
with sufficient privileges (usually "video" group) may generate either
local DoS or privilege escalation.
This adds support for 64 bit atomic operations on 32 bit UML systems. XFS
needs them since 2.6.38.
$ make ARCH=um SUBARCH=i386
...
LD .tmp_vmlinux1
fs/built-in.o: In function `xlog_regrant_reserve_log_space':
xfs_log.c:(.text+0xd8584): undefined reference to `atomic64_read_386'
xfs_log.c:(.text+0xd85ac): undefined reference to `cmpxchg8b_emu'
...
On a remount, the VFS layer will clear the MS_SYNCHRONOUS bit on the
assumption that the flags on the mount syscall will have it set if the
remounted fs is supposed to keep it.
In the case of "noac" though, MS_SYNCHRONOUS is implied. A remount of
such a mount will lose the MS_SYNCHRONOUS flag since "sync" isn't part
of the mount options.
Reported-by: Max Matveev <makc@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Azurit reports large increases in system time after 2.6.36 when running
Apache. It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
to allocate fdmem if possible").
That patch caused the vfs to use kmalloc() for very large allocations and
this is causing excessive work (and presumably excessive reclaim) within
the page allocator.
Fix it by falling back to vmalloc() earlier - when the allocation attempt
would have been considered "costly" by reclaim.
For m68k, N_NORMAL_MEMORY represents all nodes that have present memory
since it does not support HIGHMEM. This patch sets the bit at the time
node_present_pages has been set by free_area_init_node.
At the time the node is brought online, the node state would have to be
done unconditionally since information about present memory has not yet
been recorded.
If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
uses this nodemask to setup per-cache kmem_cache_node data structures.
This pach is an alternative to the one proposed by David Rientjes
<rientjes@google.com> attempting to set node state immediately when
bringing the node online.
The huge_memory.c THP page fault was allowed to run if vm_ops was null
(which would succeed for /dev/zero MAP_PRIVATE, as the f_op->mmap wouldn't
setup a special vma->vm_ops and it would fallback to regular anonymous
memory) but other THP logics weren't fully activated for vmas with vm_file
not NULL (/dev/zero has a not NULL vma->vm_file).
So this removes the vm_file checks so that /dev/zero also can safely use
THP (the other albeit safer approach to fix this bug would have been to
prevent the THP initial page fault to run if vm_file was set).
After removing the vm_file checks, this also makes huge_memory.c stricter
in khugepaged for the DEBUG_VM=y case. It doesn't replace the vm_file
check with a is_pfn_mapping check (but it keeps checking for VM_PFNMAP
under VM_BUG_ON) because for a is_cow_mapping() mapping VM_PFNMAP should
only be allowed to exist before the first page fault, and in turn when
vma->anon_vma is null (so preventing khugepaged registration). So I tend
to think the previous comment saying if vm_file was set, VM_PFNMAP might
have been set and we could still be registered in khugepaged (despite
anon_vma was not NULL to be registered in khugepaged) was too paranoid.
The is_linear_pfn_mapping check is also I think superfluous (as described
by comment) but under DEBUG_VM it is safe to stay.
With transparent hugepage support, handle_mm_fault() has to be careful
that a normal PMD has been established before handling a PTE fault. To
achieve this, it used __pte_alloc() directly instead of pte_alloc_map as
pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map() is
called once it is known the PMD is safe.
pte_alloc_map() is smart enough to check if a PTE is already present
before calling __pte_alloc but this check was lost. As a consequence,
PTEs may be allocated unnecessarily and the page table lock taken. Thi
useless PTE does get cleaned up but it's a performance hit which is
visible in page_test from aim9.
This patch simply re-adds the check normally done by pte_alloc_map to
check if the PTE needs to be allocated before taking the page table lock.
The effect is noticable in page_test from aim9.
(While this affects 2.6.38, it is a performance rather than a functional
bug and normally outside the rules -stable. While the big performance
differences are to a microbench, the difference in fork and exec
performance may be significant enough that -stable wants to consider the
patch)
Reported-by: Raz Ben Yehuda <raziebe@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PTE pages eat up memory just like anything else, but we do not account for
them in any way in the OOM scores. They are also _guaranteed_ to get
freed up when a process is OOM killed, while RSS is not.
This call was disabled as hot-unplugging one virtconsole port led to
another virtconsole port freezing.
Upon testing it again, this now works, so enable it.
In addition, a bug was found in qemu wherein removing a port of one type
caused the guest output from another port to stop working. I doubt it
was just this bug that caused it (since disabling the hvc_remove() call
did allow other ports to continue working), but since it's all solved
now, we're fine with hot-unplugging of virtconsole ports.
Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes the warning about bad names for sys-fs and other kernel-things. The flexcop-pci driver was using '/'-characters in it, which is not good.
This has been fixed in several attempts by several people, but obviously never made it into the kernel.
When a DISCONTIGMEM memory range is brought online as a NUMA node, it
also needs to have its bet set in N_NORMAL_MEMORY. This is necessary for
generic kernel code that utilizes N_NORMAL_MEMORY as a subset of N_ONLINE
for memory savings.
These types of hacks can hopefully be removed once DISCONTIGMEM is either
removed or abstracted away from CONFIG_NUMA.
Fixes a panic in the slub code which only initializes structures for
N_NORMAL_MEMORY to save memory:
Slub makes assumptions about page_to_nid() which are violated by
DISCONTIGMEM and !NUMA. This violation results in a panic because
page_to_nid() can be non-zero for pages in the discontiguous ranges and
this leads to a null return by get_node(). The assertion by the
maintainer is that DISCONTIGMEM should only be allowed when NUMA is also
defined. However, at least six architectures: alpha, ia64, m32r, m68k,
mips, parisc violate this. The panic is a regression against slab, so
just mark slub broken in the problem configuration to prevent users
reporting these panics.
Acked-by: David Rientjes <rientjes@google.com> Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is at least one BIOS with a DSDT containing a power resource
object with a _PR0 entry pointing back to that power resource. In
consequence, while registering that power resource
acpi_bus_get_power_flags() sees that it depends on itself and tries
to register it again, which leads to an infinitely deep recurrence.
This problem was introduced by commit bf325f9538d8c89312be305b9779e
(ACPI / PM: Register power resource devices as soon as they are
needed).
To fix this problem use the observation that power resources cannot
be power manageable and prevent acpi_bus_get_power_flags() from
being called for power resource objects.
References: https://bugzilla.kernel.org/show_bug.cgi?id=31872 Reported-and-tested-by: Pascal Dormeau <pdormeau@free.fr> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Len Brown <lenb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
f6649a7e "[S390] cleanup lowcore access from external interrupts" changed
handling of external interrupts. Instead of letting the external interrupt
handlers accessing the per cpu lowcore the entry code of the kernel reads
already all fields that are necessary and passes them to the handlers.
The pfault interrupt handler was incorrectly converted. It tries to
dereference a value which used to be a pointer to a lowcore field. After
the conversion however it is not anymore the pointer to the field but its
content. So instead of a dereference only a cast is needed to get the
task pointer that caused the pfault.
Fixes a NULL pointer dereference and a subsequent kernel crash:
From: Christian Borntraeger <borntraeger@de.ibm.com>
This patch fixes the sie exit on interrupts. The low level
interrupt handler returns to the PSW address in pt_regs and not
to the PSW address in the lowcore.
Without this fix a cpu bound guest might never leave guest state
since the host interrupt handler would blindly return to the
SIE instruction, even on need_resched and friends.
Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes UBIFS mount failure when the debugging support is enabled,
we are recovering from a power cut, we were first mounter R/O and we are
re-mounting R/W. In this case we should not assume that the amount of free
space before we have re-mounted R/W and after are equivalent, because
when we have mounted R/O the file-system is in a non-committed state so
the amount of free space is slightly smaller, due to the fact that we cannot
predict the amount of free space precisely before we commit.
This patch fixes the issue by skipping the debugging check in case of
recovery. This issue was reported by Caizhiyong <caizhiyong@huawei.com>
here: http://thread.gmane.org/gmane.linux.drivers.mtd/34350/focus=34387
The rx error bit parsing was changed to consider PHY errors and various
decryption errors separately. While correct according to the documentation,
this is causing spurious decryption error reports in some situations.
Fix this by restoring the original order of the checks in those places,
where the errors are meant to be mutually exclusive.
If a CRC error is reported, then MIC failure and decryption errors
are irrelevant, and a PHY error is unlikely.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>