`pc263_detach()` is called by the comedi core to clean up if either
`pc263_attach()` or `pc263_attach_pci()` returns an error. It sets
`thisboard` to the return value of `comedi_board(dev)` and assumes it is
non-null. This is a valid assumption if `pc263_attach()` fails, but not
if `pc263_attach_pci()` fails, leading to a possible NULL pointer
dereference.
Check `thisboard` at the top of `pc263_detach()` and return early if it
is `NULL`. This is okay because no other resources need cleaning up in
this case.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
`dio200_detach()` is called by the comedi core to clean up if either
`dio200_attach()` or `dio200_attach_pci()` return an error. It assigns
`thisboard` to the return value of `comedi_board(dev)` and assumes it is
non-null. In the case of a previous call to `dio200_attach()` it won't
be `NULL` because the comedi core will have pointed it to one of the
elements of `dio200_boards[]`, but in the case of a previous call to
`dio200_attach_pci()` it could be `NULL`, leading to a null pointer
dereference.
Check that `thisboard` is valid at the top of `dio200_detach()` and
return early if it is `NULL`. This is okay because the only other thing
that could have been allocated is `dev->private` and that is freed by
the comedi core, not by this function.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
`labpc_common_detach()` is called by the comedi core to clean up if
either `labpc_attach()` (including the one in the "ni_labpc_cs" module)
or `labpc_attach_pci()` returns an error. It assumes the `thisboard`
macro (expanding to `((struct labpc_board_struct *)dev->board_ptr)`) is
non-null. This is a valid assumption if `labpc_attach()` fails, but not
if `labpc_attach_pci()` fails, leading to a possible NULL pointer
dereference.
Check `thisboard` at the top of `labpc_common_detach()` and return early
if it is `NULL`. This is okay because the only other thing that could
have been allocated is `dev->private` and that is freed by the comedi
core, not by this function.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here is a small patch to fix a problem caused by a previous patch that
removed the callback function. The callback remove patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1de02225358988e8fd48d1dc3fd12336bbae258a
I finally booted my dev machine on the latest kernel (running Debian
here so it's still on 3.2 normally) to test the ni_daq_700 driver with
my test program and noticed this bug.
Shift the DIO_R read result to bits 8..15 Digital direction
configuration: channels 0-7 output, 8-15 input (8225 device emu as port
A output, port B input, port C N/A).
Signed-off-by: Fred Brooks <nsaspook@nsaspook.com> Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Reviewed-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
`das08_detach()` is called by the comedi core to clean up if either
`das08_attach()` or `das08_attach_pci()` returns an error. It sets
`thisboard` to the return value of `comedi_board(dev)` and assumes it is
non-null. This is a valid assumption if `das08_attach()` fails, but not
if `das08_attach_pci()` fails, leading to a possible NULL pointer
dereference.
Check `thisboard` at the top of `das08_detach()` and return early if it
is `NULL`. This is okay because the only other thing that could have
been allocated is `dev->private` and that is freed by the comedi core,
not by this function.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change 130f315a (staging: zram: remove special handle of uncompressed page)
introduced a bug in the handling of incompressible pages which resulted in
memory allocation failure for such pages.
When a page expands on compression, say from 4K to 4K+30, we were trying to
do zsmalloc(pool, 4K+30). However, the maximum size which zsmalloc can
allocate is PAGE_SIZE (for obvious reasons), so such allocation requests
always return failure (0).
For a page that has compressed size larger than the original size (this may
happen with already compressed or random data), there is no point storing
the compressed version as that would take more space and would also require
time for decompression when needed again. So, the fix is to store any page,
whose compressed size exceeds a threshold (max_zpage_size), as-it-is i.e.
without compression. Memory required for storing this uncompressed page can
then be requested from zsmalloc which supports PAGE_SIZE sized allocations.
Lastly, the fix checks that we do not attempt to "decompress" the page which
we stored in the uncompressed form -- we just memcpy() out such pages.
The default kernel mapping for the pages allocated for the binder
buffers is never used. Set the __GFP_HIGHMEM flag when allocating
these pages so we don't needlessly use low memory pages that may
be required elsewhere.
The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.
Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.
This patch should be backported to kernels as old as 3.5, that contain
the commit e3567d2c15a7a8e2f992a5f7c7683453ca406d82 "xhci: Add Intel
U1/U2 timeout policy."
Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xhci_service_interval_to_ns() returns long long
to avoid an overflow. However, the type cast happens
too late. The fix is to force ULL from the beginning.
This patch should be backported to kernels as old as 3.5, that contain
the commit e3567d2c15a7a8e2f992a5f7c7683453ca406d82 "xhci: Add Intel
U1/U2 timeout policy."
Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The command cancellation code doesn't check whether find_trb_seg()
couldn't find the segment that contains the TRB to be canceled. This
could cause a NULL pointer deference later in the function when next_trb
is called. It's unlikely to happen unless something is wrong with the
command ring pointers, so add some debugging in case it happens.
This patch should be backported to stable kernels as old as 3.0, that
contain the commit b63f4053cc8aa22a98e3f9a97845afe6c15d0a0d "xHCI:
handle command after aborting the command ring".
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks
on the xprt->pending rpc_wait_queue. In particular it will cause an
early wake up of the task that called xprt_connect().
All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed()
into a helper that can be called by xs_abort_connection()
Reported-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit e9406db20fecbfcab646bad157b4cfdc7cadddfb (lockd: per-net
NSM client creation and destruction helpers introduced) contains
a nasty race on initialisation of the per-net NSM client because
it doesn't check whether or not the client is set after grabbing
the nsm_create_mutex.
This reverts commit 55420c24a0d4d1fce70ca713f84aa00b6b74a70e.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT,
the deadlock described in this commit is no longer possible.
Instead, the resulting call to xs_tcp_shutdown() can interfere
with pending reconnection attempts.
Reported-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chris Perl reports that we're seeing races between the wakeup call in
xs_error_report and the connect attempts. Basically, Chris has shown
that in certain circumstances, the call to xs_error_report causes the
rpc_task that is responsible for reconnecting to wake up early, thus
triggering a disconnect and retry.
Since the sk->sk_error_report() calls in the socket layer are always
followed by a tcp_done() in the cases where we care about waking up
the rpc_tasks, just let the state_change callbacks take responsibility
for those wake ups.
Reported-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On s390 any write to a page (even from kernel itself) sets architecture
specific page dirty bit. Thus when a page is written to via buffered
write, HW dirty bit gets set and when we later map and unmap the page,
page_remove_rmap() finds the dirty bit and calls set_page_dirty().
Dirtying of a page which shouldn't be dirty can cause all sorts of
problems to filesystems. The bug we observed in practice is that
buffers from the page get freed, so when the page gets later marked as
dirty and writeback writes it, XFS crashes due to an assertion
BUG_ON(!PagePrivate(page)) in page_buffers() called from
xfs_count_page_state().
Similar problem can also happen when zero_user_segment() call from
xfs_vm_writepage() (or block_write_full_page() for that matter) set the
hardware dirty bit during writeback, later buffers get freed, and then
page unmapped.
Fix the issue by ignoring s390 HW dirty bit for page cache pages of
mappings with mapping_cap_account_dirty(). This is safe because for
such mappings when a page gets marked as writeable in PTE it is also
marked dirty in do_wp_page() or do_page_fault(). When the dirty bit is
cleared by clear_page_dirty_for_io(), the page gets writeprotected in
page_mkclean(). So pagecache page is writeable if and only if it is
dirty.
Thanks to Hugh Dickins for pointing out mapping has to have
mapping_cap_account_dirty() for things to work and proposing a cleaned
up variant of the patch.
The patch has survived about two hours of running fsx-linux on tmpfs
while heavily swapping and several days of running on out build machines
where the original problem was triggered.
Signed-off-by: Jan Kara <jack@suse.cz> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
722bc6b16771 x86/mm: Fix the size calculation of mapping tables
Tried to address the issue that the first 2/4M should use 4k pages
if PSE enabled, but extra counts should only be valid for x86_32.
This commit caused a kdump regression: the kdump kernel hangs.
Work is in progress to fundamentally fix the various page table
initialization issues that we have, via the design suggested
by H. Peter Anvin, but it's not ready yet to be merged.
So, to get a working kdump revert to the last known working version,
which is the revert of this commit and of a followup fix (which was
incomplete):
bd2753b2dda7 x86/mm: Only add extra pages count for the first memory range during pre-allocation
Tested kdump on physical and virtual machines.
Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Flavio Leitner <fbl@redhat.com> Tested-by: Flavio Leitner <fbl@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Flavio Leitner <fbl@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: ianfang.cn@gmail.com Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 20167d3421a089a1bf1bd680b150dc69c9506810 ("x86-64: Fix
accounting in kernel_physical_mapping_init()") went a little too
far by entirely removing the counting of pre-populated page
tables: this should be done at boot time (to cover the page
tables set up in early boot code), but shouldn't be done during
memory hot add.
Hence, re-add the removed increments of "pages", but make them
and the one in phys_pte_init() conditional upon !after_bootmem.
When booting a secondary CPU, the primary CPU hands two sets of page
tables via the secondary_data struct:
(1) swapper_pg_dir: a normal, cacheable, shared (if SMP) mapping
of the kernel image (i.e. the tables used by init_mm).
(2) idmap_pgd: an uncached mapping of the .idmap.text ELF
section.
The idmap is generally used when enabling and disabling the MMU, which
includes early CPU boot. In this case, the secondary CPU switches to
swapper as soon as it enters C code:
struct mm_struct *mm = &init_mm;
unsigned int cpu = smp_processor_id();
/*
* All kernel threads share the same mm context; grab a
* reference and switch to it.
*/
atomic_inc(&mm->mm_count);
current->active_mm = mm;
cpumask_set_cpu(cpu, mm_cpumask(mm));
cpu_switch_mm(mm->pgd, mm);
This causes a problem on ARMv7, where the identity mapping is treated as
strongly-ordered leading to architecturally UNPREDICTABLE behaviour of
exclusive accesses, such as those used by atomic_inc.
This patch re-orders the secondary_start_kernel function so that we
switch to swapper before performing any exclusive accesses.
Reported-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> Cc: David McKay <david.mckay@st.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The genalloc code uses the bitmap API from include/linux/bitmap.h and
lib/bitmap.c, which is based on long values. Both bitmap_set from
lib/bitmap.c and bitmap_set_ll, which is the lockless version from
genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
the bitmap.
That one uses (1 << bits) - 1, 0b111, if you are setting the first three
bits. This means that the API counts from the least significant bits
(LSB from now on) to the MSB. The LSB in the first long is bit 0, then.
The same works for the lookup functions.
The genalloc code uses longs for the bitmap, as it should. In
include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
bits[0] as its last member. When allocating the struct, genalloc should
reserve enough space for the bitmap. This should be a proper number of
longs that can fit the amount of bits in the bitmap.
However, genalloc allocates an integer number of bytes that fit the
amount of bits, but may not be an integer amount of longs. 9 bytes, for
example, could be allocated for 70 bits.
This is a problem in itself if the Least Significat Bit in a long is in
the byte with the largest address, which happens in Big Endian machines.
This means genalloc is not allocating the byte in which it will try to
set or check for a bit.
This may end up in memory corruption, where genalloc will try to set the
bits it has not allocated. In fact, genalloc may not set these bits
because it may find them already set, because they were not zeroed since
they were not allocated. And that's what causes a BUG when
gen_pool_destroy is called and check for any set bits.
What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
on gen_pool_add_virt. With SLAB and SLUB, this means the whole slab
will be cleared, not only the requested bytes. Since struct
gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
multiples of 8, we get lucky and allocate and clear the right amount of
bytes.
Hower, this is not the case with SLOB or with older code that did memset
after allocating instead of using __GFP_ZERO.
So, a simple module as this (running 3.6.0), will cause a crash when
rmmod'ed.
The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check
while converting ioctl arguments. This could lead to leaking kernel
stack contents into userspace.
Patch extracted from existing fix in grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: David Miller <davem@davemloft.net> Cc: Brad Spengler <spender@grsecurity.net> Cc: PaX Team <pageexec@freemail.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix possible overflow of the buffer used for expanding environment
variables when building file list.
In the extremely unlikely case of an attacker having control over the
environment variables visible to gen_init_cpio, control over the
contents of the file gen_init_cpio parses, and gen_init_cpio was built
without compiler hardening, the attacker can gain arbitrary execution
control via a stack buffer overflow.
In commit 800179c9b8a1 ("This adds symlink and hardlink restrictions to
the Linux VFS"), the new link protections were enabled by default, in
the hope that no actual application would care, despite it being
technically against legacy UNIX (and documented POSIX) behavior.
However, it does turn out to break some applications. It's rare, and
it's unfortunate, but it's unacceptable to break existing systems, so
we'll have to default to legacy behavior.
In particular, it has broken the way AFD distributes files, see
http://www.dwd.de/AFD/
along with some legacy scripts.
Distributions can end up setting this at initrd time or in system
scripts: if you have security problems due to link attacks during your
early boot sequence, you have bigger problems than some kernel sysctl
setting. Do:
Alternatively, we may at some point introduce a kernel config option
that sets these kinds of "more secure but not traditional" behavioural
options automatically.
Reported-by: Nick Bowler <nbowler@elliptictech.com> Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I have a Lenovo ThinkPad T430 and an UltraBase Series 3 docking
station.
Without this patch, if I plug my headphones into the jack on the
computer, everything works fine. The computer speakers mute and the
audio is played in the headphones. However, if I plug into the docking
station headphone jack the computer speakers are muted but there is no
audio in the headphones.
The call to drv_get/set_tsf() was put on the workqueue to perform tsf
adjustments since that function might sleep. However it ended up inside
a spinlock, whose critical section must be atomic. Do tsf adjustment
outside the spinlock instead, and get rid of a warning.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There was a serious disconnect in the logic happening in
sparc_pmu_disable_event() vs. sparc_pmu_enable_event().
Event disable is implemented by programming a NOP event into the PCR.
However, event enable was not reversing this operation. Instead, it
was setting the User/Priv/Hypervisor trace enable bits.
That's not sparc_pmu_enable_event()'s job, that's what
sparc_pmu_enable() and sparc_pmu_disable() do .
The intent of sparc_pmu_enable_event() is clear, since it first clear
out the event type encoding field. So fix this by OR'ing in the event
encoding rather than the trace enable bits.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
we want syscall_trace_leave() called on exit from any syscall;
skipping its call in case we'd done force_successful_syscall_return()
is broken...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 1d5783030a1 (ipv6/addrconf: speedup /proc/net/if_inet6 filling)
added bugs hiding some devices from if_inet6 and breaking applications.
"ip -6 addr" could still display all IPv6 addresses, while "ifconfig -a"
couldnt.
One way to reproduce the bug is by starting in a shell :
unshare -n /bin/bash
ifconfig lo up
And in original net namespace, lo device disappeared from if_inet6
Reported-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com> Tested-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mihai Maruseac <mihai.maruseac@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
sock case").. tcp resets are always lost, when routing is asymmetric.
Yes, backing out that patch will result in misrouting of resets for
dead connections which used interface binding when were alive, but we
actually cannot do anything here. What's died that's died and correct
handling normal unbound connections is obviously a priority.
Comment to comment:
> This has few benefits:
> 1. tcp_v6_send_reset already did that.
It was done to route resets for IPv6 link local addresses. It was a
mistake to do so for global addresses. The patch fixes this as well.
Actually, the problem appears to be even more serious than guaranteed
loss of resets. As reported by Sergey Soloviev <sol@eqv.ru>, those
misrouted resets create a lot of arp traffic and huge amount of
unresolved arp entires putting down to knees NAT firewalls which use
asymmetric routing.
This is the revised patch for fixing rds-ping spinlock recursion
according to Venkat's suggestions.
RDS ping/pong over TCP feature has been broken for years(2.6.39 to
3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
ping/pong which both need to lock "struct sock *sk". However, this
lock has already been hold before rds_tcp_data_ready() callback is
triggerred. As a result, we always facing spinlock resursion which
would resulting in system panic.
Given that RDS ping is only used to test the connectivity and not for
serious performance measurements, we can queue the pong transmit to
rds_wq as a delayed response.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> CC: David S. Miller <davem@davemloft.net> CC: James Morris <james.l.morris@oracle.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the change "Make neigh lookups directly in output packet path"
(commit a263b30936) IPVS can not reach the real server for DR mode
because we resolve the destination address from IP header, not from
route neighbour. Use the new FLOWI_FLAG_KNOWN_NH flag to request
output routes with known nexthop, so that it has preference
on resolving.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add flag to request that output route should be
returned with known rt_gateway, in case we want to use
it as nexthop for neighbour resolving.
The returned route can be cached as follows:
- in NH exception: because the cached routes are not shared
with other destinations
- in FIB NH: when using gateway because all destinations for
NH share same gateway
As last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. By this way we force the neighbour
resolving to work with the routed destination but we
can use different address in the packet, feature needed
for IPVS-DR where original packet for virtual IP is routed
via route to real IP.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avoid checking nh_pcpu_rth_output in fast path,
abort fib_info creation on alloc_percpu failure.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) rt_gateway can be 0 but ip_forward() compares
it directly with nexthop. What we want here is to check if traffic
is to directly connected nexthop and to fail if using gateway.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After "Cache input routes in fib_info nexthops" (commit d2d68ba9fe) and "Elide fib_validate_source() completely when possible"
(commit 7a9bc9b81a) we can not send ICMP redirects. It seems we
should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
the same fib_info can be used for traffic that is not redirected,
eg. from other input devices or from sources that are not in same subnet.
As result, we have to disable the caching of RTCF_DOREDIRECT
flag and to force source validation for the case when forwarding
traffic to the input device. If traffic comes from directly connected
source we allow redirection as it was done before both changes.
Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
is disabled, this can avoid source address validation and to
help caching the routes.
After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) we should make sure our ICMP_REDIR_HOST messages
contain daddr instead of 0.0.0.0 when target is directly connected.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6a32e4f9dd9219261f8856f817e6655114cfec2f made the vlan code skip marking
vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if
there was an rx_handler, as the rx_handler could cause the frame to be received
on a different (virtual) vlan-capable interface where that vlan might be
configured.
As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause
frames for unknown vlans to be delivered to the protocol stack as if they had
been received untagged.
For example, if an ipv6 router advertisement that's tagged for a locally not
configured vlan is received on an interface with macvlan interfaces attached,
macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the
macvlan interfaces, which caused it to be passed to the protocol stack, leading
to ipv6 addresses for the announced prefix being configured even though those
are completely unusable on the underlying interface.
The fix moves marking as PACKET_OTHERHOST after the rx_handler so the
rx_handler, if there is one, sees the frame unchanged, but afterwards,
before the frame is delivered to the protocol stack, it gets marked whether
there is an rx_handler or not.
Signed-off-by: Florian Zumbiehl <florz@florz.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marvell 88E8001 on an ASUS P5NSLI motherboard is unable to send/receive
packets on a system with >4gb ram unless a 32bit DMA mask is used.
This issue has been around for years and a fix was sent 3.5 years ago, but
there was some debate as to whether it should instead be fixed as a PCI quirk.
http://www.spinics.net/lists/netdev/msg88670.html
However, 18 months later a similar workaround was introduced for another
chipset exhibiting the same problem.
http://www.spinics.net/lists/netdev/msg142287.html
Signed-off-by: Graham Gower <graham.gower@gmail.com> Signed-off-by: Jan Ceuleers <jan.ceuleers@computer.org> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We report cached pmtu values even if they are already expired.
Change this to not report these values after they are expired
and fix a race in the expire time calculation, as suggested by
Eric Dumazet.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a local tool like tracepath tries to send packets bigger than
the device mtu, we create a nh exeption and set the pmtu to device
mtu. The device mtu does not expire, so check if the device mtu is
smaller than the reported pmtu and don't crerate a nh exeption in
that case.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some protocols, like IPsec still cache routes. So we need to invalidate
the old route on pmtu events to avoid the reuse of stale routes.
We also need to update the mtu and expire time of the route if we already
use a nh exception route, otherwise we ignore newly learned pmtu values
after the first expiration.
With this patch we always invalidate or update the route on pmtu events.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IPv4 side of the problem was addressed in commit a9e050f4e7f9d
(net: tcp: GRO should be ECN friendly)
This patch does the same, but for IPv6 : A Traffic Class mismatch
doesnt mean flows are different, but instead should force a flush
of previous packets.
This patch removes artificial packet reordering problem.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The retry loop in neigh_resolve_output() and neigh_connected_output()
call dev_hard_header() with out reseting the skb to network_header.
This causes the retry to fail with skb_under_panic. The fix is to
reset the network_header within the retry loop.
Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com> Reviewed-by: Shawn Lu <shawn.lu@ericsson.com> Reviewed-by: Robert Coulson <robert.coulson@ericsson.com> Reviewed-by: Billie Alsup <billie.alsup@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Over time, skb recycling infrastructure got litle interest and
many bugs. Generic rx path skb allocation is now using page
fragments for efficient GRO / TCP coalescing, and recyling
a tx skb for rx path is not worth the pain.
Last identified bug is that fat skbs can be recycled
and it can endup using high order pages after few iterations.
With help from Maxime Bizon, who pointed out that commit 87151b8689d (net: allow pskb_expand_head() to get maximum tailroom)
introduced this regression for recycled skbs.
Instead of fixing this bug, lets remove skb recycling.
Drivers wanting really hot skbs should use build_skb() anyway,
to allocate/populate sk_buff right before netif_receive_skb()
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I get a panic when I use ss -a and rmmod inet_diag at the
same time.
It's because netlink_dump uses inet_diag_dump which belongs to module
inet_diag.
I search the codes and find many modules have the same problem. We
need to add a reference to the module which the cb->dump belongs to.
Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo.
Change From v3:
change netlink_dump_start to inline,suggestion from Pablo and
Eric.
Change From v2:
delete netlink_dump_done,and call module_put in netlink_dump
and netlink_sock_destruct.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This prevents the case of unbinding the object in order to process the
relocations through the GTT and then rebinding it only to then proceed
to use cpu relocations as the object is now in the CPU write domain. By
choosing to use cpu relocations up front, we can therefore avoid the
rebind penalty.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We weren't checking whether the resource was in use before calling
res_free(), so applications which called STREAMOFF on a v4l2 device that
wasn't already streaming would cause a BUG() to be hit (MythTV).
Reported-by: Larry Finger <larry.finger@lwfinger.net> Reported-by: Jay Harbeston <jharbestonus@gmail.com> Signed-off-by: Devin Heitmueller <dheitmueller@kernellabs.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
If none of the elements in scrubrates[] matches, this loop will cause
__amd64_set_scrub_rate() to incorrectly use the n+1th element.
As the function is designed to use the final scrubrates[] element in the
case of no match, we can fix this bug by simply terminating the array
search at the n-1th element.
Boris: this code is fragile anyway, see here why:
http://marc.info/?l=linux-kernel&m=135102834131236&w=2
The commit incorrectly assumed that fork path always performed
threadgroup_change_begin/end() and depended on that for
synchronization against task exit and cgroup migration paths instead
of explicitly grabbing task_lock().
threadgroup_change is not locked when forking a new process (as
opposed to a new thread in the same process) and even if it were it
wouldn't be effective as different processes use different threadgroup
locks.
The commit incorrectly assumed that fork path always performed
threadgroup_change_begin/end() and depended on that for
synchronization against task exit and cgroup migration paths instead
of explicitly grabbing task_lock().
threadgroup_change is not locked when forking a new process (as
opposed to a new thread in the same process) and even if it were it
wouldn't be effective as different processes use different threadgroup
locks.
notify_on_release must be triggered when the last process in a cgroup is
move to another. But if the first(and only) process in a cgroup is moved to
another, notify_on_release is not triggered.
# mkdir /cgroup/cpu/SRC
# mkdir /cgroup/cpu/DST
#
# echo 1 >/cgroup/cpu/SRC/notify_on_release
# echo 1 >/cgroup/cpu/DST/notify_on_release
#
# sleep 300 &
[1] 8629
#
# echo 8629 >/cgroup/cpu/SRC/tasks
# echo 8629 >/cgroup/cpu/DST/tasks
-> notify_on_release for /SRC must be triggered at this point,
but it isn't.
This is because put_css_set() is called before setting CGRP_RELEASABLE
in cgroup_task_migrate(), and is a regression introduce by the
commit:74a1166d(cgroups: make procs file writable), which was merged
into v3.0.
Acked-by: Li Zefan <lizefan@huawei.com> Cc: Ben Blum <bblum@andrew.cmu.edu> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1615) fixes a bug in the Garmin USB serial driver. It
uses attach, disconnect, and release routines to carry out actions
that should be handled by port_probe and port_remove routines, because
they access port-specific data.
The bug causes an oops when the device in unplugged, because the
private data for each port structure now gets erased when the port is
unbound from the driver, resulting in a null-pointer dereference.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported--by: Markus Schauler <mschauler@gmail.com> Tested-by: Markus Schauler <mschauler@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Set SEL control transfer tells a device the exit latencies
associated with a device-initated U1 or U2 exit. Since a parent hub may
initiate a transition to U1 soon after a downstream port's U1 timeout is
set, we need to make sure the device receives the Set SEL transfer
before the parent hub timeout is set.
This patch should be backported to kernels as old as 3.5, that contain
the commit 1ea7e0e8e3d0f50901d335ea4178ab2aa8c88201 "USB: Add support to
enable/disable USB3 link states."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before a driver is probed, we want to disable USB 3.0 Link Power
Management (LPM), in case the driver needs hub-initiated LPM disabled.
After the probe finishes, we want to attempt to re-enable LPM, order to
balance the LPM ref count.
When a probe fails (such as when libusual doesn't want to bind to a USB
3.0 mass storage device), make sure to balance the LPM ref counts by
re-enabling LPM.
This patch should be backported to kernels as old as 3.5, that contain
the commit 8306095fd2c1100e8244c09bf560f97aca5a311d "USB: Disable USB
3.0 LPM in critical sections."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some USB 3.0 devices signal that they don't implement Link PM by having
all zeroes in the U1/U2 exit latencies in their SuperSpeed BOS
descriptor. Don found that a Western Digital device he has experiences
transfer errors when LPM is enabled. The lsusb shows the U1/U2 exit
latencies are set to zero:
Binary Object Store Descriptor:
bLength 5
bDescriptorType 15
wTotalLength 22
bNumDeviceCaps 2
SuperSpeed USB Device Capability:
bLength 10
bDescriptorType 16
bDevCapabilityType 3
bmAttributes 0x00
Latency Tolerance Messages (LTM) Supported
wSpeedsSupported 0x000e
Device can operate at Full Speed (12Mbps)
Device can operate at High Speed (480Mbps)
Device can operate at SuperSpeed (5Gbps)
bFunctionalitySupport 1
Lowest fully-functional device speed is Full Speed (12Mbps)
bU1DevExitLat 0 micro seconds
bU2DevExitLat 0 micro seconds
The fix is to not enable LPM for a particular link state if we find its
corresponding exit latency is zero.
This patch should be backported to kernels as old as 3.5, that contain
the commit 1ea7e0e8e3d0f50901d335ea4178ab2aa8c88201 "USB: Add support to
enable/disable USB3 link states."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Reported-by: Don Zickus <dzickus@redhat.com> Tested-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a USB transfer has already been started, meaning
we have already issued StartTransfer command to that
particular endpoint, DWC3_EP_BUSY flag has also
already been set.
When we try to cancel this transfer which is already
in controller's cache, we will not receive XferComplete
event and we must clear DWC3_EP_BUSY in order to allow
subsequent requests to be properly started.
The best place to clear that flag is right after issuing
DWC3_DEPCMD_ENDTRANSFER.
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Note that the write waitqueue was initialised but never used.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by moving port data allocation and
deallocation to port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Note that the write waitqueue was initialised but never used.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by moving port data allocation and
deallocation to port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port data memory leak by replacing port private data with serial
private data.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at
release.
The private data is used to store the control interface number, but as
this is the same for all ports on an interface it should be stored as
usb-serial data anyway.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by allocating and freeing port data in
port_probe/remove rather than in attach/release, and by introducing
serial private data to store the device type which is interface rather
than port specific.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Note that the write waitqueue was initialised but never used.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On AM3517, tx and rx interrupt are detected together with
the disconnect event. This generates a kernel panic in musb_interrupt,
because rx / tx are handled after disconnect.
This issue was seen on a Technexion's TAM3517 SOM. Unplugging a device,
tx / rx interrupts together with disconnect are detected. This brings
to kernel panic like this:
[ 68.526153] Unable to handle kernel NULL pointer dereference at virtual address 00000011
[ 68.534698] pgd = c0004000
[ 68.537536] [00000011] *pgd=00000000
[ 68.541351] Internal error: Oops: 17 [#1] ARM
[ 68.545928] Modules linked in:
[ 68.549163] CPU: 0 Not tainted (3.6.0-rc5-00020-g9e05905 #178)
[ 68.555694] PC is at rxstate+0x8/0xdc
[ 68.559539] LR is at musb_interrupt+0x98/0x858
[ 68.564239] pc : [<c035cd88>] lr : [<c035af1c>] psr: 40000193
[ 68.564239] sp : ce83fb40 ip : d0906410 fp : 00000000
[ 68.576293] r10: 00000000 r9 : cf3b0e40 r8 : 00000002
[ 68.581817] r7 : 00000019 r6 : 00000001 r5 : 00000001 r4 : 000000d4
[ 68.588684] r3 : 00000000 r2 : 00000000 r1 : ffffffcc r0 : cf23c108
[ 68.595550] Flags: nZcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment ke
Note: this behavior is not seen with a USB hub, while it is
easy to reproduce connecting a USB-pen directly to the USB-A of
the board.
Drop tx / rx interrupts if disconnect is detected.
Signed-off-by: Stefano Babic <sbabic@denx.de> CC: Felipe Balbi <balbi@ti.com> Tested-by: Dmitry Lifshitz <lifshitz@compulab.co.il> Tested-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by replacing attach and release with
port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Note that the write waitqueue was initialised but never used.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix port-data memory leak by moving port data allocation and
deallocation to port_probe and port_remove.
Since commit 0998d0631001288 (device-core: Ensure drvdata = NULL when no
driver is bound) the port private data is no longer freed at release as
it is no longer accessible.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
(-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
/and/ the process has a pending signal then %eip (and %eax) are
corrupted when returning to the main process after handling the
signal. The application may then crash with SIGSEGV or a SIGILL or it
may have subtly incorrect behaviour (depending on what instruction it
returned to).
The occurs because handle_signal() is incorrectly thinking that there
is a system call that needs to restarted so it adjusts %eip and %eax
to re-execute the system call instruction (even though user space had
not done a system call).
If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK
(-516) then handle_signal() only corrupted %eax (by setting it to
-EINTR). This may cause the application to crash or have incorrect
behaviour.
handle_signal() assumes that regs->orig_ax >= 0 means a system call so
any kernel entry point that is not for a system call must push a
negative value for orig_ax. For example, for physical interrupts on
bare metal the inverse of the vector is pushed and page_fault() sets
regs->orig_ax to -1, overwriting the hardware provided error code.
xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
instead of -1.
Classic Xen kernels pushed %eax which works as %eax cannot be both
non-negative and -RESTARTSYS (etc.), but using -1 is consistent with
other non-system call entry points and avoids some of the tests in
handle_signal().
There were similar bugs in xen_failsafe_callback() of both 32 and
64-bit guests. If the fault was corrected and the normal return path
was used then 0 was incorrectly pushed as the value for orig_ax.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Jan Beulich <JBeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>