Delete the 10 msec delay between the INIT and SIPI when starting
slave cpus. I can find no requirement for this delay. BIOS also
has similar code sequences without the delay.
Removing the delay reduces boot time by 40 sec. Every bit helps.
When we enter a 32-bit system call via SYSENTER or SYSCALL, we shuffle
the arguments to match the int $0x80 calling convention. This was
probably a design mistake, but it's what it is now. This causes
errors if the system call as to be restarted.
For SYSENTER, we have to invoke the instruction from the vdso as the
return address is hardcoded. Accordingly, we can simply replace the
jump in the vdso with an int $0x80 instruction and use the slower
entry point for a post-restart.
It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.
While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?
In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.
But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PG: 2.6.34 variable is page, not page_head, since it doesn't have a5b338f2] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
commit 7485d0d3758e8e6491a5c9468114e74dc050785d (futexes: Remove rw
parameter from get_futex_key()) in 2.6.33 fixed two problems: First, It
prevented a loop when encountering a ZERO_PAGE. Second, it fixed RW
MAP_PRIVATE futex operations by forcing the COW to occur by
unconditionally performing a write access get_user_pages_fast() to get
the page. The commit also introduced a user-mode regression in that it
broke futex operations on read-only memory maps. For example, this
breaks workloads that have one or more reader processes doing a
FUTEX_WAIT on a futex within a read only shared file mapping, and a
writer processes that has a writable mapping issuing the FUTEX_WAKE.
This fixes the regression for valid futex operations on RO mappings by
trying a RO get_user_pages_fast() when the RW get_user_pages_fast()
fails. This change makes it necessary to also check for invalid use
cases, such as anonymous RO mappings (which can never change) and the
ZERO_PAGE which the commit referenced above was written to address.
This patch does restore the original behavior with RO MAP_PRIVATE
mappings, which have inherent user-mode usage problems and don't really
make sense. With this patch performing a FUTEX_WAIT within a RO
MAP_PRIVATE mapping will be successfully woken provided another process
updates the region of the underlying mapped file. However, the mmap()
man page states that for a MAP_PRIVATE mapping:
It is unspecified whether changes made to the file after
the mmap() call are visible in the mapped region.
So user-mode users attempting to use futex operations on RO MAP_PRIVATE
mappings are depending on unspecified behavior. Additionally a
RO MAP_PRIVATE mapping could fail to wake up in the following case.
Thread-A: call futex(FUTEX_WAIT, memory-region-A).
get_futex_key() return inode based key.
sleep on the key
Thread-B: call mprotect(PROT_READ|PROT_WRITE, memory-region-A)
Thread-B: write memory-region-A.
COW happen. This process's memory-region-A become related
to new COWed private (ie PageAnon=1) page.
Thread-B: call futex(FUETX_WAKE, memory-region-A).
get_futex_key() return mm based key.
IOW, we fail to wake up Thread-A.
Once again doing something like this is just silly and users who do
something like this get what they deserve.
While RO MAP_PRIVATE mappings are nonsensical, checking for a private
mapping requires walking the vmas and was deemed too costly to avoid a
userspace hang.
This Patch is based on Peter Zijlstra's initial patch with modifications to
only allow RO mappings for futex operations that need VERIFY_READ access.
Reported-by: David Oliver <david@rgmadvisors.com> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: peterz@infradead.org Cc: eric.dumazet@gmail.com Cc: zvonler@rgmadvisors.com Cc: hughd@google.com Link: http://lkml.kernel.org/r/1309450892-30676-1-git-send-email-sbohrer@rgmadvisors.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[PG: in 34, the variable is "page"; in original 9ea71503a it is page_head] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
BugLink: https://bugs.launchpad.net/bugs/826081
The original reporter needs 'Headphone Jack Sense' enabled to have
audible audio, so add his PCI SSID to the whitelist.
Reported-and-tested-by: Muhammad Khurram Khan Signed-off-by: Daniel T Chen <crimsun@ubuntu.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The snd_usb_caiaq driver currently assumes that output urbs are serviced
in time and doesn't track when and whether they are given back by the
USB core. That usually works fine, but due to temporary limitations of
the XHCI stack, we faced that urbs were submitted more than once with
this approach.
As it's no good practice to fire and forget urbs anyway, this patch
introduces a proper bit mask to track which requests have been submitted
and given back.
That alone however doesn't make the driver work in case the host
controller is broken and doesn't give back urbs at all, and the output
stream will stop once all pre-allocated output urbs are consumed. But
it does prevent crashes of the controller stack in such cases.
See http://bugzilla.kernel.org/show_bug.cgi?id=40702 for more details.
Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-and-tested-by: Matej Laitl <matej@laitl.cz> Cc: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
In addition to /etc/perfconfig and $HOME/.perfconfig, perf looks for
configuration in the file ./config, imitating git which looks at
$GIT_DIR/config. If ./config is not a perf configuration file, it
fails, or worse, treats it as a configuration file and changes behavior
in some unexpected way.
"config" is not an unusual name for a file to be lying around and perf
does not have a private directory dedicated for its own use, so let's
just stop looking for configuration in the cwd. Callers needing
context-sensitive configuration can use the PERF_CONFIG environment
variable.
Requested-by: Christian Ohm <chr.ohm@gmx.net> Cc: 632923@bugs.debian.org Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Christian Ohm <chr.ohm@gmx.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110805165838.GA7237@elie.gateway.2wire.net Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Commit db64fe02258f ("mm: rewrite vmap layer") introduced code that does
address calculations under the assumption that VMAP_BLOCK_SIZE is a
power of two. However, this might not be true if CONFIG_NR_CPUS is not
set to a power of two.
Wrong vmap_block index/offset values could lead to memory corruption.
However, this has never been observed in practice (or never been
diagnosed correctly); what caught this was the BUG_ON in vb_alloc() that
checks for inconsistent vmap_block indices.
To fix this, ensure that VMAP_BLOCK_SIZE always is a power of two.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=31572 Reported-by: Pavel Kysilka <goldenfish@linuxsoft.cz> Reported-by: Matias A. Fonzo <selk@dragora.org> Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Nick Piggin <npiggin@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This fixes faulty outbount packets in case the inbound packets
received from the hardware are fragmented and contain bogus input
iso frames. The bug has been there for ages, but for some strange
reasons, it was only triggered by newer machines in 64bit mode.
Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-and-tested-by: William Light <wrl@illest.net> Reported-by: Pedro Ribeiro <pedrib@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
rs_resp is dynamically allocated in aem_read_sensor(), so it should be freed
before exiting in every case. This collects the kfree and the return at
the end of the function.
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Reported-by: Pascal Hambourg <pascal@plouf.fr.eu.org> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
On a box with 8TB of RAM the MMU hashtable is 64GB in size. That
means we have 4G PTEs. pSeries_lpar_hptab_clear was using a signed
int to store the index which will overflow at 2G.
Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
ie "IBM,Logh". OF got corrupted with a device tree string.
Looking at make_room and alloc_up, we claim the first chunk (1 MB)
but we never claim any more. mem_end is always set to alloc_top
which is the top of our available address space, guaranteeing we will
never call alloc_up and claim more memory.
Also alloc_up wasn't setting alloc_bottom to the bottom of the
available address space.
This doesn't help the box to boot, but we at least fail with
an obvious error. We could relocate the device tree in a future
patch.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.
MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)
Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation. So the periodic
regeneration and 8-bit counter have been removed. We compute and
use a full 32-bit sequence number.
For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.
Reported-by: Dan Kaminsky <dan@doxpara.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
[PG: diffstat vs. 6e5714 differs, since no secure_ipv6_id to delete in 34] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
commit 995bd3bb5 (x86: Hpet: Avoid the comparator readback penalty)
chose 8 HPET cycles as a safe value for the ETIME check, as we had the
confirmation that the posted write to the comparator register is
delayed by two HPET clock cycles on Intel chipsets which showed
readback problems.
After that patch hit mainline we got reports from machines with newer
AMD chipsets which seem to have an even longer delay. See
http://thread.gmane.org/gmane.linux.kernel/1054283 and
http://thread.gmane.org/gmane.linux.kernel/1069458 for further
information.
Boris tried to come up with an ACPI based selection of the minimum
HPET cycles, but this failed on a couple of test machines. And of
course we did not get any useful information from the hardware folks.
For now our only option is to chose a paranoid high and safe value for
the minimum HPET cycles used by the ETIME check. Adjust the minimum ns
value for the HPET clockevent accordingly.
Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6> Cc: Simon Kirby <sim@hostway.ca> Cc: Borislav Petkov <bp@alien8.de> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Due to the overly intelligent design of HPETs, we need to workaround
the problem that the compare value which we write is already behind
the actual counter value at the point where the value hits the real
compare register. This happens for two reasons:
1) We read out the counter, add the delta and write the result to the
compare register. When a NMI or SMI hits between the read out and
the write then the counter can be ahead of the event already
2) The write to the compare register is delayed by up to two HPET
cycles in certain chipsets.
We worked around this by reading back the compare register to make
sure that the written value has hit the hardware. For certain ICH9+
chipsets this can require two readouts, as the first one can return
the previous compare register value. That's bad performance wise for
the normal case where the event is far enough in the future.
As we already know that the write can be delayed by up to two cycles
we can avoid the read back of the compare register completely if we
make the decision whether the delta has elapsed already or not based
on the following calculation:
cmp = event - actual_count;
If cmp is less than 8 HPET clock cycles, then we decide that the event
has happened already and return -ETIME. That covers the above #1 and
seconds).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nix <nix@esperi.org.uk> Tested-by: Artur Skawina <art.08.09@gmail.com> Cc: Damien Wyart <damien.wyart@free.fr> Tested-by: John Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Tested-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <alpine.LFD.2.00.1009151500060.2416@localhost6.localdomain6>
[PG: diffstat differs from 995bd3bb since deleted comment was re-wrapped] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Avoid dereferencing a NULL pointer if the number of feature arguments
supplied is fewer than indicated.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
/proc/PID/io may be used for gathering private information. E.g. for
openssh and vsftpd daemons wchars/rchars may be used to learn the
precise password length. Restrict it to processes being able to ptrace
the target process.
ptrace_may_access() is needed to prevent keeping open file descriptor of
"io" file, executing setuid binary and gathering io information of the
setuid'ed process.
Fix several security issues in Alpha-specific syscalls. Untested, but
mostly trivial.
1. Signedness issue in osf_getdomainname allows copying out-of-bounds
kernel memory to userland.
2. Signedness issue in osf_sysinfo allows copying large amounts of
kernel memory to userland.
3. Typo (?) in osf_getsysinfo bounds minimum instead of maximum copy
size, allowing copying large amounts of kernel memory to userland.
4. Usage of user pointer in osf_wait4 while under KERNEL_DS allows
privilege escalation via writing return value of sys_wait4 to kernel
memory.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
It's possible for a cifsSesInfo struct to have a NULL password, so we
need to check for that prior to running strncmp on it.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
cifs_find_smb_ses assumes that the vol->password field is a valid
pointer, but that's only the case if a password was passed in via
the options string. It's possible that one won't be if there is
no mount helper on the box.
Reported-by: diabel <gacek-2004@wp.pl> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This patch replaces the earlier patch by the same name. The only
difference is that MAX_PASSWORD_SIZE has been increased to attempt to
match the limits that windows enforces.
Do a better job of matching sessions by authtype. Matching by username
for a Kerberos session is incorrect, and anonymous sessions need special
handling.
Also, in the case where we do match by username, we also need to match
by password. That ensures that someone else doesn't "borrow" an existing
session without needing to know the password.
Finally, passwords can be longer than 16 bytes. Bump MAX_PASSWORD_SIZE
to 512 to match the size that the userspace mount helper allows.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
[PG: origin vs. in 2.6.34; ses <-- pSesInfo, server <-- srvTcp ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Return -EAGAIN when we get H_BUSY back from the hypervisor. This
makes the hvc console driver retry, avoiding dropped printks.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This patch (as1480) fixes a rather obscure bug in ehci-hcd. The
qh_update() routine needs to know the number and direction of the
endpoint corresponding to its QH argument. The number can be taken
directly from the QH data structure, but the direction isn't stored
there. The direction is taken instead from the first qTD linked to
the QH.
However, it turns out that for interrupt transfers, qh_update() gets
called before the qTDs are linked to the QH. As a result, qh_update()
computes a bogus direction value, which messes up the endpoint toggle
handling. Under the right combination of circumstances this causes
usb_reset_endpoint() not to work correctly, which causes packets to be
dropped and communications to fail.
Now, it's silly for the QH structure not to have direct access to all
the descriptor information for the corresponding endpoint. Ultimately
it may get a pointer to the usb_host_endpoint structure; for now,
adding a copy of the direction flag solves the immediate problem.
This allows the Spyder2 color-calibration system (a low-speed USB
device that sends all its interrupt data packets with the toggle set
to 0 and hance requires constant use of usb_reset_endpoint) to work
when connected through a high-speed hub. Thanks to Graeme Gill for
supplying the hardware that allowed me to track down this bug.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Graeme Gill <graeme@argyllcms.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
MAX4967 USB power supply chip we use on our boards signals over-current when
power is not enabled; once it's enabled, over-current signal returns to normal.
That unfortunately caused the endless stream of "over-current change on port"
messages. The EHCI root hub code reacts on every over-current signal change
with powering off the port -- such change event is generated the moment the
port power is enabled, so once enabled the power is immediately cut off.
I think we should only cut off power when we're seeing the active over-current
signal, so I'm adding such check to that code. I also think that the fact that
we've cut off the port power should be reflected in the result of GetPortStatus
request immediately, hence I'm adding a PORTSCn register readback after write...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
After commit 3262c816a3d7fb1eaabce633caa317887ed549ae "[PATCH] knfsd:
split svc_serv into pools", svc_delete_xprt (then svc_delete_socket) no
longer removed its xpt_ready (then sk_ready) field from whatever list it
was on, noting that there was no point since the whole list was about to
be destroyed anyway.
That was mostly true, but forgot that a few svc_xprt_enqueue()'s might
still be hanging around playing with the about-to-be-destroyed list, and
could get themselves into trouble writing to freed memory if we left
this xprt on the list after freeing it.
(This is actually functionally identical to a patch made first by Ben
Greear, but with more comments.)
Cc: gnb@fmeh.org Reported-by: Ben Greear <greearb@candelatech.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Block allocation is called from two places: ext3_get_blocks_handle() and
ext3_xattr_block_set(). These two callers are not necessarily synchronized
because xattr code holds only xattr_sem and i_mutex, and
ext3_get_blocks_handle() may hold only truncate_mutex when called from
writepage() path. Block reservation code does not expect two concurrent
allocations to happen to the same inode and thus assertions can be triggered
or reservation structure corruption can occur.
Fix the problem by taking truncate_mutex in xattr code to serialize
allocations.
CC: Sage Weil <sage@newdream.net> Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Most smartarrays will tolerate it, but some new ones don't.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Note: this is a regression caused by commit 1ddd5049 Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The function pci_enable_ari() may mistakenly set the downstream port
of a v1 PCIe switch in ARI Forwarding mode. This is a PCIe v2 feature,
and with an SR-IOV device on that switch port believing the switch above
is ARI capable it may attempt to use functions 8-255, translating into
invalid (non-zero) device numbers for that bus. This has been seen
to cause Completion Timeouts and general misbehaviour including hangs
and panics.
Acked-by: Don Dutile <ddutile@redhat.com> Tested-by: Don Dutile <ddutile@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The existing code it pretty ugly. How about we clean it up even more
like this?
From: Anton Blanchard <anton@samba.org>
We check for timeout expiry in the outer loop, but we also need to
check it in the inner loop or we can lock up forever waiting for a
CPU to hit real mode.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
In kexec jump support, jump back address passed to the kexeced
kernel via function calling ABI, that is, the function call
return address is the jump back entry.
Furthermore, jump back entry == 0 should be used to signal that
the jump back or preserve context is not enabled in the original
kernel.
But in the current implementation the stack position used for
function call return address is not cleared context
preservation is disabled. The patch fixes this bug.
Reported-and-tested-by: Yin Kangkai <kangkai.yin@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/1310607277-25029-1-git-send-email-ying.huang@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
There's a code path in pmcraid that can be reached via device ioctl that
causes all sorts of ugliness, including heap corruption or triggering the
OOM killer due to consecutive allocation of large numbers of pages.
First, the user can call pmcraid_chr_ioctl(), with a type
PMCRAID_PASSTHROUGH_IOCTL. This calls through to
pmcraid_ioctl_passthrough(). Next, a pmcraid_passthrough_ioctl_buffer
is copied in, and the request_size variable is set to
buffer->ioarcb.data_transfer_length, which is an arbitrary 32-bit
signed value provided by the user. If a negative value is provided
here, bad things can happen. For example,
pmcraid_build_passthrough_ioadls() is called with this request_size,
which immediately calls pmcraid_alloc_sglist() with a negative size.
The resulting math on allocating a scatter list can result in an
overflow in the kzalloc() call (if num_elem is 0, the sglist will be
smaller than expected), or if num_elem is unexpectedly large the
subsequent loop will call alloc_pages() repeatedly, a high number of
pages will be allocated and the OOM killer might be invoked.
It looks like preventing this value from being negative in
pmcraid_ioctl_passthrough() would be sufficient.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Noticed that when the sysfs interface of the SCSI SES
driver was used to request a fault indication the LED
flashed but the buzzer didn't sound. So it was doing
what REQUEST IDENT (locate) should do.
Changelog:
- fix the setting of REQUEST FAULT for the device slot
and array device slot elements in the enclosure control
diagnostic page
- note the potentially defective code that reads the
FAULT SENSED and FAULT REQUESTED bits from the enclosure
status diagnostic page
The attached patch is against git/scsi-misc-2.6
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
A panic was observed when the device is failed to resume properly,
and there are no running interfaces. ieee80211_reconfig tries
to restart STA timers on unassociated state.
Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
If expander discovery fails (sas_discover_expander()), remove the
expander from the port device list (sas_ex_discover_expander()),
before freeing it. Else the list is corrupted and, e.g., when we
attempt to send SMP commands to other devices, the kernel oopses.
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com> Reviewed-by: Jack Wang <jack_wang@usish.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This patch add the missing dma_unmap().
Which solved the critical issue of system freeze on heavy load.
Michal Miroslaw's rejected patch:
[PATCH v2 10/46] net: jme: convert to generic DMA API
Pointed out the issue also, thank you Michal.
But the fix was incorrect. It would unmap needed address
when low memory.
Got lots of feedback from End user and Gentoo Bugzilla.
https://bugs.gentoo.org/show_bug.cgi?id=373109
Thank you all. :)
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
While in sleep mode the CS# and other V3020 RTC GPIOs must be driven
high, otherwise V3020 RTC fails to keep the right time in sleep mode.
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The NVIDIA series of OHCI controllers continues to be troublesome. A
few people using the MCP67 chipset have reported that even with the
most recent kernels, the OHCI controller fails to handle new
connections and spams the system log with "unable to enumerate USB
port" messages. This is different from the other problems previously
reported for NVIDIA OHCI controllers, although it is probably related.
It turns out that the MCP67 controller does not like to be kept in the
RESET state very long. After only a few seconds, it decides not to
work any more. This patch (as1479) changes the PCI initialization
quirk code so that NVIDIA controllers are switched into the SUSPEND
state after 50 ms of RESET. With no interrupts enabled and all the
downstream devices reset, and thus unable to send wakeup requests,
this should be perfectly safe (even for non-NVIDIA hardware).
The removal code in ohci-hcd hasn't been changed; it will still leave
the controller in the RESET state. As a result, if someone unloads
ohci-hcd and then reloads it, the controller won't work again until
the system is rebooted. If anybody complains about this, the removal
code can be updated similarly.
This fixes Bugzilla #22052.
Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
driver_name and board_name are pointers to strings, not buffers of size
COMEDI_NAMELEN. Copying COMEDI_NAMELEN bytes of a string containing
less than COMEDI_NAMELEN-1 bytes would leak some unrelated bytes.
Add ID 4348:5523 for WinChipHead USB->RS 232 adapter with
Prolifec PL2303 chipset
Signed-off-by: Wolfgang Denk <wd@denx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Rebooting on the Dell E5420 often hangs with the keyboard or ACPI
methods, but is reliable via the PCI method.
[ hpa: this was deferred because we believed for a long time that the
recent reshuffling of the boot priorities in commit 660e34cebf0a11d54f2d5dd8838607452355f321 fixed this platform.
Unfortunately that turned out to be incorrect. ]
To work around controllers which can't properly plug events while
reset, ata_eh_reset() clears error states and ATA_PFLAG_EH_PENDING
after reset but before RESET is marked done. As reset is the final
recovery action and full verification of devices including onlineness
and classfication match is done afterwards, this shouldn't lead to
lost devices or missed hotplug events.
Unfortunately, it forgot to thaw the port when clearing EH_PENDING, so
if the condition happens after resetting an empty port, the port could
be left frozen and EH will end without thawing it, making the port
unresponsive to further hotplug events.
Thaw if the port is frozen after clearing EH_PENDING. This problem is
reported by Bruce Stenning in the following thread.
stable: I think we should weather this patch a bit longer in -rcX
before sending it to -stable. Please wait at least a month
after this patch makes upstream. Thanks.
-v2: Fixed spelling in the comment per Dave Howorth.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Bruce Stenning <b.stenning@indigovision.com> Cc: Dave Howorth <dhoworth@mrc-lmb.cam.ac.uk> Signed-off-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
What is supposed to happen:
* bridge with the lowest ID is elected root (for example: B)
* C detects that A->C is higher cost path and puts in blocking state
What happens. Bridge with lowest id (B) is elected correctly as
root and things start out fine initially. But then config BPDU
doesn't get transmitted from A -> C. Because of that
the link from A-C is transistioned to the forwarding state.
The root cause of this is that the configuration messages
is generated with bogus message age, and dropped before
sending.
In the standardmessage_age is supposed to be:
the time since the generation of the Configuration BPDU by
the Root that instigated the generation of this Configuration BPDU.
Reimplement this by recording the timestamp (age + jiffies) when
recording config information. The old code incorrectly used the time
elapsed on the ageing timer which was incorrect.
See also:
https://bugzilla.vyatta.com/show_bug.cgi?id=7164
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
spi_sync call uses its spi_message parameter to keep completion information,
using a drvdata structure is not thread-safe. Use a mutex to prevent
multiple access to shared driver data.
Signed-off-by: Pavel Herrmann <morpheus.ibis@gmail.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Marek Vasut <marek.vasut@gmail.com> Acked-by: Cyril Hrubis <metan@ucw.cz> Tested-by: Stanislav Brabec <utx@penguin.cz> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
While compiling it with Fedora 15, I noticed this issue:
inlined from ‘si4713_write_econtrol_string’ at drivers/media/radio/si4713-i2c.c:1065:24:
arch/x86/include/asm/uaccess_32.h:211:26: error: call to ‘copy_from_user_overflow’ declared with attribute error: copy_from_user() buffer size is not provably correct
Because struct rpcbind_args *map was declared static, if two
threads entered this method at the same time, the values
assigned to map could be sent two two differen tasks.
This could cause all sorts of problems, include use-after-free
and double-free of memory.
Fix this by removing the static declaration so that the map
pointer is on the stack.
Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The tuner-core subdev requires that the type field of v4l2_tuner is
filled in correctly. This is done in v4l2-ioctl.c, but pvrusb2 doesn't
use that yet, so we have to do it manually based on whether the current
input is radio or not.
Tested with my pvrusb2.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Mike Isely <isely@pobox.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The subdevs are supposed to receive a valid tuner type for the g_frequency
and g/s_tuner subdev ops. Some drivers do this, others don't. So prefill
this in v4l2-ioctl.c based on whether the device node from which this is
called is a radio node or not.
The spec does not require applications to fill in the type, and if they
leave it at 0 then the 'check_mode' call in tuner-core.c will return
an error and the ioctl does nothing.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The Blackfin DMA controller can report one frame beyond the end of the
buffer in the wraparound case but ALSA requires that the pointer always
be in the buffer. Do the wraparound to handle this. A similar bug is
likely to apply to the other Blackfin PCM drivers but the code is less
obvious to inspection and I don't have a user to test.
Reported-by: Kieran O'Leary <Kieran.O'Leary@wolfsonmicro.com> Acked-by: Liam Girdwood <lrg@ti.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"
Gurudas Pai reported the same bug on NFS.
The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode. For example:
thread1: going through a big range, stops in the middle of a vma and
stores the restart address in vm_truncate_count.
thread2: comes in with a small (e.g. single page) unmap request on
the same vma, somewhere before restart_address, finds that the
vma was already unmapped up to the restart address and happily
returns without doing anything.
Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value. This could go on forever without any of them being able to
finish.
Truncate and hole punching already serialize with i_mutex. Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers. In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.
This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.
[ We'll hopefully get rid of all this with the upcoming mm
preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
lockbreak" patch in particular. But that is for 2.6.39 ]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Michael Leun <lkml20101129@newton.leun.net> Reported-by: Gurudas Pai <gurudas.pai@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PG: Some chunks dropped, since no ebdfed4dc5 in 34; came in at 2.6.37] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Consider this scenario: When the size of the first received udp packet
is bigger than the receive buffer, MSG_TRUNC bit is set in msg->msg_flags.
However, if checksum error happens and this is a blocking socket, it will
goto try_again loop to receive the next packet. But if the size of the
next udp packet is smaller than receive buffer, MSG_TRUNC flag should not
be set, but because MSG_TRUNC bit is not cleared in msg->msg_flags before
receive the next packet, MSG_TRUNC is still set, which is wrong.
Fix this problem by clearing MSG_TRUNC flag when starting over for a
new packet.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
udpv6_recvmsg() function is not using the correct variable to determine
whether or not the socket is in non-blocking operation, this will lead
to unexpected behavior when a UDP checksum error occurs.
Consider a non-blocking udp receive scenario: when udpv6_recvmsg() is
called by sock_common_recvmsg(), MSG_DONTWAIT bit of flags variable in
udpv6_recvmsg() is cleared by "flags & ~MSG_DONTWAIT" in this call:
But it will always go to try_again as MSG_DONTWAIT has been cleared
from flags at call time -- only noblock contains the original value
of MSG_DONTWAIT, so the test should be:
if (noblock)
return -EAGAIN;
This is also consistent with what the ipv4/udp code does.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Check against mistakenly passing in IPv6 addresses (which would result
in an INADDR_ANY bind) or similar incompatible sockaddrs.
Signed-off-by: Marcus Meissner <meissner@suse.de> Cc: Reinhard Max <max@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
added a small information leak.
Add padding field and make sure its zeroed before copy to user.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
A mis-configured filter can spam the logs with lots of stack traces.
Rate-limit the warnings and add printout of the bogus filter information.
Original-patch-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Add a generic mechanism to ratelimit WARN(foo, fmt, ...) messages
using a hidden per call site static struct ratelimit_state.
Also add an __WARN_RATELIMIT variant to be able to use a specific
struct ratelimit_state.
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
There is a bug in free_unnecessary_pages() that causes it to
attempt to free too many pages in some cases, which triggers the
BUG_ON() in memory_bm_clear_bit() for copy_bm. Namely, if
count_data_pages() is initially greater than alloc_normal, we get
to_free_normal equal to 0 and "save" greater from 0. In that case,
if the sum of "save" and count_highmem_pages() is greater than
alloc_highmem, we subtract a positive number from to_free_normal.
Hence, since to_free_normal was 0 before the subtraction and is
an unsigned int, the result is converted to a huge positive number
that is used as the number of pages to free.
Fix this bug by checking if to_free_normal is actually greater
than or equal to the number we're going to subtract from it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-and-tested-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
There is a problem in hibernate_preallocate_memory() that it calls
preallocate_image_memory() with an argument that may be greater than
the total number of available non-highmem memory pages. If that's
the case, the OOM condition is guaranteed to trigger, which in turn
can cause significant slowdown to occur during hibernation.
To avoid that, make preallocate_image_memory() adjust its argument
before calling preallocate_image_pages(), so that the total number of
saveable non-highem pages left is not less than the minimum size of
a hibernation image. Change hibernate_preallocate_memory() to try to
allocate from highmem if the number of pages allocated by
preallocate_image_memory() is too low.
Modify free_unnecessary_pages() to take all possible memory
allocation patterns into account.
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: M. Vefa Bicakci <bicave@superonline.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
With glibc 2.11 or later that was built with --enable-multi-arch, the UML
link fails with undefined references to __rel_iplt_start and similar
symbols. In recent binutils, the default linker script defines these
symbols (see ld --verbose). Fix the UML linker scripts to match the new
defaults for these sections.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This patch (as1465) continues implementation of the policy that errors
during suspend or hibernation should not prevent the system from going
to sleep.
In this case, failure to turn on the Suspend feature for a hub port
shouldn't be reported as an error. There are situations where this
does actually occur (such as when the device plugged into that port
was disconnected in the recent past), and it turns out to be harmless.
There's no reason for it to prevent a system sleep.
Also, don't allow the hub driver to fail a system suspend if the
downstream ports aren't all suspended. This is also harmless (and
should never happen, given the change mentioned above); printing a
warning message in the kernel log is all we really need to do.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This patch (as1464) implements the recommended policy that most errors
during suspend or hibernation should not prevent the system from going
to sleep. In particular, failure to suspend a USB driver or a USB
device should not prevent the sleep from succeeding:
Failure to suspend a device won't matter, because the device will
automatically go into suspend mode when the USB bus stops carrying
packets. (This might be less true for USB-3.0 devices, but let's not
worry about them now.)
Failure of a driver to suspend might lead to trouble later on when the
system wakes up, but it isn't sufficient reason to prevent the system
from going to sleep.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The message hints that disc_data_lock is aquired with softirqs disabled,
but does not itself disable softirqs, which can in rare circumstances
lead to a deadlock.
The same problem is present in the 6pack driver, this patch fixes both
by using write_lock_bh instead of write_lock.
Reported-by: Bernard F6BVP <f6bvp@free.fr> Tested-by: Bernard F6BVP <f6bvp@free.fr> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ralf Baechle<ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
If a device fails in a way that causes pending request to take a while
to complete, md will not be able to immediately remove it from the
array in remove_and_add_spares.
It will then incorrectly look like a spare device and md will try to
recover it even though it is failed.
This leads to a recovery process starting and instantly aborting over
and over again.
We should check if the device is faulty before considering it to be a
spare. This will avoid trying to start a recovery that cannot
proceed.
This bug was introduced in 2.6.26 so that patch is suitable for any
kernel since then.
Reported-by: Jim Paradis <james.paradis@stratus.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Since we are modifying this RCU pointer, we need to hold
the lock protecting it around it.
This fixes a potential reuse and double free of a cfq
io_context structure. The bug has been in CFQ for a long
time, it hit very few people but those it did hit seemed
to see it a lot.
Order of initialization look like this:
...
debugobjects
kmemleak
...(lots of other subsystems)...
workqueues (through early initcall)
...
debugobjects use schedule_work for batch freeing of its data and kmemleak
heavily use debugobjects, so when it comes to freeing and workqueues were
not initialized yet, kernel crashes:
Otherwise, the gpiolib autorequest feature will produce a WARN_ON():
WARNING: at drivers/gpio/gpiolib.c:101 0x8020ec6c()
autorequest GPIO-215
[...]
Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
[PG: use combo gpio_request+gpio_direction_output vs. gpio_request_one
to avoid build failure, as per v2.6.32.47 commit 35b6863ce555c ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When 1GB hugepages are allocated on a system, free(1) reports less
available memory than what really is installed in the box. Also, if the
total size of hugepages allocated on a system is over half of the total
memory size, CommitLimit becomes a negative number.
The problem is that gigantic hugepages (order > MAX_ORDER) can only be
allocated at boot with bootmem, thus its frames are not accounted to
'totalram_pages'. However, they are accounted to hugetlb_total_pages()
What happens to turn CommitLimit into a negative number is this
calculation, in fs/proc/meminfo.c:
A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.
Also, every vm statistic which depends on 'totalram_pages' will render
confusing values, as if system were 'missing' some part of its memory.
Impact of this bug:
When gigantic hugepages are allocated and sysctl_overcommit_memory ==
OVERCOMMIT_NEVER. In a such situation, __vm_enough_memory() goes through
the mentioned 'allowed' calculation and might end up mistakenly returning
-ENOMEM, thus forcing the system to start reclaiming pages earlier than it
would be ususal, and this could cause detrimental impact to overall
system's performance, depending on the workload.
Besides the aforementioned scenario, I can only think of this causing
annoyances with memory reports from /proc/meminfo and free(1).
[akpm@linux-foundation.org: standardize comment layout] Reported-by: Russ Anderson <rja@sgi.com> Signed-off-by: Rafael Aquini <aquini@linux.com> Acked-by: Russ Anderson <rja@sgi.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
We would free the proper number of curves, but in the wrong
slots, due to a missing level of indirection through
the pdgain_idx table.
It's simpler just to try to free all four slots, so do that.
Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When opening /dev/snapshot device, snapshot_open() creates memory
bitmaps which are freed in snapshot_release(). But if any of the
callbacks called by pm_notifier_call_chain() returns NOTIFY_BAD, open()
fails, snapshot_release() is never called and bitmaps are not freed.
Next attempt to open /dev/snapshot then triggers BUG_ON() check in
create_basic_memory_bitmaps(). This happens e.g. when vmwatchdog module
is active on s390x.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
While trying to switch a UAS device from the BOT configuration to the UAS
configuration via the bConfigurationValue file, Tanya ran into an issue in
the USB core. usb_disable_device() sets entries in udev->ep_out and
udev->ep_out to NULL, but doesn't call into the xHCI bandwidth management
functions to remove the BOT configuration endpoints from the xHCI host's
internal structures.
The USB core would then attempt to add endpoints for the UAS
configuration, and some of the endpoints had the same address as endpoints
in the BOT configuration. The xHCI driver blindly added the endpoints
again, but the xHCI host controller rejected the Configure Endpoint
command because active endpoints were added without being dropped.
Make the xHCI driver reject calls to xhci_add_endpoint() that attempt to
add active endpoints without first calling xhci_drop_endpoint().
This should be backported to kernels as old as 2.6.31.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Reported-by: Tanya Brokhman <tlinder@codeaurora.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
We restored tty_ldisc_wait_idle in 100eeae2c5c (TTY: restore
tty_ldisc_wait_idle). We used it in the ldisc changing path to fix the
case where there are tasks in n_tty_read waiting for data and somebody
tries to change ldisc.
Similar to the case above, there may be also tasks waiting in
n_tty_read while hangup is performed. As 65b770468e98 (tty-ldisc: turn
ldisc user count into a proper refcount) removed the wait-until-idle
from all paths, hangup path won't wait for them to disappear either
now. So add it back even to the hangup path.
There is a difference, we need uninterruptible sleep as there is
obviously HUP signal pending. So tty_ldisc_wait_idle now sleeps
without possibility to be interrupted. This is what original
tty_ldisc_wait_idle did. After the wait idle reintroduction
(100eeae2c5c), we have had interruptible sleeps for the ldisc changing
path. But as there is a 5s timeout anyway, we don't allow it to be
interrupted from now on. It's not worth the added complexity of
deciding what kind of sleep we want.
Before 65b770468e98 tty_ldisc_release was called also from
tty_ldisc_release. It is called from tty_release, so I don't think we
need to restore that one.
This is nicely reproducible after constifying the timing when
drivers/tty/n_tty.c is patched as follows ("TTY: ntty, add one more
sanity check" patch is needed to actually see it explode):
%% -1548,6 +1549,7 @@ static int n_tty_open(struct tty_struct *tty)
/* These are ugly. Currently a malloc failure here can panic */
if (!tty->read_buf) {
+ msleep(100);
tty->read_buf = kzalloc(N_TTY_BUF_SIZE, GFP_KERNEL);
if (!tty->read_buf)
return -ENOMEM;
%% -1785,6 +1788,7 @@ do_it_again:
break;
}
timeout = schedule_timeout(timeout);
+ msleep(20);
continue;
}
__set_current_state(TASK_RUNNING);
===== With a process: =====
while (1) {
int fd = open(argv[1], O_RDWR);
read(fd, buf, sizeof(buf));
close(fd);
}
===== and its child: =====
setsid();
while (1) {
int fd = open(tty, O_RDWR|O_NOCTTY);
ioctl(fd, TIOCSCTTY, 1);
vhangup();
close(fd);
usleep(100 * (10 + random() % 1000));
}
===== EOF =====
References: https://bugzilla.novell.com/show_bug.cgi?id=693374
References: https://bugzilla.novell.com/show_bug.cgi?id=694509 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PG: account for char --> tty file rename post 2.6.34] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The clocksource watchdog code is interruptible and it has been
observed that this can trigger false positives which disable the TSC.
The reason is that an interrupt storm or a long running interrupt
handler between the read of the watchdog source and the read of the
TSC brings the two far enough apart that the delta is larger than the
unstable treshold. Move both reads into a short interrupt disabled
region to avoid that.
Reported-and-tested-by: Vernon Mauery <vernux@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
We only need to set max_pfn_mapped to the last pfn mapped on x86_64 to
make sure that cleanup_highmap doesn't remove important mappings at
_end.
We don't need to do this on x86_32 because cleanup_highmap is not called
on x86_32. Besides lowering max_pfn_mapped on x86_32 has the unwanted
side effect of limiting the amount of memory available for the 1:1
kernel pagetable allocation.
This patch reverts the x86_32 part of the original patch.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Andrea Righi reported a case where an exiting task can race against
ksmd::scan_get_next_rmap_item (http://lkml.org/lkml/2011/6/1/742) easily
triggering a NULL pointer dereference in ksmd.
ksm_scan.mm_slot == &ksm_mm_head with only one registered mm
CPU 1 (__ksm_exit) CPU 2 (scan_get_next_rmap_item)
list_empty() is false
lock slot == &ksm_mm_head
list_del(slot->mm_list)
(list now empty)
unlock
lock
slot = list_entry(slot->mm_list.next)
(list is empty, so slot is still ksm_mm_head)
unlock
slot->mm == NULL ... Oops
Close this race by revalidating that the new slot is not simply the list
head again.
When the clocksource is not a multiple of HZ, the clock will be off. For
acpi_pm, HZ=1000 the error is 127.111 ppm:
The rounding of cycle_interval ends up generating a false error term in
ntp_error accumulation since xtime_interval is not exactly 1/HZ. So, we
subtract out the error caused by the rounding.
This has been visible since 2.6.32-rc2
commit a092ff0f90cae22b2ac8028ecd2c6f6c1a9e4601
time: Implement logarithmic time accumulation
That commit raised NTP_INTERVAL_FREQ and exposed the rounding error.
testing tool: http://n1.taur.dk/permanent/testpmt.c
Also tested with ntpd and a frequency counter.
Signed-off-by: Kasper Pedersen <kkp2010@kasperkp.dk> Acked-by: john stultz <johnstul@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This adds a mechanism to resume selected IRQs during syscore_resume
instead of dpm_resume_noirq.
Under Xen we need to resume IRQs associated with IPIs early enough
that the resched IPI is unmasked and we can therefore schedule
ourselves out of the stop_machine where the suspend/resume takes
place.
This issue was introduced by 676dc3cf5bc3 "xen: Use IRQF_FORCE_RESUME".
Back ported to 2.6.32 (which lacks syscore support) by calling the relavant
resume function directly from sysdev_resume).
v2: Fixed non-x86 build errors.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PG: this v2 backport taken from v2.6.32.49 stable, commit 5e87d8ee34e3] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Mark the IRQF_NO_SUSPEND interrupts IRQF_FORCE_RESUME and remove the extra
walk through the interrupt descriptors.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Xen needs to reenable interrupts which are marked IRQF_NO_SUSPEND in the
resume path. Add a flag to force the reenabling in the resume code.
Tested-and-acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The IRQ core code will take care of disabling and reenabling
interrupts over suspend resume automatically, therefore we do not need
to do this in the Xen event channel code.
The only exception is those event channels marked IRQF_NO_SUSPEND
which the IRQ core ignores. We must unmask these ourselves, taking
care to obey the current IRQ_DISABLED status. Failure check for
IRQ_DISABLED leads to enabling polled only event channels, such as
that associated with the pv spinlocks, which must never be enabled:
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>