nfsd_open() gets an unrefcounted pointer to the current process's effective
credentials at the top of the function, then calls nfsd_setuser() via
fh_verify() - which may replace and destroy the current process's effective
credentials - and then passes the unrefcounted pointer to dentry_open() - but
the credentials may have been destroyed by this point.
Instead, the value from current_cred() should be passed directly to
dentry_open() as one of its arguments, rather than being cached in a variable.
Possibly fh_verify() should return the creds to use.
Signed-off-by: David Howells <dhowells@redhat.com> Tested-and-Verified-By: Steve Dickson <steved@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I recently discovered on my zalon that if the attachment fails because
of a bus misconfiguration (I scrapped my HVD array, so the card is now
unterminated) then the system oopses. The reason is that if
ncr_attach() returns NULL (signalling failure) that NULL is passed by
the goto failed straight into ncr_detach() which oopses.
The fix is just to return -ENODEV in this case.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There's some odd bug in gcc-4.2 where it miscompiles a simple loop whent
he loop counter is of type 'unsigned char' and it should count to 128.
The compiler will incorrectly decide that a trivial loop like this:
unsigned char i, ...
for (i = 0; i < 128; i++) {
..
is endless, and will compile it to a single instruction that just
branches to itself.
This was triggered by the addition of '-fno-strict-overflow', and we
could play games with compiler versions and go back to '-fwrapv'
instead, but the trivial way to avoid it is to just make the loop
induction variable be an 'int' instead.
Thanks to Krzysztof Oledzki for reporting and testing and to Troy Moure
for digging through assembler differences and finding it.
Reported-and-tested-by: Krzysztof Oledzki <olel@ans.pl> Found-by: Troy Moure <twmoure@szypr.net> Gcc-bug-acked-by: Ian Lance Taylor <iant@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This causes kernel images that don't run init to completion with certain
broken gcc versions.
This fixes kernel bugzilla entry:
http://bugzilla.kernel.org/show_bug.cgi?id=13012
I suspect the gcc problem is this:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28230
Fix the problem by using the -fno-strict-overflow flag instead, which
not only does not exist in the known-to-be-broken versions of gcc (it
was introduced later than fwrapv), but seems to be much less disturbing
to gcc too: the difference in the generated code by -fno-strict-overflow
are smaller (compared to using neither flag) than when using -fwrapv.
Reported-by: Barry K. Nathan <barryn@pobox.com> Pushed-by: Frans Pop <elendil@planet.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On 64 bit systems -- where sizeof(ssize_t) > sizeof(int) -- the following test
exposes a bug due to a non-careful return of an int or unsigned value:
implement a FUSE filesystem which sends an unsolicited notification to
the kernel with invalid opcode. The respective write to /dev/fuse
will return (1 << 32) - EINVAL with errno == 0 instead of -1 with
errno == EINVAL.
This fixes kernel.org bug #13584. The IOVA code attempted to optimise
the insertion of new ranges into the rbtree, with the unfortunate result
that some ranges just didn't get inserted into the tree at all. Then
those ranges would be handed out more than once, and things kind of go
downhill from there.
After discovering that we don't listen to gratuitious arps in 2.6.30
I tracked the failure down to this commit.
The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227.
are all in agreement that an arp request with sip == 0 should be used
for the probe (to prevent learning) and an arp request with sip == tip
should be used for the gratitous announcement that people can learn
from.
It appears the author of the broken patch got those two cases confused
and modified the code to drop all gratuitous arp traffic. Ouch!
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When an md device is created by name (rather than number) we need to
check that the name is not already in use. If this check finds a
duplicate, we return an error without dropping the lock or freeing
the newly create mddev.
This patch fixes that.
md allows write to regions on an array to be suspended temporarily.
This allows user-space to participate is aspects of reshape.
In particular, data can be copied with not risk of a race.
We should not be blocking read requests though, so don't.
The next_ordered flag is only meaningful for devices that use __make_request.
So move the test against next_ordered out of generic code and in to
__make_request
Since this test was added, barriers have not worked on md or any
devices that don't use __make_request and so don't bother to set
next_ordered. (dm explicitly sets something other than
QUEUE_ORDERED_NONE since
commit 99360b4c18f7675b50d283301d46d755affe75fd
but notes in the comments that it is otherwise meaningless).
This patch fixes a bug in the overlap function which returned true if
one region ends exactly before the second region begins. This is no
overlap but the function returned true in that case.
alpha percpu access requires custom SHIFT_PERCPU_PTR() definition for
modules to work around addressing range limitation. This is done via
generating inline assembly using C preprocessing which forces the
assembler to generate external reference. This happens behind the
compiler's back and makes the compiler think that static percpu variables
in modules are unused.
This used to be worked around by using __unused attribute for percpu
variables which prevent the compiler from omitting the variable; however,
recent declare/definition attribute unification change broke this as
__used can't be used for declaration. Also, in the process,
PER_CPU_ATTRIBUTES definition in alpha percpu.h got broken.
This patch adds PER_CPU_DEF_ATTRIBUTES which is only used for definitions
and make alpha use it to add __used for percpu variables in modules. This
also fixes the PER_CPU_ATTRIBUTES double definition bug.
Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: maximilian attems <max@stro.at> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
get_futex_key() can infinitely loop if it is called on a
virtual address that is within a huge page but not aligned to
the beginning of that page. The call to get_user_pages_fast
will return the struct page for a sub-page within the huge page
and the check for page->mapping will always fail.
The fix is to call compound_head on the page before checking
that it's mapped.
commit 64d1304a64 (futex: setup writeable mapping for futex ops which
modify user space data) did address only half of the problem of write
access faults.
The patch was made on two wrong assumptions:
1) access_ok(VERIFY_WRITE,...) would actually check write access.
On x86 it does _NOT_. It's a pure address range check.
2) a RW mapped region can not go away under us.
That's wrong as well. Nobody can prevent another thread to call
mprotect(PROT_READ) on that region where the futex resides. If that
call hits between the get_user_pages_fast() verification and the
actual write access in the atomic region we are toast again.
The solution is to not rely on access_ok and get_user() for any write
access related fault on private and shared futexes. Instead we need to
fault it in with verification of write access.
There is no generic non destructive write mechanism which would fault
the user page in trough a #PF, but as we already know that we will
fault we can as well call get_user_pages() directly and avoid the #PF
overhead.
If get_user_pages() returns -EFAULT we know that we can not fix it
anymore and need to bail out to user space.
Remove a bunch of confusing comments on this issue as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 6b3087c6 (which introduced Blackfin SMP) broke command line passing
when the DEBUG_DOUBLEFAULT config option was enabled. Switch the code to
using a scratch register and not R7 which holds the command line.
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a low priority interrupt (like ethernet) is triggered between 2 high
priority IPI messages, a deadlock in disable_irq() is hit by the second
IPI handler. This is because the second IPI message is queued within the
first IPI handler, but the handler doesn't process all messages, and new
ones are inserted rather than appended. So now we process all the pending
messages, and append new ones to the pending list.
URL: http://blackfin.uclinux.org/gf/tracker/5226
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With the common IRQ code initializing much more of the irq_desc state, we
can't blindly initialize it ourselves to the local bad_irq state. If we
do, we end up wrongly clobbering many fields. So punt most of the bad irq
code as the common layers will handle the default state, and simply call
handle_bad_irq() directly when the IRQ we are processing is invalid.
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We read the SWRST (Software Reset) register to get at the last reset
state, and then we may configure the DOUBLE_FAULT bit to control behavior
when a double fault occurs. But if the lower bits of the register is
already set (like UART boot mode on a BF54x), we inadvertently make the
system reset by writing to the SYSTEM_RESET field at the same time. So
make sure the lower 4 bits are always cleared.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We have found that the current PER_CLEAR_ON_SETID mask on Linux doesn't
include neither ADDR_COMPAT_LAYOUT, nor MMAP_PAGE_ZERO.
The current mask is READ_IMPLIES_EXEC|ADDR_NO_RANDOMIZE.
We believe it is important to add MMAP_PAGE_ZERO, because by using this
personality it is possible to have the first page mapped inside a
process running as setuid root. This could be used in those scenarios:
- Exploiting a NULL pointer dereference issue in a setuid root binary
- Bypassing the mmap_min_addr restrictions of the Linux kernel: by
running a setuid binary that would drop privileges before giving us
control back (for instance by loading a user-supplied library), we
could get the first page mapped in a process we control. By further
using mremap and mprotect on this mapping, we can then completely
bypass the mmap_min_addr restrictions.
Less importantly, we believe ADDR_COMPAT_LAYOUT should also be added
since on x86 32bits it will in practice disable most of the address
space layout randomization (only the stack will remain randomized).
Signed-off-by: Julien Tinnes <jt@cr0.org> Signed-off-by: Tavis Ormandy <taviso@sdf.lonestar.org> Acked-by: Christoph Hellwig <hch@infradead.org> Acked-by: Kees Cook <kees@ubuntu.com> Acked-by: Eugene Teo <eugene@redhat.com>
[ Shortened lines and fixed whitespace as per Christophs' suggestion ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix NULL pointer dereference in tun_chr_pool() introduced by commit 33dccbb050bbe35b88ca8cf1228dcf3e4d4b3554 ("tun: Limit amount of queued
packets per device") and triggered by this code:
This patch removes the dependency of mmap_min_addr on CONFIG_SECURITY.
It also sets a default mmap_min_addr of 4096.
mmapping of addresses below 4096 will only be possible for processes
with CAP_SYS_RAWIO.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Eric Paris <eparis@redhat.com> Looks-ok-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Turning on this flag could prevent the compiler from optimising away
some "useless" checks for null pointers. Such bugs can sometimes become
exploitable at compile time because of the -O2 optimisation.
See http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html
An example that clearly shows this 'problem' is commit 6bf67672.
The file opened in acct_on and freshly stored in the ns->bacct struct can
be closed in acct_file_reopen by a concurrent call after we release
acct_lock and before we call mntput(file->f_path.mnt).
Record file->f_path.mnt in a local variable and use this variable only.
Signed-off-by: Renaud Lottiaux <renaud.lottiaux@kerlabs.com> Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As it turns out, the call trace is messed up due to gcc's inlining, but
I isolated the problem anyway: kvm_write_guest_time() is being used in a
non-thread-safe manner on preemptable kernels.
Basically kvm_write_guest_time()'s body needs to be surrounded by
preempt_disable() and preempt_enable(), since the kernel won't let us
query any per-CPU data (indirectly using smp_processor_id()) without
preemption disabled. The attached patch fixes this issue by disabling
preemption inside kvm_write_guest_time().
[marcelo: surround only __get_cpu_var calls since the warning
is harmless]
We need to save register state *after* idling GEM, clearing the ring,
and uninstalling the IRQ handler, or we might end up saving bogus
fence regs, for one. Our restore ordering should already be correct,
since we do GEM, ring and IRQ init after restoring the last register
state, which prevents us from clobbering things.
I put this together to potentially address a bug, but I haven't heard
back if it fixes it yet. However I think it stands on its own, so I'm
sending it in.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Eric Anholt <eric@anholt.net> Cc: Jie Luo <clotho67@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With 2.6.30, the error handling code in cdrom_newpc_intr was changed
to deal with partial request failures by normally completing the 'good'
parts of a request and only 'error' the last (and presumably,
incompletely transferred) bio associated with a particular
request. In order to do this, ide_complete_rq is called over
ide_cd_error_cmd() to partially complete the rq. The block layer
does partial completion only for requests with bio's and if the
rq doesn't have one (eg 'GPCMD_READ_DISC_INFO') the request is
completed as a whole and the drive->hwif->rq pointer set to NULL
afterwards. When calling ide_complete_rq again to report
the error, this null pointer is derefenced, resulting in a kernel
crash.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=13399.
Since early printk only makes sense/works when the serial driver is built
into the kernel, disable the option for this driver when it is going to be
built as a module. Otherwise we get build failures due to the ifdef
handling.
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a call to write_lock() in gen_pool_destroy which is not balanced
by any corresponding write_unlock(). This causes problems with preemption
because the preemption-disable counter is incremented in the write_lock()
call, but never decremented by any call to write_unlock(). This bug is
gen_pool_destroy, and one of them is non-x86 arch-specific code.
On NUMA machines, the administrator can configure zone_reclaim_mode that
is a more targetted form of direct reclaim. On machines with large NUMA
distances for example, a zone_reclaim_mode defaults to 1 meaning that
clean unmapped pages will be reclaimed if the zone watermarks are not
being met.
There is a heuristic that determines if the scan is worthwhile but it is
possible that the heuristic will fail and the CPU gets tied up scanning
uselessly. Detecting the situation requires some guesswork and
experimentation so this patch adds a counter "zreclaim_failed" to
/proc/vmstat. If during high CPU utilisation this counter is increasing
rapidly, then the resolution to the problem may be to set
/proc/sys/vm/zone_reclaim_mode to 0.
[akpm@linux-foundation.org: name things consistently] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A bug was brought to my attention against a distro kernel but it affects
mainline and I believe problems like this have been reported in various
guises on the mailing lists although I don't have specific examples at the
moment.
The reported problem was that malloc() stalled for a long time (minutes in
some cases) if a large tmpfs mount was occupying a large percentage of
memory overall. The pages did not get cleaned or reclaimed by
zone_reclaim() because the zone_reclaim_mode was unsuitable, but the lists
are uselessly scanned frequencly making the CPU spin at near 100%.
This patchset intends to address that bug and bring the behaviour of
zone_reclaim() more in line with expectations which were noticed during
investigation. It is based on top of mmotm and takes advantage of
Kosaki's work with respect to zone_reclaim().
Patch 1 fixes the heuristics that zone_reclaim() uses to determine if the
scan should go ahead. The broken heuristic is what was causing the
malloc() stall as it uselessly scanned the LRU constantly. Currently,
zone_reclaim is assuming zone_reclaim_mode is 1 and historically it
could not deal with tmpfs pages at all. This fixes up the heuristic so
that an unnecessary scan is more likely to be correctly avoided.
Patch 2 notes that zone_reclaim() returning a failure automatically means
the zone is marked full. This is not always true. It could have
failed because the GFP mask or zone_reclaim_mode were unsuitable.
Patch 3 introduces a counter zreclaim_failed that will increment each
time the zone_reclaim scan-avoidance heuristics fail. If that
counter is rapidly increasing, then zone_reclaim_mode should be
set to 0 as a temporarily resolution and a bug reported because
the scan-avoidance heuristic is still broken.
This patch:
On NUMA machines, the administrator can configure zone_reclaim_mode that
is a more targetted form of direct reclaim. On machines with large NUMA
distances for example, a zone_reclaim_mode defaults to 1 meaning that
clean unmapped pages will be reclaimed if the zone watermarks are not
being met.
There is a heuristic that determines if the scan is worthwhile but the
problem is that the heuristic is not being properly applied and is
basically assuming zone_reclaim_mode is 1 if it is enabled. The lack of
proper detection can manfiest as high CPU usage as the LRU list is scanned
uselessly.
Historically, once enabled it was depending on NR_FILE_PAGES which may
include swapcache pages that the reclaim_mode cannot deal with. Patch
vmscan-change-the-number-of-the-unmapped-files-in-zone-reclaim.patch by
Kosaki Motohiro noted that zone_page_state(zone, NR_FILE_PAGES) included
pages that were not file-backed such as swapcache and made a calculation
based on the inactive, active and mapped files. This is far superior when
zone_reclaim==1 but if RECLAIM_SWAP is set, then NR_FILE_PAGES is a
reasonable starting figure.
This patch alters how zone_reclaim() works out how many pages it might be
able to reclaim given the current reclaim_mode. If RECLAIM_SWAP is set in
the reclaim_mode it will either consider NR_FILE_PAGES as potential
candidates or else use NR_{IN}ACTIVE}_PAGES-NR_FILE_MAPPED to discount
swapcache and other non-file-backed pages. If RECLAIM_WRITE is not set,
then NR_FILE_DIRTY number of pages are not candidates. If RECLAIM_SWAP is
not set, then NR_FILE_MAPPED are not.
[kosaki.motohiro@jp.fujitsu.com: Estimate unmapped pages minus tmpfs pages]
[fengguang.wu@intel.com: Fix underflow problem in Kosaki's estimate] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If someone changes the size of the device simultaneously, i_size_read
is guaranteed to return a valid value (either the old one or the new one).
i_size can return some intermediate invalid value (on 32-bit computers
with 64-bit i_size, the reads to both halves of i_size can be interleaved
with updates to i_size, resulting in garbage being returned).
Cc: Yi Yang <yi.y.yang@intel.com> Cc: Jonathan Brassow <jbrassow@redhat.com> Cc: stable@kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When snapshots are created using 'p' instead of 'P' as the
exception store type, the device-mapper table loading fails.
This patch makes the code case insensitive as intended and fixes some
regressions reported with device-mapper snapshots.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a possibility of race condition because kevent queue is not flushed
in the multipath destructor. The scenario is:
- some event happens and is queued to keventd
- keventd thread is delayed due to scheuling latency or some other work
- multipath device is destroyed
- keventd now attempts to process work_struct that is residing in already
released memory.
The patch flushes the keventd queue in multipath constructor.
I've already fixed similar bug in dm-raid1.
After downing/upping a cpu, an attempt to set
/proc/sys/vm/percpu_pagelist_fraction results in an oops in
percpu_pagelist_fraction_sysctl_handler().
If a processor is downed then we need to set the pageset pointer back to
the boot pageset.
Updates of the high water marks should not access pagesets of unpopulated
zones (those pointer go to the boot pagesets which would be no longer
functional if their size would be increased beyond zero).
According to the PCI PM specification (PCI Bus Power Management
Interface Specification, Rev. 1.2, Section 5.4.1) we are supposed to
reinitialize devices that have PCI_PM_CTRL_NO_SOFT_RESET clear during
all transitions from PCI_D3hot to PCI_D0, but we only do it if the
device's current_state field is equal to PCI_UNKNOWN.
This may lead to problems if a device with PCI_PM_CTRL_NO_SOFT_RESET
unset is put into PCI_D3hot at run time by its driver and
pci_set_power_state() is used to put it back into PCI_D0, because in
that case the device will remain uninitialized after
pci_set_power_state() has returned. Prevent that from happening by
modifying pci_raw_set_power_state() to reinitialize devices with
PCI_PM_CTRL_NO_SOFT_RESET unset during all transitions from D3 to D0.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a PCI device is not power-manageable either by the platform, or
with the help of the native PCI PM interface, pci_target_state() will
return either PCI_D3hot, or PCI_POWER_ERROR for it, depending on
whether or not the device is configured to wake up the system. Alas,
none of these return values is correct, because each of them causes
pci_prepare_to_sleep() to return error code, although it should
complete successfully in such a case.
Fix this problem by making pci_target_state() always return PCI_D0
for devices that cannot be power managed.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When decoding (N)RPN sequencer events into raw MIDI commands, the
extra_decode_xrpn() function had accidentally swapped the MSB and LSB
controller values of both the parameter number and the data value.
This reverts 'ath5k: remove dummy PCI "retry timeout" fix' on the
same theory as in 'ath9k: Fix PCI FATAL interrupts by restoring
RETRY_TIMEOUT disabling'.
Reported-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The Unicast Promiscious Mode (UPM) bit in the mv643xx_eth port
configuration register doesn't do exactly what its name would suggest:
setting this bit merely enables reception of all unicast frames with a
destination address that differs from our local MAC address in bits
[47:4]. In particular, it doesn't have any effect on unicast frames
with a destination address that matches our MAC address in bits [47:4]
-- these will still be tested against the 16-entry unicast address
filter table.
Therefore, if the interface is set to promiscuous mode, just setting
the unicast promiscuous bit isn't enough -- we need to set all filter
bits in the unicast filter table to 1 as well.
Reported-by: Sachin Sanap <ssanap@marvell.com> Signed-off-by: Prabhanjan Sarnaik <sarnaik@marvell.com> Tested-by: Siddarth Gore <gores@marvell.com> Tested-by: Mahavir Jain <mjain@marvell.com> Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
CONFIG_PARPORT_PC_SUPERIO probes for various superio chips by writing
byte sequences to a set of different potential I/O ranges. But the
probed ranges are not exclusive to parallel ports. Some of our boards
just happen to have a watchdog in one of them. Took us almost a week
to figure out why some distros reboot without warning after running
flawlessly for 3 hours. For exactly 170 = 0xAA minutes, that is ...
Fixed by restoring original values after probing. Also fixed too small
request_region() in detect_and_report_it87().
Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de> Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
parport_pc_probe_port() creates the own 'parport_pc' device if the
device argument is NULL. Then parport_pc_probe_port() doesn't
initialize the dma_mask and coherent_dma_mask of the device and calls
dma_alloc_coherent with it. dma_alloc_coherent fails because
dma_alloc_coherent() doesn't accept the uninitialized dma_mask:
http://lkml.org/lkml/2009/6/16/150
Long ago, X86_32 and X86_64 had the own dma_alloc_coherent
implementations; X86_32 accepted a device having dma_mask that is not
initialized however X86_64 didn't. When we merged them, we chose to
prohibit a device having dma_mask that is not initialized. I think
that it's good to require drivers to set up dma_mask (and
coherent_dma_mask) properly if the drivers want DMA.
This patch splits up the shutdown method of usb_serial_driver into a
disconnect and a release method.
The problem is that the usb-serial core was calling shutdown during
disconnect handling, but drivers didn't expect it to be called until
after all the open file references had been closed. The result was an
oops when the close method tried to use memory that had been
deallocated by shutdown.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes a stack corruption panic or null dereference oops
due to a bad GS in resume_userspace() when returning from
sys_vm86() and calling lockdep_sys_exit().
Only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR
enabled.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1244384628.2323.4.camel@bimbo> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This snapshot has been taken while neither the function tracer nor
the function graph tracer was running.
With dynamic ftrace, such results show a wrong ftrace behaviour
because all calls to ftrace_caller or ftrace_graph_caller (the patched
calls to mcount) are supposed to be patched into nop if none of those
tracers are running.
The problem occurs after the first run of the function tracer. Once we
launch it a second time, the callsites will never be nopped back,
unless you set custom filters.
For example it happens during the self tests at boot time.
The function tracer selftest runs, and then the dynamic tracing is
tested too. After that, the callsites are left un-nopped.
This is because the reset callback of the function tracer tries to
unregister two ftrace callbacks in once: the common function tracer
and the function tracer with stack backtrace, regardless of which
one is currently in use.
It then creates an unbalance on ftrace_start_up value which is expected
to be zero when the last ftrace callback is unregistered. When it
reaches zero, the FTRACE_DISABLE_CALLS is set on the next ftrace
command, triggering the patching into nop. But since it becomes
unbalanced, ie becomes lower than zero, if the kernel functions
are patched again (as in every further function tracer runs), they
won't ever be nopped back.
Note that ftrace_call and ftrace_graph_call are still patched back
to ftrace_stub in the off case, but not the callers of ftrace_call
and ftrace_graph_caller. It means that the tracing is well deactivated
but we waste a useless call into every kernel function.
This patch just unregisters the right ftrace_ops for the function
tracer on its reset callback and ignores the other one which is
not registered, fixing the unbalance. The problem also happens
is .30
In the unlikely event that reshape progresses past the current request
while it is waiting for a stripe we need to schedule() before retrying
for 2 reasons:
1/ Prevent list corruption from duplicated list_add() calls without
intervening list_del().
2/ Give the reshape code a chance to make some progress to resolve the
conflict.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
An earlier commit, 'ath9k: remove dummy PCI "retry timeout" fix', removed
code that was documented to disable RETRY_TIMEOUT register (PCI reg
0x41) since it was claimed to be a no-op. However, it turns out that
there are some combinations of hosts and ath9k-supported cards for
which this is not a no-op (reg 0x41 has value 0x80, not 0) and this
code (or something similar) is needed. In such cases, the driver may
be next to unusable due to very frequent PCI FATAL interrupts from the
card.
Reverting the earlier commit, i.e., restoring the RETRY_TIMEOUT
disabling, seems to resolve the issue. Since the removal of this code
was not based on any known issue and was purely a cleanup change, the
safest option here is to just revert that commit. Should there be
desire to clean this up in the future, the change will need to be
tested with a more complete coverage of cards and host systems.
http://bugzilla.kernel.org/show_bug.cgi?id=13483
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix this by propagating the error and also lets not leave the
user in the dark and communicate what's going on. When this
happens you will now see this:
ath9k 0000:16:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ath9k: Invalid EEPROM contents
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a race on access to last_request and its alpha2
through reg_is_valid_request() and us possibly processing
first another regulatory request on another CPU. We avoid
this improbably race by locking with the cfg80211_mutex as
we should have done in the first place. While at it add
the assert on locking on reg_is_valid_request().
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This has no functional change except we save a kfree(rd) and
allows us to clean this code up a bit after this. We do
avoid an unnecessary kfree(NULL) but calling that was OK too.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This has no functional change, but it will make the race
fix easier to spot in my next patch.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes an incorrect assumption (BUG_ON) made in
cfg80211 when handling country IE regulatory requests.
The assumption was that we won't try to call_crda()
twice for the same event and therefore we will not
recieve two replies through nl80211 for the regulatory
request. As it turns out it is true we don't call_crda()
twice for the same event, however, kobject_uevent_env()
*might* send the udev event twice and/or userspace can
simply process the udev event twice. We remove the BUG_ON()
and simply ignore the duplicate request.
Reported-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The minstrel rate controller periodically looks up rate indexes in
a sampling table. When accessing a specific row and column, minstrel
correctly does a bounds check which, on the surface, appears to handle
the case where mi->n_rates < 2. However, mi->sample_idx is actually
defined as an unsigned, so the right hand side is taken to be a huge
positive number when negative, and the check will always fail.
Consequently, the RC will overrun the array and cause random memory
corruption when communicating with a peer that has only a single rate.
The max value of mi->sample_idx is around 25 so casting to int should
have no ill effects.
Without the change, uptime is a few minutes under load with an AP
that has a single hard-coded rate, and both the AP and STA could
potentially crash. With the change, both lasted 12 hours with a
steady load.
Thanks to Ognjen Maric for providing the single-rate clue so I could
reproduce this.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=12490 on the
regression list (also http://bugzilla.kernel.org/show_bug.cgi?id=13000).
Reported-by: Sergey S. Kostyliov <rathamahata@gmail.com> Reported-by: Ognjen Maric <ognjen.maric@gmail.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On systems where CONFIG_SHMEM is disabled, mounting tmpfs filesystems can
fail when tmpfs options are used. This is because tmpfs creates a small
wrapper around ramfs which rejects unknown options, and ramfs itself only
supports a tiny subset of what tmpfs supports. This makes it pretty hard
to use the same userspace systems across different configuration systems.
As such, ramfs should ignore the tmpfs options when tmpfs is merely a
wrapper around ramfs.
This used to work before commit c3b1b1cbf0 as previously, ramfs would
ignore all options. But now, we get:
ramfs: bad mount option: size=10M
mount: mounting mdev on /dev failed: Invalid argument
Another option might be to restore the previous behavior, where ramfs
simply ignored all unknown mount options ... which is what Hugh prefers.
x86 stack traces are a piece of crap without frame pointers, and its not
like the 'performance gain' of not having stack pointers matters when you
selected lockdep.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <new-submission> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I unfortunately accepted a patch time ago, to drop the "current" usage
from possible IRQ context, w/out proper thought over it. The patch
switched to using the CPU id by bounding the nested call callback with a
get_cpu()/put_cpu().
Unfortunately the ep_call_nested() function can be called with a callback
that grabs sleepy locks (from own f_op->poll()), that results in epic
fails. The following patch uses the proper "context" depending on the
path where it is called, and on the kind of callback.
This has been reported by Stefan Richter, that has also verified the patch
is his previously failing environment.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The ConnectX Programmer's Reference Manual states that the "SO" bit
must be set when posting Fast Register and Local Invalidate send work
requests. When this bit is set, the work request will be executed
only after all previous work requests on the send queue have been
executed. (If the bit is not set, Fast Register and Local Invalidate
WQEs may begin execution too early, which violates the defined
semantics for these operations)
This fixes the issue with NFS/RDMA reported in
<http://lists.openfabrics.org/pipermail/general/2009-April/059253.html>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Without this, the default implementation is a no op which is completely
wrong with a VIVT cache, and usage of sg_copy_buffer() produces
unpredictable results.
Tested-by: Sebastian Andrzej Siewior <bigeasy@breakpoint.cc> Signed-off-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
timer interrupts are excluded from being disabled during suspend. The
clock events code manages the disabling of clock events on its own
because the timer interrupt needs to be functional before the resume
code reenables the device interrupts.
The hpet per cpu timers request their interrupt without setting the
IRQF_TIMER flag so suspend_device_irqs() disables them as well which
results in a fatal resume failure on the boot CPU.
Adding IRQF_TIMER to the interupt flags when requesting the hpet per
cpu timer interrupts solves the problem.
Reported-by: Benjamin S. <sbenni@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Benjamin S. <sbenni@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The commit f9e336f65b666b8f1764d17e9b7c21c90748a37e
ALSA: hda - Unify capture mixer creation in realtek codes
removed the "Input Source" mixer element creation for toshiba-s06 model
because it contains a digital-mic input.
The PCM pointer callback sometimes returns invalid positions and this
screws up the hw_ptr updater in PCM core. Especially since now the
jiffies check is optional with xrun_debug, the invalid position is
handled as is, and causes serious sound skips, etc.
This patch simplifies the position-fix strategy in intel8x0 to be more
robust:
- just falls back to the last position if bogus position is detected
- another sanity check for the backward move of the position due to
a race of register update and the base-index update
This patch is applicable also for 2.6.30.
Tested-by: David Miller <davem@davemloft.net> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On a system where system memory (according e820) is not covered by
mtrr, mtrr_trim_memory converts a portion of memory to reserved, but
bootloader has already put the initrd in that range.
Thus, we need to have 64bit to use relocate_initrd too.
[ Impact: fix using initrd when mtrr_trim_memory happen ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The initialization of the UV Broadcast Assist Unit's sending
buffers was making an invalid assumption about the
initialization of an MMR that defines its address.
The BIOS will not be providing that MMR. So
uv_activation_descriptor_init() should unconditionally set it.
The *fence instructions were moved to vsyscall_64.c by commit cb9e35dce94a1b9c59d46224e8a94377d673e204. But this breaks the
vDSO, because vread methods are also called from there.
Besides, the synchronization might be unnecessary for other
time sources than TSC.
[ Impact: fix potential time warp in VDSO ]
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
LKML-Reference: <9d0ea9ea0f866bdc1f4d76831221ae117f11ea67.1243241859.git.ptesarik@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The current code to set up the GART as an IOMMU enables GART
translations before it removes the aperture from the kernel memory
map, sets the GART PTEs to UC, sets up the guard and scratch
pages, or does a wbinvd(). This leaves the possibility of cache
aliasing open and can cause system crashes.
Re-order the code so as to enable the GART translations only
after all safeguards are in place and the tlb has been flushed.
AMD has tested this patch on both Istanbul systems and 1st
generation Opteron systems with APG enabled and seen no adverse
effects. Istanbul systems with HT Assist enabled sometimes
see MCE errors due to cache artifacts with the unmodified
code.
Fix bug in the SGI UV macros that support systems with multiple
coherency domains. The macros used for referencing global MMR
(chipset registers) are failing to correctly "or" the NASID
(node identifier) bits that reside above M+N. These high bits
are supplied automatically by the chipset for memory accesses
coming from the processor socket.
However, the bits must be present for references to the special
global MMR space used to map chipset registers. (See uv_hub.h
for more details ...)
The bug results in references to invalid/incorrect nodes.
The UV tlb shootdown code has a serious initialization error.
An array of structures [32*8] is initialized as if it were [32].
The array is indexed by (cpu number on the blade)*8, so the short
initialization works for up to 4 cpus on a blade.
But above that, we provide an invalid opcode to the hub's
broadcast assist unit.
This patch changes the allocation of the array to use its symbolic
dimensions for better clarity. And initializes all 32*8 entries.
Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's
recommendation.