This patch implements a workaround for a UV2 hardware bug.
The bug is a non-atomic update of a memory-mapped register. When
hardware message delivery and software message acknowledge occur
simultaneously the pending message acknowledge for the arriving
message may be lost. This causes the sender's message status to
stay busy.
Part of the workaround is to not acknowledge a completed message
until it is verified that no other message is actually using the
resource that is mistakenly recorded in the completed message.
Part of the workaround is to test for long elapsed time in such
a busy condition, then handle it by using a spare sending
descriptor. The stay-busy condition is eventually timed out by
hardware, and then the original sending descriptor can be
re-used. Most of that logic change is in keeping track of the
current descriptor and the state of the spares.
The occurrences of the workaround are added to the BAU
statistics.
Tracepoints are disabled for tainted modules, which is usually because the
module is either proprietary or was forced, and we don't want either of them
using kernel tracepoints.
But a module can also be tainted by being in the staging directory or by being
compiled out of tree. Either is fine for use with tracepoints; there is no need
to punish them. I found this out when I noticed that my sample trace event
module, when built out of tree, stopped working.
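A minimal sketch of that check in the tracepoint module notifier, assuming the
taint bits keep their usual names (TAINT_OOT_MODULE for out-of-tree, TAINT_CRAP
for staging):

	/* Allow staging and out-of-tree taints; any other taint still
	 * disables tracepoints for this module. */
	if (mod->taints & ~((1 << TAINT_OOT_MODULE) | (1 << TAINT_CRAP)))
		return 0;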
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Dave Jones <davej@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
All radio tuners in the cx88 driver use the same address for radio and tuner,
so there is no need to probe the same tuner twice and we can use
radio_type UNSET. This also fixes radio support for those tuners, which has
been broken since kernel 2.6.39-rc1.
This clears the currently mapped core when suspending, to force
re-mapping after resume. Without that, we were touching the default core's
registers while believing some other core was mapped. Such behaviour
resulted in lockups on some machines.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The target code was not setting the additional sense length field in the
sense data it returned, which meant that at least the Linux stack
ignored the ASC/ASCQ fields. For example, without this patch, on a
tcm_loop device:
Current SCSI specs say that the "response format" field in the standard
INQUIRY response should be set to 2, and all the real SCSI devices I
have do put 2 here. So let's do that too.
Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sym53c8xx_slave_destroy unconditionally assumes that sym53c8xx_slave_alloc has
successfully allocated a sym_lcb. This can lead to a NULL pointer dereference
(exposed by commit 4e6c82b).
For a UP processor, it is likely that no _MAT method or MADT table is defined.
So currently acpi_get_cpuid(...) always returns -1 for a UP processor.
This is wrong; it should return a valid value for CPU0.
On the other hand, the BIOS may define multiple CPU handles even for a UP
processor, for example
Reported-and-tested-by: wallak@free.fr Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The call to acpi_os_validate_address in acpi_ds_get_region_arguments was
removed by mistake in commit 9ad19ac (ACPICA: Split large dsopcode and
dsload.c files).
Put it back.
Reported-and-bisected-by: Luca Tettamanti <kronos.it@gmail.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
32 bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregard reserved fields.
ia64 handled the PXM fields almost consistently, but with a dependency on
SGI's sn2 platform. This patch leaves the sn2 logic in, but also uses
16/32 bits for PXM if the SRAT is rev 2 or higher.
The patch also adds __init to the two pxm accessor functions, as they
access __initdata now and are only called from an __init function anyway.
Note that the code only uses 16 bits for the PXM field in the processor
proximity field; the patch does not address this, as 16 bits are more than
enough.
Signed-off-by: Kurt Garloff <kurt@garloff.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
32 bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregard reserved fields.
x86/x86-64 was rather inconsistent prior to this patch; it used 8 bits
for the PXM field in cpu_affinity, but 32 bits in mem_affinity.
This patch makes it consistent: either use 8 bits consistently (SRAT
rev 1 or lower) or 32 bits (SRAT rev 2 or higher).
cc: x86@kernel.org Signed-off-by: Kurt Garloff <kurt@garloff.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
32 bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregard reserved fields.
In order to know whether or not to do so, we must know which revision the
SRAT table has.
This patch stores the SRAT table revision for later consumption
by arch-specific __init functions.
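A sketch of what storing the revision can look like; the variable name and the
hook in the SRAT table parser are assumptions:

	int acpi_srat_revision __initdata;	/* consumed by arch __init code */

	static int __init acpi_parse_srat(struct acpi_table_header *table)
	{
		acpi_srat_revision = table->revision;
		return 0;
	}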
Signed-off-by: Kurt Garloff <kurt@garloff.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
smp_call_function() only makes all other CPUs execute a specific function,
while in intel_idle we expect all CPUs to execute it. Without the fix, we could
have one CPU which still has auto_demotion enabled or has no broadcast timer
set up. Usually we don't see an impact because auto demotion only harms power,
and the intel_idle init is called on CPU 0, where the broadcast timer delivers
interrupts, but this could still be a problem.
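A sketch of the fix, assuming the intel_idle helpers keep their names
(__setup_broadcast_timer, auto_demotion_disable):

	/* on_each_cpu() runs the callback on the calling CPU as well,
	 * unlike smp_call_function(), which skips it. */
	on_each_cpu(__setup_broadcast_timer, (void *)true, 1);
	if (auto_demotion_disable_flags)
		on_each_cpu(auto_demotion_disable, NULL, 1);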
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
kvm -cpu host passes the original cpuid info to the guest.
The latest kvm version seems to return true for the mwait_leaf cpuid
function on recent Intel CPUs, but it does not return the mwait
C-states (mwait_substates); instead, zero is returned.
While real CPUs seem to always return non-zero values, the intel_idle
driver should not get active in the kvm (mwait_substates == 0) case and
should bail out.
Otherwise a NULL pointer dereference will happen later when the
cpuidle subsystem tries to get active:
[0.984807] BUG: unable to handle kernel NULL pointer dereference at (null)
[0.984807] IP: [<(null)>] (null)
...
[0.984807][<ffffffff8143cf34>] ? cpuidle_idle_call+0xb4/0x340
[0.984807][<ffffffff8159e7bc>] ? __atomic_notifier_call_chain+0x4c/0x70
[0.984807][<ffffffff81001198>] ? cpu_idle+0x78/0xd0
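A minimal sketch of the bail-out in the driver's probe path (exact placement is
an assumption):

	/* Under kvm the CPUID mwait leaf may advertise MWAIT but report
	 * zero sub-states; there is nothing for intel_idle to drive. */
	if (!mwait_substates)
		return -ENODEV;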
Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Bruno Friedmann <bruno@ioda-net.ch> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
TOMOYO 2.5 in Linux 3.2 and later handles Unix domain socket's address.
Thus, tomoyo_correct_word2() needs to accept \000 as a valid character, or
TOMOYO 2.5 cannot handle Unix domain's abstract socket address.
Reported-by: Steven Allen <steven@stebalien.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The two DACs for the front output and the surround/center/LFE/back
outputs are wired up out of phase, so when channels are duplicated,
their sound can cancel each other out and result in a weaker bass
response. To fix this, reverse the polarity of the signal driving
the front output.
Reported-and-tested-by: Daniel Hill <daniel@enemyplanet.geek.nz> Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Several users have reported "choppy" audio under the 3.2 kernel,
and that changing position_fix to 1 has resolved their problem.
The chip is an nVidia Corporation MCP89 High Definition Audio,
[10de:0d94] (rev a2).
Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.
This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open. That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.
That is different from our previous behavior, but much simpler. If
somebody actually finds a load where this matters, we'll need to revert
this commit.
I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve. So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.
Reported-by: Jüri Aedla <asd@ut.ee> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Changes with respect to 3.3: return -ENOTTY from scsi_verify_blk_ioctl
and -ENOIOCTLCMD from sd_compat_ioctl. ]
Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
will pass the command to the underlying block device. This is
well-known, but it is also a large security problem when (via Unix
permissions, ACLs, SELinux or a combination thereof) a program or user
needs to be granted access only to part of the disk.
This patch lets partitions forward a small set of harmless ioctls;
others are logged with printk so that we can see which ioctls are
actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred.
Of course it was being sent to a (partition on a) hard disk, so it would
have failed with ENOTTY and the patch isn't changing anything in
practice. Still, I'm treating it specially to avoid spamming the logs.
In principle, this restriction should include programs running with
CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and
/dev/sdb, it still should not be able to read/write outside the
boundaries of /dev/sda2 independent of the capabilities. However, for
now programs with CAP_SYS_RAWIO will still be allowed to send the
ioctls. Their actions will still be logged.
This patch does not affect the non-libata IDE driver. That driver
however already tests for bd != bd->bd_contains before issuing some
ioctl; it could be restricted further to forbid these ioctls even for
programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.
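For reference, a simplified sketch of the partition check described above; the
exact whitelist, ordering, and return codes in the real scsi_verify_blk_ioctl()
may differ:

	int scsi_verify_blk_ioctl(struct block_device *bd, unsigned int cmd)
	{
		if (bd && bd == bd->bd_contains)
			return 0;		/* whole device: no restriction */

		switch (cmd) {
		case SG_GET_VERSION_NUM:
		case SCSI_IOCTL_GET_IDLUN:
		case SCSI_IOCTL_GET_BUS_NUMBER:
		case SG_EMULATED_HOST:
		case CDROM_GET_CAPABILITY:
			return 0;		/* treated as harmless on a partition */
		default:
			break;
		}

		/* everything else is logged; CAP_SYS_RAWIO is still let through */
		printk_ratelimited(KERN_WARNING
				   "%s: sending ioctl %x to a partition!\n",
				   current->comm, cmd);
		if (capable(CAP_SYS_RAWIO))
			return 0;

		return -ENOTTY;			/* per the 3.3 backport note above */
	}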
Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Make it also print the command name when warning - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For 32-bit architectures using standard jiffies the idletime calculation
in uptime_proc_show will quickly overflow. It takes (2^32 / HZ) seconds
of idle-time, or e.g. 12.45 days with no load on a quad-core with HZ=1000.
Switch to 64-bit calculations.
Cc: Michael Abbott <michael.abbott@diamond.ac.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the USB ID for the touchpanel in the Acer Iconia W500. The panel
supports up to five fingers, hence the addition of a new panel type.
Signed-off-by: Marek Vasut <marek.vasut@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Backport note:
This patch is a full revert of commit b23b025f ("mac80211: Optimize
scans on current operating channel."). In the upstream revert e76aadc5 we
kept some bits from that commit, which are needed for the upstream version
of mac80211.
The on-channel work optimisations have caused a
number of issues, and the code is unfortunately
very complex and almost impossible to follow.
Instead of attempting to put in more workarounds
let's just remove those optimisations, we can
work on them again later, after we change the
whole auth/assoc design.
This should fix rate_control_send_low() warnings,
see RH bug 731365.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
One bio can have at most BIO_MAX_PAGES pages. We should limit it, because
otherwise bio_alloc will fail when there are many pages in one read/write_pagelist.
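A sketch of the cap at the point where the bio is allocated (helper context is
an assumption):

	/* never ask bio_alloc() for more pages than one bio can hold */
	npg = min_t(int, npg, BIO_MAX_PAGES);
	bio = bio_alloc(GFP_NOIO, npg);
	if (!bio)
		return NULL;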
bl_free_block_dev() may sleep. We cannot call it with a spinlock held.
Besides, there is no need to take bm_lock, as we are the last user freeing bm_devlist.
Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may
cause shrink_dcache_parent() to loop forever.
Here's what appears to happen:
1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1
2 - CPU1: select_parent(P) locks P->d_lock
3 - CPU0: shrink_dentry_list() locks C->d_lock
dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock
4 - CPU1: select_parent(P) locks C->d_lock,
moves C from dispose list being processed on CPU0 to the new
dispose list, returns 1
5 - CPU0: shrink_dentry_list() finds dispose list empty, returns
6 - Goto 2 with CPU0 and CPU1 switched
Basically select_parent() steals the dentry from shrink_dentry_list() and thinks
it found a new one, causing shrink_dentry_list() to think it's making progress
and loop over and over.
One way to trigger this is to make udev call stat() on a sysfs file while it
is going away.
Having a file in /lib/udev/rules.d/ with only this one rule seems to do the trick:
while true; do
echo -bond0 > /sys/class/net/bonding_masters
echo +bond0 > /sys/class/net/bonding_masters
echo -bond1 > /sys/class/net/bonding_masters
echo +bond1 > /sys/class/net/bonding_masters
done
One fix would be to check all callers and prevent concurrent calls to
shrink_dcache_parent(). But I think a better solution is to stop the
stealing behavior.
This patch adds a new dentry flag that is set when the dentry is added to the
dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a
new reference just before being pruned.
If the dentry has this flag, select_parent() will skip it and let
shrink_dentry_list() retry pruning it. With select_parent() skipping those
dentries there will not be the appearance of progress (new dentries found) when
there is none, hence shrink_dcache_parent() will not loop forever.
The flag is also set in prune_dcache_sb() for consistency, as suggested by
Linus.
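A sketch of the select_parent() side, assuming the flag and list helper are
named as below:

	/* only move dentries that are not already on someone else's
	 * dispose list; shrink_dentry_list() will retry those itself */
	if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
		dentry_lru_move_list(dentry, dispose);
		dentry->d_flags |= DCACHE_SHRINK_LIST;
		found++;
	}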
select_parent() currently abuses the dentry cache LRU to provide
cleanup features for child dentries that need to be freed. It moves
them to the tail of the LRU, then tells shrink_dcache_parent() to
call __shrink_dcache_sb() to unconditionally move them to a dispose
list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to
relock the dentries to move them off the LRU onto the dispose list,
but otherwise does not touch the dentries that select_parent() moved
to the tail of the LRU. It then passes the dispose list to
shrink_dentry_list(), which tries to free the dentries.
IOWs, the use of __shrink_dcache_sb() is superfluous - we can build
exactly the same list of dentries for disposal directly in
select_parent() and call shrink_dentry_list() instead of calling
__shrink_dcache_sb() to do that. This means that we avoid long holds
on the LRU lock walking the LRU moving dentries to the dispose list.
We also avoid the need to relock each dentry just to move it off the
LRU, reducing the number of times we lock each dentry to dispose of
them in shrink_dcache_parent() from 3 to 2 times.
Further, we remove one of the two callers of __shrink_dcache_sb().
This also means that __shrink_dcache_sb() can be moved back into
prune_dcache_sb() and we no longer have to handle referenced
dentries conditionally, simplifying the code.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a potential integer overflow in uvc_ioctl_ctrl_map(). When a
large xmap->menu_count is passed from userspace, the subsequent call
to kmalloc() will allocate a buffer smaller than expected.
map->menu_count and map->menu_info would later be used in a loop (e.g.
in uvc_query_v4l2_ctrl), which leads to out-of-bounds accesses.
The patch checks the ioctl argument and returns -EINVAL for zero or too
large values in xmap->menu_count.
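A minimal sketch of the validation; the exact upper bound used by the real
patch may differ:

	/* reject counts of zero and counts large enough to overflow the
	 * kmalloc() size calculation (bound is illustrative) */
	if (xmap->menu_count == 0 || xmap->menu_count > 32)
		return -EINVAL;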
In ELF64, the sh_flags field is 64-bits wide. recordmcount was
erroneously treating it as a 32-bit wide field. For little endian
objects this works because the flags of interest (SHF_EXECINSTR)
reside in the lower 32 bits of the word, and you get the same result
with either a 32-bit or 64-bit read. Big endian objects on the
other hand do not work at all with this error.
The fix: Correctly treat sh_flags as 64-bits wide in elf64 objects.
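In other words, something along these lines (the helper is illustrative, not
recordmcount's actual accessor; the types come from <elf.h>):

	#include <elf.h>

	/* sh_flags is an Elf64_Xword (64 bits); reading it as a 32-bit
	 * value loses SHF_EXECINSTR on big endian objects. */
	static int section_is_exec(const Elf64_Shdr *shdr)
	{
		return (shdr->sh_flags & SHF_EXECINSTR) != 0;
	}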
The symptom I observed was that my
__start_mcount_loc..__stop_mcount_loc was empty even though ftrace
function tracing was enabled.
expkey_parse() oopses when handling a 0 length export. This is easily
triggerable from usermode by writing 0 bytes into
'/proc/[proc id]/net/rpc/nfsd.fh/channel'.
Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: linux-nfs@vger.kernel.org Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Lockowners are looked up by file as well as by owner, but we were
forgetting to do a comparison on the file. This could cause an
incorrect result from lockt.
(Note looking up the inode from the lockowner is pretty awkward here.
The data structures need fixing.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Socket callbacks use svc_xprt_enqueue() to add an xprt to a
pool->sp_sockets list. In normal operation a server thread will later
come along and take the xprt off that list. On shutdown, after all the
threads have exited, we instead manually walk the sv_tempsocks and
sv_permsocks lists to find all the xprt's and delete them.
So the sp_sockets lists don't really matter any more. As a result,
we've mostly just ignored them and hoped they would go away.
Which has gotten us into trouble; witness for example ebc63e531cc6
"svcrpc: fix list-corrupting race on nfsd shutdown", the result of Ben
Greear noticing that a still-running svc_xprt_enqueue() could re-add an
xprt to an sp_sockets list just before it was deleted. The fix was to
remove it from the list at the end of svc_delete_xprt(). But that only
made corruption less likely--I can see nothing that prevents a
svc_xprt_enqueue() from adding another xprt to the list at the same
moment that we're removing this xprt from the list. In fact, despite
the earlier xpo_detach(), I don't even see what guarantees that
svc_xprt_enqueue() couldn't still be running on this xprt.
So, instead, note that svc_xprt_enqueue() essentially does:
	lock sp_lock
	if XPT_BUSY unset
		add to sp_sockets
	unlock sp_lock
So, if we do:
set XPT_BUSY on every xprt.
Empty every sp_sockets list, under the corresponding sp_lock.
Then we're left knowing that the sp_sockets lists are all empty and will
stay that way, since any svc_xprt_enqueue() will check XPT_BUSY under
the sp_lock and see it set.
And *then* we can continue deleting the xprt's.
(Thanks to Jeff Layton for being correctly suspicious of this code....)
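A sketch of that ordering; structure and field names follow the 3.2-era sunrpc
code but should be treated as assumptions:

	/* 1: mark every transport busy so svc_xprt_enqueue() backs off */
	list_for_each_entry(xprt, &serv->sv_tempsocks, xpt_list)
		set_bit(XPT_BUSY, &xprt->xpt_flags);
	list_for_each_entry(xprt, &serv->sv_permsocks, xpt_list)
		set_bit(XPT_BUSY, &xprt->xpt_flags);

	/* 2: empty each pool's sp_sockets list under its sp_lock */
	for (i = 0; i < serv->sv_nrpools; i++) {
		struct svc_pool *pool = &serv->sv_pools[i];

		spin_lock_bh(&pool->sp_lock);
		while (!list_empty(&pool->sp_sockets)) {
			xprt = list_first_entry(&pool->sp_sockets,
						struct svc_xprt, xpt_ready);
			list_del_init(&xprt->xpt_ready);
		}
		spin_unlock_bh(&pool->sp_lock);
	}

	/* 3: only now walk sv_tempsocks/sv_permsocks and delete the xprts */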
Cc: Ben Greear <greearb@candelatech.com> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The pool_to and to_pool fields of the global svc_pool_map are freed on
shutdown, but are initialized in nfsd startup only in the
SVC_POOL_PERCPU and SVC_POOL_PERNODE cases.
They *are* initialized to zero on kernel startup. So as long as you use
only SVC_POOL_GLOBAL (the default), this will never be a problem.
You're also OK if you only ever use SVC_POOL_PERCPU or SVC_POOL_PERNODE.
However, the following sequence of events leads to a double-free:
1. set SVC_POOL_PERCPU or SVC_POOL_PERNODE
2. start nfsd: both fields are initialized.
3. shutdown nfsd: both fields are freed.
4. set SVC_POOL_GLOBAL
5. start nfsd: the fields are left untouched.
6. shutdown nfsd: now we try to free them again.
Step 4 is actually unnecessary, since (for some bizarre reason), nfsd
automatically resets the pool mode to SVC_POOL_GLOBAL on shutdown.
Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Lange reported that when he did a 'make localmodconfig', his
config was missing the brcmsmac driver, even though he had the module
loaded.
Looking into this, I found the file:
drivers/net/wireless/brcm80211/brcmsmac/Makefile
had the following in the Makefile:
MODULEPFX := brcmsmac
obj-$(CONFIG_BRCMSMAC) += $(MODULEPFX).o
The way streamline-config.pl works is by parsing all the
obj-$(CONFIG_FOO) += foo.o
lines to find that CONFIG_FOO belongs to the module foo.ko.
But in this case, the brcmsmac.o was not used, but a variable in its place.
Changing streamline-config.pl to remember variables defined in Makefiles
and to substitute them when they are used in the obj-X lines allows
Thomas (and others) to keep their brcmsmac module configured
when it is loaded and "make localmodconfig" is run.
Reported-by: Thomas Lange <thomas-lange2@gmx.de> Tested-by: Thomas Lange <thomas-lange2@gmx.de> Cc: Arend van Spriel <arend@broadcom.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Multiple users of the function tracer can register their functions
with the ftrace_ops structure. The accounting within ftrace will
update the counter on each function record that is being traced.
When the ftrace_ops filtering adds or removes functions, the
function records will be updated accordingly if the ftrace_ops is
still registered.
When an ftrace_ops is removed, the counters of the function records
that the ftrace_ops traces are decremented. When they reach zero,
the functions they represent are modified to stop calling the
mcount code.
When changes are made, the code is updated via stop_machine() with
a command passed to the function to tell it what to do. There is an
ENABLE and DISABLE command that tells the called function to enable
or disable the functions. But the ENABLE is really a misnomer as it
should just update the records, as records that have been enabled
and now have a count of zero should be disabled.
The DISABLE command is used to disable all functions regardless of
their counter values. This is the big off switch and is not the
complement of the ENABLE command.
To make matters worse, when an ftrace_ops is unregistered while another
ftrace_ops is still registered, neither the DISABLE nor the
ENABLE command is set when calling into the stop_machine() function,
and the records will not be updated to match their counters. A command
is passed to that function that will update the mcount code to call
the registered callback directly if it is the only one left. This
means that the ftrace_ops that is still registered will have its callback
called by all functions that have been set for it as well as the ftrace_ops
that was just unregistered.
Here's a way to trigger this bug. Compile the kernel with
CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:
CONFIG_FUNCTION_PROFILER=y
# CONFIG_FUNCTION_GRAPH is not set
This will force the function profiler to use the function tracer instead
of the function graph tracer.
The same problem could have happened with the trace_probe_ops,
but they are modified via the set_ftrace_filter file, which does the
update when the file is closed.
The simple solution is to change ENABLE to UPDATE and call it every
time an ftrace_ops is unregistered.
Since commit 080d676de095 ("aio: allocate kiocbs in batches") iocbs are
allocated in a batch during processing of the first iocbs. All iocbs in a
batch are automatically added to ctx->active_reqs list and accounted in
ctx->reqs_active.
If one of the iocbs submitted by a user (not the last one) fails, further
iocbs are not processed, but they are still present in ctx->active_reqs
and accounted in ctx->reqs_active. This causes the process to get stuck in
the D state in wait_for_all_aios() on exit, since ctx->reqs_active will never
go down to zero. Furthermore, since kiocb_batch_free() frees an iocb
without removing it from the active_reqs list, the list becomes corrupted,
which may cause an oops.
Fix this by removing iocb from ctx->active_reqs and updating
ctx->reqs_active in kiocb_batch_free().
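A sketch of the resulting kiocb_batch_free(); field names follow the 3.2-era
aio code and are assumptions here:

	static void kiocb_batch_free(struct kioctx *ctx, struct kiocb_batch *batch)
	{
		struct kiocb *req, *n;

		spin_lock_irq(&ctx->ctx_lock);
		list_for_each_entry_safe(req, n, &batch->head, ki_batch) {
			list_del(&req->ki_batch);
			list_del(&req->ki_list);	/* drop from ctx->active_reqs */
			kmem_cache_free(kiocb_cachep, req);
			ctx->reqs_active--;
		}
		spin_unlock_irq(&ctx->ctx_lock);
	}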
If ctrls->count is too high, the multiplication could overflow and
array_size would be lower than expected. Mauro and Hans Verkuil
suggested that we cap it at 1024. That comes from the maximum
number of controls, with lots of room for expansion.
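A minimal sketch of the cap (treat the macro name as an assumption):

	#define V4L2_CID_MAX_CTRLS	1024	/* cap suggested by Mauro and Hans */

	if (ctrls->count > V4L2_CID_MAX_CTRLS)
		return -EINVAL;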
This patch fixes a failure to recognize SD cards reported on a Dell
Vostro with O2 Micro SD card reader. Patch 49c468f ("mmc: sd: add
support for uhs bus speed mode selection") caused the problem, by
setting the SDHCI_CTRL_HISPD flag even for legacy timings.
Signed-off-by: Alexander Elbs <alex@segv.de> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Arindam Nath <arindam.nath@amd.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When suspending the host, the tuning timer should be deactivated.
And the HOST_NEEDS_TUNING flag should be set after the tuning timer is
deactivated.
Signed-off-by: Philip Rakity <prakity@marvell.com> Signed-off-by: Aaron Lu <aaron.lu@amd.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a wrong comparison made before setting the interface
voltage in DDR mode.
The variable ddr is assigned either MMC_1_2V_DDR_MODE or MMC_1_8V_DDR_MODE
before the comparison, but the comparison is then done against the extended
CSD value, i.e. (ddr == EXT_CSD_CARD_TYPE_DDR_1_2V).
Signed-off-by: Girish K S <girish.shivananjappa@linaro.org> Acked-by: Subhash Jadavani <subhashj@codeaurora.org> Acked-by: Philip Rakity <prakity@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When adding checks for ACPI resource conflicts to many bus drivers,
not enough attention was paid to the error paths, and for several
drivers this causes 0 to be returned on error in some cases. Fix this
by properly returning a non-zero value on every error.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We switched to dynamic debugging in commit 56e46742e846e4de167dde0e1e1071ace1c882a5
but did not take into account that we no longer control whether a specific
message is enabled or not. So now we lock the "dbg_lock" and release it in
every debugging macro, which makes them not so lightweight.
This commit removes the "dbg_lock" protection from the debugging macros to
fix the issue.
The downside is that now our DBGKEY() stuff is broken, but this is not
critical at all and will be fixed later.
Patch 56e46742e846e4de167dde0e1e1071ace1c882a5 broke UBIFS debugging messages:
before that commit, when UBIFS debugging was enabled, users saw a few useful
debugging messages after mount. However, that patch turned 'dbg_msg()' into
'pr_debug()', so to enable the debugging messages users have to enable them
first via /sys/kernel/debug/dynamic_debug/control, which is very impractical.
This commit makes 'dbg_msg()' use 'printk()' instead of 'pr_debug()', just
as it was before the breakage.
Remove 'static' modifier from the 'vid_hdr' local variable. I do not know
how it slipped in, but this is a bug and will break UBI if someone attaches
2 UBI volumes at the same time.
Patch ab50ff684707031ed4bad2fdd313208ae392e5bb broke UBI debugging messages:
before that commit, when UBI debugging was enabled, users saw a few useful
debugging messages after attaching an MTD device. However, that patch turned
'dbg_msg()' into 'pr_debug()', so to enable the debugging messages users have
to enable them first via /sys/kernel/debug/dynamic_debug/control, which is
very impractical.
This commit makes 'dbg_msg()' use 'printk()' instead of 'pr_debug()', just
as it was before the breakage.
On x86_32, casting the unsigned int result of get_random_int() to
long may result in a negative value. On x86_32 the range of
mmap_rnd() therefore was -255 to 255. The 32-bit mode on x86_64
used 0 to 255 as intended.
The bug was introduced by 675a081 ("x86: unify mmap_{32|64}.c")
in January 2008.
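A sketch of the fixed helper, keeping the arithmetic unsigned (simplified from
the x86 mmap code):

	static unsigned long mmap_rnd(void)
	{
		unsigned long rnd = 0;

		if (current->flags & PF_RANDOMIZE) {
			if (mmap_is_ia32())
				rnd = get_random_int() % (1 << 8);	/* 0..255 */
			else
				rnd = get_random_int() % (1 << 28);
		}
		return rnd << PAGE_SHIFT;
	}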
Signed-off-by: Ludwig Nussel <ludwig.nussel@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: harvey.harrison@gmail.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/201111152246.pAFMklOB028527@wpaz5.hot.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a
function replace_page_cache_page(). This function replaces a page in the
radix-tree with a new page. When doing this, memory cgroup needs to fix
up the accounting information: memcg needs to check the PCG_USED bit, etc.
In some (many?) cases, 'newpage' is on the LRU before calling
replace_page_cache(). So, memcg's LRU accounting information should be
fixed, too.
This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
In that function, old pages will be unaccounted without touching
res_counter and new page will be accounted to the memcg (of old page).
When overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid
races with LRU handling.
Background:
replace_page_cache_page() is called by FUSE code in its splice() handling.
Here, 'newpage' is replacing oldpage, but this newpage is not a newly allocated
page and may be on the LRU. LRU mis-accounting will be critical for memory cgroup
because rmdir() checks that the whole LRU is empty and that there is no account leak.
If a page is on an LRU other than the one it should be on, rmdir() will fail.
This bug was added in March 2011, but there has been no bug report yet. I guess
there are not many people who use memcg and FUSE at the same time with upstream
kernels.
The result of this bug is that an admin cannot destroy a memcg because of the
account leak. So, no panic, no deadlock. And, even if an active cgroup
exists, umount can succeed. So there is no problem at shutdown.
The commit "ath9k: Fix invalid noisefloor reading due to channel update"
preserves the current channel noisefloor readings before updating
channel type at the same channel index. It is also updating the curchan
pointer. As survey updation is also referring curchan pointer to fetch
the appropriate index, which might leads to invalid memory access. This
patch partially reverts the change and stores the noise floor history
buffer before updating channel type w/o updating curchan.
Cc: Gary Morain <gmorain@google.com> Cc: Paul Stewart <pstew@google.com> Reported-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com> Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It is possible that we dereference rx->key == NULL when the driver sets
RX_FLAG_MMIC_STRIPPED but not RX_FLAG_IV_STRIPPED and we are in
promiscuous mode. This happens with rt73usb and rt61pci at least.
Before the commit we always checked rx->key against NULL, so I assume the
fix should be done in mac80211 (the mic_fail path also has a similar check).
Reported-by: Stuart D Gathman <stuart@gathman.org> Reported-by: Kai Wohlfahrt <kai.scorpio@gmail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When downloading firmware into the device, the driver fails to check the
return value when allocating an skb. When the allocation fails, a BUG can be
generated, as seen in https://bugzilla.redhat.com/show_bug.cgi?id=771656.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Building an ARM target we get the following warnings:
CC arch/arm/kernel/setup.o
In file included from arch/arm/kernel/setup.c:39:
arch/arm/include/asm/elf.h:102:1: warning: "vmcore_elf64_check_arch" redefined
In file included from arch/arm/kernel/setup.c:24:
include/linux/crash_dump.h:30:1: warning: this is the location of the previous definition
Quoting Russell King:
"linux/crash_dump.h makes no attempt to include asm/elf.h, but it depends
on stuff in asm/elf.h to determine how stuff inside this file is defined
at parse time.
So, if asm/elf.h is included after linux/crash_dump.h or not at all, you
get a different result from the situation where asm/elf.h is included
before."
So add elf.h header to crash_dump.h to avoid this problem.
The original discussion about this can be found at:
http://www.spinics.net/lists/arm-kernel/msg154113.html
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In kernel v3.2 the initialization sequence for Asix 88772 devices was changed so
that the hardware is reset every time the interface is brought up (ifconfig up),
instead of just at USB probe time. This causes a problem with setting a custom MAC
address on the device, as ax88772_reset causes the MAC address to be reloaded from
EEPROM. This patch fixes the issue by rewriting the MAC address at the end of
ax88772_reset.
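A sketch of the idea at the end of ax88772_reset(); command and helper names
follow the asix driver but are assumptions here:

	/* the reset reloaded the EEPROM MAC; re-program the address the
	 * administrator may have configured on the interface */
	ret = asix_write_cmd(dev, AX_CMD_WRITE_NODE_ID, 0, 0,
			     ETH_ALEN, dev->net->dev_addr);
	if (ret < 0)
		goto out;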
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: Grant Grundler <grundler@chromium.org> Cc: Allan Chou <allan@asix.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In kernel v3.2 the initialization sequence for Asix 88178 devices was changed so
that the hardware is reset every time the interface is brought up (ifconfig up),
instead of just at USB probe time. This causes a problem with setting a custom MAC
address on the device, as ax88178_reset causes the MAC address to be reloaded from
EEPROM. This patch fixes the issue by rewriting the MAC address at the end of
ax88178_reset.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: Grant Grundler <grundler@chromium.org> Cc: Allan Chou <allan@asix.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some Dell BIOSes have MCFG tables that don't report the entire
MMCONFIG area claimed by the chipset. If we move PCI devices into
that claimed-but-unreported area, they don't work.
This quirk reads the AMD MMCONFIG MSRs and adds PNP0C01 resources as
needed to cover the entire area.
Example problem scenario:
BIOS-e820: 00000000cfec5400 - 00000000d4000000 (reserved)
Fam 10h mmconf [d0000000, dfffffff]
PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xd0000000-0xd3ffffff] (base 0xd0000000)
pnp 00:0c: [mem 0xd0000000-0xd3ffffff]
pci 0000:00:12.0: reg 10: [mem 0xffb00000-0xffb00fff]
pci 0000:00:12.0: no compatible bridge window for [mem 0xffb00000-0xffb00fff]
pci 0000:00:12.0: BAR 0: assigned [mem 0xd4000000-0xd40000ff]
Zhihua Che reported a possible memleak in the slub allocator on
CONFIG_PREEMPT=y builds.
It is possible that the current thread migrates right before disabling irqs in
__slab_alloc(). We must check c->freelist again, and perform a normal
allocation instead of scratching c->freelist.
Many thanks to Zhihua Che for spotting this bug, which was introduced in 2.6.39.
V2: It's also possible that an IRQ freed one (or several) object(s) and
populated c->freelist, so it's not a CONFIG_PREEMPT-only problem.
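A sketch of the re-check after interrupts are disabled in __slab_alloc()
(simplified):

	local_irq_save(flags);
	/*
	 * We may have migrated to another CPU before irqs were disabled,
	 * or an IRQ may have freed objects onto c->freelist: check again.
	 */
	object = c->freelist;
	if (object)
		goto load_freelist;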
Reported-by: Zhihua Che <zhihua.che@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Info about new measurements is cached in the iint for performance. When
the inode is flushed from cache, the associated iint is flushed as well.
Subsequent access to the inode will cause the inode to be re-measured and
will attempt to add a duplicate entry to the measurement list.
This patch frees the duplicate measurement memory, fixing a memory leak.
There is a potential integer overflow in process_msg() that could result
in a cross-domain attack.
body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH);
When a malicious guest passes 0xffffffff in msg->hdr.len, the subsequent
call to xb_read() would write to a zero-length buffer.
The other end of this connection is always the xenstore backend daemon
so there is no guest (malicious or otherwise) which can do this. The
xenstore daemon is a trusted component in the system.
However this seems like a reasonable robustness improvement, so we should
have it.
And Ian, when reading the API docs, found that:
The payload length (len field of the header) is limited to 4096
(XENSTORE_PAYLOAD_MAX) in both directions. If a client exceeds the
limit, its xenstored connection will be immediately killed by
xenstored, which is usually catastrophic from the client's point of
view. Clients (particularly domains, which cannot just reconnect)
should avoid this.
so this patch checks against that instead.
This also avoids a potential integer overflow pointed out by Haogang Chen.
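A minimal sketch of the check in process_msg(), before the allocation shown
above:

	if (msg->hdr.len > XENSTORE_PAYLOAD_MAX) {
		kfree(msg);
		return -EINVAL;
	}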
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The amount of memory required for tracking chain buffers is rather
large, and when the host credit count is big, memory allocation
failure occurs inside __get_free_pages.
The fix is to limit the number of chains to 100,000. In addition,
the number of host credits is limited to 30,000 IOs. However this
limitation can be overridden using the command line option
max_queue_depth. The algorithm for calculating
reply_post_queue_depth is changed so that it is equal to
(reply_free_queue_depth + 16); previously it was (reply_free_queue_depth * 2).
Added code to release the spinlock that is used to protect the
raid device list before calling a function that can block. The
blocking was causing a reschedule, and subsequently an attempt was made
to acquire the same lock, resulting in a panic (the NMI watchdog
detecting a CPU lockup).
We only need amd_bus.o for AMD systems with PCI. arch/x86/pci/Makefile
already depends on CONFIG_PCI=y, so this patch just adds the dependency
on CONFIG_AMD_NB.
This factors out the AMD native MMCONFIG discovery so we can use it
outside amd_bus.c.
amd_bus.c reads AMD MSRs so it can remove the MMCONFIG area from the
PCI resources. We may also need the MMCONFIG information to work
around BIOS defects in the ACPI MCFG table.
This assures that a _CRS reserved host bridge window or window region is
not used if it is not addressable by the CPU. The new code either trims
the window to exclude the non-addressable portion or totally ignores the
window if the entire window is non-addressable.
The current code has been shown to be problematic with 32-bit non-PAE
kernels on systems where _CRS reserves resources above 4GB.
Signed-off-by: Gary Hade <garyhade@us.ibm.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Renninger <trenn@novell.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I traced a nasty kexec on panic boot failure to the fact that we had
screaming msi interrupts and we were not disabling the msi messages at
kernel startup. The booting kernel had not enabled those interrupts, so it
was not prepared to handle them.
I can see no reason why we would ever want to leave the msi interrupts
enabled at boot if something else has enabled those interrupts. The pci
spec specifies that msi interrupts should be off by default. Drivers
are expected to enable the msi interrupts if they want to use them. Our
interrupt handling code reprograms the interrupt handlers at boot and
will not be able to do anything useful with an unexpected interrupt.
This patch applies cleanly all of the way back to 2.6.32 where I noticed
the problem.
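One way to do this at device initialization time is to clear the MSI/MSI-X
enable bits directly in config space; a hedged sketch (the actual patch may
hook a different spot):

	int pos;
	u16 ctrl;

	/* MSI: clear an enable bit left set by the previous kernel */
	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
	if (pos) {
		pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &ctrl);
		if (ctrl & PCI_MSI_FLAGS_ENABLE)
			pci_write_config_word(dev, pos + PCI_MSI_FLAGS,
					      ctrl & ~PCI_MSI_FLAGS_ENABLE);
	}

	/* MSI-X: likewise */
	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
	if (pos) {
		pci_read_config_word(dev, pos + PCI_MSIX_FLAGS, &ctrl);
		if (ctrl & PCI_MSIX_FLAGS_ENABLE)
			pci_write_config_word(dev, pos + PCI_MSIX_FLAGS,
					      ctrl & ~PCI_MSIX_FLAGS_ENABLE);
	}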
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we fail to erase a PEB, we free the corresponding erase entry object,
but then re-schedule this object if the error code was something like -EAGAIN.
Obviously, it is a bug to use the object after we have freed it.
Setting the security context of a NFSv4 mount via the context= mount
option is currently broken. The NFSv4 codepath allocates a parsed
options struct, and then parses the mount options to fill it. It
eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
That clobbers the lsm_opts struct that was populated earlier. This bug
also looks like it causes a small memory leak on each v4 mount where
context= is used.
Fix this by moving the initialization of the lsm_opts into
nfs_alloc_parsed_mount_data. Also, add a destructor for
nfs_parsed_mount_data to make it easier to free all of the allocations
hanging off of it, and to ensure that the security_free_mnt_opts is
called whenever security_init_mnt_opts is.
I believe this regression was introduced quite some time ago, probably
by commit c02d7adf.
The NFSv4 bitmap size is unbounded: a server can return an arbitrarily
sized bitmap in an FATTR4_WORD0_ACL request. Instead of using
nfs4_fattr_bitmap_maxsz as a guess at the maximum bitmask returned by a server,
include the bitmap (xdr length plus bitmasks) and the acl data
xdr length in the (cached) acl page data.
This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.
Fix a bug in decode_getacl that returned -EINVAL on ACLs larger than a page when
getxattr was called with a NULL buffer, preventing ACLs larger than PAGE_SIZE
from being retrieved.
Previously an error from filemap_write_and_wait_range would only be of
interest if nfs_file_fsync did not return an error. After this commit,
an error from filemap_write_and_wait_range would mean that (the rest of)
nfs_file_fsync would not even be called.
This means that:
1/ you are more likely to see EIO than e.g. EDQUOT or ENOSPC.
2/ NFS_CONTEXT_ERROR_WRITE remains set for longer so more writes are
synchronous.
This patch restores previous behaviour.
Cc: Josef Bacik <josef@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Lukas Razik <linux@razik.name> reports that on his SPARC system,
booting with an NFS root file system stopped working after commit 56463e50 "NFS: Use super.c for NFSROOT mount option parsing."
We found that the network switch to which Lukas' client was attached
was delaying access to the LAN after the client's NIC driver reported
that its link was up. The delay was longer than the timeouts used in
the NFS client during mounting.
NFSROOT worked for Lukas before commit 56463e50 because in those
kernels, the client's first operation was an rpcbind request to
determine which port the NFS server was listening on. When that
request failed after a long timeout, the client simply selected the
default NFS port (2049). By that time the switch was allowing access
to the LAN, and the mount succeeded.
Neither of these client behaviors is desirable, so reverting 56463e50
is really not a choice. Instead, introduce a mechanism that retries
the NFSROOT mount request several times. This is the same tactic that
normal user space NFS mounts employ to overcome server and network
delays.
As mandated by the standard, in case of an IO error a pNFS
objects layout driver must return its layout. This is because
all device errors are reported to the server as part of the
layout-return buffer.
This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
is done: through a bit flag on the pnfs_layoutdriver_type->flags
member. The flag is set by a layout driver that wants a
layout_return performed at pnfs_ld_{write,read}_done in case
of an error.
(Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
because this code is never called outside of pnfs.c and pnfs IO
paths)
Without this patch, 3.[0-2] kernels leak memory and hit an annoying
WARN_ON after every IO error when utilizing the pnfs-obj driver.
Somewhere along the way, pNFS IO errors were switched to be
communicated via a special iodata->pnfs_error member instead
of the regular RPC members, but objlayout was not switched over.
Fix that!
Without this fix any IO error causes a hang, because IO is not
switched to the MDS and pages are never cleared or read.
[Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It would previously write basically random bits to PCI configuration space...
Not very surprising that the GPU tended to stop responding completely. The
resulting MCE even froze the whole machine sometimes.
Now resetting the GPU after a lockup has at least a fighting chance of
succeeding.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We often end up missing fences on older asics with
writeback enabled which leads to delays in the userspace
accel code, so just disable it by default on those asics.
Reported-by: Helge Deller <deller@gmx.de> Reported-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This change was verified to fix both of the no-video issues I've
investigated. I've also checked the checksum calculation with fglrx on:
RV620, HD54xx, HD5450, HD6310, HD6320.