Currently the selected timer backend is referred at any moment from
the running PCM callbacks. When the backend is switched, it's
possible to lead to inconsistency from the running backend. This was
pointed by syzkaller fuzzer, and the commit [7ee96216c31a: ALSA:
dummy: Disable switching timer backend via sysfs] disabled the dynamic
switching for avoiding the crash.
This patch improves the handling of timer backend switching. It keeps
the reference to the selected backend during the whole operation of an
opened stream so that it won't be changed by other streams.
Together with this change, the hrtimer parameter is reenabled as
writable now.
NOTE: this patch also turned out to fix the still remaining race.
Namely, ops was still replaced dynamically at dummy_pcm_open:
static int dummy_pcm_open(struct snd_pcm_substream *substream)
{
....
dummy->timer_ops = &dummy_systimer_ops;
if (hrtimer)
dummy->timer_ops = &dummy_hrtimer_ops;
Since dummy->timer_ops is common among all streams, and when the
replacement happens during accesses of other streams, it may lead to a
crash. This was actually triggered by syzkaller fuzzer and KASAN.
This patch rewrites the code not to use the ops shared by all streams
any longer, too.
The hda_jack_tbl entries are managed by snd_array for allowing
multiple jacks. It's good per se, but the problem is that struct
hda_jack_callback keeps the hda_jack_tbl pointer. Since snd_array
doesn't preserve each pointer at resizing the array, we can't keep the
original pointer but have to deduce the pointer at each time via
snd_array_entry() instead. Actually, this resulted in the deference
to the wrong pointer on codecs that have many pins such as CS4208.
This patch replaces the pointer to the NID value as the search key.
As an unexpected good side effect, this even simplifies the code, as
only NID is needed in most cases.
The original commit disabled the aamixer path due to the noise
problem, but it turned out that some mobo with the same PCI SSID
doesn't suffer from the issue, and the disabled function (analog
loopback) is still demanded by users.
Since the recent commit [e7fdd52779a6: ALSA: hda - Implement loopback
control switch for Realtek and other codecs], we have the dynamic
mixer switch to enable/disable the aamix path, and we don't have to
disable the path statically any longer. So, let's revert the
disablement, so that only the user suffering from the noise problem
can turn off the aamix on the fly.
sound/pci/hda/patch_hdmi.c:460 hdmi_eld_ctl_get()
error: __memcpy() 'eld->eld_buffer' too small (256 vs 512)
I have a hard time figuring out if this can ever cause an information leak
(I don't think so), but nonetheless it does not hurt to increase the
robustness of the code.
Fixes: 68e03de98507 ('ALSA: hda - hdmi: Do not expose eld data when eld is invalid') Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mac Mini 7,1 model with CS4208 codec reports the headphone jack
detection wrongly in an inverted way. Moreover, the advertised pins
for the audio input and SPDIF output have actually no jack detection.
This patch addresses these issues. The inv_jack_detect flag is set
for fixing the headphone jack detection, and the pin configs for audio
input and SPDIF output are marked as non-detectable.
A slave timer element also unlinks at snd_timer_stop() but it takes
only slave_active_lock. When a slave is assigned to a master,
however, this may become a race against the master's interrupt
handling, eventually resulting in a list corruption. The actual bug
could be seen with a syzkaller fuzzer test case in BugLink below.
As a fix, we need to take timeri->timer->lock when timer isn't NULL,
i.e. assigned to a master, while the assignment to a master itself is
protected by slave_active_lock.
In snd_timer_notify1(), the wrong timer instance was passed for slave
ccallback function. This leads to the access to the wrong data when
an incompatible master is handled (e.g. the master is the sequencer
timer and the slave is a user timer), as spotted by syzkaller fuzzer.
snd_timer_user_read() has a potential race among parallel reads, as
qhead and qused are updated outside the critical section due to
copy_to_user() calls. Move them into the critical section, and also
sanitize the relevant code a bit.
Although ALSA timer code got hardening for races, it still causes
use-after-free error. This is however rather a corrupted linked list,
not actually the concurrent accesses. Namely, when timer start is
triggered twice, list_add_tail() is called twice, too. This ends
up with the link corruption and triggers KASAN error.
The simplest fix would be replacing list_add_tail() with
list_move_tail(), but fundamentally it's the problem that we don't
check the double start/stop correctly. So, the right fix here is to
add the proper checks to snd_timer_start() and snd_timer_stop() (and
their variants).
In ALSA timer core, the active timer instance is managed in
active_list linked list. Each element is added / removed dynamically
at timer start, stop and in timer interrupt. The problem is that
snd_timer_interrupt() has a thinko and leaves the element in
active_list when it's the last opened element. This eventually leads
to list corruption or use-after-free error.
This hasn't been revealed because we used to delete the list forcibly
in snd_timer_stop() in the past. However, the recent fix avoids the
double-stop behavior (in commit [f784beb75ce8: ALSA: timer: Fix link
corruption due to double start or stop]), and this leak hits reality.
This patch fixes the link management in snd_timer_interrupt(). Now it
simply unlinks no matter which stream is.
This is a minor code cleanup without any functional changes:
- Kill keep_flag argument from _snd_timer_stop(), as all callers pass
only it false.
- Remove redundant NULL check in _snd_timer_stop().
The port subscription code uses double mutex locks for source and
destination ports, and this may become racy once when wrongly set up.
It leads to lockdep warning splat, typically triggered by fuzzer like
syzkaller, although the actual deadlock hasn't been seen, so far.
This patch simplifies the handling by reducing to two single locks, so
that no lockdep warning will be trigger any longer.
By splitting to two actions, a still-in-progress element shall be
added in one list while handling another. For ignoring this element,
a new check is added in deliver_to_subscribers().
Along with it, the code to add/remove the subscribers list element was
cleaned up and refactored.
ALSA sequencer may open/close and control ALSA timer instance
dynamically either via sequencer events or direct ioctls. These are
done mostly asynchronously, and it may call still some timer action
like snd_timer_start() while another is calling snd_timer_close().
Since the instance gets removed by snd_timer_close(), it may lead to
a use-after-free.
This patch tries to address such a race by protecting each
snd_timer_*() call via the existing spinlock and also by avoiding the
access to timer during close call.
While performing hw_free, DPCM checks the BE state but leaves out
the suspend state. The suspend state needs to be checked as well,
as we might be suspended and then usermode closes rather than
resuming the audio stream.
This was found by a stress testing of system with playback in
loop and killed after few seconds running in background and second
script running suspend-resume test in loop
Signed-off-by: Vinod Koul <vinod.koul@intel.com> Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are potential deadlocks in PCM OSS emulation code while
accessing read/write and mmap concurrently. This comes from the
infamous mmap_sem usage in copy_from/to_user(). Namely,
The rawmidi read and write functions manage runtime stream status
such as runtime->appl_ptr and runtime->avail. These point where to
copy the new data and how many bytes have been copied (or to be
read). The problem is that rawmidi read/write call copy_from_user()
or copy_to_user(), and the runtime spinlock is temporarily unlocked
and relocked while copying user-space. Since the current code
advances and updates the runtime status after the spin unlock/relock,
the copy and the update may be asynchronous, and eventually
runtime->avail might go to a negative value when many concurrent
accesses are done. This may lead to memory corruption in the end.
For fixing this race, in this patch, the status update code is
performed in the same lock before the temporary unlock. Also, the
spinlock is now taken more widely in snd_rawmidi_kernel_read1() for
protecting more properly during the whole operation.
NULL user-space buffer can be passed even in a normal path, thus it's
not good to spew a kernel warning with stack trace at each time.
Just drop snd_BUG_ON() macro usage there.
Also a similar warning is found but in another path:
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82be2c0d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
[<ffffffff81355139>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
[<ffffffff81355369>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
[<ffffffff8527e69a>] rawmidi_transmit_ack+0x24a/0x3b0 sound/core/rawmidi.c:1133
[<ffffffff8527e851>] snd_rawmidi_transmit_ack+0x51/0x80 sound/core/rawmidi.c:1163
[<ffffffff852d9046>] snd_virmidi_output_trigger+0x2b6/0x570 sound/core/seq/seq_virmidi.c:185
[< inline >] snd_rawmidi_output_trigger sound/core/rawmidi.c:150
[<ffffffff85285a0b>] snd_rawmidi_kernel_write1+0x4bb/0x760 sound/core/rawmidi.c:1252
[<ffffffff85287b73>] snd_rawmidi_write+0x543/0xb30 sound/core/rawmidi.c:1302
[<ffffffff817ba5f3>] __vfs_write+0x113/0x480 fs/read_write.c:528
[<ffffffff817bc087>] vfs_write+0x167/0x4a0 fs/read_write.c:577
[< inline >] SYSC_write fs/read_write.c:624
[<ffffffff817bf371>] SyS_write+0x111/0x220 fs/read_write.c:616
[<ffffffff86660276>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
In the former case, the reason is that virmidi has an open code
calling snd_rawmidi_transmit_ack() with the value calculated outside
the spinlock. We may use snd_rawmidi_transmit() in a loop just for
consuming the input data, but even there, there is a race between
snd_rawmidi_transmit_peek() and snd_rawmidi_tranmit_ack().
Similarly in the latter case, it calls snd_rawmidi_transmit_peek() and
snd_rawmidi_tranmit_ack() separately without protection, so they are
racy as well.
The patch tries to address these issues by the following ways:
- Introduce the unlocked versions of snd_rawmidi_transmit_peek() and
snd_rawmidi_transmit_ack() to be called inside the explicit lock.
- Rewrite snd_rawmidi_transmit() to be race-free (the former case).
- Make the split calls (the latter case) protected in the rawmidi spin
lock.
ALSA OSS sequencer spews a kernel error message ("ALSA: seq_oss: too
many applications") when user-space tries to open more than the
limit. This means that it can easily fill the log buffer.
Since it's merely a normal error, it's safe to suppress it via
pr_debug() instead.
ALSA sequencer OSS emulation code has a sanity check for currently
opened devices, but there is a thinko there, eventually it spews
warnings and skips the operation wrongly like:
WARNING: CPU: 1 PID: 7573 at sound/core/seq/oss/seq_oss_synth.c:311
ALSA dummy driver can switch the timer backend between system timer
and hrtimer via its hrtimer module option. This can be also switched
dynamically via sysfs, but it may lead to a memory corruption when
switching is done while a PCM stream is running; the stream instance
for the newly switched timer method tries to access the memory that
was allocated by another timer method although the sizes differ.
As the simplest fix, this patch just disables the switch via sysfs by
dropping the writable bit.
Some architectures like PowerPC can handle the maximum struct size in
an ioctl only up to 13 bits, and struct snd_compr_codec_caps used by
SNDRV_COMPRESS_GET_CODEC_CAPS ioctl overflows this limit. This
problem was revealed recently by a powerpc change, as it's now treated
as a fatal build error.
This patch is a stop-gap for that: for architectures with less than 14
bit ioctl struct size, get rid of the handling of the relevant ioctl.
We should provide an alternative equivalent ioctl code later, but for
now just paper over it. Luckily, the compress API hasn't been used on
such architectures, so the impact must be effectively zero.
Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Broxton, to make sure the reset controller works properly,
MISCBDCGE bit (bit 6) in CGCTL (0x48) of PCI configuration space
need be cleared before reset and set back to 1 after reset.
Otherwise, it may prevent the CORB/RIRB logic from being reset.
Since the build of PCM timer may be disabled via Kconfig now, each
driver that provides a timer interface needs to set CONFIG_SND_TIMER
explicitly. Otherwise it may get a build error due to missing
symbol.
Fixes: 90bbaf66ee7b ('ALSA: timer: add config item to export PCM timer disabling for expert') Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The return type "unsigned int" was used by the get_formation_index function
despite of the aspect that it will eventually return a negative error code.
So, change to signed int and get index by reference in the parameters.
Done with the help of Coccinelle.
[Fix the missing braces suggested by Julia Lawall -- tiwai]
The 'umidi' object will be free'd on the error path by snd_usbmidi_free()
when tearing down the rawmidi interface. So we shouldn't try to free it
in snd_usbmidi_create() after having registered the rawmidi interface.
TEAC UD-501/UD-503/NT-503 fail to switch properly between different
rate/format. Similar to 'Playback Design', this patch corrects the
invalid clock source error for TEAC products and avoids complete
freeze of the usb interface of 503 series.
If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to
prevent short sleeps, but we do not account for that in interfaces which
retrieve the remaining time.
Helge observed that timerfd can return a remaining time larger than the
relative timeout. That's not expected and breaks userland test programs.
Store the information that the timer was armed relative and provide functions
to adjust the remaining time. To avoid bloating the hrtimer struct make state
a u8, which as a bonus results in better code on x86 at least.
Reported-and-tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: linux-m68k@lists.linux-m68k.org Cc: dhowells@redhat.com Link: http://lkml.kernel.org/r/20160114164159.273328486@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not safe for an integrity profile to be changed while i/o is
in-flight in the queue. Prevent adding new disks or otherwise online
spares to an array if the device has an incompatible integrity profile.
The original change to the blk_integrity_unregister implementation in
md, commmit c7bfced9a671 "md: suspend i/o during runtime
blk_integrity_unregister" introduced an immediate hang regression.
This policy of disallowing changes the integrity profile once one has
been established is shared with DM.
Here is an abbreviated log from a test run that:
1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [ 59.076127]
2/ Tries to add an integrity-disabled device (pmem1m) [ 90.489209]
3/ Retries with an integrity-enabled device (pmem1s) [ 205.671277]
[ 59.076127] md/raid1:md0: active with 1 out of 2 mirrors
[ 59.078302] md: data integrity enabled on md0
[..]
[ 90.489209] md0: incompatible integrity profile for pmem1m
[..]
[ 205.671277] md: super_written gets error=-5
[ 205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device.
[ 205.677386] md/raid1:md0: Operation continuing on 1 devices.
[ 205.683037] RAID1 conf printout:
[ 205.684699] --- wd:1 rd:2
[ 205.685972] disk 0, wo:0, o:1, dev:pmem0s
[ 205.687562] disk 1, wo:1, o:1, dev:pmem1s
[ 205.691717] md: recovery of RAID array md0
Fixes: c7bfced9a671 ("md: suspend i/o during runtime blk_integrity_unregister") Cc: Mike Snitzer <snitzer@redhat.com> Reported-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a partial revert of commit ed8d1cf07cb16d ("[media] Export I2C
module alias information in missing drivers") that exported the module
aliases for the I2C drivers that were missing to make autoload to work.
But there is a bug report [0] that auto load of the ir-kbd-i2c driver
cause the Hauppauge HD-PVR driver to not behave correctly.
This is a hdpvr latent bug that was just exposed by ir-kbd-i2c module
autoloading working and will also happen if the I2C driver is built-in
or a user calls modprobe to load the module and register the driver.
But there is a regression experimented by users so until the real bug
is fixed, let's not export the module alias for the ir-kbd-i2c driver
even when this just masks the actual issue.
Fixes: ed8d1cf07cb1 ("[media] Export I2C module alias information in missing drivers") Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On a 64bit kernel build the compiler aligns the _sifields union in the
struct siginfo_t on a 64bit address. The __ARCH_SI_PREAMBLE_SIZE define
compensates for this alignment and thus fixes the wait testcase of the
strace package.
The symptoms of a wrong __ARCH_SI_PREAMBLE_SIZE value is that
_sigchld.si_stime variable is missed to be copied and thus after a
copy_siginfo() will have uninitialized values.
PA-RISC doesn't have atomic instructions to modify page table entries, so it
takes spinlock in the TLB handler and modifies the page table entry
non-atomically. If you modify the page table entry without the spinlock, you
may race with TLB handler on another CPU and your modification may be lost.
Protect against that with usage of purge_tlb_start() and purge_tlb_end() which
handles the TLB spinlock.
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with a slow console devices, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Calvin Owens <calvinowens@fb.com> Acked-by: Jan Kara <jack@suse.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Kyle McMartin <kyle@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a max stack trace is discovered, the stack dump is saved. In order to
not record the overhead of the stack tracer, the ip of the traced function
is looked for within the dump. The trace is started from the location of
that function. But if for some reason the ip is not found, the entire stack
trace is then truncated. That's not very useful. Instead, print everything
if the ip of the traced function is not found within the trace.
This issue showed up on s390.
Link: http://lkml.kernel.org/r/20160129102241.1b3c9c04@gandalf.local.home Fixes: 72ac426a5bb0 ("tracing: Clean up stack tracing and fix fentry updates") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While cleaning the stacktrace code I unintentially changed the skip depth of
trace_buffer_unlock_commit_regs() from 0 to 6. kprobes uses this function,
and with skipping 6 call backs, it can easily produce no stack.
Commit 36e097a8a297 ("PCI: Split out bridge window override of minimum
allocation address") claimed to do no functional changes but unfortunately
did: The "min" variable is altered. At least the AVM A1 PCMCIA adapter was
no longer detected, breaking ISDN operation.
Use a local copy of "min" to restore the previous behaviour.
[bhelgaas: avoid gcc "?:" extension for portability and readability] Fixes: 36e097a8a297 ("PCI: Split out bridge window override of minimum allocation address") Signed-off-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On -RT and if kernel is booting with "threadirqs" cmd line parameter,
PCIe/PCI (MSI) IRQ cascade handlers (like dra7xx_pcie_msi_irq_handler())
will be forced threaded and, as result, will generate warnings like this:
WARNING: CPU: 1 PID: 82 at kernel/irq/handle.c:150 handle_irq_event_percpu+0x14c/0x174()
irq 460 handler irq_default_primary_handler+0x0/0x14 enabled interrupts
Backtrace:
(warn_slowpath_common) from (warn_slowpath_fmt+0x38/0x40)
(warn_slowpath_fmt) from (handle_irq_event_percpu+0x14c/0x174)
(handle_irq_event_percpu) from (handle_irq_event+0x84/0xb8)
(handle_irq_event) from (handle_simple_irq+0x90/0x118)
(handle_simple_irq) from (generic_handle_irq+0x30/0x44)
(generic_handle_irq) from (dra7xx_pcie_msi_irq_handler+0x7c/0x8c)
(dra7xx_pcie_msi_irq_handler) from (irq_forced_thread_fn+0x28/0x5c)
(irq_forced_thread_fn) from (irq_thread+0x128/0x204)
This happens because all of them invoke generic_handle_irq() from the
requested handler. generic_handle_irq() grabs raw_locks and thus needs to
run in raw-IRQ context.
This issue was originally reproduced on TI dra7-evem, but, as was
identified during discussion [1], other hosts can also suffer from this
issue. Fix all them at once by marking PCIe/PCI (MSI) IRQ cascade handlers
IRQF_NO_THREAD explicitly.
Commits such as commit 853f1c58c4b2 ("mtd: nand: omap2: show parent
device structure in sysfs") attempt to rely on the core MTD code to set
the MTD name based on the parent device. However, nand_base tries to set
a different default name according to the flash name (e.g., extracted
from the ONFI parameter page), which means NAND drivers will never make
use of the MTD defaults. This is not the intention of commit 853f1c58c4b2.
This results in problems when trying to use the cmdline partition
parser, since the MTD name is different than expected. Let's fix this by
providing a default NAND name, where possible.
Note that this is not really a great default name in the long run, since
this means that if there are multiple MTDs attached to the same
controller device, they will have the same name. But that is an existing
issue and requires future work on a better controller vs. flash chip
abstraction to fix properly.
Fixes: 853f1c58c4b2 ("mtd: nand: omap2: show parent device structure in sysfs") Reported-by: Heiko Schocher <hs@denx.de> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com> Tested-by: Heiko Schocher <hs@denx.de> Cc: Heiko Schocher <hs@denx.de> Cc: Frans Klaver <fransklaver@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the below Oops when trying to modprobe wlcore_spi.
The oops occurs because the wl1271_power_{off,on}()
function doesn't check the power() function pointer.
[ 23.665030] [<bf2581f0>] (wl12xx_set_power_on [wlcore]) from
[<bf25f7ac>] (wlcore_nvs_cb+0x118/0xa4c [wlcore])
[ 23.675604] [<bf25f7ac>] (wlcore_nvs_cb [wlcore]) from [<c04387ec>]
(request_firmware_work_func+0x30/0x58)
[ 23.685784] [<c04387ec>] (request_firmware_work_func) from
[<c0058e2c>] (process_one_work+0x1b4/0x4b4)
[ 23.695591] [<c0058e2c>] (process_one_work) from [<c0059168>]
(worker_thread+0x3c/0x4a4)
[ 23.704124] [<c0059168>] (worker_thread) from [<c005ee68>]
(kthread+0xd4/0xf0)
[ 23.711747] [<c005ee68>] (kthread) from [<c000f598>]
(ret_from_fork+0x14/0x3c)
[ 23.719357] Code: bad PC value
[ 23.722760] ---[ end trace 981be8510db9b3a9 ]---
Prevent oops by validationg power() pointer value before
calling the function.
Signed-off-by: Uri Mashiach <uri.mashiach@compulab.co.il> Acked-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The maximum chunks used by the function is
(SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE + 1).
The original commands array had space for
(SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE) commands.
When the last chunk is used (len > 4 * WSPI_MAX_CHUNK_SIZE), the last
command is stored outside the bounds of the commands array.
Oops 5 (page fault) is generated during current wl1271 firmware load
attempt:
root@debian-armhf:~# ifconfig wlan0 up
[ 294.312399] Unable to handle kernel paging request at virtual address 00203fc4
[ 294.320173] pgd = de528000
[ 294.323028] [00203fc4] *pgd=00000000
[ 294.326916] Internal error: Oops: 5 [#1] SMP ARM
[ 294.331789] Modules linked in: bnep rfcomm bluetooth ipv6 arc4 wl12xx
wlcore mac80211 musb_dsps cfg80211 musb_hdrc usbcore usb_common
wlcore_spi omap_rng rng_core musb_am335x omap_wdt cpufreq_dt thermal_sys
hwmon
[ 294.351838] CPU: 0 PID: 1827 Comm: ifconfig Not tainted 4.2.0-00002-g3e9ad27-dirty #78
[ 294.360154] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 294.366557] task: dc9d6d40 ti: de550000 task.ti: de550000
[ 294.372236] PC is at __spi_validate+0xa8/0x2ac
[ 294.376902] LR is at __spi_sync+0x78/0x210
[ 294.381200] pc : [<c049c760>] lr : [<c049ebe0>] psr: 60000013
[ 294.381200] sp : de551998 ip : de5519d8 fp : 00200000
[ 294.393242] r10: de551c8c r9 : de5519d8 r8 : de3a9000
[ 294.398730] r7 : de3a9258 r6 : de3a9400 r5 : de551a48 r4 : 00203fbc
[ 294.405577] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : de3a9000
[ 294.412420] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM
Segment user
[ 294.419918] Control: 10c5387d Table: 9e528019 DAC: 00000015
[ 294.425954] Process ifconfig (pid: 1827, stack limit = 0xde550218)
[ 294.432437] Stack: (0xde551998 to 0xde552000)
...
[ 294.883613] [<c049c760>] (__spi_validate) from [<c049ebe0>]
(__spi_sync+0x78/0x210)
[ 294.891670] [<c049ebe0>] (__spi_sync) from [<bf036598>]
(wl12xx_spi_raw_write+0xfc/0x148 [wlcore_spi])
[ 294.901661] [<bf036598>] (wl12xx_spi_raw_write [wlcore_spi]) from
[<bf21c694>] (wlcore_boot_upload_firmware+0x1ec/0x458 [wlcore])
[ 294.914038] [<bf21c694>] (wlcore_boot_upload_firmware [wlcore]) from
[<bf24532c>] (wl12xx_boot+0xc10/0xfac [wl12xx])
[ 294.925161] [<bf24532c>] (wl12xx_boot [wl12xx]) from [<bf20d5cc>]
(wl1271_op_add_interface+0x5b0/0x910 [wlcore])
[ 294.936364] [<bf20d5cc>] (wl1271_op_add_interface [wlcore]) from
[<bf15c4ac>] (ieee80211_do_open+0x44c/0xf7c [mac80211])
[ 294.947963] [<bf15c4ac>] (ieee80211_do_open [mac80211]) from
[<c0537978>] (__dev_open+0xa8/0x110)
[ 294.957307] [<c0537978>] (__dev_open) from [<c0537bf8>]
(__dev_change_flags+0x88/0x148)
[ 294.965713] [<c0537bf8>] (__dev_change_flags) from [<c0537cd0>]
(dev_change_flags+0x18/0x48)
[ 294.974576] [<c0537cd0>] (dev_change_flags) from [<c05a55a0>]
(devinet_ioctl+0x6b4/0x7d0)
[ 294.983191] [<c05a55a0>] (devinet_ioctl) from [<c0517040>]
(sock_ioctl+0x1e4/0x2bc)
[ 294.991244] [<c0517040>] (sock_ioctl) from [<c017d378>]
(do_vfs_ioctl+0x420/0x6b0)
[ 294.999208] [<c017d378>] (do_vfs_ioctl) from [<c017d674>]
(SyS_ioctl+0x6c/0x7c)
[ 295.006880] [<c017d674>] (SyS_ioctl) from [<c000f4c0>]
(ret_fast_syscall+0x0/0x54)
[ 295.014835] Code: e1550004e24440340a00007de5953018 (e5942008)
[ 295.021544] ---[ end trace 66ed188198f4e24e ]---
Signed-off-by: Uri Mashiach <uri.mashiach@compulab.co.il> Acked-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When recovery master down, dlm_do_local_recovery_cleanup() only remove
the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
Which will make umount thread falling in dead loop migrating $RECOVERY
to the dead node.
Signed-off-by: xuejiufei <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have found that migration source will trigger a BUG that the refcount
of mle is already zero before put when the target is down during
migration. The situation is as follows:
dlm_migrate_lockres
dlm_add_migration_mle
dlm_mark_lockres_migrating
dlm_get_mle_inuse
<<<<<< Now the refcount of the mle is 2.
dlm_send_one_lockres and wait for the target to become the
new master.
<<<<<< o2hb detect the target down and clean the migration
mle. Now the refcount is 1.
dlm_migrate_lockres woken, and put the mle twice when found the target
goes down which trigger the BUG with the following message:
"ERROR: bad mle: ".
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Many codecs, typically found on Realtek codecs, have the analog
loopback path merged to the secondary input of the middle of the
output paths. Currently, we don't offer the dynamic switching in such
configuration but let each loopback path mute by itself.
This should work well in theory, but in reality, we often see that
such a dead loopback path causes some background noises even if all
the elements get muted. Such a problem has been fixed by adding the
quirk accordingly to disable aamix, and it's the right fix, per se.
The only problem is that it's not so trivial to achieve it; user needs
to pass a hint string via patch module option or sysfs.
This patch gives a bit improvement on the situation: it adds "Loopback
Mixing" control element for such codecs like other codecs (e.g. IDT or
VIA codecs) with the individual loopback paths. User can turn on/off
the loopback path simply via a mixer app.
For keeping the compatibility, the loopback is still enabled on these
codecs. But user can try to turn it off if experiencing a suspicious
background or click noise on the fly, then build a static fixup later
once after the problem is addressed.
Other than the addition of the loopback enable/disablement control,
there should be no changes.
After commit e36f62042880(block: split bios to maxpossible length),
bio can be splitted in the middle of a vector entry, then it
is easy to split out one bio which size isn't aligned with block
size, especially when the block size is bigger than 512.
This patch fixes the issue by making the max io size aligned
to logical block size.
Fixes: e36f62042880(block: split bios to maxpossible length) Reported-by: Stefan Haberland <sth@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since b8b2c7d845d5, platform_drv_probe() is called for all platform
devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
platform_drv_probe() will return the error code from dev_pm_domain_attach().
This causes real_probe() to enter the "probe_failed" path and set
dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
success if both dev->bus->probe and drv->probe were missing. As a result,
a device and driver could be "bound" together just by matching their names;
this doesn't work any more after b8b2c7d845d5.
This may cause problems later for certain usage of platform_driver_register()
and platform_device_register_simple(). I observed a panic while loading
the tpm_tis driver with parameter "force=1" (i.e. registering tpm_tis as
a platform driver), because tpm_tis_init's assumption that the device
returned by platform_device_register_simple() was bound didn't hold any more
(tpmm_chip_alloc() dereferences chip->pdev->driver, causing panic).
This patch restores the previous (4.3.0 and earlier) behavior of
platform_drv_probe() in the case when the associated platform driver has
no "probe" function.
Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain callbacks are called unconditionally") Signed-off-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com> Cc: Martin Fuzzey <mfuzzey@parkeon.com> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The critical section protected by usbhid->lock in hid_ctrl() is too
big and because of this it causes a recursive deadlock. "Too big" means
the case statement and the call to hid_input_report() do not need to be
protected by the spinlock (no URB operations are done inside them).
The deadlock happens because in certain rare cases drivers try to grab
the lock while handling the ctrl irq which grabs the lock before them
as described above. For example newer wacom tablets like 056a:033c try
to reschedule proximity reads from wacom_intuos_schedule_prox_event()
calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
which tries to grab the usbhid lock already held by hid_ctrl().
There are two ways to get out of this deadlock:
1. Make the drivers work "around" the ctrl critical region, in the
wacom case for ex. by delaying the scheduling of the proximity read
request itself to a workqueue.
2. Shrink the critical region so the usbhid lock protects only the
instructions which modify usbhid state, calling hid_input_report()
with the spinlock unlocked, allowing the device driver to grab the
lock first, finish and then grab the lock afterwards in hid_ctrl().
NFS on a 2 node ocfs2 cluster each node exporting dir. The lock causing
the hang is the global bit map inode lock. Node 1 is master, has the
lock granted in PR mode; Node 2 is in the converting list (PR -> EX).
There are no holders of the lock on the master node so it should
downconvert to NL and grant EX to node 2 but that does not happen.
BLOCKED + QUEUED in lock res are set and it is on osb blocked list.
Threads are waiting in __ocfs2_cluster_lock on BLOCKED. One thread
wants EX, rest want PR. So it is as though the downconvert thread needs
to be kicked to complete the conv.
The hang is caused by an EX req coming into __ocfs2_cluster_lock on the
heels of a PR req after it sets BUSY (drops l_lock, releasing EX
thread), forcing the incoming EX to wait on BUSY without doing anything.
PR has called ocfs2_dlm_lock, which sets the node 1 lock from NL -> PR,
queues ast.
At this time, upconvert (PR ->EX) arrives from node 2, finds conflict
with node 1 lock in PR, so the lock res is put on dlm thread's dirty
listt.
After ret from ocf2_dlm_lock, PR thread now waits behind EX on BUSY till
awoken by ast.
Now it is dlm_thread that serially runs dlm_shuffle_lists, ast, bast, in
that order. dlm_shuffle_lists ques a bast on behalf of node 2 (which
will be run by dlm_thread right after the ast). ast does its part, sets
UPCONVERT_FINISHING, clears BUSY and wakes its waiters. Next,
dlm_thread runs bast. It sets BLOCKED and kicks dc thread. dc thread
runs ocfs2_unblock_lock, but since UPCONVERT_FINISHING set, skips doing
anything and reques.
Inside of __ocfs2_cluster_lock, since EX has been waiting on BUSY ahead
of PR, it wakes up first, finds BLOCKED set and skips doing anything but
clearing UPCONVERT_FINISHING (which was actually "meant" for the PR
thread), and this time waits on BLOCKED. Next, the PR thread comes out
of wait but since UPCONVERT_FINISHING is not set, it skips updating the
l_ro_holders and goes straight to wait on BLOCKED. So there, we have a
hang! Threads in __ocfs2_cluster_lock wait on BLOCKED, lock res in osb
blocked list. Only when dc thread is awoken, it will run
ocfs2_unblock_lock and things will unhang.
One way to fix this is to wake the dc thread on the flag after clearing
UPCONVERT_FINISHING
Orabug: 20933419 Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Eric Ren <zren@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This splits bio in the middle of a vector to form the largest possible
bio at the h/w's desired alignment, and guarantees the bio being split
will have some data.
The criteria for splitting is changed from the max sectors to the h/w's
optimal sector alignment if it is provided. For h/w that advertise their
block storage's underlying chunk size, it's a big performance win to not
submit commands that cross them. If sector alignment is not provided,
this patch uses the max sectors as before.
This addresses the performance issue commit d380561113 attempted to
fix, but was reverted due to splitting logic error.
Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Ming Lei <tom.leiming@gmail.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 2d8ae84fbc32, nothing is bumping lo->plh_block_lgets in the
layoutreturn path, so it should not be touched in nfs4_layoutreturn_release
either.
sun4i-ss implementaton of md5/sha1 is via ahash algorithms.
Commit 8996eafdcbad ("crypto: ahash - ensure statesize is non-zero")
made impossible to load them without giving statesize. This patch
specifiy statesize for sha1 and md5.
The Performance Monitors extension is an optional feature of the
AArch64 architecture, therefore, in order to access Performance
Monitors registers safely, the kernel should detect the architected
PMU unit presence through the ID_AA64DFR0_EL1 register PMUVer field
before accessing them.
This patch implements a guard by reading the ID_AA64DFR0_EL1 register
PMUVer field to detect the architected PMU presence and prevent accessing
PMU system registers if the Performance Monitors extension is not
implemented in the core.
Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: 60792ad349f3 ("arm64: kernel: enforce pmuserenr_el0 initialization and restore") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pmuserenr_el0 register value is architecturally UNKNOWN on reset.
Current kernel code resets that register value iff the core pmu device is
correctly probed in the kernel. On platforms with missing DT pmu nodes (or
disabled perf events in the kernel), the pmu is not probed, therefore the
pmuserenr_el0 register is not reset in the kernel, which means that its
value retains the reset value that is architecturally UNKNOWN (system
may run with eg pmuserenr_el0 == 0x1, which means that PMU counters access
is available at EL0, which must be disallowed).
This patch adds code that resets pmuserenr_el0 on cold boot and restores
it on core resume from shutdown, so that the pmuserenr_el0 setup is
always enforced in the kernel.
Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In paging_init, we allocate the zero page, memset it to zero and then
point TTBR0 to it in order to avoid speculative fetches through the
identity mapping.
In order to guarantee that the freshly zeroed page is indeed visible to
the page table walker, we need to execute a dsb instruction prior to
writing the TTBR.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to clear out any ptrace singlestep state when a ptrace(2)
PTRACE_DETACH call is made on arm64 systems.
Otherwise, the previously ptraced task will die off with a SIGTRAP
signal if the debugger just previously singlestepped the ptraced task.
Signed-off-by: John Blackwood <john.blackwood@ccur.com>
[will: added comment to justify why this is in the arch code] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GCC 6 will include changes to generated code with -mcmodel=large,
which is used to build kernel modules on powerpc64le. This was
necessary because the large model is supposed to allow arbitrary
sizes and locations of the code and data sections, but the ELFv2
global entry point prolog still made the unconditional assumption
that the TOC associated with any particular function can be found
within 2 GB of the function entry point:
The new .reloc triggers an optimization in the linker that will
replace this new prolog with the original code (see above) if the
linker determines that the distance between .TOC. and func is in
range after all.
Since this new relocation is now present in module object files,
the kernel module loader is required to handle them too. This
patch adds support for the new relocation and implements the
same optimization done by the GNU linker.
If a text section starts out with a data blob before the first
function start label, disassembly parsing doing in recordmcount.pl
gets confused on powerpc, leading to creation of corrupted module
objects.
This was not a problem so far since the compiler would never create
such text sections. However, this has changed with a recent change
in GCC 6 to support distances of > 2GB between a function and its
assoicated TOC in the ELFv2 ABI, exposing this problem.
There is already code in recordmcount.pl to handle such data blobs
on the sparc64 platform. This patch uses the same method to handle
those on powerpc as well.
Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
versions all need to be fully ordered, however they are now just
RELEASE+ACQUIRE, which are not fully ordered.
So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics")
This patch depends on patch "powerpc: Make value-returning atomics fully
ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Any atomic operation that modifies some state in memory and returns
> information about the state (old or new) implies an SMP-conditional
> general memory barrier (smp_mb()) on each side of the actual
> operation ...
Which mean these operations should be fully ordered. However on PPC,
PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
which is currently "lwsync" if SMP=y. The leading "lwsync" can not
guarantee fully ordered atomics, according to Paul Mckenney:
https://lkml.org/lkml/2015/10/14/970
To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
the fully-ordered semantics.
This also makes futex atomics fully ordered, which can avoid possible
memory ordering problems if userspace code relies on futex system call
for fully ordered semantics.
Fixes: b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics") Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently we can hit a scenario where we'll tm_reclaim() twice. This
results in a TM bad thing exception because the second reclaim occurs
when not in suspend mode.
The scenario in which this can happen is the following. We attempt to
deliver a signal to userspace. To do this we need obtain the stack
pointer to write the signal context. To get this stack pointer we
must tm_reclaim() in case we need to use the checkpointed stack
pointer (see get_tm_stackpointer()). Normally we'd then return
directly to userspace to deliver the signal without going through
__switch_to().
Unfortunatley, if at this point we get an error (such as a bad
userspace stack pointer), we need to exit the process. The exit will
result in a __switch_to(). __switch_to() will attempt to save the
process state which results in another tm_reclaim(). This
tm_reclaim() now causes a TM Bad Thing exception as this state has
already been saved and the processor is no longer in TM suspend mode.
Whee!
This patch checks the state of the MSR to ensure we are TM suspended
before we attempt the tm_reclaim(). If we've already saved the state
away, we should no longer be in TM suspend mode. This has the
additional advantage of checking for a potential TM Bad Thing
exception.
Found using syscall fuzzer.
Fixes: fb09692e71f1 ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_orig_node_free_ref.
Fixes: 72822225bd41 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_hardif_free_ref.
Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_neigh_ifinfo_free_ref.
Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_neigh_node_free_ref.
Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_orig_ifinfo_free_ref.
Fixes: 7351a4822d42 ("batman-adv: split out router from orig_node") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The batadv_nc_node_free_ref function uses call_rcu to delay the free of the
batadv_nc_node object until no (already started) rcu_read_lock is enabled
anymore. This makes sure that no context is still trying to access the
object which should be removed. But batadv_nc_node also contains a
reference to orig_node which must be removed.
The reference drop of orig_node was done in the call_rcu function
batadv_nc_node_free_rcu but should actually be done in the
batadv_nc_node_release function to avoid nested call_rcus. This is
important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will
not detect the inner call_rcu as relevant for its execution. Otherwise this
barrier will most likely be inserted in the queue before the callback of
the first call_rcu was executed. The caller of rcu_barrier will therefore
continue to run before the inner call_rcu callback finished.
Fixes: d56b1705e28c ("batman-adv: network coding - detect coding nodes and remove these after timeout") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The batadv_claim_free_ref function uses call_rcu to delay the free of the
batadv_bla_claim object until no (already started) rcu_read_lock is enabled
anymore. This makes sure that no context is still trying to access the
object which should be removed. But batadv_bla_claim also contains a
reference to backbone_gw which must be removed.
The reference drop of backbone_gw was done in the call_rcu function
batadv_claim_free_rcu but should actually be done in the
batadv_claim_release function to avoid nested call_rcus. This is important
because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not
detect the inner call_rcu as relevant for its execution. Otherwise this
barrier will most likely be inserted in the queue before the callback of
the first call_rcu was executed. The caller of rcu_barrier will therefore
continue to run before the inner call_rcu callback finished.
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We can't be within an RCU read-side critical section when deleting
VLANs, as underlying drivers might sleep during the hardware operation.
Therefore, replace the RCU critical section with a mutex. This is
consistent with team_vlan_rx_add_vid.
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixing it by changing of the irqn type from u8 to unsigned int to
support values > 255
Fixes: 61d0e73e0a5a ('net/mlx5_core: Use the the real irqn in eq->irqn') Reported-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After promisc mode management was introduced a bridge device could do
dev_set_promiscuity from its ndo_change_rx_flags() callback which in
turn can be called after the bridge's addr_list_lock has been taken
(e.g. by dev_uc_add). This causes a false positive lockdep splat because
the port interfaces' addr_list_lock is taken when br_manage_promisc()
runs after the bridge's addr list lock was already taken.
To remove the false positive introduce a custom bridge addr_list_lock
class and set it on bridge init.
A simple way to reproduce this is with the following:
$ brctl addbr br0
$ ip l add l br0 br0.100 type vlan id 100
$ ip l set br0 up
$ ip l set br0.100 up
$ echo 1 > /sys/class/net/br0/bridge/vlan_filtering
$ brctl addif br0 eth0
Splat:
[ 43.684325] =============================================
[ 43.684485] [ INFO: possible recursive locking detected ]
[ 43.684636] 4.4.0-rc8+ #54 Not tainted
[ 43.684755] ---------------------------------------------
[ 43.684906] brctl/1187 is trying to acquire lock:
[ 43.685047] (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
[ 43.685460] but task is already holding lock:
[ 43.685618] (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
[ 43.686015] other info that might help us debug this:
[ 43.686316] Possible unsafe locking scenario:
CC: Vlad Yasevich <vyasevic@redhat.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Bridge list <bridge@lists.linux-foundation.org> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.") Reported-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a tunnel decapsulates the outer header, it has to comply
with RFC 6080 and eventually propagate CE mark into inner header.
It turns out IP6_ECN_set_ce() does not correctly update skb->csum
for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
messages and stack traces.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a
constant shift that can't be encoded in the immediate field of the
UBFM/SBFM instructions is passed to the JIT. Since these shifts
amounts, which are negative or >= regsize, are invalid, reject them in
the eBPF verifier and the classic BPF filter checker, for all
architectures.
Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The offset inside the fragment was not used for the dma address and
silent data corruption resulted because TSO makes the checksum match.
Fixes: 077742dac2c7 ("dwc_eth_qos: Add support for Synopsys DWC Ethernet QoS") Signed-off-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 1f718f0f4f97 ("bonding: populate neighbour's private on enslave")
undoes the fix provided by commit c2edacf80e15 ("bonding / ipv6: no addrconf
for slaves separately from master") by effectively setting the slave flag
after the slave has been opened. If the slave comes up quickly enough, it
will go through the IPv6 addrconf before the slave flag has been set and
will get a link local IPv6 address.
In order to ensure that addrconf knows to ignore the slave devices on state
change, set IFF_SLAVE before dev_open() during bonding enslavement.
Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") Signed-off-by: Karl Heiss <kheiss@gmail.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Skb_gso_segment() uses skb control block during segmentation.
This patch adds 32-bytes room for previous control block which
will be copied into all resulting segments.
This patch fixes kernel crash during fragmenting forwarded packets.
Fragmentation requires valid IP CB in skb for clearing ip options.
Also patch removes custom save/restore in ovs code, now it's redundant.
Commit acf8dd0a9d0b ("udp: only allow UFO for packets from SOCK_DGRAM
sockets") disallows UFO for packets sent from raw sockets. We need to do
the same also for SOCK_DGRAM sockets with SO_NO_CHECK options, even if
for a bit different reason: while such socket would override the
CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and
bad offloading flags warning is triggered in __skb_gso_segment().
In the IPv6 case, SO_NO_CHECK option is ignored but we need to disallow
UFO for packets sent by sockets with UDP_NO_CHECK6_TX option.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Tested-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix possible null pointer dereference that may occur when calling
skb_reserve() on a null skb.
Fixes: 879c7220e82 ("net: pktgen: Observe needed_headroom of the device") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
only when user space passes the addresses should we consider their
presence
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For tcp_yeah, use an ssthresh floor of 2, the same floor used by Reno
and CUBIC, per RFC 5681 (equation 4).
tcp_yeah_ssthresh() was sometimes returning a 0 or negative ssthresh
value if the intended reduction is as big or bigger than the current
cwnd. Congestion control modules should never return a zero or
negative ssthresh. A zero ssthresh generally results in a zero cwnd,
causing the connection to stall. A negative ssthresh value will be
interpreted as a u32 and will set a target cwnd for PRR near 4
billion.
Oleksandr Natalenko reported that a system using tcp_yeah with ECN
could see a warning about a prior_cwnd of 0 in
tcp_cwnd_reduction(). Testing verified that this was due to
tcp_yeah_ssthresh() misbehaving in this way.
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When first SYNACK is sent, we already hold rcu_read_lock(), but this
is not true if a SYNACK is retransmitted, as a timer (soft) interrupt
does not hold rcu_read_lock()
Fixes: 45f6fad84cc30 ("ipv6: add complete rcu protection around np->opt") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
proc_dostring() needs an initialized destination string, while the one
provided in proc_sctp_do_hmac_alg() contains stack garbage.
Thus, writing to cookie_hmac_alg would strlen() that garbage and end up
accessing invalid memory.
Fixes: 3c68198e7 ("sctp: Make hmac algorithm selection for cookie generation dynamic") Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a vxlan interface is created, the driver checks that there is not
another vxlan interface with the same properties. To do this, it checks
the existing vxlan udp socket. Since commit 1c51a9159dde, the creation of
the vxlan socket is done only when the interface is set up, thus it breaks
that test.
Example:
$ ip l a vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
$ ip l a vxlan11 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
$ ip -br l | grep vxlan
vxlan10 DOWN f2:55:1c:6a:fb:00 <BROADCAST,MULTICAST>
vxlan11 DOWN 7a:cb:b9:38:59:0d <BROADCAST,MULTICAST>
Instead of checking sockets, let's loop over the vxlan iface list.
Fixes: 1c51a9159dde ("vxlan: fix race caused by dropping rtnl_unlock") Reported-by: Thomas Faivre <thomas.faivre@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is possible for a process to allocate and accumulate far more FDs than
the process' limit by sending them over a unix socket then closing them
to keep the process' fd count low.
This change addresses this problem by keeping track of the number of FDs
in flight per user and preventing non-privileged processes from having
more FDs in flight than their configured FD limit.
Reported-by: socketpair@gmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+) Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 655fe4effe0f ("usbcore: add sysfs support to xHCI usb3
hardware LPM") introduced usb3_hardware_lpm sysfs node. This
doesn't show the correct status of USB3 U1 and U2 LPM status.
This patch fixes this by replacing usb3_hardware_lpm with two
nodes, usb3_hardware_lpm_u1 (for U1) and usb3_hardware_lpm_u2
(for U2), and recording the U1/U2 LPM status in right places.
This patch should be back-ported to kernels as old as 4.3,
that contains Commit 655fe4effe0f ("usbcore: add sysfs support
to xHCI usb3 hardware LPM").
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Free skb for received frames with a wrong checksum. This can happen
pretty rapidly, exhausting all memory.
This fixes a memleak (detected with kmemleak). Originally found while
using monitor mode, but it also appears during managed mode (once the
link is up).
Signed-off-by: Peter Wu <peter@lekensteyn.nl> ACKed-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The detection of direction for compress was only taking into account codec
capabilities and not CPU ones. Fix this by checking the CPU side capabilities
as well
Tested-by: Ashish Panwar <ashish.panwar@intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We don't want to use a bypassed write in wm5110_clear_pga_volume,
we might disable the DRE whilst the CODEC is powered down. A
normal regmap_write will always go to the hardware (when not on
cache_only) even if the written value matches the cache. As using
a normal write will still achieve the desired behaviour of bring
the cache and hardware in sync, this patch updates the function
to use a normal write, which avoids issues when the CODEC is
powered down.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently ALSA timer device doesn't take the disconnection into
account very well; it merely unlinks the timer device at disconnection
callback but does nothing else. Because of this, when an application
accessing the timer device is disconnected, it may release the
resource before actually closed. In most cases, it results in a
warning message indicating a leftover timer instance like:
ALSA: timer xxxx is busy?
But basically this is an open race.
This patch tries to address it. The strategy is like other ALSA
devices: namely,
- Manage card's refcount at each open/close
- Wake up the pending tasks at disconnection
- Check the shutdown flag appropriately at each possible call
Note that this patch has one ugly hack to handle the wakeup of pending
tasks. It'd be cleaner to introduce a new disconnect op to
snd_timer_instance ops. But since it would lead to internal ABI
breakage and it eventually increase my own work when backporting to
stable kernels, I took a different path to implement locally in
timer.c. A cleanup patch will follow at next for 4.5 kernel.