Daniel Borkmann [Wed, 10 Feb 2016 15:47:11 +0000 (16:47 +0100)]
bpf: fix branch offset adjustment on backjumps after patching ctx expansion
When ctx access is used, the kernel often needs to expand/rewrite
instructions, so after that patching, branch offsets have to be
adjusted for both forward and backward jumps in the new eBPF program,
but for backward jumps it fails to account the delta. Meaning, for
example, if the expansion happens exactly on the insn that sits at
the jump target, it doesn't fix up the back jump offset.
Analysis on what the check in adjust_branches() is currently doing:
/* adjust offset of jmps if necessary */
if (i < pos && i + insn->off + 1 > pos)
insn->off += delta;
else if (i > pos && i + insn->off + 1 < pos)
insn->off -= delta;
First case is if we cross pos-boundary and the jump instruction was
before pos. This is handeled correctly. I.e. if i == pos, then this
would mean our jump that we currently check was the patchlet itself
that we just injected. Since such patchlets are self-contained and
have no awareness of any insns before or after the patched one, the
delta is correctly not adjusted. Also, for the second condition in
case of i + insn->off + 1 == pos, means we jump to that newly patched
instruction, so no offset adjustment are needed. That part is correct.
Second interesting case is where we cross pos-boundary and the jump
instruction was after pos. Backward jump with i == pos would be
impossible and pose a bug somewhere in the patchlet, so the first
condition checking i > pos is okay only by itself. However, i +
insn->off + 1 < pos does not always work as intended to trigger the
adjustment. It works when jump targets would be far off where the
delta wouldn't matter. But, for example, where the fixed insn->off
before pointed to pos (target_Y), it now points to pos + delta, so
that additional room needs to be taken into account for the check.
This means that i) both tests here need to be adjusted into pos + delta,
and ii) for the second condition, the test needs to be <= as pos
itself can be a target in the backjump, too.
Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 10 Feb 2016 20:04:59 +0000 (12:04 -0800)]
Merge branch 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
- PORTS_IMPL workaround for very early ahci controllers is misbehaving
on new systems. Disabled on recent ahci versions.
- Old-style PIO state machine had a horrible locking problem. Don't
know how we've been getting away this far. Fixed.
- Other device specific updates.
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: Intel DNV device IDs SATA
libata: fix sff host state machine locking while polling
libata-sff: use WARN instead of BUG on illegal host state machine state
libata: disable forced PORTS_IMPL for >= AHCI 1.3
libata: blacklist a Viking flash model for MWDMA corruption
drivers: ata: wake port before DMA stop for ALPM
Linus Torvalds [Wed, 10 Feb 2016 19:36:19 +0000 (11:36 -0800)]
Merge branch 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- The destruction path of cgroup objects are asynchronous and
multi-staged and some of them ended up destroying parents before
children leading to failures in cpu and memory controllers. Ensure
that parents are always destroyed after children.
- cpuset mm node migration was performed synchronously while holding
threadgroup and cgroup mutexes and the recent threadgroup locking
update resulted in a possible deadlock. The migration is best effort
and shouldn't have been performed under those locks to begin with.
Made asynchronous.
- Minor documentation fix.
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Documentation: cgroup: Fix 'cgroup-legacy' -> 'cgroup-v1'
cgroup: make sure a parent css isn't freed before its children
cgroup: make sure a parent css isn't offlined before its children
cpuset: make mm migration asynchronous
Linus Torvalds [Wed, 10 Feb 2016 19:04:05 +0000 (11:04 -0800)]
Merge branch 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Workqueue fixes for v4.5-rc3.
- Remove a spurious triggering of flush dependency warning.
- Officially break local execution guarantee of unbound work items
and add a debug feature to flush out usages which depend on it.
- Work around CPU -> NODE mapping becoming invalid on CPU offline.
The branch is young but pushing out early as stable kernels are being
affected"
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
workqueue: implement "workqueue.debug_force_rr_cpu" debug feature
workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs
Revert "workqueue: make sure delayed work run in local cpu"
workqueue: skip flush dependency checks for legacy workqueues
Tejun Heo [Wed, 3 Feb 2016 18:54:25 +0000 (13:54 -0500)]
workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
When looking up the pool_workqueue to use for an unbound workqueue,
workqueue assumes that the target CPU is always bound to a valid NUMA
node. However, currently, when a CPU goes offline, the mapping is
destroyed and cpu_to_node() returns NUMA_NO_NODE.
This has always been broken but hasn't triggered often enough before 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu").
After the commit, workqueue forcifully assigns the local CPU for
delayed work items without explicit target CPU to fix a different
issue. This widens the window where CPU can go offline while a
delayed work item is pending causing delayed work items dispatched
with target CPU set to an already offlined CPU. The resulting
NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
NULL pool_workqueue and thus crash.
While 874bbfe600a6 has been reverted for a different reason making the
bug less visible again, it can still happen. Fix it by mapping
NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
This is a temporary workaround. The long term solution is keeping CPU
-> NODE mapping stable across CPU off/online cycles which is being
worked on.
Lyude [Thu, 4 Feb 2016 15:43:21 +0000 (10:43 -0500)]
drm/i915/skl: Fix typo in DPLL_CFGCR1 definition
We accidentally point both cfgcr registers for the second shared DPLL to
the same location in i915_reg.h. This results in a lot of hw pipe state
mismatches whenever we try to do a modeset that requires allocating the
DPLL to a CRTC:
[drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in dpll_hw_state.cfgcr1 (expected 0x80000168, found 0x000004a5)
[drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in base.adjusted_mode.crtc_clock (expected 108000, found 49500)
[drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in port_clock (expected 108000, found 49500)
This usually ends up causing blank monitors, since the DPLL never can
get set to the right clock.
Lyude [Tue, 2 Feb 2016 15:49:43 +0000 (10:49 -0500)]
drm/i915/skl: Don't skip mst encoders in skl_ddi_pll_select()
We don't actually check for INTEL_OUTPUT_DP_MST at all in here, as a
result we skip assigning a DPLL to any DP MST ports, which makes link
training fail:
[ 1442.933896] [drm:intel_power_well_enable] enabling DDI D power well
[ 1442.933905] [drm:skl_set_power_well] Enabling DDI D power well
[ 1442.933957] [drm:intel_mst_pre_enable_dp] 0
[ 1442.935474] [drm:intel_dp_set_signal_levels] Using signal levels 00000000
[ 1442.935477] [drm:intel_dp_set_signal_levels] Using vswing level 0
[ 1442.935480] [drm:intel_dp_set_signal_levels] Using pre-emphasis level 0
[ 1442.936190] [drm:intel_dp_set_signal_levels] Using signal levels 05000000
[ 1442.936193] [drm:intel_dp_set_signal_levels] Using vswing level 1
[ 1442.936195] [drm:intel_dp_set_signal_levels] Using pre-emphasis level 1
[ 1442.936858] [drm:intel_dp_set_signal_levels] Using signal levels 08000000
[ 1442.936862] [drm:intel_dp_set_signal_levels] Using vswing level 2
…
[ 1442.998253] [drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* too many full retries, give up
[ 1442.998512] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
After which the pipe state goes completely out of sync:
[ 70.075596] [drm:check_crtc_state] [CRTC:25]
[ 70.075696] [drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in ddi_pll_sel (expected 0x00000000, found 0x00000001)
[ 70.075747] [drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in shared_dpll (expected -1, found 0)
[ 70.075798] [drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in dpll_hw_state.ctrl1 (expected 0x00000000, found 0x00000021)
[ 70.075840] [drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in dpll_hw_state.cfgcr1 (expected 0x00000000, found 0x80400173)
[ 70.075884] [drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in dpll_hw_state.cfgcr2 (expected 0x00000000, found 0x000003a5)
[ 70.075954] [drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in base.adjusted_mode.crtc_clock (expected 262750, found 72256)
[ 70.075999] [drm:intel_pipe_config_compare [i915]] *ERROR* mismatch in port_clock (expected 540000, found 148500)
And if you're especially lucky, it keeps going downhill:
David S. Miller [Wed, 10 Feb 2016 10:50:16 +0000 (05:50 -0500)]
Merge branch 'ovs-tunnel-mtu'
David Wragg says:
====================
Set a large MTU on ovs-created tunnel devices
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.
This patch series sets the MTU on openvswitch-created netdevs to be
the relevant maximum (i.e. the maximum IP packet size minus any
relevant overhead), effectively restoring the behaviour prior to 4.3.
Where relevant, the limits on MTU values that can be directly set on
the netdevs are also relaxed.
Changes in v2:
* Extend to all openvswitch tunnel types, i.e. gre and geneve as well
* Use IP_MAX_MTU
Changes in v3:
* Fix block comment style
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Wragg [Wed, 10 Feb 2016 00:05:58 +0000 (00:05 +0000)]
vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.
Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.
Signed-off-by: David Wragg <david@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net>
David Wragg [Wed, 10 Feb 2016 00:05:57 +0000 (00:05 +0000)]
geneve: Relax MTU constraints
Allow the MTU of geneve devices to be set to large values, in order to
exploit underlying networks with larger frame sizes.
GENEVE does not have a fixed encapsulation overhead (an openvswitch
rule can add variable length options), so there is no relevant maximum
MTU to enforce. A maximum of IP_MAX_MTU is used instead.
Encapsulated packets that are too big for the underlying network will
get dropped on the floor.
Signed-off-by: David Wragg <david@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net>
David Wragg [Wed, 10 Feb 2016 00:05:55 +0000 (00:05 +0000)]
vxlan: Relax MTU constraints
Allow the MTU of vxlan devices without an underlying device to be set
to larger values (up to a maximum based on IP packet limits and vxlan
overhead).
Previously, their MTUs could not be set to higher than the
conventional ethernet value of 1500. This is a very arbitrary value
in the context of vxlan, and prevented vxlan devices from being able
to take advantage of jumbo frames etc.
The default MTU remains 1500, for compatibility.
Signed-off-by: David Wragg <david@weave.works> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vineet Gupta [Sun, 7 Feb 2016 07:24:35 +0000 (12:54 +0530)]
ARCv2: intc: Allow interruption by lowest priority interrupt
ARC HS Cores support configurable multiple interrupt priorities of upto
16 levels.
There is processor "interrupt preemption threshhold" in STATUS32.E[4:1]
And several places need to set this up:
1. seed value as kernel is booting
2. seed value for user space programs
3. Arg to SLEEP instruction in idle task (what interrupt prio can wake)
4. Per-IRQ line prioirty (i.e. what is the priority of interrupt
raised by a peripheral or timer or perf counter...
Currently above sites use the highest priority 0. This can be potential
problem when multiple priorities are supported. e.g. user space could
only be interrupted by P0 interrupt, not others...
So turn this over and instead make default interruption level to be
the lowest priority possible 15. This should be fine even if there are
fewer priority levels configured (say two: P0 HIGH, P1 LOW)
This feature also effectively disables FIRQ feature if present in
hardware config. With old code, a P0 interrupt would be FIRQ, needing
special handling (ISR or Register Banks) which is NOT supported yet.
Now it not be P0 (P15 or whatever is lowest prio) so FIRQ is not
triggered.
Gavin Shan [Tue, 9 Feb 2016 04:50:22 +0000 (15:50 +1100)]
powerpc/powernv: Fix stale PE primary bus
When PCI bus is unplugged during full hotplug for EEH recovery,
the platform PE instance (struct pnv_ioda_pe) isn't released and
it dereferences the stale PCI bus that has been released. It leads
to kernel crash when referring to the stale PCI bus.
This fixes the issue by correcting the PE's primary bus when it's
oneline at plugging time, in pnv_pci_dma_bus_setup() which is to
be called by pcibios_fixup_bus().
Cc: stable@vger.kernel.org # v4.1+ Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Tue, 9 Feb 2016 04:50:21 +0000 (15:50 +1100)]
powerpc/eeh: Fix stale cached primary bus
When PE is created, its primary bus is cached to pe->bus. At later
point, the cached primary bus is returned from eeh_pe_bus_get().
However, we could get stale cached primary bus and run into kernel
crash in one case: full hotplug as part of fenced PHB error recovery
releases all PCI busses under the PHB at unplugging time and recreate
them at plugging time. pe->bus is still dereferencing the PCI bus
that was released.
This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity
of pe->bus. pe->bus is updated when its first child EEH device is
online and the flag is set. Before unplugging in full hotplug for
error recovery, the flag is cleared.
Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE") Cc: stable@vger.kernel.org #v3.11+ Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Wed, 10 Feb 2016 00:40:59 +0000 (16:40 -0800)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module fixes from Rusty Russell:
"Fix for async_probe module param added in 4.3 (clearly not widely used
yet), and a much more interesting kallsyms race which has been around
approximately forever. This fix is more invasive, and will require
some care in backporting, but I hated all the bandaids I could think
of, so...
There are some more coming, which are only for breakages introduced
this cycle (livepatch), but wanted these in now"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
modules: fix longstanding /proc/kallsyms vs module insertion race.
module: wrapper for symbol name.
modules: fix modparam async_probe request
drivers/input/touchscreen/colibri-vf50-ts.c: In function ‘vf50_ts_probe’:
drivers/input/touchscreen/colibri-vf50-ts.c:302: error: implicit declaration of function ‘of_property_read_u32’
The adp5589 has row 5, don't skip it when creating the GPIO mapping.
Otherwise the pin gets reserved as used and it is not possible to use it as
a GPIO.
Philipp Zabel [Tue, 9 Feb 2016 17:32:42 +0000 (09:32 -0800)]
Input: edt-ft5x06 - fix setting gain, offset, and threshold via device tree
A recent patch broke parsing the gain, offset, and threshold parameters
from device tree. Instead of setting the cached values and writing them
to the correct registers during probe, it would write the values from DT
into the register address variables and never write them to the chip
during normal operation.
Fixes: 2e23b7a96372 ("Input: edt-ft5x06 - use generic properties API") Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Workqueue used to guarantee local execution for work items queued
without explicit target CPU. The guarantee is gone now which can
break some usages in subtle ways. To flush out those cases, this
patch implements a debug feature which forces round-robin CPU
selection for all such work items.
The debug feature defaults to off and can be enabled with a kernel
parameter. The default can be flipped with a debug config option.
If you hit this commit during bisection, please refer to 041bd12e272c
("Revert "workqueue: make sure delayed work run in local cpu"") for
more information and ping me.
Mike Galbraith [Tue, 9 Feb 2016 22:59:38 +0000 (17:59 -0500)]
workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs
WORK_CPU_UNBOUND work items queued to a bound workqueue always run
locally. This is a good thing normally, but not when the user has
asked us to keep unbound work away from certain CPUs. Round robin
these to wq_unbound_cpumask CPUs instead, as perturbation avoidance
trumps performance.
tj: Cosmetic and comment changes. WARN_ON_ONCE() dropped from empty
(wq_unbound_cpumask AND cpu_online_mask). If we want that, it
should be done when config changes.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Workqueue used to implicity guarantee that work items queued without
explicit CPU specified are put on the local CPU. Recent changes in
timer broke the guarantee and led to vmstat breakage which was fixed
by 176bed1de5bf ("vmstat: explicitly schedule per-cpu work on the CPU
we need it to run on").
vmstat is the most likely to expose the issue and it's quite possible
that there are other similar problems which are a lot more difficult
to trigger. As a preventive measure, 874bbfe600a6 ("workqueue: make
sure delayed work run in local cpu") was applied to restore the local
CPU guarnatee. Unfortunately, the change exposed a bug in timer code
which got fixed by 22b886dd1018 ("timers: Use proper base migration in
add_timer_on()"). Due to code restructuring, the commit couldn't be
backported beyond certain point and stable kernels which only had 874bbfe600a6 started crashing.
The local CPU guarantee was accidental more than anything else and we
want to get rid of it anyway. As, with the vmstat case fixed, 874bbfe600a6 is causing more problems than it's fixing, it has been
decided to take the chance and officially break the guarantee by
reverting the commit. A debug feature will be added to force foreign
CPU assignment to expose cases relying on the guarantee and fixes for
the individual cases will be backported to stable as necessary.
Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu") Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz Cc: stable@vger.kernel.org Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shli@fb.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Michal Hocko <mhocko@kernel.org>
Linus Torvalds [Tue, 9 Feb 2016 18:15:16 +0000 (10:15 -0800)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
API:
- Fix async algif_skcipher, it was broken by recent fixes.
- Fix potential race condition in algif_skcipher with ctx.
- Fix potential memory corruption in algif_skcipher.
- Add missing lock to crypto_user when doing an alg dump.
Drivers:
- marvell/cesa was testing the wrong variable for NULL after
allocation.
- Fix potential double-free in atmel-sha.
- Fix illegal call to sleepin function from atomic context in
atmel-sha"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvell/cesa - fix test in mv_cesa_dev_dma_init()
crypto: atmel-sha - remove calls of clk_prepare() from atomic contexts
crypto: atmel-sha - fix atmel_sha_remove()
crypto: algif_skcipher - Do not set MAY_BACKLOG on the async path
crypto: algif_skcipher - Do not dereference ctx without socket lock
crypto: algif_skcipher - Do not assume that req is unchanged
crypto: user - lock crypto_alg_list on alg dump
J. Bruce Fields [Wed, 10 Jul 2013 20:54:34 +0000 (16:54 -0400)]
scripts: add "prune-kernel" script to clean up old kernel images
Long ago, Dave Jones complained about CONFIG_LOCALVERSION_AUTO:
"I don't use the auto config, because I end up filling up /boot unless
I go through and clean them out by hand every time I install a new one
(which I do probably a dozen or so times a day). Is there some easy
way to prune old builds I'm missing?"
To which Bruce replied:
"I run this by hand every now and then. I'm probably doing it all wrong"
And if he is running it wrong, then so am I - because I've been using
this script ever since. It is true that CONFIG_LOCALVERSION_AUTO easily
ends up filling your /boot partition if you don't clean up old versions
regularly, and this script helps make that easier.
Checked with Bruce to see that it's fine to add this to the kernel
scripts. Maybe people will come up with enhancements, but more
importantly, this way I won't misplace this script whenever I install a
new machine and start doing custom kernels for it.
Alexander Duyck [Tue, 9 Feb 2016 10:49:54 +0000 (02:49 -0800)]
flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
This patch fixes an issue with unaligned accesses when using
eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
The fact is when trying to check the length we don't need to be looking at
the flow label so we can reorder the checks to first check if we are
supposed to gather the flow label and then make the call to actually get
it.
v2: Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can
cause us to check for the flow label.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Takashi Iwai [Mon, 8 Feb 2016 16:26:58 +0000 (17:26 +0100)]
ALSA: timer: Fix race at concurrent reads
snd_timer_user_read() has a potential race among parallel reads, as
qhead and qused are updated outside the critical section due to
copy_to_user() calls. Move them into the critical section, and also
sanitize the relevant code a bit.
Takashi Iwai [Tue, 9 Feb 2016 09:23:52 +0000 (10:23 +0100)]
ALSA: hda - Fix bad dereference of jack object
The hda_jack_tbl entries are managed by snd_array for allowing
multiple jacks. It's good per se, but the problem is that struct
hda_jack_callback keeps the hda_jack_tbl pointer. Since snd_array
doesn't preserve each pointer at resizing the array, we can't keep the
original pointer but have to deduce the pointer at each time via
snd_array_entry() instead. Actually, this resulted in the deference
to the wrong pointer on codecs that have many pins such as CS4208.
This patch replaces the pointer to the NID value as the search key.
As an unexpected good side effect, this even simplifies the code, as
only NID is needed in most cases.
Takashi Iwai [Tue, 9 Feb 2016 11:02:32 +0000 (12:02 +0100)]
ALSA: timer: Fix race between stop and interrupt
A slave timer element also unlinks at snd_timer_stop() but it takes
only slave_active_lock. When a slave is assigned to a master,
however, this may become a race against the master's interrupt
handling, eventually resulting in a list corruption. The actual bug
could be seen with a syzkaller fuzzer test case in BugLink below.
As a fix, we need to take timeri->timer->lock when timer isn't NULL,
i.e. assigned to a master, while the assignment to a master itself is
protected by slave_active_lock.
Denis Kirjanov [Mon, 14 Dec 2015 20:18:06 +0000 (23:18 +0300)]
powerpc/pseries: Don't trace hcalls on offline CPUs
If a cpu is hotplugged while the hcall trace points are active, it's
possible to hit a warning from RCU due to the trace points calling into
RCU from an offline cpu, eg:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
Make the hypervisor tracepoints conditional by using
TRACE_EVENT_FN_COND.
Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/mm: Fix Multi hit ERAT cause by recent THP update
With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensure that low level hash fault handling
will skip this huge pte and we will handle them at upper levels.
Recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.
Consider following race:
CPU0 CPU1
shrink_page_list()
add_to_swap()
split_huge_page_to_list()
__split_huge_pmd_locked()
pmdp_huge_clear_flush_notify()
// pmd_none() == true
exit_mmap()
unmap_vmas()
zap_pmd_range()
// no action on pmd since pmd_none() == true
pmd_populate()
As result the THP will not be freed. The leak is detected by check_mm():
The above required us to not mark pmd none during a pmd split.
The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
level fault handling code skip this pte. At higher level we do take ptl
lock. That should serialze us against the pmd split. Once the lock is
acquired we do check the pmd again using pmd_same. That should always
return false for us and hence we should retry the access. We do the
pmd_same check in all case after taking plt with
THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed)
Also make sure we wait for irq disable section in other cpus to finish
before flipping a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irq disable to get
a stable pte_t pointer. A parallel thp split need to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq disable section to finish.
Fixes: eef1b3ba053a ("thp: implement split_huge_pmd()") Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Aaro Koskinen [Wed, 3 Feb 2016 19:35:29 +0000 (21:35 +0200)]
of: of_mdio: Add marvell, 88e1145 to whitelist of PHY compatibilities.
Commit ae461131960b ("of: of_mdio: Add a whitelist of PHY
compatibilities.") missed one compatible string used in in-tree DTBs:
in OCTEON, for selected boards, the kernel DTB pruning code will overwrite
the DTB compatible string with "marvell,88e1145", which is missing
from the whitelist. Add it.
The patch fixes broken networking on EdgeRouter Lite.
Fixes: ae461131960b ("of: of_mdio: Add a whitelist of PHY compatibilities.") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
This device tree is based on the board file:
arch/arm/mach-orion5x/kurobox_pro-setup.c
However, that board file also support Kurobox Pro, which is not supported by
device tree yet. So the board file is not removed.
Signed-off-by: Roger Shimizu <rogershimizu@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Roger Shimizu [Sat, 6 Feb 2016 05:59:52 +0000 (14:59 +0900)]
ARM: dts: orion5x: split linkstation lswtgl into common and device parts
In order to support more linkstation devices, common part of current
.dts file goes into .dtsi file. Some .dtsi start with "mvebu-" prefix
because other kirkwood based linkstation devices are similar, and
will be migrated to use these .dtsi some time later.
- orion5x-linkstation.dtsi
- mvebu-linkstation-fan.dtsi
- mvebu-linkstation-gpio-simple.dtsi
while all rest part remains in device specific .dts file:
- orion5x-linkstation-lswtgl.dts
Signed-off-by: Roger Shimizu <rogershimizu@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Thomas Petazzoni [Wed, 27 Jan 2016 15:08:20 +0000 (16:08 +0100)]
ARM: dts: armada-38x: add reference to ETH connectors for A385-AP
This commit adds some comments to the Armada 385 AP Device Tree
description to indicate which Ethernet interface matches which
physical connector on the board.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Thomas Petazzoni [Wed, 27 Jan 2016 15:08:19 +0000 (16:08 +0100)]
ARM: dts: armada-38x: change order of ethernet DT nodes on Armada 38x
On Armada 38x, the available network interfaces are:
- port 0, at 0x70000
- port 1, at 0x30000
- port 2, at 0x34000
Due to the rule saying that DT nodes should be ordered by register
addresses, the network interfaces are probed in this order:
- port 1, at 0x30000, which gets named eth0
- port 2, at 0x34000, which gets named eth1
- port 0, at 0x70000, which gets named eth2
(if all three ports are enabled at the board level)
Unfortunately, the network subsystem doesn't provide any way to rename
network interfaces from the kernel (it can only be done from
userspace). So, the default naming of the network interfaces is very
confusing as it doesn't match the datasheet, nor the naming of the
interfaces in the bootloader, nor the naming of the interfaces on
labels printed on the board.
For example, on the Armada 388 GP, the board has two ports, labelled
GE0 and GE1. One has to know that GE0 is eth1 and GE1 is eth0, which
isn't really obvious.
In order to solve this, this patch proposes to exceptionaly violate
the rule of "order DT nodes by register address", and put the 0x70000
node before the 0x30000 node, so that network interfaces get named in
a more natural way.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
ARM: dts: kirkwood: use unique machine name for ds112
Downstream packages like Debian flash-kernel use
/proc/device-tree/model
to determine which dtb file to install.
Hence each dts in the Linux kernel should provide a unique model
identifier.
Commit 2d0a7addbd10 ("ARM: Kirkwood: Add support for many Synology NAS
devices") created the new files kirkwood-ds111.dts and kirkwood-ds112.dts
using the same model identifier.
This patch provides a unique model identifier for the
Synology DiskStation DS112.
Fixes: 2d0a7addbd10 ("ARM: Kirkwood: Add support for many Synology NAS devices") Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Mario Lange [Mon, 25 Jan 2016 16:44:10 +0000 (01:44 +0900)]
ARM: dts: kirkwood: add device tree for buffalo linkstation ls-qvl
Add dts file to support Buffalo Linkstation LS-QVL,
which is marvell kirkwood based 4-bay 3.5" HDD NAS.
Product info:
- (JPN) http://buffalo.jp/product/hdd/network/ls-qvl_r5/
- (ENG) http://www.buffalotech.com/products/network-storage/home-and-small-office/linkstation-pro-quad
Signed-off-by: Mario Lange <mario_lange@gmx.net> Signed-off-by: Roger Shimizu <rogershimizu@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Aaro Koskinen [Sat, 23 Jan 2016 22:36:39 +0000 (00:36 +0200)]
ARM: dts: kirkwood: fix audio for OpenRD clients
Fix audio on kirkwood-openrd-client:
1) The audio-controller was left disabled.
2) The probe fails because cs42l51 is missing #sound-dai-cells.
/sound/simple-audio-card,codec: could not get #sound-dai-cells for /ocp@f1000000/i2c@11000/cs42l51@4a
asoc-simple-card sound: parse error -22
asoc-simple-card: probe of sound failed with error -22
3) The mapping is incorrect:
asoc-simple-card sound: cs42l51-hifi <-> spdif mapping ok
should be:
asoc-simple-card sound: cs42l51-hifi <-> i2s mapping ok
Reported-by: Rick Thomas <rbthomas@pobox.com> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Rick Thomas <rbthomas@pobox.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Roger Shimizu [Thu, 21 Jan 2016 14:38:50 +0000 (23:38 +0900)]
ARM: dts: kirkwood: split lswvl dts to linkstation lsvl and lswvl
LS-WVL/VL are both kirkwood-6282 based NAS devices, which share
many MPP pins. However they are slightly different:
- LS-WVL is 2-Bay NAS, and LS-VL is only 1-Bay.
- There're two red LED indicator on LS-WVL to show when HDD fails,
which is similar to LS-WXL, but there's no such on LS-VL.
So after the split, common part goes into .dtsi file:
- kirkwood-linkstation-6282.dtsi
while all rest part goes into device specific .dts file:
- kirkwood-linkstation-lsvl.dts
- kirkwood-linkstation-lswvl.dts
Signed-off-by: Roger Shimizu <rogershimizu@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Roger Shimizu [Thu, 21 Jan 2016 14:38:49 +0000 (23:38 +0900)]
ARM: dts: kirkwood: split lswxl dts to linkstation lswsxl and lswxl
LS-WXL/WSXL are both kirkwood-6281 based 2-Bay NAS devices, which share
many MPP pins. However they are slightly different:
- There're two red LED indicator on LS-WXL to show when HDD fails,
but there's no such on LS-WSXL.
- There's 4-level speed adjustable FAN on LS-WXL, but not LS-WSXL.
So after the split, common part goes into .dtsi file:
- kirkwood-linkstation.dtsi
- kirkwood-linkstation-duo-6281.dtsi
while all rest part goes into device specific .dts file:
- kirkwood-linkstation-lswsxl.dts
- kirkwood-linkstation-lswxl.dts
Signed-off-by: Roger Shimizu <rogershimizu@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Aaro Koskinen [Tue, 12 Jan 2016 20:07:33 +0000 (22:07 +0200)]
ARM: dts: kirkwood: fix SD slot default configuration for OpenRD
The SD card slot was enabled by default with legacy booting.
It does not work anymore with DT boot. Fix by providing GPIO configuration
that matches the old default.
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Gregory CLEMENT [Wed, 23 Dec 2015 14:29:17 +0000 (15:29 +0100)]
ARM: dts: armada-370: Update the mpp63 function in the device tree on Armada 370
Since the commit a526973e0291 ("pinctrl: mvebu: Fix mapping of pin
63 (gpo -> gpio)"), the mpp63 is no more declared as a GPO but is a
GPIO. Even if in the datasheet this pin is described as GPO, the
experience of the D-Link DNS-327L board shows that it can be used as a
GPIO.
This commits generated warnings for the board using this pin as gpo, with
this patch the dts are fixed by using the new function (gpio) instead of
the old one.
The binding documentation has also been updated accordingly.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Acked-by: Jason Cooper <jason@lakedaemon.net>
Gregory CLEMENT [Wed, 23 Dec 2015 14:05:41 +0000 (15:05 +0100)]
ARM: dts: armada-38x: use usb-nop-xceiv PHY for the xhci nodes on Armada 388 GP
Using the usb-nop-xceiv PHY for the xhci nodes allows a better
representation of the hardware but also a better handling of the
regulator. By linking the regulator to the PHY there is no more need to
use the regulator-always-on property, then it allows a better power
management.
The remaining usb node uses the ehci-orion driver which can't be used
with the usb-nop-xceiv PHY and must keeps the direct link to the
regulator with the regulator-always-on property.
Thomas Petazzoni [Tue, 22 Dec 2015 14:28:42 +0000 (15:28 +0100)]
ARM: dts: armada-38x: use regulator-boot-on for SATA regulators on Armada 388 GP
Really, what we meant by regulator-always-on is that the regulators
are already turned on by the bootloader, for which regulator-boot-on
is a better description.
A net advantage of using regulator-boot-on is that the regulator is
not touched at boot time by the kernel, which avoids having the hard
drives spinning down and then up again, taking several (~5) seconds of
additional boot time.
In addition, there is no need to have such properties on the child
regulators used for SATA. Having it on the parent regulator that
really controls the GPIO is sufficient.
Without the patch:
[ 3.945866] ata2: SATA link down (SStatus 0 SControl 300)
[ 3.995862] ata3: SATA link down (SStatus 0 SControl 300)
[ 4.005863] ata4: SATA link down (SStatus 0 SControl 300)
[ 9.125861] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 9.144575] ata1.00: ATA-8: WDC WD5003ABYX-01WERA1, 01.01S02, max UDMA/133
[ 9.151471] ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
(and you can hear the disk spinning down and up during this 5.1
seconds delay)
With the patch:
[ 3.945988] ata2: SATA link down (SStatus 0 SControl 300)
[ 4.005980] ata4: SATA link down (SStatus 0 SControl 300)
[ 4.011404] ata3: SATA link down (SStatus 0 SControl 300)
[ 4.145978] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4.153701] ata1.00: ATA-8: WDC WD5003ABYX-01WERA1, 01.01S02, max UDMA/133
[ 4.160597] ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Xin Long [Wed, 3 Feb 2016 15:33:30 +0000 (23:33 +0800)]
sctp: translate network order to host order when users get a hmacid
Commit ed5a377d87dc ("sctp: translate host order to network order when
setting a hmacid") corrected the hmacid byte-order when setting a hmacid.
but the same issue also exists on getting a hmacid.
We fix it by changing hmacids to host order when users get them with
getsockopt.
Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Mon, 21 Dec 2015 14:41:55 +0000 (15:41 +0100)]
ARM: dts: armada-38x: adjust board name and compatible for Armada 388 GP
As the name of the Device Tree file name suggests, the Armada 388 GP
really contains an Armada 388 SoC, so this commit updates the board
name and compatible string in the Device Tree file.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Sandeep Pillai [Wed, 3 Feb 2016 09:10:44 +0000 (14:40 +0530)]
enic: increment devcmd2 result ring in case of timeout
Firmware posts the devcmd result in result ring. In case of timeout, driver
does not increment the current result pointer and firmware could post the
result after timeout has occurred. During next devcmd, driver would be
reading the result of previous devcmd.
Fix this by incrementing result even in case of timeout.
Fixes: 373fb0873d43 ("enic: add devcmd2") Signed-off-by: Sandeep Pillai <sanpilla@cisco.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
tg3_tso_bug() can hit a condition where the entire tx ring is not big
enough to segment the GSO packet. For example, if MSS is very small,
gso_segs can exceed the tx ring size. When we hit the condition, it
will cause tx timeout.
tg3_tso_bug() is called to handle TSO and DMA hardware bugs.
For TSO bugs, if tg3_tso_bug() cannot succeed, we have to drop the packet.
For DMA bugs, we can still fall back to linearize the SKB and let the
hardware transmit the TSO packet.
This patch adds a function tg3_tso_bug_gso_check() to check if there
are enough tx descriptors for GSO before calling tg3_tso_bug().
The caller will then handle the error appropriately - drop or
lineraize the SKB.
v2: Corrected patch description to avoid confusion.
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments.
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Roper [Mon, 8 Feb 2016 19:05:28 +0000 (11:05 -0800)]
drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
Due to our lack of two-step watermark programming, our driver has
historically pretended that the cursor plane is always on for the
purpose of watermark calculations; this helps avoid serious flickering
when the cursor turns off/on (e.g., when the user moves the mouse
pointer to a different screen). That workaround was accidentally
dropped as we started working toward atomic watermark updates. Since we
still aren't quite there yet with two-stage updates, we need to
resurrect the workaround and treat the cursor as always active.
v2: Tweak cursor width calculations slightly to more closely match the
logic we used before the atomic overhaul began. (Ville)
Eric Dumazet [Wed, 3 Feb 2016 03:31:12 +0000 (19:31 -0800)]
tcp: do not drop syn_recv on all icmp reports
Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
were leading to RST.
This is of course incorrect.
A specific list of ICMP messages should be able to drop a SYN_RECV.
For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
not hold a dst per SYN_RECV pseudo request.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751 Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Reported-by: Petr Novopashenniy <pety@rusnet.ru> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Olof Johansson [Mon, 8 Feb 2016 21:59:29 +0000 (13:59 -0800)]
Merge tag 'mvebu-defconfig-4.6-1' of git://git.infradead.org/linux-mvebu into next/defconfig
mvebu defconfig for 4.6 (part 1)
- Enable sound module needed for OpenRD in mvebu_v5_defconfig
- Add USB nop xceiv support in mvebu_v7_defconfig
* tag 'mvebu-defconfig-4.6-1' of git://git.infradead.org/linux-mvebu:
ARM: mvebu_v5_defconfig: Enable sound module needed for OpenRD
ARM: mvebu: Add USB nop xceiv support in mvebu_v7_defconfig
Mans Rullgard [Fri, 5 Feb 2016 09:49:22 +0000 (10:49 +0100)]
ARM: debug: add support for Palmchip BK-310x UART
Some SoCs use a Palmchip BK-310x UART which is mostly 16550 compatible
but with a different register layout. While this UART has previously
only been supported in MIPS based chips (Alchemy, Ralink), the ARM based
SMP87xx series from Sigma Designs also uses it.
This patch allows the debug console to work with this type of UART.
Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Olof Johansson <olof@lixom.net>