Ira Weiny [Thu, 28 Jul 2016 01:08:42 +0000 (21:08 -0400)]
IB/hfi1: Allow for non-double word multiple message sizes for user SDMA
The driver pads non-double word multiple message sizes but it doesn't
account for this padding when the packet length is calculated. Also, the
data length is miscalculated for message sizes less than 4 bytes due to
the bit representation in LRH. And there's a check for non-double word
multiple message sizes that prevents these messages from being sent.
This patch fixes length miscalculations and enables the functionality to
send non-double word multiple message sizes.
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:07:36 +0000 (21:07 -0400)]
IB/rdmavt: Eliminate redundant opcode test in mr ref clear
The use of the specific opcode test is redundant since
all ack entry users correctly manipulate the mr pointer
to selectively trigger the reference clearing.
The overly specific test hinders the use of implementation
specific operations.
The change needs to get rid of the union to insure that
an atomic value is not seen as an MR pointer.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 28 Jul 2016 01:06:15 +0000 (21:06 -0400)]
IB/hfi1: Handle kzalloc failure in init_pervl_scs
Checking the return value of the memory allocation call in
init_pervl_scs() was missed. Recently the kmalloc() was changed to
kzalloc() which identified the problem.
While fixing this issue 2 other bugs were noticed. First, the array
being allocated is accessed in the nomem path which can be reached before
it is allocated. Second, kernel_send_context was not released on error.
Fix both of these by creating a more common memory unwind label structure.
Fixes: 35f6befc8441 ("staging/rdma/hfi1: Add qp to send context mapping for PIO") Reported-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead of copying the actual GRH of type struct ib_grh, existing code
copies the struct ib_global_route into the sge. This patch fixes that
and constructs the actual GRH from ib_global_route and copies the GRH
into the sge.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Cleanup hfi1_ud_rcv to not have to look at the packet
header fields multiple times. The fields are looked up
once and used throughout the function. Also fix sc
computation when validating MAD packets.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Reset QSFP on every run through channel tuning
Active QSFP cables were reset only every alternate iteration of the
channel tuning algorithm instead of every iteration due to incorrect
reset of the flag that controlled QSFP reset, resulting in using stale
QSFP status in the channel tuning algorithm.
Fixes: 8ebd4cf1852a ("Add active and optical cable support") Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Ignore QSFP interrupts until power stabilizes
Some QSFP cables assert the interrupt line as a side effect of module
plug-in and power up. This causes the SerDes and QSFP tuning algorithm
to begin cable initialization by reading the QSFP memory map over I2C,
which fails. This patch ignores any interrupt line assertion until
the module has completed power up and voltage rails have stabilized,
which can take a maximum of 500 ms per the SFF-8679 specification.
Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
QSFP CDR enablement is now controlled by determining power class
and the configuration file. We disable the DC 8051 from requesting
enablement or disabling of TX and RX CDRs by removing the code
that allowed the DC 8051 to request changes.
Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
With the base memory management extension enabled, FRMR is used instead
of FMR. The xprtrdma server issues each RDMA read request as the following
bundle:
These requests are signaled. In order to generate completion, the fast
register work request is processed by the hfi1 send engine after being
posted to the work queue, and the corresponding lkey is not valid until
the request is processed. However, the rdmavt driver validates lkey when
the RDMA read request is posted and thus it fails immediately with error
-EINVAL (-22).
This patch changes the work flow of local operations (fast register and
local invalidate) so that fast register work requests are always
processed immediately to ensure that the corresponding lkey is valid
when subsequent work requests are posted. Local invalidate requests are
processed immediately if fencing is not required and no previous local
invalidate request is pending.
To allow completion generation for signaled local operations that have
been processed before posting to the work queue, an internal send flag
RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag
and only generates completion for such requests.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 25 Jul 2016 20:39:39 +0000 (13:39 -0700)]
IB/hfi1: Add the capability for reserved operations
This fix allows for support of in-kernel reserved operations
without impacting the ULP user.
The low level driver can register a non-zero value which
will be transparently added to the send queue size and hidden
from the ULP in every respect.
ULP post sends will never see a full queue due to a reserved
post send and reserved operations will never exceed that
registered value.
The s_avail will continue to track the ULP swqe availability
and the difference between the reserved value and the reserved
in use will track reserved availabity.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Grzegorz Heldt [Mon, 25 Jul 2016 20:39:33 +0000 (13:39 -0700)]
IB/hfi1: Fix trace message units
Trace shows incorrect amount of allocated memory.
Fix trace to display memory in KB.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Grzegorz Heldt <grzegorz.heldt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Mon, 25 Jul 2016 20:39:21 +0000 (13:39 -0700)]
IB/hfi1: Add static PCIe Gen3 CTLE tuning
Enhance the PCIe Gen3 recipe to support static CTLE tuning,
and add a switch to choose between static and dynamic
approaches. Make discrete chips default to static CTLE
tuning.
Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Mon, 25 Jul 2016 20:38:56 +0000 (13:38 -0700)]
IB/hfi1: Explain state complete frame details
When link up fails in LNI, the local and peer state complete
frames are reported as numbers. Explain what the values mean
so the operator can better diagnose the problem.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Modify the default number of kernel receive conexts
Currently, the default number of kernel receive contexts is set to the
number of NUMA nodes on the system plus one for control context. However,
the systems that have a single socket and/or have NUMA disabled in the BIOS
will have only one receive context by default. This patch would ensure that
by default there will be at least two kernel receive contexts plus one for
control context regardless of the number of NUMA nodes on the system. The
user can override the default number of kernel receive contexts with the
krcvqs module parameter.
Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Work request processing for fast register mr and invalidate
In order to support extended memory management support, add send side
processing of work requests of type IB_WR_REG_MR, IB_WR_LOCAL_INV, and
IB_WR_SEND_WITH_INV. The first two are local operations and are supported
for both RC and UC. Send with invalidate is only supported for RC because
the corresponding IB opcodes are not defined for UC.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Some work requests are local operations, such as IB_WR_REG_MR and
IB_WR_LOCAL_INV. They differ from non-local operations in that:
(1) Local operations can be processed immediately without being posted
to the send queue if neither fencing nor completion generation is needed.
However, to ensure correct ordering, once a local operation is posted to
the work queue due to fencing or completion requiement, all subsequent
local operations must also be posted to the work queue until all the
local operations on the work queue have completed.
(2) Local operations don't send packets over the wire and thus don't
need (and shouldn't update) the packet sequence numbers.
Define a new a flag bit for the post send table to identify local
operations.
Add a new field to the QP structure to track the number of local
operations on the send queue to determine if direct processing of new
local operations should be enabled/disabled.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to support extended memory management, add the mechanism to
invalidate MR keys. This includes a flag "lkey_invalid" in the MR data
structure that is to be checked when validating access to the MR via
the associated key, and two utility functions to perform fast memory
registration and memory key invalidate operations.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Pull FECN/BECN processing to a common place
There were multiple places where FECN/BECN processing was
being done for the different types of QPs. All of that code
was very similar, which meant that it could be pulled into
a single function used by the different QP types.
To retain the performance in the fastpath, the common code
starts with an inline function, which only calls the slow
path if the packet has any of the [FB]ECN bits set.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Fix to fully initialize send context area
While handling buffer control MAD, partially initialized
dd->kernel_send_context area may cause potential dereference
of uninitialized pointers. Fix by using kzalloc_node()
instead of kmalloc_node().
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Jakub Pawlak [Mon, 25 Jul 2016 20:37:54 +0000 (13:37 -0700)]
IB/hfi1: Fix integrity errors counter value calculation
PMA should not sum TX and RX replay counts when reporting
local link integrity errors. Fixed by removing C_DC_TX_REPLAY
counter from calculation of the link integrity errors counter
value.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Improve SDMA engine assignment for user SDMA
Currently each user context is assigned a single SDMA engine
based on the VL, context id, and subcontext id. That means for
MPI applications, each rank can only use one SDMA engine for
all messages. This may create unwanted backup for independent
messages going to different destinations upon congestion at one
destination.
This patch adds the packet "dlid" to the formula of SDMA engine
selection for user SDMA requests. A simple hash table is used
to maintain even distribution among the available SDMA engines
regardless how the "dlid" values are distributed.
Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
When performing process affinity recommendations for MPI ranks, the current
algorithm doesn't take into account multiple HFI units. Also, real
cores and HT cores are not distinguished from one another. Therefore,
all HT cores are recommended to be assigned first within the local NUMA
node before recommending the assignments of cores in other NUMA nodes.
It's ideal to assign all real cores across all NUMA nodes first, then all
HT 1 cores, then all HT 2 cores, and so on to balance CPU workload. CPU
cores in other NUMA nodes could be running interrupt handlers, and this is
not taken into account.
To balance the CPU workload for user processes, the following
recommendation algorithm is used:
For each user process that is opening a context on HFI Y:
a) If all cores are assigned to user processes, start assignments all
over from the first core
b) Assign real cores first, then HT cores (First set of HT cores on
all physical cores, then second set of HT cores, and, so on) in the
following order:
1. Same NUMA node as HFI Y and not running an IRQ handler
2. Same NUMA node as HFI Y and running an IRQ handler
3. Different NUMA node to HFI Y and not running an IRQ handler
4. Different NUMA node to HFI Y and running an IRQ handler
c) Mark core as assigned in the global affinity structure. As user
processes are done, remove core assignments from global affinity
structure.
This implementation allows an arbitrary number of HT cores and provides
support for multiple HFIs.
This is being included in the kernel rather than user space due to the
fact that user space has no way of knowing the CPU recommendations for
contexts running as part of other jobs.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Reserve and collapse CPU cores for contexts
Kernel receive queues oversubscribe CPU cores on multi-HFI systems.
To prevent this, the kernel receive queues are separated onto
different cores, and the SDMA engine interrupts are constrained to
a lesser number of cores.
hfi1s_on_numa_node*krcvqs is the number of CPU cores that are
reserved for kernel receive queues for all HFIs. Each HFI initializes
its kernel receive queues to one of the reserved CPU cores. If there
ends up being 0 CPU cores leftover for SDMA engines, use the same
CPU cores as receive contexts.
In addition, general and control contexts are assigned to their own
CPU core, however, both types of contexts tend to have low traffic.
To save CPU cores, collapse general and control contexts to one CPU
core for all HFI units. This change prevents SDMA engine interrupts
from wrapping around general contexts.
Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Add global structure for affinity assignments
When HFI units get initialized, they each use their own mask copy for
affinity assignments. On a multi-HFI system, affinity assignments
overbook CPU cores as each HFI doesn't have knowledge of affinity
assignments for other HFI units. Therefore, some CPU cores are never
used for interrupt handlers in systems with high number of CPU cores
per NUMA node.
For multi-HFI systems, SDMA engine interrupt assignments start all over
from the first CPU in the local NUMA node after the first HFI
initialization. This change allows assignments to continue where the
last HFI unit left off.
Add global structure for affinity assignments for multiple HFIs to share
affinity mask.
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Jubin John <jubin.john@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Separate tracepoints into specific headers
The ftrace infrastructure used to evaluate the TRACE_SYSTEM
macro on every DEFINE_EVENT() macro. Now the TRACE_SYSTEM
macro only gets evaluated when trace/define_trace.h is
included, so the group event information is lost. This was
introduced in
commit acd388fd3af3 ("tracing: Give system name a pointer")
Therefore, each system tracepoint must be on its own file.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Merge tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A fix for a long-standing bug in the incremental osdmap handling code
that caused misdirected requests, tagged for stable"
The tag is signed with a brand new key - Sage is on vacation and I
didn't anticipate this"
* tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client:
libceph: apply new_state before new_up_client on incrementals
1) Fix memory leak in nftables, from Liping Zhang.
2) Need to check result of vlan_insert_tag() in batman-adv otherwise we
risk NULL skb derefs, from Sven Eckelmann.
3) Check for dev_alloc_skb() failures in cfg80211, from Gregory
Greenman.
4) Handle properly when we have ppp_unregister_channel() happening in
parallel with ppp_connect_channel(), from WANG Cong.
5) Fix DCCP deadlock, from Eric Dumazet.
6) Bail out properly in UDP if sk_filter() truncates the packet to be
smaller than even the space that the protocol headers need. From
Michal Kubecek.
7) Similarly for rose, dccp, and sctp, from Willem de Bruijn.
8) Make TCP challenge ACKs less predictable, from Eric Dumazet.
9) Fix infinite loop in bgmac_dma_tx_add() from Florian Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
packet: propagate sock_cmsg_send() error
net/mlx5e: Fix del vxlan port command buffer memset
packet: fix second argument of sock_tx_timestamp()
net: switchdev: change ageing_time type to clock_t
Update maintainer for EHEA driver.
net/mlx4_en: Add resilience in low memory systems
net/mlx4_en: Move filters cleanup to a proper location
sctp: load transport header after sk_filter
net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int
net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
net: nb8800: Fix SKB leak in nb8800_receive()
et131x: Fix logical vs bitwise check in et131x_tx_timeout()
vlan: use a valid default mtu value for vlan over macsec
net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
mlxsw: spectrum: Prevent invalid ingress buffer mapping
mlxsw: spectrum: Prevent overwrite of DCB capability fields
mlxsw: spectrum: Don't emit errors when PFC is disabled
mlxsw: spectrum: Indicate support for autonegotiation
mlxsw: spectrum: Force link training according to admin state
r8152: add MODULE_VERSION
...
Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"This contains a fix for a potential crash/corruption issue and another
where the suid/sgid bits weren't cleared on write"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: verify upper dentry in ovl_remove_and_whiteout()
ovl: Copy up underlying inode's ->i_mode to overlay inode
ovl: handle ATTR_KILL*
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
pps: do not crash when failed to register
tools/vm/slabinfo: fix an unintentional printf
testing/radix-tree: fix a macro expansion bug
radix-tree: fix radix_tree_iter_retry() for tagged iterators.
mm: memcontrol: fix cgroup creation failure after many small jobs
Merge tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied/linux
Pull intel kabylake drm fixes from Dave Airlie:
"As mentioned Intel has gathered all the Kabylake fixes from -next,
which we've enabled in 4.7 for the first time, these are pretty much
limited in scope to only affects kabylake, which is hw that isn't
shipping yet. So I'm mostly okay with it going in now.
If we don't land this, it might be a good idea to disable kabylake
support in 4.7 before we ship"
* tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied/linux: (28 commits)
drm/i915/kbl: Introduce the first official DMC for Kabylake.
drm/i915: Introduce Kabypoint PCH for Kabylake H/DT.
drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidate
drm/i915/gen9: Add WaFbcHighMemBwCorruptionAvoidance
drm/i195/fbc: Add WaFbcNukeOnHostModify
drm/i915/gen9: Add WaFbcWakeMemOn
drm/i915/gen9: Add WaFbcTurnOffFbcWatermark
drm/i915/kbl: Add WaClearSlmSpaceAtContextSwitch
drm/i915/gen9: Add WaEnableChickenDCPR
drm/i915/kbl: Add WaDisableSbeCacheDispatchPortSharing
drm/i915/kbl: Add WaDisableGafsUnitClkGating
drm/i915/kbl: Add WaForGAMHang
drm/i915: Add WaInsertDummyPushConstP for bxt and kbl
drm/i915/kbl: Add WaDisableDynamicCreditSharing
drm/i915/kbl: Add WaDisableGamClockGating
drm/i915/gen9: Enable must set chicken bits in config0 reg
drm/i915/kbl: Add WaDisableLSQCROPERFforOCL
drm/i915/kbl: Add WaDisableSDEUnitClockGating
drm/i915/kbl: Add WaDisableFenceDestinationToSLM for A0
drm/i915/kbl: Add WaEnableGapsTsvCreditFix
...
Merge tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Two i915 regression fixes.
Intel have submitted some Kabylake fixes I'll send separately, since
this is the first kernel with kabylake support and they don't go much
outside that area I think they should be fine"
* tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied/linux:
drm/i915: add missing condition for committing planes on crtc
drm/i915: Treat eDP as always connected, again
* tag 'm68k-for-v4.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k/defconfig: Update defconfigs for v4.7-rc2
m68k: Assorted spelling fixes
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A handful of fixes before final release:
Marvell Armada:
- One to fix a typo in the devicetree specifying memory ranges for
the crypto engine
- Two to deal with marking PCI and device-memory as strongly ordered
to avoid hardware deadlocks, in particular when enabling above
crypto driver.
- Compile fix for PM
Allwinner:
- DT clock fixes to deal with u-boot-enabled framebuffer (simplefb).
- Make R8 (C.H.I.P. SoC) inherit system compatibility from A13 to
make clocks register proper.
Tegra:
- Fix SD card voltage setting on the Tegra3 Beaver dev board
Misc:
- Two maintainers updates for STM32 and STi platforms"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: tegra: beaver: Allow SD card voltage to be changed
MAINTAINERS: update STi maintainer list
MAINTAINERS: update STM32 maintainers list
ARM: mvebu: compile pm code conditionally
ARM: dts: sun7i: Fix pll3x2 and pll7x2 not having a parent clock
ARM: dts: sunxi: Add pll3 to simplefb nodes clocks lists
ARM: dts: armada-38x: fix MBUS_ID for crypto SRAM on Armada 385 Linksys
ARM: mvebu: map PCI I/O regions strongly ordered
ARM: mvebu: fix HW I/O coherency related deadlocks
ARM: sunxi/dt: make the CHIP inherit from allwinner,sun5i-a13
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key handling fixes from James Morris:
"Quoting David Howells:
Here are three miscellaneous fixes:
(1) Fix a panic in some debugging code in PKCS#7. This can only
happen by explicitly inserting a #define DEBUG into the code.
(2) Fix the calculation of the digest length in the PE file parser.
This causes a failure where there should be a success.
(3) Fix the case where an X.509 cert can be added as an asymmetric key
to a trusted keyring with no trust restriction if no AKID is
supplied.
Bugs (1) and (2) aren't particularly problematic, but (3) allows a
security check to be bypassed. Happily, this is a recent regression
and never made it into a released kernel"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: Fix for erroneous trust of incorrectly signed X.509 certs
pefile: Fix the failure of calculation for digest
PKCS#7: Fix panic when referring to the empty AKID when DEBUG defined
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"A few more fixes for the input subsystem:
- restore naming for tsc2005 touchscreens as some userspace match on it
- fix out of bound access in legacy keyboard driver
- fixup in RMI4 driver
Everything is tagged for stable as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: tsc200x - report proper input_dev name
tty/vt/keyboard: fix OOB access in do_compute_shiftstate()
Input: synaptics-rmi4 - fix maximum size check for F12 control register 8
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Dan Williams:
"This contains a regression fix for a problem that was introduced in
v4.7-rc6.
In 4.7-rc1 we introduced auto-probing for the ACPI DSM (device-
specific-method) format that the platform firmware implements for
nvdimm devices. We initially fixed a regression in probing the QEMU
DSM implementation by making acpi_check_dsm() tolerant of the way QEMU
reports the "0 DSMs supported" condition.
However, that broke HPE platforms since that tolerance caused the
driver to mistakenly match the 1-zero-byte response those platforms
give to "unknown" commands. Instead, we simply make the driver
tolerant of not finding any supported DSMs. This has been tested to
work with both QEMU and HPE platforms.
This commit has appeared in a -next release with no reported issues"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit: make DIMM DSMs optional
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Michael Turquette:
"Fix a bug in the at91 clk driver, two compile time warnings in sunxi
clk drivers, and one bug in a sunxi clk driver introduced in the 4.7
merge window"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: at91: fix clk_programmable_set_parent()
clk: sunxi: remove unused variable
clk: sunxi: display: Add per-clock flags
clk: sunxi: tcon-ch1: Do not return a negative error in get_parent
Merge tag 'sound-4.7-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"No surprise, just a few small fixes: a couple of changes are seen in
the core part, and both of them are rather for unusual error paths.
The rest are the regular HD-audio fixes and one USB-audio regression
fix"
* tag 'sound-4.7-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix quirks code is not called
ALSA: hda: add AMD Stoney PCI ID with proper driver caps
ALSA: hda - fix use-after-free after module unload
ALSA: pcm: Free chmap at PCM free callback, too
ALSA: ctl: Stop notification after disconnection
ALSA: hda/realtek - add new pin definition in alc225 pin quirk table
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull NVMe fix from Jens Axboe:
"Late addition here, it's basically a revert of a patch that was added
in this merge window, but has proven to cause problems.
This is swapping out the RCU based namespace protection with a good
old mutex instead"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: Remove RCU namespace protection
Dan Carpenter [Wed, 20 Jul 2016 22:45:05 +0000 (15:45 -0700)]
tools/vm/slabinfo: fix an unintentional printf
The curly braces are missing here so we print stuff unintentionally.
Fixes: 9da4714a2d44 ('slub: slabinfo update for cmpxchg handling') Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
radix-tree: fix radix_tree_iter_retry() for tagged iterators.
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot()
leading to crash:
Johannes Weiner [Wed, 20 Jul 2016 22:44:57 +0000 (15:44 -0700)]
mm: memcontrol: fix cgroup creation failure after many small jobs
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears. At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild. Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs. Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.
Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.
Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later. They pose no hurdle.
Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages. And those
references are under the user's control, so they are manageable.
This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that. This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:
set -e
mkdir -p pages
for x in `seq 128000`; do
[ $((x % 1000)) -eq 0 ] && echo $x
mkdir /cgroup/foo
echo $$ >/cgroup/foo/cgroup.procs
echo trex >pages/$x
echo $$ >/cgroup/cgroup.procs
rmdir /cgroup/foo
done
When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:
[root@ham ~]# ./cssidstress.sh
[...]
65000
mkdir: cannot create directory '/cgroup/foo': No space left on device
After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.
[hannes@cmpxchg.org: init the IDR] Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups") Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: John Garcia <john.garcia@mesosphere.io> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Nikolay Borisov <kernel@kyup.com> Cc: <stable@vger.kernel.org> [3.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I stumbled over a build error with COMPILE_TEST and CONFIG_OF
disabled:
drivers/gpio/gpio-tegra.c: In function 'tegra_gpio_probe':
drivers/gpio/gpio-tegra.c:603:9: error: 'struct gpio_chip' has no member named 'of_node'
The problem is that the newly added GPIO_TEGRA Kconfig symbol
does not have a dependency on CONFIG_OF. However, there is another
problem here as the driver gets enabled unconditionally whenever
COMPILE_TEST is set.
This fixes both problems, by making the symbol user-visible
when COMPILE_TEST is set and default-enabled for ARCH_TEGRA=y.
As a side-effect, it is now possible to compile-test a Tegra
kernel with GPIO support disabled, which is harmless.
libceph: apply new_state before new_up_client on incrementals
Currently, osd_weight and osd_state fields are updated in the encoding
order. This is wrong, because an incremental map may look like e.g.
new_up_client: { osd=6, addr=... } # set osd_state and addr
new_state: { osd=6, xorstate=EXISTS } # clear osd_state
Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down). After
applying new_up_client, osd_state is changed to EXISTS | UP. Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state. A non-existent OSD is considered down by the
mapping code
2087 for (i = 0; i < pg->pg_temp.len; i++) {
2088 if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
2089 if (ceph_can_shift_osds(pi))
2090 continue;
2091
2092 temp->osds[temp->size++] = CRUSH_ITEM_NONE;
and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:
[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680
and hung rbds on the client:
[ 493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[ 493.566805] rbd: rbd0: result -6 xferred 400000
[ 493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688
The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed
Fixes: http://tracker.ceph.com/issues/14901 Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>
ovl: verify upper dentry in ovl_remove_and_whiteout()
The upper dentry may become stale before we call ovl_lock_rename_workdir.
For example, someone could (mistakenly or maliciously) manually unlink(2)
it directly from upperdir.
To ensure it is not stale, let's lookup it after ovl_lock_rename_workdir
and and check if it matches the upper dentry.
Essentially, it is the same problem and similar solution as in
commit 11f3710417d0 ("ovl: verify upper dentry before unlink and rename").
sock_cmsg_send() can return different error codes and not only
-EINVAL, and we should properly propagate them.
Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Stancek [Thu, 30 Jun 2016 10:23:51 +0000 (12:23 +0200)]
crypto: qat - make qat_asym_algs.o depend on asn1 headers
Parallel build can sporadically fail because asn1 headers may
not be built yet by the time qat_asym_algs.o is compiled:
drivers/crypto/qat/qat_common/qat_asym_algs.c:55:32: fatal error: qat_rsapubkey-asn1.h: No such file or directory
#include "qat_rsapubkey-asn1.h"
Cc: stable@vger.kernel.org Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Michael Welling [Wed, 20 Jul 2016 17:02:07 +0000 (10:02 -0700)]
Input: tsc200x - report proper input_dev name
Passes input_id struct to the common probe function for the tsc200x drivers
instead of just the bustype.
This allows for the use of the product variable to set the input_dev->name
variable according to the type of touchscreen used. Note that when we
introduced support for TSC2004 we started calling everything TSC200X, so
let's keep this quirk.
Signed-off-by: Michael Welling <mwelling@ieee.org> Cc: stable@vger.kernel.org Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Dmitry Torokhov [Mon, 27 Jun 2016 21:12:34 +0000 (14:12 -0700)]
tty/vt/keyboard: fix OOB access in do_compute_shiftstate()
The size of individual keymap in drivers/tty/vt/keyboard.c is NR_KEYS,
which is currently 256, whereas number of keys/buttons in input device (and
therefor in key_down) is much larger - KEY_CNT - 768, and that can cause
out-of-bound access when we do
sym = U(key_maps[0][k]);
with large 'k'.
To fix it we should not attempt iterating beyond smaller of NR_KEYS and
KEY_CNT.
Also while at it let's switch to for_each_set_bit() instead of open-coding
it.
net/mlx5e: Fix del vxlan port command buffer memset
memset the command buffers rather than the pointers to them.
Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
packet: fix second argument of sock_tx_timestamp()
This patch fixes an issue that a syscall (e.g. sendto syscall) cannot
work correctly. Since the sendto syscall doesn't have msg_control buffer,
the sock_tx_timestamp() in packet_snd() cannot work correctly because
the socks.tsflags is set to 0.
So, this patch sets the socks.tsflags to sk->sk_tsflags as default.
Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") Reported-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> Reported-by: Keita Kobayashi <keita.kobayashi.ym@renesas.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Duggan [Wed, 20 Jul 2016 00:53:59 +0000 (17:53 -0700)]
Input: synaptics-rmi4 - fix maximum size check for F12 control register 8
According to the RMI4 spec the maximum size of F12 control register 8 is
15 bytes. The current code incorrectly reports an error if control 8 is
greater then 14. Making sensors with a control register 8 with 15 bytes
unusable.
Signed-off-by: Andrew Duggan <aduggan@synaptics.com> Reported-by: Chris Healy <cphealy@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Vivien Didelot [Mon, 18 Jul 2016 19:02:06 +0000 (15:02 -0400)]
net: switchdev: change ageing_time type to clock_t
The switchdev value for the SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME
attribute is a clock_t and requires to use helpers such as
clock_t_to_jiffies() to convert to milliseconds.
Change ageing_time type from u32 to clock_t to make it explicit.
Fixes: f55ac58ae64c ("switchdev: add bridge ageing_time attribute") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>