Chuck Lever [Wed, 29 Jun 2016 17:55:22 +0000 (13:55 -0400)]
NFS: Don't drop CB requests with invalid principals
Before commit 778be232a207 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.
Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:55:14 +0000 (13:55 -0400)]
svc: Avoid garbage replies when pc_func() returns rpc_drop_reply
If an RPC program does not set vs_dispatch and pc_func() returns
rpc_drop_reply, the server sends a reply anyway. That reply consists
of a single word holding the value RPC_DROP_REPLY (in network
byte-order, of course). This is a nonsense RPC message.
Fixes: 9e701c610923 ("svcrpc: simpler request dropping") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
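The intended behaviour can be illustrated with a minimal userspace sketch. It assumes a much simplified dispatch path: process_request(), handler_fn and the STAT_* values are hypothetical stand-ins, not the sunrpc code or its constants. The point is only that a "drop" status must suppress the reply rather than be encoded into it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the kernel's real values differ. */
enum { STAT_OK = 0, STAT_DROP_REPLY = 1 };

typedef uint32_t (*handler_fn)(void);

/*
 * Run the procedure handler and decide whether a reply goes out.
 * Returns true and stores one status word in reply[0] when a reply
 * should be sent. Returns false and leaves reply[] untouched when
 * the handler asked for the request to be dropped.
 */
static bool process_request(handler_fn handler, uint32_t *reply,
			    size_t *reply_words)
{
	uint32_t status = handler();

	if (status == STAT_DROP_REPLY) {
		*reply_words = 0;	/* nothing is encoded or sent */
		return false;
	}

	reply[0] = status;
	*reply_words = 1;
	return true;
}

/* Two toy handlers used to exercise both paths. */
static uint32_t handler_ok(void)   { return STAT_OK; }
static uint32_t handler_drop(void) { return STAT_DROP_REPLY; }
```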
Chuck Lever [Wed, 29 Jun 2016 17:55:06 +0000 (13:55 -0400)]
xprtrdma: No direct data placement with krb5i and krb5p
Direct data placement is not allowed when using flavors that
guarantee integrity or privacy. When such security flavors are in
effect, don't allow the use of Read and Write chunks for moving
individual data items. All messages larger than the inline threshold
are sent via Long Call or Long Reply.
On my systems (CX-3 Pro on FDR), for small I/O operations, the use
of Long messages adds only around 5 usecs of latency in each
direction.
Note that when integrity or encryption is used, the host CPU touches
every byte in these messages. Even if it could be used, data
movement offload doesn't buy much in this case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
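The selection rule described above can be sketched as follows. This is an illustration only: the enum names and choose_xfer() are hypothetical and do not appear in xprtrdma, and the sketch ignores the distinction between Calls and Replies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names that only mirror the terms used in the text. */
enum sec_flavor { SEC_SYS, SEC_KRB5, SEC_KRB5I, SEC_KRB5P };
enum xfer_type { XFER_INLINE, XFER_DATA_CHUNKS, XFER_LONG_MESSAGE };

/* Integrity and privacy flavors wrap the whole message: no DDP. */
static bool ddp_allowed(enum sec_flavor flavor)
{
	return flavor != SEC_KRB5I && flavor != SEC_KRB5P;
}

/* Pick how a message of msg_len bytes is conveyed. */
static enum xfer_type choose_xfer(enum sec_flavor flavor, size_t msg_len,
				  size_t inline_threshold)
{
	if (msg_len <= inline_threshold)
		return XFER_INLINE;

	/* Too large to send inline: chunks only if DDP is permitted. */
	return ddp_allowed(flavor) ? XFER_DATA_CHUNKS : XFER_LONG_MESSAGE;
}
```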
Chuck Lever [Wed, 29 Jun 2016 17:54:58 +0000 (13:54 -0400)]
xprtrdma: Clean up fixup_copy_count accounting
fixup_copy_count should count only the number of bytes copied to the
page list. The head and tail are now always handled without a data
copy.
And the debugging at the end of rpcrdma_inline_fixup() is also no
longer necessary, since copy_len will be non-zero when there is reply
data in the tail (a normal and valid case).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:54:49 +0000 (13:54 -0400)]
xprtrdma: Update only specific fields in private receive buffer
Now that rpcrdma_inline_fixup() updates only two fields in
rq_rcv_buf, a full memcpy of that structure to rq_private_buf is
unwarranted. Updating rq_private_buf fields only where needed also
better documents what is going on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:54:41 +0000 (13:54 -0400)]
xprtrdma: Do not update {head, tail}.iov_len in rpcrdma_inline_fixup()
While trying NFSv4.0/RDMA with sec=krb5p, I noticed small NFS READ
operations failed. After the client unwrapped the NFS READ reply
message, the NFS READ XDR decoder was not able to decode the reply.
The message was "Server cheating in reply", with the reported
number of received payload bytes being zero. Applications reported
a read(2) that returned -1/EIO.
The problem is rpcrdma_inline_fixup() sets the tail.iov_len to zero
when the incoming reply fits entirely in the head iovec. The zero
tail.iov_len confused xdr_buf_trim(), which then mangled the actual
reply data instead of simply removing the trailing GSS checksum.
As near as I can tell, RPC transports are not supposed to update the
head.iov_len, page_len, or tail.iov_len fields in the receive XDR
buffer when handling an incoming RPC reply message. These fields
contain the length of each component of the XDR buffer, and hence
the maximum number of bytes of reply data that can be stored in each
XDR buffer component. I've concluded this because:
- This is how xdr_partial_copy_from_skb() appears to behave
- rpcrdma_inline_fixup() already does not alter page_len
- call_decode() compares rq_private_buf and rq_rcv_buf and WARNs
if they are not exactly the same
Unfortunately, as soon as I tried the simple fix to just remove the
line that sets tail.iov_len to zero, I saw that the logic that
appends the implicit Write chunk pad inline depends on inline_fixup
setting tail.iov_len to zero.
To address this, re-organize the tail iovec handling logic to use
the same approach as with the head iovec: simply point tail.iov_base
to the correct bytes in the receive buffer.
While I remember all this, write down the conclusion in documenting
comments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
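A userspace sketch of the approach described above, under stated assumptions: the structures are simplified stand-ins for the kernel's xdr_buf (the page list is modelled as one flat buffer), and inline_fixup_sketch() is a hypothetical name, not the real rpcrdma_inline_fixup(). The head and tail are handled by pointing their base at the received bytes, the length fields are never written, and the page-list copy is limited to page_len.

```c
#include <stddef.h>
#include <string.h>

struct kvec_sketch {
	char *base;
	size_t len;		/* capacity; never modified here */
};

struct xdr_buf_sketch {
	struct kvec_sketch head;
	char *pages;		/* flat stand-in for the page list */
	size_t page_len;	/* capacity; never modified here */
	struct kvec_sketch tail;
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Distribute copy_len received bytes at src across head, pages and
 * tail. Returns the number of bytes copied into the page list.
 */
static size_t inline_fixup_sketch(struct xdr_buf_sketch *buf, char *src,
				  size_t copy_len)
{
	size_t n, copied;

	/* Head: point at the received bytes, no data copy. */
	buf->head.base = src;
	n = min_size(copy_len, buf->head.len);
	src += n;
	copy_len -= n;

	/* Page list: copy, but never more than page_len bytes. */
	copied = min_size(copy_len, buf->page_len);
	if (copied)
		memcpy(buf->pages, src, copied);
	src += copied;
	copy_len -= copied;

	/* Tail: whatever remains stays in place in the receive buffer. */
	if (copy_len)
		buf->tail.base = src;

	return copied;
}
```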
Chuck Lever [Wed, 29 Jun 2016 17:54:33 +0000 (13:54 -0400)]
xprtrdma: rpcrdma_inline_fixup() overruns the receive page list
When the remaining length of an incoming reply is longer than the
XDR buf's page_len, switch over to the tail iovec instead of
copying more than page_len bytes into the page list.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:54:25 +0000 (13:54 -0400)]
xprtrdma: Chunk list encoders no longer share one rl_segments array
Currently, all three chunk list encoders each use a portion of the
one rl_segments array in rpcrdma_req. This is because the MWs for
each chunk list were preserved in rl_segments so that ro_unmap could
find and invalidate them after the RPC was complete.
However, now that MWs are placed on a per-req linked list as they
are registered, there is no longer any information in rpcrdma_mr_seg
that is shared between ro_map and ro_unmap_{sync,safe}, and thus
nothing in rl_segments needs to be preserved after
rpcrdma_marshal_req is complete.
Thus the rl_segments array can be used now just for the needs of
each rpcrdma_convert_iovs call. Once each chunk list is encoded, the
next chunk list encoder is free to re-use all of rl_segments.
This means all three chunk lists in one RPC request can now each
encode a full size data payload with no increase in the size of
rl_segments.
This is a key requirement for Kerberos support, since both the Call
and Reply for a single RPC transaction are conveyed via Long
messages (RDMA Read/Write). Both can be large.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:54:16 +0000 (13:54 -0400)]
xprtrdma: Place registered MWs on a per-req list
Instead of placing registered MWs sparsely into the rl_segments
array, place these MWs on a per-req list.
ro_unmap_{sync,safe} can then simply pull those MWs off the list
instead of walking through the array.
This change significantly reduces the size of struct rpcrdma_req
by removing nsegs and rl_mw from every array element.
As an additional clean-up, chunk co-ordinates are returned in the
"*mw" output argument so they are no longer needed in every
array element.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
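The list-based bookkeeping can be sketched in plain C. The structure and function names below are hypothetical, and a hand-rolled singly linked list stands in for the kernel's list primitives. Registration pushes an MW onto the request's list, and unmapping drains the list instead of scanning an array.

```c
#include <stddef.h>

struct mw_sketch {
	int handle;			/* stand-in for the registered MR */
	struct mw_sketch *next;
};

struct req_sketch {
	struct mw_sketch *registered;	/* MWs registered for this request */
};

/* Called as each MW is registered during marshaling. */
static void req_add_mw(struct req_sketch *req, struct mw_sketch *mw)
{
	mw->next = req->registered;
	req->registered = mw;
}

/*
 * Pull every MW off the request's list and hand it to the caller's
 * invalidate callback. Returns how many MWs were processed.
 */
static unsigned int req_unmap_all(struct req_sketch *req,
				  void (*invalidate)(struct mw_sketch *))
{
	unsigned int count = 0;

	while (req->registered) {
		struct mw_sketch *mw = req->registered;

		req->registered = mw->next;
		mw->next = NULL;
		invalidate(mw);
		count++;
	}
	return count;
}

/* Toy invalidation used for illustration. */
static void mark_invalid(struct mw_sketch *mw)
{
	mw->handle = -1;
}
```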
Chuck Lever [Wed, 29 Jun 2016 17:54:08 +0000 (13:54 -0400)]
xprtrdma: Release orphaned MRs immediately
Instead of leaving orphaned MRs to be released when the transport
is destroyed, release them immediately. The MR free list can now be
replenished if it becomes exhausted.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:54:00 +0000 (13:54 -0400)]
xprtrdma: Allocate MRs on demand
Frequent MR list exhaustion can impact I/O throughput, so enough MRs
are always created during transport set-up to prevent running out.
This means more MRs are created than most workloads need.
Commit 94f58c58c0b4 ("xprtrdma: Allow Read list and Reply chunk
simultaneously") introduced support for sending two chunk lists per
RPC, which consumes more MRs per RPC.
Instead of trying to provision more MRs, introduce a mechanism for
allocating MRs on demand. A few MRs are allocated during transport
set-up to kick things off.
This significantly reduces the average number of MRs per transport
while allowing the MR count to grow for workloads or devices that
need more MRs.
FRWR with mlx4 allocated almost 400 MRs per transport before this
patch. Now it starts with 32.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
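A rough userspace sketch of an on-demand pool follows. The names are hypothetical, and one simplification is an assumption of this sketch rather than something the commit states: the pool grows synchronously inside pool_get(), whereas a kernel transport may well defer the allocation. The batch size of 32 echoes the starting count quoted above.

```c
#include <stdlib.h>

struct mr_sketch {
	struct mr_sketch *next;
};

struct mr_pool {
	struct mr_sketch *free_list;
	unsigned int total;		/* MRs ever allocated for this pool */
};

enum { MR_BATCH = 32 };			/* illustrative batch size */

/* Allocate up to count MRs; returns how many were actually added. */
static unsigned int pool_grow(struct mr_pool *pool, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		struct mr_sketch *mr = malloc(sizeof(*mr));

		if (!mr)
			break;
		mr->next = pool->free_list;
		pool->free_list = mr;
		pool->total++;
	}
	return i;
}

/* Take one MR, replenishing the free list if it is exhausted. */
static struct mr_sketch *pool_get(struct mr_pool *pool)
{
	struct mr_sketch *mr;

	if (!pool->free_list && pool_grow(pool, MR_BATCH) == 0)
		return NULL;

	mr = pool->free_list;
	pool->free_list = mr->next;
	mr->next = NULL;
	return mr;
}
```

A transport would call pool_grow() once at set-up with a small count and rely on pool_get() to extend the pool only for workloads that need it.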
Chuck Lever [Wed, 29 Jun 2016 17:53:52 +0000 (13:53 -0400)]
xprtrdma: Chunk list encoders must not return zero
Clean up, based on code audit: Remove the possibility that the
chunk list XDR encoders can return zero, which would be interpreted
as a NULL.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:53:43 +0000 (13:53 -0400)]
xprtrdma: Honor ->send_request API contract
Commit c93c62231cf5 ("xprtrdma: Disconnect on registration failure")
added a disconnect for some RPC marshaling failures. This is needed
only in a handful of cases, but it was triggering for simple stuff
like temporary resource shortages. Try to straighten this out.
Fix up the lower layers so they don't return -ENOMEM or other error
codes that the RPC client's FSM doesn't explicitly recognize.
Also fix up the places in the send_request path that do want a
disconnect. For example, when ib_post_send or ib_post_recv fail,
this is a sign that there is a send or receive queue resource
miscalculation. That should be rare, and is a sign of a software
bug. But xprtrdma can recover: disconnect to reset the transport and
start over.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:53:35 +0000 (13:53 -0400)]
xprtrdma: Reply buffer exhaustion can be catastrophic
Not having an rpcrdma_rep at call_allocate time can be a problem.
It means that send_request can't post a receive buffer to catch
the RPC's reply. Possible consequences are RPC timeouts or even
transport deadlock.
Instead of allowing an RPC to proceed if an rpcrdma_rep is
not available, return NULL to force call_allocate to wait and
try again.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:53:27 +0000 (13:53 -0400)]
xprtrdma: Clean up device capability detection
Clean up: Move device capability detection into memreg-specific
source files.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:53:19 +0000 (13:53 -0400)]
xprtrdma: Remove rpcrdma_map_one() and friends
Clean up: ALLPHYSICAL is gone and FMR has been converted to use
scatterlists. There are no more users of these functions.
This patch shrinks the size of struct rpcrdma_req by about 3500
bytes on x86_64. There is one of these structs for each RPC credit
(128 credits per transport connection).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
No HCA or RNIC in the kernel tree requires the use of ALLPHYSICAL.
ALLPHYSICAL advertises in the clear on the network fabric an R_key
that is good for all of the client's memory. No known exploit
exists, but theoretically any user on the server can use that R_key
on the client's QP to read or update any part of the client's memory.
ALLPHYSICAL exposes the client to server bugs, including:
o base/bounds errors causing data outside the i/o buffer to be
accessed
o RDMA access after reply causing data corruption and/or integrity
failures
ALLPHYSICAL can't protect application memory regions from server
update after a local signal or soft timeout has terminated an RPC.
ALLPHYSICAL chunks are no larger than a page. Special cases to
handle small chunks and long chunk lists have been a source of
implementation complexity and bugs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:53:02 +0000 (13:53 -0400)]
xprtrdma: Do not leak an MW during a DMA map failure
Based on code audit.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:52:54 +0000 (13:52 -0400)]
xprtrdma: Refactor MR recovery work queues
I found that commit ead3f26e359e ("xprtrdma: Add ro_unmap_safe
memreg method"), which introduces ro_unmap_safe, never wired up the
FMR recovery worker.
The FMR and FRWR recovery work queues both do the same thing.
Instead of setting up separate individual work queues for this,
schedule a delayed worker to deal with them, since recovering MRs is
not performance-critical.
Fixes: ead3f26e359e ("xprtrdma: Add ro_unmap_safe memreg method") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:52:45 +0000 (13:52 -0400)]
xprtrdma: Use scatterlist for DMA mapping and unmapping under FMR
The use of a scatterlist for handling DMA mapping and unmapping
was recently introduced in frwr_ops.c in commit 4143f34e01e9
("xprtrdma: Port to new memory registration API"). That commit did
not make a similar update to xprtrdma's FMR support because the
core ib_map_phys_fmr() and ib_unmap_fmr() APIs have not been changed
to take a scatterlist argument.
However, FMR still needs to do DMA mapping and unmapping. It appears
that RDS, for example, uses a scatterlist for this, then builds the
DMA addr array for the ib_map_phys_fmr call separately. I see that
SRP also utilizes a scatterlist for DMA mapping. xprtrdma can do
something similar.
This modernization is used immediately to properly defer DMA
unmapping during fmr_unmap_safe (a FIXME). It separates the DMA
unmapping coordinates from the rl_segments array. This array, being
part of an rpcrdma_req, is always re-used immediately when an RPC
exits. A scatterlist is allocated in memory independent of the
rl_segments array, so it can be preserved indefinitely (i.e., until
the MR invalidation and DMA unmapping can actually be done by a
worker thread).
The FRWR and FMR DMA mapping code are slightly different from each
other now, and will diverge further when the "Check for holes" logic
can be removed from FRWR (support for SG_GAP MRs). So I chose not to
create helpers for the common-looking code.
Fixes: ead3f26e359e ("xprtrdma: Add ro_unmap_safe memreg method") Suggested-by: Sagi Grimberg <sagi@lightbits.io> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:52:37 +0000 (13:52 -0400)]
xprtrdma: Rename fields in rpcrdma_fmr
Clean up: Use the same naming convention used in other
RPC/RDMA-related data structures.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:52:29 +0000 (13:52 -0400)]
xprtrdma: Move init and release helpers
Clean up: Moving these helpers in a separate patch makes later
patches more readable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:52:21 +0000 (13:52 -0400)]
xprtrdma: Create common scatterlist fields in rpcrdma_mw
Clean up: FMR is about to replace the rpcrdma_map_one code with
scatterlists. Move the scatterlist fields out of the FRWR-specific
union and into the generic part of rpcrdma_mw.
One minor change: -EIO is now returned if FRWR registration fails.
The RPC is terminated immediately, since the problem is likely due
to a software bug, thus retrying likely won't help.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 29 Jun 2016 17:52:12 +0000 (13:52 -0400)]
xprtrdma: Remove FMRs from the unmap list after unmapping
ib_unmap_fmr() takes a list of FMRs to unmap. However, it does not
remove the FMRs from this list as it processes them. Other
ib_unmap_fmr() call sites are careful to remove FMRs from the list
after ib_unmap_fmr() returns.
Since commit 7c7a5390dc6c8 ("xprtrdma: Add ro_unmap_sync method for FMR")
fmr_op_unmap_sync passes more than one FMR to ib_unmap_fmr(), but
it didn't bother to remove the FMRs from that list once the call was
complete.
I've noticed some instability that could be related to list
tangling by the new fmr_op_unmap_sync() logic. In an abundance
of caution, add some defensive logic to clean up properly after
ib_unmap_fmr().
Fixes: 7c7a5390dc6c8 ("xprtrdma: Add ro_unmap_sync method for FMR") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
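The defensive clean-up can be illustrated with a small sketch. bulk_unmap() is a hypothetical stand-in for a bulk call that, like ib_unmap_fmr() as described above, walks the list it is given but does not unlink anything. The caller then detaches every entry itself.

```c
#include <stddef.h>

struct fmr_sketch {
	int mapped;
	struct fmr_sketch *next;	/* link on a temporary unmap list */
};

/* Bulk unmap: visits every entry but leaves the list intact. */
static void bulk_unmap(struct fmr_sketch *list)
{
	for (; list; list = list->next)
		list->mapped = 0;
}

/* Unmap, then detach every entry so no stale links survive. */
static void unmap_and_unlink(struct fmr_sketch **list)
{
	bulk_unmap(*list);

	while (*list) {
		struct fmr_sketch *fmr = *list;

		*list = fmr->next;
		fmr->next = NULL;
	}
}
```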
The well-spotted fallocate undo fix is good in most cases, but not when
fallocate failed on the very first page. index 0 then passes lend -1
to shmem_undo_range(), and that has two bad effects: (a) that it will
undo every fallocation throughout the file, unrestricted by the current
range; but more importantly (b) it can cause the undo to hang, because
lend -1 is treated as truncation, which makes it keep on retrying until
every page has gone, but those already fully instantiated will never go
away. Big thank you to xfstests generic/269 which demonstrates this.
Fixes: b9b4bb26af01 ("tmpfs: don't undo fallocate past its last page") Cc: stable@vger.kernel.org Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
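The arithmetic behind the problem can be sketched as below. This is not the shmem code: fallocate_undo_range() and PAGE_SHIFT_SKETCH are hypothetical, and 4 KiB pages are assumed. With index 0 the end offset would come out as -1, which is why the sketch refuses to produce a range unless at least one page past start was reached.

```c
#include <stdbool.h>

#define PAGE_SHIFT_SKETCH 12	/* assumes 4 KiB pages */

struct undo_range {
	long long lstart;	/* first byte to undo */
	long long lend;		/* last byte to undo, inclusive */
};

/*
 * Compute the byte range to undo after a fallocate that began at page
 * 'start' and failed at page 'index'. Returns false when nothing was
 * allocated, so the caller skips the undo entirely and never passes
 * an end offset of -1.
 */
static bool fallocate_undo_range(unsigned long start, unsigned long index,
				 struct undo_range *r)
{
	if (index <= start)
		return false;	/* failed on the very first page */

	r->lstart = (long long)start << PAGE_SHIFT_SKETCH;
	r->lend = ((long long)index << PAGE_SHIFT_SKETCH) - 1;
	return true;
}
```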
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes. One is the qla24xx MSI regression, one is a theoretical
problem over blacklist matching, which would bite USB badly if it ever
triggered, and one is a system hang with a particular type of IPR
device"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
qla2xxx: Fix NULL pointer deref in QLA interrupt
SCSI: fix new bug in scsi_dev_info_list string matching
ipr: Clear interrupt on croc/crocodile when running with LSI
Merge tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
"Provide a more concise fix for CVE-2016-1583:
- Additionally fixes linux-stable regressions caused by the
cherry-picking of the original fix
Some very minor changes that have queued up:
- Fix typos in code comments
- Remove unnecessary check for NULL before destroying kmem_cache"
* tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: don't allow mmap when the lower fs doesn't support it
Revert "ecryptfs: forbid opening files without mmap handler"
ecryptfs: fix spelling mistakes
eCryptfs: fix typos in comment
ecryptfs: drop null test before destroy functions
Merge tag 'iommu-fixes-v4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Two Fixes:
- Intel VT-d fix for a suspend/resume issue, introduced with the
scalability improvements in this cycle.
- AMD IOMMU fix for systems that have unity mappings defined. There
was a race where translation got enabled before the unity mappings
were in place. This issue was seen on some HP servers"
* tag 'iommu-fixes-v4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix unity mapping initialization race
iommu/vt-d: Fix infinite loop in free_all_cpu_cached_iovas
Merge tag 'for-linus-4.7b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- Fix two bugs in the handling of xenbus transactions.
- Make the xen acpi driver compatible with Xen 4.7.
* tag 'for-linus-4.7b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7
xenbus: simplify xenbus_dev_request_and_reply()
xenbus: don't bail early from xenbus_dev_request_and_reply()
xenbus: don't BUG() on user mode induced condition
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"A couple of late fixes here, but one that we've been sitting on for a
few weeks while the details were worked out. Specifically, we now
enforce USER_DS on taking exceptions whilst in the kernel, which
avoids leaking kernel data to userspace through things like perf. The
other patch is an update to a workaround for a hardware erratum on
some Cavium SoCs.
Summary:
- Enforce USER_DS on exception entry from EL1
- Apply workaround for Cavium erratum #27456 on Thunderx-81xx parts"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Enable workaround for Cavium erratum 27456 on thunderx-81xx
arm64: kernel: Save and restore UAO and addr_limit on exception entry
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Three fixes:
- A boot crash fix with certain configs
- a MAINTAINERS entry update
- Documentation typo fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Documentation: Fix various typos in Documentation/x86/ files
x86/amd_nb: Fix boot crash on non-AMD systems
MAINTAINERS: Update the Calgary IOMMU entry
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two load-balancing fixes for cgroups-intense workloads"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion
sched/fair: Fix effective_load() to consistently use smoothed load
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Various fixes:
- 32-bit callgraph bug fix
- suboptimal event group scheduling bug fix
- event constraint fixes for Broadwell/Skylake
- RAPL module name collision fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix pmu::filter_match for SW-led groups
x86/perf/intel/rapl: Fix module name collision with powercap intel-rapl
perf/x86: Fix 32-bit perf user callgraph collection
perf/x86/intel: Update event constraints when HT is off
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Two MIPS-GIC irqchip driver fixes to unbreak certain MIPS boards"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Match IPI IRQ domain by bus token only
irqchip/mips-gic: Map to VPs using HW VPNum
Merge tag 'gpio-v4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"I don't like to toss in last minute patches, but these are all for
things that are broken, and have bitten people for real. Two of them
go into stable. Maybe all of them if the compile test problem is a
pain in the ass also for stable folks.
Final (hopefully) GPIO fixes for v4.7:
- Fix an oops on the Asus Eee PC 1201
- Revert a patch trying to split GPIO parsing and GPIO configuration
- Revert a too liberal compile testing thing"
* tag 'gpio-v4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
Revert "gpio: gpiolib-of: Allow compile testing"
Revert "gpiolib: Split GPIO flags parsing and GPIO configuration"
gpio: sch: Fix Oops on module load on Asus Eee PC 1201
Merge tag 'drm-fixes-for-v4.7-rc7' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One nouveau fix, and a few AMD Polaris fixes and some Allwinner fixes.
I've got some vmware fixes that I might send separate over the
weekend, they fix some black screens, but I'm still debating them"
* tag 'drm-fixes-for-v4.7-rc7' of git://people.freedesktop.org/~airlied/linux:
drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation.
drm/amd/powerplay: fix bug that get wrong polaris evv voltage.
drm/amd/powerplay: incorrectly use of the function return value
drm/amd/powerplay: fix incorrect voltage table value for tonga
drm/amd/powerplay: fix incorrect voltage table value for polaris10
drm/nouveau/disp/sor/gf119: select correct sor when poking training pattern
gpu: drm: sun4i_drv: add missing of_node_put after calling of_parse_phandle
drm/sun4i: Send vblank event when the CRTC is disabled
drm/sun4i: Report proper vblank
Jeff Mahoney [Tue, 5 Jul 2016 21:32:30 +0000 (17:32 -0400)]
ecryptfs: don't allow mmap when the lower fs doesn't support it
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs. We shouldn't emulate mmap support on file systems
that don't offer support natively.
CVE-2016-1583
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org
[tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Jan Beulich [Fri, 8 Jul 2016 12:15:07 +0000 (06:15 -0600)]
xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7
As of Xen 4.7 PV CPUID doesn't expose either CPUID[1].ECX[7] or
CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on
both Intel and AMD systems. Doing any kind of hardware capability
checks in the driver as a prerequisite was wrong anyway: With the
hypervisor being in charge, all such checking should be done by it. If
ACPI data gets uploaded despite some missing capability, the hypervisor
is free to ignore part or all of that data.
Ditch the entire check_prereq() function, and do the only valid check
(xen_initial_domain()) in the caller in its place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Jan Beulich [Thu, 7 Jul 2016 07:32:04 +0000 (01:32 -0600)]
xenbus: don't bail early from xenbus_dev_request_and_reply()
xenbus_dev_request_and_reply() needs to track whether a transaction is
open. For XS_TRANSACTION_START messages it calls transaction_start()
and for XS_TRANSACTION_END messages it calls transaction_end().
If sending an XS_TRANSACTION_START message fails or responds with an
error, the transaction is not open and transaction_end() must be
called.
If sending an XS_TRANSACTION_END message fails, the transaction is
still open, but if an error response is returned the transaction is
closed.
Commit 027bd7e89906 ("xen/xenbus: Avoid synchronous wait on XenBus
stalling shutdown/restart") introduced a regression where failed
XS_TRANSACTION_START messages were leaving the transaction open. This
can cause problems with suspend (and migration) as all transactions
must be closed before suspending.
It appears that the problematic change was added accidentally, so just
remove it.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
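The bookkeeping rules listed above can be condensed into a small state sketch. Only transaction_start() and transaction_end() are names taken from the text. The enums, the counter and request_and_reply() are hypothetical simplifications, not the xenbus code.

```c
#include <stdbool.h>

enum msg_type { MSG_OTHER, MSG_TRANSACTION_START, MSG_TRANSACTION_END };
enum outcome { SEND_FAILED, REPLY_OK, REPLY_ERROR };

/* Number of transactions currently considered open. */
static int open_transactions;

static void transaction_start(void) { open_transactions++; }
static void transaction_end(void)   { open_transactions--; }

/* Track transaction state for one request and its outcome. */
static void request_and_reply(enum msg_type type, enum outcome result)
{
	if (type == MSG_TRANSACTION_START)
		transaction_start();

	if (type == MSG_TRANSACTION_START && result != REPLY_OK)
		transaction_end();	/* the start never took effect */
	else if (type == MSG_TRANSACTION_END && result != SEND_FAILED)
		transaction_end();	/* any reply, even an error, closes it */
}
```

With this shape, a failed XS_TRANSACTION_START can never leave the counter above zero, which is the property suspend depends on.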
Merge tag 'acpi-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"All of these fix recent regressions in ACPICA, in the ACPI PCI IRQ
management code and in the ACPI AML debugger.
Specifics:
- Fix a lock ordering issue in ACPICA introduced by a recent commit
that attempted to fix a deadlock in the dynamic table loading code
which in turn appeared after changes related to the handling of
module-level AML also made in this cycle (Lv Zheng).
- Fix a recent regression in the ACPI IRQ management code that may
cause PCI drivers to be unable to register an IRQ if that IRQ
happens to be shared with a device on the ISA bus, like the
parallel port, by reverting one commit entirely and restoring the
previous behavior in two other places (Sinan Kaya).
- Fix a recent regression in the ACPI AML debugger introduced by the
commit that removed incorrect usage of IS_ERR_VALUE() from multiple
places (Lv Zheng)"
* tag 'acpi-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / debugger: Fix regression introduced by IS_ERR_VALUE() removal
ACPICA: Namespace: Fix namespace/interpreter lock ordering
ACPI,PCI,IRQ: separate ISA penalty calculation
Revert "ACPI, PCI, IRQ: remove redundant code in acpi_irq_penalty_init()"
ACPI,PCI,IRQ: factor in PCI possible
Merge tag 'pm-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"One fix for a recent cpuidle core change that, against all odds,
introduced a functional regression on Power systems and the fix for
the crash during resume from hibernation on x86-64 that has been in
the works for the last few weeks (it actually was ready last week, but
I wanted to allow the reporters to test it for some more time).
Specifics:
- Fix a recent performance regression on Power systems (powernv and
pseries) introduced by a core cpuidle commit that decreased the
precision of the last_residency conversion from nano- to
microseconds, which should not matter in theory, but turned out to
play not-so-well with the special "snooze" idle state on Power
(Shreyas B Prabhu).
- Fix a crash during resume from hibernation on x86-64 caused by
possible corruption of the kernel text part of page tables in the
last phase of image restoration exposed by a security-related
change during the 4.3 development cycle (Rafael Wysocki)"
* tag 'pm-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: Fix last_residency division
x86/power/64: Fix kernel text mapping corruption during image restoration
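To make the precision point about the last_residency conversion concrete, the sketch below compares an exact nanosecond-to-microsecond division with a cheaper approximation. That the imprecise conversion was a right shift by 10 bits is an assumption of this sketch. The pull message above only says that precision was decreased. Both function names are hypothetical.

```c
#include <stdint.h>

/* Approximate conversion: divides by 1024 instead of 1000. */
static uint64_t ns_to_us_shift(uint64_t ns)
{
	return ns >> 10;
}

/* Exact conversion from nanoseconds to whole microseconds. */
static uint64_t ns_to_us_exact(uint64_t ns)
{
	return ns / 1000;
}
```

For a residency of 1 ms (1000000 ns) the exact form yields 1000 us while the shift yields 976 us, an error of roughly 2.4 percent that applies to every measurement.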
Dave Airlie [Fri, 8 Jul 2016 03:29:11 +0000 (13:29 +1000)]
Merge tag 'sunxi-drm-fixes-for-4.7-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into drm-fixes
Allwinner DRM driver fixes for 4.7, take 2
A new set of fixes for the sun4i driver, mostly related to vblank handling,
and a minor fix to release a reference on the device tree nodes we're
parsing in the probe logic.
* tag 'sunxi-drm-fixes-for-4.7-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
gpu: drm: sun4i_drv: add missing of_node_put after calling of_parse_phandle
drm/sun4i: Send vblank event when the CRTC is disabled
drm/sun4i: Report proper vblank
apparmor: fix oops, validate buffer size in apparmor_setprocattr()
When proc_pid_attr_write() was changed to use memdup_user(), AppArmor's
(interface-violating) assumption that the setprocattr buffer was always
a single page no longer held.
The size test is not strictly speaking needed as proc_pid_attr_write()
will reject anything larger, but for the sake of robustness we can keep
it in.
SMACK and SELinux look safe to me, but somebody else should probably
have a look just in case.
Based on original patch from Vegard Nossum <vegard.nossum@oracle.com>
modified for the case that apparmor provides null termination.
Fixes: bb646cdb12e75d82258c2f2e7746d5952d3e321a Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: John Johansen <john.johansen@canonical.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@parisplace.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: stable@kernel.org Signed-off-by: John Johansen <john.johansen@canonical.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
It fixed a local root exploit but also introduced a dependency on
the lower file system implementing an mmap operation just to open a file,
which is a bit of a heavy hammer. The right fix is to have mmap depend
on the existence of the mmap handler instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block IO fixes from Jens Axboe:
"Three small fixes that have been queued up and tested for this series:
- A bug fix for xen-blkfront from Bob Liu, fixing an issue with
incomplete requests during migration.
- A fix for an ancient issue in retrieving the IO priority of a
different PID than self, preventing that task from going away while
we access it. From Omar.
- A writeback fix from Tahsin, fixing a case where we'd call ihold()
with a zero ref count inode"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix use-after-free in sys_ioprio_get()
writeback: inode cgroup wb switch should not call ihold()
xen-blkfront: save uncompleted reqs in blkfront_resume()
Merge tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs
Pull configfs fix from Christoph Hellwig:
"A fix from Marek for ppos handling in configfs_write_bin_file, which
was introduced in Linux 4.5, but didn't have any users until recently"
* tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs:
configfs: Remove ppos increment in configfs_write_bin_file
* acpi-pci-fixes:
ACPI,PCI,IRQ: separate ISA penalty calculation
Revert "ACPI, PCI, IRQ: remove redundant code in acpi_irq_penalty_init()"
ACPI,PCI,IRQ: factor in PCI possible
arm64: Enable workaround for Cavium erratum 27456 on thunderx-81xx
Cavium erratum 27456 commit 104a0c02e8b1
("arm64: Add workaround for Cavium erratum 27456")
is applicable for thunderx-81xx pass1.0 SoC as well.
Add code to enable it on 81xx.
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@cavium.com> Reviewed-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
James Morse [Mon, 20 Jun 2016 17:28:01 +0000 (18:28 +0100)]
arm64: kernel: Save and restore UAO and addr_limit on exception entry
If we take an exception while at EL1, the exception handler inherits
the original context's addr_limit and PSTATE.UAO values. To be consistent,
always reset addr_limit and PSTATE.UAO on (re-)entry to EL1. This
prevents accidental re-use of the original context's addr_limit.
Based on a similar patch for arm from Russell King.
Cc: <stable@vger.kernel.org> # 4.6- Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Jan Beulich [Thu, 7 Jul 2016 07:23:57 +0000 (01:23 -0600)]
xenbus: don't BUG() on user mode induced condition
Inability to locate a user mode specified transaction ID should not
lead to a kernel crash. For other than XS_TRANSACTION_START also
don't issue anything to xenbus if the specified ID doesn't match that
of any active transaction.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Mark Rutland [Tue, 14 Jun 2016 15:10:41 +0000 (16:10 +0100)]
perf/core: Fix pmu::filter_match for SW-led groups
The following commit:
66eb579e66ec ("perf: allow for PMU-specific event filtering")
added the pmu::filter_match() callback. This was intended to
avoid HW constraints on events from resulting in extremely
pessimistic scheduling.
However, pmu::filter_match() is only called for the leader of each event
group. When the leader is a SW event, we do not filter the groups, and
may fail at pmu::add() time, and when this happens we'll give up on
scheduling any event groups later in the list until they are rotated
ahead of the failing group.
This can result in extremely sub-optimal event scheduling behaviour,
e.g. if running the following on a big.LITTLE platform:
$ taskset -c 0 ./perf stat \
-e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \
-e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \
ls
Here the 'a53' event group was always eligible to be scheduled, but
the 'a57' group was never eligible to be scheduled, as the task was always
affine to a Cortex-A53 CPU. The SW (group leader) event in the 'a57'
group was eligible, but the HW event failed at pmu::add() time,
resulting in ctx_flexible_sched_in giving up on scheduling further
groups with HW events.
One way of avoiding this is to check pmu::filter_match() on siblings
as well as the group leader. If any of these fail their
pmu::filter_match() call, we must skip the entire group before
attempting to add any events.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Fixes: 66eb579e66ec ("perf: allow for PMU-specific event filtering") Link: http://lkml.kernel.org/r/1465917041-15339-1-git-send-email-mark.rutland@arm.com
[ Small readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
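The group-level check described in this commit can be modelled in user space. The Python sketch below is only an illustration: the Event class, its supported_cpus field, the CPU numbering and the function names are hypothetical and are not kernel interfaces. It contrasts consulting only the group leader with consulting the leader and every sibling.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Event:
    """Hypothetical stand-in for a perf event; not the kernel structure."""
    name: str
    # CPUs whose PMU can count this event. None models a SW event,
    # which can run on any CPU.
    supported_cpus: Optional[Set[int]] = None

    def filter_match(self, cpu: int) -> bool:
        return self.supported_cpus is None or cpu in self.supported_cpus


def leader_only_match(group: List[Event], cpu: int) -> bool:
    """Old behaviour: only the group leader (first event) is consulted."""
    return group[0].filter_match(cpu)


def group_match(group: List[Event], cpu: int) -> bool:
    """Behaviour described above: the leader and all siblings must match."""
    return all(event.filter_match(cpu) for event in group)


# Assumed topology for this model: CPUs 0-3 are Cortex-A53, CPUs 4-5 are Cortex-A57.
a57_group = [Event("context-switches"), Event("a57-hw", {4, 5})]
a53_group = [Event("context-switches"), Event("a53-hw", {0, 1, 2, 3})]
```

With a SW leader, leader_only_match() accepts the 'a57' group on CPU 0 even though its HW sibling cannot be added there, while group_match() skips the whole group up front.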
Dave Airlie [Thu, 7 Jul 2016 02:37:42 +0000 (12:37 +1000)]
Merge branch 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a couple of fixes for amdgpu for 4.7:
- 2 small tonga powerplay fixes
- Additional Polaris fixes
* 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation.
drm/amd/powerplay: fix bug that get wrong polaris evv voltage.
drm/amd/powerplay: incorrectly use of the function return value
drm/amd/powerplay: fix incorrect voltage table value for tonga
drm/amd/powerplay: fix incorrect voltage table value for polaris10
Randy Dunlap [Wed, 6 Jul 2016 23:06:53 +0000 (16:06 -0700)]
init/Kconfig: keep Expert users menu together
The "expert" menu was broken (split) such that all entries in it after
KALLSYMS were displayed in the "General setup" area instead of in the
"Expert users" area. Fix this by adding one kconfig dependency.
Yes, the Expert users menu is fragile. Problems like this have happened
several times in the past. I will attempt to isolate the Expert users
menu if there is interest in that.
Rex Zhu [Tue, 5 Jul 2016 05:11:47 +0000 (13:11 +0800)]
drm/amd/powerplay: incorrectly use of the function return value
'0' means true.
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
1) All users of AF_PACKET's fanout feature want a symmetric packet
header hash for load balancing purposes, so give it to them.
2) Fix vlan state synchronization in e1000e, from Jarod Wilson.
3) Use correct socket pointer in ip_skb_dst_mtu(), from Shmulik
Ladkani.
4) mlx5 bug fixes from Mohamad Haj Yahia, Daniel Jurgens, Matthew
Finlay, Rana Shahout, and Shaker Daibes. Mostly to do with
operation timeouts and PCI error handling.
5) Fix checksum handling in mirred packet action, from WANG Cong.
6) Set skb->dev correctly when transmitting in !protect_frames case of
macsec driver, from Daniel Borkmann.
7) Fix MTU calculation in geneve driver, from Haishuang Yan.
8) Missing netif_napi_del() in unregister path of qeth driver, from
Ursula Braun.
9) Handle malformed route netlink messages in decnet properly, from
Vegard Nossum.
10) Memory leak of percpu data in ipv6 routing code, from Martin KaFai
Lau.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
ipv6: Fix mem leak in rt6i_pcpu
net: fix decnet rtnexthop parsing
cxgb4: update latest firmware version supported
net/mlx5: Avoid setting unused var when modifying vport node GUID
bonding: fix enslavement slave link notifications
r8152: fix runtime function for RTL8152
qeth: delete napi struct when removing a qeth device
Revert "fsl/fman: fix error handling"
fsl/fman: fix error handling
cdc_ncm: workaround for EM7455 "silent" data interface
RDS: fix rds_tcp_init() error path
geneve: fix max_mtu setting
net: phy: dp83867: Fix initialization of PHYCR register
enc28j60: Fix race condition in enc28j60 driver
net: stmmac: Fix null-function call in ISR on stmmac1000
tipc: fix nl compat regression for link statistics
net: bcmsysport: Device stats are unsigned long
macsec: set actual real device for xmit when !protect_frames
net_sched: fix mirrored packets checksum
packet: Use symmetric hash for PACKET_FANOUT_HASH.
...
Merge tag 'sound-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here is a collection of small fixes: at this time, we've got a
slightly high amount, but all small and trivial fixes, and nothing
scary can be seen there"
* tag 'sound-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek: Add Lenovo L460 to docking unit fixup
ALSA: timer: Fix negative queue usage by racy accesses
ASoC: rt5645: fix reg-2f default value.
ASoC: fsl_ssi: Fix number of words per frame for I2S-slave mode
ALSA: au88x0: Fix calculation in vortex_wtdma_bufshift()
ALSA: hda - Add PCI ID for Kabylake-H
ALSA: echoaudio: Fix memory allocation
ASoC: Intel: atom: fix missing breaks that would cause the wrong operation to execute
ALSA: hda - fix read before array start
ASoC: cx20442: set tty->receiver_room in v253_open
ASoC: ak4613: Enable cache usage to fix crashes on resume
ASoC: wm8940: Enable cache usage to fix crashes on resume
ASoC: Intel: Skylake: Initialize module list for Broxton
ASoC: wm5102: Correct supported channels on trace compressed DAI
ASoC: wm5110: Add missing route from OUT3R to SYSCLK
ASoC: rt5670: fix HP Playback Volume control
ASoC: hdmi-codec: select CONFIG_HDMI
ASoC: davinci-mcasp: Fix dra7 DMA offset when using CFG port
ASoC: hdac_hdmi: Fix potential NULL dereference
ASoC: ak4613: Remove owner assignment from platform_driver
...
There is a race condition in the AMD IOMMU init code that
causes requested unity mappings to be blocked by the IOMMU
for a short period of time. This results in boot failures
and IO_PAGE_FAULTs on some machines.
Fix this by making sure the unity mappings are installed
before all other DMA is blocked.
David Daney [Thu, 16 Jun 2016 22:50:31 +0000 (15:50 -0700)]
MIPS: Fix page table corruption on THP permission changes.
When the core THP code is modifying the permissions of a huge page it
calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
of the page table entry. The result can be kernel messages like:
Ville Syrjälä [Thu, 23 Jun 2016 15:06:49 +0000 (18:06 +0300)]
x86/perf/intel/rapl: Fix module name collision with powercap intel-rapl
Since commit 4b6e2571bf00 the rapl perf module calls itself intel-rapl. That
name was already in use by the rapl powercap driver, which now fails to load
if the perf module is loaded. Fix the problem by renaming the perf module to
intel-rapl-perf, so that both modules can coexist.
Fixes: 4b6e2571bf00 ("x86/perf/intel/rapl: Make the Intel RAPL PMU driver modular") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1466694409-3620-1-git-send-email-ville.syrjala@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It was first reported and reproduced by Petr (thanks!) in
https://bugzilla.kernel.org/show_bug.cgi?id=119581
free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().
However, after fixing a deadlock bug in
commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.
It is worth noting that rt6i_pcpu is protected by table->tb6_lock.
kmemleak somehow did not report it. We nailed it down by
observing the pcpu entries in /proc/vmallocinfo (first suggested
by Hannes, thanks!).
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt") Reported-by: Petr Novopashenniy <pety@rusnet.ru> Tested-by: Petr Novopashenniy <pety@rusnet.ru> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Petr Novopashenniy <pety@rusnet.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0
(i.e. if userspace passes a malformed netlink message).
Let's use the helpers from net/nexthop.h which take care of all this
stuff. We can do exactly the same as e.g. fib_count_nexthops() and
fib_get_nhs() from net/ipv4/fib_semantics.c.
This fixes the softlockup for me.
Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
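The failure mode is that a record length of zero never advances the cursor. The following Python sketch models a bounds-checked walk over length-prefixed nexthop records. It is a user-space illustration, not the net/nexthop.h helpers themselves: count_nexthops() and make_nexthop() are hypothetical names, the 8-byte header layout is assumed from struct rtnexthop, and record alignment is ignored for simplicity.

```python
import struct

# Assumed layout of struct rtnexthop:
# u16 rtnh_len, u8 rtnh_flags, u8 rtnh_hops, int rtnh_ifindex
RTNH_FORMAT = "=HBBi"
RTNH_SIZE = struct.calcsize(RTNH_FORMAT)  # 8 bytes


def make_nexthop(rtnh_len: int, ifindex: int = 1) -> bytes:
    """Build one header-only nexthop record (hypothetical test helper)."""
    return struct.pack(RTNH_FORMAT, rtnh_len, 0, 0, ifindex)


def count_nexthops(buf: bytes) -> int:
    """Count records, rejecting any length that would stall or overrun the walk."""
    count = 0
    offset = 0
    remaining = len(buf)
    while remaining > 0:
        if remaining < RTNH_SIZE:
            raise ValueError("truncated nexthop header")
        rtnh_len = struct.unpack_from(RTNH_FORMAT, buf, offset)[0]
        # A length below the header size (including 0) would never advance
        # the cursor; a length above the remaining bytes would overrun it.
        if rtnh_len < RTNH_SIZE or rtnh_len > remaining:
            raise ValueError("malformed rtnh_len")
        offset += rtnh_len
        remaining -= rtnh_len
        count += 1
    return count
```

A loop that adds rtnh_len to the cursor without these checks spins forever on a record whose rtnh_len is 0, which matches the softlockup described above.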
Dan Carpenter [Tue, 21 Jun 2016 13:58:46 +0000 (16:58 +0300)]
platform/chrome: cros_ec_dev - double fetch bug in ioctl
We verify "u_cmd.outsize" and "u_cmd.insize" but we need to make sure
that those values have not changed between the two copy_from_user()
calls. Otherwise it could lead to a buffer overflow.
Additionally, cros_ec_cmd_xfer() can set s_cmd->insize to a lower value.
We should use the new smaller value so we don't copy too much data to
the user.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Fixes: a841178445bb ('mfd: cros_ec: Use a zero-length array for command data') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Gwendal Grignou <gwendal@chromium.org> Cc: <stable@vger.kernel.org> # v4.2+ Signed-off-by: Olof Johansson <olof@lixom.net>
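A double fetch can be modelled without a kernel. In the Python sketch below, UserBuffer stands in for user memory that a racing thread may rewrite between the two copies; all names and the MAX_OUTSIZE limit are hypothetical, and the code only illustrates the re-validation idea, not the driver's actual ioctl path.

```python
MAX_OUTSIZE = 16  # hypothetical limit used only in this model


class UserBuffer:
    """Hypothetical model of a user-space command: a size field plus payload."""

    def __init__(self, outsize: int, payload: bytes):
        self.outsize = outsize
        self.payload = payload


def copy_header(user: UserBuffer) -> int:
    """Models the first copy_from_user(): the header only."""
    return user.outsize


def copy_command(user: UserBuffer):
    """Models the second copy_from_user(): header and payload together."""
    return user.outsize, bytes(user.payload)


def fetch_command(user: UserBuffer, tamper=None):
    outsize = copy_header(user)
    if outsize > MAX_OUTSIZE:
        raise ValueError("outsize too large")
    if tamper is not None:
        tamper(user)  # models another user thread racing with the ioctl
    second_outsize, payload = copy_command(user)
    # The size validated above must still be the size that was fetched again.
    if second_outsize != outsize:
        raise ValueError("outsize changed between the two fetches")
    return outsize, payload[:outsize]


def grow(user: UserBuffer) -> None:
    """Hypothetical attacker step: enlarge the size after validation."""
    user.outsize = 4096
```

Without the comparison after the second copy, the unvalidated 4096 would be trusted as the length of a buffer that was sized for 4 bytes.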
However, this doesn't fix the current design issues related to the
namespace lock. For example, we can notice that in acpi_ns_evaluate(),
outside of acpi_ns_load_table(), the namespace objects may be created
by the named object creation control methods. And the creation of
the method-owned namespace objects is not locked by the namespace
lock. This patch doesn't try to fix existing issues of this kind.
Fixes: 2f38b1b16d92 (ACPICA: Namespace: Fix a regression that MLC support triggers dead lock in dynamic table loading) Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Or Gerlitz [Tue, 5 Jul 2016 09:17:12 +0000 (12:17 +0300)]
net/mlx5: Avoid setting unused var when modifying vport node GUID
GCC complains about an unused-but-set variable; clean this up.
Fixes: 23898c763f4a ('net/mlx5: E-Switch, Modify node guid on vf set MAC') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, link notifications are not sent by
bond_set_slave_link_state() upon enslavement if
the slave is enslaved when up.
This happens because slave->link default init value
is 0, which is the same as BOND_LINK_UP, resulting
in bond_set_slave_link_state() ignoring this transition.
This patch sets the default value of slave->link to
BOND_LINK_NOCHANGE, assuring it will count as a state
transition and thus trigger notification logic.
Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
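The effect of the default value can be shown with a small model. In this Python sketch the Slave class and set_slave_link_state() are hypothetical simplifications of the bonding code; BOND_LINK_UP being 0 comes from the description above, while the value -1 for BOND_LINK_NOCHANGE is an assumption that only needs to differ from every real link state.

```python
BOND_LINK_UP = 0         # value stated in the commit message
BOND_LINK_NOCHANGE = -1  # assumed value; must differ from all real states


class Slave:
    """Hypothetical model of a bonding slave."""

    def __init__(self, initial_link: int):
        self.link = initial_link
        self.notifications = []  # states for which a notification was sent


def set_slave_link_state(slave: Slave, state: int) -> None:
    """Notify only when the link state actually changes."""
    if slave.link == state:
        return  # treated as "no transition", so nothing is sent
    slave.link = state
    slave.notifications.append(state)


# Zero-initialised slave that is enslaved while up: the transition is ignored.
old_default = Slave(0)
set_slave_link_state(old_default, BOND_LINK_UP)

# Slave initialised to BOND_LINK_NOCHANGE: the same call now notifies.
new_default = Slave(BOND_LINK_NOCHANGE)
set_slave_link_state(new_default, BOND_LINK_UP)
```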
Paul Burton [Tue, 5 Jul 2016 13:26:00 +0000 (14:26 +0100)]
irqchip/mips-gic: Match IPI IRQ domain by bus token only
Commit fbde2d7d8290 ("MIPS: Add generic SMP IPI support") introduced
code which calls irq_find_matching_host with a NULL node parameter in
order to discover IPI IRQ domains which are not associated with the DT
root node's interrupt parent. This suggests that implementations of IPI
IRQ domains should effectively ignore the node parameter if it is NULL
and search purely based upon the bus token. Commit 2af70a962070
("irqchip/mips-gic: Add a IPI hierarchy domain") did not do this when
implementing the GIC IPI IRQ domain, and on MIPS Boston boards this
leads to no IPI domain being discovered and a NULL pointer dereference
when attempting to send an IPI:
Paul Burton [Tue, 5 Jul 2016 13:25:59 +0000 (14:25 +0100)]
irqchip/mips-gic: Map to VPs using HW VPNum
When mapping an interrupt to a VP(E) we must use the identifier for the
VP that the hardware expects, and this does not always match up with the
Linux CPU number. Commit d46812bb0bef ("irqchip: mips-gic: Use HW IDs
for VPE_OTHER_ADDR") corrected this for the cases that existed at the
time it was written, but commit 2af70a962070 ("irqchip/mips-gic: Add a
IPI hierarchy domain") added another case before the former patch was
merged. This leads to incorrectly using Linux CPU numbers when mapping
interrupts to VPs, which breaks on certain systems such as those with
multi-core I6400 CPUs. Fix by adding the appropriate call to
mips_cm_vp_id() to retrieve the expected VP identifier.
Fixes: d46812bb0bef ("irqchip: mips-gic: Use HW IDs for VPE_OTHER_ADDR") Fixes: 2af70a962070 ("irqchip/mips-gic: Add a IPI hierarchy domain") Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Qais Yousef <qsyousef@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160705132600.27730-1-paul.burton@imgtec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
qeth: delete napi struct when removing a qeth device
A qeth_card contains a napi_struct linked to the net_device during
device probing. This struct must be deleted when removing the qeth
device, otherwise a panic on oops can occur when qeth devices are
repeatedly removed and added.
Fixes: a1c3ed4c9ca ("qeth: NAPI support for l2 and l3 discipline") Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Tested-by: Alexander Klein <ALKL@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
cdc_ncm: workaround for EM7455 "silent" data interface
Several Lenovo users have reported problems with their Sierra
Wireless EM7455 modem. The driver has loaded successfully and
the MBIM management channel has appeared to work, including
establishing a connection to the mobile network. But no frames
have been received over the data interface.
The problem affects all EM7455 and MC7455, and is assumed to
affect other modems based on the same Qualcomm chipset and
baseband firmware.
Testing narrowed the problem down to what seems to be a
firmware timing bug during initialization. Adding a short sleep
while probing is sufficient to make the problem disappear.
Experiments have shown that 1-2 ms is too little to have any
effect, while 10-20 ms is enough to reliably succeed.
Reported-by: Stefan Armbruster <ml001@armbruster-it.de> Reported-by: Ralph Plawetzki <ralph@purejava.org> Reported-by: Andreas Fett <andreas.fett@secunet.com> Reported-by: Rasmus Lerdorf <rasmus@lerdorf.com> Reported-by: Samo Ratnik <samo.ratnik@gmail.com> Reported-and-tested-by: Aleksander Morgado <aleksander@aleksander.es> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
If register_pernet_subsys() fails, we shouldn't try to call
unregister_pernet_subsys().
Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Cc: stable@vger.kernel.org Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure consumers do not overwrite gpio flags for pins that have
already been claimed.
While adding support for gpio drivers to refuse a request using
unsupported flags, the order of when the requested flag was checked and
the new flags were applied was reversed so that consumers could
overwrite flags for already requested gpios.
This not only affects device-tree setups where two drivers could request
the same gpio using conflicting configurations, but also allowed user
space to clear gpio flags for already claimed pins simply by attempting
to export them through the sysfs interface. For example, by clearing the
FLAG_ACTIVE_LOW flag this way, user space could effectively change the
polarity of a signal.
Reverting this change obviously prevents gpio drivers from doing sanity
checks on the flags in their request callbacks. Fortunately only one
recently added driver (gpio-tps65218 in v4.6) appears to do this, and a
follow up patch could restore this functionality through a different
interface.
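The ordering problem can be sketched as follows. This Python model is hypothetical throughout: the flag bit values and both function names are made up for illustration and do not reproduce gpiolib code. It only shows why applying consumer flags before checking ownership lets a refused request still change the polarity.

```python
FLAG_REQUESTED = 0x1   # hypothetical bit values for this model
FLAG_ACTIVE_LOW = 0x2


def _with_polarity(flags: int, want_active_low: bool) -> int:
    """Return flags with the polarity bit set or cleared as requested."""
    flags &= ~FLAG_ACTIVE_LOW
    if want_active_low:
        flags |= FLAG_ACTIVE_LOW
    return flags


def request_apply_then_check(flags: int, want_active_low: bool):
    """Problematic order: flags are rewritten before the ownership check."""
    already_requested = bool(flags & FLAG_REQUESTED)
    flags = _with_polarity(flags, want_active_low)
    if already_requested:
        return False, flags  # request refused, but polarity already changed
    return True, flags | FLAG_REQUESTED


def request_check_then_apply(flags: int, want_active_low: bool):
    """Safe order: a pin that is already claimed is left untouched."""
    if flags & FLAG_REQUESTED:
        return False, flags
    return True, _with_polarity(flags, want_active_low) | FLAG_REQUESTED


claimed = FLAG_REQUESTED | FLAG_ACTIVE_LOW  # pin already owned, active-low
```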
Colin Pitrat [Sat, 18 Jun 2016 18:05:04 +0000 (19:05 +0100)]
gpio: sch: Fix Oops on module load on Asus Eee PC 1201
This fixes the issue described in bug 117531
(https://bugzilla.kernel.org/show_bug.cgi?id=117531).
It's a regression introduced in Linux 4.5 that causes an Oops at load of
gpio_sch and prevents powering off the computer.
The issue is that sch_gpio_reg_set is called in sch_gpio_probe before
gpio_chip data is initialized with the pointer to the sch_gpio struct. As
sch_gpio_reg_set calls gpiochip_get_data, it returns NULL which causes
the Oops.
The patch follows Mika's advice (https://lkml.org/lkml/2016/5/9/61) and
consists in modifying sch_gpio_reg_get and sch_gpio_reg_set to take a
sch_gpio struct directly instead of a gpio_chip, which avoids the call to
gpiochip_get_data.
Thanks Mika for your patience with me :-)
Cc: stable@vger.kernel.org Signed-off-by: Colin Pitrat <colin.pitrat@gmail.com> Acked-by: Alexandre Courbot <acourbot@nvidia.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a CPU stays in snooze for more than target
residency of the next available idle state, then it would exit
thereby giving chance to the cpuidle governor to re-evaluate and
promote the CPU to a deeper idle state. Therefore whenever snooze
exits due to this timeout, its last_residency will be target_residency
of the next deeper state.
Commit e93e59ce5b85 "cpuidle: Replace ktime_get() with local_clock()"
changed the math around last_residency calculation. Specifically,
while converting last_residency value from nano- to microseconds, it
carries out right shift by 10. Because of that, in snooze timeout
exit scenarios last_residency calculated is roughly 2.3% less than
target_residency of the next available state. This pattern is picked
up by get_typical_interval() in the menu governor and therefore
expected_interval in menu_select() is frequently less than the
target_residency of any state other than snooze.
Due to this we are entering snooze at a higher rate, thereby
affecting the single thread performance.
Fix this by using more precise division via ktime_us_delta().
Fixes: e93e59ce5b85 "cpuidle: Replace ktime_get() with local_clock()" Reported-by: Anton Blanchard <anton@samba.org> Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
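The 2.3% figure follows from dividing by 1024 instead of 1000, since 1000/1024 is about 0.977. The Python sketch below works the arithmetic; the function names are hypothetical, and plain integer division by 1000 is used here as a stand-in for the more precise conversion, not as a copy of ktime_us_delta().

```python
def residency_us_shift(ns: int) -> int:
    """Nanoseconds to microseconds via a right shift by 10 (divides by 1024)."""
    return ns >> 10


def residency_us_div(ns: int) -> int:
    """Nanoseconds to microseconds via exact integer division by 1000."""
    return ns // 1000


def shortfall_percent(ns: int) -> float:
    """How far the shifted value falls below the exact one, in percent."""
    exact = residency_us_div(ns)
    return 100.0 * (exact - residency_us_shift(ns)) / exact


# A snooze timeout of 10 ms, chosen only as an example value.
timeout_ns = 10_000_000
shifted = residency_us_shift(timeout_ns)
exact = residency_us_div(timeout_ns)
```

Because the shifted value always lands slightly below the target residency of the next state, a governor that compares the two keeps concluding that the deeper state is not worth entering.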
ALSA: timer: Fix negative queue usage by racy accesses
The user timer tu->qused counter may go to a negative value when
multiple concurrent reads are performed since both the check and the
decrement of tu->qused are done in two individual locked contexts.
This results in bogus readouts and an endless loop on the
user-space side.
The fix is to move the decrement of the tu->qused counter into the
same spinlock context as the zero-check of the counter.
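The difference between the two locking patterns can be modelled deterministically. In this Python sketch, UserTimerQueue and its methods are hypothetical names, threading.Lock stands in for the spinlock, and the racy interleaving is replayed by hand rather than with real threads so that the outcome is reproducible.

```python
import threading


class UserTimerQueue:
    """Hypothetical model of the tu->qused counter and its lock."""

    def __init__(self, qused: int):
        self.qused = qused
        self.lock = threading.Lock()

    # Racy pattern: the check and the decrement are two critical sections.
    def has_data(self) -> bool:
        with self.lock:
            return self.qused > 0

    def consume(self) -> None:
        with self.lock:
            self.qused -= 1

    # Fixed pattern: check and decrement share one critical section.
    def try_consume(self) -> bool:
        with self.lock:
            if self.qused <= 0:
                return False
            self.qused -= 1
            return True


def racy_interleaving() -> int:
    """Two readers both pass the check before either one decrements."""
    queue = UserTimerQueue(1)
    reader_a_ok = queue.has_data()
    reader_b_ok = queue.has_data()
    if reader_a_ok:
        queue.consume()
    if reader_b_ok:
        queue.consume()
    return queue.qused


def fixed_interleaving():
    """With the combined operation, the second reader sees an empty queue."""
    queue = UserTimerQueue(1)
    first = queue.try_consume()
    second = queue.try_consume()
    return queue.qused, first, second
```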
iommu/vt-d: Fix infinite loop in free_all_cpu_cached_iovas
Per VT-d spec Section 10.4.2 ("Capability Register"), the maximum
number of possible domains is 64K; indeed this is the maximum value
that the cap_ndoms() macro will expand to. Since the value 65536
will not fit in a u16, the 'did' variable must be promoted to an
int, otherwise the test for < 65536 will always be true and the
loop will never end.
The symptom, in my case, was a hung machine during suspend.
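The wrap-around can be reproduced in a few lines. The Python sketch below models a fixed-width unsigned counter with a bit mask; loop_terminates() is a hypothetical helper, and the step budget exists only so that the non-terminating case can be observed without hanging.

```python
def loop_terminates(limit: int, width_bits: int, max_steps: int) -> bool:
    """Model 'for (did = 0; did < limit; did++)' with a width_bits-wide counter.

    Returns True if the loop condition becomes false within max_steps
    iterations, and False if the counter keeps wrapping and never exits.
    """
    mask = (1 << width_bits) - 1
    did = 0
    for _ in range(max_steps):
        if not did < limit:
            return True
        did = (did + 1) & mask  # fixed-width increment wraps to 0
    return False


# A 16-bit counter can never reach 65536, so the condition is always true.
u16_exits = loop_terminates(65536, 16, 200_000)

# A wider counter reaches 65536 and the loop ends.
int_exits = loop_terminates(65536, 32, 200_000)
```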