git.karo-electronics.de Git - karo-tx-linux.git/log

Merge branch 'for-chris-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11

parisc: Avoid stalled CPU warnings after system shutdown

Commit 73580dac7618 ("parisc: Fix system shutdown halt") introduced an endless
loop for systems which don't provide a software power off function. But the
soft lockup detector will detect this and report stalled CPUs after some time.
Avoid those unwanted warnings by disabling the soft lockup detector.

Fixes: 73580dac7618 ("parisc: Fix system shutdown halt")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.9+

parisc: Clean up fixup routines for get_user()/put_user()

Al Viro noticed that userspace accesses via get_user()/put_user() can be
simplified a lot with regard to usage of the exception handling.

This patch implements a fixup routine for get_user() and put_user() in such
that the exception handler will automatically load -EFAULT into the register
%r8 (the error value) in case on a fault on userspace.  Additionally the fixup
routine will zero the target register on fault in case of a get_user() call.
The target register is extracted out of the faulting assembly instruction.

This patch brings a few benefits over the old implementation:
1. Exception handling gets much cleaner, easier and smaller in size.
2. Helper functions like fixup_get_user_skip_1 (all of fixup.S) can be dropped.
3. No need to hardcode %r9 as target register for get_user() any longer. This
   helps the compiler register allocator and thus creates less assembler
   statements.
4. No dependency on the exception_data contents any longer.
5. Nested faults will be handled cleanly.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Helge Deller <deller@gmx.de>

parisc: Fix access fault handling in pa_memcpy()

pa_memcpy() is the major memcpy implementation in the parisc kernel which is
used to do any kind of userspace/kernel memory copies.

Al Viro noticed various bugs in the implementation of pa_mempcy(), most notably
that in case of faults it may report back to have copied more bytes than it
actually did.

Fixing those bugs is quite hard in the C-implementation, because the compiler
is messing around with the registers and we are not guaranteed that specific
variables are always in the same processor registers. This makes proper fault
handling complicated.

This patch implements pa_memcpy() in assembler. That way we have correct fault
handling and adding a 64-bit copy routine was quite easy.

Runtime tested with 32- and 64bit kernels.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>

Merge branch 'regset' (PTRACE_SETREGSET data leakage)

Merge PTRACE_SETREGSET leakage fixes from Dave Martin:
"This series is the collection of fixes I proposed on this topic, that
  have not yet appeared upstream or in the stable branches,

  The issue can leak kernel stack, but doesn't appear to allow userspace
  to attack the kernel directly.  The affected architectures are c6x,
  h8300, metag, mips and sparc.

  [ Mark Salter points out that c6x has no MMU or other mechanism to
    prevent userspace access to kernel code or data on c6x, but it
    doesn't hurt to clean that case up too. ]

  The bugs arise from use of user_regset_copyin(). Users of
  user_regset_copyin() can work in one of two ways:

   1) Copy directly to thread_struct or equivalent. (This seems to be
      the design assumption of the regset API, and is the most common
      approach.)

   2) Copy to a local variable and then transfer to thread_struct. (A
      significant minority of cases.)

  Buggy code typically involves approach 2"

* emailed patches from Dave Martin <Dave.Martin@arm.com>:
  sparc/ptrace: Preserve previous registers for short regset write
  mips/ptrace: Preserve previous registers for short regset write
  metag/ptrace: Reject partial NT_METAG_RPIPE writes
  metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
  metag/ptrace: Preserve previous registers for short regset write
  h8300/ptrace: Fix incorrect register transfer count
  c6x/ptrace: Remove useless PTRACE_SETREGSET implementation

sparc/ptrace: Preserve previous registers for short regset write

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mips/ptrace: Preserve previous registers for short regset write

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

metag/ptrace: Reject partial NT_METAG_RPIPE writes

It's not clear what behaviour is sensible when doing partial write of
NT_METAG_RPIPE, so just don't bother.

This patch assumes that userspace will never rely on a partial SETREGSET
in this case, since it's not clear what should happen anyway.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill TXSTATUS, a well-defined default value is used, based on the
task's current value.

Suggested-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

metag/ptrace: Preserve previous registers for short regset write

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

h8300/ptrace: Fix incorrect register transfer count

regs_set() and regs_get() are vulnerable to an off-by-1 buffer overrun
if CONFIG_CPU_H8S is set, since this adds an extra entry to
register_offset[] but not to user_regs_struct.

So, iterate over user_regs_struct based on its actual size, not based on
the length of register_offset[].

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c6x/ptrace: Remove useless PTRACE_SETREGSET implementation

gpr_set won't work correctly and can never have been tested, and the
correct behaviour is not clear due to the endianness-dependent task
layout.

So, just remove it. The core code will now return -EOPNOTSUPPORT when
trying to set NT_PRSTATUS on this architecture until/unless a correct
implementation is supplied.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder

Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues. To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window

When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer.  However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call.  There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents.  We do
not at this point check that the replay_window is within the allocated
memory.  This leads to out-of-bounds reads and writes triggered by
netlink packets.  This leads to memory corruption and the potential for
priviledge escalation.

We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn.  It however does not check the replay_window
remains within that buffer.  Add validation of the contained
replay_window.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge remote-tracking branch 'mkp-scsi/4.11/scsi-fixes' into fixes

drm/etnaviv: (re-)protect fence allocation with GPU mutex

The fence allocation needs to be protected by the GPU mutex, otherwise
the fence seqnos of concurrent submits might not match the insertion order
of the jobs in the kernel ring. This breaks the assumption that jobs
complete with monotonically increasing fence seqnos.

Fixes: d9853490176c (drm/etnaviv: take GPU lock later in the submit process)
CC: stable@vger.kernel.org #4.9+
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

Btrfs: fix an integer overflow check

This isn't super serious because you need CAP_ADMIN to run this code.

I added this integer overflow check last year but apparently I am
rubbish at writing integer overflow checks... There are two issues.
First, access_ok() works on unsigned long type and not u64 so on 32 bit
systems the access_ok() could be checking a truncated size. The other
issue is that we should be using a stricter limit so we don't overflow
the kzalloc() setting ctx->clone_roots later in the function after the
access_ok():

alloc_size = sizeof(struct clone_root) * (arg->clone_sources_count + 1);
sctx->clone_roots = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);

Fixes: f5ecec3ce21f ("btrfs: send: silence an integer overflow warning")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ added comment ]
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Change qgroup_meta_rsv to 64bit

Using an int value is causing qg->reserved to become negative and
exclusive -EDQUOT to be reached prematurely.

This affects exclusive qgroups only.

TEST CASE:

DEVICE=/dev/vdb
MOUNTPOINT=/mnt
SUBVOL=$MOUNTPOINT/tmp

umount $SUBVOL
umount $MOUNTPOINT

mkfs.btrfs -f $DEVICE
mount /dev/vdb $MOUNTPOINT
btrfs quota enable $MOUNTPOINT
btrfs subvol create $SUBVOL
umount $MOUNTPOINT
mount /dev/vdb $MOUNTPOINT
mount -o subvol=tmp $DEVICE $SUBVOL
btrfs qgroup limit -e 3G $SUBVOL

btrfs quota rescan /mnt -w

for i in `seq 1 44000`; do
  dd if=/dev/zero of=/mnt/tmp/test_$i bs=10k count=1
  if [[ $? > 0 ]]; then
     btrfs qgroup show -pcref $SUBVOL
     exit 1
  fi
done

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[ add reproducer to changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: bring back repair during read

Commit 20a7db8ab3f2 ("btrfs: add dummy callback for readpage_io_failed
and drop checks") made a cleanup around readpage_io_failed_hook, and
it was supposed to keep the original sematics, but it also
unexpectedly disabled repair during read for dup, raid1 and raid10.

This fixes the problem by letting data's inode call the generic
readpage_io_failed callback by returning -EAGAIN from its
readpage_io_failed_hook in order to notify end_bio_extent_readpage to
do the rest. We don't call it directly because the generic one takes
an offset from end_bio_extent_readpage() to calculate the index in the
checksum array and inode's readpage_io_failed_hook doesn't offer that
offset.

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ keep the const function attribute ]
Signed-off-by: David Sterba <dsterba@suse.com>

Merge remote-tracking branches 'asoc/fix/rt5665', 'asoc/fix/simple', 'asoc/fix/sti' and 'asoc/fix/sun8i' into asoc-linus

Merge remote-tracking branches 'asoc/fix/adsp', 'asoc/fix/atmel', 'asoc/fix/hdac-hdmi' and 'asoc/fix/mtk' into asoc-linus

Merge remote-tracking branch 'asoc/fix/rcar' into asoc-linus

Merge remote-tracking branch 'asoc/fix/intel' into asoc-linus

usb: phy: isp1301: Fix build warning when CONFIG_OF is disabled

Commit fd567653bdb9 ("usb: phy: isp1301: Add OF device ID table")
added an OF device ID table, but used the of_match_ptr() macro
that will lead to a build warning if CONFIG_OF symbol is disabled:

drivers/usb/phy//phy-isp1301.c:36:34: warning: ‘isp1301_of_match’ defined but not used [-Wunused-const-variable=]
static const struct of_device_id isp1301_of_match[] = {
^~~~~~~~~~~~~~~~

Fixes: fd567653bdb9 ("usb: phy: isp1301: Add OF device ID table")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xhci: Manually give back cancelled URB if we can't queue it for cancel

xhci needs to take care of four scenarios when asked to cancel a URB.

1 URB is not queued or already given back.
  usb_hcd_check_unlink_urb() will return an error, we pass the error on

2 We fail to find xhci internal structures from urb private data such as
  virtual device and endpoint ring.
  Give back URB immediately, can't do anything about internal structures.

3 URB private data has valid pointers to xhci internal data, but host is
  not  responding.
  give back URB immedately and remove the URB from the endpoint lists.

4 Everyting is working
  add URB to cancel list, queue a command to stop the endpoint, after
  which the URB can be turned to no-op or skipped, removed from lists,
  and given back.

We failed to give back the urb in case 2 where the correct device and
endpoint pointers could not be retrieved from URB private data.

This caused a hang on Dell Inspiron 5558/0VNM2T at resume from suspend
as urb was never returned.

[  245.270505] INFO: task rtsx_usb_ms_1:254 blocked for more than 120 seconds.
[  245.272244]       Tainted: G        W       4.11.0-rc3-ARCH #2
[  245.273983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  245.275737] rtsx_usb_ms_1   D    0   254      2 0x00000000
[  245.277524] Call Trace:
[  245.279278]  __schedule+0x2d3/0x8a0
[  245.281077]  schedule+0x3d/0x90
[  245.281961]  usb_kill_urb.part.3+0x6c/0xa0 [usbcore]
[  245.282861]  ? wake_atomic_t_function+0x60/0x60
[  245.283760]  usb_kill_urb+0x21/0x30 [usbcore]
[  245.284649]  usb_start_wait_urb+0xe5/0x170 [usbcore]
[  245.285541]  ? try_to_del_timer_sync+0x53/0x80
[  245.286434]  usb_bulk_msg+0xbd/0x160 [usbcore]
[  245.287326]  rtsx_usb_send_cmd+0x63/0x90 [rtsx_usb]

Reported-by: diego.viola@gmail.com
Tested-by: diego.viola@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xhci: Set URB actual length for stopped control transfers

A control transfer that stopped at the status stage incorrectly
warned about a "unexpected TRB Type 4", and did not set the
transferred actual_length for the URB.

The URB actual_length for control transfers should contain the
bytes transferred in the data stage.

Bytes of a partially sent setup stage and missing bytes from
status stage should be left out.

Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xhci: plat: Register shutdown for xhci_plat

Shutdown should be called for xhci_plat devices especially for
situations where kexec might be used by stopping DMA
transactions.

Signed-off-by: Adam Wallis <awallis@codeaurora.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

svcrdma: set XPT_CONG_CTRL flag for bc xprt

Same change as Kinglong Mee's fix for the TCP backchannel service.

Fixes: 5283b03ee5cd ("nfs/nfsd/sunrpc: enforce transport...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ACPI: Fix incompatibility with mcount-based function graph tracing

Paul Menzel reported a warning:

  WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0
  Bad frame pointer: expected f6919d98, received f6919db0
    from func acpi_pm_device_sleep_wake return to c43b6f9d

The warning means that function graph tracing is broken for the
acpi_pm_device_sleep_wake() function.  That's because the ACPI Makefile
unconditionally sets the '-Os' gcc flag to optimize for size.  That's an
issue because mcount-based function graph tracing is incompatible with
'-Os' on x86, thanks to the following gcc bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109

I have another patch pending which will ensure that mcount-based
function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on
x86.

But this patch is needed in addition to that one because the ACPI
Makefile overrides that config option for no apparent reason.  It has
had this flag since the beginning of git history, and there's no related
comment, so I don't know why it's there.  As far as I can tell, there's
no reason for it to be there.  The appropriate behavior is for it to
honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the
kernel.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: All applicable <stable@vger.kernel.org>

ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal

When removing a GHES device notified by SCI, list_del_rcu() is used,
ghes_remove() should call synchronize_rcu() before it goes on to call
kfree(ghes), otherwise concurrent RCU readers may still hold this list
entry after it has been freed.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 81e88fdc432a (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: Do not create a platform_device for IOAPIC/IOxAPIC

No platform-device is required for IO(x)APICs, so don't even
create them.

[ rjw: This fixes a problem with leaking platform device objects
after IOAPIC/IOxAPIC hot-removal events.]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: ioapic: Clear on-stack resource before using it

The on-stack resource-window 'win' in setup_res() is not
properly initialized. This causes the pointers in the
embedded 'struct resource' to contain stale addresses.

These pointers (in my case the ->child pointer) later get
propagated to the global iomem_resources list, causing a #GP
exception when the list is traversed in
iomem_map_sanity_check().

Fixes: c183619b63ec (x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug)
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"Fixes to multiple issues in virtio.

  Most notably a regression fix for crashes reported by Fedora users.
  Hibernate is still reportedly broken, working on it"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: prevent uninitialized variable use
  virtio-balloon: use actual number of stats for stats queue buffers
  virtio_balloon: init 1st buffer in stats vq
  virtio_pci: fix out of bound access for msix_names

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"All x86-specific, apart from some arch-independent syzkaller fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: cleanup the page tracking SRCU instance
  KVM: nVMX: fix nested EPT detection
  KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
  KVM: kvm_io_bus_unregister_dev() should never fail
  KVM: VMX: Fix enable VPID conditions
  KVM: nVMX: Fix nested VPID vmx exec control
  KVM: x86: correct async page present tracepoint
  kvm: vmx: Flush TLB when the APIC-access address changes
  KVM: x86: use pic/ioapic destructor when destroy vm
  KVM: x86: check existance before destroy
  KVM: x86: clear bus pointer when destroyed
  KVM: Documentation: document MCE ioctls
  KVM: nVMX: don't reset kvm mmu twice
  PTP: fix ptr_ret.cocci warnings
  kvm: fix usage of uninit spinlock in avic_vm_destroy()
  KVM: VMX: downgrade warning on unexpected exit code

virtio_balloon: prevent uninitialized variable use

The latest gcc-7.0.1 snapshot reports a new warning:

virtio/virtio_balloon.c: In function 'update_balloon_stats':
virtio/virtio_balloon.c:258:26: error: 'events[2]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:260:26: error: 'events[3]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:261:56: error: 'events[18]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:262:56: error: 'events[17]' is used uninitialized in this function [-Werror=uninitialized]

This seems absolutely right, so we should add an extra check to
prevent copying uninitialized stack data into the statistics.
>From all I can tell, this has been broken since the statistics code
was originally added in 2.6.34.

Fixes: 9564e138b1f6 ("virtio: Add memory statistics reporting to the balloon driver (V4)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-balloon: use actual number of stats for stats queue buffers

The virtio balloon driver contained a not-so-obvious invariant that
update_balloon_stats has to update exactly VIRTIO_BALLOON_S_NR counters
in order to send valid stats to the host. This commit fixes it by having
update_balloon_stats return the actual number of counters, and its
callers use it when pushing buffers to the stats virtqueue.

Note that it is still out of spec to change the number of counters
at run-time. "Driver MUST supply the same subset of statistics in all
buffers submitted to the statsq."

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio_balloon: init 1st buffer in stats vq

When init_vqs runs, virtio_balloon.stats is either uninitialized or
contains stale values. The host updates its state with garbage data
because it has no way of knowing that this is just a marker buffer
used for signaling.

This patch updates the stats before pushing the initial buffer.

Alternative fixes:
* Push an empty buffer in init_vqs. Not easily done with the current
  virtio implementation and violates the spec "Driver MUST supply the
  same subset of statistics in all buffers submitted to the statsq".
* Push a buffer with invalid tags in init_vqs. Violates the same
  spec clause, plus "invalid tag" is not really defined.

Note: the spec says:
When using the legacy interface, the device SHOULD ignore all values in
the first buffer in the statsq supplied by the driver after device
initialization. Note: Historically, drivers supplied an uninitialized
buffer in the first buffer.

Unfortunately QEMU does not seem to implement the recommendation
even for the legacy interface.

Cc: stable@vger.kernel.org
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio_pci: fix out of bound access for msix_names

Fedora has received multiple reports of crashes when running
4.11 as a guest

https://bugzilla.redhat.com/show_bug.cgi?id=1430297
https://bugzilla.redhat.com/show_bug.cgi?id=1434462
https://bugzilla.kernel.org/show_bug.cgi?id=194911
https://bugzilla.redhat.com/show_bug.cgi?id=1433899

The crashes are not always consistent but they are generally
some flavor of oops or GPF in virtio related code. Multiple people
have done bisections (Thank you Thorsten Leemhuis and
Richard W.M. Jones) and found this commit to be at fault

07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507 is the first bad commit
commit 07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507
Author: Christoph Hellwig <hch@lst.de>
Date: Sun Feb 5 18:15:19 2017 +0100

virtio_pci: use shared interrupts for virtqueues

The issue seems to be an out of bounds access to the msix_names
array corrupting kernel memory.

Fixes: 07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>

NFS filelayout:call GETDEVICEINFO after pnfs_layout_process completes

Fix a filelayout GETDEVICEINFO call hang triggered from the LAYOUTGET
pnfs_layout_process where the GETDEVICEINFO call is waiting for a
session slot, and the LAYOUGET call is waiting for pnfs_layout_process
to complete before freeing the slot GETDEVICEINFO is waiting for..

This occurs in testing against the pynfs pNFS server where the
the on-wire reply highest_slotid and slot id are zero, and the
target high slot id is 8 (negotiated in CREATE_SESSION).

The internal fore channel slot table max_slotid, the maximum allowed
table slotid value, has been reduced via nfs41_set_max_slotid_locked
from 8 to 1. Thus there is one slot (slotid 0) available for use but
it has not been freed by LAYOUTGET proir to the GETDEVICEINFO request.

In order to ensure that layoutrecall callbacks are processed in the
correct order, nfs4_proc_layoutget processing needs to be finished
e.g. pnfs_layout_process) before giving up the slot that identifies
the layoutget (see referring_call_exists).

Move the filelayout_check_layout nfs4_find_get_device call outside of
the pnfs_layout_process call tree.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS store nfs4_deviceid in struct nfs4_filelayout_segment

In preparation for moving the filelayout getdeviceinfo call from
filelayout_alloc_lseg called by pnfs_process_layout

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

KVM: x86: cleanup the page tracking SRCU instance

SRCU uses a delayed work item. Skip cleaning it up, and
the result is use-after-free in the work item callbacks.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: 0eb05bf290cfe8610d9680b49abef37febd1c38a
Reviewed-by: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: fix nested EPT detection

The nested_ept_enabled flag introduced in commit 7ca29de2136 was not
computed correctly. We are interested only in L1's EPT state, not the
the combined L0+L1 value.

In particular, if L0 uses EPT but L1 does not, nested_ept_enabled must
be false to make sure that PDPSTRs are loaded based on CR3 as usual,
because the special case described in 26.3.2.4 Loading Page-Directory-
Pointer-Table Entries does not apply.

Fixes: 7ca29de21362 ("KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT")
Cc: qemu-stable@nongnu.org
Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: pci-assign: do not map smm memory slot pages in vt-d page tables

or VM memory are not put thus leaked in kvm_iommu_unmap_memslots() when
destroy VM.

This is consistent with current vfio implementation.

Signed-off-by: herongguang <herongguang.he@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sched/headers: Remove duplicate #include <linux/sched/debug.h> line

Vito Caputo reported that the sched.h split-up series
introduced a duplicate #include <linux/sched/debug.h> line
in drivers/tty/vt/keyboard.c.

Remove it.

Reported-by: Vito Caputo <vcaputo@pengaru.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

vmlinux.lds: Add __clkevt_of_table to kernel

The code introduced by commit 0c8893c9095d ("clockevents: Add a
clkevt-of mechanism like clksrc-of") refer to __clkevt_of_table
what doesn't exist in the vmlinux. As a result kernel build
failed with error: "clkevt-probe.c:63: undefined reference to
`__clkevt_of_table’"

Fixes: 0c8893c9095d ("clockevents: Add a clkevt-of mechanism like clksrc-of")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

clockevents: Fix syntax error in clkevt-of macro

The patch fix syntax errors introduced by commit 0c8893c9095d
("clockevents: Add a clkevt-of mechanism like clksrc-of").

Fixes: 0c8893c9095d ("clockevents: Add a clkevt-of mechanism like clksrc-of")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

mm: Fix false-positive VM_BUG_ON() in page_cache_{get,add}_speculative()

0day testing by Fengguang Wu triggered this crash while running Trinity:

  kernel BUG at include/linux/pagemap.h:151!
  ...
  CPU: 0 PID: 458 Comm: trinity-c0 Not tainted 4.11.0-rc2-00251-g2947ba0 #1
  ...
  Call Trace:
   __get_user_pages_fast()
   get_user_pages_fast()
   get_futex_key()
   futex_requeue()
   do_futex()
   SyS_futex()
   do_syscall_64()
   entry_SYSCALL64_slow_path()

It' VM_BUG_ON() due to false-negative in_atomic(). We call
page_cache_get_speculative() with disabled local interrupts.
It should be atomic enough.

So let's check for disabled interrupts in the VM_BUG_ON() condition
too, to resolve this.

( This got triggered by the conversion of the x86 GUP code to the
  generic GUP code. )

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170324114709.pcytvyb3d6ajux33@black.fi.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

scsi: ufs: remove the duplicated checking for supporting clkscaling

There are same conditions for checking whether supporting clkscaling or
not. When ufshcd is supporting clkscaling, active_reqs should be
decreased by one.

[mkp: addressed comment from Bartlomiej]

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

NFS cleanup struct nfs4_filelayout_segment

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags

We were accidentally only overriding the first VRAM placement. For BOs
with the RADEON_GEM_NO_CPU_ACCESS flag set,
radeon_ttm_placement_from_domain creates a second VRAM placment with
fpfn == 0. If VRAM is almost full, the first VRAM placement with
fpfn > 0 may not work, but the second one with fpfn == 0 always will
(the BO's current location trivially satisfies it). Because "moving"
the BO to its current location puts it back on the LRU list, this
results in an infinite loop.

Fixes: 2a85aedd117c ("drm/radeon: Try evicting from CPU accessible to
inaccessible VRAM first")
Reported-by: Zachary Michaels <zmichaels@oblong.com>
Reported-and-Tested-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

Merge tag 'edac_for_4.11_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
"A new EDAC driver for the Pondicherry2 memory controller IP found in
  the Intel Apollo Lake platform and the Denverton microserver.

  Plus small fixlets.

  Normally I had this queued for 4.12 but Tony requested for the
  pnd2_edac driver to possibly land in 4.11 therefore I'm sending it to
  you now.

  It is a driver for new hardware which people don't have yet so it
  shouldn't cause any regressions.

  The couple of patches ontop of it show that Qiuxu actually did test it
  on the hardware he has access to :)"

* tag 'edac_for_4.11_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, pnd2_edac: Fix reported DIMM number
  EDAC, pnd2_edac: Fix !EDAC_DEBUG build
  EDAC: Select DEBUG_FS
  EDAC, pnd2_edac: Add new EDAC driver for Intel SoC platforms
  EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
  EDAC, xgene: Fix wrongly spelled "procesing"

Merge tag 'pinctrl-v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull more pin control fixes from Linus Walleij:
"Here is a bunch of pin control fixes again

  A bit more than I'd like for this subsystem at this point, but what
  can I do. They are all driver fixes for hardware issues, as like "we
  forgot", "we didn't think of the fact that this could happen", "oops
  that one goes there" etc

   - Kconfig fixup for the TI IOdelay pinctrl-single add-on

   - fix up a typo in the meson i2c ao groups

   - switch a remapping back to use devm_ioremap() as
     devm_ioremap_resource() does not allow for sharing memory regions

   - do not clear the Qualcomm irq status bit in irq_unmask(), as this
     can lead to missing interrupts while the irq handler is executing

   - add irq_request/release_resources() on the ST driver

   - add a bunch of mysteriously missing pingroups for high numbered
     pins in the Qualcomm ipq4019 driver"

* tag 'pinctrl-v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: qcom: ipq4019: add missing pingroups for pins > 70
  pinctrl: st: add irq_request/release_resources callbacks
  pinctrl: qcom: Don't clear status bit on irq_unmask
  pinctrl: samsung: Fix memory mapping code
  pinctrl: meson-gxbb: Fix typo in i2c ao groups
  pinctrl: ti: The IODelay driver is a DRA7xxx feature so depend on that SoC

Merge tag 'm68k-for-v4.11-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k updates from Geert Uytterhoeven:

  - build warning fix

  - defconfig updates

  - wire up new statx syscall

* tag 'm68k-for-v4.11-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Wire up statx
  m68k/defconfig: Update defconfigs for v4.11-rc1
  m68k/bitops: Correct signature of test_bit()

cpufreq: Fix creation of symbolic links to policy directories

The cpufreq core only tries to create symbolic links from CPU
directories in sysfs to policy directories in cpufreq_add_dev(),
either when a given CPU is registered or when the cpufreq driver
is registered, whichever happens first. That is not sufficient,
however, because cpufreq_add_dev() may be called for an offline CPU
whose policy object has not been created yet and, quite obviously,
the symbolic cannot be added in that case.

Fix that by making cpufreq_online() attempt to add symbolic links to
policy objects for the CPUs in the related_cpus mask of every new
policy object created by it.

The cpufreq_driver_lock locking around the for_each_cpu() loop
in cpufreq_online() is dropped, because it is not necessary and the
code is somewhat simpler without it. Moreover, failures to create
a symbolic link will not be regarded as hard errors any more and
the CPUs without those links will not be taken offline automatically,
but that should not be problematic in practice.

Reported-and-tested-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+

NFS: Fix old dentry rehash after move

Now that nfs_rename()'s d_move has moved within the RPC task's rpc_call_done
callback, rehashing new_dentry will actually rehash the old dentry's name
in nfs_rename(). d_move() is going to rehash the new dentry for us anyway,
so doing it again here is unnecessary.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 920b4530fb80 ("NFS: nfs_rename() handle -ERESTARTSYS dentry left behind")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

drm/i915: Restore marking context objects as dirty on pinning

Commit e8a9c58fcd9a ("drm/i915: Unify active context tracking between
legacy/execlists/guc") converted the legacy intel_ringbuffer submission
to the same context pinning mechanism as execlists - that is to pin the
context until the subsequent request is retired. Previously it used the
vma retirement of the context object to keep itself pinned until the
next request (after i915_vma_move_to_active()). In the conversion, I
missed that the vma retirement was also responsible for marking the
object as dirty. Mark the context object as dirty when pinning
(equivalent to execlists) which ensures that if the context is swapped
out due to mempressure or suspend/hibernation, when it is loaded back in
it does so with the previous state (and not all zero).

Fixes: e8a9c58fcd9a ("drm/i915: Unify active context tracking between legacy/execlists/guc")
Reported-by: Dennis Gilmore <dennis@ausil.us>
Reported-by: Mathieu Marquer <mathieu.marquer@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99993
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100181
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.11-rc1
Link: http://patchwork.freedesktop.org/patch/msgid/20170322205930.12762-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
(cherry picked from commit 5d4bac5503fcc67dd7999571e243cee49371aef7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

sched/clock: Fix broken stable to unstable transfer

When it is determined that the clock is actually unstable, and
we switch from stable to unstable, the __clear_sched_clock_stable()
function is eventually called.

In this function we set gtod_offset so the following holds true:

sched_clock() + raw_offset == ktime_get_ns() + gtod_offset

But instead of getting the latest timestamps, we use the last values
from scd, so instead of sched_clock() we use scd->tick_raw, and
instead of ktime_get_ns() we use scd->tick_gtod.

However, later, when we use gtod_offset sched_clock_local() we do not
add it to scd->tick_gtod to calculate the correct clock value when we
determine the boundaries for min/max clocks.

This can result in tick granularity sched_clock() values, so fix it.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Fixes: 5680d8094ffa ("sched/clock: Provide better clock continuity")
Link: http://lkml.kernel.org/r/1490214265-899964-2-git-send-email-pasha.tatashin@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'gvt-fixes-2017-03-23' of https://github.com/01org/gvt-linux into drm-intel-fixes

gvt-fixes-2017-03-23

- KVM reference fix from Alex
- shadow gtt entry partial update fix from Xiaoguang
- gvt context notification check (Changbin)
- other misc fixes.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>

USB: fix linked-list corruption in rh_call_control()

Using KASAN, Dmitry found a bug in the rh_call_control() routine: If
buffer allocation fails, the routine returns immediately without
unlinking its URB from the control endpoint, eventually leading to
linked-list corruption.

This patch fixes the problem by jumping to the end of the routine
(where the URB is unlinked) when an allocation failure occurs.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86: Convert the rest of the code to support p4d_t

This patch converts x86 to use proper folding of a new (fifth) page table level
with <asm-generic/pgtable-nop4d.h>.

That's a bit of a kitchen sink patch, but I don't see how to split it further
without hurting bisectability.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/xen: Change __xen_pgd_walk() and xen_cleanmfnmap() to support p4d

Split these helpers into a couple of per-level functions and add support for
an additional page table level.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[ Split off into separate patch ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/kasan: Prepare clear_pgds() to switch to <asm-generic/pgtable-nop4d.h>

With folded p4d, pgd_clear() is a nop. Change clear_pgds() to use
p4d_clear() instead.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/mm/pat: Add 5-level paging support

Straight-forward extension of existing code to support additional page
table level.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/efi: Add 5-level paging support

Allocate additional page table level and ajdust efi_sync_low_kernel_mappings()
to work with additional page table level.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/kexec: Add 5-level paging support

Handle additional page table level in the kexec code.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Linux 4.11-rc4

Merge tag 'char-misc-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"A smattering of different small fixes for some random driver
  subsystems. Nothing all that major, just resolutions for reported
  issues and bugs.

  All have been in linux-next with no reported issues"

* tag 'char-misc-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
  extcon: int3496: Set the id pin to direction-input if necessary
  extcon: int3496: Use gpiod_get instead of gpiod_get_index
  extcon: int3496: Add dependency on X86 as it's Intel specific
  extcon: int3496: Add GPIO ACPI mapping table
  extcon: int3496: Rename GPIO pins in accordance with binding
  vmw_vmci: handle the return value from pci_alloc_irq_vectors correctly
  ppdev: fix registering same device name
  parport: fix attempt to write duplicate procfiles
  auxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches
  Drivers: hv: vmbus: Don't leak memory when a channel is rescinded
  Drivers: hv: vmbus: Don't leak channel ids
  Drivers: hv: util: don't forget to init host_ts.lock
  Drivers: hv: util: move waiting for release to hv_utils_transport itself
  vmbus: remove hv_event_tasklet_disable/enable
  vmbus: use rcu for per-cpu channel list
  mei: don't wait for os version message reply
  mei: fix deadlock on mei reset
  intel_th: pci: Add Gemini Lake support
  intel_th: pci: Add Denverton SOC support
  intel_th: Don't leak module refcount on failure to activate
  ...

Merge tag 'driver-core-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
"Here is a single kernfs fix for 4.11-rc4 that resolves a reported
  issue.

  It has been in linux-next with no reported issues"

* tag 'driver-core-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file()

Merge tag 'tty-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
"Here are some tty and serial driver fixes for 4.11-rc4.

  One of these fix a long-standing issue in the ldisc code that was
  found by Dmitry Vyukov with his great fuzzing work. The other fixes
  resolve other reported issues, and there is one revert of a patch in
  4.11-rc1 that wasn't correct.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: fix data race in tty_ldisc_ref_wait()
  tty: don't panic on OOM in tty_set_ldisc()
  Revert "tty: serial: pl011: add ttyAMA for matching pl011 console"
  tty: acpi/spcr: QDF2400 E44 checks for wrong OEM revision
  serial: 8250_dw: Fix breakage when HAVE_CLK=n
  serial: 8250_dw: Honor clk_round_rate errors in dw8250_set_termios

Merge tag 'staging-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull IIO driver fixes from Greg KH:
"Here are some small IIO driver fixes for 4.11-rc4 that resolve a
  number of tiny reported issues. All of these have been in linux-next
  for a while with no reported issues"

* tag 'staging-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: imu: st_lsm6dsx: fix FIFO_CTRL2 overwrite during watermark configuration
  iio: adc: ti_am335x_adc: fix fifo overrun recovery
  iio: sw-device: Fix config group initialization
  iio: magnetometer: ak8974: remove incorrect __exit markups
  iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3

Merge tag 'usb-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/PHY fixes from Greg KH:
"Here are a number of small USB and PHY driver fixes for 4.11-rc4.

  Nothing major here, just an bunch of small fixes, and a handfull of
  good fixes from Johan for devices with crazy descriptors. There are a
  few new device ids in here as well.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
  usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held
  usb: gadget: udc: remove pointer dereference after free
  usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
  usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval
  usb: gadget: acm: fix endianness in notifications
  usb: dwc3: gadget: delay unmap of bounced requests
  USB: serial: qcserial: add Dell DW5811e
  usb: hub: Fix crash after failure to read BOS descriptor
  ACM gadget: fix endianness in notifications
  USB: usbtmc: fix probe error path
  USB: usbtmc: add missing endpoint sanity check
  USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems
  usb: musb: fix possible spinlock deadlock
  usb: musb: dsps: fix iounmap in error and exit paths
  usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer
  usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk
  uwb: i1480-dfu: fix NULL-deref at probe
  uwb: hwa-rc: fix NULL-deref at probe
  USB: wusbcore: fix NULL-deref at probe
  USB: uss720: fix NULL-deref at probe
  ...

Merge tag 'powerpc-4.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc fixes from Michael Ellerman:
"These are all pretty minor. The fix for idle wakeup would be a bad bug
  but has not been observed in practice.

  The update to the gcc-plugins docs was Cc'ed to Kees and Jon, Kees
  OK'ed it going via powerpc and I didn't hear from Jon.

   - cxl: Route eeh events to all slices for pci_channel_io_perm_failure state

   - powerpc/64s: Fix idle wakeup potential to clobber registers

   - Revert "powerpc/64: Disable use of radix under a hypervisor"

   - gcc-plugins: update architecture list in documentation

  Thanks to: Andrew Donnellan, Nicholas Piggin, Paul Mackerras, Vaibhav
  Jain"

* tag 'powerpc-4.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  gcc-plugins: update architecture list in documentation
  Revert "powerpc/64: Disable use of radix under a hypervisor"
  powerpc/64s: Fix idle wakeup potential to clobber registers
  cxl: Route eeh events to all slices for pci_channel_io_perm_failure state

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"Fix a memory leak on an error path, and two races when modifying
  inodes relating to the inline_data and metadata checksum features"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix two spelling nits
  ext4: lock the xattr block before checksuming it
  jbd2: don't leak memory if setting up journal fails
  ext4: mark inode dirty after converting inline directory

EDAC, pnd2_edac: Fix reported DIMM number

DIMM number passed to edac_mc_handle_error() was accidentally hardcoded
to zero. Pass in the correct daddr->dimm value.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

Merge tag 'fscrypt-for-linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypto fixes from Ted Ts'o:
"A code cleanup and bugfix for fs/crypto"

* tag 'fscrypt-for-linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: eliminate ->prepare_context() operation
fscrypt: remove broken support for detecting keyring key revocation

Merge tag 'hwmon-for-linus-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- bug fixes in asus_atk0110, it87 and max31790 drivers

- added missing API definition to hwmon core

* tag 'hwmon-for-linus-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (asus_atk0110) fix uninitialized data access
  hwmon: Add missing HWMON_T_ALARM
  hwmon: (it87) Avoid registering the same chip on both SIO addresses
  hwmon: (max31790) Set correct PWM value

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
"This has been a slow -rc cycle for the RDMA subsystem. We really
  haven't had a lot of rc fixes come in. This pull request is the first
  of this entire rc cycle and it has all of the suitable fixes so far
  and it's still only about 20 patches. The fix for the minor breakage
  cause by the dma mapping patchset is in here, as well as a couple
  other potential oops fixes, but the rest is more minor.

  Summary:

   - fix for dma_ops change in this kernel, resolving the s390, powerpc,
     and IOMMU operation

   - a few other oops fixes

   - the rest are all minor fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/qib: fix false-postive maybe-uninitialized warning
  RDMA/iser: Fix possible mr leak on device removal event
  IB/device: Convert ib-comp-wq to be CPU-bound
  IB/cq: Don't process more than the given budget
  IB/rxe: increment msn only when completing a request
  uapi: fix rdma/mlx5-abi.h userspace compilation errors
  IB/core: Restore I/O MMU, s390 and powerpc support
  IB/rxe: Update documentation link
  RDMA/ocrdma: fix a type issue in ocrdma_put_pd_num()
  IB/rxe: double free on error
  RDMA/vmw_pvrdma: Activate device on ethernet link up
  RDMA/vmw_pvrdma: Dont hardcode QP header page
  RDMA/vmw_pvrdma: Cleanup unused variables
  infiniband: Fix alignment of mmap cookies to support VIPT caching
  IB/core: Protect against self-requeue of a cq work item
  i40iw: Receive netdev events post INET_NOTIFIER state

Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit

Pull audit fix from Paul Moore:
"We've got an audit fix, and unfortunately it is big.

  While I'm not excited that we need to be sending you something this
  large during the -rcX phase, it does fix some very real, and very
  tangled, problems relating to locking, backlog queues, and the audit
  daemon connection.

  This code has passed our testsuite without problem and it has held up
  to my ad-hoc stress tests (arguably better than the existing code),
  please consider pulling this as fix for the next v4.11-rcX tag"

* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
  audit: fix auditd/kernel connection state tracking

ext4: fix two spelling nits

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: lock the xattr block before checksuming it

We must lock the xattr block before calculating or verifying the
checksum in order to avoid spurious checksum failures.

https://bugzilla.kernel.org/show_bug.cgi?id=193661

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"A handful of Sunxi and Rockchip clk driver fixes and a core framework
  one where we need to copy a string because we can't guarantee it isn't
  freed sometime later"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: fix recalc_rate formula of NKMP clocks
  clk: sunxi-ng: Fix div/mult settings for osc12M on A64
  clk: rockchip: Make uartpll a child of the gpll on rk3036
  clk: rockchip: add "," to mux_pll_src_apll_dpll_gpll_usb480m_p on rk3036
  clk: core: Copy connection id
  dt-bindings: arm: update Armada CP110 system controller binding
  clk: sunxi-ng: sun6i: Fix enable bit offset for hdmi-ddc module clock
  clk: sunxi: ccu-sun5i needs nkmp
  clk: sunxi-ng: mp: Adjust parent rate for pre-dividers

IB/qib: fix false-postive maybe-uninitialized warning

aarch64-linux-gcc-7 complains about code it doesn't fully understand:

drivers/infiniband/hw/qib/qib_iba7322.c: In function 'qib_7322_txchk_change':
include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The code is right, and despite trying hard, I could not come up with a version
that I liked better than just adding a fake initialization here to shut up the
warning.

Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/iser: Fix possible mr leak on device removal event

When the rdma device is removed, we must cleanup all
the rdma resources within the DEVICE_REMOVAL event
handler to let the device teardown gracefully. When
this happens with live I/O, some memory regions are
occupied. Thus, track them too and dereg all the mr's.

We are safe with mr access by iscsi_iser_cleanup_task.

Reported-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/device: Convert ib-comp-wq to be CPU-bound

This workqueue is used by our storage target mode ULPs
via the new CQ API. Recent observations when working
with very high-end flash storage devices reveal that
UNBOUND workqueue threads can migrate between cpu cores
and even numa nodes (although some numa locality is accounted
for).

While this attribute can be useful in some workloads,
it does not fit in very nicely with the normal
run-to-completion model we usually use in our target-mode
ULPs and the block-mq irq<->cpu affinity facilities.

The whole block-mq concept is that the completion will
land on the same cpu where the submission was performed.
The fact that our submitter thread is migrating cpus
can break this locality.

We assume that as a target mode ULP, we will serve multiple
initiators/clients and we can spread the load enough without
having to use unbound kworkers.

Also, while we're at it, expose this workqueue via sysfs which
is harmless and can be useful for debug.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/cq: Don't process more than the given budget

The caller might not want this overhead.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/rxe: increment msn only when completing a request

According to C9-147, MSN should only be incremented when the last packet of
a multi packet request has been received.

"Logically, the requester associates a sequential Send Sequence Number
(SSN) with each WQE posted to the send queue. The SSN bears a one-
to-one relationship to the MSN returned by the responder in each re-
sponse packet. Therefore, when the requester receives a response, it in-
terprets the MSN as representing the SSN of the most recent request
completed by the responder to determine which send WQE(s) can be
completed."

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

uapi: fix rdma/mlx5-abi.h userspace compilation errors

Consistently use types from linux/types.h to fix the following
rdma/mlx5-abi.h userspace compilation errors:

/usr/include/rdma/mlx5-abi.h:69:25: error: 'u64' undeclared here (not in a function)
  MLX5_LIB_CAP_4K_UAR = (u64)1 << 0,
/usr/include/rdma/mlx5-abi.h:69:29: error: expected ',' or '}' before numeric constant
  MLX5_LIB_CAP_4K_UAR = (u64)1 << 0,

Include <linux/if_ether.h> to fix the following rdma/mlx5-abi.h
userspace compilation error:

/usr/include/rdma/mlx5-abi.h:286:12: error: 'ETH_ALEN' undeclared here (not in a function)
  __u8 dmac[ETH_ALEN];

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Restore I/O MMU, s390 and powerpc support

Avoid that the following error message is reported on the console
while loading an RDMA driver with I/O MMU support enabled:

DMAR: Allocating domain for mlx5_0 failed

Ensure that DMA mapping operations that use to_pci_dev() to
access to struct pci_dev see the correct PCI device. E.g. the s390
and powerpc DMA mapping operations use to_pci_dev() even with I/O
MMU support disabled.

This patch preserves the following changes of the DMA mapping updates
patch series:
- Introduction of dma_virt_ops.
- Removal of ib_device.dma_ops.
- Removal of struct ib_dma_mapping_ops.
- Removal of an if-statement from each ib_dma_*() operation.
- IB HW drivers no longer set dma_device directly.

Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Parav Pandit <parav@mellanox.com>
Fixes: commit 99db9494035f ("IB/core: Remove ib_device.dma_device")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: parav@mellanox.com
Tested-by: parav@mellanox.com
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/rxe: Update documentation link

All Soft-RoCE (rxe) is handled now in rdma-core user space library,
so the documentation. The patch below updates the documentation
link to that new location.

Reported-by: Josh Beavers <josh.beavers@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/ocrdma: fix a type issue in ocrdma_put_pd_num()

We want to return zero on success or negative error codes. The type
should be int and not u8.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/rxe: double free on error

"goto err;" has it's own kfree_skb() call so it's a double free. We
only need to free on the "goto exit;" path.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/vmw_pvrdma: Activate device on ethernet link up

Restore device state when ethernet link changes to active.

Acked-by: George Zhang <georgezhang@vmware.com>
Acked-by: Jorgen Hansen <jhansen@vmware.com>
Acked-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/vmw_pvrdma: Dont hardcode QP header page

Moved the header page count to a macro.

Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Tested-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/vmw_pvrdma: Cleanup unused variables

Removed the unused nreq and redundant index variables.
Moved hardcoded async and cq ring pages number to macro.

Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Tested-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

nbd: replace kill_bdev() with __invalidate_device()

When a filesystem is mounted on a nbd device and on a disconnect, because
of kill_bdev(), and resetting bdev size to zero, buffer_head mappings are
getting destroyed under mounted filesystem.

After a bdev size reset(i.e bdev->bd_inode->i_size = 0) on a disconnect,
followed by a sys_umount(),
        generic_shutdown_super()->...
        ->__sync_blockdev()->...
        -blkdev_writepages()->...
        ->do_invalidatepage()->...
        -discard_buffer()   is discarding superblock buffer_head assumed
to be in mapped state by ext4_commit_super().

[mlin: ported to 4.11-rc2]
Signed-off-by: Ratna Manoj Bolla <manoj.br@gmail.com
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

nbd: set queue timeout properly

We can't just set the timeout on the tagset, we have to set it on the
queue as it would have been setup already at this point.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

nbd: set rq->errors to actual error code

We've been relying on the block layer to assume rq->errors being set
translates into -EIO. I noticed in testing that sometimes this isn't
true, and really there's not much of a reason to have a counter instead
of just using -EIO. So set it properly so we don't leak random numbers
to unsuspecting victims.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

nbd: handle ERESTARTSYS properly

We can submit IO in a processes context, which means there can be
pending signals.  This isn't a fatal error for NBD, but it does require
some finesse.  If the signal happens before we transmit anything then we
are ok, just requeue the request and carry on.  However if we've done a
partial transmit we can't allow anything else to be transmitted on this
socket until we transmit the remaining part of the request.  Deal with
this by keeping track of how much we've sent for the current request,
and if we get an ERESTARTSYS during any part of our transmission save
the state of that request and requeue the IO.  If anybody tries to
submit a request that isn't our pending request then requeue that
request until we are able to service the one that is pending.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: include errors in did_work calculation

Currently we return true in blk_mq_dispatch_rq_list() if we queued IO
successfully, but we really want to return whether or not the we made
progress. Progress includes if we got an error return. If we don't,
this can lead to a hang in blk_mq_sched_dispatch_requests() when a
driver is draining IO by returning BLK_MQ_QUEUE_ERROR instead of
manually ending the IO in error and return BLK_MQ_QUEUE_OK.

Tested-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

Merge tag 'vfio-v4.11-rc4' of git://github.com/awilliam/linux-vfio

Pull VFIO fix from Alex Williamson:
"Rework sanity check for mdev driver group notifier de-registration
(Alex Williamson)"

* tag 'vfio-v4.11-rc4' of git://github.com/awilliam/linux-vfio:
vfio: Rework group release notifier warning