Darrick J. Wong [Thu, 2 Feb 2017 23:13:59 +0000 (15:13 -0800)]
xfs: check for obviously bad level values in the bmbt root
We can't handle a bmbt that's taller than BTREE_MAXLEVELS, and there's
no such thing as a zero-level bmbt (for that we have extents format),
so if we see this, send back an error code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 2 Feb 2017 23:13:58 +0000 (15:13 -0800)]
xfs: fail _dir_open when readahead fails
When we open a directory, we try to readahead block 0 of the directory
on the assumption that we're going to need it soon. If the bmbt is
corrupt, the directory will never be usable and the readahead fails
immediately, so we might as well prevent the directory from being opened
at all. This prevents a subsequent read or modify operation from
hitting it and taking the fs offline.
NOTE: We're only checking for early failures in the block mapping, not
the readahead directory block itself.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 2 Feb 2017 23:13:57 +0000 (15:13 -0800)]
xfs: fix toctou race when locking an inode to access the data map
We use di_format and if_flags to decide whether we're grabbing the ilock
in btree mode (btree extents not loaded) or shared mode (anything else),
but the state of those fields can be changed by other threads that are
also trying to load the btree extents -- IFEXTENTS gets set before the
_bmap_read_extents call and cleared if it fails.
We don't actually need to have IFEXTENTS set until after the bmbt
records are successfully loaded and validated, which will fix the race
between multiple threads trying to read the same directory. The next
patch strengthens directory bmbt validation by refusing to open the
directory if reading the bmbt to start directory readahead fails.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Eric Sandeen [Sat, 28 Jan 2017 07:24:28 +0000 (23:24 -0800)]
xfs: remove unused full argument from bmap
The "full" argument was used only by the fiemap formatter,
which is now gone with the iomap updates.
Remove the unused arg.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Sat, 28 Jan 2017 07:22:57 +0000 (23:22 -0800)]
xfs: fix eofblocks race with file extending async dio writes
It's possible for post-eof blocks to end up being used for direct I/O
writes. dio write performs an upfront unwritten extent allocation, sends
the dio and then updates the inode size (if necessary) on write
completion. If a file release occurs while a file extending dio write is
in flight, it is possible to mistake the post-eof blocks for speculative
preallocation and incorrectly truncate them from the inode. This means
that the resulting dio write completion can discover a hole and allocate
new blocks rather than perform unwritten extent conversion.
This requires a strange mix of I/O and is thus not likely to reproduce
in real world workloads. It is intermittently reproduced by generic/299.
The error manifests as an assert failure due to transaction overrun
because the aforementioned write completion transaction has only
reserved enough blocks for btree operations:
The root cause is that xfs_free_eofblocks() uses i_size to truncate
post-eof blocks from the inode, but async, file extending direct writes
do not update i_size until write completion, long after inode locks are
dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode
to the incorrect size.
Update xfs_free_eofblocks() to serialize against dio similar to how
extending writes are serialized against i_size updates before post-eof
block zeroing. Specifically, wait on dio while under the iolock. This
ensures that dio write completions have updated i_size before post-eof
blocks are processed.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Sat, 28 Jan 2017 07:22:56 +0000 (23:22 -0800)]
xfs: sync eofblocks scans under iolock are livelock prone
The xfs_eofblocks.eof_scan_owner field is an internal field to
facilitate invoking eofb scans from the kernel while under the iolock.
This is necessary because the eofb scan acquires the iolock of each
inode. Synchronous scans are invoked on certain buffered write failures
while under iolock. In such cases, the scan owner indicates that the
context for the scan already owns the particular iolock and prevents a
double lock deadlock.
eofblocks scans while under iolock are still livelock prone in the event
of multiple parallel scans, however. If multiple buffered writes to
different inodes fail and invoke eofblocks scans at the same time, each
scan avoids a deadlock with its own inode by virtue of the
eof_scan_owner field, but will never be able to acquire the iolock of
the inode from the parallel scan. Because the low free space scans are
invoked with SYNC_WAIT, the scan will not return until it has processed
every tagged inode and thus both scans will spin indefinitely on the
iolock being held across the opposite scan. This problem can be
reproduced reliably by generic/224 on systems with higher cpu counts
(x16).
To avoid this problem, simplify the semantics of eofblocks scans to
never invoke a scan while under iolock. This means that the buffered
write context must drop the iolock before the scan. It must reacquire
the lock before the write retry and also repeat the initial write
checks, as the original state might no longer be valid once the iolock
was dropped.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Sat, 28 Jan 2017 07:22:55 +0000 (23:22 -0800)]
xfs: pull up iolock from xfs_free_eofblocks()
xfs_free_eofblocks() requires the IOLOCK_EXCL lock, but is called from
different contexts where the lock may or may not be held. The
need_iolock parameter exists for this reason, to indicate whether
xfs_free_eofblocks() must acquire the iolock itself before it can
proceed.
This is ugly and confusing. Simplify the semantics of
xfs_free_eofblocks() to require the caller to acquire the iolock
appropriately and kill the need_iolock parameter. While here, the mp
param can be removed as well as the xfs_mount is accessible from the
xfs_inode structure. This patch does not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Sat, 28 Jan 2017 07:21:08 +0000 (23:21 -0800)]
xfs: remove unused struct declarations
After scratching my head looking for "xfs_busy_extent" I realized
it's not used; it's xfs_extent_busy, and the declaration for the
other name is bogus. Remove that and a few others as well.
(struct xfs_log_callback is used, but the 2nd declaration is
unnecessary).
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Sat, 28 Jan 2017 07:16:39 +0000 (23:16 -0800)]
xfs: remove boilerplate around xfs_btree_init_block
Now that xfs_btree_init_block_int is able to determine crc
status from the passed-in mp, we can determine the proper
magic as well if we are given a btree number, rather than
an explicit magic value.
Change xfs_btree_init_block[_int] callers to pass in the
btree number, and let xfs_btree_init_block_int use the
xfs_magics array via the xfs_btree_magic macro to determine
which magic value is needed. This makes all of the
if (crc) / else stanzas identical, and the if/else can be
removed, leading to a single, common init_block call.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Sat, 28 Jan 2017 07:16:38 +0000 (23:16 -0800)]
xfs: make xfs_btree_magic more generic
Right now the xfs_btree_magic() define takes only a cursor;
change this to take crc and btnum args to make it more generically
useful, and move to a function.
This will allow xfs_btree_init_block_int callers which don't
have a cursor to make use of the xfs_magics array, which will
happen in the next patch.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Sat, 28 Jan 2017 07:16:37 +0000 (23:16 -0800)]
xfs: glean crc status from mp not flags in xfs_btree_init_block_int
xfs_btree_init_block_int() can determine whether crcs are
in effect without the passed-in XFS_BTREE_CRC_BLOCKS flag;
the mp argument allows us to determine this from the
superblock. Remove the flag from callers, and use
xfs_sb_version_hascrc(&mp->m_sb) internally instead.
This removes one difference between the if & else cases
in the callers.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Linus Torvalds [Sun, 29 Jan 2017 21:50:06 +0000 (13:50 -0800)]
drm/i915: Check for NULL i915_vma in intel_unpin_fb_obj()
I've seen this trigger twice now, where the i915_gem_object_to_ggtt()
call in intel_unpin_fb_obj() returns NULL, resulting in an oops
immediately afterwards as the (inlined) call to i915_vma_unpin_fence()
tries to dereference it.
It seems to be some race condition where the object is going away at
shutdown time, since both times happened when shutting down the X
server. The call chains were different:
and this patch purely papers over the issue by adding a NULL pointer
check and a WARN_ON_ONCE() to avoid the oops that would then generally
make the machine unresponsive.
Other callers of i915_gem_object_to_ggtt() seem to also check for the
returned pointer being NULL and warn about it, so this clearly has
happened before in other places.
[ Reported it originally to the i915 developers on Jan 8, applying the
ugly workaround on my own now after triggering the problem for the
second time with no feedback.
Linus Torvalds [Sun, 29 Jan 2017 18:56:56 +0000 (10:56 -0800)]
Merge branch 'parisc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull two parisc fixes from Helge Deller:
"One fix to avoid usage of BITS_PER_LONG in user-space exported swab.h
header which breaks compiling qemu, and one trivial fix for printk
continuation in the parisc parport driver"
* 'parisc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Don't use BITS_PER_LONG in userspace-exported swab.h header
parisc, parport_gsc: Fixes for printk continuation lines
Linus Torvalds [Sat, 28 Jan 2017 23:09:23 +0000 (15:09 -0800)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two I2C driver bugfixes.
The 'VLLS mode support' patch should have been entitled 'reconfigure
pinctrl after suspend' to make the bugfix more clear. Sorry, I missed
that, yet didn't want to rebase"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx-lpi2c: add VLLS mode support
i2c: i2c-cadence: Initialize configuration before probing devices
Helge Deller [Sat, 28 Jan 2017 10:52:02 +0000 (11:52 +0100)]
parisc: Don't use BITS_PER_LONG in userspace-exported swab.h header
In swab.h the "#if BITS_PER_LONG > 32" breaks compiling userspace programs if
BITS_PER_LONG is #defined by userspace with the sizeof() compiler builtin.
Solve this problem by using __BITS_PER_LONG instead. Since we now
#include asm/bitsperlong.h avoid further potential userspace pollution
by moving the #define of SHIFT_PER_LONG to bitops.h which is not
exported to userspace.
This patch unbreaks compiling qemu on hppa/parisc.
Linus Torvalds [Sat, 28 Jan 2017 19:50:17 +0000 (11:50 -0800)]
Merge tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Stable patches:
- NFSv4.1: Fix a deadlock in layoutget
- NFSv4 must not bump sequence ids on NFS4ERR_MOVED errors
- NFSv4 Fix a regression with OPEN EXCLUSIVE4 mode
- Fix a memory leak when removing the SUNRPC module
Bugfixes:
- Fix a reference leak in _pnfs_return_layout"
* tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
pNFS: Fix a reference leak in _pnfs_return_layout
nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
SUNRPC: cleanup ida information when removing sunrpc module
NFSv4.0: always send mode in SETATTR after EXCLUSIVE4
nfs: Don't increment lock sequence ID after NFS4ERR_MOVED
NFSv4.1: Fix a deadlock in layoutget
Linus Torvalds [Sat, 28 Jan 2017 19:09:04 +0000 (11:09 -0800)]
Merge tag 'md/4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"This fixes several corner cases for raid5 cache, which is merged into
this cycle"
* tag 'md/4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/r5cache: disable write back for degraded array
md/r5cache: shift complex rmw from read path to write path
md/r5cache: flush data only stripes in r5l_recovery_log()
md/raid5: move comment of fetch_block to right location
md/r5cache: read data into orig_page for prexor of cached data
md/raid5-cache: delete meaningless code
Linus Torvalds [Sat, 28 Jan 2017 19:00:08 +0000 (11:00 -0800)]
Merge tag 'arc-4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"Hopefully last set of changes for ARC for 4.10:
- fix for unaligned access emulation corner case
- fix for udelay loop inline asm regression
- fix irq affinity finally for AXS103 board [Yuriy]
- final fixes for setting IO-coherency sanely in SMP"
* tag 'arc-4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [arcompact] handle unaligned access delay slot corner case
ARCv2: smp-boot: wake_flag polling by non-Masters needs to be uncached
ARC: smp-boot: Decouple Non masters waiting API from jump to entry point
ARCv2: MCIP: update the BCR per current changes
ARC: udelay: fix inline assembler by adding LP_COUNT to clobber list
ARCv2: MCIP: Deprecate setting of affinity in Device Tree
1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP
DF on transmit).
2) Netfilter needs to reflect the fwmark when sending resets, from Pau
Espin Pedrol.
3) nftable dump OOPS fix from Liping Zhang.
4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit,
from Rolf Neugebauer.
5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd
Bergmann.
6) Fix regression in handling of NETIF_F_SG in harmonize_features(),
from Eric Dumazet.
7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.
8) tcp_fastopen_create_child() needs to setup tp->max_window, from
Alexey Kodanev.
9) Missing kmemdup() failure check in ipv6 segment routing code, from
Eric Dumazet.
10) Don't execute unix_bind() under the bindlock, otherwise we deadlock
with splice. From WANG Cong.
11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer,
therefore callers must reload cached header pointers into that skb.
Fix from Eric Dumazet.
12) Fix various bugs in legacy IRQ fallback handling in alx driver, from
Tobias Regnery.
13) Do not allow lwtunnel drivers to be unloaded while they are
referenced by active instances, from Robert Shearman.
14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.
15) Fix a few regressions from virtio_net XDP support, from John
Fastabend and Jakub Kicinski.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits)
ISDN: eicon: silence misleading array-bounds warning
net: phy: micrel: add support for KSZ8795
gtp: fix cross netns recv on gtp socket
gtp: clear DF bit on GTP packet tx
gtp: add genl family modules alias
tcp: don't annotate mark on control socket from tcp_v6_send_response()
ravb: unmap descriptors when freeing rings
virtio_net: reject XDP programs using header adjustment
virtio_net: use dev_kfree_skb for small buffer XDP receive
r8152: check rx after napi is enabled
r8152: re-schedule napi for tx
r8152: avoid start_xmit to schedule napi when napi is disabled
r8152: avoid start_xmit to call napi_schedule during autosuspend
net: dsa: Bring back device detaching in dsa_slave_suspend()
net: phy: leds: Fix truncated LED trigger names
net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
net-next: ethernet: mediatek: change the compatible string
Documentation: devicetree: change the mediatek ethernet compatible string
bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
...
Linus Torvalds [Fri, 27 Jan 2017 20:44:32 +0000 (12:44 -0800)]
Merge tag 'xfs-for-linus-4.10-rc6-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs uodates from Darrick Wong:
"I have some more fixes this week: better input validation, corruption
avoidance, build fixes, memory leak fixes, and a couple from Christoph
to avoid an ENOSPC failure.
Summary:
- Fix race conditions in the CoW code
- Fix some incorrect input validation checks
- Avoid crashing fs by running out of space when freeing inodes
- Fix toctou race wrt whether or not an inode has an attr
- Fix build error on arm
- Fix page refcount corruption when readahead fails
- Don't corrupt userspace in the bmap ioctl"
* tag 'xfs-for-linus-4.10-rc6-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: prevent quotacheck from overloading inode lru
xfs: fix bmv_count confusion w/ shared extents
xfs: clear _XBF_PAGES from buffers when readahead page
xfs: extsize hints are not unlikely in xfs_bmap_btalloc
xfs: remove racy hasattr check from attr ops
xfs: use per-AG reservations for the finobt
xfs: only update mount/resv fields on success in __xfs_ag_resv_init
xfs: verify dirblocklog correctly
xfs: fix COW writeback race
Linus Torvalds [Fri, 27 Jan 2017 20:41:46 +0000 (12:41 -0800)]
Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
"Some fixes that we've collected from the list.
We still have one more pending to nail down a regression in lzo
compression, but I wanted to get this batch out the door"
* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operations
Btrfs: disable xattr operations on subvolume directories
Btrfs: remove old tree_root case in btrfs_read_locked_inode()
Btrfs: fix truncate down when no_holes feature is enabled
Btrfs: Fix deadlock between direct IO and fast fsync
btrfs: fix false enospc error when truncating heavily reflinked file
Linus Torvalds [Fri, 27 Jan 2017 20:36:39 +0000 (12:36 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes for this series. This contains:
- Set of fixes for the nvme target code
- A revert of patch from this merge window, causing a regression with
WRITE_SAME on iSCSI targets at least.
- A fix for a use-after-free in the new O_DIRECT bdev code.
- Two fixes for the xen-blkfront driver"
* 'for-linus' of git://git.kernel.dk/linux-block:
Revert "sd: remove __data_len hack for WRITE SAME"
nvme-fc: use blk_rq_nr_phys_segments
nvmet-rdma: Fix missing dma sync to nvme data structures
nvmet: Call fatal_error from keep-alive timout expiration
nvmet: cancel fatal error and flush async work before free controller
nvmet: delete controllers deletion upon subsystem release
nvmet_fc: correct logic in disconnect queue LS handling
block: fix use after free in __blkdev_direct_IO
xen-blkfront: correct maximum segment accounting
xen-blkfront: feature flags handling adjustments
Linus Torvalds [Fri, 27 Jan 2017 20:29:30 +0000 (12:29 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Second round of -rc fixes for 4.10.
This -rc cycle has been slow for the rdma subsystem. I had already
sent you the first batch before the Holiday break. After that, we kept
only getting a few here or there. Up until this week, when I got a
drop of 13 to one driver (qedr). So, here's the -rc patches I have. I
currently have none held in reserve, so unless something new comes in,
this is it until the next merge window opens.
Summary:
- series of iw_cxgb4 fixes to make it work with the drain cq API
- one or two patches each to: srp, iser, cxgb3, vmw_pvrdma, umem,
rxe, and ipoib
- one big series (13 patches) for the new qedr driver"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits)
RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
IB/rxe: Prevent from completer to operate on non valid QP
IB/rxe: Fix rxe dev insertion to rxe_dev_list
IB/umem: Release pid in error and ODP flow
RDMA/qedr: Dispatch port active event from qedr_add
RDMA/qedr: Fix and simplify memory leak in PD alloc
RDMA/qedr: Fix RDMA CM loopback
RDMA/qedr: Fix formatting
RDMA/qedr: Mark three functions as static
RDMA/qedr: Don't reset QP when queues aren't flushed
RDMA/qedr: Don't spam dmesg if QP is in error state
RDMA/qedr: Remove CQ spinlock from CM completion handlers
RDMA/qedr: Return max inline data in QP query result
RDMA/qedr: Return success when not changing QP state
RDMA/qedr: Add uapi header qedr-abi.h
RDMA/qedr: Fix MTU returned from QP query
RDMA/core: Add the function ib_mtu_int_to_enum
IB/vmw_pvrdma: Fix incorrect cleanup on pvrdma_pci_probe error path
IB/vmw_pvrdma: Don't leak info from alloc_ucontext
IB/cxgb3: fix misspelling in header guard
...
Linus Torvalds [Fri, 27 Jan 2017 20:25:26 +0000 (12:25 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Another two bug fixes:
- ptrace partial write information leak
- a guest page hinting regression introduced with v4.6"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: Fix cmma unused transfer from pgste into pte
s390/ptrace: Preserve previous registers for short regset write
Linus Torvalds [Fri, 27 Jan 2017 20:17:07 +0000 (12:17 -0800)]
Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fix from Konrad Rzeszutek Wilk:
"An ARM fix in the Xen SWIOTLB - mainly the translation of physical to
bus addresses was done just a tad too late"
* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb-xen: update dev_addr after swapping pages
Linus Torvalds [Fri, 27 Jan 2017 20:10:58 +0000 (12:10 -0800)]
Merge tag 'vfio-v4.10-rc6' of git://github.com/awilliam/linux-vfio
Pull VFIO fix from Alex Williamson:
"mdev IOMMU groups are not yet compatible with the powerpc SPAPR IOMMU
backend, detect and fail group attach (Greg Kurz)"
* tag 'vfio-v4.10-rc6' of git://github.com/awilliam/linux-vfio:
vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
Jens Axboe [Fri, 27 Jan 2017 18:56:06 +0000 (11:56 -0700)]
Merge branch 'stable/for-jens-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
Konrad writes:
Please pull in your 'for-linus' branch two little fixes for Xen
block front:
One fix is for handling the XEN_PAGE_SIZE != PAGE_SIZE (4KB vs 64KB
on ARM for example) mishandling while the other is fixing
the accounting for the configuration changes.
Vineet Gupta [Fri, 27 Jan 2017 18:45:27 +0000 (10:45 -0800)]
ARC: [arcompact] handle unaligned access delay slot corner case
After emulating an unaligned access in delay slot of a branch, we
pretend as the delay slot never happened - so return back to actual
branch target (or next PC if branch was not taken).
Curently we did this by handling STATUS32.DE, we also need to clear the
BTA.T bit, which is disregarded when returning from original misaligned
exception, but could cause weirdness if it took the interrupt return
path (in case interrupt was acive too)
One ARC700 customer ran into this when enabling unaligned access fixup
for kernel mode accesses as well
Linus Torvalds [Fri, 27 Jan 2017 18:29:33 +0000 (10:29 -0800)]
Merge tag 'media/v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- fix a regression on tvp5150 causing failures at input selection and
image glitches
- CEC was moved out of staging for v4.10. Fix some bugs on it while not
too late
- fix a regression on pctv452e caused by VM stack changes
- fix suspend issued with smiapp
- fix a regression on cobalt driver
- fix some warnings and Kconfig issues with some random configs.
* tag 'media/v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] s5k4ecgx: select CRC32 helper
[media] dvb: avoid warning in dvb_net
[media] v4l: tvp5150: Don't override output pinmuxing at stream on/off time
[media] v4l: tvp5150: Fix comment regarding output pin muxing
[media] v4l: tvp5150: Reset device at probe time, not in get/set format handlers
[media] pctv452e: move buffer to heap, no mutex
[media] media/cobalt: use pci_irq_allocate_vectors
[media] cec: fix race between configuring and unconfiguring
[media] cec: move cec_report_phys_addr into cec_config_thread_func
[media] cec: replace cec_report_features by cec_fill_msg_report_features
[media] cec: update log_addr[] before finishing configuration
[media] cec: CEC_MSG_GIVE_FEATURES should abort for CEC version < 2
[media] cec: when canceling a message, don't overwrite old status info
[media] cec: fix report_current_latency
[media] smiapp: Make suspend and resume functions __maybe_unused
[media] smiapp: Implement power-on and power-off sequences without runtime PM
Linus Torvalds [Fri, 27 Jan 2017 18:22:00 +0000 (10:22 -0800)]
Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fix from Zhang Rui:
"A single revert from a recently introduced problem.
Specifics:
Commit 7611fb68062f ("thermal: thermal_hwmon: Convert to
hwmon_device_register_with_info()"), which was introduced in 4.10-rc5,
uses new hwmon API. But this breaks some soc thermal driver because
the new hwmon API has a strict rule for the hwmon device name. Revert
the offending commit as a quick solution for 4.10"
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
Revert "thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()"
Brian Foster [Thu, 26 Jan 2017 21:18:09 +0000 (13:18 -0800)]
xfs: prevent quotacheck from overloading inode lru
Quotacheck runs at mount time in situations where quota accounting must
be recalculated. In doing so, it uses bulkstat to visit every inode in
the filesystem. Historically, every inode processed during quotacheck
was released and immediately tagged for reclaim because quotacheck runs
before the superblock is marked active by the VFS. In other words,
the final iput() lead to an immediate ->destroy_inode() call, which
allowed the XFS background reclaim worker to start reclaiming inodes.
Commit 17c12bcd3 ("xfs: when replaying bmap operations, don't let
unlinked inodes get reaped") marks the XFS superblock active sooner as
part of the mount process to support caching inodes processed during log
recovery. This occurs before quotacheck and thus means all inodes
processed by quotacheck are inserted to the LRU on release. The
s_umount lock is held until the mount has completed and thus prevents
the shrinkers from operating on the sb. This means that quotacheck can
excessively populate the inode LRU and lead to OOM conditions on systems
without sufficient RAM.
Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
inodes processed by quotacheck. This causes ->drop_inode() to return 1
and in turn causes iput_final() to evict the inode. This preserves the
original quotacheck behavior and prevents it from overloading the LRU
and running out of memory.
CC: stable@vger.kernel.org # v4.9 Reported-by: Martin Svec <martin.svec@zoner.cz> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
With some gcc versions, we get a warning about the eicon driver,
and that currently shows up as the only remaining warning in one
of the build bots:
In file included from ../drivers/isdn/hardware/eicon/message.c:30:0:
eicon/message.c: In function 'mixer_notify_update':
eicon/platform.h:333:18: warning: array subscript is above array bounds [-Warray-bounds]
The code is easily changed to open-code the unusual PUT_WORD() line
causing this to avoid the warning.
David S. Miller [Fri, 27 Jan 2017 15:39:10 +0000 (10:39 -0500)]
Merge branch 'gtp-fixes'
Andreas Schultz says:
====================
various gtp fixes
I'm sorry for the compile error mess up in the last version.
It's no excuse for not test compiling, but the hunks got lost in
a rebase.
This is the part of the previous "simple gtp improvements" series
that Pablo indicated should go into net.
The addition of the module alias fixes genl family autoloading,
clearing the DF bit fixes a protocol violation in regard to the
specification and the netns comparison fixes a corner case of
cross netns recv.
v2->v3: fix compiler error introduced in rebase
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Schultz [Fri, 27 Jan 2017 09:40:58 +0000 (10:40 +0100)]
gtp: fix cross netns recv on gtp socket
The use of the passed through netlink src_net to check for a
cross netns operation was wrong. Using the GTP socket and the
GTP netdevice is always correct (even if the netdev has been
moved to new netns after link creation).
Remove the now obsolete net field from gtp_dev.
Signed-off-by: Andreas Schultz <aschultz@tpip.net> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Schultz [Fri, 27 Jan 2017 09:40:57 +0000 (10:40 +0100)]
gtp: clear DF bit on GTP packet tx
3GPP TS 29.281 and 3GPP TS 29.060 imply that GTP-U packets should be
sent with the DF bit cleared. For example 3GPP TS 29.060, Release 8,
Section 13.2.2:
> Backbone router: Any router in the backbone may fragment the GTP
> packet if needed, according to IPv4.
Signed-off-by: Andreas Schultz <aschultz@tpip.net> Acked-by: Harald Welte <laforge@netfilter.org> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Schultz [Fri, 27 Jan 2017 09:40:56 +0000 (10:40 +0100)]
gtp: add genl family modules alias
Auto-load the module when userspace asks for the gtp netlink
family.
Signed-off-by: Andreas Schultz <aschultz@tpip.net> Acked-by: Harald Welte <laforge@netfilter.org> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira [Thu, 26 Jan 2017 21:56:21 +0000 (22:56 +0100)]
tcp: don't annotate mark on control socket from tcp_v6_send_response()
Unlike ipv4, this control socket is shared by all cpus so we cannot use
it as scratchpad area to annotate the mark that we pass to ip6_xmit().
Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
family caches the flowi6 structure in the sctp_transport structure, so
we cannot use to carry the mark unless we later on reset it back, which
I discarded since it looks ugly to me.
Fixes: bf99b4ded5f8 ("tcp: fix mark propagation with fwmark_reflect enabled") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
arm64: skip register_cpufreq_notifier on ACPI-based systems
On ACPI based systems where the topology is setup using the API
store_cpu_topology, at the moment we do not have necessary code
to parse cpu capacity and handle cpufreq notifier, thus
resulting in a kernel panic.
Linus Torvalds [Fri, 27 Jan 2017 02:04:56 +0000 (18:04 -0800)]
Merge tag 'drm-fixes-for-v4.10-rc6-part-two' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This is the main request for rc6, since really the one earlier was the
rc5 one :-)
The main thing are the nouveau specific race fixes for the connector
locking bug we fixed in -next and reverted here as it has quite large
prereqs. These two fixes should solve the problem at that level and we
can fix it properly in 4.11
Otherwise i915 has a bunch of changes, one ABI change for GVT related
stuff, some VC4 leak fixes, one core fence fix and some AMD changes,
oh and one ast hang avoidance fix.
Hoping it calms down around now"
* tag 'drm-fixes-for-v4.10-rc6-part-two' of git://people.freedesktop.org/~airlied/linux: (25 commits)
drm/nouveau: Handle fbcon suspend/resume in seperate worker
drm/nouveau: Don't enabling polling twice on runtime resume
drm/ast: Fixed system hanged if disable P2A
Revert "drm/radeon: always apply pci shutdown callbacks"
drm/i915: reinstate call to trace_i915_vma_bind
drm/i915: Move atomic state free from out of fence release
drm/i915: Check for NULL atomic state in intel_crtc_disable_noatomic()
drm/i915: Fix calculation of rotated x and y offsets for planar formats
drm/i915: Don't init hpd polling for vlv and chv from runtime_suspend()
drm/i915: Don't leak edid in intel_crt_detect_ddc()
drm/i915: Release temporary load-detect state upon switching
drm/i915: prevent crash with .disable_display parameter
drm/i915: Avoid drm_atomic_state_put(NULL) in intel_display_resume
MAINTAINERS: update new mail list for intel gvt driver
drm/i915/gvt: Fix kmem_cache_create() name
drm/i915/gvt/kvmgt: mdev ABI is available_instances, not available_instance
drm/amdgpu: fix unload driver issue for virtual display
drm/amdgpu: check ring being ready before using
drm/vc4: Return -EINVAL on the overflow checks failing.
drm/vc4: Fix an integer overflow in temporary allocation layout.
...
Dave Airlie [Fri, 27 Jan 2017 01:29:44 +0000 (11:29 +1000)]
Merge tag 'drm-intel-fixes-2017-01-26' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
More fixes than I'd like at this stage, but I think the holidays and
conferences have delayed finding and fixing the stuff a bit. Almost all
of them have Fixes: tags, so it's not just random fixes, we can point
fingers at the commits that broke stuff.
There's an ABI fix to GVT from Alex, before we go on an release a kernel
with the wrong attribute name.
* tag 'drm-intel-fixes-2017-01-26' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: reinstate call to trace_i915_vma_bind
drm/i915: Move atomic state free from out of fence release
drm/i915: Check for NULL atomic state in intel_crtc_disable_noatomic()
drm/i915: Fix calculation of rotated x and y offsets for planar formats
drm/i915: Don't init hpd polling for vlv and chv from runtime_suspend()
drm/i915: Don't leak edid in intel_crt_detect_ddc()
drm/i915: Release temporary load-detect state upon switching
drm/i915: prevent crash with .disable_display parameter
drm/i915: Avoid drm_atomic_state_put(NULL) in intel_display_resume
MAINTAINERS: update new mail list for intel gvt driver
drm/i915/gvt: Fix kmem_cache_create() name
drm/i915/gvt/kvmgt: mdev ABI is available_instances, not available_instance
drm/i915/gvt: Fix relocation of shadow bb
drm/i915/gvt: Enable the shadow batch buffer
Linus Torvalds [Fri, 27 Jan 2017 01:27:00 +0000 (17:27 -0800)]
Merge tag 'acpi-4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two regressions introduced recently, one by reverting the
problematic commit and one by fixing up locking in the ACPICA core.
Specifics:
- Revert a recent change that added an ACPI video blacklist entry for
HP Pavilion dv6 as it turned to introduce backlight handling
regressions on some systems (Hans de Goede).
- Fix locking in the ACPICA core to avoid deadlocks related to table
loading that were exposed by a recent change in that area (Lv
Zheng)"
* tag 'acpi-4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI / video: Add force_native quirk for HP Pavilion dv6"
ACPICA: Tables: Fix hidden logic related to acpi_tb_install_standard_table()
Linus Torvalds [Fri, 27 Jan 2017 01:14:17 +0000 (17:14 -0800)]
Merge tag 'pm-4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two regressions introduced recently, one by reverting the
problematic commit and one by fixing up the behavior in an overlooked
case.
Specifics:
- Revert the recent change that caused suspend-to-idle to be used as
the default suspend method on systems where it is indicated to be
efficient by the ACPI tables, as that turned out to be premature
and introduced suspend regressions on some systems with missing
power management support in device drivers (Rafael Wysocki).
- Fix up the intel_pstate driver to take changes of the global limits
via sysfs correctly when the performance policy is used which has
been broken by a recent change in it (Srinivas Pandruvada)"
* tag 'pm-4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix sysfs limits enforcement for performance policy
Revert "PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag"
Lyude Paul [Thu, 12 Jan 2017 02:25:24 +0000 (21:25 -0500)]
drm/nouveau: Handle fbcon suspend/resume in seperate worker
Resuming from RPM can happen while already holding
dev->mode_config.mutex. This means we can't actually handle fbcon in
any RPM resume workers, since restoring fbcon requires grabbing
dev->mode_config.mutex again. So move the fbcon suspend/resume code into
it's own worker, and rely on that instead to avoid deadlocking.
This fixes more deadlocks for runtime suspending the GPU on the ThinkPad
W541. Reproduction recipe:
- Get a machine with both optimus and a nvidia card with connectors
attached to it
- Wait for the nvidia GPU to suspend
- Attempt to manually reprobe any of the connectors on the nvidia GPU
using sysfs
- *deadlock*
[airlied: use READ_ONCE to address Hans's comment]
Signed-off-by: Lyude <lyude@redhat.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Kilian Singer <kilian.singer@quantumtechnology.info> Cc: Lukas Wunner <lukas@wunner.de> Cc: David Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Lyude Paul [Thu, 12 Jan 2017 02:25:23 +0000 (21:25 -0500)]
drm/nouveau: Don't enabling polling twice on runtime resume
As it turns out, on cards that actually have CRTCs on them we're already
calling drm_kms_helper_poll_enable(drm_dev) from
nouveau_display_resume() before we call it in
nouveau_pmops_runtime_resume(). This leads us to accidentally trying to
enable polling twice, which results in a potential deadlock between the
RPM locks and drm_dev->mode_config.mutex if we end up trying to enable
polling the second time while output_poll_execute is running and holding
the mode_config lock. As such, make sure we only enable polling in
nouveau_pmops_runtime_resume() if we need to.
This fixes hangs observed on the ThinkPad W541
Signed-off-by: Lyude <lyude@redhat.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Kilian Singer <kilian.singer@quantumtechnology.info> Cc: Lukas Wunner <lukas@wunner.de> Cc: David Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Y.C. Chen [Thu, 26 Jan 2017 01:45:40 +0000 (09:45 +0800)]
drm/ast: Fixed system hanged if disable P2A
The original ast driver will access some BMC configuration through P2A bridge
that can be disabled since AST2300 and after.
It will cause system hanged if P2A bridge is disabled.
Here is the update to fix it.
Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 27 Jan 2017 00:33:39 +0000 (10:33 +1000)]
Merge tag 'drm-vc4-fixes-2017-01-23' of https://github.com/anholt/linux into drm-fixes
This pull request brings in a few little error checking fixes and one
slow memory leak fix.
* tag 'drm-vc4-fixes-2017-01-23' of https://github.com/anholt/linux:
drm/vc4: Return -EINVAL on the overflow checks failing.
drm/vc4: Fix an integer overflow in temporary allocation layout.
drm/vc4: fix a bounds check
drm/vc4: Fix memory leak of the CRTC state.
Dave Airlie [Fri, 27 Jan 2017 00:17:43 +0000 (10:17 +1000)]
Merge branch 'drm-fixes-4.10' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a few small fixes.
* 'drm-fixes-4.10' of git://people.freedesktop.org/~agd5f/linux:
Revert "drm/radeon: always apply pci shutdown callbacks"
drm/amdgpu: fix unload driver issue for virtual display
drm/amdgpu: check ring being ready before using
Dave Airlie [Fri, 27 Jan 2017 00:16:56 +0000 (10:16 +1000)]
Merge tag 'drm-misc-fixes-2017-01-23' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes
Single fence fix.
* tag 'drm-misc-fixes-2017-01-23' of git://anongit.freedesktop.org/git/drm-misc:
drm/fence: fix memory overwrite when setting out_fence fd
Omar Sandoval [Thu, 26 Jan 2017 01:06:39 +0000 (17:06 -0800)]
Btrfs: disable xattr operations on subvolume directories
When you snapshot a subvolume containing a subvolume, you get a
placeholder directory where the subvolume would be. These directory
inodes have ->i_ops set to btrfs_dir_ro_inode_operations. Previously,
these i_ops didn't include the xattr operation callbacks. The conversion
to xattr_handlers missed this case, leading to bogus attempts to set
xattrs on these inodes. This manifested itself as failures when running
delayed inodes.
To fix this, clear IOP_XATTR in ->i_opflags on these inodes.
Fixes: 6c6ef9f26e59 ("xattr: Stop calling {get,set,remove}xattr inode operations") Cc: Andreas Gruenbacher <agruenba@redhat.com> Reported-by: Chris Murphy <lists@colorremedies.com> Tested-by: Chris Murphy <lists@colorremedies.com> Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
Omar Sandoval [Thu, 26 Jan 2017 01:06:38 +0000 (17:06 -0800)]
Btrfs: remove old tree_root case in btrfs_read_locked_inode()
As Jeff explained in c2951f32d36c ("btrfs: remove old tree_root dirent
processing in btrfs_real_readdir()"), supporting this old format is no
longer necessary since the Btrfs magic number has been updated since we
changed to the current format. There are other places where we still
handle this old format, but since this is part of a fix that is going to
stable, I'm only removing this one for now.
Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
Kazuya Mizuguchi [Thu, 26 Jan 2017 13:29:27 +0000 (14:29 +0100)]
ravb: unmap descriptors when freeing rings
"swiotlb buffer is full" errors occur after repeated initialisation of a
device - f.e. suspend/resume or ip link set up/down. This is because memory
mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
is not released. Resolve this problem by unmapping descriptors when
freeing rings.
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
[simon: reworked] Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Trond Myklebust [Thu, 26 Jan 2017 20:50:41 +0000 (15:50 -0500)]
pNFS: Fix a reference leak in _pnfs_return_layout
IF NFS_LAYOUT_RETURN_REQUESTED is not set, then we currently exit
without freeing the list of invalidated layout segments, leading
to a reference leak.
Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 24408f5282 ("pNFS: Fix bugs in _pnfs_return_layout") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Chuck Lever [Thu, 26 Jan 2017 20:14:52 +0000 (15:14 -0500)]
nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
Lock sequence IDs are bumped in decode_lock by calling
nfs_increment_seqid(). nfs_increment_sequid() does not use the
seqid_mutating_err() function fixed in commit 059aa7348241 ("Don't
increment lock sequence ID after NFS4ERR_MOVED").
Fixes: 059aa7348241 ("Don't increment lock sequence ID after ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Xuan Qi <xuan.qi@oracle.com> Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The following patchset contains a large batch with Netfilter fixes for
your net tree, they are:
1) Two patches to solve conntrack garbage collector cpu hogging, one to
remove GC_MAX_EVICTS and another to look at the ratio (scanned entries
vs. evicted entries) to make a decision on whether to reduce or not
the scanning interval. From Florian Westphal.
2) Two patches to fix incorrect set element counting if NLM_F_EXCL is
is not set. Moreover, don't decrenent set->nelems from abort patch
if -ENFILE which leaks a spare slot in the set. This includes a
patch to deconstify the set walk callback to update set->ndeact.
3) Two fixes for the fwmark_reflect sysctl feature: Propagate mark to
reply packets both from nf_reject and local stack, from Pau Espin Pedrol.
4) Fix incorrect handling of loopback traffic in rpfilter and nf_tables
fib expression, from Liping Zhang.
5) Fix oops on stateful objects netlink dump, when no filter is specified.
Also from Liping Zhang.
6) Fix a build error if proc is not available in ipt_CLUSTERIP, related
to fix that was applied in the previous batch for net. From Arnd Bergmann.
7) Fix lack of string validation in table, chain, set and stateful
object names in nf_tables, from Liping Zhang. Moreover, restrict
maximum log prefix length to 127 bytes, otherwise explicitly bail
out.
8) Two patches to fix spelling and typos in nf_tables uapi header file
and Kconfig, patches from Alexander Alemayhu and William Breathitt Gray.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Darrick J. Wong [Thu, 26 Jan 2017 17:50:30 +0000 (09:50 -0800)]
xfs: fix bmv_count confusion w/ shared extents
In a bmapx call, bmv_count is the total size of the array, including the
zeroth element that userspace uses to supply the search key. The output
array starts at offset 1 so that we can set up the user for the next
invocation. Since we now can split an extent into multiple bmap records
due to shared/unshared status, we have to be careful that we don't
overflow the output array.
In the original patch f86f403794b ("xfs: teach get_bmapx about shared
extents and the CoW fork") I used cur_ext (the output index) to check
for overflows, albeit with an off-by-one error. Since nexleft no longer
describes the number of unfilled slots in the output, we can rip all
that out and use cur_ext for the overflow check directly.
Failure to do this causes heap corruption in bmapx callers such as
xfs_io and xfs_scrub. xfs/328 can reproduce this problem.
Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Linus Torvalds [Thu, 26 Jan 2017 17:08:49 +0000 (09:08 -0800)]
Merge tag 'pinctrl-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"A bunch of pin control fixes for v4.10 that didn't get sent off until
now, sorry for the delay.
It's only driver fixes:
- A bunch of fixes to the Intel drivers: broxton, baytrail. Bugs
related to register offsets, IRQ, debounce functionality.
- Fix a conflict amongst UART settings on the meson.
- Fix the ethernet setting on the Uniphier.
- A compilation warning squelched"
* tag 'pinctrl-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: uniphier: fix Ethernet (RMII) pin-mux setting for LD20
pinctrl: meson: fix uart_ao_b for GXBB and GXL/GXM
pinctrl: amd: avoid maybe-uninitalized warning
pinctrl: baytrail: Do not add all GPIOs to IRQ domain
pinctrl: baytrail: Rectify debounce support
pinctrl: intel: Set pin direction properly
pinctrl: broxton: Use correct PADCFGLOCK offset
Bart Van Assche [Wed, 25 Jan 2017 21:43:56 +0000 (13:43 -0800)]
Revert "sd: remove __data_len hack for WRITE SAME"
This patch reverts commit f80de881d8df and avoids that sending a
WRITE SAME command to the iSCSI initiator triggers the following:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
TARGET_CORE[iSCSI]: Expected Transfer Length: 260096 does not match SCSI CDB Length: 512 for SAM Opcode: 0x41
IP: iscsi_tcp_segment_done+0x20b/0x310 [libiscsi_tcp]
Linus Torvalds [Thu, 26 Jan 2017 16:55:33 +0000 (08:55 -0800)]
Merge tag 'drm-fixes-for-v4.10-rc6-revert-one' of git://people.freedesktop.org/~airlied/linux
Pull drm revert from Dave Airlie:
"Revert one patch missing some prereqs.
One of the connector fixes was missing some prereqs, we have an
alternate driver fix that should work that I'll send tomorrow.
Today is a holiday here so quickly smashing this out"
Daniel Vetter explains:
"I pushed a locking change to fix a nouveau rpm issue to -fixes that
needed the connector_list rework. And that's only in -next, but I
missed that. Dave has the revert in a pull, and he'll follow-up with
the hack nouveau patch for 4.10, and then we'll reapply the proper fix
again for -next and revert the hacks. A bit a mess, but should be
sorted soon"
* tag 'drm-fixes-for-v4.10-rc6-revert-one' of git://people.freedesktop.org/~airlied/linux:
Revert "drm/probe-helpers: Drop locking from poll_enable"
Parav Pandit [Thu, 19 Jan 2017 15:55:08 +0000 (09:55 -0600)]
nvmet-rdma: Fix missing dma sync to nvme data structures
This patch performs dma sync operations on nvme_command
and nvme_completion.
nvme_command is synced
(a) on receiving of the recv queue completion for cpu access.
(b) before posting recv wqe back to rdma adapter for device access.
nvme_completion is synced
(a) on receiving of the recv queue completion of associated
nvme_command for cpu access.
(b) before posting send wqe to rdma adapter for device access.
This patch is generated for git://git.infradead.org/nvme-fabrics.git
Branch: nvmf-4.10
Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Sagi Grimberg [Sun, 1 Jan 2017 11:18:26 +0000 (13:18 +0200)]
nvmet: Call fatal_error from keep-alive timout expiration
We only need to call delete_ctrl once, so given that both
keep-alive timeout and any other fatal error can trigger it,
just make sure we only call delete_ctrl once.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
Sagi Grimberg [Sun, 27 Nov 2016 20:29:17 +0000 (22:29 +0200)]
nvmet: delete controllers deletion upon subsystem release
No reason for them to be kept around if we are
deleting the subsystem, so instead of passively
wait for the host to disconnect, actively delete
the controllers.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 26 Jan 2017 04:24:57 +0000 (20:24 -0800)]
xfs: clear _XBF_PAGES from buffers when readahead page
If we try to allocate memory pages to back an xfs_buf that we're trying
to read, it's possible that we'll be so short on memory that the page
allocation fails. For a blocking read we'll just wait, but for
readahead we simply dump all the pages we've collected so far.
Unfortunately, after dumping the pages we neglect to clear the
_XBF_PAGES state, which means that the subsequent call to xfs_buf_free
thinks that b_pages still points to pages we own. It then double-frees
the b_pages pages.
This results in screaming about negative page refcounts from the memory
manager, which xfs oughtn't be triggering. To reproduce this case,
mount a filesystem where the size of the inodes far outweighs the
availalble memory (a ~500M inode filesystem on a VM with 300MB memory
did the trick here) and run bulkstat in parallel with other memory
eating processes to put a huge load on the system. The "check summary"
phase of xfs_scrub also works for this purpose.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Jakub Kicinski [Wed, 25 Jan 2017 22:56:36 +0000 (14:56 -0800)]
virtio_net: reject XDP programs using header adjustment
commit 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog")
added a new XDP helper to prepend and remove data from a frame.
Make virtio_net reject programs making use of this helper until
proper support is added.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Thu, 26 Jan 2017 02:22:48 +0000 (18:22 -0800)]
virtio_net: use dev_kfree_skb for small buffer XDP receive
In the small buffer case during driver unload we currently use
put_page instead of dev_kfree_skb. Resolve this by adding a check
for virtnet mode when checking XDP queue type. Also name the
function so that the code reads correctly to match the additional
check.
Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 26 Jan 2017 03:47:31 +0000 (22:47 -0500)]
Merge branch 'r8152-napi-fixes'
Hayes Wang says:
====================
r8152: fix scheduling napi
v3:
simply the argument for patch #3. Replace &tp->napi with napi.
v2:
Add smp_mb__after_atomic() for patch #1.
v1:
Scheduling the napi during the following periods would let it be ignored.
And the events wouldn't be handled until next napi_schedule() is called.
1. after napi_disable and before napi_enable().
2. after all actions of napi function is completed and before calling
napi_complete().
If no next napi_schedule() is called, tx or rx would stop working.
In order to avoid these situations, the followings solutions are applied.
1. prevent start_xmit() from calling napi_schedule() during runtime suspend
or after napi_disable().
2. re-schedule the napi for tx if it is necessary.
3. check if any rx is finished or not after napi_enable().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 26 Jan 2017 01:38:33 +0000 (09:38 +0800)]
r8152: re-schedule napi for tx
Re-schedule napi after napi_complete() for tx, if it is necessay.
In r8152_poll(), if the tx is completed after tx_bottom() and before
napi_complete(), the scheduling of napi would be lost. Then, no
one handles the next tx until the next napi_schedule() is called.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Pan [Tue, 17 Jan 2017 10:20:55 +0000 (18:20 +0800)]
i2c: imx-lpi2c: add VLLS mode support
When system enters VLLS mode, module power is turned off. As a result,
all registers are reset to HW default value. After exiting VLLS mode,
registers are still in default mode. As a result, the pinctrl settings
are incorrect, which will affect the module function.
The patch recovers the pinctrl setting when exit VLLS mode.
Signed-off-by: Gao Pan <pandy.gao@nxp.com> Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>
[wsa: added missing include] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Mike Looijmans [Mon, 16 Jan 2017 14:49:38 +0000 (15:49 +0100)]
i2c: i2c-cadence: Initialize configuration before probing devices
The cadence I2C driver calls cdns_i2c_writereg(..) to setup a workaround
in the controller, but did so after calling i2c_add_adapter() which starts
probing devices on the bus. Change the order so that the configuration is
completely finished before using the adapter.
Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Florian Fainelli [Wed, 25 Jan 2017 17:10:41 +0000 (09:10 -0800)]
net: dsa: Bring back device detaching in dsa_slave_suspend()
Commit 448b4482c671 ("net: dsa: Add lockdep class to tx queues to avoid
lockdep splat") removed the netif_device_detach() call done in
dsa_slave_suspend() which is necessary, and paired with a corresponding
netif_device_attach(), bring it back.
Fixes: 448b4482c671 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 25 Jan 2017 19:40:25 +0000 (14:40 -0500)]
Merge branch 'phy-truncated-led-names'
Geert Uytterhoeven says:
====================
net: phy: leds: Fix truncated LED trigger names and crashes
I started seeing crashes during s2ram and poweroff on all my ARM boards,
like:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
...
[<c04116d4>] (__list_del_entry_valid) from [<c05e8948>] (led_trigger_unregister+0x34/0xcc)
[<c05e8948>] (led_trigger_unregister) from [<c05336c4>] (phy_led_triggers_unregister+0x28/0x34)
[<c05336c4>] (phy_led_triggers_unregister) from [<c0531d44>] (phy_detach+0x30/0x74)
[<c0531d44>] (phy_detach) from [<c0538bdc>] (sh_eth_close+0x64/0x9c)
[<c0538bdc>] (sh_eth_close) from [<c04d4ce0>] (dpm_run_callback+0x48/0xc8)
or:
list_del corruption. prev->next should be dede6540, but was 2e323931
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:52!
...
[<c02f6d70>] (__list_del_entry_valid) from [<c0425168>] (led_trigger_unregister+0x34/0xcc)
[<c0425168>] (led_trigger_unregister) from [<c03a05a0>] (phy_led_triggers_unregister+0x28/0x34)
[<c03a05a0>] (phy_led_triggers_unregister) from [<c039ec04>] (phy_detach+0x30/0x74)
[<c039ec04>] (phy_detach) from [<c03a4fc0>] (sh_eth_close+0x6c/0xa4)
[<c03a4fc0>] (sh_eth_close) from [<c0483234>] (__dev_close_many+0xac/0xd0)
As the only clue was a kernel message like
sh-eth ee700000.ethernet eth0: No phy led trigger registered for speed(100)
I had to bisected this, leading to commit 4567d686f5c6d955 ("phy:
increase size of MII_BUS_ID_SIZE and bus_id"). Reverting that commit
fixed the issue.
More investigation revealed the crashes are due to the combination of
two things:
- Truncated LED trigger names, leading to duplicate names, and
registration failures,
- Bad error handling in case of registration failures.
Both are fixed by this patch series.
Changes compared to v1:
- Add Reviewed-by,
- New patch "net: phy: leds: Break dependency of phy.h on
phy_led_triggers.h",
- Drop moving the include of <linux/phy_led_triggers.h>, as
<linux/phy.h> no longer includes it,
- #include <linux/phy.h> from <linux/phy_led_triggers.h>.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4567d686f5c6d955 ("phy: increase size of MII_BUS_ID_SIZE and
bus_id") increased the size of MII bus IDs, but forgot to update the
private definition in <linux/phy_led_triggers.h>.
This may cause:
1. Truncation of LED trigger names,
2. Duplicate LED trigger names,
3. Failures registering LED triggers,
4. Crashes due to bad error handling in the LED trigger failure path.
To fix this, and prevent the definitions going out of sync again in the
future, let the PHY LED trigger code use the existing MII_BUS_ID_SIZE
definition.
Example:
- Before I had triggers "ee700000.etherne:01:100Mbps" and
"ee700000.etherne:01:10Mbps",
- After the increase of MII_BUS_ID_SIZE, both became
"ee700000.ethernet-ffffffff:01:" => FAIL,
- Now, the triggers are "ee700000.ethernet-ffffffff:01:100Mbps" and
"ee700000.ethernet-ffffffff:01:10Mbps", which are unique again.
Fixes: 4567d686f5c6d955 ("phy: increase size of MII_BUS_ID_SIZE and bus_id") Fixes: 2e0bc452f4721520 ("net: phy: leds: add support for led triggers on phy link state change") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
<linux/phy.h> includes <linux/phy_led_triggers.h>, which is not really
needed. Drop the include from <linux/phy.h>, and add it to all users
that didn't include it explicitly.
Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
phy_attach_direct() ignores errors returned by
phy_led_triggers_register(). I think that's OK, as LED triggers can be
considered a non-critical feature.
However, this causes problems later:
- phy_led_trigger_change_speed() will access the array
phy_device.phy_led_triggers, which has been freed in the error path
of phy_led_triggers_register(), which may lead to a crash.
- phy_led_triggers_unregister() will access the same array, leading to
crashes during s2ram or poweroff, like:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
...
[<c04116d4>] (__list_del_entry_valid) from [<c05e8948>] (led_trigger_unregister+0x34/0xcc)
[<c05e8948>] (led_trigger_unregister) from [<c05336c4>] (phy_led_triggers_unregister+0x28/0x34)
[<c05336c4>] (phy_led_triggers_unregister) from [<c0531d44>] (phy_detach+0x30/0x74)
[<c0531d44>] (phy_detach) from [<c0538bdc>] (sh_eth_close+0x64/0x9c)
[<c0538bdc>] (sh_eth_close) from [<c04d4ce0>] (dpm_run_callback+0x48/0xc8)
or:
list_del corruption. prev->next should be dede6540, but was 2e323931
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:52!
...
[<c02f6d70>] (__list_del_entry_valid) from [<c0425168>] (led_trigger_unregister+0x34/0xcc)
[<c0425168>] (led_trigger_unregister) from [<c03a05a0>] (phy_led_triggers_unregister+0x28/0x34)
[<c03a05a0>] (phy_led_triggers_unregister) from [<c039ec04>] (phy_detach+0x30/0x74)
[<c039ec04>] (phy_detach) from [<c03a4fc0>] (sh_eth_close+0x6c/0xa4)
[<c03a4fc0>] (sh_eth_close) from [<c0483234>] (__dev_close_many+0xac/0xd0)
To fix this, clear phy_device.phy_num_led_triggers in the error path of
phy_led_triggers_register() fails.
Note that the "No phy led trigger registered for speed" message will
still be printed on link speed changes, which is a good cue that
something went wrong with the LED triggers.
Fixes: 2e0bc452f4721520 ("net: phy: leds: add support for led triggers on phy link state change") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Crispin [Wed, 25 Jan 2017 08:20:55 +0000 (09:20 +0100)]
net-next: ethernet: mediatek: change the compatible string
When the binding was defined, I was not aware that mt2701 was an earlier
version of the SoC. For sake of consistency, the ethernet driver should
use mt2701 inside the compat string as this is the earliest SoC with the
ethernet core.
The ethernet driver is currently of no real use until we finish and
upstream the DSA driver. There are no users of this binding yet. It should
be safe to fix this now before it is too late and we need to provide
backward compatibility for the mt7623-eth compat string.
Reported-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
John Crispin [Wed, 25 Jan 2017 08:20:54 +0000 (09:20 +0100)]
Documentation: devicetree: change the mediatek ethernet compatible string
When the binding was defined, I was not aware that mt2701 was an earlier
version of the SoC. For sake of consistency, the ethernet driver should
use mt2701 inside the compat string as this is the earliest SoC with the
ethernet core.
The ethernet driver is currently of no real use until we finish and
upstream the DSA driver. There are no users of this binding yet. It should
be safe to fix this now before it is too late and we need to provide
backward compatibility for the mt7623-eth compat string.
Reported-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: John Crispin <john@phrozen.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 25 Jan 2017 18:27:14 +0000 (13:27 -0500)]
Merge branch 'bnxt_en-rtnl-fixes'
Michael Chan says:
====================
bnxt_en: Fix RTNL lock usage in bnxt_sp_task().
There are 2 function calls from bnxt_sp_task() that have buggy RTNL
usage. These 2 functions take RTNL lock under some conditions, but
some callers (such as open, ethtool) have already taken RTNL. These
3 patches fix the issue by making it clear that callers must take
RTNL. If the caller is bnxt_sp_task() which does not automatically
take RTNL, we add a common scheme for bnxt_sp_task() to call these
functions properly under RTNL.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 25 Jan 2017 07:55:09 +0000 (02:55 -0500)]
bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
bnxt_get_port_module_status() calls bnxt_update_link() which expects
RTNL to be held. In bnxt_sp_task() that does not hold RTNL, we need to
call it with a prior call to bnxt_rtnl_lock_sp() and the call needs to
be moved to the end of bnxt_sp_task().
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>