Joe Perches [Tue, 11 Oct 2016 20:51:47 +0000 (13:51 -0700)]
checkpatch: look for symbolic permissions and suggest octal instead
S_<FOO> uses should be avoided where octal is more intelligible.
Linus didst say:
: It's *much* easier to parse and understand the octal numbers, while the
: symbolic macro names are just random line noise and hard as hell to
: understand. You really have to think about it.
:
: So we should rather go the other way: convert existing bad symbolic
: permission bit macro use to just use the octal numbers.
:
: The symbolic names are good for the *other* bits (ie sticky bit, and the
: inode mode _type_ numbers etc), but for the permission bits, the symbolic
: names are just insane crap. Nobody sane should ever use them. Not in the
: kernel, not in user space.
(http://lkml.kernel.org/r/CA+55aFw5v23T-zvDZp-MmD_EYxF8WbafwwB59934FV7g21uMGQ@mail.gmail.com)
Noam Camus [Tue, 11 Oct 2016 20:51:35 +0000 (13:51 -0700)]
lib/bitmap.c: enhance bitmap syntax
Today there are platforms with many CPUs (up to 4K). Trying to boot only
part of the CPUs may result in too long string.
For example lets take NPS platform that is part of arch/arc. This
platform have SMP system with 256 cores each with 16 HW threads (SMT
machine) where HW thread appears as CPU to the kernel. In this example
there is total of 4K CPUs. When one tries to boot only part of the HW
threads from each core the string representing the map may be long... For
example if for sake of performance we decided to boot only first half of
HW threads of each core the map will look like:
0-7,16-23,32-39,...,4080-4087
This patch introduce new syntax to accommodate with such use case. I
added an optional postfix to a range of CPUs which will choose according
to given modulo the desired range of reminders i.e.:
<cpus range>:sed_size/group_size
For example, above map can be described in new syntax like this:
0-4095:8/16
Note that this patch is backward compatible with current syntax.
[akpm@linux-foundation.org: rework documentation] Link: http://lkml.kernel.org/r/1473579629-4283-1-git-send-email-noamca@mellanox.com Signed-off-by: Noam Camus <noamca@mellanox.com> Cc: David Decotigny <decot@googlers.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Rutland [Tue, 11 Oct 2016 20:51:27 +0000 (13:51 -0700)]
lib: harden strncpy_from_user
The strncpy_from_user() accessor is effectively a copy_from_user()
specialised to copy strings, terminating early at a NUL byte if possible.
In other respects it is identical, and can be used to copy an arbitrarily
large buffer from userspace into the kernel. Conceptually, it exposes a
similar attack surface.
As with copy_from_user(), we check the destination range when the kernel
is built with KASAN, but unlike copy_from_user() we do not check the
destination buffer when using HARDENED_USERCOPY. As strncpy_from_user()
calls get_user() in a loop, we must call check_object_size() explicitly.
This patch adds this instrumentation to strncpy_from_user(), per the same
rationale as with the regular copy_from_user(). In the absence of
hardened usercopy this will have no impact as the instrumentation expands
to an empty static inline function.
Ross Zwisler [Tue, 11 Oct 2016 20:51:21 +0000 (13:51 -0700)]
radix-tree tests: add iteration test
There are four cases I can see where we could end up with a NULL 'slot' in
radix_tree_next_slot(). This unit test exercises all four of them, making
sure that if in the future we have an unsafe path through
radix_tree_next_slot(), we'll catch it.
Here are details on the four cases:
1) radix_tree_iter_retry() via a non-tagged iteration like
radix_tree_for_each_slot(). In this case we currently aren't seeing a bug
because radix_tree_iter_retry() sets
iter->next_index = iter->index;
which means that in in the else case in radix_tree_next_slot(), 'count' is
zero, so we skip over the while() loop and effectively just return NULL
without ever dereferencing 'slot'.
2) radix_tree_iter_retry() via tagged iteration like
radix_tree_for_each_tagged(). This case was giving us NULL pointer
dereferences in testing, and was fixed with this commit:
commit 3cb9185c6730 ("radix-tree: fix radix_tree_iter_retry() for tagged
iterators.")
This fix doesn't explicitly check for 'slot' being NULL, though, it works
around the NULL pointer dereference by instead zeroing iter->tags in
radix_tree_iter_retry(), which makes us bail out of the if() case in
radix_tree_next_slot() before we dereference 'slot'.
3) radix_tree_iter_next() via via a non-tagged iteration like
radix_tree_for_each_slot(). This currently happens in shmem_tag_pins()
and shmem_partial_swap_usage().
As with non-tagged iteration, 'count' in the else case of
radix_tree_next_slot() is zero, so we skip over the while() loop and
effectively just return NULL without ever dereferencing 'slot'.
4) radix_tree_iter_next() via tagged iteration like
radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins().
radix_tree_iter_next() zeros out iter->tags, so we end up exiting
radix_tree_next_slot() here:
if (flags & RADIX_TREE_ITER_TAGGED) {
void *canon = slot;
iter->tags >>= 1;
if (unlikely(!iter->tags))
return NULL;
Link: http://lkml.kernel.org/r/20160815194237.25967-3-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Tue, 11 Oct 2016 20:51:18 +0000 (13:51 -0700)]
radix-tree: 'slot' can be NULL in radix_tree_next_slot()
There are four cases I can see where we could end up with a NULL 'slot' in
radix_tree_next_slot(). Yet radix_tree_next_slot() never actually checks
whether 'slot' is NULL. It just happens that for the cases where 'slot'
is NULL, some other combination of factors prevents us from dereferencing
it.
It would be very easy for someone to unwittingly change one of these
factors without realizing that we are implicitly depending on it to save
us from a NULL pointer dereference.
Add a comment documenting the things that allow 'slot' to be safely passed
as NULL to radix_tree_next_slot().
Here are details on the four cases:
1) radix_tree_iter_retry() via a non-tagged iteration like
radix_tree_for_each_slot(). In this case we currently aren't seeing a bug
because radix_tree_iter_retry() sets
iter->next_index = iter->index;
which means that in in the else case in radix_tree_next_slot(), 'count' is
zero, so we skip over the while() loop and effectively just return NULL
without ever dereferencing 'slot'.
2) radix_tree_iter_retry() via tagged iteration like
radix_tree_for_each_tagged(). This case was giving us NULL pointer
dereferences in testing, and was fixed with this commit:
commit 3cb9185c6730 ("radix-tree: fix radix_tree_iter_retry() for tagged
iterators.")
This fix doesn't explicitly check for 'slot' being NULL, though, it works
around the NULL pointer dereference by instead zeroing iter->tags in
radix_tree_iter_retry(), which makes us bail out of the if() case in
radix_tree_next_slot() before we dereference 'slot'.
3) radix_tree_iter_next() via via a non-tagged iteration like
radix_tree_for_each_slot(). This currently happens in shmem_tag_pins()
and shmem_partial_swap_usage().
As with non-tagged iteration, 'count' in the else case of
radix_tree_next_slot() is zero, so we skip over the while() loop and
effectively just return NULL without ever dereferencing 'slot'.
4) radix_tree_iter_next() via tagged iteration like
radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins().
radix_tree_iter_next() zeros out iter->tags, so we end up exiting
radix_tree_next_slot() here:
if (flags & RADIX_TREE_ITER_TAGGED) {
void *canon = slot;
iter->tags >>= 1;
if (unlikely(!iter->tags))
return NULL;
Link: http://lkml.kernel.org/r/20160815194237.25967-2-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Tue, 11 Oct 2016 20:51:14 +0000 (13:51 -0700)]
fs/select: add vmalloc fallback for select(2)
The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
with the number of fds passed. We had a customer report page allocation
failures of order-4 for this allocation. This is a costly order, so it might
easily fail, as the VM expects such allocation to have a lower-order fallback.
Such trivial fallback is vmalloc(), as the memory doesn't have to be physically
contiguous and the allocation is temporary for the duration of the syscall
only. There were some concerns, whether this would have negative impact on the
system by exposing vmalloc() to userspace. Although an excessive use of vmalloc
can cause some system wide performance issues - TLB flushes etc. - a large
order allocation is not for free either and an excessive reclaim/compaction can
have a similar effect. Also note that the size is effectively limited by
RLIMIT_NOFILE which defaults to 1024 on the systems I checked. That means the
bitmaps will fit well within single page and thus the vmalloc() fallback could
be only excercised for processes where root allows a higher limit.
Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
it doesn't need this kind of fallback.
[eric.dumazet@gmail.com: fix failure path logic]
[akpm@linux-foundation.org: use proper type for size] Link: http://lkml.kernel.org/r/20160927084536.5923-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Darrick J. Wong [Tue, 11 Oct 2016 20:51:11 +0000 (13:51 -0700)]
block: implement (some of) fallocate for block devices
After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted
for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is
set. A length that goes past the end of the device will be clamped to the
device size if KEEP_SIZE is set; or will return -EINVAL if not. Both
start and length must be aligned to the device's logical block size.
Since the semantics of fallocate are fairly well established already, wire
up the two pieces. The other fallocate variants (collapse range, insert
range, and allocate blocks) are not supported.
Link: http://lkml.kernel.org/r/147518379992.22791.8849838163218235007.stgit@birch.djwong.org Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> # tweaked header Cc: Brian Foster <bfoster@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Darrick J. Wong [Tue, 11 Oct 2016 20:51:08 +0000 (13:51 -0700)]
block: require write_same and discard requests align to logical block size
Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size. Failure to do this causes other errors in other parts
of the block layer or the SCSI layer because disks don't support partial
logical block writes.
Link: http://lkml.kernel.org/r/147518379026.22791.4437508871355153928.stgit@birch.djwong.org Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mike Snitzer <snitzer@redhat.com> # tweaked header Cc: Brian Foster <bfoster@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Darrick J. Wong [Tue, 11 Oct 2016 20:51:05 +0000 (13:51 -0700)]
block: invalidate the page cache when issuing BLKZEROOUT
Patch series "fallocate for block devices", v11.
This is a patchset to fix page cache coherency with BLKZEROOUT and
implement fallocate for block devices.
The first patch is a fix to the existing BLKZEROOUT ioctl to invalidate
the page cache if the zeroing command to the underlying device succeeds.
Without this patch we still have the pagecache coherence bug that's been
in the kernel forever.
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical block
size. Previously, we only checked that the start/len parameters were
512-byte aligned, which caused kernel BUG_ONs for unaligned IOs to 4k-LBA
devices.
The third patch creates an fallocate handler for block devices, wires up
the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices. It also allows the
combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing discard.
Test cases for the new block device fallocate are now in xfstests as
generic/349-351.
This patch (of 3):
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
Link: http://lkml.kernel.org/r/147518378313.22791.16649519283678515021.stgit@birch.djwong.org Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Brian Foster <bfoster@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2) RXRPC/AFS bug fixes from David Howells (oops on call to serviceless
endpoints, build warnings, missing notifications, etc.) From David
Howells.
3) Kernel log message missing newlines, from Colin Ian King.
4) Don't enter direct reclaim in netlink dumps, the idea is to use a
high order allocation first and fallback quickly to a 0-order
allocation if such a high-order one cannot be done cheaply and
without reclaim. From Eric Dumazet.
5) Fix firmware download errors in btusb bluetooth driver, from Ethan
Hsieh.
6) Missing Kconfig deps for QCOM_EMAC, from Geert Uytterhoeven.
7) Fix MDIO_XGENE dup Kconfig entry. From Laura Abbott.
8) Constrain ipv6 rtr_solicits sysctl values properly, from Maciej
Żenczykowski.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
netfilter: Fix slab corruption.
be2net: Enable VF link state setting for BE3
be2net: Fix TX stats for TSO packets
be2net: Update Copyright string in be_hw.h
be2net: NCSI FW section should be properly updated with ethtool for BE3
be2net: Provide an alternate way to read pf_num for BEx chips
wan/fsl_ucc_hdlc: Fix size used in dma_free_coherent()
net: macb: NULL out phydev after removing mdio bus
xen-netback: make sure that hashes are not send to unaware frontends
Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion
MAINTAINERS: add myself as a maintainer of xen-netback
ipv6 addrconf: disallow rtr_solicits < -1
Bluetooth: btusb: Fix atheros firmware download error
drivers: net: phy: Correct duplicate MDIO_XGENE entry
ethernet: qualcomm: QCOM_EMAC should depend on HAS_DMA and HAS_IOMEM
net: ethernet: mediatek: remove hwlro property in the device tree
net: ethernet: mediatek: get hw lro capability by the chip id instead of by the dtsi
net: ethernet: mediatek: get the chip id by ETHDMASYS registers
net: bgmac: Fix errant feature flag check
netlink: do not enter direct reclaim from netlink_dump()
...
Linus Torvalds [Tue, 11 Oct 2016 03:16:43 +0000 (20:16 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
Linus Torvalds [Tue, 11 Oct 2016 00:39:51 +0000 (17:39 -0700)]
Merge tag 'for-linus-20161008' of git://git.infradead.org/linux-mtd
Pull MTD updates from Brian Norris:
"I've not been very active this cycle, so these are mostly from Boris,
for the NAND flash subsystem.
NAND:
- Add the infrastructure to automate NAND timings configuration
- Provide a generic DT property to maximize ECC strength
- Some refactoring in the core bad block table handling, to help with
improving some of the logic in error cases.
- Minor cleanups and fixes
MTD:
- Add APIs for handling page pairing; this is necessary for reliably
supporting MLC and TLC NAND flash, where paired-page disturbance
affects reliability. Upper layers (e.g., UBI) should make use of
these in the near future"
* tag 'for-linus-20161008' of git://git.infradead.org/linux-mtd: (35 commits)
mtd: nand: fix trivial spelling error
mtdpart: Propagate _get/put_device()
mtd: nand: Provide nand_cleanup() function to free NAND related resources
mtd: Kill the OF_MTD Kconfig option
mtd: nand: mxc: Test CONFIG_OF instead of CONFIG_OF_MTD
mtd: nand: Fix nand_command_lp() for 8bits opcodes
mtd: nand: sunxi: Support ECC maximization
mtd: nand: Support maximizing ECC when using software BCH
mtd: nand: Add an option to maximize the ECC strength
mtd: nand: mxc: Add timing setup for v2 controllers
mtd: nand: mxc: implement onfi get/set features
mtd: nand: sunxi: switch from manual to automated timing config
mtd: nand: automate NAND timings selection
mtd: nand: Expose data interface for ONFI mode 0
mtd: nand: Add function to convert ONFI mode to data_interface
mtd: nand: convert ONFI mode into data interface
mtd: nand: Introduce nand_data_interface
mtd: nand: Create a NAND reset function
mtd: nand: remove unnecessary 'extern' from function declarations
MAINTAINERS: Add maintainer entry for Ingenic JZ4780 NAND driver
...
Linus Torvalds [Tue, 11 Oct 2016 00:11:50 +0000 (17:11 -0700)]
Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr updates from Al Viro:
"xattr stuff from Andreas
This completes the switch to xattr_handler ->get()/->set() from
->getxattr/->setxattr/->removexattr"
* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Remove {get,set,remove}xattr inode operations
xattr: Stop calling {get,set,remove}xattr inode operations
vfs: Check for the IOP_XATTR flag in listxattr
xattr: Add __vfs_{get,set,remove}xattr helpers
libfs: Use IOP_XATTR flag for empty directory handling
vfs: Use IOP_XATTR flag for bad-inode handling
vfs: Add IOP_XATTR inode operations flag
vfs: Move xattr_resolve_name to the front of fs/xattr.c
ecryptfs: Switch to generic xattr handlers
sockfs: Get rid of getxattr iop
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
kernfs: Switch to generic xattr handlers
hfs: Switch to generic xattr handlers
jffs2: Remove jffs2_{get,set,remove}xattr macros
xattr: Remove unnecessary NULL attribute name check
Linus Torvalds [Mon, 10 Oct 2016 21:04:16 +0000 (14:04 -0700)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.9:
API:
- The crypto engine code now supports hashes.
Algorithms:
- Allow keys >= 2048 bits in FIPS mode for RSA.
Drivers:
- Memory overwrite fix for vmx ghash.
- Add support for building ARM sha1-neon in Thumb2 mode.
- Reenable ARM ghash-ce code by adding import/export.
- Reenable img-hash by adding import/export.
- Add support for multiple cores in omap-aes.
- Add little-endian support for sha1-powerpc.
- Add Cavium HWRNG driver for ThunderX SoC"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (137 commits)
crypto: caam - treat SGT address pointer as u64
crypto: ccp - Make syslog errors human-readable
crypto: ccp - clean up data structure
crypto: vmx - Ensure ghash-generic is enabled
crypto: testmgr - add guard to dst buffer for ahash_export
crypto: caam - Unmap region obtained by of_iomap
crypto: sha1-powerpc - little-endian support
crypto: gcm - Fix IV buffer size in crypto_gcm_setkey
crypto: vmx - Fix memory corruption caused by p8_ghash
crypto: ghash-generic - move common definitions to a new header file
crypto: caam - fix sg dump
hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
crypto: omap-sham - shrink the internal buffer size
crypto: omap-sham - add support for export/import
crypto: omap-sham - convert driver logic to use sgs for data xmit
crypto: omap-sham - change the DMA threshold value to a define
crypto: omap-sham - add support functions for sg based data handling
crypto: omap-sham - rename sgl to sgl_tmp for deprecation
crypto: omap-sham - align algorithms on word offset
crypto: omap-sham - add context export/import stubs
...
Linus Torvalds [Mon, 10 Oct 2016 20:58:06 +0000 (13:58 -0700)]
Merge tag 'dlm-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm fix from David Teigland:
"This includes a bug fix for a bad memory access during workqueue
cleanup, which can happen while shutting down the dlm networking
layer"
* tag 'dlm-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: free workqueues after the connections
Linus Torvalds [Mon, 10 Oct 2016 20:52:05 +0000 (13:52 -0700)]
Merge tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client
Pull Ceph updates from Ilya Dryomov:
"The big ticket item here is support for rbd exclusive-lock feature,
with maintenance operations offloaded to userspace (Douglas Fuller,
Mike Christie and myself). Another block device bullet is a series
fixing up layering error paths (myself).
On the filesystem side, we've got patches that improve our handling of
buffered vs dio write races (Neil Brown) and a few assorted fixes from
Zheng. Also included a couple of random cleanups and a minor CRUSH
update"
* tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client: (39 commits)
crush: remove redundant local variable
crush: don't normalize input of crush_ln iteratively
libceph: ceph_build_auth() doesn't need ceph_auth_build_hello()
libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello()
ceph: fix description for rsize and rasize mount options
rbd: use kmalloc_array() in rbd_header_from_disk()
ceph: use list_move instead of list_del/list_add
ceph: handle CEPH_SESSION_REJECT message
ceph: avoid accessing / when mounting a subpath
ceph: fix mandatory flock check
ceph: remove warning when ceph_releasepage() is called on dirty page
ceph: ignore error from invalidate_inode_pages2_range() in direct write
ceph: fix error handling of start_read()
rbd: add rbd_obj_request_error() helper
rbd: img_data requests don't own their page array
rbd: don't call rbd_osd_req_format_read() for !img_data requests
rbd: rework rbd_img_obj_exists_submit() error paths
rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback()
rbd: move bumping img_request refcount into rbd_obj_request_submit()
rbd: mark the original request as done if stat request fails
...
Linus Torvalds [Mon, 10 Oct 2016 20:38:49 +0000 (13:38 -0700)]
Merge branch 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull splice fixups from Al Viro:
"A couple of fixups for interaction of pipe-backed iov_iter with
O_DIRECT reads + constification of a couple of primitives in uio.h
missed by previous rounds.
Kudos to davej - his fuzzing has caught those bugs"
* 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
[btrfs] fix check_direct_IO() for non-iovec iterators
constify iov_iter_count() and iter_is_iovec()
fix ITER_PIPE interaction with direct_IO
Linus Torvalds [Mon, 10 Oct 2016 20:04:49 +0000 (13:04 -0700)]
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Assorted misc bits and pieces.
There are several single-topic branches left after this (rename2
series from Miklos, current_time series from Deepa Dinamani, xattr
series from Andreas, uaccess stuff from from me) and I'd prefer to
send those separately"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
proc: switch auxv to use of __mem_open()
hpfs: support FIEMAP
cifs: get rid of unused arguments of CIFSSMBWrite()
posix_acl: uapi header split
posix_acl: xattr representation cleanups
fs/aio.c: eliminate redundant loads in put_aio_ring_file
fs/internal.h: add const to ns_dentry_operations declaration
compat: remove compat_printk()
fs/buffer.c: make __getblk_slow() static
proc: unsigned file descriptors
fs/file: more unsigned file descriptors
fs: compat: remove redundant check of nr_segs
cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
cifs: don't use memcpy() to copy struct iov_iter
get rid of separate multipage fault-in primitives
fs: Avoid premature clearing of capabilities
fs: Give dentry to inode_change_ok() instead of inode
fuse: Propagate dentry down to inode_change_ok()
ceph: Propagate dentry down to inode_change_ok()
xfs: Propagate dentry down to inode_change_ok()
...
Linus Torvalds [Mon, 10 Oct 2016 18:34:28 +0000 (11:34 -0700)]
Merge branch 'pcmcia' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM pcmcia updates from Russell King:
"These updates lay the foundations for more generic soc_common PCMCIA
support, which will result in several of the board specific drivers
being elimated.
As the dependencies for this are complex, the preliminary work is
being submitted now, with the remainder scheduled for the next merge
window"
* 'pcmcia' of git://git.armlinux.org.uk/~rmk/linux-arm:
pcmcia: soc_common: add driver-data pointer
pcmcia: soc_common: add support for voltage sense GPIOs
pcmcia: soc_common: constify pcmcia_low_level ops pointer
pcmcia: soc_common: switch to a per-socket cpufreq notifier
pcmcia: soc_common: add support for Vcc and Vpp regulators
pcmcia: soc_common: add CF socket state helper
pcmcia: soc_common: restore previous socket state on error
pcmcia: soc_common: add support for reset and bus enable GPIOs
pcmcia: soc_common: request legacy detect GPIO with active low
pcmcia: soc_common: ignore invalid interrupts
pcmcia: soc_common: switch to using gpio_descs
pcmcia: soc_common: use devm_gpio_request_one()
Linus Torvalds [Mon, 10 Oct 2016 18:01:51 +0000 (11:01 -0700)]
Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull protection keys syscall interface from Thomas Gleixner:
"This is the final step of Protection Keys support which adds the
syscalls so user space can actually allocate keys and protect memory
areas with them. Details and usage examples can be found in the
documentation.
The mm side of this has been acked by Mel"
* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pkeys: Update documentation
x86/mm/pkeys: Do not skip PKRU register if debug registers are not used
x86/pkeys: Fix pkeys build breakage for some non-x86 arches
x86/pkeys: Add self-tests
x86/pkeys: Allow configuration of init_pkru
x86/pkeys: Default to a restrictive init PKRU
pkeys: Add details of system call use to Documentation/
generic syscalls: Wire up memory protection keys syscalls
x86: Wire up protection keys system calls
x86/pkeys: Allocation/free syscalls
x86/pkeys: Make mprotect_key() mask off additional vm_flags
mm: Implement new pkey_mprotect() system call
x86/pkeys: Add fault handling for PF_PK page fault bit
Linus Torvalds [Mon, 10 Oct 2016 17:59:07 +0000 (10:59 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 updates from Thomas Gleixner:
"A pile of regression fixes and updates:
- address the fallout of the patches which made the cpuid - nodeid
relation permanent: Handling of invalid APIC ids and preventing
pointless warning messages.
- force eager FPU when protection keys are enabled. Protection keys
are not generating FPU exceptions so they cannot work with the lazy
FPU mechanism.
- prevent force migration of interrupts which are not part of the CPU
vector domain.
- handle the fact that APIC ids are not updated in the ACPI/MADT
tables on physical CPU hotplug
- remove bash-isms from syscall table generator script
- use the hypervisor supplied APIC frequency when running on VMware"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pkeys: Make protection keys an "eager" feature
x86/apic: Prevent pointless warning messages
x86/acpi: Prevent LAPIC id 0xff from being accounted
arch/x86: Handle non enumerated CPU after physical hotplug
x86/unwind: Fix oprofile module link error
x86/vmware: Skip lapic calibration on VMware
x86/syscalls: Remove bash-isms in syscall table generator
x86/irq: Prevent force migration of irqs which are not in the vector domain
Al Viro [Mon, 10 Oct 2016 17:39:05 +0000 (13:39 -0400)]
[btrfs] fix check_direct_IO() for non-iovec iterators
looking for duplicate ->iov_base makes sense only for
iovec-backed iterators; for kvec-backed ones it's pointless,
for bvec-backed ones it's pointless and broken on 32bit (we
walk through an array of struct bio_vec accessing them as if
they were struct iovec; works by accident on 64bit, but on
32bit it'll blow up) and for pipe-backed ones it's pointless
and ends up oopsing.
Al Viro [Mon, 10 Oct 2016 17:26:27 +0000 (13:26 -0400)]
fix ITER_PIPE interaction with direct_IO
by making sure we call iov_iter_advance() on original
iov_iter even if direct_IO (done on its copy) has returned 0.
It's a no-op for old iov_iter flavours and does the right thing
(== truncation of the stuff we'd allocated, but not filled) in
ITER_PIPE case. Failures (e.g. -EIO) get caught and dealt with
by cleanup in generic_file_read_iter().
Linus Torvalds [Mon, 10 Oct 2016 17:33:58 +0000 (10:33 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling updates from Thomas Gleixner:
- handle uretprobe placement proper on little endian PPC64
- fix buffer handling in libtraceevent
- add a missing pointer derefence in perf probe
- fix the build of host tools in cross builds
- fix Intel PT timestamp handling
- synchronize memcpy, cpufeatures and bpf headers with the kernel headers
- support for vendor supplied JSON files describing PMU events
- a new set of tool tips
- initial work for clang/llvm support
- address some style issues found by cppcheck
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
tools build: Add feature detection for g++
tools build: Support compiling C++ source file
perf top/report: Add tips about a list option
perf report/top: Add a tip about system-wide collection from all CPUs
perf report/top: Add a tip about source line numbers with overhead
tools: Synchronize tools/include/uapi/linux/bpf.h
tools: Synchronize tools/arch/x86/include/asm/cpufeatures.h
perf bench mem: Sync memcpy assembly sources with the kernel
perf jevents: Fix Intel JSON fixed counter conversions
tools lib traceevent: Fix kbuffer_read_at_offset()
perf intel-pt: Fix MTC timestamp calculation for large MTC periods
perf intel-pt: Fix estimated timestamps for cycle-accurate mode
perf uretprobe ppc64le: Fix probe location
perf pmu-events: Add Skylake frontend MSR support
perf pmu-events: Fix fixed counters on Intel
perf tools: Make alias matching case-insensitive
perf tools: Allow period= in perf stat CPU event descriptions.
perf tools: Add README for info on parsing JSON/map files
perf list jevents: Add support for event list topics
perf list: Support long jevents descriptions
...
Linus Torvalds [Mon, 10 Oct 2016 17:29:59 +0000 (10:29 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"A revert of a commit which pointelessly widened a preempt disabled
section which in turn caused might_sleep() to trigger.
The patch intended to prevent usage of smp_processor_id() in
preemptible context, but the usage in that case is fine because the
thread is pinned on a single cpu and therefore cannot be migrated off"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "sched/core: Do not use smp_processor_id() with preempt enabled in smpboot_thread_fn()"
Linus Torvalds [Mon, 10 Oct 2016 17:23:18 +0000 (10:23 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for a regression introduced in 4.8 which causes the
trace/perf clock to return random nonsense if CONFIG_DEBUG_TIMEKEEPING
is set"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix __ktime_get_fast_ns() regression
Linus Torvalds [Mon, 10 Oct 2016 16:29:50 +0000 (09:29 -0700)]
Merge branch 'printk-cleanups'
Merge my system logging cleanups, triggered by the broken '\n' patches.
The line continuation handling has been broken basically forever, and
the code to handle the system log records was both confusing and
dubious. And it would do entirely the wrong thing unless you always had
a terminating newline, partly because it couldn't actually see whether a
message was marked KERN_CONT or not (but partly because the LOG_CONT
handling in the recording code was rather confusing too).
This re-introduces a real semantically meaningful KERN_CONT, and fixes
the few places I noticed where it was missing. There are probably more
missing cases, since KERN_CONT hasn't actually had any semantic meaning
for at least four years (other than the checkpatch meaning of "no log
level necessary, this is a continuation line").
This also allows the combination of KERN_CONT and a log level. In that
case the log level will be ignored if the merging with a previous line
is successful, but if a new record is needed, that new record will now
get the right log level.
That also means that you can at least in theory combine KERN_CONT with
the "pr_info()" style helpers, although any use of pr_fmt() prefixing
would make that just result in a mess, of course (the prefix would end
up in the middle of a continuing line).
* printk-cleanups:
printk: make reading the kernel log flush pending lines
printk: re-organize log_output() to be more legible
printk: split out core logging code into helper function
printk: reinstate KERN_CONT for printing continuation lines
After backporting commit ee44b4bc054a ("dlm: use sctp 1-to-1 API")
series to a kernel with an older workqueue which didn't use RCU yet, it
was noticed that we are freeing the workqueues in dlm_lowcomms_stop()
too early as free_conn() will try to access that memory for canceling
the queued works if any.
This issue was introduced by commit 0d737a8cfd83 as before it such
attempt to cancel the queued works wasn't performed, so the issue was
not present.
This patch fixes it by simply inverting the free order.
Cc: stable@vger.kernel.org Fixes: 0d737a8cfd83 ("dlm: fix race while closing connections") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
Linus Torvalds [Mon, 10 Oct 2016 00:32:20 +0000 (17:32 -0700)]
Merge branch 'for-4.9/block-smp' of git://git.kernel.dk/linux-block
Pull blk-mq CPU hotplug update from Jens Axboe:
"This is the conversion of blk-mq to the new hotplug state machine"
* 'for-4.9/block-smp' of git://git.kernel.dk/linux-block:
blk-mq: fixup "Convert to new hotplug state machine"
blk-mq: Convert to new hotplug state machine
blk-mq/cpu-notif: Convert to new hotplug state machine
Linus Torvalds [Mon, 10 Oct 2016 00:29:33 +0000 (17:29 -0700)]
Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block
Pull blk-mq irq/cpu mapping updates from Jens Axboe:
"This is the block-irq topic branch for 4.9-rc. It's mostly from
Christoph, and it allows drivers to specify their own mappings, and
more importantly, to share the blk-mq mappings with the IRQ affinity
mappings. It's a good step towards making this work better out of the
box"
* 'for-4.9/block-irq' of git://git.kernel.dk/linux-block:
blk_mq: linux/blk-mq.h does not include all the headers it depends on
blk-mq: kill unused blk_mq_create_mq_map()
blk-mq: get rid of the cpumask in struct blk_mq_tags
nvme: remove the post_scan callout
nvme: switch to use pci_alloc_irq_vectors
blk-mq: provide a default queue mapping for PCI device
blk-mq: allow the driver to pass in a queue mapping
blk-mq: remove ->map_queue
blk-mq: only allocate a single mq_map per tag_set
blk-mq: don't redistribute hardware queues on a CPU hotplug event
Linus Torvalds [Mon, 10 Oct 2016 00:16:18 +0000 (17:16 -0700)]
Merge tag 'dm-4.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- various fixes and cleanups for request-based DM core
- add support for delaying the requeue of requests; used by DM
multipath when all paths have failed and 'queue_if_no_path' is
enabled
- DM cache improvements to speedup the loading metadata and the writing
of the hint array
- fix potential for a dm-crypt crash on device teardown
- remove dm_bufio_cond_resched() and just using cond_resched()
- change DM multipath to return a reservation conflict error
immediately; rather than failing the path and retrying (potentially
indefinitely)
* tag 'dm-4.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (24 commits)
dm mpath: always return reservation conflict without failing over
dm bufio: remove dm_bufio_cond_resched()
dm crypt: fix crash on exit
dm cache metadata: switch to using the new cursor api for loading metadata
dm array: introduce cursor api
dm btree: introduce cursor api
dm cache policy smq: distribute entries to random levels when switching to smq
dm cache: speed up writing of the hint array
dm array: add dm_array_new()
dm mpath: delay the requeue of blk-mq requests while all paths down
dm mpath: use dm_mq_kick_requeue_list()
dm rq: introduce dm_mq_kick_requeue_list()
dm rq: reduce arguments passed to map_request() and dm_requeue_original_request()
dm rq: add DM_MAPIO_DELAY_REQUEUE to delay requeue of blk-mq requests
dm: convert wait loops to use autoremove_wake_function()
dm: use signal_pending_state() in dm_wait_for_completion()
dm: rename task state function arguments
dm: add two lockdep_assert_held() statements
dm rq: simplify dm_old_stop_queue()
dm mpath: check if path's request_queue is dying in activate_path()
...
Linus Torvalds [Mon, 10 Oct 2016 00:04:33 +0000 (17:04 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull main rdma updates from Doug Ledford:
"This is the main pull request for the rdma stack this release. The
code has been through 0day and I had it tagged for linux-next testing
for a couple days.
Summary:
- updates to mlx5
- updates to mlx4 (two conflicts, both minor and easily resolved)
- updates to iw_cxgb4 (one conflict, not so obvious to resolve,
proper resolution is to keep the code in cxgb4_main.c as it is in
Linus' tree as attach_uld was refactored and moved into
cxgb4_uld.c)
- improvements to uAPI (moved vendor specific API elements to uAPI
area)
- add hns-roce driver and hns and hns-roce ACPI reset support
- conversion of all rdma code away from deprecated
create_singlethread_workqueue
- security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
staging)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
staging/lustre: Disable InfiniBand support
iw_cxgb4: add fast-path for small REG_MR operations
cxgb4: advertise support for FR_NSMR_TPTE_WR
IB/core: correctly handle rdma_rw_init_mrs() failure
IB/srp: Fix infinite loop when FMR sg[0].offset != 0
IB/srp: Remove an unused argument
IB/core: Improve ib_map_mr_sg() documentation
IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
IB/mthca: Move user vendor structures
IB/nes: Move user vendor structures
IB/ocrdma: Move user vendor structures
IB/mlx4: Move user vendor structures
IB/cxgb4: Move user vendor structures
IB/cxgb3: Move user vendor structures
IB/mlx5: Move and decouple user vendor structures
IB/{core,hw}: Add constant for node_desc
ipoib: Make ipoib_warn ratelimited
IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
IB/ipoib: Remove deprecated create_singlethread_workqueue
...
Linus Torvalds [Sun, 9 Oct 2016 23:30:03 +0000 (16:30 -0700)]
Merge tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull more rdma updates from Doug Ledford:
"Minor updates for rxe driver"
[ Starting to do merge window pulls again - the current -git tree does
appear to have some netfilter use-after-free issues, but I've sent
off the report to the proper channels, and I don't want to delay merge
window activity any more ]
* tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/rxe: improved debug prints & code cleanup
rdma_rxe: Ensure rdma_rxe init occurs at correct time
IB/rxe: Properly honor max IRD value for rd/atomic.
IB/{rxe,core,rdmavt}: Fix kernel crash for reg MR
IB/rxe: Fix sending out loopback packet on netdev interface.
IB/rxe: Avoid scheduling tasklet for userspace QP
Linus Torvalds [Sun, 9 Oct 2016 19:16:57 +0000 (12:16 -0700)]
printk: make reading the kernel log flush pending lines
That will mean that any possible subsequent continuation will now be
broken up onto a line of its own (since reading the log has finalized
the beginning og the line), but if user space has activated system
logging (or if there's a kernel message dump going on) that is the right
thing to do.
And now that we actually get the continuation flags _right_ for this
all, the user space logger that is reading the kernel messages can
actually see the continuation marker. Not that anybody seems to really
bother with it (or care), but in theory user space can do its own
message stitching.
Linus Torvalds [Sun, 9 Oct 2016 18:53:00 +0000 (11:53 -0700)]
printk: re-organize log_output() to be more legible
Avoid some duplicate logic now that we can return early, and update the
comments for the new LOG_CONT world order.
This also stops the continuation flushing from just using random record
flags for the flushing action, instead taking the flags from the proper
original line and updating them as we add continuations to it.
Linus Torvalds [Sun, 9 Oct 2016 05:02:09 +0000 (22:02 -0700)]
printk: split out core logging code into helper function
The code that actually decides how to log the message (whether to put it
directly into the record log, whether to append it to an existing
buffered log, or whether to start a new buffered log) is fairly
non-obvious code in the middle of the vprintk_emit() function.
Splitting that code up into a helper function makes it easier to
understand, but perhaps more importantly also allows for the code to
just return early out of the helper function once it has made the
decision about where the new log content goes.
Linus Torvalds [Sun, 9 Oct 2016 03:32:40 +0000 (20:32 -0700)]
printk: reinstate KERN_CONT for printing continuation lines
Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space. It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.
Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other. Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.
To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:
That continuation marker was initially an empty string, and didn't
actuall make any semantic difference. But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.
To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.
5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")
and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.
Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.
You can see the beginning of that conversion in commits
e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface") 7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")
with a number of follow-up commits to fix some painful fallout from that
conversion. Over all, it took a couple of months to sort out most of
it. But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.
And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks. In
order to still log them as one recrod, the continuation lines need to be
attached to the previous record properly.
However the explicit continuation record marker that is actually useful
for this exact case was actually removed in aroundm the same time by commit
61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
due to the incorrect belief that KERN_CONT wasn't meaningful. The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.
This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.
But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.
For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.
Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel). So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.
Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.
So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.
There are probably other cases where KERN_CONT will be needed to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.
That said, we should strive to avoid the need for KERN_CONT. It does
result in real problems for logging, and should generally not be seen as
a good feature. If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.
But until that point, let's at mark the code that relies on the hacky
multi-fragment kernel printk's. Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".
(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited. Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).
David S. Miller [Sun, 9 Oct 2016 13:30:45 +0000 (09:30 -0400)]
Merge branch 'be2net-fixes'
Sriharsha Basavapatna says:
====================
be2net: patch-set
The following patch set contains a few bug fixes.
Please consider applying this to the net-next tree.
Thanks.
Patch-1 Obtains proper PF number for BEx chips
Patch-2 Fixes a FW update issue seen with BEx chips
Patch-3 Updates copyright string
Patch-4 Fixes TX stats for TSO packets
Patch-5 Enables VF link state setting for BE3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Sun, 9 Oct 2016 04:28:53 +0000 (09:58 +0530)]
be2net: Enable VF link state setting for BE3
The VF link state setting feature now works on BE3 chips too from
FW ver 11.1.192.0 onwards.
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
TX stats update does not take into account headers which get duplicated
when the TSO packet is split into segments by HW. Fix this for both
tunneled (vxlan) and non-tunneled TSO packets.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
be2net: NCSI FW section should be properly updated with ethtool for BE3
The driver has a check to ensure that NCSI FW section is updated only
if the current FW version in the card supports it. This FW version check
is done using memcmp() which obviously fails in some cases. Fix this by
breaking up the version string into integer version components and
comparing them.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
be2net: Provide an alternate way to read pf_num for BEx chips
The driver gets the pf_num for Skyhawk and Lancer using
GET_FUNC_CONFIG FW command. But since that command is not
supported in BEx, we need to get it from some other command.
Otherwise TPE recovery would fail since all NIC PFs would
end up with a func num of 0. There's a pci function number
field in the response of GET_CNTL_ATTRIBUTES command that
can be read to get the same info for BEx adapters.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Hansen [Fri, 7 Oct 2016 16:23:42 +0000 (09:23 -0700)]
x86/pkeys: Make protection keys an "eager" feature
Our XSAVE features are divided into two categories: those that
generate FPU exceptions, and those that do not. MPX and pkeys do
not generate FPU exceptions and thus can not be used lazily. We
disable them when lazy mode is forced on.
We have a pair of masks to collect these two sets of features, but
XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY.
Fix it by moving the feature to XFEATURE_MASK_EAGER.
Note: this only causes problem if you boot with lazy FPU mode
(eagerfpu=off) which is *not* the default. It also only affects
hardware which is not currently publicly available. It looks like
eager mode is going away, but we still need this patch applied
to any kernel that has protection keys and lazy mode, which is 4.6
through 4.8 at this point, and 4.9 if the lazy removal isn't sent
to Linus for 4.9.
Fixes: c8df40098451 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures") Signed-off-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Fri, 7 Oct 2016 13:55:13 +0000 (15:55 +0200)]
x86/apic: Prevent pointless warning messages
Markus reported that he sees new warnings:
APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 4/0x84 ignored.
APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 5/0x85 ignored.
This comes from the recent persistant cpuid - nodeid changes. The code
which emits the warning has been called prior to these changes only for
enabled processors. Now it's called for disabled processors as well to get
the possible cpu accounting correct. So if the kernel is compiled for the
number of actual available/enabled CPUs and the BIOS reports disabled CPUs
as well then the above warnings are printed.
That's a pointless exercise as it only makes sense if there are more CPUs
enabled than the kernel supports.
Nake the warning conditional on enabled processors so we are back to the
state before these changes.
Fixes: 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: linux-acpi@vger.kernel.org Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Fri, 7 Oct 2016 12:02:12 +0000 (14:02 +0200)]
x86/acpi: Prevent LAPIC id 0xff from being accounted
Yinghai reported that the recent changes to make the cpuid - nodeid
relationship permanent causes a cpuid ordering regression on a system which
has 2apic enabled..
The reason is that the ACPI local APIC parser has no sanity check for
apicid 0xff, which is an invalid id. So a CPU id for this invalid local
APIC id is allocated and therefor breaks the cpuid ordering.
Add a sanity check to acpi_parse_lapic() which ignores the invalid id.
Fixes: 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>, Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: douly.fnst@cn.fujitsu.com, Cc: zhugh.fnst@cn.fujitsu.com Cc: Tony Luck <tony.luck@intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Lv Zheng <lv.zheng@intel.com>, Cc: robert.moore@intel.com Cc: linux-acpi@vger.kernel.org Link: https://lkml.kernel.org/r/CAE9FiQVQx6FRXT-RdR7Crz4dg5LeUWHcUSy1KacjR+JgU_vGJg@mail.gmail.com
Linus Torvalds [Sat, 8 Oct 2016 04:38:00 +0000 (21:38 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
- fsnotify updates
- ocfs2 updates
- all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits)
console: don't prefer first registered if DT specifies stdout-path
cred: simpler, 1D supplementary groups
CREDITS: update Pavel's information, add GPG key, remove snail mail address
mailmap: add Johan Hovold
.gitattributes: set git diff driver for C source code files
uprobes: remove function declarations from arch/{mips,s390}
spelling.txt: "modeled" is spelt correctly
nmi_backtrace: generate one-line reports for idle cpus
arch/tile: adopt the new nmi_backtrace framework
nmi_backtrace: do a local dump_stack() instead of a self-NMI
nmi_backtrace: add more trigger_*_cpu_backtrace() methods
min/max: remove sparse warnings when they're nested
Documentation/filesystems/proc.txt: add more description for maps/smaps
mm, proc: fix region lost in /proc/self/smaps
proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
proc: add LSM hook checks to /proc/<tid>/timerslack_ns
proc: relax /proc/<tid>/timerslack_ns capability requirements
meminfo: break apart a very long seq_printf with #ifdefs
seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
proc: faster /proc/*/status
...
Linus Torvalds [Sat, 8 Oct 2016 04:34:49 +0000 (21:34 -0700)]
Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC late DT updates from Arnd Bergmann:
"These updates have been kept in a separate branch mostly because they
rely on updates to the respective clk drivers to keep the shared
header files in sync.
- The Renesas r8a7796 (R-Car M3-W) platform gets added, this is an
automotive SoC similar to the ⅹ8a7795 chip we already support, but
the dts changes rely on a clock driver change that has been merged
for v4.9 through the clk tree.
- The Amlogic meson-gxbb (S905) platform gains support for a few
drivers merged through our tree, in particular the network and usb
driver changes are required and included here, and also the clk
tree changes.
- The Allwinner platforms have seen a large-scale change to their clk
drivers and the dts file updates must come after that. This
includes the newly added Nextthing GR8 platform, which is derived
from sun5i/A13.
- Some integrator (arm32) changes rely on clk driver changes.
- A single patch for lpc32xx has no such dependency but wasn't added
until just before the merge window"
* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (99 commits)
ARM: dts: lpc32xx: add device node for IRAM on-chip memory
ARM: dts: sun8i: Add accelerometer to polaroid-mid2407pxe03
ARM: dts: sun8i: enable UART1 for iNet D978 Rev2 board
ARM: dts: sun8i: add pinmux for UART1 at PG
dts: sun8i-h3: add I2C0-2 peripherals to H3 SOC
dts: sun8i-h3: add pinmux definitions for I2C0-2
dts: sun8i-h3: associate exposed UARTs on Orange Pi Boards
dts: sun8i-h3: split off RTS/CTS for UART1 in seperate pinmux
dts: sun8i-h3: add pinmux definitions for UART2-3
ARM: dts: sun9i: a80-optimus: Disable EHCI1
ARM: dts: sun9i: cubieboard4: Add AXP806 PMIC device node and regulators
ARM: dts: sun9i: a80-optimus: Add AXP806 PMIC device node and regulators
ARM: dts: sun9i: cubieboard4: Declare AXP809 SW regulator as unused
ARM: dts: sun9i: a80-optimus: Declare AXP809 SW regulator as unused
ARM: dts: sun8i: Add touchscreen node for sun8i-a33-ga10h
ARM: dts: sun8i: Add touchscreen node for sun8i-a23-polaroid-mid2809pxe04
ARM: dts: sun8i: Add touchscreen node for sun8i-a23-polaroid-mid2407pxe03
ARM: dts: sun8i: Add touchscreen node for sun8i-a23-inet86dz
ARM: dts: sun8i: Add touchscreen node for sun8i-a23-gt90h
ARM64: dts: meson-gxbb-vega-s95: Enable USB Nodes
...
Linus Torvalds [Sat, 8 Oct 2016 04:32:39 +0000 (21:32 -0700)]
Merge tag 'armsoc-dt64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM 64-bit DT updates from Arnd Bergmann:
"The 64-bit DT changes are surprisingly small this time, we only add
two SoC platforms: the ZTE ZX296718 Set-top-box SoC and the SocioNext
UniPhier LD11 TV SoC, each with their reference boards.
There are three new machines added for existing SoC platforms:
- The Marvell Armada 8040 development board is an impressive
quad-core Cortex-A72 machine with three 10gbit ethernet interfaces
- Qualcomms DragonBoard 820c single-board computer is their current
high-end phone platform in the 96boards form factor
- Rockchip: Tronsmart Orion r86 set-top-box is a popular mid-range
Android box based on the 8-core rk3368 SoC"
* tag 'armsoc-dt64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (91 commits)
arm64: dts: berlin4ct: Add L2 cache topology
arm64: dts: berlin4ct: enable all wdt nodes unconditionally
arm64: dts: berlin4ct: switch to Cortex-A53 specific pmu nodes
arm64: dts: Add ZTE ZX296718 SoC dts and Makefile
arm64: dts: apm: Add DT node for APM X-Gene 2 CPU clocks
arm64: dts: apm: Add X-Gene SoC hwmon to device tree
arm64: dts: apm: Fix interrupt polarity for X-Gene PCIe legacy interrupts
arm64: dts: apm: Add APM X-Gene v2 SoC PMU DTS entries
arm64: dts: apm: Add APM X-Gene SoC PMU DTS entries
arm64: dts: marvell: enable MSI for PCIe on Armada 7K/8K
arm64: dts: ls2080a: Add 'dma-coherent' for ls2080a PCI nodes
arm64: dts: rockchip: add Type-C phy for RK3399
arm64: dts: rockchip: enable the gmac for rk3399 evb board
arm64: dts: rockchip: add the gmac needed node for rk3399
arm64: dts: rockchip: support the pmu node for rk3399
arm64: dts: rockchip: change all interrupts cells to 4 on rk3399 SoCs
arm64: dts: rockchip: add the tcpc for rk3399 power domain
arm64: dts: rockchip: add efuse0 device node for rk3399
arm64: dts: rockchip: configure PCIe support for rk3399-evb
arm64: dts: rockchip: add the PCIe controller support for RK3399
...
Linus Torvalds [Sat, 8 Oct 2016 04:29:04 +0000 (21:29 -0700)]
Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM DT updates from Arnd Bergmann:
"These are as usual a very large number of mostly boring updates to
enable devices in existing machines, or to fix minor bugs. Notably, an
ongoing treewide effort to fix warnings caused by an update to the
device tree compiler. These are enabled with "make W=1" at the moment
but can hopefully become the default once all issues have been
addressed.
No new SoC platform is added this time around (Armada 395 and Orion
mv88f5181 are slight variations of existing ones), but a significant
number of new dts files are added, which I list by platform:
- Allwinner: Empire Electronix M712 and iNet d978 Rev2 tablets,
Orange Pi PC Plus, Orange Pi 2, Orange Pi Plus 2E, Orange Pi Lite,
Olimex A33-Olinuxino, and Nano Pi Neo single-board computers
- ARM Realview: all supported machines (ported from board files)
- Broadcom: BCM958525er, BCM958522er, BCM988312hr, BCM958623hr and
BCM958622hr reference boards for Northstar platform, Raspberry Pi
Zero single-board computer
- Marvell EBU: Netgear WNR854T router (ported from board file),
Armada 395 SoC platform and GP board Armada 390 DB development
board
- NXP i.MX: imx7s Warp7 reference board, Gateworks Ventana GW553x
single-board computer, Technologic Systems TS-4900 and Engicam
IMX6UL GEA M6UL computer-on-module, Inverse Path USB armory board
- Qualcomm: LG Nexus 5 Phone
- Renesas: r8a7792/wheat and r7s72100/rskrza1 development boards
- ST Microelectronics STi: B2260 (96boards) single-board computer
- TI Davinci: OMAP-L138 LCDK Development kit
- TI OMAP: beagleboard-x15 rev B1 single-board computer"
* tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (390 commits)
ARM: dts: sony-nsz-gs7: add missing unit name to /memory node
ARM: dts: chromecast: add missing unit name to /memory node
ARM: dts: berlin2q-marvell-dmp: add missing unit name to /memory node
ARM: dts: berlin2: Add missing unit name to /soc node
ARM: dts: berlin2cd: Add missing unit name to /soc node
ARM: dts: berlin2q: Add missing unit name to /soc node
ARM: dts: berlin2: Remove skeleton.dtsi inclusion
ARM: dts: berlin2cd: Remove skeleton.dtsi inclusion
ARM: dts: berlin2q: Remove skeleton.dtsi inclusion
arm: dts: berlin2q: enable all wdt nodes unconditionally
arm: dts: berlin2: enable all wdt nodes unconditionally
ARM: dts: omap5-igep0050.dts: Use tabs for indentation
ARM: dts: Fix igepv5 power button GPIO direction
ARM: dts: am335x-evmsk: Add blue-and-red-wiring -property to lcdc node
ARM: dts: am335x-evmsk: Whitespace cleanup of lcdc related nodes
ARM: dts: am335x-evm: Add blue-and-red-wiring -property to lcdc node
ARM: dts: s3c64xx: Use macros for pinctrl configuration
ARM: dts: s3c2416: Use macros for pinctrl configuration
ARM: dts: s5pv210: Use macros for pinctrl configuration
ARM: dts: s3c64xx: Use common macros for pinctrl configuration
...
Linus Torvalds [Sat, 8 Oct 2016 04:23:40 +0000 (21:23 -0700)]
Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Arnd Bergmann:
"Driver updates for ARM SoCs, including a couple of newly added
drivers:
- The Qualcomm external bus interface 2 (EBI2), used in some of their
mobile phone chips for connecting flash memory, LCD displays or
other peripherals
- Secure monitor firmware for Amlogic SoCs, and an NVMEM driver for
the EFUSE based on that firmware interface.
- Perf support for the AppliedMicro X-Gene performance monitor unit
- Reset driver for STMicroelectronics STM32
- Reset driver for SocioNext UniPhier SoCs
Aside from these, there are minor updates to SoC-specific bus,
clocksource, firmware, pinctrl, reset, rtc and pmic drivers"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (50 commits)
bus: qcom-ebi2: depend on HAS_IOMEM
pinctrl: mvebu: orion5x: Generalise mv88f5181l support for 88f5181
clk: mvebu: Add clk support for the orion5x SoC mv88f5181
dt-bindings: EXYNOS: Add Exynos5433 PMU compatible
clocksource: exynos_mct: Add the support for ARM64
perf: xgene: Add APM X-Gene SoC Performance Monitoring Unit driver
Documentation: Add documentation for APM X-Gene SoC PMU DTS binding
MAINTAINERS: Add entry for APM X-Gene SoC PMU driver
bus: qcom: add EBI2 driver
bus: qcom: add EBI2 device tree bindings
rtc: rtc-pm8xxx: Add support for pm8018 rtc
nvmem: amlogic: Add Amlogic Meson EFUSE driver
firmware: Amlogic: Add secure monitor driver
soc: qcom: smd: Reset rx tail rather than tx
memory: atmel-sdramc: fix a possible NULL dereference
reset: hi6220: allow to compile test driver on other architectures
reset: zynq: add driver Kconfig option
reset: sunxi: add driver Kconfig option
reset: stm32: add driver Kconfig option
reset: socfpga: add driver Kconfig option
...
Linus Torvalds [Sat, 8 Oct 2016 04:22:01 +0000 (21:22 -0700)]
Merge tag 'armsoc-arm64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC 64-bit updates from Arnd Bergmann:
"Changes to platform code for 64-bit ARM platforms.
Nearly all of these are defconfig updates to enable new drivers or old
drivers still used on these 64-bit platforms.
Aside from that, we gain initial support for two set-top-box
platforms, both of which already have 32-bit support in arch/arm:
- Broadcom adds abstract support for the bcm7xxx/brcmstb platform,
presumably the respective dts files and more information will
follow at a later point.
- The ZTE ZX296718 SoC for set-top-boxes, a relative of the 32-bit
ZX296702 SoC that we already support"
* tag 'armsoc-arm64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: add ZTE ZX SoC family
arm64: defconfig: enable ZTE ZX related config
arm64: defconfig: enable common modules for power management
arm64: defconfig: enable meson I2C
arm64: defconfig: enable meson SPI as module
arm64: defconfig: enable meson WDT as modules
arm64: defconfig: enable HW random as module
arm64: defconfig: Enable SDHI and GPIO_REGULATOR
arm64: configs: enable PCIe driver for Aardvark
Kconfig: ARCH_HISI: Add PINCTRL to HISI platform
arm64: defconfig: enable bluetooth supports as modules
arm64: defconfig: enable CONFIG_INPUT_HISI_POWERKEY for HiKey
arm64: defconfig: Enable HiSilicon kirin drm, adv7533 for HiKey
arm64: defconfig: Enable Hisi SAS and HNS
arm64: defconfig: Enable QDF2432 config options
arm64: sunxi: Kconfig: add essential pinctrl driver
arm64: defconfig: Add Renesas R-Car HSUSB driver support as module
arm64: Add Broadcom Set Top Box Kconfig entry point
arm64: defconfig: enable xhci-platform
Linus Torvalds [Sat, 8 Oct 2016 04:20:33 +0000 (21:20 -0700)]
Merge tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC defconfig updates from Arnd Bergmann:
"Defconfig additions, removals, etc. Most of these are small changes
adding the options for newly upstreamed drivers, or drivers needed for
new board support. Nothing specifically sticks out this time"
* tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (25 commits)
ARM: multi_v7_defconfig: enable CONFIG_EFI
ARM: multi_v7_defconfig: Build Atmel maXTouch driver as a module
ARM: defconfig: update the Integrator defconfig
ARM: keystone: defconfig: Fix USB configuration
ARM: imx_v6_v7_defconfig: Select the wm8960 codec driver
ARM: omap2plus_defconfig: switch to the IIO BMP085 driver
ARM: mvebu_v5_defconfig: use MV88E6XXX
ARM: davinci_all_defconfig: Enable some UBI modules
ARM: davinci_all_defconfig: Enable AEMIF as a module
ARM: multi_v7_defconfig: Enable SECCOMP
ARM: exynos_defconfig: Enable SECCOMP
ARM: imx_v6_v7_defconfig: Add CONFIG_MPL3115
ARM: imx_v6_v7_defconfig: Enable GPU support
ARM: s3c2410_defconfig: Remove CONFIG_IPV6_PRIVACY
ARM: exynos_defconfig: Enable PM_DEBUG
ARM: exynos_defconfig: Enable bus frequency scaling with devfreq
ARM: imx_v6_v7_defconfig: enable more USB configurations
ARM: davinci_all_defconfig: enable SMSC ethernet PHY
ARM: davinci_all_defconfig: enable RTC driver as module
ARM: multi_v7_defconfig: Enable ARM_IMX6Q_CPUFREQ
...
Linus Torvalds [Sat, 8 Oct 2016 04:18:42 +0000 (21:18 -0700)]
Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC platform updates from Arnd Bergmann:
"These are updates for platform specific code on 32-bit ARM machines,
essentially anything that can not (yet) be expressed using DT files.
Noteworthy changes include:
- We get support for running in big-endian mode on two platforms:
sunxi (Allwinner) and s3c24xx (old Samsung).
- The recently added Uniphier platform now uses standard PSCI methods
for SMP booting and we remove support for old bootloader versions
that did not support it yet.
- In sunxi, we gain support for the "Nextthing GR8" SoC, which is a
close relative of the Allwinner A13 and R8 chips.
- PXA completes its move over to the generic dmaengine framework and
removes its old private API
- mach-bcm gains support for BCM47189/BCM53573, their first ARM SoC
with integrated 802.11ac wireless networking"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (54 commits)
ARM: imx legacy: pca100: move peripheral initialization to .init_late
ARM: imx legacy: mx27ads: move peripheral initialization to .init_late
ARM: imx legacy: mx21ads: move peripheral initialization to .init_late
ARM: imx legacy: pcm043: move peripheral initialization to .init_late
ARM: imx legacy: mx35-3ds: move peripheral initialization to .init_late
ARM: imx legacy: mx27-3ds: move peripheral initialization to .init_late
ARM: imx legacy: imx27-visstrim-m10: move peripheral initialization to .init_late
ARM: imx legacy: vpr200: move peripheral initialization to .init_late
ARM: imx legacy: mx31moboard: move peripheral initialization to .init_late
ARM: imx legacy: armadillo5x0: move peripheral initialization to .init_late
ARM: imx legacy: qong: move peripheral initialization to .init_late
ARM: imx legacy: mx31-3ds: move peripheral initialization to .init_late
ARM: imx legacy: pcm037: move peripheral initialization to .init_late
ARM: imx legacy: mx31lilly: move peripheral initialization to .init_late
ARM: imx legacy: mx31ads: move peripheral initialization to .init_late
ARM: imx legacy: mx31lite: move peripheral initialization to .init_late
ARM: imx legacy: kzm: move peripheral initialization to .init_late
MAINTAINERS: update list of Oxnas maintainers
ARM: orion5x: remove extraneous NO_IRQ
ARM: orion: simplify orion_ge00_switch_init
...
Linus Torvalds [Sat, 8 Oct 2016 04:16:16 +0000 (21:16 -0700)]
Merge tag 'armsoc-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC cleanups from Arnd Bergmann:
"The cleanups for v4.9 are a little larger that usual, but thankfully
that is almost exclusively due to removing a significant number of
files that have become obsolete after the still ongoing conversion of
old board files to devicetree.
- for mach-omap2, which is still the largest platform in arch/arm/,
the conversion to DT is finally complete after the Nokia N900 is
now fully supported there, along with the omap3 LDP, and we can
remove those two board files. If no regressions are found, another
large cleanup for the platform will happen as a follow-up, removing
dead code and restructuring the platform based on being DT-only.
- In mach-imx, similar work is ongoing, but has not come that far.
This time, we remove the obsolete board file for the i.MX1
generation, which like i.MX25, i.MX5, i.MX6, and i.MX7 is now
DT-only. The remaining board files are for i.MX2 and i.MX3 machines
based on old ARM926 or ARM1136 cores that should work with DT in
principle.
- realview has just been converted from board files to DT, and a lot
of code gets removed in the process. This is the last
ARM/Keil/Versatile derived platform that was still using board
files, the other ones being integrator, versatile and vexpress. We
can probably merge the remaining code into a single directory in
the near future.
- clps711x had completed the conversion in v4.8, but we accidentally
left the files in place that should have been deleted then"
* tag 'armsoc-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (21 commits)
ARM: select PCI_DOMAINS config from ARCH_MULTIPLATFORM
ARM: stop *MIGHT_HAVE_PCI* config from being selected redundantly
ARM: imx: (trivial) fix typo and grammar
ARM: clps711x: remove extraneous files
ARM: imx: use IS_ENABLED() instead of checking for built-in or module
ARM: OMAP2+: use IS_ENABLED() instead of checking for built-in or module
ARM: OMAP1: use IS_ENABLED() instead of checking for built-in or module
ARM: imx: remove platform-mxc_rnga
ARM: realview: imply device tree boot
ARM: realview: no need to select SMP_ON_UP explicitly
ARM: realview: delete the RealView board files
ARM: imx: no need to select SMP_ON_UP explicitly
ARM: i.MX: Move SOC_IMX1 into 'Device tree only'
ARM: i.MX: Remove i.MX1 non-DT support
ARM: i.MX: Remove i.MX1 Synertronixx SCB9328 board support
ARM: i.MX: Remove i.MX1 Armadeus APF9328 board support
ARM: mxs: remove obsolete startup code for TX28
ARM: i.MX31 iomux: remove duplicates with alternate name
ARM: i.MX31 iomux: remove plain duplicates
ARM: OMAP2+: Drop legacy board file for LDP
...
wan/fsl_ucc_hdlc: Fix size used in dma_free_coherent()
Size used with 'dma_alloc_coherent()' and 'dma_free_coherent()' should be
consistent.
Here, the size of a pointer is used in dma_alloc... and the size of the
pointed structure is used in dma_free...
This has been spotted with coccinelle, using the following script:
////////////////////
@r@
expression x0, x1, y0, y1, z0, z1, t0, t1, ret;
@@
Nathan Sullivan [Fri, 7 Oct 2016 15:13:22 +0000 (10:13 -0500)]
net: macb: NULL out phydev after removing mdio bus
To ensure the dev->phydev pointer is not used after becoming invalid in
mdiobus_unregister, set it to NULL. This happens when removing the macb
driver without first taking its interface down, since unregister_netdev
will end up calling macb_close.
Signed-off-by: Xander Huff <xander.huff@ni.com> Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com> Signed-off-by: Brad Mouring <brad.mouring@ni.com> Reviewed-by: Moritz Fischer <moritz.fischer@ettus.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Fri, 7 Oct 2016 08:32:31 +0000 (09:32 +0100)]
xen-netback: make sure that hashes are not send to unaware frontends
In the case when a frontend only negotiates a single queue with xen-
netback it is possible for a skbuff with a s/w hash to result in a
hash extra_info segment being sent to the frontend even when no hash
algorithm has been configured. (The ndo_select_queue() entry point makes
sure the hash is not set if no algorithm is configured, but this entry
point is not called when there is only a single queue). This can result
in a frontend that is unable to handle extra_info segments being given
such a segment, causing it to crash.
This patch fixes the problem by clearing the hash in ndo_start_xmit()
instead, which is clearly guaranteed to be called irrespective of the
number of queues.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Sidorenko [Fri, 7 Oct 2016 13:02:33 +0000 (09:02 -0400)]
Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion
Roundrobin runner of team driver uses 'unsigned int' variable to count
the number of sent_packets. Later it is passed to a subroutine
team_num_to_port_index(struct team *team, int num) as 'num' and when
we reach MAXINT (2**31-1), 'num' becomes negative.
This leads to using incorrect hash-bucket for port lookup
and as a result, packets are dropped. The fix consists of changing
'int num' to 'unsigned int num'. Testing of a fixed kernel shows that
there is no packet drop anymore.
Signed-off-by: Alex Sidorenko <alexandre.sidorenko@hpe.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 8 Oct 2016 03:50:37 +0000 (20:50 -0700)]
Merge branch 'parisc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"Changes include:
- Fix boot of 32bit SMP kernel (initial kernel mapping was too small)
- Added hardened usercopy checks
- Drop bootmem and switch to memblock and NO_BOOTMEM implementation
- Drop the BROKEN_RODATA config option (and thus remove the relevant
code from the generic headers and files because parisc was the last
architecture which used this config option)
- Improve segfault reporting by printing human readable error strings
- Various smaller changes, e.g. dwarf debug support for assembly
code, update comments regarding copy_user_page_asm, switch to
kmalloc_array()"
* 'parisc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Increase KERNEL_INITIAL_SIZE for 32-bit SMP kernels
parisc: Drop bootmem and switch to memblock
parisc: Add hardened usercopy feature
parisc: Add cfi_startproc and cfi_endproc to assembly code
parisc: Move hpmc stack into page aligned bss section
parisc: Fix self-detected CPU stall warnings on Mako machines
parisc: Report trap type as human readable string
parisc: Update comment regarding implementation of copy_user_page_asm
parisc: Use kmalloc_array() in add_system_map_addresses()
parisc: Check return value of smp_boot_one_cpu()
parisc: Drop BROKEN_RODATA config option
Paul Durrant [Fri, 7 Oct 2016 10:33:37 +0000 (11:33 +0100)]
MAINTAINERS: add myself as a maintainer of xen-netback
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 8 Oct 2016 03:48:45 +0000 (20:48 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32
Pull avr32 update from Hans-Christian Noren Egtvedt.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
avr32: migrate exception table users off module.h and onto extable.h
Linus Torvalds [Sat, 8 Oct 2016 03:19:31 +0000 (20:19 -0700)]
Merge tag 'powerpc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Major rework of Book3S 64-bit exception vectors (Nicholas Piggin)
- Use gas sections for arranging exception vectors et. al.
- Large set of TM cleanups and selftests (Cyril Bur)
- Enable transactional memory (TM) lazily for userspace (Cyril Bur)
- Support for XZ compression in the zImage wrapper (Oliver
O'Halloran)
- Add support for bpf constant blinding (Naveen N. Rao)
- Beginnings of upstream support for PA Semi Nemo motherboards
(Darren Stevens)
Fixes:
- Ensure .mem(init|exit).text are within _stext/_etext (Michael
Ellerman)
- xmon: Don't use ld on 32-bit (Michael Ellerman)
- vdso64: Use double word compare on pointers (Anton Blanchard)
- powerpc/nvram: Fix an incorrect partition merge (Pan Xinhui)
- powerpc: Fix usage of _PAGE_RO in hugepage (Christophe Leroy)
- powerpc/mm: Update FORCE_MAX_ZONEORDER range to allow hugetlb w/4K
(Aneesh Kumar K.V)
- Fix memory leak in queue_hotplug_event() error path (Andrew
Donnellan)
- Replay hypervisor maintenance interrupt first (Nicholas Piggin)
Various performance optimisations (Anton Blanchard):
- Align hot loops of memset() and backwards_memcpy()
- During context switch, check before setting mm_cpumask
- Remove static branch prediction in atomic{, 64}_add_unless
- Only disable HAVE_EFFICIENT_UNALIGNED_ACCESS on POWER7 little
endian
- Set default CPU type to POWER8 for little endian builds
Cleanups & features:
- Sparse fixes/cleanups (Daniel Axtens)
- Preserve CFAR value on SLB miss caused by access to bogus address
(Paul Mackerras)
- Radix MMU fixups for POWER9 (Aneesh Kumar K.V)
- Support for setting used_(vsr|vr|spe) in sigreturn path (for CRIU)
(Simon Guo)
- Optimise syscall entry for virtual, relocatable case (Nicholas
Piggin)
- Optimise MSR handling in exception handling (Nicholas Piggin)
- Support for kexec with Radix MMU (Benjamin Herrenschmidt)
- powernv EEH fixes (Russell Currey)
- Suprise PCI hotplug support for powernv (Gavin Shan)
- Endian/sparse fixes for powernv PCI (Gavin Shan)
- Defconfig updates (Anton Blanchard)
- KVM: PPC: Book3S HV: Migrate pinned pages out of CMA (Balbir Singh)
- cxl: Flush PSL cache before resetting the adapter (Frederic Barrat)
- cxl: replace loop with for_each_child_of_node(), remove unneeded
of_node_put() (Andrew Donnellan)
- Fix HV facility unavailable to use correct handler (Nicholas
Piggin)
- Remove unnecessary syscall trampoline (Nicholas Piggin)
- fadump: Fix build break when CONFIG_PROC_VMCORE=n (Michael
Ellerman)
- Quieten EEH message when no adapters are found (Anton Blanchard)
- powernv: Add PHB register dump debugfs handle (Russell Currey)
- Use kprobe blacklist for exception handlers & asm functions
(Nicholas Piggin)
- Document the syscall ABI (Nicholas Piggin)
- MAINTAINERS: Update cxl maintainers (Michael Neuling)
- powerpc: Remove all usages of NO_IRQ (Michael Ellerman)
Minor cleanups:
- Andrew Donnellan, Christophe Leroy, Colin Ian King, Cyril Bur,
Frederic Barrat, Pan Xinhui, PrasannaKumar Muralidharan, Rui Teng,
Simon Guo"
* tag 'powerpc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits)
powerpc/bpf: Add support for bpf constant blinding
powerpc/bpf: Implement support for tail calls
powerpc/bpf: Introduce accessors for using the tmp local stack space
powerpc/fadump: Fix build break when CONFIG_PROC_VMCORE=n
powerpc: tm: Enable transactional memory (TM) lazily for userspace
powerpc/tm: Add TM Unavailable Exception
powerpc: Remove do_load_up_transact_{fpu,altivec}
powerpc: tm: Rename transct_(*) to ck(\1)_state
powerpc: tm: Always use fp_state and vr_state to store live registers
selftests/powerpc: Add checks for transactional VSXs in signal contexts
selftests/powerpc: Add checks for transactional VMXs in signal contexts
selftests/powerpc: Add checks for transactional FPUs in signal contexts
selftests/powerpc: Add checks for transactional GPRs in signal contexts
selftests/powerpc: Check that signals always get delivered
selftests/powerpc: Add TM tcheck helpers in C
selftests/powerpc: Allow tests to extend their kill timeout
selftests/powerpc: Introduce GPR asm helper header file
selftests/powerpc: Move VMX stack frame macros to header file
selftests/powerpc: Rework FPU stack placement macros and move to header file
selftests/powerpc: Check for VSX preservation across userspace preemption
...
Paul Burton [Sat, 8 Oct 2016 00:03:15 +0000 (17:03 -0700)]
console: don't prefer first registered if DT specifies stdout-path
If a device tree specifies a preferred device for kernel console output
via the stdout-path or linux,stdout-path chosen node properties or the
stdout alias then the kernel ought to honor it & output the kernel
console to that device. As it stands, this isn't the case. Whilst we
parse the stdout-path properties & set an of_stdout variable from
of_alias_scan(), and use that from of_console_check() to determine
whether to add a console device as a preferred console whilst
registering it, we also prefer the first registered console if no other
has been selected at the time of its registration.
This means that if a console other than the one the device tree selects
via stdout-path is registered first, we will switch to using it & when
the stdout-path console is later registered the call to
add_preferred_console() via of_console_check() is too late to do
anything useful. In practice this seems to mean that we switch to the
dummy console device fairly early & see no further console output:
Fix this by not automatically preferring the first registered console if
one is specified by the device tree. This allows consoles to be
registered but not enabled, and once the driver for the console selected
by stdout-path calls of_console_check() the driver will be added to the
list of preferred consoles before any other console has been enabled.
When that console is then registered via register_console() it will be
enabled as expected.
Link: http://lkml.kernel.org/r/20160809151937.26118-1-paul.burton@imgtec.com Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Tejun Heo <tj@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ivan Delalande <colona@arista.com> Cc: Thierry Reding <treding@nvidia.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jan Kara <jack@suse.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Perches <joe@perches.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Sat, 8 Oct 2016 00:03:12 +0000 (17:03 -0700)]
cred: simpler, 1D supplementary groups
Current supplementary groups code can massively overallocate memory and
is implemented in a way so that access to individual gid is done via 2D
array.
If number of gids is <= 32, memory allocation is more or less tolerable
(140/148 bytes). But if it is not, code allocates full page (!)
regardless and, what's even more fun, doesn't reuse small 32-entry
array.
2D array means dependent shifts, loads and LEAs without possibility to
optimize them (gid is never known at compile time).
All of the above is unnecessary. Switch to the usual
trailing-zero-len-array scheme. Memory is allocated with
kmalloc/vmalloc() and only as much as needed. Accesses become simpler
(LEA 8(gi,idx,4) or even without displacement).
Maximum number of gids is 65536 which translates to 256KB+8 bytes. I
think kernel can handle such allocation.
On my usual desktop system with whole 9 (nine) aux groups, struct
group_info shrinks from 148 bytes to 44 bytes, yay!
Nice side effects:
- "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,
- fix little mess in net/ipv4/ping.c
should have been using GROUP_AT macro but this point becomes moot,
- aux group allocation is persistent and should be accounted as such.
Jean Delvare [Sat, 8 Oct 2016 00:03:04 +0000 (17:03 -0700)]
.gitattributes: set git diff driver for C source code files
Git can be told to apply language-specific rules when generating diffs.
Enable this for C source code files (*.c and *.h) so that function names
are printed right. Specifically, doing so prevents "git diff" from
mistakenly considering unindented goto labels as function names.
Link: http://lkml.kernel.org/r/20160907143403.1449324f@endymion Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
uprobes: remove function declarations from arch/{mips,s390}
The declarations of arch-specific functions have been moved to a common
header in commit 3820b4d2789f ('uprobes: Move function declarations out
of arch'), but MIPS and S390 has added them to their own trees later.
Remove the unnecessary duplicates.
Link: http://lkml.kernel.org/r/1472804384-17830-1-git-send-email-marcin.nowakowski@imgtec.com Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Metcalf [Sat, 8 Oct 2016 00:02:55 +0000 (17:02 -0700)]
nmi_backtrace: generate one-line reports for idle cpus
When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.
This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.
Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Tested-by: Petr Mladek <pmladek@suse.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Metcalf [Sat, 8 Oct 2016 00:02:52 +0000 (17:02 -0700)]
arch/tile: adopt the new nmi_backtrace framework
Previously tile was rolling its own method of capturing backtrace data
in the NMI handlers, but it was relying on running printk() from the NMI
handler, which is not always safe. So adopt the nmi_backtrace model
(with the new cpumask extension) instead.
So we can call the nmi_backtrace code directly from the nmi handler,
move the nmi_enter()/exit() into the top-level tile NMI handler.
The semantics of the routine change slightly since it is now synchronous
with the remote cores completing the backtraces. Previously it was
asynchronous, but with protection to avoid starting a new remote
backtrace if the old one was still in progress.
Link: http://lkml.kernel.org/r/1472487169-14923-4-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> [arm] Cc: Petr Mladek <pmladek@suse.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Metcalf [Sat, 8 Oct 2016 00:02:49 +0000 (17:02 -0700)]
nmi_backtrace: do a local dump_stack() instead of a self-NMI
Currently on arm there is code that checks whether it should call
dump_stack() explicitly, to avoid trying to raise an NMI when the
current context is not preemptible by the backtrace IPI. Similarly, the
forthcoming arch/tile support uses an IPI mechanism that does not
support generating an NMI to self.
Accordingly, move the code that guards this case into the generic
mechanism, and invoke it unconditionally whenever we want a backtrace of
the current cpu. It seems plausible that in all cases, dump_stack()
will generate better information than generating a stack from the NMI
handler. The register state will be missing, but that state is likely
not particularly helpful in any case.
Or, if we think it is helpful, we should be capturing and emitting the
current register state in all cases when regs == NULL is passed to
nmi_cpu_backtrace().
Link: http://lkml.kernel.org/r/1472487169-14923-3-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Aaron Tomlin <atomlin@redhat.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Metcalf [Sat, 8 Oct 2016 00:02:45 +0000 (17:02 -0700)]
nmi_backtrace: add more trigger_*_cpu_backtrace() methods
Patch series "improvements to the nmi_backtrace code" v9.
This patch series modifies the trigger_xxx_backtrace() NMI-based remote
backtracing code to make it more flexible, and makes a few small
improvements along the way.
The motivation comes from the task isolation code, where there are
scenarios where we want to be able to diagnose a case where some cpu is
about to interrupt a task-isolated cpu. It can be helpful to see both
where the interrupting cpu is, and also an approximation of where the
cpu that is being interrupted is. The nmi_backtrace framework allows us
to discover the stack of the interrupted cpu.
I've tested that the change works as desired on tile, and build-tested
x86, arm, mips, and sparc64. For x86 I confirmed that the generic
cpuidle stuff as well as the architecture-specific routines are in the
new cpuidle section. For arm, mips, and sparc I just build-tested it
and made sure the generic cpuidle routines were in the new cpuidle
section, but I didn't attempt to figure out which the platform-specific
idle routines might be. That might be more usefully done by someone
with platform experience in follow-up patches.
This patch (of 4):
Currently you can only request a backtrace of either all cpus, or all
cpus but yourself. It can also be helpful to request a remote backtrace
of a single cpu, and since we want that, the logical extension is to
support a cpumask as the underlying primitive.
This change modifies the existing lib/nmi_backtrace.c code to take a
cpumask as its basic primitive, and modifies the linux/nmi.h code to use
the new "cpumask" method instead.
The existing clients of nmi_backtrace (arm and x86) are converted to
using the new cpumask approach in this change.
The other users of the backtracing API (sparc64 and mips) are converted
to use the cpumask approach rather than the all/allbutself approach.
The mips code ignored the "include_self" boolean but with this change it
will now also dump a local backtrace if requested.
Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Berg [Sat, 8 Oct 2016 00:02:42 +0000 (17:02 -0700)]
min/max: remove sparse warnings when they're nested
Currently, when min/max are nested within themselves, sparse will warn:
warning: symbol '_min1' shadows an earlier one
originally declared here
warning: symbol '_min1' shadows an earlier one
originally declared here
warning: symbol '_min2' shadows an earlier one
originally declared here
This also immediately happens when min3() or max3() are used.
Since sparse implements __COUNTER__, we can use __UNIQUE_ID() to
generate unique variable names, avoiding this.
Robert Ho [Sat, 8 Oct 2016 00:02:39 +0000 (17:02 -0700)]
Documentation/filesystems/proc.txt: add more description for maps/smaps
Add some more description on the limitations for smaps/maps readings, as
well as some guaruntees we can make.
Link: http://lkml.kernel.org/r/1475296958-27652-2-git-send-email-robert.hu@intel.com Signed-off-by: Robert Ho <robert.hu@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Robert Hu <robert.hu@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Actually, this bug is not only for NVDIMM/DAX but also for any other
file systems. This simple test case abstracted from nvml can easily
reproduce this bug in common environment:
int retval = 0; /* assume false until proven otherwise */
char line[PROCMAXLEN]; /* for fgets() */
char *lo = NULL; /* beginning of current range in smaps file */
char *hi = NULL; /* end of current range in smaps file */
int needmm = 0; /* looking for mm flag for current range */
while (fgets(line, PROCMAXLEN, fp) != NULL) {
static const char vmflags[] = "VmFlags:";
static const char mm[] = " wr";
/* check for range line */
if (sscanf(line, "%p-%p", &lo, &hi) == 2) {
if (needmm) {
/* last range matched, but no mm flag found */
printf("never found mm flag.\n");
break;
} else if (caddr < lo) {
/* never found the range for caddr */
printf("#######no match for addr %p.\n", caddr);
break;
} else if (caddr < hi) {
/* start address is in this range */
size_t rangelen = (size_t)(hi - caddr);
/* remember that matching has started */
needmm = 1;
/* calculate remaining range to search for */
if (len > rangelen) {
len -= rangelen;
caddr += rangelen;
printf("matched %zu bytes in range "
"%p-%p, %zu left over.\n",
rangelen, lo, hi, len);
} else {
len = 0;
printf("matched all bytes in range "
"%p-%p.\n", lo, hi);
}
}
} else if (needmm && strncmp(line, vmflags,
sizeof(vmflags) - 1) == 0) {
if (strstr(&line[sizeof(vmflags) - 1], mm) != NULL) {
printf("mm flag found.\n");
if (len == 0) {
/* entire range matched */
retval = 1;
break;
}
needmm = 0; /* saw what was needed */
} else {
/* mm flag not set for some or all of range */
printf("range has no mm flag.\n");
break;
}
}
}
/* kick off NTHREAD threads */
for (int i = 0; i < NTHREAD; i++)
pthread_create(&threads[i], NULL, worker, &ret[i]);
/* wait for all the threads to complete */
for (int i = 0; i < NTHREAD; i++)
pthread_join(threads[i], NULL);
/* verify that all the threads return the same value */
for (int i = 1; i < NTHREAD; i++) {
if (ret[0] != ret[i]) {
printf("Error i %d ret[0] = %d ret[i] = %d.\n", i,
ret[0], ret[i]);
}
}
printf("%d", ret[0]);
return 0;
}
It failed as some threads can not find the memory region in
"/proc/self/smaps" which is allocated in the main process
It is caused by proc fs which uses 'file->version' to indicate the VMA that
is the last one has already been handled by read() system call. When the
next read() issues, it uses the 'version' to find the VMA, then the next
VMA is what we want to handle, the related code is as follows:
if (last_addr) {
vma = find_vma(mm, last_addr);
if (vma && (vma = m_next_vma(priv, vma)))
return vma;
}
However, VMA will be lost if the last VMA is gone, e.g:
The process VMA list is A->B->C->D
CPU 0 CPU 1
read() system call
handle VMA B
version = B
return to userspace
unmap VMA B
issue read() again to continue to get
the region info
find_vma(version) will get VMA C
m_next_vma(C) will get VMA D
handle D
!!! VMA C is lost !!!
In order to fix this bug, we make 'file->version' indicate the end address
of the current VMA. m_start will then look up a vma which with vma_start
< last_vm_end and moves on to the next vma if we found the same or an
overlapping vma. This will guarantee that we will not miss an exclusive
vma but we can still miss one if the previous vma was shrunk. This is
acceptable because guaranteeing "never miss a vma" is simply not feasible.
User has to cope with some inconsistencies if the file is not read in one
go.
[mhocko@suse.com: changelog fixes] Link: http://lkml.kernel.org/r/1475296958-27652-1-git-send-email-robert.hu@intel.com Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Robert Hu <robert.hu@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Stultz [Sat, 8 Oct 2016 00:02:33 +0000 (17:02 -0700)]
proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
In changing from checking ptrace_may_access(p, PTRACE_MODE_ATTACH_FSCREDS)
to capable(CAP_SYS_NICE), I missed that ptrace_my_access succeeds when p
== current, but the CAP_SYS_NICE doesn't.
Thus while the previous commit was intended to loosen the needed
privileges to modify a processes timerslack, it needlessly restricted a
task modifying its own timerslack via the proc/<tid>/timerslack_ns
(which is permitted also via the PR_SET_TIMERSLACK method).
This patch corrects this by checking if p == current before checking the
CAP_SYS_NICE value.
This patch applies on top of my two previous patches currently in -mm
Link: http://lkml.kernel.org/r/1471906870-28624-1-git-send-email-john.stultz@linaro.org Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Todd Kjos <tkjos@google.com> Cc: Colin Cross <ccross@android.com> Cc: Nick Kralevich <nnk@google.com> Cc: Dmitry Shmidt <dimitrysh@google.com> Cc: Elliott Hughes <enh@google.com> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Stultz [Sat, 8 Oct 2016 00:02:29 +0000 (17:02 -0700)]
proc: add LSM hook checks to /proc/<tid>/timerslack_ns
As requested, this patch checks the existing LSM hooks
task_getscheduler/task_setscheduler when reading or modifying the task's
timerslack value.
Previous versions added new get/settimerslack LSM hooks, but since they
checked the same PROCESS__SET/GETSCHED values as existing hooks, it was
suggested we just use the existing ones.
Link: http://lkml.kernel.org/r/1469132667-17377-2-git-send-email-john.stultz@linaro.org Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Todd Kjos <tkjos@google.com> Cc: Colin Cross <ccross@android.com> Cc: Nick Kralevich <nnk@google.com> Cc: Dmitry Shmidt <dimitrysh@google.com> Cc: Elliott Hughes <enh@google.com> Cc: James Morris <jmorris@namei.org> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>