Jens Axboe [Tue, 2 Aug 2016 14:45:44 +0000 (08:45 -0600)]
blk-mq: fix deadlock in blk_mq_register_disk() error path
If we fail registering any of the hardware queues, we call
into blk_mq_unregister_disk() with the hotplug mutex already
held. Since blk_mq_unregister_disk() attempts to acquire the
same mutex, we end up in a less than happy place.
Reported-by: Jinpu Wang <jinpu.wang@profitbricks.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Dan Williams [Sun, 31 Jul 2016 18:15:13 +0000 (11:15 -0700)]
block: fix bdi vs gendisk lifetime mismatch
The name for a bdi of a gendisk is derived from the gendisk's devt.
However, since the gendisk is destroyed before the bdi it leaves a
window where a new gendisk could dynamically reuse the same devt while a
bdi with the same name is still live. Arrange for the bdi to hold a
reference against its "owner" disk device while it is registered.
Otherwise we can hit sysfs duplicate name collisions like the following:
Cc: <stable@vger.kernel.org> Reported-by: Yi Zhang <yizhan@redhat.com> Tested-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fixed up missing 0 return in bdi_register_owner().
blk-mq: Allow timeouts to run while queue is freezing
In case a submitted request gets stuck for some reason, the block layer
can prevent the request starvation by starting the scheduled timeout work.
If this stuck request occurs at the same time another thread has started
a queue freeze, the blk_mq_timeout_work will not be able to acquire the
queue reference and will return silently, thus not issuing the timeout.
But since the request is already holding a q_usage_counter reference and
is unable to complete, it will never release its reference, preventing
the queue from completing the freeze started by first thread. This puts
the request_queue in a hung state, forever waiting for the freeze
completion.
This was observed while running IO to a NVMe device at the same time we
toggled the CPU hotplug code. Eventually, once a request got stuck
requiring a timeout during a queue freeze, we saw the CPU Hotplug
notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
the trace below.
The fix is to allow the timeout work to execute in the window between
dropping the initial refcount reference and the release of the last
reference, which actually marks the freeze completion. This can be
achieved with percpu_refcount_tryget, which does not require the counter
to be alive. This way the timeout work can do it's job and terminate a
stuck request even during a freeze, returning its reference and avoiding
the deadlock.
Allowing the timeout to run is just a part of the fix, since for some
devices, we might get stuck again inside the device driver's timeout
handler, should it attempt to allocate a new request in that path -
which is a quite common action for Abort commands, which need to be sent
after a timeout. In NVMe, for instance, we call blk_mq_alloc_request
from inside the timeout handler, which will fail during a freeze, since
it also tries to acquire a queue reference.
I considered a similar change to blk_mq_alloc_request as a generic
solution for further device driver hangs, but we can't do that, since it
would allow new requests to disturb the freeze process. I thought about
creating a new function in the block layer to support unfreezable
requests for these occasions, but after working on it for a while, I
feel like this should be handled in a per-driver basis. I'm now
experimenting with changes to the NVMe timeout path, but I'm open to
suggestions of ways to make this generic.
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: linux-nvme@lists.infradead.org Cc: linux-block@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
It seems fairly obvious that device_create_file() is not being protected
from being run concurrently on the same nbd.
Quentin found the following relevant commits:
1a2ad21 nbd: add locking to nbd_ioctl 90b8f28 [PATCH] end of methods switch: remove the old ones d4430d6 [PATCH] beginning of methods conversion 08f8585 [PATCH] move block_device_operations to blkdev.h
It would seem that the race was introduced in the process of moving nbd
from BKL to unlocked ioctls.
By setting nbd->task_recv while the mutex is held, we can prevent other
processes from running concurrently (since nbd->task_recv is also checked
while the mutex is held).
Reported-and-tested-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Markus Pargmann <mpa@pengutronix.de> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Pavel Machek <pavel@suse.cz> Cc: Jens Axboe <axboe@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
This problem can occur in the following situation:
open()
- pread()
- .seq_start()
- iter = kmalloc() // succeeds
- seqf->private = iter
- .seq_stop()
- kfree(seqf->private)
- pread()
- .seq_start()
- iter = kmalloc() // fails
- .seq_stop()
- class_dev_iter_exit(seqf->private) // boom! old pointer
As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.
An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
Merge 4fc29c1aa375 included this extra line, but it's not needed (or
useful) since we'll bio_set_op_attrs() right after to properly set
the op and flags for the bio.
Paolo Valente [Wed, 27 Jul 2016 05:22:05 +0000 (07:22 +0200)]
block: add missing group association in bio-cloning functions
When a bio is cloned, the newly created bio must be associated with
the same blkcg as the original bio (if BLK_CGROUP is enabled). If
this operation is not performed, then the new bio is not associated
with any group, and the group of the current task is returned when
the group of the bio is requested.
Depending on the cloning frequency, this may cause a large
percentage of the bios belonging to a given group to be treated
as if belonging to other groups (in most cases as if belonging to
the root group). The expected group isolation may thereby be broken.
This commit adds the missing association in bio-cloning functions.
Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers") Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Nikolay Borisov <kernel@kyup.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
Hou Tao [Tue, 26 Jul 2016 09:13:42 +0000 (17:13 +0800)]
blkcg: kill unused field nr_undestroyed_grps
'nr_undestroyed_grps' in struct throtl_data was used to count
the number of throtl_grp related with throtl_data, but now
throtl_grp is tracked by blkcg_gq, so it is useless anymore.
Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Tue, 26 Jul 2016 09:38:20 +0000 (11:38 +0200)]
writeback: Write dirty times for WB_SYNC_ALL writeback
Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
queue_io() so that inodes which have only dirty timestamps are properly
written on fsync(2) and sync(2). However there are other call sites -
most notably going through write_inode_now() - which expect inode to be
clean after WB_SYNC_ALL writeback. This is not currently true as we do
not clear I_DIRTY_TIME in __writeback_single_inode() even for
WB_SYNC_ALL writeback in all the cases. This then resulted in the
following oops because bdev_write_inode() did not clean the inode and
writeback code later stumbled over a dirty inode with detached wb.
Jiri Kosina [Thu, 16 Jun 2016 07:53:58 +0000 (09:53 +0200)]
floppy: fix open(O_ACCMODE) for ioctl-only open
Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().
Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.
Linus Torvalds [Thu, 4 Aug 2016 13:14:38 +0000 (09:14 -0400)]
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"The only interesting thing here is Jessica's patch to add
ro_after_init support to modules. The rest are all trivia"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
extable.h: add stddef.h so "NULL" definition is not implicit
modules: add ro_after_init support
jump_label: disable preemption around __module_text_address().
exceptions: fork exception table content from module.h into extable.h
modules: Add kernel parameter to blacklist modules
module: Do a WARN_ON_ONCE() for assert module mutex not held
Documentation/module-signing.txt: Note need for version info if reusing a key
module: Invalidate signatures on force-loaded modules
module: Issue warnings when tainting kernel
module: fix redundant test.
module: fix noreturn attribute for __module_put_and_exit()
Linus Torvalds [Thu, 4 Aug 2016 12:51:12 +0000 (08:51 -0400)]
Merge branch 'akpm' (patches from Andrew)
Merge even more updates from Andrew Morton:
- dma-mapping API cleanup
- a few cleanups and misc things
- use jump labels in dynamic-debug
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
dynamic_debug: add jump label support
jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL
arm: jump label may reference text in __exit
tile: support static_key usage in non-module __exit sections
sparc: support static_key usage in non-module __exit sections
powerpc: add explicit #include <asm/asm-compat.h> for jump label
drivers/media/dvb-frontends/cxd2841er.c: avoid misleading gcc warning
MAINTAINERS: update email and list of Samsung HW driver maintainers
block: remove BLK_DEV_DAX config option
samples/kretprobe: fix the wrong type
samples/kretprobe: convert the printk to pr_info/pr_err
samples/jprobe: convert the printk to pr_info/pr_err
samples/kprobe: convert the printk to pr_info/pr_err
dma-mapping: use unsigned long for dma_attrs
media: mtk-vcodec: remove unused dma_attrs
include/linux/bitmap.h: cleanup
tree-wide: replace config_enabled() with IS_ENABLED()
drivers/fpga/Kconfig: fix build failure
Jason Baron [Wed, 3 Aug 2016 20:46:39 +0000 (13:46 -0700)]
dynamic_debug: add jump label support
Although dynamic debug is often only used for debug builds, sometimes
its enabled for production builds as well. Minimize its impact by using
jump labels. This reduces the text section by 7000+ bytes in the kernel
image below. It does increase data, but this should only be referenced
when changing the direction of the branches, and hence usually not in
cache.
Jason Baron [Wed, 3 Aug 2016 20:46:36 +0000 (13:46 -0700)]
jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL
The current jump_label.h includes bug.h for things such as WARN_ON().
This makes the header problematic for inclusion by kernel.h or any
headers that kernel.h includes, since bug.h includes kernel.h (circular
dependency). The inclusion of atomic.h is similarly problematic. Thus,
this should make jump_label.h 'includable' from most places.
Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.com Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Baron [Wed, 3 Aug 2016 20:46:33 +0000 (13:46 -0700)]
arm: jump label may reference text in __exit
The jump table can reference text found in an __exit section. Thus,
instead of discarding it at build time, include EXIT_TEXT as part of
__init and it will be released when the system boots.
Link: http://lkml.kernel.org/r/60284113bb759121e8ae3e99af1535647e52123f.1467837322.git.jbaron@akamai.com Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Metcalf [Wed, 3 Aug 2016 20:46:30 +0000 (13:46 -0700)]
tile: support static_key usage in non-module __exit sections
Previously, all the __exit sections were just dropped by the link phase.
However, if there are static_key (jump label) constructs in __exit
sections that are not modules, the link fails with the message:
`.exit.text' referenced in section `__jump_table' of xxx.o:
defined in discarded section `.exit.text' of xxx.o
Support this usage by keeping the .exit.text sections in the final image
if JUMP_LABEL is defined, then discarding them once initialization is
complete.
Link: http://lkml.kernel.org/r/bfd7c107c610c30e992868ebfe2a5d796a097464.1467837322.git.jbaron@akamai.com Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Baron [Wed, 3 Aug 2016 20:46:27 +0000 (13:46 -0700)]
sparc: support static_key usage in non-module __exit sections
The jump table can reference text found in an __exit section. Thus,
instead of discarding it at build/link time, include EXIT_TEXT as part
of __init and release it at system boot time.
Without this patch the link fails with:
`.exit.text' referenced in section `__jump_table' of xxx.o:
defined in discarded section `.exit.text' of xxx.o
Link: http://lkml.kernel.org/r/d822da427ab07a02a394602eca687104ff682f83.1467837322.git.jbaron@akamai.com Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The addition of jump label support in dynamic_debug caused an unexpected
warning in exactly one file in the kernel:
drivers/media/dvb-frontends/cxd2841er.c: In function 'cxd2841er_tune_tc':
include/linux/dynamic_debug.h:134:3: error: 'carrier_offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
__dynamic_dev_dbg(&descriptor, dev, fmt, \
^~~~~~~~~~~~~~~~~
drivers/media/dvb-frontends/cxd2841er.c:3177:11: note: 'carrier_offset' was declared here
int ret, carrier_offset;
^~~~~~~~~~~~~~
The problem seems to be that the compiler gets confused by the extra
conditionals in static_branch_unlikely, to the point where it can no
longer keep track of which branches have already been taken, and it
doesn't realize that this variable is now always initialized when it
gets used.
I have done lots of randconfig kernel builds and could not find any
other file with this behavior, so I assume it's a rare enough glitch
that we don't need to change the jump label support but instead just
work around the warning in the driver.
To achieve that, I'm moving the check for the return value into the
switch() statement, which is an obvious transformation, but is enough to
un-confuse the compiler here. The resulting code is not as nice to
read, but at least we retain the behavior of warning if it gets changed
to actually access an uninitialized carrier offset value in the future.
Ross Zwisler [Wed, 3 Aug 2016 20:46:15 +0000 (13:46 -0700)]
block: remove BLK_DEV_DAX config option
The functionality for block device DAX was already removed with commit acc93d30d7d4 ("Revert "block: enable dax for raw block devices"")
However, we still had a config option hanging around that was always
disabled because it depended on CONFIG_BROKEN. This config option was
introduced in commit 03cdadb04077 ("block: disable block device DAX by
default")
This change reverts that commit, removing the dead config option.
Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Shijie [Wed, 3 Aug 2016 20:46:09 +0000 (13:46 -0700)]
samples/kretprobe: convert the printk to pr_info/pr_err
We prefer to use the pr_* to print out the log now, this patch converts
the printk to pr_info. In the error path, use the pr_err to replace the
printk.
Link: http://lkml.kernel.org/r/1464143083-3877-3-git-send-email-shijie.huang@arm.com Signed-off-by: Huang Shijie <shijie.huang@arm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Shijie [Wed, 3 Aug 2016 20:46:06 +0000 (13:46 -0700)]
samples/jprobe: convert the printk to pr_info/pr_err
We prefer to use the pr_* to print out the log now, this patch converts
the printk to pr_info. In the error path, use the pr_err to replace the
printk.
Link: http://lkml.kernel.org/r/1464143083-3877-2-git-send-email-shijie.huang@arm.com Signed-off-by: Huang Shijie <shijie.huang@arm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Shijie [Wed, 3 Aug 2016 20:46:03 +0000 (13:46 -0700)]
samples/kprobe: convert the printk to pr_info/pr_err
We prefer to use the pr_* to print out the log now, this patch converts
the printk to pr_info. In the error path, use the pr_err to replace the
printk.
Link: http://lkml.kernel.org/r/1464143083-3877-1-git-send-email-shijie.huang@arm.com Signed-off-by: Huang Shijie <shijie.huang@arm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler. Both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the
attributes are passed by value.
Semantic patches for this change (at least most of them):
Masahiro Yamada [Wed, 3 Aug 2016 20:45:50 +0000 (13:45 -0700)]
tree-wide: replace config_enabled() with IS_ENABLED()
The use of config_enabled() against config options is ambiguous. In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED(). Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
clearer.
This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.
I noticed two cases where config_enabled() is used against a tristate
option:
Paul Gortmaker [Thu, 28 Jul 2016 03:11:47 +0000 (23:11 -0400)]
extable.h: add stddef.h so "NULL" definition is not implicit
While not an issue now, eventually we will have independent users of
the extable.h file and we will stop sourcing it via module.h header.
In testing that pending work, with very sparse builds, characteristic
of an "allnoconfig" on various architectures, we can sometimes hit an
instance where the very basic standard definitions aren't present,
resulting in:
include/linux/extable.h:26:9: error: 'NULL' undeclared (first use in this function)
To be clear, this isn't a regression, since currently extable.h is
only used by module.h -- however, we will need this addition present
before we start migrating exception table users off module.h and onto
extable.h during the next release cycle.
Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add ro_after_init support for modules by adding a new page-aligned section
in the module layout (after rodata) for ro_after_init data and enabling RO
protection for that section after module init runs.
Rusty Russell [Wed, 27 Jul 2016 02:47:35 +0000 (12:17 +0930)]
jump_label: disable preemption around __module_text_address().
Steven reported a warning caused by not holding module_mutex or
rcu_read_lock_sched: his backtrace was corrupted but a quick audit
found this possible cause. It's wrong anyway...
Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Paul Gortmaker [Wed, 27 Jul 2016 02:36:34 +0000 (12:06 +0930)]
exceptions: fork exception table content from module.h into extable.h
For historical reasons (i.e. pre-git) the exception table stuff was
buried in the middle of the module.h file. I noticed this while
doing an audit for needless includes of module.h and found core
kernel files (both arch specific and arch independent) were just
including module.h for this.
The converse is also true, in that conventional drivers, be they
for filesystems or actual hardware peripherals or similar, do not
normally care about the exception tables.
Here we fork the exception table content out of module.h into a
new file called extable.h -- and temporarily include it into the
module.h itself.
Then we will work our way across the arch independent and arch
specific files needing just exception table content, and move
them off module.h and onto extable.h
Once that is done, we can remove the extable.h from module.h
and in doing it like this, we avoid introducing build failures
into the git history.
The gain here is that module.h gets a bit smaller, across all
modular drivers that we build for allmodconfig. Also the core
files that only need exception table stuff don't have an include
of module.h that brings in lots of extra stuff and just looks
generally out of place.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
modules: Add kernel parameter to blacklist modules
Blacklisting a module in linux has long been a problem. The current
procedure is to use rd.blacklist=module_name, however, that doesn't
cover the case after the initramfs and before a boot prompt (where one
is supposed to use /etc/modprobe.d/blacklist.conf to blacklist
runtime loading). Using rd.shell to get an early prompt is hit-or-miss,
and doesn't cover all situations AFAICT.
This patch adds this functionality of permanently blacklisting a module
by its name via the kernel parameter module_blacklist=module_name.
[v2]: Rusty, use core_param() instead of __setup() which simplifies
things.
[v3]: Rusty, undo wreckage from strsep()
[v4]: Rusty, simpler version of blacklisted()
Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-doc@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Steven Rostedt [Mon, 18 Jul 2016 20:29:24 +0000 (05:59 +0930)]
module: Do a WARN_ON_ONCE() for assert module mutex not held
When running with lockdep enabled, I triggered the WARN_ON() in the
module code that asserts when module_mutex or rcu_read_lock_sched are
not held. The issue I have is that this can also be called from the
dump_stack() code, causing us to enter an infinite loop...
Linus Torvalds [Wed, 3 Aug 2016 22:11:24 +0000 (18:11 -0400)]
drm: i915: fix build when DEBUG_FS is disabled
This clearly had never gotten tested, probably because you need a fairly
minimal configuration in order to disable DEBUG_FS (several other
options select it).
The dummy inline functions that were used for the no-DEBUG_FS case were
missing the argument names in the declarations.
Linus Torvalds [Wed, 3 Aug 2016 16:50:06 +0000 (12:50 -0400)]
Merge tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"A few updates and fixes:
- move the suppressing of the __builtin_return_address >0 warning to
the tracing directory only.
- metag recordmcount fix for newer glibc's
- two tracing histogram fixes that were reported by KASAN"
* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix use-after-free in hist_register_trigger()
tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
Makefile: Mute warning for __builtin_return_address(>0) for tracing only
ftrace/recordmcount: Work around for addition of metag magic but not relocations
Linus Torvalds [Tue, 2 Aug 2016 23:47:06 +0000 (19:47 -0400)]
Merge tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux
Pull orangefs update from Martin Brandenburg:
"Kernel side caching and executable bugfix
This allows OrangeFS to utilize the dcache and adds an in kernel
attribute cache. We previously used the user side client for this
purpose.
We see a modest performance increase on small file operations. For
example, without the cache, compiling coreutils takes about 17
minutes. With the patch and a 50 millisecond timeout for
dcache_timeout_msecs and getattr_timeout_msecs (the default),
compiling coreutils takes about 6 minutes 20 seconds. On the same
hardware, compiling coreutils on an xfs filesystem takes 90 seconds.
We see similar improvements with mdtest and a test involving writing,
reading, and deleting a large number of small files.
Interested parties can review more data at the following URL.
The eventual goal of this is to allow getdents to turn into a
readdirplus to the OrangeFS server. The cache will be filled then,
which should provide a performance benefit to the common case of
readdir followed by getattr on each entry (i.e. ls -l).
This also fixes a bug. When orangefs_inode_permission was added, it
did not collect i_size from the OrangeFS server, since this presses an
unnecessary load on the OrangeFS server. However, it left a case
where i_size is never initialized. Then running an executable could
fail.
With this patch, size is always collected to be inserted into the
cache. Thus the bug disappears. If this patch is not accepted during
this merge window, we will send a one-line band-aid for this bug
instead"
* tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux:
Orangefs: update orangefs.txt
orangefs: Account for jiffies wraparound.
orangefs: Change default dcache and getattr timeout to 50 msec.
orangefs: Allow dcache and getattr cache time to be configured.
orangefs: Cache getattr results.
orangefs: Use d_time to avoid excessive lookups
Linus Torvalds [Tue, 2 Aug 2016 23:39:09 +0000 (19:39 -0400)]
Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client
Pull Ceph updates from Ilya Dryomov:
"The highlights are:
- RADOS namespace support in libceph and CephFS (Zheng Yan and
myself). The stopgaps added in 4.5 to deny access to inodes in
namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
bit is now fully supported
- A large rework of the MDS cap flushing code (Zheng Yan)
- Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
overly pessimistic before, bailing at the first sight of LOOKUP_RCU
On top of that we've got a few CephFS bug fixes, a couple of cleanups
and Arnd's workaround for a weird genksyms issue"
* tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
ceph: fix symbol versioning for ceph_monc_do_statfs
ceph: Correctly return NXIO errors from ceph_llseek
ceph: Mark the file cache as unreclaimable
ceph: optimize cap flush waiting
ceph: cleanup ceph_flush_snaps()
ceph: kick cap flushes before sending other cap message
ceph: introduce an inode flag to indicates if snapflush is needed
ceph: avoid sending duplicated cap flush message
ceph: unify cap flush and snapcap flush
ceph: use list instead of rbtree to track cap flushes
ceph: update types of some local varibles
ceph: include 'follows' of pending snapflush in cap reconnect message
ceph: update cap reconnect message to version 3
ceph: mount non-default filesystem by name
libceph: fsmap.user subscription support
ceph: handle LOOKUP_RCU in ceph_d_revalidate
ceph: allow dentry_lease_is_valid to work under RCU walk
ceph: clear d_fsinfo pointer under d_lock
ceph: remove ceph_mdsc_lease_release
ceph: don't use ->d_time
...
Vegard Nossum [Tue, 2 Aug 2016 21:07:30 +0000 (14:07 -0700)]
kcov: allow more fine-grained coverage instrumentation
For more targeted fuzzing, it's better to disable kernel-wide
instrumentation and instead enable it on a per-subsystem basis. This
follows the pattern of UBSAN and allows you to compile in the kcov
driver without instrumenting the whole kernel.
To instrument a part of the kernel, you can use either
# for a single file in the current directory
KCOV_INSTRUMENT_filename.o := y
or
# for all the files in the current directory (excluding subdirectories)
KCOV_INSTRUMENT := y
or
# (same as above)
ccflags-y += $(CFLAGS_KCOV)
or
# for all the files in the current directory (including subdirectories)
subdir-ccflags-y += $(CFLAGS_KCOV)
init/Kconfig: add clarification for out-of-tree modules
It doesn't trim just symbols that are totally unused in-tree - it trims
the symbols unused by any in-tree modules actually built. If you've
done a 'make localmodconfig' and only build a hundred or so modules,
it's pretty likely that your out-of-tree module will come up lacking
something...
Hopefully this will save the next guy from a Homer Simpson "D'oh!"
moment.
Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.edu Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Michal Marek <mmarek@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Herring [Tue, 2 Aug 2016 21:07:24 +0000 (14:07 -0700)]
config: add android config fragments
Copy the config fragments from the AOSP common kernel android-4.4
branch. It is becoming possible to run mainline kernels with Android,
but the kernel defconfigs don't work as-is and debugging missing config
options is a pain. Adding the config fragments into the kernel tree,
makes configuring a mainline kernel as simple as:
make ARCH=arm multi_v7_defconfig android-base.config android-recommended.config
The following non-upstream config options were removed:
Alexey Dobriyan [Tue, 2 Aug 2016 21:07:21 +0000 (14:07 -0700)]
init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
Doing patches with allmodconfig kernel compiled and committing stuff
into local tree have unfortunate consequence: kernel version changes (as
it should) leading to recompiling and relinking of several files even if
they weren't touched (or interesting at all). This and "git-whatever"
figuring out current version slow down compilation for no good reason.
But lets face it, "allmodconfig" kernels don't care about kernel
version, they are simply compile check guinea pigs.
Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into
allmodconfig .config.
Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akash Goel [Tue, 2 Aug 2016 21:07:18 +0000 (14:07 -0700)]
relay: add global mode support for buffer-only channels
Commit 20d8b67c06fa ("relay: add buffer-only channels; useful for early
logging") added support to use channels with no associated files.
This is useful when the exact location of relay file is not known or the
the parent directory of relay file is not available, while creating the
channel and the logging has to start right from the boot.
But there was no provision to use global mode with buffer-only channels,
which is added by this patch, without modifying the interface where
initially there will be a dummy invocation of create_buf_file callback
through which kernel client can convey the need of a global buffer.
For the use case where drivers/kernel clients want a simple interface
for the userspace, which enables them to capture data/logs from relay
file inorder & without any post processing, support of Global buffer
mode is warranted.
Modules, like i915, using relay_open() in early init would have to later
register their buffer-only relays, once debugfs is available, by calling
relay_late_setup_files(). Hence relay_late_setup_files() symbol also
needs to be exported.
Link: http://lkml.kernel.org/r/1468404563-11653-1-git-send-email-akash.goel@intel.com Signed-off-by: Akash Goel <akash.goel@intel.com> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prarit Bhargava [Tue, 2 Aug 2016 21:07:15 +0000 (14:07 -0700)]
init: allow blacklisting of module_init functions
sprint_symbol_no_offset() returns the string "function_name
[module_name]" where [module_name] is not printed for built in kernel
functions. This means that the blacklisting code will fail when
comparing module function names with the extended string.
This patch adds the functionality to block a module's module_init()
function by finding the space in the string and truncating the
comparison to that length.
Link: http://lkml.kernel.org/r/1466124387-20446-1-git-send-email-prarit@redhat.com Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yang Shi <yang.shi@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Kees Cook <keescook@chromium.org> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e93762bbf681 ("w1: masters: omap_hdq: add support for 1-wire
mode") added a statement to clear the hdq_irqstatus flags in
hdq_read_byte().
If the hdq reading process is scheduled slowly or interrupts are
disabled for a while the hardware read activity might already be
finished on entry of hdq_read_byte(). And hdq_isr() already has set the
hdq_irqstatus to 0x6 (can be seen in debug mode) denoting that both, the
TXCOMPLETE and RXCOMPLETE interrupts occurred in parallel.
This means there is no need to wait and the hdq_read_byte() can just
read the byte from the hdq controller.
By resetting hdq_irqstatus to 0 the read process is forced to be always
waiting again (because the if statement always succeeds) but the
hardware will not issue another RXCOMPLETE interrupt. This results in a
false timeout.
Andrew F. Davis [Tue, 2 Aug 2016 21:07:09 +0000 (14:07 -0700)]
w1: add helper macro module_w1_family
The helper macro module_w1_family can be used in module drivers that
only register a w1 driver in their module init functions. Add this
macro and use it in all applicable drivers.
Link: http://lkml.kernel.org/r/20160531204313.20979-2-afd@ti.com Signed-off-by: Andrew F. Davis <afd@ti.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew F. Davis [Tue, 2 Aug 2016 21:07:06 +0000 (14:07 -0700)]
w1: remove need for ida and use PLATFORM_DEVID_AUTO
PLATFORM_DEVID_AUTO can be used to have the platform core assign a
unique ID instead of manually creating one with IDA. Do this in all
applicable drivers.
Link: http://lkml.kernel.org/r/20160531204313.20979-1-afd@ti.com Signed-off-by: Andrew F. Davis <afd@ti.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement changes made in RapidIO specification rev.3 to LP-Serial Physical
Layer register definitions:
- use per-port register offset calculations based on LP-Serial Extended
Features Block (EFB) Register Map type (I or II) with different
per-port offset step (0x20 vs 0x40 respectfully).
- remove deprecated Parallel Physical layer definitions and related
code.
Current definition of map_inb() mport operations callback uses u32 type
to specify required inbound window (IBW) size. This is limiting factor
because existing hardware - tsi721 and fsl_rio, both support IBW size up
to 16GB.
Changing type of size parameter to u64 to allow IBW size configurations
larger than 4GB.
[alexandre.bounine@idt.com: remove compiler warning about size of constant] Link: http://lkml.kernel.org/r/20160802184856.2566-1-alexandre.bounine@idt.com Link: http://lkml.kernel.org/r/1469125134-16523-11-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio/tsi721_dma: advance queue processing from transfer submit call
Add advancing transfer queue immediately from transfer submit call. DMA
performance improvement: This will start transfer without waiting for
'issue_pending' command if there is no DMA transfer in progress.
Link: http://lkml.kernel.org/r/1469125134-16523-8-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add module parameter to allow load time configuration of available
RapidIO messaging mailboxes (MBOX1 - MBOX4).
Having a messaging MBOX selector mask allows to define which MBOXes are
controlled by the mport device driver and reserve some of them for
direct use by other drivers.
Link: http://lkml.kernel.org/r/1469125134-16523-7-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add PCIe Maximum Read Request Size (MRRS) adjustment parameter to allow
users to override configuration register value set during PCIe bus
initialization.
Performance of Tsi721 device as PCIe bus master can be improved if MRRS
is set to its maximum value (4096 bytes). Some platforms have
limitations for supported MRRS and therefore the default value should be
preserved, unless it is known that given platform supports full set of
MRRS values defined by PCI Express specification.
Link: http://lkml.kernel.org/r/1469125134-16523-6-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio/tsi721_dma: add channel mask and queue size parameters
Add module parameters to allow load time configuration of DMA channels.
Depending on application, performance of DMA data transfers can benefit
from adjusted sizes of buffer descriptor ring and/or transaction
requests queue.
Having HW DMA channel selector mask allows to define which channels
(from seven available) are controlled by the mport device driver and
reserve some of them for direct use by other drivers.
Link: http://lkml.kernel.org/r/1469125134-16523-5-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: fix return value description for dma_prep functions
Update return value description for rio_dma_prep_... functions to
include error-valued pointer that can be returned by HW mport device
drivers. Return values from these functions must be checked using
IS_ERR_OR_NULL macro.
This patch is applicable to kernel versions starting from v4.6-rc1.
Link: http://lkml.kernel.org/r/1469125134-16523-4-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:06:28 +0000 (14:06 -0700)]
rapidio: remove unnecessary 0x prefixes before %pa extension uses
Patch series "RapidIO subsystem updates".
This set of patches contains RapidIO subsystem fixes and updates that
have been made since kernel v4.6. The most significant update brings
changes related to the latest revision of RapidIO specification
(rev.3.x) and introduction of next generation of RapidIO switches by IDT
(RXS1632 and RXS2448).
This patch (of 13):
This is RapidIO part of the original patch submitted by Joe Perches.
(see: https://lkml.org/lkml/2016/3/5/19)
Since commit 3cab1e711297 ("lib/vsprintf: refactor duplicate code
to special_hex_number()") %pa uses have been output with a 0x prefix.
These 0x prefixes in the formats are unnecessary.
Link: http://lkml.kernel.org/r/1469125134-16523-2-git-send-email-alexandre.bounine@idt.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add channelized messaging driver to support native RapidIO messaging
exchange between multiple senders/recipients on devices that use kernel
RapidIO subsystem services.
This device driver is the result of collaboration within the RapidIO.org
Software Task Group (STG) between Texas Instruments, Prodrive
Technologies, Nokia Networks, BAE and IDT. Additional input was
received from other members of RapidIO.org.
The objective was to create a character mode driver interface which
exposes messaging capabilities of RapidIO endpoint devices (mports)
directly to applications, in a manner that allows the numerous and
varied RapidIO implementations to interoperate.
This char mode device driver allows user-space applications to setup
messaging communication channels using single shared RapidIO messaging
mailbox.
By default this driver uses RapidIO MBOX_1 (MBOX_0 is reserved for use by
RIONET Ethernet emulation driver).
zhong jiang [Tue, 2 Aug 2016 21:06:22 +0000 (14:06 -0700)]
kexec: add restriction on kexec_load() segment sizes
I hit the following issue when run trinity in my system. The kernel is
3.4 version, but mainline has the same issue.
The root cause is that the segment size is too large so the kerenl
spends too long trying to allocate a page. Other cases will block until
the test case quits. Also, OOM conditions will occur.
Petr Tesarik [Tue, 2 Aug 2016 21:06:19 +0000 (14:06 -0700)]
kexec: allow kdump with crash_kexec_post_notifiers
If a crash kernel is loaded, do not crash the running domain. This is
needed if the kernel is loaded with crash_kexec_post_notifiers, because
panic notifiers are run before __crash_kexec() in that case, and this
Xen hook prevents its being called later.
[akpm@linux-foundation.org: build fix: unconditionally include kexec.h] Link: http://lkml.kernel.org/r/20160713122000.14969.99963.stgit@hananiah.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Petr Tesarik [Tue, 2 Aug 2016 21:06:16 +0000 (14:06 -0700)]
kexec: add a kexec_crash_loaded() function
Provide a wrapper function to be used by kernel code to check whether a
crash kernel is loaded. It returns the same value that can be seen in
/sys/kernel/kexec_crash_loaded by userspace programs.
I'm exporting the function, because it will be used by Xen, and it is
possible to compile Xen modules separately to enable the use of PV
drivers with unmodified bare-metal kernels.
Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hidehiro Kawai [Tue, 2 Aug 2016 21:06:13 +0000 (14:06 -0700)]
kexec: use core_param for crash_kexec_post_notifiers boot option
crash_kexec_post_notifiers ia a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time. So, use core_param instead of early_param.
Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:10 +0000 (14:06 -0700)]
ARM: kexec: fix kexec for Keystone 2
Provide kexec with the boot view of memory by overriding the normal
kexec translation functions added in a previous patch. We also need to
fix a call to memblock in machine_kexec_prepare() so that we provide it
with a running-view physical address rather than a boot- view physical
address.
Link: http://lkml.kernel.org/r/E1b8koa-0004Hl-Ey@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit adds definition for cpu_on, cpu_off and cpu_suspend
commands. These definitions must match the corresponding PSCI
definitions in boot monitor.
Having those command and corresponding PSCI support in boot monitor
allows run time CPU hot plugin.
Link: http://lkml.kernel.org/r/E1b8koV-0004Hf-2j@rmk-PC.armlinux.org.uk Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Vitaly Andrianov <vitalya@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Pratyush Anand <panand@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:04 +0000 (14:06 -0700)]
kexec: allow architectures to override boot mapping
kexec physical addresses are the boot-time view of the system. For
certain ARM systems (such as Keystone 2), the boot view of the system
does not match the kernel's view of the system: the boot view uses a
special alias in the lower 4GB of the physical address space.
To cater for these kinds of setups, we need to translate between the
boot view physical addresses and the normal kernel view physical
addresses. This patch extracts the current transation points into
linux/kexec.h, and allows an architecture to override the functions.
Due to the translations required, we unfortunately end up with six
translation functions, which are reduced down to four that the
architecture can override.
[akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()] Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:00 +0000 (14:06 -0700)]
kdump: arrange for paddr_vmcoreinfo_note() to return phys_addr_t
On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
physical on 32-bit architectures, so we need a wider type than "unsigned
long" here. Arrange for paddr_vmcoreinfo_note() to return a
phys_addr_t, thereby allowing it to be located above 4GB.
This makes no difference for kexec-tools, as they already assume a
64-bit type when reading from this file.
Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:57 +0000 (14:05 -0700)]
kexec: ensure user memory sizes do not wrap
Ensure that user memory sizes do not wrap around when validating the
user input, which can lead to the following input validation working
incorrectly.
[akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch] Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:54 +0000 (14:05 -0700)]
kexec: don't invoke OOM-killer for control page allocation
If we are unable to find a suitable page when allocating the control
page, do not invoke the OOM-killer: killing processes probably isn't
going to help.
Link: http://lkml.kernel.org/r/E1b8ko9-0004HG-R5@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:51 +0000 (14:05 -0700)]
ARM: kexec: advertise location of bootable RAM
Advertise the location of bootable RAM to kexec-tools. kexec needs to
know where it can place the kernel in RAM, and so be executable when the
system needs to jump into it.
Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
tag.
Link: http://lkml.kernel.org/r/E1b8ko4-0004HA-GF@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Advertise a resource which describes where the crash kernel is located
in the boot view of RAM. This allows kexec-tools to have this vital
information.
Link: http://lkml.kernel.org/r/E1b8knz-0004H4-Bd@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minfei Huang [Tue, 2 Aug 2016 21:05:45 +0000 (14:05 -0700)]
kexec: return error number directly
This is a cleanup patch to make kexec more clear to return error number
directly. The variable result is useless, because there is no other
function's return value assignes to it. So remove it.
Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com Signed-off-by: Minfei Huang <mnghuan@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Lutomirski [Tue, 2 Aug 2016 21:05:36 +0000 (14:05 -0700)]
signal: consolidate {TS,TLF}_RESTORE_SIGMASK code
In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.
Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.
Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Tue, 2 Aug 2016 21:05:33 +0000 (14:05 -0700)]
reiserfs: fix "new_insert_key may be used uninitialized ..."
new_insert_key only makes any sense when it's associated with a
new_insert_ptr, which is initialized to NULL and changed to a
buffer_head when we also initialize new_insert_key. We can key off of
that to avoid the uninitialized warning.
Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:30 +0000 (14:05 -0700)]
nilfs2: move ioctl interface and disk layout to uapi separately
The header file "include/linux/nilfs2_fs.h" is composed of parts for
ioctl and disk format, and both are intended to be shared with user
space programs.
This moves them to the uapi directory "include/uapi/linux" splitting the
file to "nilfs2_api.h" and "nilfs2_ondisk.h". The following minor
changes are accompanied by this migration:
- nilfs_direct_node struct in nilfs2/direct.h is converged to
nilfs2_ondisk.h because it's an on-disk structure.
- inline functions nilfs_rec_len_from_disk() and
nilfs_rec_len_to_disk() are moved to nilfs2/dir.c.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:25 +0000 (14:05 -0700)]
nilfs2: fix misuse of a semaphore in sysfs code
Variables ns_seg_seq, ns_segnum, ns_nextnum, ns_pseg_offset, ns_cno,
ns_ctime, ns_nongc_ctime, and ns_ndirtyblks, are protected by
ns_segctor_sem, but ns_sem is wrongly used by the nilfs sysfs code when
reading these variables. This fixes the misuse and clarifies which
semaphore protects them in the comment of the_nilfs struct.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:22 +0000 (14:05 -0700)]
nilfs2: refactor parser of snapshot mount option
Move parser of snapshot mount option to a separate function
nilfs_parse_snapshot_option(), replace simple_strtoull() with
kstrtoull() to avoid checkpatch.pl warning "WARNING: simple_strtoull is
obsolete, use kstrtoull instead", and refine the error message of the
parser.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:19 +0000 (14:05 -0700)]
nilfs2: do not use yield()
Use cond_resched() instead of yield() in the loop of
nilfs_transaction_lock() since the usage corresponds to the "be nice for
others" case that the comment of yield() says.
This removes the following checkpatch.pl warning:
"WARNING: Using yield() is generally wrong. See yield() kernel-doc
(sched/core.c)"
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:17 +0000 (14:05 -0700)]
nilfs2: emit error message when I/O error is detected
When nilfs returned -EIO as an error code, it's not always clear if it
came from the underlying block device or not. This will mend the issue
by having low level I/O routines of nilfs output an error message when
they detected an I/O error.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:14 +0000 (14:05 -0700)]
nilfs2: replace nilfs_warning() with nilfs_msg()
Use nilfs_msg() to output warning messages and get rid of
nilfs_warning() function. This also removes function names from the
messages unless we embed them explicitly in format strings. Instead,
some messages are revised to clarify the context.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:10 +0000 (14:05 -0700)]
nilfs2: reduce bare use of printk() with nilfs_msg()
Replace most use of printk() in nilfs2 implementation with nilfs_msg(),
and reduce the following checkpatch.pl warning:
"WARNING: Prefer [subsystem eg: netdev]_crit([subsystem]dev, ...
then dev_crit(dev, ... then pr_crit(... to printk(KERN_CRIT ..."
This patch also fixes a minor checkpatch warning "WARNING: quoted string
split across lines" that often accompanies the prior warning, and amends
message format as needed.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:06 +0000 (14:05 -0700)]
nilfs2: embed a back pointer to super block instance in nilfs object
Insert a back pointer to super block instance in nilfs object so that
functions of nilfs2 easily refer to the super block instance. This
simplifies replacement of printk() in the successive change.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:02 +0000 (14:05 -0700)]
nilfs2: add nilfs_msg() message interface
Define an own output routine to replace bare use of printk() function.
The output routine is implemented with a macro and a helper function,
which are named nilfs_msg() and __nilfs_msg(), respectively.
__nilfs_msg() formats a message like "NILFS (<device-name>): <message>",
prefixing it with a given log level, and terminates the statement with a
newline. The "device-name" is optional to make it available in early
stages; it will be omitted if a NULL pointer is passed to super block
instance argument. nilfs_msg() wraps __nilfs_msg() and is removed if
CONFIG_PRINTK is not set.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:00 +0000 (14:05 -0700)]
nilfs2: hide function name argument from nilfs_error()
Simplify nilfs_error(), an output function used to report critical
issues in file system. This renames the original nilfs_error() function
to __nilfs_error() and redefines it as a macro to hide its function name
argument within the macro.
Every call site of nilfs_error() is changed to strip __func__ argument
except nilfs_bmap_convert_error(); nilfs_bmap_convert_error() directly
calls __nilfs_error() because it inherits caller's function name.