smp: Fix error case handling in smp_call_function_*()
Commit 8053871d0f7f ("smp: Fix smp_call_function_single_async()
locking") fixed the locking for the asynchronous smp-call case, but in
the process of moving the lock handling around, one of the error cases
ended up not unlocking the call data at all.
This went unnoticed on x86, because this is a "caller is buggy" case,
where the caller is trying to call a non-existent CPU. But apparently
ARM does that (at least under qemu-arm). Bindly doing cross-cpu calls
to random CPU's that aren't even online seems a bit fishy, but the error
handling was clearly not correct.
Simply add the missing "csd_unlock()" to the error path.
Pull sparc fixes from David Miller
"Unfortunately, I brown paper bagged the generic iommu pool allocator
by applying the wrong revision of the patch series.
This reverts the bad one, and puts the right one in"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
iommu-common: Fix PARISC compile-time warnings
sparc: Make LDC use common iommu poll management functions
sparc: Make sparc64 use scalable lib/iommu-common.c functions
Break up monolithic iommu table/lock into finer graularity pools and lock
sparc: Revert generic IOMMU allocator.
Merge tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9pfs updates from Eric Van Hensbergen:
"Some accumulated cleanup patches for kerneldoc and unused variables as
well as some lock bug fixes and adding privateport option for RDMA"
* tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
net/9p: add a privport option for RDMA transport.
fs/9p: Initialize status in v9fs_file_do_lock.
net/9p: Initialize opts->privport as it should be.
net/9p: use memcpy() instead of snprintf() in p9_mount_tag_show()
9p: use unsigned integers for nwqid/count
9p: do not crash on unknown lock status code
9p: fix error handling in v9fs_file_do_lock
9p: remove unused variable in p9_fd_create()
9p: kerneldoc warning fixes
David S. Miller [Sat, 18 Apr 2015 19:35:09 +0000 (12:35 -0700)]
Merge branch 'iommu-generic-allocator'
Sowmini Varadhan says:
====================
Generic IOMMU pooled allocator
Investigation of network performance on Sparc shows a high
degree of locking contention in the IOMMU allocator, and it
was noticed that the PowerPC code has a better locking model.
This patch series tries to extract the generic parts of the
PowerPC code so that it can be shared across multiple PCI
devices and architectures.
v10: resend patchv9 without RFC tag, and a new mail Message-Id,
(previous non-RFC attempt did not show up on the patchwork queue?)
Full revision history below:
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
- cookie_to_index mapping, and optimizations for span-boundary
check, for use case such as LDC.
v3: eliminate iommu_sparc, rearrange the ->demap indirection to
be invoked under the pool lock.
v4: David Miller review changes:
- s/IOMMU_ERROR_CODE/DMA_ERROR_CODE
- page_table_map_base and page_table_shift are unsigned long, not u32.
v5: removed ->cookie_to_index and ->demap indirection from the
iommu_tbl_ops The caller needs to call these functions as needed,
before invoking the generic arena allocator functions.
Added the "skip_span_boundary" argument to iommu_tbl_pool_init() for
those callers like LDC which do no care about span boundary checks.
v6: removed iommu_tbl_ops, and instead pass the ->flush_all as
an indirection to iommu_tbl_pool_init(); only invoke ->flush_all
when there is no large_pool, based on the assumption that large-pool
usage is infrequently encountered
v7: moved pool_hash initialization to lib/iommu-common.c and cleaned up
code duplication from sun4v/sun4u/ldc.
v8: Addresses BenH comments with one exception: I've left the
IOMMU_POOL_HASH as is, so that powerpc can tailor it to their
convenience. Discard trylock for simple spin_lock to acquire pool
v9: Addresses latest BenH comments: need_flush checks, add support
for dma mask and align_order.
v10: resend without RFC tag, and new mail Message-Id.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sparc: Make LDC use common iommu poll management functions
Note that this conversion is only being done to consolidate the
code and ensure that the common code provides the sufficient
abstraction. It is not expected to result in any noticeable
performance improvement, as there is typically one ldc_iommu
per vnet_port, and each one has 8k entries, with a typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>
sparc: Make sparc64 use scalable lib/iommu-common.c functions
In iperf experiments running linux as the Tx side (TCP client) with
10 threads results in a severe performance drop when TSO is disabled,
indicating a weakness in the software that can be avoided by using
the scalable IOMMU arena DMA allocation.
Baseline numbers before this patch:
with default settings (TSO enabled) : 9-9.5 Gbps
Disable TSO using ethtool- drops badly: 2-3 Gbps.
After this patch, iperf client with 10 threads, can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Break up monolithic iommu table/lock into finer graularity pools and lock
Investigation of multithreaded iperf experiments on an ethernet
interface show the iommu->lock as the hottest lock identified by
lockstat, with something of the order of 21M contentions out of
27M acquisitions, and an average wait time of 26 us for the lock.
This is not efficient. A more scalable design is to follow the ppc
model, where the iommu_map_table has multiple pools, each stretching
over a segment of the map, and with a separate lock for each pool.
This model allows for better parallelization of the iommu map search.
This patch adds the iommu range alloc/free function infrastructure.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull PMEM driver from Ingo Molnar:
"This is the initial support for the pmem block device driver:
persistent non-volatile memory space mapped into the system's physical
memory space as large physical memory regions.
The driver is based on Intel code, written by Ross Zwisler, with fixes
by Boaz Harrosh, integrated with x86 e820 memory resource management
and tidied up by Christoph Hellwig.
Note that there were two other separate pmem driver submissions to
lkml: but apparently all parties (Ross Zwisler, Boaz Harrosh) are
reasonably happy with this initial version.
This version enables minimal support that enables persistent memory
devices out in the wild to work as block devices, identified through a
magic (non-standard) e820 flag and auto-discovered if
CONFIG_X86_PMEM_LEGACY=y, or added explicitly through manipulating the
memory maps via the "memmap=..." boot option with the new, special '!'
modifier character.
Limitations: this is a regular block device, and since the pmem areas
are not struct page backed, they are invisible to the rest of the
system (other than the block IO device), so direct IO to/from pmem
areas, direct mmap() or XIP is not possible yet. The page cache will
also shadow and double buffer pmem contents, etc.
Initial support is for x86"
* 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
drivers/block/pmem: Fix 32-bit build warning in pmem_alloc()
drivers/block/pmem: Add a driver for persistent memory
x86/mm: Add support for the non-standard protected e820 type
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This tree includes:
- an FPU related crash fix
- a ptrace fix (with matching testcase in tools/testing/selftests/)
- an x86 Kconfig DMA-config defaults tweak to better avoid
non-working drivers"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected
x86/fpu: Load xsave pointer *after* initialization
x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
x86, selftests: Add single_step_syscall test
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"This update has mostly fixes, but also other bits:
- perf tooling fixes
- PMU driver fixes
- Intel Broadwell PMU driver HW-enablement for LBR callstacks
- a late coming 'perf kmem' tool update that enables it to also
analyze page allocation data. Note, this comes with MM tracepoint
changes that we believe to not break anything: because it changes
the formerly opaque 'struct page *' field that uniquely identifies
pages to 'pfn' which identifies pages uniquely too, but isn't as
opaque and can be used for other purposes as well"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()
perf/x86/intel: Add Broadwell support for the LBR callstack
perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units
perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events
perf/x86: Fix hw_perf_event::flags collision
perf probe: Fix segfault when probe with lazy_line to file
perf probe: Find compilation directory path for lazy matching
perf probe: Set retprobe flag when probe in address-based alternative mode
perf kmem: Analyze page allocator events also
tracing, mm: Record pfn instead of pointer to struct page
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two fixes: an smp-call fix and a lockdep fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Fix smp_call_function_single_async() locking
lockdep: Make print_lock() robust against concurrent release
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull usernamespace mount fixes from Eric Biederman:
"Way back in October Andrey Vagin reported that umount(MNT_DETACH)
could be used to defeat MNT_LOCKED. As I worked to fix this I
discovered that combined with mount propagation and an appropriate
selection of shared subtrees a reference to a directory on an
unmounted filesystem is not necessary.
That MNT_DETACH is allowed in user namespace in a form that can break
MNT_LOCKED comes from my early misunderstanding what MNT_DETACH does.
To avoid breaking existing userspace the conflict between MNT_DETACH
and MNT_LOCKED is fixed by leaving mounts that are locked to their
parents in the mount hash table until the last reference goes away.
While investigating this issue I also found an issue with
__detach_mounts. The code was unnecessarily and incorrectly
triggering mount propagation. Resulting in too many mounts going away
when a directory is deleted, and too many cpu cycles are burned while
doing that.
Looking some more I realized that __detach_mounts by only keeping
mounts connected that were MNT_LOCKED it had the potential to still
leak information so I tweaked the code to keep everything locked
together that possibly could be.
This code was almost ready last cycle but Al invented fs_pin which
slightly simplifies this code but required rewrites and retesting, and
I have not been in top form for a while so it took me a while to get
all of that done. Similiarly this pull request is late because I have
been feeling absolutely miserable all week.
The issue of being able to escape a bind mount has not yet been
addressed, as the fixes are not yet mature"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
mnt: Update detach_mounts to leave mounts connected
mnt: Fix the error check in __detach_mounts
mnt: Honor MNT_LOCKED when detaching mounts
fs_pin: Allow for the possibility that m_list or s_list go unused.
mnt: Factor umount_mnt from umount_tree
mnt: Factor out unhash_mnt from detach_mnt and umount_tree
mnt: Fail collect_mounts when applied to unmounted mounts
mnt: Don't propagate unmounts to locked mounts
mnt: On an unmount propagate clearing of MNT_LOCKED
mnt: Delay removal from the mount hash.
mnt: Add MNT_UMOUNT flag
mnt: In umount_tree reuse mnt_list instead of mnt_hash
mnt: Don't propagate umounts in __detach_mounts
mnt: Improve the umount_tree flags
mnt: Use hlist_move_list in namespace_unlock
Merge tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"New features:
- in-memory extent_cache
- fs_shutdown to test power-off-recovery
- use inline_data to store symlink path
- show f2fs as a non-misc filesystem
Major fixes:
- avoid CPU stalls on sync_dirty_dir_inodes
- fix some power-off-recovery procedure
- fix handling of broken symlink correctly
- fix missing dot and dotdot made by sudden power cuts
- handle wrong data index during roll-forward recovery
- preallocate data blocks for direct_io
... and a bunch of minor bug fixes and cleanups"
* tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (71 commits)
f2fs: pass checkpoint reason on roll-forward recovery
f2fs: avoid abnormal behavior on broken symlink
f2fs: flush symlink path to avoid broken symlink after POR
f2fs: change 0 to false for bool type
f2fs: do not recover wrong data index
f2fs: do not increase link count during recovery
f2fs: assign parent's i_mode for empty dir
f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries
f2fs: fix mismatching lock and unlock pages for roll-forward recovery
f2fs: fix sparse warnings
f2fs: limit b_size of mapped bh in f2fs_map_bh
f2fs: persist system.advise into on-disk inode
f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get
f2fs: preallocate fallocated blocks for direct IO
f2fs: enable inline data by default
f2fs: preserve extent info for extent cache
f2fs: initialize extent tree with on-disk extent info of inode
f2fs: introduce __{find,grab}_extent_tree
f2fs: split set_data_blkaddr from f2fs_update_extent_cache
f2fs: enable fast symlink by utilizing inline data
...
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation updates from Jonathan Corbet:
"Numerous fixes, the overdue removal of the i2o docs, some new Chinese
translations, and, hopefully, the README fix that will end the flow of
identical patches to that file"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (34 commits)
Documentation/memcg: update memcg/kmem status
Documentation: blackfin: Makefile: Typo building issue
Documentation/vm/pagemap.txt: correct location of page-types tool
Documentation/memory-barriers.txt: typo fix
doc: Add guest_nice column to example output of `cat /proc/stat'
Documentation/kernel-parameters: Move "eagerfpu" to its right place
Documentation: gpio: Update ACPI part of the document to mention _DSD
docs/completion.txt: Various tweaks and corrections
doc: completion: context, scope and language fixes
Documentation:Update Documentation/zh_CN/arm64/memory.txt
Documentation:Update Documentation/zh_CN/arm64/booting.txt
Documentation: Chinese translation of arm64/legacy_instructions.txt
DocBook media: fix broken EIA hyperlink
Documentation: tweak the maintainers entry
README: Change gzip/bzip2 to xz compression format
README: Update version number reference
doc:pci: Fix typo in Documentation/PCI
Documentation: drm: Use '->' when describing access through pointers.
Documentation: Remove mentioning of block barriers
Documentation/email-clients.txt: Fix one grammar mistake, add extra info about TB
...
Bluetooth: hidp: Fix regression with older userspace and flags validation
While it is not used by newer userspace anymore, the older userspace was
utilizing HIDP_VIRTUAL_CABLE_UNPLUG and HIDP_BOOT_PROTOCOL_MODE flags
when adding a new HIDP connection.
The flags validation is important, but we can not break older userspace
and with that allow providing these flags even if newer userspace does
not use them anymore.
config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected
A huge amount of NIC drivers use the DMA API, however if
compiled under 32-bit an very important part of the DMA API can
be ommitted leading to the drivers not working at all
(especially if used with 'swiotlb=force iommu=soft').
As Prashant Sreedharan explains it: "the driver [tg3] uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of
the dma "mapping" and dma_unmap_addr() to get the "mapping"
value. On most of the platforms this is a no-op, but ... with
"iommu=soft and swiotlb=force" this house keeping is required,
... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_
instead of the DMA address."
As such enable this even when using 32-bit kernels.
Reported-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Chan <mchan@broadcom.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: cascardo@linux.vnet.ibm.com Cc: david.vrabel@citrix.com Cc: sanjeevb@broadcom.com Cc: siva.kallam@broadcom.com Cc: vyasevich@gmail.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/20150417190448.GA9462@l.oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
Pull devicetree changes from Grant Likely:
"Here are the devicetree changes queued up for v4.1. Nothing really
exciting here. Rob has another few commits for big-endian attached
UARTs, but those will be sent in a separate merge request since they
haven't been as thoroughly tested as this batch.
Here are the highlights:
- lots of unittest cleanup from Frank Rowand
- bugfixes and updates to the of_graph code
- tighten up of_get_mac_address() code
- documentation updates"
* tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux:
of/unittest: Fix of_platform_depopulate test case
of/unittest: early return from test skips tests
of/unittest: breadcrumbs to reduce pain of future maintainers
of/unittest: reduce checkpatch noise - line after declarations
of/unittest: typo in error string
of/unittest: add const where needed
of_net: factor out repetitive code from of_get_mac_address()
drivers/of: Add empty ranges quirk for PA-Semi
of: Allow selection of OF_DYNAMIC and OF_OVERLAY if OF_UNITTEST
of: Empty node & property flag accessors when !OF
of: Explicitly include linux/types.h in of_graph.h
dt-bindings: brcm: rationalize Broadcom documentation naming
of/unittest: replace 'selftest' with 'unittest'
Documentation: rename of_selftest.txt to of_unittest.txt
Documentation: update the of_selftest.txt
dt: OF_UNITTEST make dependency broken
MAINTAINERS: Pantelis Antoniou device tree overlay maintainer
of: Add of_graph_get_port_by_id function
of: Add for_each_endpoint_of_node helper macro
of: Decrement refcount of previous endpoint in of_graph_get_next_endpoint
Merge tag 'gpio-v4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v4.1 development cycle:
- A new GPIO hogging mechanism has been added. This can be used on
boards that want to drive some GPIO line high, low, or set it as
input on boot and then never touch it again. For some embedded
systems this is bliss and simplifies things to a great extent.
- Some API cleanup and closure: gpiod_get_array() and
gpiod_put_array() has been added to get and put GPIOs in bulk as
was possible with the non-descriptor API.
- Encapsulate cross-calls to the pin control subsystem in
<linux/gpio/driver.h>. Now this should be the only header any GPIO
driver needs to include or something is wrong. Cleanups
restricting drivers to this include are welcomed if tested.
- Sort the GPIO Kconfig and split it into submenus, as it was
becoming and unstructured, illogical and unnavigatable mess. I
hope this is easier to follow. Menus that require a certain
subsystem like I2C can now be hidden nicely for example, still
working on others.
- New drivers:
- New driver for the Altera Soft GPIO.
- The F7188x driver now handles the F71869 and F71869A variants.
- The MIPS Loongson driver has been moved to drivers/gpio for
consolidation and cleanup.
- Cleanups:
- The MAX732x is converted to use the GPIOLIB_IRQCHIP
infrastructure.
- The PCF857x is converted to use the GPIOLIB_IRQCHIP
infrastructure.
- Radical cleanup of the OMAP driver.
- Misc:
- Enable the DWAPB GPIO for all architectures. This is a "hard
IP" block from Synopsys which has started to turn up in so
diverse architectures as X86 Quark, ARC and a slew of ARM
systems. So even though it's not an expander, it's generic
enough to be available for all.
- We add a mock GPIO on Crystalcove PMIC after a long discussion
with Daniel Vetter et al, tracing back to the shootout at the
kernel summit where DRM drivers and sub-componentization was
discussed. In this case a mock GPIO is assumed to be the best
compromise gaining some reuse of infrastructure without making
DRM drivers overly complex at the same time. Let's see"
* tag 'gpio-v4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (62 commits)
Revert "gpio: sch: use uapi/linux/pci_ids.h directly"
gpio: dwapb: remove dependencies
gpio: dwapb: enable for ARC
gpio: removing kfree remove functionality
gpio: mvebu: Fix mask/unmask managment per irq chip type
gpio: split GPIO drivers in submenus
gpio: move MFD GPIO drivers under their own comment
gpio: move BCM Kona Kconfig option
gpio: arrange SPI Kconfig symbols alphabetically
gpio: arrange PCI GPIO controllers alphabetically
gpio: arrange I2C Kconfig symbols alphabetically
gpio: arrange Kconfig symbols alphabetically
gpio: ich: Implement get_direction function
gpio: use (!foo) instead of (foo == NULL)
gpio: arizona: drop owner assignment from platform_drivers
gpio: max7300: remove 'ret' variable
gpio: use devm_kzalloc
gpio: sch: use uapi/linux/pci_ids.h directly
gpio: x-gene: fix devm_ioremap_resource() check
gpio: loongson: Add Loongson-3A/3B GPIO driver support
...
Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two buildfixes for I2C based on your tree from the day before
yesterday. Sadly, these build errors never reached me while the
patches were sitting in -next. Need to fix that"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: core: Export bus recovery functions
i2c: jz4780: Fix build for m68k and sparc64
Merge tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- the most extensive changes this cycle are the DM core improvements to
add full blk-mq support to request-based DM.
- disabled by default but user can opt-in with CONFIG_DM_MQ_DEFAULT
- depends on some blk-mq changes from Jens' for-4.1/core branch so
that explains why this pull is built on linux-block.git
- update DM to use name_to_dev_t() rather than open-coding a less
capable device parser.
- includes a couple small improvements to name_to_dev_t() that offer
stricter constraints that DM's code provided.
- improvements to the dm-cache "mq" cache replacement policy.
- a DM crypt crypt_ctr() error path fix and an async crypto deadlock
fix
- a small efficiency improvement for DM crypt decryption by leveraging
immutable biovecs
- add error handling modes for corrupted blocks to DM verity
- a new "log-writes" DM target from Josef Bacik that is meant for file
system developers to test file system integrity at particular points
in the life of a file system
- a few DM log userspace cleanups and fixes
- a few Documentation fixes (for thin, cache, crypt and switch)
* tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (34 commits)
dm crypt: fix missing error code return from crypt_ctr error path
dm crypt: fix deadlock when async crypto algorithm returns -EBUSY
dm crypt: leverage immutable biovecs when decrypting on read
dm crypt: update URLs to new cryptsetup project page
dm: add log writes target
dm table: use bool function return values of true/false not 1/0
dm verity: add error handling modes for corrupted blocks
dm thin: remove stale 'trim' message documentation
dm delay: use msecs_to_jiffies for time conversion
dm log userspace base: fix compile warning
dm log userspace transfer: match wait_for_completion_timeout return type
dm table: fall back to getting device using name_to_dev_t()
init: stricter checking of major:minor root= values
init: export name_to_dev_t and mark name argument as const
dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr
dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq
dm: add full blk-mq support to request-based DM
dm: impose configurable deadline for dm_request_fn's merge heuristic
dm sysfs: introduce ability to add writable attributes
dm: don't start current request if it would've merged with the previous
...
perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()
Dan Carpenter reported that pt_event_add() has buggy
error handling logic: it returns 0 instead of -EBUSY when
it fails to start a newly added event.
Furthermore, the control flow in this function is messy,
with cleanup labels mixed with direct returns.
Fix the bug and clean up the code by converting it to
a straight fast path for the regular non-failing case,
plus a clear sequence of cascading goto labels to do
all cleanup.
NOTE: I materially changed the existing clean up logic in the
pt_event_start() failure case to use the direct
perf_aux_output_end() path, not pt_event_del(), because
perf_aux_output_end() is enough here.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150416103830.GB7847@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
1) Fix verifier memory corruption and other bugs in BPF layer, from
Alexei Starovoitov.
2) Add a conservative fix for doing BPF properly in the BPF classifier
of the packet scheduler on ingress. Also from Alexei.
3) The SKB scrubber should not clear out the packet MARK and security
label, from Herbert Xu.
4) Fix oops on rmmod in stmmac driver, from Bryan O'Donoghue.
5) Pause handling is not correct in the stmmac driver because it
doesn't take into consideration the RX and TX fifo sizes. From
Vince Bridgers.
6) Failure path missing unlock in FOU driver, from Wang Cong.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
net: dsa: use DEVICE_ATTR_RW to declare temp1_max
netns: remove BUG_ONs from net_generic()
IB/ipoib: Fix ndo_get_iflink
sfc: Fix memcpy() with const destination compiler warning.
altera tse: Fix network-delays and -retransmissions after high throughput.
net: remove unused 'dev' argument from netif_needs_gso()
act_mirred: Fix bogus header when redirecting from VLAN
inet_diag: fix access to tcp cc information
tcp: tcp_get_info() should fetch socket fields once
net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state()
skbuff: Do not scrub skb mark within the same name space
Revert "net: Reset secmark when scrubbing packet"
bpf: fix two bugs in verification logic when accessing 'ctx' pointer
bpf: fix bpf helpers to use skb->mac_header relative offsets
stmmac: Configure Flow Control to work correctly based on rxfifo size
stmmac: Enable unicast pause frame detect in GMAC Register 6
stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree
stmmac: Add defines and documentation for enabling flow control
stmmac: Add properties for transmit and receive fifo sizes
stmmac: fix oops on rmmod after assigning ip addr
...
Pull sparc updates from David Miller:
"The PowerPC folks have a really nice scalable IOMMU pool allocator
that we wanted to make use of for sparc. So here we have a series
that abstracts out their code into a common layer that anyone can make
use of.
Sparc is converted, and the PowerPC folks have reviewed and ACK'd this
series and plan to convert PowerPC over as well"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
iommu-common: Fix PARISC compile-time warnings
sparc: Make LDC use common iommu poll management functions
sparc: Make sparc64 use scalable lib/iommu-common.c functions
sparc: Break up monolithic iommu table/lock into finer graularity pools and lock
Pull arch/tile updates from Chris Metcalf:
"These are mostly nohz_full changes, plus a smattering of minor fixes
(notably a couple for ftrace)"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: nohz: warn if nohz_full uses hypervisor shared cores
tile: ftrace: fix function_graph tracer issues
tile: map data region shadow of kernel as R/W
tile: support CONTEXT_TRACKING and thus NOHZ_FULL
tile: support arch_irq_work_raise
arch: tile: fix null pointer dereference on pt_regs pointer
tile/elf: reorganize notify_exec()
tile: use si_int instead of si_ptr for compat_siginfo
Vivien Didelot [Fri, 17 Apr 2015 19:12:25 +0000 (15:12 -0400)]
net: dsa: use DEVICE_ATTR_RW to declare temp1_max
Since commit da4759c (sysfs: Use only return value from is_visible for
the file mode), it is possible to reduce the permissions of a file.
So declare temp1_max with the DEVICE_ATTR_RW macro and remove the write
permission in dsa_hwmon_attrs_visible if set_temp_limit isn't provided.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for MIPS for Linux 4.1. Most
noteworthy:
- Add more Octeon-optimized crypto functions
- Octeon crypto preemption and locking fixes
- Little endian support for Octeon
- Use correct CSR to soft reset Octeons
- Support LEDs on the Octeon-based DSR-1000N
- Fix PCI interrupt mapping for the Octeon-based DSR-1000N
- Mark prom_free_prom_memory() as __init for a number of systems
- Support for Imagination's Pistachio SOC. This includes arch and
CLK bits. I'd like to merge pinctrl bits later
- Improve parallelism of csum_partial for certain pipelines
- Organize DTB files in subdirs like other architectures
- Implement read_sched_clock for all MIPS platforms other than
Octeon
- Massive series of 38 fixes and cleanups for the FPU emulator /
kernel
- Further FPU remulator work to support new features. This sits on a
separate branch which also has been pulled into the 4.1 KVM branch
- Clean up and fixes for the SEAD3 eval board; remove unused file
- Various updates for Netlogic platforms
- A number of small updates for Loongson 3 platforms
- Increase the memory limit for ATH79 platforms to 256MB
- A fair number of fixes and updates for BCM47xx platforms
- Finish the implementation of XPA support
- MIPS FDC support. No, not floppy controller but Fast Debug Channel :)
- Detect the R16000 used in SGI legacy platforms
- Fix Kconfig dependencies for the SSB bus support"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (265 commits)
MIPS: Makefile: Fix MIPS ASE detection code
MIPS: asm: elf: Set O32 default FPU flags
MIPS: BCM47XX: Fix detecting Microsoft MN-700 & Asus WL500G
MIPS: Kconfig: Disable SMP/CPS for 64-bit
MIPS: Hibernate: flush TLB entries earlier
MIPS: smp-cps: cpu_set FPU mask if FPU present
MIPS: lose_fpu(): Disable FPU when MSA enabled
MIPS: ralink: add missing symbol for RALINK_ILL_ACC
MIPS: ralink: Fix bad config symbol in PCI makefile.
SSB: fix Kconfig dependencies
MIPS: Malta: Detect and fix bad memsize values
Revert "MIPS: Avoid pipeline stalls on some MIPS32R2 cores."
MIPS: Octeon: Delete override of cpu_has_mips_r2_exec_hazard.
MIPS: Fix cpu_has_mips_r2_exec_hazard.
MIPS: kernel: entry.S: Set correct ISA level for mips_ihb
MIPS: asm: spinlock: Fix addiu instruction for R10000_LLSC_WAR case
MIPS: r4kcache: Use correct base register for MIPS R6 cache flushes
MIPS: Kconfig: Fix typo for the r2-to-r6 emulator kernel parameter
MIPS: unaligned: Fix regular load/store instruction emulation for EVA
MIPS: unaligned: Surround load/store macros in do {} while statements
...
On 04/14/2015 08:37 PM, David Miller wrote:
> That BUG_ON() was added 7 years ago, and I don't remember it ever
> triggering or helping us diagnose something, so just remove it and
> keep the function inlined.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> CC: David S. Miller <davem@davemloft.net> CC: Jan Engelhardt <jengelh@medozas.de> CC: Jiri Pirko <jpirko@redhat.com> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, iflink of the parent interface was always accessed, even
when interface didn't have a parent and hence we crashed there.
Handle the interface types properly: for a child interface, return
the ifindex of the parent, for parent interface, return its ifindex.
For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(), this allows the new ndo to be called
by the stack immediately after the child device is registered.
Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink') Reported-by: Honggang Li <honli@redhat.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Honggang Li <honli@redhat.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>+ Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 Apr 2015 19:15:40 +0000 (15:15 -0400)]
sfc: Fix memcpy() with const destination compiler warning.
drivers/net/ethernet/sfc/selftest.c: In function ‘efx_iterate_state’:
drivers/net/ethernet/sfc/selftest.c:388:9: warning: passing argument 1 of ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers]
This is because the msg[] member of struct efx_loopback_payload
is marked as 'const'. Remove that.
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Oetken [Thu, 16 Apr 2015 21:48:08 +0000 (23:48 +0200)]
altera tse: Fix network-delays and -retransmissions after high throughput.
Fix bug which occurs when more than <limit> packets are available during
napi-poll, leading to "delays" and retransmissions on the network.
Check for (count < limit) before checking the get_rx_status in tse_rx-function.
Function get_rx_status is reading from the response-fifo.
If there is currently a response in the fifo,
reading the last byte of the response pops the value from the fifo.
If the limit is checked as second condition
and the limit is reached the fifo is popped but the packet is not processed.
Signed-off-by: Andreas Oetken <ennoerlangen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block core fix from Jens Axboe:
"A commit in the previous pull request introduce a regression. So far
only observed on qemu-sparc64, but it's a general bug. Please pull
this single fix to rectify that, thanks"
[ And it turns out that it's been seen outside of that qemu-sparc64
case, and is easy to trigger with small number of CPUs and blk-mq
enabled by default - Linus ]
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: fix iteration of busy bitmap
Merge tag 'acpica-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPICA updates from Rafael Wysocki:
"This updates the kernel's ACPICA code to upstream revision 20150410
and adds a fix for a GPE handling regression introduced during the
3.19 cycle on top of that.
Included are two stable-candidate bug fixes (one of them fixing a 3.16
regression), multiple other fixes and a bunch of cleanups.
Specifics:
- Fix for a GPE handling regression on Dell Latitude D600 that caused
GPE signaling to stop working on that machine, which appears to be
due to a hardware glitch, but it used to work and it can be made
work again in a relativly straightforward way (Rafael J Wysocki).
- Fix for a mutex unlock regression related to the handling of ACPI
tables introduced during the 3.16 development cycle (Octavian
Purdila).
- _REV modification to always return 2 which has been done by all
versions of Windows since NT and the firmware people started to use
it to distinguish between OSes in their AML and do some silly and
wrong things on that basis (Bob Moore).
- Fixes and cleanups related to the acpi_physicall_address data type
including one stable-candidate fix for an issue occasionally
occuring on 64-bit machines running 32-bit kernels where using
offsets provided by the firmware may lead to address overflows (Lv
Zheng).
- External() opcode support infrastructure needed for recompiling
disassembled ACPI tables in some cases including interpreter
modification to ignore that opcode (Bob Moore).
- Support for the "Windows 2015" string in _OSI (Bob Moore).
- GPE debug interface change to return values read from hardware
registers (Lv Zheng).
- Removal of the __DATE__ macro usage in tools (Rasmus Villemoes).
- Assorted minor fixes and cleanups (Lv Zheng, Rickard Strandqvist,
Bob Moore)"
* tag 'acpica-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
ACPICA: Store GPE register enable masks upfront
ACPICA: Update version to 20150410.
ACPICA: Fix a couple issues with the local printf module.
ACPICA: Disassembler: Some cleanup of the table dump module.
ACPICA: iASL: Add support for MSDM ACPI table.
ACPICA: Update for SLIC ACPI table.
ACPICA: Add "//" before ascii output of buffers.
ACPICA: Remove unused internal AML opcode.
ACPICA: Permanently set _REV to the value '2'.
ACPICA: Add "Windows 2015" string to _OSI support.
ACPICA: Add infrastructure for External() opcode.
ACPICA: iASL: Enhancement for constant folding.
ACPICA: iASL/Disassembler: Add option to assume table contains valid AML.
ACPICA: Update AML Debugger global variables.
ACPICA: Update Resource descriptor dump module.
ACPICA: Fix a sscanf format string.
ACPICA: Casting changes around acpi_physical_address/acpi_size.
ACPICA: Resources: Correct conditional compilation definitions.
ACPICA: Utilities: Correct conditional compilation definitions.
ACPICA: Tables: Move an iasl specific table function to iasl source file.
...
Chris Metcalf [Mon, 30 Mar 2015 20:33:00 +0000 (16:33 -0400)]
tile: nohz: warn if nohz_full uses hypervisor shared cores
The "hypervisor shared" cores are ones that the Tilera hypervisor
uses to receive interrupts to manage hypervisor-owned devices.
It's a bad idea to try to use those cores with nohz_full, since
they will get interrupted unpredictably -- and invisibly to Linux
tracing tools, since the interrupts are delivered at a higher
privilege level to the Tilera hypervisor.
Generate a clear warning at boot up that this doesn't end well
for the nohz_full cores in question.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Tony Lu [Fri, 27 Mar 2015 18:46:38 +0000 (14:46 -0400)]
tile: ftrace: fix function_graph tracer issues
- Add support for ARCH_SUPPORTS_FTRACE_OPS
- Replace the instruction in ftrace_call with the bundle {move r10, lr;
jal ftrace_stub}, so that the lr contains the right value after returning
from ftrace_stub. An alternative fix might be to leave the instruction
in ftrace_call alone when it is being updated with ftrace_stub.
Signed-off-by: Tony Lu <zlu@ezchip.com> Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Chris Metcalf [Fri, 27 Mar 2015 18:35:31 +0000 (14:35 -0400)]
tile: map data region shadow of kernel as R/W
This is necessary for things like reading /proc/kcore, doing ftrace,
etc. It happens by default when using huge pages to map the kernel
data, but not when using small pages.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Chris Metcalf [Mon, 23 Mar 2015 18:23:58 +0000 (14:23 -0400)]
tile: support CONTEXT_TRACKING and thus NOHZ_FULL
Add the TIF_NOHZ flag appropriately.
Add call to user_exit() on entry to do_work_pending() and on entry
to syscalls via do_syscall_trace_enter(), and also the top of
do_syscall_trace_exit() just because it's done in x86.
Add call to user_enter() at the bottom of do_work_pending() once we
have no more work to do before returning to userspace.
Wrap all the trap code in exception_enter() / exception_exit().
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Johannes Berg [Fri, 17 Apr 2015 13:45:04 +0000 (15:45 +0200)]
net: remove unused 'dev' argument from netif_needs_gso()
In commit 04ffcb255f22 ("net: Add ndo_gso_check") Tom originally
added the 'dev' argument to be able to call ndo_gso_check().
Then later, when generalizing this in commit 5f35227ea34b
("net: Generalize ndo_gso_check to ndo_features_check")
Jesse removed the call to ndo_gso_check() in netif_needs_gso()
by calling the new ndo_features_check() in a different place.
This made the 'dev' argument unused.
Remove the unused argument and go back to the code as before.
Cc: Tom Herbert <therbert@google.com> Cc: Jesse Gross <jesse@nicira.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 17 Apr 2015 05:32:09 +0000 (13:32 +0800)]
act_mirred: Fix bogus header when redirecting from VLAN
When you redirect a VLAN device to any device, you end up with
crap in af_packet on the xmit path because hard_header_len is
not equal to skb->mac_len. So the redirected packet contains
four extra bytes at the start which then gets interpreted as
part of the MAC address.
This patch fixes this by only pushing skb->mac_len. We also
need to fix ifb because it tries to undo the pushing done by
act_mirred.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Apr 2015 01:10:35 +0000 (18:10 -0700)]
inet_diag: fix access to tcp cc information
Two different problems are fixed here :
1) inet_sk_diag_fill() might be called without socket lock held.
icsk->icsk_ca_ops can change under us and module be unloaded.
-> Access to freed memory.
Fix this using rcu_read_lock() to prevent module unload.
2) Some TCP Congestion Control modules provide information
but again this is not safe against icsk->icsk_ca_ops
change and nla_put() errors were ignored. Some sockets
could not get the additional info if skb was almost full.
Fix this by returning a status from get_info() handlers and
using rcu protection as well.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 Apr 2015 23:12:28 +0000 (16:12 -0700)]
tcp: tcp_get_info() should fetch socket fields once
tcp_get_info() can be called without holding socket lock,
so any socket fields can change under us.
Use READ_ONCE() to fetch sk_pacing_rate and sk_max_pacing_rate
Fixes: 977cb0ecf82e ("tcp: add pacing_rate information into tcp_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Metcalf [Mon, 23 Mar 2015 15:21:23 +0000 (11:21 -0400)]
tile: support arch_irq_work_raise
Tile includes a hypervisor hook to deliver messages to arbitrary
tiles, so we can use that to raise an interrupt as soon as
possible on our own core. Unfortunately the Tilera hypervisor
disabled that support on principle in previous releases, but
it will be available in MDE 4.3.4 and later.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Colin Ian King [Mon, 16 Mar 2015 20:14:02 +0000 (16:14 -0400)]
arch: tile: fix null pointer dereference on pt_regs pointer
Cppcheck reports the following issue:
[arch/tile/kernel/stack.c:116]: (error) Possible null
pointer dereference: p
In this case, on reporting on an odd fault, p is set to NULL
and immediately afterwords p is dereferenced iff
!kbt->profile is false. Rather than doing this check just
return NULL rather than falling through to the potential
null pointer dereference (since the original intentional
outcome would be to return NULL anyhow) for this odd fault
case.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> [tweaked lightly]
Davidlohr Bueso [Wed, 25 Feb 2015 21:58:35 +0000 (13:58 -0800)]
tile/elf: reorganize notify_exec()
In the future mm->exe_file will be done without mmap_sem
serialization, thus isolate and reorganize the tile elf
code to make the transition easier. Good users will, make
use of the more standard get_mm_exe_file(), requiring only
holding the mmap_sem to read the value, and relying on reference
counting to make sure that the exe file won't dissappear
underneath us.
The visible effects of this patch are:
o We now take and drop the mmap_sem more often. Instead of
just in arch_setup_additional_pages(), we also do it in:
1) get_mm_exe_file()
2) to get the mm->vm_file and notify the simulator.
[Note that 1) will disappear once we change the locking
rules for exe_file.]
o We avoid getting a free page and doing d_path() while
holding the mmap_sem. This requires reordering the checks.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Chris Metcalf [Mon, 16 Mar 2015 19:04:05 +0000 (15:04 -0400)]
tile: use si_int instead of si_ptr for compat_siginfo
To be compatible with the generic get_compat_sigevent(), the
copy_siginfo_to_user32() and thus copy_siginfo_from_user32()
have to use si_int instead of si_ptr. Using si_ptr means that
for the case of ILP32 compat code running in big-endian mode,
we would end up copying the high 32 bits of the pointer value
into si_int instead of the desired low 32 bits.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Catalin Marinas <catalin.marinas@arm.com>
Commit 889fa31f00b2 was a bit too eager in reducing the loop count,
so we ended up missing queues in some configurations. Ensure that
our division rounds up, so that's not the case.
- more mmap_sem hold time reductions from Davidlohr
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (138 commits)
proc: show locks in /proc/pid/fdinfo/X
docs: add missing and new /proc/PID/status file entries, fix typos
drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic
Documentation/spi/spidev_test.c: fix warning
drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type
.gitignore: ignore *.tar
MAINTAINERS: add Mediatek SoC mailing list
tomoyo: reduce mmap_sem hold for mm->exe_file
powerpc/oprofile: reduce mmap_sem hold for exe_file
oprofile: reduce mmap_sem hold for mm->exe_file
mips: ip32: add platform data hooks to use DS1685 driver
lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text
x86: switch to using asm-generic for seccomp.h
sparc: switch to using asm-generic for seccomp.h
powerpc: switch to using asm-generic for seccomp.h
parisc: switch to using asm-generic for seccomp.h
mips: switch to using asm-generic for seccomp.h
microblaze: use asm-generic for seccomp.h
arm: use asm-generic for seccomp.h
seccomp: allow COMPAT sigreturn overrides
...
Let's show locks which are associated with a file descriptor in
its fdinfo file.
Currently we don't have a reliable way to determine who holds a lock. We
can find some information in /proc/locks, but PID which is reported there
can be wrong. For example, a process takes a lock, then forks a child and
dies. In this case /proc/locks contains the parent pid, which can be
reused by another process.
Andrew Morton [Thu, 16 Apr 2015 19:49:29 +0000 (12:49 -0700)]
Documentation/spi/spidev_test.c: fix warning
Documentation/spi/spidev_test.c:83:5: warning: no previous prototype for 'unespcape' [-Wmissing-prototypes]
fix spelling too.
Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type
The RTC driver supports two flavors of S5M devices: S5M8767-like and
S2MPS14-like.
On S2MPS13 and S2MPS14 devices the RTC module is the same so we want to
re-use the existing support of S2MPS14. However device type was passed
from parent MFD driver in platform data structure. This way for the
S2MPS13 device the main MFD driver passed device type of 'S2MPS13X'.
Instead decouple detecting of device type between main MFD and RTC driver.
This allows adding support for other S2MPS14 variations (like S2MPS11 and
S2MPS13) easily by adding to mfd/sec-core.c:
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# linux-4.0.0-rc3-next-20150313-150225--x86.tar
This patch makes git ignore *.tar files.
Running 'git ls-files -i --exclude-standard' does not show any
tar files excluded from tracking after the change.
Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Benjamin Romer <benjamin.romer@unisys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mm->exe_file is currently serialized with mmap_sem (shared) in order
to both safely (1) read the file and (2) compute the realpath by calling
tomoyo_realpath_from_path, making it an absolute overkill. Good users
will, on the other hand, make use of the more standard get_mm_exe_file(),
requiring only holding the mmap_sem to read the value, and relying on
reference
powerpc/oprofile: reduce mmap_sem hold for exe_file
In the future mm->exe_file will be done without mmap_sem serialization,
thus isolate and reorganize the related code to make the transition
easier. Good users will, make use of the more standard get_mm_exe_file(),
requiring only holding the mmap_sem to read the value, and relying on
reference counting to make sure that the exe file won't dissappear
underneath us while getting the dcookie.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Robert Richter <rric@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sync_buffer() needs the mmap_sem for two distinct operations, both only
occurring upon user context switch handling:
1) Dealing with the exe_file.
2) Adding the dcookie data as we need to lookup the vma that
backs it. This is done via add_sample() and add_data().
This patch isolates 1), for it will no longer need the mmap_sem for
serialization. However, for now, make of the more standard
get_mm_exe_file(), requiring only holding the mmap_sem to read the value,
and relying on reference counting to make sure that the exe file won't
dissappear underneath us while doing the get dcookie.
As a consequence, for 2) we move the mmap_sem locking into where we really
need it, in lookup_dcookie(). The benefits are twofold: reduce mmap_sem
hold times, and cleaner code.
[akpm@linux-foundation.org: export get_mm_exe_file for arch/x86/oprofile/oprofile.ko] Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Robert Richter <rric@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mips: ip32: add platform data hooks to use DS1685 driver
This modifies the IP32 (SGI O2) platform and reset code to utilize the new
rtc-ds1685 driver. The old mc146818rtc.h header is removed and ip32_defconfig
is updated as well.
Andrew Morton [Thu, 16 Apr 2015 19:49:07 +0000 (12:49 -0700)]
lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text
Cc: Yalin Wang <yalin.wang@sonymobile.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to using the newly created asm-generic/seccomp.h for the seccomp
strict mode syscall definitions. The obsolete sigreturn syscall override
is retained in 32-bit mode, and the ia32 syscall overrides are used in
the compat case. Remaining definitions were identical.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to using the newly created asm-generic/seccomp.h for the seccomp
strict mode syscall definitions. The obsolete sigreturn in COMPAT mode
is retained as an override. Remaining definitions are identical. Also
corrected missing #define for header reinclusion protection.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
powerpc: switch to using asm-generic for seccomp.h
Switch to using the newly created asm-generic/seccomp.h for the seccomp
strict mode syscall definitions. The obsolete sigreturn in COMPAT mode is
retained as an override. Remaining definitions are identical, though they
incorrectly appeared in uapi, which has been corrected.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to using the newly created asm-generic/seccomp.h for the seccomp
strict mode syscall definitions. COMPAT definitions retain their
overrides and the remaining definitions were identical.
Switch to using the newly created asm-generic/seccomp.h for the seccomp
strict mode syscall definitions. Since microblaze is 32-bit, the COMPAT
seccomp defines are unused and can be dropped. The obsolete sigreturn
for seccomp strict mode is retained as an override. Remaining definitions
are identical.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to using the newly created asm-generic/seccomp.h for the seccomp
strict mode syscall definitions. Definitions were identical.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Simek [Thu, 16 Apr 2015 19:48:38 +0000 (12:48 -0700)]
include/linux/kconfig.h: ese macros which are already defined
It is better to use macros which are already available because then there
is just one location which needs to be change.
[akpm@linux-foundation.org: move IS_ENABLED definition to after the IS_BUILTIN and IS_MODULE definitions] Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Thu, 16 Apr 2015 19:48:35 +0000 (12:48 -0700)]
memstick: mspro_block: add missing curly braces
Using the indenting we can see the curly braces were obviously intended.
This is a static checker fix, but my guess is that we don't read enough
bytes, because we don't calculate "t_len" correctly.
Fixes: f1d82698029b ('memstick: use fully asynchronous request processing') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/sysctl.c: detect overflows when converting to int
When converting unsigned long to int overflows may occur. These currently
are not detected when writing to the sysctl file system.
E.g. on a system where int has 32 bits and long has 64 bits
echo 0x800001234 > /proc/sys/kernel/threads-max
has the same effect as
echo 0x1234 > /proc/sys/kernel/threads-max
The patch adds the missing check in do_proc_dointvec_conv.
With the patch an overflow will result in an error EINVAL when writing to
the the sysctl file system.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cpumask: don't perform while loop in cpumask_next_and()
cpumask_next_and() is looking for cpumask_next() in src1 in a loop and
tests if found cpu is also present in src2. remove that loop, perform
cpumask_and() of src1 and src2 first and use that new mask to find
cpumask_next().
Apart from removing while loop, ./bloat-o-meter on x86_64 shows
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-8 (-8)
function old new delta
cpumask_next_and 62 54 -8
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
prctl: avoid using mmap_sem for exe_file serialization
Oleg cleverly suggested using xchg() to set the new mm->exe_file instead
of calling set_mm_exe_file() which requires some form of serialization --
mmap_sem in this case. For archs that do not have atomic rmw instructions
we still fallback to a spinlock alternative, so this should always be
safe. As such, we only need the mmap_sem for looking up the backing
vm_file, which can be done sharing the lock. Naturally, this means we
need to manually deal with both the new and old file reference counting,
and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
probably be deleted in the future anyway.
This patch removes mm->mmap_sem from mm->exe_file read side.
Also it kills dup_mm_exe_file() and moves exe_file duplication into
dup_mmap() where both mmap_sems are locked.
[akpm@linux-foundation.org: fix comment typo] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.
E.g. architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.
With 32-bit calculation there is no solution which delivers valid results
for all possible combinations of the parameters. The code is only called
once. Hence a 64-bit calculation can be used as solution.
[akpm@linux-foundation.org: use clamp_t(), per Oleg] Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Thu, 16 Apr 2015 19:47:41 +0000 (12:47 -0700)]
fork_init: update max_threads comment
The comment explaining what value max_threads is set to is outdated. The
maximum memory consumption ratio for thread structures was 1/2 until
February 2002, then it was briefly changed to 1/16 before being set to 1/8
which we still use today. The comment was never updated to reflect that
change, it's about time.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 16 Apr 2015 19:47:38 +0000 (12:47 -0700)]
fork: report pid reservation failure properly
copy_process will report any failure in alloc_pid as ENOMEM currently
which is misleading because the pid allocation might fail not only when
the memory is short but also when the pid space is consumed already.
The current man page even mentions this case:
: EAGAIN
:
: A system-imposed limit on the number of threads was encountered.
: There are a number of limits that may trigger this error: the
: RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which
: limits the number of processes and threads for a real user ID, was
: reached; the kernel's system-wide limit on the number of processes
: and threads, /proc/sys/kernel/threads-max, was reached (see
: proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max,
: was reached (see proc(5)).
so the current behavior is also incorrect wrt. documentation. POSIX man
page also suggest returing EAGAIN when the process count limit is reached.
This patch simply propagates error code from alloc_pid and makes sure we
return -EAGAIN due to reservation failure. This will make behavior of
fork closer to both our documentation and POSIX.
alloc_pid might alsoo fail when the reaper in the pid namespace is dead
(the namespace basically disallows all new processes) and there is no
good error code which would match documented ones. We have traditionally
returned ENOMEM for this case which is misleading as well but as per
Eric W. Biederman this behavior is documented in man pid_namespaces(7)
: If the "init" process of a PID namespace terminates, the kernel
: terminates all of the processes in the namespace via a SIGKILL signal.
: This behavior reflects the fact that the "init" process is essential for
: the correct operation of a PID namespace. In this case, a subsequent
: fork(2) into this PID namespace will fail with the error ENOMEM; it is
: not possible to create a new processes in a PID namespace whose "init"
: process has terminated.
and introducing a new error code would be too risky so let's stick to
ENOMEM for this case.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Thu, 16 Apr 2015 19:47:35 +0000 (12:47 -0700)]
signal: remove warning about using SI_TKILL in rt_[tg]sigqueueinfo
Sending SI_TKILL from rt_[tg]sigqueueinfo was deprecated, so now we issue
a warning on the first attempt of doing it. We use WARN_ON_ONCE, which is
not informative and, what is worse, taints the kernel, making the trinity
syscall fuzzer complain false-positively from time to time.
It does not look like we need this warning at all, because the behaviour
changed quite a long time ago (2.6.39), and if an application relies on
the old API, it gets EPERM anyway and can issue a warning by itself.
So let us zap the warning in kernel.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Richard Weinberger <richard@nod.at> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ptrace: ptrace_detach() can no longer race with SIGKILL
ptrace_detach() re-checks ->ptrace under tasklist lock and calls
release_task() if __ptrace_detach() returns true. This was needed because
the __TASK_TRACED tracee could be killed/untraced, and it could even pass
exit_notify() before we take tasklist_lock.
But this is no longer possible after 9899d11f6544 "ptrace: ensure
arch_ptrace/ptrace_request can never race with SIGKILL". We can turn
these checks into WARN_ON() and remove release_task().
While at it, document the setting of child->exit_code.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Labath <labath@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ptrace: fix race between ptrace_resume() and wait_task_stopped()
ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.
This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.
Note for stable: the bug is very old, but without 9899d11f6544 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Pavel Labath <labath@google.com> Tested-by: Pavel Labath <labath@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>