In section 5.2.25.2 of ACPI 6.1 spec, it mentions that the "SPA Range
Structure Index" of virtual SPA shall be set to zero. That means virtual SPA
will not be associated by any NVDIMM region mapping.
The VCD's SPA Range Structure in NFIT is similar to virtual disk region
as following:
[02Ch 0044 2] Range Index : 0000
[02Eh 0046 2] Flags (decoded below) : 0000
Add/Online Operation Only : 0
Proximity Domain Valid : 0
[030h 0048 4] Reserved : 00000000
[034h 0052 4] Proximity Domain : 00000000
[038h 0056 16] Address Range GUID : 77AB535A-45FC-624B-5560-F7B281D1F96E
[048h 0072 8] Address Range Base : 00000000B6ABD018
[050h 0080 8] Address Range Length : 0000000005500000
[058h 0088 8] Memory Map Attribute : 0000000000000000
The way to not associate a SPA range is to never reference it from a "flush hint",
"interleave", or "control region" table.
After testing on OVMF, pmem driver can support the region that it doesn't
assoicate to any NVDIMM mapping. So, treat VCD like pmem is a idea to get
a pmem block device that it contains iso.
v4:
Instoduce nfit_spa_is_virtual() to check virtual ramdisk SPA and create
pmem region.
v3:
To simplify patch, removed useless VCD region in libnvdimm.
v2:
Removed the code for setting VCD to a read-only region.
Cc: Gary Lin <GLin@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 4 Jun 2016 01:06:47 +0000 (18:06 -0700)]
pmem: kill __pmem address space
The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem(). Now that
wmb_pmem() is gone, there is little need to keep this annotation.
Cc: Christoph Hellwig <hch@lst.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 2 Jun 2016 06:14:22 +0000 (23:14 -0700)]
libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
nsio_rw_bytes() is used to write info block metadata to the namespace,
so it should trigger a flush after every write. Replace wmb_pmem() with
nvdimm_flush() in this path.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 2 Jun 2016 03:48:15 +0000 (20:48 -0700)]
libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
chasing through nd_region), and that we otherwise assume a platform has
ADR capability when flush hints are not present, move nvdimm_flush() to
REQ_FLUSH context.
Note that we still arrange for nvdimm_flush() to be called even in the
ADR case. We need at least once wmb() fence to push buffered writes in
the cpu out to the ADR protected domain.
Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 27 May 2016 16:23:01 +0000 (09:23 -0700)]
libnvdimm: cycle flush hints
When the NFIT provides multiple flush hint addresses per-dimm it is
expressing that the platform is capable of processing multiple flush
requests in parallel. There is some fixed cost per flush request, let
the cost be shared in parallel on multiple cpus.
Since there may not be enough flush hint addresses for each cpu to have
one, keep a per-cpu index of the last used hint, hash it with current
pid, and assume that access pattern and scheduler randomness will keep
the flush-hint usage somewhat staggered across cpus.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 8 Jul 2016 02:44:50 +0000 (19:44 -0700)]
libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
nvdimm_flush() is a replacement for the x86 'pcommit' instruction. It is
an optional write flushing mechanism that an nvdimm bus can provide for
the pmem driver to consume. In the case of the NFIT nvdimm-bus-provider
nvdimm_flush() is implemented as a series of flush-hint-address [1]
writes to each dimm in the interleave set (region) that backs the
namespace.
The nvdimm_has_flush() routine relies on platform firmware to describe
the flushing capabilities of a platform. It uses the heuristic of
whether an nvdimm bus provider provides flush address data to return a
ternary result:
1: flush addresses defined
0: dimm topology described without flush addresses (assume ADR)
-errno: no topology information, unable to determine flush mechanism
The pmem driver is expected to take the following actions on this ternary
result:
1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
0: do not set, WC or FUA on the queue, take no further action
-errno: warn and then operate as if nvdimm_has_flush() returned '0'
The caveat of this heuristic is that it can not distinguish the "dimm
does not have flush address" case from the "platform firmware is broken
and failed to describe a flush address". Given we are already
explicitly trusting the NFIT there's not much more we can do beyond
blacklisting broken firmwares if they are ever encountered.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Sat, 9 Jul 2016 23:46:29 +0000 (16:46 -0700)]
libnvdimm: keep region data alive over namespace removal
nd_region device driver data will be used in the namespace i/o path.
Re-order nd_region_remove() to ensure this data stays live across
namespace device removal
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 8 Jun 2016 00:00:04 +0000 (17:00 -0700)]
libnvdimm, nfit: move flush hint mapping to region-device driver-data
In preparation for triggering flushes of a DIMM's writes-posted-queue
(WPQ) via the pmem driver move mapping of flush hint addresses to the
region driver. Since this uses devm_nvdimm_memremap() the flush
addresses will remain mapped while any region to which the dimm belongs
is active.
We need to communicate more information to the nvdimm core to facilitate
this mapping, namely each dimm object now carries an array of flush hint
address resources.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that all shared mappings are handled by devm_nvdimm_memremap() we no
longer need nfit_spa_map() nor do we need to trigger a callback to the
bus provider at region disable time.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for generically mapping flush hint addresses for both the
BLK and PMEM use case, provide a generic / reference counted mapping
api. Given the fact that a dimm may belong to multiple regions (PMEM
and BLK), the flush hint addresses need to be held valid as long as any
region associated with the dimm is active. This is similar to the
existing BLK-region case where multiple BLK-regions may share an
aperture mapping. Up-level this shared / reference-counted mapping
capability from the nfit driver to a core nvdimm capability.
This eliminates the need for the nd_blk_region.disable() callback. Note
that the removal of nfit_spa_map() and related infrastructure is
deferred to a later patch.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 26 May 2016 18:38:08 +0000 (11:38 -0700)]
nfit: always associate flush hints
Before enabling use of flush hints for pmem regions, we need to make
sure they are always associated. Move the initialization of nfit_flush
out of the block-window specific init path to the general init path.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If devm_add_action() fails, we are explicitly calling the cleanup to free
the resources allocated. Use the helper devm_add_action_or_reset()
and return directly in case of error, since the cleanup function
has been already called by the helper if there was any error.
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If devm_add_action() fails, we are explicitly calling the cleanup to free
the resources allocated. Lets use the helper devm_add_action_or_reset()
and return directly in case of error, since the cleanup function
has been already called by the helper if there was any error.
Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Initialize struct blk_integrity with 0 as blk_integrity_register() takes the
then unitialized struct blk_integrity::flags and ORs it to the resulting block
integrity structure.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 17 Jun 2016 18:08:06 +0000 (11:08 -0700)]
libnvdimm, pmem: allow nfit_test to override pmem_direct_access()
Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to
override it and indicate that nfit_test-pmem is not device-mapped. Now,
we want to enable nfit_test to operate without DMA_CMA and the pmem it
provides will no longer be physically contiguous, i.e. won't be capable
of supporting direct_access requests larger than a page. Make
pmem_direct_access() a weak symbol so that it can be replaced by the
tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static
inline now that it no longer needs to be overridden.
Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
[pavel: fix up braces] Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 27 May 2016 20:28:31 +0000 (13:28 -0700)]
libnvdimm: IS_ERR() usage cleanup
Prompted by commit 287980e49ffc "remove lots of IS_ERR_VALUE abuses", I
ran make coccicheck against drivers/nvdimm/ and found that:
if (IS_ERR(x))
return PTR_ERR(x);
return 0;
...can be replaced with PTR_ERR_OR_ZERO().
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Vishal Verma [Fri, 27 May 2016 18:43:45 +0000 (12:43 -0600)]
libnvdimm, btt: update the usage section in Documentation
Section 5 about BTT's in kernel usage was quite obsolete,
replace it with a simple 'Usage' section that describes how
to set up a BTT namespace using the 'ndctl' utility.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 15 Jun 2016 21:59:17 +0000 (14:59 -0700)]
libnvdimm: use devm_add_action_or_reset()
Clean up needless calls to the action routine by letting
devm_add_action_or_reset() call it automatically. This does cause the
disk to registered and immediately unregistered when a memory allocation
fails, but the block layer should be prepared for such an event.
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Linus Torvalds [Sun, 12 Jun 2016 13:30:39 +0000 (06:30 -0700)]
Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:
- fix an ordering issue in cpu cooling that cooling device is
registered before it's ready (freq_table being populated).
(Lukasz Luba)
- fix a missing comment update (Caesar Wang)
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: add the note for set_trip_temp
thermal: cpu_cooling: fix improper order during initialization
Linus Torvalds [Sun, 12 Jun 2016 01:42:59 +0000 (18:42 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"A small collection of fixes for the current series. This contains:
- Two fixes for xen-blkfront, from Bob Liu.
- A bug fix for NVMe, releasing only the specific resources we
requested.
- Fix for a debugfs flags entry for nbd, from Josef.
- Plug fix from Omar, fixing up a case of code being switched between
two functions.
- A missing bio_put() for the new discard callers of
submit_bio_wait(), fixing a regression causing a leak of the bio.
From Shaun.
- Improve dirty limit calculation precision in the writeback code,
fixing a case where setting a limit lower than 1% of memory would
end up being zero. From Tejun"
* 'for-linus' of git://git.kernel.dk/linux-block:
NVMe: Only release requested regions
xen-blkfront: fix resume issues after a migration
xen-blkfront: don't call talk_to_blkback when already connected to blkback
nbd: pass the nbd pointer for flags debugfs
block: missing bio_put following submit_bio_wait
blk-mq: really fix plug list flushing for nomerge queues
writeback: use higher precision calculation in domain_dirty_limits()
Linus Torvalds [Sun, 12 Jun 2016 01:03:39 +0000 (18:03 -0700)]
Merge tag 'gpio-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"A new bunch of GPIO fixes for v4.7.
This time I am very grateful that Ricardo Ribalda Delgado went in and
fixed my stupid refcounting mistakes in the removal path for GPIO
chips. I had a feeling something was wrong here and so it was. It
exploded on OMAP and it fixes their problem. Now it should be (more)
solid.
The rest i compilation, Kconfig and driver fixes. Some tagged for
stable.
Summary:
- Fix a NULL pointer dereference when we are searching the GPIO
device list but one of the devices have been removed (struct
gpio_chip pointer is NULL).
- Fix unaligned reference counters: we were ending on +3 after all
said and done. It should be 0. Remove an extraneous get_device(),
and call cdev_del() followed by device_del() in gpiochip_remove()
instead and the count goes to zero and calls the release() function
properly.
- Fix a compile warning due to a missing #include in the OF/device
tree portions.
- Select ANON_INODES for GPIOLIB, we're using that for our character
device. Some randconfig tests disclosed the problem.
- Make sure the Zynq driver clock runs also without CONFIG_PM enabled
- Fix an off-by-one error in the 104-DIO-48E driver
- Fix warnings in bcm_kona_gpio_reset()"
* tag 'gpio-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: bcm-kona: fix bcm_kona_gpio_reset() warnings
gpio: select ANON_INODES
gpio: include <linux/io-mapping.h> in gpiolib-of
gpiolib: Fix unaligned used of reference counters
gpiolib: Fix NULL pointer deference
gpio: zynq: initialize clock even without CONFIG_PM
gpio: 104-dio-48e: Fix control port offset computation off-by-one error
Linus Torvalds [Sat, 11 Jun 2016 18:42:08 +0000 (11:42 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two current fixes:
- one affects Qemu CD ROM emulation, which stopped working after the
updates in SCSI to require VPD pages from all conformant devices.
Fix temporarily by blacklisting Qemu (we can relax later when they
come into compliance).
- The other is a fix to the optimal transfer size. We set up a
minefield for ourselves by being confused about whether the limits
are in bytes or sectors (SCSI optimal is in blocks and the queue
parameter is in bytes).
This tries to fix the problem (wrong setting for queue limits
max_sectors) and make the problem more obvious by introducing a
wrapper function"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
sd: Fix rw_max for devices that report an optimal xfer size
scsi: Add QEMU CD-ROM to VPD Inquiry Blacklist
Linus Torvalds [Sat, 11 Jun 2016 18:24:54 +0000 (11:24 -0700)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- a bigger fix for i801 to finally be able to be loaded on some
machines again
- smaller driver fixes
- documentation update because of a renamed file
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mux: reg: Provide of_match_table
i2c: mux: refer to i2c-mux.txt
i2c: octeon: Avoid printk after too long SMBUS message
i2c: octeon: Missing AAK flag in case of I2C_M_RECV_LEN
i2c: i801: Allow ACPI SystemIO OpRegion to conflict with PCI BAR
Linus Torvalds [Sat, 11 Jun 2016 17:55:30 +0000 (10:55 -0700)]
Merge tag '20160610_uvc_compat_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux
Pull uvc compat XU ioctl fixes from Andy Lutomirski:
"uvc's compat XU ioctls go through tons of potentially buggy
indirection. The first patch removes the indirection. The second one
cleans up the code.
Compile-tested only. I have the hardware, but I have absolutely no
idea what XU does, how to use it, what software to recompile as
32-bit, or what to test in that software"
* tag '20160610_uvc_compat_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux:
uvc_v4l2: Simplify compat ioctl implementation
uvc: Forward compat ioctls to their handlers directly
Linus Torvalds [Fri, 10 Jun 2016 21:13:27 +0000 (14:13 -0700)]
Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Has some fixes and some new self tests for btrfs. The self tests are
usually disabled in the .config file (unless you're doing btrfs dev
work), and this bunch is meant to find problems with the 64K page size
patches.
Jeff has a patch to help people see if they are using the hardware
assist crc32c module, which really helps us nail down problems when
people ask why crcs are using so much CPU.
Otherwise, it's small fixes"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: self-tests: Fix extent buffer bitmap test fail on BE system
Btrfs: self-tests: Fix test_bitmaps fail on 64k sectorsize
Btrfs: self-tests: Use macros instead of constants and add missing newline
Btrfs: self-tests: Support testing all possible sectorsizes and nodesizes
Btrfs: self-tests: Execute page straddling test only when nodesize < PAGE_SIZE
btrfs: advertise which crc32c implementation is being used at module load
Btrfs: add validadtion checks for chunk loading
Btrfs: add more validation checks for superblock
Btrfs: clear uptodate flags of pages in sys_array eb
Btrfs: self-tests: Support non-4k page size
Btrfs: Fix integer overflow when calculating bytes_per_bitmap
Btrfs: test_check_exists: Fix infinite loop when searching for free space entries
Btrfs: end transaction if we abort when creating uuid root
btrfs: Use __u64 in exported linux/btrfs.h.
Linus Torvalds [Fri, 10 Jun 2016 19:23:49 +0000 (12:23 -0700)]
Merge tag 'powerpc-4.7-3Michael Ellerman:' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from
- ptrace: Fix out of bounds array access warning from Khem Raj
- pseries: Fix PCI config address for DDW from Gavin Shan
- pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
from Michael Ellerman
- of: fix autoloading due to broken modalias with no 'compatible' from
Wolfram Sang
- radix: Fix always false comparison against MMU_NO_CONTEXT from Aneesh
Kumar K.V
- hash: Compute the segment size correctly for ISA 3.0 from Aneesh
Kumar K.V
- nohash: Fix build break with 64K pages from Michael Ellerman
* tag 'powerpc-4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/nohash: Fix build break with 64K pages
powerpc/mm/hash: Compute the segment size correctly for ISA 3.0
powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT
of: fix autoloading due to broken modalias with no 'compatible'
powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
powerpc/pseries: Fix PCI config address for DDW
powerpc/ptrace: Fix out of bounds array access warning
Linus Torvalds [Fri, 10 Jun 2016 19:18:34 +0000 (12:18 -0700)]
Merge tag 'hwmon-for-linus-v4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- fix regression in fam15h_power driver
- minor variable type fix in lm90 driver
- document compatible statement for ina2xx driver
* tag 'hwmon-for-linus-v4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (lm90) use proper type for update_interval
hwmon: (ina2xx) Document compatible for INA231
hwmon: (fam15h_power) Disable preemption when reading registers
Linus Torvalds [Fri, 10 Jun 2016 19:10:02 +0000 (12:10 -0700)]
Merge branch 'stacking-fixes' (vfs stacking fixes from Jann)
Merge filesystem stacking fixes from Jann Horn.
* emailed patches from Jann Horn <jannh@google.com>:
sched: panic on corrupted stack end
ecryptfs: forbid opening files without mmap handler
proc: prevent stacking filesystems on top
Jann Horn [Wed, 1 Jun 2016 09:55:07 +0000 (11:55 +0200)]
sched: panic on corrupted stack end
Until now, hitting this BUG_ON caused a recursive oops (because oops
handling involves do_exit(), which calls into the scheduler, which in
turn raises an oops), which caused stuff below the stack to be
overwritten until a panic happened (e.g. via an oops in interrupt
context, caused by the overwritten CPU index in the thread_info).
Jann Horn [Wed, 1 Jun 2016 09:55:06 +0000 (11:55 +0200)]
ecryptfs: forbid opening files without mmap handler
This prevents users from triggering a stack overflow through a recursive
invocation of pagefault handling that involves mapping procfs files into
virtual memory.
Jann Horn [Wed, 1 Jun 2016 09:55:05 +0000 (11:55 +0200)]
proc: prevent stacking filesystems on top
This prevents stacking filesystems (ecryptfs and overlayfs) from using
procfs as lower filesystem. There is too much magic going on inside
procfs, and there is no good reason to stack stuff on top of procfs.
(For example, procfs does access checks in VFS open handlers, and
ecryptfs by design calls open handlers from a kernel thread that doesn't
drop privileges or so.)
Linus Torvalds [Fri, 10 Jun 2016 18:57:17 +0000 (11:57 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"A fix for an issue that Alex saw whilst swapping with hardware
access/dirty bit support enabled in the kernel: Fix a failure to fault
in old pages on a write when CONFIG_ARM64_HW_AFDBM is enabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: always take dirty state from new pte in ptep_set_access_flags
Linus Torvalds [Fri, 10 Jun 2016 18:36:04 +0000 (11:36 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes from all around the map, plus a commit that introduces a
new header of Intel model name symbols (unused) that will make the
next merge window easier"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()
x86/entry/traps: Don't force in_interrupt() to return true in IST handlers
x86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer models
x86/cpu/intel: Introduce macros for Intel family numbers
x86, build: copy ldlinux.c32 to image.iso
x86/msr: Use the proper trace point conditional for writes
Linus Torvalds [Fri, 10 Jun 2016 18:15:41 +0000 (11:15 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"A handful of tooling fixes, two PMU driver fixes and a cleanup of
redundant code that addresses a security analyzer false positive"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Remove a redundant check
perf/x86/intel/uncore: Remove SBOX support for Broadwell server
perf ctf: Convert invalid chars in a string before set value
perf record: Fix crash when kptr is restricted
perf symbols: Check kptr_restrict for root
perf/x86/intel/rapl: Fix pmus free during cleanup
Linus Torvalds [Fri, 10 Jun 2016 17:53:46 +0000 (10:53 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Misc fixes:
- a file-based futex fix
- one more spin_unlock_wait() fix
- a ww-mutex deadlock detection improvement/fix
- and a raw_read_seqcount_latch() barrier fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Calculate the futex key based on a tail page for file-based futexes
locking/qspinlock: Fix spin_unlock_wait() some more
locking/ww_mutex: Report recursive ww_mutex locking early
locking/seqcount: Re-fix raw_read_seqcount_latch()
Linus Torvalds [Fri, 10 Jun 2016 17:47:22 +0000 (10:47 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Two fixes: a regression/crash fix, and a message output fix"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/arm: Fix the format of EFI debug messages
efi: Fix for_each_efi_memory_desc_in_map() for empty memmaps
1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal.
2) Revert some msleep conversions in rtlwifi as these spots are in
atomic context, from Larry Finger.
3) Validate that NFTA_SET_TABLE attribute is actually specified when we
call nf_tables_getset(). From Phil Turnbull.
4) Don't do mdio_reset in stmmac driver with spinlock held as that can
sleep, from Vincent Palatin.
5) sk_filter() does things other than run a BPF filter, so we should
not elide it's call just because sk->sk_filter is NULL. Fix from
Eric Dumazet.
6) Fix missing backlog updates in several packet schedulers, from Cong
Wang.
7) bnx2x driver should allow VLAN add/remove while the interface is
down, from Michal Schmidt.
8) Several RDS/TCP race fixes from Sowmini Varadhan.
9) fq_codel scheduler doesn't return correct queue length in dumps,
from Eric Dumazet.
10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from
Yuchung Cheng.
11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(),
from Guillaume Nault.
12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian
Westphal.
13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling,
from Willem de Bruijn.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
vmxnet3: segCnt can be 1 for LRO packets
packet: compat support for sock_fprog
stmmac: fix parameter to dwmac4_set_umac_addr()
net/mlx5e: Fix blue flame quota logic
net/mlx5e: Use ndo_stop explicitly at shutdown flow
net/mlx5: E-Switch, always set mc_promisc for allmulti vports
net/mlx5: E-Switch, Modify node guid on vf set MAC
net/mlx5: E-Switch, Fix vport enable flow
net/mlx5: E-Switch, Use the correct error check on returned pointers
net/mlx5: E-Switch, Use the correct free() function
net/mlx5: Fix E-Switch flow steering capabilities check
net/mlx5: Fix flow steering NIC capabilities check
net/mlx5: Fix root flow table update
net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly
net/mlx5: Fix masking of reserved bits in XRCD number
net/mlx5: Fix the size of modify QP mailbox
mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name()
mlxsw: spectrum: Make split flow match firmware requirements
wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel
cfg80211: remove get/set antenna and tx power warnings
...
Linus Torvalds [Fri, 10 Jun 2016 15:27:30 +0000 (08:27 -0700)]
Merge tag 'sound-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"We have only few, mainly HD-audio device-specific fixes. Realtek
codec driver got a slightly more LOC, but they are all for the new
codec chip, and won't affect others at all"
* tag 'sound-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add PCI ID for Kabylake
ALSA: hda/realtek: Add T560 docking unit fixup
ALSA: hda - Fix headset mic detection problem for Dell machine
ALSA: uapi: Add three missing header files to Kbuild file
ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703
ALSA: hda/realtek - ALC256 speaker noise issue
Linus Torvalds [Fri, 10 Jun 2016 15:21:06 +0000 (08:21 -0700)]
Merge tag 'drm-fixes-for-v4.7-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This weeks instalment of fixes:
amdgpu:
Lots of memory leak and firmware leak fixes
nouveau:
Collection of display fixes, KASAN fixes
vc4:
vblank/pageflipping fixes
fsl-dcu:
Regmap cache fix
omap:
Unused variable warning fix.
Nothing too surprising so far"
* tag 'drm-fixes-for-v4.7-rc3' of git://people.freedesktop.org/~airlied/linux: (46 commits)
drm/amdgpu: fix warning with powerplay disabled.
drm/amd/powerplay: delete useless code as pptable changed in vbios.
drm/amd/powerplay: fix bug visit array out of bounds
drm/amdgpu: fix smu ucode memleak (v2)
drm/amdgpu: add release firmware for cgs
drm/amdgpu: fix tonga smu_fini mem leak
drm/amdgpu: fix fiji smu fini mem leak
drm/amdgpu: fix cik sdma ucode memleak
drm/amdgpu: fix sdma24 ucode mem leak
drm/amdgpu: fix sdma3 ucode mem leak
drm/amdgpu: fix uvd fini mem leak
drm/amdgpu: fix gfx 7 ucode mem leak
drm/amdgpu: fix gfx8 ucode mem leak
drm/amdgpu: fix missing free wb for cond_exec
drm/amdgpu: fix memleak in pptable_init
drm/amdgpu: fix mem leak in atombios
drm/amdgpu: fix mem leak in pplib/hwmgr
drm/amdgpu: fix mem leak in smumgr
drm/amdgpu: add pipeline sync while vmid switch in same ctx
drm/amdgpu: vBIOS post only call when mem_size zero
...
Linus Torvalds [Fri, 10 Jun 2016 15:15:37 +0000 (08:15 -0700)]
Merge tag 'acpi-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"A recently introduced boot regression related to the ACPI EC
initialization is addressed by restoring the previous behavior (Lv
Zheng)"
* tag 'acpi-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / EC: Fix a boot EC regresion by restoring boot EC support for the DSDT EC
Linus Torvalds [Fri, 10 Jun 2016 15:09:12 +0000 (08:09 -0700)]
Merge tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Stable-candidate fixes for the intel_pstate driver and the cpuidle
core.
Specifics:
- Fix two intel_pstate initialization issues, one of which was
introduced during the 4.4 cycle (Srinivas Pandruvada)
- Fix kernel build with CONFIG_UBSAN set and CONFIG_CPU_IDLE unset
(Catalin Marinas)"
* tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE
Linus Torvalds [Fri, 10 Jun 2016 15:00:47 +0000 (08:00 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED
mm: introduce dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all
kernel/relay.c: fix potential memory leak
mm: thp: broken page count after commit aa88b68c3b1d
revert "mm: memcontrol: fix possible css ref leak on oom"
kasan: change memory hot-add error messages to info messages
mm/hugetlb: fix huge page reserve accounting for private mappings
Rui Wang [Wed, 8 Jun 2016 06:59:52 +0000 (14:59 +0800)]
x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()
On a 4-socket Brickland system, hot-removing one ioapic is fine.
Hot-removing the 2nd one causes panic in mp_unregister_ioapic()
while calling release_resource().
It is because the iomem_res pointer has already been released
when removing the first ioapic.
To explain the use of &res[num] here: res is assigned to ioapic_resources,
and later in ioapic_insert_resources() we do:
Here 'r' is treated as an arry of 'struct resource', and the r++ ensures
that each element of the array is inserted separately. Thus we should call
release_resouce() on each element at &res[num].
Fix it by assigning the correct pointers to ioapics[i].iomem_res in
ioapic_setup_resources().
Andy Lutomirski [Tue, 24 May 2016 22:54:04 +0000 (15:54 -0700)]
x86/entry/traps: Don't force in_interrupt() to return true in IST handlers
Forcing in_interrupt() to return true if we're not in a bona fide
interrupt confuses the softirq code. This fixes warnings like:
NOHZ: local_softirq_pending 282
... which can happen when running things like selftests/x86.
This will change perf's static percpu buffer usage in IST context.
I think this is okay, and it's changing the behavior to match
historical (pre-4.0) behavior.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 959274753857 ("x86, traps: Track entry into and exit from IST context") Link: http://lkml.kernel.org/r/cdc215f94d118d691d73df35275022331156fb45.1464130360.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument
if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains
a pointer into user memory. If userland is 32-bit and kernel is 64-bit
the two disagree about the layout of struct sock_fprog.
Add compat setsockopt support to convert a 32-bit compat_sock_fprog to
a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for
SO_REUSEPORT added in commit 1957598840f4 ("soreuseport: add compat
case for setsockopt SO_ATTACH_REUSEPORT_CBPF").
Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Dooks [Wed, 8 Jun 2016 18:21:17 +0000 (19:21 +0100)]
stmmac: fix parameter to dwmac4_set_umac_addr()
The dwmac4_set_umac_addr() takes a struct mac_device_info as
the first parameter, but is being passed a ioaddr instead from
dwmac4_set_filter(). Fix the warning/bug by changing the first
parameter.
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: expected struct mac_device_info *hw
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: got void [noderef] <asn:2>*ioaddr
Note, only compile tested this as do not have any
hardware with it in.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Jun 2016 05:06:27 +0000 (22:06 -0700)]
Merge branch 'mlx5-fixes'
Saeed Mahameed says:
====================
Mellanox 100G mlx5 fixes for 4.7-rc
The following series provides some small fixes for mlx5 driver.
Two small fixes for the mlx5e netdev, the 1st is for the blue flame
quota accounting and the 2nd is a small refactoring in shutdown flow.
Five trivial fixes for mlx5 E-Switch.
- Allmulti mc_promisc flag was not set in a specific flow.
- Modify VF node guid when admin mac is changed.
- Race in vport enable flow.
- Misc code fixes (kvfree when needed and error pointers checking).
Three in mlx5 steering area. Correct capabilities checking and root flow table update.
Three misc fixes in mlx5 commands enum and layouts.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 9 Jun 2016 21:07:40 +0000 (00:07 +0300)]
net/mlx5e: Fix blue flame quota logic
Blue flame is a latency enhancement feature that allows the driver to
write the packet data directly to the NIC's registers thus making the
read of the packet data from host memory redundant.
We maintain a quota for the blue flame which is reloaded whenever we
identify that the hardware is processing send requests and processes
them fast enough so by the time we post the next send request it was
able to process all the pending ones. This indicates that the hardware
is capable of processing more blue flame requests efficiently. The blue
flame quota is decremented whenever we send using blue flame.
The current code erroneously clears the budget if we did not use blue
flame for the current post send operation and we fix it here.
Fixes: 88a85f99e51f ('net/mlx5e: TX latency optimization to save DMA reads') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Thu, 9 Jun 2016 21:07:39 +0000 (00:07 +0300)]
net/mlx5e: Use ndo_stop explicitly at shutdown flow
The current implementation copies the flow of ndo_stop instead of
calling it explicitly, Fixed it.
Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Noa Osherovich [Thu, 9 Jun 2016 21:07:37 +0000 (00:07 +0300)]
net/mlx5: E-Switch, Modify node guid on vf set MAC
In RoCE, the RDMA-CM needs the node guid to establish connection
between nodes.
Today, the node guid exposed to mlx5 Ethernet VFs is zero, therefore
RDMA-CM on the VF is broken.
Whenever the administrator sets a MAC for a VF, derive the node guid
from it and set it as well in the following way:
MAC: e4:1d:2d:b3:f4:01 -> node_guid: e4:1d:2d:ff:fe:b3:f4:01
Fixes: 77256579c6b43 ('net/mlx5: E-Switch, Introduce Vport...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reorder vport enable flow to mark the vport as enabled before calling
the vport change handler which was modified to handle the case for
when vport is not enabled.
This fixes the case for when the PF netdev is open before sriov is
enabled, once sriov is enabled at esw_enable_vport,
esw_vport_change_handle_locked didn't read the PF context since it
thought the PF vport was not enabled.
When we enable the vport, arming for events is not required anymore,
since it's done on the vport change handle
Fixes: 586cfa7f1d58 ('net/mlx5: E-Switch, Use vport event handler for vport cleanup') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Thu, 9 Jun 2016 21:07:35 +0000 (00:07 +0300)]
net/mlx5: E-Switch, Use the correct error check on returned pointers
The mlx5 flow-steering API (mlx5_create_flow_table/group/rule) never
returns null pointer on error. Even if it was doing that, checking
for IS_ERR_OR_NULL(p) and then returning PTR_ERR(p) would have cause
bugs, since PTR_ERR(NULL) --> success, crash.
To make things more robust and protect against related future bugs,
convert all IS_ERR_OR_NULL checks on returned values to IS_ERR.
Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs') Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maor Gottlieb [Thu, 9 Jun 2016 21:07:32 +0000 (00:07 +0300)]
net/mlx5: Fix flow steering NIC capabilities check
Flow steering infrastructure is currently used only on link layer
ethernet, therefore the driver should initialize the flow steering
when the device link layer is ethernet.
In addition, add missing capability check before initializing the
namespace of NIC RX flow tables.
Fixes: 2530236303d9 ('net/mlx5_core: Flow steering tree initialization') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maor Gottlieb [Thu, 9 Jun 2016 21:07:31 +0000 (00:07 +0300)]
net/mlx5: Fix root flow table update
When we destroy the last flow table we need to update
the root_ft to NULL.
It fixes an issue for when the last flow table is destroyed
and recreated again, root_ft pointer will not be updated,
as a result traffic will be dropped.
Fixes: 2cc43b494a6c ('net/mlx5_core: Managing root flow table') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shahar Klein [Thu, 9 Jun 2016 21:07:30 +0000 (00:07 +0300)]
net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly
Having MLX5_CMD_OP_MAX on another file causes us to repeatedly miss
accounting new commands added to the driver and hence there're no entries
for them in debugfs. To solve that, we integrate it into the commands enum
as the last entry.
Fixes: 34a40e689393 ('net/mlx5_core: Introduce modify flow table command') Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Majd Dibbiny [Thu, 9 Jun 2016 21:07:29 +0000 (00:07 +0300)]
net/mlx5: Fix masking of reserved bits in XRCD number
Mask the reserved bits when reading the number of newly
created XRCD.
Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Majd Dibbiny [Thu, 9 Jun 2016 21:07:28 +0000 (00:07 +0300)]
net/mlx5: Fix the size of modify QP mailbox
Add 16 reserved bytes at the end of mlx5_modify_qp_mbox_in to
match the hardware spec definition.
Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 74701d5947a6 "powerpc/mm: Rename function to indicate we are
allocating fragments" renamed page_table_free() to pte_fragment_free().
One occurrence was mistyped as pte_fragment_fre().
This only breaks the nohash 64K page build, which is not the default or
enabled in any defconfig.
Fixes: 74701d5947a6 ("powerpc/mm: Rename function to indicate we are allocating fragments") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Dave Airlie [Thu, 9 Jun 2016 23:46:59 +0000 (09:46 +1000)]
Merge branch 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Mostly memory leak and firmware leak fixes for amdgpu. A bit bigger than
usual since this is several weeks worth of fixes.
* 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux: (28 commits)
drm/amd/powerplay: delete useless code as pptable changed in vbios.
drm/amd/powerplay: fix bug visit array out of bounds
drm/amdgpu: fix smu ucode memleak (v2)
drm/amdgpu: add release firmware for cgs
drm/amdgpu: fix tonga smu_fini mem leak
drm/amdgpu: fix fiji smu fini mem leak
drm/amdgpu: fix cik sdma ucode memleak
drm/amdgpu: fix sdma24 ucode mem leak
drm/amdgpu: fix sdma3 ucode mem leak
drm/amdgpu: fix uvd fini mem leak
drm/amdgpu: fix gfx 7 ucode mem leak
drm/amdgpu: fix gfx8 ucode mem leak
drm/amdgpu: fix missing free wb for cond_exec
drm/amdgpu: fix memleak in pptable_init
drm/amdgpu: fix mem leak in atombios
drm/amdgpu: fix mem leak in pplib/hwmgr
drm/amdgpu: fix mem leak in smumgr
drm/amdgpu: add pipeline sync while vmid switch in same ctx
drm/amdgpu: vBIOS post only call when mem_size zero
drm/amdgpu: modify sdma start sequence
...
Linus Torvalds [Thu, 9 Jun 2016 21:36:12 +0000 (14:36 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"This is the first -rc pull for the RDMA subsystem. The patch count is
high, but they are all smallish patches fixing simple things for the
most part, and the overall line count of changes here is smaller than
the patch count would lead a person to believe.
Code is up and running in my labs, including direct testing of cxgb4,
mlx4, mlx5, ocrdma, and qib.
Summary:
- Multiple minor fixes to the rdma core
- Multiple minor fixes to hfi1
- Multiple minor fixes to mlx5
- A very few other minor fixes (SRP, IPoIB, usNIC, mlx4)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (35 commits)
IB/IPoIB: Don't update neigh validity for unresolved entries
IB/mlx5: Fix alternate path code
IB/mlx5: Fix pkey_index length in the QP path record
IB/mlx5: Fix entries check in mlx5_ib_resize_cq
IB/mlx5: Fix entries checks in mlx5_ib_create_cq
IB/mlx5: Check BlueFlame HCA support
IB/mlx5: Fix returned values of query QP
IB/mlx5: Limit query HCA clock
IB/mlx5: Fix FW version diaplay in sysfs
IB/mlx5: Return PORT_ERR in Active to Initializing tranisition
IB/mlx5: Set flow steering capability bit
IB/core: Make all casts in ib_device_cap_flags enum consistent
IB/core: Fix bit curruption in ib_device_cap_flags structure
IB/core: Initialize sysfs attributes before sysfs create group
IB/IPoIB: Disable bottom half when dealing with device address
IB/core: Fix removal of default GID cache entry
IB/IPoIB: Fix race between ipoib_remove_one to sysfs functions
IB/core: Fix query port failure in RoCE
IB/core: fix error unwind in sysfs hw counters code
IB/core: Fix array length allocation
...
Linus Torvalds [Thu, 9 Jun 2016 21:28:39 +0000 (14:28 -0700)]
Merge tag 'arc-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Revert of ll-sc backoff retry workaround in atomics/spinlocks as
hardware is now proven to work just fine
- Typo fixes (Thanks Andrea Gelmini)
- Removal of obsolete DT property (Alexey)
- Other minor fixes
* tag 'arc-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
Revert "ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff"
Revert "ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle"
Revert "ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff"
ARC: don't enable DISCONTIGMEM unconditionally
ARC: [intc-compact] simplify code for 2 priority levels
arc: Get rid of root core-frequency property
Fix typos
Oleg Drokin [Wed, 8 Jun 2016 22:33:59 +0000 (15:33 -0700)]
mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED
I noticed that the logic in the fadvise64_64 syscall is incorrect for
partial pages. While first page of the region is correctly skipped if
it is partial, the last page of the region is mistakenly discarded.
This leads to problems for applications that read data in
non-page-aligned chunks discarding already processed data between the
reads.
A somewhat misguided application that does something like write(XX bytes
(non-page-alligned)); drop the data it just wrote; repeat gets a
significant penalty in performance as a result.
Gerald Schaefer [Wed, 8 Jun 2016 22:33:50 +0000 (15:33 -0700)]
mm: thp: broken page count after commit aa88b68c3b1d
Christian Borntraeger reported a kernel panic after corrupt page counts,
and it turned out to be a regression introduced with commit aa88b68c3b1d
("thp: keep huge zero page pinned until tlb flush"), at least on s390.
put_huge_zero_page() was moved over from zap_huge_pmd() to
release_pages(), and it was replaced by tlb_remove_page(). However,
release_pages() might not always be triggered by (the arch-specific)
tlb_remove_page().
On s390 we call free_page_and_swap_cache() from tlb_remove_page(), and
not tlb_flush_mmu() -> free_pages_and_swap_cache() like the generic
version, because we don't use the MMU-gather logic. Although both
functions have very similar names, they are doing very unsimilar things,
in particular free_page_xxx is just doing a put_page(), while
free_pages_xxx calls release_pages().
This of course results in very harmful put_page()s on the huge zero
page, on architectures where tlb_remove_page() is implemented in this
way. It seems to affect only s390 and sh, but sh doesn't have THP
support, so the problem (currently) probably only exists on s390.
The following quick hack fixed the issue:
Link: http://lkml.kernel.org/r/20160602172141.75c006a9@thinkpad Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@vger.kernel.org> [4.6.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 8 Jun 2016 22:33:47 +0000 (15:33 -0700)]
revert "mm: memcontrol: fix possible css ref leak on oom"
Revert commit 1383399d7be0 ("mm: memcontrol: fix possible css ref leak
on oom"). Johannes points out "There is a task_in_memcg_oom() check
before calling mem_cgroup_oom()".
Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Wed, 8 Jun 2016 22:33:42 +0000 (15:33 -0700)]
mm/hugetlb: fix huge page reserve accounting for private mappings
When creating a private mapping of a hugetlbfs file, it is possible to
unmap pages via ftruncate or fallocate hole punch. If subsequent faults
repopulate these mappings, the reserve counts will go negative. This is
because the code currently assumes all faults to private mappings will
consume reserves. The problem can be recreated as follows:
- mmap(MAP_PRIVATE) a file in hugetlbfs filesystem
- write fault in pages in the mapping
- fallocate(FALLOC_FL_PUNCH_HOLE) some pages in the mapping
- write fault in pages in the hole
This will result in negative huge page reserve counts and negative
subpool usage counts for the hugetlbfs. Note that this can also be
recreated with ftruncate, but fallocate is more straight forward.
This patch modifies the routines vma_needs_reserves and vma_has_reserves
to examine the reserve map associated with private mappings similar to
that for shared mappings. However, the reserve map semantics for
private and shared mappings are very different. This results in subtly
different code that is explained in the comments.
Link: http://lkml.kernel.org/r/1464720957-15698-1-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NVMe driver only requests the PCIe device's memory regions but releases
all possible regions (including eventual I/O regions). This leads to a stale
warning entry in dmesg about freeing non existent resources.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Glauber [Wed, 8 Jun 2016 06:51:17 +0000 (08:51 +0200)]
i2c: octeon: Missing AAK flag in case of I2C_M_RECV_LEN
During receive the controller requires the AAK flag for all
bytes but the final one. This was wrong in case of I2C_M_RECV_LEN,
where the decision if the final byte is to be transmitted
happened before adding the additional received length byte.
Set the AAK flag if additional bytes are to be received.
Signed-off-by: Jan Glauber <jglauber@cavium.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
There are also bunch of AML methods that that the BIOS can use to access
these fields. Most of the systems in question AML methods accessing the
SMBI OpRegion are never used.
Now, because of this SMBI OpRegion many systems fail to load the SMBus
driver with an error looking like one below:
ACPI Warning: SystemIO range 0x0000000000003040-0x000000000000305F
conflicts with OpRegion 0x0000000000003040-0x000000000000304F
(\_SB.PCI0.SBUS.SMBI) (20160108/utaddress-255)
ACPI: If an ACPI driver is available for this device, you should use
it instead of the native driver
The reason is that this SMBI OpRegion conflicts with the PCI BAR used by
the SMBus driver.
It turns out that we can install a custom SystemIO address space handler
for the SMBus device to intercept all accesses through that OpRegion. This
allows us to share the PCI BAR with the AML code if it for some reason is
using it. We do not expect that this OpRegion handler will ever be called
but if it is we print a warning and prevent all access from the SMBus
driver itself.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=110041 Reported-by: Andy Lutomirski <luto@kernel.org> Reported-by: Pali Rohár <pali.rohar@gmail.com> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Tested-by: Pali Rohár <pali.rohar@gmail.com> Tested-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@vger.kernel.org
Ben Dooks [Thu, 9 Jun 2016 10:38:34 +0000 (11:38 +0100)]
drivers: of: add definition of early_init_dt_alloc_reserved_memory_arch
The function early_init_dt_alloc_reserved_memory_arch is defined
in drivers/of/of_reserved_mem.c but is not declared in any of the
header files. Add the declaration of this to avoid the warning:
drivers/of/of_reserved_mem.c:31:19: warning: symbol 'early_init_dt_alloc_reserved_memory_arch' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
[robh: drop extern from declaration] Signed-off-by: Rob Herring <robh@kernel.org>
Gavin Shan [Thu, 9 Jun 2016 05:50:49 +0000 (15:50 +1000)]
drivers/of: Fix depth for sub-tree blob in unflatten_dt_nodes()
The function is unflattening device sub-tree blob if @dad passed to
the function is valid. Currently, this functionality is used by PPC
PowerNV PCI hotplug driver only. There are possibly multiple nodes
in the first level of depth, fdt_next_node() bails immediately when
@depth becomes negative before the second device node can be probed
successfully. It leads to the device nodes except the first one won't
be unflattened successfully.
This fixes the issue by setting the initial depth (@inital_depth) to
1 when this function is called to unflatten device sub-tree blob. No
logic changes when this function is used to unflatten non-sub-tree
blob.
Cc: Rhyland Klein <rklein@nvidia.com> Fixes: 78c44d910 ("drivers/of: Fix depth when unflattening devicetree") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Rhyland Klein <rklein@nvidia.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Rob Herring <robh@kernel.org>