Balbir Singh [Thu, 18 Feb 2016 02:48:01 +0000 (13:48 +1100)]
powerpc: Fix BUG_ON() reporting in real mode
I ran into this issue while debugging an early boot problem. The system
hit a BUG_ON() but report bug failed to print the line number and file
name. The reason being that the system was running in real mode and
report_bug() searches for addresses in the PAGE_OFFSET+ region.
Suggested-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Jeremy Kerr [Wed, 24 Feb 2016 01:55:03 +0000 (09:55 +0800)]
powerpc/powernv: Add powernv_defconfig
This change adds a defconfig for the non-virtualised power platforms,
based on pseries_defconfig, but without pseries, and little-endian,
and no OF trampoline.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Acked-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Fri, 19 Feb 2016 00:16:24 +0000 (11:16 +1100)]
powerpc: Add POWER9 cputable entry
Add a cputable entry for POWER9. More code is required to actually
boot and run on a POWER9 but this gets the base piece in which we can
start building on.
Copies over from POWER8 except for:
- Adds a new CPU_FTR_ARCH_300 bit to start hanging new architecture
features from (in subsequent patches).
- Advertises new user features bits PPC_FEATURE2_ARCH_3_00 &
HAS_IEEE128 when on POWER9.
- Drops CPU_FTR_SUBCORE.
- Drops PMU code and machine check.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Fri, 19 Feb 2016 00:16:22 +0000 (11:16 +1100)]
powerpc/powernv: Create separate subcores CPU feature bit
Subcores isn't really part of the 2.07 architecture but currently we
turn it on using the 2.07 feature bit. Subcores is really a POWER8
specific feature.
This adds a new CPU_FTR bit just for subcores and moves the subcore
init code over to use this.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When initialising OPAL interfaces, there is a possibility that
opal_msglog_init() may fail to initialise the msglog/memory console.
Fix opal_msglog_sysfs_init() so it doesn't try to create sysfs entry for
the msglog if this occurs.
Suggested-by: Joel Stanley <joel@jms.id.au> Fixes: 9b4fffa14906 ("powerpc/powernv: new function to access OPAL msglog") Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/mm/hash: Clear the invalid slot information correctly
We can get a hash pte fault with 4k base page size and find the pte
already inserted with 64K base page size. In that case we need to clear
the existing slot information from the old pte. Fix this correctly
With THP, we also clear the slot information with respect to all
the 64K hash pte mapping that 16MB page. They are all invalid
now. This make sure we don't find the slot valid when we fault with
4k base page size. Finding the slot valid should not result in any wrong
behavior because we do check again in hash page table for the validity.
But we can avoid that check completely.
Fixes: a43c0eb8364c022 ("powerpc/mm: Convert 4k hash insert to C") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Fri, 12 Feb 2016 05:03:05 +0000 (16:03 +1100)]
powerpc/eeh: Fix partial hotplug criterion
During error recovery, the device could be removed as part of the
partial hotplug. The criterion used to come with partial hotplug
is: if the device driver provides error_detected(), slot_reset()
and resume() callbacks, it's immune from hotplug. Otherwise,
it's going to experience partial hotplug during EEH recovery. But
the criterion isn't correct enough: mlx4_core driver for Mellanox
adapters provides error_detected(), slot_reset() callbacks, but
resume() isn't there. Those Mellanox adapters won't be to involved
in the partial hotplug.
This fixes the criterion to a practical one: adpater with driver
that provides error_detected(), slot_reset() will be immune from
partial hotplug. resume() isn't mandatory.
Fixes: f2da4ccf ("powerpc/eeh: More relaxed hotplug criterion") Cc: stable@vger.kernel.org #v4.4+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Boqun Feng [Tue, 15 Dec 2015 14:24:17 +0000 (22:24 +0800)]
powerpc: atomic: Implement acquire/release/relaxed variants for cmpxchg
Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
which _release variants can be built.
To avoid superfluous barriers in _acquire variants, we implement these
operations with assembly code rather use __atomic_op_acquire() to build
them automatically.
For the same reason, we keep the assembly implementation of fully
ordered cmpxchg operations.
However, we don't do the similar for _release, because that will require
putting barriers in the middle of ll/sc loops, which is probably a bad
idea.
Note cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed are not
compiler barriers.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Boqun Feng [Tue, 15 Dec 2015 14:24:16 +0000 (22:24 +0800)]
powerpc: atomic: Implement acquire/release/relaxed variants for xchg
Implement xchg{,64}_relaxed and atomic{,64}_xchg_relaxed, based on these
_relaxed variants, release/acquire variants and fully ordered versions
can be built.
Note that xchg{,64}_relaxed and atomic_{,64}_xchg_relaxed are not
compiler barriers.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On powerpc, acquire and release semantics can be achieved with
lightweight barriers("lwsync" and "ctrl+isync"), which can be used to
implement __atomic_op_{acquire,release}.
For release semantics, since we only need to ensure all memory accesses
that issue before must take effects before the -store- part of the
atomics, "lwsync" is what we only need. On the platform without
"lwsync", "sync" should be used. Therefore in __atomic_op_release() we
use PPC_RELEASE_BARRIER.
For acquire semantics, "lwsync" is what we only need for the similar
reason. However on the platform without "lwsync", we can use "isync"
rather than "sync" as an acquire barrier. Therefore in
__atomic_op_acquire() we use PPC_ACQUIRE_BARRIER, which is barrier() on
UP, "lwsync" if available and "isync" otherwise.
Implement atomic{,64}_{add,sub,inc,dec}_return_relaxed, and build other
variants with these helpers.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Boqun Feng [Tue, 15 Dec 2015 14:24:14 +0000 (22:24 +0800)]
atomics: Allow architectures to define their own __atomic_op_* helpers
Some architectures may have their special barriers for acquire, release
and fence semantics, so that general memory barriers(smp_mb__*_atomic())
in the default __atomic_op_*() may be too strong, so allow architectures
to define their own helpers which can overwrite the default helpers.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Balbir Singh [Mon, 1 Feb 2016 06:03:25 +0000 (17:03 +1100)]
powerpc: Fix kgdb on little endian ppc64le
I spent some time trying to use kgdb and debugged my inability to
resume from kgdb_handle_breakpoint(). NIP is not incremented
and that leads to a loop in the debugger.
I've tested this lightly on a virtual instance with KDB enabled.
After the patch, I am able to get the "go" command to work as
expected.
Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/ioda: Set "read" permission when "write" is set
Quite often drivers set only "write" permission assuming that this
includes "read" permission as well and this works on plenty of
platforms. However IODA2 is strict about this and produces an EEH when
"read" permission is not set and reading happens.
This adds a workaround in the IODA code to always add the "read" bit
when the "write" bit is set.
Fixes: 10b35b2b7485 ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE") Cc: stable@vger.kernel.org # 4.2+ Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/mm: Fix Multi hit ERAT cause by recent THP update
With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensure that low level hash fault handling
will skip this huge pte and we will handle them at upper levels.
Recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.
Consider following race:
CPU0 CPU1
shrink_page_list()
add_to_swap()
split_huge_page_to_list()
__split_huge_pmd_locked()
pmdp_huge_clear_flush_notify()
// pmd_none() == true
exit_mmap()
unmap_vmas()
zap_pmd_range()
// no action on pmd since pmd_none() == true
pmd_populate()
As result the THP will not be freed. The leak is detected by check_mm():
The above required us to not mark pmd none during a pmd split.
The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
level fault handling code skip this pte. At higher level we do take ptl
lock. That should serialze us against the pmd split. Once the lock is
acquired we do check the pmd again using pmd_same. That should always
return false for us and hence we should retry the access. We do the
pmd_same check in all case after taking plt with
THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed)
Also make sure we wait for irq disable section in other cpus to finish
before flipping a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irq disable to get
a stable pte_t pointer. A parallel thp split need to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq disable section to finish.
Fixes: eef1b3ba053a ("thp: implement split_huge_pmd()") Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Tue, 9 Feb 2016 04:50:22 +0000 (15:50 +1100)]
powerpc/powernv: Fix stale PE primary bus
When PCI bus is unplugged during full hotplug for EEH recovery,
the platform PE instance (struct pnv_ioda_pe) isn't released and
it dereferences the stale PCI bus that has been released. It leads
to kernel crash when referring to the stale PCI bus.
This fixes the issue by correcting the PE's primary bus when it's
oneline at plugging time, in pnv_pci_dma_bus_setup() which is to
be called by pcibios_fixup_bus().
Cc: stable@vger.kernel.org # v4.1+ Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Tue, 9 Feb 2016 04:50:21 +0000 (15:50 +1100)]
powerpc/eeh: Fix stale cached primary bus
When PE is created, its primary bus is cached to pe->bus. At later
point, the cached primary bus is returned from eeh_pe_bus_get().
However, we could get stale cached primary bus and run into kernel
crash in one case: full hotplug as part of fenced PHB error recovery
releases all PCI busses under the PHB at unplugging time and recreate
them at plugging time. pe->bus is still dereferencing the PCI bus
that was released.
This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity
of pe->bus. pe->bus is updated when its first child EEH device is
online and the flag is set. Before unplugging in full hotplug for
error recovery, the flag is cleared.
Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE") Cc: stable@vger.kernel.org #v3.11+ Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Denis Kirjanov [Mon, 14 Dec 2015 20:18:06 +0000 (23:18 +0300)]
powerpc/pseries: Don't trace hcalls on offline CPUs
If a cpu is hotplugged while the hcall trace points are active, it's
possible to hit a warning from RCU due to the trace points calling into
RCU from an offline cpu, eg:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
Make the hypervisor tracepoints conditional by using
TRACE_EVENT_FN_COND.
Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The GPCI hcall allows for a 4K buffer but we limit the buffer to 1K.
The problem with a 1K buffer is if a request results in returning
more values than can be accomodated in the 1K buffer the request will
fail.
The buffer we are using is currently allocated on the stack and hence
limited in size. Instead use a per-CPU 4K buffer like we do with 24x7
counters (hv-24x7.c).
While here, rename the macro GPCI_MAX_DATA_BYTES to HGPCI_MAX_DATA_BYTES
for consistency with 24x7 counters.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wei Yang [Thu, 22 Oct 2015 01:22:19 +0000 (09:22 +0800)]
powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode
When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
sparse.
This patch restructures the code to allocate sparse PE# for VFs when M64
BAR is set to Single PE mode. Also it rename the offset to pe_num_map to
reflect the content is the PE number.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wei Yang [Thu, 22 Oct 2015 01:22:17 +0000 (09:22 +0800)]
powerpc/powernv: replace the hard coded boundary with gate
At the moment 64bit-prefetchable window can be maximum 64GB, which is
currently got from device tree. This means that in shared mode the maximum
supported VF BAR size is 64GB/256=256MB. While this size could exhaust the
whole 64bit-prefetchable window. This is a design decision to set a
boundary to 64MB of the VF BAR size. Since VF BAR size with 64MB would
occupy a quarter of the 64bit-prefetchable window, this is affordable.
This patch replaces magic limit of 64MB with "gate", which is 1/4 of the
M64 Segment Size(m64_segsize >> 2) and adds comment to explain the reason
for it.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vent.ibm.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wei Yang [Thu, 22 Oct 2015 01:22:16 +0000 (09:22 +0800)]
powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
BARs in Single PE mode to cover the number of VFs required to be enabled.
By doing so, several VFs would be in one VF Group and leads to interference
between VFs in the same group.
And in this patch, m64_wins is renamed to m64_map, which means index number
of the M64 BAR used to map the VF BAR. Based on Gavin's comments. Also
makes sure the VF BAR size is bigger than 32MB when M64 BAR is used in
Single PE mode.
This patch changes the design by using one M64 BAR in Single PE mode for
one VF BAR. This gives absolute isolation for VFs.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wei Yang [Thu, 22 Oct 2015 01:22:15 +0000 (09:22 +0800)]
powerpc/powernv: simplify the calculation of iov resource alignment
The alignment of IOV BAR on PowerNV platform is the total size of the IOV
BAR. No matter whether the IOV BAR is extended with number of
roundup_pow_of_two(total_vfs) or number of max PE number (256), the total
size could be calculated by (vfs_expanded * VF_BAR_size).
This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
first case.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wei Yang [Thu, 22 Oct 2015 01:22:14 +0000 (09:22 +0800)]
powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR
On PHB3, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If a
SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned from
64bit prefetchable window, which means M64 BAR can't work on it.
The reason is PCI bridges support only 2 memory windows and the kernel code
programs bridges in the way that one window is 32bit-nonprefetchable and
the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
non-prefetchable, it will be mapped into 32bit space and therefore M64
cannot be used for it.
This patch makes this explicit and truncate IOV resource in this case to
save MMIO space.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Douglas Miller [Mon, 23 Nov 2015 15:01:15 +0000 (09:01 -0600)]
powerpc/xmon: Add xmon command to dump process/task similar to ps(1)
Add 'P' command with optional task_struct address to dump all/one task's
information: task pointer, kernel stack pointer, PID, PPID, state
(interpreted), CPU where (last) running, and command.
Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add the 'do' command to dump the OPAL msglog in xmon.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
[mpe: Reduce the amount of ifdefery required] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/powernv: new function to access OPAL msglog
Currently, the OPAL msglog/console buffer is exposed as a sysfs file, with
the sysfs read handler responsible for retrieving the log from the OPAL
buffer. We'd like to be able to use it in xmon as well.
Refactor the OPAL msglog code to create a new function, opal_msglog_copy(),
that copies to an arbitrary buffer. Separate the initialisation code into
generic memcons init and sysfs file creation.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Russell Currey [Mon, 8 Feb 2016 04:08:20 +0000 (15:08 +1100)]
powerpc/powernv: Remove support for p5ioc2
"p5ioc2 is used by approximately 2 machines in the world, and has never
ever been a supported configuration."
The code for p5ioc2 is essentially unused and complicates what is already
a very complicated codebase. Its removal is essentially a "free win" in
the effort to simplify the powernv PCI code.
In addition, support for p5ioc2 has been dropped from skiboot. There's no
reason to keep it around in the kernel.
Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Andreas Schwab [Fri, 5 Feb 2016 18:50:03 +0000 (19:50 +0100)]
powerpc: Fix dedotify for binutils >= 2.26
Since binutils 2.26 BFD is doing suffix merging on STRTAB sections. But
dedotify modifies the symbol names in place, which can also modify
unrelated symbols with a name that matches a suffix of a dotted name. To
remove the leading dot of a symbol name we can just increment the pointer
into the STRTAB section instead.
Backport to all stables to avoid breakage when people update their
binutils - mpe.
Cc: stable@vger.kernel.org Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Sun, 7 Feb 2016 23:23:20 +0000 (15:23 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"The first real batch of fixes for this release cycle, so there are a
few more than usual.
Most of these are fixes and tweaks to board support (DT bugfixes,
etc). I've also picked up a couple of small cleanups that seemed
innocent enough that there was little reason to wait (const/
__initconst and Kconfig deps).
Quite a bit of the changes on OMAP were due to fixes to no longer
write to rodata from assembly when ARM_KERNMEM_PERMS was enabled, but
there were also other fixes.
Kirkwood had a bunch of gpio fixes for some boards. OMAP had RTC
fixes on OMAP5, and Nomadik had changes to MMC parameters in DT.
All in all, mostly the usual mix of various fixes"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (46 commits)
ARM: multi_v7_defconfig: enable DW_WATCHDOG
ARM: nomadik: fix up SD/MMC DT settings
ARM64: tegra: Add chosen node for tegra132 norrin
ARM: realview: use "depends on" instead of "if" after prompt
ARM: tango: use "depends on" instead of "if" after prompt
ARM: tango: use const and __initconst for smp_operations
ARM: realview: use const and __initconst for smp_operations
bus: uniphier-system-bus: revive tristate prompt
arm64: dts: Add missing DMA Abort interrupt to Juno
bus: vexpress-config: Add missing of_node_put
ARM: dts: am57xx: sbc-am57x: correct Eth PHY settings
ARM: dts: am57xx: cl-som-am57x: fix CPSW EMAC pinmux
ARM: dts: am57xx: sbc-am57x: fix UART3 pinmux
ARM: dts: am57xx: cl-som-am57x: update SPI Flash frequency
ARM: dts: am57xx: cl-som-am57x: set HOST mode for USB2
ARM: dts: am57xx: sbc-am57x: fix SB-SOM EEPROM I2C address
ARM: dts: LogicPD Torpedo: Revert Duplicative Entries
ARM: dts: am437x: pixcir_tangoc: use correct flags for irq types
ARM: dts: am4372: fix irq type for arm twd and global timer
ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
...
Linus Torvalds [Sun, 7 Feb 2016 06:14:46 +0000 (22:14 -0800)]
Merge tag 'usb-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes for 4.5-rc3.
The usual, xhci fixes for reported issues, combined with some small
gadget driver fixes, and a MAINTAINERS file update. All have been in
linux-next with no reported issues"
* tag 'usb-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: harden xhci_find_next_ext_cap against device removal
xhci: Fix list corruption in urb dequeue at host removal
usb: host: xhci-plat: fix NULL pointer in probe for device tree case
usb: xhci-mtk: fix AHB bus hang up caused by roothubs polling
usb: xhci-mtk: fix bpkts value of LS/HS periodic eps not behind TT
usb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Broxton-M platforms
usb: xhci: set SSIC port unused only if xhci_suspend succeeds
usb: xhci: add a quirk bit for ssic port unused
usb: xhci: handle both SSIC ports in PME stuck quirk
usb: dwc3: gadget: set the OTG flag in dwc3 gadget driver.
Revert "xhci: don't finish a TD if we get a short-transfer event mid TD"
MAINTAINERS: fix my email address
usb: dwc2: Fix probe problem on bcm2835
Revert "usb: dwc2: Move reset into dwc2_get_hwparams()"
usb: musb: ux500: Fix NULL pointer dereference at system PM
usb: phy: mxs: declare variable with initialized value
usb: phy: msm: fix error handling in probe.
Linus Torvalds [Sun, 7 Feb 2016 06:13:16 +0000 (22:13 -0800)]
Merge tag 'staging-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver fixes from Greg KH:
"Here are some IIO and staging driver fixes for 4.5-rc3.
All of them, except one, are for IIO drivers, and one is for a speakup
driver fix caused by some earlier patches, to resolve a reported build
failure"
* tag 'staging-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
Staging: speakup: Fix allyesconfig build on mn10300
iio: dht11: Use boottime
iio: ade7753: avoid uninitialized data
iio: pressure: mpl115: fix temperature offset sign
iio: imu: Fix dependencies for !HAS_IOMEM archs
staging: iio: Fix dependencies for !HAS_IOMEM archs
iio: adc: Fix dependencies for !HAS_IOMEM archs
iio: inkern: fix a NULL dereference on error
iio:adc:ti_am335x_adc Fix buffered mode by identifying as software buffer.
iio: light: acpi-als: Report data as processed
iio: dac: mcp4725: set iio name property in sysfs
iio: add HAS_IOMEM dependency to VF610_ADC
iio: add IIO_TRIGGER dependency to STK8BA50
iio: proximity: lidar: correct return value
iio-light: Use a signed return type for ltr501_match_samp_freq()
Linus Torvalds [Sat, 6 Feb 2016 04:20:07 +0000 (20:20 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"22 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT
radix-tree: fix oops after radix_tree_iter_retry
MAINTAINERS: trim the file triggers for ABI/API
dax: dirty inode only if required
thp: make deferred_split_scan() work again
mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
um: asm/page.h: remove the pte_high member from struct pte_t
mm, hugetlb: don't require CMA for runtime gigantic pages
mm/hugetlb: fix gigantic page initialization/allocation
mm: downgrade VM_BUG in isolate_lru_page() to warning
mempolicy: do not try to queue pages from !vma_migratable()
mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
vmstat: make vmstat_update deferrable
mm, vmstat: make quiet_vmstat lighter
mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT
memblock: don't mark memblock_phys_mem_size() as __init
dump_stack: avoid potential deadlocks
mm: validate_mm browse_rb SMP race condition
m32r: fix build failure due to SMP and MMU
...
Linus Torvalds [Sat, 6 Feb 2016 03:52:57 +0000 (19:52 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"We have a few wire protocol compatibility fixes, ports of a few recent
CRUSH mapping changes, and a couple error path fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: MOSDOpReply v7 encoding
libceph: advertise support for TUNABLES5
crush: decode and initialize chooseleaf_stable
crush: add chooseleaf_stable tunable
crush: ensure take bucket value is valid
crush: ensure bucket id is valid before indexing buckets array
ceph: fix snap context leak in error path
ceph: checking for IS_ERR instead of NULL
Linus Torvalds [Sat, 6 Feb 2016 03:38:15 +0000 (19:38 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Fixes all over the place:
- amdkfd: two static checker fixes
- mst: a bunch of static checker and spec/hw interaction fixes
- amdgpu: fix Iceland hw properly, and some fiji bugs, along with
some write-combining fixes.
- exynos: some regression fixes
- adv7511: fix some EDID reading issues"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (38 commits)
drm/dp/mst: deallocate payload on port destruction
drm/dp/mst: Reverse order of MST enable and clearing VC payload table.
drm/dp/mst: move GUID storage from mgr, port to only mst branch
drm/dp/mst: change MST detection scheme
drm/dp/mst: Calculate MST PBN with 31.32 fixed point
drm: Add drm_fixp_from_fraction and drm_fixp2int_ceil
drm/mst: Add range check for max_payloads during init
drm/mst: Don't ignore the MST PBN self-test result
drm: fix missing reference counting decrease
drm/amdgpu: disable uvd and vce clockgating on Fiji
drm/amdgpu: remove exp hardware support from iceland
drm/amdgpu: load MEC ucode manually on iceland
drm/amdgpu: don't load MEC2 on topaz
drm/amdgpu: drop topaz support from gmc8 module
drm/amdgpu: pull topaz gmc bits into gmc_v7
drm/amdgpu: The VI specific EXE bit should only apply to GMC v8.0 above
drm/amdgpu: iceland use CI based MC IP
drm/amdgpu: move gmc7 support out of CIK dependency
drm/amdgpu/gfx7: enable cp inst/reg error interrupts
drm/amdgpu/gfx8: enable cp inst/reg error interrupts
...
Linus Torvalds [Sat, 6 Feb 2016 02:11:23 +0000 (18:11 -0800)]
Merge tag 'pm+acpi-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are: a fix for a recently introduced false-positive warnings
about PM domain pointers being changed inappropriately (harmless but
annoying), an MCH size workaround quirk for one more platform, a
compiler warning fix (generic power domains framework), an ACPI LPSS
(Intel SoCs) driver fixup and a cleanup of the ACPI CPPC core code.
Specifics:
- PM core fix to avoid false-positive warnings generated when the
pm_domain field is cleared for a device that appears to be bound to
a driver (Rafael Wysocki).
- New MCH size workaround quirk for Intel Haswell-ULT (Josh Boyer).
- Fix for an "unused function" compiler warning in the generic power
domains framework (Ulf Hansson).
- Fixup for the ACPI driver for Intel SoCs (acpi-lpss) to set the PM
domain pointer of a device properly in one place that was
overlooked by a recent PM core update (Andy Shevchenko).
- Removal of a redundant function declaration in the ACPI CPPC core
code (Timur Tabi)"
* tag 'pm+acpi-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: Avoid false-positive warnings in dev_pm_domain_set()
PM / Domains: Silence compiler warning for an unused function
ACPI / CPPC: remove redundant mbox_send_message() declaration
ACPI / LPSS: set PM domain via helper setter
PNP: Add Haswell-ULT to Intel MCH size workaround
Jason Baron [Fri, 5 Feb 2016 23:37:04 +0000 (15:37 -0800)]
epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT
In the current implementation of the EPOLLEXCLUSIVE flag (added for
4.5-rc1), if epoll waiters create different POLL* sets and register them
as exclusive against the same target fd, the current implementation will
stop waking any further waiters once it finds the first idle waiter.
This means that waiters could miss wakeups in certain cases.
For example, when we wake up a pipe for reading we do:
wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM); So if
one epoll set or epfd is added to pipe p with POLLIN and a second set
epfd2 is added to pipe p with POLLRDNORM, only epfd may receive the
wakeup since the current implementation will stop after it finds any
intersection of events with a waiter that is blocked in epoll_wait().
We could potentially address this by requiring all epoll waiters that
are added to p be required to pass the same set of POLL* events. IE the
first EPOLL_CTL_ADD that passes EPOLLEXCLUSIVE establishes the set POLL*
flags to be used by any other epfds that are added as EPOLLEXCLUSIVE.
However, I think it might be somewhat confusing interface as we would
have to reference count the number of users for that set, and so
userspace would have to keep track of that count, or we would need a
more involved interface. It also adds some shared state that we'd have
store somewhere. I don't think anybody will want to bloat
__wait_queue_head for this.
I think what we could do instead, is to simply restrict EPOLLEXCLUSIVE
such that it can only be specified with EPOLLIN and/or EPOLLOUT. So
that way if the wakeup includes 'POLLIN' and not 'POLLOUT', we can stop
once we hit the first idle waiter that specifies the EPOLLIN bit, since
any remaining waiters that only have 'POLLOUT' set wouldn't need to be
woken. Likewise, we can do the same thing if 'POLLOUT' is in the wakeup
bit set and not 'POLLIN'. If both 'POLLOUT' and 'POLLIN' are set in the
wake bit set (there is at least one example of this I saw in fs/pipe.c),
then we just wake the entire exclusive list. Having both 'POLLOUT' and
'POLLIN' both set should not be on any performance critical path, so I
think that's ok (in fs/pipe.c its in pipe_release()). We also continue
to include EPOLLERR and EPOLLHUP by default in any exclusive set. Thus,
the user can specify EPOLLERR and/or EPOLLHUP but is not required to do
so.
Since epoll waiters may be interested in other events as well besides
EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP, these can still be added by
doing a 'dup' call on the target fd and adding that as one normally
would with EPOLL_CTL_ADD. Since I think that the POLLIN and POLLOUT
events are what we are interest in balancing, I think that the 'dup'
thing could perhaps be added to only one of the waiter threads.
However, I think that EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP should be
sufficient for the majority of use-cases.
Since EPOLLEXCLUSIVE is intended to be used with a target fd shared
among multiple epfds, where between 1 and n of the epfds may receive an
event, it does not satisfy the semantics of EPOLLONESHOT where only 1
epfd would get an event. Thus, it is not allowed to be specified in
conjunction with EPOLLEXCLUSIVE.
EPOLL_CTL_MOD is also not allowed if the fd was previously added as
EPOLLEXCLUSIVE. It seems with the limited number of flags to not be as
interesting, but this could be relaxed at some further point.
Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Madars Vitolins <m@silodev.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Eric Wong <normalperson@yhbt.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Helper radix_tree_iter_retry() resets next_index to the current index.
In following radix_tree_next_slot current chunk size becomes zero. This
isn't checked and it tries to dereference null pointer in slot.
Tagged iterator is fine because retry happens only at slot 0 where tag
bitmask in iter->tags is filled with single bit.
Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ohad Ben-Cohen <ohad@wizery.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit ea8f8fc8631 ("MAINTAINERS: add linux-api for review of API/ABI
changes") added file triggers for various paths that likely indicated
API/ABI changes. However, catching all changes in Documentation/ABI/
and include/uapi/ produces a large volume of mail to linux-api, rather
than only API/ABI changes. Drop those two entries, but leave
include/linux/syscalls.h and kernel/sys_ni.c to catch syscall-related
changes.
Dmitry Monakhov [Fri, 5 Feb 2016 23:36:55 +0000 (15:36 -0800)]
dax: dirty inode only if required
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to iterate over split_queue, not local empty list to get
anything split from the shrinker.
Fixes: e3ae19535c66 ("thp: limit number of object to scan on deferred_split_scan()") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
Sequence vma_lock_anon_vma() - vma_unlock_anon_vma() isn't safe if
anon_vma appeared between lock and unlock. We have to check anon_vma
first or call anon_vma_prepare() to be sure that it's here. There are
only few users of these legacy helpers. Let's get rid of them.
This patch fixes anon_vma lock imbalance in validate_mm(). Write lock
isn't required here, read lock is enough.
And reorders expand_downwards/expand_upwards: security_mmap_addr() and
wrapping-around check don't have to be under anon vma lock.
Link: https://lkml.kernel.org/r/CACT4Y+Y908EjM2z=706dv4rV6dWtxTLK9nFg9_7DhRMLppBo2g@mail.gmail.com Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xuejiufei [Fri, 5 Feb 2016 23:36:47 +0000 (15:36 -0800)]
ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
When recovery master down, dlm_do_local_recovery_cleanup() only remove
the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
Which will make umount thread falling in dead loop migrating $RECOVERY
to the dead node.
Signed-off-by: xuejiufei <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolai Stange [Fri, 5 Feb 2016 23:36:44 +0000 (15:36 -0800)]
um: asm/page.h: remove the pte_high member from struct pte_t
Commit 16da306849d0 ("um: kill pfn_t") introduced a compile warning for
defconfig (SUBARCH=i386):
arch/um/kernel/skas/mmu.c:38:206:
warning: right shift count >= width of type [-Wshift-count-overflow]
Aforementioned patch changes the definition of the phys_to_pfn() macro
from
((pfn_t) ((p) >> PAGE_SHIFT))
to
((p) >> PAGE_SHIFT)
This effectively changes the phys_to_pfn() expansion's type from
unsigned long long to unsigned long.
Through the callchain init_stub_pte() => mk_pte(), the expansion of
phys_to_pfn() is (indirectly) fed into the 'phys' argument of the
pte_set_val(pte, phys, prot) macro, eventually leading to
(pte).pte_high = (phys) >> 32;
This results in the warning from above.
Since UML only deals with 32 bit addresses, the upper 32 bits from
'phys' used to be always zero anyway. Also, all page protection flags
defined by UML don't use any bits beyond bit 9. Since the contents of a
PTE are defined within architecture scope only, the ->pte_high member
can be safely removed.
Remove the ->pte_high member from struct pte_t.
Rename ->pte_low to ->pte.
Adapt the pte helper macros in arch/um/include/asm/page.h.
Noteworthy is the pte_copy() macro where a smp_wmb() gets dropped. This
write barrier doesn't seem to be paired with any read barrier though and
thus, was useless anyway.
Fixes: 16da306849d0 ("um: kill pfn_t") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Richard Weinberger <richard@nod.at> Cc: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Fri, 5 Feb 2016 23:36:41 +0000 (15:36 -0800)]
mm, hugetlb: don't require CMA for runtime gigantic pages
Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation
at runtime") has added the runtime gigantic page allocation via
alloc_contig_range(), making this support available only when CONFIG_CMA
is enabled. Because it doesn't depend on MIGRATE_CMA pageblocks and the
associated infrastructure, it is possible with few simple adjustments to
require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA.
After this patch, alloc_contig_range() and related functions are
available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION
enabled. Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION. This allows
supporting runtime gigantic pages without the CMA-specific checks in
page allocator fastpaths.
Attempting to preallocate 1G gigantic huge pages at boot time with
"hugepagesz=1G hugepages=1" on the kernel command line will prevent
booting with the following:
kernel BUG at mm/hugetlb.c:1218!
When mapcount accounting was reworked, the setting of
compound_mapcount_ptr in prep_compound_gigantic_page was overlooked. As
a result, the validation of mapcount in free_huge_page fails.
The "BUG_ON" checks in free_huge_page were also changed to
"VM_BUG_ON_PAGE" to assist with debugging.
Fixes: 53f9263baba69 ("mm: rework mapcount accounting to enable 4k mapping of THPs") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only case when we can queue pages from such VMA is MPOL_MF_STRICT
plus MPOL_MF_MOVE or MPOL_MF_MOVE_ALL for VMA which has pages on LRU,
but gfp mask is not sutable for migaration (see mapping_gfp_mask() check
in vma_migratable()). That's looks like a bug to me.
Let's filter out non-migratable vma at start of queue_pages_test_walk()
and go to queue_pages_pte_range() only if MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL flag is set.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Fri, 5 Feb 2016 23:36:30 +0000 (15:36 -0800)]
mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
Jan Stancek has reported that system occasionally hanging after "oom01"
testcase from LTP triggers OOM. Guessing from a result that there is a
kworker thread doing memory allocation and the values between "Node 0
Normal free:" and "Node 0 Normal:" differs when hanging, vmstat is not
up-to-date for some reason.
According to commit 373ccbe59270 ("mm, vmstat: allow WQ concurrency to
discover memory reclaim doesn't make any progress"), it meant to force
the kworker thread to take a short sleep, but it by error used
schedule_timeout(1). We missed that schedule_timeout() in state
TASK_RUNNING doesn't do anything.
Fix it by using schedule_timeout_uninterruptible(1) which forces the
kworker thread to take a short sleep in order to make sure that vmstat
is up-to-date.
Fixes: 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Jan Stancek <jstancek@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Cristopher Lameter <clameter@sgi.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Arkadiusz Miskiewicz <arekm@maven.pl> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 5 Feb 2016 23:36:27 +0000 (15:36 -0800)]
vmstat: make vmstat_update deferrable
Commit 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") made vmstat_shepherd deferrable. vmstat_update
itself is still useing standard timer which might interrupt idle task.
This is possible because "mm, vmstat: make quiet_vmstat lighter" removed
cancel_delayed_work from the quiet_vmstat.
Change vmstat_work to use DEFERRABLE_WORK to prevent from pointless
wakeups from the idle context.
Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is caused by commit 0eb77e988032 ("vmstat: make vmstat_updater
deferrable again and shut down on idle") which has placed quiet_vmstat
into cpu_idle_loop. The main reason here seems to be that the idle
entry has to get over all zones and perform atomic operations for each
vmstat entry even though there might be no per cpu diffs. This is a
pointless overhead for _each_ idle entry.
Make sure that quiet_vmstat is as light as possible.
First of all it doesn't make any sense to do any local sync if the
current cpu is already set in oncpu_stat_off because vmstat_update puts
itself there only if there is nothing to do.
Then we can check need_update which should be a cheap way to check for
potential per-cpu diffs and only then do refresh_cpu_vm_stats.
The original patch also did cancel_delayed_work which we are not doing
here. There are two reasons for that. Firstly cancel_delayed_work from
idle context will blow up on RT kernels (reported by Mike):
And secondly, even on !RT kernels it might add some non trivial overhead
which is not necessary. Even if the vmstat worker wakes up and preempts
idle then it will be most likely a single shot noop because the stats
were already synced and so it would end up on the oncpu_stat_off anyway.
We just need to teach both vmstat_shepherd and vmstat_update to stop
scheduling the worker if there is nothing to do.
[mgalbraith@suse.de: cancel pending work of the cpu_stat_off CPU] Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Gibson [Fri, 5 Feb 2016 23:36:19 +0000 (15:36 -0800)]
memblock: don't mark memblock_phys_mem_size() as __init
At the moment memblock_phys_mem_size() is marked as __init, and so is
discarded after boot. This is different from most of the memblock
functions which are marked __init_memblock, and are only discarded after
boot if memory hotplug is not configured.
To allow for upcoming code which will need memblock_phys_mem_size() in
the hotplug path, change it from __init to __init_memblock.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mmap_sem for reading in validate_mm called from expand_stack is not
enough to prevent the argumented rbtree rb_subtree_gap information to
change from under us because expand_stack may be running from other
threads concurrently which will hold the mmap_sem for reading too.
The argumented rbtree is updated with vma_gap_update under the
page_table_lock so use it in browse_rb() too to avoid false positives.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudip Mukherjee [Fri, 5 Feb 2016 23:36:10 +0000 (15:36 -0800)]
m32r: fix build failure due to SMP and MMU
One of the randconfig build failed with the error:
arch/m32r/kernel/smp.c: In function 'smp_flush_tlb_mm':
arch/m32r/kernel/smp.c:283:20: error: subscripted value is neither array nor pointer nor vector
mmc = &mm->context[cpu_id];
^
arch/m32r/kernel/smp.c: In function 'smp_flush_tlb_page':
arch/m32r/kernel/smp.c:353:20: error: subscripted value is neither array nor pointer nor vector
mmc = &mm->context[cpu_id];
^
arch/m32r/kernel/smp.c: In function 'smp_invalidate_interrupt':
arch/m32r/kernel/smp.c:479:41: error: subscripted value is neither array nor pointer nor vector
unsigned long *mmc = &flush_mm->context[cpu_id];
It turned out that CONFIG_SMP was defined but CONFIG_MMU was not
defined. But arch/m32r/include/asm/mmu.h only defines mm_context_t as
an array when both CONFIG_SMP and CONFIG_MMU are defined. And
arch/m32r/kernel/smp.c is always using context as an array. So without
MMU SMP can not work.
Ross Zwisler [Fri, 5 Feb 2016 23:36:08 +0000 (15:36 -0800)]
block: fix pfn_mkwrite() DAX fault handler
Previously the pfn_mkwrite() fault handler for raw block devices called
bldev_dax_fault() -> __dax_fault() to do a full DAX page fault.
Really what the pfn_mkwrite() fault handler needs to do is call
dax_pfn_mkwrite() to make sure that the radix tree entry for the given
PTE is marked as dirty so that a follow-up fsync or msync call will
flush it durably to media.
Fixes: 5a023cdba50c ("block: enable dax for raw block devices") Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 5 Feb 2016 20:46:38 +0000 (12:46 -0800)]
Merge tag 'media/v4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- vb2: fix a vb2_thread regression and DVB read() breakages
- vsp1: fix compilation and links creation
- s5k6a3: Fix VIDIOC_SUBDEV_G_FMT ioctl for TRY format
- exynos4-is: fix a build issue, format negotiation and sensor detection
- Fix a regression with pvrusb2 and ir-kbd-i2c
- atmel-isi: fix debug message which only show the first format
- tda1004x: fix a tuning bug if G_PROPERTY is called too early
- saa7134-alsa: fix a bug at device unbinding/driver removal
- Fix build of one driver if !HAS_DMA
- soc_camera: cleanup control device on async_unbind
* tag 'media/v4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] saa7134-alsa: Only frees registered sound cards
[media] vb2-core: call threadio->fnc() if !VB2_BUF_STATE_ERROR
[media] vb2: fix nasty vb2_thread regression
[media] tda1004x: only update the frontend properties if locked
[media] media: i2c: Don't export ir-kbd-i2c module alias
[media] exynos4-is: make VIDEO_SAMSUNG_EXYNOS4_IS tristate
[media] media: Kconfig: add dependency of HAS_DMA
[media] exynos4-is: Wait for 100us before opening sensor
[media] exynos4-is: Open shouldn't fail when sensor entity is not linked
[media] s5k6a3: Fix VIDIOC_SUBDEV_G_FMT ioctl for TRY format
[media] exynos4-is: fix a format string bug
[media] drivers/media: vsp1_video: fix compile error
[media] atmel-isi: fix debug message which only show the first format
[media] soc_camera: cleanup control device on async_unbind
[media] v4l: vsp1: Fix wrong entities links creation
Linus Torvalds [Fri, 5 Feb 2016 20:34:44 +0000 (12:34 -0800)]
Merge tag 'sound-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This was a busy week and I had to prepare a pile of duct tapes for the
bugs reported by syzkaller fuzzer in wide range of ALSA core APIs:
timer, rawmidi, sequencer, and PCM OSS emulation. Let's see how many
other holes we need to plug.
Besides that, a few usual boring stuff, HD- and USB-audio quirks, have
been added"
* tag 'sound-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: timer: Fix leftover link at closing
ALSA: seq: Fix lockdep warnings due to double mutex locks
ALSA: rawmidi: Fix race at copying & updating the position
ALSA: rawmidi: Make snd_rawmidi_transmit() race-free
ALSA: hda - Add fixup for Mac Mini 7,1 model
ALSA: hda/realtek - Support headset mode for ALC225
ALSA: hda/realtek - Support Dell headset mode for ALC225
ALSA: hda/realtek - New codec support of ALC225
ALSA: timer: Sync timer deletion at closing the system timer
ALSA: timer: Fix link corruption due to double start or stop
ALSA: seq: Fix yet another races among ALSA timer accesses
ALSA: pcm: Fix potential deadlock in OSS emulation
ALSA: rawmidi: Remove kernel WARNING for NULL user-space buffer check
ALSA: seq: Fix race at closing in virmidi driver
ALSA: emu10k1: correctly handling failed thread creation
ALSA: usb-audio: Add quirk for Microsoft LifeCam HD-6000
ALSA: usb-audio: Add native DSD support for PS Audio NuWave DAC
ALSA: usb-audio: Fix OPPO HA-1 vendor ID
Linus Torvalds [Fri, 5 Feb 2016 19:20:15 +0000 (11:20 -0800)]
Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
"This fixes several Kconfig dependencies, a compilation warning in
pcwd_usb, a failure to abort the sp805 wdt after a ping and the
max63xx wdt's MODULE_LICENSE"
* git://www.linux-watchdog.org/linux-watchdog:
watchdog: Fix dependencies for !HAS_IOMEM archs
watchdog: imgdpc: select WATCHDOG_CORE
watchdog: tango: rename ARCH_TANGOX to ARCH_TANGO
watchdog: pcwd_usb: fix compilation warning
watchdog: sp805: ping fails to abort wdt reset
watchdog: max63xx: make module's license marker match the header
Dave Airlie [Fri, 5 Feb 2016 05:24:17 +0000 (15:24 +1000)]
Merge branch 'drm-fixes-mst' of git://people.freedesktop.org/~airlied/linux into drm-fixes
displayport multistream fixes from AMD.
* 'drm-fixes-mst' of git://people.freedesktop.org/~airlied/linux:
drm/dp/mst: deallocate payload on port destruction
drm/dp/mst: Reverse order of MST enable and clearing VC payload table.
drm/dp/mst: move GUID storage from mgr, port to only mst branch
drm/dp/mst: change MST detection scheme
drm/dp/mst: Calculate MST PBN with 31.32 fixed point
drm: Add drm_fixp_from_fraction and drm_fixp2int_ceil
drm/mst: Add range check for max_payloads during init
drm/mst: Don't ignore the MST PBN self-test result
drm: fix missing reference counting decrease
Mykola Lysenko [Wed, 27 Jan 2016 14:39:36 +0000 (09:39 -0500)]
drm/dp/mst: deallocate payload on port destruction
This is needed to properly deallocate port payload
after downstream branch get unplugged.
In order to do this unplugged MST topology should
be preserved, to find first alive port on path to
unplugged MST topology, and send payload deallocation
request to branch device of found port.
For this mstb and port kref's are used in reversed
order to track when port and branch memory could be
freed.
Added additional functions to find appropriate mstb
as described above.
Signed-off-by: Mykola Lysenko <Mykola.Lysenko@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
Hersen Wu [Fri, 22 Jan 2016 22:07:28 +0000 (17:07 -0500)]
drm/dp/mst: move GUID storage from mgr, port to only mst branch
Previous implementation does not handle case below: boot up one MST branch
to DP connector of ASIC. After boot up, hot plug 2nd MST branch to DP output
of 1st MST, GUID is not created for 2nd MST branch. When downstream port of
2nd MST branch send upstream request, it fails because 2nd MST branch GUID
is not available.
New Implementation: only create GUID for MST branch and save it within Branch.
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Cc: stable@vger.kernel.org Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Mykola Lysenko [Fri, 22 Jan 2016 22:07:27 +0000 (17:07 -0500)]
drm/dp/mst: change MST detection scheme
1. Get edid for all connected MST displays, not only on logical ports,
in the same thread as MST topology detection is done:
There are displays that have branches inside w/o logical ports.
So in case another SST display connected downstream system can
end-up in situation when 3 DOWN requests sent: two for
‘remote i2c read’ and one for ‘enum path resources’, making slots full.
2. Call notification callback in one place in the end of topology discovery/update:
This is done to reduce number of events sent to userspace in case complex
topology discovery is going, adding multiple number of connectors;
3. Remove notification callback call from short pulse interrupt processing function:
This is done in order not to block interrupt processing function, in case any
MST request will be made from it. Notification will be send from topology
discovery/update work item.
Signed-off-by: Mykola Lysenko <Mykola.Lysenko@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Cc: stable@vger.kernel.org Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Harry Wentland [Fri, 22 Jan 2016 22:07:26 +0000 (17:07 -0500)]
drm/dp/mst: Calculate MST PBN with 31.32 fixed point
Our PBN value overflows the 20 bits integer part of the 20.12
fixed point. We need to use 31.32 fixed point to avoid this.
This happens with display clocks larger than 293122 (at 24 bpp),
which we see with the Sharp (and similar) 4k tiled displays.
Signed-off-by: Harry Wentland <harry.wentland@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Harry Wentland [Fri, 22 Jan 2016 22:07:25 +0000 (17:07 -0500)]
drm: Add drm_fixp_from_fraction and drm_fixp2int_ceil
drm_fixp_from_fraction allows us to create a fixed point directly
from a fraction, rather than creating fixed point values and dividing
later. This avoids overflow of our 64 bit value for large numbers.
drm_fixp2int_ceil allows us to return the ceiling of our fixed point
value.
[airlied: squash Jordan's fix]
32-bit-build-fix: Jordan Lazare <Jordan.Lazare@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Imre Deak [Fri, 29 Jan 2016 12:44:29 +0000 (14:44 +0200)]
drm/mst: Add range check for max_payloads during init
max_payload is limited by the space we have in
drm_dp_mst_topology_mgr::vcpi_mask,payload_mask. We need to track
max_payloads+1 IDs in these masks, see drm_dp_mst_assign_payload_id().
Add a sanity check for this.
Caught by coverity.
Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: David Weinehall <david.weinehall@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Imre Deak [Fri, 29 Jan 2016 12:44:28 +0000 (14:44 +0200)]
drm/mst: Don't ignore the MST PBN self-test result
Otherwise this call would have no effect.
Caught by Coverity.
Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: David Weinehall <david.weinehall@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 5 Feb 2016 04:48:36 +0000 (14:48 +1000)]
Merge branch 'drm-fixes-4.5' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
- fix and enable iceland/topaz support
- handle WC on platforms that don't support it
* 'drm-fixes-4.5' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: disable uvd and vce clockgating on Fiji
drm/amdgpu: remove exp hardware support from iceland
drm/amdgpu: load MEC ucode manually on iceland
drm/amdgpu: don't load MEC2 on topaz
drm/amdgpu: drop topaz support from gmc8 module
drm/amdgpu: pull topaz gmc bits into gmc_v7
drm/amdgpu: The VI specific EXE bit should only apply to GMC v8.0 above
drm/amdgpu: iceland use CI based MC IP
drm/amdgpu: move gmc7 support out of CIK dependency
drm/amdgpu/gfx7: enable cp inst/reg error interrupts
drm/amdgpu/gfx8: enable cp inst/reg error interrupts
drm/amdgpu: mask out WC from BO on unsupported arches
drm/radeon: mask out WC from BO on unsupported arches
drm: add helper to check for wc memory support
drm/amdgpu: no need to load MC firmware on fiji
Dave Airlie [Fri, 5 Feb 2016 04:47:24 +0000 (14:47 +1000)]
Merge tag 'drm-amdkfd-fixes-2016-01-28' of git://people.freedesktop.org/~gabbayo/linux into drm-fixes
two static checker fixes.
* tag 'drm-amdkfd-fixes-2016-01-28' of git://people.freedesktop.org/~gabbayo/linux:
drm/amdkfd: Remove unnecessary cast in kfree
drm/amdgpu: fix non-ANSI declaration of amdgpu_amdkfd_gfx_*_get_functions()
Dave Airlie [Fri, 5 Feb 2016 04:45:44 +0000 (14:45 +1000)]
Merge branch 'exynos-drm-fixes' of git://git.kernel.org:/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
Just regression fixes.
- Fix build warning and error without PM configuration
- Fix no display issue on Snow board reported by Michal Suchanek,
http://www.spinics.net/lists/dri-devel/msg99473.html
* 'exynos-drm-fixes' of git://git.kernel.org:/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: dp: Fix panel and bridge lookup logic
drm: exynos: make PM functions as __maybe_unused
drm/exynos: fix building without CONFIG_PM_SLEEP
Dave Airlie [Fri, 5 Feb 2016 04:43:35 +0000 (14:43 +1000)]
Merge tag 'drm-intel-fixes-2016-02-04' of git://anongit.freedesktop.org/drm-intel into drm-fixes
misc i915 fixes.
* tag 'drm-intel-fixes-2016-02-04' of git://anongit.freedesktop.org/drm-intel:
drm/i915: refine qemu south bridge detection
drm/i915: Remove select to deleted STOP_MACHINE from Kconfig
drm/i915: Fix NULL plane->fb oops on SKL
drm/i915: Don't reject primary plane windowing with color keying enabled on SKL+
drm/i915/dp: fall back to 18 bpp when sink capability is unknown
drm/i915: Make sure DC writes are coherent on flush.
Joe Lawrence [Wed, 3 Feb 2016 17:51:12 +0000 (12:51 -0500)]
xhci: harden xhci_find_next_ext_cap against device removal
xhci_find_next_ext_cap doesn't check for PCI hotplug removal and may use
the PCI master abort bit pattern (~0) to calculate a new PCI address
offset to read/write. The has lead to reproducable crashes when testing
surprise removal during device initialization on a Stratus platform, at
least after commit d5ddcdf4d672 ("xhci: rework xhci extended capability
list parsing functions").
The crash is repeatable on a Stratus platform when injecting hardware
faults to induce xHCI host controller hotplug during driver
initialization. If a PCI read in xhci_find_next_ext_cap returns the
master abort pattern, quirk_usb_handoff_xhci may start using a bogus
ext_cap_offset to start searching more bogus PCI addresses.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Thu, 4 Feb 2016 22:09:55 +0000 (14:09 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Nothing particularly interesting here, but all important fixes
nonetheless:
- Add missing PAN toggling in the futex code
- Fix missing #include that briefly caused issues in -next
- Allow changing of vmalloc permissions with set_memory_* (used by
bpf)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: asm: Explicitly include linux/personality.h in asm/page.h
arm64: futex.h: Add missing PAN toggling
arm64: allow vmalloc regions to be set with set_memory_*
Linus Walleij [Mon, 1 Feb 2016 13:18:57 +0000 (14:18 +0100)]
ARM: nomadik: fix up SD/MMC DT settings
The DTSI file for the Nomadik does not properly specify how the
PL180 levelshifter is connected: the Nomadik actually needs all
the five st,sig-dir-* flags set to properly control all lines out.
Further this board supports full power cycling of the card, and
since this variant has no hardware clock gating, it needs a
ridiculously low frequency setting to keep up with the ever
overflowing FIFO.
The pin configuration set-up is a bit of a mystery, because of
course these pins are a mix of inputs and outputs. However the
reference implementation sets all pins to "output" with
unspecified initial value, so let's do that here as well.
Merge tag 'fixes-for-v4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
usb: fixes for v4.5-rc2
DWC3 got a fix for OTG Certification, DWC2 has two fixes for regressions on
RasPI, MUSB has a NULL pointer dereference fix for ux500 platforms and two
PHYs (MSM and MXS) got some minor fixes.
While at that, I'm also adding a fix to my email address which has changed
recently.
Linus Torvalds [Thu, 4 Feb 2016 19:18:22 +0000 (11:18 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
- One minor fix to the ib core
- Four minor fixes to the Mellanox drivers
- Remove three deprecated drivers from staging/rdma now that all of
Greg's queued changes to them are merged
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
staging/rdma: remove deprecated ipath driver
staging/rdma: remove deprecated ehca driver
staging/rdma: remove deprecated amso1100 driver
IB/core: Set correct payload length for RoCEv2 over IPv6
IB/mlx5: Use MLX5_GET to correctly get end of padding mode
IB/mlx5: Fix use of null pointer PD
IB/mlx5: Fix reqlen validation in mlx5_ib_alloc_ucontext
IB/mlx5: Add CREATE_CQ and CREATE_QP to uverbs_ex_cmd_mask
Ilya Dryomov [Wed, 3 Feb 2016 14:25:48 +0000 (15:25 +0100)]
libceph: MOSDOpReply v7 encoding
Empty request_redirect_t (struct ceph_request_redirect in the kernel
client) is now encoded with a bool. NEW_OSDOPREPLY_ENCODING feature
bit overlaps with already supported CRUSH_TUNABLES5.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>