Bjorn Helgaas [Mon, 1 Aug 2016 17:23:31 +0000 (12:23 -0500)]
Merge branches 'pci/aspm', 'pci/dpc', 'pci/hotplug', 'pci/misc', 'pci/msi', 'pci/pm' and 'pci/virtualization' into next
* pci/aspm:
PCI/ASPM: Remove redundant check of pcie_set_clkpm
* pci/dpc:
PCI: Remove DPC tristate module option
PCI: Bind DPC to Root Ports as well as Downstream Ports
PCI: Fix whitespace in struct dpc_dev
PCI: Convert Downstream Port Containment driver to use devm_* functions
* pci/hotplug:
PCI: Allow additional bus numbers for hotplug bridges
* pci/misc:
PCI: Include <asm/dma.h> for isa_dma_bridge_buggy
PCI: Make bus_attr_resource_alignment static
MAINTAINERS: Add file patterns for PCI device tree bindings
PCI: Fix comment typo
* pci/pm:
PCI: pciehp: Ignore interrupts during D3cold
PCI: Document connection between pci_power_t and hardware PM capability
PCI: Add runtime PM support for PCIe ports
ACPI / hotplug / PCI: Runtime resume bridge before rescan
PCI: Power on bridges before scanning new devices
PCI: Put PCIe ports into D3 during suspend
PCI: Don't clear d3cold_allowed for PCIe ports
PCI / PM: Enforce type casting for pci_power_t
* pci/virtualization:
PCI: Add ACS quirk for Solarflare SFC9220
PCI: Add DMA alias quirk for Adaptec 3805
PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset
PCI: Add function 1 DMA alias quirk for Marvell 88SE9182
Edward Cree [Thu, 28 Jul 2016 17:13:56 +0000 (18:13 +0100)]
PCI: Add ACS quirk for Solarflare SFC9220
The Solarflare SFC9220 apparently lacks an ACS capability, but does not
perform peer-to-peer between functions. Add a quirk so we know about this
isolation.
[bhelgaas: changelog] Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Keith Busch [Fri, 22 Jul 2016 03:40:28 +0000 (21:40 -0600)]
PCI: Allow additional bus numbers for hotplug bridges
A user may hot add a switch requiring more than one bus to enumerate. This
previously required a system reboot if BIOS did not sufficiently pad the
bus resource, which they frequently don't do.
Add a kernel parameter so a user can specify the minimum number of bus
numbers to reserve for a hotplug bridge's subordinate buses so rebooting
won't be necessary.
The default is 1, which is equivalent to previous behavior.
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Keith Busch [Wed, 6 Jul 2016 16:06:00 +0000 (10:06 -0600)]
PCI: Bind DPC to Root Ports as well as Downstream Ports
PCIe port type values are not flags, so OR'ing them is not correct.
Previously the result was equivalent to PCIe Downstream Ports, so we were
missing binding to DPC-capable Root Ports.
Change the type to 'any' so we can bind to both port types. While this
will cause the code to check Upstream Ports, the driver won't claim them
since they are not DPC-capable.
Reported-by: Alexander Antonov <alexanderx.v.antonov@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Mika Westerberg <mika.westerberg@linux.intel.com>
Ben Dooks [Fri, 17 Jun 2016 15:05:13 +0000 (16:05 +0100)]
PCI: Include <asm/dma.h> for isa_dma_bridge_buggy
At least on arm, <asm/dma.h> does not get included when building
drivers/pci/pci.o. This causes the following build warning which can be
fixed by including <asm/dma.h>:
drivers/pci/pci.c:37:5: warning: symbol 'isa_dma_bridge_buggy' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Lukas Wunner [Fri, 13 May 2016 11:15:31 +0000 (13:15 +0200)]
PCI: pciehp: Ignore interrupts during D3cold
If a hotplug port is suspended to D3cold, its slot status register cannot
be read. If that hotplug port happens to share its IRQ with other devices,
whenever an interrupt occurs for one of these devices, pciehp logs a
"no response from device" message and tries to read the PCI_EXP_SLTSTA
register, even though we know that will fail.
Bjorn Helgaas [Fri, 17 Jun 2016 20:23:52 +0000 (15:23 -0500)]
PCI: Document connection between pci_power_t and hardware PM capability
The dev.pme_support field, pci_pm_init(), pci_pme_capable(), and
pci_raw_set_power_state() depend on the fact that the pci_power_t values
(PCI_D0, PCI_D1, etc.) match the definition of the Capabilities PME_Support
and the Control/Status PowerState fields in the Power Management capability
(see PCI Bus Power Management spec r1.2, sec 3.2.3).
Add a note to this effect at the pci_power_t typedef.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Arnd Bergmann [Wed, 15 Jun 2016 20:47:33 +0000 (15:47 -0500)]
PCI/MSI: irqchip: Fix PCI_MSI dependencies
The PCI_MSI symbol is used inconsistently throughout the tree, with some
drivers using 'select' and others using 'depends on', or using conditional
selects. This keeps causing problems; the latest one is a result of
ARCH_ALPINE using a 'select' statement to enable its platform-specific MSI
driver without enabling MSI:
warning: (ARCH_ALPINE) selects ALPINE_MSI which has unmet direct dependencies (PCI && PCI_MSI)
drivers/irqchip/irq-alpine-msi.c:104:15: error: variable 'alpine_msix_domain_info' has initializer but incomplete type
static struct msi_domain_info alpine_msix_domain_info = {
^~~~~~~~~~~~~~~
drivers/irqchip/irq-alpine-msi.c:105:2: error: unknown field 'flags' specified in initializer
.flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
^
drivers/irqchip/irq-alpine-msi.c:105:11: error: 'MSI_FLAG_USE_DEF_DOM_OPS' undeclared here (not in a function)
.flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
^~~~~~~~~~~~~~~~~~~~~~~~
There is little reason to enable PCI support for a platform that uses MSI
but then leave MSI disabled at compile time.
Select PCI_MSI from irqchips that implement MSI, and make PCI host bridges
that use MSI on ARM depend on PCI_MSI_IRQ_DOMAIN.
For all three architectures that support PCI_MSI_IRQ_DOMAIN (ARM, ARM64,
X86), enable it by default whenever MSI is enabled.
Mika Westerberg [Thu, 2 Jun 2016 08:17:15 +0000 (11:17 +0300)]
PCI: Add runtime PM support for PCIe ports
Add back runtime PM support for PCIe ports that was removed by fe9a743a2601 ("PCI/PM: Drop unused runtime PM support code for PCIe
ports").
We cannot enable it automatically for all ports since there have been
problems previously [1]. In summary suspended PCIe ports were not able
to deal with ACPI-based hotplug reliably. One reason why this might happen
is the fact that when a PCIe port is powered down, config space access to
the devices behind the port is not possible. If the BIOS hotplug SMI
handler assumes the port is always in D0 it will not be able to find the
hotplugged devices. To be on the safe side only enable runtime PM if the
port does not claim to support hotplug.
For PCIe ports not using hotplug, we enable and allow runtime PM
automatically. Since 'bridge_d3' can be changed any time we check this in
driver ->runtime_idle() and ->runtime_suspend() and only allow runtime
suspend if the flag is still set. Use autosuspend with default of 100ms
idle time to prevent the port from repeatedly suspending and resuming on
continuous configuration space access of devices behind the port.
The actual power transition to D3 and back is handled in the PCI core.
Idea to automatically unblock (allow) runtime PM for PCIe ports came from
Dave Airlie.
Mika Westerberg [Thu, 2 Jun 2016 08:17:14 +0000 (11:17 +0300)]
ACPI / hotplug / PCI: Runtime resume bridge before rescan
If a PCI bridge (or PCIe port) that is runtime suspended gets an ACPI
hotplug event, such as BUS_CHECK we need to make sure it is resumed before
devices below the bridge are re-scanned. Otherwise the devices behind the
port are not accessible and will be treated as hot-unplugged.
To fix this, resume PCI bridges from runtime suspend while rescanning.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Mika Westerberg [Thu, 2 Jun 2016 08:17:13 +0000 (11:17 +0300)]
PCI: Power on bridges before scanning new devices
When a PCI device is removed through sysfs interface, the upstream bridge
(PCIe port) can be runtime suspended if it was the last device on that bus.
Now, if the bridge is in D3 we cannot find devices below the bridge
anymore. For example following fails to find the removed device again:
In order to be able to rescan devices below the bridge add
pm_runtime_get_sync()/pm_runtime_put() calls to pci_scan_bridge(). This
should keep bridges powered on while their children devices are being
scanned.
Reported-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Mika Westerberg [Thu, 2 Jun 2016 08:17:12 +0000 (11:17 +0300)]
PCI: Put PCIe ports into D3 during suspend
Currently the Linux PCI core does not touch power state of PCI bridges and
PCIe ports when system suspend is entered. Leaving them in D0 consumes
power unnecessarily and may prevent the CPU from entering deeper C-states.
With recent PCIe hardware we can power down the ports to save power given
that we take into account few restrictions:
- The PCIe port hardware is recent enough, starting from 2015.
- Devices connected to PCIe ports are effectively in D3cold once the port
is transitioned to D3 (the config space is not accessible anymore and
the link may be powered down).
- Devices behind the PCIe port need to be allowed to transition to D3cold
and back. There is a way both drivers and userspace can forbid this.
- If the device behind the PCIe port is capable of waking the system it
needs to be able to do so from D3cold.
This patch adds a new flag to struct pci_device called 'bridge_d3'. This
flag is set and cleared by the PCI core whenever there is a change in power
management state of any of the devices behind the PCIe port. When system
later on is suspended we only need to check this flag and if it is true
transition the port to D3 otherwise we leave it in D0.
Also provide override mechanism via command line parameter
"pcie_port_pm=[off|force]" that can be used to disable or enable the
feature regardless of the BIOS manufacturing date.
Tested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Mika Westerberg [Thu, 2 Jun 2016 08:17:11 +0000 (11:17 +0300)]
PCI: Don't clear d3cold_allowed for PCIe ports
The PCI core skips bridges and ports when the system is suspended. The PCI
core checks return value of pci_has_subordinate() in pci_pm_suspend_noirq()
to skip all devices where it is non-zero (which means PCI bridges and PCIe
ports).
Since PCIe ports are never suspended in the first place, there is no need
to set d3cold_allowed for them.
Tested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Shawn Lin [Tue, 24 May 2016 09:32:10 +0000 (17:32 +0800)]
PCI/ASPM: Remove redundant check of pcie_set_clkpm
Without supporting clock PM capable, if we want to disable clkpm, we don't
need this extra check as it must already be zero for the enable argument.
And it's the same for enabling clkpm here. So let's remove this check.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Steven Graham <sgraham@xes-inc.com> Signed-off-by: Aaron Sierra <asierra@xes-inc.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tomasz Nowicki [Fri, 10 Jun 2016 19:55:19 +0000 (21:55 +0200)]
ARM64: PCI: Support ACPI-based PCI host controller
Implement pci_acpi_scan_root() and other arch-specific calls so ARM64 can
use ACPI to setup and enumerate PCI buses.
Use memory-mapped configuration space information from either the ACPI
_CBA method or the MCFG table and the ECAM library and generic ECAM config
accessor ops.
Implement acpi_pci_bus_find_domain_nr() to retrieve the domain number from
the acpi_pci_root structure.
Implement pcibios_add_bus() and pcibios_remove_bus() to call
acpi_pci_add_bus() and acpi_pci_remove_bus() for ACPI slot management and
other configuration.
Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tomasz Nowicki [Fri, 10 Jun 2016 19:55:18 +0000 (21:55 +0200)]
ARM64: PCI: Implement AML accessors for PCI_Config region
On ACPI systems, the PCI_Config OperationRegion allows AML to access PCI
configuration space. The ACPI CA AML interpreter uses performs config
space accesses with acpi_os_read_pci_configuration() and
acpi_os_write_pci_configuration(), which are OS-dependent functions
supplied by acpi/osl.c.
Implement the arch-specific raw_pci_read() and raw_pci_write() interfaces
used by acpi/osl.c for PCI_Config accesses.
N.B. PCI_Config accesses are not supported before PCI bus enumeration.
[bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tomasz Nowicki [Fri, 10 Jun 2016 19:55:17 +0000 (21:55 +0200)]
ARM64: PCI: ACPI support for legacy IRQs parsing and consolidation with DT code
To enable PCI legacy IRQs on platforms booting with ACPI, arch code should
include ACPI-specific callbacks that parse and set-up the device IRQ
number, equivalent to the DT boot path. Owing to the current ACPI core scan
handlers implementation, ACPI PCI legacy IRQs bindings cannot be parsed at
device add time, since that would trigger ACPI scan handlers ordering
issues depending on how the ACPI tables are defined.
To solve this problem and consolidate FW PCI legacy IRQs parsing in one
single pcibios callback (pending final removal), this patch moves DT PCI
IRQ parsing to the pcibios_alloc_irq() callback (called by PCI core code at
driver probe time) and adds ACPI PCI legacy IRQs parsing to the same
callback too, so that FW PCI legacy IRQs parsing is confined in one single
arch callback that can be easily removed when code parsing PCI legacy IRQs
is consolidated and moved to core PCI code.
Suggested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tomasz Nowicki [Fri, 10 Jun 2016 19:55:15 +0000 (21:55 +0200)]
PCI: Factor DT-specific pci_bus_find_domain_nr() code out
pci_bus_find_domain_nr() retrieves the host bridge domain number in a
DT-specific way. Rename it to of_pci_bus_find_domain_nr() to reflect that,
so we can add a corresponding function for ACPI.
[bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tomasz Nowicki [Fri, 10 Jun 2016 19:55:14 +0000 (21:55 +0200)]
PCI: Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC
Instead of assigning bus->domain_nr inside pci_bus_assign_domain_nr(),
return the domain and let the caller do the assignment. Rename
pci_bus_assign_domain_nr() to pci_bus_find_domain_nr() to reflect this.
No functional change intended.
[bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tomasz Nowicki [Fri, 10 Jun 2016 19:55:13 +0000 (21:55 +0200)]
PCI/ACPI: Add generic MCFG table handling
On ACPI systems that support memory-mapped config space access, i.e., ECAM,
the PCI Firmware Specification says the OS can learn where the ECAM space
is from either:
- the static MCFG table (for non-hotpluggable bridges), or
- the _CBA method (for hotpluggable bridges)
The current MCFG table handling code cannot be easily generalized owing to
x86-specific quirks, which makes it hard to reuse on other architectures.
Implement generic MCFG handling from scratch, including:
- Simple MCFG table parsing (via pci_mmcfg_late_init() as in current x86)
- MCFG region lookup for a (domain, bus_start, bus_end) tuple
[bhelgaas: changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Jayachandran C [Fri, 10 Jun 2016 19:55:12 +0000 (21:55 +0200)]
PCI/ACPI: Support I/O resources when parsing host bridge resources
On platforms with memory-mapped I/O ports, such as ia64 and ARM64, we have
to map the memory region and coordinate it with the arch's I/O port
accessors.
For ia64, we do this in arch code because it supports both dense (1 byte
per I/O port) and sparse (1024 bytes per I/O port) memory mapping. For
arm64, we only support dense mappings, which we can do in the generic code
with pci_register_io_range() and pci_remap_iospace().
Add acpi_pci_root_remap_iospace() to remap dense memory-mapped I/O port
space when adding a bridge, and call pci_unmap_iospace() to release the
space when removing the bridge.
[bhelgaas: changelog, move #ifdef inside acpi_pci_root_remap_iospace()] Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[Tomasz: merged in Sinan's patch to unmap IO resources properly, updated changelog] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Sinan Kaya [Fri, 10 Jun 2016 19:55:11 +0000 (21:55 +0200)]
PCI: Add pci_unmap_iospace() to unmap I/O resources
Add pci_unmap_iospace() to undo what pci_remap_iospace() did.
This is needed to support hotplug removal of host bridges that use
pci_remap_iospace().
[bhelgaas: changelog] Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Jayachandran C [Fri, 10 Jun 2016 19:55:10 +0000 (21:55 +0200)]
PCI: Add parent device field to ECAM struct pci_config_window
Add a parent device field to struct pci_config_window. The parent is not
saved now, but will be useful to save it in some cases. For ACPI on ARM64,
it can be used to setup ACPI companion and domain.
Since the parent dev is in struct pci_config_window now, we need not pass
it to the init function as a separate argument.
Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Linus Torvalds [Sun, 5 Jun 2016 18:15:33 +0000 (11:15 -0700)]
Merge branch 'parisc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- Fix printk time stamps on SMP systems which got wrong due to a patch
which was added during the merge window
- Fix two bugs in the stack backtrace code: Races in module unloading
and possible invalid accesses to memory due to wrong instruction
decoding (Mikulas Patocka)
- Fix userspace crash when syscalls access invalid unaligned userspace
addresses. Those syscalls will now return EFAULT as expected.
(tagged for stable kernel series)
* 'parisc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Move die_if_kernel() prototype into traps.h header
parisc: Fix pagefault crash in unaligned __get_user() call
parisc: Fix printk time during boot
parisc: Fix backtrace on PA-RISC
Linus Torvalds [Sun, 5 Jun 2016 18:02:00 +0000 (11:02 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key handling update from James Morris:
"This alters a new keyctl function added in the current merge window to
allow for a future extension planned for the next merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: Add placeholder for KDF usage with DH
devpts: Make each mount of devpts an independent filesystem.
The /dev/ptmx device node is changed to lookup the directory entry "pts"
in the same directory as the /dev/ptmx device node was opened in. If
there is a "pts" entry and that entry is a devpts filesystem /dev/ptmx
uses that filesystem. Otherwise the open of /dev/ptmx fails.
The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that
userspace can now safely depend on each mount of devpts creating a new
instance of the filesystem.
Each mount of devpts is now a separate and equal filesystem.
Reserved ttys are now available to all instances of devpts where the
mounter is in the initial mount namespace.
A new vfs helper path_pts is introduced that finds a directory entry
named "pts" in the directory of the passed in path, and changes the
passed in path to point to it. The helper path_pts uses a function
path_parent_directory that was factored out of follow_dotdot.
In the implementation of devpts:
- devpts_mnt is killed as it is no longer meaningful if all mounts of
devpts are equal.
- pts_sb_from_inode is replaced by just inode->i_sb as all cached
inodes in the tty layer are now from the devpts filesystem.
- devpts_add_ref is rolled into the new function devpts_ptmx. And the
unnecessary inode hold is removed.
- devpts_del_ref is renamed devpts_release and reduced to just a
deacrivate_super.
- The newinstance mount option continues to be accepted but is now
ignored.
In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as
they are never used.
Documentation/filesystems/devices.txt is updated to describe the current
situation.
This has been verified to work properly on openwrt-15.05, centos5,
centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3,
ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1,
slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01. With the
caveat that on centos6 and on slackware-14.1 that there wind up being
two instances of the devpts filesystem mounted on /dev/pts, the lower
copy does not end up getting used.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg KH <greg@kroah.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This means the userspace program clock_adjtime called the clock_adjtime()
syscall and then crashed inside the compat_get_timex() function.
Syscalls should never crash programs, but instead return EFAULT.
The IIR register contains the executed instruction, which disassebles
into "ldw 0(sr3,r5),r9".
This load-word instruction is part of __get_user() which tried to read the word
at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The
unaligned handler is able to emulate all ldw instructions, but it fails if it
fails to read the source e.g. because of page fault.
int main(void) {
/* allocate 8k */
char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
/* free second half (upper 4k) and make it invalid. */
munmap(ptr+4096, 4096);
/* syscall where first int is unaligned and clobbers into invalid memory region */
/* syscall should return EFAULT */
return syscall(__NR_clock_adjtime, 0, ptr+4095);
}
To fix this issue we simply need to check if the faulting instruction address
is in the exception fixup table when the unaligned handler failed. If it
is, call the fixup routine instead of crashing.
While looking at the unaligned handler I found another issue as well: The
target register should not be modified if the handler was unsuccessful.
Mikulas Patocka [Tue, 28 Jun 2011 22:48:19 +0000 (00:48 +0200)]
parisc: Fix backtrace on PA-RISC
This patch fixes backtrace on PA-RISC
There were several problems:
1) The code that decodes instructions handles instructions that subtract
from the stack pointer incorrectly. If the instruction subtracts the
number X from the stack pointer the code increases the frame size by
(0x100000000-X). This results in invalid accesses to memory and
recursive page faults.
2) Because gcc reorders blocks, handling instructions that subtract from
the frame pointer is incorrect. For example, this function
int f(int a)
{
if (__builtin_expect(a, 1))
return a;
g();
return a;
}
is compiled in such a way, that the code that decreases the stack
pointer for the first "return a" is placed before the code for "g" call.
If we recognize this decrement, we mistakenly believe that the frame
size for the "g" call is zero.
To fix problems 1) and 2), the patch doesn't recognize instructions that
decrease the stack pointer at all. To further safeguard the unwind code
against nonsense values, we don't allow frame size larger than
Total_frame_size.
3) The backtrace is not locked. If stack dump races with module unload,
invalid table can be accessed.
This patch adds a spinlock when processing module tables.
Note, that for correct backtrace, you need recent binutils.
Binutils 2.18 from Debian 5 produce garbage unwind tables.
Binutils 2.21 work better (it sometimes forgets function frames, but at
least it doesn't generate garbage).
Linus Torvalds [Sat, 4 Jun 2016 19:30:36 +0000 (12:30 -0700)]
Merge tag 'drm-fixes-for-v4.7-rc2' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A bunch of ARM drivers got into the fixes vibe this time around, so
this contains a bunch of fixes for imx, atmel hlcdc, arm hdlcd (only
so many combos of hlcd), mediatek and omap drm.
Other than that there is one mgag200 fix and a few core drm regression
fixes"
* tag 'drm-fixes-for-v4.7-rc2' of git://people.freedesktop.org/~airlied/linux: (34 commits)
drm/omap: fix unused variable warning.
drm: hdlcd: Add information about the underlying framebuffers in debugfs
drm: hdlcd: Cleanup the atomic plane operations
drm/hdlcd: Fix up crtc_state->event handling
drm: hdlcd: Revamp runtime power management
drm/mediatek: mtk_dsi: Remove spurious drm_connector_unregister
drm/mediatek: mtk_dpi: remove invalid error message
drm: atmel-hlcdc: fix a NULL check
drm: atmel-hlcdc: fix atmel_hlcdc_crtc_reset() implementation
drm/mgag200: Black screen fix for G200e rev 4
drm: Wrap direct calls to driver->gem_free_object from CMA
drm: fix fb refcount issue with atomic modesetting
drm: make drm_atomic_set_mode_prop_for_crtc() more reliable
drm/sti: remove extra mode fixup
drm: add missing drm_mode_set_crtcinfo call
drm/omap: include gpio/consumer.h where needed
drm/omap: include linux/seq_file.h where needed
Revert "drm/omap: no need to select OMAP2_DSS"
drm/omap: Remove regulator API abuse
OMAPDSS: HDMI5: Change DDC timings
...
Linus Torvalds [Sat, 4 Jun 2016 19:25:36 +0000 (12:25 -0700)]
Merge tag 'vfio-v4.7-rc2' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
"Fix irqfd shutdown ordering, build warning, and VPD short read"
* tag 'vfio-v4.7-rc2' of git://github.com/awilliam/linux-vfio:
vfio/pci: Allow VPD short read
vfio/type1: Fix build warning
vfio/pci: Fix ordering of eventfd vs virqfd shutdown
Linus Torvalds [Sat, 4 Jun 2016 18:56:28 +0000 (11:56 -0700)]
Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"The important part of this pull is Filipe's set of fixes for btrfs
device replacement. Filipe fixed a few issues seen on the list and a
number he found on his own"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent
Btrfs: fix race between device replace and read repair
Btrfs: fix race between device replace and discard
Btrfs: fix race between device replace and chunk allocation
Btrfs: fix race setting block group back to RW mode during device replace
Btrfs: fix unprotected assignment of the left cursor for device replace
Btrfs: fix race setting block group readonly during device replace
Btrfs: fix race between device replace and block group removal
Btrfs: fix race between readahead and device replace/removal
Linus Torvalds [Sat, 4 Jun 2016 18:37:53 +0000 (11:37 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"We have a few follow-up fixes for the libceph refactor from Ilya, and
then some cephfs + fscache fixes from Zheng.
The first two FS-Cache patches are acked by David Howells and deemed
trivial enough to go through our tree. The rest fix some issues with
the ceph fscache handling (disable cache for inodes opened for write,
and simplify the revalidation logic accordingly, dropping the
now-unnecessary work queue)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: use i_version to check validity of fscache
ceph: improve fscache revalidation
ceph: disable fscache when inode is opened for write
ceph: avoid unnecessary fscache invalidation/revlidation
ceph: call __fscache_uncache_page() if readpages fails
FS-Cache: make check_consistency callback return int
FS-Cache: wake write waiter after invalidating writes
libceph: use %s instead of %pE in dout()s
libceph: put request only if it's done in handle_reply()
libceph: change ceph_osdmap_flag() to take osdc
Linus Torvalds [Sat, 4 Jun 2016 18:26:49 +0000 (11:26 -0700)]
Merge tag 'acpi-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Two fixes for problems introduced recently (ACPICA and the ACPI
backlight driver) and one fix for an older issue that prevents at
least one system from booting.
Specifics:
- Fix an incorrect check introduced by recent ACPICA changes which
causes problems with booting KVM guests to happen, among other
things (Lv Zheng).
- Fix a backlight issue introduced by recent changes to the ACPI
video driver (Aaron Lu).
- Fix the ACPI processor initialization which attempts to register an
IO region without checking if that really is necessary and
sometimes prevents drivers loaded subsequently from registering
their resources which leads to boot issues (Rafael Wysocki)"
* tag 'acpi-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / processor: Avoid reserving IO regions too early
ACPICA / Hardware: Fix old register check in acpi_hw_get_access_bit_width()
ACPI / Thermal / video: fix max_level incorrect value
Linus Torvalds [Sat, 4 Jun 2016 18:07:57 +0000 (11:07 -0700)]
Merge tag 'pm-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Two fixes for problems introduced recently in the cpufreq core and the
intel_pstate driver.
Specifics:
- Fix a silly mistake related to the clamp_val() usage in a function
added by a recent commit (Rafael Wysocki).
- Reduce the log level of an annoying message added to intel_pstate
during the recent merge window (Srinivas Pandruvada)"
* tag 'pm-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Fix clamp_val() usage in cpufreq_driver_fast_switch()
cpufreq: intel_pstate: Downgrade print level for _PPC
Linus Torvalds [Sat, 4 Jun 2016 17:51:29 +0000 (10:51 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge various fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
mm, oom_reaper: do not use siglock in try_oom_reaper()
mm, page_alloc: prevent infinite loop in buffered_rmqueue()
checkpatch: reduce git commit description style false positives
mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup
memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
mm: check the return value of lookup_page_ext for all call sites
kdump: fix dmesg gdbmacro to work with record based printk
mm: fix overflow in vm_map_ram()
Linus Torvalds [Fri, 3 Jun 2016 23:12:35 +0000 (16:12 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- a few simple fixes for fallout from the recent gic-v3 changes
- a workaround for a Cavium thunderX erratum
- a bugfix for the pic32 irqchip to make external interrupts work proper
- a missing return value in the generic IPI management code
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-pic32-evic: Fix bug with external interrupts.
irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
irqchip/gic-v3: Fix quiescence check in gic_enable_redist
irqchip/gic-v3: Fix copy+paste mistakes in defines
irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask
genirq: Fix missing return value in irq_destroy_ipi()
Mel Gorman [Fri, 3 Jun 2016 21:56:01 +0000 (14:56 -0700)]
mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
The optimistic fast path may use cpuset_current_mems_allowed instead of
of a NULL nodemask supplied by the caller for cpuset allocations. The
preferred zone is calculated on this basis for statistic purposes and as
a starting point in the zonelist iterator.
However, if the context can ignore memory policies due to being atomic
or being able to ignore watermarks then the starting point in the
zonelist iterator is no longer correct. This patch resets the zonelist
iterator in the allocator slowpath if the context can ignore memory
policies. This will alter the zone used for statistics but only after
it is known that it makes sense for that context. Resetting it before
entering the slowpath would potentially allow an ALLOC_CPUSET allocation
to be accounted for against the wrong zone. Note that while nodemask is
not explicitly set to the original nodemask, it would only have been
overwritten if cpuset_enabled() and it was reset before the slowpath was
entered.
Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 3 Jun 2016 21:55:58 +0000 (14:55 -0700)]
mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
Geert Uytterhoeven reported the following problem that bisected to
commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone
in a zonelist twice") on m68k/ARAnyM
The relationship is not obvious but it's due to a failure to rescan the
full zonelist after the fair zone allocation policy exhausts the batch
count. While this is a functional problem, it's also a performance
issue. A page allocator microbenchmark showed the following
4.7.0-rc1 4.7.0-rc1
vanilla reset-v1r2
Min alloc-odr0-1 327.00 ( 0.00%) 326.00 ( 0.31%)
Min alloc-odr0-2 235.00 ( 0.00%) 235.00 ( 0.00%)
Min alloc-odr0-4 198.00 ( 0.00%) 198.00 ( 0.00%)
Min alloc-odr0-8 170.00 ( 0.00%) 170.00 ( 0.00%)
Min alloc-odr0-16 156.00 ( 0.00%) 156.00 ( 0.00%)
Min alloc-odr0-32 150.00 ( 0.00%) 150.00 ( 0.00%)
Min alloc-odr0-64 146.00 ( 0.00%) 146.00 ( 0.00%)
Min alloc-odr0-128 145.00 ( 0.00%) 145.00 ( 0.00%)
Min alloc-odr0-256 155.00 ( 0.00%) 155.00 ( 0.00%)
Min alloc-odr0-512 168.00 ( 0.00%) 165.00 ( 1.79%)
Min alloc-odr0-1024 175.00 ( 0.00%) 174.00 ( 0.57%)
Min alloc-odr0-2048 180.00 ( 0.00%) 180.00 ( 0.00%)
Min alloc-odr0-4096 187.00 ( 0.00%) 186.00 ( 0.53%)
Min alloc-odr0-8192 190.00 ( 0.00%) 190.00 ( 0.00%)
Min alloc-odr0-16384 191.00 ( 0.00%) 191.00 ( 0.00%)
Min alloc-odr1-1 736.00 ( 0.00%) 445.00 ( 39.54%)
Min alloc-odr1-2 343.00 ( 0.00%) 335.00 ( 2.33%)
Min alloc-odr1-4 277.00 ( 0.00%) 270.00 ( 2.53%)
Min alloc-odr1-8 238.00 ( 0.00%) 233.00 ( 2.10%)
Min alloc-odr1-16 224.00 ( 0.00%) 218.00 ( 2.68%)
Min alloc-odr1-32 210.00 ( 0.00%) 208.00 ( 0.95%)
Min alloc-odr1-64 207.00 ( 0.00%) 203.00 ( 1.93%)
Min alloc-odr1-128 276.00 ( 0.00%) 202.00 ( 26.81%)
Min alloc-odr1-256 206.00 ( 0.00%) 202.00 ( 1.94%)
Min alloc-odr1-512 207.00 ( 0.00%) 202.00 ( 2.42%)
Min alloc-odr1-1024 208.00 ( 0.00%) 205.00 ( 1.44%)
Min alloc-odr1-2048 213.00 ( 0.00%) 212.00 ( 0.47%)
Min alloc-odr1-4096 218.00 ( 0.00%) 216.00 ( 0.92%)
Min alloc-odr1-8192 341.00 ( 0.00%) 219.00 ( 35.78%)
Note that order-0 allocations are unaffected but higher orders get a
small boost from this patch and a large reduction in system CPU usage
overall as can be seen here:
4.7.0-rc1 4.7.0-rc1
vanilla reset-v1r2
User 85.32 86.31
System 2221.39 2053.36
Elapsed 2368.89 2202.47
Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Link: http://lkml.kernel.org/r/20160531100848.GR2527@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 3 Jun 2016 21:55:55 +0000 (14:55 -0700)]
mm, oom_reaper: do not use siglock in try_oom_reaper()
Oleg has noted that siglock usage in try_oom_reaper is both pointless
and dangerous. signal_group_exit can be checked lockless. The problem
is that sighand becomes NULL in __exit_signal so we can crash.
Fixes: 3ef22dfff239 ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path") Link: http://lkml.kernel.org/r/1464679423-30218-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Fri, 3 Jun 2016 21:55:52 +0000 (14:55 -0700)]
mm, page_alloc: prevent infinite loop in buffered_rmqueue()
In DEBUG_VM kernel, we can hit infinite loop for order == 0 in
buffered_rmqueue() when check_new_pcp() returns 1, because the bad page
is never removed from the pcp list. Fix this by removing the page
before retrying. Also we don't need to check if page is non-NULL,
because we simply grab it from the list which was just tested for being
non-empty.
Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") Link: http://lkml.kernel.org/r/20160530090154.GM2527@techsingularity.net Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Fri, 3 Jun 2016 21:55:47 +0000 (14:55 -0700)]
mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup
Fix erroneous z3fold header access in a HEADLESS page in reclaim
function, and change one remaining direct handle-to-buddy conversion to
use the appropriate helper.
Tejun Heo [Fri, 3 Jun 2016 21:55:44 +0000 (14:55 -0700)]
memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
memcg_offline_kmem() may be called from memcg_free_kmem() after a css
init failure. memcg_free_kmem() is a ->css_free callback which is
called without cgroup_mutex and memcg_offline_kmem() ends up using
css_for_each_descendant_pre() without any locking. Fix it by adding rcu
read locking around it.
mkdir: cannot create directory `65530': No space left on device
===============================
[ INFO: suspicious RCU usage. ]
4.6.0-work+ #321 Not tainted
-------------------------------
kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required!
[ 527.243970] other info that might help us debug this:
[ 527.244715]
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by kworker/0:5/1664:
#0: ("cgroup_destroy"){.+.+..}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
#1: ((&css->destroy_work)#3){+.+...}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
[ 527.248098] stack backtrace:
CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
Workqueue: cgroup_destroy css_free_work_fn
Call Trace:
dump_stack+0x68/0xa1
lockdep_rcu_suspicious+0xd7/0x110
css_next_descendant_pre+0x7d/0xb0
memcg_offline_kmem.part.44+0x4a/0xc0
mem_cgroup_css_free+0x1ec/0x200
css_free_work_fn+0x49/0x5e0
process_one_work+0x1c5/0x4a0
worker_thread+0x49/0x490
kthread+0xea/0x100
ret_from_fork+0x1f/0x40
Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.org Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Fri, 3 Jun 2016 21:55:38 +0000 (14:55 -0700)]
mm: check the return value of lookup_page_ext for all call sites
Per the discussion with Joonsoo Kim [1], we need check the return value
of lookup_page_ext() for all call sites since it might return NULL in
some cases, although it is unlikely, i.e. memory hotplug.
Corey Minyard [Fri, 3 Jun 2016 21:55:36 +0000 (14:55 -0700)]
kdump: fix dmesg gdbmacro to work with record based printk
Commit 7ff9554bb578 ("printk: convert byte-buffer to variable-length
record buffer") introduced a record based printk buffer. Modify
gdbmacros.txt to parse this new structure so dmesg will work properly.
Link: http://lkml.kernel.org/r/1463515794-1599-1-git-send-email-minyard@acm.org Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When remapping pages accounting for 4G or more memory space, the
operation 'count << PAGE_SHIFT' overflows as it is performed on an
integer. Solution: cast before doing the bitshift.
[akpm@linux-foundation.org: fix vm_unmap_ram() also]
[akpm@linux-foundation.org: fix vmap() as well, per Guillermo] Link: http://lkml.kernel.org/r/etPan.57175fb3.7a271c6b.2bd@naudit.es Signed-off-by: Guillermo Julián Moreno <guillermo.julian@naudit.es> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 3 Jun 2016 21:39:29 +0000 (14:39 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
"Just one fix to the ptrace code, spotted by Simon Marchi, where if a
thread migrates to a different CPU and the VFP registers are changed
through ptrace, the application doesn't see the updated VFP registers"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: fix PTRACE_SETVFPREGS on SMP systems
Linus Torvalds [Fri, 3 Jun 2016 21:29:47 +0000 (14:29 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main thing here is reviving hugetlb support using contiguous ptes,
which we ended up reverting at the last minute in 4.5 pending a fix
which went into the core mm/ code during the recent merge window.
- Revert a previous revert and get hugetlb going with contiguous hints
- Wire up missing compat syscalls
- Enable CONFIG_SET_MODULE_RONX by default
- Add missing line to our compat /proc/cpuinfo output
- Clarify levels in our page table dumps
- Fix booting with RANDOMIZE_TEXT_OFFSET enabled
- Misc fixes to the ARM CPU PMU driver (refcounting, probe failure)
- Remove some dead code and update a comment"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled
arm64: move {PAGE,CONT}_SHIFT into Kconfig
arm64: mm: dump: log span level
arm64: update stale PAGE_OFFSET comment
drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error
drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu
drivers/perf: arm_pmu: Fix reference count of a device_node in of_pmu_irq_cfg
arm64: report CPU number in bad_mode
arm64: unistd32.h: wire up missing syscalls for compat tasks
arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks
arm64: enable CONFIG_SET_MODULE_RONX by default
arm64: Remove orphaned __addr_ok() definition
Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"
Linus Torvalds [Fri, 3 Jun 2016 21:20:22 +0000 (14:20 -0700)]
Merge tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Handle RTAS delay requests in configure_bridge from Russell Currey
- Refactor the configure_bridge RTAS tokens from Russell Currey
- Fix definition of SIAR and SDAR registers from Thomas Huth
- Use privileged SPR number for MMCR2 from Thomas Huth
- Update LPCR only if it is powernv from Aneesh Kumar K.V
- Fix the reference bit update when handling hash fault from Aneesh
Kumar K.V
- Add missing tlb flush from Aneesh Kumar K.V
- Add POWER8NVL support to ibm,client-architecture-support call from
Thomas Huth
* tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call
powerpc/mm/radix: Add missing tlb flush
powerpc/mm/hash: Fix the reference bit update when handling hash fault
powerpc/mm/radix: Update LPCR only if it is powernv
powerpc: Use privileged SPR number for MMCR2
powerpc: Fix definition of SIAR and SDAR registers
powerpc/pseries/eeh: Refactor the configure_bridge RTAS tokens
powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge
Chris Mason [Sat, 19 Sep 2015 18:28:25 +0000 (11:28 -0700)]
Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent
When dealing with inline extents, btrfs_get_extent will incorrectly try
to insert a duplicate extent_map. The dup hits -EEXIST from
add_extent_map, but then we try to merge with the existing one and end
up trying to insert a zero length extent_map.
This actually works most of the time, except when there are extent maps
past the end of the inline extent. rocksdb will trigger this sometimes
because it preallocates an extent and then truncates down.
Thomas Gleixner [Fri, 3 Jun 2016 13:05:51 +0000 (15:05 +0200)]
Merge tag 'irqchip-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Merge irqchip updates from Marc Zyngier:
- A number of embarassing buglets (GICv3, PIC32)
- A more substential errata workaround for Cavium's GICv3 ITS
(kept for post-rc1 due to its dependency on NUMA)
Mark Rutland [Tue, 31 May 2016 14:58:00 +0000 (15:58 +0100)]
arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled
With ARM64_64K_PAGES and RANDOMIZE_TEXT_OFFSET enabled, we hit the
following issue on the boot:
kernel BUG at arch/arm64/mm/mmu.c:480!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.6.0 #310
Hardware name: ARM Juno development board (r2) (DT)
task: ffff000008d58a80 ti: ffff000008d30000 task.ti: ffff000008d30000
PC is at map_kernel_segment+0x44/0xb0
LR is at paging_init+0x84/0x5b0
pc : [<ffff000008c450b4>] lr : [<ffff000008c451a4>] pstate: 600002c5
Commit 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text
segment mapping") removed the alignment between the .head.text and .text
sections, and used the _text rather than the _stext interval for mapping
the .text segment.
Prior to this commit _stext was always section aligned and didn't cause
any issue even when RANDOMIZE_TEXT_OFFSET was enabled. Since that
alignment has been removed and _text is used to map the .text segment,
we need ensure _text is always page aligned when RANDOMIZE_TEXT_OFFSET
is enabled.
This patch adds logic to TEXT_OFFSET fuzzing to ensure that the offset
is always aligned to the kernel page size. To ensure this, we rely on
the PAGE_SHIFT being available via Kconfig.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Sudeep Holla <sudeep.holla@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Fixes: 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text segment mapping") Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Tue, 31 May 2016 14:57:59 +0000 (15:57 +0100)]
arm64: move {PAGE,CONT}_SHIFT into Kconfig
In some cases (e.g. the awk for CONFIG_RANDOMIZE_TEXT_OFFSET) we would
like to make use of PAGE_SHIFT outside of code that can include the
usual header files.
Add a new CONFIG_ARM64_PAGE_SHIFT for this, likewise with
ARM64_CONT_SHIFT for consistency.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Tue, 31 May 2016 13:49:02 +0000 (14:49 +0100)]
arm64: mm: dump: log span level
The page table dump code logs spans of entries at the same level
(pgd/pud/pmd/pte) which have the same attributes. While we log the
(decoded) attributes, we don't log the level, which leaves the output
ambiguous and/or confusing in some cases.
For example:
0xffff800800000000-0xffff800980000000 6G RW NX SHD AF BLK UXN MEM/NORMAL
If using 4K pages, this may describe a span of 6 1G block entries at the
PGD/PUD level, or 3072 2M block entries at the PMD level.
This patch adds the page table level to each output line, removing this
ambiguity. For the example above, this will produce:
0xffffffc800000000-0xffffffc980000000 6G PUD RW NX SHD AF BLK UXN MEM/NORMAL
When 3 level tables are in use, and we use the asm-generic/nopud.h
definitions, the dump code treats each entry in the PGD as a 1 element
table at the PUD level, and logs spans as being PUDs, which can be
confusing. To counteract this, the "PUD" mnemonic is replaced with "PGD"
when CONFIG_PGTABLE_LEVELS <= 3. Likewise for "PMD" when
CONFIG_PGTABLE_LEVELS <= 2.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Huang Shijie <shijie.huang@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Wed, 1 Jun 2016 11:07:17 +0000 (12:07 +0100)]
arm64: update stale PAGE_OFFSET comment
Commit ab893fb9f1b17f02 ("arm64: introduce KIMAGE_VADDR as the virtual
base of the kernel region") logically split KIMAGE_VADDR from
PAGE_OFFSET, and since commit f9040773b7bbbd9e ("arm64: move kernel
image to base of vmalloc area") the two have been distinct values.
Unfortunately, neither commit updated the comment above these
definitions, which now erroneously states that PAGE_OFFSET is the start
of the kernel image rather than the start of the linear mapping.
This patch fixes said comment, and introduces an explanation of
KIMAGE_VADDR.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Julien Grall [Tue, 31 May 2016 11:41:23 +0000 (12:41 +0100)]
drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error
pmu->irq_affinity will not be freed if an error occurred within
arm_pmu_device_probe after of_pmu_irq_cfg has been called.
Note that in the case of_pmu_irq_cfg is returning an error,
pmu->irq_affinity will not be set, but it should be NULL as pmu was
kzalloc'd. Therefore the result kfree(NULL) is benign.
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Julien Grall [Tue, 31 May 2016 11:41:22 +0000 (12:41 +0100)]
drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu
The global variable __oprofile_cpu_pmu is set before the PMU is fully
initialized. If an error occurs before the end of the initialization,
the PMU will be freed and the variable will contain an invalid pointer.
This will result in a kernel crash when perf will be used.
Fix it by moving the setting of __oprofile_cpu_pmu when the PMU is fully
initialized (i.e when it is no longer possible to fail).
Cc: <stable@vger.kernel.org> Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Julien Grall [Tue, 31 May 2016 11:41:21 +0000 (12:41 +0100)]
drivers/perf: arm_pmu: Fix reference count of a device_node in of_pmu_irq_cfg
The only function called by of_pmu_irq_cfg that will increment the
reference count on dn is of_parse_phandle.
Each time we successfully parse a possible CPU from an
interrupt-affinity property, we increment the refcount of that CPU node
once via of_parse_handle. After validating the CPU is possible, we
decrement the refcount once. Subsequently, we decrement the refcount
again, either as part of an early break if we don't have a matching SPI,
or as part of the end of the loop body.
This will lead to decrementing twice the refcounnt.
Remove the second pairs of call to of_node_put as nobody is using dn
between the first and second call to of_node_put.
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Tue, 31 May 2016 11:07:47 +0000 (12:07 +0100)]
arm64: report CPU number in bad_mode
If we take an exception we don't expect (e.g. SError), we report this in
the bad_mode handler with pr_crit. Depending on the configured log
level, we may or may not log additional information in functions called
subsequently. Notably, the messages in dump_stack (including the CPU
number) are printed with KERN_DEFAULT and may not appear.
Some exceptions have an IMPLEMENTATION DEFINED ESR_ELx.ISS encoding, and
knowing the CPU number is crucial to correctly decode them. To ensure
that this is always possible, we should log the CPU number along with
the ESR_ELx value, so we are not reliant on subsequent logs or
additional printk configuration options.
This patch logs the CPU number in bad_mode such that it is possible for
a developer to decode these exceptions, provided access to sufficient
documentation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Al Grant <Al.Grant@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Martin <dave.martin@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Stephan Mueller [Thu, 26 May 2016 21:38:12 +0000 (23:38 +0200)]
KEYS: Add placeholder for KDF usage with DH
The values computed during Diffie-Hellman key exchange are often used
in combination with key derivation functions to create cryptographic
keys. Add a placeholder for a later implementation to configure a
key derivation function that will transform the Diffie-Hellman
result returned by the KEYCTL_DH_COMPUTE command.
[This patch was stripped down from a patch produced by Mat Martineau that
had a bug in the compat code - so for the moment Stephan's patch simply
requires that the placeholder argument must be NULL]
Original-signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
Dave Airlie [Fri, 3 Jun 2016 04:35:00 +0000 (14:35 +1000)]
Merge tag 'omapdrm-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into drm-fixes
omapdrm fixes for 4.7
* multiple compile break fixes for missing includes, bad kconfig dependencies.
* remove regulator API misuse causing deprecation warnings
* OMAP5 HDMI fixes for DDC and AVI infoframe
* OMAP4 HDMI fix for CEC
* tag 'omapdrm-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
drm/omap: include gpio/consumer.h where needed
drm/omap: include linux/seq_file.h where needed
Revert "drm/omap: no need to select OMAP2_DSS"
drm/omap: Remove regulator API abuse
OMAPDSS: HDMI5: Change DDC timings
OMAPDSS: HDMI5: Fix AVI infoframe
drm/omap: fix OMAP4 hdmi_core_powerdown_disable()
drm/omap: Fix missing includes
drm/omapdrm: include pinctrl/consumer.h where needed
Dave Airlie [Fri, 3 Jun 2016 04:11:49 +0000 (14:11 +1000)]
Merge tag 'imx-drm-next-2016-06-01' of git://git.pengutronix.de/git/pza/linux into drm-fixes
imx-drm updates
- add support for reading LVDS panel EDID over DDC
- enable UYVY/VYUY support
- add support for pixel clock polarity configuration
- honor the native-mode DT property for LVDS
- various fixes and cleanups
* tag 'imx-drm-next-2016-06-01' of git://git.pengutronix.de/git/pza/linux:
drm/imx: plane: Don't set plane->crtc in ipu_plane_update()
drm/imx: ipuv3-plane: Constify ipu_plane_funcs
drm/imx: imx-ldb: honor 'native-mode' property when selecting video mode from DT
drm/imx: parallel-display: remove dead code
drm/imx: use bus_flags for pixel clock polarity
drm/imx: ipuv3-plane: enable UYVY and VYUY formats
drm/imx: parallel-display: use of_graph_get_endpoint_by_regs helper
drm/imx: imx-ldb: use of_graph_get_endpoint_by_regs helper
dt-bindings: imx: ldb: Add ddc-i2c-bus property
drm/imx: imx-ldb: Add DDC support
Dave Airlie [Fri, 3 Jun 2016 04:08:20 +0000 (14:08 +1000)]
Merge tag 'drm-atmel-hlcdc-fixes/for-4.7-rc2' of github.com:bbrezillon/linux-at91 into drm-fixes
Two trivial bugfixes for the atmel-hlcdc driver.
The first one is making use of __drm_atomic_helper_crtc_destroy_state()
instead of duplicating its logic in atmel_hlcdc_crtc_reset() and
risking memory leaks if other objects are added to the common CRTC
state.
The second one is fixing a possible NULL pointer dereference.
* tag 'drm-atmel-hlcdc-fixes/for-4.7-rc2' of github.com:bbrezillon/linux-at91:
drm: atmel-hlcdc: fix a NULL check
drm: atmel-hlcdc: fix atmel_hlcdc_crtc_reset() implementation
Dave Airlie [Fri, 3 Jun 2016 04:07:42 +0000 (14:07 +1000)]
Merge branch 'for-upstream/hdlcd' of git://linux-arm.org/linux-ld into drm-fixes
"I have accumulated some cleanup patches for HDLCD, partly triggered by
Daniel Vetter's work on non-blocking atomic operations, that I would like
to integrate into v4.7. My first patch is important for the newly enabled
hibernate option for AArch64 on Juno, the others are fixing behaviour in
HDLCD and adding a debugfs entry to help track the underlying framebuffer
usage. I'm also taking one of Daniel's patches from his non-blocking series
to help with the integration of his patches later."
* 'for-upstream/hdlcd' of git://linux-arm.org/linux-ld:
drm: hdlcd: Add information about the underlying framebuffers in debugfs
drm: hdlcd: Cleanup the atomic plane operations
drm/hdlcd: Fix up crtc_state->event handling
drm: hdlcd: Revamp runtime power management
Liviu Dudau [Wed, 1 Jun 2016 14:00:15 +0000 (15:00 +0100)]
drm: hdlcd: Cleanup the atomic plane operations
Harden the plane_check() code to drop attempts at scaling because
that is not supported. Make hdlcd_plane_atomic_update() set the pitch
and line length registers that correctly reflect the plane's values.
And make hdlcd_crtc_mode_set_nofb() a helper function for
hdlcd_crtc_enable() rather than an exposed hook.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Liviu Dudau [Tue, 17 May 2016 09:06:54 +0000 (10:06 +0100)]
drm: hdlcd: Revamp runtime power management
Because the HDLCD driver acts as a component master it can end
up enabling the runtime PM functionality before the encoders
are initialised. This can cause crashes if the component slave
never probes (missing module) or if the PM operations kick in
before the probe finishes.
Move the enabling of the runtime PM after the component master
has finished collecting the slave components and use the DRM
atomic helpers to suspend and resume the device.
Tested-by: Robin Murphy <Robin.Murphy@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Paolo Bonzini [Wed, 1 Jun 2016 12:09:23 +0000 (14:09 +0200)]
KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
any of bits 63:32. However, this is not detected at KVM_SET_DEBUGREGS
time, and the next KVM_RUN oopses: