Brian Norris [Thu, 20 Apr 2017 20:36:25 +0000 (15:36 -0500)]
PCI: Don't allow unbinding host controllers that aren't prepared
Many PCI host controller drivers aren't prepared to have their devices
unbound from them forcefully (e.g., through /sys/.../<driver>/unbind), as
they don't provide any driver .remove callback, where they'd detach the
root bus, release resources, etc. Keeping the driver built in (i.e., not a
loadable module) is not enough; and providing no .remove callback just
means we don't do any teardown.
To rule out the possibility of unbinding a device via sysfs, we need to set
the ".suppress_bind_attrs" field.
I found the suspect drivers via the following search:
* pci/virtualization:
ixgbe: Use pcie_flr() instead of duplicating it
IB/hfi1: Use pcie_flr() instead of duplicating it
PCI: Call pcie_flr() from reset_chelsio_generic_dev()
PCI: Call pcie_flr() from reset_intel_82599_sfp_virtfn()
PCI: Export pcie_flr()
PCI: Add sysfs sriov_drivers_autoprobe to control VF driver binding
PCI: Avoid FLR for Intel 82579 NICs
* pci/resource-mmap:
ia64: Use generic pci_mmap_resource_range()
ia64: Remove redundant checks for WC in pci_mmap_page_range()
ia64: Remove redundant valid_mmap_phys_addr_range() from pci_mmap_page_range()
PCI: Add I/O BAR support to generic pci_mmap_resource_range()
x86/PCI: Use generic pci_mmap_resource_range()
unicore32/PCI: Use generic pci_mmap_resource_range()
sh/PCI: Use generic pci_mmap_resource_range()
parisc: Use generic pci_mmap_resource_range()
mn10300/PCI: Use generic pci_mmap_resource_range()
MIPS: PCI: Use generic pci_mmap_resource_range()
cris/PCI: Use generic pci_mmap_resource_range()
ARM/PCI: Use generic pci_mmap_resource_range()
PCI: Add pci_mmap_resource_range() and use it for ARM64
PCI: Add BAR index argument to pci_mmap_page_range()
PCI: Use BAR index in sysfs attr->private instead of resource pointer
PCI: Add arch_can_pci_mmap_io() on architectures which can mmap() I/O space
PCI: Move multiple declarations of pci_mmap_page_range() to <linux/pci.h>
PCI: Add arch_can_pci_mmap_wc() macro
xtensa/PCI: Do not mmap PCI BARs to userspace as write-through
PCI: Only allow WC mmap on prefetchable resources
PCI: Fix another sanity check bug in /proc/pci mmap
PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
* pci/resource:
PCI: Don't resize resources when realigning all devices in system
PCI: Don't reassign resources that are already aligned
PCI: Factor pci_reassigndev_resource_alignment()
powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
PCI: Add pcibios_default_alignment() for arch-specific alignment control
PCI: Fix calculation of bridge window's size and alignment
PCI: Ignore requested alignment for IOV BARs
PCI: Make PCI_ROM_ADDRESS_MASK a 32-bit constant
* pci/msi:
PCI/MSI: Use dev_printk() when possible
of/pci: Remove unused MSI controller helpers
PCI: mvebu: Remove useless MSI enabling code
PCI: aardvark: Move to MSI handling using generic MSI support
PCI/MSI: Make pci_msi_shutdown() and pci_msix_shutdown() static
PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown()
* pci/irq:
PCI: Disable boot interrupt quirk for ASUS M2N-LR
nvme/pci: Switch to pci_request_irq()
PCI/irq: Add pci_request_irq() and pci_free_irq() helpers
genirq: Return the IRQ name from free_irq()
genirq: Fix indentation in remove_irq()
* pci/ioremap:
PCI: versatile: Update PCI config space remap function
PCI: keystone-dw: Update PCI config space remap function
PCI: layerscape: Update PCI config space remap function
PCI: hisi: Update PCI config space remap function
PCI: tegra: Update PCI config space remap function
PCI: xgene: Update PCI config space remap function
PCI: armada8k: Update PCI config space remap function
PCI: designware: Update PCI config space remap function
PCI: iproc-platform: Update PCI config space remap function
PCI: qcom: Update PCI config space remap function
PCI: rockchip: Update PCI config space remap function
PCI: spear13xx: Update PCI config space remap function
PCI: xilinx-nwl: Update PCI config space remap function
PCI: xilinx: Update PCI config space remap function
PCI: ECAM: Map config region with pci_remap_cfgspace()
PCI: Implement devm_pci_remap_cfgspace()
devres: fix devm_ioremap_*() offset parameter kerneldoc description
ARM: Implement pci_remap_cfgspace() interface
ARM64: Implement pci_remap_cfgspace() interface
linux/io.h: Add pci_remap_cfgspace() interface
PCI: Remove __weak tag from pci_remap_iospace()
* pci/host-rockchip:
PCI: rockchip: Modularize
PCI: Export pci_remap_iospace() and pci_unmap_iospace()
PCI: rockchip: Add remove() support
PCI: rockchip: Set PCI_EXP_LNKSTA_SLC in the Root Port
PCI: rockchip: Advertise 128-byte Read Completion Boundary support
PCI: rockchip: Make 'return 0' more obvious in probe()
PCI: rockchip: Unindent rockchip_pcie_set_power_limit()
PCI: rockchip: Handle regulator_get_current_limit() failure correctly
* pci/host-imx6:
PCI: imx6: Fix spelling mistake: "contol" -> "control"
PCI: imx6: Do not switch speed if Gen2 is disabled
PCI: imx6: Do not wait for speed change on i.MX7
PCI: imx6: Allow probe deferral by reset GPIO
PCI: imx6: Add code to support i.MX7D
* pci/host-designware:
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
MAINTAINERS: Add PCI Endpoint maintainer
Documentation: PCI: Add userguide for PCI endpoint test function
tools: PCI: Add sample test script to invoke pcitest
tools: PCI: Add a userspace tool to test PCI endpoint
Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
misc: Add host side PCI driver for PCI test function device
PCI: Add device IDs for DRA74x and DRA72x
dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
PCI: dwc: dra7xx: Workaround for errata id i870
dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
PCI: dwc: dra7xx: Add EP mode support
PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
dt-bindings: PCI: Add DT bindings for PCI designware EP mode
PCI: dwc: designware: Add EP mode support
Documentation: PCI: Add binding documentation for pci-test endpoint function
PCI: endpoint: functions: Add an EP function to test PCI
Documentation: PCI: Add specification for the *PCI test* function device
PCI: endpoint: Create configfs entry for EPC device and EPF driver
Documentation: PCI: Guide to use PCI endpoint configfs
PCI: endpoint: Introduce configfs entry for configuring EP functions
Documentation: PCI: Guide to use PCI Endpoint Core Layer
PCI: endpoint: Add EP core layer to enable EP controller and EP functions
PCI: dwc: dra7xx: Push request_irq() call to the bottom of probe
PCI: dwc: designware: Move _unroll configurations to a separate function
PCI: dwc: all: Modify dbi accessors to access data of 4/2/1 bytes
PCI: dwc: all: Modify dbi accessors to take dbi_base as argument
PCI: dwc: artpec6: Populate cpu_addr_fixup ops
PCI: dwc: dra7xx: Populate cpu_addr_fixup ops
PCI: dwc: designware: Add new *ops* for CPU addr fixup
PCI: dwc: Fix uninitialized variable in dw_handle_msi_irq()
PCI: dwc: Unindent dw_handle_msi_irq() loop
PCI: dwc: Fix dw_pcie_ops NULL pointer dereference
PCI: dwc: Select PCI_HOST_COMMON for hisi
PCI: thunder-pem: Fix legacy firmware PEM-specific resources
PCI: thunder-pem: Add legacy firmware support for Cavium ThunderX host controller
PCI: thunder-pem: Use Cavium assigned hardware ID for ThunderX host controller
PCI: iproc: Save host bridge window resource in struct iproc_pcie
PCI/ASPM: Always set link->downstream to avoid NULL dereference on remove
PCI: Prevent VPD access for QLogic ISP2722
PCI: exynos: Initialize elbi_base even when using PHY framework
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
The PCIe programming sequence in TRM suggests CLKSTCTRL of PCIe should be
set to SW_WKUP. There are no issues when CLKSTCTRL is set to HW_AUTO in RC
mode. However in EP mode, the host system is not able to access the
MEMSPACE and setting the CLKSTCTRL to SW_WKUP fixes it.
Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
tools: PCI: Add sample test script to invoke pcitest
Add a simple test script that invokes the pcitest userspace tool to perform
all the PCI endpoint tests (BAR tests, interrupt tests, read tests, write
tests and copy tests).
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
misc: Add host side PCI driver for PCI test function device
Add PCI endpoint test driver that can verify base address register, legacy
interrupt/MSI interrupt and read/write/copy buffers between host and
device. The corresponding pci-epf-test function driver should be used on
the EP side.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
According to errata i870, access to the PCIe slave port that are not 32-bit
aligned will result in incorrect mapping to TLP Address and Byte enable
fields.
Accessing non 32-bit aligned data causes incorrect data in the target
buffer if memcpy is used. Implement the workaround for this errata here.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
No functional change. Split dra7xx_pcie_enable_interrupts() into
dra7xx_pcie_enable_wrapper_interrupts() and
dra7xx_pcie_enable_msi_interrupts() so that wrapper interrupts and MSI
interrupts can be enabled independently. This is in preparation for adding
EP mode support to dra7xx driver since EP mode doesn't have to enable
msi_interrupts.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add endpoint mode support to designware driver. This uses the EP Core layer
introduced recently to add endpoint mode support. *Any* function driver
can now use this designware device in order to achieve the EP
functionality.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Trivial fix to spelling mistake in dev_err message
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Richard Zhu <hongxing.Zhu@nxp.com>
Stefan Assmann [Wed, 19 Apr 2017 07:22:45 +0000 (09:22 +0200)]
PCI: Disable boot interrupt quirk for ASUS M2N-LR
The ASUS M2N-LR should not trigger boot interrupt quirks although it
carries an Intel 6702PXH. On this board the boot interrupt quirks cause
incorrect IRQ assignments and should be disabled.
PCI: versatile: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_ioremap_nopost* interface to make sure the correct
memory mappings for PCI configuration space are used.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rob Herring <robh@kernel.org>
PCI: keystone-dw: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
PCI: layerscape: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Mingkai Hu <mingkai.hu@freescale.com> Cc: Minghuan Lian <minghuan.Lian@freescale.com> Cc: Roy Zang <tie-fei.zang@freescale.com>
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Zhou Wang <wangzhou1@hisilicon.com>
PCI: tegra: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use correct memory mapping attributes to map config space
regions to enforce configuration space non-posted writes behaviour.
PCI: xgene: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
PCI: armada8k: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
PCI: designware: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Jingoo Han <jingoohan1@gmail.com> Cc: Joao Pinto <Joao.Pinto@synopsys.com>
PCI: iproc-platform: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Ray Jui <rjui@broadcom.com> Cc: Jon Mason <jonmason@broadcom.com>
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
PCI: rockchip: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Wenrui Li <wenrui.li@rock-chips.com> Cc: Shawn Lin <shawn.lin@rock-chips.com>
PCI: spear13xx: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generate on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
PCI: xilinx-nwl: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com>
PCI: xilinx: Update PCI config space remap function
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com>
PCI: ECAM: Map config region with pci_remap_cfgspace()
The current ECAM kernel implementation uses ioremap() to map the ECAM
configuration space memory region; this is not safe in that on some
architectures the ioremap interface provides mappings that allow posted
write transactions. This, as highlighted in the PCIe specifications (4.0 -
Rev0.3, "Ordering Considerations for the Enhanced Configuration Address
Mechanism"), can create ordering issues for software because posted writes
transactions on the CPU host bus are non posted in the PCI express fabric.
Update the ioremap() interface to use pci_remap_cfgspace() whose mapping
attributes guarantee that non-posted writes transactions are issued for
memory writes within the ECAM memory mapped address region.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Jayachandran C <jnair@caviumnetworks.com>
The introduction of the pci_remap_cfgspace() interface allows PCI host
controller drivers to map PCI config space through a dedicated kernel
interface. Current PCI host controller drivers use the devm_ioremap_*()
devres interfaces to map PCI configuration space regions so in order to
update them to the new pci_remap_cfgspace() mapping interface a new set of
devres interfaces should be implemented so that PCI host controller drivers
can make use of them.
Introduce two new functions in the PCI kernel layer and Devres
documentation:
The offset parameter in the devres devm_ioremap_*() functions kerneldoc
entries is erroneously defined as BUS offset whereas it is actually a
resource address.
Since it is actually misleading, fix the devres devm_ioremap_* offset
parameter kerneldoc entry by replacing BUS offset with a more suitable
description (ie Resource address).
The PCI bus specification (rev 3.0, 3.2.5 "Transaction Ordering and
Posting") defines rules for PCI configuration space transactions ordering
and posting, that state that configuration writes have to be non-posted
transactions.
Current ioremap interface on ARM provides mapping functions that provide
"bufferable" writes transactions (ie ioremap uses MT_DEVICE memory type)
aka posted writes, so PCI host controller drivers have no arch interface to
remap PCI configuration space with memory attributes that comply with the
PCI specifications for configuration space.
Implement an ARM specific pci_remap_cfgspace() interface that allows to map
PCI config memory regions with MT_UNCACHED memory type (ie strongly ordered
- non-posted writes), providing a remap function that complies with PCI
specifications for config space transactions.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk>
The PCI bus specification (rev 3.0, 3.2.5 "Transaction Ordering and
Posting") defines rules for PCI configuration space transactions ordering
and posting, that state that configuration writes are non-posted
transactions.
This rule is reinforced by the ARM v8 architecture reference manual (issue
A.k, Early Write Acknowledgment) that explicitly recommends that No Early
Write Acknowledgment attribute should be used to map PCI configuration
(write) transactions.
Current ioremap interface on ARM64 implements mapping functions where the
Early Write Acknowledgment hint is enabled, so they cannot be used to map
PCI configuration space in a PCI specs compliant way.
Implement an ARM64 specific pci_remap_cfgspace() interface that allows to
map PCI config region with nGnRnE attributes, providing a remap function
that complies with PCI specifications and the ARMv8 architecture reference
manual recommendations.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com>
Currently SoCs pass2.x do not emulate EA headers for ACPI boot method at
all. However, for pass2.x some devices (like EDAC) advertise incorrect
base addresses in their BARs which results in driver probe failure during
resource request. Since all problematic blocks are on 2nd NUMA node under
domain 10 add necessary quirk entry to obtain BAR addresses correction
using EA header emulation.
Fixes: 44f22bd91e88 ("PCI: Add MCFG quirks for Cavium ThunderX pass2.x host controller") Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Robert Richter <rrichter@cavium.com> CC: stable@vger.kernel.org # v4.10+
With no blank lines, it's not obvious where the macro definitions end and
the uses begin. Add some blank lines and reorder the ThunderX definitions.
No functional change intended.
Brian Norris [Fri, 10 Mar 2017 02:46:15 +0000 (18:46 -0800)]
PCI: rockchip: Add remove() support
Currently, if we try to unbind the platform device, the remove will
succeed, but the removal won't undo most of the registration, leaving
partially-configured PCI devices in the system.
This allows, for example, a simple 'lspci' to crash the system, as it will
try to touch the freed (via devm_*) driver structures, e.g., on RK3399:
Currently we opencode the FLR sequence in lots of place; export a core
helper instead. We split out the probing for FLR support as all the
non-core callers already know their hardware.
Note that in the new pci_has_flr() function the quirk check has been moved
before the capability check as there is no point in reading the capability
in this case.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bodong Wang [Wed, 12 Apr 2017 22:51:40 +0000 (01:51 +0300)]
PCI: Add sysfs sriov_drivers_autoprobe to control VF driver binding
Sometimes it is not desirable to bind SR-IOV VFs to drivers. This can save
host side resource usage by VF instances that will be assigned to VMs.
Add a new PCI sysfs interface "sriov_drivers_autoprobe" to control that
from the PF. To modify it, echo 0/n/N (disable probe) or 1/y/Y (enable
probe) to:
Note that this must be done before enabling VFs. The change will not take
effect if VFs are already enabled. Simply, one can disable VFs by setting
sriov_numvfs to 0, choose whether to probe or not, and then re-enable the
VFs by restoring sriov_numvfs.
[bhelgaas: changelog, ABI doc] Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
David Woodhouse [Fri, 7 Apr 2017 10:01:00 +0000 (12:01 +0200)]
ia64: Remove redundant checks for WC in pci_mmap_page_range()
For a PCI MMIO BAR, phys_mem_access_prot() should always return UC or WC.
And while a mixture of cached and uncached mappings is forbidden, we were
already mixing WC and UC, which is OK. Just do as we're asked.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Tony Luck <tony.luck@intel.com>
David Woodhouse [Wed, 12 Apr 2017 12:25:59 +0000 (13:25 +0100)]
PCI: Add pci_mmap_resource_range() and use it for ARM64
Starting to leave behind the legacy of the pci_mmap_page_range() interface
which takes "user-visible" BAR addresses. This takes just the resource and
offset.
For now, both APIs coexist and depending on the platform, one is
implemented as a wrapper around the other.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and Posting")
mandate non-posted configuration transactions. As further highlighted in
the PCIe specifications (4.0 - Rev0.3, "Ordering Considerations for the
Enhanced Configuration Access Mechanism"), through ECAM and ECAM-derivative
configuration mechanism, the memory mapped transactions from the host CPU
into Configuration Requests on the PCI express fabric may create ordering
problems for software because writes to memory address are typically posted
transactions (unless the architecture can enforce through virtual address
mapping non-posted write transactions behaviour) but writes to
Configuration Space are not posted on the PCI express fabric.
Current DT and ACPI host bridge controllers map PCI configuration space
(ECAM and ECAM-derivative) into the virtual address space through ioremap()
calls, that are non-cacheable device accesses on most architectures, but
may provide "bufferable" or "posted" write semantics in architecture like
eg ARM/ARM64 that allow ioremap'ed regions writes to be buffered in the bus
connecting the host CPU to the PCI fabric; this behaviour, as underlined in
the PCIe specifications, may trigger transactions ordering rules and must
be prevented.
Introduce a new generic and explicit API to create a memory mapping for
ECAM and ECAM-derivative config space area that defaults to
ioremap_nocache() (which should provide a sane default behaviour) but still
allowing architectures on which ioremap_nocache() results in posted write
transactions to override the function call with an arch specific
implementation that complies with the PCI specifications for configuration
transactions.
[bhelgaas: fold in #ifdef CONFIG_PCI wrapper] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com>
pci_remap_iospace() is marked as a weak symbol even though no architecture
is currently overriding it; given that its implementation internals have
already code paths that are arch specific (ie PCI_IOBASE and
ioremap_page_range() attributes) there is no need to leave the weak symbol
in the kernel since the same functionality can be achieved by customizing
per-arch the corresponding functionality.
Remove the __weak symbol from pci_remap_iospace().
PCI: Don't resize resources when realigning all devices in system
The "pci=resource_alignment" argument aligns BARs of designated devices by
artificially increasing their size. Increasing the size increases the
alignment and prevents other resources from being assigned in the same
alignment region, e.g., in the same page, but it can break drivers that use
the BAR size to locate things, e.g., ilo_map_device() does this:
off = pci_resource_len(pdev, bar) - 0x2000;
The new pcibios_default_alignment() interface allows an arch to request
that *all* BARs in the system be aligned to a larger size. In this case,
we don't need to artificially increase the resource size because we know
every BAR of every device will be realigned, so nothing will share the same
alignment region.
Use IORESOURCE_STARTALIGN to request realignment of PCI BARs when we know
we're realigning all BARs in the system.
PCI: Don't reassign resources that are already aligned
The "pci=resource_alignment=" kernel argument designates devices for which
we want alignment greater than is required by the PCI specs. Previously we
set IORESOURCE_UNSET for every MEM resource of those devices, even if the
resource was *already* sufficiently aligned.
If a resource is already sufficiently aligned, leave it alone and don't try
to reassign it.
Pull the BAR size adjustment out into a new function,
pci_request_resource_alignment(), and add a comment about how and why we
increase the resource size and alignment.
powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
Override pcibios_default_alignment() to set default alignment to PAGE_SIZE
for all PCI devices on PowerNV platform. Thus sub-page BARs would not
share a page and could be mapped into guest when VFIO passthrough them.
PCI: Add pcibios_default_alignment() for arch-specific alignment control
When VFIO passes through a PCI device to a guest, it does not allow the
guest to mmap BARs that are smaller than PAGE_SIZE unless it can reserve
the rest of the page (see vfio_pci_probe_mmaps()). This is because a page
might contain several small BARs for unrelated devices and a guest should
not be able to access all of them.
VFIO emulates guest accesses to non-mappable BARs, which is functional but
slow. On systems with large page sizes, e.g., PowerNV with 64K pages, BARs
are more likely to share a page and performance is more likely to be a
problem.
Add a weak function to set default alignment for all PCI devices. An arch
can override it to force the PCI core to place memory BARs on their own
pages.
PCI: Include PCI-to-PCIe bridges as "Downstream Ports"
A PCI/PCI-X to PCI Express bridge, sometimes referred to as a "reverse
bridge", is a bridge with conventional PCI or PCI-X on its primary side and
a PCI Express Port on its secondary (downstream) side.
That PCIe Port is a Downstream Port and could be connected to a slot, just
like a Root Port or a Switch Downstream Port. Make pcie_downstream_port()
return true for them, so we can access the Slot registers in the PCIe
capability.
Laurent Pinchart reported that the Renesas R-Car H2 Lager board (r8a7790)
crashes during suspend tests. Geert Uytterhoeven managed to reproduce the
issue on an M2-W Koelsch board (r8a7791):
It occurs when the PME scan runs, once per second. During PME scan, the
PCI host bridge (rcar-pci) registers are accessed while its module clock
has already been disabled, leading to the crash.
One reproducer is to configure s2ram to use "s2idle" instead of "deep"
suspend:
Another reproducer is to write either "platform" or "processors" to
/sys/power/pm_test. It does not (or is less likely) to happen during full
system suspend ("core" or "none") because system suspend also disables
timers, and thus the workqueue handling PME scans no longer runs. Geert
believes the issue may still happen in the small window between disabling
module clocks and disabling timers:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test # Or "processors"
# echo mem > /sys/power/state
(Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are enabled.)
Rafael Wysocki agrees that PME scans should be suspended before the host
bridge registers become inaccessible. To that end, queue the task on a
workqueue that gets frozen before devices suspend.
Rafael notes however that as a result, some wakeup events may be missed if
they are delivered via PME from a device without working IRQ (which hence
must be polled) and occur after the workqueue has been frozen. If that
turns out to be an issue in practice, it may be possible to solve it by
calling pci_pme_list_scan() once directly from one of the host bridge's
pm_ops callbacks.
PCI: Fix calculation of bridge window's size and alignment
In case that one device's alignment is greater than its size, we may
get an incorrect size and alignment for its bus's memory window in
pbus_size_mem(). Fix this case.
A 64-bit value is not needed since a PCI ROM address consists in 32 bits.
This fixes a clang warning about "implicit conversion from 'unsigned long'
to 'u32'".
Also remove now unnecessary casts to u32 from __pci_read_base() and
pci_std_update_resource().
Marc Gonzalez [Mon, 10 Apr 2017 17:46:54 +0000 (19:46 +0200)]
PCI: Improve __pci_read_base() robustness
Local variables 'l' and 'sz' are uninitialized. Normally, they would
be initialized by pci_read_config_dword() but when an error occurs,
some drivers immediately return an error code, which leaves the
argument uninitialized.
Provide a safe initial value to make the code more robust.
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI/irq: Add pci_request_irq() and pci_free_irq() helpers
These are small wrappers around request_threaded_irq() and free_irq(),
which dynamically allocate space for the device name so that drivers don't
need to keep static buffers for these around. Additionally it works with
device-relative vector numbers to make the usage easier, and force the
IRQF_SHARED flag on given that it has no runtime overhead and should be
supported by all PCI devices.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>