Michael Neuling [Wed, 27 May 2015 06:07:18 +0000 (16:07 +1000)]
cxl: Add AFU virtual PHB and kernel API
This patch does two things.
Firstly it presents the Accelerator Function Unit (AFUs) behind the POWER
Service Layer (PSL) as PCI devices on a virtual PCI Host Bridge (vPHB). This
in in addition to the PSL being a PCI device itself.
As part of the Coherent Accelerator Interface Architecture (CAIA) AFUs can
provide an AFU configuration. This AFU configuration recored is architected to
be the same as a PCI config space.
This patch sets discovers the AFU configuration records, provides AFU config
space read/write functions to these configuration records. It then enumerates
the PCI bus. It also hooks in PCI ops where appropriate. It also destroys the
vPHB when the physical card is removed.
Secondly, it add an in kernel API for AFU to use CXL. AFUs must present a
driver that firstly binds as a PCI device. This PCI device can then be using
to do CXL specific operations (that can't sit in the PCI ops) using this API.
Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Wed, 27 May 2015 06:07:17 +0000 (16:07 +1000)]
cxl: Export file ops for use by API
The cxl kernel API will allow drivers other than cxl to export a file
descriptor which has the same userspace API. These file descriptors will be
able to be used against libcxl.
This exports those file ops for use by other drivers.
Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Wed, 27 May 2015 06:07:16 +0000 (16:07 +1000)]
cxl: Move include file cxl.h -> cxl-base.h
This moves the current include file from cxl.h -> cxl-base.h. This current
include file is used only to pass information between the base driver that
needs to be built into the kernel and the cxl module.
This is to make way for a new include/misc/cxl.h which will
contain just the kernel API for other driver to use
Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Wed, 27 May 2015 06:07:14 +0000 (16:07 +1000)]
cxl: Rework context lifetimes
This reworks contexts lifetimes a bit to enable the kernel API where we may
want to reuse contexts. Here we will want to start and stop contexts without
freeing them.
Start context does the get pid & ctx so stop context will need to do the puts.
Here we move put pid & ctx to the detach context path which will become part of
the stop context path.
Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Wed, 27 May 2015 06:07:13 +0000 (16:07 +1000)]
cxl: Configure PSL for kernel contexts and merge code
This updates AFU directed and dedicated modes for contexts attached to the
kernel.
The SR (similar to the MSR in the core) calculation is getting
quite complex and is duplicated in AFU directed and dedicated
modes. This patch also merges this SR calculation for these modes.
Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Wed, 27 May 2015 06:07:07 +0000 (16:07 +1000)]
cxl: Add cookie parameter to afu_release_irqs()
Add cookie parameter to afu_release_irqs() so that we can pass in a different
cookie than the context structure. This will be useful for other kernel
drivers that want to call this but get their own cookie back in the interrupt
handler.
Update all existing call sites.
Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Wed, 27 May 2015 06:07:05 +0000 (16:07 +1000)]
cxl: Fix error path on probe
When probing we call pci_enable_device() but don't call pci_disable_device() on
fail. This causes refcounting issues in the PCI subsystem if a second driver
tries to bind to the same device.
This patch adds the pci_disable_device() to the probe error path. This error
path is hit when this cxl driver tries to bind to AFUs (on the vPHB) rather
than the physical device.
Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ian Munsie [Wed, 27 May 2015 06:07:04 +0000 (16:07 +1000)]
cxl: Re-order card init to check the VSEC earlier
When we expose AFUs as virtual PCI devices, they may look like the physical
CAPI PCI card. ie they may have the same vendor/device IDs.
We want to avoid these AFUs binding to this driver and any init this driver may
do.
Re-order card init to check the VSEC earlier before assigning BARs or
activating CXL. Also change the dev used in early prints as the adapter struct
may not be inited at this earlier stage.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Wed, 27 May 2015 06:07:00 +0000 (16:07 +1000)]
powerpc/pci: Add pcibios_disable_device() hook
This adds a hook into the powerpc pci code for pci_disable_device() calls. The
generic code already provides a weak pcibios_disable_device() symbol, so we
just need to provide our own in powerpc and it'll get picked up.
This is passed directly to the phb controller ops, provided one exists.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Wed, 27 May 2015 06:06:59 +0000 (16:06 +1000)]
powerpc/pci: Add shutdown hook to pci_controller_ops
Currently pnv_pci_shutdown() calls the PHB shutdown code for all PHBs in the
system. It dereferences the private_data assuming it's a powernv PHB, which
won't be the case when we have different PHB in the systems (like when we add
vPHBs for CXL).
This moves the shutdown hook to the pci_controller_ops and fixes the call site
to use that instead.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Neuling [Wed, 27 May 2015 06:06:58 +0000 (16:06 +1000)]
powerpc: Add cxl context to device archdata
Add cxl context pointer to archdata. We'll want to create one of these for cxl
PCI devices. Put them here until we can get a pci_dev specific private data.
This location was suggested by benh.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ian Munsie [Fri, 8 May 2015 12:55:18 +0000 (22:55 +1000)]
cxl: Use call_rcu to reduce latency when releasing the afu fd
The afu fd release path was identified as a significant bottleneck in
the overall performance of cxl. While an optimal AFU design would
minimise the need to close & reopen the AFU fd, it is not always
practical to avoid.
The bottleneck seems to be down to the call to synchronize_rcu(), which
will block until every other thread is guaranteed to be out of an RCU
critical section. Replace it with call_rcu() to free the context
structures later so we can return to the application sooner.
This reduces the time spent in the fd release path from 13356 usec to
13.3 usec - about a 100x speed up.
Reported-by: Fei K Chen <uchen@cn.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Vaibhav Jain [Fri, 22 May 2015 05:26:05 +0000 (10:56 +0530)]
cxl: Export AFU error buffer via sysfs
Export the "AFU Error Buffer" via sysfs attribute (afu_err_buf). AFU
error buffer is used by the AFU to report application specific
errors. The contents of this buffer are AFU specific and are intended to
be interpreted by the application interacting with the afu.
Suggested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
cxl: Implement an ioctl to fetch afu card-id, offset-id and mode
Given a file descriptor on an afu device, libcxl currently uses the
major/minor number obtained from fstat on the fd to construct path to
the afu's sysfs directory. However it is possible that rather than using
one of the device in /dev/cxl, a kernel driver creates its own device
which export generic cxl interface to the userspace. This causes
problems with libcxl as it tries to use a wrong major/minor number to
construct the sysfs path and fail.
So this patch introduces a new ioctl called CXL_IOCTL_GET_AFU_ID on the
afu file descriptor to fetch the cxl_afu_id struct that holds the
card/offset-id and mode information. These info is then used by libcxl to
construct the correct path to the afu sysfs directory.
Testing:
- Build against pseries be/le configs
- Testing with corresponding libcxl changes to verify that it constructs
right sysfs path to the afu.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cyril Bur [Tue, 26 May 2015 01:36:57 +0000 (11:36 +1000)]
powerpc/configs: Replace pseries_le_defconfig with a Makefile target using merge_config
Rather than continuing to maintain a copy of pseries_defconfig with
CONFIG_CPU_LITTLE_ENDIAN enabled, use the generic merge_config script
and use an le.config to enable little endian on top of pseries_defconfig
without the need for a duplicated _defconfig file.
This method will require less maintenance in the future and will ensure
that both 'defconfigs' are always in sync.
It is worth noting that the seemingly more simple approach of:
Also does not work. This is because if we have for example:
config FOO
depends on CPU_BIG_ENDIAN
select BAR
Then BAR will be enabled by the first call to kconfig (via
pseries_defconfig), and then will remain enabled after we merge
le.config, even though FOO will have been turned off.
The solution is to ensure to only invoke the kconfig logic once, after
we have merged all the config fragments. This ensures nothing is
select'ed on that should then be disabled by the later merged configs.
This is done through the explicit call to make olddefconfig
Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Reviewed-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
[mpe: Massage change log, fix white space and use ARCH not SRCARCH] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cyril Bur [Tue, 26 May 2015 01:36:56 +0000 (11:36 +1000)]
powerpc/configs: Merge pseries_defconfig and pseries_le_defconfig
These two configs should be identical with the exception of big or little
endian.
The big endian version has XMON_DEFAULT turned on while the little endian
has XMON_DEFAULT not set. It makes the most sense for defconfigs not to use
xmon by default, production systems should get back up as quickly as
possible, not sit in xmon.
In the event debugging is required, the option can be enabled or xmon=on
can be specified on commandline.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Tue, 26 May 2015 05:46:55 +0000 (15:46 +1000)]
powerpc: Non relocatable system call doesn't need a trampoline
We need to use a trampoline when using LOAD_HANDLER(), because the
destination needs to be in the first 64kB. An absolute branch has
no such limitations, so just jump there.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Tue, 26 May 2015 05:46:54 +0000 (15:46 +1000)]
powerpc: Relocatable system call no longer uses the LR
We had some code to restore the LR in the relocatable system call path
back when we used the LR to do an indirect branch.
Commit 6a404806dfce ("powerpc: Avoid link stack corruption in MMU
on syscall entry path") changed this to use the CTR which is volatile
across system calls so does not need restoring.
Remove the stale comment and the restore of the LR.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Tue, 26 May 2015 05:10:24 +0000 (15:10 +1000)]
powerpc/perf: Fix book3s kernel to userspace backtraces
When we take a PMU exception or a software event we call
perf_read_regs(). This overloads regs->result with a boolean that
describes if we should use the sampled instruction address register
(SIAR) or the regs.
If the exception is in kernel, we start with the kernel regs and
backtrace through the kernel stack. At this point we switch to the
userspace regs and backtrace the user stack with perf_callchain_user().
Unfortunately these regs have not got the perf_read_regs() treatment,
so regs->result could be anything. If it is non zero,
perf_instruction_pointer() decides to use the SIAR, and we get issues
like this:
Daniel Axtens [Tue, 28 Apr 2015 05:12:07 +0000 (15:12 +1000)]
powerpc/powernv: Move dma_set_mask() from pnv_phb to pci_controller_ops
Previously, dma_set_mask() on powernv was convoluted:
0) Call dma_set_mask() (a/p/kernel/dma.c)
1) In dma_set_mask(), ppc_md.dma_set_mask() exists, so call it.
2) On powernv, that function pointer is pnv_dma_set_mask().
In pnv_dma_set_mask(), the device is pci, so call pnv_pci_dma_set_mask().
3) In pnv_pci_dma_set_mask(), call pnv_phb->set_dma_mask() if it exists.
4) It only exists in the ioda case, where it points to
pnv_pci_ioda_dma_set_mask(), which is the final function.
So the call chain is:
dma_set_mask() ->
pnv_dma_set_mask() ->
pnv_pci_dma_set_mask() ->
pnv_pci_ioda_dma_set_mask()
Both ppc_md and pnv_phb function pointers are used.
Rip out the ppc_md call, pnv_dma_set_mask() and pnv_pci_dma_set_mask().
Instead:
0) Call dma_set_mask() (a/p/kernel/dma.c)
1) In dma_set_mask(), the device is pci, and pci_controller_ops.dma_set_mask()
exists, so call pci_controller_ops.dma_set_mask()
2) In the ioda case, that points to pnv_pci_ioda_dma_set_mask().
The new call chain is
dma_set_mask() ->
pnv_pci_ioda_dma_set_mask()
Now only the pci_controller_ops function pointer is used.
The fallback paths for p5ioc2 are the same.
Previously, pnv_pci_dma_set_mask() would find no pnv_phb->set_dma_mask()
function, to it would call __set_dma_mask().
Now, dma_set_mask() finds no ppc_md call or pci_controller_ops call,
so it calls __set_dma_mask().
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 28 Apr 2015 05:12:06 +0000 (15:12 +1000)]
powerpc/pci: add dma_set_mask to pci_controller_ops
Some systems only need to deal with DMA masks for PCI devices.
For these systems, we can avoid the need for a platform hook and
instead use a pci controller based hook.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 28 Apr 2015 05:12:05 +0000 (15:12 +1000)]
powerpc/powernv: Specialise pci_controller_ops for each controller type
Remove powernv generic PCI controller operations. Replace it with
controller ops for each of the two supported PHBs.
As an added bonus, make the two new structs const, which will help
guard against bugs such as the one introduced in 65ebf4b63
("powerpc/powernv: Move controller ops from ppc_md to controller_ops")
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 14 Apr 2015 04:28:02 +0000 (14:28 +1000)]
powerpc/mpic_u3msi: Move MSI-related ops to pci_controller_ops
Move the u3 MPIC msi subsystem to use the pci_controller_ops structure
rather than ppc_md for MSI related PCI controller operations.
As with fsl_msi, operations are plugged in at the subsys level, after
controller creation. Again, we iterate over all controllers and
populate them with the MSI ops.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 14 Apr 2015 04:28:01 +0000 (14:28 +1000)]
powerpc/pasemi: Move MSI-related ops to pci_controller_ops
Move the PaSemi MPIC msi subsystem to use the pci_controller_ops
structure rather than ppc_md for MSI related PCI controller
operations.
As with fsl_msi, operations are plugged in at the subsys level, after
controller creation. Again, we iterate over all controllers and
populate them with the MSI ops.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 14 Apr 2015 04:28:00 +0000 (14:28 +1000)]
powerpc/ppc4xx_hsta_msi: Move MSI-related ops to pci_controller_ops
Move the ppc4xx hsta msi subsystem to use the pci_controller_ops
structure rather than ppc_md for MSI related PCI controller
operations.
As with fsl_msi, operations are plugged in at the subsys level, after
controller creation. Again, we iterate over all controllers and
populate them with the MSI ops.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 14 Apr 2015 04:27:59 +0000 (14:27 +1000)]
powerpc/ppc4xx_msi: Move MSI-related ops to pci_controller_ops
Move the ppc4xx msi subsystem to use the pci_controller_ops structure
rather than ppc_md for MSI related PCI controller operations.
As with fsl_msi, operations are plugged in at the subsys level, after
controller creation. Again, we iterate over all controllers and
populate them with the MSI ops.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 14 Apr 2015 04:27:58 +0000 (14:27 +1000)]
powerpc/fsl_msi: Move MSI-related ops to pci_controller_ops
Move the fsl_msi subsystem to use the pci_controller_ops structure
rather than ppc_md for MSI related PCI controller operations.
Previously, MSI ops were added to ppc_md at the subsys level. However,
in fsl_pci.c, PCI controllers are created at the at arch level. So,
unlike in e.g. PowerNV/pSeries/Cell, we can't simply populate a
platform-level controller ops structure and have it copied into the
controllers when they are created.
Instead, walk every phb, and attempt to populate it with the MSI ops.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 14 Apr 2015 04:27:56 +0000 (14:27 +1000)]
powerpc/cell: Move MSI-related ops to pci_controller_ops
Move the Cell platform to use the pci_controller_ops structure rather
than ppc_md for MSI related PCI controller operations.
We can be confident that the functions will be added to the platform's
ops struct before any PCI controller's ops struct is populated
because:
1) These ops are added to the struct in a subsys initcall.
We populate the ops in axon_msi_probe, which is the probe call for the
axon-msi driver. However the driver is registered in axon_msi_init,
which is a subsys initcall, so this will happen at the subsys level.
2) The controller recieves the struct later, in a device initcall.
Cell populates the controller in cell_setup_phb, which is hooked up to
ppc_md.pci_setup_phb. ppc_md.pci_setup_phb is only ever called in
of_platform.c, as part of the OpenFirmware PCI driver's probe
routine. That driver is registered in a device initcall, so it will
occur *after* the struct is properly populated.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Alistair Popple [Fri, 15 May 2015 04:06:40 +0000 (14:06 +1000)]
powernv/eeh: Update the EEH code to use the opal irq domain
The eeh code currently uses the old notifier method to get eeh events
from OPAL. It also contains some logic to filter opal events which has
been moved into the virtual irqchip. This patch converts the eeh code
to the new event interface which simplifies event handling.
Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Alistair Popple [Fri, 15 May 2015 04:06:39 +0000 (14:06 +1000)]
hvc: Convert to using interrupts instead of opal events
Convert the opal hvc driver to use the new irqchip to register for
opal events. As older firmware versions may not have device tree
bindings for the interrupt parent we just use a hardcoded hwirq based
on the event number.
Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Alistair Popple [Fri, 15 May 2015 04:06:37 +0000 (14:06 +1000)]
powerpc/powernv: Add a virtual irqchip for opal events
Whenever an interrupt is received for opal the linux kernel gets a
bitfield indicating certain events that have occurred and need handling
by the various device drivers. Currently this is handled using a
notifier interface where we call every device driver that has
registered to receive opal events.
This approach has several drawbacks. For example each driver has to do
its own checking to see if the event is relevant as well as event
masking. There is also no easy method of recording the number of times
we receive particular events.
This patch solves these issues by exposing opal events via the
standard interrupt APIs by adding a new interrupt chip and
domain. Drivers can then register for the appropriate events using
standard kernel calls such as irq_of_parse_and_map().
Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Most of the OPAL subsystems are always compiled in for PowerNV and
many of them need to be initialised before or after other OPAL
subsystems. Rather than trying to control this ordering through
machine initcalls it is clearer and easier to control initialisation
order with explicit calls in opal_init.
powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior
Fastsleep is one of the idle state which cpuidle subsystem currently
uses on power8 machines. In this state L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing.
OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes checkstop
if any correctable error in L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.
In the existing implementation, workaround is applied by the last thread
of the core entering fastsleep and undone by the first thread waking up.
But this has a performance cost. These OPAL calls account for roughly
4000 cycles everytime the core has to enter or wakeup from fastsleep.
This patch introduces a sysfs attribute (fastsleep_workaround_applyonce)
to choose the behavior of this workaround.
By default, fastsleep_workaround_applyonce = 0. In this case, workaround
is applied/undone everytime the core enters/exits fastsleep.
fastsleep_workaround_applyonce = 1. In this case the workaround is
applied once on all the cores and never undone. This can be triggered by
echo 1 > /sys/devices/system/cpu/fastsleep_workaround_applyonce
For simplicity this attribute can be modified only once. Implying, once
fastsleep_workaround_applyonce is changed to 1, it cannot be reverted
to the default state.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/powernv: Move cpuidle related code from setup.c to new file
This is a cleanup patch; doesn't change any functionality. Moves
all cpuidle related code from setup.c to a new file.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
[mpe: Fix the SMP=n build by including asm/smp.h in idle.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc: Fix cpu_online_cores_map to return only online threads mask
Currently, cpu_online_cores_map returns a mask, which for every core with
at least one online thread, has the bit for thread 0 of the core set to 1,
and the bits for all other threads of the core set to 0. But thread 0 of
the core itself may not be online always. In such cases, if the returned
mask is used for IPI, then it'll cause IPIs to be skipped on cores where
the first thread is offline, because the IPI code refuses to send IPIs to
offline threads.
Fix this by setting the bit of the first online thread in the core.
This is done by fixing this in the underlying function
cpu_thread_mask_to_cores.
The result has the property that for all cores with online threads, there
is one bit set in the returned map. And further, all bits that are set in
the returned map correspond to online threads.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
[ Changelog from Michael Ellerman <mpe@ellerman.id.au> ] Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Laurent Dufour [Tue, 5 May 2015 15:30:21 +0000 (17:30 +0200)]
powerpc: Enable sys_kcmp() for CRIU
The commit 8170a83f15ee ("powerpc: Wireup the kcmp syscall to sys_ni") has
disabled the kcmp syscall for powerpc. This has been done due to the use
of unsigned long parameters which may require a dedicated wrapper to handle
32bit process on top of 64bit kernel. However in the kcmp() case, the 2
unsigned long parameters are currently only used to carry file descriptors
from user space to the kernel. Since such a parameter is passed through
register, and file descriptor doesn't need to get extended, there is,
today, no need for a wrapper.
In the case there will be a need to pass address in or out of this system
call, then a wrapper could be required, it will then be to care of it.
As today this is not the case, it is safe to enable kcmp() on powerpc.
Tested (by Laurent) on 64-bit, 32-bit, and 32-bit userspace on 64-bit
kernel using tools/testing/selftests/kcmp [mpe].
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As the comment indicates, powernv_eeh_get_state() will inform EEH core to
delay 1 second. This means the delay doesn't happen when
powernv_eeh_get_state() returns.
This patch moves the delay subtraction just before msleep(), which is the
same logic in pseries_eeh_wait_state().
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wei Yang [Mon, 27 Apr 2015 01:25:09 +0000 (09:25 +0800)]
powerpc/eeh: fix start/end/flags type in struct pci_io_addr_range{}
struct pci_io_addr_range{} stores the information of pci resources. It
would be better to keep these related fields have the same type as in
struct resource{}.
This patch fixes the start/end/flags type in struct pci_io_addr_range{} to
have the same type as in struct resource{}.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Thu, 26 Mar 2015 05:42:09 +0000 (16:42 +1100)]
drivers/vfio: Support EEH error injection
The patch adds one more EEH sub-command (VFIO_EEH_PE_INJECT_ERR)
to inject the specified EEH error, which is represented by
(struct vfio_eeh_pe_err), to the indicated PE for testing purpose.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Thu, 26 Mar 2015 05:42:08 +0000 (16:42 +1100)]
powerpc/eeh: Introduce eeh_pe_inject_err()
The patch defines PCI error types and functions in uapi/asm/eeh.h
and exports function eeh_pe_inject_err(), which will be called by
VFIO driver to inject the specified PCI error to the indicated
PE for testing purpose.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Thu, 26 Mar 2015 05:42:07 +0000 (16:42 +1100)]
powerpc/eeh: Move PE state constants around
There are two equivalent sets of PE state constants, defined in
arch/powerpc/include/asm/eeh.h and include/uapi/linux/vfio.h.
Though the names are different, their corresponding values are
exactly same. The former is used by EEH core and the latter is
used by userspace.
The patch moves those constants from arch/powerpc/include/asm/eeh.h
to arch/powerpc/include/uapi/asm/eeh.h, which are expected to be
used by userspace from now on. We can't delete those constants in
vfio.h as it's uncertain that those constants have been or will be
used by userspace.
Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Joel Stanley [Thu, 30 Apr 2015 03:50:07 +0000 (13:50 +1000)]
powerpc/powernv: Silence SYSPARAM warning on boot
OpenPower BMC machines do not place any sysparams in the device tree, so
at every boot we get a warning:
[ 0.437176] SYSPARAM: Opal sysparam node not found
Remove the warning, and reorder the init so we don't peform allocations
when there is no sysparam node in the device tree.
Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 11 May 2015 10:01:02 +0000 (20:01 +1000)]
powerpc/vdso: Disable building the 32-bit VDSO on little endian
The only little endian configuration we support is ppc64le. As such if
we're building little endian we don't need a 32-bit VDSO, because there
is no 32-bit userspace.
This patch is a fairly ugly mess of #ifdefs, but is the minimal logic
required to disable the 32-bit VDSO. We can hopefully clean up the
result in future with some further refactoring.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 22 Apr 2015 05:40:35 +0000 (15:40 +1000)]
powerpc/vdso: Combine start/size variables
In vdso_fixup_features() we have start64/start32 and size64/size32, but
they have the same types, ie. void * and unsigned long.
They're only used to save the return value from find_sectionXX() for the
subsequent call to do_feature_fixups(), so there's no overlap in their
usage either.
So we can just consolidate them into start/size and avoid the
duplication.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 23 Apr 2015 07:27:12 +0000 (17:27 +1000)]
powerpc: Reject binutils 2.24 when building little endian
There is a bug in binutils 2.24 which causes miscompilation if we're
building little endian and using weak symbols (which the kernel does).
It is fixed in binutils commit 57fa7b8c7e59 "Correct elf_merge_st_other
arguments for weak symbols", which is in binutils 2.25 and has been
backported to the binutils 2.24 branch and has been picked up by most
distros it seems.
However if we're running stock 2.24 (no extra version) then the bug is
present, so check for that and bail.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 21 Apr 2015 12:26:57 +0000 (22:26 +1000)]
powerpc: Don't do gcc version checks if we're building with clang
We have several checks for bad gcc versions in our Makefile. These don't
apply if we're building with clang, so skip them in that case.
The obvious check would be for ${COMPILER} = "gcc", but because of the
way the logic in the top level Makefile conditionally sets COMPILER,
it's possible that we're building with gcc but COMPILER was not set.
So instead check for ${COMPILER} != "clang", which we know is currently
the only other possibility.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Fri, 10 Apr 2015 01:52:06 +0000 (11:52 +1000)]
powerpc/pasemi: Only the build the pasemi MSI code for PASEMI=y
The pasemi MSI code is currently always built when MPIC=y && PCI_MSI=y.
It should not have any effect on other platforms, because it immediately
checks the MPIC's compatible property for "pasemi,pwrficient-openpic".
However it's odd that it's still built even when PASEMI=n. It also
needn't be in sysdev, as it's only used by pasemi. So move it into
platforms/pasemi, whereby it will only be built for PASEMI=y.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/pseries: Fix possible leaked device node reference
Failure return from dlpar_configure_connector when dlpar adding cpus
results in leaking references to the cpus parent device node. Move the
call to of_node_put() prior to checking the result of
dlpar_configure_connector.
Fixes: 8d5ff320766f ("powerpc/pseries: Make dlpar_configure_connector parent node aware") Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Sun, 10 May 2015 21:58:53 +0000 (14:58 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"I really need to get back to sending these on my Friday, instead of my
Monday morning, but nothing too amazing in here: a few amdkfd fixes, a
few radeon fixes, i915 fixes, one tegra fix and one core fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm: Zero out invalid vblank timestamp in drm_update_vblank_count.
drm/tegra: Don't use vblank_disable_immediate on incapable driver.
drm/radeon: stop trying to suspend UVD sessions
drm/radeon: more strictly validate the UVD codec
drm/radeon: make UVD handle checking more strict
drm/radeon: make VCE handle check more strict
drm/radeon: fix userptr lockup
drm/radeon: fix userptr BO unpin bug v3
drm/amdkfd: Initialize sdma vm when creating sdma queue
drm/amdkfd: Don't report local memory size
drm/amdkfd: allow unregister process with queues
drm/i915: Drop PIPE-A quirk for 945GSE HP Mini
drm/i915: Sink rate read should be saved in deca-kHz
drm/i915/dp: there is no audio on port A
drm/i915: Add missing MacBook Pro models with dual channel LVDS
drm/i915: Assume dual channel LVDS if pixel clock necessitates it
drm/radeon: don't setup audio on asics that don't support it
drm/radeon: disable semaphores for UVD V1 (v2)
Dave Airlie [Sun, 10 May 2015 20:06:22 +0000 (06:06 +1000)]
Merge tag 'drm-intel-fixes-2015-05-08' of git://anongit.freedesktop.org/drm-intel into drm-fixes
misc i915 fixes.
* tag 'drm-intel-fixes-2015-05-08' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Drop PIPE-A quirk for 945GSE HP Mini
drm/i915: Sink rate read should be saved in deca-kHz
drm/i915/dp: there is no audio on port A
drm/i915: Add missing MacBook Pro models with dual channel LVDS
drm/i915: Assume dual channel LVDS if pixel clock necessitates it
Mario Kleiner [Tue, 7 Apr 2015 04:31:09 +0000 (06:31 +0200)]
drm: Zero out invalid vblank timestamp in drm_update_vblank_count.
Since commit 844b03f27739135fe1fed2fef06da0ffc4c7a081 we make
sure that after vblank irq off, we return the last valid
(vblank count, vblank timestamp) pair to clients, e.g., during
modesets, which is good.
An overlooked side effect of that commit for kms drivers without
support for precise vblank timestamping is that at vblank irq
enable, when we update the vblank counter from the hw counter, we
can't update the corresponding vblank timestamp, so now we have a
totally mismatched timestamp for the new count to confuse clients.
Restore old client visible behaviour from before Linux 3.17, but
zero out the timestamp at vblank counter update (instead of disable
as in original implementation) if we can't generate a meaningful
timestamp immediately for the new vblank counter. This will fix
this regression, so callers know they need to retry again later
if they need a valid timestamp, but at the same time preserves
the improvements made in the commit mentioned above.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: <stable@vger.kernel.org> #v3.17+ Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Sun, 10 May 2015 18:13:19 +0000 (11:13 -0700)]
Merge tag 'samsung-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung
Pull samsung fixes from Kukjin Kim:
"Here is Samsung fixes for v4.1. Since I've missed to send this via
arm-soc tree before v4.1-rc3, so I'm sending this to you directly
- fix commit ea08de16eb1b ("ARM: dts: Add DISP1 power domain for
exynos5420") which causes 'unhandled fault: imprecise external
abort' error when PD turned off. ("make DP a consumer of DISP1
power domain")
- fix 's3c-rtc' probe failure on Odriod-X2/U2/U3 boards ("add
'rtc_src' clock to rtc node for source clock of rtc")
- fix typo for 'cpu-crit-0' trip point on exynos5420/5440
- fix S2R failure on exynos5250-snow due to card power of Marvell
WiFi driver (suspend/resume) ("add keep-power-in-susped to WiFi
SDIO node")"
* tag 'samsung-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: dts: Add keep-power-in-suspend to WiFi SDIO node for exynos5250-snow
ARM: dts: Fix typo in trip point temperature for exynos5420/5440
ARM: dts: add 'rtc_src' clock to rtc node for exynos4412-odroid boards
ARM: dts: Make DP a consumer of DISP1 power domain on Exynos5420
Linus Torvalds [Sat, 9 May 2015 23:13:38 +0000 (16:13 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A few patches have come up since the merge window. The largest one is
a rewrite of the PXA lubbock/mainstone IRQ handling. This was already
broken in 2011 by a change to the GPIO code and only noticed now.
The other changes contained here are:
MAINTAINERS file updates:
- Ray Jui and Scott Branden are now co-maintainers for some of the
mach-bcm chips, while Christian Daudt and Marc Carino have stepped
down.
- Andrew Victor is no longer maintaining at91. Instead, Alexandre
Belloni now becomes an official maintainer, after having done a
bulk of the work for a while.
- Baruch Siach, who added the mach-digicolor platform in 4.1 is now
listed as maintainer
- The git URL for mach-socfpga has changed
Bug fixes:
- Three bug fixes for new rockchip rk3288 code
- A regression fix to make SD card support work on certain ux500
boards
- multiple smaller dts fixes for imx, omap, mvebu, and shmobile
- a regression fiix for omap3 power consumption
- a fix for regression in the ARM CCI bus driver
Configuration changes:
- more imx platforms are now enabled in multi_v7_defconfig"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (39 commits)
MAINTAINERS: add Conexant Digicolor machines entry
MAINTAINERS: socfpga: update the git repo for SoCFPGA
ARM: multi_v7_defconfig: Select more FSL SoCs
MAINTAINERS: replace an AT91 maintainer
drivers: CCI: fix used_mask init in validate_group()
bus: omap_l3_noc: Fix master id address decoding for OMAP5
bus: omap_l3_noc: Fix offset for DRA7 CLK1_HOST_CLK1_2 instance
ARM: dts: dra7: Fix efuse register size for ABB
ARM: dts: am57xx-beagle-x15: Switch GPIO fan number
ARM: dts: am57xx-beagle-x15: Switch UART mux pins
ARM: dts: am437x-sk: reduce col-scan-delay-us
ARM: dts: am437x-sk: fix for new newhaven display module revision
ARM: dts: am57xx-beagle-x15: Fix RTC aliases
ARM: dts: am57xx-beagle-x15: Fix IRQ type for mcp7941x
ARM: dts: omap3: Add #iommu-cells to isp and iva iommu
ARM: omap2plus_defconfig: Enable EXTCON_USB_GPIO
ARM: dts: OMAP3-N900: Add microphone bias voltages
ARM: OMAP2+: Fix omap off idle power consumption creeping up
MAINTAINERS: Update brcmstb entry
MAINTAINERS: Remove Christian Daudt for mach-bcm
...
Linus Torvalds [Sat, 9 May 2015 23:07:14 +0000 (16:07 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user-namespace fix from Eric Biederman:
"Eric Windish recently reported a really bug that allows mounting fresh
copies of proc and sysfs when it really should not be allowed. The
code attempted to verify that proc and sysfs were fully visible but
there is a test missing to ensure that the root of the filesystem is
visible. Doh!
The following patch fixes that.
This fixes a containment issue that the docker folks are seeing"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
mnt: Fix fs_fully_visible to verify the root directory is visible
Linus Torvalds [Sat, 9 May 2015 21:59:05 +0000 (14:59 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Two patches from the irq departement:
- a simple fix to make dummy_irq_chip usable for wakeup scenarios
- removal of the gic arch_extn hackery. Now that all users are
converted we really want to get rid of the interface so people wont
come up with new use cases"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: gic: Drop support for gic_arch_extn
genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip
Linus Torvalds [Sat, 9 May 2015 04:39:12 +0000 (21:39 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"A couple of fixes for bugs caught while digging in fs/namei.c. The
first one is this cycle regression, the second is 3.11 and later"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
path_openat(): fix double fput()
namei: d_is_negative() should be checked before ->d_seq validation
Al Viro [Sat, 9 May 2015 02:53:15 +0000 (22:53 -0400)]
path_openat(): fix double fput()
path_openat() jumps to the wrong place after do_tmpfile() - it has
already done path_cleanup() (as part of path_lookupat() called by
do_tmpfile()), so doing that again can lead to double fput().
Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 7 May 2015 23:24:57 +0000 (19:24 -0400)]
namei: d_is_negative() should be checked before ->d_seq validation
Fetching ->d_inode, verifying ->d_seq and finding d_is_negative() to
be true does *not* mean that inode we'd fetched had been NULL - that
holds only while ->d_seq is still unchanged.
Shift d_is_negative() checks into lookup_fast() prior to ->d_seq
verification.
Reported-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 9 May 2015 03:59:02 +0000 (20:59 -0700)]
Merge branch 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
"When an arm user reported crashes near page_address(page) in my new
code, it became clear that I can't be trusted with GFP masks. Filipe
beat me to the patch, and I'll just be in the corner with my dunce cap
on"
* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix wrong mapping flags for free space inode
Linus Torvalds [Sat, 9 May 2015 03:38:21 +0000 (20:38 -0700)]
Merge tag 'dm-4.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Two additional fixes for changes introduced via DM during the 4.1
merge window.
The first reverts a dm-crypt change that wasn't correct. The second
fixes a device format regression that impacted userspace"
* tag 'dm-4.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
init: fix regression by supporting devices with major:minor:offset format
Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"