]> git.karo-electronics.de Git - linux-beck.git/log
linux-beck.git
8 years agocxl: fix build for GCC 4.6.x
Brian Norris [Fri, 8 Jan 2016 18:30:09 +0000 (10:30 -0800)]
cxl: fix build for GCC 4.6.x

GCC 4.6.3 does not support -Wno-unused-const-variable. Instead, use the
kbuild infrastructure that checks if this options exists.

Fixes: 2cd55c68c0a4 ("cxl: Fix build failure due to -Wunused-variable behaviour change")
Suggested-by: Michal Marek <mmarek@suse.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Add HWCAP bits for Power9
Michael Ellerman [Mon, 11 Jan 2016 02:59:04 +0000 (13:59 +1100)]
powerpc: Add HWCAP bits for Power9

In order to support Power9 we need two new HWCAP bits. We are merging
these ahead of the cputable entry so that glibc can start referring to
them.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Reserve PE#0 on NPU
Alistair Popple [Mon, 11 Jan 2016 05:53:50 +0000 (16:53 +1100)]
powerpc/powernv: Reserve PE#0 on NPU

P8+ hardware reports all errors on PE#0. This patch ensures PE#0 is
not assigned to NPU devices so that it can be used for EEH.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Change NPU PE# assignment
Alistair Popple [Mon, 11 Jan 2016 05:53:49 +0000 (16:53 +1100)]
powerpc/powernv: Change NPU PE# assignment

The P8+ hardware supports four partitionable endpoints (PEs) however
the hardware reports all errors as occurring on PE#0. This means we
need to reserve this PE for error handling (EEH) and not assign it to
a NPU device, implying that some devices will need to share PEs.

This patch changes the PE assignment for NPU devices such that NPU
devices which connect to the same GPU are assigned to the same
PE#.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Fix update of NVLink DMA mask
Alistair Popple [Fri, 8 Jan 2016 00:35:09 +0000 (11:35 +1100)]
powerpc/powernv: Fix update of NVLink DMA mask

The emulated NVLink PCI devices share the same IODA2 TCE tables but only
support a single TVT (instead of the normal two for PCI devices). This
requires the kernel to manually replace windows with either the bypass
or non-bypass window depending on what the driver has requested.

Unfortunately an incorrect optimisation was made in
pnv_pci_ioda_dma_set_mask() which caused updating of some NPU device PEs
to be skipped in certain configurations due to an incorrect assumption
that a NULL peer PE in the array indicated there were no more peers
present. This patch fixes the problem by ensuring all peer PEs are
updated.

Fixes: 5d2aa710e697 ("powerpc/powernv: Add support for Nvlink NPUs")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Remove misleading comment in pci.c
Russell Currey [Fri, 8 Jan 2016 05:16:47 +0000 (16:16 +1100)]
powerpc/powernv: Remove misleading comment in pci.c

PCI in powernv now supports quite a bit more than p5ioc2, so remove the
outdated comment.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing
Steven Rostedt [Tue, 8 Dec 2015 18:50:56 +0000 (13:50 -0500)]
powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing

It has come to my attention that kprobe event stack tracing does not
work on powerpc. You can see with the following:

  # cd /sys/kernel/debug/tracing
  # echo stacktrace > trace_options
  # echo 'p kfree' > kprobe_events
  # echo 1 > events/kprobes/enable

Will print the following warning:
  save_stack_trace_regs() not implemented yet.

Although save_stack_trace() (which normal event stack traces use) is
implemented, save_stack_trace_regs() which kprobe events use is not.
This is a cheap attempt to implement that function.

Note, This may have issues if a task tries to get a stack trace from
another task with its regs, because it just passes in "current" to
save_context_stack(). But this does solve the issue with stack tracing
kprobe events.

Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Fix build break due to paca mm_context_t changes
Michael Ellerman [Fri, 8 Jan 2016 21:25:01 +0000 (08:25 +1100)]
powerpc: Fix build break due to paca mm_context_t changes

Commit 2fc251a8dda5 ("powerpc: Copy only required pieces of the
mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y
and CONFIG_PPC_MM_SLICES=n.

That only happens for a kernel built with 4K pages and HUGETLB disabled,
which is why we missed it.

Fix it by adding a mm_ctx_user_psize member to the paca and populating
it in the appropriate places.

Fixes: 2fc251a8dda5 ("powerpc: Copy only required pieces of the mm_context_t to the paca")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agocxl: Fix DSI misses when the context owning task exits
Vaibhav Jain [Tue, 24 Nov 2015 10:56:18 +0000 (16:26 +0530)]
cxl: Fix DSI misses when the context owning task exits

Presently when a user-space process issues CXL_IOCTL_START_WORK ioctl we
store the pid of the current task_struct and use it to get pointer to
the mm_struct of the process, while processing page or segment faults
from the capi card. However this causes issues when the thread that had
originally issued the start-work ioctl exits in which case the stored
pid is no more valid and the cxl driver is unable to handle faults as
the mm_struct corresponding to process is no more accessible.

This patch fixes this issue by using the mm_struct of the next alive
task in the thread group. This is done by iterating over all the tasks
in the thread group starting from thread group leader and calling
get_task_mm on each one of them. When a valid mm_struct is obtained the
pid of the associated task is stored in the context replacing the
exiting one for handling future faults.

The patch introduces a new function named get_mem_context that checks if
the current task pointed to by ctx->pid is dead? If yes it performs the
steps described above. Also a new variable cxl_context.glpid is
introduced which stores the pid of the thread group leader associated
with the context owning task.

Reported-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reported-by: Frank Haverkamp <HAVERKAM@de.ibm.com>
Suggested-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
Andrew Donnellan [Mon, 21 Dec 2015 07:28:37 +0000 (18:28 +1100)]
powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()

Fix off-by-one error in opal_mce_check_early_recovery() when checking
whether the NIP falls within OPAL space.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Fix style of self-test config prompts
Andrew Donnellan [Mon, 21 Dec 2015 06:38:41 +0000 (17:38 +1100)]
powerpc: Fix style of self-test config prompts

A few of the config prompts for powerpc self-tests have periods at the
end, which is inconsistent with the rest of the prompts. Remove the
periods.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Only delay opal_rtc_read() retry when necessary
Michael Neuling [Fri, 18 Dec 2015 10:46:04 +0000 (21:46 +1100)]
powerpc/powernv: Only delay opal_rtc_read() retry when necessary

Only delay opal_rtc_read() when busy and are going to retry.

This has the advantage of possibly saving a massive 10ms off booting!

Kudos to Stewart for noticing.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Add a kmsg_dumper that flushes console output on panic
Russell Currey [Fri, 27 Nov 2015 06:23:07 +0000 (17:23 +1100)]
powerpc/powernv: Add a kmsg_dumper that flushes console output on panic

On BMC machines, console output is controlled by the OPAL firmware and is
only flushed when its pollers are called.  When the kernel is in a panic
state, it no longer calls these pollers and thus console output does not
completely flush, causing some output from the panic to be lost.

Output is only actually lost when the kernel is configured to not power off
or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
flushes the console buffer as part of its power down routines.  Before this
patch, however, only partial output would be printed during the timeout wait.

This patch adds a new kmsg_dumper which gets called at panic time to ensure
panic output is not lost.  It accomplishes this by calling OPAL_CONSOLE_FLUSH
in the OPAL API, and if that is not available, the pollers are called enough
times to (hopefully) completely flush the buffer.

The flushing mechanism will only affect output printed at and before the
kmsg_dump call in kernel/panic.c:panic().  As such, the "end Kernel panic"
message may still be truncated as follows:

>Call Trace:
>[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
>[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
>[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
>[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
>[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
>[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
>[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
>---[ end Kernel panic - not

This functionality is implemented as a kmsg_dumper as it seems to be the
most sensible way to introduce platform-specific functionality to the
panic function.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Copy only required pieces of the mm_context_t to the paca
Michael Neuling [Thu, 10 Dec 2015 22:34:42 +0000 (09:34 +1100)]
powerpc: Copy only required pieces of the mm_context_t to the paca

Currently we copy the whole mm_context_t to the paca but only access a
few bits of it.  This is wasteful of space paca and also takes quite
some time in the hot path of context switching.

This patch pulls in only the required bits from the mm_context_t to
the paca and on context switch, copies only those.

Benchmarking this (On top of Anton's recent MSR context switching
changes [1]) using processes and yield shows an improvement of almost
3% on POWER8:

  http://ozlabs.org/~anton/junkcode/context_switch2.c
  ./context_switch2 --test=yield --process 0 0

1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.html

Signed-off-by: Michael Neuling <mikey@neuling.org>
[mpe: Rename paca fields to be mm_ctx_foo rather than context_foo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Add function to copy mm_context_t to the paca
Michael Neuling [Wed, 28 Oct 2015 04:54:06 +0000 (15:54 +1100)]
powerpc: Add function to copy mm_context_t to the paca

This adds a function to copy the mm->context to the paca.  This is
only a basic conversion for now but will be used more extensively in
the next patch.

This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's
not used elsewhere.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Add missing calls to va_end()
Daniel Axtens [Thu, 17 Dec 2015 08:41:00 +0000 (19:41 +1100)]
powerpc: Add missing calls to va_end()

cppcheck picked up that there were a couple of missing va_end()
calls in functions using va_start().

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Enable kernel CPU dlpar from sysfs
Nathan Fontenot [Wed, 16 Dec 2015 20:56:02 +0000 (14:56 -0600)]
powerpc/pseries: Enable kernel CPU dlpar from sysfs

Enable new kernel cpu hotplug functionality by allowing cpu dlpar requests
to be initiated from sysfs.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Add CPU dlpar add functionality
Nathan Fontenot [Wed, 16 Dec 2015 20:55:07 +0000 (14:55 -0600)]
powerpc/pseries: Add CPU dlpar add functionality

Add the ability to hotplug add cpus via rtas hotplug events by either
specifying the drc index of the CPU to add, or providing a count of the
number of CPUs to add.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Add CPU dlpar remove functionality
Nathan Fontenot [Wed, 16 Dec 2015 20:54:05 +0000 (14:54 -0600)]
powerpc/pseries: Add CPU dlpar remove functionality

Add the ability to dlpar remove CPUs via hotplug rtas events, either by
specifying the drc-index of the CPU to remove or providing a count of cpus
to remove.

To remove multiple cpus in a single request we create a list of possible
DR (Dynamic Reconfiguration) cpus and their drc indexes that can be
removed.  We can then traverse the list remove each cpu and easily clean
up in any cases of failure.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Update CPU hotplug error recovery
Nathan Fontenot [Wed, 16 Dec 2015 20:52:39 +0000 (14:52 -0600)]
powerpc/pseries: Update CPU hotplug error recovery

Update the cpu dlpar add/remove paths to do better error recovery when
a failure occurs during the add/remove operation.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Factor out common cpu hotplug code
Nathan Fontenot [Wed, 16 Dec 2015 20:51:26 +0000 (14:51 -0600)]
powerpc/pseries: Factor out common cpu hotplug code

Re-factor the cpu hotplug code to support doing cpu hotplug completely in
the kernel and using the existing sysfs probe/release interfaces. This
patch pulls out pieces of existing cpu hotplug code into common routines,
dlpar_cpu_add() and dlpar_cpu_remove(), to be used by both interfaces.
There are no functional changes introduced.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Consolidate CPU hotplug code to hotplug-cpu.c
Nathan Fontenot [Wed, 16 Dec 2015 20:50:21 +0000 (14:50 -0600)]
powerpc/pseries: Consolidate CPU hotplug code to hotplug-cpu.c

No functional changes, this patch is simply a move of the cpu hotplug
code from pseries/dlpar.c to pseries/hotplug-cpu.c. This is in an effort
to consolidate all of the cpu hotplug code in a common place.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Verify CPU doesn't exist before adding
Nathan Fontenot [Fri, 23 Oct 2015 17:45:57 +0000 (12:45 -0500)]
powerpc/pseries: Verify CPU doesn't exist before adding

When DLPAR adding a CPU we should verify that the CPU does not already
exist. Failure to do so can generate a kernel oops;

[    9.465585] kernel BUG at arch/powerpc/platforms/pseries/dlpar.c:382!
[    9.465796] Oops: Exception in kernel mode, sig: 5 [#1]

This oops can be generated by causing a probe to be performed on a cpu
by writing to the sysfs cpu probe file (/sys/devices/system/cpu/probe).
This patch adds a check for the existence of cpu prior to probing the cpu
so userspace doing the wrong thing won't trigger a BUG_ON().

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/476fpe: Add support for kexec
Alistair Popple [Mon, 14 Dec 2015 03:31:24 +0000 (14:31 +1100)]
powerpc/476fpe: Add support for kexec

PPC476FPE has a different PVR from previous PPC476 processors. The
kexec code checks the PVR in order to correctly setup the MMU. When
the initial support for 476FPE processors was added the corresponding
change in the kexec code was missed. This patch simply adds the check
and solves the following bug on kexec:

kexec: Starting new kernel
Bye!
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0xee9a50f8
cpu 0x0: Vector: 400 (Instruction Access) at [ee9d7d20]
    pc: ee9a50f8
    lr: ee9a50e4
    sp: ee9d7dd0
    msr: 21020
    current = 0xee40f000
    pid   = 960, comm = kexec
enter ? for help
[link register   ] ee9a50e4
[ee9d7dd0c0013748 default_machine_kexec+0x58/0x70 (unreliable)
[ee9d7df0c0012f04 machine_kexec+0x34/0x40
[ee9d7e00c00aa1ec kernel_kexec+0x9c/0xb0
[ee9d7e20c005d704 SyS_reboot+0x1f4/0x220
[ee9d7f40c000db68 ret_from_syscall+0x0/0x3c

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Add support for Nvlink NPUs
Alistair Popple [Thu, 17 Dec 2015 02:43:13 +0000 (13:43 +1100)]
powerpc/powernv: Add support for Nvlink NPUs

NVLink is a high speed interconnect that is used in conjunction with a
PCI-E connection to create an interface between CPU and GPU that
provides very high data bandwidth. A PCI-E connection to a GPU is used
as the control path to initiate and report status of large data
transfers sent via the NVLink.

On IBM Power systems the NVLink processing unit (NPU) is similar to
the existing PHB3. This patch adds support for a new NPU PHB type. DMA
operations on the NPU are not supported as this patch sets the TCE
translation tables to be the same as the related GPU PCIe device for
each NVLink. Therefore all DMA operations are setup and controlled via
the PCIe device.

EEH is not presently supported for the NPU devices, although it may be
added in future.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Add __raw_rm_writeq() function
Alistair Popple [Thu, 17 Dec 2015 02:43:12 +0000 (13:43 +1100)]
powerpc: Add __raw_rm_writeq() function

Move __raw_rm_writeq() from platforms/powernv/pci-ioda.c to
include/asm/io.h so that it can be used by other code.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoRevert "powerpc/pci: Remove unused struct pci_dn.pcidev field"
Alistair Popple [Thu, 17 Dec 2015 02:43:11 +0000 (13:43 +1100)]
Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field"

This commit removed the pcidev field from struct pci_dn as it was no
longer in use by the kernel. However to support finding the
association of Nvlink devices to GPU devices from the device-tree this
field is required.

This reverts commit 250c7b277c65 ("powerpc/pci: Remove unused struct
pci_dn.pcidev field").

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Fix M64 resource name in /proc/iomem
Gavin Shan [Thu, 22 Oct 2015 01:03:08 +0000 (12:03 +1100)]
powerpc/powernv: Fix M64 resource name in /proc/iomem

The name of PCI root bus's M64 resource isn't initialized properly.
When dumping "/proc/iomem", "<BAD>" is seen for those M64 resources
on PCI root buses.

   ~# cat /proc/iomem | grep -e "BAD"
   3b0000000000-3b0fefffffff : <BAD>
   3b1000000000-3b1fefffffff : <BAD>
   3c0000000000-3c0fefffffff : <BAD>
   3c1000000000-3c1fefffffff : <BAD>
   3c2000000000-3c2fefffffff : <BAD>

This fixes the issue by setting the name of PCI root bus's M64
resource to that of PHB's device node full name. With the patch,
no "<BAD>" is seen from "/proc/iomem".

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Add page soft dirty tracking
Laurent Dufour [Thu, 3 Dec 2015 10:29:19 +0000 (11:29 +0100)]
powerpc/mm: Add page soft dirty tracking

User space checkpoint and restart tool (CRIU) needs the page's change
to be soft tracked. This allows to do a pre checkpoint and then dump
only touched pages.

This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when
the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when
the page is swapped out.

To introduce a new PTE _PAGE_SOFT_DIRTY bit value common to hash 4k
and hash 64k pte, the bits already defined in hash-*4k.h should be
shifted left by one.

The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type in
the swap pte. A check is added to ensure that the bit is not
overwritten by _PAGE_HPTEFLAGS.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/kernel: Combine vec/loc for STD_EXCEPTION_PSERIES
Michael Ellerman [Wed, 16 Dec 2015 10:10:22 +0000 (21:10 +1100)]
powerpc/kernel: Combine vec/loc for STD_EXCEPTION_PSERIES

The STD_EXCEPTION_PSERIES macro takes both a vector number, and a
location (memory address). However both are always identical, so combine
them to save repeating ourselves.

This does mean an exception handler must always exist at the location in
memory that matches its vector number. But that's OK because this is the
"STD" macro (standard), which does exactly that. We have other macros
for the other cases, eg. STD_EXCEPTION_PSERIES_OOL (out of line).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/kernel: Open code SET_DEFAULT_THREAD_PPR
Michael Ellerman [Wed, 25 Nov 2015 03:25:18 +0000 (14:25 +1100)]
powerpc/kernel: Open code SET_DEFAULT_THREAD_PPR

This is only used in one location, open code it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/kernel: Open code HMT_MEDIUM_LOW_HAS_PPR
Michael Ellerman [Wed, 25 Nov 2015 03:25:17 +0000 (14:25 +1100)]
powerpc/kernel: Open code HMT_MEDIUM_LOW_HAS_PPR

HMT_MEDIUM_LOW_HAS_PPR is only used in once place, open code it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/kernel: Drop HMT_MEDIUM_PPR_DISCARD
Michael Ellerman [Wed, 25 Nov 2015 03:25:16 +0000 (14:25 +1100)]
powerpc/kernel: Drop HMT_MEDIUM_PPR_DISCARD

HMT_MEDIUM_PPR_DISCARD is a macro which is present at the start of most
of our first level exception handlers. It conditionally executes a
HMT_MEDIUM instruction, which sets the processor priority to medium.

On on modern systems, ie. Power7 and later, it is nop'ed out at boot.
All it does is make the exception vectors more cramped, and consume 4
bytes of icache.

On old systems it has the effect of boosting the processor priority at
the start of exception processing. If we were previously in the idle
loop for example, we may be at low or very low priority. This is
desirable as we want to process the exception as fast as possible.

However looking closely at the generated code, we see that in all cases
we execute another HMT_MEDIUM just four instructions later. With code
patching applied, the final code on an old (Power6) system will look
like, eg:

  c000000000000300 <data_access_pSeries>:
  c000000000000300: 7c 42 13 78 mr r2,r2 <-
  c000000000000304: 7d b2 43 a6 mtsprg 2,r13
  c000000000000308: 7d b1 42 a6 mfsprg r13,1
  c00000000000030c: f9 2d 00 80 std r9,128(r13)
  c000000000000310: 60 00 00 00 nop
  c000000000000314: 7c 42 13 78 mr r2,r2 <-

So I suggest that the added code complexity of HMT_MEDIUM_PPR_DISCARD is
not justified by the benefit of boosting the processor priority for the
duration of four instructions, and therefore we drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/rtas: Make enter_rtas() private
Michael Ellerman [Tue, 24 Nov 2015 11:26:12 +0000 (22:26 +1100)]
powerpc/rtas: Make enter_rtas() private

There are no longer any users of enter_rtas() outside of rtas.c, so make
it "private", by moving the declaration inside rtas.c. Hopefully this
will encourage people to use one of the wrappers which takes the sharp
edges off the RTAS calling sequence.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/rtas: Use rtas_call_unlocked() in call_rtas_display_status()
Michael Ellerman [Tue, 24 Nov 2015 11:26:11 +0000 (22:26 +1100)]
powerpc/rtas: Use rtas_call_unlocked() in call_rtas_display_status()

Although call_rtas_display_status() does actually want to use the
regular RTAS locking, it doesn't want the extra logic that is in
rtas_call(), so currently it open codes the logic.

Instead we can use rtas_call_unlocked(), after taking the RTAS lock.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Use rtas_call_unlocked() in pseries hotplug
Michael Ellerman [Tue, 24 Nov 2015 11:26:10 +0000 (22:26 +1100)]
powerpc/pseries: Use rtas_call_unlocked() in pseries hotplug

Avoid open coding the logic by using rtas_call_unlocked().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/xmon: Use rtas_call_unlocked() in xmon
Michael Ellerman [Tue, 24 Nov 2015 11:26:09 +0000 (22:26 +1100)]
powerpc/xmon: Use rtas_call_unlocked() in xmon

Avoid open coding the logic by using rtas_call_unlocked().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/rtas: Add rtas_call_unlocked()
Michael Ellerman [Wed, 16 Dec 2015 10:01:42 +0000 (21:01 +1100)]
powerpc/rtas: Add rtas_call_unlocked()

Most users of RTAS (Run-Time Abstraction Services) use rtas_call(),
which deals with locking as well as endian handling.

However we have two users outside of rtas.c that can't use rtas_call()
because they have different locking requirements.

The hotplug CPU code can't take the RTAS lock because the CPU would go
offline with the lock held and no other CPUs would be able to call RTAS
until the CPU came back online.

The xmon code doesn't want to take the lock because it would risk dead
locking when we are trying to recover from a crash.

Both sites required multiple patches when we added little endian
support, proving that programmers can't do endian right.

Although that ship has sailed, we can still clean the code up by
providing an unlocked version of rtas_call() which avoids the need to
open code the logic elsewhere.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
Stewart Smith [Wed, 9 Dec 2015 06:18:20 +0000 (17:18 +1100)]
powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL

Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is
just OPALv3, with nobody ever expecting anything on pre-OPALv3 to
be cared about or supported by mainline kernels.

So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL
exclusively.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: Remove OPALv2 firmware define and references
Stewart Smith [Wed, 9 Dec 2015 06:18:19 +0000 (17:18 +1100)]
powerpc/powernv: Remove OPALv2 firmware define and references

OPALv2 only ever existed in the lab and didn't escape to the world.
All OPAL systems in the wild are OPALv3.

The probability of there being an OPALv2 system still powered on
anywhere inside IBM is approximately zero, let alone anyone
expecting to run mainline kernels.

So, start to remove references to OPALv2.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/powernv: panic() on OPAL < V3
Stewart Smith [Wed, 9 Dec 2015 06:18:18 +0000 (17:18 +1100)]
powerpc/powernv: panic() on OPAL < V3

The OpenPower Abstraction Layer firmware went through a couple
of iterations in the lab before being released. What we now know
as OPAL advertises itself as OPALv3.

OPALv2 and OPALv1 never made it outside the lab, and the possibility
of anyone at all ever building a mainline kernel today and expecting
it to boot on such hardware is zero.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoselftests/powerpc: Add script to test HMI functionality
Daniel Axtens [Sun, 6 Dec 2015 23:50:51 +0000 (10:50 +1100)]
selftests/powerpc: Add script to test HMI functionality

HMIs (Hypervisor Management|Maintenance Interrupts) are a class of interrupt
on POWER systems.

HMI support has traditionally been exceptionally difficult to test, however
Skiboot ships a tool that, with the correct magic numbers, will inject them.

This, therefore, is a first pass at a script to inject HMIs and monitor
Linux's response. It injects an HMI on each core on every chip in turn
It then watches dmesg to see if it's acknowledged by Linux.

On a Tuletta, I observed that we see 8 (or sometimes 9 or more) events per
injection, regardless of SMT setting, so we wait for 8 before progressing.

It sits in a new scripts/ directory in selftests/powerpc, because it's not
designed to be run as part of the regular make selftests process. In
particular, it is quite possibly going to end up garding lots of your CPUs,
so it should only be run if you know how to undo that.

CC: Mahesh J Salgaonkar <mahesh.salgaonkar@in.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoselftests/powerpc: Make context_switch touch FP/altivec/vector by default
Michael Ellerman [Wed, 2 Dec 2015 09:44:11 +0000 (20:44 +1100)]
selftests/powerpc: Make context_switch touch FP/altivec/vector by default

Simply because it touches more code paths that way, and therefore tests
more things.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
8 years agoselftests/powerpc: Make context_switch do something with no args
Michael Ellerman [Wed, 2 Dec 2015 09:44:10 +0000 (20:44 +1100)]
selftests/powerpc: Make context_switch do something with no args

For ease of use make the context_switch test do something useful when
called with no arguments.

Default to a 30 second run, using threads, doing yield, and use any
online cpu. Make it print out what it's doing to avoid confusion.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
8 years agoselftests/powerpc: Import Anton's context_switch2 benchmark
Michael Ellerman [Wed, 2 Dec 2015 09:44:09 +0000 (20:44 +1100)]
selftests/powerpc: Import Anton's context_switch2 benchmark

This gets referred to a lot in commit messages, so let's pull it into
the selftests.

Almost vanilla from: http://ozlabs.org/~anton/junkcode/context_switch2.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
8 years agoselftests/powerpc: Move pick_online_cpu() up into utils.c
Michael Ellerman [Wed, 16 Dec 2015 07:59:31 +0000 (18:59 +1100)]
selftests/powerpc: Move pick_online_cpu() up into utils.c

We want to use this in another test, so make it available at the top of
the powerpc selftests tree.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Remove broken GregorianDay()
Daniel Axtens [Tue, 15 Dec 2015 07:09:14 +0000 (18:09 +1100)]
powerpc: Remove broken GregorianDay()

GregorianDay() is supposed to calculate the day of the week
(tm->tm_wday) for a given day/month/year. In that calcuation it
indexed into an array called MonthOffset using tm->tm_mon-1. However
tm_mon is zero-based, not one-based, so this is off-by-one. It also
means that every January, GregoiranDay() will access element -1 of
the MonthOffset array.

It also doesn't appear to be a correct algorithm either: see in
contrast kernel/time/timeconv.c's time_to_tm function.

It's been broken forever, which suggests no-one in userland uses
this. It looks like no-one in the kernel uses tm->tm_wday either
(see e.g. drivers/rtc/rtc-ds1305.c:319).

tm->tm_wday is conventionally set to -1 when not available in
hardware so we can simply set it to -1 and drop the function.
(There are over a dozen other drivers in drivers/rtc that do
this.)

Found using UBSAN.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrew Morton <akpm@linux-foundation.org> # as an example of what UBSan finds.
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: rtc-linux@googlegroups.com
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoselftests/powerpc: Add test to check if VSRs are corrupted
Rashmica Gupta [Thu, 10 Dec 2015 09:49:33 +0000 (20:49 +1100)]
selftests/powerpc: Add test to check if VSRs are corrupted

When a transaction is aborted, VSR values should rollback to the
checkpointed values before the transaction began. VSRs used elsewhere in
the kernel during a transaction, or while the transaction is suspended
should not affect the checkpointed values.

Prior to the bug fix in commit d31626f70b61 ("powerpc: Don't corrupt
transactional state when using FP/VMX in kernel") when VMX was requested
by the kernel the .vr_state (which held the checkpointed state of VSRs
before the transaction) was overwritten with the current state from
outside the transation. Thus if the transaction did not complete, the
VSR values would be "rolled back" to potentially incorrect values.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/xmon: Append linux_banner to exception information in xmon.
Rashmica Gupta [Wed, 25 Nov 2015 02:46:25 +0000 (13:46 +1100)]
powerpc/xmon: Append linux_banner to exception information in xmon.

Currently if you are in xmon without an oops etc. to view the kernel
version you have to type "d $linux_banner" - not necessarily obvious. As
this is useful information, append to the output of "e" command.

Example output:
  $mon> e
  cpu 0x1: Vector: 0  at [c0000000f879ba80]
      pc: c000000000081718: sysrq_handle_xmon+0x68/0x80
      lr: c000000000081718: sysrq_handle_xmon+0x68/0x80
      sp: c0000000f879bbe0
     msr: 8000000000009033
    current = 0xc0000000f604d5c0
    paca    = 0xc00000000fdc0480  softe: 0  irq_happened: 0x01
      pid   = 2467, comm = bash
  Linux version 4.4.0-rc2-00008-gc51af91c3ab3-dirty (rashmica@circle) (gcc
  version 5.1.1 20150629 (GCC) ) #45 SMP Wed Nov 25 10:25:12 AEDT 2015

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/cell: Remove the Cell QPACE code
Rashmica Gupta [Tue, 1 Dec 2015 03:51:38 +0000 (14:51 +1100)]
powerpc/cell: Remove the Cell QPACE code

All users of QPACE have upgraded to QPACE2 so remove the Cell QPACE code.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Limit EPOW reset event warnings
Vipin K Parashar [Tue, 1 Dec 2015 11:13:42 +0000 (16:43 +0530)]
powerpc/pseries: Limit EPOW reset event warnings

Kernel prints respective warnings about various EPOW events for
user information/action after parsing EPOW interrupts. At times
below EPOW reset event warning is seen to be flooding kernel log
over a period of time.

May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared

These EPOW reset events are spurious in nature and are triggered by
firmware without an actual EPOW event being reset. This patch avoids these
multiple EPOW reset warnings by using a counter variable. This variable
is incremented every time an EPOW event is reported. Upon receiving a EPOW
reset event the same variable is checked to filter out spurious events and
decremented accordingly.

This patch also improves log messages to better describe EPOW event being
reported. Merged adjacent log messages into single one to reduce number of
lines printed per event.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoselftests/powerpc: Add TM signal with invalid stack test
Michael Neuling [Fri, 20 Nov 2015 04:15:34 +0000 (15:15 +1100)]
selftests/powerpc: Add TM signal with invalid stack test

Test the kernels signal generation code to ensure it can handle an
invalid stack pointer when transactional.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Skip if we don't have TM]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoselftests/powerpc: Add TM signal return test
Michael Neuling [Fri, 20 Nov 2015 04:15:33 +0000 (15:15 +1100)]
selftests/powerpc: Add TM signal return test

Test the kernel's signal return code to ensure that it doesn't crash
when both the transactional and suspend MSR bits are set in the signal
context.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Skip if we don't have TM]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoselftests/powerpc: Skip tm-resched-dscr if we don't have TM
Michael Ellerman [Wed, 2 Dec 2015 05:00:04 +0000 (16:00 +1100)]
selftests/powerpc: Skip tm-resched-dscr if we don't have TM

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoselftests/powerpc: Move TM helpers into tm.h
Michael Ellerman [Tue, 24 Nov 2015 02:05:40 +0000 (13:05 +1100)]
selftests/powerpc: Move TM helpers into tm.h

Move have_htm_nosc() into a new tm.h, and add a new helper, have_htm()
which we'll use in the next patch.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoselftests/powerpc: Add have_hwcap2() helper
Michael Ellerman [Tue, 24 Nov 2015 02:05:39 +0000 (13:05 +1100)]
selftests/powerpc: Add have_hwcap2() helper

We already do this twice and want to add another so add a helper.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoselftests/powerpc: Move get_auxv_entry() into utils.c
Michael Ellerman [Tue, 24 Nov 2015 02:05:38 +0000 (13:05 +1100)]
selftests/powerpc: Move get_auxv_entry() into utils.c

This doesn't really belong in harness.c, it's a helper function. So move
it into utils.c.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoMerge tag 'powerpc-4.4-3' into next
Michael Ellerman [Mon, 14 Dec 2015 09:40:32 +0000 (20:40 +1100)]
Merge tag 'powerpc-4.4-3' into next

Merge the two TM fixes we merged in 4.4. We are about to merge selftests
for these, and without the fixes the selftests will oops.

powerpc fixes for 4.4 #2

 - tm: Block signal return from setting invalid MSR state from Michael Neuling
 - tm: Check for already reclaimed tasks from Michael Neuling

8 years agopowerpc: Print MSR TM bits in oops messages
Michael Neuling [Fri, 20 Nov 2015 04:15:32 +0000 (15:15 +1100)]
powerpc: Print MSR TM bits in oops messages

Print MSR TM bits in oops messages.  This appends them to the end
like this:

    MSR: 8000000502823031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[TE]>

You get the TM[] only if at least one TM MSR bit is set.  Inside the
TM[], E means Enabled (bit 32), S means Suspended (bit 33), and T
means Transactional (bit 34)

If no bits are set, you get no TM[] output.

Include rework of printbits() to handle this case.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
Boqun Feng [Mon, 2 Nov 2015 01:30:32 +0000 (09:30 +0800)]
powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered

According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
versions all need to be fully ordered, however they are now just
RELEASE+ACQUIRE, which are not fully ordered.

So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics")

This patch depends on patch "powerpc: Make value-returning atomics fully
ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.

Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Make value-returning atomics fully ordered
Boqun Feng [Mon, 2 Nov 2015 01:30:31 +0000 (09:30 +0800)]
powerpc: Make value-returning atomics fully ordered

According to memory-barriers.txt:

> Any atomic operation that modifies some state in memory and returns
> information about the state (old or new) implies an SMP-conditional
> general memory barrier (smp_mb()) on each side of the actual
> operation ...

Which mean these operations should be fully ordered. However on PPC,
PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
which is currently "lwsync" if SMP=y. The leading "lwsync" can not
guarantee fully ordered atomics, according to Paul Mckenney:

https://lkml.org/lkml/2015/10/14/970

To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
the fully-ordered semantics.

This also makes futex atomics fully ordered, which can avoid possible
memory ordering problems if userspace code relies on futex system call
for fully ordered semantics.

Fixes: b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics")
Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Don't open code pgtable_t size
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:37:00 +0000 (09:07 +0530)]
powerpc/mm: Don't open code pgtable_t size

The slot information of base page size hash pte is stored in the
pgtable_t w.r.t transparent hugepage. We need to make sure we don't
index beyond pgtable_t size.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Use H_READ with H_READ_4
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:59 +0000 (09:06 +0530)]
powerpc/mm: Use H_READ with H_READ_4

This will bulk read 4 hash pte slot entries and should reduce the loop

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/nohash: we don't use real_pte_t for nohash
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:58 +0000 (09:06 +0530)]
powerpc/nohash: we don't use real_pte_t for nohash

Remove the related functions and #defines

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/nohash: Update 64K nohash config to have 32 pte fragement
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:57 +0000 (09:06 +0530)]
powerpc/nohash: Update 64K nohash config to have 32 pte fragement

They don't need to track 4k subpage slot details and hence don't need
second half of pgtable_t.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Don't hardcode the hash pte slot shift
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:56 +0000 (09:06 +0530)]
powerpc/mm: Don't hardcode the hash pte slot shift

Use the #define instead of open-coding the same

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Don't hardcode page table size
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:55 +0000 (09:06 +0530)]
powerpc/mm: Don't hardcode page table size

pte and pmd table size are dependent on config items. Don't
hard code the same. This make sure we use the right value
when masking pmd entries and also while checking pmd_bad

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Add a _PAGE_PTE bit
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:54 +0000 (09:06 +0530)]
powerpc/mm: Add a _PAGE_PTE bit

For a pte entry we will have _PAGE_PTE set. Our pte page
address have a minimum alignment requirement of HUGEPD_SHIFT_MASK + 1.
We use the lower 7 bits to indicate hugepd. ie.

For pmd and pgd we can find:
1) _PAGE_PTE set pte -> indicate PTE
2) bits [2..6] non zero -> indicate hugepd.
   They also encode the size. We skip bit 1 (_PAGE_PRESENT).
3) othewise pointer to next table.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move THP headers around
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:53 +0000 (09:06 +0530)]
powerpc/mm: Move THP headers around

We support THP only with book3s_64 and 64K page size. Move
THP details to hash64-64k.h to clarify the same.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move hugetlb related headers
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:52 +0000 (09:06 +0530)]
powerpc/mm: Move hugetlb related headers

W.r.t hugetlb, we support two format for pmd. With book3s_64 and
64K linux page size, we can have pte at the pmd level. Hence we
don't need to support hugepd there. For everything else hugepd
is supported and pmd_huge is (0).

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move WIMG update to helper.
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:51 +0000 (09:06 +0530)]
powerpc/mm: Move WIMG update to helper.

Only difference here is, we apply the WIMG mapping early, so rflags
passed to updatepp will also be changed.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Add helper for converting pte bit to hpte bits
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:50 +0000 (09:06 +0530)]
powerpc/mm: Add helper for converting pte bit to hpte bits

Instead of open coding it in multiple code paths, export the helper
and add more documentation. Also make sure we don't make assumption
regarding pte bit position

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Convert 4k insert from asm to C
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:49 +0000 (09:06 +0530)]
powerpc/mm: Convert 4k insert from asm to C

This is similar to 64K insert. May be we want to consolidate

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Convert __hash_page_64K to C
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:48 +0000 (09:06 +0530)]
powerpc/mm: Convert __hash_page_64K to C

Convert from asm to C

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Increase the width of #define
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:47 +0000 (09:06 +0530)]
powerpc/mm: Increase the width of #define

No real change, only style changes

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Remove pte_val usage for the second half of pgtable_t
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:46 +0000 (09:06 +0530)]
powerpc/mm: Remove pte_val usage for the second half of pgtable_t

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Don't track subpage valid bit in pte_t
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:45 +0000 (09:06 +0530)]
powerpc/mm: Don't track subpage valid bit in pte_t

This free up 11 bits in pte_t. In the later patch we also change
the pte_t format so that we can start supporting migration pte
at pmd level. We now track 4k subpage valid bit as below

If we have _PAGE_COMBO set, we override the _PAGE_F_GIX_SHIFT
and _PAGE_F_SECOND. Together we have 4 bits, each of them
used to indicate whether any of the 4 4k subpage in that group
is valid. ie,

[ group 1 bit ]   [ group 2 bit ]  ..... [ group 4 ]
[ subpage 1 - 4]  [ subpage 5- 8]  ..... [ subpage 13 - 16]

We still track each 4k subpage slot number and secondary hash
information in the second half of pgtable_t. Removing the subpage
tracking have some significant overhead on aim9 and ebizzy benchmark and
to support THP with 4K subpage, we do need a pgtable_t of 4096 bytes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Remove the dependency on pte bit position in asm code
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:44 +0000 (09:06 +0530)]
powerpc/mm: Remove the dependency on pte bit position in asm code

We should not expect pte bit position in asm code. Simply
by moving part of that to C

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Convert 4k hash insert to C
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:43 +0000 (09:06 +0530)]
powerpc/mm: Convert 4k hash insert to C

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/booke: Move nohash headers
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:38 +0000 (09:06 +0530)]
powerpc/booke: Move nohash headers

Move the booke related headers below booke/32 or booke/64

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move PTE bits from generic functions to hash64 functions.
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:37 +0000 (09:06 +0530)]
powerpc/mm: Move PTE bits from generic functions to hash64 functions.

functions which operate on pte bits are moved to hash*.h and other
generic functions are moved to pgtable.h

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move hash64 PTE bits from book3s/64/pgtable.h to hash.h
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:36 +0000 (09:06 +0530)]
powerpc/mm: Move hash64 PTE bits from book3s/64/pgtable.h to hash.h

This enables us to keep hash64 related bits together, and makes it easy
to follow.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Don't use pmd_val, pud_val and pgd_val as lvalue
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:35 +0000 (09:06 +0530)]
powerpc/mm: Don't use pmd_val, pud_val and pgd_val as lvalue

We convert them static inline function here as we did with pte_val in
the previous patch

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Don't use pte_val as lvalue
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:34 +0000 (09:06 +0530)]
powerpc/mm: Don't use pte_val as lvalue

We also convert few #define to static inline in this patch for better
type checking

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Drop pte-common.h from BOOK3S 64
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:33 +0000 (09:06 +0530)]
powerpc/mm: Drop pte-common.h from BOOK3S 64

We copy only needed PTE bits define from pte-common.h to respective
hash related header. This should greatly simply later patches in which
we are going to change the pte format for hash config

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Don't have generic headers introduce functions touching pte bits
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:32 +0000 (09:06 +0530)]
powerpc/mm: Don't have generic headers introduce functions touching pte bits

We are going to drop pte_common.h in the later patch. The idea is to
enable hash code not require to define all PTE bits. Having PTE bits
defined in pte_common.h made the code unnecessarily complex.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Delete booke bits from book3s
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:31 +0000 (09:06 +0530)]
powerpc/mm: Delete booke bits from book3s

We also move __ASSEMBLY__ towards the end of header. This avoid
having #ifndef __ASSEMBLY___ all over the header

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move hash specific pte width and other defines to book3s
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:30 +0000 (09:06 +0530)]
powerpc/mm: Move hash specific pte width and other defines to book3s

This further make a copy of pte defines to book3s/64/hash*.h. This
remove the dependency on pgtable-ppc64-4k.h and pgtable-ppc64-64k.h

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: make a separate copy for book3s
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:28 +0000 (09:06 +0530)]
powerpc/mm: make a separate copy for book3s

In this patch we do:
cp pgtable-ppc32.h book3s/32/pgtable.h
cp pgtable-ppc64.h book3s/64/pgtable.h

This enable us to do further changes to hash specific config.
We will change the page table format for 64bit hash in later patches.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: move pte headers to book3s directory
Aneesh Kumar K.V [Tue, 1 Dec 2015 03:36:26 +0000 (09:06 +0530)]
powerpc/mm: move pte headers to book3s directory

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Fix infinite loop in hash fault with 4K page size
Aneesh Kumar K.V [Sat, 28 Nov 2015 17:09:33 +0000 (22:39 +0530)]
powerpc/mm: Fix infinite loop in hash fault with 4K page size

This is the same bug we fixed as part of 09567e7fd44291bfc08accfdd67ad8f467842332
("powerpc/mm: Check paca psize is up to date for huge mappings"). Please
check that for details. The difference here is that faults were
happening on a 4K page at an address previously mapped by hugetlb.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Fix DSCR inheritance over fork()
Anton Blanchard [Wed, 9 Dec 2015 09:11:47 +0000 (20:11 +1100)]
powerpc: Fix DSCR inheritance over fork()

Two DSCR tests have a hack in them:

/*
 * XXX: Force a context switch out so that DSCR
 * current value is copied into the thread struct
 * which is required for the child to inherit the
 * changed value.
 */
sleep(1);

We should not be working around this in the testcase, it is a kernel bug.
Fix it by copying the current DSCR to the child, instead of what we
had in the thread struct at last context switch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Call restore_sprs() before _switch()
Anton Blanchard [Thu, 10 Dec 2015 09:44:39 +0000 (20:44 +1100)]
powerpc: Call restore_sprs() before _switch()

commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs()
and restore_sprs()") moved the restore of SPRs after the call to _switch().

There is an issue with this approach - new tasks do not return through
_switch(), they are set up by copy_thread() to directly return through
ret_from_fork() or ret_from_kernel_thread(). This means restore_sprs() is
not getting called for new tasks.

Fix this by moving restore_sprs() before _switch().

Fixes: 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Call check_if_tm_restore_required() in enable_kernel_*()
Anton Blanchard [Thu, 10 Dec 2015 09:04:05 +0000 (20:04 +1100)]
powerpc: Call check_if_tm_restore_required() in enable_kernel_*()

Commit a0e72cf12b1a ("powerpc: Create msr_check_and_{set,clear}()")
removed a call to check_if_tm_restore_required() in the
enable_kernel_*() functions. Add them back in.

Fixes: a0e72cf12b1a ("powerpc: Create msr_check_and_{set,clear}()")
Reported-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: clean up asm/switch_to.h
Anton Blanchard [Thu, 29 Oct 2015 00:44:11 +0000 (11:44 +1100)]
powerpc: clean up asm/switch_to.h

Remove a bunch of unnecessary fallback functions and group
things in a more logical way.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Rearrange __switch_to()
Anton Blanchard [Thu, 29 Oct 2015 00:44:10 +0000 (11:44 +1100)]
powerpc: Rearrange __switch_to()

Most of __switch_to() is housekeeping, TLB batching, timekeeping etc.
Move these away from the more complex and critical context switching
code.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: create flush_all_to_thread()
Anton Blanchard [Thu, 29 Oct 2015 00:44:09 +0000 (11:44 +1100)]
powerpc: create flush_all_to_thread()

Create a single function that flushes everything (FP, VMX, VSX, SPE).
Doing this all at once means we only do one MSR write.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: create giveup_all()
Anton Blanchard [Thu, 29 Oct 2015 00:44:08 +0000 (11:44 +1100)]
powerpc: create giveup_all()

Create a single function that gives everything up (FP, VMX, VSX, SPE).
Doing this all at once means we only do one MSR write.

A context switch microbenchmark using yield():

http://ozlabs.org/~anton/junkcode/context_switch2.c

./context_switch2 --test=yield --fp --altivec --vector 0 0

shows an improvement of 3% on POWER8.

Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: giveup_all() needs to be EXPORT_SYMBOL'ed]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Remove fp_enable() and vec_enable(), use msr_check_and_{set, clear}()
Anton Blanchard [Thu, 29 Oct 2015 00:44:07 +0000 (11:44 +1100)]
powerpc: Remove fp_enable() and vec_enable(), use msr_check_and_{set, clear}()

More consolidation of our MSR available bit handling.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Add ppc_strict_facility_enable boot option
Anton Blanchard [Thu, 29 Oct 2015 00:44:06 +0000 (11:44 +1100)]
powerpc: Add ppc_strict_facility_enable boot option

Add a boot option that strictly manages the MSR unavailable bits.
This catches kernel uses of FP/Altivec/SPE that would otherwise
corrupt user state.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>