git.karo-electronics.de Git - linux-beck.git/log

powerpc/rcpm: Fix build break when SMP=n

Add an include of asm/smp.h to fix a build break when SMP=n:

arch/powerpc/sysdev/fsl_rcpm.c:32:2: error: implicit declaration of
function 'get_hard_smp_processor_id'

Fixes: d17799f9c10e ("powerpc/rcpm: add RCPM driver")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/book3e-64: Use hardcoded mttmr opcode

This preserves the ability to build using older binutils (reportedly <=
2.22).

Fixes: 6becef7ea04a ("powerpc/mpc85xx: Add CPU hotplug support for E6500")
Signed-off-by: Scott Wood <oss@buserror.net>
Cc: chenhui.zhao@freescale.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next

Freescale updates from Scott:

"Highlights include 8xx optimizations, 32-bit checksum optimizations,
86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt
bits, and minor fixes/cleanup."

powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible

Starting with commit <8947e396a829> ("Documentation: dt: mtd:
replace "nor-jedec" binding with "jedec, spi-nor"") we have
"jedec,spi-nor" binding indicating support for JEDEC identification.

Use it for all flashes that are supposed to support READ ID op
according to the datasheets.

Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@freescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/T104xRDB: add tdm riser card node to device tree

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc32: PAGE_EXEC required for inittext

PAGE_EXEC is required for inittext, otherwise CONFIG_DEBUG_PAGEALLOC
ends up with an Oops

[    0.000000] Inode-cache hash table entries: 8192 (order: 1, 32768 bytes)
[    0.000000] Sorting __ex_table...
[    0.000000] bootmem::free_all_bootmem_core nid=0 start=0 end=2000
[    0.000000] Unable to handle kernel paging request for instruction fetch
[    0.000000] Faulting instruction address: 0xc045b970
[    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.000000] PREEMPT DEBUG_PAGEALLOC CMPC885
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.25-local-dirty #1673
[    0.000000] task: c04d83d0 ti: c04f8000 task.ti: c04f8000
[    0.000000] NIP: c045b970 LR: c045b970 CTR: 0000000a
[    0.000000] REGS: c04f9ea0 TRAP: 0400   Not tainted  (3.18.25-local-dirty)
[    0.000000] MSR: 08001032 <ME,IR,DR,RI>  CR: 39955d35  XER: a000ff40
[    0.000000]
GPR00: c045b970 c04f9f50 c04d83d0 00000000 ffffffff c04dcdf4 00000048 c04f6b10
GPR08: c04f6ab0 00000001 c0563488 c04f6ab0 c04f8000 00000000 00000000 b6db6db7
GPR16: 00003474 00000180 00002000 c7fec000 00000000 000003ff 00000176 c0415014
GPR24: c0471018 c0414ee8 c05304e8 c03aeaac c0510000 c0471018 c0471010 00000000
[    0.000000] NIP [c045b970] free_all_bootmem+0x164/0x228
[    0.000000] LR [c045b970] free_all_bootmem+0x164/0x228
[    0.000000] Call Trace:
[    0.000000] [c04f9f50] [c045b970] free_all_bootmem+0x164/0x228 (unreliable)
[    0.000000] [c04f9fa0] [c0454044] mem_init+0x3c/0xd0
[    0.000000] [c04f9fb0] [c045080c] start_kernel+0x1f4/0x390
[    0.000000] [c04f9ff0] [c0002214] start_here+0x38/0x98
[    0.000000] Instruction dump:
[    0.000000] 2f150000 7f968840 72a90001 3ad60001 56b5f87e 419a0028 419e0024 41a20018
[    0.000000] 807cc20c 38800000 7c638214 4bffd2f5 <3a940001> 3a100024 4bffffc8 7e368b78
[    0.000000] ---[ end trace dc8fa200cb88537f ]---

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree

This patch adds pcsphy node to FManV3 device tree.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)

Describe the PHY topology for all configurations supported by each board

Based on prior work by Andy Fleming <afleming@freescale.com>

Signed-off-by: Shruti Kanetkar <Shruti@freescale.com>
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Igal Liberman <Igal.Liberman@freescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/86xx: Introduce and use common dtsi

Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/86xx: Update device tree

Avoid duplication of the interrupt-parent, migrate to 4 interrupt-cells
and set the right clock-frequency for pcie (100 Mhz).

Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/86xx: Move dts files to fsl directory

Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/86xx: Switch to kconfig fragments approach

Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/86xx: Update defconfigs

This patch show how defconfigs appear if the kconfig fragment approach is
used.

Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/86xx: Consolidate common platform code

Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc32: Remove one insn in mulhdu

Remove one instruction in mulhdu

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc32: small optimisation in flush_icache_range()

Inlining of _dcache_range() functions has shown that the compiler
does the same thing a bit better with one insn less

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc: Simplify test in __dma_sync()

This simplification helps the compiler. We now have only one test
instead of two, so it reduces the number of branches.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc32: move xxxxx_dcache_range() functions inline

flush/clean/invalidate _dcache_range() functions are all very
similar and are quite short. They are mainly used in __dma_sync()
perf_event locate them in the top 3 consumming functions during
heavy ethernet activity

They are good candidate for inlining, as __dma_sync() does
almost nothing but calling them

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc32: Remove clear_pages() and define clear_page() inline

clear_pages() is never used expect by clear_page, and PPC32 is the
only architecture (still) having this function. Neither PPC64 nor
any other architecture has it.

This patch removes clear_pages() and moves clear_page() function
inline (same as PPC64) as it only is a few isns

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc: add inline functions for cache related instructions

This patch adds inline functions to use dcbz, dcbi, dcbf, dcbst
from C functions

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/8xx: rewrite flush_instruction_cache() in C

On PPC8xx, flushing instruction cache is performed by writing
in register SPRN_IC_CST. This registers suffers CPU6 ERRATA.
The patch rewrites the fonction in C so that CPU6 ERRATA will
be handled transparently

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/8xx: rewrite set_context() in C

There is no real need to have set_context() in assembly.
Now that we have mtspr() handling CPU6 ERRATA directly, we
can rewrite set_context() in C language for easier maintenance.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/8xx: remove special handling of CPU6 errata in set_dec()

CPU6 ERRATA is now handled directly in mtspr(), so we can use the
standard set_dec() fonction in all cases.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro

MPC8xx has an ERRATA on the use of mtspr() for some registers
This patch includes the ERRATA handling directly into mtspr() macro
so that mtspr() users don't need to bother about that errata

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/8xx: Add missing SPRN defines into reg_8xx.h

Add missing SPRN defines into reg_8xx.h
Some of them are defined in mmu-8xx.h, so we include mmu-8xx.h in
reg_8xx.h, for that we remove references to PAGE_SHIFT in mmu-8xx.h
to have it self sufficient, as includers of reg_8xx.h don't all
include asm/page.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc32: remove ioremap_base

ioremap_base is not initialised and is nowhere used so remove it

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc32: Remove useless/wrong MMU:setio progress message

Commit 771168494719 ("[POWERPC] Remove unused machine call outs")
removed the call to setup_io_mappings(), so remove the associated
progress line message

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together

x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
purpose, and are never defined at the same time.
So rename them x_block_mapped() and define them in the relevant
places

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc32: Fix pte_offset_kernel() to return NULL for bad pages

The fixmap related functions try to map kernel pages that are
already mapped through Large TLBs. pte_offset_kernel() has to
return NULL for LTLBs, otherwise the caller will try to access
level 2 table which doesn't exist

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c

Now we have a 8xx specific .c file for that so put it in there
as other powerpc variants do

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc: Update documentation for noltlbs kernel parameter

Now the noltlbs kernel parameter is also applicable to PPC8xx

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/8xx: Map linear kernel RAM with 8M pages

On a live running system (VoIP gateway for Air Trafic Control), over
a 10 minutes period (with 277s idle), we get 87 millions DTLB misses
and approximatly 35 secondes are spent in DTLB handler.
This represents 5.8% of the overall time and even 10.8% of the
non-idle time.
Among those 87 millions DTLB misses, 15% are on user addresses and
85% are on kernel addresses. And within the kernel addresses, 93%
are on addresses from the linear address space and only 7% are on
addresses from the virtual address space.

MPC8xx has no BATs but it has 8Mb page size. This patch implements
mapping of kernel RAM using 8Mb pages, on the same model as what is
done on the 40x.

In 4k pages mode, each PGD entry maps a 4Mb area: we map every two
entries to the same 8Mb physical page. In each second entry, we add
4Mb to the page physical address to ease life of the FixupDAR
routine. This is just ignored by HW.

In 16k pages mode, each PGD entry maps a 64Mb area: each PGD entry
will point to the first page of the area. The DTLB handler adds
the 3 bits from EPN to map the correct page.

With this patch applied, we now get only 13 millions TLB misses
during the 10 minutes period. The idle time has increased to 313s
and the overall time spent in DTLB miss handler is 6.3s, which
represents 1% of the overall time and 2.2% of non-idle time.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/8xx: Save r3 all the time in DTLB miss handler

We are spending between 40 and 160 cycles with a mean of 65 cycles in
the DTLB handling routine (measured with mftbl) so make it more
simple althought it adds one instruction.
With this modification, we get three registers available at all time,
which will help with following patch.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

Merge branch 'topic/mprofile-kernel' into next

Merge the ftrace changes to support -mprofile-kernel on ppc64le. This is
a prerequisite for live patching, the support for which will be merged
via the livepatch tree based on this topic branch.

powerpc/perf: Fix misleading comment in pmao_restore_workaround()

The current comment in pmao_restore_workaround() regarding
hard_irq_disable() is wrong. It should say to hard *disable* interrupts
instead of *enable*. Fix it.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/perf/24x7: Eliminate domain suffix in event names

The Physical Core events of the 24x7 PMU can be monitored across various
domains (physical core, vcpu home core, vcpu home node etc). For each of
these core events, we currently create multiple events in sysfs, one for
each domain the event can be monitored in. These events are distinguished
by their suffixes like __PHYS_CORE, __VCPU_HOME_CORE etc.

Rather than creating multiple such entries, we could let the user specify
make 'domain' index a required parameter and let the user specify a value
for it (like they currently specify the core index).

$ cat /sys/bus/event_source/devices/hv_24x7/events/HPM_CCYC
domain=?,offset=0x98,core=?,lpar=0x0

$ perf stat -C 0 -e hv_24x7/HPM_CCYC,domain=2,core=1/ true

(the 'domain=?' and 'core=?' in sysfs tell perf tool to enforce them as
required parameters).

This simplifies the interface and allows users to identify events by the
name specified in the catalog (User can determine the domain index by
referring to '/sys/bus/event_source/devices/hv_24x7/interface/domains').

Eliminating the event suffix eliminates several functions and simplifies
code.

Note that Physical Chip events can only be monitored in the chip domain
so those events have the domain set to 1 (rather than =?) and users don't
need to specify the domain index for the Chip events.

$ cat /sys/bus/event_source/devices/hv_24x7/events/PM_XLINK_CYCLES
domain=1,offset=0x230,chip=?,lpar=0x0

$ perf stat -C 0 -e hv_24x7/PM_XLINK_CYCLES,chip=1/ true

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/perf/hv-24x7: Display domain indices in sysfs

To help users determine domains, display the domain indices used by the
kernel in sysfs.

$ cat /sys/bus/event_source/devices/hv_24x7/interface/domains
1: Physical Chip
2: Physical Core
3: VCPU Home Core
4: VCPU Home Chip
5: VCPU Home Node
6: VCPU Remote Node

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/perf/hv-24x7: Display change in counter values

For 24x7 counters, perf displays the raw value of the 24x7 counter, which
is a monotonically increasing value.

perf stat -C 0 -e \
'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
sleep 1

Performance counter stats for 'CPU(s) 0':

     9,105,403,170      hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/

       0.000425751 seconds time elapsed

In the typical usage of 'perf stat' this counter value is not as useful
as the _change_ in the counter value over the duration of the application.

Have h_24x7_event_init() set the event's prev_count to the raw value of
the 24x7 counter at the time of initialization. When the application
terminates, hv_24x7_event_read() will compute the change in value and
report to the perf tool. Similarly, for the transaction interface, clear
the event count to 0 at the beginning of the transaction.

perf stat -C 0 -e \
'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
sleep 1

Performance counter stats for 'CPU(s) 0':

           245,758      hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/

       1.006366383 seconds time elapsed

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/perf/hv-24x7: Fix usage with chip events.

24x7 counters can belong to different domains (core, chip, virtual CPU
etc). For events in the 'chip' domain, sysfs entry currently looks like:

$ cd /sys/bus/event_source/devices/hv_24x7/events
$ cat PM_XLINK_CYCLES__PHYS_CHIP
domain=0x1,offset=0x230,core=?,lpar=0x0

where the required parameter, 'core=?' is specified with perf as:

perf stat -C 0 -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,core=1/ \
/bin/true

This is inconsistent in that 'core' is a required parameter for a chip
event.  Instead, have the the sysfs entry display 'chip=?' for chip
events:

$ cd /sys/bus/event_source/devices/hv_24x7/events
$ cat PM_XLINK_CYCLES__PHYS_CHIP
domain=0x1,offset=0x230,chip=?,lpar=0x0

We also need to add a 'chip' entry in the sysfs format directory:

$ ls /sys/bus/event_source/devices/hv_24x7/format
chip  core  domain  lpar  offset  vcpu
^^^^
(new)

so the perf tool can automatically check usage and format the chip
parameter correctly:

$ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/ \
/bin/true
Required parameter 'chip' not specified
invalid or unsupported event: 'hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/'

$ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/ \
/bin/true
hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/: 0 6628908 6628908

Performance counter stats for 'CPU(s) 0':

         0      hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/

    0.006606970 seconds time elapsed

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/perf: Export Power8 generic and cache events to sysfs

Power8 supports a large number of events in each susbystem so when a
user runs:

perf stat -e branch-instructions sleep 1
perf stat -e L1-dcache-loads sleep 1

it is not clear as to which PMU events were monitored.

Export the generic hardware and cache perf events for Power8 to sysfs,
so users can precisely determine the PMU event monitored by the generic
event.

Eg:
cat /sys/bus/event_source/devices/cpu/events/branch-instructions
event=0x10068

$ cat /sys/bus/event_source/devices/cpu/events/L1-dcache-loads
event=0x100ee

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/perf: Remove PME_ prefix for power7 events

We used the PME_ prefix earlier to avoid some macro/variable name
collisions. We have since changed the way we define/use the event
macros so we no longer need the prefix.

By dropping the prefix, we keep the the event macros consistent with
their official names.

Reported-by: Michael Ellerman <ellerman@au1.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/p5040: Add device node for RAID Engine

add the missing RAID Engine device node for p5040.
otherwise, the device can not be detected.

Signed-off-by: Xuelin Shi <xuelin.shi@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc: optimise csum_partial() call when len is constant

csum_partial is often called for small fixed length packets
for which it is suboptimal to use the generic csum_partial()
function.

For instance, in my configuration, I got:
* One place calling it with constant len 4
* Seven places calling it with constant len 8
* Three places calling it with constant len 14
* One place calling it with constant len 20
* One place calling it with constant len 24
* One place calling it with constant len 32

This patch renames csum_partial() to __csum_partial() and
implements csum_partial() as a wrapper inline function which
* uses csum_add() for small 16bits multiple constant length
* uses ip_fast_csum() for other 32bits multiple constant
* uses __csum_partial() in all other cases

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/fsl-lbc: Modify suspend/resume entry sequence

Modify platform driver suspend/resume to syscore
suspend/resume. This is because p1022ds needs to use
localbus when entering the PCIE resume.

Signed-off-by: Raghav Dogra <raghav.dogra@nxp.com>
[scottwood: dropped makefile churn]
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/8xx: CONFIG_DEBUG_PAGEALLOC requires ITLBmiss for kernel addresses

When CONFIG_DEBUG_PAGEALLOC is activated, the initial TLB mapping gets
flushed to track accesses to wrong areas. Therefore, kernel addresses
will also generate ITLB misses.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/885: set SDCR to 0x40

The MPC885 reference manual says that SDCR shall have value 0x40, but
most exemples set SDCR to 0x1
With 0x1 in SDCR, we observe TX underruns on SCC when using it in
QMC mode.
According the NXP technical support, this is a copy/paste error from
MPC860 reference manual, 0x40 being the only value supported
by the MPC885 HW.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/86xx: disable IDE subsystem in mpc8610_hpcd_defconfig

This patch disables deprecated IDE subsystem in mpc8610_hpcd_defconfig
(no IDE host drivers are selected in this config so there is no valid
reason to enable IDE subsystem itself).

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/85xx: disable IDE subsystem in stx_gp3_defconfig

This patch disables deprecated IDE subsystem in stx_gp3_defconfig
(no IDE host drivers are selected in this config so there is no valid
reason to enable IDE subsystem itself).

Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/85xx: disable IDE subsystem in ksi8560_defconfig

This patch disables deprecated IDE subsystem in ksi8560_defconfig
(no IDE host drivers are selected in this config so there is no valid
reason to enable IDE subsystem itself).

Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/83xx: disable IDE subsystem in mpc834x_itx_defconfig

This patch disables deprecated IDE subsystem in mpc834x_itx_defconfig
(no IDE host drivers are selected in this config so there is no valid
reason to enable IDE subsystem itself).

Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Scott Wood <oss@buserror.net>

qe: Use GFP_ATOMIC while spin_lock_irqsave is held

cpm_muram_alloc_common is called twice and both the times
spin_lock_irqsave is held.
Using GFP_KERNEL can sleep in spin_lock_irqsave context and cause
deadlock

Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>

qe: Make cpm_muram_alloc_common static

as cpm_muram_alloc_common is used only in this file,
making it static

Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>

qe/ic: fix a buffer overflow error and add check elsewhere

127 is the theoretical up boundary of QEIC number,
in fact there only be 44 qe_ic_info now.
add check to overflow for qe_ic_info

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>

cxl: Ignore probes for virtual afu pci devices

Add a check at the beginning of cxl_probe function to ignore virtual pci
devices created for each afu registered. This fixes the the errors
messages logged about missing CXL vsec, when cxl probe is unable to
find necessary vsec entries in device pci config space. The error
message logged are of the form :

cxl-pci 0004:00:00.0: ABORTING: CXL VSEC not found!
cxl-pci 0004:00:00.0: cxl_init_adapter failed: -19

Cc: Ian Munsie <imunsie@au1.ibm.com>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: fbarrat@linux.vnet.ibm.com
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Remove cxl_get_phys_dev() kernel API

The cxl_get_phys_dev() API returns a struct device pointer which could
belong to either a struct pci_dev (bare-metal) or platform_device
(powerVM). To avoid potential problems in drivers, remove that API. It
was introduced to allow drivers to read the VPD of the adapter, but
the cxl driver now provides the cxl_pci_read_adapter_vpd() API for
that purpose.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxlflash: Use new cxl_pci_read_adapter_vpd() API

To read the adapter VPD, drivers can't rely on pci config APIs, as it
wouldn't work on powerVM. cxl introduced a new kernel API especially
for this, so start using it.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Add tracepoints around the cxl hcall

To ease debugging, add a few tracepoints around the cxl hcalls.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Adapter failure handling

Check the AFU state whenever an API is called. The hypervisor may
issue a reset of the adapter when it detects a fault. When it happens,
it launches an error recovery which will either move the AFU to a
permanent failure state, or in the disabled state.
If the AFU is found to be disabled, detach all existing contexts from
it before issuing a AFU reset to re-enable it.

Before detaching contexts, notify any kernel driver through the EEH
callbacks of the AFU pci device.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Support the cxl kernel API from a guest

Like on bare-metal, the cxl driver creates a virtual PHB and a pci
device for the AFU. The configuration space of the device is mapped to
the configuration record of the AFU.

Reuse the code defined in afu_cr_read8|16|32() when reading the
configuration space of the AFU device.

Even though the (virtual) AFU device is a pci device, the adapter is
not. So a driver using the cxl kernel API cannot read the VPD of the
adapter through the usual PCI interface. Therefore, we add a call to
the cxl kernel API:
ssize_t cxl_read_adapter_vpd(struct pci_dev *dev, void *buf, size_t count);

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Parse device tree and create cxl device(s) at boot

Add new entry point to scan the device tree at boot in a guest,
looking for cxl devices.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Support to flash a new image on the adapter from a guest

The new flash.c file contains the logic to flash a new image on the
adapter, through a hcall. It is an iterative process, with chunks of
data of 1M at a time. There are also 2 phases: write and verify. The
flash operation itself is driven from a user-land tool.
Once flashing is successful, an rtas call is made to update the device
tree with the new properties values for the adapter and the AFU(s)

Add a new char device for the adapter, so that the flash tool can
access the card, even if there is no valid AFU on it.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: sysfs support for guests

Filter out a few adapter parameters which don't make sense in a guest.
Document the changes.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Add guest-specific code

The new of.c file contains code to parse the device tree to find out
about cxl adapters and AFUs.

guest.c implements the guest-specific callbacks for the backend API.

The process element ID is not known until the context is attached, so
we have to separate the context ID assigned by the cxl driver from the
process element ID visible to the user applications. In bare-metal,
the 2 IDs match.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
[mpe: Fix SMP=n build, fix PSERIES=n build, minor whitespace fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Separate bare-metal fields in adapter and AFU data structures

Introduce sub-structures containing the bare-metal specific fields in
the structures describing the adapter (struct cxl) and AFU (struct
cxl_afu).
Update all their references.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: New hcalls to support cxl adapters

The hypervisor calls provide an interface with a coherent platform
facility and function. It matches version 0.16 of the 'PAPR changes'
document.

The following hcalls are supported:
H_ATTACH_CA_PROCESS    Attach a process element to a coherent platform
                       function.
H_DETACH_CA_PROCESS    Detach a process element from a coherent
                       platform function.
H_CONTROL_CA_FUNCTION  Allow the partition to manipulate or query
                       certain coherent platform function behaviors.
H_COLLECT_CA_INT_INFO  Collect interrupt info about a coherent.
                       platform function after an interrupt occurred
H_CONTROL_CA_FAULTS    Control the operation of a coherent platform
                       function after a fault occurs.
H_DOWNLOAD_CA_FACILITY Support for downloading a base adapter image to
                       the coherent platform facility, and for
                       validating the entire image after the download.
H_CONTROL_CA_FACILITY  Allow the partition to manipulate or query
                       certain coherent platform facility behaviors.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: New possible return value from hcall

The hcalls introduced for cxl use a possible new value:
H_STATE (invalid state).

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: IRQ allocation for guests

The PSL interrupt cannot be multiplexed in a guest, as it is not
supported by the hypervisor. So an interrupt will be allocated
for it for each context. It will still be the first interrupt found in
the first interrupt range, but is treated almost like any other AFU
interrupt when creating/deleting the context. Only the handler is
different. Rework the code so that the range 0 is treated like the
other ranges.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Update cxl_irq() prototype

The context parameter when calling cxl_irq() should be strongly typed.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Isolate a few bare-metal-specific calls

A few functions are mostly common between bare-metal and guest and
just need minor tuning. To avoid crowding the backend API, introduce a
few 'if' based on the CPU being in HV mode.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Rename some bare-metal specific functions

Rename a few functions, changing the 'cxl_' prefix to either
'cxl_pci_' or 'cxl_native_', to make clear that the implementation is
bare-metal specific.

Those functions will have an equivalent implementation for a guest in
a later patch.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Introduce implementation-specific API

The backend API (in cxl.h) lists some low-level functions whose
implementation is different on bare-metal and in a guest. Each
environment implements its own functions, and the common code uses
them through function pointers, defined in cxl_backend_ops

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Define process problem state area at attach time only

CXL kernel API was defining the process problem state area during
context initialization, making it possible to map the problem state
area before attaching the context. This won't work on a powerVM
guest. So force the logical behavior, like in userspace: attach first,
then map the problem state area.
Remove calls to cxl_assign_psn_space during init. The function is
already called on the attach paths.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Move bare-metal specific code to specialized files

Move a few functions around to better separate code specific to
bare-metal environment from code which will be commonly used between
guest and bare-metal.

Code specific to bare-metal is meant to be in native.c or pci.c
only. It's basically anything which touches the card p1 registers,
some p2 registers not needed from a guest and the PCI interface.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Move common code away from bare-metal-specific files

Move around some functions which will be accessed from the bare-metal
and guest environments.
Code in native.c and pci.c is meant to be bare-metal specific.
Other files contain code which may be shared with guests.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: eeh_pci_enable(): fix checking of post-request state

In eeh_pci_enable(), after making the request to set the new options, we
call eeh_ops->wait_state() to check that the request finished successfully.

At the moment, if eeh_ops->wait_state() returns 0, we return 0 without
checking that it reflects the expected outcome. This can lead to callers
further up the chain incorrectly assuming the slot has been successfully
unfrozen and continuing to attempt recovery.

On powernv, this will occur if pnv_eeh_get_pe_state() or
pnv_eeh_get_phb_state() return 0, which in turn occurs if the relevant OPAL
call returns OPAL_EEH_STOPPED_MMIO_DMA_FREEZE or
OPAL_EEH_PHB_ERROR respectively.

On pseries, this will occur if pseries_eeh_get_state() returns 0, which in
turn occurs if RTAS reports that the PE is in the MMIO Stopped and DMA
Stopped states.

Obviously, none of these cases represent a successful completion of a
request to thaw MMIO or DMA.

Fix the check so that a wait_state() return value of 0 won't be considered
successful for the EEH_OPT_THAW_MMIO or EEH_OPT_THAW_DMA cases.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: Remove duplicated check in eeh_dump_pe_log()

When eeh_dump_pe_log() is only called by eeh_slot_error_detail(),
we already have the check that the PE isn't in PCI config blocked
state in eeh_slot_error_detail(). So we needn't the duplicated
check in eeh_dump_pe_log().

This removes the duplicated check in eeh_dump_pe_log(). No logical
changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: Synchronize recovery in host/guest

When passing through SRIOV VFs to guest, we possibly encounter EEH
error on PF. In this case, the VF PEs are put into frozen state.
The error could be reported to guest before it's captured by the
host. That means the guest could attempt to recover errors on VFs
before host gets chance to recover errors on PFs. The VFs won't be
recovered successfully.

This enforces the recovery order for above case: the recovery on
child PE in guest is hold until the recovery on parent PE in host
is completed.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: Don't remove passed VFs

When we have partial hotplug as part of the error recovery on PF,
the VFs that are bound with vfio-pci driver will experience hotplug.
That's not allowed.

This checks if the VF PE is passed or not. If it does, we leave
the VF without removing it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: Don't propagate error to guest

When EEH error happened to the parent PE of those PEs that have
been passed through to guest, the error is propagated to guest
domain and the VFIO driver's error handlers are called. It's not
correct as the error in the host domain shouldn't be propagated
to guests and affect them.

This adds one more limitation when calling EEH error handlers.
If the PE has been passed through to guest, the error handlers
won't be called.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: powerpc/eeh: Support error recovery for VF PE

PFs are enumerated on PCI bus, while VFs are created by PF's driver.

In EEH recovery, it has two cases:
1. Device and driver is EEH aware, error handlers are called.
2. Device and driver is not EEH aware, un-plug the device and plug it again
by enumerating it.

The special thing happens on the second case. For a PF, we could use the
original pci core to enumerate the bus, while for VF we need to record the
VFs which aer un-plugged then plug it again.

Also The patch caches the VF index in pci_dn, which can be used to
calculate VF's bus, device and function number. Those information helps to
locate the VF's PCI device instance when doing hotplug during EEH recovery
if necessary.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/powernv: Support PCI config restore for VFs

After PE reset, OPAL API opal_pci_reinit() is called on all devices
contained in the PE to reinitialize them. While skiboot is not aware of
VFs, we have to implement the function in kernel to reinitialize VFs after
reset on PE for VFs.

In this patch, two functions pnv_pci_fixup_vf_mps() and
pnv_eeh_restore_vf_config() both manipulate the MPS of the VF, since for a
VF it has three cases.

1. Normal creation for a VF
   In this case, pnv_pci_fixup_vf_mps() is called to make the MPS a proper
   value compared with its parent.
2. EEH recovery without VF removed
   In this case, MPS is stored in pci_dn and pnv_eeh_restore_vf_config() is
   called to restore it and reinitialize other part.
3. EEH recovery with VF removed
   In this case, VF will be removed then re-created. Both functions are
   called. First pnv_pci_fixup_vf_mps() is called to store the proper MPS
   to pci_dn and then pnv_eeh_restore_vf_config() is called to do proper
   thing.

This introduces two functions: pnv_pci_fixup_vf_mps() to fixup the VF's
MPS to make sure it is equal to parent's and store this value in pci_dn
for future use. pnv_eeh_restore_vf_config() to re-initialize on VF by
restoring MPS, disabling completion timeout, enabling SERR, etc.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/powernv: Support EEH reset for VF PE

PEs for VFs don't have primary bus. So they have to have their own reset
backend, which is used during EEH recovery. The patch implements the reset
backend for VF's PE by issuing FLR or AF FLR to the VFs, which are contained
in the PE.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: Create PE for VFs

This creates PEs for VFs in the weak function pcibios_bus_add_device().
Those PEs for VFs are identified with newly introduced flag EEH_PE_VF
so that we treat them differently during EEH recovery.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: EEH device for VF

VFs and their corresponding pdn are created and released dynamically
when their PF's SRIOV capability is enabled and disabled. This creates
and releases EEH devices for VFs when creating and releasing their pdn
instances, which means EEH devices and pdn instances have same life
cycle. Also, VF's EEH device is identified by (struct eeh_dev::physfn).

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: Cache normal BARs, not windows or IOV BARs

This restricts the EEH address cache to use only the first 7 BARs. This
makes __eeh_addr_cache_insert_dev() ignore PCI bridge window and IOV BARs.
As the result of this change, eeh_addr_cache_get_dev() will return VFs from
VF's resource addresses instead of parent PFs.

This also removes PCI bridge check as we limit __eeh_addr_cache_insert_dev()
to 7 BARs and this effectively excludes PCI bridges from being cached.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/pci: Remove VFs prior to PF

As commit ac205b7bb72f ("PCI: make sriov work with hotplug remove")
indicates, VFs which is on the same PCI bus as their PF, should be
removed before the PF. Otherwise, we might run into kernel crash
at PCI unplugging time.

This applies the above pattern to powerpc PCI hotplug path.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

PCI: Add pcibios_bus_add_device() weak function

This adds weak function pcibios_bus_add_device() for arch dependent
code could do proper setup. For example, powerpc could setup EEH
related resources for SRIOV VFs.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

PCI/IOV: Rename and export virtfn_{add, remove}

During EEH recovery, hotplug is applied to the devices which don't
have drivers or their drivers don't support EEH. However, the hotplug,
which was implemented based on PCI bus, can't be applied to VF directly.
Instead, we unplug and plug individual PCI devices (VFs).

This renames virtn_{add,remove}() and exports them so they can be used
in PCI hotplug during EEH recovery.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: Reworked eeh_pe_bus_get()

The original implementation is ugly: unnecessary if statements and
"out" tag. This reworks the function to avoid above weaknesses. No
functional changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel

Firstly we add logic to Kconfig to allow a user to choose if they want
mprofile-kernel. This has to be user-selectable because only some
current toolchains support it. If we enabled it unconditionally we would
prevent some users from building the kernel entirely.

Arguably it would be nice if we could detect if mprofile-kernel was
available, and use it then. However that would violate the principle of
least surprise because a user having choosen options such as live
patching, would then see them quietly disabled at build time.

We also make the user selectable option negative, ie. it disables when
selected, so that allyesconfig continues to build on old toolchains.

Once we've decided we do want to use mprofile-kernel, we then add a
script which checks it actually works. That is because there are
versions of gcc that accept the flag but don't generate correct code.

Due to the way kconfig works, we can't error out when we detect a
non-working toolchain. If we did a user would never be able to modify
their config and run oldconfig - because the check would block oldconfig
from running. Instead we emit a warning and add a bogus flag to CFLAGS
so that the build will fail.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI

The gcc switch -mprofile-kernel defines a new ABI for calling _mcount()
very early in the function with minimal overhead.

Although mprofile-kernel has been available since GCC 3.4, there were
bugs which were only fixed recently. Currently it is known to work in
GCC 4.9, 5 and 6.

Additionally there are two possible code sequences generated by the
flag, the first uses mflr/std/bl and the second is optimised to omit the
std. Currently only gcc 6 has the optimised sequence. This patch
supports both sequences.

Initial work started by Vojtech Pavlik, used with permission.

Key changes:
- rework _mcount() to work for both the old and new ABIs.
- implement new versions of ftrace_caller() and ftrace_graph_caller()
   which deal with the new ABI.
- updates to __ftrace_make_nop() to recognise the new mcount calling
   sequence.
- updates to __ftrace_make_call() to recognise the nop'ed sequence.
- implement ftrace_modify_call().
- updates to the module loader to surpress the toc save in the module
   stub when calling mcount with the new ABI.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/ftrace: Use $(CC_FLAGS_FTRACE) when disabling ftrace

Rather than open-coding -pg whereever we want to disable ftrace, use the
existing $(CC_FLAGS_FTRACE) variable.

This has the advantage that it will work in future when we use a
different set of flags to enable ftrace.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/ftrace: Use generic ftrace_modify_all_code()

Convert powerpc's arch_ftrace_update_code() from its own version to use
the generic default functionality (without stop_machine -- our
instructions are properly aligned and the replacements atomic).

With this we gain error checking and the much-needed function_trace_op
handling.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/module: Create a special stub for ftrace_caller()

In order to support the new -mprofile-kernel ABI, we need to be able to
call from the module back to ftrace_caller() (in the kernel) without
using the module's r2. That is because the function in this module which
is calling ftrace_caller() may not have setup r2, if it doesn't
otherwise need it (ie. it accesses no globals).

To make that work we add a new stub which is used for calling
ftrace_caller(), which uses the kernel toc instead of the module toc.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/module: Mark module stubs with a magic value

When a module is loaded, calls out to the kernel go via a stub which is
generated at runtime. One of these stubs is used to call _mcount(),
which is the default target of tracing calls generated by the compiler
with -pg.

If dynamic ftrace is enabled (which it typically is), another stub is
used to call ftrace_caller(), which is the target of tracing calls when
ftrace is actually active.

ftrace then wants to disable the calls to _mcount() at module startup,
and enable/disable the calls to ftrace_caller() when enabling/disabling
tracing - all of these it does by patching the code.

As part of that code patching, the ftrace code wants to confirm that the
branch it is about to modify, is in fact a call to a module stub which
calls _mcount() or ftrace_caller().

Currently it does that by inspecting the instructions and confirming
they are what it expects. Although that works, the code to do it is
pretty intricate because it requires lots of knowledge about the exact
format of the stub.

We can make that process easier by marking the generated stubs with a
magic value, and then looking for that magic value. Altough this is not
as rigorous as the current method, I believe it is sufficient in
practice.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/module: Only try to generate the ftrace_caller() stub once

Currently we generate the module stub for ftrace_caller() at the bottom
of apply_relocate_add(). However apply_relocate_add() is potentially
called more than once per module, which means we will try to generate
the ftrace_caller() stub multiple times.

Although the current code deals with that correctly, ie. it only
generates a stub the first time, it would be clearer to only try to
generate the stub once.

Note also on first reading it may appear that we generate a different
stub for each section that requires relocation, but that is not the
case. The code in stub_for_addr() that searches for an existing stub
uses sechdrs[me->arch.stubs_section], ie. the single stub section for
this module.

A cleaner approach is to only generate the ftrace_caller() stub once,
from module_finalize(). Although the original code didn't check to see
if the stub was actually generated correctly, it seems prudent to add a
check, so do that. And an additional benefit is we can clean the ifdefs
up a little.

Finally we must propagate the const'ness of some of the pointers passed
to module_finalize(), but that is also an improvement.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Create a helper for getting the kernel toc value

Move the logic to work out the kernel toc pointer into a header. This is
a good cleanup, and also means we can use it elsewhere in future.

Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>

powerpc/fsl: Update fman dt binding with pcs-phy and tbi-phy

The FMan contains internal PHY devices used for SGMII connections
to external PHYs. When these PHYs are in use a reference is needed
for both the external PHY and the internal one. For the external
PHY phy-handle provides the reference. For the internal PHY a new
handle is required.
In dTSEC, the internal PHY is a TBI (Ten Bit Interface) PHY,
the handle used will be tbi-handle.
In mEMAC, the internal PHY is a PCS (Physical Coding Sublayer) PHY,
the handle used will be pcsphy-handle.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>

powerpc/mpc85xx: Add CPU hotplug support for E6500

Support Freescale E6500 core-based platforms, like t4240.
Support disabling/enabling individual CPU thread dynamically.

Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>

powerpc/mpc85xx: Add hotplug support on E5500 and E500MC cores

Freescale E500MC and E5500 core-based platforms, like P4080, T1040,
support disabling/enabling CPU dynamically.
This patch adds this feature on those platforms.

Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com>
[scottwood: removed unused pr_fmt]
Signed-off-by: Scott Wood <oss@buserror.net>