David S. Miller [Thu, 23 Feb 2006 22:19:28 +0000 (14:19 -0800)]
[SPARC64]: Fix TLB context allocation with SMT style shared TLBs.
The context allocation scheme we use depends upon there being a 1<-->1
mapping from cpu to physical TLB for correctness. Chips like Niagara
break this assumption.
So what we do is notify all cpus with a cross call when the context
version number changes, and if necessary this makes them allocate
a valid context for the address space they are running at the time.
Stress tested with make -j1024, make -j2048, and make -j4096 kernel
builds on a 32-strand, 8 core, T2000 with 16GB of ram.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Feb 2006 00:20:11 +0000 (16:20 -0800)]
[SPARC64]: Fix %tstate ASI handling in start_thread{,32}()
Niagara helps us find a ancient bug in the sparc64 port :-)
The ASI_* values are plain constant defines, thus signed 32-bit
on sparc64. To put shift this into the regs->tstate value we were
doing or'ing "(ASI_PNF << 24)" into there.
ASI_PNF is 0x82 and shifted left by 24 makes that topmost bit the
sign bit in a 32-bit value. This would get sign extended to 64-bits
and thus corrupt the top-half of the reg->tstate value.
This never caused problems in pre-Niagara cpus because the only thing
up there were the condition code values. But Niagara has the global
register level field, and this all 1's value is illegal there so
Niagara gives an illegal instruction trap due to this bug.
I'm pretty sure this bug is about as old as the sparc64 port itself.
This also points out that we weren't setting ASI_PNF for 32-bit tasks.
We should, so fix that while we're here.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Feb 2006 00:15:45 +0000 (16:15 -0800)]
[SPARC64]: Drop %gl to 0 before re-enabling PSTATE_IE in rtrap
If we take a window fault, on SUN4V set %gl to zero before we
turn PSTATE_IE back on in %pstate. Otherwise if we take an
interrupt we'll end up with corrupt register state.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 22 Feb 2006 06:31:11 +0000 (22:31 -0800)]
[SPARC64]: Create a seperate kernel TSB for 4MB/256MB mappings.
It can map all of the linear kernel mappings with zero TSB hash
conflicts for systems with 16GB or less ram. In such cases, on
SUN4V, once we load up this TSB the first time with all the
mappings, we never take a linear kernel mapping TLB miss ever
again, the hypervisor handles them all.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 22 Feb 2006 00:55:23 +0000 (16:55 -0800)]
[SPARC64]: Use sun4v_cpu_idle() in cpu_idle() on SUN4V.
We have to turn off the "polling nrflag" bit when we sleep
the cpu like this, so that we'll get a cross-cpu interrupt
to wake the processor up from the yield.
We also have to disable PSTATE_IE in %pstate around the yield
call and recheck need_resched() in order to avoid any races.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 21 Feb 2006 00:02:24 +0000 (16:02 -0800)]
[SPARC64]: Handle unimplemented FPU square-root on Niagara.
The math-emu code only expects unfinished fpop traps when
emulating FPU sqrt instructions on pre-Niagara chips.
On Niagara we can get unimplemented fpop, so handle that.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 19 Feb 2006 00:36:39 +0000 (16:36 -0800)]
[SPARC64]: Set %gl to 1 in kvmap_itlb_longpath on SUN4V.
Just like kvmap_dtlb_longpath we have to force the
global register level to one in order to mimick the
PSTATE_MG --> PSTATE_AG trasition done on SUN4U.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 18 Feb 2006 02:01:02 +0000 (18:01 -0800)]
[SPARC64]: More TLB/TSB handling fixes.
The SUN4V convention with non-shared TSBs is that the context
bit of the TAG is clear. So we have to choose an "invalid"
bit and initialize new TSBs appropriately. Otherwise a zero
TAG looks "valid".
Make sure, for the window fixup cases, that we use the right
global registers and that we don't potentially trample on
the live global registers in etrap/rtrap handling (%g2 and
%g6) and that we put the missing virtual address properly
in %g5.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 Feb 2006 09:29:17 +0000 (01:29 -0800)]
[SPARC64]: Get SUN4V SMP working.
The sibling cpu bringup is extremely fragile. We can only
perform the most basic calls until we take over the trap
table from the firmware/hypervisor on the new cpu.
This means no accesses to %g4, %g5, %g6 since those can't be
TLB translated without our trap handlers.
In order to achieve this:
1) Change sun4v_init_mondo_queues() so that it can operate in
several modes.
It can allocate the queues, or install them in the current
processor, or both.
The boot cpu does both in it's call early on.
Later, the boot cpu allocates the sibling cpu queue, starts
the sibling cpu, then the sibling cpu loads them in.
2) init_cur_cpu_trap() is changed to take the current_thread_info()
as an argument instead of reading %g6 directly on the current
cpu.
3) Create a trampoline stack for the sibling cpus. We do our basic
kernel calls using this stack, which is locked into the kernel
image, then go to our proper thread stack after taking over the
trap table.
4) While we are in this delicate startup state, we put 0xdeadbeef
into %g4/%g5/%g6 in order to catch accidental accesses.
5) On the final prom_set_trap_table*() call, we put &init_thread_union
into %g6. This is a hack to make prom_world(0) work. All that
wants to do is restore the %asi register using
get_thread_current_ds().
Longer term we should just do the OBP calls to set the trap table by
hand just like we do for everything else. This would avoid that silly
prom_world(0) issue, then we can remove the init_thread_union hack.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 Feb 2006 07:01:10 +0000 (23:01 -0800)]
[SPARC64]: Rewrite pci_intmap_match().
The whole algorithm was wrong. What we need to do is:
1) Walk each PCI bus above this device on the path to the
PCI controller nexus, and for each:
a) If interrupt-map exists, apply it, record IRQ controller node
b) Else, swivel interrupt number using PCI_SLOT(), use PCI bus
parent OBP node as controller node
c) Walk up to "controller node" until we hit the first PCI bus
in this domain, or "controller node" is the PCI controller
OBP node
2) If we walked to PCI controller OBP node, we're done.
3) Else, apply PCI controller interrupt-map to interrupt.
There is some stuff that needs to be checked out for ebus and
isa, but the PCI part is good to go.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 Feb 2006 00:23:45 +0000 (16:23 -0800)]
[SPARC64]: Fix return from trap on SUN4V.
We need to set the global register set _AND_ disable
PSTATE_IE in %pstate. The original patch sequence was
leaving PSTATE_IE enabled when returning to kernel mode,
oops.
This fixes the random register corruption being seen
on SUN4V.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 Feb 2006 05:16:42 +0000 (21:16 -0800)]
[SPARC64]: Do not write garbage into %pstate in tsb_context_switch().
For SUN4V, we were clobbering %o5 to do the hypervisor call.
This clobbers the saved %pstate value and we end up writing
garbage into that register as a result. Oops.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Feb 2006 00:42:11 +0000 (16:42 -0800)]
[SPARC64]: Restrict PCI bus scanning on SUN4V.
On the PBM's first bus number, only allow device 0, function 0, to be
poked at with PCI config space accesses.
For some reason, this single device responds to all device numbers.
Also, reduce the verbiage of the debugging log printk's for PCI cfg
space accesses in the SUN4V PCI controller driver, so that it doesn't
overwhelm the slow SUN4V hypervisor console.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Feb 2006 02:09:44 +0000 (18:09 -0800)]
[SPARC64]: More SUN4V PCI work.
Get bus range from child of PCI controller root nexus.
This is actually a hack, but the PCI-E bridge sitting
at the top of the PCI tree responds to PCI config cycles
for every device number, so best to just ignore it for now.
Preliminary PCI irq routing, needs lots of work.
Signed-off-by: David S. Miller <davem@davemloft.net>