Milton Miller [Wed, 11 May 2011 12:25:00 +0000 (12:25 +0000)]
powerpc/pseries/iommu: Cleanup ddw naming
When using a property referring to the availability of dynamic dma windows,
call it ddw_avail not ddr_avail.
dupe_ddw_if_already_created does not duplicate anything; it only finds
and reuses the windows we already created, so rename it to
find_existing_ddw. Also, it does not need the pci device node, so
remove that argument.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Wed, 11 May 2011 12:24:59 +0000 (12:24 +0000)]
powerpc/pseries/iommu: Find windows after kexec during boot
Move the discovery of windows previously set up from when the pci driver
calls set_dma_mask to an arch_initcall.
When kexecing into a kernel with dynamic dma windows allocated, we need
to find the windows early so that memory hot remove will be able to
delete the TCEs mapping the to-be-removed memory, and memory hotplug add
will map the new memory into the window. We should not wait for the
driver to be loaded and the device to be probed. The iommu init hooks
run before kmalloc is set up, so defer to an arch_initcall.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Wed, 11 May 2011 12:24:58 +0000 (12:24 +0000)]
powerpc/pseries/iommu: Remove ddw property when destroying window
If we destroy the window, we need to remove the property recording that
we set up the window. Otherwise the next kernel we kexec will be
confused.
Also we should remove the property even if we don't find the
ibm,ddw-applicable window or if one of the property sizes is unexpected;
presumably these came from a prior kernel via kexec, and we will not be
maintaining the window with respect to memory hotplug.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ira Snyder [Fri, 11 Feb 2011 13:34:30 +0000 (13:34 +0000)]
misc: Add CARMA DATA-FPGA Programmer support
This adds support for programming the data processing FPGAs on the OVRO
CARMA board. These FPGAs have a special programming sequence that
requires that we program the Freescale DMA engine, which is only
available inside the kernel.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ira Snyder [Fri, 11 Feb 2011 13:34:29 +0000 (13:34 +0000)]
misc: Add CARMA DATA-FPGA Access Driver
This driver allows userspace to access the data processing FPGAs on the
OVRO CARMA board. It has two modes of operation:
1) random access
This allows users to poke any DATA-FPGA registers by using mmap to map
the address region directly into their memory map.
2) correlation dumping
When correlating, the DATA-FPGAs have special requirements for getting
the data out of their memory before the next correlation. This nominally
happens at 64Hz (every 15.625ms). If the data is not dumped before the
next correlation, data is lost.
The data dumping driver handles buffering up to 1 second worth of
correlation data from the FPGAs. This lowers the realtime scheduling
requirements for the userspace process reading the device.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
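The buffering arithmetic above works out to 1000 ms / 64 = 15.625 ms per correlation, so one second of data is 64 dumps. The following is a minimal ring-buffer sketch of that idea in C, assuming one fixed-size slot per correlation; the names (dump_ring, ring_push, ring_pop) and the 16-byte dump size are hypothetical and not taken from the CARMA driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* From the description: 64 correlations per second. */
#define CORL_RATE_HZ	64
#define CORL_PERIOD_US	(1000000 / CORL_RATE_HZ)	/* 15625 us = 15.625 ms */
#define NUM_SLOTS	CORL_RATE_HZ			/* one second of dumps */
#define DUMP_BYTES	16				/* hypothetical dump size */

struct dump_ring {
	unsigned char slot[NUM_SLOTS][DUMP_BYTES];
	size_t head;	/* next slot the producer fills */
	size_t tail;	/* next slot the reader drains */
	size_t count;	/* number of filled slots */
	size_t dropped;	/* dumps lost because the ring was full */
};

/* Producer: runs once per correlation and must never block. */
static bool ring_push(struct dump_ring *r, const unsigned char *data)
{
	if (r->count == NUM_SLOTS) {
		r->dropped++;	/* reader fell more than one second behind */
		return false;
	}
	memcpy(r->slot[r->head], data, DUMP_BYTES);
	r->head = (r->head + 1) % NUM_SLOTS;
	r->count++;
	return true;
}

/* Consumer: the userspace reader drains at its own pace. */
static bool ring_pop(struct dump_ring *r, unsigned char *out)
{
	if (r->count == 0)
		return false;
	memcpy(out, r->slot[r->tail], DUMP_BYTES);
	r->tail = (r->tail + 1) % NUM_SLOTS;
	r->count--;
	return true;
}
```

With 64 slots the reader only has to run once per second on average instead of meeting a 15.625 ms deadline, which is the relaxation of realtime requirements the commit describes.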
Milton Miller [Tue, 10 May 2011 19:30:44 +0000 (19:30 +0000)]
powerpc: Make IRQ_NOREQUEST last to clear, first to set
When creating an irq, don't allow a concurrent driver request until
we have called map, which will likely call set_chip_and_handler to
change the irq_chip and its operations.
Similarly, when tearing down an IRQ, make sure no new uses come
along while we change the irq back to the nop chip and then reset
the descriptor to freed status.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:40 +0000 (19:30 +0000)]
powerpc: Remove virq_to_host
The only references to the irq_map[].host field are internal to
arch/powerpc/kernel/irq.c.
Signed-off-by: Milton Miller <miltonm@bga.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:36 +0000 (19:30 +0000)]
powerpc: Add virq_is_host to reduce virq_to_host usage
Some irq_host implementations are using virq_to_host to check if
they are the irq_host for a virtual irq. To allow us to make space
versus time tradeoffs, replace this usage with an assertive
virq_is_host that confirms or denies the irq is associated with the
given irq_host.
Signed-off-by: Milton Miller <miltonm@bga.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:33 +0000 (19:30 +0000)]
powerpc/axon_msi: Validate msi irq via chip_data
Instead of checking for rogue msi numbers via the irq_map host field
set the chip_data to h.host_data (which is the msic struct pointer)
at map and compare it in get_irq.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:29 +0000 (19:30 +0000)]
powerpc/spider-pic: Get pic from chip_data instead of irq_map
Building on Grant's efforts to remove the irq_map array, this patch
moves spider-pic's use of virq_to_host() to use irq_data_get_chip_data
and sets the irq chip data in the map call, like most other interrupt
controllers in powerpc.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:26 +0000 (19:30 +0000)]
powerpc: Remove irq_host_ops->remap hook
It was called from irq_create_mapping if that was called for a host
and hwirq that was previously mapped, "to update the flags". But the
only implementation was in beat_interrupt and all it did was repeat a
hypervisor call without error checking that was performed with error
checking at the beginning of the map hook. In addition, the comment on
the beat remap hook says it will only be called once for a given mapping,
which would apply to map, not remap.
All flags should be known by the time the match hook is called, before
we call the map hook. Removing this mostly unused hook will simplify
the requirements of the irq_domain concept.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:22 +0000 (19:30 +0000)]
powerpc/psurge: Create a irq_host for secondary cpus
Create a dummy irq_host using the generic dummy irq chip for the secondary
cpus to use. Create a direct irq mapping for the ipi and register the
ipi action handler against it. If for some unlikely reason part of this
fails then don't detect the secondary cpus.
This removes another instance of NO_IRQ_IGNORE, records the ipi stats
for the secondary cpus, and runs the ipi on the interrupt stack.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:18 +0000 (19:30 +0000)]
powerpc/mpc62xx_pic: Fix get_irq handling of NO_IRQ
If none of the irq category bits were set, mpc52xx_get_irq() would pass
NO_IRQ_IGNORE (-1) to irq_linear_revmap, which does an unsigned compare
and declares the interrupt above the linear map range. It then punts
to irq_find_mapping, which performs a linear search of all irqs,
which will likely miss and only then return NO_IRQ.
If no status bit is set, then we should return NO_IRQ directly.
The interrupt should not be suppressed from spurious counting; in fact,
that is the definition of spurious.
Signed-off-by: Milton Miller <miltonm@bga.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
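The unsigned-compare trap described above can be reproduced in a few lines of C. This is a simplified sketch, not the kernel code: linear_revmap and find_mapping_slow are hypothetical stand-ins for irq_linear_revmap and irq_find_mapping, the map contents are made up, and NO_IRQ is taken as 0 as on powerpc at the time.

```c
/* NO_IRQ is 0 on powerpc; NO_IRQ_IGNORE is the -1 sentinel from the text. */
#define NO_IRQ		0
#define NO_IRQ_IGNORE	(-1)

/* Toy linear reverse map (hwirq -> virq); sizes and values are made up. */
static const unsigned int revmap[8] = { 0, 16, 17, 18, 0, 0, 0, 0 };
static const unsigned int revmap_size = 8;
static unsigned int slow_searches;	/* counts fallbacks to the slow path */

/* Stand-in for the linear search over all irqs; here it always misses. */
static unsigned int find_mapping_slow(unsigned int hwirq)
{
	(void)hwirq;
	slow_searches++;
	return NO_IRQ;
}

/* Unsigned compare: -1 arrives as UINT_MAX and falls through to the slow path. */
static unsigned int linear_revmap(unsigned int hwirq)
{
	if (hwirq >= revmap_size)
		return find_mapping_slow(hwirq);
	return revmap[hwirq];
}

/* Fixed shape of get_irq: nothing pending is spurious, so return NO_IRQ directly. */
static unsigned int get_irq(int pending, unsigned int hwirq)
{
	if (!pending)
		return NO_IRQ;
	return linear_revmap(hwirq);
}
```

Passing NO_IRQ_IGNORE costs a full search only to end at NO_IRQ anyway, while returning NO_IRQ up front skips the lookup entirely.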
Milton Miller [Tue, 10 May 2011 19:30:15 +0000 (19:30 +0000)]
powerpc/mpc5121_ads_cpld: Remove use of NO_IRQ_IGNORE
As NO_IRQ_IGNORE is only used between the static function cpld_pic_get_irq
and its caller cpld_pic_cascade, and cpld_pic_cascade only uses it to
suppress calling handle_generic_irq, we can change these uses to NO_IRQ
and remove the extra tests and pathlength in cpld_pic_cascade.
Signed-off-by: Milton Miller <miltonm@bga.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:11 +0000 (19:30 +0000)]
powerpc/fsl_msi: Use chip_data not handler_data
handler_data should be reserved for flow handlers on the dependent
irq, not consumed by the parent irq code that is part of the irq_chip
code. The msi_data pointer was already set in msidesc->irqhost->hostdata
and being copied to irq_data->chipdata in the msidesc->irqhost->map()
method called via create_irq_mapping, so we can obtain the pointer
from there and free the instance in teardown_msi_irqs.
Also remove the unnecessary cast of irq_get_handler_data in the
cascade handler, which is the demux flow handler of the parent
msi interrupt. (This is the expected usage for handler_data).
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:07 +0000 (19:30 +0000)]
powerpc/fsl_msi: Don't abuse platform_data for driver_data
The msi platform device driver was abusing dev.platform_data for its
platform_driver_data. Use the correct pointer for storage.
Platform_data is supposed to be for platforms to communicate to drivers
parameters that are not otherwise discoverable. Its lifetime matches
the platform_device, not the platform device driver. It is generally
not needed for drivers that only support systems with device trees.
Signed-off-by: Milton Miller <miltonm@bga.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:30:04 +0000 (19:30 +0000)]
powerpc: Remove i8259 irq_host_ops->unmap
It was never called because the host is always IRQ_HOST_MAP_LEGACY.
And what it purported to do was mask the interrupt (which will already
have happened if we shut down the interrupt), then synchronize_irq and
clear the chip pointer, both of which will have been done by the
caller were we to call unmap on a legacy irq.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:57 +0000 (19:29 +0000)]
powerpc: Return early if irq_host lookup type is wrong
If for some reason the code incorrectly calls the wrong function to
manage the revmap, not only should we warn, we should take action.
However, in the paths we expect to be taken on every delivered interrupt,
change to WARN_ON_ONCE. Use the if (WARN_ON(x)) format to get the
unlikely for free.
Signed-off-by: Milton Miller <miltonm@bga.com> Reviewed-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
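The if (WARN_ON(x)) pattern works because the kernel macro evaluates to the tested condition wrapped in unlikely(). The user-space sketch below imitates that with GNU C statement expressions; the macros print a line instead of a backtrace, and struct host, revmap_insert and revmap_lookup are hypothetical simplifications of the irq_host revmap code.

```c
#include <stdio.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Stand-ins for the kernel macros: both evaluate to the condition. */
#define WARN_ON(cond) ({					\
	int w_ret = !!(cond);					\
	if (unlikely(w_ret))					\
		fprintf(stderr, "WARNING: %s\n", #cond);	\
	unlikely(w_ret);					\
})

#define WARN_ON_ONCE(cond) ({					\
	static int w_once;	/* one flag per call site */	\
	int w_ret = !!(cond);					\
	if (unlikely(w_ret) && !w_once) {			\
		w_once = 1;					\
		fprintf(stderr, "WARNING: %s\n", #cond);	\
	}							\
	unlikely(w_ret);					\
})

/* Hypothetical, trimmed-down irq host with only the revmap fields. */
enum revmap_type { REVMAP_LEGACY, REVMAP_LINEAR, REVMAP_TREE };
struct host {
	enum revmap_type type;
	unsigned int size;
	unsigned int *map;
};

/* Setup path: rare, so a plain WARN_ON is fine; bail out instead of corrupting. */
static int revmap_insert(struct host *h, unsigned int hwirq, unsigned int virq)
{
	if (WARN_ON(h->type != REVMAP_LINEAR || hwirq >= h->size))
		return -1;
	h->map[hwirq] = virq;
	return 0;
}

/* Per-interrupt path: warn only once, but still refuse to use the wrong map. */
static unsigned int revmap_lookup(const struct host *h, unsigned int hwirq)
{
	if (WARN_ON_ONCE(h->type != REVMAP_LINEAR))
		return 0;
	return hwirq < h->size ? h->map[hwirq] : 0;
}
```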
Milton Miller [Tue, 10 May 2011 19:29:53 +0000 (19:29 +0000)]
powerpc: Radix trees are available before init_IRQ
Since the generic irq code uses a radix tree for sparse interrupts,
the initcall ordering has been changed to initialize radix trees before
irqs. We no longer need to defer creating revmap radix trees to the
arch_initcall irq_late_init.
Also, the kmem caches are allocated so we don't need to use
zalloc_maybe_bootmem.
Signed-off-by: Milton Miller <miltonm@bga.com> Reviewed-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:46 +0000 (19:29 +0000)]
powerpc: Use bytes instead of bitops in smp ipi multiplexing
Since there are only 4 messages, we can replace the atomic bit set
(which uses atomic load reserve and store conditional sequence) with
byte stores to separate bytes. We still have to perform a load
reserve and store conditional sequence to avoid losing messages on
reception, but we can do that with a single call to xchg.
The do {} while and __BIG_ENDIAN specific mask testing was chosen by
looking at the generated asm code. On gcc-4.4, the bit masking becomes
a simple bit mask and test of the register returned from xchg without
storing and loading the value to the stack like attempts with a union
of bytes and an int (or worse, loading single bit constants from the
constant pool into non-volatile registers that had to be preserved on
the stack). The do {} while avoids an unconditional branch to the
end of the loop to test the entry / repeat condition of a while loop
and instead optimises for the expected single iteration of the loop.
We have a full mb() at the beginning to cover ordering between send,
ipi, and receive so we can use xchg_local and forgo the further
acquire and release barriers of xchg.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
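The byte-per-message scheme can be sketched with GCC __atomic builtins standing in for the powerpc primitives: the sender does one byte store, and the receiver swaps out the whole 32-bit word and tests one bit per byte, with the bit position depending on endianness as the commit notes. This is an illustration under those assumptions (relaxed exchange plus a full fence approximating mb() and xchg_local), not the kernel's smp.c; mixing byte and word atomic access to one location is outside what ISO C guarantees.

```c
#include <stdint.h>

#define NR_MSGS	4	/* ipi many, ipi single, scheduler ipi, debugger break */

/* One byte per message, overlaid on a single 32-bit word per cpu. */
struct cpu_messages {
	union {
		uint8_t  msg[NR_MSGS];
		uint32_t word;
	};
};

/* Bit that storing 1 into msg[n] sets in the word read back by the swap. */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define MSG_BIT(n)	(1u << (24 - 8 * (n)))
#else
#define MSG_BIT(n)	(1u << (8 * (n)))
#endif

/* Sender: a plain byte store, no load-reserve/store-conditional loop. */
static void send_msg(struct cpu_messages *m, int n)
{
	__atomic_store_n(&m->msg[n], 1, __ATOMIC_RELAXED);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);	/* order before raising the ipi */
}

/* Receiver: one full barrier, then swap the whole word out at once. */
static unsigned int demux(struct cpu_messages *m)
{
	unsigned int seen = 0;	/* bit n set => message n was handled */
	uint32_t all;

	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	do {
		all = __atomic_exchange_n(&m->word, 0, __ATOMIC_RELAXED);
		for (int n = 0; n < NR_MSGS; n++)
			if (all & MSG_BIT(n))
				seen |= 1u << n;	/* a real demux calls the handler here */
	} while (__atomic_load_n(&m->word, __ATOMIC_RELAXED));	/* usually one pass */
	return seen;
}
```

The do {} while shape mirrors the commit's reasoning: the common case is a single pass, so the loop condition is only tested after the first swap.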
Milton Miller [Tue, 10 May 2011 19:29:42 +0000 (19:29 +0000)]
powerpc: Add kconfig for muxed smp ipi support
Compile the new smp ipi mux and demux code only if a platform
will make use of it. The new config is selected as required.
The new cause_ipi smp op is only available conditionally to point out
configs where the select is required; this makes setting the op an
immediate fail instead of a deferred unresolved symbol at link.
This also creates a new config for power surge powermac upgrade support
that can be disabled in expert mode but is default on.
I also removed the depends / default y on CONFIG_XICS since it is selected
by PSERIES.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:39 +0000 (19:29 +0000)]
powerpc: Consolidate ipi message mux and demux
Consolidate the mux and demux of ipi messages into smp.c and call
a new smp_ops callback to actually trigger the ipi.
The powerpc architecture code is optimised for having 4 distinct
ipi triggers, which are mapped to 4 distinct messages (ipi many, ipi
single, scheduler ipi, and enter debugger). However, several interrupt
controllers only provide a single software triggered interrupt that
can be delivered to each cpu. To resolve this limitation, each smp_ops
implementation created a per-cpu variable that is manipulated with atomic
bitops. Since these lines will be contended, they are optimally marked as
shared_aligned and take a full cache line for each cpu. Distro kernels
may have 2 or 3 of these in their config, each taking per-cpu space
even though at most one will be in use.
This consolidation removes smp_message_recv and replaces the single-call
action cases with direct calls from the common message recognition loop.
The complicated debugger ipi case with its muxed crash handling code is
moved to debug_ipi_action which is now called from the demux code (instead
of the multi-message action calling smp_message_recv).
I put a call to reschedule_action to increase the likelihood of correctly
merging the anticipated scheduler_ipi() hook coming from the scheduler
tree; that single required call can be inlined later.
The actual message decode is a copy of the old pseries xics code with its
memory barriers and cache line spacing, augmented with a per-cpu unsigned
long based on the book-e doorbell code. The optional data is set via a
callback from the implementation and is passed to the new cause-ipi hook
along with the logical cpu number. While currently only the doorbell
implementation uses this data, it should be almost zero cost to retrieve and
pass it -- it adds a single register load for the argument from the same
cache line to which we just completed a store and the register is dead
on return from the call. I extended the data element from unsigned int
to unsigned long in case some other code wanted to associate a pointer.
The doorbell check_self is replaced by a call to smp_muxed_ipi_resend,
conditioned on the CPU_DBELL feature. The ifdef guard could be relaxed
to CONFIG_SMP but I left it with BOOKE for now.
Also, the doorbell interrupt vector for book-e was not calling irq_enter
and irq_exit, which throws off cpu accounting and causes code to not
realize it is running in interrupt context. Add the missing calls.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
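The split described above, common mux code plus a per-controller cause_ipi hook that receives the logical cpu and an optional per-cpu data word, can be sketched as follows. All names here (cpu_mailbox, ipi_ops, muxed_ipi_set_data, muxed_ipi_message_pass, demo_cause_ipi) are hypothetical illustrations, and demo_cause_ipi only records its arguments where a real controller would write a doorbell or interrupt register.

```c
#include <stdint.h>

#define NR_CPUS	4

/* Shared per-cpu mailbox: message bytes plus an opaque per-cpu cookie. */
struct cpu_mailbox {
	uint8_t messages[4];
	unsigned long data;	/* e.g. a doorbell tag; unused by most controllers */
};

static struct cpu_mailbox mailbox[NR_CPUS];

/* The only controller-specific piece: how to actually raise the ipi. */
struct ipi_ops {
	void (*cause_ipi)(int cpu, unsigned long data);
};

/* Called by the controller at setup time to record its per-cpu cookie. */
static void muxed_ipi_set_data(int cpu, unsigned long data)
{
	mailbox[cpu].data = data;
}

/* Common mux: mark the message, then hand cpu and cookie to the controller. */
static void muxed_ipi_message_pass(const struct ipi_ops *ops, int cpu, int msg)
{
	mailbox[cpu].messages[msg] = 1;
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	ops->cause_ipi(cpu, mailbox[cpu].data);
}

/* Example backend that only records what it was asked to trigger. */
static int last_cpu = -1;
static unsigned long last_data;

static void demo_cause_ipi(int cpu, unsigned long data)
{
	last_cpu = cpu;
	last_data = data;
}

static const struct ipi_ops demo_ops = { .cause_ipi = demo_cause_ipi };
```

Because the cookie sits in the same per-cpu structure as the message bytes, passing it costs one extra load from a cache line the sender has just written, which matches the cost argument in the commit.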
Milton Miller [Tue, 10 May 2011 19:29:28 +0000 (19:29 +0000)]
powerpc: Remove stubbed beat smp support
I have no idea if the beat hypervisor supports multiple cpus in
a partition, but the code has not been touched since these stubs
were added in February of 2007 except to move them in April of 2008.
These are stubs: start_cpu always returns fail (which is dropped),
the message passing and receiving are empty functions, and the top
of file comment says "Incomplete".
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:24 +0000 (19:29 +0000)]
powerpc: Remove alloc_maybe_bootmem for zalloc version
Replace all remaining callers of alloc_maybe_bootmem with
zalloc_maybe_bootmem. The callsite in pci_dn is followed with a
memset to clear the memory, and not zeroing at the other callsites
in the celleb fake pci code could lead to following uninitialized
memory as pointers or even freeing said pointers on error paths.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:17 +0000 (19:29 +0000)]
powerpc/mpic: Simplify ipi cpu mask handling
Now that MSG_ALL and MSG_ALL_BUT_SELF have been eliminated,
smp_mpic_message_pass no longer needs to look up the cpumask just to
have mpic_send_ipi extract part of it and recode it in a NR_CPUS loop
by mpic_physmask.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:06 +0000 (19:29 +0000)]
powerpc: Remove call sites of MSG_ALL_BUT_SELF
The only user of MSG_ALL_BUT_SELF in the whole kernel tree is powerpc,
and it only uses it to start the debugger. Both debuggers always call
smp_send_debugger_break with MSG_ALL_BUT_SELF, and only mpic can do
anything more optimal than a loop over all online cpus, but all message
passing implementations have to code for this special delivery target.
Convert smp_send_debugger_break to take void and loop calling the smp_ops
message_pass function for each of the other cpus in the online cpumask.
Use raw_smp_processor_id() because we are either entering the debugger
or trying to start kdump, and the additional warning is not useful were
it to trigger.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
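Without the MSG_ALL_BUT_SELF target, the debugger break becomes a simple loop over the online mask that skips the calling cpu. The sketch below models the online cpumask as a 32-bit word and the smp_ops message_pass hook as a recording function; MSG_DEBUGGER_BREAK's value, the bitmask representation, and the function names are assumptions for illustration.

```c
#include <stdint.h>

#define NR_CPUS			8
#define MSG_DEBUGGER_BREAK	3	/* assumed message number */

static uint32_t online_mask;	/* bit n set => logical cpu n is online */
static uint32_t break_sent;	/* records which cpus were messaged (demo only) */

/* Stand-in for the per-cpu smp_ops message_pass hook. */
static void message_pass(int cpu, int msg)
{
	if (msg == MSG_DEBUGGER_BREAK)
		break_sent |= 1u << cpu;
}

/* No special delivery target: message every other online cpu in turn. */
static void send_debugger_break(int me)	/* me: the raw_smp_processor_id() value */
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((online_mask & (1u << cpu)) && cpu != me)
			message_pass(cpu, MSG_DEBUGGER_BREAK);
}
```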
Milton Miller [Tue, 10 May 2011 19:29:02 +0000 (19:29 +0000)]
powerpc/mpic: Break cpumask abstraction earlier
mpic_set_affinity is allocating and freeing a cpumask var even though
it was breaking the cpumask abstraction when passing the mask to
mpic_physmask. It also didn't have any check for allocation failure.
Break the cpumask abstraction earlier and use a simple bitwise AND of the
bits from the mask with the bits of cpu_online_mask.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:59 +0000 (19:28 +0000)]
powerpc/mpic: Limit NR_CPUS loop to 32 bit
mpic_physmask was looping NR_CPUS times over a mask that was passed as
a u32. Since mpic is architecturally limited to 32 physical cpus, clamp
the logical cpus to 32 when compiling (we could also clamp at runtime
to nr_cpu_ids).
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
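The two mpic changes above combine naturally: the affinity code ANDs the requested bits with the online bits directly, and the logical-to-physical translation loops at most min(32, NR_CPUS) times because the mask is a u32. This sketch assumes a hypothetical hard_id[] table in place of the kernel's hard cpu id lookup, with every physical id below 32.

```c
#include <stdint.h>

#define NR_CPUS		64	/* config may allow more cpus than mpic can address */
#define MPIC_MAX_CPUS	32	/* destination masks are 32 bits wide */
#define PHYSMASK_LOOP	(NR_CPUS < MPIC_MAX_CPUS ? NR_CPUS : MPIC_MAX_CPUS)

/* Hypothetical logical -> physical cpu id table (each entry must be < 32). */
static int hard_id[NR_CPUS];

/* Translate a logical u32 cpu mask to physical ids; at most 32 iterations. */
static uint32_t physmask(uint32_t cpumask)
{
	uint32_t mask = 0;

	for (int cpu = 0; cpu < PHYSMASK_LOOP; cpu++, cpumask >>= 1)
		if (cpumask & 1)
			mask |= 1u << hard_id[cpu];
	return mask;
}

/* set_affinity: a plain bitwise AND with the online bits, no temporary cpumask. */
static uint32_t affinity_dest(uint32_t requested, uint32_t online)
{
	return physmask(requested & online);
}
```

Clamping at compile time keeps the loop from shifting past bit 31 even when NR_CPUS is larger; clamping to nr_cpu_ids at runtime, as the commit mentions, would bound it further.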
Milton Miller [Tue, 10 May 2011 19:28:55 +0000 (19:28 +0000)]
powerpc: Call no-longer static setup_nr_cpu_ids instead of replicating it
c1854e00727f50f7ac99e98d26ece04c087ef785 (powerpc: Set nr_cpu_ids early
and use it to free PACAs) copied the formerly static setup_nr_cpu_ids
from init/main.c but 34db18a054c600b6f81787165669dc572fe4de25 (smp:
move smp setup functions to kernel/smp.c) moved it to kernel/smp.c
with a declaration in include/linux/smp.h, so we can call it instead of
replicating it.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:52 +0000 (19:28 +0000)]
powerpc: Use nr_cpu_ids in initial paca allocation
Now that we never mark a cpu above nr_cpu_ids as possible, we can
limit our initial paca allocation to nr_cpu_ids. We can then
clamp the number of cpus in platforms/iseries/setup.c.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:48 +0000 (19:28 +0000)]
powerpc: Respect nr_cpu_ids when calling set_cpu_possible and set_cpu_present
We should not set cpus above nr_cpu_ids to possible. While we
will trigger a warning with CONFIG_CPUMASK_DEBUG, even then the mask
initializers will set the bits beyond what the iterators check and cause
nr_cpu_ids to increase.
Respecting nr_cpu_ids during setup will allow us to use it in our initial
paca allocation. It can be reduced from NR_CPUS by the existing early param
nr_cpus=, which was added in 2b633e3fac5efada088b57d31e65401f22bcc18f (smp:
Use nr_cpus= to set nr_cpu_ids early). We already call parse_early_parms
between finding the command line and allocating the pacas.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:44 +0000 (19:28 +0000)]
powerpc/iseries: Cleanup and fix secondary startup
9cb82f2f4692293a27c578c3038518ce4477de72 (Make iSeries spin on
__secondary_hold_spinloop, like pSeries) added a load of current_set
but this load was repeated later and we don't even have the paca yet.
It also checked __secondary_hold_spinloop with a 32 bit compare instead
of a 64 bit compare.
1426d5a3bd07589534286375998c0c8c6fdc5260 (Dynamically allocate pacas)
didn't allow for pacas to be fewer than lppacas and recalculated the paca
location from the cpu id in r0 every time through the secondary loop.
Various revisions over time made the comments on conditional branches
confusing with respect to being a hold loop or forward progress.
Mostly in-order description of the changes:
Replicate the few lines of code saved by the ugly scoped ifdef CONFIG_SMP
in the secondary loop between yielding on UP and marking time with the
hypervisor on SMP. Always compile the iseries_secondary_yield loop and
use it if the cpu id is above nr_cpu_ids. Change all forward progress
paths to be forward branches to the next numerical label. Assign a
label to all loops. Move all sync instructions from the loops to the
forward progress path. Wait to load current_set until paca is set to go.
Move the iseries_secondary_smp_loop label to cover the whole spin loop.
Add HMT_MEDIUM when we make forward progress.
Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:41 +0000 (19:28 +0000)]
powerpc/kdump64: Don't reference freed memory as pacas
Starting with 1426d5a3bd07589534286375998c0c8c6fdc5260 (powerpc:
Dynamically allocate pacas) the space for pacas beyond cpu_possible
is freed, but we failed to update the loop in crash.c.
Milton Miller [Tue, 10 May 2011 19:28:37 +0000 (19:28 +0000)]
powerpc: Don't search for paca in freed memory
Starting with 1426d5a3bd07589534286375998c0c8c6fdc5260 (powerpc:
Dynamically allocate pacas) we free the memory for pacas beyond
cpu_possible, but we failed to update the loop the secondary cpus use
to find their paca. If the system has running cpu threads for which
the kernel did not allocate a paca, they will search the memory that
was freed. For instance, this could happen when the device tree for
a kdump kernel was not updated after a cpu hotplug, or when the kernel is
running with more cpus than it was configured for.
Since c1854e00727f50f7ac99e98d26ece04c087ef785 (powerpc: Set nr_cpu_ids
early and use it to free PACAs) we set nr_cpu_ids before telling the
cpus to advance, so use that to limit the search.
We can't reference nr_cpu_ids without CONFIG_SMP because it is defined
as 1 instead of a memory location, but any extra threads should be sent
to kexec_wait in that case anyway, so make that explicit and remove
the search loop for UP.
Note to stable: The fix also requires c1854e00727f50f7ac99e98d26ece04c087ef785 (powerpc: Set
nr_cpu_ids early and use it to free PACAs) to function. Also 9d07bc841c9779b4d7902e417f4e509996ce805d (Properly handshake CPUs going
out of boot spin loop) affects the second chunk, specifically the branch
target was 3b before and is 4b after that patch, and there was a blank
line before the #ifdef CONFIG_SMP that was removed.
Cc: <stable@kernel.org> # .34.x: c1854e0072 powerpc: Set nr_cpu_ids early Cc: <stable@kernel.org> # .34.x Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
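The fix bounds the secondary cpus' paca search by nr_cpu_ids so that it never walks into the freed tail of the paca array. The real search is assembly in the early boot code; the C sketch below only illustrates the bounded loop, and paca_sketch, find_my_paca and the -1 "go to kexec_wait" return value are hypothetical.

```c
#include <stdint.h>

/* Hypothetical paca reduced to the field the search compares. */
struct paca_sketch {
	uint16_t hw_cpu_id;
};

/* Returns the logical cpu index, or -1 meaning "no paca: go to kexec_wait". */
static int find_my_paca(const struct paca_sketch *pacas, int nr_cpu_ids,
			uint16_t my_hw_id)
{
	for (int i = 0; i < nr_cpu_ids; i++)	/* never look past the allocated pacas */
		if (pacas[i].hw_cpu_id == my_hw_id)
			return i;
	return -1;
}
```

A thread whose hardware id is not described in the device tree, as in the kdump-after-hotplug case, falls out of the loop instead of matching stale data in freed memory.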
Milton Miller [Tue, 10 May 2011 19:28:33 +0000 (19:28 +0000)]
powerpc/kexec: Fix memory corruption from unallocated slaves
Commit 1fc711f7ffb01089efc58042cfdbac8573d1b59a (powerpc/kexec: Fix race
in kexec shutdown) moved the write to signal the cpu had exited the kernel
from before the transition to real mode in kexec_smp_wait to kexec_wait.
Unfortunately it missed that kexec_wait is used both by cpus leaving the
kernel and by secondary slave cpus that were not allocated a paca for
whatever reason -- they could be beyond nr_cpus or not described in
the current device tree (for example, kexec-load
was not refreshed after a cpu hotplug operation). Cpus coming through
that path will write to paca[NR_CPUS], which is beyond the space
allocated for the paca data, and overwrite memory not allocated to pacas
(but very likely still real-mode accessible).
Move the write back to kexec_smp_wait, which is used only by cpus that
found their paca, but after the transition to real mode.
Signed-off-by: Milton Miller <miltonm@bga.com> Cc: <stable@kernel.org> # (1fc711f was backported to 2.6.32) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pseries/iommu: Restore iommu table pointer when restoring iommu ops
When we switch to direct dma ops, we set the dma data union to have the
dma offset. When we switch back to iommu table ops because of a later
dma_set_mask, we need to restore the iommu table pointer. Without this
change, crashes have been observed on kexec where (for reasons still
being investigated) we fall back to a 32-bit dma mask on a particular
device and then panic because the table pointer is not valid.
The easiest way to find this value is to call
pci_dma_dev_setup_pSeriesLP which will search up the pci tree until it
finds the node with the table.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Milton Miller <miltonm@bga.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Sun, 8 May 2011 21:36:44 +0000 (21:36 +0000)]
powerpc: Improve scheduling of system call entry instructions
After looking at our system call path, Mary Brown suggested that we
should put all mfspr SRR* instructions before any mtspr SRR*.
To test this I used a very simple null syscall (actually getppid)
testcase at http://ozlabs.org/~anton/junkcode/null_syscall.c
I tested with the following changes against the pseries_defconfig:
CONFIG_VIRT_CPU_ACCOUNTING=n
CONFIG_AUDIT=n
to remove the overhead of virtual CPU accounting and syscall
auditing.
POWER6:
baseline: mean = 757.2 cycles sd = 2.108
modified: mean = 759.1 cycles sd = 2.020
POWER7:
baseline: mean = 411.4 cycles sd = 0.138
modified: mean = 404.1 cycles sd = 0.109
So we have a 1.77% improvement on POWER7, which looks significant. The
POWER6 results suggest a 0.25% slowdown, but the difference is within 1
standard deviation and may be in the noise.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
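The percentages quoted above follow from the relative change in mean cycles, (baseline - modified) / baseline: 7.3 / 411.4 is about 1.77% on POWER7 and -1.9 / 757.2 is about -0.25% on POWER6. A small C helper reproduces them; within_one_sd is a crude heuristic matching the commit's wording, not a statistical test.

```c
/* Percentage drop in mean cycles; positive means the modified path is faster. */
static double pct_improvement(double baseline, double modified)
{
	return (baseline - modified) / baseline * 100.0;
}

/* Crude noise check: is the change in means smaller than one standard deviation? */
static int within_one_sd(double a, double b, double sd)
{
	double d = a - b;

	return d < sd && -d < sd;
}
```

The POWER6 difference of 1.9 cycles is below both reported standard deviations (2.108 and 2.020), while the POWER7 difference of 7.3 cycles is far above its 0.1 to 0.14 cycle spread.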
Anton Blanchard [Sun, 8 May 2011 21:20:19 +0000 (21:20 +0000)]
powerpc: Remove static branch hint in giveup_altivec
A static branch hint will override dynamic branch prediction on
recent POWER CPUs. Since we are about to use more altivec in the
kernel remove the static hint in giveup_altivec that assumes
a userspace task is using altivec.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Ensure dtl buffers do not cross 4k boundary
Future releases of firmware will enforce a requirement that DTL buffers
do not cross a 4k boundary. Commit 127493d5dc73589cbe00ea5ec8357cc2a4c0d82a satisfies this requirement for
CONFIG_VIRT_CPU_ACCOUNTING=y kernels, but if !CONFIG_VIRT_CPU_ACCOUNTING
&& CONFIG_DTL=y, the current code will fail at dtl registration time.
Fix this by making the kmem cache from 127493d5dc73589cbe00ea5ec8357cc2a4c0d82a visible outside of setup.c and
using the same cache in both dtl.c and setup.c. This requires a bit of
reorganization to ensure ordering of the kmem cache and buffer
allocations.
Note: Since firmware now limits the size of the buffer, I made
dtl_buf_entries read-only in debugfs.
Tested with upcoming firmware with the 4 combinations of
CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_DTL.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
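The firmware rule above reduces to a simple check: a buffer [addr, addr + size) must not span two different 4 KiB-aligned blocks. This C sketch of that check is illustrative only; crosses_4k is a hypothetical name, and the patch itself satisfies the rule through the shared kmem cache rather than by testing addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DTL_BOUNDARY	4096u

/* True if [addr, addr + size) spans two different 4 KiB-aligned blocks. */
static bool crosses_4k(uintptr_t addr, size_t size)
{
	if (size == 0)
		return false;
	return addr / DTL_BOUNDARY != (addr + size - 1) / DTL_BOUNDARY;
}
```

Any buffer aligned to its own power-of-two size no larger than 4 KiB always passes this check, which is why an allocator that guarantees such alignment removes the need for per-buffer testing.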
When we kexec we look for a particular property added by the first
kernel, "linux,direct64-ddr-window-info", per-device where we already
have set up dynamic dma windows. The current code, though, wasn't
initializing the size of this property and thus when we kexec'd, we
would find the property but read uninitialized memory resulting in
garbage ddw values for the kexec'd kernel and panics. Fix this by
setting the size at enable_ddw() time and ensuring that the size of the
found property is valid at dupe_ddw_if_kexec() time.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
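The fix has two halves: the writer must record the property's real length, and the reader must reject a property whose length does not match what it expects before copying from it. The sketch below shows only the reader-side check; the struct layout and the names ddw_info_sketch and read_ddw_prop are hypothetical and are not the kernel's definition of the property.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the saved window info. */
struct ddw_info_sketch {
	uint32_t liobn;
	uint64_t dma_base;
	uint32_t tce_shift;
	uint32_t window_shift;
};

/* Copy the property only if its length matches exactly; -1 means ignore it. */
static int read_ddw_prop(const void *val, int len, struct ddw_info_sketch *out)
{
	if (!val || len < 0 || (size_t)len != sizeof(*out))
		return -1;
	memcpy(out, val, sizeof(*out));
	return 0;
}
```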
The patch below removes an unused config variable found by using a kernel
cleanup script.
Note: I did try to cross compile these but hit errors while doing so
(gcc is not set up to cross compile) and am unsure if any more needs to be done.
Please have a look if/when anybody has free time.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michal Marek [Tue, 5 Apr 2011 04:58:50 +0000 (04:58 +0000)]
powerpc: Call gzip with -n
The timestamps recorded in the .gz files add no value.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
kerstin jonsson [Tue, 17 May 2011 23:57:11 +0000 (23:57 +0000)]
powerpc/4xx: Fix regression in SMP on 476
commit c56e58537d504706954a06570b4034c04e5b7500 breaks SMP support in PPC_47x chip.
secondary_ti must be set to the current thread info before calling kick_cpu or else
start_secondary_47x will jump into void when trying to return to c-code.
In the current setup secondary_ti is initialized before the CPU idle task is started
and only the boot core will start. I am not sure this is the correct solution, but it
makes SMP possible in my chip.
Note! The HOTPLUG support probably need some fixing to, There is no trampoline code
available in head_44x.S - start_secondary_resume?
Signed-off-by: Kerstin Jonsson <kerstin.jonsson@ericsson.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It seems people are getting confused by the nested #ifdefs, so move the
definitions of crash_kexec_wait_realmode() after the #ifdef CONFIG_SMP
section.
Compile-tested with 32-bit UP, 32-bit SMP and 64-bit SMP configurations.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Wed, 18 May 2011 20:25:57 +0000 (13:25 -0700)]
Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
drivercore: revert addition of of_match to struct device
of: fix race when matching drivers
Grant Likely [Wed, 18 May 2011 17:19:24 +0000 (11:19 -0600)]
drivercore: revert addition of of_match to struct device
Commit b826291c, "drivercore/dt: add a match table pointer to struct
device" added an of_match pointer to struct device to cache the
of_match_table entry discovered at driver match time. This was unsafe
because matching is not atomic with respect to probing a driver. If
two or more drivers attempt to match the same device at the same time,
the cached matching entry pointer can be overwritten.
This patch reverts the of_match cache pointer and reworks all users to
call of_match_device() directly instead.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
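To illustrate why recomputing the match is safe, here is a simplified user-space analogue of a match-table lookup. The types `of_id` and `fake_node` and the function `match_device()` are hypothetical. The real of_match_device() matches on more than a single compatible string. Because the result is returned to the caller and never stored in the shared device structure, concurrent matchers cannot overwrite each other's answer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical match-table entry: a compatible string plus driver data. */
struct of_id {
	const char *compatible;  /* NULL compatible terminates the table */
	const void *data;
};

/* Hypothetical device node carrying a single compatible string. */
struct fake_node {
	const char *compatible;
};

/*
 * Return the table entry matching @node, or NULL. Nothing is cached in
 * @node, so each caller (e.g. each probe routine) gets its own answer.
 */
const struct of_id *match_device(const struct of_id *table,
				 const struct fake_node *node)
{
	if (table == NULL || node == NULL || node->compatible == NULL)
		return NULL;
	for (; table->compatible != NULL; table++)
		if (strcmp(table->compatible, node->compatible) == 0)
			return table;
	return NULL;
}
```

A probe routine in this model would call `match_device(my_table, node)` itself instead of reading a pointer cached on the device.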
Milton Miller [Wed, 18 May 2011 15:27:39 +0000 (10:27 -0500)]
of: fix race when matching drivers
If two drivers are probing devices at the same time, both will write
their match table result to the dev->of_match cache at the same time.
Only write the result if the device matches.
In a thread titled "SBus devices sometimes detected, sometimes not",
Meelis reported his SBus hme was not detected about 50% of the time.
From the debug output suggested by Grant, it was obvious that another driver
matched some devices between the call to match the hme and the hme discovery
failing.
Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Milton Miller <miltonm@bga.com>
[grant.likely: modified to only call of_match_device() once] Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
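The race and its fix can be modelled in a few lines. In this hypothetical sketch, `struct dev_model` stands in for struct device with its of_match cache, and `try_match_buggy()` and `try_match_fixed()` are illustrative names. Sequential calls stand in for the concurrent interleaving. The buggy variant stores the lookup result even when it is NULL, so a non-matching driver erases the entry that a matching driver has just stored. The fixed variant writes only on a match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device with the of_match cache. */
struct dev_model {
	const char *compatible;
	const char *of_match;   /* cached matching table entry (here: a string) */
};

static const char *lookup(const char *const *table, const char *compat)
{
	for (; *table != NULL; table++)
		if (strcmp(*table, compat) == 0)
			return *table;
	return NULL;
}

/* Before the fix: always writes the result, even NULL on a mismatch. */
int try_match_buggy(struct dev_model *dev, const char *const *table)
{
	dev->of_match = lookup(table, dev->compatible);
	return dev->of_match != NULL;
}

/* After the fix: only write the cache when this driver matches. */
int try_match_fixed(struct dev_model *dev, const char *const *table)
{
	const char *m = lookup(table, dev->compatible);

	if (m == NULL)
		return 0;
	dev->of_match = m;
	return 1;
}
```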
Linus Torvalds [Wed, 18 May 2011 13:49:02 +0000 (06:49 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: don't delay blk_run_queue_async
scsi: remove performance regression due to async queue run
blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
block: rescan partitions on invalidated devices on -ENOMEDIA too
cdrom: always check_disk_change() on open
block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers
Florian Fainelli [Fri, 13 May 2011 15:41:21 +0000 (17:41 +0200)]
MIPS: AR7: Fix GPIO register size for Titan variant.
The 'size' variable contains the correct register size for both AR7
and Titan, but we never used it to ioremap the correct register size.
This problem only shows up on Titan.
[ralf@linux-mips.org: Fixed the fix. The original patch as in patchwork
recognizes the problem correctly then fails to fix it ...]
This is the MIPS portion of Joe Perches <joe@perches.com>'s
https://patchwork.linux-mips.org/patch/2172/ which seems to have been
lost in time and space.
Shaohua Li [Wed, 18 May 2011 09:22:43 +0000 (11:22 +0200)]
block: don't delay blk_run_queue_async
Let's check a scenario:
1. blk_delay_queue(q, SCSI_QUEUE_DELAY);
2. blk_run_queue_async();
the second call becomes a no-op, because q->delay_work already has
WORK_STRUCT_PENDING_BIT set, so the delayed work will still run after
SCSI_QUEUE_DELAY. But blk_run_queue_async actually intends the work
to run immediately.
Fix this by doing a cancel on potentially pending delayed work
before queuing an immediate run of the workqueue.
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
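Below is a toy model of the delayed-work behaviour behind this fix. Everything here is hypothetical. `struct toy_dwork` mimics only the pending bit and the expiry time of a delayed work item. The helpers imitate the behaviour described above: queueing is a no-op while the item is pending, and cancelling clears the pending state. The model shows why cancelling first lets the asynchronous run happen immediately.

```c
#include <stdbool.h>

/* Hypothetical model of a delayed work item: pending bit + expiry time. */
struct toy_dwork {
	bool pending;
	unsigned long expires;
};

/* Like queueing delayed work: does nothing if the item is already pending. */
bool toy_queue_delayed(struct toy_dwork *w, unsigned long now, unsigned long delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/* Like cancelling delayed work: clears the pending state. */
void toy_cancel(struct toy_dwork *w)
{
	w->pending = false;
}

/* Old behaviour: the immediate run is absorbed by an already pending delay. */
void run_queue_async_old(struct toy_dwork *w, unsigned long now)
{
	toy_queue_delayed(w, now, 0);
}

/* Fixed behaviour: cancel any pending delayed run, then queue an immediate one. */
void run_queue_async_fixed(struct toy_dwork *w, unsigned long now)
{
	toy_cancel(w);
	toy_queue_delayed(w, now, 0);
}
```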
Linus Torvalds [Wed, 18 May 2011 10:13:46 +0000 (03:13 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf evlist: Fix per thread mmap setup
perf tools: Honour the cpu list parameter when also monitoring a thread list
kprobes, x86: Disable irqs during optimized callback
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: fix cifsConvertToUCS() for the mapchars case
cifs: add fallback in is_path_accessible for old servers
os_dump_core() uses abort() to terminate UML in case of a fatal error.
glibc's abort() calls raise(SIGABRT), which makes use of tgkill().
tgkill() has no effect within UML's kernel threads because they are not
pthreads. As a fallback, abort() executes an invalid instruction to
terminate the process. Therefore UML gets killed by SIGSEGV and leaves an
ugly log entry in the host's kernel ring buffer.
To get rid of this, we use our own abort routine.
Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
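Here is a user-space sketch of an abort routine that avoids raise() and tgkill() by signalling the whole process with `kill(getpid(), SIGABRT)`. It is an illustration under assumptions and is not claimed to match UML's actual routine. `own_abort()` and `abort_in_child()` are hypothetical names. The second helper runs the routine in a forked child so that the terminating signal can be observed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminate the process with SIGABRT without relying on raise()/tgkill(). */
void own_abort(void)
{
	sigset_t set;

	fflush(NULL);                  /* flush stdio buffers, as abort() would */
	signal(SIGABRT, SIG_DFL);      /* make sure SIGABRT is neither ignored nor caught */
	if (sigemptyset(&set) == 0 && sigaddset(&set, SIGABRT) == 0)
		sigprocmask(SIG_UNBLOCK, &set, NULL);
	for (;;)
		if (kill(getpid(), SIGABRT) < 0)
			_exit(127);    /* could not signal ourselves: bail out */
}

/* Demo helper: run own_abort() in a child and return its terminating signal. */
int abort_in_child(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		own_abort();           /* never returns */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```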
ZONE_CONGESTED should be a state of global memory reclaim. If not, a busy
memcg sets it and causes unnecessary throttling in wait_iff_congested()
of memory reclaim in other contexts. This hurts system
performance.
I'll think later about whether a "memcg is congested!" flag is required.
But this fix is required first.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: Ying Han <yinghan@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
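Below is a hypothetical model of the rule "only global reclaim may mark a zone congested". `struct scan_ctl`, `struct zone_model` and `maybe_mark_congested()` are illustrative names, and a NULL memcg here means global reclaim. The trigger condition, that every dirty page seen was backed by a congested device, is an assumption made for the sketch. The message above does not spell it out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reclaim context: NULL memcg means global reclaim. */
struct scan_ctl {
	const void *memcg;
};

struct zone_model {
	bool congested;   /* stands in for the ZONE_CONGESTED zone flag */
};

static bool global_reclaim(const struct scan_ctl *sc)
{
	return sc->memcg == NULL;
}

/*
 * Mark the zone congested only during global reclaim (assumed trigger:
 * all dirty pages seen were on congested devices). A busy memcg must not
 * set the flag and thereby throttle reclaimers in other contexts.
 */
void maybe_mark_congested(struct zone_model *z, const struct scan_ctl *sc,
			  unsigned long nr_dirty, unsigned long nr_congested)
{
	if (nr_dirty && nr_dirty == nr_congested && global_reclaim(sc))
		z->congested = true;
}
```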
Fix switch initialization to ensure that all switches have default routing
disabled. This guarantees that no unexpected RapidIO packets arrive to
the default port set by reset and there is no default routing destination
until it is properly configured by software.
This update also unifies handling of unmapped destinations by tsi57x, IDT
Gen1 and IDT Gen2 switches.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: <stable@kernel.org> [2.6.37+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 17 May 2011 19:28:21 +0000 (15:28 -0400)]
cifs: fix cifsConvertToUCS() for the mapchars case
As Metze pointed out, commit 84cdf74e broke mapchars option:
Commit "cifs: fix unaligned accesses in cifsConvertToUCS"
(84cdf74e8096a10dd6acbb870dd404b92f07a756) does multiple steps
in just one commit (moving the function and changing it without
testing).
put_unaligned_le16(temp, &target[j]); is never called for any
codepoint that goes via the 'default' case of the switch statement. As a result
we put just zero (or maybe uninitialized) bytes into the target
buffer.
His proposed patch looks correct, but doesn't apply to the current head
of the tree. This patch should also fix it.
Cc: <stable@kernel.org> # .38.x: 581ade4: cifs: clean up various nits in unicode routines (try #2) Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
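This is a simplified sketch of the shape the fix needs: every branch of the switch, including `default`, must reach the little-endian store. `convert_mapchars()` and `put_le16()` are hypothetical names. Two further points are assumptions. First, the sketch handles plain ASCII only, with no NLS table. Second, it assumes reserved characters are remapped to 0xF000 plus their ASCII code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed remap values for reserved characters: 0xF000 | ASCII code. */
#define UNI_ASTERISK ((uint16_t)(0xF000 | '*'))
#define UNI_QUESTION ((uint16_t)(0xF000 | '?'))
#define UNI_COLON    ((uint16_t)(0xF000 | ':'))
#define UNI_GRTRTHAN ((uint16_t)(0xF000 | '>'))
#define UNI_LESSTHAN ((uint16_t)(0xF000 | '<'))
#define UNI_PIPE     ((uint16_t)(0xF000 | '|'))

/* Store a 16-bit value little-endian at a possibly unaligned address. */
static void put_le16(uint16_t v, uint8_t *p)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/*
 * Convert ASCII @src to NUL-terminated UTF-16LE in @dst, which must hold
 * 2 * (strlen(src) + 1) bytes. The store sits after the switch, so the
 * default path writes its code point too. Returns code units written.
 */
size_t convert_mapchars(uint8_t *dst, const char *src)
{
	size_t j;

	for (j = 0; src[j] != '\0'; j++) {
		uint16_t temp;

		switch (src[j]) {
		case ':': temp = UNI_COLON; break;
		case '*': temp = UNI_ASTERISK; break;
		case '?': temp = UNI_QUESTION; break;
		case '<': temp = UNI_LESSTHAN; break;
		case '>': temp = UNI_GRTRTHAN; break;
		case '|': temp = UNI_PIPE; break;
		default:
			temp = (uint16_t)(unsigned char)src[j];  /* ASCII only */
			break;
		}
		put_le16(temp, dst + 2 * j);
	}
	put_le16(0, dst + 2 * j);
	return j;
}
```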
Jeff Layton [Tue, 17 May 2011 10:40:30 +0000 (06:40 -0400)]
cifs: add fallback in is_path_accessible for old servers
The is_path_accessible check uses a QPathInfo call, which isn't
supported by ancient win9x era servers. Fall back to an older
SMBQueryInfo call if it fails with the magic error codes.
Cc: stable@kernel.org Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
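A hypothetical sketch of the fallback pattern follows. It tries the modern query first and retries with the legacy call only when the error is one that old servers are expected to return. The message does not name the "magic error codes". Treating them as -EOPNOTSUPP and -EINVAL is an assumption. `path_accessible()` and the stub "servers" are illustrative only.

```c
#include <errno.h>

/* Hypothetical path-query primitive: returns 0 or a negative errno. */
typedef int (*path_query_fn)(const char *path);

/*
 * Try the modern query first; if the server rejects it with one of the
 * (assumed) errors old servers return, retry with the legacy call.
 */
int path_accessible(const char *path, path_query_fn modern, path_query_fn legacy)
{
	int rc = modern(path);

	if (rc == -EOPNOTSUPP || rc == -EINVAL)
		rc = legacy(path);
	return rc;
}

/* Stubs standing in for an old win9x-era server and a missing path. */
static int old_server_qpathinfo(const char *path) { (void)path; return -EOPNOTSUPP; }
static int old_server_queryinfo(const char *path) { (void)path; return 0; }
static int server_enoent(const char *path) { (void)path; return -ENOENT; }
```

Genuine errors such as -ENOENT pass through unchanged. Only the "unsupported call" family triggers the retry.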
Borislav Petkov [Tue, 17 May 2011 12:55:19 +0000 (14:55 +0200)]
x86, AMD: Fix ARAT feature setting again
Trying to enable the local APIC timer on early K8 revisions
uncovers a number of other issues with it, in conjunction with
the C1E enter path on AMD. Fixing those causes much more churn
and troubles than the benefit of using that timer brings so
don't enable it on K8 at all, falling back to the original
functionality the kernel had with respect to that.
Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Nick Bowler <nbowler@elliptictech.com> Cc: Joerg-Volker-Peetz <jvpeetz@web.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
Moving the lower endpoint of the Erratum 400 check to accommodate
earlier K8 revisions (A-E) opens a can of worms which is simply
not worth fixing properly by tweaking the errata checking
framework:
* the missing IntPending MSR on revisions < CG causes a #GP:
Jens Axboe [Tue, 17 May 2011 09:04:44 +0000 (11:04 +0200)]
scsi: remove performance regression due to async queue run
Commit c21e6beb removed our queue request_fn re-enter
protection, and defaulted to always running the queues from
kblockd to be safe. This was a known potential slow down,
but should be safe.
Unfortunately this is causing big performance regressions for
some, so we need to improve this logic. Looking into the details
of the re-enter, the real issue is on requeue of requests.
Requeue of requests upon seeing a BUSY condition from the device
ends up re-running the queue, causing traces like this:
potentially causing the issue we want to avoid. So special
case the requeue re-run of the queue, but improve it to offload
the entire run of local queue and starved queue from a single
workqueue callback. This is a lot better than potentially
kicking off a workqueue run for each device seen.
This also fixes the issue of the local device going into recursion,
since the above mentioned commit never moved that queue run out
of line.
Linus Torvalds [Tue, 17 May 2011 01:36:47 +0000 (18:36 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
Revert "mmc: fix a race between card-detect rescan and clock-gate work instances"
Yinghai Lu [Sat, 14 May 2011 01:06:17 +0000 (18:06 -0700)]
PCI: Clear bridge resource flags if requested size is 0
During PCI remove/rescan testing, we found:
pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
pci 0000:c0:03.0: bridge window [io 0x1000-0x0fff]
pci 0000:c0:03.0: bridge window [mem 0xf0000000-0xf00fffff]
pci 0000:c0:03.0: bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
pci 0000:c0:03.0: device not available (can't reserve [io 0x1000-0x0fff])
pci 0000:c0:03.0: Error enabling bridge (-22), continuing
pci 0000:c0:03.0: enabling bus mastering
pci 0000:c0:03.0: setting latency timer to 64
pcieport 0000:c0:03.0: device not available (can't reserve [io 0x1000-0x0fff])
pcieport: probe of 0000:c0:03.0 failed with error -22
This bug was caused by commit c8adf9a3e873 ("PCI: pre-allocate
additional resources to devices only after successful allocation of
essential resources.")
After that commit, pci_hotplug_io_size was moved from the minimum size
to additional_io_size. So the resource does not go through the
resource_size(res) != 0 path, and is not reset.
The root cause is that pci_bridge_check_ranges() sets the IORESOURCE_IO
flag for the PCI bridge, and later, if the children do not need any I/O
resource, those bridge resources do not need to be allocated, but the
flags are still set. That confuses pci_enable_bridges() later. In the
assignment loop below, a zero-sized resource is skipped, so
reset_resource() is never called on it:
	for (list = head->next; list; list = list->next) {
		res = list->res;
		idx = res - &list->dev->resource[0];
		if (resource_size(res) && pci_assign_resource(list->dev, idx)) {
			...
			reset_resource(res);
		}
	}
Finally, we have to clear the flags in pbus_size_mem/io when the requested
size == 0 and !add_head, because in this case the resource does not go
through adjust_resources_sorted().
Just make size1 = size0 when !add_head; that gets the flags cleared.
At the same time, when the requested size == 0 but add_size != 0, the
resource will still be in head and add_list, because we do not clear its flags.
After this, we get the right result:
pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
pci 0000:c0:03.0: bridge window [io disabled]
pci 0000:c0:03.0: bridge window [mem 0xf0000000-0xf00fffff]
pci 0000:c0:03.0: bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
pci 0000:c0:03.0: enabling bus mastering
pci 0000:c0:03.0: setting latency timer to 64
pcieport 0000:c0:03.0: setting latency timer to 64
pcieport 0000:c0:03.0: irq 160 for MSI/MSI-X
pcieport 0000:c0:03.0: Signaling PME through PCIe PME interrupt
pci 0000:c4:00.0: Signaling PME through PCIe PME interrupt
pcie_pme 0000:c0:03.0:pcie01: service driver pcie_pme loaded
aer 0000:c0:03.0:pcie02: service driver aer loaded
pciehp 0000:c0:03.0:pcie04: Hotplug Controller:
v3: simpler fix. Also fix one typo in pbus_size_mem.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
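This hypothetical sketch models only the flag-clearing decision described above. The real pbus_size_io()/pbus_size_mem() also compute alignment and window ranges. `struct res_model`, `RES_IO_FLAG` and `size_bridge_io()` are illustrative names. Without an add list, the optional reserve is ignored (size1 = size0), so a zero requirement clears the stale flag. With an add list and a non-zero add_size, the flag is kept.

```c
#include <stdbool.h>

#define RES_IO_FLAG 0x100UL   /* hypothetical stand-in for IORESOURCE_IO */

/* Hypothetical bridge window: only the flags matter for this model. */
struct res_model {
	unsigned long flags;
};

/*
 * @size0: size the children require; @add_size: optional hotplug reserve;
 * @has_add_head: whether an "additional resources" list is in use.
 * Returns true if the window must still be allocated (flags kept).
 */
bool size_bridge_io(struct res_model *b_res, unsigned long size0,
		    unsigned long add_size, bool has_add_head)
{
	/* Without an add list (or reserve), size1 == size0. */
	unsigned long size1 = (!has_add_head || !add_size) ? size0
							   : size0 + add_size;

	if (!size0 && !size1) {
		b_res->flags = 0;   /* nothing needed: clear the stale IO flag */
		return false;
	}
	return true;
}
```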
Thomas Gleixner [Mon, 16 May 2011 09:07:48 +0000 (11:07 +0200)]
tick: Clear broadcast active bit when switching to oneshot
The first cpu which switches from periodic to oneshot mode also switches
the broadcast device into oneshot mode. The broadcast device
serves as a backup for per-cpu timers which stop in deeper
C-states. To avoid starvation of the cpus which might be in idle and
depend on broadcast mode, it marks the other cpus as broadcast active
and sets the broadcast expiry value of those cpus to the next tick.
The oneshot mode broadcast bit for the other cpus is sticky and only
gets cleared when those cpus exit idle. If a cpu was not idle while
the bit got set, the bit consequently prevents the broadcast device
from being armed on behalf of that cpu when it enters idle for the
first time after it switched to oneshot mode.
In most cases that goes unnoticed, as one of the other cpus usually has
a timer pending which keeps the broadcast device armed with a short
timeout. Now if the only cpu which has a short timer active has the
bit set, then the broadcast device will not be armed on behalf of that
cpu and will fire long after the expected timer expiry. In the case of
Christian's bug report it took ~145 seconds, which is about half of the
wrap-around time of HPET (the limit for that device), due to the fact
that all other cpus had no timers armed which expired before the 145
seconds timeframe.
The solution is simply to clear the broadcast active bit
unconditionally when a cpu switches to oneshot mode after the first
cpu switched the broadcast device over. It's not idle at that point,
otherwise it would not be executing that code.
[ I fundamentally hate that broadcast crap. Why the heck did some
folks think that, when going into deep idle, it's a brilliant concept to
switch off the last device which brings the cpu back from that
state? ]
Thanks to Christian for providing all the valuable debug information!
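Here is a hypothetical single-word bitmask model of the sticky bit and the fix, for up to 64 cpus. `struct bc_model`, `switch_to_oneshot()` and `enter_idle_arms_device()` are illustrative names, not the tick code's API. The `fixed` parameter toggles between the old and the new behaviour for cpus that switch after the first one.

```c
#include <stdbool.h>

/* Hypothetical model: one bit per cpu in a single word (up to 64 cpus). */
struct bc_model {
	bool bc_oneshot;                  /* broadcast device already in oneshot mode */
	unsigned long long online_mask;   /* cpus present in the system */
	unsigned long long oneshot_mask;  /* "broadcast active" bits */
};

void switch_to_oneshot(struct bc_model *bc, int cpu, bool fixed)
{
	unsigned long long self = 1ULL << cpu;

	if (!bc->bc_oneshot) {
		/* First cpu: switch the device and mark all other cpus active. */
		bc->bc_oneshot = true;
		bc->oneshot_mask |= bc->online_mask & ~self;
	} else if (fixed) {
		/* Later cpus run this code, so they are not idle: clear their bit. */
		bc->oneshot_mask &= ~self;
	}
}

/* Entering idle arms the broadcast device only if the cpu's bit was clear. */
bool enter_idle_arms_device(struct bc_model *bc, int cpu)
{
	unsigned long long self = 1ULL << cpu;

	if (bc->oneshot_mask & self)
		return false;   /* stale bit: device not armed for this cpu */
	bc->oneshot_mask |= self;
	return true;
}
```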
Michał Mirosław [Mon, 16 May 2011 19:14:21 +0000 (15:14 -0400)]
net: Change netdev_fix_features messages loglevel
Those reduced to DEBUG can possibly be triggered by unprivileged processes
and are nothing exceptional. Illegal checksum combinations can only be
caused by driver bug, so promote those messages to WARN.
Since GSO without SG will now only cause DEBUG message from
netdev_fix_features(), remove the workaround from register_netdevice().
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Jarosch [Mon, 16 May 2011 06:28:15 +0000 (06:28 +0000)]
vmxnet3: Fix inconsistent LRO state after initialization
During initialization of vmxnet3, the state of LRO
gets out of sync with netdev->features.
This leads to very poor TCP performance in an IP forwarding
setup and is hitting many VMware users.
Simplified call sequence:
1. vmxnet3_declare_features() initializes "adapter->lro" to true.
2. The kernel automatically disables LRO if IP forwarding is enabled,
so vmxnet3_set_flags() gets called. This also updates netdev->features.
3. Now vmxnet3_setup_driver_shared() is called. "adapter->lro" is still
set to true and LRO gets enabled again, even though
netdev->features shows it's disabled.
Fix it by updating "adapter->lro", too.
The private vmxnet3 adapter flags are scheduled for removal
in net-next, see commit a0d2730c9571aeba793cb5d3009094ee1d8fda35
"net: vmxnet3: convert to hw_features".
Patch applies to 2.6.37 / 2.6.38 and 2.6.39-rc6.
Please CC: comments.
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
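In the hypothetical sketch below, the flag handler updates both copies of the LRO state, so that later setup code reading the private copy agrees with the feature word. `struct adapter_model`, `FEAT_LRO`, `set_lro()` and `device_lro_enabled()` are illustrative stand-ins for netdev->features, NETIF_F_LRO, vmxnet3_set_flags() and vmxnet3_setup_driver_shared().

```c
#include <stdbool.h>
#include <stdint.h>

#define FEAT_LRO (1u << 15)   /* hypothetical stand-in for NETIF_F_LRO */

/* Hypothetical adapter state: the netdev feature word plus a private copy. */
struct adapter_model {
	uint32_t features;   /* stands in for netdev->features */
	bool lro;            /* stands in for adapter->lro */
};

/* Flag handler: update the feature word AND keep the private copy in sync. */
void set_lro(struct adapter_model *a, bool enable)
{
	if (enable)
		a->features |= FEAT_LRO;
	else
		a->features &= ~FEAT_LRO;
	a->lro = enable;     /* the fix: previously this copy stayed stale */
}

/* What later setup code tells the device, based on the private copy. */
bool device_lro_enabled(const struct adapter_model *a)
{
	return a->lro;
}
```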
Ben Hutchings [Mon, 16 May 2011 06:13:49 +0000 (06:13 +0000)]
sfc: Fix oops in register dump after mapping change
Commit 747df2258b1b9a2e25929ef496262c339c380009 ('sfc: Always map MCDI
shared memory as uncacheable') introduced a separate mapping for the
MCDI shared memory (MC_TREG_SMEM). This means we can no longer easily
include it in the register dump. Since it is not particularly useful
in debugging, substitute a recognisable dummy value.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 16 May 2011 15:55:49 +0000 (08:55 -0700)]
Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
OMAP3: set the core dpll clk rate in its set_rate function
omap: iommu: Return IRQ_HANDLED in fault handler when no fault occured
Linus Torvalds [Mon, 16 May 2011 15:47:31 +0000 (08:47 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: Take lock around probes for drm_fb_helper_hotplug_event
drm/i915: Revert i915.semaphore=1 default from 47ae63e0
vga_switcheroo: don't toggle-switch devices
drm/radeon/kms: add some evergreen/ni safe regs
drm/radeon/kms: fix extended lvds info parsing
drm/radeon/kms: fix tiling reg on fusion
Vivek Goyal [Mon, 16 May 2011 13:24:08 +0000 (15:24 +0200)]
blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
Currently we first map the task to a cgroup and then the cgroup to a
blkio_cgroup. There is a more direct way to get to the blkio_cgroup
from the task using task_subsys_state(). Use that.
The real reason for the fix is that it also avoids a race in generic
cgroup code. During remount/umount, rebind_subsystems() is called and
it can do the following without any rcu protection:
cgrp->subsys[i] = NULL;
That means if somebody got hold of a cgroup under rcu and then tried
to do cgroup->subsys[] to get to the blkio_cgroup, it would get NULL, which
is wrong. I was running into this race condition with ltp running on an
upstream-derived kernel, and that led to a crash.
So ideally we should also fix the generic cgroup code to wait for an rcu
grace period before setting the pointer to NULL. Li Zefan is not very keen
on introducing synchronize_wait() as he thinks it will slow
down mount/remount/umount operations.
So for the time being, at least fix the kernel crash by taking a more
direct route to blkio_cgroup.
One tester had reported a crash while running LTP on a derived kernel,
and with this fix the crash is no longer seen; the test has been
running for over 6 days.
Youquan Song [Thu, 21 Apr 2011 16:22:43 +0000 (00:22 +0800)]
x86, apic: Fix spurious error interrupts triggering on all non-boot APs
This patch fixes a bug reported by a customer, who found
that many spurious error interrupts were reported on all
non-boot CPUs (APs) during the system boot stage.
According to Chapter 10 of the Intel Software Developer's Manual
Volume 3A, the local APIC may signal an illegal vector error when
an LVT entry is set to an illegal vector value (0~15) under
FIXED delivery mode (bits 8-10 are 0), regardless of whether
the mask bit is set or an interrupt actually happens. These
errors are seen as error interrupts.
The initial value of thermal LVT entries on all APs always reads
0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
sequence to them and LVT registers are reset to 0s except for
the mask bits which are set to 1s when APs receive INIT IPI.
When the BIOS takes over the thermal throttling interrupt,
the LVT thermal delivery mode should be SMI, and the kernel is
required to keep the AP's LVT thermal monitoring register
programmed that way as well.
This issue happens when the BIOS does not take over the thermal
throttling interrupt: the AP's LVT thermal monitor register is restored
to 0x10000, which means vector 0 and fixed delivery mode, so all APs
signal illegal vector error interrupts.
This patch checks that the interrupt delivery mode is not fixed mode
before restoring the AP's LVT thermal monitor register.
Signed-off-by: Youquan Song <youquan.song@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yong Wang <yong.y.wang@intel.com> Cc: hpa@linux.intel.com Cc: joe@perches.com Cc: jbaron@redhat.com Cc: trenn@suse.de Cc: kent.liu@intel.com Cc: chaohong.guo@intel.com Cc: <stable@kernel.org> # As far back as possible Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
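The sketch below encodes the two checks described above on raw LVT values. The field layout follows the Intel SDM: bits 0-7 hold the vector, bits 8-10 the delivery mode, and bit 16 the mask. The macro and function names are hypothetical, modelled loosely on the kernel's APIC_* constants. The 0x10000 reset value is a masked entry with vector 0 in fixed mode, so it raises the illegal-vector error and must not be restored.

```c
#include <stdbool.h>
#include <stdint.h>

/* LVT field layout (Intel SDM Vol. 3A, Ch. 10); names are illustrative. */
#define LVT_VECTOR_MASK 0x000FFu  /* bits 0-7: vector */
#define LVT_DM_MASK     0x00700u  /* bits 8-10: delivery mode */
#define LVT_DM_FIXED    0x00000u
#define LVT_DM_SMI      0x00200u
#define LVT_MASKED      0x10000u  /* bit 16: mask */

/* Fixed delivery mode with vector 0-15 signals an illegal vector, even if masked. */
bool lvt_illegal_vector(uint32_t lvt)
{
	return (lvt & LVT_DM_MASK) == LVT_DM_FIXED &&
	       (lvt & LVT_VECTOR_MASK) < 16;
}

/* Restore the saved value on an AP only if its delivery mode is not fixed (e.g. SMI). */
bool should_restore_lvtthmr(uint32_t saved)
{
	return (saved & LVT_DM_MASK) != LVT_DM_FIXED;
}
```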