git.karo-electronics.de Git - linux-beck.git/log

genirq: Add chip flag to force mask on suspend

On suspend we disable all interrupts in the core code, but this does
not mask the interrupt line in the default implementation as we use a
lazy disable approach. That means we mark the interrupt disabled, but
leave the hardware unmasked. That's an optimization because we avoid
the hardware access for the common case where no interrupt happens
after we marked it disabled. If an interrupt happens, then the
interrupt flow handler masks the line at the hardware level and marks
it pending.

Suspend makes use of this delayed disable as it "disables" all
interrupts when preparing the suspend transition. Right before the
system goes into hardware suspend state it checks whether one of the
interrupts which is marked as a wakeup interrupt came in after
disabling it.

Most interrupt chips have a separate register which selects the
interrupts which can wake up the system from suspend, so we don't have
to mask any on the non wakeup interrupts.

But now we have to deal with brilliant designed hardware which lacks
such a wakeup configuration facility. For such hardware it's necessary
to mask all non wakeup interrupts before going into suspend in order
to avoid the wakeup from random interrupts.

Rather than working around this in the affected interrupt chip
implementations we can solve this elegant in the core code itself.

Add a flag IRQCHIP_MASK_ON_SUSPEND which can be set by the irq chip
implementation to indicate, that the interrupts which are not selected
as wakeup sources must be masked in the suspend path. Mask them in the
loop which checks the wakeup interrupts pending flag.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
LKML-Reference: <alpine.LFD.2.00.1103112112310.2787@localhost6.localdomain6>

genirq: Add desc->irq_data accessor

We have accessors for all fields in irq_data based on irq_desc, but
not for irq_data itself.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Add comments to Kconfig switches

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sam Ravnborg <sam@ravnborg.org>

genirq: Fixup fasteoi handler for oneshot mode

The fasteoi handler must mask the interrupt line in oneshot mode
otherwise we end up with an irq storm.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Provide forced interrupt threading

Add a commandline parameter "threadirqs" which forces all interrupts except
those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to
allow retrieving better debug data from crashing interrupt handlers. If
"threadirqs" is not enabled on the kernel command line, then there is no
impact in the interrupt hotpath.

Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after
marking the interrupts which cant be threaded IRQF_NO_THREAD. All
interrupts which have IRQF_TIMER set are implict marked
IRQF_NO_THREAD. Also all PER_CPU interrupts are excluded.

Forced threading hard interrupts also forces all soft interrupt
handling into thread context.

When enabled it might slow down things a bit, but for debugging problems in
interrupt code it's a reasonable penalty as it does not immediately
crash and burn the machine when an interrupt handler is buggy.

Some test results on a Core2Duo machine:

Cache cold run of:
# time git grep irq_desc

      non-threaded       threaded
real 1m18.741s          1m19.061s
user 0m1.874s           0m1.757s
sys  0m5.843s           0m5.427s

# iperf -c server
non-threaded
[  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
threaded
[  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   937 Mbits/sec

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110223234956.772668648@linutronix.de>

sched: Switch wait_task_inactive to schedule_hrtimeout()

When we force thread hard and soft interrupts the startup of ksoftirqd
would hang in kthread_bind() when wait_task_inactive() calls
schedule_timeout_uninterruptible() because there is no softirq yet
which will wake us up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110223234956.677109139@linutronix.de>

genirq: Add IRQF_NO_THREAD

Some low level interrupts cannot be threaded even when we force thread
all interrupt handlers. Add a flag to annotate such interrupts. Add
all timer interrupts to this category by default.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110223234956.578893460@linutronix.de>

genirq: Allow shared oneshot interrupts

Support ONESHOT on shared interrupts, if all drivers agree on it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110223234956.483640430@linutronix.de>

genirq: Prepare the handling of shared oneshot interrupts

For level type interrupts we need to track how many threads are on
flight to avoid useless interrupt storms when not all thread handlers
have finished yet. Keep track of the woken threads and only unmask
when there are no more threads in flight.

Yes, I'm lazy and using a bitfield. But not only because I'm lazy, the
main reason is that it's way simpler than using a refcount. A refcount
based solution would need to keep track of various things like
crashing the irq thread, spurious interrupts coming in,
disables/enables, free_irq() and some more. The bitfield keeps the
tracking simple and makes things just work. It's also nicely confined
to the thread code pathes and does not require additional checks all
over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110223234956.388095876@linutronix.de>

genirq: Make warning in handle_percpu_event useful

The WARN_ON_ONCE in handle_percpu_event() which emits a warning when
an action handler returns with interrupts enabled is not really
useful. It does not reveal the interrupt number and handler function
which caused it. Make it WARN_ONCE() and add the information.

Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Streamline kernel/irq/Kconfig

"def_bool n" without prompt is pointless, these should be just "bool".

[ tglx: Adapted to latest changes ]

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4D5D3309020000780003264A@vpn.id2.novell.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use the correct variable for note_interrupt

note_interrupt wants to be called with the combined result of all
handlers called, not with the last one. If it's a shared interrupt
then the last handler might return IRQ_NONE often enough to trigger
the spurious dectector which turns off a perfectly fine working
interrupt line. Bug was introduced in commit 1277a532(genirq: Simplify
handle_irq_event()).

Yes, I really messed up there. First the variable ret should not have
been named differently to avoid similarity with retval. Second it
should have been declared in the do {} loop.

Rename it to res and move it into the do {} loop and vanish under a
huge brown paperbag.

Reported-bisected-tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Add missing break in __irq_set_trigger()

The switch case in __irq_set_trigger() lacks a break, which emits a
pr_err unconditionally on success.

Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use IRQ_BITMAP_BITS as search size in irq_alloc_descs()

The runtime expansion of nr_irqs does not take into account that
bitmap_find_next_zero_area() returns "start" + size in case the search
for an matching zero area fails. That results in a start value which
can be completely off and is not covered by the following
expand_nr_irqs() and possibly outside of the absolute limit. But we
use it without further checking.

Use IRQ_BITMAP_BITS as the limit for the bitmap search and expand
nr_irqs when the start bit is beyond nr_irqs. So start is always
pointing to the correct area in the bitmap. nr_irqs is just the limit
for irq enumerations, not the real limit for the irq space.

[ tglx: Let irq_expand_nr_irqs() take the new upper end so we do not
expand nr_irqs more than necessary. Made changelog readable ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D6014F9.8040605@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Fix misplaced status update in irq_disable()

We lazy disable interrupt lines, so only mark the line masked, when
the chip provides an irq_disable callback.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Implement irq_data based move_*_irq() versions

No need to lookup the irq descriptor when calling from a chip callback
function which has irq_data already handy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq; Add fasteoi irq_chip quirk

Some chips want irq_eoi() only called when an interrupt is actually
handled. So they have checks for INPROGRESS and DISABLED in their
irq_eoi callbacks. Add a chip flag, which allows to handle that in the
generic code. No impact on the fastpath.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Add preflow handler support

sparc64 needs to call a preflow handler on certain interrupts befor
calling the action chain. Integrate it into handle_fasteoi_irq. Must
be enabled via CONFIG_IRQ_FASTEOI_PREFLOW. No impact when disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David S. Miller <davem@davemloft.net>

genirq: Consolidate set_chip_handler functions

No need to have separate functions if we have one plus inline wrappers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use irq_get/put functions

Convert the management functions to use the common irq_get/put
function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Implement irq_get/put_desc_[bus]locked/unlock()

Most of the managing functions get the irq descriptor and lock it -
either with or without buslock. Instead of open coding this over and
over provide a common function to do that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Remove real old transition functions

These transition helpers are stale for years now. Remove them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Remove desc->status when GENERIC_HARDIRQS_NO_COMPAT=y

If everything uses the right accessors, then enabling
GENERIC_HARDIRQS_NO_COMPAT should just work. If not it will tell you.

Don't be lazy and use the trick which I use in the core code!

git grep status_use_accessors

will unearth it in a split second. Offenders are tracked down and not
slapped with stinking trouts. This time we use frozen shark for a
better educational value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Reflect IRQ_MOVE_PCNTXT in irq_data state

Required by x86.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move wakeup state to irq_data

Some irq_chips need to know the state of wakeup mode for
setting the trigger type etc. Reflect it in irq_data state.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Add IRQCHIP_SET_TYPE_MASKED flag

irq_chips, which require to mask the chip before changing the trigger
type should set this flag. So the core takes care of it and the
requirement for looking into desc->status in the chip goes away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>

genirq: Add flags to irq_chip

Looking through irq_chip implementations I noticed that some of them
have special requirements, like setting the type masked and therefor
fiddle in irq_desc->status. Add a flag field, so the core code can
handle it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Cleanup irq.h

Put the constants into an enum and document them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Force wrapped access to desc->status in core code

Force the usage of wrappers by another nasty CPP substitution.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Wrap the remaning IRQ_* flags

Use wrappers to keep them away from the core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Mirror irq trigger type bits in irq_data.state

That's the data structure chip functions get provided. Also allow them
to signal the core code that they updated the flags in irq_data.state
by returning IRQ_SET_MASK_OK_NOCOPY. The default is unchanged.

The type bits should be accessed via:

val = irqd_get_trigger_type(irqdata);
and
irqd_set_trigger_type(irqdata, val);

Coders who access them directly will be tracked down and slapped with
stinking trouts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_AFFINITY_SET to core

Keep status in sync until last abuser is gone.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Reuse existing can set affinty check

Add a !desc check while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Mirror IRQ_PER_CPU and IRQ_NO_BALANCING in irq_data.state

That's the right data structure to look at for arch code.

Accessor functions are provided.

irqd_is_per_cpu(irqdata);
irqd_can_balance(irqdata);

Coders who access them directly will be tracked down and slapped with
stinking trouts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move debug code to separate header

It'll break when I'm going to undefine the constants.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Remove CHECK_IRQ_PER_CPU from core code

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Make CHECK_IRQ_PER_CPU an inline and deprecate it

Its' too ugly and needs to go. The only users are core code and
parisc. Core code does not need it and parisc gets a new check once
IRQ_PER_CPU is reflected in irq_data.state.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Remove CONFIG_IRQ_PER_CPU

The saving of this switch is minimal versus the ifdef mess it
creates. Simple enable PER_CPU unconditionally and remove the config
switch.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Add IRQ_MOVE_PENDING to irq_data.state

chip implementations need to know about it. Keep status in sync until
all users are fixed.

Accessor function: irqd_is_setaffinity_pending(irqdata)

Coders who access them directly will be tracked down and slapped with
stinking trouts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Add state field to irq_data

Some chip implementations need to access certain status flags. With
sparse irqs that requires a lookup of the irq descriptor. Add a state
field which contains such flags.

Name it in a way which will make coders happy to access it with the
proper accessor functions. And it's easy to grep for.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_WAKEUP to core

No users outside of core.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_SUSPENDED to core

No users outside of core.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_MASKED to core

Keep status in sync until all users are fixed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_PENDING flag to core

Keep status in sync until all users are fixed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_DISABLED to core

Keep status in sync until all abusers are fixed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_REPLAY and IRQ_WAITING to core

No users outside of core.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_ONESHOT to core

No users outside of core.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Add IRQ_INPROGRESS to core

We need to maintain the flag for now in both fields status and istate.
Add a CONFIG_GENERIC_HARDIRQS_NO_COMPAT switch to allow testing w/o
the status one. Wrap the access to status IRQ_INPROGRESS in a inline
which can be turned of with CONFIG_GENERIC_HARDIRQS_NO_COMPAT along
with the define.

There is no reason that anything outside of core looks at this. That
needs some modifications, but we'll get there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_POLL_INPROGRESS to core

No users outside of core.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use modify_status for set_irq_nested_thread

No need for a separate function in the core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_SPURIOUS_DISABLED to core state

No users outside.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Move IRQ_AUTODETECT to internal state

No users outside of core

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Protect tglx from tripping over his own feet

The irq_desc.status field will either go away or renamed to
settings. Anyway we need to maintain compatibility to avoid breaking
the world and some more. While moving bits into the core, I need to
avoid that I use any of the still existing IRQ_ bits in the core code
by typos. So that file will hold the inline wrappers and some nasty
CPP tricks to break the build when typoed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Add internal state field to irq_desc

That field will contain internal state information which is not going
to be exposed to anything outside the core code - except via accessor
functions. I'm tired of everyone fiddling in irq_desc.status.

core_internal_state__do_not_mess_with_it is clear enough, annoying to
type and easy to grep for. Offenders will be tracked down and slapped
with stinking trouts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Fixup core code namespace fallout

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Implement generic irq_show_interrupts()

All archs implement show_interrupts() in more or less the same
way. That's tons of duplicated code with different bugs with no
value. Implement a generic version and deprecate show_interrupts()

Unfortunately we need some ifdeffery for !GENERIC_HARDIRQ archs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Simplify handle_irq_event()

Now that all core users are converted one layer can go.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use handle_irq_event() in the spurious poll code

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use handle_perpcu_event() in handle_percpu_irq()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use handle_irq_event() in handle_edge_irq()

It's safe to drop the IRQ_INPROGRESS flag between action chain walks
as we are protected by desc->lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use handle_irq_event() in handle_fasteoi_irq()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use handle_irq_event() in handle_level_irq()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Use handle_irq_event() in handle_simple_irq()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Implement handle_irq_event()

Core code replacement for the ugly camel case. It contains all the
code which is shared in all handlers.

     clear status flags
     set INPROGRESS flag
     unlock
     call action chain
     note_interrupt
     lock
     clr INPROGRESS flag

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Do not fiddle with IRQ_MASKED in handle_edge_irq()

IRQ_MASKED is set in mask_ack_irq() anyway. Remove it from
handle_edge_irq() to allow simpler ab^HHreuse of that function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110202212551.918484270@linutronix.de>

genirq: Consolidate IRQ_DISABLED

Handle IRQ_DISABLED consistent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Remove default magic

Now that everything uses the wrappers, we can remove the default
functions. None of those functions is performance critical.

That makes the IRQ_MASKED flag tracking fully consistent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Consolidate disable/enable

Create irq_disable/enable and use them to keep the flags consistent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Consolidate startup/shutdown of interrupts

Aside of duplicated code some of the startup/shutdown sites do not
handle the MASKED/DISABLED flags and the depth field at all. Move that
to a helper function and take care of it there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110202212551.787481468@linutronix.de>

genirq: Remove bogus conditional

The if (chip->irq_shutdown) check will always evaluate to true, as we
fill in chip->irq_shutdown with default_shutdown in
irq_chip_set_defaults() if the chip does not provide its own function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110202212551.667607458@linutronix.de>

genirq: Move irq thread flags to core

Soleley used in core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Mark polled irqs and defer the real handler

With the chip.end() function gone we might run into a situation where
a poll call runs and the real interrupt comes in, sees IRQ_INPROGRESS
and disables the line. That might be a perfect working one, which will
then be masked forever.

So mark them polled while the poll runs. When the real handler sees
IRQ_INPROGRESS it checks the poll flag and waits for the polling to
complete. Add the necessary amount of sanity checks to it to avoid
deadlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: spurious: Run only one poller at a time

No point in running concurrent pollers which confuse each other by
setting PENDING.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Do not poll disabled, percpu and timer interrupts

There is no point in polling disabled lines.

percpu does not make sense at all because we only poll on the cpu
we're currently running on. Also polling per_cpu interrupts is racy as
hell. The handler runs without locking so we might get a huge
surprise.

If the timer interrupt needs polling, then we wont get there anyway.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Fixup poll handling

try_one_irq() contains redundant code and lots of useless checks for
shared interrupts. Check for shared before setting IRQ_INPROGRESS and
then call handle_IRQ_event() while pending. Shorter version with the
same functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Warn when handler enables interrupts

We run all handlers with interrupts disabled and expect them not to
enable them. Warn when we catch one who does.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Plug race in report_bad_irq()

We cannot walk the action chain unlocked. Even if IRQ_INPROGRESS is
set an action can be removed and we follow a null pointer. It's safe
to take the lock there, because the code which removes the action will
call synchronize_irq() which waits unlocked for IRQ_INPROGRESS going
away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Remove redundant thread affinity setting

Thread affinity is already set by setup_affinity().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Do not copy affinity before set

While rumaging through arch code I found that there are a few
workarounds which deal with the fact that the initial affinity setting
from request_irq() copies the mask into irq_data->affinity before the
chip code is called. In the normal path we unconditionally copy the
mask when the chip code returns 0.

Copy after the code is called and add a return code
IRQ_SET_MASK_OK_NOCOPY for the chip functions, which prevents the
copy. That way we see the real mask when the chip function decided to
truncate it further as some arches do. IRQ_SET_MASK_OK is 0, which is
the current behaviour.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Always apply cpu online mask

If the affinity had been set by the user, then a later request_irq()
will honour that setting. But online cpus can have changed. So apply
the online mask and for this case as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Rremove redundant check

IRQ_NO_BALANCING is already checked in irq_can_set_affinity() above,
no need to check it again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Simplify affinity related code

There is lot of #ifdef CONFIG_GENERIC_PENDING_IRQ along with
duplicated code in the irq core. Move the #ifdeffery into one place
and cleanup the code so it's readable. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Namespace cleanup

The irq namespace has become quite convoluted. My bad. Clean it up
and deprecate the old functions. All new functions follow the scheme:

irq number based:
irq_set/get/xxx/_xxx(unsigned int irq, ...)

irq_data based:
irq_data_set/get/xxx/_xxx(struct irq_data *d, ....)

irq_desc based:
irq_desc_get_xxx(struct irq_desc *desc)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Add missing buslock to set_irq_type(), set_irq_wake()

chips behind a slow bus cannot update the chip under desc->lock, but
we miss the chip_buslock/chip_bus_sync_unlock() calls around the set
type and set wake functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Make nr_irqs runtime expandable

We face more and more the requirement to expand nr_irqs at
runtime. The reason are irq expanders which can not be detected in the
early boot stage. So we speculate nr_irqs to have enough room. Further
Xen needs extra irq numbers and we really want to avoid adding more
"detection" code into the early boot. There is no real good reason why
we need to limit nr_irqs at early boot.

Allow the allocation code to expand nr_irqs. We have already 8k extra
number space in the allocation bitmap, so lets use it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merge branch 'irq/urgent' into irq/core

Reason: Further patches are conflicting with mainline fixes

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Disable the SHIRQ_DEBUG call in request_threaded_irq for now

With CONFIG_SHIRQ_DEBUG=y we call a newly installed interrupt handler
in request_threaded_irq().

The original implementation (commit a304e1b8) called the handler
_BEFORE_ it was installed, but that caused problems with handlers
calling disable_irq_nosync(). See commit 377bf1e4.

It's braindead in the first place to call disable_irq_nosync in shared
handlers, but ....

Moving this call after we installed the handler looks innocent, but it
is very subtle broken on SMP.

Interrupt handlers rely on the fact, that the irq core prevents
reentrancy.

Now this debug call violates that promise because we run the handler
w/o the IRQ_INPROGRESS protection - which we cannot apply here because
that would result in a possibly forever masked interrupt line.

A concurrent real hardware interrupt on a different CPU results in
handler reentrancy and can lead to complete wreckage, which was
unfortunately observed in reality and took a fricking long time to
debug.

Leave the code here for now. We want this debug feature, but that's
not easy to fix. We really should get rid of those
disable_irq_nosync() abusers and remove that function completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: stable@kernel.org # .28 -> .37

genirq: Prevent access beyond allocated_irqs bitmap

Lars-Peter Clausen pointed out:

   I stumbled upon this while looking through the existing archs using
   SPARSE_IRQ.  Even with SPARSE_IRQ the NR_IRQS is still the upper
   limit for the number of IRQs.

   Both PXA and MMP set NR_IRQS to IRQ_BOARD_START, with
   IRQ_BOARD_START being the number of IRQs used by the core.

   In various machine files the nr_irqs field of the ARM machine
   defintion struct is then set to "IRQ_BOARD_START + NR_BOARD_IRQS".

   As a result "nr_irqs" will greater then NR_IRQS which then again
   causes the "allocated_irqs" bitmap in the core irq code to be
   accessed beyond its size overwriting unrelated data.

The core code really misses a sanity check there.

This went unnoticed so far as by chance the compiler/linker places
data behind that bitmap which gets initialized later on those affected
platforms.

So the obvious fix would be to add a sanity check in early_irq_init()
and break all affected platforms. Though that check wants to be
backported to stable as well, which will require to fix all known
problematic platforms and probably some more yet not known ones as
well. Lots of churn.

A way simpler solution is to allocate a slightly larger bitmap and
avoid the whole churn w/o breaking anything. Add a few warnings when
an arch returns utter crap.

Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org # .37
Cc: Haojian Zhuang <haojian.zhuang@marvell.com>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>

Merge branch 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  RTC: Re-enable UIE timer/polling emulation
  RTC: Revert UIE emulation removal
  RTC: Release mutex in error path of rtc_alarm_irq_enable

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
  net: deinit automatic LIST_HEAD
  net: dont leave active on stack LIST_HEAD
  net: provide default_advmss() methods to blackhole dst_ops
  tg3: Restrict phy ioctl access
  drivers/net: Call netif_carrier_off at the end of the probe
  ixgbe: work around for DDP last buffer size
  ixgbe: fix panic due to uninitialised pointer
  e1000e: flush all writebacks before unload
  e1000e: check down flag in tasks
  isdn: hisax: Use l2headersize() instead of dup (and buggy) func.
  arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.
  cxgb4vf: Use defined Mailbox Timeout
  cxgb4vf: Quiesce Virtual Interfaces on shutdown ...
  cxgb4vf: Behave properly when CONFIG_DEBUG_FS isn't defined ...
  cxgb4vf: Check driver parameters in the right place ...
  pch_gbe: Fix the MAC Address load issue.
  iwlwifi: Delete iwl3945_good_plcp_health.
  net/can/softing: make CAN_SOFTING_CS depend on CAN_SOFTING
  netfilter: nf_iterate: fix incorrect RCU usage
  pch_gbe: Fix the issue that the receiving data is not normal.
  ...

Merge branch 'for-linus/bugfixes' of git://xenbits.xen.org/people/ianc/linux-2.6

* 'for-linus/bugfixes' of git://xenbits.xen.org/people/ianc/linux-2.6:
xen: suspend and resume system devices when running PVHVM

Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
  workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
  workqueue: wake up a worker when a rescuer is leaving a gcwq

net: deinit automatic LIST_HEAD

commit 9b5e383c11b08784 (net: Introduce
unregister_netdevice_many()) left an active LIST_HEAD() in
rollback_registered(), with possible memory corruption.

Even if device is freed without touching its unreg_list (and therefore
touching the previous memory location holding LISTE_HEAD(single), better
close the bug for good, since its really subtle.

(Same fix for default_device_exit_batch() for completeness)

Reported-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Eric W. Biderman <ebiderman@xmission.com>
Tested-by: Eric W. Biderman <ebiderman@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Octavian Purdila <opurdila@ixiacom.com>
CC: stable <stable@kernel.org> [.33+]
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dont leave active on stack LIST_HEAD

Eric W. Biderman and Michal Hocko reported various memory corruptions
that we suspected to be related to a LIST head located on stack, that
was manipulated after thread left function frame (and eventually exited,
so its stack was freed and reused).

Eric Dumazet suggested the problem was probably coming from commit
443457242beb (net: factorize
sync-rcu call in unregister_netdevice_many)

This patch fixes __dev_close() and dev_close() to properly deinit their
respective LIST_HEAD(single) before exiting.

References: https://lkml.org/lkml/2011/2/16/304
References: https://lkml.org/lkml/2011/2/14/223

Reported-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Eric W. Biderman <ebiderman@xmission.com>
Tested-by: Eric W. Biderman <ebiderman@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: provide default_advmss() methods to blackhole dst_ops

Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init(), when
dst->default_advmss() is called from dst_metric_advmss()

Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Expand CONFIG_DEBUG_LIST to several other list operations

When list debugging is enabled, we aim to readably show list corruption
errors, and the basic list_add/list_del operations end up having extra
debugging code in them to do some basic validation of the list entries.

However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
the debug code due to how they were written. This fixes that.

So the _next_ time we have list_move() problems with stale list entries,
we'll hopefully have an easier time finding them..

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / Hibernate: Return error code when alloc_image_page() fails

Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: add missing frac fb div flag for dce4+
  drm/radeon/kms: do not reject X16 and Y16X16 floating-point formats on r300
  drm/nouveau: fix suspend/resume on GPUs that don't have PM support
  drm/nouveau: flips/flipd need to always set 'evict' for move_accel_cleanup()
  drm/nv40: fix tiling-related setup for a number of chipsets
  drm/nouveau: fix non-EDIDful native mode selection
  drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards.
  drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS native mode.
  drm/nv10: Fix crash when allocating a BO larger than half the available VRAM.

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/qib: Prevent double completions after a timeout or RNR error
  IB/qib: Fix double add_timer()
  RDMA/nes: Don't generate async events for unregistered devices

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix NMI startup bug which also breaks perf.
  sparc: fix size argument to find_next_zero_bit()
  sparc: use bitmap_set()
  sparc32: unaligned memory access (MNA) trap handler bug