From 565198769bf7d6ea79dbfda3c1afe53a3d0b7862 Mon Sep 17 00:00:00 2001 From: "Srivatsa S. Bhat" Date: Thu, 22 May 2014 10:44:06 +1000 Subject: [PATCH] CPU hotplug, stop-machine: plug race-window that leads to "IPI-to-offline-CPU" During CPU offline, stop-machine is used to take control over all the online CPUs (via the per-cpu stopper thread) and then run take_cpu_down() on the CPU that is to be taken offline. But stop-machine itself has several stages: _PREPARE, _DISABLE_IRQ, _RUN etc. The important thing to note here is that the _DISABLE_IRQ stage comes much later after starting stop-machine, and hence there is a large window where other CPUs can send IPIs to the CPU going offline. As a result, we can encounter a scenario as depicted below, which causes IPIs to be sent to the CPU going offline, and that CPU notices them *after* it has gone offline, triggering the "IPI-to-offline-CPU" warning from the smp-call-function code. CPU 1 CPU 2 (Online CPU) (CPU going offline) Enter _PREPARE stage Enter _PREPARE stage Enter _DISABLE_IRQ stage = Got a device interrupt, | Didn't notice the IPI and the interrupt handler | since interrupts were called smp_call_function() | disabled on this CPU. and sent an IPI to CPU 2. | = Enter _DISABLE_IRQ stage Enter _RUN stage Enter _RUN stage = Busy loop with interrupts | Invoke take_cpu_down() disabled. | and take CPU 2 offline = Enter _EXIT stage Enter _EXIT stage Re-enable interrupts Re-enable interrupts The pending IPI is noted immediately, but alas, the CPU is offline at this point. So, as we can observe from this scenario, the IPI was sent when CPU 2 was still online, and hence it was perfectly legal. But unfortunately it was noted only after CPU 2 went offline, resulting in the warning from the IPI handling code. In other words, the fault was not at the sender, but at the receiver side - and if we look closely, the real bug is in the stop-machine sequence itself. The problem here is that the CPU going offline disabled its local interrupts (by entering _DISABLE_IRQ phase) *before* the other CPUs. And that's the reason why it was not able to respond to the IPI before going offline. A simple solution to this problem is to ensure that the CPU going offline disables its interrupts only *after* the other CPUs do the same thing. To achieve this, split the _DISABLE_IRQ state into 2 parts: 1st part: MULTI_STOP_DISABLE_IRQ_INACTIVE, where only the non-active CPUs (i.e., the "other" CPUs) disable their interrupts. 2nd part: MULTI_STOP_DISABLE_IRQ_ACTIVE, where the active CPU (i.e., the CPU going offline) disables its interrupts. With this in place, the CPU going offline will always be the last one to disable interrupts. After this step, no further IPIs can be sent to the outgoing CPU, since all the other CPUs would be executing the stop-machine code with interrupts disabled. And by the time stop-machine ends, the CPU would have gone offline and disappeared from the cpu_online_mask, and hence future invocations of smp_call_function() and friends will automatically prune that CPU out. Thus, we can guarantee that no CPU will end up *inadvertently* sending IPIs to an offline CPU. Signed-off-by: Srivatsa S. Bhat Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar Reviewed-by: Tejun Heo Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Christoph Hellwig Cc: Mel Gorman Cc: Rik van Riel Cc: Borislav Petkov Cc: Steven Rostedt Cc: Mike Galbraith Cc: Gautham R Shenoy Cc: "Paul E. McKenney" Cc: Oleg Nesterov Cc: Rafael J. Wysocki Signed-off-by: Andrew Morton --- kernel/stop_machine.c | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index 695f0c6cd169..b6b67ec10db9 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -165,12 +165,13 @@ static void ack_state(struct multi_stop_data *msdata) set_state(msdata, msdata->state + 1); } + /* This is the cpu_stop function which stops the CPU. */ static int multi_cpu_stop(void *data) { struct multi_stop_data *msdata = data; enum multi_stop_state curstate = MULTI_STOP_NONE; - int cpu = smp_processor_id(), err = 0; + int cpu = smp_processor_id(), num_active_cpus, err = 0; unsigned long flags; bool is_active; @@ -180,15 +181,38 @@ static int multi_cpu_stop(void *data) */ local_save_flags(flags); - if (!msdata->active_cpus) + if (!msdata->active_cpus) { is_active = cpu == cpumask_first(cpu_online_mask); - else + num_active_cpus = 1; + } else { is_active = cpumask_test_cpu(cpu, msdata->active_cpus); + num_active_cpus = cpumask_weight(msdata->active_cpus); + } /* Simple state machine */ do { /* Chill out and ensure we re-read multi_stop_state. */ cpu_relax(); + + /* + * In the case of CPU offline, we don't want the other CPUs to + * send IPIs to the active_cpu (the one going offline) after it + * has entered the _DISABLE_IRQ state (because, then it will + * notice the IPIs only after it goes offline). So ensure that + * the active_cpu always follows the others while entering + * each subsequent state in this state-machine. + * + * msdata->thread_ack tracks the number of CPUs that are yet to + * move to the next state, during each transition. So make the + * active_cpu(s) wait until ->thread_ack indicates that the + * active_cpus are the only ones left to complete the transition. + */ + if (is_active) { + /* Wait until all the non-active threads ack the state */ + while (atomic_read(&msdata->thread_ack) > num_active_cpus) + cpu_relax(); + } + if (msdata->state != curstate) { curstate = msdata->state; switch (curstate) { -- 2.39.5