From: Tejun Heo Date: Thu, 6 Sep 2012 19:50:41 +0000 (-0700) Subject: workqueue: fix possible idle worker depletion during CPU_ONLINE X-Git-Tag: next-20120907~24^2~1^2 X-Git-Url: https://git.karo-electronics.de/?a=commitdiff_plain;h=985aafbf530834a9ab16348300adc7cbf35aab76;p=karo-tx-linux.git workqueue: fix possible idle worker depletion during CPU_ONLINE To simplify both normal and CPU hotplug paths, while CPU hotplug is in progress, manager_mutex is held to prevent one of the workers from becoming a manager and creating or destroying workers; unfortunately, it currently may lead to idle worker depletion which in turn can lead to deadlock under extreme circumstances. Idle workers aren't allowed to become busy if there's no other idle worker left to create more idle workers, but during CPU_ONLINE gcwq_associate() is holding all managerships and all the idle workers can proceed to become busy before gcwq_associate() is finished. This patch fixes the bug by releasing manager_mutexes before letting the rebound idle workers go. This ensures that by the time idle workers check whether management is necessary, CPU_ONLINE already has released the positions. Signed-off-by: Tejun Heo Reported-by: Lai Jiangshan --- diff --git a/kernel/workqueue.c b/kernel/workqueue.c index b19170bddc77..74487ef9963a 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1454,10 +1454,19 @@ retry: } /* - * All idle workers are rebound and waiting for %WORKER_REBIND to - * be cleared inside idle_worker_rebind(). Clear and release. - * Clearing %WORKER_REBIND from this foreign context is safe - * because these workers are still guaranteed to be idle. + * At this point, each pool is guaranteed to have at least one idle + * worker and all idle workers are waiting for WORKER_REBIND to + * clear. Release management before releasing idle workers; + * otherwise, they can all go become busy as we're holding the + * manager_mutexes, which can lead to deadlock as we don't actually + * create new workers. + */ + gcwq_release_management(gcwq); + + /* + * Clear %WORKER_REBIND and release. Clearing it from this foreign + * context is safe because these workers are still guaranteed to be + * idle. * * We need to make sure all idle workers passed WORKER_REBIND wait * in idle_worker_rebind() before returning; otherwise, workers can @@ -1467,6 +1476,7 @@ retry: INIT_COMPLETION(idle_rebind.done); for_each_worker_pool(pool, gcwq) { + WARN_ON_ONCE(list_empty(&pool->idle_list)); list_for_each_entry(worker, &pool->idle_list, entry) { worker->flags &= ~WORKER_REBIND; idle_rebind.cnt++; @@ -1481,8 +1491,6 @@ retry: } else { spin_unlock_irq(&gcwq->lock); } - - gcwq_release_management(gcwq); } static struct worker *alloc_worker(void)