From: Oleg Nesterov
Date: Tue, 17 Jul 2007 11:03:55 +0000 (-0700)
Subject: destroy_workqueue() can livelock
X-Git-Url: https://git.karo-electronics.de/?a=commitdiff_plain;h=13c22168b7276dffe49dc66675d5a78f6d288e0d;p=linux-beck.git

destroy_workqueue() can livelock

Pointed out by Michal Schmidt.

The bug was introduced in 2.6.22 by me.

cleanup_workqueue_thread() does flush_cpu_workqueue(cwq) in a loop until
->worklist becomes empty.  This is live-lockable, a re-niced caller can get
CPU after wake_up() and insert a new barrier before the lower-priority
cwq->thread has a chance to clear ->current_work.

Change cleanup_workqueue_thread() to do flush_cpu_workqueue(cwq) only once.
We can rely on the fact that run_workqueue() won't return until it flushes
all works.  So it is safe to call kthread_stop() after that, the "should
stop" request won't be noticed until run_workqueue() returns.

Signed-off-by: Oleg Nesterov
Cc: Michal Schmidt
Cc: Srivatsa Vaddagiri
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1935302cc645..58e5c152a6bb 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -752,18 +752,17 @@ static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
 	if (cwq->thread == NULL)
 		return;
 
+	flush_cpu_workqueue(cwq);
 	/*
-	 * If the caller is CPU_DEAD the single flush_cpu_workqueue()
-	 * is not enough, a concurrent flush_workqueue() can insert a
-	 * barrier after us.
+	 * If the caller is CPU_DEAD and cwq->worklist was not empty,
+	 * a concurrent flush_workqueue() can insert a barrier after us.
+	 * However, in that case run_workqueue() won't return and check
+	 * kthread_should_stop() until it flushes all work_struct's.
 	 * When ->worklist becomes empty it is safe to exit because no
 	 * more work_structs can be queued on this cwq: flush_workqueue
 	 * checks list_empty(), and a "normal" queue_work() can't use
 	 * a dead CPU.
 	 */
-	while (flush_cpu_workqueue(cwq))
-		;
-
 	kthread_stop(cwq->thread);
 	cwq->thread = NULL;
 }