From: Tim Sell Date: Tue, 14 Jul 2015 18:43:30 +0000 (-0400) Subject: staging: unisys: fix random hangs with network stress in visornic X-Git-Url: https://git.karo-electronics.de/?a=commitdiff_plain;h=24ac1074c1dad1bdc455f8f6df5c5316fae53eb7;p=linux-beck.git staging: unisys: fix random hangs with network stress in visornic We learned that it was possible for the core networking code to call visornic_xmit() within ISR context, resulting in the need for us to use spin_lock_irqsave() / spin_lock_irqrestore() to lock accesses to our virtual device channels. Without the correct locking added in this patch, random hangs would occur on typical kernels while stressing the netork. When using a kernel with CONFIG_DEBUG_SPINLOCK=y, a stackdump would occur at the time of the hang reporting: BUG: spinlock recursion on CPU#0, vnic_incoming/ (see below for more details) We considered the possibility of adding a protocol between a visordriver and visorbus where the visordriver could specify which type of locking it required for its virtual device channels (essentially indicating whether or not it was possible for the channel to be accessed in ISR context), but decided this extra complexity was NOT needed, and that channel queues should always be accessed with the most-stringent locking. So that is what is implemented in this commit. Below is an example stackdump illustrating the spinlock recursion that is fixed by this commit. Note that we are first in virtnic_rx() writing to the device channel when an APIC timer interrupt occurs. Within the core networking code, net_rx_action() calls process_backlog(), which eventually lands up back up in virtnic_xmit() in the code attempting to also write to the device channel. BUG: spinlock recursion on CPU#0, vnic_incoming/262 lock: 0xffff88002db810c0, .magic: dead4ead, .owner: vnic_incoming/262, .owner_cpu: 0 CPU: 0 PID: 262 Comm: vnic_incoming Tainted: G C 4.2.0-rc1-ARCH+ #56 Hardware name: Dell Inc. PowerEdge T110/ , BIOS 1.23 12/15/2009 ffff8800216ac200 ffff88002c803388 ffffffff81476364 0000000000000106 ffff88002db810c0 ffff88002c8033a8 ffffffff8109e2bc ffff88002db810c0 ffffffff817631d4 ffff88002c8033c8 ffffffff8109e330 ffff88002db810c0 Call Trace: [] dump_stack+0x4f/0x73 [] spin_dump+0x7c/0xc0 [] spin_bug+0x30/0x40 [] do_raw_spin_lock+0x127/0x140 [] _raw_spin_lock+0x40/0x50 [] ? visorchannel_signalinsert+0x46/0x70 [visorbus] [] visorchannel_signalinsert+0x46/0x70 [visorbus] [] visornic_xmit+0x302/0x5d0 [visornic] [] dev_hard_start_xmit+0x2e0/0x510 [] ? validate_xmit_skb+0x235/0x310 [] sch_direct_xmit+0xf7/0x1d0 [] __dev_queue_xmit+0x203/0x640 [] ? __dev_queue_xmit+0x50/0x640 [] ? ip_finish_output+0x1df/0x310 [] dev_queue_xmit_sk+0x13/0x20 [] ip_finish_output2+0x22c/0x470 [] ? ip_finish_output+0x1df/0x310 [] ? __lock_is_held+0x50/0x70 [] ip_finish_output+0x1df/0x310 [] ip_output+0xb1/0x100 [] ip_local_out_sk+0x3e/0x80 [] ip_queue_xmit+0x188/0x4a0 [] ? ip_local_out_sk+0x80/0x80 [] ? __alloc_skb+0x86/0x1e0 [] tcp_transmit_skb+0x4cb/0x9c0 [] ? __kmalloc_reserve+0x3c/0x90 [] ? __alloc_skb+0x9a/0x1e0 [] tcp_send_ack+0x10d/0x150 [] __tcp_ack_snd_check+0x5e/0x90 [] tcp_rcv_established+0x354/0x710 [] tcp_v4_do_rcv+0x162/0x3f0 [] tcp_v4_rcv+0xb22/0xb50 [] ? ip_local_deliver_finish+0x4c/0x2d0 [] ip_local_deliver_finish+0xe0/0x2d0 [] ? ip_local_deliver_finish+0x4c/0x2d0 [] ip_local_deliver+0xae/0xc0 [] ip_rcv_finish+0x14f/0x510 [] ? __netif_receive_skb_core+0x9d/0xb70 [] ip_rcv+0x2d3/0x3b0 [] ? cpuacct_css_alloc+0xb0/0xb0 [] __netif_receive_skb_core+0x663/0xb70 [] ? __netif_receive_skb_core+0x9d/0xb70 [] ? cpuacct_charge+0x99/0xb0 [] ? cpuacct_css_alloc+0xb0/0xb0 [] ? __lock_is_held+0x50/0x70 [] ? process_backlog+0xbc/0x150 [] ? process_backlog+0x11b/0x150 [] __netif_receive_skb+0x27/0x70 [] process_backlog+0x92/0x150 [] net_rx_action+0x13d/0x350 [] ? lapic_next_event+0x1d/0x30 [] __do_softirq+0x104/0x320 [] ? hrtimer_interrupt+0xc8/0x1a0 [] ? blocking_notifier_chain_cond_register+0x70/0x70 [] irq_exit+0x79/0xa0 [] smp_apic_timer_interrupt+0x4a/0x60 [] apic_timer_interrupt+0x68/0x70 [] ? __memcpy+0x12/0x20 [] ? visorchannel_write+0x4a/0x80 [visorbus] [] signalinsert_inner+0x88/0x130 [visorbus] [] visorchannel_signalinsert+0x55/0x70 [visorbus] [] visornic_rx+0x12e7/0x19d0 [visornic] [] process_incoming_rsps+0x289/0x690 [visornic] [] ? preempt_schedule+0x25/0x30 [] ? ___preempt_schedule+0x12/0x14 [] ? wait_woken+0x90/0x90 [] ? visornic_rx+0x19d0/0x19d0 [visornic] [] ? visornic_rx+0x19d0/0x19d0 [visornic] [] kthread+0xe9/0x110 [] ? __init_kthread_worker+0x70/0x70 [] ret_from_fork+0x3f/0x70 [] ? __init_kthread_worker+0x70/0x70 Fixes: b12fdf7da ('staging: unisys: rework signal remove/insert to avoid sparse lock warnings') Signed-off-by: Tim Sell Signed-off-by: Benjamin Romer Signed-off-by: Greg Kroah-Hartman --- diff --git a/drivers/staging/unisys/visorbus/visorchannel.c b/drivers/staging/unisys/visorbus/visorchannel.c index af349c8a3693..c6452c3908e6 100644 --- a/drivers/staging/unisys/visorbus/visorchannel.c +++ b/drivers/staging/unisys/visorbus/visorchannel.c @@ -417,11 +417,12 @@ bool visorchannel_signalremove(struct visorchannel *channel, u32 queue, void *msg) { bool rc; + unsigned long flags; if (channel->needs_lock) { - spin_lock(&channel->remove_lock); + spin_lock_irqsave(&channel->remove_lock, flags); rc = signalremove_inner(channel, queue, msg); - spin_unlock(&channel->remove_lock); + spin_unlock_irqrestore(&channel->remove_lock, flags); } else { rc = signalremove_inner(channel, queue, msg); } @@ -471,11 +472,12 @@ bool visorchannel_signalinsert(struct visorchannel *channel, u32 queue, void *msg) { bool rc; + unsigned long flags; if (channel->needs_lock) { - spin_lock(&channel->insert_lock); + spin_lock_irqsave(&channel->insert_lock, flags); rc = signalinsert_inner(channel, queue, msg); - spin_unlock(&channel->insert_lock); + spin_unlock_irqrestore(&channel->insert_lock, flags); } else { rc = signalinsert_inner(channel, queue, msg); }