Following Rafal request, we verified that on "modern" CPUs using one
or more workers is equivalent. Here is patch V3 that addresses the
packet loss bug in the dma engine using only one worker.
-------
This patch addresses a bug in the dma worker code that keeps draining
packets even when the hardware queues are full. In such cases packets
can not be passed down to the device and are erroneusly dropped by the
code.