PCI: Rewrite pci_call_probe() to use workqueue instead of work_on_cpu()

pci_call_probe() uses work_on_cpu(), which creates and tears down a
full kthread on each call, to invoke ->probe() on a node-local CPU so
that allocations made during probe are local to the device's node.

The same goal can easily be achieved using a work item. This patch
rewrites pci_call_probe() so that it uses a work item instead of
work_on_cpu().
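
For illustration only, the approach can be sketched roughly as below.
This is not the patch itself: struct probe_work, probe_workfn() and
call_probe_sketch() are hypothetical names, and the sketch calls
->probe() directly, leaving out whatever else the real probe helper
does around it.

	#include <linux/cpu.h>
	#include <linux/cpumask.h>
	#include <linux/pci.h>
	#include <linux/topology.h>
	#include <linux/workqueue.h>

	/* Hypothetical: probe arguments bundled with the work item. */
	struct probe_work {
		struct work_struct work;
		struct pci_driver *drv;
		struct pci_dev *dev;
		const struct pci_device_id *id;
		int ret;
	};

	/* Runs on whichever CPU the work item was queued on. */
	static void probe_workfn(struct work_struct *work)
	{
		struct probe_work *pw =
			container_of(work, struct probe_work, work);

		pw->ret = pw->drv->probe(pw->dev, pw->id);
	}

	static int call_probe_sketch(struct pci_driver *drv,
				     struct pci_dev *dev,
				     const struct pci_device_id *id)
	{
		struct probe_work pw = { .drv = drv, .dev = dev, .id = id };
		int node = dev_to_node(&dev->dev);
		int cpu = nr_cpu_ids;	/* "no suitable CPU" marker */

		INIT_WORK_ONSTACK(&pw.work, probe_workfn);

		/* Taken unconditionally, even if the device has no node. */
		get_online_cpus();
		if (node >= 0)
			cpu = cpumask_any_and(cpumask_of_node(node),
					      cpu_online_mask);
		if (cpu < nr_cpu_ids) {
			/* Queue on a node-local CPU and wait for it. */
			schedule_work_on(cpu, &pw.work);
			flush_work(&pw.work);
		} else {
			/* No node or no online CPU there: run inline. */
			probe_workfn(&pw.work);
		}
		put_online_cpus();

		destroy_work_on_stack(&pw.work);
		return pw.ret;
	}

Keeping the work item on the stack is safe here only because the
caller flushes it before returning.
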
Note that the function is restructured for simplicity. This adds a
get/put_online_cpus() pair for devices without a node, but the overhead
of doing so is immaterial at this level.