The work cancellation code, the thermal zone unregistering, the work code
and the interrupt notification function are racy against each other and
against cpu hotplug and module exit. The random locking sprinkeled all
over the place does not help anything and probably exists to make people
feel good. The resulting issues (mainly use after free) are probably
hard to trigger, but they clearly exist
Protect the package list with a spinlock so it can be accessed from the
interrupt notifier and also from the work function. The add/removal code in
the hotplug callbacks take the lock for list manipulation. That makes sure
that on removal neither the interrupt notifier nor the work function can
access the about to be freed package structure anymore.
The thermal zone unregistering is another trainwreck. It's not serialized
against the work function. So unregistering the zone device can race with
the work function and cause havoc.
Protect the thermal zone with a mutex, which is held in the work
function to make sure that the zone device is not being unregistered
concurrently.
To solve the module exit issues, we simply invoke the cpu offline callback
and let it work its magic. For that it's required to keep track of the
participating cpus in a package, because topology_core_mask is not affected
by calling the offline callback for teardown of the driver, so it would
never free the package as there is always a valid target in
topology_core_mask.
Use proper names for the locks so it's clear what they are for and add a
pile of comments to explain the protection rules.
It's amazing that fixing the locking and adding 30 lines of comments
explaining it still removes more lines than it adds.