The for_each_rcu_flavor() loop unconditionally scans all flavors, even
when the first flavor might have some non-lazy callbacks. Once the
loop has seen a non-lazy callback, further passes through the loop
cannot change the state. This is not a huge problem, given that there
can be at most three RCU flavors (RCU-bh, RCU-preempt, and RCU-sched),
but this code is on the path to idle, so speeding it up even a small
amount would have some benefit.
This commit therefore does two things:
1. Rearranges the order of the list of RCU flavors in order to
place the most active flavor first in the list. The most active
RCU flavor is RCU-preempt, or, if there is no RCU-preempt,
RCU-sched.
2. Reworks the for_each_rcu_flavor() to exit early when the first
non-lazy callback is seen, or, in the case where the caller
does not care about non-lazy callbacks (RCU_FAST_NO_HZ=n),
when the first callback is seen.
Reported-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>