sched/idle: Optimize the generic idle loop
Currently, smp_processor_id() is used to fetch the current CPU in
cpu_idle_loop(). Every time the idle thread runs, it fetches the
current CPU using smp_processor_id().
Since the idle thread is per CPU, the current CPU is constant, so we
can lift the load out of the loop, saving execution cycles/time in the
loop.
x86-64:
Before patch (execution in loop):
148: 0f ae e8 lfence
14b: 65 8b 04 25 00 00 00 00 mov %gs:0x0,%eax
152: 00
153: 89 c0 mov %eax,%eax
155: 49 0f a3 04 24 bt %rax,(%r12)
After patch (execution in loop):
150: 0f ae e8 lfence
153: 4d 0f a3 34 24 bt %r14,(%r12)
ARM64:
Before patch (execution in loop):
168:
d5033d9f dsb ld
16c:
b9405661 ldr w1,[x19,#84]
170:
1100fc20 add w0,w1,#0x3f
174:
6b1f003f cmp w1,wzr
178:
1a81b000 csel w0,w0,w1,lt
17c:
130c7000 asr w0,w0,#6
180:
937d7c00 sbfiz x0,x0,#3,#32
184:
f8606aa0 ldr x0,[x21,x0]
188:
9ac12401 lsr x1,x0,x1
18c:
36000e61 tbz w1,#0,358
After patch (execution in loop):
1a8:
d50339df dsb ld
1ac:
f8776ac0 ldr x0,[x22,x23]
ab0:
ea18001f tst x0,x24
1b4:
54000ea0 b.eq 388
Further observance on ARM64 for 4 seconds shows that cpu_idle_loop is
called 8672 times. Shifting the code will save instructions executed
in loop and eventually time as well.
Signed-off-by: Gaurav Jindal <gaurav.jindal@spreadtrum.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sanjeev Yadav <sanjeev.yadav@spreadtrum.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160512101330.GA488@gauravjindalubtnb.del.spreadtrum.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>