Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.
This patch checks for a stack overrun and takes appropriate
action since the damage is already done, there is no point
in continuing.
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: dzickus@redhat.com
Cc: bmr@redhat.com
Cc: jcastillo@redhat.com
Cc: oleg@redhat.com
Cc: riel@redhat.com
Cc: prarit@redhat.com
Cc: jgh@redhat.com
Cc: minchan@kernel.org
Cc: mpe@ellerman.id.au
Cc: tglx@linutronix.de
Cc: rostedt@goodmis.org
Cc: hannes@cmpxchg.org
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lubomir Rintel <lkundrak@v3.sk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1410527779-8133-4-git-send-email-atomlin@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
*/
static inline void schedule_debug(struct task_struct *prev)
{
+#ifdef CONFIG_SCHED_STACK_END_CHECK
+ BUG_ON(unlikely(task_stack_end_corrupted(prev)));
+#endif
/*
* Test if we are atomic. Since do_exit() needs to call into
* schedule() atomically, we ignore that path. Otherwise whine
application, you can say N to avoid the very slight overhead
this adds.
+config SCHED_STACK_END_CHECK
+ bool "Detect stack corruption on calls to schedule()"
+ depends on DEBUG_KERNEL
+ default n
+ help
+ This option checks for a stack overrun on calls to schedule().
+ If the stack end location is found to be over written always panic as
+ the content of the corrupted region can no longer be trusted.
+ This is to ensure no erroneous behaviour occurs which could result in
+ data corruption or a sporadic crash at a later stage once the region
+ is examined. The runtime overhead introduced is minimal.
+
config TIMER_STATS
bool "Collect kernel timers statistics"
depends on DEBUG_KERNEL && PROC_FS