From: Michal Hocko
Date: Wed, 24 Aug 2011 23:46:22 +0000 (+1000)
Subject: This patchset aims at addressing /proc/stat issue which has been
X-Git-Tag: next-20110905~1^2~125
X-Git-Url: https://git.karo-electronics.de/?a=commitdiff_plain;h=d0ec984ab22e144cbc9ff53f2fc845af54b46b2d;p=karo-tx-linux.git

This patchset aims at addressing the /proc/stat issue which has been
introduced with the tickless kernel.  In short, show_stat (the proc
handler) relies on the kstat_cpu(i).cpustat statistics, which are updated
periodically, so those numbers are more or less accurate.  This is,
however, not true with the tickless kernel for the idle and iowait
counters, because they are not updated while a CPU is in the tickless
state.  As the time a CPU may spend tickless is not bounded, we can see
really outdated values.  The biggest problem is that tools which read
/proc/stat interpret unchanged idle/iowait numbers as 0% idle/iowait,
which might confuse those who rely on them.

The first patch in this series is just a minor clean-up.  The second one
changes the update_ts_time_stat semantic.  The current implementation
updates the idle counter regardless of whether we are in the iowait loop
at the moment.  I see it as an optimization, because the cpufreq drivers,
which are the only users of those counters, care about busy vs. non-busy
states, so idle+iowait makes perfect sense.  This, however, makes the
idle counter useless for others.  I think that using get_cpu_idle_time_us
+ get_cpu_iowait_time_us should have the same meaning (at least this is
what we do for the jiffies variants).

The third patch changes the get_cpu_{idle,iowait}_time_us semantic.  Both
functions call update_ts_time_stat, so they update the counters as a side
effect.  This should be OK most of the time, as the governors (the only
users) are singletons.  I can still see a potential problem because they
might race with an IRQ:

  irq_enter
    tick_check_idle
      tick_check_nohz
        tick_nohz_stop_idle

but this is a separate issue IMO.  Anyway, we shouldn't update those
counters from other contexts, so let's make the update conditional on the
last_update_time parameter.

The final patch is the actual fix.  It uses get_cpu_{idle,iowait}_time_us
to get precise counters.  We still fall back to kstat_cpu if the tickless
kernel is disabled.
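For illustration, a rough sketch of how that final patch could combine the
precise NO_HZ counter with the kstat_cpu fallback (the helper name
get_idle_time and the exact -1ULL test are assumptions for this sketch,
not taken from the actual patch):

  /*
   * Illustrative sketch only, not the real patch.  get_cpu_idle_time_us()
   * returns -1ULL when the kernel does not run tickless; in that case
   * fall back to the periodically updated kstat_cpu() counter, otherwise
   * convert the precise microsecond value.
   */
  static cputime64_t get_idle_time(int cpu)
  {
          u64 idle_time_us = get_cpu_idle_time_us(cpu, NULL);

          if (idle_time_us == -1ULL)
                  return kstat_cpu(cpu).cpustat.idle;

          return usecs_to_cputime(idle_time_us);
  }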
The patchset is based on top of and I gave it some testing (although I am
still not sure about the cpufreq part and possible side effects).  My
testing was quite trivial (8 CPU machine):

  mount -t cgroup -o cpuset none /mnt/cgroup
  mkdir /mnt/cgroup/a
  echo 0-5 > /mnt/cgroup/a/cpuset.cpus
  echo 0 > /mnt/cgroup/a/cpuset.mems
  for i in `cat /mnt/cgroup/tasks`; do echo $i > /mnt/cgroup/a/tasks; done
  [only kernel threads will stay in the root cgroup]
  mkdir /mnt/cgroup/b
  echo 6,7 > /mnt/cgroup/b/cpuset.cpus
  echo 0 > /mnt/cgroup/b/cpuset.mems
  [no task in that group so CPU6,7 should be idle most of the time]

Without the last patch I can see the values for CPU[67] staying unchanged
for up to several seconds.

This patch:

Get rid of the trailing semicolons so that those expressions can also be
used somewhere else than just in an assignment.

Signed-off-by: Michal Hocko
Acked-by: Arnd Bergmann
Cc: Dave Jones
Cc: Thomas Gleixner
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
---

diff --git a/include/asm-generic/cputime.h b/include/asm-generic/cputime.h
index 61e03dd7939e..62ce6823c0f2 100644
--- a/include/asm-generic/cputime.h
+++ b/include/asm-generic/cputime.h
@@ -38,8 +38,8 @@ typedef u64 cputime64_t;
 /*
  * Convert cputime to microseconds and back.
  */
-#define cputime_to_usecs(__ct)		jiffies_to_usecs(__ct);
-#define usecs_to_cputime(__msecs)	usecs_to_jiffies(__msecs);
+#define cputime_to_usecs(__ct)		jiffies_to_usecs(__ct)
+#define usecs_to_cputime(__msecs)	usecs_to_jiffies(__msecs)
 
 /*
  * Convert cputime to seconds and back.
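To illustrate why the diff above helps (the caller below is hypothetical,
not code from this series): with the trailing semicolon the macro could
only form a complete statement, so using it inside another expression did
not compile.

  /*
   * Hypothetical usage, only to show the effect of the trailing ';'.
   * Before this patch the line below expanded to
   *     idle = ((idle) + (usecs_to_jiffies(delta_us);));
   * i.e. a semicolon inside the expression, which is a syntax error.
   * After the patch the macro expands to a plain expression and the
   * statement compiles.
   */
  idle = cputime64_add(idle, usecs_to_cputime(delta_us));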