From 19a55e92d85913e3638028fbdc03ac9ea0f7948c Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Thu, 25 Aug 2011 09:46:22 +1000
Subject: [PATCH] This patchset aims at addressing /proc/stat issue which has been introduced with tickless kernel.

In short, show_stat (the proc handler) relies on kstat_cpu(i).cpustat
statistics, which are updated periodically, so those numbers are more or
less accurate. This is, however, not true with a tickless kernel for the
idle and iowait counters, because they are not updated while the CPU is
in the tickless state. As the time a CPU might stay tickless is not
bounded, we can see really outdated values.

The biggest problem is that tools which read /proc/stat interpret
unchanged idle/iowait numbers as 0% idle/iowait, which might confuse
those who rely on them.

The first patch in this series is just a minor clean-up.

The second one changes the update_ts_time_stat semantics. The current
implementation updates the idle counter regardless of whether we are in
the iowait loop at the moment. I see it as an optimization because
cpufreq drivers, which are the only users of those counters, care about
busy vs. non-busy states, so idle+iowait makes perfect sense. This,
however, makes the idle counter useless for others. I think that using
get_cpu_idle_time_us + get_cpu_iowait_time_us should have the same
meaning (at least this is what we do for the jiffies variants).

The third patch changes the get_cpu_{idle,iowait}_time_us semantics.
Both functions call update_ts_time_stat, so they update the counters as
a side effect. This should be OK most of the time as governors (the only
users) are singletons. I can still see a potential problem because they
might race with IRQ:

    irq_enter
      tick_check_idle
        tick_check_nohz
          tick_nohz_stop_idle

but this is a separate issue IMO. Anyway, we shouldn't update those
counters from other contexts, so let's make updating conditional based
on the last_update_time parameter.

The final patch is the actual fix. It uses get_cpu_{idle,iowait}_time_us
to get precise counters.
We still fall back to kstat_cpu if the tickless kernel is disabled.

The patchset is based on top of and gave it some testing (although I am
still not sure about the cpufreq part and possible side effects).

My testing was quite trivial (8 CPU machine):

    mount -t cgroup -o cpuset none /mnt/cgroup
    mkdir /mnt/cgroup/a
    echo 0-5 > /mnt/cgroup/a/cpuset.cpus
    echo 0 > /mnt/cgroup/a/cpuset.mems
    for i in `cat /mnt/cgroup/tasks`; do echo $i > /mnt/cgroup/a/tasks; done
    [only kernel threads will stay in the root cgroup]
    mkdir /mnt/cgroup/b
    echo 6,7 > /mnt/cgroup/b/cpuset.cpus
    echo 0 > /mnt/cgroup/b/cpuset.mems
    [no task in that group so CPU6,7 should be idle most of the time]

Without the last patch I can see the values for CPU[67] staying
unchanged for up to several seconds.

This patch:

Get rid of the trailing semicolons so that those expressions can also be
used somewhere other than just in an assignment.

Signed-off-by: Michal Hocko
Acked-by: Arnd Bergmann
Cc: Dave Jones
Cc: Thomas Gleixner
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
---
 include/asm-generic/cputime.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/cputime.h b/include/asm-generic/cputime.h
index 61e03dd7939e..62ce6823c0f2 100644
--- a/include/asm-generic/cputime.h
+++ b/include/asm-generic/cputime.h
@@ -38,8 +38,8 @@ typedef u64 cputime64_t;
 /*
  * Convert cputime to microseconds and back.
  */
-#define cputime_to_usecs(__ct) jiffies_to_usecs(__ct);
-#define usecs_to_cputime(__msecs) usecs_to_jiffies(__msecs);
+#define cputime_to_usecs(__ct) jiffies_to_usecs(__ct)
+#define usecs_to_cputime(__msecs) usecs_to_jiffies(__msecs)
 
 /*
  * Convert cputime to seconds and back.
-- 
2.39.5
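
For illustration only, here is a minimal userspace sketch of why the
trailing semicolon matters. Every demo_* name is a hypothetical stand-in
and not kernel code, and the 1 jiffy == 1000 us conversion is an
assumption made just for this example. With the semicolon inside the
macro, a plain assignment still compiles (the extra ';' becomes an empty
statement), but the macro cannot be used as an operand in a larger
expression.

```c
/* Hypothetical stand-in for jiffies_to_usecs(); assumes 1 jiffy == 1000 us. */
#define DEMO_USEC_PER_JIFFY 1000u

static inline unsigned int demo_jiffies_to_usecs(unsigned long j)
{
	return (unsigned int)(j * DEMO_USEC_PER_JIFFY);
}

/* Old shape: the semicolon is part of the macro expansion. */
#define demo_cputime_to_usecs_old(ct) demo_jiffies_to_usecs(ct);

/* New shape, mirroring this patch: the macro is a plain expression. */
#define demo_cputime_to_usecs(ct) demo_jiffies_to_usecs(ct)

/* A bare assignment works with the old shape: it expands to "...(ct);;". */
static unsigned int demo_assign_only(unsigned long ct)
{
	unsigned int us;

	us = demo_cputime_to_usecs_old(ct);
	return us;
}

/*
 * Using the macro as an operand needs the new shape. With the old shape
 * the return statement would expand to "return f(idle); + f(iowait);;",
 * which ends the statement early and silently drops the second operand.
 */
static unsigned int demo_sum_usecs(unsigned long idle, unsigned long iowait)
{
	return demo_cputime_to_usecs(idle) + demo_cputime_to_usecs(iowait);
}
```

Under these assumptions, demo_sum_usecs(2, 5) adds both converted values,
which is the kind of use the semicolon-free macros allow.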