Russell King [Wed, 15 Dec 2010 19:23:07 +0000 (19:23 +0000)]
ARM: sched_clock: provide common infrastructure for sched_clock()
Provide common sched_clock() infrastructure for platforms to use to
create a 64-bit ns based sched_clock() implementation from a counter
running at a non-variable clock rate.
This implementation is based upon maintaining an epoch for the counter
and an epoch for the nanosecond time. When we desire a sched_clock()
time, we calculate the number of counter ticks since the last epoch
update, convert this to nanoseconds and add to the epoch nanoseconds.
We regularly refresh these epochs within the counter wrap interval.
We perform a similar calculation as above, and store the new epochs.
We read and write the epochs in such a way that sched_clock() can easily
(and locklessly) detect when an update is in progress, and repeat the
loading of these constants when they're known not to be stable. The
one caveat is that sched_clock() is not called in the middle of an
update. We achieve that by disabling IRQs.
Finally, if the clock rate is known at compile time, the counter to ns
conversion factors can be specified, allowing sched_clock() to be tightly
optimized. We ensure that these factors are correct by providing an
initialization function which performs a run-time check.
Acked-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Tested-by: Eric Miao <eric.y.miao@gmail.com> Tested-by: Olof Johansson <olof@lixom.net> Tested-by: Jamie Iles <jamie@jamieiles.com> Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
john stultz [Thu, 16 Dec 2010 19:03:27 +0000 (19:03 +0000)]
Fix rounding in clocks_calc_mult_shift()
Russell King reports:
| On the ARM dev boards, we have a 32-bit counter running at 24MHz. Calling
| clocks_calc_mult_shift(&mult, &shift, 24MHz, NSEC_PER_SEC, 60) gives
| us a multiplier of 2796202666 and a shift of 26.
|
| Over a large counter delta, this produces an error - lets take a count
| from 362976315 to 4280663372:
|
| (4280663372-362976315) * 2796202666 / 2^26 - (4280663372-362976315) * (1000/24)
| => -38.91872422891230269990
|
| Can we do better?
|
| (4280663372-362976315) * 2796202667 / 2^26 - (4280663372-362976315) * (1000/24)
| 19.45936211449532822051
|
| which is about twice as good as the 2796202666 multiplier.
|
| Looking at the equivalent divisions obtained, 2796202666 / 2^26 gives
| 41.66666665673255920410ns per tick, whereas 2796202667 / 2^26 gives
| 41.66666667163372039794ns. The actual value wanted is 1000/24 =
| 41.66666666666666666666ns.
Fix this by ensuring we round to nearest when calculating the
multiplier.
Signed-off-by: John Stultz <john.stultz@linaro.org> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Tested-by: Eric Miao <eric.y.miao@gmail.com> Tested-by: Olof Johansson <olof@lixom.net> Tested-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Wed, 15 Dec 2010 19:19:25 +0000 (19:19 +0000)]
ARM: ensure all sched_clock() implementations are notrace marked
ftrace requires sched_clock() to be notrace. Ensure that all
implementations are so marked. Also make sure that they include
linux/sched.h
Also ensure OMAP clocksource read functions are marked notrace as
they're used for sched_clock() too.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Tested-by: Eric Miao <eric.y.miao@gmail.com> Tested-by: Olof Johansson <olof@lixom.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Colin Cross [Thu, 18 Nov 2010 00:20:15 +0000 (16:20 -0800)]
ARM: tegra: timer: Separate clocksource and sched_clock
tegra_clocksource_read should not use cnt32_to_63, wrapping is
already handled in the clocksource code. Move the cnt32_to_63
into the sched_clock function, and replace the use of clocksource
mult and shift with a multiplication by 1000 to convert us to ns.
Acked-by: John Stultz <johnstul@us.ibm.com> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Tested-by: Olof Johansson <olof@lixom.net> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Tested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:21:41 +0000 (13:21 +0000)]
ARM: stmp: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:21:33 +0000 (13:21 +0000)]
ARM: orion: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:21:21 +0000 (13:21 +0000)]
ARM: spear: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-by: Viresh Kumar <viresh.kumar@st.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:20:49 +0000 (13:20 +0000)]
ARM: nomadik: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:20:35 +0000 (13:20 +0000)]
ARM: mxc: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:20:23 +0000 (13:20 +0000)]
ARM: iop: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:20:06 +0000 (13:20 +0000)]
ARM: nuc: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-by: Wan zongshun <mcuos.com@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:19:55 +0000 (13:19 +0000)]
ARM: U300: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:19:44 +0000 (13:19 +0000)]
ARM: tcc8k: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:19:35 +0000 (13:19 +0000)]
ARM: SA11x0: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:19:22 +0000 (13:19 +0000)]
ARM: s5pv310: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:19:11 +0000 (13:19 +0000)]
ARM: PXA: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Tested-by: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:18:44 +0000 (13:18 +0000)]
ARM: omap: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:18:32 +0000 (13:18 +0000)]
ARM: ns9xxx: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:18:21 +0000 (13:18 +0000)]
ARM: netx: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:18:12 +0000 (13:18 +0000)]
ARM: MSM: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Tested-By: Jeff Ohlstein <johlstei@codeaurora.org> Acked-by: David Brown <davidb@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:18:04 +0000 (13:18 +0000)]
ARM: mmp: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:17:52 +0000 (13:17 +0000)]
ARM: lpc32xx: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:17:40 +0000 (13:17 +0000)]
ARM: ixp4xx: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:17:24 +0000 (13:17 +0000)]
ARM: integrator: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:17:12 +0000 (13:17 +0000)]
ARM: davinci: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-by: Kevin Hilman <khilman@deeprootsystems.com> Tested-by: Kevin Hilman <khilman@deeprootsystems.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:16:39 +0000 (13:16 +0000)]
ARM: bcmring: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-By: Scott Branden <sbranden@broadcom.com> Acked-By: Jiandong Zheng <jdzheng@broadcom.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 13 Dec 2010 13:14:55 +0000 (13:14 +0000)]
ARM: AT91: update clock source registration
In d7e81c2 (clocksource: Add clocksource_register_hz/khz interface) new
interfaces were added which simplify (and optimize) the selection of the
divisor shift/mult constants. Switch over to using this new interface.
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Will Deacon [Sat, 13 Nov 2010 19:04:32 +0000 (19:04 +0000)]
ARM: perf: separate PMU backends into multiple files
The ARM perf_event.c file contains all PMU backends and, as new PMUs
are introduced, will continue to grow.
This patch follows the example of x86 and splits the PMU implementations
into separate files which are then #included back into the main
file. Compile-time guards are added to each PMU file to avoid compiling
in code that is not relevant for the version of the architecture which
we are targetting.
Acked-by: Jean Pihet <j-pihet@ti.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Sat, 13 Nov 2010 17:37:46 +0000 (17:37 +0000)]
ARM: perf: add _init() functions to PMUs
In preparation for separating the PMU-specific code, this patch adds
self-contained init functions to each PMU, therefore removing any
PMU-specific knowledge from the PMU-agnostic init_hw_perf_events
function.
Acked-by: Jamie Iles <jamie@jamieiles.com> Acked-by: Jean Pihet <j-pihet@ti.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Sat, 13 Nov 2010 17:13:56 +0000 (17:13 +0000)]
ARM: perf: consolidate common PMU behaviour
The functions for mapping PMU events (perf, cache and raw) are
common between all PMU types and differ only in the data on which
they operate.
This patch implements common definitions of these mapping functions
and changes the arm_pmu struct to hold pointers to the data which
they require. This is in anticipation of separating out the PMU-specific
code into separate files.
Acked-by: Jamie Iles <jamie.iles@jamieiles.com> Acked-by: Jean Pihet <j-pihet@ti.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: fix memchr() not to dereference memory for zero length
arch/tile: make glibc's sysconf(_SC_NPROCESSORS_CONF) work correctly
arch/tile: fix rwlock so would-be write lockers don't block new readers
Linus Torvalds [Wed, 24 Nov 2010 22:42:03 +0000 (07:42 +0900)]
Merge branch 'drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
* 'drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
pci root complex: support for tile architecture
drivers/net/tile/: on-chip network drivers for the tile architecture
MAINTAINERS: add drivers/char/hvc_tile.c as maintained by tile
Linus Torvalds [Wed, 24 Nov 2010 21:58:19 +0000 (06:58 +0900)]
Merge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: clkfwk: Build fix for non-legacy CPG changes.
sh: Use GCC __builtin_prefetch() to implement prefetch().
sh: fix vsyscall compilation due to .eh_frame issue
sh: avoid to flush all cache in sys_cacheflush
sh: clkfwk: Disable init clk op for non-legacy clocks.
sh: clkfwk: Kill off now unused algo_id in set_rate op.
sh: clkfwk: Kill off unused clk_set_rate_ex().
Linus Torvalds [Wed, 24 Nov 2010 21:57:43 +0000 (06:57 +0900)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: Call blk_queue_flush() to establish flush/fua support
md/raid1: really fix recovery looping when single good device fails.
md: fix return value of rdev_size_change()
Andrew Morton [Wed, 24 Nov 2010 20:57:18 +0000 (12:57 -0800)]
arch/x86/include/asm/fixmap.h: mark __set_fixmap_offset as __always_inline
When compiling arch/x86/kernel/early_printk_mrst.c with i386
allmodconfig, gcc-4.1.0 generates an out-of-line copy of
__set_fixmap_offset() which contains a reference to
__this_fixmap_does_not_exist which the compiler cannot elide.
Marking __set_fixmap_offset() as __always_inline prevents this.
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Feng Tang <feng.tang@intel.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Wed, 24 Nov 2010 20:57:17 +0000 (12:57 -0800)]
scripts: fix gfp-translate for recent changes to gfp.h
The recent changes to gfp.h to satisfy sparse broke scripts/gfp-translate.
This patch fixes it up to work with old and new versions of gfp.h .
[akpm@linux-foundation.org: use `grep -q', per WANG Cong] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Namhyung Kim <namhyung@gmail.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
reiserfs_unpack() locks the inode mutex with reiserfs_mutex_lock_safe()
to protect against reiserfs lock dependency. However this protection
requires to have the reiserfs lock to be locked.
This is the case if reiserfs_unpack() is called by reiserfs_ioctl but
not from reiserfs_quota_on() when it tries to unpack tails of quota
files.
Fix the ordering of the two locks in reiserfs_unpack() to fix this
issue.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: Markus Gapp <markus.gapp@gmx.net> Reported-by: Jan Kara <jack@suse.cz> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <stable@kernel.org> [2.6.36.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Wed, 24 Nov 2010 20:57:14 +0000 (12:57 -0800)]
backlight: grab ops_lock before testing bd->ops
According to the comment describing ops_lock in the definition of struct
backlight_device and when comparing with other functions in backlight.c
the mutex must be hold when checking ops to be non-NULL.
Fixes a problem added by c835ee7f4154992e6 ("backlight: Add suspend/resume
support to the backlight core") in Jan 2009.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Richard Purdie <rpurdie@linux.intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Wed, 24 Nov 2010 20:57:14 +0000 (12:57 -0800)]
drivers/misc/isl29020.c: remove incorrect kfree in isl29020_remove()
struct als_data *data is not used in this driver at all.
Also add a missing ">" character for MODULE_AUTHOR.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Wed, 24 Nov 2010 20:57:13 +0000 (12:57 -0800)]
pagemap: set pagemap walk limit to PMD boundary
Currently one pagemap_read() call walks in PAGEMAP_WALK_SIZE bytes (== 512
pages.) But there is a corner case where walk_pmd_range() accidentally
runs over a VMA associated with a hugetlbfs file.
For example, when a process has mappings to VMAs as shown below:
then pagemap_read() goes into walk_pmd_range() path and walks in the range
0x7fbd51853000-0x7fbd51a53000, but the hugetlbfs VMA should be handled by
walk_hugetlb_range(). Otherwise PMD for the hugepage is considered bad
and cleared, which causes undesirable results.
This patch fixes it by separating pagemap walk range into one PMD.
David Sterba [Wed, 24 Nov 2010 20:57:10 +0000 (12:57 -0800)]
mm: remove call to find_vma in pagewalk for non-hugetlbfs
Commit d33b9f45 ("mm: hugetlb: fix hugepage memory leak in
walk_page_range()") introduces a check if a vma is a hugetlbfs one and
later in 5dc37642 ("mm hugetlb: add hugepage support to pagemap") it is
moved under #ifdef CONFIG_HUGETLB_PAGE but a needless find_vma call is
left behind and its result is not used anywhere else in the function.
The side-effect of caching vma for @addr inside walk->mm is neither
utilized in walk_page_range() nor in called functions.
Signed-off-by: David Sterba <dsterba@suse.cz> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Matt Mackall <mpm@selenic.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/page_alloc.c: fix build_all_zonelist() where percpu_alloc() is wrongly called under stop_machine_run()
During memory hotplug, build_allzonelists() may be called under
stop_machine_run(). In this function, setup_zone_pageset() is called.
But it's bug because it will do page allocation under stop_machine_run().
Here is a report from Alok Kataria.
BUG: sleeping function called from invalid context at kernel/mutex.c:94
in_atomic(): 0, irqs_disabled(): 1, pid: 4, name: migration/0
Pid: 4, comm: migration/0 Not tainted 2.6.35.6-45.fc14.x86_64 #1
Call Trace:
[<ffffffff8103d12b>] __might_sleep+0xeb/0xf0
[<ffffffff81468245>] mutex_lock+0x24/0x50
[<ffffffff8110eaa6>] pcpu_alloc+0x6d/0x7ee
[<ffffffff81048888>] ? load_balance+0xbe/0x60e
[<ffffffff8103a1b3>] ? rt_se_boosted+0x21/0x2f
[<ffffffff8103e1cf>] ? dequeue_rt_stack+0x18b/0x1ed
[<ffffffff8110f237>] __alloc_percpu+0x10/0x12
[<ffffffff81465e22>] setup_zone_pageset+0x38/0xbe
[<ffffffff810d6d81>] ? build_zonelists_node.clone.58+0x79/0x8c
[<ffffffff81452539>] __build_all_zonelists+0x419/0x46c
[<ffffffff8108ef01>] ? cpu_stopper_thread+0xb2/0x198
[<ffffffff8108f075>] stop_machine_cpu_stop+0x8e/0xc5
[<ffffffff8108efe7>] ? stop_machine_cpu_stop+0x0/0xc5
[<ffffffff8108ef57>] cpu_stopper_thread+0x108/0x198
[<ffffffff81467a37>] ? schedule+0x5b2/0x5cc
[<ffffffff8108ee4f>] ? cpu_stopper_thread+0x0/0x198
[<ffffffff81065f29>] kthread+0x7f/0x87
[<ffffffff8100aae4>] kernel_thread_helper+0x4/0x10
[<ffffffff81065eaa>] ? kthread+0x0/0x87
[<ffffffff8100aae0>] ? kernel_thread_helper+0x0/0x10
Built 5 zonelists in Node order, mobility grouping on. Total pages: 289456
Policy zone: Normal
This patch tries to fix the issue by moving setup_zone_pageset() out from
stop_machine_run(). It's obviously not necessary to be called under
stop_machine_run().
[akpm@linux-foundation.org: remove unneeded local] Reported-by: Alok Kataria <akataria@vmware.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Petr Vandrovec <petr@vmware.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Wed, 24 Nov 2010 20:57:08 +0000 (12:57 -0800)]
cgroups: make swap accounting default behavior configurable
Swap accounting can be configured by CONFIG_CGROUP_MEM_RES_CTLR_SWAP
configuration option and then it is turned on by default. There is a boot
option (noswapaccount) which can disable this feature.
This makes it hard for distributors to enable the configuration option as
this feature leads to a bigger memory consumption and this is a no-go for
general purpose distribution kernel. On the other hand swap accounting
may be very usuful for some workloads.
This patch adds a new configuration option which controls the default
behavior (CGROUP_MEM_RES_CTLR_SWAP_ENABLED). If the option is selected
then the feature is turned on by default.
It also adds a new boot parameter swapaccount[=1|0] which enhances the
original noswapaccount parameter semantic by means of enable/disable logic
(defaults to 1 if no value is provided to be still consistent with
noswapaccount).
The default behavior is unchanged (if CONFIG_CGROUP_MEM_RES_CTLR_SWAP is
enabled then CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED is enabled as well)
memcg: avoid deadlock between move charge and try_charge()
__mem_cgroup_try_charge() can be called under down_write(&mmap_sem)(e.g.
mlock does it). This means it can cause deadlock if it races with move charge:
To avoid this deadlock, we do all the move charge works (both can_attach() and
attach()) under one mmap_sem section.
And after this patch, we set/clear mc.moving_task outside mc.lock, because we
use the lock only to check mc.from/to.
Ken Sumrall [Wed, 24 Nov 2010 20:57:00 +0000 (12:57 -0800)]
fuse: fix attributes after open(O_TRUNC)
The attribute cache for a file was not being cleared when a file is opened
with O_TRUNC.
If the filesystem's open operation truncates the file ("atomic_o_trunc"
feature flag is set) then the kernel should invalidate the cached st_mtime
and st_ctime attributes.
Also i_size should be explicitly be set to zero as it is used sometimes
without refreshing the cache.
Signed-off-by: Ken Sumrall <ksumrall@android.com> Cc: Anfei <anfei.zhou@gmail.com> Cc: "Anand V. Avati" <avati@gluster.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin@sgi.com [Wed, 24 Nov 2010 20:56:59 +0000 (12:56 -0800)]
sgi-xpc: XPC fails to discover partitions with all nasids above 128
UV hardware defines 256 memory protection regions versus the baseline 64
with increasing size for the SN2 ia64. This was overlooked when XPC was
modified to accomodate both UV and SN2.
Without this patch, a user could reconfigure their existing system and
suddenly disable cross-partition communications with no indication of what
has gone wrong. It also prevents larger configurations from using
cross-partition communication.
Signed-off-by: Robin Holt <holt@sgi.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lock_page_cgroup and unlock_page_cgroup are implemented using
bit_spinlock. bit_spinlock doesn't touch the bit if we are on non-SMP
machine, so we can't use the bit to check whether the lock was taken.
Let's introduce is_page_cgroup_locked based on bit_spin_is_locked instead
of PageCgroupLocked to fix it.
[akpm@linux-foundation.org: s/is_page_cgroup_locked/page_is_cgroup_locked/] Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Newton [Wed, 24 Nov 2010 20:56:55 +0000 (12:56 -0800)]
uml: disable winch irq before freeing handler data
Disable the winch irq early to make sure we don't take an interrupt part
way through the freeing of the handler data, resulting in a crash on
shutdown:
Signed-off-by: Will Newton <will.newton@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Depending on processor speed, page size, and the amount of memory a
process is allowed to amass, cleanup of a large VM may freeze the system
for many seconds. This can result in a watchdog timeout.
Make sure other tasks receive some service when cleaning up large VMs.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Cc: Greg Ungerer <gerg@snapgear.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Loïc Minier [Wed, 24 Nov 2010 20:56:53 +0000 (12:56 -0800)]
include/linux/fs.h: fix userspace build
dpkg uses fiemap but didn't particularly need to include stdint.h so far.
Since 367a51a33902 ("fs: Add FITRIM ioctl"), build of linux/fs.h failed in
dpkg with:
In file included from ../../src/filesdb.c:27:0:
/usr/include/linux/fs.h:37:2: error: expected specifier-qualifier-list before 'uint64_t'
Use exportable type __u64 to avoid the dependency on stdint.h.
b31d42a5af18 ("Fix compile brekage with !CONFIG_BLOCK") fixed only the
kernel build by including linux/types.h, but this also fixed "make
headers_check", so don't revert it.
Found that the nas_led_whitelist dmi_system_id structure array had no
NULL end delimiter, causing the dmi_check_system() loop to read an
undefined entry.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Dave Hansen <dave@sr71.net> Acked-by: Richard Purdie <rpurdie@linux.intel.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Jones [Tue, 23 Nov 2010 23:21:37 +0000 (00:21 +0100)]
ARM: 6482/2: Fix find_next_zero_bit and related assembly
The find_next_bit, find_first_bit, find_next_zero_bit
and find_first_zero_bit functions were not properly
clamping to the maxbit argument at the bit level. They
were instead only checking maxbit at the byte level.
To fix this, add a compare and a conditional move
instruction to the end of the common bit-within-the-
byte code used by all the functions and be sure not to
clobber the maxbit argument before it is used.
Cc: <stable@kernel.org> Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: James Jones <jajones@nvidia.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Chris Metcalf [Wed, 24 Nov 2010 18:57:42 +0000 (13:57 -0500)]
arch/tile: fix memchr() not to dereference memory for zero length
This change fixes a bug that memchr() will read the first word
of the source even if the length is zero. Ironically, the code
was originally written with a test to avoid exactly this problem,
but to make the code conform to Linux coding standards with all
declarations preceding all statements, the first load from memory
was moved up above that test as the initial value for a variable.
The change just moves all the variable declarations to the top
of the file, with no initializers, so that the test can also be
at the top of the file.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Chris Metcalf [Wed, 24 Nov 2010 18:42:15 +0000 (13:42 -0500)]
arch/tile: make glibc's sysconf(_SC_NPROCESSORS_CONF) work correctly
glibc assumes that it can count /sys/devices/system/cpu/cpu* to get
the number of configured cpus. For this to be valid on tile, we need
to generate a "cpu" entry for all cpus, including the ones that are
not currently allocated for Linux's use.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Chris Metcalf [Tue, 2 Nov 2010 16:05:10 +0000 (12:05 -0400)]
pci root complex: support for tile architecture
This change enables PCI root complex support for TILEPro. Unlike
TILE-Gx, TILEPro has no support for memory-mapped I/O, so the PCI
support consists of hypervisor upcalls for PIO, DMA, etc. However,
the performance is fine for the devices we have tested with so far
(1Gb Ethernet, SATA, etc.).
The <asm/io.h> header was tweaked to be a little bit more aggressive
about disabling attempts to map/unmap IO port space. The hacky
<asm/pci-bridge.h> header was rolled into the <asm/pci.h> header
and the result was simplified. Both of the latter two headers were
preliminary versions not meant for release before now - oh well.
There is one quirk for our TILEmpower platform, which accidentally
negotiates up to 5GT and needs to be kicked down to 2.5GT.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Chris Metcalf [Mon, 1 Nov 2010 21:00:37 +0000 (17:00 -0400)]
drivers/net/tile/: on-chip network drivers for the tile architecture
This change adds the first network driver for the tile architecture,
supporting the on-chip XGBE and GBE shims.
The infrastructure is present for the TILE-Gx networking drivers (another
three source files in the new directory) but for now the the actual
tilegx sources are waiting on releasing hardware to initial customers.
Note that arch/tile/include/hv/* are "upstream" headers from the
Tilera hypervisor and will probably benefit less from LKML review.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Ian Campbell [Wed, 24 Nov 2010 12:09:41 +0000 (12:09 +0000)]
xen: x86/32: perform initial startup on initial_page_table
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
(which also starts on initial_page_table) switches to it. This helps ensure
that the generic setup paths work on Xen unmodified. In particular
clone_pgd_range writes directly to the destination pgd entries and is used to
initialise swapper_pg_dir so we need to ensure that it remains writeable until
the last possible moment during bring up.
This is complicated slightly by the need to avoid sharing kernel PMD entries
when running under Xen, therefore the Xen implementation must make a copy of
the kernel PMD (which is otherwise referred to by both intial_page_table and
swapper_pg_dir) before switching to swapper_pg_dir.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Darrick J. Wong [Wed, 24 Nov 2010 05:40:33 +0000 (16:40 +1100)]
md: Call blk_queue_flush() to establish flush/fua support
Before 2.6.37, the md layer had a mechanism for catching I/Os with the
barrier flag set, and translating the barrier into barriers for all
the underlying devices. With 2.6.37, I/O barriers have become plain
old flushes, and the md code was updated to reflect this. However,
one piece was left out -- the md layer does not tell the block layer
that it supports flushes or FUA access at all, which results in md
silently dropping flush requests.
Since the support already seems there, just add this one piece of
bookkeeping.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 24 Nov 2010 05:39:46 +0000 (16:39 +1100)]
md/raid1: really fix recovery looping when single good device fails.
Commit 4044ba58dd15cb01797c4fd034f39ef4a75f7cc3 supposedly fixed a
problem where if a raid1 with just one good device gets a read-error
during recovery, the recovery would abort and immediately restart in
an infinite loop.
However it depended on raid1_remove_disk removing the spare device
from the array. But that does not happen in this case. So add a test
so that in the 'recovery_disabled' case, the device will be removed.
This suitable for any kernel since 2.6.29 which is when
recovery_disabled was introduced.
Cc: stable@kernel.org Reported-by: Sebastian Färber <faerber@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
Justin Maggard [Wed, 24 Nov 2010 05:36:17 +0000 (16:36 +1100)]
md: fix return value of rdev_size_change()
When trying to grow an array by enlarging component devices,
rdev_size_store() expects the return value of rdev_size_change() to be
in sectors, but the actual value is returned in KBs.
The sysfs files for virtio produce the wrong format and are missing
the required newline. The output for virtio bus vendor/device should
have the same format as the corresponding entries for PCI devices.
Although this technically changes the ABI for sysfs, these files were
broken to start with!
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Jiri Slaby [Sat, 6 Nov 2010 09:06:50 +0000 (10:06 +0100)]
Char: virtio_console, fix memory leak
Stanse found that in init_vqs, memory is leaked under certain
circumstanses (the fail path order is incorrect). Fix that by checking
allocations in one turn and free all of them at once if some fails
(some may be NULL, but this is OK).
Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Amit Shah <amit.shah@redhat.com> Cc: virtualization@lists.linux-foundation.org Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can't rely on indirect buffers for capacity
calculations because they need a memory allocation
which might fail. In particular, virtio_net can get
into this situation under stress, and it drops packets
and performs badly.
So return the number of buffers we can guarantee users.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>