]> git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
12 years agortc: add MAX8907 RTC driver
Stephen Warren [Thu, 13 Sep 2012 01:01:18 +0000 (11:01 +1000)]
rtc: add MAX8907 RTC driver

The MAX8907 is an I2C-based power-management IC containing voltage
regulators, a reset controller, a real-time clock, and a touch-screen
controller.

The driver is based on an original by or fixed by:
* Tom Cherry
* Prashant Gaikwad
* Joseph Yoon

During upstreaming, I (swarren):
* Converted to regmap.
* Fixed handling of RTC_HOUR register containing 12.
* Fixed handling of RTC_WEEKDAY register.
* General cleanup.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Cc: Tom Cherry <tcherry@nvidia.com>
Cc: Prashant Gaikwad <pgaikwad@nvidia.com>
Cc: Joseph Yoon <tyoon@nvidia.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc: tps65910: add RTC driver for TPS65910 PMIC RTC
Venu Byravarasu [Thu, 13 Sep 2012 01:01:18 +0000 (11:01 +1000)]
rtc: tps65910: add RTC driver for TPS65910 PMIC RTC

TPS65910 PMIC is a MFD with RTC as one of the device.  Adding RTC driver
for supporting RTC device present inside TPS65910 PMIC.

Only support for RTC alarm is implemented as part of this patch.

Signed-off-by: Venu Byravarasu <vbyravarasu@nvidia.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/rtc/rtc-at91sam9.c: use module_platform_driver() macro
Devendra Naga [Thu, 13 Sep 2012 01:01:18 +0000 (11:01 +1000)]
drivers/rtc/rtc-at91sam9.c: use module_platform_driver() macro

This driver does seems to do only platform_driver_register in the init
function and platform_driver_unregister in the exit function,

so replace all this code including the module_init and module_exit with
module_platform_driver macro...

Signed-off-by: Devendra Naga <develkernel412222@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc: recycle id when unloading a rtc driver
Vincent Palatin [Thu, 13 Sep 2012 01:01:17 +0000 (11:01 +1000)]
rtc: recycle id when unloading a rtc driver

When calling rtc_device_unregister, we are not freeing the id used by the
driver.  So when doing a unload/load cycle for a RTC driver (e.g.  rmmod
rtc_cmos && modprobe rtc_cmos), its id is incremented by one.  As a
consequence, we no longer have neither an rtc0 driver nor a
/proc/driver/rtc (as it only exists for the first driver).

Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc: snvs: change timeout to use a fixed number of loop
Shawn Guo [Thu, 13 Sep 2012 01:01:17 +0000 (11:01 +1000)]
rtc: snvs: change timeout to use a fixed number of loop

Andrew Morton <akpm@linux-foundation.org> wrote:

> The timeout code here is fragile.  If acquiring the spinlock takes more
> than a millisecond or if this thread gets interrupted or preempted then
> we could easily execute that loop just a single time, and fail.
>
> It would be better to retry a fixed number of times, say 1000?  That
> would take around 1 millisecond, but might be overkill.

Take Andrew's suggestion to change the timeout code to retry 1000
times.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Kim Phillips <kim.phillips@freescale.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc: snvs: add Freescale rtc-snvs driver
Shawn Guo [Thu, 13 Sep 2012 01:01:17 +0000 (11:01 +1000)]
rtc: snvs: add Freescale rtc-snvs driver

Add an RTC driver for Freescale Secure Non-Volatile Storage (SNVS)
Low Power (LP) RTC.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Kim Phillips <kim.phillips@freescale.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc-add-dallas-ds2404-driver-fix
Andrew Morton [Thu, 13 Sep 2012 01:01:16 +0000 (11:01 +1000)]
rtc-add-dallas-ds2404-driver-fix

drivers/rtc/rtc-ds2404.c:23:1: warning: "DEBUG" redefined <command-line>:
warning: this is the location of the previous definition

Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc: add Dallas DS2404 driver
Sven Schnelle [Thu, 13 Sep 2012 01:01:16 +0000 (11:01 +1000)]
rtc: add Dallas DS2404 driver

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc-proc: permit the /proc/driver/rtc device to use other devices
Kim, Milo [Thu, 13 Sep 2012 01:01:16 +0000 (11:01 +1000)]
rtc-proc: permit the /proc/driver/rtc device to use other devices

To get time information via /proc/driver/rtc, only the first device (rtc0)
is used.  If the rtcN (eg.  rtc1 or rtc2) is used for the system clock,
there is no way to get information of rtcN via /proc/driver/rtc.  With
this patch, the time data can be retrieved from the system clock RTC.

If the RTC_HCTOSYS_DEVICE is not defined, then rtc0 is used by default.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/rtc/rtc-isl1208.c: add support for the ISL1218
Ben Gardner [Thu, 13 Sep 2012 01:01:15 +0000 (11:01 +1000)]
drivers/rtc/rtc-isl1208.c: add support for the ISL1218

The ISL1218 chip is identical to the ISL1208, except that it has 6
additional user-storage registers.  This patch does not enable access to
those additional registers, but only adds the chip name to the list.

Signed-off-by: Ben Gardner <gardner.ben@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoepoll: support for disabling items, and a self-test app
Paton J. Lewis [Thu, 13 Sep 2012 01:01:15 +0000 (11:01 +1000)]
epoll: support for disabling items, and a self-test app

Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll
item.  If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
delete the epoll item in a multi-threaded environment.  Also added a new
test_epoll self- test app to both demonstrate the need for this feature
and test it.

Signed-off-by: Paton J. Lewis <palewis@adobe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Paul Holland <pholland@adobe.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoCodingStyle: add networking specific block comment style
Joe Perches [Thu, 13 Sep 2012 01:01:15 +0000 (11:01 +1000)]
CodingStyle: add networking specific block comment style

The block comment style in net/ and drivers/net is non-standard.
Document it.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Allan, Bruce W" <bruce.w.allan@intel.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agocheckpatch: check networking specific block comment style
Joe Perches [Thu, 13 Sep 2012 01:01:15 +0000 (11:01 +1000)]
checkpatch: check networking specific block comment style

In an effort to get fewer checkpatch reviewer corrections, add a
networking specific style test for the preferred networking comment style.

/* The preferred style for block comments in
 * drivers/net/... and net/... is like this
 */

These tests are only used in net/ and drivers/net/

Tested with:

$ cat drivers/net/t.c

/* foo */

/*
 * foo
 */

/* foo
 */

/* foo
 * bar */
$ ./scripts/checkpatch.pl -f drivers/net/t.c
WARNING: networking block comments don't use an empty /* line, use /* Comment...
#4: FILE: net/t.c:4:
+
+/*

WARNING: networking block comments put the trailing */ on a separate line
#12: FILE: net/t.c:12:
+ * bar */

total: 0 errors, 2 warnings, 12 lines checked

Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Allan, Bruce W" <bruce.w.allan@intel.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agocheckpatch: update suggested printk conversions
Joe Perches [Thu, 13 Sep 2012 01:01:14 +0000 (11:01 +1000)]
checkpatch: update suggested printk conversions

Direct conversion of printk(KERN_<LEVEL>...  to pr_<level> isn't the
preferred conversion when a struct net_device or struct device is
available.

Hint that using netdev_<level> or dev_<level> is preferred to using
pr_<level>.  Add netdev_dbg and dev_dbg variants too.

Miscellaneous whitespace neatening of a misplaced close brace.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agocheckpatch: check utf-8 content from a commit log when it's missing from charset
Pasi Savanainen [Thu, 13 Sep 2012 01:01:14 +0000 (11:01 +1000)]
checkpatch: check utf-8 content from a commit log when it's missing from charset

Check that a commit log doesn't contain UTF-8 when a mail header
explicitly defines a different charset, like

'Content-Type: text/plain; charset="us-ascii"'

Signed-off-by: Pasi Savanainen <pasi.savanainen@nixu.com>
Cc: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers-firmware-dmi_scanc-fetch-dmi-version-from-smbios-if-it-exists-checkpatch...
Andrew Morton [Thu, 13 Sep 2012 01:01:14 +0000 (11:01 +1000)]
drivers-firmware-dmi_scanc-fetch-dmi-version-from-smbios-if-it-exists-checkpatch-fixes

WARNING: Prefer pr_info(... to printk(KERN_INFO, ...
#56: FILE: drivers/firmware/dmi_scan.c:426:
+ printk(KERN_INFO "SMBIOS %d.%d present.\n",

WARNING: Prefer pr_info(... to printk(KERN_INFO, ...
#61: FILE: drivers/firmware/dmi_scan.c:431:
+ printk(KERN_INFO "Legacy DMI %d.%d present.\n",

WARNING: Prefer pr_debug(... to printk(KERN_DEBUG, ...
#85: FILE: drivers/firmware/dmi_scan.c:455:
+ printk(KERN_DEBUG "SMBIOS version fixup(2.%d->2.%d)\n",

WARNING: Prefer pr_debug(... to printk(KERN_DEBUG, ...
#90: FILE: drivers/firmware/dmi_scan.c:460:
+ printk(KERN_DEBUG "SMBIOS version fixup(2.%d->2.%d)\n",

total: 0 errors, 4 warnings, 104 lines checked

./patches/drivers-firmware-dmi_scanc-fetch-dmi-version-from-smbios-if-it-exists.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists
Zhenzhong Duan [Thu, 13 Sep 2012 01:01:13 +0000 (11:01 +1000)]
drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists

The right dmi version is in SMBIOS if it's zero in DMI region

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers-firmware-dmi_scanc-check-dmi-version-when-get-system-uuid-fix
Andrew Morton [Thu, 13 Sep 2012 01:01:13 +0000 (11:01 +1000)]
drivers-firmware-dmi_scanc-check-dmi-version-when-get-system-uuid-fix

tweak code comment

Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/firmware/dmi_scan.c: check dmi version when get system uuid
Zhenzhong Duan [Thu, 13 Sep 2012 01:01:13 +0000 (11:01 +1000)]
drivers/firmware/dmi_scan.c: check dmi version when get system uuid

As of version 2.6 of the SMBIOS specification, the first 3
fields of the UUID are supposed to be little-endian encoded.

Also a minor fix to match variable meaning and mute checkpatch.pl

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agocompat_ioctl: Remove unused local typedef
Andi Kleen [Thu, 13 Sep 2012 01:01:12 +0000 (11:01 +1000)]
compat_ioctl: Remove unused local typedef

gcc 4.8 always warns

fs/compat_ioctl.c: In function 'serial_struct_ioctl':
fs/compat_ioctl.c:609:38: warning: typedef 'SS' locally defined but not used [-Wunused-local-typedefs]
         typedef struct serial_struct SS;

Indeed that typedef is unused, so just remove it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib/gcd.c: prevent possible div by 0
Davidlohr Bueso [Thu, 13 Sep 2012 01:01:12 +0000 (11:01 +1000)]
lib/gcd.c: prevent possible div by 0

Account for all properties when a and/or b are 0:
gcd(0, 0) = 0
gcd(a, 0) = a
gcd(0, b) = b

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib/Kconfig.debug: adjust hard-lockup related Kconfig options
Jan Beulich [Thu, 13 Sep 2012 01:01:12 +0000 (11:01 +1000)]
lib/Kconfig.debug: adjust hard-lockup related Kconfig options

The main option should not appear in the resulting .config when the
dependencies aren't met (i.e.  use "depends on" rather than directly
setting the default from the combined dependency values).

The sub-options should depend on the main option rather than a more
generic higher level one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib-parserc-avoid-overflow-in-match_number-fix
Andrew Morton [Thu, 13 Sep 2012 01:01:11 +0000 (11:01 +1000)]
lib-parserc-avoid-overflow-in-match_number-fix

coding-style tweaks

Cc: Alex Elder <elder@inktank.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib/parser.c: avoid overflow in match_number()
Alex Elder [Thu, 13 Sep 2012 01:01:11 +0000 (11:01 +1000)]
lib/parser.c: avoid overflow in match_number()

The result of converting an integer value to another signed integer type
that's unable to represent the original value is implementation defined.
(See notes in section 6.3.1.3 of the C standard.)

In match_number(), the result of simple_strtol() (which returns type long)
is assigned to a value of type int.

Instead, handle the result of simple_strtol() in a well-defined way, and
return -ERANGE if the result won't fit in the int variable used to hold
the parsed result.

No current callers pay attention to the particular error value returned,
so this additional return code shouldn't do any harm.

Signed-off-by: Alex Elder <elder@inktank.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/video/backlight/pwm_bl.c: use devm_pwm_get()
Sachin Kamat [Thu, 13 Sep 2012 01:01:11 +0000 (11:01 +1000)]
drivers/video/backlight/pwm_bl.c: use devm_pwm_get()

This file already makes use of device managed functions.  Convert
pwm_get() to use it as well.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Cc: Thierry Reding <thierry.reding@avionic-design.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobacklight: remove ProGear driver
Marcin Juszkiewicz [Thu, 13 Sep 2012 01:01:11 +0000 (11:01 +1000)]
backlight: remove ProGear driver

This driver was for the ProGear webpad device which was produced in
2000/2001 and is not available on a market.  I no longer have this
hardware so can not even check how Linux works on it.

Signed-off-by: Marcin Juszkiewicz <marcin@juszkiewicz.com.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobacklight-add-new-lm3639-backlight-driver-fix-2
G.Shark Jeong [Thu, 13 Sep 2012 01:01:10 +0000 (11:01 +1000)]
backlight-add-new-lm3639-backlight-driver-fix-2

Put NEW_LEDS and LEDS_CLASS into Kconfig file to reduce additional
configuration works in LEDs.

Signed-off-by: G.Shark Jeong <gshark.jeong@gmail.com>
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobacklight-add-new-lm3639-backlight-driver-fix
Andrew Morton [Thu, 13 Sep 2012 01:01:10 +0000 (11:01 +1000)]
backlight-add-new-lm3639-backlight-driver-fix

code layout tweaks

Cc: "G.Shark Jeong" <gshark.jeong@gmail.com>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Cc: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobacklight: add new lm3639 backlight driver
G.Shark Jeong [Thu, 13 Sep 2012 01:01:10 +0000 (11:01 +1000)]
backlight: add new lm3639 backlight driver

This driver is a general version for LM3639 backlgiht + flash driver chip
of TI.

LM3639:
The LM3639 is a single chip LCD Display Backlight driver + white LED
Camera driver.  Programming is done over an I2C compatible interface.
www.ti.com

Signed-off-by: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobacklight-add-backlight-driver-for-lm3630-chip-fix
Andrew Morton [Thu, 13 Sep 2012 01:01:09 +0000 (11:01 +1000)]
backlight-add-backlight-driver-for-lm3630-chip-fix

- make bled_name[] static

- a few coding style tuneups

- create new set_intensity(), partly to avoid awkward layout gymnastics

Cc: "G.Shark Jeong" <gshark.jeong@gmail.com>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Cc: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobacklight: add Backlight driver for lm3630 chip
G.Shark Jeong [Thu, 13 Sep 2012 01:01:09 +0000 (11:01 +1000)]
backlight: add Backlight driver for lm3630 chip

This driver is a general version for LM3630 backlgiht driver chip of TI.

LM3630 :
The LM3630 is a current mode boost converter which supplies the power
and controls the current in two strings of up to 10 LEDs per string.
Programming is done over an I2C compatible interface.
www.ti.com

Signed-off-by: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobacklight: lp855x: add FAST bit description for LP8556
Kim, Milo [Thu, 13 Sep 2012 01:01:09 +0000 (11:01 +1000)]
backlight: lp855x: add FAST bit description for LP8556

LP8556 backlight driver supports fast refresh mode when exiting the low
power mode.  This bit can be configurable in the platform side.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Bryan Wu <bryan.wu@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/video/backlight/kb3886_bl.c: use usleep_range() instead of msleep() for small...
Jingoo Han [Thu, 13 Sep 2012 01:01:08 +0000 (11:01 +1000)]
drivers/video/backlight/kb3886_bl.c: use usleep_range() instead of msleep() for small sleeps

Since msleep() might not sleep for the desired amount when less than 20ms,
use usleep_range().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Claudio Nieder <private@claudio.ch>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/video/backlight/ltv350qv.c: use usleep_range() instead of msleep() for small...
Jingoo Han [Thu, 13 Sep 2012 01:01:08 +0000 (11:01 +1000)]
drivers/video/backlight/ltv350qv.c: use usleep_range() instead of msleep() for small sleeps

Since msleep() might not sleep for the desired amount when less than 20ms,
use usleep_range().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/video/backlight/da9052_bl.c: use usleep_range() instead of msleep() for small...
Jingoo Han [Thu, 13 Sep 2012 01:01:08 +0000 (11:01 +1000)]
drivers/video/backlight/da9052_bl.c: use usleep_range() instead of msleep() for small sleeps

Since msleep() might not sleep for the desired amount when less than 20ms,
use usleep_range().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Ashish Jangam <ashish.jangam@kpitcummins.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/video/backlight/pwm_bl.c: add device tree support for Low Threshold Brightness
Philip, Avinash [Thu, 13 Sep 2012 01:01:07 +0000 (11:01 +1000)]
drivers/video/backlight/pwm_bl.c: add device tree support for Low Threshold Brightness

Low Threshold Brightness should be configured to have a linear relation in
brightness scale.  This patch adds device tree support for low threshold
brightness as optional one for pwm_backlight.

Signed-off-by: Philip, Avinash <avinashphilip@ti.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoMAINTAINERS: Update gianfar_ptp after renaming
Joe Perches [Thu, 13 Sep 2012 01:01:07 +0000 (11:01 +1000)]
MAINTAINERS: Update gianfar_ptp after renaming

commit ec21e2ec36769 ("freescale: Move the Freescale drivers")
moved the files, update the pattern.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoMAINTAINERS: add defconfig file to IMX section
Uwe Kleine-König [Thu, 13 Sep 2012 01:01:07 +0000 (11:01 +1000)]
MAINTAINERS: add defconfig file to IMX section

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoMAINTAINERS: update gpio subsystem file list
Yang Bai [Thu, 13 Sep 2012 01:01:06 +0000 (11:01 +1000)]
MAINTAINERS: update gpio subsystem file list

Signed-off-by: Yang Bai <hamo.by@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib/vsprintf: update documentation to cover all of %p[Mm][FR]
Andy Shevchenko [Thu, 13 Sep 2012 01:01:06 +0000 (11:01 +1000)]
lib/vsprintf: update documentation to cover all of %p[Mm][FR]

Acked-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib: vsprintf: fix broken comments
George Spelvin [Thu, 13 Sep 2012 01:01:06 +0000 (11:01 +1000)]
lib: vsprintf: fix broken comments

Numbering the 8 potential digits 2 though 9 never did make a lot of sense.

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib: vsprintf: optimize put_dec_trunc8()
George Spelvin [Thu, 13 Sep 2012 01:01:06 +0000 (11:01 +1000)]
lib: vsprintf: optimize put_dec_trunc8()

If you're going to have a conditional branch after each 32x32->64-bit
multiply, might as well shrink the code and make it a loop.

This also avoids using the long multiply for small integers.

(This leaves the comments in a confusing state, but that's a separate
patch to make review easier.)

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib: vsprintf: optimize division by 10000
George Spelvin [Thu, 13 Sep 2012 01:01:05 +0000 (11:01 +1000)]
lib: vsprintf: optimize division by 10000

The same multiply-by-inverse technique can be used to convert division by
10000 to a 32x32->64-bit multiply.

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib: vsprintf: optimize division by 10 for small integers
George Spelvin [Thu, 13 Sep 2012 01:01:05 +0000 (11:01 +1000)]
lib: vsprintf: optimize division by 10 for small integers

Shrink the reciprocal approximations used in put_dec_full4() based on the
comments in put_dec_full9().

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agokernel/sys.c: call disable_nonboot_cpus() in kernel_restart()
Shawn Guo [Thu, 13 Sep 2012 01:01:05 +0000 (11:01 +1000)]
kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()

As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
have kernel_restart() call disable_nonboot_cpus().  Doing so can help
machines that require boot cpu be the last alive cpu during reboot to
survive with kernel restart.

This fixes one reboot issue seen on imx6q (Cortex-A9 Quad).  The machine
requires that the restart routine be run on the primary cpu rather than
secondary ones.  Otherwise, the secondary core running the restart routine
will fail to come to online after reboot.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agotile: fix personality bits handling upon exec()
Jiri Kosina [Thu, 13 Sep 2012 01:01:04 +0000 (11:01 +1000)]
tile: fix personality bits handling upon exec()

Historically, the top three bytes of personality have been used for things
such as ADDR_NO_RANDOMIZE, which made sense only for specific
architectures.

We now however have a flag there that is general no matter the
architecture (UNAME26); generally we have to be careful to preserve the
personality flags across exec().

This patch fixes tile architecture not to forcefully overwrite personality
flags during exec().

In addition to that, we fix two other things along the way:

- exec_domain switching is fixed -- set_personality() should always
  be used instead of directly assigning to current->personality.
- as pointed out by Arnd Bergmann, PER_LINUX_32BIT is not used anywhere
  by tile, so let's just drop that in favor of PER_LINUX

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agocross-arch: don't corrupt personality flags upon exec()
Jiri Kosina [Thu, 13 Sep 2012 01:01:04 +0000 (11:01 +1000)]
cross-arch: don't corrupt personality flags upon exec()

Historically, the top three bytes of personality have been used for things
such as ADDR_NO_RANDOMIZE, which made sense only for specific
architectures.

We now however have a flag there that is general no matter the
architecture (UNAME26); generally we have to be careful to preserve the
personality flags across exec().

This patch tries to fix all architectures that forcefully overwrite
personality flags during exec() (ppc32 and s390 have been fixed recently
by commits f9783ec86 and 59e4c3a2f in a similar way already).

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3: lis3lv02d_spi.c: include linux/of.h
Daniel Mack [Thu, 13 Sep 2012 01:01:04 +0000 (11:01 +1000)]
lis3: lis3lv02d_spi.c: include linux/of.h

This include is needed to define of_match:ptr() for !CONFIG_OF &&
1CONFIG_DTC.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/misc/lis3lv02d/lis3lv02d_spi.c: add DT matching table passthru code
Daniel Mack [Thu, 13 Sep 2012 01:01:03 +0000 (11:01 +1000)]
drivers/misc/lis3lv02d/lis3lv02d_spi.c: add DT matching table passthru code

If probed from a device tree, this driver now passes the node information
to the generic part, so the runtime information can be derived.

Successfully tested on a PXA3xx board.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Rob Herring <robherring2@gmail.com>
Cc: "AnilKumar, Chimata" <anilkumar@ti.com>
Reviewed-by: Éric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/misc/lis3lv02d: add generic DT matching code
Daniel Mack [Thu, 13 Sep 2012 01:01:03 +0000 (11:01 +1000)]
drivers/misc/lis3lv02d: add generic DT matching code

Adds logic to parse lis3 properties from a device tree node and store them
in a freshly allocated lis3lv02d_platform_data.

Note that the actual match tables are left out here.  This part should
happen in the drivers that bind to the individual busses (SPI/I2C/PCI).

Also adds some DT bindinds documentation.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Rob Herring <robherring2@gmail.com>
Cc: "AnilKumar, Chimata" <anilkumar@ti.com>
Reviewed-by: Éric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoscore: select generic atomic64_t support
Fengguang Wu [Thu, 13 Sep 2012 01:01:03 +0000 (11:01 +1000)]
score: select generic atomic64_t support

It's required for the core fs/namespace.c and many other basic features.

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofrv: kill used but uninitialized variable
Geert Uytterhoeven [Thu, 13 Sep 2012 01:01:02 +0000 (11:01 +1000)]
frv: kill used but uninitialized variable

Commit 6afe1a1fe8ff83f6a ("PM: Remove legacy PM") removed the
initialization of retval, causing:

arch/frv/kernel/pm.c: In function 'sysctl_pm_do_suspend':
arch/frv/kernel/pm.c:165:5: warning: 'retval' may be used uninitialized in this function [-Wuninitialized]

Remove the variable completely to fix this, and convert to a proper
switch (...) { ... } construct to improve readability.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: compaction: Update try_to_compact_pages()kerneldoc comment
Mel Gorman [Thu, 13 Sep 2012 01:01:02 +0000 (11:01 +1000)]
mm: compaction: Update try_to_compact_pages()kerneldoc comment

Parameters were added without documentation, tut tut.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: compaction: move fatal signal check out of compact_checklock_irqsave
Mel Gorman [Thu, 13 Sep 2012 01:01:02 +0000 (11:01 +1000)]
mm: compaction: move fatal signal check out of compact_checklock_irqsave

c67fe3752 ("mm: compaction: Abort async compaction if locks are contended
or taking too long") addressed a lock contention problem in compaction by
introducing compact_checklock_irqsave() that effecively aborting async
compaction in the event of compaction.

To preserve existing behaviour it also moved a fatal_signal_pending()
check into compact_checklock_irqsave() but that is very misleading.  It
"hides" the check within a locking function but has nothing to do with
locking as such.  It just happens to work in a desirable fashion.

This patch moves the fatal_signal_pending() check to
isolate_migratepages_range() where it belongs.  Arguably the same check
should also happen when isolating pages for freeing but it's overkill.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: khugepaged_prealloc_page() forgot to reset the page alloc indicator
Xiao Guangrong [Thu, 13 Sep 2012 01:01:01 +0000 (11:01 +1000)]
thp: khugepaged_prealloc_page() forgot to reset the page alloc indicator

If NUMA is enabled, the indicator is not reset if the previous page
request failed, ausing us to trigger the BUG_ON() in
khugepaged_alloc_page().

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemory-hotplug: don't replace lowmem pages with highmem
Minchan Kim [Thu, 13 Sep 2012 01:01:01 +0000 (11:01 +1000)]
memory-hotplug: don't replace lowmem pages with highmem

The changelog for 6a6dccba2 ("mm: cma: don't replace lowmem pages with
highmem") mentioned that lowmem pages can be replaced by highmem pages
during CMA migration.  6a6dccba2 fixed that issue.

Quote from that changelog:

:   The filesystem layer expects pages in the block device's mapping to not
:   be in highmem (the mapping's gfp mask is set in bdget()), but CMA can
:   currently replace lowmem pages with highmem pages, leading to crashes in
:   filesystem code such as the one below:
:
:     Unable to handle kernel NULL pointer dereference at virtual address 00000400
:     pgd = c0c98000
:     [00000400] *pgd=00c91831, *pte=00000000, *ppte=00000000
:     Internal error: Oops: 817 [#1] PREEMPT SMP ARM
:     CPU: 0    Not tainted  (3.5.0-rc5+ #80)
:     PC is at __memzero+0x24/0x80
:     ...
:     Process fsstress (pid: 323, stack limit = 0xc0cbc2f0)
:     Backtrace:
:     [<c010e3f0>] (ext4_getblk+0x0/0x180) from [<c010e58c>] (ext4_bread+0x1c/0x98)
:     [<c010e570>] (ext4_bread+0x0/0x98) from [<c0117944>] (ext4_mkdir+0x160/0x3bc)
:      r4:c15337f0
:     [<c01177e4>] (ext4_mkdir+0x0/0x3bc) from [<c00c29e0>] (vfs_mkdir+0x8c/0x98)
:     [<c00c2954>] (vfs_mkdir+0x0/0x98) from [<c00c2a60>] (sys_mkdirat+0x74/0xac)
:      r6:00000000 r5:c152eb40 r4:000001ff r3:c14b43f0
:     [<c00c29ec>] (sys_mkdirat+0x0/0xac) from [<c00c2ab8>] (sys_mkdir+0x20/0x24)
:      r6:beccdcf0 r5:00074000 r4:beccdbbc
:     [<c00c2a98>] (sys_mkdir+0x0/0x24) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30)

Memory-hotplug has same problem as CMA has so the same fix can be applied
to memory-hotplug as well.

Fix it by reusing.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm-page_alloc-refactor-out-__alloc_contig_migrate_alloc-checkpatch-fixes
Andrew Morton [Thu, 13 Sep 2012 01:01:01 +0000 (11:01 +1000)]
mm-page_alloc-refactor-out-__alloc_contig_migrate_alloc-checkpatch-fixes

ERROR: code indent should use tabs where possible
#73: FILE: mm/page_isolation.c:253:
+                             int **resultp)$

WARNING: please, no spaces at the start of a line
#73: FILE: mm/page_isolation.c:253:
+                             int **resultp)$

ERROR: code indent should use tabs where possible
#75: FILE: mm/page_isolation.c:255:
+        gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;$

WARNING: please, no spaces at the start of a line
#75: FILE: mm/page_isolation.c:255:
+        gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;$

ERROR: code indent should use tabs where possible
#77: FILE: mm/page_isolation.c:257:
+        if (PageHighMem(page))$

WARNING: please, no spaces at the start of a line
#77: FILE: mm/page_isolation.c:257:
+        if (PageHighMem(page))$

ERROR: code indent should use tabs where possible
#78: FILE: mm/page_isolation.c:258:
+                gfp_mask |= __GFP_HIGHMEM;$

WARNING: please, no spaces at the start of a line
#78: FILE: mm/page_isolation.c:258:
+                gfp_mask |= __GFP_HIGHMEM;$

ERROR: code indent should use tabs where possible
#80: FILE: mm/page_isolation.c:260:
+        return alloc_page(gfp_mask);$

WARNING: please, no spaces at the start of a line
#80: FILE: mm/page_isolation.c:260:
+        return alloc_page(gfp_mask);$

total: 5 errors, 5 warnings, 48 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-page_alloc-refactor-out-__alloc_contig_migrate_alloc.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/page_alloc: refactor out __alloc_contig_migrate_alloc()
Minchan Kim [Thu, 13 Sep 2012 01:01:01 +0000 (11:01 +1000)]
mm/page_alloc: refactor out __alloc_contig_migrate_alloc()

__alloc_contig_migrate_alloc() can be used by memory-hotplug so refactor
it out (move + rename as a common name) into page_isolation.c.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/hugetlb.c: remove duplicate inclusion of header file
Sachin Kamat [Thu, 13 Sep 2012 01:01:00 +0000 (11:01 +1000)]
mm/hugetlb.c: remove duplicate inclusion of header file

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: compaction: check lock contention first before taking lock
Shaohua Li [Thu, 13 Sep 2012 01:01:00 +0000 (11:01 +1000)]
mm: compaction: check lock contention first before taking lock

isolate_migratepages_range will take zone->lru_lock first and check if the
lock is contented, if yes, it will release the lock.  This isn't
efficient.  If the lock is truly contented, a lock/unlock pair will
increase the lock contention.  We'd better check if the lock is contended
first.  compact_trylock_irqsave perfectly meets the requirement.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix
Andrew Morton [Thu, 13 Sep 2012 01:01:00 +0000 (11:01 +1000)]
mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix

make compact_zone_order() require non-NULL arg `contended'

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: compaction: abort compaction loop if lock is contended or run too long
Shaohua Li [Thu, 13 Sep 2012 01:00:59 +0000 (11:00 +1000)]
mm: compaction: abort compaction loop if lock is contended or run too long

isolate_migratepages_range() might isolate no pages, for example, when
zone->lru_lock is contended and compaction is async.  In this case, we
should abort compaction, otherwise, compact_zone will run a useless loop
and make zone->lru_lock is even contended.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/memblock: cleanup early_node_map[] related comments
Wanpeng Li [Thu, 13 Sep 2012 01:00:59 +0000 (11:00 +1000)]
mm/memblock: cleanup early_node_map[] related comments

0ee332c14518699 ("memblock: Kill early_node_map[]") removed
early_node_map[].  Clean up the comments to comply with that change.

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/memblock: use existing interface to set nid
Wanpeng Li [Thu, 13 Sep 2012 01:00:59 +0000 (11:00 +1000)]
mm/memblock: use existing interface to set nid

Use the existing interface function to set the NUMA node ID (NID) for the
regions, either memory or reserved region.

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/memblock: reduce overhead in binary search
Wanpeng Li [Thu, 13 Sep 2012 01:00:58 +0000 (11:00 +1000)]
mm/memblock: reduce overhead in binary search

When checking that the indicated address belongs to the memory region, the
memory regions are checked one by one through a binary search, which will
be time consuming.

If the indicated address isn't in the memory region, then we needn't do
the time-consuming search.  Add a check on the indicated address for that
purpose.

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoswap-add-a-simple-detector-for-inappropriate-swapin-readahead-fix
Andrew Morton [Thu, 13 Sep 2012 01:00:54 +0000 (11:00 +1000)]
swap-add-a-simple-detector-for-inappropriate-swapin-readahead-fix

tweak code comment

Cc: Hugh Dickins <hughd@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoswap: add a simple detector for inappropriate swapin readahead
Shaohua Li [Thu, 13 Sep 2012 00:59:00 +0000 (10:59 +1000)]
swap: add a simple detector for inappropriate swapin readahead

The swapin readahead does a blind readahead whether or not the swapin is
sequential.  This is ok for harddisk because large reads have relatively
small costs and if the readahead pages are unneeded they can be reclaimed
easily.  But for SSD devices large reads are more expensive than small
one.  If readahead pages are unneeded, reading them in caused significant
overhead

This patch addes a simple random read detection similar to file mmap
readahead.  If a random read is detected, swapin readahead will be
skipped.  This improves a lot for a swap workload with random IO in a fast
SSD.

I run anonymous mmap write micro benchmark, which will triger swapin/swapout.

runtime changes with patch
randwrite harddisk -38.7%
seqwrite harddisk -1.1%
randwrite SSD -46.9%
seqwrite SSD +0.3%

For both harddisk and SSD, the randwrite swap workload run time is reduced
significantly.  Sequential write swap workload hasn't chanage.

Interestingly, the randwrite harddisk test is improved too.  This might be
because swapin readahead needs to allocate extra memory, which further
tights memory pressure, so more swapout/swapin.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoatomic-implement-generic-atomic_dec_if_positive-fix
Andrew Morton [Thu, 13 Sep 2012 22:49:24 +0000 (08:49 +1000)]
atomic-implement-generic-atomic_dec_if_positive-fix

do the "#define foo foo" trick in the conventional manner

Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoatomic: implement generic atomic_dec_if_positive()
Shaohua Li [Thu, 13 Sep 2012 22:49:20 +0000 (08:49 +1000)]
atomic: implement generic atomic_dec_if_positive()

The x86 implementation of atomic_dec_if_positive is quite generic, so make
it available to all architectures.

This is needed for "swap: add a simple detector for inappropriate swapin
readahead".

Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemory-hotplug-fix-pages-missed-by-race-rather-than-failng-fix
Andrew Morton [Thu, 13 Sep 2012 00:59:00 +0000 (10:59 +1000)]
memory-hotplug-fix-pages-missed-by-race-rather-than-failng-fix

small cleanup

Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemory-hotplug: fix pages missed by race rather than failing
Minchan Kim [Thu, 13 Sep 2012 00:59:00 +0000 (10:59 +1000)]
memory-hotplug: fix pages missed by race rather than failing

If race between allocation and isolation in memory-hotplug offline
happens, some pages could be in MIGRATE_MOVABLE of free_list although the
pageblock's migratetype of the page is MIGRATE_ISOLATE.

The race could be detected by get_freepage_migratetype in
__test_page_isolated_in_pageblock.  If it is detected, now EBUSY gets
bubbled all the way up and the hotplug operations fails.

But better idea is instead of returning and failing memory-hotremove, move
the free page to the correct list at the time it is detected.  It could
enhance memory-hotremove operation success ratio although the race is
really rare.

Suggested by Mel Gorman.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemory-hotplug: bug fix race between isolation and allocation
Minchan Kim [Thu, 13 Sep 2012 00:59:00 +0000 (10:59 +1000)]
memory-hotplug: bug fix race between isolation and allocation

Like below, memory-hotplug makes race between page-isolation
and page-allocation so it can hit BUG_ON in __offline_isolated_pages.

CPU A CPU B

start_isolate_page_range
set_migratetype_isolate
spin_lock_irqsave(zone->lock)

free_hot_cold_page(Page A)
/* without zone->lock */
migratetype = get_pageblock_migratetype(Page A);
/*
 * Page could be moved into MIGRATE_MOVABLE
 * of per_cpu_pages
 */
list_add_tail(&page->lru, &pcp->lists[migratetype]);

set_pageblock_isolate
move_freepages_block
drain_all_pages

/* Page A could be in MIGRATE_MOVABLE of free_list. */

check_pages_isolated
__test_page_isolated_in_pageblock
/*
 * We can't catch freed page which
 * is free_list[MIGRATE_MOVABLE]
 */
if (PageBuddy(page A))
pfn += 1 << page_order(page A);

/* So, Page A could be allocated */

__offline_isolated_pages
/*
 * BUG_ON hit or offline page
 * which is used by someone
 */
BUG_ON(!PageBuddy(page A));

This patch checks page's migratetype in freelist in
__test_page_isolated_in_pageblock.  So now
__test_page_isolated_in_pageblock can check the page caused by above race
and can fail of memory offlining.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: remain migratetype in freed page
Minchan Kim [Thu, 13 Sep 2012 00:58:59 +0000 (10:58 +1000)]
mm: remain migratetype in freed page

The page allocator caches the pageblock information in page->private while
it is in the PCP freelists but this is overwritten with the order of the
page when freed to the buddy allocator.  This patch stores the migratetype
of the page in the page->index field so that it is available at all times
when the page remain in free_list.

This patch adds a new call site in __free_pages_ok so it might be overhead
a bit but it's for high order allocation.  So I believe damage isn't hurt.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: page_alloc: use get_freepage_migratetype() instead of page_private()
Minchan Kim [Thu, 13 Sep 2012 00:58:59 +0000 (10:58 +1000)]
mm: page_alloc: use get_freepage_migratetype() instead of page_private()

The page allocator uses set_page_private and page_private for handling
migratetype when it frees page.  Let's replace them with [set|get]
_freepage_migratetype to make it more clear.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: cma: discard clean pages during contiguous allocation instead of migration
Minchan Kim [Thu, 13 Sep 2012 00:58:57 +0000 (10:58 +1000)]
mm: cma: discard clean pages during contiguous allocation instead of migration

Drop clean cache pages instead of migration during alloc_contig_range() to
minimise allocation latency by reducing the amount of migration that is
necessary.  It's useful for CMA because latency of migration is more
important than evicting the background process's working set.  In
addition, as pages are reclaimed then fewer free pages for migration
targets are required so it avoids memory reclaiming to get free pages,
which is a contributory factor to increased latency.

I measured elapsed time of __alloc_contig_migrate_range() which migrates
10M in 40M movable zone in QEMU machine.

Before - 146ms, After - 7ms

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Rik van Riel <riel@redhat.com>
Tested-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: mmu_notifier: make the mmu_notifier srcu static
Andrea Arcangeli [Thu, 13 Sep 2012 00:58:57 +0000 (10:58 +1000)]
mm: mmu_notifier: make the mmu_notifier srcu static

The variable must be static especially given the variable name.

s/RCU/SRCU/ over a few comments.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemory-hotplug: build zonelists when offlining pages
Xishi Qiu [Thu, 13 Sep 2012 00:58:57 +0000 (10:58 +1000)]
memory-hotplug: build zonelists when offlining pages

online_pages() does build_all_zonelists() and zone_pcp_update(), I think
offline_pages() should do it too.

When the zone has no memory to allocate, remove it from other nodes'
zonelists.  zone_batchsize() depends on zone's present pages, if zone's
present pages are changed, zone's pcp should be updated.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: avoid taking rmap locks in move_ptes()
Michel Lespinasse [Thu, 13 Sep 2012 00:58:56 +0000 (10:58 +1000)]
mm: avoid taking rmap locks in move_ptes()

During mremap(), the destination VMA is generally placed after the
original vma in rmap traversal order: in move_vma(), we always have
new_pgoff >= vma->vm_pgoff, and as a result new_vma->vm_pgoff >=
vma->vm_pgoff unless vma_merge() merged the new vma with an adjacent one.

When the destination VMA is placed after the original in rmap traversal
order, we can avoid taking the rmap locks in move_ptes().

Essentially, this reintroduces the optimization that had been disabled in
"mm anon rmap: remove anon_vma_moveto_tail".  The difference is that we
don't try to impose the rmap traversal order; instead we just rely on
things being in the desired order in the common case and fall back to
taking locks in the uncommon case.  Also we skip the i_mmap_mutex in
addition to the anon_vma lock: in both cases, the vmas are traversed in
increasing vm_pgoff order with ties resolved in tree insertion order.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: add CONFIG_DEBUG_VM_RB build option
Michel Lespinasse [Thu, 13 Sep 2012 00:58:56 +0000 (10:58 +1000)]
mm: add CONFIG_DEBUG_VM_RB build option

Add a CONFIG_DEBUG_VM_RB build option for the previously existing
DEBUG_MM_RB code.  Now that Andi Kleen modified it to avoid using
recursive algorithms, we can expose it a bit more.

Also extend this code to validate_mm() after stack expansion, and to check
that the vma's start and last pgoffs have not changed since the nodes were
inserted on the anon vma interval tree (as it is important that the nodes
be reindexed after each such update).

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm rmap: remove vma_address check for address inside vma
Michel Lespinasse [Thu, 13 Sep 2012 00:58:56 +0000 (10:58 +1000)]
mm rmap: remove vma_address check for address inside vma

In file and anon rmap, we use interval trees to find potentially relevant
vmas and then call vma_address() to find the virtual address the given
page might be found at in these vmas.  vma_address() used to include a
check that the returned address falls within the limits of the vma, but
this check isn't necessary now that we always use interval trees in rmap:
the interval tree just doesn't return any vmas which this check would find
to be irrelevant.  As a result, we can replace the use of -EFAULT error
code (which then needed to be checked in every call site) with a
VM_BUG_ON().

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm anon rmap: replace same_anon_vma linked list with an interval tree.
Michel Lespinasse [Thu, 13 Sep 2012 00:58:55 +0000 (10:58 +1000)]
mm anon rmap: replace same_anon_vma linked list with an interval tree.

When a large VMA (anon or private file mapping) is first touched, which
will populate its anon_vma field, and then split into many regions through
the use of mprotect(), the original anon_vma ends up linking all of the
vmas on a linked list.  This can cause rmap to become inefficient, as we
have to walk potentially thousands of irrelevent vmas before finding the
one a given anon page might fall into.

By replacing the same_anon_vma linked list with an interval tree (where
each avc's interval is determined by its vma's start and last pgoffs), we
can make rmap efficient for this use case again.

While the change is large, all of its pieces are fairly simple.

Most places that were walking the same_anon_vma list were looking for a
known pgoff, so they can just use the anon_vma_interval_tree_foreach()
interval tree iterator instead.  The exception here is ksm, where the
page's index is not known.  It would probably be possible to rework ksm so
that the index would be known, but for now I have decided to keep things
simple and just walk the entirety of the interval tree there.

When updating vma's that already have an anon_vma assigned, we must take
care to re-index the corresponding avc's on their interval tree.  This is
done through the use of anon_vma_interval_tree_pre_update_vma() and
anon_vma_interval_tree_post_update_vma(), which remove the avc's from
their interval tree before the update and re-insert them after the update.
 The anon_vma stays locked during the update, so there is no chance that
rmap would miss the vmas that are being updated.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm anon rmap: remove anon_vma_moveto_tail
Michel Lespinasse [Thu, 13 Sep 2012 00:58:55 +0000 (10:58 +1000)]
mm anon rmap: remove anon_vma_moveto_tail

mremap() had a clever optimization where move_ptes() did not take the
anon_vma lock to avoid a race with anon rmap users such as page migration.
 Instead, the avc's were ordered in such a way that the origin vma was
always visited by rmap before the destination.  This ordering and the use
of page table locks rmap usage safe.  However, we want to replace the use
of linked lists in anon rmap with an interval tree, and this will make it
harder to impose such ordering as the interval tree will always be sorted
by the avc->vma->vm_pgoff value.  For now, let's replace the
anon_vma_moveto_tail() ordering function with proper anon_vma locking in
move_ptes().  Once we have the anon interval tree in place, we will
re-introduce an optimization to avoid taking these locks in the most
common cases.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: interval tree updates
Michel Lespinasse [Thu, 13 Sep 2012 00:58:55 +0000 (10:58 +1000)]
mm: interval tree updates

Update the generic interval tree code that was introduced in "mm: replace
vma prio_tree with an interval tree".

Changes:

- fixed 'endpoing' typo noticed by Andrew Morton

- replaced include/linux/interval_tree_tmpl.h, which was used as a
  template (including it automatically defined the interval tree
  functions) with include/linux/interval_tree_generic.h, which only
  defines a preprocessor macro INTERVAL_TREE_DEFINE(), which itself
  defines the interval tree functions when invoked. Now that is a very
  long macro which is unfortunate, but it does make the usage sites
  (lib/interval_tree.c and mm/interval_tree.c) a bit nicer than previously.

- make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro,
  instead of duplicating that code in the interval tree template.

- replaced vma_interval_tree_add(), which was actually handling the
  nonlinear and interval tree cases, with vma_interval_tree_insert_after()
  which handles only the interval tree case and has an API that is more
  consistent with the other interval tree handling functions.
  The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap().

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: move augmented rbtree functionality to rbtree_augmented.h
Michel Lespinasse [Thu, 13 Sep 2012 00:58:54 +0000 (10:58 +1000)]
rbtree: move augmented rbtree functionality to rbtree_augmented.h

Provide rb_insert_augmented() and rb_erase_augmented through a new
rbtree_augmented.h include file.  rb_erase_augmented() is defined there as
an __always_inline function, in order to allow inlining of augmented
rbtree callbacks into it.  Since this generates a relatively large
function, each augmented rbtree users should make sure to have a single
call site.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoprio_tree: remove
Michel Lespinasse [Thu, 13 Sep 2012 00:58:54 +0000 (10:58 +1000)]
prio_tree: remove

After both prio_tree users have been converted to use red-black trees,
there is no need to keep around the prio tree library anymore.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agokmemleak: use rbtree instead of prio tree
Michel Lespinasse [Thu, 13 Sep 2012 00:58:54 +0000 (10:58 +1000)]
kmemleak: use rbtree instead of prio tree

kmemleak uses a tree where each node represents an allocated memory object
in order to quickly find out what object a given address is part of.
However, the objects don't overlap, so rbtrees are a better choice than
prio tree for this use.  They are both faster and have lower memory
overhead.

Tested by booting a kernel with kmemleak enabled, loading the
kmemleak_test module, and looking for the expected messages.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: replace vma prio_tree with an interval tree
Michel Lespinasse [Thu, 13 Sep 2012 00:58:53 +0000 (10:58 +1000)]
mm: replace vma prio_tree with an interval tree

Implement an interval tree as a replacement for the VMA prio_tree.  The
algorithms are similar to lib/interval_tree.c; however that code can't be
directly reused as the interval endpoints are not explicitly stored in the
VMA.  So instead, the common algorithm is moved into a template and the
details (node type, how to get interval endpoints from the node, etc) are
filled in using the C preprocessor.

Once the interval tree functions are available, using them as a
replacement to the VMA prio tree is a relatively simple, mechanical job.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: add prio tree and interval tree tests
Michel Lespinasse [Thu, 13 Sep 2012 00:58:53 +0000 (10:58 +1000)]
rbtree: add prio tree and interval tree tests

Patch 1 implements support for interval trees, on top of the augmented
rbtree API. It also adds synthetic tests to compare the performance of
interval trees vs prio trees. Short answers is that interval trees are
slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x)
on search. It is debatable how realistic the synthetic test is, and I have
not made such measurements yet, but my impression is that interval trees
would still come out faster.

Patch 2 uses a preprocessor template to make the interval tree generic,
and uses it as a replacement for the vma prio_tree.

Patch 3 takes the other prio_tree user, kmemleak, and converts it to use
a basic rbtree. We don't actually need the augmented rbtree support here
because the intervals are always non-overlapping.

Patch 4 removes the now-unused prio tree library.

Patch 5 proposes an additional optimization to rb_erase_augmented, now
providing it as an inline function so that the augmented callbacks can be
inlined in. This provides an additional 5-10% performance improvement
for the interval tree insert/erase benchmark. There is a maintainance cost
as it exposes augmented rbtree users to some of the rbtree library internals;
however I think this cost shouldn't be too high as I expect the augmented
rbtree will always have much less users than the base rbtree.

I should probably add a quick summary of why I think it makes sense to
replace prio trees with augmented rbtree based interval trees now.  One of
the drivers is that we need augmented rbtrees for Rik's vma gap finding
code, and once you have them, it just makes sense to use them for interval
trees as well, as this is the simpler and more well known algorithm.  prio
trees, in comparison, seem *too* clever: they impose an additional 'heap'
constraint on the tree, which they use to guarantee a faster worst-case
complexity of O(k+log N) for stabbing queries in a well-balanced prio
tree, vs O(k*log N) for interval trees (where k=number of matches,
N=number of intervals).  Now this sounds great, but in practice prio trees
don't realize this theorical benefit.  First, the additional constraint
makes them harder to update, so that the kernel implementation has to
simplify things by balancing them like a radix tree, which is not always
ideal.  Second, the fact that there are both index and heap properties
makes both tree manipulation and search more complex, which results in a
higher multiplicative time constant.  As it turns out, the simple interval
tree algorithm ends up running faster than the more clever prio tree.

This patch:

Add two test modules:

- prio_tree_test measures the performance of lib/prio_tree.c, both for
  insertion/removal and for stabbing searches

- interval_tree_test measures the performance of a library of equivalent
  functionality, built using the augmented rbtree support.

In order to support the second test module, lib/interval_tree.c is
introduced. It is kept separate from the interval_tree_test main file
for two reasons: first we don't want to provide an unfair advantage
over prio_tree_test by having everything in a single compilation unit,
and second there is the possibility that the interval tree functionality
could get some non-test users in kernel over time.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: add RB_DECLARE_CALLBACKS() macro
Michel Lespinasse [Thu, 13 Sep 2012 00:58:53 +0000 (10:58 +1000)]
rbtree: add RB_DECLARE_CALLBACKS() macro

As proposed by Peter Zijlstra, this makes it easier to define the augmented
rbtree callbacks.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: remove prior augmented rbtree implementation
Michel Lespinasse [Thu, 13 Sep 2012 00:58:52 +0000 (10:58 +1000)]
rbtree: remove prior augmented rbtree implementation

convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree api
and remove the old augmented rbtree implementation.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: faster augmented rbtree manipulation
Michel Lespinasse [Thu, 13 Sep 2012 00:58:52 +0000 (10:58 +1000)]
rbtree: faster augmented rbtree manipulation

Introduce new augmented rbtree APIs that allow minimal recalculation of
augmented node information.

A new callback is added to the rbtree insertion and erase rebalancing
functions, to be called on each tree rotations. Such rotations preserve
the subtree's root augmented value, but require recalculation of the one
child that was previously located at the subtree root.

In the insertion case, the handcoded search phase must be updated to
maintain the augmented information on insertion, and then the rbtree
coloring/rebalancing algorithms keep it up to date.

In the erase case, things are more complicated since it is library
code that manipulates the rbtree in order to remove internal nodes.
This requires a couple additional callbacks to copy a subtree's
augmented value when a new root is stitched in, and to recompute
augmented values down the ancestry path when a node is removed from
the tree.

In order to preserve maximum speed for the non-augmented case,
we provide two versions of each tree manipulation function.
rb_insert_augmented() is the augmented equivalent of rb_insert_color(),
and rb_erase_augmented() is the augmented equivalent of rb_erase().

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: augmented rbtree test
Michel Lespinasse [Thu, 13 Sep 2012 00:58:52 +0000 (10:58 +1000)]
rbtree: augmented rbtree test

Small test to measure the performance of augmented rbtrees.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: low level optimizations in rb_erase()
Michel Lespinasse [Thu, 13 Sep 2012 00:58:52 +0000 (10:58 +1000)]
rbtree: low level optimizations in rb_erase()

Various minor optimizations in rb_erase():
- Avoid multiple loading of node->__rb_parent_color when computing parent
  and color information (possibly not in close sequence, as there might
  be further branches in the algorithm)
- In the 1-child subcase of case 1, copy the __rb_parent_color field from
  the erased node to the child instead of recomputing it from the desired
  parent and color
- When searching for the erased node's successor, differentiate between
  cases 2 and 3 based on whether any left links were followed. This avoids
  a condition later down.
- In case 3, keep a pointer to the erased node's right child so we don't
  have to refetch it later to adjust its parent.
- In the no-childs subcase of cases 2 and 3, place the rebalance assigment
  last so that the compiler can remove the following if(rebalance) test.

Also, added some comments to illustrate cases 2 and 3.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: handle 1-child recoloring in rb_erase() instead of rb_erase_color()
Michel Lespinasse [Thu, 13 Sep 2012 00:58:51 +0000 (10:58 +1000)]
rbtree: handle 1-child recoloring in rb_erase() instead of rb_erase_color()

An interesting observation for rb_erase() is that when a node has
exactly one child, the node must be black and the child must be red.
An interesting consequence is that removing such a node can be done by
simply replacing it with its child and making the child black,
which we can do efficiently in rb_erase(). __rb_erase_color() then
only needs to handle the no-childs case and can be modified accordingly.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: place easiest case first in rb_erase()
Michel Lespinasse [Thu, 13 Sep 2012 00:58:51 +0000 (10:58 +1000)]
rbtree: place easiest case first in rb_erase()

In rb_erase, move the easy case (node to erase has no more than
1 child) first. I feel the code reads easier that way.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: add __rb_change_child() helper function
Michel Lespinasse [Thu, 13 Sep 2012 00:58:51 +0000 (10:58 +1000)]
rbtree: add __rb_change_child() helper function

Add __rb_change_child() as an inline helper function to replace code that
would otherwise be duplicated 4 times in the source.

No changes to binary size or speed.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree test: fix sparse warning about 64-bit constant
Michel Lespinasse [Thu, 13 Sep 2012 00:58:50 +0000 (10:58 +1000)]
rbtree test: fix sparse warning about 64-bit constant

Just a small fix to make sparse happy.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reported-by: Fengguang Wu <wfg@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: optimize fetching of sibling node
Michel Lespinasse [Thu, 13 Sep 2012 00:58:50 +0000 (10:58 +1000)]
rbtree: optimize fetching of sibling node

When looking to fetch a node's sibling, we went through a sequence of:
- check if node is the parent's left child
- if it is, then fetch the parent's right child

This can be replaced with:
- fetch the parent's right child as an assumed sibling
- check that node is NOT the fetched child

This avoids fetching the parent's left child when node is actually
that child. Saves a bit on code size, though it doesn't seem to make
a large difference in speed.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: coding style adjustments
Michel Lespinasse [Thu, 13 Sep 2012 00:58:50 +0000 (10:58 +1000)]
rbtree: coding style adjustments

Set comment and indentation style to be consistent with linux coding style
and the rest of the file, as suggested by Peter Zijlstra

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: low level optimizations in __rb_erase_color()
Michel Lespinasse [Thu, 13 Sep 2012 00:58:49 +0000 (10:58 +1000)]
rbtree: low level optimizations in __rb_erase_color()

In __rb_erase_color(), we often already have pointers to the nodes being
rotated and/or know what their colors must be, so we can generate more
efficient code than the generic __rb_rotate_left() and __rb_rotate_right()
functions.

Also when the current node is red or when flipping the sibling's color,
the parent is already known so we can use the more efficient
rb_set_parent_color() function to set the desired color.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>