karo-tx-linux.git
12 years ago rtc: add MAX8907 RTC driver
Stephen Warren [Fri, 7 Sep 2012 00:24:57 +0000 (10:24 +1000)]
rtc: add MAX8907 RTC driver

The MAX8907 is an I2C-based power-management IC containing voltage
regulators, a reset controller, a real-time clock, and a touch-screen
controller.

The driver is based on an original written or fixed by:
* Tom Cherry
* Prashant Gaikwad
* Joseph Yoon

During upstreaming, I (swarren):
* Converted to regmap.
* Fixed handling of RTC_HOUR register containing 12.
* Fixed handling of RTC_WEEKDAY register.
* General cleanup.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Cc: Tom Cherry <tcherry@nvidia.com>
Cc: Prashant Gaikwad <pgaikwad@nvidia.com>
Cc: Joseph Yoon <tyoon@nvidia.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rtc: tps65910: add RTC driver for TPS65910 PMIC RTC
Venu Byravarasu [Fri, 7 Sep 2012 00:24:56 +0000 (10:24 +1000)]
rtc: tps65910: add RTC driver for TPS65910 PMIC RTC

The TPS65910 PMIC is an MFD with an RTC as one of its devices.  Add an RTC
driver to support the RTC device inside the TPS65910 PMIC.

Only support for RTC alarm is implemented as part of this patch.

Signed-off-by: Venu Byravarasu <vbyravarasu@nvidia.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/rtc/rtc-at91sam9.c: use module_platform_driver() macro
Devendra Naga [Fri, 7 Sep 2012 00:24:56 +0000 (10:24 +1000)]
drivers/rtc/rtc-at91sam9.c: use module_platform_driver() macro

This driver only calls platform_driver_register() in its init function
and platform_driver_unregister() in its exit function, so replace all of
this code, including module_init() and module_exit(), with the
module_platform_driver() macro.

Signed-off-by: Devendra Naga <develkernel412222@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rtc: recycle id when unloading a rtc driver
Vincent Palatin [Fri, 7 Sep 2012 00:24:56 +0000 (10:24 +1000)]
rtc: recycle id when unloading a rtc driver

When calling rtc_device_unregister, we are not freeing the id used by the
driver.  So when doing an unload/load cycle for an RTC driver (e.g.  rmmod
rtc_cmos && modprobe rtc_cmos), its id is incremented by one.  As a
consequence, we no longer have either an rtc0 driver or a
/proc/driver/rtc (as it only exists for the first driver).
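
The effect of recycling the id can be sketched in plain userspace C.  This
is only an illustration under stated assumptions: the fixed-size pool and
the names rtc_id_get() and rtc_id_put() are hypothetical and are not the
kernel's id allocation API.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 8	/* hypothetical pool size for this sketch */

static bool id_in_use[MAX_IDS];

/* Hand out the lowest free id, or -1 if the pool is exhausted. */
static int rtc_id_get(void)
{
	for (int i = 0; i < MAX_IDS; i++) {
		if (!id_in_use[i]) {
			id_in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release an id on unregister so the next registration can reuse it. */
static void rtc_id_put(int id)
{
	assert(id >= 0 && id < MAX_IDS);
	id_in_use[id] = false;
}
```

With the release step in place, an unload followed by a load gets id 0
again, so the device would come back as rtc0 rather than rtc1.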

Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rtc: snvs: change timeout to use a fixed number of loop
Shawn Guo [Fri, 7 Sep 2012 00:24:55 +0000 (10:24 +1000)]
rtc: snvs: change timeout to use a fixed number of loop

Andrew Morton <akpm@linux-foundation.org> wrote:

> The timeout code here is fragile.  If acquiring the spinlock takes more
> than a millisecond or if this thread gets interrupted or preempted then
> we could easily execute that loop just a single time, and fail.
>
> It would be better to retry a fixed number of times, say 1000?  That
> would take around 1 millisecond, but might be overkill.

Take Andrew's suggestion to change the timeout code to retry 1000
times.
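
The idea of a bounded retry count, as opposed to a wall-clock deadline, can
be sketched in userspace C as follows.  This is not the driver's code: the
helper wait_fixed_retries() and the countdown_ready() condition are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RETRIES 1000	/* retry count suggested in the review */

/*
 * Poll ready(ctx) up to MAX_RETRIES times.  Returns 0 once the condition
 * holds, or -ETIMEDOUT if it never did.  Because the loop counts polls
 * rather than elapsed time, being preempted cannot shorten the wait to a
 * single attempt.
 */
static int wait_fixed_retries(bool (*ready)(void *ctx), void *ctx)
{
	assert(ready != NULL);
	for (int i = 0; i < MAX_RETRIES; i++) {
		if (ready(ctx))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Example condition: becomes true after a given number of polls. */
static bool countdown_ready(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```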

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Kim Phillips <kim.phillips@freescale.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rtc: snvs: add Freescale rtc-snvs driver
Shawn Guo [Fri, 7 Sep 2012 00:24:55 +0000 (10:24 +1000)]
rtc: snvs: add Freescale rtc-snvs driver

Add an RTC driver for Freescale Secure Non-Volatile Storage (SNVS)
Low Power (LP) RTC.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Kim Phillips <kim.phillips@freescale.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rtc-add-dallas-ds2404-driver-fix
Andrew Morton [Fri, 7 Sep 2012 00:24:55 +0000 (10:24 +1000)]
rtc-add-dallas-ds2404-driver-fix

drivers/rtc/rtc-ds2404.c:23:1: warning: "DEBUG" redefined <command-line>:
warning: this is the location of the previous definition

Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rtc: add Dallas DS2404 driver
Sven Schnelle [Fri, 7 Sep 2012 00:24:54 +0000 (10:24 +1000)]
rtc: add Dallas DS2404 driver

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rtc-proc: permit the /proc/driver/rtc device to use other devices
Kim, Milo [Fri, 7 Sep 2012 00:24:54 +0000 (10:24 +1000)]
rtc-proc: permit the /proc/driver/rtc device to use other devices

To get time information via /proc/driver/rtc, only the first device (rtc0)
is used.  If rtcN (e.g. rtc1 or rtc2) is used for the system clock, there
is no way to get information about rtcN via /proc/driver/rtc.  With this
patch, the time data can be retrieved from the system clock RTC.

If RTC_HCTOSYS_DEVICE is not defined, then rtc0 is used by default.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/rtc/rtc-isl1208.c: add support for the ISL1218
Ben Gardner [Fri, 7 Sep 2012 00:24:54 +0000 (10:24 +1000)]
drivers/rtc/rtc-isl1208.c: add support for the ISL1218

The ISL1218 chip is identical to the ISL1208, except that it has 6
additional user-storage registers.  This patch does not enable access to
those additional registers, but only adds the chip name to the list.

Signed-off-by: Ben Gardner <gardner.ben@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago epoll: support for disabling items, and a self-test app
Paton J. Lewis [Fri, 7 Sep 2012 00:24:53 +0000 (10:24 +1000)]
epoll: support for disabling items, and a self-test app

Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll
item.  If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
delete the epoll item in a multi-threaded environment.  Also added a new
test_epoll self-test app to both demonstrate the need for this feature
and test it.

Signed-off-by: Paton J. Lewis <palewis@adobe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Paul Holland <pholland@adobe.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers-firmware-dmi_scanc-fetch-dmi-version-from-smbios-if-it-exists-checkpatch...
Andrew Morton [Fri, 7 Sep 2012 00:24:53 +0000 (10:24 +1000)]
drivers-firmware-dmi_scanc-fetch-dmi-version-from-smbios-if-it-exists-checkpatch-fixes

WARNING: Prefer pr_info(... to printk(KERN_INFO, ...
#56: FILE: drivers/firmware/dmi_scan.c:426:
+ printk(KERN_INFO "SMBIOS %d.%d present.\n",

WARNING: Prefer pr_info(... to printk(KERN_INFO, ...
#61: FILE: drivers/firmware/dmi_scan.c:431:
+ printk(KERN_INFO "Legacy DMI %d.%d present.\n",

WARNING: Prefer pr_debug(... to printk(KERN_DEBUG, ...
#85: FILE: drivers/firmware/dmi_scan.c:455:
+ printk(KERN_DEBUG "SMBIOS version fixup(2.%d->2.%d)\n",

WARNING: Prefer pr_debug(... to printk(KERN_DEBUG, ...
#90: FILE: drivers/firmware/dmi_scan.c:460:
+ printk(KERN_DEBUG "SMBIOS version fixup(2.%d->2.%d)\n",

total: 0 errors, 4 warnings, 104 lines checked

./patches/drivers-firmware-dmi_scanc-fetch-dmi-version-from-smbios-if-it-exists.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists
Zhenzhong Duan [Fri, 7 Sep 2012 00:24:53 +0000 (10:24 +1000)]
drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists

The correct DMI version is in SMBIOS if it is zero in the DMI region.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers-firmware-dmi_scanc-check-dmi-version-when-get-system-uuid-fix
Andrew Morton [Fri, 7 Sep 2012 00:24:52 +0000 (10:24 +1000)]
drivers-firmware-dmi_scanc-check-dmi-version-when-get-system-uuid-fix

tweak code comment

Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/firmware/dmi_scan.c: check dmi version when get system uuid
Zhenzhong Duan [Fri, 7 Sep 2012 00:24:52 +0000 (10:24 +1000)]
drivers/firmware/dmi_scan.c: check dmi version when get system uuid

As of version 2.6 of the SMBIOS specification, the first 3
fields of the UUID are supposed to be little-endian encoded.
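
What the encoding rule means for printing a UUID can be sketched in
userspace C.  This is an illustration only, not the dmi_scan.c change
itself: the helpers smbios_uuid_is_le() and format_uuid() are hypothetical
names, and the field widths (4, 2 and 2 bytes for the first three fields)
are the usual UUID layout.

```c
#include <assert.h>
#include <stdio.h>

/* Version 2.6 and later encode the first three fields little-endian. */
static int smbios_uuid_is_le(unsigned major, unsigned minor)
{
	return major > 2 || (major == 2 && minor >= 6);
}

/*
 * Format 16 raw UUID bytes as text.  When first3_le is non-zero, the
 * first three fields are read as little-endian, so their bytes are
 * printed in reversed order; the last 8 bytes are printed as stored.
 */
static void format_uuid(const unsigned char d[16], int first3_le,
			char out[37])
{
	assert(d && out);
	if (first3_le)
		snprintf(out, 37,
			 "%02X%02X%02X%02X-%02X%02X-%02X%02X-"
			 "%02X%02X-%02X%02X%02X%02X%02X%02X",
			 d[3], d[2], d[1], d[0], d[5], d[4], d[7], d[6],
			 d[8], d[9], d[10], d[11], d[12], d[13], d[14],
			 d[15]);
	else
		snprintf(out, 37,
			 "%02X%02X%02X%02X-%02X%02X-%02X%02X-"
			 "%02X%02X-%02X%02X%02X%02X%02X%02X",
			 d[0], d[1], d[2], d[3], d[4], d[5], d[6], d[7],
			 d[8], d[9], d[10], d[11], d[12], d[13], d[14],
			 d[15]);
}
```

A caller would pass smbios_uuid_is_le(major, minor) as first3_le, so the
same raw bytes print differently depending on the reported version.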

Also a minor fix to match variable meaning and mute checkpatch.pl

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib-parserc-avoid-overflow-in-match_number-fix
Andrew Morton [Fri, 7 Sep 2012 00:24:52 +0000 (10:24 +1000)]
lib-parserc-avoid-overflow-in-match_number-fix

coding-style tweaks

Cc: Alex Elder <elder@inktank.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib/parser.c: avoid overflow in match_number()
Alex Elder [Fri, 7 Sep 2012 00:24:51 +0000 (10:24 +1000)]
lib/parser.c: avoid overflow in match_number()

The result of converting an integer value to another signed integer type
that's unable to represent the original value is implementation defined.
(See notes in section 6.3.1.3 of the C standard.)

In match_number(), the result of simple_strtol() (which returns type long)
is assigned to a value of type int.

Instead, handle the result of simple_strtol() in a well-defined way, and
return -ERANGE if the result won't fit in the int variable used to hold
the parsed result.

No current callers pay attention to the particular error value returned,
so this additional return code shouldn't do any harm.
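
The range check described above can be sketched in userspace C.  This is
not the lib/parser.c code: parse_int() is a hypothetical helper, and the
standard strtol() stands in for the kernel's simple_strtol().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse s in the given base into *result.
 * Returns 0 on success, -EINVAL if s is not entirely a number, and
 * -ERANGE if the value does not fit in an int.  *result is only written
 * on success.
 */
static int parse_int(const char *s, int base, int *result)
{
	char *end;
	long val;

	assert(s && result);
	errno = 0;
	val = strtol(s, &end, base);
	if (end == s || *end != '\0')
		return -EINVAL;
	if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -ERANGE;

	*result = (int)val;	/* well-defined: val is known to fit */
	return 0;
}
```

The comparison against INT_MIN and INT_MAX happens while the value is
still a long, so the narrowing conversion is never applied to a value
that int cannot represent.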

Signed-off-by: Alex Elder <elder@inktank.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago backlight: remove ProGear driver
Marcin Juszkiewicz [Fri, 7 Sep 2012 00:24:51 +0000 (10:24 +1000)]
backlight: remove ProGear driver

This driver was for the ProGear webpad device, which was produced in
2000/2001 and is no longer available on the market.  I no longer have this
hardware so cannot even check how Linux works on it.

Signed-off-by: Marcin Juszkiewicz <marcin@juszkiewicz.com.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago backlight-add-new-lm3639-backlight-driver-fix
Andrew Morton [Fri, 7 Sep 2012 00:24:51 +0000 (10:24 +1000)]
backlight-add-new-lm3639-backlight-driver-fix

code layout tweaks

Cc: "G.Shark Jeong" <gshark.jeong@gmail.com>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Cc: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago backlight: add new lm3639 backlight driver
G.Shark Jeong [Fri, 7 Sep 2012 00:24:50 +0000 (10:24 +1000)]
backlight: add new lm3639 backlight driver

This driver is a general version for the LM3639 backlight + flash driver
chip from TI.

LM3639:
The LM3639 is a single chip LCD Display Backlight driver + white LED
Camera driver.  Programming is done over an I2C compatible interface.
www.ti.com

Signed-off-by: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago backlight-add-backlight-driver-for-lm3630-chip-fix
Andrew Morton [Fri, 7 Sep 2012 00:24:50 +0000 (10:24 +1000)]
backlight-add-backlight-driver-for-lm3630-chip-fix

- make bled_name[] static

- a few coding style tuneups

- create new set_intensity(), partly to avoid awkward layout gymnastics

Cc: "G.Shark Jeong" <gshark.jeong@gmail.com>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Cc: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago backlight: add Backlight driver for lm3630 chip
G.Shark Jeong [Fri, 7 Sep 2012 00:24:50 +0000 (10:24 +1000)]
backlight: add Backlight driver for lm3630 chip

This driver is a general version for the LM3630 backlight driver chip from TI.

LM3630 :
The LM3630 is a current mode boost converter which supplies the power
and controls the current in two strings of up to 10 LEDs per string.
Programming is done over an I2C compatible interface.
www.ti.com

Signed-off-by: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago backlight: lp855x: add FAST bit description for LP8556
Kim, Milo [Fri, 7 Sep 2012 00:24:49 +0000 (10:24 +1000)]
backlight: lp855x: add FAST bit description for LP8556

The LP8556 backlight driver supports fast refresh mode when exiting the
low power mode.  This bit can be configured on the platform side.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Bryan Wu <bryan.wu@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/video/backlight/kb3886_bl.c: use usleep_range() instead of msleep() for small...
Jingoo Han [Fri, 7 Sep 2012 00:24:49 +0000 (10:24 +1000)]
drivers/video/backlight/kb3886_bl.c: use usleep_range() instead of msleep() for small sleeps

Since msleep() might not sleep for the desired amount when less than 20ms,
use usleep_range().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Claudio Nieder <private@claudio.ch>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/video/backlight/ltv350qv.c: use usleep_range() instead of msleep() for small...
Jingoo Han [Fri, 7 Sep 2012 00:24:49 +0000 (10:24 +1000)]
drivers/video/backlight/ltv350qv.c: use usleep_range() instead of msleep() for small sleeps

Since msleep() might not sleep for the desired amount when less than 20ms,
use usleep_range().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/video/backlight/da9052_bl.c: use usleep_range() instead of msleep() for small...
Jingoo Han [Fri, 7 Sep 2012 00:24:48 +0000 (10:24 +1000)]
drivers/video/backlight/da9052_bl.c: use usleep_range() instead of msleep() for small sleeps

Since msleep() might not sleep for the desired amount when less than 20ms,
use usleep_range().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Ashish Jangam <ashish.jangam@kpitcummins.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/video/backlight/pwm_bl.c: add device tree support for Low Threshold Brightness
Philip, Avinash [Fri, 7 Sep 2012 00:24:48 +0000 (10:24 +1000)]
drivers/video/backlight/pwm_bl.c: add device tree support for Low Threshold Brightness

Low Threshold Brightness should be configured to give a linear relation in
the brightness scale.  This patch adds device tree support for low
threshold brightness as an optional property of pwm_backlight.

Signed-off-by: Philip, Avinash <avinashphilip@ti.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago MAINTAINERS: Update gianfar_ptp after renaming
Joe Perches [Fri, 7 Sep 2012 00:24:44 +0000 (10:24 +1000)]
MAINTAINERS: Update gianfar_ptp after renaming

commit ec21e2ec36769 ("freescale: Move the Freescale drivers")
moved the files, update the pattern.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago MAINTAINERS: add defconfig file to IMX section
Uwe Kleine-König [Fri, 7 Sep 2012 00:24:29 +0000 (10:24 +1000)]
MAINTAINERS: add defconfig file to IMX section

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago MAINTAINERS: update gpio subsystem file list
Yang Bai [Fri, 7 Sep 2012 00:24:29 +0000 (10:24 +1000)]
MAINTAINERS: update gpio subsystem file list

Signed-off-by: Yang Bai <hamo.by@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib/vsprintf: update documentation to cover all of %p[Mm][FR]
Andy Shevchenko [Fri, 7 Sep 2012 00:24:29 +0000 (10:24 +1000)]
lib/vsprintf: update documentation to cover all of %p[Mm][FR]

Acked-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib: vsprintf: fix broken comments
George Spelvin [Fri, 7 Sep 2012 00:24:28 +0000 (10:24 +1000)]
lib: vsprintf: fix broken comments

Numbering the 8 potential digits 2 through 9 never did make a lot of sense.

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib: vsprintf: optimize put_dec_trunc8()
George Spelvin [Fri, 7 Sep 2012 00:24:28 +0000 (10:24 +1000)]
lib: vsprintf: optimize put_dec_trunc8()

If you're going to have a conditional branch after each 32x32->64-bit
multiply, might as well shrink the code and make it a loop.

This also avoids using the long multiply for small integers.

(This leaves the comments in a confusing state, but that's a separate
patch to make review easier.)

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib: vsprintf: optimize division by 10000
George Spelvin [Fri, 7 Sep 2012 00:24:28 +0000 (10:24 +1000)]
lib: vsprintf: optimize division by 10000

The same multiply-by-inverse technique can be used to convert division by
10000 to a 32x32->64-bit multiply.
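
The multiply-by-inverse technique can be sketched in standalone C.  This is
an illustration under stated assumptions, not the patch itself: div10000()
is a hypothetical helper, and the constant is ceil(2^43 / 10000), which
gives an exact quotient only for inputs below roughly 1.1e9.  The assert
uses a conservative bound of 1e9.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divide by 10000 with one 32x32->64-bit multiply and a shift.
 * 0x346DC5D7 == 879609303 == ceil(2^43 / 10000).  The rounding error of
 * the reciprocal grows with x, so the result is only exact while x stays
 * below the bound checked here.
 */
static uint32_t div10000(uint32_t x)
{
	assert(x < 1000000000u);
	return (uint32_t)((x * (uint64_t)0x346DC5D7) >> 43);
}
```

The remainder, if needed, follows without a second division as
x - 10000 * div10000(x).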

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib: vsprintf: optimize division by 10 for small integers
George Spelvin [Fri, 7 Sep 2012 00:24:27 +0000 (10:24 +1000)]
lib: vsprintf: optimize division by 10 for small integers

Shrink the reciprocal approximations used in put_dec_full4() based on the
comments in put_dec_full9().
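
For small inputs a much smaller reciprocal is enough, which the following
standalone C sketch illustrates.  The constant 0xccd with a shift of 15 is
an assumption chosen for this example (it is ceil(2^15 / 10) and is exact
for inputs below 16384); the helper names div10_small() and
digits4_lsb_first() are hypothetical.

```c
#include <assert.h>

/*
 * Divide by 10 using a 12-bit reciprocal: 0xccd == 3277 == ceil(2^15 / 10).
 * Exact for x < 16384, and the product always fits in 32 bits.
 */
static unsigned div10_small(unsigned x)
{
	assert(x < 16384);
	return (x * 0xccd) >> 15;
}

/* Write exactly four decimal digits of x (0..9999), least significant first. */
static void digits4_lsb_first(char buf[4], unsigned x)
{
	assert(x < 10000);
	for (int i = 0; i < 4; i++) {
		unsigned q = div10_small(x);

		buf[i] = (char)('0' + (x - 10 * q));	/* x % 10 without dividing */
		x = q;
	}
}
```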

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()
Shawn Guo [Fri, 7 Sep 2012 00:24:27 +0000 (10:24 +1000)]
kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()

As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
have kernel_restart() call disable_nonboot_cpus().  Doing so helps
machines that require the boot CPU to be the last CPU alive during reboot
to survive a kernel restart.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago tile: fix personality bits handling upon exec()
Jiri Kosina [Fri, 7 Sep 2012 00:24:27 +0000 (10:24 +1000)]
tile: fix personality bits handling upon exec()

Historically, the top three bytes of personality have been used for things
such as ADDR_NO_RANDOMIZE, which made sense only for specific
architectures.

We now however have a flag there that is general no matter the
architecture (UNAME26); generally we have to be careful to preserve the
personality flags across exec().

This patch fixes tile architecture not to forcefully overwrite personality
flags during exec().

In addition to that, we fix two other things along the way:

- exec_domain switching is fixed -- set_personality() should always
  be used instead of directly assigning to current->personality.
- as pointed out by Arnd Bergmann, PER_LINUX_32BIT is not used anywhere
  by tile, so let's just drop that in favor of PER_LINUX

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago cross-arch: don't corrupt personality flags upon exec()
Jiri Kosina [Fri, 7 Sep 2012 00:24:26 +0000 (10:24 +1000)]
cross-arch: don't corrupt personality flags upon exec()

Historically, the top three bytes of personality have been used for things
such as ADDR_NO_RANDOMIZE, which made sense only for specific
architectures.

We now however have a flag there that is general no matter the
architecture (UNAME26); generally we have to be careful to preserve the
personality flags across exec().

This patch tries to fix all architectures that forcefully overwrite
personality flags during exec() (ppc32 and s390 have been fixed recently
by commits f9783ec86 and 59e4c3a2f in a similar way already).
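
The difference between overwriting and preserving the flags can be
sketched in standalone C.  This is an illustration, not the per-arch
patch: the two helper functions are hypothetical, and the constants are
local copies of values from the Linux personality header, redefined here
so the example compiles on its own.

```c
#include <assert.h>

/* Local copies of Linux personality values, for illustration only. */
#define PER_MASK          0x00ffu	/* low byte: the personality type */
#define PER_LINUX         0x0000u
#define UNAME26           0x0020000u
#define ADDR_NO_RANDOMIZE 0x0040000u

/* Problematic pattern: resets the type and drops every flag bit. */
static unsigned int exec_personality_overwrite(unsigned int current)
{
	(void)current;
	return PER_LINUX;
}

/* Preserving pattern: reset only the type, keep the flag bits. */
static unsigned int exec_personality_preserve(unsigned int current)
{
	assert((PER_LINUX & ~PER_MASK) == 0);	/* type lives in the low byte */
	return PER_LINUX | (current & ~PER_MASK);
}
```

With the preserving variant, a process that had UNAME26 set before exec()
still has it afterwards, while the type byte is reset to PER_LINUX.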

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/misc/lis3lv02d/lis3lv02d_spi.c: add DT matching table passthru code
Daniel Mack [Fri, 7 Sep 2012 00:24:26 +0000 (10:24 +1000)]
drivers/misc/lis3lv02d/lis3lv02d_spi.c: add DT matching table passthru code

If probed from a device tree, this driver now passes the node information
to the generic part, so the runtime information can be derived.

Successfully tested on a PXA3xx board.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Rob Herring <robherring2@gmail.com>
Cc: "AnilKumar, Chimata" <anilkumar@ti.com>
Reviewed-by: Éric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/misc/lis3lv02d: add generic DT matching code
Daniel Mack [Fri, 7 Sep 2012 00:24:26 +0000 (10:24 +1000)]
drivers/misc/lis3lv02d: add generic DT matching code

Adds logic to parse lis3 properties from a device tree node and store them
in a freshly allocated lis3lv02d_platform_data.

Note that the actual match tables are left out here.  This part should
happen in the drivers that bind to the individual busses (SPI/I2C/PCI).

Also adds some DT bindings documentation.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Rob Herring <robherring2@gmail.com>
Cc: "AnilKumar, Chimata" <anilkumar@ti.com>
Reviewed-by: Éric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago score: select generic atomic64_t support
Fengguang Wu [Fri, 7 Sep 2012 00:24:25 +0000 (10:24 +1000)]
score: select generic atomic64_t support

It's required for the core fs/namespace.c and many other basic features.

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago frv: kill used but uninitialized variable
Geert Uytterhoeven [Fri, 7 Sep 2012 00:24:25 +0000 (10:24 +1000)]
frv: kill used but uninitialized variable

Commit 6afe1a1fe8ff83f6a ("PM: Remove legacy PM") removed the
initialization of retval, causing:

arch/frv/kernel/pm.c: In function 'sysctl_pm_do_suspend':
arch/frv/kernel/pm.c:165:5: warning: 'retval' may be used uninitialized in this function [-Wuninitialized]

Remove the variable completely to fix this, and convert to a proper
switch (...) { ... } construct to improve readability.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end
Haggai Eran [Fri, 7 Sep 2012 00:24:25 +0000 (10:24 +1000)]
mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end

In order to allow sleeping during invalidate_page mmu notifier calls, we
need to avoid making them while holding the PT lock.  In addition to its direct
calls, invalidate_page can also be called as a substitute for a change_pte
call, in case the notifier client hasn't implemented change_pte.

This patch drops the invalidate_page call from change_pte, and instead
wraps all calls to change_pte with invalidate_range_start and
invalidate_range_end calls.

Note that change_pte still cannot sleep after this patch, and that clients
implementing change_pte should not take action on it in case the number of
outstanding invalidate_range_start calls is larger than one, otherwise
they might miss a later invalidation.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Cc: Andrea Arcangeli <andrea@qumranet.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Liran Liss <liranl@mellanox.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-move-all-mmu-notifier-invocations-to-be-done-outside-the-pt-lock-fix
Andrew Morton [Fri, 7 Sep 2012 00:24:24 +0000 (10:24 +1000)]
mm-move-all-mmu-notifier-invocations-to-be-done-outside-the-pt-lock-fix

possible speed tweak in hugetlb_cow(), cleanups

Cc: Andrea Arcangeli <andrea@qumranet.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Liran Liss <liranl@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: move all mmu notifier invocations to be done outside the PT lock
Sagi Grimberg [Fri, 7 Sep 2012 00:24:22 +0000 (10:24 +1000)]
mm: move all mmu notifier invocations to be done outside the PT lock

In order to allow sleeping during mmu notifier calls, we need to avoid
invoking them under the page table spinlock.  This patch solves the
problem by calling invalidate_page notification after releasing the lock
(but before freeing the page itself), or by wrapping the page invalidation
with calls to invalidate_range_start and invalidate_range_end.

To prevent accidental changes to the invalidate_range_end arguments after
the call to invalidate_range_start, the patch introduces a convention of
saving the arguments in consistently named locals:

unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */

...

mmun_start = ...
mmun_end = ...
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);

...

mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

The patch changes code to use this convention for all calls to
mmu_notifier_invalidate_range_start/end, except those where the calls are
close enough so that anyone who glances at the code can see the values
aren't changing.

This patchset is a preliminary step towards on-demand paging design to be
added to the RDMA stack.

Why do we want on-demand paging for Infiniband?

  Applications register memory with an RDMA adapter using system calls,
  and subsequently post IO operations that refer to the corresponding
  virtual addresses directly to HW.  Until now, this was achieved by
  pinning the memory during the registration calls.  The goal of on demand
  paging is to avoid pinning the pages of registered memory regions (MRs).
  This will allow users the same flexibility they get when swapping any
  other part of their processes' address spaces.  Instead of requiring the
  entire MR to fit in physical memory, we can allow the MR to be larger,
  and only fit the current working set in physical memory.

Why should anyone care?  What problems are users currently experiencing?

  This can make programming with RDMA much simpler.  Today, developers
  that are working with more data than their RAM can hold need either to
  deregister and reregister memory regions throughout their process's
  life, or keep a single memory region and copy the data to it.  On demand
  paging will allow these developers to register a single MR at the
  beginning of their process's life, and let the operating system manage
  which pages need to be fetched at a given time.  In the future, we
  might be able to provide a single memory access key for each process
  that would provide the entire process's address space as one large memory
  region, and the developers wouldn't need to register memory regions at
  all.

Is there any prospect that any other subsystems will utilise these
infrastructural changes?  If so, which and how, etc?

  As for other subsystems, I understand that XPMEM wanted to sleep in
  MMU notifiers, as Christoph Lameter wrote at
  http://lkml.indiana.edu/hypermail/linux/kernel/0802.1/0460.html and
  perhaps Andrea knows about other use cases.

  Scheduling in mmu notifications is required since we need to sync the
  hardware with the secondary page tables change.  A TLB flush of an IO
  device is inherently slower than a CPU TLB flush, so our design works by
  sending the invalidation request to the device, and waiting for an
  interrupt before exiting the mmu notifier handler.

Avi said:

  kvm may be a buyer.  kvm::mmu_lock, which serializes guest page
  faults, also protects long operations such as destroying large ranges.
  It would be good to convert it into a spinlock, but as it is used inside
  mmu notifiers, this cannot be done.

  (there are alternatives, such as keeping the spinlock and using a
  generation counter to do the teardown in O(1), which is what the "may"
  is doing up there).

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Liran Liss <liranl@mellanox.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-support-migrate_discard-fix
Andrew Morton [Fri, 7 Sep 2012 00:23:56 +0000 (10:23 +1000)]
mm-support-migrate_discard-fix

whitespace fixlet

Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: support MIGRATE_DISCARD
Minchan Kim [Fri, 7 Sep 2012 00:23:56 +0000 (10:23 +1000)]
mm: support MIGRATE_DISCARD

Introduce a MIGRATE_DISCARD mode for migration.  It drops *clean cache
pages* instead of migrating them, so migration latency is reduced by
avoiding the memcpy and page remapping.  This is useful for CMA, where
migration latency matters more than evicting the working set of
background processes.  In addition, it needs fewer free pages as
migration targets, so it can avoid reclaiming memory to get free pages,
which is another factor that increases latency.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: change enum migrate_mode with bitwise type
Minchan Kim [Fri, 7 Sep 2012 00:23:55 +0000 (10:23 +1000)]
mm: change enum migrate_mode with bitwise type

Change the migrate_mode type to a bitwise type: the next patch adds
MIGRATE_DISCARD, which can be ORed with other attributes, so a bitwise
type is a better fit.
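
For illustration only (this is not code from the patch): a minimal
userspace sketch of why a bitwise type helps.  Only MIGRATE_DISCARD and
migrate_mode are named in this log; the other mode names, all numeric
values, and the migrate_mode_has() helper are assumptions made for the
example.

```c
#include <stdbool.h>

/* Hypothetical flag values: each mode occupies its own bit so that
 * modes can be combined with bitwise OR. */
typedef unsigned int migrate_mode_t;

#define MIGRATE_ASYNC      ((migrate_mode_t)0x1)
#define MIGRATE_SYNC_LIGHT ((migrate_mode_t)0x2)
#define MIGRATE_SYNC       ((migrate_mode_t)0x4)
#define MIGRATE_DISCARD    ((migrate_mode_t)0x8)

/* Hypothetical helper: true if every bit of 'flag' is set in 'mode'. */
static bool migrate_mode_has(migrate_mode_t mode, migrate_mode_t flag)
{
	return (mode & flag) == flag;
}
```

With an enum of mutually exclusive values, a caller could not ask for
"synchronous and discard" at once; with one bit per attribute it can pass
MIGRATE_SYNC | MIGRATE_DISCARD and the callee tests each bit separately.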

Suggested by Michal Nazarewicz.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: mmu_notifier: make the mmu_notifier srcu static
Andrea Arcangeli [Fri, 7 Sep 2012 00:23:55 +0000 (10:23 +1000)]
mm: mmu_notifier: make the mmu_notifier srcu static

The variable must be static especially given the variable name.

s/RCU/SRCU/ over a few comments.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago memory-hotplug: build zonelists when offlining pages
Xishi Qiu [Fri, 7 Sep 2012 00:23:55 +0000 (10:23 +1000)]
memory-hotplug: build zonelists when offlining pages

online_pages() calls build_all_zonelists() and zone_pcp_update(); I think
offline_pages() should do so too.

When the zone has no memory left to allocate, remove it from other nodes'
zonelists.  zone_batchsize() depends on the zone's present pages, so if
the zone's present pages change, the zone's pcp should be updated.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: move augmented rbtree functionality to rbtree_augmented.h
Michel Lespinasse [Fri, 7 Sep 2012 00:23:54 +0000 (10:23 +1000)]
rbtree: move augmented rbtree functionality to rbtree_augmented.h

Provide rb_insert_augmented() and rb_erase_augmented() through a new
rbtree_augmented.h include file.  rb_erase_augmented() is defined there as
an __always_inline function, in order to allow inlining of augmented
rbtree callbacks into it.  Since this generates a relatively large
function, each augmented rbtree user should make sure to have a single
call site.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago prio_tree: remove
Michel Lespinasse [Fri, 7 Sep 2012 00:23:54 +0000 (10:23 +1000)]
prio_tree: remove

After both prio_tree users have been converted to use red-black trees,
there is no need to keep around the prio tree library anymore.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago kmemleak: use rbtree instead of prio tree
Michel Lespinasse [Fri, 7 Sep 2012 00:23:54 +0000 (10:23 +1000)]
kmemleak: use rbtree instead of prio tree

kmemleak uses a tree where each node represents an allocated memory object
in order to quickly find out what object a given address is part of.
However, the objects don't overlap, so rbtrees are a better choice than
prio trees for this use.  They are both faster and have lower memory
overhead.

Tested by booting a kernel with kmemleak enabled, loading the
kmemleak_test module, and looking for the expected messages.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: replace vma prio_tree with an interval tree
Michel Lespinasse [Fri, 7 Sep 2012 00:23:53 +0000 (10:23 +1000)]
mm: replace vma prio_tree with an interval tree

Implement an interval tree as a replacement for the VMA prio_tree.  The
algorithms are similar to lib/interval_tree.c; however that code can't be
directly reused as the interval endpoints are not explicitly stored in the
VMA.  So instead, the common algorithm is moved into a template and the
details (node type, how to get interval endpoints from the node, etc) are
filled in using the C preprocessor.
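
As a rough illustration of the template idea (not the macro this patch
introduces): the sketch below generates an overlap test for two different
node types from one macro body.  INTERVAL_OVERLAP_DEFINE, both struct
types, and the accessor macros are hypothetical names chosen for the
example.

```c
/* Hypothetical template: generates an overlap test for any node type,
 * given macros that compute the closed interval [start, last] of a node. */
#define INTERVAL_OVERLAP_DEFINE(NAME, TYPE, START, LAST)                    \
static int NAME(const TYPE *node, unsigned long start, unsigned long last) \
{                                                                           \
	return START(node) <= last && start <= LAST(node);                  \
}

/* Node type 1: the endpoints are stored explicitly. */
struct range_node {
	unsigned long start, last;
};
#define RANGE_START(n) ((n)->start)
#define RANGE_LAST(n)  ((n)->last)
INTERVAL_OVERLAP_DEFINE(range_overlaps, struct range_node,
			RANGE_START, RANGE_LAST)

/* Node type 2: the endpoints are derived from other fields, similar in
 * spirit to a VMA whose interval is not stored directly. */
struct mapping_node {
	unsigned long pgoff, nr_pages;
};
#define MAPPING_START(n) ((n)->pgoff)
#define MAPPING_LAST(n)  ((n)->pgoff + (n)->nr_pages - 1)
INTERVAL_OVERLAP_DEFINE(mapping_overlaps, struct mapping_node,
			MAPPING_START, MAPPING_LAST)
```

The algorithm is written once; only the node type and the way endpoints
are obtained differ per instantiation.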

Once the interval tree functions are available, using them as a
replacement for the VMA prio tree is a relatively simple, mechanical job.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: add prio tree and interval tree tests
Michel Lespinasse [Fri, 7 Sep 2012 00:23:53 +0000 (10:23 +1000)]
rbtree: add prio tree and interval tree tests

Patch 1 implements support for interval trees, on top of the augmented
rbtree API. It also adds synthetic tests to compare the performance of
interval trees vs prio trees. The short answer is that interval trees are
slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x)
on search. It is debatable how realistic the synthetic test is, and I have
not made such measurements yet, but my impression is that interval trees
would still come out faster.

Patch 2 uses a preprocessor template to make the interval tree generic,
and uses it as a replacement for the vma prio_tree.

Patch 3 takes the other prio_tree user, kmemleak, and converts it to use
a basic rbtree. We don't actually need the augmented rbtree support here
because the intervals are always non-overlapping.

Patch 4 removes the now-unused prio tree library.

Patch 5 proposes an additional optimization to rb_erase_augmented, now
providing it as an inline function so that the augmented callbacks can be
inlined in. This provides an additional 5-10% performance improvement
for the interval tree insert/erase benchmark. There is a maintenance cost
as it exposes augmented rbtree users to some of the rbtree library internals;
however I think this cost shouldn't be too high as I expect the augmented
rbtree will always have far fewer users than the base rbtree.

I should probably add a quick summary of why I think it makes sense to
replace prio trees with augmented rbtree based interval trees now.  One of
the drivers is that we need augmented rbtrees for Rik's vma gap finding
code, and once you have them, it just makes sense to use them for interval
trees as well, as this is the simpler and more well known algorithm.  prio
trees, in comparison, seem *too* clever: they impose an additional 'heap'
constraint on the tree, which they use to guarantee a faster worst-case
complexity of O(k+log N) for stabbing queries in a well-balanced prio
tree, vs O(k*log N) for interval trees (where k=number of matches,
N=number of intervals).  Now this sounds great, but in practice prio trees
don't realize this theoretical benefit.  First, the additional constraint
makes them harder to update, so that the kernel implementation has to
simplify things by balancing them like a radix tree, which is not always
ideal.  Second, the fact that there are both index and heap properties
makes both tree manipulation and search more complex, which results in a
higher multiplicative time constant.  As it turns out, the simple interval
tree algorithm ends up running faster than the more clever prio tree.
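
To make the "simple interval tree algorithm" concrete, here is a small
userspace sketch of a stabbing query.  It is not lib/interval_tree.c: it
assumes a plain unbalanced binary search tree keyed by interval start
(a real implementation would sit on top of the rbtree), and all names
(it_node, it_insert, it_stab_count) are hypothetical.

```c
#include <stddef.h>

struct it_node {
	unsigned long start, last;   /* closed interval [start, last] */
	unsigned long subtree_last;  /* augmented: max 'last' in this subtree */
	struct it_node *left, *right;
};

/* Insert into an unbalanced BST keyed by 'start', raising the augmented
 * value of every node passed on the way down. */
static void it_insert(struct it_node **root, struct it_node *node)
{
	struct it_node **link = root;

	node->left = node->right = NULL;
	node->subtree_last = node->last;
	while (*link) {
		struct it_node *cur = *link;

		if (cur->subtree_last < node->last)
			cur->subtree_last = node->last;
		link = node->start < cur->start ? &cur->left : &cur->right;
	}
	*link = node;
}

/* Count the intervals that contain 'point'. */
static int it_stab_count(const struct it_node *node, unsigned long point)
{
	int count = 0;

	/* No interval in this subtree ends at or after 'point'. */
	if (!node || node->subtree_last < point)
		return 0;

	count += it_stab_count(node->left, point);
	if (node->start <= point) {
		if (point <= node->last)
			count++;
		/* Starts in the right subtree are >= node->start, so they
		 * are only worth visiting when node->start <= point. */
		count += it_stab_count(node->right, point);
	}
	return count;
}
```

The only extra state compared with a plain search tree is subtree_last,
which is what lets the query skip whole subtrees.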

This patch:

Add two test modules:

- prio_tree_test measures the performance of lib/prio_tree.c, both for
  insertion/removal and for stabbing searches

- interval_tree_test measures the performance of a library of equivalent
  functionality, built using the augmented rbtree support.

In order to support the second test module, lib/interval_tree.c is
introduced. It is kept separate from the interval_tree_test main file
for two reasons: first we don't want to provide an unfair advantage
over prio_tree_test by having everything in a single compilation unit,
and second there is the possibility that the interval tree functionality
could get some non-test users in kernel over time.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: add RB_DECLARE_CALLBACKS() macro
Michel Lespinasse [Fri, 7 Sep 2012 00:23:53 +0000 (10:23 +1000)]
rbtree: add RB_DECLARE_CALLBACKS() macro

As proposed by Peter Zijlstra, this makes it easier to define the augmented
rbtree callbacks.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: remove prior augmented rbtree implementation
Michel Lespinasse [Fri, 7 Sep 2012 00:23:52 +0000 (10:23 +1000)]
rbtree: remove prior augmented rbtree implementation

Convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree API
and remove the old augmented rbtree implementation.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: faster augmented rbtree manipulation
Michel Lespinasse [Fri, 7 Sep 2012 00:23:52 +0000 (10:23 +1000)]
rbtree: faster augmented rbtree manipulation

Introduce new augmented rbtree APIs that allow minimal recalculation of
augmented node information.

A new callback is added to the rbtree insertion and erase rebalancing
functions, to be called on each tree rotation. Such rotations preserve
the subtree's root augmented value, but require recalculation of the one
child that was previously located at the subtree root.
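
The rotation property described above can be sketched in userspace as
follows.  This is an illustration under simplifying assumptions, not the
kernel callback: nodes carry no color or parent pointer, the augmented
value is a subtree maximum, and the names (aug_node, aug_compute,
aug_rotate_left) are hypothetical.

```c
#include <stddef.h>

struct aug_node {
	int value;                 /* per-node value */
	int subtree_max;           /* augmented: max value in this subtree */
	struct aug_node *left, *right;
};

/* Recompute a node's augmented value from itself and its children. */
static int aug_compute(const struct aug_node *n)
{
	int max = n->value;

	if (n->left && n->left->subtree_max > max)
		max = n->left->subtree_max;
	if (n->right && n->right->subtree_max > max)
		max = n->right->subtree_max;
	return max;
}

/* Left-rotate the subtree rooted at 'old_root'; returns the new root. */
static struct aug_node *aug_rotate_left(struct aug_node *old_root)
{
	struct aug_node *new_root = old_root->right;

	old_root->right = new_root->left;
	new_root->left = old_root;

	/* The subtree contains the same nodes, so the value at its root
	 * is unchanged and can simply be copied over. */
	new_root->subtree_max = old_root->subtree_max;
	/* Only the node that moved down needs to be recomputed. */
	old_root->subtree_max = aug_compute(old_root);
	return new_root;
}
```

One copy and one recomputation per rotation is the "minimal
recalculation" the new API aims for.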

In the insertion case, the handcoded search phase must be updated to
maintain the augmented information on insertion, and then the rbtree
coloring/rebalancing algorithms keep it up to date.

In the erase case, things are more complicated since it is library
code that manipulates the rbtree in order to remove internal nodes.
This requires a couple of additional callbacks to copy a subtree's
augmented value when a new root is stitched in, and to recompute
augmented values down the ancestry path when a node is removed from
the tree.

In order to preserve maximum speed for the non-augmented case,
we provide two versions of each tree manipulation function.
rb_insert_augmented() is the augmented equivalent of rb_insert_color(),
and rb_erase_augmented() is the augmented equivalent of rb_erase().

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: augmented rbtree test
Michel Lespinasse [Fri, 7 Sep 2012 00:23:52 +0000 (10:23 +1000)]
rbtree: augmented rbtree test

Small test to measure the performance of augmented rbtrees.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: low level optimizations in rb_erase()
Michel Lespinasse [Fri, 7 Sep 2012 00:23:51 +0000 (10:23 +1000)]
rbtree: low level optimizations in rb_erase()

Various minor optimizations in rb_erase():
- Avoid multiple loading of node->__rb_parent_color when computing parent
  and color information (possibly not in close sequence, as there might
  be further branches in the algorithm)
- In the 1-child subcase of case 1, copy the __rb_parent_color field from
  the erased node to the child instead of recomputing it from the desired
  parent and color
- When searching for the erased node's successor, differentiate between
  cases 2 and 3 based on whether any left links were followed. This avoids
  a condition later down.
- In case 3, keep a pointer to the erased node's right child so we don't
  have to refetch it later to adjust its parent.
- In the no-child subcase of cases 2 and 3, place the rebalance assignment
  last so that the compiler can remove the following if(rebalance) test.

Also, added some comments to illustrate cases 2 and 3.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: handle 1-child recoloring in rb_erase() instead of rb_erase_color()
Michel Lespinasse [Fri, 7 Sep 2012 00:23:51 +0000 (10:23 +1000)]
rbtree: handle 1-child recoloring in rb_erase() instead of rb_erase_color()

An interesting observation for rb_erase() is that when a node has
exactly one child, the node must be black and the child must be red.
An interesting consequence is that removing such a node can be done by
simply replacing it with its child and making the child black,
which we can do efficiently in rb_erase(). __rb_erase_color() then
only needs to handle the no-child case and can be modified accordingly.
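
A minimal userspace sketch of the one-child case, for illustration only.
It assumes a simplified node with a separate color flag and parent
pointer (the kernel packs both into one word), and the names e_node and
e_erase_one_child are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct e_node {
	bool red;
	struct e_node *parent, *left, *right;
};

/* Erase 'node', which must have exactly one child.  In a valid rbtree
 * such a node is black and its child is red, so replacing the node with
 * its child and painting the child black restores all invariants. */
static void e_erase_one_child(struct e_node **rootp, struct e_node *node)
{
	struct e_node *child = node->left ? node->left : node->right;
	struct e_node *parent = node->parent;

	child->parent = parent;
	child->red = false;
	if (!parent)
		*rootp = child;
	else if (parent->left == node)
		parent->left = child;
	else
		parent->right = child;
}
```

No rotations or further recoloring are needed, because every path through
the erased black node now passes through the newly black child instead.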

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: place easiest case first in rb_erase()
Michel Lespinasse [Fri, 7 Sep 2012 00:23:51 +0000 (10:23 +1000)]
rbtree: place easiest case first in rb_erase()

In rb_erase, move the easy case (node to erase has no more than
1 child) first. I feel the code reads easier that way.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: add __rb_change_child() helper function
Michel Lespinasse [Fri, 7 Sep 2012 00:23:50 +0000 (10:23 +1000)]
rbtree: add __rb_change_child() helper function

Add __rb_change_child() as an inline helper function to replace code that
would otherwise be duplicated 4 times in the source.
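
For readers unfamiliar with the pattern, a userspace sketch of what such
a helper does.  The types and field names (cc_node, cc_root,
cc_change_child) are simplified, hypothetical stand-ins rather than the
kernel definitions.

```c
#include <stddef.h>

struct cc_node {
	struct cc_node *left, *right;
};

struct cc_root {
	struct cc_node *node;
};

/* Make whatever pointed at 'old' (its parent's left or right link, or
 * the tree root when there is no parent) point at 'new_child' instead. */
static inline void cc_change_child(struct cc_node *old,
				   struct cc_node *new_child,
				   struct cc_node *parent,
				   struct cc_root *root)
{
	if (parent) {
		if (parent->left == old)
			parent->left = new_child;
		else
			parent->right = new_child;
	} else {
		root->node = new_child;
	}
}
```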

No changes to binary size or speed.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree test: fix sparse warning about 64-bit constant
Michel Lespinasse [Fri, 7 Sep 2012 00:23:50 +0000 (10:23 +1000)]
rbtree test: fix sparse warning about 64-bit constant

Just a small fix to make sparse happy.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reported-by: Fengguang Wu <wfg@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: optimize fetching of sibling node
Michel Lespinasse [Fri, 7 Sep 2012 00:23:50 +0000 (10:23 +1000)]
rbtree: optimize fetching of sibling node

When looking to fetch a node's sibling, we went through a sequence of:
- check if node is the parent's left child
- if it is, then fetch the parent's right child

This can be replaced with:
- fetch the parent's right child as an assumed sibling
- check that node is NOT the fetched child

This avoids fetching the parent's left child when node is actually
that child. Saves a bit on code size, though it doesn't seem to make
a large difference in speed.
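
The replacement sequence can be sketched like this, using a simplified,
hypothetical node type (sib_node) rather than struct rb_node:

```c
#include <stddef.h>

struct sib_node {
	struct sib_node *left, *right;
};

/* Return the sibling of 'node', whose parent is 'parent'. */
static struct sib_node *sib_of(const struct sib_node *node,
			       const struct sib_node *parent)
{
	/* Fetch the right child as the assumed sibling. */
	struct sib_node *sibling = parent->right;

	/* If that is 'node' itself, the sibling is the left child. */
	if (node == sibling)
		sibling = parent->left;
	return sibling;
}
```

When 'node' is the left child, parent->left is never loaded at all.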

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: coding style adjustments
Michel Lespinasse [Fri, 7 Sep 2012 00:23:49 +0000 (10:23 +1000)]
rbtree: coding style adjustments

Set comment and indentation style to be consistent with the Linux coding
style and the rest of the file, as suggested by Peter Zijlstra.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: low level optimizations in __rb_erase_color()
Michel Lespinasse [Fri, 7 Sep 2012 00:23:49 +0000 (10:23 +1000)]
rbtree: low level optimizations in __rb_erase_color()

In __rb_erase_color(), we often already have pointers to the nodes being
rotated and/or know what their colors must be, so we can generate more
efficient code than the generic __rb_rotate_left() and __rb_rotate_right()
functions.

Also when the current node is red or when flipping the sibling's color,
the parent is already known so we can use the more efficient
rb_set_parent_color() function to set the desired color.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: optimize case selection logic in __rb_erase_color()
Michel Lespinasse [Fri, 7 Sep 2012 00:23:49 +0000 (10:23 +1000)]
rbtree: optimize case selection logic in __rb_erase_color()

In __rb_erase_color(), we have to select one of 3 cases depending on the
colors of the 'other' node's children.  If both children are black, we flip a
few node colors and iterate.  Otherwise, we do either one or two tree
rotations, depending on the color of the 'other' child opposite to 'node',
and then we are done.

The corresponding logic had duplicate checks for the color of the 'other'
child opposite to 'node'.  It was checking it first to determine if both
children are black, and then to determine how many tree rotations are
required.  Rearrange the logic to avoid that extra check.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: adjust node color in __rb_erase_color() only when necessary
Michel Lespinasse [Fri, 7 Sep 2012 00:23:48 +0000 (10:23 +1000)]
rbtree: adjust node color in __rb_erase_color() only when necessary

In __rb_erase_color(), we were always setting a node to black after
exiting the main loop.  And in one case, after fixing up the tree to
satisfy all rbtree invariants, we were setting the current node to root
just to guarantee a loop exit, at which point the root would be set to
black.  However this is not necessary, as the root of an rbtree is already
known to be black.  The only case where the color flip is required is when
we exit the loop due to the current node being red, and it's easiest to
just do the flip at that point instead of doing it after the loop.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: low level optimizations in rb_insert_color()
Michel Lespinasse [Fri, 7 Sep 2012 00:23:48 +0000 (10:23 +1000)]
rbtree: low level optimizations in rb_insert_color()

- Use the newly introduced rb_set_parent_color() function to flip the color
  of nodes whose parent is already known.
- Optimize rb_parent() when the node is known to be red - there is no need
  to mask out the color in that case.
- Flipping gparent's color to red requires us to fetch its rb_parent_color
  field, so we can reuse it as the parent value for the next loop iteration.
- Do not use __rb_rotate_left() and __rb_rotate_right() to handle tree
  rotations: we already have pointers to all relevant nodes, and know their
  colors (either because we want to adjust it, or because we've tested it,
  or we can deduce it as black due to the node's proximity to a known red node).
  So we can generate more efficient code by making use of the node pointers
  we already have, and setting both the parent and color attributes for
  nodes all at once. Also in Case 2, some node attributes don't have to
  be set because we know another tree rotation (Case 3) will always follow
  and override them.
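
The first two items rely on the parent pointer and the color sharing one
word.  A userspace sketch of that packing, for illustration: the sk_
names are hypothetical, and the layout (color in bit 0, red encoded as 0)
is an assumption stated here rather than something spelled out in this
log.

```c
#include <stdint.h>

#define SK_RED   0
#define SK_BLACK 1

struct sk_node {
	uintptr_t parent_color;      /* parent pointer with color in bit 0 */
	struct sk_node *left, *right;
};

/* Generic accessor: mask out the low bits to recover the pointer. */
static struct sk_node *sk_parent(const struct sk_node *n)
{
	return (struct sk_node *)(n->parent_color & ~(uintptr_t)3);
}

static int sk_color(const struct sk_node *n)
{
	return (int)(n->parent_color & 1);
}

/* Set parent and color with a single store. */
static void sk_set_parent_color(struct sk_node *n, struct sk_node *parent,
				int color)
{
	n->parent_color = (uintptr_t)parent | (uintptr_t)color;
}

/* For a node known to be red the color bit is 0, so the word already
 * is the parent pointer and no masking is needed. */
static struct sk_node *sk_red_parent(const struct sk_node *red)
{
	return (struct sk_node *)red->parent_color;
}
```

This only works because nodes are aligned so that the low bits of a node
address are always zero.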

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: adjust root color in rb_insert_color() only when necessary
Michel Lespinasse [Fri, 7 Sep 2012 00:23:47 +0000 (10:23 +1000)]
rbtree: adjust root color in rb_insert_color() only when necessary

The root node of an rbtree must always be black.  However,
rb_insert_color() only needs to maintain this invariant when it has been
broken - that is, when it exits the loop due to the current (red) node
being the root.  In all other cases (exiting after tree rotations, or
exiting due to an existing black parent) the invariant is already
satisfied, so there is no need to adjust the root node color.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: break out of rb_insert_color loop after tree rotation
Michel Lespinasse [Fri, 7 Sep 2012 00:23:47 +0000 (10:23 +1000)]
rbtree: break out of rb_insert_color loop after tree rotation

It is a well known property of rbtrees that insertion never requires more
than two tree rotations.  In our implementation, after one loop iteration
identified one or two necessary tree rotations, we would iterate and look
for more.  However at that point the node's parent would always be black,
which would cause us to exit the loop.

We can make the code flow more obvious by just adding a break statement
after the tree rotations, where we know we are done.  Additionally, in the
cases where two tree rotations are necessary, we don't have to update the
'node' pointer as it wouldn't be used until the next loop iteration, which
we now avoid due to this break statement.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree-performance-and-correctness-test-fix
Andrew Morton [Fri, 7 Sep 2012 00:23:47 +0000 (10:23 +1000)]
rbtree-performance-and-correctness-test-fix

fix printk warning: sparc64 cycles_t is unsigned long

Cc: Michel Lespinasse <walken@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: performance and correctness test
Michel Lespinasse [Fri, 7 Sep 2012 00:23:46 +0000 (10:23 +1000)]
rbtree: performance and correctness test

This small module helps measure the performance of rbtree insert and
erase.

Additionally, we run a few correctness tests to check that the rbtrees
have all desired properties:

- contains the right number of nodes in the order desired,
- never two consecutive red nodes on any path,
- all paths to leaf nodes have the same number of black nodes,
- root node is black
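
The three color-related properties in the list above can be checked
recursively.  The following is a userspace sketch, not the test module
itself: it uses a simplified, hypothetical node type (chk_node) with an
explicit color flag, and it does not cover the node count and ordering
check.

```c
#include <stdbool.h>
#include <stddef.h>

struct chk_node {
	bool red;
	struct chk_node *left, *right;
};

/* Return the number of black nodes on every path from 'n' down to a
 * leaf, or -1 if the subtree has two consecutive red nodes or paths
 * with different black counts. */
static int chk_black_height(const struct chk_node *n, bool parent_red)
{
	int lh, rh;

	if (!n)
		return 0;
	if (n->red && parent_red)
		return -1;              /* two consecutive red nodes */
	lh = chk_black_height(n->left, n->red);
	rh = chk_black_height(n->right, n->red);
	if (lh < 0 || rh < 0 || lh != rh)
		return -1;              /* unequal black counts */
	return lh + (n->red ? 0 : 1);
}

/* True if the tree satisfies the color invariants; an empty tree is
 * considered valid. */
static bool chk_valid(const struct chk_node *root)
{
	if (root && root->red)
		return false;           /* root must be black */
	return chk_black_height(root, false) >= 0;
}
```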

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: fix jffs2 build issue due to renamed __rb_parent_color field
David Woodhouse [Fri, 7 Sep 2012 00:23:46 +0000 (10:23 +1000)]
rbtree: fix jffs2 build issue due to renamed __rb_parent_color field

... and clean up the comments to better explain why it's acceptable to
do it this way instead of using rb_erase() "properly".

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: move some implementation details from rbtree.h to rbtree.c
Michel Lespinasse [Fri, 7 Sep 2012 00:23:46 +0000 (10:23 +1000)]
rbtree: move some implementation details from rbtree.h to rbtree.c

rbtree users must use the documented APIs to manipulate the tree
structure.  Low-level helpers to manipulate node colors and parenthood are
not part of that API, so move them to lib/rbtree.c

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree: fix incorrect rbtree node insertion in fs/proc/proc_sysctl.c
Michel Lespinasse [Fri, 7 Sep 2012 00:23:45 +0000 (10:23 +1000)]
rbtree: fix incorrect rbtree node insertion in fs/proc/proc_sysctl.c

The recently added code to use rbtrees in sysctl did not follow the proper
rbtree interface on insertion - it was calling rb_link_node() which
inserts a new node into the binary tree, but missed the call to
rb_insert_color() which properly balances the rbtree and establishes all
expected rbtree invariants.

I found out about this only because the faulty commit also used
rb_init_node(), which I am removing within this patchset.  But I think
it's an easy mistake to make, and it makes me wonder if we should change
the rbtree API so that insertions would be done with a single rb_insert()
call (even if its implementation could still inline the rb_link_node()
part and call a private __rb_insert_color function to do the rebalancing).

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago rbtree-empty-nodes-have-no-color-fix
Stephen Rothwell [Fri, 7 Sep 2012 00:23:45 +0000 (10:23 +1000)]
rbtree-empty-nodes-have-no-color-fix

After merging the akpm tree, today's linux-next build (x86_64
allmodconfig) failed like this:

net/ceph/osd_client.c: In function 'ceph_osdc_alloc_request':
net/ceph/osd_client.c:216:2: error: implicit declaration of function 'rb_init_node' [-Werror=implicit-function-declaration]

Caused by commit 753b960e52b7 ("rbtree: empty nodes have no color") from
the akpm tree interacting with commit cd43045c2de6 ("libceph: initialize
rb, list nodes in ceph_osd_request") from the ceph tree.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sage Weil <sage@inktank.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: empty nodes have no color
Michel Lespinasse [Fri, 7 Sep 2012 00:23:45 +0000 (10:23 +1000)]
rbtree: empty nodes have no color

Empty nodes have no color.  We can make use of this property to simplify
the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros.  Also, we
can get rid of the rb_init_node function which had been introduced by
88d19cf37952 ("timers: Add rb_init_node() to allow for stack allocated rb
nodes") to avoid some issue with the empty node's color not being
initialized.
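
The idea can be sketched in userspace as follows. This is an assumption-laden illustration, not the kernel header: `struct rb_node_sketch`, the `SKETCH_*` macros and `sketch_set_parent_color()` are hypothetical names. It assumes the parent pointer and the color share one word, and that an empty node is marked by making that word point back at the node itself, so the test is a plain comparison with no color masking:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in; names only mirror the kernel's. */
struct rb_node_sketch {
	uintptr_t parent_color;	/* parent pointer with the color in bit 0 */
	struct rb_node_sketch *left, *right;
};

/* A node is "empty" when its parent word points back at the node itself. */
#define SKETCH_CLEAR_NODE(n)  ((n)->parent_color = (uintptr_t)(n))
#define SKETCH_EMPTY_NODE(n)  ((n)->parent_color == (uintptr_t)(n))

/* Record a real parent and color, as linking into a tree would. */
static void sketch_set_parent_color(struct rb_node_sketch *n,
				    struct rb_node_sketch *parent, int black)
{
	/* Bit 0 must be free; pointer alignment guarantees that here. */
	assert(((uintptr_t)parent & 1) == 0);
	n->parent_color = (uintptr_t)parent | (black ? 1u : 0u);
}
```

Because a cleared node stores its own address, which has bit 0 clear, no color is ever read or written for a node that is outside a tree.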

I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
doing there, though.  axboe introduced them in 10fd48f2376d ("rbtree:
fixed reversed RB_EMPTY_NODE and rb_next/prev").  The way I see it, the
'empty node' abstraction is only used by rbtree users to flag nodes that
they haven't inserted in any rbtree, so asking the predecessor or
successor of such nodes doesn't make any sense.

One final rb_init_node() caller was recently added in sysctl code to
implement faster sysctl name lookups.  This code doesn't make use of
RB_EMPTY_NODE at all, and from what I could see it only called
rb_init_node() under the mistaken assumption that such initialization was
required before node insertion.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorbtree: reference Documentation/rbtree.txt for usage instructions
Michel Lespinasse [Fri, 7 Sep 2012 00:23:45 +0000 (10:23 +1000)]
rbtree: reference Documentation/rbtree.txt for usage instructions

I recently started looking at the rbtree code (with an eye towards
improving the augmented rbtree support, but I haven't gotten there yet).
I noticed a lot of possible speed improvements, which I am now proposing
in this patch set.

Patches 1-4 are preparatory: remove internal functions from rbtree.h so
that users won't be tempted to use them instead of the documented APIs,
clean up some incorrect usages I've noticed (in particular, with the
recently added fs/proc/proc_sysctl.c rbtree usage), reference the
documentation so that people have one less excuse to miss it, etc.

Patch 5 is a small module I wrote to check the rbtree performance.  It
creates 100 nodes with random keys and repeatedly inserts and erases them
from an rbtree.  Additionally, it has code to check for rbtree invariants
after each insert or erase operation.

Patches 6-12 are where the rbtree optimizations are done, and they touch
only that one file, lib/rbtree.c.  I am getting good results out of these
- in my small benchmark doing rbtree insertion (including search) and
erase, I'm seeing a 30% runtime reduction on Sandybridge E5, which is more
than I initially thought would be possible.  (the results aren't as
impressive on my two other test hosts though, AMD barcelona and Intel
Westmere, where I am seeing 14% runtime reduction only).  The code size -
both source (omitting comments) and compiled - is also shorter after these
changes.  However, I do admit that the updated code is more arduous to
read - one big reason for that is the removal of the tree rotation
helpers, which added some overhead but also made it easier to reason about
things locally.  Overall, I believe this is an acceptable compromise,
given that this code doesn't get modified very often, and that I have good
tests for it.

Upon Peter's suggestion, I added comments showing the rbtree configuration
before every rotation.  I think they help; however it's still best to have
a copy of the Cormen/Leiserson/Rivest book when digging into this code.

This patch: reference Documentation/rbtree.txt for usage instructions

include/linux/rbtree.h included some basic usage instructions, while
Documentation/rbtree.txt had some more complete and easier to follow
instructions.  Replacing the former with a reference to the latter.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoipc/mqueue: remove unnecessary rb_init_node() calls
Michel Lespinasse [Fri, 7 Sep 2012 00:23:44 +0000 (10:23 +1000)]
ipc/mqueue: remove unnecessary rb_init_node() calls

d6629859 ("ipc/mqueue: improve performance of send/recv") and ce2d52cc
("ipc/mqueue: add rbtree node caching support") introduced an rbtree of
message priorities, and usage of rb_init_node() to initialize the
corresponding nodes.  As it turns out, rb_init_node() is unnecessary here,
as the nodes are fully initialized on insertion by rb_link_node() and the
code doesn't access nodes that aren't inserted on the rbtree.

Removing the rb_init_node() calls as I removed that function during
rbtree API cleanups (the only other use of it was in a place that similarly
didn't require it).

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp, s390: architecture backend for thp on s390
Gerald Schaefer [Fri, 7 Sep 2012 00:23:44 +0000 (10:23 +1000)]
thp, s390: architecture backend for thp on s390

This implements the architecture backend for transparent hugepages
on s390.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp, s390: disable thp for kvm host on s390
Gerald Schaefer [Fri, 7 Sep 2012 00:23:44 +0000 (10:23 +1000)]
thp, s390: disable thp for kvm host on s390

This patch is part of the architecture backend for thp on s390.  It
disables thp for kvm hosts, because there is no kvm host hugepage support
so far.  Existing thp mappings are split by follow_page() with FOLL_SPLIT,
and future thp mappings are prevented by setting VM_NOHUGEPAGE in
mm->def_flags.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp, s390: thp pagetable pre-allocation for s390
Gerald Schaefer [Fri, 7 Sep 2012 00:23:43 +0000 (10:23 +1000)]
thp, s390: thp pagetable pre-allocation for s390

This patch is part of the architecture backend for thp on s390.  It
provides the pagetable pre-allocation functions
pgtable_trans_huge_deposit() and pgtable_trans_huge_withdraw().  Unlike
other archs, s390 has no struct page * as pgtable_t, but rather a pointer
to the page table.  So instead of saving the pagetable pre-allocation
list info inside the struct page, it is being saved within the pagetable
itself.
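
A rough userspace sketch of keeping the list information inside the page table itself is shown below. It is not the s390 implementation: `sketch_pgtable_t`, `struct sketch_mm`, `sketch_deposit()` and `sketch_withdraw()` are hypothetical names, the list is a simple last-in-first-out chain, and scrubbing the link with zeroes is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: a "page table" is just an array of entries. */
#define SKETCH_PTRS_PER_PTE 256
typedef unsigned long *sketch_pgtable_t;

struct sketch_mm {
	sketch_pgtable_t deposited;	/* head of the pre-allocated tables */
};

/* Deposit: keep the link to the previous head inside the unused table. */
static void sketch_deposit(struct sketch_mm *mm, sketch_pgtable_t pgtable)
{
	assert(mm != NULL && pgtable != NULL);
	memcpy(pgtable, &mm->deposited, sizeof(mm->deposited));
	mm->deposited = pgtable;
}

/* Withdraw: pop the head and scrub the link before the table is reused. */
static sketch_pgtable_t sketch_withdraw(struct sketch_mm *mm)
{
	sketch_pgtable_t pgtable = mm->deposited;

	if (!pgtable)
		return NULL;
	memcpy(&mm->deposited, pgtable, sizeof(mm->deposited));
	memset(pgtable, 0, sizeof(mm->deposited));
	return pgtable;
}
```

The point of the sketch is only that no separate struct page is needed to track a deposited table: the otherwise unused table memory carries the link.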

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp, s390: thp splitting backend for s390
Gerald Schaefer [Fri, 7 Sep 2012 00:23:43 +0000 (10:23 +1000)]
thp, s390: thp splitting backend for s390

This patch is part of the architecture backend for thp on s390.  It
provides the functions related to thp splitting, including serialization
against gup.  Unlike other archs, pmdp_splitting_flush() cannot use a tlb
flushing operation to serialize against gup on s390, because that wouldn't
be stopped by the disabled IRQs.  So instead, smp_call_function() is
called with an empty function, which will have the expected effect.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: make MADV_HUGEPAGE check for mm->def_flags
Gerald Schaefer [Fri, 7 Sep 2012 00:23:43 +0000 (10:23 +1000)]
thp: make MADV_HUGEPAGE check for mm->def_flags

This adds a check to hugepage_madvise(), to refuse MADV_HUGEPAGE if
VM_NOHUGEPAGE is set in mm->def_flags.  On s390, the VM_NOHUGEPAGE flag
will be set in mm->def_flags for kvm processes, to prevent any future thp
mappings.  In order to also prevent MADV_HUGEPAGE on such an mm,
hugepage_madvise() should check mm->def_flags.
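
The check can be sketched in userspace like this. The flag values, `struct sk_mm` and `sk_madvise_hugepage()` are hypothetical and greatly simplified compared with the real hugepage_madvise(); only the mm->def_flags test described above is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values for the sketch; the kernel's differ. */
#define SK_VM_HUGEPAGE   0x1UL
#define SK_VM_NOHUGEPAGE 0x2UL

struct sk_mm {
	unsigned long def_flags;	/* defaults applied to new mappings */
};

/* Returns 0 and marks the vma for huge pages, or -EINVAL if refused. */
static int sk_madvise_hugepage(const struct sk_mm *mm, unsigned long *vm_flags)
{
	assert(mm != NULL && vm_flags != NULL);

	/* An mm that forbids thp by default must not be overridden. */
	if (mm->def_flags & SK_VM_NOHUGEPAGE)
		return -EINVAL;

	*vm_flags &= ~SK_VM_NOHUGEPAGE;
	*vm_flags |= SK_VM_HUGEPAGE;
	return 0;
}
```

In the sketch, an mm with SK_VM_NOHUGEPAGE in def_flags leaves the vma flags untouched and returns -EINVAL, while any other mm gets the huge page hint applied.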

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: introduce pmdp_invalidate()
Gerald Schaefer [Fri, 7 Sep 2012 00:23:42 +0000 (10:23 +1000)]
thp: introduce pmdp_invalidate()

On s390, a valid page table entry must not be changed while it is attached
to any CPU.  So instead of pmd_mknotpresent() and set_pmd_at(), an IDTE
operation would be necessary there.  This patch introduces the
pmdp_invalidate() function, to allow architecture-specific
implementations.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: remove assumptions on pgtable_t type
Gerald Schaefer [Fri, 7 Sep 2012 00:23:42 +0000 (10:23 +1000)]
thp: remove assumptions on pgtable_t type

The thp page table pre-allocation code currently assumes that pgtable_t is
of type "struct page *".  This may not be true for all architectures, so
this patch removes that assumption by replacing the functions
prepare_pmd_huge_pte() and get_pmd_huge_pte() with two new functions that
can be defined architecture-specific.

It also removes two VM_BUG_ON checks for page_count() and page_mapcount()
operating on a pgtable_t.  Apart from the VM_BUG_ON removal, there will be
no functional change introduced by this patch.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp, x86: introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE
Gerald Schaefer [Fri, 7 Sep 2012 00:23:42 +0000 (10:23 +1000)]
thp, x86: introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE

Cleanup patch in preparation for transparent hugepage support on s390.
Adding new architectures to the TRANSPARENT_HUGEPAGE config option can
make the "depends" line rather ugly, like "depends on (X86 || (S390 &&
64BIT)) && MMU".

This patch adds a HAVE_ARCH_TRANSPARENT_HUGEPAGE instead.  x86 already has
MMU "def_bool y", so the MMU check is superfluous there and
HAVE_ARCH_TRANSPARENT_HUGEPAGE can be selected in arch/x86/Kconfig.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: fix potential anon_vma locking issue in mprotect()
Michel Lespinasse [Fri, 7 Sep 2012 00:23:41 +0000 (10:23 +1000)]
mm: fix potential anon_vma locking issue in mprotect()

Fix an anon_vma locking issue in the following situation:

- vma has no anon_vma
- next has an anon_vma
- vma is being shrunk / next is being expanded, due to an mprotect call

We need to take next's anon_vma lock to avoid races with rmap users (such
as page migration) while next is being expanded.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: remove unnecessary set_recommended_min_free_kbytes
Xiao Guangrong [Fri, 7 Sep 2012 00:23:41 +0000 (10:23 +1000)]
thp: remove unnecessary set_recommended_min_free_kbytes

The extra call is unnecessary, since set_recommended_min_free_kbytes is
already called in start_khugepaged.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: use khugepaged_enabled to remove duplicate code
Xiao Guangrong [Fri, 7 Sep 2012 00:23:41 +0000 (10:23 +1000)]
thp: use khugepaged_enabled to remove duplicate code

Use khugepaged_enabled to see whether thp is enabled

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: remove khugepaged_loop
Xiao Guangrong [Fri, 7 Sep 2012 00:23:40 +0000 (10:23 +1000)]
thp: remove khugepaged_loop

Merge khugepaged_loop into khugepaged

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: introduce khugepaged_prealloc_page and khugepaged_alloc_page
Xiao Guangrong [Fri, 7 Sep 2012 00:23:40 +0000 (10:23 +1000)]
thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

They are used to abstract the difference between NUMA enabled and NUMA
disabled to make the code more readable

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: release page in page pre-alloc path
Xiao Guangrong [Fri, 7 Sep 2012 00:23:40 +0000 (10:23 +1000)]
thp: release page in page pre-alloc path

If NUMA is enabled, we can release the page in the page pre-alloc
operation; the CONFIG_NUMA-dependent code can then be reduced.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: merge page pre-alloc in khugepaged_loop into khugepaged_do_scan
Xiao Guangrong [Fri, 7 Sep 2012 00:23:39 +0000 (10:23 +1000)]
thp: merge page pre-alloc in khugepaged_loop into khugepaged_do_scan

There are two pre-alloc operations in these two functions; the difference is:
- khugepaged_loop is allowed to sleep if the page allocation fails
- khugepaged_do_scan exits immediately if the page allocation fails

Actually, in khugepaged_do_scan, we can allow the pre-alloc to sleep on
the first failure; the operation in khugepaged_loop can then be removed.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: remove some code depend on CONFIG_NUMA
Xiao Guangrong [Fri, 7 Sep 2012 00:23:39 +0000 (10:23 +1000)]
thp: remove some code depend on CONFIG_NUMA

If NUMA is disabled, hpage is used as the pre-allocated page, so there are
two cases for hpage:

- it is !NULL, meaning the page has not been consumed; otherwise,
- the page has been consumed

If NUMA is enabled, hpage is used only as an alloc-fail indicator, which is
not a real page; NULL means no failure has been triggered.

So, we can release the page only if !IS_ERR_OR_NULL.
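
The three states of hpage can be sketched in userspace as below. The `sk_*` helpers are hypothetical imitations of the kernel's error-pointer helpers, and the sketch assumes the alloc-fail indicator is an error-encoded pointer such as one built from -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitations of the kernel's error-pointer helpers. */
#define SK_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *sk_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True for NULL and for pointers in the top SK_MAX_ERRNO addresses. */
static int sk_is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-SK_MAX_ERRNO;
}

/* Only a real page may be released; NULL and error markers are skipped. */
static int sk_should_release(const void *hpage)
{
	return !sk_is_err_or_null(hpage);
}
```

With this single test, the same release path covers both configurations in the sketch: a leftover pre-allocated page is released, while NULL and the failure marker are left alone.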

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: remove wake_up_interruptible in the exit path
Xiao Guangrong [Fri, 7 Sep 2012 00:23:39 +0000 (10:23 +1000)]
thp: remove wake_up_interruptible in the exit path

Add a kthread_should_stop() check to the conditions used to wake up on
khugepaged_wait; kthread_stop is then enough to let the thread
exit.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: remove unnecessary khugepaged_thread check
Xiao Guangrong [Fri, 7 Sep 2012 00:23:38 +0000 (10:23 +1000)]
thp: remove unnecessary khugepaged_thread check

Now that khugepaged creation and cancellation are completely serialized
under the protection of khugepaged_mutex, it is impossible for more than
one khugepaged instance to be running.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agothp: move khugepaged_mutex out of khugepaged
Xiao Guangrong [Fri, 7 Sep 2012 00:23:38 +0000 (10:23 +1000)]
thp: move khugepaged_mutex out of khugepaged

Currently, khugepaged_mutex is used in a complex way that is hard to
understand.  In fact, it is only used to serialize start_khugepaged and
khugepaged, for these reasons:

- khugepaged_thread is shared between them
- the thp disable path (echo never > transparent_hugepage/enabled) is
  nonblocking, so we need to protect khugepaged_thread to get a stable
  running state

These can be avoided by:

- using the lock to serialize thread creation and cancellation
- not letting the thp disable path finish until the thread exits

Then khugepaged_thread is fully controlled by start_khugepaged, and
khugepaged no longer needs the lock.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>