]> git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
13 years agodrivers/leds/leds-renesas-tpu.c: update driver to use workqueue
Magnus Damm [Mon, 24 Oct 2011 14:58:52 +0000 (01:58 +1100)]
drivers/leds/leds-renesas-tpu.c: update driver to use workqueue

Use a workqueue in the Renesas TPU LED driver to allow the Runtime PM code
to sleep.

Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agodrivers/leds/leds-lm3530.c: remove obsolete cleanup for clientdata
Wolfram Sang [Mon, 24 Oct 2011 14:58:51 +0000 (01:58 +1100)]
drivers/leds/leds-lm3530.c: remove obsolete cleanup for clientdata

A few new i2c-drivers came into the kernel which clear the
clientdata-pointer on exit or error.  This is obsolete meanwhile, the core
will do it.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agodrivers/leds/led-triggers.c: fix memory leak
Masakazu Mokuno [Mon, 24 Oct 2011 14:58:51 +0000 (01:58 +1100)]
drivers/leds/led-triggers.c: fix memory leak

The memory for struct led_trigger should be kfreed in the
led_trigger_register() error path.  Also this function should return NULL
on error.

Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoleds-renesas-tpu-led-driver-v2-fix
Axel Lin [Mon, 24 Oct 2011 14:58:51 +0000 (01:58 +1100)]
leds-renesas-tpu-led-driver-v2-fix

include linux/module.h

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoleds: Renesas TPU LED driver
Magnus Damm [Mon, 24 Oct 2011 14:58:50 +0000 (01:58 +1100)]
leds: Renesas TPU LED driver

Add V2 of the LED driver for a single timer channel for the TPU hardware
block commonly found in Renesas SoCs.

The driver has been written with optimal Power Management in mind, so to
save power the LED is driven as a regular GPIO pin in case of maximum
brightness and power off which allows the TPU hardware to be idle and
which in turn allows the clocks to be stopped and the power domain to be
turned off transparently.

Any other brightness level requires use of the TPU hardware in PWM mode.
TPU hardware device clocks and power are managed through Runtime PM.
System suspend and resume is known to be working - during suspend the LED
is set to off by the generic LED code.

The TPU hardware timer is equipeed with a 16-bit counter together with an
up-to-divide-by-64 prescaler which makes the hardware suitable for
brightness control.  Hardware blink is unsupported.

The LED PWM waveform has been verified with a Fluke 123 Scope meter on a
sh7372 Mackerel board.  Tested with experimental sh7372 A3SP power domain
patches.  Platform device bind/unbind tested ok.

V2 has been tested on the DS2 LED of the sh73a0-based AG5EVM.

Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agobacklight: rename corgibl_limit_intensity() to genericbl_limit_intensity()
Axel Lin [Mon, 24 Oct 2011 14:58:50 +0000 (01:58 +1100)]
backlight: rename corgibl_limit_intensity() to genericbl_limit_intensity()

The rename of corgibl_limit_intensity is missed in commit d00ba726
("backlight: Rename the corgi backlight driver to generic").  Let's fix it
now.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agodrivers/video/backlight/l4f00242t03.c: use gpio_request_one() to simplify error handling
Fabio Estevam [Mon, 24 Oct 2011 14:58:50 +0000 (01:58 +1100)]
drivers/video/backlight/l4f00242t03.c: use gpio_request_one() to simplify error handling

Using gpio_request_one can make the error handling simpler.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agobacklight: fix broken regulator API usage in l4f00242t03
Mark Brown [Mon, 24 Oct 2011 14:58:49 +0000 (01:58 +1100)]
backlight: fix broken regulator API usage in l4f00242t03

The regulator support in the l4f00242t03 is very non-idiomatic.  Rather
than requesting the regulators based on the device name and the supply
names used by the device the driver requires boards to pass system
specific supply names around through platform data.  The driver also
conditionally requests the regulators based on this platform data, adding
unneeded conditional code to the driver.

Fix this by removing the platform data and converting to the standard
idiom, also updating all in tree users of the driver.  As no datasheet
appears to be available for the LCD I'm guessing the names for the
supplies based on the existing users and I've no ability to do anything
more than compile test.

The use of regulator_set_voltage() in the driver is also problematic,
since fixed voltages are required the expectation would be that the
voltages would be fixed in the constraints set by the machines rather than
manually configured by the driver, but is less problematic.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agovideo/backlight: remove obsolete cleanup for clientdata
Wolfram Sang [Mon, 24 Oct 2011 14:58:49 +0000 (01:58 +1100)]
video/backlight: remove obsolete cleanup for clientdata

A few new i2c-drivers came into the kernel which clear the
clientdata-pointer on exit or error.  This is obsolete meanwhile, the core
will do it.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoMAINTAINERS: add ASLR maintainer
Jiri Kosina [Mon, 24 Oct 2011 14:58:48 +0000 (01:58 +1100)]
MAINTAINERS: add ASLR maintainer

Since achieving the full ASLR by merging the PIE randomization in commit
cc503c1b43 ("x86: PIE executable randomization"), I have been dealing with
most (if not all) of the bugreports reported against userspace address
space randomization, so it might be a good idea to provide a decent
contact point in MAINTAINERS.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoMAINTAINERS: Linas has moved
Linas Vepstas [Mon, 24 Oct 2011 14:58:48 +0000 (01:58 +1100)]
MAINTAINERS: Linas has moved

While ego surfing, I noticed an email address problem.

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoMAINTAINERS: add new entry for ideapad-laptop
Ike Panhc [Mon, 24 Oct 2011 14:58:47 +0000 (01:58 +1100)]
MAINTAINERS: add new entry for ideapad-laptop

Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agopoll: add poll_requested_events() function
Hans Verkuil [Mon, 24 Oct 2011 14:58:47 +0000 (01:58 +1100)]
poll: add poll_requested_events() function

In some cases the poll() implementation in a driver has to do different
things depending on the events the caller wants to poll for.  An example
is when a driver needs to start a DMA engine if the caller polls for
POLLIN, but doesn't want to do that if POLLIN is not requested but instead
only POLLOUT or POLLPRI is requested.  This is something that can happen
in the video4linux subsystem.

Unfortunately, the current epoll/poll/select implementation doesn't
provide that information reliably.  The poll_table_struct does have it: it
has a key field with the event mask.  But once a poll() call matches one
or more bits of that mask any following poll() calls are passed a NULL
poll_table_struct pointer.

The solution is to set the qproc field to NULL in poll_table_struct once
poll() matches the events, not the poll_table_struct pointer itself.  That
way drivers can obtain the mask through a new poll_requested_events
inline.

The poll_table_struct can still be NULL since some kernel code calls it
internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
that case poll_requested_events() returns ~0 (i.e.  all events).

Since eventpoll always leaves the key field at ~0 instead of using the
requested events mask, that source was changed as well to properly fill in
the key field.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Jonathan Corbet <corbet@lwn.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agotreewide-use-__printf-not-__attribute__formatprintf: revert arch bits
Andrew Morton [Mon, 24 Oct 2011 14:58:47 +0000 (01:58 +1100)]
treewide-use-__printf-not-__attribute__formatprintf: revert arch bits

Cc: Joe Perches <joe@perches.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agotreewide: use __printf not __attribute__((format(printf,...)))
Joe Perches [Mon, 24 Oct 2011 14:58:46 +0000 (01:58 +1100)]
treewide: use __printf not __attribute__((format(printf,...)))

Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.

Done via script and a little typing.

$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
  grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
  xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoprintk: remove bounds checking for log_prefix
William Douglas [Mon, 24 Oct 2011 14:58:46 +0000 (01:58 +1100)]
printk: remove bounds checking for log_prefix

Currently log_prefix is testing that the first character of the log level
and facility is less than '0' and greater than '9' (which is always
false).

Since the code being updated works because strtoul bombs out (endp isn't
updated) and 0 is returned anyway just remove the check and don't change
the behavior of the function.

Signed-off-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoprintk: fix bounds checking for log_prefix
William Douglas [Mon, 24 Oct 2011 14:58:45 +0000 (01:58 +1100)]
printk: fix bounds checking for log_prefix

Currently log_prefix is testing that the first character of the log level
and facility is less than '0' and greater than '9' (which is always
false).  It should be testing to see if the character less than '0' or
greater than '9' instead.  This patch makes that change.

The code being changed worked because strtoul bombs out (endp isn't
updated) and 0 is returned anyway.

Signed-off-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoprintk: add console_suspend module parameter
Yanmin Zhang [Mon, 24 Oct 2011 14:58:45 +0000 (01:58 +1100)]
printk: add console_suspend module parameter

We are enabling some power features on medfield.  To test suspend-2-RAM
conveniently, we need turn on/off console_suspend_enabled frequently.

Add a module parameter, so users could change it by:
/sys/module/printk/parameters/console_suspend

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoprintk: add ignore_loglevel as module parameter
Yanmin Zhang [Mon, 24 Oct 2011 14:58:44 +0000 (01:58 +1100)]
printk: add ignore_loglevel as module parameter

We are enabling some power features on medfield.  To test suspend-2-RAM
conveniently, we need turn on/off ignore_loglevel frequently without
rebooting.

Add a module parameter, so users could change it by:
/sys/module/printk/parameters/ignore_loglevel

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoprintk: add module parameter ignore_loglevel to control ignore_loglevel
Yanmin Zhang [Mon, 24 Oct 2011 14:58:44 +0000 (01:58 +1100)]
printk: add module parameter ignore_loglevel to control ignore_loglevel

We are enabling some power features on medfield.  To test suspend-2-RAM
conveniently, we need turn on/off ignore_loglevel frequently without
rebooting.

Add a module parameter, so users can change it by:
/sys/module/printk/parameters/ignore_loglevel

Signed-off-by: Yanmin Zhang <yanmin.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agokernel-sysctlc-add-cap_last_cap-to-proc-sys-kernel-fix
Andrew Morton [Mon, 24 Oct 2011 14:58:42 +0000 (01:58 +1100)]
kernel-sysctlc-add-cap_last_cap-to-proc-sys-kernel-fix

make cap_last_cap const, per Ulrich

Cc: Dan Ballard <dan@mindstab.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Morris <jmorris@namei.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Ulrich Drepper <drepper@akkadia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agokernel/sysctl.c: add cap_last_cap to /proc/sys/kernel
Dan Ballard [Mon, 24 Oct 2011 14:58:42 +0000 (01:58 +1100)]
kernel/sysctl.c: add cap_last_cap to /proc/sys/kernel

Userspace needs to know the highest valid capability of the running
kernel, which right now cannot reliably be retrieved from the header files
only.  The fact that this value cannot be determined properly right now
creates various problems for libraries compiled on newer header files
which are run on older kernels.  They assume capabilities are available
which actually aren't.  libcap-ng is one example.  And we ran into the
same problem with systemd too.

Now the capability is exported in /proc/sys/kernel/cap_last_cap.

Signed-off-by: Dan Ballard <dan@mindstab.net>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Ulrich Drepper <drepper@akkadia.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agowatchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL
Vasily Averin [Mon, 24 Oct 2011 14:58:41 +0000 (01:58 +1100)]
watchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL

Fix compilation warnings for CONFIG_SYSCTL=n:

fixed compilation warnings in case of disabled CONFIG_SYSCTL
kernel/watchdog.c:483:13: warning: `watchdog_enable_all_cpus' defined but not used
kernel/watchdog.c:500:13: warning: `watchdog_disable_all_cpus' defined but not used

these functions are static and are used only in sysctl handler, so move
them inside #ifdef CONFIG_SYSCTL too

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agostop_machine-make-stop_machine-safe-and-efficient-to-call-early-v3.
Jeremy Fitzhardinge [Mon, 24 Oct 2011 14:58:41 +0000 (01:58 +1100)]
stop_machine-make-stop_machine-safe-and-efficient-to-call-early-v3.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agostop_machine: make stop_machine safe and efficient to call early
Jeremy Fitzhardinge [Mon, 24 Oct 2011 14:58:41 +0000 (01:58 +1100)]
stop_machine: make stop_machine safe and efficient to call early

Make stop_machine() safe to call early in boot, before SMP has been set
up, by simply calling the callback function directly if there's only one
CPU online.

[ Fixes from AKPM:
   - add comment
   - local_irq_flags, not save_flags
   - also call hard_irq_disable() for systems which need it

  Tejun suggested using an explicit flag rather than just looking at
  the online cpu count. ]

Cc: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Subject: stop_machine-make-stop_machine-safe-and-efficient-to-call-early-v3.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Tejun Heo <tj@kernel.org>
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Subject: stop_machine-make-stop_machine-safe-and-efficient-to-call-early-v3.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agodrivers/misc/ad525x_dpot-i2c.c: add i2c support for AD5161
Peter Korsgaard [Mon, 24 Oct 2011 14:58:40 +0000 (01:58 +1100)]
drivers/misc/ad525x_dpot-i2c.c: add i2c support for AD5161

Commit 6c536e4ce8e ("ad525x_dpot: add support for SPI parts") added
support for the AD5161 through SPI, but the device supports both I2C and
SPI (depending on the DIS pin), so add it to -i2c as well.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agodriver/misc/fsa9480.c fix potential null-pointer dereference
Jonghwan Choi [Mon, 24 Oct 2011 14:58:40 +0000 (01:58 +1100)]
driver/misc/fsa9480.c fix potential null-pointer dereference

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Cc: Donggeun Kim <dg77.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3lv02d: make regulator API usage unconditional
Mark Brown [Mon, 24 Oct 2011 14:58:40 +0000 (01:58 +1100)]
lis3lv02d: make regulator API usage unconditional

The regulator API contains a range of features for stubbing itself out
when not in use and for transparently restricting the actual effect of
regulator API calls where they can't be supported on a particular system
so that drivers don't need to individually implement this.  Simplify the
driver slightly by making use of this idiom.

The only in tree user is ecovec24 which does not use the regulator API.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3-remove-the-references-to-the-global-variable-in-core-driver-fix
Ilkka Koskinen [Mon, 24 Oct 2011 14:58:39 +0000 (01:58 +1100)]
lis3-remove-the-references-to-the-global-variable-in-core-driver-fix

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3: remove the references to the global variable in core driver
Éric Piel [Mon, 24 Oct 2011 14:58:39 +0000 (01:58 +1100)]
lis3: remove the references to the global variable in core driver

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3: change exported function to use passed parameter
Éric Piel [Mon, 24 Oct 2011 14:58:38 +0000 (01:58 +1100)]
lis3: change exported function to use passed parameter

Change exported functions to use the device given as parameter
instead of the global one.

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3: use consistent naming of variables
Éric Piel [Mon, 24 Oct 2011 14:58:38 +0000 (01:58 +1100)]
lis3: use consistent naming of variables

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3: free regulators if probe() fails
Éric Piel [Mon, 24 Oct 2011 14:58:38 +0000 (01:58 +1100)]
lis3: free regulators if probe() fails

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agohp_accel: add HP ProBook 655x
Éric Piel [Mon, 24 Oct 2011 14:58:37 +0000 (01:58 +1100)]
hp_accel: add HP ProBook 655x

Add axis correction for HP ProBook 6555b.

Signed-off-by: Malte Starostik <m-starostik@versanet.de>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3: add support for HP EliteBook 8540w
Éric Piel [Mon, 24 Oct 2011 14:58:37 +0000 (01:58 +1100)]
lis3: add support for HP EliteBook 8540w

Add axis correction for HP EliteBook 8540w.

Reported-by: Lyall Pearce <lyall.pearce@hp.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3: add support for HP EliteBook 2730p
Éric Piel [Mon, 24 Oct 2011 14:58:37 +0000 (01:58 +1100)]
lis3: add support for HP EliteBook 2730p

Add axis correction for HP EliteBook 2730p.

Tested-by: Witold Pilat <witold.pilat@gmail.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3: update maintainer information
Éric Piel [Mon, 24 Oct 2011 14:58:36 +0000 (01:58 +1100)]
lis3: update maintainer information

In the move of the lis3 driver, the hp_accel.c file got dropped from the
MAINTAINER file. Make it explicit again that this file is tied to lis3
again.

Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolis3lv02d: avoid divide by zero due to unchecked
Éric Piel [Mon, 24 Oct 2011 14:58:36 +0000 (01:58 +1100)]
lis3lv02d: avoid divide by zero due to unchecked

After an "unexpected" reboot, I found this Oops in my logs:

divide error: 0000 [#1] PREEMPT SMP=20
CPU 0=20
Modules linked in: lis3lv02d hp_wmi input_polldev [...]
Pid: 390, comm: modprobe Tainted: G         C  2.6.39-rc7-wl+=20
RIP: 0010:[<ffffffffa014b427>]  [<ffffffffa014b427>]
 lis3lv02d_poweron+0x4e/0x94 [lis3lv02d]
RSP: 0018:ffff8801d6407cf8  EFLAGS: 00010246
RAX: 0000000000000bb8 RBX: ffffffffa014e000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffea00066e4708 RDI: ffff8801df002700
RBP: ffff8801d6407d18 R08: ffffea00066c5a30 R09: ffffffff812498c9
R10: ffff8801d7bfcea0 R11: ffff8801d7bfce10 R12: 0000000000000bb8
R13: 00000000ffffffda R14: ffffffffa0154120 R15: ffffffffa0154030
=46S:  00007fc0705db700(0000) GS:ffff8801dfa00000(0000) knlGS:0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f33549174f0 CR3: 00000001d65c9000 CR4: 00000000000406f0
Process modprobe (pid: 390, threadinfo ffff8801d6406000, task ffff8801d6b40=
000)
Stack:
 ffffffffa0154120 62ffffffa0154030 ffffffffa014e000 00000000ffffffea
 ffff8801d6407d58 ffffffffa014bcc1 0000000000000000 0000000000000048
 ffff8801d8bae800 00000000ffffffea 00000000ffffffda ffffffffa0154120
Call Trace:
 [<ffffffffa014bcc1>] lis3lv02d_init_device+0x1ce/0x496 [lis3lv02d]
 [<ffffffffa01522ff>] lis3lv02d_add+0x10f/0x17c [hp_accel]
 [<ffffffff81233e11>] acpi_device_probe+0x49/0x117
[...]
Code: 3a 75 06 80 4d ef 50 eb 04 80 4d ef 40 0f b6 55 ef be 21
00 00 00 48 89 df ff 53 18 44 8b 63 6c e8 3e fc ff ff 89 c1 44
89 e0 99 <f7> f9 89 c7 e8 93 82 ef e0 48 83 7b 30 00 74 2d 45
31 e4 80 7b=20
RIP  [<ffffffffa014b427>] lis3lv02d_poweron+0x4e/0x94 [lis3lv02d]
 RSP <ffff8801d6407cf8>

>From my POV, it looks like the hardware is not working as expected
and returns a bogus data rate. The driver doesn't check the result
and directly uses it as some sort of divisor in some places:

msleep(lis3->pwron_delay / lis3lv02d_get_odr());

Under this circumstances, this could very well cause the
"divide by zero" exception from above.

For now, I fixed it the easiest and most obvious way:
Check if the result is sane and if it isn't use a sane default
instead. I went for "100" in the latter case, simply because
/sys/devices/platform/lis3lv02d/rate returns it on a successful
boot.

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agodrivers/hwmon/hwmon.c: convert idr to ida and use ida_simple_get()
Jonathan Cameron [Mon, 24 Oct 2011 14:58:35 +0000 (01:58 +1100)]
drivers/hwmon/hwmon.c: convert idr to ida and use ida_simple_get()

A straightforward looking use of idr for a device id.

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agohwmon: convert idr to ida and use ida_simple interface
Jonathan Cameron [Mon, 24 Oct 2011 14:58:35 +0000 (01:58 +1100)]
hwmon: convert idr to ida and use ida_simple interface

hwmon was using an idr with a NULL pointer, so convert to an
ida which then allows use of Rusty's ida_simple_get.

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agolib/Kconfig.debug: fix help message for DEFAULT_HUNG_TASK_TIMEOUT
Jiaju Zhang [Mon, 24 Oct 2011 14:58:35 +0000 (01:58 +1100)]
lib/Kconfig.debug: fix help message for DEFAULT_HUNG_TASK_TIMEOUT

Added missing _secs in the help message of config DEFAULT_HUNG_TASK_TIMEOUT.

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agofs/pipe.c: add ->statfs callback for pipefs
Pavel Emelyanov [Mon, 24 Oct 2011 14:58:34 +0000 (01:58 +1100)]
fs/pipe.c: add ->statfs callback for pipefs

Currently a statfs on a pipe's /proc/<pid>/fd/ link returns -ENOSYS.  Wire
pipfs up so that the statfs succeeds.

This is required by checkpoint-restart in the userspace to make it
possible to distinguish pipes from fifos.

When we dump information about task's open files we use the /proc/pid/fd
directoy's symlinks and the fact that opening any of them gives us exactly
the same dentry->inode pair as the original process has.  Now if a task
we're dumping has opened pipe and fifo we need to detect this and act
accordingly.  Knowing that an fd with type S_ISFIFO resides on a pipefs is
the most precise way.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agointel_idle: disable auto_demotion for hotplugged CPUs
Shaohua Li [Mon, 24 Oct 2011 14:58:34 +0000 (01:58 +1100)]
intel_idle: disable auto_demotion for hotplugged CPUs

auto_demotion_disable is called only for online CPUs.  For hotplugged
CPUs, we should disable it too.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agointel_idle: fix API misuse
Shaohua Li [Mon, 24 Oct 2011 14:58:33 +0000 (01:58 +1100)]
intel_idle: fix API misuse

smp_call_function() only lets all other CPUs execute a specific function,
while we expect all CPUs do in intel_idle.  Without the fix, we could have
one cpu which has auto_demotion enabled or has no boradcast timer setup.
Usually we don't see impact because auto demotion just harms power and the
intel_idle init is called in CPU 0, where boradcast timer delivers
interrupt, but this still could be a problem.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoalpha: wire up sendmmsg syscall
Michael Cree [Mon, 24 Oct 2011 14:58:33 +0000 (01:58 +1100)]
alpha: wire up sendmmsg syscall

Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoalpha: wire up accept4 syscall
Michael Cree [Mon, 24 Oct 2011 14:58:32 +0000 (01:58 +1100)]
alpha: wire up accept4 syscall

Somehow wiring up the accept4 syscall on Alpha was missed long ago.
This commit rectifies that oversight.

Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agohpet: factor timer allocate from open
Magnus Lynch [Mon, 24 Oct 2011 14:58:32 +0000 (01:58 +1100)]
hpet: factor timer allocate from open

The current implementation of the /dev/hpet driver couples opening the
device with allocating one of the (scarce) timers (aka comparators).  This
is a limitation in that the main counter may be valuable to applications
seeking a high-resolution timer who have no use for the interrupt
generating functionality of the comparators.

This patch alters the open semantics so that when the device is opened, no
timer is allocated.  Operations that depend on a timer being in context
implicitly attempt allocating a timer, to maintain backward compatibility.
 There is also an IOCTL (HPET_ALLOC_TIMER _IO) added so that the
allocation may be done explicitly.  (I prefer the explicit open then
allocate pattern but don't know how practical it would be to require all
existing code to be changed.)

/dev/hpet is accessed via mmap().  This is the only interface of /dev/hpet
that is actually used in practice.

[akpm@linux-foundation.org: coding-style tweaks]
[arnd@arndb.de: fix build]
Signed-off-by: Magnus Lynch <maglyx@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoinclude/linux/security.h: fix security_inode_init_security() arg
Andrew Morton [Mon, 24 Oct 2011 14:58:32 +0000 (01:58 +1100)]
include/linux/security.h: fix security_inode_init_security() arg

Make the security_inode_init_security() initxattrs arg const, to match the
non-stubbed version of that function.

Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoselinuxfs: remove custom hex_to_bin()
Andy Shevchenko [Mon, 24 Oct 2011 14:58:31 +0000 (01:58 +1100)]
selinuxfs: remove custom hex_to_bin()

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm-munlock-use-mapcount-to-avoid-terrible-overhead-fix
Andrew Morton [Mon, 24 Oct 2011 14:58:31 +0000 (01:58 +1100)]
mm-munlock-use-mapcount-to-avoid-terrible-overhead-fix

add comment

Cc: Hugh Dickins <hughd@google.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: munlock use mapcount to avoid terrible overhead
Hugh Dickins [Mon, 24 Oct 2011 14:58:30 +0000 (01:58 +1100)]
mm: munlock use mapcount to avoid terrible overhead

A process spent 30 minutes exiting, just munlocking the pages of a large
anonymous area that had been alternately mprotected into page-sized vmas:
for every single page there's an anon_vma walk through all the other
little vmas to find the right one.

A general fix to that would be a lot more complicated (use prio_tree on
anon_vma?), but there's one very simple thing we can do to speed up the
common case: if a page to be munlocked is mapped only once, then it is our
vma that it is mapped into, and there's no need whatever to walk through
all the others.

Okay, there is a very remote race in munlock_vma_pages_range(), if between
its follow_page() and lock_page(), another process were to munlock the
same page, then page reclaim remove it from our vma, then another process
mlock it again.  We would find it with page_mapcount 1, yet it's still
mlocked in another process.  But never mind, that's much less likely than
the down_read_trylock() failure which munlocking already tolerates (in
try_to_unmap_one()): in due course page reclaim will discover and move the
page to unevictable instead.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm/huge_memory: fix typo when updating mmu cache
Hillf Danton [Mon, 24 Oct 2011 14:58:30 +0000 (01:58 +1100)]
mm/huge_memory: fix typo when updating mmu cache

There are three cases of update_mmu_cache() in the file, and the case in
function collapse_huge_page() has a typo, namely the last parameter used,
which is corrected based on the other two cases.

Due to the define of update_mmu_cache by X86, the only arch that
implements THP currently, the change here has no really crystal point, but
one or two minutes of efforts could be saved for those archs that are
likely to support THP in future.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm/huge_memory: fix copying user highpage
Hillf Danton [Mon, 24 Oct 2011 14:58:12 +0000 (01:58 +1100)]
mm/huge_memory: fix copying user highpage

The THP copy-on-write handler falls back to regular-sized pages for a huge
page replacement upon allocation failure or if THP has been individually
disabled in the target VMA.  The loop responsible for copying page-sized
chunks accidentally uses multiples of PAGE_SHIFT instead of PAGE_SIZE as
the virtual address arg for copy_user_highpage().

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: do not drain pagevecs for mlockall(MCL_FUTURE)
Christoph Lameter [Mon, 24 Oct 2011 14:54:33 +0000 (01:54 +1100)]
mm: do not drain pagevecs for mlockall(MCL_FUTURE)

MCL_FUTURE does not move pages between lru list and draining the LRU per
cpu pagevecs is a nasty activity.  Avoid doing it unecessarily.

Signed-off-by: Christoph Lameter <cl@gentwo.org>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agooom: thaw threads if oom killed thread is frozen before deferring
David Rientjes [Mon, 24 Oct 2011 14:54:32 +0000 (01:54 +1100)]
oom: thaw threads if oom killed thread is frozen before deferring

If a thread has been oom killed and is frozen, thaw it before returning to
the page allocator.  Otherwise, it can stay frozen indefinitely and no
memory will be freed.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Michal Hocko <mhocko@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: add vm_area_add_early()
Nicolas Pitre [Mon, 24 Oct 2011 14:54:32 +0000 (01:54 +1100)]
mm: add vm_area_add_early()

The existing vm_area_register_early() allows for early vmalloc space
allocation.  However upcoming cleanups in the ARM architecture require
that some fixed locations in the vmalloc area be reserved also very early.

The name "vm_area_register_early" would have been a good name for the
reservation part without the allocation.  Since it is already in use with

Both vm_area_register_early() and vm_area_add_early() can be used together
meaning that the former is now implemented using the later where it is
ensured that no conflicting areas are added, but no attempt is made to
make the allocation scheme in vm_area_register_early() more sophisticated.
After all, you must know what you're doing when using those functions.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agovmscan: abort reclaim/compaction if compaction can proceed
Mel Gorman [Mon, 24 Oct 2011 14:54:31 +0000 (01:54 +1100)]
vmscan: abort reclaim/compaction if compaction can proceed

If compaction can proceed, shrink_zones() stops doing any work but its
callers still call shrink_slab() which raises the priority and potentially
sleeps.  This is unnecessary and wasteful so this patch aborts direct
reclaim/compaction entirely if compaction can proceed.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agovmscan-limit-direct-reclaim-for-higher-order-allocations-fix
Johannes Weiner [Mon, 24 Oct 2011 14:54:31 +0000 (01:54 +1100)]
vmscan-limit-direct-reclaim-for-higher-order-allocations-fix

change comment to explain the order check

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agovmscan: limit direct reclaim for higher order allocations
Rik van Riel [Mon, 24 Oct 2011 14:54:31 +0000 (01:54 +1100)]
vmscan: limit direct reclaim for higher order allocations

When suffering from memory fragmentation due to unfreeable pages, THP page
faults will repeatedly try to compact memory.  Due to the unfreeable
pages, compaction fails.

Needless to say, at that point page reclaim also fails to create free
contiguous 2MB areas.  However, that doesn't stop the current code from
trying, over and over again, and freeing a minimum of 4MB (2UL <<
sc->order pages) at every single invocation.

This resulted in my 12GB system having 2-3GB free memory, a corresponding
amount of used swap and very sluggish response times.

This can be avoided by having the direct reclaim code not reclaim from
zones that already have plenty of free memory available for compaction.

If compaction still fails due to unmovable memory, doing additional
reclaim will only hurt the system, not help.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agovmscan: add barrier to prevent evictable page in unevictable list
Minchan Kim [Mon, 24 Oct 2011 14:54:30 +0000 (01:54 +1100)]
vmscan: add barrier to prevent evictable page in unevictable list

When a race between putback_lru_page() and shmem_lock with lock=0 happens,
progrom execution order is as follows, but clear_bit in processor #1 could
be reordered right before spin_unlock of processor #1.  Then, the page
would be stranded on the unevictable list.

spin_lock
SetPageLRU
spin_unlock
                                clear_bit(AS_UNEVICTABLE)
                                spin_lock
                                if PageLRU()
                                        if !test_bit(AS_UNEVICTABLE)
                                         move evictable list
smp_mb
if !test_bit(AS_UNEVICTABLE)
        move evictable list
                                spin_unlock

But, pagevec_lookup() in scan_mapping_unevictable_pages() has
rcu_read_[un]lock() so it could protect reordering before reaching
test_bit(AS_UNEVICTABLE) on processor #1 so this problem never happens.
But it's a unexpected side effect and we should solve this problem
properly.

This patch adds a barrier after mapping_clear_unevictable.

I didn't meet this problem but just found during review.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm/huge_memory.c: quiet sparse noise
H Hartley Sweeten [Mon, 24 Oct 2011 14:54:30 +0000 (01:54 +1100)]
mm/huge_memory.c: quiet sparse noise

Quiet the sparse noise:

warning: symbol 'khugepaged_scan' was not declared. Should it be static?
warning: context imbalance in 'khugepaged_scan_mm_slot' - unexpected unlock

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm/mempolicy.c: quiet sparse noise
H Hartley Sweeten [Mon, 24 Oct 2011 14:54:30 +0000 (01:54 +1100)]
mm/mempolicy.c: quiet sparse noise

Quiet the spares noise:

warning: symbol 'default_policy' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Stephen Wilson <wilsons@start.ca>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm/thrash.c: quiet sparse noise
H Hartley Sweeten [Mon, 24 Oct 2011 14:54:29 +0000 (01:54 +1100)]
mm/thrash.c: quiet sparse noise

Quiet the following sparse noise:

warning: symbol 'swap_token_memcg' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm/memblock.c: quiet sparse noise
H Hartley Sweeten [Mon, 24 Oct 2011 14:54:29 +0000 (01:54 +1100)]
mm/memblock.c: quiet sparse noise

Quiet the following sparse noise in this file:

warning: symbol 'memblock_overlaps_region' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers,com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tomi Valkeinen <tomi.valkeinen@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: disable user interface to manually rescue unevictable pages
Johannes Weiner [Mon, 24 Oct 2011 14:54:28 +0000 (01:54 +1100)]
mm: disable user interface to manually rescue unevictable pages

At one point, anonymous pages were supposed to go on the unevictable list
when no swap space was configured, and the idea was to manually rescue
those pages after adding swap and making them evictable again.  But
nowadays, swap-backed pages on the anon LRU list are not scanned without
available swap space anyway, so there is no point in moving them to a
separate list anymore.

The manual rescue could also be used in case pages were stranded on the
unevictable list due to race conditions.  But the code has been around for
a while now and newly discovered bugs should be properly reported and
dealt with instead of relying on such a manual fixup.

In addition to the lack of a usecase, the sysfs interface to rescue pages
from a specific NUMA node has been broken since its introduction, so it's
unlikely that anybody ever relied on that.

This patch removes the functionality behind the sysctl and the
node-interface and emits a one-time warning when somebody tries to access
either of them.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reported-by: Kautuk Consul <consul.kautuk@gmail.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agovmscan.c: fix invalid strict_strtoul() check in write_scan_unevictable_node()
Kautuk Consul [Mon, 24 Oct 2011 14:54:28 +0000 (01:54 +1100)]
vmscan.c: fix invalid strict_strtoul() check in write_scan_unevictable_node()

write_scan_unevictable_node() checks the value req returned by
strict_strtoul() and returns 1 if req is 0.

However, when strict_strtoul() returns 0, it means successful conversion
of buf to unsigned long.

Due to this, the function was not proceeding to scan the zones for
unevictable pages even though we write a valid value to the
scan_unevictable_pages sys file.

Change this check slightly to check for invalid value in buf as well as 0
value stored in res after successful conversion via strict_strtoul.  In
both cases, we do not perform the scanning of this node's zones.

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: fix kunmap_high() comment
Li Haifeng [Mon, 24 Oct 2011 14:54:27 +0000 (01:54 +1100)]
mm: fix kunmap_high() comment

Signed-off-by: Li Haifeng <omycle@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: compaction: make compact_zone_order() static
Kyungmin Park [Mon, 24 Oct 2011 14:54:27 +0000 (01:54 +1100)]
mm: compaction: make compact_zone_order() static

There's no compact_zone_order() user outside file scope, so make it static.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoHWPOISON: convert pr_debug()s to pr_info()s
Dean Nelson [Mon, 24 Oct 2011 14:54:27 +0000 (01:54 +1100)]
HWPOISON: convert pr_debug()s to pr_info()s

Commit fb46e73520940b ("HWPOISON: Convert pr_debugs to pr_info) authored
by Andi Kleen converted a number of pr_debug()s to pr_info()s.

About the same time additional code with pr_debug()s was added by two
other commits 8c6c2ecb4466 ("HWPOSION, hugetlb: recover from free hugepage
error when !MF_COUNT_INCREASED") and d950b95882f3d ("HWPOISON, hugetlb:
soft offlining for hugepage").  And these pr_debug()s failed to get
converted to pr_info()s.

This patch converts them as well.  And does some minor related whitespace
cleanup.

Signed-off-by: Dean Nelson <dnelson@redhat.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agofs/buffer.c: add device information for error output in __find_get_block_slow()
Tao Ma [Mon, 24 Oct 2011 14:54:26 +0000 (01:54 +1100)]
fs/buffer.c: add device information for error output in __find_get_block_slow()

On the ext4 mailing list[1], we got some report about errors in
__find_get_block_slow(), but the information is very limited.

If the device information is given, we can know the name of the sick
volume.  Futhermore, we can get the corresponding status of that
block(group, inode block etc) by analyzing the disk layout.

[1] http://marc.info/?l=linux-ext4&m=131379831421147&w=2

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm-mmapc-eliminate-the-ret-variable-from-mm_take_all_locks-fix
Andrew Morton [Mon, 24 Oct 2011 14:54:26 +0000 (01:54 +1100)]
mm-mmapc-eliminate-the-ret-variable-from-mm_take_all_locks-fix

Cc: Kautuk Consul <consul.kautuk@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm/mmap.c: eliminate the ret variable from mm_take_all_locks()
Kautuk Consul [Mon, 24 Oct 2011 14:54:26 +0000 (01:54 +1100)]
mm/mmap.c: eliminate the ret variable from mm_take_all_locks()

The ret variable is really not needed in mm_take_all_locks().

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agovmscan: fix shrinker callback bug in fs/super.c
Mikulas Patocka [Mon, 24 Oct 2011 14:54:25 +0000 (01:54 +1100)]
vmscan: fix shrinker callback bug in fs/super.c

The callback must not return -1 when nr_to_scan is zero. Fix the bug in
fs/super.c and add this requirement to the callback specification.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm-add-comment-explaining-task-state-setting-in-bdi_forker_thread-fix
Andrew Morton [Mon, 24 Oct 2011 14:54:25 +0000 (01:54 +1100)]
mm-add-comment-explaining-task-state-setting-in-bdi_forker_thread-fix

fiddle wording

Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoksm: fix the comment of try_to_unmap_one()
Wanlong Gao [Mon, 24 Oct 2011 14:54:25 +0000 (01:54 +1100)]
ksm: fix the comment of try_to_unmap_one()

try_to_unmap_one() is called by try_to_unmap_ksm(), too.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agothp: share get_huge_page_tail()
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:24 +0000 (01:54 +1100)]
thp: share get_huge_page_tail()

This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agosparc: gup_pte_range() support THP based tail recounting
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:24 +0000 (01:54 +1100)]
sparc: gup_pte_range() support THP based tail recounting

Up to this point the code assumed old refcounting for hugepages (pre-thp).
 This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agos390: gup_huge_pmd() return 0 if pte changes
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:23 +0000 (01:54 +1100)]
s390: gup_huge_pmd() return 0 if pte changes

s390 didn't return 0 in that case, if it's rolling back the *nr pointer it
should also return zero to avoid adding pages to the array at the wrong
offset.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agos390: gup_huge_pmd() support THP tail recounting
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:23 +0000 (01:54 +1100)]
s390: gup_huge_pmd() support THP tail recounting

Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agopowerpc: gup_huge_pmd() return 0 if pte changes
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:23 +0000 (01:54 +1100)]
powerpc: gup_huge_pmd() return 0 if pte changes

powerpc didn't return 0 in that case, if it's rolling back the *nr pointer
it should also return zero to avoid adding pages to the array at the wrong
offset.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agopowerpc: gup_hugepte() support THP based tail recounting
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:22 +0000 (01:54 +1100)]
powerpc: gup_hugepte() support THP based tail recounting

Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agopowerpc: gup_hugepte() avoid to free the head page too many times
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:22 +0000 (01:54 +1100)]
powerpc: gup_hugepte() avoid to free the head page too many times

We only taken "refs" pins on the head page not "*nr" pins.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agopowerpc: get_hugepte() don't put_page() the wrong page
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:22 +0000 (01:54 +1100)]
powerpc: get_hugepte() don't put_page() the wrong page

"page" may have changed to point to the next hugepage after the loop
completed, The references have been taken on the head page, so the
put_page must happen there too.

This is a longstanding issue pre-thp inclusion.

It's totally unclear how these page_cache_add_speculative and pte_val(pte)
!= pte_val(*ptep) checks are necessary across all the powerpc gup_fast
code, when x86 doesn't need any of that: there's no way the page can be
freed with irq disabled so we're guaranteed the atomic_inc will happen on
a page with page_count > 0 (so not needing the speculative check).  The
pte check is also meaningless on x86: no need to rollback on x86 if the
pte changed, because the pte can still change a CPU tick after the check
succeeded and it won't be rolled back in that case.  The important thing
is we got a reference on a valid page that was mapped there a CPU tick
ago.  So not knowing the soft tlb refill code of ppc64 in great detail I'm
not removing the "speculative" page_count increase and the pte checks
across all the code, but unless there's a strong reason for it they should
be later cleaned up too.

If a pte can change from huge to non-huge (like it could happen with THP)
passing a pte_t *ptep to gup_hugepte() would also require to repeat the
is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs only
so I'm not altering that.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agopowerpc: remove superfluous PageTail checks on the pte gup_fast
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:21 +0000 (01:54 +1100)]
powerpc: remove superfluous PageTail checks on the pte gup_fast

This part of gup_fast doesn't seem capable of handling hugetlbfs ptes,
those should be handled by gup_hugepd only, so these checks are
superfluous.

Plus if this wasn't a noop, it would have oopsed because, the insistence
of using the speculative refcounting would trigger a VM_BUG_ON if a tail
page was encountered in the page_cache_get_speculative().

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: thp: tail page refcounting fix
Andrea Arcangeli [Mon, 24 Oct 2011 14:54:21 +0000 (01:54 +1100)]
mm: thp: tail page refcounting fix

Michel while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't
safe, if the pfn ended up being a tail page of a transparent hugepage
under splitting by __split_huge_page_refcount().  He then found the
problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail->_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail->_mapcount is guaranteed zero and is
unused for all tail pages of a compound page, so we can simply account the
tail page references there and transfer them to tail_page->_count in
__split_huge_page_refcount() (in addition to the head_page->_mapcount).

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  As
opposed other get_user_page users like secondary-MMU page fault to
establish the shadow pagetables would never call any superflous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).  The standard get_page() as invoked by direct-io instead
will now take the compound_lock but still only for tail pages.  The
direct-io paths are usually I/O bound and the compound_lock is per THP so
very finegrined, so there's no risk of scalability issues with it.  A
simple direct-io benchmarks with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation and
usually only run in I/O paths.

This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm-add-extra-free-kbytes-tunable-update-checkpatch-fixes
Andrew Morton [Mon, 24 Oct 2011 14:54:20 +0000 (01:54 +1100)]
mm-add-extra-free-kbytes-tunable-update-checkpatch-fixes

ERROR: trailing whitespace
#98: FILE: mm/page_alloc.c:5303:
+ * free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so $

ERROR: trailing whitespace
#103: FILE: mm/page_alloc.c:5307:
+int free_kbytes_sysctl_handler(ctl_table *table, int write, $

ERROR: need consistent spacing around '*' (ctx:WxV)
#103: FILE: mm/page_alloc.c:5307:
+int free_kbytes_sysctl_handler(ctl_table *table, int write,
                                          ^

total: 3 errors, 0 warnings, 69 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-add-extra-free-kbytes-tunable-update.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm-add-extra-free-kbytes-tunable-update
Rik van Riel [Mon, 24 Oct 2011 14:54:20 +0000 (01:54 +1100)]
mm-add-extra-free-kbytes-tunable-update

All the fixes suggested by Andrew Morton.   Not much of a changelog
since the patch should probably be folded into
mm-add-extra-free-kbytes-tunable.patch

Thank you for pointing these out, Andrew.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: add extra free kbytes tunable
Rik van Riel [Mon, 24 Oct 2011 14:54:20 +0000 (01:54 +1100)]
mm: add extra free kbytes tunable

Add a userspace visible knob to tell the VM to keep an extra amount of
memory free, by increasing the gap between each zone's min and low
watermarks.

This is useful for realtime applications that call system calls and have a
bound on the number of allocations that happen in any short time period.
In this application, extra_free_kbytes would be left at an amount equal to
or larger than than the maximum number of allocations that happen in any
burst.

It may also be useful to reduce the memory use of virtual machines
(temporarily?), in a way that does not cause memory fragmentation like
ballooning does.

Testing results from Satoru Moriya:

: I ran some sample workloads and measure memory allocation latency
: (latency of __alloc_page_nodemask()).
: The test is like following:
:
:  - CPU: 1 socket, 4 core
:  - Memory: 4GB
:
:  - Background load:
:    $ dd if=3D/dev/zero of=3D/tmp/tmp1
:    $ dd if=3D/dev/zero of=3D/tmp/tmp2
:    $ dd if=3D/dev/zero of=3D/tmp/tmp3
:
:  - Main load:
:    $ mapped-file-stream 1 $((1024 * 1024 * 640))  --(*)
:
:  (*) This is made by Johannes Weiner
:      https://lkml.org/lkml/2010/8/30/226
:
:      It allocates/access 640MByte memory at a burst.
:
: The result is follwoing:
:
:                                |         |  extra   |
:                                | default |  kbytes  |
: --------------------------------------------------------------
: min_free_kbytes                |    8113 |   8113   |
: extra_free_kbytes              |       0 | 640*1024 | (KB)
: --------------------------------------------------------------
: worst latency                  | 517.762 |  20.775  | (usec)
: --------------------------------------------------------------
: vmstat result                  |         |          |
:  nr_vmscan_write               |       0 |      0   |
:  pgsteal_dma                   |       0 |      0   |
:  pgsteal_dma32                 |  143667 | 144882   |
:  pgsteal_normal                |   31486 |  27001   |
:  pgsteal_movable               |       0 |      0   |
:  pgscan_kswapd_dma             |       0 |      0   |
:  pgscan_kswapd_dma32           |  138617 | 156351   |
:  pgscan_kswapd_normal          |   30593 |  27955   |
:  pgscan_kswapd_movable         |       0 |      0   |
:  pgscan_direct_dma             |       0 |      0   |
:  pgscan_direct_dma32           |    5050 |      0   |
:  pgscan_direct_normal          |     896 |      0   |
:  pgscan_direct_movable         |       0 |      0   |
:  kswapd_steal                  |  169207 | 171883   |
:  kswapd_inodesteal             |       0 |      0   |
:  kswapd_low_wmark_hit_quickly  |      43 |     45   |
:  kswapd_high_wmark_hit_quickly |       1 |      0   |
:  allocstall                    |      32 |      0   |
:
:
: As you can see, in the default case there were 32 direct reclaim
: (allocstal= l) and its worst latency was 517.762 usecs.  This value may be
: larger if a process would sleep or issue I/O in the direct reclaim path.
: OTOH, ii the other case where I add extra free bytes, there were no direct
: reclaim and its worst latency was 20.775 usecs.
:
: In this test case, we can avoid direct reclaim and keep a latency low.

Signed-off-by: Rik van Riel<riel@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Tested-by: Satoru Moriya <satoru.moriya@hds.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm/vmalloc.c: report more vmalloc failures
Joe Perches [Mon, 24 Oct 2011 14:54:19 +0000 (01:54 +1100)]
mm/vmalloc.c: report more vmalloc failures

Some vmalloc failure paths do not report OOM conditions.

Add warn_alloc_failed, which also does a dump_stack, to those failure
paths.

This allows more site specific vmalloc failure logging message printks to
be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: fix page-faults detection in swap-token logic
Konstantin Khlebnikov [Mon, 24 Oct 2011 14:54:19 +0000 (01:54 +1100)]
mm: fix page-faults detection in swap-token logic

After commit v2.6.36-5896-gd065bd8 "mm: retry page fault when blocking on
disk transfer" we usually wait in page-faults without mmap_sem held, so
all swap-token logic was broken, because it based on using
rwsem_is_locked(&mm->mmap_sem) as sign of in progress page-faults.

Add an atomic counter of in progress page-faults for mm to the mm_struct
with swap-token.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agokswapd: assign new_order and new_classzone_idx after wakeup in sleeping
Alex,Shi [Mon, 24 Oct 2011 14:54:18 +0000 (01:54 +1100)]
kswapd: assign new_order and new_classzone_idx after wakeup in sleeping

There 2 places to read pgdat in kswapd.  One is return from a successful
balance, another is waked up from kswapd sleeping.  The new_order and
new_classzone_idx represent the balance input order and classzone_idx.

But current new_order and new_classzone_idx are not assigned after
kswapd_try_to_sleep(), that will cause a bug in the following scenario.

1: after a successful balance, kswapd goes to sleep, and new_order = 0;
   new_classzone_idx = __MAX_NR_ZONES - 1;

2: kswapd waked up with order = 3 and classzone_idx = ZONE_NORMAL

3: in the balance_pgdat() running, a new balance wakeup happened with
   order = 5, and classzone_idx = ZONE_NORMAL

4: the first wakeup(order = 3) finished successufly, return order = 3
   but, the new_order is still 0, so, this balancing will be treated as a
   failed balance.  And then the second tighter balancing will be missed.

So, to avoid the above problem, the new_order and new_classzone_idx need
to be assigned for later successful comparison.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm/memblock.c: small function definition fixes
Jonghwan Choi [Mon, 24 Oct 2011 14:54:18 +0000 (01:54 +1100)]
mm/memblock.c: small function definition fixes

warning: function 'memblock_memory_can_coalesce'
with external linkage has definition.

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: add free_hot_cold_page_list() helper
Konstantin Khlebnikov [Mon, 24 Oct 2011 14:54:18 +0000 (01:54 +1100)]
mm: add free_hot_cold_page_list() helper

This patch adds helper free_hot_cold_page_list() to free list of 0-order
pages.  It frees pages directly from list without temporary page-vector.
It also calls trace_mm_pagevec_free() to simulate pagevec_free()
behaviour.

bloat-o-meter:

add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28)
function                                     old     new   delta
free_hot_cold_page_list                        -     264    +264
get_page_from_freelist                      2129    2132      +3
__pagevec_free                               243     239      -4
split_free_page                              380     373      -7
release_pages                                606     510     -96
free_page_list                               188       -    -188

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agokswapd: avoid unnecessary rebalance after an unsuccessful balancing
Alex,Shi [Mon, 24 Oct 2011 14:54:17 +0000 (01:54 +1100)]
kswapd: avoid unnecessary rebalance after an unsuccessful balancing

In commit 215ddd66 ("mm: vmscan: only read new_classzone_idx from pgdat
when reclaiming successfully") , Mel Gorman said kswapd is better to sleep
after a unsuccessful balancing if there is tighter reclaim request pending
in the balancing.  But in the following scenario, kswapd do something that
is not matched our expectation.  The patch fixes this issue.

1, Read pgdat request A (classzone_idx, order = 3)
2, balance_pgdat()
3, During pgdat, a new pgdat request B (classzone_idx, order = 5) is placed
4, balance_pgdat() returns but failed since returned order = 0
5, pgdat of request A assigned to balance_pgdat(), and do balancing again.
   While the expectation behavior of kswapd should try to sleep.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Pádraig Brady <P@draigBrady.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agodebug-pagealloc-add-support-for-highmem-pages-fix
Andrew Morton [Mon, 24 Oct 2011 14:54:17 +0000 (01:54 +1100)]
debug-pagealloc-add-support-for-highmem-pages-fix

remove unneeded preempt_disable/enable

Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agodebug-pagealloc: add support for highmem pages
Akinobu Mita [Mon, 24 Oct 2011 14:54:17 +0000 (01:54 +1100)]
debug-pagealloc: add support for highmem pages

This adds support for highmem pages poisoning and verification to the
debug-pagealloc feature for no-architecture support.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm-neaten-warn_alloc_failed-fix
Andrew Morton [Mon, 24 Oct 2011 14:54:16 +0000 (01:54 +1100)]
mm-neaten-warn_alloc_failed-fix

use the __printf() macro

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: neaten warn_alloc_failed
Joe Perches [Mon, 24 Oct 2011 14:54:16 +0000 (01:54 +1100)]
mm: neaten warn_alloc_failed

Add __attribute__((format (printf...) to the function to validate format
and arguments.  Use vsprintf extension %pV to avoid any possible message
interleaving.  Coalesce format string.  Convert printks/pr_warning to
pr_warn.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agomm: iov_iter: have iov_iter_advance() decrement nr_segs appropriately
Jeff Layton [Mon, 24 Oct 2011 14:54:15 +0000 (01:54 +1100)]
mm: iov_iter: have iov_iter_advance() decrement nr_segs appropriately

Currently, when you call iov_iter_advance, then the pointer to the iovec
array can be incremented, but it does not decrement the nr_segs value in
the iov_iter struct.  The result is a iov_iter struct with a nr_segs value
that goes beyond the end of the array.

While I'm not aware of anything that's specifically broken by this, it
seems odd and a bit dangerous not to decrement that value.  If someone
were to trust the nr_segs value to be correct, then they could end up
walking off the end of the array.

Changing this might also provide some micro-optimization when dealing with
the last iovec in an array.  Many of the other routines that deal with
iov_iter have optimized codepaths when nr_segs == 1.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agoinclude/asm-generic/page.h: calculate virt_to_page and page_to_virt via predefined...
Sonic Zhang [Mon, 24 Oct 2011 14:54:15 +0000 (01:54 +1100)]
include/asm-generic/page.h: calculate virt_to_page and page_to_virt via predefined macro

On NOMMU architectures, if physical memory doesn't start from 0,
ARCH_PFN_OFFSET is defined to generate page index in mem_map array.
Because virtual address is equal to physical address, PAGE_OFFSET is
always 0.  virt_to_page and page_to_virt should not index page by
PAGE_OFFSET directly.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>