git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
11 years agoepoll: support for disabling items, and a self-test app
Paton J. Lewis [Tue, 26 Mar 2013 23:25:07 +0000 (10:25 +1100)]
epoll: support for disabling items, and a self-test app

It is not currently possible to reliably delete epoll items when using the
same epoll set from multiple threads.  After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related to an
event for that epoll item (in response to epoll_wait).  Therefore the
deleting thread does not know when it is safe to delete resources
pertaining to the associated epoll item because another thread might be
using those resources.

The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before
another thread is done handling an event returned by epoll_wait.

This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which disables
an epoll item.  If epoll_ctl returns -EBUSY in this case, then another
thread may be handling a return from epoll_wait for this item.  Otherwise if
epoll_ctl returns 0, then it is safe to delete the epoll item.  This
allows multiple threads to use a mutex to determine when it is safe to
delete an epoll item and its associated resources, which allows epoll
items to be deleted both efficiently and without error in a multi-threaded
environment.  Note that EPOLL_CTL_DISABLE is only useful in conjunction
with EPOLLONESHOT, and using EPOLL_CTL_DISABLE on an epoll item without
EPOLLONESHOT returns -EINVAL.
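
The EPOLLONESHOT behaviour that this patch builds on can be sketched with
the existing epoll API alone.  The example below does not use
EPOLL_CTL_DISABLE (the operation proposed here); it only shows, under the
assumption of a Linux system with epoll_create1(), that a one-shot item
goes silent after its first event until it is re-armed with EPOLL_CTL_MOD.
The function name oneshot_demo() is made up for illustration.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns 0 if the one-shot item behaved as described, -1 otherwise.
 * oneshot_demo() is a hypothetical name, not part of the patch.
 */
int oneshot_demo(void)
{
    int pipefd[2];
    struct epoll_event ev = { 0 }, out;
    int epfd, rc = -1;

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_ep;
    if (write(pipefd[1], "x", 1) != 1)
        goto out_ep;

    /* The first wait reports the event and disarms the item. */
    if (epoll_wait(epfd, &out, 1, 0) != 1)
        goto out_ep;
    assert(out.data.fd == pipefd[0]);

    /* The byte is still unread, yet the disarmed item stays silent. */
    if (epoll_wait(epfd, &out, 1, 0) != 0)
        goto out_ep;

    /* Re-arming with EPOLL_CTL_MOD makes the pending data visible again. */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) < 0)
        goto out_ep;
    if (epoll_wait(epfd, &out, 1, 0) != 1)
        goto out_ep;

    rc = 0;
out_ep:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

Because a one-shot item delivers at most one event per arming, at most one
thread can be handling it at a time, which is what makes a "busy or not"
answer from a disable operation meaningful.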

This patch also adds a new test_epoll self-test program to both
demonstrate the need for this feature and test it.

Signed-off-by: Paton J. Lewis <palewis@adobe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Paul Holland <pholland@adobe.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoepoll: comment + BUILD_BUG_ON to prevent epitem bloat
Eric Wong [Tue, 26 Mar 2013 23:25:06 +0000 (10:25 +1100)]
epoll: comment + BUILD_BUG_ON to prevent epitem bloat

This will prevent us from accidentally introducing a memory bloat
regression here in the future.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoepoll-trim-epitem-by-one-cache-line-on-x86_64-fix
Andrew Morton [Tue, 26 Mar 2013 23:25:06 +0000 (10:25 +1100)]
epoll-trim-epitem-by-one-cache-line-on-x86_64-fix

use __packed, for all architectures

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoepoll: trim epitem by one cache line
Eric Wong [Tue, 26 Mar 2013 23:25:06 +0000 (10:25 +1100)]
epoll: trim epitem by one cache line

It is common for epoll users to have thousands of epitems, so saving a
cache line on every allocation leads to large memory savings.

Since epitem allocations are cache-aligned, reducing sizeof(struct epitem)
from 136 bytes to 128 bytes will allow it to squeeze under a cache line
boundary on x86_64.

Via /sys/kernel/slab/eventpoll_epi, I see the following changes on my
x86_64 Core2 Duo (which has 64-byte cache alignment):

object_size  :  192 => 128
objs_per_slab:   21 =>  32

Also, add a BUILD_BUG_ON() to check for future accidental breakage.
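
The numbers above follow from simple alignment arithmetic, sketched here in
user-space C.  The 4096-byte slab size is an assumption (it is consistent
with the reported objs_per_slab values), and align_up(), objs_per_slab()
and struct demo_item are hypothetical names, not kernel symbols.  The
_Static_assert stands in for the kind of guard BUILD_BUG_ON() provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of align (align: power of two). */
size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Objects that fit in one slab, ignoring any per-slab metadata. */
size_t objs_per_slab(size_t slab_bytes, size_t obj_size, size_t align)
{
    return slab_bytes / align_up(obj_size, align);
}

/* Hypothetical stand-in for struct epitem: exactly 128 bytes. */
struct demo_item {
    uint64_t words[16];
};

/* Compile-time guard: the build fails if the struct ever grows. */
_Static_assert(sizeof(struct demo_item) <= 128,
               "demo_item no longer fits in two 64-byte cache lines");
```

With 64-byte alignment, a 136-byte object occupies 192 bytes and a
128-byte object occupies 128 bytes, so a 4096-byte slab holds 21 and 32
objects respectively.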

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocheckpatch: Complain about executable files
Joe Perches [Tue, 26 Mar 2013 23:25:06 +0000 (10:25 +1100)]
checkpatch: Complain about executable files

Complain about files with an executable bit set that are not in a scripts/
directory and are not type .pl, .py, .awk, or .sh

Based on an initial patch from Stephen.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocheckpatch: Prefer seq_puts to seq_printf
Joe Perches [Tue, 26 Mar 2013 23:25:05 +0000 (10:25 +1100)]
checkpatch: Prefer seq_puts to seq_printf

Add a check for seq_printf use with a constant format without additional
arguments.  Suggest seq_puts instead.

Signed-off-by: Joe Perches <joe@perches.com>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocheckpatch: add check for reuse of krealloc arg
Joe Perches [Tue, 26 Mar 2013 23:25:05 +0000 (10:25 +1100)]
checkpatch: add check for reuse of krealloc arg

On Thu, 2013-03-14 at 13:30 +0000, David Woodhouse wrote:
> If krealloc() returns NULL, it *doesn't* free the original. So any code
> of the form 'foo = krealloc(foo, …);' is almost certainly a bug.

So add a check for it to checkpatch.
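
The same pitfall exists in user space with realloc(), which also leaves the
original block allocated when it fails.  A minimal sketch of the safe
pattern follows; it uses realloc() as an analogy for krealloc(), and
grow_buffer() is a hypothetical helper, not something added by this patch.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Resize *buf to new_size bytes (new_size > 0 assumed).
 * On failure *buf is left untouched, so the caller can still use or
 * free it.  Writing "*buf = realloc(*buf, new_size)" instead would
 * overwrite the only pointer to the old block with NULL and leak it.
 */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp;

    assert(buf != NULL && new_size > 0);
    tmp = realloc(*buf, new_size);
    if (!tmp)
        return -1;
    *buf = tmp;
    return 0;
}
```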

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoargv_split-teach-it-to-handle-mutable-strings-fix-2
Oleg Nesterov [Tue, 26 Mar 2013 23:25:05 +0000 (10:25 +1100)]
argv_split-teach-it-to-handle-mutable-strings-fix-2

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoargv_split-teach-it-to-handle-mutable-strings-fix
Andrew Morton [Tue, 26 Mar 2013 23:25:04 +0000 (10:25 +1100)]
argv_split-teach-it-to-handle-mutable-strings-fix

Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoargv_split(): teach it to handle mutable strings
Oleg Nesterov [Tue, 26 Mar 2013 23:25:04 +0000 (10:25 +1100)]
argv_split(): teach it to handle mutable strings

argv_split() allocates argv[count_argc(str)] array and assumes that it
will find the same number of arguments later.  This is obviously wrong if
this string can be changed, say, by sysctl.

With this patch argv_split() kstrndup's the whole string and does not
split it; we simply replace the spaces with zeroes and keep the allocated
memory in argv[-1] for argv_free(arg).

We do not use argv[0] because:

- str can be all-spaces or empty. In fact this case is fine,
  we could kfree() it before return, but:

- str can have a space at the start, and we can not rely on
  kstrndup(skip_spaces(str)) because it can equally race if
  this string is mutable.

Also, simplify count_argc() and kill the no longer used skip_arg().
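
The approach can be sketched in user-space C as follows.  This is an
illustration of the idea described above, not the kernel code: split_args()
and free_args() are hypothetical names, and malloc()/memcpy() stand in for
kstrndup().  The key points are that counting and splitting both operate on
one private copy, and that the copy is stashed in the slot just before the
returned array.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Count whitespace-separated words in str. */
static int count_args(const char *str)
{
    int count = 0, in_space = 1;

    for (; *str; str++) {
        if (isspace((unsigned char)*str)) {
            in_space = 1;
        } else if (in_space) {
            in_space = 0;
            count++;
        }
    }
    return count;
}

/* Split str into a NULL-terminated argv; returns NULL on allocation failure. */
char **split_args(const char *str, int *argcp)
{
    size_t len;
    char *copy, *s;
    char **argv, **p;
    int argc, in_space = 1;

    assert(str != NULL);
    len = strlen(str);
    copy = malloc(len + 1);          /* private snapshot of the input */
    if (!copy)
        return NULL;
    memcpy(copy, str, len);
    copy[len] = '\0';

    argc = count_args(copy);         /* count on the copy, not on str */
    argv = malloc((argc + 2) * sizeof(*argv));
    if (!argv) {
        free(copy);
        return NULL;
    }

    *argv++ = copy;                  /* kept in argv[-1] for free_args() */
    p = argv;
    for (s = copy; *s; s++) {
        if (isspace((unsigned char)*s)) {
            in_space = 1;
            *s = '\0';               /* terminate the previous word */
        } else if (in_space) {
            in_space = 0;
            *p++ = s;                /* start of a new word */
        }
    }
    *p = NULL;

    if (argcp)
        *argcp = argc;
    return argv;
}

void free_args(char **argv)
{
    if (!argv)
        return;
    argv--;                          /* step back to the stashed copy */
    free(argv[0]);
    free(argv);
}
```

Storing the copy in argv[-1] rather than argv[0] means the scheme still
works when the input is empty, all spaces, or starts with a space.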

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agolib/int_sqrt.c: optimize square root algorithm
Davidlohr Bueso [Tue, 26 Mar 2013 23:25:04 +0000 (10:25 +1100)]
lib/int_sqrt.c: optimize square root algorithm

Optimize the current version of the shift-and-subtract (hardware)
algorithm, described by John von Neumann[1] and Guy L. Steele.

Iterating 1,000,000 times, perf shows for the current version:

 Performance counter stats for './sqrt-curr' (10 runs):

         27.170996 task-clock                #    0.979 CPUs utilized            ( +-  3.19% )
                 3 context-switches          #    0.103 K/sec                    ( +-  4.76% )
                 0 cpu-migrations            #    0.004 K/sec                    ( +-100.00% )
               104 page-faults               #    0.004 M/sec                    ( +-  0.16% )
        64,921,199 cycles                    #    2.389 GHz                      ( +-  0.03% )
        28,967,789 stalled-cycles-frontend   #   44.62% frontend cycles idle     ( +-  0.18% )
   <not supported> stalled-cycles-backend
       104,502,623 instructions              #    1.61  insns per cycle
                                             #    0.28  stalled cycles per insn  ( +-  0.00% )
        34,088,368 branches                  # 1254.587 M/sec                    ( +-  0.00% )
             4,901 branch-misses             #    0.01% of all branches          ( +-  1.32% )

       0.027763015 seconds time elapsed                                          ( +-  3.22% )

And for the new version:

Performance counter stats for './sqrt-new' (10 runs):

          0.496869 task-clock                #    0.519 CPUs utilized            ( +-  2.38% )
                 0 context-switches          #    0.000 K/sec
                 0 cpu-migrations            #    0.403 K/sec                    ( +-100.00% )
               104 page-faults               #    0.209 M/sec                    ( +-  0.15% )
           590,760 cycles                    #    1.189 GHz                      ( +-  2.35% )
           395,053 stalled-cycles-frontend   #   66.87% frontend cycles idle     ( +-  3.67% )
   <not supported> stalled-cycles-backend
           398,963 instructions              #    0.68  insns per cycle
                                             #    0.99  stalled cycles per insn  ( +-  0.39% )
            70,228 branches                  #  141.341 M/sec                    ( +-  0.36% )
             3,364 branch-misses             #    4.79% of all branches          ( +-  5.45% )

       0.000957440 seconds time elapsed                                          ( +-  2.42% )

Furthermore, this saves space in instruction text:

   text    data     bss     dec     hex filename
    111       0       0     111      6f lib/int_sqrt-baseline.o
     89       0       0      89      59 lib/int_sqrt.o

[1] http://en.wikipedia.org/wiki/First_Draft_of_a_Report_on_the_EDVAC
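
For readers unfamiliar with the technique, here is a user-space sketch of a
shift-and-subtract integer square root.  It is meant to illustrate the
algorithm family named above and is not claimed to be the exact code of
this patch; isqrt_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Returns floor(sqrt(x)) using only shifts, adds and compares. */
unsigned long isqrt_sketch(unsigned long x)
{
    const unsigned long n = x;
    unsigned long b, m, y = 0;

    if (x <= 1)
        return x;

    /* Start at the highest power of four that fits in unsigned long. */
    m = 1UL << (sizeof(unsigned long) * CHAR_BIT - 2);
    while (m != 0) {
        b = y + m;          /* candidate: current root with this bit set */
        y >>= 1;
        if (x >= b) {       /* the bit belongs in the root */
            x -= b;
            y += m;
        }
        m >>= 2;            /* move to the next lower power of four */
    }

    assert(y <= n / y);     /* postcondition: y * y <= n */
    return y;
}
```

The loop runs once per pair of bits in the word and needs no
multiplication or division.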

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Jonathan Gonzalez <jgonzlez@linets.cl>
Tested-by: Jonathan Gonzalez <jgonzlez@linets.cl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/leds/leds-ot200.c: fix error caused by shifted mask
Christian Gmeiner [Tue, 26 Mar 2013 23:25:04 +0000 (10:25 +1100)]
drivers/leds/leds-ot200.c: fix error caused by shifted mask

During development of this driver, in-house register documentation was
used.  Last week some integration tests were done and this problem was
found.  It turned out that the released register documentation is
wrong.

The fix is very simple: shift all masks by one.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Bryan Wu <cooloney@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/as3711_bl.c: add OF support
Guennadi Liakhovetski [Tue, 26 Mar 2013 23:25:03 +0000 (10:25 +1100)]
drivers/video/backlight/as3711_bl.c: add OF support

Add support for configuring AS3711 backlight driver from DT.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski+renesas@gmail.com>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/adp8870_bl.c: fix error return code in adp8870_led_probe()
Wei Yongjun [Tue, 26 Mar 2013 23:25:03 +0000 (10:25 +1100)]
drivers/video/backlight/adp8870_bl.c: fix error return code in adp8870_led_probe()

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/adp8860_bl.c: fix error return code in adp8860_led_probe()
Wei Yongjun [Tue, 26 Mar 2013 23:25:03 +0000 (10:25 +1100)]
drivers/video/backlight/adp8860_bl.c: fix error return code in adp8860_led_probe()

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/lp855x_bl.c: use PAGE_SIZE for the sysfs read operation
Kim, Milo [Tue, 26 Mar 2013 23:25:02 +0000 (10:25 +1100)]
drivers/video/backlight/lp855x_bl.c: use PAGE_SIZE for the sysfs read operation

sysfs allocates PAGE_SIZE.  It is used by each R/W operation method.  Use
it instead of another buffer size.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: da903x_bl: use BL_CORE_SUSPENDRESUME option
Jingoo Han [Tue, 26 Mar 2013 23:25:02 +0000 (10:25 +1100)]
backlight: da903x_bl: use BL_CORE_SUSPENDRESUME option

Use BL_CORE_SUSPENDRESUME option to support suspend/resume.
It reduces code size.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agovideo: backlight: add ili922x lcd driver
Stefano Babic [Tue, 26 Mar 2013 23:25:02 +0000 (10:25 +1100)]
video: backlight: add ili922x lcd driver

Add LCD driver for Ilitek ILI9221/ILI9222 controller.  The driver uses SPI
interface for controller access and configuration and RGB interface for
graphics data transfer.

Signed-off-by: Stefano Babic <sbabic@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/adp5520_bl.c: fix compiler warning in adp5520_show()
Devendra Naga [Tue, 26 Mar 2013 23:25:01 +0000 (10:25 +1100)]
drivers/video/backlight/adp5520_bl.c: fix compiler warning in adp5520_show()

While compiling with make W=1 (gcc (GCC) 4.7.2 20121109 (Red Hat
4.7.2-8)), the following warning was found:

drivers/video/backlight/adp5520_bl.c: In function `adp5520_show':
drivers/video/backlight/adp5520_bl.c:146:6: warning: variable `error' set but not used [-Wunused-but-set-variable]

Fixed by checking the value stored in the variable.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Cc: Michael Hennerich <michael.hennerich@analog.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/Kconfig: fix typo "MACH_SAM9...EK" three times
Paul Bolle [Tue, 26 Mar 2013 23:25:01 +0000 (10:25 +1100)]
drivers/video/backlight/Kconfig: fix typo "MACH_SAM9...EK" three times

Fix three typos (originally) introduced by a9a84c37d ("atmel_lcdfb:
backlight control").

Two of these typos were introduced in v2.6.25.  (The third was introduced
in 915190f7d4f08 ("[ARM] 5614/1: at91: atmel_lcdfb: add at91sam9g10
support to atmel LCD driver")).  Checking these commits reveals that the
default value of 'y' has never been set automatically in all releases
since v2.6.25!  Perhaps this line might as well be dropped.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: tdo24m: convert tdo24m to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:25:01 +0000 (10:25 +1100)]
backlight: tdo24m: convert tdo24m to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: ltv350qv: convert ltv350qv to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:25:01 +0000 (10:25 +1100)]
backlight: ltv350qv: convert ltv350qv to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: locomolcd: convert locomolcd to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:25:00 +0000 (10:25 +1100)]
backlight: locomolcd: convert locomolcd to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: lm3533_bl: convert lm3533_bl to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:25:00 +0000 (10:25 +1100)]
backlight: lm3533_bl: convert lm3533_bl to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: kb3886_bl: convert kb3886bl to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:25:00 +0000 (10:25 +1100)]
backlight: kb3886_bl: convert kb3886bl to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: hp680_bl: convert hp680bl to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:25:00 +0000 (10:25 +1100)]
backlight: hp680_bl: convert hp680bl to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: ep93xx: convert ep93xx to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:24:59 +0000 (10:24 +1100)]
backlight: ep93xx: convert ep93xx to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: corgi_lcd: convert corgi_lcd to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:24:59 +0000 (10:24 +1100)]
backlight: corgi_lcd: convert corgi_lcd to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: adp8870: convert adp8870 to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:24:59 +0000 (10:24 +1100)]
backlight: adp8870: convert adp8870 to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: adp8860: convert adp8860 to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:24:58 +0000 (10:24 +1100)]
backlight: adp8860: convert adp8860 to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: adp5520: convert adp5520_bl to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:24:58 +0000 (10:24 +1100)]
backlight: adp5520: convert adp5520_bl to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: s6e63m0: convert s6e63m0 to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:24:58 +0000 (10:24 +1100)]
backlight: s6e63m0: convert s6e63m0 to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: lms501kf03: convert lms501kf03 to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:24:58 +0000 (10:24 +1100)]
backlight: lms501kf03: convert lms501kf03 to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agobacklight: ld9040: convert ld9040 to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:24:57 +0000 (10:24 +1100)]
backlight: ld9040: convert ld9040 to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers-video-backlight-l4f00242t03c-check-return-value-of-regulator_enable-fix
Jingoo Han [Tue, 26 Mar 2013 23:24:57 +0000 (10:24 +1100)]
drivers-video-backlight-l4f00242t03c-check-return-value-of-regulator_enable-fix

- Added regulator_disable() for IO regulator before returning

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/l4f00242t03.c: check return value of regulator_enable()
Jingoo Han [Tue, 26 Mar 2013 23:24:57 +0000 (10:24 +1100)]
drivers/video/backlight/l4f00242t03.c: check return value of regulator_enable()

regulator_enable() is marked as __must_check.  Therefore the return
value of regulator_enable() should be checked.  Also, this patch checks
return value of regulator_set_voltage().
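
A user-space sketch of the pattern is shown below.  It assumes a GCC or
Clang compiler and uses the warn_unused_result attribute, which is what
__must_check is understood to expand to there.  All demo_* names are
hypothetical stand-ins for the regulator API, not real kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed equivalent of __must_check on GCC/Clang. */
#define demo_must_check __attribute__((warn_unused_result))

struct demo_supply {
    int enabled;
    int fail_on_enable;   /* test knob: force the enable call to fail */
};

static demo_must_check int demo_supply_enable(struct demo_supply *s)
{
    assert(s != NULL);
    if (s->fail_on_enable)
        return -1;
    s->enabled = 1;
    return 0;
}

static void demo_supply_disable(struct demo_supply *s)
{
    s->enabled = 0;
}

/* Enable two supplies; undo the first one if the second fails. */
int demo_power_on(struct demo_supply *io, struct demo_supply *core)
{
    int ret;

    ret = demo_supply_enable(io);
    if (ret)
        return ret;

    ret = demo_supply_enable(core);
    if (ret) {
        demo_supply_disable(io);   /* do not leave io enabled on error */
        return ret;
    }
    return 0;
}
```

Ignoring the result of demo_supply_enable() would draw a compiler warning,
which is the point of the annotation.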

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/adp8870_bl.c: add missing braces
Jingoo Han [Tue, 26 Mar 2013 23:24:56 +0000 (10:24 +1100)]
drivers/video/backlight/adp8870_bl.c: add missing braces

Add missing braces to include error message.  The error message is related
to the return value for sysfs_create_group().  However,
sysfs_create_group() is called when pdata->en_ambl_sens is not zero.
Thus, the checking return value should be included in the if statement.
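
The corrected shape can be sketched as below.  Without the braces, only the
call would be guarded by the condition, and the error check would run
unconditionally against whatever value ret happened to hold.  The names
demo_probe(), demo_group_ok() and demo_group_fail() are hypothetical; the
callback stands in for sysfs_create_group().

```c
#include <assert.h>
#include <stddef.h>

/* create_group stands in for the optional sysfs_create_group() call. */
int demo_probe(int en_ambl_sens, int (*create_group)(void))
{
    int ret = 0;

    assert(create_group != NULL);
    if (en_ambl_sens) {
        ret = create_group();
        if (ret) {
            /* the error message belongs here, next to the call */
            return ret;
        }
    }
    return 0;
}

int demo_group_ok(void)
{
    return 0;
}

int demo_group_fail(void)
{
    return -5;
}
```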

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/generic_bl.c: use dev_info() instead of pr_info()
Jingoo Han [Tue, 26 Mar 2013 23:24:56 +0000 (10:24 +1100)]
drivers/video/backlight/generic_bl.c: use dev_info() instead of pr_info()

dev_info() is preferred to pr_info().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/omap1_bl.c: use dev_info() instead of pr_info()
Jingoo Han [Tue, 26 Mar 2013 23:24:56 +0000 (10:24 +1100)]
drivers/video/backlight/omap1_bl.c: use dev_info() instead of pr_info()

dev_info() is preferred to pr_info().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/jornada720_*.c: use dev_err()/dev_info() instead of pr_err...
Jingoo Han [Tue, 26 Mar 2013 23:24:56 +0000 (10:24 +1100)]
drivers/video/backlight/jornada720_*.c: use dev_err()/dev_info() instead of pr_err()/pr_info()

dev_err()/dev_info() are preferred to pr_err()/pr_info().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/lp855x_bl.c: fix compiler warning in lp855x_probe
Devendra Naga [Tue, 26 Mar 2013 23:24:55 +0000 (10:24 +1100)]
drivers/video/backlight/lp855x_bl.c: fix compiler warning in lp855x_probe

While building with make W=1 (gcc (GCC) 4.7.2 20121109 (Red Hat 4.7.2-8)),
the following warning was found:

drivers/video/backlight/lp855x_bl.c: In function `lp855x_probe':
drivers/video/backlight/lp855x_bl.c:342:35: warning: variable `mode' set but not used [-Wunused-but-set-variable]

Fixed by removing the variable, since it is not used anywhere.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Acked-by: Milo Kim <milo.kim@ti.com>
Cc: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/atmel-pwm-bl.c: add __init annotation
Jingoo Han [Tue, 26 Mar 2013 23:24:55 +0000 (10:24 +1100)]
drivers/video/backlight/atmel-pwm-bl.c: add __init annotation

When platform_driver_probe() is used, bind/unbind via sysfs is disabled.
Thus, __init/__exit annotations can be added to probe()/remove().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/atmel-pwm-bl.c: use module_platform_driver_probe()
Jingoo Han [Tue, 26 Mar 2013 23:24:55 +0000 (10:24 +1100)]
drivers/video/backlight/atmel-pwm-bl.c: use module_platform_driver_probe()

Use the module_platform_driver_probe() macro which makes the code smaller
and simpler.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/ep93xx_bl.c: remove incorrect __init annotation
Jingoo Han [Tue, 26 Mar 2013 23:24:54 +0000 (10:24 +1100)]
drivers/video/backlight/ep93xx_bl.c: remove incorrect __init annotation

When platform_driver_probe() is not used, bind/unbind via sysfs is
enabled.  Thus, __init/__exit annotations should be removed from
probe()/remove().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Ryan Mallon <rmallon@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/platform_lcd.c: remove unnecessary ifdefs
Jingoo Han [Tue, 26 Mar 2013 23:24:54 +0000 (10:24 +1100)]
drivers/video/backlight/platform_lcd.c: remove unnecessary ifdefs

When a macro such as SIMPLE_DEV_PM_OPS is used, there is no need to use
'#ifdef CONFIG_PM' to prevent build error.  Thus, this patch removes
unnecessary ifdefs.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers-video-backlight-ams369fg06c-convert-ams369fg06-to-dev_pm_ops-fix
Jingoo Han [Tue, 26 Mar 2013 23:24:54 +0000 (10:24 +1100)]
drivers-video-backlight-ams369fg06c-convert-ams369fg06-to-dev_pm_ops-fix

- Remove unnecessary ifdefs.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/video/backlight/ams369fg06.c: convert ams369fg06 to dev_pm_ops
Jingoo Han [Tue, 26 Mar 2013 23:24:54 +0000 (10:24 +1100)]
drivers/video/backlight/ams369fg06.c: convert ams369fg06 to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoMAINTAINERS: i8k driver is orphan
Jean Delvare [Tue, 26 Mar 2013 23:24:53 +0000 (10:24 +1100)]
MAINTAINERS: i8k driver is orphan

Massimo Dal Zotto stopped maintaining the i8k driver several years ago, so
move his name from MAINTAINERS to CREDITS.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Massimo Dal Zotto <dz@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoget_maintainer-use-filename-only-regex-match-for-tegra-fix
Andrew Morton [Tue, 26 Mar 2013 23:24:53 +0000 (10:24 +1100)]
get_maintainer-use-filename-only-regex-match-for-tegra-fix

fix typo in docs, per Marcin

Cc: Joe Perches <joe@perches.com>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoget_maintainer: use filename-only regex match for Tegra
Stephen Warren [Tue, 26 Mar 2013 23:24:53 +0000 (10:24 +1100)]
get_maintainer: use filename-only regex match for Tegra

Create a new N: entry type in MAINTAINERS which performs a regex match
against filenames; either those extracted from patch +++ or --- lines, or
those specified on the command-line using the -f option.

This provides the same benefits as using a K: regex option to match a set
of filenames (see commit eb90d08 "get_maintainer: allow keywords to match
filenames"), but without the disadvantage that "random" file content, such
as comments, will ever match the regex.  Hence, revert most of that
commit.

Switch the Tegra entry from using K: to N:
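
The difference from K: is that only the path is tested, never the file
contents.  The idea can be sketched in C with POSIX regular expressions;
get_maintainer itself is a Perl script, so this is an analogy rather than
its implementation, and filename_matches() is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if filename matches pattern, 0 if it does not,
 * and -1 if the pattern fails to compile.
 */
int filename_matches(const char *pattern, const char *filename)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && filename != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, filename, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

A file under an unrelated directory that merely mentions "tegra" in a
comment is not matched, because its contents are never inspected.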

Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoinclude/linux/printk.h: include stdarg.h
Andrew Morton [Tue, 26 Mar 2013 23:24:52 +0000 (10:24 +1100)]
include/linux/printk.h: include stdarg.h

printk.h uses va_list but doesn't include stdarg.h.  Hence printk.h is
unusable unless its includer has already included kernel.h (which includes
stdarg.h).

Remove the dependency by including stdarg.h in printk.h
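
The general rule, that a file using va_list should include stdarg.h itself
rather than rely on its includer, can be sketched in user-space C.  The
demo_vlog() and demo_log() names are hypothetical and unrelated to the
printk API.

```c
#include <assert.h>
#include <stdarg.h>   /* va_list: included directly, not via another header */
#include <stdio.h>

/* Takes a va_list, so this declaration needs <stdarg.h> to be visible. */
int demo_vlog(char *buf, size_t size, const char *fmt, va_list args)
{
    assert(buf != NULL && size > 0);
    return vsnprintf(buf, size, fmt, args);
}

/* Variadic front end that forwards its arguments to demo_vlog(). */
int demo_log(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = demo_vlog(buf, size, fmt, args);
    va_end(args);
    return n;
}
```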

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoearly_printk-consolidate-random-copies-of-identical-code-v3-fix
Andrew Morton [Tue, 26 Mar 2013 23:24:52 +0000 (10:24 +1100)]
early_printk-consolidate-random-copies-of-identical-code-v3-fix

arch/mips/kernel/early_printk.c needs kernel.h for va_list

Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoearly_printk: consolidate random copies of identical code
Thomas Gleixner [Tue, 26 Mar 2013 23:24:52 +0000 (10:24 +1100)]
early_printk: consolidate random copies of identical code

The early console implementations are the same all over the place.  Move
the print function to kernel/printk and get rid of the copies.

[v3: drop sparc bits as suggested by tglx, redo build tests on sparc
 sparc32, Randy's randconfig, ppc, mips, arm...]

[v2: essentially unchanged since v1, so I've left the acked/reviewed
 tags.  There was a compile fail[1] for a randconfig with EARLY_PRINTK=y
 and PRINTK=n, because the early_console struct and early_printk calls
 were nested within an #ifdef CONFIG_PRINTK -- moving that whole block
 exactly as-is to be outside the #ifdef CONFIG_PRINTK fixes the randconfig
 and still works for everyday sane configs too.]
 [1] http://marc.info/?l=linux-next&m=136219350914998&w=2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoearly_printk: consolidate random copies of identical code
Thomas Gleixner [Tue, 26 Mar 2013 23:24:52 +0000 (10:24 +1100)]
early_printk: consolidate random copies of identical code

The early console implementations are the same all over the place.  Move
the print function to kernel/printk and get rid of the copies.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoprintk/tracing: rework console tracing
zhangwei(Jovi) [Tue, 26 Mar 2013 23:24:51 +0000 (10:24 +1100)]
printk/tracing: rework console tracing

commit 7ff9554bb ("printk: convert byte-buffer to variable-length record
buffer") removed start and end parameters in call_console_drivers, but
those parameters still exist in include/trace/events/printk.h.

Without start and end parameter handling, printk tracing becomes
simpler: trace_console(text, len);

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokernel/smp.c: cleanups
Andrew Morton [Tue, 26 Mar 2013 23:24:51 +0000 (10:24 +1100)]
kernel/smp.c: cleanups

We sometimes use "struct call_single_data *data" and sometimes "struct
call_single_data *csd".  Use "csd" consistently.

We sometimes use "struct call_function_data *data" and sometimes "struct
call_function_data *cfd".  Use "cfd" consistently.

Also, avoid some 80-col layout tricks.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoinclude/linux/fs.h: disable preempt when acquire i_size_seqcount write lock
Fan Du [Tue, 26 Mar 2013 23:24:51 +0000 (10:24 +1100)]
include/linux/fs.h: disable preempt when acquire i_size_seqcount write lock

Two rt tasks bind to one CPU core.

The higher priority rt task B preempts the lower priority rt task A, which
has already taken the write seq lock.  When task B then tries to acquire
the read seq lock, it is doomed to lock up.

rt task A with lower priority: call write
i_size_write                                        rt task B with higher priority: call sync, and preempt task A
  write_seqcount_begin(&inode->i_size_seqcount);    i_size_read
  inode->i_size = i_size;                             read_seqcount_begin <-- lockup here...

So disabling preemption whenever the i_size_seqcount *write* lock is
acquired will cure the problem.
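
As an illustration only, here is a minimal userspace sketch of the
sequence-counter idea, assuming C11 atomics.  The names (seqcount_sketch,
sketch_write_begin and so on) are hypothetical and are not the kernel API.
A reader may only proceed while the sequence is even, so a reader that
preempts a writer on the same CPU would spin forever on the odd value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel seqcount. */
struct seqcount_sketch {
	atomic_uint sequence;	/* odd while a write is in progress */
};

/* Writer side: make the sequence odd before touching the data. */
static void sketch_write_begin(struct seqcount_sketch *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/* Writer side: make the sequence even again once the data is updated. */
static void sketch_write_end(struct seqcount_sketch *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/*
 * Reader side: a real reader loops until this returns true.  If the
 * writer cannot run (because the reader preempted it on the same CPU),
 * the sequence stays odd and the loop never terminates.
 */
static bool sketch_read_may_proceed(struct seqcount_sketch *s)
{
	return (atomic_load(&s->sequence) & 1u) == 0;
}
```

Disabling preemption around the write section guarantees that the writer
reaches the equivalent of sketch_write_end() before any reader on the same
CPU gets to run.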

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agosmp: Give WARN()ing when calling smp_call_function_many()/single() in serving irq
Chuansheng Liu [Tue, 26 Mar 2013 23:24:50 +0000 (10:24 +1100)]
smp: Give WARN()ing when calling smp_call_function_many()/single() in serving irq

Currently the functions smp_call_function_many()/single() will give a
WARN()ing only in the case of irqs_disabled(), but that check is not
enough to guarantee execution of the SMP cross-calls.

In many other cases such as softirq handling/interrupt handling, the two
APIs still can not be called, just as the smp_call_function_many()
comments say:

  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler. Preemption
  * must be disabled when calling this function.

There is a real softirq DEADLOCK case:

CPUA                            CPUB
                                spin_lock(&spinlock)
                                Any irq coming, call the irq handler
                                irq_exit()
spin_lock_irq(&spinlock)
<== Blocking here due to
CPUB hold it
                                  __do_softirq()
                                    run_timer_softirq()
                                      timer_cb()
                                        call smp_call_function_many()
                                          send IPI interrupt to CPUA
                                            wait_csd()

Then both CPUA and CPUB will be deadlocked here.

So we should give a warning in the nmi, hardirq or softirq context as well.

Moreover, add a new macro in_serving_irq(), which indicates that we are
processing an nmi, hardirq or softirq.

Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Lai Jiangshan <eag0628@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokernel/range.c: subtract_range: fix the broken phrase issued by printk
Lin Feng [Tue, 26 Mar 2013 23:24:50 +0000 (10:24 +1100)]
kernel/range.c: subtract_range: fix the broken phrase issued by printk

Also replace deprecated printk(KERN_ERR...) with pr_err() as suggested
by Yinghai, attaching the function name to provide more information.

Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokernel/watchdog.c: add comments to explain watchdog_disabled variable
anish kumar [Tue, 26 Mar 2013 23:24:50 +0000 (10:24 +1100)]
kernel/watchdog.c: add comments to explain watchdog_disabled variable

This watchdog_disabled flag is a bit cryptic.  However, its usefulness is
multifold.  Uses are:

1. Check if smpboot_register_percpu_thread function passed.
2. Makes sure that user enables and disables the watchdog in sequence
   i.e. enable watchdog->disable watchdog->enable watchdog
   Unlike enable watchdog->enable watchdog which is wrong.

[dzickus@redhat.com: small text cleanups]
Signed-off-by: anish kumar <anish198519851985@gmail.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/rpmsg/virtio_rpmsg_bus.c: fix error return code in rpmsg_probe()
Wei Yongjun [Tue, 26 Mar 2013 23:24:49 +0000 (10:24 +1100)]
drivers/rpmsg/virtio_rpmsg_bus.c: fix error return code in rpmsg_probe()

Return a negative error code from the error handling case instead of 0, as
returned elsewhere in this function.
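
The class of bug being fixed can be sketched in plain C as follows.  This
is an illustration under assumptions, not the rpmsg code: probe_buggy(),
probe_fixed() and the step_ok parameter are hypothetical, and -ENOMEM is
only an example error code.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical probe routine.  step_ok stands in for an allocation or
 * registration step that can fail.
 */
static int probe_buggy(int step_ok)
{
	int err = 0;

	if (!step_ok)
		goto out;	/* bug: err is still 0, caller sees success */
out:
	return err;
}

static int probe_fixed(int step_ok)
{
	int err = 0;

	if (!step_ok) {
		err = -ENOMEM;	/* report the failure as a negative errno */
		goto out;
	}
out:
	return err;
}
```

With the fix, a failing step is visible to the caller instead of being
reported as a successful probe.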

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm: add vm event counters for balloon pages compaction
Rafael Aquini [Tue, 26 Mar 2013 23:24:49 +0000 (10:24 +1100)]
mm: add vm event counters for balloon pages compaction

Introduce a new set of vm event counters to keep track of ballooned pages
compaction activity.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomemcg-debugging-facility-to-access-dangling-memcgs-fix
Andrew Morton [Tue, 26 Mar 2013 23:24:49 +0000 (10:24 +1100)]
memcg-debugging-facility-to-access-dangling-memcgs-fix

fix up Kconfig text

Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomemcg: debugging facility to access dangling memcgs
Glauber Costa [Tue, 26 Mar 2013 23:24:48 +0000 (10:24 +1100)]
memcg: debugging facility to access dangling memcgs

If memcg is tracking anything other than plain user memory (swap, tcp buf
mem, or slab memory), it is possible - and normal - that a reference will
be held by the group after it is dead.  Still, for developers, it would be
extremely useful to be able to query about those states during debugging.

This patch provides a debugging facility in the root memcg, so we can
inspect which memcgs still have pending objects, and what is the cause of
this state.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm/dmapool.c: fix null dev in dma_pool_create()
Xi Wang [Tue, 26 Mar 2013 23:24:48 +0000 (10:24 +1100)]
mm/dmapool.c: fix null dev in dma_pool_create()

A few drivers invoke dma_pool_create() with a null dev.  Note that dev is
dereferenced in dev_to_node(dev), causing a null pointer dereference.

A long term solution is to disallow null dev.  Once the drivers are fixed,
we can simplify the core code here.  For now we add WARN_ON(!dev) to
notify the driver maintainers and avoid the null pointer dereference.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/usb/gadget/amd5536udc.c: avoid calling dma_pool_create() with NULL dev
Xi Wang [Tue, 26 Mar 2013 23:24:48 +0000 (10:24 +1100)]
drivers/usb/gadget/amd5536udc.c: avoid calling dma_pool_create() with NULL dev

Calling dma_pool_create() with dev==NULL will oops on a NUMA machine.
Rather than changing dma_pool_create() we wish to disallow passing
dev==NULL.  This requires fixing up the small number of drivers which are
passing in dev==NULL.

Use &dev->pdev->dev instead of NULL.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrop_caches: add some documentation and info message
Michal Hocko [Tue, 26 Mar 2013 23:24:48 +0000 (10:24 +1100)]
drop_caches: add some documentation and info message

I would like to resurrect Dave's patch.  The last time it was posted was
here https://lkml.org/lkml/2010/9/16/250 and there didn't seem to be any
strong opposition.

Kosaki was worried about possible excessive logging when somebody drops
caches too often (but then he claimed he didn't have a strong opinion on
that) but I would say the opposite.  If somebody does that then I would
really like to know that from the log when supporting a system because it
almost certainly means that there is something fishy going on.  It is also
worth mentioning that only root can write drop caches so this is not a
flooding attack vector.

I am bringing that up again because this can be really helpful when
chasing strange performance issues which (surprise surprise) turn out to
be related to artificially dropped caches done because the admin thinks
this would help...

I have just refreshed the original patch on top of the current mm tree
but I could live with KERN_INFO as well if people think that KERN_NOTICE
is too hysterical.

: From: Dave Hansen <dave@linux.vnet.ibm.com>
: Date: Fri, 12 Oct 2012 14:30:54 +0200
:
: There is plenty of anecdotal evidence and a load of blog posts
: suggesting that using "drop_caches" periodically keeps your system
: running in "tip top shape".  Perhaps adding some kernel
: documentation will increase the amount of accurate data on its use.
:
: If we are not shrinking caches effectively, then we have real bugs.
: Using drop_caches will simply mask the bugs and make them harder
: to find, but certainly does not fix them, nor is it an appropriate
: "workaround" to limit the size of the caches.
:
: It's a great debugging tool, and is really handy for doing things
: like repeatable benchmark runs.  So, add a bit more documentation
: about it, and add a little KERN_NOTICE.  It should help developers
: who are chasing down reclaim-related bugs.

[mhocko@suse.cz: refreshed to current -mm tree]
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm: memmap_init_zone() performance improvement
Mike Yoknis [Tue, 26 Mar 2013 23:24:47 +0000 (10:24 +1100)]
mm: memmap_init_zone() performance improvement

We have what we call an "architectural simulator".  It is a computer
program that pretends that it is a computer system.  We use it to test the
firmware before real hardware is available.  We have booted Linux on our
simulator.  As you would expect it takes longer to boot on the simulator
than it does on real hardware.

With my patch - boot time 41 minutes
Without patch - boot time 94 minutes

These numbers do not scale linearly to real hardware, but they indicate to
me a place where Linux can be improved.

memmap_init_zone() loops through every Page Frame Number (pfn), including
pfn values that are within the gaps between existing memory sections.  The
unneeded looping will become a boot performance issue when machines
configure larger memory ranges that will contain larger and more numerous
gaps.

The code will skip across invalid pfn values to reduce the number of loops
executed.
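
A rough sketch of the idea, not the patch itself: the loop below walks a
pfn range split into fixed-size sections and, when a section is absent,
jumps straight to its last pfn instead of visiting every pfn in the gap.
PFNS_PER_SECTION_SKETCH, the present[] array and the function name are all
assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define PFNS_PER_SECTION_SKETCH 8UL	/* hypothetical, power of two */

/*
 * Walk all pfns in nr_sections sections and count loop iterations.
 * present[i] says whether section i is backed by memory.  The number of
 * pfns that would be initialised is stored in *initialised.
 */
static unsigned long count_init_iterations(const bool *present,
					   unsigned long nr_sections,
					   bool skip_gaps,
					   unsigned long *initialised)
{
	unsigned long end = nr_sections * PFNS_PER_SECTION_SKETCH;
	unsigned long iterations = 0;

	*initialised = 0;
	for (unsigned long pfn = 0; pfn < end; pfn++) {
		iterations++;
		if (!present[pfn / PFNS_PER_SECTION_SKETCH]) {
			if (skip_gaps)
				/* last pfn of this section; pfn++ moves on */
				pfn |= PFNS_PER_SECTION_SKETCH - 1;
			continue;
		}
		(*initialised)++;
	}
	return iterations;
}
```

With four sections of which the middle two are absent, the plain loop
runs 32 iterations and the skipping loop runs 18, while both initialise
the same 16 pfns.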

Signed-off-by: Mike Yoknis <mike.yoknis@hp.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoinclude-linux-mmzoneh-cleanups-fix
Andrew Morton [Tue, 26 Mar 2013 23:24:47 +0000 (10:24 +1100)]
include-linux-mmzoneh-cleanups-fix

use zone_idx() some more, further simplify is_highmem()

Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoinclude/linux/mmzone.h: cleanups
Andrew Morton [Tue, 26 Mar 2013 23:24:47 +0000 (10:24 +1100)]
include/linux/mmzone.h: cleanups

- implement zone_idx() in C to fix its references-args-twice macro bug

- use zone_idx() in is_highmem() to remove large amounts of silly fluff.
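
To illustrate the references-args-twice problem, here is a small sketch
with hypothetical stand-in types (zone_sketch, pgdat_sketch); it is not
the mmzone.h code.  The macro evaluates its argument twice, so an argument
with side effects runs twice, whereas the C function evaluates it once.

```c
#include <assert.h>
#include <stddef.h>

struct pgdat_sketch;

struct zone_sketch {
	struct pgdat_sketch *pgdat;	/* back pointer to the owning node */
};

struct pgdat_sketch {
	struct zone_sketch zones[4];
};

/* Macro form: "zone" appears twice, so it is evaluated twice. */
#define ZONE_IDX_MACRO(zone) ((zone) - (zone)->pgdat->zones)

/* Function form: the argument is evaluated exactly once. */
static inline ptrdiff_t zone_idx_fn(const struct zone_sketch *zone)
{
	return zone - zone->pgdat->zones;
}

static struct pgdat_sketch node = {
	.zones = { { &node }, { &node }, { &node }, { &node } },
};

static int calls;	/* counts how often pick_zone() runs */

/* An argument expression with a visible side effect. */
static struct zone_sketch *pick_zone(void)
{
	calls++;
	return &node.zones[2];
}
```

Calling ZONE_IDX_MACRO(pick_zone()) bumps calls by two, while
zone_idx_fn(pick_zone()) bumps it by one; both return index 2.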

Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm: remove free_area_cache
Michel Lespinasse [Tue, 26 Mar 2013 23:24:46 +0000 (10:24 +1100)]
mm: remove free_area_cache

Since all architectures have been converted to use vm_unmapped_area(),
there is no remaining use for the free_area_cache.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agopowerpc/mm/numa: use setup_nr_node_ids() instead of opencoding.
Cody P Schafer [Tue, 26 Mar 2013 23:24:46 +0000 (10:24 +1100)]
powerpc/mm/numa: use setup_nr_node_ids() instead of opencoding.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agox86/mm/numa: use setup_nr_node_ids() instead of opencoding.
Cody P Schafer [Tue, 26 Mar 2013 23:24:46 +0000 (10:24 +1100)]
x86/mm/numa: use setup_nr_node_ids() instead of opencoding.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agopage_alloc: make setup_nr_node_ids() usable for arch init code
Cody P Schafer [Tue, 26 Mar 2013 23:24:46 +0000 (10:24 +1100)]
page_alloc: make setup_nr_node_ids() usable for arch init code

powerpc and x86 were opencoding copies of setup_nr_node_ids(), which
page_alloc provides but makes static. Make it available to the archs in
linux/mm.h.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
11 years agomm-speedup-in-__early_pfn_to_nid-fix
Andrew Morton [Tue, 26 Mar 2013 23:24:45 +0000 (10:24 +1100)]
mm-speedup-in-__early_pfn_to_nid-fix

add missing semicolon, per Tony

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm: speedup in __early_pfn_to_nid
Russ Anderson [Tue, 26 Mar 2013 23:24:45 +0000 (10:24 +1100)]
mm: speedup in __early_pfn_to_nid

When booting on a large memory system, the kernel spends considerable time
in memmap_init_zone() setting up memory zones.  Analysis shows significant
time spent in __early_pfn_to_nid().

The routine memmap_init_zone() checks each PFN to verify the nid is valid.
 __early_pfn_to_nid() sequentially scans the list of pfn ranges to find
the right range and returns the nid.  This does not scale well.  On a 4 TB
(single rack) system there are 308 memory ranges to scan.  The higher the
PFN the more time spent sequentially spinning through memory ranges.

Since memmap_init_zone() increments pfn, it will almost always be looking
for the same range as the previous pfn, so check that range first.  If it
is in the same range, return that nid.  If not, scan the list as before.
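
The caching idea can be sketched in userspace C roughly as follows.  The
structure and names (pfn_range_sketch, last_hit, full_scans) are
assumptions for the illustration and do not mirror the kernel's data
structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pfn range [start, end) belonging to node nid. */
struct pfn_range_sketch {
	unsigned long start;
	unsigned long end;
	int nid;
};

static const struct pfn_range_sketch *last_hit;	/* range of previous lookup */
static unsigned long full_scans;		/* times the list was walked */

static int pfn_to_nid_sketch(const struct pfn_range_sketch *ranges,
			     size_t n, unsigned long pfn)
{
	/* Fast path: consecutive pfns almost always share a range. */
	if (last_hit && pfn >= last_hit->start && pfn < last_hit->end)
		return last_hit->nid;

	/* Slow path: scan the list as before and remember the hit. */
	full_scans++;
	for (size_t i = 0; i < n; i++) {
		if (pfn >= ranges[i].start && pfn < ranges[i].end) {
			last_hit = &ranges[i];
			return ranges[i].nid;
		}
	}
	return -1;	/* pfn is not covered by any range */
}
```

Looking up 100 consecutive pfns inside one range walks the list once;
every later lookup in that range is answered from the cached entry.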

A 4 TB (single rack) UV1 system takes 512 seconds to get through the zone
code.  This performance optimization reduces the time by 189 seconds, a
36% improvement.

A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds, a
112.9 second (53%) reduction.

[akpm@linux-foundation.org: make the statics __meminitdata]
[akpm@linux-foundation.org: fix comment formatting]
[akpm@linux-foundation.org: fix ia64, per yinghai]
Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm/migrate: fix comment typo syncronous->synchronous
Jianguo Wu [Tue, 26 Mar 2013 23:24:45 +0000 (10:24 +1100)]
mm/migrate: fix comment typo syncronous->synchronous

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm: page_alloc: avoid marking zones full prematurely after zone_reclaim()
Mel Gorman [Tue, 26 Mar 2013 23:24:44 +0000 (10:24 +1100)]
mm: page_alloc: avoid marking zones full prematurely after zone_reclaim()

The following problem was reported against a distribution kernel when
zone_reclaim was enabled but the same problem applies to the mainline
kernel.  The reproduction case was as follows

1. Run numactl -m +0 dd if=largefile of=/dev/null
   This allocates a large number of clean pages in node 0

2. numactl -N +0 memhog 0.5*Mg
   This starts a memory-using application in node 0.

The expected behaviour is that the clean pages get reclaimed and the
application uses node 0 for its memory.  The observed behaviour was that
the memory for the memhog application was allocated off-node since commits
cd38b11 ("mm: page allocator: initialise ZLC for first zone eligible for
zone_reclaim") and 76d3fbf ("mm: page allocator: reconsider zones
for allocation after direct reclaim").

The assumption of those patches was that it was always preferable to
allocate quickly rather than stall for long periods of time and they were meant
to take care that the zone was only marked full when necessary but an
important case was missed.

In the allocator fast path, only the low watermarks are checked.  If the
zone's free pages are between the low and min watermark then allocations
from the allocator's slow path will succeed.  However, zone_reclaim will
only reclaim SWAP_CLUSTER_MAX or 1<<order pages.  There is no guarantee
that this will meet the low watermark, causing the zone to be marked full
prematurely.

This patch will only mark the zone full after zone_reclaim if the min
watermarks are checked or if page reclaim failed to make sufficient
progress.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Hedi Berriche <hedi@sgi.com>
Tested-by: Hedi Berriche <hedi@sgi.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agox86-64: fall back to regular page vmemmap on allocation failure
Johannes Weiner [Tue, 26 Mar 2013 23:24:44 +0000 (10:24 +1100)]
x86-64: fall back to regular page vmemmap on allocation failure

Memory hotplug can happen on a machine under load, memory shortage
and fragmentation, so huge page allocations for the vmemmap are not
guaranteed to succeed.

Try to fall back to regular pages before failing the hotplug event
completely.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agox86-64: use vmemmap_populate_basepages() for !pse setups fix
Johannes Weiner [Tue, 26 Mar 2013 23:24:44 +0000 (10:24 +1100)]
x86-64: use vmemmap_populate_basepages() for !pse setups fix

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agox86-64: use vmemmap_populate_basepages() for !pse setups
Johannes Weiner [Tue, 26 Mar 2013 23:24:44 +0000 (10:24 +1100)]
x86-64: use vmemmap_populate_basepages() for !pse setups

We already have generic code to allocate vmemmap with regular pages, use
it.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agox86-64: remove dead debugging code for !pse setups
Johannes Weiner [Tue, 26 Mar 2013 23:24:43 +0000 (10:24 +1100)]
x86-64: remove dead debugging code for !pse setups

No need to maintain addr_end and p_end when they are never actually read
anywhere on !pse setups.  Remove the dead code.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agosparse-vmemmap-specify-vmemmap-population-range-in-bytes-fix
Johannes Weiner [Tue, 26 Mar 2013 23:24:43 +0000 (10:24 +1100)]
sparse-vmemmap-specify-vmemmap-population-range-in-bytes-fix

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agosparse-vmemmap: specify vmemmap population range in bytes
Johannes Weiner [Tue, 26 Mar 2013 23:24:43 +0000 (10:24 +1100)]
sparse-vmemmap: specify vmemmap population range in bytes

The sparse code, when asking the architecture to populate the vmemmap,
specifies the section range as a starting page and a number of pages.

This is an awkward interface, because none of the arch-specific code
actually thinks of the range in terms of 'struct page' units and always
translates it to bytes first.

In addition, later patches mix huge page and regular page backing for the
vmemmap.  For this, they need to call vmemmap_populate_basepages() on
sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these are not
necessarily multiples of the 'struct page' size and so this unit is too
coarse.

Just translate the section range into bytes once in the generic sparse
code, then pass byte ranges down the stack.
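
As a sketch of that translation (with hypothetical stand-in types, not the
sparse code itself), a range given as a first 'struct page' and a page
count becomes a byte range once, and only the byte addresses are passed on:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct page'; the real layout differs. */
struct page_sketch {
	unsigned long flags;
	void *priv[7];
};

/* Byte range [start, end) covering a run of page_sketch entries. */
struct byte_range_sketch {
	uintptr_t start;
	uintptr_t end;
};

/* Translate "first entry plus count" into byte addresses once. */
static struct byte_range_sketch
section_to_bytes(const struct page_sketch *map, unsigned long nr_pages)
{
	struct byte_range_sketch r;

	r.start = (uintptr_t)map;
	r.end = (uintptr_t)(map + nr_pages);
	return r;
}
```

A callee working on byte ranges can then split the range on PAGE_SIZE or
PMD_SIZE boundaries without caring about the size of the map entries.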

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm: try harder to allocate vmemmap blocks
Ben Hutchings [Tue, 26 Mar 2013 23:24:42 +0000 (10:24 +1100)]
mm: try harder to allocate vmemmap blocks

Hot-adding memory on x86_64 normally requires huge page allocation.  When
this is done to a VM guest, it's usually because the system is already
tight on memory, so the request tends to fail.  Try to avoid this by
adding __GFP_REPEAT to the allocation flags.

Addresses http://bugs.debian.org/699913

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Tested-by: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm-hugetlb-include-hugepages-in-meminfo-checkpatch-fixes
Andrew Morton [Tue, 26 Mar 2013 23:24:42 +0000 (10:24 +1100)]
mm-hugetlb-include-hugepages-in-meminfo-checkpatch-fixes

ERROR: code indent should use tabs where possible
#64: FILE: mm/hugetlb.c:2132:
+^I^I        ^Inid,$

WARNING: please, no space before tabs
#64: FILE: mm/hugetlb.c:2132:
+^I^I        ^Inid,$

total: 1 errors, 1 warnings, 52 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-hugetlb-include-hugepages-in-meminfo.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm, hugetlb: include hugepages in meminfo
David Rientjes [Tue, 26 Mar 2013 23:24:42 +0000 (10:24 +1100)]
mm, hugetlb: include hugepages in meminfo

Particularly in oom conditions, it's troublesome that hugetlb memory is
not displayed.  All other meminfo that is emitted will not add up to what
is expected, and there is no artifact left in the kernel log to show that
a potentially significant amount of memory is actually allocated as
hugepages which are not available to be reclaimed.

Booting with hugepages=8192 on the command line, this memory is now shown
in oom conditions.  For example, with echo m > /proc/sysrq-trigger:

Node 0 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB
Node 1 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB
Node 2 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB
Node 3 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm: merging memory blocks resets mempolicy
Hampson, Steven T [Tue, 26 Mar 2013 23:24:42 +0000 (10:24 +1100)]
mm: merging memory blocks resets mempolicy

Using mbind to change the mempolicy to MPOL_BIND on several adjacent
mmapped blocks may result in a reset of the mempolicy to MPOL_DEFAULT in
vma_adjust.

Test code.  Correct result is three lines containing "OK".

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <numaif.h>
#include <errno.h>

/* gcc mbind_test.c -lnuma -o mbind_test -Wall */
#define MAXNODE 4096

void allocate(void)
{
	int ret;
	int len;
	int policy = -1;
	unsigned char *p;
	unsigned long mask[MAXNODE] = { 0 };
	unsigned long retmask[MAXNODE] = { 0 };

	len = getpagesize() * 0x2fc00;
	p = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
		 -1, 0);
	if (p == MAP_FAILED) {
		/* Nothing to bind if the mapping itself failed. */
		printf("mmap err: %d\n", errno);
		return;
	}

	mask[0] = 1;	/* bind to node 0 only */
	ret = mbind(p, len, MPOL_BIND, mask, MAXNODE, 0);
	if (ret < 0)
		printf("mbind err: %d %d\n", ret, errno);
	/* Read the policy back for the address we just bound. */
	ret = get_mempolicy(&policy, retmask, MAXNODE, p, MPOL_F_ADDR);
	if (ret < 0)
		printf("get_mempolicy err: %d %d\n", ret, errno);

	if (policy == MPOL_BIND)
		printf("OK\n");
	else
		printf("ERROR: policy is %d\n", policy);
}

int main(void)
{
	/* Three adjacent mappings; each must still report MPOL_BIND. */
	allocate();
	allocate();
	allocate();
	return 0;
}

Signed-off-by: Steven T Hampson <steven.t.hampson@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoarm: set the page table freeing ceiling to TASK_SIZE
Catalin Marinas [Tue, 26 Mar 2013 23:24:41 +0000 (10:24 +1100)]
arm: set the page table freeing ceiling to TASK_SIZE

ARM processors with LPAE enabled use 3 levels of page tables, with an
entry in the top level (pgd) covering 1GB of virtual space.  Because of
the branch relocation limitations on ARM, the loadable modules are mapped
16MB below PAGE_OFFSET, making the corresponding 1GB pgd shared between
kernel modules and user space.

If free_pgtables() is called with the default ceiling 0, free_pgd_range()
(and subsequently called functions) also frees the page table shared
between user space and kernel modules (which is normally handled by the
ARM-specific pgd_free() function).  This patch defines the ARM
USER_PGTABLES_CEILING as TASK_SIZE when CONFIG_ARM_LPAE is enabled.

Note that the pgd_free() function already checks the presence of the
shared pmd page allocated by pgd_alloc() and frees it, though with ceiling
0 this wasn't necessary.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [3.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago mm: allow arch code to control the user page table ceiling
Hugh Dickins [Tue, 26 Mar 2013 23:24:41 +0000 (10:24 +1100)]
mm: allow arch code to control the user page table ceiling

On architectures where a pgd entry may be shared between user and kernel
(e.g.  ARM+LPAE), freeing page tables needs a ceiling other than 0.  This
patch introduces a generic USER_PGTABLES_CEILING that arch code can
override.  It is the responsibility of the arch code setting the ceiling
to ensure the complete freeing of the page tables (usually in pgd_free()).

[catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: <stable@vger.kernel.org> [3.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago memcg: do not check for do_swap_account in mem_cgroup_{read,write,reset}
Michal Hocko [Tue, 26 Mar 2013 23:24:41 +0000 (10:24 +1100)]
memcg: do not check for do_swap_account in mem_cgroup_{read,write,reset}

Since 2d11085e ("memcg: do not create memsw files if swap accounting is
disabled"), memsw files are created only if memcg swap accounting is
enabled, so it makes no sense to check for it explicitly in
mem_cgroup_read(), mem_cgroup_write() and mem_cgroup_reset().

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago mmap: find_vma: remove the WARN_ON_ONCE(!mm) check
Zhang Yanfei [Tue, 26 Mar 2013 23:24:40 +0000 (10:24 +1100)]
mmap: find_vma: remove the WARN_ON_ONCE(!mm) check

Remove the WARN_ON_ONCE(!mm) check, as the comment suggests.  Kernel code
calls find_vma() only when it is absolutely sure that the mm_struct
argument passed to it is non-NULL.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago kexec-vmalloc-export-additional-vmalloc-layer-information-fix
Andrew Morton [Tue, 26 Mar 2013 23:24:40 +0000 (10:24 +1100)]
kexec-vmalloc-export-additional-vmalloc-layer-information-fix

vmalloc.h should include list.h for list_head

Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago kexec, vmalloc: export additional vmalloc layer information
Atsushi Kumagai [Tue, 26 Mar 2013 23:24:40 +0000 (10:24 +1100)]
kexec, vmalloc: export additional vmalloc layer information

vmap_area_list is now exported as VMCOREINFO so that makedumpfile can get
the start address of the vmalloc region (vmalloc_start).  The address that
holds the vmalloc_start value is computed as follows:

  vmap_area_list.next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start)

However, neither OFFSET(vmap_area.va_start) nor OFFSET(vmap_area.list) is
exported as VMCOREINFO.

So this patch exports them, along with a small cleanup.

Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago mm, vmalloc: remove list management of vmlist after initializing vmalloc
Joonsoo Kim [Tue, 26 Mar 2013 23:24:40 +0000 (10:24 +1100)]
mm, vmalloc: remove list management of vmlist after initializing vmalloc

There is no longer any need to maintain vmlist after vmalloc has been
initialized, so remove the related code and data structure.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago mm, vmalloc: export vmap_area_list, instead of vmlist
Joonsoo Kim [Tue, 26 Mar 2013 23:24:39 +0000 (10:24 +1100)]
mm, vmalloc: export vmap_area_list, instead of vmlist

Although our intention is to unexport the internal structure entirely,
there is one exception for kexec: kexec dumps the address of vmlist, and
makedumpfile uses this information.

We are about to remove vmlist, so makedumpfile needs another way to
retrieve information about the vmalloc layer.  For this purpose, we export
vmap_area_list instead of vmlist.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo()
Joonsoo Kim [Tue, 26 Mar 2013 23:24:39 +0000 (10:24 +1100)]
mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo()

This patch is a preparatory step for removing vmlist entirely.  To that
end, we change the code that iterates vmlist to iterate vmap_area_list
instead.  It is a fairly trivial change, but one thing should be noted.

Using vmap_area_list in vmallocinfo() introduces an ordering problem on
SMP systems.  In s_show(), we retrieve some values from vm_struct.
vm_struct's values are not fully set up when va->vm is assigned.  Full
setup is signalled by clearing the VM_UNLIST flag without holding a lock.
When we see that VM_UNLIST has been cleared, there is no guarantee that
vm_struct has proper values as seen from other CPUs.  So we need
smp_[rw]mb to ensure that proper values are assigned when we see that
VM_UNLIST has been cleared.

Therefore, this patch not only changes the list being iterated, but also
adds the appropriate smp_[rw]mb in the right places.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago mm, vmalloc: iterate vmap_area_list in get_vmalloc_info()
Joonsoo Kim [Tue, 26 Mar 2013 23:24:39 +0000 (10:24 +1100)]
mm, vmalloc: iterate vmap_area_list in get_vmalloc_info()

This patch is a preparatory step for removing vmlist entirely.  To that
end, we change the code that iterates vmlist to iterate vmap_area_list
instead.  It is a fairly trivial change, but one thing should be noted.

vmlist lacks information about some areas in the vmalloc address space.
For example, vm_map_ram() allocates an area in the vmalloc address space
but does not link it into vmlist.  Providing full information about the
vmalloc address space is the better approach, so we don't use va->vm and
use vmap_area directly.  This makes get_vmalloc_info() more precise.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()
Joonsoo Kim [Tue, 26 Mar 2013 23:24:38 +0000 (10:24 +1100)]
mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()

Now that va->vm can't be discarded while we hold vmap_area_lock, we can
safely access va->vm when iterating vmap_area_list with vmap_area_lock
held.  With this property, change the code in vread/vwrite() that
iterates vmlist to iterate vmap_area_list instead.

There is a small difference related to locking, because vmlist_lock is a
mutex but vmap_area_lock is a spinlock.  This may introduce spinning
overhead while vread/vwrite() is executing.  However, these are
debug-oriented functions, so this overhead is not a real problem in the
common case.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago mm, vmalloc: protect va->vm by vmap_area_lock
Joonsoo Kim [Tue, 26 Mar 2013 23:24:38 +0000 (10:24 +1100)]
mm, vmalloc: protect va->vm by vmap_area_lock

Inserting an entry into vmlist and removing one from it take linear time,
so it is inefficient.  The following patches will try to remove vmlist
entirely.  This patch is a preparatory step for that.

To remove vmlist, the code that iterates vmlist must be changed to iterate
vmap_area_list.  Before implementing that, we must make sure that
accessing va->vm while iterating vmap_area_list doesn't cause a race
condition.  This patch ensures that there is no race condition when
accessing vm_struct while iterating vmap_area_list.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>