git.karo-electronics.de Git - karo-tx-linux.git/log
12 years ago epoll: limit paths
Jason Baron [Wed, 5 Oct 2011 00:43:41 +0000 (11:43 +1100)]
epoll: limit paths

The current epoll code can be tickled to run basically indefinitely in
both the loop detection path check (on ep_insert()) and the wakeup paths.
The programs that tickle this behavior set up deeply linked networks of
epoll file descriptors that cause the epoll algorithms to traverse them
indefinitely.  A couple of these sample programs have been previously
posted in this thread: https://lkml.org/lkml/2011/2/25/297.

To fix the loop detection path check algorithms, I simply keep track of
the epoll nodes that have been already visited.  Thus, the loop detection
becomes proportional to the number of epoll file descriptors and links.
This dramatically decreases the run-time of the loop check algorithm.  In
one diabolical case I tried it reduced the run-time from 15 minutes (all
in kernel time) to .3 seconds.

Fixing the wakeup paths could be done at wakeup time in a similar manner
by keeping track of nodes that have already been visited, but the
complexity is greater, since there can be multiple wakeups on different
CPUs.  Thus, I've opted to limit the number of possible wakeup paths when
the paths are created.

This is accomplished by noting that the end file descriptor points that
are found during the loop detection pass (from the newly added link) are
actually the sources for wakeup events.  I keep a list of these file
descriptors and limit the number and length of these paths that emanate
from these 'source file descriptors'.  In the current implementation I
allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of
length 4 and 10 of length 5.  Note that it is sufficient to check the
'source file descriptors' reachable from the newly added link, since no
other 'source file descriptors' will have newly added links.  This allows
us to check only the wakeup paths that may have gotten too long, and not
re-check all possible wakeup paths on the system.

In terms of the path limit selection, I think it's first worth noting that
the most common case for epoll is probably the model where you have 1
epoll file descriptor that is monitoring n 'source file
descriptors'.  In this case, each 'source file descriptor' has 1 path of
length 1.  Thus, I believe that the limits I'm proposing are quite
reasonable and in fact may be too generous.  I'm hoping that the
proposed limits will not cause any workloads that currently work to
fail.

In terms of locking, I have extended the use of the 'epmutex' to all
epoll_ctl add and remove operations.  Currently it's only used in a subset
of the add paths.  I need to hold the epmutex, so that we can correctly
traverse a coherent graph, to check the number of paths.  I believe that
this additional locking is probably ok, since it's in the setup/teardown
paths, and doesn't affect the running paths, but it certainly is going to
add some extra overhead.  Also worth noting is that the epmutex was
recently added to the ep_ctl add operations in the initial path loop
detection code using the argument that it was not on a critical path.

Another thing to note here is the length of epoll chains that is allowed.
Currently, eventpoll.c defines:

/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4

This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS
+ 1).  However, this limit is currently only enforced during the loop
check detection code, and only when the epoll file descriptors are added
in a certain order.  Thus, this limit is currently easily bypassed.  The
newly added check for wakeup paths strictly limits the wakeup paths to a
length of 5, regardless of the order in which ep's are linked together.
Thus, a side-effect of the new code is a more consistent enforcement of
the graph depth.

Thus far, I've tested this, using the sample programs previously
mentioned, which now either return quickly or return -EINVAL.  I've also
tested using the piptest.c epoll tester, which showed no difference in
performance.  I've also created a number of different epoll networks and
tested that they behave as expected.

I believe this solves the original diabolical test cases, while still
preserving the sane epoll nesting.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Nelson Elhage <nelhage@ksplice.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago epoll: fix spurious lockdep warnings
Nelson Elhage [Wed, 5 Oct 2011 00:43:41 +0000 (11:43 +1100)]
epoll: fix spurious lockdep warnings

epoll can recursively acquire ep->mtx on multiple "struct
eventpoll"s at once in the case where one epoll fd is monitoring another
epoll fd.  This is perfectly OK, since we're careful about the lock
ordering, but it causes spurious lockdep warnings.  Annotate the recursion
using mutex_lock_nested, and add a comment explaining the nesting rules
for good measure.

Recent versions of systemd are triggering this, and it can also be
demonstrated with the following trivial test program:

--------------------8<--------------------

#include <sys/epoll.h>

int main(void) {
   int e1, e2;
   struct epoll_event evt = {
       .events = EPOLLIN
   };

   e1 = epoll_create1(0);
   e2 = epoll_create1(0);
   epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt);
   return 0;
}
--------------------8<--------------------

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Tested-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Nelson Elhage <nelhage@nelhage.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib-crc-add-slice-by-8-algorithm-to-crc32c-fix
Andrew Morton [Wed, 5 Oct 2011 00:43:40 +0000 (11:43 +1100)]
lib-crc-add-slice-by-8-algorithm-to-crc32c-fix

don't include asm/msr.h

Cc: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: frank zago <fzago@systemfabricworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib/crc: add slice by 8 algorithm to crc32.c
frank zago [Wed, 5 Oct 2011 00:43:40 +0000 (11:43 +1100)]
lib/crc: add slice by 8 algorithm to crc32.c

Add support for slice by 8 to existing crc32 algorithm.  Also modify
gen_crc32table.c to only produce table entries that are actually used.
The parameters CRC_LE_BITS and CRC_BE_BITS determine the number of bits in
the input array that are processed during each step.  Generally, the more
bits, the faster the algorithm, but the more table data is required.

Using an x86_64 Opteron machine running at 2100MHz the following table was
collected with a pre-warmed cache by computing the crc 1000 times on a
buffer of 4096 bytes.

BITS     Size   LE Cycles/byte   BE Cycles/byte
-----------------------------------------------
   1      873            41.65            34.60
   2     1097            25.43            29.61
   4     1057            13.29            15.28
   8     2913             7.13             8.19
  32     9684             2.80             2.82
  64    18178             1.53             1.53

BITS is the value of CRC_LE_BITS or CRC_BE_BITS. The old
default was 8 which actually selected the 32 bit algorithm. In
this version the value 8 is used to select the standard
8 bit algorithm and two new values: 32 and 64 are introduced
to select the slice by 4 and slice by 8 algorithms respectively.

Size is the size of crc32.o's text segment, which includes code and
table data, when both the LE and BE versions are set to BITS.

The current version of crc32.c by default uses the slice by 4 algorithm
which requires about 2.8 cycles per byte.  The slice by 8 algorithm is
roughly 2X faster and enables packet processing at over 1GB/sec on a
typical 2-3GHz system.

Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago kernel.h/checkpatch: mark strict_strto<foo> and simple_strto<foo> as obsolete
Joe Perches [Wed, 5 Oct 2011 00:43:40 +0000 (11:43 +1100)]
kernel.h/checkpatch: mark strict_strto<foo> and simple_strto<foo> as obsolete

Mark obsolete/deprecated strict_strto<foo> and simple_strto<foo> functions
and macros as obsolete.

Update checkpatch to warn about their use.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago llist-return-whether-list-is-empty-before-adding-in-llist_add-fix
Andrew Morton [Wed, 5 Oct 2011 00:43:39 +0000 (11:43 +1100)]
llist-return-whether-list-is-empty-before-adding-in-llist_add-fix

clarify comment

Cc: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago llist: using in_nmi requires including hardirq.h
Stephen Rothwell [Wed, 5 Oct 2011 00:43:37 +0000 (11:43 +1100)]
llist: using in_nmi requires including hardirq.h

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib/idr.c: fix comment for ida_get_new_above()
Wang Sheng-Hui [Wed, 5 Oct 2011 00:43:37 +0000 (11:43 +1100)]
lib/idr.c: fix comment for ida_get_new_above()

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib/percpu_counter.c: enclose hotplug only variables in hotplug ifdef
Glauber Costa [Wed, 5 Oct 2011 00:43:36 +0000 (11:43 +1100)]
lib/percpu_counter.c: enclose hotplug only variables in hotplug ifdef

These variables are only used when CONFIG_HOTPLUG_CPU is enabled; they are
ifdef'ed everywhere else.  So don't define them when CONFIG_HOTPLUG_CPU is
not enabled.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib-bitmapc-quiet-sparse-noise-about-address-space-fix
Andrew Morton [Wed, 5 Oct 2011 00:43:36 +0000 (11:43 +1100)]
lib-bitmapc-quiet-sparse-noise-about-address-space-fix

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: H Hartley Sweeten <hartleys@visionengravers.com>
Cc: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib/bitmap.c: quiet sparse noise about address space
H Hartley Sweeten [Wed, 5 Oct 2011 00:43:35 +0000 (11:43 +1100)]
lib/bitmap.c: quiet sparse noise about address space

__bitmap_parse() and __bitmap_parselist() both take a pointer to a kernel
buffer as a parameter and then cast it to a pointer to user buffer for use
in cases when the parameter is_user indicates that the buffer is actually
located in user space.  This casting, and the casts in the callers,
results in sparse noise like the following:

warning: incorrect type in initializer (different address spaces)
  expected char const [noderef] <asn:1>*ubuf
  got char const *buf
warning: cast removes address space of expression

Since these casts are intentional, use __force to quiet the noise.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib/spinlock_debug.c: print owner on spinlock lockup
Akinobu Mita [Wed, 5 Oct 2011 00:43:35 +0000 (11:43 +1100)]
lib/spinlock_debug.c: print owner on spinlock lockup

When SPIN_BUG_ON is triggered, the lock owner information is reported.
But it is omitted when spinlock lockup is detected.

This information is useful especially on the architectures which don't
implement trigger_all_cpu_backtrace() that is called just after detecting
lockup.  So report it and also avoid message format duplication.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago lib/kstrtox: common code between kstrto*() and simple_strto*() functions
Alexey Dobriyan [Wed, 5 Oct 2011 00:43:35 +0000 (11:43 +1100)]
lib/kstrtox: common code between kstrto*() and simple_strto*() functions

Currently the termination logic (\0 or \n\0) is hardcoded in _kstrtoull();
avoid that so code can be reused between kstrto*() and simple_strtoull().
Essentially, make them different only in termination logic.

simple_strtoull() (and scanf(), BTW) ignores integer overflow; that's a
bug we currently don't have the guts to fix, making the KSTRTOX_OVERFLOW
hack necessary.

Almost forgot: patch shrinks code size by about 80 bytes on x86_64.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/leds/leds-gpio.c: use gpio_get_value_cansleep() when initializing
David Daney [Wed, 5 Oct 2011 00:43:34 +0000 (11:43 +1100)]
drivers/leds/leds-gpio.c: use gpio_get_value_cansleep() when initializing

I get the following warning:

------------[ cut here ]------------
WARNING: at drivers/gpio/gpiolib.c:1559 __gpio_get_value+0x90/0x98()
Modules linked in:
Call Trace:
[<ffffffff81440950>] dump_stack+0x8/0x34
[<ffffffff81141478>] warn_slowpath_common+0x78/0xa0
[<ffffffff812f0958>] __gpio_get_value+0x90/0x98
[<ffffffff81434f04>] create_gpio_led+0xdc/0x194
[<ffffffff8143524c>] gpio_led_probe+0x290/0x36c
[<ffffffff8130e8b0>] driver_probe_device+0x78/0x1b0
[<ffffffff8130eaa8>] __driver_attach+0xc0/0xc8
[<ffffffff8130d7ac>] bus_for_each_dev+0x64/0xb0
[<ffffffff8130e130>] bus_add_driver+0x1c8/0x2a8
[<ffffffff8130f100>] driver_register+0x90/0x180
[<ffffffff81100438>] do_one_initcall+0x38/0x160

---[ end trace ee38723fbefcd65c ]---

My GPIOs are on an I2C port expander, so we must use the *_cansleep()
variant of the GPIO functions.  This was not being done in
create_gpio_led().

We can change gpio_get_value() to gpio_get_value_cansleep() because it is
only called from the platform_driver probe function, which is a context
where we can sleep.

Only tested on my gpio_cansleep() system, but it seems safe for all
systems.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Trent Piepho <tpiepho@gmail.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/leds/leds-renesas-tpu.c: move Renesas TPU LED driver platform data
Magnus Damm [Wed, 5 Oct 2011 00:43:34 +0000 (11:43 +1100)]
drivers/leds/leds-renesas-tpu.c: move Renesas TPU LED driver platform data

Use the platform_data include directory for the TPU LED driver, as
suggested by Paul Mundt.

Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/leds/leds-renesas-tpu.c: update driver to use workqueue
Magnus Damm [Wed, 5 Oct 2011 00:43:34 +0000 (11:43 +1100)]
drivers/leds/leds-renesas-tpu.c: update driver to use workqueue

Use a workqueue in the Renesas TPU LED driver to allow the Runtime PM code
to sleep.

Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/leds/leds-lm3530.c: remove obsolete cleanup for clientdata
Wolfram Sang [Wed, 5 Oct 2011 00:43:33 +0000 (11:43 +1100)]
drivers/leds/leds-lm3530.c: remove obsolete cleanup for clientdata

A few new i2c-drivers came into the kernel which clear the
clientdata-pointer on exit or error.  This is now obsolete, since the core
will do it.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/leds/led-triggers.c: fix memory leak
Masakazu Mokuno [Wed, 5 Oct 2011 00:43:33 +0000 (11:43 +1100)]
drivers/leds/led-triggers.c: fix memory leak

The memory for struct led_trigger should be kfreed in the
led_trigger_register() error path.  Also this function should return NULL
on error.

Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago leds-renesas-tpu-led-driver-v2-fix
Axel Lin [Wed, 5 Oct 2011 00:43:33 +0000 (11:43 +1100)]
leds-renesas-tpu-led-driver-v2-fix

include linux/module.h

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago leds: Renesas TPU LED driver
Magnus Damm [Wed, 5 Oct 2011 00:43:32 +0000 (11:43 +1100)]
leds: Renesas TPU LED driver

Add V2 of the LED driver for a single timer channel for the TPU hardware
block commonly found in Renesas SoCs.

The driver has been written with optimal Power Management in mind, so to
save power the LED is driven as a regular GPIO pin in case of maximum
brightness and power off which allows the TPU hardware to be idle and
which in turn allows the clocks to be stopped and the power domain to be
turned off transparently.

Any other brightness level requires use of the TPU hardware in PWM mode.
TPU hardware device clocks and power are managed through Runtime PM.
System suspend and resume is known to be working - during suspend the LED
is set to off by the generic LED code.

The TPU hardware timer is equipped with a 16-bit counter together with an
up-to-divide-by-64 prescaler which makes the hardware suitable for
brightness control.  Hardware blink is unsupported.

The LED PWM waveform has been verified with a Fluke 123 Scope meter on a
sh7372 Mackerel board.  Tested with experimental sh7372 A3SP power domain
patches.  Platform device bind/unbind tested ok.

V2 has been tested on the DS2 LED of the sh73a0-based AG5EVM.

Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago backlight: rename corgibl_limit_intensity() to genericbl_limit_intensity()
Axel Lin [Wed, 5 Oct 2011 00:43:32 +0000 (11:43 +1100)]
backlight: rename corgibl_limit_intensity() to genericbl_limit_intensity()

The rename of corgibl_limit_intensity was missed in commit d00ba726
("backlight: Rename the corgi backlight driver to generic").  Let's fix it
now.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago drivers/video/backlight/l4f00242t03.c: use gpio_request_one() to simplify error handling
Fabio Estevam [Wed, 5 Oct 2011 00:43:31 +0000 (11:43 +1100)]
drivers/video/backlight/l4f00242t03.c: use gpio_request_one() to simplify error handling

Using gpio_request_one can make the error handling simpler.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago backlight: fix broken regulator API usage in l4f00242t03
Mark Brown [Wed, 5 Oct 2011 00:43:31 +0000 (11:43 +1100)]
backlight: fix broken regulator API usage in l4f00242t03

The regulator support in the l4f00242t03 is very non-idiomatic.  Rather
than requesting the regulators based on the device name and the supply
names used by the device the driver requires boards to pass system
specific supply names around through platform data.  The driver also
conditionally requests the regulators based on this platform data, adding
unneeded conditional code to the driver.

Fix this by removing the platform data and converting to the standard
idiom, also updating all in tree users of the driver.  As no datasheet
appears to be available for the LCD I'm guessing the names for the
supplies based on the existing users and I've no ability to do anything
more than compile test.

The use of regulator_set_voltage() in the driver is also problematic,
though less so: since fixed voltages are required, the expectation would
be that the voltages are fixed in the constraints set by the machines
rather than manually configured by the driver.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago video/backlight: remove obsolete cleanup for clientdata
Wolfram Sang [Wed, 5 Oct 2011 00:43:31 +0000 (11:43 +1100)]
video/backlight: remove obsolete cleanup for clientdata

A few new i2c-drivers came into the kernel which clear the
clientdata-pointer on exit or error.  This is now obsolete, since the core
will do it.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago MAINTAINERS: add new entry for ideapad-laptop
Ike Panhc [Wed, 5 Oct 2011 00:43:30 +0000 (11:43 +1100)]
MAINTAINERS: add new entry for ideapad-laptop

Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago poll: add poll_requested_events() function
Hans Verkuil [Wed, 5 Oct 2011 00:43:30 +0000 (11:43 +1100)]
poll: add poll_requested_events() function

In some cases the poll() implementation in a driver has to do different
things depending on the events the caller wants to poll for.  An example
is when a driver needs to start a DMA engine if the caller polls for
POLLIN, but doesn't want to do that if POLLIN is not requested but instead
only POLLOUT or POLLPRI is requested.  This is something that can happen
in the video4linux subsystem.

Unfortunately, the current epoll/poll/select implementation doesn't
provide that information reliably.  The poll_table_struct does have it: it
has a key field with the event mask.  But once a poll() call matches one
or more bits of that mask any following poll() calls are passed a NULL
poll_table_struct pointer.

The solution is to set the qproc field to NULL in poll_table_struct once
poll() matches the events, not the poll_table_struct pointer itself.  That
way drivers can obtain the mask through a new poll_requested_events
inline.

The poll_table_struct can still be NULL since some kernel code calls it
internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
that case poll_requested_events() returns ~0 (i.e.  all events).

Since eventpoll always leaves the key field at ~0 instead of using the
requested events mask, that source was changed as well to properly fill in
the key field.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Jonathan Corbet <corbet@lwn.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago fs/namei.c: remove unused getname_flags()
Andrew Morton [Wed, 5 Oct 2011 00:43:30 +0000 (11:43 +1100)]
fs/namei.c: remove unused getname_flags()

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago treewide-use-__printf-not-__attribute__formatprintf-checkpatch-fixes
Andrew Morton [Wed, 5 Oct 2011 00:43:29 +0000 (11:43 +1100)]
treewide-use-__printf-not-__attribute__formatprintf-checkpatch-fixes

WARNING: externs should be avoided in .c files
#99: FILE: arch/alpha/boot/misc.c:28:
+extern __printf(1, 2) long srm_printk(const char *, ...);

ERROR: space required after that ';' (ctx:VxV)
#178: FILE: arch/powerpc/boot/ps3.c:39:
+static inline __printf(1, 2) int DBG(const char *fmt, ...) {return 0;}
                                                                     ^

ERROR: "foo* bar" should be "foo *bar"
#225: FILE: arch/s390/include/asm/debug.h:175:
+debug_sprintf_event(debug_info_t* id, int level, char *string, ...);

ERROR: space required after that ',' (ctx:VxV)
#237: FILE: arch/s390/include/asm/debug.h:216:
+debug_sprintf_exception(debug_info_t *id, int level, char *string,...);
                                                                  ^

WARNING: space prohibited between function name and open parenthesis '('
#494: FILE: fs/ext2/ext2.h:139:
+void ext2_error (struct super_block *, const char *, const char *, ...);

WARNING: printk() should include KERN_ facility level
#719: FILE: fs/partitions/ldm.c:63:
+ printk("%s%s(): %pV\n", level, function, &vaf);

WARNING: space prohibited between function name and open parenthesis '('
#721: FILE: fs/partitions/ldm.c:65:
+ va_end (args);

WARNING: space prohibited between function name and open parenthesis '('
#750: FILE: fs/ufs/ufs.h:121:
+void ufs_warning (struct super_block *, const char *, const char *, ...);

WARNING: space prohibited between function name and open parenthesis '('
#752: FILE: fs/ufs/ufs.h:123:
+void ufs_error (struct super_block *, const char *, const char *, ...);

WARNING: space prohibited between function name and open parenthesis '('
#754: FILE: fs/ufs/ufs.h:125:
+void ufs_panic (struct super_block *, const char *, const char *, ...);

WARNING: space prohibited between function name and open parenthesis '('
#1074: FILE: include/linux/ext3_fs.h:941:
+void ext3_error (struct super_block *, const char *, const char *, ...);

WARNING: space prohibited between function name and open parenthesis '('
#1083: FILE: include/linux/ext3_fs.h:944:
+void ext3_abort (struct super_block *, const char *, const char *, ...);

WARNING: space prohibited between function name and open parenthesis '('
#1085: FILE: include/linux/ext3_fs.h:946:
+void ext3_warning (struct super_block *, const char *, const char *, ...);

WARNING: do not add new typedefs
#1178: FILE: include/linux/kdb.h:119:
+typedef __printf(1, 2) int (*kdb_printf_t)(const char *, ...);

ERROR: "foo * bar" should be "foo *bar"
#1203: FILE: include/linux/kernel.h:299:
+extern __printf(2, 3) int sprintf(char * buf, const char * fmt, ...);

ERROR: "foo * bar" should be "foo *bar"
#1206: FILE: include/linux/kernel.h:302:
+int snprintf(char * buf, size_t size, const char * fmt, ...);

ERROR: "foo * bar" should be "foo *bar"
#1210: FILE: include/linux/kernel.h:306:
+int scnprintf(char * buf, size_t size, const char * fmt, ...);

total: 6 errors, 11 warnings, 1375 lines checked

./patches/treewide-use-__printf-not-__attribute__formatprintf.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago treewide-use-__printf-not-__attribute__formatprintf-fix
Andrew Morton [Wed, 5 Oct 2011 00:43:29 +0000 (11:43 +1100)]
treewide-use-__printf-not-__attribute__formatprintf-fix

After merging the akpm tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

In file included from arch/powerpc/boot/stdio.c:12:0:
arch/powerpc/boot/stdio.h:10:17: error: expected declaration specifiers or '...' before numeric constant
arch/powerpc/boot/stdio.h:10:20: error: expected declaration specifiers or '...' before numeric constant
arch/powerpc/boot/stdio.h:10:8: warning: return type defaults to 'int'
arch/powerpc/boot/stdio.h:10:8: warning: function declaration isn't a prototype
arch/powerpc/boot/stdio.h: In function '__printf':
arch/powerpc/boot/stdio.h:14:17: error: expected declaration specifiers or '...' before numeric constant
arch/powerpc/boot/stdio.h:14:20: error: expected declaration specifiers or '...' before numeric constant
arch/powerpc/boot/stdio.h:14:23: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'int'
arch/powerpc/boot/stdio.h:16:12: error: storage class specified for parameter 'vsprintf'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago treewide: use __printf not __attribute__((format(printf,...)))
Joe Perches [Wed, 5 Oct 2011 00:43:29 +0000 (11:43 +1100)]
treewide: use __printf not __attribute__((format(printf,...)))

Standardize the style for compiler based printf format verification.
Standardize the location of __printf too.

Done via script and a little typing.

$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
  grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
  xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago printk: remove bounds checking for log_prefix
William Douglas [Wed, 5 Oct 2011 00:43:28 +0000 (11:43 +1100)]
printk: remove bounds checking for log_prefix

Currently log_prefix is testing that the first character of the log level
and facility is less than '0' and greater than '9' (which is always
false).

Since the code being updated works because strtoul bombs out (endp isn't
updated) and 0 is returned anyway, just remove the check and don't change
the behavior of the function.

Signed-off-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago printk: fix bounds checking for log_prefix
William Douglas [Wed, 5 Oct 2011 00:43:28 +0000 (11:43 +1100)]
printk: fix bounds checking for log_prefix

Currently log_prefix is testing that the first character of the log level
and facility is less than '0' and greater than '9' (which is always
false).  It should be testing to see if the character is less than '0' or
greater than '9' instead.  This patch makes that change.

The code being changed worked because strtoul bombs out (endp isn't
updated) and 0 is returned anyway.

Signed-off-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago printk: add console_suspend module parameter
Yanmin Zhang [Wed, 5 Oct 2011 00:43:27 +0000 (11:43 +1100)]
printk: add console_suspend module parameter

We are enabling some power features on medfield.  To test suspend-2-RAM
conveniently, we need to turn on/off console_suspend_enabled frequently.

Add a module parameter, so users could change it by:
/sys/module/printk/parameters/console_suspend

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago printk: add ignore_loglevel as module parameter
Yanmin Zhang [Wed, 5 Oct 2011 00:43:27 +0000 (11:43 +1100)]
printk: add ignore_loglevel as module parameter

We are enabling some power features on medfield.  To test suspend-2-RAM
conveniently, we need to turn on/off ignore_loglevel frequently without
rebooting.

Add a module parameter, so users could change it by:
/sys/module/printk/parameters/ignore_loglevel

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years ago printk: add module parameter ignore_loglevel to control ignore_loglevel
Yanmin Zhang [Wed, 5 Oct 2011 00:43:27 +0000 (11:43 +1100)]
printk: add module parameter ignore_loglevel to control ignore_loglevel

We are enabling some power features on medfield.  To test suspend-2-RAM
conveniently, we need to turn on/off ignore_loglevel frequently without
rebooting.

Add a module parameter, so users can change it by:
/sys/module/printk/parameters/ignore_loglevel

Signed-off-by: Yanmin Zhang <yanmin.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agodynamic_debug: fix undefined reference to `__netdev_printk'
Jason Baron [Wed, 5 Oct 2011 00:43:26 +0000 (11:43 +1100)]
dynamic_debug: fix undefined reference to `__netdev_printk'

Dynamic debug recently added support for netdev_printk.  It uses
__netdev_printk() to support this functionality.  However, when CONFIG_NET
is not set, we get the following error:

lib/built-in.o: In function `__dynamic_netdev_dbg':
(.text+0x9fda): undefined reference to `__netdev_printk'

Fix this by making the call to netdev_printk() contingent upon CONFIG_NET.
We could have fixed this by defining netdev_printk() to a 'no-op' in the
!CONFIG_NET case.  However, this is not consistent with how the networking
layer uses netdev_printk.  For example, when CONFIG_NET is not set,
netdev_printk() does not have a 'no-op' definition.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Greg KH <greg@kroah.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agodynamic_debug: use a single printk() to emit messages
Jason Baron [Wed, 5 Oct 2011 00:43:26 +0000 (11:43 +1100)]
dynamic_debug: use a single printk() to emit messages

We were using KERN_CONT to combine messages with their prefix.  However,
KERN_CONT is not smp safe, in the sense that it can interleave messages.
This interleaving can result in printks coming out at the wrong loglevel.
With the high frequency of printks that dynamic debug can produce this is
not desirable.

So make dynamic_emit_prefix() fill a char buf[64] instead of doing a
printk directly.  If we enable printing of function, module, line, or
pid info, they are placed in this 64 byte buffer.  In my testing 64 bytes
was large enough for all requests.  Even if it's not, we can match
up the printk itself to see where it's from, so to me this is no big deal.
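
The idea can be sketched in userspace as follows.  All names here
(fill_prefix(), format_debug_line(), the "usbcore"/"hub_probe" values)
are hypothetical; only the 64 byte prefix buffer and the single emitting
call follow the description above.

```c
#include <stdio.h>
#include <string.h>

#define PREFIX_SIZE 64	/* same size as the buf[64] described above */

/* Fill buf with "module:function:line "; snprintf() truncates if needed. */
char *fill_prefix(char *buf, size_t size, const char *module,
		  const char *function, unsigned int line)
{
	snprintf(buf, size, "%s:%s:%u ", module, function, line);
	return buf;
}

/* Format prefix and message together so they can be emitted in one call. */
int format_debug_line(char *out, size_t out_size, const char *module,
		      const char *function, unsigned int line, const char *msg)
{
	char prefix[PREFIX_SIZE];

	fill_prefix(prefix, sizeof(prefix), module, function, line);
	return snprintf(out, out_size, "%s%s", prefix, msg);
}

/* Example call site: one fputs() plays the role of the single printk(). */
const char *demo_debug_line(void)
{
	static char out[128];

	format_debug_line(out, sizeof(out), "usbcore", "hub_probe", 42,
			  "ready\n");
	fputs(out, stdout);
	return out;
}
```

Because prefix and message leave in one call, another CPU cannot slip a
line in between them, which is the problem KERN_CONT had.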

[akpm@linux-foundation.org: convert dangerous macro to C]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agodynamic_debug: remove num_enabled accounting
Jason Baron [Wed, 5 Oct 2011 00:43:25 +0000 (11:43 +1100)]
dynamic_debug: remove num_enabled accounting

The num_enabled accounting isn't actually used anywhere, so remove it.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agodynamic_debug: consolidate repetitive struct _ddebug descriptor definitions
Jason Baron [Wed, 5 Oct 2011 00:43:25 +0000 (11:43 +1100)]
dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions

Replace the repetitive struct _ddebug descriptor definitions with a new
DECLARE_DYNAMIC_DEBUG_META_DATA(name, fmt) macro.
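
A reduced userspace sketch of such a consolidating macro is shown here.
The struct layout and the DEFINE_DDEBUG_SKETCH name are assumptions made
for illustration; the real descriptor has more fields and is placed in a
dedicated section, which this sketch leaves out.

```c
/* Simplified, hypothetical descriptor; not the kernel's struct _ddebug. */
struct ddebug_sketch {
	const char *function;
	const char *filename;
	const char *format;
	unsigned int lineno;
};

/* One macro replaces the open-coded initializer at every call site. */
#define DEFINE_DDEBUG_SKETCH(name, fmt)			\
	static struct ddebug_sketch name = {		\
		.function = __func__,			\
		.filename = __FILE__,			\
		.format = (fmt),			\
		.lineno = __LINE__,			\
	}

/* Example call site: the descriptor records where it was defined. */
const struct ddebug_sketch *demo_descriptor(void)
{
	DEFINE_DDEBUG_SKETCH(descriptor, "link is up\n");
	return &descriptor;
}
```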

[akpm@linux-foundation.org: s/DECLARE/DEFINE/]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agowatchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL
Vasily Averin [Wed, 5 Oct 2011 00:43:25 +0000 (11:43 +1100)]
watchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL

Fix compilation warnings for CONFIG_SYSCTL=n:

kernel/watchdog.c:483:13: warning: `watchdog_enable_all_cpus' defined but not used
kernel/watchdog.c:500:13: warning: `watchdog_disable_all_cpus' defined but not used

These functions are static and are used only in the sysctl handler, so move
them inside #ifdef CONFIG_SYSCTL too.

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agostop_machine-make-stop_machine-safe-and-efficient-to-call-early-v3.
Jeremy Fitzhardinge [Wed, 5 Oct 2011 00:43:24 +0000 (11:43 +1100)]
stop_machine-make-stop_machine-safe-and-efficient-to-call-early-v3.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agostop_machine: make stop_machine safe and efficient to call early
Jeremy Fitzhardinge [Wed, 5 Oct 2011 00:43:24 +0000 (11:43 +1100)]
stop_machine: make stop_machine safe and efficient to call early

Make stop_machine() safe to call early in boot, before SMP has been set
up, by simply calling the callback function directly if there's only one
CPU online.

[ Fixes from AKPM:
   - add comment
   - local_irq_flags, not save_flags
   - also call hard_irq_disable() for systems which need it

  Tejun suggested using an explicit flag rather than just looking at
  the online cpu count. ]
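
The shape of the early path can be sketched in userspace like this.
The flag and function names are hypothetical, the multi-CPU path is only
a placeholder, and the interrupt handling mentioned in the fixes above
is reduced to a comment.

```c
#include <stdbool.h>

bool stoppers_ready;	/* assumed flag: set once stopper threads exist */
int direct_calls;	/* counts how often the early path was taken */

/* Placeholder for the normal multi-CPU path, not modeled here. */
static int run_on_stoppers(int (*fn)(void *), void *data)
{
	return fn(data);
}

int stop_machine_sketch(int (*fn)(void *), void *data)
{
	if (!stoppers_ready) {
		/* Early boot, single CPU: call fn directly.  The kernel
		 * would also disable interrupts around this call. */
		direct_calls++;
		return fn(data);
	}
	return run_on_stoppers(fn, data);
}

static int bump(void *data)
{
	++*(int *)data;
	return 0;
}

/* Example: before the flag is set, the callback runs exactly once. */
int demo_early_call(void)
{
	int hits = 0;

	stoppers_ready = false;
	stop_machine_sketch(bump, &hits);
	return hits;
}
```

Testing an explicit flag rather than the online CPU count keeps the
decision independent of hotplug state, which matches Tejun's suggestion.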

Cc: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agodriver/misc/fsa9480.c fix potential null-pointer dereference
Jonghwan Choi [Wed, 5 Oct 2011 00:43:24 +0000 (11:43 +1100)]
driver/misc/fsa9480.c fix potential null-pointer dereference

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Cc: Donggeun Kim <dg77.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3lv02d: make regulator API usage unconditional
Mark Brown [Wed, 5 Oct 2011 00:43:23 +0000 (11:43 +1100)]
lis3lv02d: make regulator API usage unconditional

The regulator API contains a range of features for stubbing itself out
when not in use and for transparently restricting the actual effect of
regulator API calls where they can't be supported on a particular system
so that drivers don't need to individually implement this.  Simplify the
driver slightly by making use of this idiom.

The only in tree user is ecovec24 which does not use the regulator API.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3-remove-the-references-to-the-global-variable-in-core-driver-fix
Ilkka Koskinen [Wed, 5 Oct 2011 00:43:23 +0000 (11:43 +1100)]
lis3-remove-the-references-to-the-global-variable-in-core-driver-fix

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3: remove the references to the global variable in core driver
Éric Piel [Wed, 5 Oct 2011 00:43:22 +0000 (11:43 +1100)]
lis3: remove the references to the global variable in core driver

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3: change exported function to use passed parameter
Éric Piel [Wed, 5 Oct 2011 00:43:22 +0000 (11:43 +1100)]
lis3: change exported function to use passed parameter

Change exported functions to use the device given as parameter
instead of the global one.

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3: use consistent naming of variables
Éric Piel [Wed, 5 Oct 2011 00:43:22 +0000 (11:43 +1100)]
lis3: use consistent naming of variables

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3: free regulators if probe() fails
Éric Piel [Wed, 5 Oct 2011 00:43:21 +0000 (11:43 +1100)]
lis3: free regulators if probe() fails

Signed-off-by: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agohp_accel: add HP ProBook 655x
Éric Piel [Wed, 5 Oct 2011 00:43:21 +0000 (11:43 +1100)]
hp_accel: add HP ProBook 655x

Add axis correction for HP ProBook 6555b.

Signed-off-by: Malte Starostik <m-starostik@versanet.de>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3: add support for HP EliteBook 8540w
Éric Piel [Wed, 5 Oct 2011 00:43:21 +0000 (11:43 +1100)]
lis3: add support for HP EliteBook 8540w

Add axis correction for HP EliteBook 8540w.

Reported-by: Lyall Pearce <lyall.pearce@hp.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3: add support for HP EliteBook 2730p
Éric Piel [Wed, 5 Oct 2011 00:43:20 +0000 (11:43 +1100)]
lis3: add support for HP EliteBook 2730p

Add axis correction for HP EliteBook 2730p.

Tested-by: Witold Pilat <witold.pilat@gmail.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3: update maintainer information
Éric Piel [Wed, 5 Oct 2011 00:43:20 +0000 (11:43 +1100)]
lis3: update maintainer information

In the move of the lis3 driver, the hp_accel.c file got dropped from the
MAINTAINERS file.  Make it explicit again that this file is tied to
lis3.

Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolis3lv02d: avoid divide by zero due to unchecked
Éric Piel [Wed, 5 Oct 2011 00:43:20 +0000 (11:43 +1100)]
lis3lv02d: avoid divide by zero due to unchecked

After an "unexpected" reboot, I found this Oops in my logs:

divide error: 0000 [#1] PREEMPT SMP
CPU 0
Modules linked in: lis3lv02d hp_wmi input_polldev [...]
Pid: 390, comm: modprobe Tainted: G         C  2.6.39-rc7-wl+
RIP: 0010:[<ffffffffa014b427>]  [<ffffffffa014b427>]
 lis3lv02d_poweron+0x4e/0x94 [lis3lv02d]
RSP: 0018:ffff8801d6407cf8  EFLAGS: 00010246
RAX: 0000000000000bb8 RBX: ffffffffa014e000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffea00066e4708 RDI: ffff8801df002700
RBP: ffff8801d6407d18 R08: ffffea00066c5a30 R09: ffffffff812498c9
R10: ffff8801d7bfcea0 R11: ffff8801d7bfce10 R12: 0000000000000bb8
R13: 00000000ffffffda R14: ffffffffa0154120 R15: ffffffffa0154030
FS:  00007fc0705db700(0000) GS:ffff8801dfa00000(0000) knlGS:0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f33549174f0 CR3: 00000001d65c9000 CR4: 00000000000406f0
Process modprobe (pid: 390, threadinfo ffff8801d6406000, task ffff8801d6b40000)
Stack:
 ffffffffa0154120 62ffffffa0154030 ffffffffa014e000 00000000ffffffea
 ffff8801d6407d58 ffffffffa014bcc1 0000000000000000 0000000000000048
 ffff8801d8bae800 00000000ffffffea 00000000ffffffda ffffffffa0154120
Call Trace:
 [<ffffffffa014bcc1>] lis3lv02d_init_device+0x1ce/0x496 [lis3lv02d]
 [<ffffffffa01522ff>] lis3lv02d_add+0x10f/0x17c [hp_accel]
 [<ffffffff81233e11>] acpi_device_probe+0x49/0x117
[...]
Code: 3a 75 06 80 4d ef 50 eb 04 80 4d ef 40 0f b6 55 ef be 21
00 00 00 48 89 df ff 53 18 44 8b 63 6c e8 3e fc ff ff 89 c1 44
89 e0 99 <f7> f9 89 c7 e8 93 82 ef e0 48 83 7b 30 00 74 2d 45
31 e4 80 7b
RIP  [<ffffffffa014b427>] lis3lv02d_poweron+0x4e/0x94 [lis3lv02d]
 RSP <ffff8801d6407cf8>

From my POV, it looks like the hardware is not working as expected
and returns a bogus data rate. The driver doesn't check the result
and directly uses it as some sort of divisor in some places:

msleep(lis3->pwron_delay / lis3lv02d_get_odr());

Under these circumstances, this could very well cause the
"divide by zero" exception from above.

For now, I fixed it the easiest and most obvious way:
check if the result is sane and, if it isn't, use a sane default
instead. I went for "100" in the latter case, simply because
/sys/devices/platform/lis3lv02d/rate returns it on a successful
boot.
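
A minimal userspace sketch of that guard is given here.  The function
names and the delay value used in the usage note are illustrative
assumptions; only the fallback of 100 comes from the text above.

```c
#define DEFAULT_ODR_HZ 100	/* fallback rate named in the text above */

/* Replace a zero or negative data rate with the default. */
int sane_odr(int odr)
{
	return odr > 0 ? odr : DEFAULT_ODR_HZ;
}

/* Delay computation that can no longer divide by zero. */
int poweron_delay_ms(int pwron_delay, int raw_odr)
{
	return pwron_delay / sane_odr(raw_odr);
}
```

With a bogus rate of 0 and an illustrative pwron_delay of 3000,
poweron_delay_ms() yields 30 instead of raising a divide error.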

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Witold Pilat <witold.pilat@gmail.com>
Cc: Lyall Pearce <lyall.pearce@hp.com>
Cc: Malte Starostik <m-starostik@versanet.de>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agodrivers/hwmon/hwmon.c: convert idr to ida and use ida_simple_get()
Jonathan Cameron [Wed, 5 Oct 2011 00:43:19 +0000 (11:43 +1100)]
drivers/hwmon/hwmon.c: convert idr to ida and use ida_simple_get()

A straightforward looking use of idr for a device id.

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agohwmon: convert idr to ida and use ida_simple interface
Jonathan Cameron [Wed, 5 Oct 2011 00:43:19 +0000 (11:43 +1100)]
hwmon: convert idr to ida and use ida_simple interface

hwmon was using an idr with a NULL pointer, so convert to an
ida which then allows use of Rusty's ida_simple_get.

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib/Kconfig.debug: fix help message for DEFAULT_HUNG_TASK_TIMEOUT
Jiaju Zhang [Wed, 5 Oct 2011 00:43:18 +0000 (11:43 +1100)]
lib/Kconfig.debug: fix help message for DEFAULT_HUNG_TASK_TIMEOUT

Added missing _secs in the help message of config DEFAULT_HUNG_TASK_TIMEOUT.

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofs/pipe.c: add ->statfs callback for pipefs
Pavel Emelyanov [Wed, 5 Oct 2011 00:43:18 +0000 (11:43 +1100)]
fs/pipe.c: add ->statfs callback for pipefs

Currently a statfs on a pipe's /proc/<pid>/fd/ link returns -ENOSYS.  Wire
pipefs up so that the statfs succeeds.

This is required by checkpoint-restart in the userspace to make it
possible to distinguish pipes from fifos.

When we dump information about a task's open files we use the /proc/pid/fd
directory's symlinks and the fact that opening any of them gives us exactly
the same dentry->inode pair as the original process has.  Now if a task
we're dumping has opened a pipe and a fifo we need to detect this and act
accordingly.  Knowing that an fd with type S_ISFIFO resides on a pipefs is
the most precise way.
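
From userspace the check could look roughly like this on a kernel that
has this patch.  It is a sketch only: fd_is_anon_pipe() is a
hypothetical helper, it uses fstatfs() on an already open descriptor
rather than statfs() on the /proc link, and the magic value is the
PIPEFS_MAGIC constant from <linux/magic.h>.

```c
#include <sys/stat.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef PIPEFS_MAGIC
#define PIPEFS_MAGIC 0x50495045	/* value from <linux/magic.h> */
#endif

/* Returns 1 if fd is a FIFO that lives on pipefs (an anonymous pipe),
 * 0 if it is anything else, -1 on error. */
int fd_is_anon_pipe(int fd)
{
	struct stat st;
	struct statfs sfs;

	if (fstat(fd, &st) != 0)
		return -1;
	if (!S_ISFIFO(st.st_mode))
		return 0;
	if (fstatfs(fd, &sfs) != 0)
		return -1;
	return sfs.f_type == PIPEFS_MAGIC;
}

/* Example: create a pipe and classify its read end. */
int demo_check_pipe(void)
{
	int fds[2];
	int ret;

	if (pipe(fds) != 0)
		return -1;
	ret = fd_is_anon_pipe(fds[0]);
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

A named fifo passes the S_ISFIFO test as well but reports the magic of
the filesystem holding its path, so the helper returns 0 for it.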

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agointel_idle: disable auto_demotion for hotplugged CPUs
Shaohua Li [Wed, 5 Oct 2011 00:43:18 +0000 (11:43 +1100)]
intel_idle: disable auto_demotion for hotplugged CPUs

auto_demotion_disable is called only for online CPUs.  For hotplugged
CPUs, we should disable it too.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agointel_idle: fix API misuse
Shaohua Li [Wed, 5 Oct 2011 00:43:17 +0000 (11:43 +1100)]
intel_idle: fix API misuse

smp_call_function() only lets all other CPUs execute a specific function,
while in intel_idle we expect all CPUs to do so.  Without the fix, we could
have one cpu which has auto_demotion enabled or has no broadcast timer setup.
Usually we don't see impact because auto demotion just harms power and the
intel_idle init is called in CPU 0, where broadcast timer delivers
interrupt, but this still could be a problem.
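
The difference between "all other CPUs" and "all CPUs" is shown by this
toy userspace model.  Nothing in it is kernel API: the loops merely
imitate which CPUs get the callback, and every name is hypothetical.

```c
#define NR_CPUS_SKETCH 4

int cpu_configured[NR_CPUS_SKETCH];

/* Stand-in for the per-CPU setup (e.g. disabling auto demotion). */
void configure_cpu(int cpu)
{
	cpu_configured[cpu] = 1;
}

/* Toy model of smp_call_function(): every CPU except the caller. */
void call_on_other_cpus(int self, void (*fn)(int))
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (cpu != self)
			fn(cpu);
}

/* Toy model of an "all CPUs" call: the caller is included. */
void call_on_all_cpus(void (*fn)(int))
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		fn(cpu);
}

/* Reset, run one of the two models from CPU 0, count configured CPUs. */
int demo_configured_count(int include_self)
{
	int count = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		cpu_configured[cpu] = 0;
	if (include_self)
		call_on_all_cpus(configure_cpu);
	else
		call_on_other_cpus(0, configure_cpu);
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		count += cpu_configured[cpu];
	return count;
}
```

In the first model the calling CPU is the one left unconfigured, which
is the case the commit message describes.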

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoalpha: wire up sendmmsg syscall
Michael Cree [Wed, 5 Oct 2011 00:43:17 +0000 (11:43 +1100)]
alpha: wire up sendmmsg syscall

Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoalpha: wire up accept4 syscall
Michael Cree [Wed, 5 Oct 2011 00:43:17 +0000 (11:43 +1100)]
alpha: wire up accept4 syscall

Somehow wiring up the accept4 syscall on Alpha was missed long ago.
This commit rectifies that oversight.

Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agohpet: factor timer allocate from open
Magnus Lynch [Wed, 5 Oct 2011 00:43:16 +0000 (11:43 +1100)]
hpet: factor timer allocate from open

The current implementation of the /dev/hpet driver couples opening the
device with allocating one of the (scarce) timers (aka comparators).  This
is a limitation in that the main counter may be valuable to applications
seeking a high-resolution timer who have no use for the interrupt
generating functionality of the comparators.

This patch alters the open semantics so that when the device is opened, no
timer is allocated.  Operations that depend on a timer being in context
implicitly attempt allocating a timer, to maintain backward compatibility.
There is also an IOCTL (HPET_ALLOC_TIMER _IO) added so that the
allocation may be done explicitly.  (I prefer the explicit open then
allocate pattern but don't know how practical it would be to require all
existing code to be changed.)

/dev/hpet is accessed via mmap().  This is the only interface of /dev/hpet
that is actually used in practice.

[akpm@linux-foundation.org: coding-style tweaks]
[arnd@arndb.de: fix build]
Signed-off-by: Magnus Lynch <maglyx@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoinclude/linux/security.h: fix security_inode_init_security() arg
Andrew Morton [Wed, 5 Oct 2011 00:43:16 +0000 (11:43 +1100)]
include/linux/security.h: fix security_inode_init_security() arg

Make the security_inode_init_security() initxattrs arg const, to match the
non-stubbed version of that function.

Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoselinuxfs: remove custom hex_to_bin()
Andy Shevchenko [Wed, 5 Oct 2011 00:43:15 +0000 (11:43 +1100)]
selinuxfs: remove custom hex_to_bin()

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agovmscan: add barrier to prevent evictable page in unevictable list
Minchan Kim [Wed, 5 Oct 2011 00:43:15 +0000 (11:43 +1100)]
vmscan: add barrier to prevent evictable page in unevictable list

When a race between putback_lru_page() and shmem_lock with lock=0 happens,
program execution order is as follows, but clear_bit in processor #1 could
be reordered right before spin_unlock of processor #1.  Then, the page
would be stranded on the unevictable list.

spin_lock
SetPageLRU
spin_unlock
                                clear_bit(AS_UNEVICTABLE)
                                spin_lock
                                if PageLRU()
                                        if !test_bit(AS_UNEVICTABLE)
                                         move evictable list
smp_mb
if !test_bit(AS_UNEVICTABLE)
        move evictable list
                                spin_unlock

But, pagevec_lookup() in scan_mapping_unevictable_pages() has
rcu_read_[un]lock() so it could prevent reordering before reaching
test_bit(AS_UNEVICTABLE) on processor #1 so this problem never happens.
But it's an unexpected side effect and we should solve this problem
properly.

This patch adds a barrier after mapping_clear_unevictable.
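
The ordering requirement can be illustrated with C11 atomics in
userspace.  This is an analogue under stated assumptions, not the kernel
code: the variables and the function name are hypothetical, and
atomic_thread_fence() stands in for the kernel barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define AS_UNEVICTABLE_BIT 0x1u

atomic_uint mapping_flags = AS_UNEVICTABLE_BIT;	/* mapping starts unevictable */
atomic_bool page_on_lru;			/* models PageLRU() */

/* Clear the flag, then fence, then look at the LRU state.  Without the
 * fence the relaxed clear could be ordered after the load that follows. */
bool clear_unevictable_then_check_lru(void)
{
	atomic_fetch_and_explicit(&mapping_flags, ~AS_UNEVICTABLE_BIT,
				  memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&page_on_lru, memory_order_relaxed);
}
```

The other side of the race needs its own full barrier between setting
the LRU state and testing the flag, which is the smp_mb in the diagram.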

I didn't hit this problem but found it during review.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@google.com>
12 years agomm/huge_memory.c: quiet sparse noise
H Hartley Sweeten [Wed, 5 Oct 2011 00:43:15 +0000 (11:43 +1100)]
mm/huge_memory.c: quiet sparse noise

Quiet the sparse noise:

warning: symbol 'khugepaged_scan' was not declared. Should it be static?
warning: context imbalance in 'khugepaged_scan_mm_slot' - unexpected unlock

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/mempolicy.c: quiet sparse noise
H Hartley Sweeten [Wed, 5 Oct 2011 00:43:14 +0000 (11:43 +1100)]
mm/mempolicy.c: quiet sparse noise

Quiet the sparse noise:

warning: symbol 'default_policy' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Stephen Wilson <wilsons@start.ca>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/thrash.c: quiet sparse noise
H Hartley Sweeten [Wed, 5 Oct 2011 00:43:14 +0000 (11:43 +1100)]
mm/thrash.c: quiet sparse noise

Quiet the following sparse noise:

warning: symbol 'swap_token_memcg' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/memblock.c: quiet sparse noise
H Hartley Sweeten [Wed, 5 Oct 2011 00:43:14 +0000 (11:43 +1100)]
mm/memblock.c: quiet sparse noise

Quiet the following sparse noise in this file:

warning: symbol 'memblock_overlaps_region' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tomi Valkeinen <tomi.valkeinen@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: disable user interface to manually rescue unevictable pages
Johannes Weiner [Wed, 5 Oct 2011 00:43:13 +0000 (11:43 +1100)]
mm: disable user interface to manually rescue unevictable pages

At one point, anonymous pages were supposed to go on the unevictable list
when no swap space was configured, and the idea was to manually rescue
those pages after adding swap and making them evictable again.  But
nowadays, swap-backed pages on the anon LRU list are not scanned without
available swap space anyway, so there is no point in moving them to a
separate list anymore.

The manual rescue could also be used in case pages were stranded on the
unevictable list due to race conditions.  But the code has been around for
a while now and newly discovered bugs should be properly reported and
dealt with instead of relying on such a manual fixup.

In addition to the lack of a usecase, the sysfs interface to rescue pages
from a specific NUMA node has been broken since its introduction, so it's
unlikely that anybody ever relied on that.

This patch removes the functionality behind the sysctl and the
node-interface and emits a one-time warning when somebody tries to access
either of them.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reported-by: Kautuk Consul <consul.kautuk@gmail.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agovmscan.c: fix invalid strict_strtoul() check in write_scan_unevictable_node()
Kautuk Consul [Wed, 5 Oct 2011 00:43:13 +0000 (11:43 +1100)]
vmscan.c: fix invalid strict_strtoul() check in write_scan_unevictable_node()

write_scan_unevictable_node() checks the value req returned by
strict_strtoul() and returns 1 if req is 0.

However, when strict_strtoul() returns 0, it means successful conversion
of buf to unsigned long.

Due to this, the function was not proceeding to scan the zones for
unevictable pages even though we write a valid value to the
scan_unevictable_pages sys file.

Change this check slightly to check for an invalid value in buf as well as
a 0 value stored in res after successful conversion via strict_strtoul.  In
both cases, we do not perform the scanning of this node's zones.
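
The old and new checks can be compared in a userspace sketch.
parse_ulong_strict() is a hypothetical stand-in for strict_strtoul()
that keeps its convention of returning 0 on success; the should_scan_*()
helpers are likewise illustrative.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for strict_strtoul(): 0 on success, -EINVAL otherwise. */
int parse_ulong_strict(const char *buf, unsigned long *res)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (end == buf || errno != 0)
		return -EINVAL;
	if (*end == '\n')	/* allow one trailing newline */
		end++;
	if (*end != '\0')
		return -EINVAL;
	*res = val;
	return 0;
}

/* Old check: treats the success return (0) as the reason to skip. */
bool should_scan_old(const char *buf)
{
	unsigned long res;
	int req = parse_ulong_strict(buf, &res);

	return req != 0;
}

/* New check: skip on a failed conversion or on a converted value of 0. */
bool should_scan_new(const char *buf)
{
	unsigned long res;
	int req = parse_ulong_strict(buf, &res);

	return !(req || !res);
}
```

Writing "1" is skipped by the old check and scanned by the new one,
while garbage input is no longer treated as a request to scan.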

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: fix kunmap_high() comment
Li Haifeng [Wed, 5 Oct 2011 00:43:13 +0000 (11:43 +1100)]
mm: fix kunmap_high() comment

Signed-off-by: Li Haifeng <omycle@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: compaction: make compact_zone_order() static
Kyungmin Park [Wed, 5 Oct 2011 00:43:12 +0000 (11:43 +1100)]
mm: compaction: make compact_zone_order() static

There's no compact_zone_order() user outside file scope, so make it static.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoHWPOISON: convert pr_debug()s to pr_info()s
Dean Nelson [Wed, 5 Oct 2011 00:43:12 +0000 (11:43 +1100)]
HWPOISON: convert pr_debug()s to pr_info()s

Commit fb46e73520940b ("HWPOISON: Convert pr_debugs to pr_info") authored
by Andi Kleen converted a number of pr_debug()s to pr_info()s.

About the same time additional code with pr_debug()s was added by two
other commits, 8c6c2ecb4466 ("HWPOISON, hugetlb: recover from free hugepage
error when !MF_COUNT_INCREASED") and d950b95882f3d ("HWPOISON, hugetlb:
soft offlining for hugepage").  And these pr_debug()s failed to get
converted to pr_info()s.

This patch converts them as well and does some minor related whitespace
cleanup.

Signed-off-by: Dean Nelson <dnelson@redhat.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofs/buffer.c: add device information for error output in __find_get_block_slow()
Tao Ma [Wed, 5 Oct 2011 00:43:11 +0000 (11:43 +1100)]
fs/buffer.c: add device information for error output in __find_get_block_slow()

On the ext4 mailing list[1], we got some reports about errors in
__find_get_block_slow(), but the information is very limited.

If the device information is given, we can know the name of the sick
volume.  Furthermore, we can get the corresponding status of that
block (group, inode block, etc.) by analyzing the disk layout.

[1] http://marc.info/?l=linux-ext4&m=131379831421147&w=2

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm-mmapc-eliminate-the-ret-variable-from-mm_take_all_locks-fix
Andrew Morton [Wed, 5 Oct 2011 00:43:11 +0000 (11:43 +1100)]
mm-mmapc-eliminate-the-ret-variable-from-mm_take_all_locks-fix

Cc: Kautuk Consul <consul.kautuk@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/mmap.c: eliminate the ret variable from mm_take_all_locks()
Kautuk Consul [Wed, 5 Oct 2011 00:43:11 +0000 (11:43 +1100)]
mm/mmap.c: eliminate the ret variable from mm_take_all_locks()

The ret variable is really not needed in mm_take_all_locks().

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agovmscan: fix shrinker callback bug in fs/super.c
Mikulas Patocka [Wed, 5 Oct 2011 00:43:10 +0000 (11:43 +1100)]
vmscan: fix shrinker callback bug in fs/super.c

The callback must not return -1 when nr_to_scan is zero. Fix the bug in
fs/super.c and add this requirement to the callback specification.
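
The requirement can be sketched in userspace as follows.  The function,
its parameters and the counter are hypothetical, and returning 0 for a
query that cannot take the lock is an assumption about how the fix
behaves, not a quote from fs/super.c.

```c
#include <stdbool.h>

int cached_objects = 128;	/* hypothetical number of freeable objects */

/* nr_to_scan == 0 is a query for the object count and must not get -1;
 * -1 is reserved for a real scan request that cannot make progress. */
int shrink_sketch(unsigned long nr_to_scan, bool can_lock)
{
	if (!can_lock)
		return nr_to_scan == 0 ? 0 : -1;
	if (nr_to_scan == 0)
		return cached_objects;
	if (nr_to_scan > (unsigned long)cached_objects)
		nr_to_scan = (unsigned long)cached_objects;
	cached_objects -= (int)nr_to_scan;
	return cached_objects;
}
```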

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm-add-comment-explaining-task-state-setting-in-bdi_forker_thread-fix
Andrew Morton [Wed, 5 Oct 2011 00:43:10 +0000 (11:43 +1100)]
mm-add-comment-explaining-task-state-setting-in-bdi_forker_thread-fix

fiddle wording

Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago ksm: fix the comment of try_to_unmap_one()
Wanlong Gao [Wed, 5 Oct 2011 00:43:10 +0000 (11:43 +1100)]
ksm: fix the comment of try_to_unmap_one()

try_to_unmap_one() is called by try_to_unmap_ksm(), too.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago thp-tail-page-refcounting-fix-6
Andrea Arcangeli [Wed, 5 Oct 2011 00:43:09 +0000 (11:43 +1100)]
thp-tail-page-refcounting-fix-6

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: thp: tail page refcounting fix
Andrea Arcangeli [Wed, 5 Oct 2011 00:43:09 +0000 (11:43 +1100)]
mm: thp: tail page refcounting fix

Michel, while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't
safe, if the pfn ended up being a tail page of a transparent hugepage
under splitting by __split_huge_page_refcount().  He then found the
problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that use get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail->_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail->_mapcount is guaranteed zero and is
unused for all tail pages of a compound page, so we can simply account the
tail page references there and transfer them to tail_page->_count in
__split_huge_page_refcount() (in addition to the head_page->_mapcount).
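
The accounting idea above can be sketched in userspace C11 as follows.
This is a simplified model under stated assumptions, not the kernel code:
struct demo_page and the demo_* helpers are hypothetical, the borrowed
counter starts at 0, the head page accounting is left out, and the locking
that the kernel relies on is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified page: not the kernel's struct page. */
struct demo_page {
	atomic_int count;	/* stays 0 for a tail page until the split */
	atomic_int mapcount;	/* borrowed to hold tail-page pins */
};

/* Take a reference only if the count is already non-zero. */
static bool demo_get_page_unless_zero(struct demo_page *p)
{
	int old = atomic_load(&p->count);

	while (old != 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Pin a tail page by accounting the reference in mapcount. */
static void demo_get_tail_page(struct demo_page *tail)
{
	atomic_fetch_add(&tail->mapcount, 1);
}

/* At split time, move the pins accumulated in mapcount into count. */
static void demo_split_transfer(struct demo_page *tail)
{
	int pins = atomic_exchange(&tail->mapcount, 0);

	atomic_fetch_add(&tail->count, pins);
}
```

Because pins never touch count before the split, a speculative
demo_get_page_unless_zero() on a tail page fails no matter how many pins
are outstanding, which is the guarantee the paragraph above describes.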

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic as a
pair.  By contrast, other get_user_page users, like the secondary-MMU page
fault that establishes the shadow pagetables, never call any superfluous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).  The standard get_page() as invoked by direct-io instead
will now take the compound_lock but still only for tail pages.  The
direct-io paths are usually I/O bound and the compound_lock is per THP so
very fine-grained, so there's no risk of scalability issues with it.  A
simple direct-io benchmark with all lockdep prove-locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation and
usually only run in I/O paths.

This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated with the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-add-extra-free-kbytes-tunable-update-checkpatch-fixes
Andrew Morton [Wed, 5 Oct 2011 00:43:08 +0000 (11:43 +1100)]
mm-add-extra-free-kbytes-tunable-update-checkpatch-fixes

ERROR: trailing whitespace
#98: FILE: mm/page_alloc.c:5303:
+ * free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so $

ERROR: trailing whitespace
#103: FILE: mm/page_alloc.c:5307:
+int free_kbytes_sysctl_handler(ctl_table *table, int write, $

ERROR: need consistent spacing around '*' (ctx:WxV)
#103: FILE: mm/page_alloc.c:5307:
+int free_kbytes_sysctl_handler(ctl_table *table, int write,
                                          ^

total: 3 errors, 0 warnings, 69 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-add-extra-free-kbytes-tunable-update.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-add-extra-free-kbytes-tunable-update
Rik van Riel [Wed, 5 Oct 2011 00:43:08 +0000 (11:43 +1100)]
mm-add-extra-free-kbytes-tunable-update

All the fixes suggested by Andrew Morton.   Not much of a changelog
since the patch should probably be folded into
mm-add-extra-free-kbytes-tunable.patch

Thank you for pointing these out, Andrew.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: add extra free kbytes tunable
Rik van Riel [Wed, 5 Oct 2011 00:43:08 +0000 (11:43 +1100)]
mm: add extra free kbytes tunable

Add a userspace visible knob to tell the VM to keep an extra amount of
memory free, by increasing the gap between each zone's min and low
watermarks.
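
As a rough illustration of what "increasing the gap between min and low"
means, here is a single-zone sketch.  It is not the patch itself: the
struct, the function name, the 4 KiB page size, and the min/4 and min/2
spacing of the low and high watermarks are all assumptions made for the
example.

```c
/* Hypothetical single-zone model; all watermark values are in pages. */
struct demo_watermarks {
	unsigned long min;
	unsigned long low;
	unsigned long high;
};

#define DEMO_PAGE_KB 4UL	/* assumed 4 KiB pages */

static struct demo_watermarks demo_setup_wmarks(unsigned long min_pages,
						 unsigned long extra_free_kbytes)
{
	unsigned long extra = extra_free_kbytes / DEMO_PAGE_KB;
	struct demo_watermarks w;

	w.min = min_pages;				/* unchanged by the knob */
	w.low = min_pages + min_pages / 4 + extra;	/* min..low gap grows */
	w.high = min_pages + min_pages / 2 + extra;	/* stays above low */
	return w;
}
```

With min_pages = 1000, an extra_free_kbytes of 4096 widens the min-to-low
gap from 250 pages to 1274 pages in this model, so background reclaim
would start that much earlier while the min watermark stays put.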

This is useful for realtime applications that call system calls and have a
bound on the number of allocations that happen in any short time period.
In this application, extra_free_kbytes would be left at an amount equal to
or larger than the maximum number of allocations that happen in any
burst.

It may also be useful to reduce the memory use of virtual machines
(temporarily?), in a way that does not cause memory fragmentation like
ballooning does.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm/vmalloc.c: report more vmalloc failures
Joe Perches [Wed, 5 Oct 2011 00:43:07 +0000 (11:43 +1100)]
mm/vmalloc.c: report more vmalloc failures

Some vmalloc failure paths do not report OOM conditions.

Add warn_alloc_failed, which also does a dump_stack, to those failure
paths.

This allows more site specific vmalloc failure logging message printks to
be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: fix page-faults detection in swap-token logic
Konstantin Khlebnikov [Wed, 5 Oct 2011 00:43:07 +0000 (11:43 +1100)]
mm: fix page-faults detection in swap-token logic

After commit v2.6.36-5896-gd065bd8 "mm: retry page fault when blocking on
disk transfer" we usually wait in page-faults without mmap_sem held, so
all swap-token logic was broken, because it was based on using
rwsem_is_locked(&mm->mmap_sem) as a sign of in-progress page-faults.

Add an atomic counter of in progress page-faults for mm to the mm_struct
with swap-token.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago kswapd: assign new_order and new_classzone_idx after wakeup in sleeping
Alex,Shi [Wed, 5 Oct 2011 00:43:07 +0000 (11:43 +1100)]
kswapd: assign new_order and new_classzone_idx after wakeup in sleeping

There are 2 places to read pgdat in kswapd.  One is on return from a
successful balance, the other is on wakeup from kswapd sleeping.  The new_order and
new_classzone_idx represent the balance input order and classzone_idx.

But currently new_order and new_classzone_idx are not assigned after
kswapd_try_to_sleep(), which causes a bug in the following scenario.

1: after a successful balance, kswapd goes to sleep, and new_order = 0;
   new_classzone_idx = __MAX_NR_ZONES - 1;

2: kswapd is woken up with order = 3 and classzone_idx = ZONE_NORMAL

3: while balance_pgdat() is running, a new balance wakeup happens with
   order = 5 and classzone_idx = ZONE_NORMAL

4: the first wakeup (order = 3) finishes successfully and returns order = 3,
   but new_order is still 0, so this balancing is treated as a
   failed balance.  The second, tighter balancing is then missed.

So, to avoid the above problem, the new_order and new_classzone_idx need
to be assigned for later successful comparison.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: compaction: accounting fix
Minchan Kim [Wed, 5 Oct 2011 00:43:06 +0000 (11:43 +1100)]
mm: compaction: accounting fix

I saw the following accounting of compaction during a test of the series.

compact_blocks_moved 251
compact_pages_moved 44

It's very awkward to me, although it's possible, because it means we tried
to compact 251 blocks but migrated just 44 pages.  On further
investigation, I found isolate_migratepages doesn't isolate any pages but
returns ISOLATE_SUCCESS, and then it just increases
compact_blocks_moved but doesn't increase compact_pages_moved.

This patch makes the accounting of compaction work only in case of
successful isolation.
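
A hedged sketch of the accounting rule described above, with hypothetical
names throughout (demo_stats, demo_isolate, demo_compact_block); it only
models the counters and is not the mm/compaction.c change.  The assumption
is that a block is counted as moved only when at least one page was
actually isolated from it.

```c
/* Hypothetical result codes and counters for the illustration. */
enum demo_isolate_result { DEMO_ISOLATE_NONE, DEMO_ISOLATE_SUCCESS };

struct demo_stats {
	unsigned long blocks_moved;
	unsigned long pages_moved;
};

/* Report success only when at least one page was isolated. */
static enum demo_isolate_result demo_isolate(unsigned long nr_isolated)
{
	return nr_isolated ? DEMO_ISOLATE_SUCCESS : DEMO_ISOLATE_NONE;
}

static void demo_compact_block(struct demo_stats *st,
			       unsigned long nr_isolated)
{
	if (demo_isolate(nr_isolated) != DEMO_ISOLATE_SUCCESS)
		return;		/* nothing isolated: leave both counters alone */

	st->blocks_moved++;
	st->pages_moved += nr_isolated;
}
```

Under this rule the two counters can no longer drift apart because of
blocks from which nothing was isolated.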

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-compaction-compact-unevictable-pages-checkpatch-fixes
Andrew Morton [Wed, 5 Oct 2011 00:43:06 +0000 (11:43 +1100)]
mm-compaction-compact-unevictable-pages-checkpatch-fixes

ERROR: need consistent spacing around '|' (ctx:VxW)
#67: FILE: mm/compaction.c:264:
+ isolate_mode_t mode = ISOLATE_ACTIVE| ISOLATE_INACTIVE |
                                      ^

total: 1 errors, 0 warnings, 36 lines checked

./patches/mm-compaction-compact-unevictable-pages.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: compaction: compact unevictable pages
Minchan Kim [Wed, 5 Oct 2011 00:43:06 +0000 (11:43 +1100)]
mm: compaction: compact unevictable pages

Presently compaction doesn't handle mlocked pages as it uses
__isolate_lru_page which doesn't consider unevictable pages.  It was used
only by lumpy reclaim, so it was pointless for it to isolate unevictable
pages.

But the situation has changed.  Compaction can handle unevictable pages
and it can help getting big contiguous pages in memory which is fragmented
by many pinned pages with mlock.

I tested this patch with the following scenario.

1. A : allocate 80% anon pages in system
2. B : allocate 20% mlocked pages in system
/* Maybe, mlocked pages are located in low pfn address */
3. kill A /* high pfn address are free */
4. echo 1 > /proc/sys/vm/compact_memory

old:

compact_blocks_moved 251
compact_pages_moved 44

new:

compact_blocks_moved 258
compact_pages_moved 412

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm/memblock.c: small function definition fixes
Jonghwan Choi [Wed, 5 Oct 2011 00:43:05 +0000 (11:43 +1100)]
mm/memblock.c: small function definition fixes

warning: function 'memblock_memory_can_coalesce'
with external linkage has definition.

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: add free_hot_cold_page_list() helper
Konstantin Khlebnikov [Wed, 5 Oct 2011 00:43:05 +0000 (11:43 +1100)]
mm: add free_hot_cold_page_list() helper

This patch adds a helper, free_hot_cold_page_list(), to free a list of
0-order pages.  It frees pages directly from the list without a temporary
page-vector.
It also calls trace_mm_pagevec_free() to simulate pagevec_free()
behaviour.
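
The "free directly from the list" idea can be sketched in userspace as
below.  This is only an analogy under assumptions: struct demo_page, the
singly linked list, demo_free_page() and the demo_freed counter are
hypothetical, and malloc/free stand in for the page allocator.

```c
#include <stdlib.h>

/* Hypothetical page descriptor linked into a simple list. */
struct demo_page {
	struct demo_page *next;
};

static unsigned long demo_freed;	/* how many pages were released */

/* Stand-in for freeing one 0-order page; cold is only passed through. */
static void demo_free_page(struct demo_page *page, int cold)
{
	(void)cold;
	demo_freed++;
	free(page);
}

/* Walk the list and free each page in place, with no intermediate array. */
static void demo_free_page_list(struct demo_page **list, int cold)
{
	struct demo_page *page = *list;

	while (page) {
		struct demo_page *next = page->next;	/* read before freeing */

		demo_free_page(page, cold);
		page = next;
	}
	*list = NULL;
}
```

The only subtlety is saving the next pointer before the current entry is
released, which is what a "safe" list iterator does.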

bloat-o-meter:

add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28)
function                                     old     new   delta
free_hot_cold_page_list                        -     264    +264
get_page_from_freelist                      2129    2132      +3
__pagevec_free                               243     239      -4
split_free_page                              380     373      -7
release_pages                                606     510     -96
free_page_list                               188       -    -188

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago kswapd: avoid unnecessary rebalance after an unsuccessful balancing
Alex,Shi [Wed, 5 Oct 2011 00:43:04 +0000 (11:43 +1100)]
kswapd: avoid unnecessary rebalance after an unsuccessful balancing

In commit 215ddd66 ("mm: vmscan: only read new_classzone_idx from pgdat
when reclaiming successfully"), Mel Gorman said it is better for kswapd to
sleep after an unsuccessful balancing if there is a tighter reclaim request
pending in the balancing.  But in the following scenario, kswapd does
something that does not match our expectation.  The patch fixes this issue.

1, Read pgdat request A (classzone_idx, order = 3)
2, balance_pgdat()
3, During balance_pgdat(), a new pgdat request B (classzone_idx, order = 5) is placed
4, balance_pgdat() returns but failed since returned order = 0
5, pgdat of request A is assigned to balance_pgdat(), and balancing is done
   again, whereas the expected behavior is for kswapd to try to sleep.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Pádraig Brady <P@draigBrady.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago debug-pagealloc-add-support-for-highmem-pages-fix
Andrew Morton [Wed, 5 Oct 2011 00:43:04 +0000 (11:43 +1100)]
debug-pagealloc-add-support-for-highmem-pages-fix

remove unneeded preempt_disable/enable

Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago debug-pagealloc: add support for highmem pages
Akinobu Mita [Wed, 5 Oct 2011 00:43:04 +0000 (11:43 +1100)]
debug-pagealloc: add support for highmem pages

This adds support for poisoning and verifying highmem pages to the
debug-pagealloc feature used when there is no architecture support.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm-neaten-warn_alloc_failed-fix
Andrew Morton [Wed, 5 Oct 2011 00:43:03 +0000 (11:43 +1100)]
mm-neaten-warn_alloc_failed-fix

use the __printf() macro

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: neaten warn_alloc_failed
Joe Perches [Wed, 5 Oct 2011 00:43:03 +0000 (11:43 +1100)]
mm: neaten warn_alloc_failed

Add __attribute__((format (printf...) to the function to validate format
and arguments.  Use vsprintf extension %pV to avoid any possible message
interleaving.  Coalesce format string.  Convert printks/pr_warning to
pr_warn.
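
For readers unfamiliar with the format attribute, a small userspace sketch
follows.  demo_warn() is a hypothetical helper, not warn_alloc_failed(),
and the attribute is a GCC/Clang extension.  The kernel-only %pV extension
is not available in userspace, so plain vsnprintf() is used instead.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical logger.  The attribute says that argument 3 is a printf
 * format string and that the variadic arguments start at position 4, so
 * the compiler can check each call site against the format.
 */
__attribute__((format(printf, 3, 4)))
static int demo_warn(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, len, fmt, args);	/* one formatted write */
	va_end(args);
	return n;
}
```

With the attribute in place, a call such as demo_warn(buf, len, "%s", 3)
would draw a format warning at compile time (with -Wformat) instead of
misbehaving at run time.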

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago mm: iov_iter: have iov_iter_advance() decrement nr_segs appropriately
Jeff Layton [Wed, 5 Oct 2011 00:43:03 +0000 (11:43 +1100)]
mm: iov_iter: have iov_iter_advance() decrement nr_segs appropriately

Currently, when you call iov_iter_advance, the pointer to the iovec
array can be incremented, but it does not decrement the nr_segs value in
the iov_iter struct.  The result is an iov_iter struct with an nr_segs value
that goes beyond the end of the array.
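
A simplified userspace sketch of the intended behaviour, assuming
hypothetical demo_iovec and demo_iov_iter types that mirror only the
fields mentioned above.  It is not the kernel's iov_iter_advance(); it
just shows the segment pointer and nr_segs moving together.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct iovec and struct iov_iter. */
struct demo_iovec {
	void *base;
	size_t len;
};

struct demo_iov_iter {
	const struct demo_iovec *iov;	/* current segment */
	unsigned long nr_segs;		/* segments left, including current */
	size_t iov_offset;		/* offset into the current segment */
	size_t count;			/* total bytes left */
};

static void demo_iov_iter_advance(struct demo_iov_iter *i, size_t bytes)
{
	if (bytes > i->count)
		bytes = i->count;	/* never advance past the data */
	i->count -= bytes;

	/* Measure from the start of the current segment. */
	bytes += i->iov_offset;

	/* Step over fully consumed segments, keeping nr_segs in sync. */
	while (i->nr_segs && bytes >= i->iov->len) {
		bytes -= i->iov->len;
		i->iov++;
		i->nr_segs--;
	}
	i->iov_offset = bytes;
}
```

Since nr_segs drops every time iov is incremented, iov + nr_segs always
points at the end of the original array, so a caller trusting nr_segs
cannot walk past it.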

While I'm not aware of anything that's specifically broken by this, it
seems odd and a bit dangerous not to decrement that value.  If someone
were to trust the nr_segs value to be correct, then they could end up
walking off the end of the array.

Changing this might also provide some micro-optimization when dealing with
the last iovec in an array.  Many of the other routines that deal with
iov_iter have optimized codepaths when nr_segs == 1.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>