git.karo-electronics.de Git - karo-tx-linux.git/log

rtc: rtc-mv: make of_device_id array const

Make of_device_id array const, because all OF functions handle it as
const.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

rtc: isl12057: make of_device_id array const

Make of_device_id array const, because all OF functions handle it as
const.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

rtc: rtc-hym8563: make of_device_id array const

Make of_device_id array const, because all OF functions handle it as
const.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Heiko Stbner <heiko@sntech.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

rtc: rtc-ds1742: make of_device_id array const

Make of_device_id array const, because all OF functions handle it as
const.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/rtc-da9052.c: ALARM causes interrupt storm

Setting the alarm to a time not on a minute boundary results in repeated
interrupts being generated by the DA9052/3 PMIC device until the kernel
RTC core sees that the alarm has rung.  Sometimes the number and frequency
of interrupts can cause the kernel to disable the IRQ line used by the
DA9052/3 PMIC with disasterous consequences.  This patch fixes the
problem.

Even though the DA9052/3 PMIC is capable generating periodic interrupts,
ie TICKS, the method used to distinguish RTC_AF from RTC_PF events was
flawed and can not work in conjunction with the regmap_irq kernel core.
Thus that flawed detection has also been removed by the DA9052/3 PMIC RTC
driver's irq handler, so that it no longer reports the wrong type of event
to the kernel RTC core.

The internal static functions within the DA9052/3 PMIC RTC driver have
been changed to pass the 'da9052_rtc' structure instead of the 'da9052'
because there is no backwards pointer from the 'da9052' structure.

This patch fixes the three issues described above.  The first is serious
because usiing the RTC alarm set to a non minute boundary will eventually
cause all component drivers that depend on the interrupt line to fail.
The solution adopted is to round up to alarm time to the next highest
minute.

The second bug, reporting a RTC_PF event instead of an RTC_AF event turns
out to not matter with the current implementation of the kernel RTC core
as it seems to ignore the event type.  However, should that change in the
future it is better to fix the issue now and not have 'problems waiting to
happen'

The third set of changes are to make the da9052_rtc structure available to
all the local internal functions in the driver.  This was done during
testing so that diagnostic data could be stored there.  Should the
solution to the first issue be found not acceptable, then the alternative
of using the TICKS interrupt at the fixed one second interval in order to
step to the exact second of the requested alarm requires an extra (alarm
time) piece of data to be stored.  In devices that use the alarm function
to wake up from sleep, accuracy to the second will result in the device
being awake for up to nearly a minute longer than expected.

Signed-off-by: Anthony Olech <anthony.olech.opensource@diasemi.com>
Cc: David Dajun Chen <dchen@diasemi.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/rtc-88pm860x.c: add missing of_node_put()

Add missing of_node_put() to decrement the reference count.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Cc: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/rtc-88pm860x.c: use of_get_child_by_name()

Use of_get_child_by_name() to obtain reference to charger node instead of
of_find_node_by_name() which can walk outside of the parent node.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Cc: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

arch/mips/dec: switch DECstation systems to rtc-cmos

This adds an RTC platform device for DECstation systems so that they can
use the rtc-cmos driver for their RTC device.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

rtc-rtc-cmos-drivers-char-rtcc-features-for-decstation-support-fix

fix weird code layout

Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/rtc-cmos.c: drivers/char/rtc.c features for DECstation support

This brings in drivers/char/rtc.c functionality required for DECstation
and, should the maintainers decide to switch, Alpha systems to use
rtc-cmos.

Specifically these features are made available:

* RTC iomem rather than x86/PCI port I/O mapping, controlled with the
  RTC_IOMAPPED macro as with the original driver.  The DS1287A chip in all
  DECstation systems is mapped in the host bus address space as a
  contiguous block of 64 32-bit words of which the least significant byte
  accesses the RTC chip for both reads and writes.  All the address and
  data window register accesses are made transparently by the chipset glue
  logic so that the device appears directly mapped on the host bus.

* A way to set the size of the address space explicitly with the
  newly-added `address_space' member of the platform part of the RTC
  device structure.  This avoids the unreliable heuristics that does not
  work in a setup where the RTC is not explicitly accessed with the usual
  address and data window register pair.

* The ability to use the RTC periodic interrupt as a system clock
  device, which is implemented by arch/mips/kernel/cevt-ds1287.c for
  DECstation systems and takes the RTC interrupt away from the RTC driver.
   Eventually hooking back to the clock device's interrupt handler should
  be possible for the purpose of the alarm clock and possibly also
  update-in-progress interrupt, but this is not done by this change.

  o To avoid interfering with the clock interrupt all the places where
    the RTC interrupt mask is fiddled with are only executed if and IRQ
    has been assigned to the RTC driver.

  o To avoid changing the clock setup Register A is not fiddled with
    if CMOS_RTC_FLAGS_NOFREQ is set in the newly-added `flags' member of
    the platform part of the RTC device structure.  Originally, in
    drivers/char/rtc.c, this was keyed with the absence of the RTC
    interrupt, just like the interrupt mask, but there only the periodic
    interrupt frequency is set, whereas rtc-cmos also sets the divider
    bits.  Therefore a new flag is introduced so that systems where the
    RTC interrupt is not usable rather than used as a system clock device
    can fully initialise the RTC.

* A small clean-up is made to the IRQ assignment code that makes the IRQ
  number hardcoded to -1 rather than arbitrary -ENXIO (or whatever error
  happens to be returned by platform_get_irq) where no IRQ has been
  assigned to the RTC driver (NO_IRQ might be another candidate, but it
  looks like this macro has inconsistent or missing definitions and
  limited use and might therefore be unsafe).

Verified to work correctly with a DECstation 5000/240 system.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/rtc-efi.c: avoid subtracting day twice when computing year days

Compared source code of rtc-lib.c::rtc_year_days() with
efirtc.c::rtc_year_days(), found the code in rtc-efi decreases value of
day twice when it computing year days. rtc-lib.c::rtc_year_days() has
already decrease days and return the year days from 0 to 365.

Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/rtc-m41t80.c: add support for MicroCrystal rv4162

Signed-off-by: Wolfram Sang <wsa@sang-engineering.com>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/rtc-m41t80.c: propagate error value from smbus functions

Don't replace the value we got from the I2C layer, just pass it on.

Signed-off-by: Wolfram Sang <wsa@sang-engineering.com>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/rtc-m41t80.c: clean up error paths

There is no cleanup needed when something fails in probe, so no need for
goto. Directly return when something fails.

Signed-off-by: Wolfram Sang <wsa@sang-engineering.com>
Cc: Jingoo Han <jg1.han@samsung.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/rtc-m41t80.c: remove DRV_VERSION macro

History is in git, no need for sperate versioning. Also remove the
success printout, RTC core does it, too.

Signed-off-by: Wolfram Sang <wsa@sang-engineering.com>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

arm64: add APM X-Gene SoC RTC DTS entry

This patch adds APM X-Gene SoC RTC DTS entry

Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com>
Signed-off-by: Loc Ho <lho@apm.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc: add APM X-Gene SoC RTC driver

Add support for the APM X-Gene SoC RTC driver.

Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com>
Signed-off-by: Loc Ho <lho@apm.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Documentation/devicetree/bindings: add documentation for the APM X-Gene SoC RTC DTS binding

Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com>
Signed-off-by: Loc Ho <lho@apm.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/interface.c: fix for fix of alarm initialization

Seems the previous patch "fix infinite loop in initializing the alarm"
did break the infinite loop in alarm initialization, but not in the right
way. The loop indeed should walk through the not-leap years and stop on
the leap one.

This patch does apply on top of the previous one.

Signed-off-by: Ales Novak <alnovak@suse.cz>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/rtc/interface.c: fix infinite loop in initializing the alarm

In __rtc_read_alarm(), if the alarm time retrieved by
rtc_read_alarm_internal() from the device contains invalid values (e.g.
month=2,mday=31) and the year not set (=-1), the initialization will loop
infinitely because the year-fixing loop expects the time being invalid due
to leap year.

Fix reduces the loop to the leap years and adds final validity check.

Signed-off-by: Ales Novak <alnovak@suse.cz>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Reported-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_init

autofs_dev_ioctl_init is only called by __init init_autofs4_fs

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

init/main.c: remove an ifdef

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND

1. Remove CLONE_KERNEL, it has no users and it is dangerous.

   The (old) comment says "List of flags we want to share for kernel
   threads" but this is not true, we do not want to share ->sighand by
   default. This flag can only be used if the caller is sure that both
   parent/child will never play with signals (say, allow_signal/etc).

2. Change rest_init() to clone kernel_init() without CLONE_SIGHAND.

   In this case CLONE_SIGHAND does not really hurt, and it looks like
   optimization because copy_sighand() can avoid kmem_cache_alloc().

   But in fact this only adds the minor pessimization. kernel_init()
   is going to exec the init process, and de_thread() will need to
   unshare ->sighand and do kmem_cache_alloc(sighand_cachep) anyway,
   but it needs to do more work and take tasklist_lock and siglock.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

init-mainc-add-initcall_blacklist-kernel-parameter-fix

tweak printk text

Cc: Andi Kleen <andi@firstfloor.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Rob Landley <rob@landley.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

init/main.c: add initcall_blacklist kernel parameter

When a module is built into the kernel the module_init() function becomes
an initcall. Sometimes debugging through dynamic debug can help, however,
debugging built in kernel modules is typically done by changing the
.config, recompiling, and booting the new kernel in an effort to determine
exactly which module caused a problem.

This patchset can be useful stand-alone or combined with initcall_debug.
There are cases where some initcalls can hang the machine before the
console can be flushed, which can make initcall_debug output inaccurate.
Having the ability to skip initcalls can help further debugging of these
scenarios.

Usage: initcall_blacklist=<list of comma separated initcalls>

ex) added "initcall_blacklist=sgi_uv_sysfs_init" as a kernel parameter and
the log contains:

blacklisting initcall sgi_uv_sysfs_init
...
...
initcall sgi_uv_sysfs_init blacklisted

ex) added "initcall_blacklist=foo_bar,sgi_uv_sysfs_init" as a kernel parameter
and the log contains:

blacklisting initcall foo_bar
blacklisting initcall sgi_uv_sysfs_init
...
...
initcall sgi_uv_sysfs_init blacklisted

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Rob Landley <rob@landley.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

init/main.c: don't use pr_debug()

Pertially revert ea676e846a8171b8 ("init/main.c: convert to pr_foo()").

Unbeknownst to me, pr_debug() is different from the other pr_foo() levels:
pr_debug() is a no-op when DEBUG is not defined.

Happily, init/main.c does have a #define DEBUG so we didn't break
initcall_debug. But the functioning of initcall_debug should not be
dependent upon the presence of that #define DEBUG.

Reported-by: Russell King <rmk@arm.linux.org.uk>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

binfmt_elf.c: use get_random_int() to fix entropy depleting

Entropy is quickly depleted under normal operations like ls(1), cat(1),
etc... between 2.6.30 to current mainline, for instance:

$ cat /proc/sys/kernel/random/entropy_avail
3428
$ cat /proc/sys/kernel/random/entropy_avail
2911
$cat /proc/sys/kernel/random/entropy_avail
2620

We observed this problem has been occurring since 2.6.30 with
fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by
f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").

/*
* Generate 16 random bytes for userspace PRNG seeding.
*/
get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));

The patch introduces a wrapper around get_random_int() which has lower
overhead than calling get_random_bytes() directly.

With this patch applied:
$ cat /proc/sys/kernel/random/entropy_avail
2731
$ cat /proc/sys/kernel/random/entropy_avail
2802
$ cat /proc/sys/kernel/random/entropy_avail
2878

Analyzed by John Sobecki.

This has been applied on a specific Oracle kernel and has been running on
the customer's production environment (the original bug reporter) for
several months; it has worked fine until now.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <aedilger@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnn@arndb.de>
Cc: John Sobecki <john.sobecki@oracle.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs/binfmt_flat.c: make old_reloc() static

old_reloc() is only used in this file, make it static.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs/binfmt_elf.c: fix bool assignements

Fix coccinelle warnings.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

checkpatch: warn on #defines ending in semicolon

Using a #define ending in a semicolon is poor style and can lead to
unexpected code paths being executed.

Warn on uses of these #define types:

#define foo[(...)] bar;
#define foo[(...)] \
bar;

Based on a patch from Borislav Petkov.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

checkpatch: make --strict a default for files in drivers/net and net/

Networking files are generally more strictly conformant to linux-kernel
style so make checkpatch more verbose by default for patches to files or
when checking files in these directories.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

checkpatch: improve missing blank line after declarations test

A couple more modifications to the declarations tests.

o Declarations can also be bitfields so exclude things with a colon
o Make sure the current and previous lines are indented the same
  to avoid matching some macro where a struct type is passed on
  the previous line like:

next = list_entry(buffer->entry.next,
  struct binder_buffer, entry);
if (buffer_start_page(next) == buffer_end_page(buffer))

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

checkpatch: always warn on missing blank line after variable declaration block

Make the test system wide, modify the message too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

checkpatch: fix wildcard DT compatible string checking

We attempt to search for compatible strings which use a variable token in
the documented name such as <chip> or <soc>. While this was attempted to
be handled, it's utterly broken.

The desired forms of matching are:

vendor,<chip>-*
vendor,name<part#>-*

For <chip>, lower case characters and numbers are permitted. For <part#>,
only numeric values are allowed.

With this change, the number of missing compatible strings reported in
arch/arm/boot/dts is reduced from 1071 to 960.

Reported-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Florian Vaussard <florian.vaussard@epfl.ch>
Cc: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/compat.c: use sizeof() instead of sizeof

Fix 4 checkpatch warnings
WARNING: sizeof *tv should be sizeof(*tv)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: Add CRC64 ECMA module

Add implementation of CRC64 ECMA checksum.

We have an IP Acceleration driver for Freescale network processors which
is using this CRC64. However, it still needs some work in order for it to
become upstreamable.

Signed-off-by: Marian Chereji <marian.chereji@freescale.com>
Reviewed-by: Varvara Andrei-B21317 <andrei.varvara@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/util.c: add kstrimdup()

kstrimdup() creates a whitespace-trimmed duplicate of the passed in
null-terminated string. This is useful for strings coming from sysfs that
often include trailing whitespace due to user input.

Thanks to Joe Perches for this implementation.

Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org>
Cc: Joe Perches <joe@perches.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/plist.c: make CONFIG_DEBUG_PI_LIST selectable

Change CONFIG_DEBUG_PI_LIST to be user-selectable, and add a title and
description. Remove the dependency on DEBUG_RT_MUTEXES since they were
changed to use rbtrees, and there are other users of plists now.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib-btreec-fix-leak-of-whole-btree-nodes-fix

remove unneeded test of NULL

Cc: Joern Engel <joern@logfs.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Minfei Huang <huangminfei@ucloud.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/btree.c: fix leak of whole btree nodes

I use btree from 3.14-rc2 in my own module. When the btree module is
removed, a warning arises:

kmem_cache_destroy btree_node: Slab cache still has objects
CPU: 13 PID: 9150 Comm: rmmod Tainted: GF O 3.14.0-rc2 #1
Hardware name: Inspur NF5270M3/NF5270M3, BIOS CHEETAH_2.1.3 09/10/2013
ffff881ff8643b18 ffff881ffdc23ea8 ffffffff815a4ecc 0000000000000000
ffff881ff8643ac0 ffff881ffdc23ec8 ffffffff811610df 0000000000000880
ffffffffa057da60 ffff881ffdc23ed8 ffffffffa057d57c ffff881ffdc23f78
Call Trace:
[<ffffffff815a4ecc>] dump_stack+0x49/0x5d
[<ffffffff811610df>] kmem_cache_destroy+0xcf/0xe0
[<ffffffffa057d57c>] btree_module_exit+0x10/0x12 [btree]
[<ffffffff810d7948>] SyS_delete_module+0x198/0x1f0
[<ffffffff815aac89>] ? retint_swapgs+0xe/0x13
[<ffffffff810a561d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[<ffffffff812addde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff815b3652>] system_call_fastpath+0x16/0x1b

The cause is that it doesn't release the last btree node, when height = 1
and fill = 1.

Signed-off-by: Minfei Huang <huangminfei@ucloud.cn>
Cc: Joern Engel <joern@logfs.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/vsprintf.c: fix comparison to bool

Fixing 2 coccinelle warnings:
lib/vsprintf.c:2350:2-9: WARNING: Assignment of bool to 0/1
lib/vsprintf.c:2389:3-10: WARNING: Assignment of bool to 0/1

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/libcrc32c.c: use PTR_ERR_OR_ZERO

replace IS_ERR/PTR_ERR

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/xz: enable all filters by default in Kconfig

This restores the old behavior that existed before 2013-02-22, when
changes were made by 64dbfb444c150 ("decompressors: drop dependency on
CONFIG_EXPERT") and 5dc49c75a2 ("decompressors: make the default XZ_DEC_*
config match the selected architecture").

Disabling the filters only makes sense on embedded systems.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Acked-by: Kyle McMartin <kyle@infradead.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/plist.c: replace pr_debug with printk in plist_test()

Replace pr_debug() in lib/plist.c test function plist_test() with
printk(KERN_DEBUG ...).

Without DEBUG defined, pr_debug() is complied out, but the entire
plist_test() function is already inside CONFIG_DEBUG_PI_LIST, so printk
should just be used directly.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/xz: add comments for the intentionally missing break statements

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/string.c: use the name "C-string" in comments

For strncpy() and friends the source string may or may not have an actual
NUL character at the end. The documentation is confusing in this because
it specifically mentions that you are passing a "NUL-terminated" string.
Wikipedia says that "C-string" is an alternative name we can use instead.

http://en.wikipedia.org/wiki/Null-terminated_string

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/video/backlight/backlight.c: remove backlight sysfs uevent

Most mobile phones have Ambient Light Sensors and it changes brightness
according to the lux.  It means it changes backlight brightness frequently
by just writing sysfs node, so it generates uevent.

Usually there's no user to use this backlight changes.  But it forks udev
worker threads and it takes about 5ms.  The main problem is that it hurts
other process activities.  so remove it.

Kay said
"Uevents are for the major, low-frequent, global device state-changes,
not for carrying-out any sort of measurement data. Subsystems which
need that should use other facilities like poll()-able sysfs file or
any other subscription-based, client-tracking interface which does not
cause overhead if it isn't used. Uevents are not the right thing to
use here, and upstream udev should not paper-over broken kernel
subsystems."

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Cc: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: add linux-api for review of API/ABI changes

This makes it more likely that patch submitters will CC API/ABI changes to
the linux-api list, and tools like get_maintainer.pl will do so
automatically.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Michael Kerrisk <mtk.man-pages@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/vsprintf: add %pT format specifier

Since task_struct->comm can be modified by other threads while the current
thread is reading it, it is recommended to use get_task_comm() for reading
it.

However, since get_task_comm() holds task_struct->alloc_lock spinlock,
some users cannot use get_task_comm().  Also, a lot of users are directly
reading from task_struct->comm even if they can use get_task_comm().  Such
users might obtain inconsistent result.

This patch introduces %pT format specifier for printing task_struct->comm.
Currently %pT does not provide consistency.  I'm planning to change to
use RCU in the future.  By using RCU, the comm name read from
task_struct->comm will be guaranteed to be consistent.  But before
modifying set_task_comm() to use RCU, we need to kill direct ->comm users
who do not use get_task_comm().

An example for converting direct ->comm users is shown below.  Since many
debug printings use p == current, you can pass NULL instead of p if p ==
current.

  pr_info("comm=%s\n", p->comm);       => pr_info("comm=%pT\n", p);
  pr_info("comm=%s\n", current->comm); => pr_info("comm=%pT\n", NULL);

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: report dropping of messages from logbuf

If the log ring buffer becomes full, we silently overwrite old messages
with new data. console_unlock will detect this case and fast-forward the
console_* pointers to skip over the corrupted data, but nothing will be
reported to the user.

This patch hijacks the first valid log message after detecting that we
dropped messages and prefixes it with a note detailing how many messages
were dropped. For long (~1000 char) messages, this will result in some
truncation of the real message, but given that we're dropping things
anyway, that doesn't seem to be the end of the world.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Documentation: expand/clarify debug documentation

The pr_debug() and related debug print macros all differ from the normal
pr_XXX() macros, in that the normal ones print unconditionally, while the
debug macros are compiled out unless DEBUG is defined or
CONFIG_DYNAMIC_DEBUG is set. This isn't obvious, and the only way to find
this out is either to review the actual printk.h code or to read
CodingStyle, and the message there doesn't highlight the fact.

Change Documentation/CodingStyle to clearly indicate that pr_debug() and
related debug printing macros behave differently than all other pr_XXX()
macros, and attempt to clarify when and where the different debug printing
methods might be used.

Add short comment to printk.h above the pr_XXX() macros indicating that
while these macros print unconditionally, pr_debug() does not.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Joe Perches <joe@perches.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

timekeeping: use printk_deferred when holding timekeeping seqlock

Jiri Bohac pointed out that there are rare but potential deadlock
possibilities when calling printk while holding the timekeeping
seqlock.

This is due to printk() triggering console sem wakeup, which can
cause scheduling code to trigger hrtimers which may try to read
the time.

Specifically, as Jiri pointed out, that path is:
  printk
    vprintk_emit
      console_unlock
        up(&console_sem)
          __up
    wake_up_process
      try_to_wake_up
        ttwu_do_activate
  ttwu_activate
    activate_task
      enqueue_task
        enqueue_task_fair
  hrtick_update
    hrtick_start_fair
      hrtick_start_fair
        get_time
  ktime_get
    --> endless loop on
    read_seqcount_retry(&timekeeper_seq, ...)

This patch tries to avoid this issue by using printk_deferred (previously
named printk_sched) which should defer printing via a irq_work_queue.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reported-by: Jiri Bohac <jbohac@suse.cz>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: Add printk_deferred_once

Two of the three prink_deferred uses are really printk_once style
uses, so add a printk_deferred_once macro to simplify those call
sites.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: rename printk_sched to printk_deferred

After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.

This only changes the function name. No logic changes.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: disable preemption for printk_sched

An earlier change in -mm (printk: remove separate printk_sched
buffers...), removed the printk_sched irqsave/restore lines since it was
safe for current users. Since we may be expanding usage of
printk_sched(), disable preepmtion for this function to make it more
generally safe to call.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: remove separate printk_sched buffers and use printk buf instead

To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created.  The issue is that printk has a console_sem
that it can grab and release.  The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq locks that is held
in the scheduler.  This leads to a possible deadlock if the wake up uses
the same rq as the one with the rq lock held already.

What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag.  On a timer tick, if this flag is
set, the printk() is done against the buffer.

There's a couple of issues with this approach.

1) If two printk_sched()s are called before the tick, the second one
   will overwrite the first one.

2) The temporary buffer is 512 bytes and is per cpu.  This is a quite a
   bit of space wasted for something that is seldom used.

In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.

Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups.  Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters.  This must be avoided while
holding the logbuf_lock.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: enable interrupts before calling console_trylock_for_printk()

We need interrupts disabled when calling console_trylock_for_printk() only
so that cpu id we pass to can_use_console() remains valid (for other
things console_sem provides all the exclusion we need and deadlocks on
console_sem due to interrupts are impossible because we use
down_trylock()). However if we are rescheduled, we are guaranteed to run
on an online cpu so we can easily just get the cpu id in
can_use_console().

We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH it
can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: fix lockdep instrumentation of console_sem

Printk calls mutex_acquire() / mutex_release() by hand to instrument
lockdep about console_sem. However in some corner cases the
instrumentation is missing. Fix the problem by creating helper functions
for locking / unlocking console_sem which take care of lockdep
instrumentation as well.

Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Fabio Estevam <festevam@gmail.com>
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk-release-lockbuf_lock-before-calling-console_trylock_for_printk-fix

fix have_callable_console() comment

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: release lockbuf_lock before calling console_trylock_for_printk()

There's no reason to hold lockbuf_lock when entering
console_trylock_for_printk().

The first thing this function does is to call down_trylock(console_sem)
and if that fails it immediately unlocks lockbuf_lock.  So lockbuf_lock
isn't needed for that branch.  When down_trylock() succeeds, the rest of
console_trylock() is OK without lockbuf_lock (it is called without it from
other places), and the only remaining thing in
console_trylock_for_printk() is can_use_console() call.  For that call
console_sem is enough (it iterates all consoles and checks CON_ANYTIME
flag).

So we drop logbuf_lock before entering console_trylock_for_printk() which
simplifies the code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: remove outdated comment

Comment about interesting interlocking between lockbuf_lock and
console_sem is outdated.

It was added in 2002 by commit a880f45a48be2956d2c78a839c472287d54435c1
during conversion of console_lock to console_sem + lockbuf_lock.

At that time release_console_sem() (today's equivalent is
console_unlock()) was indeed using lockbuf_lock to avoid races between
trylock on console_sem in printk() and unlock of console_sem. However
these days the interlocking is gone and the races are avoided by
rechecking logbuf state after releasing console_sem.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: return really stored message length

I wonder if anyone uses printk return value but it is there and should be
counted correctly.

This patch modifies log_store() to return the number of really stored
bytes from the 'text' part.  Also it handles the return value in
vprintk_emit().

Note that log_store() is used also in cont_flush() but we could ignore the
return value there.  The function works with characters that were already
counted earlier.  In addition, the store could newer fail here because the
length of the printed text is limited by the "cont" buffer and "dict" is
NULL.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: shrink too long messages

We might want to print at least part of too long messages and add some
warning for debugging purpose.

The question is how long the shrunken message should be.  If we use the
whole buffer, it might get rotated too soon.  Let's try to use only 1/4 of
the buffer for now.

Also shrink the whole dictionary.  We do not want to parse it or break it
in the middle of some pair of values.  It would not cause any real harm
but still.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: split message size computation

We will want to recompute the message size when shrinking too long
messages. Let's put the code into separate function.

The side effect of setting "pad_len" is not nice but it is worth removing
the code duplication. Note that I will probably have one more usage for
this function when handling messages safe way in NMI context.

This patch does not change the existing behavior.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: ignore too long messages

There was no check for too long messages.  The check for free space always
passed when first_seq and next_seq were equal.  Enough free space was not
guaranteed, though.

log_store() might be called to store messages up to 64kB + 64kB + 16B.
This is sum of maximal text_len, dict_len values, and the size of the
structure printk_log.

On the other hand, the minimal size for the main log buffer currently is
4kB and it is enforced only by Kconfig.

The good news is that the usage looks safe right now.  log_store() is
called only from vprintk_emit() and cont_flush().  Here the "text" part is
always passed via a static buffer and the length is limited to
LOG_LINE_MAX which is 1024.  The "dict" part is NULL in most cases.  The
only exceptions is when vprintk_emit() is called from printk_emit() and
dev_vprintk_emit().  But printk_emit() is currently used only in
devkmsg_writev() and here "dict" is NULL as well.  In dev_vprintk_emit(),
"dict" is limited by the static buffer "hdr" of the size 128 bytes.  It
meas that the current maximal printed text is 1024B + 128B + 16B and it
always fit the log buffer.

But it is only matter of time when someone calls printk_emit() with unsafe
parameters, especially the "dict" one.

This patch adds a check for the free space when the buffer is empty.  It
reuses the already existing log_has_space() function but it has to add an
extra parameter.  It defines whether the buffer is empty.  Note that the
same values of "first_idx" and "next_idx" might also mean that the buffer
is full.

If the buffer is empty, we must respect the current position of the
indexes.  We cannot reset them to the beginning of the buffer.  Otherwise,
the functions reading the buffer would get crazy.

The question is what to do when the message is too long.  This patch uses
the easiest solution and just ignores the problematic message.  Let's do
something better in a followup patch.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

printk: split code for making free space in the log buffer

The check for free space in the log buffer always passes when "first_seq"
and "next_seq" are equal.  In theory, it might cause writing outside of
the log buffer.

Fortunately, the current usage looks safe because the used "text" and
"dict" buffers are quite limited.  See the second patch for more details.

Anyway, it is better to be on the safe side and add a check.  An easy
solution is done in the 2nd patch and it is improved in the 4th patch.

5th patch fixes the computation of the printed message length.

1st and 3rd patches just do some code refactoring to make the other
patches easier.

This patch (of 5):

There will be needed some fixes in the check for free space.  They will be
easier if the code is moved outside of the quite long log_store()
function.

This patch does not change the existing behavior.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/misc/ti-st/st_core.c: fix NULL dereference on protocol type check

If the type we receive is greater than ST_MAX_CHANNELS we can't rely on
type as vector index since we would be accessing unknown memory when we use the type
as index.

Unable to handle kernel NULL pointer dereference at virtual address 0000001b
pgd = c0004000
[0000001b] *pgd=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
Modules linked in: btwilink wl12xx wlcore mac80211 cfg80211 rfcomm bnep bluo
CPU: 0    Tainted: G        W     (3.4.0+ #15)
PC is at st_int_recv+0x278/0x344
LR is at get_parent_ip+0x14/0x30
pc : [<c03b01a8>]    lr : [<c007273c>]    psr: 200f0193
sp : dc631ed0  ip : e3e21c24  fp : dc631f04
r10: 00000000  r9 : 600f0113  r8 : 0000003f
r7 : e3e21b14  r6 : 00000067  r5 : e2e49c1c  r4 : e3e21a80
r3 : 00000001  r2 : 00000001  r1 : 00000001  r0 : 600f0113
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 9c50004a  DAC: 00000015

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/user.c: drop unused field 'files' from user_struct

Nobody seems uses it for a long time. Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/hung_task.c: convert simple_strtoul to kstrtouint

sysctl_hung_task_panic has been changed to unsigned int. use kstrtouint
instead of obsolete simple_strtoul

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/utsname_sysctl.c: replace obsolete __initcall by device_initcall

Also fixes checkpatch warnings on proc_dostring function parameters

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/reboot.c: convert simple_strtoul to kstrtoint

Replace obsolete function.
kstrtoint is used as reboot_cpu is an integer.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel-res_counterc-replace-simple_strtoull-by-kstrtoull-fix

don't overwrite kstrtoull()'s errno

Cc: Fabian Frederick <fabf@skynet.be>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/res_counter.c: replace simple_strtoull by kstrtoull

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/tracepoint.c: kernel-doc fixes

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/stop_machine.c: kernel-doc warning fix

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/latencytop.c: convert seq_printf to seq_puts

This patch also fixes one function declaration over 80 characters.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/exec_domain.c: code clean-up

Fix checkpatch warnings about EXPORT_SYMBOL and return()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/capability.c: code clean-up

-EXPORT_SYMBOL
-typo: unexpectidly->unexpectedly
-function prototype over 80 characters

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/backtracetest.c: replace no level printk by pr_info()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/cpu.c: convert printk to pr_foo()

no level printk converted to pr_warn (if err)
no level printk converted to pr_info (disabling non-boot cpus)
Other printk converted to respective level.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL

sys_sgetmask and sys_ssetmask are obsolete system calls no longer
supported in libc.

This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert
mode configuration.That option is enabled by default for those
architectures.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

do_shared_fault(): check that mmap_sem is held

mmap_sem() is required to protect the vma, which holds ->vm_file, which
pins fault_page->mapping.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/zbud.c: make size unsigned like unique callsite

zbud_alloc is only called by zswap_frontswap_store with unsigned int len.
Change function parameter + update >= 0 check.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

zram: correct offset usage in zram_bio_discard

We want to skip the physical block(PAGE_SIZE) which is partially covered
by the discard bio, so we check the remaining size and subtract it if
there is a need to goto the next physical block.

The current offset usage in zram_bio_discard is incorrect, it will cause
its upper filesystem breakdown. Consider the following scenario:

On some architecture or config, PAGE_SIZE is 64K for example, filesystem
is set up on zram disk without PAGE_SIZE aligned, a discard bio leads to a
offset = 4K and size=72K, normally, it should not really discard any
physical block as it partially cover two physical blocks. However, with
the current offset usage, it will discard the second physical block and
free its memory, which will cause filesystem breakdown.

This patch corrects the offset usage in zram_bio_discard.

Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm-page_alloc-debug_vm-checks-for-free_list-placement-of-cma-and-reserve-pages-fix-2

fix x86_64 allnoconfig build

In file included from include/linux/suspend.h:8,
from arch/x86/kernel/asm-offsets.c:12:
include/linux/mm.h: In function 'check_freepage_migratetype':
include/linux/mm.h:296: error: implicit declaration of function 'page_to_section'
In file included from include/linux/suspend.h:8,
from arch/x86/kernel/asm-offsets.c:12:
include/linux/mm.h: At top level:
include/linux/mm.h:892: error: conflicting types for 'page_to_section'
include/linux/mm.h:296: note: previous implicit declaration of 'page_to_section' was here

Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm-page_alloc-debug_vm-checks-for-free_list-placement-of-cma-and-reserve-pages-fix

Use VM_BUG_ON_PAGE instead of VM_BUG_ON as suggested by Sasha Levin.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/page_alloc: DEBUG_VM checks for free_list placement of CMA and RESERVE pages

For the MIGRATE_RESERVE pages, it is important they do not get misplaced
on free_list of other migratetype, otherwise the whole MIGRATE_RESERVE
pageblock might be changed to other migratetype in
try_to_steal_freepages().  For MIGRATE_CMA, the pages also must not go to
a different free_list, otherwise they could get allocated as unmovable and
result in CMA failure.

This is ensured by setting the freepage_migratetype appropriately when
placing pages on pcp lists, and using the information when releasing them
back to free_list.  It is also assumed that CMA and RESERVE pageblocks are
created only in the init phase.  This patch adds DEBUG_VM checks to catch
any regressions introduced for this invariant.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: non-atomically mark page accessed during page cache allocation where possible

aops->write_begin may allocate a new page and make it visible only to have
mark_page_accessed called almost immediately after.  Once the page is
visible the atomic operations are necessary which is noticable overhead
when writing to an in-memory filesystem like tmpfs but should also be
noticable with fast storage.  The objective of the patch is to initialse
the accessed information with non-atomic operations before the page is
visible.

The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page.  This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
called before the page is visible and can be done non-atomically.

The primary APIs of concern in this care are the following and are used
by most filesystems.

find_get_page
find_lock_page
find_or_create_page
grab_cache_page_nowait
grab_cache_page_write_begin

All of them are very similar in detail to the patch creates a core helper
pagecache_get_page() which takes a flags parameter that affects its
behavior such as whether the page should be marked accessed or not.  Then
old API is preserved but is basically a thin wrapper around this core
function.

Each of the filesystems are then updated to avoid calling
mark_page_accessed when it is known that the VM interfaces have already
done the job.  There is a slight snag in that the timing of the
mark_page_accessed() has now changed so in rare cases it's possible a page
gets to the end of the LRU as PageReferenced where as previously it might
have been repromoted.  This is expected to be rare but it's worth the
filesystem people thinking about it in case they see a problem with the
timing change.  It is also the case that some filesystems may be marking
pages accessed that previously did not but it makes sense that filesystems
have consistent behaviour in this regard.

The test case used to evaulate this is a simple dd of a large file done
multiple times with the file deleted on each iterations.  The size of the
file is 1/10th physical memory to avoid dirty page balancing.  In the
async case it will be possible that the workload completes without even
hitting the disk and will have variable results but highlight the impact
of mark_page_accessed for async IO.  The sync results are expected to be
more stable.  The exception is tmpfs where the normal case is for the "IO"
to not hit the disk.

The test machine was single socket and UMA to avoid any scheduling or NUMA
artifacts.  Throughput and wall times are presented for sync IO, only wall
times are shown for async as the granularity reported by dd and the
variability is unsuitable for comparison.  As async results were variable
do to writback timings, I'm only reporting the maximum figures.  The sync
results were stable enough to make the mean and stddev uninteresting.

The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.

async dd
                                    3.15.0-rc3            3.15.0-rc3
                                       vanilla           accessed-v2
ext3    Max      elapsed     13.9900 (  0.00%)     11.5900 ( 17.16%)
tmpfs Max      elapsed      0.5100 (  0.00%)      0.4900 (  3.92%)
btrfs   Max      elapsed     12.8100 (  0.00%)     12.7800 (  0.23%)
ext4 Max      elapsed     18.6000 (  0.00%)     13.3400 ( 28.28%)
xfs Max      elapsed     12.5600 (  0.00%)      2.0900 ( 83.36%)

The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck but the average figures looked reasonable.

        samples percentage
ext3       86107    0.9783  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext3       23833    0.2710  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext3        5036    0.0573  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
ext4       64566    0.8961  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext4        5322    0.0713  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext4        2869    0.0384  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs        62126    1.7675  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
xfs         1904    0.0554  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs          103    0.0030  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
btrfs      10655    0.1338  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
btrfs       2020    0.0273  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
btrfs        587    0.0079  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
tmpfs      59562    3.2628  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
tmpfs       1210    0.0696  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
tmpfs         94    0.0054  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs-buffer-do-not-use-unnecessary-atomic-operations-when-discarding-buffers-fix

move BUFFER_FLAGS_DISCARD into the .c file

Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs: buffer: do not use unnecessary atomic operations when discarding buffers

Discarding buffers uses a bunch of atomic operations when discarding
buffers because ...... I can't think of a reason. Use a cmpxchg loop to
clear all the necessary flags. In most (all?) cases this will be a single
atomic operations.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: do not use unnecessary atomic operations when adding pages to the LRU

When adding pages to the LRU we clear the active bit unconditionally.  As
the page could be reachable from other paths we cannot use unlocked
operations without risk of corruption such as a parallel
mark_page_accessed.  This patch tests if is necessary to clear the active
flag before using an atomic operation.  This potentially opens a tiny race
when PageActive is checked as mark_page_accessed could be called after
PageActive was checked.  The race already exists but this patch changes it
slightly.  The consequence is that that the page may be promoted to the
active list that might have been left on the inactive list before the
patch.  It's too tiny a race and too marginal a consequence to always use
atomic operations for.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: do not use atomic operations when releasing pages

There should be no references to it any more and a parallel mark should
not be reordered against us. Use non-locked varient to clear page active.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: shmem: avoid atomic operation during shmem_getpage_gfp

shmem_getpage_gfp uses an atomic operation to set the SwapBacked field
before it's even added to the LRU or visible. This is unnecessary as what
could it possible race against? Use an unlocked variant.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_alloc: convert hot/cold parameter and immediate callers to bool

cold is a bool, make it one. Make the likely case the "if" part of the
block instead of the else as according to the optimisation manual this is
preferred.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_alloc: use unsigned int for order in more places

X86 prefers the use of unsigned types for iterators and there is a
tendency to mix whether a signed or unsigned type if used for page order.
This converts a number of sites in mm/page_alloc.c to use unsigned int for
order where possible.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_alloc: lookup pageblock migratetype with IRQs enabled during free

get_pageblock_migratetype() is called during free with IRQs disabled.
This is unnecessary and disables IRQs for longer than necessary.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_alloc: reduce number of times page_to_pfn is called

In the free path we calculate page_to_pfn multiple times. Reduce that.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_alloc: use word-based accesses for get/set pageblock bitmaps

The test_bit operations in get/set pageblock flags are expensive. This
patch reads the bitmap on a word basis and use shifts and masks to isolate
the bits of interest. Similarly masks are used to set a local copy of the
bitmap and then use cmpxchg to update the bitmap if there have been no
other changes made in parallel.

In a test running dd onto tmpfs the overhead of the pageblock-related
functions went from 1.27% in profiles to 0.5%.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_alloc: take the ALLOC_NO_WATERMARK check out of the fast path

ALLOC_NO_WATERMARK is set in a few cases.  Always by kswapd, always for
__GFP_MEMALLOC, sometimes for swap-over-nfs, tasks etc.  Each of these
cases are relatively rare events but the ALLOC_NO_WATERMARK check is an
unlikely branch in the fast path.  This patch moves the check out of the
fast path and after it has been determined that the watermarks have not
been met.  This helps the common fast path at the cost of making the slow
path slower and hitting kswapd with a performance cost.  It's a reasonable
tradeoff.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_alloc: only check the alloc flags and gfp_mask for dirty once

Currently it's calculated once per zone in the zonelist.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>