git.karo-electronics.de Git - karo-tx-linux.git/log

drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists

The right dmi version is in SMBIOS if it's zero in DMI region

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers-firmware-dmi_scanc-check-dmi-version-when-get-system-uuid-fix

tweak code comment

Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/firmware/dmi_scan.c: check dmi version when get system uuid

As of version 2.6 of the SMBIOS specification, the first 3
fields of the UUID are supposed to be little-endian encoded.

Also a minor fix to match variable meaning and mute checkpatch.pl

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scatterlist: atomic sg_mapping_iter() no longer needs disabled IRQs

SG mapping iterator w/ SG_MITER_ATOMIC set required IRQ disabled because
it originally used KM_BIO_SRC_IRQ to allow use from IRQ handlers.
kmap_atomic() has long been updated to handle stacking atomic mapping
requests on per-cpu basis and only requires not sleeping while mapped.

Update sg_mapping_iter such that atomic iterators only require disabling
preemption instead of disabling IRQ.

While at it, convert wte weird @ARG@ notations to @ARG in the comment of
sg_miter_start().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/plist.c: make plist test announcements KERN_DEBUG

They show up in dmesg

[ 4.041094] start plist test
[ 4.045804] end plist test

without a lot of meaning so hide them behind debug loglevel.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/vsprintf.c: improve standard conformance of sscanf()

Xen's pciback points out a couple of deficiencies with vsscanf()'s
standard conformance:

- Trailing character matching cannot be checked by the caller: With a
  format string of "(%x:%x.%x) %n" absence of the closing parenthesis
  cannot be checked, as input of "(00:00.0)" doesn't cause the %n to be
  evaluated (because of the code not skipping white space before the
  trailing %n).

- The parameter corresponding to a trailing %n could get filled even if
  there was a matching error: With a format string of "(%x:%x.%x)%n",
  input of "(00:00.0]" would still fill the respective variable pointed to
  (and hence again make the mismatch non-detectable by the caller).

This patch aims at fixing those, but leaves other non-conforming aspects
of it untouched, among them these possibly relevant ones:

- improper handling of the assignment suppression character '*' (blindly
  discarding all succeeding non-white space from the format and input
  strings),

- not honoring conversion specifiers for %n, - not recognizing the C99
  conversion specifier 't' (recognized by vsprintf()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib-spinlock_debug-avoid-livelock-in-do_raw_spin_lock-fix

tweak comment

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/spinlock_debug: avoid livelock in do_raw_spin_lock()

The logic in do_raw_spin_lock() attempts to acquire a spinlock by invoking
arch_spin_trylock() in a loop with a delay between each attempt.  Now
consider the following situation in a 2 CPU system:

1. CPU-0 continually acquires and releases a spinlock in a
   tight loop; it stays in this loop until some condition X
   is satisfied. X can only be satisfied by another CPU.

2. CPU-1 tries to acquire the same spinlock, in an attempt
   to satisfy the aforementioned condition X. However, it
   never sees the unlocked value of the lock because the
   debug spinlock code uses trylock instead of just lock;
   it checks at all the wrong moments - whenever CPU-0 has
   locked the lock.

Now in the absence of debug spinlocks, the architecture specific spinlock
code can correctly allow CPU-1 to wait in a "queue" (e.g., ticket
spinlocks), ensuring that it acquires the lock at some point.  However,
with the debug spinlock code, livelock can easily occur due to the use of
try_lock, which obviously cannot put the CPU in that "queue".  This
queueing mechanism is implemented in both x86 and ARM spinlock code.

Note that the situation mentioned above is not hypothetical.  A real
problem was encountered where CPU-0 was running hrtimer_cancel with
interrupts disabled, and CPU-1 was attempting to run the hrtimer that
CPU-0 was trying to cancel.

Address this by actually attempting arch_spin_lock once it is suspected
that there is a spinlock lockup.  If we're in a situation that is
described above, the arch_spin_lock should succeed; otherwise other
timeout mechanisms (e.g., watchdog) should alert the system of a lockup.
Therefore, if there is a genuine system problem and the spinlock can't be
acquired, the end result (irrespective of this change being present) is
the same.  If there is a livelock caused by the debug code, this change
will allow the lock to be acquired, depending on the implementation of the
lower level arch specific spinlock code.

Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

genalloc: make it possible to use a custom allocation algorithm

Premit use of another algorithm than the default first-fit one.  For
example a custom algorithm could be used to manage alignment requirements.

As I can't predict all the possible requirements/needs for all allocation
uses cases, I add a "free" field 'void *data' to pass any needed
information to the allocation function.  For example 'data' could be used
to handle a structure where you store the alignment, the expected memory
bank, the requester device, or any information that could influence the
allocation algorithm.

An usage example may look like this:
struct my_pool_constraints {
int align;
int bank;
...
};

unsigned long my_custom_algo(unsigned long *map, unsigned long size,
unsigned long start, unsigned int nr, void *data)
{
struct my_pool_constraints *constraints = data;
...
deal with allocation contraints
...
return the index in bitmap where perform the allocation
}

void create_my_pool()
{
struct my_pool_constraints c;
struct gen_pool *pool = gen_pool_create(...);
gen_pool_add(pool, ...);
gen_pool_set_algo(pool, my_custom_algo, &c);
}

Add of best-fit algorithm function:
most of the time best-fit is slower then first-fit but memory fragmentation
is lower. The random buffer allocation/free tests don't show any arithmetic
relation between the allocation time and fragmentation but the
best-fit algorithm
is sometime able to perform the allocation when the first-fit can't.

This new algorithm help to remove static allocations on ESRAM, a small but
fast on-chip RAM of few KB, used for high-performance uses cases like DMA
linked lists, graphic accelerators, encoders/decoders. On the Ux500
(in the ARM tree) we have define 5 ESRAM banks of 128 KB each and use of
static allocations becomes unmaintainable:
cd arch/arm/mach-ux500 && grep -r ESRAM .
./include/mach/db8500-regs.h:/* Base address and bank offsets for ESRAM */
./include/mach/db8500-regs.h:#define U8500_ESRAM_BASE   0x40000000
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK_SIZE      0x00020000
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK0  U8500_ESRAM_BASE
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK1       (U8500_ESRAM_BASE + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK2       (U8500_ESRAM_BANK1 + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK3       (U8500_ESRAM_BANK2 + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK4       (U8500_ESRAM_BANK3 + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_DMA_LCPA_OFFSET     0x10000
./include/mach/db8500-regs.h:#define U8500_DMA_LCPA_BASE
(U8500_ESRAM_BANK0 + U8500_ESRAM_DMA_LCPA_OFFSET)
./include/mach/db8500-regs.h:#define U8500_DMA_LCLA_BASE U8500_ESRAM_BANK4

I want to use genalloc to do dynamic allocations but I need to be able to
fine tune the allocation algorithm. I my case best-fit algorithm give
better results than first-fit, but it will not be true for every use case.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@stericsson.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/gcd.c: prevent possible div by 0

Account for all properties when a and/or b are 0:
gcd(0, 0) = 0
gcd(a, 0) = a
gcd(0, b) = b

Fixes no known problems in current kernels.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/Kconfig.debug: adjust hard-lockup related Kconfig options

The main option should not appear in the resulting .config when the
dependencies aren't met (i.e. use "depends on" rather than directly
setting the default from the combined dependency values).

The sub-options should depend on the main option rather than a more
generic higher level one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib-parserc-avoid-overflow-in-match_number-fix

coding-style tweaks

Cc: Alex Elder <elder@inktank.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/parser.c: avoid overflow in match_number()

The result of converting an integer value to another signed integer type
that's unable to represent the original value is implementation defined.
(See notes in section 6.3.1.3 of the C standard.)

In match_number(), the result of simple_strtol() (which returns type long)
is assigned to a value of type int.

Instead, handle the result of simple_strtol() in a well-defined way, and
return -ERANGE if the result won't fit in the int variable used to hold
the parsed result.

No current callers pay attention to the particular error value returned,
so this additional return code shouldn't do any harm.

Signed-off-by: Alex Elder <elder@inktank.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

idr-rename-max_level-to-max_idr_level-fix-3

Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

idr-rename-max_level-to-max_idr_level-fix-fix-2-fix

IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

idr-rename-max_level-to-max_idr_level-fix-fix-2

ho hum

Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: walter harms <wharms@bfs.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

idr-rename-max_level-to-max_idr_level-fix

repair fallout

Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: walter harms <wharms@bfs.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

idr: rename MAX_LEVEL to MAX_IDR_LEVEL

To avoid name conflicts:

drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined

While at it, also make the other names more consistent and add
parentheses.

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: walter harms <wharms@bfs.de>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kvm: replace test_and_set_bit_le() in mark_page_dirty_in_slot() with set_bit_le()

Now that we have defined generic set_bit_le() we do not need to use
test_and_set_bit_le() for atomically setting a bit.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

powerpc: bitops: introduce {clear,set}_bit_le()

Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is
being used for this missing function.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

bitops: introduce generic {clear,set}_bit_le()

Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is
being used for this missing function.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function

To introduce generic set_bit_le() later, we remove our own definition
and use a proper non-atomic bitops function: __set_bit_le().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/net/ethernet/sfc: use standard __{clear,set}_bit_le() functions

There are now standard functions for dealing with little-endian bit
arrays, so use them instead of our own implementations.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

pwm_backlight-add-device-tree-support-for-low-threshold-brightness-fix

Cc: "Philip, Avinash" <avinashphilip@ti.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Philip, Avinash <avinashphilip@ti.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

pwm_backlight: add device tree support for Low Threshold Brightness

Some backlights perform poorly when driven by a PWM with a short
duty-cycle. For such devices, the low threshold can be used to specify a
lower bound for the duty-cycle and should be chosen to exclude the
problematic range.

Add device tree probing support for
platform_pwm_backlight_data,lth_brightness, putting
low-threshold-brightness as the optional property's name.

Signed-off-by: Philip, Avinash <avinashphilip@ti.com>
Cc: Thierry Reding <thierry.reding@avionic-design.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

backlight-platform-lcd-add-support-for-device-tree-based-probe-fix

include of.h

Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/video/backlight/platform_lcd.c: add support for device tree based probe

This patch adds the of_match_table to platform-lcd driver to be
probed when platform-lcd device node is found in the device tree.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/video/backlight/da9052_bl.c: drop devm_kfree of devm_kzalloc'd data

devm_kfree should not have to be explicitly used.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,d;
@@

x = devm_kzalloc(...)
...
?-devm_kfree(d,x);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

backlight: remove ProGear driver

This driver was for the ProGear webpad device which was produced in
2000/2001 and is not available on a market. I no longer have this
hardware so can not even check how Linux works on it.

Signed-off-by: Marcin Juszkiewicz <marcin@juszkiewicz.com.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

backlight-add-new-lm3639-backlight-driver-fix-2

Put NEW_LEDS and LEDS_CLASS into Kconfig file to reduce additional
configuration works in LEDs.

Signed-off-by: G.Shark Jeong <gshark.jeong@gmail.com>
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

backlight-add-new-lm3639-backlight-driver-fix

code layout tweaks

Cc: "G.Shark Jeong" <gshark.jeong@gmail.com>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Cc: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

backlight: add new lm3639 backlight driver

This driver is a general version for LM3639 backlgiht + flash driver chip
of TI.

LM3639:
The LM3639 is a single chip LCD Display Backlight driver + white LED
Camera driver. Programming is done over an I2C compatible interface.
www.ti.com

Signed-off-by: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

backlight-add-backlight-driver-for-lm3630-chip-fix

- make bled_name[] static

- a few coding style tuneups

- create new set_intensity(), partly to avoid awkward layout gymnastics

Cc: "G.Shark Jeong" <gshark.jeong@gmail.com>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Cc: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

backlight: add Backlight driver for lm3630 chip

This driver is a general version for LM3630 backlgiht driver chip of TI.

LM3630 :
The LM3630 is a current mode boost converter which supplies the power
and controls the current in two strings of up to 10 LEDs per string.
Programming is done over an I2C compatible interface.
www.ti.com

Signed-off-by: G.Shark Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

backlight: lp855x: add FAST bit description for LP8556

LP8556 backlight driver supports fast refresh mode when exiting the low
power mode. This bit can be configurable in the platform side.

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Bryan Wu <bryan.wu@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/video/backlight/kb3886_bl.c: use usleep_range() instead of msleep() for small sleeps

Since msleep() might not sleep for the desired amount when less than 20ms,
use usleep_range().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Claudio Nieder <private@claudio.ch>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/video/backlight/ltv350qv.c: use usleep_range() instead of msleep() for small sleeps

Since msleep() might not sleep for the desired amount when less than 20ms,
use usleep_range().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/video/backlight/da9052_bl.c: use usleep_range() instead of msleep() for small sleeps

Since msleep() might not sleep for the desired amount when less than 20ms,
use usleep_range().

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Ashish Jangam <ashish.jangam@kpitcummins.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: fix indentation for Viresh Kumar

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: Update gianfar_ptp after renaming

commit ec21e2ec36769 ("freescale: Move the Freescale drivers")
moved the files, update the pattern.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: add defconfig file to IMX section

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: update gpio subsystem file list

Signed-off-by: Yang Bai <hamo.by@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/vsprintf: update documentation to cover all of %p[Mm][FR]

Acked-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: vsprintf: fix broken comments

Numbering the 8 potential digits 2 though 9 never did make a lot of sense.

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib-vsprintf-optimize-put_dec_trunc8-fix

Rabin Vincent <rabin@rab.in> wrote:
> This patch breaks IP address printing with "%pI4" (and by extension,
> nfsroot).  Example:
>
>  - Before: 10.0.0.1
>  - After: 10...1

Mea culpa, and thank you for catching it!  As I said in my earlier
comment, I tested this most extensively wrapped by some sprintf code
that liked 0 converted to a 0-length string, as that works naturally
with the ANSI spec for %.0u.  And it turns out not to matter for the
usual printf code, as num_to_str special-cases that anyway.

The fix is straightforward:

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Rabin Vincent <rabin@rab.in>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: vsprintf: optimize put_dec_trunc8()

If you're going to have a conditional branch after each 32x32->64-bit
multiply, might as well shrink the code and make it a loop.

This also avoids using the long multiply for small integers.

(This leaves the comments in a confusing state, but that's a separate
patch to make review easier.)

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: vsprintf: optimize division by 10000

The same multiply-by-inverse technique can be used to convert division by
10000 to a 32x32->64-bit multiply.

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: vsprintf: optimize division by 10 for small integers

Shrink the reciprocal approximations used in put_dec_full4() based on the
comments in put_dec_full9().

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

poweroff: fix bug in orderly_poweroff()

orderly_poweroff is trying to poweroff platform in two steps:

step 1: Call user space application to poweroff
step 2: If user space poweroff fail, then do a force power off if force param
is set.

The bug here is, step 1 is always successful with param UMH_NO_WAIT, which obey
the design goal of orderly_poweroff.

We have two choices here:
UMH_WAIT_EXEC which means wait for the exec, but not the process;
UMH_WAIT_PROC which means wait for the process to complete.
we need to trade off the two choices:

If using UMH_WAIT_EXEC, there is potential issue comments by Serge E.
Hallyn: The exec will have started, but may for whatever (very unlikely)
reason fail.

If using UMH_WAIT_PROC, there is potential issue comments by Eric W.
Biederman: If the caller is not running in a kernel thread then we can
easily get into a case where the user space caller will block waiting for
us when we are waiting for the user space caller.

Thanks for their excellent ideas, based on the above discussion, we
finally choose UMH_WAIT_EXEC, which is much more safe, if the user
application really fails, we just complain the application itself, it
seems a better choice here.

Signed-off-by: Feng Hong <hongfeng@marvell.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()

As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
have kernel_restart() call disable_nonboot_cpus().  Doing so can help
machines that require boot cpu be the last alive cpu during reboot to
survive with kernel restart.

This fixes one reboot issue seen on imx6q (Cortex-A9 Quad).  The machine
requires that the restart routine be run on the primary cpu rather than
secondary ones.  Otherwise, the secondary core running the restart routine
will fail to come to online after reboot.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

tile: fix personality bits handling upon exec()

Historically, the top three bytes of personality have been used for things
such as ADDR_NO_RANDOMIZE, which made sense only for specific
architectures.

We now however have a flag there that is general no matter the
architecture (UNAME26); generally we have to be careful to preserve the
personality flags across exec().

This patch fixes tile architecture not to forcefully overwrite personality
flags during exec().

In addition to that, we fix two other things along the way:

- exec_domain switching is fixed -- set_personality() should always
be used instead of directly assigning to current->personality.
- as pointed out by Arnd Bergmann, PER_LINUX_32BIT is not used anywhere
by tile, so let's just drop that in favor of PER_LINUX

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

cross-arch: don't corrupt personality flags upon exec()

Historically, the top three bytes of personality have been used for things
such as ADDR_NO_RANDOMIZE, which made sense only for specific
architectures.

We now however have a flag there that is general no matter the
architecture (UNAME26); generally we have to be careful to preserve the
personality flags across exec().

This patch tries to fix all architectures that forcefully overwrite
personality flags during exec() (ppc32 and s390 have been fixed recently
by commits f9783ec86 and 59e4c3a2f in a similar way already).

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

score: select generic atomic64_t support

It's required for the core fs/namespace.c and many other basic features.

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

frv: kill used but uninitialized variable

Commit 6afe1a1fe8ff83f6a ("PM: Remove legacy PM") removed the
initialization of retval, causing:

arch/frv/kernel/pm.c: In function 'sysctl_pm_do_suspend':
arch/frv/kernel/pm.c:165:5: warning: 'retval' may be used uninitialized in this function [-Wuninitialized]

Remove the variable completely to fix this, and convert to a proper
switch (...) { ... } construct to improve readability.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sectons-fix-const-sections-for-crc32-table-checkpatch-fixes

ERROR: code indent should use tabs where possible
#54: FILE: lib/gen_crc32table.c:112:
+        ^Iprintf("static u32 __cacheline_aligned "$

WARNING: please, no space before tabs
#54: FILE: lib/gen_crc32table.c:112:
+        ^Iprintf("static u32 __cacheline_aligned "$

WARNING: please, no spaces at the start of a line
#54: FILE: lib/gen_crc32table.c:112:
+        ^Iprintf("static u32 __cacheline_aligned "$

ERROR: code indent should use tabs where possible
#63: FILE: lib/gen_crc32table.c:122:
+        ^Iprintf("static u32 __cacheline_aligned "$

WARNING: please, no space before tabs
#63: FILE: lib/gen_crc32table.c:122:
+        ^Iprintf("static u32 __cacheline_aligned "$

WARNING: please, no spaces at the start of a line
#63: FILE: lib/gen_crc32table.c:122:
+        ^Iprintf("static u32 __cacheline_aligned "$

ERROR: code indent should use tabs where possible
#72: FILE: lib/gen_crc32table.c:131:
+        ^Iprintf("static u32 __cacheline_aligned "$

WARNING: please, no space before tabs
#72: FILE: lib/gen_crc32table.c:131:
+        ^Iprintf("static u32 __cacheline_aligned "$

WARNING: please, no spaces at the start of a line
#72: FILE: lib/gen_crc32table.c:131:
+        ^Iprintf("static u32 __cacheline_aligned "$

total: 3 errors, 6 warnings, 48 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/sectons-fix-const-sections-for-crc32-table.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix const sections for crc32 table

Fix the const sections for the code generated by crc32 table. There's no
ro version of the cacheline aligned section, so we cannot put in const
data without a conflict Just don't make the crc tables const for now.

[ak@linux.intel.com: some fixes and new description]
Signed-off-by: Joe Mario <jmario@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in sound

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in net

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in net/can

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in mm/percpu.c

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/video

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/scsi

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/platform/x86

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/net/wan

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/net/hamradio

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/net

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/mmc

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/mfd

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections-fix-section-conflicts-in-drivers-macintosh-checkpatch-fixes

ERROR: space prohibited before open square bracket '['
#20: FILE: drivers/macintosh/macio_asic.c:751:
+static const struct pci_device_id __devinitconst pci_ids [] = { {

total: 1 errors, 0 warnings, 8 lines checked

./patches/sections-fix-section-conflicts-in-drivers-macintosh.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/macintosh

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/ide

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/char

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in drivers/atm

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in arch/x86

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections-fix-section-conflicts-in-arch-sh-fix

Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in arch/sh

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in arch/score

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in arch/powerpc

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in arch/mips

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in arch/ia64

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections-fix-section-conflicts-in-arch-h8300-checkpatch-fixes

ERROR: spaces required around that '=' (ctx:VxV)
#105: FILE: arch/h8300/platform/h8s/irq.c:21:
+const int __initconst h8300_saved_vectors[]={
^

total: 1 errors, 0 warnings, 71 lines checked

./patches/sections-fix-section-conflicts-in-arch-h8300.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in arch/h8300

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in arch/frv

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: fix section conflicts in arch/arm/

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sections: disable const sections for PA-RISC v2

The PA-RISC tool chain seems to have some problem with correct read/write
attributes on sections. This causes problems when the const sections
are fixed up for other architecture to only contain truly read-only
data.

Disable const sections for PA-RISC

This can cause a bit of noise with modpost.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

cma: decrease cc.nr_migratepages after reclaiming pagelist

reclaim_clean_pages_from_list() reclaims clean pages before migration so
cc.nr_migratepages should be updated. Currently, there is no problem but
it can be wrong if we try to use the value in future.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

CMA: migrate mlocked pages

Presently CMA cannot migrate mlocked pages so it ends up failing to allocate
contiguous memory space.

This patch makes mlocked pages be migrated out. Of course, it can affect
realtime processes but in CMA usecase, contiguous memory allocation failing
is far worse than access latency to an mlocked page being variable while
CMA is running. If someone wants to make the system realtime, he shouldn't
enable CMA because stalls can still happen at random times.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kpageflags: fix wrong KPF_THP on non-huge compound pages

KPF_THP can be set on non-huge compound pages (like slab pages or pages
allocated by drivers with __GFP_COMP) because PageTransCompound only
checks PG_head and PG_tail. Obviously this is a bug and breaks user space
applications which look for thp via /proc/kpageflags.

This patch rules out setting KPF_THP wrongly by additionally checking
PageLRU on the head pages.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs/fs-writeback.c: remove unneccesary parameter of __writeback_single_inode()

The parameter 'wb' is never used in this function.

Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/memory.c: fix typo in comment

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

KSM: numa awareness sysfs knob

Introduces new sysfs boolean knob /sys/kernel/mm/ksm/merge_across_nodes
which control merging pages across different numa nodes.  When it is set
to zero only pages from the same node are merged, otherwise pages from all
nodes can be merged together (default behavior).

Typical use-case could be a lot of KVM guests on NUMA machine and cpus
from more distant nodes would have significant increase of access latency
to the merged ksm page.  Sysfs knob was choosen for higher variability
when some users still prefers higher amount of saved physical memory
regardless of access latency.

Every numa node has its own stable & unstable trees because of faster
searching and inserting.  Changing of merge_nodes value is possible only
when there are not any ksm shared pages in system.

I've tested this patch on numa machines with 2, 4 and 8 nodes and measured
speed of memory access inside of KVM guests with memory pinned to one of
nodes with this benchmark:

http://pholasek.fedorapeople.org/alloc_pg.c

Population standard deviations of access times in percentage of average
were following:

merge_nodes=1
2 nodes 1.4%
4 nodes 1.6%
8 nodes 1.7%

merge_nodes=0
2 nodes 1%
4 nodes 0.32%
8 nodes 0.018%

RFC: https://lkml.org/lkml/2011/11/30/91
v1: https://lkml.org/lkml/2012/1/23/46
v2: https://lkml.org/lkml/2012/6/29/105
v3: https://lkml.org/lkml/2012/9/14/550

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Anton Arapov <anton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: remove unevictable_pgs_mlockfreed

Simply remove UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed line
from /proc/vmstat: Johannes and Mel point out that it was very unlikely to
have been used by any tool, and of course we can restore it easily enough
if that turns out to be wrong.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

memory-hotplug: fix zone stat mismatch

During memory-hotplug, I found NR_ISOLATED_[ANON|FILE] are increasing,
causing the kernel to hang. When the system doesn't have enough free
pages, it enters reclaim but never reclaim any pages due to
too_many_isolated()==true and loops forever.

The cause is that when we do memory-hotadd after memory-remove,
__zone_pcp_update() clears a zone's ZONE_STAT_ITEMS in setup_pageset()
although the vm_stat_diff of all CPUs still have values.

In addtion, when we offline all pages of the zone, we reset them in
zone_pcp_reset without draining so we loss some zone stat item.

Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: enhance comment and bug check

This patch updates comment and bug check.
It can be fold into [1].

[1] mm-revert-0def08e3-mm-mempolicyc-check-return-code-of-check_range.patch

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vasiliy Kulikov <segooon@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Revert 0def08e3 because check_range can't fail in migrate_to_node with
considering current usecases.

Quote from Johannes

: I think it makes sense to revert.  Not because of the semantics, but I
: just don't see how check_range() could even fail for this callsite:
:
: 1. we pass mm->mmap->vm_start in there, so we should not fail due to
:    find_vma()
:
: 2. we pass MPOL_MF_DISCONTIG_OK, so the discontig checks do not apply
:    and so can not fail
:
: 3. we pass MPOL_MF_MOVE | MPOL_MF_MOVE_ALL, the page table loops will
:    continue until addr == end, so we never fail with -EIO

And I added a new VM_BUG_ON for checking migrate_to_node's future usecase
which might pass to MPOL_MF_STRICT.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vasiliy Kulikov <segooon@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: call invalidate_range_end in do_wp_page even for zero pages

The previous patch "mm: wrap calls to set_pte_at_notify with
invalidate_range_start and invalidate_range_end" only called the
invalidate_range_end mmu notifier function in do_wp_page when the new_page
variable wasn't NULL.  This was done in order to only call
invalidate_range_end after invalidate_range_start was called.
Unfortunately, there are situations where new_page is NULL and
invalidate_range_start is called.  This caused invalidate_range_start to
be called without a matching invalidate_range_end, causing kvm to loop
indefinitely on the first page fault.

This patch adds a flag variable to do_wp_page that marks whether the
invalidate_range_start notifier was called.  invalidate_range_end is then
called if the flag is true.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Cc: Andrea Arcangeli <andrea@qumranet.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Liran Liss <liranl@mellanox.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end

In order to allow sleeping during invalidate_page mmu notifier calls, we
need to avoid calling when holding the PT lock. In addition to its direct
calls, invalidate_page can also be called as a substitute for a change_pte
call, in case the notifier client hasn't implemented change_pte.

This patch drops the invalidate_page call from change_pte, and instead
wraps all calls to change_pte with invalidate_range_start and
invalidate_range_end calls.

Note that change_pte still cannot sleep after this patch, and that clients
implementing change_pte should not take action on it in case the number of
outstanding invalidate_range_start calls is larger than one, otherwise
they might miss a later invalidation.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Cc: Andrea Arcangeli <andrea@qumranet.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Liran Liss <liranl@mellanox.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: move all mmu notifier invocations to be done outside the PT lock

In order to allow sleeping during mmu notifier calls, we need to avoid
invoking them under the page table spinlock.  This patch solves the
problem by calling invalidate_page notification after releasing the lock
(but before freeing the page itself), or by wrapping the page invalidation
with calls to invalidate_range_begin and invalidate_range_end.

To prevent accidental changes to the invalidate_range_end arguments after
the call to invalidate_range_begin, the patch introduces a convention of
saving the arguments in consistently named locals:

unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */

...

mmun_start = ...
mmun_end = ...
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);

...

mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

The patch changes code to use this convention for all calls to
mmu_notifier_invalidate_range_start/end, except those where the calls are
close enough so that anyone who glances at the code can see the values
aren't changing.

This patchset is a preliminary step towards on-demand paging design to be
added to the RDMA stack.

Why do we want on-demand paging for Infiniband?

  Applications register memory with an RDMA adapter using system calls,
  and subsequently post IO operations that refer to the corresponding
  virtual addresses directly to HW.  Until now, this was achieved by
  pinning the memory during the registration calls.  The goal of on demand
  paging is to avoid pinning the pages of registered memory regions (MRs).
   This will allow users the same flexibility they get when swapping any
  other part of their processes address spaces.  Instead of requiring the
  entire MR to fit in physical memory, we can allow the MR to be larger,
  and only fit the current working set in physical memory.

Why should anyone care?  What problems are users currently experiencing?

  This can make programming with RDMA much simpler.  Today, developers
  that are working with more data than their RAM can hold need either to
  deregister and reregister memory regions throughout their process's
  life, or keep a single memory region and copy the data to it.  On demand
  paging will allow these developers to register a single MR at the
  beginning of their process's life, and let the operating system manage
  which pages needs to be fetched at a given time.  In the future, we
  might be able to provide a single memory access key for each process
  that would provide the entire process's address as one large memory
  region, and the developers wouldn't need to register memory regions at
  all.

Is there any prospect that any other subsystems will utilise these
infrastructural changes?  If so, which and how, etc?

  As for other subsystems, I understand that XPMEM wanted to sleep in
  MMU notifiers, as Christoph Lameter wrote at
  http://lkml.indiana.edu/hypermail/linux/kernel/0802.1/0460.html and
  perhaps Andrea knows about other use cases.

  Scheduling in mmu notifications is required since we need to sync the
  hardware with the secondary page tables change.  A TLB flush of an IO
  device is inherently slower than a CPU TLB flush, so our design works by
  sending the invalidation request to the device, and waiting for an
  interrupt before exiting the mmu notifier handler.

Avi said:

  kvm may be a buyer.  kvm::mmu_lock, which serializes guest page
  faults, also protects long operations such as destroying large ranges.
  It would be good to convert it into a spinlock, but as it is used inside
  mmu notifiers, this cannot be done.

  (there are alternatives, such as keeping the spinlock and using a
  generation counter to do the teardown in O(1), which is what the "may"
  is doing up there).

[akpm@linux-foundation.orgpossible speed tweak in hugetlb_cow(), cleanups]
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Liran Liss <liranl@mellanox.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

hugetlb: do not use vma_hugecache_offset() for vma_prio_tree_foreach

0c176d5 ("mm: hugetlb: fix pgoff computation when unmapping page from
vma") fixed pgoff calculation but it has replaced it by
vma_hugecache_offset() which is not approapriate for offsets used for
vma_prio_tree_foreach() because that one expects index in page units
rather than in huge_page_shift.

Johannes said:

: The resulting index may not be too big, but it can be too small: assume
: hpage size of 2M and the address to unmap to be 0x200000.  This is regular
: page index 512 and hpage index 1.  If you have a VMA that maps the file
: only starting at the second huge page, that VMAs vm_pgoff will be 512 but
: you ask for offset 1 and miss it even though it does map the page of
: interest.  hugetlb_cow() will try to unmap, miss the vma, and retry the
: cow until the allocation succeeds or the skipped vma(s) go away.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Hillf Danton <dhillf@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP

In many places !pmd_present has been converted to pmd_none. For pmds
that's equivalent and pmd_none is quicker so using pmd_none is better.

However (unless we delete pmd_present) we should provide an accurate
pmd_present too. This will avoid the risk of code thinking the pmd is non
present because it's under __split_huge_page_map, see the pmd_mknotpresent
there and the comment above it.

If the page has been mprotected as PROT_NONE, it would also lead to a
pmd_present false negative in the same way as the race with
split_huge_page.

Because the PSE bit stays on at all times (both during split_huge_page and
when the _PAGE_PROTNONE bit get set), we could only check for the PSE bit,
but checking the PROTNONE bit too is still good to remember pmd_present
must always keep PROT_NONE into account.

This explains a not reproducible BUG_ON that was seldom reported on the
lists.

The same issue is in pmd_large, it would go wrong with both PROT_NONE and
if it races with split_huge_page.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>