We observed this problem has been occurring since 2.6.30 with
fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").
/*
* Generate 16 random bytes for userspace PRNG seeding.
*/
get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
The patch introduces a wrapper around get_random_int() which has lower
overhead than calling get_random_bytes() directly.
With this patch applied:
$ cat /proc/sys/kernel/random/entropy_avail
2731
$ cat /proc/sys/kernel/random/entropy_avail
2802
$ cat /proc/sys/kernel/random/entropy_avail
2878
Analyzed by John Sobecki.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <aedilger@gmail.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnn@arndb.de> Cc: John Sobecki <john.sobecki@oracle.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Paton J. Lewis [Thu, 7 Feb 2013 01:27:39 +0000 (12:27 +1100)]
epoll: support for disabling items, and a self-test app
It is not currently possible to reliably delete epoll items when using the
same epoll set from multiple threads. After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related to an
event for that epoll item (in response to epoll_wait). Therefore the
deleting thread does not know when it is safe to delete resources
pertaining to the associated epoll item because another thread might be
using those resources.
The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before
another thread is done handling an event returned by epoll_wait.
This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which disables
an epoll item. If epoll_ctl returns -EBUSY in this case, then another
thread may handling a return from epoll_wait for this item. Otherwise if
epoll_ctl returns 0, then it is safe to delete the epoll item. This
allows multiple threads to use a mutex to determine when it is safe to
delete an epoll item and its associated resources, which allows epoll
items to be deleted both efficiently and without error in a multi-threaded
environment. Note that EPOLL_CTL_DISABLE is only useful in conjunction
with EPOLLONESHOT, and using EPOLL_CTL_DISABLE on an epoll item without
EPOLLONESHOT returns -EINVAL.
This patch also adds a new test_epoll self-test program to both
demonstrate the need for this feature and test it.
Signed-off-by: Paton J. Lewis <palewis@adobe.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@redhat.com> Cc: Paul Holland <pholland@adobe.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bruce Allan [Thu, 7 Feb 2013 01:27:39 +0000 (12:27 +1100)]
checkpatch: fix USLEEP_RANGE test
Do not test udelay() for a value less than 10usec when passed a variable
instead of a hard-coded number; there is no way for checkpatch to know the
value of the variable. As it is today, it will complain about variables
with alphanumeric characters plus '_', e.g. foo_bar, but not variables
with other characters, eg. foo->bar.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
decompressors: make the default XZ_DEC_* config match the selected architecture
Change the defautl XZ_DEC_* config symbol to match the configured
architecture. It is perfectly legitimate to support multiple XZ BCJ
filters for different architectures (e.g.: to mount foreign squashfs/xz
compressed filesystems), it is however more natural not to select them all
by default, but only the one matching the configured architecture.
Signed-off-by: Florian Fainelli <florian@openwrt.org> Acked-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Namjae Jeon [Thu, 7 Feb 2013 01:27:36 +0000 (12:27 +1100)]
lib/parser.c: fix up comments for valid return values from match_number
match_number() has return values of -ENOMEM, -EINVAL and -ERANGE. So, for
all the functions calling match_number, the return value should include
these values. Fix up the comments to reflect the correct values.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drivers/leds/leds-ot200.c: fix error caused by shifted mask
During the development of this driver an in-house register documentation
was used. The last week some integration tests were done and this problem
was found. It turned out that the released register documentation is
wrong.
The fix is very simple: shift all masks by one.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Bryan Wu <cooloney@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
of_find_node_by_name() returns a node pointer with refcount incremented,
use of_node_put() on it when done.
of_find_node_by_name() will call of_node_put() against the node pass to
from parameter, thus we also need to call of_node_get(from) before calling
of_find_node_by_name().
Signed-off-by: Axel Lin <axel.lin@ingics.com> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is an initial commit of a backlight driver, using step-up DCDC power
supplies on AS3711 PMIC. Only one mode has actually been tested, several
further modes have been implemented "dry," but disabled to avoid accidental
hardware damage. Anyone wishing to use any of those modes will have to
modify the driver.
Tested on sh73a0-based kzm9g board. Only one mode has been tested and is
enabled. That mode copies the sample code from the manufacturer.
Deviations from that code proved to be fatal for the hardware...
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Acked-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Axel Lin [Thu, 7 Feb 2013 01:27:33 +0000 (12:27 +1100)]
drivers/video/backlight/l4f00242t03.c: convert to devm_regulator_get()
Signed-off-by: Axel Lin <axel.lin@ingics.com> Acked-by: Jingoo Han <jg1.han@samsung.com> Cc: Alberto Panizzo <maramaopercheseimorto@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:32 +0000 (12:27 +1100)]
backlight: corgi_lcd: use lcd_get_data instead of dev_get_drvdata
Use the wrapper function for getting the driver data using lcd_device
instead of using dev_get_drvdata with &ld->dev, so we can directly pass a
struct lcd_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:32 +0000 (12:27 +1100)]
backlight: omap1: use bl_get_data instead of dev_get_drvdata
Use the wrapper function for getting the driver data using
backlight_device instead of using dev_get_drvdata with &bd->dev, so we can
directly pass a struct backlight_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:31 +0000 (12:27 +1100)]
backlight: tosa: use bl_get_data instead of dev_get_drvdata
Use the wrapper function for getting the driver data using
backlight_device instead of using dev_get_drvdata with &bd->dev, so we can
directly pass a struct backlight_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:31 +0000 (12:27 +1100)]
backlight: corgi_lcd: use bl_get_data instead of dev_get_drvdata
Use the wrapper function for getting the driver data using
backlight_device instead of using dev_get_drvdata with &bd->dev, so we can
directly pass a struct backlight_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:31 +0000 (12:27 +1100)]
backlight: ams369fg06: use bl_get_data instead of dev_get_drvdata
Use the wrapper function for getting the driver data using
backlight_device instead of using dev_get_drvdata with &bd->dev, so we can
directly pass a struct backlight_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:30 +0000 (12:27 +1100)]
pwm_backlight: use bl_get_data instead of dev_get_drvdata
Use the wrapper function for getting the driver data using
backlight_device instead of using dev_get_drvdata with &bd->dev, so we can
directly pass a struct backlight_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Cc: Thierry Reding <thierry.reding@avionic-design.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:30 +0000 (12:27 +1100)]
backlight: aat2870: use bl_get_data instead of dev_get_drvdata
Use the wrapper function for getting the driver data using
backlight_device instead of using dev_get_drvdata with &bd->dev, so we can
directly pass a struct backlight_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:30 +0000 (12:27 +1100)]
backlight: lms501kf03: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:29 +0000 (12:27 +1100)]
backlight: corgi_lcd: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:29 +0000 (12:27 +1100)]
backlight: tosa: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:29 +0000 (12:27 +1100)]
backlight: vgg2432a4: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:28 +0000 (12:27 +1100)]
backlight: ams369fg06: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:28 +0000 (12:27 +1100)]
backlight: lms283gf05: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:28 +0000 (12:27 +1100)]
backlight: tdo24m: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:28 +0000 (12:27 +1100)]
backlight: ltv350qv: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:27 +0000 (12:27 +1100)]
backlight: s6e63m0: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:27 +0000 (12:27 +1100)]
backlight: ld9040: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 7 Feb 2013 01:27:27 +0000 (12:27 +1100)]
backlight: l4f00242t03: use spi_get_drvdata and spi_set_drvdata
Use the wrapper functions for getting and setting the driver data using
spi_device instead of using dev_{get|set}_drvdata with &spi->dev, so we
can directly pass a struct spi_device.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
WARNING: please write a paragraph that describes the config symbol fully
#45: FILE: drivers/video/backlight/Kconfig:380:
+config BACKLIGHT_LP8788
ERROR: code indent should use tabs where possible
#446: FILE: include/linux/mfd/lp8788.h:242:
+^I^I Only valid when bl_mode is LP8788_BL_COMB_PWM_BASED$
total: 1 errors, 1 warnings, 403 lines checked
NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
scripts/cleanfile
./patches/backlight-add-new-lp8788-backlight-driver.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: "Kim, Milo" <Milo.Kim@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kim, Milo [Thu, 7 Feb 2013 01:27:26 +0000 (12:27 +1100)]
backlight: add new lp8788 backlight driver
TI LP8788 PMU supports regulators, battery charger, RTC, ADC, backlight
dri= ver and current sinks. This patch enables LP8788 backlight module.
(Brightness mode)
The brightness is controlled by PWM input or I2C register.
All modes are supported in the driver.
(Platform data)
Configurable data can be defined in the platform side.
name : backlight driver name. (default: "lcd-backlight")
initial_brightness : initial value of backlight brightness
bl_mode : brightness control by PWM or lp8788 register
dim_mode : dimming mode selection
full_scale : full scale current setting
rise_time : brightness ramp up step time
fall_time : brightness ramp down step time
pwm_pol : PWM polarity setting when bl_mode is PWM based
period_ns : platform specific PWM period value. unit is nano.
The default values are set in case no platform data is defined.
Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Thierry Reding <thierry.reding@avionic-design.de> Cc: "devendra.aaru" <devendra.aaru@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
'before_power' was used to check the previous status when resume() is
called. However, FB_BLANK_POWERDOWN was used in suspend() all the time,
so there is no need to check the previous status. Also, redundant return
variables are removed to reduce the code.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove unnecessary NULL deference check, because it was already checked in
ams369fg06_probe(). Also, unnecessary parentheses are removed in
ams369fg06_power_is_on().
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
'before_power' was used to check the previous status when resume() is
called. However, FB_BLANK_POWERDOWN was used in suspend() all the time,
so there is no need to check the previous status. Also, redundant return
variables are removed to reduce the code.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Thu, 7 Feb 2013 01:27:20 +0000 (12:27 +1100)]
backlight-add-lms501kf03-lcd-driver-fix
remove unused variable `before_power'
Cc: "devendra.aaru" <devendra.aaru@gmail.com> Cc: Ilho Lee <Ilho215.lee@samsung.com> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cody P Schafer [Thu, 7 Feb 2013 01:27:19 +0000 (12:27 +1100)]
MAINTAINERS: mm: add additional include files to listing
Add gfp.h, mmzone.h, memory_hotplug.h & vmalloc.h to the "MEMORY
MANAGMENT" section so scripts/get_maintainer.pl can do a better job of
making recommendations.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
get_maintainer.pl: find maintainers for removed files
For removed files, get_maintainer.pl doesn't find any maintainers (besides
the default linux-kernel@vger.kernel.org), as it only looks at the "+++"
lines, which are "/dev/null" for removals. Fix this by extending the
parsing to the "---" lines.
E.g. for the two line test patch below the real score maintainers will now
be found:
lib/vsprintf.c: add %pa format specifier for phys_addr_t types
Add the %pa format specifier for printing a phys_addr_t type and its
derivative types (such as resource_size_t), since the physical address
size on some platforms can vary based on build options, regardless of the
native integer type.
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: Rob Landley <rob@landley.net> Cc: George Spelvin <linux@horizon.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fan Du [Thu, 7 Feb 2013 01:27:18 +0000 (12:27 +1100)]
include/linux/fs.h: disable preempt when acquire i_size_seqcount write lock
Two rt tasks bind to one CPU core.
The higher priority rt task A preempts a lower priority rt task B which
has already taken the write seq lock, and then the higher priority rt task
A try to acquire read seq lock, it's doomed to lockup.
rt task A with lower priority: call write
i_size_write rt task B with higher priority: call sync, and preempt task A
write_seqcount_begin(&inode->i_size_seqcount); i_size_read
inode->i_size = i_size; read_seqcount_begin <-- lockup here...
So disable preempt when acquiring every i_size_seqcount *write* lock will
cure the problem.
Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shaohua Li [Thu, 7 Feb 2013 01:27:18 +0000 (12:27 +1100)]
smp: make smp_call_function_many() use logic similar to smp_call_function_single()
I'm testing swapout workload in a two-socket Xeon machine. The workload
has 10 threads, each thread sequentially accesses separate memory region.
TLB flush overhead is very big in the workload. For each page, page
reclaim need move it from active lru list and then unmap it. Both need a
TLB flush. And this is a multthread workload, TLB flush happens in 10
CPUs. In X86, TLB flush uses generic smp_call)function. So this workload
stress smp_call_function_many heavily.
swapout throughput is around 1400M/s. So there is around a 7%
improvement, and total cpu utilization doesn't change.
Without the patch, cfd_data is shared by all CPUs.
generic_smp_call_function_interrupt does read/write cfd_data several times
which will create a lot of cache ping-pong. With the patch, the data
becomes per-cpu. The ping-pong is avoided. And from the perf data, this
doesn't make call_single_queue lock contend.
Next step is to remove generic_smp_call_function_interrupt() from arch
code.
Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Update the alpha arch_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kara [Thu, 7 Feb 2013 01:27:17 +0000 (12:27 +1100)]
ubifs: wait for page writeback to provide stable pages
When stable pages are required, we have to wait if the page is just going
to disk and we want to modify it. Add proper callback to
ubifs_vm_page_mkwrite().
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kara [Thu, 7 Feb 2013 01:27:16 +0000 (12:27 +1100)]
ocfs2: wait for page writeback to provide stable pages
When stable pages are required, we have to wait if the page is just going
to disk and we want to modify it. Add proper callback to
ocfs2_grab_pages_for_write().
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Darrick J. Wong [Thu, 7 Feb 2013 01:27:16 +0000 (12:27 +1100)]
block: optionally snapshot page contents to provide stable pages during write
This provides a band-aid to provide stable page writes on jbd without
needing to backport the fixed locking and page writeback bit handling
schemes of jbd2. The band-aid works by using bounce buffers to snapshot
page contents instead of waiting.
For those wondering about the ext3 bandage -- fixing the jbd locking
(which was done as part of ext4dev years ago) is a lot of surgery, and
setting PG_writeback on data pages when we actually hold the page lock
dropped ext3 performance by nearly an order of magnitude. If we're going
to migrate iscsi and raid to use stable page writes, the complaints about
high latency will likely return. We might as well centralize their page
snapshotting thing to one place.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Darrick J. Wong [Thu, 7 Feb 2013 01:27:16 +0000 (12:27 +1100)]
9pfs: fix filesystem to wait for stable page writeback
Fix up the ->page_mkwrite handler to provide stable page writes if necessary.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Darrick J. Wong [Thu, 7 Feb 2013 01:27:15 +0000 (12:27 +1100)]
mm: only enforce stable page writes if the backing device requires it
Create a helper function to check if a backing device requires stable page
writes and, if so, performs the necessary wait. Then, make it so that all
points in the memory manager that handle making pages writable use the
helper function. This should provide stable page write support to most
filesystems, while eliminating unnecessary waiting for devices that don't
require the feature.
Before this patchset, all filesystems would block, regardless of whether
or not it was necessary. ext3 would wait, but still generate occasional
checksum errors. The network filesystems were left to do their own thing,
so they'd wait too.
After this patchset, all the disk filesystems except ext3 and btrfs will
wait only if the hardware requires it. ext3 (if necessary) snapshots
pages instead of blocking, and btrfs provides its own bdi so the mm will
never wait. Network filesystems haven't been touched, so either they
provide their own stable page guarantees or they don't block at all. The
blocking behavior is back to what it was before 3.0 if you don't have a
disk requiring stable page writes.
Here's the result of using dbench to test latency on ext2:
Throughput 55.4496 MB/sec 4 clients 4 procs max_latency=298.650 ms
As you can see, the maximum write latency drops considerably with this
patch enabled. The other filesystems (ext3/ext4/xfs/btrfs) behave
similarly, but see the cover letter for those results.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Darrick J. Wong [Thu, 7 Feb 2013 01:27:15 +0000 (12:27 +1100)]
bdi: allow block devices to say that they require stable page writes
This patchset ("stable page writes, part 2") makes some key modifications
to the original 'stable page writes' patchset. First, it provides
creators (devices and filesystems) of a backing_dev_info a flag that
declares whether or not it is necessary to ensure that page contents
cannot change during writeout. It is no longer assumed that this is true
of all devices (which was never true anyway). Second, the flag is used to
relaxed the wait_on_page_writeback calls so that wait only occurs if the
device needs it. Third, it fixes up the remaining disk-backed filesystems
to use this improved conditional-wait logic to provide stable page writes
on those filesystems.
It is hoped that (for people not using checksumming devices, anyway) this
patchset will give back unnecessary performance decreases since the
original stable page write patchset went into 3.0. Sorry about not fixing
it sooner.
Complaints were registered by several people about the long write
latencies introduced by the original stable page write patchset.
Generally speaking, the kernel ought to allocate as little extra memory as
possible to facilitate writeout, but for people who simply cannot wait, a
second page stability strategy is (re)introduced: snapshotting page
contents. The waiting behavior is still the default strategy; to enable
page snapshotting, a superblock flag (MS_SNAP_STABLE) must be set. This
flag is used to bandaid^Henable stable page writeback on ext3[1], and is
not used anywhere else.
Given that there are already a few storage devices and network FSes that
have rolled their own page stability wait/page snapshot code, it would be
nice to move towards consolidating all of these. It seems possible that
iscsi and raid5 may wish to use the new stable page write support to
enable zero-copy writeout.
Thank you to Jan Kara for helping fix a couple more filesystems.
Per Andrew Morton's request, here are the result of using dbench to measure
latencies on ext2:
Throughput 55.4496 MB/sec 4 clients 4 procs max_latency=298.650 ms
As you can see, for ext2 the maximum write latency decreases from ~60ms on a
laptop hard disk to ~4ms. I'm not sure why the flush latencies increase,
though I suspect that being able to dirty pages faster gives the flusher more
work to do.
On ext4, the average write latency decreases as well as all the maximum
latencies:
Throughput 34.0795 MB/sec 4 clients 4 procs max_latency=219.044 ms
Before this patchset, all filesystems would block, regardless of whether
or not it was necessary. ext3 would wait, but still generate occasional
checksum errors. The network filesystems were left to do their own thing,
so they'd wait too.
After this patchset, all the disk filesystems except ext3 and btrfs will
wait only if the hardware requires it. ext3 (if necessary) snapshots
pages instead of blocking, and btrfs provides its own bdi so the mm will
never wait. Network filesystems haven't been touched, so either they
provide their own wait code, or they don't block at all. The blocking
behavior is back to what it was before 3.0 if you don't have a disk
requiring stable page writes.
This patchset has been tested on 3.8.0-rc3 on x64 with ext3, ext4, and xfs.
I've spot-checked 3.8.0-rc4 and seem to be getting the same results as -rc3.
[1] The alternative fixes to ext3 include fixing the locking order and page bit
handling like we did for ext4 (but then why not just use ext4?), or setting
PG_writeback so early that ext3 becomes extremely slow. I tried that, but the
number of write()s I could initiate dropped by nearly an order of magnitude.
That was a bit much even for the author of the stable page series! :)
This patch:
Creates a per-backing-device flag that tracks whether or not pages must be
held immutable during writeout. Eventually it will be used to waive
wait_for_page_writeback() if nothing requires stable pages.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Update the frv arch_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Glauber Costa [Thu, 7 Feb 2013 01:27:14 +0000 (12:27 +1100)]
memcg: debugging facility to access dangling memcgs
If memcg is tracking anything other than plain user memory (swap, tcp buf
mem, or slab memory), it is possible - and normal - that a reference will
be held by the group after it is dead. Still, for developers, it would be
extremely useful to be able to query about those states during debugging.
This patch provides a debugging facility in the root memcg, so we can
inspect which memcgs still have pending objects, and what is the cause of
this state.
Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Xi Wang [Thu, 7 Feb 2013 01:27:13 +0000 (12:27 +1100)]
mm/dmapool.c: fix null dev in dma_pool_create()
A few drivers invoke dma_pool_create() with a null dev. Note that dev is
dereferenced in dev_to_node(dev), causing a null pointer dereference.
A long term solution is to disallow null dev. Once the drivers are fixed,
we can simplify the core code here. For now we add WARN_ON(!dev) to
notify the driver maintainers and avoid the null pointer dereference.
Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Xi Wang [Thu, 7 Feb 2013 01:27:13 +0000 (12:27 +1100)]
drivers/usb/gadget/amd5536udc.c: avoid calling dma_pool_create() with NULL dev
Calling dma_pool_create() with dev==NULL will oops on a NUMA machine.
Rather than changing dma_pool_create() we wish to disallow passing
dev==NULL. This requires fixing up the small number of drivers which are
passing in dev==NULL.
Use &dev->pdev->dev instead of NULL.
Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michal Hocko [Thu, 7 Feb 2013 01:27:13 +0000 (12:27 +1100)]
drop_caches: add some documentation and info message
I would like to resurrect Dave's patch. The last time it was posted was
here https://lkml.org/lkml/2010/9/16/250 and there didn't seem to be any
strong opposition.
Kosaki was worried about possible excessive logging when somebody drops
caches too often (but then he claimed he didn't have a strong opinion on
that) but I would say opposite. If somebody does that then I would really
like to know that from the log when supporting a system because it almost
for sure means that there is something fishy going on. It is also worth
mentioning that only root can write drop caches so this is not an flooding
attack vector.
I am bringing that up again because this can be really helpful when
chasing strange performance issues which (surprise surprise) turn out to
be related to artificially dropped caches done because the admin thinks
this would help...
I have just refreshed the original patch on top of the current mm tree
but I could live with KERN_INFO as well if people think that KERN_NOTICE
is too hysterical.
: From: Dave Hansen <dave@linux.vnet.ibm.com>
: Date: Fri, 12 Oct 2012 14:30:54 +0200
:
: There is plenty of anecdotal evidence and a load of blog posts
: suggesting that using "drop_caches" periodically keeps your system
: running in "tip top shape". Perhaps adding some kernel
: documentation will increase the amount of accurate data on its use.
:
: If we are not shrinking caches effectively, then we have real bugs.
: Using drop_caches will simply mask the bugs and make them harder
: to find, but certainly does not fix them, nor is it an appropriate
: "workaround" to limit the size of the caches.
:
: It's a great debugging tool, and is really handy for doing things
: like repeatable benchmark runs. So, add a bit more documentation
: about it, and add a little KERN_NOTICE. It should help developers
: who are chasing down reclaim-related bugs.
[mhocko@suse.cz: refreshed to current -mm tree]
[akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 7 Feb 2013 01:27:12 +0000 (12:27 +1100)]
net: change type of virtio_chan->p9_max_pages
This member of struct virtio_chan is calculated from nr_free_buffer_pages
so change its type to unsigned long in case of overflow.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: David Miller <davem@davemloft.net> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 7 Feb 2013 01:27:12 +0000 (12:27 +1100)]
net: change type of netns_ipvs->sysctl_sync_qlen_max
This member of struct netns_ipvs is calculated from nr_free_buffer_pages
so change its type to unsigned long in case of overflow.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 7 Feb 2013 01:27:10 +0000 (12:27 +1100)]
mm: fix return type for functions nr_free_*_pages
Currently, the amount of RAM that functions nr_free_*_pages return
is held in unsigned int. But in machines with big memory (exceeding
16TB), the amount may be incorrect because of overflow, so fix it.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Cc: David Miller <davem@davemloft.net> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michal Hocko [Thu, 7 Feb 2013 01:27:10 +0000 (12:27 +1100)]
memcg: cleanup mem_cgroup_init comment
We should encourage all memcg controller initialization independent on a
specific mem_cgroup to be done here rather than exploit css_alloc callback
and assume that nothing happens before root cgroup is created.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <htejun@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michal Hocko [Thu, 7 Feb 2013 01:27:10 +0000 (12:27 +1100)]
memcg: move memcg_stock initialization to mem_cgroup_init
memcg_stock are currently initialized during the root cgroup allocation
which is OK but it pointlessly pollutes memcg allocation code with
something that can be called when the memcg subsystem is initialized by
mem_cgroup_init along with other controller specific parts.
This patch wraps the current memcg_stock initialization code into a helper
calls it from the controller subsystem initialization code.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <htejun@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michal Hocko [Thu, 7 Feb 2013 01:27:10 +0000 (12:27 +1100)]
memcg: move mem_cgroup_soft_limit_tree_init to mem_cgroup_init
Per-node-zone soft limit tree is currently initialized when the root
cgroup is created which is OK but it pointlessly pollutes memcg allocation
code with something that can be called when the memcg subsystem is
initialized by mem_cgroup_init along with other controller specific parts.
While we are at it let's make mem_cgroup_soft_limit_tree_init void because
it doesn't make much sense to report memory failure because if we fail to
allocate memory that early during the boot then we are screwed anyway
(this saves some code).
Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <htejun@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>