Sachin Kamat [Thu, 27 Jun 2013 23:53:01 +0000 (09:53 +1000)]
drivers/rtc/rtc-ds1511.c: fix issues related to spaces and braces
Fixes the following types of issues:
WARNING: please, no spaces at the start of a line
WARNING: braces {} are not necessary for single statement blocks
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sachin Kamat [Thu, 27 Jun 2013 23:52:59 +0000 (09:52 +1000)]
drivers/rtc/rtc-at91rm9200.c: include <linux/uaccess.h>
Silences the following checkpatch warning:
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: Andrew Victor <linux@maxim.org.za> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sachin Kamat [Thu, 27 Jun 2013 23:52:58 +0000 (09:52 +1000)]
drivers/rtc/interface.c: fix checkpatch errors
Fixes the following types of errors:
ERROR: "foo* bar" should be "foo *bar"
ERROR: else should follow close brace '}'
WARNING: braces {} are not necessary for single statement blocks
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The driver core clears the driver data to NULL after device_release or on
probe failure, since commit 0998d063100 ("device-core: Ensure drvdata =
NULL when no driver is bound"). Thus, there is no need to manually clear
the device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miklos Szeredi [Thu, 27 Jun 2013 23:52:57 +0000 (09:52 +1000)]
autofs4: translate pids to the right namespace for the daemon
The PID and the TGID of the process triggering the mount are sent to the
daemon. Currently the global pid values are sent (ones valid in the
initial pid namespace) but this is wrong if the autofs daemon itself is
not running in the initial pid namespace.
So send the pid values that are valid in the namespace of the autofs daemon.
The namespace to use is taken from the oz_pgrp pid pointer, which was set
at mount time to the mounting process' pid namespace.
If the pid translation fails (the triggering process is in an unrelated
pid namespace) then the automount fails with ENOENT.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ian Kent <raven@themaw.net> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
autofs4: allow autofs to work outside the initial PID namespace
Enable autofs4 to work in a "container". oz_pgrp is converted from pid_t
to struct pid and this is stored at mount time based on the "pgrp=" option
or if the option is missing then the current pgrp.
The "pgrp=" option is interpreted in the PID namespace of the current
process. This option is flawed in that it doesn't carry the namespace
information, so it should be deprecated. AFAICS the autofs daemon always
sends the current pgrp, which is the default anyway.
The oz_pgrp is also set from the AUTOFS_DEV_IOCTL_SETPIPEFD_CMD ioctl.
This ioctl sets oz_pgrp to the current pgrp. It is not allowed to change
the pid namespace.
oz_pgrp is used mainly to determine whether the process traversing the
autofs mount tree is the autofs daemon itself or not. This function now
compares the pid pointers instead of the pid_t values.
One other use of oz_pgrp is in autofs4_show_options. There it shows the
virtual pid number (i.e. the one that is valid inside the PID namespace
of the calling process).
For debugging printk, convert oz_pgrp to the value in the initial pid
namespace.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Ian Kent <raven@themaw.net> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mathias Krause [Thu, 27 Jun 2013 23:52:57 +0000 (09:52 +1000)]
kprobes: handle empty/invalid input to debugfs "enabled" file
When writing invalid input to 'debug/kprobes/enabled' it'll silently be
ignored. Even worse, when writing an empty string to this file, the
outcome is purely random as the switch statement will make its decision
based on the value of an uninitialized stack variable.
Fix this by handling invalid/empty input as an error, returning -EINVAL.
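As an illustration only, the kind of input handling described above can be sketched in userspace C. The function name parse_enabled() and the accepted characters are assumptions made for this sketch, not the kernel's actual handler; the point is that both an empty write and an unrecognised character yield -EINVAL instead of falling through on an uninitialized value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace analogue of a debugfs "enabled" write handler.
 * Returns 1 to enable, 0 to disable, or -EINVAL for empty or
 * unrecognised input.
 */
int parse_enabled(const char *buf, size_t count)
{
	assert(buf != NULL);

	if (count == 0)
		return -EINVAL;	/* empty write: nothing to decide on */

	switch (buf[0]) {
	case 'y':
	case 'Y':
	case '1':
		return 1;
	case 'n':
	case 'N':
	case '0':
		return 0;
	default:
		return -EINVAL;	/* invalid input is reported, not ignored */
	}
}
```

Checking the length before looking at buf[0] is what removes the dependence on stack garbage in the empty-string case.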
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Steven Rostedt [Thu, 27 Jun 2013 23:52:56 +0000 (09:52 +1000)]
init: remove permanent string buffer from do_one_initcall()
do_one_initcall() uses a 64 byte string buffer to save a message. This
buffer is declared static and is only used at boot up and when a module
is loaded. As 64 bytes is very small, and this function has very limited
scope, there's no reason to waste permanent memory with this string and
not just simply put it on the stack.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We observed this problem has been occurring since 2.6.30 with
fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").
/*
* Generate 16 random bytes for userspace PRNG seeding.
*/
get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
The patch introduces a wrapper around get_random_int() which has lower
overhead than calling get_random_bytes() directly.
With this patch applied:
$ cat /proc/sys/kernel/random/entropy_avail
2731
$ cat /proc/sys/kernel/random/entropy_avail
2802
$ cat /proc/sys/kernel/random/entropy_avail
2878
Analyzed by John Sobecki.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <aedilger@gmail.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnn@arndb.de> Cc: John Sobecki <john.sobecki@oracle.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jacob Keller [Thu, 27 Jun 2013 23:52:55 +0000 (09:52 +1000)]
checkpatch: allow longer logging function names
The current $logFunction regular expression allows names like dev_warn,
e_dbg, netdev_info, etc, but some log functions are now written like
e_dev_warn, so allow 1 or 2 word blocks with an underscore before the
logging level.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Thu, 27 Jun 2013 23:52:55 +0000 (09:52 +1000)]
checkpatch: ignore existing CamelCase uses from include/...
When using --strict, CamelCase uses are described with CHECK: messages.
These CamelCase uses may be acceptable and should not generate these
messages when the variable is already defined in a file from the
include/... path.
So, change checkpatch to read all the .h files in include/... and look
for preexisting CamelCase #defines, typedefs and function prototypes.
Add these to the existing camelcase hash so that any uses in the patch or
file can be ignored.
There are currently ~3500 files in include/. It takes about 10 cpu
seconds on my little netbook to grep for and preseed these existing uses.
That's about 4x the time for a similar git grep.
This preseeding is only done once when using --strict and only when there
is a CamelCase use found.
If a .git directory is found, it uses 'git ls-files include'. If not, it
uses 'find $root/include -name "*.h"'.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Thu, 27 Jun 2013 23:52:55 +0000 (09:52 +1000)]
checkpatch: ignore SI unit CamelCase variants like "_uV"
Many existing variable names use SI-like variants that should be otherwise
obvious and acceptable.
Whitelist them from the CamelCase message.
Signed-off-by: Joe Perches <joe@perches.com> Suggested-by: Phil Carmody <phil.carmody@partner.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Thu, 27 Jun 2013 23:52:54 +0000 (09:52 +1000)]
checkpatch: create an EXPERIMENTAL --fix option to correct patches
Some patches have simple defects in whitespace and formatting that
checkpatch could correct automatically. Attempt to do so.
Add a --fix option to create a "<inputfile>.EXPERIMENTAL-checkpatch-fixes"
file that tries to use normal kernel style for some of these formatting
errors.
Add warnings against using this file without verifying the changes.
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Thu, 27 Jun 2013 23:52:50 +0000 (09:52 +1000)]
checkpatch: warn when using gcc's binary constant ("0b") extension
The gcc extension for binary constants that start with 0b is only
supported with gcc version 4.3 or higher.
The kernel can still be compiled with earlier versions of gcc, so have
checkpatch emit a warning for these constants.
Restructure checkpatch's constant finding code a bit to support finding
these binary constants.
Signed-off-by: Joe Perches <joe@perches.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dave Jones [Thu, 27 Jun 2013 23:52:50 +0000 (09:52 +1000)]
list: remove __list_for_each()
__list_for_each used to be the non prefetch() aware list walking
primitive. When we removed the prefetch macros from the list routines, it
became redundant. Given it does exactly the same thing as list_for_each
now, we might as well remove it and call list_for_each directly.
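For readers unfamiliar with the primitive, a minimal userspace sketch of a list_for_each-style walk follows. It is modelled on the kernel's list_head but is not a copy of include/linux/list.h; list_init(), list_add_tail() and list_count() are simplified helpers written for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Walk every node after the head until we wrap around to the head again. */
#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* An empty list is a head that points at itself in both directions. */
void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link node in just before head, i.e. at the tail of the list. */
void list_add_tail(struct list_head *node, struct list_head *head)
{
	assert(node != NULL && head != NULL);
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Count the entries by walking the list with list_for_each. */
size_t list_count(struct list_head *head)
{
	struct list_head *pos;
	size_t n = 0;

	list_for_each(pos, head)
		n++;
	return n;
}
```

With no prefetch hint in the loop header, a second macro with the same body adds nothing, which is the redundancy the commit removes.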
Signed-off-by: Dave Jones <davej@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: Dave Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jennifer Naumann <Jennifer.Naumann@informatik.stud.uni-erlangen.de> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Sebastian Hahn <snsehahn@cip.cs.fau.de> Cc: Stanislav Yakovlev <stas.yakovlev@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dave Jones [Thu, 27 Jun 2013 23:52:49 +0000 (09:52 +1000)]
ipw2200: convert __list_for_each usage to list_for_each
Signed-off-by: Dave Jones <davej@redhat.com> Cc: Stanislav Yakovlev <stas.yakovlev@gmail.com> Cc: "John W. Linville" <linville@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jingoo Han [Thu, 27 Jun 2013 23:52:48 +0000 (09:52 +1000)]
backlight: add CONFIG_PM_SLEEP to suspend/resume functions
Add CONFIG_PM_SLEEP to suspend/resume functions to fix the following build
warning when CONFIG_PM_SLEEP is not selected. This is because sleep PM
callbacks defined by SIMPLE_DEV_PM_OPS are only used when the
CONFIG_PM_SLEEP is enabled.
drivers/video/backlight/backlight.c:211:12: warning: 'backlight_suspend' defined but not used [-Wunused-function]
drivers/video/backlight/backlight.c:225:12: warning: 'backlight_resume' defined but not used [-Wunused-function]
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Cc: Shuah Khan <shuahkhan@gmail.com> Cc: Jingoo Han <jg1.han@samsung.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shuah Khan [Thu, 27 Jun 2013 23:52:48 +0000 (09:52 +1000)]
backlight: convert from legacy pm ops to dev_pm_ops
Convert drivers/video/backlight/class to use dev_pm_ops for power
management and remove Legacy PM ops hooks. With this change, backlight
class registers suspend/resume callbacks via class->pm (dev_pm_ops)
instead of Legacy class->suspend/resume. When __device_suspend() runs
call-backs, it will find class->pm ops for the backlight class.
Signed-off-by: Shuah Khan <shuah.kh@samsung.com> Cc: Shuah Khan <shuahkhan@gmail.com> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jean Delvare [Thu, 27 Jun 2013 23:52:45 +0000 (09:52 +1000)]
MAINTAINERS: fix tape driver file mappings
The masks for the st and osst tape drivers in MAINTAINERS are too
broad and include unrelated files. Make the file list accurate so that
maintainers of these drivers aren't bothered with unrelated work.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Willem Riede <osst@riede.org> Cc: Kai Mäkisara <Kai.Makisara@kolumbus.fi> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drivers/dma: remove unused support for MEMSET operations
There have never been any real users of MEMSET operations since they were
introduced in January 2007 by 7405f74badf4 ("dmaengine: refactor
dmaengine around dma_async_tx_descriptor"). Therefore remove support for
them for now; it can always be brought back when needed.
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Dan Williams <djbw@fb.com> Cc: Tomasz Figa <t.figa@samsung.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chuansheng Liu [Thu, 27 Jun 2013 23:52:44 +0000 (09:52 +1000)]
smp: Give WARN()ing when calling smp_call_function_many()/single() in serving irq
Currently the functions smp_call_function_many()/single() will give a
WARN()ing only in the case of irqs_disabled(), but that check is not
enough to guarantee execution of the SMP cross-calls.
In many other cases such as softirq handling/interrupt handling, the two
APIs still can not be called, just as the smp_call_function_many()
comments say:
* You must not call this function with disabled interrupts or from a
* hardware interrupt handler or from a bottom half handler. Preemption
* must be disabled when calling this function.
There is a real softirq DEADLOCK case:
CPUA                            CPUB
                                spin_lock(&spinlock)
                                Any irq coming, call the irq handler
                                irq_exit()
spin_lock_irq(&spinlock)
<== Blocking here due to
CPUB hold it
                                  __do_softirq()
                                    run_timer_softirq()
                                      timer_cb()
                                        call smp_call_function_many()
                                          send IPI interrupt to CPUA
                                            wait_csd()
Then both CPUA and CPUB will be deadlocked here.
So we should give a warning in the nmi, hardirq or softirq context as well.
Moreover, add a new macro in_serving_irq() which indicates that we are
processing an nmi, hardirq or softirq.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Lai Jiangshan <eag0628@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[jani.nikula@intel.com: use DMI_EXACT_MATCH for board name.] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reported-by: <annndddrr@gmail.com> Cc: Cornel Panceac <cpanceac@gmail.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jani Nikula [Thu, 27 Jun 2013 23:52:43 +0000 (09:52 +1000)]
dmi: add support for exact DMI matches in addition to substring matching
dmi_match() considers a substring match to be a successful match. This is
not always sufficient to distinguish between DMI data for different
systems. Add support for exact string matching using strcmp() in addition
to the substring matching using strstr().
The specific use case in the i915 driver is to allow us to use an exact
match for D510MO, without also incorrectly matching D510MOV:
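The difference between the two matching modes can be sketched in plain userspace C. The helper names match_substring() and match_exact() are hypothetical and stand in for the strstr()-based and strcmp()-based comparisons described above; this is not the kernel's dmi_match() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Substring semantics: succeeds if `want` occurs anywhere in `have`. */
bool match_substring(const char *have, const char *want)
{
	assert(have != NULL && want != NULL);
	return strstr(have, want) != NULL;
}

/* Exact semantics: succeeds only if the two strings are identical. */
bool match_exact(const char *have, const char *want)
{
	assert(have != NULL && want != NULL);
	return strcmp(have, want) == 0;
}
```

With a board name of "D510MOV", match_substring(name, "D510MO") succeeds while match_exact(name, "D510MO") does not, which is the distinction the i915 quirk needs.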
Oleg Nesterov [Thu, 27 Jun 2013 23:52:43 +0000 (09:52 +1000)]
kernel/sys.c:do_sysinfo(): use get_monotonic_boottime()
Change do_sysinfo() to use get_monotonic_boottime() instead of
do_posix_clock_monotonic_gettime() + monotonic_to_bootbased().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- fix comment indenting
- avoid inclusion of <asm/> files - use <linux/> where possible
- fix uniprocessor build (__dump_stack undefined)
- remove unneeded ifdef around smp.h inclusion
Cc: Alex Thorlton <athorlton@sgi.com> Cc: David S. Miller <davem@davemloft.net> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Robin Holt <holt@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: athorlton@sgi.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alex Thorlton [Thu, 27 Jun 2013 23:52:41 +0000 (09:52 +1000)]
dump_stack: serialize the output from dump_stack()
Add functionality to serialize the output from dump_stack() to avoid
mangling of the output when dump_stack is called simultaneously from
multiple cpus.
Signed-off-by: Alex Thorlton <athorlton@sgi.com> Reported-by: Russ Anderson <rja@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 27 Jun 2013 23:52:41 +0000 (09:52 +1000)]
drivers: avoid parsing names as kthread_run() format strings
Calling kthread_run with a single name parameter causes it to be handled
as a format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents.
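The pattern can be illustrated with a userspace stand-in. set_name() below is a hypothetical printf-style naming function built on vsnprintf(); it is not kthread_run(), but it shares the property that its first string argument is interpreted as a format.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a printf-style naming interface: the name
 * is built from a format string plus arguments. Returns the length the
 * full name would have, as vsnprintf() does.
 */
int set_name(char *dst, size_t size, const char *fmt, ...)
{
	va_list args;
	int len;

	assert(dst != NULL && fmt != NULL);
	va_start(args, fmt);
	len = vsnprintf(dst, size, fmt, args);
	va_end(args);
	return len;
}

/* Safe pattern: the dynamic string is an argument, never the format. */
int set_name_literal(char *dst, size_t size, const char *name)
{
	return set_name(dst, size, "%s", name);
}
```

Passing a dynamic string through "%s" keeps any '%' characters in it literal. Calling set_name(dst, size, name) directly would instead make the callee read variadic arguments that were never supplied.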
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 27 Jun 2013 23:52:41 +0000 (09:52 +1000)]
drivers: avoid format strings in names passed to alloc_workqueue()
For the workqueue creation interfaces that do not expect format strings,
make sure they cannot accidentally be parsed that way. Additionally, clean
up calls made with a single parameter that would be handled as a format
string. Many callers are passing potentially dynamic string content, so
use "%s" in those cases to avoid any potential accidents.
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 27 Jun 2013 23:52:40 +0000 (09:52 +1000)]
drivers: avoid format string in dev_set_name
Calling dev_set_name with a single parameter causes it to be handled as a
format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents,
including wrappers like device_create*() and bdi_register().
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 27 Jun 2013 23:52:40 +0000 (09:52 +1000)]
clean up scary strncpy(dst, src, strlen(src)) uses
Fix various weird constructions of strncpy(dst, src, strlen(src)). Length
limits should be about the space available in the destination, not
repurposed as a method to either always include or always exclude a
trailing NUL byte. Either the NUL should always be copied (using
strlcpy), or it should not be copied (using something like memcpy).
Readable code should not depend on the weird behavior of strncpy when it
hits the length limit. Better to avoid the anti-pattern entirely.
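A rough userspace sketch of the two intended alternatives follows. copy_terminated() is a simplified stand-in for strlcpy() (which is not available in every C library), and copy_unterminated() shows the memcpy-style case; both names are invented for this example. In each, the limit is the size of the destination, not strlen(src).

```c
#include <assert.h>
#include <string.h>

/*
 * Copy src into dst and always NUL-terminate, truncating if needed.
 * A simplified stand-in for strlcpy(); dst_size is the size of dst.
 * Returns strlen(src) so callers can detect truncation.
 */
size_t copy_terminated(char *dst, const char *src, size_t dst_size)
{
	size_t len = strlen(src);

	assert(dst != NULL);
	if (dst_size > 0) {
		size_t n = len < dst_size - 1 ? len : dst_size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/*
 * Copy only the characters of src, with no terminator, into a
 * fixed-width field. The limit is still the destination size.
 * Returns the number of bytes copied.
 */
size_t copy_unterminated(char *dst, const char *src, size_t dst_size)
{
	size_t len = strlen(src);
	size_t n = len < dst_size ? len : dst_size;

	assert(dst != NULL);
	memcpy(dst, src, n);
	return n;
}
```

Either helper makes the intent explicit at the call site, whereas strncpy(dst, src, strlen(src)) silently drops the terminator and ignores how much room dst actually has.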
Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [staging] Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [acpi] Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ursula Braun <ursula.braun@de.ibm.com> Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dan Carpenter [Thu, 27 Jun 2013 23:52:39 +0000 (09:52 +1000)]
err.h: IS_ERR() can accept __user pointers
Sparse generates a false positive when you pass a __user or __iomem
pointer to the IS_ERR() functions.
drivers/rtc/rtc-ds1286.c:344:36: sparse: incorrect type in argument 1 (different address spaces)
drivers/rtc/rtc-ds1286.c:344:36: expected void const *ptr
drivers/rtc/rtc-ds1286.c:344:36: got unsigned int [noderef] [usertype] <asn:2>*rtcregs
We can silence these by adding a __force here and upgrading to Sparse
v0.4.5-rc1 or later.
This change has no effect when using current Sparse releases.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Christopher Li <sparse@chrisli.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Xi Wang [Thu, 27 Jun 2013 23:52:38 +0000 (09:52 +1000)]
mm/dmapool.c: fix null dev in dma_pool_create()
A few drivers invoke dma_pool_create() with a null dev. Note that dev is
dereferenced in dev_to_node(dev), causing a null pointer dereference.
A long term solution is to disallow null dev. Once the drivers are fixed,
we can simplify the core code here. For now we add WARN_ON(!dev) to
notify the driver maintainers and avoid the null pointer dereference.
Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Xi Wang [Thu, 27 Jun 2013 23:52:38 +0000 (09:52 +1000)]
drivers/usb/gadget/amd5536udc.c: avoid calling dma_pool_create() with NULL dev
Calling dma_pool_create() with dev==NULL will oops on a NUMA machine.
Rather than changing dma_pool_create() we wish to disallow passing
dev==NULL. This requires fixing up the small number of drivers which are
passing in dev==NULL.
Use &dev->pdev->dev instead of NULL.
Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michal Hocko [Thu, 27 Jun 2013 23:52:37 +0000 (09:52 +1000)]
drop_caches: add some documentation and info message
I would like to resurrect Dave's patch. The last time it was posted was
here https://lkml.org/lkml/2010/9/16/250 and there didn't seem to be any
strong opposition.
Kosaki was worried about possible excessive logging when somebody drops
caches too often (but then he claimed he didn't have a strong opinion on
that) but I would say the opposite. If somebody does that then I would really
like to know that from the log when supporting a system because it almost
certainly means that there is something fishy going on. It is also worth
mentioning that only root can write to drop_caches so this is not a flooding
attack vector.
I am bringing that up again because this can be really helpful when
chasing strange performance issues which (surprise surprise) turn out to
be related to artificially dropped caches done because the admin thinks
this would help...
I have just refreshed the original patch on top of the current mm tree
but I could live with KERN_INFO as well if people think that KERN_NOTICE
is too hysterical.
: From: Dave Hansen <dave@linux.vnet.ibm.com>
: Date: Fri, 12 Oct 2012 14:30:54 +0200
:
: There is plenty of anecdotal evidence and a load of blog posts
: suggesting that using "drop_caches" periodically keeps your system
: running in "tip top shape". Perhaps adding some kernel
: documentation will increase the amount of accurate data on its use.
:
: If we are not shrinking caches effectively, then we have real bugs.
: Using drop_caches will simply mask the bugs and make them harder
: to find, but certainly does not fix them, nor is it an appropriate
: "workaround" to limit the size of the caches.
:
: It's a great debugging tool, and is really handy for doing things
: like repeatable benchmark runs. So, add a bit more documentation
: about it, and add a little KERN_NOTICE. It should help developers
: who are chasing down reclaim-related bugs.
[mhocko@suse.cz: refreshed to current -mm tree]
[akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mike Yoknis [Thu, 27 Jun 2013 23:52:37 +0000 (09:52 +1000)]
mm: memmap_init_zone() performance improvement
We have what we call an "architectural simulator". It is a computer
program that pretends that it is a computer system. We use it to test the
firmware before real hardware is available. We have booted Linux on our
simulator. As you would expect it takes longer to boot on the simulator
than it does on real hardware.
With my patch - boot time 41 minutes
Without patch - boot time 94 minutes
These numbers do not scale linearly to real hardware, but they indicate to
me a place where Linux can be improved.
memmap_init_zone() loops through every Page Frame Number (pfn), including
pfn values that are within the gaps between existing memory sections. The
unneeded looping will become a boot performance issue when machines
configure larger memory ranges that will contain larger and more numerous
gaps.
The code will skip across invalid pfn values to reduce the number of loops
executed.
Signed-off-by: Mike Yoknis <mike.yoknis@hp.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Thu, 27 Jun 2013 23:52:36 +0000 (09:52 +1000)]
mm: vmscan: do not scale writeback pages when deciding whether to set ZONE_WRITEBACK
After the patch "mm: vmscan: Flatten kswapd priority loop" was merged the
scanning priority of kswapd changed. The priority is now raised until it is
scanning enough pages to meet the high watermark. shrink_inactive_list
sets ZONE_WRITEBACK if a number of pages were encountered under writeback
but this value is scaled based on the priority. As kswapd frequently
scans with a higher priority now it is relatively easy to set
ZONE_WRITEBACK. This patch removes the scaling and treats writeback
pages similar to how it treats unqueued dirty pages and congested pages.
The user-visible effect should be that kswapd will write back fewer pages
from reclaim context.
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Thu, 27 Jun 2013 23:52:36 +0000 (09:52 +1000)]
mm: vmscan: avoid direct reclaim scanning at maximum priority
Further testing revealed that swapping was still higher than expected for
the parallel IO tests. There was also a performance regression reported
building kernels but there appear to be multiple sources of that problem.
This follow-up series primarily addresses the first swapping issue.
The tests were based on three kernels
vanilla: kernel 3.10-rc4 as that is what the current mmotm uses as a baseline
mmotm-20130606 is mmotm as of that date.
lessdisrupt-v1 is this follow-up series on top of the mmotm kernel
The first test used memcached+memcachetest while some background IO was in
progress as implemented by the parallel IO tests in MM Tests.
memcachetest benchmarks how many operations/second memcached can service.
It starts with no background IO on a freshly created ext4 filesystem and
then re-runs the test with larger amounts of IO in the background to
roughly simulate a large copy in progress. The expectation is that the IO
should have little or no impact on memcachetest which is running entirely
in memory.
memcachetest is the transactions/second reported by memcachetest. In
the vanilla kernel note that performance drops from around
23K/sec to just over 4K/second when there is 2385M of IO going
on in the background. With current mmotm and the follow-on
series performance is good.
swaptotal is the total amount of swap traffic. With mmotm the total amount
of swapping is much reduced. Note that with 4G of background IO
this follow-up series almost completely eliminated swap IO.
1. Swap outs were almost completely eliminated and there were no swap-ins.
2. Direct reclaim is active due to reduced activity from kswapd and the fact
that it is no longer reclaiming at priority 0
3. Zone scanning is still relatively balanced.
4. Page writes from reclaim context are still reasonably low.
                            3.10.0-rc4          3.10.0-rc4          3.10.0-rc4
                               vanilla  mm1-mmotm-20130606  mm1-lessdisrupt-v1
Mean sda-avgqz                  168.05               34.64               35.60
Mean sda-await                  831.76              216.31              207.05
Mean sda-r_await                  7.88                9.68                7.25
Mean sda-w_await               3088.32              223.90              218.28
Max sda-avgqz                  1162.17              766.85              795.69
Max sda-await                  6788.75             4130.01             3728.43
Max sda-r_await                 106.93              242.00               65.97
Max sda-w_await               30565.93             4145.75             3959.87
Wait times are marginally reduced by the follow-up and still a massive
improvement over the mainline kernel.
I tested parallel kernel builds when booted with 1G of RAM. 12 kernels
were built with 2 being compiled at any given time.
multibuild
                         3.10.0-rc4           3.10.0-rc4           3.10.0-rc4
                            vanilla   mm1-mmotm-20130606   mm1-lessdisrupt-v1
User min          584.99 (   0.00%)    553.31 (   5.42%)    569.08 (   2.72%)
User mean         598.35 (   0.00%)    574.48 (   3.99%)    581.65 (   2.79%)
User stddev        10.01 (   0.00%)     17.90 ( -78.78%)     10.03 (  -0.14%)
User max          614.64 (   0.00%)    598.94 (   2.55%)    597.97 (   2.71%)
User range         29.65 (   0.00%)     45.63 ( -53.90%)     28.89 (   2.56%)
System min         35.78 (   0.00%)     35.05 (   2.04%)     35.54 (   0.67%)
System mean        36.12 (   0.00%)     35.69 (   1.20%)     35.88 (   0.69%)
System stddev       0.26 (   0.00%)      0.55 (-113.69%)      0.21 (  17.51%)
System max         36.53 (   0.00%)     36.44 (   0.25%)     36.13 (   1.09%)
System range        0.75 (   0.00%)      1.39 ( -85.33%)      0.59 (  21.33%)
Elapsed min       190.54 (   0.00%)    190.56 (  -0.01%)    192.99 (  -1.29%)
Elapsed mean      197.58 (   0.00%)    203.30 (  -2.89%)    200.53 (  -1.49%)
Elapsed stddev      4.65 (   0.00%)      5.26 ( -13.16%)      5.66 ( -21.79%)
Elapsed max       203.72 (   0.00%)    210.23 (  -3.20%)    210.46 (  -3.31%)
Elapsed range      13.18 (   0.00%)     19.67 ( -49.24%)     17.47 ( -32.55%)
CPU min           308.00 (   0.00%)    282.00 (   8.44%)    294.00 (   4.55%)
CPU mean          320.80 (   0.00%)    299.78 (   6.55%)    307.67 (   4.09%)
CPU stddev         10.44 (   0.00%)     13.83 ( -32.50%)      9.71 (   7.01%)
CPU max           340.00 (   0.00%)    333.00 (   2.06%)    328.00 (   3.53%)
CPU range          32.00 (   0.00%)     51.00 ( -59.38%)     34.00 (  -6.25%)
Average kernel build times are still impacted but the follow-up series
helps marginally (it's too noisy to be sure). A preliminary bisection
indicated that there were multiple sources of the regression. The two
other points are the patches that cause mark_page_accessed to be obeyed
and the slab shrinker series. As there are a number of patches in flight to
mmotm at the moment in different areas, it would be best to confirm this
after this follow-up is merged.
This patch (of 2):
Page reclaim at priority 0 will scan the entire LRU as priority 0 is
considered to be a near OOM condition. Direct reclaim can reach this
priority while still making reclaim progress. This patch avoids
reclaiming at priority 0 unless no reclaim progress was made and the page
allocator would consider firing the OOM killer. The user-visible impact
is that direct reclaim will not easily reach priority 0 and start swapping
prematurely.
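As a rough illustration of why priority 0 is so aggressive: reclaim sizes each scan pass by shifting the LRU size right by the current priority, so a pass covers the whole list once the priority reaches 0. The sketch below is a userspace model, not kernel code. `scan_target` and `DEMO_DEF_PRIORITY` are hypothetical names, and 12 is assumed as the starting priority.

```c
#include <assert.h>

/* Hypothetical stand-in for the default starting reclaim priority. */
#define DEMO_DEF_PRIORITY 12

/*
 * Pages one pass would scan from an LRU list of lru_size pages at the
 * given priority. A lower priority scans a larger share of the list;
 * priority 0 scans all of it.
 */
unsigned long scan_target(unsigned long lru_size, int priority)
{
    return lru_size >> priority;
}
```

With an LRU of 1 << 20 pages, `scan_target()` yields 256 pages at the assumed starting priority and the full 1 << 20 pages at priority 0.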
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 27 Jun 2013 23:52:35 +0000 (09:52 +1000)]
mm/vmalloc.c: fix an overflow bug in alloc_vmap_area()
When searching for a vmap area in the vmalloc space, we check whether
(addr + size - 1) is less than addr, which indicates an overflow. But we
assign (addr + size) to vmap_area->va_end.
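A minimal sketch of the arithmetic behind the two expressions above, assuming `unsigned long` addresses and a non-zero size. The helper names are hypothetical and this is not the kernel code. It only shows that the last-byte check can pass while the exclusive end address still wraps around.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if the last byte of [addr, addr + size) wraps; assumes size > 0. */
bool last_byte_wraps(unsigned long addr, unsigned long size)
{
    return addr + size - 1 < addr;
}

/* True if the exclusive end address addr + size wraps; assumes size > 0. */
bool end_wraps(unsigned long addr, unsigned long size)
{
    return addr + size < addr;
}
```

For a 4096-byte range whose last byte sits at `ULONG_MAX`, `last_byte_wraps()` is false while `end_wraps()` is true, because `addr + size` wraps to 0.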
Wanpeng Li [Thu, 27 Jun 2013 23:52:34 +0000 (09:52 +1000)]
mm/pgtable: don't accumulate addr during pgd prepopulate pmd
The old code accumulated addr to get the right pmd. However, pmds are now
preallocated and transferred as a parameter, so there is no need to
accumulate the addr variable any more. This patch removes it.
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wanpeng Li [Thu, 27 Jun 2013 23:52:34 +0000 (09:52 +1000)]
mm/thp: fix doc for transparent huge zero page
Transparent huge zero page is used during the page fault instead of in
khugepaged.
# ls /sys/kernel/mm/transparent_hugepage/
defrag enabled khugepaged use_zero_page
# ls /sys/kernel/mm/transparent_hugepage/khugepaged/
alloc_sleep_millisecs defrag full_scans max_ptes_none pages_collapsed pages_to_scan scan_sleep_millisecs
This patch corrects the documentation to match what the code does.
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wanpeng Li [Thu, 27 Jun 2013 23:52:34 +0000 (09:52 +1000)]
mm/page_alloc: fix doc for numa_zonelist_order
The default zonelist order selector will select "node" order if any node's
DMA zone comprises more than 70% of its local memory, not 60% as documented,
according to default_zonelist_order::low_kmem_size > total * 70/100.
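The quoted comparison can be sketched in isolation as follows. `prefers_node_order` is a hypothetical helper, not a kernel function, and it models only the integer threshold test, not the rest of the selection logic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the quoted test: true when the low (DMA) memory
 * of a node exceeds 70% of that node's total, using integer arithmetic.
 */
bool prefers_node_order(unsigned long low_kmem_size, unsigned long total)
{
    return low_kmem_size > total * 70 / 100;
}
```

Under this test a node with 65% of its memory in the DMA zone does not trigger "node" order, although a 60% threshold would have.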
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wanpeng Li [Thu, 27 Jun 2013 23:52:33 +0000 (09:52 +1000)]
mm/writeback: commit reason of WB_REASON_FORKER_THREAD mismatch name
After 839a8e86 ("writeback: replace custom worker pool implementation with
unbound workqueue"), there is no bdi forker thread any more. However,
WB_REASON_FORKER_THREAD is still used because it is visible to userland
through tracepoints, and we won't expose exactly the same information under
a different name.
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wanpeng Li [Thu, 27 Jun 2013 23:52:33 +0000 (09:52 +1000)]
mm/writeback: don't check force_wait to handle bdi->work_list
After 839a8e86 ("writeback: replace custom worker pool implementation with
unbound workqueue"), bdi_writeback_workfn runs off bdi_writeback->dwork.
On each execution, it processes bdi->work_list and reschedules if there
is more to do, instead of flushing any work that raced with us
exiting. It is unnecessary to check force_wait in wb_do_writeback since
it is always 0 after the mentioned commit. This patch removes
force_wait from wb_do_writeback.
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Haicheng Li [Thu, 27 Jun 2013 23:52:32 +0000 (09:52 +1000)]
fs/fs-writeback.c: make wb_do_writeback() as static
It's not used globally and could be static.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 27 Jun 2013 23:52:32 +0000 (09:52 +1000)]
mm/sparse.c: put clear_hwpoisoned_pages within CONFIG_MEMORY_HOTREMOVE
With CONFIG_MEMORY_HOTREMOVE unset, there is a compile warning:
mm/sparse.c:755: warning: `clear_hwpoisoned_pages' defined but not used
Bisecting it ended up pointing to 4edd7ceff ("mm, hotplug: avoid
compiling memory hotremove functions when disabled").
This is because the commit above put sparse_remove_one_section() within
the protection of CONFIG_MEMORY_HOTREMOVE, but sparse_remove_one_section()
is the only user of clear_hwpoisoned_pages(), which is not itself
within the protection of CONFIG_MEMORY_HOTREMOVE.
So putting clear_hwpoisoned_pages within CONFIG_MEMORY_HOTREMOVE should fix
the warning.
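The pattern can be sketched as below: a static helper with a single caller is placed inside the same preprocessor guard as that caller, so it is never compiled as an unused function. `DEMO_HOTREMOVE` and the `demo_*` functions are hypothetical stand-ins, not the code in mm/sparse.c. The `#else` stub exists only to keep the sketch buildable either way.

```c
#include <assert.h>

/* Hypothetical switch standing in for a Kconfig option; 0 compiles the feature out. */
#define DEMO_HOTREMOVE 1

#if DEMO_HOTREMOVE
/* Helper with a single caller, kept inside the same guard as that caller. */
static int demo_clear_pages(int nr_pages)
{
    return nr_pages; /* pretend every page was cleared */
}

int demo_remove_section(int nr_pages)
{
    return demo_clear_pages(nr_pages);
}
#else
int demo_remove_section(int nr_pages)
{
    (void)nr_pages;
    return 0; /* feature compiled out: nothing to clear */
}
#endif
```

If `demo_clear_pages()` were defined outside the guard, building with `DEMO_HOTREMOVE` set to 0 would leave a static function without callers, which is the situation the warning reports.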
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:52:31 +0000 (09:52 +1000)]
vfree: don't schedule free_work() if llist_add() returns false
vfree() only needs schedule_work(&p->wq) if p->list was empty; otherwise
vfree_deferred->wq is already pending, or it is running and hasn't done
llist_del_all() yet.
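The idea can be modelled in userspace with C11 atomics: a lock-free push reports whether the list was empty beforehand, and only that first push schedules the worker. All `demo_*` names are hypothetical and the counter merely simulates schedule_work(). This is a sketch of the pattern, not the kernel's llist implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_node {
    struct demo_node *next;
};

struct demo_list {
    _Atomic(struct demo_node *) first;
};

/* Push node; return true if the list was empty before the push. */
bool demo_list_add(struct demo_node *node, struct demo_list *list)
{
    struct demo_node *old = atomic_load(&list->first);

    do {
        node->next = old;
    } while (!atomic_compare_exchange_weak(&list->first, &old, node));

    return old == NULL;
}

/* Detach and return the whole chain, leaving the list empty. */
struct demo_node *demo_list_del_all(struct demo_list *list)
{
    return atomic_exchange(&list->first, (struct demo_node *)NULL);
}

/* Counts how often the simulated worker was scheduled. */
int demo_schedule_count;

void demo_deferred_free(struct demo_node *node, struct demo_list *list)
{
    /* Only the push that makes the list non-empty schedules the worker. */
    if (demo_list_add(node, list))
        demo_schedule_count++;
}
```

Two pushes onto an empty list schedule the worker once. After the worker drains the list with `demo_list_del_all()`, the next push schedules it again.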
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 27 Jun 2013 23:52:31 +0000 (09:52 +1000)]
mm/page_alloc.c: remove unlikely() from the current_order test
In __rmqueue_fallback(), current_order loops down from MAX_ORDER - 1 to
the order passed. MAX_ORDER is typically 11 and pageblock_order is
typically 9 on x86. Integer division truncates, so pageblock_order / 2 is
4. For the first seven iterations (orders 10 down to 4), it's guaranteed
that current_order >= pageblock_order / 2 if it even gets that far!
So just remove the unlikely(), it's completely bogus.
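The arithmetic can be checked with a small standalone loop. The constants below are the typical x86 values quoted above, hard-coded as assumptions rather than taken from kernel headers, and `count_likely_iterations` is a hypothetical helper.

```c
#include <assert.h>

/* Typical x86 values quoted in the message; assumptions, not kernel headers. */
#define DEMO_MAX_ORDER       11
#define DEMO_PAGEBLOCK_ORDER 9

/*
 * Walk current_order down from DEMO_MAX_ORDER - 1 to order and count the
 * iterations in which current_order >= DEMO_PAGEBLOCK_ORDER / 2 holds.
 */
int count_likely_iterations(int order)
{
    int current_order;
    int hits = 0;

    for (current_order = DEMO_MAX_ORDER - 1; current_order >= order; current_order--) {
        if (current_order >= DEMO_PAGEBLOCK_ORDER / 2)
            hits++;
    }
    return hits;
}
```

For an order-0 request the condition holds for orders 10 down to 4 and fails only for orders 3 down to 0, so the branch is taken in most iterations rather than rarely.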
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Suggested-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 27 Jun 2013 23:52:30 +0000 (09:52 +1000)]
mm/page_alloc.c: remove zone_type argument of build_zonelists_node
The callers of build_zonelists_node always pass MAX_NR_ZONES - 1 as the
zone_type argument, so we can use that value directly in
build_zonelists_node and remove the zone_type argument.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>