Namhyung Kim [Wed, 4 Apr 2012 07:16:05 +0000 (00:16 -0700)]
perf tools: Move GTK+ bits to tools/perf/ui/gtk directory
Move those files to new directory in order to be prepared to
further UI work. Makefile and header file pathes are adjusted
accordingly. Also fix a build breakage if NO_GTK2=1 is given.
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Suggested-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl Link: http://lkml.kernel.org/r/1333523765-12092-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 11 Apr 2012 20:04:59 +0000 (17:04 -0300)]
perf annotate: Fix a build error
CC util/annotate.o
util/annotate.c: In function symbol__annotate:
util/annotate.c:87:16: error: parsed_line may be used uninitialized in this function [-Werror=maybe-uninitialized]
util/annotate.c:211:22: note: parsed_line was declared here
cc1: all warnings being treated as errors
make: *** [util/annotate.o] Error 1
make: *** Waiting for unfinished jobs....
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ashay Rane <ashay.rane@tacc.utexas.edu> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/87ty0tlv4i.fsf@dasan.aot.lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Wed, 11 Apr 2012 10:39:51 +0000 (12:39 +0200)]
perf tools: Fix parsers' rules to dependencies
Currently the parsers objects (bison/flex related) are each time perf
is built. No matter the generated files are already in place, the
parser generation is executed every time.
Changing the rules to have proper flex/bison objects generation
dependencies.
The parsers code is not rebuilt until the flex/bison source files
are touched. Also when flex/bison source is changed, only dependent
objects are rebuilt.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1334140791-3024-1-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
... and make it the default one so that calling 'make' without arguments
in the tools/ directory gives you the possible targets to build along
with a short description of what they are.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Sam Ravnborg <sam@ravnborg.org> Link: http://lkml.kernel.org/r/1334162178-17152-5-git-send-email-bp@amd64.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The commit 65f3e56e0c81 ("perf tools: Remove auto-generated bison/flex
files") removed those files from git, so they'll be listed on untracked
files after building perf. Fix it.
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1333948274-20043-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since we now have objdump_line allowing tools to print the offset
independently from the rest of the line, allow toggling a view where
just offsets from the start of the function are shown:
2f: jne ffffffff81265494 <__list_del_entry+0x84>
<SNIP>
84: mov %rdi,%rcx
The offset view will be the default as soon as operations that deal with
offsets in a function are handled accodringly, i.e. in offset view the
above will become:
2f: jne __list_del_entry+0x84
<SNIP>
84: mov %rdi,%rcx
And then a follow up patch will allow navigating thru jumps, just like
we handle callq instructions.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4zpgimmz8xv7b5c920el7s45@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf ui browser: Return the current color when setting a new one
Tools that want to change parts of the line to a different color and
then restore the previous one will use this, starting with the annotate
browser that will change the color of addresses if not on the current
entry, i.e. the selected one.
Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-uiajpevhxo4mzrvna6remb4a@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf annotate: Validate addr in symbol__inc_addr_samples
This routine was checking only if the provided address was after
sym->end, not if it was before sym->start.
Fix that by checking for both and return in both cases -ERANGE, so that
tools can communicate this to the user properly, or if they chose so, to
abort.
This problem was reported previously but the fixes involved either doing
what was being done for the > end case, i.e. silently drop the sample,
returning 0, or aborting at this function, which is in a lib (or better,
is slated to be at some point) and shouldn't abort.
The 'report' tool already checks this value and uses pr_debug to warn
the user.
This patch makes the 'top' tool check it too and warn once per map where
such range problem takes place.
Reported-by: David Miller <davem@davemloft.net> Reported-by: Sorin Dumitru <dumitru.sorin87@gmail.com> Reported-by: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-lw8gs7p9i9nhldilo82tzpne@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Wed, 4 Apr 2012 20:21:31 +0000 (22:21 +0200)]
perf hists browser: Fix NULL deref in hists browsing code
If there's an event with no samples in data file, the perf report
command can segfault after entering the event details menu.
Following steps reproduce the issue:
# ./perf record -e syscalls:sys_enter_kexec_load,syscalls:sys_enter_mmap ls
# ./perf report
# enter '0 syscalls:sys_enter_kexec_load' menu
# pres ENTER twice
Above steps are valid assuming ls wont run kexec.. ;)
The check for sellection to be NULL is missing. The fix makes sure it's
being check. Above steps now endup with menu being displayed allowing
'Exit' as the only option.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1333570898-10505-2-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Miller [Tue, 27 Mar 2012 07:14:18 +0000 (03:14 -0400)]
perf hists: Catch and handle out-of-date hist entry maps.
When a process exec()'s, all the maps are retired, but we keep the hist
entries around which hold references to those outdated maps.
If the same library gets mapped in for which we have hist entries, a new
map will be created. But when we take a perf entry hit within that map,
we'll find the existing hist entry with the older map.
This causes symbol translations to be done incorrectly. For example,
the perf entry processing will lookup the correct uptodate map entry and
use that to calculate the symbol and DSO relative address. But later
when we update the histogram we'll translate the address using the
outdated map file instead leading to conditions such as out-of-range
offsets in symbol__inc_addr_samples().
Therefore, update the map of the hist_entry dynamically at lookup/
creation time.
perf tools: Fix getrusage() related build failure on glibc trunk
On a system running glibc trunk perf doesn't build:
CC builtin-sched.o
builtin-sched.c: In function ‘get_cpu_usage_nsec_parent’: builtin-sched.c:399:16: error: storage size of ‘ru’ isn’t known builtin-sched.c:403:2: error: implicit declaration of function ‘getrusage’ [-Werror=implicit-function-declaration]
[...]
Fix it by including sys/resource.h.
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120404084527.GA294@x4 Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 27 Mar 2012 14:50:42 +0000 (16:50 +0200)]
perf/x86/p4: Add format attributes
Steven reported his P4 not booting properly, the missing format
attributes cause a NULL ptr deref. Cure this by adding the
missing format specification.
I took the format description out of the comment near
p4_config_pack*() and hope that comment is still relatively
accurate.
Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1332859842.16159.227.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Fri, 30 Mar 2012 16:26:36 +0000 (18:26 +0200)]
tracing, sched, vfs: Fix 'old_pid' usage in trace_sched_process_exec()
1. TRACE_EVENT(sched_process_exec) forgets to actually use the
old pid argument, it sets ->old_pid = p->pid.
2. search_binary_handler() uses the wrong pid number. tracepoint
needs the global pid_t from the root namespace, while old_pid
is the virtual pid number as it seen by the tracer/parent.
With this patch we have two pid_t's in search_binary_handler(),
not really nice. Perhaps we should switch to "struct pid*", but
in this case it would be better to cleanup the current code
first and move the "depth == 0" code outside.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Smith <dsmith@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Link: http://lkml.kernel.org/r/20120330162636.GA4857@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf annotate: Fix off by one symbol hist size allocation and hit accounting
We were not noticing it because symbol__inc_addr_samples was erroneously
dropping samples that hit the last byte in a function.
Working on a fix for a problem reported by David Miller, Stephane
Eranian and Sorin Dumitru, where addresses < sym->start were causing
problems, I noticed this other problem.
Cc: David Ahern <dsahern@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sorin Dumitru <dumitru.sorin87@gmail.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-pqjaq4cr1xs2xen73pjhbav4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf probe: Finder fails to resolve function name to address
If DIE entries corresponding to declarations appear before definition
entry, probe finder returns error instead of continuing to look further
for a definition entry.
This patch ensures we reach to the DIE entry corresponding to the
definition and get the function address.
V2: A simpler solution based on Masami's suggestion.
Steven Rostedt [Tue, 27 Mar 2012 14:43:28 +0000 (10:43 -0400)]
tracing: Fix ent_size in trace output
When reading the trace file, the records of each of the per_cpu buffers
are examined to find the next event to print out. At the point of looking
at the event, the size of the event is recorded. But if the first event is
chosen, the other events in the other CPU buffers will reset the event size
that is stored in the iterator descriptor, causing the event size passed to
the output functions to be incorrect.
In most cases this is not a problem, but for the case of stack traces, it
is. With the change to the stack tracing to record a dynamic number of
back traces, the output depends on the size of the entry instead of the
fixed 8 back traces. When the entry size is not correct, the back traces
would not be fully printed.
Note, reading from the per-cpu trace files were not affected.
Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Stephane Eranian [Sat, 17 Mar 2012 22:23:18 +0000 (23:23 +0100)]
perf tools: Fix bug in raw sample parsing
In perf_event__parse_sample(), the array variable was not incremented
by the amount of data used by the raw_data.
That was okay until we added PERF_SAMPLE_BRANCH_STACK which depends on
the array variable pointing to the beginning of the branch stack data.
But that was not the case if branch stack was combined with raw mode
sampling. That led to bogus branch stack addresses and count.
The bug would show up with:
$ perf record -R -b foo
This patch fixes the problem by correctly moving the array pointer
forward for RAW samples.
Signed-off-by: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120317222317.GA8803@quad
[ committer note: Fix also later submitted by Jiri Olsa ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Fix display of first level of callchains
The callchain stdio mode display was written using a sorted by symbol
report. In this mode we have only one callchain root per hist so we
forgot to handle cases where we have multiple callchain root, as in per
dso sorting for example.
Fix this by handling these roots like any other branch, with the hist as
the parent.
Single roots keep their entry without percentage because they have
the same overhead than the hist they refer to. ie: 100% in fractal
mode and the percentage of the hist in graph mode:
Jiri Olsa [Mon, 26 Mar 2012 09:17:05 +0000 (11:17 +0200)]
perf tools: Switch module.h into export.h
When merged to Linus's latest tree the perf build is broken
due to following change in lib/rbtree.c object:
lib: reduce the use of module.h wherever possible
commit 8bc3bcc93a2b4e47d5d410146f6546bca6171663
Author: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Wed Nov 16 21:29:17 2011 -0500
Linus Torvalds [Sat, 24 Mar 2012 19:20:25 +0000 (12:20 -0700)]
Merge tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull more xen updates from Konrad Rzeszutek Wilk:
"One tiny feature that accidentally got lost in the initial git pull:
* Add fast-EOI acking of interrupts (clear a bit instead of
hypercall)
And bug-fixes:
* Fix CPU bring-up code missing a call to notify other subsystems.
* Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded.
* In Xen ACPI processor driver: remove too verbose WARN messages, fix
up the Kconfig dependency to be a module by default, and add
dependency on CPU_FREQ.
* Disable CPU frequency drivers from loading when booting under Xen
(as we want the Xen ACPI processor to be used instead).
* Cleanups in tmem code."
* tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/acpi: Fix Kconfig dependency on CPU_FREQ
xen: initialize platform-pci even if xen_emul_unplug=never
xen/smp: Fix bringup bug in AP code.
xen/acpi: Remove the WARN's as they just create noise.
xen/tmem: cleanup
xen: support pirq_eoi_map
xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
provide disable_cpufreq() function to disable the API.
Rik van Riel [Sat, 24 Mar 2012 14:26:21 +0000 (10:26 -0400)]
Fix potential endless loop in kswapd when compaction is not enabled
We should only test compaction_suitable if the kernel is built with
CONFIG_COMPACTION, otherwise the stub compaction_suitable function will
always return COMPACT_SKIPPED and send kswapd into an infinite loop.
Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 24 Mar 2012 17:41:37 +0000 (10:41 -0700)]
Merge tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull <linux/device.h> avoidance patches from Paul Gortmaker:
"Nearly every subsystem has some kind of header with a proto like:
void foo(struct device *dev);
and yet there is no reason for most of these guys to care about the
sub fields within the device struct. This allows us to significantly
reduce the scope of headers including headers. For this instance, a
reduction of about 40% is achieved by replacing the include with the
simple fact that the device is some kind of a struct.
Unlike the much larger module.h cleanup, this one is simply two
commits. One to fix the implicit <linux/device.h> users, and then one
to delete the device.h includes from the linux/include/ dir wherever
possible."
* tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
device.h: audit and cleanup users in main include dir
device.h: cleanup users outside of linux/include (C files)
Linus Torvalds [Sat, 24 Mar 2012 17:24:31 +0000 (10:24 -0700)]
Merge tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
"Fix up files in fs/ and lib/ dirs to only use module.h if they really
need it.
These are trivial in scope vs the work done previously. We now have
things where any few remaining cleanups can be farmed out to arch or
subsystem maintainers, and I have done so when possible. What is
remaining here represents the bits that don't clearly lie within a
single arch/subsystem boundary, like the fs dir and the lib dir.
Some duplicate includes arising from overlapping fixes from
independent subsystem maintainer submissions are also quashed."
Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).
* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
lib: reduce the use of module.h wherever possible
fs: reduce the use of module.h wherever possible
includecheck: delete any duplicate instances of module.h
Linus Torvalds [Sat, 24 Mar 2012 17:08:39 +0000 (10:08 -0700)]
Merge tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull <linux/bug.h> cleanup from Paul Gortmaker:
"The changes shown here are to unify linux's BUG support under the one
<linux/bug.h> file. Due to historical reasons, we have some BUG code
in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in
linux/kernel.h predates the addition of linux/bug.h, but old code in
kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h
was including <asm/bug.h> to pseudo link them.
This has caused confusion[1] and general yuck/WTF[2] reactions. Here
is an example that violates the principle of least surprise:
CC lib/string.o
lib/string.c: In function 'strlcat':
lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
make[2]: *** [lib/string.o] Error 1
$
$ grep linux/bug.h lib/string.c
#include <linux/bug.h>
$
We've included <linux/bug.h> for the BUG infrastructure and yet we
still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh -
very confusing for someone who is new to kernel development.
With the above in mind, the goals of this changeset are:
1) find and fix any include/*.h files that were relying on the
implicit presence of BUG code.
2) find and fix any C files that were consuming kernel.h and hence
relying on implicitly getting some/all BUG code.
3) Move the BUG related code living in kernel.h to <linux/bug.h>
4) remove the asm/bug.h from kernel.h to finally break the chain.
During development, the order was more like 3-4, build-test, 1-2. But
to ensure that git history for bisect doesn't get needless build
failures introduced, the commits have been reorderd to fix the problem
areas in advance.
Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
and linux-next.
* tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
kernel.h: doesn't explicitly use bug.h, so don't include it.
bug: consolidate BUILD_BUG_ON with other bug code
BUG: headers with BUG/BUG_ON etc. need linux/bug.h
bug.h: add include of it to various implicit C users
lib: fix implicit users of kernel.h for TAINT_WARN
spinlock: macroize assert_spin_locked to avoid bug.h dependency
x86: relocate get/set debugreg fcns to include/asm/debugreg.
The functions: "acpi_processor_*" sound like they depend on CONFIG_ACPI_PROCESSOR
but in reality they are exposed when CONFIG_CPU_FREQ=[y|m]. As such
update the Kconfig to have this dependency and fix compile issues:
Insert/remove/Lookup in sysctl are all now O(NlogN) operations, and
are no longer bottlenecks in the process of adding and removing
network devices.
sysctl is now focused on being a filesystem instead of system call
and the code can all be found in fs/proc/proc_sysctl.c. Hopefully
this means the code is now approachable.
Much thanks is owed to Lucian Grinjincu for keeping at this until
something was found that was usable.
- The recent proc_sys_poll oops found by the fuzzer during hibernation
is fixed.
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
sysctl: protect poll() in entries that may go away
sysctl: Don't call sysctl_follow_link unless we are a link.
sysctl: Comments to make the code clearer.
sysctl: Correct error return from get_subdir
sysctl: An easier to read version of find_subdir
sysctl: fix memset parameters in setup_sysctl_set()
sysctl: remove an unused variable
sysctl: Add register_sysctl for normal sysctl users
sysctl: Index sysctl directories with rbtrees.
sysctl: Make the header lists per directory.
sysctl: Move sysctl_check_dups into insert_header
sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
sysctl: Replace root_list with links between sysctl_table_sets.
sysctl: Add sysctl_print_dir and use it in get_subdir
sysctl: Stop requiring explicit management of sysctl directories
sysctl: Add a root pointer to ctl_table_set
sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
sysctl: Normalize the root_table data structure.
sysctl: Factor out insert_header and erase_header
...
Linus Torvalds [Sat, 24 Mar 2012 00:56:39 +0000 (17:56 -0700)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
Pull cpufreq updates for 3.4 from Dave Jones: new drivers and some fixes.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
provide disable_cpufreq() function to disable the API.
EXYNOS5250: Add support cpufreq for EXYNOS5250
EXYNOS4X12: Add support cpufreq for EXYNOS4X12
[CPUFREQ] CPUfreq ondemand: update sampling rate without waiting for next sampling
[CPUFREQ] Add S3C2416/S3C2450 cpufreq driver
[CPUFREQ] Fix exposure of ARM_EXYNOS4210_CPUFREQ
[CPUFREQ] EXYNOS4210: update the name of EXYNOS clock register
[CPUFREQ] EXYNOS: Initialize locking_frequency with initial frequency
[CPUFREQ] s3c64xx: Fix mis-cherry pick of VDDINT
Fix up trivial conflicts in Kconfig and Makefile due to just changes
next to each other (OMAP2PLUS changes vs some new EXYNOS cpufreq
drivers).
Linus Torvalds [Sat, 24 Mar 2012 00:51:50 +0000 (17:51 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
Pull cpufreq fixes from Dave Jones:
"I meant to get some of these in for 3.3 final, but left things too
late, so I've got two trees this time."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
cpufreq: OMAP: specify range for voltage scaling
cpufreq: OMAP: scale voltage along with frequency
cpufreq: OMAP driver depends CPUfreq tables
Linus Torvalds [Sat, 24 Mar 2012 00:37:40 +0000 (17:37 -0700)]
Merge branch 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm
Pull #3 ARM updates from Russell King:
"This adds gpio support to soc_common, allowing an amount of code to be
deleted from each PCMCIA socket driver for the PXA/SA11x0 SoCs."
* 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm:
PCMCIA: sa1111: rename sa1111 socket drivers to have sa1111_ prefix.
PCMCIA: make lubbock socket driver part of sa1111_cs
PCMCIA: add Kconfig control for building sa11xx_base.c
PCMCIA: sa1111: jornada720: no need to disable IRQs around sa1111_set_io
PCMCIA: sa1111: pass along sa1111_pcmcia_configure_socket() failure code
PCMCIA: soc_common: remove explicit wrprot initialization in socket drivers
PCMCIA: soc_common: remove soc_pcmcia_*_irqs functions
PCMCIA: sa11x0: h3600: convert to use new irq/gpio management
PCMCIA: sa11x0: simpad: convert to use new irq/gpio management
PCMCIA: sa11x0: shannon: convert to use new irq/gpio management
PCMCIA: sa11x0: nanoengine: convert reset handling to use GPIO subsystem
PCMCIA: sa11x0: nanoengine: convert to use new irq/gpio management
PCMCIA: sa11x0: cerf: convert reset handling to use GPIO subsystem
PCMCIA: sa11x0: cerf: convert to use new irq/gpio management
PCMCIA: sa11x0: assabet: convert to use new irq/gpio management
PCMCIA: sa1111: use new per-socket irq/gpio infrastructure
PCMCIA: pxa: convert PXA socket drivers to use new irq/gpio management
PCMCIA: soc_common: add GPIO support for card status signals
PCMCIA: soc_common: move common initialization into soc_common
Linus Torvalds [Sat, 24 Mar 2012 00:36:29 +0000 (17:36 -0700)]
Merge branch 'amba' of git://git.linaro.org/people/rmk/linux-arm
Pull #2 ARM updates from Russell King:
"Further ARM AMBA primecell updates which aren't included directly in
the previous commit. I wanted to keep these separate as they're
touching stuff outside arch/arm/."
* 'amba' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7362/1: AMBA: Add module_amba_driver() helper macro for amba_driver
ARM: 7335/1: mach-u300: do away with MMC config files
ARM: 7280/1: mmc: mmci: Cache MMCICLOCK and MMCIPOWER register
ARM: 7309/1: realview: fix unconnected interrupts on EB11MP
ARM: 7230/1: mmc: mmci: Fix PIO read for small SDIO packets
ARM: 7227/1: mmc: mmci: Prepare for SDIO before setting up DMA job
ARM: 7223/1: mmc: mmci: Fixup use of runtime PM and use autosuspend
ARM: 7221/1: mmc: mmci: Change from using legacy suspend
ARM: 7219/1: mmc: mmci: Change vdd_handler to a generic ios_handler
ARM: 7218/1: mmc: mmci: Provide option to configure bus signal direction
ARM: 7217/1: mmc: mmci: Put power register deviations in variant data
ARM: 7216/1: mmc: mmci: Do not release spinlock in request_end
ARM: 7215/1: mmc: mmci: Increase max_segs from 16 to 128
Linus Torvalds [Sat, 24 Mar 2012 00:30:49 +0000 (17:30 -0700)]
Merge branch 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm
Pull #1 ARM updates from Russell King:
"This one covers stuff which Arnd is waiting for me to push, as this is
shared between both our trees and probably other trees elsewhere.
Essentially, this contains:
- AMBA primecell device initializer updates - mostly shrinking the
size of the device declarations in platform code to something more
reasonable.
- Getting rid of the NO_IRQ crap from AMBA primecell stuff.
- Nicolas' idle cleanups. This in combination with the restart
cleanups from the last merge window results in a great many
mach/system.h files being deleted."
Yay: ~80 files, ~2000 lines deleted.
* 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm: (60 commits)
ARM: remove disable_fiq and arch_ret_to_user macros
ARM: make entry-macro.S depend on !MULTI_IRQ_HANDLER
ARM: rpc: make default fiq handler run-time installed
ARM: make arch_ret_to_user macro optional
ARM: amba: samsung: use common amba device initializers
ARM: amba: spear: use common amba device initializers
ARM: amba: nomadik: use common amba device initializers
ARM: amba: u300: use common amba device initializers
ARM: amba: lpc32xx: use common amba device initializers
ARM: amba: netx: use common amba device initializers
ARM: amba: bcmring: use common amba device initializers
ARM: amba: ep93xx: use common amba device initializers
ARM: amba: omap2: use common amba device initializers
ARM: amba: integrator: use common amba device initializers
ARM: amba: realview: get rid of private platform amba_device initializer
ARM: amba: versatile: get rid of private platform amba_device initializer
ARM: amba: vexpress: get rid of private platform amba_device initializer
ARM: amba: provide common initializers for static amba devices
ARM: amba: make use of -1 IRQs warn
ARM: amba: u300: get rid of NO_IRQ initializers
...
Linus Torvalds [Sat, 24 Mar 2012 00:24:25 +0000 (17:24 -0700)]
Merge tag 'for-3.4' of git://openrisc.net/jonas/linux
Pull OpenRISC changes for 3.4 from Jonas Bonn:
"This series for the OpenRISC architecture consists of mostly trivial
fixups. The most interesting bits of the series are:
* A fix to the timer code whereby the shortest trigger period is set
to 100 cycles; previously, it was possible to set this to 1 cycle,
but by the time the register was written, that time had already
passed and the timer interrupt would not go off until the cycle
counter had gone a full cycle.
* Allowing a device tree binary to be passed in to the kernel from
u-boot. The OpenRISC architecture has been recently merged into
upstream u-boot, so this change gets OpenRISC Linux into sync with
that project."
* tag 'for-3.4' of git://openrisc.net/jonas/linux:
OpenRISC: Remove memory_start/end prototypes
openrisc: remove semicolon from KSTK_ defs
openrisc: sanitize use of orig_gpr11
openrisc: fix virt_addr_valid
OpenRISC: Export dump_stack()
OpenRISC: Select GENERIC_ATOMIC64
openrisc: Set shortest clock event to 100 ticks
openrisc: included linux/thread_info.h twice
OpenRISC: Use set_current_blocked() and block_sigmask()
OpenRISC: Don't mask signals if we fail to setup signal stack
OpenRISC: No need to reset handler if SA_ONESHOT
OpenRISC: Don't reimplement force_sigsegv()
openrisc: enable passing of flattened device tree pointer
arch/openrisc/mm/init.c: trivial: use BUG_ON
Linus Torvalds [Sat, 24 Mar 2012 00:19:37 +0000 (17:19 -0700)]
Merge tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull miscellaneous Itanium patches from Tony Luck.
The conflicts in arch/ia64/hp/sim/simserial.c were due to patches to
simserial that had alredy been included (with lots of further cleanups)
in the serial tree.
* tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
Documentation/kernel-parameters: remove inttest parameter
[IA64] Fix ISA IRQ trigger model and polarity setting
[IA64] Fix a couple of warnings for EXPORT_SYMBOL
[IA64] Check return from device_register() in cx_device_register()
[IA64] Fix warning from machine_kexec.c
[IA64] simserial, bail out when request_irq fails
[IA64] hpsim, initialize chip for assigned irqs
[IA64] simserial, include some headers
[IA64] hpsim, fix SAL handling in fw-emu
[IA64] genirq fixup for SGI/SN
[IA64] disable interrupts when exiting from ia64_mca_cmc_int_handler()
Linus Torvalds [Sat, 24 Mar 2012 00:10:09 +0000 (17:10 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull additional x86 fixes from Peter Anvin:
- address a long-standing bug related to when a kernel-spawned process
gets a signal on an i386 kernel compiled without CONFIG_VM86.
- fix the newly introduced build warning in arch/x86/boot.
- fix a typo in the i386 system call table which affects building some
libcs.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-32: Fix endless loop when processing signals for kernel tasks
x86, boot: Correct CFLAGS for hostprogs
x86-32: Fix typo for mq_getsetattr in syscall table
Linus Torvalds [Fri, 23 Mar 2012 23:59:10 +0000 (16:59 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)
Merge second batch of patches from Andrew Morton:
- various misc things
- core kernel changes to prctl, exit, exec, init, etc.
- kernel/watchdog.c updates
- get_maintainer
- MAINTAINERS
- the backlight driver queue
- core bitops code cleanups
- the led driver queue
- some core prio_tree work
- checkpatch udpates
- largeish crc32 update
- a new poll() feature for the v4l guys
- the rtc driver queue
- fatfs
- ptrace
- signals
- kmod/usermodehelper updates
- coredump
- procfs updates
* emailed from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
seq_file: add seq_set_overflow(), seq_overflow()
proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().
procfs: speed up /proc/pid/stat, statm
procfs: add num_to_str() to speed up /proc/stat
proc: speed up /proc/stat handling
fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static
coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP
coredump: remove VM_ALWAYSDUMP flag
kmod: make __request_module() killable
kmod: introduce call_modprobe() helper
usermodehelper: ____call_usermodehelper() doesn't need do_exit()
usermodehelper: kill umh_wait, renumber UMH_* constants
usermodehelper: implement UMH_KILLABLE
usermodehelper: introduce umh_complete(sub_info)
usermodehelper: use UMH_WAIT_PROC consistently
signal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/
signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()
signal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths
signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE
Hexagon: use set_current_blocked() and block_sigmask()
...
It is undocumented but a seq_file's overflow state is indicated by
m->count == m->size. Add seq_set_overflow() and seq_overflow() to
set/check overflow status explicitly.
Based on an idea from Eric Dumazet.
[akpm@linux-foundation.org: tweak code comment] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pravin B Shelar [Fri, 23 Mar 2012 22:02:55 +0000 (15:02 -0700)]
proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().
The namespace cleanup path leaks a dentry which holds a reference count
on a network namespace. Keeping that network namespace from being freed
when the last user goes away. Leaving things like vlan devices in the
leaked network namespace.
If you use ip netns add for much real work this problem becomes apparent
pretty quickly. It light testing the problem hides because frequently
you simply don't notice the leak.
Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.
This issue exists back to 3.0.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Reported-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Cc: David Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Process accounting applications as top, ps visit some files under
/proc/<pid>. With seq_put_decimal_ull(), we can optimize /proc/<pid>/stat
and /proc/<pid>/statm files.
This patch adds
- seq_put_decimal_ll() for signed values.
- allow delimiter == 0.
- convert seq_printf() to seq_put_decimal_ull/ll in /proc/stat, statm.
Test result on a system with 2000+ procs.
Before patch:
[kamezawa@bluextal test]$ top -b -n 1 | wc -l
2223
[kamezawa@bluextal test]$ time top -b -n 1 > /dev/null
real 0m0.675s
user 0m0.044s
sys 0m0.121s
[kamezawa@bluextal test]$ time ps -elf > /dev/null
real 0m0.236s
user 0m0.056s
sys 0m0.176s
After patch:
kamezawa@bluextal ~]$ time top -b -n 1 > /dev/null
real 0m0.657s
user 0m0.052s
sys 0m0.100s
[kamezawa@bluextal ~]$ time ps -elf > /dev/null
real 0m0.198s
user 0m0.050s
sys 0m0.145s
Considering top, ps tend to scan /proc periodically, this will reduce cpu
consumption by top/ps to some extent.
This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().
On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py
real 0m0.150s
user 0m0.026s
sys 0m0.121s
== After patch ==
[root@bluextal test]# time ./stat_check.py
real 0m0.055s
user 0m0.022s
sys 0m0.030s
[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrea Righi <andrea@betterlinux.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Turner <pjt@google.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thats partly because show_stat() must be called twice since initial
buffer size is too small (4096 bytes for less than 32 possible cpus)
Fix this by :
1) Taking into account nr_irqs in the initial buffer sizing.
2) Using ksize() to allow better filling of initial buffer.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
for 'VM_NODUMP' flag. The idea is is to add a new madvise() flag:
MADV_DONTDUMP, which can be set by applications to specifically request
memory regions which should not dump core.
The specific application I have in mind is qemu: we can add a flag there
that wouldn't dump all of guest memory when qemu dumps core. This flag
might also be useful for security sensitive apps that want to absolutely
make sure that parts of memory are not dumped. To clear the flag use:
MADV_DODUMP.
[akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
[akpm@linux-foundation.org: fix up the architectures which broke] Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Avi Kivity <avi@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Baron [Fri, 23 Mar 2012 22:02:51 +0000 (15:02 -0700)]
coredump: remove VM_ALWAYSDUMP flag
The motivation for this patchset was that I was looking at a way for a
qemu-kvm process, to exclude the guest memory from its core dump, which
can be quite large. There are already a number of filter flags in
/proc/<pid>/coredump_filter, however, these allow one to specify 'types'
of kernel memory, not specific address ranges (which is needed in this
case).
Since there are no more vma flags available, the first patch eliminates
the need for the 'VM_ALWAYSDUMP' flag. The flag is used internally by
the kernel to mark vdso and vsyscall pages. However, it is simple
enough to check if a vma covers a vdso or vsyscall page without the need
for this flag.
The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
'MADV_DONTDUMP', and unset via 'MADV_DODUMP'. The core dump filters
continue to work the same as before unless 'MADV_DONTDUMP' is set on the
region.
The qemu code which implements this features is at:
In my testing the qemu core dump shrunk from 383MB -> 13MB with this
patch.
I also believe that the 'MADV_DONTDUMP' flag might be useful for
security sensitive apps, which might want to select which areas are
dumped.
This patch:
The VM_ALWAYSDUMP flag is currently used by the coredump code to
indicate that a vma is part of a vsyscall or vdso section. However, we
can determine if a vma is in one these sections by checking it against
the gate_vma and checking for a non-NULL return value from
arch_vma_name(). Thus, freeing a valuable vma bit.
Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 23 Mar 2012 22:02:50 +0000 (15:02 -0700)]
kmod: make __request_module() killable
As Tetsuo Handa pointed out, request_module() can stress the system
while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE.
The task T uses "almost all" memory, then it does something which
triggers request_module(). Say, it can simply call sys_socket(). This
in turn needs more memory and leads to OOM. oom-killer correctly
chooses T and kills it, but this can't help because it sleeps in
TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the
TIF_MEMDIE task T.
Make __request_module() killable. The only necessary change is that
call_modprobe() should kmalloc argv and module_name, they can't live in
the stack if we use UMH_KILLABLE. This memory is freed via
call_usermodehelper_freeinfo()->cleanup.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No functional changes. It is not sane to use UMH_KILLABLE with enum
umh_wait, but obviously we do not want another argument in
call_usermodehelper_* helpers. Kill this enum, use the plain int.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 23 Mar 2012 22:02:47 +0000 (15:02 -0700)]
usermodehelper: implement UMH_KILLABLE
Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC.
The caller must ensure that subprocess_info->path/etc can not go away
until call_usermodehelper_freeinfo().
call_usermodehelper_exec(UMH_KILLABLE) does
wait_for_completion_killable. If it fails, it uses
xchg(&sub_info->complete, NULL) to serialize with umh_complete() which
does the same xhcg() to access sub_info->complete.
If call_usermodehelper_exec wins, it can safely return. umh_complete()
should get NULL and call call_usermodehelper_freeinfo().
Otherwise we know that umh_complete() was already called, in this case
call_usermodehelper_exec() falls back to wait_for_completion() which
should succeed "very soon".
Note: UMH_NO_WAIT == -1 but it obviously should not be used with
UMH_KILLABLE. We delay the neccessary cleanup to simplify the back
porting.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change zap_pid_ns_processes() to use SEND_SIG_FORCED, it looks more
clear compared to SEND_SIG_NOINFO which relies on from_ancestor_ns logic
send_signal().
It is also more efficient if we need to kill a lot of tasks because it
doesn't alloc sigqueue.
While at it, add the __fatal_signal_pending(task) check as a minor
optimization.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 23 Mar 2012 22:02:45 +0000 (15:02 -0700)]
signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()
Change oom_kill_task() to use do_send_sig_info(SEND_SIG_FORCED) instead
of force_sig(SIGKILL). With the recent changes we do not need force_ to
kill the CLONE_NEWPID tasks.
And this is more correct. force_sig() can race with the exiting thread
even if oom_kill_task() checks p->mm != NULL, while
do_send_sig_info(group => true) kille the whole process.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 23 Mar 2012 22:02:44 +0000 (15:02 -0700)]
signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE
force_sig_info() and friends have the special semantics for synchronous
signals, this interface should not be used if the target is not current.
And it needs the fixes, in particular the clearing of SIGNAL_UNKILLABLE
is not exactly right.
However there are callers which have to use force_ exactly because it
clears SIGNAL_UNKILLABLE and thus it can kill the CLONE_NEWPID tasks,
although this is almost always is wrong by various reasons.
With this patch SEND_SIG_FORCED ignores SIGNAL_UNKILLABLE, like we do if
the signal comes from the ancestor namespace.
This makes the naming in prepare_signal() paths insane, fixed by the
next cleanup.
Note: this only affects SIGKILL/SIGSTOP, but this is enough for
force_sig() abusers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Fleming [Fri, 23 Mar 2012 22:02:43 +0000 (15:02 -0700)]
Hexagon: use set_current_blocked() and block_sigmask()
As described in e6fa16ab9c1e ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block is
pending in the shared queue.
Also, use the new helper function introduced in commit 5e6292c0f28f
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures. In the past some architectures got this code
wrong, so using this helper function should stop that from happening
again.
Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Fri, 23 Mar 2012 22:02:43 +0000 (15:02 -0700)]
ptrace: remove PTRACE_SEIZE_DEVEL bit
PTRACE_SEIZE code is tested and ready for production use, remove the
code which requires special bit in data argument to make PTRACE_SEIZE
work.
Strace team prepares for a new release of strace, and we would like to
ship the code which uses PTRACE_SEIZE, preferably after this change goes
into released kernel.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Pedro Alves <palves@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Fri, 23 Mar 2012 22:02:42 +0000 (15:02 -0700)]
ptrace: renumber PTRACE_EVENT_STOP so that future new options and events can match
PTRACE_EVENT_foo and PTRACE_O_TRACEfoo used to match.
New PTRACE_EVENT_STOP is the first event which has no corresponding
PTRACE_O_TRACE option. If we will ever want to add another such option,
its PTRACE_EVENT's value will collide with PTRACE_EVENT_STOP's value.
This patch changes PTRACE_EVENT_STOP value to prevent this.
While at it, added a comment - the one atop PTRACE_EVENT block, saying
"Wait extended result codes for the above trace options", is not true
for PTRACE_EVENT_STOP.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Tejun Heo <tj@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Pedro Alves <palves@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Fri, 23 Mar 2012 22:02:42 +0000 (15:02 -0700)]
ptrace: make PTRACE_SEIZE set ptrace options specified in 'data' parameter
This can be used to close a few corner cases in strace where we get
unwanted racy behavior after attach, but before we have a chance to set
options (the notorious post-execve SIGTRAP comes to mind), and removes
the need to track "did we set opts for this task" state in strace
internals.
While we are at it:
Make it possible to extend SEIZE in the future with more functionality
by passing non-zero 'addr' parameter. To that end, error out if 'addr'
is non-zero. PTRACE_ATTACH did not (and still does not) have such
check, and users (strace) do pass garbage there... let's avoid
repeating this mistake with SEIZE.
Set all task->ptrace bits in one operation - before this change, we were
adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops. This
was probably ok (not a bug), but let's be on a safer side.
Changes since v2: use (unsigned long) casts instead of (long) ones, move
PTRACE_SEIZE_DEVEL-related code to separate lines of code.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Pedro Alves <palves@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Fri, 23 Mar 2012 22:02:41 +0000 (15:02 -0700)]
ptrace: simplify PTRACE_foo constants and PTRACE_SETOPTIONS code
Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
PT_option bits contiguous and therefore makes code in
ptrace_setoptions() much simpler.
Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
instead of using explicit numeric constants, to ensure we don't mess up
relationship between bit positions and event ids.
PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is easier to use.
PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Pedro Alves <palves@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Fri, 23 Mar 2012 22:02:40 +0000 (15:02 -0700)]
ptrace: don't modify flags on PTRACE_SETOPTIONS failure
On ptrace(PTRACE_SETOPTIONS, pid, 0, <opts>), we used to set those
option bits which are known, and then fail with -EINVAL if there are
some unknown bits in <opts>.
This is inconsistent with typical error handling, which does not change
any state if input is invalid.
This patch changes PTRACE_SETOPTIONS behavior so that in this case, we
return -EINVAL and don't change any bits in task->ptrace.
It's very unlikely that there is userspace code in the wild which will
be affected by this change: it should have the form
where PTRACE_O_BOGUSOPT is a constant unknown to the kernel. But kernel
headers, naturally, don't contain any PTRACE_O_BOGUSOPTs, thus the only
way userspace can use one if it defines one itself. I can't see why
anyone would do such a thing deliberately.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Pedro Alves <palves@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 23 Mar 2012 22:02:40 +0000 (15:02 -0700)]
ptrace: don't send SIGTRAP on exec if SEIZED
ptrace_event(PTRACE_EVENT_EXEC) sends SIGTRAP if PT_TRACE_EXEC is not
set. This is because this SIGTRAP predates PTRACE_O_TRACEEXEC option,
we do not need/want this with PT_SEIZED which can set the options during
attach.
Suggested-by: Pedro Alves <palves@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Evans <scarybeasts@gmail.com> Cc: Indan Zupancic <indan@nul.nu> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Pedro Alves <palves@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 23 Mar 2012 22:02:39 +0000 (15:02 -0700)]
ptrace: the killed tracee should not enter the syscall
Another old/known problem. If the tracee is killed after it reports
syscall_entry, it starts the syscall and debugger can't control this.
This confuses the users and this creates the security problems for
ptrace jailers.
Change tracehook_report_syscall_entry() to return non-zero if killed,
this instructs syscall_trace_enter() to abort the syscall.
Reported-by: Chris Evans <scarybeasts@gmail.com> Tested-by: Indan Zupancic <indan@nul.nu> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Pedro Alves <palves@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namjae Jeon [Fri, 23 Mar 2012 22:02:39 +0000 (15:02 -0700)]
fat: fix bug in enforcing Long File Name length
Since '*outlen' is initialized to zero, it is currently possible to
create a filename of length (FAT_LFN_LEN + 1) when utf8 is not enabled.
To enforce the FAT_LFN_LEN limit, we must perform one less iteration.
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Ravishankar N <cyberax82@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namjae Jeon [Fri, 23 Mar 2012 22:02:38 +0000 (15:02 -0700)]
fat: clean up xlate_to_uni()
xlate_to_uni() is called by vfat_build_slots() with sbi->nls_io as the
final argument. nls_io can never be null at this point because the
check is already being done in fat_fill_super() wherein the mount fails
if it is null.
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Ravishankar N <cyberax82@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfram Sang [Fri, 23 Mar 2012 22:02:36 +0000 (15:02 -0700)]
rtc: ds1307: refactor chip_desc table
The chip_desc table is suboptimal. Currently it requires an entry for
every new chip type, even if it is empty. This has already been
forgotten for the ds1388. Refactor the code, so new entries are only
needed, when they chip type really needs a (non-empty) description.
Also make the table visually more appealing.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Cc: Austin Boyle <Austin.Boyle@aviatnet.com> Cc: David Anders <danders.dev@gmail.com> Cc: Alessandro Zummo <alessandro.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kevin Liu [Fri, 23 Mar 2012 22:02:36 +0000 (15:02 -0700)]
drivers/rtc/rtc-max8925.c: fix alarm->enabled mistake in max8925_rtc_read_alarm/max8925_rtc_set_alarm
max8925_rtc_read_alarm() should set alrm->enabled based on both
ALARM_IRQ_MASK and ALARM_CTRL setting. max8925_rtc_set_alarm() should
enable/disable alarm according to ALARM_CTRL reg setting.
Signed-off-by: Kevin Liu <kliu5@marvell.com> Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yong Zhang [Fri, 23 Mar 2012 22:02:34 +0000 (15:02 -0700)]
drivers/rtc: remove IRQF_DISABLED
Since commit e58aa3d2d0cc ("genirq: run irq handlers with interrupts
disabled") we run all interrupt handlers with interrupts disabled and we
even check and yell when an interrupt handler returns with interrupts
enabled - see commit b738a50a2026 ("genirq: warn when handler enables
interrupts").