Ingo Molnar [Tue, 30 Oct 2012 07:35:33 +0000 (08:35 +0100)]
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements, fixes and code move from Arnaldo Carvalho de Melo:
* Initialize 'page_size' variable in the python binding, this was sent
for perf/urgent by mistake, then when merging Ingo removed it, fixing
the problem for perf/urgent, but when perf/urgent was merged with
perf/core, where that initialization is needed, made the python
binding mmap call to fail, fix it by initializing page_size again.
* Add a browser for 'perf script' and make it available from the report
and annotate browsers. It does filtering to find the scripts that
handle events found in the perf.data file used. From Feng Tang
* Move some functions from symbol.c to more appropriate files, creating
dso.[ch] in the process, no code changes. From Jiri Olsa
* Fix mmap error output message for when perf_mmap fails and returns
!-EPERM, where the default for mmap_pages, INT_MAX, was causing a
!power of 2 error message, fix from Jiri Olsa.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
cputime: Separate irqtime accounting from generic vtime
vtime_account() doesn't have the same role in
CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_IRQ_TIME_ACCOUNTING.
In the first case it handles time accounting in any context. In
the second case it only handles irq time accounting.
So when vtime_account() is called from outside vtime_account_irq_*()
this call is pointless to CONFIG_IRQ_TIME_ACCOUNTING.
To fix the confusion, change vtime_account() to irqtime_account_irq()
in CONFIG_IRQ_TIME_ACCOUNTING. This way we ensure future account_vtime()
calls won't waste useless cycles in the irqtime APIs.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
With CONFIG_VIRT_CPU_ACCOUNTING, when vtime_account()
is called in irq entry/exit, we perform a check on the
context: if we are interrupting the idle task we
account the pending cputime to idle, otherwise account
to system time or its sub-areas: tsk->stime, hardirq time,
softirq time, ...
However this check for idle only concerns the hardirq entry
and softirq entry:
* Hardirq may directly interrupt the idle task, in which case
we need to flush the pending CPU time to idle.
* The idle task may be directly interrupted by a softirq if
it calls local_bh_enable(). There is probably no such call
in any idle task but we need to cover every case. Ksoftirqd
is not concerned because the idle time is flushed on context
switch and softirq in the end of hardirq have the idle time
already flushed from the hardirq entry.
In the other cases we always account to system/irq time:
* On hardirq exit we account the time to hardirq time.
* On softirq exit we account the time to softirq time.
To optimize this and avoid the indirect call to vtime_account()
and the checks it performs, specialize the vtime irq APIs and
only perform the check on irq entry. Irq exit can directly call
vtime_account_system().
CONFIG_IRQ_TIME_ACCOUNTING behaviour doesn't change and directly
maps to its own vtime_account() implementation. One may want
to take benefits from the new APIs to optimize irq time accounting
as well in the future.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
kvm: Directly account vtime to system on guest switch
Switching to or from guest context is done on ioctl context.
So by the time we call kvm_guest_enter() or kvm_guest_exit()
we know we are not running the idle task.
As a result, we can directly account the cputime using
vtime_account_system().
There are two good reasons to do this:
* We avoid some useless checks on guest switch. It optimizes
a bit this fast path.
* In the case of CONFIG_IRQ_TIME_ACCOUNTING, calling vtime_account()
checks for irq time to account. This is pointless since we know
we are not in an irq on guest switch. This is wasting cpu cycles
for no good reason. vtime_account_system() OTOH is a no-op in
this config option.
* We can remove the irq disable/enable around kvm guest switch in s390.
A further optimization may consist in introducing a vtime_account_guest()
that directly calls account_guest_time().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Alexander Graf <agraf@suse.de> Cc: Xiantao Zhang <xiantao.zhang@intel.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Linus Torvalds [Mon, 29 Oct 2012 15:49:25 +0000 (08:49 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes form Sage Weil:
"There are two fixes in the messenger code, one that can trigger a NULL
dereference, and one that error in refcounting (extra put). There is
also a trivial fix that in the fs client code that is triggered by NFS
reexport."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix dentry reference leak in encode_fh()
libceph: avoid NULL kref_put when osd reset races with alloc_msg
rbd: reset BACKOFF if unable to re-queue
'perf tools: Have the page size value available for all tools'
Broke the python binding because the global variable 'page_size' is
initialized on the main() routine, that is not called when using
just the python binding, causing evlist.mmap() to fail because it
expects that variable to be initialized to the system's page size.
Fix it by initializing it on the binding init routine.
Reported-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-vrvp3azmbfzexnpmkhmvtzzc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sat, 20 Oct 2012 14:33:19 +0000 (16:33 +0200)]
perf record: Fix mmap error output condition
The mmap_pages default value is not power of 2 (UINT_MAX).
Together with perf_evlist__mmap function returning error value different
from EPERM, we get misleading error message: "--mmap_pages/-m value must
be a power of two."
Fixing this by adding extra check for UINT_MAX value for this error
condition.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1350743599-4805-12-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Feng Tang [Tue, 30 Oct 2012 03:56:07 +0000 (11:56 +0800)]
perf header: Add is_perf_magic() func
With this function, other modules can basically check whether a file is
a legal perf data file by checking its first 8 bytes against all
possible perf magic numbers.
Change the function name from check_perf_magic to more meaningful
is_perf_magic as suggested by acme.
Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351569369-26732-7-git-send-email-feng.tang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Feng Tang [Tue, 30 Oct 2012 03:56:06 +0000 (11:56 +0800)]
perf hists browser: Integrate script browser into main hists browser
Integrate the script browser into "perf report" framework, users can use
function key 'r' or the drop down menu to list all perf scripts and
select one of them, just like they did for the annotation.
Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351569369-26732-6-git-send-email-feng.tang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Feng Tang [Tue, 30 Oct 2012 03:56:05 +0000 (11:56 +0800)]
perf annotate browser: Integrate script browser into annotation browser
Integrate the script browser into annotation, users can press function
key 'r' to list all perf scripts and select one of them to run that
script, the output will be shown in a separate browser.
Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351569369-26732-5-git-send-email-feng.tang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Feng Tang [Tue, 30 Oct 2012 03:56:04 +0000 (11:56 +0800)]
perf scripts browser: Add a browser for perf script
Create a script browser, so that user can check all the available
scripts for current perf data file and run them inside the main perf
report or annotation browsers, for all perf samples or for samples
belong to one thread/symbol.
Please be noted: current script browser is only for report use, and
doesn't cover the record phase, IOW it must run against one existing
perf data file.
The work flow is, users can use function key to list all the available
scripts for current perf data file in system and chose one, which will
be executed with popen("perf script -s xxx.xx",) and all the output
lines are put into one ui browser, pressing 'q' or left arrow key will
make it return to previous browser.
Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351569369-26732-4-git-send-email-feng.tang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Feng Tang [Tue, 30 Oct 2012 03:56:03 +0000 (11:56 +0800)]
perf script: Add more filter to find_scripts()
As suggested by Arnaldo, many scripts have their own usages and need
capture specific events or tracepoints, so only those scripts whose
target events match the events in current perf data file should be
listed in the script browser menu.
This patch will add the event match checking, by opening "xxx-record"
script to cherry pick out all events name and comparing them with
those inside the perf data file.
Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351569369-26732-3-git-send-email-feng.tang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Feng Tang [Tue, 30 Oct 2012 03:56:02 +0000 (11:56 +0800)]
perf tools: Add a global variable "const char *input_name"
Currently many perf commands annotate/evlist/report/script/lock etc all
support "-i" option to chose a specific perf data, and all of them
create a local "input_name" to save the file name for that perf data.
Since most of these commands need it, we can add a global variable for
it, also it can some other benefits:
1. When calling script browser inside hists/annotation browser, it needs
to know the perf data file name to run that script.
2. For further feature like runtime switching to another perf data file,
this variable can also help.
Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351569369-26732-2-git-send-email-feng.tang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sat, 27 Oct 2012 21:18:32 +0000 (23:18 +0200)]
perf tools: Move dso_* related functions into dso object
Moving dso_* related functions into dso object.
Keeping symbol loading related functions still in the symbol object as
it seems more convenient.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1351372712-21104-6-git-send-email-jolsa@redhat.com
[ committer note: Use "symbol.h" instead of <symbol.h> to make it build with O= ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jean Delvare [Sun, 28 Oct 2012 20:37:01 +0000 (21:37 +0100)]
i2c-i801: Simplify dependency towards GPIOLIB
Arbitrarily selecting GPIOLIB causes trouble on some architectures,
so don't do that. Instead, just make the optional multiplexing code
depend on CONFIG_I2C_MUX_GPIO instead of CONFIG_I2C_MUX for now. We
can revisit if the i2c-i801 driver ever supports other multiplexing
flavors.
Also make that optional code depend on DMI, as it won't do anything
without that.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Fengguang Wu <fengguang.wu@intel.com>
Linus Torvalds [Sun, 28 Oct 2012 18:14:52 +0000 (11:14 -0700)]
Merge tag 'ktest-v3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest confusion fix from Steven Rostedt:
"With the v3.7-rc2 kernel, the network cards on my target boxes were
not being brought up.
I found that the modules for the network was not being installed.
This was due to the config CONFIG_MODULES_USE_ELF_RELA that came
before CONFIG_MODULES, and confused ktest in thinking that
CONFIG_MODULES=y was not found.
Ktest needs to test all configs and not just stop if something starts
with CONFIG_MODULES."
* tag 'ktest-v3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA
Linus Torvalds [Sun, 28 Oct 2012 18:13:54 +0000 (11:13 -0700)]
Merge tag 'spi-mxs' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc
Pull minor spi MXS fixes from Mark Brown:
"These fixes are both pretty minor ones and are driver local."
* tag 'spi-mxs' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc:
spi: mxs: Terminate DMA in case of DMA timeout
spi: mxs: Assign message status after transfer finished
Linus Torvalds [Sun, 28 Oct 2012 18:12:38 +0000 (11:12 -0700)]
Merge tag 'fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull arm-soc fixes from Arnd Bergmann:
"Bug fixes for a number of ARM platforms, mostly OMAP, imx and at91.
These come a little later than I had hoped but unfortunately we had a
few of these patches cause regressions themselves and had to work out
how to deal with those in the meantime."
* tag 'fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (38 commits)
Revert "ARM i.MX25: Fix PWM per clock lookups"
ARM: versatile: fix versatile_defconfig
ARM: mvebu: update defconfig with 3.7 changes
ARM: at91: fix at91x40 build
ARM: socfpga: Fix socfpga compilation with early_printk() enabled
ARM: SPEAr: Remove unused empty files
MAINTAINERS: Add arm-soc tree entry
ARM: dts: mxs: add the "clock-names" for gpmi-nand
ARM: ux500: Correct SDI5 address and add some format changes
ARM: ux500: Specify AMBA Primecell IDs for Nomadik I2C in DT
ARM: ux500: Fix build error relating to IRQCHIP_SKIP_SET_WAKE
ARM: at91: drop duplicated config SOC_AT91SAM9 entry
ARM: at91/i2c: change id to let i2c-at91 work
ARM: at91/i2c: change id to let i2c-gpio work
ARM: at91/dts: at91sam9g20ek_common: Fix typos in buttons labels.
ARM: at91: fix external interrupt specification in board code
ARM: at91: fix external interrupts in non-DT case
ARM: at91: at91sam9g10: fix SOC type detection
ARM: at91/tc: fix typo in the DT document
ARM: AM33XX: Fix configuration of dmtimer parent clock by dmtimer driverDate:Wed, 17 Oct 2012 13:55:55 -0500
...
Mikulas Patocka [Mon, 15 Oct 2012 21:20:17 +0000 (17:20 -0400)]
Lock splice_read and splice_write functions
Functions generic_file_splice_read and generic_file_splice_write access
the pagecache directly. For block devices these functions must be locked
so that block size is not changed while they are in progress.
This patch is an additional fix for commit b87570f5d349 ("Fix a crash
when block device is read and block size is changed at the same time")
that locked aio_read, aio_write and mmap against block size change.
Mikulas Patocka [Mon, 22 Oct 2012 23:39:16 +0000 (19:39 -0400)]
percpu-rw-semaphores: use rcu_read_lock_sched
Use rcu_read_lock_sched / rcu_read_unlock_sched / synchronize_sched
instead of rcu_read_lock / rcu_read_unlock / synchronize_rcu.
This is an optimization. The RCU-protected region is very small, so
there will be no latency problems if we disable preempt in this region.
So we use rcu_read_lock_sched / rcu_read_unlock_sched that translates
to preempt_disable / preempt_disable. It is smaller (and supposedly
faster) than preemptible rcu_read_lock / rcu_read_unlock.
Mikulas Patocka [Mon, 22 Oct 2012 23:37:47 +0000 (19:37 -0400)]
percpu-rw-semaphores: use light/heavy barriers
This patch introduces new barrier pair light_mb() and heavy_mb() for
percpu rw semaphores.
This patch fixes a bug in percpu-rw-semaphores where a barrier was
missing in percpu_up_write.
This patch improves performance on the read path of
percpu-rw-semaphores: on non-x86 cpus, there was a smp_mb() in
percpu_up_read. This patch changes it to a compiler barrier and removes
the "#if defined(X86) ..." condition.
Rik van Riel [Sat, 27 Oct 2012 16:12:11 +0000 (12:12 -0400)]
x86/mm: Completely drop the TLB flush from ptep_set_access_flags()
Intel has an architectural guarantee that the TLB entry causing
a page fault gets invalidated automatically. This means
we should be able to drop the local TLB invalidation.
Because of the way other areas of the page fault code work,
chances are good that all x86 CPUs do this. However, if
someone somewhere has an x86 CPU that does not invalidate
the TLB entry causing a page fault, this one-liner should
be easy to revert - or a CPU model specific quirk could
be added to retain this optimization on most CPUs.
Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com>
[ Applied changelog massage and moved this last in the series,
to create bisection distance. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 22 Oct 2012 18:15:40 +0000 (20:15 +0200)]
sched, numa, mm: Implement slow start for working set sampling
Add a 1 second delay before starting to scan the working set of
a task and starting to balance it amongst nodes.
[ note that before the constant per task WSS sampling rate patch
the initial scan would happen much later still, in effect that
patch caused this regression. ]
The theory is that short-run tasks benefit very little from NUMA
placement: they come and go, and they better stick to the node
they were started on. As tasks mature and rebalance to other CPUs
and nodes, so does their NUMA placement have to change and so
does it start to matter more and more.
In practice this change fixes an observable kbuild regression:
# [ a perf stat --null --repeat 10 test of ten bzImage builds to /dev/shm ]
!NUMA:
45.291088843 seconds time elapsed ( +- 0.40% )
45.154231752 seconds time elapsed ( +- 0.36% )
+NUMA, no slow start:
46.172308123 seconds time elapsed ( +- 0.30% )
46.343168745 seconds time elapsed ( +- 0.25% )
The implementation is simple and straightforward, most of the patch
deals with adding the /proc/sys/kernel/sched_numa_scan_delay_ms tunable
knob.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/n/tip-vn7p3ynbwqt3qqewhdlvjltc@git.kernel.org
[ Wrote the changelog, ran measurements, tuned the default. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Sun, 14 Oct 2012 14:59:13 +0000 (16:59 +0200)]
sched, numa, mm: Implement constant, per task Working Set Sampling (WSS) rate
Previously, to probe the working set of a task, we'd use
a very simple and crude method: mark all of its address
space PROT_NONE.
That method has various (obvious) disadvantages:
- it samples the working set at dissimilar rates,
giving some tasks a sampling quality advantage
over others.
- creates performance problems for tasks with very
large working sets
- over-samples processes with large address spaces but
which only very rarely execute
Improve that method by keeping a rotating offset into the
address space that marks the current position of the scan,
and advance it by a constant rate (in a CPU cycles execution
proportional manner). If the offset reaches the last mapped
address of the mm then it then it starts over at the first
address.
The per-task nature of the working set sampling functionality
in this tree allows such constant rate, per task,
execution-weight proportional sampling of the working set,
with an adaptive sampling interval/frequency that goes from
once per 100 msecs up to just once per 1.6 seconds.
The current sampling volume is 256 MB per interval.
As tasks mature and converge their working set, so does the
sampling rate slow down to just a trickle, 256 MB per 1.6
seconds of CPU time executed.
This, beyond being adaptive, also rate-limits rarely
executing systems and does not over-sample on overloaded
systems.
[ In AutoNUMA speak, this patch deals with the effective sampling
rate of the 'hinting page fault'. AutoNUMA's scanning is
currently rate-limited, but it is also fundamentally
single-threaded, executing in the knuma_scand kernel thread,
so the limit in AutoNUMA is global and does not scale up with
the number of CPUs, nor does it scan tasks in an execution
proportional manner.
So the idea of rate-limiting the scanning was first implemented
in the AutoNUMA tree via a global rate limit. This patch goes
beyond that by implementing an execution rate proportional
working set sampling rate that is not implemented via a single
global scanning daemon. ]
[ Dan Carpenter pointed out a possible NULL pointer dereference in the
first version of this patch. ]
Based-on-idea-by: Andrea Arcangeli <aarcange@redhat.com> Bug-Found-By: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/n/tip-wt5b48o2226ec63784i58s3j@git.kernel.org
[ Wrote changelog and fixed bug. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rik van Riel [Thu, 18 Oct 2012 21:19:28 +0000 (17:19 -0400)]
sched, numa, mm: Add credits for NUMA placement
The NUMA placement code has been rewritten several times, but
the basic ideas took a lot of work to develop. The people who
put in the work deserve credit for it. Thanks Andrea & Peter :)
[ The Documentation/scheduler/numa-problem.txt file should
probably be rewritten once we figure out the final details of
what the NUMA code needs to do, and why. ]
Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: aarcange@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20121018171928.24d06af4@cuia.bos.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
----
This is against tip.git numa/core
Peter Zijlstra [Tue, 9 Oct 2012 11:46:22 +0000 (13:46 +0200)]
sched, numa, mm: Add fault driven placement and migration policy
As per the problem/design document Documentation/scheduler/numa-problem.txt
implement 3ac & 4.
( A pure 3a was found too unstable, I did briefly try 3bc
but found no significant improvement. )
Implement a per-task memory placement scheme relying on a regular
PROT_NONE 'migration' fault to scan the memory space of the procress
and uses a two stage migration scheme to reduce the invluence of
unlikely usage relations.
It relies on the assumption that the compute part is tied to a
paticular task and builds a task<->page relation set to model the
compute<->data relation.
In the previous patch we made memory migrate towards where the task
is running, here we select the node on which most memory is located
as the preferred node to run on.
This creates a feed-back control loop between trying to schedule a
task on a node and migrating memory towards the node the task is
scheduled on.
Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Rik van Riel <riel@redhat.com> Fixes-by: David Rientjes <rientjes@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8ejt0ioj62k5ruf5zd2ix9zu@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 13 Aug 2012 13:22:20 +0000 (15:22 +0200)]
sched, numa, mm: Introduce last_nid in the pageframe
Introduce a per-page last_nid field, fold this into the struct
page::flags field whenever possible.
The unlikely/rare 32bit NUMA configs will likely grow the page-frame.
Completely dropping 32bit support for CONFIG_SCHED_NUMA would simplify
things, but it would also remove the warning if we grow enough 64bit
only page-flags to push the last-nid out.
Suggested-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: http://lkml.kernel.org/n/tip-0uois4f9skfw9mwyk1yoy0jq@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Sat, 3 Mar 2012 15:56:25 +0000 (16:56 +0100)]
sched, numa, mm: Implement home-node awareness
Implement home node preference in the scheduler's load-balancer.
This is done in four pieces:
- task_numa_hot(); make it harder to migrate tasks away from their
home-node, controlled using the NUMA_HOT feature flag.
- select_task_rq_fair(); prefer placing the task in their home-node,
controlled using the NUMA_TTWU_BIAS feature flag. Disabled by
default for we found this to be far too agressive.
- load_balance(); during the regular pull load-balance pass, try
pulling tasks that are on the wrong node first with a preference
of moving them nearer to their home-node through task_numa_hot(),
controlled through the NUMA_PULL feature flag.
- load_balance(); when the balancer finds no imbalance, introduce
some such that it still prefers to move tasks towards their
home-node, using active load-balance if needed, controlled through
the NUMA_PULL_BIAS feature flag.
In particular, only introduce this BIAS if the system is otherwise
properly (weight) balanced and we either have an offnode or !numa
task to trade for it.
In order to easily find off-node tasks, split the per-cpu task list
into two parts.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Turner <pjt@google.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-g0gzzzjrvrzxl6kgwnil1e97@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Sat, 3 Mar 2012 16:05:16 +0000 (17:05 +0100)]
sched, numa, mm: Introduce tsk_home_node()
Introduce the home-node concept for tasks. In order to keep memory
locality we need to have a something to stay local to, we define the
home-node of a task as the node we prefer to allocate memory from and
prefer to execute on.
These are no hard guarantees, merely soft preferences. This allows for
optimal resource usage, we can run a task away from the home-node, the
remote memory hit -- while expensive -- is less expensive than not
running at all, or very little, due to severe cpu overload.
Similarly, we can allocate memory from another node if our home-node
is depleted, again, some memory is better than no memory.
This patch merely introduces the basic infrastructure, all policy
comes later.
NOTE: we introduce the concept of EMBEDDED_NUMA, these are
architectures where the memory access cost doesn't depend on the cpu
but purely on the physical address -- embedded boards with cheap
(slow) and expensive (fast) memory banks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-ii8j8cp87cgctecfqp2ib6rn@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 17 Jul 2012 20:54:51 +0000 (22:54 +0200)]
mm/mpol: Use special PROT_NONE to migrate pages
Combine our previous PROT_NONE, mpol_misplaced and
migrate_misplaced_page() pieces into an effective migrate on fault
scheme.
Note that (on x86) we rely on PROT_NONE pages being !present and avoid
the TLB flush from try_to_unmap(TTU_MIGRATION). This greatly improves
the page-migration performance.
Suggested-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: http://lkml.kernel.org/n/tip-e98gyl8kr9jzooh2s4piuils@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
numa, mm: Support NUMA hinting page faults from gup/gup_fast
Introduce FOLL_NUMA to tell follow_page to check
pte/pmd_numa. get_user_pages must use FOLL_NUMA, and it's safe to do
so because it always invokes handle_mm_fault and retries the
follow_page later.
KVM secondary MMU page faults will trigger the NUMA hinting page
faults through gup_fast -> get_user_pages -> follow_page ->
handle_mm_fault.
Other follow_page callers like KSM should not use FOLL_NUMA, or they
would fail to get the pages if they use follow_page instead of
get_user_pages.
[ This patch was picked up from the AutoNUMA tree. ]
Originally-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com>
[ ported to this tree. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lee Schermerhorn [Thu, 12 Jan 2012 11:37:17 +0000 (12:37 +0100)]
mm/mpol: Add MPOL_MF_LAZY
This patch adds another mbind() flag to request "lazy migration". The
flag, MPOL_MF_LAZY, modifies MPOL_MF_MOVE* such that the selected
pages are marked PROT_NONE. The pages will be migrated in the fault
path on "first touch", if the policy dictates at that time.
"Lazy Migration" will allow testing of migrate-on-fault via mbind().
Also allows applications to specify that only subsequently touched
pages be migrated to obey new policy, instead of all pages in range.
This can be useful for multi-threaded applications working on a
large shared data area that is initialized by an initial thread
resulting in all pages on one [or a few, if overflowed] nodes.
After PROT_NONE, the pages in regions assigned to the worker threads
will be automatically migrated local to the threads on 1st touch.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>
[ nearly complete rewrite.. ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-7rsodo9x8zvm5awru5o7zo0y@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 17 Jul 2012 16:25:14 +0000 (18:25 +0200)]
mm/mpol: Create special PROT_NONE infrastructure
In order to facilitate a lazy -- fault driven -- migration of pages,
create a special transient PROT_NONE variant, we can then use the
'spurious' protection faults to drive our migrations from.
Pages that already had an effective PROT_NONE mapping will not
be detected to generate these 'spuriuos' faults for the simple reason
that we cannot distinguish them on their protection bits, see
pte_numa().
This isn't a problem since PROT_NONE (and possible PROT_WRITE with
dirty tracking) aren't used or are rare enough for us to not care
about their placement.
Suggested-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: http://lkml.kernel.org/n/tip-0g5k80y4df8l83lha9j75xph@git.kernel.org
[ fixed various cross-arch and THP/!THP details ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lee Schermerhorn [Wed, 11 Jan 2012 14:48:13 +0000 (15:48 +0100)]
mm/mpol: Check for misplaced page
This patch provides a new function to test whether a page resides
on a node that is appropriate for the mempolicy for the vma and
address where the page is supposed to be mapped. This involves
looking up the node where the page belongs. So, the function
returns that node so that it may be used to allocated the page
without consulting the policy again.
A subsequent patch will call this function from the fault path.
Because of this, I don't want to go ahead and allocate the page, e.g.,
via alloc_page_vma() only to have to free it if it has the correct
policy. So, I just mimic the alloc_page_vma() node computation
logic--sort of.
Note: we could use this function to implement a MPOL_MF_STRICT
behavior when migrating pages to match mbind() mempolicy--e.g.,
to ensure that pages in an interleaved range are reinterleaved
rather than left where they are when they reside on any page in
the interleave nodemask.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>
[ Added MPOL_F_LAZY to trigger migrate-on-fault;
simplified code now that we don't have to bother
with special crap for interleaved ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-z3mgep4tgrc08o07vl1ahb2m@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lee Schermerhorn [Mon, 16 Jan 2012 13:43:29 +0000 (14:43 +0100)]
mm/mpol: Add MPOL_MF_NOOP
This patch augments the MPOL_MF_LAZY feature by adding a "NOOP" policy
to mbind(). When the NOOP policy is used with the 'MOVE and 'LAZY
flags, mbind() will map the pages PROT_NONE so that they will be
migrated on the next touch.
This allows an application to prepare for a new phase of operation
where different regions of shared storage will be assigned to
worker threads, w/o changing policy. Note that we could just use
"default" policy in this case. However, this also allows an
application to request that pages be migrated, only if necessary,
to follow any arbitrary policy that might currently apply to a
range of pages, without knowing the policy, or without specifying
multiple mbind()s for ranges with different policies.
[ Bug in early version of mpol_parse_str() reported by Fengguang Wu. ]
Bug-Reported-by: Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@kernel.org>
mm/pgprot: Move the pgprot_modify() fallback definition to mm.h
pgprot_modify() is available on x86, but on other architectures it only
gets defined in mm/mprotect.c - breaking the build if anything outside
of mprotect.c tries to make use of this function.
Move it to the generic pgprot area in mm.h, so that an upcoming patch
can make use of it.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rik van Riel <riel@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-nfvarGMj9gjavowroorkizb4@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rik van Riel [Tue, 9 Oct 2012 13:31:59 +0000 (15:31 +0200)]
mm: Only flush the TLB when clearing an accessible pte
If ptep_clear_flush() is called to clear a page table entry that is
accessible anyway by the CPU, eg. a _PAGE_PROTNONE page table entry,
there is no need to flush the TLB on remote CPUs.
Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-vm3rkzevahelwhejx5uwm8ex@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rik van Riel [Tue, 9 Oct 2012 13:31:12 +0000 (15:31 +0200)]
x86/mm: Introduce pte_accessible()
We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that
the pte is associated with a page.
However, for TLB flushing purposes, we would like to know whether the pte
points to an actually accessible page. This allows us to skip remote TLB
flushes for pages that are not actually accessible.
Fill in this method for x86 and provide a safe (but slower) method
on other architectures.
Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixed-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org
[ Added Linus's review fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 12 Oct 2012 10:13:10 +0000 (12:13 +0200)]
sched, numa, mm: Describe the NUMA scheduling problem formally
This is probably a first: formal description of a complex high-level
computing problem, within the kernel source.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mike Galbraith <efault@gmx.de>
Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/n/tip-mmnlpupoetcatimvjEld16Pb@git.kernel.org
[ Next step: generate the kernel source from such formal descriptions and retire to a tropical island! ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rik van Riel [Sat, 27 Oct 2012 16:12:11 +0000 (12:12 -0400)]
mm/generic: Only flush the local TLB in ptep_set_access_flags()
The function ptep_set_access_flags() is only ever used to upgrade
access permissions to a page - i.e. they make it less restrictive.
That means the only negative side effect of not flushing remote
TLBs in this function is that other CPUs may incur spurious page
faults, if they happen to access the same address, and still have
a PTE with the old permissions cached in their TLB caches.
Having another CPU maybe incur a spurious page fault is faster
than always incurring the cost of a remote TLB flush, so replace
the remote TLB flush with a purely local one.
This should be safe on every architecture that correctly
implements flush_tlb_fix_spurious_fault() to actually invalidate
the local TLB entry that caused a page fault, as well as on
architectures where the hardware invalidates TLB entries that
cause page faults.
In the unlikely event that you are hitting what appears to be
an infinite loop of page faults, and 'git bisect' took you to
this changeset, your architecture needs to implement
flush_tlb_fix_spurious_fault() to actually flush the TLB entry.
Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com>
[ Changelog massage. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Fri, 26 Oct 2012 20:30:06 +0000 (13:30 -0700)]
perf tools: Move parse_events error printing to parse_events_options
The callers of parse_events usually have their own error handling. Move
the fprintf for a bad event to parse_events_options, which is the only
one who should need it.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1351283415-13170-25-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
arch/arm/mach-imx/clk-imx25.c: In function 'mx25_clocks_init':
arch/arm/mach-imx/clk-imx25.c:206:26: error: 'pwm_ipg_per' undeclared (first use in this function)
arch/arm/mach-imx/clk-imx25.c:206:26: note: each undeclared identifier is reported only once for each function it appears in
Sascha Hauer explains:
> There are several gates missing in clk-imx25.c. I have a patch which
> adds support for them and I seem to have missed that the above depends
> on it.
Arnd Bergmann [Fri, 26 Oct 2012 21:06:43 +0000 (23:06 +0200)]
ARM: versatile: fix versatile_defconfig
With the introduction of CONFIG_ARCH_MULTIPLATFORM, versatile is
no longer the default platform, so we need to enable
CONFIG_ARCH_VERSATILE explicitly in order for that to be selected
rather than the multiplatform configuration.
Thomas Petazzoni [Tue, 23 Oct 2012 08:17:49 +0000 (10:17 +0200)]
ARM: mvebu: update defconfig with 3.7 changes
The split of 370 and XP into two Kconfig options and the multiplatform
kernel support has changed a few Kconfig symbols, so let's update the
mvebu_defconfig file with the latest changes.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Fri, 26 Oct 2012 20:49:09 +0000 (22:49 +0200)]
ARM: at91: fix at91x40 build
patch 738a0fd7 "ARM: at91: fix external interrupts in non-DT case"
fixed a run-time error on some at91 platforms but did not apply
the same change to at91x40, which now doesn't build.
This changes at91x40 in the same way that the other platforms
were changed.
Pull networking fixes from David Miller:
"This is what we usually expect at this stage of the game, lots of
little things, mostly in drivers. With the occasional 'oops didn't
mean to do that' kind of regressions in the core code."
1) Uninitialized data in __ip_vs_get_timeouts(), from Arnd Bergmann
2) Reject invalid ACK sequences in Fast Open sockets, from Jerry Chu.
3) Lost error code on return from _rtl_usb_receive(), from Christian
Lamparter.
4) Fix reset resume on USB rt2x00, from Stanislaw Gruszka.
5) Release resources on error in pch_gbe driver, from Veaceslav Falico.
6) Default hop limit not set correctly in ip6_template_metrics[], fix
from Li RongQing.
7) Gianfar PTP code requests wrong kind of resource during probe, fix
from Wei Yang.
8) Fix VHOST net driver on big-endian, from Michael S Tsirkin.
9) Mallenox driver bug fixes from Jack Morgenstein, Or Gerlitz, Moni
Shoua, Dotan Barak, and Uri Habusha.
10) usbnet leaks memory on TX path, fix from Hemant Kumar.
11) Use socket state test, rather than presence of FIN bit packet, to
determine FIONREAD/SIOCINQ value. Fix from Eric Dumazet.
12) Fix cxgb4 build failure, from Vipul Pandya.
13) Provide a SYN_DATA_ACKED state to complement SYN_FASTOPEN in socket
info dumps. From Yuchung Cheng.
14) Fix leak of security path in kfree_skb_partial(). Fix from Eric
Dumazet.
15) Handle RX FIFO overflows more resiliently in pch_gbe driver, from
Veaceslav Falico.
16) Fix MAINTAINERS file pattern for networking drivers, from Jean
Delvare.
17) Add iPhone5 IDs to IPHETH driver, from Jay Purohit.
18) VLAN device type change restriction is too strict, and should not
trigger for the automatically generated vlan0 device. Fix from Jiri
Pirko.
19) Make PMTU/redirect flushing work properly again in ipv4, from
Steffen Klassert.
20) Fix memory corruptions by using kfree_rcu() in netlink_release().
From Eric Dumazet.
21) More qmi_wwan device IDs, from Bjørn Mork.
22) Fix unintentional change of SNAT/DNAT hooks in generic NAT
infrastructure, from Elison Niven.
23) Fix 3.6.x regression in xt_TEE netfilter module, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
tilegx: fix some issues in the SW TSO support
qmi_wwan/cdc_ether: move Novatel 551 and E362 to qmi_wwan
net: usb: Fix memory leak on Tx data path
net/mlx4_core: Unmap UAR also in the case of error flow
net/mlx4_en: Don't use vlan tag value as an indication for vlan presence
net/mlx4_en: Fix double-release-range in tx-rings
bas_gigaset: fix pre_reset handling
vhost: fix mergeable bufs on BE hosts
gianfar_ptp: use iomem, not ioports resource tree in probe
ipv6: Set default hoplimit as zero.
NET_VENDOR_TI: make available for am33xx as well
pch_gbe: fix error handling in pch_gbe_up()
b43: Fix oops on unload when firmware not found
mwifiex: clean up scan state on error
mwifiex: return -EBUSY if specific scan request cannot be honored
brcmfmac: fix potential NULL dereference
Revert "ath9k_hw: Updated AR9003 tx gain table for 5GHz"
ath9k_htc: Add PID/VID for a Ubiquiti WiFiStation
rt2x00: usb: fix reset resume
rtlwifi: pass rx setup error code to caller
...
Linus Torvalds [Fri, 26 Oct 2012 21:59:01 +0000 (14:59 -0700)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine fixes from Vinod Koul:
"Three fixes for slave dmanegine.
Two are for typo omissions in sifr dmaengine driver and the last one
is for the imx driver fixing a missing unlock"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: sirf: fix a typo in moving running dma_desc to active queue
dmaengine: sirf: fix a typo in dma_prep_interleaved
dmaengine: imx-dma: fix missing unlock on error in imxdma_xfer_desc()
Linus Torvalds [Fri, 26 Oct 2012 21:23:35 +0000 (14:23 -0700)]
Merge tag 'pm+acpi-for-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael J Wysocki:
- Fix for a memory leak in acpi_bind_one() from Jesper Juhl.
- Fix for an error code path memory leak in pm_genpd_attach_cpuidle()
from Jonghwan Choi.
- Fix for smp_processor_id() usage in preemptible code in powernow-k8
from Andreas Herrmann.
- Fix for a suspend-related memory leak in cpufreq stats from Xiaobing
Tu.
- Freezer fix for failure to clear PF_NOFREEZE along with PF_KTHREAD in
flush_old_exec() from Oleg Nesterov.
- acpi_processor_notify() fix from Alan Cox.
* tag 'pm+acpi-for-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: missing break
freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD
Fix memory leak in cpufreq stats.
cpufreq / powernow-k8: Remove usage of smp_processor_id() in preemptible code
PM / Domains: Fix memory leak on error path in pm_genpd_attach_cpuidle
ACPI: Fix memory leak in acpi_bind_one()
Linus Torvalds [Fri, 26 Oct 2012 20:46:41 +0000 (13:46 -0700)]
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband fixes from Roland Dreier:
"Small batch of fixes for 3.7:
- Fix crash in error path in cxgb4
- Fix build error on 32 bits in mlx4
- Fix SR-IOV bugs in mlx4"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Perform correct resource cleanup if mlx4_QUERY_ADAPTER() fails
mlx4_core: Remove annoying debug messages from SR-IOV flow
RDMA/cxgb4: Don't free chunk that we have failed to allocate
IB/mlx4: Synchronize cleanup of MCGs in MCG paravirtualization
IB/mlx4: Fix QP1 P_Key processing in the Primary Physical Function (PPF)
IB/mlx4: Fix build error on platforms where UL is not 64 bits
Linus Torvalds [Fri, 26 Oct 2012 17:26:36 +0000 (10:26 -0700)]
Merge tag 'usb-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are a bunch of USB fixes for the 3.7-rc tree.
There's a lot of small USB serial driver fixes, and one larger one
(the mos7840 driver changes are mostly just moving code around to fix
problems.) Thanks to Johan Hovold for finding the problems and fixing
them all up.
Other than those, there is the usual new device ids, xhci bugfixes,
and gadget driver fixes, nothing out of the ordinary.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'usb-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (49 commits)
xhci: trivial: Remove assigned but unused ep_ctx.
xhci: trivial: Remove assigned but unused slot_ctx.
xhci: Fix missing break in xhci_evaluate_context_result.
xhci: Fix potential NULL ptr deref in command cancellation.
ehci: Add yet-another Lucid nohandoff pci quirk
ehci: fix Lucid nohandoff pci quirk to be more generic with BIOS versions
USB: mos7840: fix port_probe flow
USB: mos7840: fix port-data memory leak
USB: mos7840: remove invalid disconnect handling
USB: mos7840: remove NULL-urb submission
USB: qcserial: fix interface-data memory leak in error path
USB: option: fix interface-data memory leak in error path
USB: ipw: fix interface-data memory leak in error path
USB: mos7840: fix port-device leak in error path
USB: mos7840: fix urb leak at release
USB: sierra: fix port-data memory leak
USB: sierra: fix memory leak in probe error path
USB: sierra: fix memory leak in attach error path
USB: usb-wwan: fix multiple memory leaks in error paths
USB: keyspan: fix NULL-pointer dereferences and memory leaks
...
Linus Torvalds [Fri, 26 Oct 2012 17:25:31 +0000 (10:25 -0700)]
Merge tag 'staging-3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg Kroah-Hartman:
"Here are some staging driver fixes for your 3.7-rc tree.
Nothing major here, a number of iio driver fixups that were causing
problems, some comedi driver bugfixes, and a bunch of tidspbridge
warning squashing and other regressions fixed from the 3.6 release.
All have been in the linux-next releases for a bit.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'staging-3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (32 commits)
staging: tidspbridge: delete unused mmu functions
staging: tidspbridge: ioremap physical address of the stack segment in shm
staging: tidspbridge: ioremap dsp sync addr
staging: tidspbridge: change type to __iomem for per and core addresses
staging: tidspbridge: drop const from custom mmu implementation
staging: tidspbridge: request the right irq for mmu
staging: ipack: add missing include (implicit declaration of function 'kfree')
staging: ramster: depends on NET
staging: omapdrm: fix allocation size for page addresses array
staging: zram: Fix handling of incompressible pages
Staging: android: binder: Allow using highmem for binder buffers
Staging: android: binder: Fix memory leak on thread/process exit
staging: comedi: ni_labpc: fix possible NULL deref during detach
staging: comedi: das08: fix possible NULL deref during detach
staging: comedi: amplc_pc263: fix possible NULL deref during detach
staging: comedi: amplc_pc236: fix possible NULL deref during detach
staging: comedi: amplc_pc236: fix invalid register access during detach
staging: comedi: amplc_dio200: fix possible NULL deref during detach
staging: comedi: 8255_pci: fix possible NULL deref during detach
staging: comedi: ni_daq_700: fix dio subdevice regression
...