git.karo-electronics.de Git - karo-tx-linux.git/log

perf/x86: Check all MSRs before passing hw check

check_hw_exists() has a number of checks which go to two exit
paths: msr_fail and bios_fail.  Checks classified as msr_fail
will cause check_hw_exists() to return false, causing the PMU
not to be used; bios_fail checks will only cause a warning to be
printed, but will return true.

The problem is that if there are both msr failures and bios
failures, and the routine hits a bios_fail check first, it will
exit early and return true, not finishing the rest of the msr
checks.  If those msrs are in fact broken, it will cause them to
be used erroneously.

In the case of a Xen PV VM, the guest OS has read access to all
the MSRs, but write access is white-listed to supported
features.  Writes to unsupported MSRs have no effect.  The PMU
MSRs are not (typically) supported, because they are expensive
to save and restore on a VM context switch.  One of the
"msr_fail" checks is supposed to detect this circumstance (ether
for Xen or KVM) and disable the harware PMU.

However, on one of my AMD boxen, there is (apparently) a broken
BIOS which triggers one of the bios_fail checks.  In particular,
MSR_K7_EVNTSEL0 has the ARCH_PERFMON_EVENTSEL_ENABLE bit set.
The guest kernel detects this because it has read access to all
MSRs, and causes it to skip the rest of the checks and try to
use the non-existent hardware PMU.  This minimally causes a lot
of useless instruction emulation and Xen console spam; it may
cause other issues with the watchdog as well.

This changset causes check_hw_exists() to go through all of the
msr checks, failing and returning false if any of them fail.
This makes sure that a guest running under Xen without a virtual
PMU will detect that there is no functioning PMU and not attempt
to use it.

This problem affects kernels as far back as 3.2, and should thus
be considered for backport.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Link: http://lkml.kernel.org/r/1365000388-32448-1-git-send-email-george.dunlap@eu.citrix.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/amd: Add support for AMD NB and L2I "uncore" counters

Add support for AMD Family 15h [and above] northbridge
performance counters. MSRs 0xc0010240 ~ 0xc0010247 are shared
across all cores that share a common northbridge.

Add support for AMD Family 16h L2 performance counters. MSRs
0xc0010230 ~ 0xc0010237 are shared across all cores that share a
common L2 cache.

We do not enable counter overflow interrupts. Sampling mode and
per-thread events are not supported.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130419213428.GA8229@jshin-Toonie
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/intel: Add Ivy Bridge-EP uncore support

The uncore subsystem in Ivy Bridge-EP is similar to Sandy
Bridge-EP. There are some differences in config register
encoding and pci device IDs. The Ivy Bridge-EP uncore also
supports a few new events.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-4-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management

The existing code assumes all Cbox and PCU events are using
filter, but actually the filter is event specific. Furthermore
the filter is sub-divided into multiple fields which are used
by different events.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-3-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Stephane Eranian <eranian@google.com>

perf/x86: Avoid kfree() in CPU_{STARTING,DYING}

On -rt kfree() can schedule, but CPU_{STARTING,DYING} should be
atomic. So use a list to defer kfree until CPU_{ONLINE,DEAD}.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'perf/urgent' into perf/core

Conflicts:
arch/x86/kernel/cpu/perf_event_intel.c

Merge in the latest fixes before applying new patches, resolve the conflict.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86: Fix offcore_rsp valid mask for SNB/IVB

The valid mask for both offcore_response_0 and
offcore_response_1 was wrong for SNB/SNB-EP,
IVB/IVB-EP. It was possible to write to
reserved bit and cause a GP fault crashing
the kernel.

This patch fixes the problem by correctly marking the
reserved bits in the valid mask for all the processors
mentioned above.

A distinction between desktop and server parts is introduced
because bits 24-30 are only available on the server parts.

This version of the patch is just a rebase to perf/urgent tree
and should apply to older kernels as well.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: security@kernel.org
Cc: ak@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core

Pull uprobes updates from Oleg Nesterov:

- "uretprobes" - an optimization to uprobes, like kretprobes are an optimization
to kprobes. "perf probe -x file sym%return" now works like kretprobes.

- PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty

perf_trace_buf_prepare() + perf_trace_buf_submit() make no sense
if this task/CPU has no active counters. Change uprobe_perf_print()
to return if hlist_empty(call->perf_events).

Note: this is not uprobe-specific, we can change other users too.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

perf: Treat attr.config as u64 in perf_swevent_init()

Trinity discovered that we fail to check all 64 bits of
attr.config passed by user space, resulting to out-of-bounds
access of the perf_swevent_enabled array in
sw_perf_event_destroy().

Introduced in commit b0a873ebb ("perf: Register PMU
implementations").

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/1365882554-30259-1-git-send-email-tt.rantala@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Linux 3.9-rc7

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set
  x86/mm/cpa/selftest: Fix false positive in CPA self test
  x86/mm/cpa: Convert noop to functional fix
  x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Misc fixlets"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cputime: Fix accounting on multi-threaded processes
  sched/debug: Fix sd->*_idx limit range avoiding overflow
  sched_clock: Prevent 64bit inatomicity on 32bit systems
  sched: Convert BUG_ON()s in try_to_wake_up_local() to WARN_ON_ONCE()s

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Misc fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix error return code
  ftrace: Fix strncpy() use, use strlcpy() instead of strncpy()
  perf: Fix strncpy() use, use strlcpy() instead of strncpy()
  perf: Fix strncpy() use, always make sure it's NUL terminated
  perf: Fix ring_buffer perf_output_space() boundary calculation
  perf/x86: Fix uninitialized pt_regs in intel_pmu_drain_bts_buffer()

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"One fix for a hotplug locking regressions, and one fix for an oops if
  you unplug the monitor at an inopportune moment on the udl device."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/fb-helper: Fix locking in drm_fb_helper_hotplug_event
  udl: handle EDID failure properly.

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu fix from Greg Ungerer:
"This contains only a single compilation fix for ColdFire m68k targets
that use local non-GPIOLIB support."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: define a local gpio_request_one() function

Merge git://www.linux-watchdog.org/linux-watchdog

Pull watchdog fix from Wim Van Sebroeck:
"It will fix compile errors for the at91rm9200_wdt driver"

* git://www.linux-watchdog.org/linux-watchdog:
watchdog: Revert the AT91RM9200_WATCHDOG dependency

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull one more btrfs fix from Chris Mason:
"This has a recent fix from Josef for our tree log replay code.  It
  fixes problems where the inode counter for the number of bytes in the
  file wasn't getting updated properly during fsync replay.

  The commit did get rebased this morning, but it was only to clean up
  the subject line.  The code hasn't changed."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: make sure nbytes are right after log replay

Merge tag 'trace-fixes-v3.9-rc-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fixes from Steven Rostedt:
"Namhyung Kim found and fixed a bug that can crash the kernel by simply
  doing: echo 1234 | tee -a /sys/kernel/debug/tracing/set_ftrace_pid

  Luckily, this can only be done by root, but still is a nasty bug."

* tag 'trace-fixes-v3.9-rc-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Move ftrace_filter_lseek out of CONFIG_DYNAMIC_FTRACE section
  tracing: Fix possible NULL pointer dereferences

Add file_ns_capable() helper function for open-time capability checking

Nothing is using it yet, but this will allow us to delay the open-time
checks to use time, without breaking the normal UNIX permission
semantics where permissions are determined by the opener (and the file
descriptor can then be passed to a different process, or the process can
drop capabilities).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

watchdog: Revert the AT91RM9200_WATCHDOG dependency

Compiling the at91rm9200_wdt.c driver without at91rm9200
support was leading to several errors:

drivers/built-in.o: In function `at91_wdt_close':
at91_adc.c:(.text+0xc9fe4): undefined reference to `at91_st_base'
drivers/built-in.o: In function `at91_wdt_write':
at91_adc.c:(.text+0xca004): undefined reference to `at91_st_base'
drivers/built-in.o: In function `at91wdt_shutdown':
at91_adc.c:(.text+0xca01c): undefined reference to `at91_st_base'
drivers/built-in.o: In function `at91wdt_suspend':
at91_adc.c:(.text+0xca038): undefined reference to `at91_st_base'
drivers/built-in.o: In function `at91_wdt_open':
at91_adc.c:(.text+0xca0cc): undefined reference to `at91_st_base'
drivers/built-in.o:at91_adc.c:(.text+0xca2c8): more undefined references to
`at91_st_base' follow

So, reverting the modification of the "depends" Kconfig line
introduced by patch a6a1bcd37 (watchdog: at91rm9200: add DT support)
seems to be the good solution.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

vfs: Revert spurious fix to spinning prevention in prune_icache_sb

Revert commit 62a3ddef6181 ("vfs: fix spinning prevention in prune_icache_sb").

This commit doesn't look right: since we are looking at the tail of the
list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
it back at the head of the list instead of the tail, otherwise we will
keep spinning on it.

Discovered when investigating why prune_icache_sb came top in perf
reports of a swapping load.

Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kobject: fix kset_find_obj() race with concurrent last kobject_put()

Anatol Pomozov identified a race condition that hits module unloading
and re-loading.  To quote Anatol:

"This is a race codition that exists between kset_find_obj() and
  kobject_put().  kset_find_obj() might return kobject that has refcount
  equal to 0 if this kobject is freeing by kobject_put() in other
  thread.

  Here is timeline for the crash in case if kset_find_obj() searches for
  an object tht nobody holds and other thread is doing kobject_put() on
  the same kobject:

    THREAD A (calls kset_find_obj())     THREAD B (calls kobject_put())
    splin_lock()
                                         atomic_dec_return(kobj->kref), counter gets zero here
                                         ... starts kobject cleanup ....
                                         spin_lock() // WAIT thread A in kobj_kset_leave()
    iterate over kset->list
    atomic_inc(kobj->kref) (counter becomes 1)
    spin_unlock()
                                         spin_lock() // taken
                                         // it does not know that thread A increased counter so it
                                         remove obj from list
                                         spin_unlock()
                                         vfree(module) // frees module object with containing kobj

    // kobj points to freed memory area!!
    kobject_put(kobj) // OOPS!!!!

  The race above happens because module.c tries to use kset_find_obj()
  when somebody unloads module.  The module.c code was introduced in
  commit 6494a93d55fa"

Anatol supplied a patch specific for module.c that worked around the
problem by simply not using kset_find_obj() at all, but rather than make
a local band-aid, this just fixes kset_find_obj() to be thread-safe
using the proper model of refusing the get a new reference if the
refcount has already dropped to zero.

See examples of this proper refcount handling not only in the kref
documentation, but in various other equivalent uses of this pattern by
grepping for atomic_inc_not_zero().

[ Side note: the module race does indicate that module loading and
  unloading is not properly serialized wrt sysfs information using the
  module mutex.  That may require further thought, but this is the
  correct fix at the kobject layer regardless. ]

Reported-analyzed-and-tested-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()

uprobe_perf_print() passes addr=ip to perf_trace_buf_submit() for
no reason. This sets perf_sample_data->addr for PERF_SAMPLE_ADDR,
we already have perf_sample_data->ip initialized if PERF_SAMPLE_IP.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/tracing: Change create_trace_uprobe() to support uretprobes

Finally change create_trace_uprobe() to check if argv[0][0] == 'r'
and pass the correct "is_ret" to alloc_trace_uprobe().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>

uprobes/tracing: Make seq_printf() code uretprobe-friendly

Change probes_seq_show() and print_uprobe_event() to check
is_ret_probe() and print the correct data.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>

uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly

Change uprobe_event_define_fields(), and __set_print_fmt() to check
is_ret_probe() and use the appropriate format/fields.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>

uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly

Change uprobe_trace_print() and uprobe_perf_print() to check
is_ret_probe() and fill ring_buffer_event accordingly.

Also change uprobe_trace_func() and uprobe_perf_func() to not
_print() if is_ret_probe() is true. Note that we keep ->handler()
nontrivial even for uretprobe, we need this for filtering and for
other potential extensions.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>

uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()

Create the new functions we need to support uretprobes, and change
alloc_trace_uprobe() to initialize consumer.ret_handler if the new
"is_ret" argument is true. Curently this argument is always false,
so the new code is never called and is_ret_probe(tu) is false too.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>

uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers

Extract the output code from uprobe_trace_func() and uprobe_perf_func()
into the new helpers, they will be used by ->ret_handler() too. We also
add the unused "unsigned long func" argument in advance, to simplify the
next changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>

uprobes/tracing: Generalize struct uprobe_trace_entry_head

struct uprobe_trace_entry_head has a single member for reporting,
"unsigned long ip". If we want to support uretprobes we need to
create another struct which has "func" and "ret_ip" and duplicate
a lot of functions, like trace_kprobe.c does.

To avoid this copy-and-paste horror we turn ->ip into ->vaddr[]
and add couple of trivial helpers to calculate sizeof/data. This
uglifies the code a bit, but this allows us to avoid a lot more
complications later, when we add the support for ret-probes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>

uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

uprobe_trace_func() is never called with irqs or preemption
disabled, no need to ask preempt_count() or local_save_flags().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Anton Arapov <anton@redhat.com>

uprobes/tracing: Kill the pointless seq_print_ip_sym() call

seq_print_ip_sym(ip) in print_uprobe_event() is pointless,
kallsyms_lookup(ip) can not resolve a user-space address.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>

uprobes/tracing: Kill the pointless task_pt_regs() calls

uprobe_trace_func() and uprobe_perf_func() do not need task_pt_regs(),
we already have "struct pt_regs *regs".

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Anton Arapov <anton@redhat.com>

uretprobes: Documentation update

add the uretprobe syntax and update an example

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uretprobes: Remove -ENOSYS as return probes implemented

Enclose return probes implementation.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uretprobes: Limit the depth of return probe nestedness

Unlike the kretprobes we can't trust userspace, thus must have
protection from user space attacks. User-space have "unlimited"
stack, and this patch limits the return probes nestedness as a
simple remedy for it.

Note that this implementation leaks return_instance on siglongjmp
until exit()/exec().

The intention is to have KISS and bare minimum solution for the
initial implementation in order to not complicate the uretprobes
code.

In the future we may come up with more sophisticated solution that
remove this depth limitation. It is not easy task and lays beyond
this patchset.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uretprobes: Return probe exit, invoke handlers

Uretprobe handlers are invoked when the trampoline is hit, on completion
the trampoline is replaced with the saved return address and the uretprobe
instance deleted.

TODO: handle_trampoline() assumes that ->return_instances is always valid.
We should teach it to handle longjmp() which can invalidate the pending
return_instance's. This is nontrivial, we will try to do this in a separate
series.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uretprobes: Return probe entry, prepare_uretprobe()

When a uprobe with return probe consumer is hit, prepare_uretprobe()
function is invoked. It creates return_instance, hijacks return address
and replaces it with the trampoline.

* Return instances are kept as stack per uprobed task.
* Return instance is chained, when the original return address is
trampoline's page vaddr (e.g. recursive call of the probed function).

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uretprobes/powerpc: Hijack return address

Hijack the return address and replace it with a trampoline address.
PowerPC implementation.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uretprobes/x86: Hijack return address

Hijack the return address and replace it with a trampoline address.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uretprobes: Reserve the first slot in xol_vma for trampoline

Allocate trampoline page, as the very first one in uprobed
task xol area, and fill it with breakpoint opcode.

Also introduce get_trampoline_vaddr() helper, to wrap the
trampoline address extraction from area->vaddr. That removes
confusion and eases the debug experience in case ->vaddr
notion will be changed.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

uretprobes: Introduce uprobe_consumer->ret_handler()

Enclose return probes implementation, introduce ->ret_handler() and update
existing code to rely on ->handler() *and* ->ret_handler() for uprobe and
uretprobe respectively.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Btrfs: make sure nbytes are right after log replay

While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file.  That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
nbytes are not necessarily updated properly when we log it.  So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent.  This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct.  With this
I'm no longer getting nbytes errors out of btrfsck.

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

x86-32: Fix possible incomplete TLB invalidate with PAE pagetables

This patch attempts to fix:

https://bugzilla.kernel.org/show_bug.cgi?id=56461

The symptom is a crash and messages like this:

chrome: Corrupted page table at address 34a03000
*pdpt = 0000000000000000 *pde = 0000000000000000
Bad pagetable: 000f [#1] PREEMPT SMP

Ingo guesses this got introduced by commit 611ae8e3f520 ("x86/tlb:
enable tlb flush range support for x86") since that code started to free
unused pagetables.

On x86-32 PAE kernels, that new code has the potential to free an entire
PMD page and will clear one of the four page-directory-pointer-table
(aka pgd_t entries).

The hardware aggressively "caches" these top-level entries and invlpg
does not actually affect the CPU's copy. If we clear one we *HAVE* to
do a full TLB flush, otherwise we might continue using a freed pmd page.
(note, we do this properly on the population side in pud_populate()).

This patch tracks whenever we clear one of these entries in the 'struct
mmu_gather', and ensures that we follow up with a full tlb flush.

BTW, I disassembled and checked that:

if (tlb->fullmm == 0)
and
if (!tlb->fullmm && !tlb->need_flush_all)

generate essentially the same code, so there should be zero impact there
to the !PAE case.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Artem S Tashkinov <t.artem@mailcity.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
"Here are remaining target-pending items for v3.9-rc7 code.

  The tcm_vhost patches are more than I'd usually include in a -rc7
  pull, but are changes required for v3.9 to work correctly with the
  pending vhost-scsi-pci QEMU upstream series merge.  (Paolo CC'ed)

  Plus Asias's conversion to use vhost_virtqueue->private_data + RCU for
  managing vhost-scsi endpoints has gotten alot of review + testing over
  the past weeks, and MST has ACKed the full series.

  Also, there is a target patch to fix a long-standing bug within
  control CDB handling with Standby/Offline/Transition ALUA port access
  states, that had been incorrectly rejecting the control CDBs required
  for LUN scan to work during these port group states.  CC'ing to
  stable."

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix incorrect fallthrough of ALUA Standby/Offline/Transition CDBs
  tcm_vhost: Send bad target to guest when cmd fails
  tcm_vhost: Add vhost_scsi_send_bad_target() helper
  tcm_vhost: Fix tv_cmd leak in vhost_scsi_handle_vq
  tcm_vhost: Remove double check of response
  tcm_vhost: Initialize vq->last_used_idx when set endpoint
  tcm_vhost: Use vq->private_data to indicate if the endpoint is setup
  tcm_vhost: Use ACCESS_ONCE for vs->vs_tpg[target] access

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"This is a set of ten bug fixes (and two consisting of copyright year
  update and version number change) pretty much all of which involve
  either a crash or a hang except the removal of the random sleep from
  the qla2xxx driver (which is a coding error so bad, we want it gone
  before anyone has a chance to copy it)."

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] lpfc: fix potential NULL pointer dereference in lpfc_sli4_rq_put()
  [SCSI] libsas: fix handling vacant phy in sas_set_ex_phy()
  [SCSI] ibmvscsi: Fix slave_configure deadlock
  [SCSI] qla2xxx: Update the driver version to 8.04.00.13-k.
  [SCSI] qla2xxx: Remove debug code that msleeps for random duration.
  [SCSI] qla2xxx: Update copyright dates information in LICENSE.qla2xxx file.
  [SCSI] qla2xxx: Fix crash during firmware dump procedure.
  [SCSI] Revert "qla2xxx: Add setting of driver version string for vendor application."
  [SCSI] ipr: dlpar failed when adding an adapter back
  [SCSI] ipr: fix addition of abort command to HRRQ free queue
  [SCSI] st: Take additional queue ref in st_probe
  [SCSI] libsas: use right function to alloc smp response
  [SCSI] ipr: ipr_test_msi() fails when running with msi-x enabled adapter

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fix from Steve French:
"Fixes a regression in cifs in which a password which begins with a
comma is parsed incorrectly as a blank password"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Allow passwords which begin with a delimitor

ftrace: Move ftrace_filter_lseek out of CONFIG_DYNAMIC_FTRACE section

As ftrace_filter_lseek is now used with ftrace_pid_fops, it needs to
be moved out of the #ifdef CONFIG_DYNAMIC_FTRACE section as the
ftrace_pid_fops is defined when DYNAMIC_FTRACE is not.

Cc: stable@vger.kernel.org
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Fix possible NULL pointer dereferences

Currently set_ftrace_pid and set_graph_function files use seq_lseek
for their fops.  However seq_open() is called only for FMODE_READ in
the fops->open() so that if an user tries to seek one of those file
when she open it for writing, it sees NULL seq_file and then panic.

It can be easily reproduced with following command:

  $ cd /sys/kernel/debug/tracing
  $ echo 1234 | sudo tee -a set_ftrace_pid

In this example, GNU coreutils' tee opens the file with fopen(, "a")
and then the fopen() internally calls lseek().

Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Merge tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This contains a few small ASoC fixes (wm8903, wm5102, samsung-i2s,
  tegra, and soc-compress) and an endian fix for NI USB-audio devices,
  update for Mark's e-mail address.

  No scary changes, AFAIS."

* tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  MAINTAINERS: Update e-mail address
  ASoC: wm5102: Correct lookup of arizona struct in SYSCLK event
  ASoC: wm8903: Fix the bypass to HP/LINEOUT when no DAC or ADC is running
  ALSA: usb-audio: fix endianness bug in snd_nativeinstruments_*
  ASoC: tegra: Don't claim to support PCM pause and resume
  ASoC: Samsung: set drvdata before adding secondary device
  ASoC: Samsung: return error if drvdata is not set
  ASoC: compress: Cancel delayed power down if needed
  ASoC: core: Fix to check return value of snd_soc_update_bits_locked()

Merge tag 'asoc-maintainers-v3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

MAINTAINERS: Update e-mail address

Update the e-mail address I use for subsystems.

MAINTAINERS: Update e-mail address

Update the e-mail address I use for subsystems.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

Merge tag 'asoc-v3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Updates for v3.9

A few updates, more than I'd like, fixing some relatively small issues
but mostly driver specific ones. Nothing wildly exciting so if it
doesn't make v3.9 it won't be the end of the world but it'd be nice.

x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set

When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
kernel_map_pages() are not made visible (via TLB flush)
immediately if lazy MMU is on. In environments that support lazy
MMU (e.g. Xen) this may lead to fatal page faults, for example,
when zap_pte_range() needs to allocate pages in
__tlb_remove_page() -> tlb_next_batch().

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: konrad.wilk@oracle.com
Link: http://lkml.kernel.org/r/1365703192-2089-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/mm/cpa/selftest: Fix false positive in CPA self test

If the pmd is not present, _PAGE_PSE will not be set anymore.
Fix the false positive.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1365687369-30802-1-git-send-email-aarcange@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf: Fix error return code

Fix to return -ENOMEM in the allocation error case instead of 0
(if pmu_bus_running == 1), as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/CAPgLHd8j_fWcgqe%3DKLWjpBj%2B%3Do0Pw6Z-SEq%3DNTPU08c2w1tngQ@mail.gmail.com
[ Tweaked the error code setting placement and the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

drm/fb-helper: Fix locking in drm_fb_helper_hotplug_event

Driver's and ->fill_modes functions are allowed to grab crtc mutexes
(for e.g. load detect). Hence we need to first only grab the general
kms mutex, and only in a second step grab all locks to do the
modesets.

This prevents a deadlock on my gm45 in the tv load detect code called
by drm_helper_probe_single_connector_modes.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

udl: handle EDID failure properly.

Don't oops seems proper.

Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave-dmaengine fixes from Vinod Koul:
"The first one fixes issue in pl330 to check for DT compatible and
  the second one fixes omap-dma to start without delay"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: omap-dma: Start DMA without delay for cyclic channels
  DMA: PL330: Add check if device tree compatible

Merge tag 'pm-3.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:

- System reboot/halt fix related to CPU offline ordering from Huacai
   Chen.

- intel_pstate driver fix for a delay time computation error
   occasionally crashing systems using it from Dirk Brandewie.

* tag 'pm-3.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()
  cpufreq / intel_pstate: Set timer timeout correctly

Merge tag 'regmap-v3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap revert from Mark Brown:
"regmap: Back out work buffer fix

  This reverts commit bc8ce4 (regmap: don't corrupt work buffer in
  _regmap_raw_write()) since it turns out that it can cause issues when
  taken in isolation from the other changes in -next that lead to its
  discovery.  On the basis that nobody noticed the problems for quite
  some time without that subsequent work let's drop it from v3.9."

* tag 'regmap-v3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: Back out work buffer fix

Merge tag 'gpio-fixes-v3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fix from Linus Walleij:
"Oneliner fix for the PCA 953x driver."

* tag 'gpio-fixes-v3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: fix irq_domain_add_simple usage

Merge tag 'arm-soc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC bug fixes from Arnd Bergmann:
"A little later during the week than the last few pull requests, since
  there was very little that came in before 3.9-rc6.  At least things
  have calmed down again here.

  Some important bug fixes that came in over the last 10 days, mostly
  mvebu and imx:

   - Multiple regressions on i.mx following the conversion of the clock
     code, hopefully the last we are seeing of those.
   - a regression in the mvebu irq handling code
   - An incorrect register offset in the rewritten s3c24xx irq code.
   - Two bugs in setting up the iomega_ix2_200 machine
   - Turning on an extra bus clock on imx
   - A MAINTAINERS file entry for Roland Stigge"

* tag 'arm-soc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm: mvebu: Fix the irq map function in SMP mode
  Fix GE0/GE1 init on ix2-200 as GE0 has no PHY
  ARM: S3C24XX: Fix interrupt pending register offset of the EINT controller
  ARM: S3C24XX: Correct NR_IRQS definition for s3c2440
  ARM i.MX6: Fix ldb_di clock selection
  ARM: imx: provide twd clock lookup from device tree
  ARM: imx35 Bugfix admux clock
  ARM: clk-imx35: Bugfix iomux clock
  ARM: mxs: Slow down the I2C clock speed
  MAINTAINERS: Add maintainer for LPC32xx
  ARM: Kirkwood: Fix typo in the definition of ix2-200 rebuild LED

[SCSI] lpfc: fix potential NULL pointer dereference in lpfc_sli4_rq_put()

The dereference to 'put_index' should be moved below the NULL test.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

gpio: pca953x: fix irq_domain_add_simple usage

We actually have to pass chip as the host_data parameter of
irq_domain_add_simple() as later on, it is used to initialize chip_data
in pca953x_gpio_irq_map(). Failing to do so is leading to a NULL pointer
dereference after calling irq_data_get_irq_chip_data() in
pca953x_irq_mask(), pca953x_irq_unmask(), pca953x_irq_bus_lock(),
pca953x_irq_bus_sync_unlock() and pca953x_irq_set_type().

Fixes regression introduced by commit
0e8f2fdacf1d44651aa7e57063c76142d1f4988b (gpio: pca953x: use simple
irqdomain)

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Merge remote-tracking branch 'asoc/fix/wm8903' into tmp

Merge remote-tracking branch 'asoc/fix/tegra' into tmp

Merge remote-tracking branch 'asoc/fix/samsung' into tmp

Merge remote-tracking branch 'asoc/fix/core' into tmp

Merge remote-tracking branch 'asoc/fix/compress' into tmp

Merge tag 'mvebu_fixes_for_v3.9_round3' of git://git.infradead.org/users/jcooper/linux into fixes

From Jason Cooper <jason@lakedaemon.net>:

mvebu fixes for v3.9 round 3

- Kirkwood
    - a couple of small fixes for the Iomega ix2-200 board (ether and led)
- mvebu
    - allow GPIO button to work on Mirabox when running SMP

* tag 'mvebu_fixes_for_v3.9_round3' of git://git.infradead.org/users/jcooper/linux:
  arm: mvebu: Fix the irq map function in SMP mode
  Fix GE0/GE1 init on ix2-200 as GE0 has no PHY
  ARM: Kirkwood: Fix typo in the definition of ix2-200 rebuild LED

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

target: Fix incorrect fallthrough of ALUA Standby/Offline/Transition CDBs

This patch fixes a bug where a handful of informational / control CDBs
that should be allowed during ALUA access state Standby/Offline/Transition
where incorrectly returning CHECK_CONDITION + ASCQ_04H_ALUA_TG_PT_*.

This includes INQUIRY + REPORT_LUNS, which would end up preventing LUN
registration when LUN scanning occured during these ALUA access states.

Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcm_vhost: Send bad target to guest when cmd fails

Send bad target to guest in case:
1) we can not allocate the cmd
2) fail to submit the cmd

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcm_vhost: Add vhost_scsi_send_bad_target() helper

Share the send bad target code with other use cases.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcm_vhost: Fix tv_cmd leak in vhost_scsi_handle_vq

If we fail to submit the allocated tv_vmd to tcm_vhost_submission_work,
we will leak the tv_vmd. Free tv_vmd on fail path.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcm_vhost: Remove double check of response

We did the length of response check twice.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

x86/mm/cpa: Convert noop to functional fix

Commit:

a8aed3e0752b ("x86/mm/pageattr: Prevent PSE and GLOABL leftovers to confuse pmd/pte_present and pmd_huge")

introduced a valid fix but one location that didn't trigger the bug that
lead to finding those (small) problems, wasn't updated using the
right variable.

The wrong variable was also initialized for no good reason, that
may have been the source of the confusion. Remove the noop
initialization accordingly.

Commit a8aed3e0752b also erroneously removed one canon_pgprot pass meant
to clear pmd bitflags not supported in hardware by older CPUs, that
automatically gets corrected by this patch too by applying it to the right
variable in the new location.

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1365600505-19314-1-git-send-email-aarcange@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'char-misc-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg Kroah-Hartman:
"Here is a single Kconfig dependancy build fix for 3.9.

  It's been in linux-next for a while, and fixes a problem that has been
  reported multiple times."

* tag 'char-misc-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc/vmw_vmci: Add dependency on CONFIG_NET

Merge tag 'tty-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg Kroah-Hartman:
"Here are 4 small tty/serial fixes for 3.9.

  One fixes a bug where we broke the documentation build, and the others
  fix reported problems in some serial drivers.

  All have been in linux-next for a while"

* tag 'tty-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: mxser: fix cycle termination condition in mxser_probe() and mxser_module_init()
  Revert "tty/8250_pnp: serial port detection regression since v3.7"
  OMAP/serial: Revert bad fix of Rx FIFO threshold granularity
  tty: Documentation: fix a path in a DocBook template

Merge tag 'stable/for-linus-3.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
"Two bug-fixes:
   - Early bootup issue found on DL380 machines
   - Fix for the timer interrupt not being processed right awaym leading
     to quite delayed time skew on certain workloads"

* tag 'stable/for-linus-3.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/mmu: On early bootup, flush the TLB when changing RO->RW bits Xen provided pagetables.
  xen/events: Handle VIRQ_TIMER before any other hardirq in event loop.

Merge tag 'trace-fixes-3.9-rc-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"Namhyung Kim fixed a long standing bug that can cause a kernel panic.

  If the function profiler fails to allocate memory for everything, it
  will do a double free on the same pointer which can cause a panic"

* tag 'trace-fixes-3.9-rc-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix double free when function profile init failed

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) cfg80211_conn_scan() must be called with the sched_scan_mutex, fix
    from Artem Savkov.

2) Fix regression in TCP ICMPv6 processing, we do not want to treat
    redirects as socket errors, from Christoph Paasch.

3) Fix several recvmsg() msg_name kernel memory leaks into userspace,
    in ATM, AX25, Bluetooth, CAIF, IRDA, s390 IUCV, L2TP, LLC, Netrom,
    NFC, Rose, TIPC, and VSOCK.  From Mathias Krause and Wei Yongjun.

4) Fix AF_IUCV handling of segmented SKBs in recvmsg(), from Ursula
    Braun and Eric Dumazet.

5) CAN gw.c code does kfree() on SLAB cache memory, use
    kmem_cache_free() instead.  Fix from Wei Yongjun.

6) Fix LSM regression on TCP SYN/ACKs, some LSMs such as SELINUX want
    an skb->sk socket context available for these packets, but nothing
    else requires it.  From Eric Dumazet and Paul Moore.

7) Fix ipv4 address lifetime processing so that we don't perform
    sleepable acts inside of rcu_read_lock() sections, do them in an
    rtnl_lock() section instead.  From Jiri Pirko.

8) mvneta driver accidently sets HW features after device registry, it
    should do so beforehand.  Fix from Willy Tarreau.

9) Fix bonding unload races more correctly, from Nikolay Aleksandrov
    and Veaceslav Falico.

10) rtnl_dump_ifinfo() and rtnl_calcit() invoke nlmsg_parse() with wrong
    header size argument.  Fix from Michael Riesch.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  lsm: add the missing documentation for the security_skb_owned_by() hook
  bnx2x: Prevent null pointer dereference in AFEX mode
  e100: Add dma mapping error check
  selinux: add a skb_owned_by() hook
  can: gw: use kmem_cache_free() instead of kfree()
  netrom: fix invalid use of sizeof in nr_recvmsg()
  qeth: fix qeth_wait_for_threads() deadlock for OSN devices
  af_iucv: fix recvmsg by replacing skb_pull() function
  rtnetlink: Call nlmsg_parse() with correct header length
  bonding: fix bonding_masters race condition in bond unloading
  Revert "bonding: remove sysfs before removing devices"
  net: mvneta: enable features before registering the driver
  hyperv: Fix RNDIS send_completion code path
  hyperv: Fix a kernel warning from netvsc_linkstatus_callback()
  net: ipv4: fix schedule while atomic bug in check_lifetime()
  net: ipv4: reset check_lifetime_work after changing lifetime
  bnx2x: Fix KR2 rapid link flap
  sctp: remove 'sridhar' from maintainers list
  VSOCK: Fix missing msg_namelen update in vsock_stream_recvmsg()
  VSOCK: vmci - fix possible info leak in vmci_transport_dgram_dequeue()
  ...

Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming

Pull C6X fix from Mark Salter.

Final (?) fix from the barrier discussion.

* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
add memory barrier to arch_local_irq_restore

cifs: Allow passwords which begin with a delimitor

Fixes a regression in cifs_parse_mount_options where a password
which begins with a delimitor is parsed incorrectly as being a blank
password.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>

lsm: add the missing documentation for the security_skb_owned_by() hook

Unfortunately we didn't catch the missing comments earlier when the
patch was merged.

Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Prevent null pointer dereference in AFEX mode

The cnic module is responsible for initializing various bnx2x structs
via callbacks provided by the bnx2x module.
One such struct is the queue object for the FCoE queue.

If a device is working in AFEX mode and its configuration allows FCoE yet
the cnic module is not loaded, it's very likely a null pointer dereference
will occur, as the bnx2x will erroneously access the FCoE's queue object.

Prevent said access until cnic properly registers itself.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e100: Add dma mapping error check

e100 uses pci_map_single, but fails to check for a dma mapping error after its
use, resulting in a stack trace:

[   46.656594] ------------[ cut here ]------------
[   46.657004] WARNING: at lib/dma-debug.c:933 check_unmap+0x47b/0x950()
[   46.657004] Hardware name: To Be Filled By O.E.M.
[   46.657004] e100 0000:00:0e.0: DMA-API: device driver failed to check map
error[device address=0x000000007a4540fa] [size=90 bytes] [mapped as single]
[   46.657004] Modules linked in:
[   46.657004]  w83627hf hwmon_vid snd_via82xx ppdev snd_ac97_codec ac97_bus
snd_seq snd_pcm snd_mpu401 snd_mpu401_uart ns558 snd_rawmidi gameport parport_pc
e100 snd_seq_device parport snd_page_alloc snd_timer snd soundcore skge shpchp
k8temp mii edac_core i2c_viapro edac_mce_amd nfsd auth_rpcgss nfs_acl lockd
sunrpc binfmt_misc uinput ata_generic pata_acpi radeon i2c_algo_bit
drm_kms_helper ttm firewire_ohci drm firewire_core pata_via sata_via i2c_core
sata_promise crc_itu_t
[   46.657004] Pid: 792, comm: ip Not tainted 3.8.0-0.rc6.git0.1.fc19.x86_64 #1
[   46.657004] Call Trace:
[   46.657004]  <IRQ>  [<ffffffff81065ed0>] warn_slowpath_common+0x70/0xa0
[   46.657004]  [<ffffffff81065f4c>] warn_slowpath_fmt+0x4c/0x50
[   46.657004]  [<ffffffff81364cfb>] check_unmap+0x47b/0x950
[   46.657004]  [<ffffffff8136522f>] debug_dma_unmap_page+0x5f/0x70
[   46.657004]  [<ffffffffa030f0f0>] ? e100_tx_clean+0x30/0x210 [e100]
[   46.657004]  [<ffffffffa030f1a8>] e100_tx_clean+0xe8/0x210 [e100]
[   46.657004]  [<ffffffffa030fc6f>] e100_poll+0x56f/0x6c0 [e100]
[   46.657004]  [<ffffffff8159dce1>] ? net_rx_action+0xa1/0x370
[   46.657004]  [<ffffffff8159ddb2>] net_rx_action+0x172/0x370
[   46.657004]  [<ffffffff810703bf>] __do_softirq+0xef/0x3d0
[   46.657004]  [<ffffffff816e4ebc>] call_softirq+0x1c/0x30
[   46.657004]  [<ffffffff8101c485>] do_softirq+0x85/0xc0
[   46.657004]  [<ffffffff81070885>] irq_exit+0xd5/0xe0
[   46.657004]  [<ffffffff816e5756>] do_IRQ+0x56/0xc0
[   46.657004]  [<ffffffff816dacb2>] common_interrupt+0x72/0x72
[   46.657004]  <EOI>  [<ffffffff816da1eb>] ?
_raw_spin_unlock_irqrestore+0x3b/0x70
[   46.657004]  [<ffffffff816d124d>] __slab_free+0x58/0x38b
[   46.657004]  [<ffffffff81214424>] ? fsnotify_clear_marks_by_inode+0x34/0x120
[   46.657004]  [<ffffffff811b0417>] ? kmem_cache_free+0x97/0x320
[   46.657004]  [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40
[   46.657004]  [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40
[   46.657004]  [<ffffffff811b0692>] kmem_cache_free+0x312/0x320
[   46.657004]  [<ffffffff8157fc14>] sock_destroy_inode+0x34/0x40
[   46.657004]  [<ffffffff811e8c28>] destroy_inode+0x38/0x60
[   46.657004]  [<ffffffff811e8d5e>] evict+0x10e/0x1a0
[   46.657004]  [<ffffffff811e9605>] iput+0xf5/0x180
[   46.657004]  [<ffffffff811e4338>] dput+0x248/0x310
[   46.657004]  [<ffffffff811ce0e1>] __fput+0x171/0x240
[   46.657004]  [<ffffffff811ce26e>] ____fput+0xe/0x10
[   46.657004]  [<ffffffff8108d54c>] task_work_run+0xac/0xe0
[   46.657004]  [<ffffffff8106c6ed>] do_exit+0x26d/0xc30
[   46.657004]  [<ffffffff8109eccc>] ? finish_task_switch+0x7c/0x120
[   46.657004]  [<ffffffff816dad58>] ? retint_swapgs+0x13/0x1b
[   46.657004]  [<ffffffff8106d139>] do_group_exit+0x49/0xc0
[   46.657004]  [<ffffffff8106d1c4>] sys_exit_group+0x14/0x20
[   46.657004]  [<ffffffff816e3b19>] system_call_fastpath+0x16/0x1b
[   46.657004] ---[ end trace 4468c44e2156e7d1 ]---
[   46.657004] Mapped at:
[   46.657004]  [<ffffffff813663d1>] debug_dma_map_page+0x91/0x140
[   46.657004]  [<ffffffffa030e8eb>] e100_xmit_prepare+0x12b/0x1c0 [e100]
[   46.657004]  [<ffffffffa030c924>] e100_exec_cb+0x84/0x140 [e100]
[   46.657004]  [<ffffffffa030e56a>] e100_xmit_frame+0x3a/0x190 [e100]
[   46.657004]  [<ffffffff8159ee89>] dev_hard_start_xmit+0x259/0x6c0

Easy fix, modify the cb paramter to e100_exec_cb to return an error, and do the
dma_mapping_error check in the obvious place

This was reported previously here:
http://article.gmane.org/gmane.linux.network/257893

But nobody stepped up and fixed it.

CC: Josh Boyer <jwboyer@redhat.com>
CC: e1000-devel@lists.sourceforge.net
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Michal Jaegermann <michal@harddata.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal

Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have performance impact.

Since lazy MMU is not used on bare metal we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such
environment.

[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
  updates" may cause a minor performance regression on
  bare metal.  This patch resolves that performance regression.  It is
  somewhat unclear to me if this is a good -stable candidate. ]

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> SEE NOTE ABOVE

x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The OOPs usually looks as so:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
[<ffffffff81627759>] do_page_fault+0x399/0x4b0
[<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
[<ffffffff81624065>] page_fault+0x25/0x30
[<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
[<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
[<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
[<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
[<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
[<ffffffff81153e61>] unmap_single_vma+0x531/0x870
[<ffffffff81154962>] unmap_vmas+0x52/0xa0
[<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
[<ffffffff8115c8f8>] exit_mmap+0x98/0x170
[<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[<ffffffff81059ce3>] mmput+0x83/0xf0
[<ffffffff810624c4>] exit_mm+0x104/0x130
[<ffffffff8106264a>] do_exit+0x15a/0x8c0
[<ffffffff810630ff>] do_group_exit+0x3f/0xa0
[<ffffffff81063177>] sys_exit_group+0x17/0x20
[<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

Cc: <stable@vger.kernel.org>
RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Tested-by: Josh Boyer <jwboyer@redhat.com>
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

Merge tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull another nfs fixlet from Trond Myklebust:
"I suddenly noticed that a one-line issue that I _thought_ I had fixed
  with the nfs41_walk_client_list patch was apparently still there in
  the pull request I sent earlier today.  I'm very sorry for not
  catching that in time.

   - Fix a brain fart in nfs41_walk_client_list"

* tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Doh! Typo in the fix to nfs41_walk_client_list

arm: mvebu: Fix the irq map function in SMP mode

This patch fix the regression introduced by the commit 3202bf0157ccb
"arm: mvebu: Improve the SMP support of the interrupt controller":
GPIO IRQ were no longer delivered to the CPUs.

To be delivered to a CPU an interrupt must be enabled at CPU level and
at interrupt source level. Before the offending patch, all the
interrupts were enabled at source level during map() function. Mask()
and unmask() was done by handling the per-CPU part. It was fine when
running in UP with only one CPU.

The offending patch added support for SMP, in this case mask() and
unmask() was done by handling the interrupt source level part. The
per-CPU level part was handled by the affinity API to select the CPU
which will receive the interrupt. (Due to some hardware limitation
only one CPU at a time can received a given interrupt).

For "normal" interrupt __setup_irq() was called when an irq was
registered. irq_set_affinity() is called from this function, which
enabled the interrupt on one of the CPUs. Whereas for GPIO IRQ which
were chained interrupts, the irq_set_affinity() was never called and
none of the CPUs was selected to receive the interrupt.

With this patch all the interrupt are enable on the current CPU during
map() function. Enabling the interrupts on a CPU doesn't depend
anymore on irq_set_affinity() and then the chained irq are not anymore
a special case. However the CPU which will receive the irq can still
be modify later using irq_set_affinity().

Tested with Mirabox (A370) and Openblocks AX3 (AXP), rootfs mounted
over NFS, compiled with CONFIG_SMP=y/N.

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reported-by: Ryan Press <ryan@presslab.us>
Investigated-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Tested-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Tested-by: Ryan Press <ryan@presslab.us>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>

NFSv4: Doh! Typo in the fix to nfs41_walk_client_list

Make sure that we set the status to 0 on success. Missed in testing
because it never appears when doing multiple mounts to _different_
servers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> # 3.7.x: 7b1f1fd: NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list

Fix GE0/GE1 init on ix2-200 as GE0 has no PHY

Signed-off-by: Jason Cooper <jason@lakedaemon.net>

Merge tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
- fix for memory corruption issues in nfs4[01]_walk_client_list (stable)
- fix for an Oopsable bug in rpc_clone_client (stable)
- another state manager deadlock in the NFSv4 open code
- memory leaks in nfs4_discover_server_trunking and rpc_new_client

* tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix another potential state manager deadlock
  SUNRPC: Fix a potential memory leak in rpc_new_client
  NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
  NFSv4: Fix a memory leak in nfs4_discover_server_trunking
  SUNRPC: Remove extra xprt_put()

perf/x86: Add Sandy Bridge constraints for CYCLE_ACTIVITY.*

Add CYCLE_ACTIVITY.CYCLES_NO_DISPATCH/CYCLES_L1D_PENDING constraints.

These recently documented events have restrictions to counter
0-3 and counter 2 respectively. The perf scheduler needs to know
that to schedule them correctly.

IvyBridge already has the necessary constraints.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1362784968-12542-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

dmaengine: omap-dma: Start DMA without delay for cyclic channels

cyclic DMA is only used by audio which needs DMA to be started without a
delay.
If the DMA for audio is started using the tasklet we experience random
channel switch (to be more precise: channel shift).

Reported-by: Peter Meerwald <pmeerw@pmeerw.net>
CC: stable@vger.kernel.org # v3.7+
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"This fixes a GCM bug that breaks IPsec and a compile problem in
  ux500."

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ux500 - add missing comma
  crypto: gcm - fix assumption that assoc has one segment

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Just a spare semicolon in nouveau that caused some issues, and an
  mgag200 fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/mgag200: Index 24 in extended CRTC registers is 24 in hex, not decimal.
  drm/nouveau: fix unconditional return waiting on memory

drm/mgag200: Index 24 in extended CRTC registers is 24 in hex, not decimal.

This change properly enables the "requester" in G200ER cards that is
responsible for getting pixels out of memory and clocking them out to
the screen.

Signed-off-by: Christopher Harvey <charvey@matrox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>