git.karo-electronics.de Git - karo-tx-linux.git/log

Merge branch 'timers/urgent'

timers, sched: Correct the comments for tick_sched_timer()

In the comments of function tick_sched_timer(), the sentence
"timer->base->cpu_base->lock held" is not right.

In function __run_hrtimer(), before call timer->function(),
the cpu_base->lock has been unlocked.

Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Cc: fei.li@intel.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1351098455.15558.1421.camel@cliu38-desktop-build
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'perf/urgent'

perf, cpu hotplug: Use cached value of smp_processor_id()

The perf_cpu_notifier() macro invokes smp_processor_id()
multiple times. Optimize it by using a local variable.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: peterz@infradead.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/20121016075817.3572.76733.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf, cpu hotplug: Run CPU_STARTING notifiers with irqs disabled

The CPU_STARTING notifiers are supposed to be run with irqs
disabled. But the perf_cpu_notifier() macro invokes them without
doing that. Fix it.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: peterz@infradead.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/20121016075809.3572.47848.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'x86/urgent'

Revert "x86/mm: Fix the size calculation of mapping tables"

Commit:

722bc6b16771 x86/mm: Fix the size calculation of mapping tables

Tried to address the issue that the first 2/4M should use 4k pages
if PSE enabled, but extra counts should only be valid for x86_32.

This commit caused a kdump regression: the kdump kernel hangs.

Work is in progress to fundamentally fix the various page table
initialization issues that we have, via the design suggested
by H. Peter Anvin, but it's not ready yet to be merged.

So, to get a working kdump revert to the last known working version,
which is the revert of this commit and of a followup fix (which was
incomplete):

bd2753b2dda7 x86/mm: Only add extra pages count for the first memory range during pre-allocation

Tested kdump on physical and virtual machines.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Tested-by: Flavio Leitner <fbl@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Flavio Leitner <fbl@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: ianfang.cn@gmail.com
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'numa/core'

numa, sched: Eliminate unused functions

Andrew Morton reported these allnoconfig warnings:

kernel/sched/fair.c:800: warning: 'task_h_load' declared 'static' but never defined
kernel/sched/fair.c:1004: warning: 'account_numa_enqueue' defined but not used

These are only used on CONFIG_SMP - fix it.

We should eventually resolve the Kconfig complexities here by turning
SMP (and NUMA) scheduling either into a separate source code file, or
by creating a single-model scheduler, which happens to build to a small
object file on !CONFIG_SMP or !CONFIG_NUMA kernels not via #ifdefs but
via more clever build time code elimination and zero-size data fields.

That's not a simple patch.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-wyctbug9qKulTs0umsxjyixi@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'perf/urgent'

Merge branch 'numa/core'

x86/perf: Fix virtualization sanity check

In check_hw_exists() we try to detect non-emulated MSR accesses
by writing an arbitrary value into one of the PMU registers
and check if it's value after a readout is still the same.
This algorithm silently assumes that the register does not contain
the magic value already, which is wrong in at least one situation.

Fix the algorithm to really do a read-modify-write cycle. This fixes
a warning under Xen under some circumstances on AMD family 10h CPUs.

The reasons in more details actually sound like a story from
Believe It or Not!:

First you need an AMD family 10h/12h CPU. These do not reset the
PERF_CTR registers on a reboot.
Now you boot bare metal Linux, which goes successfully through this
check, but leaves the magic value of 0xabcd in the register. You
don't use the performance counters, but do a reboot (warm reset).
Then you choose to boot Xen. The check will be triggered with a
recent Linux kernel as Dom0 again, trying to write 0xabcd into the
MSR. Xen silently drops the write (expected), but the subsequent read
will return the value in the register, which just happens to be the
expected magic value. Thus the test misleadingly succeeds, leaving
the kernel in the belief that the PMU is available. This will trigger
the following message:

[    0.020294] ------------[ cut here ]------------
[    0.020311] WARNING: at arch/x86/xen/enlighten.c:730 xen_apic_write+0x15/0x17()
[    0.020318] Hardware name: empty
[    0.020323] Modules linked in:
[    0.020334] Pid: 1, comm: swapper/0 Not tainted 3.3.8 #7
[    0.020340] Call Trace:
[    0.020354]  [<ffffffff81050379>] warn_slowpath_common+0x80/0x98
[    0.020369]  [<ffffffff810503a6>] warn_slowpath_null+0x15/0x17
[    0.020378]  [<ffffffff810034df>] xen_apic_write+0x15/0x17
[    0.020392]  [<ffffffff8101cb2b>] perf_events_lapic_init+0x2e/0x30
[    0.020410]  [<ffffffff81ee4dd0>] init_hw_perf_events+0x250/0x407
[    0.020419]  [<ffffffff81ee4b80>] ? check_bugs+0x2d/0x2d
[    0.020430]  [<ffffffff81002181>] do_one_initcall+0x7a/0x131
[    0.020444]  [<ffffffff81edbbf9>] kernel_init+0x91/0x15d
[    0.020456]  [<ffffffff817caaa4>] kernel_thread_helper+0x4/0x10
[    0.020471]  [<ffffffff817c347c>] ? retint_restore_args+0x5/0x6
[    0.020481]  [<ffffffff817caaa0>] ? gs_change+0x13/0x13
[    0.020500] ---[ end trace a7919e7f17c0a725 ]---

The new code will change every of the 16 low bits read from the
register and tries to write and read-back that modified number
from the MSR.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Link: http://lkml.kernel.org/r/1349797115-28346-2-git-send-email-andre.przywara@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

numa, sched, mm: Fix NULL-ptr deref

Dan reported that there's a possible NULL pointer deref in this logic,
fix that. Further fix it to avoid a possible inf. loop (completely
unlikely in the case no vma is migratable). Also don't endlessly loop
on large length, simply truncate at the end of the address-space and
restart on the next go.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-mnkio02xxtttiepsg9ek6qkw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

numa, sched: Implement slow start for working set sampling

Add a 1 second delay before starting to scan the working set of
a task and starting to balance it amongst nodes.

The theory is that short-run tasks benefit very little from NUMA
placement: they come and go, and they better stick to the node
they were started on. As tasks mature and rebalance to other CPUs
and nodes, so does their NUMA placement have to change and so
does it start to matter more and more.

In practice this change fixes an observable kbuild regression:

   # [ a perf stat --null --repeat 10 test of ten bzImage builds to /dev/shm ]

   !NUMA:
   45.291088843 seconds time elapsed                                          ( +-  0.40% )
   45.154231752 seconds time elapsed                                          ( +-  0.36% )

   +NUMA, no slow start:
   46.172308123 seconds time elapsed                                          ( +-  0.30% )
   46.343168745 seconds time elapsed                                          ( +-  0.25% )

   +NUMA, 1 sec slow start:
   45.224189155 seconds time elapsed                                          ( +-  0.25% )
   45.160866532 seconds time elapsed                                          ( +-  0.17% )

and it also fixes an observable perf bench (hackbench) regression:

   # perf stat --null --repeat 10 perf bench sched messaging

   -NUMA:

   -NUMA:                  0.246225691 seconds time elapsed                   ( +-  1.31% )
   +NUMA no slow start:    0.252620063 seconds time elapsed                   ( +-  1.13% )

   +NUMA 1sec delay:       0.248076230 seconds time elapsed                   ( +-  1.35% )

The implementation is simple and straightforward, most of the patch
deals with adding the /proc/sys/kernel/sched_numa_scan_delay_ms tunable
knob and with renaming task_period to scan_period.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-vn7p3ynbwqt3qqewhdlvjltc@git.kernel.org
[ Wrote the changelog, ran measurements, tuned the default. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'perf/urgent'

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

* Validate syscall id before growing syscall table in 'trace', fixing potential
   excessive memory usage.

* Validate perf_sample.raw_data, making 'trace' more robust, avoiding some
   potential SEGFAULTs when reading tracepoint fields.

* Fix exclude_guest parse events 'perf test's, from Jiri Olsa.

* Do not flush maps on COMM, that is sent by the kernel when a process is
   exec'ed, but also when a process changes its name. Since we were assuming
   a COMM always meant an EXEC, we were losing track of a process maps by
   flushing its maps. Fix from Luigi Semenzato.

* A recent patch introduced a problem by not initializing what should be
   the first kind of pager to use, 'man', instead it was being left as zero
   which means no pager. This caused 'perf subcmd --help' to produce no output.
   Fix from Namhyung Kim.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'linus'

Merge tag 'stable/for-linus-3.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull xen bug-fixes from Konrad Rzeszutek Wilk:
- Fix mysterious SIGSEGV or SIGKILL in applications due to corrupting
   of the %eip when returning from a signal handler.
- Fix various ARM compile issues after the merge fallout.
- Continue on making more of the Xen generic code usable by ARM
   platform.
- Fix SR-IOV passthrough to mirror multifunction PCI devices.
- Fix various compile warnings.
- Remove hypercalls that don't exist anymore.

* tag 'stable/for-linus-3.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: dbgp: Fix warning when CONFIG_PCI is not enabled.
  xen: arm: comment on why 64-bit xen_pfn_t is safe even on 32 bit
  xen: balloon: use correct type for frame_list
  xen/x86: don't corrupt %eip when returning from a signal handler
  xen: arm: make p2m operations NOPs
  xen: balloon: don't include e820.h
  xen: grant: use xen_pfn_t type for frame_list.
  xen: events: pirq_check_eoi_map is X86 specific
  xen: XENMEM_translate_gpfn_list was remove ages ago and is unused.
  xen: sysfs: fix build warning.
  xen: sysfs: include err.h for PTR_ERR etc
  xen: xenbus: quirk uses x86 specific cpuid
  xen PV passthru: assign SR-IOV virtual functions to separate virtual slots
  xen/xenbus: Fix compile warning.
  xen/x86: remove duplicated include from enlighten.c

alpha: separate thread-synchronous flags

... and fix the race in updating unaligned control ones

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Avi Kivity:
"KVM updates for 3.7-rc2"

* tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT
  KVM: apic: fix LDR calculation in x2apic mode
  KVM: MMU: fix release noslot pfn

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Most of these are uprobes race fixes from Oleg, and their preparatory
  cleanups.  (It's larger than what I'd normally send for an -rc kernel,
  but they looked significant enough to not delay them.)

  There's also an oprofile fix and an uncore PMU fix."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  perf/x86: Disable uncore on virtualized CPUs
  oprofile, x86: Fix wrapping bug in op_x86_get_ctrl()
  ring-buffer: Check for uninitialized cpu buffer before resizing
  uprobes: Fix the racy uprobe->flags manipulation
  uprobes: Fix prepare_uprobe() race with itself
  uprobes: Introduce prepare_uprobe()
  uprobes: Fix handle_swbp() vs unregister() + register() race
  uprobes: Do not delete uprobe if uprobe_unregister() fails
  uprobes: Don't return success if alloc_uprobe() fails
  uprobes/x86: Only rep+nop can be emulated correctly
  uprobes: Simplify is_swbp_at_addr(), remove stale comments
  uprobes: Kill set_orig_insn()->is_swbp_at_addr()
  uprobes: Introduce copy_opcode(), kill read_opcode()
  uprobes: Kill set_swbp()->is_swbp_at_addr()
  uprobes: Restrict valid_vma(false) to skip VM_SHARED vmas
  uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC
  uprobes: Change write_opcode() to use FOLL_FORCE
  uprobes: Move clear_thread_flag(TIF_UPROBE) to uprobe_notify_resume()
  uprobes: Kill UTASK_BP_HIT state
  uprobes: Fix UPROBE_SKIP_SSTEP checks in handle_swbp()
  ...

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core kernel fixes from Ingo Molnar:
"Two small fixes"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: Reflect the new location of the NMI watchdog info
nohz: Fix idle ticks in cpu summary line of /proc/stat

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"Among the usual minor bug fixes the more interesting patches are the
  perf counters for the latest machine, the missing select to enable
  transparent huge pages and a build fix for the UAPI rework."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390,uapi: do not use uapi/asm-generic/kvm_para.h
  s390/cache: fix data/instruction cache output
  s390: fix linker script for 31 bit builds
  s390/thp: select HAVE_ARCH_TRANSPARENT_HUGEPAGE
  s390/kdump: Use 64 bit mode for 0x10000 entry point
  perf_cpum_cf: Add support for counters available with IBM zEC12
  s390/css: stop stsch loop after cc 3
  s390/cio: use generic bitmap functions
  s390/chpid: make headers usable (again)

Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile

Pull tile fixes from Chris Metcalf:
"This fixes one issue with compiler flags that can cause modules not to
  load, and cleans up some warnings with ELF_R_xxx defines."

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: avoid build warnings from duplicate ELF_R_xxx #defines
  arch/tile: avoid generating .eh_frame information in modules

Merge tag 'please-pull-uapi-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull ia64 fix from Tony Luck:
"Fix from dhowells for UAPI fallout"

* tag 'please-pull-uapi-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
UAPI: Make arch/ia64/include/asm/kvm_para.h generic

Merge branch 'linus'

arch/tile: avoid build warnings from duplicate ELF_R_xxx #defines

These are now provided in <asm-generic/module.h>, so clean up warnings
by not re-defining them in module.c.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>

arch/tile: avoid generating .eh_frame information in modules

The tile tool chain uses the .eh_frame information for backtracing.
The vmlinux build drops any .eh_frame sections at link time, but when
present in kernel modules, it causes a module load failure due to the
presence of unsupported pc-relative relocations. When compiling to
use compiler feedback support, the compiler by default omits .eh_frame
information, so we don't see this problem. But when not using feedback,
we need to explicitly suppress the .eh_frame.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: stable@vger.kernel.org

Merge branch 'numa/core'

numa, mm, sched: Use down_write() in task_numa_work()

change_protection() needs to be called with the mmap_sem write-locked,
like the mprotect() variants do it.

With that in place we can avoid the intrusive (and partially
incorrect) page locking changes in the:

"numa, mm: Fix 4K migration races"

patch, because the down_write() will properly serialize with the
down_read() page fault path.

Keep the cleanups and debug code removal.

In theory calling change_protection() with just down_read()
should work, but in practice it seems messy.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-g3xyfmqqmmpubhcdww2TrbLc@git.kernel.org

numa, mm: Rename the PROT_NONE fault handling functions to *_numa()

Having the function name indicate what the function is used
for makes the code a little easier to read. Furthermore,
the fault handling code largely consists of do_...._page
functions.

Rename the NUMA working set sampling fault handling functions
to _numa() names, to indicate what they are used for.

This separates the naming from the regular PROT_NONE namings.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: aarcange@redhat.com
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/20121018172021.0b1f6e3d@cuia.bos.redhat.com
[ Converted two more usage sites ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa: Add NUMA_MIGRATION feature flag

After this patch, doing:

# echo NO_NUMA_MIGRATION > /sys/kernel/debug/sched_features

Will turn off the NUMA placement logic/policy - but keeps the
working set sampling faults in place.

This allows the debugging of the WSS facility, by using it
but keeping vanilla, non-NUMA CPU and memory placement
policies.

Default enabled. Generates on extra code on !CONFIG_SCHED_DEBUG.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-xjt7bqjlphxRfjXxasqm4cdv@git.kernel.org

Merge branch 'timers/core'

Merge branch 'x86/urgent'

Merge branch 'perf/urgent'

Merge branch 'core/urgent'

Merge branch 'numa/misc'

x86, mm: Prevent gcc to re-read the pagetables

GCC is very likely to read the pagetables just once and cache them in
the local stack or in a register, but it is can also decide to re-read
the pagetables. The problem is that the pagetable in those places can
change from under gcc.

In the page fault we only hold the ->mmap_sem for reading and both the
page fault and MADV_DONTNEED only take the ->mmap_sem for reading and we
don't hold any PT lock yet.

In get_user_pages_fast() the TLB shootdown code can clear the pagetables
before firing any TLB flush (the page can't be freed until the TLB
flushing IPI has been delivered but the pagetables will be cleared well
before sending any TLB flushing IPI).

With THP/hugetlbfs the pmd (and pud for hugetlbfs giga pages) can
change as well under gup_fast, it won't just be cleared for the same
reasons described above for the pte in the page fault case.

[ This patch was picked up from the AutoNUMA tree. ]

Originally-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
[ Ported to this tree, because we are modifying the page tables
at a high rate here, so this problem is potentially more
likely to show up in practice. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm: Check if PTE is already allocated during page fault

With transparent hugepage support, handle_mm_fault() has to be careful
that a normal PMD has been established before handling a PTE fault. To
achieve this, it used __pte_alloc() directly instead of pte_alloc_map
as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map()
is called once it is known the PMD is safe.

pte_alloc_map() is smart enough to check if a PTE is already present
before calling __pte_alloc but this check was lost. As a consequence,
PTEs may be allocated unnecessarily and the page table lock taken.
Thi useless PTE does get cleaned up but it's a performance hit which
is visible in page_test from aim9.

This patch simply re-adds the check normally done by pte_alloc_map to
check if the PTE needs to be allocated before taking the page table
lock. The effect is noticable in page_test from aim9.

AIM9
                 2.6.38-vanilla 2.6.38-checkptenone
creat-clo      446.10 ( 0.00%)   424.47 (-5.10%)
page_test       38.10 ( 0.00%)    42.04 ( 9.37%)
brk_test        52.45 ( 0.00%)    51.57 (-1.71%)
exec_test      382.00 ( 0.00%)   456.90 (16.39%)
fork_test       60.11 ( 0.00%)    67.79 (11.34%)
MMTests Statistics: duration
Total Elapsed Time (seconds)                611.90    612.22

(While this affects 2.6.38, it is a performance rather than a
functional bug and normally outside the rules -stable. While the big
performance differences are to a microbench, the difference in fork
and exec performance may be significant enough that -stable wants to
consider the patch)

Reported-by: Raz Ben Yehuda <raziebe@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rik van Riel <riel@redhat.com>
[ Picked this up from the AutoNUMA tree to help
  it upstream and to allow apples-to-apples
  performance comparisons. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Fixes for intel and nouveau mainly.

   - intel: disable HSW by default, sdvo fixes, link train regression
     fix
   - nouveau: acpi rom loading regression fix, with a few other fixes
     from the rework
   -core: just other minor fixes and race fixes for ttm."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (24 commits)
  drm/ttm: Fix a theoretical race in ttm_bo_cleanup_refs()
  drm/ttm: Fix a theoretical race
  drm: platform: Don't initialize driver-private data
  drm/debugfs: remove redundant info from gem_names
  drm: fb: cma: Fail gracefully on allocation failure
  drm: fb: cma: Fix typo in debug message
  drm/nouveau/clock: fix missing pll type/addr when matching default entry
  drm/nouveau/fb: fix reporting of memory type on GF8+ IGPs
  drm/nv41/vm: don't init hw pciegart on boards with agp bridge
  drm/nouveau/bios: fetch full 4KiB block to determine ACPI ROM image size
  drm/nouveau: validate vbios size
  drm/nouveau: warn when trying to free mm which is still in use
  drm/nouveau: fix nouveau_mm/nouveau_mm_node leak
  drm/nouveau/bios: improve error handling when reading the vbios from ACPI
  drm/nouveau: handle same-fb page flips
  drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()
  drm/i915: Add no-lvds quirk for Supermicro X7SPA-H
  drm/i915: Insert i915_preliminary_hw_support variable.
  drm/i915: shut up spurious WARN in the gtt fault handler
  Revert "drm/i915: Try harder to complete DP training pattern 1"
  ...

Merge tag 'jfs-3.7-2' of git://github.com/kleikamp/linux-shaggy

Pull jfs fix from Dave Kleikamp:
"Bug fix: Fix FITRIM argument handling"

* tag 'jfs-3.7-2' of git://github.com/kleikamp/linux-shaggy:
jfs: Fix FITRIM argument handling

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"Various bug fixes for ext4.  The most serious of them fixes a security
  bug (CVE-2012-4508) which leads to stale data exposure when we have
  fallocate racing against writes to files undergoing delayed
  allocation.  We also have two fixes for the metadata checksum feature,
  the most serious of which can cause the superblock to have a invalid
  checksum after a power failure."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Avoid underflow in ext4_trim_fs()
  ext4: Checksum the block bitmap properly with bigalloc enabled
  ext4: fix undefined bit shift result in ext4_fill_flex_info
  ext4: fix metadata checksum calculation for the superblock
  ext4: race-condition protection for ext4_convert_unwritten_extents_endio
  ext4: serialize fallocate with ext4_convert_unwritten_extents

Merge tag 'nfs-for-3.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
- Do not call pnfs_return_layout() from an rpciod context
- nfs4_ds_disconnect can cause Oopses.  Kill it...
- Fix the return value for nfs_callback_start_svc
- Fix a number of compile warnings

* tag 'nfs-for-3.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix the return value for nfs_callback_start_svc
  NFSv4.1: Declare osd_pri_2_pnfs_err(), objio_init_read/write to be static
  NFSv4: fs/nfs/nfs4getroot.c needs to include "internal.h"
  NFSv4.1: Use kcalloc() to allocate zeroed arrays instead of kzalloc()
  NFSv4.1: Do not call pnfs_return_layout() from an rpciod context
  NFSv4.1: Kill nfs4_ds_disconnect()

Merge tag 'regmap-fix-mmio' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
"regmap: Fix for dependencies for MMIO

  Trivial dependency issue, not noticed before as the only user of MMIO
  also needs I2C."

* tag 'regmap-fix-mmio' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: select REGMAP if REGMAP_MMIO and REGMAP_IRQ enabled

drm/ttm: Fix a theoretical race in ttm_bo_cleanup_refs()

In theory, that function could release the lru lock between
checking for bo on ddestroy list and a successful reserve if the bo
was already reserved, and the function was called with waiting reserves
allowed.
However, all current reservers of a bo on the ddestroy list would
atomically take the bo off the list after a successful reserve so this
race should not have been hit, so no need to backport for stable.

This patch also fixes a case found by Maarten Lankhorst where
ttm_mem_evict_first called with no_wait_gpu would incorrectly
spin waiting for bo idle if trying to evict a busy buffer that
also sits on the ddestroy list.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/ttm: Fix a theoretical race

The ttm_mem_evict_first function could theoretically drop the
lru lock without retrying if a reservation from off the LRU list
ended up waiting.
However, since currently there are no users that could cause a wait
in that situation so this is not suitable for stable

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: platform: Don't initialize driver-private data

Platform device drivers usually use the driver-private data for their
own purposes. Having it overwritten by drm_platform_init() is confusing
and error-prone.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/debugfs: remove redundant info from gem_names

It's a relic of "drm: Convert proc files to seq_file and introduce debugfs",
which wrongly converted DRM_INFO + sprintf to 2 seq_printfs.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Ben Gamari <bgamari@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: fb: cma: Fail gracefully on allocation failure

The drm_gem_cma_create() function never returns NULL but rather an error
encoded in the return value using the ERR_PTR() macro. Callers therefore
need to check for errors using the IS_ERR() macro. This change allows
drivers to handle contiguous DMA allocation failures gracefully.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: fb: cma: Fix typo in debug message

The debug message showing the resolution of a framebuffer to be
allocated is missing a closing parenthesis.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

ext4: Avoid underflow in ext4_trim_fs()

Currently if len argument in ext4_trim_fs() is smaller than one block,
the 'end' variable underflow. Avoid that by returning EINVAL if len is
smaller than file system block.

Also remove useless unlikely().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT

KVM_PV_REASON_PAGE_NOT_PRESENT kicks cpu out of idleness, but we haven't
marked that spot as an exit from idleness.

Not doing so can cause RCU warnings such as:

[  732.788386] ===============================
[  732.789803] [ INFO: suspicious RCU usage. ]
[  732.790032] 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63 Tainted: G        W
[  732.790032] -------------------------------
[  732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
[  732.790032]
[  732.790032] other info that might help us debug this:
[  732.790032]
[  732.790032]
[  732.790032] RCU used illegally from idle CPU!
[  732.790032] rcu_scheduler_active = 1, debug_locks = 1
[  732.790032] RCU used illegally from extended quiescent state!
[  732.790032] 2 locks held by trinity-child31/8252:
[  732.790032]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff83a67528>] __schedule+0x178/0x8f0
[  732.790032]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81152bde>] cpuacct_charge+0xe/0x200
[  732.790032]
[  732.790032] stack backtrace:
[  732.790032] Pid: 8252, comm: trinity-child31 Tainted: G        W    3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63
[  732.790032] Call Trace:
[  732.790032]  [<ffffffff8118266b>] lockdep_rcu_suspicious+0x10b/0x120
[  732.790032]  [<ffffffff81152c60>] cpuacct_charge+0x90/0x200
[  732.790032]  [<ffffffff81152bde>] ? cpuacct_charge+0xe/0x200
[  732.790032]  [<ffffffff81158093>] update_curr+0x1a3/0x270
[  732.790032]  [<ffffffff81158a6a>] dequeue_entity+0x2a/0x210
[  732.790032]  [<ffffffff81158ea5>] dequeue_task_fair+0x45/0x130
[  732.790032]  [<ffffffff8114ae29>] dequeue_task+0x89/0xa0
[  732.790032]  [<ffffffff8114bb9e>] deactivate_task+0x1e/0x20
[  732.790032]  [<ffffffff83a67c29>] __schedule+0x879/0x8f0
[  732.790032]  [<ffffffff8117e20d>] ? trace_hardirqs_off+0xd/0x10
[  732.790032]  [<ffffffff810a37a5>] ? kvm_async_pf_task_wait+0x1d5/0x2b0
[  732.790032]  [<ffffffff83a67cf5>] schedule+0x55/0x60
[  732.790032]  [<ffffffff810a37c4>] kvm_async_pf_task_wait+0x1f4/0x2b0
[  732.790032]  [<ffffffff81139e50>] ? abort_exclusive_wait+0xb0/0xb0
[  732.790032]  [<ffffffff81139c25>] ? prepare_to_wait+0x25/0x90
[  732.790032]  [<ffffffff810a3a66>] do_async_page_fault+0x56/0xa0
[  732.790032]  [<ffffffff83a6a6e8>] async_page_fault+0x28/0x30

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: apic: fix LDR calculation in x2apic mode

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: fix release noslot pfn

We can not directly call kvm_release_pfn_clean to release the pfn
since we can meet noslot pfn which is used to cache mmio info into
spte

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

perf test: Fix exclude_guest parse events tests

Event parsing tests are broken by following commit:

  perf tool: Precise mode requires exclude_guest
  commit 1342798cc13e3b48d9b5738f0c8fa812ccea8101
  Author: David Ahern <dsahern@gmail.com>
  Date:   Thu Sep 13 14:59:13 2012 -0600

which enables 'exclude_guest' modifier any time the 'precise'
modifier is detected.

Fixing related tests and adding special comment.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: do not flush maps on COMM for perf report

This fixes a long-standing bug caused by the lack of separate COMM and EXEC
record types, which makes "perf report" lose track of symbols when a process
renames itself.

With this fix (suggested by Stephane Eranian), a COMM (rename) no longer
flushes the maps, which is the correct behavior. An EXEC also no longer
flushes the maps, but this doesn't matter because as new mappings are created
(for the executable and the libraries) the old mappings are automatically
removed. This is not by accident: the functionality is necessary because DLLs
can be explicitly loaded at any time with dlopen(), possibly on top of existing
text, so "perf report" handles correctly the clobbering of new mappings on top
of old ones.

An alternative patch (which I proposed earlier) would be to introduce a
separate PERF_RECORD_EXEC type, but it is a much larger change (about 300
lines) and is not necessary.

Signed-off-by: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Stephane Eranian <eranian@google.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Wilson <wilsons@start.ca>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Link: http://lkml.kernel.org/r/1345585940-6497-1-git-send-email-semenzato@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf help: Fix --help for builtins

It seems that commit cc5848213329 ("perf help: Remove use of die and
handle errors") caused the problem - it changed the initial value of
'help_format' from HELP_FORMAT_MAN to HELP_FORMAT_NONE.

This broke the --help option for all builtins, that would produce no
output, while 'man perf-top' would work it MANPATH is properly setup.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/87r4orj7zc.fsf@sejong.aot.lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge branch 'drm-nouveau-fixes' of git://git.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

Fixes from Ben, off note:
ACPI ROM regression fix,
some IGP and AGP regressions fixes from rework fallout.

* 'drm-nouveau-fixes' of git://git.freedesktop.org/git/nouveau/linux-2.6:
  drm/nouveau/clock: fix missing pll type/addr when matching default entry
  drm/nouveau/fb: fix reporting of memory type on GF8+ IGPs
  drm/nv41/vm: don't init hw pciegart on boards with agp bridge
  drm/nouveau/bios: fetch full 4KiB block to determine ACPI ROM image size
  drm/nouveau: validate vbios size
  drm/nouveau: warn when trying to free mm which is still in use
  drm/nouveau: fix nouveau_mm/nouveau_mm_node leak
  drm/nouveau/bios: improve error handling when reading the vbios from ACPI
  drm/nouveau: handle same-fb page flips

module_signing: fix printk format warning

Fix the warning:

kernel/module_signing.c:195:2: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'

by using the proper 'z' modifier for printing a size_t.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k updates from Geert Uytterhoeven:
"Just the expected UAPI disintegration and the "new" kcmp syscall."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Wire up kcmp
  m68k: Remove empty #ifdef/#else/#endif block
  UAPI: (Scripted) Disintegrate arch/m68k/include/asm

Input: fix use-after-free introduced with dynamic minor changes

Commit 7f8d4cad1e4e ("Input: extend the number of event (and other)
devices") made evdev, joydev and mousedev to embed struct cdev into
their respective structures representing input devices.

Unfortunately character device structure may outlive the parent
structure unless we do not set it up as parent of character device so
that it will stay pinned until character device is freed.

Also, now that parent structure is pinned while character device exists
we do not need to pin and unpin it every time user opens or closes it.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

char_dev: pin parent kobject

In certain cases (for example when a cdev structure is embedded into
another object whose lifetime is controlled by a separate kobject) it is
beneficial to tie lifetime of another object to the lifetime of
character device so that related object is not freed until after
char_dev object is freed.

To achieve this let's pin kobject's parent when doing cdev_add() and
unpin when last reference to cdev structure is being released.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drm/nouveau/clock: fix missing pll type/addr when matching default entry

This issue is a regression from 70790f4f819875e8f390871fd15bbbf823f28e1b,
and causes us to miss a special-case for C51 (NV4E) chipsets and return
the wrong reference frequency for the VPLLs.

Should fix fdo#56202

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

ext4: Checksum the block bitmap properly with bigalloc enabled

In mke2fs, we only checksum the whole bitmap block and it is right.
While in the kernel, we use EXT4_BLOCKS_PER_GROUP to indicate the
size of the checksumed bitmap which is wrong when we enable bigalloc.
The right size should be EXT4_CLUSTERS_PER_GROUP and this patch fixes
it.

Also as every caller of ext4_block_bitmap_csum_set and
ext4_block_bitmap_csum_verify pass in EXT4_BLOCKS_PER_GROUP(sb)/8,
we'd better removes this parameter and sets it in the function itself.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org

drm/nouveau/fb: fix reporting of memory type on GF8+ IGPs

Purely a cosmetic issue at this point.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nv41/vm: don't init hw pciegart on boards with agp bridge

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/bios: fetch full 4KiB block to determine ACPI ROM image size

Buggy firmware leads to bad things happening otherwise..

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau: validate vbios size

Without checking, we could detect vbios size as 0, allocate 0-byte array
(kmalloc returns invalid pointer for such allocation) and crash in
nouveau_bios_score while checking for vbios signature.

Reported-by: Heinz Diehl <htd@fritha.org>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau: warn when trying to free mm which is still in use

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau: fix nouveau_mm/nouveau_mm_node leak

v2: use already existing parent

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/bios: improve error handling when reading the vbios from ACPI

Reported-by: Pawel Sikora <pawel.sikora@agmk.net>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau: handle same-fb page flips

It's questionable use case, but weston/wayland already relies on this
behaviour, and other drivers don't care about it, so it's a matter of
compatibility. Without it, process invoking such page flip hangs in
unkillable state, trying to reserve the same buffer twice.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

perf trace: Check if sample raw_data field is set

Sometimes we're segfaulting because we were expecting that the
perf_sample.raw_data field was set as requested, but in some cases
that needs further investigation, that field can be NULL, leading
to segfaults.

Make the tool more robust by checking that before calling any per event
handlers that may try to use that field.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-g1fmodl6ys4lq8honbj1igoi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf trace: Validate syscall id before growing syscall table

In some cases the ID for a syscall read thru the raw_syscalls tracepoint
is bogus, still needs to be investigated why, but to make the tool more
robust first try to resolve the ID to a name via libaudit and if it
fails, don't grow the table.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-0lsokw3xor7c4ijo45u6bauh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes

Daniel writes:
The big thing is the disabling of the hsw support by default, cc: stable.
We've aimed for basic hsw support in 3.6, but due to a few bad
happenstances we've screwed up and only 3.8 will have better modeset
support than vesa. To avoid yet another round of fallout from such a
gaffle on for the next platform we've added a module option to disable
early hw support by default. That should also give us more flexibility in
bring-up.

Otherwise just small fixes:
- 3 fixes from Egbert for sdvo corner cases
- invert-brightness quirk entry from Egbert
- revert a dp link training change, it regresses some setups
- and shut up a spurious WARN in our gem fault handler.
- regression fix for an oops on bit17 swizzling machines, introduce in 3.7
- another no-lvds quirk

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()
  drm/i915: Add no-lvds quirk for Supermicro X7SPA-H
  drm/i915: Insert i915_preliminary_hw_support variable.
  drm/i915: shut up spurious WARN in the gtt fault handler
  Revert "drm/i915: Try harder to complete DP training pattern 1"
  DRM/i915: Restore sdvo_flags after dtd->mode->dtd Roundrtrip.
  DRM/i915: Don't clone SDVO LVDS with analog.
  DRM/i915: Add QUIRK_INVERT_BRIGHTNESS for NCR machines.
  DRM/i915: Don't delete DPLL Multiplier during DAC init.

Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent

Pull ftrace ring-buffer resizing fix from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Documentation: Reflect the new location of the NMI watchdog info

Commit 9919cba7 ("watchdog: Update documentation") moved the
NMI watchdog documentation from nmi_watchdog.txt to
lockup-watchdogs.txt. Update the index file accordingly.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/20121021120551.4656d99b@endymion.delvare
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/urgent

Pull various uprobes bugfixes from Oleg Nesterov - mostly race and
failure path fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'numascale_mce_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/urgent

Pull a fix from Borislav Petkov:

"Fix for a bug causing an OOPS on confederate systems from Numascale
  where northbridge descriptors are not unique, causing a lookup failure
  of the northbridge descriptor in the amd_nb.c code.

  A more general fix is in the works."

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'nohz/core' of git://github.com/fweisbec/linux-dynticks into timers/core

Pull uncontroversial cleanup/refactoring nohz patches from Frederic Weisbecker.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent

Pull event-wrapping Oprofile fix from Robert Richter.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

numa, mm: Fix NUMA hinting page faults from gup/gup_fast

Introduce FOLL_NUMA to tell follow_page to check
pte/pmd_numa. get_user_pages must use FOLL_NUMA, and it's safe to do
so because it always invokes handle_mm_fault and retries the
follow_page later.

KVM secondary MMU page faults will trigger the NUMA hinting page
faults through gup_fast -> get_user_pages -> follow_page ->
handle_mm_fault.

Other follow_page callers like KSM should not use FOLL_NUMA, or they
would fail to get the pages if they use follow_page instead of
get_user_pages.

[ This patch was picked up from the AutoNUMA tree. ]

Originally-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
[ ported to this tree. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Implement constant, per task Working Set Sampling (WSS) rate

Previously, to probe the working set of a task, we'd use
a very simple and crude method: mark all of its address
space PROT_NONE.

That method has various (obvious) disadvantages:

- it samples the working set at dissimilar rates,
   giving some tasks a sampling quality advantage
   over others.

- creates performance problems for tasks with very
   large working sets

- over-samples processes with large address spaces but
   which only very rarely execute

Improve that method by keeping a rotating offset into the
address space that marks the current position of the scan,
and advance it by a constant rate (in a CPU cycles execution
proportional manner). If the offset reaches the last mapped
address of the mm then it then it starts over at the first
address.

The per-task nature of the working set sampling functionality
in this tree allows such constant rate, per task,
execution-weight proportional sampling of the working set,
with an adaptive sampling interval/frequency that goes from
once per 100 msecs up to just once per 1.6 seconds.
The current sampling volume is 256 MB per interval.

As tasks mature and converge their working set, so does the
sampling rate slow down to just a trickle, 256 MB per 1.6
seconds of CPU time executed.

This, beyond being adaptive, also rate-limits rarely
executing systems and does not over-sample on overloaded
systems.

[ In AutoNUMA speak, this patch deals with the effective sampling
  rate of the 'hinting page fault'. AutoNUMA's scanning is
  currently rate-limited, but it is also fundamentally
  single-threaded, executing in the knuma_scand kernel thread,
  so the limit in AutoNUMA is global and does not scale up with
  the number of CPUs, nor does it scan tasks in an execution
  proportional manner.

  So the idea of rate-limiting the scanning was first implemented
  in the AutoNUMA tree via a global rate limit. This patch goes
  beyond that by implementing an execution rate proportional
  working set sampling rate that is not implemented via a single
  global scanning daemon. ]

Based-on-idea-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-wt5b48o2226ec63784i58s3j@git.kernel.org
[ Wrote changelog and fixed bug. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'linus'

Linux 3.7-rc2

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64

Pull arm64 fixes from Catalin Marinas:
"Main changes:
   - AArch64 Linux compilation fixes following 3.7-rc1 changes
     (MODULES_USE_ELF_RELA, update_vsyscall() prototype)
   - Unnecessary register setting in start_thread() (thanks to Al Viro)
   - ptrace fixes"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
  arm64: fix alignment padding in assembly code
  arm64: ptrace: use HW_BREAKPOINT_EMPTY type for disabled breakpoints
  arm64: ptrace: make structure padding explicit for debug registers
  arm64: No need to set the x0-x2 registers in start_thread()
  arm64: Ignore memory blocks below PHYS_OFFSET
  arm64: Fix the update_vsyscall() prototype
  arm64: Select MODULES_USE_ELF_RELA
  arm64: Remove duplicate inclusion of mmu_context.h in smp.c

arm64: fix alignment padding in assembly code

An interesting effect of using the generic version of linkage.h
is that the padding is defined in terms of x86 NOPs, which can have
even more interesting effects when the assembly code looks like this:

ENTRY(func1)
mov x0, xzr
ENDPROC(func1)
// fall through
ENTRY(func2)
mov x0, #1
ret
ENDPROC(func2)

Admittedly, the code is not very nice. But having code from another
architecture doesn't look completely sane either.

The fix is to add arm64's version of linkage.h, which causes the insertion
of proper AArch64 NOPs.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Merge branch 'tools/kvm'

Merge branch 'perf/urgent'

Merge branch 'linus'

perf/x86: Disable uncore on virtualized CPUs

Initializing uncore PMU on virtualized CPU may hang the kernel.
This is because kvm does not emulate the entire hardware. Thers
are lots of uncore related MSRs, making kvm enumerate them all
is a non-trival task. So just disable uncore on virtualized CPU.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Tested-by: Pekka Enberg <penberg@kernel.org>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: andi@firstfloor.org
Cc: avi@redhat.com
Link: http://lkml.kernel.org/r/1345540117-14164-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'x86/urgent'

Merge branch 'perf/urgent'

Merge branch 'numa/core'

sched: Fix !CONFIG_SCHED_NUMA account_numa_enqueue() variant

This build warning:

kernel/sched/fair.c:1015:1: warning: no return statement in function returning non-void

Triggers because the dummy account_numa_enqueue() should return the
rq's task list.

It's not possible to trigger this bug runtime, nevertheless fix the
warning.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

MIPS/thp: Fix update_mmu_cache() cache call

As per recent upstream commit:

b113da65785d mm: Add and use update_mmu_cache_pmd() in transparent huge page code.

The call in do_huge_pmd_prot_none() needs to call update_mmu_cache_pmd()
as well.

This resolves a MIPS build error triggered on linux-next.

Reported-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

MIPS/thp: Add pmd_pgprot() implementation

Resolve the semantic conflict between the new THP code
on MIPS and the new NUMA code, in linux-next, by adding
the pmd_pgprot() method needed by the NUMA code.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

s390/thp: implement pmd_pgprot() for s390

Git commit "mm/thp: Preserve pgprot across huge page split"
introduced a pmd_pgprot() function, which is missing
on s390, resulting in a compile error in linux-next where
THP is enabled on s390 as well.

This patch adds an implementation of pmd_pgprot() for s390.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

kvm tools: Port to v3.7-rc1

There were a number of changes that caused problems:

- UAPI conversion

- the old version of the rbtree-augmented API got removed, I
updated the implementation to the new API

- lib/rbtree.c wants a 'true' definition

Lightly tested: it boots a v3.7-rc1 defconfig+kvmconfig kernel.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

Merge commit 'v3.7-rc1' into kvmtool/next