git.karo-electronics.de Git - karo-tx-linux.git/log

manual merge of numa/core

Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm, s390/thp: Add pmd_mknotpresent()

We'd like to make use of pmd_mknotpresent() in mm/, but
not all architectures have it. This patch adds the s390
version.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Link: http://lkml.kernel.org/r/20121028131014.GA10754@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/mm: Completely drop the TLB flush from ptep_set_access_flags()

Intel has an architectural guarantee that the TLB entry causing
a page fault gets invalidated automatically. This means
we should be able to drop the local TLB invalidation.

Because of the way other areas of the page fault code work,
chances are good that all x86 CPUs do this. However, if
someone somewhere has an x86 CPU that does not invalidate
the TLB entry causing a page fault, this one-liner should
be easy to revert - or a CPU model specific quirk could
be added to retain this optimization on most CPUs.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
[ Applied changelog massage and moved this last in the series,
to create bisection distance. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Implement slow start for working set sampling

Add a 1 second delay before starting to scan the working set of
a task and starting to balance it amongst nodes.

[ note that before the constant per task WSS sampling rate patch
  the initial scan would happen much later still, in effect that
  patch caused this regression. ]

The theory is that short-run tasks benefit very little from NUMA
placement: they come and go, and they better stick to the node
they were started on. As tasks mature and rebalance to other CPUs
and nodes, so does their NUMA placement have to change and so
does it start to matter more and more.

In practice this change fixes an observable kbuild regression:

   # [ a perf stat --null --repeat 10 test of ten bzImage builds to /dev/shm ]

   !NUMA:
   45.291088843 seconds time elapsed                                          ( +-  0.40% )
   45.154231752 seconds time elapsed                                          ( +-  0.36% )

   +NUMA, no slow start:
   46.172308123 seconds time elapsed                                          ( +-  0.30% )
   46.343168745 seconds time elapsed                                          ( +-  0.25% )

   +NUMA, 1 sec slow start:
   45.224189155 seconds time elapsed                                          ( +-  0.25% )
   45.160866532 seconds time elapsed                                          ( +-  0.17% )

and it also fixes an observable perf bench (hackbench) regression:

   # perf stat --null --repeat 10 perf bench sched messaging

   -NUMA:

   -NUMA:                  0.246225691 seconds time elapsed                   ( +-  1.31% )
   +NUMA no slow start:    0.252620063 seconds time elapsed                   ( +-  1.13% )

   +NUMA 1sec delay:       0.248076230 seconds time elapsed                   ( +-  1.35% )

The implementation is simple and straightforward, most of the patch
deals with adding the /proc/sys/kernel/sched_numa_scan_delay_ms tunable
knob.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-vn7p3ynbwqt3qqewhdlvjltc@git.kernel.org
[ Wrote the changelog, ran measurements, tuned the default. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Add NUMA_MIGRATION feature flag

After this patch, doing:

# echo NO_NUMA_MIGRATION > /sys/kernel/debug/sched_features

Will turn off the NUMA placement logic/policy - but keeps the
working set sampling faults in place.

This allows the debugging of the WSS facility, by using it
but keeping vanilla, non-NUMA CPU and memory placement
policies.

Default enabled. Generates on extra code on !CONFIG_SCHED_DEBUG.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-xjt7bqjlphxRfjXxasqm4cdv@git.kernel.org

sched, numa, mm: Implement constant, per task Working Set Sampling (WSS) rate

Previously, to probe the working set of a task, we'd use
a very simple and crude method: mark all of its address
space PROT_NONE.

That method has various (obvious) disadvantages:

- it samples the working set at dissimilar rates,
   giving some tasks a sampling quality advantage
   over others.

- creates performance problems for tasks with very
   large working sets

- over-samples processes with large address spaces but
   which only very rarely execute

Improve that method by keeping a rotating offset into the
address space that marks the current position of the scan,
and advance it by a constant rate (in a CPU cycles execution
proportional manner). If the offset reaches the last mapped
address of the mm then it then it starts over at the first
address.

The per-task nature of the working set sampling functionality
in this tree allows such constant rate, per task,
execution-weight proportional sampling of the working set,
with an adaptive sampling interval/frequency that goes from
once per 100 msecs up to just once per 1.6 seconds.
The current sampling volume is 256 MB per interval.

As tasks mature and converge their working set, so does the
sampling rate slow down to just a trickle, 256 MB per 1.6
seconds of CPU time executed.

This, beyond being adaptive, also rate-limits rarely
executing systems and does not over-sample on overloaded
systems.

[ In AutoNUMA speak, this patch deals with the effective sampling
  rate of the 'hinting page fault'. AutoNUMA's scanning is
  currently rate-limited, but it is also fundamentally
  single-threaded, executing in the knuma_scand kernel thread,
  so the limit in AutoNUMA is global and does not scale up with
  the number of CPUs, nor does it scan tasks in an execution
  proportional manner.

  So the idea of rate-limiting the scanning was first implemented
  in the AutoNUMA tree via a global rate limit. This patch goes
  beyond that by implementing an execution rate proportional
  working set sampling rate that is not implemented via a single
  global scanning daemon. ]

[ Dan Carpenter pointed out a possible NULL pointer dereference in the
  first version of this patch. ]

Based-on-idea-by: Andrea Arcangeli <aarcange@redhat.com>
Bug-Found-By: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-wt5b48o2226ec63784i58s3j@git.kernel.org
[ Wrote changelog and fixed bug. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Add credits for NUMA placement

The NUMA placement code has been rewritten several times, but
the basic ideas took a lot of work to develop. The people who
put in the work deserve credit for it. Thanks Andrea & Peter :)

[ The Documentation/scheduler/numa-problem.txt file should
probably be rewritten once we figure out the final details of
what the NUMA code needs to do, and why. ]

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: aarcange@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20121018171928.24d06af4@cuia.bos.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
----
This is against tip.git numa/core

sched, numa, mm: Add fault driven placement and migration policy

As per the problem/design document Documentation/scheduler/numa-problem.txt
implement 3ac & 4.

( A pure 3a was found too unstable, I did briefly try 3bc
but found no significant improvement. )

Implement a per-task memory placement scheme relying on a regular
PROT_NONE 'migration' fault to scan the memory space of the procress
and uses a two stage migration scheme to reduce the invluence of
unlikely usage relations.

It relies on the assumption that the compute part is tied to a
paticular task and builds a task<->page relation set to model the
compute<->data relation.

In the previous patch we made memory migrate towards where the task
is running, here we select the node on which most memory is located
as the preferred node to run on.

This creates a feed-back control loop between trying to schedule a
task on a node and migrating memory towards the node the task is
scheduled on.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Rik van Riel <riel@redhat.com>
Fixes-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-8ejt0ioj62k5ruf5zd2ix9zu@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm/mpol: Add_MPOL_F_HOME

Add MPOL_F_HOME, to implement multi-stage home node binding.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-ec1gnZa9xaqwlsSjsfqjyxcl@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Introduce last_nid in the pageframe

Introduce a per-page last_nid field, fold this into the struct
page::flags field whenever possible.

The unlikely/rare 32bit NUMA configs will likely grow the page-frame.

Completely dropping 32bit support for CONFIG_SCHED_NUMA would simplify
things, but it would also remove the warning if we grow enough 64bit
only page-flags to push the last-nid out.

Suggested-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/n/tip-0uois4f9skfw9mwyk1yoy0jq@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Implement home-node awareness

Implement home node preference in the scheduler's load-balancer.

This is done in four pieces:

- task_numa_hot(); make it harder to migrate tasks away from their
   home-node, controlled using the NUMA_HOT feature flag.

- select_task_rq_fair(); prefer placing the task in their home-node,
   controlled using the NUMA_TTWU_BIAS feature flag. Disabled by
   default for we found this to be far too agressive.

- load_balance(); during the regular pull load-balance pass, try
   pulling tasks that are on the wrong node first with a preference
   of moving them nearer to their home-node through task_numa_hot(),
   controlled through the NUMA_PULL feature flag.

- load_balance(); when the balancer finds no imbalance, introduce
   some such that it still prefers to move tasks towards their
   home-node, using active load-balance if needed, controlled through
   the NUMA_PULL_BIAS feature flag.

   In particular, only introduce this BIAS if the system is otherwise
   properly (weight) balanced and we either have an offnode or !numa
   task to trade for it.

In order to easily find off-node tasks, split the per-cpu task list
into two parts.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-g0gzzzjrvrzxl6kgwnil1e97@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Implement THP migration

Add THP migration for the NUMA working set scanning fault case.

It uses the page lock to serialize. No migration pte dance is
necessary because the pte is already unmapped when we decide
to migrate.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-yv9vbiz2s455zxq1ffzx3fye@git.kernel.org
[ Significant fixes and changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Introduce sched_feat_numa()

Avoid a few #ifdef's later on.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-sxrRsaqz4cj1plnzyjbtWzbf@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm/mpol: Make mempolicy home-node aware

Add another layer of fallback policy to make the home node concept
useful from a memory allocation PoV.

This changes the mpol order to:

- vma->vm_ops->get_policy [if applicable]
- vma->vm_policy [if applicable]
- task->mempolicy
- tsk_home_node() preferred [NEW]
- default_policy

Note that the tsk_home_node() policy has Migrate-on-Fault enabled to
facilitate efficient on-demand memory migration.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-ybn5oygsqXZa6jtwpwjywmdu@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Introduce tsk_home_node()

Introduce the home-node concept for tasks. In order to keep memory
locality we need to have a something to stay local to, we define the
home-node of a task as the node we prefer to allocate memory from and
prefer to execute on.

These are no hard guarantees, merely soft preferences. This allows for
optimal resource usage, we can run a task away from the home-node, the
remote memory hit -- while expensive -- is less expensive than not
running at all, or very little, due to severe cpu overload.

Similarly, we can allocate memory from another node if our home-node
is depleted, again, some memory is better than no memory.

This patch merely introduces the basic infrastructure, all policy
comes later.

NOTE: we introduce the concept of EMBEDDED_NUMA, these are
architectures where the memory access cost doesn't depend on the cpu
but purely on the physical address -- embedded boards with cheap
(slow) and expensive (fast) memory banks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-ii8j8cp87cgctecfqp2ib6rn@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/mpol: Use special PROT_NONE to migrate pages

Combine our previous PROT_NONE, mpol_misplaced and
migrate_misplaced_page() pieces into an effective migrate on fault
scheme.

Note that (on x86) we rely on PROT_NONE pages being !present and avoid
the TLB flush from try_to_unmap(TTU_MIGRATION). This greatly improves
the page-migration performance.

Suggested-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/n/tip-e98gyl8kr9jzooh2s4piuils@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/migrate: Introduce migrate_misplaced_page()

Add migrate_misplaced_page() which deals with migrating pages from
faults.

This includes adding a new MIGRATE_FAULT migration mode to
deal with the extra page reference required due to having to look up
the page.

Based-on-work-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-es03i8ne7xee0981brw40fl5@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

numa, mm: Support NUMA hinting page faults from gup/gup_fast

Introduce FOLL_NUMA to tell follow_page to check
pte/pmd_numa. get_user_pages must use FOLL_NUMA, and it's safe to do
so because it always invokes handle_mm_fault and retries the
follow_page later.

KVM secondary MMU page faults will trigger the NUMA hinting page
faults through gup_fast -> get_user_pages -> follow_page ->
handle_mm_fault.

Other follow_page callers like KSM should not use FOLL_NUMA, or they
would fail to get the pages if they use follow_page instead of
get_user_pages.

[ This patch was picked up from the AutoNUMA tree. ]

Originally-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
[ ported to this tree. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/mpol: Add MPOL_MF_LAZY

This patch adds another mbind() flag to request "lazy migration". The
flag, MPOL_MF_LAZY, modifies MPOL_MF_MOVE* such that the selected
pages are marked PROT_NONE. The pages will be migrated in the fault
path on "first touch", if the policy dictates at that time.

"Lazy Migration" will allow testing of migrate-on-fault via mbind().
Also allows applications to specify that only subsequently touched
pages be migrated to obey new policy, instead of all pages in range.
This can be useful for multi-threaded applications working on a
large shared data area that is initialized by an initial thread
resulting in all pages on one [or a few, if overflowed] nodes.
After PROT_NONE, the pages in regions assigned to the worker threads
will be automatically migrated local to the threads on 1st touch.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
[ nearly complete rewrite.. ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-7rsodo9x8zvm5awru5o7zo0y@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/mpol: Create special PROT_NONE infrastructure

In order to facilitate a lazy -- fault driven -- migration of pages,
create a special transient PROT_NONE variant, we can then use the
'spurious' protection faults to drive our migrations from.

Pages that already had an effective PROT_NONE mapping will not
be detected to generate these 'spuriuos' faults for the simple reason
that we cannot distinguish them on their protection bits, see
pte_numa().

This isn't a problem since PROT_NONE (and possible PROT_WRITE with
dirty tracking) aren't used or are rare enough for us to not care
about their placement.

Suggested-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/n/tip-0g5k80y4df8l83lha9j75xph@git.kernel.org
[ fixed various cross-arch and THP/!THP details ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/mpol: Check for misplaced page

This patch provides a new function to test whether a page resides
on a node that is appropriate for the mempolicy for the vma and
address where the page is supposed to be mapped.  This involves
looking up the node where the page belongs.  So, the function
returns that node so that it may be used to allocated the page
without consulting the policy again.

A subsequent patch will call this function from the fault path.
Because of this, I don't want to go ahead and allocate the page, e.g.,
via alloc_page_vma() only to have to free it if it has the correct
policy.  So, I just mimic the alloc_page_vma() node computation
logic--sort of.

Note:  we could use this function to implement a MPOL_MF_STRICT
behavior when migrating pages to match mbind() mempolicy--e.g.,
to ensure that pages in an interleaved range are reinterleaved
rather than left where they are when they reside on any page in
the interleave nodemask.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
[ Added MPOL_F_LAZY to trigger migrate-on-fault;
  simplified code now that we don't have to bother
  with special crap for interleaved ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-z3mgep4tgrc08o07vl1ahb2m@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/mpol: Add MPOL_MF_NOOP

This patch augments the MPOL_MF_LAZY feature by adding a "NOOP" policy
to mbind().  When the NOOP policy is used with the 'MOVE and 'LAZY
flags, mbind() will map the pages PROT_NONE so that they will be
migrated on the next touch.

This allows an application to prepare for a new phase of operation
where different regions of shared storage will be assigned to
worker threads, w/o changing policy.  Note that we could just use
"default" policy in this case.  However, this also allows an
application to request that pages be migrated, only if necessary,
to follow any arbitrary policy that might currently apply to a
range of pages, without knowing the policy, or without specifying
multiple mbind()s for ranges with different policies.

[ Bug in early version of mpol_parse_str() reported by Fengguang Wu. ]

Bug-Reported-by: Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/mpol: Make MPOL_LOCAL a real policy

Make MPOL_LOCAL a real and exposed policy such that applications that
relied on the previous default behaviour can explicitly request it.

Requested-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/pgprot: Move the pgprot_modify() fallback definition to mm.h

pgprot_modify() is available on x86, but on other architectures it only
gets defined in mm/mprotect.c - breaking the build if anything outside
of mprotect.c tries to make use of this function.

Move it to the generic pgprot area in mm.h, so that an upcoming patch
can make use of it.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rik van Riel <riel@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-nfvarGMj9gjavowroorkizb4@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm, MIPS/thp: Add pmd_pgprot() implementation

Add the pmd_pgprot() method that will be needed
by the new NUMA code.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm: Only flush the TLB when clearing an accessible pte

If ptep_clear_flush() is called to clear a page table entry that is
accessible anyway by the CPU, eg. a _PAGE_PROTNONE page table entry,
there is no need to flush the TLB on remote CPUs.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-vm3rkzevahelwhejx5uwm8ex@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/mm: Introduce pte_accessible()

We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that
the pte is associated with a page.

However, for TLB flushing purposes, we would like to know whether the pte
points to an actually accessible page. This allows us to skip remote TLB
flushes for pages that are not actually accessible.

Fill in this method for x86 and provide a safe (but slower) method
on other architectures.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixed-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org
[ Added Linus's review fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/thp: Preserve pgprot across huge page split

We're going to play games with page-protections, ensure we don't lose
them over a THP split.

Collapse seems to always allocate a new (huge) page which should
already end up on the new target node so loosing protections there
isn't a problem.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/n/tip-eyi25t4eh3l4cd2zp4k3bj6c@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm, s390/thp: Implement pmd_pgprot() for s390

This patch adds an implementation of pmd_pgprot() for s390,
in preparation to future THP changes.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Describe the NUMA scheduling problem formally

This is probably a first: formal description of a complex high-level
computing problem, within the kernel source.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mike Galbraith <efault@gmx.de>
Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/n/tip-mmnlpupoetcatimvjEld16Pb@git.kernel.org
[ Next step: generate the kernel source from such formal descriptions and retire to a tropical island! ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, numa, mm: Make find_busiest_queue() a method

Its a bit awkward but it was the least painful means of modifying the
queue selection. Used in a later patch to conditionally use a random
queue.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-lfpez319yryvdhwqfqrh99f2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'x86/ras'

Merge branch 'x86/cleanups'

Merge branch 'x86/boot'

Merge branch 'x86/asm'

Merge branch 'x86/acpi'

Merge branch 'timers/urgent'

Merge branch 'timers/core'

Merge branch 'sched/urgent'

Merge branch 'sched/core'

Merge branch 'perf/core'

Merge branch 'numa/misc'

Merge branch 'core/locking'

Merge branch 'tools/kvm'

x86/mm: Only do a local tlb flush in ptep_set_access_flags()

Because we only ever upgrade a PTE when calling ptep_set_access_flags(),
it is safe to skip flushing entries on remote TLBs.

The worst that can happen is a spurious page fault on other CPUs, which
would flush that TLB entry.

Lazily letting another CPU incur a spurious page fault occasionally
is (much!) cheaper than aggressively flushing everybody else's TLB.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

mm/generic: Only flush the local TLB in ptep_set_access_flags()

The function ptep_set_access_flags() is only ever used to upgrade
access permissions to a page - i.e. they make it less restrictive.

That means the only negative side effect of not flushing remote
TLBs in this function is that other CPUs may incur spurious page
faults, if they happen to access the same address, and still have
a PTE with the old permissions cached in their TLB caches.

Having another CPU maybe incur a spurious page fault is faster
than always incurring the cost of a remote TLB flush, so replace
the remote TLB flush with a purely local one.

This should be safe on every architecture that correctly
implements flush_tlb_fix_spurious_fault() to actually invalidate
the local TLB entry that caused a page fault, as well as on
architectures where the hardware invalidates TLB entries that
cause page faults.

In the unlikely event that you are hitting what appears to be
an infinite loop of page faults, and 'git bisect' took you to
this changeset, your architecture needs to implement
flush_tlb_fix_spurious_fault() to actually flush the TLB entry.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
[ Changelog massage. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"This is what we usually expect at this stage of the game, lots of
  little things, mostly in drivers.  With the occasional 'oops didn't
  mean to do that' kind of regressions in the core code."

1) Uninitialized data in __ip_vs_get_timeouts(), from Arnd Bergmann

2) Reject invalid ACK sequences in Fast Open sockets, from Jerry Chu.

3) Lost error code on return from _rtl_usb_receive(), from Christian
    Lamparter.

4) Fix reset resume on USB rt2x00, from Stanislaw Gruszka.

5) Release resources on error in pch_gbe driver, from Veaceslav Falico.

6) Default hop limit not set correctly in ip6_template_metrics[], fix
    from Li RongQing.

7) Gianfar PTP code requests wrong kind of resource during probe, fix
    from Wei Yang.

8) Fix VHOST net driver on big-endian, from Michael S Tsirkin.

9) Mallenox driver bug fixes from Jack Morgenstein, Or Gerlitz, Moni
    Shoua, Dotan Barak, and Uri Habusha.

10) usbnet leaks memory on TX path, fix from Hemant Kumar.

11) Use socket state test, rather than presence of FIN bit packet, to
    determine FIONREAD/SIOCINQ value.  Fix from Eric Dumazet.

12) Fix cxgb4 build failure, from Vipul Pandya.

13) Provide a SYN_DATA_ACKED state to complement SYN_FASTOPEN in socket
    info dumps.  From Yuchung Cheng.

14) Fix leak of security path in kfree_skb_partial().  Fix from Eric
    Dumazet.

15) Handle RX FIFO overflows more resiliently in pch_gbe driver, from
    Veaceslav Falico.

16) Fix MAINTAINERS file pattern for networking drivers, from Jean
    Delvare.

17) Add iPhone5 IDs to IPHETH driver, from Jay Purohit.

18) VLAN device type change restriction is too strict, and should not
    trigger for the automatically generated vlan0 device.  Fix from Jiri
    Pirko.

19) Make PMTU/redirect flushing work properly again in ipv4, from
    Steffen Klassert.

20) Fix memory corruptions by using kfree_rcu() in netlink_release().
    From Eric Dumazet.

21) More qmi_wwan device IDs, from Bjørn Mork.

22) Fix unintentional change of SNAT/DNAT hooks in generic NAT
    infrastructure, from Elison Niven.

23) Fix 3.6.x regression in xt_TEE netfilter module, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  tilegx: fix some issues in the SW TSO support
  qmi_wwan/cdc_ether: move Novatel 551 and E362 to qmi_wwan
  net: usb: Fix memory leak on Tx data path
  net/mlx4_core: Unmap UAR also in the case of error flow
  net/mlx4_en: Don't use vlan tag value as an indication for vlan presence
  net/mlx4_en: Fix double-release-range in tx-rings
  bas_gigaset: fix pre_reset handling
  vhost: fix mergeable bufs on BE hosts
  gianfar_ptp: use iomem, not ioports resource tree in probe
  ipv6: Set default hoplimit as zero.
  NET_VENDOR_TI: make available for am33xx as well
  pch_gbe: fix error handling in pch_gbe_up()
  b43: Fix oops on unload when firmware not found
  mwifiex: clean up scan state on error
  mwifiex: return -EBUSY if specific scan request cannot be honored
  brcmfmac: fix potential NULL dereference
  Revert "ath9k_hw: Updated AR9003 tx gain table for 5GHz"
  ath9k_htc: Add PID/VID for a Ubiquiti WiFiStation
  rt2x00: usb: fix reset resume
  rtlwifi: pass rx setup error code to caller
  ...

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave-dmaengine fixes from Vinod Koul:
"Three fixes for slave dmanegine.

  Two are for typo omissions in sifr dmaengine driver and the last one
  is for the imx driver fixing a missing unlock"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: sirf: fix a typo in moving running dma_desc to active queue
  dmaengine: sirf: fix a typo in dma_prep_interleaved
  dmaengine: imx-dma: fix missing unlock on error in imxdma_xfer_desc()

Merge tag 'pm+acpi-for-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael J Wysocki:

- Fix for a memory leak in acpi_bind_one() from Jesper Juhl.

- Fix for an error code path memory leak in pm_genpd_attach_cpuidle()
   from Jonghwan Choi.

- Fix for smp_processor_id() usage in preemptible code in powernow-k8
   from Andreas Herrmann.

- Fix for a suspend-related memory leak in cpufreq stats from Xiaobing
   Tu.

- Freezer fix for failure to clear PF_NOFREEZE along with PF_KTHREAD in
   flush_old_exec() from Oleg Nesterov.

- acpi_processor_notify() fix from Alan Cox.

* tag 'pm+acpi-for-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: missing break
  freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD
  Fix memory leak in cpufreq stats.
  cpufreq / powernow-k8: Remove usage of smp_processor_id() in preemptible code
  PM / Domains: Fix memory leak on error path in pm_genpd_attach_cpuidle
  ACPI: Fix memory leak in acpi_bind_one()

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband fixes from Roland Dreier:
"Small batch of fixes for 3.7:
   - Fix crash in error path in cxgb4
   - Fix build error on 32 bits in mlx4
   - Fix SR-IOV bugs in mlx4"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Perform correct resource cleanup if mlx4_QUERY_ADAPTER() fails
  mlx4_core: Remove annoying debug messages from SR-IOV flow
  RDMA/cxgb4: Don't free chunk that we have failed to allocate
  IB/mlx4: Synchronize cleanup of MCGs in MCG paravirtualization
  IB/mlx4: Fix QP1 P_Key processing in the Primary Physical Function (PPF)
  IB/mlx4: Fix build error on platforms where UL is not 64 bits

Merge tag 'usb-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg Kroah-Hartman:
"Here are a bunch of USB fixes for the 3.7-rc tree.

  There's a lot of small USB serial driver fixes, and one larger one
  (the mos7840 driver changes are mostly just moving code around to fix
  problems.) Thanks to Johan Hovold for finding the problems and fixing
  them all up.

  Other than those, there is the usual new device ids, xhci bugfixes,
  and gadget driver fixes, nothing out of the ordinary.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'usb-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (49 commits)
  xhci: trivial: Remove assigned but unused ep_ctx.
  xhci: trivial: Remove assigned but unused slot_ctx.
  xhci: Fix missing break in xhci_evaluate_context_result.
  xhci: Fix potential NULL ptr deref in command cancellation.
  ehci: Add yet-another Lucid nohandoff pci quirk
  ehci: fix Lucid nohandoff pci quirk to be more generic with BIOS versions
  USB: mos7840: fix port_probe flow
  USB: mos7840: fix port-data memory leak
  USB: mos7840: remove invalid disconnect handling
  USB: mos7840: remove NULL-urb submission
  USB: qcserial: fix interface-data memory leak in error path
  USB: option: fix interface-data memory leak in error path
  USB: ipw: fix interface-data memory leak in error path
  USB: mos7840: fix port-device leak in error path
  USB: mos7840: fix urb leak at release
  USB: sierra: fix port-data memory leak
  USB: sierra: fix memory leak in probe error path
  USB: sierra: fix memory leak in attach error path
  USB: usb-wwan: fix multiple memory leaks in error paths
  USB: keyspan: fix NULL-pointer dereferences and memory leaks
  ...

Merge tag 'tty-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial fix from Greg Kroah-Hartman:
"Here is one patch, a revert of a omap serial driver patch that was
causing problems, for your 3.7-rc tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'tty-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "serial: omap: fix software flow control"

Merge tag 'staging-3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg Kroah-Hartman:
"Here are some staging driver fixes for your 3.7-rc tree.

  Nothing major here, a number of iio driver fixups that were causing
  problems, some comedi driver bugfixes, and a bunch of tidspbridge
  warning squashing and other regressions fixed from the 3.6 release.

  All have been in the linux-next releases for a bit.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'staging-3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (32 commits)
  staging: tidspbridge: delete unused mmu functions
  staging: tidspbridge: ioremap physical address of the stack segment in shm
  staging: tidspbridge: ioremap dsp sync addr
  staging: tidspbridge: change type to __iomem for per and core addresses
  staging: tidspbridge: drop const from custom mmu implementation
  staging: tidspbridge: request the right irq for mmu
  staging: ipack: add missing include (implicit declaration of function 'kfree')
  staging: ramster: depends on NET
  staging: omapdrm: fix allocation size for page addresses array
  staging: zram: Fix handling of incompressible pages
  Staging: android: binder: Allow using highmem for binder buffers
  Staging: android: binder: Fix memory leak on thread/process exit
  staging: comedi: ni_labpc: fix possible NULL deref during detach
  staging: comedi: das08: fix possible NULL deref during detach
  staging: comedi: amplc_pc263: fix possible NULL deref during detach
  staging: comedi: amplc_pc236: fix possible NULL deref during detach
  staging: comedi: amplc_pc236: fix invalid register access during detach
  staging: comedi: amplc_dio200: fix possible NULL deref during detach
  staging: comedi: 8255_pci: fix possible NULL deref during detach
  staging: comedi: ni_daq_700: fix dio subdevice regression
  ...

Merge tag 'driver-core-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg Kroah-Hartman:
"Here are a number of firmware core fixes for 3.7, and some other minor
  fixes.  And some documentation updates thrown in for good measure.

  All have been in the linux-next tree for a while.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'driver-core-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation:Chinese translation of Documentation/arm64/memory.txt
  Documentation:Chinese translation of Documentation/arm64/booting.txt
  Documentation:Chinese translation of Documentation/IRQ.txt
  firmware loader: document kernel direct loading
  sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()
  dynamic_debug: Remove unnecessary __used
  firmware loader: sync firmware cache by async_synchronize_full_domain
  firmware loader: let direct loading back on 'firmware_buf'
  firmware loader: fix one reqeust_firmware race
  firmware loader: cancel uncache work before caching firmware

Merge tag 'char-misc-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg Kroah-Hartman:
"Here are some driver fixes for 3.7.  They include extcon driver fixes,
  a hyper-v bugfix, and two other minor driver fixes.

  All of these have been in the linux-next releases for a while.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'char-misc-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  sonypi: suspend/resume callbacks should be conditionally compiled on CONFIG_PM_SLEEP
  Drivers: hv: Cleanup error handling in vmbus_open()
  extcon : register for cable interest by cable name
  extcon: trivial: kfree missed from remove path
  extcon: driver model release call not needed
  extcon: MAX77693: Add platform data for MUIC device to initialize registers
  extcon: max77693: Use max77693_update_reg for rmw operations
  extcon: Fix kerneldoc for extcon_set_cable_state and extcon_set_cable_state_
  extcon: adc-jack: Add missing MODULE_LICENSE
  extcon: adc-jack: Fix checking return value of request_any_context_irq
  extcon: Fix return value in extcon_register_interest()
  extcon: unregister compat link on cleanup
  extcon: Unregister compat class at module unload to fix oops
  extcon: optimising the check_mutually_exclusive function
  extcon: standard cable names definition and declaration changed
  extcon-max8997: remove usage of ret in max8997_muic_handle_charger_type_detach
  extcon: Remove duplicate inclusion of extcon.h header file

VFS: don't do protected {sym,hard}links by default

In commit 800179c9b8a1 ("This adds symlink and hardlink restrictions to
the Linux VFS"), the new link protections were enabled by default, in
the hope that no actual application would care, despite it being
technically against legacy UNIX (and documented POSIX) behavior.

However, it does turn out to break some applications. It's rare, and
it's unfortunate, but it's unacceptable to break existing systems, so
we'll have to default to legacy behavior.

In particular, it has broken the way AFD distributes files, see

http://www.dwd.de/AFD/

along with some legacy scripts.

Distributions can end up setting this at initrd time or in system
scripts: if you have security problems due to link attacks during your
early boot sequence, you have bigger problems than some kernel sysctl
setting. Do:

echo 1 > /proc/sys/fs/protected_symlinks
echo 1 > /proc/sys/fs/protected_hardlinks

to re-enable the link protections.

Alternatively, we may at some point introduce a kernel config option
that sets these kinds of "more secure but not traditional" behavioural
options automatically.

Reported-by: Nick Bowler <nbowler@elliptictech.com>
Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # v3.6
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Slightly a high amount of commits come from Adrian Knoth's HDSPM
  driver fixes.  Other than that, all small trival fixes or quirks that
  are pretty driver-specific."

* tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: wm8994: Only enable extra BCLK cycles when required
  ALSA: als3000: check for the kzalloc return value
  ALSA: sound/isa/opti9xx/miro.c: eliminate possible double free
  ALSA: hda - Fix silent headphone output from Toshiba P200
  ALSA: hdspm - Fix coding style in CTL_ELEM macros
  ALSA: hdspm - Fix typo in kcontrol element on RME MADI cards
  ALSA: hdspm - Fix sync_in detection on AES/AES32
  ALSA: hdspm - Fix sync_in reporting on RME MADI cards
  ALSA: hdspm - Also report autosync_sample_rate on MADI and MADIface
  ALSA: hdspm - Fix reported autosync_sample_rate
  ALSA: hdspm - Fix sync check reporting on all RME HDSPM cards
  ALSA: hdspm - Report external rate in slave mode on PCI MADI
  ALSA: hdspm - Allow DDS/Varispeed to be set from userspace
  ALSA: hda - add dock support for Thinkpad T430
  ASoC: ux500_msp_i2s: Fix devm_* and return code merge error
  ASoC: Ux500: Dispose of device nodes correctly

Merge branch 'fixes_for_linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping

Pull DMA-mapping revert from Marek Szyprowski:
"Due to my mistake, my previous pull request (merged as commit
  cff7b8ba60e3: "Merge branch 'fixes_for_linus' ..") contained a patch
  which is aimed for v3.8 and lacks its dependences.  This pull request
  reverts it and fixes build break of ARM architecture."

* 'fixes_for_linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  Revert "ARM: dma-mapping: support debug_dma_mapping_error"

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"This fixes a couple of nasty page table initialization bugs which were
  causing kdump regressions.  A clean rearchitecturing of the code is in
  the works - meanwhile these are reverts that restore the
  best-known-working state of the kernel.

  There's also EFI fixes and other small fixes."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mm: Undo incorrect revert in arch/x86/mm/init.c
  x86: efi: Turn off efi_enabled after setup on mixed fw/kernel
  x86, mm: Find_early_table_space based on ranges that are actually being mapped
  x86, mm: Use memblock memory loop instead of e820_RAM
  x86, mm: Trim memory in memblock to be page aligned
  x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
  x86/efi: Fix oops caused by incorrect set_memory_uc() usage
  x86-64: Fix page table accounting
  Revert "x86/mm: Fix the size calculation of mapping tables"
  MAINTAINERS: Add EFI git repository location

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Most of the kernel diffstat relates to a group of Intel P6 and KNC
  (Xeon-Phi Knights Corner) PMU driver fixes, neither of which is in
  heavy use, so we took the fixes.

  The rest is diverse smallish fixes to the tooling and kernel side."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Remove unused variable in nhmex_rbox_alter_er()
  perf/x86: Enable overflow on Intel KNC with a custom knc_pmu_handle_irq()
  perf/x86: Remove cpuc->enable check on Intl KNC event enable/disable
  perf/x86: Make Intel KNC use full 40-bit width of counters
  perf/x86/uncore: Handle pci_read_config_dword() errors
  perf/x86: Remove P6 cpuc->enabled check
  perf/x86: Update/fix generic events on P6 PMU
  perf/x86: Fix P6 FP_ASSIST event constraint
  perf, cpu hotplug: Use cached value of smp_processor_id()
  perf, cpu hotplug: Run CPU_STARTING notifiers with irqs disabled
  x86/perf: Fix virtualization sanity check
  perf test: Fix exclude_guest parse events tests
  perf tools: do not flush maps on COMM for perf report
  perf help: Fix --help for builtins
  perf trace: Check if sample raw_data field is set
  perf trace: Validate syscall id before growing syscall table

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"This has our series of fixes for the next rc.  The biggest batch is
  from Jan Schmidt, fixing up some problems in our subvolume quota code
  and fixing btrfs send/receive to work with the new extended inode
  refs."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: do not bug when we fail to commit the transaction
  Btrfs: fix memory leak when cloning root's node
  Btrfs: Use btrfs_update_inode_fallback when creating a snapshot
  Btrfs: Send: preserve ownership (uid and gid) also for symlinks.
  Btrfs: fix deadlock caused by the nested chunk allocation
  btrfs: Return EINVAL when length to trim is less than FSB
  Btrfs: fix memory leak in btrfs_quota_enable()
  Btrfs: send correct rdev and mode in btrfs-send
  Btrfs: extended inode refs support for send mechanism
  Btrfs: Fix wrong error handling code
  Fix a sign bug causing invalid memory access in the ino_paths ioctl.
  Btrfs: comment for loop in tree_mod_log_insert_move
  Btrfs: fix extent buffer reference for tree mod log roots
  Btrfs: determine level of old roots
  Btrfs: tree mod log's old roots could still be part of the tree
  Btrfs: fix a tree mod logging issue for root replacement operations
  Btrfs: don't put removals from push_node_left into tree mod log twice

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements from Arnaldo Carvalho de Melo:

* perf inject changes to allow showing where a task sleeps, from Andrew Vagin.

* Makefile improvements from Namhyung Kim.

* Add --pre and --post command hooks in 'stat', from Peter Zijlstra.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem

perf stat: Add --pre and --post command

In order to measure kernel builds, one has to do some pre/post cleanup
work in order to do the repeat build.

So provide --pre and --post command hooks to allow doing just that.

perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \
-- make -s -j64 O=defconfig-build/ bzImage

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Stephane Eranian <eranian@gmail.com>
Link: http://lkml.kernel.org/r/1350992414.13456.5.camel@twins
[ committer note: Added respective entries in Documentation/perf-stat.txt ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf inject: Mark a dso if it's used

Otherwise they will be not written in an output file.

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344344165-369636-5-git-send-email-avagin@openvz.org
[ committer note: Fixed up wrt changes made in the immediate previous patches ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf inject: Merge sched_stat_* and sched_switch events

You may want to know where and how long a task is sleeping. A callchain
may be found in sched_switch and a time slice in stat_iowait, so I add
handler in perf inject for merging this events.

My code saves sched_switch event for each process and when it meets
stat_iowait, it reports the sched_switch event, because this event
contains a correct callchain. By another words it replaces all
stat_iowait events on proper sched_switch events.

I use the next sequence of commands for testing:

  perf record -e sched:sched_stat_sleep -e sched:sched_switch \
      -e sched:sched_process_exit -g -o ~/perf.data.raw \
      ~/test-program
  perf inject -v -s -i ~/perf.data.raw -o ~/perf.data
  perf report --stdio -i ~/perf.data
   100.00% foo  [kernel.kallsyms]  [k] __schedule
                |
                --- __schedule
                    schedule
                   |
                   |--79.75%-- schedule_hrtimeout_range_clock
                   |          schedule_hrtimeout_range
                   |          poll_schedule_timeout
                   |          do_select
                   |          core_sys_select
                   |          sys_select
                   |          system_call_fastpath
                   |          __select
                   |          __libc_start_main
                   |
                    --20.25%-- do_nanosleep
                              hrtimer_nanosleep
                              sys_nanosleep
                              system_call_fastpath
                              __GI___libc_nanosleep
                              __libc_start_main

And here is test-program.c:

#include<unistd.h>
#include<time.h>
#include<sys/select.h>

int main()
{
struct timespec ts1;
struct timeval tv1;
int i;
long s;

for (i = 0; i <  10; i++) {
ts1.tv_sec = 0;
ts1.tv_nsec = 10000000;
nanosleep(&ts1, NULL);

tv1.tv_sec = 0;
tv1.tv_usec = 40000;
select(0, NULL, NULL, NULL,&tv1);
}
return 1;
}

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344344165-369636-4-git-send-email-avagin@openvz.org
[ committer note: Made it use evsel->handler ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf inject: Work with files

Before this patch "perf inject" can only handle data from pipe.

I want to use "perf inject" for reworking events. Look at my following patch.

v2: add information about new options in tools/perf/Documentation/

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344344165-369636-2-git-send-email-avagin@openvz.org
[ committer note: fixed it up to cope with 5852a44, 5ded57a, 002439e & f62d3f0 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Fix LIBELF_MMAP checking

Currently checking mmap support in libelf failed due to wrong flags.

    CHK libelf
    CHK libdw
    CHK libunwind
    CHK -DLIBELF_MMAP
/tmp/ccYJwdR0.o: In function `main':
:(.text+0x18): undefined reference to `elf_begin'
collect2: error: ld returned 1 exit status

This cannot happen since we checked the elf_begin() when checking
libelf and it succeeded.

Fix it by using a same flag with libelf checking.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1351241752-2919-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Always show CHK message when doing try-cc

It might be useful to see what's happening behind us rather than just
waiting few seconds during the config checking.

Also align the CHK message with other ones.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1351241752-2919-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Convert invocation of MAKE into SUBDIR

This will show directory change info in a consistent form. Also it can
be converted again into David Howell's descend command.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Borislav Petkov <bp@amd64.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1351241752-2919-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Cleanup doc related targets

Documentation targets handling rules are duplicate. Consolidate them
with DOC_TARGETS and INSTALL_DOC_TARGETS.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1351241752-2919-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

tools lib traceevent: Do not generate dependency for system header files

Ingo reported (again!) that 'make clean' on perf/traceevent does not
work due to some reason with system header file. Quotes Ingo:

"Note that the old dependency related build failure thought to be
  fixed in commit 860df5833e46 is back:

   make[1]: *** No rule to make target
   `/usr/lib/gcc/x86_64-redhat-linux/4.7.0/include/stddef.h', needed by `.trace-seq.d'.  Stop.

  'make clean' itself does not work in libtraceevent:

   comet:~/tip/tools/lib/traceevent> make clean
   make: *** No rule to make target `/usr/lib/gcc/x86_64-redhat-linux/4.7.0/include/stddef.h', needed by `.trace-seq.d'.  Stop.

  So I had to clean it out manually:

   comet:~/tip/tools/lib/traceevent> git ls-files --others | xargs rm
   comet:~/tip/tools/lib/traceevent>

  and then things build fine."

Try to fix it by excluding system headers from dependency generation.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1351241752-2919-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge tag 'mca_cfg' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull x86 RAS changes from Borislav Petkov:

"Rework all config variables used throughout the MCA code and collect
  them together into a mca_config struct. This keeps them tightly and
  neatly packed together instead of spilled all over the place.

  Then, convert those which are used as booleans into real booleans and
  save some space."

Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86, MCA: Finish mca_config conversion

mce_ser, mce_bios_cmci_threshold and mce_disabled are the last three
bools which need conversion. Move them to the mca_config struct and
adjust usage sites accordingly.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>

x86, MCA: Convert the next three variables batch

Move them into the mca_config struct and adjust code touching them
accordingly.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>

x86, MCA: Convert rip_msr, mce_bootlog, monarch_timeout

Move above configuration variables into struct mca_config and adjust
usage places accordingly.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>

x86, MCA: Convert dont_log_ce, banks and tolerant

Move those MCA configuration variables into struct mca_config and adjust
the places they're used accordingly.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>

drivers/base: Add a DEVICE_BOOL_ATTR macro

... which, analogous to DEVICE_INT_ATTR provides functionality to
set/clear bools. Its purpose is to be used where values need to be used
as booleans in configuration context.

Next patch uses this.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Tony Luck <tony.luck@intel.com>

x86: Remove dead hlt_use_halt code

The hlt_use_halt function returns always true and there is only
one definition of it.

The default_idle function can then get ride of the if ...
statement and we can remove the else branch.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linaro-dev@lists.linaro.org
Cc: patches@linaro.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1351181591-8710-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

arch/x86/Kconfig: Allow turning off CONFIG_X86_MPPARSE when either ACPI or SFI is present

MPS tables are not needed for systems that have proper ACPI
support. This is also true for systems that have SFI in place.

So this patch allows the configuration (turning off) of
CONFIG_X86_MPPARSE when either ACPI or SFI is present.

Signed-off-by: Bin Gao <bin.gao@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Bin Gao <bin.gao@linux.intel.com>
Link: http://lkml.kernel.org/r/20121025163544.GB38477@bingao-desk1.fm.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/boot/doc: Fix grammar and typo in boot.txt

Fixes some minor issues in the x86 boot documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Rob Landley <rob@landley.net>
Link: http://lkml.kernel.org/r/20121026031702.GA23828@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core trace improvements from Arnaldo Carvalho de Melo:

* Don't stop synthesizing threads when one vanishes, this is for
   the existing threads when we start a tool like trace.

* Use sched:sched_stat_runtime to provide a thread summary, this
   produces the same output as the 'trace summary' subcommand of
   tglx's original "trace" tool.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'efi-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

"Fix oops with EFI variables on mixed 32/64-bit firmware/kernels and
document EFI git repository location on kernel.org."

Conflicts:
arch/x86/include/asm/efi.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>

tilegx: fix some issues in the SW TSO support

This change correctly computes the header length and data length in
the fragments to avoid a bug where we would end up with extremely
slow performance. Also adopt use of skb_frag_size() accessor.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: stable@vger.kernel.org [v3.6]
Signed-off-by: David S. Miller <davem@davemloft.net>

qmi_wwan/cdc_ether: move Novatel 551 and E362 to qmi_wwan

These devices provide QMI and ethernet functionality via a standard CDC
ethernet descriptor. But when driven by cdc_ether, the QMI
functionality is unavailable because only cdc_ether can claim the USB
interface. Thus blacklist the devices in cdc_ether and add their IDs to
qmi_wwan, which enables both QMI and ethernet simultaneously.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: Fix memory leak on Tx data path

Driver anchors the tx urbs and defers the urb submission if
a transmit request comes when the interface is suspended.
Anchoring urb increments the urb reference count. These
deferred urbs are later accessed by calling usb_get_from_anchor()
for submission during interface resume. usb_get_from_anchor()
unanchors the urb but urb reference count remains same.
This causes the urb reference count to remain non-zero
after usb_free_urb() gets called and urb never gets freed.
Hence call usb_put_urb() after anchoring the urb to properly
balance the reference count for these deferred urbs. Also,
unanchor these deferred urbs during disconnect, to free them
up.

Signed-off-by: Hemant Kumar <hemantk@codeaurora.org>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Unmap UAR also in the case of error flow

If a failure takes place during the EQ creation, we need to unmap the
UAR memory block too.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Uri Habusha <urih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Don't use vlan tag value as an indication for vlan presence

The vlan tag can be zero. This is why it can't serve as an indication
that packet requires VLAN header in the TX flow.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix double-release-range in tx-rings

The QP range is reserved as a single block. However, when freeing the
en resources, the tx-ring QPs are released both in mlx4_en_destroy_tx_ring
(one at a time) and in mlx4_en_free_resources (as a block release).

Fix by eliminating the one-at-a-time release in mlx4_en_destroy_tx_ring.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bas_gigaset: fix pre_reset handling

The delayed work function int_in_work() may call usb_reset_device()
and thus, indirectly, the driver's pre_reset method. Trying to
cancel the work synchronously in that situation would deadlock.
Fix by avoiding cancel_work_sync() in the pre_reset method.

If the reset was NOT initiated by int_in_work() this might cause
int_in_work() to run after the post_reset method, with urb_int_in
already resubmitted, so handle that case gracefully.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "ARM: dma-mapping: support debug_dma_mapping_error"

This reverts commit 871ae57adc5ed092c1341f411514d0e8482e2611, which is
scheduled for v3.8 and accidently got into v3.7-rc series.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm radeon fixes from Dave Airlie:
"Just radeon fixes in this one:
   - some new PCI IDs
   - ATPX regression fix
   - async VM regression fixes
   - some module options fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: fix ATPX regression in acpi rework
  drm/radeon: fix ATPX function documentation
  drm/radeon: move the retry to gem_object_create
  drm/radeon: move size limits to gem_object_create.
  drm/radeon: use vzalloc for gart pages
  drm/radeon: fix and simplify pot argument checks v3
  drm/radeon: fix header size estimation in VM code
  drm/radeon: remove set_page check from VM code
  drm/radeon: fix si_set_page v2
  drm/radeon: fix cayman_vm_set_page v2
  drm/radeon: fix PFP sync in vm_flush
  drm/radeon: add error output if VM CS fails on cayman
  drm/radeon: give each backlight a unique id
  drm/radeon: fix sparse warning
  drm/radeon: add some new SI PCI ids

Merge tag 'nfs-for-3.7-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS bugfixes from Trond Myklebust:

- Fix the NFSv2/v3 kernel statd protocol, which broke due to net
   namespace related changes.

- Fix a number of races in the SUNRPC TCP disconnect/reconnect code.

* tag 'nfs-for-3.7-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
  LOCKD: fix races in nsm_client_get
  SUNRPC: Get rid of the xs_error_report socket callback
  SUNRPC: Prevent races in xs_abort_connection()
  Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."
  SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT

Merge branch 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Alex writes:
"Fixes pull request for radeon.  The main things here are
fixing a ATPX regression from the acpi rework, fixing some
fallout from the async VM work, and fixing some module options
that were broken in certain cases.  Other than that, mainly
just bug fixes."

* 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: fix ATPX regression in acpi rework
  drm/radeon: fix ATPX function documentation
  drm/radeon: move the retry to gem_object_create
  drm/radeon: move size limits to gem_object_create.
  drm/radeon: use vzalloc for gart pages
  drm/radeon: fix and simplify pot argument checks v3
  drm/radeon: fix header size estimation in VM code
  drm/radeon: remove set_page check from VM code
  drm/radeon: fix si_set_page v2
  drm/radeon: fix cayman_vm_set_page v2
  drm/radeon: fix PFP sync in vm_flush
  drm/radeon: add error output if VM CS fails on cayman
  drm/radeon: give each backlight a unique id
  drm/radeon: fix sparse warning
  drm/radeon: add some new SI PCI ids

Merge branch 'akpm' (Andrew's fixes)

Merge misc fixes from Andrew Morton:
"18 total.  15 fixes and some updates to a device_cgroup patchset which
  bring it up to date with the version which I should have merged in the
  first place."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (18 patches)
  fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check
  gen_init_cpio: avoid stack overflow when expanding
  drivers/rtc/rtc-imxdi.c: add missing spin lock initialization
  mm, numa: avoid setting zone_reclaim_mode unless a node is sufficiently distant
  pidns: limit the nesting depth of pid namespaces
  drivers/dma/dw_dmac: make driver's endianness configurable
  mm/mmu_notifier: allocate mmu_notifier in advance
  tools/testing/selftests/epoll/test_epoll.c: fix build
  UAPI: fix tools/vm/page-types.c
  mm/page_alloc.c:alloc_contig_range(): return early for err path
  rbtree: include linux/compiler.h for definition of __always_inline
  genalloc: stop crashing the system when destroying a pool
  backlight: ili9320: add missing SPI dependency
  device_cgroup: add proper checking when changing default behavior
  device_cgroup: stop using simple_strtoul()
  device_cgroup: rename deny_all to behavior
  cgroup: fix invalid rcu dereference
  mm: fix XFS oops due to dirty pages without buffers on s390

ACPI: missing break

We handle NOTIFY_THROTTLING so don't then fall through to unsupported event.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Input: wacom - add touch sensor support for Cintiq 24HD touch

Decode multitouch reports from the touch sensor of the Cintiq 24HD
touch.

Signed-off-by: Jason Gerecke <killertofu@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Input: wacom - handle split-sensor devices with internal hubs

Like our other pen-and-touch products, the Cintiq 24HD touch needs data
to be shared between its two sensors to facilitate proximity-based palm
rejection.

Unlike other tablets that report sensor data through separate interfaces
of the same USB device, the Cintiq 24HD touch has separate USB devices
that are connected to an internal USB hub.

This patch makes it possible to designate the USB VID/PID of the other
device so that the two may share data. To ensure we don't accidentally
link to a sensor from a physically separate device (if several have been
plugged in), we limit the search to siblings (i.e., devices directly
connected to the same hub).

Signed-off-by: Jason Gerecke <killertofu@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Makefile: Documentation for external tool should be correct

If one includes documentation for an external tool, it should be
correct.  This is not:

1. Overriding the input to rngd should typically be neither
   necessary nor desired.  This is especially so since newer
   versions of rngd support a number of different *types* of sources.
2. The default kernel-exported device is called /dev/hwrng not
   /dev/hwrandom nor /dev/hw_random (both of which were used in the
   past; however, kernel and udev seem to have converged on
   /dev/hwrng.)

Overall it is better if the documentation for rngd is kept with rngd
rather than in a kernel Makefile.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"A random collection of various fixes, mainly from Arnd and a few other
  people.  Not thing really stands out here."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: drop experimental status for hotplug and Thumb2
  ARM: 7560/1: SMP_TWD: use DIV_ROUND_CLOSEST() for periodic mode
  ARM: 7559/1: smp: switch away from the idmap before updating init_mm.mm_count
  ARM: 7556/1: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD
  ARM: 7555/1: kexec: fix segment memory addresses check
  ARM: warnings in arch/arm/include/asm/uaccess.h
  ARM: binfmt_flat: unused variable 'persistent'
  ARM: be really quiet when building with 'make -s'
  ARM: pass -marm to gcc by default for both C and assembler
  ARM: Xen: fix initial build problems
  ARM: export default read_current_timer
  ARM: Fix another build warning in arch/arm/mm/alignment.c
  ARM: export set_irq_flags
  ARM: kprobes: make more tests conditional