git.karo-electronics.de Git - karo-tx-linux.git/log

mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates

According to documentation on barriers, stores issued before a LOCK can
complete after the lock implying that it's possible tlb_flush_pending can
be visible after a page table update. As per revised documentation, this
patch adds a smp_mb__before_spinlock to guarantee the correct ordering.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm-fix-tlb-flush-race-between-migration-and-change_protection_range-fix

The following build error was reported by the 0-day build checker.

>> arch/arm/mm/context.c:51:18: error: 'tlb_flush_pending' redeclared as different kind of symbol
include/linux/mm_types.h:477:91: note: previous definition of 'tlb_flush_pending' was here

This patch renames tlb_flush_pending to
mm_tlb_flush_pending. This is a fix for the -mm patch
mm-fix-tlb-flush-race-between-migration-and-change_protection_range.patch

Note that when slotted into place that it will cause a conflict with
mm-numa-defer-tlb-flush-for-thp-migration-as-long-as-possible.patch . The
resolution is to delete the call from huge_memory.c and make sure the
tlb_flush_pending call in mm/migrate.c is renamed appropriately.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: fix TLB flush race between migration, and change_protection_range

There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A CPU B CPU C

load TLB entry
make entry PTE/PMD_NUMA
fault on entry
read/write old page
start migrating page
change PTE/PMD to new page
read/write old page [*]
flush TLB
reload TLB from new entry
read/write new page
lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.

This should fix both NUMA migration and compaction.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: numa: avoid unnecessary disruption of NUMA hinting during migration

do_huge_pmd_numa_page() handles the case where there is parallel THP
migration. However, by the time it is checked the NUMA hinting
information has already been disrupted. This patch adds an earlier check
with some helpers.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: numa: clear numa hinting information on mprotect

On a protection change it is no longer clear if the page should be still
accessible. This patch clears the NUMA hinting fault bits on a protection
change.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sched: numa: skip inaccessible VMAs

Inaccessible VMA should not be trapping NUMA hint faults. Skip them.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: numa: avoid unnecessary work on the failure path

If a PMD changes during a THP migration then migration aborts but the
failure path is doing more work than is necessary.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: numa: ensure anon_vma is locked to prevent parallel THP splits

The anon_vma lock prevents parallel THP splits and any associated
complexity that arises when handling splits during THP migration. This
patch checks if the lock was successfully acquired and bails from THP
migration if it failed for any reason.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: numa: do not clear PTE for pte_numa update

The TLB must be flushed if the PTE is updated but change_pte_range is
clearing the PTE while marking PTEs pte_numa without necessarily flushing
the TLB if it reinserts the same entry. Without the flush, it's
conceivable that two processors have different TLBs for the same virtual
address and at the very least it would generate spurious faults. This
patch only unmaps the pages in change_pte_range for a full protection
change.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: numa: do not clear PMD during PTE update scan

If the PMD is flushed then a parallel fault in handle_mm_fault() will
enter the pmd_none and do_huge_pmd_anonymous_page() path where it'll
attempt to insert a huge zero page. This is wasteful so the patch avoids
clearing the PMD when setting pmd_numa.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: clear pmd_numa before invalidating

On x86, PMD entries are similar to _PAGE_PROTNONE protection and are
handled as NUMA hinting faults.  The following two page table protection
bits are what defines them

_PAGE_NUMA:set _PAGE_PRESENT:clear

A PMD is considered present if any of the _PAGE_PRESENT, _PAGE_PROTNONE,
_PAGE_PSE or _PAGE_NUMA bits are set.  If pmdp_invalidate encounters a
pmd_numa, it clears the present bit leaving _PAGE_NUMA which will be
considered not present by the CPU but present by pmd_present.  The
existing caller of pmdp_invalidate should handle it but it's an
inconsistent state for a PMD.  This patch keeps the state consistent when
calling pmdp_invalidate.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: numa: call MMU notifiers on THP migration

MMU notifiers must be called on THP page migration or secondary MMUs will
get very confused.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: numa: serialise parallel get_user_page against THP migration

Base pages are unmapped and flushed from cache and TLB during normal page
migration and replaced with a migration entry that causes any parallel
NUMA hinting fault or gup to block until migration completes.  THP does
not unmap pages due to a lack of support for migration entries at a PMD
level.  This allows races with get_user_pages and get_user_pages_fast
which commit 3f926ab94 ("mm: Close races between THP migration and PMD
numa clearing") made worse by introducing a pmd_clear_flush().

This patch forces get_user_page (fast and normal) on a pmd_numa page to go
through the slow get_user_page path where it will serialise against THP
migration and properly account for the NUMA hinting fault.  On the
migration side the page table lock is taken for each PTE update.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kexec: migrate to reboot cpu

Commit 1b3a5d02ee0 ("reboot: move arch/x86 reboot= handling to generic
kernel") moved reboot= handling to generic code.  In the process it also
removed the code in native_machine_shutdown() which are moving reboot
process to reboot_cpu/cpu0.

I guess that thought must have been that all reboot paths are calling
migrate_to_reboot_cpu(), so we don't need this special handling.  But
kexec reboot path (kernel_kexec()) is not calling migrate_to_reboot_cpu()
so above change broke kexec.  Now reboot can happen on non-boot cpu and
when INIT is sent in second kerneo to bring up BP, it brings down the
machine.

So start calling migrate_to_reboot_cpu() in kexec reboot path to avoid
this problem.

Bisected by WANG Chao.

Reported-by: Matthew Whitehead <mwhitehe@redhat.com>
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Baoquan He <bhe@redhat.com>
Tested-by: WANG Chao <chaowang@redhat.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Linux 3.13-rc4

null_blk: mem garbage on NUMA systems during init

For NUMA systems, initializing the blk-mq layer and using per node hctx.
We initialize submit queues to 1, while blk-mq nr_hw_queues is
initialized to the number of NUMA nodes.

This makes the null_init_hctx function overwrite memory outside of what
it allocated. In my case it lead to writing garbage into struct
request_queue's mq_map.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

radeon_pm: fix oops in hwmon_attributes_visible() and radeon_hwmon_show_temp_thresh()

Since commit ec39f64bba34 ("drm/radeon/dpm: Convert to use
devm_hwmon_register_with_groups") radeon_hwmon_init() is using
hwmon_device_register_with_groups(), which sets `rdev' as a device
private driver_data, while hwmon_attributes_visible() and
radeon_hwmon_show_temp_thresh() are still waiting for `drm_device'.

Fix them by using dev_get_drvdata(), in order to avoid this oops:

  BUG: unable to handle kernel paging request at 0000000000001e28
  IP: [<ffffffffa02ae8b4>] hwmon_attributes_visible+0x18/0x3d [radeon]
  PGD 15057e067 PUD 151a8e067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP
  Call Trace:
    internal_create_group+0x114/0x1d9
    sysfs_create_group+0xe/0x10
    sysfs_create_groups+0x22/0x5f
    device_add+0x34f/0x501
    device_register+0x15/0x18
    hwmon_device_register_with_groups+0xb5/0xed
    radeon_hwmon_init+0x56/0x7c [radeon]
    radeon_pm_init+0x134/0x7e5 [radeon]
    radeon_modeset_init+0x75f/0x8ed [radeon]
    radeon_driver_load_kms+0xc6/0x187 [radeon]
    drm_dev_register+0xf9/0x1b4 [drm]
    drm_get_pci_dev+0x98/0x129 [drm]
    radeon_pci_probe+0xa3/0xac [radeon]
    pci_device_probe+0x6e/0xcf
    driver_probe_device+0x98/0x1c4
    __driver_attach+0x5c/0x7e
    bus_for_each_dev+0x7b/0x85
    driver_attach+0x19/0x1b
    bus_add_driver+0x104/0x1ce
    driver_register+0x89/0xc5
    __pci_register_driver+0x58/0x5b
    drm_pci_init+0x86/0xea [drm]
    radeon_init+0x97/0x1000 [radeon]
    do_one_initcall+0x7f/0x117
    load_module+0x1583/0x1bb4
    SyS_init_module+0xa0/0xaf

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Revert CHECKSUM_COMPLETE optimization in pskb_trim_rcsum(), I can't
    figure out why it breaks things.

2) Fix comparison in netfilter ipset's hash_netnet4_data_equal(), it
    was basically doing "x == x", from Dave Jones.

3) Freescale FEC driver was DMA mapping the wrong number of bytes, from
    Sebastian Siewior.

4) Blackhole and prohibit routes in ipv6 were not doing the right thing
    because their ->input and ->output methods were not being assigned
    correctly.  Now they behave properly like their ipv4 counterparts.
    From Kamala R.

5) Several drivers advertise the NETIF_F_FRAGLIST capability, but
    really do not support this feature and will send garbage packets if
    fed fraglist SKBs.  From Eric Dumazet.

6) Fix long standing user triggerable BUG_ON over loopback in RDS
    protocol stack, from Venkat Venkatsubra.

7) Several not so common code paths can potentially try to invoke
    packet scheduler actions that might be NULL without checking.  Shore
    things up by either 1) defining a method as mandatory and erroring
    on registration if that method is NULL 2) defininig a method as
    optional and the registration function hooks up a default
    implementation when NULL is seen.  From Jamal Hadi Salim.

8) Fix fragment detection in xen-natback driver, from Paul Durrant.

9) Kill dangling enter_memory_pressure method in cg_proto ops, from
    Eric W Biederman.

10) SKBs that traverse namespaces should have their local_df cleared,
    from Hannes Frederic Sowa.

11) IOCB file position is not being updated by macvtap_aio_read() and
    tun_chr_aio_read().  From Zhi Yong Wu.

12) Don't free virtio_net netdev before releasing all of the NAPI
    instances.  From Andrey Vagin.

13) Procfs entry leak in xt_hashlimit, from Sergey Popovich.

14) IPv6 routes that are no cached routes should not count against the
    garbage collection limits.  We had this almost right, but were
    missing handling addrconf generated routes properly.  From Hannes
    Frederic Sowa.

15) fib{4,6}_rule_suppress() have to consider potentially seeing NULL
    route info when they are called, from Stefan Tomanek.

16) TUN and MACVTAP have had truncated packet signalling for some time,
    fix from Jason Wang.

17) Fix use after frrr in __udp4_lib_rcv(), from Eric Dumazet.

18) xen-netback does not interpret the NAPI budget properly for TX work,
    fix from Paul Durrant.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
  igb: Fix for issue where values could be too high for udelay function.
  i40e: fix null dereference
  xen-netback: fix gso_prefix check
  net: make neigh_priv_len in struct net_device 16bit instead of 8bit
  drivers: net: cpsw: fix for cpsw crash when build as modules
  xen-netback: napi: don't prematurely request a tx event
  xen-netback: napi: fix abuse of budget
  sch_tbf: use do_div() for 64-bit divide
  udp: ipv4: must add synchronization in udp_sk_rx_dst_set()
  net:fec: remove duplicate lines in comment about errata ERR006358
  Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature"
  8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature
  xen-netback: make sure skb linear area covers checksum field
  net: smc91x: Fix device tree based configuration so it's usable
  udp: ipv4: fix potential use after free in udp_v4_early_demux()
  macvtap: signal truncated packets
  tun: unbreak truncated packet signalling
  net: sched: htb: fix the calculation of quantum
  net: sched: tbf: fix the calculation of max_size
  micrel: add support for KSZ8041RNLI
  ...

Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"This is a pretty small batch:

  The biggest single change is to stop using EFI time services on 32-bit
  platforms.  This matches our current behavior on 64-bit platforms as
  we already had ruled them out there as being too unreliable.  Turns
  out that affects 32-bit platforms, too.

  One NULL pointer fix for SGI UV.

  Two minor build fixes, one of which only affects icc and the other
  which affects icc and future versions or nonstandard default settings
  of gcc"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Don't use (U)EFI time services on 32 bit
  x86, build, icc: Remove uninitialized_var() from compiler-intel.h
  x86/UV: Fix NULL pointer dereference in uv_flush_tlb_others() if the 'nobau' boot option is used
  x86, build: Pass in additional -mno-mmx, -mno-sse options

Merge tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
"PCI device hotplug
    - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael
      Wysocki)

  Host bridge drivers
    - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn
      Helgaas)
    - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
      (Jason Gunthorpe)

  Miscellaneous
    - Avoid unnecessary CPU switch when calling .probe() (Alexander
      Duyck)
    - Revert "workqueue: allow work_on_cpu() to be called recursively"
      (Bjorn Helgaas)
    - Disable Bus Master only on kexec reboot (Khalid Aziz)
    - Omit PCI ID macro strings to shorten quirk names for LTO (Michal
      Marek)"

* tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers
  PCI: Disable Bus Master only on kexec reboot
  PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
  PCI: Omit PCI ID macro strings to shorten quirk names
  PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
  Revert "workqueue: allow work_on_cpu() to be called recursively"
  PCI: Avoid unnecessary CPU switch when calling driver .probe() method

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull SELinux fixes from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()
  selinux: look for IPsec labels on both inbound and outbound packets
  selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()
  selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()
  selinux: fix possible memory leak

Revert "selinux: consider filesystem subtype in policies"

This reverts commit 102aefdda4d8275ce7d7100bc16c88c74272b260.

Tom London reports that it causes sync() to hang on Fedora rawhide:

https://bugzilla.redhat.com/show_bug.cgi?id=1033965

and Josh Boyer bisected it down to this commit. Reverting the commit in
the rawhide kernel fixes the problem.

Eric Paris root-caused it to incorrect subtype matching in that commit
breaking fuse, and has a tentative patch, but by now we're better off
retrying this in 3.14 rather than playing with it any more.

Reported-by: Tom London <selinux@gmail.com>
Bisected-by: Josh Boyer <jwboyer@fedoraproject.org>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Anand Avati <avati@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

igb: Fix for issue where values could be too high for udelay function.

This patch changes the igb_phy_has_link function to check the value of the
parameter before deciding to use udelay or mdelay in order to be sure that
the value is not too high for udelay function.

CC: stable kernel <stable@vger.kernel.org> # 3.9+
Signed-off-by: Sunil K Pandey <sunil.k.pandey@intel.com>
Signed-off-by: Kevin B Smith <kevin.b.smith@intel.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: fix null dereference

If the vsi->tx_rings structure is NULL we don't want to panic.

Change-Id: Ic694f043701738c434e8ebe0caf0673f4410dc10
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'edac_fixes_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC fix from Borislav Petkov:
"Silence a compiler warning in sb_edac"

* tag 'edac_fixes_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
sb_edac: Shut up compiler warning when EDAC_DEBUG is enabled

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"This resolves some further issues with the dma mask changes on ARM
  which have been found by TI and others, and also some corner cases
  with the updates to the virtual to physical address translations.

  Konstantin also found some problems with the unwinder, which now
  performs tighter verification that the stack is valid while unwinding"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: fix asm/memory.h build error
  ARM: 7917/1: cacheflush: correctly limit range of memory region being flushed
  ARM: 7913/1: fix framepointer check in unwind_frame
  ARM: 7912/1: check stack pointer in get_wchan
  ARM: 7909/1: mm: Call setup_dma_zone() post early_paging_init()
  ARM: 7908/1: mm: Fix the arm_dma_limit calculation
  ARM: another fix for the DMA mapping checks

Merge tag 'arc-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
"These are couple of weeks old already, but I just couldn't get them to
  you earlier.

   - couple of fixes for recently added perf code
   - build time extable sort"

* tag 'arc-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [perf] Fix a few thinkos
  ARC: Add guard macro to uapi/asm/unistd.h
  ARC: extable: Enable sorting at build time

Merge tag 'dm-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
"A set of device-mapper fixes for 3.13.

  A fix for possible memory corruption during DM table load, fix a
  possible leak of snapshot space in case of a crash, fix a possible
  deadlock due to a shared workqueue in the delay target, fix to
  initialize read-only module parameters that are used to export metrics
  for dm stats and dm bufio.

  Quite a few stable fixes were identified for both the thin-
  provisioning and caching targets as a result of increased regression
  testing using the device-mapper-test-suite (dmts).  The most notable
  of these are the reference counting fixes for the space map btree that
  is used by the dm-array interface -- without these the dm-cache
  metadata will leak, resulting in dm-cache devices running out of
  metadata blocks.  Also, some important fixes related to the
  thin-provisioning target's transition to read-only mode on error"

* tag 'dm-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm array: fix a reference counting bug in shadow_ablock
  dm space map: disallow decrementing a reference count below zero
  dm stats: initialize read-only module parameter
  dm bufio: initialize read-only module parameters
  dm cache: actually resize cache
  dm cache: update Documentation for invalidate_cblocks's range syntax
  dm cache policy mq: fix promotions to occur as expected
  dm thin: allow pool in read-only mode to transition to read-write mode
  dm thin: re-establish read-only state when switching to fail mode
  dm thin: always fallback the pool mode if commit fails
  dm thin: switch to read-only mode if metadata space is exhausted
  dm thin: switch to read only mode if a mapping insert fails
  dm space map metadata: return on failure in sm_metadata_new_block
  dm table: fail dm_table_create on dm_round_up overflow
  dm snapshot: avoid snapshot space leak on crash
  dm delay: fix a possible deadlock due to shared workqueue

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:

- Genius Gx Imperator Keyboard regression fix (missing break in case),
   by Ben Hutchings

- duplicate sysfs entry error fix for hid-sensor-hub driver, by
   Srinivas Pandruvada

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: hid-sensor-hub: fix duplicate sysfs entry error
  HID: kye: Fix missing break in kye_report_fixup()

ARM: fix asm/memory.h build error

Jason Gunthorpe reports a build failure when ARM_PATCH_PHYS_VIRT is
not defined:

In file included from arch/arm/include/asm/page.h:163:0,
                 from include/linux/mm_types.h:16,
                 from include/linux/sched.h:24,
                 from arch/arm/kernel/asm-offsets.c:13:
arch/arm/include/asm/memory.h: In function '__virt_to_phys':
arch/arm/include/asm/memory.h:244:40: error: 'PHYS_OFFSET' undeclared (first use in this function)
arch/arm/include/asm/memory.h:244:40: note: each undeclared identifier is reported only once for each function it appears in
arch/arm/include/asm/memory.h: In function '__phys_to_virt':
arch/arm/include/asm/memory.h:249:13: error: 'PHYS_OFFSET' undeclared (first use in this function)

Fixes: ca5a45c06cd4 ("ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions")
Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Merge tag 'regulator-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A small set of driver fixes plus one larger core change which changes
  the way we check to see if we're using DT so that there aren't any
  races between deciding we're using DT and the regulator subsystem
  noticing.

  This makes the new support for substituting a dummy regulator and
  optional regulators work a lot better on DT systems since it ensures
  that we don't trigger probe deferral when we shouldn't which was
  causing bugs in clients"

* tag 'regulator-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: pfuze100: allow misprogrammed ID
  regulator: pfuze100: Fix address of FABID
  regulator: as3722: set the correct current limit
  regulator: core: Check for DT every time we check full constraints
  regulator: core: Replace checks of have_full_constraints with a function

Merge tag 'regmap-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
"Two small changes to fix some error handling and checking (both of
  which would be quite serious if the errors trigger) plus a trivial
  documentation fix"

* tag 'regmap-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: use IS_ERR() to check clk_get() results
  regmap: make sure we unlock on failure in regmap_bulk_write
  regmap: trivial comment fix (copy'n'paste error)

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Here are two simple but wanted fixes for the i2c subsystem"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: Check the return value from clk_prepare_enable()
i2c: mux: Inherit retry count and timeout from parent for muxed bus

Merge tag 'for-linus-20131212' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:
"Two MTD fixes, for the pxa3xx-nand driver:

   - This driver was not ready to fully Armada 370 NAND, with
     particularly notable problems seen on flash with 2KB page sizes.
     This "compatible" entry really should have been held back until
     3.14 or later.

   - Fix a bug seen in rare cases on the error path of a failed probe
     attempt, where we free unallocated DMA resources"

* tag 'for-linus-20131212' of git://git.infradead.org/linux-mtd:
  mtd: nand: pxa3xx: Use info->use_dma to release DMA resources
  Partially revert "mtd: nand: pxa3xx: Introduce 'marvell,armada370-nand' compatible string"

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave-dmaengine fixes from Vinod Koul:
"Here is the common fixes PULL for dmaengine.

  Dan has been working on fixing the build issues in bunch of drivers.
  Here we have one fixing s3c24xx-dma, along with fix from Russell on
  pl08x.  Also we have Kuninori rcar dma fixes.  The s3c24xx-dma which
  was added in last merge window missed updates to usage of DMA_COMPLETE
  so converting the last driver"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dma: fix build breakage in s3c24xx-dma
  Fix pl08x warnings
  rcar-hpbdma: initialise plane information when halted
  rcar-hpbdma: fixup channel busy check for double plane
  rcar-hpbdma: add max transfer size
  dma: mmp_pdma: add missing platform_set_drvdata() in mmp_pdma_probe()
  dmaengine: s3c24xx-dma: use DMA_COMPLETE for dma completion status

dm array: fix a reference counting bug in shadow_ablock

An old array block could have its reference count decremented below
zero when it is being replaced in the btree by a new array block.

The fix is to increment the old ablock's reference count just before
inserting a new ablock into the btree.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+

dm space map: disallow decrementing a reference count below zero

The old behaviour, returning -EINVAL if a ref_count of 0 would be
decremented, was removed in commit f722063 ("dm space map: optimise
sm_ll_dec and sm_ll_inc"). To fix this regression we return an error
code from the mutator function pointer passed to sm_ll_mutate() and have
dec_ref_count() return -EINVAL if the old ref_count is 0.

Add a DMERR to reflect the potential seriousness of this error.

Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path.

With this fix the following dmts regression test now passes:
dmtest run --suite cache -n /metadata_use_kernel/

The next patch fixes the higher-level dm-array code that exposed this
regression.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.12+

Merge remote-tracking branch 'regulator/topic/constraints' into regulator-linus

Merge branch 'master' of git://git.infradead.org/users/pcmoore/selinux_fixes into for-linus

Merge branch 'akpm' (fixes from Andrew)

Merge patches from Andrew Morton:
  "13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: memcg: do not allow task about to OOM kill to bypass the limit
  mm: memcg: fix race condition between memcg teardown and swapin
  thp: move preallocated PTE page table on move_huge_pmd()
  mfd/rtc: s5m: fix register updating by adding regmap for RTC
  rtc: s5m: enable IRQ wake during suspend
  rtc: s5m: limit endless loop waiting for register update
  rtc: s5m: fix unsuccesful IRQ request during probe
  drivers/rtc/rtc-s5m.c: fix info->rtc assignment
  include/linux/kernel.h: make might_fault() a nop for !MMU
  drivers/rtc/rtc-at91rm9200.c: correct alarm over day/month wrap
  procfs: also fix proc_reg_get_unmapped_area() for !MMU case
  mm: memcg: do not declare OOM from __GFP_NOFAIL allocations
  include/linux/hugetlb.h: make isolate_huge_page() an inline

mm: memcg: do not allow task about to OOM kill to bypass the limit

Commit 4942642080ea ("mm: memcg: handle non-error OOM situations more
gracefully") allowed tasks that already entered a memcg OOM condition to
bypass the memcg limit on subsequent allocation attempts hoping this
would expedite finishing the page fault and executing the kill.

David Rientjes is worried that this breaks memcg isolation guarantees
and since there is no evidence that the bypass actually speeds up fault
processing just change it so that these subsequent charge attempts fail
outright. The notable exception being __GFP_NOFAIL charges which are
required to bypass the limit regardless.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-bt: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: memcg: fix race condition between memcg teardown and swapin

There is a race condition between a memcg being torn down and a swapin
triggered from a different memcg of a page that was recorded to belong
to the exiting memcg on swapout (with CONFIG_MEMCG_SWAP extension).  The
result is unreclaimable pages pointing to dead memcgs, which can lead to
anything from endless loops in later memcg teardown (the page is charged
to all hierarchical parents but is not on any LRU list) or crashes from
following the dangling memcg pointer.

Memcgs with tasks in them can not be torn down and usually charges don't
show up in memcgs without tasks.  Swapin with the CONFIG_MEMCG_SWAP
extension is the notable exception because it charges the cgroup that
was recorded as owner during swapout, which may be empty and in the
process of being torn down when a task in another memcg triggers the
swapin:

  teardown:                 swapin:

                            lookup_swap_cgroup_id()
                            rcu_read_lock()
                            mem_cgroup_lookup()
                            css_tryget()
                            rcu_read_unlock()
  disable css_tryget()
  call_rcu()
    offline_css()
      reparent_charges()
                            res_counter_charge() (hierarchical!)
                            css_put()
                              css_free()
                            pc->mem_cgroup = dead memcg
                            add page to dead lru

Add a final reparenting step into css_free() to make sure any such raced
charges are moved out of the memcg before it's finally freed.

In the longer term it would be cleaner to have the css_tryget() and the
res_counter charge under the same RCU lock section so that the charge
reparenting is deferred until the last charge whose tryget succeeded is
visible.  But this will require more invasive changes that will be
harder to evaluate and backport into stable, so better defer them to a
separate change set.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: move preallocated PTE page table on move_huge_pmd()

Andrey Wagin reported crash on VM_BUG_ON() in pgtable_pmd_page_dtor() with
fallowing backtrace:

  free_pgd_range+0x2bf/0x410
  free_pgtables+0xce/0x120
  unmap_region+0xe0/0x120
  do_munmap+0x249/0x360
  move_vma+0x144/0x270
  SyS_mremap+0x3b9/0x510
  system_call_fastpath+0x16/0x1b

The crash can be reproduce with this test case:

  #define _GNU_SOURCE
  #include <sys/mman.h>
  #include <stdio.h>
  #include <unistd.h>

  #define MB (1024 * 1024UL)
  #define GB (1024 * MB)

  int main(int argc, char **argv)
  {
char *p;
int i;

p = mmap((void *) GB, 10 * MB, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
for (i = 0; i < 10 * MB; i += 4096)
p[i] = 1;
mremap(p, 10 * MB, 10 * MB, MREMAP_FIXED | MREMAP_MAYMOVE, 2 * GB);
return 0;
  }

Due to split PMD lock, we now store preallocated PTE tables for THP
pages per-PMD table.  It means we need to move them to other PMD table
if huge PMD moved there.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrey Vagin <avagin@openvz.org>
Tested-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mfd/rtc: s5m: fix register updating by adding regmap for RTC

Rename old regmap field of "struct sec_pmic_dev" to "regmap_pmic" and
add new regmap for RTC.

On S5M8767A registers were not properly updated and read due to usage of
the same regmap as the PMIC. This could be observed in various hangs,
e.g. in infinite loop during waiting for UDR field change.

On this chip family the RTC has different I2C address than PMIC so
additional regmap is needed.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtc: s5m: enable IRQ wake during suspend

Add PM suspend/resume ops to rtc-s5m driver and enable IRQ wake during
suspend so the RTC would act like a wake up source. This allows waking
up from suspend to RAM on RTC alarm interrupt.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtc: s5m: limit endless loop waiting for register update

After setting alarm or time the driver is waiting for UDR register to be
cleared indicating that registers data have been transferred.

Limit the endless loop to only 5 retries.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtc: s5m: fix unsuccesful IRQ request during probe

Probe failed for rtc-s5m:

s5m-rtc s5m-rtc: Failed to request alarm IRQ: 12: -22
s5m-rtc: probe of s5m-rtc failed with error -22

Fix rtc-s5m interrupt request by using regmap_irq_get_virq() for mapping
the IRQ.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-s5m.c: fix info->rtc assignment

Fix this warning:

drivers/rtc/rtc-s5m.c: In function `s5m_rtc_probe':
drivers/rtc/rtc-s5m.c:545: warning: assignment from incompatible pointer type

struct s5m_rtc_info.rtc has type "struct regmap *", while
struct sec_pmic_dev.rtc has type "struct i2c_client *".

Probably the author wanted to assign "struct sec_pmic_dev.regmap", which
has the correct type.

Also, as "rtc" doesn't make much sense as a name for a regmap, rename it
to "regmap".

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Sangbeom Kim <sbkim73@samsung.com>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include/linux/kernel.h: make might_fault() a nop for !MMU

The machine cannot fault if !MUU, so make might_fault() a nop for !MMU.

This fixes below build error if
!CONFIG_MMU && (CONFIG_PROVE_LOCKING=y || CONFIG_DEBUG_ATOMIC_SLEEP=y):

  arch/arm/kernel/built-in.o: In function `arch_ptrace':
  arch/arm/kernel/ptrace.c:852: undefined reference to `might_fault'
  arch/arm/kernel/built-in.o: In function `restore_sigframe':
  arch/arm/kernel/signal.c:173: undefined reference to `might_fault'
  ...
  arch/arm/kernel/built-in.o:arch/arm/kernel/signal.c:177: more undefined references to `might_fault' follow
  make: *** [vmlinux] Error 1

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-at91rm9200.c: correct alarm over day/month wrap

Update month and day of month to the alarm month/day instead of current
day/month when setting the RTC alarm mask.

Signed-off-by: Linus Pizunski <linus@narrativeteam.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

procfs: also fix proc_reg_get_unmapped_area() for !MMU case

Commit fad1a86e25e0 ("procfs: call default get_unmapped_area on
MMU-present architectures"), as its title says, took care of only the
MMU case, leaving the !MMU side still in the regressed state (returning
-EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).

From the fad1a86e25e0 changelog:

"Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
  proc_reg_get_unmapped_area in proc_reg_file_ops and
  proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
  get_unmapped_area method is not defined for the target procfs file, which
  causes regression of mmap on /proc/vmcore.

  To address this issue, like get_unmapped_area(), call default
  current->mm->get_unmapped_area on MMU-present architectures if
  pde->proc_fops->get_unmapped_area, i.e.  the one in actual file operation
  in the procfs file, is not defined"

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> [3.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: memcg: do not declare OOM from __GFP_NOFAIL allocations

Commit 84235de394d9 ("fs: buffer: move allocation failure loop into the
allocator") started recognizing __GFP_NOFAIL in memory cgroups but
forgot to disable the OOM killer.

Any task that does not fail allocation will also not enter the OOM
completion path. So don't declare an OOM state in this case or it'll be
leaked and the task be able to bypass the limit until the next
userspace-triggered page fault cleans up the OOM state.

Reported-by: William Dauchy <wdauchy@gmail.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include/linux/hugetlb.h: make isolate_huge_page() an inline

With CONFIG_HUGETLBFS=n:

  mm/migrate.c: In function `do_move_page_to_node_array':
  include/linux/hugetlb.h:140:33: warning: statement with no effect [-Wunused-value]
   #define isolate_huge_page(p, l) false
                                   ^
  mm/migrate.c:1170:4: note: in expansion of macro `isolate_huge_page'
      isolate_huge_page(page, &pagelist);

Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Four security fixes for KVM on x86.  Thanks to Andrew Honig and Lars
  Bull from Google for reporting them"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
  KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)
  KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)
  KVM: Improve create VCPU parameter (CVE-2013-4587)

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Another week, another batch of fixes.

  Again, OMAP regressions due to move to DT is the bulk of the changes
  here, but this should be the last of it for 3.13.  There are also a
  handful of OMAP hwmod changes (power management, reset handling) for
  USB on OMAP3 that fixes some longish-standing bugs around USB resets.

  There are a couple of other changes that also add up line count a bit:
  One is a long-standing bug with the keyboard layout on one of the PXA
  platforms.  The other is a fix for highbank that moves their
  power-off/reset button handling to be done in-kernel since relying on
  userspace to handle it was fragile and awkward"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: sun6i: dt: Fix interrupt trigger types
  ARM: sun7i: dt: Fix interrupt trigger types
  MAINTAINERS: merge IMX6 entry into IMX
  ARM: tegra: add missing break to fuse initialization code
  ARM: pxa: prevent PXA270 occasional reboot freezes
  ARM: pxa: tosa: fix keys mapping
  ARM: OMAP2+: omap_device: add fail hook for runtime_pm when bad data is detected
  ARM: OMAP2+: hwmod: Fix usage of invalid iclk / oclk when clock node is not present
  ARM: OMAP3: hwmod data: Don't prevent RESET of USB Host module
  ARM: OMAP2+: hwmod: Fix SOFTRESET logic
  ARM: OMAP4+: hwmod data: Don't prevent RESET of USB Host module
  ARM: dts: Fix booting for secure omaps
  ARM: OMAP2+: Fix the machine entry for am3517
  ARM: dts: Fix missing entries for am3517
  ARM: OMAP2+: Fix overwriting hwmod data with data from device tree
  ARM: davinci: Fix McASP mem resource names
  ARM: highbank: handle soft poweroff and reset key events
  ARM: davinci: fix number of resources passed to davinci_gpio_register()
  gpio: davinci: fix check for unbanked gpio

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"This is a small collection of fixes.  It was rebased this morning, but
  I was just fixing signed-off-by tags with the wrong email"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix access_ok() check in btrfs_ioctl_send()
  Btrfs: make sure we cleanup all reloc roots if error happens
  Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation
  Btrfs: fix an oops when doing balance relocation
  Btrfs: don't miss skinny extent items on delayed ref head contention
  btrfs: call mnt_drop_write after interrupted subvol deletion
  Btrfs: don't clear the default compression type

Merge branch 'for-3.13' of git://linux-nfs.org/~bfields/linux

Pull nfsd reply cache bugfix from Bruce Fields:
"One bugfix for nfsd crashes"

* 'for-3.13' of git://linux-nfs.org/~bfields/linux:
nfsd: when reusing an existing repcache entry, unhash it first

mtd: nand: pxa3xx: Use info->use_dma to release DMA resources

In commit:

  commit 62e8b851783138a11da63285be0fbf69530ff73d
  Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
  Date:   Fri Oct 4 15:30:38 2013 -0300

  mtd: nand: pxa3xx: Allocate data buffer on detected flash size

the way the buffer is allocated was changed: the first READ_ID is issued
with a small kmalloc'ed buffer. Only once the flash page size is detected
the DMA buffers are allocated, and info->use_dma is set.

Currently, if the device detection fails, the driver checks the 'use_dma'
module parameter and tries to release unallocated DMA resources.

Fix this by checking the proper indicator of the DMA allocation, which
is 'info->use_dma'.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>

Partially revert "mtd: nand: pxa3xx: Introduce 'marvell,armada370-nand' compatible string"

This partially reverts c0f3b8643a6fa2461d70760ec49d21d2b031d611.

The "armada370-nand" compatible support is not complete, and it was mistake
to add it. Revert it and postpone the support until the infrastructure is
in place.

Cc: <stable@vger.kernel.org> # 3.12
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>

selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()

Due to difficulty in arriving at the proper security label for
TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets
while/before they are undergoing XFRM transforms instead of waiting
until afterwards so that we can determine the correct security label.

Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

selinux: look for IPsec labels on both inbound and outbound packets

Previously selinux_skb_peerlbl_sid() would only check for labeled
IPsec security labels on inbound packets, this patch enables it to
check both inbound and outbound traffic for labeled IPsec security
labels.

Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()

In selinux_ip_postroute() we perform access checks based on the
packet's security label.  For locally generated traffic we get the
packet's security label from the associated socket; this works in all
cases except for TCP SYN-ACK packets.  In the case of SYN-ACK packet's
the correct security label is stored in the connection's request_sock,
not the server's socket.  Unfortunately, at the point in time when
selinux_ip_postroute() is called we can't query the request_sock
directly, we need to recreate the label using the same logic that
originally labeled the associated request_sock.

See the inline comments for more explanation.

Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()

In selinux_ip_output() we always label packets based on the parent
socket. While this approach works in almost all cases, it doesn't
work in the case of TCP SYN-ACK packets when the correct label is not
the label of the parent socket, but rather the label of the larval
socket represented by the request_sock struct.

Unfortunately, since the request_sock isn't queued on the parent
socket until *after* the SYN-ACK packet is sent, we can't lookup the
request_sock to determine the correct label for the packet; at this
point in time the best we can do is simply pass/NF_ACCEPT the packet.
It must be said that simply passing the packet without any explicit
labeling action, while far from ideal, is not terrible as the SYN-ACK
packet will inherit any IP option based labeling from the initial
connection request so the label *should* be correct and all our
access controls remain in place so we shouldn't have to worry about
information leaks.

Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

i2c: imx: Check the return value from clk_prepare_enable()

clk_prepare_enable() may fail, so let's check its return value and propagate it
in the case of error.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

A guest can cause a BUG_ON() leading to a host kernel crash.
When the guest writes to the ICR to request an IPI, while in x2apic
mode the following things happen, the destination is read from
ICR2, which is a register that the guest can control.

kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
cluster id. A BUG_ON is triggered, which is a protection against
accessing map->logical_map with an out-of-bounds access and manages
to avoid that anything really unsafe occurs.

The logic in the code is correct from real HW point of view. The problem
is that KVM supports only one cluster with ID 0 in clustered mode, but
the code that has the bug does not take this into account.

Reported-by: Lars Bull <larsbull@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)

In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
potential to corrupt kernel memory if userspace provides an address that
is at the end of a page.  This patches concerts those functions to use
kvm_write_guest_cached and kvm_read_guest_cached.  It also checks the
vapic_address specified by userspace during ioctl processing and returns
an error to userspace if the address is not a valid GPA.

This is generally not guest triggerable, because the required write is
done by firmware that runs before the guest.  Also, it only affects AMD
processors and oldish Intel that do not have the FlexPriority feature
(unless you disable FlexPriority, of course; then newer processors are
also affected).

Fixes: b93463aa59d6 ('KVM: Accelerated apic support')
Reported-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)

Under guest controllable circumstances apic_get_tmcct will execute a
divide by zero and cause a crash. If the guest cpuid support
tsc deadline timers and performs the following sequence of requests
the host will crash.
- Set the mode to periodic
- Set the TMICT to 0
- Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline)
- Set the TMICT to non-zero.
Then the lapic_timer.period will be 0, but the TMICT will not be. If the
guest then reads from the TMCCT then the host will perform a divide by 0.

This patch ensures that if the lapic_timer.period is 0, then the division
does not occur.

Reported-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Improve create VCPU parameter (CVE-2013-4587)

In multiple functions the vcpu_id is used as an offset into a bitfield.  Ag
malicious user could specify a vcpu_id greater than 255 in order to set or
clear bits in kernel memory.  This could be used to elevate priveges in the
kernel.  This patch verifies that the vcpu_id provided is less than 255.
The api documentation already specifies that the vcpu_id must be less than
max_vcpus, but this is currently not checked.

Reported-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i2c: mux: Inherit retry count and timeout from parent for muxed bus

If a muxed i2c bus gets created the default retry count and
timeout of the muxed bus is zero. Hence it it possible that you
end up with a situation where the parent controller sets a default
retry count and timeout which gets applied and used while the muxed
bus (using the same controller) has a default retry count of zero
and a default timeout of 1s (set in i2c_add_adapter()). This can be
solved by initializing the retry count and timeout of the muxed
bus with the values used by the the parent at creation time.

Signed-off-by: Elie De Brauwer <eliedebrauwer@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

Merge tag 'sound-3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Still a slightly high amount of changes than wished, but they are all
  good regression and/or device-specific fixes.  Majority of commits are
  for HD-audio, an HDMI ctl index fix that hits old graphics boards,
  regression fixes for AD codecs and a few quirks.

  Other than that, two major fixes are included: a 64bit ABI fix for
  compress offload, and 64bit dma_addr_t truncation fix, which had hit
  on PAE kernels"

* tag 'sound-3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add static DAC/pin mapping for AD1986A codec
  ALSA: hda - One more Dell headset detection quirk
  ALSA: hda - hdmi: Fix IEC958 ctl indexes for some simple HDMI devices
  ALSA: hda - Mute all aamix inputs as default
  ALSA: compress: Fix 64bit ABI incompatibility
  ALSA: memalloc.h - fix wrong truncation of dma_addr_t
  ALSA: hda - Another Dell headset detection quirk
  ALSA: hda - A Dell headset detection quirk
  ALSA: hda - Remove quirk for Dell Vostro 131
  ALSA: usb-audio: fix uninitialized variable compile warning
  ALSA: hda - fix mic issues on Acer Aspire E-572

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"A fix for recent sysfs breakage in serio subsystem plus a fixup to
  adxl34x driver"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: adxl34x - Fix bug in definition of ADXL346_2D_ORIENT
  Input: serio - fix sysfs layout

xen-netback: fix gso_prefix check

There is a mistake in checking the gso_prefix mask when passing large
packets to a guest. The wrong shift is applied to the bit - the raw skb
gso type is used rather then the translated one. This leads to large packets
being handed to the guest without the GSO metadata. This patch fixes the
check.

The mistake manifested as errors whilst running Microsoft HCK large packet
offload tests between a pair of Windows 8 VMs. I have verified this patch
fixes those errors.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make neigh_priv_len in struct net_device 16bit instead of 8bit

neigh_priv_len is defined as u8. With all debug enabled struct
ipoib_neigh has 200 bytes. The largest part is sk_buff_head with 96
bytes and here the spinlock with 72 bytes.
The size value still fits in this u8 leaving some room for more.

On -RT struct ipoib_neigh put on weight and has 392 bytes. The main
reason is sk_buff_head with 288 and the fatty here is spinlock with 192
bytes. This does no longer fit into into neigh_priv_len and gcc
complains.

This patch changes neigh_priv_len from being 8bit to 16bit. Since the
following element (dev_id) is 16bit followed by a spinlock which is
aligned, the struct remains with a total size of 3200 (allmodconfig) /
2048 (with as much debug off as possible) bytes on x86-64.
On x86-32 the struct is 1856 (allmodconfig) / 1216 (with as much debug
off as possible) bytes long. The numbers were gained with and without
the patch to prove that this change does not increase the size of the
struct.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
"A dvb core deadlock fix, a couple videobuf2 fixes an a series of media
  driver fixes"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (30 commits)
  [media] videobuf2-dma-sg: fix possible memory leak
  [media] vb2: regression fix: always set length field.
  [media] mt9p031: Include linux/of.h header
  [media] rtl2830: add parent for I2C adapter
  [media] media: marvell-ccic: use devm to release clk
  [media] ths7303: Declare as static a private function
  [media] em28xx-video: Swap release order to avoid lock nesting
  [media] usbtv: Add support for PAL video source
  [media] media_tree: Fix spelling errors
  [media] videobuf2: Add support for file access mode flags for DMABUF exporting
  [media] radio-shark2: Mark shark_resume_leds() inline to kill compiler warning
  [media] radio-shark: Mark shark_resume_leds() inline to kill compiler warning
  [media] af9035: unlock on error in af9035_i2c_master_xfer()
  [media] af9033: fix broken I2C
  [media] v4l: omap3isp: Don't check for missing get_fmt op on remote subdev
  [media] af9035: fix broken I2C and USB I/O
  [media] wm8775: fix broken audio routing
  [media] marvell-ccic: drop resource free in driver remove
  [media] tef6862/radio-tea5764: actually assign clamp result
  [media] cx231xx: use after free on error path in probe
  ...

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
"Fix HIH-6130 driver to work with BeagleBone"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: HIH-6130: Support I2C bus drivers without I2C_FUNC_SMBUS_QUICK

Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull hwmon fixes from Jean Delvare.

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  hwmon: Prevent some divide by zeros in FAN_TO_REG()
  hwmon: (w83l768ng) Fix fan speed control range
  hwmon: (w83l786ng) Fix fan speed control mode setting and reporting
  hwmon: (lm90) Unregister hwmon device if interrupt setup fails

drivers: net: cpsw: fix for cpsw crash when build as modules

When CPSW and Davinci MDIO are build as modules, CPSW crashes when
accessing CPSW registers in CPSW probe. The same is working in built-in
as the CPSW clocks are enabled in Davindi MDIO probe, SO Enabling the
clocks before accessing the version register and moving out the other
register access to cpsw device open.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

word-at-a-time: provide generic big-endian zero_bytemask implementation

Whilst architectures may be able to do better than this (which they can,
by simply defining their own macro), this is a generic stab at a
zero_bytemask implementation for the asm-generic, big-endian
word-at-a-time implementation.

On arm64, a clz instruction is used to implement the fls efficiently.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dcache: allow word-at-a-time name hashing with big-endian CPUs

When explicitly hashing the end of a string with the word-at-a-time
interface, we have to be careful which end of the word we pick up.

On big-endian CPUs, the upper-bits will contain the data we're after, so
ensure we generate our masks accordingly (and avoid hashing whatever
random junk may have been sitting after the string).

This patch adds a new dcache helper, bytemask_from_count, which creates
a mask appropriate for the CPU endianness.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xen-netback: napi: don't prematurely request a tx event

This patch changes the RING_FINAL_CHECK_FOR_REQUESTS in
xenvif_build_tx_gops to a check for RING_HAS_UNCONSUMED_REQUESTS as the
former call has the side effect of advancing the ring event pointer and
therefore inviting another interrupt from the frontend before the napi
poll has actually finished, thereby defeating the point of napi.

The event pointer is updated by RING_FINAL_CHECK_FOR_REQUESTS in
xenvif_poll, the napi poll function, if the work done is less than the
budget i.e. when actually transitioning back to interrupt mode.

Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: napi: fix abuse of budget

netback seems to be somewhat confused about the napi budget parameter. The
parameter is supposed to limit the number of skbs processed in each poll,
but netback has this confused with grant operations.

This patch fixes that, properly limiting the work done in each poll. Note
that this limit makes sure we do not process any more data from the shared
ring than we intend to pass back from the poll. This is important to
prevent tx_queue potentially growing without bound.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'iommu-fixes-for-v3.13-rc4' of git://github.com/awilliam/linux-vfio

Pull iommu fixes from Alex Williamson:
"arm/smmu driver updates via Will Deacon fixing locking around page
  table walks and a couple other issues"

* tag 'iommu-fixes-for-v3.13-rc4' of git://github.com/awilliam/linux-vfio:
  iommu/arm-smmu: fix error return code in arm_smmu_device_dt_probe()
  iommu/arm-smmu: remove potential NULL dereference on mapping path
  iommu/arm-smmu: use mutex instead of spinlock for locking page tables

Merge tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull misc keyrings fixes from David Howells:
"These break down into five sets:

   - A patch to error handling in the big_key type for huge payloads.
     If the payload is larger than the "low limit" and the backing store
     allocation fails, then big_key_instantiate() doesn't clear the
     payload pointers in the key, assuming them to have been previously
     cleared - but only one of them is.

     Unfortunately, the garbage collector still calls big_key_destroy()
     when sees one of the pointers with a weird value in it (and not
     NULL) which it then tries to clean up.

   - Three patches to fix the keyring type:

     * A patch to fix the hash function to correctly divide keyrings off
       from keys in the topology of the tree inside the associative
       array.  This is only a problem if searching through nested
       keyrings - and only if the hash function incorrectly puts the a
       keyring outside of the 0 branch of the root node.

     * A patch to fix keyrings' use of the associative array.  The
       __key_link_begin() function initially passes a NULL key pointer
       to assoc_array_insert() on the basis that it's holding a place in
       the tree whilst it does more allocation and stuff.

       This is only a problem when a node contains 16 keys that match at
       that level and we want to add an also matching 17th.  This should
       easily be manufactured with a keyring full of keyrings (without
       chucking any other sort of key into the mix) - except for (a)
       above which makes it on average adding the 65th keyring.

     * A patch to fix searching down through nested keyrings, where any
       keyring in the set has more than 16 keyrings and none of the
       first keyrings we look through has a match (before the tree
       iteration needs to step to a more distal node).

     Test in keyutils test suite:

        http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1

   - A patch to fix the big_key type's use of a shmem file as its
     backing store causing audit messages and LSM check failures.  This
     is done by setting S_PRIVATE on the file to avoid LSM checks on the
     file (access to the shmem file goes through the keyctl() interface
     and so is gated by the LSM that way).

     This isn't normally a problem if a key is used by the context that
     generated it - and it's currently only used by libkrb5.

     Test in keyutils test suite:

        http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e

   - A patch to add a generated file to .gitignore.

   - A patch to fix the alignment of the system certificate data such
     that it it works on s390.  As I understand it, on the S390 arch,
     symbols must be 2-byte aligned because loading the address discards
     the least-significant bit"

* tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  KEYS: correct alignment of system_certificate_list content in assembly file
  Ignore generated file kernel/x509_certificate_list
  security: shmem: implement kernel private shmem inodes
  KEYS: Fix searching of nested keyrings
  KEYS: Fix multiple key add into associative array
  KEYS: Fix the keyring hash function
  KEYS: Pre-clear struct key on allocation

Merge tag 'xfs-for-linus-v3.13-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:

- fix for buffer overrun in agfl with growfs on v4 superblock

- return EINVAL if requested discard length is less than a block

- fix possible memory corruption in xfs_attrlist_by_handle()

* tag 'xfs-for-linus-v3.13-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: growfs overruns AGFL buffer on V4 filesystems
  xfs: don't perform discard if the given range length is less than block size
  xfs: underflow bug in xfs_attrlist_by_handle()

futex: move user address verification up to common code

When debugging the read-only hugepage case, I was confused by the fact
that get_futex_key() did an access_ok() only for the non-shared futex
case, since the user address checking really isn't in any way specific
to the private key handling.

Now, it turns out that the shared key handling does effectively do the
equivalent checks inside get_user_pages_fast() (it doesn't actually
check the address range on x86, but does check the page protections for
being a user page). So it wasn't actually a bug, but the fact that we
treat the address differently for private and shared futexes threw me
for a loop.

Just move the check up, so that it gets done for both cases. Also, use
the 'rw' parameter for the type, even if it doesn't actually matter any
more (it's a historical artifact of the old racy i386 "page faults from
kernel space don't check write protections").

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

futex: fix handling of read-only-mapped hugepages

The hugepage code had the exact same bug that regular pages had in
commit 7485d0d3758e ("futexes: Remove rw parameter from
get_futex_key()").

The regular page case was fixed by commit 9ea71503a8ed ("futex: Fix
regression with read only mappings"), but the transparent hugepage case
(added in a5b338f2b0b1: "thp: update futex compound knowledge") case
remained broken.

Found by Dave Jones and his trinity tool.

Reported-and-tested-by: Dave Jones <davej@fedoraproject.org>
Cc: stable@kernel.org # v2.6.38+
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Btrfs: fix access_ok() check in btrfs_ioctl_send()

The closing parenthesis is in the wrong place. We want to check
"sizeof(*arg->clone_sources) * arg->clone_sources_count" instead of
"sizeof(*arg->clone_sources * arg->clone_sources_count)".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
cc: stable@vger.kernel.org

Btrfs: make sure we cleanup all reloc roots if error happens

I hit an oops when merging reloc roots fails, the reason is that
new reloc roots may be added and we should make sure we cleanup
all reloc roots.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation

Quota tree and UUID Tree is only cowed, they can not be snapshoted.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix an oops when doing balance relocation

I hit an oops when inserting reloc root into @reloc_root_tree(it can be
easily triggered when forcing cow for relocation root)

[  866.494539]  [<ffffffffa0499579>] btrfs_init_reloc_root+0x79/0xb0 [btrfs]
[  866.495321]  [<ffffffffa044c240>] record_root_in_trans+0xb0/0x110 [btrfs]
[  866.496109]  [<ffffffffa044d758>] btrfs_record_root_in_trans+0x48/0x80 [btrfs]
[  866.496908]  [<ffffffffa0494da8>] select_reloc_root+0xa8/0x210 [btrfs]
[  866.497703]  [<ffffffffa0495c8a>] do_relocation+0x16a/0x540 [btrfs]

This is because reloc root inserted into @reloc_root_tree is not within one
transaction,reloc root may be cowed and root block bytenr will be reused then
oops happens.We should update reloc root in @reloc_root_tree when cow reloc
root node, fix it.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: don't miss skinny extent items on delayed ref head contention

Currently extent-tree.c:btrfs_lookup_extent_info() can miss the lookup
of skinny extent items. This can happen when the execution flow is the
following:

* We do an extent tree lookup and fail to find a skinny extent item;

* As a result, we attempt to see if a non-skinny extent item exists,
  either by looking at previous item in the leaf or by doing another
  full extent tree search;

* We have a transaction and then we check for a matching delayed ref
  head in the transaction's delayed refs rbtree;

* We find such delayed ref head and then we try to lock it with a
  call to mutex_trylock();

* The lock was contended so we jump to the label "again", which repeats
  the extent tree search but for a non-skinny extent item, because we set
  previously metadata variable to 0 and the search key to look for a
  non-skinny extent-item;

* After the jump (and after releasing the transaction's delayed refs
  lock), a skinny extent item might have been added to the extent tree
  but we will miss it because metadata is set to 0 and the search key
  is set for a non-skinny extent-item.

The fix here is to not reset metadata to 0 and to jump to the initial search
key setup if the delayed ref head is contended, instead of jumping directly
to the extent tree search label ("again").

This issue was found while investigating the issue reported at Bugzilla 64961.

David Sterba suspected this function was missing extent items, and that
this could be caused by the last change to this function, which was made
in the following patch:

    [PATCH] Btrfs: optimize btrfs_lookup_extent_info()
    (commit 74be9510876a66ad9826613ac8a526d26f9e7f01)

But in fact this issue already existed before, because after failing to find
a skinny extent item, the code set the search key for a non-skinny extent
item, and on contention of a matching delayed ref head it would not search
the extent tree for a skinny extent item anymore.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: call mnt_drop_write after interrupted subvol deletion

If btrfs_ioctl_snap_destroy blocks on the mutex and the process is
killed, mnt_write count is unbalanced and leads to unmountable
filesystem.

CC: stable@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: don't clear the default compression type

We met a oops caused by the wrong compression type:
[  556.512356] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  556.512370] IP: [<ffffffff811dbaa0>] __list_del_entry+0x1/0x98
[SNIP]
[  556.512490]  [<ffffffff811dbb44>] ? list_del+0xd/0x2b
[  556.512539]  [<ffffffffa05dd5ce>] find_workspace+0x97/0x175 [btrfs]
[  556.512546]  [<ffffffff813c14b5>] ? _raw_spin_lock+0xe/0x10
[  556.512576]  [<ffffffffa05de276>] btrfs_compress_pages+0x2d/0xa2 [btrfs]
[  556.512601]  [<ffffffffa05af060>] compress_file_range.constprop.54+0x1f2/0x4e8 [btrfs]
[  556.512627]  [<ffffffffa05af388>] async_cow_start+0x32/0x4d [btrfs]
[  556.512655]  [<ffffffffa05cc7a1>] worker_loop+0x144/0x4c3 [btrfs]
[  556.512661]  [<ffffffff81059404>] ? finish_task_switch+0x80/0xb8
[  556.512689]  [<ffffffffa05cc65d>] ? btrfs_queue_worker+0x244/0x244 [btrfs]
[  556.512695]  [<ffffffff8104fa4e>] kthread+0x8d/0x95
[  556.512699]  [<ffffffff81050000>] ? bit_waitqueue+0x34/0x7d
[  556.512704]  [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65
[  556.512709]  [<ffffffff813c7eec>] ret_from_fork+0x7c/0xb0
[  556.512713]  [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65

Steps to reproduce:
# mkfs.btrfs -f <dev>
# mount -o nodatacow <dev> <mnt>
# touch <mnt>/<file>
# chattr =c <mnt>/<file>
# dd if=/dev/zero of=<mnt>/<file> bs=1M count=10

It is because we cleared the default compression type when setting the
nodatacow. In fact, we needn't do it because we have used COMPRESS flag to
indicate if we need compressed the file data or not, needn't use the
variant -- compress_type -- in btrfs_info to do the same thing, and just
use it to hold the default compression type. Or we would get a wrong compress
type for a file whose own compress flag is set but the compress flag of its
filesystem is not set.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

hwmon: Prevent some divide by zeros in FAN_TO_REG()

The "rpm * div" operations can overflow here, so this patch adds an
upper limit to rpm to prevent that. Jean Delvare helped me with this
patch.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Roger Lucas <vt8231@hiddenengine.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Jean Delvare <khali@linux-fr.org>

hwmon: (w83l768ng) Fix fan speed control range

The W83L786NG stores the fan speed on 4 bits while the sysfs interface
uses a 0-255 range. Thus the driver should scale the user input down
to map it to the device range, and scale up the value read from the
device before presenting it to the user. The reserved register nibble
should be left unchanged.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org
Reviewed-by: Guenter Roeck <linux@roeck-us.net>

hwmon: (w83l786ng) Fix fan speed control mode setting and reporting

The wrong mask is used, which causes some fan speed control modes
(pwmX_enable) to be incorrectly reported, and some modes to be
impossible to set.

[JD: add subject and description.]

Signed-off-by: Brian Carnes <bmcarnes@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jean Delvare <khali@linux-fr.org>

hwmon: (lm90) Unregister hwmon device if interrupt setup fails

Commit 109b1283fb (hwmon: (lm90) Add support to handle IRQ) introduced
interrupt support. Its error handling code fails to unregister the already
registered hwmon device.

Fixes: 109b1283fb532ac773a076748ffccf76a7067cab
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

sch_tbf: use do_div() for 64-bit divide

It's doing a 64-bit divide which is not supported
on 32-bit architectures in psched_ns_t2l(). The
correct way to do this is to use do_div().

It's introduced by commit cc106e441a63
("net: sched: tbf: fix the calculation of max_size")

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: ipv4: must add synchronization in udp_sk_rx_dst_set()

Unlike TCP, UDP input path does not hold the socket lock.

Before messing with sk->sk_rx_dst, we must use a spinlock, otherwise
multiple cpus could leak a refcount.

This patch also takes care of renewing a stale dst entry.
(When the sk->sk_rx_dst would not be used by IP early demux)

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net:fec: remove duplicate lines in comment about errata ERR006358

commit 031916568a1aa2ef1809f86d26f0bcfa215ff5c0 worked around
errata ERR006358, but comment contains duplicated lines, impairing
the readability. Remove them.

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>