Jie Liu [Mon, 16 Dec 2013 23:44:56 +0000 (10:44 +1100)]
ocfs2: return EOPNOTSUPP if the device does not support discard
For FITRIM ioctl(2), we should return EOPNOTSUPP to inform the user that
the storage device does not support discard if it is, otherwise return
success would confuse the user even though there is no free blocks were
trimmed at all.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Younger Liu [Mon, 16 Dec 2013 23:44:56 +0000 (10:44 +1100)]
ocfs2: remove redundant ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits()
ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() are
already provided in suballoc.c. So, the same functions in move_extents.c
are not needed any more.
Declare the functions in suballoc.h and remove redundant functions in
move_extents.c.
Signed-off-by: Younger Liu <liuyiyang@hisense.com> Cc: Younger Liu <younger.liucn@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ocfs2: use the new DLM operation callbacks while requesting new lockspace
Attempt to use the new DLM operations. If it is not supported, use the
traditional ocfs2_controld.
To exchange ocfs2 versioning, we use the LVB of the version dlm lock. It
first attempts to take the lock in EX mode (non-blocking). If successful
(which means it is the first mount), it writes the version number and
downconverts to PR lock. If it is unsuccessful, it reads the version from
the lock.
If this becomes the standard (with o2cb as well), it could simplify
userspace tools to check if the filesystem is mounted on other nodes.
Dan: Since ocfs2_protocol_version are two u8 values, the additional
checks with LONG* don't make sense.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ocfs2: shift allocation ocfs2_live_connection to user_connect()
We perform this because the DLM recovery callbacks will require the
ocfs2_live_connection structure to record the node information when
dlm_new_lockspace() is updated (in the last patch of the series).
Before calling dlm_new_lockspace(), we need the structure ready for the
.recover_done() callback, which would set oc_this_node. This is the
reason we allocate ocfs2_live_connection beforehand in user_connect().
[AKPM] rc initialization is not required because it assigned in case of
errors. It will be cleared by compiler anyways.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reveiwed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
These are the callbacks called by the fs/dlm code in case the membership
changes. If there is a failure while/during calling any of these, the DLM
creates a new membership and relays to the rest of the nodes.
recover_prep() is called when DLM understands a node is down.
recover_slot() is called once all nodes have acknowledged recover_prep and
recovery can begin. recover_done() is called once the recovery is
complete. It returns the new membership.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is an effort of removing ocfs2_controld.pcmk and getting ocfs2 DLM
handling up to the times with respect to DLM (>=4.0.1) and corosync
(2.3.x). AFAIK, cman also is being phased out for a unified corosync
cluster stack.
fs/dlm performs all the functions with respect to fencing and node
management and provides the API's to do so for ocfs2. For all future
references, DLM stands for fs/dlm code.
The advantages are:
+ No need to run an additional userspace daemon (ocfs2_controld)
+ No controld device handling and controld protocol
+ Shifting responsibilities of node management to DLM layer
For backward compatibility, we are keeping the controld handling code. Once
enough time has passed we can remove a significant portion of the code. This
was tested by using the kernel with changes on older unmodified tools. The
kernel used ocfs2_controld as expected, and displayed the appropriate
warning message.
This feature requires modification in the userspace ocfs2-tools. The
changes can be found at: https://github.com/goldwynr/ocfs2-tools branch:
nocontrold Currently, not many checks are present in the userspace code,
but that would change soon.
This patch (of 6):
Add clustername to cluster connection.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tariq Saeed [Mon, 16 Dec 2013 23:44:54 +0000 (10:44 +1100)]
ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one
When o2net-accept-one() rejects an illegal connection, it terminates the
loop picking up the remaining queued connections. This fix will continue
accepting connections till the queue is emtpy.
Zongxun Wang [Mon, 16 Dec 2013 23:44:54 +0000 (10:44 +1100)]
ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters
Even if using the same jbd2 handle, we cannot rollback a transaction. So
once some error occurs after successfully allocating clusters, the
allocated clusters will never be used and it means they are lost. For
example, call ocfs2_claim_clusters successfully when expanding a file, but
failed in ocfs2_insert_extent. So we need free the allocated clusters if
they are not used indeed.
Signed-off-by: Zongxun Wang <wangzongxun@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The versioning information is confusing for end-users. The numbers are
stuck at 1.5.0 when the tools version have moved to 1.8.2. Remove the
versioning system in the OCFS2 modules and let the kernel version be the
guide to debug issues.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Younger Liu [Mon, 16 Dec 2013 23:44:54 +0000 (10:44 +1100)]
ocfs2: fix ocfs2_sync_file() if filesystem is readonly
If filesystem is readonly, there is no need to flush drive's caches or
force any uncommitted transactions.
Signed-off-by: Younger Liu <younger.liucn@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Samuel Thibault [Mon, 16 Dec 2013 23:44:53 +0000 (10:44 +1100)]
input: route kbd LEDs through the generic LEDs layer
This permits to reassign keyboard LEDs to something else than keyboard
"leds" state, by adding keyboard led and modifier triggers connected to a
series of VT input LEDs, themselves connected to VT input triggers, which
per-input device LEDs use by default. Userland can thus easily change the
LED behavior of (a priori) all input devices, or of particular input
devices.
This also permits to fix #7063 from userland by using a modifier to
implement proper CapsLock behavior and have the keyboard caps lock led
show that modifier state.
[ebroder@mokafive.com: Rebased to 3.2-rc1 or so, cleaned up some includes, and fixed some constants]
[blogic@openwrt.org: CONFIG_INPUT_LEDS stubs should be static inline] Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Evan Broder <evan@ebroder.net> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Tested-by: Pavel Machek <pavel@ucw.cz> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Cc: Pavel Machek <pavel@ucw.cz> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Bryan Wu <cooloney@gmail.com> Cc: Arnaud Patard <arnaud.patard@rtp-net.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Matt Sealey <matt@genesi-usa.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Niels de Vos <devos@fedoraproject.org> Cc: Steev Klimaszewski <steev@genesi-usa.com> Signed-off-by: John Crispin <blogic@openwrt.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stephen Boyd [Mon, 16 Dec 2013 23:44:52 +0000 (10:44 +1100)]
sched_clock: document 4Mhz vs 1Mhz decision
Bo Shen sent a patch to change this to 1Mhz instead of 4Mhz but according
to Russell King the use of 4Mhz was intentional. Add a comment to this
effect so that others don't try to change the code as well.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Bo Shen <voice.shen@atmel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jeff Mahoney [Mon, 16 Dec 2013 23:44:51 +0000 (10:44 +1100)]
drm/nouveau: make vga_switcheroo code depend on VGA_SWITCHEROO
Commit 8116188fdef594 ("nouveau/acpi: hook up to the MXM method for mux
switching.") broke the build on non-x86 architectures due to the new
dependency on MXM and MXM being an x86 platform driver.
It built previously since the vga switcheroo registration routines were
zereod out on !X86. The code was built in but unused.
This patch makes all of the DSM code depend on CONFIG_VGA_SWITCHEROO,
allowing it to build on non-x86 and shrinking the module size as well.
[rdunlap@infradead.org: fix build eror when VGA_SWITCHEROO is not enabled] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: David Airlie <airlied@linux.ie> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Takashi Iwai [Mon, 16 Dec 2013 23:44:51 +0000 (10:44 +1100)]
drm/cirrus: correct register values for 16bpp
When the mode is set with 16bpp on QEMU, the output gets totally broken.
The culprit is the bogus register values set for 16bpp, which was likely
copied from from a wrong place.
Daniel Vetter [Mon, 16 Dec 2013 23:44:51 +0000 (10:44 +1100)]
drm/fb-helper: don't sleep for screen unblank when an oops is in progress
Otherwise the system will burn even brighter and worse, leave the user
wondering what's going on exactly.
Since we already have a panic handler which will (try) to restore the
entire fbdev console mode, we can just bail out. Inspired by a patch from
Konstantin Khlebnikov. The callchain leading to this, cut&pasted from
Konstantin's original patch:
Note that the entire locking in the fb helper around panic/sysrq and kdbg
is ... non-existant. So we have a decent change of blowing up
everything. But since reworking this ties in with funny concepts like the
fbdev notifier chain or the impressive things which happen around
console_lock while oopsing, I'll leave that as an exercise for braver
souls than me.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Airlie <airlied@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kara [Mon, 16 Dec 2013 23:44:51 +0000 (10:44 +1100)]
fsnotify: remove pointless NULL initializers
We usually rely on the fact that struct members not specified in the
initializer are set to NULL. So do that with fsnotify function pointers
as well.
Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kara [Mon, 16 Dec 2013 23:44:50 +0000 (10:44 +1100)]
fsnotify: remove .should_send_event callback
After removing event structure creation from the generic layer there is no
reason for separate .should_send_event and .handle_event callbacks. So
just remove the first one.
Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kara [Mon, 16 Dec 2013 23:44:50 +0000 (10:44 +1100)]
fsnotify: do not share events between notification groups
Currently fsnotify framework creates one event structure for each
notification event and links this event into all interested notification
groups. This is done so that we save memory when several notification
groups are interested in the event. However the need for event structure
shared between inotify & fanotify bloats the event structure so the result
is often higher memory consumption.
Another problem is that fsnotify framework keeps path references with
outstanding events so that fanotify can return open file descriptors with
its events. This has the undesirable effect that filesystem cannot be
unmounted while there are outstanding events - a regression for inotify
compared to a situation before it was converted to fsnotify framework.
For fanotify this problem is hard to avoid and users of fanotify should
kind of expect this behavior when they ask for file descriptors from
notified files.
This patch changes fsnotify and its users to create separate event
structure for each group. This allows for much simpler code (~400 lines
removed by this patch) and also smaller event structures. For example on
64-bit system original struct fsnotify_event consumes 120 bytes, plus
additional space for file name, additional 24 bytes for second and each
subsequent group linking the event, and additional 32 bytes for each
inotify group for private data. After the conversion inotify event
consumes 48 bytes plus space for file name which is considerably less
memory unless file names are long and there are several groups interested
in the events (both of which are uncommon). Fanotify event fits in 56
bytes after the conversion (fanotify doesn't care about file names so its
events don't have to have it allocated). A win unless there are four or
more fanotify groups interested in the event.
The conversion also solves the problem with unmount when only inotify is
used as we don't have to grab path references for inotify events.
Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kara [Mon, 16 Dec 2013 23:44:50 +0000 (10:44 +1100)]
inotify: provide function for name length rounding
Rounding of name length when passing it to userspace was done in several
places. Provide a function to do it and use it in all places.
Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shuah Khan [Mon, 16 Dec 2013 23:44:50 +0000 (10:44 +1100)]
dma-debug: enhance dma_debug_device_change() to check for mapping errors
dma-debug checks to verify if driver validated the address returned by dma
mapping routines when driver does unmap. If a driver doesn't call unmap,
failure to check mapping errors isn't detected and reported.
Enhance the existing bus notifier_call dma_debug_device_change() to check
for mapping errors at the same time it detects leaked dma buffers for
BUS_NOTIFY_UNBOUND_DRIVER event. It scans for mapping errors and if any
found, prints one warning message that includes mapping error count.
Signed-off-by: Shuah Khan <shuah.kh@samsung.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vladimir Murzin [Mon, 16 Dec 2013 23:44:49 +0000 (10:44 +1100)]
arm: move arm_dma_limit to setup_dma_zone
Since 4dcfa600 ("ARM: DMA-API: better handing of DMA masks for coherent
allocations") arm_dma_limit_pfn has almost substituted the arm_dma_limit.
The remaining user is dma_contiguous_reserve(). It is also referenced in
setup_dma_zone() to calculate arm_dma_limit_pfn.
Kill the global arm_dma_limit and equip setup_zone_dma with the local one.
Signed-off-by: Vladimir Murzin <murzin.v@gmail.com> Reported-by: Vassili Karpov <av1474@comtv.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Toshi Kani [Mon, 16 Dec 2013 23:44:49 +0000 (10:44 +1100)]
arch/x86/mm/srat.c: skip NUMA_NO_NODE while parsing SLIT
When ACPI SLIT table has an I/O locality (i.e. a locality unique to an
I/O device), numa_set_distance() emits the warning message below.
NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
acpi_numa_slit_init() calls numa_set_distance() with pxm_to_node(), which
assumes that all localities have been parsed with SRAT previously. SRAT
does not list I/O localities, where as SLIT lists all localities including
I/Os. Hence, pxm_to_node() returns NUMA_NO_NODE (-1) for an I/O locality.
I/O localities are not supported and are ignored today, but emitting such
warning message leads unnecessary confusion.
Change acpi_numa_slit_init() to avoid calling numa_set_distance() with
NUMA_NO_NODE.
Signed-off-by: Toshi Kani <toshi.kani@hp.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
Min_low_pfn and max_low_pfn were used in pfn_valid macro if defined
CONFIG_FLATMEM. When the functions that use the pfn_valid is used in driver
module, max_low_pfn and min_low_pfn is to undefined, and fail to build.
Jianguo Wu [Mon, 16 Dec 2013 23:44:48 +0000 (10:44 +1100)]
mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
After a successful hugetlb page migration by soft offline, the source page
will either be freed into hugepage_freelists or buddy(over-commit page).
If page is in buddy, page_hstate(page) will be NULL. It will hit a NULL
pointer dereference in dequeue_hwpoisoned_huge_page().
Joonsoo Kim [Mon, 16 Dec 2013 23:44:48 +0000 (10:44 +1100)]
mm/compaction: respect ignore_skip_hint in update_pageblock_skip
update_pageblock_skip() only fits to compaction which tries to isolate by
pageblock unit. If isolate_migratepages_range() is called by CMA, it try
to isolate regardless of pageblock unit and it don't reference
get_pageblock_skip() by ignore_skip_hint. We should also respect it on
update_pageblock_skip() to prevent from setting the wrong information.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It's caused by uninitialized persistent_keyring_register_sem.
The bug was introduced by commit f36f8c75 ("KEYS: Add per-user_namespace
registers for persistent per-UID kerberos caches"). Two typos are in that
commit: CONFIG_KEYS_KERBEROS_CACHE should be CONFIG_PERSISTENT_KEYRINGS
and krb_cache_register_sem should be persistent_keyring_register_sem.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
sh: always link in helper functions extracted from libgcc
E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m:
ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined!
For "lib-y", if no symbols in a compilation unit are referenced by other
units, the compilation unit will not be included in vmlinux.
This breaks modules that do reference those symbols.
This doesn't fix all cases. There are others, e.g. udivsi3.
This is also not limited to sh, many architectures handle this in the same
way.
A simple solution is to unconditionally include all helper functions.
A more complex solution is to make the choice of "lib-y" or "obj-y" depend
on CONFIG_MODULES:
obj-$(CONFIG_MODULES) += ...
lib-y($CONFIG_MODULES) += ...
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Paul Mundt <lethal@linux-sh.org> Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Reviewed-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This happens because the futex test in futex_init() lacks a switch to the
USER_DS address space, while cmpxchg_futex_value_locked() and
futex_atomic_cmpxchg_inatomic() operate on userspace pointers (albeit NULL
for this particular test).
Fix this by switching to USER_DS before running the test, and restoring
the old address space afterwards.
Bisected by Finn Thain.
Reported-by: Tuxist <tuxist@tuxist.de> Reported-by: Patrick McCarthy <patrickjmc@gmail.com> Suggested-by: Andreas Schwab <schwab@linux-m68k.org> Tested-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Darren Hart <dvhart@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Weiner [Mon, 16 Dec 2013 23:44:46 +0000 (10:44 +1100)]
mm: page_alloc: exclude unreclaimable allocations from zone fairness policy
Dave Hansen noted a regression in a microbenchmark that loops around
open() and close() on an 8-node NUMA machine and bisected it down to 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy"). That change
forces the slab allocations of the file descriptor to spread out to all 8
nodes, causing remote references in the page allocator and slab.
The round-robin policy is only there to provide fairness among memory
allocations that are reclaimed involuntarily based on pressure in each
zone. It does not make sense to apply it to unreclaimable kernel
allocations that are freed manually, in this case instantly after the
allocation, and incur the remote reference costs twice for no reason.
Only round-robin allocations that are usually freed through page reclaim
or slab shrinking.
Bisected by Dave Hansen.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Mon, 16 Dec 2013 23:44:45 +0000 (10:44 +1100)]
mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates
According to documentation on barriers, stores issued before a LOCK can
complete after the lock implying that it's possible tlb_flush_pending can
be visible after a page table update. As per revised documentation, this
patch adds a smp_mb__before_spinlock to guarantee the correct ordering.
Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The following build error was reported by the 0-day build checker.
>> arch/arm/mm/context.c:51:18: error: 'tlb_flush_pending' redeclared as different kind of symbol
include/linux/mm_types.h:477:91: note: previous definition of 'tlb_flush_pending' was here
This patch renames tlb_flush_pending to
mm_tlb_flush_pending. This is a fix for the -mm patch
mm-fix-tlb-flush-race-between-migration-and-change_protection_range.patch
Note that when slotted into place that it will cause a conflict with
mm-numa-defer-tlb-flush-for-thp-migration-as-long-as-possible.patch . The
resolution is to delete the call from huge_memory.c and make sure the
tlb_flush_pending call in mm/migrate.c is renamed appropriately.
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rik van Riel [Mon, 16 Dec 2013 23:44:45 +0000 (10:44 +1100)]
mm: fix TLB flush race between migration, and change_protection_range
There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.
The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.
During that time, a CPU may continue writing to the page.
This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.
When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.
This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).
The basic race looks like this:
CPU A CPU B CPU C
load TLB entry
make entry PTE/PMD_NUMA
fault on entry
read/write old page
start migrating page
change PTE/PMD to new page
read/write old page [*]
flush TLB
reload TLB from new entry
read/write new page
lose data
[*] the old page may belong to a new user at this point!
The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.
This should fix both NUMA migration and compaction.
Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Mon, 16 Dec 2013 23:44:45 +0000 (10:44 +1100)]
mm: numa: avoid unnecessary disruption of NUMA hinting during migration
do_huge_pmd_numa_page() handles the case where there is parallel THP
migration. However, by the time it is checked the NUMA hinting
information has already been disrupted. This patch adds an earlier check
with some helpers.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Mon, 16 Dec 2013 23:44:45 +0000 (10:44 +1100)]
mm: numa: clear numa hinting information on mprotect
On a protection change it is no longer clear if the page should be still
accessible. This patch clears the NUMA hinting fault bits on a protection
change.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Mon, 16 Dec 2013 23:44:44 +0000 (10:44 +1100)]
mm: numa: ensure anon_vma is locked to prevent parallel THP splits
The anon_vma lock prevents parallel THP splits and any associated
complexity that arises when handling splits during THP migration. This
patch checks if the lock was successfully acquired and bails from THP
migration if it failed for any reason.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Mon, 16 Dec 2013 23:44:44 +0000 (10:44 +1100)]
mm: numa: do not clear PTE for pte_numa update
The TLB must be flushed if the PTE is updated but change_pte_range is
clearing the PTE while marking PTEs pte_numa without necessarily flushing
the TLB if it reinserts the same entry. Without the flush, it's
conceivable that two processors have different TLBs for the same virtual
address and at the very least it would generate spurious faults. This
patch only unmaps the pages in change_pte_range for a full protection
change.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Mon, 16 Dec 2013 23:44:43 +0000 (10:44 +1100)]
mm: numa: do not clear PMD during PTE update scan
If the PMD is flushed then a parallel fault in handle_mm_fault() will
enter the pmd_none and do_huge_pmd_anonymous_page() path where it'll
attempt to insert a huge zero page. This is wasteful so the patch avoids
clearing the PMD when setting pmd_numa.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Mon, 16 Dec 2013 23:44:43 +0000 (10:44 +1100)]
mm: clear pmd_numa before invalidating
On x86, PMD entries are similar to _PAGE_PROTNONE protection and are
handled as NUMA hinting faults. The following two page table protection
bits are what defines them
_PAGE_NUMA:set _PAGE_PRESENT:clear
A PMD is considered present if any of the _PAGE_PRESENT, _PAGE_PROTNONE,
_PAGE_PSE or _PAGE_NUMA bits are set. If pmdp_invalidate encounters a
pmd_numa, it clears the present bit leaving _PAGE_NUMA which will be
considered not present by the CPU but present by pmd_present. The
existing caller of pmdp_invalidate should handle it but it's an
inconsistent state for a PMD. This patch keeps the state consistent when
calling pmdp_invalidate.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Mon, 16 Dec 2013 23:44:43 +0000 (10:44 +1100)]
mm: numa: serialise parallel get_user_page against THP migration
Base pages are unmapped and flushed from cache and TLB during normal page
migration and replaced with a migration entry that causes any parallel
NUMA hinting fault or gup to block until migration completes. THP does
not unmap pages due to a lack of support for migration entries at a PMD
level. This allows races with get_user_pages and get_user_pages_fast
which commit 3f926ab94 ("mm: Close races between THP migration and PMD
numa clearing") made worse by introducing a pmd_clear_flush().
This patch forces get_user_page (fast and normal) on a pmd_numa page to go
through the slow get_user_page path where it will serialise against THP
migration and properly account for the NUMA hinting fault. On the
migration side the page table lock is taken for each PTE update.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vivek Goyal [Mon, 16 Dec 2013 23:44:42 +0000 (10:44 +1100)]
kexec: migrate to reboot cpu
Commit 1b3a5d02ee0 ("reboot: move arch/x86 reboot= handling to generic
kernel") moved reboot= handling to generic code. In the process it also
removed the code in native_machine_shutdown() which are moving reboot
process to reboot_cpu/cpu0.
I guess that thought must have been that all reboot paths are calling
migrate_to_reboot_cpu(), so we don't need this special handling. But
kexec reboot path (kernel_kexec()) is not calling migrate_to_reboot_cpu()
so above change broke kexec. Now reboot can happen on non-boot cpu and
when INIT is sent in second kerneo to bring up BP, it brings down the
machine.
So start calling migrate_to_reboot_cpu() in kexec reboot path to avoid
this problem.
Bisected by WANG Chao.
Reported-by: Matthew Whitehead <mwhitehe@redhat.com> Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Tested-by: Baoquan He <bhe@redhat.com> Tested-by: WANG Chao <chaowang@redhat.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matias Bjorling [Tue, 10 Dec 2013 15:50:38 +0000 (16:50 +0100)]
null_blk: mem garbage on NUMA systems during init
For NUMA systems, initializing the blk-mq layer and using per node hctx.
We initialize submit queues to 1, while blk-mq nr_hw_queues is
initialized to the number of NUMA nodes.
This makes the null_init_hctx function overwrite memory outside of what
it allocated. In my case it lead to writing garbage into struct
request_queue's mq_map.
radeon_pm: fix oops in hwmon_attributes_visible() and radeon_hwmon_show_temp_thresh()
Since commit ec39f64bba34 ("drm/radeon/dpm: Convert to use
devm_hwmon_register_with_groups") radeon_hwmon_init() is using
hwmon_device_register_with_groups(), which sets `rdev' as a device
private driver_data, while hwmon_attributes_visible() and
radeon_hwmon_show_temp_thresh() are still waiting for `drm_device'.
Fix them by using dev_get_drvdata(), in order to avoid this oops:
1) Revert CHECKSUM_COMPLETE optimization in pskb_trim_rcsum(), I can't
figure out why it breaks things.
2) Fix comparison in netfilter ipset's hash_netnet4_data_equal(), it
was basically doing "x == x", from Dave Jones.
3) Freescale FEC driver was DMA mapping the wrong number of bytes, from
Sebastian Siewior.
4) Blackhole and prohibit routes in ipv6 were not doing the right thing
because their ->input and ->output methods were not being assigned
correctly. Now they behave properly like their ipv4 counterparts.
From Kamala R.
5) Several drivers advertise the NETIF_F_FRAGLIST capability, but
really do not support this feature and will send garbage packets if
fed fraglist SKBs. From Eric Dumazet.
6) Fix long standing user triggerable BUG_ON over loopback in RDS
protocol stack, from Venkat Venkatsubra.
7) Several not so common code paths can potentially try to invoke
packet scheduler actions that might be NULL without checking. Shore
things up by either 1) defining a method as mandatory and erroring
on registration if that method is NULL 2) defininig a method as
optional and the registration function hooks up a default
implementation when NULL is seen. From Jamal Hadi Salim.
8) Fix fragment detection in xen-natback driver, from Paul Durrant.
9) Kill dangling enter_memory_pressure method in cg_proto ops, from
Eric W Biederman.
10) SKBs that traverse namespaces should have their local_df cleared,
from Hannes Frederic Sowa.
11) IOCB file position is not being updated by macvtap_aio_read() and
tun_chr_aio_read(). From Zhi Yong Wu.
12) Don't free virtio_net netdev before releasing all of the NAPI
instances. From Andrey Vagin.
13) Procfs entry leak in xt_hashlimit, from Sergey Popovich.
14) IPv6 routes that are no cached routes should not count against the
garbage collection limits. We had this almost right, but were
missing handling addrconf generated routes properly. From Hannes
Frederic Sowa.
15) fib{4,6}_rule_suppress() have to consider potentially seeing NULL
route info when they are called, from Stefan Tomanek.
16) TUN and MACVTAP have had truncated packet signalling for some time,
fix from Jason Wang.
17) Fix use after frrr in __udp4_lib_rcv(), from Eric Dumazet.
18) xen-netback does not interpret the NAPI budget properly for TX work,
fix from Paul Durrant.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
igb: Fix for issue where values could be too high for udelay function.
i40e: fix null dereference
xen-netback: fix gso_prefix check
net: make neigh_priv_len in struct net_device 16bit instead of 8bit
drivers: net: cpsw: fix for cpsw crash when build as modules
xen-netback: napi: don't prematurely request a tx event
xen-netback: napi: fix abuse of budget
sch_tbf: use do_div() for 64-bit divide
udp: ipv4: must add synchronization in udp_sk_rx_dst_set()
net:fec: remove duplicate lines in comment about errata ERR006358
Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature"
8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature
xen-netback: make sure skb linear area covers checksum field
net: smc91x: Fix device tree based configuration so it's usable
udp: ipv4: fix potential use after free in udp_v4_early_demux()
macvtap: signal truncated packets
tun: unbreak truncated packet signalling
net: sched: htb: fix the calculation of quantum
net: sched: tbf: fix the calculation of max_size
micrel: add support for KSZ8041RNLI
...
Linus Torvalds [Sun, 15 Dec 2013 19:52:47 +0000 (11:52 -0800)]
Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"This is a pretty small batch:
The biggest single change is to stop using EFI time services on 32-bit
platforms. This matches our current behavior on 64-bit platforms as
we already had ruled them out there as being too unreliable. Turns
out that affects 32-bit platforms, too.
One NULL pointer fix for SGI UV.
Two minor build fixes, one of which only affects icc and the other
which affects icc and future versions or nonstandard default settings
of gcc"
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: Don't use (U)EFI time services on 32 bit
x86, build, icc: Remove uninitialized_var() from compiler-intel.h
x86/UV: Fix NULL pointer dereference in uv_flush_tlb_others() if the 'nobau' boot option is used
x86, build: Pass in additional -mno-mmx, -mno-sse options
Linus Torvalds [Sun, 15 Dec 2013 19:45:27 +0000 (11:45 -0800)]
Merge tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"PCI device hotplug
- Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael
Wysocki)
Host bridge drivers
- Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn
Helgaas)
- mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
(Jason Gunthorpe)
Miscellaneous
- Avoid unnecessary CPU switch when calling .probe() (Alexander
Duyck)
- Revert "workqueue: allow work_on_cpu() to be called recursively"
(Bjorn Helgaas)
- Disable Bus Master only on kexec reboot (Khalid Aziz)
- Omit PCI ID macro strings to shorten quirk names for LTO (Michal
Marek)"
* tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers
PCI: Disable Bus Master only on kexec reboot
PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
PCI: Omit PCI ID macro strings to shorten quirk names
PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
Revert "workqueue: allow work_on_cpu() to be called recursively"
PCI: Avoid unnecessary CPU switch when calling driver .probe() method
Linus Torvalds [Sun, 15 Dec 2013 19:28:02 +0000 (11:28 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull SELinux fixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()
selinux: look for IPsec labels on both inbound and outbound packets
selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()
selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()
selinux: fix possible memory leak
and Josh Boyer bisected it down to this commit. Reverting the commit in
the rawhide kernel fixes the problem.
Eric Paris root-caused it to incorrect subtype matching in that commit
breaking fuse, and has a tentative patch, but by now we're better off
retrying this in 3.14 rather than playing with it any more.
Reported-by: Tom London <selinux@gmail.com> Bisected-by: Josh Boyer <jwboyer@fedoraproject.org> Acked-by: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Anand Avati <avati@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Carolyn Wyborny [Sat, 14 Dec 2013 11:26:46 +0000 (03:26 -0800)]
igb: Fix for issue where values could be too high for udelay function.
This patch changes the igb_phy_has_link function to check the value of the
parameter before deciding to use udelay or mdelay in order to be sure that
the value is not too high for udelay function.
CC: stable kernel <stable@vger.kernel.org> # 3.9+ Signed-off-by: Sunil K Pandey <sunil.k.pandey@intel.com> Signed-off-by: Kevin B Smith <kevin.b.smith@intel.com> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Sat, 14 Dec 2013 11:26:45 +0000 (03:26 -0800)]
i40e: fix null dereference
If the vsi->tx_rings structure is NULL we don't want to panic.
Change-Id: Ic694f043701738c434e8ebe0caf0673f4410dc10 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 14 Dec 2013 00:16:03 +0000 (16:16 -0800)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"This resolves some further issues with the dma mask changes on ARM
which have been found by TI and others, and also some corner cases
with the updates to the virtual to physical address translations.
Konstantin also found some problems with the unwinder, which now
performs tighter verification that the stack is valid while unwinding"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: fix asm/memory.h build error
ARM: 7917/1: cacheflush: correctly limit range of memory region being flushed
ARM: 7913/1: fix framepointer check in unwind_frame
ARM: 7912/1: check stack pointer in get_wchan
ARM: 7909/1: mm: Call setup_dma_zone() post early_paging_init()
ARM: 7908/1: mm: Fix the arm_dma_limit calculation
ARM: another fix for the DMA mapping checks
Linus Torvalds [Sat, 14 Dec 2013 00:14:39 +0000 (16:14 -0800)]
Merge tag 'arc-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"These are couple of weeks old already, but I just couldn't get them to
you earlier.
- couple of fixes for recently added perf code
- build time extable sort"
* tag 'arc-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [perf] Fix a few thinkos
ARC: Add guard macro to uapi/asm/unistd.h
ARC: extable: Enable sorting at build time
Linus Torvalds [Fri, 13 Dec 2013 21:22:22 +0000 (13:22 -0800)]
Merge tag 'dm-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"A set of device-mapper fixes for 3.13.
A fix for possible memory corruption during DM table load, fix a
possible leak of snapshot space in case of a crash, fix a possible
deadlock due to a shared workqueue in the delay target, fix to
initialize read-only module parameters that are used to export metrics
for dm stats and dm bufio.
Quite a few stable fixes were identified for both the thin-
provisioning and caching targets as a result of increased regression
testing using the device-mapper-test-suite (dmts). The most notable
of these are the reference counting fixes for the space map btree that
is used by the dm-array interface -- without these the dm-cache
metadata will leak, resulting in dm-cache devices running out of
metadata blocks. Also, some important fixes related to the
thin-provisioning target's transition to read-only mode on error"
* tag 'dm-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm array: fix a reference counting bug in shadow_ablock
dm space map: disallow decrementing a reference count below zero
dm stats: initialize read-only module parameter
dm bufio: initialize read-only module parameters
dm cache: actually resize cache
dm cache: update Documentation for invalidate_cblocks's range syntax
dm cache policy mq: fix promotions to occur as expected
dm thin: allow pool in read-only mode to transition to read-write mode
dm thin: re-establish read-only state when switching to fail mode
dm thin: always fallback the pool mode if commit fails
dm thin: switch to read-only mode if metadata space is exhausted
dm thin: switch to read only mode if a mapping insert fails
dm space map metadata: return on failure in sm_metadata_new_block
dm table: fail dm_table_create on dm_round_up overflow
dm snapshot: avoid snapshot space leak on crash
dm delay: fix a possible deadlock due to shared workqueue
Russell King [Tue, 10 Dec 2013 19:21:08 +0000 (19:21 +0000)]
ARM: fix asm/memory.h build error
Jason Gunthorpe reports a build failure when ARM_PATCH_PHYS_VIRT is
not defined:
In file included from arch/arm/include/asm/page.h:163:0,
from include/linux/mm_types.h:16,
from include/linux/sched.h:24,
from arch/arm/kernel/asm-offsets.c:13:
arch/arm/include/asm/memory.h: In function '__virt_to_phys':
arch/arm/include/asm/memory.h:244:40: error: 'PHYS_OFFSET' undeclared (first use in this function)
arch/arm/include/asm/memory.h:244:40: note: each undeclared identifier is reported only once for each function it appears in
arch/arm/include/asm/memory.h: In function '__phys_to_virt':
arch/arm/include/asm/memory.h:249:13: error: 'PHYS_OFFSET' undeclared (first use in this function)
Fixes: ca5a45c06cd4 ("ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions") Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Fri, 13 Dec 2013 19:39:54 +0000 (11:39 -0800)]
Merge tag 'regulator-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A small set of driver fixes plus one larger core change which changes
the way we check to see if we're using DT so that there aren't any
races between deciding we're using DT and the regulator subsystem
noticing.
This makes the new support for substituting a dummy regulator and
optional regulators work a lot better on DT systems since it ensures
that we don't trigger probe deferral when we shouldn't which was
causing bugs in clients"
* tag 'regulator-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pfuze100: allow misprogrammed ID
regulator: pfuze100: Fix address of FABID
regulator: as3722: set the correct current limit
regulator: core: Check for DT every time we check full constraints
regulator: core: Replace checks of have_full_constraints with a function
Linus Torvalds [Fri, 13 Dec 2013 19:38:35 +0000 (11:38 -0800)]
Merge tag 'regmap-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"Two small changes to fix some error handling and checking (both of
which would be quite serious if the errors trigger) plus a trivial
documentation fix"
* tag 'regmap-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: use IS_ERR() to check clk_get() results
regmap: make sure we unlock on failure in regmap_bulk_write
regmap: trivial comment fix (copy'n'paste error)
Linus Torvalds [Fri, 13 Dec 2013 19:37:57 +0000 (11:37 -0800)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Here are two simple but wanted fixes for the i2c subsystem"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: Check the return value from clk_prepare_enable()
i2c: mux: Inherit retry count and timeout from parent for muxed bus
Linus Torvalds [Fri, 13 Dec 2013 19:31:22 +0000 (11:31 -0800)]
Merge tag 'for-linus-20131212' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"Two MTD fixes, for the pxa3xx-nand driver:
- This driver was not ready to fully Armada 370 NAND, with
particularly notable problems seen on flash with 2KB page sizes.
This "compatible" entry really should have been held back until
3.14 or later.
- Fix a bug seen in rare cases on the error path of a failed probe
attempt, where we free unallocated DMA resources"
* tag 'for-linus-20131212' of git://git.infradead.org/linux-mtd:
mtd: nand: pxa3xx: Use info->use_dma to release DMA resources
Partially revert "mtd: nand: pxa3xx: Introduce 'marvell,armada370-nand' compatible string"
Linus Torvalds [Fri, 13 Dec 2013 19:29:51 +0000 (11:29 -0800)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine fixes from Vinod Koul:
"Here is the common fixes PULL for dmaengine.
Dan has been working on fixing the build issues in bunch of drivers.
Here we have one fixing s3c24xx-dma, along with fix from Russell on
pl08x. Also we have Kuninori rcar dma fixes. The s3c24xx-dma which
was added in last merge window missed updates to usage of DMA_COMPLETE
so converting the last driver"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dma: fix build breakage in s3c24xx-dma
Fix pl08x warnings
rcar-hpbdma: initialise plane information when halted
rcar-hpbdma: fixup channel busy check for double plane
rcar-hpbdma: add max transfer size
dma: mmp_pdma: add missing platform_set_drvdata() in mmp_pdma_probe()
dmaengine: s3c24xx-dma: use DMA_COMPLETE for dma completion status
Joe Thornber [Fri, 13 Dec 2013 12:31:08 +0000 (12:31 +0000)]
dm space map: disallow decrementing a reference count below zero
The old behaviour, returning -EINVAL if a ref_count of 0 would be
decremented, was removed in commit f722063 ("dm space map: optimise
sm_ll_dec and sm_ll_inc"). To fix this regression we return an error
code from the mutator function pointer passed to sm_ll_mutate() and have
dec_ref_count() return -EINVAL if the old ref_count is 0.
Add a DMERR to reflect the potential seriousness of this error.
Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path.
With this fix the following dmts regression test now passes:
dmtest run --suite cache -n /metadata_use_kernel/
The next patch fixes the higher-level dm-array code that exposed this
regression.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.12+
Linus Torvalds [Fri, 13 Dec 2013 02:22:10 +0000 (18:22 -0800)]
Merge branch 'akpm' (fixes from Andrew)
Merge patches from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: memcg: do not allow task about to OOM kill to bypass the limit
mm: memcg: fix race condition between memcg teardown and swapin
thp: move preallocated PTE page table on move_huge_pmd()
mfd/rtc: s5m: fix register updating by adding regmap for RTC
rtc: s5m: enable IRQ wake during suspend
rtc: s5m: limit endless loop waiting for register update
rtc: s5m: fix unsuccesful IRQ request during probe
drivers/rtc/rtc-s5m.c: fix info->rtc assignment
include/linux/kernel.h: make might_fault() a nop for !MMU
drivers/rtc/rtc-at91rm9200.c: correct alarm over day/month wrap
procfs: also fix proc_reg_get_unmapped_area() for !MMU case
mm: memcg: do not declare OOM from __GFP_NOFAIL allocations
include/linux/hugetlb.h: make isolate_huge_page() an inline
Johannes Weiner [Fri, 13 Dec 2013 01:12:35 +0000 (17:12 -0800)]
mm: memcg: do not allow task about to OOM kill to bypass the limit
Commit 4942642080ea ("mm: memcg: handle non-error OOM situations more
gracefully") allowed tasks that already entered a memcg OOM condition to
bypass the memcg limit on subsequent allocation attempts hoping this
would expedite finishing the page fault and executing the kill.
David Rientjes is worried that this breaks memcg isolation guarantees
and since there is no evidence that the bypass actually speeds up fault
processing just change it so that these subsequent charge attempts fail
outright. The notable exception being __GFP_NOFAIL charges which are
required to bypass the limit regardless.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-bt: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 13 Dec 2013 01:12:34 +0000 (17:12 -0800)]
mm: memcg: fix race condition between memcg teardown and swapin
There is a race condition between a memcg being torn down and a swapin
triggered from a different memcg of a page that was recorded to belong
to the exiting memcg on swapout (with CONFIG_MEMCG_SWAP extension). The
result is unreclaimable pages pointing to dead memcgs, which can lead to
anything from endless loops in later memcg teardown (the page is charged
to all hierarchical parents but is not on any LRU list) or crashes from
following the dangling memcg pointer.
Memcgs with tasks in them can not be torn down and usually charges don't
show up in memcgs without tasks. Swapin with the CONFIG_MEMCG_SWAP
extension is the notable exception because it charges the cgroup that
was recorded as owner during swapout, which may be empty and in the
process of being torn down when a task in another memcg triggers the
swapin:
teardown: swapin:
lookup_swap_cgroup_id()
rcu_read_lock()
mem_cgroup_lookup()
css_tryget()
rcu_read_unlock()
disable css_tryget()
call_rcu()
offline_css()
reparent_charges()
res_counter_charge() (hierarchical!)
css_put()
css_free()
pc->mem_cgroup = dead memcg
add page to dead lru
Add a final reparenting step into css_free() to make sure any such raced
charges are moved out of the memcg before it's finally freed.
In the longer term it would be cleaner to have the css_tryget() and the
res_counter charge under the same RCU lock section so that the charge
reparenting is deferred until the last charge whose tryget succeeded is
visible. But this will require more invasive changes that will be
harder to evaluate and backport into stable, so better defer them to a
separate change set.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
p = mmap((void *) GB, 10 * MB, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
for (i = 0; i < 10 * MB; i += 4096)
p[i] = 1;
mremap(p, 10 * MB, 10 * MB, MREMAP_FIXED | MREMAP_MAYMOVE, 2 * GB);
return 0;
}
Due to split PMD lock, we now store preallocated PTE tables for THP
pages per-PMD table. It means we need to move them to other PMD table
if huge PMD moved there.
mfd/rtc: s5m: fix register updating by adding regmap for RTC
Rename old regmap field of "struct sec_pmic_dev" to "regmap_pmic" and
add new regmap for RTC.
On S5M8767A registers were not properly updated and read due to usage of
the same regmap as the PMIC. This could be observed in various hangs,
e.g. in infinite loop during waiting for UDR field change.
On this chip family the RTC has different I2C address than PMIC so
additional regmap is needed.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mark Brown <broonie@linaro.org> Acked-by: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add PM suspend/resume ops to rtc-s5m driver and enable IRQ wake during
suspend so the RTC would act like a wake up source. This allows waking
up from suspend to RAM on RTC alarm interrupt.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mark Brown <broonie@linaro.org> Acked-by: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rtc: s5m: limit endless loop waiting for register update
After setting alarm or time the driver is waiting for UDR register to be
cleared indicating that registers data have been transferred.
Limit the endless loop to only 5 retries.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mark Brown <broonie@linaro.org> Acked-by: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rtc: s5m: fix unsuccesful IRQ request during probe
Probe failed for rtc-s5m:
s5m-rtc s5m-rtc: Failed to request alarm IRQ: 12: -22
s5m-rtc: probe of s5m-rtc failed with error -22
Fix rtc-s5m interrupt request by using regmap_irq_get_virq() for mapping
the IRQ.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mark Brown <broonie@linaro.org> Acked-by: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Fri, 13 Dec 2013 01:12:24 +0000 (17:12 -0800)]
include/linux/kernel.h: make might_fault() a nop for !MMU
The machine cannot fault if !MUU, so make might_fault() a nop for !MMU.
This fixes below build error if
!CONFIG_MMU && (CONFIG_PROVE_LOCKING=y || CONFIG_DEBUG_ATOMIC_SLEEP=y):
arch/arm/kernel/built-in.o: In function `arch_ptrace':
arch/arm/kernel/ptrace.c:852: undefined reference to `might_fault'
arch/arm/kernel/built-in.o: In function `restore_sigframe':
arch/arm/kernel/signal.c:173: undefined reference to `might_fault'
...
arch/arm/kernel/built-in.o:arch/arm/kernel/signal.c:177: more undefined references to `might_fault' follow
make: *** [vmlinux] Error 1
Signed-off-by: Axel Lin <axel.lin@ingics.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Fri, 13 Dec 2013 01:12:22 +0000 (17:12 -0800)]
procfs: also fix proc_reg_get_unmapped_area() for !MMU case
Commit fad1a86e25e0 ("procfs: call default get_unmapped_area on
MMU-present architectures"), as its title says, took care of only the
MMU case, leaving the !MMU side still in the regressed state (returning
-EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).
"Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
proc_reg_get_unmapped_area in proc_reg_file_ops and
proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
get_unmapped_area method is not defined for the target procfs file, which
causes regression of mmap on /proc/vmcore.
To address this issue, like get_unmapped_area(), call default
current->mm->get_unmapped_area on MMU-present architectures if
pde->proc_fops->get_unmapped_area, i.e. the one in actual file operation
in the procfs file, is not defined"
Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> [3.12.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 13 Dec 2013 01:12:20 +0000 (17:12 -0800)]
mm: memcg: do not declare OOM from __GFP_NOFAIL allocations
Commit 84235de394d9 ("fs: buffer: move allocation failure loop into the
allocator") started recognizing __GFP_NOFAIL in memory cgroups but
forgot to disable the OOM killer.
Any task that does not fail allocation will also not enter the OOM
completion path. So don't declare an OOM state in this case or it'll be
leaked and the task be able to bypass the limit until the next
userspace-triggered page fault cleans up the OOM state.
Reported-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.12.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Fri, 13 Dec 2013 01:12:19 +0000 (17:12 -0800)]
include/linux/hugetlb.h: make isolate_huge_page() an inline
With CONFIG_HUGETLBFS=n:
mm/migrate.c: In function `do_move_page_to_node_array':
include/linux/hugetlb.h:140:33: warning: statement with no effect [-Wunused-value]
#define isolate_huge_page(p, l) false
^
mm/migrate.c:1170:4: note: in expansion of macro `isolate_huge_page'
isolate_huge_page(page, &pagelist);
Linus Torvalds [Thu, 12 Dec 2013 23:45:03 +0000 (15:45 -0800)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Another week, another batch of fixes.
Again, OMAP regressions due to move to DT is the bulk of the changes
here, but this should be the last of it for 3.13. There are also a
handful of OMAP hwmod changes (power management, reset handling) for
USB on OMAP3 that fixes some longish-standing bugs around USB resets.
There are a couple of other changes that also add up line count a bit:
One is a long-standing bug with the keyboard layout on one of the PXA
platforms. The other is a fix for highbank that moves their
power-off/reset button handling to be done in-kernel since relying on
userspace to handle it was fragile and awkward"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: sun6i: dt: Fix interrupt trigger types
ARM: sun7i: dt: Fix interrupt trigger types
MAINTAINERS: merge IMX6 entry into IMX
ARM: tegra: add missing break to fuse initialization code
ARM: pxa: prevent PXA270 occasional reboot freezes
ARM: pxa: tosa: fix keys mapping
ARM: OMAP2+: omap_device: add fail hook for runtime_pm when bad data is detected
ARM: OMAP2+: hwmod: Fix usage of invalid iclk / oclk when clock node is not present
ARM: OMAP3: hwmod data: Don't prevent RESET of USB Host module
ARM: OMAP2+: hwmod: Fix SOFTRESET logic
ARM: OMAP4+: hwmod data: Don't prevent RESET of USB Host module
ARM: dts: Fix booting for secure omaps
ARM: OMAP2+: Fix the machine entry for am3517
ARM: dts: Fix missing entries for am3517
ARM: OMAP2+: Fix overwriting hwmod data with data from device tree
ARM: davinci: Fix McASP mem resource names
ARM: highbank: handle soft poweroff and reset key events
ARM: davinci: fix number of resources passed to davinci_gpio_register()
gpio: davinci: fix check for unbanked gpio
Linus Torvalds [Thu, 12 Dec 2013 23:25:10 +0000 (15:25 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This is a small collection of fixes. It was rebased this morning, but
I was just fixing signed-off-by tags with the wrong email"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix access_ok() check in btrfs_ioctl_send()
Btrfs: make sure we cleanup all reloc roots if error happens
Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation
Btrfs: fix an oops when doing balance relocation
Btrfs: don't miss skinny extent items on delayed ref head contention
btrfs: call mnt_drop_write after interrupted subvol deletion
Btrfs: don't clear the default compression type