mm: remove free_area_cache use in powerpc architecture
As all other architectures have been converted to use vm_unmapped_area(),
we are about to retire the free_area_cache.
This change simply removes the use of that cache in
slice_get_unmapped_area(), which will most certainly have a
performance cost. The next change will convert that function to use the
vm_unmapped_area() infrastructure and regain the performance.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wen Congyang [Wed, 20 Mar 2013 04:06:51 +0000 (15:06 +1100)]
x86: make 'mem=' option to work for efi platform
Currently the mem= boot option works only in a non-EFI environment. If the
user specifies add_efi_memmap, it does not work in an EFI environment. In the
EFI environment, we call e820_add_region() to add the memory map, so we
can modify __e820_add_region() to make the mem= boot option work in the EFI
environment as well.
Note: Only E820_RAM is limited, and BOOT_SERVICES_{CODE,DATA} are always
mapped (if its address >= mem_limit, the memory won't be freed in
efi_free_boot_services()).
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Rob Landley <rob@landley.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
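The limiting rule in the note above (only RAM is clamped, other region types pass through) could be sketched in userspace roughly as follows. This is not the code from the patch: the demo_* names and the region type values are hypothetical, and the function only computes how much of a region survives a mem= style limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region types; only DEMO_RAM is subject to the limit. */
enum demo_region_type { DEMO_RAM = 1, DEMO_RESERVED = 2 };

/*
 * Return how many bytes of [start, start + size) are kept when a
 * mem= style limit is in effect. Non-RAM regions are kept unchanged.
 */
static uint64_t demo_limit_region(uint64_t start, uint64_t size,
                                  enum demo_region_type type,
                                  uint64_t mem_limit)
{
    assert(size <= UINT64_MAX - start); /* the region must not wrap */

    if (type != DEMO_RAM)
        return size;                    /* not limited */
    if (start >= mem_limit)
        return 0;                       /* entirely above the limit */
    if (size > mem_limit - start)
        return mem_limit - start;       /* straddles the limit: truncate */
    return size;                        /* entirely below the limit */
}
```

With a limit of 8192 bytes, a RAM region starting at 4096 with size 8192 would be cut to 4096 bytes, while a reserved region at the same place would be kept whole.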
Nathan Zimmer [Wed, 20 Mar 2013 04:06:51 +0000 (15:06 +1100)]
sound: convert snd_info_register() to use proc_create_data()
Convert snd_info_register to use proc_create_data instead of
create_proc_entry. This corrects a sparse warning introduced by "procfs:
Improve Scaling in proc". It is also a bit cleaner to let proc_create_data
set the ->data and ->proc_fops.
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jianguo Wu [Wed, 20 Mar 2013 04:06:50 +0000 (15:06 +1100)]
arch/x86/mm/init_64.c: fix build warning when CONFIG_MEMORY_HOTREMOVE=n
There is a warning while building kernel with
CONFIG_MEMORY_HOTPLUG=y && CONFIG_MEMORY_HOTREMOVE=n:
arch/x86/mm/init_64.c:1024: warning:kernel_physical_mapping_remove defined but not used
So move kernel_physical_mapping_remove() into the "#ifdef
CONFIG_MEMORY_HOTREMOVE" block.
Oleg Nesterov [Wed, 20 Mar 2013 04:06:50 +0000 (15:06 +1100)]
kthread: kill task_get_live_kthread()
task_get_live_kthread() looks confusing and unneeded. It does
get_task_struct(), but only kthread_stop() needs this: it can be called
even if the caller doesn't have a reference when we know that this
kthread can't exit until we do kthread_stop().
kthread_park() and kthread_unpark() do not need get_task_struct(), the
callers already have the reference. And it can not help if we can race
with the exiting kthread anyway, kthread_park() can hang forever in this
case.
Change kthread_park() and kthread_unpark() to use to_live_kthread(),
change kthread_stop() to do get_task_struct() by hand and remove
task_get_live_kthread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Wed, 20 Mar 2013 04:06:50 +0000 (15:06 +1100)]
kthread: introduce to_live_kthread()
"k->vfork_done != NULL" with a barrier() after to_kthread(k) in
task_get_live_kthread(k) looks unclear, and sub-optimal because we load
->vfork_done twice.
All we need is to ensure that we do not return to_kthread(NULL). Add a
new trivial helper which loads/checks ->vfork_done once, this also looks
more understandable.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
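The "load and check ->vfork_done once" idea can be illustrated with a small userspace sketch. The demo_* structures below are simplified stand-ins rather than the kernel's task_struct and kthread, and the volatile cast only approximates a single forced load; treat it as an illustration of the pattern, not as the helper from this patch.

```c
#include <stddef.h>

struct demo_completion {
    int done;
};

/* Stand-in for the per-kthread state that embeds the completion. */
struct demo_kthread {
    unsigned long flags;
    struct demo_completion exited;
};

/* Stand-in for the task; vfork_done is NULL once the thread has exited. */
struct demo_task {
    struct demo_completion *vfork_done;
};

static struct demo_kthread *demo_to_live_kthread(struct demo_task *k)
{
    /* Read the pointer exactly once so the check and the use agree. */
    struct demo_completion *vfork_done =
        *(struct demo_completion *volatile *)&k->vfork_done;

    if (vfork_done)
        return (struct demo_kthread *)
            ((char *)vfork_done - offsetof(struct demo_kthread, exited));
    return NULL; /* never convert a NULL pointer into a kthread */
}
```

Because the pointer is copied into a local first, the NULL check and the container lookup cannot see two different values.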
Oleg Nesterov [Wed, 20 Mar 2013 04:06:49 +0000 (15:06 +1100)]
thinkpad-acpi: kill hotkey_thread_mutex
hotkey_kthread() does try_to_freeze() under hotkey_thread_mutex.
We can simply kill this mutex, hotkey_poll_stop_sync() does not need to
serialize with hotkey_kthread(). When kthread_stop() returns the thread
is already dead, it called do_exit()->complete_vfork_done().
Reported-by: Artem Savkov <artem.savkov@gmail.com>
Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Matthew Garrett <matthew.garrett@nebula.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sanjay Lal [Wed, 20 Mar 2013 04:06:49 +0000 (15:06 +1100)]
mips: define KVM_USER_MEM_SLOTS
ARCH=mips, config=fuloong2e_defconfig:
akpm3:/usr/src/25> make arch/mips/kernel/early_printk.o
...
CC arch/mips/kernel/asm-offsets.s
In file included from arch/mips/kernel/asm-offsets.c:20:
include/linux/kvm_host.h:334: error: `KVM_USER_MEM_SLOTS' undeclared here (not in a function)
Signed-off-by: Sanjay Lal <sanjayl@kymasys.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Josh Boyer [Wed, 20 Mar 2013 04:06:49 +0000 (15:06 +1100)]
kmsg: honor dmesg_restrict sysctl on /dev/kmsg
Originally, the addition of dmesg_restrict covered both the syslog
method of accessing dmesg, as well as /dev/kmsg itself. This was done
indirectly by security_syslog calling cap_syslog before doing any LSM
checks.
However, commit 12b3052c3ee ("capabilities/syslog: open code cap_syslog
logic to fix build failure") moved the code around and pushed the checks
into the caller itself. That seems to have inadvertently dropped the
checks for dmesg_restrict on /dev/kmsg. Most people haven't noticed
because util-linux dmesg(1) defaults to using the syslog method for access
in older versions. With util-linux 2.22 and a kernel newer than 3.5,
dmesg(1) defaults to reading directly from /dev/kmsg.
Fix this by making an explicit check in the devkmsg_open function.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=903192
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vladimir Davydov [Wed, 20 Mar 2013 04:06:48 +0000 (15:06 +1100)]
mqueue: sys_mq_open: do not call mnt_drop_write() if read-only
mnt_drop_write() must be called only if mnt_want_write() succeeded,
otherwise the mnt_writers counter will diverge.
mnt_writers counters are used to check if remounting FS as read-only is
OK, so after an extra mnt_drop_write() call, it would be impossible to
remount mqueue FS as read-only. Besides, on umount a warning would be
printed like this one:
[ 194.714880] =====================================
[ 194.719680] [ BUG: bad unlock balance detected! ]
[ 194.724488] 3.9.0-rc3 #5 Not tainted
[ 194.728159] -------------------------------------
[ 194.732958] a.out/12486 is trying to release lock (sb_writers) at:
[ 194.739355] [<ffffffff811b177f>] mnt_drop_write+0x1f/0x30
[ 194.744851] but there are no more locks to release!
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
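The pairing rule (drop write access only if taking it succeeded) might look like this in a minimal userspace model. The demo_* names are hypothetical and the writers field merely imitates the mnt_writers counter; this is a sketch of the rule, not the mqueue code.

```c
#include <errno.h>

struct demo_mount {
    int read_only;
    int writers;    /* imitates the mnt_writers counter */
};

static int demo_want_write(struct demo_mount *m)
{
    if (m->read_only)
        return -EROFS;  /* nothing was taken, so nothing may be dropped */
    m->writers++;
    return 0;
}

static void demo_drop_write(struct demo_mount *m)
{
    m->writers--;
}

static int demo_open(struct demo_mount *m, int create)
{
    int ro = demo_want_write(m);    /* nonzero: no write access held */
    int err = 0;

    if (create && ro)
        err = ro;                   /* creating needs write access */

    if (!ro)                        /* drop only what was really taken */
        demo_drop_write(m);
    return err;
}
```

On a read-only mount the counter stays at zero across demo_open(); an unconditional drop would push it negative, which corresponds to the imbalance the lockdep report above complains about.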
Alexander Duyck [Wed, 20 Mar 2013 04:06:48 +0000 (15:06 +1100)]
dma-debug: update DMA debug API to better handle multiple mappings of a buffer
There were reports of the igb driver unmapping buffers without calling
dma_mapping_error. On closer inspection issues were found in the DMA
debug API and how it handled multiple mappings of the same buffer.
The issue I found is the fact that the debug_dma_mapping_error would only
set the map_err_type to MAP_ERR_CHECKED in the case that there was only one
match for device and device address. However in the case of non-IOMMU,
multiple addresses existed and as a result it was not setting this field
once a second mapping was instantiated. I have resolved this by changing
the search so that it instead will now set MAP_ERR_CHECKED on the first
buffer that matches the device and DMA address that is currently in the
state MAP_ERR_NOT_CHECKED.
A secondary side effect of this patch is that in the case of multiple
buffers using the same address only the last mapping will have a valid
map_err_type. The previous mappings will all end up with map_err_type set
to MAP_ERR_CHECKED because of the dma_mapping_error call in
debug_dma_map_page. However this behavior may be preferable as it means
you will likely only see one real error per multi-mapped buffer, versus
the current behavior of multiple false errors per multi-mapped buffer.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Reviewed-by: Shuah Khan <shuah.khan@hp.com>
Tested-by: Shuah Khan <shuah.khan@hp.com>
Cc: Jakub Kicinski <kubakici@wp.pl>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
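The changed search (mark the first matching entry that is still unchecked, instead of requiring a unique match) can be sketched over a plain array. The real code walks hash buckets; the demo_* types and the flat array below are simplifying assumptions made for illustration.

```c
#include <stddef.h>

enum demo_map_err { DEMO_NOT_APPLICABLE, DEMO_NOT_CHECKED, DEMO_CHECKED };

struct demo_entry {
    int dev;                        /* hypothetical device id */
    unsigned long long dma_addr;
    enum demo_map_err map_err_type;
};

/*
 * Mark the first entry that matches (dev, dma_addr) and is still in the
 * NOT_CHECKED state. Returns its index, or -1 if there is none.
 */
static int demo_mapping_error(struct demo_entry *e, size_t n,
                              int dev, unsigned long long dma_addr)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].dev != dev || e[i].dma_addr != dma_addr)
            continue;
        if (e[i].map_err_type == DEMO_NOT_CHECKED) {
            e[i].map_err_type = DEMO_CHECKED;
            return (int)i;
        }
    }
    return -1;
}
```

With two mappings of the same address, where the first is already checked, the call marks the second one rather than giving up because the match is not unique.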
Alexander Duyck [Wed, 20 Mar 2013 04:06:47 +0000 (15:06 +1100)]
dma-debug: fix locking bug in check_unmap()
In check_unmap() it is possible to get into a dead-locked state if
dma_mapping_error is called. The problem is that the bucket is locked in
check_unmap, and locked again by debug_dma_mapping_error which is called
by dma_mapping_error. To resolve that we must release the lock on the
bucket before making the call to dma_mapping_error.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Reviewed-by: Shuah Khan <shuah.khan@hp.com>
Tested-by: Shuah Khan <shuah.khan@hp.com>
Cc: Jakub Kicinski <kubakici@wp.pl>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
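The self-deadlock and its fix can be modelled without real spinlocks. In the sketch below a hypothetical non-recursive lock reports a second acquisition as an error instead of hanging, which makes the two orderings easy to compare; none of the demo_* names come from dma-debug.

```c
/* Toy non-recursive lock: a second acquisition reports failure
 * where a real spinlock would spin forever. */
struct demo_bucket {
    int held;
};

static int demo_lock(struct demo_bucket *b)
{
    if (b->held)
        return -1;      /* would deadlock with a real lock */
    b->held = 1;
    return 0;
}

static void demo_unlock(struct demo_bucket *b)
{
    b->held = 0;
}

/* Takes the bucket lock itself, like the callee described above. */
static int demo_mapping_error(struct demo_bucket *b)
{
    if (demo_lock(b))
        return -1;
    demo_unlock(b);
    return 0;
}

static int demo_check_unmap(struct demo_bucket *b, int release_first)
{
    int ret;

    if (demo_lock(b))
        return -1;
    if (release_first) {
        demo_unlock(b);             /* fixed ordering: unlock, then call */
        return demo_mapping_error(b);
    }
    ret = demo_mapping_error(b);    /* buggy ordering: call while locked */
    demo_unlock(b);
    return ret;
}
```

demo_check_unmap(&b, 1) succeeds, while demo_check_unmap(&b, 0) reports the double acquisition that would hang with a real lock.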
zone_end_pfn is "unsigned" (32 bits). Changing it to
"unsigned long" (64 bits) fixes the problem.
zone_end_pfn() was added recently in commit 108bcc96ef70 ("mm: add & use
zone_end_pfn() and zone_spans_pfn()")
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/linux/mmzone.h?id=108bcc96ef7047c02cad4d229f04da38186a3f3f
Signed-off-by: Russ Anderson <rja@sgi.com>
Reported-by: George Beshers <gbeshers@sgi.com>
Acked-by: Hedi Berriche <hedi@sgi.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
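The effect of a 32-bit return type on a page frame number can be shown with fixed-width integers. The two helpers below are hypothetical stand-ins that take the zone start and span as plain arguments; they illustrate the truncation only and are not the kernel's zone_end_pfn().

```c
#include <stdint.h>

/* 32-bit result: anything at or above 2^32 is silently cut off. */
static uint32_t demo_zone_end_pfn_32(uint64_t start_pfn, uint64_t spanned)
{
    return (uint32_t)(start_pfn + spanned);
}

/* 64-bit result, as with "unsigned long" on a 64-bit kernel. */
static uint64_t demo_zone_end_pfn_64(uint64_t start_pfn, uint64_t spanned)
{
    return start_pfn + spanned;
}
```

For a zone starting at pfn 0x100000000 and spanning 0x1000 pages, the 32-bit variant yields 0x1000 while the 64-bit variant yields 0x100001000. Assuming 4 KiB pages, pfn 2^32 corresponds to the 16 TiB physical address boundary, so only very large machines would be affected.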
Oleg Nesterov [Wed, 20 Mar 2013 04:06:47 +0000 (15:06 +1100)]
poweroff: change orderly_poweroff() to use schedule_work()
David said:
Commit 6c0c0d4d108 ("poweroff: fix bug in orderly_poweroff()")
apparently fixes one bug in orderly_poweroff(), but introduces
another. The comments on orderly_poweroff() claim it can be called
from any context - and indeed we call it from interrupt context in
arch/powerpc/platforms/pseries/ras.c for example. But since that
commit this is no longer safe, since call_usermodehelper_fns() is not
safe in interrupt context without the UMH_NO_WAIT option.
orderly_poweroff() can be used from any context but UMH_WAIT_EXEC is
sleepable. Move the "force" logic into __orderly_poweroff() and change
orderly_poweroff() to use the global poweroff_work which simply calls
__orderly_poweroff().
While at it, remove the unneeded "int argc" and change argv_split() to use
GFP_KERNEL.
We use the global "bool poweroff_force" to pass the argument, this can
obviously affect the previous request if it is pending/running. So we
only allow the "false => true" transition assuming that the pending "true"
should succeed anyway. If schedule_work() fails after that we know that
work->func() was not called yet, it must see the new value.
This means that orderly_poweroff() becomes async even if we do not run the
command and always succeeds, schedule_work() can only fail if the work is
already pending. We can export __orderly_poweroff() and change the
non-atomic callers which want the old semantics.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Feng Hong <hongfeng@marvell.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
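The "false => true only" rule for the shared flag can be sketched with a toy work item. The pending flag and demo_run_work() below are simplified stand-ins for schedule_work() and the worker; all demo_* names are hypothetical and nothing here runs asynchronously.

```c
#include <stdbool.h>

static bool demo_poweroff_force;    /* shared argument for the worker */
static bool demo_work_pending;

/* Like schedule_work(): fails only if the work is already queued. */
static bool demo_schedule_work(void)
{
    if (demo_work_pending)
        return false;
    demo_work_pending = true;
    return true;
}

/* Stands in for the worker; returns the force value it observed. */
static bool demo_run_work(void)
{
    demo_work_pending = false;
    return demo_poweroff_force;
}

static bool demo_orderly_poweroff(bool force)
{
    if (force)  /* never downgrade a pending "true" */
        demo_poweroff_force = true;
    return demo_schedule_work();
}
```

If a non-forced request is queued first and a forced one arrives before the worker runs, the second call cannot queue the work again, but the worker still sees force == true.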
Wanpeng Li [Wed, 20 Mar 2013 04:06:46 +0000 (15:06 +1100)]
mm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accounting
hugetlb_total_pages is used for overcommit calculations but the current
implementation considers only the default hugetlb page size (which is
either the first defined hugepage size or the one specified by
default_hugepagesz kernel boot parameter).
If the system is configured for more than one hugepage size, which is
possible since a137e1cc ("hugetlbfs: per mount huge page sizes") then the
overcommit estimation done by __vm_enough_memory() (resp. shown by
meminfo_proc_show) is not precise - there is an impression of more
available/allowed memory. This can lead to an unexpected ENOMEM/EFAULT
resp. SIGSEGV when memory is accounted.
Testcase:
boot: hugepagesz=1G hugepages=1
the default overcommit ratio is 50
before patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit: 55434168 kB
after patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit: 54909880 kB
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
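The numbers in the testcase can be checked by hand: the two CommitLimit values differ by 55434168 - 54909880 = 524288 kB, which is 50% of one 1 GiB huge page (1048576 kB). A small sketch of that arithmetic follows, assuming the usual formula CommitLimit = (RAM - hugetlb) * ratio / 100 + swap and working in kB rather than pages. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One huge page size and how many pages of it are configured. */
struct demo_hstate {
    unsigned long long page_kb;
    unsigned long long nr_pages;
};

/* Sum over every configured size, not just the default one. */
static unsigned long long demo_hugetlb_total_kb(const struct demo_hstate *h,
                                                size_t n)
{
    unsigned long long total = 0;

    for (size_t i = 0; i < n; i++)
        total += h[i].page_kb * h[i].nr_pages;
    return total;
}

static unsigned long long demo_commit_limit_kb(unsigned long long ram_kb,
                                               unsigned long long hugetlb_kb,
                                               unsigned long long swap_kb,
                                               unsigned long long ratio)
{
    assert(hugetlb_kb <= ram_kb);
    return (ram_kb - hugetlb_kb) * ratio / 100 + swap_kb;
}
```

With a default size that has no pages configured plus one 1 GiB page, summing only the default size gives 0 kB, while summing all sizes gives 1048576 kB; at a ratio of 50 that lowers the limit by 524288 kB, matching the before/after output.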
wake_up_klogd() is useless when CONFIG_PRINTK=n because neither printk()
nor printk_sched() is in use and there is actually no waiter on the log_wait
waitqueue. It should be a stub in this case for users like
bust_spinlocks().
Otherwise this results in this warning when CONFIG_PRINTK=n
and CONFIG_IRQ_WORK=n:
kernel/built-in.o In function `wake_up_klogd':
(.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'
To fix this, provide an off-case for wake_up_klogd() when CONFIG_PRINTK=n.
There is much more from console_unlock() and other console related code in
printk.c that should be moved under CONFIG_PRINTK. But for now, focus on
a minimal fix as we passed the merge window already.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: James Hogan <james.hogan@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
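The "off-case" pattern described above is usually an empty static inline stub selected by the preprocessor. A minimal sketch follows, using a hypothetical DEMO_PRINTK switch in place of CONFIG_PRINTK and demo_* names in place of the real functions.

```c
/* Hypothetical switch standing in for CONFIG_PRINTK; left undefined here. */
/* #define DEMO_PRINTK 1 */

#ifdef DEMO_PRINTK
void demo_wake_up_klogd(void);  /* real version would live elsewhere */
#else
/* Off-case: an empty stub, so callers neither need #ifdefs nor pull in
 * the dependencies of the real implementation. */
static inline void demo_wake_up_klogd(void)
{
}
#endif

/* A caller that stays identical in both configurations. */
static int demo_bust_spinlocks(void)
{
    demo_wake_up_klogd();
    return 0;
}
```

With the switch undefined, the call compiles to nothing and no external symbol is referenced, which is what avoids the link error quoted above.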