The posix cpu timer expiry time is stored in a union of two types: a 64
bits field if we rely on scheduler precise accounting, or a cputime_t if
we rely on jiffies.
This results in quite some duplicate code and special cases to handle the
two types.
Just unify this into a single 64 bits field. cputime_t can always fit
into it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Jones <davej@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
v3: Corrected the case where max_cpus != nr_cpu_ids by exiting early.
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Reported-by: Dave Jones <davej@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nathan Zimmer [Wed, 20 Mar 2013 04:06:54 +0000 (15:06 +1100)]
timer_list: convert timer list to be a proper seq_file
When running with 4096 cores attemping to read /proc/timer_list will fail
with an ENOMEM condition. On a sufficantly large systems the total amount
of data is more then 4mb, so it won't fit into a single buffer. The
failure can also occur on smaller systems when memory fragmentation is
high as reported by Dave Jones.
Convert /proc/timer_list to a proper seq_file with its own iterator. This
is a little more complex given that we have to make two passes with two
separate headers.
[akpm@linux-foundation.org: whitespace fixlet]
[akpm@linux-foundation.org: fix up comment]
[akpm@linux-foundation.org: fix gcc warnings]
[akpm@linux-foundation.org: fix typo in comment] Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Reported-by: Dave Jones <davej@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nathan Zimmer [Wed, 20 Mar 2013 04:06:54 +0000 (15:06 +1100)]
timer_list-split-timer_list_show_tickdevices-v4
v4: correct extra whitespace
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Dave Jones <davej@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nathan Zimmer [Wed, 20 Mar 2013 04:06:54 +0000 (15:06 +1100)]
timer_list: split timer_list_show_tickdevices()
Split timer_list_show_tickdevices() out the header and just pull the rest up
to timer_list_show. Also tweak the location of the whitespace. This is all
to prep for the fix.
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Reported-by: Dave Jones <davej@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ondrej Zary [Wed, 20 Mar 2013 04:06:53 +0000 (15:06 +1100)]
cyber2000fb: avoid palette corruption at higher clocks
When 1280x1024@75Hz mode is set, console palette is not set properly -
sometimes the background is white, sometimes yellow and text colors are
also messed up. This does not happen at 1280x1024@60Hz and below.
It seems that the HW needs some time before setting the palette - maybe
the PLL needs more time to lock at higher speeds. This patch fixes the
problem but without knowing what register to check for PLL lock(?), the
delay might be excessive.
On Fri, 28 Jan 2011 18:15:37 +0000
Russell King <rmk@arm.linux.org.uk> wrote:
> On Tue, Jan 18, 2011 at 01:14:24PM -0800, Andrew Morton wrote:
> > Russell, I have an (old) note here that this is awaiting an ack from
> > yourself?
>
> Well, I can reproduce this problem on the Netwinders here. I'm not sure
> that we should delay all mode switches by one second - and any attempt
> to reduce this value does result in the palette not being set correctly.
>
> For 1280x1024-75, the dotclock is 135MHz, which gives a PLL values of
> 0x41 and 0x06. That's: M=0x41+1, N=0x06+1, P=0x00 (top 2 bits of 0x06)
> -> Q=1
>
> Fpll = 14.31818MHz * M / N
> Fout = Fpll / Q
>
> The PLL itself is formed by dividing the 14-ish MHz frequency by N and
> phase comparing the output of the VCO, divided by M, and adjusting the
> VCO until the two correlate. As VCOs typically tend to have a limited
> range, it's normal to divide the output frequency to produce a greater
> range - and in this case that's done by Q.
>
> For the 800x600-100 copied from /etc/fb.modes, this has a dotclock of
> 67.5MHz, which is exactly half this rate. The PLL values for this are:
> M=0x41+1, N=0x06+1, P=0x01, giving PLL values of 0x41 and 0x46.
>
> Booting with 800x600-100 does not suffer the problem. So it's not
> related to PLL lock time. There's something else going on.
>
> Another experiment I tried was forcing the PLL values to produce 108MHz
> instead of 135MHz. 108MHz is the dotclock for 1280x1024-60. This too
> doesn't suffer the problem.
>
> I've also tried chosing other delay values. 100ms is too short and
> produces the problem, but 1s works. 1s for a PLL to lock is a hell of
> a time, especially for a PLL operating in the MHz range.
>
> I've tried setting the PLL to a known good freqency, and then switching
> to 135MHz - the problem persists. It's not like 135MHz is reaching the
> limits - it'll go up to 206MHz.
>
> So, I don't think this has anything to do with PLL locking. I think
> there's something else going on which isn't immediately obvious - maybe
> bandwidth starvation preventing us from writing properly to the palette?
> As it's a horrible VGA, where you write the same register multiple times
> I wouldn't be surprised if some writes were going missing.
>
> I'll see if I can play around with it some more this evening, but I've
> spent an awful long time on just this issue already this afternoon...
>
> I think further investigation needs to happen on this patch before it's
> acceptable. Or maybe we should prevent the cyberpro coming up in
Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Haiyang Zhang [Wed, 20 Mar 2013 04:06:53 +0000 (15:06 +1100)]
drivers/video: add Hyper-V Synthetic Video Frame Buffer Driver
This is the driver for the Hyper-V Synthetic Video, which supports screen
resolution up to Full HD 1920x1080 on Windows Server 2012 host, and
1600x1200 on Windows Server 2008 R2 or earlier. It also solves the double
mouse cursor issue of the emulated video mode.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Cc: Olaf Hering <olaf@aepfle.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Devendra Naga [Wed, 20 Mar 2013 04:06:53 +0000 (15:06 +1100)]
drivers/video/console/fbcon_cw.c: fix compiler warning in cw_update_attr
with make W=1
saw
drivers/video/console/fbcon_cw.c: In function `cw_update_attr':
drivers/video/console/fbcon_cw.c:30:8: warning: variable `t' set but not used [-Wunused-but-set-variable]
matroxfb: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Thanks to Julia Lawall for automating the conversion.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Julia Lawall <julia@diku.dk> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Daniel Vetter [Wed, 20 Mar 2013 04:06:52 +0000 (15:06 +1100)]
drm/fb-helper: don't sleep for screen unblank when an oops is in progress
Otherwise the system will burn even brighter and worse, leave the user
wondering what's going on exactly.
Since we already have a panic handler which will (try) to restore the
entire fbdev console mode, we can just bail out. Inspired by a patch from
Konstantin Khlebnikov. The callchain leading to this, cut&pasted from
Konstantin's original patch:
Note that the entire locking in the fb helper around panic/sysrq and kdbg
is ... non-existant. So we have a decent change of blowing up
everything. But since reworking this ties in with funny concepts like the
fbdev notifier chain or the impressive things which happen around
console_lock while oopsing, I'll leave that as an exercise for braver
souls than me.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Airlie <airlied@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: use vm_unmapped_area() on powerpc architecture
Update the powerpc slice_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Tested-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: remove free_area_cache use in powerpc architecture
As all other architectures have been converted to use vm_unmapped_area(),
we are about to retire the free_area_cache.
This change simply removes the use of that cache in
slice_get_unmapped_area(), which will most certainly have a
performance cost. Next one will convert that function to use the
vm_unmapped_area() infrastructure and regain the performance.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wen Congyang [Wed, 20 Mar 2013 04:06:51 +0000 (15:06 +1100)]
x86: make 'mem=' option to work for efi platform
Current mem boot option only can work for non efi environment. If the
user specifies add_efi_memmap, it cannot work for efi environment. In the
efi environment, we call e820_add_region() to add the memory map. So we
can modify __e820_add_region() and the mem boot option can work for efi
environment.
Note: Only E820_RAM is limited, and BOOT_SERVICES_{CODE,DATA} are always
mapped(If its address >= mem_limit, the memory won't be freed in
efi_free_boot_services()).
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Rob Landley <rob@landley.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nathan Zimmer [Wed, 20 Mar 2013 04:06:51 +0000 (15:06 +1100)]
sound: convert snd_info_register() to use proc_create_data()
Convert snd_info_register to use proc_create_data instead of
create_proc_entry. This corrects a sparse warning introduced by "procfs:
Improve Scaling in proc" It is also a bit cleaner to let proc_create_data
set the ->data and ->proc_fops.
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jianguo Wu [Wed, 20 Mar 2013 04:06:50 +0000 (15:06 +1100)]
arch/x86/mm/init_64.c: fix build warning when CONFIG_MEMORY_HOTREMOVE=n
There is a warning while building kernel with
CONFIG_MEMORY_HOTPLUG=y && CONFIG_MEMORY_HOTREMOVE=n:
arch/x86/mm/init_64.c:1024: warning:kernel_physical_mapping_remove defined but not used
So move kernel_physical_mapping_remove() into "#ifdef
CONFIG_MEMORY_HOTREMOVE" block
Oleg Nesterov [Wed, 20 Mar 2013 04:06:50 +0000 (15:06 +1100)]
kthread: kill task_get_live_kthread()
task_get_live_kthread() looks confusing and unneeded. It does
get_task_struct() but only kthread_stop() needs this, it can be called
even if the calller doesn't have a reference when we know that this
kthread can't exit until we do kthread_stop().
kthread_park() and kthread_unpark() do not need get_task_struct(), the
callers already have the reference. And it can not help if we can race
with the exiting kthread anyway, kthread_park() can hang forever in this
case.
Change kthread_park() and kthread_unpark() to use to_live_kthread(),
change kthread_stop() to do get_task_struct() by hand and remove
task_get_live_kthread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Wed, 20 Mar 2013 04:06:50 +0000 (15:06 +1100)]
kthread: introduce to_live_kthread()
"k->vfork_done != NULL" with a barrier() after to_kthread(k) in
task_get_live_kthread(k) looks unclear, and sub-optimal because we load
->vfork_done twice.
All we need is to ensure that we do not return to_kthread(NULL). Add a
new trivial helper which loads/checks ->vfork_done once, this also looks
more understandable.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Wed, 20 Mar 2013 04:06:49 +0000 (15:06 +1100)]
thinkpad-acpi: kill hotkey_thread_mutex
hotkey_kthread() does try_to_freeze() under hotkey_thread_mutex.
We can simply kill this mutex, hotkey_poll_stop_sync() does not need to
serialize with hotkey_kthread(). When kthread_stop() returns the thread
is already dead, it called do_exit()->complete_vfork_done().
Reported-by: Artem Savkov <artem.savkov@gmail.com> Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Matthew Garrett <matthew.garrett@nebula.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Reviewed-by: Mandeep Singh Baines <msb@chromium.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sanjay Lal [Wed, 20 Mar 2013 04:06:49 +0000 (15:06 +1100)]
mips: define KVM_USER_MEM_SLOTS
ARCH=mips, config=fuloong2e_defconfig:
akpm3:/usr/src/25> make arch/mips/kernel/early_printk.o
...
CC arch/mips/kernel/asm-offsets.s
In file included from arch/mips/kernel/asm-offsets.c:20:
include/linux/kvm_host.h:334: error: `KVM_USER_MEM_SLOTS' undeclared here (not in a function)
Signed-off-by: Sanjay Lal <sanjayl@kymasys.com> Reported-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Josh Boyer [Wed, 20 Mar 2013 04:06:49 +0000 (15:06 +1100)]
kmsg: honor dmesg_restrict sysctl on /dev/kmsg
Originally, the addition of dmesg_restrict covered both the syslog
method of accessing dmesg, as well as /dev/kmsg itself. This was done
indirectly by security_syslog calling cap_syslog before doing any LSM
checks.
However, commit 12b3052c3ee ("capabilities/syslog: open code cap_syslog
logic to fix build failure") moved the code around and pushed the checks
into the caller itself. That seems to have inadvertently dropped the
checks for dmesg_restrict on /dev/kmsg. Most people haven't noticed
because util-linux dmesg(1) defaults to using the syslog method for access
in older versions. With util-linux 2.22 and a kernel newer than 3.5,
dmesg(1) defaults to reading directly from /dev/kmsg.
Fix this by making an explicit check in the devkmsg_open function.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=903192
Signed-off-by: Josh Boyer <jwboyer@redhat.com> Reported-by: Christian Kujau <lists@nerdbynature.de> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vladimir Davydov [Wed, 20 Mar 2013 04:06:48 +0000 (15:06 +1100)]
mqueue: sys_mq_open: do not call mnt_drop_write() if read-only
mnt_drop_write() must be called only if mnt_want_write() succeeded,
otherwise the mnt_writers counter will diverge.
mnt_writers counters are used to check if remounting FS as read-only is
OK, so after an extra mnt_drop_write() call, it would be impossible to
remount mqueue FS as read-only. Besides, on umount a warning would be
printed like this one:
[ 194.714880] =====================================
[ 194.719680] [ BUG: bad unlock balance detected! ]
[ 194.724488] 3.9.0-rc3 #5 Not tainted
[ 194.728159] -------------------------------------
[ 194.732958] a.out/12486 is trying to release lock (sb_writers) at:
[ 194.739355] [<ffffffff811b177f>] mnt_drop_write+0x1f/0x30
[ 194.744851] but there are no more locks to release!
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Doug Ledford <dledford@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexander Duyck [Wed, 20 Mar 2013 04:06:48 +0000 (15:06 +1100)]
dma-debug: update DMA debug API to better handle multiple mappings of a buffer
There were reports of the igb driver unmapping buffers without calling
dma_mapping_error. On closer inspection issues were found in the DMA
debug API and how it handled multiple mappings of the same buffer.
The issue I found is the fact that the debug_dma_mapping_error would only
set the map_err_type to MAP_ERR_CHECKED in the case that the was only one
match for device and device address. However in the case of non-IOMMU,
multiple addresses existed and as a result it was not setting this field
once a second mapping was instantiated. I have resolved this by changing
the search so that it instead will now set MAP_ERR_CHECKED on the first
buffer that matches the device and DMA address that is currently in the
state MAP_ERR_NOT_CHECKED.
A secondary side effect of this patch is that in the case of multiple
buffers using the same address only the last mapping will have a valid
map_err_type. The previous mappings will all end up with map_err_type set
to MAP_ERR_CHECKED because of the dma_mapping_error call in
debug_dma_map_page. However this behavior may be preferable as it means
you will likely only see one real error per multi-mapped buffer, versus
the current behavior of multiple false errors mer multi-mapped buffer.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Reviewed-by: Shuah Khan <shuah.khan@hp.com> Tested-by: Shuah Khan <shuah.khan@hp.com> Cc: Jakub Kicinski <kubakici@wp.pl> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexander Duyck [Wed, 20 Mar 2013 04:06:47 +0000 (15:06 +1100)]
dma-debug: fix locking bug in check_unmap()
In check_unmap() it is possible to get into a dead-locked state if
dma_mapping_error is called. The problem is that the bucket is locked in
check_unmap, and locked again by debug_dma_mapping_error which is called
by dma_mapping_error. To resolve that we must release the lock on the
bucket before making the call to dma_mapping_error.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Reviewed-by: Shuah Khan <shuah.khan@hp.com> Tested-by: Shuah Khan <shuah.khan@hp.com> Cc: Jakub Kicinski <kubakici@wp.pl> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
zone_end_pfn is "unsigned" (32 bits). Changing it to
"unsigned long" (64 bits) fixes the problem.
zone_end_pfn() was added recently in commit 108bcc96ef70 ("mm: add & use
zone_end_pfn() and zone_spans_pfn()")
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/linux/mmzone.h?id=108bcc96ef7047c02cad4d229f04da38186a3f3f
Signed-off-by: Russ Anderson <rja@sgi.com> Reported-by: George Beshers <gbeshers@sgi.com> Acked-by: Hedi Berriche <hedi@sgi.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Wed, 20 Mar 2013 04:06:47 +0000 (15:06 +1100)]
poweroff: change orderly_poweroff() to use schedule_work()
David said:
Commit 6c0c0d4d108 ("poweroff: fix bug in orderly_poweroff()")
apparently fixes one bug in orderly_poweroff(), but introduces
another. The comments on orderly_poweroff() claim it can be called
from any context - and indeed we call it from interrupt context in
arch/powerpc/platforms/pseries/ras.c for example. But since that
commit this is no longer safe, since call_usermodehelper_fns() is not
safe in interrupt context without the UMH_NO_WAIT option.
orderly_poweroff() can be used from any context but UMH_WAIT_EXEC is
sleepable. Move the "force" logic into __orderly_poweroff() and change
orderly_poweroff() to use the global poweroff_work which simply calls
__orderly_poweroff().
While at it, remove the unneeded "int argc" and change argv_split() to use
GFP_KERNEL.
We use the global "bool poweroff_force" to pass the argument, this can
obviously affect the previous request if it is pending/running. So we
only allow the "false => true" transition assuming that the pending "true"
should succeed anyway. If schedule_work() fails after that we know that
work->func() was not called yet, it must see the new value.
This means that orderly_poweroff() becomes async even if we do not run the
command and always succeeds, schedule_work() can only fail if the work is
already pending. We can export __orderly_poweroff() and change the
non-atomic callers which want the old semantics.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: David Gibson <david@gibson.dropbear.id.au> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Feng Hong <hongfeng@marvell.com> Cc: Kees Cook <keescook@chromium.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wanpeng Li [Wed, 20 Mar 2013 04:06:46 +0000 (15:06 +1100)]
mm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accouting
hugetlb_total_pages is used for overcommit calculations but the current
implementation considers only the default hugetlb page size (which is
either the first defined hugepage size or the one specified by
default_hugepagesz kernel boot parameter).
If the system is configured for more than one hugepage size, which is
possible since a137e1cc ("hugetlbfs: per mount huge page sizes") then the
overcommit estimation done by __vm_enough_memory() (resp. shown by
meminfo_proc_show) is not precise - there is an impression of more
available/allowed memory. This can lead to an unexpected ENOMEM/EFAULT
resp. SIGSEGV when memory is accounted.
Testcase:
boot: hugepagesz=1G hugepages=1
the default overcommit ratio is 50
before patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit: 55434168 kB
after patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit: 54909880 kB
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
wake_up_klogd() is useless when CONFIG_PRINTK=n because neither printk()
nor printk_sched() are in use and there are actually no waiter on log_wait
waitqueue. It should be a stub in this case for users like
bust_spinlocks().
Otherwise this results in this warning when CONFIG_PRINTK=n
and CONFIG_IRQ_WORK=n:
kernel/built-in.o In function `wake_up_klogd':
(.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'
To fix this, provide an off-case for wake_up_klogd() when CONFIG_PRINTK=n.
There is much more from console_unlock() and other console related code in
printk.c that should be moved under CONFIG_PRINTK. But for now, focus on
a minimal fix as we passed the merged window already.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: James Hogan <james.hogan@imgtec.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>