John Stultz [Fri, 2 Dec 2011 03:09:12 +0000 (14:09 +1100)]
merge_config.sh: fix bug in final check
Arnaud Lacombe pointed out that the final check that the requested configs
were included in the final .config was broken.
The example was that if you had a fragment that disabled
CONFIG_DECOMPRESS_GZIP applied to a normal defconfig, there would be no
final warning that CONFIG_DECOMPRESS_GZIP was actually set in the final
.config.
This bug was introduced by me in v3 of the original patch, and the
following patch reverts the invalid change.
Signed-off-by: John Stultz <john.stultz@linaro.org> Reported-by: Arnaud Lacombe <lacombar@gmail.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Arnaud Lacombe <lacombar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Darren Hart [Fri, 2 Dec 2011 03:09:11 +0000 (14:09 +1100)]
merge_config.sh: whitespace cleanup
Fix whitespace usage in the clean_up routine.
Signed-off-by: Darren Hart <dvhart@linux.intel.com> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Darren Hart [Fri, 2 Dec 2011 03:09:11 +0000 (14:09 +1100)]
merge_config.sh: use signal names compatible with dash and bash
The SIGHUP SIGINT and SIGTERM names caused failures when running
merge_config.sh with the dash shell. Dropping the "SIG" component makes
the script work in both bash and dash.
Signed-off-by: Darren Hart <dvhart@linux.intel.com> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
john stultz [Fri, 2 Dec 2011 03:09:08 +0000 (14:09 +1100)]
kconfig: add merge_config.sh script
After noticing almost every distro has their own method of managing config
fragments, I went looking at some best practices, and wanted to try to
consolidate some of the different approaches so this fairly simple
infrastructure can be shared (and new distros/build systems don't have to
implement yet another config fragment merge script).
This script is most influenced by the Windriver tools used in the Yocto
Project, reusing some portions found there.
This script merges multiple config fragments, warning on any overridden
values. It then sets any unspecified values to their default, then
finally checks to make sure no specified value was dropped due to
unsatisfied dependencies.
I'm sure this implementation won't work for everyone, and I expect it will
need to evolve to adapt for various use cases. But I think it's a
reasonable starting point.
Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Greg Thelen <gthelen@google.com> Cc: <tartler@cs.fau.de> Cc: Dmitry Fink <Dmitry.Fink@palm.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Bruce Ashfield <Bruce.Ashfield@windriver.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
tick-sched: add specific do_timer_cpu value for nohz off mode
Show and modify the tick_do_timer_cpu via sysfs. This determines the cpu
on which global time (jiffies) updates occur. Modification can only be
done on systems with nohz mode turned off.
While not necessarily harmful, doing jiffies updates on an application cpu
does cause some extra overhead that HPC benchmarking people notice. They
prefer to have OS activity isolated to certain cpus. They like
reproducibility of results, and having jiffies updates bouncing around
introduces variability.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Garrett [Fri, 2 Dec 2011 03:08:46 +0000 (14:08 +1100)]
hrtimers: Special-case zero length sleeps
sleep(0) is a common construct used by applications that want to trigger
the scheduler. sched_yield() might make more sense, but only appeared in
POSIX.1-2001 and so plenty of example code still uses the sleep(0) form.
This wouldn't normally be a problem, but it means that event-driven
applications that are merely trying to avoid starving other processes may
actually end up sleeping due to having large timer_slack values. Special-
casing this seems reasonable.
Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
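The special case described above can be sketched in userspace C. This is only an illustration of the idea, not the kernel change itself: the helper name effective_slack_ns() and its parameters are hypothetical, and the real code works on the task's timer slack inside the hrtimer sleep path.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: choose the slack (in nanoseconds) to apply to a
 * sleep request. Not a kernel function; names are for illustration only.
 */
static unsigned long effective_slack_ns(const struct timespec *req,
                                        unsigned long timer_slack_ns)
{
    assert(req != NULL);

    /* A zero-length request is treated as a yield, so no slack is added. */
    if (req->tv_sec == 0 && req->tv_nsec == 0)
        return 0;

    /* Any other request keeps the caller's configured slack. */
    return timer_slack_ns;
}
```

With this shape, a sleep(0)-style request never inherits a large timer_slack value, while ordinary sleeps are unaffected.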
Witold Baryluk [Fri, 2 Dec 2011 03:08:46 +0000 (14:08 +1100)]
intel-iommu: Fix __init section mismatch of dmar_parse_rmrr_atsr_dev
dmar_parse_rmrr_atsr_dev() (drivers/iommu/dmar.c) is called from
dmar_dev_scope_init() (drivers/iommu/intel-iommu.c), but
dmar_dev_scope_init() is annotated with __init, while
dmar_parse_rmrr_atsr_dev() is not, causing full section mismatch
analysis to abort compilation.
Fix the problem by adding an __init annotation to dmar_parse_rmrr_atsr_dev().
Signed-off-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Allen Kay <allen.m.kay@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mathias Krause [Fri, 2 Dec 2011 03:08:46 +0000 (14:08 +1100)]
arm, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ERROR: do not use assignment in if condition
#59: FILE: drivers/platform/x86/sony-laptop.c:384:
+ if ((scancode = sony_laptop_input_index[event]) != -1) {
total: 1 errors, 0 warnings, 39 lines checked
./patches/drivers-platform-x86-sony-laptopc-fix-scancodes.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
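For readers unfamiliar with this checkpatch error, the usual remedy is to hoist the assignment out of the condition. The sketch below shows that shape in standalone C; the table example_input_index and the function report_scancode() are hypothetical stand-ins, not the sony-laptop driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event-to-scancode table; -1 marks "no scancode". */
static const int example_input_index[] = { -1, 0x10, 0x14, -1 };

/* Store the scancode for an event in *out; return false if there is none. */
static bool report_scancode(size_t event, int *out)
{
    int scancode;

    assert(event < sizeof(example_input_index) / sizeof(example_input_index[0]));
    assert(out != NULL);

    /* Assign first, then test: the form checkpatch accepts. */
    scancode = example_input_index[event];
    if (scancode != -1) {
        *out = scancode;
        return true;
    }
    return false;
}
```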
John Hughes [Fri, 2 Dec 2011 03:06:58 +0000 (14:06 +1100)]
drivers/platform/x86/sony-laptop.c: fix scancodes
The scancodes returned by the sony-laptop driver for function keys did not
match the scancodes used to remap keys. Also, since the scancode was sent
to the input subsystem after the mapped keysym the /lib/udev/keymap
utility was confused about which scancode to report for which keysym.
This patch fixes the driver so the correct scancode is shown for each key.
It also adds to the documentation a description of where to find the
scancodes.
Before the patch FN/E returned scancode 0x1B, but to remap scancode 0x14
had to be used.
Signed-off-by: John Hughes <john@calva.com> Cc: Mattia Dongili <malattia@linux.it> Cc: Matthew Garrett <mjg@redhat.com> Acked-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bjorn Helgaas [Fri, 2 Dec 2011 03:06:58 +0000 (14:06 +1100)]
x86: mpparse: account for bus types other than ISA and PCI
In commit f8924e770e04 ("x86: unify mp_bus_info"), the 32-bit and 64-bit
versions of MP_bus_info were rearranged to match each other better.
Unfortunately it introduced a regression: prior to that change we used to
always set the mp_bus_not_pci bit, then clear it if we found a PCI bus.
After it, we set mp_bus_not_pci for ISA buses, clear it for PCI buses, and
leave it alone otherwise.
In the cases of ISA and PCI, there's not much difference. But ISA is not
the only non-PCI bus, so it's better to always set mp_bus_not_pci and
clear it only for PCI.
Without this change, Dan's Dell PowerEdge 4200 panics on boot with a log
indicating interrupt routing trouble unless the "noapic" option is
supplied. With this change, the machine boots reliably without "noapic".
Fixes http://bugs.debian.org/586494
[jrnieder@gmail.com: clarified commit message] Reported-bisected-and-tested-by: Dan McGrath <troubledaemon@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> # 2.6.26+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
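The ordering described above (set the flag unconditionally, then clear it only for PCI) can be sketched as follows. This is a simplified userspace model: the array bus_not_pci[], the MAX_BUSES limit and record_bus() are assumptions made for illustration, whereas the kernel keeps this state in a bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BUSES 8   /* illustrative size only */

/* true means "this bus is not PCI" */
static bool bus_not_pci[MAX_BUSES];

/* Classify a bus by its type string, in the order the commit describes. */
static void record_bus(unsigned int busid, const char *bustype)
{
    assert(busid < MAX_BUSES);
    assert(bustype != NULL);

    /* Default: assume every bus is non-PCI ... */
    bus_not_pci[busid] = true;

    /* ... and clear the flag only when the bus really is PCI. */
    if (strncmp(bustype, "PCI", 3) == 0)
        bus_not_pci[busid] = false;
}
```

Any bus type that is neither ISA nor PCI (EISA, for example) ends up flagged as non-PCI instead of being left in whatever state it had before.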
Kautuk Consul [Fri, 2 Dec 2011 03:06:57 +0000 (14:06 +1100)]
mm/vmalloc.c: eliminate extra loop in pcpu_get_vm_areas error path
If either the vas or the vms array is not successfully kzalloc'ed, the
code jumps to the err_free label.
The err_free label runs a loop to check and free each of the array members
of the vas and vms arrays, which is not required in this situation as none
of the array members have been allocated at this point.
Eliminate the extra loop we have to go through by introducing a new label
err_free2 and then jumping to it.
Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
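The two-label error path can be sketched in userspace C with calloc() standing in for kzalloc(). The struct area type and alloc_areas() are hypothetical; only the label names err_free and err_free2 come from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the objects tracked in vas[] and vms[]. */
struct area { int id; };

/* Allocate both arrays and their members; return 0 on success, -1 on failure. */
static int alloc_areas(size_t nr, struct area ***vas_out, struct area ***vms_out)
{
    struct area **vas, **vms;
    size_t i;

    assert(nr > 0);

    vas = calloc(nr, sizeof(*vas));
    vms = calloc(nr, sizeof(*vms));
    if (!vas || !vms)
        goto err_free2;   /* no members exist yet: skip the per-member loop */

    for (i = 0; i < nr; i++) {
        vas[i] = calloc(1, sizeof(**vas));
        vms[i] = calloc(1, sizeof(**vms));
        if (!vas[i] || !vms[i])
            goto err_free;
    }

    *vas_out = vas;
    *vms_out = vms;
    return 0;

err_free:
    /* Members may be partially allocated; free(NULL) is a no-op. */
    for (i = 0; i < nr; i++) {
        free(vas[i]);
        free(vms[i]);
    }
err_free2:
    free(vas);
    free(vms);
    return -1;
}
```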
The problem is that in copy_page_range() we turn lazy mode on, and then in
swap_entry_free() we call swap_count_continued() which ends up in:
map = kmap_atomic(page, KM_USER0) + offset;
and then later we touch *map.
Since we are running in batched mode (lazy) we don't actually set up the
PTE mappings and the kmap_atomic is not done synchronously and ends up
trying to dereference a page that has not been set.
Looking at kmap_atomic_prot_pfn(), it uses 'arch_flush_lazy_mmu_mode' and
doing the same in kmap_atomic_prot() and __kunmap_atomic() makes the problem
go away.
Interestingly, commit b8bcfe997e4615 ("x86/paravirt: remove lazy mode in
interrupts") removed part of this to fix an interrupt issue - but it went
too far and did not consider this scenario.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jack Steiner [Fri, 2 Dec 2011 03:06:56 +0000 (14:06 +1100)]
x86: reduce clock calibration time during slave cpu startup
Reduce the startup time for slave cpus.
Adds hooks for an arch-specific function for clock calibration. These
hooks are used on x86. If a newly started cpu has the same phys_proc_id
as a core already active, uses the TSC for the delay loop and has a
CONSTANT_TSC, use the already-calculated value of loops_per_jiffy.
This patch reduces the time required to start slave cpus on a 4096 cpu
system from:
  465 sec  OLD
   62 sec  NEW
This reduces boot time on a 4096p system by almost 7 minutes. Nice...
[akpm@linux-foundation.org: fix CONFIG_SMP=n build] Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shaohua Li [Fri, 2 Dec 2011 03:06:56 +0000 (14:06 +1100)]
x86: tlb flush avoid superfluous leave_mm()
If only a single page's virtual address needs to be flushed from the TLB
and the current task is in lazy TLB state, calling leave_mm() is superfluous
because it flushes the whole TLB. Avoiding it can reduce TLB misses.
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer
The last parameter to sort() is a pointer to the function used to swap
items. This parameter should be NULL, not 0, when not used. This quiets
the following sparse warning:
warning: Using plain integer as NULL pointer
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ludwig Nussel [Fri, 2 Dec 2011 03:06:54 +0000 (14:06 +1100)]
x86: fix mmap random address range
On x86_32 casting the unsigned int result of get_random_int() to long may
result in a negative value. On x86_32 the range of mmap_rnd() therefore
was -255 to 255. The 32bit mode on x86_64 used 0 to 255 as intended.
The bug was introduced by 675a081 ("x86: unify mmap_{32|64}.c") in January
2008.
Signed-off-by: Ludwig Nussel <ludwig.nussel@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
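The sign problem can be reproduced in portable C by using int32_t to model a 32-bit long. The two functions below are an illustrative sketch, not the kernel's mmap_rnd(); their names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* int32_t models a 32-bit `long`, as on x86_32. */

/* Pre-fix shape: the random value is cast to a signed type before `%`. */
static int32_t mmap_rnd_signed(uint32_t random)
{
    /*
     * Converting an out-of-range unsigned value to a signed type is
     * implementation-defined; common compilers wrap, so values with the
     * top bit set become negative and `%` then yields -255..0.
     */
    return (int32_t)random % (1 << 8);
}

/* Post-fix shape: keep the arithmetic unsigned so the result is 0..255. */
static uint32_t mmap_rnd_unsigned(uint32_t random)
{
    uint32_t rnd = random % (1u << 8);

    assert(rnd <= 255);
    return rnd;
}
```

On a compiler that wraps the conversion, an input with the top bit set gives a negative result from the signed variant, while the unsigned variant stays within the intended range.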
Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andres Salomon <dilinger@queued.net> Cc: Daniel Drake <dsd@laptop.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Garrett <mjg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Daniel Drake [Fri, 2 Dec 2011 03:06:53 +0000 (14:06 +1100)]
x86, olpc-xo15-sci: enable lid close wakeup control through sysfs
Like most systems, OLPC's ACPI LID switch wakes up the system when the lid
is opened, but not when it is closed.
Under OLPC's opportunistic suspend model, the lid may be closed while the
system was opportunistically suspended with the screen running. In this
event, we want to wake up to turn the screen off.
Enable control of normal ACPI wakeups through lid close events through a
new sysfs attribute "lid_wake_on_closed". When set, and when LID wakeups
are enabled through ACPI, the system will wake up on both open and close
lid events.
Signed-off-by: Daniel Drake <dsd@laptop.org> Cc: Andres Salomon <dilinger@queued.net> Cc: Matthew Garrett <mjg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shérab [Fri, 2 Dec 2011 03:06:53 +0000 (14:06 +1100)]
arch/x86/platform/iris/iris.c: register a platform device and a platform driver
This makes the iris driver use the platform API, so it is properly exposed
in /sys.
[akpm@linux-foundation.org: remove commented-out code, add missing space to printk, clean up code layout] Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org> Cc: Len Brown <lenb@kernel.org> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peter Feuerer [Fri, 2 Dec 2011 03:06:52 +0000 (14:06 +1100)]
acerhdf: lowered default temp fanon/fanoff values
Due to new supported hardware, of which the actual temperature limits of
processor, harddisk and other components are unknown, it feels safer with
lower fanon / fanoff settings.
It won't change much for most people already using acerhdf, as they set
their own fanon/fanoff values when loading the module.
Furthermore, it seems the kernel and userspace tools have been improved to
work more efficiently, so netbooks don't get as hot anymore.
Signed-off-by: Peter Feuerer <peter@piie.net> Acked-by: Borislav Petkov <petkovbb@gmail.com> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alex Bligh [Fri, 2 Dec 2011 03:06:51 +0000 (14:06 +1100)]
net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy
Problem:
A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.
A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.
Analysis:
The oops is called from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
oopses because net->nfnl is NULL.
The perl program generates the container through fork() then
clone(NS_NEWNET). It does not
explicitly set up netlink, but I presume it was set up else net->nfnl
would have been NULL earlier (i.e. when an earlier connection
timed out). This would thus suggest that net->nfnl is made NULL
during the destruction of the container, which I think is done by
nfnetlink_net_exit_batch.
I can see that the various subsystems are deinitialised in the opposite
order to which the relevant register_pernet_subsys calls are called,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nfconntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.
Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).
Patch:
The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the above perl script for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.
Applicability:
If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven that net->nfnl will never be NULL.
Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on
container destroy
Signed-off-by: Alex Bligh <alex@alex.org.uk> Cc: Patrick McHardy <kaber@trash.net> Cc: David Miller <davem@davemloft.net> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
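The early check proposed above has roughly the following shape. The structures here are heavily reduced, hypothetical stand-ins (sock_stub, net_stub) and conntrack_event() is not the kernel function; the sketch only shows the guard before the dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct sock_stub { int listeners; };
struct net_stub  { struct sock_stub *nfnl; };

/* Return the number of events delivered (0 or 1). */
static int conntrack_event(const struct net_stub *net)
{
    assert(net != NULL);

    /* Namespace is being torn down and netlink is already gone: drop the event. */
    if (net->nfnl == NULL)
        return 0;

    /* Safe to dereference now. */
    return net->nfnl->listeners > 0 ? 1 : 0;
}
```

As the message notes, a check like this does not by itself close a race in which the pointer becomes NULL right after the test.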
David Rientjes [Fri, 2 Dec 2011 03:06:51 +0000 (14:06 +1100)]
cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread. This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.
This stall is unnecessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes. This was addressed by
89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind. To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.
This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.
Also moves the check for nodemask intersection inside task_lock() so that
tsk->mems_allowed cannot change. This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk->mempolicy.
Reported-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Paul Menage <paul@paulmenage.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
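The resulting decision rule can be written down compactly. In this sketch a node mask is modelled as a 64-bit bitmap and update_needs_stall() is a hypothetical name; the kernel uses nodemask_t and takes task_lock() around the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A node mask modelled as a 64-bit bitmap (illustrative only). */
typedef uint64_t nodemask_bits;

/* Decide whether an update to mems_allowed must wait for readers. */
static bool update_needs_stall(nodemask_bits old_mask, nodemask_bits new_mask,
                               bool has_mempolicy)
{
    assert(new_mask != 0);   /* an empty target mask is not modelled here */

    /* A mempolicy may remap nodes during rebind, so always stall. */
    if (has_mempolicy)
        return true;

    /* Otherwise stall only when no node survives the update. */
    return (old_mask & new_mask) == 0;
}
```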
Michal Hocko [Fri, 2 Dec 2011 03:06:50 +0000 (14:06 +1100)]
mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks
setup_zone_migrate_reserve expects that zone->start_pfn starts
at pageblock_nr_pages aligned pfn otherwise we could access
beyond an existing memblock resulting in the following panic if
CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:
We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
highstart_pfn = 0x36ffe.
The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page in
a pageblock is reserved before marking it MIGRATE_RESERVE").
Make sure that start_pfn is always aligned to pageblock_nr_pages to ensure
that pfn_valid is always called at the start of each pageblock.
Architectures with holes in pageblocks will be correctly handled by
pfn_valid_within in pageblock_is_reserved.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Dang Bo <bdang@vmware.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
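The alignment step amounts to rounding the starting pfn up to a multiple of pageblock_nr_pages. The helper below is a standalone sketch with a hypothetical name, roundup_pfn(); a block size of 1024 pages in the usage note is an assumption, since the real value depends on the configuration.

```c
#include <assert.h>

/* Round pfn up to the next multiple of block (block must be non-zero). */
static unsigned long roundup_pfn(unsigned long pfn, unsigned long block)
{
    assert(block != 0);
    return ((pfn + block - 1) / block) * block;
}
```

Assuming 1024 pages per pageblock, a start of 0x36ffe would be rounded up to 0x37000, so the first pfn_valid() check lands on a pageblock boundary.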
Youquan Song [Fri, 2 Dec 2011 03:06:50 +0000 (14:06 +1100)]
thp: set compound tail page _count to zero
70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
page_tail->_count zero at all times. But the current kernel does not set
page_tail->_count to zero if a 1GB page is utilized. So when an IOMMU 1GB
page is used in KVM, it will result in a kernel oops because a tail page's
_count does not equal zero.
Signed-off-by: Youquan Song <youquan.song@intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Youquan Song [Fri, 2 Dec 2011 03:06:49 +0000 (14:06 +1100)]
thp: add compound tail page _mapcount when mapped
With the 3.2-rc kernel, IOMMU 2M pages in KVM work. But when I try to use
IOMMU 1GB pages in KVM, I encounter an oops and the 1GB pages totally fail
to be used. The root cause is that 1GB page allocation calls gup_huge_pud()
while 2M page allocation calls gup_huge_pmd(). If compound pages are used
and the page is a tail page, gup_huge_pmd() increases _mapcount to record
that the tail page is mapped, while gup_huge_pud() does not include this
step. So when the mapped page is released, it results in a kernel oops
because the page is not marked as mapped.
This patch adds tail-page handling for compound pages to the 1GB huge page
path, keeping it the same as the 2M page path.
Signed-off-by: Youquan Song <youquan.song@intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peter Zijlstra [Fri, 2 Dec 2011 03:06:49 +0000 (14:06 +1100)]
printk: avoid double lock acquire
Commit 4f2a8d3cf5e ("printk: Fix console_sem vs logbuf_lock unlock race")
introduced another silly bug where we would want to acquire an already
held lock. Avoid this.
Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
More players have joined memory cgroup development, and Johannes' great
work has changed the internal design of the memory cgroup dramatically. He
will do more work. Michal Hocko has done many bug fixes and knows the
memory cgroup very well. Daisuke Nishimura helped us very much, but he
seems busy now. Thanks to him for his work.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use wait_event_freezable_timeout() instead of
schedule_timeout_interruptible() to avoid missing freezer wakeups. A
try_to_freeze() would have been needed in the khugepaged_alloc_hugepage
tight loop too in case of the allocation failing repeatedly, and
wait_event_freezable_timeout will provide it too.
khugepaged would still freeze just fine by trying again the next minute
but it's better if it freezes immediately.
Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Jiri Slaby <jslaby@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: "Rafael J. Wysocki" <rjw@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A shrinker function can return -1, which means that it cannot do anything
without a risk of deadlock. For example, prune_super() does this if it
cannot grab a superblock reference, even if nr_to_scan=0. Currently we
interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan'
according to this. So the next time around this shrinker can cause really
big pressure. Let's skip such shrinkers instead.
Also make total_scan signed, otherwise the check (total_scan < 0) below
never works.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
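The skip can be illustrated with a small userspace model. Everything here is hypothetical (the count_fn callback type, the two sample callbacks and total_scan_work()); it only shows why the -1 return must be filtered out before it feeds into the scan total, and why the total is kept signed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: returns an object count, or -1 to opt out. */
typedef long (*count_fn)(void);

/* Sample callbacks used for illustration. */
static long ten_objects(void)    { return 10; }
static long would_deadlock(void) { return -1; }

/* Sum the scan work over all shrinkers, skipping any that report -1. */
static long total_scan_work(const count_fn *shrinkers, int nr)
{
    long total = 0;   /* signed, so comparisons against 0 are meaningful */
    int i;

    assert(shrinkers != NULL || nr == 0);

    for (i = 0; i < nr; i++) {
        long max_pass = shrinkers[i]();

        /* -1 or nothing to do: skip instead of treating it as ULONG_MAX. */
        if (max_pass <= 0)
            continue;
        total += max_pass;
    }
    return total;
}
```

Had the -1 been stored in an unsigned long, it would read back as ULONG_MAX and a later test such as (total_scan < 0) could never be true.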