Michal Hocko [Thu, 22 May 2014 00:54:35 +0000 (10:54 +1000)]
memcg: allow setting low_limit
Export memory.low_limit_in_bytes knob with the same rules as the hard
limit represented by limit_in_bytes knob (e.g. no limit to be set for the
root cgroup). There is no memsw alternative for low_limit_in_bytes
because the primary motivation behind this limit is to protect the working
set of the group and so considering swap doesn't make much sense. There
is also no kmem variant exported because we do not have any easy way to
protect kernel allocations now.
Please note that the low limit might exceed the hard limit, which basically
means that the group is not reclaimable as long as there is another reclaim
target in the hierarchy under pressure.
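The note about a low limit above the hard limit can be checked with a little arithmetic. The following is only a toy sketch, not kernel code: the function names are made up for illustration, the values are in arbitrary small units rather than bytes, and it assumes that "under the low limit" means usage < low limit. It also ignores the fallback that applies when every group in the hierarchy is under its low limit.

```python
def is_protected(usage: int, low_limit: int) -> bool:
    # Assumption: a group is "under its low limit" when usage < low_limit.
    return usage < low_limit


def can_become_eligible(low_limit: int, hard_limit: int) -> bool:
    # The hard limit caps usage, so only usages 0..hard_limit are reachable.
    # The group can become a reclaim target only if at least one reachable
    # usage is not under the low limit.
    return any(not is_protected(u, low_limit) for u in range(hard_limit + 1))


print(can_become_eligible(low_limit=50, hard_limit=90))   # True
print(can_become_eligible(low_limit=100, hard_limit=90))  # False
```

With the low limit above the hard limit, every reachable usage stays under the low limit, so reclaim has to come from other groups for as long as there are any.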
Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michal Hocko [Thu, 22 May 2014 00:54:34 +0000 (10:54 +1000)]
memcg, mm: introduce lowlimit reclaim
Previous discussions have shown that soft limits cannot be reformed
(http://lwn.net/Articles/555249/). This series introduces an alternative
approach for protecting memory allocated to processes executing within a
memory cgroup controller. It is based on a new tunable that was discussed
with Johannes and Tejun during the kernel summit 2013 and at LSF 2014.
This patchset introduces a low limit that is functionally similar to a
minimum guarantee. Memcgs which are under their low limit are not
considered eligible for reclaim (both global and hard limit reclaim). The
exception is when all groups under the reclaimed hierarchy are below their
low limits, in which case all of them are considered eligible.
The previous version of the patchset, posted as an RFC
(http://marc.info/?l=linux-mm&m=138677140628677&w=2), suggested a hard
guarantee without any fallback. More discussions led me to reconsider
the default behavior and come up with a more relaxed one. The hard
requirement can be added later based on a use case which really requires
it. It would be controlled by a memory.reclaim_flags knob which would
specify whether to OOM or fall back (default) when all groups are below
their low limit.
The default value of the limit is 0 so all groups are eligible by default
and an interested party has to explicitly set the limit.
The primary use case is to protect an amount of memory allocated to a
workload without it being reclaimed by an unrelated activity. In some
cases this requirement can be fulfilled by mlock but it is not suitable
for many loads and generally requires application awareness. Such
application awareness can be complex. It effectively forbids the use of
memory overcommit as the application must explicitly manage memory
residency.
With the low limit, such workloads can be placed in a memcg with a low
limit that protects the estimated working set.
The hierarchical behavior of the lowlimit is described in the first patch.
The second patch allows setting the lowlimit. The last 2 patches clarify
documentation about the memcg reclaim in general (3rd patch) and low
limit (4th patch).
This patch (of 5)
This patch introduces low limit reclaim. The low_limit acts as a reclaim
protection because groups which are under their low_limit are considered
ineligible for reclaim. While the hardlimit protects from using more memory
than allowed, the lowlimit protects the group from getting below the memory
assigned to it due to external memory pressure.
More precisely, a group is considered eligible for reclaim under a
specific hierarchy, represented by its root, only if the group is above its
low limit and the same applies to all parents up the hierarchy to the
root. Nevertheless, the limit might still be ignored if all groups under
the reclaimed hierarchy are under their low limits. This favors avoiding
an OOM over protecting the memory.
Consider the following hierarchy with memory pressure coming from the
group A (hard limit reclaim; l = low_limit_in_bytes, u = usage_in_bytes,
h = limit_in_bytes):
            root_mem_cgroup
                     .
               _____/
              /
            A (l = 80 u=90 h=90)
           /
          /  \_________
         /             \
        B (l=0 u=50)    C (l=50 u=40)
                         \
                          D (l=0 u=30)
A and B are reclaimable but C and D are not (D is protected by C).
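The eligibility rule can be replayed on the hierarchy above with a small user-space model. This is a sketch for illustration only and not the kernel implementation: the Group class and the helper names are hypothetical, and it assumes that "under the low limit" means usage < low limit and that the walk up the parents includes the root of the reclaimed hierarchy.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Group:
    name: str
    low: int     # l: low_limit_in_bytes
    usage: int   # u: usage_in_bytes
    parent: Optional["Group"] = None
    children: List["Group"] = field(default_factory=list)

    def add(self, child: "Group") -> "Group":
        child.parent = self
        self.children.append(child)
        return child


def subtree(root: Group) -> Iterator[Group]:
    """Yield root followed by all of its descendants."""
    yield root
    for child in root.children:
        yield from subtree(child)


def within_low_limit(group: Group, root: Group) -> bool:
    """True if group, or any parent up to and including root, is under
    its low limit (assumed here to mean usage < low)."""
    g: Optional[Group] = group
    while g is not None:
        if g.usage < g.low:
            return True
        if g is root:
            break
        g = g.parent
    return False


def reclaimable(root: Group) -> List[str]:
    """Names of the groups eligible for reclaim under root."""
    groups = list(subtree(root))
    eligible = [g.name for g in groups if not within_low_limit(g, root)]
    if not eligible:
        # Fallback: everybody is under the low limit, so ignore the limit
        # instead of leaving reclaim with nothing to do.
        eligible = [g.name for g in groups]
    return eligible


a = Group("A", low=80, usage=90)
b = a.add(Group("B", low=0, usage=50))
c = a.add(Group("C", low=50, usage=40))
d = c.add(Group("D", low=0, usage=30))

print(reclaimable(a))  # ['A', 'B']
```

If A's usage dropped below its low limit of 80, every group in the model would count as protected and the fallback branch would make all four groups eligible again.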
The low_limit is 0 by default so every group is eligible. This patch
doesn't provide a way to set the limit yet although the core
infrastructure is there already.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Thu, 22 May 2014 00:54:33 +0000 (10:54 +1000)]
kernel/kprobes.c: convert printk to pr_foo()
Also fixes some checkpatch warnings:
- Static initialization
- Lines over 80 characters
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kernel/watchdog.c: In function `watchdog_timer_fn':
kernel/watchdog.c:368:4: warning: `smp_mb__after_clear_bit' is deprecated (declared at include/linux/bitops.h:48) [-Wdeprecated-declarations]
smp_mb__after_clear_bit();
That code was introduced in commit 90e6b763ca8a5eb739e59489f42d45e13431d157
("kernel/watchdog.c: print traces for all cpus on lockup detection") and then
merged with another branch containing commit febdbfe8a91ce0d11939d4940b592eb0dba8d663 ("arch: Prepare for
smp_mb__{before,after}_atomic()") which deprecates the
smp_mb__after_clear_bit() call in favour of smp_mb__after_atomic().
Signed-off-by: Jan Moskyto Matejka <mq@suse.cz> Acked-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aaron Tomlin <atomlin@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Don Zickus <dzickus@redhat.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Aaron Tomlin [Thu, 22 May 2014 00:54:31 +0000 (10:54 +1000)]
kernel/watchdog.c: print traces for all cpus on lockup detection
A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period of time, without giving
other tasks a chance to run.
Currently, upon detection of this condition by the per-cpu watchdog task,
debug information (including a stack trace) is sent to the system log.
On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e. the owner/holder of the contended resource) is
reported to the user. Often this information has proven to be
insufficient to assist debugging efforts.
To avoid loss of useful debug information, for architectures which support
NMI, this patch makes it possible to improve soft lockup reporting. This
is accomplished by issuing an NMI to each cpu to obtain a stack trace.
If NMI is not supported we just fall back to the old method. A sysctl
and a boot-time parameter are available to toggle this feature.
[dzickus@redhat.com: add CONFIG_SMP in certain areas] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
rtc: s5m: consolidate two device type switch statements
In probe, the driver configuration for the different chipsets was done in
two switch (pdata->device_type) statements. Consolidate them into one
switch statement to improve code readability.
Additionally check the return value of regmap_irq_get_virq and exit probe
on error.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add support for S2MPS14 to the rtc-s5m driver. Differences in S2MPS14
(in comparison to S5M8767):
- Layout of registers;
- Lack of century support for time and alarms (7 registers used for
storing time/alarm);
- Two buffer control registers: WUDR and RUDR;
- No register for enabling writing time;
- RTC interrupts are reported in main PMIC I2C device;
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Prepare for adding support for S2MPS14 RTC device to the
rtc-s5m driver:
1. Add a map of registers used by the driver which differ between
the chipsets (S5M876X and S2MPS14).
2. Move the code checking for a pending alarm to a separate function.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Set the time needed for updating alarm and time registers to 0.45 ms.
The default is 7.32 ms which is too long and leads to warnings when
setting alarm or time:
s5m-rtc: waiting for UDR update, reached max number of retries
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
rtc: s5m: remove undocumented time init on first boot
Remove the code for initializing time if this is the first boot.
The code for detecting the first boot uses the undocumented field RTC_TCON
in the RTC_UDR_CON register. According to the S5M8767 datasheet this field
is reserved. On S2MPS14 it is not documented at all. On the device's first
boot the registers will be initialized with the reset value
(2000-01-01 00:00:00).
The code might work on S5M8763 but still this does not look like a task
for an RTC driver.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Prepare for adding support for S2MPS14 RTC device to the rtc-s5m driver:
1. Rename SEC* symbols to S5M.
2. Add S5M prefix to some of the defines which differ between S5M876X
and S2MPS14.
This is only a rename-like patch; no new code is added.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Lee Jones <lee.jones@linaro.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Thu, 22 May 2014 00:54:24 +0000 (10:54 +1000)]
mm/page_io.c: work around gcc bug
gcc-4.4.4 (at least) screws up this initialization.
mm/page_io.c: In function '__swap_writepage':
mm/page_io.c:277: error: unknown field 'bvec' specified in initializer
mm/page_io.c:278: warning: excess elements in struct initializer
mm/page_io.c:278: warning: (near initialization for 'from')