vmscan: memcg: always use swappiness of the reclaimed memcg

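As a rough illustration of the behaviour the patched section 5.3 describes, the following is a minimal user-space model, not kernel code. The names `Memcg`, `effective_swappiness` and `may_swap` are hypothetical and exist only for this sketch. It assumes that the root group mirrors /proc/sys/vm/swappiness, that every other group uses its own value, and that only limit reclaim treats a swappiness of 0 as a hard "never swap".

```python
from dataclasses import dataclass


@dataclass
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, for illustration only)."""
    name: str
    swappiness: int
    is_root: bool = False


def effective_swappiness(memcg: Memcg, global_swappiness: int) -> int:
    # Assumption: the root group's tunable corresponds to the global setting;
    # any other group overrides it with its own value.
    return global_swappiness if memcg.is_root else memcg.swappiness


def may_swap(memcg: Memcg, global_swappiness: int, limit_reclaim: bool) -> bool:
    # Assumption: only limit reclaim treats swappiness 0 as a strict ban on
    # swapping; in this model global reclaim never rules swapping out.
    if limit_reclaim and effective_swappiness(memcg, global_swappiness) == 0:
        return False
    return True


if __name__ == "__main__":
    group = Memcg("batch", swappiness=0)
    print(may_swap(group, global_swappiness=60, limit_reclaim=True))
    print(may_swap(group, global_swappiness=60, limit_reclaim=False))
```

In this model a group with swappiness 0 that hits its hard limit and has no file pages left has nothing to reclaim, which is the situation the documentation warns may end in the memcg OOM killer.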
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt

index 2622115276aa028266ae376ff9116401409f433a..c564882f49b19caf76e74b73fe1f05f20a1d4637 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -57,6 +57,7 @@ Brief summary of control files.
   memory.memsw.usage_in_bytes    # show current res_counter usage for memory+Swap
                                  (See 5.5 for details)
   memory.limit_in_bytes          # set/show limit of memory usage
+ memory.low_limit_in_bytes      # set/show low limit for memory reclaim
   memory.memsw.limit_in_bytes    # set/show limit of memory+Swap usage
   memory.failcnt                         # show the number of memory usage hits limits
   memory.memsw.failcnt           # show the number of memory+Swap hits limits
@@ -236,23 +237,26 @@ it by cgroup.
  2.5 Reclaim
  
  Each cgroup maintains a per cgroup LRU which has the same structure as
-global VM. When a cgroup goes over its limit, we first try
-to reclaim memory from the cgroup so as to make space for the new
-pages that the cgroup has touched. If the reclaim is unsuccessful,
-an OOM routine is invoked to select and kill the bulkiest task in the
-cgroup. (See 10. OOM Control below.)
-
-The reclaim algorithm has not been modified for cgroups, except that
-pages that are selected for reclaiming come from the per-cgroup LRU
-list.
-
-NOTE: Reclaim does not work for the root cgroup, since we cannot set any
-limits on the root cgroup.
-
-Note2: When panic_on_oom is set to "2", the whole system will panic.
-
-When oom event notifier is registered, event will be delivered.
-(See oom_control section)
+global VM. Cgroups can be reclaimed under two conditions:
+ - under global memory pressure, when all cgroups are reclaimed
+   proportionally to their LRU size in a round-robin fashion
+ - when a cgroup or its hierarchical parent (see 6. Hierarchy support)
+   hits its hard limit. If the reclaim is unsuccessful, an OOM routine is
+   invoked to select and kill the bulkiest task in the hierarchy. (See
+   10. OOM Control below.)
+
+Groups can also be protected from both global and limit reclaim by the
+low_limit_in_bytes knob. If the limit is non-zero, the reclaim logic
+skips groups (and their subgroups - see 6. Hierarchy support) which
+are below the low limit, as long as there is another eligible cgroup in
+the reclaimed hierarchy. If all groups which participate in the reclaim
+are under their low limits, then all of them are reclaimed and the low
+limit is ignored.
+
+Note: When panic_on_oom is set to "2", the whole system will panic.
+
+When an oom event notifier is registered, the event will be delivered to the
+root of the memory pressure which cannot be handled. (See oom_control section)
  
  2.6 Locking
  
@@ -270,6 +274,11 @@ When oom event notifier is registered, event will be delivered.
  
  2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
  
+WARNING: The current implementation lacks reclaim support. That means allocation
+        attempts will fail when close to the limit even if there is plenty of
+        kmem available for reclaim. That makes this option unusable in real
+        life, so DO NOT SELECT IT except for development purposes.
+
  With the Kernel memory extension, the Memory Controller is able to limit
  the amount of kernel memory used by the system. Kernel memory is fundamentally
  different than user memory, since it can't be swapped out, which makes it
@@ -453,15 +462,11 @@ About use_hierarchy, see Section 6.
  
  5.1 force_empty
    memory.force_empty interface is provided to make cgroup's memory usage empty.
-  You can use this interface only when the cgroup has no tasks.
    When writing anything to this
  
    # echo 0 > memory.force_empty
  
-  Almost all pages tracked by this memory cgroup will be unmapped and freed.
-  Some pages cannot be freed because they are locked or in-use. Such pages are
-  moved to parent (if use_hierarchy==1) or root (if use_hierarchy==0) and this
-  cgroup will be empty.
+  the cgroup will be reclaimed and as many pages reclaimed as possible.
  
    The typical use case for this interface is before calling rmdir().
    Because rmdir() moves all pages to parent, some out-of-use page caches can be
@@ -535,16 +540,13 @@ Note:
  
  5.3 swappiness
  
-Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
-Please note that unlike the global swappiness, memcg knob set to 0
-really prevents from any swapping even if there is a swap storage
-available. This might lead to memcg OOM killer if there are no file
-pages to reclaim.
+Overrides /proc/sys/vm/swappiness for the particular group. The tunable
+in the root cgroup corresponds to the global swappiness setting.
  
-Following cgroups' swappiness can't be changed.
-- root cgroup (uses /proc/sys/vm/swappiness).
-- a cgroup which uses hierarchy and it has other cgroup(s) below it.
-- a cgroup which uses hierarchy and not the root of hierarchy.
+Please note that, unlike during global reclaim, limit reclaim enforces
+that a swappiness of 0 really prevents any swapping, even if swap
+storage is available. This might trigger the memcg OOM killer if there
+are no file pages to reclaim.
  
  5.4 failcnt
  
@@ -754,7 +756,6 @@ You can disable the OOM-killer by writing "1" to memory.oom_control file, as:
  
         #echo 1 > memory.oom_control
  
-This operation is only allowed to the top cgroup of a sub-hierarchy.
  If OOM-killer is disabled, tasks under cgroup will hang/sleep
  in memory cgroup's OOM-waitqueue when they request accountable memory.