Merge remote-tracking branch 'mmc/mmc-next'

diff --git a/Documentation/accounting/taskstats.txt b/Documentation/accounting/taskstats.txt
index efd8f605bcd52fdf6b4d3fde71bd416593ea0006..ff06b738bb88065b28f6006c7a6381d5af7d57dd 100644
--- a/Documentation/accounting/taskstats.txt
+++ b/Documentation/accounting/taskstats.txt
@@ -26,20 +26,28 @@ leader - a process is deemed alive as long as it has any task belonging to it.
  Usage
  -----
  
-To get statistics during task's lifetime, userspace opens a unicast netlink
+To get statistics during a task's lifetime, userspace opens a unicast netlink
  socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
  The response contains statistics for a task (if pid is specified) or the sum of
  statistics for all tasks of the process (if tgid is specified).
  
-To obtain statistics for tasks which are exiting, userspace opens a multicast
-netlink socket. Each time a task exits, its per-pid statistics is always sent
-by the kernel to each listener on the multicast socket. In addition, if it is
-the last thread exiting its thread group, an additional record containing the
-per-tgid stats are also sent. The latter contains the sum of per-pid stats for
-all threads in the thread group, both past and present.
+To obtain statistics for tasks which are exiting, the userspace listener
+sends a register command and specifies a cpumask. Whenever a task exits on
+one of the cpus in the cpumask, its per-pid statistics are sent to the
+registered listener. Using cpumasks allows the data received by one listener
+to be limited and assists in flow control over the netlink interface; this is
+explained in more detail below.
+
+If the exiting task is the last thread exiting its thread group,
+an additional record containing the per-tgid stats is also sent to userspace.
+The latter contains the sum of per-pid stats for all threads in the thread
+group, both past and present.
  
  getdelays.c is a simple utility demonstrating usage of the taskstats interface
-for reporting delay accounting statistics.
+for reporting delay accounting statistics. Users can register cpumasks,
+send commands and process responses, listen for per-tid/tgid exit data,
+write the data received to a file and do basic flow control by increasing
+receive buffer sizes.
  
  Interface
  ---------
@@ -66,10 +74,20 @@ The messages are in the format
  
  The taskstats payload is one of the following three kinds:
  
-1. Commands: Sent from user to kernel. The payload is one attribute, of type
-TASKSTATS_CMD_ATTR_PID/TGID, containing a u32 pid or tgid in the attribute
-payload. The pid/tgid denotes the task/process for which userspace wants
-statistics.
+1. Commands: Sent from user to kernel. Commands to get data on
+a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
+containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
+the task/process for which userspace wants statistics.
+
+Commands to register/deregister interest in exit data from a set of cpus
+consist of one attribute, of type
+TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
+attribute payload. The cpumask is specified as an ascii string of
+comma-separated cpu ranges, e.g. to listen to exit data from cpus 1,2,3,5,7,8
+the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest
+in cpus before closing the listening socket, the kernel cleans up its interest
+set over time. However, for the sake of efficiency, an explicit deregistration
+is advisable.
  
  2. Response for a command: sent from the kernel in response to a userspace
  command. The payload is a series of three attributes of type:
@@ -78,9 +96,9 @@ a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload but indicates
  a pid/tgid will be followed by some stats.
  
  b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
-is being returned.
+are being returned.
  
-c) TASKSTATS_TYPE_STATS: attribute with a struct taskstsats as payload. The
+c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
  same structure is used for both per-pid and per-tgid stats.
  
  3. New message sent by kernel whenever a task exits. The payload consists of a
@@ -104,12 +122,12 @@ of atomicity).
  
  However, maintaining per-process, in addition to per-task stats, within the
  kernel has space and time overheads. To address this, the taskstats code
-accumalates each exiting task's statistics into a process-wide data structure.
-When the last task of a process exits, the process level data accumalated also
+accumulates each exiting task's statistics into a process-wide data structure.
+When the last task of a process exits, the process level data accumulated also
  gets sent to userspace (along with the per-task data).
  
  When a user queries to get per-tgid data, the sum of all other live threads in
-the group is added up and added to the accumalated total for previously exited
+the group is added up and added to the accumulated total for previously exited
  threads of the same thread group.
  
  Extending taskstats
@@ -138,4 +156,26 @@ struct too much, requiring disparate userspace accounting utilities to
  unnecessarily receive large structures whose fields are of no interest, then
  extending the attributes structure would be worthwhile.
  
+Flow control for taskstats
+--------------------------
+
+When the rate of task exits becomes large, a listener may not be able to keep
+up with the kernel's rate of sending per-tid/tgid exit data, leading to data
+loss. This possibility gets compounded when the taskstats structure gets
+extended and the number of cpus grows large.
+
+To avoid losing statistics, userspace should do one or more of the following:
+
+- increase the receive buffer sizes for the netlink sockets opened by
+listeners to receive exit data.
+
+- create more listeners and reduce the number of cpus being listened to by
+each listener. In the extreme case, there could be one listener for each cpu.
+Users may also consider setting the cpu affinity of the listener to the subset
+of cpus to which it listens, especially if they are listening to just one cpu.
+
+Despite these measures, if userspace receives ENOBUFS error messages
+indicating overflow of receive buffers, it should take measures to handle the
+loss of data.
+
  ----
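
Note on the cpumask string format: the patch describes the register/deregister
payload as an ascii string of comma-separated cpu ranges ("1-3,5,7-8" for cpus
1,2,3,5,7,8) and suggests spreading cpus over several listeners for flow
control. The following is a minimal sketch of userspace helpers for building
and reading such strings. It is not taken from getdelays.c or the kernel; the
function names format_cpumask, parse_cpumask and split_for_listeners are
hypothetical, and the sketch assumes only the plain "N" and "N-M" forms shown
in the example above.

```python
def format_cpumask(cpus):
    """Collapse cpu numbers into a range string such as "1-3,5,7-8"."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(
        str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges
    )


def parse_cpumask(text):
    """Expand a range string such as "1-3,5,7-8" into a sorted cpu list."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue                     # tolerate stray commas
        lo, _, hi = part.partition("-")
        first = int(lo)
        last = int(hi) if hi else first
        if last < first:
            raise ValueError(f"descending range: {part!r}")
        cpus.update(range(first, last + 1))
    return sorted(cpus)


def split_for_listeners(cpus, n_listeners):
    """Divide cpus into contiguous chunks, one cpumask string per listener.

    Assumption: each listener would send its own string in a register
    command; how many listeners to use is left to the caller.
    """
    ordered = sorted(set(cpus))
    if not ordered:
        return []
    n = max(1, min(n_listeners, len(ordered)))
    size, extra = divmod(len(ordered), n)
    masks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        masks.append(format_cpumask(ordered[start:end]))
        start = end
    return masks
```

With these helpers, split_for_listeners(range(8), 2) gives one mask per
listener ("0-3" and "4-7"), and asking for as many listeners as cpus gives the
extreme case of one listener per cpu mentioned in the flow control section.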