From aa123a748ea552b18f0d4add823c29ddbddaf7b4 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 11 Apr 2017 09:17:08 -0700 Subject: [PATCH] doc: Update RCU data-structure documentation for rcu_segcblist The rcu_segcblist data structure, which contains segmented lists of RCU callbacks, was recently added. This commit updates the documentation accordingly. Signed-off-by: Paul E. McKenney --- .../Data-Structures/Data-Structures.html | 207 ++++++++++++------ .../RCU/Design/Data-Structures/nxtlist.svg | 34 +-- 2 files changed, 156 insertions(+), 85 deletions(-) diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index d583c653a703..2ab38ee420c5 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -19,6 +19,8 @@ to each other. The rcu_state Structure
  • The rcu_node Structure +
  • + The rcu_segcblist Structure
  • The rcu_data Structure
  • @@ -841,6 +843,134 @@ for lockdep lock-class names. Finally, lines 64-66 produce an error if the maximum number of CPUs is too large for the specified fanout. +

    +The rcu_segcblist Structure

    + +The rcu_segcblist structure maintains a segmented list of +callbacks as follows: + +
    + 1 #define RCU_DONE_TAIL        0
    + 2 #define RCU_WAIT_TAIL        1
    + 3 #define RCU_NEXT_READY_TAIL  2
    + 4 #define RCU_NEXT_TAIL        3
    + 5 #define RCU_CBLIST_NSEGS     4
    + 6
    + 7 struct rcu_segcblist {
    + 8   struct rcu_head *head;
    + 9   struct rcu_head **tails[RCU_CBLIST_NSEGS];
    +10   unsigned long gp_seq[RCU_CBLIST_NSEGS];
    +11   long len;
    +12   long len_lazy;
    +13 };
    +
    + +

    +The segments are as follows: + +

      +
    1. RCU_DONE_TAIL: Callbacks whose grace periods have elapsed. + These callbacks are ready to be invoked. +
    2. RCU_WAIT_TAIL: Callbacks that are waiting for the + current grace period. + Note that different CPUs can have different ideas about which + grace period is current, hence the ->gp_seq field. +
    3. RCU_NEXT_READY_TAIL: Callbacks waiting for the next + grace period to start. +
    4. RCU_NEXT_TAIL: Callbacks that have not yet been + associated with a grace period. +
    + +

    +The ->head pointer references the first callback or +is NULL if the list contains no callbacks (which is +not the same as being empty). +Each element of the ->tails[] array references the +->next pointer of the last callback in the corresponding +segment of the list, or the list's ->head pointer if +that segment and all previous segments are empty. +If the corresponding segment is empty but some previous segment is +not empty, then the array element is identical to its predecessor. +Older callbacks are closer to the head of the list, and new callbacks +are added at the tail. +This relationship between the ->head pointer, the +->tails[] array, and the callbacks is shown in this +diagram: + +

    nxtlist.svg + +

    In this figure, the ->head pointer references the +first +RCU callback in the list. +The ->tails[RCU_DONE_TAIL] array element references +the ->head pointer itself, indicating that none +of the callbacks is ready to invoke. +The ->tails[RCU_WAIT_TAIL] array element references callback +CB 2's ->next pointer, which indicates that +CB 1 and CB 2 are both waiting on the current grace period, +give or take possible disagreements about exactly which grace period +is the current one. +The ->tails[RCU_NEXT_READY_TAIL] array element +references the same RCU callback that ->tails[RCU_WAIT_TAIL] +does, which indicates that there are no callbacks waiting on the next +RCU grace period. +The ->tails[RCU_NEXT_TAIL] array element references +CB 4's ->next pointer, indicating that all the +remaining RCU callbacks have not yet been assigned to an RCU grace +period. +Note that the ->tails[RCU_NEXT_TAIL] array element +always references the last RCU callback's ->next pointer +unless the callback list is empty, in which case it references +the ->head pointer. + +

    +There is one additional important special case for the +->tails[RCU_NEXT_TAIL] array element: It can be NULL +when this list is disabled. +Lists are disabled when the corresponding CPU is offline or when +the corresponding CPU's callbacks are offloaded to a kthread, +both of which are described elsewhere. + +

    CPUs advance their callbacks from the +RCU_NEXT_TAIL to the RCU_NEXT_READY_TAIL to the +RCU_WAIT_TAIL to the RCU_DONE_TAIL list segments +as grace periods advance. + +

    The ->gp_seq[] array records grace-period +numbers corresponding to the list segments. +This is what allows different CPUs to have different ideas as to +which is the current grace period while still avoiding premature +invocation of their callbacks. +In particular, this allows CPUs that go idle for extended periods +to determine which of their callbacks are ready to be invoked after +reawakening. + +

    The ->len counter contains the number of +callbacks in ->head, and the +->len_lazy contains the number of those callbacks that +are known to only free memory, and whose invocation can therefore +be safely deferred. + +

    Important note: It is the ->len field that +determines whether or not there are callbacks associated with +this rcu_segcblist structure, not the ->head +pointer. +The reason for this is that all the ready-to-invoke callbacks +(that is, those in the RCU_DONE_TAIL segment) are extracted +all at once at callback-invocation time. +If callback invocation must be postponed, for example, because a +high-priority process just woke up on this CPU, then the remaining +callbacks are placed back on the RCU_DONE_TAIL segment. +Either way, the ->len and ->len_lazy counts +are adjusted after the corresponding callbacks have been invoked, and so +again it is the ->len count that accurately reflects whether +or not there are callbacks associated with this rcu_segcblist +structure. +Of course, off-CPU sampling of the ->len count requires +the use of appropriate synchronization, for example, memory barriers. +This synchronization can be a bit subtle, particularly in the case +of rcu_barrier(). +

    The rcu_data Structure

    @@ -983,62 +1113,18 @@ choice. as follows:
    - 1 struct rcu_head *nxtlist;
    - 2 struct rcu_head **nxttail[RCU_NEXT_SIZE];
    - 3 unsigned long nxtcompleted[RCU_NEXT_SIZE];
    - 4 long qlen_lazy;
    - 5 long qlen;
    - 6 long qlen_last_fqs_check;
    + 1 struct rcu_segcblist cblist;
    + 2 long qlen_last_fqs_check;
    + 3 unsigned long n_cbs_invoked;
    + 4 unsigned long n_nocbs_invoked;
    + 5 unsigned long n_cbs_orphaned;
    + 6 unsigned long n_cbs_adopted;
      7 unsigned long n_force_qs_snap;
    - 8 unsigned long n_cbs_invoked;
    - 9 unsigned long n_cbs_orphaned;
    -10 unsigned long n_cbs_adopted;
    -11 long blimit;
    + 8 long blimit;
     
    -

    The ->nxtlist pointer and the -->nxttail[] array form a four-segment list with -older callbacks near the head and newer ones near the tail. -Each segment contains callbacks with the corresponding relationship -to the current grace period. -The pointer out of the end of each of the four segments is referenced -by the element of the ->nxttail[] array indexed by -RCU_DONE_TAIL (for callbacks handled by a prior grace period), -RCU_WAIT_TAIL (for callbacks waiting on the current grace period), -RCU_NEXT_READY_TAIL (for callbacks that will wait on the next -grace period), and -RCU_NEXT_TAIL (for callbacks that are not yet associated -with a specific grace period) -respectively, as shown in the following figure. - -

    nxtlist.svg - -

    In this figure, the ->nxtlist pointer references the -first -RCU callback in the list. -The ->nxttail[RCU_DONE_TAIL] array element references -the ->nxtlist pointer itself, indicating that none -of the callbacks is ready to invoke. -The ->nxttail[RCU_WAIT_TAIL] array element references callback -CB 2's ->next pointer, which indicates that -CB 1 and CB 2 are both waiting on the current grace period. -The ->nxttail[RCU_NEXT_READY_TAIL] array element -references the same RCU callback that ->nxttail[RCU_WAIT_TAIL] -does, which indicates that there are no callbacks waiting on the next -RCU grace period. -The ->nxttail[RCU_NEXT_TAIL] array element references -CB 4's ->next pointer, indicating that all the -remaining RCU callbacks have not yet been assigned to an RCU grace -period. -Note that the ->nxttail[RCU_NEXT_TAIL] array element -always references the last RCU callback's ->next pointer -unless the callback list is empty, in which case it references -the ->nxtlist pointer. - -

    CPUs advance their callbacks from the -RCU_NEXT_TAIL to the RCU_NEXT_READY_TAIL to the -RCU_WAIT_TAIL to the RCU_DONE_TAIL list segments -as grace periods advance. +

    The ->cblist structure is the segmented callback list +described earlier. The CPU advances the callbacks in its rcu_data structure whenever it notices that another RCU grace period has completed. The CPU detects the completion of an RCU grace period by noticing @@ -1049,16 +1135,7 @@ Recall that each rcu_node structure's ->completed field is updated at the end of each grace period. -

    The ->nxtcompleted[] array records grace-period -numbers corresponding to the list segments. -This allows CPUs that go idle for extended periods to determine -which of their callbacks are ready to be invoked after reawakening. - -

    The ->qlen counter contains the number of -callbacks in ->nxtlist, and the -->qlen_lazy contains the number of those callbacks that -are known to only free memory, and whose invocation can therefore -be safely deferred. +

    The ->qlen_last_fqs_check and ->n_force_qs_snap coordinate the forcing of quiescent states from call_rcu() and friends when callback @@ -1069,6 +1146,10 @@ lists grow excessively long. fields count the number of callbacks invoked, sent to other CPUs when this CPU goes offline, and received from other CPUs when those other CPUs go offline. +The ->n_nocbs_invoked is used when the CPU's callbacks +are offloaded to a kthread. + +

    Finally, the ->blimit counter is the maximum number of RCU callbacks that may be invoked at a given time. diff --git a/Documentation/RCU/Design/Data-Structures/nxtlist.svg b/Documentation/RCU/Design/Data-Structures/nxtlist.svg index abc4cc73a097..0223e79c38e0 100644 --- a/Documentation/RCU/Design/Data-Structures/nxtlist.svg +++ b/Documentation/RCU/Design/Data-Structures/nxtlist.svg @@ -19,7 +19,7 @@ id="svg2" version="1.1" inkscape:version="0.48.4 r9939" - sodipodi:docname="nxtlist.fig"> + sodipodi:docname="segcblist.svg"> @@ -28,7 +28,7 @@ image/svg+xml - + @@ -241,61 +241,51 @@ xml:space="preserve" x="225" y="675" - fill="#000000" - font-family="Courier" font-style="normal" font-weight="bold" font-size="324" - text-anchor="start" - id="text64">nxtlist + id="text64" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->head nxttail[RCU_DONE_TAIL] + id="text66" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_DONE_TAIL] nxttail[RCU_WAIT_TAIL] + id="text68" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_WAIT_TAIL] nxttail[RCU_NEXT_READY_TAIL] + id="text70" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_READY_TAIL] nxttail[RCU_NEXT_TAIL] + id="text72" + style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_TAIL]