packages/kernel/v2_0/doc/SMP.txt

   1
   2 eCos SMP Support
   3 ================
   4
   5 eCos contains support for limited Symmetric Multi-Processing
   6 (SMP). This is only available on selected architectures and platforms.
   7
   8 The first part of this document describes the platform-independent
   9 parts of the SMP support. Annexes at the end of this document describe
  10 any details that are specific to a particular platform.
  11
  12 Target Hardware Limitations
  13 ---------------------------
  14
  15 To allow a reasonable implementation of SMP, and to reduce the
  16 disruption to the existing source base, a number of assumptions have
  17 been made about the features of the target hardware.
  18
  19 - Modest multiprocessing. The typical number of CPUs supported is two
  20   to four, with an upper limit around eight. While there are no
  21   inherent limits in the code, hardware and algorithmic limitations
  22   will probably become significant beyond this point.
  23
  24 - SMP synchronization support. The hardware must supply a mechanism to
  25   allow software on two CPUs to synchronize. This is normally provided
  26   as part of the instruction set in the form of test-and-set,
  27   compare-and-swap or load-link/store-conditional instructions. An
  28   alternative approach is the provision of hardware semaphore
  29   registers which can be used to serialize implementations of these
  30   operations. Whatever hardware facilities are available, they are
  31   used in eCos to implement spinlocks.
  32
  33 - Coherent caches. It is assumed that no extra effort will be required
  34   to access shared memory from any processor. This means that either
  35   there are no caches, they are shared by all processors, or are
  36   maintained in a coherent state by the hardware. It would be too
  37   disruptive to the eCos sources if every memory access had to be
  38   bracketed by cache load/flush operations. Any hardware that requires
  39   this is not supported.
  40
  41 - Uniform addressing. It is assumed that all memory that is
  42   shared between CPUs is addressed at the same location from all
  43   CPUs. Like non-coherent caches, dealing with CPU-specific address
  44   translation is considered too disruptive to the eCos source
  45   base. This does not, however, preclude systems with non-uniform
  46   access costs for different CPUs.
  47
  48 - Uniform device addressing. As with access to memory, it is assumed
  49   that all devices are equally accessible to all CPUs. Device access
  50   is often made from thread contexts, and there is currently no
  51   support for binding or migrating threads to CPUs, so it is not
  52   possible to restrict access to device control registers to certain CPUs.
  53
  54 - Interrupt routing. The target hardware must have an interrupt
  55   controller that can route interrupts to specific CPUs. It is
  56   acceptable for all interrupts to be delivered to just one CPU, or
  57   for some interrupts to be bound to specific CPUs, or for some
  58   interrupts to be local to each CPU. At present dynamic routing,
  59   where a different CPU may be chosen each time an interrupt is
  60   delivered, is not supported. eCos cannot support hardware where all
  61   interrupts are delivered to all CPUs simultaneously with the
  62   expectation that software will resolve any conflicts.
  63
  64 - Inter-CPU interrupts. A mechanism to allow one CPU to interrupt
  65   another is needed. This is necessary so that events on one CPU can
  66   cause rescheduling on other CPUs.
  67
  68 - CPU Identifiers. Code running on a CPU must be able to determine
  69   which CPU it is running on. The CPU Id is usually provided either in
  70   a CPU status register, or in a register associated with the
  71   inter-CPU interrupt delivery subsystem. eCos expects CPU Ids to be
  72   small positive integers, although alternative representations, such
  73   as bitmaps, can be converted relatively easily. Complex mechanisms
  74   for getting the CPU Id cannot be supported. Getting the CPU Id must
  75   be a cheap operation, since it is done often, and in performance
  76   critical places such as interrupt handlers and the scheduler.
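
As an illustration of the bitmap conversion mentioned in the last
item, the following is a minimal sketch, assuming the hardware reports
the CPU as a one-hot 32-bit mask in which bit n stands for CPU n. The
helper name cpu_bitmap_to_id() is hypothetical and is not part of
eCos.

```c
#include <stdint.h>

/* Hypothetical helper (not an eCos API): convert a one-hot CPU bitmap,
 * where bit n set means CPU n, into a small integer CPU id.
 * Returns -1 if the bitmap does not have exactly one bit set. */
static int cpu_bitmap_to_id(uint32_t bitmap)
{
    int id = 0;

    /* Reject zero and any value with more than one bit set. */
    if (bitmap == 0 || (bitmap & (bitmap - 1)) != 0)
        return -1;

    /* Shift until the single set bit reaches position 0. */
    while ((bitmap & 1u) == 0) {
        bitmap >>= 1;
        id++;
    }
    return id;
}
```

A real HAL would more likely use a count-trailing-zeros instruction
here, since the conversion sits on performance critical paths.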
  77
  78 Kernel Support
  79 --------------
  80
  81 This section describes how SMP is handled in the kernel, and where
  82 system behaviour differs from a single CPU system.
  83
  84 System Startup
  85 ~~~~~~~~~~~~~~
  86
  87 System startup takes place on only one CPU, called the primary
  88 CPU. All other CPUs, the secondary CPUs, are either placed in a
  89 suspended state at reset, or are captured by the HAL and put into
  90 a spin as they start up.
  91
  92 The primary CPU is responsible for copying the DATA segment and
  93 zeroing the BSS (if required), calling HAL variant and platform
  94 initialization routines and invoking constructors. It then calls
  95 cyg_start() to enter the application. The application may then create
  96 extra threads and other objects.
  97
  98 It is only when the application calls Cyg_Scheduler::start() that the
  99 secondary CPUs are initialized. This routine scans the list of
 100 available secondary CPUs and calls HAL_SMP_CPU_START() to start each one.
 101 Finally it calls Cyg_Scheduler::start_cpu().
 102
 103 Each secondary CPU starts in the HAL, where it completes any per-CPU
 104 initialization before calling into the kernel at
 105 cyg_kernel_cpu_startup(). Here it claims the scheduler lock and calls
 106 Cyg_Scheduler::start_cpu().
 107
 108 Cyg_Scheduler::start_cpu() is common to both the primary and secondary
 109 CPUs. The first thing this code does is to install an interrupt object
 110 for this CPU's inter-CPU interrupt. From this point on the code is the
 111 same as for the single CPU case: an initial thread is chosen and
 112 entered.
 113
 114 From this point on the CPUs are all equal; eCos makes no further
 115 distinction between the primary and secondary CPUs. However, the
 116 hardware may still distinguish them as far as interrupt delivery is
 117 concerned.
 118
 119
 120 Scheduling
 121 ~~~~~~~~~~
 122
 123 To function correctly an operating system kernel must protect its
 124 vital data structures, such as the run queues, from concurrent
 125 access. In a single CPU system the only concurrent activities to worry
 126 about are asynchronous interrupts. The kernel can easily guard its
 127 data structures against these by disabling interrupts. However, in a
 128 multi-CPU system, this is inadequate since it does not block access by
 129 other CPUs.
 130
 131 The eCos kernel protects its vital data structures using the scheduler
 132 lock. In single CPU systems this is a simple counter that is
 133 atomically incremented to acquire the lock and decremented to release
 134 it. If the lock is decremented to zero then the scheduler may be
 135 invoked to choose a different thread to run. Because interrupts may
 136 continue to be serviced while the scheduler lock is claimed, ISRs are
 137 not allowed to access kernel data structures, or call kernel routines
 138 that can. Instead all such operations are deferred to an associated
 139 DSR routine that is run during the lock release operation, when the
 140 data structures are in a consistent state.
 141
 142 By choosing a kernel locking mechanism that does not rely on interrupt
 143 manipulation to protect data structures, it is easier to convert eCos
 144 to SMP than would otherwise be the case. The principal change needed to
 145 make eCos SMP-safe is to convert the scheduler lock into a nestable
 146 spin lock. This is done by adding a spinlock and a CPU id to the
 147 original counter.
 148
 149 The algorithm for acquiring the scheduler lock is very simple. If the
 150 scheduler lock's CPU id matches the current CPU then it can increment
 151 the counter and continue. If it does not match, the CPU must spin on
 152 the spinlock, after which it may increment the counter and store its
 153 own identity in the CPU id.
 154
 155 To release the lock, the counter is decremented. If it goes to zero
 156 the CPU id value must be set to NONE and the spinlock cleared.
 157
 158 To protect these sequences against interrupts, they must be performed
 159 with interrupts disabled. However, since these are very short code
 160 sequences, they will not have an adverse effect on the interrupt
 161 latency.
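
The acquire and release sequences described above can be sketched in
portable C11 as follows. This is an illustration under stated
assumptions, not the kernel's code: the type sched_lock_t, the
functions sched_lock_acquire() and sched_lock_release(), and the
constant CPU_NONE are stand-ins, the CPU id is passed in explicitly
rather than read from the hardware, and the disabling of interrupts
around each sequence is only indicated by comments.

```c
#include <stdatomic.h>

#define CPU_NONE (-1)            /* stand-in for HAL_SMP_CPU_NONE */

/* Stand-in for the scheduler lock: a counter plus a spinlock and the
 * id of the CPU that currently holds it. */
typedef struct {
    atomic_flag spin;            /* the underlying spinlock */
    atomic_int  owner;           /* holder's CPU id, or CPU_NONE */
    unsigned    count;           /* nesting counter, protected by spin */
} sched_lock_t;

static sched_lock_t sched_lock = { ATOMIC_FLAG_INIT, CPU_NONE, 0 };

static void sched_lock_acquire(sched_lock_t *l, int this_cpu)
{
    /* Real code would disable interrupts here. */
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) != this_cpu) {
        /* Not the holder: spin until the spinlock is claimed. */
        while (atomic_flag_test_and_set_explicit(&l->spin,
                                                 memory_order_acquire))
            ;
        atomic_store_explicit(&l->owner, this_cpu, memory_order_relaxed);
    }
    l->count++;                  /* nested claims only bump the counter */
    /* Real code would restore interrupts here. */
}

static void sched_lock_release(sched_lock_t *l)
{
    /* Real code would disable interrupts here. */
    if (--l->count == 0) {
        /* Last release: give up ownership, then free the spinlock. */
        atomic_store_explicit(&l->owner, CPU_NONE, memory_order_relaxed);
        atomic_flag_clear_explicit(&l->spin, memory_order_release);
    }
    /* Real code would restore interrupts here, and would also run any
     * pending DSRs and the scheduler when the count reaches zero. */
}
```

Only the CPU that holds the lock can ever observe its own id in the
owner field, which is why the ownership test can be made before
touching the spinlock.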
 162
 163 Beyond converting the scheduler lock, further preparing the kernel for
 164 SMP is a relatively minor matter. The main changes are to convert
 165 various scalar housekeeping variables into arrays indexed by CPU
 166 id. These include the current thread pointer, the need_reschedule
 167 flag and the timeslice counter.
 168
 169 At present only the Multi-Level Queue (MLQ) scheduler is capable of
 170 supporting SMP configurations. The main change made to this scheduler
 171 is to cope with having several threads in execution at the same
 172 time. Running threads are marked with the CPU they are executing on.
 173 When scheduling a thread, the scheduler skips past any running threads
 174 until it finds a thread that is pending. Unlike the single CPU case,
 175 this is not a constant-time algorithm, but it is still deterministic,
 176 since the worst case time is bounded by the number of CPUs in the
 177 system.
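
The skip-past-running-threads search can be pictured with the sketch
below. It is a simplification that looks at a single priority queue
held as a linked list; thread_t, pick_pending() and CPU_NONE are
hypothetical names and do not reflect the layout of the real MLQ
scheduler.

```c
#include <stddef.h>

#define CPU_NONE (-1)       /* marks a thread not running on any CPU */

/* Hypothetical, simplified thread record. */
typedef struct thread {
    struct thread *next;    /* next thread in the same run queue */
    int            cpu;     /* CPU executing this thread, or CPU_NONE */
} thread_t;

/* Return the first thread in the queue that is not already executing
 * on some CPU, or NULL if every queued thread is running. At most one
 * thread per CPU can be skipped, which bounds the search. */
static thread_t *pick_pending(thread_t *queue_head)
{
    thread_t *t;

    for (t = queue_head; t != NULL; t = t->next)
        if (t->cpu == CPU_NONE)
            return t;
    return NULL;
}
```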
 178
 179 A second change to the scheduler is in the code used to decide when
 180 the scheduler should be called to choose a new thread. The scheduler
 181 attempts to keep the *n* CPUs running the *n* highest priority
 182 threads. Since an event or interrupt on one CPU may require a
 183 reschedule on another CPU, there must be a mechanism for deciding
 184 this. The algorithm currently implemented is very simple. Given a
 185 thread that has just been awakened (or had its priority changed), the
 186 scheduler scans the CPUs, starting with the one it is currently
 187 running on, for a current thread that is of lower priority than the
 188 new one. If one is found then a reschedule interrupt is sent to that
 189 CPU and the scan continues, but now using the current thread of the
 190 rescheduled CPU as the candidate thread. In this way the new thread
 191 gets to run as quickly as possible, hopefully on the current CPU, and
 192 the remaining CPUs will pick up the remaining highest priority
 193 threads as a consequence of processing the reschedule interrupt.
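
A minimal sketch of this scan is shown below. It assumes the eCos
convention that a numerically lower value means a higher priority,
and it records which CPUs would be interrupted in an array instead of
calling HAL_SMP_CPU_RESCHEDULE_INTERRUPT(). The names NCPUS,
current_prio, resched_sent and thread_became_runnable() are
illustrative only.

```c
#define NCPUS 4                     /* illustrative CPU count */

static int current_prio[NCPUS];     /* priority of each CPU's current thread */
static int resched_sent[NCPUS];     /* 1 if the CPU was asked to reschedule */

/* Called when a thread of priority new_prio becomes runnable on
 * this_cpu. Lower numbers are higher priorities. */
static void thread_became_runnable(int this_cpu, int new_prio)
{
    int candidate = new_prio;
    int i;

    /* Scan all CPUs, starting with the one we are running on. */
    for (i = 0; i < NCPUS; i++) {
        int cpu = (this_cpu + i) % NCPUS;

        if (current_prio[cpu] > candidate) {
            /* This CPU runs something less important: ask it to
             * reschedule (stand-in for the reschedule interrupt). */
            resched_sent[cpu] = 1;
            /* The displaced thread now needs a home, so it becomes
             * the candidate for the rest of the scan. */
            candidate = current_prio[cpu];
        }
    }
}
```

For the current CPU a real implementation would presumably just set
its own need_reschedule flag rather than interrupt itself; the sketch
does not model that distinction.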
 194
 195 The final change to the scheduler is in the handling of
 196 timeslicing. Only one CPU receives timer interrupts, although all CPUs
 197 must handle timeslicing. To make this work, the CPU that receives the
 198 timer interrupt decrements the timeslice counter for all CPUs, not
 199 just its own. If the counter for a CPU reaches zero, then it sends a
 200 timeslice interrupt to that CPU. On receiving the interrupt the
 201 destination CPU enters the scheduler and looks for another thread at
 202 the same priority to run. This is somewhat more efficient than
 203 distributing clock ticks to all CPUs, since the interrupt is only
 204 needed when a timeslice expires.
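
The tick handling on the CPU that owns the timer can be sketched as
follows. The array names, the TIMESLICE_TICKS value and the choice to
reload the counter in the tick handler are assumptions made for
illustration; in the kernel the interrupt would be sent with
HAL_SMP_CPU_TIMESLICE_INTERRUPT(), and the counter may well be reset
by the destination CPU instead.

```c
#define NCPUS           4    /* illustrative CPU count */
#define TIMESLICE_TICKS 5    /* illustrative timeslice length in ticks */

static int timeslice_count[NCPUS];  /* ticks left in each CPU's timeslice */
static int timeslice_sent[NCPUS];   /* timeslice interrupts sent per CPU */

/* Runs on the one CPU that receives timer interrupts. */
static void timeslice_tick(void)
{
    int cpu;

    for (cpu = 0; cpu < NCPUS; cpu++) {
        if (--timeslice_count[cpu] <= 0) {
            /* Stand-in for HAL_SMP_CPU_TIMESLICE_INTERRUPT(cpu, 0). */
            timeslice_sent[cpu]++;
            /* Assumption: reload here so the sketch is self-contained. */
            timeslice_count[cpu] = TIMESLICE_TICKS;
        }
    }
}
```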
 205
 206 Device Drivers
 207 ~~~~~~~~~~~~~~
 208
 209 The area where the SMP nature of a system will be most apparent is
 210 in device drivers. It is quite possible for the ISR, DSR and thread
 211 components of a device driver to execute on different CPUs. For this
 212 reason it is much more important that SMP-capable device drivers use
 213 the driver API routines correctly.
 214
 215 Synchronization between threads and DSRs continues to require that the
 216 thread-side code use cyg_drv_dsr_lock() and cyg_drv_dsr_unlock() to
 217 protect access to shared data. Synchronization between ISRs and DSRs
 218 or threads requires that access to sensitive data be protected, in all
 219 places, by calls to cyg_drv_isr_lock() and cyg_drv_isr_unlock().
 220
 221 The ISR lock, for SMP systems, not only disables local interrupts, but
 222 also acquires a spinlock to protect against concurrent access from
 223 other CPUs. This is necessary because ISRs are not run with the
 224 scheduler lock claimed. Hence they can run in parallel with other
 225 components of the device driver.
 226
 227 The ISR lock provided by the driver API is just a shared spinlock that
 228 is available for use by all drivers. If a driver needs to implement a
 229 finer grain of locking, it can use private spinlocks, accessed via the
 230 cyg_drv_spinlock_*() functions (see API Extensions below).
 231
 232
 233 API Extensions
 234 --------------
 235
 236 In general, the SMP support is invisible to application code. All
 237 synchronization and communication operations function exactly as
 238 before. The main area where code needs to be SMP aware is in the
 239 handling of interrupt routing, and in the synchronization of ISRs,
 240 DSRs and threads.
 241
 242 The following sections contain brief descriptions of the API
 243 extensions added for SMP support. More details will be found in the
 244 Kernel C API and Device Driver API documentation.
 245
 246 Interrupt Routing
 247 ~~~~~~~~~~~~~~~~~
 248
 249 Two new functions have been added to the Kernel API and the device
 250 driver API to do interrupt routing. These are:
 251
 252 void cyg_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );
 253 void cyg_drv_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );
 254
 255 cyg_cpu_t cyg_interrupt_get_cpu( cyg_vector_t vector );
 256 cyg_cpu_t cyg_drv_interrupt_get_cpu( cyg_vector_t vector );
 257
 258 The *_set_cpu() functions cause the given interrupt to be handled by
 259 the nominated CPU.
 260
 261 The *_get_cpu() functions return the CPU to which the vector is
 262 routed.
 263
 264 Although not currently supported, special values for the cpu argument
 265 may be used to indicate that the interrupt is being routed dynamically
 266 or is CPU-local.
 267
 268 Once a vector has been routed to a new CPU, all other interrupt
 269 masking and configuration operations are relative to that CPU, where
 270 relevant.
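
The fragment below sketches how the routing calls might be used. So
that it builds outside eCos, the two API functions and their types
are replaced by trivial stand-ins backed by a table; the vector
number UART_VECTOR and the helper move_uart_interrupt() are
hypothetical.

```c
/* Stand-in types and functions so the sketch compiles outside eCos.
 * In a real system these come from the kernel C API. */
typedef unsigned int cyg_vector_t;
typedef unsigned int cyg_cpu_t;

#define NUM_VECTORS 16
static cyg_cpu_t vector_route[NUM_VECTORS];   /* all vectors start on CPU 0 */

static void cyg_interrupt_set_cpu(cyg_vector_t vector, cyg_cpu_t cpu)
{
    vector_route[vector] = cpu;
}

static cyg_cpu_t cyg_interrupt_get_cpu(cyg_vector_t vector)
{
    return vector_route[vector];
}

#define UART_VECTOR 4u    /* hypothetical vector number */

/* Route the (hypothetical) UART interrupt to the given CPU, skipping
 * the call if it is already routed there. */
static void move_uart_interrupt(cyg_cpu_t target)
{
    if (cyg_interrupt_get_cpu(UART_VECTOR) != target)
        cyg_interrupt_set_cpu(UART_VECTOR, target);
}
```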
 271
 272 Synchronization
 273 ~~~~~~~~~~~~~~~
 274
 275 All existing synchronization mechanisms work as before in an SMP
 276 system. Additional synchronization mechanisms have been added to
 277 provide explicit synchronization for SMP.
 278
 279 A set of functions has been added to the Kernel and device driver
 280 APIs to provide spinlocks:
 281
 282 void cyg_spinlock_init( cyg_spinlock_t *lock, cyg_bool_t locked );
 283 void cyg_drv_spinlock_init( cyg_spinlock_t *lock, cyg_bool_t locked );
 284
 285 void cyg_spinlock_destroy( cyg_spinlock_t *lock );
 286 void cyg_drv_spinlock_destroy( cyg_spinlock_t *lock );
 287
 288 void cyg_spinlock_spin( cyg_spinlock_t *lock );
 289 void cyg_drv_spinlock_spin( cyg_spinlock_t *lock );
 290
 291 void cyg_spinlock_clear( cyg_spinlock_t *lock );
 292 void cyg_drv_spinlock_clear( cyg_spinlock_t *lock );
 293
 294 cyg_bool_t cyg_spinlock_try( cyg_spinlock_t *lock );
 295 cyg_bool_t cyg_drv_spinlock_try( cyg_spinlock_t *lock );
 296
 297 cyg_bool_t cyg_spinlock_test( cyg_spinlock_t *lock );
 298 cyg_bool_t cyg_drv_spinlock_test( cyg_spinlock_t *lock );
 299
 300 void cyg_spinlock_spin_intsave( cyg_spinlock_t *lock,
 301                                 cyg_addrword_t *istate );
 302 void cyg_drv_spinlock_spin_intsave( cyg_spinlock_t *lock,
 303                                     cyg_addrword_t *istate );
 304
 305 void cyg_spinlock_clear_intsave( cyg_spinlock_t *lock,
 306                                  cyg_addrword_t istate );
 307 void cyg_drv_spinlock_clear_intsave( cyg_spinlock_t *lock,
 308                                      cyg_addrword_t istate );
 309
 310 The *_init() functions initialize the lock, to either locked or clear,
 311 and the *_destroy() functions destroy the lock. *_init() should be called
 312 before the lock is first used and *_destroy() should be called once it
 313 is no longer needed.
 314
 315 The *_spin() functions will cause the calling CPU to spin until it can
 316 claim the lock and the *_clear() functions clear the lock so that the
 317 next CPU can claim it. The *_try() functions attempt to claim the lock
 318 but return false if they cannot. The *_test() functions simply return
 319 the state of the lock.
 320
 321 None of these functions will necessarily block interrupts while they
 322 spin. If the spinlock is only to be used between threads on different
 323 CPUs, or in circumstances where it is known that the relevant
 324 interrupts are disabled, then these functions will suffice. However,
 325 if the spinlock is also to be used from an ISR, which may be called at
 326 any point, a straightforward spinlock may result in deadlock. Hence
 327 the *_intsave() variants are supplied to disable interrupts while the
 328 lock is held.
 329
 330 The *_spin_intsave() function disables interrupts, saving the current
 331 state in *istate, and then claims the lock. The *_clear_intsave()
 332 function clears the spinlock and restores the interrupt enable state
 333 from istate, which is passed by value.
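
The following sketch shows the intended calling pattern for the
*_intsave() variants when data is shared between thread code and an
ISR. To keep it buildable outside eCos, the two functions and their
types are replaced by simple stand-ins built on C11 atomics, with the
CPU interrupt enable state modelled by the variable irq_enabled. The
names rx_lock, rx_count and rx_count_add() are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* ---- Stand-ins so the sketch builds outside eCos (assumptions) ---- */
typedef uintptr_t cyg_addrword_t;
typedef struct { atomic_flag flag; } cyg_spinlock_t;

static int irq_enabled = 1;   /* models the CPU interrupt enable state */

static void cyg_spinlock_spin_intsave(cyg_spinlock_t *lock,
                                      cyg_addrword_t *istate)
{
    *istate = (cyg_addrword_t)irq_enabled;   /* save, then disable */
    irq_enabled = 0;
    while (atomic_flag_test_and_set(&lock->flag))
        ;                                    /* spin until claimed */
}

static void cyg_spinlock_clear_intsave(cyg_spinlock_t *lock,
                                       cyg_addrword_t istate)
{
    atomic_flag_clear(&lock->flag);          /* release the lock */
    irq_enabled = (int)istate;               /* restore saved state */
}

/* ---- Calling pattern ---- */
static cyg_spinlock_t rx_lock = { ATOMIC_FLAG_INIT };
static unsigned rx_count;     /* shared with a (hypothetical) ISR */

static void rx_count_add(unsigned n)
{
    cyg_addrword_t istate;

    cyg_spinlock_spin_intsave(&rx_lock, &istate);
    rx_count += n;            /* critical section: keep it short */
    cyg_spinlock_clear_intsave(&rx_lock, istate);
}
```

Note the asymmetry in the prototypes: the spin function takes a
pointer so that it can store the saved state, while the clear
function takes the saved value itself.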
 334
 335
 336 HAL Support
 337 -----------
 338
 339 SMP support in any platform depends on the HAL supplying the
 340 appropriate operations. All HAL SMP support is defined in the
 341 hal_smp.h header (and if necessary var_smp.h and plf_smp.h).
 342
 343 SMP support falls into a number of functional groups.
 344
 345 CPU Control
 346 ~~~~~~~~~~~
 347
 348 This group consists of descriptive and control macros for managing the
 349 CPUs in an SMP system.
 350
 351 HAL_SMP_CPU_TYPE        A type that can contain a CPU id. A CPU id is
 352                         usually a small integer that is used to index
 353                         arrays of variables that are managed on a
 354                         per-CPU basis.
 355
 356 HAL_SMP_CPU_MAX         The maximum number of CPUs that can be
 357                         supported. This is used to provide the size of
 358                         any arrays that have an element per CPU.
 359
 360 HAL_SMP_CPU_COUNT()     Returns the number of CPUs currently
 361                         operational. This may differ from
 362                         HAL_SMP_CPU_MAX depending on the runtime
 363                         environment.
 364
 365 HAL_SMP_CPU_THIS()      Returns the CPU id of the current CPU.
 366
 367 HAL_SMP_CPU_NONE        A value that does not match any real CPU
 368                         id. This is used where a CPU type variable
 369                         must be set to a null value.
 370
 371 HAL_SMP_CPU_START( cpu )
 372                         Starts the given CPU executing at a defined
 373                         HAL entry point. After performing any HAL
 374                         level initialization, the CPU calls up into
 375                         the kernel at cyg_kernel_cpu_startup().
 376
 377 HAL_SMP_CPU_RESCHEDULE_INTERRUPT( cpu, wait )
 378                         Sends the CPU a reschedule interrupt, and if
 379                         _wait_ is non-zero, waits for an
 380                         acknowledgment. The interrupted CPU should
 381                         call cyg_scheduler_set_need_reschedule() in
 382                         its DSR to cause the reschedule to occur.
 383
 384 HAL_SMP_CPU_TIMESLICE_INTERRUPT( cpu, wait )
 385                         Sends the CPU a timeslice interrupt, and if
 386                         _wait_ is non-zero, waits for an
 387                         acknowledgment. The interrupted CPU should
 388                         call cyg_scheduler_timeslice_cpu() to cause
 389                         the timeslice event to be processed.
 390
 391 Test-and-set Support
 392 ~~~~~~~~~~~~~~~~~~~~
 393
 394 Test-and-set is the foundation of the SMP synchronization
 395 mechanisms.
 396
 397 HAL_TAS_TYPE            The type for all test-and-set variables. The
 398                         test-and-set macros only support operations on
 399                         a single bit (usually the least significant
 400                         bit) of this location. This allows for maximum
 401                         flexibility in the implementation.
 402
 403 HAL_TAS_SET( tas, oldb )
 404                         Performs a test and set operation on the
 405                         location _tas_. _oldb_ will contain *true* if
 406                         the location was already set, and *false* if
 407                         it was clear.
 408
 409 HAL_TAS_CLEAR( tas, oldb )
 410                         Performs a test and clear operation on the
 411                         location _tas_. _oldb_ will contain *true* if
 412                         the location was already set, and *false* if
 413                         it was clear.
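
On a toolchain with C11 atomics, the semantics of the two operations
can be sketched as below. This is not how any particular eCos HAL
implements them, since a HAL would normally use the architecture's
own instructions; tas_t, tas_set() and tas_clear() are stand-in names
for HAL_TAS_TYPE, HAL_TAS_SET() and HAL_TAS_CLEAR(), written as
functions that return the old bit instead of macros that store it in
_oldb_.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef atomic_uint tas_t;          /* stand-in for HAL_TAS_TYPE */

/* Atomically set bit 0; return true if it was already set. */
static bool tas_set(tas_t *tas)
{
    return (atomic_fetch_or(tas, 1u) & 1u) != 0;
}

/* Atomically clear bit 0; return true if it was set beforehand. */
static bool tas_clear(tas_t *tas)
{
    return (atomic_fetch_and(tas, ~1u) & 1u) != 0;
}
```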
 414
 415 Spinlocks
 416 ~~~~~~~~~
 417
 418 Spinlocks provide inter-CPU locking. Normally they will be implemented
 419 on top of the test-and-set mechanism above, but may also be
 420 implemented by other means if, for example, the hardware has more
 421 direct support for spinlocks.
 422
 423 HAL_SPINLOCK_TYPE       The type for all spinlock variables.
 424
 425 HAL_SPINLOCK_INIT_CLEAR A value that may be assigned to a spinlock
 426                         variable to initialize it to clear.
 427
 428 HAL_SPINLOCK_INIT_SET   A value that may be assigned to a spinlock
 429                         variable to initialize it to set.
 430
 431 HAL_SPINLOCK_SPIN( lock )
 432                         The caller spins in a busy loop waiting for
 433                         the lock to become clear. It then sets it and
 434                         continues. This is all handled atomically, so
 435                         that there are no race conditions between CPUs.
 436
 437 HAL_SPINLOCK_CLEAR( lock )
 438                         The caller clears the lock. One of any waiting
 439                         spinners will then be able to proceed.
 440
 441 HAL_SPINLOCK_TRY( lock, val )
 442                         Attempts to set the lock. The value put in
 443                         _val_ will be *true* if the lock was
 444                         claimed successfully, and *false* if it was
 445                         not.
 446
 447 HAL_SPINLOCK_TEST( lock, val )
 448                         Tests the current value of the lock. The value
 449                         put in _val_ will be *true* if the lock is
 450                         claimed and *false* if it is clear.
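
The behaviour of these macros can be sketched with C11 atomics as
follows, assuming a lock word that is 0 when clear and 1 when set.
The names spinlock_t, SPINLOCK_INIT_CLEAR, SPINLOCK_INIT_SET and the
spinlock_*() functions are stand-ins for the HAL macros and are
written as functions for readability; a real HAL would use whatever
primitive the hardware provides.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef atomic_uint spinlock_t;     /* stand-in for HAL_SPINLOCK_TYPE */

#define SPINLOCK_INIT_CLEAR 0u
#define SPINLOCK_INIT_SET   1u

/* Busy-wait until the lock is seen clear, setting it atomically. */
static void spinlock_spin(spinlock_t *lock)
{
    while (atomic_exchange_explicit(lock, 1u, memory_order_acquire) != 0u)
        ;
}

/* Clear the lock so that one waiting spinner can proceed. */
static void spinlock_clear(spinlock_t *lock)
{
    atomic_store_explicit(lock, 0u, memory_order_release);
}

/* Try once to claim the lock; true means it was claimed. */
static bool spinlock_try(spinlock_t *lock)
{
    return atomic_exchange_explicit(lock, 1u, memory_order_acquire) == 0u;
}

/* Report whether the lock is currently claimed. */
static bool spinlock_test(spinlock_t *lock)
{
    return atomic_load_explicit(lock, memory_order_relaxed) != 0u;
}
```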
 451
 452 Scheduler Lock
 453 ~~~~~~~~~~~~~~
 454
 455 The scheduler lock is the main protection for all kernel data
 456 structures. By default the kernel implements the scheduler lock itself
 457 using a spinlock. However, if spinlocks cannot be supported by the
 458 hardware, or there is a more efficient implementation available, the
 459 HAL may provide macros to implement the scheduler lock.
 460
 461 HAL_SMP_SCHEDLOCK_DATA_TYPE
 462                         A data type, possibly a structure, that
 463                         contains any data items needed by the
 464                         scheduler lock implementation. A variable of
 465                         this type will be instantiated as a static
 466                         member of the Cyg_Scheduler_SchedLock class
 467                         and passed to all the following macros.
 468
 469 HAL_SMP_SCHEDLOCK_INIT( lock, data )
 470                         Initialize the scheduler lock. The _lock_
 471                         argument is the scheduler lock counter and the
 472                         _data_ argument is a variable of
 473                         HAL_SMP_SCHEDLOCK_DATA_TYPE type.
 474
 475 HAL_SMP_SCHEDLOCK_INC( lock, data )
 476                         Increment the scheduler lock. The first
 477                         increment of the lock from zero to one for any
 478                         CPU may cause it to wait until the lock is
 479                         zeroed by another CPU. Subsequent increments
 480                         should be less expensive since this CPU
 481                         already holds the lock.
 482
 483 HAL_SMP_SCHEDLOCK_ZERO( lock, data )
 484                         Zero the scheduler lock. This operation will
 485                         also clear the lock so that other CPUs may
 486                         claim it.
 487
 488 HAL_SMP_SCHEDLOCK_SET( lock, data, new )
 489
 490                         Set the lock to a different value, in
 491                         _new_. This is only called when the lock is
 492                         already known to be owned by the current
 493                         CPU. It is never called to zero the lock, or
 494                         to increment it from zero.
 495
 496
 497 Interrupt Routing
 498 ~~~~~~~~~~~~~~~~~
 499
 500 The routing of interrupts to different CPUs is supported by two new
 501 interfaces in hal_intr.h.
 502
 503 Once an interrupt has been routed to a new CPU, the existing vector
 504 masking and configuration operations should take account of the CPU
 505 routing. For example, if the operation is not invoked on the
 506 destination CPU itself, then the HAL may need to arrange to transfer
 507 the operation to the destination CPU for correct application.
 508
 509 HAL_INTERRUPT_SET_CPU( vector, cpu )
 510                        Route the interrupt for the given _vector_ to
 511                        the given _cpu_.
 512
 513 HAL_INTERRUPT_GET_CPU( vector, cpu )
 514                        Set _cpu_ to the id of the CPU to which this
 515                        vector is routed.
 516
 517
 518
 519
 520
 521 Annex 1 - Pentium SMP Support
 522 =============================
 523
 524 eCos supports SMP operation on Pentium class IA32 CPUs with integrated
 525 SMP support. It uses the per-CPU APICs and the IOAPIC to provide CPU
 526 control and identification, and to distribute interrupts. Only PCI
 527 interrupts that map into the ISA interrupt space are currently
 528 supported. The code relies on the MP Configuration Table supplied by
 529 the BIOS to discover the number of CPUs, IOAPIC location and interrupt
 530 assignments; hardware-based MP configuration discovery is
 531 not currently supported.
 532
 533 Inter-CPU interrupts are mapped into interrupt vectors from 64
 534 up. Each CPU has its own vector at 64+CPUID.
 535
 536 Interrupt delivery is initially configured to deliver all interrupts
 537 to the initial CPU. HAL_INTERRUPT_SET_CPU() currently supports only
 538 delivery of interrupts to specific CPUs; dynamic CPU selection is
 539 not supported.
 540
 541 eCos has only been tested in a dual processor configuration. While the
 542 code has been written to handle an arbitrary number of CPUs, this has
 543 not been tested.
 544