eCos contains support for limited Symmetric Multi-Processing
(SMP). This is only available on selected architectures and platforms.

The first part of this document describes the platform-independent
parts of the SMP support. Annexes at the end of the document describe
any details that are specific to a particular platform.

Target Hardware Limitations
---------------------------

To allow a reasonable implementation of SMP, and to reduce the
disruption to the existing source base, a number of assumptions have
been made about the features of the target hardware.

- Modest multiprocessing. The typical number of CPUs supported is two
  to four, with an upper limit around eight. While there are no
  inherent limits in the code, hardware and algorithmic limitations
  will probably become significant beyond this point.

- SMP synchronization support. The hardware must supply a mechanism to
  allow software on two CPUs to synchronize. This is normally provided
  as part of the instruction set in the form of test-and-set,
  compare-and-swap or load-link/store-conditional instructions. An
  alternative approach is the provision of hardware semaphore
  registers which can be used to serialize implementations of these
  operations. Whatever hardware facilities are available, they are
  used in eCos to implement spinlocks.

- Coherent caches. It is assumed that no extra effort will be required
  to access shared memory from any processor. This means that either
  there are no caches, they are shared by all processors, or they are
  maintained in a coherent state by the hardware. It would be too
  disruptive to the eCos sources if every memory access had to be
  bracketed by cache load/flush operations. Any hardware that requires
  this is not supported.

- Uniform addressing. It is assumed that all memory that is
  shared between CPUs is addressed at the same location from all
  CPUs. As with non-coherent caches, dealing with CPU-specific address
  translation is considered too disruptive to the eCos source
  base. This does not, however, preclude systems with non-uniform
  access costs for different CPUs.

- Uniform device addressing. As with access to memory, it is assumed
  that all devices are equally accessible to all CPUs. Device access
  is often made from thread contexts, and since there is currently no
  support for binding or migrating threads to CPUs, it is not possible
  to restrict access to device control registers to particular CPUs.

- Interrupt routing. The target hardware must have an interrupt
  controller that can route interrupts to specific CPUs. It is
  acceptable for all interrupts to be delivered to just one CPU, or
  for some interrupts to be bound to specific CPUs, or for some
  interrupts to be local to each CPU. At present dynamic routing,
  where a different CPU may be chosen each time an interrupt is
  delivered, is not supported. eCos cannot support hardware where all
  interrupts are delivered to all CPUs simultaneously with the
  expectation that software will resolve any conflicts.

- Inter-CPU interrupts. A mechanism to allow one CPU to interrupt
  another is needed. This is necessary so that events on one CPU can
  cause rescheduling on other CPUs.

- CPU identifiers. Code running on a CPU must be able to determine
  which CPU it is running on. The CPU id is usually provided either in
  a CPU status register, or in a register associated with the
  inter-CPU interrupt delivery subsystem. eCos expects CPU ids to be
  small positive integers, although alternative representations, such
  as bitmaps, can be converted relatively easily. Complex mechanisms
  for getting the CPU id cannot be supported. Getting the CPU id must
  be a cheap operation, since it is done often, and in
  performance-critical places such as interrupt handlers and the
  scheduler.

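To make the synchronization requirement concrete, the following is a
minimal sketch of a spinlock built on an atomic test-and-set
primitive, using C11 atomics in place of the hardware instructions.
All names here are invented for illustration; this is not the eCos
implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative spinlock built on a test-and-set primitive, as a HAL
 * might do with hardware TAS/CAS instructions. The C11 atomic_flag
 * operations stand in for those instructions. */

typedef atomic_flag demo_spinlock_t;
#define DEMO_SPINLOCK_INIT ATOMIC_FLAG_INIT

void demo_spin(demo_spinlock_t *lock)
{
    /* atomic_flag_test_and_set returns the old value and sets the
     * flag in one atomic step, exactly the test-and-set pattern. */
    while (atomic_flag_test_and_set(lock))
        ;   /* busy-wait until the previous holder clears the lock */
}

bool demo_try(demo_spinlock_t *lock)
{
    /* The lock is claimed only if the old value was clear. */
    return !atomic_flag_test_and_set(lock);
}

void demo_clear(demo_spinlock_t *lock)
{
    atomic_flag_clear(lock);
}
```

The same structure applies whether the underlying primitive is
test-and-set, compare-and-swap or a hardware semaphore register.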
Kernel Support
--------------

This section describes how SMP is handled in the kernel, and where
system behaviour differs from a single-CPU system.

System Startup
--------------

System startup takes place on only one CPU, called the primary
CPU. All other CPUs, the secondary CPUs, are either placed in a
suspended state at reset, or are captured by the HAL and put into a
spin as they start up.

The primary CPU is responsible for copying the DATA segment and
zeroing the BSS (if required), calling HAL variant and platform
initialization routines and invoking constructors. It then calls
cyg_start() to enter the application. The application may then create
extra threads and other objects.

It is only when the application calls Cyg_Scheduler::start() that the
secondary CPUs are initialized. This routine scans the list of
available secondary CPUs and calls HAL_SMP_CPU_START() to start each
one. Finally, it calls Cyg_Scheduler::start_cpu().

Each secondary CPU starts in the HAL, where it completes any per-CPU
initialization before calling into the kernel at
cyg_kernel_cpu_startup(). Here it claims the scheduler lock and calls
Cyg_Scheduler::start_cpu().

Cyg_Scheduler::start_cpu() is common to both the primary and secondary
CPUs. The first thing this code does is to install an interrupt object
for this CPU's inter-CPU interrupt. From this point on the code is the
same as for the single-CPU case: an initial thread is chosen and
started.

From this point on the CPUs are all equal; eCos makes no further
distinction between the primary and secondary CPUs. However, the
hardware may still distinguish them as far as interrupt delivery is
concerned.

Scheduler Lock
--------------

To function correctly an operating system kernel must protect its
vital data structures, such as the run queues, from concurrent
access. In a single-CPU system the only concurrent activities to worry
about are asynchronous interrupts. The kernel can easily guard its
data structures against these by disabling interrupts. However, in a
multi-CPU system this is inadequate, since it does not block access by
other CPUs.

The eCos kernel protects its vital data structures using the scheduler
lock. In single-CPU systems this is a simple counter that is
atomically incremented to acquire the lock and decremented to release
it. If the lock is decremented to zero then the scheduler may be
invoked to choose a different thread to run. Because interrupts may
continue to be serviced while the scheduler lock is claimed, ISRs are
not allowed to access kernel data structures, or call kernel routines
that can. Instead, all such operations are deferred to an associated
DSR routine that is run during the lock release operation, when the
data structures are in a consistent state.

By choosing a kernel locking mechanism that does not rely on interrupt
manipulation to protect data structures, it is easier to convert eCos
to SMP than would otherwise be the case. The principal change needed
to make eCos SMP-safe is to convert the scheduler lock into a nestable
spinlock. This is done by adding a spinlock and a CPU id to the
scheduler lock.

The algorithm for acquiring the scheduler lock is very simple. If the
scheduler lock's CPU id matches the current CPU then it can increment
the counter and continue. If it does not match, the CPU must spin on
the spinlock, after which it may increment the counter and store its
own identity in the CPU id.

To release the lock, the counter is decremented. If it goes to zero,
the CPU id value must be set to NONE and the spinlock cleared.

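The acquire and release sequences above can be modelled in a few
lines. This is an illustrative host-side sketch using C11 atomics,
not the kernel's actual Cyg_Scheduler_SchedLock implementation; the
names and the data representation are assumptions made for this
example.

```c
#include <stdatomic.h>

/* Model of the nestable scheduler lock described above. */

#define CPU_NONE (-1)

typedef struct {
    atomic_flag spinlock;   /* inter-CPU mutual exclusion           */
    int         cpu;        /* id of the owning CPU, or CPU_NONE    */
    unsigned    count;      /* nesting depth on the owning CPU      */
} sched_lock_t;

#define SCHED_LOCK_INIT { ATOMIC_FLAG_INIT, CPU_NONE, 0 }

void sched_lock_acquire(sched_lock_t *l, int this_cpu)
{
    if (l->cpu != this_cpu) {
        /* Not already the owner: spin until the lock is free. */
        while (atomic_flag_test_and_set(&l->spinlock))
            ;
        l->cpu = this_cpu;      /* record ownership */
    }
    l->count++;                 /* nested claims just count up */
}

void sched_lock_release(sched_lock_t *l)
{
    if (--l->count == 0) {
        l->cpu = CPU_NONE;      /* reset id before freeing the lock */
        atomic_flag_clear(&l->spinlock);
    }
}
```

Note that reading l->cpu without holding the spinlock is safe here
because only the owning CPU can see its own id in that field.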
To protect these sequences against interrupts, they must be performed
with interrupts disabled. However, since these are very short code
sequences, they will not have an adverse effect on the interrupt
latency.

Beyond converting the scheduler lock, further preparing the kernel for
SMP is a relatively minor matter. The main changes are to convert
various scalar housekeeping variables into arrays indexed by CPU
id. These include the current thread pointer, the need_reschedule
flag and the timeslice counter.

At present only the Multi-Level Queue (MLQ) scheduler is capable of
supporting SMP configurations. The main change made to this scheduler
is to cope with having several threads in execution at the same
time. Running threads are marked with the CPU they are executing on.
When scheduling a thread, the scheduler skips past any running threads
until it finds a thread that is pending. While not a constant-time
algorithm, as in the single-CPU case, this is still deterministic,
since the worst-case time is bounded by the number of CPUs in the
system.

A second change to the scheduler is in the code used to decide when
the scheduler should be called to choose a new thread. The scheduler
attempts to keep the *n* CPUs running the *n* highest priority
threads. Since an event or interrupt on one CPU may require a
reschedule on another CPU, there must be a mechanism for deciding
this. The algorithm currently implemented is very simple. Given a
thread that has just been awakened (or had its priority changed), the
scheduler scans the CPUs, starting with the one it is currently
running on, for a current thread that is of lower priority than the
new one. If one is found then a reschedule interrupt is sent to that
CPU and the scan continues, but now using the current thread of the
rescheduled CPU as the candidate thread. In this way the new thread
gets to run as quickly as possible, hopefully on the current CPU, and
the remaining CPUs will pick up the remaining highest priority
threads as a consequence of processing the reschedule interrupt.

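The scan described above can be sketched as follows, assuming the MLQ
convention that numerically lower values mean higher priority. All
names are invented for this model, and setting an entry in resched[]
stands in for sending a reschedule interrupt to that CPU.

```c
#include <stdbool.h>

/* prio[] holds the priority of each CPU's current thread (lower value
 * = higher priority); new_prio is the priority of the newly awakened
 * thread; this_cpu is where the scan starts. */
void resched_scan(int ncpus, int this_cpu, int new_prio,
                  const int prio[], bool resched[])
{
    int candidate = new_prio;
    for (int i = 0; i < ncpus; i++) {
        int cpu = (this_cpu + i) % ncpus;   /* start with current CPU */
        if (candidate < prio[cpu]) {
            /* Candidate beats this CPU's current thread: interrupt
             * it, and continue the scan with the displaced thread's
             * priority as the new candidate. */
            resched[cpu] = true;
            candidate = prio[cpu];
        }
    }
}
```

For example, with current-thread priorities {5, 10, 7} on three CPUs
and a newly awakened thread of priority 1, the scan displaces the
thread on CPU 0, then pushes that thread onto CPU 1, and leaves CPU 2
alone.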
The final change to the scheduler is in the handling of
timeslicing. Only one CPU receives timer interrupts, although all CPUs
must handle timeslicing. To make this work, the CPU that receives the
timer interrupt decrements the timeslice counter for all CPUs, not
just its own. If the counter for a CPU reaches zero, then it sends a
timeslice interrupt to that CPU. On receiving the interrupt the
destination CPU enters the scheduler and looks for another thread at
the same priority to run. This is somewhat more efficient than
distributing clock ticks to all CPUs, since the interrupt is only
needed when a timeslice occurs.

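A sketch of the tick handling described above. The reload period and
all names here are invented for this model, and setting a flag stands
in for sending the timeslice interrupt; the real work happens in the
kernel clock handling.

```c
#include <stdbool.h>

#define TIMESLICE_TICKS 5   /* reload period: an assumption for this model */

/* Run on the one CPU that receives timer interrupts: decrement every
 * CPU's timeslice counter, and flag a timeslice interrupt for any
 * counter that reaches zero. */
void timeslice_tick(int ncpus, int counter[], bool send_timeslice[])
{
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (--counter[cpu] <= 0) {
            send_timeslice[cpu] = true;     /* interrupt only this CPU */
            counter[cpu] = TIMESLICE_TICKS; /* restart its timeslice */
        }
    }
}
```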
Device Drivers
--------------

The main area where the SMP nature of a system will be most apparent
is in device drivers. It is quite possible for the ISR, DSR and thread
components of a device driver to execute on different CPUs. For this
reason it is much more important that SMP-capable device drivers use
the driver API routines correctly.

Synchronization between threads and DSRs continues to require that the
thread-side code use cyg_drv_dsr_lock() and cyg_drv_dsr_unlock() to
protect access to shared data. Synchronization between ISRs and DSRs
or threads requires that access to sensitive data be protected, in all
places, by calls to cyg_drv_isr_lock() and cyg_drv_isr_unlock().

The ISR lock, for SMP systems, not only disables local interrupts, but
also acquires a spinlock to protect against concurrent access from
other CPUs. This is necessary because ISRs are not run with the
scheduler lock claimed. Hence they can run in parallel with other
components of the device driver.

The ISR lock provided by the driver API is just a shared spinlock that
is available for use by all drivers. If a driver needs to implement a
finer grain of locking, it can use private spinlocks, accessed via the
cyg_drv_spinlock_*() functions described below.

API Changes
-----------

In general, the SMP support is invisible to application code. All
synchronization and communication operations function exactly as
before. The main area where code needs to be SMP-aware is in the
handling of interrupt routing, and in the synchronization of ISRs,
DSRs and threads.

The following sections contain brief descriptions of the API
extensions added for SMP support. More details will be found in the
Kernel C API and Device Driver API documentation.

Interrupt Routing
-----------------

Two new functions have been added to the Kernel API and the device
driver API to do interrupt routing. These are:

void cyg_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );
void cyg_drv_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );

cyg_cpu_t cyg_interrupt_get_cpu( cyg_vector_t vector );
cyg_cpu_t cyg_drv_interrupt_get_cpu( cyg_vector_t vector );

The *_set_cpu() functions cause the given interrupt to be handled by
the nominated CPU.

The *_get_cpu() functions return the CPU to which the vector is
currently routed.

Although not currently supported, special values for the cpu argument
may be used to indicate that the interrupt is being routed dynamically
or is CPU-local.

Once a vector has been routed to a new CPU, all other interrupt
masking and configuration operations are relative to that CPU, where
relevant.

Synchronization
---------------

All existing synchronization mechanisms work as before in an SMP
system. Additional synchronization mechanisms have been added to
provide explicit synchronization for SMP.

A set of functions has been added to the Kernel and device driver
APIs to provide spinlocks:

void cyg_spinlock_init( cyg_spinlock_t *lock, cyg_bool_t locked );
void cyg_drv_spinlock_init( cyg_spinlock_t *lock, cyg_bool_t locked );

void cyg_spinlock_destroy( cyg_spinlock_t *lock );
void cyg_drv_spinlock_destroy( cyg_spinlock_t *lock );

void cyg_spinlock_spin( cyg_spinlock_t *lock );
void cyg_drv_spinlock_spin( cyg_spinlock_t *lock );

void cyg_spinlock_clear( cyg_spinlock_t *lock );
void cyg_drv_spinlock_clear( cyg_spinlock_t *lock );

cyg_bool_t cyg_spinlock_try( cyg_spinlock_t *lock );
cyg_bool_t cyg_drv_spinlock_try( cyg_spinlock_t *lock );

cyg_bool_t cyg_spinlock_test( cyg_spinlock_t *lock );
cyg_bool_t cyg_drv_spinlock_test( cyg_spinlock_t *lock );

void cyg_spinlock_spin_intsave( cyg_spinlock_t *lock,
                                cyg_addrword_t *istate );
void cyg_drv_spinlock_spin_intsave( cyg_spinlock_t *lock,
                                    cyg_addrword_t *istate );

void cyg_spinlock_clear_intsave( cyg_spinlock_t *lock,
                                 cyg_addrword_t istate );
void cyg_drv_spinlock_clear_intsave( cyg_spinlock_t *lock,
                                     cyg_addrword_t istate );

The *_init() functions initialize the lock, to either locked or clear,
and the *_destroy() functions destroy the lock. *_init() should be
called before the lock is first used, and *_destroy() when the lock is
no longer needed.

The *_spin() functions cause the calling CPU to spin until it can
claim the lock, and the *_clear() functions clear the lock so that the
next CPU can claim it. The *_try() functions attempt to claim the
lock, returning false if they cannot. The *_test() functions simply
return the current state of the lock.

None of these functions will necessarily block interrupts while they
spin. If the spinlock is only to be used between threads on different
CPUs, or in circumstances where it is known that the relevant
interrupts are disabled, then these functions will suffice. However,
if the spinlock is also to be used from an ISR, which may be called at
any point, a straightforward spinlock may result in deadlock. Hence
the *_intsave() variants are supplied to disable interrupts while the
lock is held.

The *_spin_intsave() function disables interrupts, saving the current
state in *istate, and then claims the lock. The *_clear_intsave()
function clears the spinlock and restores the interrupt enable state
from istate.

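The semantics of the *_intsave() variants can be modelled on a host as
follows, with a simple global flag standing in for the real CPU
interrupt mask. All names here are invented; this is not the driver
API implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

bool ints_enabled = true;                 /* stands in for the CPU interrupt mask */
atomic_flag model_lock = ATOMIC_FLAG_INIT;

void spin_intsave(atomic_flag *lock, bool *istate)
{
    *istate = ints_enabled;               /* save the current interrupt state */
    ints_enabled = false;                 /* "disable interrupts" */
    while (atomic_flag_test_and_set(lock))
        ;                                 /* then spin until the lock is claimed */
}

void clear_intsave(atomic_flag *lock, bool istate)
{
    atomic_flag_clear(lock);              /* release the lock */
    ints_enabled = istate;                /* restore the saved interrupt state */
}
```

Because the saved state is restored rather than unconditionally
re-enabled, these calls nest correctly inside regions that already
have interrupts disabled.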
HAL Support
-----------

SMP support in any platform depends on the HAL supplying the
appropriate operations. All HAL SMP support is defined in the
hal_smp.h header (and, if necessary, var_smp.h and plf_smp.h).

SMP support falls into a number of functional groups.

CPU Control
-----------

This group consists of descriptive and control macros for managing the
CPUs in an SMP system.

HAL_SMP_CPU_TYPE        A type that can contain a CPU id. A CPU id is
                        usually a small integer that is used to index
                        arrays of variables that are managed on a
                        per-CPU basis.

HAL_SMP_CPU_MAX         The maximum number of CPUs that can be
                        supported. This is used to provide the size of
                        any arrays that have an element per CPU.

HAL_SMP_CPU_COUNT()     Returns the number of CPUs currently
                        operational. This may differ from
                        HAL_SMP_CPU_MAX depending on the runtime
                        environment.

HAL_SMP_CPU_THIS()      Returns the CPU id of the current CPU.

HAL_SMP_CPU_NONE        A value that does not match any real CPU
                        id. This is used where a CPU id variable
                        must be set to a null value.

HAL_SMP_CPU_START( cpu )
                        Starts the given CPU executing at a defined
                        HAL entry point. After performing any HAL
                        level initialization, the CPU calls up into
                        the kernel at cyg_kernel_cpu_startup().

HAL_SMP_CPU_RESCHEDULE_INTERRUPT( cpu, wait )
                        Sends the CPU a reschedule interrupt, and if
                        _wait_ is non-zero, waits for an
                        acknowledgment. The interrupted CPU should
                        call cyg_scheduler_set_need_reschedule() in
                        its DSR to cause the reschedule to occur.

HAL_SMP_CPU_TIMESLICE_INTERRUPT( cpu, wait )
                        Sends the CPU a timeslice interrupt, and if
                        _wait_ is non-zero, waits for an
                        acknowledgment. The interrupted CPU should
                        call cyg_scheduler_timeslice_cpu() to cause
                        the timeslice event to be processed.

Test-and-set Support
--------------------

Test-and-set is the foundation of the SMP synchronization
mechanisms.

HAL_TAS_TYPE            The type for all test-and-set variables. The
                        test-and-set macros only support operations on
                        a single bit (usually the least significant
                        bit) of this location. This allows for maximum
                        flexibility in the implementation.

HAL_TAS_SET( tas, oldb )
                        Performs a test-and-set operation on the
                        location _tas_. _oldb_ will contain *true* if
                        the location was already set, and *false* if
                        it was clear.

HAL_TAS_CLEAR( tas, oldb )
                        Performs a test-and-clear operation on the
                        location _tas_. _oldb_ will contain *true* if
                        the location was already set, and *false* if
                        it was clear.

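An illustrative implementation of these semantics using C11 atomics; a
real HAL would use the architecture's test-and-set, compare-and-swap
or load-link/store-conditional instructions. The names are invented
for this sketch, and only the least significant bit is used, matching
the single-bit guarantee described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef atomic_uint demo_tas_t;

/* Atomically set the low bit; oldb receives its previous value. */
#define DEMO_TAS_SET(tas, oldb) \
    ((oldb) = ((atomic_fetch_or(&(tas), 1u) & 1u) != 0))

/* Atomically clear the low bit; oldb receives its previous value. */
#define DEMO_TAS_CLEAR(tas, oldb) \
    ((oldb) = ((atomic_fetch_and(&(tas), ~1u) & 1u) != 0))
```

Using fetch-or and fetch-and keeps the other bits of the location
untouched, which is why the macros can promise atomicity for only a
single bit.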
Spinlocks
---------

Spinlocks provide inter-CPU locking. Normally they will be implemented
on top of the test-and-set mechanism above, but may also be
implemented by other means if, for example, the hardware has more
direct support for spinlocks.

HAL_SPINLOCK_TYPE       The type for all spinlock variables.

HAL_SPINLOCK_INIT_CLEAR A value that may be assigned to a spinlock
                        variable to initialize it to clear.

HAL_SPINLOCK_INIT_SET   A value that may be assigned to a spinlock
                        variable to initialize it to set.

HAL_SPINLOCK_SPIN( lock )
                        The caller spins in a busy loop waiting for
                        the lock to become clear. It then sets it and
                        continues. This is all handled atomically, so
                        that there are no race conditions between
                        CPUs.

HAL_SPINLOCK_CLEAR( lock )
                        The caller clears the lock. One of any waiting
                        spinners will then be able to proceed.

HAL_SPINLOCK_TRY( lock, val )
                        Attempts to set the lock. The value put in
                        _val_ will be *true* if the lock was
                        claimed successfully, and *false* if it was
                        already claimed.

HAL_SPINLOCK_TEST( lock, val )
                        Tests the current value of the lock. The value
                        put in _val_ will be *true* if the lock is
                        claimed and *false* if it is clear.

Scheduler Lock
--------------

The scheduler lock is the main protection for all kernel data
structures. By default the kernel implements the scheduler lock itself
using a spinlock. However, if spinlocks cannot be supported by the
hardware, or there is a more efficient implementation available, the
HAL may provide macros to implement the scheduler lock.

HAL_SMP_SCHEDLOCK_DATA_TYPE
                        A data type, possibly a structure, that
                        contains any data items needed by the
                        scheduler lock implementation. A variable of
                        this type will be instantiated as a static
                        member of the Cyg_Scheduler_SchedLock class
                        and passed to all the following macros.

HAL_SMP_SCHEDLOCK_INIT( lock, data )
                        Initialize the scheduler lock. The _lock_
                        argument is the scheduler lock counter and the
                        _data_ argument is a variable of
                        HAL_SMP_SCHEDLOCK_DATA_TYPE type.

HAL_SMP_SCHEDLOCK_INC( lock, data )
                        Increment the scheduler lock. The first
                        increment of the lock from zero to one for any
                        CPU may cause it to wait until the lock is
                        zeroed by another CPU. Subsequent increments
                        should be less expensive, since this CPU
                        already holds the lock.

HAL_SMP_SCHEDLOCK_ZERO( lock, data )
                        Zero the scheduler lock. This operation will
                        also clear the lock so that other CPUs may
                        claim it.

HAL_SMP_SCHEDLOCK_SET( lock, data, new )
                        Set the lock to a different value, in
                        _new_. This is only called when the lock is
                        already known to be owned by the current
                        CPU. It is never called to zero the lock, or
                        to increment it from zero.

Interrupt Routing
-----------------

The routing of interrupts to different CPUs is supported by two new
interfaces in hal_intr.h.

Once an interrupt has been routed to a new CPU, the existing vector
masking and configuration operations should take account of the CPU
routing. For example, if the operation is not invoked on the
destination CPU itself, then the HAL may need to arrange to transfer
the operation to the destination CPU for correct application.

HAL_INTERRUPT_SET_CPU( vector, cpu )
                        Route the interrupt for the given _vector_ to
                        the given _cpu_.

HAL_INTERRUPT_GET_CPU( vector, cpu )
                        Set _cpu_ to the id of the CPU to which this
                        vector is routed.

Annex 1 - Pentium SMP Support
=============================

eCos supports SMP working on Pentium-class IA32 CPUs with integrated
SMP support. It uses the per-CPU APICs and the IOAPIC to provide CPU
control and identification, and to distribute interrupts. Only PCI
interrupts that map into the ISA interrupt space are currently
supported. The code relies on the MP Configuration Table supplied by
the BIOS to discover the number of CPUs, the IOAPIC location and the
interrupt assignments; hardware-based MP configuration discovery is
not currently supported.

Inter-CPU interrupts are mapped into interrupt vectors from 64
up. Each CPU has its own vector at 64+CPUID.

Interrupt delivery is initially configured to deliver all interrupts
to the initial CPU. HAL_INTERRUPT_SET_CPU() currently only supports
the ability to deliver interrupts to specific CPUs; dynamic CPU
selection is not currently supported.

eCos has only been tested in a dual processor configuration. While the
code has been written to handle an arbitrary number of CPUs, this has
not been tested.