Startup Shutdown Synchronization Protocol (SSSP v1.4)

SSSP defines signal handling during the startup phase until all AMiRo Modules are fully initialized and during the shutdown phase, so that the system turns off in a controlled and safe manner or restarts, if requested.
The protocol is deliberately simple and is designed so that modules which do not implement SSSP will not compromise system operation.
Hence, only two GPIO signals are required:

S - synchronize
PD - power down

Both must be designed so that they realize a logical OR on activation (the signal is active if one or more nodes activate it) and, correspondingly, a logical AND on deactivation (the signal is inactive only if all nodes are inactive).
Electrically this can be implemented using active-low open-drain signals with pull-up resistors.
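
The wired-OR behavior described above can be modeled in a few lines of C. This is only a sketch for illustration: the function names and the array-based representation of the nodes are assumptions and are not part of SSSP or of any AMiRo code base.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a shared SSSP signal (S or PD).
 * node_active[i] is true while node i activates the signal, i.e. pulls
 * the active-low open-drain line to ground. */
bool sssp_signal_is_active(const bool *node_active, size_t num_nodes)
{
    for (size_t i = 0; i < num_nodes; ++i) {
        if (node_active[i]) {
            return true;   /* one active node suffices (logical OR) */
        }
    }
    return false;          /* all nodes inactive: the pull-up wins */
}

/* Electrical level of the active-low line: high only while inactive. */
bool sssp_line_level_high(const bool *node_active, size_t num_nodes)
{
    return !sssp_signal_is_active(node_active, num_nodes);
}
```

With three nodes of which only one activates the signal, `sssp_signal_is_active` reports the signal as active and the line level is low.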

Although these two mandatory signals suffice to implement the protocol, some optional features require further signals and communication interfaces:

UP/DN - GPIO to the adjacent module (neighbor) up/down
BCB - a communication bus with broadcast capability

Note that a heterogeneous setup, in which some modules support the optional stages and others do not, is fully compatible.
However, these optional features will only apply successfully if all modules support them.
Hence, the system must not rely on the additional information, but may take advantage of it, if it is available.

In order to make the protocol adaptable to any system, it uses a parameter T.
This defines a time period, which is used by the protocol for synchronization or to detect timeouts.
This parameter must be identical for all nodes within a system, or at least similar: the ratio between the largest and the smallest parameter in the system must be smaller than ten.
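
The constraint on T can be checked with a small helper. The following is a sketch only: the function name and the choice of microseconds as the unit are assumptions made for this example, since SSSP does not prescribe either.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns true if the largest period T in the system
 * is less than ten times the smallest one. t_us holds one value of T per
 * node, given in microseconds (unit chosen for this example only). */
bool sssp_periods_compatible(const uint32_t *t_us, size_t num_nodes)
{
    if (num_nodes == 0) {
        return true;                 /* nothing to compare */
    }
    uint32_t min = t_us[0];
    uint32_t max = t_us[0];
    for (size_t i = 1; i < num_nodes; ++i) {
        if (t_us[i] < min) { min = t_us[i]; }
        if (t_us[i] > max) { max = t_us[i]; }
    }
    if (min == 0) {
        return false;                /* a period of zero is not usable */
    }
    /* 64-bit arithmetic avoids overflow of min * 10 */
    return (uint64_t)max < (uint64_t)min * 10u;
}
```

For example, periods of 1000 and 9999 microseconds are compatible, whereas 1000 and 10000 microseconds are not, because the factor must be strictly smaller than ten.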

An additional parameter F defines the frequency at which S is toggled during the Operation Phase.

Startup Phase

All modules must initialize the signals so that S is active and PD is inactive.
Although only S is used for startup, PD must be inactive during the startup phase. Otherwise the shutdown phase will be initiated, either immediately by the bootloader or by the operating system as soon as the latter is running.

Each module executes the following steps:

basic initialization
1. initialization of required signals, voltages, or other hardware
  This first stage is very module specific and strongly depends on the hardware configuration.
  When a module has finished this stage, it sets S to inactive.
  In order to prevent erroneous behavior due to incorrect signals during the initialization, this stage takes at least one period T.
2. waiting for synchronization
  Each module waits for S to become inactive (all modules are initialized) as a first synchronization.
3. synchronous start of stage 2
  As soon as S is inactive, the master node activates it again in order to start the next stage.
  To ensure that each module had enough time to detect the inactive state of S, the master node must delay the activation by at least one period T.
operating system initialization
1. complete system startup
  Each module activates S again and fully initializes (e.g. starts the operating system).
  As soon as it is ready, it deactivates S again.
  When a module signals that it is ready, at least the main communication channel (for AMiRo this is CAN) must be fully operational.
  Again, S must be active for at least one period T, so every module can detect the activation.
2. waiting for synchronization
  Each module waits for S to become inactive (all modules are ready).
  Only now it is safe to use the main communication channel and all modules are able to receive messages correctly.
module stack initialization [optional]
This stage is optional and can only be applied if all modules can read from and write to the main communication channel (BCB) and if two additional signals (UP and DN), which connect neighboring modules, exist for each AMiRo Module.
Modules which do not support this stage will not cause severe issues, but the stage will fail in that case (no stack numbers / module IDs will be available).
Furthermore, the first and last node of this 'module stack' must be known beforehand.
In case of the AMiRo, for instance, the DiWheelDrive and PowerManagement are defined to be the lowermost modules, and the LightRing always finalizes the stack at the top.
1. initiation of this stage
  The master node initiates this stage by broadcasting a unique command via BCB to all modules, so they can interpret the upcoming communication via the neighbor signals (UP and DN) and BCB correctly.
  All supporting modules must wait at least ten periods T for the master's message before skipping this stage (similar to abortion; see below).
  As soon as the initiation command was received, all modules activate S for later detection of failure.
2. starting the sequence
  One of the known nodes at the end of the module stack broadcasts its own stack number (e.g. 1) via BCB.
  One period T after that, it signals its neighboring module to continue by setting the neighbor signal active for at least one period T and deactivates S right after.
  Note that an identifier value of 0 is reserved and must not be used by any module.
  These 'module IDs' can later be used to represent a hierarchy within the system or to address/identify individual modules.
3. counting the modules
  This stage is subdivided into two actions, which are triggered on different events.
  All modules have to execute this stage.
  - triggered by neighbor signal
    When a module is triggered by the activation of a neighbor signal, it broadcasts its own stack number (via BCB), which is defined to be greater than the last one.
    Then again, it waits one period T before deactivating S and triggers the next module to continue by activating the other neighbor signal for at least one period T.
    Furthermore, a timer is set to ten periods T, which is used to detect timeouts.
    In case this timer runs out before the next BCB message or neighbor event (via UP or DN) is received, the module broadcasts an abort message to abort the stage (see below).
    Another reason for abortion would be if the module is triggered a second time during this stage, indicating an invalid loop in the system architecture.
    This step is repeated until one of the termination conditions is fulfilled (see below).
  - message received via BCB
    If a message that holds a stack number of another module is received via BCB, the timer as mentioned above is reset to ten periods T.
    Moreover, the module checks whether the received ID is greater than the previous one.
    If this rule is violated, an abort message is broadcast via BCB and the stage is aborted.
4. termination of this stage
  There are two ways this stage can be terminated: either it is completed correctly, or it is aborted.
  Whereas any module can abort this stage, only the known module finalizing the stack can complete it successfully.
  - completion
    The stage is completed correctly if the signal is propagated to the known node on the other end of the module stack and S becomes inactive as soon as that node deactivates it (all modules have participated in the procedure).
    All modules need to wait ten more periods T after the deactivation of S to make sure no timeouts occurred and no abort message was emitted.
    In this case, all nodes adopt their ID and can use it for later identification.
    If an abort message was received at any time during this stage, however, the whole procedure is aborted (see below).
  - abortion
    The stage is aborted whenever an abort message is received or a timeout occurs (see above).
    As a result, all stack numbers must be considered unreliable, thus identification is not supported.
    Any modules that still activate S must hence deactivate it and as soon as S becomes inactive, all modules may continue operation.
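
The bookkeeping each module performs during the module stack initialization (checking that IDs increase and detecting the timeout of ten periods T) can be sketched as follows. All type and function names are hypothetical, and the sketch assumes that the caller invokes `sssp_tracker_on_period` once per elapsed period T and `sssp_tracker_on_id` for every stack number received via BCB. Sending messages and handling the neighbor signals are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SSSP_TIMEOUT_PERIODS 10u   /* ten periods T, as stated in the text */

/* Hypothetical per-module state for the optional stage 3. */
typedef struct {
    uint32_t last_id;       /* last stack number seen; 0 means none yet */
    uint32_t idle_periods;  /* periods T since the last BCB message */
    bool aborted;           /* true once the stage must be aborted */
} sssp_stack_tracker_t;

void sssp_tracker_init(sssp_stack_tracker_t *t)
{
    t->last_id = 0;
    t->idle_periods = 0;
    t->aborted = false;
}

/* Called for each stack number received via BCB.
 * Returns false if the stage has to be aborted. */
bool sssp_tracker_on_id(sssp_stack_tracker_t *t, uint32_t id)
{
    if (t->aborted) {
        return false;
    }
    /* ID 0 is reserved, and every ID must exceed the previous one */
    if (id == 0 || id <= t->last_id) {
        t->aborted = true;
        return false;
    }
    t->last_id = id;
    t->idle_periods = 0;    /* reset the timeout timer */
    return true;
}

/* Called once per elapsed period T. Returns false on timeout or abort. */
bool sssp_tracker_on_period(sssp_stack_tracker_t *t)
{
    if (!t->aborted && ++t->idle_periods >= SSSP_TIMEOUT_PERIODS) {
        t->aborted = true;
    }
    return !t->aborted;
}
```

When either function returns false, a real implementation would broadcast the abort message and deactivate S as described above.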

At the end of the startup phase both signals, S and PD, are inactive.
Note that a module which does not implement the protocol will not interfere and will cause no errors as long as it does not activate S.
However, such a module might cause errors after the startup phase, if it does not receive crucial information because communication is not set up (e.g. stage 3 might fail).
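
From the perspective of a single non-master module, the two mandatory startup stages (basic initialization and operating system initialization) can be sketched as a small state machine. The state and function names are assumptions made for this illustration. The caller is assumed to pass `work_done` as true only when the module-specific work of the current stage is finished and at least one period T has elapsed; `bus_s_active` is the observed state of S including the contributions of all other modules. The master's task of re-activating S after the first synchronization is not modeled.

```c
#include <stdbool.h>

/* Hypothetical states of one module during the mandatory startup stages. */
typedef enum {
    SSSP_ST1_INIT,        /* stage 1: hardware init, S held active */
    SSSP_ST1_WAIT_SYNC,   /* stage 1: own S released, wait for S inactive */
    SSSP_ST2_WAIT_START,  /* wait for the master to activate S again */
    SSSP_ST2_INIT,        /* stage 2: full init (e.g. OS), S held active */
    SSSP_ST2_WAIT_SYNC,   /* stage 2: own S released, wait for S inactive */
    SSSP_STARTUP_DONE
} sssp_startup_state_t;

typedef struct {
    sssp_startup_state_t state;
    bool drive_s;         /* true while this module activates S */
} sssp_startup_t;

void sssp_startup_init(sssp_startup_t *m)
{
    m->state = SSSP_ST1_INIT;
    m->drive_s = true;    /* S must be active when startup begins */
}

void sssp_startup_step(sssp_startup_t *m, bool work_done, bool bus_s_active)
{
    switch (m->state) {
    case SSSP_ST1_INIT:
        if (work_done) { m->drive_s = false; m->state = SSSP_ST1_WAIT_SYNC; }
        break;
    case SSSP_ST1_WAIT_SYNC:
        if (!bus_s_active) { m->state = SSSP_ST2_WAIT_START; }
        break;
    case SSSP_ST2_WAIT_START:
        if (bus_s_active) { m->drive_s = true; m->state = SSSP_ST2_INIT; }
        break;
    case SSSP_ST2_INIT:
        if (work_done) { m->drive_s = false; m->state = SSSP_ST2_WAIT_SYNC; }
        break;
    case SSSP_ST2_WAIT_SYNC:
        if (!bus_s_active) { m->state = SSSP_STARTUP_DONE; }
        break;
    case SSSP_STARTUP_DONE:
        break;            /* both mandatory stages are complete */
    }
}
```

Once the state machine reaches `SSSP_STARTUP_DONE`, the module no longer activates S and may use the main communication channel.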

Operation Phase

All AMiRo Modules are kept in sync during operation by toggling S at frequency F.
Hence, all modules must act as slaves, and there may be at most one master node (possibly none).
Since S gets activated when a shutdown is initiated (see Shutdown Phase), modules must synchronize at the deactivation of S (its logical falling edge).

Note that this whole phase is optional, since there may be no master node at all.
Further note that a module which does not implement the protocol will not interfere and will cause no errors as long as it does not activate S.
However, such a module might run out of sync which again may cause errors during operation.
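
Synchronizing on the deactivation of S amounts to detecting a transition from active to inactive. A minimal sketch, assuming the module samples the logical state of S (for example in an interrupt handler or a polling loop) and using hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical detector for the deactivation (logical falling edge) of S. */
typedef struct {
    bool prev_active;   /* logical state of S at the previous sample */
} sssp_sync_detector_t;

void sssp_sync_init(sssp_sync_detector_t *d, bool s_active_now)
{
    d->prev_active = s_active_now;
}

/* Feed the current logical state of S. Returns true exactly when S
 * changed from active to inactive, i.e. at a synchronization event. */
bool sssp_sync_sample(sssp_sync_detector_t *d, bool s_active)
{
    bool sync_event = d->prev_active && !s_active;
    d->prev_active = s_active;
    return sync_event;
}
```

Activations of S never produce a synchronization event in this sketch, so the activation of S at the start of a shutdown is not mistaken for one.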

Shutdown Phase

Since the PD signal must not be used during system operation, it is defined to be inactive.
The state of S is undefined, because it was used for synchronization during operation.
Any module can initiate the shutdown phase by activation of PD.
All modules (including the initiating one) must then execute the following steps as soon as the activation of PD is detected:

shutdown of high-level operation
1. initiation of module shutdown
  As soon as the activation of PD is detected, each module activates S.
  The module which initiated the system shutdown by activating PD must, of course, activate S as well.
  Obviously, the module which acted as master node during operation must stop toggling S as soon as PD is activated.
2. shutdown of high-level operation (e.g. the operating system)
  Each module stops all computation in a safe manner, so it can be shut down without data loss or other issues.
  As soon as this is done, it deactivates S.
  In order to ensure that every module had a chance to detect the activation of PD, this step must take at least one period T.
3. waiting for synchronization
  Each module waits for S to become inactive (all modules are done).
system shutdown or restart
1. evaluation of PD signal
  When S becomes inactive, the state of PD indicates whether the system shall shut down or restart.
  Hence, the initiating module, which activated PD, must have set it to the corresponding state before it deactivated S.
  The implication of the PD state at this point is defined as follows:
  - active: A system shutdown is requested.
  - inactive: A system restart is requested.
2. disambiguation procedure
  Since there may be multiple ways to shut down or restart the system, this ambiguity is resolved by the following procedure.
  For this to work, the identifiers which encode the exact shutdown/restart procedure to be executed (see below) must be unambiguous.
  These identifiers, however, depend on the platform and implementation and hence are not defined by SSSP.
  1. serial broadcast of identifier
    The module which initiated the shutdown/restart phase broadcasts an arbitrary number of 'pulses' via S.
    Each 'pulse' starts with S inactive: the module activates S for at least one period T and then deactivates it again for at least one more period T.
    All modules can count the number of pulses, which encodes the exact shutdown/restart procedure to be used.
    Note that S must be inactive for at least one period T before the first pulse (after PD was evaluated).
  2. termination of the serial broadcast
    The broadcast is terminated by a timeout of ten periods T since the last change of S from active to inactive state.
    This timeout also applies if no pulse was sent at all, which corresponds to the identifier 0.
    Thus, this identifier is reserved for the special case that the ambiguity is not resolved and all modules shall execute their default shutdown procedure.
3. final shutdown or restart
  Depending on the evaluation of PD and the result of the disambiguation procedure, each module reacts accordingly.
  - shutdown
    Each module completely stops itself and enters low-power mode.
    The details (e.g. which signals and sensors are still active) depend on the result of the disambiguation procedure.
  - restart
    If a restart was requested, each module starts with the first step of the startup phase.
    The details (e.g. which sensors are kept active) depend on the result of the disambiguation procedure.
    In order to minimize risk of errors, all modules can power off, except for a master node, which resets the whole system and forces a clean startup.
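
The evaluation of PD and the counting of pulses on S can be sketched as follows. The names are hypothetical, and the sketch makes one simplifying assumption: `sssp_decoder_step` is called once per period T with the current logical state of S, starting right after PD was evaluated. A real implementation would more likely react to signal edges and use a hardware timer.

```c
#include <stdbool.h>
#include <stdint.h>

#define SSSP_TIMEOUT_PERIODS 10u   /* ten periods T, as stated in the text */

typedef enum { SSSP_REQ_RESTART, SSSP_REQ_SHUTDOWN } sssp_request_t;

/* PD active when S becomes inactive means shutdown, inactive means restart. */
sssp_request_t sssp_evaluate_pd(bool pd_active)
{
    return pd_active ? SSSP_REQ_SHUTDOWN : SSSP_REQ_RESTART;
}

/* Hypothetical decoder for the identifier broadcast via pulses on S. */
typedef struct {
    bool prev_active;       /* state of S at the previous call */
    uint32_t pulses;        /* completed pulses, i.e. the identifier */
    uint32_t idle_periods;  /* periods T since S last became inactive */
    bool done;              /* true once the timeout has ended the broadcast */
} sssp_pulse_decoder_t;

void sssp_decoder_init(sssp_pulse_decoder_t *d)
{
    d->prev_active = false;
    d->pulses = 0;
    d->idle_periods = 0;
    d->done = false;
}

/* Call once per period T. Returns true when the broadcast has terminated;
 * d->pulses then holds the identifier (0 selects the default procedure). */
bool sssp_decoder_step(sssp_pulse_decoder_t *d, bool s_active)
{
    if (d->done) {
        return true;
    }
    if (!s_active) {
        if (d->prev_active) {
            d->pulses++;            /* active -> inactive completes a pulse */
            d->idle_periods = 0;    /* the timeout restarts at this change */
        } else if (++d->idle_periods >= SSSP_TIMEOUT_PERIODS) {
            d->done = true;
        }
    }
    d->prev_active = s_active;
    return d->done;
}
```

If S stays inactive for ten consecutive calls without any pulse, the decoder finishes with an identifier of 0, which matches the reserved default case described above.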

Again, a module which does not implement the protocol will cause no errors as long as it does not activate S or PD.
However, if such a module has its own power supply and does not enter low-power mode, it will unnecessarily draw energy and might not end up in a defined state like the rest of the system.
Most importantly, the latter might corrupt system operation if the undefined state of modules that do not implement SSSP causes unwanted side effects such as stalled communication buses or duplicate IDs.
