If you have followed my blogs, courses, or webinars, you will know that I do a lot of high-reliability design and have written about it for years. An article I wrote for Xilinx’s Xcell Journal on mission-critical design is over 10 years old now. When you do what you are passionate about, time really flies. This year, I have talked a lot about high-reliability design and ran a webinar on the subject. In addition, one of my recent Hackster.io projects demonstrated how we can simulate single event upsets (SEUs) and their impact on our FSMs, should they occur.
In all of these, I mentioned how we can design state machines that do not lock up when subjected to a single event upset. With the proliferation of programmable logic into automotive and other high-reliability applications, I thought I would examine how we can implement safe state machines when using Vivado 2020.2 and Xilinx synthesis.
Before we focus on the implementation, let’s recap the different protections we might use on our state machine and what we are protecting against. A single event upset flips the value stored in a register, for example from a 0 to a 1. When we are working with state machines that store the current state in registers in the programmable logic, a single event upset can make an FSM change state uncommanded. This uncommanded state change could occur between states that are defined in the design, or it could send the FSM into an unmapped state. Either kind of transition can leave the state machine deadlocked, with a reset of the state machine as the only way to recover.
Obviously, in many applications, this is not acceptable and we want our state machine to be able to either continue operating (tolerance) or detect a failure and fail safe (detection). When it comes to tolerance there are two mechanisms we can use:
Triple Modular Redundancy (TMR) — Three instantiations of the state machine are created, and the outputs and current state are voted upon clock cycle by clock cycle. TMR is a good approach, but it does require three implementations and physical separation between them so that a single upset corrupts only one FSM. Xilinx Isolation Design Flow (IDF) can be especially useful for ensuring physical separation in the chip.
Hamming Three Encoded — With a Hamming three encoded state machine, the states are encoded with a Hamming distance of three between them. This prevents an SEU from changing between valid states, as the SEU can only flip a single bit. Each state also has a set of adjacent codes, one for every single-bit flip an SEU could cause. Each adjacent code behaves the same as its original state, allowing the state machine to tolerate the SEU and keep operating. It does, however, mean the number of states declared is large. For a 16-state FSM, four bits are enough for a sequential encoding, but seven bits are needed to encode the 16 states with a Hamming distance of three. Each state then needs its own code plus one adjacent code per register bit, so N * (M+1) states are required, where N is the number of states and M is the register size. Here that is 16 * (7+1) = 128, which is every possible value of the seven-bit register.
Both TMR and Hamming three require considerable effort from the design engineer unless the structure can be implemented automatically by the synthesis tool.
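To make the voting step in TMR concrete, here is a minimal sketch of a bitwise two-of-three majority voter in VHDL. The entity name, generic, and port names are hypothetical and are not taken from any design referenced in this post; it only illustrates the clock-cycle-by-clock-cycle vote on the state or output vectors of three redundant FSM copies.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Bitwise 2-of-3 majority voter for the state (or output) vectors of
-- three redundant FSM copies. All names here are hypothetical.
entity tmr_voter is
  generic (
    WIDTH : positive := 2  -- width of the vector being voted on
  );
  port (
    a        : in  std_logic_vector(WIDTH - 1 downto 0);
    b        : in  std_logic_vector(WIDTH - 1 downto 0);
    c        : in  std_logic_vector(WIDTH - 1 downto 0);
    voted    : out std_logic_vector(WIDTH - 1 downto 0);
    mismatch : out std_logic  -- '1' when the three copies disagree
  );
end entity tmr_voter;

architecture rtl of tmr_voter is
begin
  -- Each output bit follows at least two of the three input bits.
  voted <= (a and b) or (a and c) or (b and c);

  -- Flag any disagreement so the corrupted copy can be resynchronised.
  mismatch <= '1' when (a /= b) or (b /= c) else '0';
end architecture rtl;
```

In a real design the voted state would typically be fed back into all three copies so an upset copy recovers on the next clock. Synthesis tools tend to merge identical logic, so the three copies will probably need to be protected from optimisation (for example with an attribute such as DONT_TOUCH, which is worth checking against the synthesis guide for your tool version).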
When it comes to detection, the structures used are considerably simpler. These are the basic structures that are used:
Hamming Two (sequential + parity) — This method encodes the states with a Hamming distance of two between them. Should an SEU occur, the error can be detected using a simple XOR network and the state machine can be recovered to a safe state to recommence operation.
Default Case / When Others — This method uses the Verilog default / VHDL when others branch to implement a recovery state. It relies on the synthesis tool not ignoring the default case / when others branch, and on the user not defining it as null or don't care.
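The two detection structures can be sketched together in a hand-coded FSM. The example below is an illustration under my own assumptions, not output from any tool: the entity, port names, and state codes are hypothetical. Each state uses two sequential bits plus an even-parity bit, so any two valid codes differ in at least two bits, and a three-input XOR flags a single bit flip.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hand-coded Hamming two (sequential + parity) FSM with a when others
-- recovery branch. Entity, ports, and state codes are hypothetical.
entity hamming2_fsm is
  port (
    clk     : in  std_logic;
    rst     : in  std_logic;
    rd_req  : in  std_logic;
    wr_req  : in  std_logic;
    busy    : out std_logic;
    seu_err : out std_logic  -- '1' while the stored state has odd parity
  );
end entity hamming2_fsm;

architecture rtl of hamming2_fsm is
  -- Bits 2..1 are the sequential code, bit 0 is the even-parity bit.
  constant IDLE : std_logic_vector(2 downto 0) := "000";
  constant RD   : std_logic_vector(2 downto 0) := "011";
  constant WR   : std_logic_vector(2 downto 0) := "101";

  signal current_state : std_logic_vector(2 downto 0) := IDLE;
  signal parity_err    : std_logic;
begin
  -- XOR network: a single bit flip makes the overall parity odd.
  parity_err <= current_state(2) xor current_state(1) xor current_state(0);
  seu_err    <= parity_err;
  busy       <= '0' when current_state = IDLE else '1';

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' or parity_err = '1' then
        -- Detected upset (or reset): recover to the safe state.
        current_state <= IDLE;
      else
        case current_state is
          when IDLE =>
            if rd_req = '1' then
              current_state <= RD;
            elsif wr_req = '1' then
              current_state <= WR;
            end if;
          when RD =>
            current_state <= IDLE;
          when WR =>
            current_state <= IDLE;
          when others =>
            -- Covers the unused even-parity code "110".
            current_state <= IDLE;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

A hand-coded encoding like this only survives if the synthesis tool does not re-encode the state register or prune the when others branch as unreachable, so it is worth confirming the result in the synthesis report and schematic.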
There are some great capabilities when it comes to implementing safe state machines using Vivado synthesis.
Using the FSM_SAFE_STATE attribute in either the source code or the XDC file in Vivado allows us to implement the following state machines:
auto_safe_state — Hamming three encoded to ensure tolerance of SEU
reset_state — Hamming two encoding to force the state machine to its reset state on error
power_on_state — Hamming two encoding to force the state machine to its power-on state on error
default_state — Hamming two encoding to force the state machine to the default / when others case defined in the HDL
Let’s take for example a simple three state FSM which leaves the fourth state unused. We can implement a Hamming three FSM using the attribute in the source code.
type state is (idle, rd, wr, unmapped);
signal current_state : state;
signal transfer_word : std_logic_vector(31 downto 0);
attribute fsm_safe_state : string;
attribute fsm_safe_state of current_state : signal is "auto_safe_state";
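For context, these declarations could sit inside a complete design along the following lines. This is a minimal sketch under my own assumptions and is not the design from the GitHub repository: the entity name, the ports, and the transitions are hypothetical, chosen only so that there is a 32-bit input register and a 32-bit output register around a three-state FSM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper showing where the fsm_safe_state attribute sits.
entity safe_fsm is
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;
    rd_req   : in  std_logic;
    wr_req   : in  std_logic;
    data_in  : in  std_logic_vector(31 downto 0);
    data_out : out std_logic_vector(31 downto 0)
  );
end entity safe_fsm;

architecture rtl of safe_fsm is
  type state is (idle, rd, wr, unmapped);
  signal current_state : state;
  signal transfer_word : std_logic_vector(31 downto 0);

  -- Ask synthesis for a Hamming three encoded (SEU tolerant) FSM.
  attribute fsm_safe_state : string;
  attribute fsm_safe_state of current_state : signal is "auto_safe_state";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        current_state <= idle;
        transfer_word <= (others => '0');
        data_out      <= (others => '0');
      else
        case current_state is
          when idle =>
            if rd_req = '1' then
              current_state <= rd;
            elsif wr_req = '1' then
              current_state <= wr;
            end if;
          when rd =>
            -- Capture the input word (32-bit input register).
            transfer_word <= data_in;
            current_state <= idle;
          when wr =>
            -- Present the stored word (32-bit output register).
            data_out      <= transfer_word;
            current_state <= idle;
          when others =>
            -- The unmapped state is never entered in normal operation.
            current_state <= idle;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

The attribute is attached to the state signal itself, so the same declaration pattern should carry over to any FSM whose state register the tool can infer.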
When we run synthesis with this attribute enabled, we will see the Hamming three implementation by opening the synthesis view and examining the schematic.
Confirmation that this has been implemented as a Hamming three state machine will also be reported in the synthesis report.
When it comes to examining the difference in implementation, there is of course a difference in the resources required. A simple implementation with no protection is implemented as a one-hot FSM and uses three FFs, with the remaining 64 FFs used for the input and output registers (32 bits each). The LUTs are used to implement the next state logic.
Enabling the safe FSM attribute increases the number of FFs to 69. This accounts for the five FFs needed for the Hamming three encoding of the state machine, in place of the three one-hot FFs. The number of LUTs also doubled to support the additional next state decoding required by the Hamming three encoding.
Of course, the increase in complexity will impact the performance and power dissipation of the device when multiple FSMs are protected across a design.
If you do not want to define the attribute in the RTL code, it can be declared in the XDC file as below:
set_property fsm_safe_state auto_safe_state [get_cells current_state_reg[*]]
The elaboration schematic is your friend if you need to find the name of the cells for the FSM in a more complex design with hierarchy.
Now we know how to use Vivado to implement safe state machines in our programmable logic designs, should we need them.
The design is available on my GitHub.