MicroZed Chronicles: TMR SEM IP

As a practicing engineer, I spend a lot of time developing space or other high-reliability applications.

I mention this because when designing high-reliability systems, whether they operate terrestrially or in space, we need to consider the integrity of the SRAM configuration memory. The main concern is that corruption of the configuration memory will change the behavior of the design, resulting in an error or incorrect behavior. Of course, creating a high-reliability solution means a layered defense, and mitigating configuration corruption is one of the layers that helps ensure there is no single point of failure.

One of the most important tools in the toolbox when working with Xilinx FPGAs is the TMR SEM IP. This is a Triple Modular Redundant Soft Error Mitigation IP core which can scan the configuration memory at run time and correct any configuration errors that may occur.

Implementing the TMR SEM IP in Vivado is simple. We can insert the IP core in the block diagram and can communicate with the core using one of two methods -- either AXI-Lite or UART. The AXI-Lite option enables a processor inside the device to monitor and communicate with the TMR SEM IP core.

Alternatively, we can communicate with the core using the UART and route the pins to an external port. This enables an external device to monitor the core. It can also be used during testing to inject faults and check the correction and overall system response.

The error correction method can be one of the following:

  • Repair – ECC-based correction algorithm

  • Enhanced Repair – ECC and CRC-based correction algorithm

  • Replace – Reloads the configuration frame from golden frame data held outside the configuration memory

Selecting the most appropriate correction method allows us to correct single-bit errors, multiple-bit errors distributed one per frame, or double-bit adjacent errors. The replace method allows arbitrary errors confined to a single frame to be corrected.

In the example I created for this blog, I inserted the TMR SEM IP core into a design targeting an Arty S7-50 board, with its UART interface connected to the board's USB-UART.

Once the application is built and programmed into the Arty S7-50 board, we can see the output of the SEM core over the terminal.

This reports the state change to initialization, followed by the core configuration information, with ICAP, RDBK, and INIT showing as OK. The core then reports state 02, which is observation.

The state variable defines the current state of the SEM IP core. It is one of idle (SC00), initialization (SC01), observation (SC02), correction (SC04), classification (SC08), injection (SC10), or fatal error (SC1F).
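If you are logging the UART output on a host, a small lookup like the one below can turn the state codes above into readable names. This is only a sketch: the function name and the tolerance for both "SC02" and "SC 02" spellings are my own assumptions, not part of the core.

```python
# State codes as listed above; the dictionary and function names are illustrative.
SEM_STATES = {
    0x00: "idle",
    0x01: "initialization",
    0x02: "observation",
    0x04: "correction",
    0x08: "classification",
    0x10: "injection",
    0x1F: "fatal error",
}


def decode_state(report_line: str) -> str:
    """Translate a state report such as 'SC 02' or 'SC02' into a state name."""
    text = report_line.strip().upper()
    if not text.startswith("SC"):
        raise ValueError(f"not a state report: {report_line!r}")
    code = int(text[2:].strip(), 16)  # the two characters after 'SC' are hex
    return SEM_STATES.get(code, f"unknown (0x{code:02X})")


print(decode_state("SC 02"))
```

Unknown codes are returned as text rather than raising, so a log viewer built on this keeps running if the core reports something unexpected.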

Using this interface, we can query the status of the SEM IP core and inject errors into the configuration for testing.

We can use the ‘S’ command to determine the status of the SEM IP core. This will report the maximum frame, the number of SLR regions, the current state, flags, and feature set.

To inject an error into the configuration memory, the TMR SEM IP core must be in the idle state. Once there, we can enter the linear address we wish to corrupt. We can use the EBD file in the implementation directory to determine the address if we want to specifically target an essential bit.

In this example, I am going to corrupt the LSB at linear frame address 20 word 0. For more information on addressing, see PG036 page 62.
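As a rough sketch of how the injection address for this example could be built, the Python below packs the frame, word, and bit into a 10-digit hex value. The field layout (a 0xC prefix selecting linear frame addressing, then a 17-bit frame, a 7-bit word, and a 5-bit bit index) is my reading of the 7 series addressing table in PG036, so treat it as an assumption and check it against the guide for your device. The function name is my own.

```python
def linear_frame_injection_address(frame: int, word: int, bit: int) -> str:
    """Pack a linear frame address into a 40-bit value, returned as 10 hex digits.

    Assumed layout (verify against PG036 for your device):
      bits [39:36] = 0xC   linear frame addressing prefix
      bits [28:12] = frame linear frame address (17 bits)
      bits [11:5]  = word  word within the frame (7 bits)
      bits [4:0]   = bit   bit within the word (5 bits)
    """
    if not 0 <= frame < (1 << 17):
        raise ValueError("frame does not fit in 17 bits")
    if not 0 <= word < (1 << 7):
        raise ValueError("word does not fit in 7 bits")
    if not 0 <= bit < (1 << 5):
        raise ValueError("bit does not fit in 5 bits")
    value = (0xC << 36) | (frame << 12) | (word << 5) | bit
    return f"{value:010X}"


# Linear frame address 20, word 0, LSB (bit 0), as in the example above.
print(linear_frame_injection_address(20, 0, 0))
```

The range checks only enforce the assumed field widths. They do not know the maximum frame of your device, which the status report gives you.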

To run the correction, the SEM IP must be in the observation state. We can enter the observation state by using the command ‘O’. Switching back to this state should result in the error being detected and corrected.
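The whole inject-and-correct sequence can be scripted from a host. The sketch below only shows the order of the commands. It assumes 'I' enters idle and 'N' followed by the 10-digit hex address performs the injection (my recollection of the PG036 command set; only 'O' and 'S' are named in this blog), and that each command is terminated with a carriage return. The function name is hypothetical, and `port` is any object with a `write(bytes)` method, such as an open serial port.

```python
import io
from typing import BinaryIO, List


def run_injection_test(port: BinaryIO, injection_address: str) -> List[str]:
    """Send idle, inject, observe commands in order and return what was sent.

    Assumptions: 'I' = enter idle, 'N <address>' = inject error,
    'O' = enter observation, each terminated by a carriage return.
    """
    commands = ["I", f"N {injection_address}", "O"]
    for command in commands:
        # On real hardware, wait for the core's prompt before the next command.
        port.write((command + "\r").encode("ascii"))
    return commands


# Dry run against an in-memory buffer instead of a real UART.
fake_port = io.BytesIO()
sent = run_injection_test(fake_port, "C000014000")
print(sent)
```

On hardware I would read back the response after each command rather than writing all three blindly, since the core needs time to change state.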

You will notice that a report is generated when the error is detected. It includes the state change to correction (SC04), followed by a single error detection (SED) with a valid syndrome (hence the OK). The following three lines of data provide the physical and logical address, word, and bit location. These should match the error previously injected.

The report then states COR to indicate a correctable error, followed by the flag status and the error classification report.

Having the TMR SEM IP core available is valuable when we are working on high-reliability and mission-critical applications!