When we are verifying a design or bringing it up on a board, it’s not uncommon for reset or clocking issues to stop it working as expected. The issue could be as simple as accidentally holding the design in reset. However, there are many other, worse issues that might only manifest once the product is in production.
In this blog, we are going to look at how we can use resets in our AMD FPGAs and SoCs. As with other SRAM-based devices, the configuration sequence includes a global set reset (GSR), which sets all sequential cells to their initial state at the end of configuration.
The set / reset state of these cells is determined by the initialization of registers within the RTL. If a register is not explicitly initialized, it will be set to zero in most cases. The exceptions are the flip-flops that are designed to set rather than reset, the FDSE and FDPE.
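As a minimal sketch of register initialization in RTL, the hypothetical entity below (init_example and its port and signal names are illustrative, not taken from a real design) declares two registers with explicit initial values and no reset. The assumption is that the synthesis tool carries each initial value through to the inferred flip-flop, so that GSR applies it at the end of configuration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: names are illustrative only.
entity init_example is
  port (
    i_clk  : in  std_logic;
    i_a    : in  std_logic;
    i_b    : in  std_logic;
    o_low  : out std_logic;
    o_high : out std_logic);
end init_example;

architecture Behavioral of init_example is
  -- No reset is coded; the power-on state comes from the initial values.
  signal s_low  : std_logic := '0'; -- expected to come up as '0' after configuration
  signal s_high : std_logic := '1'; -- expected to come up as '1' after configuration
begin
  regs : process(i_clk)
  begin
    if rising_edge(i_clk) then
      s_low  <= i_a;
      s_high <= i_b;
    end if;
  end process;

  o_low  <= s_low;
  o_high <= s_high;
end Behavioral;
```

Because neither register has a reset in the code, the tool is free to choose how to implement them, and the initial values only define the state after configuration.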
Using no reset gives the synthesis engine more resource options when it comes to implementing the application’s functionality.
However, while we can rely on GSR for the initial power on, there are other reasons we might want to reset part or all of the design.
1. Reconfiguration of processing pipelines
2. Watchdog timers expiring and processors needing to be reset
3. User-commanded resets
4. Phase-locked loops losing lock
Typically, it’s the control plane within our design that needs to be reset rather than the data path. This varies from design to design, so I recommend that you consider each design separately.
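One way to reset only the control plane is sketched below, assuming a simple pipeline stage in which a valid flag qualifies the data. The entity ctrl_reset_example and its ports are hypothetical. The reset is written as an override at the end of the process so that the data register is neither reset nor gated by the reset signal.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: names are illustrative only.
entity ctrl_reset_example is
  port (
    i_clk   : in  std_logic;
    i_reset : in  std_logic; -- active high, assumed synchronous to i_clk
    i_valid : in  std_logic;
    i_data  : in  std_logic_vector(15 downto 0);
    o_valid : out std_logic;
    o_data  : out std_logic_vector(15 downto 0));
end ctrl_reset_example;

architecture Behavioral of ctrl_reset_example is
begin
  stage : process(i_clk)
  begin
    if rising_edge(i_clk) then
      -- Data path: no reset, the contents are ignored while o_valid is low.
      o_data  <= i_data;
      o_valid <= i_valid;
      -- Control plane: the last assignment wins, so reset overrides o_valid only.
      if i_reset = '1' then
        o_valid <= '0';
      end if;
    end if;
  end process;
end Behavioral;
```

Placing the data assignment inside an else branch of the reset test would instead make the reset behave like a clock enable on the data register, which is why the override form is used here.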
How we use the reset in our FPGA can impact the power dissipation, performance and area of the implemented design.
We first need to decide whether to use an asynchronous reset or a synchronous reset. We need to use a synchronous reset to get the best performance. In addition, special features like BRAM and DSP elements can only be reset synchronously.
This means that if the reset comes from an external source or a different clock domain, we need to synchronize it to the correct clock domain and use a glitch filtering circuit. The Processor System Reset block used in many designs provides this synchronization capability.
Synchronizing the external reset signal to the correct clock domain helps prevent the registers being reset from going metastable if the reset is released within a register’s setup and hold window. Without synchronization there is no such guarantee, and this can lead to the corruption of data buses in structures such as BRAM.
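For designs that do not use a ready-made reset block, a two-stage synchronizer is one common way to bring a reset into a clock domain. The sketch below is an assumption-laden illustration rather than a complete solution: the entity reset_synchronizer and its ports are hypothetical, the input is taken to be active high, and no glitch filtering is included. ASYNC_REG is the Vivado attribute that marks the registers as a synchronizer chain.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: names are illustrative only.
entity reset_synchronizer is
  port (
    i_clk       : in  std_logic;
    i_ext_reset : in  std_logic;  -- active high, from a pin or another clock domain
    o_reset     : out std_logic); -- active high, synchronous to i_clk
end reset_synchronizer;

architecture Behavioral of reset_synchronizer is
  -- Initialized to '1' so the design starts in reset after configuration.
  signal s_sync : std_logic_vector(1 downto 0) := (others => '1');

  -- Ask Vivado to treat both registers as a synchronizer chain.
  attribute ASYNC_REG : string;
  attribute ASYNC_REG of s_sync : signal is "TRUE";
begin
  sync : process(i_clk)
  begin
    if rising_edge(i_clk) then
      -- Shift the external reset through two registers.
      s_sync <= s_sync(0) & i_ext_reset;
    end if;
  end process;

  o_reset <= s_sync(1);
end Behavioral;
```

Note that a reset pulse shorter than one period of i_clk may be missed by this circuit, so the source should hold the reset for several clock cycles.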
We also need to ensure we are using the reset synchronously within our RTL code. That is, the reset functionality takes place within the clocked process, as in the following example.
vhdl_example : process(clk)
begin
  if rising_edge(clk) then
    if reset = '1' then
      --reset action
    else
      --normal operation
    end if;
  end if;
end process;
To get the best implementation, the logic reset should be active high; otherwise, additional logic is required on the reset path.
To demonstrate this, let’s use the code below to look at the different implementations that result when we use a synchronous reset compared to an asynchronous one.
First let’s look at the synchronous approach.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity reset_sync_example is
  Port (
    -- Note: despite the 'n' suffix, i_resetn is treated as active high here
    i_resetn : in std_logic;
    i_clk    : in std_logic;
    i_a      : in std_logic_vector(15 downto 0);
    i_b      : in std_logic_vector(15 downto 0);
    o_res    : out std_logic_vector(31 downto 0));
end reset_sync_example;

architecture Behavioral of reset_sync_example is
  signal s_a_delay_0 : std_logic_vector(15 downto 0);
  signal s_b_delay_0 : std_logic_vector(15 downto 0);
  signal s_a_delay_1 : std_logic_vector(15 downto 0);
  signal s_b_delay_1 : std_logic_vector(15 downto 0);
begin
  -- synchronous reset: only the clock is needed in the sensitivity list
  sync: process(i_clk)
  begin
    if rising_edge(i_clk) then
      if i_resetn = '1' then
        s_a_delay_0 <= (others =>'0');
        s_b_delay_0 <= (others =>'0');
        s_a_delay_1 <= (others =>'0');
        s_b_delay_1 <= (others =>'0');
        o_res <= (others =>'0');
      else
        --implement 2 registers for the DSP48
        s_a_delay_0 <= i_a;
        s_b_delay_0 <= i_b;
        s_a_delay_1 <= s_a_delay_0;
        s_b_delay_1 <= s_b_delay_0;
        o_res <= std_logic_vector(unsigned(s_a_delay_1) * unsigned(s_b_delay_1));
      end if;
    end if;
  end process;
end Behavioral;
When elaborated, the above code generates the implementation as seen below. The registers are mapped to those available within the DSP48 (A/B/P Reg). This reduces the number of registers required to implement the solution which can be observed in the resource utilization.
Running the same code, but with an asynchronous reset, produces a different result.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity reset_sync_example is
  Port (
    -- Note: despite the 'n' suffix, i_resetn is treated as active high here
    i_resetn : in std_logic;
    i_clk    : in std_logic;
    i_a      : in std_logic_vector(15 downto 0);
    i_b      : in std_logic_vector(15 downto 0);
    o_res    : out std_logic_vector(31 downto 0));
end reset_sync_example;

architecture Behavioral of reset_sync_example is
  signal s_a_delay_0 : std_logic_vector(15 downto 0);
  signal s_b_delay_0 : std_logic_vector(15 downto 0);
  signal s_a_delay_1 : std_logic_vector(15 downto 0);
  signal s_b_delay_1 : std_logic_vector(15 downto 0);
begin
  -- asynchronous reset: the reset is tested outside the clock edge
  sync: process(i_clk,i_resetn)
  begin
    if i_resetn = '1' then
      s_a_delay_0 <= (others =>'0');
      s_b_delay_0 <= (others =>'0');
      s_a_delay_1 <= (others =>'0');
      s_b_delay_1 <= (others =>'0');
      o_res <= (others =>'0');
    elsif rising_edge(i_clk) then
      s_a_delay_0 <= i_a;
      s_b_delay_0 <= i_b;
      s_a_delay_1 <= s_a_delay_0;
      s_b_delay_1 <= s_b_delay_0;
      o_res <= std_logic_vector(unsigned(s_a_delay_1) * unsigned(s_b_delay_1));
    end if;
  end process;
end Behavioral;
As can be seen from the utilization, this implementation uses an additional 65 registers because the registers within the DSP48 cannot be used for the buffering. This will decrease performance and increase the area required.
In addition to using a synchronous reset, there are several attributes which can also be used in synthesis to ensure the correct implementation.
The first of these is the direct reset attribute (DIRECT_RESET), which enables a signal to be directly connected to the reset input of the flip-flop. The second is the extract reset attribute (EXTRACT_RESET), which enables the developer to define an explicit reset signal if Vivado’s algorithm does not detect and use the desired signal.
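A hedged sketch of how such an attribute might be applied in VHDL is shown below. The entity reset_attr_example and its signal names are hypothetical, and the attribute values follow the yes / no convention I understand the Vivado synthesis guide to use, so check them against the guide for your tool version. The attribute is placed on an internal copy of the reset so that the attribute specification sits alongside the signal declaration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: names are illustrative only.
entity reset_attr_example is
  port (
    i_clk   : in  std_logic;
    i_reset : in  std_logic; -- active high, assumed synchronous to i_clk
    i_d     : in  std_logic;
    o_q     : out std_logic);
end reset_attr_example;

architecture Behavioral of reset_attr_example is
  signal s_reset : std_logic;

  -- Request that s_reset drives the flip-flop reset pin directly.
  attribute DIRECT_RESET : string;
  attribute DIRECT_RESET of s_reset : signal is "yes";

  -- Alternative (assumed usage): force reset extraction on this signal.
  -- attribute EXTRACT_RESET : string;
  -- attribute EXTRACT_RESET of s_reset : signal is "yes";
begin
  s_reset <= i_reset;

  reg : process(i_clk)
  begin
    if rising_edge(i_clk) then
      if s_reset = '1' then
        o_q <= '0';
      else
        o_q <= i_d;
      end if;
    end if;
  end process;
end Behavioral;
```

After synthesis, the schematic view is the place to confirm whether the signal really ended up on the flip-flop reset pin rather than being folded into the data path logic.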
We will hopefully be able to achieve better implementation results now that we know how to best use resets in our designs.
Sponsored by AMD Xilinx
In the first implementation, where did the flip-flops go? Are they in the IO cells or in the DSP block? It would be interesting to re-run your experiment with the IO cell insertion turned off so we could get some clarity on where the flip-flops were implemented.
What does this experiment tell us so far? I suspect it is that either the DSP block or the IO cells do not support asynchronous reset - that is a handy thing to know.
What about the polarity of synchronous reset? Is it particular about that? That would be unfortunate as the polarity of ASIC resets has historically been active low.