We looked recently at the UltraScale+ IO resources which provide a range of capabilities that can be very effective in addressing both FPGA and hardware design aspects. For example, we can use the I/OSERDES to work with serialized IO streams, DDR primitives to work with DDR interfaces, and I/ODELAYS to fine tune signal delays.
In this blog, we are going to look at how we can work with the ODELAY on an UltraScale+ device. The ODELAY is a 512-tap delay line. The individual taps are uncalibrated, and each tap provides between 2.1 ps and 12 ps of delay. The ODELAY can operate in count mode, where the delay is defined as the tap to be output, e.g. 0 to 511. The more useful mode of operation, though, is the time mode, where the ODELAY is calibrated for process, voltage, and temperature variation by an IDELAYCTRL block. In this case, the delay is defined as the required time delay. The delay the ODELAY provides can be defined as being either fixed or variable.
This blog is going to look at how we can work with the ODELAY in the variable time mode. There are two methods of updating the delay: the variable method and the var_load method. In the variable method, the delay can be incremented or decremented using a simple inc/dec interface and a chip enable. In the var_load approach, the delay can be loaded via a parallel interface, although we need to consider the number of taps being changed.
The var_load time approach is the more complex solution and is what we'll be examining in the remainder of this blog. This application will chain together three ODELAYs to make the delay a little more obvious when examined on an oscilloscope.
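As a rough illustration of the variable method, the sketch below turns a step request into a one-clock CE strobe, with INC selecting the direction. The entity name odelay_step and the i_step / i_up ports are hypothetical and not part of the design in this blog. It assumes the two outputs are wired to the CE and INC ports of an ODELAYE3 configured with DELAY_TYPE => "VARIABLE".

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: converts a step request into a single-cycle CE strobe.
entity odelay_step is
  port (
    i_clk  : in  std_logic;
    i_step : in  std_logic;  -- a rising edge requests one tap step
    i_up   : in  std_logic;  -- '1' = increment, '0' = decrement
    o_ce   : out std_logic;  -- assumed to drive ODELAYE3 CE
    o_inc  : out std_logic   -- assumed to drive ODELAYE3 INC
  );
end entity;

architecture rtl of odelay_step is
  signal s_step_d : std_logic := '0';
begin
  process (i_clk)
  begin
    if rising_edge(i_clk) then
      s_step_d <= i_step;
      -- CE is high for exactly one clock per rising edge on i_step
      o_ce  <= i_step and not s_step_d;
      -- INC is registered in the same cycle so it lines up with CE
      o_inc <= i_up;
    end if;
  end process;
end architecture;
```

Each strobe moves the delay by one tap, which is why this method is simple but slow for large changes compared with loading a value in parallel.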
To operate in the time mode, we need to use the IDELAYCTRL block, which provides calibration. This block is simple to instantiate and only needs the reference clock and reset input signals. For UltraScale+ devices, the reference clock needs to be between 300 and 800 MHz.
When working in the time mode with the var_load approach, we need to know an initial delay and its corresponding tap setting. The tap setting is made available on the CNTVALUEOUT port, where we can read it and use it to calculate the updated value to apply on the CNTVALUEIN port.
The process for updating the ODELAY includes the following steps:
1. Sample the CNTVALUEOUT port
2. Set EN_VTC low
3. Wait for 10 clock periods
4. Calculate the new CNTVALUEIN
5. Pulse the LOAD signal
6. Wait for 10 clock periods
7. Set EN_VTC high
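To calculate the new tap value, we need the previously sampled CNTVALUEOUT and the ratio between the updated delay and the delay which resulted in that CNTVALUEOUT. The tap count scales with the delay, so a longer delay needs proportionally more taps. This matches the update state in the code later in this blog, which scales the stored count by the requested delay relative to the initial DELAY_VALUE.
CNTVALUEIN = CNTVALUEOUT * (Delay_New / Delay_Old)
When calculating this new value, we need to take into account the alignment delay, which is between 45 and 65 taps and is not scaled with the requested delay. Typically, this is 54 taps and as a result, the equation can be updated to the following:
CNTVALUEIN = ((CNTVALUEOUT - Align) * (Delay_New / Delay_Old)) + Align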
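To make the arithmetic concrete, here is a minimal sketch of the calculation as an integer VHDL function with a small self-checking testbench. The package and function names are hypothetical and not part of the design in this blog. It assumes the tap count scales linearly with the requested delay (more delay needs more taps), that delays are given in ps, and that the result should be clamped to the 0 to 511 tap range.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package odelay_calc_pkg is
  -- Hypothetical helper: returns the tap value to load for a new delay.
  function calc_cntvaluein (
    cntvalueout : natural;   -- tap count read back for the initial delay
    align       : natural;   -- alignment delay in taps (typically about 54)
    delay_old   : positive;  -- delay in ps that produced cntvalueout
    delay_new   : natural    -- requested delay in ps
  ) return natural;
end package;

package body odelay_calc_pkg is
  function calc_cntvaluein (
    cntvalueout : natural;
    align       : natural;
    delay_old   : positive;
    delay_new   : natural
  ) return natural is
    variable v_taps : natural;
  begin
    -- Guard against a readback at or below the alignment delay
    if cntvalueout <= align then
      return align;
    end if;
    -- Scale only the taps above the alignment delay, rounding to nearest
    v_taps := ((cntvalueout - align) * delay_new + delay_old / 2) / delay_old
              + align;
    -- Clamp to the 512-tap range (assumption: saturate rather than wrap)
    if v_taps > 511 then
      v_taps := 511;
    end if;
    return v_taps;
  end function;
end package body;

use work.odelay_calc_pkg.all;

entity tb_odelay_calc is
end entity;

architecture sim of tb_odelay_calc is
begin
  process
  begin
    -- 20 taps above a 54-tap alignment, scaled by 250/100, gives 50 + 54
    assert calc_cntvaluein(74, 54, 100, 250) = 104
      report "scaling check failed" severity failure;
    -- An unchanged delay returns the original readback
    assert calc_cntvaluein(74, 54, 100, 100) = 74
      report "identity check failed" severity failure;
    -- Requests beyond the delay line saturate at the last tap
    assert calc_cntvaluein(300, 54, 100, 400) = 511
      report "clamp check failed" severity failure;
    wait;
  end process;
end architecture;
```

The example values (74 taps read back, 54 taps of alignment) are made up for illustration; on hardware, both come from CNTVALUEOUT readings.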
The align delay can be determined by setting the delay to 0, asserting reset on the IDELAYCTRL, and observing the CNTVALUEOUT once reset is released.
In this example, I have cascaded three ODELAYs together to provide a variable delay between 3.6 ns and 23 ns, with each ODELAY able to provide between 1.2 ns and 7.6 ns depending on the tap delay (3 x 1.2 ns = 3.6 ns and 3 x 7.6 ns = 22.8 ns, or roughly 23 ns).
The code below implements the delay structure.
```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library ieee_proposed;
use ieee_proposed.fixed_float_types.all;
use ieee_proposed.fixed_pkg.all;

library unisim;
use unisim.vcomponents.all;

entity iodelay is
  generic (
    G_DEPTH     : integer := 32;
    G_SEL_WIDTH : integer := 6
  );
  port (
    i_clk           : in  std_logic;
    i_reset         : in  std_logic;
    i_update        : in  std_logic;
    i_pulse         : in  std_logic;
    i_fine_dly      : in  std_logic_vector(9 downto 0);
    i_delay         : in  std_logic_vector(G_SEL_WIDTH-1 downto 0);
    o_rdy           : out std_logic;
    o_vtc           : out std_logic;
    o_load          : out std_logic;
    o_reset         : out std_logic;
    o_cntvalue_init : out std_logic_vector(8 downto 0);
    o_int_val_done  : out std_logic;
    o_cntvaluein    : out std_logic_vector(8 downto 0);
    o_pulse         : out std_logic
  );
end entity;

architecture rtl of iodelay is

  -- 0.01 scaling factor: i_fine_dly is applied as a percentage of the initial count
  constant c_delay_0 : ufixed(8 downto -16) := to_ufixed(0.01, 8, -16);

  type t_fsm is (idle, wait_cnt_val, update, write_update, wait_update);
  type t_srl_array is array (G_DEPTH - 1 downto 0) of std_logic;

  signal s_srl_sig             : t_srl_array;
  signal s_srl_delay           : std_logic;
  signal s_rdy                 : std_logic;
  signal s_cntvalueout         : std_logic_vector(8 downto 0);
  signal s_cntvaluein          : std_logic_vector(8 downto 0);
  signal s_tap_width           : std_logic_vector(8 downto 0);
  signal s_fsm                 : t_fsm;
  signal s_cnt                 : integer range 0 to 15;
  signal s_en_vtc              : std_logic := '0';
  signal s_load                : std_logic;
  signal s_initial_cntval      : ufixed(8 downto 0);
  signal s_initial_cntval_done : std_logic := '0';
  signal s_reset               : std_logic;
  signal s_update              : std_logic_vector(1 downto 0) := (others => '0');
  signal s_ce                  : std_logic;
  signal s_master_out          : std_logic;
  signal s_middle_ret          : std_logic;
  signal s_middle_out          : std_logic;
  signal s_end_ret             : std_logic;

begin

  -- First ODELAY in the cascade: takes the data in and drives the final output
  ODELAYE3_inst_1 : ODELAYE3
    generic map (
      CASCADE          => "MASTER",
      DELAY_FORMAT     => "TIME",
      DELAY_TYPE       => "VAR_LOAD",
      DELAY_VALUE      => 100,
      IS_CLK_INVERTED  => '0',
      IS_RST_INVERTED  => '0',
      REFCLK_FREQUENCY => 333.333,
      SIM_DEVICE       => "ULTRASCALE_PLUS",
      UPDATE_MODE      => "ASYNC"
    )
    port map (
      CASC_OUT    => s_master_out,
      CNTVALUEOUT => s_cntvalueout,
      DATAOUT     => o_pulse,
      CASC_IN     => '0',
      CASC_RETURN => s_middle_ret,
      CE          => '0',
      CLK         => i_clk,
      CNTVALUEIN  => s_cntvaluein,
      EN_VTC      => s_en_vtc,
      INC         => '0',
      LOAD        => s_load,
      ODATAIN     => s_srl_delay,
      RST         => s_reset
    );

  -- Second ODELAY in the cascade
  ODELAYE3_inst_2 : ODELAYE3
    generic map (
      CASCADE          => "SLAVE_MIDDLE",
      DELAY_FORMAT     => "TIME",
      DELAY_TYPE       => "VAR_LOAD",
      DELAY_VALUE      => 100,
      IS_CLK_INVERTED  => '0',
      IS_RST_INVERTED  => '0',
      REFCLK_FREQUENCY => 333.333,
      SIM_DEVICE       => "ULTRASCALE_PLUS",
      UPDATE_MODE      => "ASYNC"
    )
    port map (
      CASC_OUT    => s_middle_out,
      -- left open: only the master's count is read back, which also avoids
      -- multiple drivers on s_cntvalueout
      CNTVALUEOUT => open,
      DATAOUT     => s_middle_ret,
      CASC_IN     => s_master_out,
      CASC_RETURN => s_end_ret,
      CE          => '0',
      CLK         => i_clk,
      CNTVALUEIN  => s_cntvaluein,
      EN_VTC      => s_en_vtc,
      INC         => '0',
      LOAD        => s_load,
      ODATAIN     => s_srl_delay,
      RST         => s_reset
    );

  -- Last ODELAY in the cascade
  ODELAYE3_inst_3 : ODELAYE3
    generic map (
      CASCADE          => "SLAVE_END",
      DELAY_FORMAT     => "TIME",
      DELAY_TYPE       => "VAR_LOAD",
      DELAY_VALUE      => 100,
      IS_CLK_INVERTED  => '0',
      IS_RST_INVERTED  => '0',
      REFCLK_FREQUENCY => 333.333,
      SIM_DEVICE       => "ULTRASCALE_PLUS",
      UPDATE_MODE      => "ASYNC"
    )
    port map (
      CASC_OUT    => open,
      CNTVALUEOUT => open,  -- left open, see ODELAYE3_inst_2
      DATAOUT     => s_end_ret,
      CASC_IN     => s_middle_out,
      CASC_RETURN => '0',
      CE          => '0',
      CLK         => i_clk,
      CNTVALUEIN  => s_cntvaluein,
      EN_VTC      => s_en_vtc,
      INC         => '0',
      LOAD        => s_load,
      ODATAIN     => s_srl_delay,
      RST         => s_reset
    );

  -- Calibration block required for TIME mode
  IDELAYCTRL_inst : IDELAYCTRL
    generic map (
      SIM_DEVICE => "ULTRASCALE"
    )
    port map (
      RDY    => s_rdy,
      REFCLK => i_clk,
      RST    => s_reset
    );

  -- i_reset is active low; the primitives take an active-high reset
  s_reset <= not(i_reset);

  -- Update state machine: drop EN_VTC, wait, calculate, load, wait, restore EN_VTC
  process (i_clk)
    variable v_cntvaluein : ufixed(27 downto -16);
  begin
    if rising_edge(i_clk) then
      s_load   <= '0';
      s_ce     <= '0';
      s_update <= s_update(s_update'high - 1 downto s_update'low) & i_update;

      case s_fsm is

        when idle =>
          s_en_vtc <= '1';
          if s_initial_cntval_done = '0' and s_cntvalueout /= "000000000" then
            -- store initial delay count
            s_initial_cntval      <= ufixed(s_cntvalueout);
            o_cntvalue_init       <= s_cntvalueout;
            s_initial_cntval_done <= '1'; -- only store this the first time
          end if;
          -- rising edge on i_update starts an update
          if s_update = "01" then
            s_en_vtc <= '0';
            s_fsm    <= wait_cnt_val;
            s_cnt    <= 0;
          end if;

        when wait_cnt_val =>
          if s_cnt = 10 then
            s_fsm <= update;
          else
            s_cnt <= s_cnt + 1;
          end if;

        when update =>
          -- new count = initial count * i_fine_dly * 0.01
          v_cntvaluein := s_initial_cntval * ufixed(i_fine_dly) * c_delay_0;
          s_cntvaluein <= to_slv(v_cntvaluein(8 downto 0));
          s_fsm        <= write_update;

        when write_update =>
          s_load <= '1';
          s_fsm  <= wait_update;
          s_cnt  <= 0;

        when wait_update =>
          if s_cnt = 10 then
            s_en_vtc <= '1';
            s_fsm    <= idle;
          else
            s_cnt <= s_cnt + 1;
          end if;

      end case;
    end if;
  end process;

  -- Shift register providing a coarse, clock-cycle delay on the input pulse
  process (i_clk)
  begin
    if rising_edge(i_clk) then
      s_srl_sig <= s_srl_sig(G_DEPTH - 2 downto 0) & i_pulse;
    end if;
  end process;

  s_srl_delay <= s_srl_sig(to_integer(unsigned(i_delay(G_SEL_WIDTH-1 downto 0))));

  o_rdy          <= s_rdy;
  o_vtc          <= s_en_vtc;
  o_load         <= s_load;
  o_cntvaluein   <= s_cntvaluein;
  o_int_val_done <= s_initial_cntval_done;
  o_reset        <= s_reset;

end architecture;
```
We can see the delays running this in simulation.
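For anyone wanting to reproduce the simulation, the following is a minimal testbench sketch for the iodelay entity above. It is an untested outline rather than the testbench used for this blog: the testbench name, the 3 ns clock period (approximating the 333.333 MHz REFCLK_FREQUENCY), the wait times, and the chosen i_fine_dly values are all assumptions. It also assumes the unisim and ieee_proposed libraries are compiled for your simulator.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_iodelay is
end entity;

architecture sim of tb_iodelay is
  constant c_clk_period : time := 3 ns;  -- assumed, roughly 333.333 MHz

  signal s_done    : boolean   := false;
  signal clk       : std_logic := '0';
  signal reset_n   : std_logic := '0';   -- i_reset is active low in the design
  signal update    : std_logic := '0';
  signal pulse     : std_logic := '0';
  signal fine_dly  : std_logic_vector(9 downto 0) := (others => '0');
  signal delay_sel : std_logic_vector(5 downto 0) := (others => '0');
  signal rdy       : std_logic;
  signal init_done : std_logic;
  signal pulse_out : std_logic;
begin

  -- Free-running clock that stops when the stimulus finishes
  clk <= not clk after c_clk_period / 2 when not s_done else '0';

  dut : entity work.iodelay
    port map (
      i_clk           => clk,
      i_reset         => reset_n,
      i_update        => update,
      i_pulse         => pulse,
      i_fine_dly      => fine_dly,
      i_delay         => delay_sel,
      o_rdy           => rdy,
      o_vtc           => open,
      o_load          => open,
      o_reset         => open,
      o_cntvalue_init => open,
      o_int_val_done  => init_done,
      o_cntvaluein    => open,
      o_pulse         => pulse_out
    );

  stimulus : process
  begin
    wait for 100 ns;
    reset_n <= '1';                        -- release reset
    if rdy /= '1' then
      wait until rdy = '1';                -- IDELAYCTRL calibration done
    end if;
    if init_done /= '1' then
      wait until init_done = '1';          -- initial tap count captured
    end if;

    for i in 2 to 4 loop
      -- i_fine_dly acts as a percentage of the initial count: 100, 150, 200
      fine_dly <= std_logic_vector(to_unsigned(50 * i, 10));
      wait until rising_edge(clk);
      update <= '1';                       -- rising edge starts the update
      wait until rising_edge(clk);
      update <= '0';
      wait for 40 * c_clk_period;          -- allow the state machine to finish
      wait until rising_edge(clk);
      pulse <= '1';                        -- send one pulse through the delay
      wait until rising_edge(clk);
      pulse <= '0';
      wait for 20 * c_clk_period;
    end loop;

    s_done <= true;
    wait;
  end process;

end architecture;
```

Comparing the edges of pulse and pulse_out in the waveform viewer for each loop iteration should show the delay changing as i_fine_dly is stepped.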
Running this on hardware will, of course, need a good scope to be able to pick up on the slight delays.
Sponsored by AMD Xilinx