MicroZed Chronicles: UltraScale+ IO, ODELAYE3 and Cascading

We recently looked at the UltraScale+ IO resources, which provide a range of capabilities that can be very effective in addressing both FPGA and hardware design challenges. For example, we can use the ISERDES and OSERDES to work with serialized IO streams, DDR primitives to work with DDR interfaces, and the IDELAY and ODELAY to fine-tune signal delays.

In this blog, we are going to look at how we can work with the ODELAY on an UltraScale+ device. The ODELAY is a 512-tap delay line. The individual taps are uncalibrated, with each tap providing between 2.1 ps and 12 ps of delay. The ODELAY can operate in count mode, where the delay is defined as the number of taps to use, from 0 to 511. The more useful mode of operation, though, is time mode, where the ODELAY is calibrated for process, voltage, and temperature variation by an IDELAYCTRL block. In this case, the delay is defined as the required time delay. In either mode, the delay can be defined as fixed or variable.


This blog is going to look at how we can work with the ODELAY in the variable time mode. There are two methods of updating the delay: the variable method and the var_load method. In the variable method, the delay is incremented or decremented using a simple inc/dec interface and an enable (CE). In the var_load method, the delay is loaded via a parallel interface, although we need to consider the number of taps being changed.
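
To make the inc/dec interface a little more concrete, here is a minimal sketch of a stepper that could drive the CE and INC ports of an ODELAYE3 configured with DELAY_TYPE => "VARIABLE". The entity name odelay_stepper and all of its ports are hypothetical and are not part of the design presented in this blog. It assumes the delay moves by one tap on each clock edge where CE is high, and it leaves out the EN_VTC handling that time mode would still need around the steps.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper: issues i_steps single-tap moves on CE/INC.
entity odelay_stepper is
  port (
    i_clk   : in  std_logic;
    i_start : in  std_logic;             -- one-clock strobe: begin stepping
    i_up    : in  std_logic;             -- '1' = add taps, '0' = remove taps
    i_steps : in  unsigned(8 downto 0);  -- number of single-tap steps
    o_ce    : out std_logic;             -- to ODELAYE3 CE
    o_inc   : out std_logic;             -- to ODELAYE3 INC
    o_busy  : out std_logic              -- high while steps remain
  );
end entity;

architecture rtl of odelay_stepper is
  signal s_left : unsigned(8 downto 0) := (others => '0');
  signal s_inc  : std_logic := '0';
begin

  process (i_clk)
  begin
    if rising_edge(i_clk) then
      o_ce <= '0';
      if s_left /= 0 then
        o_ce   <= '1';            -- one tap per clock while CE is high
        s_left <= s_left - 1;
      elsif i_start = '1' then
        s_left <= i_steps;        -- latch the request
        s_inc  <= i_up;           -- direction is held for the whole burst
      end if;
    end if;
  end process;

  o_inc  <= s_inc;
  o_busy <= '1' when s_left /= 0 else '0';

end architecture;
```

With i_steps set to 3, o_ce is high for three consecutive clocks starting two clocks after i_start, while o_inc holds the requested direction.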


The var_load time approach is the more complex of the two, and it is the one we'll be examining in the remainder of this blog. This application chains together three ODELAYs to make the delay a little more obvious when examined on an oscilloscope.


To operate in the time mode, we need to use the IDELAYCTRL block, which provides calibration. This block is simple to instantiate and only needs the reference clock and reset input signals. For UltraScale+ devices, the reference clock needs to be between 300 and 800 MHz.


When working in the time mode with the var_load approach, we need to know an initial delay and its corresponding tap setting. The tap setting is made available on the CNTVALUEOUT port, where we can read this value and use it to calculate the updated value to apply on the CNTVALUEIN port.


The process for updating the ODELAY is as follows:


  1. Sample the CNTVALUEOUT port

  2. Set EN_VTC low

  3. Wait for 10 clock periods

  4. Calculate the new CNTVALUEIN

  5. Pulse the LOAD signal

  6. Wait for 10 clock periods

  7. Set EN_VTC high

To calculate the new tap value, we need the previously sampled CNTVALUEOUT and the ratio between the updated delay and the delay which resulted in that CNTVALUEOUT. The tap count scales in proportion to the requested delay:


CNTVALUEIN = CNTVALUEOUT * (Delay_New/Delay_Old)


When calculating this new value, we need to take into account the alignment delay, which is between 45 and 65 taps and is typically 54 taps. This offset is present regardless of the requested delay, so only the taps above it should be scaled. As a result, the equation can be updated to the following:


CNTVALUEIN = ((CNTVALUEOUT - Align) * (Delay_New/Delay_Old)) + Align
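
As a worked illustration of the tap calculation, the sketch below does the arithmetic with plain integers. It assumes that the tap count above the alignment offset scales in proportion to the requested delay. The package name odelay_calc_pkg, the function calc_cntvaluein, and its parameters are hypothetical, and the default of 54 alignment taps is simply the typical figure quoted above rather than a measured value.

```vhdl
-- Hypothetical helper package, not part of the design in this blog.
package odelay_calc_pkg is
  function calc_cntvaluein (
    cntvalueout  : natural;        -- taps read back from CNTVALUEOUT
    delay_old_ps : positive;       -- delay (ps) that produced cntvalueout
    delay_new_ps : natural;        -- requested delay (ps)
    align        : natural := 54   -- alignment taps (assumed typical value)
  ) return natural;
end package;

package body odelay_calc_pkg is
  function calc_cntvaluein (
    cntvalueout  : natural;
    delay_old_ps : positive;
    delay_new_ps : natural;
    align        : natural := 54
  ) return natural is
    variable v_above : natural := 0;  -- taps above the alignment offset
    variable v_taps  : natural;
  begin
    -- Guard against a readback at or below the alignment offset.
    if cntvalueout > align then
      v_above := cntvalueout - align;
    end if;

    -- Scale by Delay_New/Delay_Old with rounding, then restore the offset.
    v_taps := (v_above * delay_new_ps + delay_old_ps / 2) / delay_old_ps + align;

    -- Saturate at the last tap of the 512-tap delay line.
    if v_taps > 511 then
      v_taps := 511;
    end if;
    return v_taps;
  end function;
end package body;
```

For example, with illustrative numbers only, a readback of 74 taps for a 100 ps delay and a request for 500 ps gives (74 - 54) * 500 / 100 + 54 = 154 taps. Because the function uses only integer types, it can be called from a testbench or used to precompute constants at elaboration time.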


The alignment delay can be determined by setting the delay to 0, asserting reset on the IDELAYCTRL, and observing CNTVALUEOUT once reset is released.


In this example, I have cascaded three ODELAYs together to provide a variable delay between 3.6 ns and 23 ns, with each ODELAY able to provide between 1.2 ns and 7.6 ns depending on the tap delay.


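The code below implements the delay structure. For simplicity, it scales the stored initial tap count directly by i_fine_dly, the requested delay in ps relative to the 100 ps DELAY_VALUE, and does not subtract the alignment delay.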

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library ieee_proposed;
use ieee_proposed.fixed_float_types.all;
use ieee_proposed.fixed_pkg.all;

library unisim;
use UNISIM.vcomponents.all;

entity iodelay is 
generic(
 G_DEPTH     : integer := 32;
 G_SEL_WIDTH : integer := 6
);

port(
    i_clk    : in std_logic;
    i_reset  : in std_logic;
    i_update : in std_logic;
    i_pulse  : in std_logic;
    i_fine_dly : in std_logic_vector(9 downto 0);
    i_delay  : in std_logic_vector(G_SEL_WIDTH-1 downto 0);
    o_rdy    : out std_logic;
    o_vtc    : out std_logic;
    o_load   : out std_logic;
    o_reset  : out std_logic;
    o_cntvalue_init : out std_logic_vector(8 downto 0);
    o_int_val_done : out std_logic;
    o_cntvaluein : out std_logic_vector(8 downto 0);
    o_pulse  : out std_logic);
end entity;

architecture rtl of iodelay is 

constant c_delay_0 : ufixed(8 downto -16) := to_ufixed(0.01,8,-16);

type t_fsm is (idle, wait_cnt_val, update, write_update, wait_update);
type t_srl_array is array (G_DEPTH - 1 downto 0) of std_logic;

signal s_srl_sig : t_srl_array;
signal s_srl_delay : std_logic;
signal s_rdy : std_logic;
signal s_cntvalueout : std_logic_vector(8 downto 0);
signal s_cntvaluein : std_logic_vector(8 downto 0);
signal s_tap_width : std_logic_vector(8 downto 0);
signal s_fsm : t_fsm;
signal s_cnt : integer range 0 to 15;
signal s_en_vtc : std_logic:='0';
signal s_load : std_logic;
signal s_initial_cntval : ufixed(8 downto 0);
signal s_initial_cntval_done : std_logic :='0'; 
signal s_reset : std_logic;
signal s_update : std_logic_vector(1 downto 0):= (others =>'0');
signal s_ce : std_logic;

signal s_master_out : std_logic;
signal s_middle_ret : std_logic;
signal s_middle_out : std_logic;
signal s_end_ret : std_logic;



begin

   ODELAYE3_inst_1 : ODELAYE3
   generic map (
      CASCADE => "MASTER",            
      DELAY_FORMAT => "TIME",         
      DELAY_TYPE => "VAR_LOAD",       
      DELAY_VALUE => 100,                       
      IS_CLK_INVERTED => '0',         
      IS_RST_INVERTED => '0',         
      REFCLK_FREQUENCY => 333.333,    
      SIM_DEVICE => "ULTRASCALE_PLUS",
      UPDATE_MODE => "ASYNC"          
                                      
   )
   port map (
      CASC_OUT => s_master_out,    
      CNTVALUEOUT => s_cntvalueout,
      DATAOUT => o_pulse,          
      CASC_IN => '0'    ,          
      CASC_RETURN => s_middle_ret, 
      CE => '0',                   
      CLK => i_clk,                
      CNTVALUEIN => s_cntvaluein,  
      EN_VTC => s_en_vtc,
      INC => '0',                  
      LOAD => s_load,              
      ODATAIN => s_srl_delay,      
      RST => s_reset               
   );
   
   
    ODELAYE3_inst_2 : ODELAYE3
   generic map (
      CASCADE => "SLAVE_MIDDLE",      
      DELAY_FORMAT => "TIME",         
      DELAY_TYPE => "VAR_LOAD",       
      DELAY_VALUE => 100,                       
      IS_CLK_INVERTED => '0',         
      IS_RST_INVERTED => '0',         
      REFCLK_FREQUENCY => 333.333,    
      SIM_DEVICE => "ULTRASCALE_PLUS",
      UPDATE_MODE => "ASYNC"          
                                      
   )
   port map (
      CASC_OUT => s_middle_out,    
      CNTVALUEOUT => open,         -- only the master's count is read back, avoiding multiple drivers
      DATAOUT => s_middle_ret,     
      CASC_IN => s_master_out,     
      CASC_RETURN => s_end_ret,    
      CE => '0',                   
      CLK => i_clk,                
      CNTVALUEIN => s_cntvaluein,  
      EN_VTC => s_en_vtc,         
      INC => '0',                  
      LOAD => s_load,              
      ODATAIN => s_srl_delay,      
      RST => s_reset               
   );

   ODELAYE3_inst_3 : ODELAYE3
   generic map (
      CASCADE => "SLAVE_END",         
      DELAY_FORMAT => "TIME",         
      DELAY_TYPE => "VAR_LOAD",       
      DELAY_VALUE => 100,                       
      IS_CLK_INVERTED => '0',         
      IS_RST_INVERTED => '0',         
      REFCLK_FREQUENCY => 333.333,    
      SIM_DEVICE => "ULTRASCALE_PLUS",  
      UPDATE_MODE => "ASYNC"          
                                      
   )
   port map (
      CASC_OUT => open,            
      CNTVALUEOUT => open,         -- only the master's count is read back, avoiding multiple drivers
      DATAOUT => s_end_ret,        
      CASC_IN => s_middle_out    , 
      CASC_RETURN => '0' ,         
      CE => '0',                   
      CLK => i_clk,                
      CNTVALUEIN => s_cntvaluein,  
      EN_VTC => s_en_vtc,
      INC => '0',                  
      LOAD => s_load,              
      ODATAIN => s_srl_delay,      
      RST => s_reset               
   );
  
IDELAYCTRL_inst : IDELAYCTRL
generic map (
    SIM_DEVICE => "ULTRASCALE"  
)
port map (
    RDY => s_rdy,        
    REFCLK => i_clk,     
    RST => s_reset                            
);
 
s_reset <= not(i_reset);
 
process(i_clk)
variable v_cntvaluein : ufixed(27 downto -16);
begin 
    if rising_edge(i_clk) then 
        s_load <= '0';
        s_ce <= '0';
        s_update <= s_update(s_update'high-1 downto s_update'low) & i_update;
        case s_fsm is 
            when idle => 
                s_en_vtc <='1';
                if s_initial_cntval_done = '0' and s_cntvalueout /= "000000000" then -- store initial delay count
                    s_initial_cntval <= ufixed(s_cntvalueout);
                    o_cntvalue_init <= (s_cntvalueout);
                    s_initial_cntval_done <= '1'; --only store this the first time 
                end if;
                if s_update = "01" then 
                    s_en_vtc <='0';
                    s_fsm <= wait_cnt_val;
                    s_cnt <= 0;
                end if;
            when wait_cnt_val =>  
                if s_cnt = 10 then 
                   
                    s_fsm <= update;
                 else
                    s_cnt <= s_cnt +1;
                 end if;
            when update => 
                 v_cntvaluein := s_initial_cntval * ufixed(i_fine_dly) * c_delay_0;
                 s_cntvaluein  <= to_slv(v_cntvaluein(8 downto 0)); 
                 s_fsm <= write_update;
            when write_update =>
                 s_load <= '1';
                 s_fsm <= wait_update;
                 s_cnt  <= 0;
            when  wait_update => 
                if s_cnt = 10 then 
                    s_en_vtc <= '1';
                    s_fsm <= idle;
                else
                    s_cnt <= s_cnt + 1;  
                end if;
        end case;
   end if;
end process; 

process(i_clk)
begin
 if rising_edge(i_clk) then
   s_srl_sig <= s_srl_sig(G_DEPTH - 2 downto 0) & i_pulse;
 end if;
end process;

s_srl_delay <= s_srl_sig(to_integer(unsigned(i_delay(G_SEL_WIDTH-1 downto 0))));
o_rdy <= s_rdy;
o_vtc <= s_en_vtc;
o_load <= s_load;
o_cntvaluein <= s_cntvaluein;
o_int_val_done <= s_initial_cntval_done;
o_reset <= s_reset;
end architecture; 
    

We can see the delays by running this in simulation.
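
A minimal testbench sketch for exercising the block might look like the following. It is not the testbench used for this blog: the name tb_iodelay, the signal names, the wait times, and the stimulus values are all assumptions, and it expects the iodelay entity above to be compiled into the work library alongside the unisim simulation models. The requested delays are kept small on the assumption that the computed tap count then stays within the 9-bit CNTVALUEIN range.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_iodelay is
end entity;

architecture sim of tb_iodelay is
  constant c_clk_period : time := 3 ns;  -- approx. 333.333 MHz, as REFCLK_FREQUENCY

  signal clk       : std_logic := '0';
  signal reset_n   : std_logic := '0';   -- i_reset is active low in the design
  signal update    : std_logic := '0';
  signal pulse     : std_logic := '0';
  signal fine_dly  : std_logic_vector(9 downto 0) := (others => '0');
  signal delay_sel : std_logic_vector(5 downto 0) := (others => '0');
  signal rdy       : std_logic;
  signal pulse_out : std_logic;
begin

  clk <= not clk after c_clk_period / 2;

  dut : entity work.iodelay
    port map (
      i_clk           => clk,
      i_reset         => reset_n,
      i_update        => update,
      i_pulse         => pulse,
      i_fine_dly      => fine_dly,
      i_delay         => delay_sel,
      o_rdy           => rdy,
      o_vtc           => open,
      o_load          => open,
      o_reset         => open,
      o_cntvalue_init => open,
      o_int_val_done  => open,
      o_cntvaluein    => open,
      o_pulse         => pulse_out
    );

  stim : process
  begin
    wait for 100 ns;
    reset_n <= '1';                       -- release reset
    wait until rdy = '1' for 10 us;       -- wait for IDELAYCTRL, with a timeout
    wait for 1 us;                        -- assumed margin for the initial count capture

    for i in 1 to 5 loop
      -- Request 100 ps, 200 ps, ... 500 ps in turn.
      fine_dly <= std_logic_vector(to_unsigned(i * 100, 10));
      wait until rising_edge(clk);
      update <= '1';                      -- rising edge starts the update FSM
      wait for 4 * c_clk_period;
      update <= '0';
      wait for 40 * c_clk_period;         -- FSM takes roughly 25 clocks

      -- Send a single-clock pulse through the delay chain.
      wait until rising_edge(clk);
      pulse <= '1';
      wait until rising_edge(clk);
      pulse <= '0';
      wait for 20 * c_clk_period;
    end loop;

    wait;                                 -- stimulus finished
  end process;

end architecture;
```

The clock runs continuously, so the simulation should be run for a fixed time. Comparing pulse against pulse_out in the waveform viewer after each update should show the delay changing.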

Running this on hardware will, of course, need a good scope to be able to pick up on the slight delays.





Sponsored by AMD Xilinx

