
MicroZed Chronicles: UltraScale+ IO, ODELAYE3 and Cascading

We recently looked at the UltraScale+ IO resources, which provide a range of capabilities that help with both FPGA and hardware design. For example, we can use the ISERDES and OSERDES to work with serialized IO streams, DDR primitives to work with DDR interfaces, and the IDELAY and ODELAY to fine-tune signal delays.

In this blog, we are going to look at how we can work with the ODELAY on an UltraScale+ device. The ODELAY is a 512-tap delay line. The individual taps are uncalibrated, and each tap provides between 2.1 ps and 12 ps of delay. The ODELAY can operate in count mode, where the delay is defined as the tap to be output, i.e. 0 to 511. The more useful mode of operation, though, is time mode, where the ODELAY is calibrated for process, voltage, and temperature variation by an IDELAYCTRL block. In this case, the delay is defined as the required time delay. The delay the ODELAY provides can be either fixed or variable.


This blog is going to look at how we can work with the ODELAY in variable time mode. There are two methods of updating the delay: variable and var_load. In the variable method, the delay is incremented or decremented using a simple inc/dec interface and a chip enable. In the var_load method, the delay is loaded via a parallel interface, although we need to consider the number of taps being changed.
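As a point of comparison, the simpler variable method might be wired up as in the minimal sketch below. The entity and port names (odelay_variable, i_ce, i_inc, i_en_vtc) are hypothetical, and the generic values are simply copied from the var_load example later in this blog with DELAY_TYPE changed to "VARIABLE" and CASCADE set to "NONE". EN_VTC is brought out as a port so the caller can manage calibration around each step; check the user guide for the exact requirement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

-- Hypothetical wrapper: entity and port names are illustrative only.
entity odelay_variable is
  port (
    i_clk    : in  std_logic;
    i_reset  : in  std_logic;
    i_en_vtc : in  std_logic;  -- calibration enable, managed by the caller
    i_ce     : in  std_logic;  -- enables a delay step
    i_inc    : in  std_logic;  -- '1' = increment, '0' = decrement
    i_data   : in  std_logic;
    o_data   : out std_logic
  );
end entity;

architecture rtl of odelay_variable is
begin

  ODELAYE3_inst : ODELAYE3
    generic map (
      CASCADE          => "NONE",      -- single delay line, no cascading
      DELAY_FORMAT     => "TIME",
      DELAY_TYPE       => "VARIABLE",  -- inc/dec interface instead of parallel load
      DELAY_VALUE      => 100,
      IS_CLK_INVERTED  => '0',
      IS_RST_INVERTED  => '0',
      REFCLK_FREQUENCY => 333.333,
      SIM_DEVICE       => "ULTRASCALE_PLUS",
      UPDATE_MODE      => "ASYNC"
    )
    port map (
      CASC_OUT    => open,
      CNTVALUEOUT => open,
      DATAOUT     => o_data,
      CASC_IN     => '0',
      CASC_RETURN => '0',
      CE          => i_ce,
      CLK         => i_clk,
      CNTVALUEIN  => "000000000",      -- unused in this sketch
      EN_VTC      => i_en_vtc,
      INC         => i_inc,
      LOAD        => '0',              -- parallel load not used
      ODATAIN     => i_data,
      RST         => i_reset
    );

end architecture;
```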


The var_load time approach is the more complex of the two, and it is the one we'll examine in the remainder of this blog. This application chains together three ODELAYs to make the delay a little more obvious when examined on an oscilloscope.


To operate in time mode, we need to use the IDELAYCTRL block, which provides the calibration. This block is simple to instantiate and only needs the reference clock and reset input signals. For UltraScale+ devices, the reference clock needs to be between 300 and 800 MHz.
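A minimal sketch of that instantiation is shown below. The primitive's library name is IDELAYCTRL, and its REFCLK, RST, and RDY ports are used here. The wrapper entity and its port names are hypothetical, and any generics are left at their defaults.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

-- Hypothetical wrapper: entity and port names are illustrative only.
entity delay_ctrl is
  port (
    i_refclk : in  std_logic;  -- reference clock, 300 to 800 MHz on UltraScale+
    i_reset  : in  std_logic;
    o_rdy    : out std_logic   -- ready flag from the calibration block
  );
end entity;

architecture rtl of delay_ctrl is
begin

  IDELAYCTRL_inst : IDELAYCTRL
    port map (
      RDY    => o_rdy,
      REFCLK => i_refclk,
      RST    => i_reset
    );

end architecture;
```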


When working in time mode with the var_load approach, we need to know an initial delay and its corresponding tap setting. The tap setting is made available on the CNTVALUEOUT port, where we can read it and use it to calculate the updated value to apply on the CNTVALUEIN port.


The process for updating the ODELAY is as follows:


  1. Sample the CNTVALUEOUT port

  2. Set EN_VTC low

  3. Wait for 10 clock periods

  4. Calculate the new CNTVALUEIN

  5. Pulse the load signal

  6. Wait for 10 clock periods

  7. Set EN_VTC high
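The steps above can be sketched as a small state machine. This is an illustrative outline rather than the implementation used later in this blog: the entity name, the port names, and the assumption that the new tap value arrives pre-calculated on i_cntvalue_new are all hypothetical. It also assumes i_update is a single-cycle pulse.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sequencer for the seven-step update; names are illustrative.
entity odelay_update_seq is
  port (
    i_clk          : in  std_logic;
    i_reset        : in  std_logic;
    i_update       : in  std_logic;                     -- one-cycle request
    i_cntvalueout  : in  std_logic_vector(8 downto 0);  -- from the ODELAY
    i_cntvalue_new : in  std_logic_vector(8 downto 0);  -- calculated elsewhere
    o_cntval_samp  : out std_logic_vector(8 downto 0);  -- sampled CNTVALUEOUT
    o_cntvaluein   : out std_logic_vector(8 downto 0);  -- to the ODELAY
    o_en_vtc       : out std_logic;
    o_load         : out std_logic
  );
end entity;

architecture rtl of odelay_update_seq is
  constant C_WAIT : natural := 10;  -- clock periods to wait in steps 3 and 6
  type t_fsm is (idle, wait_vtc, calc, load, wait_load);
  signal s_fsm : t_fsm := idle;
  signal s_cnt : natural range 0 to C_WAIT - 1 := 0;
begin

  process (i_clk)
  begin
    if rising_edge(i_clk) then
      if i_reset = '1' then
        s_fsm         <= idle;
        s_cnt         <= 0;
        o_en_vtc      <= '1';
        o_load        <= '0';
        o_cntvaluein  <= (others => '0');
        o_cntval_samp <= (others => '0');
      else
        o_load <= '0';  -- default, so LOAD is a single-cycle pulse
        case s_fsm is
          when idle =>
            if i_update = '1' then
              o_cntval_samp <= i_cntvalueout;  -- step 1: sample CNTVALUEOUT
              o_en_vtc      <= '0';            -- step 2: EN_VTC low
              s_cnt         <= 0;
              s_fsm         <= wait_vtc;
            end if;
          when wait_vtc =>                     -- step 3: wait 10 clocks
            if s_cnt = C_WAIT - 1 then
              s_fsm <= calc;
            else
              s_cnt <= s_cnt + 1;
            end if;
          when calc =>                         -- step 4: present new CNTVALUEIN
            o_cntvaluein <= i_cntvalue_new;
            s_fsm        <= load;
          when load =>                         -- step 5: pulse LOAD
            o_load <= '1';
            s_cnt  <= 0;
            s_fsm  <= wait_load;
          when wait_load =>                    -- step 6: wait 10 clocks
            if s_cnt = C_WAIT - 1 then
              o_en_vtc <= '1';                 -- step 7: EN_VTC high
              s_fsm    <= idle;
            else
              s_cnt <= s_cnt + 1;
            end if;
        end case;
      end if;
    end if;
  end process;

end architecture;
```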

To calculate the new tap setting, we need the previous value of CNTVALUEOUT and the ratio between the new delay and the delay which produced that CNTVALUEOUT. Because the tap count scales with the requested delay, a longer delay needs proportionally more taps:


CNTVALUEIN = CNTVALUEOUT * (Delay_New/Delay_Old)


When calculating this new value, we need to take into account the alignment delay, which is between 45 and 65 taps and is typically 54 taps. The alignment taps are a fixed offset that does not scale with the requested delay, so we remove them before scaling and add them back afterwards. The equation becomes:


CNTVALUEIN = ((CNTVALUEOUT - Align) * (Delay_New/Delay_Old)) + Align
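As a hedged illustration of this calculation, the sketch below wraps it in a VHDL function. It assumes the tap count above the alignment offset scales linearly with the requested delay, so the offset is removed, the remainder is scaled by the new delay over the old delay, and the offset is added back. The package name, function name, and parameters are hypothetical. Integer division is used for readability, so this suits constants or simulation; in fabric, a fixed-point multiply by the reciprocal of the old delay is a common alternative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package; all names are illustrative.
package odelay_calc_pkg is

  constant C_ALIGN_TAPS : natural := 54;  -- typical alignment delay in taps

  function calc_cntvaluein (
    cntvalueout  : std_logic_vector(8 downto 0);
    delay_old_ps : positive;
    delay_new_ps : natural;
    align_taps   : natural := C_ALIGN_TAPS
  ) return std_logic_vector;

end package;

package body odelay_calc_pkg is

  function calc_cntvaluein (
    cntvalueout  : std_logic_vector(8 downto 0);
    delay_old_ps : positive;
    delay_new_ps : natural;
    align_taps   : natural := C_ALIGN_TAPS
  ) return std_logic_vector is
    variable v_old  : natural;
    variable v_taps : natural;
  begin
    v_old := to_integer(unsigned(cntvalueout));

    if v_old > align_taps then
      -- Scale only the taps above the alignment offset, then restore it.
      v_taps := ((v_old - align_taps) * delay_new_ps) / delay_old_ps + align_taps;
    else
      -- Nothing above the offset to scale.
      v_taps := align_taps;
    end if;

    -- Clamp to the 512-tap range (0 to 511).
    if v_taps > 511 then
      v_taps := 511;
    end if;

    return std_logic_vector(to_unsigned(v_taps, 9));
  end function;

end package body;
```

With illustrative numbers only: a CNTVALUEOUT of 74 at an old delay of 100 ps leaves 20 taps above the 54-tap alignment offset. Requesting 250 ps scales those to 50 taps, giving a CNTVALUEIN of 104.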


The alignment delay can be determined by setting the delay to 0, asserting reset on the IDELAYCTRL, and observing CNTVALUEOUT once reset is released.


In this example, I have cascaded three ODELAYs to provide a variable delay between 3.6 ns and 23 ns, with each ODELAY able to provide between 1.2 ns and 7.6 ns depending on the tap delay.


The code below implements the delay structure.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library ieee_proposed;
use ieee_proposed.fixed_float_types.all;
use ieee_proposed.fixed_pkg.all;

library unisim;
use UNISIM.vcomponents.all;

entity iodelay is 
generic(
 G_DEPTH     : integer := 32;
 G_SEL_WIDTH : integer := 6
);

port(
    i_clk    : in std_logic;
    i_reset  : in std_logic;
    i_update : in std_logic;
    i_pulse  : in std_logic;
    i_fine_dly : in std_logic_vector(9 downto 0);
    i_delay  : in std_logic_vector(G_SEL_WIDTH-1 downto 0);
    o_rdy    : out std_logic;
    o_vtc    : out std_logic;
    o_load   : out std_logic;
    o_reset  : out std_logic;
    o_cntvalue_init : out std_logic_vector(8 downto 0);
    o_int_val_done : out std_logic;
    o_cntvaluein : out std_logic_vector(8 downto 0);
    o_pulse  : out std_logic);
end entity;

architecture rtl of iodelay is 

constant c_delay_0 : ufixed(8 downto -16) := to_ufixed(0.01,8,-16);

type t_fsm is (idle, wait_cnt_val, update, write_update, wait_update);
type t_srl_array is array (G_DEPTH - 1 downto 0) of std_logic;

signal s_srl_sig : t_srl_array;
signal s_srl_delay : std_logic;
signal s_rdy : std_logic;
signal s_cntvalueout : std_logic_vector(8 downto 0);
signal s_cntvaluein : std_logic_vector(8 downto 0);
signal s_tap_width : std_logic_vector(8 downto 0);
signal s_fsm : t_fsm;
signal s_cnt : integer range 0 to 15;
signal s_en_vtc : std_logic:='0';
signal s_load : std_logic;
signal s_initial_cntval : ufixed(8 downto 0);
signal s_initial_cntval_done : std_logic :='0'; 
signal s_reset : std_logic;
signal s_update : std_logic_vector(1 downto 0):= (others =>'0');
signal s_ce : std_logic;

signal s_master_out : std_logic;
signal s_middle_ret : std_logic;
signal s_middle_out : std_logic;
signal s_end_ret : std_logic;



begin

   ODELAYE3_inst_1 : ODELAYE3
   generic map (
      CASCADE => "MASTER",            
      DELAY_FORMAT => "TIME",         
      DELAY_TYPE => "VAR_LOAD",       
      DELAY_VALUE => 100,                       
      IS_CLK_INVERTED => '0',         
      IS_RST_INVERTED => '0',         
      REFCLK_FREQUENCY => 333.333,    
      SIM_DEVICE => "ULTRASCALE_PLUS",
      UPDATE_MODE => "ASYNC"          
                                      
   )
   port map (
      CASC_OUT => s_master_out,    
      CNTVALUEOUT => s_cntvalueout,
      DATAOUT => o_pulse,          
      CASC_IN => '0'    ,          
      CASC_RETURN => s_middle_ret, 
      CE => '0',                   
      CLK => i_clk,                
      CNTVALUEIN => s_cntvaluein,  
      EN_VTC => s_en_vtc,
      INC => '0',                  
      LOAD => s_load,              
      ODATAIN => s_srl_delay,      
      RST => s_reset               
   );
   
   
   ODELAYE3_inst_2 : ODELAYE3
   generic map (
      CASCADE => "SLAVE_MIDDLE",      
      DELAY_FORMAT => "TIME",         
      DELAY_TYPE => "VAR_LOAD",       
      DELAY_VALUE => 100,                       
      IS_CLK_INVERTED => '0',         
      IS_RST_INVERTED => '0',         
      REFCLK_FREQUENCY => 333.333,    
      SIM_DEVICE => "ULTRASCALE_PLUS",
      UPDATE_MODE => "ASYNC"          
                                      
   )
   port map (
      CASC_OUT => s_middle_out,    
      CNTVALUEOUT => open,         -- left open: the master already drives s_cntvalueout
      DATAOUT => s_middle_ret,     
      CASC_IN => s_master_out,     
      CASC_RETURN => s_end_ret,    
      CE => '0',                   
      CLK => i_clk,                
      CNTVALUEIN => s_cntvaluein,  
      EN_VTC => s_en_vtc,         
      INC => '0',                  
      LOAD => s_load,              
      ODATAIN => s_srl_delay,      
      RST => s_reset               
   );

   ODELAYE3_inst_3 : ODELAYE3
   generic map (
      CASCADE => "SLAVE_END",         
      DELAY_FORMAT => "TIME",         
      DELAY_TYPE => "VAR_LOAD",       
      DELAY_VALUE => 100,                       
      IS_CLK_INVERTED => '0',         
      IS_RST_INVERTED => '0',         
      REFCLK_FREQUENCY => 333.333,    
      SIM_DEVICE => "ULTRASCALE_PLUS",  
      UPDATE_MODE => "ASYNC"          
                                      
   )
   port map (
      CASC_OUT => open,            
      CNTVALUEOUT => open,         -- left open: the master already drives s_cntvalueout
      DATAOUT =>