
MicroZed Chronicles: From UART to AXI Lite Debug Access

Updated: Jan 18

Last week, we examined how to create a UART with AXI Stream interfaces, enabling debug access to the AXI buses in the device.


In this blog, we are going to implement the protocol-level element: the logic that takes the bytes received over the AXI Stream and converts them into AXI Lite accesses. With some additional logic, the same approach can also be used with full AXI.


First, let’s recap. We are going to enable either a read or a write of a single AXI register using commands received over a UART.


The commands received over the AXI Stream interface are structured as follows.

  • Write Op Code – 1 byte, value 0x09

  • Read Op Code – 1 byte, value 0x05

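  • Address – 4 bytes, the address of the AXI transaction, sent most significant byte first

  • Length – 1 byte, always 1 for AXI Lite implementations

  • Payload – 4 bytes for an AXI Lite access: the word to write, which follows the length byte in a write command, or the word read back, which is returned over the AXI Stream master interface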
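As a rough illustration of this frame layout, the VHDL package below builds the byte sequences a testbench might drive into the module's AXIS slave interface: the op code, four address bytes, the length byte and, for a write, four data bytes. The package axi_debug_cmd_pkg, the array type t_byte_array and the functions build_write_cmd and build_read_cmd are hypothetical helpers rather than part of the design. Sending the address and write data most significant byte first is an assumption based on how the listing shifts bytes into s_address and s_write_buffer_temp.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench helper, not part of the axi_protocol design.
package axi_debug_cmd_pkg is
    type t_byte_array is array (natural range <>) of std_logic_vector(7 downto 0);

    constant C_OP_WRITE : std_logic_vector(7 downto 0) := x"09";
    constant C_OP_READ  : std_logic_vector(7 downto 0) := x"05";

    -- Op code, 4 address bytes (MSB first), length = 1, 4 data bytes (MSB first)
    function build_write_cmd(addr : std_logic_vector(31 downto 0);
                             data : std_logic_vector(31 downto 0))
        return t_byte_array;

    -- Op code, 4 address bytes (MSB first), length = 1
    function build_read_cmd(addr : std_logic_vector(31 downto 0))
        return t_byte_array;
end package axi_debug_cmd_pkg;

package body axi_debug_cmd_pkg is
    function build_write_cmd(addr : std_logic_vector(31 downto 0);
                             data : std_logic_vector(31 downto 0))
        return t_byte_array is
        variable v : t_byte_array(0 to 9);
    begin
        v(0) := C_OP_WRITE;
        for i in 0 to 3 loop
            -- i = 0 selects bits 31..24, i = 3 selects bits 7..0
            v(1 + i) := addr(31 - 8*i downto 24 - 8*i);
            v(6 + i) := data(31 - 8*i downto 24 - 8*i);
        end loop;
        v(5) := x"01";  -- length: always 1 for AXI Lite
        return v;
    end function build_write_cmd;

    function build_read_cmd(addr : std_logic_vector(31 downto 0))
        return t_byte_array is
        variable v : t_byte_array(0 to 5);
    begin
        v(0) := C_OP_READ;
        for i in 0 to 3 loop
            v(1 + i) := addr(31 - 8*i downto 24 - 8*i);
        end loop;
        v(5) := x"01";  -- length: always 1 for AXI Lite
        return v;
    end function build_read_cmd;
end package body axi_debug_cmd_pkg;
```

With these helpers, a write of 0xDEADBEEF to address 0x43C00004 becomes the ten bytes 09 43 C0 00 04 01 DE AD BE EF, while a read of the same address is the six bytes 05 43 C0 00 04 01.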

This format allows a single register address to be read or written using AXI Lite. In our production version of this code, we support both AXI Lite and AXI4, along with configurable generics for the bus and address widths. However, I have kept this version a little simpler to demonstrate the concepts involved.


First, let’s look at the microarchitecture of the module, which needs to receive several bytes from the AXIS slave interface. These bytes determine the action required of the module, which performs either a read or a write at the requested address. This requires a state machine that can receive information over the AXIS slave interface and collect the address, length, and data bytes.
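The address, length and payload fields are all gathered in the same way: each accepted byte is shifted into the bottom of a register, so the first byte received ends up in the most significant position. A minimal stand-alone sketch of that technique is shown below. The entity name axis_byte_collector, its generic G_BYTES and its word and word_valid ports are hypothetical, and unlike the state machine in the listing it keeps tready permanently high rather than pausing between fields.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone sketch: collects G_BYTES bytes from an AXI Stream
-- slave interface, first byte into the most significant position, and pulses
-- word_valid for one cycle when the word is complete.
entity axis_byte_collector is
    generic (G_BYTES : positive := 4);
    port (
        clk           : in  std_logic;
        reset_n       : in  std_logic;  -- active low, like reset in axi_protocol
        s_axis_tdata  : in  std_logic_vector(7 downto 0);
        s_axis_tvalid : in  std_logic;
        s_axis_tready : out std_logic;
        word          : out std_logic_vector(8*G_BYTES-1 downto 0);
        word_valid    : out std_logic
    );
end entity axis_byte_collector;

architecture rtl of axis_byte_collector is
    signal shift_reg : std_logic_vector(8*G_BYTES-1 downto 0);
    signal count     : natural range 0 to G_BYTES-1;
begin
    -- Always ready in this sketch, so every cycle with tvalid high is a transfer
    s_axis_tready <= '1';
    word          <= shift_reg;

    process(clk, reset_n)
    begin
        if reset_n = '0' then
            shift_reg  <= (others => '0');
            count      <= 0;
            word_valid <= '0';
        elsif rising_edge(clk) then
            word_valid <= '0';
            if s_axis_tvalid = '1' then
                -- Shift left and append the new byte at the bottom
                shift_reg <= shift_reg(shift_reg'high-8 downto 0) & s_axis_tdata;
                if count = G_BYTES-1 then
                    count      <= 0;
                    word_valid <= '1';
                else
                    count <= count + 1;
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```

Feeding it the bytes 11, 22, 33 and 44 produces the word 0x11223344, which is the same ordering the listing applies to s_address.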


Once the command bytes are collected, either an AXI Lite write or an AXI Lite read is performed. Since AXI writes and reads use separate channels, we can use separate state machines for the AXI Lite write and read interfaces.
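For reference, here is a minimal sketch of what a single-beat AXI Lite write master can look like. It is not the axil_write_fsm from the listing: the entity axil_single_write and its start, done and bresp_out ports are hypothetical, and it assumes a 32-bit data bus with all write strobes set. It raises AWVALID, WVALID and BREADY together and drops each valid independently once the matching ready has been seen, which the AXI protocol permits because the write address and write data channels are independent.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch of a single-beat AXI4-Lite write master (32-bit data,
-- all byte strobes set). Not the axil_write_fsm from the listing.
entity axil_single_write is
    port (
        clk, reset_n : in  std_logic;           -- reset_n is active low
        start        : in  std_logic;           -- request, honoured when idle
        addr, data   : in  std_logic_vector(31 downto 0);
        done         : out std_logic;           -- one-cycle pulse on B handshake
        bresp_out    : out std_logic_vector(1 downto 0);
        axi_awaddr   : out std_logic_vector(31 downto 0);
        axi_awvalid  : out std_logic;
        axi_awready  : in  std_logic;
        axi_wdata    : out std_logic_vector(31 downto 0);
        axi_wstrb    : out std_logic_vector(3 downto 0);
        axi_wvalid   : out std_logic;
        axi_wready   : in  std_logic;
        axi_bresp    : in  std_logic_vector(1 downto 0);
        axi_bvalid   : in  std_logic;
        axi_bready   : out std_logic
    );
end entity axil_single_write;

architecture rtl of axil_single_write is
    signal awvalid, wvalid, bready : std_logic;
begin
    axi_awvalid <= awvalid;
    axi_wvalid  <= wvalid;
    axi_bready  <= bready;
    axi_wstrb   <= (others => '1');

    process(clk, reset_n)
    begin
        if reset_n = '0' then
            awvalid    <= '0';
            wvalid     <= '0';
            bready     <= '0';
            done       <= '0';
            bresp_out  <= "00";
            axi_awaddr <= (others => '0');
            axi_wdata  <= (others => '0');
        elsif rising_edge(clk) then
            done <= '0';
            if awvalid = '0' and wvalid = '0' and bready = '0' then
                -- Idle: launch address and data together, ready for the response
                if start = '1' then
                    axi_awaddr <= addr;
                    axi_wdata  <= data;
                    awvalid    <= '1';
                    wvalid     <= '1';
                    bready     <= '1';
                end if;
            else
                -- Each channel completes independently on its own handshake
                if awvalid = '1' and axi_awready = '1' then
                    awvalid <= '0';
                end if;
                if wvalid = '1' and axi_wready = '1' then
                    wvalid <= '0';
                end if;
                if bready = '1' and axi_bvalid = '1' then
                    bready    <= '0';
                    bresp_out <= axi_bresp;
                    done      <= '1';
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```

A read master follows the same pattern, using ARVALID/ARREADY for the address and then RVALID/RREADY to capture the read data and response.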


The final element of the microarchitecture is outputting received AXI read data over the AXI Stream master interface.
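In the listing, the output process sends the read word least significant byte first: it transmits s_read_buffer_temp(7 downto 0) and then shifts the buffer right. The sketch below performs the same serialization but holds TVALID and TDATA stable until each byte is accepted, which the AXI Stream protocol requires of a master once TVALID is asserted. The entity axis_word_serializer and its start, word and busy ports are hypothetical, and it handles a single 32-bit word only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: sends one 32-bit word as four AXI Stream bytes, least
-- significant byte first (the order used by the listing's output process),
-- holding tvalid and tdata stable until each byte is accepted.
entity axis_word_serializer is
    port (
        clk, reset_n  : in  std_logic;   -- reset_n is active low
        start         : in  std_logic;   -- load word when not busy
        word          : in  std_logic_vector(31 downto 0);
        busy          : out std_logic;
        m_axis_tdata  : out std_logic_vector(7 downto 0);
        m_axis_tvalid : out std_logic;
        m_axis_tready : in  std_logic
    );
end entity axis_word_serializer;

architecture rtl of axis_word_serializer is
    signal shift_reg : std_logic_vector(31 downto 0);
    signal remaining : natural range 0 to 4;   -- bytes still to send
begin
    -- tvalid depends only on internal state, never on tready
    m_axis_tvalid <= '1' when remaining /= 0 else '0';
    busy          <= '1' when remaining /= 0 else '0';
    m_axis_tdata  <= shift_reg(7 downto 0);

    process(clk, reset_n)
    begin
        if reset_n = '0' then
            shift_reg <= (others => '0');
            remaining <= 0;
        elsif rising_edge(clk) then
            if remaining = 0 then
                if start = '1' then
                    shift_reg <= word;
                    remaining <= 4;
                end if;
            elsif m_axis_tready = '1' then
                -- Current byte accepted: move the next byte into tdata
                shift_reg <= x"00" & shift_reg(31 downto 8);
                remaining <= remaining - 1;
            end if;
        end if;
    end process;
end architecture rtl;
```

Because a byte only advances on a cycle where tvalid and tready are both high, the sink can apply back-pressure at any point without data being lost; a word of 0xDEADBEEF leaves as EF, BE, AD, DE.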

The diagrams for each of the state machines can be seen below. When I wrote the main state machine, I tried to ensure that the HDL was flexible and would scale. For example, I included buffers that can hold multiple read and write data words (eight 32-bit words each in this version).



The source code can be seen below.


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

--Declare entity
entity axi_protocol is
    generic(
            G_AXIL_DATA_WIDTH    :integer   := 32;                                         
            G_AXI_ADDR_WIDTH     :integer   := 32;                                         
            G_AXI_ID_WIDTH       :integer   := 8;                                          
            G_AXI_AWUSER_WIDTH   :integer   := 1                                           
    );
    port(   
            --Master clock & active-low reset
            clk              :in std_ulogic;                                           
            reset            :in std_ulogic;                                           

            --! Master AXIS Interface  
            m_axis_tready : in  std_logic;
            m_axis_tdata  : out std_logic_vector(7 downto 0);
            m_axis_tvalid : out std_logic;

            --! Slave AXIS Interface
            s_axis_tready : out  std_logic;
            s_axis_tdata  : in std_logic_vector(7 downto 0);
            s_axis_tvalid : in std_logic;
            
            --! AXIL Interface
            --!Write address
            axi_awaddr    : out std_logic_vector
                            (G_AXI_ADDR_WIDTH-1 downto 0);                  
            axi_awprot    : out std_logic_vector(2 downto 0);                   
            axi_awvalid   : out std_logic;
            --!write data
            axi_wdata     : out std_logic_vector
                            (G_AXIL_DATA_WIDTH-1 downto 0);  
            axi_wstrb     : out std_logic_vector
                            (G_AXIL_DATA_WIDTH/8-1 downto 0);
            axi_wvalid    : out std_logic; 
            --!write response
            axi_bready    : out std_logic;
            --!read address
            axi_araddr    : out std_logic_vector
                            (G_AXI_ADDR_WIDTH-1 downto 0);               
            axi_arprot    : out std_logic_vector(2 downto 0);                   
            axi_arvalid   : out std_logic; 
            --!read data
            axi_rready    : out std_logic;    
            --write address
            axi_awready   : in std_logic;
            --write data
            axi_wready    : in std_logic;
            --write response
            axi_bresp     : in std_logic_vector(1 downto 0);                   
            axi_bvalid    : in std_logic;
            --read address
            axi_arready   : in std_logic;
            --read data       
            axi_rdata     : in std_logic_vector
                            (G_AXIL_DATA_WIDTH-1 downto 0);  
            axi_rresp     : in std_logic_vector(1 downto 0);                   
            axi_rvalid    : in std_logic 

        );
        
end entity axi_protocol;

architecture rtl of axi_protocol is 

    constant C_SINGLE_READ       : std_logic_vector(7 downto 0) := x"05";
    constant C_SINGLE_WRITE      : std_logic_vector(7 downto 0) := x"09";

    constant C_NUMB_ADDR_BYTES        : integer := 4;
    constant C_NUMB_LENGTH_BYTES      : integer := 1;
    constant C_NUMB_DATA_BYTES        : integer := 4;
    constant C_NUMB_AXIL_DATA_BYTES   : integer := 4;
    constant C_NUMB_CRC_BYTES         : integer := 4;   
    constant C_MAX_NUMB_BYTES         : integer := 4; 
    constant C_ZERO_PAD               : std_logic_vector(7 downto 0) 
                                        := (others => '0');
    
    type t_fsm is (idle, address, length, dummy, 
                    write_payload, read_payload, crc, 
                    write_axil, write_axi, read_axi, read_axil);
    type t_op_fsm is (idle, output);
    type t_array is array (0 to 7) of std_logic_vector(31 downto 0);
    type axil_read_fsm is (IDLE, START, CHECK_ADDR_RESP, READ_DATA, DONE);
    type axil_write_fsm is (IDLE, START, CHECK_ADDR_RESP, WRITE_DATA, 
                            RESP_READY, CHECK_RESP, DONE);
    signal write_state : axil_write_fsm;
    signal read_state  : axil_read_fsm;

    signal s_current_state : t_fsm;

    signal s_command            : std_logic_vector(7 downto 0);
    signal s_address            : std_logic_vector
                                    ((C_NUMB_ADDR_BYTES * 8)-1 downto 0);
    signal s_length             : std_logic_vector(7 downto 0);
    signal s_length_axi         : std_logic_vector(7 downto 0);
    signal s_buf_cnt            : unsigned(7 downto 0);
    signal s_byte_pos           : integer range 0 to C_MAX_NUMB_BYTES; 
    signal s_num_bytes          : integer range 0 to C_MAX_NUMB_BYTES; 
    signal s_s_tready           : std_logic;
    signal s_write_buffer       : t_array :=(others=>(others=>'0'));
    signal s_read_buffer        : t_array :=(others=>(others=>'0'));
    signal s_write_buffer_temp  : std_logic_vector(31 downto 0);
    signal s_read_buffer_temp   : std_logic_vector(31 downto 0);

    --axil lite data interface 
    signal s_axil_data          : std_logic_vector
                                    (G_AXIL_DATA_WIDTH-1 downto 0);
    signal s_axil_valid         : std_logic;
    signal s_axil_idata         : std_logic_vector
                                    (G_AXIL_DATA_WIDTH-1 downto 0);


    --axi mstream 
    signal s_opptr              : unsigned(7 downto 0);
    signal s_start              : std_logic;
    signal s_op_state           : t_op_fsm;
    signal s_op_byte            : integer range 0 to C_MAX_NUMB_BYTES; 
    signal start_read           : std_logic;
    signal start_write          : std_logic;

begin

    s_axis_tready <= s_s_tready;

FSM : process(clk, reset )
begin 
    if (reset = '0') then 
        s_current_state <= idle;
        s_start     <= '0';
        start_read  <= '0';
        start_write <= '0';

        s_s_tready  <= '0';
    elsif rising_edge(clk) then
        s_s_tready  <= '1';
        s_start     <= '0';
        start_read  <= '0';
        start_write <= '0';
        case s_current_state is

            when idle => -- accept only the single read (x05) and write (x09) op codes
                s_buf_cnt           <= (others =>'0');
                if (s_axis_tvalid = '1' and s_s_tready = '1') and 
                    (s_axis_tdata = C_SINGLE_READ 
                     or s_axis_tdata = C_SINGLE_WRITE) then
                        s_s_tready <= '0';
                        s_command <= s_axis_tdata;
                        s_current_state <= address;
                        s_byte_pos <= C_NUMB_ADDR_BYTES;
                end if;

            when address =>
                if s_byte_pos = 0 then
                    s_s_tready <= '0';
                    s_byte_pos <= C_NUMB_LENGTH_BYTES;
                    s_current_state <= length;    
                elsif s_axis_tvalid = '1' and s_s_tready = '1'  then
                    s_address <= s_address(s_address'length-8-1 downto 0) 
                    & s_axis_tdata;
                    s_byte_pos <= s_byte_pos - 1;
                    if s_byte_pos = 1 then 
                        s_s_tready <= '0';
                    end if; 
                end if;

            when length => 
                if s_byte_pos = 0 then
                    s_s_tready <= '0';
                    if s_command = C_SINGLE_READ 
                        and unsigned(s_length) = 1 then
                        s_current_state <= read_axil; 
                        start_read      <= '1';
                        s_num_bytes     <= C_NUMB_AXIL_DATA_BYTES;
                    elsif s_command = C_SINGLE_WRITE then
                        s_buf_cnt       <= (others =>'0');
                        s_byte_pos      <= C_NUMB_AXIL_DATA_BYTES;
                        s_num_bytes     <= C_NUMB_AXIL_DATA_BYTES;
                        s_current_state <= write_payload;
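                    else
                        -- read with an unsupported length: discard the
                        -- command rather than waiting here indefinitely
                        s_current_state <= idle;
                    end if;    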
                elsif s_axis_tvalid = '1' and s_s_tready = '1'  then
                    s_length            <= s_axis_tdata;
                    s_length_axi        <= 
                    std_logic_vector(unsigned(s_axis_tdata)-1);
                    s_byte_pos          <= s_byte_pos - 1;
                    s_s_tready <= '0';
                end if;

            when read_axil =>  
                if s_axil_valid = '1' then 
                    s_start             <= '1';
                    s_read_buffer(0)(G_AXIL_DATA_WIDTH-1 downto 0) <=     
                                        s_axil_data;
                end if;
                if (read_state = DONE) then
                    s_current_state <= read_payload;
                end if;
            

            when write_payload =>
                if s_buf_cnt = unsigned(s_length) then 
                    s_s_tready <= '0';
                    s_current_state <= write_axil;
                    start_write <= '1';
                else
                    if s_byte_pos = 0 then 
                        s_s_tready <= '0';
                        s_byte_pos <= s_num_bytes;
                        s_write_buffer(to_integer(s_buf_cnt)) <= 
                                                      s_write_buffer_temp;
                        s_buf_cnt <= s_buf_cnt + 1;  
                    elsif (s_axis_tvalid = '1' and s_s_tready = '1')  then
                        s_write_buffer_temp <= s_write_buffer_temp
                        (s_write_buffer_temp'length-8-1 downto 0) 
                         & s_axis_tdata;
                        s_byte_pos <= s_byte_pos - 1;  
                        if s_byte_pos = 1 then 
                            s_s_tready <= '0';
                        end if;   
                    end if;
                end if;

            when write_axil =>  
                s_s_tready <= '0';
                s_axil_idata <= s_write_buffer(0);
                if (write_state = DONE) then
                    s_current_state <= idle;
                end if;

            when read_payload =>
                s_current_state <= idle;
            when others => null;
        end case;
    end if;

end process;

process(clk, reset)
begin
    if (reset = '0') then 
        s_op_state      <= idle;
        m_axis_tvalid   <= '0';
        m_axis_tdata    <= (others =>'0');
        s_opptr         <= (others => '0');
        s_op_byte       <= C_NUMB_AXIL_DATA_BYTES;
    elsif rising_edge(clk) then 
        case s_op_state is  
            when idle => 
                if s_start = '1' then 
                    s_opptr     <= (others => '0');
                    s_read_buffer_temp <= s_read_buffer(0);
                    s_op_byte   <= s_num_bytes;
                    s_op_state  <= output;
                end if;
            when output =>
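                -- each byte is presented with tvalid high for a single cycle,
                -- in the cycle after tready was sampled high, so this relies
                -- on the downstream UART still accepting data in that cycle
                m_axis_tvalid <= '0';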
                if s_opptr = unsigned(s_length) then 
                    s_op_state <= idle;
                else
                    if m_axis_tready = '1' then                   
                        if s_op_byte = 0 then 
                            s_op_byte   <= s_num_bytes;
                            s_opptr     <= s_opptr + 1;
                            s_read_buffer_temp <= s_read_buffer
                                        (to_integer(s_opptr) + 1); 
                        else
                          m_axis_tvalid <= '1';
                          m_axis_tdata <= s_read_buffer_temp(7 downto 0);
                          s_read_buffer_temp <= C_ZERO_PAD 
                             & s_read_buffer_temp
                             (s_read_buffer_temp'length-1 downto 8);
                            s_op_byte <= s_op_byte - 1; 
                        end if;     
                    end if;