Adam Taylor

MicroZed Chronicles: From UART to AXI Lite Debug Access

Updated: Jan 18, 2023

Last week we examined how we could create a UART with AXI Stream interfaces to enable access to the AXI buses in a device for debugging.


In this blog, we are going to implement the protocol level of this design: the element that takes the bytes received over the AXI Stream and converts them into AXI Lite accesses. With some additional work, the same approach can also be used with full AXI.


First let’s recap. We are going to enable either a read or write of a single AXI register using commands received over a UART.


The commands received over the AXI Stream interface are structured as follows.

  • Write Op Code – 1 byte, value 0x09

  • Read Op Code – 1 byte, value 0x05

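  • Address – 4 bytes, the address of the AXI access, most significant byte first

  • Length – 1 byte, always 1 for AXI Lite implementations

  • Payload – the words to write; for an AXI Lite write this is 4 bytes, most significant byte first. Read commands carry no payload; the 4 bytes of read data are returned over the AXI Stream master interface, least significant byte first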
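As a quick illustration, the byte layout above can be produced on the host with Python's struct module. This is a minimal sketch: the function names are hypothetical, and it assumes big-endian (most significant byte first) ordering for the address and payload, which is the order in which the VHDL below shifts received bytes into its registers. The two frames it builds match the write and read commands used in the cocotb test bench later in this post.

```python
import struct

# Op codes from the protocol description
OP_WRITE = 0x09
OP_READ = 0x05


def build_write_command(address: int, word: int) -> bytes:
    """Single AXI Lite write: op code, 4-byte address, length = 1, 4-byte payload.

    '>' selects big-endian with no padding, so the address and the payload
    word are sent most significant byte first.
    """
    return struct.pack(">BIBI", OP_WRITE, address, 1, word)


def build_read_command(address: int) -> bytes:
    """Single AXI Lite read: op code, 4-byte address, length = 1 (no payload)."""
    return struct.pack(">BIB", OP_READ, address, 1)


if __name__ == "__main__":
    print(build_write_command(0x4, 0x55AA1234).hex(" "))  # 09 00 00 00 04 01 55 aa 12 34
    print(build_read_command(0x0).hex(" "))               # 05 00 00 00 00 01
```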

This allows AXI Lite to be used to read or write a single register address. In our production version of this code, we provide both AXI Lite and AXI4, along with configurable generics for the bus and address widths. I have, however, kept this one a little simpler to demonstrate the concepts involved.


First, let’s look at the microarchitecture of the module. It needs to receive several bytes from the AXIS slave interface, and these bytes determine whether the module performs a read or a write, and at which address. This requires a state machine that can receive information over the AXIS slave interface and collect the op code, address, length, and data bytes.
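To make the byte-collection flow concrete, here is a behavioural Python model of the receive state machine. It is a sketch rather than a cycle-accurate model: the names `parse_commands` and `Command` are hypothetical, it assumes the byte layout listed above with the address and payload received most significant byte first, and it only decodes commands; it does not perform any AXI accesses.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

OP_WRITE = 0x09
OP_READ = 0x05


@dataclass
class Command:
    op: int
    address: int = 0
    length: int = 0
    payload: List[int] = field(default_factory=list)  # 32-bit words


def parse_commands(stream: Iterable[int]) -> List[Command]:
    """Behavioural model of the receive state machine.

    The states loosely mirror the VHDL (idle -> address -> length -> payload).
    Bytes that are not a valid op code are discarded while idle.
    """
    commands: List[Command] = []
    state = "idle"
    cmd: Optional[Command] = None
    remaining = 0
    word = 0
    for byte in stream:
        if state == "idle":
            if byte in (OP_READ, OP_WRITE):
                cmd = Command(op=byte)
                state, remaining = "address", 4
        elif state == "address":
            cmd.address = (cmd.address << 8) | byte  # MSB first
            remaining -= 1
            if remaining == 0:
                state = "length"
        elif state == "length":
            cmd.length = byte
            if cmd.op == OP_WRITE and byte > 0:
                state, remaining, word = "payload", 4, 0
            else:
                # Reads (and zero-length writes) carry no payload
                commands.append(cmd)
                state = "idle"
        elif state == "payload":
            word = (word << 8) | byte  # MSB first
            remaining -= 1
            if remaining == 0:
                cmd.payload.append(word)
                word, remaining = 0, 4
                if len(cmd.payload) == cmd.length:
                    commands.append(cmd)
                    state = "idle"
    return commands
```

Feeding it the write and read frames from the test bench yields a write of `0x55AA1234` to address `0x4` followed by a read of address `0x0`. Unlike the VHDL, this model does not restrict reads to a length of 1.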


With the bytes collected, either an AXI Lite write or an AXI Lite read is performed. Since AXI reads and writes use separate channels, we can use separate state machines for the AXI Lite read and write interfaces.
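All five AXI Lite channels (write address, write data, write response, read address and read data) use the same VALID/READY handshake, which is what lets the read and write sides run as independent state machines. The sketch below is a simplified, cycle-based Python model of a single channel, not part of the design; `simulate_channel` and the READY patterns are hypothetical. A transfer completes only on a clock edge where VALID and READY are both high, and the master holds VALID and its data until that happens.

```python
from typing import List, Sequence, Tuple


def simulate_channel(items: Sequence[int],
                     ready_pattern: Sequence[bool]) -> List[Tuple[int, int]]:
    """Cycle-based model of one AXI VALID/READY channel (hypothetical helper).

    ready_pattern[c] is the slave's READY level on clock edge c.
    Returns (edge, item) for every completed transfer.
    """
    pending = list(items)
    transfers: List[Tuple[int, int]] = []
    valid = False  # master's registered VALID
    data = 0       # master's registered data
    for edge, ready in enumerate(ready_pattern):
        # A transfer completes only when VALID and READY are both high
        if valid and ready:
            transfers.append((edge, data))
            valid = False
        # The master presents the next item without waiting for READY and
        # keeps VALID and data stable until the item is accepted
        if not valid and pending:
            data = pending.pop(0)
            valid = True
    return transfers
```

However the slave throttles READY, each item is transferred exactly once and in order; the slave only controls when.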


The final element of the microarchitecture is outputting received AXI read data over the AXI Stream master interface.
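One detail worth noting when writing host software: the write payload is shifted into the module most significant byte first, but the output process returns read data least significant byte first, because it sends `s_read_buffer_temp(7 downto 0)` and then shifts the register right by eight bits. The helper functions below are a hypothetical sketch of both directions of that conversion.

```python
from typing import List


def word_to_stream_bytes(word: int, num_bytes: int = 4) -> List[int]:
    """Split a read data word into bytes in the order the output process sends them.

    The low byte is sent first, then the register is shifted right by 8 bits.
    """
    return [(word >> (8 * i)) & 0xFF for i in range(num_bytes)]


def stream_bytes_to_word(data: bytes) -> int:
    """Host-side reassembly of returned read data (least significant byte first)."""
    return int.from_bytes(data, "little")
```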

The diagrams for each of the state machines can be seen below. When I wrote the main state machine, I tried to ensure that the HDL was flexible and would scale. For example, I included buffers that can hold multiple read and write data words (up to eight 32-bit words each).



The source code can be seen below.


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

--Declare entity
entity axi_protocol is
    generic(
            G_AXIL_DATA_WIDTH    :integer   := 32;                                         
            G_AXI_ADDR_WIDTH     :integer   := 32;                                         
            G_AXI_ID_WIDTH       :integer   := 8;                                          
            G_AXI_AWUSER_WIDTH   :integer   := 1                                           
    );
    port(   
            --Master clock & reset
            clk              :in std_ulogic;                                           
            reset            :in std_ulogic;                                           

            --! Master AXIS Interface  
            m_axis_tready : in  std_logic;
            m_axis_tdata  : out std_logic_vector(7 downto 0);
            m_axis_tvalid : out std_logic;

            --! Slave AXIS Interface
            s_axis_tready : out  std_logic;
            s_axis_tdata  : in std_logic_vector(7 downto 0);
            s_axis_tvalid : in std_logic;
            
            --! AXIL Interface
            --!Write address
            axi_awaddr    : out std_logic_vector
                            (G_AXI_ADDR_WIDTH-1 downto 0);                  
            axi_awprot    : out std_logic_vector(2 downto 0);                   
            axi_awvalid   : out std_logic;
            --!write data
            axi_wdata     : out std_logic_vector
                            (G_AXIL_DATA_WIDTH-1 downto 0);  
            axi_wstrb     : out std_logic_vector
                            (G_AXIL_DATA_WIDTH/8-1 downto 0);
            axi_wvalid    : out std_logic; 
            --!write response
            axi_bready    : out std_logic;
            --!read address
            axi_araddr    : out std_logic_vector
                            (G_AXI_ADDR_WIDTH-1 downto 0);               
            axi_arprot    : out std_logic_vector(2 downto 0);                   
            axi_arvalid   : out std_logic; 
            --!read data
            axi_rready    : out std_logic;    
            --write address
            axi_awready   : in std_logic;
            --write data
            axi_wready    : in std_logic;
            --write response
            axi_bresp     : in std_logic_vector(1 downto 0);                   
            axi_bvalid    : in std_logic;
            --read address
            axi_arready   : in std_logic;
            --read data       
            axi_rdata     : in std_logic_vector
                            (G_AXIL_DATA_WIDTH-1 downto 0);  
            axi_rresp     : in std_logic_vector(1 downto 0);                   
            axi_rvalid    : in std_logic 

        );
        
end entity axi_protocol;

architecture rtl of axi_protocol is 

    constant C_SINGLE_READ       : std_logic_vector(7 downto 0) := x"05";
    constant C_SINGLE_WRITE      : std_logic_vector(7 downto 0) := x"09";

    constant C_NUMB_ADDR_BYTES        : integer := 4;
    constant C_NUMB_LENGTH_BYTES      : integer := 1;
    constant C_NUMB_DATA_BYTES        : integer := 4;
    constant C_NUMB_AXIL_DATA_BYTES   : integer := 4;
    constant C_NUMB_CRC_BYTES         : integer := 4;   
    constant C_MAX_NUMB_BYTES         : integer := 4; 
    constant C_ZERO_PAD               : std_logic_vector(7 downto 0) 
                                        := (others => '0');
    
    type t_fsm is (idle, address, length, dummy, 
                    write_payload, read_payload, crc, 
                    write_axil, write_axi, read_axi, read_axil);
    type t_op_fsm is (idle, output);
    type t_array is array (0 to 7) of std_logic_vector(31 downto 0);
    type axil_read_fsm is (IDLE, START, CHECK_ADDR_RESP, READ_DATA, DONE);
    type axil_write_fsm is (IDLE, START, CHECK_ADDR_RESP, WRITE_DATA, 
                            RESP_READY, CHECK_RESP, DONE);
    signal write_state : axil_write_fsm;
    signal read_state  : axil_read_fsm;

    signal s_current_state : t_fsm;

    signal s_command            : std_logic_vector(7 downto 0);
    signal s_address            : std_logic_vector
                                    ((C_NUMB_ADDR_BYTES * 8)-1 downto 0);
    signal s_length             : std_logic_vector(7 downto 0);
    signal s_length_axi         : std_logic_vector(7 downto 0);
    signal s_buf_cnt            : unsigned(7 downto 0);
    signal s_byte_pos           : integer range 0 to C_MAX_NUMB_BYTES; 
    signal s_num_bytes          : integer range 0 to C_MAX_NUMB_BYTES; 
    signal s_s_tready           : std_logic;
    signal s_write_buffer       : t_array :=(others=>(others=>'0'));
    signal s_read_buffer        : t_array :=(others=>(others=>'0'));
    signal s_write_buffer_temp  : std_logic_vector(31 downto 0);
    signal s_read_buffer_temp   : std_logic_vector(31 downto 0);

    --axil lite data interface 
    signal s_axil_data          : std_logic_vector
                                    (G_AXIL_DATA_WIDTH-1 downto 0);
    signal s_axil_valid         : std_logic;
    signal s_axil_idata         : std_logic_vector
                                    (G_AXIL_DATA_WIDTH-1 downto 0);


    --axi mstream 
    signal s_opptr              : unsigned(7 downto 0);
    signal s_start              : std_logic;
    signal s_op_state           : t_op_fsm;
    signal s_op_byte            : integer range 0 to C_MAX_NUMB_BYTES; 
    signal start_read           : std_logic;
    signal start_write          : std_logic;

begin

    s_axis_tready <= s_s_tready;

FSM : process(clk, reset )
begin 
    if (reset = '0') then 
        s_current_state <= idle;
        s_start     <= '0';
        start_read  <= '0';
        start_write <= '0';
        s_s_tready  <= '0';
    elsif rising_edge(clk) then
        s_s_tready  <= '1';
        s_start     <= '0';
        start_read  <= '0';
        start_write <= '0';
        case s_current_state is

            when idle => --wait for a valid op code; any other byte is discarded
                s_buf_cnt           <= (others =>'0');
                if (s_axis_tvalid = '1' and s_s_tready = '1') and 
                    (s_axis_tdata = C_SINGLE_READ 
                     or s_axis_tdata = C_SINGLE_WRITE) then
                        s_s_tready <= '0';
                        s_command <= s_axis_tdata;
                        s_current_state <= address;
                        s_byte_pos <= C_NUMB_ADDR_BYTES;
                end if;

            when address =>
                if s_byte_pos = 0 then
                    s_s_tready <= '0';
                    s_byte_pos <= C_NUMB_LENGTH_BYTES;
                    s_current_state <= length;    
                elsif s_axis_tvalid = '1' and s_s_tready = '1'  then
                    s_address <= s_address(s_address'length-8-1 downto 0) 
                    & s_axis_tdata;
                    s_byte_pos <= s_byte_pos - 1;
                    if s_byte_pos = 1 then 
                        s_s_tready <= '0';
                    end if; 
                end if;

            when length => 
                if s_byte_pos = 0 then
                    s_s_tready <= '0';
                    if s_command = C_SINGLE_READ 
                        and unsigned(s_length) = 1 then
                        s_current_state <= read_axil; 
                        start_read      <= '1';
                        s_num_bytes     <= C_NUMB_AXIL_DATA_BYTES;
                    elsif s_command = C_SINGLE_WRITE then
                        s_buf_cnt       <= (others =>'0');
                        s_byte_pos      <= C_NUMB_AXIL_DATA_BYTES;
                        s_num_bytes     <= C_NUMB_AXIL_DATA_BYTES;
                        s_current_state <= write_payload;
                    else
                        --unsupported read length: discard the command
                        s_current_state <= idle;
                    end if;
                elsif s_axis_tvalid = '1' and s_s_tready = '1'  then
                    s_length            <= s_axis_tdata;
                    s_length_axi        <= 
                    std_logic_vector(unsigned(s_axis_tdata)-1);
                    s_byte_pos          <= s_byte_pos - 1;
                    s_s_tready <= '0';
                end if;

            when read_axil =>
                --do not accept (and silently drop) bytes during the read
                s_s_tready <= '0';
                if s_axil_valid = '1' then 
                    s_start             <= '1';
                    s_read_buffer(0)(G_AXIL_DATA_WIDTH-1 downto 0) <=     
                                        s_axil_data;
                end if;
                if (read_state = DONE) then
                    s_current_state <= read_payload;
                end if;
            

            when write_payload =>
                if s_buf_cnt = unsigned(s_length) then 
                    s_s_tready <= '0';
                    s_current_state <= write_axil;
                    start_write <= '1';
                else
                    if s_byte_pos = 0 then 
                        s_s_tready <= '0';
                        s_byte_pos <= s_num_bytes;
                        s_write_buffer(to_integer(s_buf_cnt)) <= 
                                                      s_write_buffer_temp;
                        s_buf_cnt <= s_buf_cnt + 1;  
                    elsif (s_axis_tvalid = '1' and s_s_tready = '1')  then
                        s_write_buffer_temp <= s_write_buffer_temp
                        (s_write_buffer_temp'length-8-1 downto 0) 
                         & s_axis_tdata;
                        s_byte_pos <= s_byte_pos - 1;  
                        if s_byte_pos = 1 then 
                            s_s_tready <= '0';
                        end if;   
                    end if;
                end if;

            when write_axil =>  
                s_s_tready <= '0';
                s_axil_idata <= s_write_buffer(0);
                if (write_state = DONE) then
                    s_current_state <= idle;
                end if;

            when read_payload =>
                s_current_state <= idle;
            when others => null;
        end case;
    end if;

end process;

process(clk, reset)
begin
    if (reset = '0') then 
        s_op_state      <= idle;
        m_axis_tvalid   <= '0';
        m_axis_tdata    <= (others =>'0');
        s_opptr         <= (others => '0');
        s_op_byte       <= C_NUMB_AXIL_DATA_BYTES;
    elsif rising_edge(clk) then 
        case s_op_state is  
            when idle => 
                if s_start = '1' then 
                    s_opptr     <= (others => '0');
                    s_read_buffer_temp <= s_read_buffer(0);
                    s_op_byte   <= s_num_bytes;
                    s_op_state  <= output;
                end if;
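            when output =>
                if s_opptr = unsigned(s_length) then 
                    s_op_state <= idle;
                --Only advance when the downstream slave is ready. While
                --m_axis_tready is low, m_axis_tvalid and m_axis_tdata hold
                --their values so a byte that has been presented is not lost
                elsif m_axis_tready = '1' then                   
                    if s_op_byte = 0 then 
                        --Word complete: deassert TVALID, move to the next word
                        m_axis_tvalid <= '0';
                        s_op_byte   <= s_num_bytes;
                        s_opptr     <= s_opptr + 1;
                        s_read_buffer_temp <= s_read_buffer
                                    (to_integer(s_opptr) + 1); 
                    else
                        --Send the least significant byte, then shift right
                        m_axis_tvalid <= '1';
                        m_axis_tdata  <= s_read_buffer_temp(7 downto 0);
                        s_read_buffer_temp <= C_ZERO_PAD 
                            & s_read_buffer_temp
                            (s_read_buffer_temp'length-1 downto 8);
                        s_op_byte <= s_op_byte - 1; 
                    end if;     
                end if;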
        end case;
    end if;

end process;


process(clk, reset)
begin  

    if (reset = '0') then 
        write_state <= IDLE;
        axi_awaddr  <= (others =>'0');
        axi_awprot  <= (others =>'0');
        axi_awvalid <= '0';
        axi_wdata   <= (others =>'0');
        axi_wstrb   <= (others =>'0');
        axi_wvalid  <= '0';
        axi_bready  <= '0';
    elsif rising_edge(clk) then 

        case write_state is
            --Send write address
            when IDLE =>
                if start_write = '1' then
                    write_state <= START;
                end if;
            when START =>
                axi_awaddr  <= s_address;
                axi_awprot  <= "010";
                axi_awvalid <= '1';
                write_state <= CHECK_ADDR_RESP;

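            --Wait for slave to acknowledge receipt of the address
            when CHECK_ADDR_RESP =>
                if (axi_awready = '1' ) then
                    axi_awaddr  <= (others => '0');
                    axi_awprot  <= (others => '0');
                    axi_awvalid <= '0';
                    --Present the write data with all byte lanes enabled
                    --(an all-zero WSTRB means no bytes are written)
                    axi_wdata   <= s_axil_idata;
                    axi_wstrb   <= (others => '1');
                    axi_wvalid  <= '1';
                    write_state <= WRITE_DATA;
                else
                    write_state <= CHECK_ADDR_RESP;
                end if;
            --Hold the write data until the slave accepts it, then drop
            --WVALID so the data is transferred exactly once
            when WRITE_DATA =>          
                if (axi_wready = '1') then     
                    axi_wvalid  <= '0';
                    axi_wstrb   <= (others => '0');
                    write_state <= RESP_READY;
                else
                    write_state <= WRITE_DATA;
                end if;
            --Set response ready
            when RESP_READY =>  
                axi_bready <= '1';
                write_state <= CHECK_RESP;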
            --Check the response
            when CHECK_RESP =>
                if (axi_bvalid = '1') then
                    axi_bready <= '0';
                    write_state <= DONE;
                end if; 
            --Indicate the transaction has completed
            when DONE =>
                write_state <= IDLE;
            when others =>
                write_state <= IDLE;
        end case;
    end if;
end process;

process(clk, reset)
begin  

    if (reset = '0') then 
        read_state   <= IDLE;   
        axi_araddr   <= (others =>'0');
        axi_arprot   <= (others =>'0');
        axi_arvalid  <= '0';
        axi_rready   <= '0';
        s_axil_data  <= (others =>'0');
        s_axil_valid <= '0';
    elsif rising_edge(clk) then 
        case read_state is
            when IDLE =>
                if start_read = '1' then
                    read_state <= START;
                end if;
            --Send read address
            when START =>
                axi_araddr   <= s_address;
                axi_arprot   <= "010";
                axi_arvalid  <= '1';
                s_axil_valid <= '0';
                read_state   <= CHECK_ADDR_RESP;

            --Wait for the slave to acknowledge receipt of the address
            when CHECK_ADDR_RESP =>
                if (axi_arready = '1' ) then
                    axi_araddr  <= (others => '0');
                    axi_arprot  <= (others => '0');
                    axi_arvalid <= '0';
                    read_state  <= READ_DATA;
                else
                    read_state  <= CHECK_ADDR_RESP;
                end if;
                s_axil_valid <= '0';

            --Read data from the slave
            when READ_DATA =>
                s_axil_data <= axi_rdata; 
                if (axi_rvalid = '1') then                   
                    s_axil_valid <= '1';
                    read_state   <= DONE;
                else
                    s_axil_valid <= '0';
                    read_state   <= READ_DATA;
                end if;
                axi_rready <= '1';            
            --Indicate the transaction has completed
            when DONE =>
                axi_rready   <= '0';
                s_axil_data  <= (others => '0');
                s_axil_valid <= '0';
                read_state   <= IDLE;
            when others =>
                read_state <= IDLE;
        end case;
    end if;
end process;

end architecture;

To test the module, I again used a cocotb test bench. It applies commands over the AXI Stream slave interface and attaches an AXI Lite RAM model (from cocotbext-axi) to the AXI Lite interface. Using cocotb, we can then easily create simple read and write accesses and check that the data is as expected. Using third-party AXI verification models also helps confirm that our AXI implementation complies with the standard.

import cocotb
from cocotb.clock import Clock
from cocotb.triggers import Timer

from cocotbext.axi import (AxiStreamSource, AxiStreamBus, AxiStreamSink,
                           AxiLiteBus, AxiLiteRam)

async def reset_dut(reset_n, duration_ns):
    # The axi_protocol reset input is active low
    reset_n.value = 0
    await Timer(duration_ns, units="ns")
    reset_n.value = 1
    reset_n._log.debug("Reset complete")


@cocotb.test()
async def run_test(dut):

    PERIOD = 10
    # Drive the DUT inputs to known values before the clock starts
    dut.clk.value = 0
    dut.reset.value = 0
    dut.s_axis_tvalid.value = 0
    dut.m_axis_tready.value = 0
    dut.axi_awready.value = 0
    dut.axi_wready.value = 0
    dut.axi_bvalid.value = 0
    dut.axi_arready.value = 0
    dut.axi_rvalid.value = 0

    cocotb.start_soon(Clock(dut.clk, PERIOD, units="ns").start())

    await reset_dut(dut.reset, 50)
    dut._log.debug("After reset")
    await Timer(20*PERIOD, units='ns')

    # The reset is active low, so the cocotbext-axi models must be told
    # (their default is an active-high reset)
    axis_source = AxiStreamSource(AxiStreamBus.from_prefix(dut, "s_axis"),
                                  dut.clk, dut.reset, reset_active_level=False)
    axis_sink = AxiStreamSink(AxiStreamBus.from_prefix(dut, "m_axis"),
                              dut.clk, dut.reset, reset_active_level=False)
    # RAM model acting as the slave on the DUT's AXI Lite master interface
    axil_ram = AxiLiteRam(AxiLiteBus.from_prefix(dut, "axi"), dut.clk,
                          dut.reset, reset_active_level=False, size=2**16)

    # Write command: op code, address (MSB first), length, payload (MSB first)
    data = [0x09, 0x00, 0x00, 0x00, 0x04, 0x01, 0x55, 0xaa, 0x12, 0x34]
    await axis_source.send(data)
    await axis_source.wait()

    await Timer(20*PERIOD, units='ns')

    # Preload address 0 through the RAM model's backdoor and read it back
    axil_ram.write_dword(0x0000, 0x98765432)
    data = axil_ram.read_dword(0x0000)

    dut._log.info("Mem Data %x" % data)

    # Read command: op code, address, length (no payload)
    data = [0x05, 0x00, 0x00, 0x00, 0x00, 0x01]
    await axis_source.send(data)
    await axis_source.wait()
    await axis_sink.recv()
    await Timer(20*PERIOD, units='ns')

       

The simulation results show that the read and write accesses behave as expected.
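The test bench above logs values rather than checking them. One option for making it self-checking is a small reference model of the protocol that predicts the bytes the DUT should return. The class below is a hypothetical sketch, assuming single-word accesses, a zero-initialised memory, and the byte orders used by the VHDL (address and write data most significant byte first, read data least significant byte first).

```python
import struct
from typing import Dict, List


class ProtocolModel:
    """Hypothetical golden model of the single-word protocol.

    Memory is a dict of 32-bit words keyed by byte address. apply() takes one
    command frame and returns the bytes the DUT should send back over its
    AXI Stream master interface (none for a write).
    """

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}

    def apply(self, frame: bytes) -> List[int]:
        op, address, length = struct.unpack(">BIB", frame[:6])
        if length != 1:
            raise ValueError("only single-word (length = 1) accesses are modelled")
        if op == 0x09:
            (word,) = struct.unpack(">I", frame[6:10])  # payload, MSB first
            self.mem[address] = word
            return []
        if op == 0x05:
            # Read data leaves the DUT least significant byte first
            return list(self.mem.get(address, 0).to_bytes(4, "little"))
        raise ValueError(f"unknown op code 0x{op:02x}")
```

In the cocotb test, the same bytes passed to `axis_source.send()` could also be passed to `apply()`, with backdoor writes such as the `0x98765432` preload mirrored into `model.mem`, and the result compared against the bytes received on the AXI Stream master interface.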

This block can be integrated with the UART block to create a simple system which enables us to access the AXI network.
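Once the protocol block sits behind the UART, the host side only needs to send the command bytes and, for reads, collect four bytes back. The class below is a sketch of such a host helper; `AxiUartLink`, the port name `/dev/ttyUSB0` and the 115200 baud rate are assumptions, and it relies only on the port object providing `write()` and `read()`, which a pyserial `serial.Serial` instance does.

```python
import struct


class AxiUartLink:
    """Hypothetical host-side helper for the UART to AXI Lite bridge.

    `port` is any object with write(bytes) and read(n), for example a
    pyserial serial.Serial opened with a timeout.
    """

    def __init__(self, port) -> None:
        self.port = port

    def write32(self, address: int, value: int) -> None:
        # Op code, address and data MSB first; the bridge sends no reply
        self.port.write(struct.pack(">BIBI", 0x09, address, 1, value))

    def read32(self, address: int) -> int:
        self.port.write(struct.pack(">BIB", 0x05, address, 1))
        data = self.port.read(4)
        if len(data) != 4:
            raise TimeoutError(f"expected 4 bytes, got {len(data)}")
        # Read data is returned least significant byte first
        return int.from_bytes(data, "little")


if __name__ == "__main__":
    import serial  # pyserial; the port name and baud rate are assumptions

    with serial.Serial("/dev/ttyUSB0", 115200, timeout=1) as ser:
        link = AxiUartLink(ser)
        link.write32(0x4, 0x55AA1234)
        print(hex(link.read32(0x4)))
```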


As I mentioned in the introduction, this can be expanded into a more flexible solution that includes AXI4. We have a more complex implementation of this, which will be open-sourced soon.
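For AXI4, the length byte maps naturally onto a burst: the VHDL already computes `s_length_axi` as the length minus one, which is how AXI4 encodes AWLEN/ARLEN. The function below is a hypothetical sketch of the checks an AXI4 version would need for INCR bursts (1 to 256 beats, and no crossing of a 4 KB address boundary), assuming 4-byte beats to match the 32-bit data width used here.

```python
def axi4_incr_burst(address: int, length: int, bytes_per_beat: int = 4) -> int:
    """Return the AxLEN value for an AXI4 INCR burst of `length` beats.

    AxLEN encodes the number of beats minus one. INCR bursts are limited to
    256 beats and must not cross a 4 KB address boundary.
    """
    if not 1 <= length <= 256:
        raise ValueError("AXI4 INCR bursts are 1 to 256 beats")
    last_byte = address + length * bytes_per_beat - 1
    if address // 4096 != last_byte // 4096:
        raise ValueError("burst would cross a 4 KB boundary")
    return length - 1
```

Because the protocol's length field is a single byte, a command can request at most 255 beats, which is within the INCR limit; the 4 KB check still has to be enforced, either on the host or in the HDL.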


I will integrate this within a device in a future blog and show how we can access and debug our systems during integration and test.



Sponsored by AMD Xilinx
