Last week we examined how to create a UART with AXI Stream interfaces, which gives us access to the AXI buses inside a device for debugging.
In this blog, we are going to implement the protocol-level element of this design: the block that takes the bytes received over the AXI Stream and converts them into AXI Lite accesses. The same approach can be used, with some additional work, to support full AXI.
First let’s recap. We are going to enable either a read or write of a single AXI register using commands received over a UART.
The AXI Stream interactions use the following command structure.
Write Op Code – 1 byte, value 0x09
Read Op Code – 1 byte, value 0x05
Address – 4 bytes, the address of the AXI access, sent most significant byte first
Length – 1 byte always 1 for AXI Lite implementations
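Payload – the word to write or the data read back; 4 bytes for an AXI Lite read or write. Write data is sent most significant byte first, while read data is returned least significant byte first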
This allows us to use AXI Lite to read or write a single register address. In our production version of this code, we provide both AXI Lite and AXI4, along with configurable generics for the bus and address widths. I have, however, kept this one a little simpler to demonstrate the concepts involved.
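As a quick illustration of the command structure above, here is a minimal Python sketch that packs the two commands into bytes. The function names `build_write` and `build_read` are hypothetical helpers, not part of the original design, and the sketch assumes the address and write data are sent most significant byte first, which is how the VHDL in this post shifts them in.

```python
import struct

OP_WRITE = 0x09  # write op code from the command structure
OP_READ = 0x05   # read op code from the command structure


def build_write(address: int, data: int) -> bytes:
    """Pack op code, 4-byte address, length (always 1) and a 4-byte word."""
    # ">" selects big-endian packing with no padding: B = 1 byte, I = 4 bytes
    return struct.pack(">BIBI", OP_WRITE, address, 1, data)


def build_read(address: int) -> bytes:
    """Pack op code, 4-byte address and length (always 1); reads carry no payload."""
    return struct.pack(">BIB", OP_READ, address, 1)


if __name__ == "__main__":
    print(build_write(0x00000004, 0x55AA1234).hex(" "))
    print(build_read(0x00000000).hex(" "))
```

The write frame produced for address 0x00000004 and data 0x55AA1234 is the same byte sequence used in the cocotb test bench later in this post.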
First, let’s look at the microarchitecture of the module where we need to be able to receive several bytes from the AXIS slave interface. These bytes determine the action required by the module, which either performs a read or write at the requested address. This requires a state machine which can receive information over the AXIS slave interface and collect the address, length, and data bytes.
With the bytes collected, either an AXI Lite write or AXI Lite read is performed. Since AXI write and read are separate channels, we can use separate state machines for the AXI Lite interfacing.
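To make the byte-collection sequence concrete, the following is a small software model of the receive side, written as a sketch rather than a description of the RTL. The `Command` class and `parse_commands` function are hypothetical names. The model only handles a length of 1 and raises an error otherwise, which is a simplification for illustration.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, Optional

OP_READ = 0x05
OP_WRITE = 0x09


@dataclass
class Command:
    opcode: int
    address: int
    length: int
    payload: Optional[int] = None  # 32-bit write word; None for a read


def parse_commands(stream: Iterable[int]) -> Iterator[Command]:
    """Walk the byte stream in the order: op code, address, length, payload."""
    it = iter(stream)
    for opcode in it:
        if opcode not in (OP_READ, OP_WRITE):
            continue  # idle: bytes that are not a known op code are ignored
        header = bytes(islice(it, 5))  # 4 address bytes + 1 length byte
        if len(header) < 5:
            return  # stream ended part way through a command
        address = int.from_bytes(header[:4], "big")
        length = header[4]
        if length != 1:
            raise ValueError("this sketch only models single-word accesses")
        payload = None
        if opcode == OP_WRITE:
            data = bytes(islice(it, 4))
            if len(data) < 4:
                return  # stream ended part way through the payload
            payload = int.from_bytes(data, "big")
        yield Command(opcode, address, length, payload)


if __name__ == "__main__":
    rx = [0x09, 0, 0, 0, 4, 1, 0x55, 0xAA, 0x12, 0x34, 0x05, 0, 0, 0, 0, 1]
    for cmd in parse_commands(rx):
        print(cmd)
```

A model like this can be handy for generating expected transactions when checking the hardware in simulation.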
The final element of the microarchitecture is outputting received AXI read data over the AXI Stream master interface.
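One detail worth noting about the read path is the byte order. The sketch below mirrors, in Python, how the output shift register in the VHDL in this post sends the low byte first and then shifts right by eight bits. The function name `serialise_word` is hypothetical and used only for illustration.

```python
def serialise_word(word: int, num_bytes: int = 4) -> list:
    """Emit a word one byte at a time, least significant byte first."""
    shift_reg = word & 0xFFFFFFFF
    out = []
    for _ in range(num_bytes):
        out.append(shift_reg & 0xFF)  # the low byte goes onto the stream
        shift_reg >>= 8               # shift right, zero-filling the top byte
    return out


if __name__ == "__main__":
    stream_bytes = serialise_word(0x55AA1234)
    print([hex(b) for b in stream_bytes])
    # The receiver rebuilds the word by treating the bytes as little-endian
    print(hex(int.from_bytes(bytes(stream_bytes), "little")))
```

Note that this is the opposite order to the write payload, which is shifted in most significant byte first, so host software needs to treat the two directions differently.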
The diagrams for each of the state machines can be seen below. When I wrote the main state machine, I tried to ensure that the HDL was flexible and would scale. For example, I included buffers which could hold multiple read and write data words.
The source code can be seen below.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

--Declare entity
entity axi_protocol is
  generic(
    G_AXIL_DATA_WIDTH  : integer := 32;
    G_AXI_ADDR_WIDTH   : integer := 32;
    G_AXI_ID_WIDTH     : integer := 8;
    G_AXI_AWUSER_WIDTH : integer := 1
  );
  port(
    --Master clock & reset
    clk   : in std_ulogic;
    reset : in std_ulogic;
    --! Master AXIS Interface
    m_axis_tready : in  std_logic;
    m_axis_tdata  : out std_logic_vector(7 downto 0);
    m_axis_tvalid : out std_logic;
    --! Slave AXIS Interface
    s_axis_tready : out std_logic;
    s_axis_tdata  : in  std_logic_vector(7 downto 0);
    s_axis_tvalid : in  std_logic;
    --! AXIL Interface
    --!Write address
    axi_awaddr  : out std_logic_vector(G_AXI_ADDR_WIDTH-1 downto 0);
    axi_awprot  : out std_logic_vector(2 downto 0);
    axi_awvalid : out std_logic;
    --!write data
    axi_wdata   : out std_logic_vector(G_AXIL_DATA_WIDTH-1 downto 0);
    axi_wstrb   : out std_logic_vector(G_AXIL_DATA_WIDTH/8-1 downto 0);
    axi_wvalid  : out std_logic;
    --!write response
    axi_bready  : out std_logic;
    --!read address
    axi_araddr  : out std_logic_vector(G_AXI_ADDR_WIDTH-1 downto 0);
    axi_arprot  : out std_logic_vector(2 downto 0);
    axi_arvalid : out std_logic;
    --!read data
    axi_rready  : out std_logic;
    --write address
    axi_awready : in std_logic;
    --write data
    axi_wready  : in std_logic;
    --write response
    axi_bresp   : in std_logic_vector(1 downto 0);
    axi_bvalid  : in std_logic;
    --read address
    axi_arready : in std_logic;
    --read data
    axi_rdata   : in std_logic_vector(G_AXIL_DATA_WIDTH-1 downto 0);
    axi_rresp   : in std_logic_vector(1 downto 0);
    axi_rvalid  : in std_logic
  );
end entity axi_protocol;
architecture rtl of axi_protocol is
  constant C_SINGLE_READ          : std_logic_vector(7 downto 0) := x"05";
  constant C_SINGLE_WRITE         : std_logic_vector(7 downto 0) := x"09";
  constant C_NUMB_ADDR_BYTES      : integer := 4;
  constant C_NUMB_LENGTH_BYTES    : integer := 1;
  constant C_NUMB_DATA_BYTES      : integer := 4;
  constant C_NUMB_AXIL_DATA_BYTES : integer := 4;
  constant C_NUMB_CRC_BYTES       : integer := 4;
  constant C_MAX_NUMB_BYTES       : integer := 4;
  constant C_ZERO_PAD             : std_logic_vector(7 downto 0) := (others => '0');

  type t_fsm is (idle, address, length, dummy,
                 write_payload, read_payload, crc,
                 write_axil, write_axi, read_axi, read_axil);
  type t_op_fsm is (idle, output);
  type t_array is array (0 to 7) of std_logic_vector(31 downto 0);
  type axil_read_fsm is (IDLE, START, CHECK_ADDR_RESP, READ_DATA, DONE);
  type axil_write_fsm is (IDLE, START, CHECK_ADDR_RESP, WRITE_DATA,
                          RESP_READY, CHECK_RESP, DONE);

  signal write_state         : axil_write_fsm;
  signal read_state          : axil_read_fsm;
  signal s_current_state     : t_fsm;
  signal s_command           : std_logic_vector(7 downto 0);
  signal s_address           : std_logic_vector((C_NUMB_ADDR_BYTES * 8)-1 downto 0);
  signal s_length            : std_logic_vector(7 downto 0);
  signal s_length_axi        : std_logic_vector(7 downto 0);
  signal s_buf_cnt           : unsigned(7 downto 0);
  signal s_byte_pos          : integer range 0 to C_MAX_NUMB_BYTES;
  signal s_num_bytes         : integer range 0 to C_MAX_NUMB_BYTES;
  signal s_s_tready          : std_logic;
  signal s_write_buffer      : t_array := (others => (others => '0'));
  signal s_read_buffer       : t_array := (others => (others => '0'));
  signal s_write_buffer_temp : std_logic_vector(31 downto 0);
  signal s_read_buffer_temp  : std_logic_vector(31 downto 0);
  --axil lite data interface
  signal s_axil_data         : std_logic_vector(G_AXIL_DATA_WIDTH-1 downto 0);
  signal s_axil_valid        : std_logic;
  signal s_axil_idata        : std_logic_vector(G_AXIL_DATA_WIDTH-1 downto 0);
  --axi mstream
  signal s_opptr             : unsigned(7 downto 0);
  signal s_start             : std_logic;
  signal s_op_state          : t_op_fsm;
  signal s_op_byte           : integer range 0 to C_MAX_NUMB_BYTES;
  signal start_read          : std_logic;
  signal start_write         : std_logic;
begin
  s_axis_tready <= s_s_tready;

  -- Main FSM: collects the op code, address, length and payload bytes
  FSM : process(clk, reset)
  begin
    if (reset = '0') then -- reset is active low
      s_current_state <= idle;
      s_start         <= '0';
      start_read      <= '0';
      start_write     <= '0';
      s_s_tready      <= '0';
    elsif rising_edge(clk) then
      -- defaults, overridden below where required
      s_s_tready  <= '1';
      s_start     <= '0';
      start_read  <= '0';
      start_write <= '0';
      case s_current_state is
        when idle =>
          -- wait for a supported op code; any other byte is ignored
          s_buf_cnt <= (others => '0');
          if (s_axis_tvalid = '1' and s_s_tready = '1') and
             (s_axis_tdata = C_SINGLE_READ
              or s_axis_tdata = C_SINGLE_WRITE) then
            s_s_tready      <= '0';
            s_command       <= s_axis_tdata;
            s_current_state <= address;
            s_byte_pos      <= C_NUMB_ADDR_BYTES;
          end if;
        when address =>
          -- shift in the address, most significant byte first
          if s_byte_pos = 0 then
            s_s_tready      <= '0';
            s_byte_pos      <= C_NUMB_LENGTH_BYTES;
            s_current_state <= length;
          elsif s_axis_tvalid = '1' and s_s_tready = '1' then
            s_address <= s_address(s_address'length-8-1 downto 0)
                         & s_axis_tdata;
            s_byte_pos <= s_byte_pos - 1;
            if s_byte_pos = 1 then
              s_s_tready <= '0';
            end if;
          end if;
        when length =>
          if s_byte_pos = 0 then
            s_s_tready <= '0';
            if s_command = C_SINGLE_READ
               and unsigned(s_length) = 1 then
              s_current_state <= read_axil;
              start_read      <= '1';
              s_num_bytes     <= C_NUMB_AXIL_DATA_BYTES;
            elsif s_command = C_SINGLE_WRITE then
              s_buf_cnt       <= (others => '0');
              s_byte_pos      <= C_NUMB_AXIL_DATA_BYTES;
              s_num_bytes     <= C_NUMB_AXIL_DATA_BYTES;
              s_current_state <= write_payload;
            else
              -- unsupported request (e.g. a read with a length other
              -- than 1): drop it rather than stall in this state
              s_current_state <= idle;
            end if;
          elsif s_axis_tvalid = '1' and s_s_tready = '1' then
            s_length     <= s_axis_tdata;
            s_length_axi <= std_logic_vector(unsigned(s_axis_tdata)-1);
            s_byte_pos   <= s_byte_pos - 1;
            s_s_tready   <= '0';
          end if;
        when read_axil =>
          -- capture the read data and start the output FSM
          if s_axil_valid = '1' then
            s_start <= '1';
            s_read_buffer(0)(G_AXIL_DATA_WIDTH-1 downto 0) <= s_axil_data;
          end if;
          if (read_state = DONE) then
            s_current_state <= read_payload;
          end if;
        when write_payload =>
          -- shift in the write data, most significant byte first
          if s_buf_cnt = unsigned(s_length) then
            s_s_tready      <= '0';
            s_current_state <= write_axil;
            start_write     <= '1';
          else
            if s_byte_pos = 0 then
              s_s_tready <= '0';
              s_byte_pos <= s_num_bytes;
              s_write_buffer(to_integer(s_buf_cnt)) <= s_write_buffer_temp;
              s_buf_cnt  <= s_buf_cnt + 1;
            elsif (s_axis_tvalid = '1' and s_s_tready = '1') then
              s_write_buffer_temp <= s_write_buffer_temp
                                     (s_write_buffer_temp'length-8-1 downto 0)
                                     & s_axis_tdata;
              s_byte_pos <= s_byte_pos - 1;
              if s_byte_pos = 1 then
                s_s_tready <= '0';
              end if;
            end if;
          end if;
        when write_axil =>
          s_s_tready   <= '0';
          s_axil_idata <= s_write_buffer(0);
          if (write_state = DONE) then
            s_current_state <= idle;
          end if;
        when read_payload =>
          s_current_state <= idle;
        when others => null;
      end case;
    end if;
  end process;
  -- Output FSM: sends the read data over the AXIS master interface
  -- Note: tvalid is only raised while tready is high, so this assumes
  -- a sink that keeps tready asserted
  process(clk, reset)
  begin
    if (reset = '0') then
      s_op_state    <= idle;
      m_axis_tvalid <= '0';
      m_axis_tdata  <= (others => '0');
      s_opptr       <= (others => '0');
      s_op_byte     <= C_NUMB_AXIL_DATA_BYTES;
    elsif rising_edge(clk) then
      case s_op_state is
        when idle =>
          if s_start = '1' then
            s_opptr            <= (others => '0');
            s_read_buffer_temp <= s_read_buffer(0);
            s_op_byte          <= s_num_bytes;
            s_op_state         <= output;
          end if;
        when output =>
          m_axis_tvalid <= '0';
          if s_opptr = unsigned(s_length) then
            s_op_state <= idle;
          else
            if m_axis_tready = '1' then
              if s_op_byte = 0 then
                -- word complete, load the next one
                s_op_byte <= s_num_bytes;
                s_opptr   <= s_opptr + 1;
                s_read_buffer_temp <= s_read_buffer
                                      (to_integer(s_opptr) + 1);
              else
                -- send the least significant byte first, then shift right
                m_axis_tvalid <= '1';
                m_axis_tdata  <= s_read_buffer_temp(7 downto 0);
                s_read_buffer_temp <= C_ZERO_PAD
                                      & s_read_buffer_temp
                                      (s_read_buffer_temp'length-1 downto 8);
                s_op_byte <= s_op_byte - 1;
              end if;
            end if;
          end if;
      end case;
    end if;
  end process;
  -- AXI Lite write FSM
  process(clk, reset)
  begin
    if (reset = '0') then
      write_state <= IDLE;
      axi_awaddr  <= (others => '0');
      axi_awprot  <= (others => '0');
      axi_awvalid <= '0';
      axi_wdata   <= (others => '0');
      axi_wstrb   <= (others => '0');
      axi_wvalid  <= '0';
      axi_bready  <= '0';
    elsif rising_edge(clk) then
      case write_state is
        when IDLE =>
          if start_write = '1' then
            write_state <= START;
          end if;
        --Send write address
        when START =>
          axi_awaddr  <= s_address;
          axi_awprot  <= "010";
          axi_awvalid <= '1';
          write_state <= CHECK_ADDR_RESP;
        --Wait for slave to acknowledge receipt
        when CHECK_ADDR_RESP =>
          if (axi_awready = '1') then
            axi_awaddr  <= (others => '0');
            axi_awprot  <= (others => '0');
            axi_awvalid <= '0';
            write_state <= WRITE_DATA;
          else
            write_state <= CHECK_ADDR_RESP;
          end if;
        --Send write data
        when WRITE_DATA =>
          axi_wdata  <= s_axil_idata;
          -- enable every byte lane; with the strobes left at zero
          -- the slave would not update any bytes
          axi_wstrb  <= (others => '1');
          axi_wvalid <= '1';
          if (axi_wready = '1') then
            write_state <= RESP_READY;
          else
            write_state <= WRITE_DATA;
          end if;
        --Set response ready
        when RESP_READY =>
          axi_wvalid  <= '0';
          axi_bready  <= '1';
          write_state <= CHECK_RESP;
        --Check the response
        when CHECK_RESP =>
          if (axi_bvalid = '1') then
            axi_bready  <= '0';
            write_state <= DONE;
          end if;
        --Indicate the transaction has completed
        when DONE =>
          write_state <= IDLE;
        when others =>
          write_state <= START;
      end case;
    end if;
  end process;
  -- AXI Lite read FSM
  process(clk, reset)
  begin
    if (reset = '0') then
      read_state   <= IDLE;
      axi_araddr   <= (others => '0');
      axi_arprot   <= (others => '0');
      axi_arvalid  <= '0';
      axi_rready   <= '0';
      s_axil_valid <= '0';
      s_axil_data  <= (others => '0');
    elsif rising_edge(clk) then
      case read_state is
        when IDLE =>
          if start_read = '1' then
            read_state <= START;
          end if;
        --Send read address
        when START =>
          axi_araddr   <= s_address;
          axi_arprot   <= "010";
          axi_arvalid  <= '1';
          s_axil_valid <= '0';
          read_state   <= CHECK_ADDR_RESP;
        --Wait for the slave to acknowledge receipt of the address
        when CHECK_ADDR_RESP =>
          if (axi_arready = '1') then
            axi_araddr  <= (others => '0');
            axi_arprot  <= (others => '0');
            axi_arvalid <= '0';
            read_state  <= READ_DATA;
          else
            read_state <= CHECK_ADDR_RESP;
          end if;
          s_axil_valid <= '0';
        --Read data from the slave
        when READ_DATA =>
          s_axil_data <= axi_rdata;
          if (axi_rvalid = '1') then
            s_axil_valid <= '1';
            read_state   <= DONE;
          else
            s_axil_valid <= '0';
            read_state   <= READ_DATA;
          end if;
          axi_rready <= '1';
        --Indicate the transaction has completed
        when DONE =>
          axi_rready   <= '0';
          s_axil_data  <= (others => '0');
          s_axil_valid <= '0';
          read_state   <= IDLE;
        when others =>
          read_state <= START;
      end case;
    end if;
  end process;
end architecture;
To test the module, I again used a cocotb test bench. This test bench applies AXI Stream commands and models an AXI memory connected to the AXI Lite interface. Using cocotb, we can easily recreate a simple read and write access and check that the data is as expected. Using external AXI verification models also helps confirm that our AXI implementation conforms to the standard.
import cocotb
from cocotb.clock import Clock
from cocotb.triggers import Timer
from cocotbext.axi import (AxiStreamSource, AxiStreamBus, AxiStreamSink,
                           AxiLiteBus, AxiLiteRam)


async def reset_dut(reset_n, duration_ns):
    # The DUT reset is active low: hold it at 0, then release it
    reset_n.value = 0
    await Timer(duration_ns, units="ns")
    reset_n.value = 1
    reset_n._log.debug("Reset complete")


@cocotb.test()
async def run_test(dut):
    PERIOD = 10
    # Initialise the DUT inputs only; the outputs are driven by the DUT
    dut.clk.value = 0
    dut.reset.value = 0
    dut.axi_awready.value = 0
    dut.axi_wready.value = 0
    dut.axi_bresp.value = 0
    dut.axi_bvalid.value = 0
    dut.axi_arready.value = 0
    dut.axi_rdata.value = 0
    dut.axi_rresp.value = 0
    dut.axi_rvalid.value = 0
    dut.m_axis_tready.value = 0
    dut.s_axis_tdata.value = 0
    dut.s_axis_tvalid.value = 0

    cocotb.start_soon(Clock(dut.clk, PERIOD, units="ns").start())
    await reset_dut(dut.reset, 50)
    dut._log.debug("After reset")
    await Timer(20*PERIOD, units='ns')

    # reset_active_level=False because the DUT reset is active low
    axis_source = AxiStreamSource(
        AxiStreamBus.from_prefix(dut, "s_axis"), dut.clk, dut.reset,
        reset_active_level=False)
    axis_sink = AxiStreamSink(
        AxiStreamBus.from_prefix(dut, "m_axis"), dut.clk, dut.reset,
        reset_active_level=False)
    # AXI Lite memory model connected to the DUT's AXI Lite master port
    axil_ram = AxiLiteRam(
        AxiLiteBus.from_prefix(dut, "axi"), dut.clk, dut.reset,
        reset_active_level=False, size=2**16)

    # Write command: op code, address 0x00000004, length 1, four data bytes
    data = [0x09, 0x00, 0x00, 0x00, 0x04, 0x01, 0x55, 0xaa, 0x12, 0x34]
    await axis_source.send(data)
    await axis_source.wait()
    await Timer(20*PERIOD, units='ns')
    dut._log.info("Written word %x" % axil_ram.read_dword(0x0004))

    # Preload address 0 directly in the memory model, then read it via the DUT
    axil_ram.write_dword(0x0000, 0x98765432)
    data = axil_ram.read_dword(0x0000)
    dut._log.info("Mem Data %x" % data)

    # Read command: op code, address 0x00000000, length 1
    data = [0x05, 0x00, 0x00, 0x00, 0x00, 0x01]
    await axis_source.send(data)
    await axis_source.wait()
    rx = await axis_sink.recv()
    dut._log.info("Read response %r" % rx)
    await Timer(20*PERIOD, units='ns')
The simulation results show that the read and write accesses behave as expected and that the protocol block works as intended.
This block can be integrated with the UART block to create a simple system which enables us to access the AXI network.
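To give a feel for how host software might talk to such a system, here is a hedged Python sketch. Everything in it is an assumption for illustration: `AxiDebugClient` and `FakeTarget` are hypothetical names, and `port` is any object offering `write()` and `read(n)`, which is the shape of interface that pyserial's `serial.Serial` provides. `FakeTarget` stands in for the UART and protocol block so the sketch can run without hardware.

```python
import struct


class AxiDebugClient:
    """Hypothetical host-side helper for single 32-bit AXI Lite accesses."""

    def __init__(self, port):
        self.port = port  # any object with write(bytes) and read(n)

    def write32(self, address: int, data: int) -> None:
        # op code 0x09, address and data most significant byte first, length 1
        self.port.write(struct.pack(">BIBI", 0x09, address, 1, data))

    def read32(self, address: int) -> int:
        # op code 0x05, address most significant byte first, length 1
        self.port.write(struct.pack(">BIB", 0x05, address, 1))
        raw = self.port.read(4)
        if len(raw) != 4:
            raise TimeoutError("incomplete read response")
        # read data comes back least significant byte first
        return int.from_bytes(raw, "little")


class FakeTarget:
    """Stand-in for the hardware, backed by a dictionary of registers."""

    def __init__(self):
        self.mem = {}
        self._tx = b""

    def write(self, frame: bytes) -> None:
        opcode, address, _length = struct.unpack(">BIB", frame[:6])
        if opcode == 0x09:
            self.mem[address] = int.from_bytes(frame[6:10], "big")
        elif opcode == 0x05:
            self._tx += self.mem.get(address, 0).to_bytes(4, "little")

    def read(self, n: int) -> bytes:
        out, self._tx = self._tx[:n], self._tx[n:]
        return out


if __name__ == "__main__":
    client = AxiDebugClient(FakeTarget())
    client.write32(0x00000004, 0x55AA1234)
    print(hex(client.read32(0x00000004)))
```

Keeping the transport behind a small `write()` / `read()` interface means the same client code could be pointed at a real serial port or at a software model.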
As I mentioned in the introduction, this can be expanded to support a more flexible solution including AXI4. We do have a more complex implementation of this which will be open sourced soon.
I will integrate this within a device in a future blog and show how we can access and debug our systems during integration and test.
Sponsored by AMD Xilinx