
MicroZed Chronicles: System Integration and Debugging

Updated: Jan 18

As we know, FPGA design and FPGA verification are complex businesses. Then, of course, comes integration, when the design is deployed and tested on the actual hardware. At this stage we often need to control elements of the design or understand what is happening when things are not going to plan.


A key element of FPGA development is the use of a standard internal bus such as AXI, AHB or Wishbone. Typically, we tend to use either AXI or AHB, and using a standard interface means we can leverage existing IP to reduce development time and cost. One common approach is to implement an external UART, which is easily accessed (it needs only three pins) and provides access to the internal AXI network. We do this even in SoC-based designs or those that have a softcore processor because it gives access to the network without modifying the software applications.


Doing this is very simple. We create a UART with AXI Stream (AXIS) interfaces for TX and RX, and a protocol block that wraps around the UART. This protocol block then converts the received bytes into an AXI / APB transaction.

The beauty of this approach is that the UART can be easily swapped out with other interfaces. For example, an SPI module can provide simple access to the FPGA internal network if an external processor is used.


The protocol we use internally defines an opcode byte (read, write, burst, etc.), four address bytes, a length byte to support burst accesses, and the data payload to be written or read.
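
As a rough host-side illustration of this frame layout, the sketch below packs and unpacks a command. The opcode values, the big-endian address order, the choice to count the length in bytes, and the function names are all assumptions made for the example; the actual encoding is not specified here.

```python
import struct

# Hypothetical opcode values, chosen only for this example
OP_READ = 0x01
OP_WRITE = 0x02

def build_frame(opcode, address, length, payload=b""):
    """Pack opcode (1 byte), address (4 bytes), length (1 byte), then the payload."""
    if not 0 <= length <= 0xFF:
        raise ValueError("length must fit in one byte")
    # ">BIB": big-endian (assumed), unsigned byte, unsigned 32-bit word, unsigned byte
    return struct.pack(">BIB", opcode, address, length) + bytes(payload)

def parse_frame(frame):
    """Split a frame back into (opcode, address, length, payload)."""
    opcode, address, length = struct.unpack(">BIB", frame[:6])
    return opcode, address, length, frame[6:]

# A four-byte write and a four-byte read request to the same (made-up) address
write_cmd = build_frame(OP_WRITE, 0x40000010, 4, bytes([0xDE, 0xAD, 0xBE, 0xEF]))
read_cmd = build_frame(OP_READ, 0x40000010, 4)
```

With a fixed six-byte header, the protocol block can count bytes to know when the address and length are complete before it starts the bus transaction.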



Over the next few weeks, we will look at how we can do this starting with the creation of the UART. This UART is simple and must be able to send and receive data independently in addition to interfacing with the higher-level protocol block using AXI Stream.

To implement this UART, I will use two state machines. The first one transmits information received from the slave AXIS interface on the UART TX and the second receives UART data and outputs it over the master AXIS interface.
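
To make the transmit side concrete, here is a small Python model of the bit sequence the TX pin carries for one byte. It follows how the listing below loads tx_reg: a start bit, eight data bits sent LSB first, an inverted XOR parity bit (odd parity), and two stop bits. The function names are illustrative only.

```python
def odd_parity_bit(byte):
    """Bit that makes the total number of ones (data plus parity) odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 else 1

def tx_frame_bits(byte):
    """Bits in the order they leave the TX pin."""
    data = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data + [odd_parity_bit(byte)] + [1, 1]

bits = tx_frame_bits(0xAA)  # 12 bits: start, 8 data, parity, 2 stop
```

Each byte therefore occupies twelve bit periods on the wire, which matches the 12-bit width of tx_reg and tmr_reg.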


On transmission, the timing is controlled by a baud rate generator which is reset at the start of each transmission. On reception, the half and full bit periods are timed from the detection of the start bit. Since the input is asynchronous to the system clock, metastability registers are provided. The incoming bits are sampled at the midpoint of the nominal bit period.
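
For the default generics used in the listing below (100 MHz clock, 115200 baud), the counter values work out as follows. This is only a worked calculation in Python; the variable names mirror the VHDL constant but are otherwise arbitrary.

```python
CLK_FREQ = 100_000_000
BAUD_RATE = 115_200

divider = CLK_FREQ // BAUD_RATE   # whole clocks per bit (integer division, as in VHDL)
bit_period = divider - 1          # terminal count of the bit counter
half_period = bit_period // 2     # count at which the RX samples mid-bit

# Baud rate actually produced by the truncated divider, and its error
actual_baud = CLK_FREQ / divider
error_pct = (actual_baud - BAUD_RATE) / BAUD_RATE * 100
```

Here divider is 868, bit_period is 867 and half_period is 433. The truncation error is well below 0.01 %, far inside what a UART link tolerates.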


A package is used to calculate the required counter vector length from the clock frequency and requested baud rate. The same package also provides the parity function used on reception and transmission.
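
The two package functions are easy to cross-check outside the simulator. The sketch below is a Python mirror of them, written for illustration; it uses math.log2 in place of the LOG(div)/LOG(2.0) expression in the VHDL.

```python
import math

def vector_size(clk_freq, baud_rate):
    """Index of the counter's top bit, so the vector is (result downto 0)."""
    div = clk_freq / baud_rate
    return math.ceil(math.log2(div)) - 1

def parity(bits):
    """XOR reduction: 1 when the number of ones is odd."""
    y = 0
    for b in bits:
        y ^= b
    return y

top = vector_size(100e6, 115200)   # top index for the default generics
counter_max = 2 ** (top + 1) - 1   # largest value the counter can hold
```

For the default generics the top index is 9, giving a 10-bit counter that can hold up to 1023, enough for the terminal count of 867.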




library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

use work.adiuvo_uart.all;

entity uart is generic (
  reset_level : std_logic := '0'; -- reset level which causes a reset
  clk_freq    : natural   := 100_000_000; -- oscillator frequency
  baud_rate   : natural   := 115200 -- baud rate 
);
port (
  --!System Inputs 
  clk   : in std_logic;
  reset : in std_logic;

  --!External Interfaces
  rx : in std_logic;
  tx  : out std_logic;

  --! Master AXIS Interface  
  m_axis_tready : in  std_logic;
  m_axis_tdata  : out std_logic_vector(7 downto 0);
  m_axis_tvalid : out std_logic;

  --! Slave AXIS Interface
  s_axis_tready : out  std_logic;
  s_axis_tdata  : in std_logic_vector(7 downto 0);
  s_axis_tvalid : in std_logic
  
  );
  
end entity;
architecture rtl of uart is

  constant bit_period : integer := (clk_freq/baud_rate) - 1;
  type cntrl_fsm is (idle, set_tx,wait_tx);
  type rx_fsm is (idle, start, sample, check, wait_axis);

  signal current_state : cntrl_fsm; --:= idle;
  signal rx_state      : rx_fsm;-- := idle;
  signal baud_counter  : unsigned(vector_size(real(clk_freq), 
                         real(baud_rate)) downto 0) := (others => '0'); 
  signal baud_en       : std_logic := '0';
  signal meta_reg      : std_logic_vector(3 downto 0)                                     
                         := (others => '0'); -- fe detection too
  signal capture       : std_logic_vector(8 downto 0)                                    
                         := (others => '0'); -- data and parity
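  -- range follows the generics; a fixed 0 to 1023 overflows once clk_freq/baud_rate exceeds 1023
  signal bit_count     : integer range 0 to bit_period + 1 := 0;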
  signal pos_count     : integer range 0 to 15 := 0;
  signal running       : std_logic := '0';
  signal load_tx       : std_logic := '0';
  signal complete      : std_logic := '0';

  signal tx_reg  : std_logic_vector(11 downto 0) := (others => '0');
  signal tmr_reg : std_logic_vector(11 downto 0) := (others => '0');
  signal payload : std_logic_vector(7 downto 0)  := (others => '0');
  constant zero  : std_logic_vector(tmr_reg'range) := (others => '0');
begin

  process (reset, clk)
  --! TX control: accept one byte from the slave AXIS interface, then wait for the frame to complete
  begin
    if reset = reset_level then
      current_state <= idle;
      payload       <= (others => '0');
      load_tx <= '0';
    elsif rising_edge(clk) then
      load_tx <= '0';
      case current_state is
        when idle =>
          if s_axis_tvalid = '1' then
            current_state <= set_tx;
            load_tx       <= '1';
            payload       <= s_axis_tdata;
          end if;
        when set_tx =>
          current_state <= wait_tx;
        when wait_tx =>
          if complete = '1' then
            current_state <= idle;
          end if;
        when others => 
         current_state <= idle;
      end case;
    end if;
  end process;

  s_axis_tready <= '1' when (current_state = idle) else '0';

  process (reset, clk)
  --! baud counter for output TX 
  begin
    if reset = reset_level then
      baud_counter <= (others => '0');
      baud_en      <= '0';
    elsif rising_edge(clk) then
      baud_en <= '0';
      if (load_tx = '1') then
        baud_counter <= (others => '0');
      elsif (baud_counter = bit_period) then
        baud_en      <= '1';
        baud_counter <= (others => '0');
      else
        baud_counter <= baud_counter + 1;
      end if;
    end if;
  end process;

  process (reset, clk)
  --!metastability protection rx signal
  begin
    if reset = reset_level then
      meta_reg <= (others => '1');
    elsif rising_edge(clk) then
      meta_reg <= meta_reg(meta_reg'high - 1 downto meta_reg'low) & rx;
    end if;
  end process;

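  process (reset, clk)
  --! RX state machine: time the start bit, sample 9 bits mid-period, check parity, hand off on master AXIS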
  begin
    if reset = reset_level then
      pos_count <= 0;
      bit_count <= 0;
      capture     <= (others => '0');
      rx_state    <= idle;
      m_axis_tvalid <= '0';
      m_axis_tdata     <= (others => '0');
      
    elsif rising_edge(clk) then
      case rx_state is
        when idle =>
          m_axis_tvalid  <= '0';
          if meta_reg(meta_reg'high downto meta_reg'high - 1) = fe_det 
          then 
            pos_count <= 0;
            bit_count <= 0;
            capture  <= (others => '0');
            rx_state <= start;
          end if;
        when start =>
          if bit_count = bit_period then
            bit_count <= 0;
            rx_state  <= sample;
          else
            bit_count <= bit_count + 1;
          end if;
        when sample =>
          bit_count <= bit_count + 1;
          rx_state  <= sample;
          if bit_count = (bit_period/2) and (pos_count < 9) then 
            capture <= meta_reg(meta_reg'high) 
                       & capture(capture'high downto capture'low + 1);
          elsif bit_count = bit_period then
            if pos_count = 9 then 
              rx_state <= check;
            else
              pos_count <= pos_count + 1;
              bit_count <= 0;
            end if;
          end if;
        when check =>
          if parity(capture) = '1' then
            m_axis_tvalid <= '1';
            m_axis_tdata  <= capture(7 downto 0);
            rx_state      <= wait_axis;
          else
            rx_state    <= idle;
          end if;
        when wait_axis =>
          if m_axis_tready = '1' then 
            m_axis_tvalid <= '0';
            rx_state      <= idle;    
          end if;
      end case;
    end if;
  end process;

  op_uart : process (reset, clk)
  begin
    if reset = reset_level then
      tx_reg  <= (others => '1');
      tmr_reg <= (others => '0');
    elsif rising_edge(clk) then
      if load_tx = '1' then
        tx_reg  <= stop_bit & not(parity(payload)) & payload & start_bit ;
        tmr_reg <= (others => '1');
      elsif baud_en = '1' then
        tx_reg  <= '1' & tx_reg(tx_reg'high downto tx_reg'low + 1);
        tmr_reg <= tmr_reg(tmr_reg'high - 1 downto tmr_reg'low) & '0';
      end if;
    end if;
  end process;

  tx       <= tx_reg(tx_reg'low);
  complete <= '1' when (tmr_reg = zero and current_state = wait_tx) else   
              '0';
end architecture;

UART Package


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;


package adiuvo_uart is 

    function vector_size(clk_freq, baud_rate : real) return integer;
    function parity (a : std_logic_vector) return std_logic;
    constant fe_det     : std_logic_vector(1 downto 0) := "10";
    constant start_bit  : std_logic                    := '0';
    constant stop_bit   : std_logic_vector             := "11";
end package;

package body adiuvo_uart is 

    function vector_size(clk_freq, baud_rate : real) return integer is
        variable div                             : real;
        variable res                             : real;
      begin
        div := (clk_freq/baud_rate);
        res := CEIL(LOG(div)/LOG(2.0));
        return integer(res - 1.0);
      end;
    
      function parity (a : std_logic_vector) return std_logic is
        variable y         : std_logic := '0';
      begin
        for i in a'range loop
          y := y xor a(i);
        end loop;
        return y;
      end parity;
    

      
end package body adiuvo_uart;

To simulate this, I used cocotb, which allows AXIS sources and sinks to be coupled with the UART. I used the excellent cocotb extensions provided by Alex Forencich, which enable quick simulation of transmission and reception.


import cocotb
from cocotb.clock import Clock
from cocotb.triggers import Timer
from cocotbext.axi import AxiStreamSource, AxiStreamBus, AxiStreamSink
from cocotbext.uart import UartSource, UartSink


async def reset_dut(reset, duration_ns):
    # The DUT reset is active low (reset_level defaults to '0')
    reset.value = 0
    await Timer(duration_ns, units="ns")
    reset.value = 1
    reset._log.debug("Reset complete")


@cocotb.test()
async def run_test(dut):
    PERIOD = 10  # ns, i.e. 100 MHz to match the clk_freq generic
    cocotb.start_soon(Clock(dut.clk, PERIOD, units="ns").start())

    await reset_dut(dut.reset, 50)
    dut._log.debug("After reset")
    await Timer(20*PERIOD, units='ns')

    # dut.reset is active low, so the AXIS models must not treat '1' as reset
    axis_source = AxiStreamSource(AxiStreamBus.from_prefix(dut, "s_axis"), dut.clk,
                                  dut.reset, reset_active_level=False)
    axis_sink = AxiStreamSink(AxiStreamBus.from_prefix(dut, "m_axis"), dut.clk,
                              dut.reset, reset_active_level=False)
    uart_source = UartSource(dut.rx, baud=115200, bits=8)
    uart_sink = UartSink(dut.tx, baud=115200, bits=8)

    # AXIS -> UART: send two bytes and read each one back from the TX pin
    data = [0xaa, 0x55]
    await axis_source.send(data)

    first_rx = await uart_sink.read()
    second_rx = await uart_sink.read()
    dut._log.info("UART TX bytes: %s %s", first_rx, second_rx)

    # UART -> AXIS: send one byte into the RX pin and collect it from the master AXIS
    data_uart = [0x12]
    await uart_source.send(data_uart)
    data_axis = await axis_sink.recv()
    dut._log.info("AXIS frame: %s", data_axis)

   

When simulated, we can see the expected behavior working with the AXIS and UART interfaces. Now we can connect this with the higher-level protocol block to be able to access memories and peripherals.
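
One detail is worth keeping in mind when choosing test data. The receiver samples nine bits and only forwards the byte when their XOR is '1'. If the UART source in the testbench sends 8 data bits followed directly by a stop bit (an assumption here, since only bits=8 is configured and no parity option is set), the ninth sample is the stop bit, '1'. The small model below, with an illustrative function name, shows which bytes the check state would accept in that case.

```python
def rx_accepts(data_byte, ninth_bit):
    """Mirror of the check state: XOR over 8 data bits plus the ninth sample must be 1."""
    ones = bin(data_byte & 0xFF).count("1") + (ninth_bit & 1)
    return ones % 2 == 1

# Assumption: no parity bit is sent, so the ninth sample is the stop bit ('1')
passes = rx_accepts(0x12, 1)    # 0x12 has two ones, plus the stop bit: odd, accepted
dropped = rx_accepts(0x13, 1)   # 0x13 has three ones, plus the stop bit: even, dropped
```

Under that assumption only bytes with an even number of ones reach the master AXIS interface, which is worth remembering if a receive test appears to hang on other values.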


We will look at that module and its implementation in the FPGA in a blog soon!



Sponsored by AMD Xilinx
