As we know, FPGA design and verification are complex businesses. Then, of course, comes integration, when the actual hardware is deployed and tested. At this stage we often need to control elements of the design, or understand what is happening when things are not going to plan.
One of the key elements of developing an FPGA is the use of a standard internal bus such as AXI, AHB or Wishbone. We typically use either AXI or AHB, since a standard interface lets us reuse IP easily, which reduces development time and cost. One common approach is to implement an external UART, which is easy to access (it needs only three pins: TX, RX and ground) and provides access to the internal AXI network. We do this even in SoC-based designs or those that have a softcore processor, because it gives access to the network without modifying the software applications.
Doing this is very simple. We create a UART which has AXI Stream (AXIS) interfaces for TX and RX, and a protocol block that wraps around the UART. This protocol block then converts the received bytes into an AXI / APB transaction.
The beauty of this approach is that the UART can be easily swapped out with other interfaces. For example, an SPI module can provide simple access to the FPGA internal network if an external processor is used.
The protocol we use internally defines an opcode byte (read, write, burst, etc.), four address bytes, a length byte to support burst accesses, and the data payload to be written or read back.
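As a rough illustration of that frame layout, the sketch below packs a request on the host side in Python. The opcode values, the big-endian address order, the meaning of the length byte and the `build_frame` helper are all assumptions made for this example; the protocol details are not defined in this post.

```python
import struct

# Hypothetical opcode values, chosen only for this example.
OP_READ = 0x01
OP_WRITE = 0x02


def build_frame(opcode, address, length, payload=b""):
    """Pack opcode (1 byte), address (4 bytes), length (1 byte), then the payload."""
    if not 0 <= length <= 0xFF:
        raise ValueError("length must fit in one byte")
    # ">BIB" = big-endian, no padding: uint8, uint32, uint8 (byte order is an assumption)
    header = struct.pack(">BIB", opcode, address, length)
    return header + bytes(payload)


write_frame = build_frame(OP_WRITE, 0x4000_0000, 4, bytes([0xDE, 0xAD, 0xBE, 0xEF]))
read_frame = build_frame(OP_READ, 0x4000_0000, 4)
print(write_frame.hex(" "))
print(read_frame.hex(" "))
```

A write carries its data after the six header bytes, while a read request is just the header and the data comes back in the response.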
Over the next few weeks, we will look at how we can do this starting with the creation of the UART. This UART is simple and must be able to send and receive data independently in addition to interfacing with the higher-level protocol block using AXI Stream.
To implement this UART, I will use two state machines. The first one transmits information received from the slave AXIS interface on the UART TX and the second receives UART data and outputs it over the master AXIS interface.
The transmit timing is controlled by a baud rate generator, which is reset at the start of each transmission. On reception, the bit and half-bit periods are counted from the detection of the start bit. Since the input is asynchronous to the system clock, metastability registers are provided, and the incoming bits are sampled at the midpoint of the nominal bit period.
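To make the timing concrete, here is a small Python sketch of the divider arithmetic for the default 100 MHz clock and 115200 baud. The `baud_timing` helper and the error calculation are my own additions for illustration; only the `bit_period` formula mirrors the constant used in the VHDL.

```python
def baud_timing(clk_freq, baud_rate):
    """Return (bit_period, half_period, actual_baud, error_percent)."""
    bit_period = clk_freq // baud_rate - 1   # terminal count of the bit counter
    half_period = bit_period // 2            # count at which the bit is sampled
    actual_baud = clk_freq / (bit_period + 1)
    error_percent = 100.0 * (actual_baud - baud_rate) / baud_rate
    return bit_period, half_period, actual_baud, error_percent


bit_period, half_period, actual_baud, error_percent = baud_timing(100_000_000, 115_200)
print(bit_period, half_period)  # 867 433
print(f"{actual_baud:.1f} baud, {error_percent:+.4f} % error")
```

Because the divider is an integer, the achieved baud rate differs slightly from the requested one. For these defaults the error is far below what a UART tolerates, but it is worth checking for low clock frequencies or high baud rates.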
A package calculates the vector length required for the given clock frequency and requested baud rate, and also provides the parity function used on reception and transmission.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;
use work.adiuvo_uart.all;

entity uart is
  generic (
    reset_level : std_logic := '0';         -- reset level which causes a reset
    clk_freq    : natural   := 100_000_000; -- oscillator frequency
    baud_rate   : natural   := 115200       -- baud rate
  );
  port (
    --! System Inputs
    clk   : in std_logic;
    reset : in std_logic;
    --! External Interfaces
    rx : in  std_logic;
    tx : out std_logic;
    --! Master AXIS Interface
    m_axis_tready : in  std_logic;
    m_axis_tdata  : out std_logic_vector(7 downto 0);
    m_axis_tvalid : out std_logic;
    --! Slave AXIS Interface
    s_axis_tready : out std_logic;
    s_axis_tdata  : in  std_logic_vector(7 downto 0);
    s_axis_tvalid : in  std_logic
  );
end entity;

architecture rtl of uart is
  constant bit_period : integer := (clk_freq/baud_rate) - 1;

  type cntrl_fsm is (idle, set_tx, wait_tx);
  type rx_fsm is (idle, start, sample, check, wait_axis);

  signal current_state : cntrl_fsm; --:= idle;
  signal rx_state      : rx_fsm;    --:= idle;
  signal baud_counter  : unsigned(vector_size(real(clk_freq),
                           real(baud_rate)) downto 0) := (others => '0');
  signal baud_en       : std_logic := '0';
  signal meta_reg      : std_logic_vector(3 downto 0)
                           := (others => '0'); -- fe detection too
  signal capture       : std_logic_vector(8 downto 0)
                           := (others => '0'); -- data and parity
  -- sized from bit_period (it reaches bit_period + 1 during the stop bit)
  -- so that other clk_freq / baud_rate settings also fit
  signal bit_count     : integer range 0 to bit_period + 1 := 0;
  signal pos_count     : integer range 0 to 15 := 0;
  signal running       : std_logic := '0';
  signal load_tx       : std_logic := '0';
  signal complete      : std_logic := '0';
  signal tx_reg        : std_logic_vector(11 downto 0) := (others => '0');
  signal tmr_reg       : std_logic_vector(11 downto 0) := (others => '0');
  signal payload       : std_logic_vector(7 downto 0) := (others => '0');
  constant zero        : std_logic_vector(tmr_reg'range) := (others => '0');
begin

  --! TX control: accept one byte from the slave AXIS interface and wait
  --! until it has been shifted out
  process (reset, clk)
  begin
    if reset = reset_level then
      current_state <= idle;
      payload       <= (others => '0');
      load_tx       <= '0';
    elsif rising_edge(clk) then
      load_tx <= '0';
      case current_state is
        when idle =>
          if s_axis_tvalid = '1' then
            current_state <= set_tx;
            load_tx       <= '1';
            payload       <= s_axis_tdata;
          end if;
        when set_tx =>
          current_state <= wait_tx;
        when wait_tx =>
          if complete = '1' then
            current_state <= idle;
          end if;
        when others =>
          current_state <= idle;
      end case;
    end if;
  end process;

  s_axis_tready <= '1' when (current_state = idle) else '0';

  process (reset, clk)
  --! baud counter for output TX
  begin
    if reset = reset_level then
      baud_counter <= (others => '0');
      baud_en      <= '0';
    elsif rising_edge(clk) then
      baud_en <= '0';
      if (load_tx = '1') then
        baud_counter <= (others => '0');
      elsif (baud_counter = bit_period) then
        baud_en      <= '1';
        baud_counter <= (others => '0');
      else
        baud_counter <= baud_counter + 1;
      end if;
    end if;
  end process;

  process (reset, clk)
  --! metastability protection rx signal
  begin
    if reset = reset_level then
      meta_reg <= (others => '1');
    elsif rising_edge(clk) then
      meta_reg <= meta_reg(meta_reg'high - 1 downto meta_reg'low) & rx;
    end if;
  end process;

  --! RX state machine: detect the start bit, sample each bit at its
  --! midpoint, check parity, then hand the byte to the master AXIS interface
  process (reset, clk)
  begin
    if reset = reset_level then
      pos_count     <= 0;
      bit_count     <= 0;
      capture       <= (others => '0');
      rx_state      <= idle;
      m_axis_tvalid <= '0';
      m_axis_tdata  <= (others => '0');
    elsif rising_edge(clk) then
      case rx_state is
        when idle =>
          m_axis_tvalid <= '0';
          if meta_reg(meta_reg'high downto meta_reg'high - 1) = fe_det then
            pos_count <= 0;
            bit_count <= 0;
            capture   <= (others => '0');
            rx_state  <= start;
          end if;
        when start =>
          if bit_count = bit_period then
            bit_count <= 0;
            rx_state  <= sample;
          else
            bit_count <= bit_count + 1;
          end if;
        when sample =>
          bit_count <= bit_count + 1;
          rx_state  <= sample;
          if bit_count = (bit_period/2) and (pos_count < 9) then
            capture <= meta_reg(meta_reg'high)
                       & capture(capture'high downto capture'low + 1);
          elsif bit_count = bit_period then
            if pos_count = 9 then
              rx_state <= check;
            else
              pos_count <= pos_count + 1;
              bit_count <= 0;
            end if;
          end if;
        when check =>
          if parity(capture) = '1' then
            m_axis_tvalid <= '1';
            m_axis_tdata  <= capture(7 downto 0);
            rx_state      <= wait_axis;
          else
            rx_state <= idle;
          end if;
        when wait_axis =>
          if m_axis_tready = '1' then
            m_axis_tvalid <= '0';
            rx_state      <= idle;
          end if;
      end case;
    end if;
  end process;

  --! TX shift register: the frame is shifted out LSB first, one bit per baud_en
  op_uart : process (reset, clk)
  begin
    if reset = reset_level then
      tx_reg  <= (others => '1');
      tmr_reg <= (others => '0');
    elsif rising_edge(clk) then
      if load_tx = '1' then
        tx_reg  <= stop_bit & not(parity(payload)) & payload & start_bit;
        tmr_reg <= (others => '1');
      elsif baud_en = '1' then
        tx_reg  <= '1' & tx_reg(tx_reg'high downto tx_reg'low + 1);
        tmr_reg <= tmr_reg(tmr_reg'high - 1 downto tmr_reg'low) & '0';
      end if;
    end if;
  end process;

  tx       <= tx_reg(tx_reg'low);
  complete <= '1' when (tmr_reg = zero and current_state = wait_tx) else '0';

end architecture;
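To check my reading of the framing, here is a small software model in Python. It builds the 12-bit frame in line order (start bit, eight data bits LSB first, odd parity, two stop bits) and then recovers the byte by sampling at mid-bit after the start-bit falling edge. This is only a behavioural sketch: the function names are my own, and it ignores the delay through the metastability registers.

```python
def odd_parity_bit(byte):
    """Bit that makes the number of ones in data + parity odd: not(xor of data bits)."""
    return 1 - (bin(byte & 0xFF).count("1") & 1)


def encode_frame(byte):
    """Line-order bits: start, d0..d7 (LSB first), odd parity, two stop bits."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [odd_parity_bit(byte)] + [1, 1]


def to_samples(bits, clocks_per_bit):
    """Hold each bit for clocks_per_bit clocks, with one idle-high bit time each side."""
    line = [1] * clocks_per_bit
    for bit in bits:
        line += [bit] * clocks_per_bit
    return line + [1] * clocks_per_bit


def decode_frame(samples, clocks_per_bit):
    """Find the start-bit falling edge, then sample nine bits at mid-bit."""
    edge = next(i for i in range(1, len(samples))
                if samples[i - 1] == 1 and samples[i] == 0)
    # skip the whole start bit, then move to the middle of data bit 0
    first_mid = edge + clocks_per_bit + clocks_per_bit // 2
    bits = [samples[first_mid + n * clocks_per_bit] for n in range(9)]
    parity_ok = sum(bits) % 2 == 1  # xor of data + parity must be 1
    value = sum(bit << i for i, bit in enumerate(bits[:8]))
    return value, parity_ok


CLOCKS_PER_BIT = 868  # 100 MHz / 115200 baud, rounded down

print(encode_frame(0xAA))  # [0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1]
for byte in (0xAA, 0x55, 0x12):
    value, parity_ok = decode_frame(to_samples(encode_frame(byte), CLOCKS_PER_BIT),
                                    CLOCKS_PER_BIT)
    print(f"0x{byte:02X} -> 0x{value:02X}, parity ok: {parity_ok}")
```

Note that 0xAA, 0x55 and 0x12 all contain an even number of ones, so their odd-parity bit is 1, which on the line looks like an extra stop bit.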
UART Package
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

package adiuvo_uart is
  function vector_size(clk_freq, baud_rate : real) return integer;
  function parity (a : std_logic_vector) return std_logic;
  constant fe_det    : std_logic_vector(1 downto 0) := "10";
  constant start_bit : std_logic := '0';
  constant stop_bit  : std_logic_vector := "11";
end package;

package body adiuvo_uart is

  -- returns the index of the MSB needed to count one bit period
  function vector_size(clk_freq, baud_rate : real) return integer is
    variable div : real;
    variable res : real;
  begin
    div := (clk_freq/baud_rate);
    res := CEIL(LOG(div)/LOG(2.0));
    return integer(res - 1.0);
  end;

  -- xor reduction: '1' when the vector contains an odd number of ones
  function parity (a : std_logic_vector) return std_logic is
    variable y : std_logic := '0';
  begin
    for i in a'range loop
      y := y xor a(i);
    end loop;
    return y;
  end parity;

end package body adiuvo_uart;
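As a quick sanity check of the two package functions, the Python sketch below mirrors them and confirms that the resulting counter is wide enough to hold the bit period. The Python versions are my own transliteration for illustration, not part of the design.

```python
import math


def vector_size(clk_freq, baud_rate):
    """Mirror of the VHDL function: index of the counter's most significant bit."""
    div = clk_freq / baud_rate
    return int(math.ceil(math.log(div) / math.log(2.0)) - 1.0)


def parity(bits):
    """XOR reduction: 1 when the sequence contains an odd number of ones."""
    y = 0
    for bit in bits:
        y ^= bit
    return y


msb = vector_size(100_000_000.0, 115_200.0)
width = msb + 1                      # unsigned(msb downto 0)
bit_period = 100_000_000 // 115_200 - 1
print(msb, width, bit_period)        # 9 10 867
print((1 << width) - 1 >= bit_period)
```

With the default generics the counter is 10 bits wide, so it can count to 1023, which comfortably covers the terminal count of 867.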
To simulate this, I used cocotb, which allows AXIS sources and sinks to be coupled with a UART source and sink. I used the excellent cocotb extensions provided by Alex Forencich, which enable quick simulation of transmission and reception.
import cocotb
from cocotb.clock import Clock
from cocotb.triggers import Timer
from cocotbext.axi import AxiStreamSource, AxiStreamBus, AxiStreamSink
from cocotbext.uart import UartSource, UartSink


async def reset_dut(reset, duration_ns):
    # reset_level defaults to '0', so the reset is active low
    reset.value = 0
    await Timer(duration_ns, units="ns")
    reset.value = 1
    reset._log.debug("Reset complete")


@cocotb.test()
async def run_test(dut):
    PERIOD = 10  # ns, i.e. 100 MHz to match the clk_freq generic
    cocotb.start_soon(Clock(dut.clk, PERIOD, units="ns").start())

    await reset_dut(dut.reset, 50)
    dut._log.debug("After reset")
    await Timer(20 * PERIOD, units="ns")

    # The DUT reset is active low, so the AXIS models must be told;
    # otherwise they treat reset = 1 as "held in reset".
    axis_source = AxiStreamSource(AxiStreamBus.from_prefix(dut, "s_axis"),
                                  dut.clk, dut.reset, reset_active_level=False)
    axis_sink = AxiStreamSink(AxiStreamBus.from_prefix(dut, "m_axis"),
                              dut.clk, dut.reset, reset_active_level=False)
    uart_source = UartSource(dut.rx, baud=115200, bits=8)
    uart_sink = UartSink(dut.tx, baud=115200, bits=8)

    # AXIS -> UART TX: send two bytes and read them back from the UART sink
    data = [0xaa, 0x55]
    await axis_source.send(data)
    data_rx = await uart_sink.read()
    dut._log.info("UART TX read: %s", data_rx.hex())
    data_rx = await uart_sink.read()
    dut._log.info("UART TX read: %s", data_rx.hex())

    # UART RX -> AXIS: send one byte and wait for it on the AXIS sink
    data_uart = [0x12]
    await uart_source.send(data_uart)
    data_axis = await axis_sink.recv()
    dut._log.info("AXIS received: %s", data_axis.tdata.hex())
When simulated, we can see the expected behavior working with the AXIS and UART interfaces. Now we can connect this with the higher-level protocol block to be able to access memories and peripherals.
We will look at that module and its implementation in the FPGA in a blog soon!
Sponsored by AMD Xilinx
Hello Mr. Taylor,
Thank you for this great blog. Is it possible to get the package "adiuvo_uart"?
Thank you in advance!
Faraj