
MicroZed Chronicles: Zynq MPSoC Packet Processing

  • Feb 25

The Gigabit Ethernet MACs (GEMs) within the processing system (PS) of the MPSoC are very useful for many applications. We can use them to provide simple networking and communication, or as part of a larger application. Recently, I was wondering how much throughput they can actually achieve.



For this application, I decided to set up a design which uses the GEM and the PS to parse packets as they are received and route the packets into one of two BRAMs within the PL.

The GEM is capable of operating at up to 1000 Mbps line rate. However, there are overheads associated with Ethernet headers, Frame Check Sequence (FCS), preamble, and interframe gaps. In practice, we should be able to achieve somewhere in the region of 900–950 Mbps, though this will depend on both the host and the target board.
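As a rough check on that range, the per-frame overhead can be worked through directly. The sketch below assumes standard Ethernet II framing carrying IPv4 and UDP with no VLAN tag and no IP options; the helper name udp_goodput_mbps is made up for illustration and is not part of the project code.

```c
#include <stdint.h>

/* Per-frame overhead in bytes for UDP over IPv4 over Ethernet II (no VLAN tag). */
enum {
    PREAMBLE_SFD = 8,   /* 7-byte preamble plus 1-byte start-of-frame delimiter */
    ETH_HEADER   = 14,  /* destination MAC, source MAC, EtherType */
    IPV4_HEADER  = 20,  /* assumes no IP options */
    UDP_HEADER   = 8,
    ETH_FCS      = 4,
    INTERFRAME   = 12   /* minimum interframe gap */
};

/* Best-case UDP payload throughput in Mbps for one payload size (hypothetical helper). */
double udp_goodput_mbps(uint32_t payload_bytes, double line_rate_mbps)
{
    uint32_t on_wire = payload_bytes + PREAMBLE_SFD + ETH_HEADER + IPV4_HEADER
                     + UDP_HEADER + ETH_FCS + INTERFRAME;
    return line_rate_mbps * (double)payload_bytes / (double)on_wire;
}
```

Under these assumptions, a 1472-byte payload (the largest that fits a 1500-byte MTU) works out to about 957 Mbps at a 1000 Mbps line rate, while a 64-byte payload gives only about 492 Mbps. The calculation ignores the padding Ethernet applies to very short frames.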

For this application, I am using the ZUBoard from Avnet / Tria. The first thing we need to do is create a simple design in Vivado.


This Vivado design configures the PS for the ZUBoard and adds two AXI BRAM controllers connected to the PS AXI interface. Each BRAM controller connects to a target BRAM. To ensure maximum performance, the AXI bus width of the AXI network in the PL matches the FPD HP0 AXI interface at 128 bits, while the clock frequency is set to 300 MHz.



SW Architecture

On the software side, we are running bare-metal with the lightweight IP (lwIP) stack to receive and process UDP packets.


Each packet sent over UDP starts with a 32-byte header, which includes:

  • A packet identifier

  • Sequence number

  • BRAM destination identifier

  • Timestamps

  • Number of bytes
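One way to picture such a header is as a plain C struct. The layout below is a hypothetical arrangement that fits the listed fields into 32 bytes; the field names, widths, ordering, and the PKT_MAGIC value are all assumptions, and the real definition is the one in the GitHub repository.

```c
#include <stdint.h>
#include <string.h>

#define PKT_MAGIC 0x4D5A4348u  /* assumed identifier value, for illustration only */

/* Hypothetical 32-byte header; every member is naturally aligned, so there is no padding. */
typedef struct {
    uint32_t magic;        /* packet identifier */
    uint16_t hdr_len;      /* header size in bytes */
    uint16_t route;        /* BRAM destination identifier: 0 or 1 */
    uint32_t seq;          /* sequence number */
    uint32_t payload_len;  /* number of payload bytes that follow */
    uint64_t t_tx;         /* sender timestamp */
    uint64_t t_aux;        /* second timestamp */
} pkt_hdr_t;

_Static_assert(sizeof(pkt_hdr_t) == 32, "header must be exactly 32 bytes");

/* Copy the header out of a received datagram; returns 0 on success, -1 if it is too short. */
int pkt_hdr_read(const uint8_t *buf, size_t len, pkt_hdr_t *out)
{
    if (len < sizeof(*out)) {
        return -1;
    }
    /* memcpy avoids unaligned accesses; this assumes sender and target share a byte order. */
    memcpy(out, buf, sizeof(*out));
    return 0;
}
```

Copying into a local struct, rather than casting the receive buffer pointer, keeps the code safe when the UDP payload is not 8-byte aligned in memory.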


The application software parses this header to determine whether the packet is one it should process. Once all bytes have been received, the payload is routed to the BRAM selected by the destination identifier.


The application software also determines the number of bits received per second to calculate throughput. This information is reported over a UART connection to a terminal.


lwIP Modifications

In Vitis, we need to include the lwIP library when building the platform. Because we are using the ZUBoard for this example, we also need to modify the lwIP file xemacpsif_physpeed.c.


In this file, we must add support for the Microchip PHY used on the ZUBoard and update get_IEEE_Phy_speed() to correctly detect the Microchip PHY. I have included the updated file in my GitHub repository for reference.
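A PHY is normally recognised from its ID registers, and the negotiated speed can be worked out from the standard IEEE 802.3 Clause 22 registers. The sketch below shows that logic as pure functions so it can be read without the driver sources. It is not the patch from the repository: the function names are made up for illustration, and the identifier 0x0022 (the ID1 value used by Micrel/Microchip KSZ-family PHYs) is an assumption about the part fitted to the board.

```c
#include <stdint.h>

#define PHY_MICROCHIP_ID1 0x0022u  /* assumed: ID1 register value for Micrel/Microchip KSZ PHYs */

/* Returns 1 when PHY register 2 (ID1) matches the assumed Microchip identifier. */
int phy_is_microchip(uint16_t id1)
{
    return id1 == PHY_MICROCHIP_ID1;
}

/*
 * Resolve the link speed in Mbps from auto-negotiation results.
 *   adv, lpa           : registers 4 and 5 (local advertisement, link partner ability)
 *   ctrl1000, stat1000 : registers 9 and 10 (1000BASE-T control and status)
 */
uint32_t phy_resolve_speed(uint16_t adv, uint16_t lpa, uint16_t ctrl1000, uint16_t stat1000)
{
    /* Status bits 11:10 hold the partner's 1000BASE-T abilities;
       shifting right by two lines them up with control bits 9:8. */
    uint16_t common1000 = ctrl1000 & (uint16_t)(stat1000 >> 2);
    uint16_t common     = adv & lpa;

    if (common1000 & 0x0300u) return 1000;  /* 1000BASE-T full or half duplex */
    if (common & 0x0180u)     return 100;   /* 100BASE-TX full or half duplex */
    if (common & 0x0060u)     return 10;    /* 10BASE-T full or half duplex */
    return 0;                               /* no common mode negotiated */
}
```

In a real driver the four register values would come from MDIO reads of the PHY, and vendor-specific status registers may offer a more direct way to read the resolved speed.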



Application Overview

Once this is complete, we can create a simple application that performs packet parsing. The full code is available on my GitHub; however, an overview of the key functions is provided below:


cycles_to_ns(uint64_t cyc): Converts a cycle count into nanoseconds using the configured CPU counter frequency (g_cntfrq). Returns 0 if the frequency is not set.


router_set_cntfrq(uint64_t cntfrq_hz): Sets the global counter frequency used for timing conversions (cycles to nanoseconds). Ignores a zero input to avoid invalid timing calculations.
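The two timing helpers could look something like the sketch below. The names and signatures follow the descriptions above, but the bodies are my own assumption of how they might be written, including the choice to split the conversion into whole seconds and a remainder.

```c
#include <stdint.h>

static uint64_t g_cntfrq;  /* counter frequency in Hz; 0 means "not configured" */

/* Store the counter frequency, ignoring zero so a bad value cannot cause a divide-by-zero later. */
void router_set_cntfrq(uint64_t cntfrq_hz)
{
    if (cntfrq_hz != 0) {
        g_cntfrq = cntfrq_hz;
    }
}

/* Convert counter ticks to nanoseconds; returns 0 while the frequency is unset. */
uint64_t cycles_to_ns(uint64_t cyc)
{
    if (g_cntfrq == 0) {
        return 0;
    }
    /* Split into whole seconds and a remainder so that cyc * 1e9 cannot overflow on long intervals. */
    uint64_t secs = cyc / g_cntfrq;
    uint64_t rem  = cyc % g_cntfrq;
    return secs * 1000000000ull + (rem * 1000000000ull) / g_cntfrq;
}
```

With a 100 MHz counter, one tick converts to 10 ns. The remainder term stays within 64 bits for any counter frequency below roughly 18 GHz.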


bram_probe(uintptr_t base, const char *name): Sanity-checks that a BRAM address is writable and readable by writing two test patterns and reading them back. Prints “OK” or “FAIL” and returns 0 or -1 accordingly.
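A minimal sketch of such a probe is shown below. The two pattern values and the exact message format are assumptions; only the signature and the OK/FAIL behaviour come from the description above.

```c
#include <stdint.h>
#include <stdio.h>

/* Write two complementary patterns to the first word at `base` and read each one back. */
int bram_probe(uintptr_t base, const char *name)
{
    static const uint32_t patterns[2] = { 0xA5A5A5A5u, 0x5A5A5A5Au };  /* assumed test values */
    volatile uint32_t *word = (volatile uint32_t *)base;  /* volatile: each access must reach memory */

    for (int i = 0; i < 2; i++) {
        *word = patterns[i];
        if (*word != patterns[i]) {
            printf("%s: FAIL\n", name);
            return -1;
        }
    }
    printf("%s: OK\n", name);
    return 0;
}
```

Complementary patterns toggle every data bit in both directions, which catches stuck bits that a single pattern would miss. On the target, the check is only meaningful if the BRAM address range is not cached, otherwise the read-back may be served from the cache rather than from the BRAM.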


bram_write_bytes(...): Copies an arbitrary number of bytes into BRAM at the current write pointer, handling unaligned starts and partial trailing words. Enforces a per-BRAM size limit and clamps writes to prevent overruns.
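To illustrate the unaligned-start and trailing-word handling, here is a small sketch that uses only 32-bit accesses. It is not the repository's bram_write_bytes: the bram_win_t structure, the function name bram_write_sketch, and its parameters are hypothetical, and little-endian byte lanes are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-BRAM state; the real code may track this differently. */
typedef struct {
    volatile uint32_t *base;  /* word-aligned start of the BRAM */
    size_t size;              /* usable size in bytes */
    size_t wr;                /* current write offset in bytes, never above size */
} bram_win_t;

/* Append up to len bytes using 32-bit accesses only; returns the number of bytes stored. */
size_t bram_write_sketch(bram_win_t *b, const uint8_t *src, size_t len)
{
    size_t room = b->size - b->wr;
    if (len > room) {
        len = room;  /* clamp so the write never runs past the end of the BRAM */
    }

    size_t done = 0;
    while (done < len) {
        size_t word  = b->wr / 4;  /* index of the 32-bit word being written */
        size_t lane  = b->wr % 4;  /* first byte lane within that word */
        size_t chunk = 4 - lane;   /* bytes left in this word */
        if (chunk > len - done) {
            chunk = len - done;
        }

        /* Full words are overwritten; partial words are read first so the other lanes survive. */
        uint32_t v = (chunk == 4) ? 0u : b->base[word];
        for (size_t i = 0; i < chunk; i++) {
            unsigned shift = 8u * (unsigned)(lane + i);  /* little-endian byte lanes */
            v = (v & ~((uint32_t)0xFFu << shift)) | ((uint32_t)src[done + i] << shift);
        }
        b->base[word] = v;

        b->wr += chunk;
        done  += chunk;
    }
    return len;
}
```

Returning the number of bytes actually stored lets the caller detect when a write was clamped because the BRAM is full.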


router_init(void): Prints BRAM base, high address, and size information; probes both BRAMs; resets write pointers; and clears the 1-second statistics timing baseline.


router_handle_payload(...): Updates RX, throughput, and processing-time statistics. Validates the packet header (length, magic value, header size, route), clamps payload length, and routes payload bytes to BRAM0 or BRAM1 based on h.route, while counting per-route totals. Invalid packets increment drop counters.
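The validation and routing decision can be sketched in isolation as follows. This is an illustration of the checks listed above, not the repository code: the types, the function name classify_packet, and the PKT_MAGIC value are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_BYTES 32u          /* 32-byte header, as described in the article */
#define PKT_MAGIC 0x4D5A4348u  /* assumed identifier value, for illustration only */

/* Only the header fields needed for validation; names are hypothetical. */
typedef struct {
    uint32_t magic;
    uint16_t hdr_len;
    uint16_t route;
    uint32_t payload_len;
} hdr_fields_t;

typedef struct {
    uint64_t routed[2];  /* packets accepted per BRAM */
    uint64_t drops;      /* packets rejected by any check */
} route_stats_t;

/*
 * Validate a datagram of dgram_len bytes and pick a BRAM.
 * Returns 0 or 1 for the target BRAM, or -1 if the packet is dropped.
 * *copy_len receives the payload length clamped to the bytes actually received.
 */
int classify_packet(const hdr_fields_t *h, size_t dgram_len, route_stats_t *st, size_t *copy_len)
{
    if (dgram_len < HDR_BYTES || h->magic != PKT_MAGIC ||
        h->hdr_len != HDR_BYTES || h->route > 1) {
        st->drops++;
        return -1;
    }

    size_t avail = dgram_len - HDR_BYTES;
    /* Never trust the declared length beyond what was actually received. */
    *copy_len = (h->payload_len < avail) ? h->payload_len : avail;
    st->routed[h->route]++;
    return (int)h->route;
}
```

Checking the datagram length first means a truncated packet is dropped before any header field is trusted.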


router_tick_1s(uint64_t now_ms): Every ~1000 ms, prints packet rate, bytes per second, Mbps, average packet length, drops, per-route counts, and processing min/avg/max in cycles and nanoseconds. It then resets all 1-second counters so the next interval starts fresh.
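The rate arithmetic behind such a report is straightforward, and a sketch of it is given below. The structures and the helper name take_rates are hypothetical; the real function also tracks drops, per-route counts, and processing times, which are left out here.

```c
#include <stdint.h>

/* Hypothetical counters accumulated over one reporting interval. */
typedef struct {
    uint64_t packets;
    uint64_t bytes;
} interval_counts_t;

typedef struct {
    double pps;      /* packets per second */
    double mbps;     /* megabits per second (10^6 bits) */
    double avg_len;  /* mean bytes per packet */
} interval_rates_t;

/* Turn raw counters into rates for an interval of elapsed_ms, then clear the counters. */
interval_rates_t take_rates(interval_counts_t *c, uint64_t elapsed_ms)
{
    interval_rates_t r = { 0.0, 0.0, 0.0 };

    if (elapsed_ms != 0) {
        double secs = (double)elapsed_ms / 1000.0;
        r.pps  = (double)c->packets / secs;
        r.mbps = (double)c->bytes * 8.0 / secs / 1e6;
    }
    if (c->packets != 0) {
        r.avg_len = (double)c->bytes / (double)c->packets;
    }

    /* Reset so the next interval starts from zero. */
    c->packets = 0;
    c->bytes   = 0;
    return r;
}
```

Dividing by the measured interval rather than assuming exactly one second keeps the figures accurate when the tick arrives a little late.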


Performance Testing

Running this on the board while directly connected to a host machine enables us to achieve the best performance. To maximise throughput, we need to use large packets, as smaller packets significantly reduce throughput due to overheads and interframe gaps.


My first attempt used Python to send packets to the ZUBoard. While this worked functionally, the throughput was low because Python is not optimal for high packet-rate transmission.


The measured performance was approximately 100 Mbps, which makes sense given that packet rates on Windows using socket.sendto() in Python are typically between 20k and 80k packets per second on most PCs.


To improve performance, I wrote a C application using Winsock to transmit packets. Using this approach, we were able to determine the maximum achievable packet throughput.

This application is also available on my GitHub, along with its executable.



When running the C-based sender, we achieved 923 Mbps, which is very close to the theoretical maximum once protocol overheads are taken into account.


Conclusion


This has been an interesting project, and I plan to return to it and use it as the basis for implementing packet inspection in RTL within the PL.


This will allow us to explore how highly optimised hardware designs compare with software-based packet handling in terms of throughput and determinism.


Sponsored by AMD


