MicroZed Chronicles: Multi-Gigabit Transceivers

Adam Taylor
Jun 2, 2021

Updated: Jun 27

One aspect of FPGA design that we haven’t really examined is multi-gigabit transceivers (MGTs). These transceivers are available in many Xilinx devices, including those in the 7 Series, UltraScale, UltraScale+ and Versal families.

I first worked with gigabit transceivers back in 2006 on an image processing application using Virtex-II Pro FPGAs. Incidentally it was also my first SoC design because we also used the PowerPC core. Back then, we thought 3.125 Gbps was lightning fast.

Of course, modern FPGAs provide us with a range of bandwidths across several types of gigabit transceiver.

Using these transceivers, we can implement high-bandwidth interfaces in FPGA designs such as USB3, DisplayPort, SATA, 100G Ethernet, and PCIe. We can also use them for high-bandwidth transfers between boards and devices if we need more than one device in the processing chain.

Working with these transceivers is straightforward. To start, I am going to demonstrate how we can connect two boards. The first is a KCU105, which has a Kintex UltraScale FPGA and GTH transceivers. The second is a ZCU102, which contains a Zynq UltraScale+ MPSoC and also GTH transceivers. Both boards make a single transceiver available via SMA connectors, which enables a straightforward interconnect using coax cables.

To keep things nice and simple for this blog, all references and explanations are in respect to the GTH transceivers.

Before we start looking at the design of the Vivado projects for the KCU105 and the ZCU102, I want to explain a little about the structure of multi-gigabit transceivers.

Each transceiver consists of a TX and an RX differential pair, and both the transmit and receive paths are split between PMA and PCS layers.

The PMA is the Physical Media Attachment layer. On the TX side, it performs parallel-to-serial conversion along with pre- and post-emphasis. On the RX side, it performs equalization and serial-to-parallel conversion.

The PCS is the Physical Coding Sublayer, which performs the protocol encoding and decoding and is also responsible for alignment. The inner workings of the gigabit transceivers are beyond the scope of a single blog, but more detailed information can be found here.
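To make the parallel-to-serial and serial-to-parallel steps concrete, here is a minimal behavioural sketch in Python. It is only a software model of the idea, not how the GTH hardware is configured. The function names, the 16-bit word width, and the least-significant-bit-first ordering are all assumptions chosen for illustration.

```python
def serialize(words, width=16):
    """Flatten parallel words into a list of bits, LSB first (assumed ordering)."""
    bits = []
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError(f"word {word:#x} does not fit in {width} bits")
        for i in range(width):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits, width=16):
    """Rebuild parallel words from a bit list produced by serialize()."""
    if len(bits) % width:
        raise ValueError("bit stream is not a whole number of words")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for i, bit in enumerate(bits[start:start + width]):
            word |= bit << i
        words.append(word)
    return words


if __name__ == "__main__":
    tx_words = [0xABCD, 0x1234]
    line_bits = serialize(tx_words)    # what the TX side would shift out
    rx_words = deserialize(line_bits)  # what the RX side would recover
    print([hex(w) for w in rx_words])
```

In the real transceiver the receiver also has to find the word boundary in the incoming bit stream, which is the alignment job handled by the PCS.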

One of the simplest ways to connect boards or devices is to leverage the Aurora protocol. Aurora enables the establishment of a link and the transfer of data between devices using a simple protocol. While simple to implement, Aurora provides the designer with a range of capabilities, from bonding multiple transceivers together to transfer more data, to creating simplex (TX-only or RX-only) links if desired.
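As a rough illustration of what lane bonding buys us, the sketch below estimates payload bandwidth from line rate, lane count, and line encoding. Aurora is available in 8B/10B and 64B/66B flavours, and the efficiency figures follow directly from those encodings. The function name and the example line rates are assumptions for illustration, and the estimate ignores any additional protocol overhead such as framing or clock compensation.

```python
# Fraction of line bits that carry payload for each encoding.
ENCODING_EFFICIENCY = {
    "8b10b": 8 / 10,
    "64b66b": 64 / 66,
}


def payload_gbps(line_rate_gbps, lanes=1, encoding="64b66b"):
    """Estimate payload bandwidth in Gbps, counting only line-encoding overhead."""
    if lanes < 1:
        raise ValueError("at least one lane is required")
    if encoding not in ENCODING_EFFICIENCY:
        raise ValueError(f"unknown encoding: {encoding}")
    return line_rate_gbps * lanes * ENCODING_EFFICIENCY[encoding]


if __name__ == "__main__":
    # Hypothetical configurations, not the settings used in this blog.
    print(payload_gbps(10.3125, lanes=1, encoding="64b66b"))
    print(payload_gbps(3.125, lanes=4, encoding="8b10b"))
```

Bonding scales the estimate linearly with the number of lanes, which is why it is the usual route when a single transceiver cannot carry the required data rate.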

For this blog we need to create two projects, one for the KCU105 and another for the ZCU102. Both are amazingly simple to implement once we understand how the MGTs work in the devices.

Because of their demanding performance requirements, MGTs use dedicated IO, power supplies, termination voltages and reference clocks. This ensures that high performance can be achieved. Furthermore, MGTs are arranged within devices in groups called quads.

Each quad contains four TX / RX pairs and two reference clocks. When looking at the ZCU102 and KCU105 user manuals, you will find there are 24 and 20 GTHs available, respectively. Looking through the connections of the GTH, you will find that the ZCU102 uses quad 128, while the KCU105 uses quad 226.

However, we also need a reference clock for the MGT and I do not want to use the SMA reference clocks. Instead, I am going to use the reference clocks which are provided by the ZCU102 and KCU105 clocking networks. Both networks provide a 156.25 MHz reference clock connected to an adjacent quad. When it comes to clocking quads, we can use the reference clock of the quad above and the quad below as well. Of course, the clock must be high quality and supplied directly from the dedicated reference clock pins to have a high data rate while still having an acceptable bit error rate.
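The reference clock sharing rule described above can be written down as a small check. This is a sketch under stated assumptions: quads in the same column are numbered consecutively, the reach is one quad up or down as described in this blog (the actual reach for a given device family and line rate is defined in the transceiver user guide), and quad 227 is a hypothetical source used purely for illustration.

```python
def can_clock(source_quad, target_quad, reach=1):
    """Return True if a reference clock entering source_quad can drive target_quad."""
    return abs(target_quad - source_quad) <= reach


def quads_reachable(source_quad, reach=1):
    """List the quad numbers a reference clock entering source_quad can serve.

    The list is purely arithmetic: it does not check that each quad
    actually exists or is bonded out on a particular device and package.
    """
    return [source_quad + offset for offset in range(-reach, reach + 1)]


if __name__ == "__main__":
    # Hypothetical: a reference clock arriving on quad 227 serving quad 226.
    print(can_clock(227, 226))
    print(quads_reachable(227))
```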

We have now identified the quads we want to connect between the ZCU102 and the KCU105. However, when we open the Aurora IP cores to configure the transceiver, we need to know the X and Y location of the quad we want to use. This is where the Zynq UltraScale+ Device Packaging and Pinouts and UltraScale and UltraScale+ FPGAs Packaging and Pinouts documents are useful.

These documents contain the cross reference from the quad to the XY location for the device and pin out you are using. For the KCU105 and the ZCU102, we have the following.

Referencing the charts, I can see that I want to use X0Y11 on the KCU105 because the SMA TX/RX pair is the fourth transceiver in the quad. On the ZCU102, I want to use X0Y7 since the SMA transceiver is also the last one in the quad.
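The arithmetic behind those locations can be sketched in a few lines. With four channels per quad, the Y index is the quad's position in its column multiplied by four, plus the channel index within the quad. The base quad numbers used here (224 for the KCU105 device and 127 for the ZCU102 device) are assumptions chosen so that the results match the locations quoted above. The packaging and pinout documents remain the authoritative source, and the function name is hypothetical.

```python
CHANNELS_PER_QUAD = 4


def gth_site(quad, channel, base_quad, column=0):
    """Return an 'XnYm' location for a channel (0 to 3) within a quad.

    base_quad is the quad number assumed to sit at Y0 in the column.
    """
    if not 0 <= channel < CHANNELS_PER_QUAD:
        raise ValueError("channel must be in the range 0 to 3")
    if quad < base_quad:
        raise ValueError("quad is below the assumed base quad")
    y = (quad - base_quad) * CHANNELS_PER_QUAD + channel
    return f"X{column}Y{y}"


if __name__ == "__main__":
    # The fourth transceiver in each quad is channel index 3.
    print(gth_site(226, 3, base_quad=224))  # KCU105, assumed base quad 224
    print(gth_site(128, 3, base_quad=127))  # ZCU102, assumed base quad 127
```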

With the configurations known, I could then create two projects, one targeting the ZCU102 and the other the KCU105. Both designs contain an Aurora IP core configured for a reference clock of 156.25 MHz, set to the correct XY location, and with the mode of operation set to streaming.

For both Aurora IP cores, I included the shared logic in the core because each was the only Aurora IP being instantiated in its design. If you are using multiple IP blocks, one of them must include the shared logic (check the documentation carefully to determine the optimal one to use) and the others must exclude it. The instance that includes the shared logic provides the necessary signals to the other instantiations.

Of course, I needed a source of data so I used an AXI traffic generator on the TX ports of the Aurora IP blocks.

The only thing left to do was to connect an init_clk, which runs at 156.25 MHz in both designs and is generated by a clocking wizard. I also connected an ILA to the status outputs to ensure that the lane and channel were being established. In the KCU105, I created a MicroBlaze subsystem to help me configure the solution. In the ZCU102, I used the Zynq PS.

With the designs built, I connected the two boards together, being careful to cross connect the RX and TX pairs. Both boards were then programmed over JTAG and the ILAs interrogated to check the link status. Both boards showed the channel was established and functioning.

Turning off either board caused the other to report that the link was no longer established.

Now that we have the link between the ZCU102 and the KCU105 up and running, the next stage is to look at how to use the in-system IBERT to capture the eye of the transceivers.

This is extremely useful for custom designs so I will show how to do that in the next blog.