One of the main methods of increasing timing performance in our FPGA designs is to implement pipelining. At its heart, pipelining allows us to restructure data paths that have several logic layers.
Deep logic between registers makes high performance hard to achieve. A data path that passes through many logic layers accumulates significant logic and routing delay, and that delay can prevent Vivado from achieving timing closure at higher clock frequencies.
Restructuring the data path enables us to achieve higher clock frequencies and increased throughput at the cost of increased latency. We restructure the data path by inserting registers to reduce the number of logic layers, thereby reducing the total logic and routing delay between any two registers.
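To make the trade-off concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name and the delay figures (0.5 ns per logic level including routing, 0.3 ns of register overhead) are illustrative assumptions, not values taken from any device or from Vivado.

```python
import math

def pipeline_estimate(logic_levels, stages, level_delay_ns=0.5, reg_overhead_ns=0.3):
    """Estimate the clock period, Fmax and latency of a path split into stages.

    The delay figures are illustrative assumptions, not device data.
    """
    # The slowest stage carries the ceiling of an even split of the logic levels.
    levels_per_stage = math.ceil(logic_levels / stages)
    # Minimum clock period: logic and routing delay of the slowest stage
    # plus the assumed register overhead (clock-to-out and setup).
    period_ns = levels_per_stage * level_delay_ns + reg_overhead_ns
    fmax_mhz = 1000.0 / period_ns
    # Latency in clock cycles equals the number of register stages.
    latency_cycles = stages
    return levels_per_stage, period_ns, fmax_mhz, latency_cycles

if __name__ == "__main__":
    for stages in (1, 2, 3):
        levels, period, fmax, latency = pipeline_estimate(9, stages)
        print(f"{stages} stage(s): {levels} levels/stage, "
              f"period {period:.2f} ns, Fmax {fmax:.1f} MHz, latency {latency} cycle(s)")
```

Under these assumed numbers, each extra stage shortens the worst register-to-register path and raises the achievable clock frequency, while the result arrives one clock cycle later.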
Like many things in engineering, it is better to think about pipelining of data paths from day one. The work involved in addressing the issue later, when trying to achieve timing closure, might be considerable and costly.
So, how do we correctly pipeline our design? The starting point, again, is to plan for it when we first write our data path.
One great technique is to include additional registers either at the beginning or at the end of the data path.
We can then use the global retiming synthesis option; this option enables the synthesis tool to perform register rebalancing across logic levels.
As the registers are already present either before or after the data path, the original latency and behavior are maintained. Enabling global retiming is a good solution because it lets the synthesis tool position the registers where they will be most effective.
However, sometimes we need more granular control over the implementation of pipelining. Synthesis attributes such as retiming_forward and retiming_backward enable us to control the positioning of pipelining registers.
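The idea that moving a register into the logic preserves latency can be checked with a small cycle-by-cycle model. The Python sketch below is not how Vivado performs retiming; `stage_a` and `stage_b` are hypothetical stand-ins for the two halves of a combinational path, and both registers are assumed to power up at zero.

```python
def stage_a(x):
    # Hypothetical first half of the combinational logic.
    return (x * 3) & 0xFF

def stage_b(x):
    # Hypothetical second half of the combinational logic.
    return (x + 7) & 0xFF

def registers_at_end(samples):
    """All of the logic first, followed by two back-to-back registers."""
    r1 = r2 = 0
    out = []
    for x in samples:
        out.append(r2)                      # value visible at the output this cycle
        r2, r1 = r1, stage_b(stage_a(x))    # clock edge: both registers update together
    return out

def registers_rebalanced(samples):
    """One register moved backward so it sits between the two halves of the logic."""
    r1 = r2 = 0
    out = []
    for x in samples:
        out.append(r2)
        r2, r1 = stage_b(r1), stage_a(x)    # clock edge
    return out

if __name__ == "__main__":
    samples = list(range(8))
    # The first two cycles only show register power-up values, which can differ
    # once a register has been moved through logic, so compare from cycle 2 onward.
    print(registers_at_end(samples)[2:] == registers_rebalanced(samples)[2:])
```

Both versions deliver each result two clock cycles after its input, but the rebalanced version has only half of the logic between any pair of registers.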
If we are using block synthesis, we can control retiming within the block by using the BLOCK_SYNTH.RETIMING option.
As I mentioned, we should consider pipelining of our data paths from day one. However, sometimes we need to be able to analyze the design to identify potential issues. Thankfully, we are able to do this by running the design analysis report and checking the “include logic level distribution” option. This will report the levels of logic depth within the design; we can then investigate and optimize the pipelines as necessary.
This report will show, for each clock in the design, the number of paths at each logic depth. For example, in the table below, the clk_pll_i clock has 174 paths with 9 logic layers.
Along with inserting registers in the design ourselves, we can instruct the Vivado placer to insert pipeline registers automatically, using the HDL attributes / XDC constraints defined below:
AUTOPIPELINE_GROUP – Defines a group of signals that must receive balanced, automatically inserted pipelining.
AUTOPIPELINE_INCLUDE – Defines a signal to include.
AUTOPIPELINE_LIMIT – Defines the number of stages allowable, from 0 to 24.
AUTOPIPELINE_MODULE – Enables modules with autopipelining to be instantiated several times in the design.
For the best performance when enabling automatic pipelining, ensure the data path does not contain registers that use clock enables or resets.
When automatic pipelining is implemented, the latency will vary depending on the decisions of the placer. The actual latency for the automatically placed paths is, of course, reported in the implementation report.
It is normal to use auto-pipelining when crossing Super Logic Regions, but I would not recommend you use it for normal pipelining of your data paths. In those cases, you should use the approach outlined above, which uses additional registers and global/block retiming options.
As you can see from the above, if we think about pipelining from the beginning of our design, we stand a better chance of achieving timing closure.