High Level Synthesis (HLS) lets us leverage the power of programmable logic while significantly reducing development time. Of course, we also want to leverage existing libraries to get the best from HLS development and avoid reinventing the wheel each time.
Last year, I examined SLX FPGA and used it to optimize IP cores for implementation in Vivado, looking at security and industrial algorithms. Of course, things have moved on in the HLS world with the introduction of Vitis last November. I was curious to see how SLX FPGA could be used in a Vitis bottom-up flow. In a bottom-up flow, we use Vivado HLS to generate a Xilinx Object (XO) file, which is then added to Vitis for use later in the Vitis application. Such a flow lets us focus on complex algorithms, verify their performance, and ensure that the optimizations for programmable logic give the best implementation and are kept close to the algorithm.
I thought it would be a good idea to combine the bottom-up XO flow with the Xilinx Vitis accelerated libraries. I was also curious to see what optimizations SLX FPGA could suggest on top of the libraries provided by Xilinx.
Vitis accelerated libraries are open source and provide a range of common library functions like Math and DSP that are typically used across a breadth of applications. The Vitis accelerated libraries also include domain-specific libraries which address a range of applications from image processing to quantitative finance, security, and data compression.
Many of these domain applications, quantitative finance for example, are more likely to be implemented in the cloud using Alveo cards or AWS F1 instances. In quantitative finance, every nanosecond counts, so the developer may wish to accelerate functions in the programmable logic provided by an Alveo card.
However, even though the Vitis accelerated libraries come with several optimization pragmas already inserted, hand optimization is still needed to truly get the best performance, and this can take time to implement. This is one of the reasons I was curious to see if SLX FPGA would recommend further optimizations. I figured this could also be especially useful in cases where I make modifications to the library code and have to re-optimize, or if I were building a kernel from scratch.
For this example, I picked the Vasicek model, which is one of the quantitative finance libraries. To deploy this on the Alveo card, I am going to use Vivado HLS to create a Xilinx Object, SLX FPGA to optimize the design, and Vitis to implement the XO. What I am really interested in is the difference SLX FPGA can make to the performance over the standard library download.
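For readers unfamiliar with the model, here is a minimal sketch of the kind of arithmetic involved: the textbook closed-form price of a zero-coupon bond under the Vasicek short-rate model. This is not the Vitis library's code or API; the struct, function, and parameter names are made up for illustration, and the library test may compute a different quantity.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameter bundle for the Vasicek short-rate model
//   dr = a * (b - r) dt + sigma dW
// Names are illustrative and do not mirror the Vitis library's API.
struct VasicekParams {
    double a;      // mean-reversion speed (must be > 0)
    double b;      // long-run mean rate
    double sigma;  // volatility of the short rate
};

// Closed-form price at time 0 of a zero-coupon bond paying 1 at maturity T,
// given the current short rate r0:  P = exp(lnA - B * r0).
double vasicek_bond_price(const VasicekParams& p, double r0, double T) {
    assert(p.a > 0.0 && T >= 0.0);
    const double B = (1.0 - std::exp(-p.a * T)) / p.a;
    const double lnA = (p.b - p.sigma * p.sigma / (2.0 * p.a * p.a)) * (B - T)
                     - p.sigma * p.sigma * B * B / (4.0 * p.a);
    return std::exp(lnA - B * r0);
}
```

The exponentials and divisions here are the sort of operations that map onto DSP elements and deep floating-point pipelines in programmable logic, which is why pragma placement matters for latency.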
Thankfully, each of the library functions comes with its own test function, which can be used for demonstration.
This is the flow I am going to use:
Clone the Vitis accelerated library.
Create a Vivado HLS project using the Vasicek model test files provided.
Synthesize the design to understand the baseline performance and utilization.
Import the project into SLX FPGA.
Analyze the design for parallel structures.
Generate the code with the additional optimization pragmas identified by SLX FPGA.
Update the original Vivado HLS project with the new optimized code to generate the Xilinx Object file for the bottom-up flow.
Import the XO file into Vitis and implement the design to demonstrate the implemented algorithm.
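For the XO export step, the top-level function has to look like a Vitis kernel. As a rough sketch, assuming the usual Vitis 2019.2 C++ kernel conventions (C linkage, AXI master ports for memory, and an AXI-Lite control bundle), a kernel might be shaped like this. The kernel name, ports, and body are hypothetical stand-ins, not the Vasicek test function from the library.

```cpp
#include <cassert>

// Hypothetical top-level kernel; name, ports and body are illustrative only.
// extern "C" keeps the symbol name unmangled so the tools can find the kernel.
extern "C" void scale_kernel(const float* in, float* out, float factor, int n) {
// Pointer arguments are accessed over an AXI master interface to global memory.
#pragma HLS INTERFACE m_axi port=in  offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem
// All arguments, plus the block-level control, sit on one AXI-Lite bundle.
#pragma HLS INTERFACE s_axilite port=in     bundle=control
#pragma HLS INTERFACE s_axilite port=out    bundle=control
#pragma HLS INTERFACE s_axilite port=factor bundle=control
#pragma HLS INTERFACE s_axilite port=n      bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    assert(n >= 0);  // sanity check during C simulation
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * factor;
    }
}
```

A standard C++ compiler ignores the HLS pragmas, so the same source can be checked in a plain C simulation before it goes through Vivado HLS.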
Running the original code from the cloned libraries through Vivado HLS results in a latency of 1823 clock cycles and utilization of approximately 20,000 flip-flops, 22,000 LUTs and 114 DSP elements.
Looking at the design in the analysis view shows several areas where optimizations could be applied.
With the Vivado HLS design implemented and the baseline obtained, the next step is to import the project into SLX FPGA.
Importing the design into SLX FPGA should correctly configure the project. However, we need to make sure the base path contains the cloned libraries.
Running the design through FPGA mapping and parallelism detection identifies several data-level and pipeline-level parallel structures in the code, which could be optimized by pragma insertion.
With the optimizations identified, the next job is to generate the updated code with the new optimization pragmas included.
Rerunning the generated code through the Vivado HLS project yields a significant improvement not only in performance but in resource utilization as well.
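To make the two kinds of parallelism concrete, here is a small sketch of the sort of pragmas involved, written by hand on a generic dot product. It is not output from SLX FPGA and not code from the Vasicek library; the function name, sizes, and unroll factor are assumptions chosen for illustration.

```cpp
#include <cassert>

constexpr int N = 64;     // hypothetical problem size (multiple of LANES)
constexpr int LANES = 4;  // hypothetical number of parallel lanes

// Dot product structured so the tools can exploit both kinds of parallelism.
float dot_product(const float a[N], const float b[N]) {
    assert(N % LANES == 0);
// Data-level parallelism: split the arrays so LANES elements can be read per cycle.
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=4 dim=1
    float partial[LANES] = {0.0f, 0.0f, 0.0f, 0.0f};
#pragma HLS ARRAY_PARTITION variable=partial complete dim=1

    outer: for (int i = 0; i < N; i += LANES) {
// Pipeline-level parallelism: overlap successive iterations of the outer loop.
// II=1 is a request; the tool reports the interval it actually achieves.
#pragma HLS PIPELINE II=1
        lanes: for (int j = 0; j < LANES; ++j) {
// Replicate the multiply-accumulate hardware, one copy per lane.
#pragma HLS UNROLL
            partial[j] += a[i + j] * b[i + j];
        }
    }

    // Combine the independent per-lane accumulators.
    float sum = 0.0f;
    for (int j = 0; j < LANES; ++j) {
        sum += partial[j];
    }
    return sum;
}
```

Keeping one accumulator per lane removes the dependency between lanes, which is what allows the unrolled copies to run side by side. Pipelining the outer loop already unrolls the inner loop in Vivado HLS, so the UNROLL pragma here mainly documents the intent.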
With the instrumented code compiled, we can take the next step and export the Xilinx Object and import it into a new Vitis project, which targets the Alveo U200 card.
Once the project completes, opening the Vivado project will show the new implementation containing the Vasicek model implemented in the hardware.
Now we can create applications that run on the host using our Vasicek model and take advantage of the parallel nature of programmable logic.
I do a lot of HLS work for clients and I am always impressed when I play with SLX FPGA and see how fast it is at finding the optimization pragmas!