
MicroZed Chronicles: OpenCL - Creating a Kernel Application and Host Integration

In the last blog we looked at how to create a Vitis application that would detect the OpenCL platform and identify the OpenCL devices available within the platform. In this blog, we are going to examine how we can create a simple kernel application and then complete the rest of the host application so that we get an executable application.

To get started creating the kernel, we need to select the src directory under the kernels folder in the explorer and create a new file.

The code to be created is simple. For instructional purposes, the kernel takes an input buffer, multiplies each element by the index of the loop iteration, and writes the product to a result buffer. All of the source will be made available on my GitHub.
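As a rough illustration of the kernel described above, here is a minimal sketch. The function name cvd matches the name the host uses when it creates the kernel later in this blog. The argument names, their order, and the BUFFER_SIZE value are assumptions, and any HLS interface pragmas are left out.

```cpp
// Assumed element count; it has to match BUFFER_SIZE in the host code.
constexpr int BUFFER_SIZE = 256;

// extern "C" keeps the function name unmangled so that the host can
// find the kernel by its name ("cvd").
extern "C" {
void cvd(const int* ip, int* result) {
    // Multiply each input element by the index of the loop iteration.
    for (int i = 0; i < BUFFER_SIZE; i++) {
        result[i] = ip[i] * i;
    }
}
}
```

Because this is ordinary C++, the same function can also be called from a small software test before it is built for the programmable logic.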

Once this is completed, we need to update the host application in order to load and execute the kernel. Last week we identified the platform and devices in addition to creating the context and the command queue.

Now that we have a kernel, we can start developing the remainder of the host application. To complete the application, we need to do the following.

The kernel program is precompiled rather than compiled on the fly, because compilation for programmable logic takes a long time. This means we need to load the precompiled XCL binary. The XCL binary to load can be passed to the host program as an argument at run time.

 // Load xclbin
 char* xclbinFilename = argv[1];
 std::cout << "Loading: '" << xclbinFilename << "'\n";
 std::ifstream bin_file(xclbinFilename, std::ifstream::binary);
 bin_file.seekg (0, bin_file.end);
 unsigned nb = bin_file.tellg();
 bin_file.seekg (0, bin_file.beg);
 char *buf = new char [nb];
 bin_file.read(buf, nb);
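The loading code above assumes that the file exists and can be read. As a slightly more defensive sketch of the same step, the helper below reads the whole file into a std::vector and throws if anything goes wrong. The name read_binary_file is hypothetical and is not part of the original host code.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a complete binary file (for example an
// xclbin) into memory, throwing std::runtime_error on failure.
std::vector<char> read_binary_file(const std::string& path) {
    // Open at the end of the file so that tellg() reports its length.
    std::ifstream file(path, std::ifstream::binary | std::ifstream::ate);
    if (!file) {
        throw std::runtime_error("Cannot open " + path);
    }
    const std::streamsize size = static_cast<std::streamsize>(file.tellg());
    if (size < 0) {
        throw std::runtime_error("Cannot determine size of " + path);
    }
    file.seekg(0, std::ifstream::beg);

    std::vector<char> data(static_cast<std::size_t>(size));
    if (size > 0 && !file.read(data.data(), size)) {
        throw std::runtime_error("Cannot read " + path);
    }
    return data;
}
```

The returned vector's data() and size() can then stand in for buf and nb, and the memory is released automatically when the vector goes out of scope.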

Once the XCL binary has been read into memory, we need to create a program object from it.

 // Creating Program from Binary File
 cl::Program::Binaries bins;
 bins.push_back({buf, nb}); // hand the loaded xclbin image to the program
 cl::Program program(context, devices, bins);

Create a kernel which uses the program object previously loaded.

 //create the kernel
 cl::Kernel krnl_simple(program,"cvd");

Create the buffers used to transfer data to and from the kernel. Each buffer needs to be sized to match the kernel argument it backs, in this case BUFFER_SIZE integers. We create two buffers, one for each argument passed to the kernel.

 //create the buffers
 size_t size_in_bytes = BUFFER_SIZE * sizeof(int);

 cl::Buffer ip(context, CL_MEM_READ_ONLY, size_in_bytes);
 cl::Buffer result(context, CL_MEM_WRITE_ONLY, size_in_bytes);

With the kernel and the buffers defined, the next step is to associate each kernel argument with its buffer.

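 //set up the arguments for the program
 //assumed order: argument 0 is the input, argument 1 is the result
 krnl_simple.setArg(0, ip);
 krnl_simple.setArg(1, result);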

We need to map the buffers to host pointers to be able to read and write them.

 //map the buffers so we can access them
 int *ptr_ip = (int *) q.enqueueMapBuffer (ip , CL_TRUE , CL_MAP_WRITE , 0, size_in_bytes);
 int *ptr_result = (int *) q.enqueueMapBuffer (result , CL_TRUE , CL_MAP_READ , 0, size_in_bytes);

We can then use the pointer to create the input data using a simple loop to initialize all the input values.
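The initialization loop might look something like the sketch below. The helper name init_input and the choice of input values (each element set to its own index) are assumptions made for illustration. Any values would do, as long as the host checks the results against the same data.

```cpp
#include <cstddef>

// Hypothetical helper: fill n integers starting at ptr with their own
// index, so element i holds the value i.
void init_input(int* ptr, std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        ptr[i] = static_cast<int>(i);
    }
}
```

In the host application this would be called on the mapped input pointer, for example init_input(ptr_ip, BUFFER_SIZE). With these values, a kernel that multiplies each element by its loop index should return i * i for element i.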

With the input data ready, we are now able to move the input data from the host memory to the kernel, run the kernel, and move the results back from the kernel to the host.

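 //move the input buffer from host memory to the kernel
 q.enqueueMigrateMemObjects({ip}, 0); // flags of 0 migrate from host to device

 //run the kernel
 q.enqueueTask(krnl_simple);

 //move the results back from the kernel to the host
 q.enqueueMigrateMemObjects({result}, CL_MIGRATE_MEM_OBJECT_HOST);

 //finish the operation
 q.finish();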

Finally, we need to analyze the results to make sure the algorithm behaved as expected, and unmap the pointers from the buffers. The comparison runs before the unmap because the mapped pointers are no longer valid once the mapping has been released.

 //verify the results while the buffers are still mapped
 //error_message is assumed to be a printf-style format string defined earlier in the host code
 int match = 0;
 for (int i = 0; i < BUFFER_SIZE; i++) {
     int host_result = ptr_ip[i] * i;
     if (ptr_result[i] != host_result) {
         printf(error_message.c_str(), i, host_result, ptr_result[i]);
         match = 1;
     }
 }

 //release the mapping
 q.enqueueUnmapMemObject(ip , ptr_ip);
 q.enqueueUnmapMemObject(result , ptr_result);
 q.finish();

 std::cout << "TEST Completed " << std::endl;
 std::cout << "TEST " << (match ? "FAILED" : "PASSED") << std::endl;
 return (match ? EXIT_FAILURE : EXIT_SUCCESS);

The completed application for both the host and kernel is available on my GitHub. Once these are in place, we need to make sure the kernel and linker project settings are set up to include the kernel in the binary container.

We are now ready to build the application and run it on the host. This will load the kernel onto the Alveo U50 card and execute it. Once it has completed, the terminal window will report whether the test passed or failed.

Now we know exactly how we can create a kernel and integrate it within a host application using OpenCL and the Vitis acceleration flow. This will be a useful reference when we develop applications using Vitis on both acceleration cards and edge-based heterogeneous SoCs.

