Argonne National Laboratory (ANL), Argonne, United States of America
Field-programmable gate arrays (FPGAs) are emerging as promising components for heterogeneous computing. Meanwhile, high-level synthesis (HLS) tools are moving FPGA development from register-transfer-level design toward high-level-language design flows based on Open Computing Language (OpenCL), C, and C++. The performance of binary search applications is often limited by irregular access patterns to off-chip memory. In this paper, we implement binary search algorithms in OpenCL and evaluate their performance on an Intel Arria 10-based FPGA platform. Based on the evaluation results, we implement the grid search in XSBench by vectorizing and replicating the binary search kernel. In addition, we overcome the overhead of kernel vectorization by grouping work-items into work-groups. Our optimizations improve the performance of the grid search using the classic binary search by a factor of 1.75 on the FPGA.
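For orientation, the classic binary search underlying a grid search can be sketched in plain C as below. This is a minimal illustration under stated assumptions, not the OpenCL kernel evaluated in the paper: the names `grid_search` and `grid_search_batch`, the use of `double` grid points, and the clamping of the result to a valid interval are all hypothetical choices made for the example. Each iteration of the batch loop is independent of the others, which is the kind of work that an OpenCL implementation would typically map to one work-item.

```c
#include <assert.h>
#include <stddef.h>

/* Classic binary search over a grid sorted in ascending order.
 * Returns the largest index i with grid[i] <= key, clamped to
 * [0, n - 2] so that (i, i + 1) always names a valid interval.
 * Hypothetical helper for illustration only. */
static size_t grid_search(const double *grid, size_t n, double key)
{
    assert(n >= 2);            /* need at least one interval */
    size_t lo = 0;
    size_t hi = n - 1;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (grid[mid] > key)
            hi = mid;          /* key lies below grid[mid] */
        else
            lo = mid;          /* key lies at or above grid[mid] */
    }
    return lo;
}

/* Looks up many keys in the same grid. The iterations do not depend
 * on one another; each one reads the grid at data-dependent indices,
 * which is the irregular access pattern the abstract refers to. */
static void grid_search_batch(const double *grid, size_t n,
                              const double *keys, size_t *out,
                              size_t num_keys)
{
    for (size_t k = 0; k < num_keys; ++k)
        out[k] = grid_search(grid, n, keys[k]);
}
```

With a five-point grid `{0.0, 1.0, 2.0, 3.0, 4.0}`, a key of 2.5 falls in the interval starting at index 2, and keys outside the grid are clamped to the first or last interval.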