Abstract
We describe a mechanism for connecting GPU and FPGA devices directly via the PCI Express bus, enabling the transfer of data between these heterogeneous computing units without the intermediate use of system memory. We evaluate the performance benefits of this approach over a range of transfer sizes, and demonstrate its utility in a computer vision application. We find that bypassing system memory yields improvements as high as 2.2× in data transfer speed, and 1.9× in application performance.
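As a rough illustration of why removing the system-memory hop can help, the sketch below is a hypothetical first-order timing model, not the authors' method. It treats the conventional path as two sequential bus transfers (GPU to system memory, then system memory to FPGA) and the direct path as a single peer-to-peer transfer. The function names, the bandwidth and overhead parameters, and the assumption that the two staged hops do not overlap are all illustrative assumptions.

```python
# Hypothetical first-order model (illustrative only): all names, bandwidths
# and overheads below are assumptions, not values or code from the paper.

def staged_time(size_bytes: float, bw_gpu_to_host: float,
                bw_host_to_fpga: float, overhead_s: float = 0.0) -> float:
    """Seconds to move data GPU -> system memory -> FPGA.

    Assumes the two hops run back to back with no pipelining/overlap.
    Bandwidths are in bytes per second.
    """
    return size_bytes / bw_gpu_to_host + size_bytes / bw_host_to_fpga + overhead_s


def direct_time(size_bytes: float, bw_p2p: float, overhead_s: float = 0.0) -> float:
    """Seconds to move data GPU -> FPGA in a single peer-to-peer hop."""
    return size_bytes / bw_p2p + overhead_s


def speedup(size_bytes: float, bw_gpu_to_host: float, bw_host_to_fpga: float,
            bw_p2p: float, staged_overhead_s: float = 0.0,
            direct_overhead_s: float = 0.0) -> float:
    """Ratio of staged transfer time to direct transfer time."""
    return (staged_time(size_bytes, bw_gpu_to_host, bw_host_to_fpga, staged_overhead_s)
            / direct_time(size_bytes, bw_p2p, direct_overhead_s))


if __name__ == "__main__":
    # Equal (hypothetical) 1 GB/s links and no fixed costs: the staged path
    # crosses the bus twice, so the model predicts exactly a 2x ratio.
    print(f"{speedup(1 << 20, 1e9, 1e9, 1e9):.2f}")  # prints "2.00"
```

With equal bandwidths and no fixed costs the model yields exactly 2×; the measured figures reported above reflect real hardware and driver behavior that this sketch does not attempt to reproduce.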
Cite this article
Bittner, R., Ruf, E. & Forin, A. Direct GPU/FPGA communication Via PCI express. Cluster Comput 17, 339–348 (2014). https://doi.org/10.1007/s10586-013-0280-9