Direct GPU/FPGA communication Via PCI express

Cluster Computing

Abstract

We describe a mechanism for connecting GPU and FPGA devices directly via the PCI Express bus, enabling the transfer of data between these heterogeneous computing units without the intermediate use of system memory. We evaluate the performance benefits of this approach over a range of transfer sizes, and demonstrate its utility in a computer vision application. We find that bypassing system memory yields improvements as high as 2.2× in data transfer speed, and 1.9× in application performance.

References

  1. Bittner, R.: Speedy bus mastering PCI express. In: 22nd International Conference on Field Programmable Logic and Applications (2012)

  2. Goldhammer, A., Ayer, J. Jr.: Understanding performance of PCI express systems. Xilinx WP350 (Sept. 2008)

  3. Khronos Group: OpenCL: the open standard for parallel programming of heterogeneous systems. Available at: http://www.khronos.org/opencl/

  4. Khronos Group: OpenCL API registry. Available at: http://www.khronos.org/registry/cl

  5. Microsoft Corporation: “DirectCompute”. Available at: http://blogs.msdn.com/b/chuckw/archive/2010/07/14//directxompute.aspx

  6. nVidia Corporation: nVidia CUDA API reference manual, version 4.1. Available at: http://ww.nvidia.com/CUDA

  7. nVidia Corporation: nVidia CUDA C programming guide, version 4.1. Available at: http://ww.nvidia.com/CUDA

  8. PCI SIG: PCI express base specification. Available at: http://www.pcisig.com/specifications/pciexpress

  9. Whitted, T., Kajiya, J., Ruf, E., Bittner, R.: Embedded function composition. In: Proceedings of the Conference on High Performance Graphics (2009)

  10. PLDA Corporation: http://www.plda.com/prodetail.php?pid=175

  11. Xilinx Corporation: PCI express. Available at: http://www.xilinx.com/technology/protocols/pciexpress.htm

  12. nVidia GPUDirect: http://developer.nvidia.com/gpudirect

  13. Oberg, J., Eguro, K., Bittner, R., Forin, A.: Random decision tree body part recognition using FPGAs. In: International Conference on Field Programmable Logic and Applications, August (2012)

  14. da Silva, B., Braeken, A., D’Hollander, E., Touhafi, A., Cornelis, J.G., Lemiere, J.: Performance and toolchain of a combined GPU/FPGA desktop. In: 21st International Symposium on Field Programmable Gate Arrays (FPGA’13), Monterey, CA, February (2013)

  15. Rossetti, D., et al.: GPU peer-to-peer techniques applied to a cluster interconnect. In: Proceedings of the Third Workshop on Communication Architecture for Scalable Systems (2013)

Author information

Corresponding author

Correspondence to Erik Ruf.

About this article

Cite this article

Bittner, R., Ruf, E. & Forin, A. Direct GPU/FPGA communication Via PCI express. Cluster Comput 17, 339–348 (2014). https://doi.org/10.1007/s10586-013-0280-9

  • DOI: https://doi.org/10.1007/s10586-013-0280-9
