Implementation of LTE system on an SDR platform using CUDA and UHD

Analog Integrated Circuits and Signal Processing

Abstract

In this paper, we present an implementation of a Long Term Evolution (LTE) system on a software defined radio (SDR) platform. The platform is a conventional personal computer equipped with a graphics processing unit (GPU), which implements the SDR software modem, and a Universal Software Radio Peripheral 2 (USRP2) with the USRP Hardware Driver (UHD), which implements the radio frequency transceiver. The central processing unit executes C++ control code that accesses the USRP2 via the UHD. We have adopted the Ettus Research UHD because of the flexibility it offers in the design of the transceiver chain. Taking advantage of this flexibility, we have implemented a simple cognitive radio engine using libraries provided by the UHD. The software modem runs on the GPU, whose many arithmetic and logic units make it well suited to parallel computing. We propose a parallel programming method that exploits the single-instruction, multiple-data (SIMD) architecture of the GPU. We focus on the implementation of the Turbo decoder because of its high computational requirements and the difficulty of parallelizing its algorithm. The implemented system is analyzed primarily in terms of computation time using the Compute Unified Device Architecture (CUDA) profiler. In our experimental tests, we measured the total processing time for a single frame of LTE data on both the transmit and the receive side: 5.00 ms for transmit and 8.58 ms for receive. This confirms that the implemented system is capable of real-time processing of all the baseband signal processing algorithms required for LTE systems.
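The real-time claim can be checked with simple arithmetic. The sketch below is illustrative only and rests on an assumption the abstract does not state: that "a single frame" means one 10 ms LTE radio frame as defined in 3GPP TS 36.211. The names `FRAME_DURATION_MS`, `headroom_ms`, and `meets_real_time` are hypothetical, and the check treats the transmit and receive paths independently.

```python
# Assumption (not stated in the abstract): "a single frame" is one
# 10 ms LTE radio frame, as defined in 3GPP TS 36.211.
FRAME_DURATION_MS = 10.0

# Per-frame processing times reported in the abstract, in milliseconds.
PROCESSING_MS = {"transmit": 5.00, "receive": 8.58}


def headroom_ms(processing_ms, frame_ms=FRAME_DURATION_MS):
    """Time left in the frame period after processing one frame."""
    return frame_ms - processing_ms


def meets_real_time(processing_ms, frame_ms=FRAME_DURATION_MS):
    """True if one frame is processed before the next frame arrives."""
    return processing_ms < frame_ms


for path, t in PROCESSING_MS.items():
    print(f"{path}: {t:.2f} ms used, {headroom_ms(t):.2f} ms headroom, "
          f"real-time: {meets_real_time(t)}")
```

Under this assumption both paths finish within the frame period, with the receive path leaving the smaller margin.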

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-1001) supervised by the NIPA (National IT Industry Promotion Agency).

Author information

Corresponding author

Correspondence to Seungwon Choi.

About this article

Cite this article

Bang, S., Ahn, C., Jin, Y. et al. Implementation of LTE system on an SDR platform using CUDA and UHD. Analog Integr Circ Sig Process 78, 599–610 (2014). https://doi.org/10.1007/s10470-013-0229-1

  • DOI: https://doi.org/10.1007/s10470-013-0229-1
