Understanding throughput-oriented architectures

Published: 01 November 2010

Abstract

For workloads with abundant parallelism, GPUs deliver higher peak computational throughput than latency-oriented CPUs.

Published in: Communications of the ACM, Volume 53, Issue 11, November 2010, 112 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/1839676

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States