The Macro-DSE for HPC Processing Unit: The Physical Constraints Perspective

Tang, Yuxing; Wang, Lei; Deng, Yu; Ni, Xiaoqiang; Dou, Qiang

doi:10.1007/978-3-319-39077-2_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9663)

Abstract

Because of the popularity of big data and cloud computing, the evolution of microarchitecture has to concentrate on raw computing ability, throughput, low power, and cost at the same time. Because of the huge non-recurring engineering costs, computer architects and processor designers rely on simulation tools and models to optimize the main processing unit. Design space exploration (DSE) methodology is responsible for filtering all the possible choices. However, the thousands of parameters of a current multi-core processor make an exhaustive search too expensive. Future high performance computing (HPC) will no longer insist on peak double-precision floating-point (DFP) performance only, but also on high throughput and light weight. Depending on the level of detail, from the number of cores down to the size of an individual pipeline buffer, we can divide the DSE problem into macro and micro levels.
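As a rough illustration of what a macro-level exploration might look like, the sketch below enumerates a tiny design space (core style and core count) and keeps only the designs that fit within area and power budgets. All core styles, per-core figures, and budgets here are hypothetical placeholders chosen for illustration; they are not taken from the paper or from McPAT.

```python
from itertools import product

# HYPOTHETICAL per-core figures (illustrative only, not from the paper):
# style -> (area in mm^2, power in W, peak GFLOPS) per core
CORE_STYLES = {
    "big_ooo": (20.0, 10.0, 32.0),
    "small_inorder": (4.0, 1.5, 8.0),
}
CORE_COUNTS = [16, 64, 256]


def explore(max_area_mm2, max_power_w):
    """Enumerate macro-level designs and keep those within both budgets."""
    feasible = []
    for (style, (area, power, gflops)), n in product(CORE_STYLES.items(), CORE_COUNTS):
        total_area, total_power, total_perf = area * n, power * n, gflops * n
        if total_area <= max_area_mm2 and total_power <= max_power_w:
            feasible.append((style, n, total_area, total_power, total_perf))
    # Rank the surviving designs by peak performance, highest first.
    return sorted(feasible, key=lambda d: d[4], reverse=True)


if __name__ == "__main__":
    for design in explore(max_area_mm2=400.0, max_power_w=150.0):
        print(design)
```

The macro level here has only six candidate points, so exhaustive enumeration is cheap; the micro level (individual buffer sizes and similar parameters) is where the space grows too large to search exhaustively.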

In this paper, we focus on the macro-DSE problem of choosing the right style for the processing core design. First, we extended McPAT, the de facto DSE tool, to support technology nodes from 65 nm to 16 nm and up to 256 cores. Based on the physical design constraints of chip area, power, and balanced design requirements, we examine and explore the design of future high-performance processing units. Although traditional HPC pursued peak performance only, our DSE results show that physical constraints will limit the choices for the processing unit of future HPC. The experimental results show that, with only a 74.8 % increase in chip die area and a 3.8 % increase in power, one many-core design can achieve 4 times the peak performance in both INT and FP, and a 285.6 % increase in performance/power efficiency, compared with another. The key insight of our experiments is that a unique type of processing core can be the best choice, depending on the specific physical design plan.
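The relationship between the reported figures can be checked with a short calculation. The sketch below assumes that performance/power efficiency is simply peak performance divided by power, which the abstract does not state explicitly; the function and variable names are illustrative. The performance-per-area figure is derived here in the same way and is not reported in the abstract.

```python
def relative_efficiency(perf_ratio, cost_increase_pct):
    """Ratio of (performance / cost) between two designs.

    perf_ratio: performance of design A divided by that of design B.
    cost_increase_pct: how much more A costs than B (power or area), in percent.
    """
    return perf_ratio / (1.0 + cost_increase_pct / 100.0)


# Figures quoted in the abstract.
PERF_RATIO = 4.0          # "4 times the peak performance"
POWER_INCREASE_PCT = 3.8
AREA_INCREASE_PCT = 74.8

perf_per_watt = relative_efficiency(PERF_RATIO, POWER_INCREASE_PCT)
perf_per_area = relative_efficiency(PERF_RATIO, AREA_INCREASE_PCT)

# Express each ratio as a percentage increase over the baseline design.
perf_per_watt_gain_pct = (perf_per_watt - 1.0) * 100.0
perf_per_area_gain_pct = (perf_per_area - 1.0) * 100.0

if __name__ == "__main__":
    print(f"performance/power gain: {perf_per_watt_gain_pct:.1f} %")
    print(f"performance/area gain:  {perf_per_area_gain_pct:.1f} %")
```

Under this assumption the performance/power gain comes out at roughly 285 %, close to the 285.6 % quoted in the abstract; the small difference is presumably due to rounding of the published inputs.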

Acknowledgements

We thank the other cpu@nudt team members who provided the architecture, microarchitecture, and physical design parameters of various processors. This work is supported in part by NSFC grant No. 61272139 and National Science and Technology Major Project HGJ-2015ZX01028001-001.

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, Changsha, China
Yuxing Tang, Lei Wang, Yu Deng, Xiaoqiang Ni & Qiang Dou

Corresponding author

Correspondence to Yuxing Tang.

Editor information

Editors and Affiliations

Fujian Normal University, Fuzhou, China
Xinyi Huang
Deakin University, Burwood, Australia
Yang Xiang
Providence University, Taichung, Taiwan
Kuan-Ching Li

Cite this paper

Tang, Y., Wang, L., Deng, Y., Ni, X., Dou, Q. (2016). The Macro-DSE for HPC Processing Unit: The Physical Constraints Perspective. In: Huang, X., Xiang, Y., Li, KC. (eds) Green, Pervasive, and Cloud Computing. Lecture Notes in Computer Science, vol 9663. Springer, Cham. https://doi.org/10.1007/978-3-319-39077-2_7

DOI: https://doi.org/10.1007/978-3-319-39077-2_7
Published: 03 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39076-5
Online ISBN: 978-3-319-39077-2
eBook Packages: Computer Science, Computer Science (R0)
