Abstract
Driven by the popularity of big data and cloud computing, microarchitecture evolution has to address raw computing ability, throughput, low power, and cost at the same time. Because of the huge non-recurring engineering costs, computer architects and processor designers rely on simulation tools and models to optimize the main processing unit. Design space exploration (DSE) is the methodology used to filter the possible design choices. However, the thousands of parameters of a current multi-core processor make an exhaustive search too expensive. Future high performance computing (HPC) will no longer insist only on peak double-precision floating-point (DFP) performance, but also on high throughput and lightweight design. Depending on the level of detail, from the number of cores down to the size of an individual pipeline buffer, we divide the DSE problem into a macro level and a micro level.
In this paper, we focus on the macro-DSE problem of choosing the right style for the processing core design. First, we extended McPAT, the de facto DSE tool, to support technology nodes from 65 nm to 16 nm and up to 256 cores. Based on the physical design constraints of chip area, power, and balanced design requirements, we examine and explore the design of future high-performance processing units. Although traditional HPC pursued peak performance only, our DSE results show that physical constraints will limit the choices available for the processing unit of future HPC. The experimental results show that, with only a 74.8 % increase in chip die area and a 3.8 % increase in power, one many-core design can achieve 4 times the peak performance in both INT and FP, and a 285.6 % increase in performance/power efficiency, compared with another. The key insight of our experiments is that a unique type of processing core can be the best choice depending on the specific physical design plan.
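The relationship between the quoted figures can be checked with a short calculation. The following is a minimal sketch, not the authors' tooling: the function name `relative_gains` and its parameters are hypothetical, and the performance-per-area ratio is derived here for illustration only, since the abstract does not report it.

```python
def relative_gains(area_increase_pct, power_increase_pct, perf_ratio):
    """Return efficiency ratios of one design relative to a baseline.

    Inputs are the percentage increases in area and power and the
    peak-performance ratio, as quoted in the abstract (rounded values).
    """
    area_ratio = 1 + area_increase_pct / 100    # e.g. 74.8 % -> 1.748x area
    power_ratio = 1 + power_increase_pct / 100  # e.g. 3.8 %  -> 1.038x power
    return {
        "perf_per_watt": perf_ratio / power_ratio,
        "perf_per_area": perf_ratio / area_ratio,  # derived, not in the abstract
    }


gains = relative_gains(74.8, 3.8, 4.0)
for name, ratio in gains.items():
    print(f"{name}: {ratio:.2f}x ({(ratio - 1) * 100:.1f} % increase)")
```

With these rounded inputs, the performance/power ratio comes out at roughly 3.85x, close to the reported 285.6 % increase; the small gap presumably comes from rounding in the quoted figures.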
Acknowledgements
We thank the other cpu@nudt team members who provided architecture, microarchitecture, and physical design parameters of various processors. This work is supported in part by NSFC grant No. 61272139 and National Science and Technology Major Project HGJ-2015ZX01028001-001.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Tang, Y., Wang, L., Deng, Y., Ni, X., Dou, Q. (2016). The Macro-DSE for HPC Processing Unit: The Physical Constraints Perspective. In: Huang, X., Xiang, Y., Li, KC. (eds) Green, Pervasive, and Cloud Computing. Lecture Notes in Computer Science, vol 9663. Springer, Cham. https://doi.org/10.1007/978-3-319-39077-2_7
DOI: https://doi.org/10.1007/978-3-319-39077-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39076-5
Online ISBN: 978-3-319-39077-2
eBook Packages: Computer Science, Computer Science (R0)