Abstract
Fast and accurate performance and power prediction is a key challenge in pre-silicon design evaluation during the early phases of hardware and software co-development. Performance evaluation using full-system simulation is prohibitively slow, especially with real-world applications. By contrast, analytical models are either not sufficiently accurate or still require target-specific execution statistics that may be slow or difficult to obtain. In this paper, we present LACross, a learning-based cross-platform prediction technique that predicts the time-varying performance and power of a benchmark on a target platform using hardware counter statistics obtained while running natively on a host platform. We employ a fine-grained, phase-based approach in which the learning algorithm synthesizes analytical proxy models that predict the performance and power of the workload in each program phase from performance statistics obtained on the host. Our learning approach relies on a one-time training phase using a target reference model or real hardware. We train our models on fewer than 160 programs from the ACM ICPC database, and demonstrate prediction accuracy and speed on 35 programs from the SPEC CPU2006, MiBench, and SD-VBS benchmark suites. Results show that, with careful choice of phase granularity, we can achieve on average over 97% performance and power prediction accuracy at simulation speeds of over 500 MIPS.
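As a rough illustration of the phase-based idea, the sketch below fits a linear proxy model that maps per-phase host counter values to target cycle counts and then scores the predictions on an unseen phase. It is not the LACross learning formulation: ordinary least squares is used here as a stand-in, and the counter names, phase data, and cost weights are all hypothetical values chosen for the example.

```python
# Illustrative sketch only: an ordinary least-squares proxy model,
# NOT the learning formulation used by LACross. All data is made up.

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit(features, targets):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    k = len(features[0])
    xtx = [[sum(x[i] * x[j] for x in features) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(features, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, x):
    """Predicted target cycles for one phase: dot product of weights and counters."""
    return sum(w * v for w, v in zip(weights, x))

def mean_accuracy(predicted, reference):
    """One minus the mean absolute relative error over all phases."""
    errs = [abs(p - r) / r for p, r in zip(predicted, reference)]
    return 1.0 - sum(errs) / len(errs)

# Hypothetical host counters per phase: [instructions, cache misses, branch mispredictions].
train_x = [[1000, 10, 5], [1000, 40, 2], [1000, 5, 20], [1500, 25, 8]]
# Hypothetical target cycles per phase, generated from assumed costs 0.8, 30 and 15.
train_y = [1175.0, 2030.0, 1250.0, 2070.0]

weights = fit(train_x, train_y)          # one-time training step
test_x = [[1200, 15, 6]]                 # counters of an unseen phase on the host
test_y = [1500.0]                        # reference target cycles for that phase
preds = [predict(weights, x) for x in test_x]
acc = mean_accuracy(preds, test_y)
print(weights, preds, acc)
```

Because the toy targets are an exact linear function of the counters, the fit recovers the assumed weights almost exactly; real counter data would be noisy and would likely call for regularization and a separate model per phase type.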
Acknowledgements
This work has been supported by Semiconductor Research Corporation (SRC) Grant 2012-HJ-2317.
Cite this article
Zheng, X., John, L.K. & Gerstlauer, A. LACross: Learning-Based Analytical Cross-Platform Performance and Power Prediction. Int J Parallel Prog 45, 1488–1514 (2017). https://doi.org/10.1007/s10766-017-0487-0
DOI: https://doi.org/10.1007/s10766-017-0487-0