
LACross: Learning-Based Analytical Cross-Platform Performance and Power Prediction

International Journal of Parallel Programming

Abstract

Fast and accurate performance and power prediction is a key challenge in pre-silicon design evaluation during the early phases of hardware/software co-development. Performance evaluation using full-system simulation is prohibitively slow, especially for real-world applications. By contrast, analytical models are either not sufficiently accurate or still require target-specific execution statistics that may be slow or difficult to obtain. In this paper, we present LACross, a learning-based cross-platform prediction technique that predicts the time-varying performance and power of a benchmark on a target platform using hardware counter statistics obtained while running natively on a host platform. We employ a fine-grained, phase-based approach in which the learning algorithm synthesizes analytical proxy models that predict the performance and power of the workload in each program phase from performance statistics obtained on the host. Our learning approach relies on a one-time training phase using a target reference model or real hardware. We train our models on fewer than 160 programs from the ACM-ICPC database and demonstrate prediction accuracy and speed on 35 programs from the SPEC CPU2006, MiBench, and SD-VBS benchmark suites. Results show that, with a careful choice of phase granularity, we can achieve average performance and power prediction accuracy of over 97% at simulation speeds of over 500 MIPS.
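To make the phase-based idea concrete, here is a minimal Python sketch under stated assumptions. It is not the LACross algorithm, because this abstract does not give the form of the proxy models or the training procedure. The sketch assumes that each program phase is summarized by a vector of host hardware-counter features, that the target CPI of a phase is a linear function of those features, and that the weights are fitted with Lasso (L1-regularized least squares) by coordinate descent as a stand-in learner. Whole-program runtime is then the sum over phases of instructions × CPI / frequency. All function names, and the accuracy definition (1 minus relative error), are hypothetical.

```python
"""Hypothetical sketch of phase-level cross-platform prediction.

Assumptions (not from the paper): per-phase host counter features, a linear
proxy model for target CPI, and Lasso coordinate descent as the learner.
"""
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|x|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso(X: Sequence[Sequence[float]], y: Sequence[float],
              lam: float = 0.0, iters: int = 1000) -> List[float]:
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X: one row of host counter features per training phase (assumed layout).
    y: measured target metric for that phase, e.g. target CPI.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(iters):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


def predict_phase_cpi(w: Sequence[float], features: Sequence[float]) -> float:
    """Linear proxy model: predicted target CPI for one phase."""
    return sum(wj * xj for wj, xj in zip(w, features))


def predict_runtime(w: Sequence[float],
                    phases: Sequence[Tuple[int, Sequence[float]]],
                    target_hz: float) -> float:
    """Sum per-phase predictions: time = instructions * CPI / frequency."""
    return sum(n_instr * predict_phase_cpi(w, f) / target_hz
               for n_instr, f in phases)


def accuracy(predicted: float, measured: float) -> float:
    """Accuracy as 1 - relative error (one common definition, assumed here)."""
    return 1.0 - abs(predicted - measured) / measured


if __name__ == "__main__":
    # Synthetic training phases: [bias, feature_a, feature_b], with target
    # CPI generated as 0.5 + 1.2*a + 3.0*b (no noise, illustration only).
    pairs = [(0.2, 0.9), (0.5, 0.1), (1.0, 0.5), (1.5, 0.8), (2.0, 0.2), (0.8, 0.0)]
    X = [[1.0, a, b] for a, b in pairs]
    y = [0.5 + 1.2 * row[1] + 3.0 * row[2] for row in X]
    w = fit_lasso(X, y, lam=0.0)
    print("weights:", [round(v, 3) for v in w])
    # Two unseen phases: (instruction count, host features), 1 GHz target.
    phases = [(1_000_000, [1.0, 1.0, 0.0]), (2_000_000, [1.0, 0.0, 0.5])]
    print("predicted runtime (s):", predict_runtime(w, phases, 1e9))
```

With `lam=0.0` the fit reduces to ordinary least squares; a larger `lam` drives more weights to exactly zero, which is one way to select a small subset of informative counters. Power could be modeled analogously by fitting a second weight vector to measured per-phase power, but that, too, is an assumption rather than the paper's stated method.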




Acknowledgements

This work has been supported by Semiconductor Research Corporation (SRC) Grant 2012-HJ-2317.

Author information


Correspondence to Xinnian Zheng.


About this article


Cite this article

Zheng, X., John, L.K. & Gerstlauer, A. LACross: Learning-Based Analytical Cross-Platform Performance and Power Prediction. Int J Parallel Prog 45, 1488–1514 (2017). https://doi.org/10.1007/s10766-017-0487-0


  • DOI: https://doi.org/10.1007/s10766-017-0487-0
