ARED: automata-based runtime estimation for distributed systems using deep learning

Cluster Computing

Abstract

High-performance computers are used for computation-intensive tasks, and to utilize such systems efficiently and in a timely manner they must execute several of these tasks simultaneously. Because a typical task runs for a long time, it is essential to estimate the runtime of each task before execution and to schedule the tasks accordingly. We propose a method for predicting the runtime of MPI-based software. We first analyze the source code of the software by translating it into finite automata and measuring their state complexity. We then train a deep neural network (DNN) to predict the runtime of the software from its state complexity. We propose three models, based on a DNN, on statistics, and on a hybrid of the two, and the DNN-based model performs best. We also demonstrate the adaptability of our method by showing that it adapts to a new environment with 90% accuracy across various software.
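The abstract describes a two-step pipeline: translate the program's source code into finite automata and use their state complexity as a feature, then train a deep neural network that maps that feature (together with run parameters) to runtime. The sketch below is only a minimal illustration of that idea, not the authors' implementation: count_states is a hypothetical stand-in that counts control constructs and MPI calls instead of building a real automaton, the runtime values are made-up toy numbers, and scikit-learn's MLPRegressor stands in for the paper's DNN model.

```python
# Minimal sketch of the feature-to-runtime regression (illustrative only).
import re
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

CONTROL_TOKENS = ("if", "else", "for", "while", "switch", "case", "MPI_")

def count_states(source: str) -> int:
    """Crude stand-in for automaton state complexity: one state per control
    construct or MPI call found in the source text, plus an initial state."""
    return 1 + sum(len(re.findall(rf"\b{tok}", source)) for tok in CONTROL_TOKENS)

def make_features(sources, input_sizes, n_procs):
    """One feature vector per run: [state complexity, problem size, #processes]."""
    return np.array(
        [[count_states(s), n, p] for s, n, p in zip(sources, input_sizes, n_procs)],
        dtype=float,
    )

# Toy training data: (source text, problem size, process count) -> runtime in seconds.
sources = [
    "for (i = 0; i < n; i++) { MPI_Send(buf, n, MPI_INT, 1, 0, MPI_COMM_WORLD); }",
    "if (rank == 0) { MPI_Bcast(buf, n, MPI_INT, 0, MPI_COMM_WORLD); } while (k--) { work(); }",
]
X = make_features(sources, input_sizes=[1024, 4096], n_procs=[4, 16])
y = np.array([1.8, 12.5])  # illustrative runtimes, not measured data

# Small fully connected regressor standing in for the paper's DNN model.
dnn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0),
)
dnn.fit(X, y)
print(dnn.predict(X))  # predicted runtimes for the two example runs
```

In the paper the models are trained on measured runtimes of many MPI programs and configurations; the sketch only shows the shape of the state-complexity-to-runtime regression.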


Notes

  1. Code is available at https://github.com/hyunjoonc/stc-rtpred.


Acknowledgements

We thank the anonymous referees for a careful reading of an earlier version of the paper and for many useful suggestions that have improved the presentation. Cheon and Han were supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (2018-0-00276) and the AI Graduate School Program (2020-0-01361), and Ryu and Park were supported by KISTI (Grant No. K-19-L02-C06).

Author information

Corresponding author

Correspondence to Yo-Sub Han.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Hyunjoon Cheon and Jinseung Ryu have contributed equally to this work.

About this article

Cite this article

Cheon, H., Ryu, J., Ryou, J. et al. ARED: automata-based runtime estimation for distributed systems using deep learning. Cluster Comput 26, 2629–2641 (2023). https://doi.org/10.1007/s10586-021-03272-w
