DOI: 10.1145/3358960.3379124

Microservices: A Performance Tester's Dream or Nightmare?

Published: 20 April 2020

ABSTRACT

In recent years, there has been a shift in software development towards microservice-based architectures, which consist of small services that each focus on one particular functionality. Many companies are migrating their applications to such architectures to reap the benefits of microservices, such as increased flexibility, scalability, and a finer granularity of the functionality offered by each service. On the one hand, the benefits of microservices for functional testing are often praised, as the focus on a single functionality and the smaller granularity of services allow for more targeted and more convenient testing. On the other hand, using microservices has consequences (both positive and negative) for other types of testing, such as performance testing. Performance testing is traditionally done by establishing the baseline performance of a software version, against which the performance testing results of later software versions are compared. However, as we show in this paper, establishing such a baseline performance is challenging for microservice applications. In this paper, we discuss the benefits and challenges of microservices from a performance tester's point of view. Through a series of experiments on the TeaStore application, we demonstrate how microservices affect the performance testing process, and we show that it is not straightforward to achieve reliable performance testing results for a microservice application.
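The baseline-comparison approach mentioned above can be made concrete with a small example. The sketch below is not taken from the paper; it only illustrates, under assumed sample data, an assumed significance threshold, and a non-parametric test chosen for illustration, how response-time measurements of a new version might be compared against a stored baseline.

```python
# Minimal sketch (illustrative only): compare response-time samples of a new
# release against a previously recorded baseline using a Mann-Whitney U test.
# The sample data, significance level, and decision rule are assumptions made
# for this example, not values or methods prescribed by the paper.
from statistics import median
from scipy.stats import mannwhitneyu

# Hypothetical response times in milliseconds for one service endpoint.
baseline_ms = [101, 98, 105, 110, 97, 102, 99, 103, 108, 100]
candidate_ms = [120, 118, 125, 119, 130, 117, 122, 128, 121, 124]

# Two-sided non-parametric test: do the two samples come from different distributions?
statistic, p_value = mannwhitneyu(baseline_ms, candidate_ms, alternative="two-sided")

ALPHA = 0.05  # assumed significance threshold
if p_value < ALPHA and median(candidate_ms) > median(baseline_ms):
    print(f"Possible performance regression (p={p_value:.4f})")
else:
    print(f"No significant slowdown detected (p={p_value:.4f})")
```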


Published in

ICPE '20: Proceedings of the ACM/SPEC International Conference on Performance Engineering
April 2020, 319 pages
ISBN: 978-1-4503-6991-6
DOI: 10.1145/3358960
Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICPE '20 paper acceptance rate: 15 of 62 submissions (24%). Overall acceptance rate: 252 of 851 submissions (30%).
