ABSTRACT
In recent years, software development has shifted towards microservice-based architectures, which consist of small services that each focus on one particular functionality. Many companies are migrating their applications to such architectures to reap the benefits of microservices, such as increased flexibility, scalability, and the smaller granularity of the functionality offered by each service. On the one hand, the benefits of microservices for functional testing are often praised, as their focus on a single functionality and their smaller granularity allow for more targeted and more convenient testing. On the other hand, using microservices has consequences, both positive and negative, for other types of testing, such as performance testing. Performance testing is traditionally done by establishing the baseline performance of a software version, against which the performance testing results of later versions are compared. However, as we show in this paper, establishing such a baseline is challenging for microservice applications. In this paper, we discuss the benefits and challenges of microservices from a performance tester's point of view. Through a series of experiments on the TeaStore application, we demonstrate how microservices affect the performance testing process, and we show that it is not straightforward to achieve reliable performance testing results for a microservice application.
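The baseline-comparison approach described above can be sketched in a few lines. The following is an illustrative example, not the paper's own procedure: it compares the response-time samples of a baseline version and a new version using Cliff's delta (Long et al. 2003), with the magnitude thresholds proposed by Romano et al. (2006), both cited in the reference list below. The simulated latency data is a placeholder for real load-test measurements.

```python
import random

def cliffs_delta(baseline, candidate):
    """Cliff's delta effect size between two response-time samples.
    Positive values indicate that `candidate` tends to be slower than `baseline`."""
    gt = sum(1 for b in baseline for c in candidate if c > b)
    lt = sum(1 for b in baseline for c in candidate if c < b)
    return (gt - lt) / (len(baseline) * len(candidate))

def classify(delta):
    """Magnitude thresholds from Romano et al. (2006)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

random.seed(42)
# Simulated response times in ms; the new version is roughly 10% slower.
baseline = [random.gauss(100, 10) for _ in range(200)]
candidate = [random.gauss(110, 10) for _ in range(200)]

delta = cliffs_delta(baseline, candidate)
print(f"Cliff's delta = {delta:.3f} ({classify(delta)})")
```

As the paper's experiments illustrate, the hard part in a microservice setting is not this comparison step but obtaining a stable baseline sample in the first place.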
- Carlos M. Aderaldo, Nabor C. Mendonça, Claus Pahl, and Pooyan Jamshidi. 2017. Benchmark Requirements for Microservices Architecture Research. In Proc. 1st IEEE/ACM International Workshop on Establishing the Community-Wide Infrastructure for Architecture-Based Software Engineering (ECASE@ICSE '17). IEEE, 8--13.
- Tarek M. Ahmed, Cor-Paul Bezemer, Tse-Hsun Chen, Ahmed E. Hassan, and Weiyi Shang. 2016. Studying the Effectiveness of Application Performance Management (APM) Tools for Detecting Performance Regressions for Web Applications: An Experience Report. In International Conference on Mining Software Repositories (MSR). ACM, 1--12.
- T. W. Anderson and D. A. Darling. 1952. Asymptotic Theory of Certain 'Goodness of Fit' Criteria Based on Stochastic Processes. Ann. Math. Statist., Vol. 23, 2 (1952), 193--212.
- Muhammad Moiz Arif, Weiyi Shang, and Emad Shihab. 2018. Empirical study on the discrepancy between performance testing results from virtual and physical environments. Empirical Software Engineering, Vol. 23, 3 (2018), 1490--1518.
- Len Bass, Ingo Weber, and Liming Zhu. 2015. DevOps: A Software Architect's Perspective. Addison-Wesley.
- Cor-Paul Bezemer, Elric Milon, Andy Zaidman, and Johan Pouwelse. 2014. Detecting and Analyzing I/O Performance Regressions. Journal of Software: Evolution and Process (JSEP), Vol. 26, 12 (2014), 1193--1212.
- Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khang, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanović, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. SIGPLAN Not., Vol. 41, 10 (2006), 169--190.
- André B. Bondi. 2014. Foundations of Software and System Performance Engineering: Process, Performance Modeling, Requirements, Testing, Scalability, and Practice. Addison-Wesley Professional.
- Maria Carla Calzarossa, Luisa Massari, and Daniele Tessera. 2016. Workload Characterization: A Survey Revisited. Comput. Surveys, Vol. 48, 3 (2016), 1--43.
- Emmanuel Cecchet, Julie Marguerite, and Willy Zwaenepoel. 2002. Performance and Scalability of EJB Applications. SIGPLAN Not., Vol. 37, 11 (2002), 246--261.
- Tse-Hsun Chen, Mark D. Syer, Weiyi Shang, Zhen Ming Jiang, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora. 2017. Analytics-driven Load Testing: An Industrial Experience Report on Load Testing of Large-scale Systems. In Proceedings of the 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP '17). IEEE Press, Piscataway, NJ, USA, 243--252.
- D. E. Damasceno Costa, C. Bezemer, P. Leitner, and A. Andrzejak. 2019. What's Wrong With My Benchmark Results? Studying Bad Practices in JMH Benchmarks. IEEE Transactions on Software Engineering (2019), 1--14.
- Sandeep Dinesh. 2018. Kubernetes best practices: Resource requests and limits. https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits
- Nicola Dragoni, Saverio Giallorenzo, Alberto Lluch Lafuente, Manuel Mazzara, Fabrizio Montesi, Ruslan Mustafin, and Larisa Safina. 2017. Microservices: yesterday, today, and tomorrow. In Present and Ulterior Software Engineering. Springer, Cham, 195--216.
- Thomas F. Düllmann, Robert Heinrich, André van Hoorn, Teerat Pitakrat, Jürgen Walter, and Felix Willnecker. 2017. CASPA: A Platform for Comparability of Architecture-Based Software Performance Engineering Approaches. In Proc. IEEE International Conference on Software Architecture (ICSA 2017) Workshops. IEEE, 294--297.
- Simon Eismann, Cor-Paul Bezemer, Weiyi Shang, Dušan Okanović, and André van Hoorn. 2019. Microservices: A Performance Tester's Dream or Nightmare? - Replication package. https://doi.org/10.5281/zenodo.3582707
- Christian Esposito, Aniello Castiglione, and Kim-Kwang Raymond Choo. 2016. Challenges in Delivering Software in the Cloud as Microservices. IEEE Cloud Computing, Vol. 3, 5 (2016), 10--14.
- King Chun Foo, Zhen Ming (Jack) Jiang, Bram Adams, Ahmed E. Hassan, Ying Zou, and Parminder Flora. 2015. An Industrial Case Study on the Automated Detection of Performance Regressions in Heterogeneous Environments. In Proc. of the 37th International Conference on Software Engineering (ICSE '15). IEEE, 159--168.
- R. Gao, Z. M. Jiang, C. Barna, and M. Litoiu. 2016. A Framework to Evaluate the Effectiveness of Different Load Testing Analysis Techniques. In 2016 IEEE International Conference on Software Testing, Verification and Validation (ICST). 22--32.
- Sen He, Glenna Manns, John Saunders, Wei Wang, Lori Pollock, and Mary Lou Soffa. 2019. A Statistics-based Performance Testing Methodology for Cloud Applications. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019). ACM, New York, NY, USA, 188--199.
- Robert Heinrich, André van Hoorn, Holger Knoche, Fei Li, Lucy Ellen Lwakatare, Claus Pahl, Stefan Schulte, and Johannes Wettinger. 2017. Performance Engineering for Microservices: Research Challenges and Directions. In Companion 8th ACM/SPEC on International Conference on Performance Engineering (ICPE 2017). ACM, 223--226.
- Nikolas Roman Herbst, Samuel Kounev, and Ralf H. Reussner. 2013. Elasticity in Cloud Computing: What It Is, and What It Is Not. In Proc. 10th International Conference on Autonomic Computing (ICAC '13). 23--27.
- International Organization for Standardization (ISO). 2005. ISO/IEC 25000:2005, Software Engineering - Software Product Quality Requirements and Evaluation (SQuaRE).
- Pooyan Jamshidi, Claus Pahl, Nabor C. Mendonça, James Lewis, and Stefan Tilkov. 2018. Microservices: The Journey So Far and Challenges Ahead. IEEE Software, Vol. 35, 3 (2018), 24--35.
- Zhen Ming Jiang and Ahmed E. Hassan. 2015. A Survey on Load Testing of Large-Scale Software Systems. IEEE Transactions on Software Engineering, Vol. 41, 11 (2015), 1091--1118.
- Holger Knoche. 2016. Sustaining Runtime Performance While Incrementally Modernizing Transactional Monolithic Software Towards Microservices. In Proc. 7th ACM/SPEC on International Conference on Performance Engineering (ICPE '16). ACM, 121--124.
- Christoph Laaber and Philipp Leitner. 2018. An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment. In Proc. 15th International Conference on Mining Software Repositories (MSR '18). IEEE.
- Christoph Laaber, Joel Scheuner, and Philipp Leitner. 2019. Software Microbenchmarking in the Cloud. How Bad is It Really? Empirical Software Engineering, Vol. 24, 4 (2019), 2469--2508.
- Philipp Leitner and Cor-Paul Bezemer. 2017. An Exploratory Study of the State of Practice of Performance Testing in Java-Based Open Source Projects. In Proc. 8th ACM/SPEC on International Conference on Performance Engineering (ICPE '17). ACM, 373--384.
- Philipp Leitner and Jürgen Cito. 2016. Patterns in the Chaos -- A Study of Performance Variation and Predictability in Public IaaS Clouds. ACM Trans. Internet Technol., Vol. 16, 3, Article 15 (2016), 1--23.
- James Lewis and Martin Fowler. 2014. Microservices. Retrieved June 3, 2018 from https://martinfowler.com/articles/microservices.html
- David J. Lilja. 2005. Measuring Computer Performance: A Practitioner's Guide. Cambridge University Press.
- Jeffrey D. Long, Du Feng, and Norman Cliff. 2003. Ordinal Analysis of Behavioral Data. John Wiley & Sons, Inc.
- Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. 2014. An Empirical Analysis of Flaky Tests. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE). ACM, 643--653.
- H. B. Mann and D. R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Statist., Vol. 18, 1 (1947), 50--60.
- Aleksander Maricq, Dmitry Duplyakin, Ivo Jimenez, Carlos Maltzahn, Ryan Stutsman, and Robert Ricci. 2018. Taming performance variability. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 409--425.
- Sam Newman. 2015. Building Microservices (1st ed.). O'Reilly Media, Inc., Sebastopol, California, USA.
- A. V. Papadopoulos, L. Versluis, A. Bauer, N. Herbst, J. Von Kistowski, A. Alieldin, C. Abad, J. N. Amaral, P. Tůma, and A. Iosup. 2019. Methodological Principles for Reproducible Performance Evaluation in Cloud Computing. IEEE Transactions on Software Engineering (2019), 1--1.
- Florian Rademacher, Jonas Sorgalla, and Sabine Sachweh. 2018. Challenges of Domain-Driven Microservice Design: A Model-Driven Perspective. IEEE Software, Vol. 35, 3 (2018), 36--43.
- Jeanine Romano, Jeffrey D. Kromrey, Jesse Coraggio, Jeff Skowronek, and Linda Devine. 2006. Exploring methods for evaluating group differences on the NSSE and other surveys: Are the t-test and Cohen's d indices the most appropriate choices? In Annual Meeting of the Southern Association for Institutional Research.
- Jean-Mathieu Saponaro. 2016. Monitoring Kubernetes performance metrics. Retrieved June 6, 2018 from https://www.datadoghq.com/blog/monitoring-kubernetes-performance-metrics/
- Joel Scheuner and Philipp Leitner. 2018. A Cloud Benchmark Suite Combining Micro and Applications Benchmarks. In Companion 2018 ACM/SPEC International Conference on Performance Engineering (ICPE '18). 161--166.
- Weiyi Shang, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora. 2015. Automated Detection of Performance Regressions Using Regression Models on Clustered Performance Counters. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering (ICPE '15). ACM, New York, NY, USA, 15--26.
- S. Elliot Sim, Steve Easterbrook, and Richard C. Holt. 2003. Using benchmarking to advance research: a challenge to software engineering. In Proceedings of the 25th International Conference on Software Engineering (ICSE). 74--83.
- Nikolai V. Smirnov. 1939. Estimate of deviation between empirical distribution functions in two independent samples. Bulletin Moscow University, Vol. 2, 2 (1939), 3--16.
- Petr Stefan, Vojtech Horky, Lubomir Bulej, and Petr Tuma. 2017. Unit Testing Performance in Java Projects: Are We There Yet?. In Proc. of the 8th ACM/SPEC on International Conference on Performance Engineering (ICPE '17). ACM, 401--412.
- Alexandru Uta and Harry Obaseki. 2018. A Performance Study of Big Data Workloads in Cloud Datacenters with Network Variability. In Companion 2018 ACM/SPEC International Conference on Performance Engineering (ICPE '18). 113--118.
- Christian Vögele, André van Hoorn, Eike Schulz, Wilhelm Hasselbring, and Helmut Krcmar. 2018. WESSBAS: Extraction of Probabilistic Workload Specifications for Load Testing and Performance Prediction--A Model-Driven Approach for Session-Based Application Systems. Softw. and Syst. Modeling, Vol. 17, 2 (2018), 443--477.
- Jóakim von Kistowski, Maximilian Deffner, and Samuel Kounev. 2018a. Run-time Prediction of Power Consumption for Component Deployments. In Proceedings of the 15th IEEE International Conference on Autonomic Computing (ICAC 2018).
- Jóakim von Kistowski, Simon Eismann, Norbert Schmitt, André Bauer, Johannes Grohmann, and Samuel Kounev. 2018b. TeaStore: A Micro-Service Reference Application for Benchmarking, Modeling and Resource Management Research. In Proceedings of the 26th IEEE International Symposium on the Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '18).
- Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics Bulletin, Vol. 1, 6 (1945), 80--83.
- Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, and Björn Regnell. 2012. Experimentation in Software Engineering. Springer.
- Pengcheng Xiong, Calton Pu, Xiaoyun Zhu, and Rean Griffith. 2013. PerfGuard: An Automated Model-driven Framework for Application Performance Diagnosis in Consolidated Cloud Environments. In Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering (ICPE '13). ACM, New York, NY, USA, 271--282.