ABSTRACT
The Cancer Registry of Norway (CRN) collects, curates, and manages data related to cancer patients in Norway, supported by an interactive, human-in-the-loop, socio-technical decision support software system. Automated testing of this system is indispensable; however, it is currently limited in CRN’s practice. To this end, we present an industrial case study that evaluates an AI-based system-level testing tool, EvoMaster, in terms of its effectiveness in testing CRN’s software system. In particular, we focus on GURI, CRN’s medical rule engine, which is a key component at the CRN. We test GURI with EvoMaster’s black-box and white-box tools and study their test effectiveness regarding code coverage, errors found, and domain-specific rule coverage. The results show that all EvoMaster tools achieve similar code coverage (around 19% line, 13% branch, and 20% method coverage) and find a similar number of errors (1 in GURI’s code). Concerning domain-specific coverage, EvoMaster’s black-box tool is the most effective in generating tests that lead to applied rules (100% of the aggregation rules and between 12.86% and 25.81% of the validation rules) and to diverse rule execution results: 86.84% to 89.95% of the aggregation rules and 0.93% to 1.72% of the validation rules pass, and 1.70% to 3.12% of the aggregation rules and 1.58% to 3.74% of the validation rules fail. We further observe that the results are consistent across 10 versions of the rules. Based on these results, we recommend using EvoMaster’s black-box tool to test GURI, since it provides good results and advances the current state of practice at the CRN. Nonetheless, EvoMaster needs to be extended to employ domain-specific optimization objectives to further improve test effectiveness. Finally, we conclude with lessons learned and potential research directions, which we believe are applicable in a general context.
Index Terms
- Automated Test Generation for Medical Rules Web Services: A Case Study at the Cancer Registry of Norway