Manual test case derivation from UML activity diagrams and state machines: A controlled experiment
Introduction
Model-based testing is a variant of system testing that relies on explicit models encoding the intended behavior of a system under test (SUT) [1]. An advantage of model-based testing is that it links tests directly to the SUT’s requirements, which improves the readability, understandability, and maintainability of the tests. Furthermore, it helps to ensure a repeatable basis for testing and to provide good coverage of all behaviors of the SUT [2].
For the derivation of test cases, model-based testing relies on behavior models of the system, which in practice are often UML activity diagrams or state machines [3], as these two diagram types are most frequently used to model system requirements [4]. The derived tests are therefore system tests concerned with testing an entire system based on its specification [5]. As indicated by an empirical case study [6], both automatically and manually derived model-based test suites have the potential to detect significantly more requirements defects than handcrafted test suites derived directly from the requirements. The fully automated derivation of test cases from UML activity diagrams or state machines, however, remains challenging: it requires high-quality UML models that contain all information needed for automatic test case derivation, and such models are rarely available in practice. In addition, there is empirical evidence [6] that automatically generated model-based test suites do not detect more errors than hand-crafted model-based test suites with the same number of tests.
In this paper, we therefore investigate the case that is, in practice, still more relevant than automation: a system tester analyzes a UML activity diagram or state machine and manually derives several test cases from it in order to achieve test coverage. Such testers are typically key users or domain experts without in-depth testing experience and knowledge [7]. Like any complex manual activity, and especially given the often missing testing expertise of system testers, this kind of test case derivation is error-prone, which impacts the quality of the derived test cases. Knowing which errors can be made when manually deriving test cases from UML activity diagrams or state machines, as well as how comprehensible and suitable these diagrams are for test case derivation, is valuable for providing guidelines that help testers derive test cases systematically while avoiding these errors.
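To make concrete what deriving test cases for coverage means, consider a minimal sketch in Python (our illustration, not part of the study; the diagram and all node names are hypothetical): a toy activity diagram is encoded as a directed graph, and every start-to-end path is enumerated, with each path corresponding to one candidate test case and the set of paths together covering every branch.

    # Toy activity diagram of a hypothetical login flow, encoded as a directed graph.
    edges = {
        "start": ["check_credentials"],
        "check_credentials": ["grant_access", "show_error"],  # decision node with two branches
        "grant_access": ["end"],
        "show_error": ["end"],
    }

    def all_paths(node, path=()):
        """Yield every start-to-end path; each path is one candidate test case."""
        path = path + (node,)
        if node == "end":
            yield path
            return
        for successor in edges.get(node, []):
            yield from all_paths(successor, path)

    for number, path in enumerate(all_paths("start"), start=1):
        print(f"Test case {number}: {' -> '.join(path)}")

For this toy diagram the sketch yields two test cases, one per branch of the decision node, which is exactly the kind of path selection a tester performs mentally on the diagram.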
The objective of the study presented in this paper is therefore to examine which errors are possible and actually made when manually deriving test cases from UML activity diagrams or state machines, and whether the two diagram types differ with regard to manual test case derivation. Both serve as system models from a behavioral perspective and are often used as alternatives in practice [4]. Knowing which of the two better serves the purpose of test case derivation can therefore be useful in practice when selecting a UML model type for system modeling with testing aspects in mind. In industry, it is common to have system tests executed by test personnel with some domain knowledge but only little experience in systematic testing [7], for instance by key users. The required test design skills are then often provided in short trainings [7]. This situation is comparable to a classroom setting if domains familiar to the students and suitable trainings are provided. We therefore investigate the difference between the two diagram types, i.e., UML activity diagrams and state machines, in a controlled experiment with a total of 84 students divided into three groups at two institutions: the experiment was performed with two groups at Duale Hochschule Baden–Württemberg in Karlsruhe (Germany), and its internal replication [8] was performed by the same researchers at the University of Innsbruck (Austria). From the results, we derive a taxonomy of errors, identify the most frequent errors for each diagram type, and determine differences between the two diagram types with regard to perceived comprehensibility and errors made.
As a result, we provide a taxonomy of errors made and their frequencies. In addition, our experiment and its internal replication provide evidence that activity diagrams are perceived to be more comprehensible but are also more error-prone with regard to manual test case derivation.
This paper follows established guidelines for reporting experiments in software engineering [9], [10] and is structured as follows. In Section 2, we provide an overview of related work. In Section 3, we present the experiment planning and execution. In Section 4, we present the experiment results and their analysis. In Section 5, we discuss the interpretation of the results and threats to validity. Finally, in Section 6, we outline conclusions and future work.
Related work
The manual derivation of test cases from UML models has not been investigated empirically before. However, there are two types of related work: (1) empirical studies on the comprehensibility of UML models and on the manual derivation of test cases, discussed in Section 2.1, and (2) methods for semi-automatically deriving test cases from UML models, discussed in Section 2.2. In the remainder of this paper, we sometimes omit the term “UML” when referring to UML activity diagrams or state machines.
Experiment planning and execution
In this section, we discuss the planning and execution of the experiment, which includes the goals and investigated research questions, the participants, the experiment tasks and material, the variables and hypotheses, the experiment design and procedure, the execution of the experiment, as well as the applied analysis procedure.
Results
In this section, we present the results and their interpretation according to the stated research questions.
To answer RQ1, we collected the types of errors and categorized them according to the main affected artifact, i.e., the precondition, input data, expected result, overall test step including the determining operation call, test case, or the complete test suite. The resulting taxonomy of error types is shown in Table 6. This taxonomy covers, for each system model, i.e., activity diagram for …
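To illustrate the artifacts that the taxonomy distinguishes, the following Python sketch (our illustration, not the authors’ instrument; all names and values are hypothetical) models a derived test case as a precondition plus a sequence of test steps, each consisting of an operation call, input data, and an expected result, so that an observed error can be attributed to exactly one of these parts.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class TestStep:
        operation_call: str          # operation invoked by the step (error category: test step)
        input_data: Dict[str, str]   # concrete inputs (error category: input data)
        expected_result: str         # oracle of the step (error category: expected result)

    @dataclass
    class TestCase:
        precondition: str            # required system state before execution (error category: precondition)
        steps: List[TestStep] = field(default_factory=list)

    # A hypothetical test case as a tester might derive it from a login activity diagram.
    test_case = TestCase(
        precondition="account 'alice' exists and is not locked",
        steps=[
            TestStep("login", {"user": "alice", "password": "wrong"}, "error message shown"),
            TestStep("login", {"user": "alice", "password": "secret"}, "dashboard displayed"),
        ],
    )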
Discussion and threats to validity
In this section, we interpret the results of the previous section, compare them to related work and discuss threats to validity.
Conclusion and future work
In this paper, we empirically evaluated the manual derivation of test cases from UML activity diagrams and state machines in a controlled experiment with 84 student participants as experimental subjects. The students were divided into three groups at two institutions: the experiment was performed with two groups at Duale Hochschule Baden–Württemberg in Karlsruhe (Germany), and its internal replication was performed by the same researchers at the University of Innsbruck (Austria). The …
Acknowledgements
This work was sponsored by the projects QE LaB – Living Models for Open Systems (FFG 882740) and MOBSTECO (FWF P 26194-N15). In addition, we thank all participants of the experiment for their time and concentration.
References
- Quality and comprehension of UML interaction diagrams: an experimental comparison, Inf. Softw. Technol. (2005)
- Evaluation of the comprehension of the dynamic modeling in UML, Inf. Softw. Technol. (2004)
- Level of detail in UML models and its impact on model comprehension: a controlled experiment, Inf. Softw. Technol. (2009)
- Empirical assessment of using stereotypes to improve comprehension of UML models: a set of experiments, J. Syst. Softw. (2006)
- Assessing the influence of stereotypes on the comprehension of UML sequence diagrams: a family of experiments, Inf. Softw. Technol. (2011)
- Empirical studies concerning the maintenance of UML diagrams and their use in the maintenance of code: a systematic mapping study, Inf. Softw. Technol. (2013)
- A taxonomy of model-based testing approaches, Softw. Test. Verif. Rel. (2012)
- Model-Based Testing for Embedded Systems (2011)
- Practical Model-Based Testing: A Tools Approach (2010)
- K. Pohl, C. Rupp, Requirements Engineering Fundamentals: A Study Guide for the Certified Professional for Requirements…
- A UML-based approach to system testing, Softw. Syst. Model.
- Reporting experiments in software engineering
- Experimentation in Software Engineering
- Empirical evidence about the UML: a systematic literature review, Softw. Pract. Exp.
- An experimental investigation of formality in UML-based development, IEEE Trans. Softw. Eng.
- How developers’ experience and ability influence web application comprehension tasks supported by UML stereotypes: a series of four experiments, IEEE Trans. Softw. Eng.
- An experimental comparison of ER and UML class diagrams for data modelling, Empirical Softw. Eng.
- Assessing the effectiveness of sequence diagrams in the comprehension of functional requirements: results from a family of five experiments, IEEE Trans. Softw. Eng.