Manual test case derivation from UML activity diagrams and state machines: A controlled experiment
Introduction
Model-based testing is a variant of system testing that relies on explicit models encoding the intended behavior of a system under test (SUT) [1]. An advantage of model-based testing is that it links tests directly to the SUT’s requirements, which improves the readability, understandability, and maintainability of the tests. Furthermore, it helps to ensure a repeatable basis for testing and to provide good coverage of all behaviors of the SUT [2].
For the derivation of test cases, model-based testing relies on behavior models of the system, which in practice are often UML activity diagrams or state machines [3], as these two diagram types are most frequently used to model system requirements [4]. The derived tests are therefore system tests concerned with testing an entire system based on its specification [5]. As indicated by an empirical case study [6], both automatically and manually derived model-based test suites have the potential to detect significantly more requirements defects than handcrafted test suites derived directly from the requirements. The fully automated derivation of test cases from UML activity diagrams or state machines, however, remains challenging: it requires high-quality UML models that contain all information needed for automatic test case derivation, and such models are rarely available in practice. In addition, there is empirical evidence [6] that automatically generated model-based test suites do not detect more errors than hand-crafted model-based test suites with the same number of tests.
In this paper, we therefore investigate the case that is, in practice, still more relevant than automation: a system tester analyzes a UML activity diagram or state machine and manually derives several test cases from it in order to achieve test coverage. Such testers are typically key users or domain experts without in-depth testing experience and knowledge [7]. Like any complex manual activity, and especially given the often missing testing expertise of system testers, this kind of test case derivation is error-prone, which impacts the quality of the derived test cases. Knowing which errors can be made when manually deriving test cases from UML activity diagrams or state machines, as well as how comprehensible and suitable these diagrams are for test case derivation, is valuable for providing guidelines that help testers derive test cases systematically while avoiding these errors.
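To make concrete what deriving test cases for coverage means, consider a minimal sketch in Python (our illustration, not part of the study; the diagram and all node names are hypothetical): a toy activity diagram is encoded as a directed graph, and every start-to-end path is enumerated, with each path corresponding to one candidate test case and the set of paths together covering every branch.

    # Toy activity diagram of a hypothetical login flow, encoded as a directed graph.
    edges = {
        "start": ["check_credentials"],
        "check_credentials": ["grant_access", "show_error"],  # decision node with two branches
        "grant_access": ["end"],
        "show_error": ["end"],
    }

    def all_paths(node, path=()):
        """Yield every start-to-end path; each path is one candidate test case."""
        path = path + (node,)
        if node == "end":
            yield path
            return
        for successor in edges.get(node, []):
            yield from all_paths(successor, path)

    for number, path in enumerate(all_paths("start"), start=1):
        print(f"Test case {number}: {' -> '.join(path)}")

For this toy diagram the sketch yields two test cases, one per branch of the decision node, which is exactly the kind of path selection a tester performs mentally on the diagram.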
The objective of the study presented in this paper is therefore to examine which errors are possible and actually made when manually deriving test cases from UML activity diagrams or state machines, and whether the two diagram types differ with regard to manual test case derivation. Both serve as system models from a behavioral perspective and are often used as alternatives in practice [4]. Knowing which of the two better serves the purpose of test case derivation can therefore be useful in practice when selecting a UML model type for system modeling with testing aspects in mind. In industry, it is common to have system tests executed by test personnel with some domain knowledge but only little experience in systematic testing [7], for instance by key users. The required test design skills are then often provided in short trainings [7]. This situation is comparable to a classroom setting if domains familiar to the students and suitable trainings are provided. We therefore investigate the difference between the two diagram types, i.e., UML activity diagrams and state machines, in a controlled experiment with a total of 84 students divided into three groups at two institutions: the experiment was performed with two groups at Duale Hochschule Baden–Württemberg in Karlsruhe (Germany), and its internal replication [8] was performed by the same researchers at the University of Innsbruck (Austria). From the results, we derive a taxonomy of errors, identify the most frequent errors for each diagram type, and determine differences between the two diagram types with regard to perceived comprehensibility and errors made.
As a result, we provide a taxonomy of errors made and their frequencies. In addition, our experiment and its internal replication provide evidence that activity diagrams are perceived to be more comprehensible but are also more error-prone with regard to manual test case derivation.
This paper follows established guidelines for reporting experiments in software engineering [9], [10] and is structured as follows. In Section 2, we provide an overview of related work. In Section 3, we present the experiment planning and execution. In Section 4, we present the experiment results and their analysis. In Section 5, we discuss the interpretation of the results and threats to validity. Finally, in Section 6, we outline conclusions and future work.
Related work
The manual derivation of test cases from UML models has not been investigated empirically before. However, there are two types of related work: (1) empirical studies on the comprehensibility of UML models and on the manual derivation of test cases, discussed in Section 2.1, and (2) methods for semi-automatically deriving test cases from UML models, discussed in Section 2.2. In the remainder of this paper, we sometimes omit the term “UML” when referring to UML activity diagrams or state machines.
Experiment planning and execution
In this section, we discuss the planning and execution of the experiment, which includes the goals and investigated research questions, the participants, the experiment tasks and material, the variables and hypotheses, the experiment design and procedure, the execution of the experiment, as well as the applied analysis procedure.
Results
In this section, we present the results and their interpretation according to the stated research questions.
To answer RQ1, we collected the types of errors and categorized them according to the main affected artifact, i.e., the precondition, input data, expected result, overall test step including the determining operation call, test case, or the complete test suite. The resulting taxonomy of error types is shown in Table 6. This taxonomy covers, for each system model, i.e., activity diagram for …
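To illustrate the artifacts that the taxonomy distinguishes, the following Python sketch (our illustration, not the authors’ instrument; all names and values are hypothetical) models a derived test case as a precondition plus a sequence of test steps, each consisting of an operation call, input data, and an expected result, so that an observed error can be attributed to exactly one of these parts.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class TestStep:
        operation_call: str          # operation invoked by the step (error category: test step)
        input_data: Dict[str, str]   # concrete inputs (error category: input data)
        expected_result: str         # oracle of the step (error category: expected result)

    @dataclass
    class TestCase:
        precondition: str            # required system state before execution (error category: precondition)
        steps: List[TestStep] = field(default_factory=list)

    # A hypothetical test case as a tester might derive it from a login activity diagram.
    test_case = TestCase(
        precondition="account 'alice' exists and is not locked",
        steps=[
            TestStep("login", {"user": "alice", "password": "wrong"}, "error message shown"),
            TestStep("login", {"user": "alice", "password": "secret"}, "dashboard displayed"),
        ],
    )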
Discussion and threats to validity
In this section, we interpret the results of the previous section, compare them to related work and discuss threats to validity.
Conclusion and future work
In this paper, we empirically evaluated the manual derivation of test cases from UML activity diagrams and state machines in a controlled experiment with 84 student participants as experimental subjects. The students were divided into three groups at two institutions: the experiment was performed with two groups at Duale Hochschule Baden–Württemberg in Karlsruhe (Germany), and its internal replication was performed by the same researchers at the University of Innsbruck (Austria). The …
Acknowledgements
This work was sponsored by the projects QE LaB – Living Models for Open Systems (FFG 882740) and MOBSTECO (FWF P 26194-N15). In addition, we thank all participants of the experiment for their time and concentration.
References
- Quality and comprehension of UML interaction diagrams: an experimental comparison, Inf. Softw. Technol. (2005)
- Evaluation of the comprehension of the dynamic modeling in UML, Inf. Softw. Technol. (2004)
- Level of detail in UML models and its impact on model comprehension: a controlled experiment, Inf. Softw. Technol. (2009)
- Empirical assessment of using stereotypes to improve comprehension of UML models: a set of experiments, J. Syst. Softw. (2006)
- Assessing the influence of stereotypes on the comprehension of UML sequence diagrams: a family of experiments, Inf. Softw. Technol. (2011)
- Empirical studies concerning the maintenance of UML diagrams and their use in the maintenance of code: a systematic mapping study, Inf. Softw. Technol. (2013)
- A taxonomy of model-based testing approaches, Softw. Test. Verif. Rel. (2012)
- Model-Based Testing for Embedded Systems (2011)
- Practical Model-Based Testing: A Tools Approach (2010)
- K. Pohl, C. Rupp, Requirements Engineering Fundamentals: A Study Guide for the Certified Professional for Requirements…
- A UML-based approach to system testing, Softw. Syst. Model.
- Reporting experiments in software engineering
- Experimentation in Software Engineering
- Empirical evidence about the UML: a systematic literature review, Softw. Pract. Exp.
- An experimental investigation of formality in UML-based development, IEEE Trans. Softw. Eng.
- How developers’ experience and ability influence web application comprehension tasks supported by UML stereotypes: a series of four experiments, IEEE Trans. Softw. Eng.
- An experimental comparison of ER and UML class diagrams for data modelling, Empirical Softw. Eng.
- Assessing the effectiveness of sequence diagrams in the comprehension of functional requirements: results from a family of five experiments, IEEE Trans. Softw. Eng.