1 Introduction

Model-based testing (MBT) (Utting and Legeard 2007) is an emerging field that has attracted interest from both academia and industry in recent years. It provides the benefit of automatic test case generation from abstract models that capture, for instance, software requirements. Despite the fact that automation is critical to the practice of MBT, test case generation for industrial-size applications can often produce large test suites that may not be cost-effective. The reason is that, most of the time, automatic generation algorithms are based on a structural and systematic search for test cases constrained by test criteria. With the goal of improving the effectiveness of the suite by achieving coverage, algorithms may generate several similar test cases, depending on the model structure. In order to handle this problem, the testing team can perform additional test selection before test execution. However, test selection may profoundly impact the success of the testing process as a whole: important test cases such as the ones that uncover faults may not be selected (Pezzè and Young 2007). Therefore, there has been extensive research and practical interest in the automated test case selection problem for MBT (Anand et al. 2013).

Test suite reduction aims to produce a representative subset of the original test suite that satisfies a set of test requirements with the same coverage as the original test suite (Harrold et al. 1993; Chen and Lau 1998a). The idea is to have in the subset the most representative test cases, chosen according to the capability of either covering more test requirements or uniquely covering one or more requirements. For instance, four well-known heuristics for code-based test suite reduction follow these ideas: Greedy (Chvátal 1979; Cormen et al. 2001), GE (Chen and Lau 1998b), GRE (Chen and Lau 1998a), and HGS (Harrold et al. 1993). Empirical studies have shown that requirements-based reduction may be effective at reducing the size of the suite, but it may also reduce the capability of fault detection (Fraser and Wotawa 2007; Yoo and Harman 2012). To address this problem, other approaches in the literature classify test cases according to a degree of similarity measured by a distance function (da Silva Simao et al. 2006; Kovács et al. 2009; Bertolino et al. 2010; Coutinho et al. 2013). Empirical studies on test case selection based on similarity have shown that test case diversity may improve the rate of fault detection (Chen et al. 2010; Hemmati et al. 2013; Cartaxo et al. 2011).

Intuitively, the choice of a distance function may directly influence the performance of test reduction strategies. For instance, the function can tune a technique to the extent that it becomes capable of revealing differences that speed up the achievement of coverage while, at the same time, diversifying the choice of test cases to improve fault coverage (FC). Another important issue is that, since reduction strategies often face draws and handle them by random selection, distance functions may also influence the stability of the technique, that is, how much the selected test cases and the achieved fault coverage vary across subsequent runs of the technique.

Applications of distance functions spread across different contexts such as medicine (Felipe et al. 2003), speech recognition (Thakur and Sahayam 2013), and image recognition (Felipe et al. 2006). Moreover, there are many distance functions proposed in the literature, usually applied to specific applications or contexts where they are recognized as more effective (Akleman and Chen 1999). For instance, the use of distance functions and equivalence relations is the basis of several fault localization strategies (Renieres and Reiss 2003; Xie et al. 2013).

More specifically, in the context of software testing, efforts have already been made to compare distance functions for both test case selection (Hemmati et al. 2013) and prioritization (Ledru et al. 2009). On the one hand, empirical studies have already shown that the choice of the function may influence fault detection capability for the general test selection and test case prioritization problems (Yoo and Harman 2012; Hemmati et al. 2013). Particularly, Hemmati et al. (2013) present a study on test selection strategies based on similarity where they consider the choice of different distance functions combined with other parameters to decide on the best strategy for test case selection. Among results on 320 variants applied to two industrial case studies, top candidates emerge, even though the differences found are minor. Generally, studies point to the need for more investigation. On the other hand, to the best of our knowledge, there are no studies comparing the effectiveness of distance functions applied to test suite reduction strategies for MBT. Different from test selection strategies, where the tester may decide on the number of test cases to select, test suite reduction strategies rely on requirements coverage. In this sense, the choice of a distance function may influence the size of the reduced suite, as it may or may not optimize coverage.

The goal of this work was to investigate the effectiveness of distance functions for test suite reduction in the context of MBT. For this, we apply a similarity strategy for test suite reduction proposed by Coutinho et al. (2013) by considering six distance functions: Similarity function, Levenshtein distance, Sellers algorithm, Jaccard index, Jaro distance, and Jaro–Winkler distance. We evaluate effectiveness by comparing the rates of test suite size reduction (SSR) and fault coverage (FC). Moreover, we observe the stability of the technique when considering the different functions according to the different subsets of test cases and faults. We focus on system-level testing and specifications modelled as labelled transition systems (LTS). LTS are largely considered by research and practice of MBT, including fundamental background, techniques, and tools (Anand et al. 2013). This paper presents three empirical studies. The first two are controlled experiments focusing on two real-world applications with real faults and 10 synthetic specification models automatically generated from the configuration of each application, such as the number of forks, transitions of forks, transitions of joins, joins, paths with loop, and depth. We generate synthetic models based on the strategy presented by Oliveira et al. (2013) and randomly define sets of faults for each generated model according to the percentage of faults obtained from the corresponding real-world specification. Test cases are sequences of transitions generated from each abstract specification by using a depth-search-based algorithm with all-one-loop-paths coverage as stop criterion, a common criterion applied in MBT (Utting and Legeard 2007; Cartaxo et al. 2008; Sapna and Mohanty 2009). As test requirement for the reduction strategy, we choose the all-transition-pairs criterion (Utting and Legeard 2007). This criterion is satisfied if all pairs of adjacent transitions in the specification are traversed at least once (Utting and Legeard 2007). By using test cases selected according to this criterion, all interactions between adjacent transitions can be tested, even if the reduction strategy discards a number of test cases from the original generated suite. Furthermore, in the third study, we apply the reduction strategy to two versions of a real-world industrial application with real faults collected from manual execution of test cases. Although we apply the same procedure of the first two studies for generating the application model and test cases as well as the same reduction strategy, this is a low-control study with no synthetic models, whose goal is further investigation in the context of an industrial application of MBT. Results show that the choice of the distance function has little influence on the size of the reduced test suite. However, the choice can significantly affect FC and stability.

In summary, the main contributions of this paper are as follows: (1) we investigate the impact of the choice of a distance function in the scope of a similarity-based reduction strategy for MBT; (2) we consider SSR, FC, and stability in controlled experiments with statistical analysis; and (3) we observe these measurements in the scope of a real application under development.

This paper is structured as follows. Section 2 presents basic concepts and Sect. 3 presents the distance functions considered in this work. Section 4 presents the reduction strategy applied to investigate the distance functions. Section 5 presents a description of the goals and planning of the first empirical studies. Section 6 presents the results and analysis through the metrics collected and their statistical validity, particularly explaining them in terms of observations of the test suites and the faults revealed. Section 7 presents a case study that investigates the effectiveness of the functions in the context of a real-world application with two versions. Section 8 discusses related work. Finally, Sect. 9 presents some conclusions and pointers for further research.

2 Background

2.1 Labelled transition system (LTS)

MBT is a black box testing approach based on the automatic generation of test cases from behavioral specifications (Utting and Legeard 2007). In this work, we focus on Labelled Transition Systems (LTSs)—a common formalism considered by both fundamental and practical research on MBT that is also usually adopted as the semantics formalism of specification notations (Tretmans 2008; Anand et al. 2013).

According to Tretmans (2008), an LTS can be formally defined as a 4-tuple \(\langle S, L, T, s_0 \rangle\), where

  • \(S\) is a finite, nonempty set of states;

  • \(L\) is a finite, nonempty set of labels of transitions;

  • \(T\) is a subset of \(S \times L \times S\) (set of triples), called the transition relation;

  • \(s_0\) is the initial state, where \(s_0 \in S\).
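As an illustration of how such a model can be represented for test generation purposes, the following Java sketch encodes an LTS as the 4-tuple above. The class and field names are ours, chosen only for illustration; they do not correspond to the LTS-BT implementation.

```java
import java.util.*;

/** Minimal sketch of an LTS <S, L, T, s0>; names are ours, not from LTS-BT. */
public class Lts {
    record Transition(int source, String label, int target) { }  // element of T ⊆ S × L × S

    final Set<Integer> states = new HashSet<>();            // S: finite, nonempty set of states
    final Set<String> labels = new HashSet<>();              // L: finite, nonempty set of labels
    final List<Transition> transitions = new ArrayList<>();  // T: transition relation
    final int initialState;                                  // s0 ∈ S

    Lts(int initialState) {
        this.initialState = initialState;
        states.add(initialState);
    }

    void addTransition(int source, String label, int target) {
        states.add(source);
        states.add(target);
        labels.add(label);
        transitions.add(new Transition(source, label, target));
    }
}
```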

Figure 1a presents an example of an LTS that combines basic, alternate, and exception flows of a use case. The use case defines the behavior of a user account editing operation where (1) we can change user name and password and (2) we can delete a user account. As a usual convention, labels ending with “?” denote actor input actions, whereas labels ending with “!” denote system output actions. We consider this example to illustrate concepts throughout the paper. However, for the sake of simplicity, we replace transition labels by letters (Fig. 1b). Figure 1c shows a test suite generated from the LTS based on the depth search test case generation algorithm, proposed by Araújo et al. (2012), with all-one-loop-paths as stop criterion.

Fig. 1

An example of an LTS specification and a test suite generated from it

In an LTS, a path is a finite or infinite sequence of transitions from the initial state. In this work, a test case is defined as a path. Paths can be classified as: (1) simple path, path without repeated states or transitions (\(\langle d, e, c \rangle\) from Fig. 1); (2) path with loop, path in which one or more states or transitions may be repeated, producing cycles (for example, \(\langle a, b, f, g, h, i \rangle\), \(\langle a, b, f, g, e \rangle\), \(\langle d, e, f, g \rangle\) and \(\langle d, h, i, g \rangle\) from Fig. 1). The depth of an LTS is calculated by considering the longest simple path. In the example presented in Fig. 1, the depth of the LTS is 5 defined by the path \(\langle a, b, f, g, h \rangle\).

Two kinds of special states can be identified in an LTS: (1) join is a state with more than one incoming transition (the example in Fig. 1 contains three joins: states 2, 3, and 5); (2) fork is a state with more than one outgoing transition (the example in Fig. 1 contains three forks: states 0, 2, and 3). Finally, we can define the transitions of joins and transitions of forks measures as the total number of incoming transitions of joins and outgoing transitions of forks of an LTS, respectively. The LTS from Fig. 1 has six transitions of joins and six transitions of forks.

It is important to remark that, in this paper, we consider models as abstraction of software requirements devoted for test case generation. We do not require models to be executable. The tester can choose between automated or manual execution of test cases.

2.2 Test suite reduction

According to Harrold et al. (1993), the test suite reduction problem can be defined as follows:

Given A test suite \(TS\), a set \(Req=\{Req_1, Req_2,\ldots , Req_n\}\) of test requirements to be covered, and subsets of \(TS\): \(TS_1, TS_2, \ldots , TS_n\), where each test case of \(TS_i\) can be used to test \(Req_i\);

Problem Find a minimal subset—the reduced set—\(RS \subseteq TS\) that satisfies all of the Req’s, that is, \(RS\) must have at least one test case for each \(Req_i\).

In general, finding \(RS\) is an NP-complete problem: the minimum set-covering problem, which is NP-complete, can be reduced to it (Cormen et al. 2001). Therefore, heuristics and approximations are often applied to compute \(RS\), such as the ones presented by Chen and Lau (1998b).

In order to apply a reduction strategy, it is necessary to define a satisfiability relationship between \(TS\) and \(Req\), relating each \(Req_i\) to the set of test cases \(TS_i\) that cover it.

For the LTS specification and the test cases in Fig. 1, by considering that the test criterion is all-transition-pairs coverage, the satisfiability relation is presented in Table 1. The most covered requirement is the pair (f, g)—(New changes saved!, Select another user?)—whereas the least covered requirement is the pair (b, c)—(Change password?, Limit of daily changes exceeded!)—with only one test case. Particularly, (b, c) is part of an exception flow that is often associated with critical failures in practice. In this case, \(t_1\) is an essential test case (one that uniquely covers a given requirement), so any test suite reduction strategy will keep it. However, it is important to remark that if we consider a weaker test criterion such as all-transitions, we would require at least one test case covering b and c, but not necessarily both in the same test case. In this case, we cannot guarantee that the reduction strategy will select \(t_1\). Therefore, the choice of the test criteria that define the requirements to be covered is critical to maximize the fault detection capability of the reduced suite.
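To make the satisfiability relation concrete, the following Java sketch builds, for the all-transition-pairs criterion, a map from each pair of adjacent transitions to the indexes of the test cases that cover it. The class and method names are ours, introduced only for illustration.

```java
import java.util.*;

/** Minimal sketch: satisfiability relation for all-transition-pairs coverage. */
public class SatisfiabilityRelation {

    /** Maps each covered pair of adjacent transitions to the test cases covering it. */
    static Map<List<String>, Set<Integer>> build(List<List<String>> suite) {
        Map<List<String>, Set<Integer>> relation = new HashMap<>();
        for (int tc = 0; tc < suite.size(); tc++) {
            List<String> path = suite.get(tc);
            for (int k = 0; k + 1 < path.size(); k++) {
                List<String> pair = List.of(path.get(k), path.get(k + 1));
                relation.computeIfAbsent(pair, x -> new HashSet<>()).add(tc);
            }
        }
        return relation;
    }

    public static void main(String[] args) {
        // only two of the test cases from Fig. 1c are reproduced here, as an illustration
        List<List<String>> suite = List.of(
            List.of("d", "e", "f", "g", "h", "i"),   // t11
            List.of("d", "e", "f", "g", "e", "f"));  // t13
        System.out.println(build(suite));
    }
}
```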

Furthermore, the choice of a distance function plays an important role in test suite reduction. It defines which test case(s) are part of the reduced suite, by selecting the most different ones that cover a given requirement. As we discuss in Sect. 3, different functions present different measures; therefore, we may get a different reduced set for each function.

Table 1 Satisfiability relation

3 Distance functions

In this section, we present the six distance functions applied in this work to calculate the similarity degree between pairs of test cases. These functions are good candidates for detecting sequencing, matching, and/or repetition of transitions.

While other works have already applied these functions to similarity-based selection strategies (Heß 2006; Vinson et al. 2007; Cartaxo et al. 2011; Hemmati et al. 2013; Fang et al. 2013), in this paper, we apply these functions in the context of test suite reduction for MBT. It is important to remark that some of them needed to be slightly adapted to consider transition labels as unit of comparison instead of characters. Moreover, despite the fact that there are many other distance functions presented in the literature, our goal was to investigate the effect of distance functions, in general, on test suite reduction. For the sake of simplicity, we opt to choose a small set with the ones that are included in other studies in the general area of test case selection.

As running example, we consider test cases \(t_{11}\) and \(t_{13}\) from Fig. 1c, which cover three test requirements in common: \((d, e), (f, g), (e, f)\) (Table 1). These test cases start with editing the user name, but differ in the subsequent operation, which is either another user name editing or the removal of a user.

3.1 Similarity function

Cartaxo et al. (2011) define a redundancy measure that calculates the similarity degree between two test cases defined as paths. The degree is measured as the number of identical transitions divided by the average of paths length. Note that this function does not consider the number of repetitions of a transition, for instance, if a loop is traversed more than once.

To address this limitation, we present here an extension of this redundancy measure that ensures that the degree of similarity between test cases without repeated transitions is identical to the value calculated by the original function. The key idea is to relate the number of identical transitions of a path, and their corresponding occurrences in both test cases (pairs), to the average path length and the average number of distinct transitions. Thus, to calculate the similarity degree between two test cases \(i\) and \(j\), considering repetition of transitions, we propose the following function

$$\begin{aligned} SF(i,j) = \frac{\frac{nip(i, j) + |sit(i,j)|}{2}}{\frac{\frac{|i| + |j|}{2} + \frac{|sdt(i)| + |sdt(j)|}{2}}{2}} = \frac{nip(i, j) + |sit(i,j)|}{\frac{|i| + |j| + |sdt(i)| + |sdt(j)|}{2}} \end{aligned}$$

where

  • \(nip(i,j)\) is the number of identical transition pairs between the two test cases;

  • \(sdt(i)\) is the set of distinct transitions in the \(i\) test case.

  • \(sit(i,j)\) is the set of identical transitions between two test cases, i.e., the intersection between \(sdt(i)\) and \(sdt(j)\);

  • \(O(|i| + |j|)\) is the time complexity.

For example, the similarity degree between \(t_{11} = \langle d, e, f, g, h, i\rangle\) and \(t_{13} = \langle d, e, f, g, e, f\rangle\) is calculated as follows:

  • Set of distinct transitions:

    • \(|sdt(t_{11})| = |\{d, e, f, g, h, i\}| = 6\);

    • \(|sdt(t_{13})| = |\{d, e, f, g\}| = 4\);

  • Set of identical transitions:

    • \(|sit(t_{11},t_{13})| = |sdt(t_{11}) \cap sdt(t_{13})| = |\{d, e, f, g\}| = 4\);

  • Number of identical transition pairs:

    • \(nip(t_{11}, t_{13}) = 4\), as presented in Table 2;

  • Paths length: \(|t_{11}| = 6\) and \(|t_{13}| = 6\).

Table 2 Identical transition pairs

Then,

$$\begin{aligned} SF(t_{11},t_{13}) = \frac{nip(t_{11}, t_{13}) + |sit(t_{11},t_{13})|}{\frac{|t_{11}| + |t_{13}| + |sdt(t_{11})| + |sdt(t_{13})|}{2}} = \frac{4 + 4}{\frac{6 + 6 + 6 + 4}{2}} = \frac{8}{11}= 0.727 \end{aligned}$$

Hence, the similarity degree of the test cases \(t_{11}\) and \(t_{13}\) is 72.7 %.
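For concreteness, the following Java sketch computes SF as defined above. The counting of identical transition pairs (nip) is our reading of the worked example: each occurrence of a transition in one test case is matched with at most one occurrence in the other. Class and method names are ours.

```java
import java.util.*;

/** Minimal sketch of the extended Similarity Function (SF). */
public class SimilarityFunction {

    /** nip(i, j): sum over distinct transitions of min(occurrences in i, occurrences in j). */
    static int nip(List<String> i, List<String> j) {
        Map<String, Integer> ci = counts(i), cj = counts(j);
        int pairs = 0;
        for (Map.Entry<String, Integer> e : ci.entrySet()) {
            pairs += Math.min(e.getValue(), cj.getOrDefault(e.getKey(), 0));
        }
        return pairs;
    }

    static Map<String, Integer> counts(List<String> testCase) {
        Map<String, Integer> c = new HashMap<>();
        for (String t : testCase) c.merge(t, 1, Integer::sum);
        return c;
    }

    /** SF(i, j) = (nip + |sit|) / ((|i| + |j| + |sdt(i)| + |sdt(j)|) / 2). */
    static double sf(List<String> i, List<String> j) {
        Set<String> sdtI = new HashSet<>(i), sdtJ = new HashSet<>(j);
        Set<String> sit = new HashSet<>(sdtI);
        sit.retainAll(sdtJ);                                  // identical transitions
        double numerator = nip(i, j) + sit.size();
        double denominator = (i.size() + j.size() + sdtI.size() + sdtJ.size()) / 2.0;
        return numerator / denominator;
    }

    public static void main(String[] args) {
        List<String> t11 = List.of("d", "e", "f", "g", "h", "i");
        List<String> t13 = List.of("d", "e", "f", "g", "e", "f");
        System.out.printf("SF(t11, t13) = %.3f%n", sf(t11, t13));   // expected 0.727
    }
}
```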

3.2 Levenshtein distance

Levenshtein (1966) proposes the edit distance function, called editDistance. This function compares two strings and determines the minimum number of edit operations (deletion, insertion, and substitution) necessary to transform one string into the other.

Consider two strings, \(A\) and \(B\), where \(i\) and \(j\) are, respectively, their lengths. Firstly, a matrix \(M\) with \((i + 1) \times (j +1)\) values is built, where the first row and the first column are initialized with values from 0 (incremented by 1) up to the size of each test case. The idea is to calculate the distances among all the prefixes of the first string \(A\) and all the prefixes of the second string \(B\) in a dynamic programming fashion. As the matrix is built, each cell depends only on the previous and the current row; the value at position \((p, q)\) is the minimum of the three possible ways to do the transformation:

  • deletion: \(M[(p-1,q)] + 1\);

  • insertion: \(M[(p,q-1)] + 1\);

  • substitution: \( M[(p-1,q-1)] + cost\), where \(cost = 0\) if \(A[p] = B[q]\), otherwise \(cost = 1\).

The value of \(M [i + 1, j + 1]\) reflects the minimum number of operations necessary to convert one test case into the other, i.e., the cost of the best sequence of edit operations. The degree of similarity can be calculated in the interval \([0, 1]\) by the following function:

$$\begin{aligned} Lev(A, B) = \frac{ \hbox {max} (i,j) - M[i + 1, j + 1]}{ \hbox {max} (i,j)} = 1 - \frac{M[i + 1, j + 1]}{ \hbox {max} (i,j)} \end{aligned}$$

where the time complexity is \(O(|A| * |B|)\).

For example, consider test cases \(t_{11}\) and \(t_{13}\). From Matrix 1, the similarity value between \(t_{11}\) and \(t_{13}\) calculated by the Levenshtein distance is 66.7 %, where \(i = |t_{11}| = 6\), \(j = |t_{13}| = 6\) and \(M[6 + 1, 6 + 1] = 2\) (the boxed value), obtained by the calculation

$$\begin{aligned} Lev(t_{11}, t_{13}) = 1 - \frac{M[6 + 1, 6 + 1]}{ \hbox {max} (6,6)} = 1 - \frac{2}{6} = \frac{4}{6} = 0.667 \end{aligned}$$
Matrix 1 (edit-distance matrix for \(t_{11}\) and \(t_{13}\), Levenshtein distance)
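A minimal Java sketch of this calculation, using transition labels instead of characters as the unit of comparison, is shown below; the class and method names are ours.

```java
import java.util.List;

/** Minimal sketch of Levenshtein-based similarity over transition labels. */
public class LevenshteinSimilarity {

    static int editDistance(List<String> a, List<String> b) {
        int[][] m = new int[a.size() + 1][b.size() + 1];
        for (int p = 0; p <= a.size(); p++) m[p][0] = p;          // first column: 0..|a|
        for (int q = 0; q <= b.size(); q++) m[0][q] = q;          // first row: 0..|b|
        for (int p = 1; p <= a.size(); p++) {
            for (int q = 1; q <= b.size(); q++) {
                int cost = a.get(p - 1).equals(b.get(q - 1)) ? 0 : 1;
                m[p][q] = Math.min(Math.min(m[p - 1][q] + 1,      // deletion
                                            m[p][q - 1] + 1),     // insertion
                                   m[p - 1][q - 1] + cost);       // substitution
            }
        }
        return m[a.size()][b.size()];
    }

    /** Lev(A, B) = 1 - editDistance(A, B) / max(|A|, |B|). */
    static double similarity(List<String> a, List<String> b) {
        return 1.0 - (double) editDistance(a, b) / Math.max(a.size(), b.size());
    }

    public static void main(String[] args) {
        List<String> t11 = List.of("d", "e", "f", "g", "h", "i");
        List<String> t13 = List.of("d", "e", "f", "g", "e", "f");
        System.out.printf("Lev(t11, t13) = %.3f%n", similarity(t11, t13));   // expected 0.667
    }
}
```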

3.3 Sellers algorithm

The algorithm proposed by Sellers (1980) is a variation of the editDistance algorithm (Levenshtein 1966) (presented in Sect. 3.2) that modifies the way the matrix is created. The idea is to search for a string (sub-chain) within another string, allowing a difference of at most \(k\) operations. Unlike the editDistance algorithm, the first row of the matrix is initialized with \(0\). This changes the calculation of the minimum number of operations needed to transform string \(A\) into string \(B\), by allowing any prefix of string \(B\) to be ignored. The degree of similarity is calculated by the same formula presented in Sect. 3.2.

$$\begin{aligned} Sel(A, B) = \frac{ \hbox{max} (i,j) - M[i + 1, j + 1]}{ \hbox{max} (i,j)} = 1 - \frac{M[i + 1, j + 1]}{ \hbox{max} (i,j)} \end{aligned}$$

For example, considering test cases \(t_{11}\) and \(t_{13}\), the Sellers algorithm creates Matrix 2, where \(i = |t_{11}| = 6\), \(j = |t_{13}| = 6\) and \(M[6 + 1, 6 + 1] = 2\) (the boxed value). So, \(t_{11}\) and \(t_{13}\) are 66.7 % redundant—the same value obtained by the Levenshtein distance. Note, however, that the underlying matrices are different.

Matrix 2 (edit-distance matrix for \(t_{11}\) and \(t_{13}\), Sellers algorithm)
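A corresponding sketch differs from the Levenshtein one only in the initialization of the first row; again, the names are ours.

```java
import java.util.List;

/** Minimal sketch of the Sellers variation of editDistance over transition labels. */
public class SellersSimilarity {

    static int sellersDistance(List<String> a, List<String> b) {
        int[][] m = new int[a.size() + 1][b.size() + 1];
        for (int p = 0; p <= a.size(); p++) m[p][0] = p;
        // the only change w.r.t. editDistance: the first row is initialized with 0,
        // so any prefix of the second test case can be skipped at no cost
        for (int q = 0; q <= b.size(); q++) m[0][q] = 0;
        for (int p = 1; p <= a.size(); p++) {
            for (int q = 1; q <= b.size(); q++) {
                int cost = a.get(p - 1).equals(b.get(q - 1)) ? 0 : 1;
                m[p][q] = Math.min(Math.min(m[p - 1][q] + 1, m[p][q - 1] + 1),
                                   m[p - 1][q - 1] + cost);
            }
        }
        return m[a.size()][b.size()];
    }

    static double similarity(List<String> a, List<String> b) {
        return 1.0 - (double) sellersDistance(a, b) / Math.max(a.size(), b.size());
    }

    public static void main(String[] args) {
        System.out.printf("Sel(t11, t13) = %.3f%n",
            similarity(List.of("d", "e", "f", "g", "h", "i"),
                       List.of("d", "e", "f", "g", "e", "f")));   // expected 0.667
    }
}
```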

3.4 Jaccard index

Jaccard’s index, proposed by Jaccard (1901), is a similarity measure between sample sets. Let \(A\) and \(B\) be two sets of labels. The measure can be defined by the following function

$$\begin{aligned} Jac(A, B) = \frac{|A \cap B|}{|A \cup B|} \end{aligned}$$

where time complexity is \(O(|A| + |B|)\).

In order to illustrate the Jaccard index, consider again test cases \(t_{11}=\langle d, e, f, g, h, i\rangle\) and \(t_{13}=\langle d, e, f, g, e, f\rangle\). Then, the calculation of Jaccard’s index for test cases \(t_{11}\) and \(t_{13}\) is the following:

$$\begin{aligned} Jac(t_{11}, t_{13}) = \frac{|t_{11} \cap t_{13}|}{|t_{11} \cup t_{13}|} = \frac{|\{d, e, f, g\}|}{|\{d, e, f, g, h, i\}|} = \frac{4}{6} = 0.6666 \end{aligned}$$

Thus, the similarity degree between \(t_{11}\) and \(t_{13}\) calculated by using the Jaccard index is 66.66 %.
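A minimal Java sketch of the Jaccard index over the sets of transition labels is shown below; the class and method names are ours.

```java
import java.util.*;

/** Minimal sketch of the Jaccard index over sets of transition labels. */
public class JaccardIndex {

    static double jaccard(List<String> a, List<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(new HashSet<>(b));     // A ∩ B
        Set<String> union = new HashSet<>(a);
        union.addAll(b);                              // A ∪ B
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.printf("Jac(t11, t13) = %.4f%n",
            jaccard(List.of("d", "e", "f", "g", "h", "i"),
                    List.of("d", "e", "f", "g", "e", "f")));      // expected 0.6667
    }
}
```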

3.5 Jaro distance

The Jaro distance, presented in Jaro (1989), is a measure of similarity between two strings. The idea of this measure is to calculate the similarity degree between two strings from the number of characters that exchange positions (transpositions) and the number of matching characters. Thus, given two strings \(s_1 = a_1 \ldots a_k\) and \(s_2 = b_1 \ldots b_l,\) the Jaro distance is defined as:

$$\begin{aligned} Jaro(s_1,s_2) = \left\{ \begin{array}{lll} 0 &{} \quad \hbox {if}\quad m = 0&{}\\ \frac{1}{3} \left( \frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right) &{} \quad \hbox {otherwise} &{} \end{array}\right. \end{aligned}$$

where

  • \(m\) is the number of matching characters;

  • \(t\) is half the number of transpositions;

  • \(O(|s_1| + |s_2|)\) is the time complexity.

For instance, the number of matchings between test cases \(t_{11} = \langle d, e, f, g, h, i\rangle\) and \(t_{13}=\langle d, e, f, g, e, f \rangle\) is \(m = 4\) and half the number of transpositions is \(t = 0\), then

$$\begin{aligned} Jaro(t_{11},t_{13}) = \frac{1}{3} \left( \frac{4}{6} + \frac{4}{6} + \frac{4 - 0}{4}\right) = \frac{2.333}{3} = 0.778 \end{aligned}$$

Thus, the similarity degree between \(t_{11}\) and \(t_{13}\) is 77.8 %.
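The following Java sketch implements the Jaro distance over transition labels. The matching-window rule (a symbol matches only within \(\lfloor \max(|s_1|,|s_2|)/2 \rfloor - 1\) positions) follows the standard definition of the measure, which the description above does not detail; the names are ours.

```java
import java.util.List;

/** Minimal sketch of the Jaro distance with transition labels as symbols. */
public class JaroDistance {

    static double jaro(List<String> s1, List<String> s2) {
        int window = Math.max(s1.size(), s2.size()) / 2 - 1;      // matching window
        boolean[] matched1 = new boolean[s1.size()];
        boolean[] matched2 = new boolean[s2.size()];
        int m = 0;                                                // matching symbols
        for (int i = 0; i < s1.size(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(s2.size() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.get(i).equals(s2.get(j))) {
                    matched1[i] = matched2[j] = true;
                    m++;
                    break;
                }
            }
        }
        if (m == 0) return 0.0;
        // count out-of-order matched symbols; half of that count gives t
        int transpositions = 0, k = 0;
        for (int i = 0; i < s1.size(); i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (!s1.get(i).equals(s2.get(k))) transpositions++;
            k++;
        }
        double t = transpositions / 2.0;
        return ((double) m / s1.size() + (double) m / s2.size() + (m - t) / m) / 3.0;
    }

    public static void main(String[] args) {
        System.out.printf("Jaro(t11, t13) = %.3f%n",
            jaro(List.of("d", "e", "f", "g", "h", "i"),
                 List.of("d", "e", "f", "g", "e", "f")));         // expected 0.778
    }
}
```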

3.6 Jaro–Winkler distance

Jaro–Winkler distance (Winkler 1999), denoted \(JW\), is a variant of the Jaro distance presented in Sect. 3.5, with the addition of the weighted prefix. Given two strings \(s_1\) and \(s_2\), the function is defined as:

$$\begin{aligned} JW(s_1,s_2) = Jaro(s_1,s_2) + \ell p(1-Jaro(s_1,s_2)) \end{aligned}$$

where

  • \(\ell\) is the length of common prefix shared by the two strings with a maximum of four characters;

  • \(p\) is a constant scaling factor for how much the score is adjusted upwards for having common prefixes. \(p\) should not exceed \(0.25\), otherwise the distance can become larger than \(1\). The standard value for this constant in Winkler’s work is \(p = 0.1\);

  • \(O(|s_1| + |s_2|)\) is the time complexity.

The difference between Jaro and Jaro–Winkler is that Jaro–Winkler adds more weight to strings starting with exactly matching characters. However, the maximum size of the common prefix is four, i.e., all matching characters past the first four have the same weight. The length of the common prefix is multiplied by a constant, the standard value being \(0.1\) for the Jaro–Winkler distance.

For example, considering \(p = 0.1\) and the test cases \(t_{11} = \langle d, e, f, g, h, i\rangle\) and \(t_{13} = \langle d, e, f, g, e, f\rangle\), we have \(\ell = 4\) and \(Jaro(t_{11},t_{13}) = 0.778\). The Jaro–Winkler distance is then

$$\begin{aligned} JW(t_{11},t_{13}) = 0.778 + 4 \cdot 0.1 \cdot (1 - 0.778) = 0.867 \end{aligned}$$

Thus, the similarity degree between \(t_{11}\) and \(t_{13}\) for Jaro–Winkler distance is 86.7 %.
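In code, the adjustment is a small extension of the Jaro sketch above; the sketch below reuses that jaro(...) method and, again, the names are ours.

```java
import java.util.List;

/** Minimal sketch of Jaro–Winkler, reusing JaroDistance.jaro(...) from the previous sketch. */
public class JaroWinklerDistance {

    static double jaroWinkler(List<String> s1, List<String> s2, double p) {
        double jaro = JaroDistance.jaro(s1, s2);
        int prefix = 0;                              // common prefix length, capped at 4
        int limit = Math.min(4, Math.min(s1.size(), s2.size()));
        while (prefix < limit && s1.get(prefix).equals(s2.get(prefix))) prefix++;
        return jaro + prefix * p * (1.0 - jaro);     // JW = Jaro + l * p * (1 - Jaro)
    }

    public static void main(String[] args) {
        System.out.printf("JW(t11, t13) = %.3f%n",
            jaroWinkler(List.of("d", "e", "f", "g", "h", "i"),
                        List.of("d", "e", "f", "g", "e", "f"), 0.1));   // expected 0.867
    }
}
```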

4 Similarity-based test suite reduction strategy

Coutinho et al. (2013) introduce a similarity-based test suite reduction strategy inspired by the test selection strategy proposed by Cartaxo et al. (2011). The goal of the original selection strategy was to select a percentage of the test cases that are the most different ones based on the degree of similarity among them without the need to preserve test requirements coverage of the original suite. On the other hand, the reduction strategy aims to produce a subset from the original test suite that satisfies the same set of test requirements, by removing from the suite the most similar test cases while the reduced suite covers the requirements.

In order to apply the reduction strategy, the following inputs are necessary

  • Test suite the set of test cases to be reduced;

  • Test requirements the set of requirements that should be covered, defined by a satisfiability relation;

  • Similarity matrix the matrix that presents the similarity degree for all the pairs of test cases. The degree is measured by a distance function.

The similarity degrees of the test suite, computed by a distance function (Sect. 3), are arranged in a similarity matrix as proposed by Cartaxo (2011). In this section, we describe the similarity matrix (Sect. 4.1) and the reduction algorithm (Sect. 4.2).

4.1 Similarity matrix

The similarity matrix is assembled by applying the distance function to each pair of test cases in the test suite. In summary, the matrix is defined as:

  • Square matrix (\(n \times n\)) where \(n\) is the number of test cases and each column and line represents a test case;

  • Each element of the matrix \(a_{ij}\) is the similarity degree between two test cases \(i\) and \(j\) defined by the calculation of the distance function;

  • Symmetric matrix since \(a_{ij} = a_{ji}\);

For the example presented in Fig. 1, we obtained the Similarity Matrix 3 by using the Similarity Function as the distance function.

Matrix 3 (similarity matrix for the test suite in Fig. 1c, computed with the Similarity Function)
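The assembly of the matrix can be sketched in Java as follows, with the distance function passed as a parameter (for example, the SF sketch from Sect. 3.1); the names are ours.

```java
import java.util.List;
import java.util.function.BiFunction;

/** Minimal sketch: assembling the similarity matrix for a test suite. */
public class SimilarityMatrix {

    static double[][] build(List<List<String>> suite,
                            BiFunction<List<String>, List<String>, Double> distance) {
        int n = suite.size();
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {        // symmetric: compute the upper triangle only
                double degree = distance.apply(suite.get(i), suite.get(j));
                matrix[i][j] = degree;
                matrix[j][i] = degree;
            }
        }
        return matrix;
    }
    // usage (assuming the SF sketch): SimilarityMatrix.build(suite, SimilarityFunction::sf)
}
```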

4.2 Similarity-based strategy

Algorithm 1 performs the selection of test cases to remove from the original suite. Basically, the algorithm analyzes the values of the similarity matrix, starting from the highest one, and verifies whether, after the removal of test cases, the suite still keeps 100 % of the test requirements coverage. The allValuesMatrixAnalyzed method (line 1) returns true after all values of the matrix have been analyzed. Inside the while loop, the first step (lines 2–5) is to find the maximum value in the matrix. This value is associated with the two most similar test cases. When a tie among maximum values is found in the similarity matrix, one value is randomly chosen. In the second step (lines 6–16), the order of analysis of these two test cases is defined according to their path lengths: the test case with the lower number of transitions is the first to be analyzed. If the test cases have the same length, one of them is chosen randomly.

In the next step (lines 17–26), it is necessary to verify whether the reduced test suite still satisfies all the requirements after the removal of the first test case chosen. If all requirements are still satisfied, then the first chosen test case is removed from the similarity matrix. Otherwise, the first test case is added back to the test suite, and the other one (the second test case chosen) is removed from the test suite in the same way. Until all pairs of the similarity matrix have been analyzed, new pairs of test cases continue to be selected, removed, and tested against the similarity matrix. Finally, the reduced test suite is returned (line 28).

Algorithm 1 Similarity-based test suite reduction
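A minimal Java sketch consistent with this description is given below. It reuses the satisfiability relation of Sect. 2.2 and the similarity matrix of Sect. 4.1; all method names are ours and do not correspond to the LTS-BT implementation.

```java
import java.util.*;

/** Minimal sketch of the similarity-based reduction strategy (Algorithm 1). */
public class SimilarityReduction {

    static List<List<String>> reduce(List<List<String>> suite,
                                     double[][] matrix,
                                     Map<List<String>, Set<Integer>> satisfiability) {
        Set<Integer> kept = new TreeSet<>();
        for (int i = 0; i < suite.size(); i++) kept.add(i);
        boolean[][] analyzed = new boolean[suite.size()][suite.size()];
        Random random = new Random();

        while (true) {
            // find the not-yet-analyzed pairs of kept test cases with the highest similarity
            List<int[]> maxPairs = new ArrayList<>();
            double max = -1;
            for (int i = 0; i < suite.size(); i++)
                for (int j = i + 1; j < suite.size(); j++)
                    if (!analyzed[i][j] && kept.contains(i) && kept.contains(j)) {
                        if (matrix[i][j] > max) { max = matrix[i][j]; maxPairs.clear(); }
                        if (matrix[i][j] == max) maxPairs.add(new int[]{i, j});
                    }
            if (maxPairs.isEmpty()) break;                                 // all values analyzed

            int[] pair = maxPairs.get(random.nextInt(maxPairs.size()));    // random tie-break
            analyzed[pair[0]][pair[1]] = true;

            // the shorter test case is analyzed first (random choice on equal lengths)
            int first = pair[0], second = pair[1];
            if (suite.get(second).size() < suite.get(first).size()
                    || (suite.get(first).size() == suite.get(second).size()
                        && random.nextBoolean())) {
                first = pair[1];
                second = pair[0];
            }

            if (!tryRemove(first, kept, satisfiability)) {
                tryRemove(second, kept, satisfiability);
            }
        }

        List<List<String>> reduced = new ArrayList<>();
        for (int index : kept) reduced.add(suite.get(index));
        return reduced;
    }

    /** Removes the candidate if the remaining suite still covers every requirement. */
    static boolean tryRemove(int candidate, Set<Integer> kept,
                             Map<List<String>, Set<Integer>> satisfiability) {
        kept.remove(candidate);
        for (Set<Integer> covering : satisfiability.values()) {
            if (Collections.disjoint(covering, kept)) {    // a requirement became uncovered
                kept.add(candidate);                       // put the test case back
                return false;
            }
        }
        return true;
    }
}
```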

In Algorithm 1, we can observe that the loop in line 1 is executed, in the worst case, \(\frac{n^2 - n}{2}\) times, where \(n\) is the number of test cases in the test suite. Furthermore, within each iteration, the method getAllMaxValue in line 2 (\(O(n^2)\)) is used to search the matrix for the highest similarity values. Thus, the worst-case time complexity of Algorithm 1 is \(O(\frac{n^2 - n}{2} \times n^2)\). Based on the time complexity of each distance function and of the similarity-based reduction strategy (polynomial time), we observe that the distance functions considered in this paper do not influence the efficiency of the reduction strategy.

For the test suite described in Fig. 1c, and considering Matrix 3 and all-transition-pairs as test requirements, Algorithm 1 can return the following reduced test suite: \(RS = \{ t_1, t_5, t_7, t_{10}\}\), by removing \(t_9\), \(t_3\), \(t_2\), \(t_6\), \(t_{12}\), \(t_8\), \(t_{11}\), \(t_4\), \(t_{13}\), in this order. As expected, the reduced suite contains the essential test case \(t_1\) and three other test cases, minimally covering the test requirements. With the reduced suite, it is possible to test some of the situations where editing a user name or password is followed or not by another editing or a remove operation. We also exercise the exception flow in two cases. Other similarity functions would lead to other choices of nonessential test cases, even if covering the same requirements. The choice may influence the size of the reduced suite and its fault detection capability, as we discuss in the following sections.

5 Experiment definition

In this section, we present the definition of two empirical studies to assess the effectiveness of different distance functions applied in the scope of the similarity-based strategy for test suite reduction presented in Sect. 4. Both studies consider a real-world application model and real faults experienced during test execution. Based on the structure of each application, 10 synthetic specification models are automatically generated along with a similar percentage of random faults. The idea is to consider two different real settings of application model and fault detection percentage in order to investigate the functions in a controlled way.

The first empirical study focuses on a version of the PDFSam tool. This application has few essential test cases and, consequently, a great potential for reduction. On the other hand, the second empirical study focuses on a version of the TaRGeT tool (Nogueira et al. 2007; Ferreira et al. 2010) composed mostly of essential test cases, making the reduction task harder.

For these investigations, we follow the process for experimental studies in software engineering proposed by Wohlin et al. (2000). The next sections describe the activities performed to define and execute the studies.

5.1 Definition

As mentioned before, the goal of these empirical studies was to assess the effectiveness of distance functions that measure the similarity between two test cases when applied in a similarity-based test suite reduction strategy. For this, we observe, for the reduced suite, its size and fault coverage. Based on this goal, our general hypothesis is that “test suite reduction strategies based on similarity show a different performance regarding size and FC of the reduced suite depending on the distance function used.” Furthermore, we analyze the results from the point of view of the tester (responsible for the testing process) in the context of MBT.

5.2 Planning

In the phase of planning, we define context selection, variables, hypothesis, instrumentation, design, and threats to validity as follows.

5.2.1 Context selection

Following the dimensions proposed by Wohlin, the studies are off-line, i.e., we perform them in laboratory, which is not a real industrial environment. For more general results, an experiment should be performed in real settings (online).

Each empirical study has as inputs to the reduction strategy (with the different distance functions) one real-world application (real problem) and 10 synthetic automatically generated specifications. These specifications are randomly generated by considering the same configuration of the respective real-world application, such as depth, number of forks, number of transitions of forks, number of joins, number of transitions of joins, and number of paths with loop. Since these empirical studies focus only on two sets of different configurations, the studies can be characterized as specific.

5.2.2 Variables selection

Variables are one of the main elements of an experiment. They comprise the elements that are observed (dependent variables) and those that are modified and controlled (independent variables) during the experimental study. The variables that compose our studies are defined as follows:

  • Independent variables

    • Test requirements all-transition-pair coverage;

    • Test suite reduction strategy similarity-based test suite reduction strategy (Sim);

    • Distance functions functions to measure the similarity degree between two test cases applied in the reduction strategy. In this work, we analyze the functions presented in Sect. 3:

      • Jac: Jaccard index;

      • Jaro: Jaro distance;

      • JW: Jaro–Winkler distance;

      • Lev: Levenshtein distance;

      • Sel: Sellers algorithm;

      • SF: Similarity function.

    • Faults the faults revealed by the test suite. For the synthetic models, faults are automatically defined considering the same pattern of the real models: a test case fails due to one fault (one-to-one relationship) and the test cases that fail are distinct;

  • Dependent variables

    • Suite size reduction (SSR) percentage of the number of test cases removed from the original suite.

      $$\begin{aligned} SSR = \frac{|TS| - |RS|}{|TS|} \times 100\,\% \end{aligned}$$

      where \(|TS|\) is the number of test cases in the original test suite and \(|RS|\) is the number of test cases in the reduced test suite;

    • Fault coverage (FC) percentage of the total number of faults uncovered by the reduced test suite:

      $$\begin{aligned} FC = \frac{|F_{RS}|}{|F_{TS}|} \times 100\,\% \end{aligned}$$

      where \(|F_{TS}|\) is the number of faults revealed by the original test suite and \(|F_{RS}|\) is the number of faults revealed by the reduced test suite.

5.2.3 Hypothesis formulation

The experiment definition is formalized into hypotheses that are tested during the analysis of the experiment. Based on the goal of the empirical studies, for each dependent variable (SSR and FC), we define two hypotheses as follows.

  1. SSR A null hypothesis (\(H^0_{1}\)): all distance functions have the same behavior regarding SSR; an alternative hypothesis (\(H^1_{1}\)): at least two distance functions behave differently regarding SSR.

    $$\begin{aligned}&H^0_{1}: SSR_{Jac} = SSR_{Jaro} = SSR_{JW} = SSR_{Lev} = SSR_{Sel} = SSR_{SF} \\&H^1_{1}: SSR_{x} \ne SSR_{y} \hbox { for at least one pair of distance functions } x, y \end{aligned}$$
  2. FC A null hypothesis (\(H^0_{2}\)): all distance functions have the same behavior regarding the rate of FC; an alternative hypothesis (\(H^1_{2}\)): at least two distance functions behave differently regarding the rate of FC.

    $$\begin{aligned}&H^0_{2}: FC_{Jac} = FC_{Jaro} = FC_{JW} = FC_{Lev} = FC_{Sel} = FC_{SF} \\&H^1_{2}: FC_{x} \ne FC_{y} \hbox { for at least one pair of distance functions } x, y \end{aligned}$$

5.2.4 Instrumentation

The instruments of the experiments are defined as follows:

  1. Objects 1 real-world and 10 synthetic automatically generated LTS specifications for each empirical study (22 specification models in total);

  2. Guidelines since the strategy does not require people (subjects) to configure it, no guideline is used;

  3. Measurements the LTS-BT tool (Cartaxo et al. 2008) is used to support the experiments execution and data collection.

The two real-world specifications selected for each empirical study are briefly described as follows:

  • PDFSam an open-source tool used to split and merge pdf documents;

  • TaRGeT an application that generates test cases from use case documents in an MBT process.

In these studies, we consider a specific version of each of the real-world applications in which faults can be observed. For these versions, in order to generate the specification models, we consider a specification of software requirements written as use cases, by experienced testers, using the use case template of the TaRGeT tool. As output, the TaRGeT tool returns an LTS model that represents the execution flows of the use cases (Fig. 2). It is important to remark that the version of TaRGeT we consider as an object of the study is different from the one we use for generating the models; the latter is a stable and deployed version. Furthermore, we collect the faults considered in the studies by manually executing the version under testing and manually identifying faults from failures.

Fig. 2

Generation process of models of the real-world applications

Table 3 presents the configuration of the real specification models, defined as: (1) (structural) measures (based on the concepts presented in Sect. 2.1); (2) the number of test cases generated by the LTS-BT tool considering the all-one-loop-paths coverage criterion; (3) the number of essential test cases; (4) the number of faults detected. Notice that the two real-world specifications have a different number of faults. This is due to the fact that we consider only and exactly the real faults detected in order to make the results resemble practice. Moreover, it is important to remark that for each real-world specification, each fault is revealed by a distinct failure (test case). In MBT, the test cases are usually abstract (Utting and Legeard 2007), particularly for system testing. Therefore, although faults are more precisely described at the code level, for the sake of simplicity, we present the faults considered in our studies in Table 4 through the description of the failures that we can observe.

Table 3 Basic configuration of the two real-world specifications
Table 4 Description of faults, abstracted by the corresponding failure, of the real-world applications

From the configurations of each real-world model, we generate 10 synthetic LTS models based on the strategy presented by Oliveira et al. (2013), as illustrated in Fig. 3. The LTS generator receives as input the depth, the number of transitions of joins, joins, transitions of forks, forks, and paths of loops for each real-world specification. Then, it generates a number of different models (10 in this study) for each configuration.

Fig. 3

Generation process of the synthetic models

Table 5 presents the number of test cases generated, essential test cases, and faults generated for each synthetic model. Notice that they resemble the corresponding real ones.

Table 5 Comparing test case and fault metrics of the synthetic LTS specifications to the corresponding real specification ones

For the synthetic models, we randomly selected a number of test cases that fail and associated each failure with a fault to follow the same pattern of the real models. Moreover, the number of failures/faults approximates the percentage of faults of the real applications w.r.t. the number of test cases (PDFSam configuration: 3.65 % and TaRGeT configuration: 15.85 %). Likewise, the percentage of essential test cases is also an approximation, but it is slightly less precise because the distribution of essential test cases depends on the model and we did not control it directly. However, variation is low: the percentage of essential test cases ranges from 0 to 7 % for the PDFSam configuration and from 44.66 to 89.87 % for the TaRGeT configuration.

5.2.5 Experimental design

The experimental design is defined from the characteristics of the experiment, such as the number of objects, subjects, factors, and levels (Jain 1991; Wohlin et al. 2000). In our case, there is one experimental study with one factor (the distance function applied in the reduction strategy) and more than two treatments (the six distance functions investigated) for each specification. Thus, there are 11 experimental studies for each empirical study (10 synthetic specifications and one real specification). These experimental studies are structured in two experimental designs, i.e., one experimental design for each metric observed (SSR and FC) (Fig. 4).

Fig. 4

Schema of the experimental study for each input specification

As suggested in the literature for experimental studies, we choose a confidence level of 95 %. Then, we use \(\alpha = 0.05\) whenever referring to statistical significance. Moreover, in order to obtain conclusions with statistical significance, the minimum sample size must be calculated. Thus, we execute the six distance functions 40 times to calculate the number of replications required (\(n\)), according to Jain (1991), for each metric in each of the experimental studies as follows:

$$\begin{aligned} n = \left( \frac{100 \cdot z \cdot s}{r \cdot \overline{x}}\right) ^2 \end{aligned}$$

where

  • \(z\) is a standard value from the normal distribution table, for a 95 % confidence level \(z = 1.96\);

  • \(s\) is the standard deviation from the sample;

  • \(r\) is the desired accuracy (\(\alpha\) \(=\) 0.05, then \(r = 5\));

  • \(\overline{x}\) is the mean of the sample.
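As a minimal sketch, this calculation can be written as follows; the method name, the rounding-up choice, and the example values in the main method are ours, used only to exercise the formula.

```java
/** Minimal sketch of the sample size calculation n = (100 * z * s / (r * mean))^2. */
public class SampleSize {

    static int replications(double z, double stdDev, double accuracy, double mean) {
        double n = Math.pow((100.0 * z * stdDev) / (accuracy * mean), 2);
        return (int) Math.ceil(n);    // round up to a whole number of replications
    }

    public static void main(String[] args) {
        // 95 % confidence level (z = 1.96) and 5 % desired accuracy (r = 5);
        // the standard deviation and mean below are hypothetical values
        System.out.println(replications(1.96, 12.0, 5.0, 80.0));
    }
}
```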

For each empirical study, we consider that the number of necessary replications is the highest value obtained between the metrics SSR and FC over all specifications, as summarized in Table 6. Note that only the highest value for each metric in each empirical study is presented. For the configuration of the PDFSam application, the number of replications required is defined by the JW (Jaro–Winkler distance) function for Specification 2, observing the FC metric. In this case, we consider 62,000 replications of each distance function for each specification. In the configuration of TaRGeT, the highest value among the metrics (SSR and FC) of all specifications is defined by SF (Similarity Function) for Specification 2, observing the FC metric. Therefore, for the configuration of TaRGeT, we consider approximately \(40\) replications of each distance function for each specification.

Table 6 Mean, standard deviation, and the highest number of necessary replications for each metric and each application

5.3 Operation

To execute these empirical studies, we implemented an LTS generator as proposed by Oliveira et al. (2013) to automatically generate the different specifications according to specific configurations. Furthermore, it was necessary to implement the distance functions and the code to collect the data during the execution of the experiment. We implemented them in the Java programming language. Following this, we use the LTS-BT tool to generate test cases. Furthermore, we perform each step of the experiment the maximum number of times defined among the metrics, using a machine with an Intel Core(TM) i5 3.10 GHz processor and 8 GB of RAM running GNU Linux.

5.4 Threats to validity

An important question concerning the results of the empirical studies is the potential threats to validity that may negatively influence the results. Cook and Campbell (1979) suggest that threats can be identified according to the type of validation of results and define a list of four types: conclusion, internal, construct, and external validity.

The statistical tests used represent the main threat to conclusion validity. To deal with this threat, the number of executions of the experiments for each specification is equal to or higher than the amount defined by the sample size calculation. In order to maintain the statistical significance of the data, all analyses consider a confidence level of 95 %, according to the suggestions for conducting experiments in the statistical literature (Jain 1991). This ensures a good conclusion validity.

A threat to internal validity is related to the control of the experiment. To make the execution of the reduction strategy automatic for each distance function, during the implementation and execution of the algorithms, we added controls so that the execution environment would not be influenced by other processes, programs, or the machine on which the experiment was running. In these empirical studies, there are no people involved, and the same inputs (LTS specifications) are applied to all the distance functions. Thus, this threat to internal validity is not considered critical.

For construct validity, the experimental setting is the main threat. To maintain construct validity, the experimenter cannot influence the measures. To handle this, the synthetic specifications are automatically generated from the configuration of real-world applications. Furthermore, our results rely on input specifications with a given set of randomly generated faults. The number of faults for each configuration is defined according to the percentage observed for the real specification previously executed. In this real specification, the set of real faults is identified after each test case is manually executed by experienced software engineers.

Another threat to construct validity would be inadequate measurement of the metrics (SSR and FC). To handle this, these metrics are implemented according to the concepts proposed in the literature. Moreover, the implementation of the distance functions is another threat to validity. To deal with this, the distance functions are implemented according to the algorithms described in Sect. 3. In order to maintain the validity of the data, it is necessary to adapt the distance functions to calculate the degree of similarity between two test cases for these empirical studies.

The objects used in these experiments are the main threat to external validity, particularly the synthetic LTS specifications, which are automatically generated and may not represent real behavior, even though they are randomly generated considering the same configuration of real applications. However, automation makes it possible to consider a number of specifications in a controlled way.

6 Experiment analysis

The first step is to check whether the data collected have a normal distribution for all specifications, considering the SSR and FC metrics. For this, we apply the Anderson–Darling normality test, using the R tool, considering a confidence level of 95 % (significance level \(\alpha = 0.05\)) (Jain 1991). For the two empirical studies and all specifications, the \(\rho\) values are smaller than the significance level (\(\alpha = 0.05\)). Thus, we need to apply nonparametric tests. Since each experimental design has a unique factor with more than two treatments, we apply the nonparametric Kruskal–Wallis test to check the null hypotheses. This test is used to determine whether there are significant differences among the population medians. In the next subsections, we present and discuss these results, considering each empirical study. Detailed data collected in the experiment can be found on the studies' Web site.

6.1 First empirical study—PDFSam configuration

For Specifications 4 and 8, we obtain \(\rho\) values of 1.000 by executing the Kruskal–Wallis test for both the SSR and FC metrics. In other words, for these specifications and metrics, the distance functions have the same behavior with a 95 % confidence level. For the other specifications, we obtain \(\rho\) values of 0.0001 by executing the Kruskal–Wallis test. These values are smaller than the significance level (\(\alpha = 0.05\)) for all data. Thus, the null hypotheses can be rejected (\(H^0_{1}\) and \(H^0_{2}\)), that is, for SSR and FC, the distance functions do not present the same behavior.

We use boxplots to display the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum. Graphically, the minimum and maximum are represented by whiskers below and above the box. The central rectangle spans the first quartile to the third quartile, and the thicker line shows the median. The outliers (unfilled dots) represent the individual values beyond the whiskers. Boxplots are also useful for comparing two or more variables and for a visual interpretation of the data. When the intervals overlap, this suggests no statistical difference between them. When there is no overlap, we can state that there are statistical differences.

Figure 5 presents the boxplots for SSR and FC considering the general average in PDFSam configuration.

Fig. 5

Boxplots for SSR and FC considering the general average for PDFSam configuration

As there are overlaps in the boxplots, we apply the Mann–Whitney test (Wilcoxon–Mann–Whitney test in R) between each pair of distance functions. If the \(\rho\) value \( < \alpha\) for a Mann–Whitney test, then the null hypothesis can be rejected in favor of the alternative hypothesis. In this case, the response variable tends to be either greater or smaller for one group than for the other group. Otherwise (\(\rho\) value \( \ge \alpha\)), the null hypothesis cannot be rejected, and we conclude that the distance functions have similar behavior.

However, the Mann–Whitney test shows only whether there is a statistically significant difference between two treatments. In order to clarify the magnitude of the treatment effect, we use the \(\hat{A}_{12}\) effect size measure proposed by Vargha and Delaney (2000). Considering two treatments \(X\) and \(Y\), \(\hat{A}_{12} = 0.5\) indicates that there is no difference between the treatments \(X\) and \(Y\), whereas \(\hat{A}_{12} > 0.5\) indicates that \(X\) is superior to \(Y\), and \(\hat{A}_{12} < 0.5\) indicates that \(Y\) is superior to \(X\). Note that \(\hat{A}_{12}\) is between 0 and 1 and that the larger the effect size, the further the value is from \(0.5\). We follow the categories used by Rogstad et al. (2013), who categorize the effect into \(Small < 0.10\), \(0.10 < Medium < 0.17\), and \(Large > 0.17\), the value being the distance from 0.5. Table 7 shows the Mann–Whitney U tests and \(\hat{A}_{12}\) effect sizes for each comparison considering the general average in the PDFSam configuration.

Table 7 Mann–Whitney and \(\hat{A}_{12}\) effect size measurements for general average in PDFSam configuration

In most of the cases for SSR in Table 7, the effect size between the distance functions is classified as small. The results indicate that there is only a small difference when applying different distance functions combined with the similarity-based reduction strategy, considering SSR. In terms of FC, the results show that when Jac is compared to the others, its behavior is clearly better, with an effect size mostly from medium to large.

From the boxplots, Mann–Whitney tests and \(\hat{A}_{12}\) effect size measurement, we calculate the average position of each distance function regarding effectiveness, as presented in Table 8. This table presents the performance order of the distance functions for the SSR and FC metrics.

Table 8 Ordering of effectiveness for SSR and FC in PDFSam configuration

Additionally, Table 9 shows the minimum, maximum, median, and average considering the average over all executions of each specification. Note that, on average, the percentage of reduction is similar (though not equal, as shown in Table 8). Moreover, the rate of reduction varies from 75.546 to 88.940 %. This can be considered a high reduction rate, justified by the characteristics of the specifications, particularly the presence of loops. With loops, test cases with a certain degree of redundancy are generated, and therefore, the reduction strategy (whatever the similarity distance applied) tends to reduce further. However, by observing Table 9 (FC), on average, Jac presents the best behavior, as seen in Table 8, uncovering on average about 20 % of the faults with lower variance.

Table 9 Minimum, maximum, median, and average for PDFSam configuration

Finally, by analyzing the data obtained in the 62,000 executions of the technique when considering each function, we can also observe the stability of the reduction technique with respect to two measures: (1) the number of different sets of faults produced by the selected suites; (2) the number of different sets of test cases selected (different suites). Ideally, the technique should be as stable as possible by presenting a low number of different sets in each case, making its performance more predictable.

Fig. 6

Number of subsets of test cases and faults for the PDFSam configuration

Figure 6 presents the boxplots obtained for each function. For the sets of test cases, Jaro and JW present the best stability because the distances they assign to different pairs of test cases are generally not equal, so random tie-breaking is rarely needed. On the other hand, note that for the different sets of faults, SF is the most stable one, whereas Lev and Sel are the least stable. The reason is that SF is more precise in this context due to the presence of loops: it can more effectively detect when one test case is (or contains) a subset of another.

6.2 Second empirical study—TaRGeT configuration

Considering the SSR metric and the specification of the real application, Specification 3, and Specification 10, we obtain \(\rho \) values greater than \(0.05\) by executing the Kruskal–Wallis test. For FC and Specification 3, we also obtain \(\rho \) values greater than \(0.05\). Thus, not all null hypotheses can be rejected. In other words, for these specifications and metrics, the distance functions have the same behavior with a 95 % confidence level. For the other cases, the \(\rho \) values obtained are smaller than the significance level (\(\alpha = 0.05\)). Thus, the null hypotheses can be rejected (\(H^0_{1}\) and \(H^0_{2}\)). So, with a 95 % confidence level, the distance functions can be considered different for SSR and FC.

Figure 7 shows the boxplots of the SSR and FC metrics considering the general average in the TaRGeT configuration. By observing the boxplots, we can see that the behavior is only slightly different, generally making it impossible to rank the performance of the functions.

Fig. 7

Boxplots for SSR and FC considering the general average in TaRGeT configuration

To uncover differences that might exist, we evaluate the pairs of distance functions by applying the Mann–Whitney tests and \(\hat{A}_{12}\) effect size measurements (as defined in Sect. 6.1). Table 10 shows the Mann–Whitney U tests and \(\hat{A}_{12}\) effect size for each comparison considering the general average in the TaRGeT configuration.

Table 10 Mann–Whitney and \(\hat{A}_{12}\) effect size measurements for general average in TaRGeT configuration

As can be seen, for both metrics—SSR and FC—the effect size between the pairs of distance functions is considered small. This means that, even when one function behaves better than another, the difference is small. Moreover, Jac is again prevalent for FC.

From the boxplots, Mann–Whitney tests, and \(\hat{A}_{12}\) effect size measurements, we derive the ordering of effectiveness presented in Table 11. In most cases, the performance of the functions can be considered similar. However, we can also note that, for both SSR and FC, Lev and Sel are the most closely related, since they either present the same behavior or appear at adjacent levels of the ordering, except when the average is considered.

Table 11 Ordering of effectiveness for SSR and FC in TaRGeT configuration

The minimum, maximum, median, and average considering the average over all executions of each specification are presented in Table 12. These values also show that the performance of the functions is comparable for both SSR and FC, even though a few significant differences can be observed.

Table 12 Minimum, maximum, median, and average for TaRGeT configuration

Finally, as in the first experiment, by analyzing the data obtained in the 40 executions of the technique when considering each function, we can also observe the stability of the reduction strategy using the same measures defined in Sect. 6.1. Figure 8 presents the boxplots obtained for each function. Note that SF is the most stable one for both the different sets of test cases and the different sets of faults. For the sets of faults, Jac, Lev, and Sel are the least stable, even though the differences here are less pronounced. The reason is that the TaRGeT configuration presents less redundancy.

Fig. 8 Number of subsets of test cases and faults for the TaRGeT configuration

6.3 General remarks

In the presented experiments, we exercise and analyze distance functions in the context of a test suite reduction strategy. We consider two different scenarios by grouping specifications with a comparable configuration: (1) in the PDFSam configuration group, reduction is more likely due to the presence of structures that may lead to a higher degree of similarity between test cases; (2) in the TaRGeT configuration group, reduction is harder due to the prevalence of structures that do not directly lead to a higher degree of similarity, making the occurrence of essential test cases more likely.

It can be noticed that the configurations of the applications are different and that these differences may impact the results directly. For example, the number of paths with loops is a significant difference, since it has a direct impact on the number of generated test cases and on the degree of redundancy among them. As the PDFSam configuration has five paths with loops, the generated test cases may contain a high degree of redundancy; accordingly, we observe that the strategy presents a high rate of reduction. On the other hand, for the TaRGeT configuration, with no paths with loops and a large number of essential test cases, the reduction rate is low.

Results show that the PDFSam configuration presents more significant performance differences between the functions, since their influence on the overall result of the reduction technique is higher: the choice of the test case to be included depends on the function. However, we can conclude, for the investigated context, that the influence is mostly related to the FC metric rather than the SSR metric. The reduction percentage is quite similar in all cases, whereas FC is more or less successful for different functions. Jac is on average the best function, particularly for the PDFSam configuration. This confirms a similar result presented by Hemmati et al. (2013) in the context of test selection, where Jac and two of its variants are the distance functions with the best performance for FC.

Regarding stability, the results indicate that average stability in the number of different sets of faults is usually associated with better FC; the results obtained by the Jac function are an example. This may indicate that less precision can make a function more effective at covering different faults. Moreover, note that, in the PDFSam configuration, there are cases where the SF function, the most stable one, detected \(0\) faults. Furthermore, there is a limit to how much instability helps: the least stable functions, Sel and Lev, cannot surpass Jac in general.

7 Case study

The goal of this case study was to provide further investigation into the performance of the distance functions in a context different from the two experiments discussed so far. The study is based on an industrial application developed in the context of a cooperation between our research laboratory and Ingenico. The application is software for collecting and processing biometric data. From use cases, LTS specification models are automatically generated for two subsequent versions of the application, where one is a baseline version—CB\(_{v_{1}}\)—and the other is a delta version—CB\(_{v_{2}}\)—obtained from CB\(_{v_{1}}\) by two progressive modifications. From the models, we generate two test suites and execute them manually. From the executions, we collect the faults and failures. Table 13 describes the configurations of the two specification models.

Table 13 Configurations of the real-world specifications

Note that the fault rates of the specifications, relative to the size of the generated test suites, are 14.49 % for CB\(_{v_{1}}\) and 9.09 % for CB\(_{v_{2}}\). The numbers of essential test cases that fail for CB\(_{v_{1}}\) and CB\(_{v_{2}}\) are 2 and 3, respectively. For all essential test cases that fail, each failure is caused by a distinct fault. We expect them to always be included in the reduced suite, since, by definition, each uniquely covers a requirement.

For each specification, we execute 1,000 replications for each distance function. In order to draw observations based on these data, we apply a statistical analysis similar to that used in the other empirical studies.

Figure 9 presents the boxplots considering SSR and FC. Note that there are many overlaps; hence, it is necessary to perform the Mann–Whitney tests.

Fig. 9 Boxplots for SSR and FC considering the general average for CB\(_{v_{1}}\) and CB\(_{v_{2}}\)

In order to clarify the magnitude of the difference between the distance functions, we compute the \(\hat{A}_{12}\) effect size. The results of the Mann–Whitney U tests and the \(\hat{A}_{12}\) effect size measurement for each distance function comparison are reported in Table 14 for CB\(_{v_{1}}\) and CB\(_{v_{2}}\). Considering SSR for CB\(_{v_{1}}\) and CB\(_{v_{2}}\), we can see that Jac and SF present the best behavior, with no difference between them, whereas for FC the difference between them is large and SF is better.

Table 14 Mann–Whitney and \(\hat{A}_{12}\) effect size measurements for SSR and FC across the distance functions for CB\(_{v_{1}}\) and CB\(_{v_{2}}\)

From the boxplots, Mann–Whitney tests, and \(\hat{A}_{12}\) effect size measurements, we obtain the ordering of effectiveness for SSR and FC presented in Table 15.

Table 15 Ordering of effectiveness for SSR and FC in CB\(_{v_{1}}\) and CB\(_{v_{2}}\)

For these specifications, FC varied between 50 and 80 % for CB\(_{v_{1}}\), and between 66.667 and 83.333 % for CB\(_{v_{2}}\) (Table 16).

Table 16 Minimum, maximum, median, and average for CB\(_{v_{1}}\) and CB\(_{v_{2}}\)

The fact that the choice of the distance function may influence FC is consistent, to a certain extent, with the results obtained in the previous experiments. However, Jac did not perform as well as in the experiments regarding FC. By closely analyzing the reduced suites, we can see that the distances computed by Jac caused some failing test cases to be discarded, because each of them was considered similar to another test case that was selected.

As mentioned before, the distance function may influence the order in which pairs of test cases are considered. In particular, for the CB application, SF is more successful when comparing the total number of distinct faults and the frequency with which those faults are detected. However, Jac presented the most stable behavior, that is, the least variance in the reduced suites over the 1,000 executions when considering the subset of test cases that fail, followed by SF (Table 17). This confirms the results obtained in the experiments: the function with the best stability may not be the one with the best performance for FC. In the presence of essential test cases, SF becomes less stable than when applied to the PDFSam configuration.

Table 17 Number of different sets of test cases selected, number of distinct test cases, average frequency of inclusion of a test case in the reduced suite, number of different sets of faults detected, number of distinct faults, and average frequency of inclusion of a fault detected by a reduced suite
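A small sketch of how the frequency-based measures in Table 17 can be derived from the replication data is given below (hypothetical data and names; not the study's tooling):

```python
# Illustrative sketch: frequency with which each item (test case or fault)
# appears across the reduced suites produced by the replications.

from collections import Counter

def inclusion_frequencies(replications):
    """replications: list of sets, one per run (e.g., selected test case IDs
    or detected fault IDs). Returns the fraction of runs containing each item."""
    counts = Counter(item for run in replications for item in run)
    n_runs = len(replications)
    return {item: c / n_runs for item, c in counts.items()}

runs = [{"tc1", "tc4"}, {"tc1", "tc3"}, {"tc1", "tc4"}]
freqs = inclusion_frequencies(runs)
print(freqs)                               # tc1 appears in every run, tc3 in one
print(sum(freqs.values()) / len(freqs))    # average inclusion frequency
```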

For both specifications, we observed that Lev and Sel have a larger variation in the sets of test cases that make up the reduced suite when compared to Jac and SF. Moreover, in general, the number of faults they detect at least once is greater than for the other functions, that is, they may occasionally achieve a much better FC. However, on average, the number of faults covered by each reduced test suite is small, making them less reliable (Table 16). The variation is due to the large number of ties among similarity degrees in the matrix, which makes it possible for failing test cases not selected by the SF and Jac reductions to be selected as a result of a random choice. As in the experiments, Lev and Sel present comparable behavior.

8 Related work

8.1 Test suite reduction and prioritization by dependency analysis and clustering in the context of MBT

While it is out of the scope of this paper to compare existing and potential approaches to test suite reduction in the context of MBT, it is worth mentioning their differences. Korel et al. (2002) present an approach to regression test suite reduction based on dependence analysis of EFSM models. The approach identifies the difference between the original and modified models as a set of elementary modifications. For each elementary modification and for each test in the regression test suite, it also identifies interaction patterns based on the dependence analysis. Then, it uses the interaction patterns to reduce the regression test suite: if more than one test case covers the same interaction patterns, only one is chosen. Additionally, Chen et al. (2007) extend this idea by revising and identifying more interaction patterns.

Besides dependence analysis, in the scope of test case prioritization, Yoo et al. (2009) propose clustering test cases to achieve effective prioritization using expert knowledge. They cluster test cases based on their dynamic runtime behavior with the goal of reducing the number of pairwise comparisons required by the Analytic Hierarchy Process algorithm. Moreover, Arafeen and Hyunsook (2013) propose test case prioritization using requirements-based clustering: clusters of requirements are created, and the requirements–test cases traceability matrix is used to associate the test cases with each requirement cluster. Furthermore, Leon and Podgurski (2003) suggest a simple combination of distribution-based and coverage-based techniques for filtering and prioritizing test cases, achieving higher defect detection efficiency by selecting the most different test cases.

On the other hand, in this paper, we focus on the use of distance functions to measure the similarity of test cases as a basis for choosing the most different test cases to compose the reduced test suite. Even though distance functions can measure the similarity of test cases for different purposes, we apply them in the scope of a similarity strategy for test suite reduction proposed by Coutinho et al. (2013) to study their effect on the reduction problem.

8.2 Selection, reduction, and prioritization based on similarity in the context of MBT

A number of studies have been conducted to use the similarity between test cases for addressing the test suite size problem. Generally, the studies presented in the literature focus on test case selection and prioritization. Moreover, most of them focus on code-level coverage of the test cases.

Hemmati and Briand (2010) present a preliminary investigation comparing the effect of similarity measures for test case selection, using six distance functions in a similarity-based selection technique in the MBT context. The conclusion is that the distance function has a very significant effect on fault detection capability. In another study, Hemmati et al. (2013) present an extension of the previous research. In this work, a total of 320 different techniques were evaluated, obtained from combinations of eight distance functions, four encodings of abstract test cases, and 10 minimization algorithms. In order to compare the best similarity-based selection technique with other common selection techniques in the literature, two case studies were performed. The results confirm that the fault detection capability can be influenced by the choice of the distance function. However, the goal of their work was to evaluate the configurations of parameters when applied together rather than to investigate distance functions in particular. Also, they do not focus on the test suite reduction problem.

A comparative study of distance functions for test case prioritization is presented by Ledru et al. (2009, 2012). The work proposes comparing the text of test cases by using string distances, that is, treating each pair of test cases as two strings and computing a distance between them. For this, four classical string distances are compared using a simple greedy algorithm in an experiment that compares the resulting orderings with a random ordering of test cases. The results suggest that test suites prioritized using string distances are more efficient than randomly ordered test suites.

Researchers have also investigated the use of different distance functions in several well-known techniques based on Adaptive Random Testing (ART). The idea of ART was initially introduced by Chen et al. (2005) to replace random testing for test case generation, based on the distance between two test cases. These investigations focus on selection and prioritization techniques aiming to improve FC effectiveness, particularly at code level, as presented in (Ciupa et al. 2008; Jiang et al. 2009; Zhou 2010).

On the other hand, few similarity-based test suite reduction strategies have been proposed in the MBT context. Furthermore, to the best of our knowledge, there are no studies comparing the effectiveness of distance functions applied to test suite reduction strategies for MBT.

9 Conclusions

This paper presents the results of empirical studies with the goal of comparing distance functions when applied to a similarity-based test suite reduction strategy in the context of MBT. The idea is to provide evidence of the impact that the choice of a function can have on the performance of the strategy regarding suite size reduction, FC, and stability. Results show that the choice has little influence on SSR, but it can more significantly influence FC and stability. The reason is that each function leads to the selection of a different suite, and it is possible to have significant variations in this selection. To provide further evidence and deeper observation, we conduct a case study in the scope of a real-world application under development that has a configuration different from the ones previously considered in the experiments. The results from this study are comparable to the ones obtained in the experiments regarding the effect produced by the functions on SSR, fault coverage, and stability, as well as the pattern of related behavior of some functions (Lev and Sel). Additionally, in the case study, we can also observe the stability of the reduction strategy, when considering different functions, in terms of the number of different sets of faults and the fault detection frequency.

Even though no definite conclusions can be reached, as the context of the experiments and case study is specific, for the model configurations investigated, the SF function promotes the best stability, followed by Jac, Jaro, and JW. On the other hand, Lev and Sel present a relatively lower stability. Moreover, Jac often presents the best performance by optimizing the relationship between stability and fault coverage.

It is important to highlight that the number of paths with loops and the number of essential test cases in the specification configuration also have an impact on the results of the reduction technique. When the number of paths with loops is high, the degree of redundancy in the test suite is likely to be high. Therefore, the reduction strategy can be more effective w.r.t. size and consequently less effective w.r.t. FC. When the number of essential test cases is high, the observations are the opposite. Nevertheless, this is the behavior expected from a similarity-based reduction strategy, as the average changes in rate are relatively similar when considering all functions.

Another interesting issue is the difference in the similarity degree assigned to a given pair of test cases by the different distance functions. These differences have a direct influence on the order in which the strategy evaluates pairs of test cases and, consequently, on the set of test cases that makes up the reduced test suite. This might explain why a given test case is never part of the reduced suites produced with one distance function, but always appears with another.

From the results obtained in this work, we can have an overview of the behavior of the distance functions and their effect on similarity-based test suite reduction, even though no definite conclusions can be reached yet. Besides, the results can motivate further investigation in the area, for instance regarding improvements to distance functions to better suit the test suite reduction problem. The choice of a distance function can clearly influence FC and the stability of a similarity-based strategy. On the other hand, fault coverage seems to be related to SSR independently of the choice of the function. This motivates further investigation of how to improve the reduction strategy as well. Furthermore, executing more case studies and experiments using other configurations as input to the LTS generator is part of our future work, as well as evaluating the distance functions by using other metrics.