
Variation factors in the design and analysis of replicated controlled experiments

Three (dis)similar studies on inspections versus unit testing


Abstract

In formal experiments on software engineering, the number of factors that may impact an outcome is very high. Some factors are controlled and varied by design, while others are either unforeseen or due to chance. This paper explores how context factors change across a series of formal experiments and identifies implications for experimentation and replication practices, so that learning from experimentation becomes possible. We analyze three experiments on code inspections and structural unit testing. The first two experiments use the same experimental design and instrumentation (replication), while the third, conducted by different researchers, replaces the programs and adapts the defect detection methods accordingly (reproduction). Experimental procedures and location also differ between the experiments. Contrary to expectations, there are significant differences between the original experiment and the replication, as well as compared to the reproduction. Some of the differences are due to factors other than those designed to vary between experiments, indicating the sensitivity of software engineering experimentation to context factors. In aggregate, the analysis indicates that researchers who want to obtain reliable and repeatable empirical measures should consider reducing the complexity of software engineering experiments.
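
As an illustration of how such a between-experiment difference might be tested, the sketch below applies a nonparametric Mann-Whitney U test to two samples of per-subject defect detection rates. The sample values and the specific test choice are assumptions for demonstration only, not data or analysis from the studies discussed in the paper.

    # Hypothetical example: compare defect detection effectiveness between an
    # original experiment and its replication. All values are invented.
    from scipy import stats

    original = [0.40, 0.55, 0.35, 0.60, 0.50, 0.45, 0.30, 0.65]     # hypothetical rates
    replication = [0.25, 0.30, 0.45, 0.20, 0.35, 0.40, 0.30, 0.50]  # hypothetical rates

    # A nonparametric test avoids normality assumptions, which the small samples
    # typical of software engineering experiments rarely satisfy.
    u_stat, p_value = stats.mannwhitneyu(original, replication, alternative="two-sided")
    print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")

A p-value below the chosen significance level would indicate a difference between runs that a close replication was not expected to show, which is the kind of finding the abstract summarizes.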


Notes

  1. http://www.nobelprize.org/nobel_prizes/medicine/laureates/1945/fleming-faq.html

References

  • Anderson T, Darling D (1952) Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann Math Stat 23(2):193–212

  • Basili VR, Selby RW (1987) Comparing the effectiveness of software testing strategies. IEEE Trans Softw Eng 13(12):1278–1296

  • Basili VR, Shull F, Lanubile F (1999) Building knowledge through families of experiments. IEEE Trans Softw Eng 25(4):456–473

  • Berling T, Runeson P (2003) Evaluation of a perspective based review method applied in an industrial setting. IEE Proc SW 150(3):177–184

  • Cartwright N (1991) Replicability, reproducibility, and robustness: comments on Harry Collins. Hist Polit Econ 23(1):143–155

  • Clarke P, O’Connor RV (2012) The situational factors that affect the software development process: towards a comprehensive reference framework. Inf Softw Technol 54(5):433–447

  • da Silva FQB, Suassuna M, França ACC, Grubb AM, Gouveia TB, Monteiro CVF, dos Santos IE (2012) Replication of empirical studies in software engineering research: a systematic mapping study. Empir Softw Eng. doi:10.1007/s10664-012-9227-7

  • Dybå T, Sjøberg DIK, Cruzes DS (2012) What works for whom, where, when, and why?: on the role of context in empirical software engineering. In: Proceedings of the 11th international symposium on empirical software engineering and measurement, pp 19–28

  • Gomez OS, Juristo N, Vegas S (2010) Replications types in experimental disciplines. In: Proceedings of the fourth international symposium on empirical software engineering and measurement

  • Hannay J, Jørgensen M (2008) The role of deliberate artificial design elements in software engineering experiments. IEEE Trans Softw Eng 34(2):242–259

  • Hetzel W (1972) An experimental analysis of program verification problem solving capabilities as they relate to programmer efficiency. Comput Pers 3(3):10–15

  • Hoaglin D, Andrews D (1975) The reporting of computation-based results in statistics. Am Stat 29(3):112–126

  • Humphrey WS (1995) A discipline for software engineering. Addison-Wesley, Reading, MA

  • Jedlitschka A, Pfahl D (2005) Reporting guidelines for controlled experiments in software engineering. In: Proceedings of the 4th international symposium on empirical software engineering, pp 95–104

  • Jørgensen M, Grimstad S (2011) The impact of irrelevant and misleading information on software development effort estimates: a randomized controlled field experiment. IEEE Trans Softw Eng 37(5):695–707

  • Jørgensen M, Grimstad S (2012) Software development estimation biases: the role of interdependence. IEEE Trans Softw Eng 38(3):677–693

  • Jørgensen M, Gruschke T (2009) The impact of lessons-learned sessions on effort estimation and uncertainty assessments. IEEE Trans Softw Eng 35(3):368–383

  • Jørgensen M, Shepperd M (2007) A systematic review of software development cost estimation studies. IEEE Trans Softw Eng 33:33–53

  • Juristo N, Gomez OS (2012) Replication of software engineering experiments. In: Meyer B, Nordio M (eds) Empirical software engineering and verification. LNCS, vol 7007. Springer, pp 60–88

  • Juristo N, Vegas S (2011) The role of non-exact replications in software engineering experiments. Empir Softw Eng 16(3):295–324

  • Juristo N, Moreno AM, Vegas S (2004) Reviewing 25 years of testing technique experiments. Empir Softw Eng 9(1–2):7–44

  • Juristo N, Moreno AM, Vegas S, Solari M (2006) In search of what we experimentally know about unit testing. IEEE Softw 23:72–80

  • Juristo N, Vegas S, Solari M, Abrahao S, Ramos I (2012) Comparing the effectiveness of equivalence partitioning, branch testing and code reading by stepwise abstraction applied by subjects. In: Proceedings fifth IEEE international conference on software testing, verification and validation, Montreal, Canada, pp 330–339

  • Kitchenham BA, Fry J, Linkman SG (2003) The case against cross-over designs in software engineering. In: 11th international workshop on software technology and engineering practice (STEP 2003), Amsterdam, The Netherlands, pp 65–67

  • Kitchenham BA (2008) The role of replications in empirical software engineering—a word of warning. Empir Softw Eng 13:219–221

  • Kitchenham BA, Al-Khilidar H, Babar MA, Berry M, Cox K, Keung J, Kurniawati F, Staples M, Zhang H, Zhu L (2007) Evaluating guidelines for reporting empirical software engineering studies. Empir Softw Eng 13(1):97–121

  • Kitchenham B, Pearl Brereton O, Budgen D, Turner M, Bailey J, Linkman S (2009) Systematic literature reviews in software engineering—a systematic literature review. Inf Softw Technol 51(1):7–15

  • Laitenberger O (1998) Studying the effects of code inspection and structural testing on software quality. In: Proceedings 9th international symposium on software reliability engineering, pp 237–246

  • Lindsay RM, Ehrenberg ASC (1993) The design of replicated studies. Am Stat 47(3):217–227

  • Mäntylä MV, Lassenius C, Vanhanen J (2010) Rethinking replication in software engineering: can we see the forest for the trees? In: Knutson C, Krein J (eds) 1st international workshop on replication in empirical software engineering research, Cape Town, South Africa

  • Miller J (2000) Applying meta-analytical procedures to software engineering experiments. J Syst Softw 54(1):29–39

  • Miller J (2005) Replicating software engineering experiments: a poisoned chalice or the holy grail. Inf Softw Technol 47(4):233–244

  • Montgomery DC (2001) Design and analysis of experiments, 5th edn. Wiley, New York

  • Pickard L, Kitchenham BA, Jones P (1998) Combining empirical results in software engineering. Inf Softw Technol 40(14):811–821

  • Runeson P, Andrews A (2003) Detection or isolation of defects? An experimental comparison of unit testing and code inspection. In: 14th international symposium on software reliability engineering, pp 3–13

  • Runeson P, Anderson C, Thelin T, Andrews A, Berling T (2006) What do we know about defect detection methods? IEEE Softw 23(3):82–90

  • Runeson P, Stefik A, Andrews A, Grönblom S, Porres I, Siebert S (2011) A comparative analysis of three replicated experiments comparing inspection and unit testing. In: Proceedings 2nd international workshop on replication in empirical software engineering research, Banff, Canada, pp 35–42

  • Runeson P, Höst M, Rainer A, Regnell B (2012) Case study research in software engineering—guidelines and examples. Wiley, New York

  • Schmidt S (2009) Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Rev Gen Psychol 13(2):90–100

  • Shull F, Basili VR, Carver J, Maldonado JC, Travassos GH, Mendonca M, Fabbri S (2002) Replicating software engineering experiments: addressing the tacit knowledge problem. In: Proceedings of the 1st international symposium empirical software engineering, pp 7–16

  • Shull FJ, Carver J, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Softw Eng 13(2):211–218

  • Siegel S, Castellan N (1956) Nonparametric statistics for the behavioural sciences. McGraw-Hill, New York

  • Sjøberg DIK (2007) Knowledge acquisition in software engineering requires sharing of data and artifacts. In: Basili V, Rombach H, Schneider K, Kitchenham B, Pfahl D, Selby R (eds) Empirical software engineering issues: critical assessment and future directions. LNCS, vol 4336. Springer, pp 77–82

  • So S, Cha S, Shimeall T, Kwon Y (2002) An empirical evaluation of six methods to detect faults in software. Softw Test Verif Reliab 12(3):155–171

  • Teasley BE, Leventhal LM, Mynatt CR, Rohlman DS (1994) Why software testing is sometimes ineffective: two applied studies of positive test strategy. J Appl Psychol 79(1):142–155

  • Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslen A (2012) Experimentation in software engineering. Springer

  • Yin RK (2009) Case study research design and methods, 4th edn. Sage Publications, Beverly Hills, CA


Acknowledgements

We thank Sam Grönblom and Ivan Porres, Åbo Akademi University, Finland, for providing data from experiment 3. The first author conducted parts of the work during a sabbatical at North Carolina State University, USA. We thank the anonymous reviewers for helping focus the manuscript and thereby significantly improving it.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Per Runeson.

Additional information

Communicated by: Natalia Juristo

Appendix

Table 16 Defects in the PSP programs; classifications based on Basili and Selby's scheme (Basili and Selby 1987)
Table 17 Defects in the real-time programs; classifications based on Basili and Selby's scheme (Basili and Selby 1987)

Cite this article

Runeson, P., Stefik, A. & Andrews, A. Variation factors in the design and analysis of replicated controlled experiments. Empir Software Eng 19, 1781–1808 (2014). https://doi.org/10.1007/s10664-013-9262-z

