Abstract
In formal experiments on software engineering, the number of factors that may impact an outcome is very high. Some factors are controlled and varied by design, while others are either unforeseen or due to chance. This paper explores how context factors change across a series of formal experiments and identifies implications for experimentation and replication practices, to enable learning from experimentation. We analyze three experiments on code inspections and structural unit testing. The first two experiments use the same experimental design and instrumentation (replication), while the third, conducted by different researchers, replaces the programs and adapts the defect detection methods accordingly (reproduction). Experimental procedures and location also differ between the experiments. Contrary to expectations, there are significant differences between the original experiment and the replication, as well as compared to the reproduction. Some of the differences stem from factors other than those designed to vary between experiments, indicating the sensitivity of software engineering experimentation to context factors. In aggregate, the analysis indicates that researchers who want to obtain reliable and repeatable empirical measures should consider reducing the complexity of software engineering experiments.
Acknowledgements
We thank Sam Grönblom and Ivan Porres, Åbo Akademi University, Finland, for providing data from experiment 3. The first author conducted parts of the work during a sabbatical at North Carolina State University, USA. We thank the anonymous reviewers for helping focus the manuscript and thereby significantly improving it.
Communicated by: Natalia Juristo
Cite this article
Runeson, P., Stefik, A. & Andrews, A. Variation factors in the design and analysis of replicated controlled experiments. Empir Software Eng 19, 1781–1808 (2014). https://doi.org/10.1007/s10664-013-9262-z