ABSTRACT
Context: Test-driven development (TDD) is an agile practice claimed to improve both the quality of a software product and the productivity of its developers. A previous study (i.e., the baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the original study's results by replicating it at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. To limit researcher bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct a blind analysis of the data. Results: Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), or developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carryover effect. Conclusions: We verify the baseline study's results, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which treatments are applied.
We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researcher bias, we encourage other research groups to adopt a multi-site blind analysis approach similar to the one described in this paper.
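The comparison reported in the abstract rests on the Kruskal-Wallis test, a rank-based alternative to one-way ANOVA. As a minimal sketch of how such a test works for two independent groups (where the statistic follows a chi-square distribution with 1 degree of freedom), the pure-Python implementation below ranks the pooled observations and computes the H statistic. The sample scores at the bottom are hypothetical placeholders, not data from the study.

```python
import math

def midranks(values):
    """Rank values 1..N, averaging ranks over ties (mid-ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_two_groups(a, b):
    """Kruskal-Wallis H and p-value for two samples.

    With two groups, df = 1, so the chi-square survival function
    reduces to erfc(sqrt(H/2)). No tie correction is applied.
    """
    n, m = len(a), len(b)
    total = n + m
    ranks = midranks(list(a) + list(b))
    r_a, r_b = sum(ranks[:n]), sum(ranks[n:])
    h = 12 / (total * (total + 1)) * (r_a**2 / n + r_b**2 / m) - 3 * (total + 1)
    h = max(h, 0.0)  # guard against tiny negatives from ties
    p = math.erfc(math.sqrt(h / 2))
    return h, p

# Hypothetical productivity scores for the two treatments.
tdd = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59, 0.53]
tld = [0.58, 0.60, 0.49, 0.70, 0.52, 0.64, 0.57]
h, p = kruskal_two_groups(tdd, tld)
print(f"H = {h:.3f}, p = {p:.3f}")
```

A p-value above the chosen alpha level (commonly .05), as in the study's results, means the test fails to detect a significant difference between the treatments.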