DOI: 10.1145/2961111.2962592
research-article
Best Paper

An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach

Published: 8 September 2016

ABSTRACT

Context: Test-driven development (TDD) is an agile practice claimed to improve the quality of a software product, as well as the productivity of its developers. A previous study (i.e., the baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the results of the original study by replicating it at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. In order to limit researchers' bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct a blind analysis of the data. Results: The Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), or developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carryover effect. Conclusions: We verify the results of the baseline study, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which the treatments are applied.

We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researchers' bias, we encourage other research groups to adopt the multi-site blind analysis approach described in this paper.
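As an illustration of the statistical comparison summarized above, the following minimal sketch shows how a Kruskal-Wallis test between two treatment groups can be run. This is not the authors' analysis script; the use of Python with SciPy, the variable names, and the metric values are assumptions made purely for illustration.

    # Minimal sketch of a Kruskal-Wallis comparison between TDD and TLD groups.
    # Hypothetical values -- not the study's data.
    from scipy.stats import kruskal

    # External code quality scores for participants under each treatment.
    quality_tdd = [62.5, 58.0, 71.3, 45.9, 66.7, 80.0, 55.2]
    quality_tld = [60.1, 63.4, 52.8, 70.5, 49.0, 74.2, 58.8]

    statistic, p_value = kruskal(quality_tdd, quality_tld)
    print(f"Kruskal-Wallis H = {statistic:.3f}, p = {p_value:.3f}")

    # A p-value above the chosen significance level (commonly 0.05), as in the
    # paper's result for external quality (p = .82), means the test detects no
    # statistically significant difference between the treatments.

The same comparison would be repeated for each outcome (testing effort, external code quality, productivity); analyzing order and carryover effects in a crossover design requires additional steps not shown here.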


  • Published in

    ESEM '16: Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
    September 2016, 457 pages
    ISBN: 9781450344272
    DOI: 10.1145/2961111
    Copyright © 2016 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    ESEM '16 paper acceptance rate: 27 of 122 submissions (22%). Overall acceptance rate: 130 of 594 submissions (22%).
