DOI: 10.1145/2884781.2884803

Using (bio)metrics to predict code quality online

Published: 14 May 2016

ABSTRACT

Finding and fixing code quality concerns, such as defects or poor understandability of code, decreases software development and evolution costs. Code reviews are a common industrial practice for identifying code quality concerns early on. While code reviews help to identify problems early, they also impose costs on development and only take place after a code change is already completed. The goal of our research is to automatically identify code quality concerns while a developer is making a change to the code. By using biometrics, such as heart rate variability, we aim to determine the difficulty a developer experiences while working on a part of the code, as well as to identify, and help fix, code quality concerns before they are even committed to the repository.
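
The paper itself contains no code; as a rough, hypothetical illustration of the kind of biometric feature the abstract refers to, the sketch below computes two standard heart rate variability measures, RMSSD and SDNN, from a series of inter-beat (RR) intervals. The function name and the sample data are invented for illustration and are not taken from the study.

```python
import numpy as np

def hrv_features(rr_intervals_ms):
    """Compute two standard heart rate variability (HRV) measures from
    consecutive inter-beat (RR) intervals given in milliseconds."""
    rr = np.asarray(rr_intervals_ms, dtype=float)
    diffs = np.diff(rr)  # differences between successive intervals
    return {
        "RMSSD": np.sqrt(np.mean(diffs ** 2)),  # root mean square of successive differences
        "SDNN": np.std(rr, ddof=1),             # sample standard deviation of all intervals
    }

# Hypothetical beat sequence; higher values indicate higher variability.
print(hrv_features([812, 790, 845, 805, 830, 798]))
```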

In a field study with ten professional developers over a two-week period, we investigated the use of biometrics to determine code quality concerns. Our results show that biometrics are indeed able to predict quality concerns in the parts of the code a developer is working on, improving upon a naive classifier by more than 26% and outperforming classifiers based on more traditional metrics. In a second study with five professional developers from a different country and company, we found evidence that some of the findings from our initial study can be replicated. Overall, the results of the presented studies suggest that biometrics have the potential to predict code quality concerns online and thus lower development and evolution costs.
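
To make the baseline comparison concrete: a naive classifier ignores all features and, for example, always predicts the majority class. The study's actual pipeline is not reproduced here; the following is a minimal sketch of such a comparison, assuming synthetic data, invented feature names, and a scikit-learn random forest, none of which are taken from the paper.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: one row per piece of code a developer worked on,
# with made-up biometric features; label 1 = a quality concern was found.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))  # e.g., RMSSD, SDNN, electrodermal activity
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

naive = DummyClassifier(strategy="most_frequent")  # always predicts majority class
forest = RandomForestClassifier(n_estimators=100, random_state=0)

for name, clf in [("naive baseline", naive), ("random forest", forest)]:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.2f}")
```

The synthetic numbers only illustrate the shape of the comparison; the improvement of more than 26% reported above was measured on the study's real field data.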

Published in

ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016, 1235 pages
ISBN: 9781450339001
DOI: 10.1145/2884781
Copyright © 2016 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 276 of 1,856 submissions (15%)
