Abstract
Source code comments are a valuable instrument for preserving design decisions and communicating the intent of the code to programmers and maintainers. Nevertheless, commenting source code and keeping comments up to date is often neglected because of time pressure or programmers' obliviousness. In this paper, we investigate whether developers comment their code and to what extent they add or adapt comments when they evolve the code. We present an approach that associates comments with source code entities in order to track their co-evolution over multiple versions. A set of heuristics is used to decide whether a comment is associated with its preceding or its succeeding source code entity. We analyzed the co-evolution of code and comments in eight different open source and closed source software systems. We found with statistical significance that (1) the relative amount of comments and source code grows at about the same rate; (2) the type of a source code entity, such as a method declaration or an if-statement, has a significant influence on whether or not it gets commented; (3) in six of the eight systems, code and comments co-evolve in 90% of the cases; and (4) surprisingly, API changes and their comments do not co-evolve; instead, the changed APIs are re-documented in a later revision. As a result, our approach enables a quantitative assessment of the commenting process in a software system. We can therefore leverage the results to provide feedback during development that increases awareness of when to add comments and when to adapt them in response to source code changes.
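The abstract does not spell out the association heuristics. Purely as an illustration of the idea of choosing between a comment's preceding and succeeding entity, the Python sketch below applies three hypothetical rules: a trailing comment on the last line of an entity belongs to that entity; a comment immediately above an entity documents it; otherwise the nearer neighbour by line distance wins. These rules and the names `Entity`, `Comment`, and `associate` are assumptions, not the paper's actual heuristics, and nested entities (a comment inside a method body) are deliberately ignored to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entity:
    """A source code entity (e.g. a method declaration or an if-statement) spanning lines start..end."""
    name: str
    start: int
    end: int


@dataclass(frozen=True)
class Comment:
    """A comment spanning lines start..end."""
    start: int
    end: int


def associate(comment: Comment, entities: Sequence[Entity]) -> Optional[Entity]:
    """Return the preceding or succeeding entity for a comment.

    Hypothetical heuristics for illustration only; entities that enclose
    the comment are not considered.
    """
    before = [e for e in entities if e.end <= comment.start]
    after = [e for e in entities if e.start > comment.end]
    preceding = max(before, key=lambda e: e.end, default=None)
    succeeding = min(after, key=lambda e: e.start, default=None)

    # Rule 1 (assumed): a trailing comment on an entity's last line belongs to that entity.
    if preceding is not None and preceding.end == comment.start:
        return preceding
    # Rule 2 (assumed): a comment directly above an entity documents that entity.
    if succeeding is not None and succeeding.start == comment.end + 1:
        return succeeding
    # Rule 3 (assumed): otherwise pick the nearer neighbour; ties go to the succeeding entity.
    if preceding is None or succeeding is None:
        return succeeding if succeeding is not None else preceding
    if comment.start - preceding.end < succeeding.start - comment.end:
        return preceding
    return succeeding
```

For example, with `entities = [Entity("foo", 1, 5), Entity("if-stmt", 8, 10)]`, a one-line comment on line 7 is attached to `if-stmt` by Rule 2, while a trailing comment on line 5 is attached to `foo` by Rule 1.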
Notes
Compared to the hypotheses of Study 1 and Study 2, this is an assumption rather than a statistical hypothesis. Nevertheless, we use the term hypothesis to keep the organization of the empirical studies consistent.
The detailed results of the other study are not available for publication.
Acknowledgements
This work was supported by the Hasler Foundation as part of the ProMedServices—Proactive Software Service Improvements project. The authors would like to thank the reviewers for their insightful suggestions that greatly helped to improve the paper.
Cite this article
Fluri, B., Würsch, M., Giger, E. et al. Analyzing the co-evolution of comments and source code. Software Qual J 17, 367–394 (2009). https://doi.org/10.1007/s11219-009-9075-x