ABSTRACT
E-mails concerning the development issues of a system constitute an important source of information about high-level design decisions, low-level implementation concerns, and the social structure of developers.
Establishing links between e-mails and the software artifacts they discuss is a non-trivial problem, due to the inherently informal nature of human communication. Different approaches can be brought into play to tackle this traceability issue, but the question of how they can be evaluated remains unaddressed, as there is no recognized benchmark against which they can be compared.
In this article we present such a benchmark, which we created through the manual inspection of a statistically significant number of e-mails pertaining to six unrelated software systems. We then use our benchmark to measure the effectiveness of a number of approaches, ranging from lightweight approaches based on regular expressions to full-fledged information retrieval approaches.
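To make the lightweight end of that range concrete, the following is a minimal sketch of what a regular-expression linker could look like. It is an illustration under stated assumptions, not the technique evaluated in the article: the function name `link_entities`, the sample e-mail, and the class names are all hypothetical, and the matching rule (case-sensitive, whole-word match on the class name) is just one plausible choice.

```python
import re

def link_entities(email_text, class_names):
    """Return the sorted class names that occur in email_text as whole words.

    Hypothetical helper: the matching rule is an assumption made for
    illustration, not the rule used in the article.
    """
    found = set()
    for name in class_names:
        # \b stops "Parser" from matching inside "XmlParser";
        # the search is case-sensitive, so "parser" does not match "Parser".
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        if pattern.search(email_text):
            found.add(name)
    return sorted(found)

# Made-up e-mail body and entity list, used only to exercise the function.
email = "The NPE comes from XmlParser.parse(), not from the Lexer."
print(link_entities(email, ["Parser", "XmlParser", "Lexer", "Token"]))
# prints ['Lexer', 'XmlParser']
```

A rule this simple is cheap to run but will produce false positives for class names that are also common English words, which is the kind of trade-off a benchmark makes measurable.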