ABSTRACT
Every software system (ideally) comes with one or more forms of documentation. Besides source code comments, other structured and unstructured sources (e.g., design documents, API references, wikis, usage examples, tutorials) constitute critical assets. Cloud-based repositories for collaborative development (e.g., GitHub, Bitbucket, GitLab) provide many features for creating, persisting, and versioning documentation artifacts. On the other hand, the last decade has seen the rise of rich instant messaging clients used as global software community platforms (e.g., Slack, Discord). Although completely detached from any specific versioning system or development workflow, they allow developers to discuss implementation issues, report bugs, and, in general, interact with one another.
We refer to this evolving, heterogeneous collection of information sources and documentation artifacts as the documentation landscape. Tools are needed to extract information from these sources and integrate it into a topological visualization that eases comprehension of a software system. How can we automatically generate this topology? How can we link elements in the topology back to the source code they refer to?
The goal of this PhD research is to automatically mine the documentation landscape of a system, disclosing pieces of information that aid, for example, program maintenance tasks. We present our classification of possible documentation sources. The long-term vision is to provide a domain model of the documentation landscape, to build, visualize, and explore its instances for real software systems, and to evaluate the usefulness of the metaphor we propose.
Index Terms
- Topology of the documentation landscape