DOI: 10.1145/3510454.3517068
research-article

Topology of the documentation landscape

Published: 19 October 2022

ABSTRACT

Every software system (ideally) comes with one or more forms of documentation. Besides source code comments, other structured and unstructured sources (e.g., design documents, API references, wikis, usage examples, tutorials) constitute critical assets. Cloud-based repositories for collaborative development (e.g., GitHub, Bitbucket, GitLab) provide many functionalities to create, persist, and version documentation artifacts. At the same time, the last decade has seen the rise of rich instant messaging clients used as global software community platforms (e.g., Slack, Discord). Although completely detached from any specific versioning system or development workflow, they allow developers to discuss implementation issues, report bugs, and, in general, interact with one another.

We refer to this evolving heterogeneous collection of information sources and documentation artifacts as the documentation landscape. Tools are needed to extract information from these sources and integrate it into a topological visualization that eases comprehension of a software system. How can we automatically generate this topology? How can we link elements in the topology back to the source code they refer to?

The goal of this PhD research is to automatically mine the documentation landscape of a system, disclosing pieces of information that aid, for example, in program maintenance tasks. We present our classification of possible documentation sources. The long-term vision is to provide a domain model of the documentation landscape, to build, visualize, and explore its instances for real software systems, and to evaluate the usefulness of the metaphor we propose.


          Published in

          ICSE '22: Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings
          May 2022
          394 pages
          ISBN: 9781450392235
          DOI: 10.1145/3510454

          Copyright © 2022 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 276 of 1,856 submissions, 15%
