Collab-RS: semantic recommendation of external collaborators for projects in software ecosystems

  • Regular Paper
Knowledge and Information Systems

Abstract

The software development industry has evolved in recent years, presenting new challenges. In this scenario, software ecosystems have emerged as a new development paradigm in which external contributors support software production by providing solutions that complement a common ecosystem platform. Because an ecosystem can host many technologies, frameworks, and domains, it also attracts many collaborators acquainted with various domain topics and skills. Recruiting collaborators is complex because each collaborator has a different degree of knowledge and skill and multiple competencies. There is therefore a need to support decision-making in collaborator recruitment using knowledge about collaborators’ skills. This work presents a solution, supported by an ontology, that is capable of recommending external collaborators for specific projects. The solution encompasses an architecture based on semantic models and expertise retrieval techniques. The architecture scores the collaborators’ level of knowledge about topics and provides contextual information for the recommendation. Two studies were conducted involving two real software ecosystem platforms (Node.js and E-SECO). Results reveal that our approach can (i) use semantic models and inference mechanisms, (ii) offer context information essential for recruiter decision-making, and (iii) support recruiters’ decisions on contributor selection.
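The abstract says the architecture scores collaborators’ level of knowledge about topics using expertise retrieval techniques. This preview does not give the scoring function, so the sketch below is only a hedged illustration. It uses a common document-centric expert-finding model: a collaborator’s score for a query is the sum, over that collaborator’s associated documents, of each document’s smoothed query likelihood weighted by p(d|e). The function names, the whitespace tokenizer, the uniform p(d|e), the smoothing weight `lam=0.5`, and the sample data are all assumptions. They are not Collab-RS’s actual implementation.

```python
from collections import Counter
from math import prod


def tokenize(text: str) -> list[str]:
    """Lower-case whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def score_collaborators(query: str,
                        docs_by_collaborator: dict[str, list[str]],
                        lam: float = 0.5) -> dict[str, float]:
    """Document-centric expert scoring: score(e, q) = sum_d p(q|d) * p(d|e).

    p(q|d) is a query likelihood with Jelinek-Mercer smoothing against the
    whole collection; p(d|e) is uniform over each collaborator's documents.
    """
    docs = {e: [tokenize(d) for d in ds] for e, ds in docs_by_collaborator.items()}
    collection = Counter(t for ds in docs.values() for d in ds for t in d)
    collection_len = sum(collection.values()) or 1  # avoid division by zero
    q_terms = tokenize(query)

    def p_query_given_doc(doc: list[str]) -> float:
        tf = Counter(doc)
        doc_len = len(doc) or 1  # empty documents contribute only the background
        return prod((1 - lam) * tf[t] / doc_len + lam * collection[t] / collection_len
                    for t in q_terms)

    scores: dict[str, float] = {}
    for person, person_docs in docs.items():
        if not person_docs:
            scores[person] = 0.0
            continue
        p_doc_given_person = 1 / len(person_docs)
        scores[person] = sum(p_query_given_doc(d) * p_doc_given_person for d in person_docs)
    return scores


# Hypothetical documents associated with two hypothetical collaborators.
docs = {
    "alice": ["express middleware routing", "http server routing"],
    "bob": ["react components ui"],
}
ranking = sorted(score_collaborators("routing", docs).items(),
                 key=lambda kv: kv[1], reverse=True)
print(ranking)  # alice ranks above bob
```

Smoothing with collection statistics keeps a collaborator whose documents lack a query term from scoring exactly zero. In a real deployment, the documents would presumably come from the ecosystem’s sources, such as repository content for Node.js or workflow tags and descriptions for E-SECO.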

Figs. 1–18

Notes

  1. Software product lines (SPLs), or software product line development, refer to software engineering methods, tools, and techniques for creating a collection of similar software systems from a shared set of software assets using a common means of production.

  2. Global software development (GSD) is multi-site software development with software teams scattered across different places around the world.

  3. A software ecosystem is defined as a set of actors (e.g., software developers, vendors, distributors, complementors) interacting on a shared software market to satisfy consumers’ demands.

  4. A socio-technical system (STS) considers hardware, software, personal, and community requirements. It applies an understanding of the social structures, roles, and rights (the social sciences) to inform the design of systems that involve communities of people and technology.

  5. A semantic model is a conceptual data model that can express and exchange information so that parties can interpret meaning (semantics) from the instances without needing to know the meta-model. Such semantic models are fact-oriented (as opposed to object-oriented). Facts are typically expressed as binary relations between data elements, and higher-order relations are expressed as collections of binary relations. Binary relations typically take the form of triples: Object-RelationType-Object.

  6. Expertise Retrieval is defined as finding experts in different subject areas and identifying people’s expertise area(s) using computational techniques.

  7. Developers who work on other SECO projects.

  8. A SECO platform can be defined as a stable core (such as a smartphone operating system or a music streaming service) that mediates the relationship between a wide range of complements (like apps, games, or songs) and prospective developers, managers, and end-users.

  9. A variability model captures the commonality and variability of software product families.

  10. Domain engineering (DE) analyzes software systems for the concepts, notations, and implementation methods and codifies that knowledge in a reusable form.

  11. An ontology encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse.

  12. The details of a systematic mapping we conducted in this area can be accessed at https://doi.org/10.13140/RG.2.2.30637.72161/1.

  13. https://www.w3.org/OWL/.

  14. https://github.com/marciotojr/collab-rs-SBES/blob/master/SECOn.owl.

  15. https://doi.org/10.13140/RG.2.2.12182.78403.

  16. This example was extracted from our evaluation HR1.

  17. Matchers are software capable of identifying the equivalence between components of an ontology (classes, individuals, properties).

  18. This technology can be replaced by a more suitable technique if one is available; only compatibility with the endpoints is required.

  19. https://nodejs.org/.

  20. https://v8.dev.

  21. https://www.npmjs.com.

  22. https://developer.github.com/v3/.

  23. GHTorrent is a set of services and tools available on the web that monitors GitHub’s public events, extracts information from these events, and stores it in databases. http://ghtorrent.org.

  24. https://www.antlr.org.

  25. https://git-scm.com/docs/git-blame.

  26. ANTLR is a parser generator that can be used to read and process structured code. It lets the user build a parse tree that represents the code for analysis at the syntactic level. For example, this representation allows us to retrieve the lines on which the dependencies were called.

  27. In the ANTLR tree representation, the parent element containing the "require" command represents the line on which require is invoked. From the element representing that line, we can check in which constant or variable the imported content is stored and which package was imported. We then continue parsing the file and track uses of that constant or variable. Whenever it is used, we record the line on which the invocation occurred so that we can later verify who used it.

  28. http://pgcc.github.io/plscience-ecos/.

  29. https://www.biocatalogue.org/

  30. https://www.myexperiment.org/home.

  31. The query is composed of a set of terms, which are compared with the textual content directly or indirectly related to the collaborators. In the E-SECO scenario, the query must contain the terms searched for in the tags, titles, descriptions, and other comments about the scientific workflows.

  32. https://www.genome.jp/kegg/.

  33. Clicking the external link icon redirects the user to the pages from which this information was extracted, giving more detail about the type of information that led to the recommendation. Topic provenance is grouped by source type so that the multiple sources of information returned by the query can be checked.

  34. To quote: "The E-SECO platform has a major deficiency today, which is the lack of support for the distributed development of scientific applications. It is easy to specify a distributed experiment where scientists from around the world can collaborate. Based on scientific applications already known to the community, the E-SECO platform can compose scientific experiments. However, when someone wants to develop/use new applications in experiments, the support is minimal, and there is no support for the search for suitable developers. As we can see in the queries’ return, the repositories associated with E-SECO did not help in this search for external developers. These results reinforced the need to evolve the platform considering business and social support. We hope that Collab-RS can assist as the first step in that direction. I look forward to the results achieved by its use".

  35. To quote: "With no doubt the use of Collab-RS improves the results, considering the search for developers specialized in scientific applications related to specific topics, but I expected more outstanding results, to recommend developers related to specific scientific techniques but also that we could identify developers related to specific development technologies, such as java, python, and specific functionalities of scientific applications, which could be easily composed with other applications in a scientific workflow. However, I congratulate the work and the results returned, as it is undoubtedly an important improvement for the E-SECO platform. I also emphasize that I know that the results can be significantly improved by refining the associated repositories’ metadata. In this sense, we must try to create a meta-repository on the E-SECO platform, which stores metadata linked to the applications stored in the associated repositories, to feed Collab-RS’s search with more information, improving its results. Anyway, we have made progress in supporting the search for external developers at E-SECO, but we must continue to improve. Considering the question: If Collab-RS solution can recommend external developers for software projects related to a SECO, I can answer YES, since now, with the use of Collab-RS, we have a mechanism which presents, in detail (context), how the developer recommendation process takes place in E-SECO, but also MAYBE, considering the necessity of more metadata to feed the search process".

  36. Clicking the external link icon redirects the user to the pages from which this information was extracted, giving more detail about the type of information that led to the recommendation. Topic provenance is grouped by source type so that the multiple sources of information returned by the query can be checked.
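Note 5 describes semantic models as fact-oriented: facts are binary relations in Object-RelationType-Object form, and higher-order relations are collections of binary relations. The minimal sketch below shows this with plain Python triples. A ternary fact (developer, project, topic) is represented by a typed fact node plus one binary relation per role. All entity, relation, and function names are hypothetical and illustrative only. They are not taken from the paper’s ontology.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A binary fact in Object-RelationType-Object form."""
    subject: str
    relation: str
    obj: str


def reify(fact_id: str, relation_type: str, **roles: str) -> set[Triple]:
    """Express an n-ary fact as a collection of binary relations around a fact node."""
    triples = {Triple(fact_id, "type", relation_type)}
    triples.update(Triple(fact_id, role, value) for role, value in roles.items())
    return triples


def objects(store: set[Triple], subject: str, relation: str) -> set[str]:
    """All objects o such that (subject, relation, o) is in the store."""
    return {t.obj for t in store if t.subject == subject and t.relation == relation}


# Hypothetical facts; the names are illustrative, not taken from the paper.
store: set[Triple] = {
    Triple("alice", "contributesTo", "pkg-http"),
    Triple("pkg-http", "partOf", "Node.js"),
}
# A ternary fact (developer, project, topic) becomes a typed fact node
# plus one binary relation per role.
store |= reify("contribution-1", "Contribution",
               developer="alice", project="pkg-http", topic="routing")

print(objects(store, "contribution-1", "topic"))  # {'routing'}
```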
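Notes 11, 13, and 14 indicate that the ontology is expressed in OWL, and the abstract refers to inference mechanisms. A full OWL reasoner does far more than the code below. The core idea of inference over such a model can still be sketched by applying two RDFS-style rules to plain tuples until no new fact is derived. The class and individual names are hypothetical, and this is not the SECOn ontology or its actual rule set.

```python
Fact = tuple[str, str, str]


def infer(facts: set[Fact]) -> set[Fact]:
    """Forward-chain two RDFS-style rules until a fixpoint is reached.

    subclass transitivity: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    type inheritance:      (x type A), (A subClassOf B)       => (x type B)
    """
    closed = set(facts)
    while True:
        sub = {(s, o) for s, p, o in closed if p == "subClassOf"}
        new = {(a, "subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "type", b)
                for x, p, a in closed if p == "type"
                for a2, b in sub if a == a2}
        if new <= closed:  # nothing new derived: fixpoint
            return closed
        closed |= new


# Hypothetical class hierarchy and individual.
facts = {
    ("WebFramework", "subClassOf", "Framework"),
    ("Framework", "subClassOf", "Technology"),
    ("pkg-http", "type", "WebFramework"),
}
inferred = infer(facts)
print(("pkg-http", "type", "Technology") in inferred)  # True
```

The loop recomputes every rule over all facts on each pass. That is simple but quadratic, and real reasoners use indexed, incremental strategies instead.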
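Note 17 defines matchers as software that identifies equivalences between ontology components. Practical matchers combine lexical, structural, and semantic evidence. The hedged sketch below covers only the simplest lexical case. It normalizes camelCase and snake_case labels and accepts pairs whose `difflib.SequenceMatcher` similarity ratio reaches an assumed threshold of 0.85. The function names, the threshold, and the example labels are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Split camelCase/snake_case/kebab-case labels into lower-case words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    spaced = spaced.replace("_", " ").replace("-", " ")
    return " ".join(spaced.lower().split())


def match(source: list[str], target: list[str],
          threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return (source, target, similarity) pairs whose normalized labels are similar enough."""
    pairs = []
    for s in source:
        for t in target:
            sim = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if sim >= threshold:
                pairs.append((s, t, round(sim, 3)))
    return pairs


# Hypothetical class labels from two ontologies.
pairs = match(["JavaScriptLibrary", "Developer"], ["javascript_library", "Contributor"])
print(pairs)  # [('JavaScriptLibrary', 'javascript_library', 0.973)]
```

A purely lexical matcher cannot tell that "Developer" and "Contributor" may denote the same concept. Closing that gap requires structural or semantic evidence, which is why this sketch is only a starting point.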
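Notes 26 and 27 describe using an ANTLR parse tree to find the constant or variable that stores a `require`d package, then recording every line on which that binding is used. The sketch below substitutes a much cruder technique, regular expressions applied line by line, to illustrate the same bookkeeping without an ANTLR grammar. It is an assumption-laden simplification, not the authors’ parser. It handles only `const|let|var name = require('pkg')` forms, ignores destructuring, comments, and scoping, and does not count the import line itself as a usage. The function name is hypothetical.

```python
import re

# Matches simple CommonJS imports such as: const express = require('express');
REQUIRE = re.compile(r"""\b(?:const|let|var)\s+(\w+)\s*=\s*require\(\s*['"]([^'"]+)['"]\s*\)""")


def dependency_usage_lines(source: str) -> dict[str, list[int]]:
    """Map each required package to the 1-based lines where its binding is used."""
    bindings: dict[str, str] = {}   # constant/variable name -> package name
    usage: dict[str, list[int]] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        found = REQUIRE.search(line)
        if found:
            var, pkg = found.groups()
            bindings[var] = pkg
            usage.setdefault(pkg, [])
            continue  # simplification: the import line itself is not a usage
        for var, pkg in bindings.items():
            if re.search(rf"\b{re.escape(var)}\b", line):
                usage[pkg].append(lineno)
    return usage


js = """const express = require('express');
const app = express();
app.get('/', (req, res) => res.send('ok'));
const _ = require("lodash");
console.log(_.uniq([1, 1, 2]));
"""
print(dependency_usage_lines(js))  # {'express': [2], 'lodash': [5]}
```

A parse tree avoids the obvious false positives of this approach, such as a package name appearing inside a string or comment. That robustness is presumably why the paper uses ANTLR.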
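Note 25 points to the git blame documentation, and note 27 records usage lines so that it can later be verified who used a dependency. A minimal sketch of that attribution step follows. It assumes the output of `git blame --porcelain <file>` has already been captured as a string. In that format, each entry starts with a header line holding the commit hash, the original and final line numbers, and optionally a group size. Commit metadata such as the `author` line appears only the first time a commit is seen, and the source line itself follows prefixed by a TAB. The sample output below is abbreviated and hypothetical (real output also includes `author-time`, `committer`, and other headers, which the parser ignores). The function name and the usage-line input are also hypothetical.

```python
import re

# Entry header: <sha> <orig-line> <final-line> [<lines-in-group>]
HEADER = re.compile(r"^([0-9a-f]{40}|[0-9a-f]{64}) \d+ (\d+)(?: \d+)?$")


def authors_by_line(porcelain: str) -> dict[int, str]:
    """Map final-file line numbers to author names from `git blame --porcelain` output.

    Commit headers (author, summary, ...) appear only the first time a commit
    is seen, so author names are cached per commit hash.
    """
    author_of_commit: dict[str, str] = {}
    result: dict[int, str] = {}
    sha, final_line = "", 0
    for line in porcelain.splitlines():
        if line.startswith("\t"):
            # Content line (prefixed by a TAB) closes the current entry.
            result[final_line] = author_of_commit.get(sha, "unknown")
            continue
        header = HEADER.match(line)
        if header:
            sha, final_line = header.group(1), int(header.group(2))
        elif line.startswith("author "):
            author_of_commit[sha] = line[len("author "):]
    return result


c1, c2 = "1" * 40, "2" * 40  # hypothetical commit hashes
sample = (
    f"{c1} 1 1 2\nauthor Alice\nauthor-mail <alice@example.com>\n"
    "summary Add server\nfilename index.js\n\tconst express = require('express');\n"
    f"{c1} 2 2\n\tconst app = express();\n"
    f"{c2} 3 3 1\nauthor Bob\nsummary Add route\nfilename index.js\n"
    "\tapp.get('/', handler);\n"
)
usage_lines = {"express": [2]}  # hypothetical lines where a dependency binding is used
authors = authors_by_line(sample)
who_used = {pkg: {authors[n] for n in lines} for pkg, lines in usage_lines.items()}
print(who_used)  # {'express': {'Alice'}}
```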

Acknowledgements

We would like to thank the people who participated in the evaluation.

Funding

This work was partially funded by UFJF/Brazil, CAPES/Brazil, CNPq/Brazil (Grant: 307194/2022-1), and FAPEMIG/Brazil (Grants: APQ-02685-17 and APQ-02194-18).

Author information

Corresponding author

Correspondence to Regina Braga.

Ethics declarations

Conflict of interest

All authors: none.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Oliveira, M., Braga, R., Ghiotto, G. et al. Collab-RS: semantic recommendation of external collaborators for projects in software ecosystems. Knowl Inf Syst 66, 147–186 (2024). https://doi.org/10.1007/s10115-023-01954-y

  • DOI: https://doi.org/10.1007/s10115-023-01954-y
