How ontologies are made: Studying the hidden social dynamics behind collaborative ontology engineering projects
Introduction
Today, large-scale ontologies in fields such as biomedicine are developed collaboratively by large numbers of distributed users, using tools such as collaborative Protégé [1], [2] that provide structured logs of changes to the ontology. Evaluating the outcome of such collaborative ontology engineering efforts is a problem of pressing practical and theoretical relevance. For managers and quality assurance personnel, understanding the quality of collaboratively constructed ontologies (and how they have been constructed) is key. For developers of tools for collaborative ontology construction, understanding these processes will help improve the tools and make them fit more naturally into the process that is already taking place. For researchers, collaborative ontology engineering projects involving large numbers of users add a new social layer and additional complexity to an already complex theoretical problem. Therefore, we need new methods and techniques to analyze and further investigate the social dynamics of collaborative ontology engineering efforts.
Traditionally, evaluation methods in the field of semantic technologies have focused on the end result of ontology engineering efforts, mainly on evaluating ontologies and their corresponding qualities and characteristics. This focus has led to the development of a useful arsenal of ontology-evaluation techniques that study and investigate the quality of ontologies as a product [3]. However, ontology evaluation remains a wide-open problem, and we need new techniques, especially for ontologies that are constructed collaboratively. For example, evaluating an ontology that has been constructed by hundreds of users without understanding who these users are, what they have contributed, where they disagreed with one another, or how they participated would paint a very narrow picture of the ontology under investigation. We argue that understanding the usually hidden social dynamics that have led to the construction of ontologies has the potential to create new insights and opportunities for ontology evaluation.
Our main objective in this paper is to study the social fabric of collaborative ontology engineering projects empirically, as a prerequisite for devising future evaluation methods that investigate the social processes behind such projects. Our high-level hypothesis is that quantitative analysis of ontology change data can provide qualitative insights into characteristics of collaborative ontology construction processes.
Our work is inspired by researchers who investigate the social dynamics behind collaborative construction processes in a range of domains, including open source software and collaborative authoring systems such as Wikipedia. Whenever possible, we leverage and adapt work from these areas to study social dynamics in the context of collaborative ontology construction. For example, Suh et al. [4] analyzed the influence of a set of different factors on collaboration between Wikipedia editors. Voss [5] analyzed different attributes of Wikipedia articles and users, such as the number of edits contributed by each user or the number of distinct users who worked on each article. Blumenstock [6] and Wilkinson and Huberman [7], on the other hand, analyzed Wikipedia articles and found, among other things, that the number of changes performed on an article correlates with its quality. Stamelos et al. [8] studied the quality of code in open source software development projects by counting specific attributes of the committed source code and comparing them against industry standards.
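The two attributes mentioned for Voss [5] (edits per user and distinct users per article) are simple to compute from any edit log, and the same counts transfer directly to ontology concepts. The following Python sketch assumes a hypothetical log of (user, item) pairs, where an item may be a Wikipedia article or an ontology concept; the function names are illustrative and are not taken from [5] or from this paper.

```python
from collections import Counter, defaultdict

# Hypothetical edit record: (user, item). An "item" can be a Wikipedia
# article or an ontology concept; real logs carry many more fields.
Edit = tuple[str, str]


def edits_per_user(log: list[Edit]) -> Counter:
    """Count how many edits each user contributed."""
    return Counter(user for user, _ in log)


def distinct_users_per_item(log: list[Edit]) -> dict[str, int]:
    """Count how many distinct users edited each item."""
    editors = defaultdict(set)
    for user, item in log:
        editors[item].add(user)
    return {item: len(users) for item, users in editors.items()}


if __name__ == "__main__":
    log = [
        ("alice", "Asthma"), ("bob", "Asthma"), ("alice", "Asthma"),
        ("carol", "Influenza"), ("alice", "Influenza"),
    ]
    print(edits_per_user(log))
    print(distinct_users_per_item(log))
```

Keeping both counts as plain dictionaries makes it easy to inspect their distributions afterwards, for example to see whether a few users account for most of the edits.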
Research questions: Using historical data from five different collaboratively constructed ontologies in the field of biomedicine and a sample of Wikipedia articles as a control, we aim to study the following research issues:
1. Dynamic aspects (Section 4.1):
   (a) How does activity in the system evolve over time?
   (b) How are changes to the ontology distributed across concepts?
   (c) How does activity in ontology engineering projects differ from activity in other collaborative authoring systems such as Wikipedia?
2. Social aspects (Section 4.2):
   (a) Is collaboration actually happening, or do users work independently?
   (b) How is the work distributed among users?
   (c) How does collaboration in ontology engineering differ from collaboration in other systems such as Wikipedia?
3. Lexical aspects (Section 4.3):
   (a) Is the vocabulary in the ontology stabilizing, or does it continue to change and grow?
   (b) Are the concepts in the ontology lexically stabilizing, or do they continue to change?
4. Behavioral aspects (Section 4.4):
   (a) Are collaborative ontologies constructed in a top-down or a bottom-up manner?
   (b) Are collaborative authoring systems such as Wikipedia constructed in the same manner (top-down or bottom-up) as collaboratively engineered ontologies?
   (c) How do contributors allocate activity across different abstraction levels in different ontologies?
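Question 4(a) does not come with a ready-made measure. One possible proxy, shown here purely as an assumption and not necessarily the measure used in Section 4.4, is to track the mean hierarchy depth of the concepts changed in each time period: a depth that rises over time is consistent with work moving from general to specific concepts (top-down), a falling depth with the opposite. The hierarchy is simplified to a hypothetical child-to-parent mapping; real ontologies often allow multiple parents.

```python
from statistics import mean


def depth(concept: str, parent: dict[str, str]) -> int:
    """Depth of a concept below the root (root = 0), assuming a simplified
    single-parent hierarchy given as a child -> parent mapping."""
    seen = {concept}
    d = 0
    while concept in parent:
        concept = parent[concept]
        if concept in seen:
            raise ValueError("cycle in hierarchy")
        seen.add(concept)
        d += 1
    return d


def mean_depth_by_period(changes, parent):
    """changes: iterable of (period, concept) pairs. Returns the mean depth
    of the changed concepts for each period, in period order."""
    depths: dict = {}
    for period, concept in changes:
        depths.setdefault(period, []).append(depth(concept, parent))
    return {p: mean(ds) for p, ds in sorted(depths.items())}


if __name__ == "__main__":
    parent = {
        "Respiratory disease": "Disease",
        "Asthma": "Respiratory disease",
        "Influenza": "Respiratory disease",
    }
    changes = [(1, "Disease"), (1, "Respiratory disease"),
               (2, "Asthma"), (2, "Influenza")]
    # In this toy example the mean depth rises from period 1 to period 2.
    print(mean_depth_by_period(changes, parent))
```

Mean depth is only a coarse signal; a fuller analysis would also look at the spread of depths per period, since a project can work on several abstraction levels at once.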
Contributions: To the best of our knowledge, the work presented in this paper represents the most fine-grained study of social dynamics in very large collaborative ontology engineering projects to date. We develop and apply quantitative metrics that help answer qualitative questions related to dynamic, social, lexical, and behavioral aspects of collaborative ontology engineering processes. Our results show that (i) there are qualitative differences between different collaborative ontology engineering projects that demand explanations in terms of organizing and managing quality in such projects and (ii) there are also interesting commonalities that set collaborative ontology engineering projects apart from other collaborative authoring projects such as Wikipedia. Our findings suggest that collaborative ontology engineering represents a novel and interesting phenomenon with unique characteristics that warrant more research in this direction.
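To make the idea of a quantitative metric for a qualitative question concrete, here is a minimal Python sketch for question 3(a), vocabulary stabilization. It is an illustrative assumption rather than the metric used in this paper: concept labels are split into lower-cased word tokens, and for chronologically ordered snapshots of the ontology we count how many tokens each snapshot introduces for the first time. The function names and toy labels are hypothetical.

```python
import re


def tokens(label: str) -> set[str]:
    """Lower-cased word tokens of a concept label (simple regex tokenizer)."""
    return set(re.findall(r"\w+", label.lower()))


def new_tokens_per_snapshot(snapshots: list[list[str]]) -> list[int]:
    """For chronologically ordered snapshots of concept labels, count how
    many tokens each snapshot introduces that no earlier snapshot had."""
    seen: set[str] = set()
    growth = []
    for labels in snapshots:
        vocab = set().union(*(tokens(label) for label in labels))
        growth.append(len(vocab - seen))
        seen |= vocab
    return growth


if __name__ == "__main__":
    snapshots = [
        ["Acute asthma", "Influenza"],
        ["Acute asthma", "Chronic asthma"],
        ["Chronic asthma", "Influenza virus"],
    ]
    print(new_tokens_per_snapshot(snapshots))  # [3, 1, 1]
```

A sequence that tends toward zero would suggest a stabilizing vocabulary, while persistently high counts would suggest continued growth. A realistic version would need a tokenizer suited to biomedical terminology.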
The paper is structured as follows: In Section 2, we review related work. In Section 3, we introduce the data sets used in this study and provide descriptive statistics. In Section 4, we present the results of our comparative study of change logs. In Section 5, we discuss and interpret our findings. We conclude in Section 6 with a summary of our findings and their implications.
Related work
For the research presented in this paper, we consider work from the following domains relevant: ontology evaluation, collaborative ontology engineering, and collaborative authoring systems.
Material and methods
In this study, we use two main types of data: first, a set of biomedical ontologies that are being developed collaboratively in Protégé (and its derivatives), together with a set of Wikipedia articles describing biomedical terms as a control (Section 3.1); and second, the structured change logs that reflect the collaborative development of these resources (Section 3.2).
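Structured change logs lend themselves to simple aggregate measures, for instance for research question 1(a) on how activity evolves over time. A minimal Python sketch, assuming a hypothetical record type `Change` with `user`, `concept` and `timestamp` fields (not the actual schema of any Protégé change log), that counts changes per calendar month:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    """Hypothetical change-log record; field names are illustrative and
    do not reproduce the schema of any Protégé change log."""
    user: str
    concept: str
    timestamp: datetime


def changes_per_month(changes: list[Change]) -> dict[str, int]:
    """Count changes per calendar month, returned in chronological order."""
    # "YYYY-MM" keys sort lexicographically in chronological order.
    counts = Counter(c.timestamp.strftime("%Y-%m") for c in changes)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    log = [
        Change("alice", "Asthma", datetime(2011, 3, 2)),
        Change("bob", "Asthma", datetime(2011, 3, 15)),
        Change("alice", "Influenza", datetime(2011, 1, 7)),
    ]
    print(changes_per_month(log))  # {'2011-01': 1, '2011-03': 2}
```

Months without any change are simply absent from the result; a time-series analysis would typically fill them in with zeros before plotting or comparing projects.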
Results
In the following, we present results from our empirical investigations on dynamic, social, lexical and behavioral aspects of collaborative ontology engineering processes.
Summary and discussion
In this paper, we present an analysis of quantitative data that characterizes collaborative development of several large biomedical ontologies. The analysis of this quantitative data enabled us to gain qualitative insight into dynamic, social, lexical, and behavioral aspects of the process of ontology engineering itself. We summarize these insights in the rest of this section by revisiting the set of our initial research questions:
Conclusions
This work exposes the hidden social dynamics behind collaborative ontology engineering projects. The main results of this paper are twofold: (i) On a theoretical level, our work makes an argument for expanding the existing arsenal of ontology evaluation techniques with new techniques that analyze the social dynamics behind collaborative ontology engineering projects. (ii) On an empirical level, our work conducts a broad investigation of five real-world collaborative ontology engineering
Acknowledgments
We thank the World Health Organization for providing us with change-tracking data for ICD-11 and ICTM and for answering our questions to help validate our results.
References (49)
- et al., NCI thesaurus: a semantic model integrating cancer-related clinical and molecular information, J. Biomed. Inform. (2007).
- et al., The biomedical resource ontology (BRO) to enable resource discovery in clinical and translational research, J. Biomed. Inform. (2011).
- et al., Supporting collaborative ontology development in protégé.
- et al., Web Protégé: a distributed ontology editor and knowledge acquisition tool for the web, Semant. Web J. (2011).
- J. Brank, M. Grobelnik, D. Mladenić, A survey of ontology evaluation techniques, in: Proceedings of the Conference on...
- et al., Us vs. them: understanding social dynamics in wikipedia with revert graph visualizations.
- Measuring wikipedia.
- Size matters: word count as a measure of quality on wikipedia.
- et al., Cooperation and quality in wikipedia.
- et al., Code quality analysis in open-source software development, Inf. Syst. J. (2002).
- Measuring similarity between ontologies.
- Ontologies are us: a unified model of social networks and semantics.
- The evaluation of ontologies.
- Semantic MediaWiki.
- OntoWiki – a tool for social, semantic collaboration.
- MoKi: the enterprise modelling wiki.
- Poolparty: SKOS thesaurus management utilizing linked data, Semant. Web: Res. Appl.
- Will Semantic Web technologies work for the development of ICD-11?
Cited by (24)
Visualization and interaction for ontologies and linked data—Editorial
2019, Journal of Web Semantics
Analyzing user interactions with biomedical ontologies: A visual perspective
2018, Journal of Web Semantics
Citation excerpt: These studies have used the data provided by logs of user activity in collaborative ontology development tools. Strohmaier et al. [12] conducted an empirical investigation using user activity logs to measure the impact of collaboration on ontology-engineering projects. The authors developed several new metrics to quantify different aspects of the hidden social dynamics that take place in these collaborative ontology-engineering projects from the biomedical domain.
How to apply Markov chains for modeling sequential edit patterns in collaborative ontology-engineering projects
2015, International Journal of Human Computer Studies
Citation excerpt: The authors applied it to the analysis of the ICD-11 project. Strohmaier et al. (2013) investigated the hidden social dynamics that take place in collaborative ontology-engineering projects from the biomedical domain and provided new metrics to quantify various aspects of the collaborative engineering processes. Falconer et al. (2011) investigated the change-logs of collaborative ontology-engineering projects, showing that contributors exhibit specific roles, which can be used to group and classify these users, when contributing to the ontology.
Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains
2014, Journal of Biomedical Informatics
Citation excerpt: In contrast to Mikroyannidi et al., our analysis focuses on the detection of sequential patterns in interaction data rather than content. Strohmaier et al. [23] investigated the hidden social dynamics that take place in collaborative ontology-engineering projects from the biomedical domain and provides new metrics to quantify various aspects of the collaborative engineering processes. Wang et al. [24] have used association-rule mining to analyze user editing patterns in collaborative ontology-engineering projects.
Analysis and implementation of the DynDiff tool when comparing versions of ontology
2023, Journal of Biomedical Semantics
DynDiff: A Tool for Comparing Versions of Large Ontologies
2022, CEUR Workshop Proceedings