Beware of the hierarchy — An analysis of ontology evolution and the materialisation impact for biomedical ontologies

Ontologies are becoming a key component of numerous applications and research fields. But knowledge captured within ontologies is not static. Some ontology updates potentially have a wide-ranging impact; others only affect very localised parts of the ontology and their applications. Investigating the impact of the evolution gives us insight into the editing behaviour but also signals to ontology engineers and users how the ontology evolution is affecting other applications. However, such research is in its infancy. Hence, we need to investigate the evolution itself and its impact on the simplest of applications: the materialisation. In this work, we define impact measures that capture the effect of changes on the materialisation. In the future, the impact measures introduced in this work can be used to investigate how aware the ontology editors are about consequences of changes. By introducing five different measures, which focus either on the change in the materialisation with respect to the size or on the number of changes applied, we are able to quantify the consequences of ontology changes. To see these measures in action, we investigate the evolution and its impact on materialisation for nine open biomedical ontologies, most of which adhere to the EL ++ description logic. Our results show that these ontologies evolve at varying paces but no statistically significant difference between the ontologies with respect to their evolution could be identified. We identify three types of ontologies based on the types of complex changes which are applied to them throughout their evolution. The impact on the materialisation is the same for the investigated ontologies, bringing us to the conclusion that the effect of changes on the materialisation can be generalised to other similar ontologies. Further, we found that the materialised concept inclusion axioms experience most of the impact induced by changes to the class inheritance of the ontology and other changes only marginally touch the materialisation.


Introduction
Change is inevitable. Especially when capturing our growing knowledge, adaptation becomes necessary [1]. Therefore, ontologies like the Gene Ontology (GO) [2] and the National Cancer Institute Thesaurus (NCIT) [3] change over time to adapt the representation to the evolving knowledge. Experts or communities usually take care of the maintenance of these ontologies by adding new knowledge and removing or updating outdated and wrong information. For example, GO evolves as experts add new gene annotations of living organisms. Researchers then apply functions to this graph, such as functional enrichment analysis [4,5], mRNA expression, proteomics, genetic, or DNA methylation data analysis [6].
In this study, we focus on a general function: the materialisation. It is a deterministic logical entailment, which infers implicit statements. We investigate the materialisation because it is a common computation on ontologies, used to check for consistency but also as a basis for subsequent tasks such as querying and recommendations. A small change in a class definition can have a notable impact on the ontology: it can significantly change the number of materialised axioms or even affect the ontology's consistency. However, not every change in the ontology leads to significant changes in the materialised graph. What are the consequences of ontology evolution on functions like the materialisation over time? Moreover, how does this change in materialisation further impact other tasks and applications?
A materialised ontology is larger than the original and its computation may consume considerable amounts of resources. As a consequence, indicating that a change may cause a potentially significant difference in materialisation would signal the necessity for re-computation. This could lead not only to the recomputation of the materialisation but also of other results and calculations built on top of the materialisation. Hence, we believe that curators and users need to be aware of the evolution and its consequences.

RQ1.1.
What is the relation between the number of changes applied to the ontology and its size for open biomedical ontologies, the evolution of which is available online?

RQ1.2.
What are the most common complex changes which are applied to open biomedical ontologies, the evolution of which is available online?
We consider the growth of the ontologies in different aspects like number of triples or axioms, and the changes between consecutive versions themselves. We make use of the classification of complex changes provided by COnto-Diff [14] to also assess the evolution and the applied changes in more detail.
Due to the lack of measures to compare materialisations, we first define such an instrument. Therefore, our second research question is:

RQ2.
What are measures that are simple to compute and allow us to assess the impact of the ontology change on the ontology's materialisation?
We focus on measures that are meaningful in quantifying the impact, and at the same time they require a limited amount of calculation to compare two materialised ontologies. Our goal is to be able to compute these measures in an online environment like Protégé [15], without negatively affecting the usability of such tools. Impact measures can be added to the ChImp plugin [16] to widen the range of information displayed to the ontology editors while they are changing an ontology. We define two sets of impact measures to be able to capture different relations between the changes and the impact on the materialisation. The first set exhibits the consequences of ontology changes with respect to the size of the materialisation. The second set focuses on the changes to the materialisation with respect to the underlying changes to the ontology. Then, we calculate the impact measures for the selected ontologies and compare them. Given the previous evolution analysis, we want to identify relevant aspects of changes and change types for the impact. Therefore, our last research question is:

RQ3.
What aspects of changes and which complex change types have the largest share in the impact on the materialisation?
We found that a distinction between ontologies based purely on size is not statistically significant. Nonetheless, we observe a very high correlation between the number of changes and impact for all ontologies. Additions of leaves, moves, and changes of attribute values are the three most common change types in the evolution of the nine selected ontologies. Further, we find that concept inclusion axiom changes have the highest impact on the materialisation. Most ontologies only experience impact by changes to the class hierarchy. At the same time, the impact is observed on the materialised class hierarchy as well. Most other changes do not have a significant influence on the materialisation at all. Interestingly, complex change actions do not correlate with the impact on the materialisation. The location of the changes (subclass changes) is much more important to determine if changes will have a large impact or not.

Table 1 Definitions of symbols used in Fig. 1 and in the definitions of metrics in Section 3.1.
Symbol      Description
O i         Ontology at time i
δ − i       Removed axioms
∆ + i       New, inferred axioms part of M i+1 but not in M i
∆ − i       Old, inferred axioms part of M i but not in M i+1
            Quantification of the difference between M i and M i+1
In conclusion, this study presents the following contributions:
• the definition of five novel impact measures, which capture the effect of changes on the materialisation, and
• an investigation of the evolution of nine OBO ontologies and their evolution's impact on the materialisation, where we found that:
  - the size of an ontology does not have a statistically significant effect on the number of changes applied,
  - addition of leaves, moves, and changes of attribute values are the most common complex change action types,
  - the impact on the materialisation is minimal compared to the size of the ontologies,
  - the impact correlates highly with the (absolute and relative) number of changes applied for all ontologies, and, lastly,
  - most changes are applied within the class hierarchy, which also leads to the most impact being on the class hierarchy as well.
This paper is structured as follows. Section 2 explains and formalises the problem with the corresponding background information and related research. We define the ontology and impact measures in Section 3 and explain our calculation approach. In Section 4, we introduce the ontologies in detail. We analyse and discuss the evolution as well as the impact on the materialisation in Section 5 and address limitations and future work in Section 6. Lastly, Section 7 concludes this work.

Background and related research
This section formalises the problem of ontology evolution and its impact on downstream tasks. A visualisation based on our previous work [17] is shown in Fig. 1. We explain it in the remainder of this section.
We also introduce the background to the various aspects of Fig. 1, as well as the related research. First, we formally define ontologies. Then, we introduce the materialisation task and how it can be executed. The next step is the topic of ontology evolution as well as the formal definition of streams upon which the visualisation and definition from [17] are based. Lastly, we present the topic of impact of evolution and we discuss the related research. The symbols used in Fig. 1 are defined in Table 1.

Ontology and EL ++
We use the definition of ontology proposed by Baader, Brandt, and Lutz in [18]. Such ontologies are expressed through the description logic EL ++ . OWL 2 EL, which adheres to EL ++ , is an OWL 2 profile suited for ontologies with a very large number of concepts organised in complex structures. In EL ++ , the consistency checking and class expression subsumption tasks can be executed in polynomial time. As such, it is often used in biological and biomedical ontology engineering. When talking about ontologies in this work, we refer to such EL ++ ontologies and denote an ontology by O.
An ontology is made of two sets: the TBox and the ABox. A TBox consists of four types of axioms: general concept inclusions (GCIs), role inclusions, domain restrictions, and range restrictions. The ABox includes concept and role assertions. We use T and A to refer to the TBox and ABox, respectively.
As in other description logics, EL ++ distinguishes between classes, individuals, and properties. The set of classes is composed of classes defined in the TBox and used as types in the ABox. Individuals are entities belonging to classes. The set of properties includes those defined in the TBox, and used in the ABox to relate individuals.

Materialisation and reasoners
Materialisation is the process of calculating the implicit statements in an ontology. This is done by taking into account both the ontological language used to define the ontology as well as the axioms and assertions stored in the ontology. We define the materialisation mat(·) as a function that is applicable to an ontology and produces the result M, as shown in Fig. 1. In this work, the materialisation M does not include O.
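To make the definition of mat(·) concrete, the following minimal sketch restricts an ontology to asserted subclass pairs and computes only the transitive part of the entailment. It is a toy stand-in, not the procedure a reasoner such as HermiT or ELK follows; the function name `materialise` and the class names are hypothetical. The point it illustrates is that M contains inferred axioms only and is therefore disjoint from O.

```python
from itertools import product


def materialise(asserted):
    """Return the inferred subclass pairs, excluding the asserted ones."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        # Chain (a, b) and (b, d) into (a, d) until a fixpoint is reached.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    # M does not include O: drop everything that was explicitly asserted.
    return closure - set(asserted)


# Hypothetical toy ontology O: asserted (subclass, superclass) pairs.
O = {
    ("Neuron", "Cell"),
    ("Cell", "AnatomicalEntity"),
    ("AnatomicalEntity", "Entity"),
}
M = materialise(O)
print(sorted(M))
```

Even in this toy case, three asserted axioms yield three inferred ones, which hints at why a materialised ontology is larger than the original.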
There are various reasoners that can perform a materialisation; they focus on different ontological languages and implement distinct algorithms. HermiT [19] and FaCT++ [20] are two examples of general purpose reasoners that handle OWL 2 DL. The HermiT reasoner can check for consistency as well as identify subsumption relationships [19]. However, the reasoning becomes incomplete if the ontology contains property chains or transitivity axioms. FaCT++ is another reasoner for expressive description logics [20]. FaCT++ and HermiT deploy different reasoning algorithms to obtain the materialisation but are expected to return the same results. There are also reasoners which are specifically developed to support ontologies in EL ++ logic, such as ELK [21], CEL [22], and TrOWL [23]. Lastly, there are incremental reasoners, such as RDFox and Pellet, which are especially useful when dealing with ontology evolution. Incremental reasoning reuses previous results when dealing with dynamic knowledge bases. RDFox is a triple store which supports parallel datalog reasoning and incremental updates to the materialisation [24]. Pellet is a sound and complete OWL-DL reasoner that supports reasoning with individuals, custom data types, and other unique features [25].

Ontology evolution
We define an evolving ontology O as a sequence of ontology versions, where O i denotes the ontology version i. This definition of evolving ontology is similar to the one of ontology stream proposed by Ren and Pan in [26]. Let O i and O i+1 be two consecutive versions in O. The update of O between i and i + 1 is described by a set of changes δ i . Each δ i is a set of edits that are authored by one or more agents, such as ontology engineers, curators, or maintenance bots.
Ontology evolution is a well studied and understood topic. Zablith et al. [27] survey various evolution processes. In addition, Hartung et al. [28] show the different tools for managing, exploring, and propagating changes on ontologies. Both focus on how ontologies are maintained. Our study is orthogonal to the mentioned ones, since they do not consider the consequences of the evolution and exclusively look at the need for updates as well as how these are conducted. Rashid et al. [29] use the evolution of a knowledge graph to assess its quality by examining consistency, completeness, persistence, and historic persistence. Quesada-Martínez et al. [30] use the OQuaRE framework [31] to investigate the evolution and quality of eight OBO Foundry ontologies.
The study of ontology changes and change classification is orthogonal to ontology evolution research. OntoDiff [32] is a tool that enables the user to detect changes between two versions of the same graph. It works by identifying semantically equivalent elements between the ontologies. Klein and Noy [33] developed an ontology describing 80 basic changes. They also introduce a notion of complex changes, showing how they help in the interpretation of consequences for data and entities. Papavasileiou et al. [34] propose and evaluate a new approach based on a language to express changes, together with an algorithm to compute changes between versions. They require the ontology to be in RDF(s), which is usually not the case for biomedical ontologies. COnto-Diff [14] and the integrated CODEX [35] both detect changes and group low level changes into high level change actions. They provide a simple classification and a rich action semantics. In our analysis, we will use COnto-Diff to classify changes into complex changes, because it is specifically targeted towards the biomedical domain.
Flouris et al. [36] distinguish between evolution, debugging, and other aspects of ontology change and provide an overview of research in each of the areas. Noy and Klein in [1] already made clear that ontology evolution is not the same as database schema evolution. They point out that an evolution's consequences are generally unknown because of the decentralisation of ontologies. With this work, we aim to narrow the large gap left by the unknown consequences of ontology evolution by quantifying the impact of evolution. Potentially, our research could lead to better informing both sides, the developers and the users, on the issues of ontology evolution and its consequences.
There is also research that focuses on the prediction of ontology evolution [37][38][39]. Pesquita et al. [37] focus on GO and learn a model predicting which parts of GO will undergo annotation change in the next version. Similarly, Cardoso et al. [38] exploit the evolution of biomedical ontologies and various features to build a predictive model that identifies concepts that will change in the future as well as the type of changes. Meroño et al. [39] also work on predicting changes and expand the work of Pesquita et al. [37] with semantic drift for the detection and prediction of changes. These three studies differ from our research since they do not consider consequences of the changes, but they predict where changes will happen in the next version of an ontology. Their models could be used to help ontology editors with their maintenance task, whereas our research would inform them about consequences of the applied changes.

Evolution impact
One form of impact from ontology evolution is semantic drift. As an ontology evolves, the meaning of names within the ontology can shift, causing new alignments between concepts and real world things.
The notion originates in linguistics, where words drift semantically over time. Wegmann et al. [40] recently investigated different forms of semantic drift and how to measure it using word embeddings and neighbourhood comparisons. For ontologies, SemaDrift [41] is a tool to calculate various semantic drift measures between versions of ontologies. It applies different methods of calculation and distinguishes between an exact, inexact, and hybrid ontology matching approach. OntoDrift [42] builds on top of SemaDrift, addressing some of its shortcomings by including more aspects in the semantic drift calculations. A notion of semantic drift has also been investigated in the context of code repositories and bug fixing [43]. The defect prediction quality decreased as the software evolved. The authors found that this worsening can be attributed to a semantic drift of the classes, which needs to be taken into account when predicting defects. Therefore, the evolution of the repository showed an impact on the prediction. Our work answers similar questions in a different context and focuses on the structural changes.
Different studies focus on the impact of ontology evolution. Chen et al. [44] discuss how learned models become less accurate as a stream evolves semantically. Their work is directly related to our approach: they study machine learning as their task, where we focus on materialisation. They measure impact with accuracy loss and use concept drift [45] as the underlying change which causes the impact. Know-Evolve [46] is a model that enables deep temporal reasoning over dynamic knowledge bases. The authors apply machine learning over the graph and predict re-occurrence of events. The time component directly affects the results of the reasoning from which an impact could be derived. Gonçalves et al. [47] define a categorisation of changes based on a logical impact. They investigate if changes affect the set of entailed axioms in the next version and distinguish between effectual and ineffectual changes. Using this categorisation, they analyse NCIT. Gross et al. [4] examine how the changes in an ontology impact previously conducted functional analysis. They propose the stability of individual concepts as their impact measure. Gottron and Gottron [48] also investigate the impact of knowledge base evolution using Linked Open Data. They implement twelve different indexing methods and evaluate how the index is affected by the evolution of the data using three different measures. Dos Reis et al. [49] look into the impact concerning mappings between two evolving ontologies. Cardoso et al. [50] identify the impact on annotation creation using an evolving ontology. Osborne et al. [51] present the pragmatic ontology evolution, in which they analyse the selection of concepts for a new version by evaluating the performance of four different tasks. 
However, [4,48-51] focus on one ontology and its specific tasks, whereas we aim for a broader selection of ontologies and for the materialisation, a task that is used not only by the bioinformatics community but also by other research fields, and whose result is potentially used by further applications. We have previously investigated the impact of changes over embeddings [17] and reported changes in neighbourhoods as impact. Additionally, we used the types of changes and additional ontology information to learn a linear model to estimate the impact without calculating the embeddings [17].
Summarising, ontology evolution and evolution impact have been the topic of various research initiatives. To our knowledge, this is the first study to define and investigate impact on the materialisation while also investigating the evolution at this scale. We focus on OBO, but our approach can be applied to any other ontology.

Approach
In this section, we define two different groups of measures: supportive and impact measures. Following the definitions, we explain our computation strategy.

Supportive measures
This section introduces the supportive measures (popularity, ontology, and edit) we use to investigate our research questions. Popularity measures address some simple methods of assessing the popularity of an ontology. The goal of the ontology measures is to capture the information about every single ontology O i which is part of the evolving ontology O. The edit measures capture how two consecutive ontologies O i and O i+1 differ.
Generally, we focus on measures with low computational complexity. Measures with high computational complexity require a large amount of resources when the ontology size is large. We target scenarios where measures should be computed in an (interactive) online fashion, i.e. when the ontology engineer modifies the ontology [16]. Our focus is, therefore, on simple measures, which deliver information about the ontology structure without requiring an excessive amount of resources for calculation.

Popularity measures
We introduce three measures to determine if popularity influences the evolution of the ontologies. Possibly, if a larger number of engineers are working on an ontology or if it is being used more widely, it might influence the evolution of the ontology as additional knowledge needs to be consolidated over time.
First, we consider the number of authors who contribute to the ontology via GitHub, because OBO ontologies are often tracked and shared via GitHub. Using the GitHub API, we can retrieve all contributors and count them. Ekanayake et al. [43] have found that more contributors to a software repository lead to less stable bug prediction results. The authors used the number of contributors to decide when a new prediction model needed to be learned due to the old one becoming inaccurate. The same might be applicable in the domain of ontology engineering and might therefore have an influence on the evolution of ontologies.
Second, we can count the number of months between the start of the project and the latest version, by retrieving the first and last commit to the repository, which also includes the date. Also in this case, we utilise the GitHub API to retrieve such dates. The number of months gives a comparable measure across ontologies, because ontologies are not being updated with the same periodicity.
Third, we measure the usage of the ontology. Since there is no reliable resource that tracks the ontology usage, we use Google Search to assess the number of mentions of the ontology's permanent link. For this, we use the permanent links of the ontology files, in both OWL and OBO formats, and report the number of entries found by the Google Search Engine. We consider links to both formats because the usage of either OWL or OBO heavily depends on the domain of the application. We do not use both formats in the analysis of the ontology evolution itself. We only consider both permanent links to determine their popularity more accurately.

Ontology measures
|O T | EL ++ and |O A | EL ++ count the number of EL ++ axioms in the ontologies. As explained in Section 2.1, the TBox includes general concept inclusion, role inclusion, as well as domain and range axioms, while the ABox includes the concept and role assertions. To compute the number of axioms and assertions, we defined the SPARQL queries reported in Table 2, which extract the number of axioms for each type, and then we add them up. c and p count the atomic classes and properties defined in the TBox. The inheritance richness, calculated using c and the number of explicit subclass relationships, tells us much about the class hierarchy portion of the ontology.
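As an illustration of the counting scheme, the sketch below tallies axioms per type and sums them into TBox and ABox sizes. It is a Python stand-in for the SPARQL queries of Table 2, which are not reproduced here: the tuple encoding, the kind labels, and both function names are hypothetical. The ratio h / c used for inheritance richness is an assumption, since the text only states that the measure is calculated from c and the number of explicit subclass relationships.

```python
from collections import Counter

# Hypothetical labels for the EL++ axiom types named in the text.
TBOX_KINDS = ("gci", "role_inclusion", "domain", "range")
ABOX_KINDS = ("concept_assertion", "role_assertion")


def count_axioms(axioms, kinds):
    """Count axioms per type, then add up the counts of the requested types."""
    per_kind = Counter(axiom[0] for axiom in axioms)
    return sum(per_kind[kind] for kind in kinds)


def inheritance_richness(h, c):
    """Assumed form: explicit subclass relationships h per atomic class c."""
    return h / c if c else 0.0


# Hypothetical ontology encoded as (kind, subject, object) tuples.
ontology = [
    ("gci", "Neuron", "Cell"),
    ("gci", "Cell", "Entity"),
    ("domain", "partOf", "Entity"),
    ("concept_assertion", "n1", "Neuron"),
]
tbox_size = count_axioms(ontology, TBOX_KINDS)
abox_size = count_axioms(ontology, ABOX_KINDS)
richness = inheritance_richness(h=2, c=3)
print(tbox_size, abox_size, richness)
```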

Edit measures
The edit measures capture differences between consecutive ontology versions. We use these measures to investigate changes δ between two snapshots of an ontology, because the changes give a different perspective on the evolution beyond simple growth investigation of classes, relations, or annotations. Changes provide a more thorough view and allow for more detail where the ontology measures are not sufficient.
We consider two different approaches for calculating edits between ontologies: (i) a simple approach of counting additions, deletions, and moves [47], and (ii) a more complex approach which includes the classification of changes into so-called complex change actions [14]. Edit measures presented in Table 3 are solely based on counts of logical axioms and the notion of structural equivalence as defined by Motik et al. [61]. Structural additions refer to the axioms that are in an ontology but not in the previous version. Removals identify the axioms that were in the previous version but are no longer present in the ontology. Shared axioms include the axioms that are in both versions of the ontology. The definitions are shown in Table 3. We consider the axioms which are either additions or removals as changes. Just like for the calculation of ontology axioms (|O| EL ++ ), where axioms are counted according to EL ++ logic as shown in Table 2, |·| refers to the counting of axioms using EL ++ logic. Therefore, we use the same queries for any |·| operation as we do for |O| EL ++ , except where specified otherwise (e.g., number of subclasses h). This applies to the number of additions and removals. The query for subclass additions and removals is also already introduced in Table 2 (h in inheritance richness) and used for h δ + and h δ − .
Since we are dealing with EL ++ OBO ontologies, we expect that most changes affect hierarchies of concepts. Therefore, we introduce an edit measure that relates the updates in the subclass relations. This measure, called hierarchical moves, counts how many removals have a corresponding addition, where the subject or object of the triple remained the same.
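The edit measures can be sketched with plain set operations. The example below is a simplified illustration, not the implementation behind Table 3: structural equivalence is approximated by tuple equality, the `(kind, subject, object)` encoding and the function names are hypothetical, and matching each addition to at most one removal is an assumption the text does not spell out.

```python
def edit_measures(prev, curr):
    """Return structural additions, removals, and shared axioms."""
    additions = curr - prev   # in the new version only
    removals = prev - curr    # in the previous version only
    shared = prev & curr      # in both versions
    return additions, removals, shared


def hierarchical_moves(additions, removals):
    """Count subclass removals that have a matching subclass addition.

    A removal matches an addition if the subject or the object is the same.
    Assumption: every addition can be matched at most once.
    """
    unmatched = [(s, o) for kind, s, o in additions if kind == "subclass"]
    moves = 0
    for kind, s, o in removals:
        if kind != "subclass":
            continue
        for candidate in unmatched:
            if candidate[0] == s or candidate[1] == o:
                unmatched.remove(candidate)
                moves += 1
                break
    return moves


# Hypothetical versions: Neuron is moved from Cell to the new class NerveCell.
O_prev = {
    ("subclass", "Neuron", "Cell"),
    ("subclass", "Cell", "Entity"),
    ("subclass", "Axon", "Cell"),
}
O_next = {
    ("subclass", "Neuron", "NerveCell"),
    ("subclass", "NerveCell", "Cell"),
    ("subclass", "Cell", "Entity"),
    ("subclass", "Axon", "Cell"),
}
additions, removals, shared = edit_measures(O_prev, O_next)
moves = hierarchical_moves(additions, removals)
print(len(additions), len(removals), len(shared), moves)
```

In this toy update, the single removal is paired with one of the two additions, so it counts as one hierarchical move rather than as an unrelated deletion.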
In addition to the measures described in Table 3, we use COnto-Diff [14] for the classification of changes into complex change actions. The classification is rule-based and has nine atomic changes as its basis: addition, deletion, and modification of concepts, attributes, and predicates. These atomic change actions are first identified and then condensed into complex change actions, e.g., the addition of a leaf node consists of the addition of a concept, of its attributes, and of a link to an already existing concept. The classification will enable a more thorough analysis of types of changes and the impact they generate on the materialisation.

Impact measures
The impact measures quantify how the ontology evolution affects the materialisation. The state of the art does not include ways of comparing materialisations between ontology versions. Known measures are either meant for ontology matching or for comparing two ontologies within the same domain but not related to each other. At the same time, we are looking for measures that are simple to compute, meaning not computationally intensive and computable in an online fashion. We propose the measures presented below to close this research gap. We have two groups of impact measures. The first focuses on impact relative to the size of the materialisation; we refer to its measures as size-based metrics further below (σ and σ⊑). The second focuses on the number of changes applied to the ontology; its measures are therefore referred to as change-based metrics (γ, γ⊑, and γ⋢).
There are numerous ways to define impact, depending on the ontology, the task, and the analysis to be performed. We designed the measures to consider both the removed and added axioms and to make them symmetric (i.e. the impact between O i and O i+1 is the same as the one between O i+1 and O i ). Moreover, we took into account that we are dealing with ontologies defining subclass hierarchies (hence: σ⊑, γ⊑, and γ⋢), with large TBoxes and small or absent ABoxes. Therefore, we do not pay special attention to the ABox in our impact definitions; however, we also do not exclude it.
Size-based metrics. In general, we have two families of impact measures, as previously mentioned. The first includes measures defined as ratios between how much the materialisation changes and how much has remained the same. Denominators signal the number of axioms that are shared between the two materialisations, and numerators count the differences. We defined the values of impact measures ranging in [0, ∞), where 0 indicates equality between two materialisations. When measures are greater than 1, the materialisation changes substantially, i.e. more than half of the axioms change. Since we are dealing with ontologies that are evolving, we usually expect measures lower than 1.
The first impact measure is the proportion of the materialisation which changed due to ontology evolution. We define it as the ratio between the number of inferred axioms that change and the number of those that do not. To recap, axioms added to the materialisation are denoted with $\Delta^+_i = M_{i+1} \setminus M_i$ and those removed from the materialisation with $\Delta^-_i = M_i \setminus M_{i+1}$, where $M_x$ only contains the inferred axioms (i.e., $M_x \cap O_x = \emptyset$). We refer to the changes to the materialisation as $\Delta_i = \Delta^+_i \cup \Delta^-_i$, which accounts for both the added and the removed axioms. The unchanged axioms of the materialisation are captured by the intersection of the two materialisations, $M_i \cap M_{i+1}$. We can now define $\sigma$ as the number of changes divided by the number of commonalities between the materialisations:

$$\sigma = \frac{|\Delta_i|}{|M_i \cap M_{i+1}|}$$

When ontologies are large, we expect the change impact to be close to 0, as the effect of the changes on the materialisation should be dominated by the number of inferred axioms that are not affected.
We have a second impact measure in this group, which focuses solely on the subclass hierarchy. The numerator and denominator only consider the subclass axioms of $\Delta_i$ and $M_i \cap M_{i+1}$:

$$\sigma_{\sqsubseteq} = \frac{h_{\Delta_i}}{h_{M_i \cap M_{i+1}}}$$

Here, $h_{\Delta_i}$ is the number of subclass axioms in $\Delta_i$, and $h_{M_i \cap M_{i+1}}$ can be understood accordingly, where the SubClassOf axioms are counted for the shared axioms between the two materialisations $M_i \cap M_{i+1}$. Hence, $\sigma_{\sqsubseteq}$ corresponds to the number of added and removed subclass axioms in the materialisation divided by the joint subclass axioms of the two materialisations. This impact focuses on the hierarchy, and we expect it to be effective with biomedical ontologies, which often define taxonomies.
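The two size-based measures can be sketched as one function over sets of inferred axioms. This is an illustrative reading of the definitions above, not the authors' implementation: the `(kind, subject, object)` encoding and the function name are hypothetical, and the handling of an empty intersection (infinite impact unless nothing changed) is an assumption, since the text does not cover that case.

```python
def size_based_impact(m_prev, m_next, subclass_only=False):
    """Ratio of changed to shared inferred axioms between two materialisations."""
    if subclass_only:
        # Restrict both materialisations to their subclass axioms.
        m_prev = {ax for ax in m_prev if ax[0] == "subclass"}
        m_next = {ax for ax in m_next if ax[0] == "subclass"}
    changed = m_prev ^ m_next   # added and removed inferred axioms
    shared = m_prev & m_next    # inferred axioms present in both versions
    if not shared:
        # Assumption: an empty overlap means infinite impact unless nothing changed.
        return float("inf") if changed else 0.0
    return len(changed) / len(shared)


# Hypothetical materialisations of two consecutive versions.
M_prev = {
    ("subclass", "A", "C"),
    ("subclass", "A", "D"),
    ("subclass", "B", "D"),
    ("domain", "p", "D"),
}
M_next = {
    ("subclass", "A", "C"),
    ("subclass", "A", "D"),
    ("subclass", "E", "D"),
    ("domain", "p", "D"),
}
sigma = size_based_impact(M_prev, M_next)
sigma_sub = size_based_impact(M_prev, M_next, subclass_only=True)
print(sigma, sigma_sub)
```

Because the symmetric difference and the intersection do not depend on argument order, swapping the two materialisations gives the same value, which matches the symmetry requirement stated for the measures.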
Change-based metrics. The second family of impact measures does not consider the size of the ontology. Rather, these measures focus on the changes in the materialisation ($\Delta_i$) compared to the changes in the ontology ($\delta_i$). Therefore, the general impact based on changes is:

$$\gamma = \frac{|\Delta_i|}{|\delta_i|}$$

In contrast to the previous impact measures, $\gamma$ only considers the number of changes applied to the ontology and the resulting changes to the materialisation. This means that when there are more changes in the materialisation than in the underlying ontology, impact is high (above 1.0). Consequently, when impact is below 1.0, the ontology changes did not lead to as many changes in the materialisation. We also introduce a specific impact measure for taxonomies, which only regards the subclass changes to the materialisation ($h_{\Delta_i}$) in the numerator and the subclass changes to the ontology ($h_{\delta_i}$) in the denominator, as previously defined in Table 3. Instead of only counting the subclass changes, we also subtract the subclass moves ($h_{\delta^+_i \cap \delta^-_i}$). This allows us to divide the subclass changes to the materialisation by the changes to the ontology which do not have matches between additions and deletions:

$$\gamma_{\sqsubseteq} = \frac{h_{\Delta_i}}{h_{\delta_i} - h_{\delta^+_i \cap \delta^-_i}}$$
Also for this impact measure, we expect the impact value to be around 1.
Lastly, since we anticipate most impact to come from the class hierarchy, we are also curious about the remaining changes and whether they impact the materialisation as well. To get the number of changes without the subclass changes, we simply subtract the subclass changes ($h_{\Delta_i}$ for the numerator and $h_{\delta_i}$ for the denominator) from the changes overall ($\Delta_i$ for the numerator and $\delta_i$ for the denominator):

$$\gamma_{\not\sqsubseteq} = \frac{|\Delta_i| - h_{\Delta_i}}{|\delta_i| - h_{\delta_i}}$$
With this last impact measure, we can investigate the impact on the materialisation without the hierarchy. We further analyse the usefulness of the impact measures in Section 5, using the ontologies and the supportive measures for the analysis. We then discuss and answer research question RQ2 in more detail with real-world data.
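As a rough illustration of the change-based family, the following Python sketch computes the three ratios from plain counts. All names are hypothetical. In particular, the placement of the move correction in the taxonomy-specific measure (subtracting the moved subclass changes from the ontology-side count in the denominator) is an assumption of this sketch, based on the prose description above, and should be checked against the original definition.

```python
from typing import Dict, Optional


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Divide two counts; return None when the denominator is not positive."""
    return numerator / denominator if denominator > 0 else None


def change_based_impacts(n_mat: int, n_onto: int, h_mat: int, h_onto: int,
                         h_moved: int = 0) -> Dict[str, Optional[float]]:
    """Compute the change-based impact ratios from counts.

    n_mat   -- number of changes in the materialisation
    n_onto  -- number of changes in the ontology
    h_mat   -- subclass changes in the materialisation
    h_onto  -- subclass changes in the ontology
    h_moved -- subclass changes in the ontology that belong to a move
    """
    return {
        "gamma": _ratio(n_mat, n_onto),
        # Assumption: moves are removed from the ontology-side subclass changes.
        "gamma_sub": _ratio(h_mat, h_onto - h_moved),
        "gamma_not_sub": _ratio(n_mat - h_mat, n_onto - h_onto),
    }


# Hypothetical counts for one pair of consecutive versions.
impacts = change_based_impacts(n_mat=25, n_onto=10, h_mat=20, h_onto=6, h_moved=2)
print(impacts)
```

Returning None for an empty denominator keeps version pairs without (subclass) changes distinguishable from pairs with zero impact.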

Metrics computation
All the measures presented above are implemented in the framework depicted in Fig. 2. The workflow is supplied with an evolving ontology and produces sequences of ontology, evolution, and edit measures.
The Materialiser takes an ontology O i as input and computes the materialised axioms M i . We implemented this component using the OWL2 API [62], which allows exploiting state-of-the-art reasoners, among which we chose HermiT [19]. Since some ontologies are in OBO format, the component uses the Robot module [63] to convert them to OWL before materialisation.
The Diff-calculator processes pairs of consecutive materialisations; we use Ecco as the implementation of this component. COnto-Diff 1 is the module which also calculates a difference between ontologies [14]. However, unlike Ecco, this tool categorises the diff into multiple complex changes. Simple changes (additions, deletions of concepts, relations, or attributes) are grouped together to form more complex changes. We expanded the implementation to return the number of the different complex change actions instead of the entire list.

Finally, the calculated measures are analysed by the Measure Analyser, an R script that computes the monthly growth, averages, and standard deviations. We also use it to test hypotheses, where applicable, and to generate the tables and plots displayed in the next sections. Based on the results from the Measure Analyser, we assess and answer our research questions.
The code of the overall running script as well as the R code used in the analysis are open source, available on our project repository 2 , and published under the GNU GPLv3 License. The repository also includes the modified versions of Ecco and COnto-Diff as well as the Materialiser. For all three components, the code is available, as well as a built jar file which was used during our calculations.

Datasets: Analysed ontologies
Even though there are many ontologies freely available, only a few record the edit history and make it available in a reusable format. GO and NCIT are two well known ontologies, which publish not only the most current version, but also the previous ones. 3 Additionally, we consider seven ontologies from the OBOFoundry [65]: DOID, FYPO, UBER, PWO, RSO, ASV, PTO. 4 In this section, we describe our strategy for choosing the ontologies. Next, we introduce each of them ordered by their size.
To select the OBO Foundry ontologies, we analysed its 260 ontologies. 49 of them are marked as obsolete, ten as inactive, and six as orphaned. Out of the 179 remaining ontologies, 16 have either no available files or broken links to their repositories. We found that 37 repositories include two files, one marked as the release file (called simply <ontology>.owl/.obo) and the other as the edit file (<ontology>-edit.owl/.obo). The edit file is used for changing the ontology until it is deemed good enough for release. The commit messages of the release files follow the format "Release ⟨date⟩" and signal official releases with no intermediate commits to these files. We consider the release files, as the edit ones lead to extremely small changes that are not relevant for the reasoning task [66]. Moreover, the released ontologies are ready to be used and should not contain errors due to ongoing editing. Finally, we selected the ontologies with more than 100 releases, to have a sufficient number of versions to observe their evolution and run statistical tests. Some remaining ontologies fit these criteria but were not selected because of parsing or materialisation issues with many of their available versions, resulting in seven ontologies selected from OBO Foundry.

Table 4 shows the ontology measures and the profile violations for the considered ontologies. We report and discuss the ontologies starting with the largest ontology and ending with the smallest one. The first row reports the number of versions for each ontology, followed by the number of months they cover. The second block of rows reports the compliance of the ontologies with the OWL2 EL profile, as well as OWL2 DL, by listing the number of violations. We expect the ontologies to be in the EL++ profile. (Table note: $c_1$ and $c_n$ are the numbers of classes in the first and last version, respectively.) The raw data we collected and calculated (without aggregations) is available for further analyses. 5

NCIT is a widely recognised standard for biomedical coding and reference [67]. It provides a vocabulary for diverse medical fields: clinical care, translational and basic research, public information, and administrative activities. We use 190 versions of NCIT from October 2003 to December 2019. NCIT shows the largest number of violations: 232 axioms violate OWL2 EL and 97 axioms violate OWL2 DL. However, those numbers are negligible when considering the total number of axioms. Among the ontologies we consider, NCIT is the biggest and the one with the largest growth.

5 https://gitlab.ifi.uzh.ch/ddis-public/chimp-mat.
GO is a well-known ontology in the biomedical domain and has been maintained by the Gene Ontology Consortium since 2000. GO provides a precise and common vocabulary to describe the role of genes and gene products in any organism [2]. We use 123 versions from January 2010 to December 2019. The older versions of GO that we are using are not directly available in OWL. Therefore, we decided to use all the versions in the OBO format and convert them with Robot [63]. GO shows two violations of the OWL2 EL profile, related to the metadata of the ontology: one is an axiom declaring an inverse property in the context of the versionIRI and OntologyID, and the other is an undeclared annotation (license). Since neither violation poses a problem, GO can be considered compliant with EL++. GO grows faster than the ontologies presented below when considering the number of classes. However, the number of defined properties increases by only five throughout the evolution, which is small compared to the growth in the number of classes.
DOID is the human disease ontology [9]. The ontology has undergone significant expansions in the past three years. Through an expansion from single asserted classification to multiple-inferred mechanistic classification, it provides a new perspective on related diseases. DOID had 107 commits on its GitHub repository, which cover 51 months starting from November 2015. The profile-checker reports that there are 30 violations for OWL2 EL and 17 for OWL2 DL. This ontology grows significantly in the considered time period. The slope for the number of properties is negative, indicating that the last version of DOID has fewer defined properties than the first one.
FYPO is an ontology of phenotypes observed in fission yeast [7]. FYPO versions range from July 2015 to February 2020, and it is the ontology with the highest number of versions (355) among the ones we consider. FYPO shows three violations of the OWL2 EL profile, all of which are uses of undeclared annotation properties (title, description, and license). Therefore, it is safe to assume that FYPO is an EL++ ontology. Both the number of axioms and the number of classes increase over time, while the number of properties does not grow significantly.
UBER is an integrative cross-species anatomy ontology [8]. It is organised according to traditional anatomical classification criteria, and models concepts in a species-neutral way. UBER reuses several concepts and properties of other anatomical ontologies. UBER has 305 versions, but we consider 254 because 46 versions cannot be parsed. The versions we use cover 116 months starting in September 2010. This ontology does not show any OWL2 DL violations, while it has 22 OWL2 EL violations. UBER is the third largest ontology among the ones we consider. It is also the third fastest growing ontology, with growth-rate slope values slightly higher than those of DOID. The number of properties has the highest growth among the ontologies we consider, but it is small compared to classes and axioms.
PWO is the Pathway Ontology and it contains all known types of biological pathways, including disease paths [10]. It also incorporates relations among pathways, creating an acyclic directed graph structure. PWO has 104 versions, which cover 98 months starting from March 2011. PWO passes the profile-checker without violations for OWL2 EL, and consequently OWL2 DL. The number of properties does not change over time, and we observe that the number of axioms grows faster than classes.
RSO is another ontology developed by the maintainers of PWO. It is a structured vocabulary for facilitating access to rat strain data [11]. It models the breeding history, parental background, and genetic manipulations. RSO covers 101 months starting from February 2011, and it has 146 versions, of which we considered 140 (6 raised parsing errors). Just like PWO, RSO passes the profile-checker without violations for OWL2 EL and DL. The size of RSO is also comparable to PWO. The number of properties is constant, and the growth of axioms is larger than the growth of the classes.
ASV provides definitions that are necessary for inter-operation between epidemic simulators and public health application software [12]. Versions of ASV range from May 2015 to February 2020, with 216 commits on the owl file. Despite its small size, this is the ontology with the highest number of violations of the OWL2 EL profile. The smallest version has fewer than 1'000 axioms, and grows to a maximum of 1'781 axioms. However, as the slope shows, the first and last version have the same size. Therefore, ASV grew during the covered time, but also shrank to its original size with the last version. This is visible for classes and properties. It is also interesting to note that this ontology has a large number of defined properties, considering its size.
PTO is the Plant Trait Ontology; it is part of the Plant Ontology, maintained by the Plant Ontology Consortium [13]. As the name suggests, PTO encodes traits of plants. It is used in many other ontologies focusing on specific traits of plants and on the genetics of plants. PTO has 144 versions, and it covers 59 months beginning in July 2015. We were unable to run the profile-checker on most versions of this ontology. This ontology is an OWL2 DL ontology. The one violation for OWL2 EL is an inverse object property definition. This profile was derived from version 24. All later versions either gave a FileNotFoundException for importing ontologies or a NullPointerException after the loading of the ontology with the profile-checker. The growth rates of axioms and classes are close to each other for PTO.
In conclusion, our analysis of the selected datasets confirms our expectations: the ontologies have small or empty ABoxes and most comply with the OWL2 EL profile. At the same time, the ontologies have different sizes, growth rates, and numbers of versions.
As previously stated in this section, we analysed the ontologies in OBO Foundry, which make up about 30% of those referenced by BioPortal (877) 6 [68], and selected the ones fit for our analysis. Although we only analysed this 30%, we are confident that we selected a fairly representative sample of ontologies which fit our selection criteria and are therefore appropriate to study the research questions we introduced in Section 1.

Changes and their impact
First, we analyse the evolution of the chosen nine ontologies. We take a closer look at the changes and some trends within them. We will then address the different impact measures and what aspects they capture together with an analysis of the effect of evolution on the materialisation.

Ontology evolution
In Fig. 3, we report different distributions of the relative changes. The substantial difference in ontology size also leads to a large difference in the number of changes; therefore, we visualise relative numbers rather than absolute ones. Additionally, visualising ratios allows us to see outliers more clearly. Firstly, Fig. 3(a) shows the distributions of structural changes relative to the respective ontology's size ($|\delta_i| / |O|_{EL^{++}}$). Secondly, Fig. 3(b) reports the distribution of the share of hierarchical moves relative to the number of structural changes ($h_{\delta_i^+ \cap \delta_i^-} / |\delta_i|$). In Table 5, we also report the means of the relative and structural changes. Additionally, this table shows the number of contributors on GitHub and the number of Google search results for the ontology's official file link, for both the OWL and the OBO file, as explained in Section 3.1, Popularity Measures.
The number of absolute structural changes appears to be directly related to the size of the ontology, as can be read in Table 5. However, this relation does not apply to relative changes. We tested both of these observations using pairwise t-tests, which evaluate whether two samples are taken from the same distribution. A significant result indicates that the means of the two samples differ and that the samples therefore come from different distributions. Neither for relative nor for absolute changes are all results significant, which means that not all ontology comparisons yield a significant difference between means. So, even though we observe a relation between the size of the ontology and the number of changes, we cannot confirm it statistically.
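To make the pairwise comparison concrete, the following Python sketch computes a t statistic and its degrees of freedom for every pair of ontologies. It is only an illustration under stated assumptions: the text does not say which t-test variant or multiple-comparison correction was used, so the Welch (unequal variance) form is an assumption of this sketch, and the sample values and ontology labels are hypothetical. Turning the statistic into a p-value additionally requires the t distribution, which is omitted here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance
from typing import Dict, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return the Welch t statistic and its approximate degrees of freedom."""
    # Squared standard errors of the two sample means.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pairwise_welch(samples: Dict[str, Sequence[float]]
                   ) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Apply welch_t to every unordered pair of named samples."""
    return {
        (x, y): welch_t(samples[x], samples[y])
        for x, y in combinations(sorted(samples), 2)
    }


# Hypothetical relative-change samples for three made-up ontologies.
results = pairwise_welch({
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [1, 2, 3, 4, 5],
})
for pair, (t, df) in results.items():
    print(pair, round(t, 3), round(df, 2))
```

With nine ontologies this yields 36 comparisons, which is why a mix of significant and non-significant pairs does not support a general statement about all ontologies.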
Since size does not explain the differences, we investigate the number of contributors and the popularity (mentions) of the ontology reported in Table 5. By observation, we cannot see a connection between the size of an ontology and these two other aspects. We did not test this statistically because a correlation result with only eight points is unreliable. However, there seems to be a connection between the popularity of an ontology and the number of contributors, with two exceptions: PWO and RSO. These two ontologies are developed by the same team and are not properly tracked using GitHub. Therefore, the number of contributors is not reliable for these ontologies. Additionally, we do not have a number of contributors for NCIT, because this ontology is not updated or shared via GitHub. We also note that for GO and RSO we see a large difference in mentions for OBO versus OWL files. We believe that these two ontologies are used much more within the biomedical domain than elsewhere, which explains the higher number of mentions of the OBO file format.
We can now answer research question RQ1.1: What is the relation between the number of changes applied to the ontology and its size for open biomedical ontologies, the evolution of which is available online? The presented data matches the intuition that larger ontologies experience more changes. However, we are unable to confirm this statistically at this point.
At the same time, the number of contributors does not explain the absolute or relative number of changes. We observe a relation between the number of contributors and the popularity of an ontology. Even though we notice growth tendencies and how ontologies evolve based on size, there is still some uniqueness to each of the ontologies that should not be left unnoticed, because size does not totally explain all differences between the ontologies.
Further, Fig. 3(b) shows the distribution of hierarchical moves in relation to structural changes ($h_{\delta_i^+ \cap \delta_i^-}$ relative to $|\delta_i| / 2$). A move consists of one addition and one deletion of subclass axioms; hence, we calculate the ratio using the number of changes divided by 2. In 34 versions (removed from the visualisation), certain moves were counted multiple times because more than one addition or deletion was identified as part of the move. In these cases the ratio is larger than 1.0, and they are omitted from Fig. 3(b). In most cases, the main part of the distribution (the box) is under 0.5 (ASV, DOID, FYPO, NCIT, PWO, RSO). This signals that for most versions the moves only make up about 25% of the changes. This is different for PTO and UBER. Their means are around 0.3, which is higher than for the other seven ontologies, and there are almost no outliers. Therefore, these two ontologies experience this type of change more often. For DOID, FYPO, PWO, RSO, and ASV, the subclass moves do not occur as often as for GO and NCIT. The distributions of the share of subclass moves for GO and NCIT are around 0.125, with their whiskers touching 0, whereas for the remaining five ontologies the boxes end at 0 and no whiskers are visible.

Fig. 4 shows the distribution of all complex changes detected with COnto-Diff [14]. Two outliers have been cut off and are labelled at the top of the graph. Fig. 5 shows the distributions of the relative number of the specific complex change action types in comparison to the total number of change actions. For completeness, we report the min, max, mean, median, and standard deviation values in the appendix (Table A.8). We can see that the addition of leaves (AddLeaf), moves (Move), and the changes of attribute values (ChgAttValue) are the most common complex change actions as categorised by Hartung et al. [14].
Changes to attribute values occur often but do not necessarily make up the largest part of the changes, except for ASV and PTO; for NCIT, there are none. In contrast, NCIT experiences the highest number of additions of leaves and a smaller number of moves when compared with GO, DOID, FYPO, UBER, PWO, and RSO. ASV and PTO show a slightly different pattern, with a lot of changes to attribute values but a small number of leaf additions and even fewer moves. The remaining complex changes are visibly less present. Therefore, we observe different types of behaviours, which we categorise into three groups:
(1) Ontologies in the first group experience many additions (leaves), a small number of moves, and close to no changes to attributes, as visible in the evolution of NCIT. We see the goal of the evolution to be the addition of new information, with only a small number of modifications to already present information through moves. Ontologies in this group show a very large monthly growth.
(2) The second group shows changes mostly in the form of leaf additions, but these make up less than 50% of the changes on average. Moves and changes to attributes together make up the rest of the changes. This group includes six ontologies: GO, DOID, FYPO, UBER, PWO, and RSO. These ontologies still grow substantially, but far less than those in the first group (NCIT).
(3) Lastly, the third group experiences mainly maintenance in the form of changes in attribute values and also moves. These ontologies (ASV and PTO) do not experience that many additions and, therefore, also show the lowest growth rate compared to the first two groups.
We could also see this as a spectrum, where on the one end we find NCIT with the highest growth (group 1) and on the other end we have ASV and PTO, which are mostly maintained and do not grow much over time (group 3). The second, or middle, group experiences both the addition of new information (growth) and maintenance at the same time. At this point we can also answer research question RQ1.2: What are the most common complex changes which are applied to open biomedical ontologies, the evolution of which is available online? The most common changes are additions of leaves, moves, and changes to attribute values. Based on these three types of changes, we also observe three slightly different change behaviours among the ontologies. In short, they can be classified as either growing (1), being mostly maintained (3), or something in between, growing and also experiencing maintenance (2).

Influence on materialisation
To further discuss the second research question and to answer the last one, we investigate the evolution's impact on the materialisation. Fig. 6 shows the impact measures based on size, in comparison to the number of structural changes. Figs. 6(a) and 6(b) show the individual points in the evolution of ontologies for $\sigma$ and $\sigma_{\sqsubseteq}$, and Figs. 6(c) and 6(d) show the linear regressions for each ontology for $\sigma$. Fig. 6(c) relates the impact to structural changes, whereas Fig. 6(d) relates it to relative changes. In Figs. 6(a)-6(c) we observe two clusters of points or lines. The ontologies use different shapes or line types based on size to improve readability. In the scatter plots, triangles are associated with small ontologies, which have fewer than 6'587 classes; the remaining ontologies are large and are illustrated with circles. In the line plots, full lines denote the small ontologies and dashed lines denote the large ontologies. When the same amount of change is involved, the large ontologies show a lower impact on the materialisation than the small ontologies. This behaviour confirms our initial idea: the smaller the ontology, the higher the impact of a single change.
$\sigma$ and $\sigma_{\sqsubseteq}$ penalise large ontologies because of their size. We see that the absolute number of changes for large ontologies is larger than for small ones (Table 5). The figures indicate that the size of the ontology provides a classification into the two visible clusters. Labelling each point according to the cluster visible in Fig. 6(a), we trained a decision tree model to identify the size of the ontology which explains this classification, visualised with the two shapes of points (circles and triangles). The classification into circles and triangles is used as the dependent (target) variable for the decision tree model. Using relative changes, as visible in Fig. 6(d), we do not observe these two clusters of impact based on the size of the ontology. Therefore, we infer that ontologies experience the same amount of relative changes, which coincides with the same amount of impact. This observation is supported by the parallel linear regression lines in Fig. 6(c) as well as in Fig. 6(d). The slope and intercept numbers for both figures are also reported in Table 6 on the left side. The right side of Table 6 lists the Spearman correlations between impact values and the number of changes (either structural or relative) and their p-values. We can see that the correlation is almost one for all ontologies and all are highly significant.

[Figure caption fragment] The x-axis shows structural ($|\delta_i|$) or relative changes. Both axes are in logarithmic scale. Legend applies to all plots.
Table 6: Intercept and slope of the linear models (LM) displayed in Figs. 6(c) and 6(d), and the correlation between $\sigma$ and the number of structural as well as relative changes.
Table 7: Regression and correlation between $\gamma$, $\gamma_{\sqsubseteq}$ and $\gamma_{\not\sqsubseteq}$ as indicated in the header of the ...

Fig. 7 shows the set of impact measures related to the number of changes. The three scatter plots relate the number of structural changes with $\gamma$, $\gamma_{\sqsubseteq}$, and $\gamma_{\not\sqsubseteq}$, respectively. Assuming that every change would affect at least one materialised axiom, we expect the plots to hover near or above 1.0. Numbers far above 1.0 would indicate structural changes with a large impact; numbers below 1.0 would indicate the presence of changes without any impact on the materialisation. We see differences when comparing Figs. 7(a) and 7(b). Though overall the cluster prevails, $\gamma_{\sqsubseteq}$ is situated slightly higher than $\gamma$, but still around 1.0, and it also displays more outliers above 1.0. Therefore, the changes on the class hierarchy have a high contribution to the overall impact. We can confirm this with the scatter plot in Fig. 7(c), which plots $\gamma_{\not\sqsubseteq}$ and shows impact without the class hierarchy. All points situated at the bottom of this plot have no impact besides that of the class hierarchy, because their $\gamma_{\not\sqsubseteq}$ is equal to 0.0. The remaining points show ontology version pairs which experience impact not connected to the hierarchy. This impact is mostly below 1.0 as well, showing that other changes have less influence on the materialisation.

Further investigation into the characterisation of the points between 10e−2 and 10e2 in Fig. 7(c) was not fruitful. No feature emerged that would identify such ontology version pairs, which experience high impact outside of the hierarchy. We investigated other common ontology measures known from works like [55,56,58], such as property richness, annotation richness, or class-property ratio. We also analysed the axioms in $\delta_i$, ignoring general concept inclusion axioms with type and SubClassOf relations, but we found no common changes among these points. There are only 58 ontology pairs in this range (out of 1'090), and most of such points relate to ASV and UBER. This behaviour is therefore rare, and in the cases of PWO, RSO, and PTO it never happens. We also observe that in all the ontologies, the most common changes are related to general concept inclusions, mostly type triples, followed by subClassOf ones.
Finally, the ontologies we consider are biomedical EL ++ ontologies, which typically have a strong focus on the class hierarchy relations and have no or a small ABox.
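The size threshold that separates the two clusters of the size-based measures can be illustrated with a one-level decision tree (a decision stump) over the class counts. The Python sketch below is a simplification under stated assumptions: it minimises the plain number of misclassified points rather than an impurity criterion, and the function name, class counts, and labels are hypothetical and do not reproduce the data or the model used in this study.

```python
from typing import Sequence, Tuple


def best_size_threshold(sizes: Sequence[float],
                        is_small: Sequence[bool]) -> Tuple[float, int]:
    """Find the split 'size < threshold means small' with the fewest errors.

    Returns the threshold and the number of misclassified points.
    """
    points = sorted(zip(sizes, is_small))
    distinct = sorted({size for size, _ in points})
    if len(distinct) < 2:
        raise ValueError("need at least two distinct sizes to place a split")
    best: Tuple[float, int] = (0.0, len(points) + 1)
    # Candidate thresholds are the midpoints between neighbouring sizes.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        errors = sum((size < threshold) != small for size, small in points)
        if errors < best[1]:
            best = (threshold, errors)
    return best


# Hypothetical class counts with their cluster label (True = small cluster).
sizes = [200, 900, 3000, 8000, 20000, 60000]
labels = [True, True, True, False, False, False]
threshold, errors = best_size_threshold(sizes, labels)
print(threshold, errors)
```

Because the threshold is placed between two observed sizes, any value in that gap separates the clusters equally well; the reported boundary therefore depends on the sizes of the ontologies that happen to be in the sample.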
Further, we calculated regressions and correlations to investigate the observation that most impact is related to the subclass hierarchy. Results are shown in Table 7, with the regression on the left side and the correlation on the right side. The first line shows the results over all ontologies combined. The regression target is $\gamma$, with $\gamma_{\sqsubseteq}$ and $\gamma_{\not\sqsubseteq}$ as input. Unfortunately, the regressions are not significant in most cases. The picture is different for the correlation, where some ontologies show a significant result between $\gamma$ and $\gamma_{\sqsubseteq}$. Even though the results are slightly ambiguous for individual ontologies, the results over all ontologies still suggest that the class hierarchy is the deciding factor for impact on the materialisation. Hence, we conclude that the changes to the hierarchy are the ones that most prevailingly influence the materialisation. This confirms our initial intuition of creating impact measures tailored to class inheritance.
Additionally, in Fig. 7(a), and therefore for γ , there is no clear distinction between the different sizes of ontologies. This is in contrast to the split we observe for σ in Fig. 6(a).
We also use the change classification from COnto-Diff [14] for a further analysis. We calculate a linear model to learn the influence of the different types of changes on the impact. Unfortunately, no change type emerged across ontologies, for which we can say it has a significant influence on the materialisation impact. For some ontologies one or two types were significant, but most of the time they differed from each other. Results are presented in the Appendix in Tables A.9-A.12.
Through the analyses we presented in this section, we can answer research question RQ2: What are measures that are simple to compute and allow to assess the impact of the ontology change on the ontology's materialisation? Our analyses support the view that the impact measures capture different aspects of the consequences of changes: one set focuses on the size of the ontology ($\sigma$ and $\sigma_{\sqsubseteq}$) and the other on the number of edits ($\gamma$, $\gamma_{\sqsubseteq}$, and $\gamma_{\not\sqsubseteq}$). $\sigma$ and $\sigma_{\sqsubseteq}$ show similar behaviour, and restricting the measure to the hierarchy of the ontology makes no visible difference. In contrast, $\gamma$ and $\gamma_{\sqsubseteq}$ display differences in the scatter plots. $\sigma$ and $\sigma_{\sqsubseteq}$ are heavily influenced by the size, penalising the impact for large ontologies when only a small part of the axioms changes. Due to their design, $\gamma$ and $\gamma_{\sqsubseteq}$ do not have this problem. The measures are also simple to compute. We used Ecco to calculate the difference between two materialisations. We have also looked into the implementation within Protégé [15] and will include the impact measures in the ChImp plugin [16].
To conclude, we analyse research question RQ3: What aspects of changes and which complex change types have the largest share in the impact on the materialisation? This question concerns the real consequences of the ontology evolution for the nine chosen OBO ontologies. Once an ontology exceeds roughly 6'500 classes, we consider it to be a large ontology. This clear distinction is visible in Figs. 6(a) and 6(b). When using $\sigma$ and $\sigma_{\sqsubseteq}$ as impact measures, large ontologies require more impactful changes to reach a comparable impact. However, relating the changes to the size, we found a high correlation between the relative changes and the impact for all ontologies. At the same time, the relationship reported is similar for all ontologies, which allows us to generalise to other ontologies with a similar profile. When considering $\gamma$ and $\gamma_{\sqsubseteq}$, we observe that most impact comes from hierarchy changes and that the consequences also fall mostly on the hierarchy within the materialisation. When the hierarchy is not taken into consideration, we observed that less than 2% of the ontology versions experience impact. Therefore, we identify the evolution of the subclass hierarchy as having the largest share in the impact on the materialisation. This finding is in line with related work, where the features capturing hierarchy relations performed best in the change prediction task [37,50]. Surprisingly, no specific change actions emerged as having a significant influence on the materialisation impact. However, these change actions do not distinguish between changes on the subclass hierarchy and other changes. The type of change in which the addition or deletion occurs does not reliably signal impact on the materialisation. The changes on the subclass hierarchy are much more dominant in this regard.

Limitations and future work
The largest threat to validity is the analysis of only nine ontologies, when there are roughly 877 ontologies 7 available in BioPortal [68]. However, because of the selection criteria and the computational intensity of this research, we could not select a larger number of ontologies. Many ontologies available online do not openly share their evolution, but rather just the most recent release. Additionally, a more specific investigation has to be conducted on the ABox, its changes, and their impact on the materialisation. Our chosen ontologies did not include ABoxes, which are also rare in the biomedical domain. Hence, we cannot draw any conclusions about ontologies which make use of ABoxes. In our future work, we will extend the analysis towards other types of ontologies, investigating both other description logics and domains outside of biomedicine, but our constraint on the availability of the evolution of an ontology remains.
Our analyses show that every ontology has its unique characteristics, even though they are from the same domain and mostly use the same description logic. This supports recent claims that ontology engineering and ontology evolution are still open research areas that require novel and more supportive instruments [69].
In this study, we do not focus on the analysis of the outliers, as we are interested in the overall picture, as in previous studies such as Gonçalves et al. [70] in the case of NCIT in 2011, and Gross et al. on GO [4] in 2012. Such detailed analyses are still important and necessary, as they may let properties and findings emerge that are complementary to the ones we found in our study.

7 https://bioportal.bioontology.org/ (accessed on 29/05/2021).
Lastly, even if the impact measures we introduced are effective to carry out our analyses, they need to be validated and evaluated by the ontology engineering community. As we have added the measures in our ChImp Protégé plugin [16] our next step includes a detailed user-study on the usefulness and informativeness of the introduced measures. Additionally, this will give us the opportunity to study the awareness of change consequences in general. We are interested in studying if such measures are useful to help users to better understand the impact of their changes, and in changing their behaviour.

Conclusions
It is a known fact that knowledge evolves constantly. As we capture knowledge in ontologies, we inherently have to deal with its evolution. Because ontologies are often built to power subsequent applications, ontology evolution also impacts the applications built on top of them. In this work, we studied how the evolution of ontologies affects their materialisation. Our goal was two-fold: we defined materialisation impact measures and analysed the relationship between ontology evolution and the impact using the proposed measures. We carried out this analysis for nine OBO ontologies, two of which are very large and well established. At this point, we want to advocate for more transparency in the editing process and ask ontology maintainers and engineers to share the editing history of their ontologies.
Our primary contribution and the answer to our research question RQ2 is the definition of five impact measures, divided into two sets. The first set captures the consequences of changes on the materialisation with respect to the materialisation's size. The second set focuses on the number of changes, both to the materialisation and to the ontology.
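To make the distinction between the two sets concrete, the following minimal sketch treats a materialisation as a plain set of axiom strings and computes one size-based and one change-based quantity. The function names, the set representation and the normalisations are assumptions made purely for illustration; they are not the definitions of the five measures introduced in this work.

```python
from typing import Set


def materialisation_delta(old: Set[str], new: Set[str]) -> int:
    """Count inferred axioms that were added or removed between two versions."""
    return len(old ^ new)  # symmetric difference of the two axiom sets


def size_relative_impact(old: Set[str], new: Set[str]) -> float:
    """Hypothetical size-based measure: changed axioms per axiom in the old materialisation."""
    if not old:
        raise ValueError("old materialisation must not be empty")
    return materialisation_delta(old, new) / len(old)


def change_relative_impact(old: Set[str], new: Set[str], n_ontology_changes: int) -> float:
    """Hypothetical change-based measure: changed materialised axioms per ontology change."""
    if n_ontology_changes <= 0:
        raise ValueError("number of ontology changes must be positive")
    return materialisation_delta(old, new) / n_ontology_changes


# Toy example (made up): class B is moved from parent C to parent D.
old_mat = {"A SubClassOf B", "B SubClassOf C", "A SubClassOf C"}
new_mat = {"A SubClassOf B", "B SubClassOf D", "A SubClassOf D"}

delta = materialisation_delta(old_mat, new_mat)
by_size = size_relative_impact(old_mat, new_mat)
# The move is counted here as two ontology changes (one removal, one addition).
by_changes = change_relative_impact(old_mat, new_mat, n_ontology_changes=2)
```

In this toy case a single move in the hierarchy alters four materialised axioms, which illustrates why the two normalisations (by size and by number of changes) can tell different stories about the same edit.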
To start the analysis, we took a closer look at the evolution of the ontologies themselves, which acted as a stepping stone to understand the impact on the materialisation. Even though we observed more changes being applied to already large ontologies, we could not confirm this statistically (RQ1.1). Additionally, when using the number of changes relative to the size of the ontology, no such distinction could be observed at all. We also investigated the types of changes in detail, classified using the rules defined by Hartung et al. [14], and found that the most common complex change actions are additions of leaves, moves, and changes of attributes (RQ1.2).
We then used the proposed materialisation impact measures (RQ2) to investigate the influence of the evolution on the materialisation. The first set of impact measures displays a clear division based on the size of the ontologies, which allows us to conclude that smaller ontologies experience a more considerable impact on their materialisation relative to their size. Once we correct for the size of the ontology and use relative instead of absolute changes, no distinction between ontologies is visible. However, there is a very strong correlation between the number of changes and the impact, meaning that more changes also mean more impact. With the help of the second set of impact measures, we found that most of the impact on the materialisation is related to the hierarchy of the ontology, except for some noise. This is also supported by the first set of impact measures, as no difference could be found between the general impact and the impact on the class hierarchy; both show the same distributions. When also taking into account the complex changes calculated using COnto-Diff [14], we were unable to identify commonalities among the ontologies, meaning that no single change type or group of change types has a significant share in the impact on the materialisation. Therefore, we conclude that OBO ontologies experience changes and their impact mostly on the class hierarchy within the ontology or materialisation, but that the impact is not specific to complex change actions. These findings answer RQ3.
Knowing that some changes, or the accumulation thereof, are causing a tremendous impact could signal to ontology users that they need to update their version and recalculate tasks based on the new version of the ontology. There is significant potential in future work following this research for investigating ontology engineers' awareness of the effects of changes. Additionally, because of the simplicity of the measures, they can easily be incorporated into ontology editors as well as release notes. It remains to be seen whether knowledge about impact can improve the processes and communication of ontology evolution between engineers and users. Hopefully, with time, we will be able to observe such improvements, which could result not only in better ontologies but also in a better usage of ontologies in other applications.
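The step of correcting for size and then relating the number of changes to the impact can be sketched as follows. The Pearson coefficient is used here only as one possible choice of correlation statistic, and all numbers are invented toy values for three hypothetical versions; neither reflects the statistical procedure or the data of this study.

```python
from math import sqrt
from typing import List, Sequence


def relative(values: Sequence[float], sizes: Sequence[float]) -> List[float]:
    """Divide each per-version value by the ontology size of that version."""
    return [v / s for v, s in zip(values, sizes)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant samples")
    return cov / sqrt(var_x * var_y)


# Invented toy values, not results from the paper.
sizes = [1000, 1100, 1200]   # axioms per ontology version
changes = [10, 22, 36]       # changes applied to the ontology
impact = [20, 44, 72]        # changed axioms in the materialisation

rel_changes = relative(changes, sizes)
rel_impact = relative(impact, sizes)
r = pearson(rel_changes, rel_impact)
```

Normalising both quantities by size before correlating them keeps large ontologies from dominating the relationship simply because they receive more edits in absolute terms.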

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Results of the linear model to estimate the size impact σ using the complex actions classified by COnto-Diff.