1 Introduction

The maintenance of individual aircraft components, including Line Replaceable Units (LRUs), which are specifically addressed in this contribution, is a complex task in many respects. In particular, the large number of different part numbers with their individual histories, the diverse influencing variables and the range of possible faults pose great challenges to MRO service providers [1]. For planning and executing maintenance activities in the workshops, a large amount of information from different life cycle phases of the LRU is needed. In this regard, information from flight operations about the condition of individual components is of great value for improved planning of maintenance activities [2, 3]. However, such information is currently not available to the MRO service providers for a variety of reasons, not least because the operators of the aircraft rarely disclose their data [3]. Consequently, MRO is characterised by great uncertainty, which manifests itself in many ways. For instance, much more capacity and materials are planned as buffers, resulting in increased costs [4, 5]. Besides, planned lead times are rarely met [1, 5]. The accuracy of the information concerning the cause of the fault increases only in the course of the maintenance process (see Fig. 1, curve As–Is), for instance by disassembling the LRU [1, 6]. In this respect, there is a great need for MRO service providers to discover internal sources of information and specifically use them to improve the efficiency of processes in a wide variety of use cases.

Fig. 1 Concept for creating digital services to improve information accuracy for MRO of LRUs based on a knowledge base with knowledge graphs

In order to realise concepts of Industry 4.0, target-oriented digitalisation is necessary. Only when, as described in [7], visibility of the maintenance processes (stage 3 in the Industry 4.0 development path) has been established can further-reaching ideas of diagnostics or prediction be pursued on the basis of the realised digital twin. The digital twin is a concept for digitally representing valuable assets through automated data exchange between the physical and the digital object [8]. Over the years, numerous definitions have been published, yet a consensus regarding the specific constituents of the digital twin remains elusive in the scientific discourse. Typically, it comprises data, digital models and value-generating digital services [8, 9, 10, 11]. Figure 1 illustrates the approach pursued by the authors of this contribution in order to enhance the information accuracy at an early stage (see Fig. 1, curve To–Be) using such concepts. In the initial phase, the collected historical and current maintenance process data, information from engineering artefacts (e.g. maintenance documents) as well as existing domain expertise need to be integrated and structured within a knowledge base (see Fig. 1, the arrows with the number 1). Subsequently, the objective is to augment this knowledge based on the gathered data. AI (especially Machine Learning (ML)) and Data Mining (DM) techniques are able to support this task by creating valuable knowledge from the manifold data (see arrows 2 and 3). This knowledge can in turn be saved in the existing knowledge base and be used for effective non-invasive diagnostics and subsequent short-term maintenance scheduling. Existing approaches in the area of maintenance of aircraft components have already proven the potential added value of using such techniques [1, 4, 6]. For example, they can be used to identify critical and often faulty subcomponents from historical data [12]. Finally, the created and extended knowledge base is utilised to create the corresponding digital services, which are made available at the beginning of the maintenance process (see arrows 4 and 5).

However, on the way to realising such data-driven projects, MRO service providers have to overcome some obstacles regarding data management. The maintenance processes of individual LRUs in different workshops can differ significantly due to various influencing factors [5]. One reason might be the number of dissimilar technologies considered (hydraulics, pneumatics, avionics, etc.) as well as their complex structure, which place diverse requirements on the MRO service providers. Furthermore, varying customer demands may necessitate slightly different processes [13]. A major challenge is the heterogeneity of potentially interesting data sources at IT and OT level. Gathered data (e.g. test results, maintenance records, booked material) is stored in different formats and ways [4]. Moreover, a varying level of digitalisation along this process, including both completely manual process steps (repair) and (partially) automatic ones (e.g. test bench), is not unusual. Thus, a lack of context as well as inconsistent, incomplete or inaccurate data are not uncommon. Ontologies from the field of symbolic AI are suitable for formally describing the semantics of a domain as well as linking existing information to a knowledge graph. The use of semantic technologies (ST) for data management and knowledge bases as a means to establish the targeted visibility is promising in two respects. On the one hand, ST help to integrate large amounts of data from heterogeneous data sources into a uniform schema [2]. On the other hand, they help to generate considerable added value from these data by sensibly supplementing them with sub-symbolic AI approaches [14, 15]. The potential of ST has been demonstrated in multiple Industry 4.0 application scenarios [16, 17]. In comparison to production, however, more extensive requirements must be considered in the maintenance domain, which demand more in-depth research.

The purpose of this contribution is to outline the advantages of using ST for the data management of LRU maintenance processes. To this end, the following research questions (RQs) are considered:

  • RQ1: Which data is generated in IT and OT during maintenance processes of aircraft components and which problems hinder the successful execution and implementation of data-driven projects in this domain?

  • RQ2: Which challenges arise when accessing and integrating maintenance process data with an ontology-based approach and which requirements does this place on the ontology needed?

  • RQ3: With regard to an ontology, which terms and relations are relevant when recording maintenance process data for diagnostics and subsequent short-term maintenance scheduling tasks?

In this regard, Sect. 2 introduces the data collected in maintenance processes of LRUs. Moreover, a short introduction to techniques from DM and sub-symbolic AI is presented, and the numerous challenges in data-driven projects are highlighted. The potential of ST to mitigate these challenges is depicted in Sect. 3. Section 4 outlines requirements regarding the ontology of maintenance processes. Additionally, relevant works are briefly described and analysed. Essential terms and their relations that need to be considered in an ontology for the tasks of diagnostics and maintenance scheduling, especially process planning, are elaborated in Sect. 5. To demonstrate the practical applicability of the created ontology, the maintenance process data of an exemplary LRU is included (see Sect. 6). On this basis, it is demonstrated how the individual data can be structured and integrated in an application-specific manner. Subsequently, the results are critically reflected and evaluated. Finally, the contribution is concluded with a summary and an outlook in Sect. 7.

2 Background

To illustrate the initial situation in view of RQ1, the maintenance processes, together with the collected data, are depicted in the following. Besides, potentially beneficial techniques from the fields of AI and DM will be introduced. A brief overview of frequent problems in data-driven projects concludes this section.

2.1 Collected data in maintenance processes of aircraft components

LRUs are, as mentioned previously, modular components of the aircraft that can be quickly replaced by line maintenance in the event of faulty behaviour [18]. Figure 2 visualises a rough exemplary maintenance process, subdivided into the areas of flight operations, line maintenance and the MRO workshop.

Fig. 2 Exemplary maintenance process, in accordance with [12, 18, 19]

During flight operations, the components are monitored and tested with regard to certain parameters. The Initiated Built-In Test (IBIT) is applied in line maintenance with the aim of identifying anomalous behaviour [18]. If fault symptoms are detected, the aircraft component is taken out of service. Accordingly, the LRU is removed in line maintenance and sent to the MRO workshop for further investigation. Information about the reason for removal is frequently exchanged by paper with handwritten notes. Conversely, IBIT results or measured values from monitoring during flight operations are rarely communicated. Consequently, the availability of data regarding the condition of the LRU is often inadequate, which in turn hinders the planning process in the MRO workshops. Hence, as mentioned in the introduction, historical data about fault symptoms and subsequent maintenance activities within the workshops might be all the more important.

For a large number of LRUs, tests are carried out using test equipment (e.g. test benches) by recording various measurements according to the Component Maintenance Manual (CMM) before disassembly takes place. In doing so, values are checked against defined limits with the intention of recommending troubleshooting steps [18, 19]. It should be mentioned that not all components are subjected to an entry test. However, an outgoing test is mandatory for certification. The process described in Fig. 2 can of course be subdivided in even more detail, e.g. by adding logistical and specific maintenance activities. As elaborated in other contributions, the typical maintenance process steps include goods receipt, visual inspection, the entry test, diagnostics, disassembly, repair, assembly, outgoing test, certification and dispatch [6, 13, 19, 20]. Depending on the initial state, further process steps such as cleaning may be required. Shop Replaceable Units (SRUs) removed during this process can be repaired either in the same or in other internal or external workshops [18]. All this results in extremely complex material flows within and between different workshops.
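The limit check performed during an entry test can be illustrated with a minimal sketch. It is not the procedure of any specific CMM; the parameter names, limits and measured values are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Permissible range for one measured parameter (hypothetical values)."""
    lower: float
    upper: float


def check_limits(measurements: dict[str, float], limits: dict[str, Limit]) -> dict[str, str]:
    """Return a verdict per parameter: 'pass', 'fail' or 'no limit defined'."""
    verdicts = {}
    for name, value in measurements.items():
        limit = limits.get(name)
        if limit is None:
            verdicts[name] = "no limit defined"
        elif limit.lower <= value <= limit.upper:
            verdicts[name] = "pass"
        else:
            verdicts[name] = "fail"
    return verdicts


# Assumed example data; real limits would be taken from the CMM.
LIMITS = {
    "supply_pressure_bar": Limit(200.0, 210.0),
    "leakage_ml_per_min": Limit(0.0, 5.0),
}
ENTRY_TEST = {
    "supply_pressure_bar": 204.5,
    "leakage_ml_per_min": 7.2,
    "temperature_c": 41.0,
}

verdicts = check_limits(ENTRY_TEST, LIMITS)
# Parameters outside their limits are candidates for troubleshooting.
failed = [name for name, verdict in verdicts.items() if verdict == "fail"]
print(failed)
```

Parameters without a defined limit are reported separately rather than silently passed, since missing reference values are themselves a data quality finding.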

In each of these individual process steps, a large amount of data accumulates that is recorded and stored in a wide variety of formats (Extensible Markup Language (XML), JavaScript Object Notation (JSON), Comma-Separated Values (CSV), Aeronautical Radio Incorporated (ARINC) standards, Common Data Format (CDF), proprietary etc.). Depending on the technology and workshop under consideration, the data sources may differ significantly from each other. Typically, however, there are some data sources that are used by a large number of MRO service providers, which contain potentially interesting information [4]. Data relevant to business processes is stored in ERP systems [5]. Moreover, maintenance documents are managed by ERP systems [4]. Job cards are another precious source of technical information from the respective workshops. Amongst other information, activities and measured values are documented here [5, 21]. Looking at the OT level, the data from the test benches, mostly legacy systems, are a significant source for fault diagnostics. Here, as explained above, entry and outgoing test results including multiple test protocols, measuring points and time series data are gathered [19].
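To make the format heterogeneity concrete, the following sketch reads the same kind of measurement from CSV, JSON and XML and maps it to one uniform record layout. The three snippets, the field names and the target layout are assumptions made for illustration and do not reflect any real test bench or ERP export.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Three invented snippets carrying the same kind of fact in different formats.
CSV_TEXT = "serial,parameter,value\nSN-001,pressure_bar,204.5\n"
JSON_TEXT = '{"serialNumber": "SN-002", "measurement": {"name": "pressure_bar", "value": 198.1}}'
XML_TEXT = '<test serial="SN-003"><point name="pressure_bar">207.3</point></test>'


def from_csv(text):
    """Map each CSV row to the uniform record layout."""
    return [
        {"serial": row["serial"], "parameter": row["parameter"], "value": float(row["value"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    """Map a nested JSON document to the uniform record layout."""
    doc = json.loads(text)
    measurement = doc["measurement"]
    return [{
        "serial": doc["serialNumber"],
        "parameter": measurement["name"],
        "value": float(measurement["value"]),
    }]


def from_xml(text):
    """Map each <point> element to the uniform record layout."""
    root = ET.fromstring(text)
    return [
        {"serial": root.get("serial"), "parameter": point.get("name"), "value": float(point.text)}
        for point in root.findall("point")
    ]


records = from_csv(CSV_TEXT) + from_json(JSON_TEXT) + from_xml(XML_TEXT)
for record in records:
    print(record)
```

Each source needs its own mapping function, which hints at why integration effort grows with every additional data source and format.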

Generally, it becomes apparent that the data types recorded in these different data sources are very diverse. Data is stored in structured, semi-structured and unstructured form. The fields of interest for evaluation, such as past findings, maintenance activities or behaviour of the LRU (e.g. measured values like time series data) are stored in semi- and unstructured form [22]. The issue of data quality, particularly with regard to its availability, constitutes a pressing and pivotal concern within the MRO domain, especially when considering data-driven projects [4, 22].

2.2 Data mining and artificial intelligence

Various methods and techniques are researched in the era of big data with the intention of deriving useful knowledge from the vast amounts of generated data. In particular, DM and AI aim to gain insights from the data so that descriptive, diagnostic, predictive or prescriptive questions can be answered [11, 14]. DM is considered as a partially automated process of extracting knowledge and structured information from large databases [23]. It basically pursues the goal of recognising patterns from data, interpreting them and deriving recommendations for action or prediction with the help of specific algorithms and methods [14, 15, 23]. Classic methods used in this context are, for instance, classification, regression or clustering. Besides, DM includes techniques from different disciplines such as statistics but also from AI and Machine Learning (ML) [23].

AI, in turn, is seen as software that can learn and evolve from experience using a variety of techniques. It is divided into symbolic and sub-symbolic AI. Symbolic AI refers primarily to methods that generate new knowledge and models by means of formal logic and rules that can be interpreted by humans [24]. This also includes ST for knowledge representation (see Sect. 3). Sub-symbolic AI, on the other hand, comprises methods from ML and Deep Learning (DL). The goal here is to assist computers in statistical modelling so that they can solve complex problems as independently as possible through a learning phase [25]. Typically, three sub-areas are distinguished when it comes to the learning phase. Supervised learning receives labelled data sets for training. In unsupervised learning, e.g. in DL models, the correlations are recognised autonomously. In reinforcement learning, an agent learns by interacting with its environment through the principle of reward [24]. Sub-symbolic AI approaches have advantages when a large amount of data is available and no universally valid rules can be described. Additionally, however, there are also promising combinations of symbolic and sub-symbolic AI approaches, which are considered helpful for some use cases [22, 26].

2.3 Challenges in data-driven projects

As elaborated in Sect. 2.1, a lot of potentially interesting data is gathered in the maintenance process of an aircraft component. Nevertheless, there are a number of challenges in achieving the targeted visibility (in accordance with [7]) of the processes as well as in getting value out of the recorded data. Different procedural models have been established for the effective implementation of DM and AI methods, with the Cross Industry Standard Process for Data Mining (CRISP-DM) being one of the most widespread [4, 27, 28]. It involves the steps of business understanding, data understanding, data preparation, modelling, evaluation and deployment [28]. However, challenges arise in all of these steps when conducting data-driven projects in the MRO domain.

A major and cross-domain challenge is the heterogeneity of the data sources, which results in high effort in the phases of data understanding and preparation [17, 29]. The vast amounts of data in different sources and formats, the many dimensions, missing or noisy data and meta-data as well as inconsistencies are common pain points [23, 30]. Typically, data preparation (e.g. removal of missing or incorrect values from the data sets and merging into a uniform format, etc.) takes a disproportionate amount of time regardless of the use case under consideration [15, 29]. Generally, as outlined in Sect. 2.1, there is a multitude of potentially useful data from numerous maintenance processes. The test benches at the OT level with measured values regarding the faulty behaviour of an LRU can be mentioned as an example here. However, the integration of these data with those from IT systems (e.g. ERP) with the aim of making maintenance processes more visible and gaining insights using AI techniques is a major challenge. Most of the test benches used are legacy systems, which were typically not designed for concepts of extensive networking and data exchange. Accordingly, it is difficult to link such systems with technologically advanced ones. Therefore, a retrofit of the test benches to ensure networkability is advisable. Other data sources from IT level are poorly documented, stored in different formats and not easy to integrate [17]. The lack of a uniform standardised view of the maintenance process data as well as missing semantics complicate the implementation of data-driven projects [15]. Furthermore, the maintenance of LRUs is and remains a very manual process that can only be automated partially. Thus, potentially precious data is often documented manually and in unstructured form. Poor data quality, for example in the form of incorrect data or inaccurate timestamps, is not unusual [22]. The frequent system breaks and the manual entries by technicians mean that a large number of the data records have to be cleaned with great care. In the worst case, the data is completely unusable or might not lead to accurate results. Consequently, consistency checks and time-consuming data pre-processing steps are necessary before DM or AI approaches can be applied sensibly [4, 17].
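The kind of consistency check mentioned above can be sketched as follows. The job card records, field names and rules are hypothetical; real checks would depend on the data sources of the respective workshop.

```python
from datetime import datetime

# Invented job card records with typical quality problems.
RAW = [
    {"job": "J1", "start": "2023-03-01T08:00", "end": "2023-03-01T09:30", "finding": "seal worn"},
    {"job": "J2", "start": "2023-03-01T10:00", "end": "2023-03-01T09:00", "finding": "connector bent"},
    {"job": "J3", "start": "2023-03-02T07:15", "end": "", "finding": "no fault found"},
    {"job": "J4", "start": "2023-03-02T11:00", "end": "2023-03-02T12:00", "finding": ""},
]


def validate(record):
    """Return a list of detected issues; an empty list means the record is consistent."""
    issues = []
    for field in ("start", "end", "finding"):
        if not record.get(field):
            issues.append(f"missing {field}")
    if record.get("start") and record.get("end"):
        start = datetime.fromisoformat(record["start"])
        end = datetime.fromisoformat(record["end"])
        if end < start:
            issues.append("end before start")
    return issues


report = {record["job"]: validate(record) for record in RAW}
clean = [record for record in RAW if not report[record["job"]]]

for job, issues in report.items():
    print(job, issues or "ok")
```

In this invented sample only one of four records passes without issues, which illustrates why data preparation tends to dominate the effort of such projects.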

Besides, the lack of domain and prior knowledge to interpret the data correctly leads to impractical results [15, 30]. Especially the complexity of aircraft component maintenance is one of the reasons why domain and expert knowledge needs to be incorporated systematically, for example in data understanding and data preparation, in order to identify correlations but also data quality issues [15, 31]. Accordingly, the task of generating significant features is often a complex, time-consuming endeavour [15, 17]. Without domain knowledge, it becomes substantially more difficult. Moreover, it is not uncommon that for some aircraft components not enough historical data sets are available to obtain meaningful results by applying ML methods. In this scenario, the use of high-fidelity simulation models, created by domain experts, is potentially promising: on the one hand, to identify relevant features on the basis of the behavioural model of the aircraft component, and on the other hand, to enlarge the data sets needed for model training [12]. Additionally, the curse of dimensionality is a major problem, since the multitude of possible influencing factors is enormous. Especially for fault diagnostics activities, relevant context is of great importance as a means to interpret and analyse time series data correctly [26]. To include such context, information from different life cycle phases of the LRU needs to be incorporated. Additionally, domain knowledge might be precious for the model selection with the purpose of choosing appropriate algorithms as well as for the interpretation of the results [15]. However, it is not uncommon for projects of this kind to have limited personnel resources, and not every data scientist is simultaneously a domain expert. Correspondingly more time is needed in the typical steps depicted.

3 Semantic technologies

It becomes obvious that raw data alone is of no use as long as it is not enriched meaningfully with semantics and context [14, 15]. In view of the challenges mentioned above, ST offer the potential to support the generation of valuable knowledge from maintenance process data [14, 19, 32, 33]. Various studies emphasise the benefits of systematically incorporating domain knowledge with formal semantics in techniques from DM and sub-symbolic AI [15, 26, 30]. Ontologies, used for a graph-based knowledge representation, are a suitable means to formally and explicitly specify the semantics within a domain in a machine-readable manner [15, 24]. Basically, an ontology consists of a T-Box and an A-Box. The terminological knowledge with the relevant vocabulary of the domain described by terms and their relations is stored in the T-Box. The factual knowledge, also called assertional knowledge, with actual instances is located in the A-Box [30].
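The distinction between T-Box and A-Box can be illustrated with a small sketch that stores both as sets of subject-predicate-object triples. The vocabulary and the instance identifiers are invented for this example and are not taken from the ontology developed in this contribution.

```python
# T-Box: terminological knowledge (classes and relations), invented vocabulary.
TBOX = {
    ("LRU", "subClassOf", "AircraftComponent"),
    ("SRU", "subClassOf", "AircraftComponent"),
    ("hasSubcomponent", "domain", "LRU"),
    ("hasSubcomponent", "range", "SRU"),
}

# A-Box: assertional knowledge (concrete instances), invented identifiers.
ABOX = {
    ("lru_4711", "type", "LRU"),
    ("sru_0815", "type", "SRU"),
    ("lru_4711", "hasSubcomponent", "sru_0815"),
}


def superclasses(cls, tbox):
    """Collect all classes reachable from cls via subClassOf (transitively)."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in tbox:
            if subject == current and predicate == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def instances_of(cls, tbox, abox):
    """Return instances typed with cls directly or with one of its subclasses."""
    return {
        subject
        for subject, predicate, obj in abox
        if predicate == "type" and (obj == cls or cls in superclasses(obj, tbox))
    }


print(sorted(instances_of("AircraftComponent", TBOX, ABOX)))
```

The query for `AircraftComponent` returns both instances although neither is typed with that class directly; the answer follows from combining the facts in the A-Box with the class hierarchy in the T-Box.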

Ontologies are usually created with ST, for which the World Wide Web Consortium (W3C) has defined important technology standards (see Fig. 3) [34]. For instance, the Resource Description Framework (RDF) is used to represent and link data in a graph-based format. The Resource Description Framework Schema (RDFS) extends RDF by allowing classes, properties and hierarchies to be defined [35]. The Web Ontology Language (OWL) is based on description logics and can be used to describe even more expressive properties, relationships and classes [24]. The SPARQL Protocol and RDF Query Language (SPARQL) serves to query or manipulate the RDF triples of the ontology. Apart from the integration of data, ontologies are also useful for inferring new information by adding reasoning approaches [35]. In this context, the Semantic Web Rule Language (SWRL) can be applied to express rules and logic on defined concepts in the ontology. The Shapes Constraint Language (SHACL), on the other hand, is useful when constraints need to be incorporated [24].
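The principle behind querying RDF triples can be sketched with a strongly simplified matcher for conjunctions of triple patterns, which is the core idea of a SPARQL basic graph pattern. This is plain Python and not a SPARQL implementation; the `ex:` names and the helper functions `match` and `query` are assumptions made for illustration.

```python
# Invented graph; terms starting with "?" in a pattern are variables.
GRAPH = [
    ("ex:lru_4711", "rdf:type", "ex:LRU"),
    ("ex:lru_4711", "ex:hasFinding", "ex:finding_1"),
    ("ex:finding_1", "ex:concernsSubcomponent", "ex:sru_0815"),
    ("ex:lru_4712", "rdf:type", "ex:LRU"),
]


def match(pattern, triple, binding):
    """Extend a variable binding so that pattern equals triple, or return None."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if pattern_term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def query(patterns, graph):
    """Evaluate a conjunction of triple patterns against the graph."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Which subcomponents are affected by findings on which LRUs?
results = query(
    [
        ("?lru", "rdf:type", "ex:LRU"),
        ("?lru", "ex:hasFinding", "?finding"),
        ("?finding", "ex:concernsSubcomponent", "?sru"),
    ],
    GRAPH,
)
print(results)
```

In practice such queries would be written in SPARQL and executed by a triple store; the sketch only shows how shared variables join facts that originate from different statements.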

Fig. 3 Semantic web stack [34]

Considering the challenges in Sect. 2, the advantages of using ST become apparent. Ontologies have already proven to be helpful in several Industry 4.0 application scenarios for the tasks of data access and integration from heterogeneous data sources [15, 16, 36]. Furthermore, as indicated before, ontologies can also be combined sensibly with methods from sub-symbolic AI and DM [24]. This applies in both directions. On the one hand, ontologies can be used, for example, to integrate prior knowledge into ML approaches or to improve data pre-processing [17, 29]. Especially the phases of data understanding and data preparation in a typical CRISP-DM process can benefit from incorporating formal domain knowledge [30, 31]. On the other hand, methods from DM and ML can be applied to efficiently build knowledge bases [14, 26, 37, 38]. The latter is particularly necessary since manual creation of ontologies is very time-consuming. Beyond that, it is sensible in order to reuse generated knowledge.

4 Requirements and related work

As indicated in Sect. 3, ST are a useful addition to data-driven projects, especially in the context of data understanding and preparation. However, in view of the diversity and complexity of the maintenance processes and their data, further requirements (R) regarding the creation and use of the ontology have to be considered. With the intention of answering RQ2, these requirements are outlined in Sect. 4.1. Likewise, the related work from the domain is depicted and analysed in Sect. 4.2 with regard to the requirements set up.

4.1 Requirements

Looking specifically at the maintenance processes of LRUs, the diversity and the complexity are remarkable. Each individual maintenance process in a different technology workshop has its own pertinent use cases, data and methods of data collection. Testing an avionic LRU on the test bench requires different data formats and communication protocols compared to pneumatic or hydraulic ones. The IT systems used, tailored to the LRU requirements, may also differ accordingly. The knowledge needed to gain valuable results is correspondingly complex. Moreover, each use case has its own requirements that need to be considered. Depending on the objective of the project and the stakeholders involved, the relevant context of the maintenance process, modelled in an ontology, differs. Due to poor data quality or incomplete data, it is also not ensured that every desired use case can be realised instantly. Instead, as explained in [7], steps of computerisation and connectivity have to be realised first before the development level of visibility can be reached. To reduce the effort of adapting the ontology when transferring it to another workshop, other service providers or other use cases, a modular structure is preferable. In this context, Hildebrandt et al. [16] have emphasised the advantages of ontology design patterns (ODPs), which follow a modular design logic. Thus, the same ODPs can be reused for cross-workshop concepts or use cases. In addition, they are easier to maintain compared to a large complex ontology. Technology-specific knowledge can be added in the form of further ODPs. Likewise, the terms and relations must be describable on different hierarchy and abstraction levels. Thus, general terms of maintenance can be further specified with regard to the individual workshops. A subsequent alignment of the separate ODPs with predefined mechanisms enables a more rapid adaptation to the slightly changed context of a new workshop or use case. Besides, the effort of creating the ontology is compensated by the reusability.

R1: An ontology for describing relevant terms and relations of aircraft components maintenance processes should follow a modular structure. In this context, ODPs are appropriate as they can be extended and reused depending on the use case. Additionally, terms and relations must be describable on different hierarchy and abstraction levels.
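The modular idea behind R1 can be sketched by treating each ODP as a small, separately maintained set of triples that is composed per workshop. The module contents and the helper names `compose` and `undefined_parents` are hypothetical and only serve to illustrate the principle.

```python
# General maintenance terms shared by all workshops (invented vocabulary).
CORE_MAINTENANCE = {
    ("MaintenanceActivity", "subClassOf", "Activity"),
    ("Test", "subClassOf", "MaintenanceActivity"),
}

# Technology-specific modules that specialise the general terms.
HYDRAULICS = {("LeakageTest", "subClassOf", "Test")}
AVIONICS = {("BusCommunicationTest", "subClassOf", "Test")}


def compose(*patterns):
    """Merge several modules into one ontology (set union of their triples)."""
    merged = set()
    for pattern in patterns:
        merged |= pattern
    return merged


def undefined_parents(ontology, roots=frozenset({"Activity"})):
    """Return parent terms that are used but defined neither in the ontology nor as roots."""
    defined = {subject for subject, _, _ in ontology} | set(roots)
    return {
        obj
        for _, predicate, obj in ontology
        if predicate == "subClassOf" and obj not in defined
    }


hydraulics_workshop = compose(CORE_MAINTENANCE, HYDRAULICS)
avionics_workshop = compose(CORE_MAINTENANCE, AVIONICS)

# A technology module on its own depends on terms from the core module.
print(undefined_parents(HYDRAULICS))
print(undefined_parents(hydraulics_workshop))
```

Both workshop ontologies reuse the same core module, while the technology-specific terms sit one abstraction level below the general term `Test`. The check for undefined parents makes the dependency of a module on the core explicit.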

The vocabulary used in the various data sources delineated in Sect. 2.1 often differs considerably from the terms used in the domain of maintenance. For unambiguous semantics, it is necessary to use a vocabulary that is established in the maintenance domain. Only in this manner is it possible to incorporate the formalised domain knowledge effectively in the phases of data understanding and data preparation. Furthermore, this lays the foundation for reusing ODPs developed in previous projects. Thus, it is advisable to define the ODPs on the basis of appropriate industry standards with the aim of ensuring a uniform meaning of the terms [16].

R2: The ODPs developed for describing relevant terms and relations of aircraft components maintenance processes should be based on industry standards, norms and guidelines of the maintenance domain. This ensures both reusability and cross-domain understanding.

Before modelling the ODPs to formalise the knowledge about the maintenance processes of individual LRUs, the information needs have to be collected. Depending on the perspective of the potential stakeholders and the use cases considered, a different context is needed, which has to be described in the ontology. In the context of this work, the particular aim is to improve short-term maintenance scheduling. As a first step, the focus is primarily on process planning, i.e. the sequencing of successive activities. This task is characterised by the short-term planning of processes to be carried out, necessary spare parts and further resources. Thus, information for diagnostics and for scheduling is ascertained first. These two tasks are often associated with considerable complexity. Therefore, a data-driven approach seems promising here. Moreover, these tasks have a high degree of dependency on each other. The results and the knowledge generated from the diagnostics can be used to improve the execution of maintenance activities in workshops. Furthermore, the possibility of achieving visibility of the maintenance process including relevant context (faults, maintenance activities, material orders, etc.) offers the potential to improve fundamental planning tasks (e.g. scheduling, material and capacity) and corresponding logistical key performance indicators (KPIs). In addition to purely maintenance-related activities, logistical and administrative activities play an important role in MRO, especially in view of the KPIs [39]. This is not least due to the strong integration of customers and their heterogeneous requirements.

R3: The ontology developed for describing relevant terms and their relations of aircraft component maintenance serves to improve short-term maintenance scheduling by including diagnosis. In this respect, maintenance-related, logistical and administrative activities as well as the associated historical and current data of the LRU must be taken into account.

Although ontologies have many advantages in terms of data integration and knowledge representation, they also have some weaknesses with regard to typical characteristics of the domain under consideration. This is particularly pertinent when it comes to requirements such as efficiently building up new knowledge and scalability [26]. The inclusion of large amounts of knowledge in the ontology is associated with certain obstacles and problems [22]. Especially the different properties of the data (text, time series, etc.) require extensive pre-processing before the ontology can be populated with the individual instances. Looking specifically at the activity of fault diagnostics, the number of possible factors that can be responsible for the faulty behaviour of a component is immense [6]. Determining the strength of the influencing features or describing them in a rule-based approach is not a trivial endeavour [5]. Methods from DM and sub-symbolic AI, on the other hand, are suitable for recognising patterns in large amounts of data. These, in turn, can be interpreted with the help of domain experts and used to create or extend the ontology [30]. Moreover, these techniques might be needed to populate the A-Box of the ontology [14]. The number of methods from sub-symbolic AI, especially from the field of ML, for assisting in tasks like fault diagnostics is increasing and will potentially continue to increase. The models used, the associated features and possible classification results must also be semantically annotated and combined with further information. This applies, for example, if the subsequent maintenance activities need to be planned. Some sub-symbolic approaches have already proven their value in extracting precious and structured data from large and partially unstructured data sets [22, 30, 32].

In this respect, a combination of knowledge-driven and data-driven techniques is particularly advantageous in the domain of maintenance, where very complex knowledge is required and a lot of data needs to be accounted for.

R4: The ontology developed for describing relevant terms and relations of aircraft components maintenance processes should be combined with approaches from DM and sub-symbolic AI. This is intended to cope with the complexity of the different tasks in the domain of maintenance. Besides, this combination is beneficial for the efficient creation and population of the ontology.

As described in Sect. 1, the domain is characterised by a high degree of uncertainty due to the large amount of missing information. Especially the lack of data from flight operations as well as from line maintenance prevents diagnostics and scheduling tasks from being carried out efficiently. Likewise, information about faults often becomes evident only in the course of the maintenance process through the disassembly of the LRU, as mentioned in Sect. 1. Besides missing information, poor data quality should not be neglected [4]. Consequently, the incorporation of uncertainty and vagueness in the ontology for diagnostics results is beneficial in the field of aircraft component maintenance. This applies especially to the activity of non-invasive fault diagnostics, where only inaccurate and incomplete information may be available at the time of investigation [1]. Hence, there can be several fault causes for one fault symptom or different fault symptoms for one fault cause. In order to generate value for the user despite this possible ambiguity, additional details on the probability of linked information have to be added. In this respect, the ontology needs to be extended with ways of expressing uncertainty.

R5: The ontology developed for describing relevant terms and their relations of aircraft components maintenance processes should incorporate uncertainty regarding the diagnostics results. This is especially required as incomplete and inaccurate information is common within the domain of maintenance.
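
The kind of uncertainty annotation that R5 asks for can be illustrated with a minimal Python sketch. It estimates, from a handful of invented historical findings, the relative frequency of each fault cause given an observed fault symptom. The records and the function name `cause_probabilities` are hypothetical, and relative frequencies are only one of several conceivable uncertainty measures.

```python
from collections import Counter, defaultdict

# Hypothetical historical findings: (fault symptom, confirmed fault cause).
FINDINGS = [
    ("slow response", "torque motor degradation"),
    ("slow response", "torque motor degradation"),
    ("slow response", "contaminated spool"),
    ("external leakage", "worn seal"),
]


def cause_probabilities(findings):
    """Estimate P(cause | symptom) as a relative frequency per symptom."""
    counts = defaultdict(Counter)
    for symptom, cause in findings:
        counts[symptom][cause] += 1
    return {
        symptom: {
            cause: n / sum(causes.values()) for cause, n in causes.items()
        }
        for symptom, causes in counts.items()
    }


probabilities = cause_probabilities(FINDINGS)
```

In this toy data set one symptom is linked to two possible causes, which is exactly the ambiguity described above; the estimated values could be attached to the corresponding relations in the ontology.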

4.2 Related work

In the field of aircraft components maintenance, there is a wide variety of research approaches in which knowledge bases or ontologies have been applied.

In [40], an ontology-based approach in aircraft maintenance is presented that aims to capture, store and use appropriate data, information and knowledge to support maintenance tasks. In this process, modular containers, so-called enterprise knowledge resources (EKRs), are introduced for knowledge and process elements. These are intended to model maintenance processes, goals and the necessary inputs and outputs. Moreover, a domain-specific meta-model based on the product-process-resource (PPR) principle is used to annotate the individual EKRs.

Medinacelli [19] has applied an ontology-based approach for identifying failures in avionic components. The ontology contains relevant domain-specific knowledge for the automatic analysis of tests generated from the test benches. On this basis, corrective maintenance activities are proposed for the technician. An unsupervised learning algorithm is applied with the aim of learning signatures of a failure.

The authors of [21] describe the aircraft maintenance process and the required knowledge using a meta-model. In doing so, pertinent information on documents, the product, the resources and the maintenance instructions is taken into account. XML serves to describe and structure the data. Using the example of an A320 nose landing gear, the data is visualised and made available.

Wu et al. [32] introduce an approach in which heterogeneous maintenance records (mostly text) are made accessible through the use of an ontology. In this context, the standards ATA (Air Transport Association) iSpec 2200 and ISO (International Organization for Standardization) 15926 are referred to as a means to obtain a representation model. The aim is to gain valuable insights for fault diagnostics and maintenance management by accessing the maintenance records. One of the targeted applications is a case-based reasoning approach. The approach is evaluated on a direct current generation system.

The authors of [41] introduce an approach to improve capacity planning of complex capital goods, despite uncertain load information. For this objective, historical data are collected with the aim to establish a damage library. On this basis, possible damages are linked to a process model. This model includes all possible regeneration paths to be carried out. For forecasting the necessary capacities, a Bayesian network is then constructed and applied.

For the fault diagnosis of flight control software, Yang et al. [42] combine case-based reasoning and a Bayesian network. In advance, a knowledge base is created in Failure Mode and Effects Analysis (FMEA) style. This allows data on failure causes, failure modes and failure effects to be described in a structured way. The case-based reasoning approach is then used for a low-level diagnosis. In case of insufficient results, a Bayesian network is learned from the historical data in order to infer the possible causes of the fault.

Considering the requirements of Sect. 4.1, it is noticeable that none of the publications described meets them sufficiently. The authors of [21, 32, 41] and [32] aim to depict knowledge about the maintenance process without using a formal description for an ontology. Furthermore, the vocabulary is not based on appropriate industry standards. This is also not the case in [19]. There, however, a learning procedure for the ontology is being developed with the intention of continuously expanding the knowledge with further concepts. Beyond that, current data from the test bench is included and maintenance recommendations are provided with probabilities. Wu et al. use the ISO 15926 and ATA iSpec 2200 standards to build an ontology, but do not take uncertainty into account. Most of the work presented relies on historical and/or current maintenance process data, yet not all approaches apply both types of data in their application scenario. The analysis of the related work is summarised in Table 1. A cross means that the requirement has been considered in the respective approach; a cross in brackets means that the requirement has been partially addressed.

Table 1 Analysis of related work regarding the defined requirements

5 Ontology for aircraft components maintenance

In the following section, essential terms and relations of aircraft components maintenance processes are derived and presented as a means to answer RQ3. For this purpose, competency questions (CQs), based on the information relevant for maintenance scheduling, are collected. They capture the functional requirements for the modular ODPs and are essential for narrowing down the relevant context. In Sect. 5.2, the aligned ontology as well as important terms and relations are introduced based on the PPR principle. Subsequently, approaches for populating the ontology with the help of techniques from DM and sub-symbolic AI are elaborated.

5.1 Competency questions

To identify the information needs regarding diagnostics and maintenance scheduling, the first step was to elicit appropriate CQs from the intended users. In that regard, the method for ontology-based requirements elicitation by Hildebrandt et al. [16] was followed. When using semantic technologies, CQs are also a suitable evaluation method to ascertain the extent to which the created ontology meets the requirements. To elicit the CQs, interviews were first conducted with potential stakeholders in order to collect suitable user stories. In addition, these statements were compared with approaches from the literature on maintenance scheduling. The following interview partners were rated as particularly important: technicians from the maintenance workshops, engineers of the LRUs, process planners, resource planners and employees from lean management. In addition, data scientists who are currently working on improved diagnostics using data-driven models were included. Basically, the CQs can be divided into two categories. On the one hand, historical data from maintenance processes is needed to carry out further descriptive and diagnostic analyses (e.g. using methods from DM). On the other hand, current data from the workshop is needed in order to conduct up-to-date maintenance planning with the help of the generated knowledge and integrated ML models for diagnostics. Fundamental CQs with regard to the ontology are listed in Table 2.

Table 2 Exemplary competency questions

5.2 Relevant terms of maintenance processes

In the following, different modular ODPs are presented for structuring the historical and current data in the MRO of LRUs. In this context, domain-unspecific and domain-specific ODPs are defined. The modularity allows the ontology to be composed specifically according to the information needs and the LRU to be varied. In the respective subsections, essential terms (written in italics) and relations for modelling the T-Box of a maintenance process are introduced. For this purpose, reference is made to existing industry standards. Following the principle of Hildebrandt et al. [16], the necessary terms and relations of an ODP are first described as Lightweight Ontologies (LWOs), i.e. as Unified Modeling Language (UML) class diagrams.

5.2.1 Ontology alignment based on the PPR structure

The relevant context regarding the maintenance process based on the derived CQs is divided and expressed in a PPR structure. The formalised process description (FPD) according to the standard VDI 3682 of the Association of German Engineers allows information on products, processes and resources to be linked at an abstract level [43]. The relevant classes are presented in the UML class diagram and can be used in a domain-unspecific manner. With the help of this ODP, information, product and energy flows of individual interconnected process steps within a system border (in this case the MRO workshop) can be formally described. Figure 4 depicts how the T-Box and the corresponding ODP are derived from a UML class diagram. The UML class diagram is a model that can be understood by all roles considered (software developer, domain expert, data scientist, ontology expert, etc.) [31]. Thus, it facilitates communication in the development phase of an ontology. Protégé has been used as a tool for modelling the T-Box, including the relevant concepts and relations with classes, object properties and data properties. This example is intended to illustrate how further ODPs are developed in the following.

Fig. 4
figure 4

VDI 3682 ODP as an LWO and in Protégé [43]

For a more domain-specific description of the maintenance data, the defined classes can be extended by domain-specific concepts of further industry standard-based ODPs. To describe the maintenance process data for the respective LRU, a modular ontology (with ODPs) is necessary, as defined in Sect. 4.1. This includes a basic structure of terms and relations for the area of MRO of aircraft components. On this basis, individual ontologies of specific LRUs can be created efficiently. The reusability of already existing ODPs reduces the effort of creation (see R1 and R2). To combine or expand the individual ODPs according to the information needs, relationships must be established between the industry standards and the classes to be modelled. The four mechanisms (1) attribute-to-class, (2) subclassing, (3) equivalent-to and (4) relation-to are applied in this respect [16]. Sections 5.2.2, 5.2.3 and 5.2.4 refine such domain-specific concepts from relevant standards. In doing so, the four mechanisms are explained by way of example. When the individual ODPs are combined according to these mechanisms, this is referred to as “alignment”.
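
How the four mechanisms might translate into ontology statements can be sketched in a few lines of Python. The sketch represents triples as plain tuples and uses standard RDFS/OWL terms for subclassing, equivalence and object properties. All class and property names are hypothetical placeholders, and the reading of attribute-to-class (an attribute of one ODP is modelled as a class of another ODP) is an assumption made here for illustration, not a definition taken from [16].

```python
# Minimal sketch: triples are plain (subject, predicate, object) tuples.
# All class and property names below are hypothetical placeholders.


def subclassing(child, parent):
    """Mechanism 2: refine a generic class by a more specific one."""
    return [(child, "rdfs:subClassOf", parent)]


def equivalent_to(class_a, class_b):
    """Mechanism 3: declare two classes from different ODPs as equivalent."""
    return [(class_a, "owl:equivalentClass", class_b)]


def relation_to(prop, source, target):
    """Mechanism 4: link two classes by an object property."""
    return [
        (prop, "rdf:type", "owl:ObjectProperty"),
        (prop, "rdfs:domain", source),
        (prop, "rdfs:range", target),
    ]


def attribute_to_class(owner, prop, attribute_class):
    """Mechanism 1 (as assumed here): an attribute of one ODP is modelled
    as a class of another ODP and attached via an object property."""
    return relation_to(prop, owner, attribute_class)


alignment = (
    subclassing("din9721:LRU", "vdi3682:Product")
    + equivalent_to("din9721:Item", "din13306:Item")
    + relation_to("ex:hasFault", "din13306:Item", "din9721:Fault")
    + attribute_to_class("din13306:Item", "ex:hasProperty", "iec61360:Property")
)
```

Keeping each mechanism in its own small function mirrors the modular idea of the ODPs: an alignment is simply the concatenation of the statements that the individual mechanisms produce.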

As indicated in Sect. 5.1, the historical and current maintenance process data is intended to improve diagnostics and maintenance scheduling. Thus, on the one hand, information on the condition of the specific LRU must be included for diagnostics tasks. On the other hand, this information must be linked with other relevant information in the context of MRO process planning (see R3). According to the standard DIN EN 13306 of the German Institute for Standardisation, maintenance consists of maintenance-related, logistical and administrative activities or processes [39]. The maintenance process data regarding the LRU is subdivided into two categories: first, data that reflects the current condition or state of the LRU and, second, historical data on past maintenance cycles, which is required as further context. In addition, logistical and administrative process data serve as input and output of the entire maintenance process. This is due to the aforementioned requirement that, in addition to condition information, information from customer contracts and logistical processes also has a significant influence on maintenance scheduling.

Basically, the ontology for structuring data, information and knowledge in the maintenance process of LRUs can be classified as follows (see Fig. 5). The ODP VDI 3682 (white, in the middle) is used for the abstract modelling of processes, as already indicated. ODPs in black are used to describe LRU-related knowledge about maintenance, which can differ greatly depending on the technology under consideration. The ATA chapters are useful here, as they structure the modules and components of an aircraft in a standardised and systematic way. The International Electrotechnical Commission (IEC) 61360 standard, in combination with the ECLASS data standard, is suitable for including features such as certain measured values in an unambiguous way. The ODPs in yellow describe maintenance activities that occur during a maintenance cycle [18, 39, 44]. In most cases, these are technology-neutral. Accordingly, the terminology can be reused independently of the item to be maintained.

Fig. 5
figure 5

Ontology alignment of relevant ODPs

Due to the special importance of diagnostics for maintenance scheduling, the relevant terms and relations are specified separately in the ODPs ISO 17359, DIN EN 9721 and ISO 3534, which are marked in grey [18, 45, 46]. Methods from DM and AI are potentially helpful for this task (see R4): first, to identify the faults that have been recorded most frequently in the historical data and, second, to use this generated knowledge in combination with expert knowledge to derive relevant fault symptoms. These, in turn, can be used as features or input for model training (e.g. for ML methods). The ODP ISO 3534 (Statistics – General statistical terms and terms used in probability) aims to address uncertainties associated with diagnostics (see R5). ODPs that contain terms and relations about supporting processes (administrative or logistical activities) and the information gained from these activities are shown in blue. In the context of administrative activities, it is particularly important to take the exchange with the client and the client's specific requirements into account. The latter are typically defined contractually. Hence, DIN 13269 (Maintenance – Guideline on preparation of maintenance contracts) has been included to represent contractual elements [47].

The UML class diagram (see Fig. 6) roughly illustrates the relationships between the terms of the individual ODPs that are linked together. Here, the four mechanisms previously explained have already been applied. In this case, the information class according to VDI 3682 is once again divided into different subclasses. The following Sects. 5.2.2, 5.2.3 and 5.2.4 depict the individual concepts regarding products, processes and resources in more detail.

Fig. 6
figure 6

Terms and relations in the aligned ontology

5.2.2 Product

Products that directly serve as input and output in maintenance process steps are the respective LRUs as well as the sub-components or materials (SRUs) installed and removed in the workshop. In addition to the physical components, further information is required and gathered in various maintenance activities. This information on the LRU must be described in a structured manner and with clear semantics. For a technology-neutral description, the system and property model according to [48] (derived from VDI 2206) can serve as a meta-model. In this way, relevant information on the LRU can be assigned to the structure, function and behaviour model.

In order to refine these very generic classes in a more domain-specific way, it is necessary to search for corresponding equivalents, subclasses or relations to further classes and attributes in pertinent standards. Considering the structure first, aircraft components according to DIN EN 13306 [39] are items that have manifold properties (reliability, availability, average failure rate, etc.). The modelling of these properties can, in turn, be based on IEC 61360 [49], for example, in which data elements are identified by type and instance descriptions. Each of these items has a part number and serial number as well as other data properties for unique identification. An aircraft component or the item under consideration structurally consists of several subcomponents or assemblies. These are also regarded as items with properties. A specific definition of the terms LRU and SRU, as used in the domain under consideration, can be found in DIN 9721 [18]. Through this principle, the initially abstract description of product (according to VDI 3682) could be refined to LRU and SRU. In addition, from an information technology point of view, a linkage of these classes could be formed through the mechanisms described before (see Fig. 7).

Fig. 7
figure 7

Terms and relations of the maintenance item

Depending on which workshops and LRUs are actually examined in the use case, more specific ODPs can be utilised. In principle, the ATA chapters can be referred to for the unique identification of the LRU. Apart from the structure, information about the functions of individual components and their behaviour is helpful, especially for fault diagnostics. According to VDI 2206, functions serve to convert an input (e.g. material, signals or energy) into an output [50].

Furthermore, each LRU should have a life cycle record. According to DIN 77005-1 [51], this contains all documented information (e.g. ownership, possession, technical information, costs) about the entire life cycle of the technical equipment in chronological order (see Fig. 8). Important documents stored include the aforementioned CMM, the Engineering Order (EO) and the Service Bulletins (SBs). Likewise, measured values from flight operations, line maintenance and MRO workshops are documented. In this form, historical data can be structured and collected over the life cycle.

Fig. 8
figure 8

LWO of the life cycle file [51]

5.2.3 Process

The maintenance process of an LRU in MRO workshops is characterised by diverse activities (see Sect. 2.1). These can be divided into the three classes of maintenance, logistical and administrative activities (the latter also referred to as management). A sample of the relevant activities and subclasses is shown in Fig. 9. DIN 13306 presents a series of preventive and corrective activities. Corrective activities include, for example, fault localisation, fault diagnostics and repair. An even more detailed description of maintenance activities is given in ISO 15926 [44]. Although this standard is not aviation-specific, it covers the same maintenance activities as the domain under consideration. An additional advantage is that the ODP has already been incorporated in a published work (see [22]), which facilitates its rapid and convenient reuse. Activities described here include inspect, troubleshoot, repair, adjust, modify, calibrate, renew, check, refit, overhaul, replace, test and others.

Fig. 9
figure 9

Maintenance-related activities

Fault diagnostics, as a subclass of the maintenance activities, comprises all measures for fault detection and fault location. As mentioned before, it is a particularly important activity in MRO for increasing the effectiveness and efficiency of maintenance scheduling. DIN ISO 17359 and appendix 1 specify possible procedures as well as terms for condition monitoring and diagnostics [45, 52]. In this context, fault location comprises the activities for recognising the faulty item on the corresponding structure level. As mentioned in Sect. 5.2.2, each item offers functions that are considered necessary for the fulfilment of a given requirement. According to the industry standard, a fault is, conversely, a state of an item in which it is not able to fulfil the required function. A fault can be identified by fault symptoms. According to DIN 9721 and DIN ISO 17359, a fault symptom is the physical manifestation of a fault [18]. Faults and fault symptoms are detected by identifying deviations from an expected behaviour (defined by features, procedures, states or variables) or from reference values, which can be described by different types of information (e.g. test results) and models (physical, simulation, data-driven, etc.) [3]. A fault can be subdivided into further subclasses such as latent or partial fault. If the requirements for a function are only partially fulfilled, this is referred to as degradation [35]. Depending on the type of system, deviations can be identified in different maintenance stages, for instance through fault messages generated during flight operations or line maintenance (see Fig. 2) in the case of unscheduled maintenance. Alternatively, they are identified in the workshops by the technicians through visual inspection or with the help of tests and recorded measured values (for example by the test equipment, see Fig. 10).
A list of historical and current faults and fault symptoms of an item provides important context that needs to be incorporated in various maintenance activities. Increasingly, diagnostic models (also from the field of sub-symbolic AI) are being investigated to assist technicians in this step. Especially in the case of ML models, relevant features that are needed for training must be determined on the basis of expert knowledge. Furthermore, uncertainty should be considered in the ontology, especially regarding recommendations for maintenance scheduling use cases, as described in Sect. 4. In particular, relations between faults and the corresponding item, fault symptoms and necessary maintenance activities should have a corresponding probability (according to DIN ISO 3534) or other measures to represent uncertainty.

Fig. 10
figure 10

Identification of fault symptoms by tests

In addition, preventive activities are carried out to prevent faults from occurring in operation. These include, for example, the compliance test or the internal inspection. The individual activities described above should be assigned time-related KPIs in order to determine the lead time. DIN 13306 defines maintenance time as the period of time in which maintenance activities are carried out on an item. This also includes technical, logistical and internal/external administrative delays. In this regard, a logistical delay is the period of time during which maintenance cannot be carried out because maintenance resources (see Sect. 5.2.4) have to be procured. To determine the logistical delays, the logistical activities or logistical processes must be known first. VDI 4490 [53] defines concepts based on the general material flow of a typical company. These include incoming goods, warehouse input, storage/replenishment/order-picking/warehouse output or dispatch. With the help of information from VDI/VDMA 5100 [54], these terms can be further supplemented. The logistical activities are also used to improve the visibility of the end-to-end material flow in the MRO workshops. Typically, the data describing these activities is stored in heterogeneous data sources and is not temporally correlated. Besides, the different abstraction levels of the processes stored in the various data sources hinder the data analysis. Moreover, the activities are not described with vocabularies of the domain [32]. In order to evaluate these event traces and process data in a meaningful way, the activities must be depicted in a structured and systematic way for suitable pre-processing. According to DIN 17007, management includes all activities that are necessary to achieve the goals set by the company’s management [55]. This also comprises the exchange of information with external partners, which is important because it enables the customer’s requirements (especially contractual elements) to be incorporated into the planning of maintenance activities.
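
The composition of the maintenance time from active maintenance and the different delays can be illustrated with a short Python sketch. It sums the durations of a few invented events per category and adds them up to a total. The event list, the category labels and the function name `hours_per_category` are hypothetical and merely indicate how time-related KPIs might be derived from structured event traces.

```python
from datetime import datetime

# Hypothetical event trace: (activity, category, start, end) in ISO format.
EVENTS = [
    ("test", "active maintenance", "2024-01-08T08:00", "2024-01-08T10:00"),
    ("customer approval", "administrative delay", "2024-01-08T10:00", "2024-01-09T10:00"),
    ("wait for spare part", "logistical delay", "2024-01-09T10:00", "2024-01-11T10:00"),
    ("repair", "active maintenance", "2024-01-11T10:00", "2024-01-11T13:00"),
]


def hours_per_category(events):
    """Sum the duration of all events per category, in hours."""
    totals = {}
    for _activity, category, start, end in events:
        duration = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        totals[category] = totals.get(category, 0.0) + duration.total_seconds() / 3600
    return totals


totals = hours_per_category(EVENTS)
# Maintenance time in the sense used above: active time plus all delays.
maintenance_time_h = sum(totals.values())
```

In this invented trace the delays account for most of the maintenance time, which is the kind of insight that a structured description of logistical and administrative activities is meant to make visible.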

5.2.4 Resource

When carrying out maintenance activities (e.g. inspection, tests, fault diagnostics), multiple resources are necessary. Examples are tools, monitoring and test equipment (e.g. the test bench) and human resources (technicians), as explained in DIN 17007. Most of the maintenance activities are carried out manually by technicians. A particularly important piece of monitoring and test equipment in the maintenance process of LRUs is the test bench. Above all, the test bench is necessary for the maintenance activity test and for determining the item condition (see Sect. 5.2.3). According to DIN 9721, a subset of fault symptoms can be codified by the test results [18]. The codification can, for instance, be done with the binary information FAILED/PASSED or OK/NOT OK. The individual tests for the LRU or specific SRUs are documented in the CMM. During a test, different measured values are recorded in order to check whether they comply with a certain reference value or tolerance range. If this is the case, the individual test is PASSED, otherwise it is FAILED. Subsequently, the aircraft component goes back to the workshop, where the causes of the fault symptoms are searched for. As mentioned in Sect. 2.1, the CMM defines troubleshooting procedures for fault isolation, which are recommended as iterative and corrective maintenance measures for fault elimination.
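
The binary codification of test results described above can be sketched in Python as follows. A measured value is compared with a reference value and a symmetric tolerance, and the test step is marked PASSED or FAILED accordingly. The test step names, the numerical values and the function name `evaluate_test` are invented for illustration and do not stem from a CMM.

```python
def evaluate_test(measured, reference, tolerance):
    """Return 'PASSED' if the measured value lies within reference +/- tolerance."""
    return "PASSED" if abs(measured - reference) <= tolerance else "FAILED"


# Hypothetical test steps: name -> (measured value, reference value, tolerance).
TEST_STEPS = {
    "stroke": (50.1, 50.0, 0.5),
    "internal leakage": (0.8, 0.5, 0.2),
}

results = {name: evaluate_test(*values) for name, values in TEST_STEPS.items()}

# Failed test steps are candidates for codified fault symptoms.
failed_steps = [name for name, result in results.items() if result == "FAILED"]
```

Note that the binary result discards how far a measured value lies from its reference, so the measured values themselves have to be kept if they are to be analysed later.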

5.2.5 Ontology population

Before the ontology can be used, it must be instantiated with data objects from the original data sources. As outlined in Sect. 4, there are some challenges that need to be overcome concerning the ontology population. A major hurdle in this context is that a large amount of the data on maintenance activities, but also on the findings of faults and fault symptoms, is available in textual, i.e. unstructured or semi-structured, form. In this respect, the first step is to bring this data into a structured form in order to extract essential information that can be used in the ontology. Natural Language Processing (NLP) approaches have proven helpful in labelling the texts noted down by technicians with terms of the ontology and thereby extracting the necessary information [22]. Structured data stored in relational databases as well as recorded time series data and measuring points must also undergo pre-processing, for example with methods from signal processing. Subsequently, defined mappings are needed to populate the ontology with instances (see Fig. 11).

Fig. 11
figure 11

Ontology population
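
The role of a mapping in the population step can be illustrated with a minimal Python sketch that turns one structured record into a set of triples. The record, the base IRI `http://example.org/mro/` and the class and property names are hypothetical; in practice, such mappings would be expressed in a dedicated mapping language rather than in hand-written code.

```python
BASE = "http://example.org/mro/"  # hypothetical namespace

# Hypothetical record as it might come from a relational workshop database.
ROW = {
    "serial_number": "SN-0001",
    "part_number": "EHSA1234",
    "test_step": "internal leakage",
    "result": "FAILED",
}


def row_to_triples(row):
    """Map one test record to (subject, predicate, object) triples."""
    lru = BASE + "lru/" + row["serial_number"]
    test = lru + "/test/" + row["test_step"].replace(" ", "_")
    return [
        (lru, "rdf:type", "ex:LRU"),
        (lru, "ex:partNumber", row["part_number"]),
        (test, "rdf:type", "ex:Test"),
        (test, "ex:performedOn", lru),
        (test, "ex:result", row["result"]),
    ]


triples = row_to_triples(ROW)
```

Deriving the subject IRI from the serial number means that records about the same LRU from different data sources end up describing the same instance.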

The ontology, populated with knowledge and collected information about products, processes and resources, can in turn be used for the tasks described in Sect. 5.2.1. To enable even better fault diagnostics, the numerical measured values, rather than only the binary test results, must be included in the analysis. Methods from signal processing and ML are potentially helpful in classifying faults [26]. The knowledge specified in the ODPs of the life cycle record as well as information gained from the maintenance activities actually performed are valuable in this process. Thus, further context and prior knowledge can be incorporated in order to interpret the measured values correctly. Likewise, the ontology can be used for techniques such as process mining to extract valuable insights from the collected event traces. To expand the knowledge of the ontology, it is important that the knowledge gained from these techniques is formalised again. For this purpose, the involvement of domain experts is advisable.

6 Exemplary proof of concept

In the following Sect. 6.1, the historical and current maintenance process data of an exemplary LRU from the field of hydraulic aircraft components is structured using the ontology modelled. Besides, the extension and the use of the ontology for further LRUs from other technological areas will be discussed exemplarily. Finally, the first results from the use of the ontology will be analysed in Sect. 6.2.

6.1 Exemplary aircraft component

To ensure the practical applicability of the ontology developed, the maintenance cycles, including the associated data, of an electro-hydraulic servo actuator (EHSA) have been examined. This actuator consists of hydraulic parts (electro-hydraulic servo valve and linear hydraulic actuator) as well as control parts (linear position sensor) and belongs to the flight control systems. The EHSA is needed to control various aircraft surfaces (e.g. flaps). It receives the electrical control signals from the flight control computer, which in turn obtains feedback from the EHSAs and various sensors around the control surface under consideration [56]. The EHSA is well suited as a reference component for a more in-depth analysis for several reasons. First, the number of EHSAs that are in operation now and will be in the future is substantial. Moreover, the load on the EHSA with every take-off and landing is significant. Consequently, the need for maintenance activities is expected to remain high [12]. Second, the main components of many primary flight controls are the same. Hence, the modular ontology with its terms and relations can, to a certain extent, be transferred to a variety of part numbers.

The CQs defined in Sect. 5.1 were used to specify the knowledge base, i.e. the ontology to be devised. The first three CQs (CQ1–CQ3) address populating the knowledge base with historical data and integrating a suitable data-based diagnostic model to automatically determine faults. Therefore, the first step was to determine which faults occurred most frequently on which item in the past. On this basis, and by including expert knowledge, relevant features for the model training were determined. In this context, methods from DM and ML were applied in the application example. Since the focus of this paper is the creation and use of the ontology for short-term maintenance scheduling, the details of the ML model, the simulation model for generating the necessary data sets, and the DM methods are not explained explicitly. Instead, reference is made to [12] for a more in-depth description of the procedure to implement and use the DM and ML approaches. Essentially, the knowledge generated in these previous publications [12, 56] is formalised in the knowledge base. Subsequently, the maintenance activities necessary to restore the EHSA have been modelled (CQ4). Finally, the output information of the diagnostic model was linked to the corresponding maintenance activities (CQ5). In this process, the logistical and administrative activities were added. The primary objective was to assess the maintenance requirements of the LRU (including diagnostics and contractual elements with the customer) in comparison to the capabilities of the maintenance organisation (CQ6). The individual T-Boxes of the ODPs and the alignment ontology have been modelled in Protégé (version 5.5.0). A graph database (Ontotext GraphDB) has subsequently been used to materialise the individual RDF triples obtained from the historical as well as current data. In this context, mappings have been written for the semantic annotation and the efficient integration of the data into the ontology.
The mapping languages R2RML, a W3C standard, and RML, which extends it to further data formats, were employed to integrate diverse data formats, which were introduced in Sect. 2.1. Subsequently, SPARQL was used to query the graph database and retrieve individual triples. To generate dynamic SPARQL queries at runtime (e.g. based on various currently generated fault symptoms for CQ3), Python with the library RDFLib was utilised.

Regarding CQ1–CQ3, various information on structure, function and behaviour is necessary (see Sect. 5.2.2). The heterogeneous sources of historical data mostly contain information on typical classes of faults for certain SRUs. To identify the most predominant faults, as demanded in Sect. 5.1, the integrated historical data should then be analysed using DM methods. For this purpose, the EHSA (LRU) and its structural components (SRUs) were described semantically beforehand. Figure 12 depicts the reference configuration of the EHSA with its individual sub-components. The SRUs to be considered for the task are labelled in Fig. 12. As mentioned in Sect. 5.2.2, the ODPs VDI 2206 and DIN 9721 can be used here to express the structural information. However, a more detailed description of the individual structural information may be useful. In this case, the corresponding maintenance document (e.g. the CMM) has been used for instantiation. Before modelling, effort and benefit should be weighed up, as the manual creation of the T-Box consumes a considerable amount of time. If the CMM is not revised constantly, the effort can be worthwhile.

Fig. 12 EHSA reference configuration [12]

The ODPs DIN 9721, ISO 17359 and DIN 77005 have been used to describe the relationships between faults, fault symptoms, measured input and output signals or measuring points and the reference values defined in the CMM. For measured values and their associated units, which indicate the behaviour of the LRU, the ODP IEC 61360 was applied, which references ECLASS. Typically, three signals or types of information are relevant for the control of the EHSA: first, the position command, which specifies the target position; second, the actual position (ascertained with the help of the linear variable differential transformer (LVDT)), which closes the control loop; and third, the servo valve current, which is needed to control the valve [56]. As referenced in Sect. 5.1, answering CQs with corresponding SPARQL queries serves as an evaluation method in the context of semantic technologies. The objective is to demonstrate that the ontology effectively addresses the intended inquiries and supplies the necessary information. An example is shown in Fig. 13. The SPARQL query retrieves information about faults and their associated fault symptoms from an RDF dataset that uses the ODPs DIN 9721 and ISO 17359. It groups the results by combinations of the specific LRU “EHSA1234” (the designation serves as a dummy) and its SRUs and then counts the frequency of each fault for each combination. The query also concatenates the fault symptoms for each fault and sorts the results by fault frequency in descending order.
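The logic of the query described above can be mirrored in plain Python as a rough sketch. In SPARQL 1.1, the same steps correspond to `GROUP BY`, `COUNT`, `GROUP_CONCAT` and `ORDER BY DESC`. The record layout (keys `lru`, `sru`, `fault`, `symptoms`) is an assumption made for illustration and does not reproduce the query from Fig. 13.

```python
from collections import defaultdict


def fault_frequencies(records, lru_id):
    """Count faults per (SRU, fault) for one LRU and collect their symptoms.

    Each record is assumed to be a dict with the hypothetical keys
    'lru', 'sru', 'fault' and 'symptoms' (a list of strings).
    """
    counts = defaultdict(int)
    symptoms = defaultdict(set)
    for rec in records:
        if rec["lru"] != lru_id:
            continue  # restrict the result to the requested LRU
        key = (rec["sru"], rec["fault"])
        counts[key] += 1
        symptoms[key].update(rec["symptoms"])
    rows = [
        (sru, fault, n, "; ".join(sorted(symptoms[(sru, fault)])))
        for (sru, fault), n in counts.items()
    ]
    # Most frequent faults first; ties are ordered alphabetically for stability.
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows


if __name__ == "__main__":
    # Invented example records, not data from the application example.
    demo = [
        {"lru": "EHSA1234", "sru": "EHSV", "fault": "degraded torque motor",
         "symptoms": ["slow response"]},
        {"lru": "EHSA1234", "sru": "EHSV", "fault": "degraded torque motor",
         "symptoms": ["slow response", "position offset"]},
        {"lru": "EHSA1234", "sru": "LVDT", "fault": "signal drift",
         "symptoms": ["position offset"]},
    ]
    for row in fault_frequencies(demo, "EHSA1234"):
        print(row)
```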

Fig. 13 Simplified SPARQL query for answering CQ 2

This integration enabled data that would typically be stored in separate IT and OT systems to be accessed and queried through a unified SPARQL endpoint. Additionally, the introduction of semantics and context enriched the data, facilitating a coherent analysis of information originally residing in disparate silos. An examination of the collected historical data revealed that certain faults were detected more frequently than others. For example, the electro valves and the mechanical feedback of the EHSA are usually less critical. The electro-hydraulic servo valve (EHSV), in contrast, is replaced repeatedly, also due to the mechanical significance of this SRU [12]. The EHSV itself comprises a number of smaller subcomponents (e.g. torque motor, spool and feedback spring). Its function is to control the fluid flow in the EHSA by positioning the valve opening. Accordingly, the EHSV is important for the overall dynamic behaviour of the actuator. Various levels of degradation are possible as faults, which are examined in [56]. For example, the torque of the torque motor may decrease, resulting in a deterioration of the response of the EHSV (as a fault symptom). The reason for this could be a short circuit between neighbouring coils, for example as a result of accumulated metallic deposits [56]. Information on such anomalous behaviour can be collected via diverse measured signals and converted into relevant features using signal processing methods. This gained knowledge has in turn been used to extend the ontology. In addition, the possible fault classes have been included in the ontology by means of further sub-classing. Following this step, various relevant features from the measured signals were defined for the training of the ML model in collaboration with domain experts and data scientists. For this purpose, various reference values have been assigned to these features.
The different measurement values from the individual test sequences, obtained from the test bench, were used to first record the data signals. Other information from the life cycle file was also incorporated. Subsequently, the methods from signal processing were applied as previously explained. The data pre-processing was not done with the ontology, but as described in Fig. 11. Thus, the ontology only includes references to the features defined with the help of the ODP ISO 17359 (see Fig. 10). Furthermore, labels have been specified with the predefined fault classes in order to determine the possible results of the diagnostics and the fault classification. In the considered use case of the EHSV, only a binary classification has been conducted initially. This is due to the circumstance that the EHSV is typically completely replaced when the reference values are exceeded [12].
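How features and reference values can lead to a binary label may be sketched as follows. The two features (RMS tracking error between position command and LVDT position, mean absolute servo valve current) and all reference values are assumptions chosen for illustration. The simple threshold check is a stand-in and does not represent the ML model or the feature set from [12].

```python
import math


def extract_features(command, position, current):
    """Derive two scalar features from equally sampled test-bench signals."""
    if not command or not (len(command) == len(position) == len(current)):
        raise ValueError("signals must be non-empty and equally long")
    # Tracking error between commanded and measured (LVDT) position.
    errors = [c - p for c, p in zip(command, position)]
    rms_error = math.sqrt(sum(e * e for e in errors) / len(errors))
    mean_abs_current = sum(abs(i) for i in current) / len(current)
    return {
        "rms_tracking_error": rms_error,
        "mean_abs_current": mean_abs_current,
    }


def binary_label(features, reference_values):
    """Return 'faulty' if any feature exceeds its reference value, else 'healthy'."""
    exceeded = any(
        features[name] > limit for name, limit in reference_values.items()
    )
    return "faulty" if exceeded else "healthy"


if __name__ == "__main__":
    # Invented signals and reference values for demonstration only.
    feats = extract_features([0.0, 1.0, 2.0], [0.0, 0.9, 1.8], [1.0, -1.0, 1.0])
    print(feats, binary_label(feats, {"rms_tracking_error": 0.5}))
```

In such a setup, the ontology would only hold references to the feature definitions and reference values, while the computation itself stays outside the knowledge base.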

In the next step, the planned processes or activities to be carried out were semantically defined with the help of domain experts and the CMM (on the basis of the previously defined relevant fault classes). The historical maintenance activities that were necessary to eliminate certain faults can also be used as support here (CQ 4). The ODPs DIN 13306 and ISO 15926 provide the pertinent terms and relations. With the help of these concepts, historical data on individual activities can be semantically annotated. In principle, the individual process paths (in the form of process patterns) with the respective maintenance activities have been created. To facilitate this step, the tool FPB.js was applied [57]. It provides a graphical user interface for modelling the activities with the concepts from VDI 3682. Additionally, the information is mapped in JSON format and can be easily incorporated into the ontology. The resulting process library specifies the corresponding maintenance activities on the basis of possible faults. If the EHSV is taken as an example, “Replace EHSV” would be a target-oriented activity resulting from the fault classification of the diagnostic model. An explicit description of the relationship between the maintenance activities to be carried out successively, the LRUs or SRUs and the necessary resources follows with the help of the ODP VDI 3682. As mentioned above, the maintenance activities inspection and test are essential with regard to maintenance scheduling. After all, these activities record all relevant fault symptoms and measured data, which are highly relevant for the subsequent fault diagnostics. Therefore, the maintenance activity test, with its input and output products (red circle) and information (blue hexagon), is illustrated in simplified form using the symbols of VDI 3682 (see Fig. 14). The subsequent maintenance activity of fault diagnostics has also been explicitly modelled with its input and output information.
In this step, the trained diagnostic model has been integrated to consider the results for the configuration of the process model. All necessary features were passed to the model via a defined interface. The return value corresponded to the respective fault classification result and an exemplary probability. Using a single SPARQL query, all the essential information from diverse data sources, such as fault classification through the ML model, fault symptoms and structure information, was combined to address CQ3. Additionally, this integrated information, along with a dynamically generated SPARQL query based on diagnostic results, facilitated the determination and combination of maintenance activities required for subsequent maintenance scheduling, as required in CQ5.
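The link between a diagnostic result and the process library can be sketched as a simple lookup. Apart from “Replace EHSV”, inspection, test and fault diagnostics, all activity names, the `DiagnosticResult` structure and the probability threshold are hypothetical and only serve to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticResult:
    """Assumed shape of the value returned by the diagnostic model interface."""
    sru: str
    fault_class: str
    probability: float


# Hypothetical excerpt of a process library: fault class per SRU -> activities.
PROCESS_LIBRARY = {
    ("EHSV", "faulty"): [
        "Disassemble EHSA",
        "Replace EHSV",
        "Reassemble EHSA",
        "Final test",
    ],
    ("EHSV", "healthy"): [],
}


def plan_activities(results, library=PROCESS_LIBRARY, threshold=0.5):
    """Combine the fixed entry activities with fault-specific ones."""
    plan = ["Inspection", "Test", "Fault diagnostics"]
    for res in results:
        if res.probability < threshold:
            continue  # ignore classifications below the assumed threshold
        for activity in library.get((res.sru, res.fault_class), []):
            if activity not in plan:
                plan.append(activity)
    return plan


if __name__ == "__main__":
    print(plan_activities([DiagnosticResult("EHSV", "faulty", 0.9)]))
```

In the approach described above, this lookup is expressed as a dynamically generated SPARQL query against the knowledge base rather than as a Python dictionary.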

Fig. 14 Simplified description of the maintenance activity test, based on VDI 3682

Finally, logistical and administrative activities that depend directly on the state of the item have also been modelled (see CQ 6). Typically, a workshop does not have all the capabilities needed to fix the faults of an LRU or its faulty SRUs. The items are then either assigned to another workshop internally, or an external maintenance service provider specialised in the required maintenance activity is commissioned. In both scenarios, logistical activities are required to realise the material flow. This information must also be integrated and linked in the planning task. For this purpose, the contractually agreed information (i.e. the contractual elements) from the ERP system and the associated knowledge have been described with the ODP DIN 13269. As a result, both the logistical and the administrative activities could be sequenced sensibly with the maintenance activities in order to generate the complete maintenance process. Since the ODPs VDI 4490 and DIN 17007 utilised for this purpose are component-independent, they can be reused regardless of the LRU under consideration. This also applies if this principle is to be transferred to another aircraft component.
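The comparison of maintenance requirements with workshop capabilities can be illustrated with a small sketch that inserts logistical steps wherever an activity cannot be performed locally. The capability names and the two logistical activity labels are invented for this example and are not taken from the ODPs.

```python
def route_activities(activities, required_capability, workshop_capabilities):
    """Insert logistical steps around activities the workshop cannot perform.

    `required_capability` maps each activity to the capability it needs;
    `workshop_capabilities` is the set of capabilities available locally.
    """
    routed = []
    outsourced = False  # True while the item is away from the home workshop
    for activity in activities:
        local = required_capability[activity] in workshop_capabilities
        if not local and not outsourced:
            routed.append("Ship item to capable provider")
            outsourced = True
        elif local and outsourced:
            routed.append("Receive item from provider")
            outsourced = False
        routed.append(activity)
    if outsourced:
        # The item must return before the process can be closed.
        routed.append("Receive item from provider")
    return routed


if __name__ == "__main__":
    needs = {
        "Test": "test bench",
        "Replace EHSV": "hydraulics repair",
        "Final test": "test bench",
    }
    print(route_activities(list(needs), needs, {"test bench"}))
```

Whether the capable provider is another internal workshop or an external service provider would, in the setting described above, follow from the contractual elements held in the knowledge base.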

The exemplary application to the EHSA has demonstrated that the proposed ontology helps to link the heterogeneous maintenance process data. In this form, not only could data from OT and IT be integrated, but valuable knowledge about the LRU could also be formalised through the linked information. This enables maintenance service providers to use the actual condition and diagnostic information of an LRU for planning maintenance-related and business processes. Likewise, this formalised knowledge can be reused for further use cases.

6.2 Discussion

This section provides a critical reflection on the initial outcome of creating and using ontologies as knowledge bases for maintenance process data of LRUs. In this respect, the defined challenges and requirements (see Sect. 4.1) are compared with the results from Sect. 6.1. Besides, the transferability to other LRUs will be analysed theoretically.

The primary objective, as outlined in Sect. 1, was to establish a comprehensive knowledge base that accumulates maintenance process data, pertinent information and valuable expert knowledge. This knowledge base was intended to integrate maintenance process data from both OT and IT sources seamlessly and to make them accessible to the envisioned digital services. When designing the knowledge base and ontology, both requirements R1 and R2 were taken into account. First, the necessary knowledge was modularised using ODPs. Second, industry standards were utilised to describe terms and relations, ensuring reusability. The aim was to enhance the efficiency of creating LRU-specific ontologies. Admittedly, the prototypical implementation has only been shown using one LRU (in this case the EHSA). Nonetheless, the analysis of other workshops and technologies reveals that, as intended, only the black ODPs (see Fig. 5) need to be adapted for other LRUs. All other ODPs in the ontology can be reused and hence contribute to reducing the effort required for modelling individual knowledge bases. As such, it has been illustrated that the ontology is suitable for structuring and collecting data, information and knowledge on LRUs in the maintenance workshops. Only the LRU-specific description of information (component structure, function, behaviour, faults, etc.) must be addressed. In this context, the mapping rules are particularly relevant, as various technology workshops (e.g. in avionics) employ distinct data formats. Adapting to these diverse formats incurs substantial effort. However, if the ontology is to be used for LRUs from the same technology family, the effort for modifying the T-Box can be considered even lower. For example, data analysis has shown that the modelled T-Boxes can be adopted for a wide variety of EHSAs, since the component descriptions used are often very similar or identical. Nonetheless, the ontology must be evaluated using further LRUs.

The prototypical application to the EHSA has shown that the ontology incorporates significant information for short-term maintenance scheduling. As such, it integrates relevant maintenance process data on maintenance, logistical and administrative activities in accordance with DIN 13306 (see R3). Of particular relevance is information on the condition of the LRU as well as on its life cycle (past maintenance activities and measured values). This information can serve as input for appropriate algorithms (e.g. models from the field of ML) in order to carry out planning steps. Contractual information (customer requirements) and logistical information are equally valuable for improving maintenance performance. The method has not yet been compared with state-of-the-art methods of maintenance scheduling. In this respect, it is necessary to assess how the procedure presented performs with regard to the logistical KPIs. Since the results also depend strongly on the diagnostic results, research is needed on how these can be improved. Beyond that, further information on currently available spare parts stocks and available resources (measuring equipment, technical personnel, etc.) must be integrated.

Experience has shown that the domain of maintenance particularly benefits from the combination of sub-symbolic and symbolic AI methods (see R4). In the application example, correlations only became apparent by analysing historical data (e.g. frequent faults). As a result, the ontology could be extended, for example by formalising the existing fault classes on the basis of the data analysis. It has also been demonstrated that employing ML models to infer possible fault classes in the maintenance activity diagnostics is highly beneficial. Furthermore, the knowledge of domain experts can be used to determine necessary features as well as to label data. However, data quality remains a major concern. As maintenance is still dominated by manual activities, it is much more difficult to track maintenance activities purely on the basis of data. Relevant measured data for model training are sparsely collected. It is not uncommon for the technician to intervene manually during the test. As a consequence, recorded historical measured data can only be relied upon to a limited extent. Likewise, condition information from flight operations is often missing. In this respect, the actual integration of ML methods remains considerably more complicated in practice. Furthermore, the automatic population of the ontology is still unresolved with regard to some of the existing information. After all, a lot of information is available in unstructured form (e.g. maintenance documents and maintenance records) and currently still has to be modelled manually or integrated into the ontology by hand. In view of the large amount of information and the frequency of its changes, the effort involved in manual creation is very high. In this context, methods from multimodal learning need to be further explored and applied.

Considering the aspect of uncertainty (see R5), important terms and relations from the field of statistics are incorporated with the help of the ODP ISO 3534. This enables further types of uncertainty (both aleatory and epistemic) to be accounted for in maintenance scheduling tasks. In the application example, a simplified metric was employed to verify the fundamental functionality of the ontology concerning the CQs. However, no appropriate metric has been utilised in the outlined example to represent the uncertainty of the diagnosis. Evaluating uncertainty in ML models remains an active area of research. The choice of appropriate techniques depends, amongst other factors, on the specific ML model used for diagnostics. In the future, more suitable techniques should be considered, particularly those that adapt based on the available information. For instance, classification results could be characterised using suitable probability distributions and confidence intervals. Additionally, reducing epistemic uncertainty can be achieved through targeted digitization and acquisition of relevant features for diagnostics. Emphasising these aspects would enhance the overall performance and reliability of the diagnostic process.
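As one possible way of attaching a confidence interval to classification results, the following sketch computes the Wilson score interval for a proportion, for example the share of correctly classified items in a validation set. The choice of this interval and of the 95% level (`z = 1.96`) is an assumption for illustration. It is not the metric used in the application example.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    `z` is the standard normal quantile (1.96 for roughly 95% confidence).
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Invented figures: 45 of 50 validation items classified correctly.
    low, high = wilson_interval(45, 50)
    print(f"accuracy 0.90, interval [{low:.3f}, {high:.3f}]")
```

Compared with the simple normal approximation, the Wilson interval stays within [0, 1] and behaves better for the small sample sizes that sparse maintenance data tend to produce.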

7 Summary and outlook

In the era of Industry 4.0, the systematic organisation of data, information and knowledge related to maintenance processes is crucial for MRO service providers. Therefore, the main objective of this contribution was to emphasise the potential advantages of ST for data-driven process improvements. First, valuable data sources in maintenance processes of LRUs and challenges in data-driven projects have been outlined as a means to answer RQ1. Papers from different domains demonstrated that ontologies are potentially helpful in overcoming these challenges. Notably, ontologies serve as an efficient means to access and integrate heterogeneous data collected from diverse sources. Moreover, when combined with approaches from DM and sub-symbolic AI, they offer the opportunity to generate valuable new insights and knowledge. However, specific consideration must be given to additional requirements within the domain under study. Therefore, after analysing the challenges, pertinent requirements for developing an ontology concerning maintenance processes were identified (see RQ2). With respect to RQ3, the ontology was constructed by defining essential terms and relations according to industry standards. These definitions are particularly relevant for application in diagnostics and short-term maintenance scheduling. The applicability of this ontology has been assessed by means of an exemplary application, in which an EHSA and its maintenance process data have been investigated. As part of the knowledge base evaluation, specific CQs were formulated and addressed using SPARQL queries. This allowed the effectiveness of the ontology in meeting the information needs of stakeholders to be assessed.

The ontology presented must still prove advantageous in further use cases and for different LRUs. With regard to the different data formats, it is also necessary to choose a suitable strategy between materialised and virtualised data access. The latter is particularly recommended for time series data. After all, in view of the high volume and velocity of the data, a complete materialisation is not expedient. Moreover, the companies concerned have to deal with major data quality issues. In some cases, valuable information for the ontology may be unavailable or of insufficient quality. As part of digitalisation efforts, research should be conducted on how to collect this information from the maintenance process using appropriate technologies. For use cases such as diagnostics and short-term maintenance scheduling, it is crucial to generate accurate and synchronised timestamps for measured values and maintenance activities. This should be done without disrupting the technicians’ work. In addition, solutions must be found for the problem of uncertainty in the ontology. For example, research is needed to determine whether a certain fuzziness in the data can be tolerated. Besides, further techniques from DM and sub-symbolic AI need to be applied in the described application scenario on the basis of ontology-based data access and integration. Creating and updating such ontologies with little effort remains a significant challenge. The continuously changing contextual conditions necessitate frequent adjustments to both the T-Box and the A-Box. Therefore, methods need to be explored that facilitate this process.