Predictive maintenance using digital twins: A systematic literature review

Context: Predictive maintenance is a technique for creating a more sustainable, safe, and profitable industry. One of the key challenges in creating predictive maintenance systems is the lack of failure data, as the machine is frequently repaired before failure. Digital Twins provide a real-time representation of the physical machine and generate data, such as asset degradation, which the predictive maintenance algorithm can use. Since 2018, scientific literature on the utilization of Digital Twins for predictive maintenance has accelerated, indicating the need for a thorough review. Objective: This research aims to gather and synthesize the studies that focus on predictive maintenance using Digital Twins to pave the way for further research. Method: A systematic literature review (SLR) using an active learning tool is conducted on published primary studies on predictive maintenance using Digital Twins, in which 42 primary studies have been analyzed. Results: This SLR identifies several aspects of predictive maintenance using Digital Twins, including the objectives, application domains, Digital Twin platforms, Digital Twin representation types, approaches, abstraction levels, design patterns, communication protocols, twinning parameters, and challenges and solution directions. These results contribute to a Software Engineering approach for developing predictive maintenance using Digital Twins in academia and industry. Conclusion: This study is the first SLR on predictive maintenance using Digital Twins. We answer key questions for designing a successful predictive maintenance model leveraging Digital Twins. We found that, to this day, computational burden, data variety, and the complexity of models, assets, or components are the key challenges in designing these models.


Introduction
Since the emergence of the Industrial Revolution, technical and technological advancements have dramatically increased industrial productivity. However, after the 1970s, with the trend toward machine automation, technological advancements in the industry were lacking compared to other markets [1]. In 2011, a new revolution of digital advancement in the industry began, also known as Industry 4.0 [2]. This transformation connects sensors, machines, and Information Technology (IT) systems to increase value along the supply chain. These connected systems, also called cyber-physical systems, can interact and exchange data through Internet protocols.
While manufacturing processes and machines have become increasingly smart and complex, worldwide regulations induce a need for more secure, reliable, and safe systems. A machine component failure, for instance, may cause further damage to the machine, harm employees, or pollute the environment. Maintenance is executed to avoid dangerous machine failures to improve a machine's health condition.
The following maintenance methods are discussed in the literature [3]: reactive maintenance, preventive maintenance, condition-based maintenance, predictive maintenance, and prescriptive maintenance. Descriptions of these methods are presented in the Background and Related Work section. This study focuses on predictive maintenance approaches, as the field is rapidly evolving while still being implemented. However, large datasets with run-to-failure data must be built to develop predictive maintenance models. Unfortunately, most machines equipped with sensors are well-designed, resulting in very low amounts of failure data. As these machines are equipped with sensors and are connected to the Internet, it is possible to connect the physical machine to a digital counterpart that generates data on which a predictive algorithm can be trained. When such an algorithm is trained, its model can use inputs from the cyber-physical system to (1) classify whether a failure will happen, or (2) when there is enough data, predict when the failure is going to happen (i.e., estimate the remaining useful life). One of the capabilities of the Digital Twin, according to [6], is to estimate the physical system's response to an unexpected event before it occurs. This prediction may be made by analyzing the event and the current response against previous behavior predictions. A complete Digital Twin instance can be produced depending on the data collected and the completeness of the simulations employed.
In this study, an SLR on predictive maintenance using Digital Twins is conducted to collect and synthesize the available literature to present the state-of-the-art to establish a foundation for future research. We aim to find all relevant information regarding Predictive Maintenance using Digital Twins from a Software Engineering perspective. The literature review identifies the critical components of predictive maintenance using Digital Twins that academics and the industry may use. The steps used in the SLR process follow [7]. We have carefully identified and synthesized 42 academic studies that leverage predictive maintenance using Digital Twins. No comprehensive overview of studies of the current state of predictive maintenance using Digital Twin has been reported to this day. Hence, this paper aims to identify and synthesize the current studies focusing on predictive maintenance using Digital Twins and identify the objectives, the application domains, development approaches, and the corresponding challenges and solution directions. This research provides a foundation for future research on leveraging Digital Twin techniques to improve predictive maintenance capabilities. New technological advancements in Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), Computer Vision, IoT, and Digital Twin fields provide many opportunities for predictive maintenance using Digital Twins; however, there are many general and domain-specific challenges to address.

Background and related work
There are several maintenance methods, each with a different balance of complexity versus cost. [3] provided the following non-exhaustive list of five maintenance methods:

• Reactive maintenance focuses on performing maintenance only when a component breaks, which is often used for components with low cost and a low risk of hazardous situations. Reactive maintenance induces machine downtime through unscheduled maintenance. It is the costliest maintenance method and carries a high risk of catastrophic failures of the whole machine.
• Preventive maintenance focuses on performing maintenance at set times. Here, the lifetime of each part is determined, and maintenance is done before the part fails. Preventive maintenance allows businesses to schedule maintenance and avoid machine downtime. However, this under-utilizes the component [4], as the trend is to ensure safety and service maintenance by over-maintaining the machine, which is expensive [3].
• Condition-based maintenance consists of predicting maintenance based on degradation and variation from the normal behavior of the machine [8]. The advancement of IoT and cloud computing, leveraged to monitor the asset's status, allows these abnormalities to be discovered. AI algorithms can improve condition-based techniques by diagnosing and collecting precise status data [9].
• Predictive maintenance focuses on proactive methods to reduce cost and increase machine uptime. Predictive maintenance aims to foresee when a component or system will no longer fulfill its function. This estimation is quantified through the remaining useful life (RUL), a Health Indicator, or equipment effectiveness. Based on the RUL and other machine data, businesses can predict when to schedule maintenance for a component or system [10,11]. Making full use of each asset is also more sustainable, as resources are used to their fullest capacity while not damaging other assets.
• Prescriptive maintenance refers to predictive maintenance with a module to prescribe an action plan [12]. Consequently, the preventive maintenance plan is substituted with a proactive and intelligent approach [13,14]. The impact on service, cost, and safety is estimated to be optimal. However, this method is also the most complex to implement.
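The RUL quantification mentioned above can be illustrated with a minimal sketch: fit a linear degradation trend to a health indicator and extrapolate it to a failure threshold. All function names and values here are illustrative, not from any of the reviewed studies.

```python
# Sketch: estimating remaining useful life (RUL) by fitting a linear
# degradation trend to a health indicator and extrapolating to a
# failure threshold. Purely illustrative.

def fit_linear_trend(times, health):
    """Least-squares fit: health ~ a + b * t."""
    n = len(times)
    mean_t = sum(times) / n
    mean_h = sum(health) / n
    cov = sum((t - mean_t) * (h - mean_h) for t, h in zip(times, health))
    var = sum((t - mean_t) ** 2 for t in times)
    b = cov / var
    a = mean_h - b * mean_t
    return a, b

def estimate_rul(times, health, failure_threshold):
    """Time from the last observation until the trend crosses the threshold."""
    a, b = fit_linear_trend(times, health)
    if b >= 0:  # no degradation observed; RUL is undefined in this sketch
        return None
    t_fail = (failure_threshold - a) / b
    return max(0.0, t_fail - times[-1])

# Example: health degrades from 1.0 toward a failure threshold of 0.2.
times = [0, 1, 2, 3, 4]
health = [1.0, 0.9, 0.8, 0.7, 0.6]
print(estimate_rul(times, health, 0.2))  # about 4.0 more time units
```

In practice, the reviewed studies replace this linear trend with physics-based or data-driven models, but the idea of projecting a health indicator to a threshold is the same.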
Even though Digital Twins have matured in recent years, Digital Twin-based design was not fully explored until the study of [15], who proposed a Digital Twin design pattern catalog. They proposed design patterns for each stage in the product development process and indicated what a Digital Twin is not, such as a Digital Model, a Digital Generator, and a Digital Shadow. In their perception, a Digital Twin contains a causally connected and synchronized digital object and physical object. In addition to the "real" Digital Twin patterns, patterns that do not include two-way synchronization were also adopted in the design pattern catalog. An abstracted catalog of design patterns is listed in Table 1.
There are multiple related literature reviews on Digital Twins, cyber-physical systems, and predictive maintenance. For instance, [16] provides a detailed review of the challenges and opportunities of AI-enabled Prognostics and Health Management (PHM), including predictive maintenance. [17] provide a review of RUL prediction methods. [18][19][20][21] were categorized as literature reviews on smart manufacturing in the industry using Digital Twin technology. [22] performed a review on Digital Twins for RUL prediction on gear degradation specifically. Using the Digital Twin, they identified that RUL prediction is established using physics-based or data-driven models. Studies used gear crack, fatigue, surface scratch, tooth breakage, and permanent deformation for physics-based models, and shallow or deep learning methods for data-driven models. Additionally, [23] reviews RUL prediction for offshore fixed and floating wind turbine power converters using Digital Twin technology. [24][25][26][27][28] were categorized as reviews on Industry 4.0 and techniques such as machine health management or smart manufacturing. In these reviews, the scope is on cyber-physical systems rather than Digital Twin systems. [29,30] provide reviews of augmented reality applications in smart manufacturing, a key technology to facilitate human integration in a manufacturing system. Finally, [3] provided the study most related to ours: a review on Digital Twins for maintenance. However, [3] looked at maintenance from a broad perspective, also analyzing preventive and prescriptive techniques. In summary, related work investigated Digital Twins or predictive maintenance separately and often in a non-systematic way.

Results in relation to related work
Our study's main difference from the related work is that we explicitly adopted an SLR protocol, a widely accepted method in the Evidence-Based Medicine and Software Engineering communities. Additionally, we focused our research on the combined use of Digital Twins and predictive maintenance. Based on the SLR protocol, we searched and identified primary studies from a broad set of over eight hundred studies, from which we selected 42 primary studies. This systematic review methodology differs from [3], which published a review on Digital Twins for maintenance, and from [24][25][26][27][28], which studied cyber-physical systems for health management. Therefore, our systematic review is significantly different and yields new and relevant insights.
The remainder of this paper is organized as follows: Section 3 explains the systematic literature review research methodology. Section 4 presents the results. Section 5 provides the discussion. Section 6 presents the conclusion and future work.

Research methodology
The related work shows no up-to-date overview of predictive maintenance using Digital Twins. As predictive maintenance and Digital Twins are developing rapidly, we aim to provide an overview of the current trends of the techniques used. In addition, we see that most secondary research focuses on a specific application domain or component. This study aims to overview the key features required to build predictive maintenance models that leverage Digital Twins. By doing so, this study can act as an accelerator for future primary studies in this domain. We perform a semi-automated SLR to gather all relevant primary studies.
We constructed a review protocol before conducting the SLR, based on the guidelines from [7,31,32], which describe that a predefined, strictly followed protocol reduces bias among researchers and increases rigor and reproducibility. Fig. 1 shows the adopted review protocol. Section 3.1 provides a table of research questions for the SLR. Sections 3.2 to 3.2.3 describe the SLR's scope, search methods, and search string. Section 3.2.4 describes the inclusion/exclusion criteria for the retrieved literature, and Section 3.3 adds the quality assessment criteria applied to the retrieved literature.

Research questions
We aim to find all relevant information regarding Predictive Maintenance using Digital Twins from a Software Engineering perspective. When building a Digital Twin for Predictive Maintenance, developers should always know (1) its objective and application domain, (2) platform to develop Digital Twins, (3) model representation type to use for developing a Digital Twin, (4) approaches for developing predictive maintenance algorithms, (5) abstraction level of the Digital Twin, (6) Digital Twin design pattern to use, (7) communication protocol to use, (8) Twinning parameters for predictive maintenance and Digital Twins, and (9) challenges to expect. We constructed a table of research questions to which the SLR should provide answers. Research Questions are given in Table 2.

Search strategy
The literature search aims to collect as many related primary studies on predictive maintenance using Digital Twins as possible (i.e., a high recall) while excluding irrelevant studies (i.e., a high precision). A well-developed search strategy is essential to achieve high recall and precision levels. This section elaborates on the review's search strategy: the scope (i.e., publication period and publication venue), search method (e.g., automatic search or manual search), and search string.

Search scope
The SLR scope consists of two dimensions, namely the publication period and publication venues. Concerning the publication period, this SLR includes papers published from January 2002 to August 2021. The year 2002 was taken as the starting point since Digital Twins were introduced in the literature in that year [33]. We conducted the literature search in August 2021, accepting only studies published until that point. From a publication venue perspective, we searched for studies in the following databases: ScienceDirect, Scopus, ACM Digital Library, IEEE Xplore, Wiley, Taylor & Francis Online, and Springer Link. These venues were selected because they cover a large number of papers in the Software Engineering discipline.

Search method
For this systematic study, we used an automated literature search. Automatic search refers to scanning for the search strings in electronic databases. We used an automatic search for each publication venue. We also supported our search with snowballing (i.e., backward snowballing and forward snowballing), which means that we used both the reference list of the selected papers and the citations to the selected papers to find more relevant papers in this SLR research.

Search string
To find relevant articles regarding Predictive Maintenance using Digital Twins, we used the following general search string:

("digital twin" OR "cyber-physical system") AND ("predictive manufacturing" OR "condition monitoring system" OR "prognostics and health management" OR "remaining useful life" OR "predictive maintenance")

We used Digital Twin and predictive maintenance as starting terms. Then, we added Remaining Useful Life, one of the Key Performance Indicators (KPIs) for predictive maintenance. A predictive maintenance system is also often called a condition monitoring system, which executes Prognostics and Health Management. Last, we added cyber-physical system, an umbrella term for physical systems controlled by algorithms. This term is often used in titles that also describe Digital Twins. However, cyber-physical system is a broader term than Digital Twin, resulting in a lower precision. Table 3 lists the results of the search query. A total of 831 papers were found through the automated search. Scopus was the source with the most results (330 studies), and ACM Digital Library the source with the fewest results (40 studies). Since duplicate studies were retrieved from different databases, the total number of papers per stage might not equal the sum of retrieved studies.
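The boolean structure of the search string can be sketched as a small matcher: a record is relevant if it contains at least one Digital Twin term AND at least one predictive maintenance term. Note that this is a simplification (case-insensitive substring matching), not the exact query semantics of the databases used.

```python
# Sketch: applying the review's boolean search string to a title/abstract.
# Case-insensitive substring matching; a simplification of real database
# query semantics (which handle stemming, fields, etc.).

DT_TERMS = ["digital twin", "cyber-physical system"]
PDM_TERMS = [
    "predictive manufacturing",
    "condition monitoring system",
    "prognostics and health management",
    "remaining useful life",
    "predictive maintenance",
]

def matches_search_string(text: str) -> bool:
    t = text.lower()
    return any(term in t for term in DT_TERMS) and any(
        term in t for term in PDM_TERMS
    )

print(matches_search_string(
    "A digital twin framework for predictive maintenance of gearboxes"))  # True
print(matches_search_string("A survey of digital twin platforms"))        # False
```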

Study selection criteria
The terms "Digital Twin" and "Predictive Maintenance" provided many secondary studies. We identified the relevant studies using the study selection criteria provided in Table 4. In this table, IC and EC denote inclusion and exclusion criteria, respectively.
As the selection of primary studies is known to be repetitive, dull, highly error-prone, and the most time-consuming task [32], we leveraged a data-driven tool to assist our selection process. [34] provides a list of all SLR automation tools. We used ASReview, an active learning tool from Utrecht University [35], which uses continuously evolving machine learning algorithms to perform primary study selection efficiently. We selected ASReview because the software is open-source, it runs locally (keeping the data private), its user interface is intuitive, and we had gained experience with it through a hackathon.
To use the ASReview software, we exported all articles to Excel and later imported them into the tool. We selected the default machine learning model, Naïve Bayes, and started selecting primary studies. The system presents the reviewer with articles ranked by probability of inclusion. After each decision, the model is updated and proposes the next article with the highest probability of inclusion. We decided to stop reviewing after reading 20 irrelevant articles in a row.
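The stopping criterion described above can be sketched as a loop over a relevance-ranked list of records. The ranker here is a stand-in (the records are assumed to be pre-ordered), not ASReview's actual Naïve Bayes model, and the oracle function stands in for the human reviewer.

```python
# Sketch of the stopping criterion used with the active learning tool:
# screen the highest-ranked remaining record, and stop after 20
# consecutive irrelevant decisions. The "ranking" is a precomputed
# order here, standing in for a continuously updated model.

def screen(records, is_relevant, patience=20):
    """records: iterable already ordered by predicted relevance.
    is_relevant: oracle standing in for the human reviewer's decision."""
    included, irrelevant_streak, reviewed = [], 0, 0
    for record in records:
        reviewed += 1
        if is_relevant(record):
            included.append(record)
            irrelevant_streak = 0
        else:
            irrelevant_streak += 1
            if irrelevant_streak >= patience:
                break
    return included, reviewed

# Toy example: relevant records (even ids below 10) sit at the front
# of the ranking, followed by a long tail of irrelevant records.
records = list(range(10)) + list(range(100, 160))
included, reviewed = screen(records, lambda r: r % 2 == 0 and r < 10)
print(len(included), reviewed)  # 5 relevant found, 29 records reviewed
```

The benefit over random screening is that the reviewer stops long before reading the full set, as the paper reports (46.86% of all retrieved studies).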
The selection criteria were applied by reading the title and abstract, which reduced the number of included studies to 144. The second column of Table 3 lists the number of articles that passed the selection criteria. Finally, we retrieved and read the entire study and applied the selection criteria, reducing the number of papers to 42.

Study quality assessment
In addition to the exclusion criteria, the quality of the included literature was assessed as well. Quality criteria have been derived to determine if factors could bias study results. Table 5 shows the quality criteria. While developing the quality assessment criteria, the summary quality checklist for quantitative and qualitative studies has been adopted, as proposed by [7] and [36]. We chose the study quality assessment criteria based on their impact on the quality of this SLR.
While reading each study's full text, each of the eight assessment criteria was scored on a scale from 0 to 1. As [37] described in their study, a full point (1) should be awarded for Q1 if the study's goal was specified in the introduction (the expected place), and no point (0) should be awarded if the study's intent was not mentioned in the report. A half-point (0.5) should be given if the objective was stated vaguely or not in the expected location. Studies with a grade lower than 4 out of 8 were excluded. As a result, only studies with a higher grade were kept, to include only high-quality input for our study. Fig. 2 shows that three papers were excluded after quality assessment. Therefore, the number of included articles was reduced to 42 (Table 3, fourth column).

Table 4
Study inclusion and exclusion criteria.

IC1: The study must have full text (e.g., abstract-only papers are not considered).
IC2: The study must be a primary study.
IC3: The study must be related to predictive maintenance using Digital Twins.
EC1: The study is a duplicate publication.
EC2: The study is published before 2002.
EC3: The study is published in a language other than English.
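The quality grading described in the quality assessment section (scores of 0, 0.5, or 1 per criterion, with a threshold of 4 out of 8) can be sketched as a small helper. The example score vectors below are illustrative, not taken from any assessed study.

```python
# Sketch of the quality assessment aggregation: each of the eight
# criteria is scored 0, 0.5, or 1, and a study is kept only if its
# total grade is at least 4 out of 8. Scores below are illustrative.

def passes_quality_assessment(scores, threshold=4.0, n_criteria=8):
    assert len(scores) == n_criteria
    assert all(s in (0, 0.5, 1) for s in scores)
    return sum(scores) >= threshold

study_a = [1, 1, 0.5, 1, 0.5, 0, 1, 1]        # total 6.0 -> kept
study_b = [0.5, 0.5, 0, 0.5, 0.5, 0, 1, 0.5]  # total 3.5 -> excluded
print(passes_quality_assessment(study_a), passes_quality_assessment(study_b))
```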

Data extraction
We leveraged a data extraction form as a structured method to extract the essential data from the 42 primary studies. An extraction form with essential features was first created using the research questions from Table 2. After performing several pilot extractions, the data extraction form was iteratively updated and refined while reading more papers. The final data extraction form is shown in Table 6. In addition to the ten data extraction elements that answer the research questions, this form contains general metadata, including the year of publication and title. The table shows the ten research question elements as R1-R10. This data form was implemented in MS Excel and can be downloaded from the attachments.

Data synthesis
We performed a data analysis on the extracted data form. Many papers use synonyms for key terms; therefore, we first standardized these terms. We created several bar charts, pie charts, and heatmaps using Python's pandas and seaborn libraries. These charts provide insights into data patterns. The full dataset has been made available on [115].
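The term standardization step can be sketched as a synonym dictionary applied before counting. The mapping below is illustrative; the authors' actual dictionary is part of the published dataset.

```python
# Sketch of the term standardization step: before charting, synonyms
# found across papers are mapped onto one canonical term. The mapping
# below is illustrative, not the authors' actual dictionary.
from collections import Counter

SYNONYMS = {
    "rul": "remaining useful life",
    "remaining service life": "remaining useful life",
    "phm": "prognostics and health management",
    "digital twins": "digital twin",
}

def standardize(terms):
    return [SYNONYMS.get(t.lower(), t.lower()) for t in terms]

extracted = ["RUL", "Digital Twins", "remaining useful life", "PHM"]
print(Counter(standardize(extracted)))
```

Counting the standardized terms (e.g., with `collections.Counter` or a pandas `value_counts`) then yields the frequencies shown in the bar charts.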

Results
The results section first describes the main statistics of the 42 selected primary studies. Afterward, the results corresponding to each research question are presented.

Active learning process
As discussed in the methodology section, we stopped primary study selection based on the abstract after the ASReview software proposed 20 irrelevant studies in a row, as we assumed all relevant articles had been included at that point. In our case, we stopped the primary selection process after reading 46.86% of all retrieved studies. Fig. 3 shows the percentage of proposed papers that were accepted. In the beginning, there were many included studies, and the number of reviewed papers was low; therefore, the percentage of accepted proposed studies is high. However, as there are more irrelevant studies than relevant studies, at some point the number of reviewed studies is high while few additional papers are accepted. Therefore, the fraction of proposed papers accepted in Fig. 3 decreases. Fig. 4 shows the percentage of relevant studies found versus the studies reviewed. The gray dotted line shows the general case of a manual search with randomly selected studies for reviewing. The blue line shows the progression of the active-learning-assisted SLR. After reading 46.86% of all studies, we assumed all relevant studies had been found.
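The recall-versus-reviewed curve of Fig. 4 can be sketched as follows: at each screening step, record the fraction of all relevant studies found so far. Under random screening this curve roughly follows the diagonal; an effective active learner rises much faster. The record ids below are illustrative.

```python
# Sketch of the recall-vs-reviewed curve (cf. Fig. 4): at each step,
# the fraction of all relevant studies found so far is recorded.

def recall_curve(review_order, relevant):
    """review_order: record ids in screening order; relevant: set of ids."""
    found, curve = 0, []
    for rid in review_order:
        if rid in relevant:
            found += 1
        curve.append(found / len(relevant))
    return curve

relevant = {0, 1, 2, 3}
# Active learning surfaces all relevant records early in the ranking,
# so full recall is reached after reviewing only half of the records.
curve = recall_curve([0, 1, 2, 3, 4, 5, 6, 7], relevant)
print(curve)
```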

Q1: Are the aims of the study clearly stated?
Q2: Are the study's scope, context, and experimental design clearly defined?
Q3: Are the variables in the study likely to be valid and reliable?
Q4: Is the research process documented adequately?
Q5: Are all the study questions answered?
Q6: Are the negative findings presented?
Q7: Are the main findings assessed regarding credibility, validity, and reliability?
Q8: Do the conclusions relate to the aim and purpose of the study? Are they reliable?

Fig. 2. Histogram of the quality assessment grades.

Table 6
The data extraction form.
No. Extraction Element
1 ID
2 Title
3 Passed inclusion criteria
4 Date of extraction
5 Year of publication
6 Authors
7 Repository of extraction
8 Publication

Table 7 lists the 42 included primary studies. [38] were the first to introduce Prognostics and Health Management using Digital Twins, with an application on wind turbines' gearboxes. Fig. 5 shows that the key authors in this field are P. Aivaliotis, A.R. Nejad, Z. Liu, and K. Georgoulias. We see that P. Aivaliotis and K. Georgoulias often collaborated in their studies. Fig. 6 shows a sunburst diagram of the distribution of studies per search database. The inner ring shows the ratio of studies per database, while the outer ring shows the distribution of studies that have also been found on Scopus. For example, of the 17 studies found on ScienceDirect, 15 have also been found on Scopus.

Main statistics
ScienceDirect and Springer were the most popular databases for primary studies, with 17 and 4 directly found studies, respectively. Table 8 shows that the publication channel with the most publications on predictive maintenance using Digital Twins is the open-access IFAC-PapersOnLine, with three primary studies.
When looking at the study type, 40.5 percent were conference papers, and 59.5 percent were categorized as journal articles, indicating that most studies in this systematic review are peer-reviewed.

RQ1: What is the objective of predictive maintenance using digital twins?

Fig. 7 shows the objectives of predictive maintenance using Digital Twins. Nearly all studies acknowledge that the aim is to estimate, predict, or detect the condition of a component, system, or system-of-systems for maintenance more effectively using Digital Twins. Here, the output to predict differs. These outputs can be categorized into classification and regression tasks. A classification task indicates categorical targets, such as machine states. Regression tasks indicate continuous targets, such as the time until a machine fails.
Seven studies aimed to design a framework [43,45,53,64,73,78] or reference model [74] for predictive maintenance using Digital Twins.

Insights into application domains reveal in which domains predictive maintenance using Digital Twins has matured and in which domains the topic is still in its infancy. The majority of included studies applied their algorithm or framework to an application domain. The application domains of the included studies are shown in Fig. 8. The figure depicts that Manufacturing and Energy are the main application domains.
The Manufacturing domain was the most frequently included application domain, with 25 studies. Out of these studies, eight focus on computer numerical control (CNC) machines, five focus on Industrial Robots, and two focus on Semiconductor Manufacturing.
Within the Energy domain, most research is focused on Renewable Energy sources (n = 4), such as Wind Turbines. The other studies focus on Fossil Energy (n = 3), Energy Grids (n = 2), and Nuclear Energy (n = 1).
Nine studies apply their study in the Aerospace domain, which can be related to the fact that NASA provides several datasets in this field, such as the Intelligent Maintenance Systems Bearing dataset [80] and the Turbofan Engine Degradation Simulation dataset [81]. Using these datasets eliminates the need to build a custom Digital Twin, as the data has already been simulated, accelerating and standardizing the research process.
We must note that even though there is a large variety in the application domains, many papers across all application domains discuss the application of Predictive Maintenance on components with rapid wear, such as rolling bearings and gearboxes.

RQ3: Which digital twin platforms are used to develop digital twins for predictive maintenance?

Fig. 9 shows the count plot of Digital Twin platforms. Digital Twin platforms are often IoT-enabled software tools that host a Digital Twin model and its modules. Four studies by the same authors have used OpenModelica [52,67,68,79]. OpenModelica is an open-source modeling and simulation environment based on the Modelica programming language [82]. It is often used by both industry experts and academics. [41,47,78] used MathWorks Simulink to design and analyze models that can act as Digital Twins, for predictive maintenance of pumping equipment, a drilling machine, and a transmission system. [59,70] leveraged a drivetrain Multibody System Simulation from Simpack to generate additional input data on top of data from the physical object.
The figure shows that eight other studies used another platform for building a Digital Twin. Some papers consider the predictive maintenance module part of the Digital Twin, while others only consider the modeling and simulation module as the Digital Twin. We listed the modeling platforms for building Digital Twins as follows:

[21,83] state several approaches to detect production failures. These approaches can be model-based or data-driven. Model-based approaches represent the physical asset through mathematical or physical equations. [83] provides a non-exhaustive list of model representation types: the geometric, physical, behavior, and collaborative models are descriptive models, while the decision-making model is an intelligent data-driven approach [83]. To our knowledge, we can add another representation type, called a hybrid model, which combines two or more models, such as a physical and a data-driven model. Due to Industry 4.0 and the increasing use of IoT, huge amounts of historical machine data are often available. Data-driven models can find hidden patterns in this data and use these patterns to model a physical asset. [3] describes that Digital Twins using a model-based approach produce the most accurate synthetic data. However, they are also costly, as these models take more time to develop and require expert domain knowledge. On the other hand, decision-making models, such as machine learning algorithms, greatly speed up the time to market of Digital Twins at a slightly decreased accuracy. Fig. 10 shows the number of studies that used each Digital Twin model representation type. Note that one study may use multiple model representation types for a Digital Twin; if multiple models were used, we also counted one Hybrid-approach Model. Looking at Fig. 10, we see that, in total, three studies used such a Hybrid approach [38,40,44].
For instance, [38] explains, "a high-fidelity digital mirror model for the equipment is built in different levels of geometry, physics, behavior and rule" [38]. Rule-based and data-driven model representation types were categorized as decision-making model representation types.
In addition to [38], one study used a geometric [65] representation, and one study used a behavior [58] model representation. [40,44] also followed a hybrid approach.

This section discusses the approaches applied for predictive maintenance using Digital Twins. We categorized the extracted approaches into five categories, including (1) Machine Learning approaches and (5) Mathematical approaches. In the following subsections, we discuss each of these approaches. For each approach, we plotted a heat map against the Digital Twin model representation types to show to what type of Digital Twin the approach is applied.

Fig. 11 shows the Machine Learning approaches used for predictive maintenance using Digital Twins. Four studies implemented a Support Vector Machine (SVM) [57,58,63,77]. An SVM is an algorithm that learns a hyperplane through linear algebra to perform classifications [84]. [57] used a polynomial-kernel SVM for binary classification to predict oil pipeline risk. [58] used an SVM for multi-class classification of seven states of a rolling bearing. [77] used an ensemble of 276 binary SVM classifiers for 24-class health state classification of a Control Element Drive machine. [44] used an ensemble of a regression variant of the SVM, the Support Vector Regressor (SVR), linear regression, decision tree regression, and random forest regression for RUL prediction of a cutting tool. Decision trees perform classification or regression using a binary tree algorithm. Each internal node represents a single numeric input variable and a binary split point on that variable, and the tree's leaf nodes contain the output values used for predictions [84]. Random forest builds on decision trees, as it trains an ensemble of multiple 'weak' decision trees. These weak decision trees are trained on randomly sampled sets of features to counter the greedy behavior of decision trees. Specifically, [44] aimed to predict the wear of the cutting tool due to cutting force.
The ensemble was used as a data-driven approach in hybrid with a model-driven (i.e., physics-based) approach.
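The binary split point at the heart of a decision tree can be illustrated with a one-level tree (a decision stump) for a single numeric input. This is purely illustrative; it is not the actual model of the cited studies, and the toy data (cutting force vs. tool wear) is invented.

```python
# Sketch: a one-level decision tree (stump) on a single numeric input,
# illustrating the binary split point described in the text. It picks
# the threshold minimizing squared error and predicts a constant per
# side. Purely illustrative.

def mean(xs):
    return sum(xs) / len(xs)

def fit_stump(x, y):
    best = None
    for threshold in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        if not left or not right:
            continue  # skip splits that leave one side empty
        ml, mr = mean(left), mean(right)
        sse = sum((yi - ml) ** 2 for yi in left) + \
              sum((yi - mr) ** 2 for yi in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, ml, mr)
    _, threshold, ml, mr = best
    return lambda xi: ml if xi <= threshold else mr

# Toy data: tool wear jumps once cutting force exceeds 2.
x = [1, 2, 3, 4]
y = [0.1, 0.1, 0.9, 1.1]
predict = fit_stump(x, y)
print(predict(1.5), predict(3.5))  # approximately 0.1 and 1.0
```

A full decision tree applies this split search recursively over multiple variables, and a random forest averages many such trees trained on randomly sampled features.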

Machine learning approaches
[45] used a decision tree as an explainable model for root-cause analysis to find causes that lead to the deterioration of silicon wafers' polishing velocity in the semiconductor industry. [50,77] used k-means clustering. K-means clustering is an unsupervised learning algorithm that categorizes the samples into k clusters [84]. [50] used the k-means algorithm to cluster assets with similar Health Indicators, while [77] used the k-means algorithm combined with Principal Component Analysis (PCA) to find natural clusters in coil signal data and assign a Health Indicator to each coil profile. PCA is an unsupervised learning method to reduce the dimensions of the feature space [77]. [38] employed an extreme learning machine (ELM), a feedforward neural network with a single hidden layer that offers fast learning speed and good generalization performance, to predict the fault cause of a gearbox (i.e., tooth wear, fatigue, or breakage). [64] used a self-organizing map (SOM) for fault diagnosis and damage level prediction through time-frequency features from the vibration signal. A SOM is an unsupervised machine learning model for pattern recognition, typically used for dimensionality reduction. Fig. 12 shows the Deep Learning algorithms used for Digital Twin representation or predictive analytics. Deep learning is a subdomain of Machine Learning, inspired by artificial neural networks. These layered networks are often stacked; we then call them deep neural networks [85]. [39,42,51,62,71] used an autoencoder algorithm to represent a Digital Twin. The autoencoder is a neural network type that learns a compressed representation of raw data. The autoencoder is composed of an encoder and a decoder. The encoder compresses the input while the decoder attempts to recreate the input from the encoder's compressed representation. The encoder model is saved after training, whereas the decoder is discarded.
The encoder may then be used as a data preparation approach to extract features from raw data so that a new machine learning model can be trained [86].
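The encoder-as-feature-extractor idea can be sketched with a minimal linear autoencoder. As an illustrative assumption, we use scikit-learn's MLPRegressor trained to reconstruct its own input as a stand-in for a dedicated deep-learning framework, with a 2-unit bottleneck and synthetic data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic 5-D "sensor" data that really lives on a 2-D manifold (illustrative).
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + rng.normal(0, 0.01, size=(300, 5))

# Train the network to reconstruct its own input (fit(X, X)): an autoencoder
# with a single 2-unit bottleneck. With a linear activation this reduces to
# a PCA-like linear autoencoder, which keeps the sketch simple.
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                  solver="lbfgs", max_iter=2000, random_state=0)
ae.fit(X, X)

def encode(X_new):
    """Keep only the encoder half: input -> bottleneck activations."""
    return X_new @ ae.coefs_[0] + ae.intercepts_[0]

features = encode(X)   # compressed 2-D representation of the raw data
print(features.shape)  # (300, 2)
```

A downstream predictive model (e.g., an RUL regressor) would then be trained on `features` rather than on the raw sensor data.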

Deep learning approaches
[71] employed a Long Short-Term Memory (LSTM)-based encoder-decoder to rebuild the input time series, preserve the information in the healthy data, identify anomalies, reconstruct the error, and generate an unsupervised Health Indicator for reliable RUL prediction. The LSTM algorithm is a type of Recurrent Neural Network (RNN) designed for time series data where the order of data samples is important. LSTMs are designed to store long and short-term temporal information. LSTMs are generally uni-directional but may also be bi-directional (bi-LSTM) [87]. The LSTM can be used as an encoder-decoder, where the encoder part reduces the feature space and the decoder part up-samples the feature space, which reduces overall noise. [39] used a feedforward neural network as an autoencoder. To train the autoencoder to discover the key features, it is trained to reconstruct the input from a corrupted input: the so-called denoising autoencoder. They stacked multiple denoising autoencoders into a Stacked Denoising Autoencoder (SDA) to predict the RUL of a proton exchange membrane fuel cell. Each Denoising Autoencoder's output provides the input for the next Denoising Autoencoder. On top of the SDA, a logistic regression layer is used to predict the RUL. [42] and [62] used a Stacked Sparse Autoencoder (SSA) with a Softmax activation function to classify a machine's health condition. As [42] describes: "Sparse autoencoder (SAE) imposes the sparsity constraint on AE to make most of the hidden units be inactive" [42]. Several studies used Convolutional Neural Networks (CNNs), one of the most common DL architectures. CNNs use a set of convolutional and pooling operations to extract topological features of input data. Afterward, a set of fully-connected layers is used for classification. CNNs are mostly used in computer vision applications [88]. As mentioned above, [51] built a VAE using convolutional layers.
[66] used the data from three sensors, (1) normal load, (2) frictional torque, and (3) temperature, to develop a data-driven Digital Twin based on a CNN inspired by Google's WaveNet architecture. The CNN made use of dilated causal convolutions. After testing, the CNN-based Digital Twin estimated the RUL of a four-ball tester with 95% accuracy. [48] used a hybrid CNN and Bi-LSTM architecture for multi-class classification of a Health Indicator in the C-MAPSS dataset. [40] used a CNN to estimate the Modulation Transfer Function (MTF), a key performance evaluation indicator used in the design stage of the electro-optical system. [64] used an Auto-Associative Neural Network (AANN). An AANN comprises a five-layer feedforward network that can be divided into two three-layer neural networks connected in series, similar to an autoencoder architecture. In the network, an input layer is followed by a hidden layer and a bottleneck layer. The bottleneck layer compresses the input data and topology while allowing effective feature extraction. The bottleneck layer is followed by a second non-linear hidden layer and the output layer of the second network. [64] proposed an AANN methodology to model the vibration and speed relationship between sensors near the four wheels of a high-speed train; a change in this relationship due to a dent or corrosion on the track generates a residual signal between the neural network input and output data. The residual signal is then used as an indicator to locate irregularities on the railway track.
[48,61,63,71] used an LSTM architecture. As mentioned above, [71] used an LSTM encoder-decoder (LSTM-ED) as an autoencoder, and [48] used a hybrid CNN and Bi-LSTM architecture for multi-class classification. [63] used a Bi-LSTM architecture as the feature extractor in the Domain Adversarial Neural Network (DANN) regime. [61] used data-driven and deep learning technology to develop a Digital Twin from sensor data and historical operation data of equipment, realizing reliable simulation data mapping through intelligent sensing and data mining; they call this an implicit Digital Twin (IDT). In the IDT, the aero-engine sensor data samples from the C-MAPSS dataset are fused to obtain a Health Indicator, which is combined with the LSTM method to predict an aero engine's life.
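The LSTM gating mechanism underlying these architectures can be sketched from scratch in NumPy. This is a generic single-cell forward pass following the textbook equations, not the implementation of any of the cited studies; the weights and toy input series are random placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: gates decide what to forget, store, and output.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    stacked in the order [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])        # input gate: how much new info to store
    f = sigmoid(z[H:2*H])      # forget gate: how much old memory to keep
    g = np.tanh(z[2*H:3*H])    # candidate memory content
    o = sigmoid(z[3*H:4*H])    # output gate: how much memory to expose
    c = f * c_prev + i * g     # long-term cell state
    h = o * np.tanh(c)         # short-term hidden state (the output)
    return h, c

# Run a toy 1-D time series through a 3-unit LSTM cell with random weights.
rng = np.random.default_rng(0)
D, H = 1, 3
W = rng.normal(0, 0.5, (4 * H, D))
U = rng.normal(0, 0.5, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in np.sin(np.linspace(0, 3, 20)):
    h, c = lstm_cell_step(np.array([x_t]), h, c, W, U, b)
print(h.shape)  # (3,)
```

The cell state `c` is what lets the network carry long-term temporal information across many time steps, which is why LSTMs suit degradation time series.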
[43] used a Spiking Neural Network (SNN) to detect anomalies in a Smart Energy Grid, which receives data from a sensor in a specific area. The SNN considers temporal information, which fits their time series forecasting approach while exhibiting low computational needs.
[56] employed a Bi-Directional Gated Recurrent Unit (Bi-GRU) to construct a monitoring model to predict the tool wear condition based on extracted local features. A GRU's behavior is similar to the LSTM; however, it does not contain a memory unit.
Next to a VAE for an explicit Digital Twin, [51] constructed a Generative Adversarial Network (GAN) in the implicit deep generative network setting. The GAN training process is cast as a two-player non-cooperative game where each player has control over its own parameters (i.e., the weights of the network that constitutes either the Generator (G) or the Discriminator (D)). One significant advantage of the GAN approach is that it does not require minimizing a Mean Squared Error (MSE) loss, unlike most regression and system identification methods. Instead, the D approximates the density ratio between the observed and generated data, and the G minimizes some divergence. These training dynamics allow the GAN to become sensitive to even low magnitude deviations from the manifold. As with the VAE, the GAN algorithm also predicts a Health Indicator.
[72] leveraged a multilayer perceptron (MLP) in a machine learning platform called OnPoint Cortex. The MLP contains a deep set of stacked linear neurons to perform a supervised learning task. With the MLP, [72] could accurately predict the risk of failure in the C-MAPSS dataset.
In the previous sections, we have discussed machine learning methods. For these methods, evaluation metrics have been defined. Fig. 13 shows the model validation metrics that have been used for digital twin models or predictive algorithms.
First, eleven studies used more than one key metric. [48] used overall accuracy, Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), micro- and macro-averaged precision and recall, and R² to evaluate the performance of three models. [57] used the accuracy, precision, recall, and F1-measure metrics to compare two SVM models with different kernels. [51] used MSE through a loss function to optimize the VAE. They also used the Receiver Operating Characteristic (ROC) to compare the VAE and GAN Digital Twin models. [61] used the RMSE, mean absolute percentage error (MAPE), and R² metrics to find the optimal training and testing data distribution for RUL estimation. [39] used precision and relative accuracy to evaluate a model's ability to predict the RUL. [77] used accuracy, recall, and precision to show the results of the SVM algorithm for anomaly detection. [71] used MAE, R², model loss, and accuracy to show the performance of their LSTM-ED model on RUL estimation. [62] used MAE, R², and accuracy to compare the performance of four models. [56] used RMSE and MAE to quantitatively demonstrate the improvement of their wear estimation model. [63] used Cumulative Relative Accuracy (CRA), RMSE, and MAE to calculate the error between the actual and predicted RUL of several Deep Learning models. [66] used accuracy, MAE, and the Pearson R correlation coefficient to compare various CNN architectures. [44] used the Pearson R correlation coefficient for feature selection and RMSE to find the optimal machine learning algorithm.
Second, seven studies used a single key metric to compare algorithms. [67,79] used synchronous tuning of the simulation models, keeping the precision of the Digital Twin above 95%. [40] used MSE to compare the validation results of MTF between a nonparametric Bayesian Network and a CNN. [38,42,54,72] used accuracy to compare several predictive analytics algorithms.
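The most common regression metrics above reduce to a few lines of NumPy. A minimal sketch with a toy RUL example; the numbers are purely illustrative:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors quadratically."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean Absolute Error: average error magnitude, robust to outliers."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: variance explained by the model."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy example: true vs. predicted remaining useful life, in cycles.
y_true = np.array([100.0, 80.0, 60.0, 40.0, 20.0])
y_pred = np.array([ 95.0, 85.0, 55.0, 45.0, 15.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
# → 5.0 5.0 0.96875
```

Because RMSE squares the residuals while MAE does not, the two coincide here only because every error has the same magnitude (5 cycles).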

Model optimization methods
In AI-based solutions, models can be optimized in several ways. [39,44,58,71] used cross-validation (CV) to evaluate a predictive maintenance model. [58] used data from a bearing test rig to generate data for training and testing using cross-validation. [71] used 5-fold cross-validation to find the model that performs best on RUL prediction. [39] used cross-validation to find the optimal parameters of their SDA. [44] used a grid search algorithm, a variation on CV. With grid search, a parameter grid is provided, and all possible combinations are evaluated in a CV setting. As a result, the parameter setting with the best results is returned.
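Grid search over a cross-validation setting can be sketched with scikit-learn's GridSearchCV. The synthetic degradation-like signal and the parameter grid below are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(150, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=150)  # synthetic target signal

# Every combination in the grid is scored with 5-fold cross-validation;
# the best-scoring parameter set is then refit on all the data.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1, 1.0]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

Grid search is exhaustive and therefore expensive; random search or Bayesian optimization explore the same space with fewer model fits.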

Statistical approaches
Several statistical approaches have been used in the predictive analytics phase, as shown in Fig. 14. We have identified studies that used probability distribution functions, Kalman and particle filters, Monte Carlo methods, and Principal Component Analysis (PCA).
[72] used a kernel density probability distribution function to establish the likelihood of failure based on past failures and the time since the last repair. [39,40,44,70,75] used a Particle filter or Kalman filter, also known as a Linear Quadratic Estimation (LQE). Both are statistical approaches for noise reduction. Only linear systems can leverage a Kalman filter [89]. If a non-linear system is encountered, and the goal is to estimate system states, a non-linear state estimator is required. The working principles behind particle filters are similar to those of unscented Kalman filters, but particle filters can approximate any distribution. To do so, particle filters require a larger set of points, referred to as particles. However, the particle filter is computationally more expensive than a Kalman filter [89]. [70] used the Kalman filter to estimate the states (i.e., the noise-free response), where the output provides sufficient inputs for the designed load observers. [75] used the Kalman filter for wave load identification.
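A minimal scalar Kalman filter illustrates the predict/update cycle behind this noise reduction. The constant-signal model and the noise variances are illustrative assumptions, not values from any cited study:

```python
import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.04, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (near-)constant signal observed with noise.

    q: process noise variance, r: measurement noise variance (assumed known).
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q            # predict: uncertainty grows between measurements
        k = p / (p + r)      # Kalman gain: how much to trust the new reading
        x = x + k * (z - x)  # update the state with the innovation
        p = (1 - k) * p      # updated (reduced) uncertainty
        estimates.append(x)
    return np.array(estimates)

# Noisy readings of a constant true value of 1.0 (e.g., a drifting sensor offset).
rng = np.random.default_rng(0)
z = 1.0 + rng.normal(0, 0.2, size=200)
x_hat = kalman_1d(z)
print(x_hat[-1])  # converges close to 1.0
```

A particle filter replaces the single Gaussian state (x, p) with a cloud of weighted samples, which is why it can track non-linear, non-Gaussian systems at a higher computational cost.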
[44] used a particle filter algorithm that includes two processes: prediction and updating. In the prediction process, Monte Carlo sampling is used to estimate the next state according to the prior probability density of the system, and the updating process uses the measured data to modify the prediction results. [39] employed a Particle filter approach using an exponential empirical model as a comparison against their SDA model. [40] used a Gaussian Particle Filter (GPF), which uses Gaussian density estimation as a continuous approximation instead of a discrete approximation to resample the particles. [44,51,65,70,75] applied Monte Carlo methods, a broad class of computational algorithms relying on repeated random sampling to obtain numerical results. [65] used the Monte Carlo method to numerically solve a posterior distribution in a finite element model. [51] used Monte Carlo sampling to numerically solve an integral equation of the VAE to estimate Health Indicators. Additionally, [70] employed crude Monte Carlo simulation to mitigate the influence of uncertainty in the input measurements on the estimation error. [75] used a Markov Chain Monte Carlo Method based Finite Element Model to model non-linear system behavior. [42,51,57,77] used the PCA algorithm. [42,51] used PCA to reduce the dimensionality of the learned features to a plottable 2D or 3D space. [57] applied PCA to estimate the contribution rate of each feature for feature selection. [77] employed PCA to perform dimensionality reduction, which makes it possible to train machine learning models in a feature space that would otherwise be too large.
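Monte Carlo estimation by repeated random sampling can be sketched with a toy stress-strength reliability example. The distributions, units, and threshold below are illustrative assumptions, not parameters from any cited study:

```python
import numpy as np

rng = np.random.default_rng(42)

# Monte Carlo sketch: estimate the probability that a random stress exceeds
# a random strength, by drawing many samples and counting failures.
n = 200_000
stress = rng.normal(loc=50.0, scale=5.0, size=n)    # illustrative units: MPa
strength = rng.normal(loc=65.0, scale=5.0, size=n)
p_fail = np.mean(stress > strength)

# Analytic check: stress - strength ~ N(-15, sqrt(50)), so
# P(failure) = P(N(0,1) > 15/sqrt(50)) ≈ 0.017.
print(p_fail)
```

The same sample-and-count principle underlies crude Monte Carlo uncertainty propagation and, with weighted resampling, the prediction step of a particle filter.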

Mathematical approaches
Several studies have applied mathematical approaches throughout the Digital Twin modeling and predictive maintenance, as shown in Fig. 15. Exponential Degradation Model, n-Degrees of Freedom model, Finite Element Method, and Johnson-Cook model approaches have been applied.
Some studies used an Exponential Degradation Model (EDM), a model that follows the degradation trend based on set parameters. Using the EDM, a failure of interest can be detected. As the EDM is parameterized, it is flexible and scalable to multiple machines. In particular, [41] employed stochastic parameters. [51,59,69,70] used an n-Degrees of Freedom (DOF) torsional vibration model. An n-DOF system is a system that requires n coordinates to describe its equation of motion. There are n natural frequencies in an n-DOF system, and each natural frequency corresponds to a natural state of vibration with a displacement configuration known as the normal mode. Eigenvalues and eigenvectors are the mathematical terminology for these quantities [90]. [59] employed a 14-DOF model as the drivetrain Digital Twin model for monitoring the remaining useful lifespan of the gears in the different gearbox gear stages owing to contact fatigue stress. [70] proposed a 3-DOF model as a compromise between complexity and accuracy in their application, providing a rough RUL estimate using a method implementable by a turbine's onboard automation system. [51] employed a 3-DOF torsional model as the drivetrain Digital Twin for monitoring the residual life of main and high-speed shafts. [69] employed a 5-DOF modeling approach; notably, the increased fidelity led to less accurate simulations than a 1-DOF modeling approach in this case. [38,49,54,65,75] used the Finite Element Method (FEM), a method to numerically solve differential equations by dividing a modeled object into smaller connected elements. [49] employed a FEM simulation to gather information on geometry (i.e., CAD models), materials, and process data. [38] used a FEM to simulate a wind turbine's blade deformation, gear tooth stress, and bearing temperature. [65] acquired material property parameters, such as Young's modulus, Poisson's ratio, and the constitutive model, from a laboratory.
The material property information was fed into the digital twin FEM to perform a non-linear analysis. [75] generated a Digital Twin of offshore structures using typical FEM input parameters that influence the modal parameters: mass, center of gravity, element stiffness, Local Joint Stiffness, soil stiffness, and damping properties. [54] generated a Digital Twin of a knuckle boom crane through real-time FEM simulation, where the estimated payload weight is used as an input.
[44] leveraged a Johnson-Cook plasticity model to accurately describe and simulate the relationship between stress, strain, and temperature during the milling process.
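The eigenvalue view of an n-DOF system can be sketched for a minimal 2-DOF spring-mass chain; the unit masses and stiffnesses are illustrative, not parameters of any cited drivetrain model:

```python
import numpy as np

# 2-DOF spring-mass chain (wall - k - m - k - m - k - wall), with unit masses
# and unit spring stiffness: a minimal stand-in for the n-DOF torsional
# vibration models discussed above.
M = np.eye(2)                    # mass matrix
K = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])     # stiffness matrix

# Natural frequencies come from the generalized eigenproblem K v = w^2 M v:
# the eigenvalues are the squared natural frequencies, and the eigenvectors
# are the corresponding normal modes.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(M) @ K)
omega = np.sort(np.sqrt(eigvals.real))
print(omega)  # [1.0, sqrt(3)] for this system
```

The lower mode (both masses moving in phase) has frequency 1, the higher mode (masses moving in opposition) has frequency √3; an n-DOF torsional model yields n such mode/frequency pairs.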

RQ6: Which abstraction levels of digital twins are used for predictive maintenance?
The abstraction level of the Digital Twin describes the level of detail of the Digital Twin and the level of precision of the predictive maintenance module. For example, a component-level Digital Twin may provide more specific maintenance predictions than a system-level Digital Twin. Fig. 16 shows the number of studies that developed Digital Twins on a component, system, or system-of-systems level. Nineteen studies discussed the use of Digital Twins on a component level, for instance, bearings [47,51,52,58,63,66,68,79] and gearboxes [38,51,52,59,67,68,79].
System-level Digital Twins aim to perform predictive maintenance on a full machine, such as high-speed railway transportation systems [53,64], drivetrains [64,69,70], boom cranes [54], or CNC machines [44,55,56]. Seventeen studies have focused on developing Digital Twins on this abstraction level. [42] published the only study discussing using a system-of-systems Digital Twin for predictive maintenance of a whole shop floor.

RQ7: Which digital twin design patterns are applied for predictive maintenance?
Fig. 17 shows which design patterns of Digital Twins have been applied for predictive maintenance. The Digital Twin design pattern count shows how many studies discuss that pattern. However, a Digital Control pattern also includes a Digital Monitoring pattern, and a Digital Monitoring pattern also includes a Digital Shadow/Model pattern. In these cases, we have counted only the highest-maturity pattern of the Digital Twin. Therefore, a study discussing the Digital Control pattern is counted under the Digital Control pattern.
We see that just three patterns have been applied to this day. First, the Digital Shadow pattern has been applied nine times, as researchers use NASA's Bearing and Turbofan Engine Degradation simulation datasets as Digital Twins [48,51,58-61,63,71,72]. These datasets contain simulation data of bearings and turbofan engines with sensorial features and an RUL for each time step.
[49] and [45] are the only studies that discuss the Digital Control pattern. In addition to an RUL estimation model, [49] also developed a Decision Support System (DSS) to provide rule-based and concrete solution proposals. The DSS would incorporate maintenance instructions for production workers and engineers, options to optimize costs for controlling, and recommendations for optimized product or machine development in the future. Unfortunately, the development and implementation of the DSS were not included in the study.
[45] discuss using a Prognostics and Healing Module as part of a smart controller framework. This Prognostics and Healing Module uses Digital Twin simulation data, Big Data, Root-Cause Analysis, preprocessing, and Machine Learning, and provides rule-based decision-making to determine the type of intervention.
Thirty studies make use of the Digital Monitor pattern [38-44, 46, 47, 52-57, 62, 64-70, 73-79]. Using a Digital Twin, these studies monitor the RUL, a machine's condition, or a degrading health indicator, or perform fault diagnosis, fault analysis, and anomaly detection. During the monitoring, they provide predictive information about the physical object while leaving the decision-making to the machine operators.

RQ8: Which communication protocols are used for digital twins for predictive maintenance?
The communication protocols describe the data exchange methods between the physical and digital assets. Fig. 18 shows the communication protocols used to transmit information between the physical and digital objects.
Message Queuing Telemetry Transport (MQTT) is the standard protocol for IoT data exchange. MQTT uses a TCP/IP stack and a publish/subscribe architecture [91]. The MQTT protocol uses two network entity types: a broker and numerous clients. An MQTT broker is a server that receives all messages from clients and routes them to the clients that are supposed to receive them. Clients can publish messages on a certain topic, which the broker distributes to the clients subscribed to that topic. Two studies leveraged the MQTT protocol [76,79]. [76] used the MQTT protocol with the open-source RabbitMQ message broker, while [79] used the ORION Context Broker. Additionally, [79] used JSON-MQTT to send JSON payloads over MQTT.
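The broker/client routing just described can be illustrated with a toy in-memory publish/subscribe sketch. This mimics only the pattern, not the MQTT wire protocol (no QoS levels, retained messages, or topic wildcards); the topic names and payloads are made up:

```python
from collections import defaultdict

class ToyBroker:
    """In-memory sketch of MQTT-style publish/subscribe routing."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a client callback for a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Route the message to every client subscribed to this topic."""
        for callback in self._subscribers[topic]:
            callback(topic, payload)

broker = ToyBroker()
received = []
broker.subscribe("machine/bearing/vibration",
                 lambda topic, payload: received.append((topic, payload)))
broker.publish("machine/bearing/vibration", {"rms": 0.42})
broker.publish("machine/spindle/temperature", {"celsius": 61.0})  # no subscriber
print(received)  # [('machine/bearing/vibration', {'rms': 0.42})]
```

In a real deployment, the physical asset's sensors would publish to topics on an MQTT broker (e.g., RabbitMQ), and the Digital Twin would subscribe to the topics it mirrors.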
Open Platform Communications Unified Architecture (OPC UA) is a communication protocol for IoT and Industry 4.0. OPC UA standardizes data exchange between physical objects in industrial environments while being manufacturer-independent. It uses a client-server communication architecture and a TCP/IP stack. Two studies leveraged the OPC UA protocol [41,78]. [78] used an XML schema to transport data in an object-oriented manner. [41] do not describe the data schema. [47,55] used the Modbus protocol. Modbus supports three protocols, TCP/IP, RTU, and ASCII [92]. [55] used Modbus TCP, which uses the Ethernet protocol and a client/server communication, while [47] used Modbus RTU, which uses serial master/slave communication.
[57] used the File Transfer Protocol (FTP) over HTTP to exchange data from a Bolt IoT module to a Bolt cloud server. [73] used the Extensible Messaging and Presence Protocol (XMPP), a protocol that uses XML schemas and exchanges data via TCP/IP.
Additionally, [55,57] used a SQL database to store sensor data, while [41] applied Mongo DB to store the data.

RQ9: Which twinning state parameters are used for predictive maintenance using digital twins?
Twinning state parameters are essential for creating an all-encompassing Digital Twin that elaborates the complete behavior of the physical asset. Additionally, these twinning state parameters are key for explaining the degradation of the physical asset. Fig. 19 shows the twinning state parameters used for predictive analytics of maintenance. The figure shows that most data sources or state parameters focus on mechanical or electrical influences. Vibration was the most used sensorial data, with 13 studies using this state parameter. Velocity, torque, and temperature were the other most used state parameters, with 10 and 9 studies.
Two state parameters need to be elaborated. First, [41,42] used operational data from a PLC. The data originating from the PLC is not further described. Second, [40] used ten images in addition to mechanical stress and electronic overstress to assess the health of an electro-optical system.

RQ10: What are the challenges and possible solution directions with respect to predictive maintenance using digital twins?
In total, eight challenge types were mentioned more than once. The three main challenges were computational burden, data variety, and data, model, or asset complexity. These three major challenges are elaborated further below. All identified challenges are listed in Table 9. A solution direction for a challenge is given when found.
Computational burden: [39,44,52,55] regarded computational burden due to the level of detail as a challenge, which impacts the cost-effectiveness, energy, and time consumption of the proposed method. [38,61,62] describe the challenge of performing complex computations on a large volume of operational data at high velocity. The computational burden on local hardware may be solved by using cloud computing, which is often cheaper than hosting and maintaining a server on-premise.
Data variety: [39,40,51,52] proposed that creating an accurate Digital Twin with measurements from just one operating condition is challenging. [52] also proposed to use additional external sensors to monitor the degradation of a system. [58] describe that there is just a limited set of datasets for bearing health conditions. [72] identified the lack of semi-healthy and failure data as a challenge for predictive maintenance. [93] describes the use of different RUL estimation models when tackling different datasets.
Asset/data/model complexity: [40,44,56] propose that modeling complex assets with different behavior in different environments is challenging. [38,44,67] add that it is currently deemed unrealistic to establish a detailed Digital Twin due to the complexity of assets. [61] describe the challenge of data complexity as follows: "A large amount of data will be generated during operation, which has the characteristics of multi-modality, multi-source heterogeneity, multidimensional, and complex distribution" [61].
Data scarcity: [41] describes that small firms barely manage asset maintenance data. [71,72,77] describe that large amounts of labeled run-to-failure data are required to create predictive models. However, such amounts of labeled data are not available.
Data quality: [51] describes that there are uncertainties in the available data. [50] add that there is high noise in the data, which is challenging for parameter estimation. [46] describe the issue as follows: "How to combine the high precision sensing data with the effective depth of the system mechanism to obtain better state evaluation" [46].
Feature extraction: [62] describes that it is difficult to extract normal and abnormal behavior from large real-time datasets. [76,78] describe that it is a time-consuming challenge to create meaningful data from sensor equipment.
Lack of reference model, standards, or framework: [65,78] describe that there is no mature reference model for Digital Twins to this day. [74] proposes that, to overcome the lack of standards, manufacturers and other organizations must come together to design a common standard for predictive maintenance approaches. Joint undertakings such as the European [94] may solve this challenge.
Prediction feedback loop: [56,63,66] describe the need for a feedback loop in predictive maintenance methods to evolve dynamically with the asset. Adaptive methods, such as Hidden Markov Models or reinforcement learning, could resolve this challenge.

Discussion
In the following sub-sections, we discuss the results of our study. In Section 5.1, we discuss the threats to the validity of the present study and how we tried to address them.
To the best of our knowledge, this study represents the first SLR on predictive maintenance using Digital Twins that focuses on providing a full overview for developers. In this respect, we identified over eight hundred papers, from which we selected 42 high-quality primary studies. We can make several interesting observations from the results, which we highlight for each research question.
Active learning process
Using the ASReview software, we automated the primary study selection process using Natural Language Processing and Machine Learning. The active learning capabilities of ASReview proved able to surface the most relevant studies first. We could have reduced the number of irrelevant studies to 10, at a slightly higher risk of stopping too early.

Main statistics
We have included 42 primary studies in this review, of which 59.5 percent were categorized as journal articles. Since 2018, a stable number of high-quality papers have been published. Most of these studies have also been indexed by the Scopus database.
RQ1. What is the objective of predictive maintenance using Digital Twins?
Most researchers did not explicitly mention their objectives in the introduction or abstract. However, most of the mentioned objectives were straightforward and aimed at predicting a health indicator for an asset, such as the RUL. For instance, [53] mentioned: "The primary goal is to transform the invisible patterns of component degradation and loss of efficiency into health insights" [53], where, after further reading, they aimed to predict RUL.
RQ2. On which application domains are predictive maintenance using Digital Twins applied?
We have found two main application domains for predictive maintenance using Digital Twins: Manufacturing and Energy. For the Manufacturing domain, we found three subdomains, while for the Energy domain, we identified four subdomains. For most studies, it was simple to extract the application domain, as they mentioned a use case study. However, some studies used a component-level or system-level Digital Twin of a bearing or gearbox within different application domains. Nonetheless, these Digital Twins could be applied in other application domains with little to no changes.
RQ3. Which industry Digital Twin platforms are used to develop Digital Twins for predictive maintenance?
RQ4. Which Digital Twin representation types are used for predictive maintenance?
Most studies explicitly mentioned the type of model they used to represent the Digital Twin. We carefully read the methodology section to categorize the studies that did not explicitly mention one of the terms. Sometimes it was difficult to categorize the Digital Twin model representation type, as some authors consider the predictive maintenance module as part of the Digital Twin. Therefore, they mention a hybrid methodology, while the digital mirroring is executed using a physics-based model, and a decision-making model is used for predictive analytics.
RQ5. Which approaches are being applied for predictive maintenance using Digital Twins?
The following paragraphs discuss approaches for predictive maintenance using Digital Twins.
Machine Learning approaches
Throughout the application of predictive maintenance, the most-used machine learning approach is the Support Vector Machine. It has been applied for classification tasks, while one study also used it for regression. Remarkably, just two studies applied regression tasks to estimate the RUL with machine learning. This may be because all studies have been published since 2018, after which Deep Learning has progressed massively.
Deep Learning approaches
For deep learning approaches in predictive maintenance using Digital Twins, the approaches can be divided into approaches for Digital Twin model representation and approaches for predictive maintenance modeling. Our study shows that autoencoders, AANNs, and GANs have been used to represent a Digital Twin, while other architectures, such as LSTMs, GRUs, and MLPs, have been used for predictive maintenance. However, the CNN architecture has been applied to both Digital Twinning and predictive maintenance.
Model validation metrics
To our understanding, accuracy, MAE, precision, and (R)MSE have been used most frequently as model validation metrics. No key model validation metric has been designated to compare different studies. However, [95] describes that using a single all-encompassing metric is easier to explain and compare for readers and researchers. Based on our results, we can state that additional research on the key model validation metric for each task in predictive maintenance using Digital Twins still must be executed.

Model optimization methods
Even though many studies have leveraged machine and deep learning, only five studies have used model optimization methods. The process of training machine learning models is the most complex and time-consuming aspect of applying the approach in general, both in terms of the effort necessary for building the process and the computing complexity required to perform it. Therefore, using hyperparameter tuning methods, such as grid search, random search, or Bayesian optimization, could drastically improve a method's performance. Furthermore, cross-validation allows authors and readers to compare algorithms and parameter configurations.
Statistical approaches A clean dataset is required to develop predictive analytics or digital twinning models, as clean data conveys the underlying signal more effectively. Cleaning sensor data could therefore yield substantial performance gains, for example through noise filtering methods such as the Particle Filter or the Kalman Filter. Additionally, when opting for a physics-based Digital Twin model, Monte Carlo sampling methods for Finite Element Method modeling enable researchers to develop high-fidelity models.
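As a minimal illustration of such noise filtering, the sketch below implements a scalar Kalman filter for a sensor assumed to measure a near-constant quantity; the noise settings and the temperature-like readings are illustrative assumptions, not values from any reviewed study.

```python
def kalman_1d(zs, q=1e-3, r=0.5):
    """Scalar Kalman filter assuming a (near-)constant true signal.

    zs: noisy measurements; q: process noise variance; r: measurement noise variance.
    """
    x, p = zs[0], 1.0          # initialize the state with the first reading
    estimates = []
    for z in zs:
        p += q                 # predict: uncertainty grows by the process noise
        k = p / (p + r)        # Kalman gain: how much to trust the new measurement
        x += k * (z - x)       # update the estimate toward the measurement
        p *= 1.0 - k           # update the estimate's uncertainty
        estimates.append(x)
    return estimates

# Hypothetical noisy temperature readings around a true value of 50.0
readings = [50.5, 49.2, 50.8, 49.6, 50.3, 49.9]
smoothed = kalman_1d(readings)  # successive estimates settle close to 50.0
```

A Particle Filter generalizes the same predict/update cycle to nonlinear, non-Gaussian models by propagating a weighted sample set instead of a single mean and variance.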

Mathematical approaches
The studies demonstrated that n-DOF torsional vibration models were frequently applied for assets heavily affected by vibrations, such as rolling bearings and gearboxes. Furthermore, when CAD models have been developed in the development phase, a FEM model can be applied as a Digital Twin. However, developing a FEM-based Digital Twin requires domain knowledge, which is scarce and costly to obtain. Additionally, a FEM-based Digital Twin is static: when an asset changes even slightly, the model cannot evolve in a data-driven manner.
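To give a flavor of such models, the sketch below integrates a hypothetical 2-DOF torsional vibration model (two inertias coupled by a torsional spring) with a semi-implicit Euler scheme; all inertia and stiffness values are illustrative, not taken from any reviewed study.

```python
import math

def simulate_2dof_torsional(j1=1.0, j2=1.0, k=100.0, theta0=0.1,
                            dt=1e-4, t_end=None):
    """Free vibration of two inertias J1, J2 coupled by a torsional spring k:

        J1*th1'' = -k*(th1 - th2),   J2*th2'' = +k*(th1 - th2)

    The relative twist th1 - th2 oscillates at wn = sqrt(k*(1/J1 + 1/J2)).
    """
    if t_end is None:
        # default: simulate exactly one analytic period of the twist mode
        t_end = 2 * math.pi / math.sqrt(k * (1 / j1 + 1 / j2))
    th1, th2, w1, w2 = theta0, 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        torque = k * (th1 - th2)
        w1 += (-torque / j1) * dt   # semi-implicit (symplectic) Euler step:
        w2 += (torque / j2) * dt    # update velocities first, then angles
        th1 += w1 * dt
        th2 += w2 * dt
    return th1, th2

# After one period, the relative twist returns close to its initial value theta0
th1, th2 = simulate_2dof_torsional()
```

An n-DOF variant chains more inertia/spring pairs in the same loop; a FEM model plays an analogous role at much finer spatial resolution, which is where the domain-knowledge cost arises.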
RQ6. Which abstraction levels of Digital Twins are used for predictive maintenance?
Most authors explicitly mentioned the abstraction level of the Digital Twin. However, sometimes it was difficult to categorize the granularity of the Digital Twin. For instance, is a pump a component or system? This may differ in various circumstances. Therefore, we kept a list of classified assets and used the same abstraction level for each asset.
RQ7. Which Digital Twin design patterns are applied for predictive maintenance?
We closely followed the Digital Twin design pattern catalog by [15] to classify the type of design pattern used. As expected, most studies applied a Digital Monitor design pattern. However, some studies used the NASA Bearing or Turbofan Engine Degradation simulation datasets. Using these datasets removes the need to develop a "real" Digital Twin with continuous communication between a physical and digital object, which hugely decreases the time to develop RUL prediction models while improving reproducibility. We classified these 'Digital Twins' as Digital Shadow design patterns.

RQ8. Which communication protocols are used for Digital Twins for predictive maintenance?
Most studies did not mention the communication protocol used to exchange data between the physical and digital objects, although most stated that communication runs over a TCP/IP stack. We noticed that studies on Cyber-Physical Systems that do not employ Digital Twins often elaborate on the communication protocols. Nonetheless, communication protocols are a key enabler of Digital Twins, as their speed, quality of service, and security standards are essential during operation.
RQ9. Which Twinning state parameters are used for predictive maintenance using Digital Twins?
To develop a Digital Twin or predictive maintenance module, features must be selected that explain the machine's behavior or degradation. The selected primary studies frequently elaborated on the selected features to reach their objectives. After extracting the Twinning state parameters, we combined similar state parameters, such as speed and velocity. Other studies mentioned the sensors used to measure twinning parameters, such as accelerometers to measure vibrations.
RQ10. What are the challenges and possible directions with respect to predictive maintenance using Digital Twins?
Extracting the challenges from the studies was a tedious task, as challenges are not reported in the same section in every study. One study may express challenges in the discussion section, another in the introduction, and yet another in both. The authors have thoroughly scanned each article to find any challenges; nevertheless, some challenges may have been overlooked.

Threats to validity
Construct validity: The construct validity of the SLR determines whether it accurately measures what it claims to measure. We employed several computer-science databases to gather a large body of evidence on predictive maintenance using Digital Twins and created automated database search queries. Although a database is a powerful literature search tool, it is sensitive to how a query is phrased, and even tiny word changes can result in drastically different search results.
The first threat to construct validity is the phrasing of a query. As each database requires custom search query formulations, we developed a general search query and modified it slightly for each database. Because each search query has been customized, a relevant study might have been missed. To mitigate this threat, the authors thoroughly discussed the query design and evaluated the results of several trials. As a safety measure, we included "cyber-physical systems", an umbrella topic closely related to Digital Twins.
The primary study selection step is the second threat. According to the phenomenon of publication bias, authors are more inclined to publish favorable outcomes of their study than negative results [7]. Using the ASReview decision support system and the study quality evaluation, we have mitigated the risk of publication bias. After screening a pilot set of primary studies, the study selection criteria were defined, reducing the risk of selection bias. All selection criteria were debated among the co-authors to ensure each included study's quality. Even though the selection is based on predetermined criteria and quality evaluation questions, it is difficult to eliminate personal and subjective judgments from the scoring process. Since the domains of predictive maintenance and Digital Twins are vast, reviewing them requires a broad spectrum of expert knowledge. Therefore, although we carefully reviewed the studies, we might have misinterpreted some aspects of them.
The third threat is the data extraction process. Even though a data extraction form was established, some essential data fields may have been left out. We modified the data extraction form several times to ensure it captured all the data needed to answer the research questions. When additional data (i.e., communication protocols, mathematical approaches, and statistical approaches) could be retrieved from several studies and was beneficial in answering the research objectives, we added new data fields.
Internal Validity: Internal validity concerns whether the analysis is free of systematic biases that could distort the relationships between the data and the conclusions. We carefully formulated each research question to determine the required approaches, methods, and techniques to perform predictive maintenance using Digital Twins and thereby overcome these biases. As there are many options for developing predictive maintenance and Digital Twins, and the field is relatively new, we might have missed essential research questions or made assumptions about the current research questions that contain flaws.
External Validity: External validity shows how well the SLR study's outcome can be applied to other settings. This SLR only reviewed studies that leveraged Digital Twin technology for predictive maintenance purposes. It is likely that some novel predictive maintenance techniques have not yet been applied with Digital Twins, and vice versa. As such studies have not been published in this setting, they have not been discussed, regardless of their potential.
Conclusion Validity: The conclusion validity measures the reproducibility of the SLR. Our study followed the methodology by [32] and the protocol proposed by [7]. The research question design, search process, screening criteria, and quality evaluation were performed based on this widely used protocol. To reduce individual bias, we reviewed our SLR method in different meetings. We deduced all conclusions from the retrieved and synthesized data based on the tables and figures to prevent subjective interpretation of the results.

Review update
Since the submission of this research, much progress has been made in predictive maintenance using Digital Twins. As the review scope covered studies published up to August 2021, newer advancements may have been excluded from this study. To address this, we conducted a second search after the review and included the relevant articles in this section.

Conclusion and future work
This study presents a systematic literature review (SLR) of the evidence generated in the past four years. This review aimed to identify the features, challenges, and solution directions of predictive maintenance using Digital Twins. We addressed 42 studies in this SLR that capture state-of-the-art strategies to leverage Digital Twin technology to improve predictive maintenance capabilities. To the best of our knowledge, this is the first SLR of its kind to look at Digital Twins as a predictive maintenance accelerator. Our choice to adopt an SLR as the instrument to answer our key research questions proved very useful and led us to critical insights that could benefit both practitioners and researchers.
This study has led to novel insights into the current literature on predictive maintenance using Digital Twins. Our heat maps for approaches enable researchers to readily identify the key approaches and algorithms to use when developing a predictive maintenance module for each Digital Twin representation type. We have found that RUL estimation is the key objective for predictive maintenance using Digital Twins, as RUL is a KPI that provides essential information for maintenance planners. As a result, the leading approaches for predictive maintenance also focus on regression tasks. In the Manufacturing and Energy domains, predictive maintenance technologies using Digital Twins have been widely researched. Another important insight from this SLR is the overview of challenges and solution directions. We have collected the challenges that researchers explicitly mentioned. Then, we categorized the challenges by kind and, where available, included solutions for each challenge.
Our analysis observed a low adoption level of industrial Digital Twin platforms. Furthermore, more research on recurrent neural networks for predictive maintenance using Digital Twins may be performed. The key challenges are computational burden; the complexity of data, models, and assets; and the lack of reference architectures and standards. The discussion section provides a list of identified relevant studies published after the review was conducted.
Predictive maintenance using Digital Twins pays off because it dramatically reduces the number of maintenance activities and machines' downtime while increasing machine lifetime. As for future research, we plan to develop a design pattern catalog and reference architecture for predictive maintenance using Digital Twins.