A Systematic Literature Review on Transfer Learning for Predictive Maintenance in Industry 4.0

The advent of Industry 4.0 has resulted in the widespread use of novel paradigms and digital technologies within industrial production and manufacturing systems. The objective of making the monitoring of industrial operations easier has also implied the use of more effective data-driven predictive maintenance approaches, including those based on machine learning. Although those approaches are becoming increasingly popular, most traditional machine learning and deep learning algorithms face three major challenges: 1) lack of training data (especially faulty data), 2) incompatible computation power, and 3) discrepancy in data distribution. A newer data-driven technique such as transfer learning can be used to overcome these issues in predictive maintenance. Motivated by the recent surge of interest in transfer learning within computer science and artificial intelligence, in this paper we provide a systematic literature review of related research with a focus on predictive maintenance. The review aims to define transfer learning in the context of predictive maintenance by introducing a specific taxonomy based on relevant perspectives. We also discuss current advances, challenges, open-source datasets, and future directions of transfer learning applications in predictive maintenance from both theoretical and practical viewpoints.


I. INTRODUCTION
The Industry 4.0 era saw the introduction of new paradigms and technologies based on connectivity, data analytics, and novel devices, allowing for inventory reduction, customization, and controlled production [1]. In many manufacturing industries, profitability and effectiveness are reliant on developing high-quality products based on reliable systems. In fact, any unanticipated downtime of machinery or deterioration of equipment can lead to significant penalties and severe reputational losses for companies. Therefore, maintenance represents a critical activity with significant implications for the companies' capacity to compete on cost, quality, and performance. The transition of maintenance strategies from reactive and preventive to predictive can be a concrete result within Industry 4.0. In this context, special attention must be paid to Artificial Intelligence (AI) approaches, due to their capability to manage high-dimensional and multivariate data as well as extract hidden correlations within data; as such, AI is especially suitable for enhancing the performance of Predictive Maintenance (PdM) [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Huiyan Zhang.
Within the AI field, Machine Learning (ML) and, more recently, Deep Learning (DL), have emerged as effective techniques for developing PdM models due to their capability of performing failure prediction tasks such as estimating the remaining useful life of a machine [2]. However, despite their benefits, traditional ML and DL techniques suffer several limitations. To begin with, these methods are usually based on the assumption that both the training and testing datasets are drawn from the same distribution. When dealing with real-world applications, this assumption may not necessarily apply. Second, traditional ML and DL algorithms require a significant amount of historical fault data to learn the fault characteristics of machines, whereas in real-world applications it is not always feasible to run the equipment in failure mode for both safety and economic reasons. Finally, the cost of model training in terms of time is high. It takes a lot of time to adjust the weights and parameters of DL algorithms when training from scratch for a new operating condition.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To tackle the mentioned problems, Transfer Learning (TL) has recently emerged as a powerful AI technique [3]. More specifically, TL has proven to support target domain adaptation by leveraging knowledge acquired from the source domain, in order to cope with the lack of training data, especially faulty data, in the target domain. Furthermore, unlike traditional ML and DL methods, TL can be successfully applied even if the data distributions of the source and target domains differ. In addition, the amount of computing power and computation time required to train a model can be drastically decreased by leveraging pre-learned knowledge from various source domains [4]. Since TL methods can be effectively exploited for overcoming issues related to traditional ML and DL approaches, in this work we bridge the gap in the technical literature by providing the reference taxonomy, the state-of-the-art (SOTA), and the challenges arising from applying TL to PdM. In order to achieve those objectives, we leverage a method known as Systematic Literature Review (SLR). The main objectives and contributions of the paper can be summarized as follows:
• To clearly motivate the need for TL on the basis of a structured PdM categorization.
• To provide a novel TL taxonomy in PdM from three perspectives: problem, solution, and application.
• To provide a SOTA with up-to-date references focused on the use of TL techniques for PdM tasks.
• To highlight the key challenges that must be tackled to successfully apply TL to PdM.
• To discuss recent advancements, the availability of open-source data sets, and the possible future directions of TL in PdM, from both scientific and industrial perspectives.
The remainder of the paper is organized as follows: Section II sets this study in the context of related works. The reference context, PdM categorization, and transfer learning definition are discussed in Section III. The motivation for the application of TL in the PdM context is discussed in Section IV. Section V describes the systematic literature review approach, including the definition of research questions, the search process, and the selection and filtering of relevant publications. Section VI provides the responses to the research questions set in Section V, including the specific TL taxonomy and a discussion of methodological and industrial challenges. Finally, Section VII draws conclusions.

II. RELATED WORK
Several survey papers exist addressing different aspects of fault detection, diagnosis, and prognosis within PdM. For example, Carvalho et al. [2] conducted a systematic literature review of traditional ML methods used for PdM and discussed the efficiency of the current state-of-the-art traditional ML techniques. Furthermore, Jovani et al. [5] investigated the benefits that traditional ML algorithms may offer in PdM, and also conducted an SLR to identify the implementation obstacles. Zhao et al. [6] conducted a systematic review of machine health monitoring systems based on deep learning algorithms, including Deep Belief Networks (DBN), Deep Auto-Encoders (DAE), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN). Olga et al. [7] presented a comprehensive assessment of current advancements, trends, challenges, and future directions of DL in the context of Prognostics and Health Management (PHM). Lei et al. [8] introduced a study and a roadmap to comprehensively address the evolution of Intelligent Fault Diagnosis (IFD) as a result of developments in ML, as well as future perspectives. Zonta et al. [1] discussed current challenges and limitations in PdM and suggested a new taxonomy to categorize this field of study in light of Industry 4.0 requirements. Jimenez et al. [9] conducted an SLR to summarize the current diagnostic and prognostic trends, as well as outline current challenges and research opportunities, with a specific focus on multi-model approaches. Keleko et al. [10] conducted a bibliometric study to investigate and quantify the most important concepts, areas of application, methodologies, and significant trends of AI used in real-time predictive maintenance in Industry 4.0.
Other survey papers are devoted to analyzing a specific component or piece of equipment for fault diagnosis and prognosis, e.g., bearings [11], milling cutting tools [12], and rotating machinery [13]. Among them, Liu et al. [13] reviewed the main AI methods for fault diagnosis of rotating machines in industrial applications. In addition, a number of survey articles on fault prognosis are also available. For instance, Lei et al. [14] provided an evaluation of machinery prognostics based on four prognostic processes, including data acquisition, Health Indicator (HI) construction, Health Stage (HS) division, and remaining useful life (RUL) prediction. Although all those papers provide interesting reviews within fault diagnosis and prognosis, they focus on traditional ML and DL techniques, while none of them addresses TL as we do in this paper.
Only a few survey papers address the application of TL to maintenance (see, e.g., [15], [16]); however, they only focus on specific tasks for diagnosing machinery faults, while this work is more comprehensive. In addition, existing papers reviewed works published until 2019, while we found more than 120 related publications released after 2019. Indeed, only very recently has TL demonstrated its power within PdM and hence shown a rapid expansion. Also, in this study, we provide a novel TL taxonomy in PdM from three perspectives: problem, solution, and application.

A. BACKGROUND ON PREDICTIVE MAINTENANCE
The profit and competitiveness of the industrial sectors rely on designing and producing high-quality products and reliable systems. However, developing such sophisticated systems brings new challenges, including maintenance expenses. The main concern of the industry is reducing maintenance costs and minimizing business risks while improving asset reliability and safety. Three labels are commonly used in the technical literature to indicate the main maintenance paradigms and related actions [17]:
• Reactive Maintenance (RM). This approach is also known as corrective maintenance or run-to-failure (R2F) since the repair is conducted following the occurrence of a fault. In this paradigm, a machinery component operates from installation until its failure occurs, at which point the whole machine is shut down for maintenance. However, because of the additional maintenance costs (due to the impact of the failure on other machinery parts) and unanticipated downtime, the cost and efficiency of this methodology are often unacceptable for industry sectors.
• Preventive Maintenance (PM). To prevent process or equipment breakdowns, this kind of maintenance is performed on a predetermined schedule in time or process iterations, so it is often called time-based or planned maintenance. Despite its effectiveness, in some cases PM may increase operational expenses due to the possibility of over-maintenance.
• Predictive Maintenance (PdM). PdM makes use of predictive technologies to determine when maintenance is required. It is built on the continuous monitoring of a machine's or process's integrity, thus allowing maintenance to be done only when required. PdM stands out among the others in the Industry 4.0 era owing to its capacity to optimize asset use and management [17].
PdM's main purposes are to avoid unexpected downtime, increase overall system reliability, and lower operational costs. The three main principles of predictive maintenance are fault detection, diagnosis, and prognosis. In other words, predictive maintenance involves detecting a developing defect (fault detection); in the case of faulty systems, isolating and identifying the specific type of fault (fault diagnostics); and forecasting the RUL or end of life of the system (prognostics). PdM can be implemented online or offline. In an online implementation, information is collected, preprocessed, and analyzed in real time to raise alarm signals or adjust actions while the system is running. An offline implementation collects all operational data so that the maintenance team can analyze it afterward. Moreover, the interaction of predictive maintenance with manufacturing is based on the cyber system and service innovation. As a result, there is interaction with products and industrial processes of the system. It should be highlighted that Prognostics and Health Management (PHM) is an example of the possible predictive maintenance extensions often found in the literature.
Indeed, this term is frequently used even to replace predictive maintenance, and there is no consistency in the use of these keywords or their interaction in the maintenance area. In an attempt to clarify the keywords, Jimenez et al. [9] clearly define PHM as an extension of PdM aiming to increase asset predictability and life cycle management. According to [9], predictive maintenance approaches can be classified into two major groups: single-model approaches and multi-model approaches. Single-model methods can be further categorized as knowledge-based methods, data-driven methods, and physics-based methods, while the various multi-model approaches, sometimes referred to as hybrid models, arise from the different possible combinations of single-model methods. Data-driven machine health monitoring methods have grown increasingly appealing as sensors and computer systems have advanced significantly. In this area, traditional ML, DL, and TL have emerged as effective techniques for developing intelligent prediction systems, especially in complex or large-scale and dynamic contexts. However, the success of these applications is contingent on the proper choice of the most suitable machine-learning technique according to the specific problem [5]. Finally, an overview of predictive maintenance, summarizing its main characteristics and features, is depicted in Fig. 1 by using class diagrams.

B. TRANSFER LEARNING: MAIN CONCEPTS AND NOMENCLATURE
Transfer Learning (TL) aims to develop an effective model for a target domain with limited training data by leveraging and exploiting knowledge from different but related source domains. For the sake of completeness, the definitions commonly used in this field, and related notation, are introduced in the following.
Definition 1 (Domain [18]): A domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ includes two parts, namely a feature space $\mathcal{X}$ and a marginal distribution $P(X)$, where the symbol $X$ specifies an instance set $X = \{x \mid x_i \in \mathcal{X},\ i = 1, \ldots, N\}$ that contains $N$ instances.
Definition 2 (Task [18]): Given a domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$, a task $\mathcal{T} = \{\mathcal{Y}, f\}$ consists of a label space $\mathcal{Y}$ and a prediction function $f(\cdot)$, the sample data being composed of pairs $\{x_i, y_i\}$ where $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$. The prediction function $f$ is designed to learn from sample data and predict the label of a future instance; it can be expressed as the conditional distribution $P(Y \mid X)$, so that the task can equivalently be written as $\mathcal{T} = \{\mathcal{Y}, P(Y \mid X)\}$.
The above definitions can be specified for source and target domains. Accordingly, let $\mathcal{D}_S = \{\mathcal{X}_S, P_S(X_S)\}$ and $\mathcal{D}_T = \{\mathcal{X}_T, P_T(X_T)\}$ be the source and the target domain, where $\mathcal{X}_S$ and $\mathcal{X}_T$ represent the source and target domains' feature spaces, respectively, while $P_S(X_S)$ and $P_T(X_T)$ represent the source and target domains' marginal probability distributions, respectively. Correspondingly, a source task and a target task can be defined as $\mathcal{T}_S = \{\mathcal{Y}_S, P_S(Y_S \mid X_S)\}$ and $\mathcal{T}_T = \{\mathcal{Y}_T, P_T(Y_T \mid X_T)\}$, where $\mathcal{Y}_S$ and $\mathcal{Y}_T$ are the source and target task label spaces, respectively, and $P_S(Y_S \mid X_S)$ and $P_T(Y_T \mid X_T)$ are the source and target domains' conditional probability distributions, respectively. Given the above concepts, a unified definition of TL can be provided as follows.
Definition 3 (Transfer Learning [18]): Consider the source domain $\mathcal{D}_S = \{\mathcal{X}_S, P_S(X_S)\}$, with related source task $\mathcal{T}_S = \{\mathcal{Y}_S, P_S(Y_S \mid X_S)\}$, and the target domain $\mathcal{D}_T = \{\mathcal{X}_T, P_T(X_T)\}$, with respective target task $\mathcal{T}_T = \{\mathcal{Y}_T, P_T(Y_T \mid X_T)\}$. Transfer Learning tries to exploit the related knowledge embedded in $\mathcal{D}_S$ and $\mathcal{T}_S$ to improve the performance of the target prediction function $f_T(\cdot)$ in the target domain $\mathcal{D}_T$ with target task $\mathcal{T}_T$, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.
A crucial concept related to TL is the idea of domain adaptation, which has been developed to deal with cross-domain learning challenges under specific conditions. The aim is to adapt one or more source domains to exploit knowledge and improve the target learner's performance.
Definition 4 (Domain Adaptation [16]): Domain adaptation is a type of transfer learning problem in which the source and target tasks are assumed to be the same, i.e., $\mathcal{T}_S = \mathcal{T}_T$.
The definitions provided so far refer to single-source TL, which is the most common type of TL in current research. Since recent studies demonstrate that using several source domains and tasks improves the prediction function $f_T$, the following multi-source TL definition must be given.
Definition 5 (Multiple-source Transfer Learning [19]): Given some observations corresponding to $m_S \in \mathbb{N}^+$ source domains and tasks (i.e., $\{(\mathcal{D}_{S_i}, \mathcal{T}_{S_i}) \mid i = 1, \ldots, m_S\}$), and observations about the target domain and task $(\mathcal{D}_T, \mathcal{T}_T)$, transfer learning enhances the performance of the learned prediction function $f_T$ in the target domain by leveraging the knowledge implied in the multiple source domains.
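As an informal companion to Definitions 3 and 4, the following minimal Python sketch encodes a domain and a task as simple records and checks the two conditions stated above. The class names, field names, and the bearing example are illustrative assumptions of ours (distributions are represented only by descriptive strings), not notation or data taken from the cited works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    feature_space: str  # stands in for the feature space of Definition 1
    marginal: str       # stands in for the marginal distribution P(X)


@dataclass(frozen=True)
class Task:
    label_space: frozenset  # stands in for the label space of Definition 2
    conditional: str        # stands in for the conditional distribution P(Y|X)


def is_transfer_learning(d_s, t_s, d_t, t_t):
    """Definition 3: TL is relevant when D_S != D_T or T_S != T_T."""
    return d_s != d_t or t_s != t_t


def is_domain_adaptation(d_s, t_s, d_t, t_t):
    """Definition 4: the tasks coincide while the domains differ."""
    return t_s == t_t and d_s != d_t


# Hypothetical example: the same bearing monitored at two rotation speeds.
faults = frozenset({"healthy", "inner-race", "outer-race"})
source_domain = Domain("vibration spectrum", "bearing at 1500 rpm")
target_domain = Domain("vibration spectrum", "bearing at 3000 rpm")
shared_task = Task(faults, "fault class given spectrum")

print(is_transfer_learning(source_domain, shared_task, target_domain, shared_task))
print(is_domain_adaptation(source_domain, shared_task, target_domain, shared_task))
```

In this toy case the feature space and the task are shared while the marginal distribution changes with the operating condition, so both checks hold.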

IV. MOTIVATIONS FOR APPLYING TL TO PdM
In recent years, many traditional ML and DL methods have made significant progress as a result of their superior representation learning and pattern recognition capabilities, and they have also been successfully used for the predictive maintenance of industrial systems [7], [20], [21]. However, intelligent PdM systems based on traditional ML and DL have to satisfy specific conditions to achieve outstanding performance. First, the feature extraction process of traditional ML methods can be inefficient and passive. Manual feature extractor design not only necessitates the expertise of signal processing experts, but is also a time-consuming process, as each handcrafted feature is only acceptable for specific working conditions, necessitating the operator's judgment in many situations [8]. Second, DL and ML models have to be trained with enough labeled data to discover representative features and be able to generalize adequately. However, in the real world, it is usually possible to collect a significant amount of data only under the normal operation of the system. Indeed, fault injection and run-to-failure experiments are constrained in real industrial applications by safety, security, and cost concerns. As a result, if PdM models are trained on limited or unbalanced datasets, they can easily overfit, making generalization to the test data difficult [4]. Finally, traditional ML and DL algorithms typically have acceptable performance only under the condition that the test data come from the same distribution as the training data. However, real-world applications complicate this issue because the working operation and environment of mechanical equipment (e.g., speed, load, and noise) might change over time. As a result, performance degrades due to the distribution discrepancy between the training dataset in a source domain and the test dataset in a target domain.
In such cases, traditional ML and DL models are frequently reconstructed from scratch for a new working operation/environment, resulting in a waste of computational resources and training time. It follows that these issues make it challenging, in the case of traditional ML and DL, to generalize or adapt PdM for a new domain where there is a discrepancy in data distribution as well as a lack of data in the target domain.
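The effect of a distribution discrepancy can be illustrated with a deliberately simple toy experiment. The sketch below is our own illustration on synthetic one-dimensional data and is not taken from any surveyed paper: a threshold classifier is fitted on a source operating condition, evaluated on a target condition whose health indicator is offset (a stand-in for a change in load or speed), and then adapted using only a few healthy target samples. All names, the offset value, and the mean-shift adaptation rule are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)


def make_domain(offset, n=1000):
    """Synthetic health indicator: healthy around 0, faulty around 4, plus an offset."""
    healthy = rng.normal(0.0 + offset, 1.0, n)
    faulty = rng.normal(4.0 + offset, 1.0, n)
    x = np.concatenate([healthy, faulty])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return x, y


def fit_threshold(x, y):
    """Decision threshold halfway between the two class means."""
    return 0.5 * (x[y == 0].mean() + x[y == 1].mean())


def accuracy(threshold, x, y):
    return float(np.mean((x > threshold).astype(int) == y))


x_src, y_src = make_domain(offset=0.0)   # source operating condition
x_tgt, y_tgt = make_domain(offset=3.0)   # target condition with shifted distribution

thr_src = fit_threshold(x_src, y_src)
acc_src = accuracy(thr_src, x_src, y_src)
acc_tgt = accuracy(thr_src, x_tgt, y_tgt)  # source model applied unchanged

# Simple adaptation: reuse the source threshold and correct it with the mean
# shift estimated from 50 healthy target samples (no faulty target data needed).
healthy_tgt = x_tgt[y_tgt == 0][:50]
shift = healthy_tgt.mean() - x_src[y_src == 0].mean()
acc_adapted = accuracy(thr_src + shift, x_tgt, y_tgt)

print(f"source: {acc_src:.3f}, target (no adaptation): {acc_tgt:.3f}, "
      f"target (adapted): {acc_adapted:.3f}")
```

The unchanged source model misclassifies most healthy target samples as faulty, whereas the adapted threshold recovers most of the lost accuracy without retraining from scratch.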
In real-world applications, maintenance technicians typically monitor the condition of one machine by using experience gained from different machines of the same type instead of relying exclusively on insights gained from the target machine. So, in a real-world PdM application, training data can be collected from a variety of sources, such as different operating conditions, different machines of the same type, failure simulation machines in the laboratory, or digital models. These data will have a different distribution from the data being examined in the target machine. However, owing to the same operating principle and failure mechanism, the data from the target machine and the data from the source machines should share the same fault characteristic information [4]. These facts suggest that it may be useful to transfer knowledge across multiple related machines. In this context, TL has recently gained attention as a powerful technique to address all of these challenges. The various advantages of adopting TL with respect to traditional ML and DL methods can be summarized as follows:
• Traditional ML and DL techniques rely on target domain data to train the model, while TL uses source domain data as a starting point, thus requiring less target domain training data [19].
• Transfer learning-trained models can be easily adapted to other domains. Indeed, TL models are trained to recognize features, relations, instances, or parameters that may be applied to different domains [3].
• TL has the potential to make traditional machine learning and deep learning techniques more accessible. Indeed, TL algorithms can transfer to a new domain the maintenance knowledge coming from a model that has already been trained by traditional ML or DL algorithms [22].
• Unlike other learning methods, TL usually provides an optimal initial starting point, higher learning accuracy, and faster training for new domains [22].

V. RESEARCH METHODOLOGY
This paper follows the systematic literature review technique proposed by Kitchenham et al. [23].
The methodology has been applied according to the following main steps.

A. RESEARCH QUESTIONS
The definition of proper research questions is a crucial part of an SLR [23]. Herein, the research questions aim to enable readers to better comprehend how TL can be effective in the PdM context. To this end, we conducted a preliminary analysis of the publications useful for developing the research questions. First, we defined the Main Question (MQ) driving the search for open research challenges. Subsequently, based on the main question, we focused on Specific Questions (SQ) to highlight the existing approaches and to reveal gaps and opportunities for future studies. The main question and the specific questions are reported in Table 1.

B. SEARCH PROCESS
The search process consists of two phases. The first is devoted to creating a search string, while the second aims to choose a source. The search string's construction necessitates a preliminary reading of handpicked publications related to the topic of interest. The considered search string is shown in Fig. 2. Note that, in our first attempt to create the string, we included keywords such as industry 4.0, cyber-physical systems, cloud computing, edge computing, internet of things, and so on. However, only a few papers were obtained at the end of the process in this way. So, we decided to remove these keywords from the search string in order to conduct a more comprehensive literature review in the field. Scopus has been chosen as the primary search database, due to its multi-publisher indexing (including Elsevier, Springer, IEEE, ACM, etc.), reliable metadata, powerful search engine, ease of use, extensive coverage, and quality filtering of sources [24].

C. SELECTION OF RELEVANT STUDIES
The string presented in Fig. 2 has been applied to the Scopus database by searching for the string in the title, keywords, and abstract of papers. Following that, all research that is not related to the study's objectives has been removed. To this aim, the Exclusion Criteria (EC) in Table 2 have been applied. Subsequently, for further refinement, a manual filtering process has been performed, which is composed of the following steps.
1) Analysis of the title and abstract. It consists of reading the title and the abstract of the paper and determining whether or not it is adequate for our study. Papers classified as surveys or reviews are also excluded.
2) Entire text analysis. It consists of a comprehensive reading of the paper. It is essential when the title and abstract do not provide enough information about the proposed solution, while the proposed ideas appear to be relevant to the objectives of the literature review.
In addition, we briefly re-evaluated all the papers initially removed on the basis of the exclusion criteria, so as to ensure that all relevant papers are considered in this study.

D. SEARCH RESULTS
This subsection illustrates the results of the search and selection process. An overview of the results is shown in Fig. 3.
In detail, 685 publications were selected during the initial search step, and they were then reduced to 224 by applying the exclusion criteria listed in Table 2. In the last refinement phase, the selected publications underwent the manual filtering described in Section V-C. During this last phase, 72 studies were eliminated, leaving 152 papers as the most relevant for this SLR. In addition, we added another 16 papers to the selected ones following a short assessment of the papers removed on the basis of the exclusion criteria.
The 168 papers obtained via the screening are categorized by year in Fig. 4, where the x-axis indicates the period of time included in this review, i.e., from 2017 to 2021-22. Note that this area of research received little attention in 2017 and 2018, while in 2019 it grew to 20 papers. Furthermore, the publications show considerable growth in 2020, indicating that this will be the trend in the coming years.
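The counts reported in this subsection can be cross-checked with a few lines of arithmetic. The sketch below only restates the figures given above; the variable names are ours and purely illustrative.

```python
# Figures as reported in the search results.
initial_hits = 685       # publications returned by the initial search
after_exclusion = 224    # remaining after applying the exclusion criteria
removed_manually = 72    # eliminated during manual filtering
re_added = 16            # recovered after re-assessing excluded papers

excluded_by_criteria = initial_hits - after_exclusion
after_manual_filtering = after_exclusion - removed_manually
final_selection = after_manual_filtering + re_added

print(f"excluded by criteria:   {excluded_by_criteria}")
print(f"after manual filtering: {after_manual_filtering}")
print(f"final selection:        {final_selection}")
```

The intermediate and final values agree with the 152 and 168 papers stated in the text.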

VI. RESULTS AND DISCUSSION
The contributions of the selected papers are discussed in this section. More specifically, each research question listed in Table 1 is addressed, taking into account the contributions of the publications found in the literature.

A. TAXONOMY OF TL IN THE FIELD OF PdM (SQ1)
Transfer learning can be classified according to a variety of factors, and different viewpoints and characteristics influence the categorization. In this study, we organize the TL taxonomy according to problem, solution, and application criteria. According to the availability of labeled data in the source and target domains, the transfer learning problem can be split into three groups as follows [18]:
• Transductive Learning. The task is the same for the source and target domains, but the domains may differ. In this setting, labeled data are accessible only in the source domain.
• Inductive Learning. Inductive learning differs from transductive learning in that the tasks are different, regardless of whether the source and target domains are the same. In this setting, labeled data are generally accessible in the target domain, regardless of whether labeled data are available in the source domain.
• Unsupervised Learning. It is comparable to inductive TL, except that it focuses on the setting in which neither the source nor the target domain has a labeled dataset.
Table 3 can be used as an index to assist researchers in identifying settings that are relevant to their interests. An alternative problem-based categorization can be based on feature and label space consistency between the source and target. On the basis of the discrepancy between the feature and label spaces of the domains, the following types of TL can be defined [19]:
• Homogeneous Learning. This concept has been introduced to deal with circumstances in which the domains have identical label and feature spaces. This circumstance is also known as the closed-set transfer learning problem.
• Heterogeneous Learning. It indicates the process of transferring knowledge between domains that have different feature and/or label spaces ($\mathcal{X}_S \neq \mathcal{X}_T$ and/or $\mathcal{Y}_S \neq \mathcal{Y}_T$). It is more challenging than homogeneous learning since it requires feature and/or label space adaptation in addition to distribution adaptation.
Furthermore, three forms of heterogeneous TL arise from the relation between the source and target label spaces:
• Universal Transfer Learning. It deals with the case in which there is no prior knowledge about the label spaces of the source and target domains ($\mathcal{Y}_S \neq \mathcal{Y}_T$). Herein, due to the lack of label space information, the universal TL problem provides a challenging and quite practical setting for general TL.
Due to the high economic and labor expenses in real-world industries, it is generally difficult for a single source to collect enough high-quality data to build an efficient data-driven predictive maintenance model in the target domain. Therefore, depending on the number of source domains used for TL, we can have two sorts of transfer:
• Single Source Domain. This technique relies on knowledge from a single source.
• Multiple Source Domain. Multiple source domain transfer learning techniques transfer knowledge from multiple different but relevant sources.
Based on network structure, solutions to TL problems can be categorized into shallow and deep transfer learning approaches. The deep neural network-based TL methodology aims to learn more transferable representations by including TL methods in the deep learning pipeline. Based on the aspect of ''what to transfer'', transfer learning approaches are here split into the following five groups:
• Instance-based. This methodology attempts to reduce the conditional and/or marginal distribution difference across domains through re-weighting or importance sampling strategies.
• Feature-based. It aims to reduce the conditional and/or marginal distribution difference across domains according to two scenarios: asymmetric and symmetric. In asymmetric methods, the source features are transformed to match the target characteristics. Symmetric methods, on the other hand, seek to find a shared latent feature space before transforming both source and target domain features into another distinct feature representation.
• Parameter-based. It aims to transfer knowledge at the level of the model or parameter.
• Relational-based. It is primarily concerned with problems in relational domains and transfers the source domain's functional relationships or rules to the target domain.
• Hybrid-based. It attempts to integrate two or more approaches to fulfill a single functional block of the TL model, and the integrated approaches work together to achieve desirable results.
Concerning applications, TL techniques can be exploited for accomplishing the main PdM tasks, i.e., fault detection, diagnosis, and prognosis. More specifically, TL can be leveraged for detecting a developing defect (fault detection) or, in the case of faulty systems, for isolating and identifying the specific type of fault (fault diagnostics). It can also be used for forecasting the RUL or the end of life of a system (prognostics). Regarding application objects, predictive maintenance systems based on TL techniques can be used to provide safe and reliable operation of assets within critical systems, e.g., oil and gas, mining, aviation, industrial manufacturing components, and power plants.
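The problem-side criteria introduced earlier in this subsection (availability of labels, consistency of feature and label spaces, and number of source domains) can be read as simple decision rules. The following sketch is one possible encoding of those rules as stated in the text; the function and enumeration names are hypothetical and introduced only for illustration.

```python
from enum import Enum


class LabelSetting(Enum):
    TRANSDUCTIVE = "transductive"
    INDUCTIVE = "inductive"
    UNSUPERVISED = "unsupervised"


def label_setting(source_labeled: bool, target_labeled: bool) -> LabelSetting:
    """Classify a TL problem by where labeled data are available."""
    if target_labeled:
        # Labeled target data: inductive, whether or not the source is labeled.
        return LabelSetting.INDUCTIVE
    if source_labeled:
        # Labels only in the source domain.
        return LabelSetting.TRANSDUCTIVE
    # No labels in either domain.
    return LabelSetting.UNSUPERVISED


def space_setting(same_feature_space: bool, same_label_space: bool) -> str:
    """Homogeneous only if both feature and label spaces coincide."""
    if same_feature_space and same_label_space:
        return "homogeneous"
    return "heterogeneous"


def source_setting(n_sources: int) -> str:
    """Single- versus multiple-source transfer."""
    if n_sources < 1:
        raise ValueError("at least one source domain is required")
    return "single-source" if n_sources == 1 else "multiple-source"


# Example: labeled laboratory data, unlabeled field data, same sensors and fault classes.
print(label_setting(source_labeled=True, target_labeled=False).value)
print(space_setting(same_feature_space=True, same_label_space=True))
print(source_setting(n_sources=1))
```

Such a helper is only a reading aid for the taxonomy: real problems may need further distinctions that these three rules do not capture.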
The feasible transfer scenarios in this context can be divided as follows [4]:
• Transfer in the Identical Machine (TIM). In this transfer scenario, the source and target domain data are collected on the same machine but under different operational conditions or working environments.
The data for the source and target domains in this transfer scenario are collected from different but related machines. Compared to the TIM scenario, these data are more complex because of differences in machine specifications, structures, and so on. Hence, there is a significant data distribution discrepancy between the source and target domains. The intuitive motivation for this scenario is that the probability of all failure modes having occurred during the previous operation of the target machine is low; hence, gathering historical fault data from different related machines is a good option.
• Transfer from Laboratory to Real Machine (TLRM).
Within this scenario, the source domain data is obtained from a laboratory machine to enable the identification of real-world machine fault modes in the target domain. This scenario is motivated by the fact that modeling failure modes in the lab is simpler, safer, and cheaper than gathering faulty data from a real-world machine.
• Transfer from Virtual to Real Machine (TVRM).
Within this transfer scenario, the source domain data is collected from a machine's virtual model to provide transferable maintenance information for the target machine. The primary reason for this scenario is that, in real-world applications, the historical faulty data offered by physical machines can be limited. However, digital models, which can reflect the fundamental behaviors and principles of physical machines, can generate a significant amount of labeled data under different health and operating conditions.
In this subsection, the most relevant methodologies and applications of TL are briefly discussed on the basis of the characteristics and taxonomy depicted in Fig. 5.

1) APPROACH CATEGORIZATION
Table 5 summarizes the surveyed papers based on the five primary TL approaches. The distribution of the references based on the approach categorization is summarized in Fig. 6. As expected, due to its capacity for projecting data into a shared feature space, where cross-domain inconsistencies may be minimized, feature-based TL is currently the most popular approach in cross-domain predictive maintenance applications. In the following, we detail the different TL approaches and relate them to the different PdM applications.
• Feature-based. This approach tackles the mentioned problems by finding common features between domains in the latent space or by decreasing the discrepancy between domain distributions through transferring features from the source to the target domain. Feature-based methods are frequently explored in the TDM scenario of TL because of their capacity to rectify significant cross-domain discrepancies. For feature-based TL, there are two mainstreams. The first methodology, known as moment matching-based, reduces distribution shift by measuring the statistical discrepancy with methods such as maximum mean discrepancy (MMD) and correlation alignment (CORAL). The second methodology is referred to as adversarial learning-based, where two parties compete to align the distributions. Adversarial-based TL approaches, inspired by the adversarial learning process, have received much attention as a growing trend. Adversarial-based techniques can be split into two groups according to the strategy adopted. The first is a generative-based technique [191], in which the primary idea is to use source data to build synthetic target data with ground-truth annotations and then use the synthesized target data to enable cross-domain tasks. The second technique is the adversarial adaptation-based strategy [192], which uses a domain discriminator to adapt the representation distributions of the source and target domains. In this framework, Tong et al. [29], for instance, developed a feature-based transfer learning method to deal with the deterioration of fault diagnostic performance under variable operating conditions of the bearings. A deep feature-based transfer network based on Linear Discriminant Analysis (LDA) and weighted MMD was suggested by Wang et al. [64] for fault diagnosis of chemical processes. Different from existing research, for rolling bearing fault diagnosis, Li et al. [37] provided a new feature-based transfer network based on multi-layer and multi-kernel MMD to reduce the domain shift problem. Wu et al. [91] proposed a deep transfer maximum classifier discrepancy approach to align the distributions of auxiliary samples produced by Batch-Normalized Long Short-Term Memory (BNLSTM) with unlabeled target domain data.
• Parameter-based. This is also known as pre-trained model-based transfer and involves fixing and transferring part of the parameters of a network trained in the source domain while fine-tuning the remaining parameters in the target domain.
• Relational-based. The goal is to find the correlation among data in the source domain and then apply that knowledge to the target domain task. Unlike the other three transfer methods, relational-based TL approaches do not require the source data and the target data to be independently and identically distributed. As a result, relational-based approaches are far more adaptable and robust than other approaches. Furthermore, most of these approaches are built on statistical learning techniques. However, according to Table 5, relational-based TL is rarely exploited in the PdM field. Zhu et al. [182], for example, presented a flexible TL framework for transferring information from both a qualitative and a quantitative perspective for monitoring the batch process. First, a statistical pattern clustering technique is designed for evaluating and distinguishing similar conditions. In addition, a multiphase Bayesian network is built with nominal representations, enabling qualitative knowledge transfer and statistical modeling.
• Hybrid-based. The hybrid techniques refer to the case in which at least two approaches are integrated to perform one single functional block of a transfer and the combined models cooperate to achieve their outputs. Integrating multiple TL methods into a hybrid approach provides an effective means of using historical information, but it also introduces new challenges. Moreover, despite its benefits, the majority of surveyed studies (as summarized in Fig. 6) focus on a single type of TL approach, such as feature-based or parameter-based TL, and only a few studies exploit hybrid-based transfer learning approaches. For instance, Hybrid Transfer Learning (HTL) has recently been proposed by Ma et al. [184] for predicting the behavior of Proton Exchange Membrane Fuel Cells (PEMFCs) based on intercell differences. Two types of TL, instance-based and parameter-based, are used to extract as much information as possible from past data and prior models to increase the similarity of the generated curve. Sun et al. [183] presented a deep hybrid-based TL network based on a Sparse Autoencoder (SAE) and three transfer techniques, namely weight transfer, feature TL, and weight updating, to transfer prognostic knowledge from a trained SAE to a new domain. These methods enable prediction in a new domain without the need for labeled target data. For sparse target data, Han et al. [185] proposed a framework based on a hybrid approach, whose overall procedure is divided into two parts: multiple adversarial domain adaptation and supervised fine-tuning. In order to overcome the open-set TL challenge in machinery fault diagnosis, Zhang et al. [186] employed a hybrid model based on adversarial learning and instance-level weighted techniques to capture generalized features and represent the similarities of testing samples with known health conditions. Li et al. [189] exploited both feature-based and parameter-based TL strategies in their proposed Knowledge Mapping-based Adversarial Domain Adaptation (KMADA) structure, so as to achieve fast convergence and satisfactory outcomes.
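To make the moment matching-based idea discussed for feature-based TL more concrete, the following minimal sketch computes the two discrepancy measures named above: a Gaussian-kernel MMD estimate and the CORAL distance between feature covariances. It is an illustration under stated assumptions rather than the implementation used in any of the surveyed papers: the function names, the bandwidth value, and the toy feature vectors are hypothetical, and the biased single-kernel MMD estimator is chosen only for brevity.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                bandwidth: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two feature samples."""
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))


def covariance(xs: Sequence[Vector]) -> List[List[float]]:
    """Sample covariance matrix (denominator n - 1) of the rows in xs."""
    n, d = len(xs), len(xs[0])
    means = [sum(x[j] for x in xs) / n for j in range(d)]
    return [[sum((x[i] - means[i]) * (x[j] - means[j]) for x in xs) / (n - 1)
             for j in range(d)] for i in range(d)]


def coral_loss(source: Sequence[Vector], target: Sequence[Vector]) -> float:
    """CORAL distance: squared Frobenius norm of C_s - C_t, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    frob_sq = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return frob_sq / (4.0 * d * d)


# Hypothetical 2-D features: a source sample, a mean-shifted copy, and a rescaled copy.
source = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]]
shifted = [[a + 1.0, b + 1.0] for a, b in source]
scaled = [[2.0 * a, 2.0 * b] for a, b in source]

print("MMD^2 source vs source :", mmd_squared(source, source))
print("MMD^2 source vs shifted:", mmd_squared(source, shifted))
print("CORAL source vs shifted:", coral_loss(source, shifted))
print("CORAL source vs scaled :", coral_loss(source, scaled))
```

In a feature-based TL model, a term of this kind is typically added to the training loss so that the learned source and target features move closer together. Note that CORAL only compares second-order statistics, so a pure mean shift leaves it close to zero, whereas the MMD estimate reacts to it.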

2) PROBLEM CATEGORIZATION
In the following, the surveyed papers are also classified and discussed based on the different problem settings in the source and target domains.
• Label-setting Categorization. The categorization of the selected papers on the basis of the label setting is summarized in Table 6. The results in Fig. 7 clearly show that inductive and transductive TL account for a significant body of contributions, which can be explained by the fact that inductive and transductive TL are natural extensions of the very popular supervised learning approaches. However, ML tasks are often unsupervised in industrial applications, especially when monitoring complex industrial processes, since the number of possible faults is practically uncountable due to the enormous number of parts and interacting components [193]. In addition, because industrial systems are designed to be reliable, obtaining a labeled dataset with sufficient samples of each potential fault, as the source or target domain, is often impossible in practice. It follows that transductive and inductive algorithms often cannot be exploited in practice, since they are not suited to handling conditions in which the source and/or target domains are unlabeled. Unsupervised TL, on the other hand, can deal with the absence of labeled data in both the source and target domains. It could therefore be very useful for dealing with unique and special tasks for which there is not enough labeled data in both the source and target domains. Despite its great benefits, unsupervised TL is still not popular in the PdM field, since learning good representations from a large set of unlabeled data is a particularly challenging task, which makes its application to real-world tasks difficult [194]. Among the few works focusing on Unsupervised Transfer Learning (UTL), it is worth noting the novel framework proposed by Mao et al. [102] for online early fault detection via UTL.
The approach aims to improve the resilience of the detection model, resulting in a much lower false alarm rate, by combining robust state evaluation and coupled adversarial training of deep domain adaptable neural networks. Gabriel et al. [104] proposed a new framework for solving one-class classification issues. The approach is intended to detect anomalies in fleets of machines from the same manufacturer that are monitored by comparable sensors but experience domain shifts due to differences in system setup, operation, or environment. However, the main drawback of this configuration is related to the difficulties arising from training the domain discriminator in an imbalanced configuration.
• Source-setting Categorization. Table 7 summarizes the selected papers based on source-setting categorization, while their distribution in Fig. 8 shows that most of the publications (namely, 88%) focus on single-source TL. Mainly for the sake of simplicity, current transfer learning-based fault diagnostic approaches mostly rely on transferring maintenance knowledge from one source domain to the target domain. Moreover, the accessibility of significant amounts of labeled training data from a single source allows for achieving excellent performance and testing accuracy. Nevertheless, it is sometimes impossible for a single source to provide adequate labeled data for developing an efficient data-driven predictive maintenance model in real industries. Conversely, it can sometimes be easier to collect labeled data from many different sources with comparable operating machines in order to expand the training dataset at a lower cost. Further advantages also arise. First, by leveraging all the independent data collected from several source domains, the model will be able to transfer more comprehensive and general diagnostic and prognostic knowledge. Second, the risk of over-fitting can be alleviated with more training samples, which could provide favorable performance in the target domain [105]. When shifting among multiple domains, even though greater data exploration could theoretically lead to higher model performance, in practice it is hard to align the distributions between all source domains and the target domain. More specifically, as illustrated intuitively in Fig. 9, it is very challenging to completely remove the shift between a single source and target domain. Hence, when attempting to align multiple source domains with the target domain, a considerable degree of mismatch commonly arises, which could adversely affect model performance. This further explains the distribution in Fig. 8. Wen et al.
[36] conducted analytical experiments to show that the third dataset XTD might influence the DTL's final prediction accuracy. More specifically, they show that the prediction accuracy improves when the third dataset is more similar to the target dataset than the source dataset. Note that transferring from multiple operating conditions to a single operating condition might also have a negative impact, as shown in Zhang et al. [149]. Herein, the authors attempted to explain this negative effect by comparing sensor monitoring data values. In particular, they investigated which factors influence the impact of TL under various operating conditions, and two factors have been identified so far. First, sensor data gathered under multiple operating conditions are more complex than data collected under a single condition. Second, their sensor value distribution differs from that of sensor monitoring data with a single operating condition. As discussed in [68], domain-invariant representations are difficult to understand when single-source domain adaptation is employed to explain the distribution of data obtained under various working conditions. To tackle this issue, Wen et al. [68] applied a technique based on multi-feature spatial domain adaptation. To achieve more accurate homogeneous transfer knowledge in the TIM scenario, Tian et al. [142] developed a multi-source subdomain adaptation transfer learning approach by employing a multi-branch network structure and a local MMD method. Multiple-source domain adaptation for machinery fault diagnosis has also very recently been addressed by Wei et al. [110] via a Weighted Domain Adaptation Neural Network (WDAN), which leverages some criteria to decide, before starting the training, whether to conduct domain adaptation or traditional supervised learning, in order to prevent negative transfer. However, these works are mostly focused on feature transferability and neglect the impact of sample transferability on domain adaptation.
To this end, Shi et al. [116] introduced a unique unsupervised MDA-based TL scheme called Multisource Domain Factorization Network (MDFN), which learns generic diagnostic information from numerous sources and then applies it to diagnosing the target task. The proposed framework employs transferability-based entropy penalties and shared-space component analysis methods to deal with negative transfer from the perspectives of instance transferability and feature representation. However, the theoretical frameworks proposed in the above studies are mainly designed under the restrictive assumption that both source and target domains have identical feature and label spaces. This is often unfeasible, since the health condition sets in the source and target domains are usually different in a real industrial environment. To deal with this issue, Li et al. [90] instead developed a novel deep learning-based heterogeneous TL technique for diagnostics, in which diagnostic information, obtained from adequate labeled data of multiple rotating machines, is transferred to the target machine. However, the real-world application of the approach is limited, since the required labeled target training samples are usually unavailable in practice. Finally, in order to address the issue of data imbalances across health states in multi-source TL scenarios, Yang et al. [117] introduced a network that combines several partial distribution adaptation sub-networks and a multi-source diagnostic knowledge fusion module to collect and leverage diagnostic knowledge from multiple source machines.
• Space-setting Categorization. In Table 8, the surveyed publications are classified based on the space-setting categorization. Currently, due to ease of application, most transfer learning methods operate under the assumption that the feature spaces of the data in the source and target domains are represented by the same attributes [197].
This is also reflected in the PdM literature, as highlighted by the results in Fig. 10, where 75% of the TL studies leverage homogeneous transfer learning. Nevertheless, regardless of the progress and achievements of homogeneous TL methods for PdM, there is still a challenge that cannot be neglected. Indeed, homogeneous approaches still rely on a demanding assumption: they require the source and target datasets to share the same label space, that is, the health condition sets in the two domains must be identical.
In real-world industrial applications, this assumption may not always be fulfilled, since collecting data under all possible machine health conditions is very difficult. As shown in Fig. 11(a), since homogeneous or closed-set TL assigns equal weights to the source and target instances, cross-domain data contribute equally to the adaptation process. It follows that most of the existing homogeneous TL techniques are exposed to negative transfer, because they attempt to align all data, even outlier samples.
In real-world industrial applications, since maintenance should be both effective and efficient, it is not reasonable to postpone equipment monitoring until data of the same categories are obtained from the source domain and adapted to the target domain. The situation in which the target label space is a subset of the source label space is named the partial transfer learning problem.
As a result, compared to the scenario for which the closed-set TL is designed, this configuration is closer to engineering practice. This kind of TL is depicted in Fig. 11(b). With the aim of handling the partial TL problem in machinery fault diagnostics, Li et al. [95] developed a novel deep TL model based on class-weighted adversarial networks. Liu et al. [190] developed a deep partial adversarial domain adaptation network based on an SAE algorithm, to weigh and recognize common instance types from mixed source domain instances.
To achieve a broad diagnostic model in partial transfer applications, Deng et al. [109] introduced a Double-layer Attention-based Generative Adversarial Network (DA-GAN) to find the target label space that should be considered for the transfer task and to determine which samples each sub-domain discriminator should focus on. Finally, to overcome the challenges arising from unbalanced data across the health states of the target domain and from the heterogeneity of the label space in the partial mode, Yang et al. [78] presented an adversarial adaptation model called Deep Partial Transfer Learning Network (DPTL-Net) for machinery fault diagnosis. Often in practice, as shown in Fig. 11(c), new fault modes that are not present in the source classes may appear in the testing phase. A key factor in dealing with this situation might be finding source data with the same classes as the target data, so that these data may be used for domain adaptation and classifier training. Nevertheless, because the target data are usually fully unlabeled and the target label space is unknown, finding source data associated with the target label space might be challenging. This challenging scenario, in which the source domain's label space is a subset of the target domain's label space, is named open-set transfer learning in the technical literature. It is worth noting that transfer learning approaches that rely on marginal distribution alignment cannot achieve class-level diagnostic knowledge transfer within the open-set transfer learning context, due to target outlier classes, which often degrades the adaptation capacity of the model. Despite its relevance, however, the open-set transfer learning problem has received only little attention in the technical literature. Zhang et al. [186] tackled this problem via a deep learning-based adversarial training strategy for extracting domain-invariant features from the source and target domains.
In detail, an instance-level weighted technique is suggested to identify the target outlier classes, representing the similarities of the target instances with the source classes.
Both partial and open-set transfer learning techniques demand prior knowledge of the label spaces across domains, which implies that these approaches are suitable for dealing with off-line cross-domain problems. However, in a more general PdM scenario, obtaining the relationship between the source and target label spaces in advance is not feasible. Hence, a more difficult scenario, known as universal transfer learning, needs to be tackled. In this scenario, no prior knowledge about machine faults can be obtained in advance for the target domain, which means that testing machines may exhibit both faults known from the source domain and unknown faults (as shown in Fig. 11(d)). Note that this universal transfer learning scenario commonly arises when implementing predictive machine maintenance in practice. To address this challenging problem, Zhang et al. [187] exploited a deep adversarial learning technique so as to bridge domain gaps across various operating conditions. In more detail, source class-wise and target instance-wise weighting methods have been suggested for selective domain adaptation. In addition, novel outlier identifier and reconstructor modules have been included to discover unknown fault modes while preserving data information during processing.
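The label-space relationships that distinguish the closed-set, partial, and open-set settings described above can be summarized with a few set comparisons. The sketch below is only illustrative: the function name, the returned strings, and the fault labels are hypothetical, and the "mixed" outcome simply marks the case in which both domains contain private classes. In the universal setting the relationship is not known in advance, so a check of this kind is only possible in retrospect, e.g., on a fully labeled benchmark.

```python
from typing import Iterable


def label_space_setting(source_labels: Iterable[str],
                        target_labels: Iterable[str]) -> str:
    """Name the transfer setting implied by two sets of health-condition labels."""
    source, target = set(source_labels), set(target_labels)
    if source == target:
        return "closed-set"   # identical health-condition sets
    if target < source:
        return "partial"      # target label space is a proper subset of the source
    if source < target:
        return "open-set"     # source label space is a proper subset of the target
    return "mixed"            # both domains hold classes the other one lacks


# Hypothetical bearing health conditions used only for illustration.
lab_faults = ["normal", "inner_race", "outer_race", "ball"]

print(label_space_setting(lab_faults, ["ball", "normal", "inner_race", "outer_race"]))
print(label_space_setting(lab_faults, ["normal", "outer_race"]))
print(label_space_setting(["normal", "inner_race"], lab_faults))
print(label_space_setting(["normal", "inner_race"], ["normal", "cage"]))
```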

3) APPLICATION CATEGORIZATION
In the following, the selected papers are classified and discussed based on the different application settings in the source and target domains.
Depending on the application scenario, the selected papers have been classified in Table 9. It is important to note that, in some works, the authors validated their proposed framework in more than one transfer scenario. The results in Fig. 12 confirm that, as expected, the majority of the studies validated their proposed approach in the simplest scenario, i.e., TIM. The main weakness of these studies is that only TL within a single machine is investigated, so maintenance knowledge is transferred only from one operating condition to another. In the real world, instead, transferring maintenance knowledge from data collected on different related machines is the most realistic scenario for PdM [4]. Moreover, within this challenging scenario, additional factors besides the working conditions, including machine specifications and characteristics, can affect the discrepancy between the domains' data distributions. Along this line, Chen et al. [161] developed a transferable diagnostic model based on the CNN algorithm to enhance the performance of a target machine fault diagnosis model by using information obtained from different machines (historical data). However, in practice, when dealing with transferable maintenance scenarios, only normal target domain samples are generally available, and this could significantly influence the performance of the proposed approach. To tackle this issue, Zheng et al. [47] developed an innovative feature-based TL approach for fault diagnosis of gearbox machines, named Transfer Locality Preserving Projection-based Intelligent Fault Identification (TLPPIFI). However, it is sometimes difficult to obtain enough labeled data, even from other real-world machines, because real-world machinery is typically maintained in good condition and failures are unusual.
As a result, obtaining real faulty data is more difficult and time-consuming than collecting real normal data: whereas real normal data is typically sufficient, real faulty data is sometimes inadequate. Furthermore, the fault types are often unknown while real-world machines are running. It is impracticable to stop machines on a regular basis and then evaluate their health conditions based on the data obtained. Additionally, when data quantities grow at a fast rate, manual data labeling becomes an ineffective solution due to significant human labor costs and a high dependency on expert knowledge [4].
In order to address this problem, introducing various fault types into a laboratory machine and then acquiring a large amount of labeled faulty data might be a feasible solution. The data collected from laboratory and real-world machines, on the other hand, have dramatically different distributions, which are affected by a number of factors, such as the measurement environment, working conditions, fault characteristics, damage mode, and others. As a result, maintenance models developed exclusively on data collected from laboratory machines may not perform well when applied directly to real-world machinery. As shown in Fig. 12, only a few studies have been devoted to a deeper understanding of the above-mentioned TLRM scenario, and its related issues, via TL. For instance, with the aim of identifying the health conditions of wind turbine bearings, Lv et al. [70] introduced a deep transfer model based on a multi-kernel dynamic distribution adaptation method to transfer diagnosis knowledge from a laboratory bearing. Furthermore, by using data gathered in labs from gearbox and motor bearings, Yang et al. [39] introduced a feature-based transfer model based on a neural network to diagnose the health conditions of locomotive bearings in real-world applications. Despite their remarkable success, these studies ignore the changes in the class weights of the target machine. Due to the controlled experimental circumstances, most lab machine datasets are balanced, but in real-world industrial applications, a considerable amount of training data belongs to normal conditions. To address this problem, Cao et al. [79] developed a pseudo-categorized MMD that takes into account the class weight bias in the real-case machine dataset and exploits the category probability vector as a penalty term in the MMD.
However, in other cases, access to historical fault data generated by physical equipment during a machine's real-world application may be restricted, or obtaining faulty data through a laboratory machine may be too expensive. In all these cases, digital models able to capture the underlying rules and behaviors of real-case machines can produce the required large amounts of data under different operating and health conditions. Nevertheless, even though both the virtual and real models are centered on the same machine health condition, there is always a distribution difference between them, for a variety of reasons. To begin with, it is practically unfeasible to integrate all features of a real-world machine into a digital model; as a consequence, certain simplifications must be made, and only the primary components are often included. Besides, digital modeling cannot fully account for the presence of uncertainty, noise, and random environmental influences. Finally, the distribution of vibration signals in real-world machines is affected by various factors (e.g., sensor type, calibration, installation and fixation technique, drift, structure transmission properties, and so on) that cannot be simulated in a digital model, e.g., via a simplified mathematical representation such as dynamical systems or differential equations. In a nutshell, addressing the distribution discrepancy between digital and real-world model data is critical for designing a predictive maintenance model that performs well on real-world machines [4]. However, as shown in Fig. 12 [182].
• Tasks Categorization. The surveyed publications have also been classified on the basis of the application tasks in Table 11. Herein, the results in Fig. 14 confirm that only very few studies currently focus on detection and prognosis tasks, since gathering the time-series data needed for prognosis purposes in predictive maintenance is much harder and more time-consuming than gathering the data for classification problems.

C. DATASETS TO APPLY TL WITHIN PdM (SQ3)
High-quality data is the foundation for the implementation of TL approaches. Therefore, to design successful TL algorithms for PdM, a good collection of datasets is necessary. As already pointed out, collecting a dataset from real machines is a very time-consuming process, since natural fault degradation is slow and can take years.
To tackle this issue, in some experiments, researchers gather data by employing components with artificially induced faults or accelerated life cycle approaches. Nevertheless, data collection is still difficult and costly and, hence, several organizations have made their fault databases accessible to engineers and researchers, so that they can be exploited by the scientific community. These datasets can also be used as a baseline platform for evaluating and comparing different approaches, since their widespread use makes them a common ground for scientists. This section provides a brief summary of the open-source datasets in Table 12, where the sensor type of each dataset is shown in the second column and the number of sensors for each dataset is given in the third column, while the remaining four columns refer to the sampling frequency, monitoring object, fault mode, and task of each dataset, respectively.

D. OPEN ISSUES, CHALLENGES AND OPPORTUNITIES OF TL FOR PdM (SQ4)
The potential of transfer learning is becoming increasingly apparent, and many researchers believe that this technique can substantially improve the predictive maintenance industry [19]. However, considerable challenges must still be tackled and solved before it reaches its full potential. The most important concerns, challenges, and opportunities are briefly discussed below.

1) NEGATIVE TRANSFER
Transfer learning has been used to address the problem of a lack of training data in a target domain by leveraging knowledge from one or more source domains. Transferred knowledge, on the other hand, may not always have a positive impact on the target domain's task, since its effectiveness depends on several assumptions [4]. In particular, any violation of the following assumptions may result in negative transfer (NT). First, the tasks in both domains should be related or similar. Second, the distributions of data across the domains should not be too different. Lastly, a proper TL technique should be used. However, despite the importance of the issue, just a few studies have described it, and only informally, without proposing a comprehensive definition, extensive analysis, or systematic treatment [190]. As a result, the questions of what characterizes the formulation of NT, what attributes contribute to NT, and how to prevent or minimize NT remain unresolved.
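Although, as noted above, no comprehensive definition of NT is available, one informal criterion that is often adopted is to compare the target-domain error of a model trained with transferred knowledge against a baseline trained on target data only. The sketch below illustrates this comparison; it is an assumption made for illustration and not a formulation proposed in the surveyed works, and the function names, the tolerance parameter, and the error values are hypothetical.

```python
def negative_transfer_gap(error_with_transfer: float,
                          error_target_only: float) -> float:
    """Target-domain error with transfer minus error of a target-only baseline.

    A positive value means that the transferred knowledge hurt the target task.
    """
    return error_with_transfer - error_target_only


def is_negative_transfer(error_with_transfer: float,
                         error_target_only: float,
                         tolerance: float = 0.0) -> bool:
    """Flag negative transfer when the gap exceeds a chosen tolerance."""
    return negative_transfer_gap(error_with_transfer, error_target_only) > tolerance


# Hypothetical target-domain error rates of two diagnosis models.
gap = negative_transfer_gap(error_with_transfer=0.12, error_target_only=0.08)
print("gap:", gap, "negative transfer:", is_negative_transfer(0.12, 0.08))
```

Such a comparison requires a labeled target test set and a meaningful target-only baseline, which may not be available in the unlabeled settings discussed earlier.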

2) UNSUPERVISED MULTIPLE SOURCE TRANSFER
As shown in Fig. 7, the current achievements of TL are largely focused on inductive and transductive TL. However, collecting a labeled dataset with sufficient samples of each possible fault type is quite challenging in practice. Furthermore, in real-world applications, the unlabeled data might come from multiple sources, such as various heterogeneous, related machines [117]. For these reasons, unsupervised multiple-source TL, aimed at accomplishing maintenance tasks by leveraging heterogeneous, unlabeled source and target data, seems to be the most promising and significant research direction in the very near future for dealing with industrial applications in practice. Hence, it is crucial to understand and identify which factors influence the success of a TL model when exploiting knowledge from multiple unlabeled heterogeneous datasets.

3) UNSEEN TARGET DOMAINS
Despite the fact that TL has demonstrated good performance in PdM, most of the developed models are only applicable to off-line PdM tasks due to the following assumption: during the training process, target domain data must be readily available, and adaptation models must be specifically trained with target data before being executed on target machines. This assumption can restrict the adaptation of existing approaches to unseen target domains [4]. To overcome this limitation, it is essential to focus on the more practical scenario in which the target domain data is inaccessible. This is compatible with real-time cross-domain PdM and has more applicability to real-world machines. One possible solution to this challenge is to generalize the maintenance knowledge of several source domains to an unseen target domain based on the meta-learning technique [198].
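The unseen-target setting described above can be emulated on existing multi-condition datasets by holding out one domain entirely during training. The sketch below shows such a leave-one-domain-out protocol; it is only a possible evaluation scheme under stated assumptions and not the meta-learning technique itself, and the function name and the operating-condition labels are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_domain_out(domains: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (source_domains, unseen_target) pairs, one per held-out domain.

    The held-out domain is never used for training or adaptation, which
    mimics a target machine whose data is inaccessible beforehand.
    """
    for index, target in enumerate(domains):
        sources = [d for i, d in enumerate(domains) if i != index]
        yield sources, target


# Hypothetical operating conditions of a bearing test rig.
operating_conditions = ["load_0", "load_1", "load_2"]
splits = list(leave_one_domain_out(operating_conditions))

for sources, target in splits:
    print("train on", sources, "-> evaluate on unseen", target)
```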

4) DIGITAL TWINS
A Digital Twin (DT) consists of a virtual model that is constantly updated to reflect the status of its physical counterpart. This technique allows engineers to collect a huge amount of useful component run-to-failure data [199].
When historical fault data of the physical equipment is unavailable, the combination of TL and DT is a very promising approach worth investigating, as the target domain can be dynamically augmented with virtual model data [173]. Despite this potential, as shown in Fig. 12, only a few studies have focused on the TVRM scenario, and there is no consensus or consolidation on how DT might be employed in TL applications. Another challenge that needs to be tackled is how engineers can adaptively update the operating conditions of the DT to match the new operating conditions of the target machine.

5) CROSS-MODALITY TRANSFER LEARNING
Most TL approaches require some connection between the feature spaces of the source and target domains, and knowledge transfer is only feasible when the source and target data (such as image, audio, or text) share the same modality. However, within Industry 4.0, different data sources (such as operation and maintenance logs, sensor measurements, and design documents) can be used, and these might also provide valuable information for the implementation of PdM models [9]. It follows that Cross-Modality Transfer Learning (CMTL) is an emerging topic in PdM that aims to cope with scenarios where the feature spaces of the two domains are entirely distinct, e.g., when transferring knowledge from text to image, or from audio to text. In the future, it will therefore be critical to investigate how maintenance knowledge can be transferred across modalities.

6) TRANSFER LEARNING FOR PROGNOSIS TASK
As illustrated in Fig. 14, the majority of current TL research in the PdM field targets fault diagnosis. Only a few studies focus on prognosis, and they provide very preliminary results. As a consequence, the use of TL for prediction, decision-making, and proper scheduling of maintenance work is still a crucial open issue.

VII. CONCLUSION
The goal of this paper has been to explore the main concepts, concerns, and potential of TL in the context of PdM. To that end, an overview of PdM has been introduced and discussed together with the challenges that traditional ML and DL algorithms face in real-world PdM applications, which motivate the need for TL. Through an SLR, we selected 168 recent journal publications that we further classified and analyzed. A taxonomy of TL in the PdM context was first introduced with respect to three different problem settings: label, space, and source. The selected papers were then categorized and reviewed on the basis of this taxonomy. Moreover, this survey has identified the real-world PdM applications that have already profited from TL, as well as the open-source datasets that help researchers validate and compare their approaches to cross-domain PdM challenges. Finally, open issues, challenges, and future research directions have been highlighted. We believe the SLR provided in this paper represents a useful reference summarizing SOTA approaches, as well as a source of hints for developing and deploying TL for PdM in both academic research and industrial settings.