Unsupervised Transfer Learning Method via Cycle-Flow Adversarial Networks for Transient Fault Detection under Various Operation Conditions

The efficient fault detection (FD) of traction control systems (TCSs) is crucial for ensuring the safe operation of high-speed trains. Transient faults (TFs) can arise from prolonged operation and harsh environmental conditions, and they are often masked by background noise, particularly under dynamic operating conditions. Moreover, acquiring sufficient samples across all operating scenarios is difficult, which results in imbalanced data for FD. To address these limitations, an unsupervised transfer learning (TL) method via federated Cycle-Flow adversarial networks (CFANs) is proposed to effectively detect TFs under various operating conditions. Firstly, a CFAN is designed to extract latent features and reconstruct data in the source domain. Subsequently, a TL framework employing federated CFANs jointly calibrates the knowledge shift caused by changes in the domain. Finally, the federated CFANs perform transient FD by constructing residuals in the target domain. The efficacy of the proposed method is demonstrated through comparative experiments.


Introduction
High-speed trains have emerged as one of the most crucial components of intelligent transportation systems. Traction control systems (TCSs), which serve as the core power systems of high-speed trains, are closely linked to the trains' reliability and safety. However, they are also a major source of faults during long-term operation and in harsh operating environments. Consequently, fault detection and diagnosis (FDD) has become an active area of research over the past few decades [1][2][3].
Currently, FDD methods for high-speed trains can be broadly categorized into three groups: model-based, signal-based, and data-driven approaches. Model-based methods are accessible and produce FDD results efficiently, but they are difficult to establish because of practical uncertainties and complex system designs. Signal-based methods are of limited effectiveness in detecting minor symptoms, particularly in dynamic scenarios [4].
Meanwhile, owing to the widespread deployment of sensors in complex systems, data-driven methods have been extensively advocated for FDD tasks because they can effectively process massive volumes of data [1][5][6][7]. In [8], the authors proposed a discriminative stacked autoencoder (D-SAE) network based on feature integration boosting for bearing fault diagnosis. This method mitigated performance degradation and enhanced generalization ability in various scenarios. Ref. [9] proposed an innovative fault detection (FD) method for bogies. In this study, a Monte Carlo-based perturbation technique is employed to amplify the distinction between unexpected faults and known ones. Consequently, FD outcomes for unexpected faults can be obtained using dropout-based Bayesian deep learning. The authors of [10] proposed a fault diagnosis method for braking friction based on a one-dimensional convolutional neural network (1DCNN) and the GraphSAGE network. This approach addresses the challenge of imbalanced fault samples by considering the correlation between different fault features. In addition, ref. [11] presented an incipient FDD method for running gear systems that leverages the Hellinger distance and slow feature analysis.

Sensors 2024, 24, 4839 2 of 23
The aforementioned FDD methods primarily address permanent faults (PFs) in mechanical components or systems. However, transient faults (TFs), a type of incipient fault, can develop into PFs and are responsible for most failures observed in electronic devices such as power electronics, sensors, and traction control units (TCUs) within TCSs.
In complex industrial systems, multiple FDD methods have been developed specifically for TFs [12][13][14][15][16][17][18]. Ref. [12] assesses and demonstrates the ability of a bulk built-in current sensor (BBICS) architecture to detect multiple simultaneous TFs in integrated circuits. Ref. [14] studies fault tolerance in switching reconfigurable nano-crossbar arrays, considering both TFs and PFs. In [15], an innovative ontology-based fault propagation analysis approach (ontologyFPA) is proposed to analyze transient fault propagation effects in networked control systems (NCSs). Ref. [17] presents a TF detection and classification approach for power transmission lines based on graph convolutional neural networks. In [18], an optimal fractional-order method is proposed for TF diagnosis, which suppresses background noise and amplifies the faulty part of the signal. Kurtosis and the fault duration time are then used to locate the faulty component.
However, the methods mentioned above operate under static conditions or a single fixed operating condition and do not address dynamic cases [4]. Different operating conditions may lead to significant distribution differences, so an intelligent FDD model trained on data from one operating condition is usually not applicable to other operating conditions [19,20]. Traditional deep learning approaches require many samples from diverse operating conditions for effective model training. In practice, however, a TCS typically operates under steady-state conditions, which results in imbalanced data distributions across operating conditions [21].
The primary challenges in TF detection are as follows:
1. TFs exhibit sporadic and stochastic behavior, causing non-permanent damage that disappears unpredictably.
2. The distribution of samples across different operating conditions is imbalanced, and faulty samples in particular are significantly underrepresented.
3. The features of TFs are inherently weak and can easily be overshadowed by background noise, especially in dynamic scenarios.
These characteristics make TFs challenging to detect. In this context, transfer learning (TL) has been extensively discussed as a means of extracting latent feature information and achieving precise FD under dynamic operating conditions. TL aims to improve performance in a target domain by leveraging the knowledge embedded in different but related source domains, which reduces the amount of target-domain data needed to construct target learners [21,22].
Several FDD methods with TL have been developed for electrical systems. Ref. [23] proposes an FD method for traction converter faults in traction drive systems. The method is built on a federated neural network based on a variational autoencoder (VAE) and can perform the FD task under performance degradation. The authors of ref. [24] developed a hierarchical method for transformer rectifier unit (TRU) fault diagnosis, together with a TL-based fault diagnosis method that does not require training new models for different TRUs. In [25], a novel transferable open-circuit fault diagnosis method is proposed for insulated gate bipolar transistors in three-phase inverters. It can be applied to different systems with the same topology but different parameters. The authors of ref. [26] developed an adversarial deep TL model that can detect and classify short-circuit faults in DC microgrids without using historical fault data. Ref. [27] proposes a TL-based fault location method for voltage source converter-based high-voltage direct current (VSC-HVDC) transmission lines, which can locate faults using small training datasets. However, transient FD for TCSs under dynamic operating conditions remains an open problem that urgently needs to be solved.
Motivated by the discussions above, we propose a TL strategy to detect transient faults in TCSs under various operating conditions. In the proposed method, a Cycle-Flow adversarial network (CFAN) is first constructed for latent feature extraction and data reconstruction under steady operating conditions. Secondly, a TL framework with federated CFANs jointly calibrates the information changes caused by varied operating conditions. These two steps learn and preserve knowledge of normal operation. Finally, the federated CFANs construct residuals from the measured data to perform transient FD under dynamic operating conditions.
The contributions of the proposed method are summarized as follows:
1. A CFAN, consisting of an invertible flow model and two discriminative networks, is proposed for latent variable extraction and data reconstruction, together with a dedicated loss function. Specifically, bidirectional optimization enhances reconstruction quality, while adversarial training and flexible inference mitigate the interference caused by background noise.
2. The proposed federated CFAN-based TL is divided into two stages. Initially, the first CFAN model is trained using normal data under steady operating conditions. Subsequently, the second CFAN uses limited data to calibrate the information changes caused by varied operating conditions. Together, the federated CFANs learn latent knowledge in the steady state and can be applied to transient fault detection under various operating conditions.
3. Simulation experiments are conducted on various transient faults, using the normal steady state of the TCS as the source domain and dynamic operating conditions as the target domain. The simulation results show that the federated CFAN-based TL method improves transient fault detection performance.
The remainder of this paper is organized as follows: Section 2 states the transient fault detection problem and introduces the basics of normalizing flows. Section 3 details the proposed TL-based fault detection strategy built on federated CFANs. Section 4 describes the data sources and presents the experimental results. Finally, conclusions and prospects are given in Section 5.

Problem Statement
The schematic diagram of the TCS is shown in Figure 1. The pantograph delivers single-phase AC power from the public grid to the transformer. The rectifier receives a lower voltage $u_n$ and current $i_n$ from the transformer and converts the single-phase AC into DC voltages ($u_{cd1}$, $u_{cd2}$) stabilized by the DC link. The inverter then outputs three-phase AC voltages/currents ($u_u/i_{sa}$, $u_v/i_{sb}$, $u_w/i_{sc}$) to drive the asynchronous traction motors. In addition, the traction control unit (TCU) receives the sensor signals and sends the gate control signals spwm and svpwm.
As the service time of high-speed trains increases, irreversible changes occur in the components of the TCS [1]. TFs caused by these changes are temporary and do not necessarily cause permanent damage. TFs are usually induced by internal structural defects and by the manufacturing processes of active components. Furthermore, noise sources such as electromagnetic interference, spark discharge, lightning strikes, and load fluctuations also contribute to TFs.
Analog signal interference in the external communication connections can cause TCU faults. Consider the three-phase currents $i_{sa,sb,sc}$. The corresponding fault currents are as follows:
$$ i^{f}_{sa,sb,sc} = i_{sa,sb,sc} + f(p, q, A) $$
where $f(p, q, A)$ represents transient pulses described by a double-exponential model. The parameters $p$ and $q$ are the time coefficients of the injected signal, which jointly determine the width, rise time, and fall time of the injected pulse, and $A$ is the amplitude coefficient of the injected signal. The control strategy compensates for such faults through closed-loop regulation, which makes them difficult to detect or diagnose using conventional methods.

Furthermore, sensor faults caused by surges in power and ground wires and by grid-side voltage fluctuations are also causes of TFs. For the U-phase current $i_u$, the fault current is as follows:
$$ i^{f}_{u} = i^{0}_{u} + \delta(t) $$
where $i^{0}_{u}$ is the current value in the normal state, and $\delta(t)$ is the short-duration pulse caused by the above factors.
In addition, soft errors of components in the traction control unit can also lead to non-permanent mutations in the sensor outputs. TFs usually appear randomly and disappear within a short period, which results in uncertainty [18].
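The two fault models above can be illustrated with a short simulation. The sketch below is a minimal illustration under stated assumptions. It is not the injection procedure of TDCS-FIB. It assumes the common double-exponential form $f(t) = A\,(e^{-(t-t_0)/p} - e^{-(t-t_0)/q})$ with $p > q$ for $t \ge t_0$, because the exact expression is not given in the text. It also adds the same pulse to all three phase currents, models $\delta(t)$ as a rectangular spike, and treats the first phase column as $i_u$. The function names (`double_exp_pulse`, `inject_tcu_fault`, `inject_sensor_fault`), onset times, and signal parameters are all hypothetical.

```python
import numpy as np

def double_exp_pulse(t, t0, p, q, A):
    """Hypothetical double-exponential pulse f(p, q, A) with onset at t0.

    Assumes p > q > 0: q mainly shapes the rise, p the decay, A the amplitude.
    """
    tau = np.clip(t - t0, 0.0, None)                  # time since onset, 0 before t0
    pulse = A * (np.exp(-tau / p) - np.exp(-tau / q))
    return np.where(t >= t0, pulse, 0.0)

def inject_tcu_fault(i_abc, t, t0, p, q, A):
    """i^f_{sa,sb,sc} = i_{sa,sb,sc} + f(p, q, A); same pulse on every phase (assumption)."""
    return i_abc + double_exp_pulse(t, t0, p, q, A)[:, None]

def inject_sensor_fault(i_u0, t, t0, width, amp):
    """i^f_u = i^0_u + delta(t), with delta(t) modeled as a rectangular spike (assumption)."""
    delta = np.where((t >= t0) & (t < t0 + width), amp, 0.0)
    return i_u0 + delta

# Hypothetical healthy currents: 50 Hz, 100 A peak, sampled at 10 kHz for 0.1 s
fs, f0 = 10_000, 50.0
t = np.arange(0.0, 0.1, 1.0 / fs)
phases = np.array([0.0, -2.0 * np.pi / 3.0, 2.0 * np.pi / 3.0])
i_abc = 100.0 * np.sin(2.0 * np.pi * f0 * t[:, None] + phases)   # columns: i_sa, i_sb, i_sc

i_abc_f = inject_tcu_fault(i_abc, t, t0=0.05, p=2e-3, q=2e-4, A=30.0)
# The U-phase sensor current is taken as the first column here (assumption)
i_u_f = inject_sensor_fault(i_abc[:, 0], t, t0=0.07, width=5e-4, amp=40.0)
```

With these hypothetical values, the TCU pulse peaks at roughly two thirds of $A$ about half a millisecond after onset and then decays. This illustrates why a closed-loop controller can partly mask it.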

Preliminaries of Normalizing Flow
Normalizing flow (NF) transforms a simple probability distribution into a more complex one through a sequence of invertible and differentiable mappings, which allows exact likelihood calculation [28,29]. Therefore, NF has been widely used in image processing, denoising, and anomaly detection [30][31][32][33]. Suppose $x$ is a high-dimensional random vector with probability density function (PDF) $p_x(x)$. The latent variable $z$ is typically assumed to follow a specific distribution, usually the multivariate unit Gaussian distribution $\mathcal{N}(0, I)$, which compels the model to learn the input data distribution. Assume $z \sim p_z(z)$, where $x$ and $z$ are both $D$-dimensional. The PDF of the given data is then as follows:
$$ p_x(x) = p_z\big(f(x)\big)\left|\det\frac{\partial f(x)}{\partial x}\right| $$
The generation process can be expressed as follows:
$$ z \sim p_z(z), \qquad x = g_\theta(z) $$
where $f(\cdot)$ is an invertible function (a bijection) that transforms the random variable $x$ into $z$, and $g(\cdot)$ is the inverse function of $f(\cdot)$. That is, given data $x$, variable inference is completed by $z = f(x) = g^{-1}(x)$, and $\theta$ denotes the parameters. Therefore, (5) and (6) can be written as follows:
$$ p_x(x) = p_z(z)\left|\det J(z)\right|^{-1}, \qquad J(z) = \frac{\partial g_\theta(z)}{\partial z}, \qquad z = f(x) $$
where $J(z)$ is the $D \times D$ Jacobian matrix. As shown in Figure 2, the transformation $g$ maps the PDF $p_z(z)$ into $p_x(x)$. The absolute Jacobian determinant $|\det J(z)|$ quantifies the relative volume change in a small neighborhood around $z$ due to $g$ [34]. Based on the above, NF can represent distribution transformations of high complexity.
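The change-of-variables relations can be checked numerically on a case where the answer is known. The sketch below assumes a hypothetical two-dimensional linear bijection $x = g(z) = Wz + \mu$ with an invertible matrix $W$. The names `W`, `mu`, `g`, `f`, and `log_p_x` are illustrative only. For this choice, $p_x$ must equal the Gaussian density $\mathcal{N}(\mu, WW^{\top})$, so the flow density can be compared with a closed-form reference.

```python
import numpy as np

def log_std_normal(z):
    """log N(z; 0, I) for each row of z, shape (n, D)."""
    D = z.shape[1]
    return -0.5 * np.sum(z**2, axis=1) - 0.5 * D * np.log(2.0 * np.pi)

# Hypothetical invertible linear flow g(z) = W z + mu (D = 2)
W = np.array([[2.0, 0.3],
              [0.0, 0.5]])
mu = np.array([1.0, -1.0])

def g(z):
    """Generation direction: x = g(z)."""
    return z @ W.T + mu

def f(x):
    """Inference direction: z = f(x) = g^{-1}(x)."""
    return np.linalg.solve(W, (x - mu).T).T

def log_p_x(x):
    """log p_x(x) = log p_z(f(x)) - log|det J(z)|, with J(z) = dg/dz = W."""
    _, logabsdet = np.linalg.slogdet(W)
    return log_std_normal(f(x)) - logabsdet

# Draw data through the generation process, then evaluate its exact log-density
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 2))
x = g(z)
logp = log_p_x(x)
```

A learned flow replaces the linear map with nonlinear invertible layers. The density is still evaluated in the same three steps: map $x$ to $z$, evaluate the base log-density, and correct by the log-determinant.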
As shown in (7), and considering the forward process, a flow-based model $f(\cdot)$ can be fitted by minimizing the Kullback-Leibler (KL) divergence between the target distribution and $p_z(z)$, which can be expressed as follows: where $a$ is determined by the discretization level of the data and $M$ is the dimension of $z$.
Given $N$ samples $\{z(n)\}_{n=1}^{N}$ drawn from the target distribution, the expectation over the target distribution can be estimated by Monte Carlo as follows:
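For reference, the forward-KL objective and its Monte Carlo estimate are commonly written as follows in the normalizing-flow literature. This is a sketch of that standard form, under the assumption that the missing expressions follow it. It is not a recovered copy of the paper's equations. It writes the training data as $x(n)$, and "const" collects terms that do not depend on $\theta$, such as the entropy of the target distribution.

$$ \mathcal{L}(\theta) = D_{\mathrm{KL}}\big[p^{*}_{x}(x)\,\|\,p_{x}(x;\theta)\big] = -\mathbb{E}_{x \sim p^{*}_{x}}\Big[\log p_{z}\big(f_{\theta}(x)\big) + \log\Big|\det\frac{\partial f_{\theta}(x)}{\partial x}\Big|\Big] + \text{const} $$

$$ \mathcal{L}(\theta) \approx -\frac{1}{N}\sum_{n=1}^{N}\Big[\log p_{z}\big(f_{\theta}(x(n))\big) + \log\Big|\det\frac{\partial f_{\theta}(x)}{\partial x}\Big|_{x = x(n)}\Big] + \text{const} $$

Minimizing this estimate is equivalent to maximum-likelihood training of the flow on the $N$ samples.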

The Proposed Federated CFAN-Based Transfer Learning Strategy
Motivated by research on NF-based image processing [35], a federated CFAN-based TL strategy is proposed to detect transient faults in TCSs. NF is chosen because of its reversibility and its flexibility in modeling various distributions.
In this work, the source domain is denoted by $D_s$, where $d$ represents data. It contains the measurements taken during steady operation of TCSs. Similarly, the target-domain data are denoted by $D_t$, which contains the measurements taken during dynamic operation. The data distributions of the source and target domains differ to some extent. New knowledge can be acquired through reasonable adjustments of previous knowledge, and such a TL approach can achieve better FD performance than using the target-domain data alone [36].

Principle of CFAN
The framework of the proposed CFAN model is shown in Figure 3. In this work, the forward process of the CFAN is denoted by $\mathcal{H}$, and the reverse process is denoted by $\mathcal{H}^{-1}$. Consider source-domain samples $\{d_s(k)\}_{k=1}^{M} \in D_s$. The goal of the model $\mathcal{H}(\cdot;\theta_1)$ is to learn the latent features of the source domain, where $\theta_1$ denotes the trainable parameters. Model $\mathcal{H}$ contains two mapping functions: the forward process $\mathcal{H}$ and the reverse process $\mathcal{H}^{-1}$. In addition, two adversarial discriminative networks, $D_f$ and $D_r$, are introduced. $D_f$ aims to differentiate between $d_s(k)$ and the generated data $\mathcal{H}(d_s(k);\theta_1)$. Similarly, $D_r$ aims to distinguish between $d_s(k)$ and $\mathcal{H}^{-1}(d_s(k);\theta_1)$. $D_f$ encourages $\mathcal{H}$ to transform $d_s(k)$ into an output that is indistinguishable from $d_s(k)$ itself, and the same holds for $D_r$ and $\mathcal{H}^{-1}$.
The affine coupling layer is presented in Figure 4; the output $h(k)$ of an affine coupling layer follows Equations (16) and (17). A $B$-dimensional input $d_s(k)$ is split into two parts, $d^{1:b}_s(k)$ and $d^{b+1:B}_s(k)$, which are transformed as follows:
$$ h_{1:b}(k) = d^{1:b}_s(k) \tag{16} $$
$$ h_{b+1:B}(k) = d^{b+1:B}_s(k) \odot \exp\big(s(d^{1:b}_s(k))\big) + t\big(d^{1:b}_s(k)\big) \tag{17} $$
Finally, $h_{1:b}(k)$ and $h_{b+1:B}(k)$ are merged into one group $h(k)$.
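Taking $h(k)$ as the input and $x(k)$ as the output of the reverse process, the reverse process can be expressed as follows:
$$ x_{1:b}(k) = h_{1:b}(k) $$
$$ x_{b+1:B}(k) = \big(h_{b+1:B}(k) - t(h_{1:b}(k))\big) \odot \exp\big(-s(h_{1:b}(k))\big) $$
where $s(\cdot)$ represents the scaling function, $t(\cdot)$ represents the translation function, and $\odot$ is the Hadamard (element-wise) product.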
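Equations (16) and (17) and their reverse can be sketched in a few lines of NumPy. This is a minimal illustration. It assumes hypothetical one-layer networks for $s(\cdot)$ and $t(\cdot)$ (random weights `W_s`, `W_t`), $B = 7$ channels, and a split at $b = 3$. In the CFAN, these functions would be trained neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)
B, b = 7, 3   # hypothetical: 7 sensor channels, the first b dimensions pass through

# Hypothetical scale s(.) and translation t(.) networks with random weights.
# Their internal form is arbitrary because they never need to be inverted.
W_s = rng.normal(size=(b, B - b))
W_t = rng.normal(size=(b, B - b))

def s(u):
    return np.tanh(u @ W_s)

def t(u):
    return u @ W_t

def coupling_forward(d):
    """h_{1:b} = d_{1:b};  h_{b+1:B} = d_{b+1:B} * exp(s(d_{1:b})) + t(d_{1:b})."""
    d1, d2 = d[:, :b], d[:, b:]
    h2 = d2 * np.exp(s(d1)) + t(d1)
    return np.concatenate([d1, h2], axis=1)   # merge the two groups into h(k)

def coupling_reverse(h):
    """x_{1:b} = h_{1:b};  x_{b+1:B} = (h_{b+1:B} - t(h_{1:b})) * exp(-s(h_{1:b}))."""
    h1, h2 = h[:, :b], h[:, b:]
    x2 = (h2 - t(h1)) * np.exp(-s(h1))
    return np.concatenate([h1, x2], axis=1)

d = rng.normal(size=(5, B))      # five hypothetical samples d_s(k)
h = coupling_forward(d)
x = coupling_reverse(h)          # recovers d up to floating-point error
```

The reverse pass reuses $s$ and $t$ on the unchanged half instead of inverting them. This is what lets both functions be arbitrary networks.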
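Considering the forward process, the Jacobian matrix of the transformation $f(\cdot)$ can be expressed as follows:
$$ \frac{\partial h(k)}{\partial d_s(k)^{\top}} = \begin{bmatrix} I & 0 \\ \dfrac{\partial h_{b+1:B}(k)}{\partial \big(d^{1:b}_s(k)\big)^{\top}} & \operatorname{diag}\Big(\exp\big(s(d^{1:b}_s(k))\big)\Big) \end{bmatrix} $$
The upper-left block of the Jacobian matrix is an identity matrix $I$. Since $h_{1:b}(k) = d^{1:b}_s(k)$ does not depend on $d^{b+1:B}_s(k)$, the upper-right block is a zero matrix $0$. The lower-right block is a diagonal matrix with diagonal elements $\exp(s(d^{1:b}_s(k)))$. The Jacobian is therefore lower triangular, and its determinant equals the product of its diagonal elements, $\exp\big(\sum_j s_j(d^{1:b}_s(k))\big)$. The lower-left block therefore does not need to be calculated. For the same reason, the Jacobians of $s(\cdot)$ and $t(\cdot)$ are not needed to compute the Jacobian determinant of the coupling layer, so $s(\cdot)$ and $t(\cdot)$ can be arbitrarily complex networks.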
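The block structure of the Jacobian can be verified numerically. The sketch below again uses hypothetical random-weight networks for $s(\cdot)$ and $t(\cdot)$. It compares a finite-difference Jacobian of one coupling layer with the analytic log-determinant $\sum_j s_j(d^{1:b}_s(k))$. The helper names (`analytic_logdet`, `numerical_jacobian`) and the sizes $B = 4$, $b = 2$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
B, b = 4, 2                                       # hypothetical sizes
W_s = rng.normal(size=(b, B - b))
W_t = rng.normal(size=(b, B - b))

def s(u):
    return np.tanh(u @ W_s)

def t(u):
    return np.sin(u @ W_t)   # any differentiable t works; it never enters the determinant

def coupling_forward(d):
    """Single-sample affine coupling layer, d of shape (B,)."""
    d1, d2 = d[:b], d[b:]
    return np.concatenate([d1, d2 * np.exp(s(d1)) + t(d1)])

def analytic_logdet(d):
    """log|det J| = sum_j s_j(d_{1:b}) because J is block lower-triangular."""
    return np.sum(s(d[:b]))

def numerical_jacobian(func, d, eps=1e-6):
    """Central finite differences; column j holds d(func)/d(d_j)."""
    J = np.zeros((d.size, d.size))
    for j in range(d.size):
        e = np.zeros(d.size)
        e[j] = eps
        J[:, j] = (func(d + e) - func(d - e)) / (2 * eps)
    return J

d = rng.normal(size=B)
J = numerical_jacobian(coupling_forward, d)
```

The numerical Jacobian shows the identity block, the zero upper-right block, and the diagonal lower-right block described above. Its log-determinant matches the cheap analytic expression.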
Although a single coupling layer may be powerful, practical distributions are often very complex, and transforming one complex distribution into another usually requires more than one transformation. In addition, the forward transformation leaves some components unchanged, because the first $b$ dimensions are identical to the initial data. Figure 5 illustrates the composition of coupling layers in an alternating pattern. This structure passes different parts of the data through different transformation paths. As a result, no component of the final generated data is simply copied from the initial data [30]. With $L$ coupling layers $f_1, \dots, f_L$, the layers are combined as follows:
$$ h(k) = f_L \circ \cdots \circ f_2 \circ f_1\big(d_s(k)\big) $$
Then, its reverse process can be expressed as follows:
$$ d_s(k) = f_1^{-1} \circ f_2^{-1} \circ \cdots \circ f_L^{-1}\big(h(k)\big) $$
As mentioned above, to minimize the error between the input and the reconstructed output, the expectation of $p_{D_s}(d_s(k))$ can be estimated by Monte Carlo as follows: The reverse process is treated similarly. For the forward process, the loss function $loss_{\mathcal{H}}$ of the model $\mathcal{H}$ in this work can be expressed as follows: where $D_f$ is a discriminative network and $\theta_f$ denotes its parameters, and the loss function of $D_f$ can be expressed as follows: Similarly, the loss function $loss_{\mathcal{H}^{-1}}$ of the reverse process can be expressed as follows: where $D_r$ is a discriminative network and $\theta_r$ denotes its parameters, and the loss function of $D_r$ can be expressed as follows: The total loss $L_{total}$ of the proposed CFAN is presented as follows: The overall optimization objective of the model can be written as follows: In summary, the CFAN learns knowledge in the source domain through adversarial training, and the trained parameters are $\theta_1^{*}$.
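The alternating stack and its reverse can be sketched as follows, with the layers inverted in reverse order. The class name `AffineCoupling`, the `flip` flag that swaps the roles of the two halves, the four-layer depth, and the random weights are all assumptions for illustration. The log-determinants of the layers add up because the Jacobian of a composition is the product of the layer Jacobians.

```python
import numpy as np

class AffineCoupling:
    """Hypothetical affine coupling layer; `flip` selects which half is transformed."""

    def __init__(self, B, b, flip, rng):
        self.b, self.flip = b, flip
        n_in, n_out = (B - b, b) if flip else (b, B - b)
        self.W_s = rng.normal(size=(n_in, n_out))
        self.W_t = rng.normal(size=(n_in, n_out))

    def _split(self, x):
        a, c = x[:, :self.b], x[:, self.b:]
        return (c, a) if self.flip else (a, c)    # (conditioning part, transformed part)

    def _merge(self, cond, trans):
        parts = [trans, cond] if self.flip else [cond, trans]
        return np.concatenate(parts, axis=1)      # restore the original column order

    def forward(self, x):
        cond, trans = self._split(x)
        s = np.tanh(cond @ self.W_s)
        out = trans * np.exp(s) + cond @ self.W_t
        return self._merge(cond, out), s.sum(axis=1)   # output and per-sample log|det J|

    def reverse(self, h):
        cond, trans = self._split(h)
        s = np.tanh(cond @ self.W_s)
        return self._merge(cond, (trans - cond @ self.W_t) * np.exp(-s))

def flow_forward(layers, x):
    """h = f_L o ... o f_1(x); the log-determinants of the layers add up."""
    logdet = np.zeros(x.shape[0])
    for layer in layers:
        x, ld = layer.forward(x)
        logdet += ld
    return x, logdet

def flow_reverse(layers, h):
    """x = f_1^{-1} o ... o f_L^{-1}(h): invert the layers in reverse order."""
    for layer in reversed(layers):
        h = layer.reverse(h)
    return h

rng = np.random.default_rng(2)
B, b = 7, 3
layers = [AffineCoupling(B, b, flip=(i % 2 == 1), rng=rng) for i in range(4)]
x = rng.normal(size=(8, B))          # eight hypothetical input samples
h, logdet = flow_forward(layers, x)
x_rec = flow_reverse(layers, h)
```

Alternating `flip` means that every column is transformed at least once across two consecutive layers. This is the property the alternating pattern is meant to guarantee.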
The trained CFAN$^{*}$ model can perform FD tasks under steady operating conditions (the training procedure is detailed in Algorithm 1). However, the distribution discrepancies arising from diverse operating conditions degrade its overall performance. To mitigate this issue, the model must be fine-tuned through TL to attain optimal FD performance.

Fault Detection with Transfer Learning Based on Federated CFANs
This work aims to establish an FD model for dynamic operation using TL. The first CFAN, trained in the previous step, captures the system's steady-state operating information. The second CFAN learns the performance changes caused by domain changes. This design uses neural-model-aided learning to identify which crucial parameters change and which remain unchanged. The framework of the proposed TL strategy is illustrated in Figure 6.

From the above formula, $e_1$ is the path between the source and the target domain, which retains the information when the operating conditions change. CFAN$_2$ calibrates the knowledge changes caused by the varied operating conditions. The construction of CFAN$_2$ is similar to that of CFAN$_1$, and $\theta_2$ denotes its parameters. In addition, CFAN$_2$ includes the discriminative networks $D_{f2}$ and $D_{r2}$, with parameters $\theta_{f2}$ and $\theta_{r2}$, respectively. The loss function $loss_{\mathcal{H}_2}$ of CFAN$_2$ can be expressed as follows: The loss function $loss_{D_{f2}}$ of $D_{f2}$ can be expressed as follows: The loss function of the reverse process $\mathcal{H}_2^{-1}$ can be expressed as follows: The loss function $loss_{D_{r2}}$ of $D_{r2}$ can be expressed as follows: In summary, the total loss $L_{total2}$ of CFAN$_2$ is formulated as follows: The overall optimization objective of the proposed TL model is provided as follows: CFAN$_2^{*}$ learns the performance variation of CFAN$_1^{*}$ caused by varied operating conditions; the training process of CFAN$_2^{*}$ is detailed in Algorithm 2.
The change information $\hat{e}_1(k)$ is obtained using the following formula: Based on the above analysis, the residual signal $r(m)$ used for the final FD decision is defined as follows: Here, $m$ represents the dimension of the final decision signal $r$. The framework of the proposed federated CFANs is depicted in Figure 7.

This work uses the root mean square (RMS) norm to maintain satisfactory false alarm rates (FARs) in high-dimensional situations. The RMS measures the average energy of a signal $r$ and is defined by the following formula:
$$ J(r) = \|r\|_{\mathrm{RMS}} = \sqrt{\frac{1}{m}\sum_{i=1}^{m} r_i^{2}} $$
The threshold is set to be
$$ J_{th} = \sup J(r) $$
where the supremum is taken over the residuals of fault-free data. Then, the fault detection logic becomes
$$ \begin{cases} J(r) > J_{th} \Rightarrow \text{fault} \\ J(r) \le J_{th} \Rightarrow \text{fault-free} \end{cases} $$

The flowchart of the proposed method is illustrated in Figure 8 and comprises an offline training phase and an online fault detection (FD) phase. First, the CFAN-based model CFAN$_1$ is trained on the normal data $D_s$ obtained under steady operating conditions to extract latent variables and reconstruct data. Subsequently, the model CFAN$_2$ undergoes federated training on the dynamic-condition data $D_t$. The trained federated networks CFAN$_1^{*}$ and CFAN$_2^{*}$ perform feature extraction and reconstruction of healthy data, and the residual $r$ is calculated from their outputs. Finally, the FD threshold $J_{th}$ is determined from the RMS statistics of the residual $r$. The RMS value $J(r(m))$ of the testing data is compared with $J_{th}$ to realize FD of the TCS.
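The RMS evaluation, threshold, and decision logic can be sketched as follows. This sketch assumes that each row of the input array is one residual vector of dimension $m$. It also assumes that $J_{th}$ is the empirical maximum of $J(r)$ over fault-free residuals. The function names and the synthetic residuals are hypothetical. In the proposed method, the residuals would come from the federated CFANs.

```python
import numpy as np

def rms(r):
    """J(r) = sqrt((1/m) * sum_i r_i^2) over the last axis (the m residual entries)."""
    r = np.asarray(r, dtype=float)
    return np.sqrt(np.mean(r**2, axis=-1))

def fit_threshold(healthy_residuals):
    """J_th = sup J(r) over fault-free residuals, taken as the empirical maximum."""
    return float(np.max(rms(healthy_residuals)))

def detect(residuals, J_th):
    """True = fault (J(r) > J_th); False = fault-free (J(r) <= J_th)."""
    return rms(residuals) > J_th

rng = np.random.default_rng(3)
healthy = 0.1 * rng.normal(size=(500, 7))    # hypothetical fault-free residuals, m = 7
J_th = fit_threshold(healthy)

test = 0.1 * rng.normal(size=(100, 7))
test[60:63] += 2.0                           # hypothetical transient fault on samples 60-62
alarms = detect(test, J_th)
```

Using the empirical maximum produces no false alarms on the data that set the threshold, but it can be sensitive to outliers. A common alternative is a high quantile estimated on a separate validation set, which trades FAR against detection sensitivity.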

Experiment Results and Analysis
In this section, the data source and experimental platform are briefly described. To verify the effectiveness of the proposed method, FD tasks using different methods were performed on the TCS under dynamic operating conditions, and the experimental results are then discussed.

Data Description
In this case, a TCS is adopted to demonstrate the effectiveness of the proposed FD method. A simulation platform for traction drive control systems named "TDCS-FIB" is presented in [37,38]. TDCS-FIB develops fault injection benchmarks based on simulation models. It provides a variety of fault injection types for the main components of the TCS, which gives reliable data support for fault detection and diagnosis.
To verify the proposed method, a TCS with different TFs is adopted.As depicted in Figure 9, the onboard TCS serves as the experimental system, with its specifications presented in Table 1.The sensor data were collected under traction operation conditions.

In practice, transient faults lead to abnormal data from multiple sensors, and multi-sensor FD can reduce interference and improve detection efficiency [39,40]. Therefore, multi-sensor data are used to detect transient faults, including the three-phase current output $[i_{sa}\; i_{sb}\; i_{sc}]$ of the inverter, the voltage output $[u_{cd1}\; u_{cd2}]$ of the upper and lower support capacitors in the DC link, and the transformer secondary voltage and current $[u_n\; i_n]$. The FD model of the TCS is trained on the sensor signals $[i_{sa}\; i_{sb}\; i_{sc}\; u_{cd1}\; u_{cd2}\; u_n\; i_n] \in D_s, D_t$. The data collected for the transient faults under dynamic operation conditions are denoted by $D_f$.

Since the waveforms of the seven sensor groups stabilize after the $1 \times 10^4$-th step, $1 \times 10^3$ samples in the normal steady state of the TCS are taken as the source-domain training dataset $D_s$, and $2 \times 10^2$ samples under dynamic conditions plus 50 samples in the steady state are used as the target-domain training dataset $D_t$.
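To make the sample counts above concrete, the hypothetical NumPy helper below builds $D_s$ and $D_t$ from fault-free recordings. It assumes, beyond what the text states, that the steady and dynamic recordings are arrays of shape (T, 7) with columns ordered as $[i_{sa}, i_{sb}, i_{sc}, u_{cd1}, u_{cd2}, u_n, i_n]$, and that contiguous blocks are taken after the settling step; how the samples were actually selected is not specified.

```python
import numpy as np

# Assumed column order of the seven sensor channels.
CHANNELS = ("i_sa", "i_sb", "i_sc", "u_cd1", "u_cd2", "u_n", "i_n")


def build_domains(steady, dynamic, settle_step=10_000,
                  n_source=1_000, n_dynamic=200, n_steady_target=50):
    """Split fault-free recordings into source (D_s) and target (D_t) sets.

    D_s: n_source steady-state samples taken after the settling step.
    D_t: n_dynamic dynamic-condition samples plus n_steady_target further
         steady-state samples (hypothetical contiguous selection).
    """
    steady = np.asarray(steady, dtype=float)
    dynamic = np.asarray(dynamic, dtype=float)
    for name, arr in (("steady", steady), ("dynamic", dynamic)):
        if arr.ndim != 2 or arr.shape[1] != len(CHANNELS):
            raise ValueError(f"{name} must have shape (T, {len(CHANNELS)})")
    settled = steady[settle_step:]  # drop the start-up transient
    if len(settled) < n_source + n_steady_target or len(dynamic) < n_dynamic:
        raise ValueError("not enough samples for the requested split")
    d_s = settled[:n_source]
    d_t = np.vstack([dynamic[:n_dynamic],
                     settled[n_source:n_source + n_steady_target]])
    return d_s, d_t
```

With these defaults, $D_s$ has 1000 rows and $D_t$ has 250 rows, matching the counts given in the text.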
The test dataset under dynamic conditions contains four transient-fault scenarios and one fault-free scenario. Each fault scenario contains $5 \times 10^2$ samples, and the fault-free scenario contains $1 \times 10^5$ samples. The experimental results are evaluated using the false alarm rate (FAR), fault detection rate (FDR), recall, and accuracy rate (ACR), which are defined as follows:

Fault samples are defined as positive samples and normal samples as negative samples. Fault samples correctly predicted as faulty are true positives (TP), and fault samples incorrectly predicted as normal are false negatives (FN). Normal samples correctly predicted as normal are true negatives (TN), and normal samples incorrectly predicted as faulty are false positives (FP). $F_x$ denotes the $x$-th fault type.
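Because the metric formulas are not reproduced above, the sketch below computes them with the standard confusion-matrix definitions (faults as the positive class): FAR = FP/(FP+TN), recall = TP/(TP+FN) pooled over all fault samples, FDR reported per fault type $F_x$, and ACR = (TP+TN)/N. This is an assumption about how the paper's formulas are written, not a copy of them.

```python
import math
from collections import Counter


def fd_metrics(labels, alarms):
    """Compute FAR, per-fault FDR, recall, and ACR.

    labels: 0 for a normal sample, x >= 1 for a sample of fault type F_x.
    alarms: True where the detector raises an alarm (J_RMS > J_th).
    Faults are the positive class (standard confusion-matrix convention).
    """
    if len(labels) != len(alarms) or len(labels) == 0:
        raise ValueError("labels and alarms must be non-empty and equal length")
    tp = fp = tn = fn = 0
    hits, totals = Counter(), Counter()
    for y, alarm in zip(labels, alarms):
        if y == 0:                  # normal sample
            if alarm:
                fp += 1             # false alarm
            else:
                tn += 1
        else:                       # fault sample of type F_y
            totals[y] += 1
            if alarm:
                tp += 1
                hits[y] += 1
            else:
                fn += 1             # missed fault

    def ratio(num, den):
        return num / den if den else math.nan

    return {
        "FAR": ratio(fp, fp + tn),
        "FDR": {f"F{x}": hits[x] / totals[x] for x in sorted(totals)},
        "recall": ratio(tp, tp + fn),
        "ACR": (tp + tn) / len(labels),
    }
```

Under these definitions, recall is the FDR pooled over all fault types, which is why the per-type FDR is kept separate.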
Sensors 2024, 24, 4839 14 of 23

The proposed model was built with PyTorch 1.13.1. The $\mathrm{CFAN}_1$ and $\mathrm{CFAN}_2$ models have the same structure and contain four affine coupling layers. The scale network $s(\cdot)$ and the translation network $t(\cdot)$ each include two fully connected layers with 2 × 100 neurons. The two discriminators $D(\cdot)$ share the same fully connected structure with (200, 100, 50, 1) neurons. The optimal weights and biases are obtained with the Adam optimizer by minimizing the loss functions $L_{total}$ defined in (29) and $L_{total2}$ defined in (35). The details of the CFANs and the methods used for comparison are given in Tables 2 and 3.
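The affine coupling layer is the invertible building block of each CFAN. The NumPy sketch below shows a RealNVP-style coupling stack using the sizes stated above: four coupling layers, and $s(\cdot)$ and $t(\cdot)$ with two hidden layers of 100 neurons (our reading of "2 × 100") plus a linear output. Everything else is an assumption: the 7-dimensional input (one entry per sensor channel), the 3/4 split, tanh activations, the tanh-bounded log-scale, and random untrained weights. It omits the discriminators, the losses, and Adam training, and only demonstrates that the forward map can be inverted exactly, which bidirectional reconstruction relies on.

```python
import numpy as np


class MLP:
    """Two fully connected hidden layers of 100 units (tanh) and a linear output."""

    def __init__(self, d_in, d_out, rng, hidden=100):
        self.params = [
            (rng.normal(0.0, 0.1, (d_in, hidden)), np.zeros(hidden)),
            (rng.normal(0.0, 0.1, (hidden, hidden)), np.zeros(hidden)),
            (rng.normal(0.0, 0.1, (hidden, d_out)), np.zeros(d_out)),
        ]

    def __call__(self, x):
        for i, (w, b) in enumerate(self.params):
            x = x @ w + b
            if i < len(self.params) - 1:
                x = np.tanh(x)
        return x


class AffineCoupling:
    """y_b = x_b * exp(s(x_a)) + t(x_a); the conditioning half x_a passes through."""

    def __init__(self, dim, swap, rng):
        self.k, self.swap = dim // 2, swap
        n_cond = dim - self.k if swap else self.k
        self.s = MLP(n_cond, dim - n_cond, rng)   # scale network s(.)
        self.t = MLP(n_cond, dim - n_cond, rng)   # translation network t(.)

    def _split(self, x):
        a, b = x[:, :self.k], x[:, self.k:]
        return (b, a) if self.swap else (a, b)    # (conditioning, transformed)

    def _merge(self, cond, trans):
        return np.hstack([trans, cond]) if self.swap else np.hstack([cond, trans])

    def forward(self, x):
        cond, trans = self._split(x)
        log_s = np.tanh(self.s(cond))             # bounded log-scale (assumption)
        y = trans * np.exp(log_s) + self.t(cond)
        return self._merge(cond, y), log_s.sum(axis=1)

    def inverse(self, y):
        cond, trans = self._split(y)
        log_s = np.tanh(self.s(cond))
        x = (trans - self.t(cond)) * np.exp(-log_s)
        return self._merge(cond, x)


class CouplingFlow:
    """Stack of affine coupling layers that alternate which half is transformed."""

    def __init__(self, dim=7, n_layers=4, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [AffineCoupling(dim, i % 2 == 1, rng) for i in range(n_layers)]

    def forward(self, x):
        log_det = np.zeros(len(x))
        for layer in self.layers:
            x, ld = layer.forward(x)
            log_det += ld                         # log|det J| accumulates per layer
        return x, log_det

    def inverse(self, z):
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z
```

Calling `flow.inverse(flow.forward(x)[0])` returns `x` up to floating-point error, and the accumulated `log_det` is the quantity a likelihood-based flow loss would use.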

Analysis and Discussion
Each method was compared on four types of FD tasks and a fault-free detection task. Figures 10-13 illustrate the FD results obtained using the proposed method and the VAE (with and without transfer learning). The traditional VAE refers to the VAE method without TL, while the federated VAE, which incorporates a TL strategy similar to that of the proposed method, is adapted to dynamic operating conditions. As shown in Figures 10a, 11a, 12a and 13a, the blue curve represents the a-phase current waveform $i_{sa}$, and the orange dotted line represents the fault injection time. In panels (b), (c), and (d) of Figures 10-13, the blue curve represents the detection results of the three methods, and the red dotted line represents the FD threshold $J_{th}$.
Fault $F_1$ is attributed to damage to the shielding layer of communication cables caused by manufacturing processes, overstress, and other factors. The transmission of external pulses in combinational logic circuits then induces variations in both pulse width and amplitude, which leads to TFs in the TCS.
Fault $F_2$ is caused by loose or improperly connected sensor chip pins and wiring: the sensor signal is momentarily disturbed by vibration, thereby inducing transient fault $F_2$.
Transient shock fault $F_3$ may arise from improper sensor installation and from the degradation of insulating materials triggered by power and ground wire surges.
The occurrence of $F_4$ can be attributed to IGBT damage resulting from internal structural defects, manufacturing processes, and other factors. Furthermore, excessive stress induced by high temperatures may lead to gate driver circuit failure, for example, TFs caused by erroneous pulse control signals originating from the control circuit.

The comparison results of the three methods are illustrated in Figure 14, and Table 4 shows the ACR and average fault detection delay. The proposed method achieves better overall performance across the FD tasks. Specifically, Figure 14a shows the FDR of the different methods for the four types of transient faults, and Figure 14b shows their FAR, recall, and ACR. The FDRs of the other two FD methods are lower than that of the proposed method. As shown in Figure 14b and Table 3, the proposed method also outperforms the other methods in terms of FAR, recall, and ACR. The traditional VAE does not include the TL process and cannot adaptively adjust the changed knowledge based on the target-domain data, which leads to poor FD performance.
The data distributions vary across the operation conditions of TCSs, which degrades FD performance. However, common knowledge exists among the various operation conditions, which motivates acquiring knowledge from the steady-state operation of a TCS. As depicted in Figures 10-13, the federated VAE outperforms the traditional VAE because the proposed TL strategy leverages prior knowledge and mitigates the impact of operational variations. The proposed TL strategy based on federated CFANs effectively transfers and adapts knowledge between steady-state and dynamic operation conditions while ensuring accurate extraction of latent variables and data reconstruction. By leveraging the adversarial training and invertibility of CFANs, the data distribution is precisely described through bidirectional optimization, resulting in the significant performance improvements shown in Figure 14 and Table 4. Especially for weak TFs (case studies $F_1$, $F_2$, and $F_4$), the proposed method exhibits superior fault detection capability under dynamic operating conditions.

In addition, FD experiments were conducted under steady operating conditions. The steady-state test dataset also contains a fault-free scenario and four transient-fault scenarios similar to $F_1$, $F_2$, $F_3$, and $F_4$ under dynamic operation conditions. Each fault scenario contains $1 \times 10^3$ samples, and the fault-free scenario contains $1 \times 10^5$ samples. The performance comparison of different methods is shown in Table 5. The comparison results of the three methods are illustrated in Figure 15 and Tables 6 and 7.
The comparison results under steady operation conditions are illustrated in Figure 15, and Table 7 shows the FAR, recall, ACR, and average FD delay. The proposed second CFAN achieves better performance for the different FD tasks under steady operation conditions because steady-state knowledge has been learned from a small amount of healthy data.
The training loss is defined by the mean squared error (MSE), which evaluates reconstruction accuracy. As illustrated in Figure 16a, the training loss of the proposed method stabilizes at a lower level than those of the other two methods, indicating its superior data reconstruction capability. Figure 16b displays the losses of $\mathrm{CFAN}_2$ and the second network of the federated VAE. The losses of the two methods converge to similar values, which indicates that both networks can achieve the required performance adjustment.

The ROC (receiver operating characteristic) curves of the three methods and their AUC (area under the curve) values are shown in Figure 17. The AUC of the proposed method is 0.953, while the AUC values of the traditional VAE and the federated VAE are 0.826 and 0.907, respectively. The proposed method has the largest area under the curve, indicating superior FD performance.

Generally, to ensure system safety, the TCS typically operates in normal states. As a result, faults occur far less frequently than healthy instances [21]. The unsupervised method learns normal patterns only from fault-free data, which makes it a feasible solution to the problem of imbalanced data, and it improves robustness without the cost of labeling. This FD method is not limited to the TCS of trains and can also be applied to transient FD in other electrical systems.
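An AUC like those reported above can be computed directly from the detection statistics of labelled test samples, because the AUC equals the probability that a randomly chosen fault sample scores higher than a randomly chosen normal sample (the Mann-Whitney U statistic, with ties counted as one half). The plain-Python sketch below illustrates that identity; it is not the evaluation code behind Figure 17, and using $J(r(m))_{\mathrm{RMS}}$ as the score is an assumption.

```python
def roc_auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney U) identity.

    scores: detection statistics, e.g. J(r(m))_RMS (higher = more faulty).
    labels: 1 for fault samples (positives), 0 for normal samples.
    Tied scores receive their average rank, so ties count as 1/2.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    pos_rank_sum, i = 0.0, 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1                          # extend the block of tied scores
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the block
        pos_rank_sum += avg_rank * sum(1 for _, y in pairs[i:j + 1] if y == 1)
        i = j + 1
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

Sorting once makes this O(N log N), which matters when the fault-free test set alone holds $1 \times 10^5$ samples.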

Conclusions
In this work, we present a transient fault detection method for dynamic operation conditions. For latent variable extraction and data reconstruction, a CFAN is constructed from an invertible flow model and two discriminator networks, together with a dedicated loss function. Moreover, adversarial training and bidirectional optimization enhance the reconstruction quality and suppress interference caused by background noise.
Then, an unsupervised transfer learning strategy based on federated CFANs is proposed for transient fault detection under various operation conditions, which is divided into two stages. First, the first CFAN model is trained using the normal data from steady operation conditions. Subsequently, the second CFAN calibrates the information changed by varying operation conditions using only a few samples. The federated CFANs jointly learn latent knowledge from steady states and can be applied to transient fault detection under various operation conditions.
Comparative experiments against data-driven fault detection methods verify the effectiveness of the proposed method.
Several directions are available for future work. The first is to further develop fault diagnosis technology to locate faulty components. In addition, the FD method in this work is based on the CRH2-type train, and data from high-speed trains with different topological structures have not been explored. Such out-of-distribution (OOD) data, as mentioned in [41], may degrade FD performance. Future work will therefore develop fault diagnosis methods for multiple types of high-speed trains.

Figure 1. Circuit topology of TCS in high-speed trains.

Figure 3. The framework of the CFAN model.

Figure 4. The structure of a coupling layer.

Figure 6. The federated CFAN-based TL strategy. The target-domain data $D_t$ are input into the trained $\mathrm{CFAN}_1^*$. Because the data distributions of $D_s$ and $D_t$ differ, the performance of $\mathrm{CFAN}_1^*$ also changes. Consider target-domain samples $\{d_t(n)\}_{n=1}^{L}$ from $D_t$, where the residual signal $e_1(n) \sim E_1$ can be expressed as follows:

Figure 7. The structure of federated CFAN-based transfer learning.

Figure 8. The overall flowchart of the proposed FD method.

Figure 9. The onboard TCS in high-speed trains. (a) Traction control unit. (b) Main circuit of TCS.

Figure 10. The current waveform and FD results for $F_1$. (a) Traction motor a-phase current waveform $i_{sa}$ of the $F_1$ fault; (b) FD result obtained through the proposed method; (c) FD result obtained through the traditional VAE; (d) FD result obtained through the federated VAE.

Figure 11. The current waveform and FD results for $F_2$. (a) Traction motor a-phase current waveform $i_{sa}$ of the $F_2$ fault; (b) FD result obtained through the proposed method; (c) FD result obtained through the traditional VAE; (d) FD result obtained through the federated VAE.

Figure 12. The current waveform and FD results for $F_3$. (a) Traction motor a-phase current waveform $i_{sa}$ of the $F_3$ fault; (b) FD result obtained through the proposed method; (c) FD result obtained through the traditional VAE; (d) FD result obtained through the federated VAE.

Figure 13. The current waveform and FD results for $F_4$. (a) Traction motor a-phase current waveform $i_{sa}$ of the $F_4$ fault; (b) FD result obtained through the proposed method; (c) FD result obtained through the traditional VAE; (d) FD result obtained through the federated VAE.

Figure 14. The comparison of results among different methods: (a) the FDR of four types of transient faults; (b) the FAR, recall, and ACR of three methods.

Figure 15. The comparison results among different methods in steady operation conditions: (a) the FDRs of three methods; (b) the FAR, recall, and ACR of three methods.

Figure 16. Loss of three methods. (a) Comparison of the first-network losses of the three methods; (b) comparison of the second-network losses of the proposed method and the federated VAE.

Table 1. Specifications of experimental system under normal case.

Parameter description | Value
Stator resistance | 0.114 Ω
Rotor resistance | 0.146 Ω
Magnetizing inductance | 32.747 H
Rated power of traction motor | 300 kW
Motor pole pairs | 2
Filter resistance of DC link | 6000 Ω

Table 3. Configuration of methods for comparison.

Table 4. Detection results of different methods.

Table 5. Performance comparison of different methods.

Table 6. Detection results of different methods in steady operation conditions.

Table 7. Performance comparison of different methods in steady operation conditions.