Article

Multi-Level Federated Network Based on Interpretable Indicators for Ship Rolling Bearing Fault Diagnosis

Shuangzhong Wang and Ying Zhang *

1 Logistics Engineering College, Shanghai Maritime University, Shanghai 201306, China
2 College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
J. Mar. Sci. Eng. 2022, 10(6), 743; https://doi.org/10.3390/jmse10060743
Submission received: 6 April 2022 / Revised: 20 May 2022 / Accepted: 26 May 2022 / Published: 28 May 2022
(This article belongs to the Section Ocean Engineering)

Abstract

The federated learning network requires all connection weights to be shared between the server and clients during training, which increases the risk of data leakage. Meanwhile, the traditional federated learning method has a poor diagnostic effect on non-independently identically distributed (NOIID) data. To address these issues, a multi-level federated network based on interpretable indicators is proposed in this manuscript. Firstly, an interpretable adaptive sparse deep network is constructed based on the interpretability principle. Secondly, a relevance map of the network is constructed based on interpretable indicators. Based on this map, the contribution of the connection weights in the network is used to build a multi-level federated network. Finally, the effectiveness of the proposed algorithm is demonstrated through experimental validation.

1. Introduction

Ships’ powertrains often have to operate for long periods, sometimes under overload, during shipping, which can lead to powertrain failures. Rolling bearings, as a common component of the powertrain, are often damaged under abnormal conditions [1]. It is therefore important to conduct fault diagnosis of ship powertrain equipment: based on the operating data of the equipment, faults can be detected and repaired in time, so that further damage can be prevented [2]. The powertrain units inside a ship include engines and a large number of rolling bearings in which friction occurs continuously with the mechanical movements. Faults in these rolling bearings account for a significant proportion of all powertrain faults [3].
Common fault diagnosis (classification) methods include analytical model-based, knowledge-based, and data-driven approaches. Analytical model-based observer methods are of limited use in complex systems because of the difficulty of establishing an accurate observer model that describes the mechanism of the object [4,5,6]. In addition, the difficulty of acquiring complete knowledge of complex systems has limited the application of fault diagnosis methods based on knowledge acquisition and the knowledge application of cognitive processes [7]. Data-driven approaches, by contrast, are becoming increasingly popular as vast amounts of data become available [8,9]. Traditional data-driven methods, such as statistical-analysis-based principal component analysis, are built on the necessary statistical properties of the obtained data; nevertheless, they struggle with systems that produce large amounts of data whose statistical properties are hard to establish [10]. A data-driven approach to fault diagnosis can effectively recover operational information and fault states from a large amount of historical data [11]. Based on the obtained process information, data-driven diagnostic models can be built to describe industrial processes and provide valid diagnostic results [12]. The most popular data-driven fault diagnosis methods are artificial neural network (ANN) algorithms [13,14,15], autoencoders [16], and Bayesian networks [17]. In this manuscript, we propose a novel federated learning (FL) model to address the situation in which a single client lacks sufficient data for data-driven diagnosis. The model provides good diagnostic results even when the FL dataset is non-independently identically distributed, and it also effectively protects data privacy.
The remainder of this manuscript is organized as follows: Section 2 presents the related work. Section 3 introduces the preliminary theoretical preparations. Section 4 describes the proposed method in detail. Section 5 presents the experimental validation of the proposed algorithms, and Section 6 concludes and gives suggestions for future work.

2. Related Work

Data-based diagnostic methods require a large amount of historical data to train an effective model [18]. However, the data of some equipment, such as military vessels and certain commercial boats in specific industries, must be kept confidential, which makes it extremely difficult to collect large amounts of data. In recent years, Internet of Things (IoT) technology has developed rapidly. To further explore this ubiquitously connected world, deep learning techniques are often used to analyze the data collected by the IoT. Sensors continuously collect data about the operational status of equipment during its operation, and combining these collected data makes it possible to analyze the condition of the equipment in depth [19]. The combination of IoT technology and FL [20] effectively addresses the confidentiality problem above, and FL has now achieved considerable research results in the IoT field. A typical federated learning based on the Internet of Things (FL-IOT) architecture is shown in Figure 1.
Many experts and scholars have proposed improvements and extensions of FL. Zhang et al. [21] applied FL to the short-circuit fault diagnosis of permanent magnet synchronous motors, which effectively solved the problem of lacking samples. FL converges slowly because each client’s uploaded network parameters are repeatedly synthesized during learning; to address this, Liu et al. [22] proposed a momentum federated learning (MFL) method that performs momentum gradient updates during local training. Considering that only some of the clients’ data are fused in each training epoch of federated learning, Chen et al. [23] proposed a probabilistic user selection scheme that, with high probability, connects the server to users whose local FL models have a significant impact on the global FL model, thus increasing the global convergence speed. Amiri et al. [24] found that most FL studies do not consider the actual physical layer; their work proposed a wireless multiple access channel (MAC) from the devices to the server, through which the parameter server receives the gradients computed by the devices at each iteration of distributed stochastic gradient descent, greatly increasing the convergence speed. The above studies focus on improving the parameter integration algorithm to achieve higher accuracy, without effectively considering the privacy of the data.
Cyber-attacks can also seriously affect the effectiveness of FL. For example, Saha et al. [25] pointed out that although FL is a popular distributed learning approach, global aggregation in FL relies on a centralized server that is vulnerable to malicious attacks, resulting in inefficiently trained models; their work added fog nodes under the cloud server according to the distribution of client locations, effectively avoiding over-reliance on a single cloud server. Hao et al. [26] proposed privacy-enhanced federated learning (PEFL) to address the fact that an attacker can still attack the network within the FL framework; compared with existing solutions, PEFL is non-interactive and can prevent private data from being compromised even if multiple entities collude. Li et al. [27] proposed a CNN gated recurrent unit (CNN-GRU)-based detection model to identify and isolate attacks in federated networks. The above studies mainly focus on improving the convergence speed of FL and do not fully consider how to deal with the problem of data leakage.
Improvements have also been proposed for the question of whether the data in federated learning is independently and identically distributed (IID). Sattler et al. [28] addressed the situation where the samples are unevenly distributed across clients, i.e., the data is non-independently identically distributed (NOIID), and the network complexity is limited, proposing a clustered federated learning algorithm: the clustering structure is derived from cosine correlation metrics on the client data, and the gradient is updated by the clustering federation algorithm. The communication cost of a federated network is also an important limitation on its performance, so Chen et al. [29] proposed an enhanced joint learning technique that adopts an asynchronous learning strategy on the clients and a time-weighted aggregation strategy for the local models on the server. With the development of blockchain technology, researchers have also started to integrate blockchain and FL to implement decentralized FL networks [30]. However, all of these approaches face the black-box problem of machine learning neural networks and their lack of interpretability. They improve the federated aggregation algorithm to prevent cyber-attacks, but they do not consider the situation in which a hostile client exists, nor how to prevent data leakage if a hostile client knows the gradient parameters of the other clients.
In summary, it is worth pointing out that FL technology has some limitations: (1) federated networks require fast, real-time communication capabilities; (2) the traditional FL method performs poorly on data under the NOIID distribution; (3) the risk of data leakage from federated networks exists, and some studies have shown that information about the data can be recovered from the update gradients shared by federated networks [31]; (4) a federated network obtained by learning is a black box whose internal mechanisms are difficult to understand, so the credibility of the diagnostic results given by the network is doubtful in practical applications. To address the above problems, we design a multi-level shared federated network based on interpretable indicators (i-MFN).
The main contributions of this manuscript are as follows.
(1)
Construct the interpretable adaptive sparse depth networks;
(2)
Design a map of network correlations based on the network’s interpretability parameters;
(3)
Establish a multi-level FL network and design a corresponding gradient sharing mechanism.

3. Preliminaries

3.1. LRP Network Interpretability Principle

Neural networks have been widely used in various fields due to their highly nonlinear fitting properties and powerful information extraction ability. Currently, they are mostly used as black boxes, i.e., their internal propagation mechanisms and logic are ignored, which lowers the credibility of their results in some areas. For example, in the field of medical diagnosis, doctors are unclear about the basis of the results given by neural networks, which limits their adoption. The Layer-wise Relevance Propagation (LRP) [32] technique is a metric based on the importance of the correlations between the layers of a neural network. Its main principle can be summarized as follows: starting from the last layer of the network and proceeding backward, the correlation index of each neuron in the penultimate layer is calculated from the output of the network. Then, from the correlation indices of the neurons in the penultimate layer, the correlation of each neuron in the preceding layer can be deduced backward. By the same reasoning, the correlation index of the outputs of all neurons in the network with the result can be calculated. The calculation is shown in Equations (1)–(3), where $i$ indexes the inputs of the $k$-th neuron, $w_{ik}$ is the connection weight between the $i$-th neuron in layer $l$ and the $k$-th neuron in layer $l + 1$, $a_i^{(l)}$ is the output value of the $i$-th neuron in layer $l$, and $R_k^{(l)}$ denotes the correlation index of the $k$-th neuron in the $l$-th layer with respect to the output of the neural network.
$$R_k^{(l+1)} = \sum_i R_{ik}^{(l,l+1)} \qquad (1)$$
$$R_i^{(l)} = \sum_k R_{ik}^{(l,l+1)} \qquad (2)$$
$$R_{ik}^{(l,l+1)} = R_k^{(l+1)} \, \frac{a_i^{(l)} w_{ik}}{\sum_i a_i^{(l)} w_{ik}} \qquad (3)$$
Taking a three-layer neural network as an example, its propagation calculation process is shown in Figure 2.
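To make the backward relevance pass concrete, the following minimal NumPy sketch implements Equations (1)–(3) for a toy three-layer network. The function names and the stabilizing `eps` term are our own; this is an illustration, not the authors’ code.

```python
import numpy as np

def lrp_backward(a, w, r_next, eps=1e-9):
    """Propagate relevance from layer l+1 back to layer l, Eqs. (1)-(3).

    a      : (n_l,)          activations a_i^(l) of layer l
    w      : (n_l, n_{l+1})  connection weights w_ik
    r_next : (n_{l+1},)      relevance scores R_k^(l+1)
    """
    z = a[:, None] * w                  # contributions a_i * w_ik
    denom = z.sum(axis=0) + eps         # sum_i a_i * w_ik, stabilized by eps
    r_ik = z / denom * r_next           # Eq. (3)
    return r_ik.sum(axis=1)             # Eq. (2): R_i^(l) = sum_k R_ik

# toy three-layer example: relevance of each input w.r.t. the single output
rng = np.random.default_rng(0)
a1, w1 = rng.random(4), rng.random((4, 3))
a2, w2 = np.tanh(a1 @ w1), rng.random((3, 1))
r_out = a2 @ w2                         # the output is taken as its own relevance
r2 = lrp_backward(a2, w2, r_out)
r1 = lrp_backward(a1, w1, r2)
print(r1, r1.sum(), r_out)              # relevance is (approximately) conserved
```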

3.2. Introduction on Federated Learning Framework

Machine learning neural networks and other learning algorithms require a large amount of data for training, yet in many practical scenarios data may only be used internally and cannot be shared with others, which makes it difficult to train a neural network with good performance. To solve this problem, the federated learning mechanism enables the network to learn the internal features of each participant while effectively protecting the privacy of user data. The client data can be non-independent and identically distributed.
As shown in Figure 1, a typical FL-IOT architecture consists of clients, servers, and a communication network. Each client’s local model is trained on its own data, while the data remains private. The model parameters are transmitted over the communication network, which is built on satellite or 5G communication technology.
Assume that a certain federated network has $K$ clients, $X$ denotes the sample feature space, and $Y = \{1, \ldots, C\}$ denotes the label space. The federated learning loss function is shown in Equation (4):
$$L(w_t^f) = \sum_{k=1}^{K} \frac{n_k}{|D|} \sum_{i=1}^{C} p_k(y=i)\, \mathbb{E}_{x|y=i}\!\left[\log f_i(x, w_t^k)\right] = \sum_{k=1}^{K} \frac{n_k}{|D|}\, \mathbb{E}_{(x,y)\sim p_k}\!\left[\sum_{i=1}^{C} \mathbf{1}_{y=i} \log f_i(x, w_t^k)\right] \qquad (4)$$
where $f$ is the federated network model function, $C$ represents the number of fault classes, $X$ is the feature space of the sample data, $n_k$ is the number of samples held by the $k$-th client, $|D|$ is the total number of samples, and $f_i(x, w_t^k)$ denotes the $i$-th output of the $k$-th client’s network model.
The training model for each client is shown in Equation (5).
$$L_k(w_t^k) = \mathbb{E}_{(x,y)\sim p_k}\!\left[\sum_{i=1}^{C} \mathbf{1}_{y=i} \log f_i(x, w_t^k)\right] = \sum_{i=1}^{C} p_k(y=i)\, \mathbb{E}_{x|y=i}\!\left[\log f_i(x, w_t^k)\right] \qquad (5)$$
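As an illustration of Equations (4) and (5), the sketch below computes the per-client empirical loss and the data-size-weighted global objective. We adopt the usual negative log-likelihood sign convention for cross-entropy; the function names are hypothetical.

```python
import numpy as np

def client_loss(probs, labels):
    """Per-client empirical loss, Eq. (5): the indicator 1_{y=i} selects the
    predicted probability of the true class (negative log-likelihood)."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))

def federated_loss(client_probs, client_labels):
    """Global objective, Eq. (4): client losses weighted by n_k / |D|."""
    sizes = np.array([len(y) for y in client_labels], dtype=float)
    weights = sizes / sizes.sum()
    losses = [client_loss(p, y) for p, y in zip(client_probs, client_labels)]
    return float(np.dot(weights, losses))

# two toy clients with 3-class outputs
p1 = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]); y1 = np.array([0, 1])
p2 = np.array([[0.1, 0.1, 0.8]]);                  y2 = np.array([2])
print(federated_loss([p1, p2], [y1, y2]))
```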

4. The Proposed Method

4.1. Interpretable Adaptive Sparse Depth Networks

The structure of the interpretable adaptive sparse deep networks constructed with the Layer-wise Relevance Propagation technique is shown in Figure 3. After the correlation values of each layer in the network are normalized, they are fed into the adaptive sparse layer, which performs adaptive sparse processing of the neurons. After the neural network has been trained, the sparse value of each neuron is its final sparse value.
The workflow of interpretable adaptive sparse depth networks is as follows.
Step 1: The network enters a fine-tuning phase after the coarse tuning of encoding and complete decoding.
Step 2: The interpretable correlation parameters for each neuron are derived after each forward training iteration is completed.
Step 3: The correlation parameters are normalized and used as the input value of the adaptive sparse layer neurons.
Step 4: The adaptive sparse layer is processed to obtain a sparse value which is used as a threshold for forward propagation sparsity of the network.
The literature [33] shows that, as the diagnostic capability of the network increases, the locations of the effective neurons that extract the valuable signals become fixed. Therefore, an interpretable term is added to the loss function of the network, as shown in Equation (6), where $\lambda$ is the coefficient of the interpretable loss term, $A_t$ denotes the test accuracy on the test set, and $\phi(x)$ represents the normalized Softplus function [34]. $\phi(x)$ is used as a nonlinear transformation to measure the correlation between the output value of each neuron and the network outcome; its expression is given in Equations (7) and (8).
$$J_Q = \lambda (1 - A_t) \sum_{l=1}^{L} \sum_{i=1}^{n_l} \phi\!\left(R_i^{(l)}\right) \qquad (6)$$
$$\phi(x) = \frac{\mathrm{softplus}(x)}{\max(\mathrm{softplus})} \qquad (7)$$
$$\mathrm{softplus}(x) = \ln(1 + e^x) \qquad (8)$$
The interpretable correlation parameters of each neuron are used to train the adaptive sparse layer, and the output of the adaptive sparse layer is then used to sparsify the network. From Equation (6), it can be seen that, as the training accuracy continues to improve, the neurons that play a positively correlated role are gradually highlighted, while the negatively correlated neurons are weakened. This processing greatly improves the generalization ability of the network. The adaptive sparse layer is a shallow neural network whose neuron outputs are passed through the sigmoid nonlinearity of Equation (9), which limits them to the interval (0, 1). Compared with the traditional 0-or-1 sparse method, this is a soft sparse mechanism. Owing to this neuron structure, the sparse metrics can be adaptively adjusted to different network structures.
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}} \qquad (9)$$
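A minimal sketch of the interpretable loss term of Equations (6)–(8) and the soft sparse gate of Equation (9) might look as follows. It assumes per-layer relevance scores are available; normalizing `phi` within each layer is our reading of the max in Equation (7).

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def phi(r):
    """Normalized Softplus, Eqs. (7) and (8); normalized here over the
    relevance scores of one layer (our reading of the max in Eq. (7))."""
    s = softplus(r)
    return s / s.max()

def interpretable_penalty(layer_relevances, test_acc, lam=0.1):
    """Interpretable loss term J_Q, Eq. (6): lam * (1 - A_t) * sum of phi(R)."""
    return lam * (1.0 - test_acc) * sum(phi(r).sum() for r in layer_relevances)

def sparse_gate(r):
    """Soft sparse value in (0, 1), Eq. (9), used as a per-neuron threshold."""
    return 1.0 / (1.0 + np.exp(-r))
```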
Compared with traditional sparse methods, the main advantages of the interpretable adaptive sparse deep neural networks proposed in this manuscript are as follows.
(1)
The sparse approach is a soft-threshold sparse approach;
(2)
Sparse indicators are interpretable;
(3)
It can be adaptively adjusted according to the structure of the network or on the data set.

4.2. Construction of the Multi-Level Federated Learning Framework

4.2.1. Multi-Level Sharing Mechanism

Based on the LRP values of the network, an Rw map is constructed. Taking the neural network shown in Figure 4 as an example, the construction process of the Rw map is as follows:
(1)
The Rw map consists of $n_l \times n_{l+1}$ pixels, where $n_l$ is the number of neurons in layer $l$ and $n_{l+1}$ is the number of neurons in layer $l + 1$;
(2)
The vertical axis corresponds to the neurons in layer $l$, and the horizontal axis to the neurons in layer $l + 1$. Each pixel in the map corresponds to the cross-value $R_{ik}^{(l,l+1)}$ of layers $l$ and $l + 1$;
(3)
The Rw map is then produced by coloring the pixels according to a red–blue heat map.
According to the definition of the LRP value, the deep red parts of the map correspond to network nodes that contribute significantly to the output, while the deep blue parts are negatively correlated with it. Figure 5 shows an example of the Rw map. The parameters in the deep red part contribute the most to the diagnostic results and are sent to the top level for parameter sharing. The middle-colored segment corresponds to the network parameters that contribute the second most and are shared at the secondary level. The network parameters corresponding to the deep blue section are negatively correlated with the diagnostic results and are therefore kept local and not shared with other clients.
The sharing rule follows directly from the LRP values: if $R_{ik}^{(l,l+1)} > R_\theta^+$, the corresponding network parameters are shared at the top level; if $R_\theta^- < R_{ik}^{(l,l+1)} < R_\theta^+$, they are shared at the secondary level; and if $R_{ik}^{(l,l+1)} < R_\theta^-$, they are kept local and not shared.
The resulting multi-level federated learning framework is shown in Figure 6. The sharing method is divided into three levels: network connections are classified by their LRP values into three categories according to their contribution to the prediction results, establishing a multi-level parameter sharing mechanism. A minimal sketch of this three-way partition is given below.
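The sketch below partitions a weight matrix into the three sharing levels according to its Rw map. The function name is ours; the thresholds $R_\theta^+$ and $R_\theta^-$ are passed in as `r_plus` and `r_minus`.

```python
import numpy as np

def partition_by_lrp(weights, rw, r_plus, r_minus):
    """Split a weight matrix into the three sharing levels by its Rw map.

    weights, rw : same-shaped arrays (connection weights and LRP values R_ik)
    Returns masked copies with zeros at unshared positions, matching the
    zero-filling used later in Algorithm 1.
    """
    top = np.where(rw > r_plus, weights, 0.0)                       # top level
    mid = np.where((rw > r_minus) & (rw <= r_plus), weights, 0.0)   # secondary
    local = np.where(rw <= r_minus, weights, 0.0)                   # kept local
    return top, mid, local
```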
Figure 7 shows the flow chart of the proposed i-MFN algorithm. Firstly, each client performs local training. Based on the network values, the LRP values of the network are calculated. Then the Rw map can be drawn. Finally, the parameters can be sent out for multi-level sharing based on the Rw map.

4.2.2. Network Parameter Update Strategy

Suppose there is a set $X = \{X_1, X_2, \ldots, X_L\}$, where $X_l = \{x_l^1, x_l^2, \ldots, x_l^N\}$ for $l = 1, \ldots, L$. Regrouping the elements at each position of $X_1, X_2, \ldots, X_L$ yields the new sets $\tilde{X}_n = \{x_1^n, x_2^n, \ldots, x_L^n\}$ for $n = 1, \ldots, N$. Define the mapping:
$$G(X_1, X_2, \ldots, X_L) = \{d_1, d_2, \ldots, d_N\}$$
The following conditions are met:
(1)
The set of all non-zero elements in $\tilde{X}_n = \{x_1^n, x_2^n, \ldots, x_L^n\}$ is $C_n = \{x_l^n \mid x_l^n \in \tilde{X}_n,\ x_l^n \neq 0\}$;
(2)
$d_n = \frac{1}{L}\sum_{l=1}^{L} x_l^n$.
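Condition (1) suggests that the zero-filled (unshared) positions are meant to be excluded from the average, so the following sketch of the mapping $G$ averages only the non-zero elements at each position; if the literal reading of condition (2) is preferred, divide by $L$ instead. Names are ours.

```python
import numpy as np

def g_mapping(*param_sets):
    """One reading of the mapping G: the position-wise average of the
    non-zero elements across the L parameter sets (the zero-filled,
    unshared positions of condition (1) are excluded). Positions that
    are zero in every set stay zero."""
    stacked = np.stack(param_sets)              # shape (L, N)
    counts = np.maximum((stacked != 0).sum(axis=0), 1)
    return stacked.sum(axis=0) / counts

# example: three clients sharing overlapping subsets of a 4-parameter vector
d = g_mapping(np.array([0.2, 0.0, 0.1, 0.0]),
              np.array([0.4, 0.3, 0.0, 0.0]),
              np.array([0.0, 0.5, 0.3, 0.0]))
print(d)  # [0.3, 0.4, 0.2, 0.0]
```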
Based on the parameter update strategy of the network, the pseudo-code of the i-MFN algorithm can be divided into two major parts: the server side and the client side.
Algorithm 1 shows the interpretable federated learning algorithm for the server side.
Algorithm 1: i-MFN algorithm for the server side
1. Input: LRP thresholds $R_\theta^+$ and $R_\theta^-$, learning rate $lr$, loss function $J$, maximum number of iterations $N_{train}$
2. Cloud server side:
3. Initialize the cloud parameters $\theta_0^{cloud}$
4. for $i = 1, 2, \ldots, N_{train}$ do
5.   Send $\theta_{i-1}^{cloud}$ of round $i - 1$ to each secondary level
6.   Wait for each secondary level to upload its network parameters $\{\theta_i^{k,cloud}\}_{k=1}^{N_{edge}}$
7.   Update $\theta_i^{cloud} \leftarrow G(\theta_i^{1,cloud}, \theta_i^{2,cloud}, \ldots, \theta_i^{N_{edge},cloud})$
8. end for
9. Edge server side:
10. for $i = 1, 2, \ldots, N_{train}$ do
11.   for $k = 1, 2, \ldots, N_{edge}$ do
12.     Download $\theta_{i-1}^{cloud}$ from the top-level server
13.     Wait for each subordinate client to upload its network parameters $\{\theta_i^j\}_{j=1}^{N_{client}^k}$ and its Rw map
14.     Based on the uploaded Rw maps, extract the top-level and secondary-level parameters of each client: $\theta_i^{j,cloud} \leftarrow \{\theta_i^j \mid R_w > R_\theta^+\}$ and $\theta_i^{j,edge} \leftarrow \{\theta_i^j \mid R_\theta^- < R_w < R_\theta^+\}$, with $\theta_i^j = \theta_i^{j,edge} \cup \theta_i^{j,cloud}$ and the unshared positions in $\theta_i^j$ filled with zeros; then aggregate $\theta_i^{k,client} \leftarrow G(\theta_i^{1,edge}, \theta_i^{2,edge}, \ldots, \theta_i^{N_{client}^k,edge}, \theta_{i-1}^{cloud})$ and $\theta_i^{k,cloud} \leftarrow G(\theta_i^{1,cloud}, \theta_i^{2,cloud}, \ldots, \theta_i^{N_{client}^k,cloud})$
15.     Upload $\theta_i^{k,cloud}$ to the top level and send $\theta_i^{k,client}$ to the secondary level
16.   end for
17. end for
18. Output: $\theta^{client}$
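Condensing lines 12–15 of Algorithm 1, one edge-server round might be sketched as follows. The helper names are hypothetical; `partition_by_lrp` and `g_mapping` are the sketches given earlier.

```python
def edge_round(cloud_params, client_uploads, rw_maps, r_plus, r_minus):
    """One edge-server round: split each client's upload by its Rw map,
    fuse the secondary-level parts locally, and forward the top-level parts."""
    top_parts, mid_parts = [], []
    for theta, rw in zip(client_uploads, rw_maps):
        top, mid, _ = partition_by_lrp(theta, rw, r_plus, r_minus)
        top_parts.append(top)    # destined for the top sharing level
        mid_parts.append(mid)    # fused at this secondary level
    theta_client = g_mapping(*mid_parts, cloud_params)  # sent back to clients
    theta_cloud = g_mapping(*top_parts)                 # uploaded to the cloud
    return theta_client, theta_cloud
```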
To better illustrate the parameter update process, the interpretable federated learning algorithm for the client side is shown as Algorithm 2.
Algorithm 2: i-MFN algorithm for the client side
1. Client side:
2. for $i = 1, 2, \ldots, N_{train}$ do
3.   for $k = 1, 2, \ldots, N_{edge}$ do
4.     Download $\theta_{i-1}^{k,client}$
5.     for $j = 1, 2, \ldots, N_{client}^k$ do
6.       Take $\theta_i^{k,j} \leftarrow G(\theta_{i-1}^{k,client}, \theta_{i-1}^{k,j})$ as the network parameters for this round of training
7.       for each training batch do
8.         $\theta_i^{k,j} \leftarrow \theta_i^{k,j} - lr \cdot \partial J / \partial \theta_i^{k,j}$
9.       end for
10.      Upload the client parameters $\theta_i^{k,j}$
11.    end for
12.  end for
13. end for

5. Simulation Experiments

5.1. Dataset Preparation

In order to evaluate the effectiveness of the proposed method in fault diagnosis, some experiments were carried out for validation. The experimental data is a publicly available motor bearing dataset from Case Western Reserve University. We used the proposed interpretable federated learning model for fault diagnosis validation.
The Case Western Reserve University rolling bearing dataset covers nine fault types: Ball Defect I (BDI), Ball Defect II (BDII), Ball Defect III (BDIII), Inner Ring Defect I (IRI), Inner Ring Defect II (IRII), Inner Ring Defect III (IRIII), Outer Ring Defect I (ORI), Outer Ring Defect II (ORII), and Outer Ring Defect III (ORIII). The sampling frequency of the vibration sensor is 12 kHz, and the motor speed is set to 1772 rpm, so about 12,000/(1772/60) ≈ 406 sampling points are collected per revolution. Three hundred samples were collected for each fault type; the sample states are shown in Table 1. The samples were processed using k-fold cross-validation to ensure that the experimental results are not due to chance.
Figure 8 shows the waveform of the above fault samples on the drive side.
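A sketch of how the vibration records might be cut into fixed-length training samples is given below; the 400-point segment length (roughly one revolution), the helper names, and the record length are our assumptions, not the authors’ preprocessing.

```python
import numpy as np

fs, rpm = 12_000, 1772                  # sampling rate (Hz), motor speed
print(fs * 60 / rpm)                    # ≈ 406 sampling points per revolution

def segment(signal, length=400, n_samples=300):
    """Cut a long vibration record into fixed-length training samples;
    400 points span roughly one shaft revolution at 1772 rpm."""
    starts = np.linspace(0, len(signal) - length, n_samples).astype(int)
    return np.stack([signal[s:s + length] for s in starts])

samples = segment(np.random.randn(120_000))   # e.g. a 10 s record
print(samples.shape)                          # (300, 400)
```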

5.2. Experimental Settings

5.2.1. Feature Extraction

Effective fault feature extraction from the collected data is a key step toward effective fault diagnosis. Following the method in [35], we extract fault features in the time and frequency domains and feed the extracted feature information into the diagnosis network as its input. The fault features used in this work are listed in Table 2 below.
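A sketch of a few of the Table 2 indicators is shown below. This is our own implementation; computing the frequency-domain features from the one-sided amplitude spectrum is an assumption.

```python
import numpy as np

def time_features(x):
    """A few of the Table 2 time-domain indicators."""
    abs_mean = np.mean(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    return {
        "absolute_mean": abs_mean,
        "variance": np.var(x),
        "crest_factor": np.max(np.abs(x)) / rms,
        "shape_factor": rms / abs_mean,
    }

def freq_features(x, fs):
    """Frequency-domain indicators from the one-sided amplitude spectrum."""
    X = np.abs(np.fft.rfft(x))
    w = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return {
        "average_frequency": np.sum(w * X) / np.sum(X),
        "crest": np.max(X),
        "kurtosis": np.mean(X ** 4),
        "mean_energy": np.mean(X),
    }
```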

5.2.2. Data IID Distribution

Under the IID setting, the data of each client is independently and identically distributed: each client contains the same fault types, and the dataset sizes do not differ significantly.

5.2.3. Data NOIID Distribution

In this experiment, the case where clients hold different types of fault data is called the NOIID data distribution. Data are assigned so that the distribution across clients is NOIID while the amount of data per client remains balanced. The actual assignment of fault types to individual clients is shown in Table 3.
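A sketch of how such a class-partitioned NOIID split could be produced follows; the helper name and the dictionary encoding of Table 3 are ours.

```python
from collections import defaultdict

def noiid_split(samples, labels, client_faults):
    """Assign whole fault classes to clients, following Table 3.
    client_faults example: {"Client1": [5, 6], "Client2": [1, 2], ...}"""
    by_label = defaultdict(list)
    for x, y in zip(samples, labels):
        by_label[y].append(x)
    return {client: [(x, y) for y in faults for x in by_label[y]]
            for client, faults in client_faults.items()}
```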

5.2.4. Rw Map

Based on the principles described in Section 4.2.1, a schematic of the Rw map for a four-layer network is shown in Figure 9a, and the decomposition of the Rw values into the corresponding sharing levels is shown in Figure 9b.
As Figure 9 shows, the diagnosis network has four layers, and the Rw map makes the importance of each neuron clearly visible.

5.3. Experiment and Analysis

5.3.1. Description of Experimental Comparison Conditions

(1)
Upper Limit: All clients’ data is used to train a neural network, and the result is used as an upper limit for the diagnostic accuracy of the diagnosis network.
(2)
Distributed IID: all clients are trained using only their own private IID data. Distributed-own denotes a fully independent structure in which each client is also tested on its own data. Distributed-all denotes a centralized test in which the test dataset is the union of all clients’ test sets.
(3)
Distributed NOIID: all clients are trained using only their own private NOIID data. Distributed-own denotes a fully independent structure in which each client is also tested on its own data. Distributed-all denotes a centralized test in which the test dataset is the union of all clients’ test sets.
(4)
i-MFN: Denotes the test accuracy of the algorithm proposed in the manuscript, and the test set is centralized.

5.3.2. Experiment for IID Data Distribution

Figure 10 and Table 4 show the experiment results for each experimental condition when the data is IID distributed.
When experiments were conducted with each client using the IID data distribution, the trend in accuracy for each client is shown in Figure 11. In these experiments, each client is tested on its own test set, except for the ClientX-to-All curves, whose test set is the merged test sets of all clients.
The test accuracy of the individual client and server sides are shown in Figure 12 when network parameters are federated.
It can be seen from the experimental results that, since the data obeys the IID distribution, each client’s data has almost the same features. As a result, local clients can learn a strong global fault recognition capability without sharing data, which is precisely why Distributed-own and Distributed-all achieve similar diagnostic accuracy. The Upper Limit method has access to all the datasets and can thus extract as much information as possible, making it the most powerful method. The i-MFN method shares only the weights that are important to the network, which effectively improves information exchange and, in turn, the training efficiency of the network.

5.3.3. Experiment for NOIID Data Distribution

Figure 13 and Table 5 show the experiment results for each condition when the data is under NOIID distribution.
The distribution of the data sets under NOIID distribution for each client is shown in Table 3. The server’s test set was composed of all clients’ test sets.
When experiments were conducted with each client using the NOIID data distribution with network parameters all-shared, the trend in accuracy rates for each client and server is shown in Figure 14.
When the experiments were conducted with each client using the NOIID data distribution with network parameters isolated, the trend in accuracy rates for each client and server is shown in Figure 15.
From the experimental results, it can be seen that, for NOIID data, local clients can learn only the features of their own data without parameter sharing. As a result, a local client network has little diagnostic ability when new fault types occur, which is why the diagnosis results are good in the Distributed-own case but poor in the Distributed-all case. Similarly, under the NOIID distribution, federated + average (Ave-FL) [37] cannot extract valid information well because of gradient interference, which prevents the server from performing effective gradient fusion and thus reduces diagnostic ability. The i-MFN extracts only the important network parameters for sharing and avoids global sharing, effectively reducing the gradient interference from other clients and thus greatly improving the overall diagnostic capability of the network.

5.3.4. Privacy Analysis

The Deep Leakage from Gradients (DLG) method proposed by Zhu et al. [31] is used to extract the original data information leaked by the shared gradients. Figure 16 shows the data information leaked through the full gradient information and through the proposed i-MFN algorithm at the top sharing level, respectively.
We found that, as the shared gradient information decreases, the amount of information that can be extracted from the gradients decreases until nothing can be extracted at all. With the proposed algorithm, the shared gradients are filtered, so data leakage from the client can be effectively avoided. The main advantages of the i-MFN are as follows (a sketch of the gradient masking follows the list).
(1)
The most valuable weights selected by the interpretable indicators ensure that the diagnostic capabilities are effectively transmitted between clients;
(2)
The gradients involved in sharing are not sufficient to cause a data leakage.
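As a sketch of point (2), a client could zero the locally retained positions before uploading, so that any single sharing level observes only a fragment of the gradient. The function name and interface are ours.

```python
import numpy as np

def masked_upload(grad, rw, r_minus):
    """Zero the locally retained positions (Rw below the lower threshold)
    before uploading, so any single sharing level observes only a fragment
    of the client's gradient."""
    return np.where(rw > r_minus, grad, 0.0)
```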

5.3.5. Experimental Analysis

A comparison of the performance of several algorithms is shown in Table 6.
From the experimental results, the following conclusions can be drawn:
(1)
For data under the IID distribution: Table 4 shows little difference between the average diagnostic accuracy of each client and the overall accuracy in the Distributed-own and Distributed-all cases. The main reason is that the data distributions of the individual clients are similar, so there is almost no difference between the dataset distributions of Distributed-own and Distributed-all, and the federated results therefore do not differ significantly from the centralized ones.
(2)
For data under the NOIID distribution: the test accuracy is 98.72% for Upper Limit, 99.56% for Distributed-own, 29.50% for Distributed-all, and 27.67% for Ave-FL. In addition, the accuracy is 91.32% for α-FedAvg (α = 25%), 92.89% for CEC + Average, 77.72% for Federated + SKF, and 96.69% for i-MFN. Distributed-own and Distributed-all differ significantly because the data distribution varies greatly between the individual clients. α-FedAvg and CEC + Average reduce parameter interference by reducing the number of shared parameters, but they suffer from randomness and other limitations. In contrast, the i-MFN proposed in this paper is highly effective in learning the data characteristics across NOIID clients: each client learns the characteristics of the other clients’ data well, so the overall diagnostic accuracy of the federated network is improved.
(3)
For the protection of data privacy: the DLG method proposed by Zhu et al. [31] is a privacy-data acquisition algorithm similar to an adversarial neural network and is highly effective at recovering complete raw data from shared gradients. However, the multi-level sharing mechanism proposed in this paper effectively blocks the theft of private data by this method, because an unfriendly client obtains only a small fraction of the gradient information at any single time, and the positions of those gradients are uncertain. The main reasons can be summarized as follows: 1. We have experimentally verified that the DLG method needs to retain all gradient information in order to recover the original data effectively; this is why both Ave-FL and Federated + SKF can lead to data leakage, since they share all gradient information during parameter aggregation. Even randomly setting the value of a particular gradient to zero significantly degrades the data recovery. 2. Based on the above analysis, we divide the gradient parameters into multiple sharing levels through the LRP interpretability metric. By disrupting the gradient information in this way, the secrecy of the data is effectively guaranteed.

6. Conclusions

In this paper, an interpretable multi-level federated learning network is proposed to address the black-box problem of general federated networks. A multi-level federated network constructed from an interpretable index allows datasets under the NOIID distribution to maintain good diagnostic accuracy in FL, while the confidentiality of the network is enhanced. How to build the federated network structure of a multi-level federated center still needs further exploration in future research.

Author Contributions

S.W. and Y.Z. wrote the manuscript. S.W. wrote the algorithmic program and performed the experiments; Y.Z. conceived and supervised the research and experiments, contributed as the lead author of the article, and analyzed and audited the data. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (no. 61673259), by the Shanghai “Science and Technology Innovation Action Plan” Hong Kong, Macao and Taiwan Science and Technology Cooperation Project (no. 21510760600), and by the Capacity Building Project of Local Colleges and Universities of Shanghai (no. 21010501900).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data contributions presented in the study are included in the article, and further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Xu, L.; Chatterton, S.; Pennacchi, P. A Novel Method of Frequency Band Selection for Squared Envelope Analysis for Fault Diagnosing of Rolling Element Bearings in a Locomotive Powertrain. Sensors 2018, 18, 4344.
2. Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Li, N.; Nandi, A.K. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Process. 2020, 138, 106587.
3. Schmid, M.; Gebauer, E.; Hanzl, C.; Endisch, C. Active Model-Based Fault Diagnosis in Reconfigurable Battery Systems. IEEE Trans. Power Electron. 2021, 36, 2584–2597.
4. Wang, X.; Wang, Z.; Xu, Z.; Cheng, M.; Wang, W.; Hu, Y. Comprehensive Diagnosis and Tolerance Strategies for Electrical Faults and Sensor Faults in Dual Three-Phase PMSM Drives. IEEE Trans. Power Electron. 2019, 34, 6669–6684.
5. Fu, S.; Qiu, J.; Chen, L.; Chadli, M. Adaptive Fuzzy Observer-Based Fault Estimation for a Class of Nonlinear Stochastic Hybrid Systems. IEEE Trans. Fuzzy Syst. 2022, 30, 39–51.
6. Nguyen, N.P.; Huynh, T.T.; Do, X.P.; Mung, N.X.; Hong, S.K. Robust Fault Estimation Using the Intermediate Observer: Application to the Quadcopter. Sensors 2020, 20, 4917.
7. Rai, A.; Upadhyay, S.H. A review on signal processing techniques utilized in the fault diagnosis of rolling element bearings. Tribol. Int. 2016, 96, 289–306.
8. Zhang, Y.; Zhang, Z.; Chen, L.; Wang, X. Reinforcement Learning-Based Opportunistic Routing Protocol for Underwater Acoustic Sensor Networks. IEEE Trans. Veh. Technol. 2021, 70, 2756–2770.
9. Zhang, Y.; Kong, L. Photovoltaic power prediction based on hybrid modeling of neural network and stochastic differential equation. ISA Trans. 2021, early access, 1–26.
10. Namigtle-Jiménez, A.; Escobar-Jiménez, R.; Gómez-Aguilar, J.; García-Beltrán, C.; Téllez-Anguiano, A. Online ANN-based fault diagnosis implementation using an FPGA: Application in the EFI system of a vehicle. ISA Trans. 2020, 100, 358–372.
11. Wei, Y.; Wu, D.; Terpenny, J. Robust Incipient Fault Detection of Complex Systems Using Data Fusion. IEEE Trans. Instrum. Meas. 2020, 69, 9526–9534.
12. Lei, Y.; Jia, F.; Lin, J.; Xing, S.; Ding, S.X. An Intelligent Fault Diagnosis Method Using Unsupervised Feature Learning Towards Mechanical Big Data. IEEE Trans. Ind. Electron. 2016, 63, 3137–3147.
13. Zhang, Y.; Li, P.; Wang, X. Intrusion Detection for IoT Based on Improved Genetic Algorithm and Deep Belief Network. IEEE Access 2019, 7, 31711–31722.
14. Wen, L.; Li, X.; Gao, L. A New Two-Level Hierarchical Diagnosis Network Based on Convolutional Neural Network. IEEE Trans. Instrum. Meas. 2020, 69, 330–338.
15. Zhang, Y.; Liu, Q. On IoT intrusion detection based on data augmentation for enhancing learning on unbalanced samples. Futur. Gener. Comput. Syst. 2022.
16. Wang, J.; Li, S.; Han, B.; An, Z.; Xin, Y.; Qian, W.; Wu, Q. Construction of a batch-normalized autoencoder network and its application in mechanical intelligent fault diagnosis. Meas. Sci. Technol. 2019, 30, 015106.
17. Cai, B.; Liu, Y.; Xie, M. A Dynamic-Bayesian-Network-Based Fault Diagnosis Methodology Considering Transient and Intermittent Faults. IEEE Trans. Autom. Sci. Eng. 2017, 14, 276–285.
18. Liu, P.; Liu, Y.; Cai, B.; Wu, X.; Wang, K.; Wei, X.; Xin, C. A dynamic Bayesian network based methodology for fault diagnosis of subsea Christmas tree. Appl. Ocean Res. 2020, 94, 101990.
19. Lin, J.; Yu, W.; Zhang, N.; Yang, X.; Zhang, H.; Zhao, W. A Survey on Internet of Things: Architecture, Enabling Technologies, Security and Privacy, and Applications. IEEE Internet Things J. 2017, 4, 1125–1142.
20. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.Y. Communication-efficient learning of deep networks from decentralized data. arXiv 2017, arXiv:1602.05629.
21. Zhang, J.; Wang, Y.; Zhu, K.; Zhang, Y.; Li, Y. Diagnosis of Interturn Short-Circuit Faults in Permanent Magnet Synchronous Motors Based on Few-Shot Learning Under a Federated Learning Framework. IEEE Trans. Ind. Inform. 2021, 17, 8495–8504.
22. Liu, W.; Chen, L.; Chen, Y.; Zhang, W. Accelerating Federated Learning via Momentum Gradient Descent. IEEE Trans. Parallel Distrib. Syst. 2020, 31, 1754–1766.
23. Chen, M.; Poor, H.V.; Saad, W.; Cui, S. Convergence Time Optimization for Federated Learning Over Wireless Networks. IEEE Trans. Wirel. Commun. 2021, 20, 2457–2471.
24. Amiri, M.M.; Gunduz, D. Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air. IEEE Trans. Signal Process. 2020, 68, 2155–2169.
25. Saha, R.; Misra, S.; Deb, P.K. FogFL: Fog-Assisted Federated Learning for Resource-Constrained IoT Devices. IEEE Internet Things J. 2021, 8, 8456–8463.
26. Hao, M.; Li, H.; Luo, X.; Xu, G.; Yang, H.; Liu, S. Efficient and Privacy-Enhanced Federated Learning for Industrial Artificial Intelligence. IEEE Trans. Ind. Inform. 2020, 16, 6532–6542.
27. Li, B.; Wu, Y.; Song, J.; Lu, R.; Li, T.; Zhao, L. DeepFed: Federated Deep Learning for Intrusion Detection in Industrial Cyber–Physical Systems. IEEE Trans. Ind. Inform. 2021, 17, 5615–5624.
28. Sattler, F.; Muller, K.-R.; Samek, W. Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 3710–3722.
29. Chen, Y.; Sun, X.; Jin, Y. Communication-Efficient Federated Deep Learning With Layerwise Asynchronous Model Update and Temporally Weighted Aggregation. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4229–4238.
30. Kim, H.; Park, J.; Bennis, M.; Kim, S.-L. Blockchained On-Device Federated Learning. IEEE Commun. Lett. 2020, 24, 1279–1283.
31. Zhu, L.; Liu, Z.; Han, S. Deep leakage from gradients. Adv. Neural Inf. Process. Syst. 2019, arXiv:1906.08935.
32. Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.R.; Samek, W. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS ONE 2015, 10, e0130140.
33. Grezmak, J.; Zhang, J.; Wang, P.; Loparo, K.A.; Gao, R.X. Interpretable Convolutional Neural Network Through Layer-wise Relevance Propagation for Machine Fault Diagnosis. IEEE Sens. J. 2020, 20, 3172–3181.
34. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
35. Xia, M.; Li, T.; Xu, L.; Liu, L.; De Silva, C.W. Fault Diagnosis for Rotating Machinery Using Multiple Sensors and Convolutional Neural Networks. IEEE/ASME Trans. Mechatron. 2017, 23, 101–110.
36. Ma, X.; Wen, C.; Wen, T. An Asynchronous and Real-Time Update Paradigm of Federated Learning for Fault Diagnosis. IEEE Trans. Ind. Inform. 2021, 17, 8531–8540.
37. Wang, S.; Tuor, T.; Salonidis, T.; Leung, K.K.; Makaya, C.; He, T.; Chan, K. Adaptive Federated Learning in Resource Constrained Edge Computing Systems. IEEE J. Sel. Areas Commun. 2019, 37, 1205–1221.
38. Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated learning with non-IID data. arXiv 2018, arXiv:1806.00582.
39. Xue, M.; Chenglin, W. An Asynchronous Quasi-Cloud/Edge/Client Collaborative Federated Learning Mechanism for Fault Diagnosis. Chin. J. Electron. 2021, 30, 969–977.
Figure 1. IoT federated learning architecture.
Figure 2. LRP propagation process. (a) Forward propagation of neural networks. (b) Calculating LRP in the backward direction.
Figure 3. The interpretable adaptive sparse deep neural networks.
Figure 4. The Rw map schematic. (a) Network parameter LRP values graph. (b) Correspondence between pixels and LRP values.
Figure 5. An example of the Rw map.
Figure 6. Diagram of the federated learning framework based on network interpretable parameters.
Figure 7. The flow chart of the proposed i-MFN algorithm.
Figure 8. The waveforms of nine types of fault samples. (a–i) correspond to the waveforms of Labels 1–9 of Table 1, respectively.
Figure 9. Rw map. (a) Schematic diagram of a four-layer network Rw map. (b) Layered diagram of the multi-level of a four-layer network.
Figure 10. Experimental accuracy comparison under IID data distribution.
Figure 11. Accuracy rate trend under IID data distribution with network parameters isolated.
Figure 12. Accuracy rate trend of i-MFN under IID data distribution with network parameters federated.
Figure 13. Experimental accuracy comparison under NOIID data distribution.
Figure 14. Accuracy rate trend of i-MFN under NOIID data distribution with network parameters federated.
Figure 15. Accuracy rate trend under NOIID data distribution with network parameters isolated.
Figure 16. Client data information leakage. (a) Client data information leaked by the full gradient information. (b) Client data information leaked by the i-MFN algorithm at the top sharing level.
Table 1. Data information on nine types of fault states.

| Fault Type | Fault Diameter (inch) | Label | Sample Size |
|---|---|---|---|
| BDI | 0.007 | 1 | 300 |
| BDII | 0.014 | 2 | 300 |
| BDIII | 0.021 | 3 | 300 |
| IRI | 0.007 | 4 | 300 |
| IRII | 0.014 | 5 | 300 |
| IRIII | 0.021 | 6 | 300 |
| ORI | 0.007 | 7 | 300 |
| ORII | 0.014 | 8 | 300 |
| ORIII | 0.021 | 9 | 300 |
Table 2. Features used in the experiment.

| Feature | Expression | Domain |
|---|---|---|
| Absolute mean | $\frac{1}{n}\sum_{i=1}^{n}\lvert x_i \rvert$ | Time domain |
| Variance | $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ | Time domain |
| Clearance factor | $\max(\lvert x_i \rvert) \big/ \left(\frac{1}{n}\sum_{i=1}^{n}\sqrt{\lvert x_i \rvert}\right)^2$ | Time domain |
| Crest factor | $\max(\lvert x_i \rvert) \big/ \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$ | Time domain |
| Shape factor | $\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} \big/ \left(\frac{1}{n}\sum_{i=1}^{n}\lvert x_i \rvert\right)$ | Time domain |
| Average frequency | $\left(\sum_{i=1}^{n}\omega_i X_i\right) \big/ \sum_{i=1}^{n} X_i$ | Frequency domain |
| Crest | $\max(\lvert X_i \rvert)$ | Frequency domain |
| Kurtosis | $\frac{1}{n}\sum_{i=1}^{n} X_i^4$ | Frequency domain |
| Mean energy | $\frac{1}{n}\sum_{i=1}^{n} X_i$ | Frequency domain |
| Variance | $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ | Frequency domain |
Table 3. Distribution of client data.

| Client No. | Fault Type | Label |
|---|---|---|
| Client1 | IRII, IRIII | 5, 6 |
| Client2 | BDI, BDII | 1, 2 |
| Client3 | BDII, BDIII | 2, 3 |
| Client4 | BDII, BDIII | 2, 3 |
| Client5 | BDIII, IRI | 3, 4 |
| Client6 | BDIII, IRI | 3, 4 |
| Client7 | BDI, ORIII | 1, 9 |
| Client8 | ORI, ORII | 7, 8 |
Table 4. Comparison of diagnostic accuracy under IID distribution.

| Client | Upper Limit | Distributed-Own | Distributed-All | i-MFN |
|---|---|---|---|---|
| C1 | 98.50% | 94.00% | 95.00% | 98.00% |
| C2 | 98.00% | 98.00% | 96.88% | 99.00% |
| C3 | 98.50% | 97.00% | 95.88% | 98.00% |
| C4 | 98.75% | 96.00% | 96.88% | 97.00% |
| C5 | 99.25% | 95.00% | 95.63% | 97.00% |
| C6 | 99.00% | 95.00% | 96.13% | 94.00% |
| C7 | 97.75% | 99.00% | 96.50% | 99.00% |
| C8 | 100.00% | 97.00% | 97.38% | 94.00% |
| Mean | 98.72% | 96.38% | 96.28% | 97.00% |
Table 5. Comparison of diagnostic accuracy under NOIID distribution.

| Client | Upper Limit | Distributed-Own | Distributed-All | i-MFN |
|---|---|---|---|---|
| C1 | 98.50% | 100.00% | 13.00% | 99.50% |
| C2 | 98.00% | 98.00% | 31.00% | 90.00% |
| C3 | 98.50% | 99.00% | 43.00% | 98.00% |
| C4 | 98.75% | 99.50% | 43.00% | 98.00% |
| C5 | 99.25% | 100.00% | 38.00% | 99.00% |
| C6 | 99.00% | 100.00% | 38.00% | 99.50% |
| C7 | 97.75% | 100.00% | 19.00% | 89.50% |
| C8 | 100.00% | 100.00% | 13.00% | 100.00% |
| Mean | 98.72% | 99.56% | 29.50% | 96.69% |
Table 6. Comparison of diagnostic accuracy of different algorithms.

| Method | NOIID | Deep Leakage from Gradients |
|---|---|---|
| Upper Limit | 98.72% | - |
| Distributed-own | 99.56% | Yes |
| Distributed-all [36] | 29.50% | No |
| Federated + SKF [36] | 77.72% | Yes |
| Ave-FL [37] | 27.67% | Yes |
| α-FedAvg (α = 25%) [38] | 91.32% | No |
| CEC + Average [39] | 92.89% | No |
| i-MFN | 96.69% | No |

Note: “-”: the dataset of the Upper Limit itself is shared, so there is no problem of “deep leakage from gradients”.