A Fault Aware Broad Learning System for Concurrent Network Failure Situations

The broad learning system (BLS) framework provides an efficient solution for training flat-structured feedforward networks. However, the classical BLS model and its variants focus on the faultless situation only, where the enhancement nodes, feature mapped nodes, and output weights of a BLS network are assumed to be realized perfectly. When a trained BLS network suffers from coexisting weight/node failures, its performance degrades greatly if no countermeasure is taken. To reduce the effect of weight/node failures on a BLS network's performance, this paper proposes an objective function for enhancing the fault aware performance of BLS networks. The objective function contains a fault aware regularizer term that handles the weight/node failures. A learning algorithm is then derived from the objective function. Simulation results show that the proposed fault aware BLS (FABLS) algorithm is superior to the classical BLS and two state-of-the-art BLS algorithms, namely the correntropy criterion BLS (CBLS) and the weighted BLS (WBLS).


I. INTRODUCTION
Without doubt, the human brain is capable of handling fault and noise situations [1]. For instance, a human being can recognize a partially occluded object without much difficulty. Moreover, the human brain can still operate with a certain ability when a small number of synapses or neurons malfunction. As the idea of artificial neural networks (ANNs) is based on the human brain, it was presumed that a trained ANN should have an inherent ability to withstand weight/node failures. In the early days of ANN research, a common misconception was that ANNs have an intrinsic tolerance against any form of failure, such as weight noise, node noise, and open fault. In reality, if ANNs suffer from weight or node failures without any fault aware measures in place, their performance degrades greatly [2], [3].
When a trained ANN is realized in an electronic system, weight/node failures are not avoidable [4], [5]. For instance, when digital technology is employed to represent the weights of a trained ANN, there are precision errors, and these errors can be modelled as multiplicative noise. Also, when analog technology is used, hardware components have precision errors and thermal noise. Furthermore, damage such as open weight faults or open node faults may be introduced when the communication link between two nodes is broken [6]. Besides, when realizing an ANN with very large scale integration (VLSI) at the nano-scale level, transient noise or failures are unavoidable during operation [7], [8].

The associate editor coordinating the review of this manuscript and approving it for publication was Qichun Zhang.
One of the basic requirements of many electronic systems is dependability. Many safety-critical systems use complex ANNs, such as deep neural networks (DNNs), and such systems are expected to have high dependability. Hence, it is crucial to embed good fault aware techniques in such systems, such as built-in self-test and error detection/correction coding [9].
To enhance the behaviour of a well-trained ANN under failure situations, it is paramount to figure out analytically how faults affect the behavior of a trained ANN. There are related studies on the fault resistant ability of ANNs. Recently, in [10], empirical results on the fault tolerant ability of a small convolutional neural network (CNN) were presented. Furthermore, a survey paper reported the effects of such imperfect conditions on various neural network models, including feedforward networks and radial basis function (RBF) networks [11], [12]. For RBF networks, fault aware ability and fault aware training algorithms were extensively investigated in [3], [4]. The broad learning system (BLS) [13], [14] concept is very efficient for constructing flat-structured ANNs, and BLS networks are universal approximators [15]. In addition, due to its good properties, the BLS concept has attracted much attention [16]-[20]. Although many BLS results have been reported, commonly available BLS algorithms focus on the faultless situation. To the best of our knowledge, there are few works related to fault tolerant BLS networks.

VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This paper studies the effect of coexisting weight/node failures on BLS [13]. It first examines the behaviour of the original BLS model under the coexistence of several faults/noise, including open weight fault on the output weights, multiplicative noise on the nodes, and multiplicative noise on the output weights. To compensate for the degradation caused by the aforementioned noise/faults, an objective function for enhancing the fault aware performance of BLS is developed. The objective function contains a fault aware regularizer term. A learning algorithm for minimizing the fault aware objective function is proposed, called the fault aware BLS (FABLS) algorithm. Several real-life datasets from the UCI repository [21] are used to verify the effectiveness of the proposed FABLS algorithm. The simulation results show that the proposed FABLS is superior to the original BLS [13] and two state-of-the-art algorithms, namely the weighted BLS (WBLS) [19] and the correntropy criterion based BLS (CBLS) [20].
In summary, our major contributions are as follows.
• This paper investigates the effect of the coexistence of several faults/noise on the performance of a BLS network.
• A fault aware objective function is proposed. The objective function contains a fault aware regularizer term which mitigates the effects of weight/node failures.
• Based on the fault aware objective function, the fault aware BLS (FABLS) algorithm is derived.
• From the simulation results, the proposed FABLS is superior to the comparison BLS algorithms.
The paper is organized as follows. Section II discusses related works. Section III presents the background on the traditional broad learning system (BLS) network and model. Section IV describes the concurrent weight failure situation and the weight failure models. The fault aware objective function and the learning algorithm are developed in Section V. Section VI uses several real-life datasets to demonstrate the superiority of the proposed FABLS algorithm. Finally, Section VII provides concluding remarks. Table 1 lists the key notations used in this paper.

II. RELATED WORKS
A. FAULT TOLERANCE IN ANNs
One fascinating characteristic of ANNs is fault tolerance. However, if proper learning steps are not adopted, a trained ANN usually has poor fault tolerant ability. In the last two decades, various results on the fault tolerant ability of ANNs were reported. For instance, the common node fault situations are stuck-at-zero and stuck-at-one [6], [22]-[24]. For stuck-at-zero, the output of a faulty node is stuck at zero. For stuck-at-one, the output of a faulty node is stuck at one. Another kind of ''stuck-at'' fault is ''stuck-at-any-value'', in which the output of a faulty node is stuck at some value within the output range of the activation function. This situation was analyzed in [25]. Apart from node faults, the connection weights in an ANN may also fail [26]-[28]. In the open weight fault, the connection between two nodes is disconnected. In the weight noise case, due to technology limitations, there are precision errors/noise in the realization. The commonly used model to describe the behaviour of weight noise is the multiplicative weight noise model [2], [3], [26].
There are many methods to handle node/weight failures. Some approaches inject random failures during training [22], [24], [25]. The demerit of this approach is that a large number of training cycles is required to achieve a certain performance; otherwise, the statistical information of the faults cannot be captured. Another common approach for protecting against faults is the replication approach [8], [29]-[31]. In the replication approach, a number of copies of a trained network are replicated, and the overall output is taken as the median of the outputs of the replicated networks. The disadvantage of this approach is that it requires extra resources. Besides, a median circuit is more complicated than a traditional linear neuron.
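The replication idea above can be sketched in a few lines. This is only an illustration under assumed conditions: a scalar network output, an open fault that zeroes a replica's output, and an illustrative fault rate; the helper `faulty_output` is hypothetical, not from the paper.

```python
import numpy as np

# Sketch of the replication approach: a trained network is replicated, each
# replica is independently subject to an open fault, and the overall output
# is the median of the replica outputs. All names and values are illustrative.
rng = np.random.default_rng(0)

def faulty_output(clean_output, fault_rate, rng):
    """Return the clean output, or 0.0 if an open fault strikes this replica."""
    return 0.0 if rng.random() < fault_rate else clean_output

clean = 1.0                      # the fault-free network output
replicas = [faulty_output(clean, fault_rate=0.2, rng=rng) for _ in range(5)]
robust = np.median(replicas)     # the median masks a minority of faulty replicas
```

As long as fewer than half of the replicas are faulty, the median equals the clean output, which is exactly the protection the extra hardware buys.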
There are other approaches which focus on weight noise/faults or a combination of two failures in an ANN. For instance, the works in [26]-[28] investigated the effect of failure situations on the performance of faulty radial basis function (RBF) networks. Parra and Catala in [32] showed how a fault tolerant RBF network can be obtained based on the weight decay regularizer [33]. Furthermore, from the perspective of model sensitivity, an explicit regularizer approach to obtain a fault tolerant multi-layer perceptron (MLP) [34]-[36] or RBF network [37], [38] was proposed. This approach can tolerate multiplicative weight noise. Recently, the failure tolerance ability of the extreme learning machine (ELM) was studied [12], [39]-[42].
Additionally, in [10], the fault tolerant ability of a convolutional neural network (CNN) was investigated. In [43], fault tolerant methods based on diversified learning were proposed to improve the dependability of deep neural networks (DNNs). Moreover, in [44], Xu et al. studied the feasibility of developing fault-tolerant deep learning systems based on model redundancy.
Recent literature also indicates that soft errors are unavoidable in modern electronic systems, from edge computing devices to supercomputers [45], [46], due to multiple factors [47] such as device aging, worn devices, and high-energy cosmic radiation. In [48], an algorithm for improving the fault tolerance of CNNs was proposed.
Furthermore, Leung et al. [49] investigated the effect of open weight faults and multiplicative noise on the performance of bidirectional associative memories (BAMs). Moreover, [50] provides a literature review on fault and noise tolerance in neural networks. For the principles and concepts of fault tolerant systems, interested readers are referred to [51]-[54].

B. BLS
The BLS concept [13] is a recent technique for the broad connection of the hidden nodes of a neural network. As shown in Fig. 1, these nodes are arranged in a broad sense. In a BLS network, there are two types of hidden nodes: feature mapped nodes and enhancement nodes.
The feature mapped nodes form a layer, namely the feature mapped node layer. The input connection weights of these nodes are constructed based on auto-encoder learning. That means the feature mapped node layer is designed for feature extraction. The outputs of the feature mapped layer are connected to the output layer and to another layer, called the enhancement node layer. The enhancement nodes receive their inputs from the feature nodes. In the original BLS learning, the connection weights between the feature nodes and the enhancement nodes are generated in a random manner.
As shown in Fig. 1, the output layer collects the information from the feature node layer and the enhancement node layer. In other words, the outputs of the feature nodes and the enhancement nodes are concatenated in a wide manner and fed to the output layer. The output weights can be obtained by a least squares technique based on the Moore-Penrose generalized inverse [55], [56]. The overall training cost of BLS is not excessive and is much lower than that of other classical training algorithms, such as meta-heuristic techniques (differential evolution [57] and particle swarm optimization [58]) and gradient descent methods [59].
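The least-squares output-weight step can be sketched as follows. This is a minimal illustration, assuming the concatenated hidden-node outputs are already collected in a matrix `A` (a random stand-in here) and the targets in `y`.

```python
import numpy as np

# Minimal sketch of the output-weight computation: with the hidden-node outputs
# in a matrix A and the targets in y, the least-squares output weights follow
# from the Moore-Penrose pseudo-inverse. A and y are illustrative stand-ins.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))   # N x (number of hidden nodes) output matrix
beta_true = rng.standard_normal(10)
y = A @ beta_true                    # targets generated by a known weight vector

beta = np.linalg.pinv(A) @ y         # least-squares output weights
```

Because only this linear system is solved (the hidden-layer weights are learned separately or generated randomly), training cost stays far below iterative whole-network optimization.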
As the BLS framework provides efficient ways of constructing a flat-structured ANN, and a BLS network has the universal approximation ability [13], [14], the BLS framework has attracted increasing attention. Many variants of BLS models and algorithms have been proposed to enhance its performance [16], [60], [61]. Besides, many applications of BLS have been reported in the last several years.
In [16], a hybrid model of BLS and the fuzzy concept was proposed, in which the feature nodes are replaced with fuzzy subsystems. In [18], the BLS system is applied to fault diagnosis in rotor systems. In [62], a BLS model was proposed for object detection using the event data obtained from event cameras; in this model, the incremental learning concept was also used for constructing the resultant BLS network. In [63], the BLS concept was used for processing hyperspectral images. Apart from applications, there are modifications of the learning algorithm [19], [20]. In [19], the weighted BLS (WBLS) algorithm was proposed to compensate for the effect of noise and outliers. In [20], the correntropy criterion based BLS (CBLS) was proposed for handling outlier data based on the maximum correntropy criterion (MCC).
Although the BLS concept is quite a promising framework, the original BLS and other variants focus on the faultless situation only, where a BLS network is assumed to be perfectly implemented. To the best of our knowledge, there are few works related to fault tolerant BLS networks. This paper studies the behaviour of the original BLS when it is concurrently affected by node noise, weight noise, and open weight faults. Also, a fault aware objective function is derived. Furthermore, based on the proposed objective function, a fault aware BLS (FABLS) algorithm is proposed to mitigate the effect of network failures.

III. BLS NETWORKS AND NETWORK FAILURE
A. OPERATION OF BLS NETWORKS
This subsection gives an introduction to the operation of a BLS network. Consider a BLS network designed for solving a nonlinear regression problem. Let x ∈ R^D and o ∈ R be the input data and the output of the BLS network, respectively. For clear presentation, this paper denotes the input x augmented with 1 as χ = [x^T, 1]^T.

1) FEATURE MAPPED NODES
In the BLS network, the feature mapped nodes are grouped into n groups. Each group is used to extract different features. The i-th group has f_i feature mapped nodes. Hence, the total number of feature mapped nodes is f = Σ_{i=1}^{n} f_i. For the i-th group, there is a learned projection matrix Φ_i ∈ R^{f_i × (D+1)}. The feature mapped nodes are employed to explore the hidden features of the input data.
The mapped features g_i of the i-th group are obtained by projecting the input data with the matrix Φ_i, i.e., g_i = Φ_i χ = [g_{i,1}, …, g_{i,f_i}]^T, where g_{i,u} is the u-th feature of the i-th group. The construction of the Φ_i's is established via a sparse optimization step. This can be achieved in several ways, such as solving a sparse optimization problem with the alternating direction method of multipliers (ADMM) [64]. The construction procedure of the Φ_i's is presented in Subsection III-B1. It should be noticed that a nonlinear operation could be applied to the g_i's. In the classical BLS framework, no nonlinear operation is applied to the g_i's, and this paper follows that convention.
All the outputs of the feature mapped nodes are collected together as g = [g_1^T, …, g_n^T]^T, and the augmented vector of g is denoted as q = [g^T, 1]^T.

2) ENHANCEMENT NODES
The BLS network has m groups of enhancement nodes. The j-th group of enhancement nodes has e_j nodes, so the total number of enhancement nodes is e = Σ_{j=1}^{m} e_j. The output of the j-th group of enhancement nodes is given by h_j = ξ(W_j q), where j = 1, …, m and W_j is a randomly generated weight matrix whose elements are denoted w_{j,v,τ}. Here ξ(·) is the activation function of the enhancement nodes. In general, each group of enhancement nodes can have its own activation function. The original BLS algorithm employs the hyperbolic tangent as the activation function for all enhancement nodes, and this paper does the same. Let η = [h_1^T, …, h_m^T]^T be the collection of all the enhancement node outputs.

3) NETWORK OUTPUT
For a given input vector x, the output of the network is o = [g^T, η^T] β, where β is the output weight vector. The number of elements in β is equal to f + e, and its components are denoted β_l, l = 1, …, f + e.
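The forward pass described above can be sketched as follows. This is an illustrative sketch: the sizes are arbitrary, and random matrices stand in for both the learned projection matrix and the random enhancement weights (a single feature group and a single enhancement group are used for brevity).

```python
import numpy as np

# Compact BLS forward pass: feature mapped nodes g = Phi @ chi (linear),
# enhancement nodes eta = tanh(W @ q), output o = beta . [g; eta].
# Shapes and all weight values are illustrative stand-ins.
rng = np.random.default_rng(2)
D, f, e = 4, 6, 5                    # input dim, feature nodes, enhancement nodes
Phi = rng.standard_normal((f, D + 1))   # stand-in for the learned projection
W = rng.standard_normal((e, f + 1))     # random enhancement-node weights
beta = rng.standard_normal(f + e)       # output weight vector

x = rng.standard_normal(D)
chi = np.append(x, 1.0)              # augmented input [x; 1]
g = Phi @ chi                        # feature mapped node outputs (linear)
q = np.append(g, 1.0)                # augmented feature vector [g; 1]
eta = np.tanh(W @ q)                 # enhancement node outputs
o = beta @ np.concatenate([g, eta])  # scalar network output
```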

B. CONSTRUCTION OF WEIGHT MATRICES AND VECTORS
Consider that there are N training pairs in the training set T = {(x_k, y_k): k = 1, …, N}, where x_k is the k-th training input vector and y_k is the corresponding target output. First, the training data matrix is formed by packing all the augmented inputs χ_k = [x_k^T, 1]^T together. The augmented data matrix is given by X = [χ_1, …, χ_N]^T ∈ R^{N×(D+1)}.

1) CONSTRUCTION OF THE PROJECTION MATRIX Φ_i
For each group of feature mapped nodes, a vital issue is how to construct the projection matrix Φ_i. The technique described here is based on the procedures in [13], [60], [65].
In BLS, a random matrix P_i ∈ R^{(D+1)×f_i} is generated first for each group of feature mapped nodes. Afterwards, a random projection data matrix Q_i = X P_i is obtained. The projection matrix Φ_i is the solution of the sparse approximation problem

min_{Φ_i} ||Q_i Φ_i − X||² + ρ ||Φ_i||_1,  (13)

where the term ρ||Φ_i||_1 enforces the solution of (13) to be a sparse matrix, and ρ is a regularization parameter for sparse regularization.
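The sparse approximation step can be sketched numerically. The paper mentions ADMM; as a simpler stand-in with the same kind of sparse solution, the sketch below uses ISTA (proximal gradient with soft thresholding) on the objective ||QΦ − X||² + ρ||Φ||₁. Sizes, ρ, and the iteration count are illustrative assumptions.

```python
import numpy as np

# Sketch of the sparse projection-matrix construction: generate a random
# projection Q = X @ P, then solve min_Phi ||Q Phi - X||_F^2 + rho ||Phi||_1
# with ISTA. ADMM (as cited in the text) would reach a similar solution.
rng = np.random.default_rng(3)
N, fi, D1 = 50, 8, 5
X = rng.standard_normal((N, D1))          # augmented training data (stand-in)
P = rng.standard_normal((D1, fi))         # random matrix P_i
Q = X @ P                                 # random projection data matrix Q_i

rho = 0.1
step = 1.0 / (2 * np.linalg.norm(Q, 2) ** 2)   # 1/L for the smooth part
Phi = np.zeros((fi, D1))
for _ in range(500):
    grad = 2 * Q.T @ (Q @ Phi - X)             # gradient of the fit term
    Zt = Phi - step * grad
    Phi = np.sign(Zt) * np.maximum(np.abs(Zt) - step * rho, 0.0)  # soft threshold
```

The soft-threshold step is what drives many entries of Φ exactly to zero, giving the sparse auto-encoder-style projection the text describes.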

2) CONSTRUCTION OF THE WEIGHT MATRICES OF THE ENHANCEMENT NODES: W j
The construction of the W_j's is simple: the traditional BLS algorithm and its variants randomly generate the weight matrices for each group of enhancement nodes.

3) CONSTRUCTION OF OUTPUT WEIGHT VECTOR
This subsection gives the procedure to construct the output weight vector β. Given the projection matrices Φ_i, i = 1, …, n, of the feature mapped nodes and the training data matrix X, the i-th training data feature matrix for all training samples is given by Z_i = X Φ_i^T. Let Z = [Z_1, …, Z_n] be the collection of all training data feature matrices. In this way, Z is an N × f matrix, Z = [Z_1, …, Z_N]^T, where the k-th row vector Z_k^T of Z contains the inputs of the enhancement nodes (the outputs of the feature mapped nodes) for the k-th training input vector x_k. To handle input biases, a ones vector is appended to Z, giving Z̄ = [Z | 1]. Furthermore, given Z̄, the enhancement node outputs of the j-th enhancement group for all training data are given by H_j = ξ(Z̄ W_j^T) for j = 1, …, m. Packing all the enhancement node outputs together, an N × Σ_{j=1}^m e_j = N × e matrix H = [H_1, …, H_m] is obtained. Define A = [Z | H]. The output weight vector can then be calculated by a regularized least squares technique:

β* = arg min_β ||Aβ − y||² + λ||β||²,  (23)

where y = [y_1, …, y_N]^T is the collection of all training outputs, and λ is a regularization parameter.
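The whole output-weight construction can be sketched end-to-end. In this illustration a random matrix stands in for the learned projection (the sparse-learning step is omitted), the targets are random, and all sizes and the regularization value are assumptions.

```python
import numpy as np

# End-to-end sketch: build the feature matrix Z, the enhancement matrix H,
# stack A = [Z | H], and solve the regularized least-squares problem for beta.
rng = np.random.default_rng(4)
N, D, f, e = 200, 3, 8, 10
X = np.hstack([rng.standard_normal((N, D)), np.ones((N, 1))])  # augmented data
y = rng.standard_normal(N)                                     # stand-in targets

Phi = rng.standard_normal((f, D + 1))     # stand-in projection matrix
Z = X @ Phi.T                             # N x f feature-node outputs
Zbar = np.hstack([Z, np.ones((N, 1))])    # append ones to handle biases
W = rng.standard_normal((e, f + 1))       # random enhancement-node weights
H = np.tanh(Zbar @ W.T)                   # N x e enhancement-node outputs

A = np.hstack([Z, H])                     # N x (f + e) design matrix
lam = 1e-2                                # illustrative regularization value
beta = np.linalg.solve(A.T @ A + lam * np.eye(f + e), A.T @ y)
```

Solving the ridge normal equation here plays the role of the regularized Moore-Penrose inverse in the text.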

IV. MULTIPLICATIVE NOISE AND OPEN WEIGHT FAULT IN BLS
In the digital realization of a well-trained neural network, the finite precision representation of the weights leads to multiplicative weight noise [3], [4]. Also, in analog implementations, the precision errors are usually specified as a percentage; hence, the weight deviation is proportional to the magnitude of the weight value. The output weights β_l (l = 1, …, f + e) can be decomposed into two sets: one set contains the output weights related to the feature mapped nodes, and the other contains the output weights related to the enhancement nodes.
For ease of reading, this paper introduces a duplicate labelling for the output weights based on the feature mapped node set and the enhancement node set, as follows.
• The symbol β_{g,i,u} denotes the u-th output weight for the i-th group of feature mapped nodes, where i = 1, …, n and u = 1, …, f_i.
• The symbol β_{h,j,v} denotes the v-th output weight for the j-th group of enhancement nodes, where j = 1, …, m and v = 1, …, e_j.

A. MULTIPLICATIVE NOISE IN OUTPUT WEIGHTS β
This paper uses the following model to describe the behaviour of multiplicative noise on the output weights β_l:

β̃_{g,i,u} = (1 + γ_{g,i,u}) β_{g,i,u},  β̃_{h,j,v} = (1 + γ_{h,j,v}) β_{h,j,v},

where β̃_{g,i,u} and β̃_{h,j,v} are the implemented values of β_{g,i,u} and β_{h,j,v}, respectively. The variables γ_{g,i,u} and γ_{h,j,v} are noise factors that describe the deviation percentages. This paper assumes that they are independent and identically distributed (iid) zero-mean random variables with variance σ²_β.

B. OPEN WEIGHT FAULT IN OUTPUT WEIGHTS β
Due to hardware imperfection, open weight faults [2], [4], [6] may happen. In the BLS case, feature mapped nodes and enhancement nodes may become disconnected from the output layer. Under an open weight fault, an implemented output weight can be described as

β̃_{g,i,u} = ζ_{g,i,u} β_{g,i,u},  β̃_{h,j,v} = ζ_{h,j,v} β_{h,j,v},

where the ζ's are binary open weight factors (ζ = 0 when the weight is opened, ζ = 1 otherwise) and p_β = Prob(ζ = 0) is the fault rate of the output weights.

C. CONCURRENT FAILURES IN OUTPUT WEIGHTS β
When concurrent weight failures (multiplicative weight noise and open weight fault) occur in the output weights, the implemented output weights can be described as

β̃_{g,i,u} = ζ_{g,i,u} (1 + γ_{g,i,u}) β_{g,i,u},  (28)
β̃_{h,j,v} = ζ_{h,j,v} (1 + γ_{h,j,v}) β_{h,j,v}.  (29)

From (28) and (29), when an output weight is opened, its implemented value equals 0; when it is not opened, its value is affected by the weight noise only.
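The concurrent failure model can be simulated directly. The sketch below draws the noise factors as Gaussians and the open-fault factors as Bernoulli variables; the fault rate and noise variance are illustrative values, and Gaussianity is an assumption (the paper only requires iid zero mean with the given variance).

```python
import numpy as np

# Sketch of the concurrent failure model: each implemented output weight is
# the clean weight scaled by (1 + gamma) multiplicative noise, and zeroed
# with probability p_beta by an open fault.
rng = np.random.default_rng(5)
beta = rng.standard_normal(1000)          # stand-in clean output weights

p_beta = 0.1                              # open-fault rate (illustrative)
sigma2_beta = 0.04                        # noise variance (illustrative)
gamma = rng.normal(0.0, np.sqrt(sigma2_beta), size=beta.shape)
zeta = (rng.random(beta.shape) >= p_beta).astype(float)  # 0 = opened weight

beta_faulty = zeta * (1.0 + gamma) * beta # implemented weights under faults
```

Such a simulator is all that is needed to reproduce the faulty-network evaluations reported later: apply it to a trained β, run the forward pass, and average over many fault patterns.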

D. NOISE IN FEATURE MAPPED NODES
The feature nodes can also be affected by multiplicative noise. For ease of reading, let the current input vector be the k-th training vector x_k, i.e., the augmented input is χ_k. Under multiplicative noise, the feature node output is given by

z̃_{i,k,u} = (1 + α_{i,u}) z_{i,k,u},  (30)

where i = 1, …, n and u = 1, …, f_i. In (30), the α_{i,u}'s are noise factors that describe the percentage deviation of the feature node output z_{i,k,u}. This paper assumes that these noise factors are iid zero-mean random variables with variance σ²_z. From (30), the noisy versions of the training data feature matrices are denoted as Z̃ = [Z̃_1, …, Z̃_n].

E. NOISE IN ENHANCEMENT NODES
Enhancement nodes can likewise be affected by multiplicative noise. This paper considers that the input weights of the enhancement nodes have precision errors in implementation. Hence, under the multiplicative noise model, the affected input weight w̃_{j,v,τ} is given by

w̃_{j,v,τ} = (1 + δ_{j,v,τ}) w_{j,v,τ},  (32)

where j = 1, …, m, v = 1, …, e_j, and τ = 1, …, f + 1. The δ_{j,v,τ}'s are noise factors that describe the percentage deviation of the input weights, and this paper assumes that they are iid zero-mean random variables with variance σ²_w. Given the current input x_k, with the aid of a Taylor series expansion around the noiseless pre-activation, the noisy output h̃_{j,k,v} of an enhancement node can be modelled as the clean output h_{j,k,v} plus a perturbation term weighted by the derivative of the activation function. It should be noticed from (7) and (16) that when the hyperbolic tangent is used as the activation, ξ'(s) = 1 − ξ²(s). Hence, from (19), (32), and (34), the noisy versions of the enhancement node outputs for all training inputs are denoted as H̃ = [H̃_1, …, H̃_m].

F. NOISE STATISTICS OF THE WEIGHTED INPUTS TO THE OUTPUT NODES
This subsection presents the statistical properties of the weighted inputs to the output node, namely (z̃_{i,k,u} β̃_{g,i,u}) and (h̃_{j,k,v} β̃_{h,j,v}). From the statistical properties of the γ's, ζ's, and α's, the first two moments of (z̃_{i,k,u} β̃_{g,i,u}) are

⟨z̃_{i,k,u} β̃_{g,i,u}⟩ = (1 − p_β) z_{i,k,u} β_{g,i,u},
⟨(z̃_{i,k,u} β̃_{g,i,u})²⟩ = (1 − p_β)(1 + σ²_β)(1 + σ²_z) z²_{i,k,u} β²_{g,i,u},

where ⟨·⟩ is the expectation operator. It should be noticed that z̃_{i,k,u} β̃_{g,i,u} and z̃_{i',k,u'} β̃_{g,i',u'} are uncorrelated for i ≠ i' or u ≠ u'. The statistics of (h̃_{j,k,v} β̃_{h,j,v}) are obtained in a similar manner, and h̃_{j,k,v} β̃_{h,j,v} and h̃_{j',k,v'} β̃_{h,j',v'} are uncorrelated for j ≠ j' or v ≠ v'. Also, (z̃_{i,k,u} β̃_{g,i,u}) and (h̃_{j,k,v} β̃_{h,j,v}) are uncorrelated.
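The two moments above follow from the independence assumptions (⟨ζ⟩ = ⟨ζ²⟩ = 1 − p_β, ⟨(1+γ)²⟩ = 1 + σ²_β, ⟨(1+α)²⟩ = 1 + σ²_z) and can be checked by Monte Carlo simulation. All numeric values below are illustrative.

```python
import numpy as np

# Monte Carlo check of the first two moments of z~ * b~ under the stated
# independence assumptions:
#   E[z~ b~]     = (1 - p) z b
#   E[(z~ b~)^2] = (1 - p)(1 + s_b)(1 + s_z) z^2 b^2
rng = np.random.default_rng(6)
z, b = 0.7, -1.3                 # one clean feature output and output weight
p, s_b, s_z = 0.1, 0.04, 0.09    # fault rate and noise variances (illustrative)
M = 2_000_000                    # number of fault/noise patterns sampled

zeta = (rng.random(M) >= p).astype(float)      # open-fault factor
gamma = rng.normal(0.0, np.sqrt(s_b), M)       # weight-noise factor
alpha = rng.normal(0.0, np.sqrt(s_z), M)       # node-noise factor
samples = (1 + alpha) * z * zeta * (1 + gamma) * b

m1, m2 = samples.mean(), (samples ** 2).mean()
m1_theory = (1 - p) * z * b
m2_theory = (1 - p) * (1 + s_b) * (1 + s_z) * z ** 2 * b ** 2
```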

V. THE PROPOSED FAULT-AWARE OBJECTIVE FUNCTION AND LEARNING ALGORITHM
In the noise/fault situation, this paper proposes to minimize the expected error over all possible noise/fault patterns. First, the training set error for a particular noise/fault pattern is considered. With the fault models (24)-(26) and (28)-(32), the training set error for a particular noise/fault pattern is given by (39). With (35)-(38) and some manipulations, the average training set error of the noisy networks over all possible noise/fault patterns is given by (40), where diag(·) returns a diagonal matrix containing the main diagonal of a square matrix. Note that since the squared error in (39) is a convex function of the output weight vector β, the average squared error in (40) is a convex function too.
The matrix S is a block diagonal matrix, given by (41). Define R as in (46). With (46), the average training set error in (40) can be rewritten as (47). Since the last term in (47) is independent of the output weight vector, the objective function reduces to (48). In (48), the term β^T R β can be regarded as a regularization term for handling the noise/fault in the BLS network. Since (40) is a convex function of β, (48) is convex too. Hence, the optimal solution can be obtained by setting the gradient of (48) to zero, which yields the proposed FABLS solution in (49). The FABLS algorithm is summarized in Algorithm 1.
Remark: One advantage of the proposed FABLS algorithm is that it has no tuning parameter. In the original BLS (see (23)) and other variants, such as WBLS [19] and CBLS [20], there is a tuning parameter λ whose good values must be chosen by trial and error.
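The closed-form FABLS step can be sketched as follows. Important caveat: the exact regularizer R of (46) depends on the paper's fault statistics and is not reproduced here; the diagonal `R` below is a hypothetical stand-in of the general kind that multiplicative noise induces (data-dependent, scaled by the noise variance), used purely to show the shape of the solve. Note it contains no free tuning parameter, only the fault statistics.

```python
import numpy as np

# Hedged sketch of a fault-aware closed-form solve: once a fault-aware
# regularizer matrix R is assembled from the fault statistics, the optimal
# output weights follow from setting the gradient to zero, as in (49).
# This R is an illustrative stand-in, NOT the paper's exact matrix from (46).
rng = np.random.default_rng(7)
N, d = 300, 12
A = rng.standard_normal((N, d))      # stand-in design matrix [Z | H]
y = rng.standard_normal(N)           # stand-in targets

sigma2 = 0.09                                  # illustrative noise variance
R = sigma2 * np.diag(np.diag(A.T @ A)) / N     # hypothetical fault-aware term
beta = np.linalg.solve(A.T @ A / N + R, A.T @ y / N)
```

The contrast with (23) is that R is fully determined by the assumed fault statistics, so no regularization constant has to be searched over.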

VI. EXPERIMENTS
A. EXPERIMENTAL SETTINGS AND DATASETS
In this section, the proposed FABLS algorithm is compared with the original BLS algorithm, WBLS [19], and CBLS [20].
The WBLS algorithm was designed to handle both noisy data and outliers. Similarly, CBLS is a robust approach based on the correntropy criterion; the CBLS algorithm has an anti-noise term in its objective function. Hence, these comparison algorithms (original BLS, WBLS, and CBLS) are well suited for comparison with the proposed FABLS under network failure situations.
In the comparison, several real-life datasets from the UCI machine learning repository [21] are used: CTslice, Airfoil Self Noise (ASN), Building Energy, Abalone, CCPP, and Coil2000. Table 2 shows the details of the datasets. The 5-fold evaluation method is used in our experiments, and Table 3 provides the details of the five folds. In this paper, the input features of the datasets are normalized to [−1, 1], and the target outputs are normalized to [0, 1]. In the original BLS framework, the input weights and the biases of the enhancement nodes are generated randomly in the range [−1, 1] and then fixed.

Algorithm 1 The FABLS Algorithm
Require: Training data matrix X and target vector y; projection matrices Φ_i constructed based on (13); weight matrices W_j constructed based on Section III-B2.
Output: Output weight β*
1: Compute Z_i using (13)-(14), where i = 1, …, n.
2: Compute H_j using (18) and (19), where j = 1, …, m.
3: Compute S and R using (41) and (46), respectively.
4: Compute β* according to (49).
The WBLS algorithm has a weighted penalty factor which enhances its performance. In this work, the weighted penalty factor in WBLS is constructed by the Huber weight function with a positive adjustable Huber parameter b = 0.6.
The original BLS, WBLS, and CBLS algorithms each have a regularizer term that can mitigate the effect of faults in the network. In the original BLS algorithm and the other variants, there is a tuning regularization parameter λ. The value of λ affects the performance of the resultant BLS networks, and there is no analytical way to select a suitable value. In this experiment, various values are tested for each dataset, and the best value is then selected based on the test set. Table 4 summarizes the tested values.
In the BLS concept, the feature nodes of a BLS network use a linear activation function to process the input data. The problem with this approach is that the output values of the feature nodes can be very large. This behaviour can degrade the performance of the original BLS algorithm if the output values are forwarded to the enhancement nodes directly without any additional processing. In the traditional BLS, the common solution is to map or normalize the feature node outputs to the range [0, 1] before they are further processed by the enhancement nodes [66]. This approach can improve the performance of the original BLS algorithm. In this paper, for a fair comparison, this normalization is applied to the original BLS, WBLS, CBLS, and the proposed FABLS algorithms.
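The normalizations described in this section (inputs to [−1, 1], targets and feature-node outputs to [0, 1]) are plain min-max scalings and can be sketched as follows; the data and the `minmax` helper are illustrative.

```python
import numpy as np

# Sketch of the min-max normalizations used in the experiments:
# inputs scaled to [-1, 1], targets (or feature-node outputs) to [0, 1].
rng = np.random.default_rng(8)
X = rng.uniform(-5.0, 9.0, size=(100, 3))   # illustrative raw input features
y = rng.uniform(10.0, 40.0, size=100)       # illustrative raw targets

def minmax(a, lo, hi):
    """Column-wise min-max scaling of a into the interval [lo, hi]."""
    a_min, a_max = a.min(axis=0), a.max(axis=0)
    return lo + (hi - lo) * (a - a_min) / (a_max - a_min)

X_scaled = minmax(X, -1.0, 1.0)
y_scaled = minmax(y, 0.0, 1.0)
```

In practice the scaling constants should be computed on the training fold only and reused on the test fold, so that no test-set information leaks into training.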
B. AVERAGE TEST SET MSE RESULTS
Table 5 summarizes the average test set MSE values of the four algorithms. From Figs. 2-4, when the noise levels are small, the average test set MSE values of the four algorithms are similar, because at small noise levels the advantage of the proposed FABLS is not fully demonstrated. However, the average test set MSE values of the proposed FABLS algorithm are still smaller than those of the original BLS, WBLS, and CBLS algorithms. The improvements of the proposed FABLS are more significant when the noise levels are large. For example, for the ASN dataset, at a low noise level of {p_β = 0.01, σ²_β = σ²_z = σ²_w = 0.01}, the average test set MSE of the original BLS is 0.018626, that of the WBLS is 0.018159, and that of the CBLS is 0.018603, while the proposed FABLS achieves only 0.016880. When the open weight fault level is fixed at p_β = 0.01 and the multiplicative noise level is raised to {σ²_β = σ²_z = σ²_w = 0.25}, the average test set MSE of the original BLS jumps to 0.082964, that of the WBLS jumps to 0.067633, and that of the CBLS jumps to 0.082206. However, the average test set MSE of FABLS is only 0.020835.
On a more complicated dataset, namely CTslice, the performance of the proposed FABLS algorithm is clearly superior to the BLS, WBLS, and CBLS algorithms, as shown in the subfigures in the first row and the first column of Figs. 2-4.

C. PAIRED T-TEST
As explained earlier in Subsection VI-B, from Table 5, in terms of average test set MSE, the proposed FABLS algorithm is better than the comparison algorithms. However, a key question remains: are the improvements of using FABLS statistically significant? Hence, a paired t-test analysis is performed. It is a standard method to investigate whether the mean difference between two sets of samples is statistically significant. For the test with 5 trials (four degrees of freedom) at the 95% confidence level, the critical t-value is 2.132.
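The paired t-test on per-fold MSE values can be sketched as follows. The five MSE values below are illustrative placeholders, not the paper's measurements; only the test procedure itself is shown.

```python
import numpy as np

# Sketch of the paired t-test on 5-fold MSE results: form the per-fold
# differences d, then t = mean(d) / (std(d) / sqrt(n)). With n = 5 folds
# (4 degrees of freedom), the critical value quoted in the text is 2.132.
mse_baseline = np.array([0.082, 0.079, 0.085, 0.081, 0.080])  # illustrative
mse_fabls = np.array([0.021, 0.020, 0.022, 0.021, 0.020])     # illustrative

d = mse_baseline - mse_fabls          # per-fold MSE improvements
n = len(d)
t_value = d.mean() / (d.std(ddof=1) / np.sqrt(n))
significant = t_value > 2.132         # compare against the critical t-value
```

The same computation (plus the matching t-distribution quantiles) yields the confidence intervals on the mean improvement reported in Tables 6-8.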

1) PAIRED T-TEST BETWEEN FABLS AND BLS
Table 6 summarizes the paired t-test results between the proposed FABLS algorithm and the original BLS algorithm. From the table, all the t-values are greater than the critical t-value of 2.132. Hence, there is strong evidence that the improvements are statistically significant. For instance, in the Coil2000 dataset with the noise/fault level {p_β = 0.1, σ²_β = σ²_z = σ²_w = 0.09}, the t-value is 13.12, which is greater than 2.132. Furthermore, the average MSE improvement of using FABLS is 0.053712, and the corresponding confidence interval of the average improvement is [0.042344, 0.065081], which does not include zero. Similar situations are found for other noise levels and datasets. These results provide strong evidence that FABLS is better than BLS. In fact, from the t-values at the large noise levels, the confidence in the improvements is extremely strong for high noise/fault levels.

2) PAIRED T-TEST BETWEEN FABLS AND WBLS
Table 7 presents the paired t-test results between the proposed FABLS algorithm and the WBLS algorithm. Clearly, all the t-values are greater than the critical t-value of 2.132. In addition, none of the confidence intervals of the average improvements of using FABLS includes zero. For instance, in the ASN dataset with the noise level {p_β = 0.01, σ²_β = σ²_z = σ²_w = 0.09}, the t-value is 11.50, which is greater than 2.132. Furthermore, the average improvement of using FABLS is 0.015640, and the corresponding confidence interval of the average improvement does not include zero. Similar situations are found for other noise/fault levels and datasets. These results provide strong evidence that FABLS is better than WBLS.
3) PAIRED T-TEST BETWEEN FABLS AND CBLS
Similarly, Table 8 summarizes the paired t-test results between the proposed FABLS algorithm and the CBLS algorithm.
As shown in Table 8, all confidence intervals of the average improvements of using FABLS do not include zero, and all the t-values are greater than the critical t-value. For instance, in the CTslice dataset with noise level equal to {p_β = 0.01, σ²_β = σ²_z = σ²_w = 0.09}, the t-value is 173, which is much greater than 2.132. Furthermore, the average MSE improvement of using FABLS is 1.368225, and the corresponding confidence interval of the average improvement is [1.346355, 1.390095]. In the table, similar situations are found for other noise levels and datasets. With these results, there is strong evidence that FABLS is better than CBLS.

VII. CONCLUSION
This paper first analyzed the effects of open weight fault and multiplicative noise in the BLS network. Four kinds of noise/fault are considered: open weight fault in the output weights, multiplicative noise in the output weights, multiplicative noise in the feature nodes, and multiplicative noise in the input weights of the enhancement nodes. From the analysis result, a fault aware objective function for BLS networks is then developed. The objective function contains a fault aware regularizer term which is capable of mitigating the effect of the aforementioned faults/noise. Besides, there is no regularization parameter to tune in our approach. On the other hand, the traditional BLS, WBLS and CBLS all have a tuning regularization parameter.
Based on the proposed objective function, a fault aware learning algorithm was developed to train a BLS network. The simulation experiments show that the proposed FABLS algorithm outperforms the original BLS algorithm, the WBLS algorithm and the CBLS algorithm. Also, the paired t-test results show that the improvements of the proposed FABLS algorithm are statistically significant.

FIGURE 1. A typical structure of a BLS network.
where the ζ_{g,i,u}'s and ζ_{h,j,v}'s are open weight factors that describe whether the output weights are opened or not. If an output weight is opened, ζ = 0. Otherwise, ζ = 1. In this paper, the open weight factors are modelled as iid binary random variables. Hence, their probability mass functions are given by Prob
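The open-weight fault model just described can be sketched numerically. The output-weight shape and fault rate below are illustrative assumptions; only the iid Bernoulli factors ζ and the zeroing of opened weights follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
p_beta = 0.05                             # fault rate, Prob(zeta = 0)
W = rng.standard_normal((100, 1))         # trained output weights (assumed shape)

# iid binary open-weight factors: zeta = 0 with probability p_beta,
# zeta = 1 with probability 1 - p_beta.
zeta = (rng.random(W.shape) >= p_beta).astype(float)
W_faulty = zeta * W                       # opened weights become zero

print(np.mean(zeta == 0))                 # empirical fault rate, close to p_beta
```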

VOLUME 9, 2021

Algorithm 1. The Training Process of the FABLS Algorithm.
Require: Training data matrix X according to (11) and the corresponding target y, the number of groups of feature nodes n, and the number of groups of enhancement nodes m.

FIGURE 3. Comparison of various algorithms for each dataset, where the open fault rate is p_β = 0.05. Three noise levels are considered: {σ²_z = σ²_w = σ²_β = 0.01}, {σ²_z = σ²_w = σ²_β = 0.09}, and {σ²_z = σ²_w = σ²_β = 0.25}.

Three multiplicative noise levels are considered: {σ²_β = σ²_z = σ²_w = 0.01}, {σ²_β = σ²_z = σ²_w = 0.09}, and {σ²_β = σ²_z = σ²_w = 0.25}. Also, three open fault rates are considered: p_β = 0.01, p_β = 0.05, and p_β = 0.1. In our experiment, n = 10 and f_i = 10, where i = 1, ..., n. With this setting, there are 100 feature mapped nodes. For the enhancement nodes, m = 1 and e_1 = e = 100. Hence, there are 100 enhancement nodes. With this setting, there are 200 hidden nodes in the BLS network.

B. PERFORMANCE COMPARISON
This subsection presents the performance of the comparison algorithms in terms of the average test set MSE of noisy networks. The 5-fold evaluation strategy is used in the experiments. For each dataset and each fold, 2,000 faulty networks are generated. For an easy and quick view of the results, Figs. 2-4 summarize the average test set MSE of noisy networks and the corresponding standard deviation over the 5-fold evaluation. Each figure shows the results for one open fault rate. Besides, the numerical values of the average test set MSE and the corresponding standard deviation of the comparison algorithms are listed in tabular form in Table 5. From Figs. 2-4, when the noise levels are small, the average test MSE values of the four algorithms are similar. It is because when the noise levels are small, the advantage of our
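The faulty-network evaluation protocol (2,000 noisy copies of a trained network, averaged test-set MSE) can be sketched as below. The node-output matrix, the trained weights, and restricting the multiplicative noise to the output weights are simplifying assumptions for illustration; the paper also injects noise into the feature nodes and the enhancement-node input weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hidden, n_test = 200, 50                    # 200 hidden nodes, as in the setup
A = rng.standard_normal((n_test, n_hidden))   # stacked node outputs (assumed)
w = 0.1 * rng.standard_normal(n_hidden)       # trained output weights (assumed)
y = A @ w                                     # clean targets, no model error

p_beta, var_b = 0.05, 0.09                    # open fault rate, noise variance
mses = []
for _ in range(2000):                         # 2,000 faulty networks per fold
    zeta = (rng.random(n_hidden) >= p_beta).astype(float)   # open faults
    b = rng.normal(0.0, np.sqrt(var_b), n_hidden)           # multiplicative noise
    w_faulty = zeta * (1.0 + b) * w
    err = y - A @ w_faulty
    mses.append(np.mean(err ** 2))

print(np.mean(mses))                          # average test set MSE of noisy networks
```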

TABLE 8. The paired t-test result of FABLS and CBLS. There are 200 nodes in the network. The 5-fold evaluation strategy is used. For each dataset and each fold, 2,000 faulty networks are generated.

TABLE 2. Details of the datasets.

TABLE 3. The data division over the 5 folds.

TABLE 4. Various tested values for the BLS, CBLS and WBLS algorithms.

TABLE 5. Average test set MSE and standard deviation of BLS, WBLS, CBLS and FABLS. There are 200 nodes in the network. The 5-fold evaluation strategy is used. For each dataset and each fold, 2,000 faulty networks are generated.

... average test set MSE of the WBLS algorithm is 0.147297 and the average test set MSE of the CBLS algorithm is 0.156059. On the other hand, for FABLS, the test set MSE is 0.023772 only. When p_β is fixed to 0.01 and {σ²_β = σ²_z = σ²_w} is increased to 0.25, the original BLS algorithm has a very large average test set MSE, which is equal to 4.246942. In addition, the average test set MSE of WBLS is 4.018248 and the average test set MSE of the CBLS algorithm is 4.245743. However, the average test set MSE of our proposed FABLS algorithm is 0.069069 only. For other datasets, similar results are obtained. Hence, our proposed FABLS algorithm outperforms the original BLS algorithm, the WBLS algorithm and the CBLS algorithm under the concurrent noise situation.

TABLE 6. The paired t-test result between FABLS and BLS. There are 200 nodes in the network. The 5-fold evaluation strategy is used. For each dataset and each fold, 2,000 faulty networks are generated. The critical t-value is 2.132.

TABLE 7. The paired t-test result of FABLS and WBLS. There are 200 nodes in the network. The 5-fold evaluation strategy is used. For each dataset and each fold, 2,000 faulty networks are generated. The critical t-value is 2.132.