Broad Learning System under Label Noise: A Novel Reweighting Framework with Logarithm Kernel and Mixture Autoencoder

The Broad Learning System (BLS) has demonstrated strong performance across a variety of problems. However, BLS based on the Minimum Mean Square Error (MMSE) criterion is highly sensitive to label noise. To enhance the robustness of BLS in label noise environments, this paper designs a function called the Logarithm Kernel (LK) to reweight the samples when computing the output weights during BLS training, yielding a Logarithm Kernel-based BLS (L-BLS). Additionally, for image databases with numerous features, a Mixture Autoencoder (MAE) is designed to construct more representative feature nodes of BLS in complex label noise environments. On top of the MAE, two corresponding versions of BLS, MAEBLS and L-MAEBLS, are also developed. Extensive experiments validate the robustness and effectiveness of the proposed L-BLS and show that the MAE provides more representative feature nodes for the corresponding versions of BLS.


Introduction
In recent years, deep learning, as a research hotspot in the field of artificial intelligence, has achieved significant breakthroughs and widespread applications in various domains [1,2]. Although deep learning boasts powerful learning capabilities, its training process is extremely time-consuming due to its complex model structure and the iterative adjustment of numerous hyperparameters [3,4].
Against this backdrop, Chen et al. [5] proposed a neural network learning framework called the Broad Learning System (BLS) in 2018. It originated from the Random Vector Functional Link Neural Network [6]. BLS extends the width of neural networks to fit the data and has advantages such as a simple network structure and fewer model parameters [7][8][9]. BLS does not require iterative adjustment and has an extremely fast learning speed. Since its proposal, BLS has attracted widespread attention and has developed rapidly in both theoretical and applied research fields.
Researchers have further explored the excellent performance of BLS, leading to further developments in many challenging areas [10][11][12][13]. Chen et al. [14] developed a cascade structure, a recurrent structure, and a wide and deep combination structure based on BLS. Ye et al. [15] implemented image denoising using a cascaded BLS that connects feature mapping node groups and enhancement node groups in sequence. BLS has also been extended to tasks such as chaotic time series prediction.
Furthermore, BLS demonstrates excellent performance in semi-supervised learning. For instance, Zhao et al. [16] developed a semi-supervised BLS (SS-BLS) that utilizes manifold regularization to obtain pseudo-labels for unknown data, thereby expanding the training set. Huang et al. [17] designed a generalized model with manifold regularization sparse features (BLS-MS), which utilizes latent information hidden in unlabeled samples for representation learning. Bhosle et al. [18] designed a deep learning CNN model for the recognition of Devanagari digits. Deng et al. [19] developed a clustering-based modular deep neural network for predicting flight arrival times.
The aforementioned BLS methods and their variants have shown good generalization performance and practical effectiveness, but these results are typically obtained under conditions where the training samples are free from label noise. In recent decades, the rapid advancement of sensor technology has greatly raised the bar for sensor complexity, accuracy, and efficiency. The operating status of a sensor is critical to its health and reliability. However, during the data annotation process, factors such as sensor aging, human annotation errors, and environmental issues can degrade data quality and lead to incorrect labels, posing significant challenges to sensor operation. Addressing label noise is critical to improving the accuracy and reliability of sensor prediction and health management systems. BLS employs ridge regression to compute its output weights; although BLS uses one-hot encoding for sample labels, it remains highly sensitive to noisy labels, as elaborated in detail in [20]. Therefore, in label noise environments, a more robust BLS method is urgently needed.
In response to the above problems, Lu et al. [21] proposed a robust weighted least squares support vector machine. Jin et al. [22] combined the L1 and L2 norms and effectively optimized the BLS model using the augmented Lagrange multiplier method to enhance model robustness. Chu et al. [23] introduced several weighted penalty factors to enhance model performance, resulting in a weighted BLS (WBLS). Liu et al. [24] utilized modal regression instead of the least squares measurement to train the network, generating a diagonal weight matrix through optimization strategies for stronger noise penalties. A notable work on improving the robustness of BLS is the robust manifold BLS (RM-BLS) [25]: by introducing manifold embedding and random perturbation approximation, it achieves robust mapping characteristics in certain specialized applications, such as predicting chaotic time series with noise. Zheng et al. [26] employed the maximum correntropy criterion to train the network connection coefficients, achieving outstanding regression and classification performance in noisy environments. The graph regularized BLS introduced in [27] incorporates a target function based on maximum likelihood estimation, assigning appropriate weights to each sample for classifying data with label noise. In addition, other methods have been proposed in recent years [28][29][30][31][32][33][34][35][36].
The above methods have achieved certain results for BLS, but there is still room for improvement. The main contributions of this paper are as follows: (1) Based on the Gaussian kernel, a novel function called the Logarithm Kernel (LK) is constructed to effectively enhance the robustness of BLS under label noise conditions. (2) A new robust broad learning reweighting framework (L-BLS) is designed by adding the LK to BLS for training the output weights, significantly mitigating or eliminating the impact of label noise on BLS. (3) A Mixture Autoencoder (MAE) is constructed to create more representative feature nodes in BLS for image databases with numerous features. (4) MAEBLS and L-MAEBLS are developed to improve the expressiveness of the enhanced feature nodes of the corresponding BLS versions and reduce their sensitivity to label noise.
The rest of this paper is organized as follows: Section 2 provides a brief review of BLS. In Section 3, we elaborate on the proposed methods. Section 4 presents the robustness analysis of the Logarithm Kernel (LK). Section 5 offers extensive experimental results and discussions. Finally, Section 6 concludes the paper.

Review of the Broad Learning System
To maintain consistency in this text, variables are represented in italics, vectors in bold lowercase letters (e.g., x), and matrices in bold uppercase letters (e.g., X). More generally, for any matrix A ∈ R^(m×n), a_i ∈ R^(1×n) denotes its ith row, a_j ∈ R^(m×1) its jth column, and a_(i,j) the element of A at the ith row and jth column. The superscript T indicates the transpose operator. Let X = [x_1; x_2; . . .; x_N] ∈ R^(N×M) denote the training sample matrix, where N is the number of samples and M is the feature dimension, and let Y = [y_1; y_2; . . .; y_N] ∈ R^(N×C) denote the training label matrix, where C is the number of classes.
The introduction of the Broad Learning System (BLS) provides an effective and efficient learning framework for classification and regression problems. The main advantage of BLS is its ability to map input data into a series of random feature spaces and determine the output weights through optimized least squares. When new nodes or inputs arise, the training process can be extended to an incremental learning mode.
Here, a set of labeled training samples {X, Y} is provided to the BLS. Assume that BLS has n groups of feature mapping nodes and that each group contains k feature mapping nodes. The ith group of feature nodes can be represented as

Z_i = ϕ_i(X W_e_i + β_e_i), i = 1, 2, . . ., n, (1)

where ϕ_i(·) represents the activation function, and W_e_i ∈ R^(M×k) and β_e_i ∈ R^(N×k) represent randomly generated weight and bias matrices, respectively.
Connecting the n groups of feature mapping nodes together forms the feature mapping layer, represented by

Z^n = [Z_1, Z_2, . . ., Z_n]. (2)
The feature mapping layer Z^n is passed to the enhancement nodes to construct the enhancement layer. The jth group of enhancement nodes is represented by

H_j = ξ_j(Z^n W_h_j + β_h_j), j = 1, 2, . . ., m, (3)

where ξ_j(·) represents the activation function, and W_h_j ∈ R^(nk×b) and β_h_j ∈ R^(N×b) represent randomly generated weight and bias matrices, respectively. The enhancement node groups are connected together to form the enhancement layer

H^m = [H_1, H_2, . . ., H_m], (4)

where the enhancement layer has m groups of enhancement nodes and each group contains b nodes.
Connecting the feature mapping layer and the enhancement layer together, the complete state matrix A can be represented as

A = [Z^n | H^m]. (5)

The training can be formulated as finding the regularized least squares solution of Y = AW. Hence, the BLS can be trained as

W = (λI + A^T A)^(−1) A^T Y, (6)

where λ is the regularization parameter and W ∈ R^((nk+mb)×C). When λ → 0, W reduces to A^+ Y, where A^+ is the Moore-Penrose pseudoinverse of A; more details can be found in [5].
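The construction above can be summarized in a short NumPy sketch. The node counts, the tanh activation, and the random data below are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def bls_train(X, Y, n=10, k=8, m=10, b=20, lam=2**-10):
    """Minimal BLS training sketch: random feature and enhancement
    nodes followed by the regularized least squares output weights."""
    N, M = X.shape
    # n groups of k feature mapping nodes: Z_i = phi_i(X W_ei + beta_ei)
    Z = np.hstack([np.tanh(X @ rng.standard_normal((M, k))
                           + rng.standard_normal((1, k))) for _ in range(n)])
    # m groups of b enhancement nodes: H_j = xi_j(Z^n W_hj + beta_hj)
    H = np.hstack([np.tanh(Z @ rng.standard_normal((n * k, b))
                           + rng.standard_normal((1, b))) for _ in range(m)])
    A = np.hstack([Z, H])                       # complete state matrix
    # W = (lam*I + A^T A)^-1 A^T Y  (ridge regression solution)
    W = np.linalg.solve(lam * np.eye(A.shape[1]) + A.T @ A, A.T @ Y)
    return A, W

X = rng.standard_normal((50, 4))                # 50 samples, 4 features
Y = np.eye(3)[rng.integers(0, 3, 50)]           # one-hot labels, C = 3
A, W = bls_train(X, Y)
print(A.shape, W.shape)                         # (50, 280) (280, 3)
```

As λ → 0, the solve approaches the pseudoinverse solution W = A^+ Y.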

The Proposed Method
To enhance the robustness of the BLS framework, this section provides a comprehensive exploration of the proposed Logarithm Kernel (LK) function. We elucidate its integration into the broad learning system for training the output weights, thereby yielding the reweighting framework L-BLS. At the end of this section, we introduce a Mixture Autoencoder (MAE) that helps BLS build more representative feature nodes in noisy label environments, along with two BLS versions, MAEBLS and L-MAEBLS, for image datasets.

Logarithm Kernel (LK)
The correlation entropy between two random variables X and Y serves as a correlation measure in kernel space, as elucidated in [37][38][39][40]:

V(X, Y) = E[κ(X, Y)] = ∫ κ(x, y) dF_XY(x, y),

where E[·] represents the expectation operator, F_XY(x, y) represents the joint distribution function, and ⟨Φ(x), Φ(y)⟩_H = κ(x, y), with κ(x, y) the Mercer kernel [21] controlled by the kernel size σ. We can clearly obtain

V(X, Y) = E[⟨Φ(x), Φ(y)⟩_H].

In the field of machine learning, E[κ(X, Y)] is commonly used to estimate the degree of correlation between the true values and the predicted values. In complex, noisy environments, it is important to accurately quantify this correlation and enhance the robustness of parameter estimation.
Non-second-order statistical measures can be elegantly defined as second-order measures in kernel space. According to Property 3 provided in [39], the correlation entropy has the potential to capture the second-order and higher-order statistical characteristics of the error when using a Gaussian kernel. With an appropriate kernel size setting, the second-order statistical characteristics of the error can dominate, which makes entropy-based optimization criteria a suitable choice in label noise environments as well. This paper presents a new function called the Logarithm Kernel (LK):
ζ(y′_i, y_i) = log(1 + kernel_G(y′_i, y_i)),

where kernel_G(y′_i, y_i) = exp(−(y′_i − y_i)^2 / (2σ^2)) with σ > 0, y′_i represents the predicted label, and y_i represents the real label.
Clearly, we can obtain V(y′_i, y_i) = E[kernel_G(y′_i, y_i)]. By applying the Taylor series expansion to V(y′_i, y_i), we have

V(y′_i, y_i) = Σ_(n=0)^∞ ((−1)^n / (2^n σ^(2n) n!)) E[(y′_i − y_i)^(2n)],

so V(y′_i, y_i) can be regarded as the weighted sum of all the even moments of y′_i − y_i, with the weights of the second- and higher-order moments controlled by the kernel size σ. The kernel size determines the weight of each individual even moment when calculating the weighted sum. When σ increases, the difference between y′_i and y_i is considered more smoothly, and the weight of the higher-order even moments decreases. This property allows the correlation entropy to better adapt to different data situations and improves its robustness in practical applications. For finite sample data, V(y′_i, y_i) can be approximated as

V̂(y′_i, y_i) = (1/N) Σ_(i=1)^N kernel_G(y′_i, y_i).

Similarly, introducing each sample into ζ(X, Y) = log(1 + kernel(X, Y)) changes the equation to

ζ̂(y′_i, y_i) = (1/N) Σ_(i=1)^N log(1 + kernel_G(y′_i, y_i)).
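The Gaussian kernel and its logarithmic form can be written down directly; the small one-hot vectors below are made-up examples for illustration.

```python
import numpy as np

def gaussian_kernel(y_pred, y_true, sigma=1.0):
    """kernel_G(y', y) = exp(-||y' - y||^2 / (2 * sigma^2))."""
    e2 = np.sum((np.asarray(y_pred) - np.asarray(y_true)) ** 2, axis=-1)
    return np.exp(-e2 / (2.0 * sigma ** 2))

def logarithm_kernel(y_pred, y_true, sigma=1.0):
    """Logarithm Kernel: zeta(y', y) = log(1 + kernel_G(y', y))."""
    return np.log1p(gaussian_kernel(y_pred, y_true, sigma))

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])     # one-hot labels
y_flip = y_true[::-1]                           # labels corrupted by noise
zeta_clean = logarithm_kernel(y_true, y_true)   # log(2) for zero error
zeta_noisy = logarithm_kernel(y_flip, y_true)   # pushed toward 0
print(zeta_clean, zeta_noisy)
```

Note that ζ is bounded in (0, log 2], and large errors drive it toward zero; this boundedness is what the robustness analysis in Section 4 relies on.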

The Proposed L-BLS
Based on the above motivations, and to address the poor performance of ridge regression in complex label noise environments, we transform the LK into a BLS-based reweighting framework, L-BLS, for training the output weights.
Similar to BLS, the state matrix A can be constructed to build the feature mapping via (1)-(5). Therefore, to enhance the robustness of BLS, incorporating the LK proposed above and sample reweighting techniques into the optimization model of BLS can be represented as

max_W ϕ(W) = Σ_(i=1)^N log(1 + kernel_G(a_i W, y_i)) − λ‖W‖²_2, (15)

where a_i ∈ R^L represents the feature of the ith sample among the N data samples. To find the W satisfying the condition in Equation (15), we first calculate the gradient of this equation. Taking the derivative of ϕ(W) with respect to W, we obtain

∂ϕ(W)/∂W = (1/σ²) A^T U(Y − AW) − 2λW,

where U is the diagonal weight matrix

U = diag(u_1, . . ., u_N), u_i = kernel_G(a_i W, y_i) / (1 + kernel_G(a_i W, y_i)).

Letting the partial derivative ∂ϕ(W)/∂W be zero, W can be expressed as

W = (A^T U A + 2λσ² I)^(−1) A^T U Y.

Observing W and U, we can see that the right-hand side can be viewed as a function of W, so we can further rewrite them as the fixed-point iteration

U^(t) = U(W^(t)), W^(t+1) = (A^T U^(t) A + 2λσ² I)^(−1) A^T U^(t) Y.

A more intuitive L-BLS framework and detailed algorithm are shown in Figure 1 and Algorithm 1. In Algorithm 1, the weight of each sample is continuously updated by iteratively optimizing U and W, and reasonable weights are assigned to samples with correct labels. The effectiveness of the sample weighting framework is presented more intuitively in Section 5.
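A minimal sketch of the alternating update in Algorithm 1, under our reading of the derivation: the diagonal weight u_i = G_i / (1 + G_i) is an assumption obtained by differentiating the LK objective, and the exact form used in the paper may differ.

```python
import numpy as np

def lk_weights(residual_sq, sigma=1.0):
    """Sample weights from the LK gradient: u_i = G_i / (1 + G_i), where
    G_i is the Gaussian kernel of the i-th squared residual (assumed form)."""
    G = np.exp(-residual_sq / (2.0 * sigma ** 2))
    return G / (1.0 + G)

def lbls_train(A, Y, lam=1e-3, sigma=1.0, n_iter=20):
    """Alternately update the diagonal weights U and the output weights W."""
    L = A.shape[1]
    W = np.linalg.solve(lam * np.eye(L) + A.T @ A, A.T @ Y)  # BLS warm start
    for _ in range(n_iter):
        residual_sq = np.sum((Y - A @ W) ** 2, axis=1)
        u = lk_weights(residual_sq, sigma)        # diagonal of U
        AU = A.T * u                              # A^T U without forming U
        W = np.linalg.solve(AU @ A + lam * np.eye(L), AU @ Y)
    return W, u

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))
Y = A @ rng.standard_normal((10, 3))
Y[:10] += 5.0 * rng.standard_normal((10, 3))      # corrupt the first 10 labels
W, u = lbls_train(A, Y)
print(u[:10].mean() < u[10:].mean())              # noisy samples weigh less
```

The usage example mimics the reweighting intuition: samples with corrupted labels accumulate large residuals, so their LK-derived weights shrink toward zero while intact samples keep substantial weight.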

Mixture Autoencoder
In this section, for image databases with numerous features, a novel Mixture Autoencoder (MAE) is constructed by utilizing convolutional autoencoder techniques [41] and the advantages of variational autoencoders [42]. The purpose of the MAE is to help BLS and its variants create more representative feature nodes under label noise conditions [43,44].
The encoder network consists of convolutional layers and nonlinear activation functions, which are used to extract features from the input images. The convolutional layers employ multiple convolutional kernels to perform convolution operations on the input images, capturing their spatial structure and features. Nonlinear activation functions are then applied to introduce nonlinear transformations, enhancing the expressive power and robustness of the features. The structure of the Mixture Autoencoder is shown in Figure 2.

To achieve efficient latent representation learning, a reparameterization of EncoderOutput is performed, and reparameterization factors a and b are introduced, ensuring a + b = 1, as in Equation (23), to allocate reasonable weights. This enhances the model's robustness while ensuring feature integrity.
where DecoderInput represents the input to the decoder, E represents EncoderOutput, Variance(x) denotes the standard deviation of x, and Mean(x) denotes the mean of the vector x.
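Since Equation (23) itself is not reproduced in the text, the following NumPy sketch shows one VAE-style reading of the reparameterization: the encoder output E is mixed with a stochastic term built from E's own mean and standard deviation, with weights a + b = 1. The exact mixing form is an assumption.

```python
import numpy as np

def reparameterize(E, a=0.7, rng=None):
    """Assumed reading of Eq. (23): DecoderInput = a*E + b*s, where s is a
    Gaussian sample with the mean and std of E, and b = 1 - a."""
    rng = rng or np.random.default_rng(0)
    b = 1.0 - a
    eps = rng.standard_normal(E.shape)
    s = E.mean(axis=-1, keepdims=True) + E.std(axis=-1, keepdims=True) * eps
    return a * E + b * s

E = np.random.default_rng(2).standard_normal((4, 16))  # EncoderOutput batch
D = reparameterize(E)
print(D.shape)                                         # (4, 16)
```

The decoder input keeps the encoder output's shape; only its values are perturbed, which is what lets the weights a and b trade feature integrity against robustness.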
The decoder network is responsible for mapping the learned low-dimensional latent representation back to the original image space to verify whether the encoder provides representative feature nodes. To simplify the model, a fully connected approach is adopted in the decoding part of the MAE. Nonlinear activation functions are applied layer by layer to decode the features, reconstructing the input image through inverse transformations to recover its information to the maximum extent.
The number of convolutional layers and the size of each convolution kernel determine the receptive field and feature extraction capability of the model. More convolutional layers and larger convolution kernels can capture more complex and higher-order features, but they also increase computational complexity. Therefore, in the MAE, we designed three convolutional layers with 5 × 5 convolution kernels.
On the other hand, during MAE training we encode by retaining 40-50% of the original data features in order to provide the most representative features of the image to BLS. The decoder gradually maps the low-dimensional features back to high-dimensional features through linear layers and ReLU activation functions. The original input data can be efficiently reconstructed by gradually increasing the dimensionality and complexity of the data. In the process of reconstructing the original image, if the decoder's reconstruction step size is too small, the uncertainty and complexity of the decoding process increase. In this paper, we gradually reconstruct the original data through two stages of similar step size, which improves the stability of the decoder while reducing its uncertainty and complexity. In addition, the encoding and decoding methods of the convolutional layers and the fully connected layers determine the choice of subsequent parameters to a certain extent.
The detailed parameter settings are shown in Tables 1 and 2, where Conv_i and FullyConnected_i denote the ith convolutional layer and the ith fully connected layer. This process enhances the feature extraction capability of BLS, strengthens BLS's understanding and utilization of the data, and provides a more reliable foundation and support for subsequent applications. In any version of BLS, the MAE can be used to achieve high-quality extraction of image features.

The Proposed MAEBLS and L-MAEBLS
On the image databases, based on BLS, LK, and MAE, we developed MAEBLS and L-MAEBLS. As can be seen from Figure 3, we embed the MAE into the feature layer of BLS and use it to encode complex image data, helping BLS build more representative feature nodes:

EncoderOutput = MAE(data). (24)

This yields MAEBLS, while the decoder is used to verify the effectiveness of the encoder. Based on MAEBLS, we transform LK into a MAEBLS-based reweighting framework for training the output weights to obtain L-MAEBLS. This enables BLS to focus more on valuable features, thereby alleviating the performance degradation caused by insufficient feature extraction capabilities when label noise exists. In Section 5, the experimental results will validate this statement.


Proof of Robustness
The method proposed in this paper demonstrates impressive robustness; its inherent mechanism of robustness to label noise is proved in this section. On the one hand, we can approximate ζ(y′_i, y_i) as ζ(e), where e = ‖y′_i − y_i‖_2. We can observe that ζ(e) is a bounded, smooth, and non-convex loss; therefore, ζ(e) exhibits robustness under noisy conditions. On the other hand, we proceed to demonstrate the robustness mechanism of the proposed method.
Theorem 1. Through L-BLS, normal samples are assigned larger weights, while noisy samples are assigned smaller weights. Therefore, L-BLS can be more robust than BLS.
Proof. The error term of the robust broad learning model L-BLS proposed in this paper can be regarded as E = [log(1 + exp(−e_1^2)), log(1 + exp(−e_2^2)), . . ., log(1 + exp(−e_N^2))]. Let the boundary for determining whether a sample is corrupted be θ. Since log(1 + exp(−e_i^2)) decreases as the error grows, if log(1 + exp(−e_i^2)) ≥ θ, the ith sample is considered an intact sample; otherwise, it is considered a corrupted sample (containing label noise). The weights of the kth sample in BLS and in L-BLS are assigned accordingly, where, for ease of representation and understanding, we set L_k = log(1 + exp(−e_k^2)). We define δ and substitute it into Equation (30). Furthermore, by applying the normalized Cauchy-Schwarz inequality [30] to the resulting expression, we obtain a bound relating the two weight assignments. An obvious fact is that the error of noisy data is much larger than that of intact data. Therefore, when e_k belongs to the normal training samples, further derivation shows that in L-BLS, compared to the base broad learning system, normal data are assigned larger weights; consequently, over all training data, noisy samples receive smaller weights. Thus, this proof is finished. □

Experimental Results
In this section, the performance of the proposed L-BLS, L-MAEBLS, and MAEBLS for classifying data with label noise was evaluated through extensive experiments. Accuracy (ACC) was chosen as the evaluation metric:

ACC = ψ(Y′, Y) / N_test,

where ψ(a, b) is a function that computes the number of correctly classified samples and N_test is the number of test samples. Unless otherwise stated, all experiments were conducted using Python 3.10 on a computer equipped with an Intel i7 2.5 GHz CPU and 16 GB of RAM.
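For completeness, the metric amounts to the following; comparing classes via argmax matches the one-hot label encoding described in Section 2.

```python
import numpy as np

def accuracy(y_pred, y_true):
    """ACC = (number of correctly classified samples) / N_test * 100%.
    Labels are one-hot encoded, so classes are compared via argmax."""
    correct = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)
    return correct.mean() * 100.0

y_true = np.eye(3)[[0, 1, 2, 0]]
y_pred = np.eye(3)[[0, 1, 1, 0]]    # one misclassified sample
print(accuracy(y_pred, y_true))     # 75.0
```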

Dataset Selection and Parameter Settings
Our experiments utilized six datasets from the UC Irvine (UCI) Machine Learning Repository [45] and three image classification datasets: Coil20 [46], ORL [47], and UMIST.Their features and partitions are shown in Table 3.
In the UCI datasets, we selected BLS [5] and four robust BLS models, including WBLS [23], C-BLS [26], ENBLS [22], and GRBLS [27], as comparison methods. On the image datasets, we constructed MAEBLS and L-MAEBLS based on the above methods and compared them with their original versions to demonstrate the ability of the MAE in feature extraction under label noise [48][49][50]. To ensure fairness, we conducted grid searches within the same range for the common parameters of the comparison methods in order to obtain the best performance. The commonly used parameters include the number of feature mapping groups N_ω, the number of feature mapping nodes per group N_f, the number of enhancement nodes N_e, and the L2 regularization parameter λ.
More specifically, for each UCI dataset, the search range for N_f and N_ω is [1, 15] with a step size of 2, and the search range for N_e is [10, 50] with a step size of 5. The L2 regularization parameter is searched within {2^−30, 2^−25, . . ., 2^0}. For each image dataset, the search range for N_f and N_ω is [10, 50] with a step size of 5, and the search range for N_e is [1000, 5000] with a step size of 1000. The search range for the L2 regularization parameter λ is the same as for the UCI datasets.
For the comparison methods, the parameters are set with reference to the corresponding papers. In the Huber-WBLS model, the positive adjustable parameter is set to 1.345. The kernel size of the C-BLS model, the L1 regularization parameter λ_1 and the L2 regularization parameter λ_2 of the ENBLS model, the regularization factor of the manifold term of the GRBLS model, and the kernel width of the L-BLS model are all searched within {2^−30, 2^−25, . . ., 2^0}.
Additionally, to eliminate scale effects, we normalize the attributes of the datasets to [-1, 1]. For the UCI datasets, the attributes are normalized individually. For the three 8-bit grayscale image datasets, all attributes are divided by 127.5 and then 1 is subtracted. Each dataset undergoes 50 repeated runs with all comparison methods, using the corresponding fixed optimal parameters, to ensure the stability and reliability of the results.
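The two normalization rules can be sketched as follows; the per-attribute min-max scaling for the UCI data is our assumption about the exact scheme, while the 8-bit image rule follows the text.

```python
import numpy as np

def normalize_uci(X):
    """Per-attribute min-max scaling to [-1, 1] (assumed UCI scheme)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - lo) / (hi - lo) - 1.0

def normalize_gray8(img):
    """8-bit grayscale images: x / 127.5 - 1 maps [0, 255] to [-1, 1]."""
    return img / 127.5 - 1.0

print(normalize_gray8(np.array([0.0, 127.5, 255.0])))  # [-1.  0.  1.]
```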

Noise Modeling
Labels are generally believed to have a greater impact on the modeling process than features. Since the importance of each feature to model learning varies, label noise tends to have a more profound and detrimental effect than feature noise.
Ghosh et al. [51] proposed that label noise can be represented as follows:

ỹ = y, with probability 1 − η,
ỹ = y_j, j ∈ [N], y_j ≠ y, with probability η_j, (35)

where η represents the noise ratio and satisfies η = Σ_j η_j, and N represents the total number of classes. When each η_j is a constant, it can be represented as η_j = η/(N − 1); in this case, the label noise is symmetric or uniform. Otherwise, the noise type is asymmetric, where the true labels are randomly flipped to another class.
To be more realistic, in our experiments, we chose uniform label noise to simulate common noise situations.As for the process of mislabeling, it is completely random, with equal probabilities for all other classes.
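The uniform noise injection described above can be sketched as follows; flipping an exact fraction η of the labels is a deterministic simplification of the probabilistic model in Equation (35).

```python
import numpy as np

def add_symmetric_noise(y, eta, n_classes, rng=None):
    """Flip a fraction eta of the labels uniformly at random to one of
    the other classes (symmetric/uniform label noise)."""
    rng = rng or np.random.default_rng(0)
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(round(eta * len(y))), replace=False)
    for i in idx:
        others = [c for c in range(n_classes) if c != y[i]]
        y_noisy[i] = rng.choice(others)   # equal probability for other classes
    return y_noisy

y = np.zeros(100, dtype=int)              # 100 samples of class 0
y_noisy = add_symmetric_noise(y, eta=0.3, n_classes=5)
print((y_noisy != y).sum())               # 30
```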

Performance Evaluation on Data with Label Noise
In this section, the above methods are compared on different UCI datasets under different contamination rates (η = 0%, η = 10%, . . ., η = 50%). The results are shown in Tables 4 and 5 as average values ± STD (%), with the best results highlighted in bold. After thorough experimentation, we have confirmed the superior performance of L-BLS. Analyzing Tables 4 and 5, we can make the following observations. First, across most UCI datasets, L-BLS outperforms its competitors at various contamination levels, especially at higher noise levels such as 40% and 50%. For instance, on the Wine dataset with 40% label noise, L-BLS achieves an impressive average accuracy of 94.25% with a minimal standard deviation (STD) of 0.96, while other methods struggle to reach 90% accuracy. This underscores L-BLS's robustness to label noise. Figures 4 and 5 provide a visual comparison of method trends on the Iris and Ecoli datasets. L-BLS outperforms the other methods under all contamination rates. On the Iris dataset, the accuracy of L-BLS is always above 96%, which is difficult for other methods to achieve. In particular, on the Ecoli dataset, when other methods are seriously affected by label noise, L-BLS remains stable and far ahead. We can conclude that as the contamination rate increases, L-BLS maintains an acceptable accuracy drop, while the accuracy of other methods drops significantly. Second, L-BLS surpasses all comparison methods on the COIL20, ORL, and UMIST datasets, except for L-MAEBLS, maintaining acceptable performance degradation even under severe contamination rates. This shows that L-BLS still demonstrates strong robustness on image datasets. Third, on most clean datasets, L-BLS demonstrates superior performance compared to standard BLS and other methods, suggesting that its capability to reweight and detect samples further enhances its performance. Overall, L-BLS consistently outperforms other methods across different scenarios, with acceptable performance even under high contamination rates. These comprehensive results indicate promising applications for L-BLS. Notably, L-BLS consistently exhibits superior performance and accuracy across different contamination rates, especially under high noise levels, showcasing its robustness in label noise environments for effective classification tasks.


The Performance of L-BLS Combined with MAE
On the image datasets, in order to verify the MAE's ability to enhance feature extraction and robustness under label noise in the Broad Learning System, we added MAEBLS and L-MAEBLS to the experiments and compared them with their corresponding BLS versions. On the COIL20, ORL, and UMIST datasets, MAEBLS shows stronger feature extraction capability than BLS, and L-MAEBLS shows stronger feature extraction capability than L-BLS. Notably, L-MAEBLS exhibits even greater robustness. Key findings from Table 5 are as follows. First, on the COIL20, ORL, and UMIST datasets, the proposed MAEBLS and L-MAEBLS are superior to BLS and L-BLS at most contamination rates. Figures 6 and 7 show intuitively that MAE can effectively help BLS construct feature nodes. In particular, as shown in Figure 8, L-MAEBLS achieves a significant improvement in accuracy over BLS. When η = 50%, L-MAEBLS improves accuracy by 5.91% compared with L-BLS, and the improvement under other noise conditions is also significant. Second, on the above three image datasets, L-MAEBLS outperforms all other comparison methods in most cases. Figure 9 intuitively shows the excellent performance of L-MAEBLS on the ORL database, especially under 50% label noise: the average accuracy of L-MAEBLS is 73.33%, with a standard deviation of only 0.65, while the other methods struggle to reach 70%.

The Effectiveness of L-BLS
In order to further verify the effectiveness of the reweighting framework L-BLS, in this section we take the COIL20 dataset under symmetric label noise with a contamination rate of η = 30% as an example. We plot the sum of squared residuals for each sample to visually evaluate the effectiveness of the proposed method. Since one-hot encoding is used for the sample labels, the residual is computed for each element of each sample, and the squared residuals are then summed per sample:

SUM((Y′ − Y)^2)   (36)

where SUM(A) denotes the row-wise sum of matrix A.
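The per-sample quantity in Equation (36) can be sketched in a few lines; the names Y_pred and Y below are illustrative, not taken from the paper's code:

```python
import numpy as np

# Sketch of the per-sample squared residual of Equation (36). Y holds the
# one-hot labels and Y_pred the network outputs; both names are illustrative.
def per_sample_squared_residual(Y_pred, Y):
    # Element-wise residuals, then a row-wise sum of squares:
    # SUM((Y' - Y)^2), where SUM(A) sums over each row of A.
    return np.sum((Y_pred - Y) ** 2, axis=1)

Y = np.eye(3)                                  # three one-hot labels
Y_pred = np.array([[0.90, 0.05, 0.05],
                   [0.10, 0.80, 0.10],
                   [0.20, 0.20, 0.60]])
r = per_sample_squared_residual(Y_pred, Y)     # one value per sample
```

Clean samples yield small values of this quantity, noisy samples large ones, which is what the plots in Figure 10 visualize.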
The residuals and their squares for each sample in the initial iteration and at the optimal iteration are shown in Figure 10a. From the results depicted in Figure 10a, it can be inferred that there is an overlapping region, making it challenging to differentiate between clean and noisy samples. Hence, an iterative sample identification approach is more reasonable. Initially, only a few samples are distinctly identified as clean or noisy and assigned appropriate element weights, while the remaining samples are treated as unidentified samples with moderate weights, effectively avoiding the negative impact of misjudgment. As the iterations progress, the squared errors of clean samples gradually decrease, while those of noisy samples gradually increase, enabling more accurate sample identification.
From the results in Figure 10b, it can be noted that at the optimal iteration almost all clean samples are correctly identified, while the adverse effects of the noisy samples are effectively suppressed.
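The iterative identification loop described above can be sketched as a reweighted ridge regression. The weight rule below is a generic placeholder that down-weights large residuals; it is not the paper's Logarithm Kernel, and the toy data are invented for illustration:

```python
import numpy as np

def weighted_ridge(H, Y, w, lam=1e-2):
    # Weighted ridge solution W = (H^T D H + lam I)^{-1} H^T D Y, D = diag(w).
    A = H.T @ (w[:, None] * H) + lam * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ (w[:, None] * Y))

def iterative_reweighting(H, Y, n_iter=5, lam=1e-2):
    w = np.ones(H.shape[0])                    # moderate weights at the start
    for _ in range(n_iter):
        W = weighted_ridge(H, Y, w, lam)
        r2 = np.sum((H @ W - Y) ** 2, axis=1)  # per-sample squared residual
        w = 1.0 / (1.0 + r2 / (np.median(r2) + 1e-12))  # placeholder rule
    return W, w

# Toy data: six samples, the first has a flipped (noisy) label.
H = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0],
              [0.0, 1.0], [0.0, 0.9], [0.1, 1.0]])
Y = np.array([[0, 1],                          # flipped: true class is [1, 0]
              [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
W, w = iterative_reweighting(H, Y)             # noisy sample ends up down-weighted
```

After a few iterations the mislabeled sample receives the smallest weight, mirroring the behavior described for Figure 10b.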

Statistical Analysis
In this section, we provide an analysis using the Friedman test to evaluate the statistical significance of the differences between the specified methods on the UCI datasets and image datasets.
First, we use the Friedman test with a confidence level of α = 0.1 to assess the overall performance of the different algorithms. As shown in Table 6, on the UCI datasets our proposed L-BLS ranks highest; C-BLS is second, while Huber-WBLS, ENBLS, and GRBLS perform better than the standard BLS. On the image datasets, as shown in Table 7, our proposed L-MAEBLS and L-BLS rank first and second, respectively, and our proposed MAEBLS also improves in ranking compared to BLS.
Second, as the noise level is a critical factor affecting classifier performance, we employed the Friedman test at a confidence level of α = 0.1 to test the statistical differences among the complete set of algorithms under various noise levels. Tables 8 and 9 display the results of the Friedman tests for the UCI datasets and the image datasets, respectively. In Table 8, the participants included BLS, ENBLS, C-BLS, GRBLS, Huber-WBLS, and L-BLS. Building on
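The ranking comparison behind Tables 6 and 7 can be reproduced with a hand-rolled Friedman statistic; the accuracy numbers below are illustrative placeholders, not the paper's results:

```python
import numpy as np

def friedman_statistic(scores):
    # scores: (n_datasets, k_algorithms). Rank algorithms within each dataset
    # (no tie correction in this sketch), then apply the Friedman formula.
    n, k = scores.shape
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1.0
    R = ranks.sum(axis=0)                       # rank sum per algorithm
    return 12.0 / (n * k * (k + 1)) * np.sum(R ** 2) - 3.0 * n * (k + 1)

# Columns: BLS, C-BLS, L-BLS (hypothetical accuracies on five datasets).
scores = np.array([[0.81, 0.84, 0.86],
                   [0.75, 0.78, 0.80],
                   [0.68, 0.72, 0.74],
                   [0.90, 0.91, 0.93],
                   [0.77, 0.80, 0.82]])
stat = friedman_statistic(scores)   # compare against chi-square with k-1 dof
```

A value of the statistic exceeding the chi-square critical value with k − 1 degrees of freedom at α = 0.1 rejects the null hypothesis that all algorithms perform equally.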

Figure 1. Architecture of the L-BLS.

Figure 2. Architecture of a Mixture Autoencoder. To achieve efficient latent representation learning, a reparameterization of EncoderOutput is performed, and reparameterization factors a and b are introduced, ensuring a + b = 1, as in Equation (23), to allocate reasonable weights. This enhances the model's robustness while ensuring feature integrity.

DecoderInput = aE + b[Variance(E) × E + Mean(E)]   (23)
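Equation (23) can be sketched in code as follows; taking Variance(E) and Mean(E) per feature column is an assumption, since the caption does not specify the axis of these statistics:

```python
import numpy as np

# Sketch of Equation (23): DecoderInput = aE + b[Variance(E) x E + Mean(E)],
# with the constraint a + b = 1. Per-feature statistics are an assumption.
def reparameterize(E, a=0.7):
    b = 1.0 - a                                # enforce a + b = 1
    var = E.var(axis=0)                        # Variance(E), per feature
    mean = E.mean(axis=0)                      # Mean(E), per feature
    return a * E + b * (var * E + mean)        # DecoderInput

E = np.array([[0.1, 0.4], [0.3, 0.2], [0.5, 0.6]])
D = reparameterize(E)                          # same shape as the encoder output
```

Setting a = 1 recovers the plain encoder output, while smaller a mixes in the variance-scaled, mean-shifted term.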

Figure 4. Classification performance trends of different algorithms on the Iris dataset corrupted by noise of diverse levels.

Figure 5. Classification performance trends of different algorithms on the Ecoli dataset corrupted by noise of diverse levels.

Figure 6. Classification performance of MAEBLS corrupted by noise of diverse levels in three image datasets.

Figure 7. Classification performance of MAEBLS corrupted by noise of diverse levels in three image datasets.

Figure 8. Performance of L-MAEBLS on the UMIST database.

Figure 9. Classification performance trends of different algorithms on the ORL database corrupted by noise of diverse levels.

Table 2. Fully connected layer parameter settings.

Table 3. Characteristics of the selected datasets.

Table 4. The classification results of different methods on different test sets of UCI databases.

Table 5. The classification results of different methods on different test sets of image databases.

Table 6. Average rankings of different algorithms in classification accuracy on UCI databases.

Table 7. Average rankings of different algorithms in classification accuracy on image databases.

Table 8. Statistical testing of classification accuracy on the UCI databases.