Industry Image Classification Based on Stochastic Configuration Networks and Multi-Scale Feature Analysis

For industry image data, this paper proposes an image classification method based on stochastic configuration networks (SCNs) and multi-scale feature extraction. Multi-scale features are extracted from images of different scales using a deep 2DSCN, and the hidden features of multiple layers are connected together to obtain more informative features. The integrated features are fed into SCNs to learn a classifier, which improves the recognition rate for the different categories. In the experiments, a handwritten digit database and an industry hot-rolled steel strip database are used, and the comparison results demonstrate that the proposed method can effectively improve the classification accuracy.


Introduction
With the development of computer technology, deep learning methods have been widely used for image recognition and classification [1]. Owing to their strong feature representation ability, CNNs with various structures have been developed [2][3][4][5][6] and widely used in image processing and analysis. For the industry image classification task, Masci et al. [7] presented a classification approach based on max-pooling CNNs, with features extracted directly from the pixel representation of steel defect images. Lee et al. [8] proposed a classification method based on CNNs and class activation maps to implement a fast decision-making process. Chen et al. [9] proposed combining three individually trained deep CNN models, with an averaging strategy used to obtain the defect classification. Konovalenko et al. [10] proposed a classifier based on two deep residual neural networks, in which the hyper-parameters of the optimal model were selected through various investigations. Li et al. [11] proposed a CNN-T model by merging a CNN and a Transformer encoder, and obtained an improvement in classification accuracy compared with the pure CNN. Feng et al. [12] introduced a ResNet50 classifier with two additional FcaNet and convolutional block attention modules to deal with steel surface defect data. To address the image classification problem, CNNs with different structures consisting of various modules have been developed and have obtained better classification accuracy. However, in order to obtain the optimal model, the selection of hyper-parameters in the back-propagation algorithm requires considerable prior work.
Random neural networks have also been developed for data analysis and modeling; the weights and biases of the hidden layers are randomly assigned based on a randomized algorithm [13]. Feed-forward neural networks with random weights (NNRWs) were proposed in [14]. By adding a direct link between the input layer and the output layer, Pao et al. [15][16][17] proposed the random vector functional link (RVFL) neural network. A stacked RVFL was introduced in [18] using the negative correlation learning strategy. Lu et al. [19] proposed an image data recognition method using two-dimensional neural networks with random weights (2D-NNRWs). Stochastic configuration networks (SCNs) were proposed in [20] to improve the learning process by introducing a supervised mechanism. Ensemble methods using base SCN models were introduced in [21,22] to deal with large-scale modeling. The base SCN model was expanded with multiple hidden layers, and a deep network framework was built in [23,24]. For matrix data, a two-dimensional stochastic configuration network (2DSCN) [25] has been developed, which has advantages in image data analytics. On this basis, Li et al. [26] proposed an improved SCN with vision patch fusion, which improves the network's feature representation ability by extracting randomly fused image features from three-channel images. Li et al. [27] proposed generating different convolution kernels with physical meaning using a supervised learning mechanism, and a deep convolutional neural network was constructed for working condition recognition.
In this paper, stochastic configuration networks are used in the proposed image classification method. For feature extraction, compared with a single hidden layer, SCNs with deep structures can extract image features with low computing time. Moreover, because the weights and biases are randomly assigned in the stochastic configuration algorithm, images of different scales are used to compensate for the loss of image detail when constructing the deep networks. These multi-scale features are fused through a fully connected layer, and the integrated features are used to build a classifier based on SCNs to complete the identification of different categories. The main contributions are summarized as follows: (1) Designed an image classification framework based on SCNs for extracting features from multi-scale images. The remainder of this paper is organized as follows. Section 2 overviews stochastic configuration networks. Section 3 details the proposed method with deep feature extraction SCNs and a learned classifier. Section 4 elaborates on the experiments and discusses the experimental results. Finally, the conclusion is presented in Section 5.

Stochastic Configuration Networks
As improved randomized networks, SCNs were proposed in [20] under a supervised mechanism. Based on the stochastic configuration algorithm, the weights and biases of the hidden layer are gradually configured until an SCN is built. The modeling process of the SCN model with a single hidden layer can be described as follows:

(1) Given a training dataset $\{X, T\}$ with $N$ samples, where $x_i \in \mathbb{R}^d$ and $t_i \in \mathbb{R}^m$.

(2) Suppose the SCN has been configured with $L-1$ hidden nodes; the output can be calculated by
$$f_{L-1}(x) = \sum_{l=1}^{L-1} \beta_l\, g(w_l^T x + b_l),$$
where $w_l \in \mathbb{R}^d$, $b_l \in \mathbb{R}$ are the weights and bias, $\beta_l = [\beta_{l,1}, \beta_{l,2}, \ldots, \beta_{l,m}]$ is the weight of the output layer and $g$ is an activation function. The current residual error is
$$e_{L-1} = f - f_{L-1} = [e_{L-1,1}, e_{L-1,2}, \ldots, e_{L-1,m}].$$

(3) Suppose the error $e_{L-1}$ does not reach the stop condition; then a new node should be added, and the weight and bias $w_L, b_L$ are configured to calculate the hidden output
$$h_L(x) = g(w_L^T x + b_L). \tag{3}$$

(4) Denote a set of variables $\xi_{L,q}$, $q = 1, 2, \ldots, m$, as follows:
$$\xi_{L,q} = \frac{\langle e_{L-1,q}, h_L \rangle^2}{h_L^T h_L} - (1 - r - \mu_L)\, \| e_{L-1,q} \|^2, \tag{4}$$
where $0 < r < 1$ and $\mu_L$ is a sequence of real numbers with $0 < \mu_L \le 1 - r$. The configured weights should satisfy the following constraint condition:
$$\xi_{L,q} \ge 0, \quad q = 1, 2, \ldots, m. \tag{5}$$
Based on the SC-I algorithm in [20], a set of candidate parameters $w_L^n, b_L^n$ is randomly assigned under condition (5). The parameters for which $\sum_{q=1}^{m} \xi_{L,q}$ attains its maximum are selected as the final values.
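The node-configuration step above can be sketched in code. The following is a minimal illustration of the SC-I candidate search, assuming a sigmoid activation, a uniform sampling range `lam` and a fixed candidate count; all names and choices not fixed by the text are assumptions, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_scn_node(X, E, r=0.99, mu=1e-2, n_candidates=100, lam=1.0, rng=None):
    """One SC-I step: draw candidate (w, b) pairs, keep those satisfying
    xi_{L,q} >= 0 for all q, and return the pair maximising sum_q xi_{L,q}.
    X: (N, d) inputs; E: (N, m) current residual error matrix.
    Returns (w, b, h) or None if no candidate passes the constraint."""
    rng = np.random.default_rng(rng)
    N, d = X.shape
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        w = rng.uniform(-lam, lam, size=d)
        b = rng.uniform(-lam, lam)
        h = sigmoid(X @ w + b)                     # hidden output, shape (N,)
        # xi_{L,q} = <e_q, h>^2 / (h^T h) - (1 - r - mu) * ||e_q||^2
        xi = (E.T @ h) ** 2 / (h @ h) - (1.0 - r - mu) * np.sum(E ** 2, axis=0)
        if np.all(xi >= 0) and xi.sum() > best_score:
            best, best_score = (w, b, h), xi.sum()
    return best
```

The selected node's output `h` is then appended as a new column of the hidden feature matrix before the output weights are recomputed.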
(5) Calculate the output weight $\beta$. For the configured and fixed hidden weights and biases, the hidden feature matrix is
$$H_L = [h_1, h_2, \ldots, h_L].$$
Then, the output weight $\beta = [\beta_1, \beta_2, \ldots, \beta_L]$ can be calculated as
$$\beta = H_L^{\dagger} T, \tag{6}$$
where $H_L^{\dagger}$ is the Moore-Penrose pseudoinverse of $H_L$. In the stochastic configuration algorithm, the hidden layer weights $w_l$, $l = 1, 2, \ldots, L$, are first drawn from a certain distribution (e.g., a uniform or Gaussian distribution), and the bias $b_l$ can be calculated by
$$b_l = -w_l^T x^{\star},$$
where $x^{\star}$ is randomly taken from the training dataset [28]. Then, the supervision mechanism (5) is employed to select proper weight values and add new hidden nodes. The deepSCN is an expansion of the SCN framework, and more details about the theory and algorithm can be found in [23].
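As a sketch, the output-weight solution and the data-dependent bias rule can be written as follows; the function names are illustrative, and `np.linalg.pinv` computes the Moore-Penrose pseudoinverse used in Equation (6):

```python
import numpy as np

def output_weights(H, T):
    """beta = H^+ T: least-squares output weights for fixed hidden features.
    H: (N, L) hidden feature matrix; T: (N, m) targets."""
    return np.linalg.pinv(H) @ T

def data_dependent_bias(w, X, rng=None):
    """Bias rule b = -w^T x_star, with x_star drawn at random from the
    training set, so the node is centred on a real sample (cf. [28])."""
    rng = np.random.default_rng(rng)
    x_star = X[rng.integers(len(X))]
    return -float(w @ x_star)
```

When `H` has full column rank, `output_weights` recovers the exact least-squares solution; otherwise the pseudoinverse yields the minimum-norm one.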
Similarly, for image data, i.e., $x_i \in \mathbb{R}^{d_1 \times d_2}$, a 2DSCN model can be built under the supervised constraint (5), and the output of the hidden node is
$$h_l(X) = g(u_l^T X v_l + b_l),$$
where $u_l \in \mathbb{R}^{d_1}$, $v_l \in \mathbb{R}^{d_2}$ are the randomly assigned hidden layer weight parameters, and the bias is calculated via $b_l = -u_l^T X^{\star} v_l$ with randomly sampled data $X^{\star}$. The weight of the output layer $\beta_l$ is calculated by Equation (6), and the model output is
$$f(X) = \sum_{l=1}^{L} \beta_l\, g(u_l^T X v_l + b_l).$$
Moreover, by combining the learning processes of the 2DSCN and deepSCN, the construction of a deep 2DSCN can be described as follows. When the structure of the deep network is in series, the output of the current layer is regarded as the input data of the next layer. Under the stochastic configuration algorithm, the weights and biases of the hidden layers are randomly generated from a certain distribution and accepted only when the inequality condition (5) is satisfied. The weight parameters of the first hidden layer and the following hidden layers are $u_l, v_l, b_l$ and $w_k, b_k$, respectively; the corresponding hidden features are extracted to calculate the parameter $\beta$ and establish the deep 2DSCN model.
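A minimal sketch of the 2DSCN hidden node, assuming a sigmoid activation and uniform weight sampling (the names and ranges are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_2d(images, u, v, b):
    """2DSCN hidden output h_l(X) = g(u^T X v + b) for a batch of matrices.
    images: (N, d1, d2); u: (d1,); v: (d2,)."""
    return sigmoid(np.einsum('i,nij,j->n', u, images, v) + b)

def random_2d_node(images, lam=1.0, rng=None):
    """Draw u, v uniformly and set b = -u^T X_star v for a randomly
    sampled training image X_star, matching the 2DSCN bias rule."""
    rng = np.random.default_rng(rng)
    N, d1, d2 = images.shape
    u = rng.uniform(-lam, lam, size=d1)
    v = rng.uniform(-lam, lam, size=d2)
    X_star = images[rng.integers(N)]
    b = -float(u @ X_star @ v)
    return u, v, b
```

The bilinear form $u^T X v$ keeps the number of random parameters at $d_1 + d_2$ rather than $d_1 \times d_2$, which is the structural advantage of 2D nodes on matrix inputs.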

Proposed Multi-Scale Image Classification Based on SCNs
In order to extract multi-scale information and improve the feature representation ability, an SCN-based classification method is proposed in this paper. An overview of the proposed method is shown in Figure 1: the multi-scale images are obtained by downsampling the source image data, and the images at each scale are used to construct a deep SCN model for feature extraction. These extracted features are combined via a fully connected layer, and an SCN-based classifier is built to obtain the classification result.

Multi-Scale Feature Extraction Based on deepSCNs
During pre-processing, images of different scales are obtained from the original data by applying kernel convolution operators and downsampling. For an image $I$, the new image at a different scale can be obtained by
$$I^{(p)} = D_p(\sigma * I),$$
where $\sigma$ is the Gaussian kernel and $D_p$ is the downsampling operator.
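A minimal sketch of this pre-processing step, assuming a sampled, truncated Gaussian kernel applied separably and a keep-every-p-th-pixel downsampling operator (the truncation radius and border handling are assumptions):

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    """Sampled 1-D Gaussian, truncated at ~3*sigma and normalised to sum to 1."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def blur_and_downsample(img, sigma, p=2):
    """I^(p) = D_p(sigma * I): separable Gaussian smoothing along rows and
    columns, then keeping every p-th row and column."""
    k = gaussian_kernel1d(sigma)
    smoothed = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    smoothed = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, smoothed)
    return smoothed[::p, ::p]
```

Smoothing before downsampling suppresses aliasing, so the lower-scale image retains the coarse structure of the original.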
Given the training dataset $\{X, T\}$, the multi-scale sub-datasets are constructed as
$$X_n = D_p(\sigma_n * X), \quad n = 1, 2, \ldots, K,$$
where $\sigma_n$, $n = 1, 2, \ldots, K$, are different kernels used to obtain images of different scales. Since the labels are unchanged, $T_n = T$, $n = 1, 2, \ldots, K$.
For each sub-dataset $\{X_n, T_n\}$, a deep SCN model is built for feature extraction. The detailed modeling process is described in Algorithm 1. Firstly, the images $X_n$ are used to extract the feature matrix $H_1$ of the first hidden layer, i.e.,
$$H_1(X_n) = [h_1(X_n), h_2(X_n), \ldots, h_L(X_n)],$$
where $h_l(X_n) = g(u_l^T X_n v_l + b_l)$. If the number of hidden nodes reaches $L_{max}$ or the early stop condition is satisfied, the corresponding weights and biases $\{u_l, v_l, b_l\}$ are fixed, and the parameter learning of the next layer proceeds.
For the second hidden layer, since the network is connected in series, the input data are the feature vectors in the matrix $H_1(X_n)$ extracted from the layer above. For each feature vector, the weights and biases $w_l, b_l$ are randomly assigned under the inequality constraint, and the feature matrix $H_2(X_n)$ is calculated. The network structure is built step by step by adding hidden nodes until the stop condition is met.
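The series connection of layers can be sketched as below; for brevity this sketch draws the weights $w_l, b_l$ purely at random and omits the supervisory inequality check that the actual algorithm applies to each candidate node:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def next_layer_features(H_prev, n_nodes, lam=1.0, rng=None):
    """One further hidden layer of a deep SCN, sketched without the
    supervisory check: the previous layer's feature matrix H_prev (N, L_prev)
    is the input, and each new node applies g(w^T h + b) with randomly
    assigned w, b."""
    rng = np.random.default_rng(rng)
    N, d = H_prev.shape
    W = rng.uniform(-lam, lam, size=(d, n_nodes))
    b = rng.uniform(-lam, lam, size=n_nodes)
    return sigmoid(H_prev @ W + b)
```

In the full algorithm each column of `W` would be accepted only if condition (5) holds for the current residual, so the layer grows one validated node at a time.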
Suppose the deep SCN model has $M$ hidden layers. For each sub-dataset, the final feature matrix can be obtained by combining the hidden layer features:
$$H(X_n) = [H_1(X_n), H_2(X_n), \ldots, H_M(X_n)].$$

Algorithm 1: Deep SCNs for images.

SCN-Based Classifier
For the multi-scale feature extraction, the deep SCN models are established by executing Algorithm 1 on each sub-dataset, and the corresponding feature matrix H(X n ) is calculated as the output.
A fully connected layer is used to fuse the features extracted from images of different scales; the fully connected feature vector is obtained by connecting the per-scale features,
$$H = [H(X_1), H(X_2), \ldots, H(X_K)].$$
Then, the feature data $H$ are used to build the SCN classifier, and the classification results are obtained.
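A sketch of the fusion and classification stage, with plain feature concatenation standing in for the fully connected fusion and a least-squares readout standing in for the SCN classifier (both simplifications are assumptions for illustration):

```python
import numpy as np

def fuse_and_classify(per_scale_feats, T_onehot):
    """Concatenate per-scale feature matrices into H = [H(X_1), ..., H(X_K)]
    and fit a least-squares readout beta = H^+ T as a stand-in for the
    SCN classifier. Returns the fused features, readout and predictions."""
    H = np.concatenate(per_scale_feats, axis=1)   # (N, sum of feature widths)
    beta = np.linalg.pinv(H) @ T_onehot           # least-squares output weights
    pred = np.argmax(H @ beta, axis=1)            # class = argmax of the readout
    return H, beta, pred
```

With enough fused features relative to the sample count, the readout fits the training labels exactly, which is why the paper's actual classifier is trained with the SCN supervisory mechanism to control overfitting.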
The proposed classification method consists of two stages: multi-scale feature analysis and classifier learning. For multi-scale feature analysis, sub-images of different scales are obtained by various convolution kernel operators. Image features are extracted from the sub-images by building a deep SCN model, and in order to reduce the information loss among hidden layers, the hidden features of each layer are combined together. Then, the connected features are used to build the classification model, and an SCN-based classifier is learned to identify the different categories.

Performance Evaluation

Experimental Setup
In this section, the Handwritten Digit Recognition CVL database [29] and the Northeastern University (NEU) surface defect database [30,31] are used for experimental comparison. The CVL database serves as a standard database for digit recognition and classification and is usually used for performance comparison among various algorithms. The NEU dataset has six kinds of typical surface defects of the hot-rolled steel strip: rolled-in scale (RS), patches (Pa), crazing (Cr), pitted surface (PS), inclusion (In) and scratches (Sc), as shown in Figure 2. The dataset includes 1800 images of size 299 × 299. Due to the small sample size, a deep neural network model is prone to overfitting. The window sampling method is applied to expand the number of samples to 10,000, and the new sample images are resized to 64 × 64. In the experiment, the image data are randomly sampled, with 80% used as training data and the remaining 20% as testing data. In the proposed method, for the parameters used to build the SCNs, we take the maximum number of hidden layers M_max = 5, the training tolerance error ϵ = 0.0001, the range of the hidden weights λ = {0.5, 1, 5, 30, 50, 100}, r = {0.9, 0.99, 0.999, 0.9999, 0.99999}, the maximum number of candidate nodes T_max = 100, and g as the sigmoid activation function. All the simulation experiments in this paper are performed using MATLAB R2023a on a computer with a 3.00 GHz CPU and 16 GB of memory.

Results and Discussion
We first evaluate the classification results of deep SCNs with different numbers of hidden layers. For the CVL database, the number of hidden layers M_max is set to {1, 2, 3, 4, 5} with L_max = 500. Since the SCNs are built by incremental learning in Algorithm 1, the classification accuracy for different numbers of hidden nodes L is calculated. As shown in Figure 3, as the number of hidden layer nodes increases, the classification accuracy on both the training and testing sets increases. For M = 1, M = 2 and M = 3, we find that the results on the test data are very close when L = 500, while the computing times shown in Table 1 are completely different. For the NEU dataset, the number of hidden layers M_max is set in the same way with L_max = 1000, as shown in Figure 4. For M = 2, the classification results are close to those of M = 1 before L = 500; then, the values become higher with the increment in hidden nodes L. However, the curves for M = 4 and M = 5 decrease and the classification effect becomes worse. The comparison results for different M show that a suitably large M can extract more useful and reliable feature information for image representation, so the built deep SCN model can obtain a good recognition rate with a shorter computing time. In addition, due to the limited number of training samples, the deep model may become overfitted. Different from the back-propagation algorithm, the deep network of the proposed method is built by the stochastic configuration algorithm; the modeling process is an incremental learning process, meaning it can adaptively adjust the network structure. In order to analyze the influence of image size on modeling, the original image data are downsampled to obtain samples of different sizes. The original sizes of the images in the CVL database and the NEU dataset are 28 × 28 and 64 × 64, respectively. Since the CVL database consists of images of handwritten digits, the image size has a significant impact on modeling. As shown in Figure 5, image samples of
three different scales are used to build the SCN and deepSCN models. The comparison results show that the recognition rate gets higher as the image size becomes smaller, whether using the SCN model or the deepSCN model. The comparison results on the NEU dataset are shown in Figure 6. Different from the CVL dataset, the training recognition rate gets lower as the image size becomes smaller. With the addition of hidden nodes L, the recognition rate keeps increasing, and the model built with image data from scale 2 achieves high testing accuracy. In the deep network, the input of the current hidden layer is the feature map extracted from the layer above. The features extracted from scale 2 images can still be used to characterize the images, whereas the images from scale 3 have a loss of detail, resulting in low recognition rates. Moreover, the computing time is calculated and shown in Table 2. In general, the larger the image, the longer the computing time. However, compared with the SCN model, the deepSCN model achieves a comparable recognition rate and is less time-consuming. For the image classification task, the proposed method consists of different parts: downsampling to obtain images of different scales, building the deep SCN model for feature extraction and building an SCN classifier. Ablation experiments are performed on these three parts. As shown in Figure 7, the SCN and deepSCN methods are built with only one-scale images (i.e., the original image dataset). The ensemble-SCN and ensemble-deepSCN methods are built with images from multiple scales, and their outputs are integrated into the final classification results. The proposed-SCN and proposed-deepSCN methods are the proposed method in which the connected feature vectors are extracted by the SCN and deepSCN models, respectively. As shown in Figure 7, because the training set only contains images from one scale, the recognition rate of the SCN and deepSCN methods is the lowest; see the yellow curves.
With training images collected from multiple scales, the recognition rate of the ensemble-SCN and ensemble-deepSCN methods is greatly improved, but when the number of hidden nodes L exceeds 500, the recognition rate begins to decrease due to overfitting. The proposed method can mitigate this overfitting problem by adding an additional classifier to optimize the fully connected feature vector. Through the experimental comparison, the proposed-deepSCN method obtains good recognition accuracy, and due to the stochastic configuration algorithm, building networks with a deep structure costs less time than the single hidden layer structure. In the experiment, the proposed method is compared with several randomized learning methods, namely the 2DNNRW- [19], 2DSCN- [25], deepSCN- [23], DNNE- [18] and SCNE- [22] based methods. Table 3 shows the recognition results of the comparison methods. It can be seen that the 2DNNRW-, 2DSCN- and deepSCN-based methods obtain low classification accuracy, as their training data are constructed from single-scale images. The DNNE- and SCNE-based methods built with multi-scale images obtain higher accuracy, along with the proposed method. In addition, when the additional classifier is added, the proposed method achieves a higher recognition rate. The comparison results show that the multi-scale feature representation and the classifier of the proposed method can effectively improve the classification accuracy.

Conclusions
In this paper, an image classification method using SCNs is proposed. Due to its incremental learning algorithm, the network structure of deep SCNs can be adaptively constructed with the weight parameters of the hidden layers randomly assigned. The experimental results demonstrate the advantage of the proposed method for solving classification problems: using the multi-scale extracted features and an additional classifier can effectively improve classification accuracy. For future work, the weight configuration of the convolutional kernel can be investigated to optimize SCNs' modeling of image data.

(2) Investigated the influence of training data at different scales and of different network structures on the feature extraction results. (3) Demonstrated the advantage of the proposed randomized learning method on steel surface defect data.

Figure 1. The overview of the proposed method.

Figure 2. The NEU surface defect data.

Figure 3. Recognition rate of different M on the CVL database. (a) Training performance; (b) testing performance.

Figure 4. Recognition rate of different M on the NEU database. (a) Training performance; (b) testing performance.

Figure 7. Performance comparison with different classifiers on the NEU database. (a,b) Training and testing performance with SCN-based feature extraction; (c,d) training and testing performance with deepSCN-based feature extraction.

Table 1. Computing time of the deep SCN model with different layers.

Table 2. Computing time for different image sizes.

Table 3. Comparison results on the NEU dataset.