Deep Belief Network for Feature Extraction of Urban Artificial Targets

Reducing the dimensionality of hyperspectral image data directly reduces the redundancy of the data, thus improving the accuracy of hyperspectral image classification. In this paper, the deep belief network algorithm from deep learning theory is introduced to extract the deep features of imaging spectral data. First, the original data are mapped to feature space by an unsupervised learning method, the Restricted Boltzmann Machine (RBM). Then, a deep belief network is formed by stacking multiple Restricted Boltzmann Machines and training the model parameters with a greedy layer-by-layer algorithm. At the same time as dimensionality reduction is achieved, a deep feature representation of the original data is formed. The final step is to connect the output deep features to a Softmax regression classifier to complete the fine-tuning (FT) of the model and the final classification. Experiments using imaging spectral data show that the deep features extracted by the deep belief network algorithm have better robustness and separability. The method can significantly improve classification accuracy and has good application prospects in hyperspectral image information extraction.


Introduction
Hyperspectral image classification is one of the most advanced techniques for understanding remote sensing image scenes [1]. However, classification poses many challenges because of the high dimensionality of the images, high correlations among bands, and spectral mixing. Hyperspectral images are generally composed of hundreds or even thousands of relatively narrow bands, which provide rich spectral and spatial information [2]. Combining the spectral and optical approaches, the spectrum of each target pixel is dispersed during spatial imaging, so that every spatial position is covered by a full spectrum. The characteristics of hyperspectral images, together with classification approaches based on hyperspectral imaging, make it possible to classify land surface objects with high accuracy [3,4].
Hyperspectral datasets are composed of hundreds of bands and combine images with spectra. They provide rich surface spectral information and have incomparable advantages in the fine identification and classification of surface materials [5]. Increasing the data dimensionality and adding more data samples extends the band information but also increases the redundancy of the model. Although this improves the spectral resolution of hyperspectral remote sensing images, it dramatically slows the processing of the model data, reduces the accuracy of the model, and hampers target recognition. Moreover, high spectral resolution can trigger the Hughes effect [6]. The high correlation and information redundancy between bands, as well as the problems of the same object showing different spectra and different objects sharing the same spectrum, result in a highly nonlinear data structure, which also makes extracting information from imaging spectral data difficult [7]. Therefore, dimensionality reduction is used to extract more productive and stable low-dimensional features to express the original high-dimensional data. Improving classification accuracy on imaging spectral images while reducing computational complexity has become one of the leading research questions in spectral image information extraction [8].
Commonly used dimensionality reduction methods fall into linear and nonlinear approaches. Linear dimensionality reduction methods mainly include principal component analysis (PCA) [9], independent component analysis (ICA) [10], and minimum noise fraction (MNF) [11]. However, hyperspectral data have nonlinear structures, and traditional linear dimensionality reduction methods cannot reveal the nonlinear structure contained in the datasets. In recent years, nonlinear manifold learning algorithms (NMLA) have been introduced to reduce the dimensionality of hyperspectral data. Bachmann et al. [12,13] applied the improved isometric mapping (ISOMAP) [14] and locally linear embedding (LLE) [15] algorithms to the dimensionality reduction of hyperspectral images, further improving the accuracy of image classification. However, current dimensionality reduction methods are often limited to extracting the shallow features of pixels, which may restrict the performance of classifiers. Rather than relying on predesigned feature extraction rules, deep learning learns feature patterns from the data itself, from which the deep features of the pixels can be extracted, achieving the purpose of dimensionality reduction.
Deep learning can be regarded as the continuation and sublimation of neural networks. By simulating the learning process of the brain, deep learning gradually extracts features of the input from low level to high level and finally forms characteristics well suited to pattern classification, improving the classification accuracy. In 2006, Hinton and Salakhutdinov proposed using a deep belief network (DBN) [16] to achieve data dimensionality reduction and classification. It is essentially feature extraction from data using deep neural networks, known as a deep learning algorithm. Typical deep learning methods include Restricted Boltzmann Machines (RBM), the deep belief network (DBN), the convolutional neural network (CNN), and the autoencoder (AE) [17]. Newer deep learning approaches include the recurrent neural network (RNN), long short-term memory (LSTM), and generative adversarial nets (GAN). At present, deep learning has become a focus of attention in machine learning and artificial intelligence. It has been widely used in image classification [18], target detection [19], speech recognition [20], and natural language processing [21].
In this paper, based on the study of traditional dimensionality reduction methods for imaging spectral data, we introduce the deep belief network, based on deep learning theory, to perform dimensionality reduction of hyperspectral images. Conventional dimensionality reduction methods and the deep belief network are compared for hyperspectral image information extraction, and the robustness and separability of the abstract features are considered. Finally, the model with the best classification accuracy is verified.

Restricted Boltzmann Machine (RBM).
The Restricted Boltzmann Machine is a typical energy-based model, as shown in Figure 1. Suppose a bipartite graph in which one layer is the visible layer (the data input layer) and the other is the hidden layer. There are connections between all units of the visible layer and all units of the hidden layer, but no connections within either layer; that is, the model is fully connected between layers and unconnected within layers. All nodes take only the values 0 or 1, i.e., all nodes are random binary variables, and the joint probability distribution p(v, h) satisfies the Boltzmann distribution. This model is the Restricted Boltzmann Machine (RBM).

Let v and h denote the states of the visible and hidden units, respectively, W the connection weights between the visible layer and the hidden layer, b the biases of the neurons in the visible layer, and c the biases of the hidden layer, so that \theta = \{W, b, c\} collects the real-valued parameters of the RBM. For a given state (v, h), the energy possessed by the RBM as a system is defined as

E(v, h; \theta) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_i \sum_j v_i W_{ij} h_j.  (1)

When the parameters are determined, the energy function yields the joint probability distribution of (v, h):

p(v, h; \theta) = \frac{e^{-E(v, h; \theta)}}{Z(\theta)}, \quad Z(\theta) = \sum_{v, h} e^{-E(v, h; \theta)},  (2)

where the normalization factor Z(\theta) sums the exponentiated negative energy over all possible configurations, so that the probability of a given state is its share of this total. Because there is no connection between nodes within a layer, the conditional distributions

p(h \mid v), \quad p(v \mid h)  (3), (4)

are conditionally independent over units and factorize as

p(h \mid v) = \prod_j p(h_j \mid v), \quad p(v \mid h) = \prod_i p(v_i \mid h),  (5), (6)

so that given v, h can be sampled from p(h \mid v); similarly, a reconstruction v' of the visible layer can be obtained from h. By adjusting the parameters until v and v' agree, h can be taken as a feature representation of the input data. When the visible layer v is given, the probability that the j-th hidden unit is 1 (versus 0) is

p(h_j = 1 \mid v) = \sigma\left(c_j + \sum_i W_{ij} v_i\right),  (7)

where \sigma(x) = 1/(1 + e^{-x}). Since the structure of the RBM is symmetric, when the hidden state is given, the activation probability of the i-th visible unit is

p(v_i = 1 \mid h) = \sigma\left(b_i + \sum_j W_{ij} h_j\right),  (8)

where, in both formulas (7) and (8), c_j and b_i are the corresponding bias values. For a set of training samples satisfying independent and identical distribution, the free-energy form of the model is

F(v) = -\sum_i b_i v_i - \sum_j \log\left(1 + e^{c_j + \sum_i W_{ij} v_i}\right), \quad p(v) = \frac{e^{-F(v)}}{Z(\theta)},  (9)

and the log-likelihood gradients with respect to the parameters are

\frac{\partial \log p(v)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}},  (10)
\frac{\partial \log p(v)}{\partial b_i} = \langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{model}},  (11)
\frac{\partial \log p(v)}{\partial c_j} = \langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}},  (12)

where \langle \cdot \rangle denotes an expectation under the indicated distribution. The model expectations are intractable, so the contrastive divergence (CD) algorithm is used to update the weights. At the beginning of the CD algorithm, the state of the visible units is set to a training sample and the binary states of the hidden units are sampled from equation (7). After the states of all hidden units are determined, the probability that the i-th visible unit takes the value 1 is computed according to equation (8), producing a reconstruction v' of the visible layer. Replacing the model expectation with this reconstruction, the update criteria for each parameter on the training data are

W_{ij} \leftarrow W_{ij} + \varepsilon\left(\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}}\right),  (13)
b_i \leftarrow b_i + \varepsilon\left(\langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{recon}}\right),  (14)
c_j \leftarrow c_j + \varepsilon\left(\langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{recon}}\right),  (15)

where \varepsilon denotes the learning rate of the CD algorithm.
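As a concrete illustration of the sigmoid activations and the CD update rules of equations (7), (8), and (13)-(15), the following is a minimal NumPy sketch of a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1). It is not the authors' implementation; the layer sizes, learning rate, and toy data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # connection weights W
        self.b = np.zeros(n_visible)   # visible biases b
        self.c = np.zeros(n_hidden)    # hidden biases c
        self.lr = lr                   # learning rate epsilon

    def p_h_given_v(self, v):
        # hidden activation probabilities, as in equation (7)
        return sigmoid(v @ self.W + self.c)

    def p_v_given_h(self, h):
        # visible activation probabilities, as in equation (8)
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        """One CD-1 parameter update on a mini-batch v0 (update rules (13)-(15))."""
        ph0 = self.p_h_given_v(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample binary hidden states
        v1 = self.p_v_given_h(h0)                         # reconstruction v'
        ph1 = self.p_h_given_v(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)                    # reconstruction error

# toy check: reconstruction error should fall on a fixed binary batch
data = rng.integers(0, 2, (32, 12)).astype(float)
rbm = RBM(n_visible=12, n_hidden=6)
errs = [rbm.cd1_step(data) for _ in range(200)]
```

Tracking the reconstruction error, as the last line does, mirrors how the paper later judges RBM quality by comparing original and reconstructed spectral curves.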

Deep Belief Network Constructed by Training Restricted Boltzmann Machines Layer by Layer.
The deep belief network is a superposition of multiple layers of Restricted Boltzmann Machines, which can extract the deep features of the original data. Figure 2 illustrates the model. The joint probability distribution between the input data v of the visible layer and the l hidden layers h^1, ..., h^l is

p(v, h^1, \ldots, h^l) = \left(\prod_{k=0}^{l-2} p(h^k \mid h^{k+1})\right) p(h^{l-1}, h^l),  (16)

where v = h^0 and p(h^{l-1}, h^l) is the joint probability distribution between the visible and hidden layers of the topmost RBM model. The weights are obtained with the unsupervised greedy algorithm (GA). First, the first-layer Restricted Boltzmann Machine is trained and its parameters are fixed. Then the hidden-layer output of the first Restricted Boltzmann Machine is taken as the input of the second Restricted Boltzmann Machine, and the parameters of each layer are trained successively, layer by layer.

The last hidden layer is connected to the Softmax regression classifier, and fine-tuning (FT) is completed by the supervised gradient descent (GD) algorithm.
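The greedy layer-by-layer construction described above can be sketched as follows. This is an illustrative simplification (CD-1 training, deterministic up-passes, toy data), not the authors' code: each trained RBM's hidden activations become the visible input of the next RBM, and the top layer holds the reduced features.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=100, lr=0.1):
    """Train one RBM with CD-1 and return its parameters (W, b, c)."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    n = data.shape[0]
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + c)                       # hidden probabilities
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled hidden states
        v1 = sigmoid(h0 @ W.T + b)                        # reconstruction
        ph1 = sigmoid(v1 @ W + c)
        W += lr * (data.T @ ph0 - v1.T @ ph1) / n
        b += lr * (data - v1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

def train_dbn(data, layer_sizes):
    """Greedy layer-by-layer pretraining: each RBM's hidden activations
    become the next RBM's visible input."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)
        layers.append((W, b, c))
        x = sigmoid(x @ W + c)   # deterministic up-pass to the next layer
    return layers, x             # x holds the top-level deep features

data = rng.integers(0, 2, (40, 20)).astype(float)
layers, features = train_dbn(data, [12, 6, 4])  # e.g. a 20 -> 12 -> 6 -> 4 reduction
```

In the full method, `features` would then feed a Softmax regression classifier, and supervised fine-tuning would adjust all layers jointly.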

Deep Belief Network Algorithm Workflow.
The training process of a single RBM is shown in Figure 3. Given the training sample set X after initialization, the RBM network structure has a visible layer (v) of k units and a hidden layer (h) of j units, where each visible unit is affected only by the hidden layer. Given the training period and learning rate, and after each parameter is initialized, the contrastive divergence algorithm is used to update the training parameters. If the algorithm has converged, the result is output; otherwise, parameter training continues according to equations (13)-(15). The training process of the whole network is shown in Figure 4. Given the parameters and the number of hidden layers, the procedure of Figure 3 is used to train the first layer of the RBM. The hidden layer of the first RBM is then used as the input layer of the second RBM, and training proceeds layer by layer until the last RBM. The output of the last layer is connected to the Softmax regression classifier, which produces the final output after fine-tuning (FT) (Figure 4).

Figure 1: RBM model schematic.

Data and Preprocessing
To verify the robustness of the model, two different types of hyperspectral image data were tested in this section. The experiments were carried out on two publicly available and widely used hyperspectral images: airborne hyperspectral image data acquired over Pavia City, Italy, and near-ground data captured by the Hyspex imaging spectrometer. After preprocessing, such as radiometric correction and reflectance inversion, the image pixel samples are used to train the deep belief network model. The hyperparameters and training parameters are adjusted and tested, the results are compared with those of the traditional dimensionality reduction methods, and the classification accuracy of the model is analyzed to obtain the optimal classification model.

Figure 3: RBM training flowchart (training samples, visible and hidden layers, offset vectors, number of iterations, and learning rate). Figure 4: Layer-by-layer DBN training flowchart (given the parameters and the number of hidden layers).

Datasets.
The Pavia City image was gathered by the Reflective Optics System Imaging Spectrometer (ROSIS-3) optical sensor over Pavia City, Italy.
This image is 610 × 340 pixels, as shown in Figure 5. The ROSIS-3 sensor generates 115 bands in the range of 430-860 nm, of which 103 bands, excluding the noisy bands, are selected for classification; removing the noisy bands allows the characteristic bands to be extracted effectively. There are eight categories in the Pavia City image, as shown in Figure 6, and Table 1 shows the selection of sample data. The samples are divided into training, validation, and test samples at a ratio of 3 : 1 : 1. The training samples are used to adjust the model's trainable parameters, the validation samples are used to tune the hyperparameters, and the test samples are used to evaluate the classification accuracy of the model. The near-ground image was acquired by the Hyspex imaging spectrometer (Hyspex) using the ground imaging method.
This image is 400 × 600 pixels, as shown in Figure 7. The Hyspex sensor generates 1600 spatial pixels and 108 bands in the range of 400-1000 nm. In the experiment, 103 Hyspex bands, excluding the noisy bands, were selected for classification to effectively extract the characteristic bands. There are six categories in the sample spectral curves, as shown in Figure 8, and the sample data selection is shown in Table 2.
The main advantages of the Hyspex imaging spectrometer are a good match between the point spread function and pixel size, low stray light, moderate spectral trapezoidal distortion, low polarization dependence, high sensitivity and low noise, high acquisition speed and data rate, real-time responsiveness, and dark compensation correction.
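The 3 : 1 : 1 division of each class into training, validation, and test samples described above can be sketched as follows; the class size and random seed are illustrative, and the exact per-class counts in Tables 1 and 2 depend on how the authors rounded.

```python
import numpy as np

rng = np.random.default_rng(2)

def split_3_1_1(samples):
    """Divide one class's samples into training/validation/test at a 3:1:1 ratio,
    as done for the Pavia and Hyspex sample tables."""
    n = len(samples)
    idx = rng.permutation(n)
    n_train = round(n * 3 / 5)   # 3 parts
    n_val = round(n * 1 / 5)     # 1 part
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]  # remaining 1 part
    return train, val, test

# e.g. a class with 1247 pixels, like the asphalt class in Table 1
train, val, test = split_3_1_1(list(range(1247)))
```

Permuting before slicing keeps each subset spatially and spectrally representative of the class.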

Preprocessing. The ROSIS-3 data have already been preprocessed.
The original Hyspex image was radiometrically corrected using the imaging calibration of the spectrometer. Reflectance inversion was performed by the Flat Field (FF) method, which is based on a statistical model, with the large cement floor selected as the flat field.
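The Flat Field reflectance inversion can be sketched as follows: every pixel spectrum is divided by the mean spectrum of the chosen flat-field area (here an arbitrary mask standing in for the large cement floor). The toy cube models each pixel as a flat reflectance scaled by a common illumination curve, which the division removes; real data would of course carry wavelength-dependent reflectance as well.

```python
import numpy as np

def flat_field_reflectance(radiance_cube, ff_mask):
    """Flat Field reflectance inversion: divide every pixel spectrum by the
    mean spectrum of a spectrally flat reference area.
    radiance_cube: (rows, cols, bands); ff_mask: (rows, cols) boolean."""
    ff_spectrum = radiance_cube[ff_mask].mean(axis=0)  # mean flat-field spectrum
    return radiance_cube / ff_spectrum                 # relative reflectance

# toy cube: 4x4 pixels, 5 bands; flat-field pixels carry the illumination shape
rng = np.random.default_rng(3)
illum = np.array([1.0, 2.0, 3.0, 2.0, 1.0])            # shared illumination curve
cube = rng.uniform(0.5, 1.0, (4, 4, 1)) * illum        # reflectance x illumination
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True                                    # stand-in "cement floor" area
refl = flat_field_reflectance(cube, mask)
```

Because the illumination curve is common to every pixel, the division leaves each toy pixel with a spectrally flat relative reflectance.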

RBM Model Analysis.
The deep belief network is composed of multiple layers of Restricted Boltzmann Machines, so a single Restricted Boltzmann Machine model is analyzed first. Following the hyperparameter selection advice given in reference [16], the learning rate is set to 0.1 and the batch size to 20. The number of layers and of hidden-layer nodes of the RBM model must be determined by repeated experiments: by fixing the other parameters and changing the number of layers in turn, the optimal number of layers is obtained. The same training samples are used to train RBMs with different numbers of hidden-layer neurons until the algorithm converges. The spectral curve of the original sample is compared with the reconstructed spectral curve under the different experimental parameters, giving an intuitive comparison of RBM performance for different numbers of hidden-layer neurons. In the following, we discuss the influence of the number of hidden-layer units and of training iterations on the model's ability to reconstruct the input data. A representative water sample is selected for this comparison. Then, selecting representative vegetation samples, whose original spectral curve is shown in Figure 10(a), the number of hidden-layer neurons is fixed at 60 and the number of iterations is varied over [100, 150, 200, 250, 300] to examine the effect of the iteration count on the reconstruction capability. The reconstructed spectral curves of the output are shown in Figures 10(b)-10(f). When the number of iterations reaches 250, the reconstruction capability of the model begins to stabilize.

Comparison with Traditional Dimensionality Reduction Methods.
To test the classification performance of the data after dimensionality reduction by the deep belief network, the accuracy of this method is compared with conventional dimensionality reduction methods. In the experiment, we select a deep belief network model containing two layers of Restricted Boltzmann Machines. The first-layer size is set to 60, following the previous section, and the number of features finally extracted by the second layer is set to [4, 8, 12, 16, 20]. Meanwhile, the conventional dimensionality reduction methods are principal component analysis, minimum noise fraction, factor analysis (FA), and independent component analysis. All the dimensionality reduction methods use the same extracted feature numbers and are connected to the same classifier. In the experiment, we select the commonly used Support Vector Machine (SVM) and the Softmax regression classifier for comparative analysis. SVM is a supervised classification method based on structural risk minimization; its goal is to maximize the margin, and it uses a limited number of boundary pixels to create a decision surface, so that the optimization problem becomes a convex quadratic programming problem. The radial basis kernel function, which is suitable for hyperspectral image classification, is selected as the kernel function, and its hyperparameter σ is tuned accordingly. The experiment results of the ROSIS-3 data are shown in Figures 11 and 12, and the experiment results of the Hyspex data are shown in Figures 13 and 14.

Table 1: Sample selection for the ROSIS-3 Pavia City image.
Category        Training samples   Validation samples   Testing samples   Total
Asphalt         745                250                  252               1247
Bare soil       231                82                   85                398
Gravel          593                200                  208               1001
Meadows         402                133                  135               670
Metal sheet     1250               425                  512               2187
Brick           352                133                  139               624
Shadow          1647               518                  509               2674
Tree            780                259                  260               1299

Table 2: Sample selection for the Hyspex near-ground image.
Category        Training samples   Validation samples   Testing samples   Total
Water           515                168                  179               862
Vegetation      528                177                  172               877
Cement road     688                235                  238               1161
Magmatic rock   374                140                  145               659
Automobile      452                162                  166               780
Curtain wall    415                138                  145               698
According to the experimental results, the Softmax regression classifier is more conducive to classifying the features after dimensionality reduction, while the other feature extraction methods are less accurate than the deep belief network.
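For reference, the principal component analysis baseline used in the comparison amounts to projecting the pixel spectra onto the leading eigenvectors of their covariance matrix. A minimal sketch (with random stand-in spectra, not the Pavia data) is:

```python
import numpy as np

def pca_reduce(X, n_components):
    """PCA baseline: project mean-centered spectra onto the eigenvectors
    of the covariance matrix with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    top = eigvecs[:, ::-1][:, :n_components]  # leading principal directions
    return Xc @ top

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 103))               # stand-in 103-band pixel spectra
Z = pca_reduce(X, 4)                          # same 4-feature target as the DBN top layer
```

Unlike the DBN, this projection is strictly linear, which is exactly the limitation the nonlinear deep features are meant to overcome.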

Analysis of the Influence of the Number of Hidden Layers in the Deep Belief Network.
The selection of the number of hidden layers in the deep belief network determines whether it can extract appropriate features, which plays an essential role in the accuracy of the final classification. When there are too few layers, only shallow features can be extracted, which hurts the classification accuracy. As the number of hidden layers increases, the abstract features obtained have better separability, which can improve the robustness of the classification model. But with too many layers, the model easily falls into overfitting. The number of hidden layers is set to [2, 3, 4, 5, 6]. To achieve full dimensionality reduction, the number of top-level units is set to 4. The other parameter settings are consistent with the analysis of the Restricted Boltzmann Machine model. The results are shown in Figure 15.

Optimal Model Accuracy Evaluation.
Compared with the traditional dimensionality reduction methods, the deep belief network achieves higher classification accuracy after dimensionality reduction. Moreover, the optimal model is obtained when the number of hidden layers is 4, with the number of units in each hidden layer set to 60-60-60-4. A confusion matrix is used to evaluate the classification accuracy of the optimal model; the ROSIS-3 confusion matrix is shown in Table 3, and the Hyspex confusion matrix in Table 4.
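The confusion-matrix evaluation can be sketched as follows: overall accuracy is the trace of the matrix divided by the sample count, and the kappa coefficient (a standard companion statistic, assumed here rather than taken from the paper) corrects it for chance agreement. The labels below are toy values, not the results of Tables 3 and 4.

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred, n_classes):
    """Build a confusion matrix (rows: true class, cols: predicted class)
    and derive overall accuracy and the kappa coefficient."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n                            # overall accuracy
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2    # expected chance agreement
    kappa = (oa - pe) / (1 - pe)
    return cm, oa, kappa

# toy labels for a 3-class problem
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]
cm, oa, kappa = confusion_and_accuracy(y_true, y_pred, 3)
```

Per-class producer's and user's accuracies follow from the same matrix by dividing the diagonal by the row and column sums, respectively.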

Classification Effects.
The optimal model obtained from the experimental analysis is used to classify the entire image. The classification effect for test area I is shown in Figure 16(b), and that for test area II in Figure 17(b); the parameter setting is consistent with the previous text. The classification effect of ROSIS-3 is shown in Figure 16(a) and that of Hyspex in Figure 17(a).

Conclusions
This paper presents new research based on the deep belief network for extracting artificial target features of cities from hyperspectral images. Based on the study of traditional dimensionality reduction methods for imaging spectral data, it was found that conventional methods extract only the shallow features of the pixels, which tend to be unstable in feature space and limit the improvement of classification accuracy. Therefore, this paper introduces the deep belief network algorithm from deep learning theory, which can not only reduce the dimensionality of the data but also extract the deep features of the pixels. Through experimental analysis of the deep belief network model, it was found that the best classification accuracy is obtained with four hidden layers, the numbers of hidden-layer units set to 60-60-60-4, and the output connected to the Softmax regression classifier. Compared with traditional shallow feature extraction by principal component analysis, minimum noise fraction, factor analysis, independent component analysis, and other dimensionality reduction methods, the abstract features extracted by the deep belief network have better robustness and separability, which leads to better classification accuracy and improves classifier performance. Moreover, when tested on two different data types, the deep belief network achieved the best classification performance, which sufficiently demonstrates that the model has broad applicability in imaging spectral data classification and information extraction. Future work will focus on the adjustment and selection of model parameters to obtain better classification results, as well as on introducing further deep learning theory into imaging spectral data processing.

Data Availability
The data used to support the findings of this research are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.