Overview of Hyperspectral Image Classification

With the development of remote sensing technology, hyperspectral images are being applied more and more widely. The accurate classification of ground features from hyperspectral images is an important research topic and has attracted widespread attention. Many methods have achieved good results in hyperspectral image classification. This paper reviews these methods from three aspects: supervised classification, semisupervised classification, and unsupervised classification.


Introduction
In recent years, it has become relatively easy to obtain hyperspectral remote sensing images with both high spatial resolution and high spectral resolution. Because hyperspectral images can resolve fine spectral detail, they have a wide range of applications in the environmental [1], military [2], mining [3], and medical [4] fields. Hyperspectral images are acquired by imaging spectrometers mounted on different platforms. Imaging spectrometry was established in the 1980s. It is used to image in the ultraviolet, visible, near-infrared, and mid-infrared regions of the electromagnetic spectrum. An imaging spectrometer records many contiguous and very narrow bands, so each pixel yields a complete reflected or emitted spectrum over the wavelength range used. Therefore, hyperspectral images have the characteristics of high spectral resolution, many bands, and abundant information. The processing methods of hyperspectral remote sensing images mainly include image correction [5], noise reduction [6], transformation [7], dimensionality reduction, and classification [8]. Unlike ordinary images, hyperspectral images are rich in spectral information, and this spectral information reflects the physical structure and chemical composition of the object of interest, which is helpful for image classification. Hyperspectral image classification is the most active research area in the hyperspectral field [9].
Computer classification of remote sensing images identifies and classifies the information about the earth's surface and its environment that is recorded in the images, so that the ground features corresponding to the image information can be recognized and the required feature information extracted. It is the specific application of automatic pattern recognition technology in the field of remote sensing. With the further development of hyperspectral imaging systems, the information collected in hyperspectral images will become more detailed and richer: on the one hand, the spatial resolution will be higher; on the other hand, the spectral resolution will be significantly improved. As a result, the amount of information contained in hyperspectral images will become even greater, and their scope of application will become wider. The variety of application scenarios and the increasing amount of data also place more complicated requirements on hyperspectral remote sensing ground observation technology. Compared with multispectral images, hyperspectral images have more imaging bands and a higher spectral resolution, and therefore a stronger ability to distinguish objects. However, because of the high dimensionality of hyperspectral data, the similarity between spectra, and the presence of mixed pixels, hyperspectral image classification still faces a series of challenges, mainly the following problems [10,11].
(1) High dimensionality of the data. Because hyperspectral images are built from spectral reflectance values collected by airborne or space-borne imaging spectrometers in hundreds of bands, the spectral information of each pixel has up to hundreds of dimensions.
(2) Lack of labeled samples. In practical applications, it is relatively easy to collect hyperspectral image data, but it is extremely difficult to obtain label information for the image. Therefore, hyperspectral image classification often faces a shortage of labeled samples.
(3) Spatial variability of spectral information. Affected by factors such as atmospheric conditions, sensors, the composition and distribution of ground features, and the surrounding environment, the spectral information of hyperspectral images changes in the spatial dimension, so that the ground feature corresponding to each pixel is not unique.
(4) Image quality. During the acquisition of hyperspectral images, noise and background interference seriously affect the quality of the collected data. The image quality directly affects the classification accuracy of hyperspectral images.
The high dimensionality of hyperspectral image data and the lack of labeled samples can lead to the Hughes phenomenon [12]. In early research on hyperspectral image classification, people often focused on spectral information, using only spectral information to classify the image, and developed many classification methods, such as support vector machine (SVM) [13], random forest (RF), neural networks [14], and multinomial logistic regression [15]. Dimension reduction methods based on feature extraction and feature selection have also been proposed, such as principal component analysis (PCA) [16,17], independent component analysis (ICA) [18,19], and linear discriminant analysis (LDA) [20]. The other family is nonlinear feature extraction. For example, the locally linear embedding (LLE) [21] algorithm, published by Roweis and Saul in Science in 2000, projects local high-dimensional data points into a low-dimensional coordinate system. The overall information is obtained by superimposing local neighborhoods, maintaining the same topological relationship and retaining the overall geometric properties. At the same time, Tenenbaum et al. proposed isometric feature mapping (ISOMAP) [22], an algorithm based on classic multidimensional scaling (MDS) [23]. It uses geodesic distance to embed high-dimensional data into low-dimensional coordinates, so that the neighborhood structure between the high-dimensional data points is retained in the low-dimensional coordinate space. In 2001, Belkin and Niyogi proposed a method similar to LLE, the Laplacian Eigenmap (LE) [24,25], which is closely related to spectral clustering (SC). These nonlinear feature extraction methods are used for classification in practical applications.
Because spatial context information is not considered, the above classification methods based on spectral information alone have not achieved good classification results. According to the research published in [26,27], spatial context information plays an important role in the classification of hyperspectral images and can effectively mitigate the phenomena of different objects with the same spectrum and of the same object with different spectra, which arise when only spectral information is used. In a certain spectral region, two different ground features may show the same spectral line characteristics; this is the phenomenon of different objects with the same spectrum. Conversely, the same ground feature in different states, such as different relative angles to sunlight, different densities, and different water contents, may show different spectral line characteristics; this is the phenomenon of the same object with different spectra. Both phenomena degrade the classification result, so spatial context information needs to be used in classification. The combination of spatial information and spectral information is another hot area in hyperspectral image classification, and more and more scholars are beginning to explore this direction. In [28,29], the extended morphological profile (EMP) method was used to extract the spatial information of the image. In addition, joint sparse representation models [30,31] can also mine spatial information.
It is worth noting that deep learning [32] has excellent capabilities in image processing. In recent years in particular, it has been adopted widely in image classification, target detection, and other fields. Some deep learning network models have been used in remote sensing image processing, such as the convolutional neural network (CNN), the deep belief network (DBN) [33], and the recurrent neural network (RNN). Moreover, in order to solve the problem of poor classification results caused by the lack of training samples, a new tensor-based classification model [34][35][36] was proposed. Experiments confirmed that this method is superior to support vector machines and deep learning when the number of training samples is small.
Hyperspectral image classification methods are classified into supervised classification [37][38][39], unsupervised classification [40,41], and semisupervised classification [42,43] according to whether the classification information of training samples is used in the classification process.

Supervised Classification
The supervised classification method is a commonly used hyperspectral image classification method. The basic process is to first determine the discriminant criteria from the known sample categories and prior knowledge and calculate the discriminant function, and then to use this function to assign the pixels to be classified to a category. Commonly used supervised classification methods include the support vector machine method, the artificial neural network classification method, the decision tree classification method, and the maximum likelihood classification method.

Support Vector Machine. The support vector machine (SVM) [44] is a supervised classification method proposed by Boser et al. Based on statistical learning theory and the principle of structural risk minimization, it solves a quadratic programming problem with inequality constraints. As a machine learning method, the support vector machine plays a major role in image and signal processing and recognition. SVM applies the structural risk minimization principle to a linear classifier to find the optimal classification surface. It requires that the classification surface not only separate the two classes of sample points without error but also maximize the margin between the two classes. Suppose a hyperspectral image $X = \{x_1, x_2, \cdots, x_n\}$, where $x_i = \{x_{i1}, x_{i2}, \cdots, x_{iD}\}^T$ represents the spectral vector of pixel $i$ in the image, $D$ represents the total number of bands, and $n$ is the total number of pixels. In addition, we define $y = (y_1, y_2, \cdots, y_n)$ as the classification label image, where $y_i \in \{-1, 1\}$. The classic SVM classifier is obtained by solving

$$\max_{\alpha} \; \sum_{i=1}^{l_n} \alpha_i - \frac{1}{2} \sum_{i=1}^{l_n} \sum_{j=1}^{l_n} \alpha_i \alpha_j y_i y_j \, x_i^T x_j, \quad \text{s.t. } \alpha_i \ge 0, \; i = 1, \cdots, l_n,$$

where $l_n$ is the number of labeled samples and $\alpha_i$ is the Lagrange multiplier of the $i$-th labeled sample. By setting the bias $b = 0$, the optimal classification plane passes through the origin of the coordinate system, which simplifies the calculation.
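In practice, not all of the problems we encounter are linearly separable, so slack variables are introduced. After the slack variables are introduced, the support vector machine becomes

$$\max_{\alpha} \; \sum_{i=1}^{l_n} \alpha_i - \frac{1}{2} \sum_{i=1}^{l_n} \sum_{j=1}^{l_n} \alpha_i \alpha_j y_i y_j \, x_i^T x_j, \quad \text{s.t. } 0 \le \alpha_i \le C, \; i = 1, \cdots, l_n,$$

where $C$ is a constant, called the penalty factor or regularization parameter.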
In nonlinear cases, that is, cases where the data are not linearly separable because of the hyperspectral data themselves or the external environment, the classic support vector machine can no longer meet the classification requirements, and the kernel function [45] was introduced for this purpose. With the concept of a kernel function, the basic idea of SVM can be summarized simply: first transform the input space into a high-dimensional space by a nonlinear transformation, and then find the optimal linear classification surface in this new space, where the nonlinear transformation is realized by defining an appropriate inner product function. The more commonly used kernel functions include linear kernel functions, polynomial kernel functions, and Gaussian kernel functions. Figure 1 is a schematic diagram of a kernel function support vector machine.
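As a rough illustration of the three kernel families named above, the following sketch evaluates each kernel on short spectral vectors and builds the matrix of pairwise kernel values that a kernel SVM solver would consume. The parameter names and default values (`degree`, `coef0`, `gamma`) and the toy spectra are assumptions made for this example, not values taken from the text.

```python
import math

def linear_kernel(x, z):
    """Plain inner product x^T z."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x^T z + coef0)^degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(x, z) + coef0) ** degree

def gaussian_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2); gamma is an illustrative default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix(samples, kernel):
    """Matrix of pairwise kernel values K[i][j] = kernel(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]

# Three hypothetical 3-band pixel spectra.
spectra = [[0.1, 0.4, 0.7], [0.2, 0.5, 0.6], [0.9, 0.1, 0.3]]
K = gram_matrix(spectra, gaussian_kernel)
```

Every entry of `K` replaces one inner product between two spectral vectors, so the classifier never has to compute the high-dimensional transformation explicitly.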
The formula above contains the inner product $x_i^T x_j$. Let the kernel function be $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$, where $\varphi$ is the nonlinear transformation. Substituting it for the inner product gives the kernel support vector machine:

$$\max_{\alpha} \; \sum_{i=1}^{l_n} \alpha_i - \frac{1}{2} \sum_{i=1}^{l_n} \sum_{j=1}^{l_n} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j), \quad \text{s.t. } 0 \le \alpha_i \le C, \; i = 1, \cdots, l_n.$$

Minimum Distance Classification. The minimum distance classifier (MDC) is a supervised classification method that uses the distance between pixels in the feature space as the basis for classification. It is generally assumed that feature points belonging to the same class cluster together in the feature space. The mean vector of these feature points is used as the center of the class, and the covariance matrix describes the dispersion of the surrounding points. The similarity of a point to each class is then measured. The basic assumption of the similarity measure is that if the feature difference between two patterns is below a set threshold, the two patterns are said to be similar. The method uses the regions formed by the sets of training sample points to represent the decision regions and uses distance as the main basis for measuring the similarity of samples. There are many forms of distance, including the Minkowski distance, Mahalanobis distance, absolute value (Manhattan) distance, Euclidean distance, Chebyshev distance, and Bhattacharyya distance. Among them, the Mahalanobis distance and the Bhattacharyya distance consider not only the class mean vector but also the distribution of the feature points around the class center, so they give more effective classification results than the other distance criteria, although at a higher computational cost.
The matching procedure of the minimum distance method is as follows: ① select a feature type from the spectral library; ② calculate the distance between the feature in the spectral library and the feature to be matched; ③ set a threshold for classification. It is one of the earliest methods applied in image classification research. Because of advantages such as intuitiveness and simple calculation, it is still widely used today. For some classification problems with insufficient training sample points, it can obtain better classification results than other, more complex classifiers. Figure 2 is a flowchart of the minimum distance classification method.
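The minimum distance rule described above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance, class centers taken as the mean of the labeled training spectra, and an optional rejection threshold; the function names, class names, and two-band toy spectra are hypothetical.

```python
import math

def class_means(samples, labels):
    """Mean spectral vector of each class, computed from labeled training pixels."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def classify(pixel, means, threshold=None):
    """Assign the pixel to the nearest class mean.

    Returns None when even the nearest mean is farther than the threshold.
    """
    label, dist = min(((y, euclidean(pixel, m)) for y, m in means.items()),
                      key=lambda pair: pair[1])
    if threshold is not None and dist > threshold:
        return None
    return label

# Hypothetical two-band training spectra for two classes.
train_x = [[0.10, 0.80], [0.20, 0.70], [0.90, 0.20], [0.80, 0.10]]
train_y = ["vegetation", "vegetation", "soil", "soil"]
means = class_means(train_x, train_y)
```

Swapping `euclidean` for a Mahalanobis distance would additionally require a covariance matrix per class, which is where the extra computational cost mentioned above comes from.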

Maximum Likelihood Classification. The maximum likelihood classifier (MLC) is a classification method based on the Bayesian criterion. Maximum likelihood discriminant classification is a nonlinear classification method. During classification, the statistical feature values of each class of training samples are calculated and a classification discriminant function is established. The discriminant function is used to find the probability that each pixel in the hyperspectral remote sensing image belongs to each class, and the test sample is assigned to the class with the highest probability. Each class in the remote sensing image has a projection in any direction of the feature space, but when the projections in these different directions are difficult to distinguish, linear discrimination is not ideal. Nonlinear classification boundaries then need to be established to obtain better results. Because of the large amount of data in a hyperspectral remote sensing image, the covariance matrix will be very large and correspondingly difficult to compute. Even so, the maximum likelihood classification method can generally obtain good results, and it performs best when the training samples are normally distributed. A pixel of a remote sensing image can use its spectral feature vector $X$ as a measure to find a corresponding feature point in the spectral feature space, and the feature points from similar ground features form a cluster with a certain probability distribution in the feature space. The conditional probability $P(\omega_i \mid X)$ that a feature point $X$ falls into a certain cluster $\omega_i$ can be used as a class decision function, which is called a likelihood decision function.
Assuming that $g_i(x)$ is a discriminant function, the probability $p(\omega_i \mid x)$ that a pixel $x$ belongs to class $\omega_i$ can be expressed as

$$g_i(x) = p(\omega_i \mid x).$$

According to the Bayesian formula,

$$p(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, p(\omega_i)}{p(x)},$$

where $p(x \mid \omega_i)$ is the conditional probability of observing $x$ given class $\omega_i$, $p(\omega_i)$ is the prior probability, and $p(x)$ is the probability of $x$ independent of the class. Maximum likelihood classification assumes that the hyperspectral data are normally distributed, and the discriminant formula is

$$g_i(x) = \ln p(\omega_i) - \frac{K}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| - \frac{1}{2} (x - u_i)^T \Sigma_i^{-1} (x - u_i),$$

where $i$ is the class index, $K$ is the number of features, $\Sigma_i$ is the covariance matrix of the $i$-th class, $|\Sigma_i|$ is the determinant of the matrix $\Sigma_i$, and $u_i$ is the mean vector.
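To make the decision rule concrete, the sketch below fits a Gaussian to each class and assigns a pixel to the class with the largest log-discriminant. It assumes the usual Gaussian form $g_i(x) = \ln p(\omega_i) - \frac{K}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(x-u_i)^T\Sigma_i^{-1}(x-u_i)$ and, to stay free of external libraries, is restricted to $K = 2$ bands so that the determinant and inverse can be written out by hand. The class names, priors, and training spectra are hypothetical.

```python
import math

def fit_gaussian(samples):
    """Mean vector and 2x2 sample covariance matrix of two-band training pixels."""
    n = len(samples)
    mu = [sum(s[k] for s in samples) / n for k in range(2)]
    cov = [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in samples) / (n - 1)
            for b in range(2)] for a in range(2)]
    return mu, cov

def discriminant(x, prior, mu, cov):
    """Gaussian log-discriminant g_i(x) for K = 2 features."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    maha = sum(d[a] * inv[a][b] * d[b] for a in range(2) for b in range(2))
    # (K / 2) * ln(2 * pi) reduces to ln(2 * pi) when K = 2
    return math.log(prior) - math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * maha

def classify(x, models):
    """Return the class whose discriminant value is largest."""
    return max(models, key=lambda c: discriminant(x, *models[c]))

# Hypothetical training pixels (two bands) and equal priors.
water = [[0.10, 0.05], [0.12, 0.08], [0.08, 0.06], [0.11, 0.04]]
forest = [[0.30, 0.60], [0.35, 0.55], [0.28, 0.65], [0.33, 0.58]]
models = {"water": (0.5, *fit_gaussian(water)),
          "forest": (0.5, *fit_gaussian(forest))}
```

With hundreds of bands, the same rule needs a full $K \times K$ covariance matrix per class, which is why the method becomes expensive and sensitive to the number of training samples.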

Neural Network Classification.
The artificial neural network (ANN) is the most popular artificial intelligence classification method at present. It is characterized by simulating the way human neurons process information. It has been widely used in intelligent control, information processing, and combinatorial optimization. However, artificial neural networks also have their own weaknesses, such as the need for a large amount of training data, slow operation, and the difficulty of obtaining decision surfaces in the feature space. Commonly used neural network classifiers include BP neural networks [46], radial basis neural networks [47], and wavelet neural networks [48]. The BP neural network is a widely used neural network model. It consists of an input layer, a hidden layer, and an output layer. When an input pattern is given, the transmission of the input signal from the input layer to the output layer is a forward propagation process. If there is an error between the output signal and the desired signal, the error is passed to the backward propagation process, and the weights of each layer are adjusted according to the magnitude of the error of each layer. The implementation of BP neural network classification mainly includes two stages. The first stage is the network's self-learning on the sample data to obtain an optimized connection weight matrix. The main steps include determining the network structure, inputting the sample data and control parameters, initializing the weights, and adjusting the connection weights of each layer. The second stage is to use the learning results to classify the entire image. After the image is input, the network applies the connection weight matrix obtained during learning to the image data. According to the comparison between the output result and the expected value of each class, each pixel is assigned to the class with the smallest error.
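The two stages described above (learning the connection weights, then classifying pixels with them) can be sketched with a very small BP network. This is an illustrative sketch only: it assumes a single hidden layer, sigmoid units, a squared-error loss, and per-sample gradient updates, and the layer sizes, learning rate, random seed, and two-band toy pixels are arbitrary choices rather than values from the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BPNetwork:
    """Input layer, one hidden layer, output layer; sigmoid units throughout."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        """Forward propagation: returns hidden and output activations."""
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, o

    def train_step(self, x, target, lr=0.5):
        """One backward propagation step on a single labeled pixel."""
        h, o = self.forward(x)
        # Error terms of the output layer.
        delta_o = [(o[k] - target[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
        # Error terms of the hidden layer, propagated back through w2.
        delta_h = [h[j] * (1 - h[j]) *
                   sum(delta_o[k] * self.w2[k][j] for k in range(len(o)))
                   for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * delta_o[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_h[j] * v

    def predict(self, x):
        """Index of the output unit with the largest activation."""
        _, o = self.forward(x)
        return o.index(max(o))

# Hypothetical two-band pixels with one-hot targets for two classes.
pixels = [[0.1, 0.8], [0.2, 0.7], [0.9, 0.2], [0.8, 0.1]]
targets = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

def total_error(net):
    return sum((o - t) ** 2 for x, tgt in zip(pixels, targets)
               for o, t in zip(net.forward(x)[1], tgt))

net = BPNetwork(n_in=2, n_hidden=3, n_out=2)
error_before = total_error(net)
for _ in range(2000):                      # stage 1: learn the weights
    for x, tgt in zip(pixels, targets):
        net.train_step(x, tgt)
error_after = total_error(net)
predicted = [net.predict(x) for x in pixels]   # stage 2: classify pixels
```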
SVM is a small-sample learning method with a solid theoretical foundation, so it does not require a large number of training samples. However, it has shortcomings when dealing with large-scale samples, and it does not handle multiclass problems well. The principle of the minimum distance classification method is simple; its classification accuracy is not high, but its calculation speed is fast, so it can be used for a quick overview classification. In practical applications, the maximum likelihood method, the minimum distance method, and the neural network method can all be used, but strict and precise human supervision is required to ensure that the accuracy meets the requirements.

Deep Learning
In recent years, hyperspectral image classification methods have introduced the spatial information of hyperspectral images. Methods of this type are referred to as hyperspectral image classification methods based on spatial-spectral joint features. Deep learning originates from artificial neural networks. Compared with traditional artificial neural networks, deep learning has a stronger abstraction ability. Deep learning models have more layers, which also helps to extract feature information. This section mainly introduces three deep learning models: the convolutional neural network (CNN) [49,50], the deep belief network (DBN), and the stacked autoencoder (SAE).

CNN. Classification method based on spectral features:
Hyperspectral images have very rich spectral information and extremely high spectral resolution. A one-dimensional spectral vector can be extracted from each pixel. Classification that uses only these one-dimensional spectral vectors is called classification based on spectral information. In such methods, the spectral information of the pixel is generally used directly, or certain specific features are first obtained from it through feature extraction and then used to classify. Using a CNN to classify the spectral features of hyperspectral images means using a one-dimensional CNN (1D-CNN) to extract the spectral features and classify them. The process is shown in Figure 3.
The specific process is as follows: input the labeled data of the hyperspectral image into the 1D-CNN, train the 1D-CNN with the class labels, iteratively update the network weights through algorithms such as SGD, and finally use the trained 1D-CNN to classify each pixel and obtain the classification result. The one-dimensional convolution operation uses a one-dimensional convolution kernel to perform a convolution on a one-dimensional feature vector. Its expression is

$$v_{l,j}^{x} = f\left( \sum_{m} \sum_{h=0}^{H_l - 1} k_{l,j,m}^{h} \, v_{(l-1),m}^{(x+h)} + b_{l,j} \right),$$

where $f$ is the activation function, $k_{l,j,m}^{h}$ represents the value at position $h$ of the $j$-th convolution kernel in the $l$-th layer, which is connected to the $m$-th feature map in layer $l-1$, $H_l$ represents the length of the one-dimensional convolution kernel, $b_{l,j}$ represents the offset of the $j$-th feature map of the $l$-th layer, and $v_{(l-1),m}^{(x+h)}$ represents the value of the $m$-th feature map of layer $l-1$ at position $x + h$.
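The one-dimensional convolution can be written out directly as nested loops. The sketch below assumes the common form in which each output value is an activation applied to the bias plus the sum, over input feature maps $m$ and kernel positions $h$, of kernel value times input value at position $x + h$. It also assumes stride 1, no padding, and a ReLU activation; these choices, the function name, and the toy spectrum are illustrative assumptions.

```python
def conv1d_layer(prev_maps, kernels, biases, activation=lambda z: max(0.0, z)):
    """One 1D convolution layer.

    prev_maps[m][x]  : value of feature map m of layer l-1 at position x
    kernels[j][m][h] : value at position h of kernel j, connected to input map m
    biases[j]        : offset of output feature map j
    """
    kernel_len = len(kernels[0][0])
    out_len = len(prev_maps[0]) - kernel_len + 1   # no padding, stride 1
    out = []
    for j, bias in enumerate(biases):
        fmap = []
        for x in range(out_len):
            total = bias
            for m, v in enumerate(prev_maps):
                for h in range(kernel_len):
                    total += kernels[j][m][h] * v[x + h]
            fmap.append(activation(total))
        out.append(fmap)
    return out

# One input map: a hypothetical 5-band pixel spectrum.
spectrum = [[1.0, 2.0, 4.0, 7.0, 11.0]]
# One kernel of length 2 that responds to the difference between adjacent bands.
kernels = [[[-1.0, 1.0]]]
result = conv1d_layer(spectrum, kernels, biases=[0.0])
print(result)  # [[1.0, 2.0, 3.0, 4.0]]
```

In a trained 1D-CNN the kernel values are learned rather than fixed by hand, but the arithmetic of each layer is the same.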
Classification method based on spatial features: Spatial information is context information. When classifying based on spatial information, the spatial information extracted from the neighborhood of a pixel is used instead of the spectral information extracted from the pixel itself. Because of the high dimensionality of hyperspectral data, the usual way to extract spatial information is to first compress the data set, then use a two-dimensional convolutional neural network (2D-CNN) to extract deeper spatial information, and then classify with this spatial information. The specific process is shown in Figure 4.
The main difference between the two-dimensional convolution operation and the one-dimensional convolution operation is the dimension of the convolution layer and the pooling layer. The two-dimensional convolution is

$$v_{l,j}^{x,y} = f\left( \sum_{m} \sum_{h=0}^{H_l - 1} \sum_{w=0}^{W_l - 1} k_{l,j,m}^{h,w} \, v_{(l-1),m}^{(x+h),(y+w)} + b_{l,j} \right),$$

where $f$ is the activation function, $k_{l,j,m}^{h,w}$ represents the value at $(h, w)$ of the $j$-th convolution kernel in the $l$-th layer, which is connected to the $m$-th feature map in layer $l-1$, $H_l$ and $W_l$ represent the height and width of the convolution kernel, respectively, $b_{l,j}$ represents the offset of the $j$-th feature map of the $l$-th layer, and $v_{(l-1),m}^{(x+h),(y+w)}$ represents the value of the $m$-th feature map of layer $l-1$ at position $(x + h, y + w)$.
Classification method based on spectral-spatial features: In traditional hyperspectral image classification, often only spectral information is used. However, the same ground feature can show different spectral curves because of the influence of the external environment, and different ground objects may exhibit the same spectral curve. These are the phenomena of the same object with different spectra and of different objects with the same spectrum. Spatial context helps to resolve them. For example, if a group of spatially connected pixels is classified as a parking lot, then pixels within it whose spectra closely resemble that of metal are likely to be cars. If many pixels around a pixel are grass, the pixel in the middle is likely to be grass too. Hyperspectral data have a three-dimensional structure, containing both one-dimensional spectral information and two-dimensional spatial information. A three-dimensional convolutional neural network (3D-CNN) can extract spectral information and spatial information simultaneously. The specific process is shown in Figure 5.

DBN. The realization of a deep belief network (DBN) is based on restricted Boltzmann machine (RBM). DBN is a network model constructed by multiple RBMs layer by layer.
A classic DBN network is composed of several RBMs and a layer of BP. The structure diagram is shown in Figure 6.
During training, a layer-by-layer unsupervised method is used to learn the parameters. First, the data and the first hidden layer are treated as an RBM and its parameters are trained. These parameters are then fixed, the first hidden layer is treated as the visible vector and the second hidden layer as the hidden vector, and the second RBM is trained. Its parameters are fixed in turn, and the procedure is repeated for the remaining layers. The specific steps are as follows. First, train each RBM layer separately and without supervision, to ensure that the feature vectors retain as much feature information as possible when they are mapped to different feature spaces. Second, set up the BP network in the last layer of the DBN, take the output feature vector of the RBM as its input feature vector, and train the classifier with supervision. Each RBM layer can only ensure that the weights in its own layer are optimal for the feature vector mapping of that layer, not for the feature vector mapping of the entire DBN, so the backpropagation network also propagates the error information from top to bottom to each RBM layer and fine-tunes the whole DBN. The RBM training can be seen as the initialization of the weight parameters of a deep BP network, so that the DBN overcomes the shortcomings of a BP network whose weights are randomly initialized, namely, the tendency to fall into local optima and the long training time.
When using a DBN to classify the spectral features of hyperspectral images, the main method is to use the DBN to extract deeper features from the spectral information collected at the positions of the pixels to be classified, and then use these deep features to complete the classification. The DBN-based spatial feature classification method for hyperspectral images is very similar to the SAE-based one. In [51], Li et al. first used PCA to compress the original hyperspectral image, retained the first 3 principal components, and then extracted the data in the 7 × 7 neighborhood as the input of a DBN network for feature extraction and classification. Literature [52] uses a DBN to extract spectral features and spatial features separately, concatenates them to form a spatial-spectral feature, and then completes the classification based on the spatial-spectral feature. The overall framework of the task and classification is basically the same as the framework in [53], and the method in [54] introduces sparse restrictions, Gaussian ... Figure 7 shows the flowchart of using a DBN to classify hyperspectral images.
SAE. The SAE network structure is similar to that of the DBN, and it is also trained in an unsupervised form. Depending on the specific training method, the SAE will perform differently. Hyperspectral image classification methods based on the SAE network can also generally be divided into classification methods based on spectral features, classification methods based on spatial features, and classification methods based on spatial-spectral joint features. When performing spectral feature classification of hyperspectral images based on SAE, the spectral vector extracted at the position of the pixel to be classified is usually used as the input data of the network, SAE is used to extract deeper and more abstract spectral features, and the classification task is then completed based on these deep spectral features.
When performing spatial feature classification of hyperspectral images based on SAE, the main method is to first reduce the dimensionality of the hyperspectral image, then extract all the information in the neighborhood centered on the pixel to be classified, and flatten the extracted information into a vector that serves as the SAE input. SAE cannot directly process data blocks with a two-dimensional spatial structure. To meet the input format requirements of SAE, the data need to be stretched into a one-dimensional vector, which differs greatly from the spatial feature classification of hyperspectral images based on CNN.
In SAE-based spatial-spectral joint classification of hyperspectral images, the common practice is to extract spatial information and spectral information separately and then merge them to obtain the spatial-spectral joint information. The spectral vector is extracted at the spatial position of the pixel to be classified. The hyperspectral image is then compressed, the spatial information is extracted from the neighborhood of the pixel in the compressed image and flattened into a vector, and the spectral vector and the spatial vector are merged. The merged information is input into the SAE, which extracts deep spatial-spectral features and completes the classification.
Statistics show that hyperspectral image classification methods based on SAE and DBN were developed earlier, and various methods based on CNN were proposed later. CNN-based hyperspectral image classification methods have developed the fastest and account for the largest number of papers. Overall, their classification results are better than those of the methods based on SAE and DBN.
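The construction of the joint input vector described above can be sketched as follows. The example assumes that dimensionality reduction has already produced a small stack of reduced band images, flattens the window around the pixel in each of them, and appends the result to the pixel's spectrum. The window size, the edge-replication rule at image borders, and all names and values are illustrative assumptions.

```python
def neighborhood_vector(band_images, row, col, size=3):
    """Flatten the size x size window around (row, col) in every reduced band.

    Indices outside the image are clamped to the border (edge replication).
    """
    half = size // 2
    vec = []
    for img in band_images:
        n_rows, n_cols = len(img), len(img[0])
        for dr in range(-half, half + 1):
            for dc in range(-half, half + 1):
                r = min(max(row + dr, 0), n_rows - 1)
                c = min(max(col + dc, 0), n_cols - 1)
                vec.append(img[r][c])
    return vec

def joint_vector(spectrum, band_images, row, col, size=3):
    """Concatenate the pixel's spectrum with its flattened spatial neighborhood."""
    return list(spectrum) + neighborhood_vector(band_images, row, col, size)

# One reduced band of 3 x 3 pixels and a hypothetical 4-band spectrum.
reduced = [[[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]]
spectrum = [0.2, 0.4, 0.6, 0.8]
v = joint_vector(spectrum, reduced, row=1, col=1)
```

The resulting one-dimensional vector is what a fully connected model such as an SAE or a DBN takes as input, whereas a CNN can consume the two-dimensional window directly.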

Unsupervised Classification
The unsupervised classification method refers to the classification based on the spectral similarity of the hyperspectral data, that is, the clustering method without any prior knowledge. Because no prior knowledge is used, unsupervised classification can only assume initial parameters, form clusters through preclassification processing, and then iterate to make the relevant parameters reach the allowable range.

K-Means Classification. The idea of the K-means clustering method is to minimize the sum of the squared distances from all the pixels in each class to the center point of that class. At the beginning of the clustering, the initial cluster centers are selected randomly, and the other pixels to be classified are assigned to one of the categories according to the prescribed rule to complete the initial clustering. The center point of each class is then recalculated and updated, and the pixels are classified again. This is iterated until the positions of the cluster centers no longer change, at which point the best cluster centers and the best clustering result have been found and the iteration stops. Figure 8 shows the algorithm flow of K-means clustering. In K-means clustering, the number of categories cannot be changed during the calculation, and the positions of the initially selected cluster centers also affect the clustering result, so the results may differ greatly from one run to the next. This is the disadvantage of K-means clustering. It can be alleviated by finding a better initial cluster center.

Iterative Self-Organizing Method. The ISODATA algorithm is also a commonly used clustering algorithm. It is similar to the K-means algorithm and is an improvement on it. Compared with K-means clustering, the ISODATA algorithm has clear advantages in classification. First, the cluster centers are not adjusted continuously during the calculation; instead, all categories are calculated first and the samples are then adjusted together. Second, in a major difference from K-means clustering, the number of categories can be adjusted automatically according to the actual situation, so that a reasonable clustering result can be obtained.
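The K-means iteration described above (assign each pixel to the nearest center, recompute the centers, stop when the assignment no longer changes) can be sketched as follows. The initial centers are drawn at random from the data with a fixed seed so that the run is repeatable; the seed, the iteration limit, the rule of keeping the old center for an empty cluster, and the two-band toy pixels are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]   # random initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center for every pixel.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:        # assignments stable: converged
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                 # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical two-band pixels forming two well-separated groups.
pixels = [[0.10, 0.80], [0.20, 0.70], [0.15, 0.75],
          [0.90, 0.20], [0.80, 0.10], [0.85, 0.15]]
labels, centers = kmeans(pixels, k=2)
```

Changing `seed` changes the initial centers, which is exactly the sensitivity noted above. An ISODATA-style variant would add steps that split or merge clusters between iterations so that the number of categories can change.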
The main advantages of these two classification methods are as follows: there is no need for broad prior knowledge of the classification area, since only a certain amount of knowledge is required to interpret the resulting cluster groups; the chance of human error is reduced, and few initial parameters need to be input; clusters with small but unique spectral characteristics are more homogeneous than the classes obtained by supervised classification; and unique categories that cover only a small area can be identified. The main shortcomings are as follows: a lot of analysis and postprocessing is required to obtain reliable classification results; the resulting clusters may or may not correspond to the land categories, and the common phenomena of "same object, different spectra" and "different objects, same spectrum" make the matching of cluster groups and categories difficult. Because the spectral characteristics of each category change with time and terrain, the spectral cluster groups of different images do not remain consistent and are difficult to compare.
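As an illustration of the K-means procedure described in this section, the following is a minimal sketch in plain Python rather than a reference implementation. The function names, the `seed` and `max_iter` parameters, and the three-band toy "spectra" are assumptions made for this example; the stopping rule (assignments no longer change) is equivalent to the cluster centers no longer moving.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two spectra of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(pixels, k, max_iter=100, seed=0):
    """Cluster pixel spectra into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Random initial centers, as in the text (the result depends on this choice).
    centers = [list(p) for p in rng.sample(pixels, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each pixel joins the class with the nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in pixels
        ]
        if new_labels == labels:
            break  # assignments, and hence centers, no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return centers, labels


# Hypothetical toy data: six pixels with three bands, two well-separated groups.
pixels = [(0.10, 0.20, 0.10), (0.20, 0.10, 0.20), (0.15, 0.15, 0.12),
          (0.90, 0.80, 0.90), (0.80, 0.90, 0.85), (0.85, 0.85, 0.95)]
centers, labels = kmeans(pixels, k=2)
print(labels)
```

Changing `seed` changes the initial centers; on less well-separated data this can change the final clusters, which is the sensitivity to initialization noted above.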

Semisupervised Classification
The main disadvantage of the supervised method is that the classification model and the classification accuracy depend mainly on the number of labeled training samples, and obtaining a large number of class labels for hyperspectral images is time-consuming and costly. Although unsupervised methods do not depend on labeled samples, the lack of prior knowledge means that the relationship between the clustering categories and the real categories is uncertain [55]. Semisupervised classification uses both labeled and unlabeled data to train the classifier, and so makes up for the shortcomings of unsupervised and supervised learning. This classification method is based on the assumption that labeled and unlabeled samples of the same class lie close together in the feature space. Since a large number of unlabeled samples can describe the overall characteristics of the data more fully, a classifier trained on both kinds of samples generalizes better.
This classification method is widely used in hyperspectral image classification. Typical semisupervised classification methods include generative model algorithms, semisupervised support vector machines, graph-based semisupervised algorithms, self-training, co-training, and tri-training. In view of the problems above, this paper reviews semisupervised classification methods. Semisupervised learning has attracted much attention from researchers in hyperspectral image classification because it requires only a small number of labeled samples [56]. It combines labeled data with unlabeled data to improve classification accuracy.

Laplace Support Vector Machine
The Laplacian support vector machine (LapSVM) is built on the traditional SVM. By adding a manifold regularization term, it makes full use of the geometric information of both unlabeled and labeled samples to construct a classifier. LapSVM can predict the labels of future test samples, and it is characterized by strong adaptability and global optimization.
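Given labeled samples $\{(x_i, y_i)\}_{i=1}^{l}$ and unlabeled samples $\{x_i\}_{i=l+1}^{l+u}$, with $x_i \in \mathbb{R}^m$ and $y_i \in \{-1, +1\}$, let $f$ denote the decision function. The regularization framework of LapSVM is
$$f^{*} = \arg\min_{f \in \mathcal{H}} \; \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_L \|f\|_{\mathcal{H}}^{2} + \gamma_M \|f\|_{M}^{2},$$
where $V$ is the misclassification cost function on the labeled samples, $\gamma_L$ controls the complexity of $f$ in the Hilbert space $\mathcal{H}$, and $\gamma_M$ controls the complexity of $f$ with respect to the geometric characteristics of the data distribution. The structure of LapSVM is described in detail below. First, LapSVM uses the same loss function as the traditional SVM:
$$V(x_i, y_i, f) = \max\{0,\; 1 - y_i f(x_i)\},$$
where $f$ is the classification decision function $f(x) = \langle w, \varphi(x) \rangle + b$ of the selected classifier and $\varphi(\cdot)$ is a nonlinear mapping from a low-dimensional space to a high-dimensional Hilbert space. Writing the weight vector as an expansion over all labeled and unlabeled samples,
$$w = \sum_{i=1}^{l+u} \alpha_i \varphi(x_i) = \Phi \alpha,$$
where $\alpha = [\alpha_1, \cdots, \alpha_{l+u}]^{T}$ and $\Phi = [\varphi(x_1), \cdots, \varphi(x_{l+u})]$, the decision function becomes
$$f(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x) + b.$$
Here $K(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$ is the kernel function, and different learners are realized by selecting different kernel functions. With $K$ also denoting the $(l+u) \times (l+u)$ kernel matrix, it follows that
$$\|f\|_{\mathcal{H}}^{2} = \|w\|^{2} = (\Phi \alpha)^{T} (\Phi \alpha) = \alpha^{T} K \alpha.$$
The LapSVM algorithm models the geometric distribution of the data by constructing a graph from the labeled and unlabeled samples. Under the smoothness assumption, rapid changes of the classification function along the graph are penalized by
$$\|f\|_{M}^{2} = \frac{1}{(l+u)^{2}} \, \mathbf{f}^{T} L \, \mathbf{f} = \frac{1}{2 (l+u)^{2}} \sum_{i,j=1}^{l+u} W_{ij} \left( f(x_i) - f(x_j) \right)^{2},$$
where $W_{ij}$ is the weight of the edge between $x_i$ and $x_j$, $L = D - W$ is the graph Laplacian with diagonal degree matrix $D_{ii} = \sum_{j} W_{ij}$, and $\mathbf{f} = [f(x_1), \cdots, f(x_{l+u})]^{T} = K\alpha + b\mathbf{1}$. Because $L\mathbf{1} = 0$, we have $\mathbf{f}^{T} L \, \mathbf{f} = \alpha^{T} K L K \alpha$. Substituting the above formulas into the regularization framework gives
$$\min_{\alpha, \, b, \, \xi} \; \frac{1}{l} \sum_{i=1}^{l} \xi_i + \gamma_L \, \alpha^{T} K \alpha + \frac{\gamma_M}{(l+u)^{2}} \, \alpha^{T} K L K \alpha$$
$$\text{s.t.} \quad y_i \left( \sum_{j=1}^{l+u} \alpha_j K(x_i, x_j) + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \cdots, l,$$
where $\xi_i$ is the slack variable of the $i$-th labeled sample.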
The LapSVM algorithm fully reflects the role of unlabeled samples in the classification process through the geometric characteristics of the data, but it often incurs a large computational cost.
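To make the manifold regularization idea concrete, the sketch below builds a small graph over labeled and unlabeled points and evaluates the graph-smoothness penalty $\mathbf{f}^{T} L \, \mathbf{f}$ with $L = D - W$. It is not a LapSVM solver. The fully connected graph, the Gaussian edge weights, the `sigma` parameter, and the toy points are assumptions chosen for illustration. The code checks numerically that the quadratic form equals half the weighted sum of squared differences, and that a function that is constant within each group of nearby points is penalized far less than one that oscillates inside the groups.

```python
import math


def gaussian_weights(points, sigma=1.0):
    """Fully connected graph with Gaussian edge weights W_ij (W_ii = 0)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def graph_laplacian(W):
    """L = D - W, where D is diagonal with D_ii = sum_j W_ij."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(L, f):
    """Compute f^T L f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def pairwise_penalty(W, f):
    """Compute (1/2) * sum_ij W_ij (f_i - f_j)^2 for symmetric W."""
    n = len(f)
    return 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))


# Hypothetical toy data: two tight groups of points (labeled and unlabeled alike).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
W = gaussian_weights(points)
L = graph_laplacian(W)

f_smooth = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]  # constant within each group
f_rough = [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0]   # changes sign inside the groups

smooth_penalty = quadratic_form(L, f_smooth)
rough_penalty = quadratic_form(L, f_rough)
print(smooth_penalty, rough_penalty)
```

In a LapSVM this penalty is evaluated on the classifier outputs at all labeled and unlabeled samples, which is how the unlabeled samples influence the decision function.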
5.2. Self-Training. Self-training is a commonly used semisupervised classification algorithm. A classifier is first trained with the labeled samples, and a large number of unlabeled samples are then labeled with this classifier. Data with high confidence are selected from the newly labeled samples and added, together with their predicted labels, to the initial training set to retrain the classifier. This process is repeated until the end condition is met. The general process of self-training is as follows:
(1) Use the initial labeled sample set to train the classifier
(2) Use the classifier to label the data in the set of unlabeled samples, and select and record the samples with the highest confidence
(3) Retrain the classifier using the new sample set
(4) Repeat steps (2) and (3) until the end condition is met
Self-training algorithms have been widely used because the method is simple and convenient. However, because the number of initial training samples is generally limited, it is difficult to train a classifier with good generalization performance and high accuracy. When unlabeled samples are labeled, a large number of mislabeled samples can be generated, and once added to the original training set they remain there as noise. As the loop iterates, these errors accumulate and degrade the classification performance of the classifier.
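The self-training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular paper: the base classifier is a nearest-centroid rule, confidence is taken to be the distance margin between the two nearest class centers, and `per_round` and `max_rounds` are hypothetical parameters that define how many samples are added per iteration and when the loop ends. At least two classes are assumed.

```python
import math


def fit_centroids(samples, labels):
    """Step (1)/(3): 'train' a nearest-centroid classifier (one mean per class)."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [sum(v) / len(xs) for v in zip(*xs)] for y, xs in groups.items()}


def predict_with_confidence(model, x):
    """Return (label, confidence); confidence is the margin between the
    distances to the two nearest class centers (larger means more confident)."""
    dists = sorted((math.dist(x, center), y) for y, center in model.items())
    (d1, y1), (d2, _) = dists[0], dists[1]
    return y1, d2 - d1


def self_training(labeled, labels, unlabeled, per_round=2, max_rounds=10):
    train_x, train_y = list(labeled), list(labels)
    pool = list(unlabeled)
    model = fit_centroids(train_x, train_y)            # step (1)
    for _ in range(max_rounds):                        # step (4): repeat
        if not pool:
            break                                      # end condition: pool is empty
        # Step (2): label the pool and pick the most confident predictions.
        scored = sorted(((predict_with_confidence(model, x), x) for x in pool),
                        key=lambda item: item[0][1], reverse=True)
        for (y, _), x in scored[:per_round]:
            train_x.append(x)
            train_y.append(y)                          # the label is a prediction
            pool.remove(x)
        model = fit_centroids(train_x, train_y)        # step (3): retrain
    return model, train_x, train_y


# Hypothetical toy data: one labeled sample per class, six unlabeled samples.
labeled = [(0.0, 0.0), (4.0, 4.0)]
labels = [0, 1]
unlabeled = [(0.5, 0.2), (0.3, 0.8), (1.2, 1.0),
             (3.6, 3.9), (3.2, 3.5), (2.9, 2.6)]
model, train_x, train_y = self_training(labeled, labels, unlabeled)
print(train_y)
```

Because the added labels are predictions, any mistake made in step (2) stays in the training set, which is the error accumulation problem noted above. A stricter confidence threshold would trade fewer added samples for fewer mislabeled ones.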

Evaluation of Classification Results
After hyperspectral remote sensing images have been classified, we need to judge the quality of the classification results and thereby evaluate the performance of the classifier. Evaluation criteria are used to measure the degree of agreement between the classification results and the real ground features; that is, the classification accuracy is calculated to show the feasibility of the proposed algorithm, followed by a comparative analysis against other classification algorithms and further improvements based on their shortcomings. Commonly used evaluation indicators include the confusion matrix, producer accuracy, user accuracy, the Kappa coefficient, and overall classification accuracy [57]; their principles are briefly introduced below.
6.1. Confusion Matrix. The confusion matrix, also called the error matrix [58], is mainly used to compare the classification result with the actual ground cover. The other evaluation criteria are based on it. Assume that the confusion matrix of order $c \times c$ is
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1c} \\ x_{21} & x_{22} & \cdots & x_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ x_{c1} & x_{c2} & \cdots & x_{cc} \end{bmatrix},$$
where $c$ is the number of categories, $x_{ij}$ $(i, j = 1, 2, \cdots, c)$ is the number of samples of the $i$-th category assigned to the $j$-th category, and the diagonal element $x_{ii}$ is the number of sample points that were correctly classified. The total number of sample points $n$ is calculated as
$$n = \sum_{i=1}^{c} \sum_{j=1}^{c} x_{ij}.$$
6.2. Overall Accuracy. The overall accuracy is the ratio of the number of correctly classified sample points to the total number of samples. The calculation formula is
$$\mathrm{OA} = \frac{1}{n} \sum_{i=1}^{c} x_{ii}.$$
6.3. Kappa Coefficient. The Kappa coefficient evaluates the classification result of an image comprehensively. It takes into account both the number of correctly classified sample points and the number of misclassified ones, which makes it a convincing measure [59]. Based on the confusion matrix, it can be calculated as
$$\mathrm{Kappa} = \frac{n \sum_{i=1}^{c} x_{ii} - \sum_{i=1}^{c} x_{i+} x_{+i}}{n^{2} - \sum_{i=1}^{c} x_{i+} x_{+i}},$$
where $n$ is the total number of sample points and $x_{i+}$ is the sum of the elements in the $i$-th row, defined as
$$x_{i+} = \sum_{j=1}^{c} x_{ij}, \tag{20}$$
and $x_{+i}$ is the sum of the elements in the $i$-th column, defined as
$$x_{+i} = \sum_{j=1}^{c} x_{ji}.$$
Table 1 compares several supervised classification methods on the Pavia University data. Note that the values of OA and AA are given as percentages.
Among the above methods, SVM takes longer to train but classifies quickly; RF trains quickly and is simple to implement but is prone to overfitting; the BP algorithm is fault tolerant and has a strong self-learning ability, but it converges slowly and takes a long time to train; 1D-CNN can handle high-dimensional data but requires a large number of samples and is best trained on a GPU. Table 2 lists several classification methods based on deep learning. The data again come from Pavia University, and the table compares the classification performance of each method using spectral information, spatial information, and spatial-spectral information.
Comparing Tables 1 and 2 shows that the deep learning methods outperform the ordinary classification methods; in particular, combining spatial information with spectral information greatly improves classification performance. However, deep learning methods require a large number of training samples, and the training time is long. If enough training samples are available, a deep learning method is a good choice. If training samples are scarce, methods such as SVM can be used instead.
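The evaluation measures of this section can be computed directly from predicted and reference labels. The sketch below is a plain-Python illustration: the function names and the ten-sample toy labels are assumptions, classes are assumed to be encoded as integers 0 to c-1, and rows of the matrix index the true category while columns index the assigned category, following the definition of the confusion matrix entries in 6.1.

```python
def confusion_matrix(true_labels, predicted_labels, c):
    """X[i][j] = number of samples of class i that were assigned to class j."""
    X = [[0] * c for _ in range(c)]
    for t, p in zip(true_labels, predicted_labels):
        X[t][p] += 1
    return X


def overall_accuracy(X):
    """Fraction of samples on the diagonal of the confusion matrix."""
    n = sum(map(sum, X))
    return sum(X[i][i] for i in range(len(X))) / n


def kappa_coefficient(X):
    """Kappa = (n * sum_i x_ii - sum_i x_i+ x_+i) / (n^2 - sum_i x_i+ x_+i)."""
    c = len(X)
    n = sum(map(sum, X))
    diagonal = sum(X[i][i] for i in range(c))
    row_sums = [sum(X[i]) for i in range(c)]                        # x_i+
    col_sums = [sum(X[i][j] for i in range(c)) for j in range(c)]   # x_+i
    chance = sum(row_sums[i] * col_sums[i] for i in range(c))
    return (n * diagonal - chance) / (n * n - chance)


# Hypothetical toy labels for ten pixels and three classes.
true_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]

X = confusion_matrix(true_labels, predicted, c=3)
print(X)
print(overall_accuracy(X), kappa_coefficient(X))
```

In this toy case eight of ten samples are correct, so the overall accuracy is 0.8, while the Kappa coefficient is lower (47/67) because it discounts the agreement expected from the row and column totals alone.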

Conclusion
Classification and recognition of hyperspectral images are an important part of hyperspectral image processing. This paper discusses several methods of hyperspectral image classification, including supervised, unsupervised, and semisupervised classification. Although the supervised and unsupervised classification methods described in this article each have their advantages to varying degrees, every method has limitations in application. For example, supervised classification requires a certain amount of prior knowledge, and human factors affect the classification results. Therefore, depending on the application requirements and given the massive amount of information in hyperspectral images, multiple methods need to be combined in order to achieve the desired classification effect. With the development of hyperspectral imaging technology, hyperspectral image classification has been widely used. Existing theories and methods still have limitations for more complicated hyperspectral image classification tasks. Therefore, developing more targeted hyperspectral image classification methods will be an important research direction in the future.

Data Availability
Data are available on request. Please contact Xiaofei Wang to request the data.

Conflicts of Interest
The authors declare that they have no conflicts of interest.