A Dual-Branch Fusion of a Graph Convolutional Network and a Convolutional Neural Network for Hyperspectral Image Classification

Semi-supervised graph convolutional networks (SSGCNs) have been proven to be effective in hyperspectral image classification (HSIC). However, limited training data and spectral uncertainty restrict the classification performance, and the computational demands of a graph convolution network (GCN) present challenges for real-time applications. To overcome these issues, a dual-branch fusion of a GCN and convolutional neural network (DFGCN) is proposed for HSIC tasks. The GCN branch uses an adaptive multi-scale superpixel segmentation method to build fusion adjacency matrices at various scales, which improves the graph convolution efficiency and node representations. Additionally, a spectral feature enhancement module (SFEM) enhances the transmission of crucial channel information between the two graph convolutions. Meanwhile, the CNN branch uses a convolutional network with an attention mechanism to focus on detailed features of local areas. By combining the multi-scale superpixel features from the GCN branch and the local pixel features from the CNN branch, this method leverages complementary features to fully learn rich spatial–spectral information. Our experimental results demonstrate that the proposed method outperforms existing advanced approaches in terms of classification efficiency and accuracy across three benchmark data sets.


Introduction
A hyperspectral image (HSI) offers more refined spectral information than other remote sensing images (e.g., optical and multispectral images), making it advantageous for hyperspectral image classification (HSIC) tasks. HSIC has various applications in military detection, mineral exploration, agricultural production, urban planning, and environmental monitoring [1][2][3][4][5]. However, HSI has inherent characteristics such as high-dimensional data, a limited training sample size, and spectral uncertainty. Predicting the true class of each pixel with high accuracy based on spatial-spectral features in HSI remains a challenge [3] and merits further research.
During early HSIC research, most studies focused on how spectral characteristics functioned in classification and proposed many traditional pixel-based classification methods, including support vector machines (SVMs) [6], K-nearest neighbors [7], and polynomial logistic regression [8,9]. The main disadvantage of these methods is that the feature engineering step is time-consuming, and the classification accuracy tends to drop when the extracted features are highly correlated or carry little information. Given the inherent nonlinear relationship between spectral information and the corresponding materials in HSI, it is challenging to accurately classify such data using traditional machine learning methods.
Sensors 2024, 24, 4760
Deep learning (DL) is regarded as a powerful tool for solving nonlinear problems. It has been used extensively in many tasks, including image classification [10,11], target detection [12], and natural language processing [13]. Inspired by the success of these applications, HSIC has also benefited from DL, demonstrating promising results. Currently, one of the most important indicators of a classification model's quality is how well it avoids the curse of dimensionality and extracts useful, discriminative feature information from the high-dimensional feature space of HSIs. In [14][15][16][17][18], 2D CNNs are employed to extract spatial characteristics from the hyperspectral pixel neighborhood after first applying principal component analysis (PCA) to the entire set of hyperspectral data to lower the dimensionality of the original space. By integrating PCA and CNNs, this approach can efficiently extract spatial information and lower computing costs. Nevertheless, spectral information is unavoidably lost during the dimensionality reduction process, which might impact the model's ability to comprehend input features and its overall classification performance. Li et al. proposed a three-dimensional convolutional neural network (3-D-CNN) that can jointly analyze spatial-spectral characteristics and produce impressive classification performance [19]. It is worth noting that a diverse-region CNN (DR-CNN) utilized various neighboring areas of the center target pixel; nevertheless, when labeled samples were insufficient, the model's generalization was a cause for concern [20].
In addition, as the data volume increases, the scarcity of labeled samples worsens, making it challenging to use the above-mentioned CNN-based supervised learning approaches without a sufficient number of labeled training samples. Consequently, scholars began focusing on semi-supervised learning (SSL), which uses the labeled data and supplements it with information from unlabeled samples. For instance, Zhou et al. suggested a label propagation approach that uses labeled samples to propagate labels across the full HSI and obtain labels for unlabeled samples [21]. However, because of its sensitivity to parameters, this approach is prone to noise. The semi-supervised support vector machine (S3VM) constructs a support vector machine classifier by combining the existing labeled samples with a certain proportion of unlabeled samples [22]. Although it produces good classification results, good generalization performance necessitates careful parameter adjustment. Makhzsani et al. provided a semi-supervised classification method that reconstructs HSIs using autoencoders while restricting the reconstruction error [23]. Common algorithms include sparse autoencoders and variational autoencoders. This strategy obtains good classification results, but it is too time-consuming.
Notably, semi-supervised graph convolutional networks (SSGCNs) have proven to be among the most effective SSL techniques. By effectively processing the local spatial characteristics and global semantic characteristics in HSIs, using features from unlabeled nodes, and comprehensively learning the interaction and feature transfer between nodes, they have notably increased classification accuracy. However, although a traditional GCN can aggregate and transform features from every graph node's neighbors, it only utilizes spectral features and overlooks the significant spatial structures embedded in the original HSI data [24]. Moreover, when dealing with a large number of pixels, the cost of constructing and computing on the graph structure becomes prohibitive. Qin et al. proposed a spectral-spatial GCN (S²GCN) that achieved superior classification accuracy compared with the original GCN [25]. Nevertheless, this approach only employs a fixed neighborhood size and graph throughout the graph convolution process, making it unable to flexibly capture spectral-spatial information from various local areas and accurately portray the intrinsic relationship between pixels. Consequently, Wan et al. [26] suggested a multi-scale dynamic GCN (MDGCN) that included superpixels in multi-hop graph learning, saving training time while reducing computing complexity. However, integrating multi-scale spatial information through a spatial multi-hop graph structure may lead to classifier deviation, which would impact the classification performance. Li et al. [27] proposed a novel framework called SGML, which combines graph-embedding technology and metric learning methods to better capture the similarities and differences between samples and thereby improve classification efficiency. However, this network adopts multi-scale superpixel segmentation to process hyperspectral images, which is likely to ignore the pixel features of local details. Dong et al. [28] proposed a weighted feature fusion method combining a convolutional neural network (CNN) and a graph attention network (GAT), which offered a new solution for dual-branch fusion networks, but the classification performance still needs to be improved.
Hence, in this paper, we propose a dual-branch fusion of a GCN and CNN, namely DFGCN, to achieve superior hyperspectral image classification outcomes. First, a multi-scale superpixel segmentation method is employed in the GCN branch to make better use of feature information related to various shapes and sizes. Additionally, this approach significantly reduces computational cost by converting the calculation unit from individual pixels to superpixels. Next, fusion adjacency matrices are created based on the superpixels at each scale to better measure the similarity between the graph nodes, resulting in more efficient graph convolution and stronger node representation. Then, a spectral feature enhancement module between the two graph convolutions enhances the most important channels of information during data transmission. In the CNN branch, we designed a convolutional network with an attention mechanism to concentrate on extracting detailed features of local areas. Through the fusion of the multi-scale superpixel features from the GCN branch and the local pixel features from the CNN branch, our proposed approach captures rich spatial-spectral information, thereby enhancing classification performance. The following are the novel aspects of this study:
(1) A fusion adjacency matrix is constructed after the HSI is segmented using multi-scale superpixel segmentation. The Pearson correlation coefficient is incorporated as a supplement to the similarity function based on Euclidean distance, and the weight ratio between the two is an important parameter. The new adjacency matrices help to discover novel graph structures, facilitating the learning of more powerful node representations and enhancing the effectiveness of the graph convolutions. The proposed technique enables the extraction of spatial features that are more comprehensive and discriminative than those of existing methods.
(2) The spectral feature enhancement module is placed between the two graph convolutions to enhance important channel information in a self-supervised way and thus extract more important spectral information.
(3) We fuse the GCN branch based on multi-scale superpixel segmentation with the CNN branch, which includes an attention mechanism, to fully extract the long-distance contextual information and local detail features of the HSI. Furthermore, our extensive experiments demonstrate that the proposed DFGCN outperforms several widely used and advanced classification techniques in terms of classification results.
The remainder of this article is arranged as follows: Our methods are introduced in Section 2 and include the entire architecture, a synopsis of the superpixel segmentation, and a detailed implementation of the proposed DFGCN. Our experimental data sets and evaluation indicators are described in Section 3. Our extensive experiment results are presented in Section 4. Further analysis and discussion are included in Section 5. Section 6 presents our conclusions.

Architecture of the Proposed DFGCN
This section describes the proposed DFGCN, which is shown in Figure 1. It is primarily divided into two branches: the GCN branch, based on multi-scale superpixel segmentation, and the CNN branch with an attention mechanism. In the GCN branch, we perform adaptive multi-scale superpixel segmentation on the first principal component after reducing the dimensionality of the HSI using the PCA method. For each scale, we map the graph nodes from the pixel scale to the superpixel scale. Then, we carry out fusion adjacency matrix construction (FAMC). We place a spectral feature enhancement module between the two graph convolutions and use them together for spectral feature extraction. In the CNN branch, we design a convolutional network with an attention mechanism to focus on extracting detailed features of local areas. Finally, we fuse the complementary features of the two branches and send them to the classifier. In the following sections, we provide a detailed description of the main DFGCN implementation procedures, including the multi-scale superpixel segmentation method, construction of the fusion adjacency matrix, design of the spectral feature enhancement module, and structure of the CNN branch.
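The first step of the GCN branch, projecting the HSI cube onto its first principal component before segmentation, can be sketched as follows. This is a minimal NumPy illustration and not the authors' code; the function name `first_principal_component` and the `(H, W, B)` array layout are assumptions made for the example.

```python
import numpy as np

def first_principal_component(cube):
    """Project an HSI cube of shape (H, W, B) onto its first principal axis.

    Returns an (H, W) image that a superpixel algorithm could then segment.
    The name and array layout are illustrative assumptions.
    """
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)   # one row per pixel spectrum
    Xc = X - X.mean(axis=0)                 # center each band
    # Rows of Vt are the principal axes, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (Xc @ Vt[0]).reshape(H, W)
```

The sign of the projection is arbitrary, which does not matter for segmentation because only relative differences between pixels are used.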

Superpixel Segmentation
Superpixel segmentation is a technique that enhances the ability to extract semantic information from images by aggregating pixels that have similar color and texture features into a more significant and recognizable portion [29]. This new portion serves as the fundamental component of subsequent image processing, which can greatly reduce the computational burden, as seen in Figure 2. Furthermore, superpixel segmentation has already been employed as a preprocessing technique in many HSIC methods and has proven to be effective [30]. For example, Li et al. proposed a symmetric graph metric learning framework based on a multi-scale adaptive superpixel segmentation technique to increase classification efficiency using the graph's structural characteristics and metric learning technology [27]. Jia et al. presented methods for clustering pixels with similar spectral characteristics by carrying out weighted label propagation on superpixels [31]. They reduced the computing time and obtained a notable classification performance. Specifically, entropy rate segmentation (ERS) is usually chosen to produce superpixels because of its efficacy compared with other methods [32]. In summary, since ERS is a graph-based technique, it can be formulated as the solution to the following objective function:

$$\underset{P}{\arg\max}\; E(P) + \gamma A(P) \qquad (1)$$

Here, $E(P)$ is the entropy rate term, which is used to create homogeneous clusters. $A(P)$ is a balancing term that lowers the number of imbalanced superpixels by requiring clusters to have comparable spatial sizes. $\gamma$ is the weight coefficient of the balancing term, which must be greater than or equal to 0.

Fusion Adjacency Matrix Construction
Next, we introduce how to build the fusion adjacency matrix from an HSI after multi-scale superpixel segmentation. First, we briefly describe the conversion of pixels into superpixels. Because superpixels automatically adapt their size and shape to the HSI content, they are an excellent way to describe land cover. Consequently, we leverage superpixels to make the subsequent graph learning easier. The value of each superpixel is determined by averaging the pixels that make up that superpixel. Let $X^{sp}_{l}$ and $X^{sp}_{ul}$ be the labeled and unlabeled superpixels, respectively. The length of every superpixel is denoted by $L$. The numbers of labeled and unlabeled superpixels are denoted by $S_1$ and $S_2$, respectively, with $S_1 + S_2 = S$. The labels of $X^{sp}_{l}$ are selected by majority voting over the contained pixels. After that, the superpixels are used to create the graph $G = (X, A)$, with $X \in \mathbb{R}^{S \times L}$ holding the superpixel features of the graph nodes.
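The pixel-to-superpixel conversion described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' implementation: a plain (unweighted) mean is used, `segments` is assumed to hold contiguous superpixel ids 0..S-1 that all occur at least once, unlabeled pixels are marked with -1, and both function names are hypothetical.

```python
import numpy as np

def superpixel_features(pixels, segments):
    """Average the spectra of the pixels inside each superpixel.

    pixels:   (N, L) array, one L-band spectrum per pixel
    segments: (N,) integer array of superpixel ids in 0..S-1 (assumed contiguous)
    returns:  (S, L) node feature matrix X
    """
    n_sp = int(segments.max()) + 1
    sums = np.zeros((n_sp, pixels.shape[1]))
    np.add.at(sums, segments, pixels)               # accumulate spectra per superpixel
    counts = np.bincount(segments, minlength=n_sp)  # pixels per superpixel
    return sums / counts[:, None]

def superpixel_labels(pixel_labels, segments, unlabeled=-1):
    """Label each superpixel by majority vote over its labeled pixels."""
    n_sp = int(segments.max()) + 1
    out = np.full(n_sp, unlabeled)
    for s in range(n_sp):
        lab = pixel_labels[segments == s]
        lab = lab[lab != unlabeled]                 # ignore unlabeled pixels
        if lab.size:
            out[s] = np.bincount(lab).argmax()
    return out
```

Superpixels that contain no labeled pixel keep the `unlabeled` marker and play the role of the unlabeled nodes in the graph.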
The majority of existing GCN-based HSIC methods employ the Euclidean distance alone to build the similarity function of the graph adjacency matrix [27,33]. As an intuitive distance measure, the Euclidean distance expresses the degree of difference between hyperspectral pixels as the geometric distance between them (Formula (2)). However, it does not properly account for the linear correlation between data, so the resulting graph adjacency matrix may not accurately capture the complex data features in the HSI, which impacts the accuracy and stability of classification and weakens the robustness of the algorithm.
Therefore, we introduce the Pearson correlation coefficient as a supplement to the similarity function based on the Euclidean distance. The Pearson correlation coefficient measures the similarity between variables based on their covariance (Formula (3)); it takes the linear relationship between variables into account to determine whether they change in similar or opposite trends.
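For reference, the textbook forms of the two measures for two superpixel spectra $x_i, x_j \in \mathbb{R}^{L}$ are given below. The symbols $d$, $\rho$, and $\bar{x}_i$ (the mean of the entries of $x_i$) are notation assumed here and may differ from the notation used in Formulas (2) and (3).

```latex
d(x_i, x_j) = \sqrt{\sum_{k=1}^{L} \left( x_{ik} - x_{jk} \right)^{2}}

\rho(x_i, x_j) =
  \frac{\sum_{k=1}^{L} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}
       {\sqrt{\sum_{k=1}^{L} (x_{ik} - \bar{x}_i)^{2}}\;
        \sqrt{\sum_{k=1}^{L} (x_{jk} - \bar{x}_j)^{2}}}
```

The distance is sensitive to the overall magnitude of the spectra, while the correlation only depends on their shape, which is why the two can complement each other.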
A similarity function that more effectively captures the spatial relationship, feature similarity, and correlation between different pixels in an HSI is constructed by combining the Euclidean distance and the Pearson correlation coefficient. This yields more comprehensive feature information and improves the efficiency and accuracy of HSIC. The construction of the graph adjacency matrix is shown in Formula (4), where adj denotes that two graph nodes are adjacent and α represents the weight ratio between the two similarity measures.
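One possible way to combine the two measures is sketched below. The exact form of Formula (4) is not available here, so the details are assumptions: the Euclidean distance is turned into a similarity with a Gaussian kernel of width parameter `gamma`, the two similarities are mixed as a convex combination with weight `alpha`, and entries for non-adjacent node pairs are set to zero. The function name `fusion_adjacency` is hypothetical.

```python
import numpy as np

def fusion_adjacency(X, adj, alpha=0.5, gamma=1.0):
    """Blend a Euclidean-based and a Pearson-based similarity on adjacent nodes.

    X:     (S, L) superpixel features
    adj:   (S, S) boolean mask, True where two superpixels are adjacent
    alpha: weight of the Euclidean term (assumed convex combination)
    gamma: width of the assumed Gaussian kernel
    """
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows, clipped at zero.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    sim_euclid = np.exp(-gamma * d2)
    # Row-wise Pearson correlation in [-1, 1]; rows must not be constant.
    sim_pearson = np.corrcoef(X)
    A = alpha * sim_euclid + (1.0 - alpha) * sim_pearson
    return np.where(adj, A, 0.0)
```

With `alpha = 1` the sketch reduces to a purely distance-based graph, and with `alpha = 0` to a purely correlation-based one, which makes the effect of the weight ratio easy to inspect.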

Graph Convolutional Network
When it comes to spectral-based convolutional graph neural networks, the GCN is one of the most often used techniques. Its main use in topological graphs is to extract the spatial characteristics of the relevant vertices and edges [34,35]. Notably, Kipf and Welling [24] developed an efficient layer-by-layer propagation method that can encode node properties and the local graph structure, leading to a more stable state, using Chebyshev polynomials for estimating the convolution kernel. In short, from the Fourier perspective of the graph Laplacian [36], the convolution operation is defined as follows:

$$g_\theta \star x = U g_\theta(\Lambda) U^{\top} x$$

Here, $g_\theta$ is the convolution filter parameterized by $\theta$, and $x$ is the graph signal. $U$ is the eigenvector matrix of the normalized graph Laplacian, which may be written as

$$L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top}$$

where $I$ is the identity matrix of the proper size, $D$ is the graph's degree matrix, $A$ is the adjacency matrix, and the diagonal matrix $\Lambda$ holds the eigenvalues of $L$. Next, the authors in [34] used the truncated shifted Chebyshev polynomial $T_k(x)$ to approximate $g_\theta$. This can be stated as follows:

$$g_\theta \star x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L})\, x$$

where $\theta_k$ is the $k$th Chebyshev coefficient, $\tilde{L} = (2/\lambda_{\max}) L - I$ is the shifted Laplacian, and $\lambda_{\max}$ is the greatest eigenvalue of $L$.
Notably, this operation is $K$-localized, since it uses a $K$th-order polynomial of the Laplacian. The GCN [24] further approximates the layer-by-layer convolution by restricting it to $K = 1$ and taking $\lambda_{\max} \approx 2$. The computation formula is as follows:

$$g_\theta \star x \approx \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x$$

where $\theta'_0$ and $\theta'_1$ are two free parameters shared across the entire graph. Under the restriction $\theta = \theta'_0 = -\theta'_1$, the expression above can be simplified to

$$g_\theta \star x \approx \theta \left( I + D^{-1/2} A D^{-1/2} \right) x$$

To prevent vanishing or exploding gradients and numerical instability, the renormalization $I + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ is applied, with $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. Lastly, the graph convolution can be expressed as follows for the signal $X \in \mathbb{R}^{N \times C}$ ($N$ nodes):

$$Y = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta$$

where $\Theta \in \mathbb{R}^{C \times F}$ is the matrix of trainable convolution parameters and $F$ is the number of kernels. The graph convolution's output is represented by $Y \in \mathbb{R}^{N \times F}$. Moreover, considering the highly nonlinear geometric nature of an HSI in the feature space, which is susceptible to changes in lighting, environment, atmosphere, time conditions, etc., we may potentially improve robustness by working on the graph [37]. Several studies have used a GCN to classify an HSI, thereby achieving encouraging results. In this work, we further explore how to fully utilize the advantages of graph convolution by supplementing the spatial information across scales and taking the similarities and correlations across nodes into account. Specifically, we propagate the features of labeled samples to the unlabeled samples using graph convolution and design the spectral feature enhancement module to study the local correlation of spectral features within nodes. Through multi-scale interaction and deep feature mining operations, we obtain more representative and discriminative features and achieve highly accurate classification results.
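The final propagation rule of Kipf and Welling [24], a renormalized adjacency matrix multiplied by the node features and a trainable weight matrix, can be sketched as follows. This is a minimal dense NumPy illustration; the ReLU activation and the function names are assumptions, and the sketch assumes non-negative edge weights so that every node degree is positive.

```python
import numpy as np

def normalize_adjacency(A):
    """Renormalization trick: D~^{-1/2} (A + I) D~^{-1/2}.

    Assumes non-negative edge weights, so all degrees are positive.
    """
    A_tilde = A + np.eye(A.shape[0])        # add self-loops
    d = A_tilde.sum(axis=1)                 # degrees of A~
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, X, Theta):
    """One graph convolution Y = ReLU(A_hat X Theta).

    A_hat: (N, N) normalized adjacency, X: (N, C) features, Theta: (C, F) weights.
    The ReLU is an assumed choice of activation.
    """
    return np.maximum(A_hat @ X @ Theta, 0.0)
```

Because the graph nodes are superpixels rather than pixels, N stays small enough for the dense N × N product to be practical.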

Spectral Feature Enhancement Module
According to the GCN theory, the primary purpose of graph convolution is to propagate information across nodes without taking into account how important the internal relationships of nodes are. On the other hand, local and non-local spectral features are highly significant for classification tasks when processing an HSI, as they are tightly associated with nodes in the graph. Thus, we sandwich the spectral feature enhancement module (SFEM) between two graph convolutions. The purpose of this module is to enhance the expressive ability of spectral features so that the network can more effectively discern distinctions between various categories. Figure 3 depicts the design of this module. To better capture details and heterogeneous information, we initially perform two lightweight one-dimensional convolution (1-D-Conv) operations on the spectral features, first increasing the dimension and then reducing it. The output is then scaled to the range of 0 to 1 using the sigmoid function, giving importance factors for the channels of the graph nodes. Features of significant channels are then highlighted by performing an element-wise multiplication of these factors with the input graph node features. We also add the original features to this result to prevent unnecessary information loss. With this self-supervised approach, less significant spectral features are relatively suppressed, while the expression of critical channel features is boosted. In the SFEM formula, $h_1$ represents the input graph node feature, $w_1$ and $w_2$ are the weights of the two 1-D-Conv operations, and $\odot$ represents the element-wise product.
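The data flow of the SFEM described above (expand, reduce, sigmoid gate, element-wise product, residual addition) can be sketched as follows. Several details are assumptions made for illustration and are not taken from the paper: the convolutions run along the spectral axis with zero padding and an odd kernel size, a ReLU is placed between them, and the helper names `conv1d_same` and `sfem` are hypothetical.

```python
import numpy as np

def conv1d_same(x, w):
    """1-D cross-correlation along the last axis with zero 'same' padding.

    x: (N, C_in, L); w: (C_out, C_in, k) with odd k; returns (N, C_out, L).
    """
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad)))
    # windows[n, c, l, :] is the length-k window centered at position l
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=-1)
    return np.einsum("nclk,ock->nol", windows, w)

def sfem(h1, w1, w2):
    """Gate the node features h1 (N, C) with factors learned from h1 itself.

    w1: (r, 1, k) expands 1 -> r channels; w2: (1, r, k) reduces r -> 1.
    """
    x = h1[:, None, :]                        # (N, 1, C): spectrum as a 1-D signal
    z = np.maximum(conv1d_same(x, w1), 0.0)   # expand (assumed ReLU in between)
    z = conv1d_same(z, w2)[:, 0, :]           # reduce back to (N, C)
    gate = 1.0 / (1.0 + np.exp(-z))           # sigmoid factors in (0, 1)
    return h1 * gate + h1                     # highlight channels, keep original
```

Since the gate lies strictly between 0 and 1, each output entry lies between the input value and twice the input value, so no channel is ever removed outright.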

Structure of CNN Branch
Superpixel segmentation aggregates pixels in an HSI and represents them with the same features. However, the "same object, different spectra" and "different objects, same spectrum" phenomena in HSI data may lead to erroneous superpixel segmentation, thus affecting the subsequent classification accuracy. In addition, after each superpixel is treated as a graph node, information can only be propagated between superpixels, and the local spatial-spectral information within each superpixel is ignored. Considering these factors, we designed a CNN branch with an attention mechanism to obtain local detail features and thus address the edge smoothing and detail loss during classification that superpixel segmentation may cause. This branch consists of two Squeeze-and-Excitation (SE) attention mechanisms and depthwise separable convolutions. The SE attention mechanism improves classification results at a small computational cost. Depthwise separable convolution is a special convolution operation in a CNN that aims to reduce the number of parameters and calculations of the model while improving its efficiency and performance.
The SE module includes three key steps: First, compress the spatial dimensions of the input features from a three-dimensional tensor of H × W × C to a tensor of 1 × 1 × C. Second, generate an excitation weight for each channel through a fully connected layer; this weight characterizes the channel's importance. Finally, multiply these weights by the original feature tensor to adjust the importance of each channel in the feature map, highlighting important features and suppressing the influence of unimportant ones, thereby achieving adaptive attention weighting. A structural diagram of this process is shown in Figure 4.
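The three SE steps (squeeze, excitation, re-weighting) can be sketched for a single feature map as follows. The sketch follows the commonly used SE formulation with two fully connected layers and a channel reduction, which is an assumption here since the text only mentions a fully connected layer; the function name `se_block` and the weight shapes are illustrative.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation on one feature map.

    x:  (H, W, C) feature map
    w1: (C, C_r), b1: (C_r,)  first FC layer (channel reduction, assumed)
    w2: (C_r, C), b2: (C,)    second FC layer (back to C channels)
    """
    s = x.mean(axis=(0, 1))                      # squeeze: global average, (C,)
    z = np.maximum(s @ w1 + b1, 0.0)             # FC + ReLU
    e = 1.0 / (1.0 + np.exp(-(z @ w2 + b2)))     # FC + sigmoid: channel weights
    return x * e                                 # re-weight, broadcast over H, W
```

The extra cost is two small matrix products per feature map, which is consistent with the remark that the mechanism needs only a small amount of calculation.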
Depthwise separable convolution consists of two steps: depthwise convolution and pointwise convolution. First, a convolution kernel of size K × K × 1 is applied to each channel of an input of size H × W × B, which produces B feature maps of size H × W, each corresponding to one channel of the input. Then, M convolution kernels of size 1 × 1 × B perform a pointwise convolution on the feature maps obtained via depthwise convolution, which is equivalent to a linear combination across channels and produces an output of size H × W × M. Here, M is the number of output channels of the pointwise convolution. Compared with a standard convolution, depthwise separable convolution reduces the computational cost to a fraction 1/M + 1/K² of the original, which is close to 1/K² when M is large.
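The cost saving can be checked with a short multiply-accumulate count. The sketch below assumes stride 1 and padding that keeps the output at H × W; the function names and the example sizes are chosen purely for illustration. Under these assumptions, the ratio of the two costs is 1/M + 1/K², which approaches 1/K² as the number of output channels M grows.

```python
def standard_conv_macs(H, W, B, M, K):
    """Multiply-accumulates of a standard K x K convolution, B -> M channels."""
    return H * W * B * M * K * K

def depthwise_separable_macs(H, W, B, M, K):
    """Multiply-accumulates of a depthwise K x K step plus a pointwise 1 x 1 step."""
    depthwise = H * W * B * K * K   # one K x K kernel per input channel
    pointwise = H * W * B * M       # M kernels of size 1 x 1 x B
    return depthwise + pointwise

# Illustrative sizes: 32 x 32 input, 64 input channels, 128 output channels, 3 x 3 kernels.
ratio = depthwise_separable_macs(32, 32, 64, 128, 3) / standard_conv_macs(32, 32, 64, 128, 3)
```

For these example sizes, `ratio` equals 1/128 + 1/9, slightly above the 1/K² = 1/9 limit.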

Experiments
We conducted extensive tests using three standard HSI data sets, namely Indian Pines (IndianP), Pavia University (PaviaU), and Kennedy Space Center (KSC), to assess the efficacy of the proposed DFGCN. For comparison, we selected several prevalent and advanced machine learning-, CNN-, and GCN-based methods, including RBF-SVM [6], 2-D-CNN [17], 3-D-CNN [19], GCN [24], S²GCN [25], MDGCN [26], and SGML [27]. All algorithms were run on an Intel(R) Core(TM) i7-8700K CPU (Intel, Santa Clara, CA, USA) with 12 GB of RAM and an NVIDIA RTX 3080 Ti graphics card, using the TensorFlow deep learning framework.

3. Kennedy Space Center: The KSC scene was collected using the AVIRIS sensor over the Kennedy Space Center in Florida and covers 224 bands with wavelengths from 0.4 to 2.5 µm. After the low-SNR and water absorption bands are discarded, 176 bands remain. The spatial size is 614 × 512 pixels, and the spatial resolution is 18 m, covering 13 different types of land cover. The false-color composite image and ground truth are displayed in Figure 7. Specific data on the training, test, and total samples of each class are listed in Table 3.

Evaluation Parameters
We employ widely used assessment metrics, namely the overall accuracy (OA), average accuracy (AA), Kappa coefficient, and F1-Score, to objectively and comprehensively assess the effectiveness of the proposed DFGCN in classification tasks. The OA is the proportion of correctly classified samples among all classified samples, the AA is the average of the per-class classification accuracies, and the Kappa statistic [38] measures the agreement between the classification results and the ground truth beyond what would be expected by chance, computed from the row and column totals of the confusion matrix. Kappa values lie between −1 and 1, and a better classification model yields a Kappa value closer to 1. The F1-Score takes both the precision and recall of a classification model into account; the closer it is to 1, the better the classification performance. These parameters are defined as follows:

$$\mathrm{OA} = \frac{N_c}{N_a}, \qquad \mathrm{AA} = \frac{1}{n}\sum_{i=1}^{n}\frac{N_c^i}{N_a^i},$$

where $N_c$ and $N_a$ represent the number of correctly classified samples and the overall number of samples, $N_c^i$ and $N_a^i$ are the corresponding counts for the $i$-th class, and $n$ is the number of classes.

$$\mathrm{Kappa} = \frac{\mathrm{OA} - P_e}{1 - P_e},$$

where $P_e$ is the hypothetical probability of chance agreement, which can be obtained as

$$P_e = \frac{\sum_{i=1}^{n} N_r^i \, N_p^i}{N_a^2}.$$

In this formula, $N_r^i$ and $N_p^i$ denote the number of actual samples and the number of predicted samples in the $i$-th class, respectively.
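As a rough illustration of how these metrics follow from a confusion matrix, the sketch below computes OA, AA, Kappa, and F1 in plain Python. The function name and the toy matrix are hypothetical, and averaging the per-class F1 values (macro averaging) is an assumption, since the text does not state how the F1-Score is aggregated.

```python
def classification_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n_classes = len(confusion)
    n_a = sum(sum(row) for row in confusion)                 # total samples
    n_c = sum(confusion[i][i] for i in range(n_classes))     # correctly classified
    actual = [sum(row) for row in confusion]                 # per-class actual counts
    predicted = [sum(confusion[i][j] for i in range(n_classes))
                 for j in range(n_classes)]                  # per-class predicted counts

    oa = n_c / n_a
    aa = sum(confusion[i][i] / actual[i] for i in range(n_classes)) / n_classes
    p_e = sum(r * p for r, p in zip(actual, predicted)) / n_a ** 2
    kappa = (oa - p_e) / (1 - p_e)

    # Per-class F1 from precision and recall, then macro-averaged (assumption).
    f1_scores = []
    for i in range(n_classes):
        tp = confusion[i][i]
        precision = tp / predicted[i] if predicted[i] else 0.0
        recall = tp / actual[i] if actual[i] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / n_classes

    return {"OA": oa, "AA": aa, "Kappa": kappa, "F1": macro_f1}


# Toy two-class confusion matrix (made-up numbers for illustration).
metrics = classification_metrics([[8, 2], [1, 9]])
print(metrics)
```

For this toy matrix, 17 of 20 samples are correct, so OA is 0.85; the chance agreement is 0.5, giving a Kappa of 0.7.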

Experimental Settings
In this section, we describe the detailed tests used to tune the proposed DFGCN. First, we select three levels of superpixel segmentation, corresponding to three different spatial scales. When constructing the adjacency matrix, we combine the Euclidean distance and the Pearson correlation coefficient to build the similarity function. We vary the weight ratio between the two from 0.1 to 0.9 in increments of 0.1. As shown in Figure 8, a weight of 0.5 produces the best results on the three benchmark HSIC data sets, so we set it to 0.5 in the subsequent experiments.
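To make the weighted combination concrete, the following sketch shows one possible way to fuse a distance-based term and a Pearson-based term with a weight α and to sweep α from 0.1 to 0.9. The exact similarity function is not given in this section, so the Gaussian kernel on the Euclidean distance, the rescaling of the Pearson coefficient to [0, 1], the choice of which term α multiplies, and all names (`fused_similarity`, `sigma`, `node_a`, `node_b`) are assumptions made for illustration only.

```python
import math


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def fused_similarity(x, y, alpha=0.5, sigma=1.0):
    """Weighted sum of a distance-based and a correlation-based similarity."""
    # Assumed form: Gaussian kernel maps distance to (0, 1].
    distance_term = math.exp(-euclidean_distance(x, y) ** 2 / (2 * sigma ** 2))
    # Assumed form: Pearson r in [-1, 1] rescaled to [0, 1].
    correlation_term = (pearson_correlation(x, y) + 1) / 2
    return alpha * distance_term + (1 - alpha) * correlation_term


# Sweep the weight from 0.1 to 0.9 in steps of 0.1 on two made-up node features.
alphas = [k / 10 for k in range(1, 10)]
node_a = [0.2, 0.4, 0.6, 0.8]
node_b = [0.25, 0.35, 0.65, 0.75]
scores = {alpha: fused_similarity(node_a, node_b, alpha) for alpha in alphas}
print(scores)
```

In an actual experiment, each value of α would be used to build the adjacency matrix and train the model, and the resulting accuracies would be compared.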
In addition, we analyze the impact of the parameter γ in the SFEM on the model's performance. Specifically, γ varies over {4, 8, 16, 32}. Figure 9 displays the overall classification accuracies. For the IndianP, PaviaU, and KSC data sets, the best classification results are obtained when γ equals 8, 8, and 32, respectively.
Finally, in the CNN branch, the convolution kernel size is 3 × 3; the first convolution layer contains 128 convolution kernels, and the second layer contains 64. We use full-batch gradient descent with the Adam optimizer for parameter optimization. The KSC data set has a learning rate of 5 × 10⁻⁵, whereas the IndianP and KSC data sets are both set at 5 × 10⁻⁴. The number of epochs is 500.

Classification Performance
To illustrate the advantages of the proposed method, we compare the DFGCN with currently popular advanced HSIC approaches, with parameter settings that follow the respective papers. Specifically, RBF-SVM [6] is an SVM with an RBF kernel. Among the popular CNN-based methods are 2-D-CNN [17] and 3-D-CNN [19], which are widely cited as benchmarks for spectral-spatial HSIC. In addition, the GCN [24] introduced the DL paradigm into graph learning. S²GCN [25] further utilizes the spectral and spatial features in an HSI. Using multi-scale spatial information, the MDGCN [26] is one of the most recently proposed SOTA approaches. SGML [27] is a newly proposed symmetric graph metric learning framework. Tables 4-6 provide the per-class accuracy, AA, OA, Kappa, and F1-Score values for the above-mentioned models. Bold text indicates the highest accuracy in each row.
In Table 4, we observe that the proposed DFGCN achieves the best results for the AA, OA, and Kappa. As expected, DL-based techniques outperform the conventional RBF-SVM method. Regarding the AA, the semi-supervised GCN-based approaches outperform the supervised CNN-based approaches, which, to some degree, illustrates the potential of graph convolution in HSIC. More significantly, the DFGCN ranks first among these methods in most categories, especially for the tenth category, namely Soybean-no-till, for which the DFGCN obtains an accuracy of up to 99.46%, while other approaches find it challenging to identify. This implies that the multi-scale and fusion graphs have a strong capacity to analyze the rich spectral and spatial characteristics contained in an HSI. Moreover, the SFEM can produce more discriminative features, thereby improving the classification performance.

Table 5 displays the classification performance of several algorithms on the PaviaU data set. Given that the training samples of the data set are relatively dispersed, most approaches find it challenging to achieve accurate classification, and, therefore, the classification accuracy is relatively low. By jointly considering spectral and spatial information, 3-D-CNN and S²GCN achieve improved performance compared with 2-D-CNN and GCN, which confirms the importance of merging spectral and spatial features in HSIs. Most importantly, we find that the DFGCN also achieves an obvious advantage over the MDGCN: the AA, OA, Kappa, and F1-Score increase by 2.72%, 2.32%, 2.98%, and 2.28%, respectively. Compared with SGML, the DFGCN has a better classification effect at image edges, which may be related to the local detail features extracted by the CNN branch. It is therefore reasonable to assume that the proposed framework can extract spectral and spatial characteristics more effectively.
From Table 6, we find that the DFGCN achieves the best results among all compared methods on the four metrics (AA, OA, Kappa, and F1-Score), with 100% correct predictions for up to 11 land-cover categories. It is worth noting that, owing to superpixel segmentation preprocessing and multi-scale feature fusion, the MDGCN and SGML obtain strong classification results with fewer parameters, which may be due to the important role of the multi-scale architecture in HSIC. This is another significant justification for the multi-scale design used in this paper. It is evident from the classification performance on the three benchmark data sets that a network consisting of a fusion adjacency matrix, SFEM, and multi-scale architecture has obvious benefits.
Regarding qualitative analysis, Figures 10-12 plot the classification maps obtained by the different models, along with the ground truth, on the IndianP, PaviaU, and KSC data sets. The proposed DFGCN performs the best: its classification maps most closely resemble the ground truth and contain the fewest classification errors. By comparison, the 2-D-CNN only considers spatial features, and its classification effect is not ideal, with large areas of salt-and-pepper noise on the classification map. The 3-D-CNN takes spatially adjacent samples into account, obtaining relatively compact classification maps with significantly better classification effects. The GCN maps also contain scattered salt-and-pepper noise, which is associated with the use of spectral features only. Notably, in contrast to the MDGCN and SGML, the proposed DFGCN yields more accurate results at region boundaries, where hard-to-distinguish border samples are mainly located. This suggests that the proposed DFGCN may be able to identify various surface coverings with similar spectral-spatial information and extract comprehensive and distinctive features.

In addition, we use the PaviaU data set to calculate the ROC curve and AUC value of each category, as well as the macro-average and micro-average AUC, to evaluate the performance of the model. The ROC curve shows the trade-off between the true positive rate and the false positive rate under different thresholds. The AUC value represents the area under the ROC curve and is an important indicator of classifier performance: the closer the value is to 1, the better the performance.

Figure 13 shows the ROC curves and AUC values for the PaviaU data set. The AUC values of categories 3, 5, 7, 8, and 9 are all 1, and their ROC curves lie almost completely in the upper-left corner, which corresponds to a nearly ideal classification. The AUC values of categories 1, 2, 4, and 6 are 0.97, 0.90, 0.97, and 0.98, respectively, which means that the classification ability of the DFGCN in these categories is slightly weaker than in the other categories. Although these values are slightly lower than 1, they still correspond to satisfactory classification results. At the same time, the macro-average and micro-average AUC reach 0.98 and 0.94, respectively, indicating that the DFGCN has a good generalization ability and classification accuracy in multi-category classification tasks.
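The per-class, macro-average, and micro-average AUC values can be sketched in plain Python as follows. This is a minimal one-vs-rest illustration, not the evaluation code used for Figure 13: the AUC is computed through its rank interpretation (the probability that a positive sample scores higher than a negative one, with ties counted as one half), and the function names and the toy scores are hypothetical.

```python
def binary_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def macro_micro_auc(prob, y_true):
    """prob[k][c] is the score of sample k for class c; y_true[k] is its class index."""
    n_classes = len(prob[0])
    per_class = []
    for c in range(n_classes):
        class_scores = [row[c] for row in prob]
        class_labels = [1 if y == c else 0 for y in y_true]
        per_class.append(binary_auc(class_scores, class_labels))
    macro = sum(per_class) / n_classes  # unweighted mean over classes

    # Micro average: pool every (sample, class) pair into one binary problem.
    pooled_scores = [s for row in prob for s in row]
    pooled_labels = [1 if y == c else 0 for y in y_true for c in range(n_classes)]
    micro = binary_auc(pooled_scores, pooled_labels)
    return per_class, macro, micro


# Made-up class scores for four samples and three classes.
prob = [[0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.2, 0.7],
        [0.6, 0.3, 0.1]]
y_true = [0, 1, 2, 0]
per_class, macro, micro = macro_micro_auc(prob, y_true)
print(per_class, macro, micro)
```

The macro average weights every class equally, whereas the micro average is dominated by the classes with more samples, which is one reason the two values can differ on an imbalanced data set.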

Ablation Network
To comprehensively evaluate the proposed DFGCN, we conducted the following ablation experiments. First, we explored the role of the two branches in the DFGCN, namely the graph convolution branch based on multi-scale superpixel segmentation and the CNN branch. Second, we evaluated the role of the SFEM in the GCN branch. The experimental results are shown in Table 7. We found that, if the DFGCN lacks either of its branches, its overall classification accuracy is affected, and the GCN branch alone performs better than the CNN branch alone. In addition, as shown in Table 8, the classification results decrease slightly when the SFEM is not added, which further verifies that the SFEM can enhance important channel information while transmitting information, thereby improving the classification effect. These experimental results demonstrate the effectiveness and robustness of the DFGCN in hyperspectral image classification tasks.

* In Table 8, "?" indicates whether the SFEM has been added to the model: "×" means that it has not been added, and "√" means that it has. "↑" means that the AA, OA, and Kappa all improved after the SFEM was added to the model.

Furthermore, we use grayscale images on the PaviaU data set to visualize the importance of different channels at different scales, as shown in Figure 14, where (a), (b), and (c) represent three different scales. The vertical axis represents the number of superpixels, and the horizontal axis represents the channel importance of a single superpixel. White represents the strongest importance, and black represents the weakest. In the red box areas in Subfigures (a), (b), and (c), the importance of these superpixels is highly consistent across all channels, which suggests that the superpixels in this range are very likely to belong to the same category. In addition, in the blue box in Subfigure (a), the orange box in Subfigure (b), and the green box in Subfigure (c), the importance of all superpixels on these channels is almost the same. This shows that each of these channels has a consistent importance for the hyperspectral image classification task across superpixels: some are important, and some are unimportant. Figure 7 also supports the claim that the model's precision increased to different degrees on the three benchmark data sets after the SFEM was added, further demonstrating the effectiveness of the SFEM in the HSIC task.

Qualitative Viewpoint Regarding Feature Discrimination
This section visualizes the fused output of our model's final graph convolution and the original spectral features using t-SNE. Specifically, we complete the visualization with the t-SNE algorithm from the manifold module of the sklearn package. The t-SNE method requires a feature matrix, which comprises every non-background pixel of the original image, with its spectral curve serving as the feature vector. Likewise, we store the fused output of the final graph convolutional layer as feature vectors. Figure 15 displays the visualization outcomes for IndianP, PaviaU, and KSC. It can be observed that the samples are intermingled in the original spectral domain, making it difficult to classify them into the appropriate category. This situation is substantially better with our approach, particularly on the IndianP and PaviaU data sets. For instance, in Figure 15, the original features of categories 2, 3, 10, and 11 in (a) are haphazardly mixed and overlap in the spectral domain, but in (d), these categories are clearly distinguished and form compact clusters. Furthermore, the original features of categories 3, 5, and 6 in (b) are scattered throughout the spectral domain, but these categories start to cluster and separate from other categories after our approach is applied. These visualization outcomes illustrate that the proposed DFGCN can enhance the feature discrimination ability and achieve better classification results.
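A minimal sketch of such a visualization step is given below, assuming scikit-learn and NumPy are installed. It uses `sklearn.manifold.TSNE` to project a feature matrix to two dimensions; the synthetic Gaussian "spectra", the helper name `embed_2d`, and the perplexity and seed values are assumptions for illustration and do not reproduce the settings behind Figure 15.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(features, perplexity=30.0, seed=0):
    """Project an (n_samples, n_features) matrix to 2-D with t-SNE."""
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(np.asarray(features, dtype=np.float64))


# Synthetic stand-in for labeled pixels: 3 classes, 30 samples each, 20 bands.
rng = np.random.default_rng(0)
spectra = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 20)) for c in range(3)])
labels = np.repeat(np.arange(3), 30)

# Perplexity must be smaller than the number of samples.
embedding = embed_2d(spectra, perplexity=10.0)
print(embedding.shape)
```

The same helper could be applied once to the raw spectral vectors and once to the fused graph convolution features, with the two embeddings then plotted and colored by `labels` for comparison.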

Computational Cost
The computational cost of the proposed DFGCN in comparison to the other graph-based baseline techniques, namely GCN, S²GCN, and MDGCN, is displayed in Table 9. We observe that the computational costs of the GCN and S²GCN are greater than those of the MDGCN and DFGCN because they lack a superpixel segmentation preprocessing step. As a SOTA method, the MDGCN reduces the data volume through superpixel technology, significantly reducing the training time. Notably, on the three benchmarks, the proposed DFGCN is approximately 12, 5, and 2 times faster than the MDGCN, respectively. Compared with SGML, the DFGCN achieves a better classification performance at the expense of some efficiency, which is acceptable. The DFGCN can achieve a better classification performance with less training time, significantly alleviating the pressure of real-time needs in the HSIC task. This further demonstrates that the proposed framework is a valid and effective HSIC model.

Conclusions
This study presents a dual-branch fusion of a GCN and a convolutional neural network for HSIC, named the DFGCN. In the GCN branch, we first segmented an HSI using a multi-scale superpixel segmentation method and constructed a fusion adjacency matrix for each scale. When building the adjacency matrix, we supplemented the Euclidean distance with the Pearson correlation coefficient as a measure of feature correlation, in order to better measure the similarity between nodes and extract more comprehensive and discriminative spatial features. This allowed us to find new graph structures that support more effective graph convolution and more powerful node representations. Moreover, the spectral feature enhancement module was placed between the two graph convolutions. Using self-supervision, this module can enhance the feature expression of significant channels while comparatively limiting irrelevant spectral characteristics. In the CNN branch, we used the SE attention mechanism and depthwise separable convolution to extract local detail features of the HSI as supplementary features. Based on the dual-branch multi-fusion network, rich spatial-spectral features can be fully extracted and learned at multiple scales. Our experiments on three benchmark data sets showed that the proposed DFGCN outperforms advanced algorithms in HSIC tasks in terms of classification results.
The recently proposed SPSM (Superpixel-Subpixel Multilevel Network) is a three-branch network that reduces information loss by compensating for defects at different levels. In the future, we plan to explore multi-branch extensions based on the DFGCN.

Figure 1. An outline of the proposed DFGCN for HSIC. It consists of two branches: a GCN branch, based on multi-scale superpixel segmentation, and a CNN branch with an attention mechanism.

Figure 2. Segmentation maps acquired from the Indian Pines data set using the first principal component (PC) and adaptive multi-scale superpixel segmentation. Superpixel numbers at varying scales make up the figures: (a) first PC, (b) 262, (c) 525, and (d) 1051.

Figure 3. Implementation of the proposed spectral feature enhancement module.

Figure 4. Structural diagram of SE attention mechanism.

Figure 5. Indian Pines: (a) false-color synthetic image; (b) ground truth.

2. Pavia University: The PaviaU scene was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over an urban area. It consists of 610 × 430 pixels, with a spatial resolution of 1.3 m. After discarding noise bands in the spectral dimension, 103 bands remain, with wavelengths from 0.43 to 0.86 µm, and the scene includes nine label categories. Figure 6a,b depict the false-color composite image and ground truth, respectively. Specific data on the training, test, and total samples of each class are listed in Table 2.

Figure 8. Impact of the parameters of α and the spectral feature enhancement module.

Figure 9. Performance of classification with varying values of γ in spectral feature enhancement.

Figure 13. ROC curves and AUC values of each category in the PU data set.

Figure 14. Multi-scale channel importance visualization using the PaviaU data set. (a-c) are weight visualizations of different scales. The red boxes indicate that the importance of superpixels within these ranges remains highly consistent across all channels at different scales, while the blue, orange, and green boxes represent that the importance of all superpixels on channels within these ranges is almost the same.

Figure 15. Visualization of features from the IndianP, PaviaU, and KSC data sets using 2-D t-SNE. (a-c) are the original feature spaces of labeled samples and (d-f) are the data distributions of labeled samples in the graph convolution feature space. Classes are represented by different colors.

The IndianP image was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over northwestern Indiana. Its spatial size is 145 × 145 pixels, with a resolution of 20 m. The sensor recorded 220 spectral bands spanning 0.4 to 2.5 µm. Two hundred bands remain after the removal of the noise and water absorption bands, and the scene includes sixteen label types. Figure 5a,b depict the false-color composite image and ground truth, respectively. Table 1 contains a detailed list of the training, test, and total sample counts for each class.

Table 1. The training, test, and total sample numbers and types of labeled land covers in the IndianP data set.

Table 2. The training, test, and total sample numbers and types of labeled land covers in the PaviaU data set.

Table 3. The training, test, and total sample numbers and types of labeled land covers in the KSC data set.

Table 4. The outcome of different methods for the IndianP data set classification.

Table 5. The outcome of different methods for the PaviaU data set classification.

Table 6. The outcome of different methods for the KSC data set classification.

Table 9. The computational expense of training and testing of different techniques.
