UNLABELED SELECTED SAMPLES IN FEATURE EXTRACTION FOR CLASSIFICATION OF HYPERSPECTRAL IMAGES WITH LIMITED TRAINING SAMPLES

Feature extraction plays a key role in hyperspectral image classification. By using unlabeled samples, which are often available in almost unlimited numbers, unsupervised and semisupervised feature extraction methods show better performance when only a limited number of training samples exists. This paper illustrates the importance of selecting appropriate unlabeled samples for use in feature extraction methods, and proposes a new method for unlabeled sample selection using spectral and spatial information. The proposed method has four parts: PCA, prior classification, posterior classification and sample selection. As the hyperspectral image passes through these parts, the selected unlabeled samples can be used in an arbitrary feature extraction method. The effectiveness of the proposed unlabeled sample selection in unsupervised and semisupervised feature extraction is demonstrated using two real hyperspectral datasets. Results show that, by selecting appropriate unlabeled samples, the proposed method can improve the performance of feature extraction methods and increase classification accuracy.


INTRODUCTION
Feature extraction is one of the efficient approaches for overcoming the Hughes phenomenon (Hughes, 1968) in hyperspectral image classification. In feature extraction, a transformation is applied by which all pixels in the hyperspectral image are transferred from a space with dimension d to a space with dimension r (r ≤ d) (Jia et al., 2013; Hosseini and Ghassemian, 2015; Imani and Ghassemian, 2015a; Imani and Ghassemian, 2015b). Feature extraction algorithms can be divided into supervised, unsupervised and semisupervised ones. Supervised methods use labeled samples, so they are usually appropriate for classification purposes (Kamandar and Ghassemian, 2013; Imani and Ghassemian, 2014; Imani and Ghassemian, 2015c; Imani and Ghassemian, 2015d). They find a low-dimensional space in which the labeled samples are well separated. However, supervised feature extraction methods will not be successful if only a limited number of training samples exists. In this situation, unsupervised and semisupervised feature extraction methods show better performance because they can use all pixels in the hyperspectral image.
Principal component analysis (PCA) (Fukunaga, 1990) and locality preserving projection (LPP) (He and Niyogi, 2004) are commonly used unsupervised feature extraction methods. They use all pixels in the hyperspectral image without knowing their labels. PCA maintains the global information of the data by maximizing the variance of the projected data. LPP preserves the local structure of the data by forming an adjacency graph. Semisupervised feature extraction algorithms use unlabeled samples alongside labeled samples. Among these are semisupervised marginal Fisher analysis (SSMFA) (Huang et al., 2012), semisupervised local discriminant analysis (SELD) (Liao et al., 2013) and semisupervised feature extraction based on supervised and fuzzy-based linear discriminant analysis (SLDA) (Li et al., 2015). These methods maintain the general structure of the data and increase class separability by using unlabeled and labeled samples.
In principle, all unlabeled samples can be applied in unsupervised or semisupervised methods. However, a hyperspectral image has millions of pixels, and using all of them increases storage and computation costs, so only a subset of the unlabeled pixels is applied in the feature extraction process. Chang et al. (2014) defined unlabeled pixels according to Voronoi cells constructed from the labeled pixels. Shi et al. (2013) determined proper unlabeled pixels using multilevel segmentation results. In most semisupervised and unsupervised feature extraction algorithms, however, the unlabeled samples are selected randomly. Randomly selected samples may include outlier or mixed pixels, or may be drawn from a limited area; in that case they cannot be an appropriate representative of the whole data. Therefore, only some unlabeled samples are suitable for use in the feature extraction process. The aim of feature extraction in hyperspectral image processing is the exploitation of the obtained data in classification, so the best result is achieved when the features extracted using the selected unlabeled samples increase the classification accuracy.
This article illustrates the importance of selecting appropriate unlabeled samples for use in feature extraction methods, and proposes a new method for unlabeled sample selection using spectral and spatial information. The proposed method has four parts: PCA, prior classification, posterior classification and sample selection. As the hyperspectral image passes through these parts, the selected unlabeled samples can be used in an arbitrary feature extraction method. In this article, the unlabeled selected samples are used in unsupervised LPP and semisupervised SSMFA. We demonstrate that the performance of SSMFA and LPP with selected samples is significantly better than that of SSMFA and LPP with randomly chosen samples. The experimental results on the Pavia University (PU) and Indian Pines (IP) datasets show that appropriate unlabeled samples improve the performance of semisupervised and unsupervised feature extraction methods and increase classification accuracy.
The remainder of this paper is organized as follows: the next section details the proposed method for unlabeled sample selection. Section 3 overviews the LPP and SSMFA methods. Experimental results are reported in the fourth section, and the last section concludes the paper.

THE PROPOSED SELECTION METHOD
A hyperspectral image has millions of pixels, and using all of them increases storage and computation costs, so only a subset of the unlabeled pixels is applied in the feature extraction process. In this article we propose a new method for selecting those unlabeled samples. The method consists of four parts: PCA, prior classification, posterior classification and sample selection.

In the first part, because of the limited number of training samples and the high dimensionality, PCA is used to reduce the number of features. Let X_{d×n} = [x_1, x_2, …, x_n] be the original training data, where n is the number of training samples, n_i refers to the number of training samples in the ith class, q is the number of classes and d denotes the number of spectral bands. The training data and the total image pixels are projected using the PCA transformation matrix T_PCA, giving the projected data of dimension r, where r is the dimensionality of the projected data (r < d). These projected data are used in the prior classification part. In the experiments, r is empirically set to r = 5 for n_i = 10 and r = 6 for n_i = 15.
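As a concrete illustration of this first part, the PCA projection can be sketched in NumPy as follows (a minimal sketch on synthetic data; the band and pixel counts and the function name are illustrative, not taken from the paper):

```python
import numpy as np

def pca_project(X, r):
    """Project d-dimensional pixels onto the top-r principal components.
    X: (d, n) array, one column per pixel."""
    Xc = X - X.mean(axis=1, keepdims=True)      # centre each band
    cov = Xc @ Xc.T / (X.shape[1] - 1)          # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    T_pca = eigvecs[:, ::-1][:, :r]             # keep the top-r eigenvectors
    return T_pca.T @ Xc                         # (r, n) projected data

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))   # 200 "bands", 500 "pixels" (synthetic)
Z = pca_project(X, 5)
print(Z.shape)                    # (5, 500)
```

The same transformation matrix would be applied to both the training pixels and the full image, so that the prior classifiers operate in the reduced r-dimensional space.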

Prior classification
Prior classification is carried out using the Gaussian maximum likelihood (GML), support vector machine (SVM) and K nearest neighbor (KNN) classifiers (Duda et al., 2001; Pal and Mather, 2005). The classification results are obtained in the form of classification maps. In each classification map, a label y_i ∈ {1, …, q} is allocated to each pixel x_i.
The number of neighbors, k = 4, for the KNN classifier is chosen by experiments. The radial basis function (RBF) kernel is used for the SVM classifier as implemented in LIBSVM (Chang and Lin, 2008). Five-fold cross validation is used to choose the optimal penalty parameter C and kernel parameter g of the SVM.
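The KNN part of the prior classification can be sketched as follows (a minimal NumPy version, assuming Euclidean distance and majority voting with k = 4 as stated above; the GML and SVM classifiers are omitted, and the data here are synthetic):

```python
import numpy as np

def knn_classify(train_X, train_y, test_X, k=4):
    """Label each test pixel by majority vote among its k nearest
    training pixels (Euclidean distance in the projected feature space)."""
    # (n_test, n_train) pairwise squared distances
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]          # indices of the k neighbours
    votes = train_y[nn]                         # (n_test, k) neighbour labels
    return np.array([np.bincount(v).argmax() for v in votes])

rng = np.random.default_rng(1)
c1 = rng.normal(0.0, 0.3, size=(20, 5))        # class 1 cluster
c2 = rng.normal(3.0, 0.3, size=(20, 5))        # class 2 cluster
train_X = np.vstack([c1, c2])
train_y = np.array([1] * 20 + [2] * 20)
test_X = np.vstack([rng.normal(0.0, 0.3, (5, 5)),
                    rng.normal(3.0, 0.3, (5, 5))])
pred = knn_classify(train_X, train_y, test_X, k=4)
print(pred)   # first five pixels fall in class 1, last five in class 2
```

Running each of the three classifiers over the projected image produces the three classification maps consumed by the posterior classification part.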

Posterior classification
Posterior classification includes two steps to determine the final label of each pixel. First, the primary label of each pixel is specified using the three classification maps: label y_i is allocated to pixel x_i only if the pixel holds that label in all three classification maps; otherwise, a zero label is assigned to the pixel. In the second step, the final label of each pixel is obtained using the labels in its spatial neighborhood: if at least five of the 8 spatial neighbors of the central pixel x_i carry label y_i, this label is kept for x_i; otherwise, its label is changed to zero.
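The two posterior-classification steps described above can be sketched as follows (a minimal NumPy version; border pixels, which have fewer than eight neighbours, simply fail the five-neighbour test here, a detail the paper does not specify):

```python
import numpy as np

def posterior_classification(maps):
    """maps: list of three (H, W) label maps from GML, SVM and KNN.
    Step 1: keep a pixel's label only if all three maps agree (else 0).
    Step 2: keep it only if at least five of its 8 neighbours share it."""
    m0, m1, m2 = maps
    agree = np.where((m0 == m1) & (m1 == m2), m0, 0)
    H, W = agree.shape
    out = np.zeros_like(agree)
    for i in range(H):
        for j in range(W):
            lab = agree[i, j]
            if lab == 0:
                continue
            # 3x3 window clipped at the image border
            nb = agree[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            same = (nb == lab).sum() - 1       # exclude the centre pixel
            if same >= 5:
                out[i, j] = lab
    return out

m = np.ones((5, 5), dtype=int)
m[0, 0] = 2          # one pixel disagrees in a single classifier's map
final = posterior_classification([m, np.ones((5, 5), int), np.ones((5, 5), int)])
print(final)
```

Pixels near the disagreeing corner lose their label as well, which is the intended behaviour: the spatial test discards pixels whose neighbourhood support is weak.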
The objective of the prior and posterior classification parts is to set mixed pixels aside using spectral and spatial information. Mixed pixels consist of more than one type of land cover and exist because of the limited spatial resolution; they are often found at the spatial borders of classes. Mixed pixels cannot be perfect representatives of their classes, so they reduce the efficiency of the algorithms used for hyperspectral image feature extraction.

Samples selection
In the final part, the unlabeled samples are selected from among the pixels nearest to the class means, which keeps outlier samples out of the unlabeled set. The unlabeled selected samples can then be used in an arbitrary feature extraction method; in this article, they are used in unsupervised LPP and semisupervised SSMFA.
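A minimal sketch of this selection step follows, assuming a fixed number of samples is kept per class (the paper does not state how many pixels per class are retained, so the `per_class` parameter is illustrative):

```python
import numpy as np

def select_near_means(X, labels, per_class):
    """From each reliably labelled class, pick the `per_class` pixels
    closest to the class mean; pixels far from the mean (potential
    outliers) are excluded.
    X: (n, d) pixels; labels: (n,) final labels, with 0 = discarded."""
    selected = []
    for c in np.unique(labels):
        if c == 0:
            continue                      # skip pixels rejected earlier
        idx = np.flatnonzero(labels == c)
        mean = X[idx].mean(axis=0)
        dist = np.linalg.norm(X[idx] - mean, axis=1)
        selected.extend(idx[np.argsort(dist)[:per_class]])
    return np.array(sorted(selected))

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
labels = np.array([1] * 50 + [2] * 50)
labels[:5] = 0                            # pretend five pixels were rejected
sel = select_near_means(X, labels, per_class=10)
print(len(sel))                           # 20 selected unlabeled samples
```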

LPP
LPP is an effective unsupervised feature reduction method (He and Niyogi, 2004). Using LPP, the local structure of the data is preserved, so that samples adjacent in the original space remain similar in the transformed space. To this end, the information of adjacent samples is used to form an adjacency graph. Let X_{r×n_u} = [x_1, x_2, …, x_{n_u}] denote the unlabeled selected samples, where n_u is the number of unlabeled samples. Each pair of unlabeled samples (x_i, x_j) is considered as a pair of nodes of the adjacency graph, and an edge is added between x_i and x_j if they are among the k nearest neighbors of each other. The weight of the edge between x_i and x_j is A_ij = exp(−‖x_i − x_j‖² / (t_i t_j)), and A_ij = 0 for unconnected pairs, where t is the local scaling parameter (Zelnik-Manor and Perona, 2005): t_i = ‖x_i − x_i^(k)‖ is the local scaling for x_i and x_i^(k) denotes the kth nearest neighbor of x_i. The parameter k is not fixed and is determined by experiments; in this article k is empirically set to 7. The LPP transformation matrix T is obtained from S_L = X L X^T and S_D = X D X^T, where D is the diagonal matrix with D_ii = Σ_j A_ij and L = D − A is the Laplacian matrix. The solution is given by solving a generalized eigenvalue problem.
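Under these definitions, LPP can be sketched as follows (a small NumPy/SciPy version on synthetic data; the symmetrisation of the neighbour graph and the small regularisation term added to keep the generalized eigenproblem well-posed are implementation choices here, not part of the original formulation):

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, k=7, r=2, reg=1e-6):
    """X: (d, n) unlabeled selected samples; returns a (d, r) transform."""
    d, n = X.shape
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # (n, n) sq. dists
    order = np.argsort(D2, axis=1)                 # order[:, 0] is the sample itself
    t = np.sqrt(D2[np.arange(n), order[:, k]])     # local scaling t_i (kth NN dist)
    A = np.zeros((n, n))
    for i in range(n):
        for j in order[i, 1:k + 1]:                # k nearest neighbours of i
            w = np.exp(-D2[i, j] / (t[i] * t[j]))
            A[i, j] = A[j, i] = w                  # symmetrise the graph
    Dg = np.diag(A.sum(axis=1))
    L = Dg - A                                     # graph Laplacian
    SL = X @ L @ X.T
    SD = X @ Dg @ X.T + reg * np.eye(d)            # regularised for stability
    vals, vecs = eigh(SL, SD)                      # generalized eigenproblem
    return vecs[:, :r]                             # smallest-eigenvalue directions

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 60))
T = lpp(X, k=7, r=2)
print(T.shape)                                     # (10, 2)
```

The eigenvectors associated with the smallest eigenvalues span the space in which neighbouring samples stay close, which is the locality-preserving property described above.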

SSMFA
SSMFA is a semisupervised feature reduction method (Huang et al., 2012). In SSMFA, the geometric structure of the labeled and unlabeled data is preserved while labeled data from different classes are separated. To this end, a between-class graph Gb and a within-class graph Gw are used. Let X_{r×(n_l+n_u)} be the total set of samples, where the first n_l samples are labeled and the remaining n_u samples are unlabeled. In Gb, each pair of labeled samples (x_i, x_j) is considered as a pair of nodes, and an edge is added between x_i and x_j if they have different class labels. In Gw, each pair of samples (x_i, x_j) is considered as a pair of nodes, and an edge is added between x_i and x_j if they have the same class label, or if at least one of them is unlabeled and they are among the k⁺ nearest neighbors of each other. The weight matrices A^b of Gb and A^w of Gw are built from heat-kernel weights exp(−‖x_i − x_j‖²/σ) on the connected pairs, where σ is the local scaling parameter, defined as t in LPP, and a trade-off parameter β > 1 adjusts the share of the unlabeled and labeled data. The SSMFA transformation matrix T is obtained from S_b = X L_b X^T and S_w = X L_w X^T, where L_b = D_b − A^b and L_w = D_w − A^w, with D_b and D_w the diagonal matrices holding the corresponding row sums D_ii = Σ_j A_ij. The solution is given by solving a generalized eigenvalue problem. In the experiments, β is empirically set to 10, and the number of neighbors k⁺ is set to 5 for n_i = 10 and 7 for n_i = 15.
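The construction of the two graph weight matrices can be sketched as follows (a hedged NumPy sketch: restricting both graphs to nearest-neighbour pairs and down-weighting pairs that involve unlabeled samples by 1/β reflect our reading of the description above, not a verified reproduction of SSMFA):

```python
import numpy as np

def ssmfa_graphs(X, y, k=5, beta=10.0):
    """X: (n, d) samples; y: (n,) labels, 0 marking unlabeled samples.
    Returns between-class (Ab) and within-class (Aw) weight matrices."""
    n = len(y)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    order = np.argsort(D2, axis=1)
    sigma = np.sqrt(D2[np.arange(n), order[:, k]])   # local scaling, as in LPP
    Ab = np.zeros((n, n))
    Aw = np.zeros((n, n))
    for i in range(n):
        for j in order[i, 1:k + 1]:                  # k nearest neighbours of i
            w = np.exp(-D2[i, j] / (sigma[i] * sigma[j]))
            if y[i] and y[j] and y[i] != y[j]:
                Ab[i, j] = Ab[j, i] = w              # differently labelled pair
            elif y[i] == y[j] and y[i]:
                Aw[i, j] = Aw[j, i] = w              # same-class labelled pair
            else:
                Aw[i, j] = Aw[j, i] = w / beta       # pair involving unlabeled data
    return Ab, Aw

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 4))
y = np.array([1] * 5 + [2] * 5 + [0] * 10)           # 0 = unlabeled
Ab, Aw = ssmfa_graphs(X, y)
print(Ab.shape, Aw.shape)
```

With Ab and Aw in hand, the transform follows the same generalized-eigenproblem pattern as in LPP, using the two Laplacians built from these matrices.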

EXPERIMENTAL RESULTS
To evaluate the performance of the proposed method, the unlabeled selected samples are used in LPP and SSMFA. The performance of LPP and SSMFA using unlabeled selected samples (LPPUSS and SSMFAUSS), compared to LPP and SSMFA using randomly selected samples (LPPURS and SSMFAURS), is demonstrated using the IP and PU datasets.
IP covers an agricultural and forested area acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines test site in northwestern Indiana. The data include 224 spectral bands; after 24 noisy bands are removed, the experiments are carried out on the remaining 200 bands. Each band consists of 145×145 pixels, i.e., 21025 samples in total. The ground truth designates 16 classes. PU covers the Pavia University area in northern Italy, acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor. It includes 115 spectral bands; after 12 noisy bands are removed, the experiments are carried out on 103 bands. The Pavia University image is 610×340 pixels, i.e., 207400 samples in total. The ground truth differentiates 9 classes.
SVM and GML are used to classify the hyperspectral images. The RBF kernel is used for the SVM classifier as implemented in LIBSVM, and five-fold cross validation is used to choose the optimal parameters: the penalty parameter C is tested over {10^0, 10^1, …, 10^4} and the parameter g over {10^0, 10^-1, …, 10^-8}. The classification results contain the accuracy (Acc) and reliability (Rel) of the classes, the average accuracy, average reliability, overall accuracy and kappa coefficient (Cohen, 1960). Acc = α/β × 100 and Rel = α/γ × 100, where α is the number of correctly classified samples of the related class, β is the total number of samples belonging to that class and γ is the total number of samples assigned to that class. Overall accuracy is the percentage of correctly classified samples. Figure 3 reports the average accuracy as the number of unlabeled samples is increased. For the IP dataset, the maximum average accuracies in the ill-posed classification problem with the GML classifier using LPPURS and SSMFAURS features are 53.45% and 38.46%, respectively; these values are obtained using 2000 unlabeled samples. In the same classification problem with the same classifier, the average accuracies reach 53.77% using LPPUSS with only 1000 unlabeled samples and 40.59% using SSMFAUSS with 1400 unlabeled samples. Similar results are acquired for both datasets with the GML and SVM classifiers. It can be concluded that the unlabeled selected samples are an appropriate representative of the whole data, because the proposed selection method sets mixed and outlier pixels aside using spectral and spatial information. According to the experiments, LPPUSS and SSMFAUSS outperform LPPURS and SSMFAURS while decreasing storage and computation costs by using a smaller number of unlabeled pixels. The results show that the proposed unlabeled sample selection improves the performance of feature extraction methods and increases classification accuracy with limited training samples.
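The per-class accuracy and reliability measures defined above can be computed from a confusion matrix as follows (a small NumPy sketch; the example matrix is illustrative):

```python
import numpy as np

def per_class_acc_rel(conf):
    """conf: (q, q) confusion matrix, rows = true class, cols = predicted.
    Acc_i = correctly classified / samples truly in class i (recall),
    Rel_i = correctly classified / samples assigned to class i (precision)."""
    correct = np.diag(conf).astype(float)
    acc = 100.0 * correct / conf.sum(axis=1)       # per-class accuracy
    rel = 100.0 * correct / conf.sum(axis=0)       # per-class reliability
    overall = 100.0 * correct.sum() / conf.sum()   # overall accuracy
    return acc, rel, overall

conf = np.array([[8, 2],
                 [1, 9]])
acc, rel, overall = per_class_acc_rel(conf)
print(acc, rel, overall)   # [80. 90.] [~88.89 ~81.82] 85.0
```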

CONCLUSION
Selecting appropriate unlabeled samples plays an important role in the performance of semisupervised and unsupervised feature extraction methods. In this article, a method using the pixels' spatial and spectral information is provided for unlabeled sample selection. The proposed method avoids the problems of random selection, such as outlier or mixed samples. The unlabeled selected samples can be used in any arbitrary feature extraction method; in this article, we use them in LPP and SSMFA. The results show that LPP and SSMFA using unlabeled selected samples provide remarkable results in ill-posed and poorly-posed classification situations.
To examine the performance of the proposed method in classification problems, two experiments are carried out. In the first experiment, we evaluate the performance of the proposed method in ill-posed and poorly-posed classification problems for the two datasets: 1400 unlabeled samples, together with 10 and 15 training samples per class, are used in the ill-posed and poorly-posed classification problems, respectively. The training samples and the random unlabeled samples are chosen 10 times and the average results are reported. Average accuracy (%) versus the number of extracted features is shown in Figure 1 for the IP dataset. The maximum average accuracies for (a) the GML classifier with 10 training samples, (b) the GML classifier with 15 training samples, (c) the SVM classifier with 10 training samples and (d) the SVM classifier with 15 training samples are obtained with 5, 6, 15 and 15 extracted features, respectively. Figure 2 displays the average accuracy (%) versus the number of extracted features for the PU dataset. There, the maximum average accuracies for the same four configurations are obtained with 5, 5, 15 and 15 extracted features, respectively.

Figure 2. Average accuracy versus the number of extracted features for the PU dataset: (a) GML classifier, 10 training samples; (b) GML classifier, 15 training samples; (c) SVM classifier, 10 training samples; (d) SVM classifier, 15 training samples.

Figure 3. Average accuracy versus the number of unlabeled samples with 10 training samples and 5 and 15 extracted features for the GML and SVM classifiers, respectively: (a) IP, LPP; (b) IP, SSMFA; (c) PU, LPP; (d) PU, SSMFA.

Table 2. Classification results for the IP dataset obtained by the SVM classifier with 10 training samples, 1400 unlabeled samples and 15 extracted features.

Classification results in the ill-posed classification problem using the SVM classifier and 15 extracted features are reported in Table 1 for the PU dataset and in Table 2 for the IP dataset. In the second experiment, the number of training samples and the number of extracted features are fixed while the number of unlabeled samples varies from 200 to 2000 with a step size of 200.