Pseudo-Invariant Feature Selection for Crosssensor Optical Satellite Images

Remote sensing is the technology used to obtain the Earth surface information from aircraft or satellites without physical contacts. Many satellites have been used to observe the Earth continuously in the last four decades. For instance, the Landsat 1 was launched in 1972, and the latest generation, that is, Landsat 8, which was launched in 2013, is still operating to date. Landsat imagery is important to researchers who require satellite data for more than 20 years. Therefore, the archive of Landsat data is one of the most valuable data for the studies of land cover change detection and global climate change. In particular, since the launching of the first Thematic 3 Mapper sensor in 1982, the Landsat has provided high-spatial and high-spectral resolution images.


Introduction
Remote sensing is the technology used to obtain the Earth surface information from aircraft or satellites without physical contacts. Many satellites have been used to observe the Earth continuously in the last four decades. For instance, the Landsat 1 was launched in 1972, and the latest generation, that is, Landsat 8, which was launched in 2013, is still operating to date. Landsat imagery is important to researchers who require satellite data for more than 20 years. Therefore, the archive of Landsat data is one of the most valuable data for the studies of land cover change detection and global climate change. In particular, since the launching of the first Thematic 3 Mapper sensor in 1982, the Landsat has provided high-spatial and high-spectral resolution images.
A key to a study on long-term optical satellite images is the radiometry consistency of acquired images and radiometric stability of satellite sensors [1]. To eliminate the radiometric difference between images without ground measurements during data acquisition, a pseudo-invariant feature selection for cross-sensor relative radiometric normalization (RRN) is proposed. Radiometric difference emerges because of different sensors and different acquisition conditions [2]. The use of images from different sensors will trigger inconsistencies because of the different wavelengths and band widths of spectral bands in satellite sensors. For instance, as described by Mishra et al. and Roy et al. [1,2] the spectral response of Landsat 8 Operational and Imager (OLI) is narrower than that of Landsat 7 Enhanced Thematic Mapper Plus (ETM+). The spectral band edges in Landsat 8 OLI have been refined to avoid atmospheric absorption effects. Consequently, when two sensors observe the same region at the same time, they generally report different at-sensor radiances and spectral signatures. Another radiometric inconsistency occurs due to different acquisition conditions, including different atmospheric absorptions, illuminations, view angles, Radiometric correction is a solution to these inconsistency problems. This technique can be classified into two categories, namely, absolute and relative [4,5]. Afterward, the relative method is named as RRN. The absolute method requires radiometric calibration, an atmospheric model, and a target reflectance model [3]. However, these data shall be obtained at the time of image acquisition, there by making the implementation relatively difficult to carry out. By constrast, RRN requires no in-situ atmospheric and ground data, which makes the method feasible for historical satellite images from the same and even different sensors.
RRN takes bitemporal images as inputs, which consist of reference and target images. The reference image is selected manually or automatically, and the target images are transformed to the reference image. The cross-sensor method is performed for paired images from different sensors under the assumption that the bitemporal images are spatially nonlinearly correlated. On the contrary, the image series from the same sensor is assumed spatially linear, and the transformation is consequently approximated by using a linear function [6,7]. Both cross-sensor and same-sensor methods involve the normalization of the digital numbers of target images and their transformation to that of the reference image. The advantage is that the normalized images are adjusted to the reference image, which means the sensor calibration and atmospheric and illumination conditions are adjusted to a common level [8]. Recently, several researchers have devoted efforts to pseudoinvariant feature (PIF) selection of bitemporal images, such as the methods based on principal component analysis, to analyze land cover change [6,9]. Nevertheless, the method is limited with the independent use of each single band; thus, the relation between bands are not maintained during PIF selection. A decade before the PIF method was developed [10] a rediometric rectification technique that transforms the images of the same areas through the use of landscape element whose reflectance is nearly constant during a time. The same with the procedure proposed by Garcia [11], the method presents disadvantages due to the selection by using visual inspection. Hence, a subjective radiometric normalization occurs. The most suitable mathematical model that can be used to describe the objective information between two images is regression. The algorithm assumes that the pixels at the bitemporal images are linearly related to the pixels at the same location. This assumption indicates that the spectral reflectance of the same location is unchanged or called PIF during the time interval. The key to the image regression is the objective selection of PIF. However, the related methods [7,12] are still unstable on PIF selection when images contain significant land cover changes.
In this article, a PIF selection procedure is presented for the relative radiometric normalization of bitemporal images from different sensors. The method aims to determine suitable PIF based on the multivariate alteration detection (MAD) [7]. The PIF selection proposed by Canty and Nielsen [7] adopts MAD based on canonical correlation analysis (CCA), which can deal with multivariables. Therefore, all spectral bands are processed simultaneously. However, the CCA-based MAD is sensitive to land cover changes, such as cloud covers. In the present study, a MAD based on kernel CCA (KCCA), also called KMAD, is proposed to alleviate the spatially nonlinear problems caused by cover changes and different sensors. KMAD assumes that images are nonlinear in original space and extracts PIFs by projecting images into a high-dimensional feature space, in which linearity exists in data. In addition, a regularization term in KCCA optimization is adopted to avoid trivial solutions and overfitting which may happen in the related studies [13,14].
The main contribution of this study is the introduction of KMAD to deal with the nonlinear relation between images from different sensors, i.e., Landsat 7 ETM+ and Landsat 8 OLI, and the regularization term in KCCA optimization to avoid trivial solutions and overfitting. The remainder of this paper is organized as follows. Figure 1 illustrates the proposed method which consists of two main steps: PIF selection and radiometric normalization. The input to the method is a bitemporal image, in which one image is acquired by Landsat 7 ETM+ and another image is acquired by Landsat 8 OLI. In the first step, PIFs are selected using the proposed KMAD in a highdimensional space with spatial linearity instead of the original space with spatial nonlinearity. Generally, the sensor's response changes over time due to the following reasons: differences in the illumination and observation angles, variation in atmospheric effects, and changes in the object reflectance. All these radiometric variations are randomly distributed over the images during data acquisition. Therefore, PIFs are extracted, and radiometric correction is conducted in the second step to correct the image digital count to a common scale. In radiometric correction, the selected PIFs are used in a regression for transformation coefficient determination. In this section, the CCA-based MAD is reviewed in Section CCA-based MAD and the proposed KMAD is introduced in Section KCCA-based MAD.

CCA-based MAD
The CCA is used to describe the linear relation between two random variables. Given two images X and Y that contain n pixels and k spectral bands, these two images are represented as linear combination of coefficient vectors a and b as follows: U=a T X= a i X i +…+a k X k , and where X i and Y i denote i-th spectral bands in images X and Y, respectively, and a i and b i represent i-th components in coefficient vectors a and b, respectively. According to Hotelling [15], CCA is used in determining the basis vectors of two random variables such that the variance between the projections of the variables onto these basis vectors is mutually maximized. In other words, coefficient vectors a and b should be determined to maximize the positive correlation between U and V. Hence, the difference in U and V shows the maximal spread in pixel intensity. Coefficient vectors a and b are optimized as follows: where Corr(a T X, b T Y) represents the correlation between U and V; Σ XX and Σ YY indicate the variance of X and Y, respectively; and Σ XY denotes the covariance of X and Y. The optimization in Eq. (2) can be simplified as follows: subject to the constraints Var(U)=a T Σ XX a=1 and Var(V)=b T Σ YY b=1. To solve the constrained optimization, the Lagrange multipliers are introduced in the equation. Eq. (3) is reformulated as follows: where λ a and λ b are the Lagrange multipliers. To obtain the optimized coefficient vectors a and b, let Substituting the coefficient vector a in Eq. (5) to the square of correlation in Eq. (2) obtains the following equation: Figure 1: System workflow. The proposed method consists of two main steps: PIF selection and radiometric normalization.
When the coefficient vector b in Eq. (5) is substituted to the square of correlation in Eq. (2), another equation similar to Eq. (6) can be obtained. Subsequently, the optimization problem can be transformed to eigen problems as follows: The desired transformed vectors U=a T X and V=b T Y can be obtained from the ordered eigenvectors {a 1 ,⋯, a k } and {b 1 ,⋯, b k }, which correspond to the eigenvalues 2 According to Nielsen et al. [16], the b difference components.
are referred as the MAD components. The first MAD component presents maximal spread in its pixel intensities and maximal change information, and the final MAD component contains minimal change information. In addition, the MAD components are invariant to the linear transformation of the intensities of image spectral bands. To select PIFs using MAD with a threshold, all MAD components are combined and normalized as follows: where t represents a specified threshold value.

KCCA-based MAD
CCA can describe the linear relation between two random variables. Nevertheless, KCCA is more suitable than CCA for spatially nonlinear data. In KCCA, data are projected to a high-dimensional feature space, as the illustration shown in Figure 2, in which the spatial nonlinearity of data in the original space is transformed to linearity in the feature space [17][18][19][20]. The transformation or mapping function of data is formulated as follows: Furthermore, the optimization in Eq. (2) is reformulated as follows: where α and β are new unknown coefficient vectors representing the previous coefficient vectors a and b, respectively, and a=φ (X) × α and β=φ (Y) × β. With the mapping function φ in Eq. (10), Eq. (11) is reformulated as follows: Let K x =φ(X T ) φ(X) and K y =φ(Y T ) φ(Y) be the kernel matrices corresponding to the input images X and Y, respectively. The optimization equation in Eq. (12) is simplified as follows: The objective function is further simplified as the following: . Kernel functions K x and K y can be presented in many forms. The mapping function φ belongs to the reproducing kernel Hilbert space with associated real-valued positive definite kernel. The commonly used Gaussian radial basis function kernel (Gaussian RBF) is adopted as the kernel function, that is, where x i and x j are pixels in an image, k(x i , x j ) is the entry (i, j) in the kernel matrix, and σ is the parameter of Gaussian function. According to Hardoon et al. [21], σ is set to the maximum of the standard deviations of each spectral band. Additionally, similar to CCA that requires a zero-mean input data, the data in KCCA should be centered in high-dimensional space. Given a set of pixels {x 1 , ⋯, x n } in image X, under the mapping function φ, the points are centered in highdimensional space by using  . The proof in Schölkopf et al. [22] indicates that the kernel matrix with centered data in high-dimensional space, which is called centered kernel matrix, can be obtained by the following equations: y n n y n y y n n y n y n n y n y y n n y n where 1 n is a n × n matrix, in which each entry is 1⁄n, that is, (1 n ) ij =1⁄n. Consequently, the Lagrange function is transformed as follows: where λ α and λ β are the Lagrange multiplies. With the derivative of the optimization L in Eq. (17) with respect to the coefficient vectors α and β, the equation called stationary equation can be obtained as follows: The substitution of coefficients α and β into the stationary equations in Eq. (18) makes the optimization become eigen problems, that is, However, the kernel matrices K̃x and K̃y are invertible, and the eigen equations in Eq. (19) become Iα=λ α λ β α or Iβ=λ α λ β β, where I is the identity matrix. In this case, trivial solutions will be obtained. To avoid the trivial solutions and the overfitting problem, a regularization term is added to the optimization in Eq. (13) to penalize the norms of the associated coefficients. Afterward, the optimization equation is transformed as follows: subject to the constraint , where r||a|| 2 and r||b|| 2 are the regularization terms, which are equal to Taking the derivatives of L with respect to α and β, respectively, the stationary equations are obtained as follows: Variable α is substituted into the stationary equation, and the equation becomes Similarly, substituting variable β to the stationary equation obtains another equation similar to Eq. (23). Afterward, the final form of the kernel regularization becomes the following eigen problems: When the eigenvalue problems are solved, the ordered eigenvectors {α 1 ,⋯, α n } and {β 1 ,⋯, β n } corresponding to the eigenvalues {λ 1 ,⋯, λ n } can be obtained, in which λ 1 ≤ … ≤λ n . With the coefficients vectors α and β, the KMAD components can be obtained as follows: Where (U) s × n =(a T ) s × k (φ(X)) k × n (V) s × n =(b T ) s × k (φ(Y)) k × n and s represents the number of solutions; n and k denote the numbers of pixels and bands, respectively; (a T ) s × k and (b T ) s × k are the projected samples in the high-dimensional feature space, which can be formulated as follows: (a T ) s × k =(α T ) s × n (φ(X T )) n × k , and (b T ) s × k =(β T ) s × n (φ(Y T )) n × k The transformed vectors U and V are reformulated with the kernel matrices K̃x and K̃y and the obtained coefficient vectors α and β as follows: (U) k × n = α T φ(X T )φ(X)=(α T ) k × n (K̃x) n × n and, KMAD, which is based on nonlinear correlation, is performed in high-dimensional feature space. The number of obtained KMAD components is equal to the number of spectral bands of input images. Figure 3 shows an example of KMAD components. The input data are the images from Landsat7 ETM+ and Landsat 8 OLI. The KMAD components are shown in the bottom row, in which the KMAD values are displayed by gray levels ranging from white (most significant change) to black (least significant change). The first KMAD component displays the maximal spread in its pixel intensities and maximal change information. The final KMAD component contains minimal change information.
Similar to Eq. (9), in PIF selection using KMAD with a threshold, all KMAD components are combined and normalized as follows: The KCCA process is summarized as follows. As shown in Figure  4, the kernel matrices K̃x and K̃y are calculated using Eqs. (15) and (16) with the input images X and Y. In this study, the Gaussian RBF is selected as the kernel function. Subsequently, KCCA is formulated as an optimization equation with the regularization terms shown in Eq. (20). The optimization equation is solved with the aid of Lagrange multipliers in Eq. (21). The stationary equations in Eq.

Test data and study areas
In the experiment, Landsat-7 ETM+ and Landsat-8 OLI images that contain various landscapes with cloud covers are used to evaluate the feasibility and performance of the proposed method.  Table 1. The New Mexico-US acquisitions contain desert landscape with only a few cloud covers; the Taiwan acquisitions contain forest landscape with 10% cloud cover; the Java acquisitions contain agricultural/vegetation landscape with 5% cloud cover, and the China acquisitions contain urban landscape with 25% cloud cover. These tested data containing various landscapes with different cloud cover rates are selected for quantitative and qualitative analyses.

Experimental results
When PIFs are extracted, the commonly used orthogonal least squares regression is adopted to determine the transformation coefficients, namely, slope ν and intersect ε. The target image Y is transformed to the reference image X by using the following linear transformation: X=ν × Y+ε [23]. The experimental results of KCCA, including the original input images, PIF selection, and radiometric normalization, are presented in Figures 5-8. Note that the black strips in Landsat 7 images are caused by the Scan Line Corrector   failed. In Figures 5, 6 and 8, the input images contain 10%, 2%, and 5% cloud covers only. The PIF extraction results of CCA-based MAD is comparable with that of the proposed method, and visually pleasing radiometric normalization results are generated thanks to the successful PIFs extraction. However, in Figure 7, the input images contain 25% cloud covers. Some of the cloud covers are misclassified as PIFs in the method of CCA-based MAD, and thus, perceptible radiometric difference occurs in the radiometric normalization results. By contrast, the cloud covers in the tested images of Figure 7 are classified as non-PIFs using the proposed method, and thus, the radiometric difference between the normalized images is eliminated and imperceptible. From these results, we can see that the proposed MMAD can successfully extract PIFs even the images contain significant cloud covers.

Evaluation
Two experiments are conducted to evaluate and compare the proposed method with related methods. The first experiment compares the proposed method with the method based on CCA [7]. In the experiment, images that contain a large amount of clouds and significant illumination differences are used to evaluate the ability of PIF extraction. The comparison results are shown in Figure 9. In the scatterplot, the y-axis is the digital number of the first image that contains clouds, and the x-axis is that of the second image that is nearly cloud-free. The proposed PIF selection that uses KCCA with centered kernel matrix and regularization term in optimization is compared with CCA [7]. CCA considers no spatial changes, which makes it sensitive to cloud covers and land cover changes. Figure 9 indicates that the major axis from CCA is unstable because of improper PIF selection. To further compare the proposed KCCA with CCA, different NMAD and NKMAD thresholds in Eqs. (9) and (28), including t=25%, 15%, 10% of the number of pixels, are evaluated. Figure 10 demonstrates that the outliers in the scatterplots of PIFs extracted by the proposed KCCA are       less than that by the CCA, especially the results at t=15% and t=10%. Therefore, the proposed method is robust than that of the CCAbased MAD in terms of cloud cover resistance.
The second experiment is the statistical comparisons of normalization results between CCA-based MAD and the proposed method. The statistical comparisons are shown Tables 2-4. The commonly used measurements, including root-mean-squared error (RMSE), standard deviation, mean, and coefficient of determination, are adopted for quantitative analysis. Correspondingly, these measurements are defined as MADs can significantly improve the radiometric consistency, which indicates that the radiometric normalization works well. Furthermore, the RMSE-(B) is reduced 3-6% for the cases with few cloud covers and reduced 10-20% for the case with significant cloud covers, compared the normalization results of CCA-based MAD and that of the proposed method. The statistical results indicate that the proposed KMAD outperforms CCA-based MAD, in terms of radiometry normalization.

Conclusions and Limitation
This study presents a KCCA-based PIF extraction for cross-sensor satellite images. In the experiments, the images from Landsat 7 ETM+ and Landsat 8 OLI sensors were tested. Compared with CCA that assumes the relation between the at-sensor radiances of bi-temporal image is spatially linear, KCCA assuming the relation between sensor radiance is spatially nonlinear can obtain more appropriate PIFs for cross-sensor images. In addition, a regularization term, that is, the norms of the associated coefficients, is added to the optimization of KCCA to avoid trivial solutions and overfitting. The experimental results show that stable KMAD components can be generated, and high-quality PIFs and normalization results can be obtained. The qualitative and quantitative analyses indicate that the proposed KMAD method is better than that of CCA-based MAD in terms of PIF extraction and radiometric normalization, particularly for images that contain significant cloud covers. Nevertheless, the proposed method is limited with long time and high space complexities because of the large storage requirement of kernel matrix and the computation of inverse kernel matrix. In the future, these limitations can be alleviated by using the combination of KCCA and CCA in radiometric normalization. MT, MR, and MN represent the means of target, reference, and normalized images, respectively. ST, SR, and SN denote the standard deviations of target, reference, and normalized images, respectively. RMSE-(B) and RMSE-(A) indicate the RMSE of PIFs before and after normalization, respectively.