Simulation of EO-1 Hyperion Data from ALI Multispectral Data Based on the Spectral Reconstruction Approach

Data simulation is widely used in remote sensing to produce imagery for a new sensor in the design stage, to study scale issues in particular applications, or to test novel algorithms. Hyperspectral data can provide more abundant information than traditional multispectral data and thus greatly extend the range of remote sensing applications. Unfortunately, hyperspectral data are much more difficult and expensive to acquire and were not available prior to the development of operational hyperspectral instruments, whereas large amounts of multispectral data have been collected around the world over the past several decades. Therefore, it is reasonable to examine means of using these multispectral data to simulate or construct hyperspectral data, especially in situations where hyperspectral data are necessary but hard to acquire. Here, a method based on spectral reconstruction is proposed to simulate hyperspectral data (Hyperion data) from multispectral Advanced Land Imager data (ALI data). This method involves extraction of the inherent information of the source data and its reassignment to the newly simulated data. A total of 106 bands of Hyperion data were simulated from ALI data covering the same area. To evaluate this method, we compared the simulated and original Hyperion data by visual interpretation, statistical comparison, and classification. The results generally showed good performance of this method and indicated that most bands were well simulated and that the information was both preserved and presented well. This makes it possible to simulate hyperspectral data from multispectral data for testing the performance of algorithms, to extend the use of multispectral data, and to help in the design of a virtual sensor.


Introduction
Remote sensing is playing an increasingly important role in earth science research and environmental problem solving. A number of earth satellites have been launched to advance our understanding of Earth's environment. Satellite sensors, both active and passive, capture data from the visible to the microwave regions of the electromagnetic spectrum. A wide range of satellite data, including multispectral data and hyperspectral data, such as Landsat Thematic Mapper 5/Enhanced Thematic Mapper (TM/ETM+); Global Imager (GLI); Moderate Resolution Imaging Spectroradiometer (MODIS); and Advanced Land Imager (ALI) and Hyperion, are frequently used in oceanography, hydrology, geology, forestry, and meteorology studies. Different studies and applications require data with different spatial, spectral, radiometric, and temporal resolutions [1,2]. Hyperspectral sensors monitor hundreds of spectral bands and can provide near-laboratory quality reflectance spectra. The data produced, referred to as hyperspectral data, contain much more information than multispectral data and have greatly extended the range of remote sensing applications [3,4]. Unfortunately, hyperspectral data are much more difficult and expensive to acquire and were not available prior to the development of operational hyperspectral instruments. On the other hand, large amounts of multispectral data have been collected around the world over the past several decades. It is therefore reasonable to examine means of using these multispectral data to simulate or construct hyperspectral data, especially in situations where the latter are necessary but hard to acquire. Many studies have examined methods to simulate or construct hyperspectral and multispectral spectra from field spectra or to aggregate the spectra of hyperspectral bands into multispectral bands. However, few attempts have been made to simulate hyperspectral data from multispectral data [2,5-9].
In this paper, we propose a method, based on a spectral reconstruction approach, to simulate hyperspectral data from multispectral data.
Data simulation is widely used in remote sensing. It is often utilized to produce imagery for virtual or new sensors that are in the design stage. Simulated data can be used to assess or evaluate the spectral and spatial characteristics of the sensor, which are critical in the planning of a project [8]. NASA has developed a system to simulate imagery to meet customer needs and costs in a virtual environment (http://www.esad.ssc.nasa.gov/art/). Meanwhile, data simulation is often used to evaluate or assess the influence of different spectral or spatial resolutions on some applications and thus to select the appropriate resolution for a particular problem [10][11][12][13]. For example, by simulating hyperspectral data with different spatial resolution, Luo [12] evaluated the adaptability of linear spectral unmixing to different levels of spatial resolution. Jiao [13] simulated hyperspectral data to evaluate the influence of spatial and spectral resolution to vegetation classification. In addition, simulated data are often used to evaluate and test novel algorithms such as target detection and identification algorithms in hyperspectral remote sensing. There is no easy method to simulate hyperspectral data for testing the performance of these algorithms. If simulated hyperspectral data can be easily obtained, it will greatly help the testing and development of new algorithms.
The universal pattern decomposition method (UPDM) is a sensor-independent method which can be considered as a spectral reconstruction approach, in which each satellite pixel is expressed as the linear sum of fixed, standard spectral patterns for water, vegetation, and soil, and the same normalized spectral patterns can be used for different solar-reflected spectral satellite sensors [14]. Sensor independence requires that analysis results for the same sample are the same or nearly the same regardless of the sensor used. Based on this trait, here we present a method based on the universal pattern decomposition method (UPDM) to achieve the goal of simulating hyperspectral data from multispectral data, which can be considered either a method of spectral construction or spectral transform. The hyperspectral and multispectral data are NASA EO-1 satellite/Hyperion and EO-1/ALI data, respectively (see Section 3.2 for a brief introduction). First, we obtained ALI and Hyperion data covering the same area and performed atmospheric correction to obtain surface reflectance data; here Hyperion data served as standard or real data to evaluate the results in the subsequent analysis. Then, we obtained the decomposition coefficients thought to be sensor-independent for the same sample by applying the UPDM to ALI data; these coefficients were subsequently used to construct Hyperion data. Before performing UPDM, standard pattern matrices of both sensors were calculated based on the standard spectral patterns (see Section 2 for details). Finally, the simulated Hyperion data were compared with the real Hyperion data, i.e., test data, to evaluate and assess this method.

Review of the Universal Pattern Decomposition Method (UPDM)
The spectral reconstruction approach is based on the UPDM, which is a sensor-independent method derived from PDM that has been successfully applied in many studies [14-21]. This method can be explained by multi-dimensional analysis, and it is mathematically and practically almost identical to the spectral unmixing method [22-26]. UPDM decomposes the reflectance values at each pixel into a linear sum of standard spectral patterns for water, vegetation, soil, and any supplemental patterns using the following formula [20,21]:

$$R_i \approx C_w P_{iw} + C_v P_{iv} + C_s P_{is} + C_4 P_{i4} \qquad (1)$$

Here, $R_i$ is the reflectance of band $i$ measured on the ground or by a satellite sensor; $C_w$, $C_v$, and $C_s$ are the decomposition coefficients for water, vegetation, and soil, respectively; $C_4$ represents the supplemental coefficient; and $P_w$, $P_v$, and $P_s$ are the respective standard spectral patterns for water, vegetation, and soil for a given sensor (with band-$i$ values $P_{iw}$, $P_{iv}$, and $P_{is}$). For any sensor they are derived from the same standard patterns, normalized over the same wavelength region of 350 nm-2500 nm, and they are therefore related to the properties of each sensor. $P_4$ is the supplementary standard pattern, an optional component that can be chosen according to the purpose of the study. For each sensor band, the standard spectral patterns $P_{iw}$, $P_{iv}$, and $P_{is}$ are calculated as follows:

$$P_{ik} = \frac{\int_{\lambda_{s,i}}^{\lambda_{e,i}} P_k(\lambda)\, d\lambda}{\int_{\lambda_{s,i}}^{\lambda_{e,i}} d\lambda}, \quad k = w, v, s \qquad (2)$$

where $\lambda_{s,i}$ and $\lambda_{e,i}$ are the start and end wavelengths of band $i$, and $P_k(\lambda)$ is the normalized standard pattern, which is fixed for all sensors and defined as:

$$P_k(\lambda) = \frac{\int d\lambda}{\int \left| R_k(\lambda) \right| d\lambda}\, R_k(\lambda), \quad k = w, v, s \qquad (3)$$

where $R_k(\lambda)$ represents the spectral reflectance patterns of the standard objects and $\int d\lambda$ refers to integration over the total wavelength range from 350 nm to 2500 nm. By construction, $P_k(\lambda)$ satisfies the following normalization equation:

$$\int \left| P_k(\lambda) \right| d\lambda = \int d\lambda$$

As the supplemental pattern is not fixed, it can be chosen according to the purpose of the study. As an example, we used a yellow-leaf spectrum to briefly show how a supplemental pattern is added. Because of multicollinearity, the yellow-leaf pattern cannot be added directly.
Instead, a residual yellow-leaf pattern is used as the supplementary spectral pattern (see [20]). By analogy to (3), $P_4(\lambda)$ is defined as follows:

$$P_4(\lambda) = \frac{\int d\lambda}{\int \left| r_4(\lambda) \right| d\lambda}\, r_4(\lambda)$$

where $r_4(\lambda)$ is the residual yellow-leaf value:

$$r_4(\lambda) = R_4(\lambda) - \left[ C_w P_w(\lambda) + C_v P_v(\lambda) + C_s P_s(\lambda) \right]$$

and $R_4(\lambda)$ is the measured spectral value for the yellow-leaf sample. For any sensor, the $P_{i4}$ values are calculated using (2) in the same way. For simplicity, we express UPDM in matrix form as follows [14]:

$$R = PC + r$$

where $R = [R_1, \ldots, R_n]^T$ is the column vector of reflectance values for the $n$ bands, and $P$ is the $n \times 4$ matrix (bands × patterns), called the standard pattern matrix, in which row $i$ is the standard spectral pattern $[P_{iw}, P_{iv}, P_{is}, P_{i4}]$ for band $i$. $P$ is unique and fixed for a sensor once the standard spectral patterns are given.
$C = [C_w, C_v, C_s, C_4]^T$ is the column vector of UPDM coefficients, and $r$ is the residual column vector. $C$ can be obtained by minimizing the sum-of-squared-error criterion function:

$$\min_{C} \left\lVert R - PC \right\rVert^2$$

The reduced $\chi^2$ employed to evaluate the precision of fitting is defined in [21].
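The least-squares step described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `updm_decompose`, the six-band pattern matrix, and the coefficient values are all made up for the example, and only three patterns (no supplementary pattern) are used.

```python
import numpy as np

def updm_decompose(P, R):
    """Least-squares UPDM-style decomposition of one pixel.

    P : (n_bands, n_patterns) standard pattern matrix, one row per band.
    R : (n_bands,) reflectance vector of the pixel.
    Returns (C, r): the coefficient vector minimizing ||R - P C||^2
    and the residual vector r = R - P C.
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    C, *_ = np.linalg.lstsq(P, R, rcond=None)
    return C, R - P @ C

# Hypothetical 6-band sensor with three made-up patterns (water, vegetation, soil).
P_demo = np.array([
    [1.5, 0.2, 0.5],
    [1.3, 0.4, 0.7],
    [1.0, 0.3, 0.9],
    [0.6, 2.0, 1.1],
    [0.3, 1.4, 1.3],
    [0.1, 0.8, 1.2],
])
C_true = np.array([0.05, 0.20, 0.10])   # assumed coefficients
R_demo = P_demo @ C_true                # noise-free synthetic pixel

C_est, resid = updm_decompose(P_demo, R_demo)
print(C_est)
```

Because the synthetic pixel is an exact linear combination of the patterns, the recovered coefficients match the assumed ones and the residual is numerically zero; with real reflectance data the residual would not vanish.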

A Modification of UPDM
Spectral sensitivity is an important parameter of any sensor and is normally expressed as the spectral response function (SRF), which is the relative responsivity of the sensor to monochromatic radiation of different wavelengths. Various studies have indicated the effects of sensor SRFs on analysis results in a variety of applications, and it is very important to take SRFs into account when comparing data from different sensors and applying physical models [27-31]. To take the effects of the SRF into consideration, we revised (2) to calculate the standard pattern matrix as follows:

$$P_{ik} = \frac{\int S_i(\lambda)\, P_k(\lambda)\, d\lambda}{\int S_i(\lambda)\, d\lambda} \qquad (12)$$

where $S_i(\lambda)$ represents the SRF of band $i$ of a sensor and the integrals are taken over the wavelength range of band $i$. The other equations in UPDM remain the same.
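As a rough numerical sketch of the pattern normalization and the SRF-weighted band average, the snippet below uses a simple trapezoidal integral. The helper names (`integrate`, `normalize_pattern`, `band_pattern`), the linear reflectance ramp, and the Gaussian SRF centred at 660 nm are assumptions made only for illustration; they are not the standard patterns or SRFs used in the study.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal integral of samples y over the wavelength grid x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def normalize_pattern(wl, refl):
    """Scale a reflectance spectrum so that the integral of |P| equals the
    integral of d(lambda) over the full grid."""
    wl = np.asarray(wl, dtype=float)
    refl = np.asarray(refl, dtype=float)
    return refl * (wl[-1] - wl[0]) / integrate(np.abs(refl), wl)

def band_pattern(wl, pattern, srf):
    """SRF-weighted average of a normalized pattern for one band."""
    pattern = np.asarray(pattern, dtype=float)
    srf = np.asarray(srf, dtype=float)
    return integrate(srf * pattern, wl) / integrate(srf, wl)

wl = np.linspace(350.0, 2500.0, 2151)             # 1 nm grid, 350-2500 nm
refl = 0.05 + 0.0002 * (wl - 350.0)               # made-up reflectance spectrum
pattern = normalize_pattern(wl, refl)
srf = np.exp(-0.5 * ((wl - 660.0) / 15.0) ** 2)   # made-up SRF of one band
p_band = band_pattern(wl, pattern, srf)
print(p_band)
```

Repeating `band_pattern` for every band and every standard pattern would fill one sensor's standard pattern matrix row by row.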

Study Area
The study area was located in Yueyang in northeastern Hunan province, P.R. China, where Dongting Lake, China's second-largest freshwater lake, connects to the Yangtze River. The climate of the Dongting Lake area is between middle and northern subtropical. The annual mean temperature is about 16.4°C-17°C; the mean temperature in January is 3.8°C-4.5°C and in July is about 29°C. The annual precipitation is about 1,100 mm-1,400 mm, with more than half of the rainfall occurring between April and June.

Remote Sensing Data
The remote sensing data used in this study were ALI and Hyperion images. Hyperion and ALI, collecting data over the same area simultaneously, are two of the three sensors onboard the NASA EO-1 satellite with sun-synchronous orbit at an altitude of 705 km. The cross-track width of an ALI scene and a Hyperion scene are 37 and 7.7 km, respectively. The along-track scene length for both ALI and Hyperion will generally be either 42 km or 185 km, depending on the dimensions specified when the scene was scheduled. ALI was built to provide vital information for the next Landsat mission, with 1 panchromatic band and 9 multispectral bands, most of which are comparable to ETM+ bands. The Hyperion sensor collects a total of 242 bands, and its final L1R product provides a total of 198 bands representing 427-2,395 nm continuous spectra with 10-nm spectral resolution.
ALI and Hyperion data covering the study area were acquired on September 2, 2002. The center of the images is 29.38° N, 113.06° E. Frequent torrential rain in May and June 2002 led to localized flooding, which damaged crops and infrastructure along the shores of Dongting Lake. ALI data served as source data from which we attempted to simulate hyperspectral data, whereas Hyperion data served as test data, i.e., real data, to test and evaluate the final results.

Preprocessing of Remote Sensing Data
Vertical streaks between columns in the along-track direction of image data in a push-broom system are quite common. Such effects are evident in Hyperion data, especially in shortwave infrared (SWIR) channels. To remove the striping, we adopted a statistical balancing method that calculates the mean and standard deviation of a local selectable neighborhood of columns as the reference values to adjust the column data [32]. Local instead of global statistical moments are used to minimize changes made before atmospheric correction as global statistical destriping may alter mid-frequency spatial effects [32,33]. Only a small subset of bands with the most severe streak effects was selected for such destriping. The intention here was still to minimize changes, although this conservative treatment may not remove striping effects fully.
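The idea of balancing each column against local rather than global statistics can be sketched as below. This is only an assumed, simplified variant (moment matching of each column to the average mean and standard deviation of its neighbouring columns); the exact procedure of [32] may differ, and the function name `destripe_local`, the window size, and the synthetic image are hypothetical.

```python
import numpy as np

def destripe_local(band, half_width=5):
    """Match each column's mean and standard deviation to the average
    statistics of the columns within +/- half_width of it."""
    band = np.asarray(band, dtype=float)
    col_mean = band.mean(axis=0)
    col_std = band.std(axis=0)
    n_cols = band.shape[1]
    out = np.empty_like(band)
    for j in range(n_cols):
        lo, hi = max(0, j - half_width), min(n_cols, j + half_width + 1)
        ref_mean = col_mean[lo:hi].mean()   # local reference mean
        ref_std = col_std[lo:hi].mean()     # local reference spread
        gain = ref_std / col_std[j] if col_std[j] > 0 else 1.0
        out[:, j] = (band[:, j] - col_mean[j]) * gain + ref_mean
    return out

# Synthetic single-band image with one artificially striped column.
rng = np.random.default_rng(0)
clean = rng.normal(loc=100.0, scale=1.0, size=(200, 30))
striped = clean.copy()
striped[:, 10] += 5.0                       # made-up stripe offset
fixed = destripe_local(striped, half_width=5)
print(striped[:, 10].mean(), fixed[:, 10].mean())
```

A small window keeps the adjustment local, which is in the spirit of minimizing changes to the data before atmospheric correction, at the cost of leaving part of the stripe in place.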
After destriping, Hyperion L1R data were atmospherically corrected using ACORN5.0 in mode 1.5 to obtain surface reflectance data. As ALI data are of very good quality, atmospheric correction was performed directly using ACORN5.0 in mode 5.0 without similar preprocessing.
Although both sensors are onboard the same satellite, the areas covered by the pixels of their images were not strictly identical. Hyperion data were geometrically corrected to ALI data using first-order polynomial interpolation and bilinear resampling.

Obtaining Standard Pattern Matrix
As described in Section 2, it is necessary to calculate the standard pattern matrices of ALI and Hyperion to apply UPDM. According to (12), the SRFs of both sensors are also needed. The SRFs of ALI are available from the website of CSIRO (http://www.eoc.csiro.au/hswww/oz_pi/specresp.htm). We use a Gaussian function to simulate the SRFs for Hyperion. The Gaussian function is widely accepted for many detector instruments and is reasonably constant across detector arrays within the same optical system [34].
The SRF $S_i(\lambda)$ can be represented by the central wavelength $\lambda_i$ and the bandwidth $\sigma_i$, which is a function of the Full Width at Half Maximum (FWHM) (14). With the assumption that the peak of the Gaussian function at the central wavelength is 1, $S_i(\lambda)$ is given as:

$$S_i(\lambda) = \exp\left( -\frac{(\lambda - \lambda_i)^2}{2 \sigma_i^2} \right) \qquad (13)$$

$$\sigma_i = \frac{\mathrm{FWHM}_i}{2 \sqrt{2 \ln 2}} \qquad (14)$$

where the subscript $i$ represents band $i$ of a sensor. All SRFs of Hyperion can be constructed in this way from two variables, i.e., the central wavelength and the FWHM, which are known parameters.
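A Gaussian SRF with unit peak can be generated from a band's central wavelength and FWHM as in the short sketch below. The function name `gaussian_srf` is hypothetical, and the example values (711 nm centre, 10 nm FWHM) are illustrative rather than taken from the Hyperion calibration files.

```python
import numpy as np

def gaussian_srf(wl, center, fwhm):
    """Gaussian spectral response with peak 1 at the central wavelength.

    sigma is derived from the full width at half maximum:
    sigma = FWHM / (2 * sqrt(2 * ln 2)).
    """
    wl = np.asarray(wl, dtype=float)
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wl - center) / sigma) ** 2)

wl = np.arange(690.0, 731.0, 1.0)                  # nm, illustrative grid
srf = gaussian_srf(wl, center=711.0, fwhm=10.0)    # assumed band parameters
print(srf.max(), gaussian_srf(716.0, 711.0, 10.0))
```

By construction the response equals 1 at the centre and 0.5 at half the FWHM on either side, which is a quick sanity check on the sigma conversion.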
After obtaining the SRFs of both sensors, the standard pattern matrices are calculated using (12). The standard spectral patterns used here were the same as those used previously [20]. The standard pattern matrix of ALI, denoted $P_A$, has an order of 9 × 4 (bands × patterns), and that of Hyperion, denoted $P_H$, has an order of 106 × 4. That is, information from all ALI multispectral bands is used to simulate Hyperion data. Because of strong water vapor absorption, low SNR, and the limited valid wavelength range of the standard spectral patterns used, only a subset of 106 Hyperion bands was used in this study (Table 1).

Simulating Hyperion Data Based on UPDM from ALI Data
UPDM is applied to ALI data to acquire the decomposition coefficient vector $C_A$, which is considered to be sensor-independent, i.e., it holds the same value when UPDM is applied to Hyperion data:

$$R_A = P_A C_A + r_A$$

where subscript $A$ denotes ALI.
To construct Hyperion data, we substitute $C_A$ for $C_H$ in the corresponding equation for Hyperion:

$$R_H = P_H C_A$$

where subscript $H$ similarly denotes Hyperion.
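The two steps above (decompose the source pixels, then rebuild them with the target sensor's pattern matrix) can be sketched for a whole batch of pixels at once. Everything here is a stand-in: the function name `simulate_target_bands` is hypothetical, and the pattern matrices are random placeholders with the 9 × 4 and 106 × 4 shapes of the two sensors, not the real ALI and Hyperion matrices.

```python
import numpy as np

def simulate_target_bands(P_src, P_dst, refl_src):
    """Simulate target-sensor reflectance from source-sensor reflectance.

    P_src    : (n_src_bands, n_patterns) pattern matrix of the source sensor.
    P_dst    : (n_dst_bands, n_patterns) pattern matrix of the target sensor.
    refl_src : (n_pixels, n_src_bands) source reflectance, one row per pixel.
    Returns an (n_pixels, n_dst_bands) array.
    """
    P_src = np.asarray(P_src, dtype=float)
    P_dst = np.asarray(P_dst, dtype=float)
    refl_src = np.asarray(refl_src, dtype=float)
    # Least-squares coefficients for all pixels: shape (n_patterns, n_pixels).
    C, *_ = np.linalg.lstsq(P_src, refl_src.T, rcond=None)
    # Reuse the same coefficients with the target pattern matrix.
    return (P_dst @ C).T

rng = np.random.default_rng(0)
P_ali = rng.uniform(0.1, 2.0, size=(9, 4))      # placeholder for the ALI matrix
P_hyp = rng.uniform(0.1, 2.0, size=(106, 4))    # placeholder for the Hyperion matrix
C_true = rng.uniform(0.0, 0.3, size=(5, 4))     # assumed coefficients, 5 pixels
ali_pixels = C_true @ P_ali.T                   # synthetic source reflectance

sim = simulate_target_bands(P_ali, P_hyp, ali_pixels)
print(sim.shape)
```

With nine source bands and four patterns the system is overdetermined, so the coefficients are well defined as long as the pattern matrix has full column rank.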

Results and Discussion
Following the process flow discussed above, 106 bands were simulated based on UPDM from ALI data. In what follows, "source data" refers to the ALI data, "simulated data" to the newly generated Hyperion data, and "original data" or "real data" to the real Hyperion data. We evaluated the simulation method by comparing simulated and original data with regard to three aspects. First, the general appearances of both types of data were compared by visual interpretation to determine whether they have similar visual effects. Second, the statistical characteristics of both types of data were compared to determine whether they show a good correlation. Finally, we performed classification of both types of data to evaluate how information is preserved in application.
Bands 13, 19, 94, and 148 are used as examples (see Figure 4 for detailed discussion of correlation). Specifically, the wavelength of band 94 is not covered by any ALI band. We therefore wish to show not only those Hyperion bands that are covered by ALI bands, which can be regarded as subdivisions of the ALI band, but also those that are not covered, which are simulated or "created." Bands 13, 94, and 148 had higher correlation coefficients (0.98, 0.97, and 0.96, respectively), indicating well-simulated bands, whereas band 19 had a lower correlation coefficient of 0.88, representing the minor fraction of bands that are not simulated as well as the others (Figure 1).

Comparing Simulated Hyperion Data and Real Hyperion Data by Visual Interpretation
By interpreting each pair of images in Figure 2, we can see no significant differences between the simulated and original data for band 13, band 94, or band 148. The tone, pattern, texture, shape, and border of objects in both images all look quite similar with no obvious differences. These observations indicated that, from the viewpoint of visual interpreting, the information of ground objects and features presented in real data were also well preserved and presented in our simulated Hyperion data. The high degree of similarity and coherence of band 94 showed that our method could well simulate even those bands uncovered by multispectral data. Band 19 showed lower degrees of similarity and identity, with the most obvious differences in the color of objects; however, the texture and shape were still consistent and coherent. The most obvious differences were related to some parts of the vegetation area. These differences may have been because our standard spectral pattern of vegetation was not collected in the study area and may therefore not accurately present the characteristics of vegetation in this area around the wavelength of band 19. It is thought that the simulated results would have been much better if field spectra for this area during the same period had been available.
For comparison of detailed regions, we selected a small area covering the border of the pond circled by the rectangle in bands 94 and 148. This small region had a great deal of variety and many details and therefore served as a good test region (Figure 2). Similar to the observations discussed above, most details and variety were also well presented in the simulated data. For example, the texture of the pond, the pattern of vegetation distribution on the left part of this region, several small bright objects in the lower right part, and the border between the pond and land remained consistent and coherent between the original and simulated data. The original data often appear slightly vaguer than the corresponding simulated data. This is attributable to the geometrical correction in which the bilinear resampling method can degrade the spatial resolution of the original Hyperion data. However, this effect cannot be removed completely, as instantaneous fields of view of both sensors are not completely identical.

Comparing Simulated and Real Hyperion Data by Statistical Analysis
The reflectance spectra of the original 9-band ALI multispectral data and of the 106-band Hyperion data simulated from ALI are obviously different. However, the reflectance spectra of the original and simulated Hyperion data are very similar, although the spectral curves of the latter are slightly smoother than those of the former.
For simple statistical comparison, we used the mean and standard deviation of each band from both datasets (Figure 3). Although the mean and standard deviation of each pair of bands of both datasets showed a large degree of variation, some useful observations could still be made. For most bands (listed in Table 1), the trends of variation in the mean and standard deviation and their dynamic ranges were similar.
We also calculated the correlation coefficients between each band of the original data and its counterpart in the simulated data (Figure 4). As shown in Figure 4, most bands had correlation coefficients >0.95 (n = 70, bands 10-16, 22-34, 39-53, 87-94, 107-113, and 139-158), and a small fraction had correlation coefficients <0.90 (n = 9, bands 8, 19, 35-37, and 216-219); among the latter, band 36 had the lowest correlation coefficient of 0.69. A higher coefficient suggested better simulation of the corresponding band, whereas a coefficient lower than 0.90 indicated poor simulation.
Linear regression between simulated and original data was examined in detail for bands 13, 19, 36, 52, 94, 113, 148, and 208. Generally, the linear fit of these bands was consistent with the band correlation coefficient. The bands could be assigned to three groups according to the quality of fit. The first group included bands 19 and 36, with correlation coefficients <0.90. Their data points were loosely scattered and did not cluster tightly around the fitted line, and their $R^2$ values (0.789 and 0.472, respectively) were much lower than those of the other bands. Consistent with its lowest correlation coefficient, band 36 showed the poorest performance, with the greatest RMS of 0.123 and the lowest $R^2$ of 0.472. The wavelength of band 36 was 711 nm, corresponding to the red edge. The rapid change in vegetation reflectance around the red edge may degrade the performance of our model and cause the low correlation coefficient. This may also have been responsible for the low correlation coefficients of bands 35 and 37, as they were also around the red edge. Adding supplementary spectral patterns that account for this rapid change to UPDM, or replacing the vegetation standard spectral pattern with ground-measured vegetation spectra from the study area, may improve the results. The second group consisted of bands 52, 94, 113, 148, and 208, all of which had correlation coefficients >0.9. The data points for these bands clustered around the fitted line very well, with quite high $R^2$ values (0.973, 0.949, 0.936, 0.934, and 0.879, respectively), and their fitted lines were very close to the 1:1 line. These observations indicated that these bands were well simulated and highly similar to the original bands. The best performance was observed for band 52; its fitted line was y = x + 0.00209 ($R^2$ = 0.973), indicating that the simulated band was almost the same as the real data. Band 13 alone was considered a separate group, for which the intercept of the fitted line (0.0286) relative to the dynamic range of the data (0.06-0.13) was much greater than that for the second group, causing the fitted line to move away from the line y = x. However, the data points of band 13 fit the line quite well. The high values of $R^2$ (0.956) and of the correlation coefficient (0.98) also suggested good linearity.
To evaluate the general results for the whole dataset, the vector angle between the simulated and original data was calculated for each pixel. The result was displayed as an image of the cosine of the angle, with a higher value corresponding to a smaller angle (Figure 6). Most pixels reached values >0.95, especially in the deep red area. We also noted that the areas shown in yellow and cyan had lower values, indicating slightly greater differences between simulated and original data. Interestingly, these areas all corresponded to river or pond areas. Water has much lower reflectance than land, especially in the infrared region. This results in a lower SNR in water areas, and the noise may lead to poorer simulation. This result was similar to the finding in [16]. Some borders between different ground objects and narrow line features also had lower values, caused by the resampling method.
Combining the above discussion and analysis, most simulated bands, with the exception of a small fraction with quite low correlation coefficients, showed strong correlations and high linearity with the original bands. The vector angle image also showed a high degree of similarity and good simulation for most areas. These observations indicated that our method is valid for simulating Hyperion bands from the viewpoint of statistics.
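The two statistics used in this comparison, the per-band correlation coefficient and the per-pixel cosine of the vector angle, can be computed as in the following sketch. The function names and the random test data are hypothetical; both functions assume the images have been flattened to arrays of shape (pixels, bands).

```python
import numpy as np

def band_correlations(sim, orig):
    """Pearson correlation between simulated and original data, per band."""
    s = sim - sim.mean(axis=0)
    o = orig - orig.mean(axis=0)
    return (s * o).sum(axis=0) / np.sqrt((s ** 2).sum(axis=0) * (o ** 2).sum(axis=0))

def pixel_cosine(sim, orig):
    """Cosine of the angle between simulated and original spectra, per pixel."""
    num = (sim * orig).sum(axis=1)
    den = np.linalg.norm(sim, axis=1) * np.linalg.norm(orig, axis=1)
    return num / den

rng = np.random.default_rng(1)
orig = rng.uniform(0.05, 0.5, size=(1000, 106))              # made-up "original" data
sim = orig + rng.normal(0.0, 0.01, size=orig.shape)          # made-up "simulated" data

r_per_band = band_correlations(sim, orig)     # shape (106,)
cos_per_pixel = pixel_cosine(sim, orig)       # shape (1000,)
print(r_per_band.min(), cos_per_pixel.min())
```

Note that the cosine measure is insensitive to a uniform scaling of a spectrum, so it complements rather than replaces the band-wise statistics.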

Comparing Simulated Hyperion Data and Real Hyperion Data by Classification Application
We also performed classification using the spectral angle mapper method on the whole original Hyperion data (106 bands), simulated Hyperion data (106 bands), and ALI data (9 bands) to evaluate the general performance of our simulation method in classification application (Figure 7). That is, we wish to determine whether, being reconciled with the hyperspectral data, simulated data can preserve the information of ALI data.
The aim of our method is to simulate hyperspectral data from multispectral data to make it comparable to the real hyperspectral data. Our method involves transformation to present the information of multispectral data in the new simulated hyperspectral data, but not creating or adding new information. This is quite reasonable, as we could never create any new information just using mathematical techniques. The situation here is quite similar to the PCA method, which never creates any new information but reassigns inherent information.
From the classification images, we can see that the classification results on simulated data are similar to those on ALI data, indicating that the information of ALI data is well preserved in the simulated data using our method. That is, the inherent information of ALI data is not lost after being reassigned. In addition, the classification of the original Hyperion data is similar to that of the simulated Hyperion data, as shown by the overall classification accuracies (Table 2). These accuracies were calculated using the classification of the original Hyperion data as the reference image against which the classification results of the simulated Hyperion data and the ALI data were evaluated. The overall accuracy of the classification on simulated data was 87.6%, and the kappa coefficient was 0.808, whereas the accuracy on ALI data was 86.8% and its corresponding kappa coefficient was 0.797. The classification results showed that our method successfully preserved the inherent information and presented it in the new data.

Figure 7. Classification results of original Hyperion data, simulated Hyperion data, and ALI data: (a) classification image of original Hyperion data, (b) classification image of simulated Hyperion data, and (c) classification image of ALI data. (Class label "pond" refers to water with the characteristics of pond water; label "plant 1" refers to sparse plant area; and label "plant 2" refers to dense plant area.)
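For reference, overall accuracy and the kappa coefficient can be derived from two classification maps as sketched below, with one map treated as the reference. The function name `accuracy_and_kappa` and the tiny label arrays are illustrative only and are unrelated to the values reported in Table 2.

```python
import numpy as np

def accuracy_and_kappa(reference, predicted):
    """Overall accuracy and Cohen's kappa from two label arrays."""
    ref = np.asarray(reference).ravel()
    pred = np.asarray(predicted).ravel()
    classes = np.unique(np.concatenate([ref, pred]))
    ri = np.searchsorted(classes, ref)      # class index of each reference label
    pi = np.searchsorted(classes, pred)     # class index of each predicted label
    cm = np.zeros((classes.size, classes.size))
    np.add.at(cm, (ri, pi), 1)              # confusion matrix: rows = reference
    n = cm.sum()
    p_obs = np.trace(cm) / n                                # observed agreement
    p_exp = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    return p_obs, (p_obs - p_exp) / (1.0 - p_exp)

# Made-up labels: 0 and 1 are arbitrary class codes.
reference = [0, 0, 1, 1]
predicted = [0, 0, 1, 0]
oa, kappa = accuracy_and_kappa(reference, predicted)
print(oa, kappa)   # 0.75 0.5
```

The sketch does not guard against the degenerate case in which chance agreement equals 1 (for example, a single class in both maps), where kappa is undefined.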

Summary and Conclusions
We have proposed a method to simulate hyperspectral data from multispectral data based on the spectral reconstruction method UPDM. A total of 106 bands of Hyperion were simulated from ALI data covering the same area. Visual comparison showed that the simulated data successfully presented the information of ground features and objects described by the original data for interpretation. To further evaluate our method, we compared the simulated and original data by statistical methods. The results indicated that most bands had very high correlation coefficients, suggesting a high degree of similarity and good consistency of the simulated bands to the original bands. The detailed results of linear regression analyses further verify that, for the bands with high correlation coefficients, the data points were generally clustered very tightly and fit the line 1:1 very well. These observations indicated that most bands showed good linearity and similarity to the original data. The high cosine values of the vector angle between the simulated and original data of each pixel also demonstrated the general good performance of our method.
However, a small fraction of bands showed lower correlation coefficients, corresponding to poor simulation. This may have been because our standard spectral patterns were not collected in the study area. It may be possible to improve the results by adding supplementary patterns and replacing the standard spectral patterns with those derived from ground-measured spectra.
The aim of our method is to simulate hyperspectral data from multispectral data and make them comparable to the real hyperspectral data. Similar to PCA, our method attempts to preserve and reassign the inherent information of multispectral data to the simulated data and to make full use of it, but not to add or create extra new information. The similarity between the classification results derived from ALI data and from simulated Hyperion data showed that the inherent information of ALI data was well preserved by reassignment to the simulated Hyperion data.
Simulated data could serve as a powerful tool in algorithm testing and assessment and could act as a potential surrogate when real hyperspectral data are unavailable. Validation and evaluation of such algorithms should be conducted using hyperspectral images covering a wide range of spatial complexities, but acquiring enough hyperspectral data to meet this need can be difficult. Our method can provide simulated hyperspectral imagery with the spatial complexity of real-world imagery, thus allowing for extensive yet lower-cost testing of algorithms over a wide variety of environmental conditions. In addition to algorithm development and testing, our method can also be applied to simulate the imagery of new sensors still in the design stage.
Although this pilot study demonstrated the good general performance of our method from the viewpoint of visual interpretation, statistical comparison, and classification application, further studies of both the theory and applications should be performed to improve this method. For example, we may add some supplementary spectral patterns or consider the variability of standard spectral patterns, attempt to derive standard spectral patterns from ground-measured spectra, and select different standard spectral patterns for different applications.