Information Loss-Guided Multi-Resolution Image Fusion

Spatial downscaling is an ill-posed, inverse problem, and information loss (IL) inevitably exists in the predictions produced by any downscaling technique. The recently popularized area-to-point kriging (ATPK)-based downscaling approach can account for the size of support and the point spread function (PSF) of the sensor, and moreover, it has the appealing advantage of the perfect coherence property. In this article, based on the advantages of ATPK and the conceptualization of IL, an IL-guided image fusion (ILGIF) approach is proposed. ILGIF uses the fine spatial resolution images acquired in other wavelengths to predict the IL in ATPK predictions based on the geographically weighted regression (GWR) model, which accounts for the spatial variation in land cover. ILGIF inherits all the advantages of ATPK, and its prediction has perfect coherence with the original coarse spatial resolution data, which can be demonstrated mathematically. ILGIF was validated using two data sets and was shown in each case to predict downscaled images more accurately than the compared benchmark methods.

involves regular supports (pixels with the same size and shape) [5]. ATPK accounts for the size of support, spatial correlation, and the point spread function (PSF) of the sensor and has the appealing characteristic of perfect coherence with the original coarse spatial resolution data, and thus, it is an accurate method for downscaling [2].

A. Information Loss in Downscaling
Downscaling is essentially an ill-posed, inverse problem, in which multiple plausible solutions can lead to an equally coherent recreation of the original coarse image. As a result, some of the required fine spatial resolution information cannot be recovered in the process, particularly for heterogeneous landscapes and boundaries between land cover types. That is, there is unavoidable information loss (IL) in downscaling solutions, where IL is defined as the gap between the ideal fine spatial resolution image (i.e., the reference image) and the actual downscaling solution (e.g., one based on spatial prediction using ATPK), as shown in Fig. 1. IL is defined in contrast to information gain (IG), which refers to the gain of the downscaling solution over the original coarse image. The relation between the input coarse image and the ideal downscaling solution can be summarized in (1). Although the objective of downscaling is to minimize the IL, such loss always exists and is never zero. If the IL can be predicted, it can compensate the ATPK-based predictions to achieve more accurate downscaling predictions

$$\text{Ideal solution} = \underbrace{\text{Coarse image} + \text{IG}}_{\text{Downscaling solution}} + \text{IL}. \tag{1}$$

B. Potential Solutions to IL Prediction
1) Learning-Based Solution: For downscaling in real applications, the reference (i.e., the ideal solution) is always unavailable (otherwise there is no need for downscaling). Thus, the IL for the study area at the required fine spatial resolution cannot be predicted straightforwardly. A plausible solution to predict IL for a downscaling prediction is to find the relation between the downscaling prediction (or original coarse image, as input) and the IL (as output) based on the training data and apply the fitted model to the downscaling prediction of the study area. The training images need to be at the same spatial resolution as the target fine spatial resolution for downscaling and, more importantly, need to have a spatial pattern similar to that of the study area [6]. In most cases, there may not be easy access to such demanding training data. Alternatively, the fitted model could be predicted based on a self-example scheme [7]: the coarse image of the study area is upscaled to a coarser spatial resolution, and the original coarse image is treated as the ideal solution to calculate the IL. In this scheme, however, the IL is predicted at the original coarse spatial resolution. For remote sensing data, the spatial content can be different when the spatial resolution varies. For example, the roads and buildings are visible in a 5-m spatial resolution image but may "disappear" at a coarser (e.g., 20-m) spatial resolution.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
2) Multi-Resolution Image Fusion-Based Solution: With the development of satellite sensors such as WorldView, QuickBird, IKONOS, SPOT, Landsat ETM+, and, more recently, Sentinel-2 Multispectral Imager (MSI) [8], the Earth's surface can be observed at different spatial resolutions in different wavebands. The finer spatial resolution images in some wavebands [e.g., 15-m panchromatic (PAN) band in Landsat ETM+ or 10-m bands in Sentinel-2 MSI] have been used to guide the downscaling process for coarser spatial resolution images in other wavebands (e.g., 30-m multispectral bands in Landsat ETM+ or 20-m bands in Sentinel-2 MSI). This process is commonly known as multi-resolution image fusion in remote sensing, which has received increasing attention in recent years, especially in relation to reliable monitoring.
Multi-resolution image fusion methods were originally developed for the case of fusing a single PAN band (also termed pan-sharpening in remote sensing). Recently, Selva et al. [26] investigated the extension of the methods to the more general case of fusing more than one fine spatial resolution band, which is also termed "hypersharpening." Specifically, two schemes (i.e., the selected band and synthesized band schemes) are summarized for using multiple fine spatial resolution bands.

D. Proposed IL-Guided Image Fusion Approach
With the availability of fine spatial resolution data in some wavebands, the IL in downscaling for these bands can be quantified by downscaling the coarse data (simulated by upscaling the known fine-resolution data) and comparing the predictions with the known fine spatial resolution image. The fine spatial resolution bands can be treated as training data, and the IL in these bands can be used to predict the IL in downscaling coarse images in other wavebands. On this basis, a new IL-guided image fusion (ILGIF) approach is proposed for fusing multi-resolution images. Based on the ATPK solution to the change of support problem (COSP), the ILGIF prediction is the combination of the ATPK prediction for the coarse image and the corresponding prediction for the IL.
According to one of the protocols in [27], any fused synthetic image, once degraded to its original spatial resolution, should be as identical as possible to the original coarse image. This has been a great challenge for the existing image fusion methods. As a new multi-resolution image fusion method based on a new conceptualization, ILGIF has the appealing merit of perfectly preserving the spectral properties of the original coarse images (which can be demonstrated mathematically) and, thus, satisfies the aforementioned protocol. Moreover, ILGIF accounts for the PSF of the sensor and is easy to implement. ILGIF is suitable for the fusion of PAN and multispectral images (i.e., the standard pan-sharpening problem) and the fusion of multispectral and multi/hyperspectral images (i.e., where two groups of images are in different wavelength ranges).
The remainder of this article is organized into four sections. Section II introduces briefly the principles of ILGIF. Section III provides the experimental results for two data sets used to validate ILGIF. Further issues related to ILGIF and open avenues for future research are discussed in Section IV. Finally, Section V concludes this article.

A. Problem Formulation
Let $Z_C^l(x_i)$ be the measurement of coarse pixel $C$ centered at $x_i$ ($i = 1, \ldots, M$, where $M$ is the number of pixels) in coarse band $l$ ($l = 1, \ldots, L$, where $L$ is the number of coarse bands), and $Z_F^k(x_j)$ be the measurement of fine pixel $F$ centered at $x_j$ [$j = 1, \ldots, MG^2$, where $G$ is the spatial resolution (zoom) ratio] in fine band $k$ ($k = 1, \ldots, K$, where $K$ is the number of fine bands). Note that $F$ and $C$ represent the fine and coarse pixels, respectively. The objective of downscaling is to predict variables $Z_F^l(x)$ for all fine pixels in all $L$ coarse bands. In the proposed ILGIF method, the process consists of ATPK-based downscaling and IL estimation. Denoting the predictions of ATPK and IL as $\hat{Z}_{FA}^l(x)$ and $\hat{Z}_{FI}^l(x)$, respectively, the ILGIF prediction is

$$\hat{Z}_F^l(x) = \hat{Z}_{FA}^l(x) + \hat{Z}_{FI}^l(x). \tag{2}$$

The calculation of ATPK and IL predictions is detailed in Sections II-B-II-D.
Fig. 2. Flowchart of the proposed ILGIF method, where red and blue lines represent the ATPK and IL prediction processes, respectively. A coarse band $l$ is used as an example, and the process is implemented for each coarse band in turn.
Fig. 2 shows the flowchart of the proposed ILGIF method, where a coarse band $l$ is used as an example for illustration. The implementation of ILGIF is summarized by the following steps.
1) Coarse band $l$ is downscaled to the fine spatial resolution using ATPK. This step is detailed in Section II-B.
2) The ILs for the $K$ fine bands in other wavebands are calculated [see (8), (9), and (11)]. This step is detailed in Section II-C.
3) The $K$ weights transforming the ILs for the $K$ fine bands to that for the coarse band are calculated using geographically weighted regression (GWR). This step is detailed in Section II-D.
4) The IL for the coarse band is calculated [see (13)] and added to the ATPK prediction in step 1 [see (2)].
5) Steps 1-4 are performed for all $L$ coarse bands.
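The five steps above can be sketched end to end in a few lines of Python. This is only an illustrative sketch under strong simplifying assumptions that are not part of the method as described: block averaging stands in for PSF-based upscaling, pixel replication stands in for ATPK (it is coherent under block averaging, but it is not kriging), and a single global least-squares fit stands in for GWR. The function names `upscale`, `downscale`, and `ilgif_sketch` are hypothetical.

```python
import numpy as np

def upscale(fine, G):
    """Block-average a (H*G, W*G) image to (H, W); stand-in for PSF convolution."""
    H, W = fine.shape[0] // G, fine.shape[1] // G
    return fine.reshape(H, G, W, G).mean(axis=(1, 3))

def downscale(coarse, G):
    """Replicate each coarse pixel G x G times; a coherent stand-in for ATPK."""
    return np.kron(coarse, np.ones((G, G)))

def ilgif_sketch(coarse_bands, fine_bands, G):
    """coarse_bands: (L, H, W); fine_bands: (K, H*G, W*G). Returns (L, H*G, W*G)."""
    K = fine_bands.shape[0]
    # Step 2: simulate coarse versions of the fine bands and quantify their IL.
    sim_coarse = np.stack([upscale(f, G) for f in fine_bands])
    il_fine = np.stack([f - downscale(c, G) for f, c in zip(fine_bands, sim_coarse)])
    # Design matrix: K simulated coarse bands plus a column of ones (intercept).
    P = np.column_stack([c.ravel() for c in sim_coarse] + [np.ones(sim_coarse[0].size)])
    fused = []
    for band in coarse_bands:                                     # Step 5: every coarse band
        initial = downscale(band, G)                              # Step 1
        alpha, *_ = np.linalg.lstsq(P, band.ravel(), rcond=None)  # Step 3 (global fit, not GWR)
        il_coarse = np.tensordot(alpha[:K], il_fine, axes=1)      # Step 4: weighted sum of ILs
        fused.append(initial + il_coarse)                         # initial prediction + IL
    return np.stack(fused)
```

Because both stand-ins are consistent with block averaging, `upscale` applied to each fused band returns the corresponding input coarse band, which mirrors the coherence argument made later in the text.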

B. Area-to-Point Kriging
For a fine pixel centered at $x_0$ in band $l$, the ATPK-based downscaling prediction can be simply described as a linear combination of the neighboring coarse pixels

$$\hat{Z}_{FA}^l(x_0) = \sum_{i=1}^{N} \lambda_i Z_C^l(x_i) \tag{3}$$

where $\lambda_i$ is the weight for the $i$th coarse neighboring pixel centered at $x_i$ and $N$ is the number of coarse neighbors. The $N$ weights are calculated according to the kriging matrix as follows:

$$\begin{bmatrix} \gamma_{CC}^l(x_1, x_1) & \cdots & \gamma_{CC}^l(x_1, x_N) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \gamma_{CC}^l(x_N, x_1) & \cdots & \gamma_{CC}^l(x_N, x_N) & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ \theta \end{bmatrix} = \begin{bmatrix} \gamma_{FC}^l(x_0, x_1) \\ \vdots \\ \gamma_{FC}^l(x_0, x_N) \\ 1 \end{bmatrix}. \tag{4}$$

In (4), $\gamma_{CC}^l(x_i, x_j)$ is the coarse-to-coarse semivariogram between coarse pixels centered at $x_i$ and $x_j$ in band $l$, $\gamma_{FC}^l(x_0, x_j)$ is the fine-to-coarse semivariogram between fine (to be predicted) and coarse pixels centered at $x_0$ and $x_j$ in band $l$, and $\theta$ is the Lagrange multiplier. Let $s$ be the Euclidean distance between the centroids of any two pixels, $\gamma_{FF}^l(s)$ be the fine-to-fine semivariogram between two fine pixels, and $h_C^l(s)$ be the PSF for band $l$. $\gamma_{FC}^l(s)$ and $\gamma_{CC}^l(s)$ in (4) are calculated by convolving $\gamma_{FF}^l(s)$ with the PSF $h_C^l(s)$ as follows:

$$\gamma_{FC}^l(s) = \gamma_{FF}^l(s) * h_C^l(s) \tag{5}$$

$$\gamma_{CC}^l(s) = \gamma_{FF}^l(s) * h_C^l(s) * h_C^l(-s) \tag{6}$$

where $*$ is the convolution operator. The key issue becomes the estimation of the fine-to-fine semivariogram $\gamma_{FF}^l(s)$. If any prior spatial structure information at the target fine spatial resolution for the band is available, it can be used readily for estimation. However, such information is not always available in reality. In this case, its estimation is achieved based on deconvolution, where the original coarse data are treated as real data. The optimal solution to the fine-to-fine semivariogram is identified as the one that, when convolved with the PSF according to (6), is the same as the areal semivariogram. Details of the several approaches for deconvolution can be found in the literature [4], [6].
An appealing advantage of ATPK is that the prediction has perfect coherence with the input coarse image. That is, once the ATPK prediction is upscaled to the original coarse resolution, it is exactly the same as the original coarse data [2], [3]

$$\hat{Z}_{FA}^l(x) * h_C^l(x) = Z_C^l(x). \tag{7}$$
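As a minimal sketch of how the weights and the Lagrange multiplier might be obtained from the kriging system, the following NumPy snippet assembles and solves the bordered matrix. The function name `kriging_weights` is hypothetical, and the toy example uses a plain linear point semivariogram $\gamma(s) = s$ purely for illustration; in ATPK the entries would instead be the PSF-regularized semivariograms $\gamma_{CC}^l$ and $\gamma_{FC}^l$.

```python
import numpy as np

def kriging_weights(gamma_cc, gamma_fc):
    """Solve the bordered kriging system for N weights and the Lagrange multiplier.

    gamma_cc: (N, N) semivariogram values between the N coarse neighbours.
    gamma_fc: (N,) semivariogram values between the target fine pixel and each neighbour.
    """
    N = gamma_cc.shape[0]
    A = np.ones((N + 1, N + 1))      # last row/column of ones enforce sum(weights) = 1
    A[:N, :N] = gamma_cc
    A[N, N] = 0.0
    b = np.append(gamma_fc, 1.0)
    sol = np.linalg.solve(A, b)
    return sol[:N], sol[N]

# Toy example (assumed geometry): four neighbours on the corners of a unit square,
# target at the centre, linear semivariogram gamma(s) = s.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
target = np.array([0.5, 0.5])
gamma_cc = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
gamma_fc = np.linalg.norm(pts - target, axis=-1)
lam, theta = kriging_weights(gamma_cc, gamma_fc)
```

By symmetry, the four weights in this toy configuration are equal, and they sum to one because of the unbiasedness constraint encoded in the last row of the matrix.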

C. Information Loss
As downscaling is an ill-posed, inverse problem, there exists unavoidable IL in ATPK prediction when compared with the ideal prediction (reference). The IL may not be an important problem for homogeneous landscapes, but it is crucial for the restoration of heterogeneous landscapes with great spatial variation. For more reliable downscaling, it is important to predict the IL in ATPK predictions. In this article, it is achieved using the $K$ available fine spatial resolution images $Z_F^1, Z_F^2, \ldots, Z_F^K$ in other wavebands. The process is detailed below.
1) Upscaling the $K$ Fine Bands to Simulated Coarse Bands: Each fine spatial resolution image is upscaled to match the spatial resolution of the coarse image

$$Z_C^k(x) = Z_F^k(x) * h_C^l(x). \tag{8}$$

2) ATPK-Based Downscaling for the Simulated $K$ Coarse Bands: ATPK is performed on the simulated coarse image $Z_C^k$ for band $k$ to downscale it back to the fine spatial resolution. Similar to (3), the prediction for a fine pixel centered at $x_0$ in band $k$ is

$$\hat{Z}_{FA}^k(x_0) = \sum_{i=1}^{N} \beta_i Z_C^k(x_i) \tag{9}$$

in which $\beta_i$ is the weight for the $i$th coarse neighbor. The weights are calculated in the same way as in (4). Based on the perfect coherence property of ATPK, the simulated coarse image $Z_C^k$ can be reproduced exactly when the ATPK prediction $\hat{Z}_{FA}^k(x)$ is upscaled to the coarse spatial resolution

$$\hat{Z}_{FA}^k(x) * h_C^l(x) = Z_C^k(x). \tag{10}$$

3) Calculating the ILs for the $K$ Fine Bands: Since the reference for $\hat{Z}_{FA}^k(x_0)$ is known, the IL in the ATPK prediction for the fine pixel at $x_0$ in band $k$ is quantified as follows:

$$Z_{FI}^k(x_0) = Z_F^k(x_0) - \hat{Z}_{FA}^k(x_0). \tag{11}$$

From (8) and (10), we can conclude an important property of the quantified IL: once it is upscaled to the coarse spatial resolution, it is zero

$$Z_{FI}^k(x) * h_C^l(x) = Z_F^k(x) * h_C^l(x) - \hat{Z}_{FA}^k(x) * h_C^l(x) = Z_C^k(x) - Z_C^k(x) = 0. \tag{12}$$

4) Calculating the ILs for the $L$ Coarse Bands: The ILs of the $K$ fine bands are used to predict the ILs in the $L$ coarse bands. Specifically, the IL in ATPK prediction for a pixel centered at $x$ in coarse band $l$ [i.e., $\hat{Z}_{FI}^l(x)$ in (2)] is assumed to be a linear combination of all $K$ ILs in the $K$ available fine bands

$$\hat{Z}_{FI}^l(x) = \sum_{k=1}^{K} \alpha_k(x) Z_{FI}^k(x) \tag{13}$$

where $\alpha_k(x)$ is the weight for the $k$th fine band. The weights are determined according to the relation between the coarse band $l$ and the $K$ fine bands. That is, a larger weight will be assigned to band $k$ if the relation between images $Z_C^l$ and $Z_C^k$ is stronger, and vice versa.
As acknowledged widely, the spatial structure of land cover always varies spatially [28], [29].For images composed of pixels, the relation between the coarse and fine bands is not fixed and it is a function of the pixel.This requires a nonstationary spatially adaptive model to characterize the relation (e.g., a fitting model in a local window).Moreover, in the local window, pixels can exert different effects on the center, as their spatial distances to the center are not the same.Thus, it would be more reasonable to quantify their influence according to spatial distance.On this basis, the GWR model [30] is proposed to predict the weights in (13).

D. Geographically Weighted Regression-Based Weight Estimation
GWR has been used widely in spatial analysis [31] and data assimilation [32], [33]. The model can relate data from different sources or platforms. For example, GWR was used to relate field data (e.g., PM2.5) to satellite sensor data [32] and normalized difference vegetation index (NDVI) to rainfall [33]. GWR can also relate data acquired from the same platform, such as filling the missing data [due to cloud or scan line corrector (SLC)-off] in remote sensing images [34] using temporally close, complete data.
In this article, GWR is applied for the estimation of $\alpha_k(x)$ in (13) by relating remote sensing data acquired in different wavebands. GWR is a local model that accounts explicitly for the spatial nonstationarity between the dependent and independent variables. Moreover, it allows the contributions from neighbors to vary according to their distances to the center pixel [33]. With the coarse band $l$ and $K$ fine bands, the GWR model is constructed as follows:

$$Z_C^l(x) = \sum_{k=1}^{K} \alpha_k(x) Z_C^k(x) + \alpha_0(x). \tag{14}$$

In (14), $\alpha_0(x)$ is the intercept. Let $\mathbf{P}(x)$ be an $N_0 \times (K + 1)$ matrix composed of the coarse pixel values of all $K$ bands $Z_C^1, Z_C^2, \ldots, Z_C^K$ [produced according to (8)] in the local window (including $N_0$ pixels for each band) centered at $x$, with the last column being an $N_0 \times 1$ vector of ones; $\mathbf{Q}(x)$ be an $N_0 \times 1$ vector composed of the coarse pixel values of the local window centered at $x$ in coarse band $l$; and $\mathbf{W}(x)$ be an $N_0 \times N_0$ spatial weighting diagonal matrix. The $K$ weights for the pixel, included in a $(K + 1) \times 1$ vector, are predicted by

$$\hat{\boldsymbol{\alpha}}(x) = [\hat{\alpha}_1(x), \ldots, \hat{\alpha}_K(x), \hat{\alpha}_0(x)]^T = [\mathbf{P}^T(x) \mathbf{W}(x) \mathbf{P}(x)]^{-1} \mathbf{P}^T(x) \mathbf{W}(x) \mathbf{Q}(x). \tag{15}$$

As seen from (15), the matrices $\mathbf{P}(x)$ and $\mathbf{Q}(x)$ constructed from a local window result in weights varying on a pixel basis, which can cope with spatial nonstationarity. Furthermore, the diagonal elements in $\mathbf{W}(x)$ ensure that pixels near to the location $x$ have more influence on the prediction than pixels further away [33]. They can be determined based on a bisquare function

$$W_{ii}(x) = \begin{cases} [1 - (d_i / H)^2]^2, & d_i \le H \\ 0, & \text{otherwise} \end{cases} \tag{16}$$

in which $d_i$ is the distance between the $i$th neighboring pixel and the center pixel at $x$, and $H$ is the bandwidth for the kernel.
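The local weighted least-squares fit with a bisquare kernel can be sketched as follows. This is a minimal illustration for a single window, not the authors' implementation; the function names `bisquare` and `gwr_weights`, the 5 × 5 window, and the bandwidth of 3 pixels are all assumptions made for the example.

```python
import numpy as np

def bisquare(d, H):
    """Bisquare kernel: [1 - (d/H)^2]^2 inside the bandwidth H, zero outside."""
    d = np.asarray(d, dtype=float)
    return np.where(d <= H, (1.0 - (d / H) ** 2) ** 2, 0.0)

def gwr_weights(Zk_window, Zl_window, dists, H):
    """Weighted least squares for one local window.

    Zk_window: (N0, K) simulated coarse values of the K fine bands in the window.
    Zl_window: (N0,) coarse values of band l in the window.
    dists:     (N0,) distances from each window pixel to the centre pixel.
    Returns the K band weights and the intercept.
    """
    N0, K = Zk_window.shape
    P = np.hstack([Zk_window, np.ones((N0, 1))])   # last column of ones -> intercept
    w = bisquare(dists, H)
    PtW = P.T * w                                  # same as P.T @ diag(w)
    alpha = np.linalg.solve(PtW @ P, PtW @ Zl_window)
    return alpha[:K], alpha[K]

# Toy 5 x 5 window in which band l is an exact linear function of two fine bands.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[-2:3, -2:3]
dists = np.hypot(yy, xx).ravel()
Zk = rng.random((25, 2))
Zl = 2.0 * Zk[:, 0] - 0.5 * Zk[:, 1] + 3.0
a, a0 = gwr_weights(Zk, Zl, dists, H=3.0)
```

In a full implementation this fit would be repeated with the window centered on every coarse pixel, so that the weights vary across the image.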

E. Coherence Property of ILGIF
As given in (2), the final ILGIF prediction $\hat{Z}_F^l(x)$ is a combination of the ATPK prediction $\hat{Z}_{FA}^l(x)$ in (3) and the IL prediction $\hat{Z}_{FI}^l(x)$ in (13). Combining (7), (12), and (13), we can derive the following important property of the ILGIF prediction:

$$\hat{Z}_F^l(x) * h_C^l(x) = \hat{Z}_{FA}^l(x) * h_C^l(x) + \hat{Z}_{FI}^l(x) * h_C^l(x) = Z_C^l(x) + 0 = Z_C^l(x). \tag{17}$$

It means that once the ILGIF prediction is upscaled to the coarse spatial resolution, it is exactly the same as the original coarse input $Z_C^l$, that is, it has the perfect coherence property. It should be noted that such a property is not affected by the specific value of weights $\alpha_k(x)$ and the specific form of PSF (as long as a consistent PSF is used in the whole process of ILGIF).
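The independence of the coherence property from the weight values can be checked numerically. The sketch below is a hedged illustration under simplifying assumptions that are not taken from the article: a box PSF (block averaging) replaces the sensor PSF, pixel replication replaces ATPK as a coherent initial prediction, and each weight is held constant within a coarse pixel. The helper names `upscale` and `replicate` are hypothetical.

```python
import numpy as np

def upscale(fine, G):
    """Block-average to the coarse grid (box PSF assumption)."""
    H, W = fine.shape[0] // G, fine.shape[1] // G
    return fine.reshape(H, G, W, G).mean(axis=(1, 3))

def replicate(coarse, G):
    """Replicate each coarse pixel G x G times (coherent stand-in for ATPK)."""
    return np.kron(coarse, np.ones((G, G)))

rng = np.random.default_rng(1)
G, K = 4, 3
coarse_l = rng.random((5, 5))                  # observed coarse band l
fine_k = rng.random((K, 20, 20))               # K fine bands in other wavebands
initial = replicate(coarse_l, G)               # coherent initial prediction
il_k = np.stack([f - replicate(upscale(f, G), G) for f in fine_k])  # IL of each fine band
alpha = rng.normal(size=(K, 5, 5))             # arbitrary weights, one per coarse pixel
alpha_fine = np.stack([replicate(a, G) for a in alpha])
fused = initial + (alpha_fine * il_k).sum(axis=0)   # initial prediction + weighted ILs
max_dev = np.abs(upscale(fused, G) - coarse_l).max()
```

Here `max_dev` stays at the level of floating-point rounding even though the weights are random, because each IL averages to zero over every coarse pixel.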

A. Data and Experimental Setup
Two data sets were used for experimental validation of the proposed ILGIF method, including a WorldView-2 data set and a Sentinel-2 data set. The WorldView-2 data set contains eight multispectral bands with a spatial resolution of 2 m and a PAN band with a spatial resolution of 0.5 m. The spatial sizes of the multispectral and PAN images are 400 × 400 pixels and 1600 × 1600 pixels, respectively. The data were acquired in April 2011 and cover an urban area in Shenzhen, China.
The Sentinel-2 data set contains four 10-m bands and six 20-m bands. It was acquired on August 18, 2015. The study area is located in Verona, Italy, and is covered mainly by a mix of vegetation and urban fabric. The data set has a spatial extent of 8 km × 8 km (400 × 400 pixels for 20-m bands and 800 × 800 pixels for 10-m bands).
For objective evaluation where fine spatial resolution data are required for examination, synthetic data sets were used (i.e., the reduced resolution case as termed in [9]). Specifically, for the WorldView-2 data set, the eight 2-m multispectral bands and 0.5-m PAN band were upscaled to 8 and 2 m by convolving them with a PSF, as shown in Fig. 3(a) and (b). Similarly, for the Sentinel-2 data set, the six 20-m and four 10-m bands were upscaled to 40 and 20 m. In all experiments, a Gaussian PSF was used and the standard deviation (size of the PSF width) was set to half of the coarse pixel size. The task of downscaling is to restore the eight 2-m WorldView-2 multispectral bands and six 20-m Sentinel-2 bands, by fusing them with the synthesized 2-m WorldView-2 PAN band and four 20-m Sentinel-2 bands, respectively. The predictions were compared to the original 2-m WorldView-2 bands and 20-m Sentinel-2 bands for objective evaluation. This scheme has been used commonly to evaluate downscaling approaches [35]. For clarity, we termed the experiments for the two data sets as pan-sharpening and multispectral sharpening.
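The reduced-resolution simulation described above can be sketched with a separable Gaussian PSF whose standard deviation is half the coarse pixel size, i.e., $G/2$ fine pixels for a zoom ratio $G$. The snippet is an assumption-laden illustration rather than the processing chain used in the experiments: the PSF is evaluated at the coarse pixel centers, the kernel is not truncated, and it is renormalized at the image borders. The function names are hypothetical.

```python
import numpy as np

def gaussian_upscale_matrix(n_coarse, G, sigma):
    """Rows hold normalised Gaussian weights linking each coarse pixel to all fine pixels (1-D)."""
    centres = np.arange(n_coarse) * G + (G - 1) / 2.0   # coarse centres in fine-pixel coordinates
    j = np.arange(n_coarse * G)
    A = np.exp(-0.5 * ((j[None, :] - centres[:, None]) / sigma) ** 2)
    return A / A.sum(axis=1, keepdims=True)             # renormalise (also handles borders)

def upscale_gaussian(fine, G):
    """Degrade a fine image by a factor G with a separable Gaussian PSF, sigma = G / 2."""
    sigma = G / 2.0                                      # half of the coarse pixel size
    A_rows = gaussian_upscale_matrix(fine.shape[0] // G, G, sigma)
    A_cols = gaussian_upscale_matrix(fine.shape[1] // G, G, sigma)
    return A_rows @ fine @ A_cols.T
```

For example, `upscale_gaussian(band, 4)` would degrade a 2-m band to 8 m, and `upscale_gaussian(band, 2)` a 20-m band to 40 m, matching the ratios used for the two data sets.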
Four CS methods (i.e., PRACS [15], GSA [13], GSA-CA [14], and BDSD [11]) and six MRA methods, i.e., ATWT [18], AWLP [19], MTF-GLP [20], MTF-GLP-CBD [21], MTF-GLP-HPM [22], and the recently developed morphological half gradient (MF-HG) [36], were considered as benchmark methods. The CS and MRA approaches use a single fine band (e.g., PAN band) for the coarse bands. Thus, a single band needs to be extracted from the set of fine bands to adapt them for multispectral sharpening. Two schemes summarized in [26] (i.e., the selected band and synthesized band schemes) were considered in the experiments. With respect to the selected band scheme, for each coarse band, the fine band with the greatest correlation [quantified by correlation coefficient (CC)] with it was selected. Regarding the synthesized band scheme, for each coarse band, a single fine band was synthesized as a linear combination of the available fine bands. The weights were determined using the multiple regression model built between the coarse band and all fine bands.
For quantitative evaluation, we used the CC, universal image quality index (UIQI), $Q2^n$ index [37], relative global-dimensional synthesis error (ERGAS), and spectral angle mapper (SAM). CC and UIQI were first calculated for each band, and the values for all bands were finally averaged. $Q2^n$ and SAM were calculated for each pixel first and then averaged. Moreover, to measure the ability to honor the original coarse data, coherence (quantified by the CC) was used. More precisely, the fused image was upscaled to the original coarse spatial resolution and compared with the original coarse image based on CC.
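Four of these indices can be sketched compactly with NumPy using their commonly cited definitions. The snippet is illustrative only and makes two assumptions that the text does not specify: UIQI is computed globally over the whole band (it is often computed in sliding windows instead), and SAM is reported in degrees. $Q2^n$ is omitted because it requires hypercomplex arithmetic. The function names and the `ratio` argument (fine pixel size divided by coarse pixel size) are hypothetical.

```python
import numpy as np

def cc(x, y):
    """Pearson correlation coefficient between two single-band images."""
    return np.corrcoef(x.ravel(), y.ravel())[0, 1]

def uiqi(x, y):
    """Global (single-window) universal image quality index for one band."""
    x, y = x.ravel(), y.ravel()
    mx, my = x.mean(), y.mean()
    cxy = ((x - mx) * (y - my)).mean()
    return 4.0 * cxy * mx * my / ((x.var() + y.var()) * (mx ** 2 + my ** 2))

def ergas(pred, ref, ratio):
    """ERGAS for (bands, H, W) stacks; ratio = fine / coarse pixel size (e.g., 1/4)."""
    rmse2 = ((pred - ref) ** 2).mean(axis=(1, 2))
    mean2 = ref.mean(axis=(1, 2)) ** 2
    return 100.0 * ratio * np.sqrt((rmse2 / mean2).mean())

def sam(pred, ref):
    """Mean spectral angle (degrees) between per-pixel spectra of (bands, H, W) stacks."""
    dot = (pred * ref).sum(axis=0)
    norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(ref, axis=0)
    angles = np.arccos(np.clip(dot / norms, -1.0, 1.0))
    return np.degrees(angles.mean())
```

Under the same conventions, the coherence measure would be `cc` applied to the fused band after it has been degraded back to the coarse grid and to the original coarse band.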

B. Experiment on Pan-Sharpening
ATPK-based downscaling for the input 8-m coarse image [Fig. 3(a)] is an important first step of ILGIF. To illustrate the advantage of ATPK-based downscaling, it was compared to the classical polynomial interpolation (with 23 coefficients). Fig. 3(c) shows the polynomial interpolation result for Fig. 3(a). Compared with the ATPK result in Fig. 3(d), the polynomial interpolation result is more blurred and the gaps between the buildings cannot be restored satisfactorily. Table I lists the accuracies for the two methods, where the advantage of ATPK is obvious from the quantitative comparison. More precisely, ATPK increases the $Q2^n$ and UIQI by around 0.10 and 0.07, respectively. The more satisfactory performance of ATPK mainly lies in the ability to account for the size of support and PSF and, more importantly, the preservation of the original data [i.e., coherence property, see (10)].
ATPK was performed on the 8-m upscaled PAN image, and the 2-m IL [in units of digital number (DN)] produced by comparing to the 2-m reference PAN in Fig. 3(b) is shown in Fig. 3(e). It is seen that for the boundaries of the small-sized buildings, there exists relatively large uncertainty in downscaling. Based on GWR, the IL in Fig. 3(e) was then used for the estimation of the IL in ATPK-based restoration of the 2-m multispectral bands [i.e., the prediction in Fig. 3(d)]. By adding the IL to the ATPK prediction, the final ILGIF prediction was produced, as shown in Fig. 3(f). It is clear that by adding IL, the smoothing effect in the ATPK result was obviously reduced and much more spatial detail was reproduced. As a result, the ILGIF result is much more similar to the reference in Fig. 3(g).
The ten benchmark methods were implemented. For a clearer comparison of the results, all fused images were compared with the reference in Fig. 3(g) to produce the error maps in Fig. 4. It is visually clear that the proposed ILGIF method has the smallest error among all methods, especially for the restoration of building boundaries (heterogeneous features).
Table II lists the quantitative assessment results for all 11 methods. Compared with the results in Table I, it is seen that the accuracies of the image fusion methods are greater than that for the method using only the input coarse image. For example, both CC and UIQI are increased by about 0.12 from ATPK to ILGIF. Focusing on the result in Table II, GSA and GSA-CA have very similar performance and their ERGASs are smaller than 1.6. Both are more accurate than the other two CS methods (i.e., PRACS and BDSD). Among the MRA methods, MTF-GLP, MTF-GLP-CBD, and MTF-GLP-HPM tend to be more accurate. However, the accuracies of both the CS and MRA methods are smaller than that of the proposed ILGIF method. ILGIF produced the largest CC, UIQI, and $Q2^n$ and the smallest ERGAS and SAM. Regarding coherence, ILGIF produced a value very close to the ideal value of 1, suggesting its perfect coherence property. All 11 methods were also implemented for the full resolution case, that is, the fusion of the eight 2-m multispectral bands and the 0.5-m PAN band to create an eight-band, 0.5-m multispectral image. The quality with no reference (QNR) index [38] was used to evaluate the methods quantitatively. As claimed by Alparone et al. [38], consistency can also give a reliable assessment of the relative performance of image fusion methods at full resolution, and it tends to be superior to the commonly used QNR metrics. Thus, the coherence was also used here. The results for the two indices are shown in Table III. Comparing the QNR values, ILGIF can produce greater accuracy than the benchmark methods except PRACS. Checking the coherence values, however, ILGIF has the largest value, suggesting that its result is the most accurate.

C. Experiment on Multispectral Sharpening
The 20-m downscaling results of polynomial interpolation and ATPK for the input 40-m coarse Sentinel-2 images in Fig. 5(a) are shown in Fig. 5(c) and (d), respectively. Again, ATPK can reproduce more spatial details. For example, in Fig. 5(d), the linear features of the urban fabric can be observed more clearly. The advantage is also supported by the quantitative assessment in Table IV. Furthermore, by adding ILs derived from the four 20-m bands to the ATPK result, the ILGIF result in Fig. 5(f) is more accurate and much closer to the reference in Fig. 5(c).
The error maps for all 11 methods are shown in Fig. 6. For each benchmark method, the results for both selected and synthesized band schemes are exhibited. As seen from the results, ILGIF has the smallest error among all cases, which can be observed clearly by checking the locations of the rivers. Table V also indicates that ILGIF produces greater accuracies than the ten benchmark methods, no matter whether the selected or synthesized band scheme is applied. More precisely, the CCs and UIQIs of the ten methods are below 0.99, but ILGIF produced a CC and UIQI of 0.99. The ERGASs of the ten methods are all above 2.5 (even exceeding 3.4 for PRACS with both schemes), but for ILGIF, it is about 2.
In addition, the coherence value of ILGIF is almost the ideal value of 1.

D. Analysis of Alternatives for ATPK and GWR in ILGIF
To analyze the advantages of using ATPK and GWR in the proposed ILGIF method, different combinations of interpolation and IL estimation were performed for the two data sets.Table VI shows the accuracies of four combinations: 1) polynomial + global linear regression (GLR) (i.e., MTF-GLP with synthesized band scheme); 2) polynomial + GWR; 3) ATPK + GLR; 4) ATPK + GWR (i.e., the proposed ILGIF method).
By comparing ATPK + GLR (or GWR) to polynomial + GLR (or GWR), it is seen clearly that the accuracies of the two ATPK-based methods are greater than those of the two polynomial-based methods for both data sets. For example, focusing on the results for the WorldView-2 data set, the $Q2^n$ of ATPK + GWR is 0.0118 larger than that of polynomial + GWR, while the $Q2^n$ of ATPK + GLR is 0.0260 larger than that of polynomial + GLR. This means that the use of ATPK is more advantageous than polynomial interpolation in the image fusion problem, which is also consistent with the findings in Tables I and IV. When comparing ATPK (or polynomial) + GWR to ATPK (or polynomial) + GLR, it is observed that the two GWR-based methods are more accurate than the two GLR-based methods, suggesting the benefits of using the GWR scheme in IL estimation. Overall, ATPK + GWR produces the most accurate results among all four combinations.

IV. DISCUSSION
Both ATPK and GWR are popular methods in spatial statistics. The proposed ILGIF method integrates them into a single framework for multi-resolution image fusion. ATPK is employed for initial downscaling, while GWR transforms the ILs from the fine bands covering the same area, but in other wavelengths, to that for the coarse band. Based on the important property of the IL (i.e., once upscaled to the coarse spatial resolution, it is exactly zero), it is concluded from (17) that the perfect coherence property of ILGIF is not influenced by the specific value of the weights in (13). This means that any weight can lead to a prediction with perfect coherence with the original coarse data. Such a property opens doors to more powerful alternatives to GWR for weight estimation.
In the experiments, when comparing the ILGIF predictions to the reference (ideal downscaling solution), there still exist gaps, which means that some IL still remains. The uncertainty in IL estimation in the ILGIF method may be ascribed to the inconsistency in terms of wavelength between the coarse band and fine bands, as ILGIF treats the fine bands as training data and makes use of the ILs extracted from the fine bands. It would be worth developing more powerful models to relate the ILs from the fine bands to the coarse bands. Another possible choice for enhancement is to seek training data that fall in the same wavelength as the coarse band. As mentioned in the Introduction, such types of data may be challenging to provide as they need to be at the target fine spatial resolution and have a spatial pattern similar to that of the study area [6]. On the other hand, a large volume of such training data may be required to achieve as accurate a prediction as possible. This also motivates the development of more intelligent training schemes, such as that based on deep learning [40].
To reduce the smoothing effect in ATPK prediction and reproduce the variation at the target fine spatial resolution, conditional simulation was developed in the literature [2], [41]. The idea of compensating ILs for the ATPK prediction in ILGIF is analogous to conditional simulation. However, they are substantially different. Specifically, for conditional simulation, an unconditional simulation at fine spatial resolution is produced first and then upscaled to match the spatial resolution of the input coarse data. The ATPK prediction for the simulated coarse data is compared to the available unconditional simulation, and the difference (analogous to the IL defined in this article) is finally added back to the ATPK prediction of the input coarse data [41]. Different unconditional simulations will lead to different predictions. Any prediction of conditional simulation has perfect coherence with the original coarse data. Admittedly, the conditional simulation scheme can increase the spatial variation of downscaling predictions, but this scheme is highly conditioned by the target spatial variation and the prediction always contains unstructured features, presenting as noise. This is because the unconditional simulation is derived from a random realization of white noise (zero-mean) without any spatial continuity. In this article, however, IL is a fixed realization derived from fine bands which contain spatial continuity information highly related to the coarse bands (these bands were acquired over the same scene). ILGIF can, therefore, be viewed as a special case of conditional simulation, where the "unconditional simulation" is actually a set of the available fine bands in different wavelengths.
Inheriting the advantages of ATPK, ILGIF accounts for the PSF and is suitable for any PSF. In the two experiments, we simulated coarse data based on the assumption of a Gaussian PSF, a filter widely used in remote sensing [42]-[44]. On the one hand, it should be noted that the sensor PSF in reality may be different from the Gaussian filter. For example, Tan et al. [45] claimed that the MODIS sensor has a scanning mirror which ensures that the shape has a directional component, and the sensor PSF was assumed to be triangular in the along-scan direction but rectangular in the along-track direction. The characterization of the real PSF remains an open problem, and the most appropriate PSF model varies for different sensors. Specifically, the PSF depends on the optics used, the detector, the scanning system exploited, and the electronics. Moreover, it can vary over time due to the aging process [46]. As mentioned earlier, however, the implementation of ILGIF is not affected by the specific form of PSF, and any PSF can be readily used in ILGIF once it is known or estimated in advance.
It is necessary to use an accurate PSF in the ILGIF method. For example, for the Sentinel-2 data set where the PSF was simulated with a Gaussian filter, when ILGIF was performed using a different square wave filter (i.e., the ideal PSF filter), the CC and UIQI of the prediction were 0.9361 and 0.9248, which are 0.054 and 0.065 smaller than those produced by the correct PSF. On the other hand, it should be stressed that when fusing images with different spatial resolutions, we are more interested in the PSF of the scale transformation than the PSF of the sensor (i.e., original measurement). It would be interesting to develop new methods to predict the mathematical formulation and corresponding parameters for the PSF in scale transformation. This is part of our ongoing research.

V. CONCLUSION
In this article, based on the concept of IL, a new method called ILGIF is proposed for image fusion. ILGIF adds the predicted IL to the initial ATPK prediction of the observed coarse image, where the IL is predicted using the ILs for fine spatial resolution bands acquired in other wavelengths. GWR is proposed to relate the two types of ILs and transform the ILs for the fine bands to the observed coarse band. ILGIF has the perfect coherence property and is suitable for pan-sharpening and fusion of multispectral and multi/hyperspectral images. Experiments on two data sets showed that ILGIF can produce more accurate results than ten benchmark methods.

Fig. 1. Definition of IG and IL in image downscaling.

TABLE I. COMPARISON BETWEEN POLYNOMIAL INTERPOLATION AND ATPK FOR THE WORLDVIEW-2 DATA SET (THE BOLD VALUES MEAN THE MOST ACCURATE RESULTS IN EACH TERM)

TABLE II. QUANTITATIVE ASSESSMENT FOR DIFFERENT METHODS FOR THE WORLDVIEW-2 DATA SET (THE BOLD VALUES MEAN THE MOST ACCURATE RESULTS IN EACH TERM)

TABLE III. QUANTITATIVE ASSESSMENT FOR DIFFERENT METHODS FOR THE WORLDVIEW-2 DATA SET AT FULL RESOLUTION (THE BOLD VALUES MEAN THE MOST ACCURATE RESULTS IN EACH TERM)

TABLE IV. COMPARISON BETWEEN POLYNOMIAL INTERPOLATION AND ATPK FOR THE SENTINEL-2 DATA SET (THE BOLD VALUES MEAN THE MOST ACCURATE RESULTS IN EACH TERM)

TABLE V. QUANTITATIVE ASSESSMENT FOR DIFFERENT METHODS FOR THE SENTINEL-2 DATA SET (THE BOLD VALUES MEAN THE MOST ACCURATE RESULTS IN EACH TERM)

TABLE VI. COMPARISON BETWEEN DIFFERENT COMBINATIONS OF INTERPOLATION AND IL ESTIMATION METHODS (THE BOLD VALUES MEAN THE MOST ACCURATE RESULTS IN EACH TERM)