Exploiting Joint Sparsity for Pansharpening: The J-SparseFI Algorithm

Recently, sparse signal representation of image patches has been explored to solve the pansharpening problem. Although these proposed sparse-reconstruction-based methods lead to promising results, three issues remained unsolved: 1) high computational cost; 2) no consideration given to the possibility of mutually correlated information in different multispectral channels; and 3) requirement that the spectral responses of the panchromatic (Pan) image and the multispectral image cover the same wavelength range, which is not necessarily valid for most sensors. In this paper, we propose a sophisticated sparse image fusion algorithm, which is named “jointly sparse fusion of images” (J-SparseFI). It is based on the earlier proposed sparse fusion of images (SparseFI) algorithm and overcomes the aforementioned three drawbacks of the existing sparse image fusion algorithms. The computational problem is handled by reducing the problem size and by proposing a fully parallelizable scheme. Moreover, J-SparseFI exploits the possible signal structure correlations between multispectral channels by introducing the joint sparsity model (JSM) and sharpening the highly correlated adjacent multispectral channels together. This is done by exploiting the distributed compressive sensing theory that restricts the solution of an underdetermined system by considering an ensemble of signals being jointly sparse. J-SparseFI also offers a practical solution to overcome spectral range mismatch between the Pan and multispectral images. By means of sensor spectral response and channel mutual correlation analysis, the multispectral channels are assigned to primary groups of joint channels, secondary groups of joint channels, and individual channels. Primary groups of joint channels, individual channels, and secondary groups of joint channels are then reconstructed sequentially, by the JSM or by modified SparseFI, using a dictionary trained from the Pan image or previously reconstructed high-resolution multispectral channels. A recipe of how to choose appropriate algorithm parameters, including the most crucial regularization parameter, is provided. The algorithm is evaluated and validated using WorldView-2-like images that are simulated using very high resolution airborne HySpex hyperspectral imagery and further practically demonstrated using real WorldView-2 images. The algorithm's performance is compared with other state-of-the-art methods. Visual and quantitative analyses demonstrate the high quality of the proposed method. In particular, the analysis of the difference images suggests that J-SparseFI is superior in image resolution recovery.


I. INTRODUCTION
M OST optical Earth observation satellites such as IKONOS, Quick Bird, GeoEye, and WorldView-2/-3 provide two separate products with complementary spatial and spectral resolutions: a single broadband channel panchromatic (Pan) image of high spatial resolution (HR) (e.g., 0.3-1 m) and a multispectral image consisting of multiple channels (typically three to eight) at a lower spatial resolution (LR) (e.g., 1.2-4 m). While the HR Pan image allows for accurate geometric analysis, the LR spectral channels provide the spectral information necessary for thematic interpretation. As a special branch of image fusion, pansharpening aims at the fusion of the HR Pan image and the corresponding LR multispectral image to meet the demands of remote sensing applications, such as feature detection, change monitoring, and land cover classification, which require both high spatial and spectral resolutions.
A large suite of pansharpening methods has been developed over the last two decades. Among the most popular methods are intensity-hue-saturation (IHS) technique [2], principal components analysis (PCA) [3], Brovey transform [4] and its improved version [5], wavelet-based methods [6], [7], hyperspectral color sharpening method [8], and Gram-Schmidt method [9]. Existing pansharpening methods often suffer from spectral distortion and from artifacts (e.g., for multiresolution analysis-based methods) [10]. The increasing demand of the remote sensing applications and the high costs of very high resolution (VHR) satellite imagery keep pushing the limits of the techniques and call for more sophisticated methods.
Recently, sparse signal representation [11]- [13] of image patches has been explored to solve the pansharpening problem. A first successful attempt is addressed in [14], where multispectral image patches are assumed to have a sparse representation in a dictionary randomly sampled from HR multispectral images acquired by "comparable" sensors. It is demonstrated that this method gives competitive or even superior performance This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/ Fig. 1. Spectral responses of the (gray narrow lines) HySpex sensor and the (colored curves for multispectral channels and black line for the Pan image) WorldView-2 imager [29]. compared with the aforementioned methods. However, since the algorithm of [14] requires training images from a-possibly nonavailable-HR multispectral sensor that is spectrally similar to the sensor at hand, its applicability is limited. For example, pansharpening of data of the highest available resolution is not possible per definition with this algorithm. To cope with this problem, a joint dictionary from oversampled LR multispectral and HR Pan images is proposed in [15]. The HR multispectral image is assumed to be sparse in this dictionary. Still, this method requires large collections of LR multispectral and HR Pan image pairs. In [16], the authors proposed a pansharpening method named sparse fusion of images (SparseFI, pronounced "sparsify") that can be used in a broader application domain. Different from [14], SparseFI explores the sparse representation of multispectral image patches in a dictionary trained only from the Pan image at hand. Therefore, no HR multispectral images from other sensors are required. It has been demonstrated that the SparseFI algorithm also does not assume any spectral composition model of the Pan image and gives robust performance against even gross spectral model errors. An extension of the SparseFI algorithm is proposed in [17], in which a two-step sparse coefficient estimation is implemented.
Although the recently proposed sparse-reconstruction-based methods lead to motivating results, there are three obvious drawbacks of this type of image fusion algorithms. 1) Compared with other conventional methods, they are computationally expensive. 2) No consideration is given to the possibility of mutually correlated information in different multispectral channels. Such correlations constitute a so far unexploited prior to the solution, the so-called joint sparsity. 3) Self-trained coupled dictionary pair-based methods, such as SparseFI and its two-step extension, can be applied to a broader application; however, they require the Pan and multispectral images to cover the same wavelength region, which is not completely valid for many sensors.
For example, the bold lines in Fig. 1 show the spectral responses of the WorldView-2 imager. Colored curves represent the eight multispectral channels, and the black line represents the spectral response of the Pan image. The spectral response of the Pan image only fully covers the blue through red edge channels and only partially covers the near-infrared-1 (NIR-1) channel. No spectral information from the coastal and nearinfrared-2 (NIR-2) channels is contained in the Pan image.
In this paper, we propose a sophisticated sparse image fusion algorithm, which is named "jointly sparse fusion of images" (J-SparseFI) algorithm. It is based on the proposed SparseFI algorithm and overcomes the aforementioned three drawbacks of the existing sparse image fusion algorithms. First, the problem related to the computational cost is addressed by the following: 1) reducing the problem size, i.e., the dictionary size, by using a coupled local dictionary pair instead of a global dictionary containing atoms of the full image and 2) proposing a framework that can be fully parallelized, i.e., doing the calculations for each patch independently, instead of in a sequential manner. Second, the proposed J-SparseFI algorithm exploits the possible signal structural correlations between highly correlated multispectral channels. This is done by introducing the joint sparsity model (JSM). It exploits the distributed compressive sensing theory [18], [19] that restricts the solution of an underdetermined system by considering an ensemble of signals being jointly sparse. Third, J-SparseFI offers a practical solution for the spectral range mismatch between the Pan and multispectral images. By means of sensor spectral response and channel mutual correlation analysis, the multispectral channels are automatically assigned to one or more primary groups of joint channels, secondary groups of joint channels, and individual channels. The highly correlated primary groups of joint channels whose spectral ranges match well the one of the Pan image are jointly sharpened by the JSM using a coupled dictionary pair built up from the Pan image. Subsequently, individual channels that suffer from sensor spectral mismatch are sharpened individually in a sequential manner by modified SparseFI using updated dictionary pairs built up from previously reconstructed high-resolution channels. Finally, J-SparseFI jointly sharpens the secondary groups of joint channels using updated dictionary pairs built up from previously reconstructed high-resolution primary groups of joint channels and individual channels. We also provide a recipe about how to tune parameters involved in the proposed algorithm, e.g., the dictionary size, patch overlap, and regularization parameters.
The remainder of this paper is organized as follows. Section II describes the data sets used in this work, i.e., the WorldView-2-like images simulated using VHR airborne visible/near-infrared (VNIR) HySpex hyperspectral images and a real WorldView-2 image. Section III briefly reviews the previously proposed but modified SparseFI algorithm, introduces the JSM, and further proposes the new J-SparseFI algorithm. In Section IV, a detailed recipe about how to tune parameters is given. Section V evaluates the performance of the proposed J-SparseFI algorithm by comparing it with other state-of-the-art methods using simulated WorldView-2 images. In Section VI, the high quality and feasibility of J-SparseFI are practically demonstrated using real WorldView-2 image. Conclusions are drawn in Section VII.

II. DATA SETS
The data sets used in this work are reported here. Simulated WorldView-2 images using HySpex data with difference resolution ratio are exploited in Sections III-V for exemplifying the proposed algorithm, parameter sensitivity analysis, and performance evaluation; real WorldView-2 data are used in Section VI for practical demonstration of the proposed algorithm.

A. Simulated WorldView-2 Images Using HySpex Data
In order to appropriately evaluate sharpening results, it is common to work with simulated data. In this paper, we use WorldView-2-like images simulated from the airborne visible to near-infrared (VNIR) HySpex data acquired over Munich, Germany, in 2012 by experts at DLR. The HySpex data are of great value as, in addition to the high quality of the instrument itself, the acquired data have been elaborately processed via a comprehensive established postprocessing chain at DLR. The HySpex sensor is characterized by submeter ground sampling distance (depending on the flight height) and 160 spectral chan-nels ranging from 0.4 to 1.0 μm. The spectral responses of both HySpex and WorldView-2 are shown in Fig. 1. The gray lines refer to the narrow spectral bands of the HySpex sensor. The bold lines refer to the spectral responses of the WorldView-2 imager, as explained in the previous section. It can be seen that the spectral channels of HySpex cover the full spectral range of the WorldView-2 imager (except for some outer bands), making it particularly suitable for the simulation of WorldView-2 imagery (and other multispectral and panchromatic data in the VNIR range).
With regard to the WorldView-2 data simulation procedure, we used the ENVI software, version 5.1, developed by Exelis. In particular, both data sets, which differ only in the spatial resolution of the low-resolution multispectral image, were generated as follows.
-High-resolution multispectral image and high-resolution Pan image: We used ENVI's Spectral Resampling tool, which allows us to simulate data from various instruments (including WorldView-2) using very accurate spectral response functions as predefined filter functions. -Low-resolution multispectral image: We used ENVI's Resize Data tool to low-pass filter and downsample the HR MS image. The same Gaussian kernel is selected to model the point spread functions (PSFs) as in [20].
The input HySpex hyperspectral data cube is shown in Fig. 2(a) and has a spatial resolution of 0.75 m and a size of 3600 × 1200 pixels. Taking the hyperspectral image as input, synthetic LR multispectral images and an HR Pan image that match the specifications of the WorldView-2 imager in terms of the spectral properties are simulated by following the aforementioned procedure. Fig. 2(b) illustrates the synthetic HR Pan image which has a spatial resolution equivalent to the hyperspectral data, i.e., 0.75 m. For most of the topographic satellites, the spatial resolution ratio F DS between the HR Pan image and the LR multispectral image is 4. To access the performance of the relevant algorithms with respect to the data provided by these sensors, Fig. 2(c) shows the simulated LR multispectral image with this representative downsampling factor, with a spatial resolution of 3 m. Moreover, in order to investigate the limits of the pansharpening methods, synthetic LR multispectral data with an extreme resolution ratio F DS of 10, i.e., with a spatial resolution of 7.5 m, are simulated and illustrated in Fig. 2(d). A reference HR multispectral image with the original spatial resolution of 0.75 m is generated using the spectral responses of the WorldView-2 imager. By comparing the fusion solutions to the reference image, the sharpening results can be appropriately evaluated. Moreover, the validity of the sparse model can be better evaluated. A zoom-in area marked by the rectangular box in Fig. 2(a) is selected and illustrated in Fig. 3. This area has a good mixture of man-made objects and vegetation. Fig. 3(a)-(d) illustrate the input hyperspectral data cube, the simulated HR Pan image, and the simulated LR multispectral images with downsampling factors of 4 and 10, respectively. In Section IV, all visual analyses of the sharpening results are performed in this test area, whereas all statistical assessments utilize the whole area. We are working with data in the unsigned 16-bit integer domain.

B. WorldView-2 Image
A real WorldView-2 image is used in Section VI for practical demonstration of the proposed methodology. Fig. 4 illustrates the RGB channel of a WorldView-2 eight-band multispectral image acquired over Munich on August 22, 2011. The scene contains 960 × 1320 m 2 pixels with a spatial resolution of 2 m. In order to have an HR reference for the reconstructed multispectral images for later quality assessment, this image was used to synthesize an 8-m low-resolution multispectral and a 2-m high-resolution Pan image as input for the final pansharpening experiments. The yellow marked area is selected for visual inspection, and the quantitative analysis is based on the whole image.

III. J-SPARSEFI ALGORITHM
Here, to better describe the J-SparseFI algorithm, its two important elements, namely, the improved SparseFI algorithm and the JSM, will be first introduced. Then, detailed algorithm steps of the J-SparseFI algorithm will be described.

A. Improved SparseFI Algorithm
The SparseFI algorithm was proposed in [16]. As an important element of the J-SparseFI algorithm, the improved SparseFI algorithm is briefly introduced here. The main improvement lies in the computational efficiency. First, to reduce the problem size, i.e., the dictionary size, a coupled local dictionary pair with a predefined size are employed, containing only atoms, i.e., patches, of the neighborhood of a patch to be sharpened, instead of a global dictionary pair. Second, in the original SparseFI algorithm, image patches are calculated in a sequential manner because the HR reconstruction of the overlap region of the target patch requires the HR reconstruction from previous patches as input in order to avoid edge artifacts.
Here, a framework that can be fully parallelized is proposed, i.e., calculations for each patch are done independently and can be distributed to multiple threads without requiring cross communication.
Pansharpening starts from a low-resolution (LR) multispectral image Y with N spectral channels and a high-resolution (HR) Pan image X 0 . It aims at increasing the spatial resolution of Y while keeping its spectral information, i.e., generating an HR multispectral image X utilizing both Y and X 0 as inputs. The improved SparseFI algorithm consists of three main steps: 1) dictionary learning; 2) sparse coefficient estimation; and 3) HR multispectral image reconstruction.
1) Dictionary Learning: The HR Pan image X 0 is low-pass filtered and downsampled by a factor of F DS (typically 4-10) such that it has a final point spread function similar to and a sampling grid identical to the multispectral channels. The resulting LR version of X 0 is called Y 0 . The LR Pan image Y 0 and the LR multispectral image Y are tiled into small, possibly, but not necessarily, partially overlapping patches y 0 and y k , where k = 1, . . . , N stands for the kth spectral channel. After subtracting the mean value y 0 and normalizing all the LR Pan patchesỹ 0 with pixel values arranged in column vectors form the matrix D l , which is called the global LR dictionary. Likewise, the HR Pan image X 0 is tiled into patches x 0 of F DS × F DS times the size of the LR Pan image patches. After subtracting the mean value x 0 and normalizing,x 0 form the global HR dictionary D h , such that each HR patch corresponds to an LR patch.ỹ 0 andx 0 are also referred to as atoms of the dictionary pair. Fig. 5 provides representative atoms in the LR/HR dictionary pair and its corresponding LR multispectral image patch using the input data from Fig. 3. From top to bottom, patches of vegetated areas, urban areas, and mixed areas are shown. From left to right, we can compare y 0 of the LR dictionary D l , x 0 of the HR dictionary D h , and y k (k = 1, . . . , 8) of the LR multi-spectral image. In this example, LR and HR patches are of sizes 5 × 5 and 50 × 50 pixels, respectively, i.e., the downsampling factor F DS = 10. By comparing the LR multispectral channels to the low-resolution Pan image patch, it can be observed that, for all representative areas, the Pan image patches are highly correlated with the multispectral image patch of the coastal to yellow channel. This confirms that it is reasonable to represent the multispectral patches using the Pan patch, at least for these first five channels. However, NIR-1 and NIR-2 are not well represented by the Pan image, and we will tackle this problem in the proposed J-SparseFI algorithm.
2) Sparse Coefficient Estimation: Let y k,p be an LR multispectral patch, indexed as p, in the kth channel. In order to construct its high-resolution version x k,p , a local HR and LR dictionary pair, which is named D * h,p and D * l,p , is formed by selecting L atoms that are spatially closest to the target patch from D h and D l , respectively. This means that, instead of a pair of excessively large global dictionaries, for each target patch, a specific local dictionary pair of considerably smaller size is selected. Similar to the dictionary atoms, the mean value y k,p of the target patch y k,p is subtracted at the first stage, i.e., y k,p = y k,p − y k,p .ỹ k,p is then modeled as a linear combination of LR atoms, i.e., of columns of the local dictionary D * l,p weighted by a coefficient vector α k,p . Since this dictionary is still overcomplete, the system is underdetermined, and hence, there may be infinitely many solutions. The typical dimension ofỹ k,p is 10-100, and the typical number of atoms in D * l,p is 400-5000. Since we only need far fewer atoms to represent y k,p , α k,p is very sparse. Incorporating this constraint into the possible solutions, for each LR multispectral patch y k,p , the sparse coefficient vector α k,p is estimated by an L 1 −L 2 minimization, i.e., To overcome the amplitude bias introduced by the L 1 norm approximation of the L 0 norm, we correct the estimate of the sparse coefficientα k,p using the SL1MMER algorithm [21], [22].
3) HR Multispectral Image Reconstruction: Each of the HR image patches x k,p is also split into two parts, namely, the mean value x k,p and the zero-mean variationsx k,p = x k,p − x k,p . Since x k,p and y k,p represent the same object at different resolution scales, it is reasonable to assume x k,p = y k,p . Furthermore,x k,p is assumed to share the same sparse coefficients as its corresponding LR versionỹ k,p in the coupled HR/LR dictionary pair, i.e., the coefficients ofx k,p in D * h,p are identical to the coefficients ofỹ k,p in D * l,p . Hence, the final sharpened multispectral image patches x k,p are reconstructed by simply replacing the downsampled low-resolution Pan patches by the corresponding high-resolution ones in the linear combination, i.e.,x The tiling and summation of all patches in all individual channels finally gives the desired pansharpened imagex. For the overlapping areas, an average of all estimates at the corresponding pixel is computed as the final reconstruction. In this fashion, the most computationally expensive step-sparse coefficient estimation-can be solved independently for individual patches, and hence renders the framework fully parallelizable.

B. JSM
Another important element of the J-SparseFI algorithm is the JSM. Corresponding image patches in different multispectral channels represent the same physical objects. Hence, it is plausible to assume that the same geometric features are present in all highly correlated channels, as demonstrated in [23]. For instance, in Fig. 5, one can observe that the coastal through red channels are highly similar and that, for different representative patches, the same geometrical structure appears. This can be explained by the spectral profiles of different materials. In Fig. 6, the green, blue, and red curves are the spectral profiles, i.e., reflectances at different wavelengths, of p1-p3 marked in Fig. 5, which are pure pixels containing building roof, vegetation, and street, respectively. The crosses mark their profiles in the multispectral image that consists of only eight bands. Their values vary smoothly for the first five channels. It is therefore reasonable to assume that the coefficient vectors for the same patch at different channels are expected to be of similar structure. The SparseFI model can be extended to the JSM by considering this possible signal correlation between K (K ≤ N ) individual multispectral channels. As shown in Fig. 7, the JSM shares the same dictionary learning step as SparseFI. The main difference comes in the sparse coefficient estimation. Let us construct the jointly sparse representation by arranging the measurements, the sparse coefficients to be estimated, and the signals to be reconstructed in individual channels side by side and form the matrices as follows: We can recover the sparse coefficients in all correlated channels simultaneously by the mixed L 1,2 minimization, i.e., where • F is the Frobenius matrix norm, i.e., the square root of sum of squares of all matrix elements accounting for the residues. • 1,2 is the mixed norm, i.e., the sum of the L 2 norms of the columns of a matrix. The L 1,2 norm regularization promotes sparsity along the columns of the matrix while minimizing the energy along the rows. This minimization favors nonzero coefficients in the multispectral channels at the same positions. A simple example may illustrate this: Let us assume that we have only two channels and only one nonzero coefficient in each of them, i.e., α 1 and α 2 . If these are at different positions, the L 1,2 norm is |α 1 | + |α 2 |. If they are at the same position, the norm is α 2 1 + α 2 2 , which is always smaller than the sum of the magnitudes (triangle inequality). Hence, the same positions are preferred by minimizing this norm. As a result, this minimization favors the sparse solution having similar structure in different columns. That is, the image patches in different channels tend to be represented as linear combinations of the same atoms in the dictionary, but with different weights. λ is again the multiplier, balancing the joint sparsity of the solution and the fidelity of the approximation toỹ p .
Similarly, the final sharpened multispectral image patches x are reconstructed byx where y p is the mean gray value vector of the target patch in all involved channels in LR which should be the same for the corresponding HR ones. The tiling and summation of all patches finally gives the desired pansharpened imageX.
Obviously, the JSM is based on the precondition that the Pan image, from which the dictionary pair is trained, provides adequate coverage of the wavelength range of the jointly considered multispectral bands, i.e., the dictionary pair can represent the patches of the multispectral image in all channels. As mentioned in the introduction, this is not always valid. In the J-SparseFI algorithm proposed here, all these practical considerations will be taken into account.

C. J-SparseFI Algorithm
As mentioned in Section I, methods that directly train the dictionary pair from the Pan image, such as SparseFI and its two-step extension, can be applied to a broader range of problems in image fusion. However, they require the Pan and multispectral images to cover the same wavelength region which is not fully valid for most of the sensors. Here, we propose a more sophisticated fusion strategy to tackle this problem. The workflow is illustrated in Fig. 8. It jointly uses the modified SparseFI and the proposed JSM methods. It consists of four main steps, namely, sensor spectral response and channel mutual correlation analysis, JSM for primary groups of joint channels, modified SparseFI for individual channels, and JSM for secondary groups of joint channels.

1) Sensor Spectral Response Analysis and Channel Mutual Correlation Analysis:
In addition to the aforementioned modified J-SparseFI algorithm and the JSM model, the key step of the J-SparseFI algorithm is a sensor spectral response analysis and a channel mutual correlation analysis. The goal of this step is to automatically assign N multispectral channels to either of the following: primary group(s) of joint channels defined as group(s) of adjacent channels with high mutual correlation and within the wavelength range well covered by the Pan image; secondary group(s) of joint channels defined as group(s) of adjacent channels with high mutual correlation and outside or partially outside of the wavelength range covered by the Pan image; individual channels defined as the rest of the channels. To overcome the problem caused by the aforementioned mismatch of the wavelength range between Pan and multispectral channels, we sharpen the multispectral channels in a sequential fashion. For the channels which cannot be well represented by the Pan image, previously sharpened multispectral channels will be used for dictionary training. By analyzing the sensor spectral responses and channel mutual correlation, the source images used for dictionary training for the individual SparseFI or JSM problem are automatically identified. The detailed procedure is described in Table I. Based on the mutual correlation matrix between the multispectral channels and the downsampled Pan image, submatrices centered along the diagonal are investigated. These submatrices are called blocks and represent the mutual correlation matrices of involved adjacent channels. Block mutual correlation is defined as the minimum correlation value in a block. For example, if four adjacent channels form a block and the block mutual correlation is larger than 0.9, it means that the mutual correlation between each two channels is equal to or higher than 0.9.
2) JSM for Primary Group of Joint Channels: To reconstruct the identified primary groups of joint channels in HR, local dictionary pairs are built up from the Pan image whose information content is representative for these channels. The described JSM method is used to jointly sharpen these channels.
3) Modified SparseFI for Individual Channels: For individual channels, the modified SparseFI algorithm is applied in a sequential manner using the dictionary pair built up from a source image, either the Pan image or the previously reconstructed HR multispectral channels, identified from the sensor spectral response and channel mutual correlation analysis step.

4) JSM for Secondary Groups of Joint Channels:
Finally, the secondary groups of the joint channels are sharpened using the dictionary pair built up from the previously identified image source, i.e., among the Pan or adjacent sharpened HR channels, the channel that shows the highest correlation to them.

D. J-SparseFI Applied to WorldView-2-Like Data
The proposed J-SparseFI fusion scheme can be generally applied to data acquired using different sensors, such as IKONOS, Quick Bird, GeoEye, and WorldView-2/-3. Here, we exemplify it using WorldView-2-like data. Table II illustrates the channel mutual correlation matrix of our simulated WorldView-2 data calculated from the LR multispectral image and the downsampled Pan image. Since the difference in the mutual correlation between different channels under resolution ratios of 4 and 10 is negligible, input images with a resolution ratio of 10 are exemplified in this table. According to the procedure, from WorldView-2 spectral responses (see Fig. 1) and the channel mutual correlation of our input data, the channels 1-5 and 7 and 8 are identified as blocks, i.e., each group has mutual correlation higher than 0.9. Among them, channels 2-5 (blue, green, yellow, and red) fulfill C1 in Table I and, therefore, will be identified as the primary group of joint channels. After excluding the primary group of joint channels, the remaining block, i.e., channels 7 and 8 (NIR-1 and NIR-2), can be identified as the secondary group of joint channels. As a consequence, channel 1 (coastal) and channel 6 (red edge) are identified as individual channels. Primary groups of joint channels, individual channels, and secondary groups of joint channels are then sharpened in a sequential manner. First, the HR version of the group of joint channels (blue, green, yellow, and red) is reconstructed by JSM using the coupled dictionary pair built up from the HR Pan image and its downsampled version. Then, the coastal channel is reconstructed by modified SparseFI using an updated coupled dictionary pair built-up, instead of using the Pan image, using the previously reconstructed HR blue channel and its downsampled version, because, among the Pan or the sharpened primary group of joint channels, i.e., channels 2-5, the blue channel correlates the most with channel 1. The red edge channel is reconstructed by modified SparseFI using a dictionary pair trained from the HR Pan image and its downsampled image. Finally, the NIR-1 and NIR-2 channels are jointly reconstructed by JSM using a dictionary pair of the previously reconstructed HR red edge channel and its downsampled version, because of its relatively highest correlation to the target joint channels.

IV. RECIPE FOR CHOOSING THE TUNING PARAMETERS
Due to the fact that we are working with images simulated from the airborne HySpex data with the reference image at hand, we are able to analyze the impact of the tuning parameters involved in the proposed algorithm. Based on this analysis, a recipe of how to choose parameters, including regularization parameter, overlap size, and dictionary size, described by the number of nearest patches/atoms (NNP), will be provided here.

A. Evaluation Metrics
For quantitative assessment, the utilized assessment metrics include [24]- [27] the following.
• Correlation coefficient (CC): The correlation coefficient of the pansharpened image and the reference image mea-sures the similarity of spectral features. For the kth channel, it is defined as where i and j are the spatial indices; and X k andX k are the mean gray values of the reference and reconstructed images of the kth band, respectively.

• Relative dimensionless global error in synthesis (ERGAS):
This measure reflects the overall quality of pansharpened images. It represents the difference between pansharpened and reference images. For the kth channel, it is defined as where RMSE k are the rmse of the kth band. A small ERGAS value means small spectral distortion. For K channels, it is generally defined as • Spectral angle mapper (SAM): It denotes the absolute value of the angle between the true and estimated spectral vectors, i.e., A value of SAM equal to zero denotes absence of spectral distortion, but radiometric distortion is possible (the two pixel vectors are parallel but have different lengths). SAM is usually averaged over the whole image to yield a global measurement of spectral distortion. In this paper, SAM is measured in degrees.

B. Regularization Parameters
The regularization parameters λ (for modified SparseFI) and λ (for JSM) are the most crucial tuning parameters for J-SparseFI. Since they affect the reconstruction results in the same manner, here, we mainly discuss the selection of λ . It balances data fidelity and sparsity in the optimization problem (7). As shown in Fig. 9(a), (7) degenerates to the least square solution for λ = 0, whereas it converges to the zero solution if λ goes to infinity. As shown in Fig. 9(c), for the latter case, for each patch in every channel, J-SparseFI will simply give the mean gray value of the patch as the constant gray value for each pixel.
The choice of λ plays a significant role in both quality and computational cost, as shown in Figs. 10 and 11, respectively. In [27], it is suggested to select λ according to the noise standard deviation σ and the dictionary size NNP. As it will be addressed later, the favorable range of NNP is rather limited. Here, we mainly analyze the impact of the noise level. Fig. 10 shows the quality metrics, from left to right, namely, average CC, average ERGAS, and SAM, as a function of λ under different SNRs. In this experiment, the dictionary size, i.e., the number of atoms, is fixed to 600. A wide range of λ from 10 −10 to 10 5 is tested. From these curves, we can conclude the following.
-A moderate λ gives better performance than extremely small or large λ s. This confirms that it is reasonable to introduce the sparsity constraint in the reconstruction. -No optimum λ exists that gives simultaneously the best values for all quality metrics. For example, the λ value that gives the best CC does not give the best ERGAS. -It is interesting to observe that, although the optimum λ s for individual quality metrics are different and vary under different SNRs, for values of λ between 10 −2 and 10 2 , J-SparseFI always gives stable and good results for all metrics.
Based on the preceding analysis, a moderate value of λ , i.e., between 10 −2 and 10 2 , is recommended to guarantee a robust and good quality of the reconstruction results.
In addition to image quality, another important factor is the computational cost. Fig. 11(a) illustrates the computational time (evaluated using 128 cores) as a function of λ . In the moderate range of λ , the computational time increases dramatically with decreasing λ , particularly when λ < 1. Therefore, within the favorable range, we recommend λ within the range of [10 0 , 10 2 ].

C. Overlap Size
For methods, as proposed in [16], that reconstruct the whole image in a sequential manner and require a previously reconstructed HR version at the overlap regime, the overlap area is recommended to be 20%-40% of the patch size. In J-SparseFI, calculations for each patch are done independently to allow full parallelization, and an average of all estimates at the corresponding pixel is computed as the final reconstruction for the overlap areas. Here, we test the optimum overlap size for this type of reconstruction framework.
In this experiment, the patch size is set to be 5 × 5, the regularization parameter is selected to be 1.0 (center of the stable regime), the dictionary size is chosen to be 600, and the resolution ratio of the input images is F DS = 10. Overlap sizes between 0 and 4 are tested. Fig. 12 illustrates the quality metrics as a function of overlap size. As expected, a clear trend can be observed, i.e., the performance improves with the increasing overlap size. Therefore, we recommend patch sizes of 5 × 5 to 7 × 7, as suggested by previous studies [14]- [17], and choose the overlap size to be as large as possible.

D. Dictionary Size
The dictionary size is another crucial parameter. Its selection is again a tradeoff between quality and computational cost. Fig. 13 describes the quality metrics CC (left), ERGAS (middle), and SAM (right) as a function of dictionary size (experiments with channels 2-5). In this experiment, the patch size is set to be 5 × 5 with an overlap size of 4, the regularization parameter is selected to be 1.0 (center of the stable regime), and the resolution ratio of the input images is F DS = 10. Fig. 13 shows that, when increasing NNP, CC and ERGAS improve significantly, reach the peak performance with NNP = 200, and then degrade gradually; whereas SAM improves steadily with the increasing NNP. Obviously, for NIR channels whose characteristics are very different from atoms contained in dictionaries, larger NNP should be used. Fig. 11(b) illustrates the computational time (evaluated using 128 cores) as a function of NNP. As one could expect, the computational cost increases almost linearly with the increasing NNP.
In the later experiments, we set NNP to be 200 for channels 1-6 and 1000 for NIR channels.

E. Summary
J-SparseFI is not a parameter-free pansharpening algorithm. Its performance depends on regularization parameters, patch size, overlap size, and dictionary size. We provided a recipe about how to select these parameters. This recipe does not aim to provide the best reconstruction results, but an appropriate tradeoff between quality and computational cost. Experimental results using parameters selected according to this recipe will be compared with other state-of-the-art methods.    -Regularization parameters λ and λ: To balance the quality and computational cost, a parameter value between 0.01 and 100 is suggested. In Section V, we set both regularization parameters to be unity. Note that λ is a dimensionless parameter and all involved quantities in (7) are normalized. Hence, our recommendation for λ is generic and does not depend on the scaling or the dynamic range of the data. -Favorable patch sizes are between 5 × 5 and 7 × 7; the overlap size HR is suggested to be as large as possible. In Section V, we use a patch size of 5 × 5 and an overlap size of 4. -The favorable dictionary size NNP is between 200 and 1000. For applications requiring extremely high spectral accuracy or high accuracy in NIR channels, larger NNP is recommended. In Section V, we select NNP to be 200 for the visible channels and 1000 for the NIR channels.

V. PERFORMANCE EVALUATION
Here, the performance of the proposed J-SparseFI algorithm is investigated and compared with the original SparseFI algorithm and with other conventional methods visually and quantitatively. All experiments are carried out using the data sets reported in Section II-A and the aforementioned parameter settings.

A. Visual Comparison
For visual comparison, only results with the higher resolution ratio of F DS = 10 are presented. From the two input images in Fig. 2, the HR multispectral image has been reconstructed and then compared with the reference HR multispectral image [see Fig. 14 (upper left)]. From left to right and top to bottom, Fig. 14 shows the HR multispectral image reconstructed using the Gram-Schmidt method, the PCA method, the adaptive IHS method, the synthetic variable ratio merging method (SVR-MM) [5], the two-step sparse coding method with patch normalization (PN-TSSC) [17], the additive wavelet luminance proportional (AWLP) method [6], SparseFI, and the proposed J-SparseFI. Note that, for visual comparison, only the zoomin area shown in Fig. 3 and the RGB channels are visualized. Compared with the results produced by conventional pansharpening methods, sparse-reconstruction-based methods, including the PN-TSSC method, the SparseFI algorithm, and the J-SparseFI algorithm, can provide visually good results even under the situation of the large resolution ratio of 10. In addition, the color distortions they introduce are much less pronounced than those introduced by the Gram-Schmidt, PCA, and adaptive IHS methods. The results of SVR-MM and AWLP give also a very good visual impression.
Among the three sparse-reconstruction-based algorithms, due to independent reconstruction between different channels, the PN-TSSC method and the SparseFI algorithm sometimes introduce artifacts in the reconstructed image. This effect can be observed in the area marked by the yellow box. Some color distortion and blurred structures can be seen in the result of PN-TSSC and SparseFI, which are corrected in the results of J-SparseFI. It suggests that joint reconstruction of the highly correlated channels like J-SparseFI does is more robust and gives fewer artifacts.

B. Quantitative Assessment
For quantitative assessment, the reconstructed HR results are analyzed using the aforementioned metrics. We consider a resolution ratio of F DS = 4, which is representative for current satellites, e.g., IKONOS, Quick Bird, GeoEye, WorldView-2, and Worldview-3, and an extreme resolution ratio of F DS = 10 to test the limits of the algorithms. Table III summarizes the calculated assessment criteria values for the resolution ratio of 4, including CC for each channel, CC in average, ERGAS in each channel, ERGAS in average, and SAM in degrees. The best value is highlighted in bold for each criterion, and the second best value is underlined. Since the average values of CC, average value of ERGAS, and SAM indicate the overall performance, they are highlighted in the table.
From the table, it can be observed that the sharpening performance strongly depends on the sensor spectral response and/or channel mutual correlation. In general, all methods perform well for groups of adjacent mutually highly correlated channels whose spectral ranges match well with the one of the Pan image, namely, the blue, green, yellow, and red channels, as well as their highly correlated adjacent channel, i.e., the coastal channel. In addition to PCA, their performances in the red edge channel are better than in the NIR channels whose spectral bands are not covered by the Pan image.
Comparing the performance of the different methods in this experiment, for almost all the selected metrics, AWLP, SparseFI, and J-SparseFI give better values for all channels. PN-TSSC and SVR-MM are ranked as the second best groups. GS and PCA produce the most severe spectral distortions. Among the three sparse-representation-based methods, the J-SparseFI algorithm outperforms PN-TSSC and SparseFI in almost all assessment metrics. Hence, it confirms that we should consider the signal correlation between different multispectral channels and adaptively select, among the Pan image and the previously reconstructed HR multispectral channels, the appropriate source images for dictionary training. Table IV summarizes the same results as in Table III, but for the resolution ratio of F DS = 10. Similar conclusions can be drawn as above, except that SVR-MM and AWLP become more competitive. J-SparseFI still reaches the best average values in all metrics and for almost all channels. Moreover, the performance improvement of J-SparseFI with respect to PN-TSSC and SparseFI becomes more pronounced. A possible explanation is that if the resolution ratio gets higher, atoms in the dictionaries are more mutually similar because their high frequent differences visible in higher resolution are now smoothed out. Therefore, it is more challenging to find the correct set of atoms in which the sparse coefficients in LR and HR are identical. However, if we jointly estimate a group of mutually correlated channels like J-SparseFI does, then it is easier to find the correct set as we have more prior knowledge at hand. As a consequence, J-SparseFI can handle higher resolution ratios.

C. Difference Images
All aforementioned metrics give only global assessments of the results. In order to better understand where the reconstruction errors are localized, we investigate the difference images between the reconstructed HR images and the reference image for the selected area shown in Fig. 3. To emphasize the difference between different methods, results reconstructed from input images with a resolution ratio of 10 are selected. We visualize here the results of the first (coastal), fourth (yellow), and seventh (NIR-1) channels for the following reasons. First, in J-SparseFI, the first (coastal) and seventh (NIR-1) WorldView-2 channels are pansharpened using the previously pansharpened high-resolution image bands 2 (blue) and 6 (red edge), respectively, instead of the pan image; it is reasonable to demonstrate the error reduction in these two channels. Second, channel 4 (yellow) was additionally chosen because it is a representative band in the central group of jointly sharpened channels (i.e., bands 2-5) via JSM. Fig. 15 illustrates the false color difference images of pansharpening results visualized in Fig. 14. The RGB channels are chosen to be 7 (NIR-1), 4 (red), and 1(coastal), respectively. In Fig. 15, white color means zero difference; intense red, blue, and green colors mean significant errors in NIR-1, red, and coastal channels, respectively. Abrupt color jumps between white and other colors indicate resolution loss. If no resolution loss is introduced, the difference map should appear as randomly scattered colors. With resolution loss, these abrupt changes will follow the features of objects in the image, the wider the transition region, the more severe the resolution loss. Based on these rules of thumb, the following conclusions can be drawn.
-Clear structures can be observed from the difference images. This means that all methods introduce a certain amount of resolution loss. The transition regions for sparse-reconstruction-based methods, including PN-TSSC, SparseFI, and J-SparseFI, are narrower than those for the other methods. This indicates that they preserve the image resolution better. -For homogeneous areas, for instance, the upper right corner of the images, SparseFI and J-SparseFI give stable and accurate reconstruction in all channels, appearing white and smooth. PN-TSSC seems to contain a large amount of outliers appearing as randomly scattered colors other than white. -In terms of spectral distortion, GS and PCA perform the worst, as obviously dominating color appears. For the other methods, the spectral distortion mainly occurs at the boundaries of the objects, i.e., associated with the resolution loss. In particular, in the area marked by a black circle (bottom left), performances of the analyzed methods differ significantly. The sparse-reconstruction-based methods outperform the other methods. Among the three methods, it can be ranked in a descending order of quality as J-SparseFI, SparseFI, and PN-TSSC.
To get the statistics of the whole image rather than only of the selected area shown in Fig. 14, Fig. 16 illustrates the corresponding histograms of difference between the reconstructed HR images and the reference image for the full data sets shown in Fig. 2 and for all channels. As mentioned in Section II, we are working with data in the unsigned 16-bit integer domain. From the histogram, the error distributions are clearly visible. First, the fact that all histograms are centered around zero indicates they do not introduce systematic brightening and darkening effects on the whole image. Second, we can observe that the sparse-reconstruction-based methods and AWLP give generally less reconstruction errors. Among them, the stable  Fig. 4. Underlying is the WorldView-2 data set which was simulated from HySpex data with a resolution ratio of F DS = 10. Each of the eight pansharpening results corresponds to one eight-channel difference images. The histograms are plotted separately for all spectral channels. The data format for the gray values is unsigned 16-bit integers.  Fig. 4, the (center left) input high-resolution Pan image, the (center right) input low-resolution multispectral image, and the (right) pansharpened high-resolution multispectral image using the proposed J-SparseFI method. performance (sharper peaks centered at zero) of J-SparseFI in terms of residual errors is evident in this experiment. Moreover, all methods show worse performance for the NIR-1 and NIR-2 channels since they are least correlated with the other channels and the information is also not contained in the pan channel. Various methods are most distinct from each other at the red edge channel.
VI. EXPERIMENT USING WORLDVIEW-2 DATA As a final practical demonstration, here, we report experiments carried out using a real WorldView-2 image introduced in Section II-B. From left to right, Fig. 17 visualizes the original 2-m WorldView-2 reference image of the selected area shown in Fig. 4, the input high-resolution Pan image, the input lowresolution multispectral image, and the high-resolution multispectral image reconstructed using the proposed J-SparseFI method, respectively.
The reconstructed HR multispectral images of the whole image shown in Fig. 4 are compared with the results produced by the aforementioned other methods in Table V. Real data experiment confirms the conclusion we found in the previous experiments, i.e., J-SparseFI delivers better values for most of the quality metrics.

VII. CONCLUSION
In this paper, we have presented the modified fully parallelizable SparseFI algorithm and introduced the JSM, which takes into account the signal correlation between individual multispectral channels. To overcome the spectral range mismatch between the Pan and multispectral images, we proposed a sophisticated strategy-the J-SparseFI algorithm. After a sensor spectral and channel mutual correlation analysis, the multispectral channels are automatically assigned into primary groups of joint channels, individual channels, and secondary groups of joint channels. A systematic strategy is proposed to reconstruct these three types of channels in a sequential manner using the Pan image or previously reconstructed HR multispectral channels for dictionary training. Note that this sequential pansharpening strategy can be also used with other pansharpening algorithms and is not restricted to J-SparseFI.
We provide a recipe for parameter selection. The proposed algorithm is evaluated and validated with simulated WorldView-2 images using VHR airborne HySpex data (this data set is made available online as part of this work. 1 ) and, finally, practically demonstrated using real WorldView-2 image.
The statistical assessment using the data sets which we believe are quite representative for mixed urban and vegetated areas suggests the superior performance of the proposed J-SparseFI algorithm with respect to the methods included in the comparison, under resolution ratios of 4 and 10, and with both simulated and real data. A particular strength of this algorithm is that it also gives reliable quality for the NIR channels, whose information contents are different from the visible-light channels and whose spectral ranges are not covered by the Pan image. The differences to the reference images show that J-SparseFI recovers the spatial resolution of the HR multispectral image the best. Due to the large number of developed image fusion algorithms, it is only possible to make comparisons with selected methods. For example, a comparison with advanced component substation or multiresolution-based algorithms [10] could be also interesting for future studies.
Although we took pansharpening as the application example in this paper, the proposed algorithms are generally applicable for image fusion. In particular, new instrument concepts, such as the Japanese next-generation hyperspectral imager suite [30], that incorporate a higher resolution multispectral sensor and a lower resolution hyperspectral sensor on board would require techniques that fuse the multispectral and hyperspectral images with high resolution ratio [20], [31]- [34]. This could be another promising application for the proposed algorithm.