Pansharpening by Combining Enhanced Spatial-Spectral Fidelity and Gradient-Domain Guided Filtering

Pansharpening techniques fuse the complementary information from a high-resolution panchromatic (PAN) image and a low-resolution multispectral (LRMS) image to produce a high-resolution multispectral (HRMS) image. However, most existing pansharpening methods degrade in the full-resolution domain because of both the absence of ground truths and unavoidable unknown noises. To address this problem, a new pansharpening method is proposed that combines enhanced sparse models and gradient-domain guided image filtering. Specifically, a deep multiscale Laplacian pyramid super-resolution network is used instead of bicubic interpolation to improve the resolution of the original LRMS image. Then, the accurate preservation of spatial-spectral characteristics is achieved in a variational framework with enhanced spatial-spectral fidelity in the image gradient domain. Meanwhile, the gradient-domain guided image filter is used to effectively improve the extraction accuracy of spatial characteristics from the PAN image. Finally, an enhanced sparse regularization on the latent HRMS image is designed to remove noise and artifacts while promoting piecewise-smooth solutions. The experimental results on public satellite datasets demonstrate the superiority of the proposed method over existing pansharpening methods in terms of both full-resolution performance indexes and visual quality.

Image data fusion has become a powerful tool in many remote sensing applications, extracting significant characteristics along the temporal-spatial-spectral dimensions. Due to physical constraints, satellites such as WorldView, IKONOS, and Pleiades capture a single PAN image with high spatial resolution along with an LRMS image with rich spectral information. To acquire an HRMS image, pansharpening techniques have been developed rapidly to fuse the complementary information from the original PAN and LRMS images and thereby recover the complete characteristics of the captured scenes.
Since new-generation satellites provide continuously increasing scene resolutions, pansharpening faces a huge challenge in practical applications. Traditional pansharpening methods often have limited fusion performance when performed at full resolution, where the HRMS images are not available [1]. In recent years, numerous pansharpening methods have been proposed, which can be divided into four main categories: component substitution (CS), multiresolution analysis (MRA), deep learning (DL), and variational optimization (VO). The CS approaches are based on the projection of the original LRMS image into a transformed domain, whose objective is to replace part or all of the spatial information with that of the PAN image. The most representative CS methods include intensity-hue-saturation (IHS) [2], nonlinear principal component analysis (PCA) [3], partial replacement adaptive component substitution (PRACS) [4], band-dependent spatial detail (BDSD) [5], and its variants [6]. Although these pioneering pansharpening methods are computationally efficient, their resulting images are often affected by undesired spectral distortions. The MRA methods inject the high-frequency spatial details of the PAN image into the upsampled MS image through a multiscale decomposition, such as Laplacian pyramids, wavelets, curvelets, and contourlets [6]. The modulation transfer function (MTF) and its improved versions are classic representatives of MRA methods, such as MTF with generalized Laplacian pyramid [7] and MTF-GLP with high-pass modulation [8]. In contrast to the CS methods, the MRA methods can better preserve spectral information but tend to produce spatial distortions.
Totally different from the traditional methods, deep learning-based methods have recently achieved significant success in pansharpening. The pansharpening neural network (PNN) is one successful representative, producing state-of-the-art results [9]. Because the architecture of the PNN and its variants is relatively simple, other works design much more complicated networks in terms of topology [10] or depth [11] in order to obtain better fusion performance. Different training strategies have also been investigated by means of different loss functions, such as the mean squared error [10], the mean absolute error [12], the relative dimensionless global error [13], and so on. These strategies undoubtedly improve the fusion performance. However, the DL methods have some intrinsic problems with full-resolution pansharpening.
1) The lack of ground-truth images for supervised training: In practical applications, modern multiresolution sensors can only provide the original LRMS-PAN image pairs, never ground-truth versions of the HRMS image to be estimated.
2) The scale-invariance assumption: A network that is well trained at reduced resolution does not work equally well at full resolution. For example, significant small-scale detail structures at full resolution lose their shapes or disappear entirely after downsampling.
3) The scarcity of remote sensing training images: Due to the high cost of satellite data, networks are usually trained on very limited image datasets, which cannot adapt well to the diversity of geographical position, territorial conformation, atmospheric conditions, illuminance variations, and so on. Consequently, such networks cannot generalize to images outside the training data.
The VO methods have gained popularity recently thanks to their flexibility: they can establish a specific relationship between the PAN, LRMS, and HRMS images in a variational framework with novel image priors independent of specific training datasets. For example, spatial fractional-order geometry and spatial-spectral low-rank priors were used in a unified variational pansharpening framework [14]. The hyper-Laplacian prior depicted by the $\ell_p$ $(0 < p < 1)$ norm was built to express the error distribution between the HRMS image and the upsampled MS image in the gradient domain [15]. Similarly, nonconvex $\ell_{1/2}$-norm sparse priors in the spatial-spectral gradient domain were applied to characterize the relationship between the HRMS image and the LRMS image [16]. Considering the effectiveness of total generalized variation (TGV) for regularizing ill-posed inverse problems, a variational model with second-order TGV was proposed for pansharpening [17]. Different from the above global image priors, local gradient constraints were incorporated into the objective function in order to achieve more accurate spatial preservation [18]. Unlike local and global approaches, a context-aware details injection fidelity (CDIF) with adaptive coefficient estimation was proposed to fully express the complicated relationship between the PAN image and the HRMS image [19]. Although the aforementioned methods can effectively extract spatial characteristics without affecting the spectral information, they are subject to the following limitations in practical applications. First, the upsampled version of the LRMS image in most VO methods is obtained by simple bicubic interpolation, which has an adverse impact on the resulting fusion performance. Second, their fusion performance may be affected by changing scenes, low illuminance, and other adverse atmospheric conditions. Finally, unavoidable noises during data acquisition should be suppressed to improve the visual quality of the resulting HRMS image.
In order to improve the robustness of pansharpening at full resolution, a new pansharpening model is proposed in this study which incorporates three core modules: image super-resolution, gradient-domain guided image filtering, and enhanced sparse models. The proposed framework is shown in Fig. 1. First, traditional pansharpening methods often use bicubic interpolation to upsample the LRMS images, which does not effectively preserve image details. Instead, a deep multiscale Laplacian pyramid super-resolution network (MS-LapSRN) is applied for fast and accurate image super-resolution [20]. Meanwhile, the enhanced sparse model, termed $\ell_e$ [21], which combines the $\ell_0$ and $\ell_1$ sparse models, can reduce the reconstruction error in the spectral dimension and thus improve the extraction precision of spectral information. Second, the proposed model incorporates gradient-domain guided image filtering to effectively utilize the dominant spatial characteristics of the PAN image. Combining the enhanced sparse models and the gradient-domain guided image filtering significantly increases the extraction accuracy of spatial-spectral features. Finally, the $\ell_e$ sparse constraint on the resulting HRMS image serves to remove noise gradients. Therefore, the proposed pansharpening method can produce a high-quality HRMS image at full resolution, which is beneficial to subsequent image tasks. The main ideas and contributions of the proposed approach are summarized as follows.
1) A new pansharpening method combining enhanced sparse models and gradient-domain guided image filtering is proposed to improve the extraction accuracy of spatial-spectral features.
2) An accurate spectral fidelity based on the enhanced sparse model is designed in the spectral-gradient domain so as to effectively utilize the spectral information from the super-resolved LRMS image by reducing the reconstruction error.
3) The gradient-domain guided image filtering is used to build a local linear relationship between the latent HRMS and PAN images in order to effectively extract spatial features from the PAN image. Meanwhile, the enhanced sparse models are imposed on both the spatial fidelity and regularization terms to reduce reconstruction error and to remove noise and artifacts.
4) The full-resolution experimental results on public satellite datasets demonstrate the superiority of the proposed method over existing pansharpening methods.
The rest of this article is organized as follows. In Section II, we describe the proposed pansharpening framework combining enhanced sparse models and gradient-domain guided image filtering. In Section III, we solve the optimization problem using a computationally efficient numerical algorithm based on ADMM. In Section IV, we present and analyze the experimental results. Finally, Section V concludes this article.

A. Superresolution of LRMS Image via Laplacian Pyramid Super-Resolution Network
Bicubic interpolation is used in most pansharpening frameworks to achieve the super-resolution of the LRMS image. However, it cannot well preserve the additional high-frequency characteristics needed for reconstructing the resulting HRMS image, which in turn increases the computational cost and requires a large amount of memory. The comparative experiment of super-resolution via bicubic interpolation and MS-LapSRN on the IKONOS dataset is shown in Fig. 2. It is easily observed that MS-LapSRN produces much sharper edge structures without introducing the undesired artifacts caused by spatial aliasing. Thus, it is used to perform the super-resolution of the LRMS image in the proposed pansharpening framework, where $\widetilde{M}$ denotes the super-resolved version of the LRMS image $M$.
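As a hedged illustration of this step, the sketch below upsamples each LRMS band with a pretrained LapSRN model through OpenCV's contrib dnn_superres module. The model file name, the ×4 factor, and the per-band loop are assumptions made for illustration; they are not the exact MS-LapSRN configuration of [20].

```python
import cv2
import numpy as np

def upsample_lrms_lapsrn(lrms, model_path="LapSRN_x4.pb", scale=4):
    """Super-resolve each band of an LRMS cube (H x W x B, float in [0, 1])
    with a pretrained LapSRN model via OpenCV's dnn_superres module (assumed available)."""
    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel(model_path)          # pretrained weights file (hypothetical path)
    sr.setModel("lapsrn", scale)      # LapSRN backbone, x4 upscaling
    bands = []
    for b in range(lrms.shape[2]):
        band8 = np.clip(lrms[:, :, b] * 255.0, 0, 255).astype(np.uint8)
        band3 = cv2.cvtColor(band8, cv2.COLOR_GRAY2BGR)   # model expects 3 channels
        up = sr.upsample(band3)
        bands.append(cv2.cvtColor(up, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0)
    return np.stack(bands, axis=2)

# Usage sketch: M_tilde = upsample_lrms_lapsrn(M)   # M is the original LRMS cube
```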

B. Enhanced Spectral Fidelity
To reconstruct the latent HRMS image from the PAN image and the super-resolved LRMS image, the resulting pansharpened image is assumed to possess the same spatial-spectral features as the source images. In this study, an enhanced sparse model in the spectral gradient domain is designed within the variational framework to effectively acquire the spectral information from the LRMS image. Traditional pansharpening methods must accurately build the spectral degradation model [22], which contains a blur kernel and a downsampling matrix; in practice, the unknown blur kernels need to be well estimated [23]. To avoid this problem, the original LRMS image is super-resolved band by band via MS-LapSRN in this study. The latent HRMS image $F$ must contain the same spectral information as the super-resolved LRMS image $\widetilde{M}$. The simplest formulation of this relationship is an intensity-domain fidelity term $E_{\text{spectral}}(F, \widetilde{M})$, which only reflects the spectral relationship between the super-resolved LRMS and HRMS images in the intensity domain. In this study, a new model of spectral characteristics in the spectral gradient domain is built in order to significantly improve the prediction accuracy.
From the perspective of reconstruction error modeling, fusing the spectral information only in the intensity domain through $E_{\text{spectral}}(F, \widetilde{M})$ is not accurate enough to extract the complete spectral characteristics. The previous study in [15] has demonstrated that the error distributions of $\nabla(F - \widetilde{M})$ along the spatial and spectral directions on different satellite datasets all follow a hyper-Laplacian distribution, neither Gaussian nor Laplacian. Under the maximum a posteriori framework, the hyper-Laplacian distribution induces the nonconvex $\ell_p$ $(0 < p < 1)$ norm sparse prior. Similarly, a nonsubsampled shearlet transform hidden Markov forest (NSST-HMF) model was proposed to investigate the statistical properties of the NSST coefficients of hyperspectral images and the PAN image in the transform domain [24]. That work shows that the NSST coefficients also exhibit a heavy-tailed distribution, which can be well modeled by a Gaussian mixture model. Although those novel priors can enforce spectral and structural preservation, they cannot flexibly tackle the complexity of reconstruction errors in real applications, where a single empirical statistical distribution may not be accurate enough to depict the unknown error. Instead of adopting complicated error modeling, such as the multiorder gradient characteristics of the Gaussian distribution [25] or the continuous $\ell_p$ norm [26], we propose a simple combination of the $\ell_2$ norm and the enhanced sparse model to fit the error distribution in the spectral direction, which can be expressed as

$$E_{\text{spectral}}(F, \widetilde{M}) = \frac{\mu}{2}\|F - \widetilde{M}\|_2^2 + \alpha\|\nabla_z F - \nabla_z \widetilde{M}\|_e$$

where $\mu$ and $\alpha$ are nonnegative parameters, $\nabla_z$ denotes the gradient along the spectral direction, and $\|\cdot\|_e$ denotes the sum of $\|\cdot\|_0$ and $\|\cdot\|_1$. The comparison of spectral information between the state-of-the-art models and the enhanced sparse model is shown in Fig. 3. The red components of the LRMS image and the HRMS image in Fig. 3 are particularly salient. The comparative results at full resolution show that the enhanced sparse model in Fig. 3(d) achieves higher spectral extraction accuracy than the state-of-the-art models.
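As a small illustration of this fidelity term, the sketch below evaluates it with numpy for a candidate HRMS cube. The forward-difference spectral gradient and the plain $\|\cdot\|_0 + \|\cdot\|_1$ definition of $\|\cdot\|_e$ are assumptions made only for clarity.

```python
import numpy as np

def le_norm(x):
    """Enhanced sparse measure ||x||_e = ||x||_0 + ||x||_1 (unweighted sum assumed)."""
    return np.count_nonzero(x) + np.abs(x).sum()

def spectral_fidelity(F, M_tilde, mu, alpha):
    """Enhanced spectral fidelity: l2 term in intensity plus l_e term on the spectral gradient."""
    dz = lambda x: x[:, :, 1:] - x[:, :, :-1]      # forward difference along bands
    return 0.5 * mu * np.sum((F - M_tilde) ** 2) + alpha * le_norm(dz(F) - dz(M_tilde))
```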

C. Enhanced Spatial Fidelity
Although the super-resolved LRMS image $\widetilde{M}$ has enhanced edge structures, it cannot provide enough salient structural characteristics for the resulting pansharpened image. Both the PAN image and the HRMS image, taken from the same scene, have a strong similarity in spatial structure characteristics. Thus, different from the super-resolved LRMS image $\widetilde{M}$, the PAN image can provide abundant significant spatial structures, which are dominant in effectively improving the spatial resolution of the LRMS image.
Recent methods often preserve the consistency between the high-pass filtered components of the HRMS image and the PAN image in order to improve the extraction accuracy of image details, for example, the nonconvex $\ell_p$ sparse prior [15], [16], the fractional-order geometry prior [28], and the sparse prior based on cartoon-texture similarities [29]. Such global sparse modeling is not accurate enough to express the relationship between the latent HRMS image and the PAN image. Although some novel models such as local gradient constraints [18], adaptive spectral-spatial gradient sparsity [30], and context-aware details injection [19] can improve the modeling flexibility to some degree, they need to be optimized further to better tackle pansharpening at full resolution. Thus, we build the relationship between the HRMS image and the PAN image from the point of view of optimizing pansharpening under local gradient constraints. The guided image filter (GIF), derived from a local linear model, can successfully transfer the structures of a guidance image to the filtering output [31]. Different from the previous GIF used for pansharpening in the intensity domain [32], we apply the gradient-domain GIF to build the local linear relationship between the HRMS image and the PAN image

$$\nabla F_i = \bar{a}_i \nabla P_i + \bar{b}_i$$

where $\bar{a}_i = \frac{1}{|\omega|}\sum_{n\in\omega_i} a_n$ and $\bar{b}_i = \frac{1}{|\omega|}\sum_{n\in\omega_i} b_n$ are the averaged linear coefficients of the GIF defined in [31], $a_n$ and $b_n$ are the linear coefficients assumed to be constant in a window $\omega_n$, and $|\omega|$ is the number of pixels in $\omega_n$. $\nabla F_i$ denotes the output of the GIF at pixel $i$, and the PAN image gradient $\nabla P$ is the guidance input of the GIF. The GIF ensures the local linear relationship between its output $\nabla F$ and the guidance image $\nabla P$.
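As a hedged illustration, the sketch below implements the basic box-filter GIF of [31] and applies it with the PAN gradient as guidance. The window radius, the regularization eps, and the choice of the super-resolved LRMS gradient as the filtering input are assumptions made for illustration; the edge-aware weighting that distinguishes the gradient-domain GIF from the basic GIF is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=4, eps=1e-3):
    """Basic guided image filter (He et al.): output = mean(a) * guide + mean(b)."""
    win = 2 * radius + 1
    mean = lambda x: uniform_filter(x, size=win, mode="reflect")
    mean_I, mean_p = mean(guide), mean(src)
    cov_Ip = mean(guide * src) - mean_I * mean_p      # covariance of guide and source
    var_I = mean(guide * guide) - mean_I * mean_I     # variance of the guide
    a = cov_Ip / (var_I + eps)                        # per-window linear coefficients
    b = mean_p - a * mean_I
    return mean(a) * guide + mean(b)                  # averaged coefficients -> output

def gif_detail_layer(pan, ms_band, axis):
    """Filter one gradient band with the PAN gradient as guidance (G_k in the text)."""
    grad_p = np.gradient(pan, axis=axis)
    grad_m = np.gradient(ms_band, axis=axis)          # assumed filtering input
    return guided_filter(grad_p, grad_m, radius=4, eps=1e-3)
```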
As mentioned in Section II-B, the extraction accuracy of spectral contents can be effectively improved by reducing the reconstruction error between the HRMS image and the LRMS image. Similarly, the same technique can be adopted in the spatial domain in order to accurately extract detail characteristics from the PAN image. Thus, the enhanced sparse constraints between the latent HRMS image and the PAN image are expressed as

$$E_{\text{spatial}}(F, P) = \sum_{k=x,y}\beta_k\|\nabla_k F - G_k\|_e$$

where $G_k$ denotes the output of the GIF in the $x$ or $y$ direction, viz., $G_k = \bar{a}\nabla P_k + \bar{b}$, and $\beta_k$ is the positive model parameter.

D. Pansharpening Based on Enhanced Spatial-Spectral Fidelity
The spectral and spatial fidelity terms enforce the similarity between the desired pansharpened image and the source images. In order to obtain a sharp fused image, the enhanced regularization term $R(F) = \sum_{k=x,y}\lambda_k\|\nabla_k F\|_e$ is imposed on the latent image, aiming at penalizing undesirable gradient components. Thus, the proposed pansharpening model can be formulated as the combination of the enhanced data fidelity terms and the regularization term as follows:

$$\min_F \; \frac{\mu}{2}\|F - \widetilde{M}\|_2^2 + \alpha\|\nabla_z F - \nabla_z \widetilde{M}\|_e + \sum_{k=x,y}\beta_k\|\nabla_k F - G_k\|_e + \sum_{k=x,y}\lambda_k\|\nabla_k F\|_e \tag{5}$$

where $\lambda_k$ is the regularization parameter, which controls the tradeoff between the data fidelity terms and the regularization term.

III. OPTIMIZATION WITH ADMM
Since the proposed model in (5) is nonsmooth and nonseparable, it is necessary to find an efficient way to tackle this nonconvex optimization problem. The alternating direction method of multipliers (ADMM), derived from the well-known variable-splitting and penalty techniques in optimization [16], [19], [21], [29], has become a powerful approach for many such problems. Due to its favorable convergence properties, an ADMM-based algorithm is designed to solve (5). Specifically, by introducing five auxiliary variables $W$, $Z_k$, and $T_k$ $(k = x, y)$ for $\nabla_z F - \nabla_z \widetilde{M}$, $\nabla_k F - G_k$, and $\nabla_k F$, respectively, the original minimization function can be converted into the following approximate one:

$$\min_{F, W, Z_k, T_k} \; \frac{\mu}{2}\|F - \widetilde{M}\|_2^2 + \alpha\|W\|_e + \sum_{k=x,y}\beta_k\|Z_k\|_e + \sum_{k=x,y}\lambda_k\|T_k\|_e + \frac{\alpha'}{2}\|W - (\nabla_z F - \nabla_z \widetilde{M}) + A\|_2^2 + \sum_{k=x,y}\frac{\beta'_k}{2}\|Z_k - (\nabla_k F - G_k) + B_k\|_2^2 + \sum_{k=x,y}\frac{\lambda'_k}{2}\|T_k - \nabla_k F + C_k\|_2^2 \tag{6}$$

where $\alpha'$, $\beta'_k$, and $\lambda'_k$ are the penalty parameters, and $A$, $B_k$, and $C_k$ are the scaled Lagrange multipliers. We obtain the solution of (6) by alternately solving the following simple subproblems.

F-subproblem:
The F-subproblem corresponds to minimizing (6) with the $\ell_e$ norm terms omitted, i.e.,

$$\min_F \; \frac{\mu}{2}\|F - \widetilde{M}\|_2^2 + \frac{\alpha'}{2}\|W - (\nabla_z F - \nabla_z \widetilde{M}) + A\|_2^2 + \sum_{k=x,y}\frac{\beta'_k}{2}\|Z_k - (\nabla_k F - G_k) + B_k\|_2^2 + \sum_{k=x,y}\frac{\lambda'_k}{2}\|T_k - \nabla_k F + C_k\|_2^2. \tag{7}$$

The function in (7) is quadratic and thus has a global minimum. Using the convolution theorem and the fact that the derivative operators are diagonalized in the Fourier domain, (7) can be solved efficiently with fast Fourier transforms. W-subproblem: When $F$ is fixed, the minimization function on $W$ is

$$\min_W \; \alpha\|W\|_e + \frac{\alpha'}{2}\|W - V\|_2^2 \tag{8}$$

where $V = \nabla_z F - \nabla_z \widetilde{M} - A$. Equation (8) has a closed-form solution [21], which is obtained by applying the elementwise proximal operator of the $\ell_e$ norm to $V$:

$$W = \operatorname{prox}_{(\alpha/\alpha')\|\cdot\|_e}(V). \tag{9}$$
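The sketch below gives one way to evaluate this proximal step elementwise, assuming $\|\cdot\|_e$ is the plain sum $\|\cdot\|_0 + \|\cdot\|_1$: soft-threshold each entry and keep it only if doing so beats setting it to zero. The exact closed form derived in [21] may be written differently; this is an illustrative implementation, not the authors' formula.

```python
import numpy as np

def prox_le(v, tau):
    """Elementwise prox of tau * (||x||_0 + ||x||_1):
    argmin_x tau*(1{x != 0} + |x|) + 0.5*(x - v)^2, assuming an unweighted l0 + l1 sum."""
    soft = np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)        # l1 shrinkage candidate
    cost_soft = tau * (soft != 0) + tau * np.abs(soft) + 0.5 * (soft - v) ** 2
    cost_zero = 0.5 * v ** 2                                    # cost of the zero candidate
    return np.where(cost_soft < cost_zero, soft, 0.0)

# Usage sketch for the W-subproblem: W = prox_le(V, alpha / alpha_prime)
```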
$Z_k$- and $T_k$-subproblems: When $F$ is fixed, the minimization functions on $Z_k$ and $T_k$ $(k = x, y)$ are

$$\min_{Z_k} \; \beta_k\|Z_k\|_e + \frac{\beta'_k}{2}\|Z_k - (\nabla_k F - G_k - B_k)\|_2^2, \qquad \min_{T_k} \; \lambda_k\|T_k\|_e + \frac{\lambda'_k}{2}\|T_k - (\nabla_k F - C_k)\|_2^2$$

which can be solved by the same shrinkage method as in (9).
Updating multipliers: According to the ADMM framework, the Lagrange multipliers $A$, $B_k$, and $C_k$ $(k = x, y)$ are updated as follows:

$$A \leftarrow A + W - (\nabla_z F - \nabla_z \widetilde{M}), \quad B_k \leftarrow B_k + Z_k - (\nabla_k F - G_k), \quad C_k \leftarrow C_k + T_k - \nabla_k F.$$

The stopping criterion of the proposed algorithm is based on the relative change (Relcha) between two successive pansharpened images, defined as

$$\text{Relcha} = \frac{\|F^{j+1} - F^{j}\|_2}{\|F^{j}\|_2}.$$

Combining the above subproblems yields one ADMM iteration. By decomposing the complex minimization problem into simple subproblems, the whole optimization algorithm to solve (5) is summarized in Algorithm 1: while $\text{Relcha} > \varepsilon$ and $j < j_{\max}$, update $F^{j+1}$ via (7), $W^{j+1}$ via (8) and (9), then $Z_k^{j+1}$, $T_k^{j+1}$, and the multipliers, where $j_{\max}$ is the maximum iteration number.
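The hedged sketch below shows the overall loop structure corresponding to Algorithm 1. For self-containment it redefines small helpers from the earlier sketches, uses a single penalty `rho` and scalar `beta`/`lam` for both directions, and replaces the FFT solve of (7) with a few gradient steps on the quadratic; all of these are simplifying assumptions, not the authors' exact implementation.

```python
import numpy as np

def grad(x, axis):            # forward difference (circular wrap)
    return np.roll(x, -1, axis=axis) - x

def grad_T(x, axis):          # adjoint of the forward difference
    return np.roll(x, 1, axis=axis) - x

def prox_le(v, tau):          # elementwise prox of tau * (l0 + l1), as sketched above
    soft = np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
    keep = tau * (soft != 0) + tau * np.abs(soft) + 0.5 * (soft - v) ** 2 < 0.5 * v ** 2
    return np.where(keep, soft, 0.0)

def admm_pansharpen(M_tilde, G, mu=1.0, alpha=1e-3, beta=1e-2, lam=1e-4,
                    rho=1.0, j_max=50, eps=1e-4, inner_steps=10, lr=0.05):
    """Illustrative ADMM loop for model (5). G = (G_x, G_y) are the GIF outputs per band."""
    F = M_tilde.copy()
    W = np.zeros_like(F); A = np.zeros_like(F)
    Z = [np.zeros_like(F), np.zeros_like(F)]; B = [np.zeros_like(F), np.zeros_like(F)]
    T = [np.zeros_like(F), np.zeros_like(F)]; C = [np.zeros_like(F), np.zeros_like(F)]
    dzM = grad(M_tilde, 2)
    for j in range(j_max):
        F_old = F.copy()
        # F-subproblem (7): quadratic; minimized here by a few small gradient steps
        # instead of the FFT solver used in the paper.
        for _ in range(inner_steps):
            g = mu * (F - M_tilde) \
                - rho * grad_T(W - (grad(F, 2) - dzM) + A, 2) \
                - sum(rho * grad_T(Z[k] - (grad(F, k) - G[k]) + B[k], k) for k in (0, 1)) \
                - sum(rho * grad_T(T[k] - grad(F, k) + C[k], k) for k in (0, 1))
            F -= lr * g
        # W-, Z_k-, T_k-subproblems: elementwise l_e shrinkage as in (8) and (9)
        W = prox_le(grad(F, 2) - dzM - A, alpha / rho)
        for k in (0, 1):
            Z[k] = prox_le(grad(F, k) - G[k] - B[k], beta / rho)
            T[k] = prox_le(grad(F, k) - C[k], lam / rho)
        # Scaled dual updates
        A += W - (grad(F, 2) - dzM)
        for k in (0, 1):
            B[k] += Z[k] - (grad(F, k) - G[k])
            C[k] += T[k] - grad(F, k)
        relcha = np.linalg.norm(F - F_old) / (np.linalg.norm(F_old) + 1e-12)
        if relcha < eps:
            break
    return F
```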

A. Datasets
Considering the large variability of the captured scenes, all comparative pansharpening methods are assessed on different datasets at full resolution, which include WorldView-2 (WV-2), WorldView-3 (WV-3), SPOT-7, Pleiades-1B, and IKONOS. For WV-2 and WV-3, the size of the LRMS image is 256 × 256 × 8, and the size of the LRMS image in the other datasets is 256 × 256 × 4. The size of the corresponding PAN image in all datasets is 1024 × 1024. The experimental images cover a diversity of captured scenes in order to fully assess the fusion performance of all comparative methods; meanwhile, the impact of atmospheric conditions on pansharpening methods, such as clouds, low illuminance, and interferences, is investigated.

B. Baseline Methods
To verify the superiority of our approach over other methods, we compare the proposed approach, called PESS for short, with several state-of-the-art pansharpening methods. For the CS methods, the haze-corrected Brovey transform (BTH) [33] and the robust band-dependent spatial-detail approach (RBDSD) [34] are used for comparison. The classic generalized Laplacian pyramid (GLP) approaches are widely applied as an important branch of MRA methods; thus, the MTF-GLP with high-pass modulation (GLP-HPM) [8] and the context-based GLP (C-GLP) [35] are adopted. Moreover, the comparative methods also include the pansharpening neural network trained in the full-resolution framework (ZPNN) [27] and detail-based deep Laplacian pansharpening (DDLPS) [32]. ZPNN is used for comparison on the WV-2 and WV-3 datasets since well-trained model weights are provided for these two datasets, and DDLPS is adopted for the other datasets. For the VO methods, we choose classic pansharpening methods including the sparse representation of injected details (SR-D) [36], pansharpening via tensor-based sparse modeling and hyper-Laplacian prior (TSMHLP) [15], the nonconvex pansharpening model based on sparsity priors (NPM) [16], and the context-aware details injection fidelity with adaptive coefficients estimation for variational pansharpening (CDIF) [19]. All fusion algorithms used for comparison are implemented from publicly available codes, and the corresponding parameters are set for the best fusion performance, as summarized in the following.

C. Subjective Evaluation at Full Resolution

1) WorldView-2 Test Cases:
The WV-2 satellite provides 0.46- and 1.84-m resolution with a dynamic range depth of 11 b per pixel for the PAN and the eight MS bands, respectively. The eight spectral bands support applications such as coastal and vegetation land cover monitoring and the mapping of crop types, benthic habitats, and wetlands.
In the first test case, a small harbor is captured in the MS image with the size of 256 × 256 × 8 and the corresponding PAN image with the size of 1024 × 1024, and image patches with the size of 150 × 400 × 8 cropped from the resulting HRMS images are shown in Fig. 4. Obviously, the LRMS image is degraded severely in Fig. 4(a). In contrast, the corresponding PAN image with the size of 150 × 400 shown in Fig. 4(b) has sharp detail features. The pansharpened MS image patches of the different methods are shown in Fig. 4(c)-(l). BTH, GLP-HPM, SR-D, RBDSD, and C-GLP cannot restrain the interference from the sea background, causing spatial-spectral distortions especially around the boats in Fig. 4(c)-(e), (g), and (i), and SR-D produces the most obvious distortions among all resulting HRMS images. TSMHLP and NPM cause blurry edges, which would affect subsequent high-level image segmentation and recognition, in Fig. 4(f) and (j). Among the remaining methods, ZPNN, CDIF, and PESS have a relatively low residual error in Fig. 4(h), (k), and (l). The lower intensities of the bridge in Fig. 4(h) show that ZPNN cannot well balance the intensity information from the source images, and CDIF induces some spatial distortions around the boats in Fig. 4(k). In contrast, PESS effectively restrains the adverse impact of interference while preserving the spatial and spectral features of the source images in Fig. 4(l).
In the second case, the MS-PAN image pair of a suburb is shown in Fig. 5(a) and (b). Focusing on the broad road and the house roofs, it can be seen that both ZPNN and PESS transfer the salient spatial characteristics from the PAN image into the resulting HRMS image more successfully than the other methods while effectively restraining the distortions from the LRMS images in Fig. 5(h) and (l). Moreover, PESS extracts more accurate spectral components in the green channel from the LRMS image than ZPNN. Among all methods, RBDSD produces the most severe spatial and spectral distortions in Fig. 5(g).
2) WorldView-3 Test Cases: WorldView-3 can collect one PAN band at 0.31-m resolution, eight visible and NIR bands at 1.24-m resolution, and eight short-wave IR bands at a spatial resolution of 3.7 m, with a dynamic range depth of 11 b per pixel. The eight short-wave IR bands are not used in our study.
Different from the scenes in the WV-2 test cases, a scene of highly dense buildings in an urban area acquired from WV-3 is used for comparison. The HRMS image patches with the size of 150 × 400 × 8 are displayed in Fig. 6. It can be easily observed from the flat roofs of the buildings that ZPNN and PESS have the least spatial distortions in Fig. 6(h) and (l). If the spectral contents of these two algorithms are compared further, PESS extracts more accurate spectral information than ZPNN: the different spectral components in Fig. 6(l) look sharper and more natural than those in Fig. 6(h), and the buildings with flatter roofs and more authentic colours in Fig. 6(l) demonstrate the superiority of the proposed model. Meanwhile, it is noteworthy that the irregular interference in the PAN image poses a huge challenge to all pansharpening methods. In terms of its impact on the surrounding spatial-spectral features, PESS performs better than the other comparative methods.
In the second case, a scene of a suburb from WV-3 is shown in Fig. 7(a) and (b). Since the PAN image shown in Fig. 7(b) has sharp detail characteristics, the salient white cars parked on the street can be easily observed. However, one of these cars in the corresponding MS image in Fig. 7(a) is subject to severe spatial distortion during the imaging process. In this case, only PESS is capable of restraining the spatial distortion from the imaging system while preserving the salient structure of this white car in Fig. 7(l). Unfortunately, the other methods are unavoidably affected by this kind of spatial distortion from the LRMS image, which may not benefit subsequent high-level image tasks.
3) SPOT-7 Test Cases: SPOT-7 provides one PAN band at 1.5-m resolution and four MS bands at 6-m resolution with a dynamic range depth of 14 b per pixel. In the first case, the resulting HRMS image patches with the size of 150 × 400 × 4 are shown in Fig. 8. Obviously, SR-D induces such strong blurring effects that the HRMS image in Fig. 8(e) totally loses both detail and structure characteristics. TSMHLP, NPM, and CDIF are also affected by blurring effects and cannot successfully transfer fine image features into the resulting HRMS images in Fig. 8(f), (j), and (k). In contrast, BTH, GLP-HPM, RBDSD, and PESS have more salient edge structures than the other methods in Fig. 8(c), (d), (g), and (l). Among them, BTH and PESS successfully extract a subtle target on the street in the shadow at the tops of Fig. 8(c) and (l). If the spectral components in Fig. 8(c) and (l) are compared further, it can be observed that PESS preserves much sharper spectral characteristics than BTH.
In the second case, a set of HRMS image patches with the size of 150 × 400 × 4 is shown in Fig. 9(c)-(l). Since the PAN image does not have sharp detail features, fusing the original images in Fig. 9(a) and (b) is challenging. Focusing on the green belts, it can be seen in Fig. 9(l) that PESS has relatively better contrast than the other methods. Moreover, the salient intensity information from the white building at the top of the resulting HRMS image has an adverse impact on the nearby green belt for all pansharpening methods except BTH, TSMHLP, and PESS, which means that most of the pansharpening methods cause undesirable spectral distortions.

4) Pleiades-1B Test Cases:
Pleiades-1B has one PAN band at 0.7-m resolution and four MS bands at 2.8-m resolution with a dynamic range depth of 12 b per pixel. In the first test case, most of the pansharpening methods in Fig. 10 achieve excellent fusion performance, except SR-D in Fig. 10(e), since the LRMS image has salient spectral contents and the PAN image possesses abundant detail features. However, if the bright white roof of the building is zoomed in, the dim profile of the white roof can be observed in the PAN image in Fig. 10(b). Only PESS is capable of effectively preserving the dim center line on the white roof in Fig. 10(l) under the high emittance from the white background. Meanwhile, the weak outlines of the trees along the street in Fig. 10(l) are more easily recognizable than those from the other methods.
In the second test case, a scene of sparse buildings with dim hues is recorded under low illuminance in Fig. 11(a), and the textures of the PAN image are not easily observed in Fig. 11(b). Obviously, BTH enhances subtle textures from the PAN image more than the other methods, but simultaneously causes some undesirable artifacts along the right side of the HRMS image in Fig. 11(c). If the shadows near the buildings in the remaining HRMS images are zoomed in, it can be observed that both DDLPS and PESS extract the subtle targets in the shadows better than the other methods in Fig. 11(h) and (l).

5) IKONOS Test Cases:
IKONOS captures a four-band MS image at 3.2-m resolution and a PAN image at 0.8-m resolution, with a dynamic range depth of 11 b per pixel. In the first test case, the LRMS image is degraded by thin clouds in Fig. 12(a). It is very challenging to extract spatial-spectral features from the degraded source images due to the impact of clouds. If the green belts in the park in Fig. 12 are zoomed in, it can be clearly observed that TSMHLP, C-GLP, and CDIF do not successfully transfer detail characteristics from the PAN image into the resulting images in Fig. 12(f), (i), and (k). Moreover, CDIF induces spatial distortions around the vehicles on the road, and these vehicles in Fig. 12(d), (e), and (j) are not easily identified since GLP-HPM, SR-D, and NPM are more or less affected by thin clouds. DDLPS cannot fully extract intensity information from the vehicles in Fig. 12(h). In comparison, the three remaining methods preserve the spatial characteristics of the PAN image well in Fig. 12(c), (g), and (l), but if the spectral components are compared further, it can be found that PESS achieves better extraction accuracy in spectral preservation than BTH and RBDSD.
In the second test case, a scene of a coastal road is captured in Fig. 13(a) and (b). Obviously, BTH, SR-D, and NPM produce spatial distortions in Fig. 13(c), (e), and (j). When the three white cars on the road in Fig. 13 are zoomed in, GLP-HPM, SR-D, C-GLP, and CDIF also show some spatial distortions around these cars in Fig. 13(d), (e), (i), and (k), and NPM cannot extract the salient features of these cars in Fig. 13(j). In contrast, TSMHLP, RBDSD, DDLPS, and PESS preserve spatial characteristics well at different scales in Fig. 13(f)-(h) and (l). Among them, TSMHLP cannot well transfer the small-scale detail information of the buildings and tennis court due to blurring effects in Fig. 13(f). If the spectral contents are compared further, PESS achieves better fusion performance in spectral preservation than RBDSD and DDLPS.

D. Objective Evaluation
Visual comparison alone cannot thoroughly evaluate the performance of all pansharpening methods. Thus, it is necessary to evaluate the fusion results with objective quality indexes. Since all experiments are performed at full resolution, where no reference images are available, three full-resolution no-reference quality metrics are used in this study: the quality with no reference (QNR), the spectral quality index $D_\lambda$, and the spatial quality index $D_s$, combined as

$$\text{QNR} = (1 - D_\lambda)^{\alpha}(1 - D_s)^{\beta}$$

where $D_\lambda$ and $D_s$ denote the spectral distortion and the spatial distortion, respectively, and $\alpha$ and $\beta$ are both set to 1 in this study. QNR lies in the range [0, 1], with 1 being the best attainable value.
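For reference, the hedged sketch below computes $D_\lambda$, $D_s$, and QNR with a simple global universal image quality index (UIQI). Published implementations use sliding windows and an MTF-matched low-pass version of the PAN image, so the naive area-averaged PAN downgrade and the exponent choices here are illustrative assumptions.

```python
import numpy as np

def uiqi(a, b):
    """Global universal image quality index between two equally sized bands."""
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    ma, mb = a.mean(), b.mean()
    va, vb, cov = a.var(), b.var(), ((a - ma) * (b - mb)).mean()
    return 4 * cov * ma * mb / ((va + vb) * (ma**2 + mb**2) + 1e-12)

def qnr(fused, ms, pan, alpha=1.0, beta=1.0, p=1, q=1):
    """QNR = (1 - D_lambda)^alpha * (1 - D_s)^beta for H x W x B cubes."""
    B = fused.shape[2]
    r = pan.shape[0] // ms.shape[0]                       # resolution ratio
    pan_lr = pan.reshape(ms.shape[0], r, ms.shape[1], r).mean(axis=(1, 3))  # crude low-res PAN
    d_lambda = np.mean([abs(uiqi(fused[..., i], fused[..., j])
                            - uiqi(ms[..., i], ms[..., j])) ** p
                        for i in range(B) for j in range(B) if i != j]) ** (1 / p)
    d_s = np.mean([abs(uiqi(fused[..., i], pan)
                       - uiqi(ms[..., i], pan_lr)) ** q for i in range(B)]) ** (1 / q)
    return (1 - d_lambda) ** alpha * (1 - d_s) ** beta, d_lambda, d_s
```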
First, the performance indexes in Table I are computed for the WV-2 and WV-3 datasets. For all test cases in Table I, PESS achieves the best results in terms of $D_\lambda$ and QNR. Analyzing the relationship between the performance indexes and the specific scenes shows that the indexes of most comparative methods are easily affected by the changing scenes. For example, as the captured scene changes from the sea background in Fig. 4 to the suburban one in Fig. 5, the QNR value obtained by NPM increases significantly from 0.57 to 0.93. Significant differences in $D_\lambda$ or $D_s$ between Figs. 4 and 6 are also produced by ZPNN. In contrast, PESS exhibits robust fusion performance in all WV-2 and WV-3 test cases.
Second, the fusion performance of all comparative methods is further quantitatively evaluated on the SPOT-7, Pleiades-1B, and IKONOS datasets, and the experimental results are summarized in Table II. PESS achieves the best performance in $D_s$ and QNR. Although GLP-HPM, SR-D, TSMHLP, and NPM obtain the best value of some index in some test cases in Table II, PESS noticeably outperforms the other methods in most of the test cases in Tables I and II. More importantly, the relationship between the performance indexes and the captured scenes shown in Fig. 14 is analyzed, which demonstrates that PESS has more robust fusion performance than the other comparative methods. Therefore, the proposed method outperforms all comparative methods in terms of both visual quality and performance indexes.

1) Parameter Settings and Convergence: The regularization parameters $\lambda_x$ and $\lambda_y$ are set to 0.0001 and 0.00001, respectively. Finally, the maximum iteration number $j_{\max}$ equals 50. The Relcha values shown in Fig. 15 are computed during the iterative process of the proposed method. Thanks to the introduction of ADMM, PESS converges quickly for all test cases.
2) Running Time: Finally, the average running time of the different pansharpening methods is reported in Table III. All methods are run in MATLAB R2021b on a laptop with a 2.3-GHz CPU and 32-GB RAM, except ZPNN. In Table III, BTH has the shortest running time and CDIF the longest. The results in Table III show that the larger the image size, the longer the running time of all pansharpening methods, and that although the VO methods have relatively excellent fusion performance, they consume more computational resources.

V. CONCLUSION
Pansharpening at full resolution is highly challenging due to the absence of ground truths and unavoidable unknown noises, and investigating robust pansharpening models in the full-resolution domain is of significant practical value. In this study, a new pansharpening method has been proposed which combines enhanced sparse models and gradient-domain guided image filtering. The deep multiscale Laplacian pyramid super-resolution network effectively improves the resolution of the original LRMS images and meanwhile enhances their edge structures, which benefits the subsequent pansharpening modeling. Then, combining the enhanced spatial-spectral fidelity and the gradient-domain guided image filtering improves the extraction accuracy in the spatial-spectral dimension. Finally, the resulting HRMS images become much sharper since the enhanced sparse regularization on the latent HRMS image restrains noise and artifacts.
Although the proposed pansharpening method has achieved relatively excellent fusion performance at full resolution, it still has some drawbacks to be overcome. First, parameter selection may be very time-consuming: to achieve the best fusion performance, reasonable parameters must be set carefully. In the future, a pansharpening model with automatic parameter selection will be considered to further improve efficiency and effectiveness. Second, the complicated model design often needs a lot of computational resources, which may limit its use in practical applications. Inspired by the full-resolution training framework for deep learning-based pansharpening in [27], a more robust full-resolution training framework can also be considered in the future. Finally, real-world noises are so complicated that a single distribution may not be accurate enough to model them, such as random noise and stripes; thus, a robust antinoise pansharpening model should be investigated to further improve the fusion performance.