Improved Generalized IHS Based on Total Variation for Pansharpening

Pansharpening refers to the fusion of a panchromatic (PAN) and a multispectral (MS) image over the same area, aimed at generating a high-quality outcome. This image fusion problem has been widely studied, but it remains challenging to balance spatial and spectral fidelity in fused images. Spectral distortion is widespread in component substitution-based approaches due to the variation in the intensity distribution of the spatial components. We leveraged total variation optimization to propose a novel GIHS-TV framework for pansharpening. The framework draws its high spatial fidelity from the GIHS scheme and implements it with a simpler variational expression. An improved L1-TV constraint on the new spatial-spectral information was introduced into the GIHS-TV framework, along with its fast implementation. The objective function was solved by the Iteratively Reweighted Norm (IRN) method. Experimental results on the "PAirMax" dataset clearly indicate that GIHS-TV effectively reduces the spectral distortion in the process of component substitution. Our method achieved excellent results in both visual effects and evaluation metrics.


Introduction
Some satellites, such as WorldView-2, 3 and 4 and GeoEye-1, can acquire multispectral (MS) and panchromatic (PAN) images simultaneously, which benefits the alignment of the two images. MS images contain a wealth of spectral information whose wavelengths range from the visible to the near-infrared. However, MS images usually have relatively poor spatial resolution, which limits their application scenarios. A PAN image provides a powerful complement of spatial detail. Pansharpening refers to the fusion of MS and PAN images of the same region, aiming to produce an image with rich spatial and spectral information. It has been widely used in many fields, such as environmental monitoring, agriculture, forestry and geological survey.
The pansharpening problem has been widely studied for over three decades. There are still some challenges in the field of pansharpening. The fusion task requires consideration of alignment, noise amplification due to upsampling, information loss due to downsampling and how information is chosen between the source images. The inconsistent spectral response between the panchromatic and multispectral sensors is prone to causing spatial and spectral structure distortion. Suboptimal image registration between the MS and PAN images leads to problems such as false edges or artifacts. In addition, increasing the practicality of algorithms is a demanding issue in remote sensing image spatial-spectral fusion. Thus, it is difficult for pansharpening to balance spatial and spectral fidelity due to these factors. Figure 1 illustrates that spatial fidelity and spectral fidelity usually cannot be achieved at the same time. The increase in spatial detail tends to destroy the spectral fidelity, as in the IHS [1] and GS [2] methods. Conversely, the fused images of the EXP [3] and BDSD [4] methods are too blurred. From a mathematical point of view, MS and PAN images are incomplete, complementary observations of an ideal image, and image fusion is the reconstruction of this ideal image. Therefore, the focus of pansharpening is to extract geometric information from the PAN image to add to the fused images while leaving the spectral information unchanged as much as possible. Approaches to pansharpening can be classified into four categories: component substitution-based (CS-based), multi-resolution analysis-based (MRA-based), variational optimization-based and machine learning-based [5]. The main idea of the CS-based methods is to project the MS image into a space with a separated spatial component and then enhance that component with the PAN image.
It is worth mentioning that Tu [6] made a mathematical derivation to simplify the process of component substitution and established a detail injection scheme called generalized IHS (GIHS), in which the forward and backward transformations are omitted. The CS approach has high fidelity for spatial details and robustness to spatial misregistration [7]. Based on multiscale decomposition methods, the MRA approaches act directly on the spatial domain [8][9][10]. The basic idea of the MRA-based class of methods is to keep the whole content of the LRMS images and add further information from the PAN image. The difference between the PAN image and its low-pass version is regarded as the complement of spatial details for the fused images [8].
Multiscale decomposition methods such as the pyramid transform and wavelet transform are widely used in pansharpening [10]. The process of the MRA-based methods is generally divided into three steps: decomposing the source images into sub-images at different levels with a pyramid or wavelet transform, fusing the sub-images of the MS and PAN images at each level and finally performing the inverse transform and reconstructing the fused image [11]. The iterative decomposition scheme is the distinguishing feature of this class of methods [12]. The MRA-based methods can maintain high spectral quality, and the research hotspots are utilizing different decomposition schemes and optimizing the injection coefficients [5]. The variational optimization-based methods share the idea of considering the PAN and MS images as coarse measurements of the high-spatial-resolution MS image, which can be estimated as the regularized solution of this ill-posed inverse problem [13]. The variational optimization-based techniques perform well in both spectral and spatial fidelity, with the drawback that they require more optimization iterations than the CS and MRA methods [14]. The last category is machine learning-based methods, where compressed sensing-based [15] and dictionary-based [16] versions are the early algorithms, and CNNs [17][18][19][20][21][22][23][24][25][26] and GANs [27][28][29][30][31] in deep learning are moving into the field of pansharpening with promising achievements. The pansharpening network (PNN) [32] achieved encouraging results and attracted a following among researchers. Most super-resolution algorithms cannot be used directly for pansharpening because they cannot fully utilize the spatial information of PAN images. The requirement of CNNs for large numbers of training samples is also an essential limitation of deep learning for pansharpening [5,17,22].
Among these methods, CS-based methods have the best spatial fidelity and the worst spectral fidelity. The spatial component synthesis in the earlier CS methods is simpler and uses less spectral information, laying the groundwork for their spectral distortion. BDSD [4] and PRACS [33] use a downsampled PAN for spatial component synthesis, which explains the poorer spatial fidelity of these improved CS methods. These methods cannot maintain spectral and spatial fidelity simultaneously. The important advantages of CS methods are their high spatial fidelity and good tolerance to misregistration. In the component substitution framework, histogram matching and fitting is the most common way to reduce spectral distortion, but these weak constraints lead to worse results. The new spatial component (I_new) can be regarded as a function of the original spatial component (I_0) and the PAN image. Proper constraints can be imposed to construct the spatial component (I_new) by minimizing an objective functional. In this paper, we propose an optimization framework, called the GIHS-TV framework, that builds the new spatial component from an optimization perspective. The goal is to reduce the spectral distortion of the CS-based approaches while maintaining the spatial details. Furthermore, the L1-norm total variation as a constraint on the spatial component generated by the IHS transform is proposed as a simple implementation of the GIHS-TV framework. Experimental results on the "PAirMax" dataset indicate that GIHS-TV performs well, especially in spectral fidelity.

Related Works
CS-based methods assume that a transformation can separate the spatial component from the spectral information of the different bands. However, the separated spatial component is a weighted sum of the MS bands, which usually contains spectral information and does not precisely match the spectral response of panchromatic sensors. Some spectral information is lost after the component substitution, which accounts for the poor performance in spectral fidelity [5,6]. IHS [1,34], PCA [35], BT [36] and GS [2] are early classical algorithms that are easy to implement and fast to compute. These methods focused on the construction of the spatial component. The spatial component of the IHS transform [1,34] and the Brovey transform (BT) [36] is the mean of all bands. PCA [35] considers the first principal component as the spatial component. The spatial components of these transforms contain certain spectral information. The GS [2] method conducts Gram-Schmidt orthogonalization with the original multi-band data and the means of each band and then performs the subsequent component substitution and its backward transformation. The mean intensity information is incorporated before the transformation in GS. Its improved version, GSA [37], fits the low-pass-filtered PAN image with the multispectral data and generates the spatial component as a linear combination of the MS bands whose weights are solved by minimizing the RMSE.
After GIHS [6] was proposed, the focus of subsequent research on most CS approaches shifted from constructing the spatial component to designing injection forms. BDSD [4] jointly estimates the weights and gain coefficients by minimizing the MSE. An improved method based on physically constrained optimization, BDSD-PC [38], was proposed to raise the fusion quality. The PRACS [33] approach proposed the concept of partial substitution of the spatial component. The weighted sum of the MS and PAN images is used to construct the new spatial component. Scale factors and correlation coefficients are adopted for optimization to remove local spectral instability errors.
Poor performance in terms of spatial fidelity usually exists with the MRA-based and variational methods. The typical MRA-based methods include ATWT [8], ATWT-M2 [9], ATWT-M3 [9], MTF-GLP [10], MTF-GLP-CBD [10], MTF-GLP-HPM-PP [33], MTF-GLP-HPM [39], HPF [40], AWLP [12], Indusion [41], SFIM [42,43], etc. Early MRA methods often took the strictly sampled discrete wavelet transform (DWT) approach, which does not have translation invariance, so the undecimated discrete wavelet transform gradually replaced it [8,9,12,39]. The Laplacian pyramid extracts spatial details from PAN images and adds these details to MS images. The generalized Laplacian pyramid (GLP) generalizes the LP to arbitrary fraction ratios [3]. The difference of the modulation transfer functions (MTF) between the MS and PAN sensors tends to cause spatial and spectral distortion in the fused images. Introducing the spectral response information of sensors into the multiscale decomposition framework is a milestone of the MRA-based approaches [10,33,39,44]. The MTF-GLP [10] method conducts operations such as MTF filtering and interpolation to obtain detailed images, which are injected into the multispectral images to obtain pansharpened images. The MTF-GLP achieved similar fused results to the ATWT method. MTF-GLP-HPM [39], using high pass modulation, improved the spectral quality on the basis of MTF-GLP. MTF-GLP-CBD [10] adopted injection coefficients estimated by multivariate linear regression, which achieved good performance on spatial and spectral fidelity. Some critical approaches employ a combination of CS and MRA in pansharpening [7]. The variational optimization-based methods rely on the models describing how the low-resolution MS and PAN images degrade from the high-resolution MS images. The complicated spectral and spatial constraints were constructed with large amounts of basic assumptions and priors. 
These models usually consist of at least three terms that depict the subsampling and degradation process, where level sets [45], sparse representation [46][47][48] and Bayesian models [15,49] were introduced to regularize the ill-posed problem. P+XS [45] pioneered variational pansharpening in the form of total variation regularization. Subsequent variational fusion models adopted TV or its derivatives [15,50,51], its nonlocal extension [52] or a fractional-order model [13] as the regularization terms. Even though these models account for noise amplification in the degradation process, there is still considerable room for improvement in the variational optimization-based methods, because they are bound too tightly to the framework proposed by Ballester [45], making it difficult to incorporate approaches from other frameworks. In addition, sparse representation-based approaches form an essential class of pansharpening methods due to their effectiveness in local structure extraction. They are also considered variational optimization-based algorithms because these methods use variational models directly or indirectly [15]. It is worth noting that the performance of variational optimization-based methods is very sensitive to the values of hyperparameters, both in terms of computation time and performance. Therefore, determining hyperparameters usually requires a careful optimization phase, which may limit their performance and application scenarios. Variational optimization-based methods may become more practical if suitable hyperparameters can be determined quickly. Machine learning works effectively in image processing, including image fusion [7,14]. Masi [32] introduced convolutional neural networks with a simple three-layer architecture to the pansharpening problem, which achieved competitive results.
Yang [53] designed a deep network architecture called PanNet with a strong generalization ability incorporating domain-specific priors for spectral and spatial preservation. Recently, the basic idea of model-driven networks has become popular. The GPPNN [54], the first model-driven deep network for pansharpening, formulated two optimization problems for the generative models for PAN and LRMS images, which perform well visually and quantitatively.

GIHS-TV Fusion Framework
Tu [6] simplified the component substitution framework into a detail injection scheme (GIHS) by mathematical derivation. In this paper, a module for constructing new spatial components using optimization methods is integrated into the GIHS scheme, which is called the GIHS-TV framework. The general steps of the GIHS-TV framework in the component substitution method are as follows: up-sampling the low-resolution multispectral image (LRMS) to the PAN image size to obtain MS, calculating the spatial component (mean intensity, I_0) with the weights (ω_k) for the LRMS bands and constructing the objective function F(I_new, I_0, PAN) based on custom constraints. Then, an iterative method is selected to solve for I_new by minimizing the function; according to the selected method of constructing the spatial component, the detail gain coefficients (g_k) are determined and the detail residuals calculated (δ = I_new − I_0). The detail injection, whose expression is as in (1), is calculated to obtain the fused image.
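The steps above can be sketched in code as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, nearest-neighbor upsampling stands in for the interpolation actually used, and the TV optimization of step 3 is passed in as a callable so it can be stubbed out.

```python
import numpy as np

def upsample(ms, ratio):
    # Nearest-neighbor upsampling as a stand-in for the interpolation kernel.
    return np.repeat(np.repeat(ms, ratio, axis=1), ratio, axis=2)

def gihs_tv_fuse(lrms, pan, weights, gains, solve_tv):
    """GIHS-TV pipeline sketch: lrms is (N, h, w), pan is (H, W)."""
    ratio = pan.shape[0] // lrms.shape[1]
    ms = upsample(lrms, ratio)                 # step 1: upsample LRMS to PAN size
    i0 = np.tensordot(weights, ms, axes=1)     # step 2: spatial component I_0
    i_new = solve_tv(i0, pan)                  # step 3: minimize F(I_new, I_0, PAN)
    delta = i_new - i0                         # step 4: detail residuals delta
    fused = ms + gains[:, None, None] * delta  # step 5: detail injection, Eq. (1)
    return fused
```

With `solve_tv = lambda i0, pan: i0` the pipeline degenerates to returning the upsampled MS image, which is a convenient sanity check.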
In addition, we proposed a new L1-TV optimized method under the GIHS-TV framework. Figure 2 clearly illustrates the construction process of the objective function and the vital role of total variation in the GIHS-TV framework.

Generalized IHS Transform
To visualize the construction process of the injection expression, we first performed the derivation of the three-channel form of the IHS method. The IHS transform extracts the spatial component (I_0), the hue component (H) and the saturation component (S) from the three-channel image. The forward transform of IHS is as shown in (2).
where I_0 and I_new individually represent the spatial component before and after the substitution. The inverse transformation is as shown in (3).
Then, the gain coefficients and injection expression could be determined by the following.
where [R', G', B']^T are the three channels of the fused image with the IHS method and δ = I_new − I_0. The detail injection expression for the IHS transformation is as shown in Equation (5). Thus, it is easy to determine the weights and gain coefficients in the IHS fusion method, i.e., ω_k = 1/3, g_k = 1.
For the GIHS method's N-channel (N > 3) form, the GIHS transformation is as in Equation (6). The spatial component (mean intensity) is synthesized as in (7).
After replacing I_0 with I_new, the fused multispectral image can be obtained by Equation (8).
where Φ^{−1} is the inverse of the matrix Φ. From (8), it can be seen that only the first column of the matrix Φ^{−1} acts in the detail injection, so it can be written as a new vector g = [g_1, g_2, ..., g_N]^T, called the detail gain vector. Then, Equation (8) can be written as a more concise expression for detail injection.
The uniform detail injection form can be written as in (10): MS'_k = MS_k + g_k δ, ∀k.

L1-TV Optimization
Since the new spatial component can be obtained from the original spatial component and the PAN image, I_new can be regarded as a function of I_0 and PAN. As for the problem of spectral distortion, the intensity distribution could be the most critical constraint. Combining this with the need to enhance the spatial detail of multispectral images, we establish the objective functional for pansharpening. Specifically, the new framework (GIHS-TV) proposed in this paper treats the construction of the new spatial component as an optimization problem.
First, the intensity distribution of the new spatial component I_new should be as consistent as possible with the spatial component I_0, i.e., the difference between the two should be as small as possible. To address the problem of spectral distortion, we formulated constraint (11) as the fidelity term, which should be small enough.
On the other hand, the PAN image contains more edge information. Due to the significant difference in intensity distribution between the spatial component and the PAN image, it is more reasonable to use the gradient rather than the intensity to express the edge information. Therefore, the gradient of I_new should be consistent with that of the PAN image, and it can be regarded as the regularization term, as in Equation (12).
The fusion task is then depicted as minimizing the objective function (13) to obtain spatial and spectral fidelity simultaneously.
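Based on the descriptions of (11) and (12), the objective function plausibly takes the following form. This is a reconstruction consistent with the surrounding text rather than the paper's exact equation; λ is the hyperparameter balancing fidelity against regularization, and ∇ denotes the image gradient:

```latex
\min_{I_{\mathrm{new}}}\; F(I_{\mathrm{new}}, I_0, \mathrm{PAN})
= \left\| I_{\mathrm{new}} - I_0 \right\|_p^p
+ \lambda \left\| \nabla I_{\mathrm{new}} - \nabla \mathrm{PAN} \right\|_q^q
```

Setting p = q = 1, as in the following paragraph, yields the L1-TV form.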
Let p = 1, q = 1. The rationale for this choice of norms is as follows. Firstly, preserving the intensity distribution as much as possible is desirable, i.e., most of the fidelity term, I_new − I_0, should be zero. Given this objective, a small part of I_new − I_0 should be large in order to transfer the gradient information from the PAN image to the new spatial component. The fidelity term should therefore follow a Laplacian or impulsive distribution, so the L1 norm is chosen as the constraint. Secondly, the sparsity of gradients is encouraged, since natural images are usually piece-wise smooth and their gradients tend to be sparse. The regularization also adopts the L1 norm, because the relevant mathematical theory guarantees that the L1 norm can obtain a sparse solution. This objective function expects most of the difference between I_new and I_0 to be zero, where the non-zero items indicate the gradient information added from the PAN image, which ensures the sparsity of the fidelity term and the consistency of the intensity distribution. On the other hand, the L1 norm of the gradient, i.e., the total variation, also encourages the sparsity of the gradient. As for the solution of the objective function, it can be turned into an L1-TV minimization problem in the variable Diff with a simple variable substitution, as in (14) [55], and then Diff can be solved by IRN [56], ADMM [57], FISTA [58], etc. The Iteratively Reweighted Norm (IRN) method [56] was chosen in this paper, whose flow is shown in Algorithm 1.
Algorithm 1 The solution flow of Diff in the GIHS-TV algorithm.

Input linear operator
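Algorithm 1's body is not reproduced here, but the IRN idea can be illustrated on a 1-D analogue of the problem, min_x ||x − b||_1 + λ||Dx||_1, where D is a finite-difference operator. This is a sketch of the iteratively reweighted least-squares principle behind IRN, not the paper's exact algorithm; the dense solve is for clarity only.

```python
import numpy as np

def irn_l1tv(b, lam=1.0, iters=30, eps=1e-6):
    """IRN/IRLS sketch for min_x ||x - b||_1 + lam * ||Dx||_1 in 1-D."""
    n = len(b)
    # Forward-difference operator D of shape (n-1, n).
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    x = b.copy()
    for _ in range(iters):
        wf = 1.0 / np.maximum(np.abs(x - b), eps)   # fidelity weights
        wr = 1.0 / np.maximum(np.abs(D @ x), eps)   # regularizer weights
        # Weighted least-squares normal equations:
        #   (W_f + lam * D^T W_r D) x = W_f b
        A = np.diag(wf) + lam * D.T @ (wr[:, None] * D)
        x = np.linalg.solve(A, wf * b)
    return x
```

On a noisy piece-wise constant signal, increasing `lam` flattens the estimate, which mirrors the role of λ in the 2-D model: it controls how strongly gradient sparsity is enforced.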
The new spatial component can be obtained from Equation (15). After the inverse transformation is conducted with I_new, the multispectral image with high spatial resolution is obtained.
As for the new spatial component I_new, the simplest synthesis approach was adopted in this paper, i.e., the weights ω_k = 1, ∀k. In addition, the detail gain coefficients g_k satisfy (16).
Thus, g_k = 1, ∀k. The detail residuals δ can then be obtained.
The fast implementation expression in the method with L1-TV constraints for constructing the new spatial component can be determined as in (18).
It is worth mentioning that the three-channel and the multi-channel form of the GIHS-TV share the same form of injection expression. The difference between them is only in synthesizing the mean intensity (spatial component), which brings great convenience to the two types of fusion.

Datasets and Evaluation Metrics
In this paper, "PAirMax" [14], a publicly available fusion dataset produced by Vivone et al., was selected to explore the performance of GIHS-TV. The dataset consists of 14 pairs of images, including MS image (4-band, 8-band) and PAN image from GeoEye-1 (GE), WorldView-2 (W2), WorldView-3 (W3), WorldView-4 (W4), SPOT-7 (S7) and Pleiades (Pl), where the resolution of PAN is four times that of MS images, and the dataset has been registered. The dataset provides three types of scenarios from different countries: urban (Urb), natural (Nat) and mixed urban-natural (Mix). It shows many kinds of urban environments: typical, dense, with long shadows, water or vegetation. The data names consist of the satellite, location and scene type.
(1) SAM: SAM reflects the spectral fidelity between the fused image and the reference MS images by calculating the spectral angle between the corresponding pixels of the two images. Denote F{i} and R{i} as the grayscale values at position i in the fused and MS images, respectively. The calculation is as in Equation (19).
The average of the spectral angle of all pixels is regarded as the SAM of the two images. The smaller the SAM value, the higher the spectral fidelity.
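The SAM metric as described can be sketched as follows. This is a standard implementation of the definition above; the conversion to degrees and the ε-guard against zero-norm pixels are our assumptions.

```python
import numpy as np

def sam(fused, ref, eps=1e-12):
    """Mean spectral angle (degrees) between two (N, H, W) images."""
    f = fused.reshape(fused.shape[0], -1)
    r = ref.reshape(ref.shape[0], -1)
    dot = np.sum(f * r, axis=0)
    norm = np.linalg.norm(f, axis=0) * np.linalg.norm(r, axis=0)
    # Clip to [-1, 1] to guard against floating-point overshoot in arccos.
    ang = np.arccos(np.clip(dot / np.maximum(norm, eps), -1.0, 1.0))
    return float(np.degrees(ang.mean()))
```

Identical images give SAM = 0, and spectrally orthogonal pixels give 90 degrees.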
(2) SCC: The high-frequency information is extracted from the PAN and fused images by high-pass filters; then, the correlation between the two is calculated through correlation coefficients. Its definition is as in Equation (20), where HF_k stands for the high-frequency information of the kth band of the fused image, and HP is the high-frequency information of the PAN image. The larger the SCC, the better the spatial correlation between the fused image and the PAN image.
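A minimal SCC sketch follows. The 3x3 Laplacian high-pass filter and the valid-region convolution are common choices but our assumptions; the metric averages the per-band correlation coefficients against the high-pass PAN.

```python
import numpy as np

LAPLACIAN = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)

def highpass(img):
    """3x3 Laplacian high-pass filter, valid region only."""
    out = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
    for di in range(3):
        for dj in range(3):
            out += LAPLACIAN[di, dj] * img[di:di + out.shape[0], dj:dj + out.shape[1]]
    return out

def scc(fused, pan):
    """Mean correlation between high-pass fused bands and the high-pass PAN."""
    hp = highpass(pan).ravel()
    ccs = [np.corrcoef(highpass(band).ravel(), hp)[0, 1] for band in fused]
    return float(np.mean(ccs))
```

A fused image whose bands are affine copies of the PAN image yields SCC = 1, since the high-pass filter removes the offset and the correlation coefficient is scale-invariant.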
(3) D_λ: D_λ characterizes the spectral loss between the fused and the MS images and is defined by the following equation.
where p amplifies the spectral differences and is usually set to 1. The smaller D_λ is, the smaller the spectral distortion. Q is the universal image quality index (UIQI). UIQI calculates the correlation, brightness and contrast similarity between the fused and the reference images to characterize the comprehensive performance of fusion. It is usually abbreviated as the Q index, and its definition is as follows.
where σ_F and σ_R are the standard deviations of the fused and the reference images, respectively, σ_FR is the covariance of the two, and µ_F and µ_R denote their mean values, respectively. The three factors represent the correlation, mean brightness similarity and contrast similarity. A higher Q value means that the fused image is more similar to the reference image.
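The three-factor Q index described above can be sketched as follows. Note this is the global form for illustration; in practice the index is usually computed over sliding windows and averaged.

```python
import numpy as np

def q_index(fused, ref):
    """Universal image quality index Q (global version)."""
    f, r = fused.ravel(), ref.ravel()
    mf, mr = f.mean(), r.mean()
    sf, sr = f.std(), r.std()
    cov = ((f - mf) * (r - mr)).mean()
    # Q = correlation * mean-brightness similarity * contrast similarity
    return (cov / (sf * sr)) \
        * (2 * mf * mr / (mf**2 + mr**2)) \
        * (2 * sf * sr / (sf**2 + sr**2))
```

For two identical non-constant images all three factors equal 1, so Q = 1, its maximum.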
(4) D_S: D_S is used to calculate the loss of spatial detail between the fused and the PAN images. It is defined by the following.
where PAN_LP is the PAN image downsampled to the same resolution as the original MS images. From a practical point of view, it should be ensured that PAN_LP is perfectly aligned with the MS image; otherwise, this metric loses its meaning. q serves to amplify the difference in spatial detail distortion. The smaller the D_S, the smaller the spatial detail distortion.
(5) QNR: QNR is the most mainstream no-reference index for evaluating the performance of pansharpening methods, which integrates the spectral and spatial distortion characterizations, as defined below.
where α and β are parameters used to balance the spectral distortion and spatial distortion, and the larger the QNR, the better the fusion performance, with a maximum value of 1.
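For reference, the standard definitions in the QNR protocol that match the descriptions above read as follows (as commonly given in the pansharpening literature; α, β, p and q are usually set to 1, and F_l and M_l denote the l-th band of the fused and original MS images):

```latex
\begin{aligned}
D_\lambda &= \left( \frac{1}{N(N-1)} \sum_{l=1}^{N} \sum_{\substack{r=1 \\ r \neq l}}^{N}
  \left| Q(F_l, F_r) - Q(M_l, M_r) \right|^{p} \right)^{1/p}, \\
D_S &= \left( \frac{1}{N} \sum_{l=1}^{N}
  \left| Q(F_l, \mathrm{PAN}) - Q(M_l, \mathrm{PAN}_{LP}) \right|^{q} \right)^{1/q}, \\
\mathrm{QNR} &= (1 - D_\lambda)^{\alpha} \, (1 - D_S)^{\beta}.
\end{aligned}
```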

Initial Results
In the GIHS-TV framework, the residuals between the new and original spatial components (I_0 and I_new) stand for the detail injections added to the fused images. Their consistency and differences were analyzed in terms of global and local performance. In addition, the residuals of the IHS method are provided for comparison with those of the GIHS-TV method.
The experiments in the scene "GE_Lond_Urb" show the spatial component images before and after the fusion. Figure 3 shows that the two images maintain good consistency in global hue, which indicates the spectral fidelity of the GIHS-TV method. In the boxes, the texture and contours of the buildings in I_new are richer than those in I_0. Furthermore, the residuals (δ) between I_0 and I_new were evaluated in terms of grayscale and edges. Histogram stretching was performed on the residuals. Another implication of the residuals is the detail injection in our framework, which accounts for the selection of the IHS method for the auxiliary analysis, where the two methods share the same I_0 and our method produces the new spatial component by L1-TV.
In Figure 4, clear edges are shown in the residuals from the GIHS-TV method, achieving the goal of adding edge information from the PAN image to I_new without altering the grayscale distribution as much as possible. Many artifacts near the edges were generated in the residuals from the IHS method. In particular, the interior of the building in the red box was almost filled with white; dramatic brightness changes occurred in the new spatial component, and the final fused images showed spectral distortion. Several other local details are shown in Figure 4, where our method also has cleaner edge details.
The proportion of edge information added from the PAN image is controlled through the hyperparameter λ in the model. The fusion effects for different λ values were analyzed in terms of both visual effects and evaluation metrics in the "Pl_Sacr_Mix" scene. Figure 5 shows that the results are consistent with the intuition that as the hyperparameter increases, the geometric information becomes richer while the spectral distortion increases. At the same time, there is a tipping point (λ = 0.7) and a bottleneck point (λ = 2.8). D_S and QNR achieve their best values at λ = 0.7, but the visual results of the fusion in Figure 6 show a good balance between spectral fidelity and spatial fidelity when λ is equal to 1.

Fusion Results
Images labeled EXP [3] are obtained by MS image interpolation using a polynomial kernel with 23 coefficients. The hyperparameter λ in the GIHS-TV method was set to 1 and 2, and the parameters of the other methods were kept consistent with the corresponding articles. Experiments on the "PAirMax" dataset show that most fusion methods enhance spatial details compared with the original MS images. Objects that cannot be identified in any single original image can be easily identified and inferred in the fused images. However, the CS approaches exhibit various degrees of color distortion. Fused images of typical scenes are shown and analyzed below.
In "GE_Lond_Urb" (Figure 7), the fused images all perform well in the enhancement of spatial details. The details within the shadows are enhanced well, such as the cars and the color, contours and texture of the trees and the low buildings. The textures of the tall buildings outside the shadows also become apparent, and even their structures or materials can be inferred. However, PCA, IHS and GS all exhibit a global blue hue that differs from the original MS image. This shows the color distortion that occurs in the CS approaches. In contrast, GIHS-TV performs better in terms of global hue and the spectral reconstruction of local objects.
In "W2_Miam_Mix" (Figure 9), we mainly reference the color of the grass and trees and the global hue. The PCA, IHS, BT and BDSD methods show poor global hue, and BDSD performs poorly in spatial expression. It is worth pointing out that the GIHS-TV method contains richer spatial details when the hyperparameter is greater than or equal to 2 but also exhibits color distortion, which also occurs in Figure 8. If the fused images are to be used for visual tasks, a larger λ value works better; λ = 2 is recommended as an empirical value based on our experimental results.
In the local view of "Pl_Hous_Urb" (Figure 10), the fused images of all methods are rich in spatial details, especially the target contours and textures of cargo bins, trucks and plants. In addition, road lines are effectively enhanced. GIHS-TV achieves a visual effect comparable to the other methods.
The local fused results of "W4_Mexi_Urb" are shown in Figure 11, where the spatial details in the fused images are rich. It can be seen that, compared with other methods, the PRACS method is slightly weaker in rendering spatial details, such as the poor clarity of building contours. The BDSD method tends to introduce spatial artifacts.
These two improved CS-based methods are centered on spectral fidelity and do not use the PAN image directly when constructing the spatial component but instead use a downsampled PAN or a partial image, and their spatial detail injection is not as good as that of other methods, which explains the lack of effective improvement in their spatial fidelity. GIHS-TV fused images perform well in both spectral and spatial fidelity, and many small targets can be effectively identified, such as buildings, cars, containers and objects on building roofs. The experimental performance with high spatial and spectral fidelity fully illustrates the advantages of the GIHS-TV framework of constructing spatial components with optimization ideas. GIHS-TV achieved a superior performance.
Representative methods of CS and MRA were selected for the comparison and evaluation of fusion methods in the full-resolution scheme provided by Vivone [5,11]. As shown in Figure 12, it can be seen that the evaluation metrics of the GIHS-TV method fusion perform well, with lower D λ and higher QNR in most scenes. The lower D λ indicates good performance in spectral fidelity, and a higher QNR stands for an excellent comprehensive fidelity.
Among the many methods, only the BDSD method can outperform the GIHS-TV metrics in most scenes, but based on the visual analysis above, it is clear that BDSD does not perform as well as GIHS-TV and other methods.
The above visual effects show that the GIHS-TV method effectively reduces the spectral distortion of the component substitution class. The analysis of the evaluation metrics shows better performance than these methods. A comparison with MRA-based methods was also conducted. Table 1 reports the mean values of six metrics over the 13 scenes, excluding "W3_Muni_Nat". GIHS-TV performs best in all full-resolution metrics, i.e., D_λ, D_S and QNR. As for the performance at reduced resolution, our method behaves well in both SAM and SCC. In Figure 13, GIHS-TV performs with high spectral fidelity and more spatial details compared with the IHS method and the LRMS images, consistent with its performance at full resolution.
Compared with the ATWT-M2 and ATWT-M3 methods, GIHS-TV needs a slightly longer computation time due to the optimization iterations. It is more meaningful to make comparisons within the same class of methods. Thus, the P+XS method was selected for comparing the running time in the "GE_Tren_Urb", "W3_Muni_Urb", "W3_Muni_Mix" and "W4_Mexi_Nat" scenes. As shown in Table 2, the optimization times of the GIHS-TV method are much lower than those of the P+XS method, by approximately one order of magnitude. This is attributed to the sparsity and conciseness of the proposed model. The analysis involving visual effects and metrics shows that the GIHS-TV method is excellent in terms of fusion with high spectral and spatial fidelity. Compared with traditional TV-based methods, our method is more time-efficient.

Conclusions
This paper adopted an optimization perspective to improve the CS-based pansharpening methods and build the GIHS-TV framework. Faced with the loss of spectral information in fused images, we proposed a method with an L1-TV constraint on the spectral-spatial information in the new spatial component, effectively reducing the spectral distortion. Its fast algorithm was implemented in the framework. Compared with other variational optimization-based methods, the GIHS-TV framework absorbs the advantage of high spatial fidelity from the CS-based methods. Experiments on the "PAirMax" dataset show that GIHS-TV can maintain both the spectral and spatial information from the MS and PAN images well. The spectral fidelity of GIHS-TV is greatly improved compared with other CS approaches.