Variational Bayesian Pansharpening with Super-Gaussian Sparse Image Priors

Pansharpening is a technique that fuses a low spatial resolution multispectral image and a high spatial resolution panchromatic one to obtain a multispectral image with the spatial resolution of the latter while preserving the spectral information of the multispectral image. In this paper we propose a variational Bayesian methodology for pansharpening. The proposed methodology uses the sensor characteristics to model the observation process and Super-Gaussian sparse image priors on the expected characteristics of the pansharpened image. The pansharpened image, as well as all model and variational parameters, are estimated within the proposed methodology. Using real and synthetic data, the quality of the pansharpened images is assessed both visually and quantitatively and compared with other pansharpening methods. Theoretical and experimental results demonstrate the effectiveness, efficiency, and flexibility of the proposed formulation.


Introduction
Remote sensing sensors simultaneously capture a Multispectral (MS) low resolution image along with a single-band high resolution image of the same area, referred to as Panchromatic (PAN) image. However, MS high-resolution images are needed by many applications, such as land use and land cover analyses or change detection. Pansharpening is a technique that fuses the MS and PAN images into an MS high resolution image that has the spatial resolution of the PAN image and the spectral resolution of the MS one.
In this paper we formulate the pansharpening problem following the Bayesian framework. Within this framework, we use the sensor characteristics to model the observation process as a conditional probability distribution. The observation process describes both the relationship between the MS high resolution and MS low resolution images and how the PAN image is obtained from the MS high resolution one. This probability distribution provides fidelity to the observed data in the pansharpened image reconstruction process. Together with fidelity to the data, Bayesian methods incorporate prior knowledge on the MS high resolution image in the form of prior probability distributions. Crisp images, such as high resolution MS images, are expected to have Super-Gaussian (SG) statistics, while upsampled images suffer from blur that smooths out sharp gradients, making their statistics more Gaussian [1]. Our goal is to integrate the sharp edges of the PAN image into the pansharpened image, leading to less Gaussian statistics, which makes SG priors a suitable choice. SG priors have been successfully applied to other image processing tasks, such as compressed sensing [2], blind deconvolution [1,3] and blind color deconvolution [4], so they are also expected to produce good results in pansharpening. However, the form of the SG prior does not allow us to obtain the posterior distribution in an analytical way, making full Bayesian inference intractable. Hence, in this paper, we use variational Bayesian inference to estimate the distribution of the pansharpened image as well as the model parameters from the MS low resolution and PAN images.
The rest of the paper is organized as follows: a categorization and short review of related pansharpening methods is presented in Section 2. In Section 3 the pansharpening problem is mathematically formulated. Following the Bayesian modelling and inference, in Section 4 we propose a fully Bayesian method for the estimation of all the problem unknowns and model parameters. In Section 5, the quality of the pansharpened images is assessed both visually and quantitatively and compared with other classic and state-of-the-art pansharpening methods. In Section 6 we discuss the obtained results and finally, Section 7 concludes the paper.

Related Work
Early pansharpening techniques, such as the Brovey method [5], substituted some bands for image visualization or performed simple arithmetic transformations. Other classical methods included transforming the MS image and substituting one of its components by the high spatial resolution PAN image. Examples of this strategy are the Principal Component Analysis (PCA) substitution [6], Brovey Transform [7] and Intensity-Hue-Saturation (IHS) [8] methods. A review of those early methods, among others, can be found in [9].
Over the past 20 years, numerous methods have been presented and, in an attempt to bring some order to this diversity of approaches, different reviews, comparisons and classifications have been proposed in the literature (see, for instance, [10-17]), each one with different criteria and, therefore, a different categorization. Nevertheless, in recent years a consensus seems to have formed around three main categories, namely Component Substitution (CS), Multi-Resolution Analysis (MRA) and Variational Optimization (VO) [15-17]. Additionally, the increasing number of Deep Learning (DL)-based pansharpening methods proposed in recent years can be regarded as a new category.
The Component Substitution (CS) category includes the most widely used pansharpening methods. CS methods [12] usually upsample the MS image to the size of the PAN image and transform it to another space that separates the spatial and spectral image components. Then, the transformed component containing the spatial information is substituted by the PAN image (possibly, after histogram matching). Finally, the backward transform is applied to obtain the pansharpened image. Examples of these methods include the already mentioned PCA substitution [6], IHS methods [8,18,19], the Gram-Schmidt (GS) methods [20] and Brovey transform [7]. In [21], the transformation is replaced by any weighted average of the MS bands. It is shown that this approach generalizes any CS image fusion method. Determination of the weights has been carried out in different ways. For instance, in [22] the weights are optimally estimated to minimize the mean squared error while in [23] they are set to the correlation coefficient between a single band low resolution image (obtained from the MS image) and each MS band. A local criterion, based on the belonging of a given pixel to a fuzzy cluster, was applied in [24] to estimate weights that are different for each pixel of the image. To obtain a crisper MS high-resolution image, in [25] a Wiener deconvolution of the upsampled MS bands was performed before fusion.
In general, CS-based methods produce spectral distortions due to the different statistics of the PAN image and the transformed component containing the spatial details. To tackle this issue, Multi-Resolution Analysis (MRA) methods decompose the MS and PAN images into different levels, extract spatial details from the decomposed PAN image, and inject them into the finer scales of the MS image. This principle is also known as the ARSIS concept [10]. The High-Pass Filtering (HPF) algorithm in [11,18] can be considered the first approach in this category, where only two levels are considered. Multi-scale decompositions, such as the wavelet transform (WT) [26-28], the Generalized Laplacian Pyramid (GLP) [29-31] or the Non-Subsampled Contourlet Transform (NSCT) [32-34], were used to bring more precision to the methods. The "à trous" wavelet transform (AWT) was the preferred decomposition technique [26,28] until the publication of [31] showed the advantages of GLP over AWT. This was later corroborated in [14], where a comparison of different methods based on decimated and undecimated WT, AWT, GLP and NSCT concluded that GLP outperforms AWT because it better removes aliasing. The MRA category also includes the Smoothing Filter Based Intensity Modulation (SFIM) method [35,36], which first upsamples the MS image to the size of the PAN one and then uses a simplified solar radiation and land surface reflection model to increase its quality, and the Indusion method [37], in which the upscaling and fusion steps are carried out together.
Deep Learning (DL) techniques have gained prominence in recent years and several DL methods have been proposed for pansharpening. As far as we know, the use of Deep Neural Networks (DNN) for pansharpening was first introduced in [38], where a Modified Sparse Denoising Autoencoder (MSDA) algorithm was proposed. For the same task, a Coupled Sparse Denoising Autoencoder (CSDA) was used in [39]. Convolutional neural networks were introduced in [40] and also used, for instance, in [41]. Instead of facing the difficult task of learning the whole image, residual networks [42,43] learn, from upsampled MS and PAN patches, only the details of the MS high resolution image that are not already in the upsampled MS image and add them to it to obtain the pansharpened image. To adjust the size of the MS image to the size of the PAN one in a coarse-to-fine manner, two residual networks were set in cascade in the so-called Progressive Cascade Deep Residual Network (PCDRN) [44]. In [45], a multi-scale approach is followed by learning a DNN to upsample each NSCT directional sub-band from the MS and PAN images. In general, the main weaknesses of DL techniques are the high computational resources needed for training, the need for a huge amount of training data, which, in the case of pansharpening, might not be available, and the poor generalization to satellite images not used during training. The absence of ground-truth MS high resolution images, needed for training these DL methods, is a problem pointed out in [46], where an unsupervised generative adversarial network (Pan-GAN) was proposed. The GAN aims to generate pansharpened images that are consistent with the spectral information of the MS image while maintaining the spatial information of the PAN image. However, the generalization of this technique to satellite images different from the ones used for training is not clear.
The adaptation of general image fusion methods, like the U2Fusion method in [47], to the pansharpening problem is a promising research area.
Variational Optimization (VO)-based methods present advantages from both theoretical and computational points of view [48]. VO-based methods mathematically model the relation between the observed images and the original MS high resolution image, building an energy functional based on some desired properties of the original image. The pansharpened image is obtained as the image that minimizes this energy functional [49]. This mathematical formulation allows us to rigorously introduce into the energy functional, and process, features that are visually important. Variational optimization can be considered a particular case of the Bayesian approach [50], where the estimated image is obtained by maximizing the posterior probability distribution of the MS high resolution image. Bayesian methods for pansharpening formulate the relations between the observed images and the original MS high resolution image as probability distributions, model the desired properties as prior distributions and use Bayes' theory to estimate the pansharpened image based on the posterior distribution of the original MS high resolution image.
Following the seminal P+XS method [51], the PAN image is usually modelled as a combination of the bands of the original high resolution multispectral image. However, in [49] this model was generalized by substituting the intensity images by their gradients. Note that while the P+XS method [51] preserves spectral information, it produces blurring artifacts. To remove blur while preserving spectral similarity, other restrictions are introduced as reasonable assumptions or prior knowledge about the original image, such as a Laplacian prior [52], total variation [53,54], sparse representations [55], band correlations [56,57], non-local priors [58,59], etc. Spectral information is also preserved by enforcing the pansharpened image to be close to the observed MS one when downsampled to the size of the latter [52,60,61]. A special class of VO-based methods are the super-resolution methods, which model pansharpening as the inverse problem of recovering the original high resolution image by fusing the MS image and the PAN one (see [52,62] for a recent review and [63] for a recent work). Deconvolution methods, such as [64], also try to solve the inverse problem, but the upsampling of the MS image to the size of the PAN one is performed prior to the pansharpening procedure. Registration and fusion are carried out simultaneously in [65].
Note that the variational Bayesian approach, also followed in this paper, is more general than variational optimization. While VO-based methods aim at obtaining a single estimate of the pansharpened image, the variational Bayesian approach estimates the whole posterior distribution of the pansharpened image and the model parameters, given the observations. When a single image is needed, the mode of the distribution is usually selected, but other solutions can be obtained, for instance, by sampling the estimated distribution. Moreover, the proposed approach allows us to simultaneously estimate the model parameters along with the pansharpened image within the same framework.

Problem Formulation
Let us denote by y the MS high resolution image hypothetically captured with an ideal high resolution sensor with B bands y_b, b = 1, . . . , B, of size p = m × n pixels, that is, y = [y_1^T, . . . , y_B^T]^T, where the superscript T denotes the transpose of a vector or matrix. Note that each band of the image is flattened into a column vector containing its pixels in lexicographical order. Unfortunately, this high resolution image is not available in real applications. Instead, we observe a low resolution MS image Y with B bands Y_b of size P = M × N pixels, with P < p. The bands in this image are flattened as well to express them as column vectors. The relation between each low resolution band, Y_b, and its corresponding high resolution one, y_b, is defined by

Y_b = DH y_b + n_b = B y_b + n_b, b = 1, . . . , B,    (1)

where D is a P × p decimation operator, H is a p × p blurring matrix, B = DH, and the capture noise n_b is modeled as additive white Gaussian noise with variance β_b^{-1}. A single band high resolution PAN image covering a wide range of frequencies is also provided by the sensor. This PAN image x of size p = m × n is modelled as a spectral average of the unknown high resolution bands y_b, as

x = Σ_{b=1}^{B} λ_b y_b + v,    (2)

where λ_b ≥ 0 are known quantities that depend on each particular satellite sensor, and the capture noise v is modeled as additive white Gaussian noise with variance γ^{-1}.
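As a concrete illustration of the degradation models in Equations (1) and (2), the following sketch simulates both observations. It assumes a Gaussian blur for H and plain stride decimation for D; the actual sensor operators differ, so this is only a minimal stand-in.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_ms_band(y_b, ratio=4, blur_sigma=1.0, noise_std=0.0, rng=None):
    """Simulate Eq. (1): Y_b = D H y_b + n_b, with H a blur and D a decimation."""
    blurred = gaussian_filter(y_b, sigma=blur_sigma)   # H y_b (assumed Gaussian blur)
    low_res = blurred[::ratio, ::ratio]                # D (H y_b), stride decimation
    if noise_std > 0:
        rng = rng or np.random.default_rng(0)
        low_res = low_res + rng.normal(0.0, noise_std, low_res.shape)
    return low_res

def simulate_pan(bands, lam, noise_std=0.0, rng=None):
    """Simulate Eq. (2): x = sum_b lambda_b y_b + v (spectral average of the bands)."""
    x = sum(l * band for l, band in zip(lam, bands))
    if noise_std > 0:
        rng = rng or np.random.default_rng(0)
        x = x + rng.normal(0.0, noise_std, x.shape)
    return x
```

Working with 2-D arrays here is equivalent to the flattened-vector formulation; flattening in lexicographical order recovers the y_b vectors of the text.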
Once the image formation has been formulated, let us use the Bayesian framework to tackle the problem of recovering y, the MS high resolution image, from the observed MS low resolution image Y and the PAN image x.

Bayesian Modelling and Inference
We model the distribution of each low resolution image Y_b, b = 1, . . . , B, following the degradation model in Equation (1), as a Gaussian distribution with mean By_b and covariance matrix β_b^{-1} I. Then, the distribution of the observed image Y is modelled by

p(Y|y, β) = Π_{b=1}^{B} N(Y_b | B y_b, β_b^{-1} I),    (3)

with β = {β_1, . . . , β_B}. Analogously, using the degradation model in Equation (2), the distribution of the PAN image x is given by

p(x|y, γ) = N(x | Σ_{b=1}^{B} λ_b y_b, γ^{-1} I).    (4)

The starting point for Bayesian methods is to choose a prior distribution for the unknowns. In this paper, we use SG distributions as priors for the MS high resolution image, as

p(y_bν|α_bν) = Z(α_bν)^{-1} exp( -α_bν Σ_{i=1}^{p} ρ(y_bν(i)) ),    (5)

with α_bν > 0, α = {α_11, . . . , α_BJ}, and Z(α_bν) a partition function. In Equation (5), {F_ν}_{ν=1}^{J} is a set of J high-pass filters, y_bν = F_ν y_b is the ν-th filtered version of band y_b, y_bν(i) is the i-th pixel value of y_bν, and ρ(·) is a penalty function. The image priors are placed on the filtered images y_bν. It is well known that the application of high-pass filters to natural images returns sparse coefficients: most of the coefficients are zero or close to zero while only the edge related coefficients remain large. Sparse priors enjoy SG properties: heavier tails, a more pronounced peak, and positive excess kurtosis compared to the Gaussian distribution. The distribution mass is located around zero, but large values have a higher probability than in a Gaussian distribution. For p(y_bν|α_bν) in Equation (5) to be SG, ρ(·) has to be symmetric around zero and the function ρ(√s) increasing and concave for s ∈ (0, ∞). This condition is equivalent to ρ′(s)/|s| being decreasing on (0, ∞), and allows ρ to be represented as

ρ(s) = inf_{η > 0} ( (1/2) η s² − ρ*(η) ),    (6)

where inf denotes the infimum, ρ*(·) is the concave conjugate of ρ(·), and η_bν = {η_bν(i)}_{i=1}^{p} is a set of positive parameters. The relationship dual to Equation (6) provides a quadratic bound on ρ, given by [66]

ρ(s) ≤ (1/2) η s² − ρ*(η), with equality when η = ρ′(s)/s.    (7)

To achieve sparsity, the function ρ should suppress most of the coefficients in y_bν and preserve a small number of key features. Table 1 shows some penalty functions corresponding to SG distributions (see [1]).
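The dual relationship η = ρ′(s)/s in Equation (7) can be evaluated in closed form for the penalties used later. A minimal sketch, assuming ρ(s) = |s|^p / p for the ℓp penalty and ρ(s) = log(ε + s²) as one common log-type form (the exact entries of Table 1 are not reproduced here):

```python
import numpy as np

def eta_lp(s, p=1.0, eps=1e-8):
    """Weights eta = rho'(s)/s for rho(s) = |s|^p / p, i.e. eta = |s|^(p-2).
    eps avoids division by zero at s = 0 (an assumed regularization)."""
    return (np.abs(s) + eps) ** (p - 2.0)

def eta_log(s, eps=1e-3):
    """Weights for the assumed log penalty rho(s) = log(eps + s^2):
    rho'(s) = 2 s / (eps + s^2), hence eta = 2 / (eps + s^2)."""
    return 2.0 / (eps + s ** 2)
```

Note how both weightings are large for small (noise-like) coefficients, which are strongly smoothed, and small for large (edge) coefficients, which are preserved.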
Following the Bayesian paradigm, inference will be based on p(Θ|Y, x), where Θ = {y, α, β, γ} denotes the set of all unknowns. Since this posterior distribution cannot be analytically calculated due to the form of the SG distribution, in this paper we use the mean-field variational Bayesian model [67] to approximate p(Θ|Y, x) by the distribution q(Θ), of the form q(Θ) = ∏_{θ∈Θ} q(θ), that minimizes the Kullback-Leibler divergence [68], defined as

C_KL( q(Θ) ∥ p(Θ|Y, x) ) = ∫ q(Θ) log( q(Θ) / p(Θ|Y, x) ) dΘ.

The Kullback-Leibler divergence is always non-negative and it is equal to zero if and only if q(Θ) = p(Θ|Y, x). Even with this factorization, the SG prior for y hampers the evaluation of this divergence, but the quadratic bound for ρ in Equation (7) allows us to bound the prior in Equation (5) with a Gaussian form such that

p(y_bν|α_bν) ≥ Z(α_bν)^{-1} exp( -α_bν Σ_{i=1}^{p} [ (1/2) η_bν(i) y_bν(i)² − ρ*(η_bν(i)) ] ) = M_ν(y_b, η_bν, α_bν).

We then use the lower bound of the prior M_ν(y_b, η_bν, α_bν) to obtain the lower bound of the joint probability distribution

F(Θ, Y, x, η) = p(Y|y, β) p(x|y, γ) ∏_{b=1}^{B} ∏_{ν=1}^{J} M_ν(y_b, η_bν, α_bν),

which yields the inequality log p(Θ, Y, x) ≥ log F(Θ, Y, x, η).
As shown in [67], for each unknown θ ∈ Θ, the estimated q(θ) will have the form

q(θ) ∝ exp( ⟨log F(Θ, Y, x, η)⟩_{q(Θ\θ)} ),    (14)

where Θ\θ represents all the variables in Θ except θ and ⟨·⟩_{q(Θ\θ)} denotes the expected value calculated using the distribution q(Θ\θ). When point estimates are required, θ̂ = ⟨θ⟩_{q(θ)} is used. For variables with a degenerate posterior approximation, that is, for θ ∈ {β, γ, α}, the value where the posterior degenerates is given by [67]

θ̂ = arg max_θ exp( ⟨log F(Θ, Y, x, η)⟩_{q(Θ\θ)} ).    (15)

Let us now obtain the analytic expressions for each unknown posterior approximation.

High Resolution Multispectral Image Update
Using Equation (14) we can show, in a straightforward way, that the posterior distribution for the high resolution MS image will have the form

q(y) = N( y | ⟨y⟩, Σ_y ),    (16)

where the inverse of the covariance matrix is given by

Σ_y^{-1} = γ (λλ^T ⊗ I_p) + blockdiag_{b=1,...,B}( β_b B^T B + Σ_{ν=1}^{J} α_bν F_ν^T diag(η_bν) F_ν ),    (17)

with λ = (λ_1, . . . , λ_B)^T, ⊗ denoting the Kronecker product and diag(·) a diagonal matrix formed from the elements of a vector, and the mean is obtained as the solution of the linear system

Σ_y^{-1} ⟨y⟩ = [ (β_1 B^T Y_1 + γ λ_1 x)^T, . . . , (β_B B^T Y_B + γ λ_B x)^T ]^T.    (18)
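Since Σ_y^{-1} is only ever needed through matrix-vector products, ⟨y⟩ can be computed matrix-free with the Conjugate Gradient method. A toy sketch with small dense stand-ins for B and a single high-pass filter (all names, sizes and parameter values are illustrative, not the paper's):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Toy sizes: Bands bands of p pixels each (flattened); observed bands have P pixels.
rng = np.random.default_rng(0)
Bands, p, P = 2, 16, 4
Bmat = rng.normal(size=(P, p)) / p              # dense stand-in for B = DH
F = np.eye(p) - np.roll(np.eye(p), 1, axis=1)   # a simple high-pass (difference) filter
beta = np.full(Bands, 10.0)
gamma = 5.0
lam = np.array([0.4, 0.6])
alpha = np.full(Bands, 1.0)
eta = np.ones((Bands, p))                       # variational weights, held fixed here

def precision_matvec(yflat):
    """Apply Sigma_y^{-1} of Eq. (17) to a stacked image y = [y_1^T, ..., y_B^T]^T."""
    y = yflat.reshape(Bands, p)
    pan_mix = lam @ y                           # sum_b lambda_b y_b
    out = np.empty_like(y)
    for b in range(Bands):
        out[b] = (beta[b] * Bmat.T @ (Bmat @ y[b])          # data term, MS band b
                  + gamma * lam[b] * pan_mix                # PAN coupling term
                  + alpha[b] * F.T @ (eta[b] * (F @ y[b]))) # bounded SG prior term
    return out.ravel()

A = LinearOperator((Bands * p, Bands * p), matvec=precision_matvec)
Y = rng.normal(size=(Bands, P))
x = rng.normal(size=p)
rhs = np.concatenate([beta[b] * Bmat.T @ Y[b] + gamma * lam[b] * x
                      for b in range(Bands)])
y_mean, info = cg(A, rhs, maxiter=1000)         # info == 0 signals convergence
```

In the real problem B = DH and the filters F_ν are applied by (de)convolution rather than dense matrix products, but the matrix-free structure of the solve is the same.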

Model Parameters Update
The estimates of the noise parameters β_b and γ in the degradation models in Equations (3) and (4) are obtained using Equation (15) as

β_b = P / ( ‖Y_b − B⟨y_b⟩‖² + tr(B Σ_{y_b} B^T) ),    γ = p / ( ‖x − Σ_{b=1}^{B} λ_b ⟨y_b⟩‖² + tr( (λλ^T ⊗ I_p) Σ_y ) ),

where tr(·) represents the trace of a matrix and Σ_{y_b} denotes the block of Σ_y corresponding to band b. From Equation (14) we obtain the following distribution for the parameter α_bν of the SG prior in Equation (5):

q(α_bν) ∝ Z(α_bν)^{-1} exp( −α_bν Σ_{i=1}^{p} ⟨ (1/2) η_bν(i) y_bν(i)² − ρ*(η_bν(i)) ⟩ ).

The mode of this distribution can be obtained (see [69]) by solving the stationarity condition

∂ log Z(α_bν)/∂α_bν = − Σ_{i=1}^{p} ⟨ (1/2) η_bν(i) y_bν(i)² − ρ*(η_bν(i)) ⟩.

The ℓp penalty function shown in Table 1 produces a proper prior, for which the partition function can be evaluated, but the log penalty function produces an improper prior. We tackle this problem by examining, for α_bν = 1, the behavior of Z(α_bν) and keeping in ∂Z(α_bν)/∂α_bν only the term that depends on α_bν, which yields a closed-form update for α_bν under the log prior.
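The β_b and γ updates above reduce to dividing the number of observed pixels by the expected data misfit. A hedged sketch, with the trace contributions passed in as precomputed scalars (their efficient computation is the subject of the covariance approximation below):

```python
import numpy as np

def update_beta(Y_b, Bmat, y_b_mean, trace_term=0.0):
    """Sketch of the beta_b update: P over the expected misfit for MS band b.
    trace_term stands for tr(B Sigma_{y_b} B^T), assumed precomputed."""
    resid = Y_b - Bmat @ y_b_mean
    return Y_b.size / (resid @ resid + trace_term)

def update_gamma(x, lam, y_mean, trace_term=0.0):
    """Sketch of the gamma update: p over the expected PAN misfit.
    y_mean is the (B, p) stack of band means; trace_term is assumed precomputed."""
    resid = x - np.tensordot(lam, y_mean, axes=1)   # x - sum_b lambda_b <y_b>
    return x.size / (resid @ resid + trace_term)
```

Both precisions grow as the model explains the data better, which is the expected behavior of noise-precision estimates in variational inference.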

Calculating the Covariance Matrices
The matrix Σ_y in Equation (17) must be explicitly computed to find its trace and also to calculate η̂_bν(i). However, since its computation is very expensive, we propose the following approximation. We first replace each diag(η_bν) in Equation (17) by z(η_bν)I, where z(η_bν) is calculated as the mean of the values in η_bν. We then approximate the resulting matrix by one that is diagonalized in the Fourier domain, so that its inversion is straightforward. Finally, the trace of Σ_y and the values η̂_bν(i) are computed from this approximation.

Proposed Algorithm
Based on the previous derivations, we propose the Variational Bayesian SG Pansharpening method in Algorithm 1. The linear system in Equation (18), used in step 4 of Algorithm 1, is solved using the Conjugate Gradient method.

Algorithm 1: Variational Bayesian SG Pansharpening.
Input: observed MS image Y, PAN image x, and mixing coefficients λ. Set n = 0 and initialize y^(0) to an upsampled version of the observed MS image.
while convergence criterion is not met do 1. Set n = n + 1.
2. Update the variational parameters η_bν using Equation (7).
3. Update the model parameters α, β and γ using Equation (15).
4. Update y^(n) by solving Equation (18) with the Conjugate Gradient method.
end while
Output the high resolution multispectral image ŷ = y^(n).
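The loop of Algorithm 1 can be sketched as follows; the three update callables are placeholders for the computations of Equations (7), (15) and (18) described above, and the names are illustrative rather than the paper's code:

```python
import numpy as np

def sg_pansharpen(Y, x, lam, update_eta, update_params, update_y,
                  y0, tol=1e-6, max_iter=50):
    """Skeleton of Algorithm 1 with callables for the three inner updates."""
    y = y0.copy()
    for n in range(1, max_iter + 1):
        eta = update_eta(y)                        # step 2: variational weights
        params = update_params(y, eta, Y, x)       # step 3: alpha, beta, gamma
        y_new = update_y(Y, x, lam, eta, params)   # step 4: CG solve for the mean
        # Convergence test on the relative change of the estimate.
        if np.linalg.norm(y_new - y) <= tol * np.linalg.norm(y_new):
            return y_new
        y = y_new
    return y
```

The stopping rule matches the relative-change criterion used in the experiments (Section "Materials and Methods"), capped at max_iter iterations.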

Materials and Methods
To test the performance of the proposed methodology on different kinds of images, five satellite images were used: three LANDSAT 7-ETM+ [70] images, a SPOT-5 [71] image and a FORMOSAT-2 [72] image. LANDSAT MS images have six bands and a ratio between PAN and MS images p/P = 4. Figures 1 and 2 show RGB color images formed by the bands B4, B3 and B2 of the LANDSAT MS images, and their corresponding PAN images. Figure 1 corresponds to an area from Chesapeake Bay (US). Both the observed Y and x images have been normalized to the range [0, 1] before running Algorithm 1. The convergence criterion in the algorithm was ‖y^(n) − y^(n−1)‖₂ / ‖y^(n)‖₂ ≤ 10^{−6} or 50 iterations reached, whichever occurs first. The relationship between the MS high resolution image and the panchromatic image in Equation (2) is governed by the parameters λ, which need to be set before pansharpening is carried out. If we knew the sensor spectral response characteristics, the values of λ could be estimated from them. For instance, for LANDSAT 7-ETM+, Figure 4 shows the sensor spectral response curves for the MS bands B1-B6, shown in color, and the PAN band, shown in black. For this sensor, the PAN band mainly overlaps the B2-B4 MS bands, and the λ coefficients could be obtained from this overlap (see [52]). In this paper, however, a more general approach is followed to estimate λ from the observations. First, we define X = Dx, a version of the PAN image downsampled to the size of the MS image. Then, since the sensor spectral response is the same at high and low resolution, the parameters λ can be obtained by solving

λ̂ = arg min_λ ‖ X − Σ_{b=1}^{B} λ_b Y_b ‖², subject to λ_b ≥ 0, ∀b.    (30)

Table 2 shows the λ values associated with the different observed images. For the LANDSAT 7-ETM+ images only the first four bands are positive; the λ_5 and λ_6 values are 0 since we know that bands B5 and B6 are not covered by the panchromatic sensor. For this process, each band is normalized to the interval [0, 1].
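The constrained least squares problem in Equation (30) can be solved directly with a non-negative least squares routine. A sketch, again using stride decimation as a stand-in for the operator D:

```python
import numpy as np
from scipy.optimize import nnls

def estimate_lambda(Y_bands, x_pan, ratio):
    """Estimate lambda by non-negative least squares, Eq. (30).

    Y_bands: sequence of B observed MS bands, each (M, N);
    x_pan: (m, n) PAN image with m = ratio * M.
    Plain stride decimation stands in for the operator D here.
    """
    X = x_pan[::ratio, ::ratio].ravel()             # X = D x
    A = np.stack([b.ravel() for b in Y_bands], 1)   # one column per MS band
    lam, _ = nnls(A, X)                             # solves min ||A lam - X||, lam >= 0
    return lam
```

Bands whose spectral range is not covered by the PAN sensor naturally receive a zero coefficient under the non-negativity constraint, matching the behavior reported for LANDSAT bands B5 and B6.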
Note that, due to the normalization, the estimated λ values do not only depend on the sensor spectral response but also on the characteristics of the observed area. This explains the differences between the λ values obtained for the images in Figure 2a,c.

Discussion
Within the variational Bayesian methodology, two methods are proposed in this paper: one using the log penalty function (see Table 1), hence named the log method, and another using the ℓp penalty function with p = 1, referred to as the ℓ1 method. The proposed methods have been compared with the following classic and state-of-the-art pansharpening methods: the Principal Component Analysis (PCA) [6], the Intensity-Hue-Saturation (IHS) transform [19], the Brovey transform (Brovey) [7], the Band-Dependent Spatial-Detail (BDSD) method in [22], the Gram-Schmidt (GS) method in [20], the Gram-Schmidt adaptive (GSA) method in [21], the Partial Replacement Adaptive Component Substitution (PRACS) method in [23], the High Pass Filtering (HPF) algorithm in [18], the Smoothing Filter Based Intensity Modulation (SFIM) method [35,36], the Indusion method in [37], the Additive A Trous Wavelet Transform (ATWT) in [26], the Additive Wavelet Luminance Proportional (AWLP) method in [28], the ATWT Model 2 (ATWT-M2) and ATWT Model 3 (ATWT-M3) methods in [10], the Generalized Laplacian Pyramid (GLP)-based methods in [29], concretely the Modulation Transfer Function (MTF)-GLP, GLP with High Pass Modulation (MTF-GLP-HPM), and GLP with Context Based Decision (MTF-GLP-CBD) methods, and the pansharpening method using a Total Variation (TV) image model in [53]. We have used the implementation of the methods and measures provided by the Pansharpening Toolbox (https://rscl-grss.org/coderecord.php?id=541) [13]. For those methods not included in the toolbox we have used the code provided by the authors. The code of the proposed methods will be publicly available at https://github.com/vipgugr. We have also included, as a reference, the results of bilinearly interpolating the MS image to the size of the PAN image, marked as EXP. Both quantitative and qualitative comparisons of the different methods have been performed.

Quantitative Comparison
A common problem in pansharpening is the nonexistence of an MS high resolution ground-truth image to compare with. Hence, we performed two kinds of quantitative comparisons. First, the images obtained using the different methods have been compared following Wald's protocol [73] as follows: the observed MS image, Y, and the PAN image, x, are downsampled by applying the operator D to generate low resolution versions of them. Then, pansharpening is applied to those low resolution images and the obtained estimate of the MS image, ŷ, is quantitatively compared with the observed MS image, Y. Second, the different methods have been compared using Quality with No Reference (QNR) measures [13,74]. As previously stated, for the LANDSAT image in Figure 1, the resolution ratio between the MS and PAN images is p/P = 4. Since the SPOT-5 satellite provides two PAN images, two experiments were carried out on the image in Figure 3, one with a decimation ratio of 4 and another with a ratio of 16. For the FORMOSAT-2 image the ratio is p/P = 16. However, for the sake of completeness, two experiments were also carried out, one assuming a decimation ratio of 4 and another with a ratio of 16.
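Wald's protocol as described above can be sketched as follows, with the pansharpening method and the quality metric supplied as callables and plain stride decimation standing in for D:

```python
import numpy as np

def walds_protocol(Y, x, ratio, pansharpen, metric):
    """Reduced-resolution assessment: degrade Y and x, fuse, compare to Y.

    Y: (B, M, N) observed MS image; x: PAN image at ratio times the MS size.
    `pansharpen(Y_lr, x_lr)` must return an estimate at the original MS scale;
    `metric(estimate, reference)` scores it against the original observation.
    """
    Y_lr = Y[:, ::ratio, ::ratio]     # D applied to each MS band
    x_lr = x[::ratio, ::ratio]        # D applied to the PAN image
    y_hat = pansharpen(Y_lr, x_lr)    # fuse at reduced resolution
    return metric(y_hat, Y)           # the observed MS image acts as ground truth
```

This is the key trick of the protocol: by degrading both observations with the same operator, the original MS image becomes a valid reference for the fused result.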
Both spatial and spectral quality metrics have been used to compare the results obtained using the different methods. The metrics used are detailed below:

• Q: Universal Quality Index (UQI) [75], averaged over all MS bands. The higher, the better.
• Q4, Q8: instances of the Q2^n index [76], suitable for measuring the quality of multiband images with an arbitrary number of spectral bands. Q4 is used for the SPOT-5 and FORMOSAT-2 images, which have four bands, and Q8 for the LANDSAT images with six bands. The higher, the better.
• Spatial Correlation Coefficient (SCC) [77]: measures the correlation coefficient between the compared images after the application of a Sobel filter. The higher, the better.
• Spectral Angle Mapper (SAM): measures the spectral angle between the pixel vectors of the compared images. The lower, the better.
• ERGAS: a global relative error measure. The lower the ERGAS value, the better the consistency, especially for values lower than the number of image bands B.
• QNR spectral distortion (D_λ) [78]: derived from the differences between the inter-band Q index values computed for the HR and LR images. The lower, the better.
• QNR spatial distortion (D_S) [78]: measures the spatial distortion with respect to the PAN image. The lower, the better.
• QNR index: combines D_λ and D_S into a single no-reference quality score. The higher, the better.
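As an illustration of two of the reference-based metrics above, a minimal sketch of SAM and ERGAS follows (conventions vary slightly across implementations; the experiments in this paper use the Pansharpening Toolbox versions):

```python
import numpy as np

def sam(est, ref, eps=1e-12):
    """Mean Spectral Angle Mapper in degrees; est and ref shaped (B, H, W)."""
    num = np.sum(est * ref, axis=0)                       # per-pixel dot product
    den = np.linalg.norm(est, axis=0) * np.linalg.norm(ref, axis=0) + eps
    ang = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    return float(ang.mean())

def ergas(est, ref, ratio):
    """ERGAS = (100 / ratio) * sqrt( mean_b( MSE_b / mean(ref_b)^2 ) )."""
    terms = [np.mean((e - r) ** 2) / (np.mean(r) ** 2 + 1e-12)
             for e, r in zip(est, ref)]
    return float(100.0 / ratio * np.sqrt(np.mean(terms)))
```

SAM is invariant to per-pixel scaling of the spectra (it measures direction only), which is why it is paired with ERGAS, a magnitude-sensitive measure.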
Table 3 shows the figures of merit obtained using Wald's protocol for the LANDSAT image in Figure 1. As is clear from the table, ℓ1 outperforms all the other methods both in spectral fidelity and in the incorporation of spatial details. Note the high SCC value (meaning that the details in the PAN image have been successfully incorporated into the pansharpened image) together with the lowest spectral distortion, as evidenced by the SAM and ERGAS values. The TV method obtains the second best results except for the SAM metric, for which the proposed log method has the second best value. The log method also obtains the third best values for the ERGAS and SCC measures. GLP-based and PRACS methods also obtain high values for the Q and Q8 indices and a low SAM value. However, their ERGAS and SCC performance is worse. Table 4 shows the QNR quantitative results for the LANDSAT image in Figure 1. In this table, the proposed methods achieve competitive results. log obtains the best D_λ value and, together with ℓ1, the second and third best QNR scores, respectively. Note that EXP obtained the highest QNR score since the bilinear interpolation of the observed MS low resolution image is used as the MS high resolution estimate in the D_S and D_λ calculations. Tables 5 and 6 show the quantitative results using Wald's protocol for the LANDSAT images in Figure 2a,c, respectively. PRACS outperforms all other methods on the image in Figure 2a (see Table 5) and the proposed ℓ1 and log methods obtain the first and second best scores on the image in Figure 2c (see Table 6). Tables 7 and 8 show the obtained QNR figures of merit for those two images. The proposed methods produce good D_S, D_λ and QNR values, above 0.9 for both images, which supports their good performance. Again, the EXP results are the best for all the measures in Table 8, and EXP provides the best D_λ for the image associated with Table 7.
The ℓ1 method obtains the best D_S for this image and BDSD the highest QNR. Figures 5 and 6 show a zoomed-in region of the RGB color images formed by bands B4, B3, and B2 of the MS ground truth images used to apply Wald's protocol, along with the absolute error images for the methods in Tables 7 and 8. In those images, the darker the intensity, the lower the absolute error. Figures 5 and 6 are consistent with the quantitative comparisons shown in Tables 5 and 6, respectively. The best results for the image in Figure 2a were obtained using PRACS, while for the image in Figure 2c the best performing method is ℓ1. Note that brighter areas in Figure 5e,f correspond to the borders of cloudy areas in Figure 2a. We argue that, since clouds alter the weights λ estimated using Equation (30), the boundaries between cloud and land areas in Figure 2a are not well resolved. This explains the worse behavior of the proposed methods in the cloudy areas of this image. Tables 9 and 10 show, respectively, the quantitative results using Wald's protocol for the SPOT-5 and the FORMOSAT-2 images in Figure 3 for the decimation ratios p/P = 4 and p/P = 16. The proposed log method obtains the best figures of merit for the SPOT image in Figure 3a with p/P = 4 except for the Q and Q4 metrics. The Q values obtained by log and ℓ1 are slightly lower than those obtained by BDSD. Note that BDSD achieved the third best overall figures, just below the proposed log and ℓ1 algorithms. With p/P = 16 the proposed log algorithm provides the best results except for the Q, Q4 and SAM values, where competitive values are obtained. The proposed log achieves a slightly lower Q value than PRACS and a slightly higher SAM value than Brovey. In general, PRACS is the second best performing method for this image for p/P = 16. For the FORMOSAT-2 image in Figure 3c, the proposed ℓ1 and log algorithms obtained the best numerical results for a p/P = 4 magnification.
Both methods provide similar results, which are better than those provided by the competing methods. For a ratio p/P = 16, there is no clear winner. The proposed methods are competitive on this image although they do not stand out in any of the measures. Tables 11 and 12 show, respectively, the QNR quantitative results for the SPOT-5 and the FORMOSAT-2 images in Figure 3 for the decimation ratios p/P = 4 and p/P = 16. In Table 11, EXP achieves the best D_S, D_λ and QNR scores. In this table, the proposed methods obtain good scores. The log method obtains the second best D_λ and D_S values and very high QNR values for both decimation ratios. Results for the FORMOSAT image, shown in Table 12, are very similar, although in this case BDSD obtains the best D_S and QNR values for p/P = 4 and ATWT-M3 for p/P = 16. Table 13 shows the CPU time in seconds, on a 2.40 GHz Intel Xeon CPU, required to pansharpen an MS image with 4 bands to a 1024 × 1024 size, for p/P = 4 and p/P = 16, using the different methods under comparison. Equation (18) has been solved using the Conjugate Gradient method, which required fewer than 30 iterations for the ℓ1 prior and at least 1000 iterations for the log prior to achieve convergence. This explains the differences between their required CPU times. Note that the proposed methods automatically estimate the model parameters, which increases the running time but makes our methods parameter free. Figure 7 shows a small region of interest of the observed LANDSAT-7 images in Figure 1 and the pansharpening results with p/P = 4 obtained by the proposed methods and by the competing methods with the best quantitative performance, that is, the PRACS, MTF-GLP-HPM, MTF-GLP-CBD and TV methods. All color images in this figure are RGB images formed from the B4, B3 and B2 LANDSAT bands. Since we are using full resolution images, there is no ground truth to compare with, so a visual analysis of the resulting images is performed.
The improved resolution of all the pansharpening results in Figure 7c-h with respect to the observed MS image in Figure 7a is evident. The PRACS, MTF-GLP-HPM and MTF-GLP-CBD images in Figure 7c-e have a lower detail level than the TV and proposed ℓ1 results, see Figure 7f,g, respectively. See, for instance, the staircase effects in some diagonal edges, which are not present in the TV and proposed ℓ1 results. The PRACS, MTF-GLP-HPM and MTF-GLP-CBD methods produce similar, but lower, spectral quality than the proposed method, which is consistent with the numerical results in Table 3 and the discussion presented in Section 6.1. The image obtained using the ℓ1 method, Figure 7g, has colors closer to those of the observed MS image than the TV image, Figure 7f, which is also somewhat noisier. The log method is very good at removing noise in the image (see the sea area), but it tends to remove other fine details too. Figure 8 shows the observed images in Figure 3a,b and the pansharpening results with p/P = 16 obtained using the competing methods with the best performance on this image, that is, the Brovey, PRACS, ATWT-M3 and TV methods, and the proposed ℓ1 and log methods. All color images in this figure are RGB images formed from bands B3, B2 and B1 of the SPOT-5 image. The Brovey method (Figure 8c) produces the highest spectral distortion; however, it also recovers more spatial details in the image (see the airport runway and plane). ATWT-M3 (Figure 8e), on the other hand, produces a blurry image. PRACS produces a sharper image, see Figure 8d, but details from the PAN image do not seem to be well integrated. The TV method (Figure 8f) and the proposed ℓ1 and log methods (Figure 8g,h) obtain the most consistent results, with high spatial detail and low spectral distortion. However, TV introduces staircase artifacts on diagonal lines that are not noticeable in the ℓ1 and log images. As with the LANDSAT-7 image, the log image in Figure 8h lacks some small details, removed by the method along with the noise.

Conclusions
A variational Bayesian methodology for the pansharpening problem has been proposed. In this methodology, we model the relation between the MS high resolution image and the PAN image as a linear combination of the MS bands whose weights are estimated from the available data. The observed MS image is modelled as a downsampled version of the original MS image. The expected characteristics of the pansharpened image are incorporated in the form of SG sparse image priors. Two penalty functions corresponding to SG distributions have been used, ℓ1 and log. All the unknowns and model parameters have been automatically estimated within the variational Bayesian modelling and inference, and an efficient algorithm has been obtained.
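The linear PAN model above can be made concrete: if the PAN image is approximated as a weighted sum of the MS high resolution bands, the weights (the λ estimated by Equation (30) in the paper) can be recovered by least squares over all pixels. The sketch below is an illustration under that assumption, not the paper's exact estimator:

```python
import numpy as np

def estimate_pan_weights(ms_bands, pan):
    """Least-squares weights lambda so that sum_b lambda_b * MS_b ≈ PAN.

    ms_bands: array of shape (B, H, W); pan: array of shape (H, W).
    """
    B = ms_bands.shape[0]
    X = ms_bands.reshape(B, -1).T        # (pixels x bands) design matrix
    y = pan.ravel()
    lam, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lam

# Synthetic check: build a PAN image from known weights and recover them.
rng = np.random.default_rng(1)
ms = rng.random((4, 32, 32))
true_lam = np.array([0.1, 0.3, 0.4, 0.2])
pan = np.tensordot(true_lam, ms, axes=1)
print(np.allclose(estimate_pan_weights(ms, pan), true_lam))  # True
```

Because each band receives its own weight, PAN detail can be injected band by band in proportion to that band's contribution to the PAN response, which is the mechanism behind the spectral consistency argued for below.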
The proposed ℓ1 and log methods have been compared to classic and state-of-the-art methods, obtaining very good results both quantitatively and qualitatively. In general, they have obtained the best quantitative results for the LANDSAT-7 ETM+, SPOT-5 and FORMOSAT-2 images with a resolution ratio of 4, and for SPOT-5 with a resolution ratio of 16. Competitive results were also obtained for the FORMOSAT-2 image with a resolution ratio of 16. They stand out in terms of spectral consistency while improving the spatial resolution of the pansharpened images. We argue that the superior spectral consistency of the SG methods arises from the modelling of the PAN image, which selectively incorporates detailed PAN information into the different MS high resolution bands without changing their spectral properties. Qualitatively, the SG methods produce results consistent with the observed PAN and MS images and with the numerical results previously described. The log method is better at removing noise in the images, at the cost of removing some fine details.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations