PAN-SHARPENING APPROACHES BASED ON UNMIXING OF MULTISPECTRAL REMOTE SENSING IMAGERY

Model based analysis or explicit definition/listing of all models/assumptions used in the derivation of a pan-sharpening method allows us to understand the rationale or properties of existing methods and shows a way for a proper usage or proposal/selection of new methods ‘better’ satisfying the needs of a particular application. Most existing pan-sharpening methods are based mainly on the two models/assumptions: spectral consistency for high resolution multispectral data (physical relationship between multispectral and panchromatic data in a high resolution scale) and spatial consistency for multispectral data (so-called Wald’s protocol first property or relationship between multispectral data in different resolution scales). Two methods, one based on a linear unmixing model and another one based on spatial unmixing, are described/proposed/modified which respect models assumed and thus can produce correct or physically justified fusion results. Earlier mentioned property ‘better’ should be measurable quantitatively, e.g. by means of so-called quality measures. The difficulty of a quality assessment task in multi-resolution image fusion or pan-sharpening is that a reference image is missing. Existing measures or so-called protocols are still not satisfactory because quite often the rationale or assumptions used are not valid or not fulfilled. From a model based view it follows naturally that a quality assessment measure can be defined as a combination of error model residuals using common or general models assumed in all fusion methods. Thus in this paper a comparison of the two earlier proposed/modified pan-sharpening methods is performed. Preliminary experiments based on visual analysis are carried out in the urban area of Munich city for optical remote sensing multispectral data and panchromatic imagery of the WorldView-2 satellite sensor. * Corresponding author


INTRODUCTION
Multi-resolution image fusion also known as pan-sharpening aims to restore/estimate a multispectral image in a high resolution space from two given inputs: low resolution multispectral image and high resolution panchromatic image.Multi-resolution image fusion is not limited only to multispectral and panchromatic image pairs.For example, in hyper-sharpening, a high resolution image is a multispectral image, and a low resolution image is a hyperspectral image.A large number of algorithms and methods to solve this problem were introduced during the last three decades.One of the first extensive classifications of image fusion methods is presented in (Pohl and van Genderen 1998), where all methods are divided into two main groups: color related techniques, e.g.Intensity-Hue-Saturation (IHS) and statistical/numerical methods, including arithmetic combinations (addition, multiplication, difference and ratio), Principal Component Analysis (PCA), High Pass Filtering (HPF), Component Substitution (CS) and wavelets.Additionally, a combined approaches group is introduced.In (Thomas et al. 2008) the methods are grouped into three classes: projection/substitution methods (e.g.IHS, PCA); relative spectral contribution (RSC) methods, e.g.Brovey transform (BT) (Gillespie et al. 1987) and Color Normalization (CN) (Vrabel 1996); and ARSIS concept (Ranchin and Wald 2000) or its implementations, e.g.Multi-Resolution Analysis (MRA) using wavelet transform (Aiazzi et al. 2002).Similarly as mentioned above, a hybrid methods group including all combinations of three previous groups is introduced.In (Amro et al. 2011) the methods are divided into five groups: CS family, RSC family, high-frequency injection family, methods based on the statistics of the image (e.g.Bayesian methods) and multi-resolution family (e.g.MRA).In (Xu et al. 2014) four groups are defined: CS methods (IHS, PCA, Gram-Schmidt (GS) orthogonalization), rationing methods: BT, CN, Smoothing Filter-based Intensity Modulation (SFIM) (Liu 2000), sparse representation (SR) based methods (Li and Yang 2011, Zhu and Bamler 2013, Vicinanza et al. 2015) and ARSIS concept or MRA methods.In (Yang and Zhang 2014) it is shown that three groups of methods based on component substitution, modulation and multi-resolution analysis can be expressed using a generalized model.In recent review (Pohl and van Genderen 2015), which is an update of the old review (Pohl and van Genderen 1998), methods are divided into five groups: CS, numerical and statistical image fusion, modulation-based techniques, multi-resolution approaches and hybrid techniques.In (Zhang and Huang 2015) a new look at image fusion methods from a Bayesian perspective is presented, which shows that CS and MRA based methods are special cases of Bayesian based method under Gaussian model assumption.

i
In summary, the most known and popular methods can be divided into two large groups (Vivone et al. 2015, Palubinskas 2015).The first group of methods is based on a linear spectral transformation, e.g.IHS, PCA, and GS, followed by a CS.Methods of the second group use spatial frequency decomposition usually performed by means of High Pass Filtering, e.g.boxcar filter (HPF) in signal domain, filtering in Fourier domain or MRA using wavelet transform.Additionally, we would like to mention presently quite a small group of methods, but which are spreading quite rapidly in remote sensing community.These are so-called model based methods based on the minimization of model error residuals, e.g. using Bayesian (Aanaes et al. 2008, Fasbender et al. 2008, Zhang et al. 2012, Aly and Sharma 2014, Palsson et al. 2014, Loncan et al. 2015) or already mentioned SR based techniques.One of the main reasons, why it is so difficult to classify existing methods is a missing common denominator.We propose to use models/assumptions, which are used in a method derivation, as a basis for different method comparison.First attempts of such type classification were already performed, e.g. using image formation model (Wang et al. 2005).Thus, we propose to look at image fusion methods from a model based view.This type of analysis allows us to recognize quite easily similarities and differences of various methods and thus perform a systematic classification of most known multi-resolution image fusion approaches and methods.
In parallel to the development of pan-sharpening methods, many attempts were undertaken to assess quantitatively their quality usually using measures originating from signal/image processing.Examples of measures (scalar or vector based) used to assess pan-sharpening quality are Spectral Angle Mapper (SAM), Mean Squared Error (MSE) and measures based on it, e.g., Peak Signal-to-Noise Ratio (PSNR), Relative Average Spectral Error (RASE) and relative dimensionless global error in synthesis (ERGAS), Pearson's Correlation Coefficient (CC), Universal Image Quality Indices (UIQI/SSIM) and multispectral extensions of UIQI (Q4/Q2n) just to mention few or most popular of them.For recent overviews of quality measures, see references (Alparone et al. 2007, Ehlers et al. 2010, Aiazzi et al. 2011, Makarau et al. 2012, Palubinskas 2015).These simple/separate measures can be used only as Full Reference (FR) measures that is, when the reference image is available.This situation is valid for quite few applications mostly simulations.Due to the missing reference in pansharpening quality assessment task different solutions or socalled protocols were proposed: Wald's protocol (Wald et al. 1997), Zhou's protocol (Zhou et al. 1998), Quality with No Reference (QNR) (Alparone et al. 2008) and Khan's protocol (Khan et al. 2009), which usually include the calculation of several quality measures.Of course, a sole or joint quality measure, as already proposed in (Alparone et al. 2008, Padwick et al. 2010, Palubinskas 2015), enables much easier and practical/comfortable ranking of various fusion methods.
But besides that, most of existing quality measures cannot satisfy fully the needs of remote sensing community for different methods evaluation or comparison.Again, we propose to look at quality assessment problem from a model based perspective.Thus, we propose to select common models used by most of fusion methods and to define a quality measure by combining such model error residuals.This paper presents a new view to pan-sharpening task -model based view -in the following section (Sect.2).This analysis allows us to understand the origin or rationale of most known methods and additionally results in a proposal of new methods or shows a way for the modification of existing methods.Moreover, this analysis resulted in the proposal of new quality measures or a way how to construct new measures which have a potential for a future.Finally, the paper ends with conclusion and reference sections.

METHODOLOGY
A model based view at pan-sharpening or multi-resolution image fusion and its quality assessment is presented.This analysis allowed us to show the origin and drawbacks of most popular methods and a proposal or selection of methods which really respect models assumed.

Aim
The aim is to restore/estimate a multispectral image msf in the resolution of panchromatic image pan, from low resolution multispectral image ms using a high resolution panchromatic image pan.It is known as a pan-sharpening or multi-resolution image fusion task.To solve this task various assumptions or models can or should be made.Quite often not all assumptions are explicitly written.Further in this paper the assumptions or models assumed are underlined.Notations used in this paper are given in Tables 1-3 (k,j) s hr (k,j) a b Table 2. Definition and notation of input, intermediate and output pixels of pan-sharpening: (a) interpolated multispectral pixel to the resolution of pan, (b) result of pan-sharpening (index j is used for high resolution pixels in the area occupied by ms pixel i).

i j j
High resolution pan pixel clu p cluster pixel p hr (j) c hr (j) c d Table 3. Definition and notation of input, intermediate and output pixels of pan-sharpening: (a) panchromatic pixel in high resolution, (b) cluster pixel of pan (index j is used for high resolution pixels in the area occupied by ms pixel i).
Index n i is used to denote pixels in some neighborhood/window (e.g.3x3 pixels as used later in this paper) of pixel i in ms image.
A box with a bold line shows image area occupied by ms pixel i in pan image (index j is used to denote pixels in a high resolution image) (see Table 2. or 3).Again, index n j is used to denote pixels in some neighborhood/window of pixel j in pan image, e.g.used for filtering in HPFM and MRA methods.Bold symbols are used for vectors and matrices.

General models/assumptions
The following three implicit assumptions are valid for pansharpening.First, the spatial resolution of ms is lower than the resolution of pan.Second, ms exhibits more spectral bands than pan.Third, spectra of both data overlap significantly.
For each ms pixel i (index i is omitted for simplicity further) the following four assumptions (1-4) can be formulated, which are based on a common sense of image formation/acquisition model.
Spectral consistency for low resolution multispectral data (Model 1) where w(k) are spectral weights calculated from spectral response functions of data provider (Boggione et al. 2003) and sum to 1 . This assumption gives a relation between the input of pan-sharpening task: a low resolution multispectral image s lr and an unknown panchromatic band in a low resolution p lr .We will show later that it appears to be useful in derivation of image fusion methods and thus is used further for an approximation or estimation of p lr .
Spectral consistency for high resolution multispectral data This assumption gives a relation between input p hr and output s hr of pan-sharpening task, thus is essential further.
Spatial consistency for multispectral data (Model 3) where f s are low pass filters with . Again, this assumption gives a relation between input s lr and output s hr of pan-sharpening task, thus is essential further.
Spatial consistency for panchromatic data (Model 4) where f p is a low pass filter with 1 ) (   j p j f . This assumption gives a relation between the input of pan-sharpening task p hr and an unknown panchromatic band in a low resolution p lr .We will show later that it appears to be useful in derivation of image fusion methods and thus is used further as an alternative to (1) for an approximation or estimation of p lr .
For each model or assumption an error term should be added, which is omitted for simplicity.
If we assume that both assumptions (1,4) are valid simultaneously, then it appears that both filters f s and f p should be equal (see Appendix 1).But, this is not always valid in real applications.So simultaneous usage of assumptions (1,4) imply too strong restrictions and thus are not very useful in praxis.Usually only one these two assumptions is used, depending on the way how p lr is calculated/estimated. Thus only the following two assumption combinations: (1,2,3) or (2,3,4) are used further in this paper.
Additionally, the following noise observation model (Model 5) is assumed where o -observation sample, t -true signal, n -additive noise with a standard deviation . o can be, e.g.s (spectral image) or p (panchromatic image) in this context.For example, (5) can be written as in some homogeneous image area.
It can be seen that assumptions (1-5) depend on parameters w, f s , f p and , which may be given or have to be estimated from data.For example, w can be estimated using ( 4) and (1), when f p is known.
Additional assumptions may concern the accuracy of the absolute/relative radiometric calibration of ms and pan data, and the spatial co-registration of ms and pan data (Baronti et al. 2011).

Pan-sharpening methods
In the pan-sharpening task usually the following two general models (2) and (3) are assumed.Moreover, implicitly the model (1) or (4) and the model (5) are assumed.
For example, for WorldView-2 satellite remote sensing data assumption (2) results in 4x4=16 equations and assumption (3) in 8 equations.So in total 24 equations are available to estimate in total 16*8=128 unknowns.Thus, a pan-sharpening task is an illposed problem.

j j
To solve it additional assumptions or more specific models are needed.Here, we will show how a pure pixels assumption leads to an analytic solution of pan-sharpening task.Its relation to several known image fusion methods is presented.Further, the two methods are presented for pan-sharpening of mixed pixels, which respect models assumed.Finally, other so-called model based methods minimizing model error residuals are shortly introduced.

Pure pixels assumption:
A pure pixels assumption (Model 6), which is usually defined for some homogeneous image area, implies the following relationship for each k This assumption reduces a pan-sharpening task to the estimation of only 8 unknowns and thus allows an analytic solution of the problem.From ( 6) and ( 2) it follows directly i j p j p hr hr or equivalently 0 ) ( where P hr is a vector of values p hr (j) belonging to ms pixel i.
Under the pure pixels assumption from (2) follows and from ( 3) and ( 6) we have The following relation was used in the derivation of (10).
Thus, from (10) we have a solution of a pan-sharpening task with the condition (9) to be fulfilled.This is a trivial, but very important solution of the problem because it is analytic.
We have to note, that under the assumption (1) the condition (9) reduces to In praxis, quite often equation ( 12 or ( 4) where n j belongs to the neighborhood of j (in the case of WorldView-2 data, e.g.4x4 pixels).In summary, a list of several known methods which can be derived directly from (18,19)  Here we have to note that additionally to the already mentioned drawback (violation of pure pixels assumption), these methods use interpolation of ms to msi, what includes wrong (nonexistent) information into the fusion result for heterogeneous image areas.
So we see that all these well-known methods are based on the same assumptions or models as ( 18) or ( 19) and thus they can be considered model based methods.It is not so easy to see that because quite often the assumptions used are not defined explicitly.For example, attempts of such analysis based on image formation principle can be seen already in (Liu 2000, Wang et al. 2005, Aanaes et al. 2008), though not all models used are written explicitly.So the methods (18,19) can be seen as an optimal pan-sharpening methods for pure pixels.Application of these methods for mixed pixels, as its derivatives such as IHS/CN/SFIM/HPF/HPFM/MRA do, can lead to wrong/incorrect fusion results because the pure pixels assumption is violated.This is maybe the main reason, why most of the fusion methods developed in last decades cannot satisfy fully the remote sensing community, maybe except the visual perception task.
Here we have to note that the assumptions (13,14) can be fulfilled by appropriate histogram matching of data or absolute calibration of input data used in pan-sharpening task.

Mixed pixels:
In heterogeneous image areas pure pixels are quite seldom and the most pixels are mixed that is composed of several components or so-called endmembers (or pure spectra).From ( 8) and ( 17) it follows that mixed pixels satisfy the following condition: . These mixed pixels should be unmixed before the fusion can be performed.Two different approaches are presented below, which respect models assumed.

Linear mixing model.
A very popular approach to unmix mixed pixels is to assume a linear mixing model (Settle and Drake 1993) where M is the number of endmembers em, α -fractions or abundances of endmembers (unknowns) and e -error residuals.The solution of (24) can be found, e.g. by a Bounded-Variable Least-Squares (BVLS) minimization (Stark and Parker 1995) of the following error residual or norm with the following constraint: , where l and u are the lower and upper bounds of α respectively, e.g.l=0 and u=1.EM is a KxM matrix of elements em(k,m) and A is a 1xM vector of α(m).
This spectral unmixing model assumes the knowledge of endmembers a priori (e.g. from spectral library, ms data clustering, pure pixels model or the usage of any other spectral tool to extract em) and estimates abundances α of each em within the ms pixel.The number of endmembers M is limited by the number of spectral bands K or equations in (24).Usually the number of em for a single ms pixel reduces to M'<M due to zero valued abundances.Thus, recently proposed methods based on the compressed sensing theory and sparse representation (Candès and Wakin 2008) can be preferable than BVLS.
Pan-sharpening based on endmembers can be performed following the way proposed in (Bieniarz et al. 2011(Bieniarz et al. , 2014) ) and (Yokoya et al. 2012) for hyperspectral and multispectral image fusion (hyper-sharpening) with some modifications, which are specific for a pan-sharpening task.First, an unmixing of ms pixel is performed by using estimated/given multispectral endmembers em (25).Then, non-zero em are resampled spectrally to endmembers of pan (2) Further steps are specific for a pan-sharpening task.Third, pan pixels covering ms pixel i are clustered into several clusters, e.g. a number of clusters is a number of endmembers M' in ms pixel.For example, k-means or multi-mode histogram thresholding can be used.The clustering method should account for the size of clusters given by ) (m  .Fourth, pixels belonging to one cluster, e.g.c are related to emp by using the following minimization where mean(.) is a mean value of pan values belonging to cluster c.Similarly as in ( 13) possible shift between p hr and emp should be corrected where J is the number of pan pixels in ms pixel (so-called energy preservation assumption in one ms pixel).Finally, msf is calculated for each cluster c as where m c is the endmember index for which ( 27) is valid and c is one of the clusters in pan image.
A local approach, that is the usage of individual ms pixel or its neighbourhood/window, is preferable twofold.First, it is used to avoid ambiguity in (29) due to many-to-one relationship between ms and pan data (2).Second, it allows us to increase the number of equations in (24) in order to match the limited number of bands K. Careful selection of a window size is important.On one hand, the window size should be large enough to include all endmembers present in ms pixel.On the other hand, it should be not too large in order to limit the number of endmembers to K. Already available/detected pure pixels can be used as endmembers.

Spatial unmixing (or unmixing-based data fusion).
Another possible approach to perform image fusion is to perform the interchange unknown variables in (24), that is to estimate endmembers from the known abundances as it is proposed in (Zhukov et al. 1999, Clevers andZurita-Milla 2008).First, the abundances for each ms pixel i can be estimated from the pan image clustering clu p (see Table 1), e.g.M clusters.Then, the spatial unmixing is used to estimate endmembers in ms pixel i separately for each k where the number of equations in (30) is given by the selected neighborhood, e.g.3x3 of ms pixel i.Here, the same minimization methods (e.g.BVLS) can be used as in the previous section 2.3.3.1 to solve equation system (30).High resolution msf image is then reconstructed using estimated em The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLI-B7, 2016XXIII ISPRS Congress, 12-19 July 2016, Prague, Czech Republic This contribution has been peer-reviewed.doi:10.5194/isprsarchives-XLI-B7-693-2016 The size of the neighborhood (number of pixels) or number of equations ( 30) should be larger than M.
There are still some open issues concerning the usage of pure pixels information in spatial ummixing, the usage of already available pure pixels information in the reconstruction method of s hr , selection of a number of clusters M in pan image and the size of the neighborhood of ms pixel i .Finally, the future research could be directed towards the comparison of both image fusion methods.

Minimization of model error residuals:
Model based methods are formulated usually as optimization problems which minimize some cost functions based on image formation/observation models.Most of the methods are based on the following two general assumptions or models (2,3).Thus, in general a minimization of a sum of error residuals of these two models with some regularization function imposed on the solution is used, e.g. as are error residuals of models ( 2) and ( 3) respectively, λ i are regularization parameters, F -some regularization function.
Usually these error residuals are written in a form of matrices (Li and Yang 2011).Most of these fusion methods differ mainly in F definition, e.g.F is a total variation in (Palsson et al. 2014), ERM3 is applied only on high frequency components of signal and F is a spectral correlation dependent regularization (Aly and Sharma 2015).Moreover, the methods can be formulated using a Bayesian (Fasbender et al. 2008, Zhang et al. 2012, Zhang and Huang 2015) or sparsity regularization frameworks (Li and Yang 2011, Zhu and Bamler 2013, Vicinanza et al. 2015).Further, unmixing models can be included into the cost function definition as, e.g. in (Yokoya et al. 2012, Loncan et al. 2015).

Pan-sharpening quality assessment
A natural way to estimate the quality of pan-sharpening in the absence of a reference is to use model error residuals.As it is shown in the previous section most of fusion methods use/assume the models (2,3).Thus, these models can be used as a common denominator to compare or assess the quality of image fusion methods as already proposed in (Palubinskas 2015).
From Model (2) we have a Quality measure in High Resolution (QHR) ) , ( where Composite measure based on Means, Standard deviations and Correlation (CMSC) is defined (Palubinskas 2014) as These three separate quality measures can be used to produce a joint quality measure as, e.g. in (Palubinskas 2015) 1 , Additionally, method specific assumptions or models (error residuals) can be used to characterize the behavior of a particular method sometimes called a quality layer.For example, from pure pixels assumption (8,17) we can define a measure for pure pixels Similarly, for mixed pixels we can use model error residuals as quality measures.

PRELIMINARY EXPERIMENTS
I will illustrate my ideas concerning pan-sharpening quality assessment for optical remote sensing satellite WorldView-2 (WV-2) data over Munich city in Germany.For scene details see Table 5. Original data are presented in Figure 1 (panchromatic band) and Figure 2 (bilinear interpolated multispectral RGB bands).Pure pixels derived using ( 17) are shown in Figure 3 and they occupy about 15% of the whole scene.The result of reconstruction of only pure pixels using an optimal pansharpening method ( 19) is presented in Figure 4. Further, results of reconstruction using the two methods ( 29) and (31) based on linear unmixing are presented in Figures 5 and 6 respectively.
For the visual quality evaluation the result of HPFM (Palubinskas 2013) is included in Figure 7.We see, that unmixing based methods are competitive with known methods and thus have a potential for applications such as classification and change detection.Optimization of the methods and quantitative analysis is still to be performed using new quality measures.From the model based analysis it followed that error residuals of general or common models can be used as quality assessment measures of multi-resolution image fusion methods in the absence of a reference image.These models include the following general or common assumptions: spectral consistency for high resolution multispectral data, spatial consistency for multispectral data and additive noise observation model.They can be combined to define a sole or joint quality measure for a more convenient/comfortable quality measure.
Presented analysis is not limited only to the pan-sharpening.It can be easily extended to the hyper-sharpening too.
The future work could be directed towards a model based classification of all multi-resolution image fusion methods, a comparison of the two proposed/modified pan-sharpening methods with already existing model based methods using proposed quality assessment measures.
of most existing pan-sharpening methods showed that they are based mainly on the two models or assumptions: spectral consistency for high resolution multispectral data and spatial consistency for multispectral data.Additionally, spectral transformation and filtering based methods are based on a pure pixels assumption.Thus, their usage for mixed pixels can lead to wrong image fusion results.Two methods, one based on a linear unmixing model and another one based on a spatial unmixing, are described which respect models assumed and thus can produce correct or physically justified fusion results.Model based methods based on Bayesian and sparse representation frameworks are still in the development, but seem to be very promising.
Equations (18,19)present analytic solutions of the pansharpening task under the assumptions(1,2,3,5,17)or simply for pure pixels for additive (13) and multiplicative (14) models respectively.We have to note that this solution accounts for the variation of pixel values in high resolution panchromatic image, what it makes different from a trivial solution (11).This solution is valid only for pure pixels (17).
is presented in Table4.
Thus, it can be used to define an additional quality measure: Quality measure based on Noise (QN), which can be based on the comparison of noise values (standard deviations) in homogeneous areas of images s lr and s hr