Blind image fusion for hyperspectral imaging with the directional total variation

Hyperspectral imaging is a cutting-edge type of remote sensing used for mapping vegetation properties, rock minerals and other materials. A major drawback of hyperspectral imaging devices is their intrinsic low spatial resolution. In this paper, we propose a method for increasing the spatial resolution of a hyperspectral image by fusing it with an image of higher spatial resolution that was obtained with a different imaging modality. This is accomplished by solving a variational problem in which the regularization functional is the directional total variation. To accommodate for possible mis-registrations between the two images, we consider a non-convex blind super-resolution problem where both a fused image and the corresponding convolution kernel are estimated. Using this approach, our model can realign the given images if needed. Our experimental results indicate that the non-convexity is negligible in practice and that reliable solutions can be computed using a variety of different optimization algorithms. Numerical results on real remote sensing data from plant sciences and urban monitoring show the potential of the proposed method and suggests that it is robust with respect to the regularization parameters, mis-registration and the shape of the kernel.


Introduction
Hyperspectral imaging is an earth observation technique that measures the energy of light reflected off the earth's surface within many narrow spectral bands, from which reflectance curves can be calculated. The shape of these reflectance curves provides information about the chemical and physical properties of materials such as minerals in rocks, vegetation, synthetic materials and water, which allows these materials to be classified and mapped remotely (see [1,2], for instance).
Usually, hyperspectral images of the earth are being recorded by imaging devices that are mounted on planes or satellites. In either case, the speed of the carrier and narrowness of the spectral bands result in limited energy reaching the sensor for each spectral band. Therefore, most hyperspectral cameras have an intrinsic trade-off between spatial and spectral resolution [3,4]. A standard approach to overcoming this trade-off is to record a second image with low spectral but high spatial resolution (e.g. a photograph or a laser scan) [5] and to fuse these data together in order to create an image that has the high spectral resolution of the hyperspectral image and the high spatial resolution of the photograph [3][4][5][6][7][8][9][10][11][12][13]. These techniques have become known as pansharpening or image fusion. The problem is illustrated by three example data sets in figure 1.
Most image fusion techniques are based on certain assumptions regarding the data and the corresponding imaging devices. First, it is often assumed that the image of high spatial resolution is a linear combination of the spectral channels with known weights [4]. Second, the loss of resolution is usually modeled as a linear operator which consists of a subsampled convolution with known kernel (point spread function). While both assumptions may be justified in some applications, it may be difficult to measure or estimate the weights and the convolution kernel in a practical situation.
Instead of having accurate descriptions or measurements about the system response of our cameras [4,6,12], we make use of the fact that both images are acquired from the same scenery. While the actual image intensities will depend on the wavelengths that are being recorded and are therefore dissimilar, the geometric structure of those images is likely to be very similar as they are both taken from the same terrain [6,7,9,12]. This approach has been studied extensively in the context of medical imaging where traditionally an anatomical imaging modality with high spatial resolution is used to aid the reconstruction of a functional imaging modality, cf. e.g. [14][15][16][17][18][19][20].
Note that the hyperspectral data image and the high-resolution photograph data non-blind blind data ta non-blind non-blind blind usually undergo a georectification procedure which aligns them using a digital elevation map of the study area. However, this adjustment is often imprecise, particularly in topographically complex regions, so further steps are required to co-register the images precisely [21]. Moreover, having a good description of the point spread function of the system is very unnatural. Thus, we take a blind approach and estimate the point spread function within our reconstruction and thereby learn this information from the recorded data. This approach is intrinsically related to blind deconvolution [22][23][24][25]. Figure 2 shows an example of blind vs non-blind image fusion applied to the real remote sensing data with unknown blurring kernel. It can be seen that the shape of the kernel that is estimated during the reconstruction clearly differs from the fixed kernel. Equally important is the translation of the estimated kernel, which can compensate for shifts between the given image pair. To the best of our knowledge this is the first contribution where structural side information / anatomical priors are combined with registration which removes the unnatural assumption of perfectly registered images that is usually made, cf. [14][15][16][17][18][19][20], for instance.

Mathematical Model and Notation
In this work we consider the problem of simultaneous image fusion and blind image deconvolution which can be cast as the optimization problem where the forward operator A k is a sampled blur with kernel k. Thus, it relates an image u ∈ U := R m of size m = (m 1 , m 2 ) to blurred and subsampled data f ∈ R n , n = (n 1 , n 2 ). Note that we used multi-index notation which means R m = R m1×m2 , for instance. Here, we consider the case of super-resolution n < m (true if and only if n 1 < m 1 and n 2 < m 2 ). The data discrepancy is measured in terms of the Euclidean / Frobenius norm x 2 := i x 2 i and the optimal solution is regularized through the penalty functionals R u and R k .
Since-up to now-the ingredients of our model are only vaguely defined, the remainder of this section is devoted to an detailed explanation of the operators and regularization functionals involved.
Blind Image Fusion for Hyperspectral Imaging 4

The Forward Operator
The forward operator A k := S • B • C k is the composition of three operators: C k represents a convolution with a kernel k, B performs boundary treatment and S is a sampling operator.
The convolution operator C k : U → U with kernel k ∈ K := R r , r := (r 1 , r 2 ) is defined by where * is the cyclic convolution of images in U = R m . Since the kernel k is generally assumed to have smaller support than the image u, i.e. r < m, we introduced the embedding operator J : K → U which embeds the 'small' convolution kernel k ∈ K into the image space U by padding it with zeros. By assuming periodic boundary conditions on u and Jk, the convolution can be efficiently computed via the discrete Fourier transform F, i.e. C k u = F −1 (FJk Fu) where denotes the Hadamard product (pointwise multiplication). With a slight abuse of notation, we can exploit the symmetry of the convolution to infer and thus which will prove helpful later-on. While periodic boundary conditions are computationally useful, any kind of boundary treatment is artificial and may result in artifacts. For instance, boundary artifacts are expected in the boundary region with margins l := (r − 1)/2 where r is the diameter of the convolution kernel. To avoid these artefacts, we follow the boundary-layer approach (cf. [26] and references therein; also described by Almeida and Figueiredo in [27]) and introduce a boundary clipping (margin deletion) operator that maps a large image u to a meaningful image Bu ∈ R m−2l which is independent of the type of boundary conditions used for the precedent convolution. Thus, after computation of an optimal solution u * of (1) we only use the meaningful part Bu * . Finally, the third component of our forward operator is the sampling S : R m−2l → R n which mimics an integration sensor of size s × s, i.e.
where the detector indices are given by the set ∆ i := {j = (j 1 , j 2 ) | i ≤ j < i + s}, which contains s 2 elements. Consequently, for a kernel of odd dimensions the relation m − 2l = sn connects the sizes of kernel, image and data. Let us now turn to the regularization and prior knowledge on the image u and the kernel k.
Due to the periodic boundary conditions needed for the forward operator, the discretized gradient (with forward differences) ∇u ∈ U 2 of an image u ∈ U is where e 1 = (1, 0) and e 2 = (0, 1) are the standard unit vectors in R 2 .
In the task at hand, the high-resolution photograph can be viewed as a source of additional a-priori knowledge about the sought solution. As both the hyperspectral data and the high-resolution photograph depict the same scenery, it is reasonable to assume that they show the similar geometrical structures albeit having very different intensities. However, the TV functional (7) is not able to incorporate this kind of additional prior information. Instead, we will apply the so-called directional TV (dTV), which can fuse the structural information of a high-resolution image with the intensities of a low-resolution channel of a hyperspectral image. Figure 3 provides a visual comparison of reconstructions obtained from applying the TV and the dTV functional, respectively.
Definition (Directional Total Variation [18]). Let ξ ∈ U 2 , ξ i ≤ γ < 1 be a vectorfield that determines directions and denote by P ∈ U 2×2 , P i := I−ξ i ⊗ξ i an associated matrix-field where I is the 2 × 2 identity matrix and ⊗ marks the outer product of vectors. Then the directional total variation dTV : U → R is defined as as Note that due to the linearity of P i the directional total variation is convex. Let us shortly comment on the interpretation of the directional total variation. The vectorfield ξ allows the definition of directions that should be less penalized and therefore get promoted. It is easy to see that the quantity P i ∇u i can be expanded as where ·, · denotes the inner product on R 2 . This expression reduces to (1− ξ i 2 )∇u i in regions where ∇u i is collinear to ξ i , and to ∇u i where ∇u i is orthogonal to ξ i . Thus, gradients that are aligned / collinear ξ i are favored as long as ξ i > 0. In the extreme case ξ i = 1, aligned gradients are not penalized any more. It is important to note, that this prior does not enforce gradients in the direction of ξ i , as a gradient in the direction of ξ i is never 'cheaper' than a zero gradient.
The uniform upper bound γ < 1 ensures that the directional total variation is equivalent to the total variation in a semi-norm sense, i.e.
Similar versions of the directional total variation have been used before [17,19,29] and it is related to other anisotropic priors [14,30,31] and the notion of parallel level sets [32][33][34]. Note that the directional total variation generalizes the usual total variation (7) for ξ = 0 and other versions of the directional total variation [35,36] where the direction ξ is constant and not depending in the pixel location i. With the isotropic choice P i = w i I, 0 ≤ w i ≤ 1 it reduces to weighted total variation [18,29,[37][38][39].
Having this definition of the directional total variation at hand, the question is how to define the vector-field ξ ∈ U 2 . As we want to favor images having a similar structure as the high-resolution photograph, we let v ∈ U denote a gray-scale image which is generated from the color high-resolution image and define the vector-field where x ε := x 2 + ε 2 and the scaling γ ∈ [0, 1) ensures that ξ i ≤ γ. The parameter ε > 0 guarantees that the vector-field ξ is well-defined, even if ∇v i = 0. Moreover, it allows to control which gradient magnitudes ∇v i should be considered meaningful and which are merely due to noise. Thus, both parameters ε and γ in general will depend on the given side information v but can be chosen without considering a particular data set f . Note that in practice, one chooses γ very close or even equal to 1 (cf. [18] for the latter) such that the side information image has sufficiently much influence. Extensive studies of this parameter are performed in [36] where the authors obtain their best results for small values of a := 1 − γ 2 . An example photograph, its gray scale version and the corresponding vector-field are shown in figure 4.

7
In addition to the structural a-priori knowledge given by the photograph and encoded in the directional total variation, it is reasonable to assume that the solution we are looking for is non-negative. Thus, the regularization functional for the image becomes where the characteristic function of the non-negative quadrant is defined as The regularization parameter for the image λ u > 0 controls the trade-off between regularity and data fidelity.

Kernel Regularization
Similarly as for the image u, it is reasonable to assume that the kernel is non-negative and regular. Motion blurs that arise from ideal diffraction-free lenses have typically a compactly supported kernel with sharp cut-off boundaries [26, p. 234]. Thus, the desired regularity of the kernel may be achieved by utilizing the total variation. Of course, other smoothness priors such as the H 1 -semi-norm or total generalized variation [40] are possible, too, and will be subject of future work. In addition, blind deconvolution has the intrinsic non-uniqueness in the problem formulation that does not allow to estimate the correct global scaling factors for both the kernel and the image. To circumvent this issue, a common choice (e.g. [24][25][26]41]) is to normalize the kernel with respect to the 1-norm. Thus, together with the non-negativity, we assume that the kernel is in the unit simplex Letting ı S denote the characteristic function of the unit simplex, defined analogously to (14), the regularization functional for the kernel becomes where we again can trade-off regularity and data fidelity by the regularization parameter λ k > 0 which in general needs to be chosen differently than the regularization parameter for the image λ u .

Algorithms
Given the abstract mathematical model (1) which models the solution that we are after, this section discusses algorithms that will attempt to numerically compute this solution. For a concise presentation, we cast problem (1) in the form by defining Ψ(u, k) := D(u, k) + R u (u) + R k (k) and abbreviating the data fidelity term in (1) by D(u, k) : While the function Ψ is not convex in the joint variable (u, k), it is bi-convex, i.e. it is convex in each of the variables u and k. The latter holds true as the regularization functions R u and R k are convex and the data fidelity D is bi-convex.
Blind Image Fusion for Hyperspectral Imaging 8 An abstract algorithm for finding solutions of (17) is the proximal alternating minimization (PAM) algorithm [42]. Given an image-kernel pair (u, k) and step sizes τ u , τ k > 0, it computes the next iterate (u + , k + ) by the update scheme Under reasonably mild conditions on the regularity of Ψ (see [42] for more details), PAM monotonously decreases the objective and its iterates converge to a critical point of Ψ. As the overall problem is non-convex, we cannot hope to find global solutions with this strategy.

Proximal alternating linearized minimization (PALM)
In addition, we may exploit the fact that the data term D is continuously differentiable with respect to u and k and that it is not very difficult to compute the proximal operators with respect to the regularization functionals R u and R k . The proximal operator of a functional R is defined as the solution operator to a denoising problem with regularization term R prox R (y) := arg min These extra information can be utilized with the proximal alternating linearized minimization (PALM) [43]. Linearizing the differentiable part D of the objective function Ψ in the update of (18) yields the new update It can be interpreted both as a linearized version of PAM and as an alternating form of forward-backward splitting [44,45]. Note that PALM also monotonously decreases the objective and has the same convergence guarantees as long as the step size parameters are well-chosen which we will discuss later in this section.

Inertial PALM (iPALM)
Local methods for non-convex optimization may lead to critical points that are not global minimizers. The authors of [46] proposed an inertial variant of forwardbackward splitting which next to the gradient direction gives inertia to steer the iterates in the previously chosen direction. They empirically showed on a simple two dimensional problem that inertia may help to avoid spurious critical points. An inertial version of PALM [41], called iPALM, follows the updates Blind Image Fusion for Hyperspectral Imaging 9 with inertia parameter α ∈ [0, 1). Next to the current iterate (u, k) it also depends on the previous iterate (u − , k − ). We remark that PALM can be recovered by vanishing inertial parameter α = 0. Also note that one may define iPALM not with one inertial parameter α but with four different ones [41]. While iPALM is not guaranteed to decrease the objective Ψ, it monotonously decreases a surrogate functional that relates to Ψ [41] and its iterates are guaranteed to converge to a critical point of Ψ for wellchosen step size and inertia parameter.
Algorithm 1 PALM / iPALM with backtracking. PALM is recovered by setting the inertial parameter α = 0. The parameters L, L, η, θ can all be chosen different for the image u and kernel k but for simplicity we chose not to.

Step sizes and backtracking
The convergence of PALM and iPALM depend on the step size parameters τ u , τ k . Following [41], we choose the step size parameters as where θ > 1 is a global constant and L x is the local Lipschitz constant of the derivative of the data fit D with respect to x. Using the relation (4), its derivatives with respect to the image and kernel are which shows that L u will depend on k and L k on u. While it is possible to conservatively estimate these constants, standard estimates turn out to be highly pessimistic and one can do significantly better by determining L u and L k with a backtracking scheme [46,47]. The actual local Lipschitz constants L u and L k satisfy the descent inequalities (also called descent Lemma) [41,43,46,47] If during the iterations these inequalities are not satisfied, we increase the Lipschitz estimates by a constant factor η > 1 and repeat the iteration.
In addition to the descent inequality, an inexact evaluation of the proximal operators (e.g. due to finite iterations) may also hinder convergence. Thus, we will evaluate the proximal operators to a precision such that the proximal descent inequalities hold. Equations (24), (25) and the parameters L u , L k guarantee the monotonic descent of PALM and iPALM with respect to the objective or its surrogate [41,43]. The pseudo code for PALM and iPALM with this backtracking can be found in algorithm 1.

Numerical Setting
Data In this work we use data from two separate data sets. The first data set [48] is from environmental remote sensing to study vegetation in the Mediterranean woodlands in Alta Tajo natural park, northeast of Madrid, Spain [49]. It was acquired in October to November 2014 using an AISA Eagle hyperspectral sensor and a Leica RCD105 39 megapixel digital camera mounted on a Dornier 228 aircraft. The hyperspectral imagery consisting of 126 spectral bands almost equidistantly covering the wavelengths 400 nm to 970 nm with a spatial resolution of 1 m x 1 m. The aerial photograph has a resolution of 0.25 m x 0.25 m but only captures channels corresponding to the visible red, green and blue. We refer to images from this data set as environment1, environment2.
The second data set [50,51] shows an urban area of Vancouver, Canada. It was acquired in March to May 2015 by the DEIMOS-2 satellite operating on a mean altitude of 620 km. The satellite carried a very highly resolving push-broom camera with one panchromatic channel of 1 m x 1 m resolution and the four multispectral channels red, green, blue and near-infrared with a resolution of 4 m x 4 m. Images from this data set are marked urban1, urban2.
Next to the already described real data sets we simulated an environmental data set where we know the ground truth image and kernel. We test two different kernels, one which is disk-shaped and one which is an off-centered Gaussian and refer to the corresponding data sets as groundtruth disk and groundtruth Gaussian. The red channel of an RGB image is convolved with these two kernels, subsampled by a factor of s = 4 and Gaussian noise of variance 0.001 is added. The data is shown at the top left Blind Image Fusion for Hyperspectral Imaging 11 of figures 12 and 13. For the data set groundtruth Gaussian we use the RGB image itself as side information and for groundtruth disk we use a shifted version of it. This situation occurs frequently in applications since usually data and side information are acquired by different imaging devices which can lead to a relative translation between the observed images. Note that we used replicated instead of periodic boundary conditions for the convolution in order to avoid inverse crime. As the ground truth is known in these cases, we evaluate the result in terms of the structural similarity index (SSIM) [52] and the Haar wavelet-based perceptual similarity index (HPSI) [53]. As compared to measures like the peak-signal-to-noise ratio (PSNR) or the mean squared error (MSE), the SSIM and the HPSI aim at reproducing how image similarity is perceived by human viewers. The SSIM is one of the most widely used image similarity measures and obtained by comparing local statistical parameters such as local means and local variances. The HPSI is based on a simple discrete Haar wavelet transform and incorporates basic assumptions about the human visual system. The HPSI was recently shown to yield state-of-the-art correlations with human opinion scores on large benchmarking databases of differently distorted images.
In all numerical experiments the data f are scaled to [0, 1].
Visualization Throughout the results, the data and reconstructed images are shown with MATLAB's 'parula' colormap where yellow represents values ≥ 1 and dark blue corresponds to values ≤ 0, respectively. The kernels are visualized with gray scales where black corresponds to the smallest value and white to the largest. In addition, we draw a red hair cross on the center, i.e. the pixel with indices l + 1, where l denotes the radius of the kernel. The kernel's centroid i ik i is visualized by a green hair cross in order to visualize the off-set between data and side information.
Algorithms The numerical results will be computed with the algorithms PALM, iPALM and PAM. For PAM, the inner problems require solving a deconvolution problem with convex regularization which is implemented with ADMM [54][55][56] following a similar strategy as in [18]. All three-PALM, iPALM and PAM-require the computation of proximal operators for the total variation and directional total variation under a convex constraint. These are implemented with a finite number of warm started Fast Gradient Projection/FISTA [47,57] iterations. Within these, projections onto the constraint sets are computed by prox ı [0,∞) m (u) = max(0, u) (to be understood componentwise) and the fast algorithm of [58] for the projection onto the unit simplex. A MATLAB implementation of the algorithms and the data itself can be found at [59].

Parameters
The numerical experiments require model parameters and algorithm parameters. For the mathematical model, the parameters of the directional total variation are chosen as γ = 0.9995 and ε = 0.003 for all experiments, cf. (12) and figure A1 in the appendix. The low resolution images are of size 100 × 100 with an subsampling rate of s = 4. Since we fix the kernel size to be 41 × 41, we will consequently obtain reconstructions of size 440 × 440 where we only visualize the meaningful part of size 400 × 400. Except for the algorithm comparison, we use PALM with 2000 iterations and a step size constant θ = 1.1, cf. (22). The backtracking parameters in algorithm 1 are chosen as η = 2, L = 1 and L = 10 30 .
Initialization Both algorithm 1 and our implementation of PAM require initial guesses for image and kernel. While the kernel is initialized with a compactly supported Gaussian kernel, we choose the initial image to be Hf , where H : R n → U is a right inverse of the operator S•B, i.e H is an upsampling operator (cf. the top-left images in figures 5 and 7).    10 for environment1. Please refer to the captions of each figure for some detailed observations. We would like to highlight two important observations. First, both examples show that all three algorithms converge to practically the same critical point and that a visually pleasing reconstruction is already obtained after about 500 iterations. Thus, even though problem (1) is non-convex, the algorithm itself does not seem to have much influence on the final reconstructions. Second, the examples show that inertia may slow the convergence down as for groundtruth disk or may make it faster as for environment1. Other results-which are not shown here for brevityindicate that this is correlated with the regularization parameter λ u . For large values, e.g. λ u ≥ 1, we found that inertia consistently sped up convergence while for smaller values, e.g. λ u ≤ 0.1 we found that it was always slowing down the convergence. Thus, for simplicity and due to the lack of supporting examples that would justify other decisions, all further reconstructions are performed by PALM.

Simulated Data
This section is dedicated to simulated data where the ground truth image is known. Figure 11 shows the similarity indices SSIM and HPSI of the reconstructed images when varying the image and kernel regularization parameters λ u and λ k . The results show that the proposed model is stable with respect to the choice of parameters as long as the image regularization λ u is not too small. It is remarkable that the success of the groundtruth Gaussian Figure 11. Similarity measures SSIM and HPSI for different values of the regularization parameters λu and λ k . Top: data set groundtruth disk with shifted side information. Bottom: data set groundtruth Gaussian. Yellow corresponds to high and blue to low similarity. All four plots indicate that the reconstructions are not very sensitive with respect to the kernel regularization. model depends only mildly on the choice of λ k and a vanishing kernel regularization λ k = 0 has only modest impact. However, it can also be seen that best reconstructions are obtained for non-zero λ k which indicates that a mild kernel regularization should be performed.
In figure 12 we show reconstruction results for the groundtruth disk test case where we fix the image regularizing parameter λ u = 0.1 and vary the kernel regularization λ k . Note that we complicated the situation by using a side information which is shifted by several pixels and, thus, not aligned to the data. The proposed approach is well-fit to deal with this problem since blind deconvolution can implicitly perform rigid registration of data and side information by producing a blurring kernel that compensates for the translation by having an off-centered centroid. The results show that our method is able to compensate for unaligned side information images and can achieve very good results, nevertheless. The influence of the amount of kernel regularization is noticeably small although over-regularization clearly leads to different contrasts in the reconstructed image. Note that the barycentric translations of the kernels in figure 12 correspond exactly-up to pixel accuracy-to the number of pixels that the shifted side information deviates from the un-shifted one. Figure 13 shows in a similar way the reconstruction results for the test case groundtruth Gaussian, using the un-shifted side information. In contrast to the previous example, here we fix the parameter λ k = 10 and vary λ u instead. Here, the effects of different regularization strengths are considerably larger; particularly, in  the case λ u = 0.01 the shift between data and side information-which was induced by convolving with the off-centered kernel-is not fully detected. Consequently, the resulting similarity values, being not translationally invariant, are relatively poor. In the opposite scenario of over-regularizing the image, small structures in the image are not captured well.  However, due to a lower noise level of the data, the parameters can be chosen smaller.

Hyperspectral Data
In this section we apply our method to real data and-similarly to the previous section-investigate the effects of varying image and kernel regularization. To this end, we apply our method to one channel of the hyperspectral environmental and of the multispectral urban data set. Figures 14 and 15 show data, side information, and the corresponding reconstruction results for different values of the regularization parameters. Since no ground truth solutions are known in this case, we rely on our visual impression to choose a set of parameters. Figure 14, which corresponds to nearinfrared light, suggests that a good compromise among data fidelity and regularization  is achieved for λ u = λ k = 1. However, even if no TV regularity of the kernel is required, that is in the case λ k = 0, the resulting images do hardly differ. Bearing in mind the results of figure 12, we will regularize the kernel slightly in the sequel. The situation is similar in figure 15, which also corresponds to near-infrared light. However, a smaller image regularization parameter, e.g. λ u = 0.01, seems necessary in order to avoid over-smoothing.
Finally, we apply the proposed strategy to several hyperspectral channels of the data set environment2, using the same parameters for the reconstruction of each channel. Here, we scaled the reconstructed images back to their physical range to allow inter-channel comparisons. Figure 16 shows the reconstruction of six hyperspectral channels ranging from blue to near-infrared light. Note that the resulting kernels only change slightly with respect to the channel. We would like to point out that we were able to use the same set of parameters, namely λ u = λ k = 1, for all channels. In figure 17 we show the reconstruction of all channels of the multispectral urban data set urban2 using λ u = 0.01 and λ k = 1. Additionally, we visualized the results by re-combining the four channels into an RGB and a false color image which has the channels near-infrared, red and green. We would like to highlight that the kernels are consistently estimated up to a pixel accuracy and that no color smearing is visible in the RGB and false color images despite the fact that all channels have been processed independently.

Conclusions and Outlook
We have presented a novel model for simultaneous image fusion and blind deblurring of hyperspectral images based on the directional total variation. We showed that this approach yields sharp images for both environmental and urban data sets that are either acquired by a plane or by a satellite. In addition, the numerical comparison of different algorithms showed that-despite the non-convexity of the problem-the computed solutions are stable with respect to the algorithms. The implicitly included registration by estimating the kernel yields further robustness to imperfections in the image acquisitions. Thus, the proposed approach is appealing for automated processing of large data sets.
Future work will be devoted to multiple directions. First, the proposed approach is channel-by-channel which makes it computationally appealing. However, a coupling of the spectral channels via their kernel and/or via spectral fingerprint preservation may yield further improvements. Second, higher order regularization such as the total generalized variation [40] has been proven useful for natural images. This naturally leads to the directional total generalized variation which combines higher order smoothness with structural a-priori knowledge as presented in this work, thereby generalizing the definition of [36]. Similarly, other kernel regularization functionals such as the total generalized variation or the H 1 -semi-norm will be considered in future investigations to better model the smoothness of the kernel. Finally, it is currently not clear how the computational cost of the proposed method depends on the size of the considered data. In particular, it is difficult to predict to which degree larger images also require a higher number of iterations during optimization for the method to yield satisfactory results. However, due to the fact that our approach mostly depends on operations that are of a local nature, it should be highly suitable for parallelization. The development and analysis of implementations that utilize the benefits of distributed computing architectures is also a topic for future research.