Robust Core Tensor Dictionary Learning with Modified Gaussian Mixture Model for Multispectral Image Restoration

Abstract: The multispectral remote sensing image (MS-RSI) is degraded by the multispectral camera due to various hardware limitations. In this paper, we propose a novel core tensor dictionary learning approach with a robust modified Gaussian mixture model for MS-RSI restoration. First, each multispectral patch is modeled by a 3rd-order tensor and high-order singular value decomposition is applied to the tensor. The task of MS-RSI restoration is then formulated as a minimum sparse core tensor estimation problem. To improve the accuracy of core tensor coding, core tensor estimation based on the robust modified Gaussian mixture model is introduced into the proposed model by exploiting the sparse distribution prior of the image. When applied to MS-RSI restoration, our experimental results show that the proposed algorithm reconstructs the sharpness of image textures more faithfully and outperforms several existing state-of-the-art multispectral image restoration methods in both objective image quality and visual perception.


Introduction
The radiance of a real scene is distributed across a wide range of spectral bands. A traditional multispectral remote sensing image (MS-RSI) is acquired by integrating the product of the intensity over four typical band intervals. Characterized by the distinctive advantage of joint spatial-spectral information, MS-RSI has been widely used in many fields such as land-cover and resource survey, geological exploration, environment monitoring and natural disaster monitoring [Xu, Chen, Xia et al. (2018); Fu, Xu, Zhang et al. (2019); Canbaz, Gursoy and Gokce (2018); Liu, Yang, Zhao et al. (2017); Xia and Hu (2016)]. In real cases, however, an MS-RSI is always corrupted by noise that is generally introduced by equipment limitations such as sensor sensitivity, photon effects and calibration error [Gong and Chen (2019)]. Moreover, although the MS-RSI includes both 2D spatial information and 1D spectral information, it has severe limitations in radiation quality compared with a regular panchromatic image due to the out-of-focus blur of the optical system, the vibration of the camera platform, the disturbance of the turbulent atmosphere, and the low-pass filtering and integral-average effects of the lens. To obtain a high-quality MS-RSI, an image restoration process is therefore often necessary once the multispectral imaging system is fixed. In the MS-RSI restoration field, there are two main classes of methods: 2D approaches and 3D-cube-based approaches. As one of the classical problems in computer vision, 2D image restoration has been studied for more than 50 years and a large number of methods have been proposed, such as NLM [Penna and Mascarenhas (2018)], BM3D [Chen, Luo, Tian et al. (2017)], K-SVD [Chen, Liu, Huang et al. (2017)], dictionary learning [Zhang, Porikli, Sun et al. (2020)] and low-rank matrix approximations. For example, the multilayer dictionary model restrains noise by learning image features with similar spectral characteristics [Liu, Ma, Yang et al. (2017)]; the Laplacian scale mixture distribution and nonlocal low-rank regularization are used to regularize the image recovery process [Huang, Dong, Xie et al. (2017)]. These 2D methods can be directly applied to MS-RSI restoration by processing the images at different bands separately. However, the details and textures cannot be reconstructed effectively because the inter-spectral correlation of the MS-RSI is ignored. A more reasonable extension is the family of 3D-cube-based image restoration methods [Sun and Jeon (2018); Chen and Shen (2015)]. Recently, tensor-based approaches, which regard the MS-RSI as a 3rd-order tensor and apply tensor factorization techniques, have led to many promising results [Peng, Meng, Xu et al. (2014); Dong, Huang, Shi et al. (2018)]. More specifically, Peng et al. proposed an effective nonlocal tensor dictionary learning method for multispectral image denoising: to exploit nonlocal spatiotemporal redundancy, each set of grouped similar 3D patches is linearly approximated by a low-rank tensor approximation. Unlike the rank of a matrix, however, the multi-rank of a tensor cannot be estimated accurately. Therefore, sparse coding approaches have attracted increasing attention [Dian, Fang, Li et al. (2017); Yang, Wang, Li et al. (2015)]. Dian et al. proposed a non-iterative recovery of hyperspectral images via sparse tensors and non-local spatial self-similarities. However, existing multispectral tensor representations cannot make full use of the image structures and the spatial-spectral correlation. Our own recent work demonstrates the potential of sparse core tensor dictionary learning for MS-RSI restoration by exploiting self-repeating patterns in the abundant similar edges and textures [Geng, Nie, Niu et al. (2018)].
To further exploit the sparse distribution prior of the core tensor, we propose a novel core tensor dictionary learning approach with a robust modified Gaussian mixture model (CTDL-MGM). The contributions of this paper are two-fold. First, to improve the accuracy of sparse core tensor coding, the asymmetric and non-Gaussian distribution of MS-RSI data is modeled by a modified Gaussian mixture model with non-zero means and positive scaling variables. Second, an efficient core tensor dictionary learning model, solved by an alternating optimization algorithm, is proposed.

Proposed method
In this section, tensor dictionary learning is presented first. Then we analyze the limitations of existing approaches. Finally, we construct a novel core tensor dictionary learning approach with the robust modified Gaussian mixture model.

Tensor dictionary learning for MS-RSI
We aim at recovering a high-quality MS-RSI X ∈ ℝ^(N×L) from its distorted measurement Y, where N = W×H denotes the number of pixels and L is the number of spectral bands:

Y = ℋX + V (1)

where ℋ is the blurring operator and V is the additive noise. Considering both the structural correlation in space and the global correlation in spectrum, the MS-RSI patch is modeled by the 3rd-order tensor 𝒳_i = ℛ_i X (of size p×p×L), where ℛ_i is the patch-extraction operator at spatial position i. Due to the redundant information, the spatial textures cannot be characterized directly with the multispectral tensor 𝒳_i. We therefore construct the core tensor by high-order singular value decomposition (HOSVD) as in Eq. (2), which preserves spatial structure information with a small amount of computation:

𝒳_i = 𝒮_i ×_1 U_{i,1} ×_2 U_{i,2} ×_3 U_{i,3} (2)

where U_{i,1} ∈ ℝ^(p×p), U_{i,2} ∈ ℝ^(p×p) and U_{i,3} ∈ ℝ^(L×L) are orthogonal matrices; the core tensor 𝒮_i ∈ ℝ^(p×p×L) is a 3D coefficient array; ×_n denotes the n-th mode product. Thanks to both the spatial-structural similarity and the spectral correlation of 𝒳_i, most components of 𝒮̃_i = 𝒳̃_i ×_1 U_{i,1}^⊤ ×_2 U_{i,2}^⊤ ×_3 U_{i,3}^⊤ are close to zero, and the corresponding 1D representation s̃_i is highly sparse, as shown in Fig. 1. Due to the orthogonality of the U_{i,n} (n = 1, 2, 3), the triple (U_{i,1}, U_{i,2}, U_{i,3}) can be regarded as a dictionary, and the distorted 𝒮̃_i can be recovered using the sparse model:

𝒮̂_i = arg min_{𝒮_i} ‖𝒮̃_i − 𝒮_i‖_F² + λψ(𝒮_i) (3)

where ‖𝒮̃_i − 𝒮_i‖_F² is the fidelity constraint term and ψ(𝒮_i) is the sparse constraint term. Popular choices of ψ(•) include the l0 pseudo-norm and the l1 norm, which lead exactly to hard and soft thresholding of the core tensor coefficient array 𝒮̃_i, respectively.
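The HOSVD construction of Eq. (2) and the thresholding interpretation of Eq. (3) can be sketched in NumPy as follows; the function and variable names are ours, and this is a minimal illustration under the paper's stated setup, not the authors' implementation:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a 3rd-order tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """n-mode product T x_n M: multiply matrix M along the given mode."""
    T = np.moveaxis(T, mode, 0)
    out = np.tensordot(M, T, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd(X):
    """HOSVD: X = S x1 U1 x2 U2 x3 U3 with orthogonal factor matrices.

    The factor matrices are the left singular vectors of each unfolding;
    the core tensor S is obtained by projecting X with their transposes."""
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0] for n in range(3)]
    S = X
    for n in range(3):
        S = mode_product(S, U[n].T, n)
    return S, U

def hard_threshold(S, tau):
    """l0-style hard thresholding of the core tensor coefficients."""
    return np.where(np.abs(S) > tau, S, 0.0)

def soft_threshold(S, tau):
    """l1-style soft thresholding of the core tensor coefficients."""
    return np.sign(S) * np.maximum(np.abs(S) - tau, 0.0)
```

For a 6×6×4 patch (the size used in the experiments later), all three factor matrices are square and orthogonal, so the decomposition is exact.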

The limitation of the tensor dictionary learning
For effective MS-RSI restoration, the sparse core tensor 𝒮̂_{i,y}, obtained from the distorted Y by Eq. (3), is expected to be as close as possible to 𝒮̂_{i,x}, obtained from the clean X. However, due to the degradation, 𝒮̂_{i,y} will deviate from its true value 𝒮̂_{i,x}, which directly decreases the quality of the reconstructed image. To illustrate this difference, we define the deviation degree of the core tensor 𝒮̂_{i,y} as:

d(𝒮̂_{i,y}, 𝒮̂_{i,x}) = ‖𝒮̂_{i,y} − 𝒮̂_{i,x}‖_F / ‖𝒮̂_{i,x}‖_F (4)

where ‖•‖_F denotes the Frobenius norm. The quantity d(𝒮̂_{i,y}, 𝒮̂_{i,x}) describes the similarity between 𝒮̂_{i,y} and 𝒮̂_{i,x}. To measure it, we performed simulated experiments on a distorted MS-RSI degraded by Gaussian blurring and Gaussian white noise. For a fair comparison, both 𝒮̂_{i,y} and 𝒮̂_{i,x} were obtained using the same projection matrices (U_{i,1}, U_{i,2}, U_{i,3}). As shown in Fig. 2, most values of d(𝒮̂_{i,y}, 𝒮̂_{i,x}) concentrate in 0.1~0.25, so 𝒮̂_{i,y} deviates obviously from its true value. This indicates that an accurate 𝒮̂_{i,y} cannot be obtained by the plain tensor dictionary learning model, and that the reconstruction performance can be further improved by learning a better estimate of 𝒮̂_{i,y}. This observation motivates the novel tensor dictionary learning approach discussed in Subsection 2.3.
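A direct computation of this deviation degree (our reading of Eq. (4); the paper's exact normalization may differ) is straightforward:

```python
import numpy as np

def deviation_degree(S_y, S_x):
    # Relative Frobenius deviation of the distorted core tensor S_y
    # from the clean core tensor S_x (both projected with the same
    # dictionary matrices).
    return np.linalg.norm(S_y - S_x) / np.linalg.norm(S_x)
```

A value of 0 means the distorted core tensor matches the clean one exactly; the 0.1~0.25 range reported in Fig. 2 corresponds to a 10-25% relative error under this definition.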

Core Tensor dictionary learning via modified Gaussian mixture modeling
As we can see from Eq. (3), the regularization function ψ(•) is critical in sparse tensor approximation. In this subsection, we propose a maximum a posteriori (MAP) method for estimating 𝒮 from the distorted 𝒮̃, which can be formulated as:

𝒮̂ = arg max_𝒮 log P(𝒮 | 𝒮̃) = arg max_𝒮 {log P(𝒮̃ | 𝒮) + log P(𝒮)} (5)

where log P(𝒮̃ | 𝒮) is given by the Gaussian distribution of the noise, i.e.:

P(𝒮̃ | 𝒮) ∝ exp(−(1/2σ_n²) ‖𝒮̃ − 𝒮‖_F²) (6)

and P(𝒮) is the distribution prior of 𝒮. To exploit the nonlocally similar textures, the tensors {𝒳_i, i = 1, ⋯, M} are clustered into K sets and their corresponding core tensor sets Ω_k = {𝒮_j | j ∈ Ω_k} are obtained. Each core tensor set should then be characterized by the same prior (i.e., a probability density function with the same parameters). Motivated by the BAMM algorithm [Nguyen, Wu, Mukherjee et al. (2014)] and a GM-based method [Mallouli (2019)], in this paper we modify the asymmetric Gaussian mixture model to fit different shapes of the observed data and to incorporate spatial information. The density function P(𝒮_i | Π, Θ) for each core tensor is defined as:

P(𝒮_i | Π, Θ) = Σ_{k=1}^{K} π_k Σ_{j ∈ N_i} ω_{ij} P(𝒮_j | θ_k) (7)

where Π = {π_k} is the set of between-cluster prior probabilities, which models the prior probability of a core tensor belonging to label Ω_k and satisfies the constraints 0 ≤ π_k ≤ 1 and Σ_{k=1}^{K} π_k = 1. The probability P(𝒮_j | θ_k) is a statistical distribution, and ω_{ij} is the weight factor of the j-th tensor in the local neighborhood N_i = {𝒮_1, ⋯, 𝒮_R} around tensor 𝒮_i, which will be presented in detail in Subsection 2.4. Since the observations are generally considered statistically independent among the core tensor sets, the joint conditional density over the core tensor set 𝒮 = {𝒮_i} can be calculated using the novel modified Gaussian mixture (MGM) model:

Φ(𝒮 | Π, Θ) = ∏_i P(𝒮_i | Π, Θ) = ∏_i Σ_{k=1}^{K} π_k Σ_{j ∈ N_i} ω_{ij} P(𝒮_j | θ_k) (8)

The joint conditional density Φ(𝒮 | Π, Θ) is mainly influenced by the probabilities of the core tensors P(𝒮_j | θ_k) in the neighborhood N_i around each tensor 𝒮_i. The distribution P(𝒮_j | θ_k) is the key component and can be any kind of statistical distribution (e.g., Laplacian, generalized Gaussian, or Student's t-distribution).
However, these traditional distributions are symmetric and unimodal, do not consider spatial information, and often have shorter tails than required. This means they are not flexible enough to fit the shape of the multispectral core tensor, whose distribution is non-Gaussian and asymmetric. Hence each component of 𝒮 should be characterized by multiple statistical distributions. In this paper, we propose a modified Gaussian distribution with a positive scaling variable to model the core tensor 𝒮_j, which can be written in the form:

P(𝒮_j | θ_k) = N(s_j; η_j μ_k, Σ_k) = (2π)^(−D/2) |Σ_k|^(−1/2) exp(−(1/2)(s_j − η_j μ_k)^⊤ Σ_k^(−1) (s_j − η_j μ_k)) (9)

where s_j is the 1D representation of 𝒮_j, η_j > 0 is the positive scaling variable, μ_k is the mean value, the D×D matrix Σ_k is the covariance, and |Σ_k| denotes the determinant of Σ_k. Consequently, we adopt this option in this work, which translates the core tensor dictionary learning model (described in Eq. (5)) into:

𝒮̂ = arg min_𝒮 ‖𝒮̃ − 𝒮‖_F² + λ‖𝒮‖_1 − γ log Φ(𝒮 | Π, Θ) (10)

where ‖𝒮̃ − 𝒮‖_F² is the fidelity constraint term and ‖𝒮‖_1 is the sparse constraint term. As shown in Fig. 2, the core tensor is naturally sparse thanks to both the spatial-structural similarity and the spectral correlation, so the sparse constraint term ‖𝒮‖_1 can be removed. The above core tensor dictionary learning model can therefore be converted into:

𝒮̂ = arg min_𝒮 ‖𝒮̃ − 𝒮‖_F² − γ log Φ(𝒮 | Π, Θ) (11)

where ω is the weight factor vector of the core tensors in the local neighborhood, which will be presented in detail in Subsection 2.4, μ is the mean vector with D dimensions, Σ is the covariance matrix of size D×D, and |Σ| is the determinant of Σ. We call this novel formulation Core Tensor Dictionary Learning with the Modified Gaussian mixture (CTDL-MGM); a computationally efficient solution will be discussed in Section 3.
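Under our reconstruction of Eqs. (7)-(9), evaluating the modified mixture density for one core tensor might look like the sketch below; the helper names and the exact factorization of the sums are our assumptions, not the paper's code:

```python
import numpy as np

def gaussian_pdf(s, mu, cov):
    # Multivariate normal density N(s; mu, cov) for a vectorized core tensor.
    D = s.size
    diff = s - mu
    quad = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** D * np.linalg.det(cov))

def mgm_density(neighbors, omega, eta, pi, mus, covs):
    # p(S_i | Pi, Theta) = sum_k pi_k * sum_j omega_ij * N(s_j; eta_j*mu_k, Sigma_k):
    # a neighborhood-weighted Gaussian mixture in which each neighbor's mean
    # is rescaled by its positive scaling variable eta_j.
    p = 0.0
    for k in range(len(pi)):
        for j, s_j in enumerate(neighbors):
            p += pi[k] * omega[j] * gaussian_pdf(s_j, eta[j] * mus[k], covs[k])
    return p
```

With a single component, a single neighbor, unit weight and unit scaling, this reduces to an ordinary multivariate Gaussian, which is a useful sanity check.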

Construction of weight factor
In our proposed model, the conditional probability of the core tensor 𝒮_i is influenced by the probabilities of the core tensors in its neighborhood N_i. To incorporate spatial-spectral information and core tensor intensity information, we add a weight factor for distant core tensors in order to distinguish among the contributions of different core tensors, with the weights decreasing as the distance increases. Most state-of-the-art algorithms using spatial-spectral information, including the nonlocal core tensor dictionary learning method [Geng, Nie, Niu et al. (2018)] and clustering methods [Li, Zhang, Lu et al. (2017); Chen, Xiong, Xu et al. (2019)], have yielded effective results, but they still have some disadvantages: (1) Although the introduction of local spatial and spectral information makes the corresponding algorithms overcome the impact of noise, they still lack robustness to noise of different types or levels, especially in the absence of prior knowledge of the noise; (2) One or more controlling parameters are generally introduced to balance robustness to noise against effectiveness in preserving details. Unfortunately, these parameters have to be selected by experience or by trial-and-error experiments. To overcome the above disadvantages, a new spatial weight factor should be constructed with the following characteristics: it controls the influence of the neighborhood core tensors depending on their distances from each other within the neighborhood, to further improve robustness to noise; it utilizes the intensity information of the original image to make the algorithm insensitive to initialization; and it is free of any balance parameter selection. Let R be the radius of the neighborhood centered at core tensor 𝒮_i. There are obviously (2R+1)² core tensors in each neighborhood, but not all of them are useful for similarity computation. In other words, different core tensors in the neighborhood should have different weights.
Therefore, we construct a novel weight factor ω_{ij}, defined in Eq. (13), where n_i is the number of core tensors in the neighborhood N_i of core tensor 𝒮_i. As the equation shows, the factor is formulated without any control or balance parameters to trade off robustness to noise against the preservation of details. Moreover, the influence of the core tensors within the local window N_i is adjusted flexibly through their spatial-spectral Euclidean distances from the other core tensors in the neighborhood. The proposed weight factor can therefore reflect the damping extent of the neighbors with respect to the spatial-spectral distance from the other neighborhood core tensors, so more local spatial information can be used. It is worth noting that the local window used in our experiments is square, but windows of other shapes, such as a circle or a diamond, can easily be adapted to the proposed weight factor.
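Purely as an illustration of the stated requirements (parameter-free, decaying with the spatial-spectral Euclidean distance to the other neighborhood core tensors, normalized over the neighborhood), one could use inverse accumulated distances as below. This is our hypothetical stand-in, not the paper's actual Eq. (13):

```python
import numpy as np

def weight_factors(neighbors):
    # Hypothetical parameter-free weight: each core tensor's weight decays
    # with its total spatial-spectral Euclidean distance to the other
    # neighborhood core tensors, then the weights are normalized to sum to 1.
    n = len(neighbors)
    d = np.zeros(n)
    for j in range(n):
        d[j] = sum(np.linalg.norm(neighbors[j] - neighbors[m])
                   for m in range(n) if m != j)
    w = 1.0 / (1.0 + d)   # closer to the rest of the neighborhood -> larger weight
    return w / w.sum()    # normalize so the weights sum to one
```

Note the absence of any tunable bandwidth or balance parameter, which is the property the paper emphasizes.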

MS-RSI Restoration with CTDL-MGM
In the previous sections, we have seen how to solve the CTDL-MGM problem for a single core tensor (a collection of image patches similar to a chosen exemplar). In this section, we generalize this formulation to whole-image reconstruction and study applications of CTDL-MGM to image restoration, including image denoising and image deblurring. For the degraded image Y = ℋX + V (as in Eq. (1)), the whole image can be reconstructed by solving the optimization problem:

(X̂, {𝒮̂_i}, Θ̂) = arg min_{X, {𝒮_i}, Θ} ‖Y − ℋX‖_F² + Σ_i (‖(ℛ̃_i X) ×_1 U_{i,1}^⊤ ×_2 U_{i,2}^⊤ ×_3 U_{i,3}^⊤ − 𝒮_i‖_F² − γ log Φ(𝒮_i | Π, Θ)) (12)

where ℛ̃_i X = (ℛ_{i_0} X, ℛ_{i_1} X, ⋯, ℛ_{i_{m−1}} X) denotes an operator extracting similar 3D tensors at spatial position i, and Θ_k = {μ_k, Σ_k} are the parameters of the modified Gaussian mixture model. The above global minimization problem can be decomposed into the following two subproblems: minimization over X for fixed {𝒮_i} and {Θ_k}, and minimization over {𝒮_i} and {Θ_k} for fixed X, which can be solved efficiently by alternating optimization. Specifically, both subproblems admit closed-form solutions when the dictionary (U_{i,1}, U_{i,2}, U_{i,3}) is orthogonal.
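For the X-subproblem, when ℋ is the identity (pure denoising), the closed-form update reduces to averaging the overlapping restored patches back into the image. A sketch with our own helper names, assuming simple grid-based patch extraction rather than the paper's similarity grouping:

```python
import numpy as np

def extract_patches(X, p, step):
    # Collect overlapping p x p full-spectrum patches and their positions.
    H, W, L = X.shape
    patches, positions = [], []
    for r in range(0, H - p + 1, step):
        for c in range(0, W - p + 1, step):
            patches.append(X[r:r+p, c:c+p, :].copy())
            positions.append((r, c))
    return patches, positions

def aggregate_patches(patches, positions, shape, p):
    # Per-pixel average of the overlapping patch estimates: the closed-form
    # X-update of the denoising special case (H = identity).
    num = np.zeros(shape)
    den = np.zeros(shape)
    for (r, c), patch in zip(positions, patches):
        num[r:r+p, c:c+p, :] += patch
        den[r:r+p, c:c+p, :] += 1.0
    return num / np.maximum(den, 1.0)
```

When the patches are left unmodified, aggregation reproduces the original image exactly, which verifies that the extraction/aggregation pair is consistent.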

Solving for { } and { }
When the image X is fixed, the subproblem boils down to a sequence of tensor-level CTDL-MGM problems, i.e., for each exemplar:

𝒮̂_i = arg min_{𝒮_i} ‖𝒮̃_i − 𝒮_i‖_F² − γ log Φ(𝒮_i | Π, Θ)

where we have used 𝒮̃_i = 𝒳̃_i ×_1 U_{i,1}^⊤ ×_2 U_{i,2}^⊤ ×_3 U_{i,3}^⊤. This is exactly the problem studied in the previous section. The μ_k and Σ_k can be computed by the weighted maximum-likelihood updates

μ_k = Σ_{j ∈ Ω_k} ω_j s_j / Σ_{j ∈ Ω_k} ω_j,  Σ_k = Σ_{j ∈ Ω_k} ω_j (s_j − μ_k)(s_j − μ_k)^⊤ / Σ_{j ∈ Ω_k} ω_j

where ω_j is the weight factor constructed in Subsection 2.4. Putting things together, the complete image restoration procedure based on CTDL-MGM is summarized in the following section.
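Assuming the standard weighted maximum-likelihood form of these updates (our reading; the paper's exact expressions are not reproduced here), the per-cluster statistics can be computed as:

```python
import numpy as np

def update_cluster_stats(S_vecs, omega):
    # Weighted mean and covariance of the vectorized core tensors assigned
    # to one cluster; omega holds the Subsection 2.4 weight factors.
    w = np.asarray(omega, dtype=float)
    w = w / w.sum()                        # normalize the weights
    mu = S_vecs.T @ w                      # weighted mean vector
    diff = S_vecs - mu                     # centered samples
    cov = (diff * w[:, None]).T @ diff     # weighted covariance matrix
    return mu, cov
```

With uniform weights this reduces to the ordinary sample mean and (biased) sample covariance, as expected.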

Algorithm of CTDL-MGM
The main procedure of the proposed CTDL-MGM model is summarized in Algorithm 1. Compared with existing approaches, the CTDL-MGM model has three main advantages: (1) The MS-RSI patch is modeled by a 3rd-order tensor, preserving both the structural correlation in space and the global correlation in spectrum. (2) A sparse core-tensor-based model is constructed to solve the core tensor approximation effectively. (3) By exploiting the sparse distribution prior of the core tensor, the proposed CTDL-MGM obtains a better estimate of the core tensor.

Algorithm 1: (1) Calculate the weight factor for each core tensor with Eq. (13).

Experimental results
In this section, we apply the proposed CTDL-MGM algorithm to image restoration, including image denoising and image deblurring. To verify the performance of the proposed method, we used 16 ZY-3 remote sensing images, each with 4 spectral bands: red, blue, green and near-infrared. The parameters of the proposed method are set as follows: the multispectral tensor size is 6×6×4 with a 2-pixel-wide overlap, and the corresponding core tensor size is 6×6×4; the number of clusters is K=10; the mean values of the Gaussian components are initialized randomly; the covariance matrices and the prior probabilities are then initialized from the means. To comprehensively assess performance, we employ two objective image quality indices: peak signal-to-noise ratio (PSNR) and feature similarity (FSIM).
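Of the two indices, PSNR has a simple closed form; FSIM requires phase-congruency and gradient maps and is omitted here. A minimal PSNR helper (a peak value of 255 is assumed for 8-bit data):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher means closer to the reference.
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```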

Image denoising
In this subsection, Gaussian noise is added to the MS-RSIs; such noise arises from many natural sources, such as the spontaneous thermal generation of electrons. We parameterized the Gaussian noise by its standard deviation σ = 10, 20, 35, 50. We compared the CTDL-MGM based image denoising method against three current state-of-the-art methods: structured sparse coding with Gaussian scale mixture (SSC-GSM) [Dong, Shi, Ma et al. (2015)], hierarchical dictionary learning (HDL) [Liu, Ma, Yang et al. (2017)], and nonlocal tensor dictionary learning (NTDL) [Peng, Meng, Xu et al. (2014)]. Both the average PSNR (dB) and the average FSIM at different noise levels are reported in Tab. 1. As can be seen from Tab. 1, the proposed CTDL-MGM method consistently outperforms all the other leading algorithms. The average PSNR improvements over the HDL, SSC-GSM and NTDL methods are larger than about 3 dB, 3 dB and 1 dB, respectively. In addition, the average FSIM improvements are 0.01~0.02 at the low noise levels (σ = 10, 20) and 0.04~0.05 at the high noise levels (σ = 35, 50). On average, HDL and SSC-GSM are the strongest 2D methods; however, they fall behind the 3rd-order tensor dictionary learning methods (NTDL and CTDL-MGM) by up to 1.5 dB in PSNR and 0.01 in FSIM. By exploiting the sparse distribution prior of the multispectral 3rd-order tensor, the proposed CTDL-MGM method achieves denoising performance superior to that of NTDL, which uses nonlocal tensor similarities. Fig. 3 shows a visual comparison of the denoising results for ZY-3 remote sensing images at the moderate noise level (σ = 20). It can easily be observed that the images restored by HDL and SSC-GSM both suffer from noticeable artifacts, especially in the smooth areas close to the bridge; by contrast, NTDL and CTDL-MGM, based on tensor dictionary learning, deliver better visual quality. The MS-RSI restored by our method properly removes the noise while finely preserving the structures underlying the image.

Image deblurring
We also compared CTDL-MGM based image deblurring with three other competing approaches in the literature: structural compact core tensor dictionary learning (SC-CTDL) [Geng, Nie, Niu et al. (2018)], multispectral image deblurring with jitter image trajectory (MID-JIT) [Song, Fang, Pan et al. (2017)], and multispectral image deblurring using inter-channel correlation (MID-IC) [Chen and Shen (2015)]. The performance of the proposed CTDL-MGM model was verified on a set of 16 MS-RSIs. The original images were blurred by a blur kernel and then corrupted by additive Gaussian white noise with standard deviation √2. In our comparative study, two commonly used blur kernels were adopted: (1) a 9×9 uniform blur simulating the relative motion between the subject and the multispectral camera; (2) a Gaussian blur with standard deviation 1.6 simulating the optics blur.

Uniform blur. We applied the proposed method and the three other competing approaches to 16 scenes blurred by the uniform blur. The PSNR and FSIM curves are depicted in Fig. 4. In PSNR, the proposed CTDL-MGM method significantly outperforms the three other methods by 2~3 dB. In FSIM, the proposed method is better than the three other algorithms by about 0.04. In Figs. 5(c) to 5(f), we present the deblurring results of a city region in the blue band. It can easily be observed that MID-JIT and MID-IC perform quite poorly because they do not take the correlation between spatial and spectral information into account. The SC-CTDL method reconstructs the central texture from the details in its neighborhood, which are also corrupted; the corresponding result is therefore not satisfactory, especially for regions with abundant edges or textures. The proposed method yields the best visual quality: it can not only remove the blurring effects and the noise, but also reconstruct more, and sharper, image edges than the other methods. This excellent edge preservation is owed to exploiting the sparse distribution prior of the core tensor.

Gaussian blur. For the 7×7 Gaussian blur with standard deviation 1.6, the superior performance of the proposed CTDL-MGM is demonstrated in Fig. 6. We can see that the proposed method achieves the best results; its performance exceeds that of the other three methods by 1~3 dB in PSNR and 0.01~0.06 in FSIM. Moreover, MID-JIT and MID-IC are comparable in FSIM. To further verify the effectiveness of the proposed method, we present the deblurred island image in the green band in Fig. 7. MID-JIT produces over-smoothed results and eliminates many image details. MID-IC also fails to reconstruct fine image edges and tends to generate some "ghost" artifacts around the edges. The SC-CTDL results have acceptable visual quality. In contrast, the proposed method presents clearer and more detailed images when compared with the ground truth. Overall, the proposed CTDL-MGM method yields the best results in both objective image quality and visual perception. One of the main reasons is the essential difference in the data structure of the MS-RSI patch between the proposed method and the 2D-patch-based methods: in CTDL-MGM, the MS-RSI patch is constructed as a tensor incorporating spatial structure and spectral correlation. Additionally, the proposed method greatly improves the accuracy of the core tensor by exploiting the sparse distribution prior. In contrast, MID-JIT ignores the spectral correlation and restores each spectral image separately; MID-IC focuses on out-of-focus blur but not on other blur types; and SC-CTDL emphasizes exploiting nonlocal similarity and fails to strike a good balance between local variation and nonlocal invariance. Therefore, our proposed method outperforms the other three methods.
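The two synthetic degradations used above are simple to reproduce; a sketch of both kernels (the 9×9 uniform blur and the 7×7 Gaussian blur with σ = 1.6), each normalized to unit sum so the blur preserves the mean intensity, is:

```python
import numpy as np

def uniform_kernel(size=9):
    # size x size uniform (motion-average) blur kernel.
    return np.full((size, size), 1.0 / size**2)

def gaussian_kernel(size=7, sigma=1.6):
    # size x size Gaussian blur kernel, normalized to sum to one.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()
```

Either kernel can then be applied band by band with a 2D convolution before adding the Gaussian white noise of standard deviation √2.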

Conclusions
To further improve the accuracy of the sparse core tensor and the quality of the reconstructed image, in this paper we presented a novel prior distribution for MS-RSI restoration. To incorporate the spatial-structural similarity and the spectral correlation of MS-RSIs into a unified representation, the MS-RSI patch is modeled by a 3rd-order tensor. Considering both the sparsity of the core tensors and the orthogonality of the projection matrices, the core tensors are restored by the sparse core tensor dictionary learning model. To improve the accuracy of the core tensors, a modified Gaussian mixture is introduced into the core tensor dictionary learning model. To this end, the core tensor dictionary learning approach with the modified Gaussian mixture model (CTDL-MGM) is proposed, and an efficient alternating optimization algorithm was presented for solving the CTDL-MGM minimization problem. Our results on Gaussian noise, a uniform blur kernel and a Gaussian blur kernel show that the proposed algorithm produces more accurate image restoration results than several state-of-the-art algorithms. One limitation of the proposed algorithm is the manual selection of the window size for the tensor. One possible extension of this work is to set the radius based on the amount of noise in the target image; making the window size selection adaptive will be a direction of future work. Another limitation is how to determine the number of classes/components automatically for different applications. A traversal strategy is generally used to determine the number of clusters/components based on criteria in mixture models. Addressing such questions is beyond the scope of this paper and is left for future research.