Toward Super-Resolution Image Construction Based on Joint Tensor Decomposition

: In recent years, fusing hyperspectral images (HSIs) and multispectral images (MSIs) to acquire super-resolution images (SRIs) has been in the spotlight and gained tremendous attention. However, some current methods, such as those based on low rank matrix decomposition, also have a fair share of challenges. These algorithms carry out the matrixing process for the original image tensor, which will lose the structure information of the original image. In addition, there is no corresponding theory to prove whether the algorithm can guarantee the accurate restoration of the fused image due to the non-uniqueness of matrix decomposition. Moreover, degenerate operators are usually unknown or difﬁcult to estimate in some practical applications. In this paper, an image fusion method based on joint tensor decomposition (JTF) is proposed, which is more effective and more applicable to the circumstance that degenerate operators are unknown or tough to gauge. Speciﬁcally, in the proposed JTF method, we consider SRI as a three-dimensional tensor and redeﬁne the fusion problem with the decomposition issue of joint tensors. We then formulate the JTF algorithm, and the experimental results certify the superior performance of the proposed method in comparison to the current popular schemes.


Introduction
With the flourishing of both artificial intelligence (AI) and mathematical theory, image fusion has always been the focus and hotspot in neuroscience, metabonomics, remote sensing, and many other fields [1][2][3]. Generally, image fusion refers to synthesizing images' data obtained from the diverse image acquisition equipment. It is aimed at achieving complementary information from different information sources to further acquire clearer, more informative, and higher quality reconstructed images. In 1994, Genderen and Phol proposed a simple and intuitive definition of image fusion [4]: image fusion is merging two or more images into a new image using some algorithms. Therein, hyperspectral images (HSIs), playing a catalytic role in image fusion, have been widely leveraged in geophysical exploration [5], agricultural remote sensing [6], marine remote sensing [7], environmental monitoring [8], and other fields [9][10][11][12][13] because of their rich spectral information. However, the spatial resolution of HSIs is still relatively low, subjected to the imaging equipment of HSIs and the complex imaging environment, which cannot meet the application requirements of mixing, classification, detection, etc., while this further limits the prospect of HSIs. Therefore, how to improve the resolution of hyperspectral images has become the hot issue in the field of image processing.
Specially, panchromatic fusion and hyperspectral-multispectral fusion are two forms of hyperspectral super-resolution image reconstruction. Panchromatic fusion refers to fusing multispectral images (or hyperspectral images) and panchromatic images to obtain images with more spatial information. In [14], the authors classified the pansharpening techniques into component substitution (CS) [15] and multi-resolution analysis (MRA) [16]. Meanwhile, they proposed a hybrid method, combining the better spatial information of CS and the more accurate spectral information of MRA techniques, to improve the spatial resolution while preserving the original spectral information as much as possible.
Concretely, a multispectral image (MSI) generally consists of dozens of bands, and most of them are in the range of the visible region. Given a low-spatial resolution HSI, the operation of spatial resolution enhancement using MSI under the same scene is termed hyperspectral image fusion or super-resolution image (SRI) reconstruction. Generally, the MSI has a higher spatial resolution than the HSI, which is complementary to the HSI. In this paper, we mainly study the method of SRI reconstruction by combining the spectral information of HSI with the spatial information of MSI under the same scene.
Nevertheless, the existing technologies can neither avoid the distortion of image spectral characteristics, nor the complex and time-consuming frequency decomposition and reconstruction. Therefore, Yokoya proposed a simple spectral preservation fusion technique: the smoothing filter based intensity modulation (SFIM) [17], which is also called the generalized Laplace pyramid (GLP) [18][19][20][21] based on the modulation transfer function (MTF) used to fuse HSI and MSI by hypersharpening [22]. Then, utilizing the ratio between the high-resolution image and its low-pass filter (with a smoothing filter) image, spatial details could be modulated into co-registered low-resolution MSIs without changing their spectral characteristics and contrast. Compared with Brovey transform [23], SFIM is an advanced fusion technology to improve the spatial details of MSIs, and its spectral characteristics are reliably preserved.
Beyond that, Eismann introduced a maximum a posteriori estimation method [24]. It combined the stochastic mixing model of the content under the spectral scene and developed a cost function that could optimize the estimation of the hyperspectral scene related to the observed hyperspectral and auxiliary images. Moreover, this method can generally reconstruct sub-pixel information of several main components in SRI estimation. Furthermore, sparse representation has often been employed to deal with various types of image processing problems, especially in the inverse problem. In 2006, Elad denoised the image with the sparse representation method [25]. This not only achieved the state-of-the-art effect, but introduced the K-SVD dictionary training method [26]. In 2010, the authors of [27] proposed a single-frame super-resolution image reconstruction method based on sparse representation.
In contrast, traditional methods, such as principal component substitution and enhancement of least squares estimation, are primarily limited to the first principal component. For remote sensing images, the spectral characteristics of pixels are denoted as endmembers, including mixed endmembers and pure endmembers. Since each pixel has mixed endmembers, unmixing is a technique for estimating the number of pure endmembers in each pixel, the spectral characteristics, and the abundance of the endmembers [28]. The SRI reconstruction method based on unmixing usually decomposes the HSI and the MSI of the same scene. The endmember matrix of the decomposed HSI and the abundance matrix of the MSI are combined to obtain the reconstructed HSI with high spatial resolution.
In the effort of [29], the authors proposed the method of enhancing the spatial resolution of the HSI in terms of unmixing technology: coupled nonnegative matrix factorization (CNMF). They exploited nonnegative matrix factorization to unmix the HSI and MSI sequentially and iteratively obtained the endmember matrix and abundance matrix. Ultimately, the SRI was obtained by combining the two matrices. The nonnegative matrix decomposition usually cannot guarantee the unique solution, although CNMF could achieve good reconstruction results. To address this matter, Eliot Wycoff presented a nonnegative sparse enhancement model for SRI reconstruction [30], which testified that the solution was not unique in the CNMF method, and it had high computational complexity and a high requirement for CPU operation ability. To further boost the effect of super-resolution reconstruction, the authors of [31] came up with a method to resolve the problem of super-resolution and hyperspectral unmixing simultaneously. Unlike the measures in [29], they took advantage of the nearest alternating linear minimization (PALM) [32] to update them simultaneously, while the initialization of the endmember matrix applied SISAL [33] for endmember extraction.
Simultaneously, some researchers studied the fusion of the MSI and HSI based on the tensor [34][35][36][37][38][39][40], mainly considering the natural tensor structure of spectral images, so as to reduce the information lost in matricization and increase the performance. Generally, multi-channel images and other data have their own natural tensor structure. In addition, since the tensor has good expressive ability and computational characteristics, it is very meaningful to study the tensor analysis of images. Moreover, tensor decomposition can preserve the structural characteristics of the original image data. For HSIs, tensor decomposition makes full use of spatial and spectral redundancy between images and compresses and extracts relevant feature information with high quality. Based on HSIs, a nonnegative tensor canonical polyadic decomposition (CP) algorithm was raised that was applied to dispose of the blind source separation [41]. Shashua utilized CP decomposition for image compression and classification [42], while Bauckhage introduced discriminant analysis to high-order data such as color images for classification [43]. Xiao Fu proposed a coupled tensor decomposition framework [44], which could guarantee the identifiability of SRIs under mild and realistic conditions. Meanwhile, Shutao Li and Renwei Dian put forward a coupled sparse tensor factorization (CSTF) [45]; they regarded the SRI as a three-dimensional tensor and redefined the fusion problem as the core tensor and dictionary estimation of three modes. The high-spatial spectral correlation in the SRI was modeled by a regularizer, which could promote the generation of sparse core tensors. However, CSTF is an optimization model based on tensor Tucker decomposition, which is not unique. Moreover, most existing methods assume that known (or easily estimated) degenerate operators are applied to SRI to form the corresponding HSI and MSI, which is practically absent. In this paper, we deal with the super-resolution problem under the condition that the degenerate operators are seldom known and contain noise. A joint tensor decomposition model is proposed by taking advantage of the multi-dimensional tensor structure of the HSI and MSI.
The main content of this paper is to utilize the joint tensor decomposition (JTF) algorithm for the fusion of HSI and MSI, so as to explore the problem of SRI reconstruction. The contributions are listed as follows:

•
In the proposed method, the high-spatial resolution HSI is regarded as a three-dimensional tensor, while the fusion issue is redefined as the joint estimation of the coupling factor matrix, which is also expressed as the joint tensor decomposition problem for the hyperspectral image tensor, multispectral image tensor, and noise regularization term.

•
In order to observe the reconstruction effect of this method, the performance of this algorithm is compared with the five algorithms. Experiments show that the JTF method provides clearer spatial details than the real SRI, while the running time of JTF is acceptable compared with the excellent performance. • Besides, we conduct experiments under the incorrect Gaussian kernel (3 × 3, 5 × 5, 7 × 7), correct Gaussian kernel (9 × 9), and different noises, while showing the fusion effect of the six test methods with the Pavia University data captured by the ROSISsensor as well. The results reveal that the JTF method performs best in comparison with the other methods in terms of reconstruction accuracy regardless of whether the Gaussian kernel is correct and the level of added noise. This indicates that the JTF algorithm is more suitable for degradation operators that are unknown or contain noise.
The outline of this paper is organized as follows. In Section 1, we mainly introduce some basic notations and definitions for tensors. In Section 2, we give a basic overview of tensors. The proposed coupled image fusion algorithms are introduced in Section 3. In Section 4, experimental results on the algorithms are presented. Conclusions and future research directions are given in Section 6.

Definition and Notations
In this section, we first briefly introduce some necessary notions and preliminaries. The general tensor is denoted as X , while the element (i, j, k) of a third-order tensor X is signed by x ijk . The matrix is denoted as X, and the scalar (or the vector) is represented by x. A fiber is defined by fixing every index but one. Third-order tensors have column, row, and tube fibers, denoted by x :jk , x i:k , and x ij: , respectively; see Figure 1. Fibers are always assumed to be column vectors. The mode-n matricization of a tensor X ∈ R I 1 ×I 2 ×···×I N is signed by X (n) , which arranges the mode-n fibers to be the columns of the matrix and can reduce the dimension of the tensor. Definition 1. The Kronecker product of matrices A ∈ R I×J and B ∈ R K×L is defined by Equation (1), which is denoted as A ⊗ B, and the calculation result is a matrix of size IK × JL, i.e., Then, we have a new matrix-matrix production termed as the Khatri-Rao product.

Definition 2.
Let A ∈ R I×K and B ∈ R J×K . Then, the Khatri-Rao product is a matrix of size I J × K defined as: where ⊗ is the Kronecker product.
Next, we discuss some properties of the Khatri-Rao product, which will be useful in our later discussion.
Definition 3. The n-mode (matrix) product of a tensor X ∈ R I 1 ×I 2 ×···×I N with a matrix M ∈ R J×I n is represented as: which can be denoted by X × n M and is a tensor with a size of I 1 × · · · × I n−1 × J × I n+1 × · · · × I N .
Definition 4. The Frobenius norm of a tensor X ∈ R I 1 ×I 2 ×···×I N is represented as:

Tensor Decomposition
The general tensor decomposition models involve CP decomposition and Tucker decomposition. Specifically, the CP decomposition is a special case of the Tucker decomposition. Due to the special structure of tensors, these tensor decomposition methods are leveraged in hyperspectral image processing. For a tensor X ∈ R I 1 ×I 2 ×···×I N , the CP decomposition could be expressed as: where "•" is the outer product of the vectors, R is a positive integer, and A (n) is the factor matrix.
For n = 1, 2, · · · , N, λ ∈ R R , a (n) r ∈ R I n , A (n) ∈ R I n ×R , the factor matrix is a combination of the rank one vector a (n) r and denoted as: Let the three-order tensor X ∈ R I×J×K be a hyperspectral image, the CP decomposition could be formulated as: where I, J, and K are the numbers of the row, column, and spectral dimensions, respectively, while r = 1, 2, · · · , R, λ ∈ R R , a r ∈ R I , b r ∈ R J ,c r ∈ R K . Each column of the above factor matrices A, B, and C is normalized, and λ r is the weight. If there is no requirement to standardize the factor matrix, the CP decomposition can also be reformulated as: where A , B , C mean the general factor matrices, which are constructed by assigning the weight to the factor matrices A, B, C.
The schematic diagram of CP decomposition is shown in Figure 2. If R denotes the minimum number of outer products needed to express X , then the tensor rank is R, i.e., rank(X ) = R, and the decomposition is known as rank decomposition, which is a particular case of CP decomposition. At present, there is no specific method to directly solve the rank of any given tensor, which has been proven to be an NP-hard problem. Through the factor matrix, the CP decomposition of a third-order tensor can be written in expansion form.
A third-order tensor can be denoted as follows by applying mode-n products: The above formula can be expressed in the form of factor matrices: Figure 2. Canonical polyadic (CP) decomposition of third-order tensors . Theorem 1. Correspondingly, we consider how many rank-one tensors (components) of the decomposition of the CP model are added to minimize the error. The usual practice is to start with R = 1 until you encounter a "good" result. Of course, if you have a strong application background and prior information, you can also specify it in advance. For a given number of components, there is still no universal solution for CP decomposition. Specifically, the alternating least squares (ALS) algorithm is a more popular method in the case that the number of components is pre-given [46]. For the CP decomposition of tensors, even if R is much larger than max{i, j, k}, the CP decomposition model is essentially unique. The lower order decomposition of matrices and Tucker decomposition of tensors are generally not unique, which is a significant difference between them. The most famous result about the uniqueness of tensor decomposition is due to Kruskal [47]. One result of the Kruskal criteria is the following statement, which applies to general tensors, which provides the uniqueness proof of the CP decomposition model.
and tensor X ∈ R I×J×K of rank R has a unique decomposition if: where A ∈ R I×R , B ∈ R J×R , and C ∈ R K×R .
The uniqueness condition of the tensor CP decomposition model is relatively relaxed compared with that of the matrix decomposition model. Since the rank of matrix decomposition must be lower than the dimension of the matrix and needs nonnegative, sparse, and geometric conditions, certainly, we can also judge whether the CP decomposition model is unique according to the rank of the given tensor.

Problem Formulation
The purpose of HSI and MSI fusion is to estimate the unobservable SRI (S ∈ R I×J×K ) from the observable low-spatial resolution HSI (H ∈ R i×j×K ) and the high-spatial resolution MSI (M ∈ R I×J×k ), where I(i) and J(j) denote the spatial dimensions and K(k) denotes the number of spectral bands. The tensor H is spatially downsampled with respect to (w.r.t.) S, that is I > i and J > j, while the tensor M is spectrally downsampled w.r.t. S, that is K > k. We assume that the two observed data are obtained under the same atmospheric and illumination conditions and are geometrically combined with radiation correction.

Image Fusion Based on Matrix Decomposition
The fusion method based on matrix factorization assumes that each spectral vector of the target SRI can be written as a small number of linear combinations of different spectral characteristics [48], which can be represented as: where S (3) ∈ R I J×K is the three-mode unfolding matrix of the tensor S. Matrices W ∈ R I J×R and H ∈ R R×K represent the spectral basis and the corresponding coefficient matrix, respectively, where R min{I J, K}. The spatial domain of low-spatial resolution hyperspectral data is degraded from the spatial domain of multispectral data. On the other hand, multispectral data are a form of spectral degradation of high-spatial resolution hyperspectral data. Therefore, H and M are modeled as: where H h = HD H , W m = D M W, H (3) ∈ R ij×K , and M (3) ∈ R I J×k are the three-mode unfolding matrices of the HSI (tensor H) and MSI (tensor M), respectively. D H ∈ R I J×ij is a matrix modeling the point spread function (PSF) and the spatial subsampling process in the hyperspectral sensor. D M ∈ R K×k is a matrix modeling spectral downsampling in the multispectral sensor, whose rows contain the spectral response of the multispectral sensor. Therefore, the matricized HSI and MSI are modeled as: In the matrix decomposition based fusion approaches, if the spectral basis D H and coefficient matrix D M can be estimated by jointly factoring from H (3) and M (3) , the SRI can be restored according to Equation (13), which is the main idea based on matrix decomposition.

Image Fusion Based on Tensor Decomposition
Matrix based methods usually assume that degradation operators are known or easily estimated, but in practice, it is difficult to determine. By comparing the spectral properties of hyperspectral and multispectral sensors, the degradation operator D M can be modeled and estimated relatively easily. However, the spatial operator becomes a bit difficult. A common model assumption from SRI to HSI conversion is a combination of the blurring by a Gaussian kernel and a downsampling process. Of course, this is a rough approximation and may be far from accurate. Even if this assumption is approximately correct, there are still many uncertainties.
In order to solve the non-uniqueness of matrix decomposition and under the condition of little knowledge of degenerate operators and noise, we propose a method based on joint tensor decomposition to fuse the HSI and MSI in this section. Tensor based models have many advantages. For example, it is a very efficient strategy to abstract image data into tensor representation and then input them in the image fusion model. For the output data, we can choose the desired format to save them conveniently. Formally, we represent the SRI as the following equation via CP decomposition: where S ∈ R I×J×K , A ∈ R I×R , B ∈ R J×R , C ∈ R K×R , and R is the the number of components. The HSI is the spatial downsampling version of the SRI. Assuming that the point spread function (PSF) of the hyperspectral sensor is separable from the downsampling matrix of the wide mode and the high mode, we can have: where D 1 ∈ R I×i , D 2 ∈ R J×j are the spatial degradation along the width and height modes, respectively. For subsampling, the separability hypothesis implies that the function of spatial subsampling matrix D H is decoupled from the two spatial patterns of S, and thus, the degenerate operator D H = D 2 ⊗ D 1 in the matricized form. Under the separability assumption, the HSI (H) can be represented as: where A = D 1 A ∈ R I×R , B = D 2 B ∈ R J×R , and C ∈ R k×R . In this paper, we assume that spectral response D M has noise, i.e., rough sampling in the process of conversion from the SRI to the MSI. Formally, we represent it as: where Γ is Gaussian random noise. Analogously, the MSI (M) can be represented as: where D M ∈ R K×k is the downsampling matrix of the spectral mode. We substitute Formula (16) into (20) to obtain: where A ∈ R i×R , B ∈ R j×R , and C = D M C ∈ R K×R . In order to reconstruct the SRI, we need to estimate the factor matrices A, B, C.

The Joint Tensor Decomposition Method
The Joint Tensor Decomposition Method In this section, we consider that when D M contains noise and the spatial degradation operator D H = D 2 ⊗ D 1 is completely unknown, even though this type of operation is called a combination of blurring and downsampling, in practice, hyperparameters such as the blurring kernel type, kernel size, and downsampling offset are barely known. Therefore, the joint tensor decomposition model can be generalized to the following model: We use the following optimization models to obtain factor matrices A, B and C, where β is the regularization parameter. The above optimization problem is non-convex, and the solutions of the factor matrices A, B, and C are not unique. However, the objective function in (22) is convex for each variable block, remaining unchanged with other variables. Therefore, we choose the proximal alternate optimization (PAO) scheme to solve the above optimization problem, which guarantees that the optimization problem converges to the critical point under certain conditions. Then, each step of the iterative update of the factor matrix is reduced to solving an easy-to-handle Sylvester equation by matricization of tensor HSI and MSI. Specifically, the A, B, and C iterations are updated as follows: • Optimization with respect to C: When A, B, A , B , and C are fixed, the optimization w.r.t. C in (22) can be written as: The above optimization problem can be transformed into the following one by using the properties of n-mode matrix unfolding.
where H (3) is the three-mode unfolding matrix of tensors H. The optimization problem (23) is quadratic, and its unique solution is equal to the calculation of the general Sylvester matrix equation.
where E = (B • Optimization with respect to A : When A, B, C, B , and C are fixed, the optimization w.r.t. A in (22) can be written as: The above optimization problem can be transformed into the following one by using the properties of n-mode matrix unfolding.
where H (1) is the one-mode unfolding matrix of tensors H. The optimization problem (26) is convex, and the optimal solution is then given by: According to the property of the Khatri-Rao product pseudo-inverse, we can rewrite the solution as: The advantage of solving the above equation is that we only need to compute the pseudo-inverse matrix of the R × R matrix, but not the jK × R matrix. The solving process of factor matrix B is similar to that of A , and we can rewrite the solution as: • Optimization with respect to C : When A, B, C, A , and B are fixed, the optimization w.r.t. C in (22) can be written as the following one by using the properties of n-mode matrix unfolding.
where M (3) is the three-mode unfolding matrix of tensors M. The optimization problem (30) is quadratic, and its unique solution is equal to the calculation of the general Sylvester matrix equation.
where F = (B T B) * (A T A), and I is the unit matrix of 4 × 4.
We use the Sylvester function in the MATLAB toolbox to solve the above equation.
• Optimization with respect to A: When B, C, A , B , and C are fixed, the optimization w.r.t. A in (22) can be written as: The above optimization problem can be transformed into the following one by using the properties of n-mode matrix unfolding.
where M (1) is the one-mode unfolding matrix of tensors M. The optimization problem (33) is convex, and the optimal solution is then given by: According to the property of the Khatri-Rao product pseudo-inverse, we can rewrite the solution as: Similarly, we only need to compute the pseudo-inverse matrix of the R × R matrix, but not the Jk × R matrix. The solving process of factor matrix B is similar to that of A, and we can rewrite the solution as: For each iteration update, we discuss it in detail. The specific algorithm is shown in Algorithm 1. After obtaining the estimated values of A, B, and C, the super-resolution tensor reconstruction is obtained from the following formula: S ≈ (A, B, C).
The detailed steps of the proposed method are given in Algorithm 1.

Algorithm 1: Algorithm for coupled images.
Initialization: β, R, A 0 , B 0 , C 0 , A 0 , B 0 , C 0 Apply blind STEREO Algorithm [44] with random initializations to obtain A, B, C, A , B , C While not converged, do

Experimental Data
To obtain an MSI from an SRI, we used the spectral specifications of the multispectral sensor, which were taken from the QuickBird sensor in our experiments [49]. The QuickBird sensor produces four-band MSI in the following spectral bands: blue (430-545 nm), green (466-620 nm), red (590-710 nm), and near-infrared (715-918 nm). Then, the spectral response matrix D M is formed by comparing the SRI obtained in the experiment from 400 to 2500 nm with the multi-spectral sensor band, and the MSI image is obtained by assuming that there is random Gaussian noise in D M . More precisely, D M is a selective averaging matrix that acts on the common wavelength of the SRI and MSI, and the experimental data came from [44]. The data selected in this paper were taken from Pavia University in Italy and were captured by the ROSIS sensor. The SRI, HSI, and MSI have sizes of 608 × 336 × 103, 152 × 84 × 103, and 608 × 336 × 4, respectively. Specifically, the MSI is generated by QuickBird simulation, while the HSI is generated by SRI by 9 × 9 Gaussian blur and downsampling, and the MSI is generated for the Pavia University image according to the QuickBird specification. The degradation process from the SRI to the HSI is a combination of spatial blurring of the 9 × 9 Gaussian kernel and the D = 4 factor along two spatial directions to model the blurred image.
On the 3.6 GHz kernel and 8 GB RAM Windows server, the simulation is carried out by MATLAB. According to the algorithm JTF, factors A, B, and C are obtained through the joint tensor decomposition of the MSI and HSI, which mainly solves the least squares problem and preliminarily estimates the potential factors, where the CP decomposition is computed by TensorLab [50]. The maximum number of iterations for tensor decomposition was set to 25 in the initialization, while the number of iteration updates for factor matrix requires continuing numerical simulation. In this paper, we fixed β = 1. We mainly refer to [44]; this paper proved that the performance of super-resolution tensor reconstruction is best when beta is equal to one. The proposed model adds the noise term in the objective function based on [44]. Therefore, we selected a similar parameter.

Evaluation Criterion
In order to evaluate the quality of reconstructed high-spatial resolution HSIs, we introduce several intuitive evaluation indicators. The first index is the reconstruction signal-to-noise ratio (R-SNR) criterion defined as: ). (38) where S k and S k are the frontal slices of reconstructed SRI and ground truth SRI. The higher the R-SNR is, the better the reconstruction quality. The second index is the root mean squared error (RMSE), i.e., where S and S are the reconstructed SRI and ground truth SRI, B is the number of bands of hyperspectral images, and W and H are the spatial dimensions of total spectral images. Low RMSE values indicate good reconstruction performance. The third index is the spectral angle mapper (SAM), which is defined as: where S (3) (n, :) and S (3) (n, :) express respectively the fibers of the reconstructed and the ground-truth SRI. SAM measures the angles between the reconstructed and the ground-truth fibers of the SRI, and a small SAM is equivalent to good performance. The fourth index is the relative dimensionless global error in synthesis (ERGAS), which is represented as: where c = I i = J j is the spatial downsampling factor and µ k is the mean values of the elements in S k . After image reconstruction, we hope to get a smaller ERGAS.
The fifth index is the universal image quality index (UIQI), which is defined as: where: The sixth index is the normalized mean squared error (NMSE), which is represented as: where S 3 and S 3 are the three-mode unfolding matrix of the reconstructed and ground-truth SRI. The smaller the NMSE, the closer the effect of image fusion is to the ground truth image. In addition to the above evaluation indicators, the other simplest performance indicator is the running time of the algorithm. In this paper, the efficiency of several reconstruction algorithms is compared with the computational time.

Selection Of Parameters
In this section, we select different iterations and the number of components of CP decomposition and experimented under these conditions to obtain the best parameter values, so as to evaluate the sensitivity of the JTF algorithm to important parameters in the model. Because the algorithm in this paper is based on the unknown D H , we first consider experimenting under the condition that the algorithm incorrectly assumes a 7 × 7 Gaussian blur kernel instead of using the correct 9 × 9 Gaussian kernel. Certainly, we have also made experimental comparisons under the correct Gaussian kernels.
Considering the JTF algorithm under different signal-to-noise ratios and assuming that the signal-to-noise ratio (SNR) of the HSI and MSI is the same, where the SNR here is set 20 db, to evaluate the effect of the number of iterations Iter on the image fusion in the algorithm, we run the JTF algorithm based on the number of iterations Iter. Because the JTF algorithm is based on the modification of the Blind Stereo algorithm [44], we compare the performance of the algorithms with or without noise of the degenerate operator. Figure 3 shows the evaluation metrics of SRI after the reconstruction of Pavia University with the change of iteration number Iter. In order to reduce the running time of the algorithm, without loss of generality, here R = 100, β = 1, where the black line represents Blind Stereo algorithm performance in matrix D M with noise, the red is the reconstruction results of the JTF algorithm under the same condition, and the blue trend line indicates the fusion performance by the Blind Stereo algorithm when D M does not contain noise.
As can be seen from Figure 3, when Iter changes from one to 20, the R-SNR of Pavia University decreases, while the values of NMSE, ERGAS, and SAM increase. Among them, R-SNR declines sharply and then rises, while the other evaluation metrics show the opposite trend. When the number of iterations is less than five, the reconstruction effect is better. Therefore, the maximum number of iterations of the JTF algorithm is set between one and five. In addition, the reconstruction effect of this algorithm is always superior to the Blind stereofusion algorithm with noise.
Then, we change the number of components from R = 50 to R = 600 to observe the effect of the number of tensor decomposition components on the image fusion, which depicts different evaluation metrics of the recovered HSIs for Pavia University; see Figure 4.   Figure 4 shows the effect of image fusion in three cases with a different number of components. When D H is unknown and D M contains noise, we compare the above three cases. From the six evaluation metrics, it can be seen that when the number of components is less than 100, the performance of the proposed algorithm is almost the same as the case that D M is clean. It completely achieves the denoising effect and is always better than the Blind Stereo algorithm in the same situation. With the increase of the number of components, all three cases show good performance. However, when the number of components increases to more than 300, the reconstruction effect has a downward trend. According to the above Theorem [47], this is because when the number of components does not satisfy Theorem 1, the algorithm cannot guarantee the uniqueness of CP decomposition, which affects the initialization of the initial factor matrix in the algorithm, resulting in the poor performance of the algorithm.
Then, consider the selection range of the number of components under the condition of the uniqueness of tensor decomposition. As the JTF algorithm only decomposes MSI by CP, we set I = 608, J = 336, K = 4 in Theorem 1. According to the dimension of MSI, we can divide the selection of parameter R into the following four cases: (1) When R < K = 4, bring R into Proposition 1, i.e., By synthesizing the formulas and conditions, we can get the range of R in the first case, which is 1 ≤ R < 4.
(2) When 4 = K ≤ R < J = 336, bring R into Proposition 1, i.e., The above derivation is obviously valid, so we only consider the conditions, and we can get the range of R in the second case, which is 4 ≤ R < 336.
(3) When J = 336 ≤ R < I = 608, bring R into Proposition 1, i.e., By synthesizing the formulas and conditions, we can get the range of R in the third case, which is 336 ≤ R < 338.
(4) When R ≥ 608 = I, bring R into Proposition 1, i.e., As such, we can get the range of R in the fourth case, which is R < 475 and R ≥ 608 by synthesizing the formulas and conditions. Therefore, we can conclude that there is a contradiction between the deduced range of R and the range of conditions, so this situation does not exist.
To sum up, combining the above four cases, in order to guarantee the uniqueness of CP decomposition, the range of the number of components is 1 ≤ R ≤ 338. In this paper, according to the fusion effect of the algorithm while ensuring the uniqueness of CP decomposition, we fixed R = 275.

Experimental Results
To further investigate the performance of the method, we conduct experiments under the incorrect Gaussian kernel (3 × 3, 5 × 5, 7 × 7) and correct Gaussian kernel (9 × 9) and show the fusion effect of the six test methods on Pavia University. Table 1 shows the R-SNR, NMSE, RMSE, ERGAS, SAM, and UIQI of the HSI recovered from Pavia University, and we present the best of the six algorithms in bold. As can be seen from the table, in the case of incorrect estimation of the Gaussian kernel, the fusion effect of other algorithms excluding JTF is worse than that of the correct Gaussian kernel. The closer these five algorithms are to the correct Gaussian kernel, the better the results will be. Nevertheless, the JTF method performs best in the comparison of the methods in terms of reconstruction accuracy whether the Gaussian kernel is correctly estimated or not. Overall, the JTF and CNMF methods are very effective in the reconstruction of Pavia University. On the contrary, the JTF algorithm proposed does not degrade the image reconstruction effect due to the incorrect estimation of the Gaussian kernel. More specifically, the property of the proposed algorithm is greater under the hypothetical Gaussian kernel, which also proves that the JTF algorithm has more generalization significance and application prospects.  Figure 5 reveals the fusion experimental results for Pavia University under the incorrect Gaussian kernel (7 × 7), which contains the 50th and 100th bands' fused images and the corresponding error images reconstructed by the six algorithms, were Line 1 and Line 2 in Figure 5 denote the fused HSIs of the 50th band and the corresponding error HSIs of each method, respectively. Moreover, Figure 5g shows the reference HSIs, while the third and forth rows show the reconstructed images for the 100th band and corresponding error images, respectively. Except for the last column, each column in Figure 5 shows the experimental results corresponding to each method. The error image reflects the difference between the fusion result and the ground truth. As depicted in Figure 5, this paper uses the red box to show the more obvious areas in order to compare the difference of error images of different algorithms clearly. By visualized comparison of the fused HSIs with the reference HSIs, the fusion result of the Blind STEREO method shows slight spectral distortion on the top of the building, while the MAPSMM method generates fuzzy spatial details in some areas, and the spatial information of the fused image is well enhanced by the CNMF method. A closer inspection reveals that the spectral and spatial differences of fused HSIs obtained by the six methods are not obvious. Therefore, in order to further compare the performance of each fusion method, the second and fourth lines of Figure 5 show the error images of the six methods under two spectral bands. The error image is the difference (absolute value) between the fused HSI and the reference HSI pixel value. We magnify the data element value in the error image by 10 times, so that we can inspect it more carefully. It can be seen that the Blind STEREO, SFIM, MTF-GLP, and MAPSMM methods have large differences, while the CNMF method generates relatively smaller differences, and the JTF method has the smallest differences in most regions, indicating that this method has good fusion ability and provides clearer spatial details than the other five algorithms. Similar to the previous experiments, Figure 6 shows the fusion experimental results for Pavia University under the correct Gaussian kernel (9 × 9), which contain the 50th and 100th bands' fused images and the corresponding error images reconstructed by the six algorithms. Figure 6 shows the fused HSIs of the 50th band and the corresponding error HSIs of each method, which are displayed in Lines 1-2. Moreover, Figure 6g shows the reference HSIs, while the third and forth rows show the reconstructed images for the 100th band and corresponding error images, respectively. Similarly, in order to compare the difference of the error images of different algorithms clearly, the data element values in the error image are magnified 10 times, and the red box is applied to display the region with obvious errors. The spectral distortion caused by the Blind STEREO method is very obvious and is affected by the Gaussian kernel changes, as shown in Figure 6b. Compared with the Blind STEREO method, other methods can effectively improve the spatial performance while maintaining the spectral information, and the difference between the fused images is not significant. Therefore, in order to further verify the fusion performance of the proposed method, the second and fourth lines of Figure 6 show the error images corresponding to different methods, respectively. It can be seen that the error image obtained by the JTF method is the lowest in most regions, and the fusion effect is not affected by the Gaussian kernel, which indicates that the JTF method has a superior image reconstruction effect and is more robust. Overall, the JTF method has better reconstruction performance and clearer fusion effects than the other five algorithms.

Experimental Results of the Noisy Case
In practice, there exits additive noise in the hyperspectral and multispectral imaging processes. Therefore, to test the robustness of the proposed JTF method to the noise, we firstly simulate the tensor images M and H in the same way as the previous experiments for Pavia University and then add Gaussian noise to the HSI and MSI. Because the noise level in the HSI is often higher than that of the MSI, we fix the SNR added to the HSI to be 20db and compare the evaluation indicators with the traditional five classical models with the change of noise added to MSI. Figure 7 presents the quality metric values of the noisy cases on Pavia University. It can be seen that from the reconstruction performance of the six fusion algorithms emerges a trend of enhancement with the increase of MSI image noise. Although the fusion effect of CNMF is closer to that of the JTF algorithm when the noise is high, the JTF method is still better than other test methods in the case of noise as a whole.

Analysis of Computational Costs
In this section, experiments are carried out on six classical methods to demonstrate the computational efficiency of the proposed method, which are accomplished with MATLAB R2016b on a PC with Intel Core i7-7500 CPU and 8 GB RAM. The mean time (in terms of seconds) of all comparison methods is shown as Table 2.
As can be seen from Table 2, the method based on the filter fusion (SFIM) does not need to calculate the optimal factor matrix of each mode, and its running time is shorter than the method based on tensor samples. For tensor based methods, since the iterative strategy is used to obtain the optimal solution of each unknown factor, the time of the two methods (JTF, Blind STEREO) is almost the same. MTF-GLP runs between the first two classes of methods, while CNMF and MAPSMM have long running times. Compared with the excellent performance, the running time of JTF is acceptable. Analyses were conducted based on various noises under different Gaussian kernels to further observe the performance of different algorithms. The results indicate that the running time of most algorithms is shorter under the premise of correct estimation of Gaussian kernels. However, there is little difference with the run time of the JTF algorithm under unknown Gaussian kernels, and the running time has little relation with the magnitude of additive noise, which indirectly proves that our algorithm is more robust.

Conclusions
In this paper, a joint tensor decomposition method was proposed to fuse hyperspectral and multispectral images to address the hyperspectral super-resolution issue. The JTF algorithm regards the fusion problem as the joint tensor decomposition, which not only ensures the non-uniqueness of decomposition, but is applicable to the circumstance that degenerate operators are unknown or tough to gauge. In order to observe the reconstruction effect of this method, we compare the performance of the proposed algorithm with that of the five algorithms. Experiments show that the proposed algorithm has great performance advantages and certain simulation prospects. For our future work, we would concentrate on the novel scenario that adds the non-negative constraints for the joint tensor decomposition of super-resolution images.
Author Contributions: Methodology, X.R. and L.L.; resources, J.C.; validation, X.R.; writing, original draft, X.R. and L.L.; writing, review and editing, L.L. All authors read and agreed to the published version of the manuscript.

Funding:
The authors would like to thank the Editors and anonymous Reviewers. This work was partially supported by the National Natural Science Foundation of China under No.51877144.