Sparse Representation of 3D Images for Piecewise Dimensionality Reduction with High Quality Reconstruction

Sparse representation of 3D images is considered within the context of data reduction. The goal is to produce high quality approximations of 3D images using fewer elementary components than the number of intensity points in the 3D array. This is achieved by means of a highly redundant dictionary and a dedicated pursuit strategy especially designed for low memory requirements. The benefit of the proposed framework is illustrated in the first instance by demonstrating the gain in dimensionality reduction obtained when approximating true color images as very thin 3D arrays, instead of performing an independent channel by channel approximation. The full power of the approach is further exemplified by producing high quality approximations of hyper-spectral images with a reduction of up to 371 times the number of data points in the representation.

While the sparse representation of 3D arrays has received less attention than its 2D counterpart, the advantage of modeling these arrays as a superposition of 3D elementary components has been recognized in previous publications [13][14][15][16].
At present, the most widely used multichannel images in everyday life are true color images. The simplest way of sparsely representing these images is channel by channel, or by adding correlation constraints across the color channels [4,17]. However, as demonstrated in this work, sparsity in the representation of true color images can increase substantially if the approximation is realized by means of 3D elements taken from a highly redundant dictionary. The effect is of course more pronounced for arrays involving more channels, such as hyper-spectral images.
From a practical viewpoint, the current drawbacks of 3D sparse modeling using a large dictionary are (i) storage requirements and (ii) the complexity of the concomitant calculations.
In this paper we propose a method which, by addressing (i), leaves room for possible high performance implementations using Graphics Processing Unit (GPU) programming. While the approach is illustrated using Central Processing Unit (CPU) programming, the storage requirements are shown to fit within the 48 KB of fast access shared memory of a GPU when the approximation of a 3D image is realized with a partition block size of 8 × 8 × 8 and with a separable dictionary of redundancy 125.
The main contributions of the paper are listed below.
• The low memory implementation of the Orthogonal Matching Pursuit (OMP) strategy, called Self Projected Matching Pursuit (SPMP) [18], is dedicated to operating in 3D (SPMP3D) with separable dictionaries. This technique delivers an iterative solution to the 3D least squares problem which requires much less storage than direct linear algebra methods. It could therefore also be applied with any other of the pursuit strategies that include a least squares step [14,19,20,21,22].
• The C++ MEX file for the SPMP3D method has been made available on a dedicated website [23]. All the scripts for reproducing the results of the paper in the MATLAB environment have also been placed on that website.
• Remarkable reduction in the dimensionality of the representation of true color images and hyper-spectral images, with high quality reconstruction, is demonstrated using highly redundant and highly coherent separable dictionaries.
The results suggest that the method may be of assistance to image processing applications which rely on a transformation for data reduction as a first step of further processing. For examples of relevant applications we refer to [24][25][26][27][28].

Notational Convention
$\mathbb{R}$ represents the set of real numbers. Boldface letters are used to indicate Euclidean vectors, 2D and 3D arrays. Standard mathematical fonts indicate components, e.g., $\mathbf{d} \in \mathbb{R}^N$ is a vector of components $d(i) \in \mathbb{R},\, i = 1, \ldots, N$. The elements of a 3D array $\mathbf{I} \in \mathbb{R}^{N_x \times N_y \times N_z}$ are indicated as $I(i, j, m),\, i = 1, \ldots, N_x,\, j = 1, \ldots, N_y,\, m = 1, \ldots, N_z$. Moreover, for each $m$-value, $\mathbf{I}_m \in \mathbb{R}^{N_x \times N_y}$ stands for the 2D array of elements $I_m(i, j) = I(i, j, m),\, i = 1, \ldots, N_x,\, j = 1, \ldots, N_y$, which, when not leaving room for ambiguity, will also be represented as $\mathbf{I}(:, :, m)$. The transpose of a matrix, $G$ say, is indicated as $G^\top$.
The inner product between 3D arrays, say $\mathbf{I} \in \mathbb{R}^{N_x \times N_y \times N_z}$ and $\mathbf{G} \in \mathbb{R}^{N_x \times N_y \times N_z}$, is given as
$$\langle \mathbf{G}, \mathbf{I} \rangle = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \sum_{m=1}^{N_z} G(i, j, m)\, I(i, j, m).$$
For $\mathbf{G} \in \mathbb{R}^{N_x \times N_y \times N_z}$ with tensor product structure, i.e. for $\mathbf{G} = \mathbf{g}^x \otimes \mathbf{g}^y \otimes \mathbf{g}^z$, with $\mathbf{g}^x \in \mathbb{R}^{N_x}$, $\mathbf{g}^y \in \mathbb{R}^{N_y}$ and $\mathbf{g}^z \in \mathbb{R}^{N_z}$, we further have
$$\langle \mathbf{G}, \mathbf{I} \rangle = \langle \mathbf{p}, \mathbf{g}^z \rangle,$$
where for each value of $m$ the vector $\mathbf{I}_m \mathbf{g}^y \in \mathbb{R}^{N_x}$ arises by the standard matrix-vector multiplication rule and $\mathbf{p} \in \mathbb{R}^{N_z}$ is given by its components $p(m) = \langle \mathbf{g}^x, \mathbf{I}_m \mathbf{g}^y \rangle,\, m = 1, \ldots, N_z$. Note that $\langle \mathbf{p}, \mathbf{g}^z \rangle$ indicates the Euclidean inner product in 1D, i.e. $\langle \mathbf{p}, \mathbf{g}^z \rangle = \sum_{m=1}^{N_z} p(m)\, g^z(m)$.
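As a quick numerical illustration of the identity above, the following sketch (Python/NumPy, with arbitrary example dimensions) checks that the separable evaluation of the inner product agrees with the direct triple sum:

```python
import numpy as np

# Hypothetical sizes; any positive dimensions work.
Nx, Ny, Nz = 4, 5, 6
rng = np.random.default_rng(0)
I = rng.standard_normal((Nx, Ny, Nz))
gx = rng.standard_normal(Nx)
gy = rng.standard_normal(Ny)
gz = rng.standard_normal(Nz)

# Direct evaluation: build G = gx (x) gy (x) gz and sum G(i,j,m) I(i,j,m).
G = np.einsum('i,j,m->ijm', gx, gy, gz)
direct = np.sum(G * I)

# Separable evaluation: p(m) = <gx, I_m gy>, then <p, gz>.
p = np.array([gx @ (I[:, :, m] @ gy) for m in range(Nz)])
separable = p @ gz

assert np.allclose(direct, separable)
```

The separable form never materializes the $N_x N_y N_z$ array $\mathbf{G}$, which is the key to the low memory requirements discussed below.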

Sparse Representation of Multi-channel Images
Suppose that a 3D image, given as an array $\mathbf{I} \in \mathbb{R}^{N_x \times N_y \times N_z}$ of intensity pixels, is to be approximated by the linear decomposition
$$\mathbf{I}^k = \sum_{n=1}^{k} c(n)\, \mathbf{D}_n, \quad (2)$$
where each $c(n)$ is a scalar and each $\mathbf{D}_n$ is an element of $\mathbb{R}^{N_x \times N_y \times N_z}$ to be selected from a set, $\mathcal{D}$, called a 'dictionary'. A sparse approximation of $\mathbf{I} \in \mathbb{R}^{N_x \times N_y \times N_z}$ is an approximation of the form (2) such that the number $k$ of elements in the decomposition is significantly smaller than $N = N_x N_y N_z$. The terms in the decomposition (2) are taken from a large redundant dictionary, from which the elements $\mathbf{D}_n$ in (2), called 'atoms', are chosen according to an optimality criterion.
Within the redundant dictionary framework for approximation, the problem of finding the sparsest decomposition of a given multi-channel image can be formulated as follows: given an image and a dictionary, approximate the image by the 'atomic decomposition' (2) such that the number $k$ of atoms is minimum. Unfortunately, the numerical minimization of the number of terms needed to produce an approximation up to a desired error involves a combinatorial problem requiring exhaustive search; hence, the exact solution is intractable. Consequently, instead of looking for the sparsest solution, one looks for a 'satisfactory solution', i.e., a solution such that the number $k$ of terms in (2) is considerably smaller than the image dimension. For 2D images this can be effectively achieved by greedy pursuit strategies in the line of the Matching Pursuit (MP) [29] and OMP [30] methods, when dedicated to 2D separable dictionaries [14,18,32,33]. Within a tensor product framework, the consideration of OMP in 3D is natural.
Let us assume that a 3D dictionary is obtained as the tensor product
$$\mathcal{D} = \mathcal{D}^x \otimes \mathcal{D}^y \otimes \mathcal{D}^z.$$
For computational purposes the 1D dictionaries are stored as three matrices $D^x \in \mathbb{R}^{N_x \times M_x}$, $D^y \in \mathbb{R}^{N_y \times M_y}$ and $D^z \in \mathbb{R}^{N_z \times M_z}$. Suppose now that a 3D array $\mathbf{I} \in \mathbb{R}^{N_x \times N_y \times N_z}$ is to be approximated by an atomic decomposition of the form
$$\mathbf{I}^k = \sum_{n=1}^{k} c(n)\, \mathbf{d}^x_{\ell^x_n} \otimes \mathbf{d}^y_{\ell^y_n} \otimes \mathbf{d}^z_{\ell^z_n}, \quad (3)$$
where for $n = 1, \ldots, k$ the atoms $\mathbf{d}^x_{\ell^x_n}$, $\mathbf{d}^y_{\ell^y_n}$ and $\mathbf{d}^z_{\ell^z_n}$ are columns of the matrices $D^x$, $D^y$ and $D^z$, selected one triple of indices at a time through the criterion
$$(\ell^x_k, \ell^y_k, \ell^z_k) = \arg\max_{i, j, m} \left| \langle \mathbf{d}^x_i \otimes \mathbf{d}^y_j \otimes \mathbf{d}^z_m, \mathbf{R}^{k-1} \rangle \right|, \quad (4)$$
with $\mathbf{R}^{k-1} = \mathbf{I} - \mathbf{I}^{k-1}$. It is the determination of the coefficients $c(n),\, n = 1, \ldots, k$ in (3) that gives rise to pursuit strategies which go under different names.

Matching Pursuit in 3D (MP3D)
The MP approach in 3D would simply calculate the coefficients in (3) as
$$c(k) = \langle \mathbf{d}^x_{\ell^x_k} \otimes \mathbf{d}^y_{\ell^y_k} \otimes \mathbf{d}^z_{\ell^z_k}, \mathbf{R}^{k-1} \rangle. \quad (5)$$
The main drawback of the MP method is that it may select linearly dependent atoms. Moreover, the approximation is not stepwise optimal, because at iteration $k$ the coefficients (5) do not minimize the norm of the residual error. The pursuit strategy that overcomes these limitations is the so-called OMP [30].
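The MP3D iteration can be sketched compactly as below (Python/NumPy; the function name is illustrative and unit-norm 1D atoms are assumed). The correlations with every separable atom are computed through the 1D dictionary matrices, the maximizing triple is selected as in (4), and the residual is updated with the coefficient (5):

```python
import numpy as np

def mp3d(I, Dx, Dy, Dz, k_max):
    """Sketch of Matching Pursuit with a separable 3D dictionary.
    Dx, Dy, Dz hold unit-norm 1D atoms as columns."""
    R = I.copy()
    atoms, coeffs = [], []
    for _ in range(k_max):
        # Correlations with every separable atom, computed via the 1D pieces.
        C = np.einsum('ia,jb,mc,ijm->abc', Dx, Dy, Dz, R)
        a, b, c = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        coef = C[a, b, c]                      # coefficient as in (5)
        atom = np.einsum('i,j,m->ijm', Dx[:, a], Dy[:, b], Dz[:, c])
        R -= coef * atom                       # residual update
        atoms.append((a, b, c))
        coeffs.append(coef)
    return atoms, coeffs, R
```

With orthonormal 1D dictionaries (e.g., identity matrices) the sketch reproduces the image exactly after $N$ steps; with redundant coherent dictionaries it only illustrates the selection rule, not the stepwise optimal coefficients that OMP provides.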

Orthogonal Matching Pursuit in 3D
The implementation of OMP in 3D (OMP3D) we describe here is the 3D extension of the implementation of OMP in 2D given in [32]. An alternative algorithm called Kronecker-OMP, which is based on the Tucker representation of a tensor, is discussed in [14]. Our algorithm is based on adaptive biorthogonalization and Gram-Schmidt orthogonalization procedures, as proposed in [31] for the one dimensional case.
In order to ensure that the coefficients $c(n),\, n = 1, \ldots, k$ involved in (3) are such that $\|\mathbf{R}^k\|^2_{3D} = \langle \mathbf{R}^k, \mathbf{R}^k \rangle$ is minimum, the decomposition (3) should fulfill $\langle \mathbf{R}^k, \mathbf{d}^x_{\ell^x_n} \otimes \mathbf{d}^y_{\ell^y_n} \otimes \mathbf{d}^z_{\ell^z_n} \rangle = 0,\, n = 1, \ldots, k$. This is ensured by requiring that $\mathbf{R}^k = \mathbf{I} - \hat{P}_{V_k} \mathbf{I}$, where $\hat{P}_{V_k}$ is the orthogonal projection operator onto $V_k$ such that i) $V_k = \mathrm{span}\{\mathbf{d}^x_{\ell^x_n} \otimes \mathbf{d}^y_{\ell^y_n} \otimes \mathbf{d}^z_{\ell^z_n}\}_{n=1}^{k}$ and ii) the projection is realized with the aid of a reciprocal set $\{\mathbf{B}^k_n\}_{n=1}^{k}$ also spanning the subspace, $V_k = \mathrm{span}\{\mathbf{B}^k_n\}_{n=1}^{k}$.
The required arrays $\mathbf{B}^k_n,\, n = 1, \ldots, k$ should be upgraded and updated to account for each newly selected atom. Starting from $k = 1$, $\mathbf{R}^0 = \mathbf{I}$ and $\mathbf{B}^1_1 = \mathbf{W}_1 = \mathbf{d}^x_{\ell^x_1} \otimes \mathbf{d}^y_{\ell^y_1} \otimes \mathbf{d}^z_{\ell^z_1}$, at each subsequent iteration the atoms $\mathbf{D}_{k+1} = \mathbf{d}^x_{\ell^x_{k+1}} \otimes \mathbf{d}^y_{\ell^y_{k+1}} \otimes \mathbf{d}^z_{\ell^z_{k+1}}$ are selected as in (4). The required reciprocal set $\mathbf{B}^{k+1}_n,\, n = 1, \ldots, k+1$ is adapted and upgraded by extending the recursion formula given in [31] as follows:
$$\mathbf{B}^{k+1}_n = \mathbf{B}^k_n - \mathbf{B}^{k+1}_{k+1} \langle \mathbf{D}_{k+1}, \mathbf{B}^k_n \rangle, \quad n = 1, \ldots, k, \qquad \text{with} \qquad \mathbf{B}^{k+1}_{k+1} = \frac{\mathbf{W}_{k+1}}{\|\mathbf{W}_{k+1}\|^2}, \qquad \mathbf{W}_{k+1} = \mathbf{D}_{k+1} - \sum_{n=1}^{k} \frac{\langle \mathbf{W}_n, \mathbf{D}_{k+1} \rangle}{\|\mathbf{W}_n\|^2}\, \mathbf{W}_n,$$
including, for numerical accuracy, the re-orthogonalization step
$$\mathbf{W}_{k+1} \leftarrow \mathbf{W}_{k+1} - \sum_{n=1}^{k} \frac{\langle \mathbf{W}_n, \mathbf{W}_{k+1} \rangle}{\|\mathbf{W}_n\|^2}\, \mathbf{W}_n.$$
Although the image approximation is carried out by partitioning the images into relatively small 3D blocks, the memory requirements of the OMP3D method are high. Indeed, the above are $2(k+1)$ nonseparable arrays, each of dimension $N = N_x N_y N_z$, which need to be stored in double precision. Hence, we consider next a low memory implementation of the orthogonal projection step, which avoids having to store the arrays $\mathbf{W}_n,\, n = 1, \ldots, k$ and $\mathbf{B}^k_n,\, n = 1, \ldots, k$ and fully exploits the separability of the dictionary.
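To make the stepwise optimality of OMP3D concrete, the sketch below realizes the orthogonal projection by an explicit least squares solve on the flattened selected atoms. This is not the biorthogonalization-based implementation described above; it is a deliberately memory-hungry stand-in (function name illustrative, unit-norm 1D atoms assumed) that shows what storing nonseparable arrays of dimension $N$ entails:

```python
import numpy as np

def omp3d_lstsq(I, Dx, Dy, Dz, k_max):
    """Sketch of OMP with a separable 3D dictionary, realizing the
    orthogonal projection R^k = I - P_{V_k} I by an explicit least
    squares solve over the flattened selected atoms."""
    R = I.copy()
    A = []                                     # selected atoms, flattened
    for _ in range(k_max):
        C = np.einsum('ia,jb,mc,ijm->abc', Dx, Dy, Dz, R)
        a, b, c = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        atom = np.einsum('i,j,m->ijm', Dx[:, a], Dy[:, b], Dz[:, c])
        A.append(atom.ravel())
        M = np.stack(A, axis=1)                # N x k matrix of atoms
        coeffs, *_ = np.linalg.lstsq(M, I.ravel(), rcond=None)
        R = I - (M @ coeffs).reshape(I.shape)  # stepwise optimal residual
    return coeffs, R
```

Each iteration stores $k$ nonseparable columns of length $N$ and re-solves the normal equations, which is exactly the cost profile the SPMP3D method of the next section is designed to avoid.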

Self Projected Matching Pursuit in 3D (SPMP3D)
The Self Projected Matching Pursuit (SPMP) methodology was introduced in [18] and conceived to be used with separable dictionaries in 2D (SPMP2D). Because the technique is based on calculations of inner products, it can be easily extended to operate in 3D (SPMP3D).
Suppose that at iteration $k$ the selection process has chosen the atoms labeled by the triples of indices $\{(\ell^x_n, \ell^y_n, \ell^z_n)\}_{n=1}^{k}$, and let $\tilde{\mathbf{I}}^k$ be the atomic decomposition
$$\tilde{\mathbf{I}}^k = \sum_{n=1}^{k} a(n)\, \mathbf{d}^x_{\ell^x_n} \otimes \mathbf{d}^y_{\ell^y_n} \otimes \mathbf{d}^z_{\ell^z_n},$$
where the coefficients $a(n),\, n = 1, \ldots, k$ are arbitrary numbers. Every array $\mathbf{I} \in \mathbb{R}^{N_x \times N_y \times N_z}$ can be expressed as
$$\mathbf{I} = \tilde{\mathbf{I}}^k + \tilde{\mathbf{R}}.$$
For $\tilde{\mathbf{I}}^k$ to be the optimal representation of $\mathbf{I}$ in $V_k$, in the sense of minimizing the norm of the residual $\tilde{\mathbf{R}}$, it should be true that $\hat{P}_{V_k} \tilde{\mathbf{R}} = 0$. The SPMP3D method fulfills this property by approximating $\tilde{\mathbf{R}}$ in $V_k$ via the MP method, and subtracting that component from $\tilde{\mathbf{R}}$. The following algorithm describes the whole procedure. Starting from $k = 0$ and $\mathbf{R}^0 = \mathbf{I}$, at each iteration implement the steps below.
i) Increase $k \leftarrow k + 1$ and apply the criterion (4) for selecting the triple of indices $(\ell^x_k, \ell^y_k, \ell^z_k)$. Save this triple in the array $L(k, 1{:}3) = (\ell^x_k, \ell^y_k, \ell^z_k)$, set
$$c(k) = \langle \mathbf{d}^x_{\ell^x_k} \otimes \mathbf{d}^y_{\ell^y_k} \otimes \mathbf{d}^z_{\ell^z_k}, \mathbf{R}^{k-1} \rangle,$$
and implement the update of the residue
$$\mathbf{R}^k = \mathbf{R}^{k-1} - c(k)\, \mathbf{d}^x_{\ell^x_k} \otimes \mathbf{d}^y_{\ell^y_k} \otimes \mathbf{d}^z_{\ell^z_k}.$$
ii) Given the indices $L(n, 1{:}3) = (\ell^x_n, \ell^y_n, \ell^z_n),\, n = 1, \ldots, k$ of the previously selected atoms, and a tolerance $\epsilon$ for the projection error, realize the orthogonal projection up to that error as follows. Set $j = 1$, $\tilde{\mathbf{R}}^0 = \mathbf{R}^k$, and at iteration $j$ apply the steps a)–c) below.
a) For $n = 1, \ldots, k$ evaluate
$$\alpha(n) = \langle \mathbf{d}^x_{\ell^x_n} \otimes \mathbf{d}^y_{\ell^y_n} \otimes \mathbf{d}^z_{\ell^z_n}, \tilde{\mathbf{R}}^{j-1} \rangle \quad (10)$$
and single out the value $k^\ast$ such that $|\alpha(k^\ast)| = \max_{n=1,\ldots,k} |\alpha(n)|$. The value $k^\ast$ identifies the indices $\ell^x_{k^\ast}, \ell^y_{k^\ast}, \ell^z_{k^\ast}$ corresponding to the already selected atom with maximum correlation with the residual $\tilde{\mathbf{R}}^{j-1}$.
b) If $|\alpha(k^\ast)| < \epsilon$, stop; otherwise update the coefficient, the approximation and the residual:
$$c(k^\ast) \leftarrow c(k^\ast) + \alpha(k^\ast), \qquad \tilde{\mathbf{R}}^{j} = \tilde{\mathbf{R}}^{j-1} - \alpha(k^\ast)\, \mathbf{d}^x_{\ell^x_{k^\ast}} \otimes \mathbf{d}^y_{\ell^y_{k^\ast}} \otimes \mathbf{d}^z_{\ell^z_{k^\ast}}.$$
This step subtracts from the residual a component in $V_k$ and adds that component to the approximation $\tilde{\mathbf{I}}^k$.
c) Increase $j \leftarrow j + 1$ and repeat steps a)–c) to keep subtracting components of $\tilde{\mathbf{R}}^j$ in $V_k$ until the iteration, $J$ say, for which the stopping criterion in b) is met. This criterion indicates that, up to the tolerance $\epsilon$, the residual has no component in $V_k$, so that one can set $\mathbf{R}^k = \tilde{\mathbf{R}}^J$. Continue with steps i) and ii) to keep enlarging $V_k$ until, for a required error tolerance $\rho$, the condition $\|\mathbf{R}^k\|_{3D} < \rho$ is reached.
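The steps i) and ii) above can be sketched as follows (Python/NumPy; the function and parameter names are illustrative, unit-norm 1D atoms are assumed, and `max_atoms`/`max_proj` are safety caps not present in the paper's formulation):

```python
import numpy as np

def spmp3d(I, Dx, Dy, Dz, rho, eps=1e-9, max_atoms=50, max_proj=200):
    """Sketch of SPMP3D: MP selection (step i) followed by an MP-based
    self-projection onto the span of the selected atoms (step ii)."""
    def atom(a, b, c):
        return np.einsum('i,j,m->ijm', Dx[:, a], Dy[:, b], Dz[:, c])
    R = I.copy()
    sel, coeffs = [], []
    while np.linalg.norm(R) >= rho and len(sel) < max_atoms:
        # i) select a new atom by maximal correlation, criterion (4)
        C = np.einsum('ia,jb,mc,ijm->abc', Dx, Dy, Dz, R)
        idx = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        sel.append(idx)
        coeffs.append(C[idx])
        R = R - C[idx] * atom(*idx)
        # ii) self-projection: remove from R its component in the span
        # of the selected atoms, one MP step at a time
        for _ in range(max_proj):
            alphas = np.array([np.sum(atom(*t) * R) for t in sel])
            kstar = int(np.argmax(np.abs(alphas)))
            if np.abs(alphas[kstar]) < eps:
                break                      # stopping criterion b)
            coeffs[kstar] += alphas[kstar]
            R = R - alphas[kstar] * atom(*sel[kstar])
    return sel, coeffs, R
```

Note that only the index list and the coefficients are stored; every atom is rebuilt on demand from the 1D dictionaries, which is the feature that keeps the memory footprint small.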
Remark 1: For each fixed value of $k$, the rate of convergence through the steps a)–c) above is given in [34] for the one dimensional case. The proof for 3D is identical to that proof, because a 3D array can be represented as a long 1D vector. What varies is the implementation: a vectorized version of the algorithm would not be applicable in this context.

Implementation details
The bulk of the computational burden in the SPMP3D method lies in the realization of the selection of atoms (4). Algorithm 1 outlines a procedure implementing the process. It should be stressed once again that the algorithm is designed to use as little memory as possible, rather than to reduce complexity. At iteration $k$ the outputs of Algorithm 1 are saved as $c(k) = \alpha$ and the corresponding triple of indices $L(k, 1{:}3)$.

Algorithm 1: Implementation of the selection of atoms (c.f. (4)). Input: 3D array $\mathbf{R}$ and matrices $D^x$, $D^y$, $D^z$, the columns of which are the atoms of the corresponding 1D dictionaries.

Algorithm 2: Selection of the triple of indices from the reduced dictionary (c.f. (10)). Input: as in Algorithm 1, plus the array $L$ with the triples of indices $L(n, 1{:}3) = (\ell^x_n, \ell^y_n, \ell^z_n),\, n = 1, \ldots, k$. Output: $k^\ast$ and the corresponding value of $\alpha$ (c.f. (10)) to update the coefficients and the residual.

Due to computational complexity and memory requirements, pursuit strategies using general dictionaries can only be implemented on an image partitioned into small blocks. We consider nonoverlapping blocks. The approximation of each block is carried out independently of the others. When the approximation of all the blocks is concluded, they are assembled together to produce the approximation of the whole image. While the sparsity results yielded by the OMP3D and SPMP3D methods are theoretically equivalent, we have seen that the latter implementation is much more economic in terms of storage demands. As discussed in Remark 2 below, this feature makes the SPMP3D algorithm suitable for possible GPU implementations using only the fast access shared memory. Assuming, for simplicity in the notation, that a 3D image is partitioned into cubes of size $N_b^3$ and that the dictionaries $D^x$, $D^y$ and $D^z$ are all of the same size $N_b \times rN_b$, where $r > 1$ is the redundancy of each 1D dictionary, the SPMP3D algorithm's storage needs are as follows.
1. Two $N_b^3$ arrays, for the intensity block in the image partition and for the residual of the corresponding approximation.
2. Three matrices of size $N_b \times rN_b$, one for each dictionary, in case they are different. This still leaves 10 KB of the 48 KB of shared memory for temporary variables to be used within the calculations.
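The arithmetic behind the shared-memory claim can be checked for the two items listed above, assuming double precision, $N_b = 8$ and $r = 5$ (the remainder of the 48 KB budget is left for the working variables of the full algorithm):

```python
# Rough shared-memory budget for the two items listed above, assuming
# double precision (8 bytes), block size Nb = 8 and redundancy r = 5.
Nb, r, bytes_per = 8, 5, 8
block_and_residual = 2 * Nb**3 * bytes_per        # item 1
dictionaries = 3 * Nb * (r * Nb) * bytes_per      # item 2
total_kb = (block_and_residual + dictionaries) / 1024
print(total_kb)  # → 15.5
```

These two items alone occupy 15.5 KB, comfortably inside the 48 KB of fast access shared memory quoted in the introduction.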

Mixed Dictionaries
A key factor for success in the construction of sparse representations is to have a good dictionary. While a number of techniques for learning dictionaries from training data have been proposed in the literature [35][36][37][38][39][40][41][42], they are not designed for learning large and highly coherent separable dictionaries. Nevertheless, previous works [18,32,33,43] have demonstrated that highly redundant and highly coherent separable dictionaries, which are easy to construct, achieve remarkable levels of sparsity in the representation of 2D images. Such dictionaries are not specific to a particular class of images. A discrimination is only made to take into account whether the approximation is carried out in the pixel intensity domain (pd) or in the wavelet domain (wd). In the latter case the procedure is as follows. • Apply a wavelet transform to each channel of the image, obtaining the array U ∈ R Nx×Ny×Nz of transformed planes. • Approximate the array U ∈ R Nx×Ny×Nz exactly as it is done in the pixel domain (pd).
• Apply the inverse wavelet transform to the approximated planes to recover the approximated intensity channels.
The mixed dictionary we consider for the 2D approximation consists of two sub-dictionaries: A trigonometric dictionary, D x T , which is the common sub-dictionary for the approximation in both domains, and a dictionary of localized atoms, which contains atoms of different shapes when used in each domain.
The trigonometric dictionary is the union of the dictionaries D x C and D x S defined below: where w c (n) and w s (n), n = 1, . . . , M x are normalization factors, and usually M x = 2N x . Thus, the trigonometric dictionary is constructed as D x T = D x C ∪ D x S . For approximations in the pd we add the dictionary, D x Lp , which is built by translation of the prototype atoms in the left graph of Fig. 1. This type of dictionary is inspired by a general result holding for continuous spline spaces. Namely, that spline spaces on a compact interval can be spanned by dictionaries of B-splines of broader support than the corresponding B-spline basis functions [45,46]. Thus, the first 4 prototype atoms h i , i = 1, . . . , 4 in the left graph of Fig. 1 are generated by discretization of linear B-spline functions of different support. For m = 1, 2, 3, 4 those functions are defined as follows: The remaining prototypes, h 5 , h 6 and h 7 , in the left graph of Fig. 1 For the other dimension we take D y pd = D x pd . For approximations in the wd we use the dictionary of localized atoms D x Lw as proposed in [33], which is built by translation of the prototype atoms p i , i = 1, . . . , 7 in the right graph of Fig. 1. Notice that p 1 = h 1 and p 3 = h 5 . The remaining prototypes are given by the vectors: pd ⊗ D y pd and D wd = D x wd ⊗ D y wd are very large, but never used as such. All the calculations are carried out using the 1D dictionaries. In order to demonstrate the gain in sparsity attained by the approximation of 3D images by partitioning into 3D blocks, we use dictionaries D wd and D pd only for the approximation of the single channel 2D images. For the 3D case we maintain the redundancy of the 3D dictionary equivalent to that of the 2D dictionary, by considering the 1D dictionaryD x pd = D x C ∪ D x S ∪ D P 1 . 
Notice that $\mathcal{D}_{P_1}$ is the standard Euclidean basis for $\mathbb{R}^{N_x}$, also called the Dirac basis, i.e., the basis arising by translation of the first atom in Fig. 1. Notice also that $\tilde{\mathcal{D}}^x_{pd} \subset \mathcal{D}^x_{pd}$ and $\tilde{\mathcal{D}}^x_{pd} \subset \mathcal{D}^x_{wd}$. We further consider $\tilde{\mathcal{D}}^y_{pd} = \tilde{\mathcal{D}}^x_{pd}$ and $\tilde{\mathcal{D}}^z_{pd} = \tilde{\mathcal{D}}^x_{pd}$, but taking $N_x = N_z$. The redundancy of the resulting dictionary $\tilde{\mathcal{D}}_{pd} = \tilde{\mathcal{D}}^x_{pd} \otimes \tilde{\mathcal{D}}^y_{pd} \otimes \tilde{\mathcal{D}}^z_{pd}$ is equivalent to the redundancy of the 2D dictionary $\mathcal{D}_{pd}$. In 3D we use the same dictionary in both domains, $\tilde{\mathcal{D}}_{wd} = \tilde{\mathcal{D}}_{pd}$.
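As an illustration of the construction of $\tilde{\mathcal{D}}^x_{pd}$, the sketch below assembles cosine, sine and Dirac atoms and realizes the normalization factors numerically. The cosine/sine atom shapes follow a common redundant trigonometric construction; the exact prototypes used in the paper may differ in detail:

```python
import numpy as np

def trig_plus_dirac_dictionary(Nx):
    """Sketch of the 1D dictionary DxC ∪ DxS ∪ DP1 with Mx = 2Nx.
    Columns are normalized numerically rather than via explicit
    w_c(n), w_s(n) factors."""
    Mx = 2 * Nx
    i = np.arange(1, Nx + 1)
    DC = np.stack([np.cos(np.pi * (2 * i - 1) * (n - 1) / (2 * Mx))
                   for n in range(1, Mx + 1)], axis=1)
    DS = np.stack([np.sin(np.pi * (2 * i - 1) * n / (2 * Mx))
                   for n in range(1, Mx + 1)], axis=1)
    DP1 = np.eye(Nx)                      # Dirac (standard Euclidean) basis
    D = np.concatenate([DC, DS, DP1], axis=1)
    return D / np.linalg.norm(D, axis=0)  # unit-norm columns

D = trig_plus_dirac_dictionary(8)
print(D.shape[1] / D.shape[0])  # → 5.0
```

With $N_b = 8$ the 1D dictionary has $2N_x + 2N_x + N_x = 5N_x$ atoms, i.e. redundancy 5, so the separable 3D dictionary has redundancy $5^3 = 125$, matching the figure quoted in the introduction.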

Numerical Results
The merit of the simultaneous approximation of multi-channel images is illustrated in this section by recourse to two numerical examples. Firstly, we compare the sparsity produced by the joint approximation of the Red-Green-Blue (RGB) channels, partitioned into blocks of size $N_b \times N_b \times 3$, with the sparsity obtained by the independent approximation of each channel partitioned into blocks of size $N_b \times N_b$. Secondly, the full power of the approach is illustrated through the gain in sparsity attained by approximating hyper-spectral images partitioned into 3D blocks, versus the plane by plane approximation.
In both cases, once the approximation of each 3D block $\mathbf{I}_q$ in the image partition is completed, for $q = 1, \ldots, Q$, the $k_q$-term atomic decomposition of the corresponding block is expressed in the form
$$\mathbf{I}^{k_q}_q = \sum_{n=1}^{k_q} c^q(n)\, \mathbf{d}^x_{\ell^{x,q}_n} \otimes \mathbf{d}^y_{\ell^{y,q}_n} \otimes \mathbf{d}^z_{\ell^{z,q}_n}. \quad (12)$$
The sparsity of the representation of an image of dimension $N = N_x \cdot N_y \cdot N_z$ is measured by the Sparsity Ratio (SR), which is defined as
$$\mathrm{SR} = \frac{N}{K}, \quad (13)$$
where for the 3D representation $K = \sum_{q=1}^{Q} k_q$, with $k_q$ the number of atoms in the atomic decomposition (12). For the channel by channel decomposition of an $N_z$-channel image, each channel $l$ is partitioned into $P = (N_x \cdot N_y)/N_b^2$ blocks $\mathbf{I}_{p,l},\, p = 1, \ldots, P$, which are approximated by the 2D atomic decompositions
$$\mathbf{I}^{k_{p,l}}_{p,l} = \sum_{n=1}^{k_{p,l}} c^{p,l}(n)\, \mathbf{d}^x_{\ell^{x,p,l}_n} \otimes \mathbf{d}^y_{\ell^{y,p,l}_n}, \quad (14)$$
where the indices $\ell^{x,p,l}_n, \ell^{y,p,l}_n$ are selected for each channel $l$ by the OMP2D algorithm. Accordingly, the number $K$ in (13) is given as $K = \sum_{l=1}^{N_z} \sum_{p=1}^{P} k_{p,l}$, with $k_{p,l}$ the number of atoms in the atomic decomposition (14).
Notice that the SR is a measure of the reduction of dimensionality for representing an image.
The larger the value of the SR, the smaller the dimensionality of the atomic decomposition representing the whole image. The required quality of the approximation is ensured with respect to the Mean Structural SIMilarity (MSSIM) index [47,48] and the classical Peak Signal-to-Noise Ratio (PSNR), which for a 3D image is defined as
$$\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{(I_{\max})^2\, N}{\|\mathbf{I} - \mathbf{I}^K\|^2_{3D}} \right),$$
where $I_{\max}$ is the maximum intensity range and $\mathbf{I}^K$ the image approximation.
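The two figures of merit can be sketched directly from their definitions (Python/NumPy; the function names are illustrative and `psnr_3d` assumes the standard PSNR form with `Imax` the maximum intensity range):

```python
import numpy as np

def sparsity_ratio(N, K):
    """SR = number of intensity points / number of atoms, as in (13)."""
    return N / K

def psnr_3d(I, I_K, Imax=255.0):
    """PSNR for a 3D image: 10*log10(Imax^2 * N / ||I - I_K||^2)."""
    N = I.size
    err = np.sum((I - I_K) ** 2)
    return 10 * np.log10(Imax**2 * N / err)

sparsity_ratio(1000, 10)  # → 100.0
```

For instance, a 512 × 512 RGB image represented with a few thousand atoms would have an SR in the hundreds, which is the order of magnitude of the reductions reported in this paper.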

Example I
In this example we use the Kodak data set, consisting of the 24 true color images shown in Fig. 2 (credit: Rich Franzen [49]).
The approximations are realized in both domains by maintaining the same redundancy in the 2D and 3D dictionaries. For the independent approximation of the 2D channels the partitions are realized with blocks of size 8 × 8 and 16 × 16 (a partition of block size 24 × 24 does not improve results for this data set). Accordingly, the simultaneous approximation of the 3 color channels involves partitions of block size 8 × 8 × 3 and 16 × 16 × 3 respectively.
As already discussed, for the independent approximation of the 2D channels we consider the dictionaries $\mathcal{D}_{pd}$ (in the pd) and $\mathcal{D}_{wd}$ (in the wd) as given in Sec. 3.4. For the simultaneous approximation of the 3 channels we consider the dictionary $\tilde{\mathcal{D}}_{pd}$ given in the same section.
Both dictionaries have redundancy of 125.
The average values of SR ($\overline{\mathrm{SR}}$) over the 24 images in the set are given in Table 1, for the approaches and partitions indicated in the first column.
As a final remark it is worth noting that the number k q of atoms in the approximation of each block q of an image partition produces a meaningful summary of local sparsity.

Example II
We consider now the approximation of the hyper-spectral images illustrated in Fig. 6. Details on the acquisition and processing of the images are described in [52][53][54].
Table 3: Same description as in Table 2, but the approximations are realized by applying first a wavelet transform to each of the 32 channels.
Remark 3: In both Table 2 and Table 3, notice that the results in the pd are much closer to the results in the wd than they are in the case of the natural images in Fig. 6. This is because, as illustrated in Fig. 9, the planes of the remote sensing images are not as sparse in the wd as the planes of the natural images are.

Conclusions
High quality approximation of 3D images has been considered within the context of data reduction. A remarkable improvement in sparsity achieved by the simultaneous approximation of multiple channels has been illustrated through numerical experiments of different natures.
Firstly, it was demonstrated that a standard data set of RGB images can be approximated at high quality using far fewer elementary components if each image is treated as a very thin 3D array instead of as 3 independent 2D arrays. Secondly, the full power of the approach was demonstrated through the approximation of hyper-spectral images. For the hyper-spectral natural images the sparsity is remarkably higher if the approximation is realized in the wavelet domain. For the remote sensing images the domain of approximation has less influence because, as opposed to natural images, these images are not as sparse in the wavelet domain.
Taking into account the major reduction of dimensionality demonstrated by the numerical examples in this work, we feel confident that the proposed approach will be of assistance to the broad range of image processing applications which rely on a transformation for data reduction as a first step of further processing.