Seismic Data Reconstruction via Matrix Completion

In seismic processing, one goal is to recover missing traces when the data is sparsely and incom-pletely sampled. We present a method which treats this reconstruction problem from a novel perspective. By utilizing its connection with the general matrix completion (MC) problem, we build an approximately low-rank matrix, which can be reconstructed through solving a proper nuclear norm minimization problem. Two efﬁcient algorithms, accelerated proximal gradient method (APG) and low-rank matrix ﬁtting (LMaFit) are discussed in this paper. The seismic data can then be recovered by the conversion of the completed matrix into the original signal space. Numerical experiments show the efﬁciency and high performance of data recovery for our model compared with other algorithms.


I. INTRODUCTION
Seismic data is often irregularly sampled along spatial coordinates, which is mostly caused by dead or severely corrupted traces, surface obstacles, acquisition aperture and economic limit.
Seismic data regularization spatially transforms irregularly acquired data to regularly sampled data in order to subsequently display, process and interpret it. This is a long-standing and important problem in seismic processing. Seismic data interpolation and reconstruction is one particular case of data regularization. Nowadays, the recovery of missing traces has become a main issue in seismic research, where input data are already measured or observed in a regular grid and one needs to reconstruct the value in missing traces (i.e., empty bins).
Different approaches have been proposed to solve this problem. Some of these techniques require converting the data into different domains with certain transform dictionary, such as Y. Yang  Fourier transform ( [1], [2], [3], [4]), curvelet transform ( [5], [6], [7]) and shearlet transform ([8]). The idea of L 1 -norm minimization was applied in ( [5], [6], [8]) to reconstruct the missing traces of seismic data. Another popular technique is the use of prediction filters ( [9], [10], [11]), where low-frequency non-aliased components or observed data are used to build antialiasing error-prediction filters and then the filters are applied to interpolate high-frequency components or missing traces. The idea of prediction filters has been extended by many researchers in the past few years. For instance, Liu and Fomel ([12]) proposed a regularized nonstationary autoregression for adaptive error-prediction filtering; Naghizadeh and Sacchi ([13]) used multidimensional prediction filters for seismic data reconstruction and in ( [7]) they proposed prediction filters using curvelet transform instead of Fourier transform.
Recently, rank-reducing methods are used for seismic data reconstruction. Trickett et al. ([14]) presented a truncated singular value decomposition based on rank reduction of constant-frequency slices for trace interpolation. In ( [15]), Oropeza and Sacchi reorganized the seismic data into a Hankel matrix, and then used multichannel singular spectrum analysis (MSSA) to solve the rankreduction problem of the Hankel matrix. The MSSA comes from the generalization of singular spectrum analysis (SSA) ( [16]). Basically, SSA is used for 2D data and MSSA is applied to 3D data recovery.
Mathematically speaking, this reconstruction problem can be understood in the following way: we are given a matrix with some missing columns, and we want to complete the matrix by filling in the missing data. This interpretation has some similarities with the matrix completion (MC) problem, in which we are given some components of a matrix X and we want to reconstruct X with its rank as low as possible. In this paper, we will explore the connection in a novel way such that a lot of efficient MC algorithms can be applied to seismic data recovery. The technology of rank reduction has been used for data reconstruction in MSSA based methods in [15], but here we start from different theory and algorithms.
From the mathematical theory of MC, however, one can not simply define the given signal matrix as an input in MC framework. The main reason is that based on the justification in [17], it is impossible to reconstruct a matrix when the sampling set avoids any column or row of X. That is to say, we need at least one nonzero value in each column vector. Unlike the applications of MC in image inpainting where pixels are randomly missed, we have several columns missing in seismic data. In order to apply the MC idea, we should first design a pre-DRAFT transformation which transforms the given seismic data with column/trace missing to a new matrix with pixel missing, i.e., each column and row of the new matrix has at least one nonzero entry. Besides, sometimes the original signal may have complicated structure and the matrix may not necessarily be low-rank. Therefore, we should try to explore special structure of the given data and use pre-transformation to construct a low-rank matrix with proper sampling set.
In this paper, we build a texture matrix through pre-transformation. This new matrix is either low-rank or can be approximated by a low-rank matrix. With very high probability there is at least one nonzero entry in each row and column of it. Hence it can be reconstructed via regular MC algorithms. We can then recover the original signal matrix through transforming the completed low-rank matrix into the original space. The main contribution of our work is that we attack seismic reconstruction from a completely new angle. By making the connection with MC problem, a lot of efficient MC algorithms from convex optimization can be utilized to recover the seismic data. The proposed method is simpler and faster than existing methods. This paper is organized as follows. In section II and III, we will give a detailed description about the MC problem and some commonly used algorithms for solving it. Section IV focuses on the construction of texture matrix. The performance of our model is illustrated in section V with comparison to algorithms mentioned in [15]. We will end this work by a short conclusion.

II. MATRIX COMPLETION
Nowadays, a lot of real world models can be categorized as MC problems, such as video denoising ( [18]) etc. The general form of the problem is: Here rank(X) is the number of nonzero singular values of X. Sometimes when noise is involved in the measurements, the model is adjusted to with a properly tuned parameter µ. Here P Ω stands for the the projection onto the subspace of matrices with nonzero entries restricted to the index subset Ω, and ∥ · ∥ F , the Frobenius norm, DRAFT is defined as for any matrix A = (a ij ) m×n . However, solving the above two models are often numerically expensive. Hence people tend to consider their relaxation: and Here ∥X∥ * stands for the nuclear norm of X, which is the L 1 norm of the singular values. As a matter of fact, similar relaxation also appears in compressed sensing theory, where minimizing the L 1 norm of a vector is used as a relaxation of L 0 minimization.
In raw seismic data X = (x ij ) m×n , the i denotes temporal sampling and j stands for trace sampling, i.e., the spatial location of receivers. For instance, the first column of X is the 1D time sequence signal acquired at the first receiver. Each column of X is named as one seismic trace.

III. FAST ALGORITHMS FOR MC
Lots of algorithms have been proposed to solve (4) and (5), such as linearized Bregman, fixed point continuation with Bregman iteration etc. See [19], [20], [21], [22] for more details. In this paper we will focus on the following two, accelerated proximal gradient method (APG, it is the same as FISTA) and LMaFit. We need to point out that LMaFit is not designed to solve the standard MC problem. It is actually used for solving the low-rank matrix reconstruction where the low-rank matrix is parameterized in a factorization form. More details about it will be provided in the following.
Accelerated proximal gradient method comes from [23], [24], and it was applied to MC problems in [25]. It is a classic method for solving (5). In order to illustrate this method, let us briefly introduce the idea of proximal mappings. DRAFT The proximal mapping (prox-operator) of a convex function h is defined as Let us assume we want to minimize a function f (u) of the form with a convex and differentiable g(u), and a convex h(u) with inexpensive prox-operator. The proximal gradient method computes the minimizer of f (u) iteratively via with a suitable step size t k . For (5), by choosing t k ≡ τ we have Here APG uses extrapolation on the result from (8) and is usually several times faster than regular proximal gradient method. In order to obtain the optimal X of (5), in each iteration we update X k in the following way: Here τ has to be chosen properly. We also want to mention that with Bregman iteration, the solution to (4) can be obtained by solving a sequence of (5) with noise-add-back skill in each iteration: LMaFit, proposed in [26], is an efficient algorithm designed to solve DRAFT Here X ∈ R m×k , Y ∈ R k×n , and Z ∈ R m×n , where k is a predicted rank bound. With an appropriate k, the product of the optimal X and Y from (12) is also a minimizer of (4). X, Y , Z can be updated in an alternating fashion. Following the idea of nonlinear successive-overrelaxation (SOR) algorithm, the author in [26] used weighted average between this update and the previous iterate. In each iteration, the new X, Y and Z are defined in the following way: with weight parameter ω > 1. Here (·) † stands for the pseudo inverse of a matrix. As we can see, this algorithm avoids singular value decomposition (SVD) in each iteration, which is the main time cost in most of the algorithms for (4). Hence when timing is the main issue, LMaFit is undoubtedly a great choice.

IV. MODEL CONSTRUCTION: PRE-TRANSFORMATION
In our discussion we assume that the number of consecutive missing columns in the given matrix is bounded above by a constant C. This is a reasonable assumption since the data can not be restored properly if too many successive columns are missing at the same time. As mentioned in introduction, the motivation of pre-transformation involves two aspects: 1) reshape the data with column missing to a new format with pixel missing as required by MC theory; 2) after the transformation, the new data matrix has low rank or can be approximated by a low-rank one. In this paper, we consider the following texture matrix.
Due to the continuity and integrity of seismic data, it is reasonable to believe that our signal only has a few texture patterns, which is defined as base texture in [27]. Inspired by it, if we divide the signal matrix into small r × r patches, each submatrix should be a combination of these base textures. Furthermore, if we rearrange each patch into a vector with the same ordering, then these vectors will be (highly) linearly dependent, i.e., the patched matrix has lower rank or can be approximated with a low-rank matrix. Later we will justify this claim with numerical test on real data. In [27], Schaeffer and Osher defined the texture matrix as follows: Definition 1: Given the original signal X ∈ R m×n and patch size r, we can divide X into mn/r 2 subblocks, labeled as {B i }. Each B i can be rearranged into a column vector w i following the same ordering. The texture matrix T is defined as The nuclear norm of T is defined as the texture norm of X. The nuclear norm minimization is correspondingly shifted to texture norm minimization. Remark: • The specific ordering which maps each patch to a vector is not important, as long as it is consistent. In this paper the patches are labeled in the following way:  And for each patch, e.g. B 1 , the patch-vector is defined as: =⇒ (x 11 , · · · , x r1 , x 12 , · · · , x r2 , · · · , x rr ) Tt = w 1 . (16) where T t denotes transpose.
• In the definition of T , the patches are defined in a non-overlapping way. The overlapping case is not considered for two reasons. Firstly, if we construct the texture matrix T ∈ R m 1 ×n 1 with overlapping patches, we shall have m 1 n 1 > mn, hence the new matrix belongs to a space with higher dimension than the original one. This increases the complexity of the problem, and may not be feasible since we must apply MC techniques to this larger matrix and then transform it back to the original space. More importantly, if each pixel contains noise, then the robustness from applying patch based rank reduction techniques will be lost once the patches overlap. The reason is as follows: the overlap causes the patches to become highly correlated, thus the noise is no longer random and cannot be separated via MC and other rank reduction techniques. More explanation about this can be found in the DRAFT latest version of [27], to appear soon.
In order to apply MC techniques on the new matrix T , we first need to ensure that if we choose r properly, with high probability there is at least one nonzero entry for each row and column of T . If we have a missing column in the new matrix, then one of the patches B k has to be all-zero, which means that the consecutive r columns that B k stays in are all missing. This can not happen if r is greater than C.
On the other hand, if we have a missing row in the new matrix, then there exists 1 ≤ i ≤ r, Hence the j th , (j +r) th , . . . , (j +n−r) th columns should all be zero. Since we are dealing with column missing, finding the probability of this case is the same as the following: given a length n vector with missing rate ρ, where at most C consecutive elements are missing at the same time, we need to find the probability that there exists j between 1 and r such that the j th , (j + r) th , . . . , (j + n − r) th entries are all missing.
The exact probability for this case is hard to measure. Here we consider a sampling model that people often use in application: jittered sampling [6]. We first divide the long vector into several non-overlapping short vectors of length C/2ρ, and choose ρ · C/2ρ elements from each part. In this way at most C/2 consecutively entries are missing from each short vector, hence the original vector will have at most C consecutive missing entries. Let us choose r > C/2ρ, then for each j value between 1 and r, the j th , (j + r) th , . . . , (j + n − r) th entries will stay in different short vectors, and the probability of each element missing is independent and equals ρ. We then know that for a particular j, the probability of the j th , (j + r) th , . . . , (j + n − r) th elements all missing is ρ n/r . Since j can be any number between 1 and r, the overall probability is bounded by rρ n/r . Therefore, if we have r > max(C, C/2ρ) and n/r large, T will have at least one nonzero element for each row with extremely high probability. For example, when we have n = 128, ρ = 0.5 and C = 2, if we choose r = 8, the probability of having one row missing in T is less than 1.2 · 10 −4 .
Next we use real data to show that for the original seismic data with no missing traces, the transformed T can be approximated by a low-rank matrix. When the data has some missing columns, the associated T may not be close to a low-rank matrix. Here two data sets X 1 , X 2 are considered. For each test matrix, we generate the pre-transformation T with the original data and the data with missing columns. Different r values are used in our test, and we display the DRAFT singular value plot in the following Figure 1. We can see from Figure 1

V. NUMERICAL RESULTS
We test the above model with different algorithms on real seismic data. In each experiment, we fix the signal matrix and conduct the comparison with different sampling sets. 50% of the traces are extracted with jittered sampling and C = 2. The test is conducted with 10 different sampling sets and we record the mean time cost and signal-to-noise ratio (SNR). SNR is denoted by 10 log 10 ( where X * is the original signal matrix and X is the recovered one. We also display the comparison of one original (missing) trace with reconstructed traces by different methods.

A. Comparison with SSA based algorithm
In our first test, we compare our model with the SSA based algorithm in [15] (algorithm (11) in that paper). For simplicity from now on we will use SSA to denote this method. A detailed description of this algorithm is provided in appendix. For SSA filter, we choose rank threshold k = 5 for X 1 and k = 7 for X 2 . The number of outer iteration is set to be 10 for X 1 and 15 for X 2 . Since this algorithm tends to take a long time for large data set, we restrict the size of the signal matrix to 128 × 128. APG and LMaFit are applied in our model. The result is shown below in Figure 2  We can see clearly from these figures that for both data sets the texture-patch model requires much less time and also gives better results. According to the trace comparison, the results from our model match the true trace perfectly while SSA might miss some parts. We can optimize the results of SSA by taking more outer iterations, but we have to sacrifice even more in time cost. For both tests, APG gives the highest SNR with medium time cost. According to the signal matrix graph, the recovered result from APG is smoother than the other two. When the data becomes more complicated (X 2 ) with edges and corners, the result from APG and LMaFit are comparable while LMaFit takes much less time.

B. Test with large data sets
This time the original 512 × 512 signal matrices are used in the experiment. APG and LMaFit are tested and compared. When we use X 1 (Figure 1  When the original signal becomes more complicated (Figure 1 (b)), both SNR become undoubtedly lower while still acceptable. According to Figure 5, this time we would rather choose LMaFit since it takes less time and gives similar SNR. We may say that when the original signal has complicated structure, LMaFit should be our choice.
We also consider noisy case for the texture model. X 1 is used as input and Gaussian noise is added to the given data. We define ρ to be the ratio between the variance of the noise and the L ∞ norm of the seismic data. The results for ρ = 0.01 and 0.02 are depicted in Figure 6 and

VI. CONCLUSION
In this paper, we discover the connection between seismic data reconstruction and the general matrix completion problem. We formulate the seismic data reconstruction as a nuclear-norm the Hankel matrix for row j is constructed via x j,1 x j,2 · · · x j,Kn x j,2 x j,3 · · · x j,Kn+1 . . . . . . . . . . . .
x j,Ln x j,Ln+1 · · · x j,n Usually we choose L n = [n/2] + 1 and K n = n − L n + 1 to make H j close to square. The SSA filter is defined as follows: for each row j, we first construct the Hankel matrix H j , and then apply SVD to get H j = U diag(σ)V . Here σ i are singular values. Only the first k largest singular values and the corresponding singular vectors of H j are kept for the next step, i.e., we only treat H k j = U k diag(σ 1 , · · · , σ k )V k as useful information. The final step is averaging the antidiagonal of each block of H k j to reform the row j. It has been proven in [28] that if the data consists of k complex exponentials, the associated Hankel matrix of the data is a matrix of rank k. Usually the noise and missing data will increase the rank of Hankel matrix, hence DRAFT proper rank reduction technique can be used for data denoising and recovery.
Let us define this SSA filter for matrix X as F(X), X obs as the original given matrix with missing traces, and P I as the projection onto the given data set. The SSA based algorithm we used as comparison (Algorithm 11) is illustrated below: The number of iterations taken for this algorithm is denoted as outer iteration in our discussion. DRAFT