Can learning from natural image denoising be used for seismic data interpolation?

We propose a convolutional neural network (CNN) denoising based method for seismic data interpolation. It provides a simple and efficient way to break though the lack problem of geophysical training labels that are often required by deep learning methods. The new method consists of two steps: (1) Train a set of CNN denoisers from natural image clean-noisy pairs to learn denoising; (2) Integrate the trained CNN denoisers into project onto convex set (POCS) framework to perform seismic data interpolation. The method alleviates the demanding of seismic big data with similar features as applications of end-to-end deep learning on seismic data interpolation. Additionally, the proposed method is flexible for many cases of traces missing because missing cases are not involved in the training step, and thus it is of plug-and-play nature. These indicate the high generalizability of our approach and the reduction of the need of the problem-specific training. Primary results on synthetic and field data show promising interpolation performances of the presented CNN-POCS method in terms of signal-to-noise ratio, de-aliasing and weak-feature reconstruction, in comparison with traditional $f$-$x$ prediction filtering and curvelet transform based POCS methods.


INTRODUCTION
Due to existing obstacles or economic restrictions, unavoidable missing traces in acquired seismic data nonuniformly or uniformly along spatial coordinate always affects seismic inversion, amplitude-versus-angle analysis and migration. To utilize these incomplete data, many insightful researchers have developed dozens of interpolation methods to restore the missing traces. Besides frequency space (f -x) prediction filtering methods (Spitz, 1991;Naghizadeh and Sacchi, 2008), methods based on sparse representation of seismic data in a transform domain are popular these years because of their promising framework. An early example is project onto convex set (POCS) based the Fourier transform method (Abma and Kabir, 2006). In recent years, several directional wavelets including curvelets and shearlets, have been applied to sparsely present seismic events (Herrmann and Hennenfent, 2008).
These nonadaptive or highly redundant transforms have strong anisotropic directional selectivity. In consideration of characteristics of seismic data, seislet transform is presented by Fomel and Liu (2010) and used for seismic interpolation (Gan et al., 2015). Dictionary learning methods (Liang et al., 2014;Yu et al., 2015Yu et al., , 2016 and rank-reduction regularization methods (Trickett et al., 2010;Ma, 2013;Gao et al., 2013) are also successfully applied for seismic interpolation. Most of them are just suitable for randomly missing cases. For regularly subsampled seismic data with spatial aliasing, associated antialiasing techniques should be included into these methods, e.g., (Naghizadeh and Sacchi, 2010).
A machine learning method with support vector regression was successfully applied for seismic data interpolation, e.g., (Jia and Ma, 2017). Deep learning (DL), which is a fast development branch of machine learning, attracts much attention from multidisciplinary researchers. DL offers to learn amount of parameters through convolutional neural network 3 (CNN) to capture high level features in data. Recently, DL has had great success in computer vision research community, including image classification (Krizhevsky et al., 2012;He et al., 2016), denoising (Zhang et al., 2017a and superresolution (Dong et al., 2016;Kim et al., 2016). DL also has been applied for geological features identification (Huang et al., 2017) and seismic lithology detection (Zhang et al., 2018a). For seismic interpolation, primary attempts were made by Wang et al. (2018) using residual network  and Alwon (2018) using generative adversarial networks (Goodfellow et al., 2014) to recover seismic data from regularly sampled observations. These end-to-end approaches directly learn to interpolating in certain missing case on synthetic seismic training datum because of lack of training label data. To process field data, however, feature similarity to training data is required to obtain good results.
In this paper, we propose a simple and efficient plug-and-play approach for seismic interpolation, which integrates deep networks into POCS algorithm. Via the POCS algorithm framework, we convert the original interpolation problem to a denoising task and the CNN plays an important role of denoising. Our work is similar in spirit to the using DL network as a regularizer (Zhang et al., 2017b;Liu et al., 2018) in image processing. However, while they use half quadratic splitting (Geman and Yang, 1995) or ADMM (Boyd et al., 2011) to separate regularization away from fidelity term and then replace the regularization using deep learning networks, we intuitively use deep learning networks to perform the denoising mission that exists in POCS algorithm. Instead of training CNN on seismic data, we trained a set of CNNs on noisy-clean natural image pairs. Numerical experiments both on synthetic and field show that the proposed method is qualified to get pretty good restoration results for seismic interpolation.

METHOD
Seismic data interpolation aiming at recovering complete data d from observed incomplete data d obs can be characterized as where P Λ denotes the subsampling matrix and Λ denotes the ratio of subsampling. Seismic data can always be sparsely represented by where Φ could be a sparse transform or dictionary learning, such as curvelet transform, and x is a vector of representation coefficients. Thus we could recover complete data or dense d by regularizing x to be sparse, that is, resolving the following optimization problem where ψ(x) denotes the sparse regularization on x, and usually being ψ(x) = x 1 . The problem is often called sparsity-promoting compressed sensing reconstruction. There are fruitful algorithms to solve such an optimization problem, such as the well-known iterative shrinkage-thresholding (IST) algorithm (Daubechies et al., 2004), its accelerated version FIST algorithm (Beck and Teboulle, 2009), and split Bregman method (Goldstein and Osher, 2009).
POCS algorithm is another simple iterative method for recovering d. It can be easily derived from IST algorithm (Yang et al., 2012) giving where the soft thresholding operator T λ is defined as Quite a few seismic interpolation methods dependent on the POCS algorithm were proposed by considering different Φ, e.g., the curvelet transform based POCS methods (Yang et al., 2012) and data driven tight frame methods (Liang et al., 2014) that learn dictionaries from a given data.
Interpolation can be often completed by iterative denoising algorithms. Equation 4 can be viewed as a denoising procedure since small representation coefficients usually correspond to noises in signals, which are eliminated during iterations. We can define a generalized POCS algorithm framework as follows where D σ k denotes the denoising operator with respect to noise variance σ k . This generalization is consistent with its special case D σ k (d (k) ) = ΦT λ k (Φ T d (k) ) due to the regularizing parameter λ k is a function value of σ k as literature analyzed. Usually in equation 4, λ k decays as iteration increases. There is a more intuitive explanation from the generalized framework: the recovered data progressively approximate the noise-free target as iteration increases with σ k decreases.
Denoising operator is the key ingredient of the generalized POCS framework. We utilize deep CNN as our denoising operator because it can achieve higher denoising quality in image processing compared with those non-deep-learning denoising methods. Figure 1 shows the 6 flow chart of our proposed method, where Σ is a decreasing sequence and its definition is given in next subsection. We refer the proposed method as CNN-POCS.

Train CNN denoisers
It is widely acknowledged that CNNs generally benefit from big training data. Instead of training on seismic data, we choose to train on natural images due to its labels are easy avail- Definition of Σ. The decaying discrete sequence Σ plays a central role as a prediction and reguarizer of noise variance of reconstructed data during iteration. Here we give the mechanism of generating such a sequence. The i-th element Σ i in this discrete decreasing sequence is defined as where Λ is the sampling ratio, T denotes the total number of iterations and λ controls the highest noise variance. In t-th iteration, the CNN denoiser pre-trained on noise level that closest to σ t will be loaded to perform denoising task. Usually, Λ = 12 is good enough for the CNN-denoiser based method to produce good results. When Λ decreases, smaller λ can make the CNN-POCS method improvement with respect to the signal-to-noise ratio (SNR), which is defined as 10 log( ), whered is the reconstructed data.

EXAMPLES
In this section, we test the presented CNN-POCS method for seismic interpolation. The set of 25 CNN denoisers was trained on natural images as described above and the discrete decreasing noise variance sequence was generated as it is defined. We set the total number of iterations in CNN-POCS method as T = 30. The traditional Spitz f-x prediction filtering method (Spitz, 1991) and recent curvelet-POCS method (Yang et al., 2012) are used for comparisons. The number of iterations in curvelet-POCS method was set to 100 so that it achieved its convergence. Nearest-neighboring interpolation method was used to generate the initial data for the proposed method. In each experiment, the parameters were empirically adapted to obtain best results.

Synthetic examples
We first test a synthetic seismic data as shown in Figure 4a. We decimate the original data by a factor of a (eliminate a − 1 traces between each pair of traces.) leading to Λ = 1/a regularly subsampled data. We consider several decimating factor varying from 2 to 5.  Figure 4 shows the reconstructed data of the three methods in case of a = 2. The associated f -k spectra are shown in the right column, wherein the horizontal axis denotes the normalized wavenumber, and the vertical axis represents the 8 frequency. Spacial aliasing can be observed obviously in Figure 4d and it is well removed by CNN-POCS method in Figure 4j. The f -k spectrum comparison further illustrates the validity of our CNN-POCS method.

Field data examples
Two field data are provided to further assess the flexibility and validity of the CNN-POCS method. Figure 5a shows one post-stack field data. The results of interpolation in cases of missing traces regularly with a = 2, 3, 4, 5 on this data are reported in Table 2. The CNN- POCS method outperforms f -x method and curvelet-POCS method significantly when a ≤ 3. We also provide a single trace comparison when a = 2 in Figure 5b, wherein the bias of CNN-POCS method is minor.
Dense data reconstruction. Figure 6 shows a prestack field shot gather, which has 294 traces with a 12.5 m trace interval and 879 temporal samples with 0.004 s temporal interval. Three interpolation algorithms are adopted to this field data which has inadequate acquisition trace interval to construct dense data for fine reservoir characterization and specific processing algorithm. The enlarged rectangle area are shown in Figure 7. We can see the spatial visual-serration effects are weakened by the curvelet-POCS method and the CNN-POCS method. Linear weak events which are more continuous can be obviously observed in Figure 7d, which demonstrates the proposed CNN-POCS method reconstructs weaker reflections better.

DISCUSSION
Results from the previous section show the proposed generalized POCS algorithm associated with CNN denoisers that are pre-trained on natural images is able to produce satisfying interpolation quality on both synthetic and field data. The data used in tests are dominated by different features, which indicates that our proposed method does not require feature similarity among the data to be processed. Additionally, the reconstructed aliasing-free data can be beneficial to subsequent seismic data processing steps. However, the proposed method has its own limitations. First, the denoising effect of these CNN denoisers pretrained on natural images is still limited for seismic data because the distribution that natural images are drawn from is quite different from that the seismic data contributes. One possible solution is 'semi-seismic'. In other words, we can train CNN denoisers on dataset that consists of both natural images and seismic data. Secondly, the work of training a set of 25 CNN denoisers may be arduous. This can be resolved by using a more advanced network, FFDNet (Zhang et al., 2018b), which is adpative to many noise levels. In Table 1 and Table 2, we discover that the performance of the CNN-denoiser based method degenerates rapidly when the decimating factor becomes larger than 3. The main reason causing this situation is that the true noise variance goes far beyond the noise level that the pre-trained CNN can process in these cases. Therefore, the CNN denoisers that are trained on noise level beyond 50 are required for interpolation at low sampling ratio. The performance can also be further improved by investigating new denoising CNN architectures and other optimization algorithm frameworks.
10 SNR (dB) comparison of three methods on field data, a significant improvement of the CNN-POCS method over curvelet transform method when a ≤ 3.
16 The architecture of denoiser network proposed by Zhang et al. (2017b) is used in our work. Note that "s-DConv" denotes s-dilated convolution; "BNorm" represents batch normalization; "ReLU" is rectified linear units. Initialize u (t) , Σ, t = 0 Second column: associated data in f -k domain. From top to bottom, original data, 50%