Deep Convolutional Neural Network and Sparse Least Squares Migration

We have recast the forward pass of a multilayered convolutional neural network (CNN) as the solution to a sparse least-squares migration (LSM) problem. The CNN filters and feature maps are shown to be analogous, but not equivalent, to the migration Green's functions and the quasi-reflectivity distribution, respectively. This provides a physical interpretation of the filters and feature maps in a deep CNN in terms of the operators for seismic imaging. Motivated by the connection between sparse LSM and CNN, we adopt the neural network version of sparse LSM. Unlike the standard LSM method, which finds the optimal reflectivity image, neural network LSM (NNLSM) finds both the optimal quasi-reflectivity image and the quasi-migration Green's functions. These quasi-migration Green's functions, also denoted as the convolutional filters in a CNN, are similar to migration Green's functions. The advantage of NNLSM over standard LSM is that its computational cost is significantly lower and it can be used to remove coherent and incoherent noise in migration images. Its disadvantage is that the NNLSM quasi-reflectivity image is only an approximation to the actual reflectivity distribution. However, the quasi-reflectivity image can be used as an attribute image for high-resolution delineation of geologic bodies.

There are many types of neural network or machine-learning methods, including generative adversarial networks for seismic interpolation (Siahkoohi et al., 2018), residual networks for traces missing at regular intervals (Wang et al., 2019), Monte Carlo and support vector regression for data interpolation (Jia et al., 2018), autoencoders for tracking target horizons (Shi et al., 2020; Wang et al., 2020), and recurrent neural networks for well-log interpolation (Pham et al., 2020). The design of geophysical CNN architectures is largely based on empirical evidence from computer vision research, insights from the principles of artificial intelligence, and heuristic experimentation. Heuristic experimentation is most often used to decide the parameter design of a CNN architecture: the number of CNN layers, the number of filters per layer, the size of the filters, the type of activation functions, the number of skip layers, and whether a layer acts as a decoding or encoding operation. This approach has merits and liabilities. The merit is that trial and error with different architecture parameters is likely to give excellent results for a particular data set; the liability is that the resulting architecture might not be the best one for a different data set. This shortcoming in using empirical tests for parameter selection largely results from the absence of a rigorous mathematical foundation (Papyan et al., 2016, 2017a) for neural networks in general, and CNNs in particular.
To partly mitigate this problem for CNN-based imaging algorithms, we now present a physical interpretation of CNN filters and feature maps in terms of physics-based operators for seismic imaging. With such an understanding, we can use a physics-based rationale for the better design of CNN architectures in seismic migration and inversion. Donoho (2019) points out that "machine learning has a troubled relationship with understanding the foundation of its achievements well and its literature is admittedly corrupted by anti-intellectual and antischolarly tendencies." Progress in advancing the capabilities of deep neural networks will be severely stymied unless its mathematical foundations are established. As a first step in this direction, Papyan et al. (2016) propose that the forward modeling operation of a CNN can be recast as finding the sparsest model $\mathbf{m}$ under the $L_1$ norm subject to honoring the data-misfit constraint $\|\Gamma\mathbf{m} - \mathbf{d}\|_2^2 \le \beta$:

$$\mathbf{m}^* = \arg\min_{\mathbf{m}} \|\mathbf{m}\|_1 \quad \text{subject to} \quad \|\Gamma\mathbf{m} - \mathbf{d}\|_2^2 \le \beta, \qquad (1)$$

where $\mathbf{m}^*$ is the optimal solution, $\Gamma$ is the dictionary matrix, $\mathbf{d}$ represents the signal, and the scalar $\beta$ is the specified noise tolerance.
The iterative solution to this problem is a series of forward modeling operations of a neural network, in which the mathematical operations of each layer consist of two steps: a weighted summation of input values to give the vector $\mathbf{z}$, followed by a two-sided soft thresholding operation denoted as $\sigma(\mathbf{z})$ (Papyan et al., 2016).
The sparse constraint problem defined in equation 1 is commonly seen in geophysics, e.g., least-squares migration (LSM) with a sparsity constraint. LSM is an important seismic-imaging technique that produces images with better balanced amplitudes, fewer artifacts, and better resolution than standard migration (Lailly, 1983; Tarantola, 1987; Schuster, 1993; Chavent and Plessix, 1999; Nemeth et al., 1999; Duquet et al., 2000; Feng and Schuster, 2017; Schuster and Liu, 2019). The sparsity constraint is one of the important regularization terms used in solving the ill-conditioned least-squares problem (Sacchi and Ulrych, 1995; De Roeck, 2002; Kühl and Sacchi, 2003; Wang and Sacchi, 2005), and sparse LSM (SLSM) has been demonstrated to be effective for mitigating aliasing artifacts and crosstalk in LSM (Wang and Sacchi, 2007; Herrmann et al., 2009; Dutta, 2017; Witte et al., 2017; Li and Gao, 2018). Image-domain SLSM finds the optimal reflectivity $\mathbf{m}^*$ with sparse constraints to minimize the objective function $\|\Gamma\mathbf{m} - \mathbf{m}^{mig}\|_2^2$, where $\Gamma$ is the Hessian matrix and $\mathbf{m}^{mig}$ is the migration image. Here, we can see that SLSM shares the same form as the problem defined in equation 1. Following the work of Papyan et al. (2016), we show that the sparse solution to the LSM problem reduces to the forward modeling operations of a multilayered neural network. The CNN filters and feature maps are shown to be analogous, but not equivalent, to the migration Green's functions (Hessian) and the reflectivity distribution.
The standard SLSM algorithm needs to solve the wave equation, which is time-consuming. Motivated by the connection between SLSM and CNN, we propose the neural network version of sparse LSM, which does not need to solve the wave equation for the backpropagation of residuals and is faster than standard SLSM. Instead of just finding the optimal reflectivity $\mathbf{m}^*$, we optimize for the quasi-reflectivity $\mathbf{m}$ and the quasi-migration Green's functions $\Gamma$. These quasi-migration Green's functions approximate the role of the migration Green's function (Schuster and Hu, 2000) and are denoted as the convolutional filters in a CNN. As discussed in Appendix A, the migration Green's function is the point scatterer response of the migration operator. The final image is denoted as the neural network least-squares migration (NNLSM) estimate of the quasi-reflectivity distribution that honors the $L_1$ sparsity condition.
The next section shows the connection between the multilayer neural network and the solution to the multilayer NNLSM problem. This is followed by numerical examples with synthetic models and field data from the North Sea.

THEORY OF NEURAL NETWORK LSM
The theory of standard image-domain LSM is first presented to establish the benchmark solution in which the optimal reflectivity function minimizes the image misfit under the $L_2$ norm. This is then followed by the derivation of the SLSM solution for a single-layer network. The final two subsections derive the NNLSM solution for single-layer and multilayer networks, respectively.

Least-squares migration
The LSM problem can be defined (Schuster and Hu, 2000; Schuster, 2017) as finding the reflectivity coefficients $m_i$ in the $N \times 1$ vector $\mathbf{m}$ that minimize the $L_2$ objective function

$$\epsilon = \tfrac{1}{2}\|\Gamma\mathbf{m} - \mathbf{m}^{mig}\|_2^2, \qquad (2)$$

where $\Gamma = \mathbf{L}^T\mathbf{L}$ is the symmetric $N \times N$ Hessian matrix, $\mathbf{L}$ is the forward modeling operator, and $\mathbf{L}^T$ is the migration operator. Here, $\mathbf{m}^{mig} = \mathbf{L}^T\mathbf{d}$ is the migration image computed by migrating the recorded data $\mathbf{d}$ with the migration operator $\mathbf{L}^T$. Alternatively, the image-domain LSM problem can also be defined as finding $\mathbf{m}$ that minimizes $\epsilon = \tfrac{1}{2}(\mathbf{m}^T\Gamma\mathbf{m} - \mathbf{m}^T\mathbf{m}^{mig})$, which has a more well-conditioned solution than the one in equation 2 (Schuster, 2017). However, we will use equation 2 as the definition of the LSM problem to be consistent with the notation from Papyan et al. (2017a).
The kernel associated with the Hessian matrix $\mathbf{L}^T\mathbf{L}$ is also known as the point scatterer response of the migration operator or the migration Green's function (Schuster and Hu, 2000). It is a square matrix that is assumed to be invertible; otherwise, a regularization term is incorporated into the objective function.
A formal solution to equation 2 is

$$\mathbf{m}^* = \Gamma^{-1}\mathbf{m}^{mig}, \qquad (3)$$

where, in practice, it is too expensive to directly compute the inverse Hessian $\Gamma^{-1}$. Instead, a gradient method gives the iterative solution

$$\mathbf{m}^{(k+1)} = \mathbf{m}^{(k)} - \alpha\,\Gamma^T(\Gamma\mathbf{m}^{(k)} - \mathbf{m}^{mig}), \qquad (4)$$

where $\alpha$ is the step length, $\Gamma$ is symmetric, and $\mathbf{m}^{(k)}$ is the solution at the $k$th iteration. Typically, a regularization term is used to stabilize the solution, e.g., the sparse constraint that will be introduced in the next subsection.
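As a minimal illustration (our own sketch, not code from this work), the gradient iteration above can be checked on a small synthetic symmetric positive-definite Hessian; because $\Gamma$ is symmetric positive definite, the simple update $\mathbf{m}^{(k+1)} = \mathbf{m}^{(k)} - \alpha(\Gamma\mathbf{m}^{(k)} - \mathbf{m}^{mig})$ converges to $\Gamma^{-1}\mathbf{m}^{mig}$ for a small enough step length. All names and sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
L = rng.standard_normal((N, N))
Gamma = L.T @ L + np.eye(N)              # symmetric positive-definite "Hessian"
m_true = rng.standard_normal(N)          # stand-in reflectivity
m_mig = Gamma @ m_true                   # migration image m_mig = Gamma m

alpha = 1.0 / np.linalg.norm(Gamma, 2)   # step length no larger than 1 / lambda_max
m = np.zeros(N)
for _ in range(5000):
    m = m - alpha * (Gamma @ m - m_mig)  # gradient step toward Gamma^{-1} m_mig
```

After enough iterations, `m` approaches `m_true`, mimicking the formal solution of equation 3 without ever inverting the Hessian.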

SLSM
The SLSM in the image domain is defined as finding the reflectivity coefficients $m_i$ in the $N \times 1$ vector $\mathbf{m}$ that minimize the objective function $\epsilon$ (Perez et al., 2013):

$$\epsilon = \tfrac{1}{2}\|\Gamma\mathbf{m} - \mathbf{m}^{mig}\|_2^2 + \lambda S(\mathbf{m}), \qquad (5)$$

where $\Gamma = \mathbf{L}^T\mathbf{L}$ represents the migration Green's function (Schuster and Hu, 2000), $\lambda > 0$ is a positive scalar, $\mathbf{m}^{mig} = \mathbf{L}^T\mathbf{d}$ is the migration image, and $S(\mathbf{m})$ is a sparseness function. For example, the sparseness function might be

$$S(\mathbf{m}) = \|\mathbf{m}\|_1, \qquad (6)$$

and the solution can be approximated by an iterative gradient-descent method:

$$m_i^{(k+1)} = m_i^{(k)} - \alpha\big[\Gamma^T(\Gamma\mathbf{m}^{(k)} - \mathbf{m}^{mig})\big]_i - \alpha\lambda\, S(\mathbf{m})'_i. \qquad (7)$$

Here, $S(\mathbf{m})'_i$ is the derivative of the sparseness function with respect to the model parameter $m_i$, and the step length is $\alpha$. Vectors and matrices are denoted by boldface lowercase and uppercase letters, respectively. When $S(\mathbf{m}) = \|\mathbf{m}\|_1$, the iterative solution in equation 7 can be recast as

$$\mathbf{m}^{(k+1)} = \mathrm{soft}\big(\mathbf{m}^{(k)} - \alpha\,\Gamma^T(\Gamma\mathbf{m}^{(k)} - \mathbf{m}^{mig}),\; \alpha\lambda\big), \qquad (8)$$

where soft is the two-sided soft thresholding function (Papyan et al., 2016) derived in Appendix B (see equation B-4). Here, $\Gamma = \mathbf{L}^T\mathbf{L}$ is computed by solving the wave equation to get the forward modeled field and backpropagating the data by a numerical solution to the adjoint wave equation. Equation 8 is similar to the forward modeling operation associated with the first layer of the neural network in Figure 1. That is, set $k = 0$, $\mathbf{m}^{(0)} = \mathbf{0}$, $\alpha = 1$, and let the input vector be the scaled residual vector $\mathbf{r} = -(\Gamma\mathbf{m}^{(0)} - \mathbf{m}^{mig}) = \mathbf{m}^{mig}$ so that the first-iterate solution can be compactly represented by

$$\mathbf{m}^{(1)} = \mathrm{soft}(\Gamma^T\mathbf{m}^{mig}, \lambda). \qquad (9)$$

Here, the input vector $\mathbf{r} = \mathbf{m}^{mig}$ is multiplied by the matrix $\Gamma^T$ to give $\mathbf{z} = \Gamma^T\mathbf{r}$, and the elements of $\mathbf{z}$ are then thresholded and shrunk to give the output $\mathbf{m} = \mathrm{soft}(\mathbf{z}, \lambda)$. If we impose a positivity constraint on $\mathbf{z}$ and a shrinkage constraint so that $\lambda$ is small, then the soft thresholding function becomes a one-sided threshold function, also known as the rectified linear unit or ReLU function.
To simplify the notation, the $\mathrm{soft}(\mathbf{z}, \lambda)$ function or $\mathrm{ReLU}(\mathbf{z})$ function is replaced by $\sigma_\lambda(\mathbf{z})$ so that equation 9 is given by

$$\mathbf{m}^{(1)} = \sigma_\lambda(\Gamma^T\mathbf{m}^{mig}). \qquad (10)$$

For the ReLU function, there is no shrinkage, so $\lambda = 0$. The matrix $\Gamma$ in equation 10 is computed by forward and backward solutions to the wave equation. Unlike a neural network, the physics of wave propagation is included with the SLSM solution in equation 8.
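The soft thresholding and ReLU activations referred to above can be sketched in a few lines of numpy (an illustrative sketch; the function names are ours):

```python
import numpy as np

def soft(z, lam):
    """Two-sided soft thresholding: shrink by lam and zero out the band [-lam, lam]."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def relu(z):
    """One-sided threshold with no shrinkage, i.e., the lam -> 0 limit of soft
    thresholding under a positivity constraint."""
    return np.maximum(z, 0.0)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(soft(z, 1.0))   # large entries shrink toward zero; |z| <= 1 maps to 0
print(relu(z))        # negative entries clipped, positive entries unchanged
```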

NNLSM
We now propose the neural network version of SLSM that finds the $\Gamma^*$ and $\mathbf{m}^*$ that minimize equation 5, which is equivalent to the convolutional sparse coding problem. We denote the optimal solution $\mathbf{m}^*$ as the NNLSM image. Here, we assume that the migration image $\mathbf{m}^{mig}$ can be decomposed into components of the form $\Gamma_1\mathbf{m}_1$, where $\mathbf{m}_1$ represents a sparse quasi-reflectivity structure for the first CNN layer in Figure 1 and $\Gamma_1$ has a convolutional structure. The solution can be found by the alternating direction method of multipliers, either in the Fourier domain (Heide et al., 2015) or in the space domain (Papyan et al., 2017b), which alternates between finding $\Gamma^*$ (the dictionary-learning problem) and finding $\mathbf{m}^*$ (the sparse-pursuit problem). The key difference between least-squares sparse inversion and NNLSM is that $\Gamma$ is not computed by numerical solutions to the wave equation. Instead, the coefficients of $\Gamma$ are computed by the usual gradient-descent learning algorithm of a CNN. For this reason, we denote $\mathbf{m}^*$ as the quasi-reflectivity distribution and $\Gamma^*$ as the quasi-migration Green's function.
Appendix C provides the general solution for NNLSM for a single-layer neural network, where the optimal $\Gamma^*$ is composed of the quasi-migration Green's functions, which are denoted as convolutional filters in machine-learning terminology (Liu and Schuster, 2018, 2019). Each filter is used to compute a feature map that corresponds to a subimage of quasi-reflection coefficients in the context of LSM.

Figure 1. The forward modeling procedure for a multilayer CNN is equivalent to the multilayer sparse solution.
We now compute the NNLSM image for a 1D model, where we assume $\mathbf{m}^{mig}$ is an $N$-dimensional vector that can be expressed as

$$\mathbf{m}^{mig} = \sum_{i=1}^{k_0} \boldsymbol{\gamma}_i * \mathbf{m}'_{1i}. \qquad (11)$$

Here, $\boldsymbol{\gamma}_i$ is the $i$th local filter with length $n_0$, $\mathbf{m}'_{1i}$ is the $i$th feature map, $*$ denotes the convolution operator, and $k_0$ is the number of filters. Alternatively, following Figure 2a, equation 11 can be written in matrix form as $\mathbf{m}^{mig} = \Gamma_1\mathbf{m}'_1$ (Papyan et al., 2017a), where $\Gamma_1$ is a convolutional matrix whose columns consist of the $k_0$ filters with all of their shifts. The term $\Gamma'_1$ is a concatenation of banded and circulant matrices (we shall assume throughout this paper that boundaries are treated by a periodic continuation, which gives rise to the cyclic structure), which is the same as $\Gamma_1$ except that the order of the columns is different. The term $\mathbf{m}'_1$ is a concatenation of the feature-map vectors $\mathbf{m}'_{1i}$ into a single long vector for $i = 1, 2, \ldots, k_0$. The advantage of NNLSM is that only inexpensive matrix-vector multiplications are used and no expensive solutions to the wave equation are needed for backward and forward wavefield propagation. As will be seen later, convolutional filters that appear to be coherent noise can be excluded to denoise the migration image.
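Equation 11 and its matrix form can be verified numerically. The following sketch (illustrative sizes and names, with the periodic boundaries assumed in the text) builds a 1D migration image from $k_0$ filters and sparse feature maps, and checks that the convolutional-matrix form $\Gamma_1\mathbf{m}'_1$ reproduces it:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n0, k0 = 64, 9, 3                        # image length, filter length, number of filters
gammas = [rng.standard_normal(n0) for _ in range(k0)]
maps = []
for _ in range(k0):                         # sparse quasi-reflectivity feature maps
    m1 = np.zeros(N)
    m1[rng.choice(N, size=3, replace=False)] = rng.standard_normal(3)
    maps.append(m1)

def circ_conv(g, m):
    """Circular convolution of a short filter g with a length-N map m."""
    g_pad = np.pad(g, (0, len(m) - len(g)))
    return np.real(np.fft.ifft(np.fft.fft(g_pad) * np.fft.fft(m)))

# equation 11: m_mig = sum_i gamma_i * m1_i
m_mig = sum(circ_conv(g, m) for g, m in zip(gammas, maps))

# matrix form: the columns of Gamma1 are all circular shifts of each filter
cols = [np.roll(np.pad(g, (0, N - n0)), s) for g in gammas for s in range(N)]
Gamma1 = np.stack(cols, axis=1)             # N x (k0 * N) convolutional matrix
m1_long = np.concatenate(maps)              # feature maps stacked into one long vector
assert np.allclose(Gamma1 @ m1_long, m_mig)
```

Only cheap matrix-vector (or FFT) products appear here, which is the cost advantage claimed in the text.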

Multilayer NNLSM
Multilayer NNLSM is a natural extension of single-layer NNLSM. For NNLSM, the migration image $\mathbf{m}^{mig}$ can be expressed as $\mathbf{m}^{mig} = \Gamma_1\mathbf{m}_1$ (Figure 2a), where there are $k_0$ filters in $\Gamma_1$ and $k_0$ sub-quasi-reflectivity images in $\mathbf{m}_1$. Following Sulam et al. (2018), we can cascade this model by imposing a similar assumption on the sparse representation $\mathbf{m}_1$, i.e., $\mathbf{m}_1 = \Gamma_2\mathbf{m}_2$, for a corresponding convolutional matrix $\Gamma_2$ with $k_1$ local filters and a sparse sub-quasi-reflectivity image $\mathbf{m}_2$, as depicted in Figure 2b. In this case, the filter size is $n_1 \times k_0$ and there are $k_1$ sub-quasi-reflectivity images in $\mathbf{m}_2$.
Similar to the derivation by Papyan et al. (2017a) and Sulam et al. (2018), the multilayer NNLSM problem is defined as finding the filters $\Gamma_i$ and quasi-reflectivity images $\mathbf{m}_i$ such that

$$\mathbf{m}^{mig} = \Gamma_1\mathbf{m}_1, \quad \mathbf{m}_1 = \Gamma_2\mathbf{m}_2, \quad \ldots, \quad \mathbf{m}_{N-1} = \Gamma_N\mathbf{m}_N, \qquad (12)$$

subject to each $\mathbf{m}_i$ being sparse, where $\Gamma_i$ is the $i$th Hessian matrix in the $i$th layer. The first-iterate solution to the above system of equations can be cast in a form similar to equation 10, except we have

$$\mathbf{m}_i^{(1)} = \sigma_\lambda(\Gamma_i^T\mathbf{m}_{i-1}^{(1)}), \quad \mathbf{m}_0^{(1)} = \mathbf{m}^{mig}, \qquad (13)$$

which is a repeated concatenation of the two operations of a multilayered neural network: matrix-vector multiplication followed by a thresholding operation. In all cases, we use a CNN in which different filters are applied to the input from the previous layer to give feature maps associated with the next layer, as shown in Figure 1.
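The repeated multiply-and-threshold structure of the first-iterate multilayer solution can be sketched as follows (a hypothetical dense-matrix stand-in for the convolutional $\Gamma_i$; all layer widths and the threshold value are illustrative):

```python
import numpy as np

def soft(z, lam):
    """Two-sided soft thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(2)
sizes = [64, 48, 32, 16]                     # layer widths (made up for the demo)
Gammas = [rng.standard_normal((sizes[i], sizes[i + 1])) / np.sqrt(sizes[i])
          for i in range(len(sizes) - 1)]

m = rng.standard_normal(sizes[0])            # stand-in for the migration image m_mig
for Gamma in Gammas:                         # m_i = soft(Gamma_i^T m_{i-1}, lam)
    m = soft(Gamma.T @ m, 0.1)
```

The loop body is exactly the two operations named in the text: a matrix-vector multiplication followed by a thresholding operation, one pair per layer.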
The migration image $\mathbf{m}^{mig}$ can be approximated as $\mathbf{m}^{mig} = \Gamma_1\Gamma_2\cdots\Gamma_N\mathbf{m}_N$. We refer to

$$\Gamma^{(i)} = \Gamma_1\Gamma_2\cdots\Gamma_i \qquad (14)$$

as the effective filter at the $i$th level. The next section tests the effectiveness of NNLSM on synthetic data and field data.

NUMERICAL RESULTS
We now present numerical simulations of NNLSM. Instead of only determining the optimal reflectivity $\mathbf{m}$ as computed by SLSM, the NNLSM method computes the quasi-reflectivity $\mathbf{m}$ and the elements of the Hessian matrix $\Gamma = \mathbf{L}^T\mathbf{L}$. Each block of $\Gamma$ is interpreted as the segment spread function (SSF) of the migration operator rather than the point spread function (PSF). If the actual Green's functions are used to construct $\Gamma$, then each column of the Hessian matrix is the point scatterer response of the migration operator (Schuster and Hu, 2000). In contrast, the NNLSM Hessian is composed of blocks, where each block is the segment scatterer response of the migration operator. An example will be shown later in which reflections from a segment of the reflector are migrated to give the migration segment response of the migration operator. The computational cost of computing SSFs is several orders of magnitude less than that of PSFs because no solutions to the wave equation are needed. However, the penalty is that the resulting solution $\mathbf{m}$ is not the true reflectivity, but a sparse representation of it that we denote as the quasi-reflectivity distribution.
Using the terminology of neural networks, we can also denote the sparse sub-quasi-reflectivity images as feature maps. Each block in $\Gamma$ will be denoted as a filter. Therefore, the vector output of $\Gamma\mathbf{m}$ can be interpreted as a sum of filter vectors $\boldsymbol{\gamma}_i$ weighted by the coefficients in $\mathbf{m}$, where $\boldsymbol{\gamma}_i$ is the $i$th column vector of $\Gamma$.

Three-layer velocity model
The interpretation of feature maps and filters can be understood by computing them for the model in Figure 3a. The grid size of the model is 101 × 101, and the grid interval is 10 m in the x- and z-directions. There are 26 shots evenly spaced at a distance of 40 m on the surface, and each shot is recorded by 101 receivers with a sampling interval of 10 m. Figure 3b shows the reverse time migration (RTM) image.
The first test is for a 1D model in which we extract the image located at X = 0.5 km, which is displayed as the blue curve in Figure 3c. The red curve in Figure 3c represents the reflectivity model. Assume that there is only one filter in $\Gamma$ and that it extends over a depth of 400 m (41 grid points). We now compute the NNLSM image by finding the optimal $\mathbf{m}$ and $\Gamma$ with the two-step iterative procedure denoted as the alternating-descent method (Liu and Schuster, 2018; Liu et al., 2018). The computed filter $\boldsymbol{\gamma}_i$ is shown in Figure 4a, where the phase of the filter $\boldsymbol{\gamma}_i$ is nonzero. If we use a filter with a nonzero time lag to calculate its feature map $\mathbf{m}$, the phases of the feature map and the true reflectivity $\mathbf{m}$ will be different. Next, we perform a 2D test in which the input is the 2D migration image in Figure 3b. Three 35 × 35 (grid point) filters are learned (see Figure 5a). The modified filters are shown in Figure 5b. Appendix D describes how we align the filters by using the crosscorrelation method. The feature maps of these three filters are displayed in Figure 6a-6c. Figure 6d shows the sum of these three feature maps. It is evident that the stacked feature maps can estimate the correct locations of the reflectivity spikes.

SEG/EAGE salt model
The multilayer NNLSM procedure (see equation 12) is now applied to the migration image associated with the 2D SEG/EAGE salt velocity model in Figure 7a. The model has 101 grid points in the z- and x-directions. The grid interval is 40 m in the x-direction and 20 m in the z-direction. Figure 7b shows the RTM image. The multilayer NNLSM consists of three convolutional layers: the first contains 15 basis functions, i.e., filters, of size 11 × 11 grid points; the second consists of 15 basis functions with dimensions 11 × 11 × 15; and the third contains 15 basis functions with dimensions 11 × 11 × 15. Equation 12 is solved for $\mathbf{m}_i$ and $\Gamma_i$ ($i = 1, 2, 3$) by the iterative alternating-descent method. The multilayered structure is shown in Figure 8, where the black dots in $\mathbf{m}_i$ represent the nonzero values of the quasi-reflectivity distribution. The effective basis functions computed for these layers are shown in Figure 7c-7e, where the yellow, red, and green boxes indicate the sizes of the basis functions, which can be considered quasi-migration Green's functions. The basis functions of the first layer $\Gamma_1$ contain very simple small-dimensional edges, which are called "atoms" by Sulam et al. (2018). The second group of basis functions $\Gamma_2$ combines a few atoms from $\Gamma_1$ to create slightly more complex edges, junctions, and corners in the effective basis functions in $\Gamma^{(2)}$. Finally, $\Gamma_3$ combines molecules from $\Gamma^{(2)}$ to reconstruct the more complex parts of the migration image. The corresponding stacked coefficient images, also known as feature maps, are shown in Figure 7f-7h, which give the quasi-reflectivity distributions. The reconstructed migration images are shown in Figure 7i-7k.
For comparison, we computed the standard LSM image using the deblurring method described in Chen et al. (2017, 2019). Here, the deblurring filter size is 17 × 17 grid points (the black boxes in Figure 9), computed for a 50 × 50 grid (the red boxes in Figure 9) of evenly spaced point scatterers with the same migration velocity model as that used for the data migration in Figure 7a. The standard LSM images for the 1st and 50th iterations are shown in Figure 10b and 10c, respectively, next to the NNLSM image in Figure 10d. The NNLSM image is better resolved than the LSM image, although there are discontinuities in some of the NNLSM interfaces that are not seen in the LSM image. Some of the detailed geology is lost in the NNLSM image, as seen in the wiggly interface in the red rectangular area of Figure 10. A practical application of the NNLSM image is that it might serve as a superresolved attribute image that can be combined with other attributes to delineate the geology. For example, combining the depth slice of the NNLSM image with a spectral decomposition image (Aarre, 2016) can help delineate the lithologic edges of meandering channels.
NNLSM can filter out random and coherent noise in the migration image by eliminating the noisy learned basis functions and their coefficients in the NNLSM image. For example, Figure 11a shows the RTM image for a sparse acquisition geometry, so the image contains a strong acquisition footprint. The reconstructed migration image in Figure 11b shows significant mitigation of the noise in Figure 11a. However, the migration swing noise is still prominent near the red arrows in Figure 11b. Such noise is reconstructed from the noisy basis function shown in Figure 12a and the coefficients in Figure 12b. Figure 12c is the image reconstructed by correlating the basis function in Figure 12a with the coefficients in Figure 12b. After filtering out the noisy basis functions, the reconstructed image is shown in Figure 11c, which is free from aliasing noise at the locations indicated by the red arrows.

North Sea data
We apply the NNLSM method to field data collected in the North Sea (Schroot and Schüttenhelm, 2003), where the time migration image is shown in Figure 13a. The time axis is gridded with 213 evenly spaced points, and there are 301 grid points along the x-axis. We compute 21 convolutional basis functions of 13 × 5 grid points, i.e., filters $\boldsymbol{\gamma}_i$ for $i = 1, 2, \ldots, 21$, by the NNLSM procedure (see Figure 13b). These filters approximate the dip-filtered migration Green's functions, and the sizes of the basis functions are marked by the yellow boxes in Figure 13a and 13b. The stacked feature maps (quasi-reflectivity distribution) are displayed in Figure 13c. It is evident that the stacked feature maps provide a high-resolution migration image. The migration image reconstructed from the learned filters and feature maps is shown in Figure 13d and contains less noise.
Finally, we apply NNLSM to a time slice of the migration image, shown in Figure 14a, where the image size is 301 × 301 grid points. Figure 14b shows the 21 filters of 13 × 5 grid points estimated by the NNLSM procedure. The stacked feature map is displayed in Figure 14c, which could be used as an attribute image for high-resolution delineation of geologic bodies. The reconstructed migration image is shown in Figure 14d, where we again see less noise.

DISCUSSION
The forward modeling for a multilayered neural network is shown to be equivalent to a single-iterate solution of a multilayered LSM problem. This assumes positivity and shrinkage constraints on the soft thresholding operation, so that the activation function reduces to the ReLU function. This equivalence relates the physics of seismic imaging to architectural features of the neural network.
The size of the filters in the first layer should be approximately the same as that of the migration Green's function for that model. Experiments with numerical models suggest that this size is approximately one to two wavelengths. In this case, the filter is interpreted as an approximation to the migration Green's function, except that it is the response of a reflecting segment. Thus, we interpret the approximate migration Green's function as a migration SSF rather than a migration PSF. Sulam et al. (2018) classify each feature in the first layer as an atom, which here takes on the role of an SSF.
The output of the first layer provides the small-scale, i.e., high-wavenumber, features associated with the input data. For an input migration image, the feature maps of the first layer resemble sub-quasi-reflectivity maps of the subsurface. Adding the sub-quasi-reflectivity maps together gives a close approximation to the actual reflectivity model, as shown in Figures 6d and 7f.
The output of the second layer is a weighted sum of the first-layer features, which creates sparser feature maps. Sulam et al. (2018) classify the concatenation of the filters from the first and second layers as analogous to the creation of molecules from different atoms (see equation 14). In the migration problem, the resulting filters are SSFs for even larger segments of the original reflector boundaries. The feature maps of the third layer are a weighted sum of the second layer's features, producing even sparser feature maps. For migration, the final feature maps are very sparse, whereas the concatenated filters are associated with large-scale reflectivity structures in the migration image.
The computational cost of computing NNLSM images is significantly less than that of LSM images because no solutions of the wave equation are needed. For example, consider 2D finite-difference time-domain (FDTD) forward modeling of the acoustic wave equation with an eighth-order scheme in space and a second-order scheme in time; its computational complexity is $O(n_t n^2)$ for one shot, where $n$ is the number of grid points in one direction and $n_t$ is the number of time steps. According to Plessix (2007), the number of time steps $n_t$ is approximately equal to $30n$ in order to satisfy the stability condition and to ensure that there is enough recording time when $v_{max}/v_{min} \simeq 3$; here, $v_{max}$ and $v_{min}$ are the maximum and minimum velocities, respectively. So, the complexity of 2D FDTD forward modeling of the acoustic equation is $O(n_s n^3)$, where $n_s$ is the number of shots. The computational complexity of LSRTM is $O(6 n_s N_{iter} n^3)$ (Schuster, 2017), where $N_{iter}$ is the number of iterations. For NNLSM, the complexity is $O(N_{iter} n^2 \log n)$ according to Heide et al. (2015), and it can be reduced to $O(N_{iter} n^2)$ if a local block coordinate-descent algorithm is used (Zisselman et al., 2019).
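A back-of-the-envelope comparison of these complexity estimates (with illustrative values of $n$, $n_s$, and $N_{iter}$ that we have chosen for the demo, not taken from the examples above) shows the scale of the savings:

```python
import math

# Illustrative sizes: n grid points per axis, n_s shots, N_iter iterations.
# LSRTM  ~ O(6 n_s N_iter n^3)   (wave-equation solves)
# NNLSM  ~ O(N_iter n^2 log n)   (FFT-based convolutional sparse coding)
n, n_s, N_iter = 1000, 100, 20
cost_lsrtm = 6 * n_s * N_iter * n ** 3
cost_nnlsm = N_iter * n ** 2 * math.log2(n)
ratio = cost_lsrtm / cost_nnlsm
print(f"LSRTM / NNLSM operation-count ratio ~ {ratio:.1e}")
```

For these sizes the ratio exceeds four orders of magnitude, consistent with the qualitative claim in the text.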
The 1D NNLSM can be interpreted as a blind deconvolution (BD) problem in seismic data processing (Kaaresen and Taxt, 1998; Bharadwaj et al., 2018). It can be seen in Figure 5 that the filter of NNLSM plays the role of the source wavelet in BD and that the coefficients of NNLSM are the quasi-reflectivity coefficients of BD. However, NNLSM can have more than one filter, and the filters can be 2D or 3D. We show the relationship among BD, SLSM, NNLSM, and CNN in Figure 15.
NNLSM is an unsupervised learning method. Compared to a supervised learning method, it does not depend on a huge amount of training data. However, it may need human intervention to inspect the quality of the quasi-reflectivity images, and we need to align the filters to get a more accurate estimate of the reflectivity distribution.
In Figure 14, we apply 2D NNLSM to a time slice of the 3D North Sea data. The 2D NNLSM method can be extended to a 3D implementation to learn a set of 3D filters. Incorporating the third dimension of information from 3D filters will likely lead to better continuity of the reflectors in Figure 14c. However, the computational cost will increase by more than a multiplicative factor of $n$.

CONCLUSIONS
NNLSM finds the optimal quasi-reflectivity distribution and quasi-migration Green's functions that minimize a sum of migration misfit and sparsity regularization functions. The advantages of NNLSM over standard LSM are that its computational cost is significantly less than that of LSM and that it can be used for filtering coherent and incoherent noise in migration images. A practical application of the NNLSM image is as an attribute map that provides superresolution imaging of the layer interfaces. This attribute image can be combined with other attributes to delineate the structure and lithology in depth/time slices of migration images. Its disadvantage is that the NNLSM quasi-reflectivity image is only an approximation to the actual reflectivity distribution.

A significant contribution of our work is to show that the filters and feature maps of a multilayered CNN are analogous to migration Green's functions and reflectivity distributions. We now have a physical interpretation of the filters and feature maps in a deep CNN in terms of the operators for seismic imaging. Such an understanding has the potential to lead to better architecture design of the network and to extend its application to waveform inversion. In answer to Donoho's plea for more rigor, NNLSM represents a step forward in establishing the mathematical foundation of CNNs in the context of LSM.

ACKNOWLEDGMENTS
The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia. We are grateful to the sponsors of the Center for Subsurface Imaging and Modeling Consortium for their financial support. For computer time, this research used the resources of the Supercomputing Laboratory at KAUST and the IT Research Computing Group. We thank them for providing the computational resources required for carrying out this work.
APPENDIX A MIGRATION GREEN'S FUNCTION

The kernel of the Hessian $\Gamma = \mathbf{L}^T\mathbf{L}$ can be written as equation A-1, where $\eta$ represents model parameters and constants. Here, we implicitly assume a normalized source wavelet in the frequency domain, and $D_{model}$ and $D_{data}$ represent the sets of coordinates in the model and data spaces, respectively. The term $G(\mathbf{x}'|\mathbf{x}) = e^{i\omega\tau_{\mathbf{x}\mathbf{x}'}}/\|\mathbf{x} - \mathbf{x}'\|$ is the Green's function for a source at $\mathbf{x}$ and a receiver at $\mathbf{x}'$ in a smoothly varying medium (if the source and receiver are coincident at $\mathbf{x}$, then the zero-offset trace is represented by the squared Green's function $G(\mathbf{x}|\mathbf{x}')^2$). The traveltime $\tau_{\mathbf{x}\mathbf{x}'}$ is for a direct arrival to propagate from $\mathbf{x}$ to $\mathbf{x}'$.
The physical interpretation of the kernel $\Gamma(\mathbf{x}'|\mathbf{x})$ is that it is the migration operator's response at $\mathbf{x}'$ to a point scatterer at $\mathbf{x}$ (this assumes that the zero-offset trace is generated with an impulsive point source in a smoothly varying background velocity model and then migrated by a poststack migration operation; it is always assumed that the direct arrival is muted and there are no multiples), otherwise known as the migration Green's function (Schuster and Hu, 2000). It is analogous to the PSF of an optical lens for a point light source at $\mathbf{x}$ in front of the lens and its optical image at $\mathbf{x}'$ behind the lens on the image plane. In discrete form, the modeling term $[\Gamma\mathbf{m}]_i$ in equation A-1 can be expressed as

$$[\Gamma\mathbf{m}]_i = \sum_j \Gamma(\mathbf{x}_i|\mathbf{z}_j)\, m_j, \qquad (A\text{-}3)$$

with the physical interpretation that $[\Gamma\mathbf{m}]_i$ is the migration Green's function response at $\mathbf{x}_i$. An alternative interpretation is that $[\Gamma\mathbf{m}]_i$ is a weighted sum of basis functions $\Gamma(\mathbf{x}_i|\mathbf{z}_j)$, where the weights are the reflection coefficients $m_j$ and the summation is over the $j$ index.
We will now consider this last interpretation and redefine the problem as finding both the weights $m_j$ and the basis functions $\Gamma(\mathbf{x}_i|\mathbf{z}_j)$. This will be shown to be equivalent to the forward pass of a fully connected neural network.

APPENDIX B SOFT THRESHOLDING FUNCTION
Define the sparse inversion problem as finding the optimal value $\mathbf{x}^*$ that minimizes the objective function

$$\epsilon = \tfrac{1}{2}\|\mathbf{x} - \mathbf{z}\|_2^2 + \lambda\|\mathbf{x}\|_1, \qquad (B\text{-}1)$$

where the $L_1$ norm demands sparsity in the solution $\mathbf{x}$. An example is where $\mathbf{z}$ is a noisy $M \times N$ image such that $\mathbf{z} = \mathbf{x} + \text{noise}$, and we seek the optimal vector $\mathbf{x}$ that satisfies equation B-1. Here, the noisy $M \times N$ image has been flattened into the tall $MN \times 1$ vector $\mathbf{z}$.
The stationary condition for equation B-1 is

$$\mathbf{x} - \mathbf{z} + \lambda\,\partial\|\mathbf{x}\|_1 = \mathbf{0}, \qquad (B\text{-}2)$$

where the subgradient of the $L_1$ norm satisfies

$$\big[\partial\|\mathbf{x}\|_1\big]_i = \mathrm{sgn}(x_i) \;\text{ for } x_i \ne 0, \quad \big[\partial\|\mathbf{x}\|_1\big]_i \in [-1, 1] \;\text{ for } x_i = 0. \qquad (B\text{-}3)$$

Equations B-2 and B-3 can be combined to give the optimal $\mathbf{x}^*$ expressed as the two-sided ReLU (soft thresholding) function

$$x_i^* = \mathrm{soft}(z_i, \lambda) = \mathrm{sgn}(z_i)\max(|z_i| - \lambda,\, 0). \qquad (B\text{-}4)$$

More generally, the iterative soft-thresholding algorithm (ISTA) finds the minimizer of $\tfrac{1}{2}\|\mathbf{A}\mathbf{x} - \mathbf{z}\|_2^2 + \lambda\|\mathbf{x}\|_1$ by repeating

$$\mathbf{x}^{(k+1)} = \mathrm{soft}\big(\mathbf{x}^{(k)} - \alpha\,\mathbf{A}^T(\mathbf{A}\mathbf{x}^{(k)} - \mathbf{z}),\; \alpha\lambda\big). \qquad (B\text{-}5)$$

There are several more recently developed algorithms that have faster convergence properties than ISTA. For example, fast ISTA (FISTA) has quadratic convergence (Beck and Teboulle, 2009).
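A minimal ISTA sketch for the generic problem $\min_{\mathbf{x}} \tfrac{1}{2}\|\mathbf{A}\mathbf{x} - \mathbf{z}\|_2^2 + \lambda\|\mathbf{x}\|_1$ is given below (our own illustrative example, not from this work); on a small synthetic problem it recovers a planted sparse vector:

```python
import numpy as np

def soft(v, t):
    """Two-sided soft thresholding, equation B-4."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(3)
M, N = 80, 40
A = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[[3, 17, 31]] = [2.0, -1.5, 1.0]       # planted sparse solution
z = A @ x_true

step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of the gradient
lam = 0.1
x = np.zeros(N)
for _ in range(500):
    # x <- soft(x - step * A^T (A x - z), step * lam)
    x = soft(x - step * (A.T @ (A @ x - z)), step * lam)
```

Each pass is one application of equation B-5: a gradient step on the quadratic misfit followed by the shrinkage of equation B-4.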

APPENDIX C NEURAL NETWORK LEAST-SQUARES MIGRATION
The NNLSM algorithm in the image domain is defined as solving for the basis functions $\tilde{\Gamma}(\mathbf{x}_i|\mathbf{x}_j)$ and coefficients $\tilde{m}_j$ that minimize the objective function defined in equation 5, whereas SLSM finds only the optimal reflectivity:

$$\epsilon = \tfrac{1}{2}\|\tilde{\Gamma}\tilde{\mathbf{m}} - \mathbf{m}^{mig}\|_2^2 + \lambda S(\tilde{\mathbf{m}}), \qquad (C\text{-}1)$$

where now both $\tilde{\Gamma}^*$ and $\tilde{\mathbf{m}}^*$ are to be found. The functions with tildes are mathematical constructs that are not necessarily identical to those based on the physics of wave propagation as in equation 5. The explicit matrix-vector form of the objective function in equation C-1 is given by

$$\epsilon = \frac{1}{2}\sum_i\Big(\sum_j \tilde{\Gamma}(\mathbf{x}_i|\mathbf{z}_j)\,\tilde{m}_j - m_i^{mig}\Big)^2 + \lambda S(\tilde{\mathbf{m}}), \qquad (C\text{-}2)$$

and its derivative with respect to $\tilde{\Gamma}(\mathbf{x}_{i'}|\mathbf{z}_{j'})$ is given by

$$\frac{\partial\epsilon}{\partial\tilde{\Gamma}(\mathbf{x}_{i'}|\mathbf{z}_{j'})} = \Big(\sum_j \tilde{\Gamma}(\mathbf{x}_{i'}|\mathbf{z}_j)\,\tilde{m}_j - m_{i'}^{mig}\Big)\tilde{m}_{j'}. \qquad (C\text{-}3)$$

The iterative solution of equation C-1 is given in two steps (Olshausen and Field, 1996):

1. Iteratively estimate $\tilde{m}_i$ by the gradient-descent formula used with SLSM.
2. Update the basis functions by gradient descent:

$$\tilde{\Gamma}(\mathbf{x}_{i'}|\mathbf{z}_{j'})^{(k+1)} = \tilde{\Gamma}(\mathbf{x}_{i'}|\mathbf{z}_{j'})^{(k)} - \alpha\,\frac{\partial\epsilon}{\partial\tilde{\Gamma}(\mathbf{x}_{i'}|\mathbf{z}_{j'})}. \qquad (C\text{-}4)$$

It is tempting to think of $\tilde{\Gamma}(\mathbf{x}|\mathbf{x}')$ as the migration Green's function and $\tilde{m}_i$ as a component of reflectivity. However, we cannot submit to this temptation, and so we must consider, unlike in the SLSM algorithm, that $\tilde{\Gamma}(\mathbf{x}|\mathbf{x}')$ is a sparse basis function and $\tilde{m}_i$ is its coefficient. To get the true reflectivity, we should equate equation A-3 to $\sum_j \tilde{\Gamma}(\mathbf{x}_i, \mathbf{x}_j)\,\tilde{m}_j$ and solve for $m_j$.
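The two-step alternating-descent procedure can be sketched with a toy dense dictionary (an illustrative stand-in that we constructed; the actual $\tilde{\Gamma}$ is convolutional and is solved with ADMM or block coordinate descent). A sparse-pursuit update of $\tilde{\mathbf{m}}$ alternates with a gradient update of the basis functions, whose columns are renormalized after each step:

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(4)
N, K = 60, 8
# a planted dictionary and sparse code generate the "migration image"
Gamma_true = rng.standard_normal((N, K))
Gamma_true /= np.linalg.norm(Gamma_true, axis=0)
code = np.zeros(K)
code[[0, 3, 6]] = [1.5, -1.0, 0.8]
m_mig = Gamma_true @ code

Gamma = rng.standard_normal((N, K))          # random initial basis functions
Gamma /= np.linalg.norm(Gamma, axis=0)
m = np.zeros(K)
lam = 0.05
for _ in range(50):
    # step 2 (equation C-4): gradient update of the basis functions
    Gamma -= 0.1 * np.outer(Gamma @ m - m_mig, m)
    Gamma /= np.maximum(np.linalg.norm(Gamma, axis=0), 1e-12)
    # step 1: sparse pursuit of the coefficients for fixed Gamma (ISTA steps)
    step = 1.0 / np.linalg.norm(Gamma, 2) ** 2
    for _ in range(20):
        m = soft(m - step * (Gamma.T @ (Gamma @ m - m_mig)), step * lam)

resid = np.linalg.norm(Gamma @ m - m_mig) / np.linalg.norm(m_mig)
print(f"relative reconstruction error: {resid:.3f}")
```

The column renormalization is a common convention in dictionary learning that prevents the trivial scaling ambiguity between $\tilde{\Gamma}$ and $\tilde{\mathbf{m}}$.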

APPENDIX D ALIGNMENT OF THE FILTERS
To align the learned filters, we first choose a "target" filter, denoted as a 2D matrix $\mathbf{A}$ of size $M \times N$. Then, we align all of the other filters with the target filter through their crosscorrelation. For example, if we choose one filter denoted as matrix $\mathbf{B}$ with the same size as $\mathbf{A}$, we can compute the crosscorrelation matrix $\mathbf{C}$ with its elements defined as

$$C_{kl} = \sum_i\sum_j A_{ij}\, B_{i+k,\,j+l}, \qquad (D\text{-}1)$$

where the indices are treated periodically. The filter $\mathbf{B}$ is then circularly shifted by the lag $(k, l)$ that maximizes $C_{kl}$.
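For periodic boundaries, the crosscorrelation and the alignment shift can be computed with FFTs, as in this illustrative numpy sketch (the shift and sizes are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(5)
M = 16
A = rng.standard_normal((M, M))              # "target" filter
true_shift = (3, 5)                          # made-up lag for the demo
B = np.roll(A, true_shift, axis=(0, 1))      # B is a circularly shifted copy of A

# C[k, l] = sum_ij A[i, j] * B[i + k, j + l]  (periodic indices), via the FFT
C = np.real(np.fft.ifft2(np.conj(np.fft.fft2(A)) * np.fft.fft2(B)))
k, l = np.unravel_index(np.argmax(C), C.shape)

# undo the detected lag to align B with the target A
B_aligned = np.roll(B, (-k, -l), axis=(0, 1))
```

The peak of the crosscorrelation identifies the lag between the two filters, and the circular shift (compare Figure 17) removes it.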

Figure 6. Feature maps (a) 1, (b) 2, and (c) 3 for the filters shown in Figure 5. The stacked feature map is shown in (d). Here, the white lines show the locations of nonzero points and the yellow lines indicate the locations of the reflectivity distributions.

Figure 8. Decomposition of the migration image of the 2D SEG/EAGE salt model in terms of the multilayer filters $\Gamma_i$ and prestacked quasi-reflectivity coefficients $\mathbf{m}_i$, where the black dots in $\mathbf{m}_i$ represent the nonzero values.

Figure 9. (a) Reference model and (b) its migration image computed by the deblurring LSM method.

Figure 11. (a) RTM image, (b) the image reconstructed from all of the basis functions, and (c) the image reconstructed with selected basis functions.

Figure 14. (a) Time slice of the migration image computed from the F3 offshore block data, (b) learned filters, (c) stacked feature maps, and (d) the migration image after filtering.

Figure 15. The relationship between sparse inversion and CNN, where the sparse inversion methods include sparse LSM, NNLSM, and BD.

Figure 17. Diagram that illustrates the circular shifting of the padded filter B.