Least-squares path-summation diffraction imaging using sparsity constraints

Diffraction imaging aims to emphasize small-scale subsurface heterogeneities, such as faults, pinch-outs, fracture swarms, channels, etc. and can help seismic reservoir characterization. The key step in diffraction imaging workflows is based on the separation procedure suppressing higher energy reflections and emphasizing diffractions, after which diffractions can be imaged independently. Separation results often contain crosstalk between reflections and diffractions and are prone to noise. We have developed an inversion scheme to reduce the crosstalk and denoise diffractions. The scheme decomposes an input full wavefield into three components: reflections, diffractions, and noise. We construct the inverted forward modeling operator as the chain of three operators: Kirchhoff modeling, plane-wave destruction, and path-summation integral filter. Reflections and diffractions have the same modeling operator. Separation of the components is done by shaping regularization. We impose sparsity constraints to extract diffractions, enforce smoothing along dominant local event slopes to restore reflections, and suppress the crosstalk between the components by local signaland-noise orthogonalization. Syntheticand field-data examples confirm the effectiveness of the proposed method.


INTRODUCTION
Diffraction imaging aims to emphasize small-scale subsurface heterogeneities such as faults, pinch-outs, fracture swarms, channels and other small features and can play an important role in seismic reservoir characterization (Landa, 2012). Reflections and diffractions coexist on seismic records, however the latter typically have significantly lower energy than the former (Klem-Musatov, 1994). Moreover, most of the conventional seismic data processing is tuned to imaging and enhancing reflected waves, whereas diffractions on seismic images are being partially suppressed (Kozlov et al., 2004). Therefore, reflection and diffraction separation procedure is a key step in diffraction imaging workflows (Harlan et al., 1984;Khaidukov et al., 2004;Fomel et al., 2007).
Seismic waves diffract on heterogeneities with a size of a wavelength, whereas reflected energy is associated with larger objects in the subsurface corresponding to the size of the first Fresnel zone (Moser and Howard, 2008). Therefore, separating diffractions and imaging them independently can produce an image with a higher spatial resolution than that of the conventional migration. On the other hand, this resolution is limited by using conventional migration operators, which are tailored for reflection processing, in diffraction imaging workflows. To improve diffraction imaging resolution, Gelius et al. (2013) propose to use singular value decomposition of the data in windows along the Kirchhoff diffraction-stacking aperture. This approach significantly improves resolution of diffraction images but requires a significant computational effort. Huang and Schuster (2014) propose to use resonant multiples to improve resolution of diffraction images. Their method requires nonlinear least-squares reverse time migration (RTM) or full wave inversion (FWI) updates of the background reflectivity after each iteration to account for the contribution of multiples.
Robust diffraction separation from noise and reflections while accounting for the crosstalk between them is of utmost importance for a resolution increase, which cannot be accomplished without enhancing subtle features with magnitudes close to the noise level. The problem of diffraction and noise separation was first addressed by Harlan et al. (1984). Klokov et al. (2010a,b); Klokov and Fomel (2012a); Burnett et al. (2015) enhance diffractions' signal-to-noise ratio in the dip-angle gathers' domain.
Least-squares migration (Nemeth et al., 1999;Ronen and Liner, 2000) can produce higher resolution than conventional migration algorithms and suppress artifacts intrinsic to the latter. Effectively an operator for diffraction imaging can be constructed in an iterative fashion using least-squares migration framework. A priori knowledge about the signature of a scatterer's response to a seismic wavefield can be incorporated into the inversion in the form of regularization and allow for diffraction separation from noise and reflections.  perform least-squares migration chained with plane wave destruction (PWD) (Fomel, 2002;Fomel et al., 2007) and path-summation integral filtering while enforcing sparsity in a diffraction model. Yu et al. (2016) utilize common-offset Kirchhoff least-squares migration with a sparse model regularization to emphasize diffractions. Yu et al. (2017a) extract diffractions based on plane wave destruction and dictionary learning for sparse representation. Yu Least-squares path summation et al. (2017b) use two separate modeling operators for diffractions and reflections and impose a sparsity constraint on diffractions. Decker et al. (2017) denoise diffractions by weighting the model with semblance estimated in dip-angle gather domain (DAG) in least-squares migration.
In this paper, we propose a workflow to decompose the input seismic data into reflections, diffractions and noise. To restore both reflections and diffractions simultaneously we extend the least-squares migration workflow presented by , in which the inverted forward modeling operator is the chain of Kirchhoff modeling, plane wave destruction and path-summation integral filter operators. The proposed chain of operators allows for proper incorporation of diffraction component into the inversion as opposed to the conventional least-squares migration dominated by stronger reflections . Incorporation of reflection model space into the inversion reduces the crosstalk between reflections and diffractions. Both reflections and diffractions are modeled using the same forward modeling operator. Separation into the components is accomplished by shaping regularization (Fomel, 2007b(Fomel, , 2008 and by local signal-and-noise orthogonalization (Chen and Fomel, 2015). Following  diffractions are penalized by sparsity constraints allowing to distinguish them from noise. Reflections are forced to be smooth along the dominant local slopes in the image domain. Local signaland-noise orthogonalization is applied additionally to further minimize the leakage of reflections into the diffraction image domain: local similarity (Fomel, 2007a) is computed between two models and events following reflected energy pattern are removed from the diffraction image. Although we develop and illustrate the efficiency of the method for the zero-offset time-migration case, all the developments can be extended to depth and pre-stack domains.
We start with a theory section describing all the components of the workflow. We describe the workflow and then demonstrate its efficiency on synthetic and data examples and discuss its comparative advantages and disadvantages.

METHOD
We start with a brief review of the path-summation imaging framework. Then we give a formulation of the least-squares inverse problem for diffraction imaging. We then proceed with a description of an optimization strategy and introduce a regularization framework.
Analytical path-summation migration Landa et al. (2006) present a path-integral imaging approach, which does not require knowledge of a velocity distribution in the subsurface. Burnett et al. (2011) adopt this approach and employ the principle of diffraction apex stationarity under migration velocity perturbation (Novais et al., 2006) to evaluate the diffraction path-summation Least-squares path summation migration integral * through stacking of constant migration velocity images. The images are generated by velocity continuation (VC) technique, which corresponds to the following transformation in the Fourier domain (Fomel, 2003): where v is VC velocity, Ω is the Fourier dual of σ (time axis after the transform σ = t 2 ), k is the wavenumber;P 0 (Ω, k) is input stacked and not migrated data, which corresponds to v = 0 andP (Ω, k, v) is a constant velocity image.
In our previous work (Merzlikin and Fomel, 2017a), we derived analytical formulae for path-summation integral evaluation based on the continuity of VC transformation with respect to velocity: stacking of constant migration velocity images is replaced by a direct integral expression. Analytical formulae allow us to treat path-summation imaging workflow as a simple filter in the frequency-wavenumber domain. Here, we extend Gaussian weighting path-summation migration scheme (Merzlikin and Fomel, 2017a) to account for variable velocity in the subsurface: we design a weighting function, which has a flat region in the middle and has a Gaussian decay for the tail-elimination region as opposed to a curve with one most probable velocity value in regular Gaussian weighting path-summation integral. The following expression describes path-summation integral with Gaussian tapering: where v a and v b correspond to the lowest and to the highest constant migration velocities in the image set being stacked or, precisely speaking, integrated; v 1 and v 2 are tail cut-off velocities, which correspond to the slope decay parameter β. Expression 2 corresponds to the sum of three path-summation integrals: two Gaussian weighting scheme path-summation integrals (the first and the last terms) and unweighted path-summation integral (the middle term). Each term has an analytical expression. Gaussian weighting scheme path-summation integrals can be computed as (Merzlikin and Fomel, 2017a): where γ(Ω, k, β) = i ik 2 /16Ω + β. To compute the last summation term, which corresponds to the right taper slope, we replace v a with v 2 and v 1 with v b . The middle term can be evaluated as: Thus, Gaussian-tapering path-summation integral can be evaluated analytically with a cost equal to that of only two FFTs. * We use a term "path-summation migration integral" to highlight that the weighting scheme described further is not associated with "probabilities" of corresponding images being stacked, which is the case for "path-integral imaging" concept . Least-squares path summation

Objective function
Seismic imaging can be formulated as an optimization problem (Nemeth et al., 1999;Ronen and Liner, 2000): where L corresponds to the modeling operator, d corresponds to seismic data, m is an image of the subsurface containing both reflections and diffractions. In this work we use Kirchhoff modeling and migration as forward (L) and adjoint (L T ) operators correspondingly. To image reflections and diffractions as separate components of a wavefield we propose the following objective function: Here, L is forward Kirchhoff modeling operator, m d and m r are diffraction and reflection images (models) respectively, P stands for analytical path-summation migration operator computed from equation 2 (P corresponds toÎ(Ω, k, v a , v b , β) including forward Fourier, inverse Fourier and σ = t 2 transforms), D data is a plane-wave destruction filter applied in the data domain. To penalize the objective function and perform reflection/diffraction separation the shaping regularization operator R Sσ,T λ (m r , m d ) is designed (see the "Regularization" subsection for details).
Rather than fitting seismic data d as in equation 5, we search for models minimizing a misfit weighted by path-summation migration operator. Path-summation operator P preceded by plane-wave destruction filter applied in the data domain (D data ) for reflection energy elimination can be treated as diffraction probability at a certain location. Therefore, model fitting becomes constrained to those samples, in which diffractions are most probable. The terms of the misfit in equation 6 can be rearranged to stress the misfit weighting by diffraction probability (PD data ): Incorporation of PWD (D data ) as a misfit weight into the inversion (equation 6) puts some of the reflected energy in the null space. PWD is a crucial part of the inversion. By weighting the residual with path-summation integral and PWD we guide the inversion to extract diffractions from the full wavefield, which is predominated by reflections. To restore reflections from the null space and account for their leakage to the diffraction image domain PWD can be disabled and another inversion cascade can be run: where R Sσ,T λ ,Om r (m r , m d ) is the shaping regularization operator designed to perform reflection/diffraction separation after PWD operator (D data ) deactivation (see "Regularization" subsection for details).
Path-summation integral acts as a dip filter suppressing events with high wavenumbers (Merzlikin and Fomel, 2017a). High wavenumbers can be restored by running the third cascade of the inversion with path-summation integral filter disabled: A practical alternative is to restore high wavenumbers separately for reflections and diffractions. Once reflection m r and diffraction m d components are restored (after the second inversion cascade 8), they can be used as starting models in the following inversions for wavenumber restoration: where m w r are reflections with high wavenumbers restored, which are initialized with m w r = m r at the iteration zero. High wavenumbers in diffractions m w d can be restored as Reflections restored by equation 10 (m w r ) are subtracted from the full wavefield data d to allow for diffraction-only restoration m w d . No interference is expected between reflections m w r and diffractions m w d provided accurate starting models from the previous inversions (equations 6 and 8).
as penalty terms for reflections and diffractions correspondingly, the problem is reformulated in the shaping framework (see "Regularization" subsection for details).

Optimization strategy
To solve the optimization problem in equations 6, 8, 9 we employ a conjugate-gradients scheme: where −∇J(m n r , m n d ) is the gradient at iteration n, [s n r , s n d ] T is a conjugate direction, which has two components to update both of the models, scalar α n can be obtained by perfoming a line search, and β n is designed to guarantee that [s n r , s n d ] T and [s n−1 r , s n−1 d ] T are conjugate. Least-squares path summation It is important to highlight that the same operator is used to update both images of reflections m r and diffractions m d . The implication of this strategy is that without regularization same updates are attributed to both models. For simplicity, consider conjugate direction s n at the zero iteration n = 0. It has the form of s n=0 = [L T D T data P T r, L T D T data P T r] T (PWD (D data ) and path-summation integral (P) are disabled for objective functions in equations 8, 9 correspondingly), where r corresponds to the residual between the observed data d and the data modeled from the initial guess model [m init r , m init d ] T (initialized by zeroes for the first inversion (equation 6) and by the output of the first inversion (equation 6) for optimization of the objective function in equation 8). The conjugate direction s n=0 is equal to the negative gradient of the objective function −∇J(m init r , m init d ). It is obvious that the residual r in the conjugate direction s n=0 is mapped to reflection and diffraction image using the same operator -the adjoint of the "chain". The same is true for all of the iterations. In this case, previous directions s n−1 = [s n−1 r , s n−1 d ] T , which are also the same for both models, participate in the update. To perform reflection and diffraction separation regularization is required.

Regularization
We follow the shaping regularization approach (Fomel, 2007b) to constrain the models. After several iterations of the conjugate gradient algorithm (denoted by n), output of which is denoted by [m n+1 r , m n+1 d ] T , shaping operators are applied to the models as follows: , where T λ is a thresholding operator with a threshold level λ and S σ is smoothing along dominant local slopes with a radius σ. Smoothing and thresholding correspond to external iterations of the optimization scheme (denoted by k), result of which is denoted by [m k r , m k d ] T . Shaped models are then input to the next cascade of internal iterations by conjugate gradients as a new starting model. Therefore, internal iterations minimize the misfit term of the equation 6 and regularization in the form of shaping (equation 13) corresponds to external iterations. For wavenumber restoration inversions (equations 10 and 11) shaping operators should be applied separately: thresholding for diffraction restoration, and smoothing along dominant local slopes for reflections.
As soon as PWD is removed from the misfit term in the second inversion cascade (equation 8) the residual gets dominated by reflections. Thus, before disabling PWD we need to make sure that the majority of the diffracted energy has been extracted by the first inversion (equation 6). Because of the high energy of reflections, the sparsity constraints alone can be insufficient to remove reflected component leaked to the diffraction image after disabling PWD. Reflection removal will require higher threshold levels leading to diffraction supression.
To address this issue signal and noise orthogonalization workflow is employed Least-squares path summation (Chen and Fomel, 2015). We follow the approach and measure the similarity between the update to diffractivity s n d and reflectivity obtained on the previous external iteration m k−1 r . Events in the diffraction image update s n d caused by reflection energy leakage have high similarity to reflectivity. Based on the local similarity estimate (Fomel, 2007a) orthogonalization operator is formed O m k−1 r . It is then applied to the update of the diffractions O m k−1 r (s n d ) removing reflected energy with high similarity to reflectivity as noise. This procedure can be treated as additional regularization allowing for component separation in the second cascade of the inversions with PWD disabled (equation 8). Orthogonalization can also be applied to the diffractivity model itself rather than to its update. Although both ways are mathematically equivalent to each other in the examples shown below update orthogonalization demonstrates better reflection crosstalk supression. After update orthogonalization, sparsity constraints prove to be efficient to denoise diffractions and suppress the remaining crosstalk, major part of which has been eliminated in the update.

Workflow
The workflow we propose to decompose the input data into reflections, diffractions and noise is outlined in the flowchart (Figure 1) and is as follows: 1. run the first inversion (equation 6); 2. run the second inversion (equation 8) and use the output from the first inversion cascade as the starting model; 3. restore high wavenumbers in reflections (equation 10); use the output of the second inversion as the starting model; 4. restore high wavenumbers in diffractions (equation 11); use the output of the second inversion as the starting model for diffractivity; use reflectivity from the third step to subtract it from the data.
For shaping regularization of reflections (S σ ) dominant local slopes should be estimated in the image domain (conventional Kirchhoff migration image should suffice). For PWD filter (D data ) dominant local slopes should be estimated in the data domain (zero-offset or stacked section). Both slopes should be precomputed before the inversion.

SYNTHETIC DATA EXAMPLE
We use the following synthetic to test the proposed chain-inversion approach. Figure 2 shows a zero-offset modeling result in a medium with a constant vertical gradient of velocity. The starting velocity at the surface corresponds to 3 km/s. The section Least-squares path summation

First Inversion
Conjugate Gradients  Figure 1: Flowchart illustrating the workflow for reflection, diffraction and noise separation. Internal iterations correspond to "Conjugate Gradients" minimizing the misfit term. External iterations correspond to "Shaping Regularization" performing reflection/diffraction separation. First, the objective function given by the equation 6 is minimized ("First Inversion" in the chart). The data to be matched in the "First Inversion" is filtered with PWD followed by path-summation integral filter. The output from the "First Inversion" is input to the "Second Inversion" cascade (equation 8).
Orthogonalization of updates to reflections and to diffractions is introduced. The data to be input to the "Second Inversion" is filtered with path-summation integral filter. PWD is disabled. Estimated reflectivity is then input to high-wavenumber restoration inversion denoted by "Restore Reflections' Wavenumbers". Reflections with high-wavenumbers restored and diffractions from the second inversion are then input to the "final" inversion restoring high-wavenumbers in diffractions. The data to be matched by the last two inversions is zero-offset data d with no PWD or pathsummation integral filter applied. Least-squares path summation contains reflections with various dips, several layers of sparse diffractions and an area, in which the density of scatterers is increased (towards the right edge of the synthetic). Random noise with a 30% magnitude of that of the diffractions and filtered to the signal frequency band is added to the section. First, we illustrate the advantage of misfit minimization with a diffraction probability weight using the same synthetic ( Figure 2) but with no reflections (Figure 3a) in order to compare its performance with least-squares Kirchhoff migration. Reflection/diffraction separation is considered further. Figure 4a shows the result of leastsquares Kirchhoff migration (equation 5) with no regularization. Noise leakage into the diffraction image domain can be easily noticed. The result of path-summation migration applied to the zero-offset synthetic data (Figure 3a) is shown in Figure 3b. Path-summation migration image can provide us with a probability of a scatterer at a certain sample location. We employ this information by weighting the misfit term in the equation 5 by the path-summation migration operator ( P d − Lm 2 2 ), which forces the misfit minimization only at probable diffraction locations. No PWD (D data ) is required to emphasize diffractions since there are no reflections. Since the input contains diffractions only we need only one model for diffractions m. No reflection model is considered. Figure 4b shows the result of least-squares Kirchhoff migration with a misfit weighted by path-summation integral filter. No regularization has been applied. It can be seen that weighted minimization reduces high frequency artifacts. Path-summation integral operator emphasizes diffraction apexes remaining stationary under migration velocity perturbation. It acts as a low-frequency filter guiding the inversion towards probable scatterers locations. Indeed, lower frequency content is noticeable and diffractions are more visible on the result of weighted minimization (Figure 4b) than on the "conventional" least-squares Kirchhoff migration image (Figure 4a). Incorporation of sparsity constraints will remove high frequency artifacts from the least-squares Kirchhoff migration image and will reduce the area corresponding to a diffraction location in a weighted minimization result. Here, regularization is disabled to show the effectiveness of path-summation operator misfit weighting and avert the subjective choice of a regularization parameter, which will influence the results and make their unbiased comparison harder. Least-squares path summation  We then apply the proposed workflow to the synthetic containing all the components with regularization enabled. First, we estimate slopes of the dominant events, which are associated with reflections, in the data and image domains. We then generate the data for the first inversion cascade (Figure 5a): zero-offset section (Figure 2) is weighted by PWD (D data ) and path-summation integral filter (P). We run 100 outer iterations and 5 inner iterations. Convergence curves are shown in Figure 6. The results of the first inversion are shown in Figure 5. Although subtle reflection remainders are present (e.g. 0.9 − 1.0 [s] TWTT (two-way travel time) and 6.0 − 6.5 [km] distance) the majority of diffracted energy has been extracted with high accuracy.
To reduce these remainders and restore reflections trapped in the null space of the "chain" of operators we run the second inversion cascade with no PWD. First, we generate the data to be fit (Figure 7a): PWD is disabled and only path-summation integral operator is used to weight the misfit. The result of the first inversion cascade ( Figure 5) is used as the starting model for the second inversion. We run 10 outer iterations and 5 inner to restore reflections and reduce the reflection-diffraction crosstalk. Orthogonalization is required. Figure 8 shows the updates to reflections and diffractions after the first external iteration of the second inversion (equation 8).
Orthogonalization is applied to remove reflections from the diffraction image update: it can be seen that reflections in the update are significantly stronger than diffractions and are hard to suppress with thresholding only. Orthogonalization significantly reduces the energy of reflections and diffraction-related update becomes more visible (Figure 8b). The orthogonalized update is then added to diffraction model followed by shaping regularization. Reflection and diffraction images shown in Figure 7 are generated. Reflections are recovered from the null space introduced by PWD. Crosstalk is reduced (Figure 7c: e.g. 0.9 − 1.0 [s] TWTT and 6.0 − 6.5 [km] distance). Different choice of regularization parameters including the balance between the numbers of internal and external iterations in the first and second inversions can be tweaked to improve the results even more.
We then run the inversions (equations 10 and 11) to restore high wavenumbers with PWD and path-summation integral operators disabled. For diffraction-only recovery (equation 11) besides using the starting model from the second inversion as an additional regularization the model is masked by the locations of the diffractors. This mask can easily be created from the diffraction image generated as the output of the second inversion cascade (Figure 7c). Figure 9 show reflection and diffraction components restored by the inversions, which were generated by applying Kirchhoff forward modeling operator to reflectivity and diffractivity output by wavenumber restoration inversions (equations 10 and 11). We then subtract recovered reflections and diffractions from the input data to generate the noise estimate. Noise and diffractions are plotted with the same gain. Although the noise section contains some diffracted energy, the approach is highly effective. We then compare the results with the true components used to generate the synthetic. High correlation between the restored and the "true" components of the synthetic also supports high accuracy of the workflow. Signal and noise orthogonalization helps to account for amplitude differences between "true" and restored diffractions (Figure 9c and 9d): although high restoration quality Least-squares path summation  Figure 5: (a) Zero-offset section weighted by path-summation integral operator (P) and PWD (D data ) -data to be fit by the first inversion (equation 6); reflectivity (b) and diffractivity (c) restored. Least-squares path summation is achieved subtle amplitude differences can be noticed. Since diffraction shapes are restored this energy can be brought from the noise section back to the diffraction image domain by measuring the similarity between the corresponding components.
Simultaneous inversion for both reflections and diffractions allows to account for the crosstalk between the components and thus improves the confidence of scatterer position identification. We proceed with the workflow application to the field data example.

FIELD DATA EXAMPLE
We apply the proposed approach to the Viking Graben dataset (Keys and Foster, 1998). Klokov and Fomel (2012a) previously performed diffraction imaging of the Viking Graben dataset using Hybrid Radon transform in the dip-angle common image gathers' domain. Gonzalez et al. (2016) perform diffraction-based depth velocity analysis on the dataset. Figure 10a illustrates the zero-offset section. It can be noticed that gradually dipping continuous reflections are interrupted by faults encoded in the diffracted wavefield. Additional diffractions are associated with rough reflector surfaces. Diffractions exhibit significantly lower amplitudes than reflections, and a separation procedure aimed to emphasize diffractions must be applied before proceeding along the workflow. We separated diffractions (Figure 10b) from reflections using PWD filter (Fomel et al., 2007) and first imaged them using conventional post-stack Kirchhoff time mi-Least-squares path summation We apply the "chain" of operators inversion approach to improve diffraction signalto-noise ratio and increase the diffraction image reliability. First, we prepare the data to be fit by the inversion (equation 6). The zero-offset section after application of PWD in the data domain (D data ) followed by path-summation integral filter (P) is shown in Figure 12a. These are the data to be fit, which can be treated as diffraction probability. Slope values required for PWD in the data domain (D data ) and for shaping regularization in the model space (S σ ) are estimated using zero-offset section ( Figure 10a) and conventional Kirchhoff migration image (Figure 11a) correspondingly. We run 10 outer iterations and 5 inner to restore reflectivity and diffractivity shown in Figure 12. It can be seen that diffractions are significantly more highlighted than in the Kirchhoff migration image of the section after PWD (Figure 11b) and appear to be denoised. At the same time, crosstalk between reflections and diffractions is strong and a lot of strong reflection energy is in the null space of the operator. To address the null space and crosstalk problems we generate the data for the second inversion (equation 8), invert for reflections and diffractions (we run 50 outer iterations and 5 inner) and produce the models shown in Figure 13. Orthogonalization of the diffraction update is shown in Figure 14. Convergence curves are shown in Figure 15. It can be seen that reflections are recovered from the null space (refer to the shallow events) and crosstalk between diffractions and reflections has also been reduced. In particular continuous layering events delimited by 1.75 − 2.0 [s]  distance and around 2 [s] TWTT between 18 − 19 [km] distance are suppressed. There is still a remainder of events with elongated shapes in diffraction image (Figure 13c). These events can be connected with rough surface features, which are too irregular to be described by the chosen smoothing radius in shaping regularization penalizing reflections. Less aggressive smoothing and more detailed dip field in the model space should allow moving these events back to the reflection image domain. Stronger thresholding can also help.
Finally we restore high wavenumbers and model reflections ( Figure 16a) and diffractions ( Figure 16b). As in the synthetic data example, diffractions in wavenumber restoration inversion are additionally penalized by the locations' mask determined from the output of the second inversion cascade (Figure 13c). We then subtract their sum from the zero-offset DMO section ( Figure 10a) and arrive at the following noise estimate (Figure 16c). Noise section is plotted with the same gain as the "restored" diffractions. Signal and noise orthogonalization (Chen and Fomel, 2015) is applied to account for aperture difference between observed ( Figure 10b) and restored diffractions ( Figure 16b) (for instance, often only a single diffraction flank can be observed in the data leading to spuriously high difference or the "noise estimate") and to bring back some of the energy accidentally leaked to the noise domain (Figure 16c). Some residuals can be noticed in the lower part of the noise section, which can have a signature intermediate between reflections and diffractions, were not attributed to either. Upper part of the noise section exhibits low magnitude residuals stating high accuracy of the method. Besides that, restored diffractions ( Figure 16b) exhibit high correlation with PWD section (Figure 10b) also supporting high efficiency of the method.
Overall impression of the image shown in Figure 13c can be contradictory because of the continuity loss. However, diffractions have spiky and intermittent distribution, which correlates well with the faults and other discontinuities observed in the conventional image (Figure 11a). To highlight this statement we overlay the conventional image ( Figure 11a) with estimated diffractivity (Figure 17). High correlation between  On the other hand, the approach allows increasing the confidence of the diffractions present in the image and distinguishing them from noise. In other words, even though some events might be lost, we can be confident in the events that were successfully imaged.

DISCUSSION
We have proposed a workflow to decompose full waveform input data into reflections, diffractions and noise by using three cascades of inversions. We also want to highlight regularization through orthogonalization as a novel tool to aid the inversion, in which multiple models are estimated with a single modeling operator. Its ability to track down the leakage of the events from one model space to another appears to be crucial for the second cascade of the inversions. Orthogonalization, combined with shaping regularizations, allows for efficient reflection, diffraction and noise separation.
Shaping regularization allows for a flexible model constraints since models for reflections and diffractions are conditioned independently. For instance, if the regularization terms were incorporated as penalty terms in the objective functions (equa-Least-squares path summation tions 6, 8 and 9), finding the trade-off between weights for all the terms would be challenging: the two terms would conflict with each other leading to stronger interference between model spaces.
It should be mentioned that the workflow presented here targets diffraction phenomenon in 2D rather than in 3D. In particular, in the case of three dimensions, sparsity constraints can be inadequate for edge diffractions regularization. The latter features are continuous along the edges and act as reflections but at the same time have a diffractive component in the direction perpendicular to the edge (Moser, 2011). To properly regularize edge diffractions sparsity constraints should be accompanied by anisotropic smoothing operator enforcing continuity along the edges (Merzlikin et al., 2018). At the same time, PWD robustly serving the purpose of diffraction extraction in 2D, in 3D should be replaced by AzPWD -workflow properly addressing hybrid signature of edge diffractions (Merzlikin et al., , 2017b. These are the modifications, which have to be incorporated into the workflow presented in this paper in order for it to be extended to 3D. At the same time, all the necessary modifications have already been derived and, thus, support the applicability of the reflection-diffraction crosstalk minimization approach presented in this paper in three dimensions. Finally, the proposed approach (except for analytical time domain path-summation migration describing diffraction probability) is compatible with any migration algorithms operating in pre-stack, post-stack, depth and time domains. In pre-stack domain the PWD filter can be implemented using common-offset sorting of input data. Regularizations are performed in the image domain and, thus, are not dependent on Least-squares path summation a particular migration algorithm. Analytical expressions exist for path-summation imaging in pre-stack time-migration domain (Merzlikin and Fomel, 2017a), however, path-summation integral migration in the depth domain remains challenging  and, thus, limits the applicability of the approach presented here. The possible workaround is to incorporate probability weights favouring diffractions (Merzlikin and Fomel, 2017b) into path-summation integral expression and utilize importance sampling strategies for computational load reduction associated with the absence of analytical expressions of path-integral evaluation in the depth domain.

CONCLUSIONS
We have developed an approach to decomposing input data into reflection, diffraction and noise components. The inverted forward modeling operator corresponds to the chain of Kirchhoff modeling, plane wave destruction (PWD) and path-summation integral migration operators. The chain of these operators accounts for the contribution of diffractions to the optimization, and guides inversion towards probable diffraction locations. Reflection model is estimated simultaneously. Separation into different components in iterative inversion is achieved by regularization: reflections are forced to be smooth along the dominant local slopes in the image domain, while diffractions are penalized by the sparsity constraints. Additional shaping regularization based on local signal-and-noise orthogonalization removes continuous events from diffractivity updates and reduces the crosstalk between the components during iterations. Incorporation of a reflection model into the inversion allows for reflected energy withdrawal from the diffraction image domain and increases the reliability of diffraction images.
Numerical experiments with synthetic data emulating high interference between reflections and diffractions demonstrates high effectiveness of the approach. A field data example confirms the robust separation of the components by the proposed approach.
The workflow has a high computational efficiency. Shaping regularization allows for fast convergence whereas the cost of the forward modeling operator at each iteration is predominated by Kirchhoff migration.