Probabilistic moveout analysis by time warping

Parameter estimation from reflection moveout analysis represents one of the most fundamental problems in subsurface model building. We have developed an efficient moveout inversion method based on the process of automatic flattening of common-midpoint (CMP) gathers using local slopes. We find that as a by-product of this flattening process, we can also estimate reflection traveltimes corresponding to the flattened CMP gathers. This traveltime information allows us to construct a highly overdetermined system and subsequently invert for moveout parameters including normal-moveout velocities and quartic coefficients related to anisotropy. We use the 3D generalized moveout approximation (GMA), which can accurately capture the effects of complex anisotropy on reflection traveltimes as the basis for our moveout inversion. Due to the cheap forward traveltime computations by GMA, we use a Monte Carlo inversion scheme for improved handling of the nonlinearity between the reflection traveltimes and moveout parameters. This choice also allows us to set up a probabilistic inversion workflow within a Bayesian framework, in which we can obtain the posterior probability distributions that contain valuable statistical information on estimated parameters such as uncertainty and correlations. We use synthetic and real data examples including the data from the SEAM Phase II unconventional reservoir model to demonstrate the performance of our method and discuss insights into the problem of moveout inversion gained from analyzing the posterior probability distributions. Our results suggest that the solutions to the problem of traveltime-only moveout inversion from 2D CMP gathers are relatively well constrained by the data. However, parameter estimation from 3D CMP gathers associated with more moveout parameters and complex anisotropic models are generally nonunique, and there are trade-offs among inverted parameters, especially the quartic coefficients.


INTRODUCTION
Moveout analysis is arguably one of the most important steps in seismic processing, which leads to an expeditious construction of subsurface models. The classic work of Taner and Koehler (1969) represents the foundation for modern moveout analysis, in which the reflection traveltimes of pure P-waves are approximated by a Taylor expansion around the zero offset. The degree of matching between the data and the computed traveltime can be judged based upon the semblance criterion (Neidell and Taner, 1971). Velocities that correspond to a higher degree of matching (semblance) are picked, and an accurate subsurface model can be inferred. In a 2D isotropic subsurface, only one parameterthe normal-moveout (NMO) velocityneeds to be estimated but more parameters are required when 3D data and effects from anisotropy are considered (Alkhalifah and Tsvankin, 1995;Grechka and Tsvankin, 1998a;Yilmaz, 2001;Tsvankin, 2012;Thomsen, 2014;Oh and Alkhalifah, 2016;Masmoudi et al., 2017). Aiming to improve the process, many researchers have proposed strategies such as refining the semblance criterion for higher resolution semblance peaks (Kirlin, 1992;Kirlin and Done, 2001;Abbad et al., 2009;Fomel, 2009;Luo and Hale, 2012;Chen et al., 2015;Wilson and Gross, 2017), taking into account amplitude variations with offset (AVO) (Sarkar et al., 2001(Sarkar et al., , 2002Yan and Tsvankin, 2008) and implementing inversions by maximizing semblance-based objective function to avoid too many costly parameter scans (Grechka and Tsvankin, 1999;Vasconcelos and Tsvankin, 2006). However, it can generally be said that the higher the number of parameters to scan, the lower the computational efficiency of semblance-based techniques. Fomel (2007) proposes an alternative approach to this problem and formulates an automatic NMO correction of common-midpoint (CMP) gathers based on local slopes, which can directly and automatically be estimated from the gathers themselves (Fomel, 2002). This concept evolves from the original velocity-independent imaging strategies of Ottolini (1983) and early results of Wolf et al. (2004), which rely on the connection between the slopes of reflection traveltime surfaces and the time-domain processing parameters. As a result, the process of subsurface model building amounts to mapping from the automatically measured slopes to the corresponding NMO velocity. In the simple case of 2D hyperbolic reflection traveltimes, the NMO velocity v NMO can be found from the slope p as follows (Fomel, 2007): 1 v 2 NMO ðt 0 ; xÞ ¼Wðt 0 ; xÞ ¼ pðt 0 ; xÞ t x ; (1) where t is the hyperbolic reflection traveltime, t 0 denotes the zerooffset traveltime, x is the offset, and p ¼ dt∕dx. The slowness squaredWðt 0 ; xÞ is obtained as an attribute volume in terms of t 0 and x. In a conventional time processing workflow (Yilmaz, 2001), a profile of slowness squared Wðt 0 Þ at each CMP location as opposed to a volume ofWðt 0 ; xÞ is generally preferred, which leads to the research question of how one can best obtain Wðt 0 Þ from Wðt 0 ; xÞ. Moreover, reflection traveltime approximations are often nonhyperbolic with more independent parameters needed to honor the effects from anisotropy and heterogeneity. Stovas and Fomel (2016) present several relationships between local slopes/curvatures (d 2 t∕dx 2 ) and moveout parameters for different nonhyperbolic traveltime approximations. Nonetheless, the problem of using local slopes to achieve moveout correction (event flattening) and parameter estimation (velocity analysis) automatically in this regime remains a challenging task. Alternatively, Burnett and Fomel (2009) and Burnett (2011) propose to accomplish both processes by a combination of nonphysical flattening and moveout inversion. The former is achieved by using predictive painting (Fomel, 2010), which automatically traces the CMP events according to the local structural information from slopes without relying on any prior assumption regarding their shape. This is notably different from directly mapping local slopes to the moveout parameter as in equation 1. The traced events are subsequently warped (numerical nonstationary shifts) until they are flat, and a seismic image can be obtained by stacking these flattened gathers. The term "nonphysical" is used here to emphasize the difference between the conventional physical flattening based on an assumption of some particular shape of traveltime surfaces (e.g., hyperbolic in equation 1 or any other nonhyperbolic forms extracted from the physics of wave propagation) and the warping approach, which flattens the gathers numerically. In view of subsurface model building, the by-product of this process, the nonstationary shifts, can be stored and applied to a time attribute volumehence the name time warpingwhich yields reflection traveltimes of the corresponding flattened CMP events. We can then invert for the best-fit parameters of any preferred moveout approximation from this knowledge on reflection traveltimes of all CMP events and obtain properties of the subsurface.
Many moveout approximations suitable for various settings exist in the literature; choosing which one to use depends largely on the data coverage, the types of subsurface media, the desired traveltime accuracy, and the computational efficiencymore parameters to scan is often more computationally inefficient. Conventionally, moveout approximations have the well-known hyperbolic expression, which is exact for homogeneous isotropic or elliptically anisotropic media but is only approximately applicable to small-offset data in other cases. For long-offset data, the moveout becomes nonhyperbolic and requires more parameters to capture the effects from possible anisotropy (Hake et al., 1984;Castle, 1994;Tsvankin and Thomsen, 1994;Alkhalifah and Tsvankin, 1995;Al-Dajani et al., 1998;Ursin and Stovas, 2006;Blias, 2009;Aleixo and Schleicher, 2010;Alkhalifah, 2011;Golikov and Stovas, 2012;Farra and Pšenčík, 2013;Farra et al., 2016). Fomel and Stovas (2010) propose the 2D highly accurate generalized moveout approximation (GMA), which requires five independent parameters. This approximation is named "generalized" due to the fact that its functional form can reduce to several other known moveout approximations with different choices of parameters and therefore provides a systematic view on the effects of various parameter choices on the approximation accuracy. Its 3D extension (3D GMA) is proposed by Sripanich et al. (2017) and requires a total of 17 parameters to accurately describe reflection traveltimes in 3D layered anisotropic media.
In this study, we first show that using the time-warping workflow, we can construct a highly overdetermined inverse problem for moveout parameters. We improve upon the work of Burnett and Fomel (2009) and Burnett (2011), who rely on second-order moveout approximations (NMO ellipse); we use in our approach instead the more accurate 3D GMA appropriate for 3D layered anisotropic models. Furthermore, we acknowledge the cheap forward traveltime computations with 3D GMA and propose to use a global inversion scheme within a transdimensional (hierarchical Bayesian) framework (Malinverno and Briggs, 2004;Sambridge et al., 2006Sambridge et al., , 2013, instead of the previous linearized least-squares inversion by Burnett (2011). This allows us to obtain valuable statistical posterior probability distributions of all estimated moveout parameters and data uncertainty as opposed to only one "best-fit" solution according to the least-squares minimization criterion. We provide synthetic and real data examples including one from the SEAM Phase II unconventional reservoir model to demonstrate the performance and discuss important implications from using the proposed traveltime-only inversion approach for subsurface model building.

Moveout inversion by time warping
In general, we can formulate the problem of moveout inversion in the context of time warping as follows (Burnett and Fomel, 2009): where t 2 − t 2 0 denotes the measured traveltime squared shifts from nonphysical flattening, F is a linear or nonlinear operator describing the geometry of the reflection traveltime surface; i.e., the moveout approximation of choice, m denotes the vector of the pertinent U2 Sripanich et al.
moveout parameters associated with that choice of F, and x denotes the vector of offsets (x and y). Here, t denotes the recorded reflection traveltime and t 0 is its value at the zero offset. For each event at t 0 in a CMP gather of N traces, we have N equations with only several unknowns (m) depending on the choice of F. Therefore, this generally leads to a highly overdetermined system for estimating the associated moveout parameters m. For example, in the case of a 2D hyperbolic reflection traveltime, we have Fðm; xÞ ¼ x 2 Wðt 0 Þ, leading to Therefore, this highly overdetermined system can be solved using, for instance, a least-squares inversion (Burnett and Fomel, 2009;Burnett, 2011). The number of parameters in m varies depending on the choice of F and the kind of considered data (2D or 3D). Nevertheless, in general, the number of traces in a CMP gather far exceeds the number of unknown parameters and the resulting system is almost always overdetermined. We note that, in practice, the automatic traveltime picks from recorded data may include complications from various causes such as noise and errors from slope estimation, which may lead to a system of equations that is mathematically inconsistentno set of values for m satisfies the system exactly. In this study, we propose to use the Bayesian inversion framework to find the posterior distributions of possible solutions instead of seeking only one solution associated with least-squares misfit minimization criterion.

Time-warping workflow
Steps to implement the entire time-warping workflow can be summarized (Figures 1 and 2) as follows (Burnett and Fomel, 2009): 1) The inputs of the workflow include a CMP gather dðt; x; yÞ and a time attribute volume as shown in Figure 1a and 1b. The latter is generated by having a similarly sized volume as the CMP gather with each element storing the corresponding value of traveltime along the axis. After warping, this volume will store the reflection traveltimes from different CMP events. 2) Apply slope estimation on the CMP gather ( Figure 1a) using tools such as the plane-wave destruction filter (Fomel, 2002). The resulting slopes indicate the local structures and are used to automatically trace the CMP events in the predictive painting method (Fomel, 2010).
3) The picked events are then numerically warped until flattened by inverse interpolation. The warping process can be viewed as applying a nonstationary shifting filter. An application of this filter to dðt; x; yÞ produces dðt 0 ; x; yÞ in Figure 2a. Applying the same filter to the time attribute volume in Figure 1b, we obtain tðt 0 ; x; yÞ in Figure 2b. Here, we restrict our application of warping to only the parts with CMP events between time 1 and 2 s. 4) In the considered time range (1-2 s), we can compute the traveltime shift volume by the element-wise subtraction of t 2 ðt 0 ; x; yÞ − t 2 0 . 5) The resulting traveltime shifts will be used in the left-hand side of equation 2 for the moveout parameter inversion.
The above five steps are carried out in the same way, regardless of the choice of moveout approximation. These steps could also be applied to migrated common-image-point gathers, or many other gather or domain types. In this study, we focus on its application to the problem of moveout inversion, which is dependent upon the choice of moveout approximation F and the inversion method. Different moveout approximations represent different shapes of reflection traveltime surfaces and involve a different number of moveout parameters to be fitted. An appropriate inversion method must also be chosen in accordance with the choice of moveout approximation. Whereas Burnett and Fomel (2009) solve for the three parameters of NMO ellipse moveout model using least-squares inversion, here, we use a Monte Carlo inversion to infer many parameters of the more comprehensive 3D GMA model.

3D generalized nonhyperboloidal moveout approximation
Using similar notations as before, we express the reflection traveltime of each CMP event in terms of offset x and y in a given acquisition coordinate frame as tðx; yÞ. The 3D generalized nonhyperboloidal moveout approximation (3D GMA) can be expressed as (Sripanich et al., 2017) t 2 ðx; yÞ ≈ t 2 0 þ Wðx; yÞ and t 0 again denotes the reflection traveltime at the zero offset. The total number of independent parameters in equation 4 is 17 including t 0 , W i , A i , B i , and C i . In the case of x ¼ 0 or y ¼ 0, equation 4 reduces to the generalized nonhyperbolic moveout approximation of Fomel and Stovas (2010). Sripanich et al. (2017) show that the 3D GMA (equation 4) can predict reflection traveltimes in 3D homogeneous or complex horizontally layered anisotropic models with a nearly exact level of accuracy for all practical purposes. In equation 4, the parameters t 0 , W i , and A i are defined with respect to the zero-offset ray, whereas B i and C i are obtained from finite-offset rays (Sripanich et al., 2017). With these definitions, the parameters t 0 , W i , and A i are related to previously known timeprocessing parameters: The term W i denotes the reciprocals of NMO velocities and governs the so-called NMO ellipse (Grechka and Tsvankin, 1998a). The term A i represents the quartic coefficients that govern the nonhyperbolicity of reflection traveltimes. A common approach to simplify and relate the quartic coefficients (A i ) to anisotropic parameters of the subsurface medium is the pseudoacoustic approximation, in which A i can then be expressed in terms of η i (Alkhalifah and Tsvankin, 1995;Alkhalifah, 2003;Stovas, 2015;Sripanich and Fomel, 2016;Sripanich et al., 2017). With regard to this study, we keep the A i notation to maintain the generality of the proposed moveout inversion approach.
On the other hand, B i and C i are designed to be dynamic and can vary with respect to different choices of finite-offset rays, which lead to better flexibility and accuracy of the traveltime fitting. As a result, they cannot be directly related to medium parameters and one may expect that there are many possibilities of B i and C i that can lead to equally good approximation accuracy. Therefore, we expect that the nonuniqueness of B i and C i can also lead to the nonuniqueness in the final inverted models.

Monte Carlo inversion
Equipped with the general concepts on time warping and the choice of 3D GMA described in the previous sections, we propose to solve this moveout inversion problem with a global Monte Carlo inversion method. This choice is encouraged by the nonuniqueness of the solution of this inverse problem that only relies on traveltime (kinematic) data and the fact that the 3D GMA depends nonlinearly on associated moveout parameters. By using a global optimization scheme within a Bayesian framework, we can map out the nonuniqueness of possible solutions and present them as posterior probability distributions, while also honoring the nonlinearity of the problem. Here, we largely follow the notations of Mosegaard and Tarantola (1995) and Tarantola (2005), where we express the posterior probability density function σðmÞ as σðmÞ ¼ kρðmÞLðmÞ: The term ρðmÞ denotes the prior probability density function of model parameters m and LðmÞ represents the likelihood function (with an appropriate normalization constant k) that measures the degree of fit between predicted data and observed data. Assuming a prior ρðmÞ that we can draw samples from, the goal is to design a selection process based on LðmÞ that will result in samples whose density directly represents σðmÞ. Mosegaard and Tarantola (1995) show that given samples of ρðmÞ and a choice of LðmÞ, the selection process can be performed by means of the Metropolis-Hastings algorithm. In this study, we define LðmÞ as where for each t 0 , the misfit between the observed F obs n and modeled F n ðmÞ for all N traces in the considered CMP gather is evaluated and summed according to equation 7. Here, S represents a "hyperparameter" related to data uncertainty, which we choose to express as (8) with S % denoting the magnitude of uncertainty expressed in percent with respect to the root mean square of the data F obs n . Because t 0 is constant for each CMP event, S can also be considered as a relative uncertainty in the estimated reflection traveltime squared (equation 2).
We emphasize that data uncertainty is generally not easily quantified in practice due to the workflow of estimating F. To account for this complication, we propose to use the transdimensional (hierarchical Bayesian) inversion framework (Malinverno and Briggs, 2004;Sambridge et al., 2006Sambridge et al., , 2013, which permits us to treat S % as yet another parameter to be estimated during the inversion. In this way, we obviate the need to estimate the data uncertainty explicitly, while correctly honoring its effects in our inversion. Our implementation of the Monte Carlo inversion with the Metropolis-Hastings rule can, therefore, be summarized as follows: 1) Assume that we start at a point in the model space j with a guessed set of m j drawn from the prior distributions of each model parameter. In the case of the 3D GMA, m consists of W i , A i , B i , C i , and S % . For uniform prior distributions, a random number generator can be used for drawing samples, whereas, for Gaussian prior distributions, a combination of a random number generator and the Box-Muller transform can be used. 2) Evaluate the current likelihood function L j ¼ Lðm j Þ according to equation 7 with a choice of cutoff offset that depends on the example at hand. 3) Attempt to move to another point i with a new guess of m i drawn from the same distribution. This attempt is evaluated according to the following Metropolis-Hastings rule: • If L i ∕L j ≥ 1, then accept the move to i.
• If L i ∕L j < 1, then make a random decision of staying at j or accepting the move to i with a probability of L i ∕L j .

4)
Record the values of m after the selection. 5) Repeat steps 2-4 as many times as needed.
In our workflow, we propose to perform moveout parameter estimation with two inversion runs: 1) The first run is aimed at estimating W i using only small-offset data. At this stage, we assume that the prior distribution is uniform within the chosen bounds (min and max values) for each model parameter in m (W i , A i , B i , and C i ) and specify a small cutoff offset (e.g., where the offset-to-depth ratio is assumed to be unity). 2) Subsequently, the resulting distributions of W i from the first run are assumed to be Gaussian and will be used as new prior distributions for W i in the second inversion run with all available (small-and large-offset) data to estimate A i , B i , and C i . Better prior distributions of W i from the first run are expected to help with the estimation of A i , B i , and C i . The prior distributions for other pertinent parameters are again assumed to be uniform within some chosen bounds (min and max values).
After the two runs, the final recorded m can be visualized as histograms that represent 1D marginal posterior probability distributions of the estimated moveout parameters m. The corresponding posterior probability density function σðmÞ can subsequently be obtained with appropriate normalization of the histograms. To quantify the "gain of information" from the moveout inversion process, we use the Kullback-Leibler divergence D KL ∈ ½0; ∞Þ given by which estimates the relative difference between the prior ρðmÞ and the posterior σðmÞ. A high D KL value indicates that the posterior distribution is very different from the prior, and therefore, has gained new information on the moveout parameters through the inversion process using the provided data (i.e., moveout traveltime shifts F obs n in our problem). On the other hand, D KL ¼ 0 indicates that the two distributions are identical and no gain has been achieved upon seeing the data. In all of our examples, we report D KL with respect to the input uniform prior distributions to the first inversion run.
Even though D KL is generally unbounded, it is possible to compute a benchmark value for some simple case. For example, if we consider Gaussian prior and posterior distributions, the Kullback-Leibler divergence (D KL ) can be expressed as where the v i and μ i are the variance and mean of the distributions, respectively. It follows that if no information is gained during the inversion process and σðv σ ; μ σ Þ ¼ ρðv ρ ; μ ρ Þ, then D KL ¼ 0. If the standard deviation of the posterior reduces to half that of the prior with the same mean, the Kullback-Leibler divergence is equal to

EXAMPLES 2D synthetic model with inverse moveout
We first consider a simple 2D example where the input CMP gather is generated from the inverse moveout process based on the 2D version (Fomel and Stovas, 2010) of equation 4 (2D GMA) given by This case can be associated with a horizontal reflector beneath a homogeneous layer of transversely isotropic medium with vertical symmetry axis (VTI). Relying on the pseudoacoustic approximation and the choice of zero-offset and horizontal rays to fit moveout parameters, Fomel and Stovas (2010) show that for a homogeneous VTI medium, it is possible to specify A, B, and C in terms of η (Alkhalifah and Tsvankin, 1995)a commonly used parameter to characterize the nonhyperbolicity of moveout traveltimes. This process leads to The terms ϵ and δ are the Thomsen's parameters that characterize the property of the overlying VTI medium.
In this example, we consider a CMP event generated by the inverse moveout process (Figure 3) based on the 2D GMA (equation 12) with A, B, and C specified as in equation 13. We use the VTI parameters of Dry Green River shale (Thomsen, 1986) given by V P ¼ 3.292 km∕s, Using this configuration, we first test our algorithm and its capability to estimate the required model parameters and the hyper- parameter S % that governs the data uncertainty. We consider the setup for the first inversion run in our workflow, where we restrict our data range to small offsets. The benefit of this test is twofold: 1) Given that our algorithm functions correctly, we should be able to recover W and S % , while seeing how the posterior distributions of W changes with varying data uncertainty S % . 2) If the estimated maximum likelihood points (peaks of histograms) are not centered around the true values, the results then suggest some trade-offs among inverted parameters intrinsic to our traveltime-only moveout inversion setup.
In this test, we assume that the picked moveout curve for nonphysical flattening is approximately error-free and we add artificial Gaussian noise with a variance equal to S 2 (equation 8) with S % ¼ 0.1, 0.5, 2, 5, and 25 to the input F obs n for the inversion process. We record model parameters at every 100 iterations to increase the independence between recorded models and stop the iterations when the number of recorded model reaches 20,000. We specify the minimum and maximum values for the uniform prior distributions as shown in Table 1 except for the range of S % , which is set to 0-60 in this case. Here, we use the relationships between GMA parameters (A, B, and C) and η (equation 13) to help guide our selection of the min and max values and we also specify relatively large ranges to reflect the reality of this situation in practice. The resulting posterior distributions for W and S % from this experiment with the cutoff offset specified at 1.25 km are shown in Figure 4. We can observe that our algorithm performs well and manages to correctly estimate posterior distributions for the W and S % centered around the true values. Moreover, at lower S % (small data uncertainty/ noise), the estimated posterior distribution of W has a smaller variance and has its peak closer to the true value. On the other hand, the estimated posterior distribution of W becomes wider (higher variance) with increasing S % as would be anticipated. Looking closer at Figure 4, we also see that at higher levels of S % , the peak of the posterior distribution of W shifts toward a lower value, suggesting a trade-off between inverted parameters despite the use of only small-offset data. This observation is supported by the 2D marginal posterior distributions of W and the other three parameters: A, B, and C as shown in Figure 5 for the case of S % ¼ 2. We can observe  After successful testing, we use the same example and implement the proposed two-run inversion without any additional noise to the input data. In this case, we record the model at every 500 iterations and stop when the total number of the recorded model reaches 20,000. Similar min and max values for the uniform prior distributions in Table 1 are used in this experiment. The resulting posterior distributions of all the estimated parameters are shown in Figure 6. We can observe that the posterior distributions of W and A have their peaks close to the true values with high D KL suggesting that information on both parameters has been gained from the inversion and they appear to be recoverable with relatively high fidelity. On the other hand, the distributions for B and C are more diffuse with no clear peak and lower D KL . We emphasize that the results on B and C agree with the properties of GMA, in which both parameters are designed to be nonunique for dynamic traveltime fitting with high accuracy. Finally, the posterior distribution for S % has only one peak close to zero verifying our assumption that the picked traveltime curve in this example is approximately noise-free.

2D Mobil Viking Graben data set
Next, we consider an example with real 2D data set from the Viking Graben (Figure 7) from Keys and Foster (1998). In this example, we use the flexibility of the proposed time-warping workflow that can be implemented at any selected locations related to different reflectors to help assessing velocity estimated from conventional semblance-based technique. We can achieve this selection by choosing an appropriate t 0 (a slice of the tðt 0 ; x; yÞ cube) that corresponds to the desired reflector and implement a Monte Carlo inversion only at such a location. Figure 8 shows CMP gather number 1000 of the Viking Graben data overlain by two picked moveout curves associated with prominent reflectors to be considered in our inversion. Due to the availability of only small-offset data, we are obliged to restrict our inversion to W and S % . Their ranges for the input uniform prior distribution are 0.08-0.5 for W and 0-10 for S % . The cutoff offsets used for the two events are at 1.8 (blue) and 2.2 km (red), approximately corresponding to trace numbers 27 and 35, respectively. A comparison of the semblance-based velocity and those from the proposed workflow is shown in Figure 9 with the solid black line denoting semblance-based velocity picks and the white "+" signs denoting the maximum likelihood points from the proposed inversion workflow. The estimated posterior distributions of both events   Figure 10. We can see that in this example the results from both approaches generally agree well with each other despite relying on different theoretical bases (signal coherency versus traveltime-only inversion) to estimate W. This result gives further assurance to the estimated parameters. Nevertheless, we note that using the proposed workflow, we obtain posterior distributions that contain additional statistical information for more assessment and quality control of the estimated moveout parameters (only W in this case) and the input data uncertainty (S % ).

3D synthetic model with inverse moveout
Turning to a 3D case, we consider an example in which the input CMP gather is generated from the inverse moveout process similar to the previous 2D case but now with the 3D GMA in equation 4. All of the pertinent parameters W i , A i , B i , and C i are nonzero and can represent the case of an orthorhombic subsurface with azimuthally rotated principal axes with respect to the local Cartesian coordinates. The inputs of the time-warping workflow for this experiment have previously been used in our explanation on the method and are shown in Figure 1. We purposefully specify three CMP events at t 0 ¼ 1, 1.5, and 2 s, which correspond to the maximum offset-to-depth ratios of approximately 2.326, 1.789, and 1.5 in the x-direction and of 2.474, 1.876, and 1.5625 in the y-direction reflecting our choice of W 1 and W 3 , respectively. According to the general rule of thumb of offset-to-depth ratio > 1 for nonhyperbolic moveout analysis (Alkhalifah, 1997;Tsvankin, 2012;Thomsen, 2014), we anticipate that our inverted posterior probability distributions for A i from the topmost event should demonstrate the best performance. We also add Gaussian noise with S % ¼ 2.5 (the top event), 2.75 (the middle event), and 3.0 (the bottom event) to the input traveltime data from automatically picked moveout curves. Because we are dealing with a more challenging 3D gather, we anticipate that the automatically picked moveout curve will no longer be error-free and, therefore, the final estimates of S % would reflect the total data uncertainty from the artificially added noise and the automatic picking process.
In this example, we specify the minimum and maximum values for the prior distributions as shown in Table 2    in the form of histograms in Figures 11, 12, 13, 14, 15, and 16. It is apparent from Figure 11 that the W i are estimated well with relatively high D KL values.
We proceed with the second inversion run with all available data and the resulting W i from the first run. The inverted A i from this experiment require more detailed analyses. First, we emphasize that the posterior distributions of A i appear more diffuse and multimodal (Figure 12). This observation suggests that there are many equivalently good values of A i that can lead to an accurate prediction of the moveout traveltime, which indicates the inherent nonuniqueness of the moveout inversion problem based on traveltime data alone. Moreover, it is not obvious that having a larger offset-to-depth ratio (Figure 12a associated with the topmost event) helps to mitigate this nonuniqueness of possible solutions.
A closer look at Figures 11a reveals that the peak of the posterior distribution of W 1 is slightly higher than that of the true value, whereas that of W 3 is slightly lower. This becomes less obvious in Figure 11b and 11c, which corresponds to smaller offset-to-depth ratios. One explanation of this behavior can be given by considering Figures 11a and 12a together. In Figures 12a, the maximum likelihood values of the posterior A i also slightly deviate from the true values. Therefore, similar to the previous to 2D case, there appears to be a trade-off between inverted W i and A i on top of the nonuniqueness of possible solutions, which seems to be noticeable with data from larger offset-to-depth ratios. This observation is supported by the 2D marginal posterior   distributions of W i and A i shown in Figure 13, where we can observe clear correlations between the W 1 − A 1 pair and the W 3 − A 5 pair. A decrease in the estimated W 1 (W 3 ) correlates with an increase in the estimated A 1 (A 5 ). A lesser degree of correlation can also be seen from W 2 − A 2 and W 2 − A 4 pairs, where a decrease in W 2 correlated with an increase in A 2 and A 4 . Correlations among other pairs of W i and A i are not observed from this experiment. Note that we display the 2D marginal posterior distributions from the bottom event at 2 s in Figure 13. This is done to facilitate the visualization of possible correlations because the associated posterior distributions are more diffuse. However, our observation remains valid and applicable to other events because this trade-off behavior is a consequence of our inversion configuration with the 3D GMA. Analogous to the estimated A i , the resulting distributions of B i ( Figure 14) and C i (Figure 15) are also fairly diffuse and multimodal. Therefore, similar conclusions can be drawn regarding the nonuniqueness of possible solutions. However, similar to the 2D case, we emphasize that both parameters, whose values depend on the choice of finite-offset rays, are designed to be dynamic in the 3D GMA and this behavior is to be expected. Figure 16 shows the resulting posterior distributions of S % for the three events. We can observe that the peak estimates of S % are slightly higher than the level of added Gaussian noise, agreeing with our expectation that the automatically picked moveout curve for this 3D gather contains a small degree of uncertainty, which can be captured and quantified in our workflow.
We emphasize that the above analysis is permissible due to the use of a Bayesian framework for the inversion that gives posterior probability distribution as a result. Together with our analysis from the previous 2D inverse moveout example, our results suggest a limitation of using moveout traveltime inversion to estimate quartic coefficients, which becomes more difficult in 3D with a higher number of required parameters associated with a more complex subsurface model (e.g., orthorhombic media).

SEAM Phase II unconventional reservoir model
For a more complex example, we consider a CMP gather from the 3D SEAM II unconventional reservoir model shown in Figure 17 (Oristaglio, 2015). This model has a largely flat structure suitable for moveout analysis and includes unaligned orthorhombic sub- Figure 12. A comparison of estimated posterior probability distributions for A i in the form of histograms for events at (a) 1, (b) 1.5, and (c) 2 s of the 3D inverse moveout model. Note that they are more diffuse than the inverted W i in Figure 11. The solid vertical lines denote the true values.
layers that require 3D multiazimuth processing. We choose one CMP gather located at x ¼ 4.735 km and y ¼ 2.881 km, eliminate multiples, and bin to 39 × 39 offset samples from −3.8 to 3.8 km in xand y-directions. We note that the maximum depth of this model is 3.75 km; therefore, the offset-to-depth ratio of intermediate layers is moderately larger than unity. Figure 18 shows the considered CMP gather overlain by automatically traced events for nonphysical flattening. We specifically focus on two events at t 0 ¼ 1.315 and 1.815 s, respectively. The parameter ranges for prior distributions are given in Table 3, where we use the same notion as in the previous 3D inverse moveout case to choose the min and max values. We also assume that B i and C i are largely unknown and specify relatively large ranges for their values. The model parameters are recorded every 500 iterations until reaching a total of 10,000 models. The cutoff offsets for the first inversion are at 0.89 and 1.5 km for the two events, respectively. In this example, we choose to use 1D average values of moveout parameters to benchmark our results. These values are computed by taking the true model parameters at the considered CMP location and using them to compute effective moveout parameters appropriate for a stack of horizontally layered unaligned orthorhombic media (Sripanich and Fomel, 2016;Koren and Ravve, 2017). They are exact in the case of truly 1D media but represent good approximate values of moveout parameters in this case of mildly laterally hetero-geneous SEAM II model. We note also that in this example, the contribution from the quartic terms (A i ) as shown in their 1D average values are relatively small in comparison with that from the second-order terms (W i ). Therefore, this condition is expected to further hamper the parameter estimation process in addition to the inherent nonuniqueness of possible solutions.
Example posterior histograms for events at t 0 ¼ 1.315 and 1.815 s are shown in Figures 19, 20, and 21. It follows from the results that in this example of SEAM II, the proposed approach can capture W 1 and W 3 with clearly defined maximum likelihood values close to the 1D average values and a high value of D KL . However, results on the W 2 parameter that define the orientation of the NMO ellipse suggest that its value is centered around zero different from the 1D average value. The estimated posterior distributions of A i related to subsurface anisotropy exhibit the same characteristics as beforediffuse and multimodal, although several maximum likelihood values and the 1D average values fall within the same ballpark. Agreeing with our previous analysis on 3D synthetic example, these results suggest that it is possible to estimate the quartic moveout coefficients from a 3D gather associated with complex anisotropic media, although there is a limitation on the inherent nonuniqueness of solutions due to more diffuse and multimodal posterior distributions in comparison with W i , which are generally well constrained. Figure 13. 2D marginal posterior distributions of W i and A i from the 3D inverse moveout example for the event at 2 s that suggest correlations and trade-offs among inverted parameters. The rows denote W i , and the columns denote A i . Prominent correlations between the W 1 − A 1 pair and W 3 − A 5 pair can be observed.

Spiral-sorted (snail) gather
Instead of looking at 3D CMP gathers, Burnett (2011) shows that the time-warping workflow can accommodate gathers with other choices of sorting scheme, which may provide greater flexibility. This is due to the fact that in the time-warping workflow, one uses nonphysical flattening that is not restricted to any particular event shape. The algorithm merely functions based on aligning events between neighboring traces and will be able to produce meaningful results as long as the automatic picker (predictive painting) can follow the events from trace to trace.
The first and most intuitive candidate is the offset-sorted gather, in which traces corresponding to the same CMP position are not binned but are instead sorted and shown in 2D with increasing absolute offset. This leads directly to a practical advantage of requiring only 2D slope estimation as opposed to a 3D version. Additionally, neighboring CMP gathers can be offset-sorted and juxtaposed along the third axis for simultaneous processing with a 3D slope estimator and additional regularization across the CMP positions. Nonetheless, in this offset-sorting scheme, neighboring traces may be from completely random azimuths (Figure 22), which leads to a nonoptimal setup for capturing the azimuthal variations of moveout curves.
Alternatively, Burnett (2011) proposes that the choice of spiral sorting is particularly attractive because it holds a similar advantage of enabling the use of a 2D slope estimator, yet captures azimuthal information. An illustration of a map-view spiral-sorted gather in comparison with an offset-sorted gather is shown in Figure 22. We can see that spiral sorting the gather will allow the events to vary smoothly in the offset and azimuth, while maintaining a minimum physical proximity between the neighboring traces.
Using the same CMP gather of SEAM II data from the previous example, we spiral sort the gather instead of binning, which enables us to look at the total 3200 traces available at this particular CMP position. A subsequent 2D slope estimation, as opposed to 3D, is carried out, and the automatically traced events are shown overlaying the spiral-sorted gather in Figure 23. We use the same prior distributions as in the previous case (Table 3) and the same general setup by recording the model every 500 samples with a total of 10,000 models. Similar histograms of estimated posterior distributions for the same two events are shown in Figures 24, 25, and 26. In this particular example of SEAM II data, apart from saving computational cost by allowing 2D slope estimation, spiral sorting the gather appears to slightly improve the estimated posterior probability distributions in comparison with those from the previous section (Figures 19 and 20) some of the maximum likelihood A i are closer to the 1D average values. Similar general observations and conclusions regarding the nonuniqueness of possible solutions can still be drawn.

DISCUSSION
We first emphasize that one of the main advantages of timedomain processing is efficiency. Despite its well-known limitations in complex geologic environments (e.g., subsalt), it remains an attractive choice in some other important areas such as unconventional reservoirs, where the geology is largely layered (e.g., SEAM II model in Figure 17) and an expensive treatment of depth-imaging may not be necessary (Fomel, 2014). Our time-warping approach presented here further improves the current automatic time-domain processing workflow by proposing the use of highly accurate 3D traveltime approximations (equation 4) and adding a probabilistic component for a Figure 15. A comparison of estimated posterior probability distributions for C i in the form of histograms for events at (a) 1, (b) 1.5, and (c) 2 s of the 3D inverse moveout model. Note that they are moderately diffuse and sometimes multimodal similar to Figure 12. The solid vertical lines denote the true values. Figure 16. A comparison of estimated posterior probability distributions for S % in the form of histograms for events at (a) 1, (b) 1.5, and (c) 2 s of the 3D inverse moveout model reflecting the total uncertainty from the automatically picked 3D moveout curve and the artificially added Gaussian noise. The peaks in the distributions are slightly higher than the levels of added Gaussian noise for the three events at S % ¼ 2.5, 2.75, and 3, suggesting the small degree of uncertainty from the automatic picking process in a more challenging case of 3D gather.

Moveout analysis by time warping U13
better analysis of the inversion results. We show that these additions can easily be implemented, while maintaining the original goal of efficiency and automation. We note, however, that the degree of efficiency of our time-warping approach primarily depends on the number of sampled models, which must be tailored to any specific problem on a case-by-case basis.
In addition, even though we propose a workflow with two inversion runs to help obtain better prior distributions for W i by taking advantage of one of the characteristics of our problem (i.e., W i can be estimated with only small-offset data), our resulting posterior probability distributions of A i are still estimated relative to the choice of uniform priori probability distributions. Therefore, it remains possible to improve the proposed approach if a better knowledge on the prior distributions ρðmÞ is available. For example, one may choose to use, instead of a uniform prior, a Gaussian distribution over some known mean value and standard deviation. Such information may be obtained on the basis of a previous velocity model in the region or any nearby well measurement. A better characterization of the prior probability distributions will surely enhance the efficiency of Monte Carlo sampling for solutions to the moveout inversion problem.
We note that in practice, it is sensible to consider a large range for uniform prior distributions of different parameters. However, this may not be the best choice in view of computational efficiency. We emphasize that this decision has to be made on a case-by-case basis and one may choose to follow our method by using information such as relationships between GMA parameters and η in equation 13, as well as results from forward computational experiments with 3D GMA in various media (Sripanich et al., 2017) to help in coming up with a rough numeric range for each parameter.
Moreover, one could be tempted to use a different inversion sequence (e.g., estimate W i , A i , and S % before B i and C i ) with more runs to better constrain the results or to use a different parameterization as that in equation 4 to reduce the trade-offs between parameters. However, we would like to emphasize that A i , B i , and C i are inherently connected. This follows from the fact that to maintain the accuracy of moveout traveltime approximations, the quartic (x 4 ) term needs to have both the numerator (related to A i ) and the denominator (related to B i and C i ). Without a proper denominator, the series would lead to inferior accuracy (higher error) at large offsets (Tsvankin, 2012;Sripanich et al., 2017). Therefore, it is unclear how we can separately invert for A i , B i , and C i in different runs without introducing any potential bias to the results. One way to approximately handle this complexity is to assume a relationship among A i , B i , and C i such as that in equation 13, which effectively reduces the number of inverted parameters at the expense of the moveout approximation accuracy. The result from this experiment with our 2D inverse moveout example ( Figure 3) is shown in Figure 27, where we can observe that the maximum likelihood points of estimated W and η lie close to the true values. Nonetheless, the 3D counterpart of equation 13 is not yet established for complex 3D layered orthorhombic media with azimuthally rotated principal axes such as the SEAM II model. Figure 17. Vertical P-wave velocity of the SEAM II unconventional reservoir model. Table 3. Minimum and maximum values of moveout parameters that define the uniform prior distributions for our Monte Carlo inversion for the SEAM II example. In our inversion workflow, we propose to use the Metropolis-Hastings algorithm for its simplicity. There are more advanced guided random walk methods that can lead to better performance, and several of them are reviewed by Sambridge and Mosegaard (2002). Other choices of likelihood functions LðmÞ are also possible (Mosegaard and Tarantola, 1995;Tarantola, 2005). Nonetheless, it is expected that our observations on the characteristics of the resulting posterior probability distributions (i.e., multimodal and diffuse) should remain valid.
With regard to the resulting posterior probability distributions, it is clear that there is nonuniqueness in the moveout inversion based on traveltime fitting and trade-offs between the inverted parameters. Grechka and Tsvankin (1998b) show that the inverted nonhyperbolic moveout parameter -η in their caseis highly sensitive to the errors in reflection traveltimes and the spreadlength of available data. This observation agrees with our posterior probability distributions that can be multimodal and diffuse, which suggest that there are many equivalently good sets of possible moveout  Moveout analysis by time warping U15 parameters that satisfy the same level of accuracy in predicting reflection traveltimes. Moreover, the B i and C i in 3D GMA also remain undetermined and contribute to the nonuniqueness in the inverted models. Therefore, several factors contribute to the nonuniqueness of possible solutions and choosing which specific choice of representative model parameters to commit to is not straightforward. Nevertheless, the results obtained from our approach could be used as starting models for more sophisticated tomographic or full-waveform inversion techniques for greater accuracy and better convergence.
In the time-warping workflow, it is important to reiterate that we do not map local slopes to moveout parameters directly because that requires a prior assumption of the traveltime approximation form (Stovas and Fomel, 2016). Rather, we rely on measured slope fields and predictive painting to automatically trace CMP events and capture the moveout curves regardless of their shape. Consequently, conditioning of local slopes field is critical to the quality of the input traveltime data for the inversion. In light of this, it is possible to further improve the time-warping method by addressing potential problems that may arise from interfering dips (Schleicher et al., 2009) and predictive painting across discontinuities (Xue et al., 2018). Nonetheless, given that information on reflection traveltimes can be obtained, our proposed Bayesian inversion will produce posterior distributions of estimated parameters that reflect the level of data uncertainty.
We stress that there are many possible choices of moveout approximations that can be used. Each of them can achieve a different level of approximation accuracy and involves a different set of moveout parameters. The 3D GMA is chosen here primarily due to its convenient applicability to complex 3D anisotropic models (e.g., layered orthorhombic with unaligned vertical symmetry planes in SEAM II) and its high accuracy in traveltime prediction, which technically allows us to accurately invert for moveout parameters and relate them to subsurface properties. This comes at the price of working with a large number of moveout parameters (17). Other less accurate moveout approximations with fewer parameters can be used, but a poorer performance in traveltime prediction may also lead to more uncertainty in the inverted moveout parameters. Nevertheless, analysis has to be conducted on a caseby-case basis to draw meaningful conclusions regarding the tradeoffs between the accuracy of the traveltime approximation and the number of resolvable moveout parameters. Therefore, in general, it appears to be practicable to perform moveout inversion in a hierarchical manner starting from 3D GMA, which is applicable to a complex 3D model before reducing the number of inverted parameters based on the resolvability of the parameters (e.g., with an analysis on the posterior probability distributions and D KL ) or different assumptions such as a simpler anisotropic model of subsurface medium.
The estimated moveout parameters (W i and A i ) are well known to also be affected by lateral heterogeneity (Lynn and Claerbout, 1982;Grechka and Tsvankin, 1999;Blias, 2009;Sripanich et al., 2019). Only when the subsurface is laterally homogeneous (1D) can the moveout parameters directly be related to medium parameters. In our approach, we do not specify their expressions explicitly in terms of medium parameters and we treat them as "effective" moveout parameters that are estimated based on the 3D GMA. Interval parameter estimation can subsequently be accomplished by layer stripping given that the medium is assumed to be 1D (Sripanich and Fomel, 2016;Koren and Ravve, 2017). . Map-view setups for the offset-sorted gather (left) and spiral-sorted gather (right). The different numbers indicate different traces that have to be sorted from the smallest to the largest. Azimuthal information in the former is arranged inconsistently because the absolute offsets are equal for all four traces and the order in which these four traces will be arranged is random. However, in the latter, the azimuthal information can be traced smoothly from small to large radial offsets, which enhances continuity and stability for the slope estimation process.  Most importantly, regardless of which moveout approximation one uses, the time-warping workflow nonetheless remains the same. The final estimated models are expressed in the time (t 0 ) domain, which can be related to the depth model via time-to-depth conversion (Cameron et al., 2007;Li and Fomel, 2015;Sripanich and Fomel, 2018).
Finally, we note that alternative traveltime approximations such as various forms of common-reflection-surface (CRS) approximations exist, which consider traveltime variations in the offset and midpoint directions (e.g., Jäger et al., 2001). In general, CRS methods aim to achieve better stacked images and objective functions for global CRS parameter estimation are normally considered based on coherency measurements such as semblance (e.g., Barros et al., 2015;Bloot et al., 2018;Garabito, 2018). An alternative estimation of CRS parameters can also be performed by mapping directly from local slopes and curvatures (Santos et al., 2011;Waldeland et al., 2018). In view of the proposed approach, inversion-based CRS parameter estimation methods can potentially benefit from a Baye-  sian formulation, which will lead to statistical posterior distributions of the estimated parameters. Nonetheless, in this study, we restrict ourselves to the CMP domain and the corresponding problem of parameter estimation from reflection moveout only in the offset direction. We focus on investigating the problem of 3D anisotropic model building at the earliest stage of processing from reflection moveouts. As demonstrated in our examples, the proposed framework is relatively flexible and can be used to bring additional statistical information on inverted parameters at any selected locations (reflectors). We anticipate that this method could represent an auxiliary tool that can help assess the quality of inverted moveout parameters from other automatic estimation methods.

CONCLUSION
We propose a general workflow for moveout processing and inversion by means of time-warping. The method achieves event flattening and moveout parameter estimation in an automatic and efficient manner with minimal user inputs. We show that our method doesn't rely on multiparameter scans but instead on data-driven slope measurements. Automatic event flattening can be achieved with the measured slopes and produces the information on moveout traveltimes for all corrected CMP events. Such information can then be used in a moveout inversion process for estimating effective parameters associated with any preferred choice of moveout approximation. Due to the cheap traveltime modeling with moveout approximation, we choose to solve this inverse problem using a global Monte Carlo optimization method. Using synthetic and real data examples, we demonstrate that the resulting moveout parameters estimated using our approach are represented by posterior probability distributions that contain valuable statistical information. Unlike the results from other estimation methods such as semblance-based picking and least-squares inversion, the posterior probability distributions are unbiased toward any particular solution but represent the entire space of possible solutions and their associated probability of occurrence. This information can be used in conjunction with results from other estimation methods for better assessment and quality control of estimated parameters associated with any selected reflectors. Our results also suggest that the solutions to the problem of moveout inversion from traveltime fitting for 2D CMP gathers are generally well constrained with only a few parameters to estimate despite a trade-off between the second-order W and quartic moveout coefficient A. On the other hand, for 3D CMP gathers associated with a complex anisotropic subsurface, the solutions appear nonunique and there are trade-offs among the inverted parameters, especially the quartic coefficients (A i ). Only W i parameters that govern the NMO ellipse seem to be generally well constrained by the traveltime data. Further limitations on the data such as noise and sparse measurements can hamper the inversion and can even lead to inferior results as shown by more diffuse estimated posterior distributions at a high level of data uncertainty. Therefore, one has to be mindful to draw conclusions on representative model parameters to be used in subsequent stages of processing and imaging, especially for 3D anisotropic models.