Seismic inversion with deep learning

This article investigates bypassing the inversion steps involved in a standard litho-type classification pipeline and performing the litho-type classification directly from imaged seismic data. We consider a set of deep learning methods that map the seismic data directly into litho-type classes, trained on two variants of synthetic seismic data: (i) one in which we image the seismic data using a local Radon transform to obtain angle gathers, (ii) and another in which we start from the subsurface-offset gathers, based on correlations over the seismic data. Our results indicate that this single-step approach provides a faster alternative to the established pipeline while being convincingly accurate. We observe that adding the background model as input to the deep network optimization is essential in correctly categorizing litho-types. Also, starting from the angle gathers obtained by imaging in the Radon domain is more informative than using the subsurface offset gathers as input.

Comparative diagram. Top: the baseline seismic inversion approach starting from an imaging process, followed by estimating the wavelet at the well, a time-consuming elastic inversion, and finally litho-type classification from estimated elastic parameters, followed by the verification step. Bottom: the proposed deep learning approach that aims at solving the seismic inversion problem in one step. We start from a local imaging process, where we use either subsurface-offset gathers or angle gathers. These together with a background model obtained from the well-log, form the input of a deep network classifier which predicts litho-classes to be verified approach known as Naive Bayes [21], which assumes independence between the variables, is easy to implement and provides good results. In this work, we apply both the full Bayesian and the Naive Bayes approach as baselines.
Here, we consider a set of deep learning methods inspired by image segmentation and image classification literature [20,23]. We compare the considered deep learning methods with an advanced elastic inversion approach, performing a non-linear elastic inversion based on a local full-waveform inversion (FWI) in the tau-p domain, which provides an estimate of the elastic parameters in the reservoir area [15]. These elastic parameters form the input of a Bayesian inference process to determine the litho-types. We evaluate the estimated litho-types against the petrophysical analysis, here referred to as the ground-truth labeling. Throughout this paper, we call this process the baseline approach, depicted in Fig. 1 (top). As an alternative approach to this, we propose a one-step algorithm, which bypasses the individual steps of elastic inversion and litho-type classification. In the proposal, as in the baseline approach, we start from the seismic data. We perform an imaging process to obtain either angle gathers [25], or alternatively, subsurface-offset gather responses [29]. Note that the subsurface-offset gathers implicitly store angle-dependent information in the offset data. These responses represent the input of a deep-learning pipeline that maps them directly to corresponding litho-types. To solve the inversion problem without ambiguity, we also use background models as prior information and input these into the considered deep-learning pipeline. Figure 1 (bottom) depicts the deep learning proposal. The method for obtaining background models is similar in both the baseline and the deep learning approach. These background models represent prior geologic knowledge obtained from typical seismic velocity analysis. For the purpose of this paper, we create a synthetic earth model based on the "Book Cliffs" geological model, derived from the Book Cliffs outcrop in Colorado, Utah (US). We use this model to evaluate the deep learning approach and the baseline approach. In both cases, we evaluate the predicted litho-types by comparison against the ground-truth labeling. Although a direct comparison between the baseline and the deep learners is not possible due to the nature of the data processing, we evaluate both approaches as transparent as possible.

Naming conventions
Throughout the paper, we refer to "model" as the earth geophysical model. We use the names "learner" or "classifier" for the machine learning methods and in specific the deep learning networks. We denote the litho-types predicted by the learners as "labels" or "targets". We use the terms "classes" and "litho-classes" to refer to the litho-types discriminated by the considered classifiers.

Baseline: FWI and Bayesian classification
For the baseline approach, we carry out the estimation of lithology from seismic data in two steps. Initially, we invert the global seismic data to estimate the elastic properties, followed by statistical classification of lithology using inverted elastic properties and well logs [1]. In the present work, we use local 1.5D elastic full-waveform inversion (FWI) to invert the seismic data, and we map the lithology using a Bayesian classification method [10].

Data description for the baseline
The dataset used in this article is based on the "Book Cliffs" geological model, derived from the outcrop in Colorado, Utah (US). The Book Cliffs model was initially built tetetyukhina2014acoustic to represent a realistic reservoir layer containing 8 different litho-types. This model was downscaled by [11] and changed to have improved geologic structures that contain 12 litho-types with unique v p (pwave velocity), v s (s-wave velocity), ρ (density). Typically, in lithology prediction, a rough separation of litho-types into 3-4 groups is defined based on physical properties of rocks [34]. In our work, we group these 12 litho-types together to represent 4 different classes, each with distinct properties, and increasing clay content: 'coal', 'sand', 'sandy-shale', and 'shale'. Based on the lithological model, we generate the model for the compressional (p-wave) velocity, shearwave (s-wave) velocity, and density of the rock (v p , v s , ρ) and upscale it to the seismic frequency (60 Hz) using Backus averaging [2]. The upscaling process brings a spread in the v p , v s , ρ, as shown in Fig. 2. It is evident from Fig. 2 that the litho-classes correspond to well-separated ranges of values. Although in reality, the classes are not well-separated, the Bayesian methods work by calculating posterior probabilities for every class and assigning the class with the highest posterior probability. If the data is highly overlapping, different classes may have similar posterior probability, thus greatly affecting the classifier's performance. Therefore, here we consider a more idealistic case. In our synthetic experiment, we define a ground-truth model that contains the three elastic properties of rocks: v p , v s , and ρ measured at different well locations, and interpolate these spatially. Figure 3 shows the sampling of the locations along which we define the three data subsets we use for training, validating, and testing our classifiers.

Full-waveform inversion
The inversion scheme applied in this paper was developed by Gisolf and an den Berg [16] and more extensively described by Gisolf, Haffinger, and Doulgeris [17]. The method has been applied in various studies for the purpose of reservoir-oriented seismic inversion for synthetic data [12]. This inversion scheme has been shown to provide better quantitative images of the subsurface for both synthetic and real-field data [13,26,27]. The inversion uses a wave equation based approach on pre-stack AVO (amplitude versus offset), or AVP (amplitude versus rayparameter) information, and it solves locally the 1.5D full elastic wave equation, in conjunction with inverting for the elastic parameters: compressibility (κ), shear-compliance (M) and density (ρ).
For the application of the inversion scheme on our considered Book Cliffs model, we generate synthetic data in the tau-p domain over this model. For this, we use the Kennett invariant embedding method [19], for 10 different ray-parameters, or horizontal slowness parameters. The highest ray parameter is such that on the outermost trace the maximum angle of incidence is 42 degrees. We make this choice because in real data we never have angles that  Inverted shear-compliance (M). We use full-waveform inversion to invert for the elastic properties are more than 40 degrees, and we know that FWI can suffer from missing angles. This allows us to test the limitations of the learning approaches in terms of missing angles.
For the modeling, we use a zero-phase band-pass wavelet with a maximum frequency of 60 Hz. Figure 4 shows the synthetic data after inversion. The inversion reconstructs the general structure of the lithofacies model, however, it fails to resolve the thin coal seams. In addition, in some places, the lateral continuity is missing. This may be due to the application of the inversion on individual common midpoints (CMP's) independently. An accurate estimation of density requires a broad angle range (> 45 degrees). Given that the synthetic modeling contains less than 45 degrees, we did not invert for density.

Full-waveform inversion details
In this study, we use an inversion based on the wave equation on pre-stack AVO, or AVP. We solve the 1.5D full elastic wave equation locally, in conjunction with inverting for the elastic parameters: compressibility κ and shear compliance M, or their inverse: bulk modulus K and shear modulus μ. If data quality permits, also density ρ can be inverted for.

FWI parameterization
In this inversion scheme, the outputs are the contrast in shear compliance M = 1/μ (μ is the shear modulus), compressibility κ = 1/K (K is the bulk modulus) and bulk density (ρ). The properties actually inverted for are the normalised relative contrasts of the absolute properties against very smooth background properties (κ 0 , M 0 , ρ 0 ). The background medium is a very important aspect of the inversion as the incident wave field and Green's function are calculated in this medium. It has to be smooth because we rely on the WKB (Wentzel-Kramers-Brillouin) approximation, which is only valid in smooth mediums. Additionally, it should be a smooth heterogeneous background medium that is non-reflective over the data bandwidth. On the other hand, one would like to have as much information as possible in the background, because it represents the starting model for the inversion, and to keep the contrasts χ as low as possible to reduce the non-linearity of the problem. However, for the current state of the art, we need to keep the backgrounds non-reflective over the bandwidth of data. Usually, low wave-number backgrounds are derived from well logs and interpolated between well logs. That is another reason why the background should have only a low wave-number content.
Forward modelling The forward modelling in the inversion is based on the scattering approach for calculating wave propagation in inhomogeneous elastic mediums. This makes use of the integral formulation of the wave equation. For the purpose of matching it to the observed data, we use the data equation, which is a subset of the full integral equation, or the object equation. We show the data equation and object equation for the simple single parameter acoustic case: where G(·) is Green's function, x r and x s are the receiver location and source location, respectively, ω is the frequency. The integral over x is an integral over the whole object domain. Equation 1 predicts the data recorded at the surface P data in terms of the wave field transmitted by a source that propagates to every point in the subsurface. The wave field is transmitted back from the points where the contrast χ is non-zero to the surface through the smooth background medium. The contrast functions χ are: where c(x) is an unknown subsurface acoustic wavevelocity model and c 0 (x) is the known background medium.
On the other hand, the object domain equation predicts the total wave field at each grid point in the subsurface: Eq. 3 can be used to estimate the total wave field with all its complex propagation, given the contrast function is known. Equation 3 can be substituted in Eq. 1 to obtain the recorded seismic data in terms of subsurface properties.

Optimization Scheme
The inversion is an iterative process where the linearized inversion of the recorded data is alternated with the re-calculation of the total wave field in the object domain (Fig. 5). The inversion kernel and the

Fig. 5 Iterative inversion scheme for Full-waveform Inversion.
It consists of two loops, an inner loop that inverts for the elastic parameters for a given wave field in the object domain, and an outer loop that updates the estimated wave fields by adding a higher order of scattering after every inner loop iteration re-calculation of the total wave field are based on the full elastic wave equation. These are carried out such that every re-calculation brings in a higher order of multiple scattering in the modelled data. Optimization is needed to ensure nondivergence of the field updates. In the context of the iterative inversion scheme, Eqs. 1 and 3 are solved alternatively for the elastic case. The process is augmented by using the Born approximation, where the incident wave-field propagating in the background medium is subjected to a simple linear inversion to estimate the approximate subsurface properties. These approximate subsurface properties are then used to update the total field in the domain using equation Eq. 3. This process is repeated until the estimated subsurface properties and the updated total field do not change anymore.

Bayesian classification
Our considered baseline approaches follow [4,31] and rely on a full Bayesian classifier and a Naive Bayes classifier. For classification, one is interested in calculating the class C j (litho-types) probabilities, given the elastic properties X (observed data), and assigning the class with the highest posterior probability: where P (C j | X=x) is the posterior probability of the j th litho-type given the elastic parameters. P (C j ) is the prior litho-type probability, and P (X=x | C j ) is the likelihood of the elastic property ( X=x ) to be from the j th litho-type.
is the probability of the elastic property. If we have n elastic properties for a given depth sample, the posterior in Eq. 4 becomes: Eq. 5 provides the full Bayesian treatment by including the correlation among all elastic properties, giving rise to a multivariate distribution conditioned on the classes. We estimate this posterior from the training data by using kernel density estimation (KDE) with Gaussian kernels. A simplified approach of Eq. 5 is the Naive Bayes classifier, which assumes independence among elastic properties: where m is the number of litho-types. The above posterior only requires evaluating univariate distributions conditioned on the classes. As before, we estimate these from the training dataset using KDE with Gaussian kernels. We estimate the target priors P (C j ) by counting the number of instances for each target in the training dataset and normalising this to 1. For the full Bayesian treatment we estimate a bivariate density (P (κ, M)), whereas for the Naive Bayes we obtain univariate distributions (P (κ)P (M)). We do not expect a significant difference between the full Bayesian and Naive Bayes classifiers because, in our case, we consider only two inverted elastic properties: compressibility (κ) and shear compliance (M). Figure 6 shows the estimated 2D probability densities for the four litho-type classes.

Seismic litho-class estimation with deep learning
For the deep learning approaches, we employ the same "Book Cliffs" geological model described in Section 2.1. This is a single model corresponding to a single ground truth lithology output. Deep networks require numerous training examples (models) with their associated labels (ground truth lithologies). To overcome this bottleneck we propose to process the data as described below.

Data description for deep learning
Starting from the same geological model as for the baseline, we create a large number of samples to be used for training the deep-learning classifiers. The dataset creation process depicted in Fig. 7 is as follows: we construct snippets of subsurface models from a range of possible models that we expect in our geology. A snippet is a very small piece of 1D model (say 150 meter) containing a few layers of elastic parameters. Each snippet is extended by including an overburden, creating a realistic-scale 1.5D model. However, the overburden has smooth velocity variations. Subsequently, we generate a seismic response and apply the backpropagation and imaging process to arrive at the local subsurface gathers, either in the linear Radon domain or in the subsurface offset domain. These local responses are used as input for the training stage. By this process, all subsurface responses have natural limitations from the overburden and surface acquisition parameters (aperture, seismic source signal, etc.). (1) Book-Cliffs angle: We preprocess the seismic data corresponding to each location along with the snippet, via the linear Radon domain, to obtain the angle gathers [9,33]. Note that we refer to these as "angle gathers" while in principle they are horizontal ray parameter gathers. (2) Book-Cliffs offset: We define subsurface-offset gathers as cross-correlations between the forwardpropagated source fields and the backpropagated seismic responses [6,29]. This is done via depth migration, and specifically in our case WEM (Wave Equation Migration) using one-way recursive propagation operators, and applying a subsurface offset imaging condition at each depth level.
The considered deep-learning classifiers use the background model as input as well as a dataset-specific input: angle gathers or subsurface-offset gathers. The true litho-classes represent the learning targets. We sample the three data subsets: training, test, validation at the same locations as for the baseline classifiers (Fig. 3), where each location contains ≈ 1, 900 data tuples composed of: background model, imaged seismic inputs, and litho-classes.

Deep learning approaches
For deep learning methods, the few hundred samples available represent a relatively small dataset size. Because of this, we consider a set of small network architectures, selected based on their popularity for visual classification tasks resembling our litho-type classification task. LeNet [20] is a standard, commonly-used architecture for classification on small datasets, while UNet [23] is the most popular segmentation architecture to date. The lithology classification implies predicting a class label at every input location which is similar to image semantic segmentation. However, our outputs are 1D vectors of litho-types rather than 2D segmentation maps. We experiment with the following architecture variations: (a) LeNet-2D: a 2D network inspired by the popular LeNet-5 architecture [20], as depicted in Fig. 8.(a); (b) LeNet-1D a 1D version of the same network depicted in Fig. 8.(b), where the 1D convolution is performed across the angles when using the angle gathers, and across the offsets when using subsurface-offset gathers; (c) UNet-2D a light-weight version of the network  [20] where the background and input seismic data are processed on separate branches; (b) LeNet-1D is similar to the LeNet-2D but using only 1D convolutions; (c) UNet-2D inspired by [23] where the seismic inputs and the background models are concatenanted at input level; (d) UNet-1D is similar to the UNet-2D while performing 1D convolutions. The yellow blocks represent the inputs/outputs of the learners, the blue blocks are convolutions, and the gray blocks are fully-connected layers proposed by [23], shown in Fig. 8.(c); and (d) UNet-1D a UNet-inspired [23] 1D network displayed in Fig. 8.(d), where again the 1D convolution is performed across the angles / offsets dimension. Given the limited depth of the networks, for all architectures we use filter sizes of 13 × 13 px, 13×1 px respectively, to increase the receptive field size. We feed into the networks the input data and the background model and predict the associated ground-truth litho-classes. In the LeNet-like architectures, we add the background model via fully-connected layers into the network and concatenate its output with the downstream network features. The UNet-like networks are fullyconvolutional, thus we concatenate the background model to the seismic data as input to the network. For the 2D UNet we replicate the background model spatially to match the resolution of the seismic data.

Experimental analysis
We evaluate the considered deep network architectures: LeNet-2D, LeNet-1D, UNet-2D and UNet-1D on the two versions of the Book Cliffs data: Book-Cliffs angle and Book-Cliffs offset. We compare these results with the baseline: implementing the FWI (full-waveform inversion) followed by a full Bayesian classifier or Naive Bayes classifier, respectively. All the considered methods are trained on the same training data split and evaluated on the same validation/test split, sampled at the locations in Fig. 3.

Baseline results
The baseline lithology classification starts from the inverted elastic properties shown in Fig. 4. We apply the full We evaluate also the added value of the background models. The results indicate that starting from the angle gathers (Book-Cliffs angle), is slightly more beneficial, especially for the fully-convolutional networks. Overall, the UNet classifiers on Book-Cliffs angle achieve the best generalization Bayesian and Naive Bayes classifiers to these inputs. Figure 9 shows the predictions made on common-midpoint 200 for the full Bayesian and Naive Bayes classifiers, along with the true lithology. The average accuracy of both classifiers is ≈70-75%, however, Naive Bayes outperforms the full Bayesian classifier for some targets, especially for the 'shale' and 'sandy-shale' classes. The low accuracy can be attributed to the imperfect inversion results used as a starting point for the lithology classification. The true parameters versus inverted parameters are shown in Fig. 9 (left).

Deep learning results
To bypass the inversion step in the baseline method, we use deep networks trained on the training sets of our two synthetic datasets: Book-cliffs angle and Book-cliffs offset.
We evaluate on the corresponding test sets, where we set the hyperparameters using the validation sets. All deep networks are trained for 500 epochs, using the standard Adam optimizer and batch sizes of 64 samples. We used a scheduled learning rate starting from 0.003 and reduced at epochs 100, 200 and 300, and a weight decay of 0.0001. We standardize the input data: both the imaged seismic data and the background models, by making it zero mean and unit standard deviation, using training set statistics. Table 1 shows the classification accuracies of the deep learners: LeNet-1D, LeNet-2D, UNet-2D and UNet-1D when compared to each other and with the baseline methods. We also report the performance of the networks with and without the background models, as well as just using the background models. For all considered classifiers, we measure performance as class-normalized accuracies. For the deep learning classifiers, we also report standard deviations over three runs and the number of parameters. The two LeNet inspired networks perform on par in terms of accuracy, while the UNet methods are more accurate on both datasets, both with and without background models. Adding the background models as input to the network is beneficial for all architectures, and more specifically for the UNet architectures where it gives a 13-16% improvement in accuracy. Compared to each other the UNet networks using 1D versus 2D inputs perform approximately on par when considering the variance of the learners across different initializations. The advantage of the UNet methods may come from both having a fully convolutional network incorporating translation invariance, and from concatenating the background models earlier on in the network rather than at the end through a fully-connected layer. The deep network architectures obtain competitive performance when compared to the baseline approaches, however, it is difficult to reach a strong conclusion, as the seismic data is processed differently between the baseline  Results on the Book-Cliffs angle test set. We display the true litho-types versus the predicted litho-types. Each color represents a litho-type. We can see the predicted litho-types follow the true litho-types, which indicated that the network is able to generalize well on our considered data and the deep-network methods. We, additionally, test using only the background models, and we find the deep networks are effective at extracting the correct lithology class from only the input blurred version of the elastic properties. This may be due to the well-known ability of deep learning methods to perform input deblurring. Figure 10 shows the confusion matrices for the two baseline approaches and for the deep learning methods trained on Book-Cliffs angle. In the Bayesian approaches, 'sand' seems to be often misclassified as 'shale', and 'coal' as 'shale', while the LeNet-like networks display a larger confusion for the class 'coal'. For the UNet-1D deep network, the errors are different: the most common misclassification is 'coal' as 'sand'. Although we train and evaluate at the same locations in the Book Cliffs model, these confusion matrices cannot be directly compared between the baselines and the deep networks, because the methods process the seismic data differently. We report accuracy over 3 repetitions on the validation set for both the Book-cliffs offset and Book-cliffs angle datasets. A constant background across locations is detrimental to the classification, while the more informative the background is, the higher the accuracy. Figure 11 visualizes true and predicted litho-types over the test set of Book-Cliffs angle when using the UNet-1D learner. Each color represents one out of the 4 lithotypes: the predicted litho-types closely resemble the true ones. The explanation for this exceptional performance is that previous work shows that a shallow network can successfully approximate the Zoeppritz equations used in the FWI [14], and the FWI is an iterative process adapting a set of parameters to a data fit, which is similar to the training procedure of artificial neural networks. Additionally, our data does not contain neighboring seismic interferences and relies on informative background models. We conclude that on this specific dataset the UNet-1D network can generalize well. However, we do not know how well these classifiers generalize to real seismic data, which is more challenging, and when using less informative background models.

Importance of background models
Here we test the importance of the background models in the deep learning methods. We consider four scenarios: using no background, using a constant background obtained by keeping only the background values at training location CMP-100, and a background defined as a single median v p , v s , ρ value of the already blurred background per snippet, as well as the blurred background. We report validation accuracies over 3 repetitions for both the Book-cliffs offset and Book-cliffs angle datasets.
Deep networks are well-known to be able to perform input deblurring, therefore the best results are obtained when using the blurred background. Interestingly, in Table 2, using a constant background is detrimental to the lithology classification especially for the deep learning methods where the background is processed earlier on in the network such as the UNet methods. Moreover, for the UNet-2D the background is replicated to match the input data dimensions thus the network focuses more on the background, and if this background is not informative the accuracy suffers greatly. Just using a median value of the blurred background allows the network to find better solutions when compared to not using any background or using an uninformative background such as the constant background. For the deep networks the background plays a stronger role than for the standard baselines, and therefore a more informative background greatly improves results. LeNet-based architectures display a strongly overfitting behavior where the training loss is nearly 0, while the validation loss is increasing. The UNet-based networks are less prone to overfitting

Discussion
There are a number of limitations to our current work. The first one is the scarce amount of training data, which entails the risk of having the deep learners overfit [18]. Figure 12 shows the training curves for the four considered deep networks. The LeNet-based networks largely overfit on the training data, while the UNet-based networks seem to be less prone to overfitting. This could be due to the reduced number of parameters compared to the LeNetbased learners and also to the use of skip-connections allowing the learners to replace parts of the network with an identity mapping. A possible solution to overcome this is using multiple earth models and creating a variety of synthetic datasets to train on. However, this is an involved task that requires strong geophysical expertise. The second drawback is that the snippet-based simulation process for creating the synthetic datasets -Book-cliffs angle and Book-cliffs offset -covers only an ideal scenario where the seismic responses do not suffer from large interference caused by responses coming from neighboring layers. Also, the background models that we used as priors are blurred versions of the ground truth, which greatly helps in solving the classification problem.

Conclusion
In our experimental analysis, we compare two networks inspired by the popular LeNet-5 [20] architecture, and two fully convolutional CNNs inspired by the UNet architecture [23]. We compare these learners over two variants of our data where the seismic responses are imaged either in the Radon domain or in the subsurface-offset domain. Results indicate that the fully convolutional networks, where the background models are input earlier on in the network, can more accurately solve the classification problem. Moreover, we also find that preprocessing the data in the Radon domain improves the classification. The discussed approaches use different processings of the seismic data: the deep learning classifiers rely on snippets cropped along the depth direction in the model, while the baseline uses the complete model. This means that the results presented here should be carefully considered, as a direct comparison between the baseline method and the deep-learning-based methods is not possible. A more rigorous test of the deep learning methods would be to apply them to real seismic data. The lack of publicly available, large, seismic datasets including well-logs and associated litho-type information suitable for litho-type classification inhibits progress in the field.