Efficient probabilistic inversion for induced earthquake parameters in 3D heterogeneous media

Abstract. We present an efficient probabilistic workflow for the estimation of source parameters of induced seismic events in three-dimensional heterogeneous media. Our workflow exploits a linearized variant of the Hamiltonian Monte Carlo (HMC) algorithm. Compared to traditional Markov chain Monte Carlo (MCMC) algorithms, HMC is highly efficient in sampling high-dimensional model spaces. Through a linearization of the forward problem around the prior mean (i.e., the "best" initial model), this efficiency can be further improved. We show, however, that with this linearization the output of an HMC chain strongly depends on the quality of the prior, in particular because not all (induced) earthquake model parameters have a linear relationship with the recordings observed at the surface. To mitigate the importance of an accurate prior, we integrate the linearized HMC scheme into a workflow that (i) allows for a weak prior through linearization around various (initial) centroid locations, (ii) is able to converge to the mode containing the model with the (global) minimum misfit by means of an iterative HMC approach, and (iii) uses variance reduction as a criterion to include the output of individual Markov chains in the estimation of the posterior probability. Using a three-dimensional heterogeneous subsurface model of the Groningen gas field, we simulate an induced earthquake to test our workflow. We then demonstrate the virtue of our workflow by estimating the event's centroid (three parameters), moment tensor (six parameters), and the earthquake's origin time. Using the synthetic case, we find that our proposed workflow is able to recover


Introduction
The need to understand earthquake source mechanisms is an essential aspect in fields as diverse as global seismology (Ekström et al., 2005), oil and gas exploration (Gu et al., 2018), hazard mitigation (Pinar et al., 2003) and space exploration (Brinkman et al., 2020). In its simplest form, an earthquake source can be described, from a physics point of view, by means of a moment tensor (MT) (Aki and Richards, 2002). An MT captures displacement, (potential) fault orientation, and the energy released during an earthquake. In a regional seismology context, MT inversions can provide insight into seismic afterslip patterns of megathrust earthquakes (e.g., Agurto et al., 2012). In cases where seismic activity is induced by anthropogenic subsurface operations, characterizing seismic sources may also prove essential (e.g., Sen et al., 2013; Langenbruch et al., 2018). With regard to oil and gas exploration, earthquake source mechanisms are often monitored when hydrocarbons are extracted or when fluids are injected into the subsurface (e.g., for fracking). In fact, such monitoring can be used to assess and mitigate the risk of ongoing injection processes activating existing faults (Clarke et al., 2019).
For the purpose of monitoring induced seismicity, arrays of seismometers can be installed over the exploration area. The waveforms recorded by these seismometers can subsequently be exploited to characterize the induced events. For example, the time of the first arrival (typically the direct P-wave) is sensitive to the earthquake hypocenter and origin time. There are many inversion algorithms that exploit first arrivals to obtain estimates of earthquake hypocenters and origin times, such as the double-difference (Waldhauser and Ellsworth, 2000) and equal differential time (EDT) (Lomax, 2005) algorithms. However, to estimate MTs, it is insufficient to use only (first-arrival) travel times. In this study we therefore develop a workflow that utilizes full waveforms as input. Importantly, we pair the workflow with a probabilistic inversion algorithm.
Each combination of a specific inversion algorithm and a specific subsurface model has its own advantages and disadvantages, also in terms of computational efficiency. In general, the main advantage of using a probabilistic approach is that the output does not consist of a single set of (source) model parameters that minimizes an objective function, but of the posterior distribution (see, e.g., Tarantola, 2006) of the desired earthquake parameters. Probabilistic approaches, however, are computationally significantly more expensive than deterministic ones. One way to reduce the computational expense is to use 1D subsurface models instead of 3D velocity models to model seismograms. Unfortunately, this can adversely affect the reliability of the obtained posterior because part of the heterogeneity of the subsurface is not accounted for (Hingee et al., 2011; Hejrani et al., 2017). In our workflow, we therefore deploy a computationally efficient probabilistic algorithm to invert for centroid (three coordinate components), origin time, and MT (six independent MT components) while at the same time utilizing a detailed 3D subsurface model.
The algorithm used in our workflow is the Hamiltonian Monte Carlo (HMC) algorithm, which, for sampling high-dimensional posterior distributions, has been shown to be significantly more efficient than the conventional probabilistic Metropolis-Hastings family of algorithms (Betancourt, 2017). Using frequencies lower than 0.1 Hz and available prior information, Fichtner and Simutė (2018) developed a variant of the HMC algorithm and demonstrated its efficiency in inverting for the source parameters of a tectonic earthquake. More recently, Simutė et al. (2022) demonstrated the variant's ability to estimate earthquake parameters of tectonic earthquakes while employing 3D subsurface models of the Japanese islands. In contrast to tectonic earthquakes, where prior information regarding the event's MT, centroid and origin time is often available, such prior information is usually absent for induced earthquakes. An insufficiently constrained prior reduces the ability and efficiency of sampling algorithms to properly sample the posterior distribution, and increases the chance of the sampler getting trapped in local minima (Sen and Stoffa, 2013). In addition, compared to tectonic events, the frequency content of induced earthquake waveforms is usually significantly higher. This is because tectonic events usually occur at greater depths than induced events, and hence the higher frequencies have been attenuated more. Also, most of the studied induced events are of lower magnitudes than tectonic events (e.g., below Mw 3), and therefore do not excite frequencies below 1 Hz as effectively.
Due to the higher frequencies present in recordings of induced events, the wavelengths are significantly shorter. Layers of sediment/basin infill close to the Earth's surface may exacerbate this, since seismic velocities usually decrease rapidly toward the surface in such layers.
Shorter wavelengths matter because, other things being equal, they increase non-linearity. In essence, however, the degree to which the relation between the source parameters and the recorded waveforms is non-linear depends on the ratio between the nominal event-receiver separation and the wavelength. For example, consider (i) an induced seismic event at 3 km depth, an average P-wave velocity of 2.5 km/s, periods that range between 1 and 0.33 seconds, and event-receiver distances of 4 to 11 km (this study), and (ii) a tectonic event at 50 km depth, an average P-wave velocity of 5 km/s, periods between 100 and 15 seconds, and event-receiver distances of 200 to 1100 km (e.g., Fichtner and Simutė, 2018). These values correspond to ratios between event-receiver separation and wavelength that vary (approximately) between 2 and 14 (this study) and between 1 and 14 (Fichtner and Simutė, 2018). As soon as shear waves are used to perform centroid-moment tensor inversions, however, the non-linearity in the induced seismic setting considered in this study increases relative to the tectonic case considered. This is due to the fact that Vp/Vs ratios are typically significantly higher in the near surface (i.e., the top 1 to 2 km) than at greater depth. This is particularly the case in Groningen (e.g., Spetzler and Dost, 2017).
In this study, the absence of a well-constrained prior and an increase in non-linearity receive significant attention. First, the challenge of a weaker prior is met by means of a workflow in which the initial prior is updated before running the HMC algorithm. In addition, multiple chains of the HMC variant are run sequentially, where the results of the current chain serve as priors for the next chain. This iterative HMC is meant to provide improved prior information, resulting in an adequate linear approximation. We demonstrate the validity of our workflow using data from a synthetically generated induced earthquake, which was simulated using a velocity model of the Groningen subsurface. It should be understood that the proposed workflow is of interest for the characterization of induced seismic events in general. The Groningen case is merely chosen because of the quality and density at which the induced wavefields are sampled, and the relatively high resolution of the available velocity model.
The Groningen gas field is one of the largest gas reservoirs in Europe. Since production began in 1963, more than 2115 billion cubic meters of natural gas have been produced from the field (van Thienen-Visser and Breunese, 2015). Due to this gas production, the reservoir layer has compacted over time, causing earthquakes that have in some cases caused damage to buildings in the Groningen province (Van Eck et al., 2006) and led to several protests against further gas extraction in the area (Verdoes and Boin, 2021). To investigate these earthquakes, an extensive seismometer array was installed, which is operated by the KNMI (the Royal Netherlands Meteorological Institute) on behalf of the Nederlandse Aardolie Maatschappij (NAM) (Ntinalexis et al., 2019). Event recordings collected over the Groningen field have been used as input for several inversion algorithms. Spetzler and Dost (2017) used the EDT algorithm to invert for the hypocenters of many Groningen earthquakes.
They inverted arrival times of 87 events and found that all earthquakes occurred within a depth interval of 2300 to 3500 m, with most of the events originating from the reservoir layer (approximately 3000 m depth). These findings are in line with the results of Smith et al. (2020), who used the envelopes of the seismic arrivals as input to their probabilistic algorithm. To invert for both hypocenter (or centroid) and MT, Willacy et al. (2018) took a different approach. Contrary to Spetzler and Dost (2017), who use a 1D model to represent Groningen's subsurface, they utilized a 3D heterogeneous model similar to Smith et al. (2020), and used the model to generate synthetic waveforms to perform a full-waveform deterministic MT inversion. The results of Willacy et al. (2018), however, only focused on pure double-couple sources, which might not capture the true source dynamics. In fact, Dost et al. (2020) recently followed a probabilistic approach to invert event centroids and MTs of a selected number of events and consistently found the (non double-couple) isotropic component of the MT to be dominant and negative.
The latter is in agreement with expectations for a compacting medium. Similar to Willacy et al. (2018), they invert waveforms but employ 1D local subsurface models to generate the modeled seismograms.
In what follows, we first introduce the forward problem of obtaining surface displacements (recorded wavefields) due to induced seismic source activity, including the description of a seismic source in terms of elementary moment tensors. Subsequently, we introduce the Bayesian formulation and detail the linearized HMC algorithm. Afterward, we proceed with the description and implementation of our workflow, which involves several steps that are specific to the characterization of induced seismic sources. We then test the proposed workflow using synthetic recordings of an induced earthquake source. We end with a discussion of our results, including an outlook on applying our workflow to actual field recordings of induced earthquakes from the Groningen gas field.

Forward problem
As with all Markov chain Monte Carlo algorithms, HMC involves an evaluation of forward modeled data against observed data. In our case, this evaluation is between (forward) modeled surface displacement and observed displacement. Specifically, we compute synthetic displacement seismograms u due to a moment tensor source M (Aki and Richards, 2002):

u_i(\mathbf{x}_r, t) = G_{ij,k}(\mathbf{x}_r; \mathbf{x}_a, t) * M_{jk}(t),  (1)

with x_r the location at which u is recorded, x_a the source location, and * representing temporal convolution. Subscripts i, j, and k take on values 1, 2 and 3 such that a vector can be decomposed into three Cartesian components, associated with the x_1, x_2, and x_3 axes, respectively. G is the Green's function, and its first subscript represents its recorded component. The second subscript indicates the direction in which an impulsive (delta function) force is acting. The comma after the second subscript represents a spatial derivative, and the subscript after the comma indicates the direction in which the derivative is taken. Each component of M represents the strength of a force couple. Together, the nine constants M_jk constitute the second-order seismic moment tensor M. The MT effectively approximates a seismic source by collapsing it into a single point. Furthermore, due to conservation of angular momentum, the MT has only six independent components (e.g., Aki and Richards, 2002; Jost and Herrmann, 1989).
Instead of repeatedly computing u for each source-receiver pair, it is convenient to exploit source-receiver reciprocity. That is, we exploit the fact that

G_{ij}(\mathbf{x}_r; \mathbf{x}_a, t) = G_{ji}(\mathbf{x}_a; \mathbf{x}_r, t)  (2)

(Aki and Richards, 2002; Wapenaar and Fokkema, 2006). To facilitate the computation of seismograms for a specific M, we follow the work of Mustać and Tkalčić (2016), who use the six independent tensors, referred to as elementary moment tensors, as decomposed by Kikuchi and Kanamori (1991):

\mathbf{M}^1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \mathbf{M}^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \mathbf{M}^3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \mathbf{M}^4 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad \mathbf{M}^5 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \mathbf{M}^6 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.  (3)

Under the assumption that each of these elementary moment tensors has the same time dependence (e.g., in the case of pure shear, this would imply faulting to occur along a straight "trajectory"), a specific M can be described as a linear combination of these elementary moment tensors, i.e.,

\mathbf{M} = \sum_{n=1}^{6} a_n \mathbf{M}^n,  (4)

where the coefficients a_n (n = 1, 2, ..., 6) are usually referred to as expansion coefficients. In this study, we assume instantaneous rupturing of the source. This is not an uncommon assumption for (relatively small) induced seismic events. This assumption implies that the time dependence of an MT is modeled using a Heaviside function. Using the decomposition above and source-receiver reciprocity, we compute elementary seismograms

S^n_i(\mathbf{x}_r; \mathbf{x}_a, t) = G_{ij,k}(\mathbf{x}_r; \mathbf{x}_a, t) * M^n_{jk}(t).  (5)

Consequently, we obtain

u_i(\mathbf{x}_r, t) = \sum_{n=1}^{6} a_n S^n_i(\mathbf{x}_r; \mathbf{x}_a, t).  (6)

In practice, all S^n_i(x_r; x_a, t) are computed for a finite number of x_a on a predetermined subsurface grid with a specific grid spacing. We detail the numerical implementation of computing the S^n_i(x_r; x_a, t) further below.
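Once the elementary seismograms are available, evaluating the linear combination in equation (6) is computationally trivial. The following NumPy sketch illustrates this for a single source-receiver pair; the array layout and function name are our own choices for illustration, not part of the paper's actual implementation.

```python
import numpy as np

def combine_elementary_seismograms(S, a):
    """Combine precomputed elementary seismograms into a synthetic seismogram.

    S : ndarray of shape (6, 3, nt) -- elementary seismograms S^n_i for one
        source-receiver pair (n = 1..6 elementary MTs; i = x1, x2, x3).
    a : ndarray of shape (6,)       -- expansion coefficients a_n.

    Returns an ndarray of shape (3, nt) with u_i = sum_n a_n * S^n_i.
    """
    S = np.asarray(S, dtype=float)
    a = np.asarray(a, dtype=float)
    return np.tensordot(a, S, axes=(0, 0))

# Because u is linear in the expansion coefficients, choosing a_n = delta_{n1}
# must return the first elementary seismogram exactly.
S = np.random.default_rng(0).normal(size=(6, 3, 200))
assert np.allclose(combine_elementary_seismograms(S, [1, 0, 0, 0, 0, 0]), S[0])
```

This linearity in the coefficients a_n is also what makes part of the inverse problem exactly linear, a property exploited later in the linearized HMC variant.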

Hamiltonian Monte Carlo
The HMC algorithm originated from the field of classical mechanics and its application to statistical mechanics (Betancourt, 2017). It is known to be one of the most efficient probabilistic algorithms within the Markov chain Monte Carlo (MCMC) family. For our workflow, we apply a variant of the HMC algorithm that utilizes a linearization of the forward problem.
Therefore, we include several initial steps in our workflow to obtain priors that enable a meaningful linearization. In total, our workflow estimates ten source parameters: the centroid x_a (three components), the origin time T_0, and the MT (six independent MT components).
Similar to other probabilistic algorithms, HMC is deployed in the context of Bayesian inference. The objective of Bayesian inference is to obtain an estimate of the posterior probability distribution ρ(m|d) that approaches the true posterior probability distribution (from here on, we will refer to ρ(m|d) as "the posterior"). This approach combines the likelihood ρ(d|m) of the observed data given the modeled data with the simultaneous assimilation of the distribution of prior knowledge ρ(m), i.e.,

\rho(\mathbf{m}|\mathbf{d}) \propto \rho(\mathbf{d}|\mathbf{m}) \, \rho(\mathbf{m}),  (7)

where m is a vector that contains the model parameters and d a vector containing the observed data. The likelihood evaluates a model m against the observed data d by evaluating the misfit between the latter and the forward modeled data associated with m.
The HMC algorithm relies on the sequential calculation of two quantities: the potential energy U, which explicitly quantifies ρ(m|d), and the kinetic energy K, which is a function of a momentum vector p. Together, they make up the Hamiltonian H(m, p), which represents the total energy of a system (Neal et al., 2011), and is written as follows:

H(\mathbf{m}, \mathbf{p}) = U(\mathbf{m}) + K(\mathbf{p}).  (8)

A model m can be interpreted as the position of a particle within phase space. The phase space has a dimension that is twice the dimension of the model space (i.e., this dimension coincides with the length of the vector m multiplied by two).
By having the same dimension as m, the elements of the auxiliary momentum vector p complement each dimension of the model space (Betancourt, 2017). The movement of the particle is highly dependent on the mass matrix M, which therefore often acts as a tuning parameter (Fichtner et al., 2019, 2021). The mass matrix affects the "distance" a particle travels and ideally coincides with the posterior covariance matrix. Given a certain momentum p, the particle is allowed to travel for a certain (artificial) time τ while fulfilling Hamilton's equations:

\frac{d\mathbf{m}}{d\tau} = \frac{\partial H}{\partial \mathbf{p}} = \mathbf{M}^{-1} \mathbf{p}, \qquad \frac{d\mathbf{p}}{d\tau} = -\frac{\partial H}{\partial \mathbf{m}} = -\frac{\partial U}{\partial \mathbf{m}}.  (9)

We parenthetically coined τ an artificial time because it shouldn't be confused with the physical time t. It is this artificial time with which the model moves through phase space: at time τ, the particle arrives at a new location representing a new model m(τ). The new model and momentum vectors are associated with updated potential and kinetic energies, respectively, and hence a higher or lower Hamiltonian H(p(τ), m(τ)). The probability θ that the particle will stay at the new location, i.e., the acceptance probability, is given by

\theta = \min\left[1, \exp\left(H(\mathbf{m}, \mathbf{p}) - H(\mathbf{m}(\tau), \mathbf{p}(\tau))\right)\right].  (10)

By sequentially evaluating equations (8) to (10) in an iterative manner, we collect all locations (models) visited by the particle, except for a number of initial models (representing the burn-in period). The density of the collected models asymptotically approaches the posterior probability distribution.
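Hamilton's equations are typically integrated numerically with a leapfrog scheme (as done later in this paper). The sketch below shows the standard leapfrog update; `grad_U` is a user-supplied gradient of the potential energy, and all names are ours for illustration.

```python
import numpy as np

def leapfrog(m, p, grad_U, M_inv, dtau, n_steps):
    """Leapfrog integration of Hamilton's equations,
    dm/dtau = M^{-1} p and dp/dtau = -dU/dm."""
    m, p = np.array(m, dtype=float), np.array(p, dtype=float)
    p -= 0.5 * dtau * grad_U(m)        # initial half step in momentum
    for _ in range(n_steps - 1):
        m += dtau * M_inv @ p          # full step in position
        p -= dtau * grad_U(m)          # full step in momentum
    m += dtau * M_inv @ p              # last full step in position
    p -= 0.5 * dtau * grad_U(m)        # final half step in momentum
    return m, p

# For a quadratic potential U(m) = 0.5 * m.m (a 2D standard Gaussian), the
# leapfrog trajectory approximately conserves H = U + 0.5 * p.M^{-1}.p.
M_inv = np.eye(2)
m0, p0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
m1, p1 = leapfrog(m0, p0, grad_U=lambda m: m, M_inv=M_inv, dtau=0.05, n_steps=100)
H0 = 0.5 * m0 @ m0 + 0.5 * p0 @ M_inv @ p0
H1 = 0.5 * m1 @ m1 + 0.5 * p1 @ M_inv @ p1
assert abs(H1 - H0) < 1e-2
```

The near-conservation of H is what keeps the acceptance probability of equation (10) close to one, which is the source of HMC's sampling efficiency.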
In Figure 1, we visualize the sampling behavior of both the Metropolis algorithm (a) and the HMC algorithm (b) for a 2D joint probability distribution. Note that the Metropolis algorithm is a special case of the Metropolis-Hastings algorithm in the sense that the proposal distribution is symmetric (Hoff, 2009). Both algorithms start with the same starting model, which is represented by the red ball. The low a posteriori probability of this initial model corresponds to a high U. The question marks in Figure 1(a) represent models randomly selected by the Metropolis algorithm, which were not accepted due to their relatively low acceptance probability. Hence, each of these question marks involves a (computationally expensive) solution to the forward problem. Instead of using random sampling, in the HMC algorithm the particle within phase space moves along trajectories obtained by solving equation (9), which exert the particle towards areas with low U, as illustrated in Figure 1(b). Because it combines these deterministic trajectories with probabilistic momentum draws and accept/reject steps, the algorithm is also known as the Hybrid Monte Carlo algorithm (Duane et al., 1987). Thus, after proper tuning, the HMC algorithm requires fewer samples than the Metropolis algorithm to converge, which makes the HMC algorithm computationally more efficient. Assuming Gaussian distributed, uncorrelated data errors with a common variance σ_d^2, we can write U as (Fichtner and Simutė, 2018)

U(\mathbf{m}) = \frac{1}{2\sigma_d^2} \sum_{r=1}^{N_r} \sum_{i=1}^{3} \int_0^T \left[ u_i(\mathbf{x}_r, t; \mathbf{m}) - u_i^{\mathrm{obs}}(\mathbf{x}_r, t) \right]^2 dt + \frac{1}{2} (\mathbf{m} - \mathbf{m}_0)^T \mathbf{C}_m^{-1} (\mathbf{m} - \mathbf{m}_0).  (11)

In our context, the x_r are the locations of the N_r three-component KNMI seismometers (r = 1, 2, ..., N_r). Furthermore, T is the length of the observed and forward modeled seismograms in time, N_m the number of model parameters (ten in our case), m_0 a vector containing the prior means, and C_m the prior covariance matrix. In an application to field data, the u_i^obs would be field recordings by seismometers, but in this study we restrict ourselves to a numerically simulated induced event.
In our workflow, most of the computational burden in running HMC involves the evaluation of equation (9). This is because for each dτ we have to evaluate ∂U/∂m. To speed up the process, we use a variant of the HMC algorithm introduced by Fichtner and Simutė (2018), in which u(x_r, t; m) is approximated by means of an expansion around the prior mean, i.e., around m_0:

u_i(\mathbf{x}_r, t; \mathbf{m}) \approx u_i(\mathbf{x}_r, t; \mathbf{m}_0) + \sum_{p=1}^{N_m} \frac{\partial u_i(\mathbf{x}_r, t; \mathbf{m}_0)}{\partial m_p} (m_p - m_{0,p}).  (12)

Substituting this linearized expression in equation (11) gives

U(\mathbf{m}) \approx \frac{1}{2} \sum_{p,q=1}^{N_m} A_{pq} (m_p - m_{0,p})(m_q - m_{0,q}) + \sum_{p=1}^{N_m} b_p (m_p - m_{0,p}) + c,  (13)

where A_pq, b_p, and c read

A_{pq} = \frac{1}{\sigma_d^2} \sum_{r=1}^{N_r} \sum_{i=1}^{3} \int_0^T \frac{\partial u_i}{\partial m_p} \frac{\partial u_i}{\partial m_q} \, dt + (\mathbf{C}_m^{-1})_{pq},  (14)

b_p = \frac{1}{\sigma_d^2} \sum_{r=1}^{N_r} \sum_{i=1}^{3} \int_0^T \frac{\partial u_i}{\partial m_p} \left[ u_i(\mathbf{x}_r, t; \mathbf{m}_0) - u_i^{\mathrm{obs}}(\mathbf{x}_r, t) \right] dt,  (15)

and

c = \frac{1}{2\sigma_d^2} \sum_{r=1}^{N_r} \sum_{i=1}^{3} \int_0^T \left[ u_i(\mathbf{x}_r, t; \mathbf{m}_0) - u_i^{\mathrm{obs}}(\mathbf{x}_r, t) \right]^2 dt.  (16)

Differentiating equation (13) with respect to m_p, we have (Fichtner and Simutė, 2018)

\frac{\partial U}{\partial m_p} = \sum_{q=1}^{N_m} A_{pq} (m_q - m_{0,q}) + b_p,  (17)

which, together with the random momentum vector, determines the HMC trajectory.
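For a discretized forward problem, A, b, and c reduce to simple matrix products of the Jacobian of the linearized forward operator and the residuals at the prior mean. A sketch under the assumption that the time integration is approximated by a plain sample sum (a dt factor can be absorbed into the data variance); function name and array layout are ours.

```python
import numpy as np

def quadratic_potential(J, r0, sigma_d, Cm_inv):
    """Build the quadratic approximation of the potential energy around m0.

    J      : (n_data, n_params) -- partial derivatives du_i/dm_p at m0, with
             receiver, component, and time samples flattened into rows
    r0     : (n_data,)          -- residuals u(m0) - u_obs, flattened likewise
    Cm_inv : (n_params, n_params) -- inverse prior covariance

    Returns A, b, c plus callables for U(dm) and grad_U(dm), with dm = m - m0.
    """
    A = J.T @ J / sigma_d**2 + Cm_inv
    b = J.T @ r0 / sigma_d**2
    c = 0.5 * (r0 @ r0) / sigma_d**2

    def U(dm):
        return 0.5 * dm @ A @ dm + b @ dm + c

    def grad_U(dm):
        return A @ dm + b

    return A, b, c, U, grad_U

# The gradient vanishes at dm = -A^{-1} b, the minimum of the quadratic form.
rng = np.random.default_rng(0)
J = rng.normal(size=(30, 4))
r0 = rng.normal(size=30)
A, b, c, U, grad_U = quadratic_potential(J, r0, sigma_d=0.5, Cm_inv=np.eye(4))
dm_min = -np.linalg.solve(A, b)
assert np.allclose(grad_U(dm_min), 0.0)
```

Since A and b are computed once per linearization point, each leapfrog step only costs one matrix-vector product, which is what makes this HMC variant fast.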
Because the displacement depends linearly on the moment tensor components (see equations (5) and (6)), equation (12) is exact with respect to these parameters. The dependence on the other parameters is non-linear, and this non-linearity increases as the frequency of the input data increases. Therefore, in the case of induced events, which usually generate higher frequencies than stronger, regional events, the non-linearity is considerably higher. Hence, to obtain a tolerable linearization, accurate priors are required for the centroid and origin time. Without sufficiently accurate priors, the above HMC variant will struggle to sample the mode containing the global minimum of the potential energy. Therefore, we propose an approach that involves an initial estimation of the prior mean in order to permit this linearization. This is detailed further below.
In practice, the elementary seismograms discussed in Section 2 are computed for a finite number of possible centroid locations.
That is, prior to our probabilistic inversion, we generate a database of these seismograms. This database contains, for each possible source location x_a and receiver location x_r (r = 1, ..., N_r), a total of 3 × 6 = 18 elementary seismograms (three components for each of the six elementary moment tensors). In our case, each x_r corresponds to a (KNMI) seismometer location that recorded the induced event. The elementary seismograms are computed using the spectral element software SPECFEM3D-Cartesian (Komatitsch and Tromp, 2002), exploiting spatial reciprocity while doing so. We use an existing detailed Groningen velocity model (Romijn, 2017) for this purpose, from which we construct a regular grid of the model using the gnam and PyAspect Python packages that are available at https://github.com/code-cullison/gnam and https://github.com/code-cullison/pyaspect.
To confirm the successful implementation of source-receiver reciprocity, we simulate a scenario of an induced event in the Groningen gas reservoir (Figure 2). The centroid is indicated with a red star, and the receivers are depicted as white triangles.
At each location, the wavefield is "recorded" at 200 meters depth by the deepest of a series of four borehole geophones (Ruigrok and Dost, 2019). The elementary seismograms computed at the location of KNMI station G094 are shown in Figure 3 (green), and superimposed on top (yellow) are the waveforms resulting from the application of source-receiver reciprocity.
All seismograms are bandpass filtered between 1 and 3 Hz, similar to the passband used by Dost et al. (2020).
We integrate the above HMC variant into our workflow by implementing a leapfrog algorithm for evaluating equation (9).
Furthermore, we define dτ as suggested by Neal et al. (2011) to ensure numerical stability and set a fixed value for τ for all chains.The construction of the mass matrix M is discussed in the next section.

An iterative approach
The performance of the linearized HMC variant strongly depends on the prior means (see equation (12)). For that reason, we propose a workflow in which the algorithm is run iteratively, with each iteration involving an update of the priors to allow for an updated linearization. Specifically, instead of evaluating equations (14) to (16) once, we run a sequence of HMC chains.
For each successive chain, the posterior means and standard deviations from the previous chain act as prior means and entries for M in the new chain (i.e., the next iteration). For the first chain in the sequence, the "initial" prior means (i.e., m_0) are obtained via a specific scheme integrated into the workflow. The estimation of these prior means is described in more detail in the subsection below.
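The chain-to-chain prior update can be sketched as a simple loop. Here `run_chain` is a hypothetical stand-in for one linearized HMC chain (it must return an array of kept samples); the function and argument names are ours for illustration.

```python
import numpy as np

def iterative_chains(run_chain, prior_mean, prior_std, n_chains=20):
    """Sketch of the iterative HMC procedure: each chain's posterior mean and
    standard deviation serve as the next chain's prior mean and mass-matrix
    entries.

    run_chain : callable(prior_mean, mass_diag) -> (n_samples, n_params) array
    """
    mean = np.asarray(prior_mean, dtype=float)
    std = np.asarray(prior_std, dtype=float)
    chains = []
    for _ in range(n_chains):
        samples = run_chain(prior_mean=mean, mass_diag=std)
        chains.append(samples)
        # Posterior statistics of this chain become the next chain's prior.
        mean, std = samples.mean(axis=0), samples.std(axis=0)
    return chains
```

In the paper's workflow, the collected chains are subsequently filtered by a variance-reduction criterion before they contribute to the final posterior.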
We test our workflow for the induced event shown in Figure 2. We set the MT components to 9E+13 Nm, -1E+13 Nm, -3E+13 Nm, 8E+13 Nm, 5E+13 Nm, and 4E+13 Nm for M_11, M_22, M_33, M_12, M_13, and M_23, respectively. Using the moment-magnitude relation given by Gutenberg (1956) and Kanamori (1977), this moment tensor can be shown to correspond to an earthquake of magnitude Mw 3.28. We also add noise to our synthetic seismograms in order to make our experiment more realistic. This noise is added in the frequency domain by multiplying the (complex) spectrum of each synthetic seismogram with a bivariate normal distribution that has a zero mean and a standard deviation of 15% of the amplitude of the seismogram at the dominant frequency. As a result, this noise will not only give amplitude variations but also varying time shifts with respect to the true synthetic seismograms. When running the Markov chains, we assume the square root of the data variance (σ_d) to be 30% of the maximum amplitude of each seismogram. Admittedly, this is rather arbitrary, and in the application to field data, the data uncertainty has to be estimated from the obtained seismograms themselves. Finally, we set the origin time to 14 s.

(Figure caption fragment: left, center, and right plots show particle displacement in the x1, x2, and x3 directions, respectively.)
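A noise model of this kind can be sketched as follows. Because perturbing a spectrum with strictly zero-mean multiplicative noise would remove the signal on average, we read the description as an additive complex Gaussian perturbation of the spectrum; this reading, like the function name, is our own assumption.

```python
import numpy as np

def add_spectral_noise(u, frac=0.15, rng=None):
    """Perturb a seismogram in the frequency domain (a sketch of the noise
    model described in the text, read here as an additive perturbation):
    zero-mean complex Gaussian noise whose standard deviation is `frac`
    times the spectral amplitude at the dominant frequency. Perturbing both
    real and imaginary parts changes amplitudes *and* phases, i.e., it also
    introduces small time shifts."""
    rng = np.random.default_rng() if rng is None else rng
    Uf = np.fft.rfft(u)
    sigma = frac * np.abs(Uf).max()          # amplitude at dominant frequency
    noise = rng.normal(0.0, sigma, Uf.shape) + 1j * rng.normal(0.0, sigma, Uf.shape)
    return np.fft.irfft(Uf + noise, n=len(u))

# With frac = 0 the seismogram is returned unchanged (up to round-off).
u = np.sin(2 * np.pi * 2 * np.arange(256) / 256)
assert np.allclose(add_spectral_noise(u, frac=0.0), u)
```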

Prior mean estimation
Before running the first HMC chain, we need to estimate the initial prior means and variances. In short, we propose an approach in which a first-arrival based algorithm is used to estimate the centroid. Subsequently, the origin time can be estimated, after which equation (17) can be used to compute the prior means for the individual moment tensor components. Each of these steps is now discussed in more detail.
Numerous algorithms exist that allow one to estimate an earthquake's hypocenter and/or centroid.Here we propose using first-arrival based algorithms for this purpose since these are computationally more efficient than waveform-based algorithms.
First-arrival based algorithms only require the computation of the P- and S-wave arrival times, and by adopting a high-frequency approximation (e.g., Aki and Richards, 2002), these arrivals can be found by running one of the various Eikonal solvers (e.g., Noble et al., 2014). For example, the EDT method detailed in Lomax (2005) can be used for this purpose (Masfara and Weemstra, 2021).
As an alternative to using a first-arrival based algorithm, the prior means of the centroid can instead be retrieved from the existing literature, if available. For example, in the case of the induced seismicity in the Groningen field, Smith et al. (2020) have shown that they could resolve hypocenters with maximum uncertainties of 150 m and 300 m for epicenter and depth, respectively.
Their results could be considered as priors. Another option is to use the epicenters from the KNMI earthquake database, which by default has all depths set to 3 km.
Given a centroid prior mean that was either calculated or acquired from the literature, the prior mean of the origin time can be estimated by computing the P-wave traveltimes from the centroid prior to each of the receivers. These traveltimes can be computed using the same Eikonal solver that was used to obtain the centroid prior (e.g., the Fast Marching Method; Sethian and Popovici, 1999). By subtracting the computed traveltimes from the observed (picked) first-arrival times and averaging across receivers, an initial origin time prior mean can be obtained.
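This traveltime-subtraction step amounts to a simple average, which can be sketched as follows (function name is ours for illustration):

```python
import numpy as np

def origin_time_prior(picked_arrivals, eikonal_traveltimes):
    """Initial origin-time prior mean: the average, over receivers, of the
    picked first-arrival times minus the Eikonal-computed P traveltimes
    from the centroid prior to each receiver."""
    t_pick = np.asarray(picked_arrivals, dtype=float)
    t_calc = np.asarray(eikonal_traveltimes, dtype=float)
    return float(np.mean(t_pick - t_calc))

# Toy check with three receivers: picks at 15.0, 16.0, and 17.5 s and
# traveltimes of 1.0, 2.0, and 3.5 s give an origin-time estimate of 14.0 s.
assert origin_time_prior([15.0, 16.0, 17.5], [1.0, 2.0, 3.5]) == 14.0
```

Averaging over receivers partly cancels pick errors and errors caused by the centroid prior deviating from the true source location.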
To refine the initial origin-time prior estimate, we cross-correlate the envelope of the observed seismograms env[u_i^obs(x_r, t)] with the envelope of the forward modeled seismograms env[u_i(x_r, t; m_0)]. We do this for each component of each receiver location individually. The forward modeled seismograms are computed using full-waveform modeling (detailed in Section 5) using the initial prior means for centroid and origin time, and given arbitrary MT components. Specifically, we compute

T_0^{\mathrm{shift}} = \underset{\tau}{\mathrm{argmax}} \sum_{r=1}^{N_r} \sum_{i=1}^{3} \int_0^T \mathrm{env}\left[u_i^{\mathrm{obs}}(\mathbf{x}_r, t)\right] \mathrm{env}\left[u_i(\mathbf{x}_r, t - \tau; \mathbf{m}_0)\right] dt,  (18)

where T_0^shift is the additional time shift that needs to be added to the initial origin-time prior mean to obtain the refined origin-time prior.
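For a single component of a single receiver, this envelope cross-correlation can be sketched with NumPy only (the analytic-signal envelope below is the standard Hilbert-transform construction; function names are ours):

```python
import numpy as np

def envelope(x):
    """Waveform envelope via the analytic signal (NumPy-only Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))

def envelope_time_shift(u_obs, u_syn, dt):
    """Origin-time correction: the lag that maximizes the cross-correlation
    of the observed and modeled waveform envelopes."""
    cc = np.correlate(envelope(u_obs), envelope(u_syn), mode="full")
    lag = int(np.argmax(cc)) - (len(u_syn) - 1)
    return lag * dt

# A pulse observed 0.3 s later than modeled yields a positive 0.3 s shift.
t = np.arange(400) * 0.01
u_syn = np.exp(-((t - 1.0) / 0.05) ** 2)
u_obs = np.exp(-((t - 1.3) / 0.05) ** 2)
assert abs(envelope_time_shift(u_obs, u_syn, dt=0.01) - 0.3) < 1e-6
```

Using envelopes rather than the waveforms themselves makes the estimate robust against the arbitrary MT components used for the forward modeled seismograms, since the envelope is insensitive to polarity flips.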
We test equation (18) using the synthetic earthquake shown in Figure 2. For this test, we add 600 m to the true x_1, x_2, and x_3 centroid components, and we impose a (rather aggressive) 9 s time shift with respect to the true origin time. Note that this implies that we did not employ the aforementioned procedure to obtain initial centroid and origin time priors, because this would result in centroid and origin time estimates that would be too close to the true values, essentially rendering the use of equation (18) unnecessary. In other words, we deliberately impose large deviations from the true values to show the merit of using equation (18).
Given arbitrary MT components, we show in Figure 4 the result of applying equation (18) to vertical surface displacements.
In Figure 4(a), we depict the envelopes of modeled seismograms given the available prior means (i.e., 600 m deviation from the true x_1, x_2, and x_3 values, and 9 s from the true T_0). Having sufficiently accurate prior means for the centroid and origin time, we then estimate the prior mean of the MT. For this purpose, we keep the centroid and origin time constant but solve for the remaining six parameters (the independent MT elements). In Section 4, we showed that because equation (13) is a quadratic function of m, its derivative is linear in m (see equation (17)). This first derivative hence coincides with zero for the model for which U(m) attains its (global) minimum value. As such, setting this derivative to zero allows us to obtain a first estimate (i.e., prior means) of the moment tensor components. Setting the left-hand side of equation (17) to zero yields

\mathbf{m} = \mathbf{m}_0 - \mathbf{A}^{-1} \mathbf{b}.  (19)

It should be understood that equation (19) is implemented with T_0 and the centroid fixed. Hence, the model vector has only six elements, and A is a six-by-six matrix. The quadratic nature of U in equation (13) furthermore implies that arbitrary values can be chosen for the initial moment tensor components in m_0. In fact, in the absence of noise and with the correct prior means for the centroid and origin time, the MT priors estimated using equation (19) will coincide with the true MT components. In practice, the prior means resulting from equation (19) may still deviate significantly from the true values due to the inaccuracy of the initial centroid and origin time priors. Solving equation (19) nevertheless provides sufficiently accurate prior information regarding the magnitude of the induced event. Finally, it is useful to draw a parallel with typical least-squares optimization problems (e.g., Virieux and Operto, 2009). In such a context, A is analogous to the Hessian, and the difference between m and m_0 in equation (19) can be considered the model update vector.
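With the centroid and origin time fixed and an uninformative MT prior, this step reduces to an ordinary least-squares solve on the six expansion coefficients; the dependence on the arbitrary starting values m_0 cancels analytically. A sketch (function name and array layout are ours; the time integral is approximated by a sample sum):

```python
import numpy as np

def mt_prior_means(S, u_obs):
    """Least-squares estimate of the six MT expansion coefficients with the
    centroid and origin time held fixed.

    S     : (6, n_data) -- elementary seismograms for all receivers and
            components, flattened into rows
    u_obs : (n_data,)   -- observed data, flattened likewise
    """
    A = S @ S.T            # 6x6 Gauss-Newton 'Hessian'
    rhs = S @ u_obs
    return np.linalg.solve(A, rhs)

# Since displacement is linear in the expansion coefficients, noise-free data
# generated from known coefficients are recovered exactly (up to round-off).
rng = np.random.default_rng(0)
S = rng.normal(size=(6, 900))
a_true = rng.normal(size=6)
assert np.allclose(mt_prior_means(S, a_true @ S), a_true)
```

With noisy data or an inaccurate centroid/origin-time prior, the recovered coefficients deviate from the true ones, which is why they only serve as prior means for the subsequent HMC chains.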
8. For each of the n posteriors, compute the means m_s (s = 1, ..., n). We use these means to generate synthetic recordings and evaluate them against the observed data through determination of the variance reduction (VR).

9. Define a VR threshold. Posteriors associated with an m_s for which the VR exceeds this threshold are used to compute the final posterior distribution.

We use the above workflow to estimate the parameters of the synthetic event shown in Figure 2. In step 1, we assume a suitable prior of the centroid can be retrieved from the literature (e.g., from Smith et al. (2020) in the case of the seismicity in Groningen). To simulate the fact that this prior may well deviate from the true centroid, we shift this initial centroid prior mean by 600 m in all directions (i.e., with respect to the correct event location). Having the prior mean for the centroid, we follow steps 2 and 3 in the workflow to obtain the other prior means. To encode a state of ignorance, we set the standard deviation σ_m of each model parameter to infinity, which implies that the last term of equation (14) evaluates to 0.
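The paper does not spell out its VR definition; the sketch below uses the definition that is standard in waveform-fitting studies (an assumption on our part), evaluated over all flattened receivers and components.

```python
import numpy as np

def variance_reduction(u_obs, u_syn):
    """Variance reduction VR = 1 - ||u_obs - u_syn||^2 / ||u_obs||^2
    (often quoted as a percentage)."""
    u_obs = np.asarray(u_obs, dtype=float).ravel()
    u_syn = np.asarray(u_syn, dtype=float).ravel()
    return 1.0 - np.sum((u_obs - u_syn) ** 2) / np.sum(u_obs ** 2)

# A perfect fit gives VR = 1 (i.e., 100%); an all-zero prediction gives VR = 0.
assert variance_reduction([1.0, -2.0, 3.0], [1.0, -2.0, 3.0]) == 1.0
assert variance_reduction([1.0, -2.0, 3.0], [0.0, 0.0, 0.0]) == 0.0
```

A chain is then kept if the VR of the seismograms generated from its posterior mean exceeds the chosen threshold (85% of the maximum VR in the test below).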
The elements of our initial mass matrix are taken from the results of steps 1-3 as explained in the full workflow (step 4), except for those elements that correspond to the centroid; these we set to 300 m. Using the initial prior means and the initial mass matrix, we run 20 chains of the HMC variant. Furthermore, we run 2500 iterations (step 6) for every chain, with the first 500 samples discarded as burn-in samples. After finishing all iterations, the results of each chain are used to update the prior means and mass matrix for the next HMC chain (the actual iterative HMC procedure). For each of the ten model parameters, all 40,000 samples (20 chains × 2,000 samples) are depicted in Figure 6. To obtain our final posterior, we take the results of chains for which the means are associated with seismograms yielding a VR ≥ 85% of the maximum VR. In detail, the samples from all chains (black dots) and the selected chains (green dots) are depicted in Figure 7. For the samples of the selected chains, the one-dimensional marginal probability distributions of each of the ten source parameters are shown in Figure 8. In Figure 9, we show the synthetic (observed) seismograms (green), seismograms generated from the final posterior means (blue), and the true noise-free seismograms (red). With the use of a database containing pre-computed elementary seismograms and the Python code we developed, the entire workflow takes approximately one minute to finish on a single-core CPU system.

7 The importance of the prior

The above workflow might not be optimal if the initial prior information is "weak" in the sense that the initial centroid prior mean deviates significantly from the true value. This is due to the fact that our forward problem is in essence a non-linear problem, whereas the adopted linearization (Section 4) relies on the assumption of it being only weakly non-linear. In other words, poor initial centroid priors imply linearization around a location x that
deviates too much from the true source location x_a, which may result in the HMC algorithm "getting stuck" in local minima. This problem can be mitigated by running the workflow with multiple initial prior means. Depending on how close each of the initial prior means is to the true values, some chains might get stuck in a local (minimum) mode while others correctly sample the mode containing the global minimum (or global maximum if one considers ρ(m|d)). In the end, the final posterior can be drawn by combining the results of all chains given the multiple initial prior means.
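The iterative scheme, in which each chain's posterior mean seeds the prior mean of the next chain, can be sketched as follows; `run_chain` stands in for one linearized HMC chain and is a hypothetical interface (a real implementation would also update the mass matrix from the chain's sample covariance):

```python
import numpy as np

def iterative_hmc(run_chain, m_init, n_chains=20, n_samples=2500, burn_in=500):
    """Iterative HMC scheme: each chain's posterior mean becomes the
    prior mean (linearization point) of the next chain.

    run_chain : callable (hypothetical) returning an (n_samples, n_dim)
                array of samples for a given prior mean
    """
    m_prior = np.asarray(m_init, dtype=float)
    kept_all = []
    for _ in range(n_chains):
        samples = run_chain(m_prior, n_samples)
        kept = samples[burn_in:]            # discard burn-in samples
        kept_all.append(kept)
        m_prior = kept.mean(axis=0)         # prior mean for the next chain
    return np.vstack(kept_all), m_prior
```

With 20 chains of 2500 samples each and a 500-sample burn-in, this yields the 40,000 retained samples mentioned above.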
To showcase the effect of weak prior information in the context of induced seismicity in the Groningen gas field, we re-use the synthetic earthquake in Figure 2. However, instead of shifting the initial centroid prior mean by 600 m for all coordinate components, we now shift it by 1 km for each horizontal coordinate. Meanwhile, the depth is set to 3 km, corresponding to the default depth in the KNMI database; in application to field data, this database will be our primary source of priors. To obtain additional initial centroid prior means, we construct a 2.8 × 2.8 km 2D grid at this depth. For each point of the constructed grid, assuming that the depth could be off by around ±500 m, the maximum total distance is about 700 m, which is then considered safe for the HMC algorithm to sample the mode containing the global minimum.
Overall, given our 5x5 horizontal grid, we have 25 initial centroid prior means, each of them being subjected to our workflow.
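A minimal sketch of constructing these 25 initial centroid prior means, assuming the grid is centred on the (shifted) catalogue location with a 700 m spacing at a fixed depth (function and variable names are hypothetical):

```python
import numpy as np

def centroid_grid(x0, y0, z0, half_width=1400.0, n=5):
    """n x n horizontal grid of initial centroid prior means, centred on
    (x0, y0) at a fixed depth z0; for n = 5 and a 2.8 km wide grid the
    spacing is 700 m (geometry as assumed in the text)."""
    offsets = np.linspace(-half_width, half_width, n)
    return [(x0 + dx, y0 + dy, z0) for dx in offsets for dy in offsets]
```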

For each workflow run, we use the same model and data uncertainty as in the initial synthetic case. The same applies to the number of chains (20), samples per chain (2500), and burn-in period (500 samples). To reduce the computational time, we run the 25 workflows (associated with the 25 initial centroid prior means) simultaneously by parallelizing our code. We subsequently collect the results of each workflow to obtain an estimate of our posterior distribution. The results of this parallelization are summarized in Figure 10, which highlights the effect of the separation between the centroid prior mean and the true centroid.
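The parallelization can be sketched with Python's standard multiprocessing module; `run_workflow` is a hypothetical stand-in for one full iterative-HMC workflow, here returning a toy VR for illustration only:

```python
from multiprocessing import Pool

def run_workflow(centroid_prior):
    """Hypothetical stand-in for one full iterative-HMC workflow; a real
    implementation would return posterior samples and per-chain VRs."""
    x, y, z = centroid_prior
    # Toy VR that decays with horizontal distance from the true centroid.
    return {"centroid": centroid_prior,
            "vr": max(0.0, 0.4 - 1e-4 * (abs(x) + abs(y)))}

# The 25 initial centroid prior means (700 m spacing, fixed 3 km depth).
priors = [(float(dx), float(dy), 3000.0)
          for dx in range(-1400, 1401, 700)
          for dy in range(-1400, 1401, 700)]

if __name__ == "__main__":
    # Run the 25 workflows simultaneously on the available CPU cores.
    with Pool() as pool:
        results = pool.map(run_workflow, priors)
```

Each workflow is independent given its initial prior mean, so this is an embarrassingly parallel problem and the speed-up is essentially linear in the number of cores.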

Using the same threshold as in our initial experiment (VR ≥ 85% of the maximum VR), we show in Figure 11 all (non-burn-in) samples associated with the selected chains, together with the samples from all chains given the calculated VR. For the selected chains, the marginal probability distribution of each parameter is presented in Figure 12. As expected, chains with an initial centroid prior mean relatively close to the true centroid converge to the true mode (containing the global minimum), whereas chains starting further away may get stuck in local minima. The VR-based criterion nevertheless proves successful in picking appropriate chains, allowing us to obtain an estimate of the posterior distributions.

Discussion and Conclusions
Using synthetic events, we demonstrate that the proposed probabilistic workflow is able to efficiently estimate the posterior probability of the various parameters describing induced seismic events. A number of caveats need to be made though. First, the synthetic recordings used to test our probabilistic workflow are the result of propagating a wavefield through the very same velocity model as the one used to estimate the posterior (i.e., the velocity model in our probabilistic workflow). In application to field data, this would obviously not be the case. Part of the misfit between modeled recordings and observed recordings would then be the result of discrepancies between the true velocity model and the employed numerical velocity model. Second, and in the same vein, we employed the same code (SPECFEM3D-Cartesian) for generating the synthetic recordings as for modelling the wavefield in the probabilistic workflow. And although this code is known to be rather accurate (Komatitsch and Tromp, 2002), undoubtedly some of the physics describing the actual wavefield propagation is not fully captured by SPECFEM3D-Cartesian. Third, this study does not include an application to field data. This is intentional as our objective is to present a stand-alone workflow that can be applied in any induced seismic setting. Applying a methodology to field recordings of induced seismic events (e.g., in Groningen) would require numerous processing details, which we consider to be beyond the scope of this paper. We are currently drafting a follow-up manuscript in which we apply the proposed HMC workflow to field recordings of induced seismic events in Groningen.

Figure 11. Same ten two-dimensional marginal probability densities as in Figure 7. Note that scales on horizontal and vertical axes differ. Black dots are the samples from all chains, whereas the green dots represent the samples from chains with a posterior mean that yields a VR that is higher than 85% of the maximum VR. The red lines represent the true values.
The aforementioned deviation of the available numerical velocity model from the true subsurface velocities will pose a number of challenges. First, the estimated posterior probability would give a lower bound in terms of the variability of the source parameters: inaccuracies in the velocity model necessarily imply broader posterior probabilities. Second, in the presence of strong anisotropy, the posterior could be adversely affected. In particular, in the case of non-pure-shear mechanisms this effect could be significant (Ma et al., 2022). Third, cycle skipping will be particularly hard to tackle in case the velocity model is rather inaccurate.
Our workflow includes a systematic approach to obtain meaningful initial priors, which is particularly important for the employed HMC variant: the linearization of the forward problem around the prior mean requires the initial priors to be sufficiently close to the true event location. Furthermore, we show that by using an iterative scheme, we can update the prior mean such that convergence is obtained to a centroid location that allows the estimation of a meaningful posterior. The iterative scheme involves sequentially updating the prior mean of each new HMC chain using the posterior estimate obtained from the previous HMC chain. This approach is based on the suggestion of Fichtner and Simutė (2018) to repeat the Taylor expansion (of the forward problem) for each new sample of the Markov chain. In our case, however, we (only) do this every 2500 samples.
A brute-force approach to perform the expansion at each step of every chain would render computational costs prohibitively large.
Prior to executing the workflow, one needs to compile a database of elementary seismograms, which often requires significant computing power. In our case, it took about one day to generate the database using one node of our computer cluster, consisting of 24 CPU cores (Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz) with a total RAM of 503 GiB. Once compiled, our workflow can be run efficiently. Using a single-core CPU system, a single run of our workflow with 20 sequential HMC chains takes about one minute to finish, where each chain consists of 2500 iterations. In contrast, the computational costs of the Metropolis algorithm (to obtain the same results) would be much higher, as previously shown by Fichtner and Simutė (2018) and Fichtner et al. (2019). Furthermore, various modifications could be applied to the workflow, such as adding simulated annealing and tempering (Tarantola, 2006), including a step to quantify the error in the input seismograms (Mustać and Tkalčić, 2016), and/or applying a scheme that is able to tune dτ and τ for each HMC run (Hoffman et al., 2014). These modifications could be beneficial, especially when dealing with field observations, which is the subject of future work. We also show that the workflow can be adapted to account for scenarios in which the initial centroid prior mean is rather inaccurate and/or the initial prior is weak. If that is the case, an approach can be adopted in which various iterative HMC workflows, each using a different centroid prior mean as a starting point, are run. Subsequently, using the variance reduction associated with the posterior means of the individual chains as a binary criterion for selecting a chain's samples, a final estimate of the posterior probability can be obtained.
We would like to emphasize that our workflow is, in principle, not limited to inversions of the parameters we use here.
We could extend our probabilistic inversion to parameters such as stress drop or velocity, or invert for finite-fault source parameters. Furthermore, it is important to mention that our workflow aims to invert for seismic source parameters using seismic surface recordings in a specific frequency range. That is, it is specifically geared towards inverting for induced seismic events.
We found that the workflow works well when applied to data with frequencies between 1 and 3 Hz. For higher frequencies, however, some testing might be needed because the non-linearity between the input data and model parameters increases with frequency.
Furthermore, in Figure 1(b), the result of solving equation (9) (i.e., the HMC trajectory) is represented by the brown dashed lines, and the pointing finger represents the momentum vector p. For both the HMC and Metropolis algorithms, an accepted model serves as starting model for the next sample. Although probabilistic in terms of acceptance probabilities, the trajectories of the HMC algorithm are deterministically guided by ∂U/∂m, as shown in equation (9).
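The deterministic trajectories are typically obtained by integrating Hamilton's equations numerically with a leapfrog scheme; a generic sketch (not the authors' exact implementation) is:

```python
import numpy as np

def leapfrog(m, p, grad_U, mass_inv, dtau, n_steps):
    """Leapfrog integration of Hamilton's equations: the model m is
    deterministically guided by the gradient of U, while the momentum p
    carries the 'kinetic' part (generic sketch, hypothetical interface)."""
    p = p - 0.5 * dtau * grad_U(m)       # initial half momentum step
    for _ in range(n_steps - 1):
        m = m + dtau * mass_inv @ p      # full position step
        p = p - dtau * grad_U(m)         # full momentum step
    m = m + dtau * mass_inv @ p          # last position step
    p = p - 0.5 * dtau * grad_U(m)       # final half momentum step
    return m, p
```

The appeal of the leapfrog scheme is that it is symplectic: the total energy is approximately conserved along the trajectory, which keeps the acceptance probability of the proposed model high even for long trajectories.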

Figure 1 .
Figure 1. Comparison between the sampling strategies of (a) the Metropolis algorithm and (b) the Hamiltonian Monte Carlo algorithm.

Figure 2 .
Figure 2. Scenario of an induced earthquake in the Groningen area. a. Horizontal slice of the Groningen P-wave velocity model. b. Zoom of the area indicated by the green rectangle in a. Inverted triangles indicate locations of KNMI stations (i.e., the x_r). c. Vertical slice along the blue line in b. d. Vertical slice along the red line in b.

Figure 3 .
Figure 3. Comparison between the elementary seismograms due to a source at the actual location (red star) and the receiver at G094 (green), and the elementary seismograms resulting from the implementation of source-receiver reciprocity (yellow). The equality of the traces confirms successful implementation of source-receiver reciprocity. Along the vertical axis, all six (independent) elementary seismograms are depicted.
Figure 4(c) is the result of applying equation (18) to each of the displacement envelopes. In Figure 4(d), we show the result of stacking all signals in Figure 4(c). The vertical blue line indicates the time at which the stack of the cross-correlated envelopes attains its maximum value, i.e., T0_shift; the vertical red line represents the deviation of the initial origin time prior from the true origin time (i.e., 9 seconds in this example).
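The origin-time estimation described above (cross-correlate the envelopes per station, stack the correlations, pick the maximum) can be sketched as follows; names are hypothetical and the implementation uses a plain discrete cross-correlation rather than the paper's exact equation (18):

```python
import numpy as np

def origin_time_shift(env_syn, env_obs, dt):
    """Estimate the origin-time shift T0_shift: cross-correlate the
    envelopes of modeled and observed traces per station, stack the
    correlations, and return the lag (in seconds) of the stack's maximum.

    env_syn, env_obs : (n_stations, n_samples) envelope arrays
    dt               : sampling interval in seconds
    """
    n = env_syn.shape[1]
    stack = np.zeros(2 * n - 1)
    for syn, obs in zip(env_syn, env_obs):
        stack += np.correlate(obs, syn, mode="full")
    lag = int(np.argmax(stack)) - (n - 1)   # samples the observations lag behind
    return lag * dt
```

Stacking over stations before picking the maximum makes the estimate robust against noise on any individual trace.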

Figure 4 .
Figure 4. The results of estimating the prior mean of the origin time using equation (18) (d), given the envelopes of the modeled displacements (a), the noisy synthetic "observed" seismograms (b), and the cross-correlation between (a) and (b) in (c).

Figure 5 .
Figure 5. Full workflow of our iterative HMC scheme.

Figure 6 .
Figure 6. Results of our iterative HMC scheme for a total of 20 chains, each involving 2500 steps, of which the first 500 are discarded as burn-in samples (not shown). The red lines are the true values.

Figure 7 .
Figure 7. Ten two-dimensional marginal probability densities of the inverted model parameters. Black dots are all the samples given the results from all chains (Figure 6), whereas the green dots represent the samples from chains that give a VR ≥ 85% of the maximum VR, which then represent our final posterior. The red lines are the true values.

Figure 8 .
Figure 8. The final marginal posterior distributions (green samples in Figure 7). The means are represented by the blue lines and the gray lines are the standard deviations. Red lines are the true values.

Figure 10 .
Figure 10. Summary of running the workflow using the 25 initial prior means. The distance between each of the initial centroid prior means and the true centroid is indicated on top. Below, we show for each of these initial prior means the VR as a function of chain number (vertical axis). Chains associated with a VR ≥ 85% of the maximum VR (0.4) are labeled with green dots, whereas chains with a posterior mean yielding seismograms for which the VR does not exceed this threshold are labeled with black dots.

Figure 12 .
Figure 12. Marginal posterior distributions for each model parameter given the selected chains (depicted as green dots in Figure 10). The blue lines represent the means, and the standard deviations are represented by the grey lines. The red lines represent the true values.