Quantifying Fracture Networks Inferred From Microseismic Point Clouds by a Gaussian Mixture Model With Physical Constraints

Microseismicity is generated by slip on fractures and faults and can be used to infer natural or anthropogenic deformation processes in the subsurface. Yet identifying patterns and fractures from microseismic point clouds is a major challenge that typically relies on the skill and judgment of practitioners. Clustering has previously been applied to tackle this problem, but with limited success. Here, we introduce a probabilistic clustering method to identify fracture networks, based on a Gaussian mixture model algorithm with physical constraints. This method is applied to a rich microseismic data set recorded during the hydraulic fracturing of eight horizontal wells in western Canada. We show that the method is effective for distinguishing hydraulic‐fracture‐created events from induced seismicity. These fractures follow a log‐normal distribution and reflect the physical mechanisms of the hydraulic fracturing process. We conclude that this method has wide applicability for interpreting natural and anthropogenic processes in the subsurface.


Introduction
Fracture networks play a fundamental role for subsurface fluid transport in oil and gas reservoirs, volcanoes, fault systems, and aquifers. Many types of discrete-fracture-network models have been developed to understand these processes. Some examples include models of contaminant transport (Berkowitz, 2002), hydroshearing for geothermal energy (Tezuka & Niitsuma, 2000), block caving for mining (Hudyma, 2008), and hydraulic fracturing (HF) for unconventional oil and gas (Maxwell, Mack, et al., 2015). However, the general inaccessibility of subsurface-fracture networks means that inference methods for characterization and analysis are required.
Slip activation of fractures and faults can generate seismic waves that can be used to characterize natural and anthropogenic fluid transport in the subsurface. This microseismicity can provide important clues about the location, density, and orientation of fractures (Eaton, 2018). Microseismic point clouds typically have noisy, complex, and incomplete spatiotemporal expressions. This complicates their use in investigating subsurface fracture networks. Most attempts to infer fracture characteristics rely on heuristic pattern recognition, but the large number of events in modern microseismic catalogs quickly overwhelms these approaches. This problem is compounded by the multivariate attributes of microseismic data, which tends to confound simple analytical methods such as cross plots.
This study investigates HF-created microseismicity and induced seismicity (IS; Shapiro et al., 2006;Eaton, 2018). Hydraulic fracturing creates new fractures and dilates natural fractures by injecting fluids a high pressure through multiple discrete stages in a wellbore (Hubbert & Willis, 1972;Smith & Montgomery, 2015). The creation of new fractures is initiated by a negative effective stress that exceeds the tensile strength of ©2019. The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

10.1029/2019GL083406
Key Points: • A clustering method with a Gaussian mixture model and physical constraints can identify fracture networks based on microseismic point clouds • Using physical constraints, this method can distinguish hydraulicfracturing-created microseismicity from induced seismicity (slip on faults) • The quantified fracture network follows a log-normal distribution and reveals the physical mechanisms of hydraulic fracturing Supporting Information: • Supporting Information S1 • Data Set S1 • Data Set S2 • Movie S1 • Figure S1 • Figure S2 • Figure S3 • Figure S4 • Figure S5 • Figure S6 • Figure S7 • Figure S8 • Figure S9 • Figure S10 • Figure S11 • Figure S12 • Figure S13   intact rock (Johann et al., 2016). The propagation of this fracture then proceeds toward the maximum horizontal stress in accordance with fracture mechanics. The propagation is governed by the net pressure, elastic properties of the surrounding rock and the fracture toughness of the rock in front of the propagating fracture (Valko & Economides, 1995). The failure of natural fractures is also an effective stress-driven process, but it can occur in either shear or tensile fracturing mode when the frictional resistance and effective stresses surrounding a natural fracture are overcome (Rutledge & Phillips, 2003). Microseismicity is a key tool for understanding the physics of HF through the failure of intact rock and natural fractures (Warpinski, 2013).
In this paper, we introduce a new clustering method that employs a Gaussian mixture model (GMM) with physical constraints. These include a spatiotemporal-based separation of events into stages followed by the application of a diffusion-based model (Shapiro, 2015) to separate HF-created microseismicity from IS. The method is applied to a rich microseismic data set recorded during a hydraulic-fracturing treatment. We show that the application of hybrid analytical-clustering methods holds significant promise for characterizing fractures in the subsurface and investigating IS. This is one of the first applications of such a method for fracture quantification and we show its superior performance relative to application of GMMs only.

Data and Methods
The data set in this study ( Figure 1) consists of 17,540 microseismic events observed during a HF treatment of eight horizontal wells (Figures 1a and 1b). The wells are located at a depth of~2,500 m within the Duvernay formation, a prolific unconventional oil and gas formation in Western Canada. The maximum horizontal stress roughly trends east-west in the study area, which is in a strike-slip tectonic regime (Anderson, 1951). The HF program stimulated stages in multiple wells simultaneously using an industrial practice called zipper fracking (Eaton, 2018). This causes concurrent microseismic events in multiple wells and further challenges event interpretation. The HF program consisted of 153 stages completed during two time periods ( Figure 1c). Wells 1 to 4 were completed on days 0 through 17 and wells 5 to 8 were completed on days 26 through 42. The microseismic signatures are illustrated by the overlapping colors (representing different wells) in Figure 1c. Between 430 and 3,000 m 3 of fluid and 45 to 300 tons of proppant were injected into each stage. Several induced seismic events are visible east of the wells (Figure 1a). The observed microseismic events are characterized by reported moment magnitudes (MW) in the range −2.4 to 2.3. The magnitudes were estimated based on low-frequency spectral amplitudes, and it is likely that events exceeding MW 1 are above the saturation limits of the geophones (cf. Eaton et al., 2018). This limitation has no effect on the clustering analysis considered here.
The microseismic data set is defined by spatial coordinates (R), origin time (T), M W , moment tensor (MT) solution. Each event in the catalog is also assigned to a well and stage. To process the data, events below the magnitude of completeness (M C ) are removed. The M c is the lowest magnitude that a specific geophone array records completely. It is determined using a combination of the binned maximum-likelihood (Aki, 1965;Bender, 1983;Freedman & Diaconis, 1981), maximum curvature, and goodness-of-fit methods (Wiemer & Wyss, 2000) with a cross-validated approach (Kuhn & etal, 2008). Removing events below the M c provides an unbiased data set with 11,978 events. Additional details on the data processing are provided in Text S1 in the supporting information. The data set has event location uncertainty that ranges from 0.1 m for large-magnitude events to 900 m for low-magnitude events near the study area boundary. This, along with the uncertain HF initiation point, create variability in the clustering method results. A bootstrap resampling (Text S2, Efron (1982)) of the location uncertainty and HF initiation point uncertainty (Williamson, 1999) quantifies the mean (M) and standard deviation (s.d.) of 10,000 realizations for key metrics in this study.
GMM algorithms rely on the multidimensional distance (or dissimilarity) between data attributes. Consequently, the attributes used in the method strongly affect its results. We evaluate the importance of each attribute and select an appropriate subset of attributes using correlation and PCAs. First, a nonparametric correlation (Spearman, 1904) is conducted, as opposed to a Pearson correlation (Pearson, 1897), because the data are nonnormal (Rahman & Govindarajulu, 1997;Razali & Wah, 2011;Shapiro & Wilk, 1965;Wei & Simko, 2013). Then, PCA is used to visualize the influence of each attribute on the data variance and group-similar attributes together in reduced-dimensionality space (Hotelling, 1933;Pearson, 1901). The application of correlation and PCAs shows that the R and T attributes are more influential than the MT and M W attributes. Based on this, a lower-dimension data set with R and T attributes only is considered in the clustering method. Creating this subset is shown to improve the performance of the clustering algorithm. Thus, the use of correlation and PCA (Text S3) is an essential step in the implementation of the algorithm. Additional details on correlation and PCAs and their results are provided in Text S3. Beyond attribute selection, the PCA results also suggest that fracture orientation and spatiotemporal proximity of the microseismic events to the fracturing stage play a role in the generated magnitude.
We compare and validate the GMM method against three widely used deterministic clustering algorithms: K-medoids (MacQueen, 1967;Kaufman & Rousseeuw, 2009;Hennig, 2013), agglomerative clustering (R Core Team, 2014;Ward, 1963), and density-based spatial clustering with noise (Sander et al., 1998;Hahsler & Piekenbrock, 2017). Validation statistics based on the within-cluster and between-cluster distances are applied to evaluate this comparison (Caliński & Harabasz, 1974;Charrad et al., 2014;Kaufman & Rousseeuw, 1987). Details on clustering are discussed in Text S4. The comparison shows that the GMM method reveals lineations in regions of high point cloud density, where deterministic algorithms struggle with noisy point cloud data.
The GMM algorithm is a probabilistic clustering method (Banfield & Raftery, 1993;Celeux andGovaert, 1995, Fraley &Raftery, 2006) that assigns multivariate Gaussian distributions via an expectation-maximization algorithm (Dempster et al., 1977) with the Bayesian information criterion (Schwarz, 1978). The GMM algorithm can address overlapping clusters, uncertainty in point locations, and high densities of microseismic observations to a degree. The use of the Bayesian information criterion results in a parsimonious solution that approximates Bayesian evidence, optimizing the trade-off between data fit and complexity (i.e., the number of clusters). The GMM algorithm has been successfully applied to studies in hydrology (Kim et al., 2014), energy (Kazor & Hering, 2015), and geochemistry (Ellefsen et al., 2014). It also has the advantage of defining a confidence ellipse for each cluster. The geometric properties of each ellipse are directly interpretable, avoiding the need to apply additional techniques such as the convex hull method (Barber et al., 1996). An ellipsoid is an ideal representation of a hydraulic fracture, which tends to have dimensions of the order of 10 2 m (length), 10 1 to 10 2 m (height), and 10 -2 m (aperture; Nelson, 2001). The ratio of the three principal axes in each cluster quantifies its planarity and thus the confidence that each cluster is a fracture. Nearspherical clusters that contain a small number of microseismic events are classified as spurious and removed from further interpretation. In addition, a series of simple physical constraints are introduced to segregate the point cloud into groups and to improve the GMM method's ability to identify fractures.

Results and Discussion
The GMM method is first applied to R and T attributes from the preprocessed data set, resulting in the identification of 25 clusters (Figure 2). This unconstrained application reveals a cluster of induced events east of the wells (identified by the gray clustered points in Figure 1) but is unable to meaningfully cluster hydraulic fractures. Most clusters trend parallel to the horizontal portion of each well and close to the injection stages, where the density of microseismic events is highest. But, they generally fail to capture the geometry of hydraulic fractures, which are expected to be elongated clusters oriented perpendicular to the minimumstress axis. A more granular analysis with stage-level separation is therefore required for a meaningful investigation of microseismicity.
The stage-level separation improves clustering performance by dividing the data set into 152 stages and applying a new classification algorithm to associate each microseismic event to a completion stage, resulting in up to 310 events (M: 310, s.d.: 2). This algorithm applies a Euclidian distance in space and time (see Text S5). Note that classification prior to this part of the analysis was based on operator classification which considers only the timing of microseismic events relative to the start of a completion stage. However, HF is a fluid-driven diffusive process and it has been demonstrated that microseismic events can continue for hours and days after the end of a particular stage (Johann et al., 2016;Shapiro, 2015;Shapiro et al., 2006).

Geophysical Research Letters
Separating stages by origin time alone can result in erroneously classified events that are simply a continuation of the HF process after an injection stage ends. Use of both time and distance reclassifies 25% of the data set relative to the time-only classification. This significantly improves the clustering performance, as illustrated by a portion of the data set from well 3, stage 2 (w3s2) that is provided as Data Set S1. The original data for this single-stage portion (Figure 3a) contain 310 events and reveal seven clusters, whereas the reclassified data (Figure 3b) contain 216 events (M: 203, s.d.: 21) and reveal four clusters. The reclassification of the distant event in the northeast tightens the noisy cluster (red in Figures 3a and 3b) and eliminates a previously identified lineation (dark blue, Figure 3a) to provide a more parsimonious solution.

10.1029/2019GL083406
The events associated with each stage still contain both HF-created and induced events. Therefore, a physical constraint reflecting HF physics is required to separate these two types of events. Here, we apply an analytical diffusion-based model (Shapiro et al., 2006) to define a sphere of influence for each hydraulic fracture. This model (cf. Shapiro, 2015, pp. 167, eq. 4.3 & 4.4) relies on the approximates of the Perkins, Kern, and Nordgren semianalytical model of hydraulic fracture growth in an isotropic rock mass (Nordgren, 1972). The length of a constant-height (h f ) hydraulic fracture (L) is estimated based on the average injection rate (q i ) and fluid leakoff (c L ) into the formation, as detailed in equations (1) and (2) and Text S5. We assume a fracture height equal to the average thickness of the Duvernay formation (75 m) in the study area and a typical leakoff coefficient for shale (0.001 m/min; Munson, 2015). identified by the Gaussian mixture model algorithm after reclassification and application of physical constraints. Each contour line represents 0.5 standard deviations and the contour edge represents a 95% confidence interval. Normalized semi-log histograms illustrate the height (blue) and length (red) of the hydraulic fracturing-created clusters (c) and IS clusters (d). The empirical cumulative distribution functions (black lines) are compared with log-normal distributions (blue for height and red for length) that were fit using the maximum likelihood method.
The model predicts a mean fracture length of 379 m for w3s2 where fluid was injected at a rate of 14.9 m 3 / min for 390 min. This length is increased by 1 standard deviation to project a spherical region of influence of 712 m. This sphere physically represents an estimate of the stimulated reservoir volume (Maxwell et al., 2009), and it is used to separate HF-created events and induced microseismicity. The variance is a tuning parameter for the method, but 1 standard deviation seems to reflect the likely extent of a hydraulic fracture. Two standard deviations results in spheres of influence nearly 10 times the expected size of a hydraulic fracture. The application of the physical constraint separates the stage-segregated events into 148 HF-created groups (M: 146, s.d.: 0.5) and 38 IS groups (M: 37, s.d.: 1.2). This successful application of a diffusion-based spatiotemporal model to separate HF-created microseismicity and IS reinforces the idea that HF-created microseismicity is fluid driven and reflects the geomechanical response of unconventional reservoirs.
We then apply the GMM to each of the 186 groups of HF-created and IS groups resulting in the identification of 488 clusters (M: 484, s.d.: 9.6). Figure 4 presents the results for w3s2, with the results from the entire data set presented as Figures S13 and S14. In many cases, the HF-created events are clustered into single ellipsoids as a result of optimizing the number of clusters with the Bayesian information criterion. In other cases, multiple clusters (up to 11, M: 8, s.d.: 0.8) are identified. The orientation and dimensions of each cluster are quantified by a multidimensional 95% confidence ellipsoid. The covariance matrix of each ellipse represents the principal axes of the cluster, which quantify fracture length and height (Text S5).
Using the results, we now study the physical meaning of clusters. In an isotropic and homogeneous rock mass, hydraulic fractures propagate parallel to the maximum principal stress (Jaeger, Cook, and Zimmerman, 2007). Induced seismicity, in contrast, is expected to be localized within regions that are oblique to the maximum principal stress, where the shear stress is maximized (Ellsworth, 2013). Equal-area stereographic projections of the clustering results ( Figure 5)  . This is consistent with previous numerical simulations of fracture growth (Maxwell, Zhang, & Damjanac, 2015) and the assumptions of the Perkins, Kern, and Nordgren model approximation. The distribution of fracture heights and lengths are both well fit by log-normal distributions in this study (Figures 5c and 5d). This provides evidence that the results reflect a fracture network, since it has been established that fracture networks are self-similar with attributes generally following log-normal distributions (Eaton et al., 2014;Gutierrez & Youn, 2015;Nelson, 2001).

Conclusions
This study presents a new method for fracture network identification from microseismic point cloud data. Our method uses a GMM algorithm with physical constraints to probabilistically cluster events into hydraulically -created fractures and IS. The method is unbiased, automated, and provides a numerical approach for fitting elliptical planes in multidimensional spaces. It is computationally efficient and can be applied in real time. A HF program in an unconventional oil and gas reservoir is used as a case study. We demonstrate that our method is more effective than other deterministic clustering algorithms, revealing hundreds of fractures in a preprocessed microseismic data set with 11,978 events. The method successfully differentiates HF-created events from IS (fault slip) using a diffusion-based model. It is capable of extracting important information from microseismic data, which are inherently ambiguous due to the inaccessibility of the fractured region. The orientation and geometry of the identified fractures reflect the physical mechanisms of the HF process. This method, while not a panacea for clustering point cloud data, provides insights with widespread implications for researchers and industry practitioners. For example, to identify weak fault planes during monitoring of mining or geothermal microseismicity. Comparing dense lineations to operationally expected microseismicity provides a new tool for fault identification and IS mitigation.