
Machine learning for graph-based representations of three-dimensional discrete fracture networks


Abstract

Structural and topological information plays a key role in modeling flow and transport through fractured rock in the subsurface. Discrete fracture network (DFN) computational suites such as dfnWorks (Hyman et al., Comput. Geosci. 84, 10–19, 2015) are designed to simulate flow and transport in such porous media. Flow and transport calculations reveal that a small backbone of fractures exists, where most flow and transport occurs. Restricting the flowing fracture network to this backbone provides a significant reduction in the network’s effective size. However, the particle-tracking simulations needed to determine this reduction are computationally intensive. Such methods may be impractical for large systems or for robust uncertainty quantification of fracture networks, where thousands of forward simulations are needed to bound system behavior. In this paper, we develop an alternative network reduction approach to characterizing transport in DFNs by combining graph theoretical and machine learning methods. We consider a graph representation in which nodes signify fractures and edges denote their intersections. Using random forest and support vector machines, we rapidly identify a subnetwork that captures the flow patterns of the full DFN, based primarily on node centrality features in the graph. Our supervised learning techniques train on particle-tracking backbone paths found by dfnWorks, but run in negligible time compared to those simulations. We find that our predictions can reduce the network to approximately 20% of its original size, while still generating breakthrough curves consistent with those of the original network.


References

  1. Abelin, H., Birgersson, L., Moreno, L., Widén, H., Ågren, T., Neretnieks, I.: A large-scale flow and tracer experiment in granite: 2. Results and interpretation. Water Resour. Res. 27(12), 3119–3135 (1991)

  2. Abelin, H., Neretnieks, I., Tunbrant, S., Moreno, L.: Final report of the migration in a single fracture: experimental results and evaluation. Nationale Genossenschaft für die Lagerung radioaktiver Abfälle (1985)

  3. Aldrich, G., Hyman, J., Karra, S., Gable, C., Makedonska, N., Viswanathan, H., Woodring, J., Hamann, B.: Analysis and visualization of discrete fracture networks using a flow topology graph. IEEE T. Vis. Comput. Gr. (2016)

  4. Andresen, C.A., Hansen, A., Le Goc, R., Davy, P., Hope, S.M.: Topology of fracture networks. Front. Phys. 1, 1–5 (2013). https://doi.org/10.3389/fphy.2013.00007

  5. Anthonisse, J.: The rush in a directed graph. Stichting Mathematisch Centrum, Mathematische Besliskunde (BN 9/71), 1–10 (1971). http://www.narcis.nl/publication/RecordID/oai%3Acwi.nl%3A9791

  6. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory (COLT '92), p 144 (1992). https://doi.org/10.1145/130385.130401

  7. Brandes, U., Fleischer, D.: Centrality measures based on current flow. Proc. 22nd Symp. Theor. Aspects Comput. Sci. 3404, 533–544 (2005)

  8. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324

  9. Chen, C., Liaw, A., Breiman, L.: Using random forest to learn imbalanced data. Tech. Rep. 666, University of California, Berkeley (2004)

  10. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). https://doi.org/10.1007/BF00994018

  11. de Dreuzy, J.R., Méheust, Y., Pichot, G.: Influence of fracture scale heterogeneity on the flow properties of three-dimensional discrete fracture networks. J. Geophys. Res.-Sol. Ea. 117(B11), 1–21 (2012)

  12. Frampton, A., Cvetkovic, V.: Numerical and analytical modeling of advective travel times in realistic three-dimensional fracture networks. Water Resour. Res. 47(2), 1–16 (2011)

  13. Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry 40(1), 35–41 (1977)

  14. Ghaffari, H.O., Nasseri, M.H.B., Young, R.P.: Fluid flow complexity in fracture networks: analysis with graph theory and LBM. arXiv:1107.4918 (2011)

  15. Goetz, J.N., Brenning, A., Petschko, H., Leopold, P.: Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Comput. Geosci. 81, 1–11 (2015)

  16. Hagberg, A.A., Schult, D.A., Swart, P.: Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy 2008), pp 11–16 (2008)

  17. Ho, T.K.: Random decision forests. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition, pp 278–282 (1995)

  18. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998). https://doi.org/10.1109/34.709601

  19. Hope, S.M., Davy, P., Maillot, J., Le Goc, R., Hansen, A.: Topological impact of constrained fracture growth. Front. Phys. 3, 1–10 (2015). https://doi.org/10.3389/fphy.2015.00075

  20. Hyman, J., Jiménez-Martínez, J., Viswanathan, H., Carey, J., Porter, M., Rougier, E., Karra, S., Kang, Q., Frash, L., Chen, L., et al.: Understanding hydraulic fracturing: a multi-scale problem. Phil. Trans. R. Soc. A 374(2078), 20150426 (2016)

  21. Hyman, J.D., Aldrich, G., Viswanathan, H., Makedonska, N., Karra, S.: Fracture size and transmissivity correlations: implications for transport simulations in sparse three-dimensional discrete fracture networks following a truncated power law distribution of fracture size. Water Resour. Res. 52(8), 6472–6489 (2016). https://doi.org/10.1002/2016WR018806

  22. Hyman, J.D., Gable, C.W., Painter, S.L., Makedonska, N.: Conforming Delaunay triangulation of stochastically generated three dimensional discrete fracture networks: a feature rejection algorithm for meshing strategy. SIAM J. Sci. Comput. 36(4), A1871–A1894 (2014)

  23. Hyman, J.D., Hagberg, A., Srinivasan, G., Mohd-Yusof, J., Viswanathan, H.: Predictions of first passage times in sparse discrete fracture networks using graph-based reductions. Phys. Rev. E 96, 013304 (2017)

  24. Hyman, J.D., Karra, S., Makedonska, N., Gable, C.W., Painter, S.L., Viswanathan, H.S.: dfnWorks: a discrete fracture network framework for modeling subsurface flow and transport. Comput. Geosci. 84, 10–19 (2015). https://doi.org/10.1016/j.cageo.2015.08.001

  25. Hyman, J.D., Painter, S.L., Viswanathan, H., Makedonska, N., Karra, S.: Influence of injection mode on transport properties in kilometer-scale three-dimensional discrete fracture networks. Water Resour. Res. 51(9), 7289–7308 (2015)

  26. James, G., Witten, D., Hastie, T., Tibshirani, R.: An introduction to statistical learning. Springer, Berlin (2013)

  27. Jenkins, C., Chadwick, A., Hovorka, S.D.: The state of the art in monitoring and verification—ten years on. Int. J. Greenh. Gas Con. 40, 312–349 (2015)

  28. Karra, S., Makedonska, N., Viswanathan, H., Painter, S., Hyman, J.: Effect of advective flow in fractures and matrix diffusion on natural gas production. Water Resour. Res. 51, 1–12 (2014)

  29. LaGriT: Los Alamos Grid Toolbox (LaGriT), Los Alamos National Laboratory. http://lagrit.lanl.gov (2013)

  30. Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3), 18–22 (2002)

  31. Lichtner, P., Hammond, G., Lu, C., Karra, S., Bisht, G., Andre, B., Mills, R., Kumar, J.: PFLOTRAN user manual: a massively parallel reactive flow and transport model for describing surface and subsurface processes. Tech. Rep. LA-UR-15-20403, Los Alamos National Laboratory (2015)

  32. Maillot, J., Davy, P., Le Goc, R., Darcel, C., de Dreuzy, J.R.: Connectivity, permeability, and channeling in randomly distributed and kinematically defined discrete fracture network models. Water Resour. Res. 52(11), 8526–8545 (2016)

  33. Makedonska, N., Painter, S.L., Bui, Q.M., Gable, C.W., Karra, S.: Particle tracking approach for transport in three-dimensional discrete fracture networks. Computat. Geosci. 19(5), 1123–1137 (2015)

  34. Mudunuru, M.K., Karra, S., Makedonska, N., Chen, T.: Joint geophysical and flow inversion to characterize fracture networks in subsurface systems. arXiv:1606.04464 (2016)

  35. National Research Council: Rock fractures and fluid flow: contemporary understanding and applications. National Academy Press, Washington, D.C. (1996)

  36. Neuman, S.: Trends, prospects and challenges in quantifying flow and transport through fractured rocks. Hydrogeol. J. 13(1), 124–147 (2005)

  37. Newman, M.: A measure of betweenness centrality based on random walks. Soc. Netw. 27(1), 39–54 (2005)

  38. Osuna, E., Freund, R., Girosi, F.: Support vector machines: training and applications. Tech. Rep. AIM-1602, Massachusetts Institute of Technology (1997)

  39. Painter, S., Cvetkovic, V.: Upscaling discrete fracture network simulations: an alternative to continuum transport models. Water Resour. Res. 41, W02002 (2005)

  40. Painter, S., Cvetkovic, V., Selroos, J.O.: Power-law velocity distributions in fracture networks: numerical evidence and implications for tracer transport. Geophys. Res. Lett. 29(14), 1–4 (2002)

  41. Painter, S.L., Gable, C.W., Kelkar, S.: Pathline tracing on fully unstructured control-volume grids. Computat. Geosci. 16(4), 1125–1134 (2012)

  42. Rasmuson, A., Neretnieks, I.: Radionuclide transport in fast channels in crystalline rock. Water Resour. Res. 22(8), 1247–1256 (1986)

  43. Robinson, B.A., Dash, Z.V., Srinivasan, G.: A particle tracking transport method for the simulation of resident and flux-averaged concentration of solute plumes in groundwater models. Comput. Geosci. 14(4), 779–792 (2010)

  44. Russell, S.: Artificial intelligence: a modern approach. Prentice Hall, Upper Saddle River (2010)

  45. Santiago, E., Romero-Salcedo, M., Velasco-Hernández, J.X., Velasquillo, L.G., Hernández, J.A.: An integrated strategy for analyzing flow conductivity of fractures in a naturally fractured reservoir using a complex network metric. In: Batyrshin, I., Mendoza, M.G. (eds.) Advances in Computational Intelligence: 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, San Luis Potosí, Mexico, October 27 – November 4, 2012, Revised Selected Papers, Part II, pp 350–361. Springer, Berlin (2013)

  46. Santiago, E., Velasco-Hernández, J.X., Romero-Salcedo, M.: A methodology for the characterization of flow conductivity through the identification of communities in samples of fractured rocks. Expert Syst. Appl. 41(3), 811–820 (2014). https://doi.org/10.1016/j.eswa.2013.08.011

  47. Santiago, E., Velasco-Hernández, J.X., Romero-Salcedo, M.: A descriptive study of fracture networks in rocks using complex network metrics. Comput. Geosci. 88, 97–114 (2016). https://doi.org/10.1016/j.cageo.2015.12.021

  48. Srinivasan, G., Tartakovsky, D.M., Dentz, M., Viswanathan, H., Berkowitz, B., Robinson, B.: Random walk particle tracking simulations of non-Fickian transport in heterogeneous media. J. Comput. Phys. 229(11), 4304–4314 (2010)

  49. Vapnik, V., Chervonenkis, A.Y.: Theory of pattern recognition: statistical problems of learning [in Russian] (1974)

  50. Vapnik, V., Lerner, A.: Pattern recognition using generalized portrait method. Autom. Remote Control 24, 774–780 (1963)

  51. Vevatne, J.N., Rimstad, E., Hope, S.M., Korsnes, R., Hansen, A.: Fracture networks in sea ice. Front. Phys. 2, 1–8 (2014). https://doi.org/10.3389/fphy.2014.00021

  52. Witherspoon, P.A., Wang, J., Iwai, K., Gale, J.: Validity of cubic law for fluid flow in a deformable rock fracture. Water Resour. Res. 16(6), 1016–1024 (1980)


Acknowledgments

The authors are grateful to an anonymous reviewer for comments and suggestions that helped improve the clarity and readability of the manuscript.

Funding

This work was supported by the Computational Science Research Center at San Diego State University, the National Science Foundation Graduate Research Fellowship Program under Grant No. 1321850, and the U.S. Department of Energy at Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396 through the Laboratory Directed Research and Development program. JDH thanks the LANL LDRD Director’s Postdoctoral Fellowship Grant No. 20150763PRD4 for partial support. JDH, HSV, and GS thank LANL LDRD-DR Grant No. 20170103DR for support.

Author information


Corresponding author

Correspondence to Jeffrey D. Hyman.

Appendices

Appendix: Flow equations and transport simulations

We assume that the matrix surrounding the fractures is impermeable and that there is no interaction between flow within the fractures and the solid matrix. Within each fracture, flow is modeled as Darcy flow. The aperture within each fracture is uniform and isotropic, but apertures vary between fractures and are positively correlated with fracture size [21]. Locally, we adopt the cubic law [52] to relate the permeability of each fracture to its aperture. We drive flow through the domain by applying a pressure difference of 1 MPa across the domain, aligned with the x axis. No-flow boundary conditions are applied along the lateral boundaries, and gravity is not included in these simulations. These boundary conditions, along with mass conservation and Darcy’s law, are used to form an elliptic partial differential equation for the steady-state distribution of pressure within each network

$$ \nabla \cdot (k \nabla P ) = 0~. $$
(1)

Once the distribution of pressure and the volumetric flow rates are determined by numerically integrating (1), the methodology of Makedonska et al. [33] and Painter et al. [41] is used to reconstruct the Eulerian velocity field u(x) at every node of the conforming Delaunay triangulation throughout each network.
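The following sketch illustrates, under strong simplifications, the kind of computation implied by (1): a one-dimensional finite-volume solve of (k P′)′ = 0 along a chain of cells whose permeabilities follow the cubic law, with fixed pressures at the two ends. It is not the dfnWorks/PFLOTRAN solver; the aperture field and all numerical values are illustrative placeholders.

```python
# Illustrative sketch (not the dfnWorks/PFLOTRAN solver): solve a 1-D analogue of
# Eq. (1), d/dx( k dP/dx ) = 0, on a chain of cells with heterogeneous permeability.
# The cubic law sets k = b^2 / 12 from a synthetic aperture field b.
import numpy as np

def solve_pressure_1d(apertures, p_in=2.0e6, p_out=1.0e6, dx=1.0):
    """Finite-volume solve of (k P')' = 0 with Dirichlet pressures at both ends."""
    k = apertures**2 / 12.0                      # cubic-law permeability per cell
    n = k.size
    kf = 2.0 * k[:-1] * k[1:] / (k[:-1] + k[1:]) # harmonic average at interior faces
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    for i in range(n):
        if i > 0:
            A[i, i]   += kf[i - 1] / dx**2
            A[i, i - 1] -= kf[i - 1] / dx**2
        if i < n - 1:
            A[i, i]   += kf[i] / dx**2
            A[i, i + 1] -= kf[i] / dx**2
    # Dirichlet boundary conditions: fixed pressure in the first and last cells,
    # giving the 1 MPa difference used in the simulations described above.
    A[0, :] = 0.0;  A[0, 0] = 1.0;   rhs[0] = p_in
    A[-1, :] = 0.0; A[-1, -1] = 1.0; rhs[-1] = p_out
    return np.linalg.solve(A, rhs)

b = np.random.lognormal(mean=-9.0, sigma=0.5, size=50)  # synthetic apertures (m)
P = solve_pressure_1d(b)                                # pressure profile (Pa)
```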

The spreading of a nonreactive, conservative solute is represented by a cloud of passive tracer particles, i.e., using a Lagrangian approach. The imposed pressure gradient is aligned with the x axis, and thus the primary direction of flow is the x direction. Particles are released from locations on the inlet plane x_0 at time t = 0 and are followed until they exit the domain at the outlet plane x_L. The trajectory x(t; a) of a particle starting at a on x_0 is given by the advection equation, \(\dot{\mathbf{x}}(t; \mathbf{a}) = \mathbf{v}(t;\mathbf{a})\) with x(0; a) = a, where the Lagrangian velocity v(t; a) is given in terms of the Eulerian velocity u as v(t; a) = u[x(t; a)]. The mass m(a) represented by each particle and its breakthrough time τ(x_L; a) at the outlet plane x_L = (L, y, z) can be combined to compute the total solute mass flux Ψ(t, x_L) that has broken through at time t,

$$ {\Psi}(t,\mathbf{x}_{L} ) = \frac{1}{M} \int\limits_{{\Omega}_{a}} d m(\mathbf{a}) \delta[t - \tau(\mathbf{x}_{L},\mathbf{a})]~, $$
(2)

where Ω_a is the set of all particles. Here, mass is distributed uniformly amongst the particles, i.e., resident injection is adopted. For more details about the injection mode, see Hyman et al. [25].
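As a minimal illustration of how (2) is evaluated in practice, the sketch below assembles the cumulative breakthrough curve (the time integral of Ψ) from particle masses and arrival times. The arrival times here are synthetic placeholders rather than dfnWorks particle-tracking output.

```python
# Minimal sketch of Eq. (2): build the cumulative breakthrough curve from particle
# arrival times and masses. All data below are synthetic stand-ins.
import numpy as np

def breakthrough_curve(arrival_times, masses, t_grid):
    """Cumulative normalized solute mass that has crossed the outlet plane by time t."""
    M = masses.sum()
    order = np.argsort(arrival_times)
    t_sorted = arrival_times[order]
    m_cum = np.cumsum(masses[order]) / M
    # For each query time, the fraction of mass with tau <= t (cumulative form of Eq. 2).
    idx = np.searchsorted(t_sorted, t_grid, side="right")
    return np.where(idx > 0, m_cum[np.maximum(idx - 1, 0)], 0.0)

tau = np.random.lognormal(mean=2.0, sigma=1.0, size=10_000)  # synthetic arrival times
m = np.full_like(tau, 1.0)                                   # resident injection: uniform mass
t = np.logspace(-1, 3, 200)
F = breakthrough_curve(tau, m, t)                            # cumulative breakthrough
```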

Algorithm description

2.1 Random forest

The random forest method is based on constructing a collection of decision trees. A decision tree [44] is a tree whose interior nodes represent binary tests on a feature and whose leaves represent classifications. An effective way of constructing such a tree from training data is to measure how different tests, also called splits, separate the data. The information gain measure compares the entropy of the parent node to the weighted average entropy of the child nodes for each split. The splits with the greatest information gain are executed, and the procedure is repeated recursively for each child node until no more information is gained, or there are no more possible splits. A limitation of decision trees is that the topology is completely dependent on the training set. Variations in the training data can produce substantially different trees.
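A toy illustration of the information-gain criterion described above, assuming synthetic labels and a single feature; entropy is measured in bits and the candidate split is a simple threshold test.

```python
# Toy illustration of the information-gain split criterion for a decision tree:
# compare the entropy of a parent node with the weighted entropy of its two children.
import numpy as np

def entropy(labels):
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, split_mask):
    """Entropy reduction produced by a binary split (True/False mask over samples)."""
    left, right = labels[split_mask], labels[~split_mask]
    w_left, w_right = left.size / labels.size, right.size / labels.size
    return entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))

y = np.array([0, 0, 0, 1, 1, 1, 1, 0])                    # class labels at a node
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 0.6, 0.4])    # one feature
print(information_gain(y, x > 0.5))                        # gain for the test "x > 0.5"
```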

The random forest method [17, 18] addresses this problem by constructing a collection of trees using subsamples of the training data. These subsamples are generated with replacement (bootstrapping), so that some data points are sampled more than once and some not at all. The sampled “in-bag” data points are used to generate a decision tree. The “out-of-bag” observations (the ones not sampled) are then run through the tree to estimate its quality [30]. This procedure is repeated to generate a large number (hundreds or thousands) of random trees.
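The bagging and out-of-bag procedure can be sketched with scikit-learn as follows; the feature matrix and labels are synthetic stand-ins for the graph-based features and backbone labels used in this work, and the parameter choices are illustrative.

```python
# Sketch of the bootstrapping / out-of-bag procedure described above, using scikit-learn.
# X stands in for per-fracture graph features, y for membership in the flow backbone.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                 # stand-in for per-node graph features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in for backbone membership

forest = RandomForestClassifier(
    n_estimators=500,        # hundreds of bootstrapped trees
    oob_score=True,          # evaluate each tree on its out-of-bag samples
    class_weight="balanced", # backbone fractures are typically a minority class
    random_state=0,
)
forest.fit(X, y)
print("out-of-bag accuracy:", forest.oob_score_)
votes = forest.predict_proba(X[:5])[:, 1]      # fraction of trees voting "backbone"
```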

To classify a test data point, each tree “votes” for a result. This provides not only a predicted classification, determined by majority rule (the category with the most votes is assigned to the new observation), but also a measure of certainty, determined by the fraction of votes in favor. The use of bootstrapping effectively augments the data, allowing random forest to perform well using fewer features than other methods. The idea of random decision forests originated with T. Ho in 1995, who found that forests of trees partitioned with hyperplanes can have increased accuracy under certain conditions [17]. In a later work [18], Ho determined that other splitting methods, under appropriate constraints, yielded similar results.

Additionally, random forest provides an estimate of how important each individual feature is for the class assignment. This is calculated by permuting the feature’s values, generating new trees, and measuring the “out-of-bag” classification errors on the new trees. If the feature is important for classification, these permutations will generate many errors. If the feature is not important, they will hardly affect the performance of the trees.
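The sketch below estimates permutation-based feature importance in the spirit of the procedure above, using scikit-learn's permutation_importance on held-out data rather than on the out-of-bag samples; the feature names are illustrative only.

```python
# Sketch of a permutation-based feature-importance estimate: permute one feature at a
# time on held-out data and measure how much accuracy degrades. Feature names are
# illustrative placeholders for graph centrality measures.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                 # stand-in for per-node graph features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in for backbone membership
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
result = permutation_importance(forest, X_te, y_te, n_repeats=20, random_state=0)
names = ["degree", "betweenness", "closeness", "distance-to-sink"]
for name, drop in zip(names, result.importances_mean):
    print(f"{name:>20s}: mean accuracy drop {drop:.3f}")
```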

2.2 Support vector machines

Support vector machines (SVM) use a maximal margin classifier to perform binary classification. Given training data described by p features, the method identifies boundary limits for each class in the p-dimensional feature space. These boundary limits, which are (p − 1)-dimensional hyperplanes, are known as local classifiers, and the distance between the local classifiers is called the margin. SVM attempts to maximize this margin, making the data as separable as possible, and defines the classifier as a hyperplane in the middle that separates the data into two groups. The data points on the boundaries are called support vectors, since they “support” the limits and define the shape of the maximal margin classifier.

Formally, a support vector machine attempts to construct a hyperplane,

$$ \langle \mathbf{a}, \mathbf{x}_{i} \rangle + b = 0~, $$
(3)

that partitions data points x_i into disjoint sets, for i ∈ {1,…,n}. To define the hyperplane we must determine the unknown coefficients a = (a_1,…,a_p) and b. SVM seeks to determine a hyperplane that separates the two classes, denoted A and B, and leaves the largest margin.

Let 〈a, x_i〉 + b = ±1 be the normalized equations for the boundaries of the hyperplane margin. Data points x_i on either side of the margin will lie in either A or B depending upon whether they satisfy either

$$ \langle \mathbf{a}, \mathbf{x}_{i} \rangle + b > 1 \qquad \text{or} \qquad \langle \mathbf{a}, \mathbf{x}_{i} \rangle + b < -1~. $$
(4)

Define an indicator function S(x) by

$$ S(\mathbf{x}) =\left\{\begin{array}{l} 1 \qquad \mathbf{x} \in A\\ -1 \qquad \mathbf{x} \in B \end{array}\right. $$
(5)

and set S_i = S(x_i). Then we can combine (4) into a single inequality, (〈a, x_i〉 + b)S_i ≥ 1 for all x_i, with equality holding for the support vectors, the points nearest the margin.

In most cases the sets A and B may only be close to linearly separable. To account for this possibility we introduce slack variables ξ_i ≥ 0,

$$ (\langle \mathbf{a}, \mathbf{x}_{i} \rangle + b) S_{i} \ge 1 - \xi_{i} \qquad \forall~i~, $$
(6)

that allow points x_i with ξ_i > 0 to be incorrectly classified, with the ξ_i used as penalty terms. The distances ρ_1 and ρ_2 from the coordinate origin to the margin boundaries are given by ρ_1 = −(b + 1)/∥a∥ and ρ_2 = −(b − 1)/∥a∥, where ∥a∥ is the Euclidean length of a. The margin width is the distance between these two boundaries, d = ρ_2 − ρ_1 = 2/∥a∥. We seek to maximize d or, equivalently, minimize ∥a∥, over the set of training data (y_1,…,y_m), subject to the linear constraints (6). Typically one works with the Lagrangian formulation of the constrained optimization problem. The dual Lagrangian problem is to minimize the objective function

$$ L(\mathbf{a}, b, \boldsymbol{\xi}, \boldsymbol{\gamma}, \boldsymbol{\delta}) = \frac{1}{2}\|\mathbf{a}\|^{2} - \sum\limits_{j=1}^{m} \gamma_{j}\left[(\langle \mathbf{a}, \mathbf{y}_{j} \rangle + b) S_{j} - 1 + \xi_{j}\right] + C \sum\limits_{j=1}^{m} \xi_{j} - \sum\limits_{j=1}^{m} \delta_{j} \xi_{j}~ $$
(7)

subject to the non-negativity constraints on the Lagrange multipliers γ_j and δ_j and on the slack variables ξ_j ≥ 0. The penalty, or margin, parameter C is a regularization term that controls how many points are allowed to be mislabeled in the SVM hyperplane construction; smaller values of C allow more points to be mislabeled. A solution to this optimization problem defines a and b. The SVM classifier is then given by the sign of the decision function,

$$ \chi(\mathbf{x}_{i}) = \text{sgn} (\langle \mathbf{a}, \mathbf{x}_{i} \rangle + b). $$
(8)
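A minimal sketch of Eqs. (3)–(8) with scikit-learn: a soft-margin linear SVM is fit to synthetic two-class data, and the predicted class is checked against the sign of the decision function 〈a, x〉 + b. The data, labels, and parameter values are illustrative.

```python
# Minimal sketch of the soft-margin linear SVM of Eqs. (3)-(8) on synthetic data,
# verifying that the predicted class equals sgn(<a, x> + b).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
A_pts = rng.normal(loc=+1.5, size=(100, 2))      # class +1
B_pts = rng.normal(loc=-1.5, size=(100, 2))      # class -1
X = np.vstack([A_pts, B_pts])
S = np.hstack([np.ones(100), -np.ones(100)])     # indicator labels S_i of Eq. (5)

svm = SVC(kernel="linear", C=1.0).fit(X, S)      # C: penalty on margin violations (Eq. 7)
a, b = svm.coef_[0], svm.intercept_[0]           # hyperplane coefficients a and offset b
x_new = np.array([[0.3, -0.2]])
assert np.sign(x_new @ a + b) == svm.predict(x_new)[0]   # decision rule of Eq. (8)
print("number of support vectors:", svm.support_vectors_.shape[0])
```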

SVM falls into the category of kernel methods, a theoretically powerful and computationally efficient means of generalizing linear classifiers to nonlinear ones. For instance, on a two-dimensional surface (p = 2 features), instead of the line described by Eq. 3, we can choose a polynomial curve or a radial loop. Equation 8 may then be written in the form

$$ \chi(\mathbf{x}_{i}) = \text{sgn}\left( \alpha_{0} + {\sum}_{j = 1}^{m}\alpha_{j} K(\mathbf{x}_{i},\mathbf{y}_{j})\right), $$
(9)

where K(x_i, y_j) is an appropriate kernel function of x_i and the training point y_j. Radial kernels often provide the best classification performance, but at higher computational costs.
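To illustrate Eq. (9), the sketch below fits the same classifier with a radial (RBF) kernel on synthetic concentric rings, a configuration that no linear hyperplane can separate; the kernel and parameter choices are illustrative.

```python
# Sketch of Eq. (9) with a radial (RBF) kernel: the decision boundary is a nonlinear
# contour built from kernel evaluations against the support vectors. The two-ring
# data set is synthetic and not linearly separable.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 400)
r = np.hstack([rng.normal(1.0, 0.1, 200), rng.normal(3.0, 0.1, 200)])  # two radii
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.hstack([np.ones(200), -np.ones(200)])

rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("training accuracy (RBF):   ", rbf.score(X, y))
print("training accuracy (linear):", SVC(kernel="linear", C=1.0).fit(X, y).score(X, y))
```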


Cite this article

Valera, M., Guo, Z., Kelly, P. et al. Machine learning for graph-based representations of three-dimensional discrete fracture networks. Comput Geosci 22, 695–710 (2018). https://doi.org/10.1007/s10596-018-9720-1
