Benefits of maximum likelihood estimators for fracture attribute analysis: Implications for permeability and up-scaling

The success of any predictive model is largely dependent on the accuracy with which its parameters are known. When characterising fracture networks in rocks, one of the main issues is accurately scaling the parameters governing the distribution of fracture attributes. Optimal characterisation and analysis of fracture lengths and apertures are fundamental to estimate bulk permeability and therefore ﬂ uid ﬂ ow, especially for rocks with low primary porosity where most of the ﬂ ow takes place within fractures. We collected outcrop data from a fractured upper Miocene biosiliceous mudstone formation (California, USA), which exhibits seepage of bitumen-rich ﬂ uids through the fractures. The dataset was analysed using Maximum Likelihood Estimators to extract the underlying scaling parameters, and we found a lognormal distribution to be the best representative statistic for both fracture lengths and apertures in the study area. By applying Maximum Likelihood Estimators on outcrop fracture data, we generate fracture network models with the same statistical attributes to the ones observed on outcrop, from which we can achieve more robust predictions of bulk permeability.


Introduction
The upper part of the Earth's crust is highly fractured at all scales, ranging from microcracks to large-scale faults and joints, from tens of micrometres to hundreds of metres. Fractures and joints are well known for their effects on the mechanical properties of rocks (e.g. bulk elastic constants and shear strength) but they also control the transport properties, in particular the permeability of crystalline and tight sedimentary rocks, such as shales and mudstones (Brace, 1980). Fracture networks can make major contributions to fluid flow in a number of contexts: (1) hydrocarbon production from fractured reservoirs; (2) reservoir stimulation by hydrofracturing; (3) geothermal energy extraction; (4) hazardous waste isolation; (5) aquifer exploitation for water supply; (6) mining and mineralisation processes; (7) deeper Earth systems, such as earthquakes and ocean floor hydrothermal venting (Berkowitz, 2002;Brown, 1987). Consequently, considerable effort has been placed on characterisation (Dershowitz and Einstein, 1988;Aydin, 2000) and modelling (e.g. Brown and Bruhn, 1998;Oda, 1982Oda, , 1986Oda, , 1988 of fractures and fracture networks. A wide variety of different techniques and approaches has been applied to the problem of modelling fracture networks (e.g. Berkowitz, 2002;Long and Billaux, 1987;Taylor et al., 1999). Analyses are usually derived from measurements of fractures at outcrops, core samples, or borehole data. Laboratory measurements are often limited to analysing single fractures, which can impede the extrapolation of data at larger scales. On the other hand, borehole data can preclude complete and detailed mapping and analysis, as they rely largely on extrapolation and subjective considerations, e.g. interpretation of geophysical data and image logging (Berkowitz, 2002). Outcrop observations, therefore, can provide direct and less subjective geological evidence. However, issues remain: in particular, those related to the scaling of fracture attributes observed on outcrops into reservoir scale permeability and fracture network models.
The main objective of this work is to demonstrate a more accurate statistical approach to increase utility, meaningfulness, and reliability of data from fractured outcrop analogues. Finding the best statistical distribution governing a dataset is of critical importance when predicting the tendency of fracture attributes towards small and large scales. The most common way for statistical inference on fracture attributes relies upon calculation of the slope of a regression line, usually for data plotted on a bilogarithmic plot. This technique is applicable with the assumption that for each fixed x (e.g. fracture lengths) the values lying on the y-axis (the dependent variable) are normally distributed and with a variance s 2 (not depending on x) (Kreyszig, 1970). However, this is not always the case, in particular, when data have a powerlaw or exponential behaviour (Newman, 2005). Therefore, since we do not know a priori the underlying distribution of our fracture attributes, we need statistically reliable tools, which for our purposes are the Maximum Likelihood Estimators (MLEs). In fact, the goodness-of-fit of MLEs can be statistically quantified and they are proven to give accurate parameter estimates (Clauset et al., 2007).
The unique fracture network cropping out around Santa Cruz (California, USA) gave us the chance of measuring a recently-active bitumen-bearing fractured top seal, as a natural example to build and test a general model for bulk permeability prediction, and to establish statistical models that can be inferred from measures taken at local places and extrapolated for a larger two-dimensional (2D) domain.
This paper is organised as follows. First, the MLE method is introduced, and we compare its abilities against least-square regression techniques to find the best distribution among the most common statistics observed for fracture attributes, as well as its capacity of accurately estimating the parameter underlying the data set. Second, we present the fracture attribute acquisition method, based on the circular estimator method (Mauldon et al., 2001;Rohrbaugh et al., 2013). Third, we evaluate the statistical distribution of the outcrop fracture attributes e namely mean fracture trace length and fracture aperture e using Maximum Likelihood Estimators. Last, we model the impact of this statistical analysis on up-scaling two-dimensional (2D) fracture networks and on permeability prediction of the fractured rock mass, using Oda's model (Oda, 1985;Suzuki et al., 1998), which allows us to maximise the acquired data (mean fracture trace lengths and apertures).

Statistical tools for fracture attributes
Fractures are most frequently characterised geometrically in terms of fracture trace length, intensity, density and aperture. In particular, these geometrical features are often well described by certain statistical distributions (Bonnet et al., 2001). The statistical distributions used to describe fracture attributes commonly range from the normal distribution (un-skewed) to increasingly skewed distributions, such as log-normal, exponential and power-law .
As suggested by McCaffrey et al. (2003), the simplest qualitative test to decide which of these distributions is appropriate for a given set of data is to rank the values of a fracture attribute in descending order (x-axis) and then plot the attribute against the cumulative frequency (y-axis) on a bi-logarithmic plot (called a population plot) (McCaffrey et al., 2003). Then the resulting function is fitted to the linear form by least-squares regression. The slope to the fit is interpreted as the estimate of the distribution parameter of the related attribute (Fig. 1). The goodness-of-fit is judged based on the value of the coefficient of determination (R 2 ). This number indicates the proportion of the variance in the dependent variable that is predictable from the independent variable (Myung, 2003); the closer it is to 1 the better the fit. Linear regression requires minimal distributional assumptions and is useful for obtaining a descriptive measure for the purpose of summarising observed data, but it has no basis for quantitatively testing the hypothesis that the chosen model is actually the best fit for the observed data (Myung, 2003;Clauset et al., 2007). Unfortunately, distributions can be deceptive. On a bi-logarithmic plot for example, a log-normal distribution resembles a power-law with a different slope (Fig. 1a). Fig. 1bec illustrate the issue, where two small (n ¼ 100) synthetic sets of discrete data (Appendix A for details), a power-law log-normal with known m ¼ 0:3 and s ¼ 2, a power-law with a ¼ 2:5: Visually both the CDFs appear straight on the log-scale used, and can both be interpreted as power-law, with different slopes. (b)e(c) When both distributions are fitted using a linear regression with a power-law model, the values of the R 2 do not help to reveal the real nature of the distribution, giving high results in both cases. Moreover (c), the estimated b a ¼ 1:06 for the real power-law distribution is far from be accurate. NB: the y-axes on these plots are values of cumulative frequency. distribution with known scaling parameter (a ¼ 2:5) and a lognormal distribution with m ¼ 0:3 and s ¼ 2, are plotted on population plots (data of interest on the x-axis and cumulative frequency on the y-axis) and then fitted using linear regression models. Both distributions, in their linear form, can be fitted with a power-law model for many orders of magnitude. This results in high values of R 2 even for the distribution that has been fitted with the wrong model; R 2 values tell us very little about the hypothesis validation.
Moreover, Newman (2005) and Clauset et al. (2007) explained that these procedures generate systematic errors under relatively common conditions, and consequently the results they give cannot be trusted. In fact, the formula for calculating the standard errors and the value for R 2 on the slope of a regression line is correct when the assumptions of linear regression hold (Kreyszig, 1970;Clauset et al., 2007). These include: 1. For each fixed x, the variable y (the dependent variable) is normally distributed with mean (a þ bx) and variance (s 2 ) not depending on x. 2. Independent Gaussian noise in the dependent variable at each value of the independent variable.
When fitting to the logarithm of a Cumulative Density Function (CDF), as in the analysis of power-law and exponential data in a population plot, the noise, although independent, is not Gaussian. The noise in the frequency estimates themselves is Gaussian, but the noise in their logarithm is not (Clauset et al., 2007). Thus, the formula for the error is inapplicable in this case. Furthermore, Newman (2005) showed that for power-law distributions whose scaling parameter range is a 3 (which includes most of the known power-law distributions for natural phenomena) there is no meaningful variance. Hence, the first of the assumption for making use of linear regression falls. Consequently, the calculation of the fraction R 2 of the variance accounted for the fitted line should not be trusted in these cases.
In addition, the cumulative frequency axis (such as shown in Fig. 1) must take the value 1 at x min if the probability distribution above the minimum value is properly normalised. Ordinary linear regression, however, does not incorporate such constraints and then, in general, the regression line does not respect them (Clauset et al., 2007). Therefore, more appropriate statistical tools are needed; following Newman (2005) and Clauset et al. (2007) suggestion, the statistical distribution of the data collected for this work has been estimated via Maximum Likelihood Estimators (MLEs).
It must be stressed that finding the best statistical distribution governing a set of data is of critical importance when we aim to predict and model the tendency of fracture attributes towards the small and large scale, especially when these attributes, collected from outcrops, often suffer censoring and truncation biases, and restrictions due to small sample sizes.

Maximum likelihood estimators for fracture attributes
The aim of our data analysis is to identify the population that is most likely to have generated the sample. To demonstrate the validity of the MLEs against linear regression, we test their ability to extract the statistical parameters from the same synthetic data set described in the previous section (Fig. 1). This has been done using a suite of custom MATLAB™ functions, integrated into FracPaQ (Healy et al., 2016), which finds the best theoretical distribution fitting the fracture attributes chosen among power-law, log-normal and exponential distributions (Table 1).
After performing the MLE (Fig. 2), the estimates for the parameters governing the power-law function are found to be b a ¼ 2:6 and b x min ¼ 15:2 (see Appendix A for details); conversely (Fig. 1), the linear regression method estimated an b a ¼ 1:06, without giving any estimates of the minimum value of the observed data set ( Table 2).
To constrain the goodness-of-fit obtained via the Maximum Likelihood Estimator we use the Kolmogorov-Smirnoff (K-S) test, a standard approach in statistics. This test is based on the measurement of the 'distance' between the observed distribution and the hypothesised model (Clauset et al., 2007), which is simply the maximum distance between the CDFs of the data and the fitted model: where SðxÞ is the CDF of the data and PðxÞ is the CDF for the model. For our purposes, we use the K-S statistic as follows: 1. We fit our observed data to a chosen model (log-normal, exponential, or power-law), and calculate the K-S test for this fit. 2. Then, we generate a large number (2,500) of synthetic distributions that follow one of the chosen distributions; we also set the relative distribution parameters equal to those of the distribution that best fits the observed data set. 3. We fit each synthetic data set individually to its own model and calculate the K-S statistic for each one relative to the selected model. 4. Finally, we count the fraction, expressed as a percentage, when the resulting statistic is larger than the value for the empirical data. This is the resulting p-value.
It is crucial to note that for each of the synthetic data set the K-S statistic is computed relative to the best-fit model for that specific set of data, not to the original distribution. In this way, we make sure that the same calculation is performed for each synthetic data set, as for the real data set.
After 2,500 tests, only 2% of the tests reject the null hypothesis, in other words 2450 tests were successful (Fig. 2a). This implies that MLE is successful in correctly estimating the required parameters of the synthetic data distributed as a power-law; therefore, we can be Table 1 Formulations for probability density functions, likelihood functions and parameter functions of the statistical distributions used in this work.

Distribution
Probability Density Function (PDF) Likelihood Function Parameters Where x i denotes the sample mean b l ¼ 1 x more confident of having found the best distribution underlying our data. We also tested whether the MLE is able to correctly discern deceptive distributions. Hence, we fitted a power-law model to the same synthetic log-normal data set described in the previous section. In this case, the K-S test returned a probability of the 59% that our synthetic data are drawn from a power-law distribution; such a result allows us to comfortably discard the hypothesis that the analysed data set is distributed as a power-law (Fig. 2b).
We emphasise that in Figs. 1 and 2 different quantities are plotted on the y-axes, hence a direct comparison of the angular coefficient e the slope e is not appropriate.
Achieving an optimal and robust estimate for the parameters governing a statistical distribution underlying fracture attribute measurements guarantees that when we use them to build a model, the latter will be a closer approximation to reality.
Similar results have been obtained by Clark et al. (1999) when testing the abilities of MLE versus least-squares linear regression to estimate the parameter governing the power-law distribution of fault trace lengths. The results of their simulations showed that Least-square Estimators (LSEs) are sensibly biased, although this bias decreases as the sample size increases, and that least-squares regression underestimates the true value of the scaling parameter. They also confirmed that MLE results are unbiased and that the 95% confidence intervals in fact cover the true statistical parameter about 95% of the time. It has also been verified (Clark and Cox, 1996) that Least-squares Estimators become less precise as the distribution becomes increasingly skewed. Similarly, Pickering et al. (1995), analysing the issues related to sampling power-law distributions of earthquakes and fault populations, concluded that MLE offers the best method as an estimator.

Fracture attribute acquisition: the circular estimator method
We collected fracture attributes using the circular estimator method, a combination of circular scan lines (simply represented by circles drawn on the rock surface) and windows (the region enclosed in the circular scan line). This is a time-saving sampling tool, providing efficient estimates for fracture trace density, intensity and mean trace length that eliminates orientation bias, censoring, and length bias with respect to measurement in a plane (Mauldon et al., 2001). It is called an estimator because instead of directly sampling individual fractures and measuring their characteristic, parameters are estimated using an underlying statistical model (Lyman, 2003;Zeeb et al., 2013).
Fracture attributes are obtained by simply counting the number (n) of fracture traces intersecting the circumference, and counting the number (m) of fracture traces terminating within the circle interior ( Fig. 3d) (Rohrbaugh et al., 2013). For two-dimensional fracture patterns, such as those analysed in this work, we define: (1) Estimated Density (r) as the number of fracture per unit area; (2) Estimated Size (l) as the mean fracture length; (3) Estimated Intensity (i) as fracture length per unit area. However, this method does not provide information on parameters such as fracture orientation and aperture, which are fundamental inputs in the calculation for the permeability tensor. Hence, we combined measures of orientations and apertures of each fracture inside the circular windows. A representative aperture for each window is obtained averaging all the measured apertures within the window. Rohrbaugh et al. (2013) found that 30 is the minimum number of m-points necessary to estimate the mean trace length accurately. In our study, we have always reached the required minimum number of m-points.

Site location and identification of the fracture sets
We collected fracture attributes in a highly fractured Upper Miocene biosiliceous mudstone (Santa Cruz Mudstone) cropping out along the coast to the north of Santa Cruz, California. In the study area, the entire outcrop of the Santa Cruz Mudstone is characterised by a pervasive dilatant fracture system (Fig. 3bec) (Clark, 1981;Boehm and Moore, 2002;Scott et al., 2009). Fractures form a joint set striking predominantly towards the northwest (Fig. 3b), having all the features of tensile Mode I fractures, such as  Fig. 1, using Maximum Likelihood Estimators: (a) Fit to a power-law distribution with a MLE power-law model; on the y-axes are plotted probability values, Pr(x ! x), indicating the probability that a value 'x' will take on a value greater or equal to x. The box on the topright of the plots shows the resulting goodness-of-fit obtained after performing 2,500 Kolmogorov-Smirnoff (KeS) tests. (b) Fit to a log-normal distribution with a MLE power-law model. The resulting K-S test, performed 2,500, returned a low value of probability (59%), so that almost 50% of the K-S tests failed the null hypothesis.

Table 2
Estimates of the scaling parameter a using two estimator methods (MLE and LSE) for the synthetic data set shown in Fig. 2. Accurate estimates are in italic.

Method
Scaling Parameter Minimum Value Known Parameter none plumose structures, hackles and hackle fringes (en-echelon fractures). The latter structures are interpreted as the results of stress oscillations and local heterogeneities at the tip of the major fractures, which caused bifurcations and off-plane micro-cracks (fringes) (Fossen, 2010). Many fractures in the area show infills and stains of bitumen (Fig. 3c). Therefore, dilatant fractures in the area represent major pathways for hydrocarbon migration, from the organic-rich mudstone source rock of the Monterey Formation positioned stratigraphically below the Santa Cruz Mudstone units to the surface (Phillips, 1990).
Two sampling sites were selected for collecting fracture attribute data because they represent two different structural settings. 4 Mile Beach (Fig. 3a) is representative of the average fracturing condition of the area, while Panther Beach (Fig. 3a) shows a more showing that the fracture network forms a joint set striking predominantly towards the NW (the contouring method is 1% of the net area, with intervals set at 2%) (c) Close-up photograph showing a bitumen-filling fracture (locality: Panther Beach). (d) Example of the circular window method (Mauldon et al., 2001) applied to collect fracture attribute data. The squares indicate the 'n-points', where a fracture crosses the edges of the sampling window, while the triangles indicate the 'm-points' where a fracture terminates inside the sampling window. complex scenario characterised by a fault zone. These differences are well captured in the plots of estimated fracture intensity and estimated fracture density versus distance along transects (Fig. 4) for the two localities.
Fracture attribute acquisitions, based on the mentioned circular estimator method (Mauldon et al., 2001), have been collected every metre along two scan line transects of 22 m at Panther Beach, and 23 m at 4 Mile Beach, respectively, for a total of 45 sample sites (Fig. 3aed). A circle of known radius (14.2 cm) was placed onto the bedding surface and then used to count the number of fracture intersections with the edge of the window (n-points) and the number of fracture terminations inside the window (m-points) (Watkins et al., 2015). Based on the ratio between the n-and mpoints, we calculated for each window an estimated mean fracture length (l), estimated fracture intensity (i), and estimated fracture density (r). This method has been combined with acquisitions of fracture apertures and orientations of each fracture inside the windows.
From the measured orientations, two or possibly three, fracture sets are observed. These are defined by plotting the poles to the fracture orientations on lower-hemisphere Schmidt's equal area net (Fig. 3b), from which it can also be shown that most of the fractures have steep dips (sub-vertical). Therefore, the 2D modelling approach employed in this work is a reasonable first approximation.

Fracture trace lengths
We infer that the estimations of mean fracture lengths are reliable, as we always reached the minimum value of 30 m-points (Rohrbaugh et al., 2013) per scan line circle. For the two location sites, values of the scaling parameters for the mean trace lengths have been computed using the suite of MATLAB™ functions, integrated into FracPaQ (Healy et al., 2016) In the case of 4 Mile Beach (Fig. 5), the fitting of distributions using the MLEs reveals that the log-normal distribution is the most likely distribution with a probability of 99.5% for fracture trace lengths, while the power-law is possible albeit less likely (82.4% of probability; with an estimated scaling parameter b a ¼ 2:4 and b x min ¼ 0:07 m). The exponential distribution is the least likely distribution underlying this data set (3.5% of probability). The estimated parameters for the log-normal distribution are b m ¼ À2:01 and b s ¼ 0:41. These values, called the scaling parameters, are the logarithmic mean and logarithmic standard deviation, respectively. The estimated log-mean corresponds to an arithmetic mean length of 0.14 m. The latter value is obtained from the formula of the arithmetic mean of a log-normally distributed variable, mean ¼ e mþ 1 2 s 2 . A similar behaviour has been found for the data measured in Panther Beach (Fig. 6), where the log-normal distribution is the most likely distribution for the mean fracture trace lengths, with 99.4% of probability. In this case, the estimated parameters are b m ¼ À1:62 and b s ¼ 0:37, corresponding to 0.2 m for the arithmetic mean length. Also for Panther Beach, a probability for a power-law distribution still exists (88.8%; with an estimated scaling parameter b a ¼ 2:7 and b x min ¼ 0:12 m), although is less likely.

Fracture apertures
Values of fracture apertures have been collected from fractures inside the circular window. The measures have been taken using a ruler; therefore, the precision of these values is in the order of 1 mm. The MLE method reveals that the best distribution of fracture apertures for both the studied localities is a log-normal distribution When fitting the aperture distributions using the least-squares The log-normal distribution is the best fit for both length and aperture parameter. The estimated log-means are À2.013 and À5.62, corresponding to a mean length of 0.14 m and a mean aperture of 0.004 m, respectively. Note that the y-axes on these plots are probability values, which indicate the probability that a value 'x' will take on a value greater or equal to x.
linear regression, the estimated arithmetic mean apertures are 0.024 m and 1.2 Â 10^À5 m, for the fracture seen in 4 Mile Beach and Panther Beach, respectively. The resulting estimated parameters are evidently far from reality and they also lack consistency; in one case (4 Mile Beach) the linear regression gives a relative overestimation of the mean of the apertures (2.4 cm), while for Panther Beach the method returns a mean value probably too small (1.2 mm) (Fig. 7).

Fracture network simulations and permeability predictions
If we assume that the analysed rock mass has low primary (matrix) porosity, then most, if not all, the fluid flow takes places entirely within the fractures. Real fractures have complex surfaces and irregular apertures (e.g. natural fractures can be opened in their central part but sealed at tips) with variable degree of cementation; Fig. 6. Plot of the measured ('observed data') mean trace lengths (on the left) and apertures (on the right) relative to Panther Beach area. The same sample is plotted according to (a) Lognormal, (b) Power-law, and (e) Truncated Power-law, respectively. The broken black lines are the estimated (theoretical) CDF computed for the different distribution via the Maximum Likelihood. In the boxes on the bottom left of each plot is indicated the probability (tested through 2500 K-S tests) that the observed data are distributed according to the relative theoretical distribution law. The lognormal distribution results the best fit for both length and aperture parameter. The estimated log-means are À1.62 and À6.89, corresponding to a mean length of 0.2 m and a mean aperture of 0.001 m, respectively. Note that the y-axes on these plots are probability values, which indicate the probability that a value 'x' will take on a value greater or equal to x. however, for the purposes of this study, the geometric description is simplified. The assumptions we made are that: (1) fractures are well represented by a parallel plate model (i.e. individual fractures lie in a single plane and have a constant hydraulic aperture (Long et al., 1982)); (2) the fractured rock mass is assumed to be, and therefore treated as, a homogenous, anisotropic porous medium; and (3) permeability is approximated by a 2nd rank tensor as formulated by Oda (1985; see also Brown and Bruhn, 1998;Suzuki et al., 1998).

Simulation of the fractured rock mass
Using the statistical parameters estimated via the MLEs for the best statistical distribution representing the fracture attributes collected on outcrops (in this case a log-normal distribution, Table 1), we can create 2D fracture maps, called fracture meshes, made of a network which is an optimum replica of the fracture networks observed at outcrop.
Compared with maps of natural rock exposures, which can be subjected to incomplete observations and measurements at all relevant scales, fracture meshes can be configured with variable sizes and resolutions. These numerical models are useful to simulate the influences of the distribution of fractures in the spatial domain on statistically variable phenomena (e.g. permeability) and can provide valuable information for understanding possible interactions between fractures (e.g. connectivity), which would otherwise be difficult to study analytically.
The 2D mesh is a sample which, admittedly, is not identical to the real fractured rock mass, but will approach it if the number of samples approaches infinity, according to the principle of the Monte Carlo (MC) method (Zhang et al., 2004).
The steps needed before running the simulation are: (1) establishing the simulated space, (2) determining the fracture position, (3) simulating the fracture sizes and attitudes, (4) determining the fracture density. Fracture centres have been randomly positioned, while their orientation has been assigned in a manner proportional to the number of fractures with that specific orientation observed in the field (Fig. 3b). The length of a fracture is simulated independently for each object in the mesh with a Monte Carlo process, which samples from a log-normal distribution (Table 1) produced using as inputs the scaling parameters (m and s) estimated through MLE for 4 Mile Beach (Section 3.2). Fracture abundance is organised according to the mean density value obtained from the circular window survey on the outcrop (Fig. 4), 277 fractures/m 2 . The population of fractures is enclosed in a two-dimensional rectangular space, which for congruity to the sizes of the scan-lines used on the outcrops, has its longest side set to 21 m. An example of one of the 1000 MC simulations carried out using these specifications is illustrated in Fig. 8a. We have produced a smaller (7 Â 4 m) 2D mesh, keeping constant all the described specifications, to aid the visualisation of the fracture spatial organisation into the simulated fracture network (Fig. 8a-1).
For comparison, we have also generated a model of the fracture network using the scaling parameters obtained from the leastsquares linear regression, while all the other characteristics (density, orientations, positioning, simulated area) have been kept identical (Fig. 9c). This change in scaling parameters (Fig. 9c-1) introduces in the resulting 2D mesh longer fractures on average, but with a reduced range of lengths compared with the simulated network obtained using the MLEs parameters (confront the respective histograms on Figs. 8 and 9).

Permeability tensor of the fracture network
Prediction of the bulk anisotropic permeability as 2nd rank tensor (Oda, 1985) for the modelled 2D meshes have been carried out inputting the simulated fracture network into FracPaQ (Healy et al., 2016;Appendix B), which bases the permeability estimation on the crack tensor equation of Suzuki et al. (1998): This formulation includes the density (r), the mean of the squared length (r 2 ), the mean of the cubed aperture (t 3 ), and the orientation cosines matrix (N ij ) of the fractures. From equation (4.1), Suzuki et al. (1998) derived the permeability tensor as: where l is a pre-multiplying dimensionless coefficient satisfying the inequality 0 l 1 = 12; d ij is the Kronecker delta; and P kk ¼ P 11 þ P 22 þ P 33 . It follows that the key parameters in this expression of the permeability are the scaling parameters of the statistical distribution (particularly the mean) of fracture trace lengths and apertures, and their orientations.
For the 1000 2D meshes generated using the MLEs scaling parameters (Fig. 8), the average anisotropic permeability tensor results as follows: Mesh 21x12 k ii ¼ f1:5; 1:33g*10 À9 m 2 (4.3) The relative directional permeability ellipse in the direction of flow (Long et al., 1982;Healy et al., 2016) is shown in Fig. 8c and is obtained using a constant fracture aperture equal to the mean aperture estimated through the MLE for a log-normal distribution (¼ 0.004 m).
We have also attempted to compute anisotropic permeability related to the fracture network built using the least-squares linear regression estimates (Fig. 7). When using the parameters estimated for the fracture attribute distributions of Panther Beach, the permeability values (Fig. 9) result in unlikely estimates (k ii y10 À16 m 2 ), even for impervious fracture networks (Bear, 1972).

Mean of squares or squared means
The use of fracture network observables from outcrops as analogues for subsurface fracture system has several advantages, because key fracture attributes such as spatial arrangement and length can be effectively measured only on outcrops (Gale et al., 2014). Using the circular window estimator (Mauldon et al., 2001), a time-saving sampling tool, it has been possible to efficiently collect these fracture attributes. However, when using the mean fracture trace lengths obtained through the circular window estimator in the calculation of the permeability tensor, some issues can arise. In equation (4.1), the term r 2 indicates the second moments of the recorded fracture lengths; in practice, equation (4.1) requires us to take the square of each value of the trace length and then compute their mean. In our study, we have not measured single fracture trace lengths, instead we have estimated the means of the fracture trace length for each window drawn along a reference line. Mauldon et al. (2001) stated that "Circular scanlines e and indeed any other linear or area measurement devices e sample an exposed rock surface, which itself is a sample of the fracture trace population". Therefore, fractures counted in each window are small sub-samples of the total sample e corresponding to the outcrop exposure e of the real population; the mean value of each single window is not a mean representative of the whole population, and in theory can be treated as a single value in the equation for permeability. Nonetheless, we tested the validity of this statement comparing the resulting second moment values obtained using two different subsampling methods from a synthetic array of fracture trace lengths: (a) classical linear scan-line, which contemplates measuring the length of each fracture crossing the sampling line; and (b) circular window method.
This process is implemented using Monte Carlo simulations (100 iterations) of an array of synthetic trace lengths representing the whole population; each iteration depends on three variables: 1 The first variable is x TRUE , which includes the second moments calculated over the whole number of synthetic fracture lengths. The length value can be distributed accordingly to four statistics e random, log-normal, power-law, or exponential. This variable (x TRUE ) is used as the true measure of the second moment to verify possible discrepancies in the sub-sampling techniques. 2 The second variable, x TL , contains the second moments computed using the linear scan-line, so that it includes subsamples from the whole fracture set. The number of trace lengths theoretically scanned and included in the computation of x TL varies, including increasing proportions of the sub-sample with respect to the whole set e namely 1%, 5%, 10%, 25%, 50%, and 75% of the entire population, so that x TL < x TRUE . 3 The third variable, x CMT , accounts for the circular window method and is obtained by creating n sub-samples that randomly select a small number of length values from the population, with the condition that for each i ¼ 1, …n, Then the values of each x CMT i are averaged to obtain a total of n mean values corresponding to the mean values obtained from the circular window estimator. All sub-samples contain at least 30 length values to ensure that all mean values are significant.
For simplicity, we have not taken into account possible orientation biases; moreover, in our implementation both sub-sampling techniques capture the same amount of fractures (x CMT ¼ x TL for all iterations).
Results for this test (Fig. 10) show that the true second moments are always underestimated when using the mean trace lengths, as in the case of the circular window method. At the same time, the linear scan-line method, in general, better approaches the true values of second moments, particularly as the sampling sizes increase. However, there are also many cases where this technique overestimates the true value and it is not possible to predict when the scan-line method will over-or under-estimate the true value. This is clearly reflected when plotting the average errors e as absolute values e between the model (i.e. if we were able to measure all the fracture in a specific area) and the used sub-sampling techniques e linear scan-line and circular window, respectively (Fig. 11). Over the range of sub-sampling percentages, the linear scan-line technique shows a greater variability with bigger error for small sub-samples, while the average errors for the circular window method are constant for all percentages of sub-sampling.
Considering that in the real case scenario a geologist does not know a priori which is the relative percentage of fracture population forming a sub-sample the circular window strategy applied in this study is the only one which is invariant relative to the uncertainties related to the sub-sampling proportions.

Truncation, censoring
Directly linked to the previous discussion there follows another limitation when dealing with data collected directly from outcrops; the sparseness of natural data, in turn due to the intrinsic size limits of outcrops. These common issues result in attribute datasets that rarely extend over the range of two orders of magnitude. Such limited intervals of values may produce inaccurate representations of the properties of a fracture network, especially if there is an interest in building mechanical or flow models related to it. Although previous efforts have been made for treating the data statistically to extrapolate and extend the range of fracture attributes, until now data fitting has mostly relied on graphical tools or regression methods, rather than using formal mathematical tools, yielding non-accurate parameter estimations.
In this work, we have shown that it is possible to overcome this limitation by applying Maximum Likelihood Estimators to outcrop fracture datasets. MLEs are shown to be more powerful and reliable tools, because they suffer neither subjective biases, as is the case of graphical tools, nor biases related to the precision of the parameter estimation, as verified for least-squares linear regression method (Clark et al., 1999). The application of MLEs can have important consequences especially when we aim to predict the tendency of fracture attributes towards smaller and larger scales than the observed, in order to build better e i.e. more consistent e models Fig. 10. Plots showing the results of second moments calculated using as inputs (1) synthetic fracture trace-length single values e representing an ideal scan-line (labeled as 'TraceLength'), and (2) mean trace-lengths calculated from small sub-sets (¼ circular windows) of the length array (labeled as 'CMeanTrace'). This is done through a Monte Carlo simulation (100 repetitions), and sub-sampling consequently increasing proportion of an array of synthetic trace lengths, representing the true population (whose second moments are labeled as 'Model'). The plots from (a) to (d) show the result for increasing sub-sampling percentage, from 5% up to 75% of the original population, respectively. In particular, this example shows the simulation using log-normally distributed fracture trace lengths. based on outcrop observations. As discussed by Gale et al. (2014), the importance of inferring and correctly extrapolating the presence of microfracture networks (fractures which can extend below the resolution of 30 mm and which are unlikely to be captured even in thin sections) can have crucial repercussions; they can influence values of seismic anisotropy and also be responsible for changes in fluid production from low-permeability rock masses, such as shale (Gale et al., 2014). As shown in Figs. 1e2 and Table 2, both the precision and the reliability of MLEs are undoubtedly higher compared to LSEs.
In the cases presented, the outcropping network of fractures shows a log-normal distribution of trace lengths and apertures (Figs. 5 and 6). In particular, trace lengths are distributed about a central value that lies at the lower limit of the distribution; in other words, we have observed high frequencies for short fractures and relatively few long fractures. This implies that the fracture system in the study area is scale-limited (Odling et al., 1999). Such systems are typical for rock sequences with strongly developed layers, as for the biosiliceous mudstones cropping out in the studied area.
The distribution of fracture apertures also tends to be lognormal, which is consistent with other studies (Oda, 1985;Bonnet et al., 2001 and references therein). If a log-normal distribution of fracture apertures exists, then it can be related to mechanical and stratigraphic layering which influence the propagation and thickening of fractures.
When fitting a power-law function to the dataset, our MAT-LAB™ functions (Healy et al., 2016) can explicitly take into account the existence of a lower limit, x min , that represents the minimum value until which a power-law behaviour applies. Specifically, for fracture attribute analysis, this minimum value should be chosen as the threshold above which the data might have been affected by censoring; in such a case, only the 'tail' of the distribution is analysed (truncated power-law). If we apply a threshold to the fracture length data, the probability of an underlying power-law distribution increases to 94.6% (with estimated parameters b a ¼ 3:0 and b x min ¼ 0:10 m) and 92.2% (with estimated parameters b a ¼ 2:7 and b x min ¼ 0:13 m), respectively for 4 Mile Beach and Panther Beach area, when 20% of the small lengths measures are excluded (Figs. 5e and 6e). However, for our case study, we are rather confident in using the whole dataset as it is based on the mean fracture length as estimated with the Circular Estimator Method (Rohrbaugh et al., 2013), and not on measurements of single fracture lengths. In the case of apertures, we observe that if an upper cut-off is applied (in this case 20%) excluding some of the smaller values, the probability of a power-law distribution is increased to 99.3% for 4 Mile Beach aperture dataset (Fig. 5f), and to 94.5% for the Panther Beach apertures (Fig. 6f); respectively with b a ¼ 3:2 and b x min ¼ 0:0027, and b a ¼ 3:0 and b x min ¼ 7 Â 10 À4 m.
Excluding the case of apertures in 4 Mile Beach (Fig. 5f), even applying a cut-off to our data the K-S tests always return smaller probability values for power-law distributions compared with lognormal distributions (unless we increase the cut-off to higher percentages, e.g. 40%). Log-normal remains the best fitting probability distribution even for cut data. We also note that when cutting the data, the power-law fit returns a value of b a ! 3:0, which is almost never observed in power-law distribution for natural phenomena (Newman, 2005). Therefore, we prefer not to apply any cut-off to our datasets.

Other limitations
A comprehensive overview of all the anisotropic permeability values obtained is shown in Table 3. These permeability predictions assume that fracturing generates all permeability and that the rock between the fractures is impermeable. The permeability estimates reported here do not consider the possible in-situ stresses, as well any effect that can be induced by depth. Therefore, they refer to Fig. 11. Plot of the resulting second moment average errors -as absolute values -, between the model (if we were able to measure all the fracture in a certain area) and the used subsampling techniques e linear scan-line and circular window, respectively. Each plot shows the variability for different sub-sampling percentages, from 1% to 75% of the total fracture population, for different distributions of fracture trace lengths.
possible fluid flow at surface levels.
Although we improved the statistical models by using MLEs, a lack of agreement between natural fracture networks and 2D meshes exists. The reasons for these limitations have been clearly explained by Manzocchi (2002) and are related to computational limitations in reproducing the natural fracture spatial organisation. For synthetically generated fractures, most of the connectivity is achieved through fracture trace intersections (an 'X' connection geometry), and the two ends of each line usually terminate as isolated ('I') fracture trace tips. Differently, for opening-mode fracture systems, many fractures terminate as abutments against other fractures to form connections with a 'Y' (or 'T') geometry. Connectivity in natural fracture systems is therefore achieved through a combination of fracture intersections ('X'-nodes) and abutments or splays ('Y'-nodes); and a high incidence of 'Y'-nodes results in a low incidence of isolated 'I' fracture tips. A 'Y'-node is highly unlikely to form by chance in a random continuum system such as a fracture mesh, so only 'I'-nodes and 'X'-nodes are present in the network. So far, we have not reproduced clustering of fractures to form fracture corridors. It is possible to overcome this issue, for example, using non-linear (non-random) Poisson distributions for positioning fracture centroids in the simulated space. Further work is planned to address these issues.

Summary and conclusions
The combination of Maximum Likelihood Estimators (MLEs) and Kolmogorov-Smirnoff test (K-S) provides a powerful and reliable statistical tool for estimating the parameters governing the distribution of fracture attribute populations. We have implemented the analysis with a set of MATLAB™ functions which are integrated into FracPaQ (Healy et al., 2016).
We have provided examples of MLE performance by analysing the statistical distribution of fracture attributes collected on two outcrops in the Santa Cruz area (USA). Using the best statistical distribution for the fracture attributes we generated 2D fracture meshes, containing a fracture network with the same statistical parameters of those measured on outcrop. Finally, we estimated the principal values of the permeability tensor of the simulated fracture networks, following Oda's tensorial approach. Our results were compared to those obtained using the least-squares regression method, demonstrating the validity and robustness of our analytical method.
The application of Maximum Likelihood Estimators allowed us to: 1 Individuate the best statistical distribution for fracture attributes measured on outcrop e specifically, length and aperture; 2 Use the calculated scaling parameter to generate synthetic fracture networks, which by design are more likely to resemble the distribution and spatial organisation observed on outcrop; 3 Employ the derived distributions for a basic 2D estimation of the bulk permeability tensor, yielding consistent values of anisotropic permeability for highly fractured rock masses, such as the studied areas. Table 3 Summary of the directional permeability values for the 2D meshes built using the parameters estimated using the MLEs and LSEs.