ROBUST METRICS OF CONNECTIVITY



Introduction
Connectivity of a material constituent represents the extent of its connectedness, which depends on the scale of observation and ranges from local to global connectedness. In this paper, connectivity describes the topology of connected spaces in a material. Connectivity of constituents influences various physical properties, such as electromagnetic, chemical, transport, thermal, and mechanical properties. Connectivity is an important parameter for engineering and scientific studies involving piezoelectrics, cement, porous materials, brain neuronal networks, ceramics, hydrology, geophysics, and thermoelasticity, to name a few.
The definition and use of connectivity vary across disciplines. For example, in geomorphology, kilometer-scale connectivity is defined as the transfer of sediment from one zone or location to another, and it is important for understanding the linkages between river reaches, the influence of sediment sources on channel morphology, and the mechanisms of morphological change [1]. In the hydrological literature, hydrological connectivity is related to the physical connection between waterbodies that contributes to the spatial heterogeneity of riverine flood plains across hundreds of meters [2]. In geoscience, kilometer-scale connectivity is defined as the degree to which subsurface geobodies are connected, which governs the transport characteristics [3,4]. Connectivity of an aquifer in the subsurface at the kilometer scale influences groundwater flow [5]. The connectivity of catchment and landscape is considered in geomorphology for environmental management [6]. The connectivity of habitats is utilized in spatial ecology for colonization event prediction [7]. At the small scale, the connectivity of pores based on multiple-point statistics facilitates the reconstruction of porous media [8]. The connectivity of soil structure helps in soil surface topography detection, as well as in morphology and percolation analyses [9].
Connectivity as a physical parameter is relevant and important to various disciplines because of its influence on several physical and chemical properties. Our focus is on the micro-scale/pore-scale connectivity of constituents. Various metrics for connectivity quantification have been proposed in the last 50 years. A few metrics are based on percolation theory, which describes the transition from disconnected clusters to a large spanning cluster [10]. The proportion of the percolating cluster is one way of measuring connectivity based on percolation theory. Percolation-based metrics have been implemented to quantify the connectivity of pore networks [11]. In kilometer-scale subsurface characterization, percolation-based metrics have been used to quantify the connectivity of geobodies [12]. Concepts from integral geometry, such as the Euler characteristic, have been used to quantify the shape and structure of topological components [13]. The Euler characteristic has been used as a metric to quantify the connectivity of the trabecular network as a measure of bone quality [14]. The Euler characteristic is implemented as a tool to assess the connectivity of trabeculae in an image in BoneJ, a plug-in for bone image analysis in ImageJ [15]. Betti numbers, the components of the Euler characteristic, have been used for tumor area detection based on the connectivity of cells [16].
As a measure of spatial patterns in geostatistics, the indicator variogram has been suggested as a promising tool for connectivity quantification [17]. Indicator variograms measure spatial continuity at a specific threshold; multiple thresholds can be selected for non-binarized images to calculate indicator variograms, and the response serves as the measure of connectivity [18]. However, it was later shown that indicator variograms are unable to distinguish between a lens structure and a connected channel structure [19]. Later, the connectivity function was proposed by Allard as an alternative approach [20]. The connectivity function was used to distinguish connected and disconnected patterns in soil moisture [21]. The results show that the connectivity function can distinguish patterns that share the same indicator variogram response. Thus, it serves as a more promising tool for characterizing the patterns that exist in observed spatial fields.
The fast marching algorithm introduced by Sethian is a numerical method for tracking the evolution of monotonically advancing fronts in simulated grids [22]. The response of the evolving wave front is related to the topology and spatial properties. The fast marching method was used to determine paths of anatomical connection between regions of the brain from the map of travel time [23].
Applications of connectivity quantification can be found across various disciplines. For example, such metrics have been used to quantify the pore connectivity captured in 2D digital images of porous materials [24]. Using such metrics, segmented X-ray micro-tomography images were analyzed to assess the connectivity of oil, water, and gas phases during different periods of injection [25]. Currently, there is no standard measure of the connectivity of material constituents. Moreover, there has not been a rigorous study of the robustness and reliability of connectivity metrics for composite materials.
In this study, six metrics are developed and tested for quantifying the connectivity of material constituents captured in high-resolution microscopy images. The robustness of the metrics is evaluated by applying them to 3000 synthetic images representing six levels of connectivity and to scanning electron microscopy (SEM) images of an organic-rich shale sample. We also study the sensitivity of these metrics to the areal fraction and random distribution of a constituent. In this paper, we first introduce the synthetic images and the connectivity metrics used in this study. Following that, we analyze the results of the connectivity metrics when applied to synthetic images and real SEM images. The effect of the areal fraction and random distribution of the constituent on the performance of the metrics is discussed next.

Material
Six types of synthetic binary images, comprising white or black pixels, with specific levels of connectivity are created for evaluating the performance of the six connectivity-quantification metrics. The six types of binary images are referred to as Type 1 to Type 6. They are created such that the connectivity of the white constituent/component decreases from Type 1 to Type 6 due to the reduction in the connectedness of the white constituent, while ensuring a similar areal fraction of the white constituent irrespective of the level of connectivity. Typical binary images representing Type 1 to Type 6 are shown in Figure 1. Each image has a dimension of 200 pixels by 200 pixels. The synthetic dataset contains 500 realizations of each level of connectivity with random location and distribution of the constituent of interest (i.e. the white component/constituent). As a result, there are in total 3000 binary images in the synthetic dataset.
The image shown in Figure 1(a) represents Type-1 connectivity, where the image contains ten horizontal and ten vertical bars representing the white constituent randomly distributed over the black constituent in the background. All the bars have the same dimension, i.e. one hundred pixels in length and two pixels in width. These white pixels represent the material constituent of interest for which the connectivity is to be quantified, whereas the black pixels represent the background. Due to some overlapping pixels between the white horizontal and vertical bars, all the binary images have slightly fewer than 4000 pixels representing the white constituent. Consequently, the white constituent in each binary image is approximately 10% of the entire image (~4000/(200×200)).
Figure 1(b) represents the Type-2 binary image, which has a slightly lower level of connectivity than the Type-1 binary image. All images, irrespective of the level of connectivity, have the same dimension and fraction of the white constituent. The bars in Type-2 images are half the length of those in Type-1 images; therefore, the numbers of bars in the horizontal and vertical directions in a Type-2 image are twice those in a Type-1 image. The bars in Type-2 images are fifty pixels in length and two pixels in width, so a Type-2 binary image has 20 vertical and 20 horizontal bars randomly distributed over the black background. The synthetic binary images belonging to Types 3, 4, 5, and 6 are created in the same manner with a 10% areal fraction of the white constituent: each subsequent connectivity type is created by reducing the length of the randomly distributed bars and increasing the number of bars, which keeps the fraction of the white constituent the same while reducing the connectivity of the material constituent of interest. Each type of binary image representing a specific level of connectivity consists of 500 different realizations of randomly distributed bars.
After evaluating the six metrics on binary images corresponding to the six levels of connectivity (shown in Figure 1), the six metrics are applied to portions/slices of a segmented scanning electron microscopy (SEM) image. The 200-pixel by 200-pixel binary images in this section serve as the real-world data and are taken from a 2000-pixel by 2000-pixel segmented image derived from an SEM image of a shale rock sample from the Wolfcamp formation. A robust image segmentation is required to convert the SEM images into segmented images that delineate the material constituents [26]. The segmentation was accomplished using a machine-learning-assisted segmentation workflow [30,31]. We first convert the segmented images into binary images such that the material constituent that represents organic matter is masked as white pixels and the remaining black pixels represent the background comprising matrix, pores, clays, and other solid minerals.
In this study, we quantify the connectivity of the organic constituent shown in white. The two binary images shown in Figure 2 have an image size of 200 pixels by 200 pixels. The fraction of the organic matter (i.e. the white constituent) in each of the two images is 15%. A visual examination of the images in Figure 2 indicates that the organic matter in the first image has higher connectivity than that in the second one. Our hypothesis is that the responses of the metrics when applied to the two images will reveal the difference in the connectivity of organic matter in the two segmented SEM images. As a result, the connectivity of the white constituent can be inferred from the response of the metrics instead of from human-led visual inspection of each 200-pixel by 200-pixel image slice of the 2000-pixel by 2000-pixel SEM image. The proposed metrics will standardize and speed up the analysis of connectivity.
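The construction of the synthetic bar images described at the start of this section can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `make_bar_image`, the uniform random placement, and the choice to keep every bar fully inside the image are assumptions.

```python
import numpy as np

def make_bar_image(bar_length, n_bars_per_direction, size=200, width=2, seed=None):
    """Binary image (1 = white constituent) with random horizontal and vertical bars.

    Assumption: each bar lies fully inside the image and its position is
    drawn uniformly at random; the source does not specify either detail.
    """
    rng = np.random.default_rng(seed)
    img = np.zeros((size, size), dtype=np.uint8)
    for _ in range(n_bars_per_direction):
        # Horizontal bar: `width` rows tall, `bar_length` columns long.
        r = rng.integers(0, size - width + 1)
        c = rng.integers(0, size - bar_length + 1)
        img[r:r + width, c:c + bar_length] = 1
        # Vertical bar: `bar_length` rows tall, `width` columns wide.
        r = rng.integers(0, size - bar_length + 1)
        c = rng.integers(0, size - width + 1)
        img[r:r + bar_length, c:c + width] = 1
    return img

# Type 1: ten 100-pixel bars per direction; Type 2: twenty 50-pixel bars.
type1 = make_bar_image(bar_length=100, n_bars_per_direction=10, seed=0)
type2 = make_bar_image(bar_length=50, n_bars_per_direction=20, seed=0)
print(type1.mean(), type2.mean())  # areal fractions, at most 0.10 because bars may overlap
```

Halving the bar length while doubling the bar count keeps the upper bound on the white-pixel count at 4000, which is how the areal fraction stays near 10% across types.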

Clusters in an image
A cluster in a 2D image is a group of connected pixels. In 2D, two adjacent pixels are connected when they share the same edge (face), referred to as the 4-connectivity type, or when they share the same edge or vertex, referred to as the 8-connectivity type. As shown in the [...]
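The difference between the 4- and 8-connectivity conventions can be illustrated with a small breadth-first labeling routine. This is a sketch for illustration only; the function name `label_clusters` and the toy image are hypothetical, and library routines such as `scipy.ndimage.label` perform the same task on large images.

```python
from collections import deque

def label_clusters(image, connectivity=4):
    """Label clusters of 1-valued pixels; returns (label grid, number of clusters)."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # shared edges only
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]             # shared edges or vertices
    else:
        raise ValueError("connectivity must be 4 or 8")
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    n_clusters = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c] != 0:
                continue
            n_clusters += 1                 # start a new cluster at an unlabeled pixel
            labels[r][c] = n_clusters
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = n_clusters
                        queue.append((ny, nx))
    return labels, n_clusters

example = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]
_, n4 = label_clusters(example, connectivity=4)
_, n8 = label_clusters(example, connectivity=8)
print(n4, n8)  # 3 clusters with 4-connectivity, 2 with 8-connectivity
```

The pixel pair touching only at a corner is merged into one cluster under 8-connectivity but stays separate under 4-connectivity, so the choice of convention changes every cluster-based metric.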

Metrics based on percolation theory
In percolation theory, connectivity is defined as the probability that any two cells/points/pixels belong to the largest percolating cluster [12] of the constituent of interest. Along those lines, the connectivity of a specific constituent is expressed as:
$$\text{Connectivity} = \frac{N_l^2}{\left(\sum_{i=1}^{n} N_i\right)^2} \tag{1}$$
where $N_l$ is the number of pixels in the largest cluster, $n$ is the total number of clusters, and $N_i$ is the number of pixels in the $i$-th cluster. Similar to this definition based on percolation theory, the geobody connectivity developed for subsurface reservoir models is defined as the ratio of the largest geobody volume to the total grid volume. The following steps are required to calculate the connectivity index specific to a constituent:
1. Identify all the clusters and assign a label to each cluster.
2. Determine the number of pixels in each cluster.
3. Use the values determined in Steps 1 and 2 in Equation 1.
The metric in Equation 1 is suitable when the fraction of the material constituent of interest is large and there is a single dominant cluster. When the fraction of the constituent is low and there is no single dominant cluster, the metric presented in Equation 2 is more applicable. This metric calculates the connectivity based on all the clusters and the probability of pixels being in any one of the clusters, and is expressed as:
$$\text{Connectivity index} = \frac{\sum_{i=1}^{n} N_i^2}{\left(\sum_{i=1}^{n} N_i\right)^2} \tag{2}$$
The numerator is formulated to emphasize the contribution of larger clusters.
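As a worked illustration of the two percolation-based metrics, the sketch below evaluates them from a list of cluster sizes. It assumes that Equation 1 is the squared ratio of the largest cluster size to the total constituent size (the probability that two pixels both fall in the largest cluster) and that Equation 2 is the sum of squared cluster sizes over the squared total; the function names are hypothetical.

```python
def largest_cluster_connectivity(cluster_sizes):
    """Assumed form of Equation 1: (N_l / sum of N_i) squared."""
    total = sum(cluster_sizes)
    return max(cluster_sizes) ** 2 / total ** 2

def connectivity_index(cluster_sizes):
    """Assumed form of Equation 2: sum of N_i squared over (sum of N_i) squared."""
    total = sum(cluster_sizes)
    return sum(size * size for size in cluster_sizes) / total ** 2

# Three clusters of 6, 3, and 1 pixels (10 constituent pixels in total).
sizes = [6, 3, 1]
eq1 = largest_cluster_connectivity(sizes)  # 36 / 100
eq2 = connectivity_index(sizes)            # (36 + 9 + 1) / 100
print(eq1, eq2)
```

Both values approach 1 when a single cluster holds nearly all constituent pixels. Equation 2 also credits the smaller clusters, which is why it remains informative when no cluster dominates.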

Euler characteristic/number
The Euler characteristic/number describes the shape and structure of topological spaces that do not vary under certain types of deformation/distortion, such as stretching, compression, inflation, twisting, and bending, without gluing or tearing. Two topologically similar objects have the same Euler number. The geometry of an object is not equivalent to its topology. In 3D, the Euler number can be calculated as:
$$\chi = \beta_0 - \beta_1 + \beta_2 \tag{3}$$
where $\beta$ is referred to as the Betti number. The Betti numbers with subscripts 0, 1, and 2 are all based on the shape of objects represented using grid points or pixels. In a 2D image, $\beta_2$ does not exist; so, the Euler number corresponds to the difference between the total number of clusters and the total number of holes in the grid-based pixel representation, expressed as:
$$\chi = \beta_0 - \beta_1 \tag{4}$$
where $\beta_0$ is the number of clusters and $\beta_1$ is the number of holes. A hole is a cluster of the background material completely surrounded by the constituent of interest. The Euler number of a binary image is calculated as follows:
1. Identify all the clusters of the constituent of interest.
2. Identify all the clusters of the background material.
3. Count the clusters of the constituent of interest and those of the background material.
4. Solve Equation 4 using the values calculated in Step 3.
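The steps above can be sketched with `scipy.ndimage.label`. Several choices here are assumptions rather than details from the text: the constituent is labeled with 8-connectivity and the background with 4-connectivity (a common complementary pairing), and a hole is counted as a background cluster that does not touch the image border. The function name `euler_number_2d` is hypothetical.

```python
import numpy as np
from scipy import ndimage

def euler_number_2d(binary):
    """Return (Euler number, beta_0, beta_1) for a 2D binary image."""
    binary = np.asarray(binary).astype(bool)
    # Steps 1 and 3: clusters of the constituent (8-connectivity, assumed).
    _, beta0 = ndimage.label(binary, structure=np.ones((3, 3), dtype=int))
    # Steps 2 and 3: clusters of the background (default 4-connectivity).
    bg_labels, n_background = ndimage.label(~binary)
    # Background clusters touching the border are not enclosed, so not holes.
    border = np.concatenate([bg_labels[0], bg_labels[-1],
                             bg_labels[:, 0], bg_labels[:, -1]])
    n_open = np.unique(border[border > 0]).size
    beta1 = int(n_background - n_open)
    # Step 4: clusters minus holes.
    return int(beta0) - beta1, int(beta0), beta1

# A ring (one cluster, one hole) plus one isolated pixel (a second cluster).
example = np.zeros((7, 7), dtype=int)
example[1:4, 1:4] = 1
example[2, 2] = 0
example[5, 5] = 1
result = euler_number_2d(example)
print(result)  # (1, 2, 1)
```

The example shows why the Euler number alone can be ambiguous: two clusters and one hole give the same value as a single solid cluster.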
Negative values or values close to zero for the Euler number indicate high connectivity because there are few large dominant clusters of the constituent of interest and a limited number of holes formed by the background material. The Euler number does not have sufficient resolution for high connectivity. An increase in the areal fraction with a corresponding increase in the connectedness of the constituent generally causes the number of holes to exceed the number of constituent clusters, with limited scattered pixels of the constituent and of the background material (holes). This situation decreases the Euler number to negative values. However, caution is required when using the Euler number to quantify connectivity. For example, as the connectedness of clusters increases further, the holes can fill up, which increases the Euler number to positive values, contrary to the general trend of a decrease in Euler number with an increase in connectivity. Further, the Euler number is not suitable for 2D images when the constituent of interest is distributed as a large number of small clusters, especially for lower areal fractions. Issues with the Euler number are evident for a low areal fraction of the constituent of interest. A large number of scattered pixels, of either the constituent of interest or the background material, will result in erroneously large positive or negative values of the Euler number with large variance.

Indicator variogram
The variogram describes the degree of spatial dependence and spatial extension of randomly varying processes or properties. The variogram is the variance of the difference between the values of a property/process measured at two locations. The indicator variogram is calculated from indicator values at various spatial locations on an indicator map. An indicator is a binary transform of the spatial distribution of a continuous random variable (process/property) to either 1 or 0 at each spatial location, depending on whether the variable is above or below a selected threshold. The binary images used in this study can be considered as indicator maps where pixels of value 1 (i.e. white pixels) represent the constituent of interest and pixels of value 0 represent the background material. We intend to use the indicator variogram to quantify the connectivity of the pixels with the value of 1. The indicator variogram is expressed as:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ I(u_i) - I(u_i + h) \right]^2 \tag{5}$$
where $I$ is the indicator value (either 1 or 0) based on a specific threshold, $u$ is the coordinate vector of a pixel, $h$ is the separation distance between two pixels, and $N$ is the number of paired pixels at a given distance $h$. The indicator variogram is related to the two-point spatial-correlation function $S(h)$ in the following manner:
$$\gamma(h) = \phi - S(h) \tag{6}$$
where $S(h)$ is the probability of having two pixels located in the constituent of interest at a distance $h$ and $\phi$ is the areal fraction of the constituent of interest. The following steps are used to calculate the two-point spatial-correlation function $S(h)$ and the indicator variogram $\gamma(h)$:
1. Specify the direction for computing the indicator variogram, either along the X-axis, Y-axis, X-diagonal, or Y-diagonal, as shown in Figure 5.
2. Specify the range of separation distance h for which the indicator variogram $\gamma(h)$ needs to be computed.
3. Randomly select pixel pairs at the specified separation distance h in the specified direction.
4. Record the total number of pixel pairs selected in the previous step and the number of those pixel pairs that lie in the constituent of interest. Hence, calculate $S(h)$ along the specified direction.
5. Subtract $S(h)$ from the areal fraction of the constituent of interest in the binary indicator map to obtain $\gamma(h)$.
6. Loop through all the separation distances h within the specified range to obtain the complete indicator variogram $\gamma(h)$ for the entire range of h in one specific direction.
7. Compute $\gamma(h)$ for the four directions, namely X-axis, Y-axis, X-diagonal, and Y-diagonal.
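The steps above can be sketched with NumPy as follows. Two deviations from the text are assumptions made for brevity: every pixel pair at a given lag is used instead of a random sample, and the four direction vectors in `DIRECTIONS` are a guess at the layout of Figure 5. For the diagonal directions, `h` counts diagonal steps, so the Euclidean separation is h√2. All names are hypothetical.

```python
import numpy as np

# Assumed (row step, column step) for each direction named in the text.
DIRECTIONS = {"x": (0, 1), "y": (1, 0), "x_diag": (1, 1), "y_diag": (1, -1)}

def shifted_pairs(img, h, direction="x"):
    """Return aligned arrays I(u) and I(u + h) for every valid pixel pair."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    if not 1 <= h < min(rows, cols):
        raise ValueError("h must satisfy 1 <= h < min(image shape)")
    dr, dc = (step * h for step in DIRECTIONS[direction])
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    head = img[r0:r1, c0:c1]
    tail = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    return head, tail

def two_point_correlation(img, h, direction="x"):
    """S(h): fraction of pixel pairs with both pixels in the constituent."""
    head, tail = shifted_pairs(img, h, direction)
    return float(np.mean(head * tail))

def indicator_variogram(img, h, direction="x"):
    """gamma(h): half the mean squared indicator difference (Equation 5)."""
    head, tail = shifted_pairs(img, h, direction)
    return float(0.5 * np.mean((head - tail) ** 2))

# Four identical rows of the pattern 1 1 0 0.
pattern = np.tile(np.array([1, 1, 0, 0]), (4, 1))
s1 = two_point_correlation(pattern, 1, "x")   # 1/3
g1 = indicator_variogram(pattern, 1, "x")     # 1/6
print(s1, g1)
```

On a finite image, Equation 5 evaluated directly and the areal fraction minus S(h) can differ slightly because pixels near the edges appear in fewer pairs; the two agree when the image is large relative to h.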

Two-point cluster function
The two-point correlation function $S(h)$ can be written as a sum of two contributions [28]:
$$S(h) = C_2(h) + D(h) \tag{7}$$
where $C_2(h)$ is the probability of finding two pixels separated by a distance h in the same cluster of the constituent of interest and $D(h)$ is the probability of finding two pixels at separation distance h but in different clusters of the constituent of interest. When the probability of finding two pixels in the same cluster of a constituent is high, there is a dominant cluster and the connectivity of the constituent is high. Calculating $C_2(h)$ requires the same steps as calculating $S(h)$, except that in Step 4 it must be determined whether the pixel pairs lie in the same cluster of the constituent. $C_2(h)$ is an indicator of local connectivity within a cluster, whereas $D(h)$ is an indicator of isolated clusters. High global connectivity requires high $C_2(h)$ and low $D(h)$. Consequently, $C_2(h)$ is a better indicator of connectivity than $S(h)$ and the indicator variogram.
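The decomposition of the two-point correlation function into same-cluster and different-cluster contributions can be sketched as below. The sketch is restricted to horizontal lags, uses every pixel pair rather than a random sample, and labels clusters with the default 4-connectivity of `scipy.ndimage.label`; these choices and the name `correlation_split` are assumptions.

```python
import numpy as np
from scipy import ndimage

def correlation_split(img, h):
    """Return (S, C2, D) at horizontal lag h, with S = C2 + D."""
    img = np.asarray(img).astype(bool)
    if not 1 <= h < img.shape[1]:
        raise ValueError("h must satisfy 1 <= h < image width")
    labels, _ = ndimage.label(img)           # 0 = background, 1..n = clusters
    head, tail = labels[:, :-h], labels[:, h:]
    both = (head > 0) & (tail > 0)           # both pixels in the constituent
    same = both & (head == tail)             # ...and in the same cluster
    s = float(both.mean())
    c2 = float(same.mean())
    return s, c2, s - c2

# Two clusters: pixels (0,0)-(0,1) and pixels (0,3)-(1,3).
example = np.array([[1, 1, 0, 1],
                    [0, 0, 0, 1]])
near = correlation_split(example, 1)   # the only constituent pair shares a cluster
far = correlation_split(example, 3)    # the only constituent pair spans two clusters
print(near, far)
```

At lag 3 the two-point correlation is nonzero even though no pair is connected, which is the situation the cluster function is designed to expose.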

Connectivity function
The connectivity function or connectivity statistic $\tau(h)$ measures the probability of two pixels in the constituent of interest being connected. $\tau(h)$ is a function of the separation distance h, expressed as:
$$\tau(h) = P\left( A(u) = A(u+h) \mid I(u) = 1,\ I(u+h) = 1 \right) \tag{8}$$
where $A$ is a cluster index. Equation 8 states that two pixels $u$ and $u+h$ are connected when both belong to the constituent of interest and lie in the same cluster. The probability $\tau(h)$ of such connectedness of two pixels can be calculated for all possible separation distances. Following that, the average connected distance $\bar{d}$ can be computed as [21]:
$$\bar{d} = \int_0^{\infty} \tau(h)\, dh \tag{9}$$
where $\bar{d}$ represents the average distance over which pixels are connected. A long average distance represents high connectivity and vice versa. The average connected distance $\bar{d}$ can be converted to a physical distance when the resolution/pixel dimension of the image is known. Unlike the indicator variogram γ(h), two-point correlation function S(h), and two-point cluster function C2(h), which have statistical formulations, $\tau(h)$ has a deterministic formulation. The following steps are required to calculate the connectivity function and the average connected distance $\bar{d}$:
1. Locate clusters and assign cluster indices for the constituent of interest.
2. Determine the separation distances between all the pixel pairs located in the constituent of interest.
3. For each separation distance calculated above, determine the number of pixel pairs located in the same cluster and the number of pixel pairs whose pixels are located in different clusters.
4. For each separation distance calculated above, calculate $\tau(h)$ as the ratio of the number of pixel pairs sharing the same cluster index to the total number of pixel pairs in the constituent irrespective of the cluster index.
5. Compute the average connected distance $\bar{d}$ by calculating the integral in Equation 9.
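The steps above can be sketched as follows. For brevity the sketch considers only horizontal separations rather than all pixel pairs, labels clusters with the default 4-connectivity of `scipy.ndimage.label`, treats lags with no constituent pairs as contributing zero, and approximates the integral with the trapezoidal rule at unit pixel spacing. These are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def connectivity_function(img, max_h):
    """Ratio of same-cluster pairs to all constituent pairs for h = 0..max_h."""
    img = np.asarray(img).astype(bool)
    if not 1 <= max_h < img.shape[1]:
        raise ValueError("max_h must satisfy 1 <= max_h < image width")
    labels, _ = ndimage.label(img)             # Step 1: cluster indices
    tau = np.full(max_h + 1, np.nan)
    tau[0] = 1.0                               # a pixel is connected to itself
    for h in range(1, max_h + 1):              # Step 2: horizontal separations
        head, tail = labels[:, :-h], labels[:, h:]
        both = (head > 0) & (tail > 0)         # Step 3: pairs in the constituent
        n_pairs = int(both.sum())
        if n_pairs:
            same = int((both & (head == tail)).sum())
            tau[h] = same / n_pairs            # Step 4: same-cluster ratio
    return tau

def average_connected_distance(tau):
    """Step 5: trapezoidal approximation of the integral of tau(h), in pixels."""
    t = np.nan_to_num(tau, nan=0.0)
    return float(np.sum((t[:-1] + t[1:]) / 2.0))

example = np.array([[1, 1, 0, 1],
                    [0, 0, 0, 1]])
tau = connectivity_function(example, max_h=3)
d_bar = average_connected_distance(tau)
print(tau, d_bar)
```

Multiplying the result by the pixel size converts the average connected distance from pixels to a physical length.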

Travel-Time Histogram computed using the Fast Marching Method
The fast-marching method is used to model the evolution of a boundary or interface, also referred to as a front. It is a numerical technique to approximate the travel time T of a front moving through a region of varying travel speed F(x). The travel time of a front is computed by solving the Eikonal equation [22], expressed as:
$$\left| \nabla T(x) \right| F(x) = 1 \tag{10}$$
where $F$ is the travel speed at pixel location x and $T$ is the travel time of the front arriving at pixel location x. By setting a high travel speed for pixels belonging to the constituent of interest and a null travel speed for pixels belonging to the background material, the travel time required for a front to travel between two pixels (i.e. a pixel pair) in the constituent of interest at various separation distances can be calculated by solving Equation 10 using the fast-marching method. Such travel times between several pixel pairs are a statistical indicator of connectivity, especially of the tortuosity of the path connecting the pixels. Large travel times indicate that pixels have high global connectivity. The following steps are implemented to compute the travel-time histogram to quantify the connectivity of a constituent: [...]
Step 6 prevents bias towards a specific/dominant cluster.
Step 6 facilitates comparison of the average travel times for different images.
Step 7 is indispensable because it minimizes the effect of random source point selection. However, Step 7 is computationally expensive, especially when the image size is large. The implementation of the fast-marching method is similar to the one used by Ojha et al. [29] to estimate the diffusion of pressure propagation in the subsurface.
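The travel-time computation at the core of this metric can be sketched with a first-order fast-marching solver on a pixel grid. This is a minimal illustration and not the implementation of Ojha et al. [29]: it covers only the travel-time map from a single source pixel, assumes unit pixel spacing, and uses hypothetical function names. Source selection, repetition, and histogram construction are not shown.

```python
import heapq
import math
import numpy as np

def _axis_min(T, frozen, r, c, axis):
    """Smallest accepted travel time among the two neighbors along one axis."""
    best = math.inf
    for step in (-1, 1):
        rr, cc = (r + step, c) if axis == 0 else (r, c + step)
        if 0 <= rr < T.shape[0] and 0 <= cc < T.shape[1] and frozen[rr, cc]:
            best = min(best, float(T[rr, cc]))
    return best

def _solve_eikonal(a, b, f):
    """First-order upwind solution of |grad T| * f = 1 at one pixel."""
    inv = 1.0 / f
    if abs(a - b) >= inv:                     # front arrives along one axis only
        return min(a, b) + inv
    return 0.5 * (a + b + math.sqrt(2.0 * inv * inv - (a - b) ** 2))

def fast_marching_travel_time(speed, source):
    """Travel time from `source` to every pixel; inf where the front never arrives."""
    speed = np.asarray(speed, dtype=float)
    rows, cols = speed.shape
    T = np.full(speed.shape, np.inf)
    frozen = np.zeros(speed.shape, dtype=bool)
    T[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        _, (r, c) = heapq.heappop(heap)
        if frozen[r, c]:
            continue                          # stale heap entry
        frozen[r, c] = True                   # accept the smallest tentative time
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if frozen[nr, nc] or speed[nr, nc] <= 0:
                continue                      # background has null speed
            t_new = _solve_eikonal(_axis_min(T, frozen, nr, nc, 0),
                                   _axis_min(T, frozen, nr, nc, 1),
                                   speed[nr, nc])
            if t_new < T[nr, nc]:
                T[nr, nc] = t_new
                heapq.heappush(heap, (t_new, (nr, nc)))
    return T

# U-shaped constituent: the two bottom corners are 2 pixels apart in a straight
# line, but the front must travel 6 pixels around the background column.
speed = np.array([[1, 1, 1],
                  [1, 0, 1],
                  [1, 0, 1]])
T = fast_marching_travel_time(speed, source=(2, 0))
print(T[2, 2], T[1, 1])  # 6.0 inf
```

The ratio of travel time to straight-line separation in the example reflects the tortuosity of the connecting path, and pixels in disconnected clusters or in the background keep an infinite travel time.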

Results and Discussion
The results section demonstrates the connectivity quantification achieved by the six metrics on the six types of synthetic binary images (Figure 1) and the two real SEM images (Figure 2). Connectivity quantified using scalar (single-valued) metrics is presented as the mean and standard deviation computed for the 500 images (realizations) belonging to each connectivity type/level.

Connectivity Index based on percolation theory (Metric 1)
This is a scalar metric. The connectivity results using Equation 2, referred to as the connectivity index, are listed in Table 1. This metric is computed based on percolation theory. The mean of the connectivity index decreases from Type 1 to Type 6, consistent with the reduction in connectivity from Type 1 to Type 6. Moreover, there is close to 2 orders of magnitude variation in the mean of the connectivity index, which indicates a high sensitivity of the metric to connectivity. The standard deviation indicates the variation in the metric over the 500 images per connectivity type. The coefficient of variation (CV) indicates the magnitude of the standard deviation relative to the mean. The CV decreases with the reduction in connectivity, indicating higher variability of the metric at higher connectivity. A good metric should have a low CV and a low standard deviation, especially for higher connectivity (e.g. Type 1). Also, the CV peaks at Type 2 rather than decreasing monotonically from Type 1 to Type 3, which indicates increased variability of the metric at the connectivity level represented by the Type 2 images. The large standard deviation of Type 1 images is evident in [...]

Euler number (Metric 2)
This is a scalar metric. The results for the Euler number are shown in Table 2. The average Betti numbers are also included in the table. The average β0 and β1 represent the number of clusters and holes, respectively, detected for each connectivity type. As the length of the distributed bars representing the constituent of interest decreases, there is an increase in the number of clusters corresponding to the decrease in connectivity. Table 2 shows an increase in β0 and a decrease in β1 with the decrease in connectivity. The resolution of β0 improves whereas that of β1 decreases with the decrease in connectivity. Overall, there is an increase in the Euler number with the decrease in connectivity. Both β0 (i.e. clusters) and β1 (i.e. holes) influence the variation in the Euler number for higher connectivity, whereas β0 dominates the variation in the Euler number for lower connectivity. Compared to Metric 1, this metric exhibits a higher CV for higher connectivity (e.g. Type 1) and a lower CV for lower connectivity (e.g. Type 6). Consequently, Metric 2 is better than Metric 1 for lower connectivity, but Metric 2 exhibits large variability for higher connectivity. Metric 2 also has a high sensitivity to connectivity. The connectivity index and Euler number have higher variability at higher connectivity. The Euler number is less reliable than the connectivity index at high connectivity, whereas the connectivity index is less reliable at low connectivity.

Indicator variogram (Metric 3)
This is a spectral metric generated as a function of the separation distance between two pixels. When presenting the indicator variogram, the y-axis is γ(h) calculated using Equation 6 and the x-axis is the separation distance h ranging from 0 to 200, which is the length of the synthetic image. The indicator variogram is calculated as the average over the 500 images for each type of connectivity. At a given separation distance, γ(h) can be as high as the fraction of the constituent in the image, indicating zero connectivity. When the constituent is fully connected, γ(h) is zero. Because of the symmetry in the configuration of the white bars in the synthetic images, indicator variograms are generated in the horizontal (X) direction and the X-diagonal direction, as shown in Figure 7. The six plots correspond to Type 1 to 6, respectively, where red and blue curves represent the response in the horizontal (X) direction and X-diagonal direction, respectively.
The indicator variogram in the horizontal direction (red) is highly sensitive to the level of connectivity, whereas that in the diagonal direction (blue) is relatively insensitive. γ(h) increases with the separation distance, indicating a decrease in connectedness, until the separation distance becomes comparable to the length scale of the cluster elements (i.e. the length of a bar), beyond which the indicator variogram is insensitive to connectivity and separation distance. Each red curve increases with the separation distance h until the maximum length of the white bars for the specific image type (100, 50, 25, 20, 10, and 5 pixels for Type 1 to 6, respectively) is reached. At any separation distance, the variogram response increases from Type 1 to 6, which is consistent with the reduction in connectivity from Type 1 to 6. The directional sensitivity of the indicator variogram (i.e. the separation between the horizontal and diagonal responses) is higher for higher connectivity (e.g. Type 1). The indicator variogram is better suited for representing local connectivity than global connectivity because this metric does not differentiate whether pixel pairs belong to the same cluster or not. For example, based on the indicator variogram response, Type 1 exhibits extensive local connectivity in the horizontal direction that is better than that of the other image types.
The indicator variogram has good sensitivity to the connectivity level. The resolution of this metric decreases with the decrease in connectivity, and the metric is suited for quantifying higher connectivity.

Two-point cluster function (Metric 4)
This is a spectral metric generated as a function of the separation distance between two pixels. Due to symmetry in the synthetic images, the two-point cluster function C2(h) is generated in the horizontal (X) direction and the X-diagonal direction, as shown in Figure 8. This metric is generated for separation distances ranging from 0 to 200 pixels, and the probability C2(h) ranges from 0 to 1. For visualization purposes, the y-axis ranges from 0 to 0.1. The two-point cluster function is calculated as the average over the 500 images for each type of connectivity. The six plots correspond to the six image types with distinct levels of connectivity. C2(h) decreases from Type 1 to 6 in both the horizontal and diagonal directions, indicating the decrease in connectivity. C2(h) in the diagonal direction is much lower than that in the horizontal direction due to the configuration of the white bars forming the constituent of interest. The C2(h) response becomes zero beyond a certain separation distance because pixels in the constituent of interest are not connected, which is a major difference from γ(h). The two-point cluster function has good sensitivity to the connectivity level. The resolution of this metric decreases with the decrease in connectivity, and the metric is suited for quantifying higher connectivity.

Quantification of Connectivity of Real Scanning Electron Microscopy (SEM) Images
After testing the metrics on the synthetic images, the performance of the metrics is evaluated on the SEM images of shale samples, as shown in Figure 2. Table 3 lists the connectivity obtained using the connectivity index (Equation 2) and the Euler number. Both metrics indicate that [...]
[...] 4 present the performance of the connectivity metrics in response to changes in areal fraction and connectivity. In the two-point cluster function C2(h) plot (Figure 17), C2(h) is plotted against the distance h, where the solid line is the average C2(h) for 500 images per connectivity type and the shaded region represents the variation across the 500 images within the range of ±2 standard deviations. The connectivity function plot in Figure 18 has the same layout as Figure 17: the average connectivity function τ(h) calculated for 500 images per connectivity type is presented as a solid line, and the variation across the 500 images per connectivity type is presented as the shaded region around the solid line. The area under the curve (AUC) is calculated for the connectivity function and the two-point cluster function of each image and is averaged over the set of 500 images per connectivity type. The average travel time is derived from the travel-time histogram. The mean and coefficient of variation of the scalar metrics are presented in Table 4.
As compared to the connectivity function, the cluster function (C2) has higher variability at lower separation distances, and vice versa. The variability of the connectivity function and cluster function increases with the increase in connectivity. For Type 1, the connectivity function exhibits large variations, especially for large separation distances. The mean connectivity function is relatively insensitive to variations in the areal fraction as compared to the mean cluster function.
A robust connectivity metric should be more sensitive to connectivity type and less sensitive to areal fraction. All metrics are sensitive to connectivity type. The Euler number and connectivity index are more sensitive to connectivity variations at lower connectivity, while the areas under the cluster function and connectivity function are more sensitive to connectivity variations at higher connectivity. The average travel time is the most sensitive to connectivity variations for both low and high connectivity. The average travel time and average connected distance are the least sensitive to the variation in areal fraction at a constant connectivity. The Euler number and connectivity index are the most sensitive to areal fraction at a constant connectivity.
At high connectivity, the Euler number exhibits large variations due to the randomness in the distribution of the constituent. Uncertainty in the Euler number reduces with the reduction in connectivity. The connectivity function and connectivity index also exhibit large uncertainty due to the randomness in the distribution of the constituent at high connectivity, which reduces with the decrease in connectivity. The area under the cluster function, followed by the average travel time, is the most robust to randomness in the distribution of the constituent at high connectivity.

Conclusions
Six metrics are developed and tested for quantifying the connectivity of material constituents. The robustness of the metrics is evaluated by applying them to 3000 synthetic images representing six levels of connectivity and to SEM images of an organic-rich shale sample. Connectivity function, travel-time histogram, cluster function, correlation function, and indicator variogram are spectral metrics, whereas Euler number, connectivity index, average connected distance, and mean travel time are scalar metrics. Indicator variogram, two-point cluster function, connectivity function, and travel-time histogram have scale dependence and can differentiate between local and global connectivity. Indicator variogram and cluster function can quantify the directional nature of connectivity. Unlike the other metrics, the travel-time histogram represents the tortuosity of connected paths. Connectivity index and Euler number have good sensitivity to connectivity but higher variability at high connectivity. Euler number is less reliable than connectivity index at high connectivity, whereas connectivity index is less reliable at low connectivity. Unlike the two-point cluster function, the indicator variogram is better suited to representing local connectivity than global connectivity. The directional sensitivities and resolutions of the cluster function and indicator variogram are higher for higher connectivity. Both average connected distance and mean travel time have good resolution at higher connectivity, which decreases as connectivity decreases. Being scalar metrics, average connected distance and mean travel time can be easily used to compare the connectivity of different images.
A robust connectivity metric should be more sensitive to connectivity type and less sensitive to areal fraction. The average travel time is the most sensitive to connectivity variations at both low and high connectivity. The average travel time and average connected distance are the least sensitive to variation in areal fraction at constant connectivity, whereas Euler number and connectivity index are the most sensitive to it. The area under the cluster function, followed by the average travel time, is the most robust to randomness in the distribution of the constituent at high connectivity.
For any queries contact Dr. Siddharth Misra, Texas A&M University

Figure 1. Six synthetic binary images representing Type 1 to Type 6, indexed as (a) to (f), respectively, such that each type represents a specific level of connectivity at approximately 10% areal fraction of the white constituent. Connectivity of the white constituent decreases from Type 1 to Type 6 because the connectedness decreases as the length of the white bars randomly distributed over the black background is reduced. Figure 1(a) represents Type 1 with the highest connectivity, whereas Figure 1(f) represents Type 6 with the lowest connectivity.


Figure 2. 200-pixel by 200-pixel binary images sliced from the 2000-pixel by 2000-pixel scanning electron microscopy (SEM) image of a shale sample having (a) high connectivity and (b) low connectivity of the organic constituent, represented in white.

In Figure 3(a), only the four pixels marked 1 are connected to the red pixel when the 4-connectivity type is considered. In Figure 3(b), all the white pixels are connected to the red pixel when the 8-connectivity type is considered. A group of pixels connected together forms a single cluster. Different connectivity types result in different configurations of clusters. The 8-connectivity type is used in our study. A sample of clusters identified using the 8-connectivity type is shown in Figure 4, where the white-constituent pixels in Figure 4(a) are grouped into 5 clusters, shown in Figure 4(b). In our study, the 3×3 operating kernel shown in Figure 3(b), when applied to the binary image in Figure 4(a), identifies the clusters and assigns them unique indices, as shown in Figure 4(b). Note that image segmentation is essential prior to cluster determination [27].
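The difference between the 4-connectivity and 8-connectivity types can be made concrete with a minimal sketch of cluster labelling. This version uses a breadth-first flood fill rather than the kernel scan described in the text, so it is a stand-in that should give the same clusters, not the authors' implementation; the function name `label_clusters` and the small test image are hypothetical.

```python
from collections import deque

# Neighbour offsets (row, column) for the two connectivity types
OFFSETS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
OFFSETS_8 = OFFSETS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def label_clusters(image, offsets=OFFSETS_8):
    """Assign a unique index (1, 2, ...) to each cluster of 1-pixels; 0 marks background."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c] != 0:
                continue  # background pixel, or pixel already assigned to a cluster
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Hypothetical segmented binary image (1 = constituent of interest)
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
labels_8, n8 = label_clusters(image, OFFSETS_8)  # diagonal neighbours are joined
labels_4, n4 = label_clusters(image, OFFSETS_4)  # diagonal neighbours stay separate
```

In this example the two diagonally touching pixels in the top-left corner form one cluster under 8-connectivity and two clusters under 4-connectivity.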

Figure 3. Operating kernels corresponding to (a) 4-connectivity type and (b) 8-connectivity type. These kernels are used to identify all the clusters in a 2D image. The kernel in Figure 3(b) is applied to Figure 4(a) to obtain Figure 4(b).

1. Determine the pixels belonging to the constituent of interest.
2. Assign a large value of travel speed to the constituent pixels and a zero travel speed to the background pixels. Low values of travel speed will hinder the propagation of the front.
3. Randomly select one of the constituent pixels to be the source point.
4. Solve the Eikonal equation using fast marching to obtain the travel times of a front starting from the source point and traveling to all the constituent pixels.
5. Identify the locations of the pixels reached from the source point. Do not use these identified pixels as source points in the next iterations.
6. Loop from Step 3 to Step 5 until all the pixels in the constituent of interest have a travel time.
7. Loop from Step 3 to Step 6 for 30 times to obtain a statistically significant distribution of travel-time responses.
8. Compute the travel-time histogram and the average travel time to quantify the connectivity.
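The steps above can be sketched in a few lines. As a loudly stated assumption, this sketch replaces the fast marching solution of the Eikonal equation in Step 4 with Dijkstra's algorithm on the 8-connected pixel grid, which only approximates the front travel time. The zero travel speed of the background is modelled by never letting the front enter a background pixel. The names `travel_times_from` and `average_travel_time`, the unit speed, and the inclusion of the zero travel time of each source pixel are all illustrative choices, not details from the source.

```python
import heapq
import math
import random

OFFSETS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def travel_times_from(image, source, speed=1.0):
    """Step 4 (approximation): shortest travel times from source through 1-pixels only."""
    rows, cols = len(image), len(image[0])
    times = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, (y, x) = heapq.heappop(heap)
        if t > times[(y, x)]:
            continue  # stale heap entry
        for dy, dx in OFFSETS_8:
            ny, nx = y + dy, x + dx
            # Background pixels have zero speed, so the front never enters them
            if 0 <= ny < rows and 0 <= nx < cols and image[ny][nx] == 1:
                nt = t + math.hypot(dy, dx) / speed
                if nt < times.get((ny, nx), math.inf):
                    times[(ny, nx)] = nt
                    heapq.heappush(heap, (nt, (ny, nx)))
    return times

def average_travel_time(image, repeats=30, seed=0):
    """Steps 1 to 8; expects at least one constituent pixel in the image."""
    rng = random.Random(seed)
    constituent = [(r, c) for r, row in enumerate(image)
                   for c, v in enumerate(row) if v == 1]          # Step 1
    all_times = []
    for _ in range(repeats):                                      # Step 7
        unreached = list(constituent)
        while unreached:                                          # Step 6
            source = rng.choice(unreached)                        # Step 3
            times = travel_times_from(image, source)              # Step 4
            all_times.extend(times.values())
            reached = set(times)                                  # Step 5
            unreached = [p for p in unreached if p not in reached]
    return sum(all_times) / len(all_times), all_times             # Step 8

# Hypothetical image: a single connected bar of five constituent pixels
bar = [[1, 1, 1, 1, 1]]
mean_time, samples = average_travel_time(bar)
```

The list `samples` holds the individual travel times, from which a travel-time histogram can be binned. Isolated pixels contribute only zero travel times, so a poorly connected image yields a low average travel time.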

Both images in Figure 6 are from Type 1, but the connectivity in Figure 6(a) is larger than in Figure 6(b) because the randomly distributed bars in Figure 6(a) intersect with each other to form a single cluster, whereas the bars in Figure 6(b) exhibit fewer intersections, resulting in more isolated clusters and lower connectivity.

Figure 7. Connectivity of synthetic binary images shown in Figure 1 quantified using indicator variogram versus separation distance. The distance on the x-axis is plotted on a log scale.

Figure 8. Connectivity of synthetic binary images shown in Figure 1 quantified using two-point cluster function versus separation distance.

Figure 9. Connectivity of synthetic binary images shown in Figure 1 quantified using connectivity function versus separation distance.

Figure 11. Connectivity of synthetic binary images shown in Figure 1 quantified using travel-time histogram computed using fast marching method.

Figure 2(a) has higher connectivity as compared to Figure 2(b), which is consistent with visual inspection. Both metrics exhibit significant resolution in differentiating the connectivity of the two images. These metrics do not have directional or scale dependence.
Table 3. Connectivity of the real SEM images of shales in Figure 2 quantified using connectivity index (Metric 1) and Euler number (Metric 2).
The indicator variograms are generated in all four directions, as shown in Figure 13. For Figure 2(a), which has higher connectivity in the horizontal direction, the variograms in Figure 13(a) indicate that the local and global connectivity are similar in the vertical and the two diagonal directions and are negligible beyond a separation distance of 30 pixels. However, the local and global connectivity in the horizontal direction are much higher, and the connectivity persists up to a separation distance of 175 pixels. For Figure 2(a), connectivity in the horizontal direction has a longer range and higher magnitude than in the vertical and the two diagonal directions. In contrast, for Figure 2(b), the connectivity is negligible beyond a separation distance of 15 pixels in all directions, as shown in Figure 13(b). There may be a few 100-pixel-long clusters in the horizontal direction. Based on Figure 13, we can conclude that Figure 2(a) has better connectivity than Figure 2(b) in all four directions. Moreover, Figure 2(a) is predominantly connected in the horizontal direction, while Figure 2(b) has isotropic connectivity. We do not present the responses of the two-point cluster function because it provides similar information to the indicator variogram. One disadvantage of the two-point cluster function is that it is only sensitive to the dominant cluster.
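The directional analysis above relies on indicator variograms computed separately along each direction. A minimal sketch of the standard experimental indicator variogram, gamma(h) = sum of (I(x) - I(x + h))^2 divided by 2 N(h), is given below for a binary image. How the authors convert the variogram into the plotted connectivity response is not shown in this section, so the sketch stops at gamma(h). The function name and the striped test image are hypothetical.

```python
def indicator_variogram(image, direction, max_lag):
    """Experimental indicator variogram of a binary image along one direction.

    direction is a (row step, column step) pair, e.g. (0, 1) for horizontal,
    (1, 0) for vertical, (1, 1) and (1, -1) for the two diagonals.
    """
    rows, cols = len(image), len(image[0])
    dy, dx = direction
    gamma = {}
    for h in range(1, max_lag + 1):
        total = 0
        pairs = 0
        for y in range(rows):
            for x in range(cols):
                ny, nx = y + h * dy, x + h * dx
                if 0 <= ny < rows and 0 <= nx < cols:
                    total += (image[y][x] - image[ny][nx]) ** 2
                    pairs += 1
        # None marks lags for which no pixel pair fits inside the image
        gamma[h] = total / (2 * pairs) if pairs else None
    return gamma

# Hypothetical image with horizontal stripes: continuous along rows, alternating along columns
stripes = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
gamma_horizontal = indicator_variogram(stripes, (0, 1), 3)
gamma_vertical = indicator_variogram(stripes, (1, 0), 3)
```

For the striped image the horizontal variogram is zero at every lag, reflecting perfect horizontal continuity, while the vertical variogram alternates between 0.5 and 0 as the lag moves in and out of phase with the stripes.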

Figure 13. Connectivity of the real SEM images of shales in Figure 2 quantified using indicator variogram.
The connectivity functions for the two images in Figure 2 are shown in Figure 14. Beyond a certain separation distance, fewer pixel pairs lie in the same cluster, resulting in a sharp drop in the connectivity function. This sharp drop occurs at a smaller separation distance in the low-connectivity sample. The high-connectivity sample exhibits a flat connectivity response from a separation distance of 60 to 200 pixels, indicating a dominant cluster spanning the image. The connectivity function has good resolution for differentiating between high and low connectivity at separation distances greater than 25 pixels. The connectivity function is also a good indicator of local and global connectivity, as shown for the high-connectivity sample, where there are two flat responses, one at separation distances smaller than 20 pixels and the other at separation distances between 60 and 200 pixels. The area under the curve is determined to be 125.97 and 25.48 for the high-connectivity and low-connectivity images, respectively, indicating that the first image is more connected than the second.
The travel-time histograms for the two samples in Figure 2 are presented in Figure 15. For the low-connectivity image, the frequency of occurrence goes to zero for travel times larger than 34. The histogram of the low-connectivity sample has a narrow spread and a low mean travel time. The average travel times for Figure 2(a) and Figure 2(b) are 24.61 and 6.20, respectively. To conclude, the connectivity of the first sample is significantly larger than that of the second sample, consistent with visual inspection.

Figure 14. Connectivity of the real SEM images of shales in Figure 2 quantified using connectivity function.
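The behaviour of the connectivity function described for Figure 14 can be reproduced with a minimal sketch. It assumes the common definition in which the connectivity function at lag h is the fraction of constituent pixel pairs separated by h that belong to the same cluster. The exact formulation used by the authors is not given in this section, so this definition, the restriction to the horizontal direction, and the function names are assumptions.

```python
from collections import deque

OFFSETS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def label_clusters(image):
    """8-connected flood-fill labelling of 1-pixels; 0 marks background."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c] != 0:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in OFFSETS_8:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels

def connectivity_function(image, max_lag):
    """Fraction of constituent pixel pairs, h pixels apart along a row, sharing a cluster."""
    labels = label_clusters(image)
    result = {}
    for h in range(1, max_lag + 1):
        pairs = 0
        connected = 0
        for row in labels:
            for x in range(len(row) - h):
                a, b = row[x], row[x + h]
                if a and b:              # both pixels belong to the constituent
                    pairs += 1
                    if a == b:           # and lie in the same cluster
                        connected += 1
        # None marks lags with no constituent pixel pairs
        result[h] = connected / pairs if pairs else None
    return result

# Hypothetical image with two separate clusters
image = [
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
tau = connectivity_function(image, 3)
```

In this example the function drops from 1 at lag 1 to 0 at larger lags because no cluster extends further than two pixels horizontally, mirroring the sharp drop seen for the low-connectivity sample.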

Figure 16. Typical synthetic images with different areal fractions and similar global connectivity. Areal fraction is altered by varying the dimensions of the white squares, and connectivity is altered by varying the length of the white bars. The top-left image represents the highest connectivity and highest areal fraction. The bottom-right image represents the lowest connectivity and lowest areal fraction.


Figure 17. Connectivity of the synthetic images in Figure 16 quantified using the two-point cluster function (C2) to study the effect of areal fraction on the two-point cluster function.
Here, p is the areal fraction of the constituent of interest. For estimating a reliable indicator variogram, a substantial amount of data is needed. The indicator variogram captures the distribution of pixels in the image but misses the information about clusters in the image. One limitation of the indicator variogram is its inability to capture curvilinearly connected features. Moreover, different patterns of pixel distributions may lead to similar indicator variogram responses.
On the other hand, connectivity quantified using spectral (non-scalar) metrics, such as the indicator variogram, is presented as the average probability at each separation distance. Connectivity function, travel-time histogram, cluster function, correlation function, and indicator variogram are spectral metrics, whereas Euler number, connectivity index, average connected distance, and mean travel time are scalar metrics. Connectivity index, Euler number, and other scalar metrics cannot characterize the anisotropy and scale dependence of connectivity. Indicator variogram, two-point cluster function, connectivity function, and travel-time histogram have scale dependence and can differentiate between local and global connectivity. Indicator variogram and cluster function can quantify the directional nature of connectivity. Connectivity function, connectivity index, and Euler number have deterministic formulations. Cluster function and connectivity function are similar in that both treat pixels as connected when they lie in the same cluster; however, only the connectivity function has a deterministic formulation. Unlike the other metrics, the travel-time histogram represents the tortuosity of connected paths.

Table 1. Connectivity of synthetic binary images shown in Figure 1 quantified using mean, standard deviation (std), and coefficient of variation (CV) of connectivity index based on percolation theory (computed using Equation 2).
Figure 6. Two Type 1 images with different connectivity due to different random distribution of the white constituent.
3.1.2. Euler Characteristic/Number (Metric 2)

Table 2. Connectivity of synthetic binary images shown in Figure 1 quantified using Betti numbers and Euler number based on integral topology.
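For a 2D binary image, the Betti numbers and Euler number named in Table 2 can be sketched as follows: the zeroth Betti number counts the clusters of the constituent, the first Betti number counts the holes enclosed by the constituent, and the Euler number is their difference. The sketch assumes 8-connectivity for the constituent (as used in the study) and the complementary 4-connectivity for the background, which is a conventional pairing rather than a detail stated in the source. The function names and test images are hypothetical.

```python
from collections import deque

OFFSETS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
OFFSETS_8 = OFFSETS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(grid, value, offsets):
    """Count connected components of pixels equal to value, using a flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def betti_and_euler(image):
    """Return (clusters, holes, Euler number) for a binary image with 1 = constituent."""
    cols = len(image[0])
    # Pad with a background ring so that all outer background forms one component
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in image]
              + [[0] * (cols + 2)])
    b0 = count_components(padded, 1, OFFSETS_8)      # clusters of the constituent
    b1 = count_components(padded, 0, OFFSETS_4) - 1  # enclosed holes (outer background excluded)
    return b0, b1, b0 - b1

# Hypothetical images: one ring enclosing a hole, and two isolated pixels
ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
dots = [[1, 0, 1]]
ring_result = betti_and_euler(ring)
dots_result = betti_and_euler(dots)
```

A single ring gives an Euler number of 0 (one cluster, one hole), whereas two isolated pixels give 2, so more isolated clusters raise the Euler number and more enclosed loops lower it.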