Versatile algorithms for the computation of 2-point spatial correlations in quantifying material structure

This paper presents a generalized framework along with the associated computational strategies for a rigorous quantification of the material structure in a range of different applications using the framework of 2-point spatial correlations. In particular, we focus on applications requiring different assumptions about the periodicity and/or involving irregular domain shapes and potentially extremely large datasets. Important details of the computational algorithms needed to address these challenges are developed and illustrated with example case studies. Algorithms developed and presented in this work are available at http://dx.doi.org/10.5281/zenodo.31329.


Background
Almost all materials enabling advanced technologies exhibit a richness of hierarchical internal structures at multiple length scales (spanning from atomic to macroscale). Certain salient features of this structure control the performance characteristics of interest for a selected application. Although there is often some intuition about what these salient features might be, validated and automated protocols do not yet exist for reliably identifying these features. Further, efficient computational protocols do not yet exist for tracking their evolution during the various unit processing/synthesis steps employed in the industrial manufacturing of new products/devices. In fact, the modulation of the material structure in order to improve the performance of engineering components is often the main motivation behind all activities in the field of materials science and engineering. Despite its important role, a unified computational framework for the quantification of the hierarchical material structure does not currently exist.
Conventional practices for the quantification of the material microstructure have largely relied on the intuition and accumulated legacy knowledge of domain experts. Some examples of such microstructure measures include volume fraction, average particle/grain size, average particle/fiber spacing (mean free path), tortuosity, and coordination number [1–12]. However, this simple set of microstructure measures is unlikely to be the best possible set or even an adequate set, because it is easy to imagine multiple instantiations of microstructures that would exhibit the same values of these simple microstructure measures while displaying vastly different values of macroscale properties of interest. This is particularly true when establishing structure-property linkages for defect-sensitive, potentially anisotropic, macroscale properties.
In recent papers, Niezgoda et al. [13,14] presented a rigorous theoretical framework for the stochastic quantification of the material structure at any selected length/structure scale, building on the established concepts of spatial correlation functions [15–26]. Although a number of different measures of the spatial correlations in the microstructure are possible (e.g., lineal path functions [27–30] and radial distribution functions [30–33]), only the n-point spatial correlations (or n-point statistics) [15,16,18,19,30,34–39] provide the most complete set of measures that are naturally organized by increasing amounts of structure information. For example, the most basic of the n-point statistics are the 1-point statistics, and they reflect the probability of finding a specific local state of interest at any randomly selected single point (or voxel) in the material structure. In other words, they essentially capture the information on volume fractions of the various distinct local states present in the material system. The next higher level of structure information is contained in the 2-point statistics, which capture the probability of finding specified local states $h$ and $h'$ at the tail and head, respectively, of a prescribed vector $r$ randomly placed into the material structure. It should be noted that there is a tremendous leap in the amount of structure information contained in the 2-point statistics compared to the 1-point statistics. It should also be noted that if the 2-point statistics described above are expressed only as a function of the distance between the two points (i.e., $r$ is treated as a scalar instead of a vector), one recovers the radial distribution functions or the pair correlation functions that have been used extensively in prior literature [21,30,40]. Higher-order correlations (3-point and higher) are defined in a completely analogous manner.
It is emphasized that the n-point spatial correlations provide statistical information on the microstructure. For example, 2-point statistics provide the expected (i.e., the average) value of a selected correlation between two points separated by a specified vector. However, they also contain information on the variance in the 1-point statistics [39]. In some special cases, they can provide readily interpretable information such as the average shape of the particle (i.e., a mesoscale constituent), especially when the particles have a dominant shape and orientation. On the other hand, when the distribution of the particle shape and orientation is completely random, the corresponding correlations are indeed isotropic (i.e., do not reveal the particle shape directly). The connections between the n-point statistics and the more traditional measures of microstructure have been detailed in prior literature [30,39].
The n-point statistics described above are most efficiently computed on digital datasets using fast Fourier transform (FFT) techniques [13,35,36,41]. An implicit benefit of treating the material structure function as a stochastic process is that it allows a rigorous quantification of the associated variance [13,14]. A second important benefit of the spatial correlations described here is that they lend themselves to objective, low-dimensional, high-value representations (using techniques such as principal component analysis (PCA)) [13,14,37,42–44].
The strongest support for the choice of n-point spatial correlations as the most appropriate measures of material structure comes from the pioneering work of Kroner [45], who has taught us that the effective properties of composite material systems can be conveniently expressed as a series sum with the structure details entering this series explicitly in the form of n-point spatial correlations. These composite theories have been generalized to a broad range of materials phenomena and have been summarized in several books [30,39,46]. There are also several reports in the literature where they have been successfully applied to estimate effective properties (both linear and nonlinear) of a broad range of materials with complex structures [47–52]. Physically, the n-point spatial correlations are very effective in rigorously quantifying the local neighbourhoods in the complex internal structure of most advanced materials. Since the local neighbourhoods control the local response, it is only logical that the n-point spatial correlations are the ideal measures of the material structure in formulating process-structure-property (PSP) linkages of interest in designing high performance engineering components. In recent work, the spatial correlations have been used successfully to establish reliable low-cost surrogate models for capturing the materials core knowledge in the form of process-structure-property linkages [19,37,43,44,53,54].
In this work, we focus exclusively on the computations of 2-point spatial correlations, but the concepts presented can be expanded trivially for the computation of higher-order statistics. Much of the prior work on the computations of the 2-point spatial correlations has focused on fairly simple microstructures described on rectangular parallelepiped domains that were uniformly tessellated into cuboids (also referred to as pixels or voxels). In these earlier applications, the microstructure domains were mainly assumed to be periodic to take advantage of the computational efficiency of discrete Fourier transforms (DFTs). Furthermore, most computations were demonstrated on relatively small domain sizes. In this paper, we present new enhancements that facilitate the computation of the 2-point spatial correlations in a much broader range of applications. In particular, we focus on three challenges: (i) avoiding the need to invoke periodicity while still using DFTs, (ii) application to irregular domains, and (iii) application to extremely large datasets.

Methods: Discretized microstructure function and spatial correlations
A microstructure function expresses spatially resolved material structure information gathered from any source, either experiments or simulations. Conceptually, one can think of the microstructure function as $h(x)$, where $h$ denotes the local state occupying the spatial position $x$. In this notation, the local state refers to any combination of attributes used to define the material locally (e.g., a combination of elemental composition, phase identifier, crystal lattice orientation, and dislocation density may be used to define the local state in multiphase polycrystalline materials at the mesoscale). Brief reflection will expose the unwieldy nature of such a description, especially when one tries to include a diverse set of local state attributes over multiple hierarchical length scales. In an effort to overcome this challenge, the concept of a stochastic microstructure function was introduced [15]. In this novel concept, the microstructure function is defined as $m(h, x)$, where $m$ denotes the probability density associated with finding the local state $h$ at the spatial position $x$. Consequently, $m(h, x)\,dh\,dx$ captures the corresponding probability measure.
Our interest in this paper, however, rests solely on the digital description of the microstructure. Although it is theoretically possible to extract a digital representation of the microstructure function using a multitude of choices in the selection of the basis functions for both the spatial and local state variables [55,56], we focus our attention here on the simplest of these bases, corresponding to the primitive binning of the spatial domain as well as the local state space. With this choice, $m(h, x)$ admits a simple digital description as

$$m(h, x)\,dh\,dx \approx \sum_{s}\sum_{n} m_s^n\,\chi_n(h)\,\chi_s(x) \qquad (1)$$

where $\chi_i(\cdot)$ denotes a set of indicator basis functions, and $m_s^n$ denotes a digital microstructure signal. For example, $\chi_s(x)$ allows partitioning of the spatial domain into non-overlapping volumes (typically employed as uniform binning of the space so that DFT methods can be applied later), with the function taking the value one for all points inside the sub-volume enumerated by $s$ and the value zero for all other points. Note that $\chi_n(h)$ can be defined in a similar manner for any local state space of interest. Figure 1 presents a simple illustration of these concepts. It is also important to recognize that $m_s^n$ can be physically interpreted as the probability of finding any of the local states corresponding to the local state bin enumerated by $n$ in the spatial bin enumerated by $s$. Consequently, $m_s^n$ reflects a spatially resolved description of the material structure in a broadly applicable form. Note that $0 \le m_s^n \le 1$. It is also emphasized that the digital microstructure signal is inherently tied to a specific length scale (defined by the size of the spatial bins) and a specific resolution of the local state (defined by the size of the local state bins).
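As an illustration of the primitive binning described above, the following minimal sketch builds a digital microstructure signal from a labeled image in which every pixel holds exactly one local state, so each entry of the signal is either 0 or 1. The function name `microstructure_signal`, the 0-based state labels, and the small example image are assumptions made here for illustration and are not part of the published code.

```python
import numpy as np

def microstructure_signal(labels, n_states):
    """Return an array m[n, s1, s2] that is 1 where spatial bin s holds
    local state n (0-based label) and 0 elsewhere."""
    labels = np.asarray(labels)
    return np.stack([(labels == n).astype(float) for n in range(n_states)])

# Hypothetical 3 x 4 two-phase image: 0 = white phase, 1 = gray phase.
labels = np.array([[0, 1, 0, 0],
                   [1, 1, 0, 1],
                   [0, 0, 0, 1]])

m = microstructure_signal(labels, n_states=2)

# 1-point statistics (volume fractions) are the spatial averages of m.
volume_fractions = m.mean(axis=(1, 2))
```

Because each spatial bin carries a single state in this example, the signal sums to one over the local state index at every bin, which is consistent with its interpretation as a probability.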
Because of the absence of a natural origin from where one might start indexing the spatial bins, only the relative placement of local states in the material structure contains meaningful information. In other words, only the spatial correlations in the material structure contain high value information. As mentioned earlier, an extensible framework for rigorous quantification of spatial correlations in the material structure is available in the form of n-point spatial correlations (or n-point statistics) [16,19,30,34–37]. The first-order information on the spatial statistics is actually contained in the 2-point spatial correlations (recall that 1-point statistics capture only the volume fractions) defined as [39]

$$f(h, h' \mid r) = \frac{1}{|\Omega(r)|}\int_{\Omega(r)} m(h, x)\, m(h', x + r)\, dx \qquad (2)$$

As noted earlier, the 2-point spatial correlation function, $f(h, h' \mid r)$, reflects the probability density associated with finding local states $h$ and $h'$ at the tail and head, respectively, of a randomly placed vector $r$ in the material internal structure. Because the vector $r$ carries both a magnitude and a direction in this definition, the spatial correlation function defined in Eq. (2) is directionally resolved. It is possible to average the statistics over the direction and use $f(h, h' \mid |r|)$ instead, where $|r|$ denotes the magnitude of the vector. Indeed, $f(h, h' \mid |r|)$ are generally referred to as the pair correlation functions or the radial distributions, and contain significantly less spatial information compared to $f(h, h' \mid r)$. In Eq. (2), $\Omega(r)$ denotes the volumetric domain of the material internal structure analyzed, with $|\Omega(r)|$ denoting the measure of the corresponding volume. It is important to note the dependence of the volumetric domain on the vector itself. Material structures studied often have finite domains (except when periodicity is invoked), and only those points where it is possible to evaluate both $m(h, x)$ and $m(h', x + r)$ can be included in the evaluation of Eq. (2). There are certain regions near the boundaries of a given microstructure image where this condition is not met (i.e., either $x$ or $x + r$ falls outside the given image), and therefore the region available for use in Eq. (2) should be expected to show a strong dependence on $r$ (to be discussed in more detail later).
Analogous to the treatment of the microstructure function earlier, we can express the probability measure as $f(h, h' \mid r)\,dh\,dh'$ and establish a simple digital representation of this function as

$$f(h, h' \mid r)\,dh\,dh' \approx \sum_{t}\sum_{n}\sum_{p} f_t^{np}\,\chi_n(h)\,\chi_p(h')\,\chi_t(r) \qquad (3)$$

It is important to recognize that the index $t$ in Eq. (3) effectively bins the vector space associated with $r$, as illustrated in Fig. 1. Starting with the above notions, one can establish the desired relationship between the digital representations of the microstructure and the (directionally resolved) 2-point spatial correlations as [35,36]

$$f_t^{np} = \frac{1}{S_t}\sum_{s} m_s^n\, m_{s+t}^p \qquad (4)$$

where $S_t$ captures the $r$-dependence of $\Omega(r)$ (see Eq. (2)). It is important to recognize that the denominator $S_t$ in Eq. (4) is essentially the total number of trials conducted (where each trial denotes checking what local states exist in the spatial bins marked $s$ and $s + t$), and the numerator $\sum_s m_s^n m_{s+t}^p$ in Eq. (4) denotes an expected measure of total success in these trials (i.e., actually finding the selected local states $n$ and $p$ at the two bins, respectively). Recognizing this feature of Eq. (4) allows one to make any needed corrections for different situations (expanded upon in later sections).
The computation of $f_t^{np}$ for a specified combination of $n$ and $p$ essentially requires $O(S^2)$ computations ($O(S)$ for each value of $t$, and there are $O(S)$ different values of $t$). Such calculations are generally very expensive and do not scale easily to datasets with high values of $S$. In recent years, it has been demonstrated that the angularly resolved n-point statistics computations can be accomplished at $O(S \log S)$ by employing discrete Fourier transforms (DFTs) [35,36] (which allow the use of fast Fourier transform (FFT) algorithms) and invoking the convolution theorem. One of the main benefits of these computational schemes is their excellent scalability to large datasets.
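To make the cost of the direct approach concrete, the sketch below evaluates Eq. (4) by explicit looping over the vectors $t$ for a two-dimensional, non-periodic image. It is a reference implementation under stated assumptions, not the published code: the name `two_point_direct`, the argument `T` (largest vector component of interest, assumed smaller than the image size), and the convention of storing $t = 0$ at the center of the output are all choices made here for illustration.

```python
import numpy as np

def two_point_direct(m_n, m_p, T):
    """Directly evaluate f_t^{np} of Eq. (4) for |t1| <= T[0], |t2| <= T[1]
    on a non-periodic 2-D image. Output index (t1 + T[0], t2 + T[1])."""
    S1, S2 = m_n.shape
    T1, T2 = T
    f = np.zeros((2 * T1 + 1, 2 * T2 + 1))
    for t1 in range(-T1, T1 + 1):
        for t2 in range(-T2, T2 + 1):
            # Tail positions s for which both s and s + t lie inside the image.
            a1, b1 = max(0, -t1), min(S1, S1 - t1)
            a2, b2 = max(0, -t2), min(S2, S2 - t2)
            tail = m_n[a1:b1, a2:b2]
            head = m_p[a1 + t1:b1 + t1, a2 + t2:b2 + t2]
            trials = (S1 - abs(t1)) * (S2 - abs(t2))  # S_t for this vector
            f[t1 + T1, t2 + T2] = (tail * head).sum() / trials
    return f
```

Each vector costs a pass over the image, so sweeping all vectors scales as the square of the number of spatial bins; this is the cost that the DFT-based schemes avoid.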
In prior work, the protocols described above have been successfully applied to multiphase composite systems [13,14,19,43,44,57,58], atomistic datasets [59,60], and polycrystalline microstructures [42,61]. However, in all of these applications, the microstructure domains had a simple overall shape (rectangles in 2-D and rectangular parallelepipeds in 3-D), periodicity was generally imposed to take advantage of FFT algorithms, and the studies used relatively small domains. In this work, we present major enhancements to the current protocols that are designed to address these challenges.
As noted earlier, FFT algorithms are central to scalable computation of 2-point statistics. However, they implicitly assume that the microstructure being studied is periodic in all directions (i.e., it can be extended by simply repeating the entire domain as many times as needed). With the assumption of periodicity, $S_t$ in Eq. (4) can be taken to be the same as $S$ (the total number of spatial bins in the microstructure). This is because every spatial bin in the microstructure can be used to place the tail (or equivalently the head) of the vector in evaluating the 2-point statistics. Furthermore, one can simply use the properties of DFTs to compute $f_t^{np}$. This is because Eq. (4), with the assumption of periodicity, translates to the following in the DFT space via the convolution theorem:

$$f_t^{np} = \frac{1}{S}\,\Im^{-1}\!\left(M_k^{n*} \odot M_k^{p}\right), \qquad M_k^{n} = \Im\!\left(m_s^n\right) \qquad (5)$$

where $\odot$ is the element-wise product operator (also known as the Hadamard or Schur product). Throughout this paper, the superscript $*$ denotes the complex conjugate and $\Im(\cdot)$ denotes the DFT of the data to the frequency space enumerated by $k$ (in the context of this paper, this is the spatial frequency space). As a result of Eq. (5), the computation of the 2-point statistics is reduced to computing the DFT of $m_s^n$, performing the requisite products in the frequency space (where they are fully uncoupled), and performing an inverse DFT. The most intuitive visualization of the 2-point statistics results when $t = 0$ lies at the center of the plot. This shift is accomplished trivially by making use of the periodicity implied in the DFT-based computations.
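The periodic computation of Eq. (5) can be sketched in a few lines with NumPy's FFT routines. This is a minimal illustration rather than the published implementation; the function name `two_point_periodic` is hypothetical, and the final shift that moves $t = 0$ to the center of the array is done here with `np.fft.fftshift`.

```python
import numpy as np

def two_point_periodic(m_n, m_p):
    """Periodic 2-point statistics f_t^{np} via the DFT, for real-valued
    microstructure signals m_n and m_p of identical shape (any dimension).
    The zero vector t = 0 is placed at index S_i // 2 along each axis."""
    S = m_n.size                      # number of trials is S for every vector
    M_n = np.fft.fftn(m_n)
    M_p = np.fft.fftn(m_p)
    # conj(M_n) * M_p is the DFT of sum_s m_n[s] * m_p[s + t] (circular in s + t).
    f = np.fft.ifftn(np.conj(M_n) * M_p).real / S
    return np.fft.fftshift(f)
```

For an autocorrelation of a binary phase, the value at the center of the returned array equals the volume fraction of that phase, which provides a quick sanity check on any implementation.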
Figure 2 illustrates the above concepts through a simple "honeycomb" microstructure, where each pixel or voxel is colored either white or black. Since there are two local states, we can potentially compute a total of four different 2-point spatial correlation functions: $f_t^{11}$, $f_t^{12}$, $f_t^{21}$, and $f_t^{22}$, where $n = 1$ refers to the white-colored phase and $n = 2$ refers to the black-colored phase in Fig. 2. Exploiting the known properties of DFTs, Niezgoda et al. [35] have demonstrated that the number of independent 2-point spatial correlations defined in Eq. (5) is only $H - 1$, where $H$ is the total number of distinct local states present in the material system of interest. Consequently, for the two-phase microstructures studied here, we generally need to compute only one of the autocorrelations. Figure 2 shows a plot of the white-white autocorrelation.
The autocorrelations presented in Fig. 2 capture a number of salient features of the microstructure. The hexagonal symmetry, the feature shape, and the feature spacing are readily apparent. Furthermore, the periodicity implied in the use of DFTs resulted in the autocorrelations also exhibiting the same periodicity. Note also that the autocorrelation for the zero vector (at the center of the plot) provides the phase volume fraction.
An important consequence of invoking periodicity assumptions is that the number of trials for all vectors is exactly the same and is equal to the number of pixels or voxels in the microstructure studied. In other words, all vectors of interest have been sampled fairly.

Application to non-periodic microstructures
As a specific example, we will revisit the same structure illustrated earlier, but without invoking the assumption of periodicity. In other words, our interest is to compute the autocorrelations as defined in Eq. (4), while accounting for the fact that $S_t \neq S$. However, as stated earlier, a direct implementation of Eq. (4) would incur $O(S^2)$ computations. A much better computational strategy results if one borrows a well-established concept from image analysis [62,63] and "pads" the microstructure such that only long vectors (larger than the vectors of interest in computing the 2-point statistics) can wrap around from one edge of the original image to the opposite edge when the periodic assumption is implicitly invoked to take advantage of the computational expediency of the DFTs.
The padding strategy described above is illustrated in Fig. 3. Let $S = (S_1, S_2)$ denote the number of spatial bins in the original two-dimensional microstructure, and let $T = (T_1, T_2)$ identify the range of the vectors for which the 2-point statistics are to be computed. The reader is cautioned that the use of very high values of $T$ can produce meaningless answers. As an example, if one chooses $T = (S_1, S_2)$, then the number of trials conducted for the largest vector in computing the 2-point statistics is just one. Based on our experience, we recommend that $T < (S_1/2, S_2/2)$. Let the padded microstructure be denoted as $\tilde{m}_s^n$. The spatial bins in the padded region of the microstructure may be assigned any of the local states that are not involved in the computation of the desired 2-point statistics. For example, if we are interested in computing $f_t^{11}$ only, then the spatial bins in the padded region can be assigned a local state enumerated by 2 or a completely new local state enumerated by 3 (making the padded microstructure a 3-phase microstructure).
With the padded microstructures, we are now in a position to take advantage of DFTs. Following Eq. (5), we can first compute $\tilde{M}_k^{n} = \Im\!\left(\tilde{m}_s^n\right)$, and then

$$\Im^{-1}\!\left(\tilde{M}_k^{n*} \odot \tilde{M}_k^{p}\right)$$

which produces an accurate count of the number of successes in finding local states $n$ and $p$ separated by all vectors $t \le T$. In fact, the computation described above produces results even for vectors $t > T$, but these results are corrupted by vectors wrapping around the padded region because of the periodicity assumption implicit in the DFTs. However, since our interest here is exclusively in $t \le T$, we will only take these results from the DFT computation described above. In order to compute the 2-point statistics of interest, we simply need to divide these numbers (equivalent to the numerator in Eq. (4)) by a suitable denominator denoting the total number of trials involved, which is expressed simply as

$$S_t = \left(S_1 - |t_1|\right)\left(S_2 - |t_2|\right)$$

It is pointed out that this strategy provides the exact answer we seek, and not an approximation to it. In fact, all of the novel strategies presented in this paper provide the exact answers for the problems posed, but have the advantage that they provide these answers at significantly reduced computational cost compared to direct computations. Furthermore, the padding in Fig. 3 is shown such that it equally envelops all sides of the original microstructure. This is just for easy visualization and interpretation. In reality, any placement of the original microstructure inside the overall padded region (i.e., any unequal distribution of the padding, as long as the extended microstructure has the same overall size) will produce identical results for the computed 2-point statistics (this is, once again, a consequence of using DFTs).
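The padding strategy can be sketched as follows for a two-dimensional image. This is an illustrative sketch under stated assumptions rather than the published code: the name `two_point_nonperiodic` is hypothetical, the padding is placed entirely on the trailing edges (permitted because the placement of the image inside the padded region does not affect the result), and zeros in the padded signal play the role of a local state that is not involved in the statistics being computed.

```python
import numpy as np

def two_point_nonperiodic(m_n, m_p, T):
    """Non-periodic 2-point statistics for |t1| <= T[0], |t2| <= T[1],
    computed with DFTs on a padded 2-D image. Output index (t1 + T[0], t2 + T[1])."""
    S1, S2 = m_n.shape
    T1, T2 = T
    pad = ((0, T1), (0, T2))           # pad by the largest vector of interest
    a = np.pad(m_n, pad)               # zeros: state n is absent in the padding
    b = np.pad(m_p, pad)
    counts = np.fft.ifftn(np.conj(np.fft.fftn(a)) * np.fft.fftn(b)).real
    # Keep only the uncorrupted vectors; negative components wrap to the end.
    idx1 = np.arange(-T1, T1 + 1)
    idx2 = np.arange(-T2, T2 + 1)
    counts = counts[np.ix_(idx1 % a.shape[0], idx2 % a.shape[1])]
    # Number of trials S_t = (S1 - |t1|)(S2 - |t2|) for each retained vector.
    trials = np.outer(S1 - np.abs(idx1), S2 - np.abs(idx2))
    return counts / trials
```

Up to floating-point round-off, the result agrees with an explicit loop over all vectors, which is a convenient way to validate an implementation on a small image.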
Figure 3 depicts a plot of the $f_t^{11}$ (white-white) autocorrelations that are not tainted by the periodicity assumptions implied in the use of DFTs. A comparison of the autocorrelations in Figs. 2 and 3 reveals important consequences of the assumption of periodicity. For example, the hexagonal symmetry is no longer evident in the autocorrelations (see the values corresponding to the black and red vectors shown in these figures). This is mainly because the different vectors are no longer sampled the exact same number of times. Although this may not be as important when one deals with a very large image, it clearly has an effect for the relatively small image shown in Fig. 3. In this simple example, one can easily reconcile the different values of the autocorrelations for the red and black vectors depicted in Fig. 3 by noting that we can indeed place many more red vectors with both endpoints in a white pixel, when compared to the similar placement of the black vectors. It is therefore important to recognize that the assumption of periodicity can significantly influence the computed 2-point statistics, especially when one has a limited number of features in the image. Note that the strategy described above can be applied selectively on any of the bounding planes of the image. In other words, one can decide to invoke the periodicity assumption on certain bounding planes and employ the padding strategy described above selectively on the other bounding planes.

Masked microstructure domains
As an extension of the idea described above, we now demonstrate a general concept of "masks" that can be used advantageously in many situations related to computing the 2-point statistics. In fact, the padding strategy described above can be considered as a special case of using masks. As an example, consider the microstructure in Fig. 4, which is essentially an extended version of the same microstructure shown in Figs. 2 and 3. However, certain irregularly shaped regions of the microstructure have been masked because the information there is either not available or is of inferior quality (in other words, we do not wish to include that information in the computations of the 2-point statistics). As shown in Fig. 4, these masked regions can be on the boundary of the microstructure (e.g., the microstructure is measured in an irregular domain). But they can also be inside the microstructure (e.g., some regions of the micrograph may not be discernible or reliable). As demonstrated earlier, a mask can also be applied to produce a padded region to impose non-periodic boundaries (see Fig. 3). In this situation, it is convenient to define two functions (see Fig. 4): (i) an extended microstructure function denoted as $\tilde{m}_s^n$, where we have introduced an additional fictitious local state (i.e., the third phase colored gray in Fig. 4) in the masked region as well as the boundary padded regions, and (ii) a mask function denoted as $c_s$ such that it takes a value of zero for spatial bins (shown as black) in the masked regions and one (shown as white) everywhere else. It is pointed out that the extended $\tilde{m}_s^n$ already contains the information in $c_s$. However, we choose to carry this information in the redundant manner described above for ease of discussion and computation.
Following the methodology described in the previous example, we compute

$$\Im^{-1}\!\left(\tilde{M}_k^{n*} \odot \tilde{M}_k^{p}\right)$$

to accurately count the number of successes in finding local states $n$ and $p$ separated by all vectors of interest (as mentioned earlier, it is important to include padding if we wish to avoid the default assumption of periodicity implicit in the use of DFTs). In order to compute the 2-point statistics of interest, we simply need to divide these numbers (equivalent to the numerator in Eq. (4)) by a suitable denominator denoting the total number of trials involved. For the masked microstructures described here, the denominator can be computed easily as

$$S_t = \Im^{-1}\!\left(C_k^{*} \odot C_k\right)$$

where $C_k = \Im(c_s)$. It should be noted once again that the padding scheme described in the previous case study is essentially a special case of the masking protocol described here.
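The masking protocol can be sketched in the same style. In this illustrative version (not the published code), the name `two_point_masked` is hypothetical, the mask is an array of ones (valid bins) and zeros (masked bins), masked bins are zeroed in both signals so that they contribute no successes, and the trial count for each vector is taken as the autocorrelation of the padded mask. Vectors that are never sampled are returned as NaN, which is a design choice of this sketch.

```python
import numpy as np

def _correlate(a, b):
    """Circular correlation sum_s a[s] * b[s + t], evaluated via the DFT."""
    return np.fft.ifftn(np.conj(np.fft.fftn(a)) * np.fft.fftn(b)).real

def two_point_masked(m_n, m_p, mask, T):
    """Non-periodic 2-point statistics restricted to unmasked bins of a 2-D
    image, for |t1| <= T[0], |t2| <= T[1]. Output index (t1 + T[0], t2 + T[1])."""
    T1, T2 = T
    pad = ((0, T1), (0, T2))
    c = np.pad(np.asarray(mask, dtype=float), pad)   # padding is also masked
    a = np.pad(m_n * mask, pad)
    b = np.pad(m_p * mask, pad)
    idx1 = np.arange(-T1, T1 + 1)
    idx2 = np.arange(-T2, T2 + 1)
    window = np.ix_(idx1 % c.shape[0], idx2 % c.shape[1])
    counts = _correlate(a, b)[window]                # successes per vector
    trials = np.rint(_correlate(c, c)[window])       # valid tail/head pairs
    f = np.full(counts.shape, np.nan)
    valid = trials > 0
    f[valid] = counts[valid] / trials[valid]
    return f
```

With a mask of all ones, the trial count reduces to $(S_1 - |t_1|)(S_2 - |t_2|)$, so the padded non-periodic computation is recovered as a special case.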
Figure 4 depicts a plot of the $f_t^{11}$ (white-white) autocorrelations where the computations were limited to the unmasked regions (the white region of the mask) using the computationally efficient DFT-based protocols developed and presented in this paper. Furthermore, there was no assumption of periodicity in this computation. However, it is seen that these autocorrelations are indeed very similar to the ones shown in Fig. 2 (performed assuming periodicity and limited to a much smaller range of vectors). This provides unambiguous confirmation that the protocols presented here are doing an excellent job of computing the 2-point statistics for irregular domains without invoking periodicity, while taking full advantage of the computational efficiency of the FFT algorithms.

Large microstructure domains
We have already emphasized the benefits of using FFT algorithms to dramatically reduce the computational time incurred in the calculations of spatial correlations. In this section, we shift our attention to cases where the datasets are extremely large and present a substantial challenge with their storage requirements. For example, a microstructure of about 600 × 600 × 600 voxels is likely to prove unwieldy for an average desktop computer, especially since the application of the FFT algorithms would require double precision storage of complex numbers. Consequently, the computation of the non-periodic spatial correlations for a 2000 × 2000 × 2000 voxel dataset can easily demand close to 180 GB of memory, forcing the use of a supercomputer for such calculations. We address the challenge described above using a strategy that carefully partitions the large domain into smaller subdomains, performs the requisite computations on them, and then correctly assembles the statistics for the original large domain from the computations on the subdomains. Our approach can be compared to various partitioning strategies for efficient computation of convolutions via FFTs that are well known in digital signal processing applications, such as overlap-save, overlap-add, and hybrid schemes [64,65]. The overall process is illustrated schematically for a 2-D dataset in Fig. 5.
Fig. 4 Illustration of the masking strategy to compute 2-point statistics on irregular domains. The green boxes around the original microstructure are only for visualization
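A rough sense of these storage requirements follows from simple arithmetic: a double-precision complex number occupies 16 bytes, so a single unpadded complex array already needs 16 bytes per voxel. The helper below is a hypothetical illustration of that estimate only; it does not account for padding, for the additional arrays an actual computation would hold at the same time, or for any FFT library workspace, all of which raise the total.

```python
import math

def complex_array_gigabytes(shape, bytes_per_value=16):
    """Memory in GB (10^9 bytes) for one double-precision complex array."""
    return math.prod(shape) * bytes_per_value / 1e9

small = complex_array_gigabytes((600, 600, 600))      # 3.456 GB
large = complex_array_gigabytes((2000, 2000, 2000))   # 128.0 GB
```

Even a single array for the larger dataset therefore exceeds the memory of typical desktop machines before any padding is added.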
In this specific illustration, the overall domain is broken into 25 subdomains (see Fig. 5a). Let ${}^{i}m_s^n$ denote the digitized microstructure in the subdomain enumerated by $i$. The microstructure in each subdomain is then extended by padding in two ways to produce ${}^{i}\tilde{m}_s^n$ and ${}^{i}\hat{m}_s^n$, as shown in Fig. 5b for a corner subdomain (labelled 1) and an interior subdomain (labelled 13). The main idea is that ${}^{i}\tilde{m}_s^n$ is a simply padded version of ${}^{i}m_s^n$, with the padding size controlled by the largest vector size of interest in the calculation of the 2-point statistics (as we did before to avoid the assumption of periodicity of the microstructure), while ${}^{i}\hat{m}_s^n$ is an extended version of ${}^{i}m_s^n$ that captures the real neighborhood information from the original large dataset. As illustrated in Fig. 5b, the treatment for generating ${}^{i}\hat{m}_s^n$ has to be somewhat different for interior subdomains than for those at the boundary of the original large domain. Furthermore, it is important to ensure that the extensions for both ${}^{i}\tilde{m}_s^n$ and ${}^{i}\hat{m}_s^n$ are of exactly the same size. Let ${}^{i}\tilde{M}_k^n$ and ${}^{i}\hat{M}_k^p$ denote the DFT representations of ${}^{i}\tilde{m}_s^n$ and ${}^{i}\hat{m}_s^p$, respectively. Following the ideas presented earlier, it should be clear that $\Im^{-1}\left({}^{i}\tilde{M}_k^{n*} \odot {}^{i}\hat{M}_k^{p}\right)$ will produce an accurate count of the number of successes from the $i$th subdomain in finding vectors with local states $n$ and $p$ at the tail and the head of the vector, respectively. As before, these counts are only accurate for vectors smaller than the padding size used in ${}^{i}\tilde{m}_s^n$, which is our stated interest anyway. Once the [...] Note that the total number of trials (denominator in Eq. (6)) is the same as what we used before in the case of non-periodic domains and is given by $\left({}^{i}S_1 - |t_1|\right)\left({}^{i}S_2 - |t_2|\right)$, where $\left({}^{i}S_1, {}^{i}S_2\right)$ denote the grid size in the $i$th subdomain being studied, and $(t_1, t_2)$ denote the components of the vector for which the 2-point statistics are being computed. It is also important to note that the concepts of masking and modification for periodicity/non-periodicity can be combined with this scheme by making suitable adjustments to the algorithm, as described in the earlier example case studies.
As a demonstration, the scheme is applied to a 3-D (three-dimensional) micro-CT dataset obtained from a sample of reinforced polymer composite. A visualization of the entire dataset and an exemplar subdomain is shown in Fig. 6a, b. For this dataset, we have applied masks on the irregularly shaped overall domain and computed the non-periodic 2-point autocorrelations of the fiber phase. The computed autocorrelations are visualized as 3-D iso-contour surfaces in Fig. 6c. It can be observed that the fibers are predominantly aligned along the xy-plane, with a small angular margin confined within a flat, ellipsoidal region. There is also visible anisotropy in the in-plane distribution of the fiber orientations. Note that these topological features regarding the placement of [...]
Suitable trade-offs can be made between the execution speed and the memory usage for the computation on the large microstructure described above. This is accomplished using the partitioning strategy illustrated in Fig. 5: using more partitions reduces the memory requirements at the expense of increased overall computation time. Table 1 presents the time and memory cost comparisons for the 2-point autocorrelation calculation for the example dataset, for different partitioning window sizes (i.e., different memory requirements). For this case study, the partitioning window sizes were selected to correspond to commonly available memory choices. For example, at the current time, an average consumer laptop has 4 GB of DDR3 memory, while an average researcher desktop has 8 GB of DDR3 memory. All tests were run on a single personal machine with an i7-5820K CPU and 48 GB of DDR4 RAM, utilizing all available threads.
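The partitioning scheme described above can be illustrated with a minimal Python/NumPy sketch. It is not the released implementation: it is restricted to a 2-D rectangular domain without masks, and the function name `partitioned_two_point` and the parameters `tile` and `max_shift` are hypothetical names introduced here. For each subdomain it builds the simply padded array (the counterpart of ${}^{i}\tilde{m}_s^n$) and the array extended with the real neighborhood (the counterpart of ${}^{i}\hat{m}_s^p$), and accumulates the inverse DFT of their conjugate product. As a further assumption of the sketch, the summed counts are normalized once by the number of trials in the overall domain.

```python
import numpy as np


def partitioned_two_point(m_n, m_p, tile, max_shift):
    """Non-periodic 2-point statistics of a 2-D domain, accumulated tile by tile.

    m_n, m_p  : 2-D 0/1 indicator arrays for the tail state n and head state p
    tile      : (rows, cols) of each subdomain (hypothetical parameter)
    max_shift : largest vector component of interest, also used as padding width
    Returns f with f[t1 + max_shift, t2 + max_shift] for |t1|, |t2| <= max_shift.
    """
    S1, S2 = m_n.shape
    P = max_shift
    # Pad the head-state indicator once with zeros; slices of this array give
    # each tile its real neighbourhood, with zeros outside the overall domain.
    m_p_ext = np.pad(m_p.astype(float), P)
    shifts = np.arange(-P, P + 1)
    counts = np.zeros((2 * P + 1, 2 * P + 1))

    for r0 in range(0, S1, tile[0]):
        for c0 in range(0, S2, tile[1]):
            sub = m_n[r0:r0 + tile[0], c0:c0 + tile[1]].astype(float)
            m_tilde = np.pad(sub, P)               # simply padded tile
            h, w = m_tilde.shape
            m_hat = m_p_ext[r0:r0 + h, c0:c0 + w]  # tile plus real neighbours
            # Correlation theorem: the inverse DFT of conj(M_tilde) * M_hat counts
            # pairs with the tail in this tile and the head at tail + t.
            spectrum = np.conj(np.fft.fft2(m_tilde)) * np.fft.fft2(m_hat)
            corr = np.fft.ifft2(spectrum).real
            # Keep only vectors no larger than the padding; negative components
            # are stored at wrapped-around indices.
            counts += corr[np.ix_(shifts % h, shifts % w)]

    # Assumption of this sketch: normalize once by the number of trials in the
    # overall rectangular domain, (S1 - |t1|) * (S2 - |t2|).
    trials = np.outer(S1 - np.abs(shifts), S2 - np.abs(shifts))
    return counts / trials


# Usage on a small synthetic two-phase image (made-up data for illustration).
rng = np.random.default_rng(0)
phase = (rng.random((60, 45)) < 0.3).astype(int)
f = partitioned_two_point(phase, phase, tile=(16, 16), max_shift=5)
# The zero-vector entry of the autocorrelation equals the phase volume fraction.
print(np.isclose(f[5, 5], phase.mean()))
```

For brevity the sketch holds the whole zero-padded array in memory. For a dataset that does not fit in memory, each subdomain and its neighborhood would instead be read from storage as needed, so that peak memory is set by the padded tile size rather than by the full domain.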

Conclusions
We have presented a rigorous framework for quantification of the material microstructure using directionally resolved 2-point spatial correlations. The use and importance of FFTs for computationally efficient calculation of these spatial correlations have been discussed. Schemes to accommodate non-periodic boundaries, irregular grids, and very large datasets are detailed and demonstrated on simple datasets for maximum clarity. Finally, all schemes are demonstrated together on a very large, experimentally obtained 3-D microstructure dataset with an irregular grid and non-periodic boundaries. Algorithms developed and presented in this work are made available at http://dx.doi.org/10.5281/zenodo.31329.

Fig. 1 Illustration of the discretized microstructure, $m_s^n$. In this highly simplified microstructure, there are only two local states that are conveniently indexed by $n$, with $n = 1$ denoting the phase represented by white and $n = 2$ denoting the phase represented by gray. Example values of the microstructure signal are $m^{1}_{1,2}$ [...]

Fig. 2 Illustration of the computation and visualization of 2-point statistics while invoking the periodicity assumption. Left: the microstructure used in the computation. The actual microstructure, shown in the green box in the center, is extended by invoking the periodicity assumption. This extension is only for visualization purposes and allows us to see the use of the exact same sampling size for all vectors of interest in the microstructure domain. Right: the corresponding white-white autocorrelation map.

Fig. 3 Illustration of the padding strategy to compute the 2-point statistics using DFT representations while avoiding the errors associated with the implicit periodic boundary assumptions. The green box around the original microstructure is only for visualization.

Fig. 5 a Illustration of the partitioning strategy for computation of the 2-point statistics for a very large microstructure. b The padding strategy needed for different subdomains depending on where they appear in the original large microstructure.

Fig. 6 a A visualization of the entire polymer composite dataset. b A visualization of a partitioned section of the dataset for use with the memory-efficient calculation strategy described in this work. c Contour plots of the central axial planes of the calculated autocorrelations.

Table 1 Comparison of computation times and memory required for the naive computation and various choices of partition (patch) size for the memory-efficient procedure