Measuring concentration with Voronoï diagrams: the study of possible biases

In the context of turbulent flows laden with inertial particles, the accurate estimation of preferential concentration is particularly relevant. We have recently proposed to use Voronoï diagrams to estimate concentration fields from two-dimensional imaging techniques implemented around wind tunnel experiments. Due to various experimental biases, the relevance of such an analysis becomes questionable. In this paper, we show the robustness of the Voronoï analysis with respect to the three more important identified biases possibly present in such experiments.


Context and motivations
In the increasing number of studies on particle-laden turbulent flows, three aspects are usually pointed out in the case of inertial particles: the preferential concentration or clustering, the possible increase in collision rate and the enhancement of the particle settling velocity since all of them are of prime importance regarding the various practical applications (pollutant dispersion, rain formation, optimization of chemical reactors and so on). Preferential concentration plays a crucial role since it is clearly involved in two other aspects: the collision kernel is expressed as the product of the particles' radial relative velocity and the particle radial distribution function, the latter containing information related to the preferential concentration [9,17]. Regarding the settling velocity enhancement, which has been reported both numerically and experimentally [1,13,15], the commonly invoked mechanism is based on the fact that particles tend to preferentially explore the downward side of turbulent eddies. Nevertheless, Aliseda et al [1] have shown that collective effects should be involved to explain the further increase in settling velocity measured in their experiment when the seeding density is increased. They found that the fall velocity conditional on the local concentration linearly increases with the latter, suggesting the key role played by preferential concentration in this issue.
Properly simulating particle-laden flows requires an accurate model for the particle dynamics equation that is lacking so far: only the point particle limit is usually considered [5,6], and most of the simulations make further simplifications and often do not consider the backreaction of the particles on the fluid phase. Thus, experimental investigations are required to assess the preferential concentration problem and its consequences on the settling velocity and collision rate enhancement. We study in the present the possible biases arising from classical experiments regarding the quantification of preferential concentration.
The relevant parameters in this context are the particles' diameter or their Stokes number defined as the ratio of the particle viscous relaxation time to the dissipation time scale of the carrier flow (St = τ p /τ η ), the average seeding density and the Reynolds number. A first bias arises from the nature of the particles used: in air experiment, particles are usually water droplets generated from commercial or dedicated injectors that produce polydispersed particle populations leading to seeding consisting of a mixture of various Stokes number particles so that average Stokes numbers and corresponding uncertainties have to be defined.
Acquisitions of the concentration field require imaging techniques. If three-dimensional (3D) imaging is developing fast, it is still limited to the cases when too few particles (thousands) are present in the measurement volume to allow proper study of concentration issues; consequently, most of the work devoted to preferential concentration relies on 2D images obtained from cameras aiming at a particular region of the flow illuminated by a laser sheet whose thickness L th is of the order of 1 mm. In this case, experimentalists are studying a 3D phenomenon through projections into a 2D space and one may wonder how this projection affects the results and conclusions of their work.
Another bias arising from these imaging techniques is linked to the identification of the particles in the images. Typical water droplet experiments involve a mixture of particles whose diameters are in the range of 2-200 µm [1,7,10,11,14]. On a typical image, rough detection of particles is performed through thresholding of the grey level, a more precise position being obtained from sub-pixel accuracy techniques that are relevant when particles span several pixels in the images. The use of thresholds implies that many particles may not be detected and therefore the particle sets are artificially sub-sampled.
Whenever it is possible to clearly measure the coordinates of the particles, various methods can be implemented to estimate and/or accurately access the concentration field or its global properties: box counting methods, clustering index, correlation dimension, Minkowski functional, Voronoï diagrams and so on. In a recent review, we have tried to present the advantages and drawbacks of these various techniques in the field of particle-laden turbulent flows [8]. Here, we would like to focus particularly on the technical issues raised by the use of Voronoï diagrams when studying concentration fields of particles in turbulent flows. We have introduced the use of this tool in the field in a recent paper to analyse experimental data obtained in a wind tunnel [7]. In the present, we will use various 3D numerical simulations to question the influence of the three biases introduced above. Section 2 presents the Voronoï analysis, the former experimental results and the numerical simulations used here. Section 3 is dedicated to the study of the various biases we consider. We finally discuss our results and draw possible conclusions in section 4.

The data, post-processing and former results
As we aim to address the possible biases arising in experiments using 2D imaging methods, we present the experimental setup and summarize the main results presented in [7] that is representative of this family of experiments. We then describe the two numerical data sets we have employed to assess the importance of these biases. All of the analysis being carried out using Voronoï diagrams, we first recall the basics of this tool.

Post-processing and definitions
As explained in [7] and further justified in [8], the use of Voronoï diagrams is particularly relevant for studying preferential concentration of inertial particles in turbulent flows. Given a set of particles, the corresponding Voronoï diagram is the unique decomposition of the nD space into independent cells associated with each particle. One Voronoï cell is defined as the ensemble of points that are closer to a particle than to any other. The use of Voronoï diagrams is very classical to study granular systems and has also been used to identify galaxy clusters. Voronoï diagram computation is very efficient with the typical number of particles per set usually encountered (up to several hundred thousand). From their definition, it appears that the volume/area of a Voronoï cell is the inverse of the local concentration of particles; therefore the investigation of the Voronoï volume/area field is strictly equivalent to that of the local concentration field. In the following, we will deal with volume regardless of the space dimension. As the mean value of the Voronoï volumes is nothing but the average concentration, we always normalize these volumes by their mean value to define normalized Voronoï volumes that will be referred to as volumes and denoted by V throughout this paper.
In simulations as well as in experiments, several 2D or 3D particle fields are obtained at various instants. We present statistics obtained from ensemble averaging over several time samples. The probability density function (PDF) and standard deviation of the normalized Voronoï volumes can be calculated for experimental and numerical data sets and compared to those expected for uniformly distributed particles (see [4] for a presentation of those expectations). Voronoï volumes PDF may be used to identify clusters of particles as follows. Voronoï PDFs for a typical experiment/simulation and for a uniform random process (URP) intersect twice: for low and high values of the normalized Voronoï volume, corresponding respectively to high and low values of the local concentration, the PDF associated with inertial particles is above that of URP, while the opposite is observed for intermediate volume values. This is consistent with the intuitive image of preferential concentration: the inertial particles concentration field is more intermittent than the URP, with more probable preferred regions where the concentration is higher than the uniform case and subsequently also more probable depleted regions where the concentration is lower than in the uniform case. We consider the first intersection point V c as an intrinsic definition of particle clusters: for a given data set, Voronoï cells whose volume is smaller than V c are considered to belong to a cluster. It appears that cluster cells tend to be connected in groups of various sizes and shapes that we identify as clusters whenever they belong to the same connected component. We then analyse the geometrical structure of the identified clusters. More details are available in [7].

The original experiment
Here we present our former experimental setup and results that will be referred to as Laboratoire des Ecoulements Géophysiques et Industriels (LEGI) data.

Experimental setup.
Experiments are conducted in a large wind tunnel with a 0.75 m × 0.75 m square cross-section where an almost ideal isotropic turbulence is generated behind a grid whose mesh size is 7.5 cm. We can adjust the mean velocity from 3 to 15 m s −1 (the turbulence level remaining relatively low, of the order of 3% at the measurement location, and the anisotropy level between the transverse and longitudinal fluctuating velocities is smaller than 10%) and thus the Taylor micro scale Reynolds number R λ . Inertial particles are water droplets generated by four injectors placed in the convergent part of the wind tunnel, 1 m upstream of the grid to ensure a homogeneous seeding of the flow. According to the injection process, we are able to tune the average Stokes number and the average particle concentration C 0 . We always consider regimes of relatively low particles volume loading (volume fraction in our experiments is in the range 2.10 −6 < φ v < 3.10 −5 ) so that no turbulence modulation by two-way coupling is expected to occur. Acquisitions are performed using a Phantom V12 high-speed camera (Vison Research, USA) operated at 10 kHz and acquiring 12-bit images at a resolution of 1280 pixels ×488 pixels corresponding to a 125 mm (along x)×45 mm (along y) visualization window on the axis of the wind tunnel (covering slightly less than an integral scale in the vertical y-direction and almost two integral scales in the stream-wise x-direction), located 2.95 m downstream of the grid. The camera is mounted with a 105 mm macro Nikon lens opening at f /D = 2.8. An 8 W pulsed copper laser synchronized with the camera is used to generate a 2 mm (i.e. 3η-4η) thick light sheet illuminating the field of view in the stream-wise direction. Particles are identified on the recorded images as local maxima with intensity higher than a prescribed threshold. Subpixel accuracy detection is obtained by locating the particles at the centre of mass of the pixels surrounding the local maxima. More detailed descriptions of the experimental and acquisition setups are available in [7].

Experimental results.
Here, we briefly recall the main results obtained experimentally and presented in [7]. By systematically varying the triplet of parameters (St, R λ , C 0 ), we have shown that particle Voronoï volume distributions are always reasonably log-normal, so that preferential concentration can be quantitatively measured by a single scalar (the standard deviation of these distributions σ V ). By plotting σ V against the Stokes number, the maximum of heterogeneity of the concentration field for particles with Stokes numbers around unity has been successfully recovered. We have characterized cluster (and void) geometries and their inner concentration: the cluster areas/volumes (A c ) are algebraically distributed and their structure is fractal (their perimeter P c and their area A c are not linearly related); in particular, clusters do not appear to have any characteristic typical scale. The analysis of the particles' normalized concentration inside the clusters (C/C 0 ) revealed two new, so far unpredicted results: (i) the average particle concentration inside clusters depends on the global particle loading in a non-trivial way and (ii) after compensating for this particle loading dependence, the average concentration inside clusters exhibits a non-monotonic dependence on the Stokes number, with a maximum around unity values. Figure 1 gathers all these results.

Numerical simulations
Our goal in this paper is to use 3D direct numerical simulation (DNS) data to produce, as much as possible, sets of data that exhibit biases similar to those naturally present in experiments and to analyse them as experimental data to estimate the consequences of the considered biases for the measured results. We have benefited from two sets of numerical data produced by similar methods. The first set is available online at http://mp0806.cineca.it/icfd.php and has been used and presented in [3], and the second set provided by Goto has been used in [16]. They will be referred to as CINECA and GOTO data, respectively. Both of them consist of 3D DNS of isotropic homogeneous turbulence produced in periodic cubic boxes of size (2π) 3 . They employ the simplest model for small heavy particles [5,6]: where v p (t) and x p (t) are the particle velocity and position at time t and u(x, t) is the surrounding fluid velocity field at position x. The latter corresponds to a statistically homogeneous isotropic stationary turbulence field obtained a priori for an incompressible fluid and is used as a frozen forcing to solve equation (1) for various types of particles defined through  the Stokes drag coefficient τ p = 2ρ p a 2 /9µ, where ρ p and a are the particle density and radius, respectively, and µ is the fluid viscosity. The fluid velocity field u(x, t) is obtained by solving the Navier-Stokes equations: where ν and ρ f are the fluid kinematic viscosity and density, respectively, p(x, t) is the fluid pressure field and f is an external forcing implemented by fixing the amplitudes of Fourier modes in the low-wavenumber region. The numerical grids used set the associated Taylor micro scale Reynolds numbers R λ . The relevant parameters defining the simulations are provided in table 1. The number of particles in the simulations and the considered Stokes numbers vary from one simulation to another; they are also recalled in table 1.

Two-/three-dimensional bias
As mentioned in the introduction, most of the current concentration measurements carried out in particle-laden turbulent flows consist of 2D imaging of the particular region of the flow illuminated by a laser sheet whose thickness ranges from 0.5 to 2 mm. The consequence of such a technique is that particles living in a 3D space are projected onto a 2D slice. Although this may have little implication regarding particle image velocimetry since in this case the relevant parameter is the particle velocity that should not change much along the sheet thickness in most  of the applications, it turns out to be an issue when one deals with the particle concentration, since it may artificially increase the concentration locally (consider, for example, a case when particles would be organized in structures or clusters whose typical relative distance would be smaller than the laser thickness). In order to estimate the resulting bias, from the original DNS boxes of size (2π ) 3 , we have defined several slices whose surface along the first two coordinates is (2π) 2 , whose extension along the third coordinate is varied from L th = 2η to L th = 6η and which are centred at various positions between 0 and 2π. For each slice, we keep only the two first coordinates and calculate the associated Voronoï tessellations. This exactly mimics experimental 2D imaging of a 3D flow with usual laser sheets. Figures 2 and 3 present the resulting normalized Voronoï area distributions and the dependence of the Voronoï area standard deviation with the Stokes number for various slice thicknesses for GOTO and CINECA data. As to the experimental data, the PDFs of the Voronoï areas are wide, covering more than four decades. The ones obtained from GOTO are close to log-normal (as in the experimental data), while the ones obtained from CINECA are not. The shape of the ones obtained from GOTO changes when L th is increased, which is not the case for the ones obtained by CINECA. Nevertheless, all of them are well described by their standard deviation. When the slice thickness is increased, the standard deviation dependence on the Stokes number is qualitatively preserved over the available range of Stokes numbers. Nevertheless, the exact values of the standard deviation show a systematic increase with slice thickness. We report a maximal increase of 25% when the slice thickness is increased by 150% for both sets of numerical data. Surprisingly, the standard deviations obtained from GOTO data are twice as large as the ones obtained from CINECA data that are closer to the experimental one. The discrepancies between GOTO and CINECA data on the standard deviation values, on the PDF shapes and on their shape change with L th will be discussed when addressing subsampling issues in section 3.2.
A very interesting feature is that for both simulations, 2D and 3D post-processing lead to very similar quantitative results, which legitimates the experimental use of 2D slices to investigate a 3D phenomenon.

Sub-sampling bias
Another bias that may come from imaging techniques is an artificial sub-sampling of the data sets. As in any detection problem, two kind of errors are expected: wrong detections and missed detections (false positive and false negative in receiver operating characteristic curve vocabulary). The former overestimates the number of particles while the latter underestimates it. Usually, the signal processing chain implemented in such a case is designed so that there is no wrong detection and so that the number of particles found varies very slowly with the chosen threshold whenever this is possible. As a consequence, there are necessarily several if not many missed detections and experimental data sets are usually sub-sampled.
Sources of missed particles are numerous. One source is particles that remain in the shadow of others and another is due to the use of thresholds for particle detection that implies that some particles have to be assimilated to the noisy background. We believe that the latter is the main source of missed detection and this is for several reasons: (i) smaller particles appear less luminous than bigger ones, (ii) as the laser sheet intensity is decreasing toward its edges, particles in the centre of the sheet appear more luminous than the others, and (iii) due to the average seeding and to the laser sheet thickness, the resulting concentration on 2D images is not very high and the number of shadowed particles might be very small compared to the number of missed particles due to (i) and (ii).
To test the impact of this unavoidable sub-sampling, we have created extra data sets from data sets obtained with a given threshold, by keeping fewer and fewer particles from the original data set. Particles were removed by randomly picking them from the original data set with a uniform distribution. The same procedure has been applied to 3D DNS data. From an original data set consisting of N 0 particles, we produce new data sets with only α N 0 particles, α ∈ [0, 1] being the ratio of the kept particles.
Results are presented in figure 4. The top left figure shows the changes in the normalized Voronoï area PDF obtained for one experiment at St = 0.25 taken from LEGI data when the number of particles is artificially reduced down to 1% of the original number and the top right figure shows the associated standard deviation σ V as a function of the ratio of the particles kept. It is seen that the PDFs are not much affected for ratios down to 40% of the original number. This result is confirmed with the standard deviation that shows a change by less than 2% when the number of particles is decreased by more than a factor of 2. Note that when only 1% of the particles are kept, the resulting standard deviation corresponds to that of a uniform distribution of particles. The bottom left and right figures in figure 4, respectively, present the dependence of the Voronoï volumes standard deviation on the sub-sampling ratio and the Stokes number for all GOTO data associated with 2D slices of thickness 5.7η. As to the experimental data, the effect of sub-sampling is a monotonic decrease of the estimated Voronoï standard deviation. Whatever the value of the Stokes number, when a few per cent of the original particles are kept, the standard deviation is getting very close to the value expected for a URP. For small and intermediate values of the sub-sampling (say above 40%) the qualitative dependence upon the Stokes number is preserved while the absolute values are decreased by at most 25%, which is larger that what has been observed from the experimental data sets. It is worth noting that the plateau exhibited for sub-sampling values ranging from 40% to 100% in the experimental data is no longer present in GOTO data. When the sub-sampling is larger (values below 40%), the qualitative dependence of σ V on the Stokes number is affected and does not present a maximum anymore. The corresponding curves look similar to those obtained from CINECA data that were computed from much smaller particle samples (see table 1). Tagawa [12] found consistent values from another set of numerical data. The shift observed on the curve σ V = f (St) when half of the particles are removed has to be linked to results presented in 3.1: we noted that CINECA and GOTO data lead to different values of σ V and to different shapes for the V PDFs. The only difference between both simulations is in the number of particles, which is 125 times  larger in GOTO data. As a consequence, the use of Voronoï diagrams that do not involve any coarse graining leads to different scales of analysis. GOTO data allow probing at dissipative scales, while CINECA data are fully in the inertial range; as a consequence, both data sets lead to different results.
All the results presented in section 2.2.2 have been recovered from the sub-sampled experimental data sets. We illustrate this in figure 5, which presents the PDFs of the identified cluster areas for various sub-sampling ratios. PDFs are compensated for by A 2 c to evidence that the algebraic behaviour with a −2 exponent is always roughly preserved even if more than 50% of the particles have to be kept to observe the scaling over more than one decade. Direct visualizations of the non-compensated PDFs reveal that the departure from the −2 scaling law appears mostly in the tails, which shrink when the number of particles kept is decreased (see the inset).   6. Left: particle diameter PDF evolution with air pressure (varying from 2 to 5 bars) at a fixed water flow rate (1.2 l min −1 for each injector) and a fixed wind velocity (V 0 = 4.5 m s −1 ); note that PDFs are calculated from particle volume (and not particle number).

Mixing Stokes numbers
for each injector) and a fixed wind velocity (V 0 = 4.5 m s −1 ). Each of them is widely distributed over more than one decade and the estimation of a Stokes number implies to use an average or a most probable Stokes number defined from the average or the mode of these PDFs (we actually used the mode in [7]). In both cases, because of the high polydispersity, the standard deviation σ St of Stokes number (based on measured diameters distribution), which could be interpreted as an error bar for the Stokes number estimation, is large (σ St /St easily exceeds 50%). The same happens in any wind tunnel experiment seeded with liquid droplets [1,10] and one may wonder how this polydispersity impacts the dependence on the Stokes number of the various quantities measured in experiments. To tackle this issue, we have built a polydisperse data set from GOTO data projected on 2D slices.
In Goto's work, each Stokes number data set was computed using the same fluid DNS (see section 2.3). As a result, gathering particle fields of various Stokes numbers makes sense and properly mimics polydispersity. For each experimental condition, we have estimated a particle diameter distribution similar to those shown in figure 6 (left). Using the definition of the Stokes number used in this paper, these PDFs can be expressed in terms of the Stokes number rather than in the particle diameter. In Goto's work, eight sets of particles associated with eight values of the Stokes number have been simulated. To mimic one particular experiment, we pick randomly a certain number of particles from each of these eight data sets. The relative proportion taken from each of them is given by the presented experimental particle diameter PDF. Figure 6 (right) presents σ V as a function of St for single Stokes data (see figure 3) and for mixed data built as described above. In spite of the short range of Stokes numbers covered by the polydispersed data, we can see that quantitative values are changed by less than 15%. The qualitative behaviours of these two curves are close, even though the one associated with single Stokes number data presents a maximum while the other does not. Nevertheless, this maximum relies only on the point at St = 10, and in the range where both curves are present, the two behaviours are similar.

Discussion and conclusions
2D/3D biases studied in section 3.1 and summarized in figure 3 have been shown to affect less the obtained results. Qualitative results are always recovered when considering slices of thickness L th 2η − 6η. Quantitatively, the estimated values of σ V are not much affected in this range of slice thickness. This is encouraging regarding the reliability of the experimental results obtained from 2D acquisitions performed in particle-laden turbulent flows.
The study of sub-sampling experimental and numerical data shows that none of the so far obtained results are affected qualitatively or quantitatively when half of the detected particles are removed from the original data. This demonstrates the robustness of the Voronoï analysis with respect to the almost unavoidable missed detections occurring when processing experimental data. On the other hand, the impact of sub-sampling seems slightly more important for GOTO data than for LEGI data and we noted several discrepancies between GOTO and CINECA data. We may wonder about the reasons for these important differences. Considering the values reported in table 1, it is seen that the ratio between λ, the Taylor micro-scale, and η, the Kolmogorov scale, is larger for GOTO data. As a consequence, when sub-sampling the data from the experiment, we actually probe scales fully included within the inertial range, whereas when sub-sampling in GOTO data we probe scales ranging from the dissipative to the deep inertial range. A similar reasoning can be applied to the global seeding (N 0 ) differences between GOTO and CINECA data. Yoshimoto and Goto [16] have shown that preferential concentration is self-similar within the inertial range, while Bec et al [2] insisted on the crucial differences between the mechanisms involved to achieve preferential concentration in the dissipative and the inertial range. The differences observed here between the numerical and the experimental data on the one hand and between GOTO and CINECA data on the other hand could thus be explained from these considerations about the involved scales and the global seeding of the flow. This is also consistent with the self-similarity observed in our experimental results [7]. Similarly, differences observed between GOTO and CINECA data can also be understood from this scale argument and help us formulate the obvious warning that one should be very careful about the scales involved in an experiment or in a simulation according to the particle seeding density: changing the seeding density (or the number of particles used for Voronoï calculations) has an impact on the scales that can be probed.
Regarding the polydispersity problem, we have also shown that the qualitative behaviour of σ V as a function of St is preserved and that the associated absolute values are weakly affected by polydispersity. This shows that the maximum enhancement of preferential concentration observed around St 1-2 is a very robust effect that is even recovered when various types of particles are mixed together.
Even though the present study does not address all possible biases arising from imaging measurement of concentration in particle-laden flows, we have considered the more relevant ones in the context of Voronoï analysis. A similar work could be undertaken for the other classical preferential concentration estimators such as pair correlation or box counting methods for example.