Improved three-dimensional localization of multiple small objects in close proximity in digital holography

Using intensity gradient- or sparsity-based focus metrics, the ability to accurately localize the three-dimensional (3D) position of a small object in a digital holographic reconstruction of a large field of view is hindered in the presence of multiple nearby objects. A more accurate alternative method for 3D localization, based on evaluation of the complex reconstructed volume, is proposed. Simulations and experimental data demonstrate a reduction in depth positional error for single objects and a notably improved axial resolution of multiple objects in close proximity.


INTRODUCTION
Digital holography has previously been used to identify the size and three-dimensional (3D) position of small objects, such as the dispersion of diesel and oil droplets [1,2] and marine plankton [3][4][5]. By extending the field of view laterally to record a larger volume, it is anticipated that similar techniques could be used to capture long time series tracks of multiple mosquitoes while in flight around a supine human, protected by an insecticidal bed net.
The mosquitoes of interest are responsible for transmission of a number of tropical diseases, particularly malaria, against which insecticidal bednets are the most effective method of control [6]. Previous work on two-dimensional (2D) flight reconstructions demonstrated the ability to quantify discrete mosquito behaviors [7], from which improved bednets were designed and validated [8]. Evaluation of bednet modes of action and efficacy has also been conducted in lab-scale assays, e.g., a 100 × 100 × 100 mm cubic chamber [9]. 3D tracking is necessary to correct the unknown displacement in the third spatial axis, and holographic imaging offers the potential to give detailed reconstruction of mosquito position and orientation. However, appropriate processing techniques are needed to resolve the multiple mosquitoes that can be present and to form a robust 3D metrology solution.
Non-holographic methods for extracting the 3D position of small objects often involve stereo-pair imaging techniques as often used in 3D particle image velocimetry (PIV) [10]. In light-sheet-based stereo PIV, the requirement to localize individual tracer particles is removed through calculation of average displacements at a matrix of interrogation regions in each camera that are combined to give an array of three-component displacement vectors across a plane [11]. Full tomographic solutions have been implemented to give three-component velocity vectors in 3D space, but typically require multiple cameras that are angularly well separated. In both cases careful camera and in situ calibration of the setup are needed to accurately determine the mapping between camera and global co-ordinate systems [12]. Similar techniques for 3D tracking of swarming mosquitoes have been developed by Butail et al. in the field using a stereo-pair imaging setup [13,14]. However, the inherent problem with matching stereo image pairs resulted in short tracks (5-7 s) that required human supervision to combine into long time-series data. Other approaches have modified a retro-reflective imaging setup [15] with a small separation between light source and camera to give a quasi-stereo setup where a mosquito's secondary shadow on the retro-reflective screen and its primary shadow on the camera form a stereo image pair [16]. The approach is challenging as missing data points can occur due to the poor shadow-background contrast. In each of the above techniques, non-occluded access to the measurement space is needed and with angularly well-separated cameras (in all but the last approach [16]).

Research Article
For mosquito behavior monitoring, the interaction of the mosquitoes with walls and nets is critically important as these are regions where interventions can be applied, e.g., insecticides. Consequently, the use of single-view digital holography with collimated illumination is particularly attractive, and the digital re-focusing of the mosquito images gives the potential for 3D tracking of mosquito flight. However, it is well known that the inherent depth-of-focus problem in digital holography results in a much poorer axial locational accuracy than the lateral resolution when recording scenes with small objects. The axial distance over which a particle of diameter $d$ can be viewed as "nearly in focus" is given by the proportional relationship $z \propto d^2/\lambda$ [17]. The limited pixel pitch of a charge-coupled device (CCD) makes the reduction of this problem using physical methods difficult in digital holography [18][19][20]. Instead, numerical methods to calculate a plane of best focus from a reconstructed volume have been pursued, resulting in many algorithms for the automatic processing of holograms to refocus a scene or provide accurate 3D positional metrology of small objects.
When an object covers several pixels, edge detection methods have been used to locate its axial position by finding the plane with the maximum or total sum of the edge contrast [21][22][23], gradient [24,25], Laplacian [26], or intensity variance [27,28].
Examining the phase or complex value reconstructed volume as a means of localization has also been explored. According to Pan and Meng, the reconstruction of a point-like real object features a minimum at the object-depth location when the imaginary part of the complex reconstruction is plotted along the optical axis [29]. Yang et al. also observed a characteristic change in sign of the phase at this same axial position [30], and De Jong characterized a "particle signature function" in the complex reconstruction in 2007 to find the in-focus plane of a particle [31]. More recently, Ohmans calculated the planar wavefront curvature of a particle to determine its axial position [32].
More mature methods of autofocusing a hologram recording of a scene exist, where the sparsity or energy distribution of an image is quantified. Dubois et al. suggested calculating the energy of the reconstructed volume in multiple planes to identify the plane of best-focus [33]. Memmolo et al. suggested the use of a Tamura coefficient to identify the plane of best-focus in 2011 [34], and later compared this to the use of a metric called Gini's Index [35]. Zhang et al. provided a robust method for holographic autofocusing, based on the sparsity of the absolute gradient of the complex optical wavefront, which they called the Gini of the Gradient and the Tamura of the Gradient [36] and compared these to several well-known autofocusing metrics proposed by Langehanenberg in 2008 [37]. Several other autofocus methods in digital holography are given in [38][39][40][41], some of which were compared in Zhang's paper [36].
The majority of these papers propose robust focus metrics for a single object inside a single interrogation region or for point-like sources. However, the effect of multiple objects in close proximity inside a single interrogation region on z-axis localization has not been widely reported. That being said, Ren et al. used image entropy as a best-focus metric in 2015 [42], and Jiao examined the automation and detection of multiple objects in a single interrogation region by separating the objects using image segmentation in 2017 [43]. In an investigation into sea plankton, Dyomin also suggested that two peaks were present inside a Tenengrad coefficient focus metric graph for two objects in a single interrogation region [4]. However, in both papers, the objects were well-separated laterally (no X Y overlap of objects), and the lateral sampling resolution or "effective pixel size" was high (≈5−7 µm per pixel in the object space).
In this paper, the "particle signature function" of De Jong (2007) [31] is adapted for use with larger non-particle objects ($d_o$ = 0.4−8 mm) and applied as a focus metric. The proposed method is analyzed alongside a typical edge-gradient-based focus metric [25] and a sparsity-based focus metric [36,44], which are both shown to encounter issues in localization when there are multiple objects inside a single interrogation region. Analysis of the focus metric proposed in this paper reveals improvements in localization accuracy of a single object, and large improvements in the ability to resolve two separate objects inside a single interrogation region.
The major factors contributing to the localization error of a single object within a single interrogation region are discussed in Section 2, and a simulated comparison between the three focus metrics is performed to assess algorithm performance and examine the focus metric curves with respect to depth of a single object. The proposed focus metric maintained a relatively small error compared to the other focus metrics at lower imaging resolutions (i.e., larger effective pixel size in mm/pixel), implying that a larger volume could be recorded, while maintaining the same level of accuracy for a given CCD array. Section 3 introduces the factors contributing to total localization error of multiple objects inside a single interrogation region. Simulations are performed to examine the effects of lateral and axial separation of the two objects on localization accuracy. The proposed focus metric offers considerable advantages in the axial resolution of multiple objects in close proximity compared to the other focus metrics. Experimental validation of the use of the proposed focus metric is provided in Section 4, which demonstrates a reduction in localization error and improved axial separation of multiple objects inside a single interrogation region. The proposed metric is shown to be capable of resolving two objects axially when the objects are overlapping laterally, and in cases when the edge-gradient-based method failed to resolve the two separate objects in the z-axis.

A. Contributing Factors
There are several contributing factors to the localization error when determining the z-axis position of a single object within an interrogation region using edge-based focus metrics, namely the interrogation region size, effective pixel size (EPS), and object size.
The interrogation region size relative to the object size determines how much of the defocused object wave and twin-image wave is involved in the calculation of the focus metric, as well as how much of the background image is used. The edge of the region must sit outside the bounds of the object to adequately capture the object edge pixels, although an interrogation region that is too large can have an adverse effect on localization accuracy when using sparsity criteria as a focus metric [45]. In the case of multiple objects in close proximity, the window size will also affect how much of the second object diffraction pattern is involved in the calculation.
The effective pixel size (EPS, $x_e$), also known as lateral image spatial resolution or sampling resolution, is defined as the recorded volume transverse distance ($x_r$, $y_r$) divided by the number of pixels on the recording device ($M$, $N$), so that $x_e = \min(x_r/M,\, y_r/N)$. The EPS is increased by using fewer pixels to record a given volume, or by increasing the recorded volume for a given number of pixels. The demagnification factor of a two-lens system between the object and camera is used to alter the EPS. The effect of $x_e$ on localization error is twofold: it determines the sampling at which the diffracted continuous object wave signal is discretized and stored during recording [Figs. 1(a) and 1(b)], and it determines the number of pixels over which the focus metric is evaluated in the reconstructed volume.
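As a minimal sketch of the EPS definition above, the following Python functions compute $x_e$ either from the recorded transverse extent and pixel count, or from the sensor pixel pitch and demagnification factor. The function names and the example pixel counts are hypothetical; the relation EPS = pixel pitch × demagnification is inferred from the values quoted later in the paper (5.86 µm pitch, 10× demagnification, 58.6 µm EPS).

```python
def effective_pixel_size(x_r, y_r, M, N):
    """EPS from the recorded transverse extent (x_r, y_r) and the
    pixel counts (M, N), following x_e = min(x_r / M, y_r / N)."""
    return min(x_r / M, y_r / N)


def eps_from_demagnification(pixel_pitch, demagnification):
    """EPS in object space for a sensor of given pixel pitch viewed
    through a lens system with the given lateral demagnification."""
    return pixel_pitch * demagnification


# Hypothetical 1000 x 1000 pixel sensor recording a 58.6 mm square region.
x_e = effective_pixel_size(58.6e-3, 58.6e-3, 1000, 1000)
x_e_lens = eps_from_demagnification(5.86e-6, 10)
```

Both calls give an EPS of approximately 58.6 µm, the value used for most of the analysis in this paper.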
For an object larger than a point source, diffracted light becomes a summation of multiple point sources around the edge of the object following Huygens' principle. The absolute object size therefore determines the diffraction rate of the object wave with respect to the axial direction (i.e., how quickly the twin and primary image defocus). A smaller object has a higher diffraction angle so that the "effective numerical aperture" of the object diffraction is higher [ Fig. 1(d)]. The relative object size (d o / x e ) and demagnification factor also determines the number of pixels over which the object is reconstructed, and therefore dictates the pixel averaging when determining reconstructed intensities in the calculation of the focus metric.

B. Edge-Gradient-based Focus Metric
A combination of the methods used by İlhan et al. [24,25,27] was employed as a baseline edge-gradient-based focus metric. A Sobel gradient ($\nabla G$) was calculated by the convolution of a reconstructed plane intensity matrix and the 3 × 3 Sobel operator. The edge gradient focus metric ($F_G$) was calculated as the sum of intensity gradient variance, given by

$$F_G = \sum_{m}\sum_{n}\left[\nabla G(m,n) - \overline{\nabla G}\right]^2, \tag{1}$$

where $\nabla G(m,n)$ is the Sobel gradient value for each pixel, and $\overline{\nabla G}$ is the mean gradient over the interrogation region.
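The edge-gradient metric described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the implementation used for the results: the gradient is taken as the magnitude of the horizontal and vertical 3 × 3 Sobel responses, the image border is handled by edge replication, and the function names are hypothetical.

```python
import numpy as np


def sobel_gradient_magnitude(intensity):
    """Magnitude of the 3 x 3 Sobel gradient of a 2D intensity image.
    Assumption: border pixels are handled by edge replication."""
    I = np.pad(np.asarray(intensity, dtype=float), 1, mode="edge")
    # Horizontal response: right column minus left column, weights 1-2-1.
    gx = (I[:-2, 2:] + 2 * I[1:-1, 2:] + I[2:, 2:]) \
       - (I[:-2, :-2] + 2 * I[1:-1, :-2] + I[2:, :-2])
    # Vertical response: bottom row minus top row, weights 1-2-1.
    gy = (I[2:, :-2] + 2 * I[2:, 1:-1] + I[2:, 2:]) \
       - (I[:-2, :-2] + 2 * I[:-2, 1:-1] + I[:-2, 2:])
    return np.hypot(gx, gy)


def edge_gradient_metric(intensity):
    """Sum over the interrogation region of the squared deviation of the
    Sobel gradient from its mean."""
    g = sobel_gradient_magnitude(intensity)
    return float(np.sum((g - g.mean()) ** 2))
```

Evaluating `edge_gradient_metric` on the intensity of each reconstructed plane inside the interrogation region gives a curve against z whose maximum is taken as the plane of best focus.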

C. Sparsity-Based Focus Metric (Tamura of Intensity)
The Tamura Coefficient (TC) of the intensity reconstruction was used as a baseline sparsity-based focus metric. Tamamitsu et al. compared the TC and the Gini Index as focus metrics in 2017 and concluded that the TC offers more flexibility in choosing a larger interrogation region (important in making a fair comparison in Section 3) and is less susceptible to background noise, particularly in naturally sparse samples [45]. The TC was applied to intensity reconstructed planes of the amplitude-only objects, as the resulting intensity images typically exhibited lower background noise levels than the gradient images, particularly in the experimental data presented in Section 4. The TC of an intensity image, $I$, is given by

$$F_{TC} = \sqrt{\frac{\sigma(I)}{\bar{I}}}, \tag{2}$$

where $\sigma(I)$ and $\bar{I}$ are the standard deviation and the mean intensity of an image, respectively.
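A minimal sketch of the Tamura coefficient, assuming the standard form (square root of the ratio of standard deviation to mean) and a strictly positive mean intensity; the function name is hypothetical.

```python
import numpy as np


def tamura_coefficient(intensity):
    """Tamura coefficient of an intensity image: sqrt(std / mean).
    Assumes the mean intensity is strictly positive."""
    I = np.asarray(intensity, dtype=float)
    return float(np.sqrt(I.std() / I.mean()))
```

A uniform image gives a coefficient of zero, while an image whose energy is concentrated in a few pixels gives a larger value, which is why the metric behaves as a sparsity measure.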

D. Proposed Particle Signature Function Based Focus Metric
The majority of phase- or complex-value-based autofocus methods involve finding a minimum or maximum in the calculated one-dimensional (1D) focus metric through the center of a particle in the axial direction. De Jong described a 1D "particle signature function" ($Y$) [31], defined in Eq. (3) in terms of $A_z$, the complex-valued reconstructed wavefront, and $dA/dz$, the axial gradient of this wavefront with respect to $z$. The variation of this metric through the center of the particle indicated the z-axis position by a sharp zero-crossing in the real part of $Y$ and a minimum in its imaginary part. The diffraction pattern of a larger object is the superposition of multiple point sources around the edge of the object, and so a 1D axial plot of $Y$ through the object center does not reveal the z-position in the same way as in [31]. The particle signature function was therefore altered to make it suitable to locate larger objects ($d$ = 0.4−8 mm). A pixel-wise particle signature function for the reconstructed volume was calculated using Eq. (3). A large number of highly negative pixel values around the edge of the object at the plane of best focus was expected in the imaginary part of the particle signature function, as shown by Fig. 2 and in accordance with the theory that considered point source objects in De Jong's paper. The imaginary volume was therefore thresholded such that only values of $\mathrm{Im}(Y) < 0$ were retained. The sum of the squared values of $\mathrm{Im}(Y)$ was then calculated by

$$S_Y(z) = \sum_{m}\sum_{n}\left[\mathrm{Im}\,Y(m,n,z)\right]^2, \quad \mathrm{Im}\,Y(m,n,z) < 0. \tag{4}$$

A sharp maximum appears at the object axial position when this metric is plotted along the z-axis. Taking the second-order differential of this curve with respect to $z$ therefore produces a sharper negatively valued minimum. This value is then thresholded to retain values of $d^2 S_Y/dz^2 < 0$, and is then multiplied by −1 to produce a positively valued particle signature function metric ($F_Y$), given by

$$F_Y(z) = -\frac{d^2 S_Y}{dz^2}, \quad \frac{d^2 S_Y}{dz^2} < 0. \tag{5}$$
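The steps from the pixel-wise particle signature function to the final metric can be sketched as follows. The sketch takes the complex volume $Y$ as an input, however it was obtained from Eq. (3), and only illustrates the thresholding, summation and second-derivative steps described in the text. The function name, the array layout (planes along the first axis) and the central-difference approximation of the second derivative are assumptions.

```python
import numpy as np


def particle_signature_metric(Y, dz):
    """Return (S_Y, F_Y) for a complex volume Y of shape (n_z, M, N)
    reconstructed at a uniform plane spacing dz."""
    im = np.imag(np.asarray(Y))
    # Retain only negative values of Im(Y); everything else is set to zero.
    neg = np.where(im < 0, im, 0.0)
    # Sum of squared retained values in each reconstructed plane.
    S = np.sum(neg ** 2, axis=(1, 2))
    # Second derivative along z by central differences (end planes left at 0).
    d2S = np.zeros_like(S)
    d2S[1:-1] = (S[2:] - 2.0 * S[1:-1] + S[:-2]) / dz ** 2
    # Keep negative curvature only and flip the sign to get a positive peak.
    F = np.where(d2S < 0, -d2S, 0.0)
    return S, F
```

The plane index at which `F` peaks, multiplied by the plane spacing and offset by the first reconstruction distance, gives the estimated axial position of the object.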

E. Focus Metric Example of a Single Object
The edge-gradient-based focus metric (F G ), the TC focus metric (F TC ), and the proposed particle signature function metric (F Y ) are plotted in Fig. 2 to demonstrate how the focus metrics identify the z-axis position of a single simulated object.
The focus values have been scaled between [0,1] for clarity. The object was simulated as a d o = 2.4 mm sphere, placed z prop = 1.40 m away from the back-focal plane of a telecentric two-lens system, with an effective pixel size in the object space of EPS = 58.6 µm. The volume was reconstructed in increments of z inc = 0.25 mm from z = 1.37 m to z = 1.43 m. The simulated hologram dataset to produce Fig. 2, as well as the holograms used to produce all subsequent figures in the paper are available (see Ref. [46]).

F. Performance Characteristics
The performance measures used for comparison were the z-axis localization error ($z_{\mathrm{err}}$) and peak prominence (Q-value, $Q_F$) of the focus metric versus z-axis graphs. The z-axis localization error was defined as the absolute difference between the known z-axis position of the object input into the simulation ($z_{\mathrm{pos}}$) and the calculated position using the peak of the focus metric curve ($z_{\mathrm{rec}}$). The Q-value was calculated as the normalized peak height, $\max(F)$, divided by the full width at half maximum, $\mathrm{FWHM}(F)$:

$$Q_F = \frac{\max(F)}{\mathrm{FWHM}(F)}. \tag{6}$$

G. Simulation Results

Parameters and Setup
A large matrix size was chosen (10800 × 10800) to closely resemble a continuous non-discretized diffraction pattern solution to the wave propagation of a small object. Object sizes in the range d o = 0.4 − 8 mm in increments of 0.2 mm were simulated. The incident field (E i ) was propagated by z = 1.40 m from the object to the hologram plane to give a diffracted field (E o ) using a Fresnel transfer function method [47]. A large propagation distance helps alleviate the twin-image problem, as the primary and twin image are equidistant on either side of the hologram plane. Therefore, a larger propagation distance from the object to hologram plane produces a more defocused twin image at the plane of best-focus of the reconstructed object, meaning the twin image will have a smaller contribution to the focus metric calculations.
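For illustration, a Fresnel transfer function propagator of the kind referred to above can be sketched with NumPy FFTs. This uses the standard paraxial transfer function $H(f_x, f_y) = e^{jkz}\exp[-j\pi\lambda z(f_x^2 + f_y^2)]$; the function name, the square sampling pitch and the wavelength in the example (785 nm, taken from the experimental setup, since the simulation wavelength is not stated here) are assumptions.

```python
import numpy as np


def fresnel_tf_propagate(field, wavelength, dx, z):
    """Propagate a sampled complex field by a distance z (negative z
    back-propagates) using the Fresnel transfer function.
    dx is the sampling pitch, assumed equal in x and y."""
    M, N = field.shape
    fx = np.fft.fftfreq(N, d=dx)          # spatial frequencies along x
    fy = np.fft.fftfreq(M, d=dx)          # spatial frequencies along y
    FX, FY = np.meshgrid(fx, fy)          # both of shape (M, N)
    k = 2 * np.pi / wavelength
    H = np.exp(1j * k * z) * np.exp(-1j * np.pi * wavelength * z * (FX ** 2 + FY ** 2))
    return np.fft.ifft2(np.fft.fft2(field) * H)


# Example: an opaque disc in a unit plane wave, propagated by 1.40 m.
n = 64
yy, xx = np.mgrid[:n, :n]
obj = np.where((yy - n // 2) ** 2 + (xx - n // 2) ** 2 < 20 ** 2, 0.0, 1.0).astype(complex)
holo_field = fresnel_tf_propagate(obj, 785e-9, 58.6e-6, 1.40)
```

Because the transfer function has unit modulus, propagating by $-z$ exactly undoes propagation by $+z$ for the complex field; the twin image only arises once the phase is discarded by recording intensity.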
The hologram was resized to simulate demagnification through a telecentric two-lens system by taking the sum of adjacent complex-valued pixels and then calculating the absolute values to produce an intensity pattern as per a CCD recording of an in-line hologram. The number of adjacent pixels used in the resizing was determined by the desired EPS to be simulated, and the pixel values were rescaled and rounded in the range 0-4095 to represent a 12-bit CCD intensity recording. The resized intensity hologram was back-propagated using negative z-distances to multiple planes in increments of z inc = 0.25 mm using the Fresnel transfer function method to reconstruct the object volume in the range z = 1.20 − 1.60 m.
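The resizing and quantization step can be sketched as below. Following the description above, adjacent complex pixels are summed in square blocks and the absolute value is then taken. Scaling so that the brightest pixel maps to the top of the 12-bit range is an assumption, as are the function and parameter names.

```python
import numpy as np


def record_hologram(field, bin_factor, bit_depth=12):
    """Sum bin_factor x bin_factor blocks of a complex field, take the
    absolute value, then rescale and round to integer sensor counts."""
    M, N = field.shape
    m, n = M // bin_factor, N // bin_factor
    # Crop so the array divides evenly into blocks, then sum each block.
    cropped = field[:m * bin_factor, :n * bin_factor]
    binned = cropped.reshape(m, bin_factor, n, bin_factor).sum(axis=(1, 3))
    amplitude = np.abs(binned)
    # Assumption: the maximum value is mapped to the largest count.
    levels = 2 ** bit_depth - 1
    counts = np.round(amplitude / amplitude.max() * levels)
    return counts.astype(np.uint16)
```

With a simulation grid sampled more finely than the target EPS, `bin_factor` is the ratio of the EPS to the grid pitch, so a larger EPS corresponds to summing more adjacent pixels.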

Window Size and Effective Pixel Size (Imaging Resolution)
Window sizes ranging from 1.0−2.5× the object diameter in increments of 0.1× were examined for a range of EPS and object sizes. It was found that the window size had little effect on the localization error or Q-value of the edge-gradient-based or proposed particle signature function focus metric for window sizes larger than the object diameter. A window size of 1.2× was chosen to ensure that the window fell outside the bounds of the object while keeping the matrix small, which minimizes both the computation time for the focus metrics and the adverse effect on localization accuracy when using sparsity criteria as a focus metric [45].
The EPS, or imaging resolution, has a considerable effect on the z-axis accuracy of the focus metric. The EPS values assessed were in the range 23 µm < $x_e$ < 123 µm, corresponding to demagnification factors of 4−21× through a two-lens system onto a CCD with pixel size of 5.86 µm. Figure 3 shows how the error and Q-value of the focus metric curve vary with the effective pixel size for an object size of $d_o$ = 2.4 mm. A marked increase in localization error is seen for the gradient-based metric when $x_e$ > 58.6 µm. The average localization error across all EPS values was $\bar{z}_{\mathrm{err}}$ = 5.30 mm, 1.32 mm, and 0.95 mm for the edge gradient, TC, and particle signature function focus metrics, respectively. The average localization error for larger EPS values in the range 70 µm < $x_e$ < 123 µm was $\bar{z}_{\mathrm{err}}$ = 10.10 mm, 2.50 mm, and 1.50 mm for the edge gradient, TC, and particle signature function focus metrics, respectively. These results indicate that the proposed particle signature function has considerable benefits for 3D metrology across all EPS values, but particularly at higher EPS values, e.g., when the recorded volume needs to be maximized. The Q-value, averaged across all EPS values, was notably higher for the particle signature function method ($Q_{F_Y}$ = 324.98) compared to the TC ($Q_{F_{TC}}$ = 69.87) or the edge-gradient-based method ($Q_{F_G}$ = 49.25), although the downward trend as EPS increases is more apparent in the particle signature function focus metric.
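The two performance measures reported above can be extracted from a sampled focus metric curve as in the following sketch. The curve is scaled to [0, 1], the peak gives the recovered position, and the full width at half maximum is found by linear interpolation between samples. The interpolation scheme and the function names are assumptions, and the Q-value takes the inverse units of whatever units are used for z.

```python
import numpy as np


def localization_error(z_pos, z_rec):
    """Absolute difference between known and recovered axial positions."""
    return abs(z_pos - z_rec)


def peak_and_q_value(z, F):
    """Return (z_rec, Q) for a single-peaked focus metric curve F(z)."""
    z = np.asarray(z, dtype=float)
    F = np.asarray(F, dtype=float)
    Fn = (F - F.min()) / (F.max() - F.min())   # scale to [0, 1]
    k = int(np.argmax(Fn))
    half = 0.5 * Fn[k]
    # Walk outwards from the peak to the first samples at or below half height.
    left = k
    while left > 0 and Fn[left] > half:
        left -= 1
    right = k
    while right < len(Fn) - 1 and Fn[right] > half:
        right += 1

    def crossing(i, j):
        # Linear interpolation of the z value where the curve equals `half`.
        return z[i] + (half - Fn[i]) * (z[j] - z[i]) / (Fn[j] - Fn[i])

    z_left = crossing(left, left + 1) if Fn[left] <= half else z[left]
    z_right = crossing(right - 1, right) if Fn[right] <= half else z[right]
    return z[k], Fn[k] / (z_right - z_left)
```

A narrow, well-defined peak gives a small FWHM and therefore a large Q-value, which is the sense in which Q measures peak prominence here.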

Object Size
A window size of 1.2× the object diameter ($d_o$) was chosen along with an EPS value of 58.6 µm, which was based on where the marked increase in localization error occurred in the previous section for the edge-gradient-based metric. This EPS value was chosen as it was the largest EPS value that yielded a similarly acceptable level of axial localization accuracy ($z_{\mathrm{err}} \approx 2d_o$ for $d_o$ = 2.4 mm) for all focus metrics, making subsequent analysis in Section 3 fairer. The largest EPS value that maintains an acceptable level of localization accuracy is desirable, as it maximizes the volume that can be accurately analyzed for a given CCD array. The volume was reconstructed in z-axis increments of $z_{\mathrm{inc}}$ = 0.25 mm. Object sizes in the range $d_o$ = 0.4−8 mm in increments of 0.2 mm were plotted against the z-axis localization error and Q-value in Fig. 4, which shows the effect of object size on localization error and Q-value for each of the focus metrics. Figure 4(a) demonstrates that the localization error was largely independent of object size. However, the mean localization error across all object sizes was $\bar{z}_{\mathrm{err}}$ = 1.45 mm, 0.56 mm, and 0.40 mm for $F_G$, $F_{TC}$, and $F_Y$, respectively, demonstrating a reduction in localization error for the proposed focus metric. The Q-value graph in Fig. 4(b) also indicates an independence from object size. The Q-values across all object sizes were notably larger for the proposed focus metric, with an average Q-value of $Q_{F_Y}$ = 254.21, compared to average Q-values of $Q_{F_{TC}}$ = 58.18 and $Q_{F_G}$ = 57.78 for the other focus metrics.

LOCALIZATION ERROR AND MULTIPLE OBJECTS
A. Contributing Factors
The EPS has a considerable impact on the localization accuracy due to the reasons specified in Section 2A. The continuous diffracted wavefront to be sampled, however, is now the superposition of the waves emanating from the two individual objects. For fair comparison of the focus metric localization methods, an EPS of 58.6 µm was chosen for analysis in the following section as the z-axis localization error was similarly acceptable for a single object as outlined in Section 2G.3.
The normalized lateral X Y separation of the objects (OS xy ) is defined as the lateral distance between the center of the two objects (x s ) divided by the object diameter (d o ). For example, OS xy = 1 means that if the objects were on the same X Y plane, their edges would be touching. At a given hologram distance and axial separation of the two objects, a lower OS xy value means that a higher proportion of the recorded wavefront is overlapping at the hologram plane. The axial z-separation of the objects is defined as the axial distance between the two objects (z s ), as shown in Fig. 5.

B. Simulation Setup
The hologram was created by propagating from the furthest object from the hologram plane to the nearest object. The wavefront was then zeroed in the lateral coordinates of the closer object to simulate blocking, and then this combined wavefront was propagated to the hologram plane. The hologram was resized and remapped using the same methodology described in Section 2G.1. A mid-sized object of 2.4 mm diameter was chosen alongside five lateral separations ranging from overlapping objects to well-separated ones, such that $OS_{xy}$ = 0.5, 0.75, 1, 1.5, 2. The window size was chosen so that the bounds of the window were 1.2× the outside of the combined object field in both the x- and y-direction. As per Section 2G.1, an EPS of 58.6 µm was chosen. A range of $z_s$ = 10−50 mm, in increments of 1 mm, was chosen as the axial object separation. The total z-axis localization error ($z_{\mathrm{err},t}$) was defined as the sum total of the error of each object, such that

$$z_{\mathrm{err},t} = \left|z_{\mathrm{pos},1} - z_{\mathrm{rec},1}\right| + \left|z_{\mathrm{pos},2} - z_{\mathrm{rec},2}\right|. \tag{7}$$

Where a focus metric failed to identify two separate objects in the z-direction, $z_{\mathrm{err},t}$ was said to be undefined, so that $z_{\mathrm{err},t}$ = NaN.

Figure 6 shows how the focus metric curves ($F_G$, $F_{TC}$, $F_Y$) respond to multiple objects in the same interrogation region and features two objects of 2.4 mm diameter separated laterally by $OS_{xy}$ = 1, and axially by $z_s$ = 40 mm. The gradient-based focus metric curves corresponding to Figs. 6(a) and 6(b) correctly identify the z-axis position of each individual object with minimal error. For the interrogation region with multiple objects [Fig. 6(c)], the z-axis localization errors were $z_{\mathrm{err},t}$ = 7.50 mm, 2.50 mm, and 2.25 mm for $F_G$, $F_{TC}$, and $F_Y$, respectively. However, in the case of the edge gradient or sparsity measure, it is evident that the focus metric curves of Fig. 6(c) are a combination of two curves that are superimposed with no distinctly deep trough between the two maxima, whereas the proposed focus metric ($F_Y$) yields a focus metric curve with two distinct peaks.

Figure 7 shows how the total error ($z_{\mathrm{err},t}$) varies with object separation in the lateral and axial direction for each of the three focus metrics. The shaded region in Fig. 7(a) is due to the inability of the edge-gradient-based focus metric to axially resolve two separate objects for $z_s$ < 30 mm in the majority of cases. Likewise, the shaded region in Fig. 7(b) represents the often-unresolved cases for $z_s$ < 19 mm when using the TC.
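The total error for the two-object case, including the undefined (NaN) outcome, can be expressed as in the sketch below. It assumes that peak detection has already produced a list of recovered axial positions, that known and recovered positions are paired in order of increasing z, and that any outcome other than exactly two detected peaks is treated as unresolved. These pairing and counting rules, and the function name, are assumptions.

```python
import math


def total_localization_error(z_pos, z_rec):
    """Sum of the absolute axial errors of two objects, or NaN when the
    focus metric did not yield exactly two separate peaks."""
    if len(z_rec) != 2:
        return math.nan          # objects not resolved as two in z
    p1, p2 = sorted(z_pos)       # known positions, nearest first
    r1, r2 = sorted(z_rec)       # recovered positions, nearest first
    return abs(p1 - r1) + abs(p2 - r2)


# Two objects at 1.40 m and 1.44 m; hypothetical recovered peaks.
err_resolved = total_localization_error([1.40, 1.44], [1.401, 1.4415])
err_unresolved = total_localization_error([1.40, 1.44], [1.42])
```

Returning NaN rather than a large number keeps unresolved cases out of any mean error, so averages should be computed with NaN-aware functions and the number of unresolved cases reported separately.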

D. Object Separation
As a fair comparison between the focus metrics, the mean z-axis localization error when $z_s$ > 30 mm was calculated as $z_{\mathrm{err},t}$ = 11.54 mm, 4.73 mm, and 1.74 mm for $F_G$, $F_{TC}$, and $F_Y$, respectively. This demonstrates a significant improvement in axial localization using the proposed particle signature function metric compared to the edge-gradient and TC focus metrics. It is also worth noting that the use of $F_Y$ allowed smaller axial separation distances to be resolved as two separate objects, as shown by the shaded regions in Fig. 7.

PHYSICAL EXPERIMENTAL VALIDATION
The experimental setup for a digital recording of a forward scattered diffraction pattern is shown in Fig. 8. The light source was a continuous wave 150 mW CrystaLaser DL785-150-SO with a wavelength of 785 nm and a coherence length >10 m. The beam was expanded and filtered through an objective lens and aperture. A collimating lens (CL) was used to provide a plane illumination wave. The objects were subject to forward scattering, and the plane reference wave and object wave interference pattern was passed through a telecentric two-lens system to demagnify the scene onto the CCD. The hologram was recorded on a 12-bit Dalsa Genie Nano, with a pixel size of 5.86 µm. The telecentric system consisted of focal lengths 0.5 m and 0.05 m to give a 10× demagnification factor, providing an EPS of 58.6 µm. This EPS value was chosen based on the results presented in Sections 2 and 3, where it was shown to be the largest effective pixel size (or worst "imaging resolution") that still gave an acceptable localization error in the single object case. Maximizing the EPS in this way yields the largest possible recording volume, while maintaining accuracy for a given CCD array.
Two steel balls 2.4 mm in diameter were affixed to optical mounts using copper wires 0.05 mm in diameter. The closest object (O 1 ) was placed on an X Y translation stage, and the furthest object (O 2 ) was placed on a cage mount to allow for z-axis travel. The objects were sprayed with matte grey paint to avoid reflections from the metallic surface. The large propagation distance (1.40 m) required to help defocus the twin image was achieved by moving the CCD away from the front focal plane of lens L2 (see Fig. 8). At the CCD side of the two-lens system, the scattered waves propagate at a rate proportional to the square of the lateral demagnification of the two-lens system, but the plane reference wave remains collimated. This allows for a more compact optical setup, as a 1 mm movement of the CCD amounts to a 100 mm movement of the apparent object position for the case of a lateral demagnification of 10×. The closest object (O 1 ) was placed at a known z-axis distance from the back focal plane of the two-lens system (z = 0.7 m), and the CCD was moved 7 mm away from the front focal plane of the two-lens system. The best-focus plane of O 1 would therefore appear to be at z = 1.40 m away from the back-focal plane of the two-lens system during reconstruction, which is shown in the focus metric curves in Fig. 9.
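The apparent reconstruction distance quoted above follows from the axial scaling of the two-lens system, which goes as the square of the lateral demagnification. As a worked check using only the values in the text (the symbols $M_L$, $\Delta z_{\mathrm{CCD}}$ and $z_{\mathrm{app}}$ are introduced here for illustration):

```latex
\begin{align*}
\Delta z_{\mathrm{obj}} &= M_L^{2}\,\Delta z_{\mathrm{CCD}}
  = 10^{2} \times 7\ \mathrm{mm} = 700\ \mathrm{mm} = 0.70\ \mathrm{m},\\
z_{\mathrm{app}} &= z_{O_1} + \Delta z_{\mathrm{obj}}
  = 0.70\ \mathrm{m} + 0.70\ \mathrm{m} = 1.40\ \mathrm{m}.
\end{align*}
```

The same scaling gives the 100 mm of apparent object movement per 1 mm of CCD movement stated above.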
For the multiple object case, the furthest object was placed directly behind O 1 at distances of z s = 20 − 50 mm in increments of 5 mm, and the X Y translation stage was adjusted to provide offsets of OS xy = 0.5, 0.75, 1, 1.5, 2. The total error (z err,t ) was defined as per Eq. (7). In the case of a focus metric curve failing to identify two separate objects, as shown by F G in Fig. 10(c), the error was defined as z err,t = NaN. Figure 10(a) shows the recorded hologram of two objects separated by OS xy = 1 and z s = 40 mm, the simulation of which was shown in Fig. 6. The intensity reconstruction at z rec = 1.40 m is given in Fig. 10(b), and the focus metric curve is shown in Fig. 10(c). It is apparent that, when using an edge-gradient-based focus metric or the Tamura coefficient, the two objects are unresolvable in the z-direction, therefore being assigned a total error of z err,t = NaN. However, the particle signature function method successfully identifies two separate objects in the z-direction with an absolute error of z err,t = 1 mm.

The effect of object proximity on the total z-axis localization error is demonstrated in Fig. 11 for the proposed particle signature function focus metric. The edge-gradient-based focus metric failed to identify the two objects as separate in the z-axis for all object XY separations ($OS_{xy}$) and z separations ($z_s$), while the TC only identified the two objects as separate in a few cases. The proposed metric successfully identified the two objects in a single interrogation region as separate in the z-direction for z-axis separations of 25−50 mm. However, the particle signature function metric failed to resolve the two objects axially for $z_s$ = 20 mm. The objects were resolved axially for $z_s$ = 25 mm using $F_Y$, although the localization error was considerable. Across all object separations and for $z_s$ = 30−50 mm, the highest total error was $z_{\mathrm{err},t}$ = 15.25 mm, and the mean total error was $z_{\mathrm{err},t}$ = 5.81 mm = 2.42$d_o$.
To assess the algorithm performance on an object with an asymmetric morphology, two dead mosquitoes (Anopheles gambiae) were suspended on copper wires 0.05 mm in diameter at known distances of z = 1.40 m and z = 1.45 m. An intensity reconstruction at z = 1.45 m is given in Fig. 12(a), with the mosquito positioned on the right at z = 1.45 m in sharper focus. Figure 12(b) shows that the proposed particle signature function focus metric displays two distinctly separated peaks, which yields a total z-axis localization error of $z_{\mathrm{err},t}$ = 3.25 mm.

DISCUSSION AND CONCLUSIONS
The particle signature function focus metric further developed in this paper for use with non-particle objects has been shown to be robust across a range of effective pixel sizes and object diameters in the presence of both a single object, and multiple objects inside a single interrogation region.
A clear improvement in the Q-value and a reduction in the z-axis localization error are evident for the proposed metric in both the simulated and experimental data of a single object when compared with the intensity-gradient-based metric or the TC. The lateral effective pixel size (or imaging/sampling resolution) in the recorded volume was shown to be a key contributing factor to the localization error and Q-value of the focus metric curves, which was expected. The simulations and experimental data for the case of multiple objects inside a single interrogation region provide evidence that the proposed focus metric was able to correctly resolve two objects in the axial direction when the gradient-based metric and TC failed, even when the objects were largely overlapping (OS xy = 0.5). The discrepancy in the minimum axial resolvable distance of two objects between the simulated and experimental data is likely due to noise and non-uniformities in the wavefront phase in the physical experimentation. Background removal was performed on the holograms to alleviate some of these issues, and methods to reduce the minimum resolvable axial distance will be explored in a subsequent publication. Despite the axial resolution of the experimental data being poorer than that of the simulations, the proposed focus metric still performed far better at multiple object localization than the edge-gradient- and TC-based focus metrics.
A case can be made that, for OS xy > 1, two objects can be separated laterally and therefore given their own interrogation regions as per the methods used in [43]. However, overlap of the diffraction patterns at the hologram plane can make lateral object separation and image segmentation difficult and imposes restrictions on the lateral separation that can be reviewed [43]. The experimental case from Fig. 10(b) [also simulated in Fig. 6(c)], with OS xy = 1, known object axial distances of 1.40 and 1.44 m, and an EPS of 58.6 µm, was examined by separating the interrogation region into two windows, each centered on one object [Fig. 13(a)].
The close lateral proximity and overlapping diffraction patterns of the two objects caused a total error of z err,t = 9.25 mm and z err,t = 2.25 mm for the two objects when using the edge-gradient- and TC-based methods, respectively, despite the focus metrics being calculated for separate interrogation regions, as shown in Fig. 13(a). The total error when using the particle signature function focus metric in this case was z err,t = 0.75 mm when two separate interrogation regions (Windows 1, red, and 2, green) were used, and z err,t = 1.00 mm when a single interrogation region [Window 3, blue, or Fig. 6(c)] was used. It is therefore anticipated that a combination of utilizing the proposed focus metric and image segmentation would provide object localization with minimal error when OS xy > 1. For OS xy < 1, image segmentation methods are unlikely to identify two separate objects laterally at the hologram plane, and therefore, the use of the proposed particle signature function method will yield more accurate localization than other focus metrics, as shown in Section 4.
A point of discussion in the multiple object case is that the individual diffraction patterns of two objects with a fixed lateral separation (OS xy ) will overlap more on the recorded hologram when the axial separation (z s ) is larger, due to the more diffracted wavefront of the furthest object, and the resulting reduction in effective signal may influence localization accuracy. This point was discussed by Jiao et al. in 2017, where their image segmentation algorithm performed better for objects with a smaller axial separation due to less overlapping signal on the hologram plane [43]. However, the results presented in Section 3, in which a larger axial separation yields a lower total error, imply that this effect is offset by the lesser superposition of the two focus metric curves for larger z-axis separations, i.e., the out-of-focus signal of one object has less of an impact on the other object's focus metric calculation at the correct z-axis position when the two objects are further apart axially.
Another point of discussion is the effect of the overall propagation distance on the localization error. If the object-to-hologram distance (propagation distance) is changed, the focus metrics assessed in this paper may lead to different localization error results for a pair of objects with fixed lateral (OS xy ) and axial (z s ) separations. In this study, an object-to-hologram propagation distance of z d = 1.40 m was chosen as a good compromise between satisfying the sampling criteria of the discretized propagation formulae, reducing the impact of the twin image on focus metric calculations, and what can feasibly fit on a standard optical table. This was not examined in detail in this paper but should be the subject of further study.
In small-object tracking applications, an optimized optical recording system captures the largest recording volume on a CCD array with the fewest pixels, while maintaining acceptable levels of localization accuracy. For a given CCD array, the results shown in Fig. 3 indicate that the proposed focus metric could be used for a larger recorded volume (approx. 2× larger in linear dimensions), while maintaining a similar z-axis localization accuracy compared to using the intensity gradient focus metric. For the continuation of this study and expansion of the imaging volume, large aperture mirrors will replace the collimating lens (CL) and object lens (L1) in the recording setup shown in Fig. 8.
The likelihood of multiple objects coming into close proximity in a given volume increases as the number of objects within the volume is increased. The improved axial resolution using the proposed particle signature function method, presented in Sections 3 and 4, implies that the position of a greater number of objects in a given volume can be accurately resolved and localized than if the edge gradient or TC was used as a focus metric.
Recently, Hughes et al. examined mosquito behaviors at a human-baited insecticidal net interface in a 100 × 100 × 100 mm "baited-box" environment [9]. In the same testing environment, digital holographic recording and reconstruction using the particle signature function focus method could provide accurate 3D flight reconstructions of multiple mosquitoes inside this volume with a high depth of field using a single camera. With modern high-resolution cameras (e.g., 3000 × 3000 pixels), it is anticipated that this could be extended to a 250 × 250 × 250 mm volume to encapsulate more of the mosquito flight behavior pre-landing and correct the as-yet-unknown displacement in the third spatial axis.
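As a rough back-of-the-envelope check of the sampling this would imply, the sketch below computes the lateral effective pixel size under the assumption (not stated in the text) that the lateral field of view is mapped onto the full width of the sensor. The function name is hypothetical.

```python
def effective_pixel_size_um(field_of_view_mm, pixels_across):
    """Lateral object-space size of one pixel, assuming the field of view fills the sensor width."""
    return field_of_view_mm * 1000.0 / pixels_across

# 250 mm lateral field of view on a 3000-pixel-wide sensor (values from the text).
eps_um = effective_pixel_size_um(250.0, 3000)

# Ratio to the 58.6 µm EPS quoted for the two-object experiment in the discussion.
ratio_to_experiment = eps_um / 58.6
```

Under this assumption the effective pixel size comes out at roughly 83 µm, about 1.4 times the 58.6 µm EPS of the two-object experiment.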