On the use of a cascaded convolutional neural network for three-dimensional flow measurements using astigmatic PTV

Many applications in chemistry, biology and medicine use microfluidic devices to separate, detect and analyze samples on a miniaturized scale. Fluid flows in channels only several tens to hundreds of micrometers in size are often three-dimensional, which affects the tailored transport of cells and particles. Astigmatic particle tracking velocimetry (APTV) has become a valuable tool for analyzing flow phenomena and local particle distributions within such channels. This holds provided that basic requirements are fulfilled, such as low optical aberrations and particles with a very narrow size distribution. Building on progress in machine vision, deep neural networks may help to overcome these limiting requirements, opening new fields of application for APTV and making it usable by nonexpert users. To qualify a cascaded deep convolutional neural network (CNN) for particle detection and position regression, a detailed investigation was carried out. It ranged from artificial particle images with known ground truth to real flow measurements inside a microchannel, using particles with uni- and bimodal size distributions. For monodisperse particles, both the deep neural network and the classical evaluation method based on the minimum Euclidean distance approach achieved a mean absolute error of less than 1 μm and a standard deviation of about 1 μm in the particle depth-position. With the neural network, these values hold for all particle size distributions. With the classical method and nonmonodisperse particles, however, they increase continuously towards the margins of the measurement volume, by about one order of magnitude. Nevertheless, if the depth of the measurement volume is limited to the region between the two focal points of APTV, reliable flow measurements with low uncertainty are also possible with the classical evaluation method and polydisperse tracer particles. 
The results of the flow measurements presented herein confirm this finding. The source code of the deep neural network used here is available at https://github.com/SECSY-Group/DNN-APTV.


Introduction
Microfluidic devices have great potential in several fields, including chemical processing, medical science, biology and energy conversion. Very often the fluid flow has a 3D character, due to the small dimensions of the fluidic channels or to nonlinear effects exploited in these small devices. Optical velocity measurement techniques such as particle image velocimetry (PIV) or particle tracking velocimetry (PTV) are often applied to characterize such flows. For this, particles of well-known size and concentration are suspended in the fluid as tracers that have to faithfully follow the flow. However, the channels measure only several tens to hundreds of micrometers and optical access is available from only one side, so the whole fluid volume is illuminated. In conjunction with velocity gradients in the out-of-plane direction, this causes major bias errors. To circumvent these biases, several techniques capable of measuring three-component, three-dimensional velocity distributions with high spatial resolution have been developed [1]. One of these techniques that requires optical access from only one side is astigmatic particle tracking velocimetry (APTV), which has been widely used for different applications [2][3][4][5][6]. This technique places a cylindrical lens in the detection path of a microscope, and the resulting astigmatic aberration yields elliptical particle images. The lengths of their principal axes depend on the depth position of the spherical particles [7][8][9][10]. For the calibration, particles are placed at several well-known depth-positions inside the measurement volume and their particle images are evaluated. From these, parametric fit functions describing the principal axes of the elliptical particle images as functions of depth-position are determined [11]. For this, 2D Gaussian-shaped elliptical particle images are assumed. 
During flow measurements, the shape of the detected particle images is evaluated in terms of their ellipticity. The actual depth position of the corresponding particles is then derived by minimizing the Euclidean distance between the measured data and the calibration curve [11]. This evaluation method is referred to as the Euclidean approach. It promises a robust estimation of the particle position with low uncertainty even for noisy data, provided that optical aberrations are minor and high-quality monodisperse spherical particles are used as tracers [12]. However, these requirements cannot be fulfilled in every experimental setup and application. For example, it might be desirable to measure the velocity of biological samples or randomly shaped particles. Alternatively, optical access may be of low quality, due either to thick cover glasses that must withstand high pressure or to internal refractive index gradients caused by mechanical stresses in microfluidic channels made of elastomers. Therefore, efforts have been made to predict uncertainties depending on the optical setup [13] and to generalize the evaluation methods based on correlations [10].
For the latter, recent progress in machine vision promises an advanced evaluation method for APTV that requires less computational time. Machine learning algorithms are powerful at classifying images. In previous work, a technique building on the latest deep learning approaches was developed that automatically identifies 2770 German plant species. Achieving accuracies well beyond 80% from a single plant image in this extremely fine-grained classification problem impressively demonstrates the potential of machine learning [14][15][16]. Deep learning techniques have also been used to classify the species and the age of phytoplankton using a microfluidic image cytometer [17]. However, a large amount of annotated data is required to train such networks. In fluid dynamics, neural networks were used as early as the mid-1990s to determine the fluid velocity with PIV and PTV [18]. In this first approach, multilayer neural networks (three and four layers) were used to determine the particle image displacement between successive frames. The images in this early work were not analyzed by the network itself but with classical image processing methods, and the particle image center positions were fed into the network as a binary image. These studies used images with low seeding concentration (50 to 500 particle images in double exposure per image). For such images, the number of correctly matched particle images was always larger than with the classical methods available at that time. Using cellular neural networks for PTV, a higher matching probability was achieved with less computational time [19]. Further improvements were achieved by using neural networks to identify partly overlapping particle images in order to exclude outliers, or to reconcile the stereo views for 3D PIV [20]. However, neural networks were used to perform only parts of the PIV processing scheme. 
The first end-to-end PIV approach using neural networks was reported by Rabault et al in 2017, who applied convolutional neural networks (CNN) and fully connected neural networks (FCNN) to synthetic data and experimental test sets [21]. The proof-of-concept results presented therein hold much promise, particularly for flow measurements close to boundaries where strong velocity gradients are present. Very dense vector fields with one vector for each pixel can be obtained by modifying the CNN LiteFlowNet for optical flow estimation [22]. As known from PIV using single-pixel ensemble correlation [23], this approach can increase the spatial resolution significantly. The gain depends on the actual particle image diameter, which is determined by the optical magnification of the setup, the particle size and the pixel size. Recently, deep convolutional networks were used to determine the velocity vectors in densely seeded flows [24,25]. Since different flows (wall-bounded, bluff body, uniform) were used for training, the algorithm could be applied to a variety of flow situations. Although the results are quite promising, the computational costs were much larger than those of classical correlation methods, and the performance for new flow situations has not yet been fully proven. Nevertheless, machine learning algorithms are a powerful tool for classifying particle images. They have already been used for the calibration of a macroscopic APTV setup by extracting more generalized Gaussian features of the particle images [26]. This shows that recent advances in machine vision offer new possibilities to obtain a more robust and user-friendly evaluation method, possibly also for the case of overlapping particle images.
The aim of this paper is to show that a cascaded deep neural network can not only detect particle images but also determine the depth-position of the corresponding particles. This novel method is compared with the classical evaluation method based on the Euclidean distance approach. For this, artificially generated elliptical particle images with known ground truth were used to gain first insights into the achievable uncertainty of the particle depth-position. However, aberration-free, 2D Gaussian-shaped particle images do not exist in real experiments. Therefore, calibration measurements were made with a real APTV setup, using monodisperse particles and a particle solution with a bimodal size distribution. Flow measurements in a straight microchannel using both particle solutions demonstrate the applicability of the novel method and confirm the results derived from the calibration measurements.

Experimental setup and calibration
In order to experimentally qualify the deep neural network for 3D flow measurements with APTV, a microchannel made of polydimethylsiloxane (PDMS) with a long straight section was used. The PDMS was produced by mixing base and curing agent in a ratio of 10:1. This mixture was then degassed under vacuum to remove air bubbles and poured into a homemade, milled aluminum casting mold. The estimated width W and height H of the microchannel were W × H = (565 × 507) µm². To cure the PDMS, the mold was placed on a hotplate at 80 °C for 2 h. To provide optical access from the bottom side, the PDMS channel was closed with a 1 mm thick glass cover slide. To bond the PDMS to the glass slide, both were placed in a plasma cleaner to activate their surfaces with an O₂ plasma.
The experimental setup used in the present study is sketched in figure 1. For the measurements, the assembled microchannel was placed on top of an inverted microscope (Axio Observer 7, Zeiss GmbH) equipped with a long working distance Plan-Neofluar objective (M20x, NA = 0.4, Zeiss GmbH). A high-power LED with a central wavelength of about 520 nm (Solis525C, Thorlabs Inc.) illuminated fluorescent polystyrene particles (530/607 nm, PS-FluoRed, MicroParticles GmbH) of either 5 µm or 2.5 µm in diameter. The particles were dispersed in an aqueous glycerol solution. The glycerol content of the solution was set to about 20 wt.% in order to match the particle density of about 1.05 g cm⁻³ and keep sedimentation low [27]. A syringe pump (neMESYS, Cetoni GmbH) with a 1 ml glass syringe (ILS Inovative Laborsysteme GmbH) pumped the aqueous glycerol solution through the microchannel at a constant flow rate of 0.05 ml h⁻¹. To separate the illumination light from the fluorescent light of the tracer particles, the microscope was equipped with a longpass dichroic mirror (DMLP567T, Thorlabs Inc.) and a longpass filter (FELH0550, Thorlabs Inc.). Their cut-on wavelengths are 567 nm and 550 nm, respectively. To induce astigmatism in the detection path, a cylindrical lens with a focal length of f_cy = 200 mm was placed approximately 40 mm in front of the sensor of the sCMOS camera (imager sCMOS, LaVision GmbH). The axis of the cylindrical lens was carefully aligned with the y-axis of the camera sensor. This shortens the resulting focal length in the x-direction while keeping the focal length in the y-direction almost unaffected. This yields different magnifications of the particle images in the x- and y-directions of the sensor, depending on the z-position of the particles. 
As the depth of the measurement volume of about 80 µm was smaller than the height H of the microchannel, the measurement volume was traversed through the microchannel by moving the microscope objective in equidistant steps of 30 µm. At each measurement position 3000 images were taken in single-frame mode with a frame rate of 50 Hz.
Before the flow measurements were taken, the fluorescent particles were dried onto a glass cover slide of the same type as used for the microchannel. These particle images served as calibration data for both evaluation methods. An exemplary image capture with particles randomly distributed within the xy-plane is depicted on the right side of figure 1. For the calibration, these particles have to be positioned at several well-known z-positions within the measurement volume to obtain the correlation between particle image shape and particle depth-position. However, the three zoomed-in particle images at the lower left, center and upper right of the field of view (FOV) deviate slightly from each other, although the particles are located at the same z-position. This is further highlighted by the corresponding intensity profiles depicted in figure 1. While the intensity profile of the particle image close to the center is nearly Gaussian, the particle images at the lower left and upper right are slightly distorted to the left and to the right, respectively. This asymmetric behavior is caused by inherent spherical aberrations of the optical setup, especially those of the noncorrected cylindrical lens [13]. It may result in a larger particle position error, since the classical evaluation method considers only undistorted, elliptical particle images of Gaussian shape. In contrast, neural networks take all distinctive features of the particle images into account. However, if local variations are present, comprehensive training data sets are necessary to keep the position error low. Therefore, during the calibration measurements, the glass cover slide with the dried particles was traversed not only in the z-direction but also in the x- and y-directions. In total, 9225 images were captured, with an in-plane displacement of the glass slide of almost 1 mm and a traversing range of 80 µm in the depth-direction. 
A step width of ∆z = 2 µm was set between the well-known depth-positions to achieve high spatial resolution in the z-direction. The unidirectional positioning accuracy in the z-direction was better than 10 nm and can be neglected. However, data processing later revealed that the glass cover slide had been slightly tilted by about 0.04° during the calibration measurements. This very slight tilt caused a shift of the particles of about 0.7 µm in the z-direction when the glass cover slide was moved 1 mm in the x- and y-directions. This undesired particle displacement limited the positioning accuracy in the z-direction during the calibration measurements. It has to be borne in mind when the position error of the experimental test case is discussed in subsection 4.2.
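The quoted z-shift follows from simple geometry: a plane tilted by an angle α rises by Δz = L tan α over an in-plane distance L. The short Python sketch below checks the numbers given above; the function name is our own, chosen for illustration only.

```python
import math

def tilt_induced_z_shift(in_plane_travel_um: float, tilt_deg: float) -> float:
    """Depth offset (in µm) of a particle on a slide tilted by tilt_deg
    after moving the slide in_plane_travel_um within the xy-plane."""
    return in_plane_travel_um * math.tan(math.radians(tilt_deg))

# 1 mm in-plane traverse with the estimated tilt of 0.04 degrees
dz = tilt_induced_z_shift(1000.0, 0.04)
print(f"dz = {dz:.2f} um")  # dz = 0.70 um
```

A depth error of this size is comparable to the sub-micrometer position errors discussed later, which is why the tilt matters for the experimental test case.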
The images were preprocessed with an in-house MATLAB script to detect and validate the individual particle images [28]. The estimated x- and y-positions and the sizes of the particle images were fed into the training set for the deep neural network as initial data. In addition, a validation step based on the minimum and maximum allowable particle image size excluded partly overlapping particle images from the training data. As flow measurements were conducted with two different particle sizes at the same time, calibration measurements were performed three times. Two runs used a single particle size (5 µm or 2.5 µm in diameter), and the third used both particle sizes at the same time. Hence, the total training data for the deep neural network comprised 3 × 9225 image captures containing pre-annotated and validated individual particle images.
In figure 2(a), the particle image width a_x and height a_y determined close to the center of the FOV are depicted over the z-position. As expected, the astigmatic defocus is independent of the particle diameter d_p. The two focal points of the astigmatic setup, F_xz and F_yz, coincide well for both measurements. The general behavior of the particle image diameter a(z) as a function of the position relative to the focal point can be approximated by [29]

$$a(z) = \left[ M^2 d_p^2 + 1.49\, M^2 \lambda^2 \left( \frac{n_0^2}{\mathrm{NA}^2} - 1 \right) + \frac{M^2 (z - z_F)^2\, \mathrm{NA}^2}{n_0^2 - \mathrm{NA}^2} \right]^{1/2}. \qquad (1)$$

The particle image diameter thus depends on the particle diameter d_p, the magnification M, the emitted wavelength λ, the numerical aperture NA and the refractive index n_0 of the immersion medium of the microscope objective. It also depends on the z-position of the particle relative to the position z_F of the focal point. Here, either the focal point F_xz or F_yz applies, depending on whether the width a_x or the height a_y of the particle image is considered. Equation (1) consists of three terms: geometrical imaging, diffraction-limited imaging, and the blurring of the particle image due to defocus. The defocus term plays an important role for APTV, as its principle is based on the defocus [28]. Close to the focal points, the widths a_x and heights a_y of the particle images differ between the two particle sizes, mainly due to the geometrical term in equation (1). However, as can be seen in figure 2(a), the calibration curves for the two particle sizes converge with increasing distance from the focal points. For the parameters of the optical setup, the difference between the particle image diameters of both particle sizes falls below 10% once the distance from the focal point z_F exceeds 10 µm. At that distance, the geometrical term becomes insignificant compared with the strong defocus. The measured particle image widths and heights confirm this theoretical estimate quite well. 
The particle images measured during the calibration with both particle sizes are depicted in figure 2(b) in the (a_x, a_y)-space. For clarity, lines indicate the individual calibration functions obtained with either the 2.5 µm or the 5 µm particles. Close to and between the two focal points, the particle images of the two sizes are separated from each other. Towards the margins of the measurement volume, however, they overlap in the (a_x, a_y)-space. This makes it difficult to discriminate between the two particle diameters there if both are present at the same time. For flow measurements, a maximum allowable Euclidean distance from the calibration curve in the (a_x, a_y)-space can be applied as a validation step during post-processing. Between the two focal points, this excludes the other type of particle from the measurement data, provided that the particle sizes differ sufficiently from each other. However, towards the margins of the measurement volume in the depth-direction, a larger position error and therefore a higher velocity uncertainty have to be expected.

Image processing
The artificial particle images obtained from the APTV particle image generator need no preprocessing. The experimentally recorded images, in contrast, have to be preprocessed to remove background signals that occur in real experiments, e.g. from particles that stick to the channel walls. For background removal, the minimum intensity of each pixel was determined over all images taken at each measurement position and subsequently subtracted from each individual image. As 3000 images were taken at each measurement position, this background removal was very efficient. In particular, particle images of particles stuck to the wall were removed, so the subsequent PTV algorithm did not have to take them into account. For the calibration measurements, no minimum-intensity-over-time background subtraction was applied. To determine the 3D particle positions within the measurement volume, the image captures were then processed using a deep neural network, as described in subsection 3.1. For comparison, the same data sets were evaluated using a classical evaluation approach, see subsection 3.2.
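The minimum-intensity background removal takes only a few lines of NumPy. The sketch below assumes that the images of one measurement position are stored as a stack of shape (n_images, ny, nx). The function name and the synthetic test stack are illustrative only.

```python
import numpy as np

def subtract_min_background(stack: np.ndarray) -> np.ndarray:
    """Subtract the per-pixel temporal minimum from every frame of an
    image stack with shape (n_images, ny, nx)."""
    stack = stack.astype(np.float64)
    background = stack.min(axis=0)   # static background: darkest value of each pixel over time
    return stack - background        # broadcasts over the frame axis; result is >= 0

# Synthetic example: noise, one particle stuck to the wall, one moving particle
rng = np.random.default_rng(0)
frames = rng.integers(0, 50, size=(100, 64, 64)).astype(np.float64)
frames[:, 10, 10] = 1000.0           # stuck particle: bright in every frame
frames[5, 30, 30] += 500.0           # moving particle: bright in a single frame only
clean = subtract_min_background(frames)
```

A particle stuck to the wall is bright in every frame, so it becomes part of the minimum and is removed. A moving particle occupies a given pixel only briefly and survives the subtraction. The approach therefore works best when many frames per position are available.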
Once all 3D particle positions were known, the particle trajectories were determined using a probabilistic particle tracking algorithm that achieves a high vector yield [30]. The parameter settings for particle pairing were the same for both evaluation approaches. Outliers were detected using universal outlier detection [31]. For further evaluation, the randomly distributed particle velocity vectors were ensemble-averaged and interpolated onto a rectangular grid. This used Gaussian-weighted interpolation based on the distance of each vector from the grid nodes. A laminar flow in a straight microchannel was measured, and no change of the velocity distribution is expected in the streamwise direction. Therefore, only a 2D pattern of rectangular bins with an edge length of 30 µm was used. Using a bin overlap of 75% yields a velocity vector spacing of 10 µm.
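A minimal NumPy sketch of such Gaussian-weighted bin averaging is given below. It assumes square bins (30 µm edge length, as in the text) and a user-chosen Gaussian width sigma. The width and exact weighting function used in this study are not stated, so sigma = 7.5 µm and the grid spacing in the example are illustrative choices. The function name is hypothetical.

```python
import numpy as np

def gaussian_weighted_bins(pos, vel, centres, bin_size, sigma):
    """Ensemble-average scattered PTV vectors onto bin centres.

    pos: (N, 2) particle positions in the cross-sectional plane (µm)
    vel: (N, C) velocity components of the N vectors
    centres: (M, 2) bin centres (µm)
    Only vectors inside the square bin of edge length bin_size around a
    centre contribute, weighted by a Gaussian of their distance to it.
    Bins without vectors are returned as NaN."""
    out = np.full((len(centres), vel.shape[1]), np.nan)
    for m, c in enumerate(centres):
        d = pos - c
        inside = np.all(np.abs(d) <= bin_size / 2, axis=1)
        if not inside.any():
            continue
        w = np.exp(-np.sum(d[inside] ** 2, axis=1) / (2 * sigma ** 2))
        out[m] = w @ vel[inside] / w.sum()   # weighted mean of each velocity component
    return out

# Synthetic check: a uniform flow must be reproduced in every bin
rng = np.random.default_rng(1)
pos = rng.uniform(0.0, 100.0, size=(5000, 2))
vel = np.column_stack([np.full(5000, 2.0), np.zeros(5000)])
ys, zs = np.meshgrid(np.arange(15.0, 90.0, 10.0), np.arange(15.0, 90.0, 10.0))
centres = np.column_stack([ys.ravel(), zs.ravel()])
mean_vel = gaussian_weighted_bins(pos, vel, centres, bin_size=30.0, sigma=7.5)
```

Because neighbouring bins share vectors when the centre spacing is smaller than the bin edge length, the averaged field is smoothed over roughly one bin width. The nominal vector spacing therefore overstates the true spatial resolution.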

Deep neural network
The analysis proceeds in two stages: individual particle images are first identified in image captures containing many of them, and the depth-position of the corresponding particles is then estimated. For this, a cascade of two connected deep convolutional neural networks (CNN) is used. Figure 3 depicts a flow chart of the cascaded network topology. In machine learning terms, the two stages are referred to as object detection and regression, respectively. The first CNN is based on the faster R-CNN (region-based convolutional neural network) architecture, known for its precise object detection capabilities ('particle image detection' box in figure 3). This CNN generates so-called region proposals containing candidate objects. It then classifies each of these candidate regions according to whether or not it contains a particle image.
Faster R-CNN is especially efficient, since its region proposal network and its detection network share their initial layers and the associated parameters. This allows an image to be processed in one forward propagation through the network. Precursor approaches such as R-CNN and fast R-CNN instead perform an upfront selective search [32]. More specifically, the input of the realized faster R-CNN architecture is an image of any size containing zero or more particle images. The output is a set of object proposals, each containing a cropped candidate particle image and the position at which it was found in the input image. The cropped images have a fixed size of 180 × 180 pixels, slightly larger than the maximum width a_x and height a_y of the elliptical particle images.
Figure 3. Flow chart of the cascaded CNN architecture used to detect particle images and determine their position. The architecture consists of two CNNs in total. The first CNN is based on the faster R-CNN architecture with shared initial layers and parameters, used for particle detection. The second CNN is used for the estimation of the depth-position of the particles.
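The data flow of the cascade can be sketched as follows: detect, crop a fixed 180 × 180 pixel window, then regress the depth. Placeholder callables `detect` and `regress_z` stand in for the two trained networks. These names, the zero padding at the image borders and the toy stand-ins in the example are assumptions for illustration, not the released code.

```python
import numpy as np

CROP = 180  # fixed crop size in pixels, as used for the second stage

def crop_centered(image: np.ndarray, cx: float, cy: float, size: int = CROP) -> np.ndarray:
    """Cut a size x size window centred on (cx, cy); zero-pad at the image borders."""
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    # Pixel (y, x) of the original image sits at (y + half, x + half) in the padded one,
    # so the window starting at (cy - half, cx - half) starts at (cy, cx) after padding.
    x0 = int(np.clip(round(cx), 0, image.shape[1] - 1))
    y0 = int(np.clip(round(cy), 0, image.shape[0] - 1))
    return padded[y0:y0 + size, x0:x0 + size]

def cascade(image, detect, regress_z):
    """Stage 1: detect(image) -> iterable of (cx, cy) particle image centres.
    Stage 2: regress_z(crop) -> depth position of the particle."""
    results = []
    for cx, cy in detect(image):
        crop = crop_centered(image, cx, cy)
        results.append((float(cx), float(cy), float(regress_z(crop))))
    return results

# Toy stand-ins for the two trained networks (NOT real models)
def toy_detect(im):
    row, col = np.unravel_index(np.argmax(im), im.shape)
    return [(col, row)]

def toy_regress(crop):
    return crop.sum()  # placeholder for the depth regression network

img = np.zeros((512, 512))
img[100, 200] = 1.0
print(cascade(img, toy_detect, toy_regress))  # [(200.0, 100.0, 1.0)]
```

Separating the two stages like this means that the regression network always sees inputs of identical size, centred on the particle image, independent of the size of the full camera image.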
Within the faster R-CNN architecture, the feature extraction network can be chosen freely. Here, a ResNet architecture with 101 layers was used as the feature extractor. Since its introduction in 2015, the ResNet architecture has been successfully applied to many different problems and has won various benchmark competitions. Examples are the MS COCO 2015 object detection challenge and the ILSVRC 2015 image classification, localization and object detection tasks. A rectified linear unit (ReLU) is used as activation function, and batch normalization is applied after all convolutional layers. The faster R-CNN network was trained with asynchronous stochastic gradient descent (SGD) with a momentum of 0.9. The initial learning rate was set to 3 × 10⁻⁴ and was reduced by a factor of ten after 900 000 and again after 1.2 million iterations. The network weights were randomly initialized from a truncated normal distribution with a standard deviation of σ = 0.01. To prevent overfitting, L2 regularization was applied. Some hyperparameters have a substantial effect on the performance of the approach, especially the learning rate and its decay schedule, the mini-batch size and the SGD momentum. These were determined by a systematic search on the validation split of the dataset. For further general details about the faster R-CNN architecture, the reader is referred to [33]. Since an object should be equally recognizable as its mirror image, each training image from the calibration measurements was complemented by a horizontally flipped copy.
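Two of the training details above, the step-wise learning-rate decay and the horizontal-flip augmentation, are sketched below in plain Python/NumPy. The function names are hypothetical. The (x_min, y_min, x_max, y_max) box format in pixel-edge coordinates is an assumption. Deep learning frameworks provide equivalent built-ins, such as a piecewise-constant learning-rate schedule.

```python
import numpy as np

def learning_rate(step: int, base_lr: float = 3e-4,
                  boundaries=(900_000, 1_200_000), factor: float = 0.1) -> float:
    """Piecewise-constant schedule: reduce the rate by a factor of ten at each boundary."""
    n_decays = sum(step >= b for b in boundaries)
    return base_lr * factor ** n_decays

def hflip_with_boxes(image: np.ndarray, boxes: np.ndarray):
    """Mirror an image left-right and update boxes (x_min, y_min, x_max, y_max)
    given in pixel-edge coordinates, i.e. x in [0, width]."""
    width = image.shape[1]
    flipped = image[:, ::-1]
    new_boxes = boxes.astype(np.float64).copy()
    new_boxes[:, 0] = width - boxes[:, 2]  # new left edge = mirrored old right edge
    new_boxes[:, 2] = width - boxes[:, 0]  # new right edge = mirrored old left edge
    return flipped, new_boxes

# A 4 x 10 image with one bright pixel at column 1 inside the box x in [1, 3]
img = np.zeros((4, 10))
img[1, 1] = 1.0
flipped, boxes = hflip_with_boxes(img, np.array([[1.0, 0.0, 3.0, 2.0]]))
# flipped[1, 8] == 1.0 and boxes == [[7., 0., 9., 2.]]
```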
What remains unknown after the first stage is the depth-position of the particles. From a machine learning perspective, deriving a continuous output value such as the desired z-position from a given input (the particle image) is a regression problem. The second CNN learns the relationship between the elliptical particle images found in the first stage and the z-position of the corresponding particles ('depth regression' box in figure 3). It is based on the Inception V3 CNN architecture [34]. However, it uses the mean absolute error (MAE) as objective function, since it predicts a continuous numerical value rather than the probability distribution of a classification problem. The network was trained with an initial learning rate of 0.01 that decreased exponentially after each epoch by a factor of 2.4. The network was trained for 800 epochs with a batch size of 32, using Adam as optimizer [35]. The training of the regression network converged almost simultaneously on the training and the validation set, after 19 epochs. This simultaneous convergence on the validation set indicates that the network did not merely fit the training data but is also able to generalize to unseen data.
Deep neural networks require substantial computational effort for training. However, this training is required only once. During the inference phase, each image typically requires less than 100 ms of processing, although the actual computation time depends heavily on the hardware used. Here, an NVIDIA RTX 2080 graphics processing unit was employed. With the experimental dataset, training took about 26 h for the particle detection model and about 23 h for the depth regression model; however, the two models can be trained in parallel. Once both models are trained, inference requires significantly less effort. Analyzing one image capture with our trained model pipeline takes 5.28 ms on the same hardware.
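For orientation, the quoted inference time translates into the throughput below. This is simple arithmetic on the numbers given in the text. It assumes one image capture per forward pass and the 3000 images recorded per measurement position.

```python
per_image_ms = 5.28          # inference time per image capture (RTX 2080, from the text)
images_per_position = 3000   # images recorded at each measurement position

seconds_per_position = per_image_ms * images_per_position / 1000.0
images_per_second = 1000.0 / per_image_ms
print(f"{seconds_per_position:.1f} s per position, about {images_per_second:.0f} images/s")
# 15.8 s per position, about 189 images/s
```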

Classical method
The particle image detection and the evaluation of the individual particle images followed the general procedure described in [11,28]. First, the image background was subtracted. A particle image detection algorithm based on a global intensity threshold then determined the regions where particles are expected to be present. For this, the width a_x and height a_y of the particle images were allowed to vary between 6 and 180 pixels. This range corresponds to the particle image sizes over the entire depth of the measurement volume, as determined during the calibration measurement. Sub-pixel accuracy for the in-plane positions and for the width a_x and height a_y of the particle images was obtained by correlating the particle images with a Gaussian particle image [36]. The depth-position z_i of a particle within the measurement volume was then derived by minimizing the Euclidean distance between the measured point (a_{x,i}, a_{y,i}) and the calibration curve in the (a_x, a_y)-space, see figure 2(b). Only particle images within a maximum distance of about 4 pixels from the calibration function were accepted. In this way, outliers, such as those caused by overlapping particle images, were excluded.
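A minimal sketch of the Euclidean approach, including the 4 pixel validation threshold, is shown below. It approximates the minimum distance to the calibration curve by a nearest-neighbour search over a densely sampled curve. The hyperbola-shaped calibration functions, their parameters and the function names are made up for illustration; they are not the fitted calibration of this study.

```python
import numpy as np

def z_from_euclidean(ax, ay, z_cal, ax_cal, ay_cal, max_dist=4.0):
    """Return (z, distance): the calibration depth whose (a_x, a_y) point is closest
    to the measured (ax, ay), or None for z if the distance exceeds max_dist."""
    d = np.hypot(ax_cal - ax, ay_cal - ay)  # distance to every sampled curve point
    i = int(np.argmin(d))
    z = float(z_cal[i]) if d[i] <= max_dist else None
    return z, float(d[i])

# Illustrative calibration curve sampled every 0.01 µm (focal points at 30 and 50 µm)
def toy_width(z, z_focus, a0=10.0, k=2.0):
    return np.sqrt(a0 ** 2 + (k * (z - z_focus)) ** 2)

z_cal = np.linspace(0.0, 80.0, 8001)
ax_cal, ay_cal = toy_width(z_cal, 30.0), toy_width(z_cal, 50.0)

z_ok, _ = z_from_euclidean(toy_width(37.3, 30.0), toy_width(37.3, 50.0), z_cal, ax_cal, ay_cal)
z_bad, dist = z_from_euclidean(200.0, 10.0, z_cal, ax_cal, ay_cal)  # far from the curve
```

Because the two focal points differ, each depth maps to a unique (a_x, a_y) pair, so the nearest curve point identifies z unambiguously. A point far from the curve, such as one produced by overlapping particle images, is rejected (z = None).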

Synthetic test case
To test the reliability of the new evaluation approach using deep neural networks for APTV, artificial particle images with known ground truth of the 3D particle position were used first. For this, a particle image generator was applied. The algorithm of the image generator is based on the mathematical background derived by Rossi et al [12]. It generates particle images from the known optical setup parameters, including the magnification M, the focal length of the cylindrical lens f_cy, the wavelength of the light λ and the refractive index of the medium. Here, particle images were produced for a setup very similar to the one described above, using a magnification of M = 20 and a focal length of f_cy = 200 mm for the cylindrical lens. However, the image generator does not include the additional spherical aberrations present in the experimental test case.
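The sketch below renders a single elliptical 2D Gaussian particle image, illustrating the ideal particle image model that both the generator and the classical method assume. It does not reproduce the generator based on Rossi et al used here; in particular, it omits the mapping from z to (a_x, a_y). Interpreting a_x and a_y as e⁻² diameters (σ = a/4) is our assumption.

```python
import numpy as np

def elliptical_gaussian(shape, x0, y0, ax, ay, peak=1.0):
    """Render an elliptical 2D Gaussian particle image on a (ny, nx) grid.

    ax, ay are treated as the e^-2 diameters along x and y, i.e. sigma = a / 4."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    sx, sy = ax / 4.0, ay / 4.0
    return peak * np.exp(-((xx - x0) ** 2 / (2 * sx ** 2) + (yy - y0) ** 2 / (2 * sy ** 2)))

# Particle image elongated in x (a_x = 40 px, a_y = 20 px), centred at (50, 50)
img = elliptical_gaussian((101, 101), 50.0, 50.0, 40.0, 20.0)
```

Such ideal images carry no information beyond the two axis lengths and the intensity profile, which is why the synthetic test favours the classical method, as discussed in the following subsection.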

Calibration.
Artificial calibration images were generated with approximately 100 000 individual particle images randomly distributed within a depth range from 0 to 90 µm, using a step width of ∆z = 1 µm. In total, 13 650 artificial images were generated, each containing only about eight individual particle images. In this way, overlapping particle images were largely avoided when training the neural network. Approximately 90% of the particles were used to train the neural network, while 10% were used to evaluate the accuracy of both methods. The parameter settings used to process these images are detailed in a preliminary study presented at the 13th International Symposium on Particle Image Velocimetry (ISPIV 2019) [37]. Table 1 lists the mean absolute error MAE_z and the standard deviation σ_z of the calibration data. Both are derived from the differences between the prescribed and the estimated particle z-positions (z_real − z_est).
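For reference, the two error measures can be computed from the depth errors Δz = z_real − z_est as sketched below. The text does not spell out the exact definitions, so conventional ones are assumed: MAE_z as the mean of |Δz| and σ_z as the sample standard deviation of Δz.

```python
import numpy as np

def depth_error_stats(z_real, z_est):
    """Return (MAE_z, sigma_z) of the depth error dz = z_real - z_est."""
    dz = np.asarray(z_real, dtype=float) - np.asarray(z_est, dtype=float)
    mae_z = np.mean(np.abs(dz))
    sigma_z = np.std(dz, ddof=1)  # sample standard deviation (assumed convention)
    return mae_z, sigma_z

# Three perfect estimates and one outlier of 2 µm
mae_z, sigma_z = depth_error_stats([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(mae_z, sigma_z)  # 0.5 1.0
```

The example also shows why σ_z can be much larger than MAE_z: because deviations enter squared, a few large errors inflate the standard deviation more strongly than the mean absolute error.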
According to the results, MAE_z for the deep neural network is about seven times higher than for the classical method. This is not surprising, since the model used to generate the artificial particle images coincides with the particle image model assumed in the classical evaluation method. Since the particle images are ideally Gaussian-shaped, the only discernible feature for the deep neural network is the radial intensity distribution. Such networks are known to perform better the more distinctive features are present. Optical aberrations such as those shown in figure 1 are therefore expected to improve the performance of the neural network, whereas they are detrimental for the classical method. In general, the more the particle images deviate from the assumptions of classical evaluation methods, the better the deep neural network will perform in comparison. This applies not only to object detection in machine vision, but also to flow measurement techniques like PIV if strong local flow variations exist [21]. The standard deviations of both methods are of the same order of magnitude. For the classical method, the standard deviation is 13 times higher than MAE_z and thus dominates the uncertainty. This holds as long as no aberrations cause the particle images to deviate from the ideal model of elliptical Gaussian-shaped particle images.

Influence of the noise level.
To test the robustness of the trained deep neural network with respect to particle image quality, artificial particle images were generated both without noise and with signal-to-noise ratios (SNR) ranging from 0 dB to 30 dB. For each data set, 1000 images with more than 20 individual particles each were generated. The particles were randomly distributed within a slightly larger depth range from −14 µm to 104 µm. This reflects that, in a typical measurement scenario, the effective depth of the measurement volume is very often smaller than the height of the microchannel.
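The SNR definition is not given above. The sketch below therefore assumes the common power-based definition, SNR = 10 log10(P_signal / σ_noise²), with additive zero-mean Gaussian noise. The function name is hypothetical, and the study's generator may define and apply noise differently.

```python
import numpy as np

def add_noise_snr(image: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise so that mean(image**2) / noise_variance
    equals 10**(snr_db / 10)."""
    signal_power = np.mean(image.astype(np.float64) ** 2)
    noise_var = signal_power / 10.0 ** (snr_db / 10.0)
    return image + rng.normal(0.0, np.sqrt(noise_var), size=image.shape)

rng = np.random.default_rng(42)
clean = np.ones((512, 512))
noisy_10db = add_noise_snr(clean, 10.0, rng)  # noise variance about 0.1
noisy_0db = add_noise_snr(clean, 0.0, rng)    # noise variance about 1.0
```

At 0 dB the noise carries as much power as the signal, which is the most demanding case in the investigated range.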
Figure 4 shows the standard deviation σ_z as a function of the SNR. Firstly, the standard deviation of the DNN is more than one order of magnitude larger than that of the classical method. Even for noise-free particle images, the standard deviation of the DNN is significantly higher than that obtained with the calibration data (compare the values given in table 1). One reason might be that the higher particle concentration in the test data causes a higher probability of partly overlapping particle images. While particle overlaps are recognized by the in-house software implementing the classical evaluation method, such overlaps cannot yet be detected with the deep neural network, yielding a correspondingly higher number of outliers. Another reason lies in the range of the prescribed z-values. While the depth-positions of the test particles taken from the artificial calibration set coincide with its prescribed sampling positions, the new test data also contain randomly distributed particle positions in between these sampling points, which have to be estimated by regression. Secondly, as expected, the higher the SNR, the lower the standard deviation for both methods. Interestingly, the standard deviation obtained with the deep neural network approaches its noise-free value much faster, i.e. at lower SNR, indicating that the deep neural network is relatively robust to noise.

Experimental test case
To test the new evaluation method under more realistic conditions, the calibration measurements described in section 2, using the monodisperse particle solution with a particle size of 5 µm and the particle solution with bimodal size distribution, were evaluated. In this way, the data contain particle images that deviate from ideal 2D Gaussian-shaped particle images; compare the zoomed-in particle images and their corresponding intensity profiles in figure 1. Again, 90% of the particles were used to train the neural network, while 10% of the calibration images were kept for testing. The artificially generated particle images were not used to train the neural network for the experimental test case. For quantification, the MAE_z and the standard deviation σ_z were derived from the real particle position z_real, given by the known depth-positions of the glass cover slide, and the estimated particle position z_est, determined either with the classical method or with the neural network. It should be noted that, in contrast to the synthetic test case, the ground truth of the particle position is not exactly known here, as calibration measurements are subject to uncertainties (see also the explanations given in section 2). In figure 5, the MAE_z and the standard deviation σ_z are plotted over the z-position within the measurement volume for both calibration measurements.
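Depth-resolved curves such as those in figure 5 can be obtained by grouping the errors z_real − z_est by the prescribed depth position. The sketch below assumes that z_real takes discrete values on a grid of spacing bin_width (as for the cover-slide positions of a calibration measurement), uses the population standard deviation, and has a hypothetical function name.

```python
import math
from collections import defaultdict


def error_profile(z_real, z_est, bin_width=1.0):
    """Return {bin_center: (MAE_z, sigma_z, count)} for errors z_real - z_est,
    grouped by z_real rounded to the nearest multiple of bin_width."""
    groups = defaultdict(list)
    for zr, ze in zip(z_real, z_est):
        k = round(zr / bin_width)          # index of the nearest grid position
        groups[k].append(zr - ze)
    profile = {}
    for k in sorted(groups):
        d = groups[k]
        n = len(d)
        mae = sum(abs(x) for x in d) / n
        mean = sum(d) / n
        sigma = math.sqrt(sum((x - mean) ** 2 for x in d) / n)
        profile[k * bin_width] = (mae, sigma, n)
    return profile


# Toy input (made-up values): two particles each at z = 0 µm and z = 5 µm.
prof = error_profile([0.0, 0.0, 5.0, 5.0], [0.5, -0.5, 5.0, 7.0])
```

Keeping the per-bin count alongside MAE_z and σ_z helps to judge whether an increase near the margins of the measurement volume rests on enough particles.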
The MAE_z obtained for the monodisperse particles is almost constant over the entire depth of the measurement volume. No significant difference exists between the CM and the DNN, which is also reflected in the mean absolute error averaged over the entire depth of the measurement volume, listed in table 2. Comparing these values with the positioning accuracy of about 0.7 µm determined for the calibration measurements, it is evident that the accuracy of the calibration measurement mainly limits the achievable MAE. While the MAE remains constant for the DNN, it increases significantly for the CM when the calibration functions obtained for particles of 5 µm in size are applied to the images captured with the mixed particle solution. According to figure 5(a), significant mean absolute errors occur everywhere except at the positions where the particle image width a_x and height a_y reach their minima, increasing the averaged MAE by a factor of about four. Comparing the experimentally determined averaged mean absolute errors with those obtained for the synthetic test case (see table 1), a striking difference between both evaluation methods comes to light. For the DNN, the MAEs obtained for the synthetic and the experimental test cases are comparable. For the CM, however, the MAE increases by a factor of about 10, or even up to 150 at maximum, using the monodisperse particles or the particles with two different sizes, respectively. This can be explained as follows. First, the real particle images deviate from the ideal Gaussian-shaped particle image assumed in the classical evaluation method, while the DNN can exploit the distinctive features of the particle images caused by optical aberrations. Second, the CM is based on parametric calibration functions relying on a fixed particle size, whereas the DNN can be trained for different particle sizes.
The standard deviations σ_z obtained along the depth direction for both methods and both calibration measurements are depicted in figure 5(b). For the DNN with the monodisperse particle solution, the standard deviation appears to increase slightly with increasing z-position. In the case of the CM, the standard deviation is almost constant over the entire depth of the measurement volume. Again, the positioning accuracy of the calibration measurements mainly limits the standard deviation. However, when the mixed particle solution is used, the standard deviation increases significantly towards the margins of the measurement volume, while in between the two focal points of the astigmatic system it remains very similar to that obtained for the monodisperse particles. For the CM, this can be explained by the calibration functions obtained separately for both particle sizes; see figure 2. As the optical transfer function does not change and only the particle image diameter changes with particle size, the two calibration curves in the a_x-a_y space run almost parallel in between the two focal points. Hence, when the minimum Euclidean distance between the measured point [a_x,i, a_y,i] and the calibration curve is determined close to the center of the measurement volume, the resulting z-position does not change significantly, even though particle images of different sizes are evaluated. In the case of the DNN, no clear trend towards higher standard deviations can be recognized, indicating that the higher deviations towards the margins of the measurement volume might be caused by lower particle image quality. Compared with the synthetic test case (see table 1), the use of real particle images does not increase the standard deviation of the DNN, neither for the monodisperse particles nor for the particle solution with bimodal size distribution. In contrast, the classical evaluation method is very sensitive to the particle size distribution: while a similar standard deviation is obtained using real instead of artificial particle images, the standard deviation for the mixed particle solution is more than two times higher than for the monodisperse particle solution.
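The minimum Euclidean distance approach of the classical method, together with the outlier filter based on a maximum allowed distance, can be sketched as follows. The calibration curves are hypothetical hyperbolic width laws with assumed focal planes at z = 30 µm (for a_x) and z = 60 µm (for a_y), chosen only for illustration; the actual calibration functions are obtained from measurements (see figure 2), and a practical implementation would typically interpolate between calibration samples instead of returning the nearest sample. All names are hypothetical.

```python
import math


# Hypothetical calibration curves (NOT from this study): widths in px, z in µm.
def a_x_cal(z):
    return math.sqrt(4.0 ** 2 + (0.3 * (z - 30.0)) ** 2)   # minimum at z = 30 µm


def a_y_cal(z):
    return math.sqrt(4.0 ** 2 + (0.3 * (z - 60.0)) ** 2)   # minimum at z = 60 µm


def sample_calibration(fx, fy, z_min, z_max, dz):
    """Tabulate (z, a_x(z), a_y(z)) on a regular z grid."""
    n = int(round((z_max - z_min) / dz)) + 1
    return [(z_min + i * dz, fx(z_min + i * dz), fy(z_min + i * dz)) for i in range(n)]


def estimate_z(ax, ay, calibration):
    """Return (z, d_min): the calibration sample closest to the measured point
    (ax, ay) in the a_x-a_y space and its Euclidean distance."""
    best_z, best_d = None, math.inf
    for z, cx, cy in calibration:
        d = math.hypot(ax - cx, ay - cy)
        if d < best_d:
            best_z, best_d = z, d
    return best_z, best_d


def is_valid(d_min, d_max):
    """Outlier filter: keep a particle image only if it lies close enough to the curve."""
    return d_min <= d_max


cal = sample_calibration(a_x_cal, a_y_cal, 0.0, 90.0, 1.0)
```

With these illustrative curves, a point offset by (−0.5, −0.5) px from the curve at z = 45 µm, roughly perpendicular to the calibration curve there, is still assigned to z = 45 µm at a distance of about 0.71 px. This mirrors the insensitivity near the center of the measurement volume discussed above; whether such a point passes the outlier filter depends on the chosen d_max.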

Flow measurements
Flow measurements were conducted as explained in section 2, and the corresponding images were processed as described in section 3. Apart from the determination of the particle positions, image preprocessing and data postprocessing were identical for the CM and the DNN. First, measurements were conducted with fluorescent particles solely of 5 µm in size. For the classical method and the novel method using the DNN, 696 325 and 844 901 valid particles were obtained, respectively; that is, approximately 20% more valid particles were detected with the deep neural network. The ensemble-averaged 2D velocity fields are depicted in figure 6. Evidently, more particles close to the top and the side walls of the microchannel were detected using the DNN. A possible reason is the following: the greater the penetration depth of the measurement volume, the larger the differences between the actual particle images and the ideal model of 2D Gaussian-shaped particle images, due to stronger optical aberrations. In addition, shadowing might affect the local intensity distribution of particle images close to a side wall. In both cases, particle images might be identified as outliers by the classical method and excluded from the data set. Apart from the voids close to the top of the microchannel, very similar 2D velocity fields were obtained with both methods. For a quantitative comparison, the measured centerline velocity profiles in the horizontal and vertical directions are depicted in figure 7. As can be seen, the velocity profiles of both methods coincide, indicating very good quantitative agreement not only in the horizontal but also in the vertical direction. In addition, the laminar velocity field in a microchannel with rectangular cross section was calculated using the solution for Poiseuille flow in such channel geometries given by Bruus [38].
For the estimated width W and height H of the microchannel used here, the calculated centerline velocity profiles are also depicted in figure 7. Theory and experiment agree very well. The mean absolute errors of the velocity, MAE_u, along both profiles were estimated; the values are listed in table 3. Very similar results were obtained for both methods, not only in the horizontal but also in the depth direction. The higher MAE_u for the vertical velocity profile results from larger deviations between the theoretical and measured velocity profiles within the first 30 µm from the top and bottom walls of the microchannel.
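The theoretical profiles and MAE_u can be sketched with the series solution for pressure-driven flow in a rectangular channel given by Bruus: for −W/2 ≤ y ≤ W/2 and 0 ≤ z ≤ H, u(y, z) = (4H²Δp / (π³ηL)) Σ_{n odd} (1/n³) [1 − cosh(nπy/H) / cosh(nπW/(2H))] sin(nπz/H). Because only the profile shape is compared, the sketch below drops the prefactor and scales the result to a prescribed centerline maximum u_max, analogous to the adapted maximum velocity used later for figure 9. The coordinate origin, the number of series terms and all names are assumptions; W and H are not given in this section, so the values used in the checks are hypothetical.

```python
import math


def _cosh_ratio(a, b):
    """cosh(a) / cosh(b) for |a| <= |b|, evaluated without overflow for large arguments."""
    a, b = abs(a), abs(b)
    return math.exp(a - b) * (1.0 + math.exp(-2.0 * a)) / (1.0 + math.exp(-2.0 * b))


def poiseuille_shape(y, z, width, height, n_terms=50):
    """Unscaled series solution (prefactor 4 H^2 dp / (pi^3 eta L) omitted)."""
    s = 0.0
    for k in range(n_terms):
        n = 2 * k + 1                                     # only odd n contribute
        ratio = _cosh_ratio(n * math.pi * y / height, n * math.pi * width / (2.0 * height))
        s += (1.0 - ratio) * math.sin(n * math.pi * z / height) / n ** 3
    return s


def poiseuille_velocity(y, z, width, height, u_max, n_terms=50):
    """Velocity scaled so that the channel center (y = 0, z = H/2) moves at u_max."""
    center = poiseuille_shape(0.0, height / 2.0, width, height, n_terms)
    return u_max * poiseuille_shape(y, z, width, height, n_terms) / center


def mae_velocity(u_measured, u_theory):
    """Mean absolute deviation between measured and theoretical velocities."""
    return sum(abs(m - t) for m, t in zip(u_measured, u_theory)) / len(u_measured)
```

For a very wide channel the vertical centerline profile reduces to the plane-Poiseuille parabola 4z(H − z)/H², which is a convenient sanity check of the truncated series.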
A second test measurement was conducted under the same experimental conditions, using the mixed particle solution with bimodal size distribution. The measured ensemble-averaged velocity fields are depicted in figure 8. Again, very similar velocity fields were obtained with both evaluation methods. Compared with the previous measurement, more velocity data were obtained close to the top corners. However, in the vicinity of the top of the microchannel, i.e. within 40 µm of the top channel wall, no reliable velocity measurements were possible. In addition, although the measurements were carried out under the same experimental conditions, the maximum velocity is approximately 15% higher. The reasons for these differences are unknown. However, as only an ordinary Poiseuille flow is considered, the two evaluation methods can still be compared with each other. The total number of detected particle images was about 2.84 million and 1.48 million for the CM and the DNN, respectively. The significantly higher number of detected particles for the CM results from a larger variation of the particle image intensities, which vary along the depth of the measurement volume due to defocus and now also because of the two different particle sizes. As the intensity of the particle images scales with the volume of the particles, a much lower intensity threshold was used for this measurement in order to also detect regions where particles of 2.5 µm in size are supposed to be present. As a consequence, overlapping particle images from particles strongly out of focus might be detected as well.
Most of these can be recognized and excluded from the data set by applying the outlier filter based on the maximum allowed Euclidean distance from the calibration curve in the a_x-a_y space. Together with the universal outlier detection applied during particle tracking for both methods, this reduced the number of valid particles to 644 525 and 1 094 516 for the classical method and the deep neural network, respectively. For the classical method, the number of valid particles is almost the same as for the flow measurement using only monodisperse particles. Now, however, almost 70% more valid particles were detected with the DNN, which considerably lowers the statistical uncertainty of the velocity measurement.
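"Universal outlier detection" usually refers to the normalized median test of Westerweel and Scarano, originally formulated for PIV vector fields. The sketch below adapts it to scattered PTV data by using the k nearest neighbouring particles of each particle; this adaptation, the neighbour count k = 8, the noise level eps and the threshold of 2 (typical values from the PIV literature) are assumptions, and the implementation used in this study is not described here. It treats one velocity component; in practice the test would be applied to each component.

```python
import math
import statistics


def normalized_median_residuals(points, values, k=8, eps=0.1):
    """For each particle i: r_i = |u_i - median(u_nb)| / (median(|u_nb - median(u_nb)|) + eps),
    where u_nb are the values of its k nearest neighbours (itself excluded).
    eps is in the units of `values` and guards against division by zero."""
    residuals = []
    for i, p in enumerate(points):
        # Sort the other particles by distance; tuples break ties by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nb = [values[j] for _, j in dists[:k]]
        med = statistics.median(nb)
        fluct = statistics.median(abs(v - med) for v in nb)
        residuals.append(abs(values[i] - med) / (fluct + eps))
    return residuals


def flag_outliers(points, values, k=8, eps=0.1, threshold=2.0):
    """Return a list of booleans, True where the normalized residual exceeds the threshold."""
    return [r > threshold for r in normalized_median_residuals(points, values, k, eps)]


# Toy example: uniform flow on a 5 x 5 particle grid with one spurious velocity.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
values = [1.0] * 25
values[12] = 5.0            # particle at (2, 2)
flags = flag_outliers(points, values)
```

The brute-force neighbour search is quadratic in the number of particles; for data sets with millions of particle images, a spatial index such as a k-d tree would be the natural replacement.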
The centerline velocity profiles in the horizontal and vertical directions for both evaluation methods are depicted in figure 9, together with the theoretically expected velocity profiles with adapted maximum velocity for comparison. For the horizontal profile, very good quantitative agreement is found even very close to the channel side walls. For the vertical profile, both evaluation methods agree well with the theoretical profile, particularly within the first half of the channel height. Above the apex, the measured profiles deviate more strongly from the theoretical one; evidently, the greater the penetration depth of the measurement volume, the larger the deviations. However, the profiles obtained from both evaluation methods coincide well, indicating that no differences exist between the evaluation methods. The estimated MAE_u (see table 3) confirms the visual agreement: both methods yield comparable mean deviations from the theoretical velocity profile, although the values are slightly higher than those derived from the measurement with monodisperse particles.
Surprisingly, no significant difference exists between the CM and the DNN when using the particle solution with bimodal size distribution. This can be explained as follows. (i) According to the results of the calibration measurement (see figure 5), the maximum standard deviation σ_z and the maximum MAE_z are almost 100 times smaller than the channel size. Hence, the structure of the laminar flow is large compared with the expected uncertainty due to the use of the mixed particle solution; in smaller microchannels, however, the influence would be significant. (ii) The light intensity emitted by particles of 2.5 µm is almost one order of magnitude lower than that emitted by particles of 5 µm in diameter. Thus, the probability of detecting small particles close to the margins of the measurement volume, where the corresponding particle images are strongly defocused and the maximum standard deviation and maximum MAE occur, is low. (iii) When applying the outlier filter that limits the maximum Euclidean distance of the measured points from the calibration curve in the a_x-a_y space, the remaining particle images of small particles located close to or in between the two focal points of the astigmatic system will be excluded.
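Point (ii) follows directly from the assumption stated above that the emitted intensity scales with the particle volume, i.e. with d_p³:

```latex
\frac{I(d_p = 2.5\,\mu\mathrm{m})}{I(d_p = 5\,\mu\mathrm{m})}
  \approx \left(\frac{2.5\,\mu\mathrm{m}}{5\,\mu\mathrm{m}}\right)^{3}
  = \left(\frac{1}{2}\right)^{3}
  = \frac{1}{8} = 0.125
```

A reduction by a factor of eight is what "almost one order of magnitude" refers to; deviations from pure volume scaling, e.g. due to dye loading, are not considered in this estimate.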

Conclusion
The use of a deep neural network (DNN) for measuring volumetric velocity distributions in microfluidics with astigmatic particle tracking velocimetry (APTV) was investigated. For this purpose, the classical evaluation method (CM), based on the Euclidean distance approach, was compared with a three-stage cascaded convolutional neural network. The first two stages detect individual particle images within the captured images, while the third estimates the depth-position of the corresponding particles. A comprehensive comparison between both evaluation methods was carried out, starting from artificial particle images with known ground truth, via calibration measurements with different particle sizes, to flow measurements in a microchannel employing monodisperse particles and particles with bimodal size distribution as tracers. This comparison covers the uncertainty of the particle position depending on the size distribution of the tracer particles and on the particle image quality, which is affected by the SNR and by the optical aberrations present in real experiments. Regarding the uncertainty of the particles' depth-position, the following can be stated:
• The classical evaluation method works best if optical aberrations can be neglected and high-quality monodisperse particle tracers are used. In that case, the standard deviation σ_z limits the uncertainty of the particle depth-position.
• The DNN is very robust to noise; however, for the artificial particle images, the mean absolute error (MAE) of the particle depth-position as well as its standard deviation σ_z are one order of magnitude higher than those obtained with the classical evaluation method.
• If optical aberrations come into play, the MAE of the particle position for the CM increases by about one order of magnitude, while the uncertainty obtained with the DNN decreases. For real particle images, both methods show very comparable uncertainties in terms of the MAE and the standard deviation of the z-position, provided that an APTV setup with low optical aberrations and monodisperse tracer particles are used.
• In the case of a nonmonodisperse size distribution of the particle tracers, the MAE and the standard deviation obtained with the classical evaluation method increase continuously towards the margins of the measurement volume, by about one order of magnitude. In contrast, the MAE and the standard deviation obtained with the DNN are not affected and remain almost constant over the entire depth of the measurement volume.
These findings can be explained as follows. The more the experimental conditions and particle images deviate from the basic assumptions made in the classical evaluation method, the better the result obtained with the DNN compared with the CM. There are two main reasons for this. Firstly, real particle images deviate from the ideal Gaussian-shaped particle image assumed in the classical evaluation method, while the DNN exploits the distinctive features of the particle images caused by optical aberrations. Hence, the novel evaluation method already promises very robust and reliable velocity measurements even when strong optical aberrations are present, e.g. due to the use of a low-cost APTV setup or an optical access of low quality, as often found in applications where measurements have to be conducted in vivo. Secondly, the CM is based on parametric calibration functions relying on a fixed particle size, whereas the DNN can be trained for several particle sizes at the same time.
Besides the advantages of DNNs for APTV, the present study revealed another important finding: the classical evaluation method based on the Euclidean distance approach is extremely robust, provided that optical aberrations can be kept at a moderate level. Even if nonmonodisperse particles are used, reliable flow measurements may be possible, as demonstrated by the measurement of the Poiseuille flow inside a microchannel with rectangular cross section. With the APTV setup used here and a particle solution with bimodal size distribution having distinct narrow peaks at 2.5 µm and 5 µm, the maximum MAE and standard deviation of the particle position, occurring close to the margins of the measurement volume, were less than 10 µm. Hence, as long as the flow of interest exhibits comparably large structures and laminar behavior without a pronounced out-of-plane component, the classical evaluation method can be applied with confidence, even if the basic requirement of using high-quality monodisperse particles is violated. Moreover, a standard deviation of σ_z < 1 µm can be ensured by limiting the depth of the measurement volume to the region close to and in between both focal points of the APTV system. In this way, three-dimensional, three-component velocity measurements with very low uncertainty can be achieved, comparable to those obtained with monodisperse particle tracers.
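Limiting the measurement depth in this way can be expressed as a simple selection on a depth-resolved σ_z profile, such as the one obtained from a calibration measurement. The sketch below returns the contiguous depth range around a reference position in which σ_z stays below a threshold. Using the middle bin (or, e.g., the midpoint between the two focal planes) as the reference is an assumption, the function name is hypothetical, and the profile values in the checks are invented for illustration.

```python
def usable_depth_range(z_centers, sigma_z, threshold=1.0, z_ref=None):
    """Return (z_lo, z_hi), the bounds of the contiguous run of depth bins with
    sigma_z < threshold that contains the bin nearest to z_ref, or None if that
    bin itself fails the criterion. z_ref defaults to the middle bin."""
    pairs = sorted(zip(z_centers, sigma_z))
    zs = [z for z, _ in pairs]
    ss = [s for _, s in pairs]
    if z_ref is None:
        i0 = len(zs) // 2
    else:
        i0 = min(range(len(zs)), key=lambda i: abs(zs[i] - z_ref))
    if ss[i0] >= threshold:
        return None
    lo = hi = i0
    while lo > 0 and ss[lo - 1] < threshold:             # grow towards smaller z
        lo -= 1
    while hi < len(zs) - 1 and ss[hi + 1] < threshold:   # grow towards larger z
        hi += 1
    return zs[lo], zs[hi]
```

Requiring a contiguous range avoids accepting isolated low-σ_z bins near the margins, where the statistics are typically based on fewer particles.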
In the future, further improvements are expected for 3D velocity measurements using APTV with the novel evaluation approach presented herein, by employing more sophisticated network architectures better adapted to the needs of APTV and by using more comprehensive training data, including noise and overlapping particle images. For the latter, well-elaborated training strategies are required. With such training data, neural networks may outperform the classical evaluation method for APTV by identifying more particle images with lower particle position uncertainty, even when optimized optical measurement setups are used. In addition, neural networks can be trained to extend the applicability of this measurement technique, not only to estimate the 3D position of polydisperse particles with very low uncertainty, but also to determine their size and shape at the same time. This opens up new possibilities for APTV, to be used not only for velocity measurements but also for clustering particles by type within a single data evaluation step. Such applications are often found in medicine or biology with flow-based assays, which increasingly use microfluidic devices to separate, detect and analyze cells and particles of multiple species or sizes. In addition, lower computational costs are expected, as recently demonstrated for PIV applications. Hence, neural networks hold a lot of promise for APTV and other defocus techniques, helping them to gain access to other disciplines where nonexpert users can make use of this advanced measurement technique.