Fast and robust Fourier domain-based classification for on-chip lens-free flow cytometry.

The development of portable haematology analysers is receiving increased attention due to their deployability in resource-limited or emergency settings. Lens-free in-line holographic microscopy is one of the technologies being pushed forward in this regard, as it eliminates complex and expensive optics, making miniaturisation and integration with microfluidics possible. On-chip flow cytometry enables high-speed capturing of individual cells in suspension, giving rise to high-throughput cell counting and classification. To perform a real-time analysis on this high-throughput content, we propose a fast and robust framework for the classification of leukocytes. The raw data consist of holographic acquisitions of leukocytes, captured with a high-speed camera as they flow through a microfluidic chip. Three types of leukocytes are considered: granulocytes, monocytes and T-lymphocytes. The proposed method bypasses the reconstruction of the holographic data altogether by extracting Zernike moments directly from the frequency domain. By doing so, we introduce robustness to translations and rotations of cells, as well as to changes in the distance of a cell with respect to the image sensor, achieving classification accuracies of up to 96.8%. Furthermore, the reduced computational complexity of this approach, compared to traditional frameworks that involve the reconstruction of the holographic data, allows for very fast processing and classification, making it applicable in high-throughput flow cytometry setups.


Introduction
Leukocytes or white blood cells play an important role in the body's immune system as they are responsible for protecting it from infections and invading organisms. Leukocyte counts are performed to detect abnormal balances of leukocytes, which are often caused by an underlying condition. Leukocyte differentials are performed to determine the concentration of each type of white blood cell present in a subject's blood. Both leukocyte counts and leukocyte differentials are essential measures of the subject's health, as they can detect hidden infections and alert specialists to specific medical conditions, such as autoimmune diseases and blood disorders. Both tests are also used to monitor the effectiveness of chemotherapy during cancer treatment [1]. A fast, reliable and accurate method for leukocyte characterization remains an ongoing research goal [2].
Considerable efforts are made to develop portable haematology analysers that can perform 3-part leukocyte differentials at the point-of-care or in situations where a rapid analysis is crucial. Such smaller, preferably hand-held, devices play a crucial role in healthcare delivery in developing countries by bringing medical diagnosis closer to patients [3]. They also have great utility in settings such as ambulances, pharmacies, doctor's practices, etc. In this context, lens-free in-line holographic microscopy, originally introduced by Dennis Gabor [4], is a promising technology that is capable of extracting enough morphological information from a cell in a label-free manner (i.e. without using fluorescent markers). The advantages of digital in-line microscopy are numerous: the relatively simple optics and hardware requirements allow for inexpensive and compact solutions for three-dimensional imaging. Furthermore, it enables capturing specimens at a high rate, providing a high-throughput microscopy platform.
A digital in-line holographic microscope (DIHM) conceptually works as follows: a laser is directed onto a pinhole with a diameter of the order of the wavelength of the laser. The pinhole acts as a point source from which a spherical wave emanates. The wave illuminates an object and forms a magnified diffraction pattern on a digital image sensor, in many cases a CCD or a CMOS chip. If the scattered wave from the object is small compared to the unscattered reference wave, the interference pattern on the sensor constitutes a hologram, which is then stored as a digital image [5]. The spherical wavefront of the point source acts as an ideal flat lens transformation, magnifying the captured object and increasing the resolution of the entire acquisition setup. Lens-free microscopy has already been used for imaging a wide variety of biological specimens, such as marine microorganisms [5], pollen spores [6] and various types of blood cells [7,8].
The relative simplicity of the digital in-line holographic microscopy setup makes it a suitable candidate for integration with microfluidic devices, giving rise to new opportunities in the field of optofluidics. With these setups, spatial resolutions down to a few microns can be achieved, and temporal resolutions are only limited by the speed of the camera sensor (usually several hundreds of frames per second can be captured) [9]. In [10], a compact lens-free imaging flow cytometer was introduced, capable of capturing individual blood cells flowing through a microfluidic chip, using a high-speed camera. The different types of cells flow over the imaging area of the microfluidic channel. A waveguide, acting as a point source, and a CMOS imaging sensor are placed on either side of the microfluidic chip. A laser illuminates the passing cells, and the scattered and unscattered light, forming an interference pattern (or hologram), is then captured by the imaging sensor (see Fig. 1 for a schematic of the acquisition setup). In combination with the high-rate microfluidic cell sorter chip presented in [11], the lens-free imaging flow cytometer of [10] constitutes a powerful tool for high-rate and affordable clinical cell sorting. A schematic of the flow cytometry chip, whose primary function is the fast sorting of cells, is depicted in Fig. 2. The goal is to determine the type of the captured cell, flowing through the microfluidic channel, by the time it arrives at the microfluidic switch. The cell is then directed to a specific outlet according to its type. Microheaters, located in this area, produce microscopic steam bubbles, pushing each cell to the desired outlet. The distance between the holographic imaging and the sorting circuitry is crucial, as longer distances decrease the accuracy of the sorting. Working at longer distances would require additional circuitry for more accurate positioning of the cells on top of the sorter circuitry.
Additionally, the cells are flowing in the channel with speeds in the order of a few mm per second. Those speeds are not a problem for the image capturing, but they do put a strong restriction of only a few milliseconds on the total amount of time available to image, identify and classify every single cell. This simple, fast and cost-effective solution, developed at the research center imec, aims to replace the traditional microscope. Furthermore, its compact form allows for a parallelized setup, drastically increasing the throughput of cells to be analysed.
To identify an individual cell and direct it to a specific microfluidic outlet, as depicted in Fig. 2, its digital acquisition needs to be processed and robust features need to be extracted. In [10], a three-part WBC differential was presented, capable of distinguishing granulocytes, monocytes and T-lymphocytes from each other. Features that characterize internal complexity and cell size are extracted from reconstructions of the raw holographic data. The method was validated by performing a 3-part classification on a minimally processed whole blood sample and compared to a conventional haematology analyser. The same authors later extended their setup by fitting it with a triggering mechanism for fluorescence-labeled single cells, ensuring a more reliable "ground-truth" labeling [12]. In the same work, an extended range of image features was investigated to achieve a robust three-class leukocyte classification. It has to be noted that the fluorescence labelling in [12] is used solely as a validation tool for the assessment of cell classification frameworks (including the one presented here), and will not be present in the final system. While the leukocyte classification pipeline proposed in [12] shows good classification performance, one of its main issues is computation time: the cell detection and the multiple propagations required for automated refocusing do not scale well to the real-time classification of a large number of cells (even with GPU acceleration). Our aim is therefore to reduce the feature space efficiently and in an intelligent manner, allowing for cell classification within less than 1 ms.
The cell sorter system has the following relevant properties and requirements, which can be used for the design of a robust classification framework:
• Almost all of the relevant information is concentrated in the lower frequencies due to the geometry of the setup;
• Translations parallel to the camera plane should not affect classification performance;
• The distance of the cell to the image sensor should not affect classification performance;
• The proposed classification method should be rotationally invariant.
Given these observations, we propose to extract features from the frequency domain, as this has a number of advantages. First of all, taking the 2D Fourier transform of an image is not costly, as the reconstruction of the hologram is avoided altogether, and it can easily be implemented on a GPU. Furthermore, the Fourier transform has the significant advantages that it is inherently invariant to translations and that it concentrates the relevant information of the cell close to the DC component of the signal. It is important to note that a fast classification method using the raw data acquired with the setup of [10] was described in [2], with a classification accuracy of up to 89%. However, the proposed integrated optical neural network was only simulated in Matlab, and it is unclear how translation and rotation invariance were achieved.
The paper is organized as follows. In the following section, the details of our proposed method are described together with the motivation for the chosen features. Section 3 describes the experiments that were conducted and provides a comparison with the traditional pipeline that includes the reconstruction of the holographic data. Section 4 offers conclusions as well as future work.

Proposed method
As stated in Section 1, our goal is to produce an efficient classification pipeline that is robust to translations and rotations of a cell, as well as to changes in its distance to the image sensor. We will address each of these invariance requirements in turn.

Working in the Fourier domain -achieving translation and depth invariance
The propagation of light under scalar diffraction theory is modeled by the Huygens-Fresnel principle, which is generally not efficiently computable for arbitrary surfaces. However, in the case of a planar surface, diffraction reduces to a convolution. This can be described by the angular spectrum of plane waves [13]. The Angular Spectrum Method (ASM) [14] is based on the notion that the propagation of a wavefield can be described by the propagation of its spectrum: the Fourier coefficients of the signal correspond to plane waves, each undergoing a different phase delay depending on the propagation distance. The ASM is implemented as follows:

    U_z = F^{-1} { F{U} · H_z },   with   H_z(ω_x, ω_y) = exp( i z sqrt( (2π/λ)^2 − ω_x^2 − ω_y^2 ) ),   (1)

where U stands for the wavefield, H_z is the ASM kernel for propagation distance z, expressed in the Fourier domain using the angular frequencies ω_x and ω_y, λ is the illumination laser wavelength and F is the Fourier operator. The ASM is a common reconstruction method to recover the amplitude and phase images from a captured hologram [10]. We return to the ASM in Section 3 when comparing the performance of our proposed classification pipeline with the traditional pipeline that reconstructs the holographic data. Motivated by Eq. (1), we first take the Fourier transform of the hologram and take its absolute value. This operation achieves invariance for both lateral and depth translations: lateral translations are equivalent to a point-wise multiplication with a phasor, and depth translations are equivalent to propagations with the ASM kernel, both of which only modify the phase in the Fourier domain.
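The two invariances can be verified numerically. The sketch below, a minimal numpy illustration, checks that the Fourier magnitude of a (synthetic) hologram is unchanged by a lateral shift and by ASM propagation; the 532 nm wavelength and 50 µm depth are illustrative assumptions (the 5.5 µm pixel pitch matches the sensor described later), and np.roll models the lateral shift as circular, which is only an approximation of a real cell displacement.

```python
import numpy as np

def asm_kernel(shape, z, wavelength, pitch):
    """ASM transfer function H_z evaluated in the Fourier domain.

    H_z is phase-only (|H_z| = 1 for propagating frequencies), so changing
    the cell-to-sensor distance leaves the Fourier magnitude untouched.
    """
    ny, nx = shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    # evanescent components clamped to zero phase delay
    kz = 2 * np.pi * np.sqrt(np.maximum((1 / wavelength) ** 2 - FX ** 2 - FY ** 2, 0.0))
    return np.exp(1j * z * kz)

rng = np.random.default_rng(0)
holo = rng.random((64, 64))                               # stand-in hologram

shifted = np.roll(holo, (5, -3), axis=(0, 1))             # lateral translation
H = asm_kernel(holo.shape, z=50e-6, wavelength=532e-9, pitch=5.5e-6)
propagated = np.fft.ifft2(np.fft.fft2(holo) * H)          # depth translation

mag = np.abs(np.fft.fft2(holo))
assert np.allclose(mag, np.abs(np.fft.fft2(shifted)))     # translation invariance
assert np.allclose(mag, np.abs(np.fft.fft2(propagated)))  # depth invariance
```

Both assertions hold to machine precision, since the shift phasor and the ASM kernel are both unit-magnitude in the Fourier domain.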

Zernike moments -achieving rotation invariance
Zernike polynomials are a sequence of functions which are orthogonal on the unit disk. Their orthogonality allows them to represent properties of an image without redundancy or overlap of information between these moments. The magnitudes of Zernike moments are independent of the rotation angle of the object in a region of interest, making them a popular tool for a wide variety of applications in optics and image processing [15]. For example, they are used as shape descriptors in biomedical applications, such as the accurate modeling of corneal surfaces [16], the classification of benign and malignant breast masses [17], and measuring systematic changes in the shape of cancer cells [18].
The (real-valued) Zernike polynomials are defined on the unit disk in polar coordinates (ρ, ϕ), with 0 ≤ ρ ≤ 1. For all integers n ≥ 0 and 0 ≤ m ≤ n with (n − m) even, they are given by

    Z_n^m(ρ, ϕ) = R_n^m(ρ) cos(mϕ),   Z_n^{−m}(ρ, ϕ) = R_n^m(ρ) sin(mϕ),   (2)

where

    R_n^m(ρ) = Σ_{k=0}^{(n−m)/2} [ (−1)^k (n − k)! / ( k! ((n+m)/2 − k)! ((n−m)/2 − k)! ) ] ρ^{n−2k}.   (3)

As described in [15], the features (Z_n^{−m})^2 + (Z_n^m)^2 are rotationally invariant for all valid m and n. Each of these Zernike functions is pre-computed and multiplied with the absolute value of the Fourier-transformed image, obtained as described in Section 2.1.
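The rotation invariance of the squared-pair features can be illustrated with a small numpy sketch. This is a bare-bones implementation of Eqs. (2)-(3) on a sampled unit disk (normalization constants and sub-pixel interpolation are omitted); the 90° test rotation is chosen because it maps the sampling grid exactly onto itself.

```python
import numpy as np
from math import factorial

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^m (n >= m >= 0, n - m even), Eq. (3)."""
    R = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k) * factorial((n + m) // 2 - k)
                * factorial((n - m) // 2 - k)))
        R += c * rho ** (n - 2 * k)
    return R

def rotation_invariant(img, n, m):
    """(Z_n^{-m})^2 + (Z_n^{m})^2 over the unit disk inscribed in img."""
    N = img.shape[0]
    y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
    rho, phi = np.hypot(x, y), np.arctan2(y, x)
    disk = rho <= 1.0
    R = radial_poly(n, m, rho)
    z_cos = np.sum(img * R * np.cos(m * phi) * disk)   # Z_n^{m} component
    z_sin = np.sum(img * R * np.sin(m * phi) * disk)   # Z_n^{-m} component
    return z_cos ** 2 + z_sin ** 2

rng = np.random.default_rng(1)
img = rng.random((65, 65))
f0 = rotation_invariant(img, 4, 2)
f90 = rotation_invariant(np.rot90(img), 4, 2)          # rotate image by 90°
assert np.isclose(f0, f90)                             # feature is unchanged
```

Under rotation, the cosine and sine moments mix like a 2D rotation, so their sum of squares stays fixed, which is exactly the property exploited by the proposed features.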

Dataset description and data preprocessing
The data used to showcase the proposed classification pipeline consist of grayscale images taken with the setup detailed in [12]. The CMV2000 CMOS image sensor, with a pixel size of 5.5 µm, produces 8-bit grayscale images with a resolution of 2048 × 1088 pixels. We consider two types of images: content images, which contain a cell, and background images, acquired when no cell is present in front of the image sensor (see Fig. 3 for an example of both). Both types of images are necessary, as the two classification pipelines considered require subtracting an empty background from each content image as a pre-processing step to enhance the signal-to-noise ratio of the holographic image. Reconstructing the hologram without subtracting the background leads to unwanted residual interference fringes [19]. Since the acquisition setup is fitted with a fluorescence triggering mechanism, it is relatively easy to estimate when a cell is to be acquired together with its corresponding background image. Furthermore, the fluorescence-labeled cells provide us with a 'ground-truth' labeling, offering a more reliable means to assess the proposed classification framework for a three-part leukocyte differential. While for many bio-applications multiple fluorescent labels are used to increase the confidence level of cell type identification, only one fluorescent label could be used in our experiments due to limitations in the acquisition setup. Furthermore, a PMT (photomultiplier tube) was used in combination with a manually selected threshold for the trigger signal. Hence, it is not possible to determine how strongly a specific fluorescent label is expressed by an individual cell, only that its expression exceeds a predefined threshold. It may also happen that specific cells do not express a fluorescent label as much as others. Overall, these factors might result in mislabeled training samples and the (potentially) wrong identification of cell types on an individual cell basis.
We also observe that in some cases images labeled as content images are in fact empty. Such frames can be detected by means of their energy in the frequency domain: images that do not contain cells typically have a lower energy signature than content images.
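A minimal sketch of this spectral-energy check is given below. The synthetic frames, the blob that stands in for a cell's diffraction pattern, and the threshold value are all illustrative; in practice the threshold would be calibrated on known background frames.

```python
import numpy as np

def is_empty_frame(img, background, threshold):
    """Detect a mislabeled 'content' frame that contains no cell.

    After background subtraction, a cell-free frame carries little energy
    outside the DC component, so its spectral energy per pixel stays below
    a threshold calibrated on known background frames.
    """
    diff = img.astype(float) - background.astype(float)
    spectrum = np.fft.fft2(diff)
    spectrum[0, 0] = 0.0                                  # ignore mean offset
    energy = np.sum(np.abs(spectrum) ** 2) / diff.size ** 2
    return energy < threshold

rng = np.random.default_rng(2)
background = 100 + rng.normal(0, 1, (64, 64))
empty = background + rng.normal(0, 1, (64, 64))           # sensor noise only
content = empty.copy()
content[24:40, 24:40] += 50                               # synthetic "cell" pattern

assert is_empty_frame(empty, background, threshold=10.0)
assert not is_empty_frame(content, background, threshold=10.0)
```

By Parseval's theorem the same quantity could be computed in the spatial domain; the frequency-domain form is shown because the |FFT| is already computed in the proposed pipeline.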
After processing the dataset, we obtain a total of roughly 18,400 raw cell acquisitions, each with a corresponding background image. The populations of monocytes and granulocytes consist of about 8,800 and 8,000 instances respectively, while the T-lymphocytes are less represented, with a total of 1,600 instances.

Classification pipeline with reconstruction
The classification framework considered here is similar to the one presented in [12] and will be used as a frame of reference for comparison with the newly proposed framework in Section 3.3. An overview of all processing steps is depicted in Fig. 4. Prior to reconstruction, the background is subtracted from each content image (see the example on the right-hand side of Fig. 3). To obtain the reconstructed amplitude and phase images of a single leukocyte, a crop of 600 × 600 pixels centered around the cell is reconstructed numerically using angular spectrum propagation, as expressed in Eq. (1). The reconstruction depth, which determines the magnification factor, is chosen as the depth of best focus of the cell image. A wide range of measures and methods to estimate the focus of an image has been proposed in the literature, including spatial-, spectral- and sparsity-based methods [20]. We observed that a combination of three measures is the most robust solution for our data. Two of the measures are based on the gradient mean and standard deviation, the motivation being that the sharpest reconstruction has the highest gradient energy. The third metric uses entropy to measure the sharpness of the reconstruction, as proposed in [21] (for more details, we refer the reader to [12]). It has to be noted that estimating the reconstruction depth for each cell (or for consecutive cells) is computationally expensive and constitutes a bottleneck for the fast processing of the high-throughput holographic data.
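The depth-sweep autofocus described above can be sketched as follows. This toy example uses only the gradient-energy criterion (one of the three measures mentioned); the Gaussian object, the candidate depths and the 532 nm wavelength are illustrative assumptions, not the paper's data. It defocuses a sharp object with the ASM and then recovers the propagation distance at which refocusing is sharpest.

```python
import numpy as np

def propagate(field, z, wavelength, pitch):
    """Propagate a complex wavefield over a distance z with the Angular Spectrum Method."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    kz = 2 * np.pi * np.sqrt(np.maximum((1 / wavelength) ** 2 - FX ** 2 - FY ** 2, 0.0))
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * z * kz))

def gradient_energy(amplitude):
    """Sharpness score: total squared gradient magnitude (higher = sharper)."""
    gy, gx = np.gradient(amplitude)
    return float(np.sum(gx ** 2 + gy ** 2))

# Synthetic check: defocus a sharp object, then recover the refocusing depth.
wavelength, pitch, z_true, N = 532e-9, 5.5e-6, 2.0e-3, 128
yy, xx = np.mgrid[-N // 2:N // 2, -N // 2:N // 2] * pitch
obj = np.exp(-(xx ** 2 + yy ** 2) / (16.5e-6) ** 2).astype(complex)  # Gaussian "cell"
hologram_plane = propagate(obj, z_true, wavelength, pitch)

depths = np.arange(0.5e-3, 4.0e-3, 0.5e-3)        # candidate reconstruction depths
scores = [gradient_energy(np.abs(propagate(hologram_plane, -z, wavelength, pitch)))
          for z in depths]
best = depths[int(np.argmax(scores))]
assert np.isclose(best, z_true)                    # sharpest at the true depth
```

The sweep makes the bottleneck noted above concrete: every candidate depth costs a full forward and inverse FFT per cell, which is precisely what the reconstruction-free pipeline of Section 3.3 avoids.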
Once a hologram is reconstructed, a number of representative features are extracted from the leukocyte contained within. The features considered, eight for each cell, are essentially a set of measures that describe the shape characteristics of the leukocyte and the textural characteristics of its nucleus.

(Fig. 4: Acquisition → Detection → Reconstruction → Feature extraction → Classification.)
The four shape features, derived from the phase image after unwrapping it with the method presented in [22] and thresholding it with Otsu's method [23], consist of the area, the eccentricity, and the mean and standard deviation of the radius. The textural features consist of well-known statistical features (i.e. contrast, correlation, homogeneity and energy) derived from the gray-level co-occurrence matrix [24], constructed from the amplitude image of the reconstructed cell. To induce rotation invariance, the four statistical features are averaged for each cell over the angles 0, 45, 90 and 135 degrees. As can be observed from the reconstructions in Fig. 5, and confirmed in the specialized literature, T-lymphocytes are typically smaller than the two other types of leukocytes. Granulocytes can be distinguished from monocytes by their rougher cell interior, due to the presence of granules in their cytoplasm. Both of these observations are confirmed by the extracted features: in the scatter plot depicted in Fig. 6, T-lymphocytes clearly have a smaller radius and granulocytes are characterised by a slightly larger contrast value.
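The angle-averaged co-occurrence features can be sketched with a minimal numpy reimplementation (optimized equivalents exist, e.g. scikit-image's graycomatrix/graycoprops). Only the contrast feature is shown; the quantised random image is a stand-in for the amplitude image of a reconstructed cell. The final assertion checks the property the averaging is meant to provide: the feature is unchanged under a 90° rotation of the image.

```python
import numpy as np

def glcm(img, dy, dx, levels):
    """Gray-level co-occurrence matrix for a single pixel offset (dy, dx)."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            P[img[y, x], img[y + dy, x + dx]] += 1
    return P / P.sum()                      # normalise to joint probabilities

def contrast(P):
    """GLCM contrast: sum_ij P[i,j] * (i - j)^2."""
    i, j = np.indices(P.shape)
    return float(np.sum(P * (i - j) ** 2))

# Offsets for 0, 45, 90 and 135 degrees at distance 1; averaging over them
# makes the texture feature insensitive to 90-degree rotations of the cell.
offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]

rng = np.random.default_rng(3)
img = rng.integers(0, 8, (32, 32))          # quantised amplitude image stand-in
feat = np.mean([contrast(glcm(img, dy, dx, levels=8)) for dy, dx in offsets])
feat_rot = np.mean([contrast(glcm(np.rot90(img), dy, dx, levels=8))
                    for dy, dx in offsets])
assert np.isclose(feat, feat_rot)           # angle-averaged contrast is unchanged
```

Rotating the image by 90° merely permutes the four directional co-occurrence matrices (up to transposition, which leaves contrast unchanged), so their average is exactly preserved.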

The classifier that we consider for both pipelines is the well-established Random Forest classifier [25], and in all experiments 10-fold cross-validation is performed. Bear in mind that all the processing steps that precede feature extraction not only increase the computational cost per cell, but also increase the chances of introducing errors in the data. For example, the cell might not be well centered, or the phase unwrapping and subsequent thresholding might not be ideal, having a direct impact on the features related to the leukocyte shape. The classification accuracy obtained with this method on a population of 8,000 granulocytes, 8,800 monocytes and 1,600 T-lymphocytes is 92%. The confusion matrix in Table 1 shows that the T-lymphocytes are rather well separated, while most classification errors occur between granulocytes and monocytes.
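The classification stage can be sketched with scikit-learn. The Gaussian clusters below are a synthetic stand-in for the eight-dimensional cell features (the cluster locations and sample counts are purely illustrative); the sketch shows the Random Forest plus 10-fold cross-validation protocol used for both pipelines.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: three "cell types" as Gaussian clusters in an
# 8-dimensional feature space (mimicking the eight features per cell).
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 8))
               for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 100)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)     # 10-fold cross-validation
assert scores.mean() > 0.9                      # well-separated toy classes
```

With real cell features the folds would be built from the roughly 18,400 acquisitions, and the per-fold confusion matrices would be accumulated into tables such as Table 1.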

Classification pipeline without reconstruction
Just as in the classification framework described in Section 3.2, each content image needs to have its background subtracted. While in the previous pipeline centering each cell is crucial for a proper reconstruction, in the new pipeline proposed here the image is cropped around the center of the image, disregarding the actual position of the cell. This center is not the geometric center of the image but is defined as the center of the laser illumination area.

Table 1. Confusion matrix for the pipeline with reconstruction.

            Gran    Mono    T-cell   Total
  Gran      7338    685     2        8025
  Mono      581     8194    55       8830
  T-cell    5       57      1593     1655

Keeping in mind the observation that most of the information is concentrated in the lower frequencies of the image, we resize the cropped 600 × 600 leukocyte image to 32 × 32 pixels (the size of 32 × 32 pixels was determined experimentally). This is functionally (almost) equivalent to cropping the Fourier transform of the image around the DC coefficient, but computationally more efficient: it has a time complexity of O(n) instead of O(n log n) for a pixel count of n. The absolute value of the Fourier-transformed 32 × 32 image is then multiplied with each pre-computed Zernike function. We reduce the first 28 real Zernike polynomials (up to order 6) to 16 rotationally invariant features, as described in Section 2.2, and obtain a classification accuracy of 94.8%. As can be observed in the confusion matrix in Table 2, the T-lymphocytes are again well separated from the two other types of leukocytes. However, we still observe some confusion between granulocytes and monocytes. We repeated the same experiment with an increased number of Zernike polynomials, up to order 22, resulting in a total of 144 rotationally invariant features per leukocyte. The obtained classification accuracy is now 96.8% (the corresponding confusion matrix can be found in Table 3).
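The reconstruction-free feature extraction can be sketched end to end. This is a simplified stand-in: scipy.ndimage.zoom replaces whatever resizing routine the deployed system uses, only four example Zernike bases are pre-computed (the paper uses all orders up to 6 or 22), and the random crop stands in for a background-subtracted hologram.

```python
import numpy as np
from math import factorial
from scipy.ndimage import zoom

def zernike_basis(n, m, N=32):
    """Real Zernike pair (cos, sin) of order (n, m) sampled on an N x N grid."""
    y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
    rho, phi = np.hypot(x, y), np.arctan2(y, x)
    R = sum((-1) ** k * factorial(n - k)
            / (factorial(k) * factorial((n + m) // 2 - k)
               * factorial((n - m) // 2 - k))
            * rho ** (n - 2 * k) for k in range((n - m) // 2 + 1))
    disk = rho <= 1.0
    return R * np.cos(m * phi) * disk, R * np.sin(m * phi) * disk

def features(hologram_crop, bases):
    """Rotation-invariant features from the Fourier magnitude of a hologram crop."""
    small = zoom(hologram_crop, 32 / hologram_crop.shape[0])  # low-pass + subsample
    mag = np.abs(np.fft.fftshift(np.fft.fft2(small)))         # DC at the centre
    return np.array([(np.sum(mag * c)) ** 2 + (np.sum(mag * s)) ** 2
                     for c, s in bases])

# Pre-compute a few example bases; the deployed system uses many more orders.
bases = [zernike_basis(n, m) for n, m in [(0, 0), (2, 0), (2, 2), (4, 2)]]
rng = np.random.default_rng(5)
crop = rng.random((600, 600))                                 # stand-in crop
f = features(crop, bases)
assert f.shape == (4,) and np.all(f >= 0)
```

Note the fftshift: it centres the DC coefficient so the unit disk of the Zernike bases sits on the low frequencies where, per the setup geometry, the relevant information is concentrated.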
Compared to the previous experiment, we observe an improvement in the classification of granulocytes and monocytes, indicating that discriminative information is contained in the higher-order Zernike polynomials.

A quick study on complexity
The motivation for the proposed classification framework is twofold: firstly, to increase the robustness of the system by introducing invariance to translation, rotation and depth; secondly, to make the processing pipeline computationally efficient enough for deployment on an optofluidic platform. The traditional pipeline described in Section 3.2, involving the reconstruction of the leukocyte prior to feature extraction, consists of several processing steps that slow down the entire system. As a pre-processing step, the cell interference pattern needs to be centered, and two Fourier transform operations have to be performed for reconstruction (not to mention the steps required to determine the optimal reconstruction depth). Subsequently, the feature extraction consists of processing the reconstructed phase image to construct shape features and of several matrix manipulations to obtain the features related to the structure of the cell nucleus. Needless to say, each of these operations slows down the entire system considerably. The processing of a single cell in an unoptimized Matlab implementation on a machine with a 2.6 GHz Intel Core i7 can easily take up to 500 ms. This includes the time for pre-processing, a single reconstruction with angular spectrum propagation (assuming the proper reconstruction depth is known) and feature extraction. The proposed classification pipeline (described in Section 3.3), by contrast, consists of a downscaling operation followed by a single Fourier transform, and the feature extraction is reduced to the multiplication of each cell with pre-computed Zernike bases. Based on the results portrayed in Tables 2 and 3, we notice that the higher-order polynomials contain information that better differentiates the granulocytes from the monocytes, indicating that the difference between these two types of leukocytes is subtle.
To further improve the performance of the proposed system, a study was conducted to identify the features that contribute most to the classification accuracy. A popular feature selection approach is wrapper-based: a generic learning algorithm is trained and evaluated on the dataset with different subsets of features. In this way we are able to reduce the total number of features from 144 to 10, significantly reducing the feature space, while still obtaining a classification accuracy of 96.8% (the confusion matrix can be found in Table 4). The retained features correspond to both the lower- and higher-order polynomials.

Table 4. Confusion matrix for the proposed pipeline with the 10 selected features.

            Gran    Mono    T-cell   Total
  Gran      7823    200     2        8025
  Mono      362     8462    6        8830
  T-cell    0       6       1649     1655

Lastly, we implemented the proposed feature extraction framework on GPU and were able to reduce the processing time to 0.2 ms per cell, proving that this pipeline can be deployed in a practical setup where thousands of cells need to be processed per second. It has to be noted that, although we chose a GPU implementation, the algorithm can easily be ported to an ASIC customised for this particular application, giving rise to massively parallelised architectures.
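The wrapper-based selection described above can be sketched with scikit-learn's SequentialFeatureSelector, which greedily adds the feature whose inclusion yields the best cross-validated score. The synthetic data below, with two informative columns among ten, is purely illustrative; the paper's selection runs over the 144 Zernike features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

# Synthetic stand-in: 2 informative features among 10, three classes.
rng = np.random.default_rng(6)
y = np.repeat([0, 1, 2], 60)
X = rng.normal(size=(180, 10))
X[:, 3] += y * 2.5        # informative feature
X[:, 7] += y * 2.5        # informative feature

clf = RandomForestClassifier(n_estimators=50, random_state=0)
selector = SequentialFeatureSelector(clf, n_features_to_select=2, cv=5)
selector.fit(X, y)

chosen = set(np.flatnonzero(selector.get_support()))
assert chosen == {3, 7}   # the wrapper recovers the informative columns
```

Because the wrapper scores whole feature subsets with the actual classifier, it can retain complementary low- and high-order Zernike features, matching the observation that both ends of the order range survive the reduction to 10 features.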

Conclusion
We propose a cell classification framework that can be deployed for various haematology tests, such as leukocyte counts and differentials. The goal of the proposed pipeline is to reduce the computational complexity of the cell processing while maintaining a classification accuracy similar to that obtained with traditional methods, making it a viable option for high-throughput applications. This goal is achieved by, firstly, omitting the hologram reconstruction bottleneck and, secondly, simplifying the feature extraction to a matrix multiplication of the data with pre-computed bases. Specifically, the Zernike polynomials were chosen for their inherent rotational invariance. Furthermore, working in the frequency domain has the significant advantage of making the features translation and depth invariant. During our study on complexity, we observed that only a subset of 10 features contributes to the overall classification performance, further reducing the complexity of the entire system. The classification performance of our pipeline seems to hit a ceiling at 97%. It is not entirely clear whether this is due to our leukocyte representation or whether it is inherent to the experimental setup: the fluorescence labeling, which triggers the lens-free acquisition and provides us with "ground truth" data, is not entirely reliable and might be the cause of some of the classification errors. Finally, we implemented part of the processing pipeline, i.e. the cell processing and feature extraction, on GPU, achieving an average computation time of 0.2 ms per cell. It has to be noted that this framework can be entirely parallelized, giving rise to stacked architectures where multiple cell-sorter chips work simultaneously. Furthermore, the observation that only the relatively low frequencies matter for this particular application could indicate that low-resolution cameras might be sufficient to distinguish different cell types.
Future work will consist of extending the framework to cope with more cell types, enabling a 5-part or even 7-part blood differential. We expect the system to scale nicely, as long as higher order Zernike polynomials are included. The next stage of the project will consist of the implementation of the entire classification pipeline on GPU and of experiments with stacked architectures of multiple chips.