Deep learning-based light scattering microfluidic cytometry for label-free acute lymphocytic leukemia classification

The subtyping of acute lymphocytic leukemia (ALL) is important for proper treatment strategies and prognosis. Conventional manual examination of blood and bone marrow is time-consuming and labor-intensive, while flow cytometric immunophenotyping has limitations such as high cost. Here we develop a deep learning-based light scattering imaging flow cytometry for label-free classification of ALL. Single ALL cells confined in a three-dimensional (3D) hydrodynamically focused stream are excited by a light sheet. Our label-free microfluidic cytometry obtains large datasets of two-dimensional (2D) light scattering patterns from single ALL cells of B/T subtypes. A deep learning framework named Inception V3-SIFT (scale-invariant feature transform)-Scattering Net (ISSC-Net) is developed, which can perform high-precision classification of T-ALL and B-ALL cell line cells with an accuracy of 0.993 ± 0.003. Our deep learning-based 2D light scattering flow cytometry is promising for automatic and accurate subtyping of unstained ALL.


Introduction
Leukemia is a malignancy of blood leukocytes and is usually divided into four subtypes according to the rate of progression and cell type, namely acute lymphocytic leukemia (ALL), chronic lymphocytic leukemia (CLL), acute myeloid leukemia (AML) and chronic myeloid leukemia (CML) [1]. Leukemia subtypes have varied clinical manifestations, pathogenic factors, treatment strategies and prognoses, differing in their responses to chemotherapy [2,3]. ALL is the most common childhood malignancy, accounting for more than a quarter of pediatric cancers [4,5]. Immunophenotyping, which uses monoclonal antibodies to label lymphocyte surface antigen markers, can divide ALL into two subtypes: T cell ALL (T-ALL) and B cell ALL (B-ALL) [6]. ALL is a heterogeneous malignancy whose subtypes differ in their response to chemotherapy. For example, T-ALL is not amenable to salvage treatment with blinatumomab, while nelarabine is a T-cell-specific purine nucleoside analog approved for T-ALL [7]. Another example is that hematopoietic cell transplantation (HCT) has been demonstrated to have superior results for patients with T-ALL [8]. Therefore, subtyping is key for early diagnosis of ALL and has great prognostic significance [9]. Examination of blood and bone marrow samples under a microscope by pathologists is one of the routine tests used to diagnose ALL. As a traditional manual method to identify prognostically important leukemia subtypes, it is labor-intensive and time-consuming, and requires the combined expertise of hematologists, pathologists, and cytogeneticists [10][11][12]. Machine learning and recently developed deep learning technologies enable highly accurate and automated leukemia subtyping [13][14][15], but they depend on stained cell images. It is therefore of great interest and significance to develop automated and label-free subtyping methods for ALL.
Flow cytometry (FCM) has become an important approach for accurate diagnosis and subtyping of leukemia. It can classify cells with corresponding antibodies according to different markers or marker combinations. Modern immunophenotyping of hematological malignancies using FCM can help identify malignant cells and determine the degree of immunophenotypic heterogeneity of malignant cell populations [16,17]. In addition, the ability of FCM to rapidly detect cells makes it possible to detect minimal residual disease (MRD) at a level of 1 leukemic cell per 10^5 leukocytes [18]. New monoclonal antibodies, improved gating strategies and multiparametric techniques have greatly improved the application of FCM in leukemia diagnosis [19]. For example, multiparametric flow cytometry immunophenotyping (MFCI) has been applied to distinguish myeloid and B/T lymphoid acute leukemia [20,21]. The main limitations of FCM are the high cost of instruments and reagents, and the requirement for professional skills and experience [16].
Recent studies have shown that the two-dimensional (2D) light scattering of single cells can provide label-free cellular information [22][23][24]. Biological cells are not homogeneous, as they are composed of cell membranes, cytoplasm, nuclei, and various organelles. When a monochromatic laser beam illuminates a cell, elastic scattering occurs and the scattered light is distributed in space. It has been reported that small-angle forward scattering (less than 5 degrees in polar angle) is mainly determined by the cell size, while side scattering (around 90 degrees in polar angle) contains rich information about cellular organelles [25]. The 2D light scattering patterns of biological cells can be obtained by using a 2D sensor in the side scattering angular range; these patterns contain rich information about cell organelles such as the mitochondria [23,[26][27][28]. 2D light scattering static cytometry has been reported as a label-free and low-cost technique for myeloid leukemia subtyping [27]. As a technique for the precise manipulation of microscale fluids, microfluidics is a powerful tool for rapid and high-throughput single-cell analysis [29,30]. Label-free microfluidic technology has attracted increasing attention due to its advantages such as high throughput, miniaturization, and non-invasiveness. Our group has developed a label-free light-sheet microfluidic cytometry by combining 2D light scattering, a disposable hydrodynamic focusing unit and light-sheet illumination, which holds broad prospects for automatic and label-free clinical diagnosis [31].
In 2015, we developed our first-generation pattern recognition cytometry based on machine learning technology and a 2D light scattering static cytometer [32]. The adaptive boosting (AdaBoost) method was adopted for the analysis of single-cell 2D light scattering patterns, and we demonstrated that this pattern recognition cytometry can perform label-free classification of normal cervical cells and HeLa cells with high accuracy. In 2017, we reported a wide-angle label-free static cytometer for identifying acute and chronic myeloid leukemic cells by combining the gray level differential statistics (GLDS) algorithm with a support vector machine (SVM) [27]. Although machine learning can be combined effectively with 2D light scattering label-free cytometry to achieve complex cell analysis, traditional machine learning relies on feature engineering: experts must extract image features manually and then feed them into machine learning models, which limits the mining and extraction of useful image information to a certain extent. In addition, as the amount of training data increases, the performance improvement of traditional machine learning models is limited. Therefore, our second-generation pattern recognition cytometry focuses on deep learning technology in order to perform high-performance analysis of the large single-cell image datasets obtained by 2D light scattering flow cytometry.
As an emerging machine learning technology, deep learning is widely recognized as an advanced method for automating data analysis tasks [33][34][35][36]. Convolutional neural networks (CNNs) have made remarkable breakthroughs in image-based classification tasks, producing a huge leap in the field of medical image analysis [37]. Since 2014, very deep CNNs have become mainstream for most computer vision solutions [38]. Compared with traditional machine learning technologies that rely on feature engineering designed by human experts, CNNs can extract image features automatically and combine feature engineering with the learning process, opening the possibility of automatic exploration of feature hierarchies and interactions. Deep learning has unique advantages in big-data analysis and has been used to analyze big data from high-throughput imaging flow cytometry [39][40][41]. Deep learning-based methods have been successfully applied in various single-cell optical image studies, such as super-resolution reconstruction [42], cell counting [43] and cell tracking [44]. Deep learning has also been combined with light scattering technologies to solve biomedical problems [45].
In this work, we develop a deep learning-based 2D light scattering microfluidic cytometer for high-precision, automatic, and label-free identification of ALL cell lineage cells. We use a low-cost, compact hydrodynamic focusing unit that requires no microfabrication. Single cells are illuminated uniformly in the core of a 3D hydrodynamically focused stream by the light-sheet technique. Our label-free light-sheet microfluidic cytometer obtains 2D light scattering images of T-ALL and B-ALL cell lineage cells in azimuthal and polar angles ranging from 58° to 122°. We also report a deep learning framework named ISSC-Net. The scale-invariant feature transform (SIFT) local feature descriptor is used to improve the Inception V3 deep learning network, making it more suitable for learning the key properties of cellular 2D light scattering patterns. Our ISSC-Net can accurately identify T-ALL and B-ALL cell lineage cells with an average accuracy of 0.993 ± 0.003, and a sensitivity and specificity of 0.993 ± 0.003 and 0.993 ± 0.004 respectively, which is promising to facilitate early diagnosis of ALL.

Sample preparation
Cells of the Jurkat T-ALL cell line (Cell Bank of Chinese Academy of Sciences, China) and the BALL-1 B-ALL cell line (DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Germany) were cultured in Iscove's Modified Dulbecco's Medium (IMDM, Gibco, Invitrogen, USA) supplemented with 10% fetal calf serum, 100 U/mL penicillin and 10 mg/mL streptomycin in a humidified 5% CO2 and 95% air atmosphere at 37°C. Both cell lines were cultured in growth medium for fewer than 15 passages. Jurkat and BALL-1 cells were then suspended in phosphate buffered saline (PBS) and centrifuged at 1,000 rpm for 10 minutes. After removing the supernatant, the cell pellets were re-suspended in a small amount of PBS. Cells were then fixed with Immunology Staining Fix Solution (P0098, Beyotime, China) for 30 minutes at room temperature to avoid potential biological hazards. To obtain cell suspensions for single-cell 2D light scattering measurements, the fixed cells were re-suspended in PBS at a concentration of approximately 1,500 cells/mL.

Experimental setup
Our deep learning-based light-sheet 2D light scattering microfluidic cytometer is illustrated in Fig. 1. The experimental setup consists of the light-sheet illumination, the microfabrication-free hydrodynamic focusing microfluidics that achieves 3D hydrodynamic focusing of a sample fluid, a 2D light scattering collector and a deep learning-based data processor. We choose a diode-pumped solid-state (DPSS) laser (Frankfurt Laser Company, Germany) to generate a monochromatic laser beam with a wavelength of 532 nm, which is then sent through a neutral density (ND) filter (Thorlabs, USA) and a cylindrical lens (CL) with a focal length of 150 mm (Thorlabs, USA) to form a light sheet. The illumination power after the ND filter is about 20 mW. The sample chamber is composed of two coaxial capillaries, with the inner capillary serving as the sample fluid microchannel and the outer one carrying the sheath fluid. The diameter is 300 µm for the inner glass capillary and 1000 µm for the outer one. Two syringe pumps drive the sample fluid and sheath fluid to form a hydrodynamically focused flow. In this work, the flow rate Q_in of the syringe pump driving the sample solution is set to 1 µL/min, and the flow rate Q_out of the pump driving the sheath fluid is set to 100 µL/min. The scattered light from the flowing single scatterers is collected by a 60× objective (Olympus, Japan) with a field number (FN) of 22 and a numerical aperture (NA) of 0.7, giving an angular field of view (FOV) of about 58 to 122 degrees in both azimuthal and polar angles. The 2D light scattering is imaged by a complementary metal oxide semiconductor (CMOS) camera (Canon, Japan) with a frame rate of 25 frames per second. Experimental results for T-ALL and B-ALL cells are recorded as videos, and the cellular 2D light scattering patterns are obtained by extracting and cropping individual frames of the videos.
Finally, the 2D light scattering patterns of single cells enter the data processor for automated analysis. The distribution of scattered light varies with both the polar angle θ and the azimuthal angle ϕ, and the angular range is mainly determined by the NA of the objective [31]. In this work, the objective with an NA of 0.7 gives corresponding angular ranges for θ and ϕ from approximately 58° to 122°, as shown in the inset of Fig. 1.

Data preparation
When single cells pass through the observation area, they are captured by the CMOS sensor in video mode. For a well-diluted single-cell solution at a frame rate of 25 fps, only one 2D light scattering pattern is observed for each single cell. (Occasionally, two light scattering images are captured for a single cell; the chance is less than 0.75%, and in such cases we use only one 2D light scattering pattern of the cell for analysis.) It is important to point out that the flow rate of single cells should be balanced against the frame rate of the CMOS sensor for good performance of our deep learning-based approach. Ideally, the CMOS sensor should capture each single cell reliably and time-efficiently, imaging each single cell only once. However, for a given sensor frame rate, many single cells may be missed if the cell rate is too high, or duplicated images may be obtained for a single cell if the cell rate is relatively low. Both cases could affect the effectiveness of our deep learning-based approach even if good-quality images of single cells are obtained. Note that the flow rate of cells depends on the cell concentration, the sample fluid flow rate and the sheath fluid flow rate in our experiments; the parameters used here allowed us to obtain a good large dataset of single cells.
For the video processing, we first used gray-value detection to filter out frames without speckles, and image processing algorithms such as binarization and denoising were then applied to find the boundaries of the light scattering image in each frame (excluding incomplete light scattering patterns), yielding the segmented 2D light scattering patterns of single cells. The training set includes 2D light scattering images of 12,000 single Jurkat and BALL-1 cells, with 6,000 patterns for each cell type. The validation set and the test set each consist of 600 images, with equal numbers in each cellular category. The patterns for each subset are randomly selected without overlap.
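The frame-filtering and segmentation steps above can be sketched as follows (a minimal NumPy-only illustration; the function name, thresholds and `min_mean`/`margin` parameters are our assumptions for illustration, not the values used in our pipeline):

```python
import numpy as np

def segment_pattern(frame, min_mean=5.0, bin_thresh=30.0, margin=4):
    """Return a cropped 2D light scattering pattern from a grayscale
    frame, or None if the frame is empty or the pattern is clipped."""
    frame = np.asarray(frame, dtype=float)
    # Gray-value detection: skip frames without a speckle pattern.
    if frame.mean() < min_mean:
        return None
    # Binarization: keep pixels brighter than a fixed threshold.
    mask = frame > bin_thresh
    if not mask.any():
        return None
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    r0, r1, c0, c1 = rows[0], rows[-1], cols[0], cols[-1]
    # Exclude incomplete patterns touching the frame border.
    h, w = frame.shape
    if r0 < margin or c0 < margin or r1 > h - 1 - margin or c1 > w - 1 - margin:
        return None
    return frame[r0:r1 + 1, c0:c1 + 1]
```

In practice a denoising step (e.g. median filtering) would precede the binarization, and the thresholds would be tuned to the camera's dynamic range.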

Deep classification framework
The Inception V3 network is a 42-layer deep CNN with a 299×299 input that has achieved great success in the field of object recognition [38]. Inception V3 consists of symmetric and asymmetric building blocks, including convolutions, max pooling layers, average pooling, dropout, and fully connected layers. It uses a convolution-kernel splitting method to divide large convolutions into small ones in order to reduce the number of parameters; for example, a 3×3 convolution can be split into 3×1 and 1×3 convolutions. SIFT [46] is an image feature descriptor based on scale space, which is invariant to image scaling, rotation and affine transformation, and also remains stable against viewing-angle changes, brightness changes and noise. SIFT keypoints are identified as local extrema of the Difference of Gaussian (DoG) images by comparing each pixel to all of its neighbours at the same scale and at neighbouring scales [47]. SIFT features can be easily combined with other forms of features and do not require extensive datasets for generalization. By combining SIFT computation with a CNN, not only is the feature dimension expanded, but robust affine-transformation invariance is also introduced into the CNN.
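The DoG keypoint detection described above can be illustrated with a minimal NumPy/SciPy sketch (a simplified version of the SIFT detector: a single octave, no subpixel refinement or edge rejection, and an illustrative sigma schedule and contrast threshold):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_keypoints(image, sigma0=1.6, n_scales=5, contrast_thresh=0.03):
    """Find local extrema of the Difference-of-Gaussian stack.
    Each candidate is compared to its 26 neighbours (8 in-scale,
    9 in each adjacent scale), as in the SIFT detector."""
    image = np.asarray(image, dtype=float)
    sigmas = [sigma0 * 2 ** (i / 3.0) for i in range(n_scales)]
    blurred = np.stack([gaussian_filter(image, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]            # shape (n_scales-1, H, W)
    # A voxel is a keypoint if it is the max or min of its 3x3x3 block.
    is_max = dog == maximum_filter(dog, size=3)
    is_min = dog == minimum_filter(dog, size=3)
    strong = np.abs(dog) > contrast_thresh       # low-contrast rejection
    s, y, x = np.where((is_max | is_min) & strong)
    return list(zip(s.tolist(), y.tolist(), x.tolist()))
```

A bright Gaussian blob yields a DoG minimum at its center, at the scale closest to the blob size, which is exactly what the detector picks up.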
We design a framework that combines SIFT with the Inception V3 network, as depicted in Fig. 2. First, we use SIFT to locate the keypoints of each 2D light scattering image and then compute a SIFT descriptor for each keypoint to obtain the SIFT features. We used the OpenCV SIFT implementation with its default parameters, which have proven effective in many applications and can be found in the OpenCV documentation: the number of layers in each octave is 3, the number of octaves is computed automatically from the image resolution, the contrast threshold used to filter out weak features in low-contrast regions is 0.04, the threshold used to filter out edge-like features is 10, and the initial standard deviation of the Gaussian kernel is 1.6. A K-means model is trained on the SIFT features to generate a bag-of-words (BoW) histogram as a 1D SIFT feature vector. Second, we use the Inception V3 network to extract another 1D feature vector from each 2D light scattering image. Finally, these two 1D feature vectors are concatenated to train the fully connected layers that determine the final output of the network. The concatenated features are fed into two hidden layers with 64 neurons each, using the ReLU activation function, followed by an output layer using the softmax activation function. The cross-entropy (CE) error function is used as the loss function and stochastic gradient descent (SGD) as the optimizer.
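The feature-fusion step can be sketched as follows (a NumPy-only illustration of the BoW encoding and concatenation; in the actual framework the cluster centers come from K-means trained on the training-set SIFT descriptors, and the CNN vector from Inception V3, so the function names and sizes here are illustrative):

```python
import numpy as np

def bow_histogram(descriptors, centers):
    """Assign each SIFT descriptor to its nearest cluster center and
    return an L1-normalized bag-of-words histogram."""
    # Pairwise squared distances: shape (n_descriptors, n_centers).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

def fuse_features(descriptors, centers, cnn_vector):
    """Concatenate the 1D BoW vector with the 1D CNN feature vector,
    forming the input of the fully connected classification head."""
    return np.concatenate([bow_histogram(descriptors, centers), cnn_vector])
```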

Measurements of light-sheet illumination and hydrodynamic focusing effect
The light-sheet illumination and 3D hydrodynamic focusing are the key parts of our experimental setup, ensuring that cells are detected individually. Rhodamine 6G fluorescent dye (Life Technologies, USA), with excitation and emission wavelengths of 535 nm and 575 nm respectively, is used to measure the light-sheet thickness and the focused-stream width. The light-sheet profile is shown in Fig. 3(a), providing uniform illumination along the x-axis. The width (z-direction) of the light sheet along the dashed line is measured to be 53 µm (FWHM), as shown in Fig. 3(b). The profile of the hydrodynamically focused stream is visualized in Fig. 3(c) for the inner glass capillary with an inner diameter of 300 µm. The width (z-direction) of the focused stream along the dashed line is measured to be 46 µm (FWHM), as shown in Fig. 3(d).
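The FWHM values above are read off fluorescence intensity profiles; a minimal sketch of such a measurement (linear interpolation at the two half-maximum crossings; the function name is ours, and a single-peaked profile is assumed):

```python
import numpy as np

def fwhm(x, y):
    """Full width at half maximum of a single-peaked profile y(x),
    using linear interpolation at the two half-maximum crossings."""
    y = np.asarray(y, dtype=float)
    half = y.max() / 2.0
    above = np.where(y >= half)[0]
    i1, i2 = above[0], above[-1]
    # np.interp needs increasing sample points, which holds on the
    # rising edge (y[i1-1] < y[i1]) and falling edge (y[i2+1] < y[i2]).
    x_left = np.interp(half, [y[i1 - 1], y[i1]], [x[i1 - 1], x[i1]])
    x_right = np.interp(half, [y[i2 + 1], y[i2]], [x[i2 + 1], x[i2]])
    return x_right - x_left
```

For a Gaussian profile, FWHM = 2*sqrt(2*ln 2) ≈ 2.355 times the standard deviation, so a 53 µm FWHM light sheet corresponds to a Gaussian sigma of about 22.5 µm.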

2D light scattering patterns acquisition and SIFT keypoints description
The SIFT keypoints are mainly located in high-contrast areas of the image, such as edges and textures. The orientation of each keypoint is specified through an orientation-gradient histogram, which is built from the magnitudes and orientations of the gradients in the patch around the keypoint. Figure 4 shows the 2D light scattering patterns of the two types of ALL cells (Jurkat: (a); BALL-1: (c)) and their corresponding SIFT keypoint diagrams: (b) and (d). Each SIFT keypoint is denoted by a yellow circle, with a yellow line representing its scale and orientation.
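The orientation assignment described above can be sketched as follows (a simplified NumPy version: a single 36-bin gradient-orientation histogram over the patch, without the Gaussian weighting and peak interpolation of the full SIFT algorithm):

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Assign a keypoint orientation from the gradient-orientation
    histogram of its surrounding patch, weighted by gradient magnitude."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, _ = np.histogram(orientation, bins=n_bins,
                           range=(0.0, 360.0), weights=magnitude)
    # The dominant orientation is the center of the strongest bin.
    return (hist.argmax() + 0.5) * (360.0 / n_bins)
```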

Classification results of acute lymphocytic leukemia cells
In order to classify the 2D light scattering images of T-ALL and B-ALL cells automatically, we used a new deep learning framework combining the SIFT algorithm and the Inception V3 network. We randomly divided all 2D light scattering patterns into a training set, a validation set and a test set containing 12,000, 600 and 600 images, respectively. Each data set includes 50% Jurkat images and 50% BALL-1 images. To test whether our improved deep learning network is more suitable for the classification of cellular 2D light scattering images, three methods were used to classify T-ALL and B-ALL cells: a machine learning method using SIFT features with a support vector machine (SVM), the conventional CNN method (Inception V3), and our improved deep learning method (ISSC-Net). We used t-SNE for dimensionality reduction of the image features (extracted by SIFT, CNN and ISSC-Net) via sklearn.manifold.TSNE (a Python package). The same parameters were used for all image features during dimensionality reduction: a perplexity of 30, 1,000 iterations and a learning rate of 200. Figure 5 shows the t-SNE visualization of the 2D light scattering image features of the two cell types (training set, 6,000 Jurkat and 6,000 BALL-1) extracted by the three classification methods. The clustering and separation of the red and blue clusters reflect the representation learning capability of the three methods. Figure 6 shows the training processes of the two deep networks, Inception V3 and ISSC-Net, trained on the same 2D light scattering patterns. It can be seen from Fig. 5 and Fig. 6 that our ISSC-Net shows better representation learning capability and higher stability in the training process. To ensure the generalization of the classification results, we performed 5 random rearrangements of all 2D light scattering patterns to form 5 different groups of data sets.
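The t-SNE step can be reproduced with scikit-learn along these lines (a minimal sketch on synthetic features standing in for the extracted image features; we rely on the default of 1,000 iterations rather than passing the version-dependent iteration argument explicitly):

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-ins for feature vectors of two cell classes (in the
# paper: SIFT, Inception V3 or ISSC-Net features of Jurkat and BALL-1
# training patterns).
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 1.0, size=(40, 16)),
                   rng.normal(3.0, 1.0, size=(40, 16))])

# Perplexity 30 and learning rate 200 as in the text; the default
# number of iterations is 1,000.
tsne = TSNE(n_components=2, perplexity=30, learning_rate=200.0,
            random_state=0)
embedding = tsne.fit_transform(feats)
print(embedding.shape)  # (80, 2)
```

The 2D `embedding` is what is scattered (colored by cell type) to produce plots like Fig. 5.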
Each group of data sets has a training set (6,000 Jurkat, 6,000 BALL-1), a validation set (300 Jurkat, 300 BALL-1) and a test set (300 Jurkat, 300 BALL-1). In each group, there is no overlap between the training, validation and test sets. We used SIFT features with SVM, the Inception V3 network, and our ISSC-Net to process and analyze these five groups of data sets. Automatic classification results for the first test set by the three methods are summarized in Table 1. Here TP and TN denote correctly classified results (true positives and negatives), while FP and FN denote incorrectly classified results (false positives and negatives). Accuracy, sensitivity and specificity are calculated from the confusion matrix. The mean and standard deviation of the classification accuracy, sensitivity and specificity of the three methods over the five test sets are summarized in Table 2. According to the classification results in Table 1 and Table 2, our proposed method is clearly superior to the other two classification algorithms for the classification of cellular 2D light scattering patterns. This may be because our proposed algorithm combines the advantages of SIFT features and convolutional features in deep learning, making it more capable of exploiting the information in 2D light scattering images of cells.
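The metrics in Tables 1 and 2 follow the standard confusion-matrix definitions; as a quick sketch (the counts below are made-up illustrative values for a 600-image test set, not the actual entries of Table 1):

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

# Illustrative counts for a balanced 600-image test set (300 per class).
acc, sens, spec = confusion_metrics(tp=298, tn=298, fp=2, fn=2)
print(round(acc, 3), round(sens, 3), round(spec, 3))  # 0.993 0.993 0.993
```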
Experts usually screen and subtype ALL by observing blood or bone marrow samples under a microscope, which is time-consuming and labor-intensive. Flow cytometric immunophenotyping has the limitation of the high cost of instruments and reagents, and also requires skilled experts for operation. Biomarkers have been shown to identify major disease categories; however, their interpretation requires the integration of complex patterns of immunophenotypes [48]. Machine learning and deep learning technologies enable highly accurate and automated leukemia subtyping, but many excellent studies are based on stained cell images [13][14][15]. Therefore, exploring a label-free and automated leukemia subtyping method, as in our work, has profound clinical significance.
Here, we have demonstrated the potential feasibility of achieving label-free and automated leukemia subtyping by deep learning-based pattern recognition flow cytometry. A light-sheet microfluidic cytometer with a microfabrication-free hydrodynamic focusing unit and the light sheet illumination is used to generate 2D light scattering patterns of label-free single cells, which contain information about cell morphology and internal structure. This imaging method avoids staining the cells and can image a large number of cells in a short time.
A deep learning framework for automatic classification of single-cell 2D light scattering images of ALL subtypes (T-ALL and B-ALL), ISSC-Net, is based on the combination of a deep learning network (Inception V3) and a local feature extraction algorithm (SIFT). We have provided comparisons among three classification methods: SIFT with SVM, Inception V3, and ISSC-Net. First, we use the t-SNE algorithm to visualize the representations extracted by the three methods; the results show that ISSC-Net has better representation learning capability and can comprehensively extract representative features that are critical for classifying single-cell 2D light scattering images. Then we illustrate the training processes of the two deep learning networks, Inception V3 and ISSC-Net, and find that ISSC-Net has higher robustness and stability. Finally, we use the three methods to classify ALL subtype cell images. ISSC-Net classifies T-ALL and B-ALL cells with an accuracy of 0.993 ± 0.003, a sensitivity of 0.993 ± 0.003 and a specificity of 0.993 ± 0.004, showing higher precision and stronger robustness than the other two methods. We believe that our ISSC-Net-based pattern recognition flow cytometry is promising for automatic and precise subtyping of ALL.
Recent studies using electron microscopy have shown that T-ALL and B-ALL cells differ considerably in their ultrastructure [49][50][51][52]. T-ALL cells have smooth boundaries and short endoplasmic reticulum; there are a few electron-dense granules in the cytoplasm, and the nucleus is convoluted with margination of heterochromatin and an inconspicuous nucleolus. The B-ALL cell margins show microvillous processes, the cytoplasm contains rough endoplasmic reticulum and numerous mitochondria, and the nucleoli are coarse and filamentous. These pronounced differences in the ultrastructure of T-ALL and B-ALL cells lead to variations in the 2D light scattering patterns, which are well classified by our method. It is worth pointing out that these ultrastructural differences could be challenging to observe using conventional microscopy or imaging flow cytometry.
Our deep learning-based light scattering microfluidic cytometry has successfully classified the cell lineage cells of two ALL subtypes with high accuracy, which is promising for clinical ALL treatment considering the precision medicine needed for ALL subtypes. In future work, we are interested in extending the method developed here to all four leukemia types and their subtypes for a more comprehensive leukemia typing. Moreover, our deep learning-based light scattering microfluidic cytometry may be used to screen clinical blood cells for leukemia diagnosis, considering that normal white blood cells are well differentiated from leukemic ones by label-free 2D light scattering with machine learning [28].

Conclusion
In summary, we present an ISSC-Net-based pattern recognition flow cytometry for high-precision, automatic, and label-free identification of ALL cell lineage cells with deep learning. Our light-sheet microfluidic cytometry provides 2D light scattering images of unlabelled ALL cells. The deep learning framework ISSC-Net combines the SIFT algorithm and the Inception V3 network for label-free ALL subtyping by analyzing the 2D light scattering patterns of ALL cells. It is shown that ISSC-Net can fully mine the effective features of cellular 2D light scattering patterns and identify T-ALL and B-ALL cell lineage cells with a high and stable accuracy of 0.993. We believe that our deep learning-based pattern recognition flow cytometry will be a potential tool for automatic and label-free subtyping of ALL, and that our ISSC-Net can be valuable for big-data analysis of single-cell images.

Disclosures
The authors declare no financial or commercial conflict of interest.