Deep learning-based cell identification and disease diagnosis using spatio-temporal cellular dynamics in compact digital holographic microscopy

Abstract: We demonstrate a successful deep learning strategy for cell identification and disease diagnosis using spatio-temporal cell information recorded by a digital holographic microscopy system. Shearing digital holographic microscopy is employed using a low-cost, compact, field-portable and 3D-printed microscopy system to record video-rate data of live biological cells with nanometer sensitivity in terms of axial membrane fluctuations. Features are then extracted from the reconstructed phase profiles of segmented cells at each time instance for classification. The time-varying data of each extracted feature is input into a recurrent bi-directional long short-term memory (Bi-LSTM) network which learns to classify cells based on their time-varying behavior. Our approach is presented for cell identification between the morphologically similar cases of cow and horse red blood cells. Furthermore, the proposed deep learning strategy is demonstrated as having improved performance over conventional machine learning approaches on a clinically relevant dataset of human red blood cells from healthy individuals and those with sickle cell disease. The results are presented at both the cell and patient levels. To the best of our knowledge, this is the first report of deep learning for spatio-temporal-based cell identification and disease detection using a digital holographic microscopy system.


Introduction
Digital holographic microscopy is an optical imaging technology that is capable of both quantitative amplitude and phase imaging [1]. Many studies have shown digital holographic microscopy to be a powerful imaging modality for biological cell imaging, inspection, and analysis [2-18]. Furthermore, digital holographic microscopy has been used successfully in cell and disease identification [16] for various diseases including malaria [2], diabetes [4], and sickle cell disease [6,19]. Digital holographic microscopy has additionally been used for discrimination between various inherited red blood cell anemias by considering Zernike coefficients [20]. Other forms of quantitative phase imaging, such as optical diffraction tomography, can similarly be used for analysis of biological cells [21,22]. However, the single-shot capabilities of digital holographic microscopy make it an ideal choice for analysis of spatio-temporal dynamics in live biological cells [6,14,23-25]. In particular, in [6] a compact and field-portable digital holographic system was presented for potential diagnostic application in sickle cell disease using spatio-temporal information. In this current work, we advance those capabilities by combining the use of dynamic cellular behavior with deep learning strategies to provide better cell identification capabilities and diagnostic performance in a low-cost and compact digital holographic microscopy system. While the presented work uses a shearing-based compact digital holographic microscope, the proposed deep learning method can be applied for cell identification tasks using any system that offers video-rate holographic imaging capabilities. This includes but is not limited to other forms of off-axis digital holography such as those using the Michelson arrangement [26,27], various forms of wavefront-division systems [28,29], and in-line holographic systems [30].
Deep learning refers to multi-level representation learning methods in which the complexity of the representation increases as the number of layers grows [31]. These methods have been found to be extremely useful for finding intricate structures in high-level data and routinely outperform conventional machine learning algorithms, all without the need for carefully engineered feature extractors. For image processing tasks specifically, convolutional neural networks (CNNs) are paramount among deep learning strategies. CNNs notably make use of convolutional layers where each unit of a convolutional layer connects to local patches of the previous layer. This allows combinations of local features to build upon each other to generate more global features and makes CNNs well-suited for identifying local patterns at various positions across an image. Developments in hardware and software since the inception of CNNs have resulted in CNNs becoming the dominant approach for nearly all recognition and detection tasks today [31].
Following the expansion in use of convolutional neural networks [31] for many image processing tasks, deep learning has similarly grown increasingly popular for cell imaging tasks in recent years [32,33]. These tasks include, but are not limited to, cell classification, cell tracking [32], and segmentation of living cells [34]. Moreover, deep learning has been used in holographic imaging specifically for reconstruction, super-resolution imaging, and pseudo-colored phase staining to mimic conventional brightfield histology slides [35]. Additionally, the use of convolutional neural networks in quantitative phase imaging has been presented for screening of biological samples, such as for the detection of anthrax spores [36], classification of cancer cell stages [37], and classification between white T-cells and colon cancer cells [38]. While much of the research on deep learning in holographic cell imaging thus far has dealt primarily with stationary phase images, recurrent neural networks such as long short-term memory (LSTM) networks [39] also make deep learning an attractive option for dealing with time-varying biological data [40].
In this paper, we present a deep learning approach for cell identification and disease diagnosis using dynamic cellular information acquired in digital holographic microscopy. Handcrafted morphological features and transfer-learned [32,41] features from a pretrained CNN are extracted from every reconstructed frame of a recorded digital holographic video. These features are then input into a Bi-LSTM network which learns the temporal behavior of the data. The proposed method is demonstrated for cell identification between cow and horse red blood cells, as well as for classification between healthy and sickle cell diseased human red blood cells. The proposed deep learning approach provides a significant improvement in terms of classification accuracy in comparison to our previously presented approach utilizing conventional machine learning methods [6,19]. To the best of our knowledge, this is the first report of deep learning using spatio-temporal cell dynamics for cell identification and disease diagnosis in digital holographic microscopy video data.
The rest of the paper is organized as follows: the optical imaging system used for data collection is described in Section 2 followed by the details of the feature extraction and the long short-term memory network. Results for the two datasets are given in Section 3 followed by discussions in Section 4 and finally, the conclusions are presented in Section 5.

Compact and field-portable digital holographic microscope
All data was collected using a previously described compact and field-portable 3D-printed shearing digital holographic microscope [6]. Shearing-based interferometers are off-axis common-path arrangements that provide a simple setup and highly stable arrangement for digital holographic microscopy [3,17]. The 3D-printed microscope consists of a laser diode (λ = 633 nm), a translation stage to axially position the sample, one 40X (0.65 NA) microscope objective, a glass plate, and a CMOS image sensor (Thorlabs DCC1545M).
The beam emitted from the laser diode passes through the cell sample, is magnified by the microscope objective lens, then falls upon the glass plate at an incidence angle of 45 degrees. The light incident on the glass plate is reflected from both the front and back surfaces of the plate to generate two laterally sheared copies of the beam. These beams self-interfere and form an interference pattern that is recorded by the image sensor. The fringe frequency of the recorded pattern is determined by the radius of curvature of the wavefront, r, as well as the vacuum wavelength of the source beam, λ, and the lateral shear induced by the glass plate, S, where $S = t_g \sin(2\beta)/\sqrt{n^2 - \sin^2\beta}$. In this equation, $t_g$ is the glass plate thickness, β is the incidence angle upon the glass plate, and n is the refractive index of the glass plate [42,43]. Due to the shearing configuration, there will exist two copies of each cell; however, when the lateral shear is greater than the size of the magnified object, the cells will not overlap with their copies, and the normal processing of off-axis holograms is followed [3]. Furthermore, the capture of redundant sample information can be avoided by ensuring the lateral shear is greater than the sensor size [12]. This shearing digital holographic microscopy system provides a field-of-view of approximately 165 µm × 135 µm, and the theoretical resolution limit was calculated as 0.6 µm by the Rayleigh criterion [6,19]. Furthermore, the temporal stability of the system was reported as 0.76 nm [6] when measured in the hospital setting that served as the site of the human red blood cell data collection, making the system well suited to study red blood cell membrane fluctuations, which are on the order of tens of nanometers [6,44]. A diagram of the optical system and depiction of the 3D-printed shearing digital holographic microscope are provided in Fig. 1.
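As a rough numerical check of the quantities above, the following Python sketch evaluates the lateral shear expression and the Rayleigh limit. The plate thickness (5 mm) and refractive index (1.5) are hypothetical values chosen only for illustration, since neither is stated here, and the Rayleigh criterion is assumed to take the common form 0.61λ/NA.

```python
import math


def lateral_shear(t_g, beta_deg, n):
    """Lateral shear S = t_g * sin(2*beta) / sqrt(n^2 - sin^2(beta)).

    t_g is the plate thickness (S is returned in the same unit),
    beta_deg the incidence angle in degrees, n the plate refractive index.
    """
    beta = math.radians(beta_deg)
    return t_g * math.sin(2 * beta) / math.sqrt(n ** 2 - math.sin(beta) ** 2)


def rayleigh_limit(wavelength, na):
    """Rayleigh resolution limit, assuming the form 0.61 * wavelength / NA."""
    return 0.61 * wavelength / na


# Hypothetical plate parameters (not given in the text): 5 mm thick, n = 1.5.
shear_mm = lateral_shear(t_g=5.0, beta_deg=45.0, n=1.5)

# Values from the text: 633 nm source (0.633 um) and a 0.65 NA objective.
resolution_um = rayleigh_limit(wavelength=0.633, na=0.65)

print(f"lateral shear: {shear_mm:.2f} mm")
print(f"Rayleigh limit: {resolution_um:.2f} um")
```

With the stated wavelength and numerical aperture, the assumed Rayleigh form gives roughly 0.59 µm, consistent with the reported limit of about 0.6 µm.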
Using the described system, thin blood smears are examined, and video holographic data is recorded. Following data acquisition, each individual frame of a segmented red blood cell is numerically reconstructed. Based on the thin blood smear preparation, the red blood cells are locally stationary, but continue to exhibit cell membrane fluctuations. Given the high sensitivity of the system to axial changes, the system can detect small changes in the cell membrane (i.e. cell membrane fluctuations). These slight changes in the overall cell morphology are studied over the duration of the videos to provide information related to the spatio-temporal cellular dynamics of the sample. Further details related to the suitability of DHM in studying red blood cell membrane fluctuations are provided in Refs. [6] and [14].

Human subjects, blood collection and preparation
This study was approved by the Institutional Review Boards of the University of Connecticut Health and University of Connecticut-Storrs. All subjects were at least 18 years old and were ineligible to participate if they had a blood transfusion within the previous three months. Blood was collected once from both healthy control subjects and subjects with sickle cell disease (SCD). 7 ccs of blood was collected via venipuncture into two 3.5 mL lavender top vacutainer tubes for complete blood count with leukocyte differential, hemoglobin electrophoresis, and blood smears. Demographic information including age, race, and ethnicity was recorded.

Digital holographic reconstruction
Following the recording of a holographic video by the CMOS sensor, the video frames are computationally processed to extract the phase profiles of the objects under inspection. Based on the off-axis nature of the shearing interferometer, we use the common Fourier spectrum analysis [3,45,46] in processing of the holograms. From the recovered complex amplitude of the sample, $\tilde{u}(\xi, \eta)$, the object phase is calculated as $\Phi = \tan^{-1}[\mathrm{Im}\{\tilde{u}\}/\mathrm{Re}\{\tilde{u}\}]$, where Im{·} and Re{·} represent the imaginary and real parts, respectively. The extracted phase is then unwrapped by Goldstein's branch-cut method [47]. System aberrations are reduced by subtracting the phase of a cell-free region of the blood smear [3,48]. Given the unwrapped object phase ($\Phi_{un}$), the optical path length (OPL) is computed as $\mathrm{OPL} = \Phi_{un}[\lambda/(2\pi)]$. When the refractive indices of both the object and background media are known, the OPL can be directly related to the height or thickness of the object through the expression $h = \mathrm{OPL}/\Delta n$, where h is the height and Δn is the refractive index difference. Typical values for human red blood cells and plasma are 1.42 and 1.34, respectively [49]; however, these values cannot be assumed for all samples in the following analysis, so all analysis in this paper is performed using the OPL values.
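The conversion chain from complex amplitude to phase, OPL, and height can be illustrated for a single sample point with the short Python sketch below. The complex amplitude is a made-up value with a 1.2 rad phase delay, which lies inside the principal range so that no unwrapping is needed; the function names are illustrative, and the refractive indices are the typical values quoted above rather than measured ones.

```python
import cmath
import math

WAVELENGTH_M = 633e-9  # laser diode wavelength stated in the text


def wrapped_phase(u):
    """Wrapped phase of a complex amplitude sample, in (-pi, pi]."""
    return math.atan2(u.imag, u.real)


def opl_from_phase(phase_unwrapped, wavelength=WAVELENGTH_M):
    """Optical path length: OPL = phase * wavelength / (2*pi)."""
    return phase_unwrapped * wavelength / (2 * math.pi)


def height_from_opl(opl, delta_n):
    """Physical height h = OPL / delta_n (needs known refractive indices)."""
    return opl / delta_n


# Hypothetical complex amplitude with unit magnitude and 1.2 rad phase.
u = cmath.rect(1.0, 1.2)
phi = wrapped_phase(u)
opl = opl_from_phase(phi)
# Typical indices from the text: red blood cell 1.42, plasma 1.34.
h = height_from_opl(opl, delta_n=1.42 - 1.34)

print(f"phase: {phi:.3f} rad, OPL: {opl * 1e9:.1f} nm, height: {h * 1e6:.2f} um")
```

For this sample the OPL is about 121 nm, which corresponds to a height of about 1.5 µm only under the assumed index difference of 0.08.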

Hand-crafted morphological feature extraction
To characterize the cells under inspection, morphological features related to the cell shape are calculated for each segmented cell. The use of morphological features is a long-standing method for qualitative and quantitative assessment of biological specimens [50]. The morphological features extracted here provide easily interpretable cell characteristics that are related to the three-dimensional cell shape and composition. In total, we use fourteen handcrafted morphological features. Eleven of these features relate to the instantaneous characteristics of the cell (i.e. static features), whereas three of these features are designed to capture the time-varying or motility-based behavior and encode the spatio-temporal behavior of the cells [6]. The static features are used in both the conventional machine learning algorithm and the proposed deep learning method. However, because the proposed deep learning method learns the spatio-temporal behavior from the static features at each time step, the three motility features mentioned in this section are used only in the conventional machine learning algorithm for comparison to the proposed method. Details regarding extraction of spatio-temporal features for use in conventional machine learning algorithms are provided in [6]. Table 1 provides a brief description of all hand-crafted features examined in this work [5,6,12,14].
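To give a concrete sense of what a static feature looks like, the Python sketch below computes three of the features named in this work (mean optical path length, projected area, and optical volume) from a small OPL map and a binary cell mask. The definitions used are common ones and are assumptions rather than the exact definitions of Table 1; the 3 × 3 map and the pixel area are hypothetical.

```python
def static_features(opl_map, mask, pixel_area):
    """Compute three illustrative static features for one segmented cell.

    opl_map: 2D list of OPL values (meters) for one reconstructed frame.
    mask: 2D list of 0/1 flags marking pixels that belong to the cell.
    pixel_area: area of one pixel in the sample plane (square meters).
    """
    values = [
        opl
        for opl_row, mask_row in zip(opl_map, mask)
        for opl, inside in zip(opl_row, mask_row)
        if inside
    ]
    n_pixels = len(values)
    return {
        # Assumed definition: average OPL over the masked cell region.
        "mean_opl": sum(values) / n_pixels if n_pixels else 0.0,
        # Assumed definition: number of cell pixels times pixel area.
        "projected_area": n_pixels * pixel_area,
        # Assumed definition: OPL integrated over the cell region.
        "optical_volume": sum(values) * pixel_area,
    }


# Hypothetical 3 x 3 frame (OPL in meters) and cross-shaped cell mask.
opl_map = [
    [0.0, 100e-9, 0.0],
    [120e-9, 150e-9, 110e-9],
    [0.0, 90e-9, 0.0],
]
mask = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
features = static_features(opl_map, mask, pixel_area=(0.13e-6) ** 2)
print(features)
```

Evaluating such a function on every reconstructed frame of a cell yields one time series per feature, which is the form of input the sequence model requires.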

Feature extraction through transfer learning
Alternatively, instead of carefully designing handcrafted features, convolutional neural networks can be used to find effective feature representations for a given dataset. Recently, thanks in part to openly available training databases such as ImageNet [51], the use of pretrained networks in transfer-learned image classification tasks has grown significantly. Transfer learning enables powerful deep learning models to be used on smaller 'target' datasets by first pre-training the network on a larger 'source' dataset [32]. When using a pretrained network, a complex convolutional neural network is trained on an extremely large database to learn features that generalize well to new tasks. The pretrained network is then adapted for a new dataset. This sometimes involves retraining several terminal layers of a network using the target dataset; however, the simplest form of transfer learning is to use a pretrained network as a feature extractor for new tasks. To apply a pretrained network as a feature extractor, after training on a source dataset, the target training set is passed through the CNN up to the last fully connected layer, then the feature vector from this last layer is used to train a new classifier such as a support vector machine [39]. In doing so, transfer learning removes the need for a long training process on new datasets, requires the data to pass through the network only once for feature extraction, and greatly reduces both the time and computational resources needed to leverage the benefits of deep learning models.
In this work, we use the DenseNet-201 convolutional neural network [52], pretrained on ImageNet, as a feature extractor for classification tasks. The DenseNet is a specific type of CNN architecture wherein the input to any given layer includes the feature maps from all preceding layers rather than only the most recent layer as in more traditional architectures. This arrangement was shown to be beneficial in reducing issues due to vanishing gradients, reducing the number of parameters, and strengthening feature propagation [52]. Following feature extraction using the pretrained network, 1000 features are output by the network. These features have no easily interpretable meaning but do offer robust features for most image classification tasks. As with the static handcrafted morphology-based features, the transfer learned features were extracted at each frame of the reconstructed video data providing a time-varying signal of the feature values for input to our deep learning model.

Long short-term memory (LSTM) network
To make classifications on sequential data, a special form of neural network is required. Recurrent neural networks (RNNs) are the specific type of artificial neural network designed to handle sequences of data and map the dynamic temporal behavior. RNNs use a looped architecture to allow some information to be maintained from the previous time step. They work by processing sequential data one element at a time and maintaining a 'state vector' for hidden units to incorporate the information from past input elements [31]. Theoretically, these networks should be able to handle arbitrarily long signals. In reality, however, RNNs are sensitive to exploding and vanishing gradients as the sequences get longer. To help alleviate the gradient-related issues, the long short-term memory network (LSTM) [39] was introduced and has since become one of the most used network architectures for sequential deep learning tasks. The LSTM architecture uses a cell state and cell gate arrangement to handle longer sequences of data and to mitigate the vanishing gradient problem [39]. Through several interacting gates that control the cell state as it is passed from the previous LSTM block to the following LSTM block, the LSTM network controls how memory is maintained through the system. A forget gate determines how much of the previous information regarding the cell state is to be discarded, an input gate determines how much information of the current cell state needs to be updated, and an output gate determines what information of the current cell is output to the following block.
To use an LSTM network, features are extracted at each time instance of a time-varying signal and input into the LSTM network as a matrix wherein each column is a feature vector corresponding to a different time step and each row is a different feature. During the training process, the LSTM network learns the mapping for the time-varying behavior of the data to accomplish the given task. A popular variant of the original LSTM architecture is the bidirectional LSTM (Bi-LSTM) architecture [53] which simultaneously learns in both the forward and reverse directions and is used in this work. An explanatory diagram for the Bi-LSTM network is provided by Fig. 2.
From Fig. 2, we see that feature vectors for each time step are input to the network as $x_t$. Each LSTM block uses the input feature vector ($x_t$) and the previous block's hidden state ($h_{t-1}$) to update the cell state from $C_{t-1}$ to $C_t$, and outputs a new hidden state ($h_t$) to be fed to the following LSTM block. The outputs from the Bi-LSTM layer are passed to a fully connected layer, followed by a softmax layer and then a classification layer. Inside each LSTM block, as shown by Fig. 2(b), several operations take place which give the LSTM architecture its unique functionality in comparison to traditional RNNs.
For operation of the Bi-LSTM network, the cell state ($C_t$) is passed through the repeating blocks of the Bi-LSTM layer and updated by several interacting gates, where each gate is composed of a sigmoid function, which determines how much information should be passed along, and a multiplication operation. From left to right in the diagram shown in Fig. 2(b), the first of these sigmoidal functions is the activation function for the forget gate ($f_t$), which uses the previous hidden state ($h_{t-1}$) and the current input ($x_t$) to determine how much of the previous cell state ($C_{t-1}$) is discarded. The second sigmoid function is the activation function of the input gate ($i_t$), which determines the values of the cell state to be updated. The input gate is multiplied with a vector of potential new values to be added to the previous cell state ($\tilde{C}_t$), produced by a hyperbolic tangent function, then combined with the previous cell state to obtain the current cell state ($C_t$). The third sigmoid is the activation function for the output gate ($O_t$), which determines which part of the cell state to output as the updated hidden state. The current cell state ($C_t$) is passed through a hyperbolic tangent function then multiplied with the output gate ($O_t$) to produce the updated hidden state ($h_t$). These defining interactions can be described mathematically as follows [39]:
$$
\begin{aligned}
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f), \\
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i), \\
\tilde{C}_t &= \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c), \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \\
O_t &= \sigma(W_{xO} x_t + W_{hO} h_{t-1} + b_O), \\
h_t &= O_t \odot \tanh(C_t),
\end{aligned}
$$
where i, f, and O are the input, forget, and output gates, respectively, C is the cell state vector, and W and b, with subscripts drawn from {i, f, O, c, x, h}, are the weights and biases of the network. σ and tanh represent the sigmoidal and hyperbolic tangent functions, respectively, and ⊙ denotes element-wise multiplication. Since a Bi-LSTM layer is used, the output of the Bi-LSTM layer to the fully connected layer is the concatenation of the final outputs from both the forward and reverse directions.
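The gate interactions described above can be sketched in plain Python as follows. This is a minimal illustration that assumes the standard LSTM formulation without peephole connections; the function names and the parameter layout (a dictionary of input weights, recurrent weights, and biases per gate) are hypothetical, and the dropout, fully connected, and softmax layers are omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(w_x, w_h, b, x, h):
    """Return w_x @ x + w_h @ h + b for list-of-lists weight matrices."""
    return [
        sum(wx * xi for wx, xi in zip(row_x, x))
        + sum(wh * hi for wh, hi in zip(row_h, h))
        + bias
        for row_x, row_h, bias in zip(w_x, w_h, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM block update; params maps 'f', 'i', 'c', 'o' to (w_x, w_h, b)."""
    pre = {name: affine(*params[name], x, h_prev) for name in ("f", "i", "c", "o")}
    f = [sigmoid(z) for z in pre["f"]]          # forget gate
    i = [sigmoid(z) for z in pre["i"]]          # input gate
    c_tilde = [math.tanh(z) for z in pre["c"]]  # candidate cell values
    o = [sigmoid(z) for z in pre["o"]]          # output gate
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def final_hidden_state(sequence, params, n_hidden):
    """Run one direction over the sequence and return the last hidden state."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


def bilstm_output(sequence, forward_params, reverse_params, n_hidden):
    """Concatenate the final forward and reverse hidden states."""
    forward = final_hidden_state(sequence, forward_params, n_hidden)
    reverse = final_hidden_state(sequence[::-1], reverse_params, n_hidden)
    return forward + reverse


# Toy example: one feature, one hidden unit, identical weights for every gate.
toy = {name: ([[1.0]], [[0.0]], [0.0]) for name in ("f", "i", "c", "o")}
h1, c1 = lstm_step([1.0], [0.0], [0.0], toy)
out = bilstm_output([[1.0], [0.5], [-0.2]], toy, toy, n_hidden=1)
print(h1, c1, out)
```

In the toy example the previous cell state is zero, so the first update reduces to the input gate multiplied by the candidate value, which makes the role of each gate easy to verify by hand.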
For the Bi-LSTM network used in this work, both the handcrafted morphological features and the transfer-learned features, which have been extracted for every time frame, are used as inputs. Note that for the handcrafted features, we are unable to extract the three handcrafted motility features at each individual frame. Instead, we use the statistical means of the optical flow vector magnitudes and orientations as additional static features to incorporate the information obtained from the optical flow algorithm.
During processing, all videos were limited to the first two hundred frames to reduce the computational requirements. The Bi-LSTM network used a dropout rate of 0.4 following the Bi-LSTM layer and was optimized using the Adam optimizer. For the animal RBC task, the Bi-LSTM layer contained 400 hidden units and was trained for 60 epochs with a learning rate of 0.001 for the first 30 epochs and 0.0001 for the final 30 epochs. These hyperparameters were chosen based on the performance with a single randomly chosen video as the test set. For the human RBC task, the Bi-LSTM layer contained 650 hidden units and was trained for 40 epochs with an initial learning rate of 0.0001 that dropped to 0.00001 for the final 10 epochs. Similarly, these hyperparameters were chosen based on the performance with a single randomly chosen patient as the test set. All processing, classification, and analysis was performed using MATLAB software. During feature extraction, approximately 100 cells were processed per minute for video sequences of 200 frames. After feature extraction, including training time, classification of a single patient's RBCs took less than 1 minute using an NVIDIA Quadro P4000 GPU.
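The two training configurations can be summarized as a small piecewise-constant learning-rate schedule. The Python sketch below only restates the hyperparameters given above; the dictionary layout, key names, and function name are illustrative and do not correspond to the MATLAB code used in this work.

```python
# Hyperparameters as stated in the text; key names are hypothetical.
TRAINING_CONFIGS = {
    "animal": {
        "hidden_units": 400,
        "epochs": 60,
        "initial_lr": 1e-3,
        "final_lr": 1e-4,
        "drop_after_epoch": 30,
    },
    "human": {
        "hidden_units": 650,
        "epochs": 40,
        "initial_lr": 1e-4,
        "final_lr": 1e-5,
        "drop_after_epoch": 30,  # final 10 of 40 epochs use the lower rate
    },
}
DROPOUT_RATE = 0.4
MAX_FRAMES = 200


def learning_rate(task, epoch):
    """Return the learning rate for a 1-indexed epoch of the given task."""
    config = TRAINING_CONFIGS[task]
    if not 1 <= epoch <= config["epochs"]:
        raise ValueError(f"epoch must be in 1..{config['epochs']}")
    if epoch <= config["drop_after_epoch"]:
        return config["initial_lr"]
    return config["final_lr"]


for task in TRAINING_CONFIGS:
    last = TRAINING_CONFIGS[task]["epochs"]
    print(task, learning_rate(task, 1), learning_rate(task, last))
```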

Performance assessment
To demonstrate the performance of our proposed method, we study two distinct red blood cell (RBC) datasets. First, we consider classification between cow and horse RBCs. Second, we consider a previously studied dataset of healthy and sickle cell disease RBCs from human volunteers [6]. In each case, the datasets are classified using a conventional machine learning strategy as well as the proposed deep learning method for comparison. The conventional machine learning algorithm uses the handcrafted features of Section 2.4 in a random forest classifier [54], then these results are compared to our proposed method using an LSTM deep learning model. To further illustrate the benefit of using time-varying signals as inputs for classification, we also consider a support vector machine classifier using the same features as the proposed LSTM model but extracted from only the first time frame.
The classification performance is assessed based on classification accuracy, area under the receiver operating characteristic curve (AUC), and the Matthews correlation coefficient (MCC). The AUC gives the probability that a classifier ranks a randomly chosen positive class data point higher than a randomly chosen negative class data point. For this metric, 1 represents perfect classification of all data points, 0 represents incorrect classification of all points, and 0.5 represents a random guess. The Matthews correlation coefficient is a machine learning performance metric that effectively returns the correlation coefficient between the observed and predicted classes in a binary classification task. Values of MCC range from -1 (total disagreement) to 1 (perfect classification), with 0 representing a random guess. The MCC is considered to be a balanced measure and provides a reliable metric of performance even when dealing with classes of varied sizes.
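For reference, the three metrics can be computed as in the following Python sketch. It uses the standard definitions of accuracy, MCC, and the pairwise-ranking form of the AUC; the labels, scores, and 0.5 decision threshold are made-up illustration values and not results from this work.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: three positive and three negative cells.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(y_true, y_pred)
print(f"accuracy={accuracy(*counts):.3f}, "
      f"MCC={mcc(*counts):.3f}, AUC={auc(y_true, scores):.3f}")
```

For this toy data the counts are two true positives, two true negatives, one false positive, and one false negative, giving an accuracy of 0.667, an MCC of 0.333, and an AUC of 0.889.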
The full process for cell identification and disease diagnosis is summarized in Fig. 3. Each red blood cell dataset is considered under two conditions: pooled data and cross-validated data. For the pooled data, all cells are grouped together and randomly partitioned with an 80%/20% split for training and testing. Cross-validation, on the other hand, provides a more accurate representation of real-world applications by testing the cells in a manner similar to that in which they were collected. For the human red blood cell dataset, cross-validation is performed at the patient level. That is, we test one patient at a time, ensuring no cells from the current test patient are present in the training data. The testing is repeated for each patient's data while the remaining patients comprise the training set, and the results are averaged. For the animal red blood cell dataset, cross-validation is performed at the video level. By this we mean all cells extracted from a given video were grouped together, and only the cells from a single video were used as the current test set. This ensures that no data from the current test video is present in the training set. As with the human RBC dataset, the results are averaged over all individual videos acting as the test set.
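The two evaluation conditions can be sketched in Python as follows. The function names, the fixed random seed, and the group labels are illustrative assumptions; a group stands for a patient in the human dataset or a video in the animal dataset.

```python
import random


def pooled_split(n_cells, test_fraction=0.2, seed=0):
    """Randomly partition cell indices into train and test sets (80%/20%)."""
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)
    n_test = round(test_fraction * n_cells)
    return indices[n_test:], indices[:n_test]


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per group.

    All cells sharing a group label are held out together, so no cell from
    the test patient (or video) appears in the training set.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def mean_score(fold_scores):
    """Average a per-fold metric over all held-out groups."""
    return sum(fold_scores) / len(fold_scores)


# Hypothetical group label for each segmented cell.
groups = ["patient_A", "patient_A", "patient_B", "patient_C", "patient_C"]
folds = list(leave_one_group_out(groups))
for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)

train_idx, test_idx = pooled_split(n_cells=10)
print(len(train_idx), len(test_idx))
```

Grouping by patient or video prevents cells from the same source from appearing on both sides of the split, which is why the cross-validated results are the more conservative estimate.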

Cell identification: classification of cow and horse red blood cells
The dataset for cow and horse red blood cells consisted of 707 total segmented RBCs (376 cow, 331 horse). The two animal cell types were chosen based on morphological similarity, as both animal RBC types are biconcave disk-shaped and between 5 and 6 µm in diameter [55]. The animal red blood cells were obtained as a 10% RBC suspension taken from multiple specimens and suspended in phosphate-buffered saline (as provided by the vendor). Thin blood smears of the blood samples were prepared on microscope slides and covered by a coverslip, then imaged using the digital holographic microscope shown in Fig. 1. No additional buffer medium was used, and all cells were imaged prior to the slide drying out due to exposure to air. Video holograms were recorded at 20 frames per second for 10 seconds. Following hologram acquisition, the cells were segmented and reconstructed, then features were extracted as described in the preceding sections. Segmentation was performed using a semi-automated algorithm wherein potential cells were identified through simple thresholding and an input expected size. Once potential cells were identified, morphological operators included in the MATLAB Image Processing Toolbox were used to automatically isolate the segmented cell from its background medium at each time frame. Human supervision during the segmentation process allows for quality control to remove any data from the dataset when multiple cells overlap or segmentation was not performed properly. Examples of segmented cow and horse RBCs are provided in Fig. 4. Notably, both cell types have similar size and shape, presenting a non-trivial classification problem. Video reconstructions of the segmented cells are available online. Furthermore, we provide the probability density functions for each of the hand-crafted features in Fig. 5, which show significant overlap between the two classes, indicating their similarity.
The resulting confusion matrices for the cross-validated data are provided in Table 2, and the receiver operating characteristic curves comparing the random forest and LSTM models for the cross-validated data are shown in Fig. 6.
From Table 2, we note a slight increase in classification accuracy when using the proposed LSTM classification strategy over the previously reported random forest model for cell identification using spatio-temporal cell dynamics [6,19]. Furthermore, both the random forest and LSTM models outperform the SVM model which uses features extracted from only a single time frame, highlighting the benefits of considering the dynamic cellular behavior. Figure 6 shows the ROC curves for both the random forest and LSTM models on the cross-validated data using different extracted features for classification.
The random forest model was tested under three conditions: (1) using only the static features, (2) using only the motility-based features, and (3) using the combination of all features. The LSTM model was also tested under three conditions: (1) using only the handcrafted morphological features, (2) using only the transfer-learned features, and (3) using the combination of all features. Amongst the random forest classifiers, the highest area under the curve (AUC = 0.8511) was achieved when the combination of both static and motility-based features was used. Overall, the LSTM model using combined features provided the best performance, having an AUC of 0.8615. A summary of the classification performance for all tested conditions, including both the pooled and cross-validated results, is given in Table 3. For the pooled data, the proposed LSTM model achieves the highest classification accuracy and MCC, whereas for the cross-validated data, the proposed LSTM model achieves the highest classification accuracy, AUC, and MCC.

Detection of sickle cell disease (SCD)
For validation on a clinically relevant dataset, we test our proposed method on a previously studied dataset of healthy and sickle cell disease human RBCs [6,19]. Sickle cell disease is recognized as a global health issue by both the World Health Organization (WHO) and the United Nations. The inherited blood disease is characterized by the presence of abnormal hemoglobin, which causes misshapen red blood cells and affects oxygen transport throughout the body. The disease is particularly devastating in less developed regions of the world, such as some regions of Africa, where approximately 1,000 children with SCD are born every day and over half will die before their fifth birthday [56]. The only cure for SCD is a bone marrow transplant, which carries significant risks and is rarely used for cases of SCD [57]. Most often, treatments include medications and blood transfusions [57]. Early and accurate detection of sickle cell disease is important for establishing a treatment plan and reducing preventable deaths. Cells with the abnormal sickle hemoglobin often form irregular rod-shaped and sickle-shaped cells, which give the disease its name. However, not all RBCs from an SCD patient will be sickle shaped. Despite some cells having a normal appearance under visual inspection, all RBCs produced by a person with sickle cell disease will have abnormal intracellular hemoglobin, which may contribute to abnormal cell behavior.
Standard procedures for identification of sickle cell disease use gel electrophoresis or high-performance liquid chromatography to analyze a blood sample and screen it for the presence of abnormal hemoglobin. The major downsides of these strategies are that they require dedicated laboratory facilities, the processing takes several hours, and, due to cost-saving measures, the patient may not receive the results for up to two weeks after the initial blood draw [6]. These drawbacks severely limit the testing abilities in areas of the world where the disease is most prevalent. To address these limitations, rapid point-of-care systems are currently being developed and tested [58,59]. Furthermore, optical methods such as multi-photon microscopy [60] and digital holographic microscopy [6,13,19] have also been used to study sickle cell disease and as potential diagnostic alternatives.
For generation of this dataset, 151 cells from six healthy (i.e., having no hemoglobinopathy traits) volunteers (4 female, 2 male) and 152 cells from eight patients with sickle cell disease (2 female, 6 male) were segmented from blood samples. After whole blood was drawn from the volunteers, a thin blood smear was prepared on a standard microscope slide with a coverslip for analysis in the digital holographic microscopy system (Fig. 1). All data were collected before the samples dried out from exposure to air. Video holograms were recorded for 20 seconds at 30 frames per second. Following data acquisition, the individual cells were segmented and numerically reconstructed to allow for feature extraction and classification. Due to the presence of abnormally shaped cells, all reconstructed RBCs in this dataset were manually segmented. Visualizations of both healthy and SCD RBCs are shown in Fig. 7. Furthermore, video reconstructions of these cells are available online. Again, we provide the probability density functions for each of the hand-crafted features in Fig. 8 to illustrate the non-trivial classification problem.
Fig. 8. Probability density functions of each feature for healthy (blue curve) and sickle cell disease (red curve) red blood cells. Mean optical path length is reported in meters, projected area in meters squared, optical volume in cubic meters, and all other features in arbitrary units.
The resulting confusion matrices for the cross-validated data are provided in Table 4, and the receiver operating characteristic (ROC) curves comparing the random forest and LSTM models for the cross-validated data are shown in Fig. 9. The confusion matrices in Table 4 show that we achieved the best performance using the proposed LSTM model for cross-validated classification of the human RBC dataset (81.52% accuracy). The proposed method outperformed both the previously presented random forest model (72.93% accuracy) and an SVM classifier using features extracted from only a single time frame (76.23% accuracy). Likewise, Fig. 9 shows the ROC curves for both the random forest and LSTM models on the cross-validated data using different extracted features for classification. From Fig. 9, the LSTM model using the combination of handcrafted morphological and transfer-learned features provided the highest AUC. A summary of the classification performance for all tested conditions, including both the pooled and cross-validated results, is given in Table 5. In both the pooled data and the cross-validated data, the proposed LSTM model achieves the highest classification accuracy, AUC, and MCC.
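The accuracy and Matthews correlation coefficient (MCC) values reported above follow directly from the entries of a binary confusion matrix. As a minimal sketch, the two metrics could be computed as shown below. The function name `binary_metrics` and the example counts are hypothetical and are not taken from Table 4, and treating sickle cell disease as the positive class is an assumption made only for illustration.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Return (accuracy, MCC) for a 2x2 confusion matrix.

    tp/fn/fp/tn are cell counts; the positive class is assumed
    (for illustration only) to be sickle cell disease.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # MCC denominator: product of the four marginal sums.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, mcc


# Hypothetical counts, NOT the values from Table 4.
acc, mcc = binary_metrics(tp=40, fn=10, fp=20, tn=30)
print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")  # accuracy=0.700, MCC=0.408
```

Unlike accuracy, the MCC uses all four entries of the confusion matrix, so it remains informative when one class is misclassified much more often than the other.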

Discussions
The classification results show the proposed method provided the best overall performance for both identification of different animal red blood cell classes and detection of sickle cell disease in human RBCs. The improvement of classification performance on two distinct datasets provides strong evidence that the proposed approach may be beneficial to various biological classification tasks including potential for diagnosis of various disease states. The results further demonstrate that the inclusion of spatio-temporal cellular dynamics can be used to improve classification performance.
For the animal RBC data, the proposed method provides approximately a 3% increase in classification accuracy. The proposed method was especially beneficial for the human RBC data, where we achieve an increase of nearly 9 percentage points in classification accuracy (72.93% to 81.52%) in comparison to the previously published random forest classifier on the cross-validated dataset. Several factors may be responsible for this difference in performance between the two RBC datasets. Firstly, from the video reconstructions, the sickle cell disease RBC shows very limited membrane fluctuations, potentially indicative of increased cell rigidity in comparison to the healthy human RBC (video reconstructions available online). The LSTM network's ability to represent this spatio-temporal information may explain why a greater improvement in performance is achieved for the human RBC dataset. Another possible explanation is that the transfer-learned features are particularly helpful for distinguishing the irregularly shaped diseased cells. This explanation is supported by the fact that the combination of handcrafted and transfer-learned features from a single frame using an SVM outperformed the previously used random forest model incorporating both static and spatio-temporal information (Table 4).
The results of the sickle cell disease dataset also highlight the importance of cross-validation when dealing with human disease detection (Table 5). When pooling all cells together and not considering the separation of individual patient data, the same dataset achieves nearly perfect classification with 98.36% classification accuracy and an AUC of 1, as opposed to the 81.52% classification accuracy and AUC of 0.8645 attained with patient-wise cross-validation. We believe that the drop-off from pooled data to cross-validated data was more evident in the human RBC group because of the smaller size of the dataset and the inter-patient variability, leading to a less homogeneous dataset than in the animal RBC data. Classification accuracy on human data should be expected to improve with larger datasets, even at the patient level, as the training data will become a better representation of the overall population that the testing data is drawn from. Furthermore, as data-driven methods, deep learning approaches tend to increase in accuracy along with growing data availability, whereas conventional machine learning methods reach a plateau in performance. Therefore, we believe the inclusion of cell motility information and the utilization of deep learning as demonstrated here will continue to have an important role in cell and disease identification tasks.
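The distinction between pooled and patient-wise evaluation comes down to how cells are assigned to training and testing sets. The following is a minimal sketch of a patient-wise split in which all cells from one patient are held out together, so that no patient contributes cells to both sets. The leave-one-patient-out scheme, the function name `leave_one_patient_out`, and the patient labels are assumptions made for illustration; the exact fold structure used in this work is not restated here.

```python
from collections import defaultdict


def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_idx, test_idx) for each patient.

    patient_ids[i] is the (hypothetical) patient label of cell i.
    Every cell of the held-out patient goes to the test set, so the
    classifier is never tested on a patient it was trained on.
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    for held_out in sorted(by_patient):
        test_idx = by_patient[held_out]
        train_idx = [i for pid, idxs in by_patient.items()
                     if pid != held_out for i in idxs]
        yield held_out, train_idx, test_idx


# Hypothetical labels: one entry per segmented cell.
cells = ["H1", "H1", "S1", "S1", "S1", "H2"]
for patient, train_idx, test_idx in leave_one_patient_out(cells):
    print(patient, "train:", train_idx, "test:", test_idx)
```

A pooled evaluation, by contrast, would shuffle all cell indices before splitting, allowing cells from the same patient to appear on both sides of the split, which is consistent with the inflated pooled accuracy discussed above.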
It is important to also discuss the advantages of the presented system and methodology with respect to traditional medical diagnostics as well as with respect to other machine learning-based approaches. Firstly, the presented system offers several advantages in terms of cost, field-portability, and time to results over conventional medical tests. The proposed methodology can provide classification of a patient's cells on the order of minutes, whereas a typical electrophoresis assay takes several hours, and oftentimes the assays are batched to reduce costs, which can extend the time for results to up to a week or more. Furthermore, these traditional systems require dedicated lab facilities and highly trained personnel. The presented system is field-portable, provides rapid results, and does not require extensive training. Lastly, since the system requires only a small sample of blood, it can easily be used in the field. In combination, these factors make the presented system an attractive option for potential use as a diagnostic system, especially in areas of limited resources.
Whereas several machine learning and deep learning approaches have been presented for classification of biological samples in digital holographic imaging [6,14,23-25,36-38], to the best of our knowledge, this is the first report of a deep-learning approach based on spatio-temporal dynamics. Previously presented works considering time-dynamics of living cells focus on cell monitoring [14,23,24], tracking of cell states over time [25], or use hand-crafted motility-based features [6] to include temporal information in classification using conventional machine learning algorithms. On the other hand, previous deep-learning approaches for biological cell classification [36-38] using convolutional layers do not consider temporal behavior of the cells. The proposed methodology enables the use of time-varying behavior in a deep learning framework by utilizing a Bi-LSTM network architecture and shows improved performance over classification using handcrafted motility-based features in conventional machine learning approaches [6].

Conclusions
In conclusion, we have presented a deep learning approach for cell identification and disease detection based on spatio-temporal cell signals derived from digital holographic microscopy video recordings. Holographic videos were recorded using a compact, 3D-printed shearing digital holographic microscope, then individual cells were segmented and reconstructed. Handcrafted morphological features and features obtained through transfer learning using a pretrained convolutional neural network were extracted at each time frame to generate time-varying feature vectors as inputs to the LSTM model. The proposed approach was demonstrated for cell identification between two morphologically similar animal red blood cell classes and for disease detection using a clinically relevant dataset consisting of healthy and sickle cell disease human red blood cells. In each instance, the cross-validated data had improved performance using the proposed approach over conventional machine learning methods, with substantial improvement noted for sickle cell disease detection. The proposed approach also outperformed a classifier using data from only a single time frame, indicating the benefit of studying time-evolving cellular dynamics. Future work entails continued study of time-varying biological signals, increased patient pools for clinically relevant studies, and testing on various disease states. Additional future work may also consider Zernike coefficients in combination with morphological or transfer-learned features using the proposed spatio-temporal deep learning strategy, as well as the use of the proposed methodology to distinguish between various disease states that may appear morphologically similar [20].

Funding
Office of Naval Research (N000141712405).