Evaluation of a Methodology for Automated Cell Counting for Streak Mode Imaging Flow Cytometry

Circulating tumor cells (CTCs) have been reported to play an important role in the development of distance cancer micrometastases, and a potential role in characterizing genetic changes with tumor progression, but only recently, CTCs technology has matured to achieve acceptable reproducibility and sensitivity levels to explore clinical utility [1,2]. In disease monitoring, baseline CTCs count and cell characterization have shown promise as an early predictor of metastases [3,4], but since CTCs are found in very low concentration in blood (1-100 per mL) [5], large sample volumes (5-10 ml) are needed for meaningful enumeration impeding standard flow cytometry for analysis (due to low volumetric throughput) [6]. Latest technologies in microfluidic cytometry allow the analysis of several thousand particles per second, but this technology is still bounded to low throughput, which limits the device to either small volumes or long analysis times [7,8].


Introduction
Circulating tumor cells (CTCs) have been reported to play an important role in the development of distance cancer micrometastases, and a potential role in characterizing genetic changes with tumor progression, but only recently, CTCs technology has matured to achieve acceptable reproducibility and sensitivity levels to explore clinical utility [1,2]. In disease monitoring, baseline CTCs count and cell characterization have shown promise as an early predictor of metastases [3,4], but since CTCs are found in very low concentration in blood (1-100 per mL) [5], large sample volumes (5-10 ml) are needed for meaningful enumeration impeding standard flow cytometry for analysis (due to low volumetric throughput) [6]. Latest technologies in microfluidic cytometry allow the analysis of several thousand particles per second, but this technology is still bounded to low throughput, which limits the device to either small volumes or long analysis times [7,8].
The issue of low throughput has been extensively studied to address the challenge of large volumes analyses in flow cytometry [9] including low cost Lab on a Chip CMOS or CCD-based multichannel detectors [10][11][12]. Mobile phones have also been proposed for cell enumeration and analysis [13][14][15], however the cameras in these phones are less versatile with their optical systems than webcams (e.g., inability to change lenses).
A high throughput microfluidic cytometer was recently developed using a wide field flow-flow cell instead of the conventional narrow hydrodynamic focusing cells used in traditional flow cytometry enabling analysis of large volumes at lower flow rates (challenging for standard flow cytometers) [16,17]. This wide-field flow cytometer adopts a technique used in Particle Image Velocimetry (PIV) known as "streak photography" where exposure times and flow velocities are set such that the particles are imaged as short "streaks" (Figure 1). Since streaks are imaged with large number of pixels, they should be easily distinguished from the noise which appears as "speckles" increasing Abstract Identification of Circulating Tumor Cells (CTCs) has shown promising clinical applications, but since CTCs are found in very low concentration in blood large sample volumes are needed for meaningful enumeration. This issue impedes the analysis of CTCs using standard flow cytometry due to its low throughput. To address this issue, a high throughput microfluidic cytometer was recently developed using a wide field flow-flow cell instead of the conventional narrow hydrodynamic focusing cells (used in traditional flow cytometry) enabling analysis of large volumes at lower flow rate. This wide-field flow cytometer adopts a technique known as "streak photography" where exposure times and flow velocities are set such that the particles are imaged as short "streaks". Since streaks are imaged with large number of pixels, they are easily distinguished from the noise which appears as "speckles" increasing the detection capabilities of the device, making it more suitable for analysis using current low sensitivity, high noise webcams or mobile phone cameras. The non-stationary nature of the high noisy background found in streak cytometry introduces additional challenges for automated cell counting methods using traditional cell detection techniques such TLC, CellProfiler, CellTracker and other tools based in traditional edge detection (e.g., Canny based filters) or manual thresholding. In order to address this issue, we developed a new automated enumeration approach that does not rely on edge detection or manual thresholding of individual cells, rather is based in image quantizing, morphological operations, 2D order-statistic filtering and decisions rules that take into account knowledge of the structure and expected location of the streaks in consecutive frames. We evaluated our approach comparing it with two current methods representing the major computational modalities for cell detection: CellTrack (based in edge detection) and MTrack2 (based in manual thresholding). Samples of 1 cell/mL nominal concentration were analyzed in batch size of 30 mL at flow rate of 10 mL/min and imaged at 4 frames per second (fps), the files were saved in uncompressed AVI format files. The cells were annotated and the signal to noise ratio (SNR) was calculated. For samples with average SNR greater than 4.4 dB, our method achieved a sensitivity of 91% compared to CellTrack (60%) and MTrack2 (71%). The True Positive Rate (TPR) of cells detected was 0.93 for our method compared with 0.80 for Mtrack2 and 0.29 for CellTrack. This demonstrated the ability of the algorithm to count rare cells with high accuracy for concentrations of 1 cell/mL with SNR greater than 4.4 dB. This cell counting capability will enable to automate low cost imaging flow cytometry based on CCD detector and the expansion of cell-based clinical diagnostics in resource-poor settings.
the detection capabilities of the device, making it more suitable for analysis using current low sensitivity, high noise webcams or mobile phone cameras. In addition, since the images are taken at low speed the file size is reduced by a factor of 40 [16,17].
Advantages of the wide field streak cytometer are its low cost manufacturing, portability and it allows rapid enumeration at low cell concentration (e.g., less than 1 cell/ml), but it requires visual counting making the cell enumeration time-consuming and subjective, highlighting the need for an automated cell detection and tracking method.
Cell detection and tracking is an important problem with numerous clinical applications and it has been the subject of intense study in recent years. Commercial and public domain software tools and methods for cell motion tracking haven intensively reviewed [15,16]. General purpose tools for cell analysis such as TLA (Time lapse Analyzer) [18], CellProfiler (Broad Institute) [19] and other more specialized cell tracking tools such as CellTrack [20], CellTracker [21] and others [22,23], provide a framework for image object-tracking based on traditional edge detection (e.g., Canny based filters) and segmentation methods, but faint moving streaks over a moving background present a challenge for these methods. Other tools such MTrack2 (ImageJ plugin) do not rely on edge detection, rather rely on manual image thresholding and size filtering for detection and tracking [24]. Since manual thresholding is pre-selected by the user based on the average cell intensity, faint cells in a high noise background (usually encountered in streak imaging cytometry) also present a challenge for these methods.
In previous work, we developed a computational framework to overcome the challenges of cell detection and enumeration for streak image cytometry [25]. Our approach does not rely on edge detection or manual thresholding of individual cells, rather is based in image quantizing, morphological operations, 2D order-statistic filtering and decision rules based in streak intensity, streak integrity and cell location for identification and tracking. In this paper, we perform a comparison of our method [25] against current tools for cell detection and tracking. The tools selected represent the two most effective/used methods for cell tracking: MTrack2 (based on manual thresholding) and CellTrack (based on edge detection).

Webcam-based flow cytometer
The webcam-based wide-field streak cytometer and the data used in this paper was reported in previous work [16,17]. Briefly, a green emission filter with center wavelength 535 nm and bandwidth 50 nm (Chroma Technology Corp., Rockingham, VT) was used for detecting fluorescent emission. For fluorescent excitation, a 1 W 450 nm laser was used (Hangzhou BrandNew Technology Co., Zhejiang, China). A Sony PlayStation® Eye webcam with resolution of 640 × 480 pixels with a max video frame rate of 120 fps, equipped with a c-mount CCTV lens (Pentax 12 mm f/1.2) was used as the photodetector. The webcam sensor was connected to a 32-bit Windows-based laptop computer via a USB2 port. The camera control software was used to set camera parameters (exposure time, frame rate, and gain) and to save video in uncompressed AVI format ( Figure 2). Cell SYTO-9: As described in previous work, fluorescently stained THP-1 human monocytes were used to simulate rare cells [16,17]. Briefly, cells were labeled with SYTO-9 dye added to 1 mL of suspended cells solution. After labeling, cells were diluted to a level of approximately 1 cell/μL (measured by microscopy). From this relatively high concentration, lower concentration 27 samples of 1 cell/ mL were generated by single-step dilution. Samples of cells at 1 cell/mL concentration were injected to the wide view flow cell in batch sizes of 30 mL with flow rate set to 10 mL/min.

In-House methodology for streak detection and counting
The streak detection and tracking algorithm was reported in previous work [25] (for details of the algorithm refer to the Appendix). The algorithm is implemented in MATLAB R2014b and consists of three major procedures: Streak detection: Streaks are defined as vertical elements that are expected to belong to cells. In this step we identify all potential streaks through thresholding using Otsu method [26] and noise reduction using a 2D order-statistic filter (please refer to procedure "ComputeStreakMask" in Appendix for more details). The final result of this procedure is a binary mask that contains vertical regions of the frame where there is a high likelihood of containing a cell.

Identifying candidate cells:
The binary mask from the previous step is overlaid with the original image and each streak representing a potential cell is enclosed in a boundary box for further possessing. Features of the streaks such as area, centroid, orientation, etc. are computed for each streak. Using their centroid and orientation, streaks are partitioned into equivalent classes and identified as a candidate cell (equivalent streaks are defined as streaks that belong to the same cell).  frames using a MATLAB application developed to assist with this task (Figure 4).
In order to facilitate the automatic evaluation of various cell detection/identification algorithms, we developed a MATLAB application for matching the detected cells to the ground truth cells. First, a complete weighted bipartite graph is built having as vertices all of the detected cells 'D' and all the ground-truth cells 'T' in all the video frames (Figure 5a), and edges between (D, T) with weight equal to the minimum (across all the frames where both cells appear) of the distance between the bounding boxes (in the same frame) of the cells corresponding to 'D' and 'T' (Figure 5b). Next, we remove all edges with weight above a user provided threshold, and replace the edge weights w(D, T) with wmax − w(D, T), where wmax is the maximum edge weight (Figure 5c). Finally, we compute a maximum weight matching in the resulting graph.
Please refer to procedure "ComputeStreakAndCandidateCells" for more details). The final result of this step is the location of all candidate cells in different frames.

Filtering out spurious cells and identifying true cells:
The candidate cells identified in the previous steps are evaluated according to intensity and length to identify true cells and eliminate spurious cells (please refer to procedure "FilterSpuriousCells" in the Appendix for more details).
The parameters of our algorithm are fixed for all the samples at the same values.

Signal to noise ratio
As described in previous work [25] the SNR of a cell occurrence is computed from the intensity values of the pixels within its bounding box and a noise box. The noise box is the area consist of the 5 outer pixels of 12 pixel dilation of the bounding box (7 pixels away from the signal) ( Figure 3). The signal consists of all the pixels enclosed within cell's bounding box; this ensures that pixels associated with the cell are not used in estimating the background noise. The background noise is the average of the pixels in the noise rectangle. SNR was calculated as:

Matching detected cells to ground truth cells
Ground truth cells were identified by manually reviewing each video file, and providing for each such cell the location and frame(s) in which it occurs. The cells were annotated while crossing the    This matching associates each detected cell with at most ground truth cell. Note that some detected cells may not be matched with any ground truth cell; in which case, we consider such detected cells as "false positives" (FP). Similarly, some ground truth cells may not be matched with any detected cell; in which case, we consider such ground truth cells as "false negatives" (FN). The matched detected cells are considered "true positives" (TP).
Upon counting the FN, FP, and TP cells, we calculate the false positive rate (FPR), false negative rate (FNR), true positive rate (TPR) and sensitivity as showed below:

Performance of our In-House method compared with current methodology for cell tracking
We compare the performance of our method with CellTrack (edge detection method) and MTrack2 (standard thresholding method). We feed the same frame sequences to these tools and compared the results with our algorithm (Figure 6). CellTrack requires inputting the initial edge threshold and the edge linking threshold for efficient edge detection. We set the initial edge threshold between 30-60, and we set the edge linking threshold between 20-50 for all the samples. For   MTrack2, the threshold for detection was set between 140-150 and the minimal size object detected was 20 pixels.

Results
Our algorithm along with Mtrack2 and CellTrack were evaluated in video files from all the 27 samples containing 1 cell/mL. Tables 1 and  2 show the detection results of our algorithm, MTrack2 and CellTrack respectively as compared with the ground truth. The samples are ordered by the average Signal-to-Noise Ratio (SNR) of their ground truth cells. The SNR of a cell is the maximum of the SNR's of all of its occurrences ( Table 3).
Our algorithm performs better than CellTrack and MTrack2 for all the samples as well as the samples with SNR at least 4.4 dB. In addition, the average value for false positive and false negative rates were considerably smaller in our algorithm (0.07 and 0.09 respectively) compared with MTrack2 and CellTrack (Table 4).
Considering all SNRs values across all samples, our algorithm also performed better than MTrack2 and CellTrack (Table 5). We can clearly see that our algorithm shows substantially better (and never worse) sensitivity for all the samples as compared with MTrack2 and CellTrack.

Conclusion
Our algorithm performed better than current methods for cell detection and enumeration (average true positive rate of 93% TPR and sensitivity of 91%) in detection of cells in concentrations of 1 cell/ml for samples with SNR at least 4.4 dB. In comparison, current commonly--used edge detection or threshold based tools such CellTrack and MTrack2 did not performed well. CellTrack fails to detect faint moving streaks in the moving background producing a high false positive rate. MTrack2 performs better than CellTrack but still much worse than our method.
Our method enables automation of the new imaging cytometry technique for rare cell detection. Wide field video imaging cytometry combined with cell streak imaging results in a simpler, affordable, and more portable flow cytometer, which facilitates the expansion of cellbased clinical diagnostics, especially in resource-poor settings.
For samples with SNR lower than 4.4 dB, our algorithm was able to detect only ~58% of the true cells. Our algorithm for cell detection relies in part on the ability to differentiate cells from noise in each individual frame, therefore the algorithm performed poorly for very faint cells. In order to detect these faint cells other techniques based on recognition of pixel spatial patterns and expected location in the frame may be used. In future research we plan to develop a methodology for detecting rare cells with lower SNR values.

Appendix Automated streak detection and counting algorithm
Streak detection and binary mask: The objective of this procedure is to create a binary mask with the location of all the streaks in the frames that potentially represent cells. The creation of the binary mask is achieved through the procedure I "ComputeStreakMask". The major steps of this procedure (in MATLAB-like pseudocode) are given below.
Step 1: The 640 × 480 pixel frames are preprocessed by removing 40 pixels in left and right margins to eliminate artifacts showed in the margins of the frame (Figure 8a).

Steps 2-3:
The image intensity is adjusted by histogram equalization, then two thresholds values (of the image intensity) are selected using Otsu's method quantizing the image into three levels (Figure 8b).
Step 4: In order to fill small gaps along the boundary of foreground objects, a morphological close with a 3 × 2 pixels rectangle is slide across each frame. This procedure generates unwanted noise (Figure 8c), but the foreground objects (potential streaks) become more defined (no gaps along the boundary).    The binary mask from the previous step is overlaid with the original image and the streaks representing potential cells are enclosed in boundary boxes for further possessing (Figure 11). The following features are computed for each streak: (a) area, (b) bounding box (centroid, height, and width), (c) major and minor axes length of enclosed ellipsoid, (d) eccentricity, (e) orientation, (f) perimeter, and (g) descriptive statistics (min, max, media, quantiles, variance, sum) of the values of the streak's pixels. Most of these features are provide by the "regionprops" MATLAB command. Streaks (across all frames) are grouped and labeled as equivalent if they are expected to belong to the same cell, based on the displacement of their centroids within a tolerance level. The streaks that are in the same equivalence class are now identified as a candidate cell. For each candidate cell, we compute aggregates (min, max, mean, median, range, variance) of the features of its streaks. Note that in MATLAB's image coordinate system, the x-axis (y-axis) runs along the image's width (height), increasing from left-toright (top-to-bottom) with the (0, 0) point at the upper-and-left-most pixel. By the end of this step the streaks are annotated and candidate cells are identified for each frame.

Filtering out spurious cells and identifying true cells:
The candidate cells identified in the previous steps are evaluated according to intensity and length. the goal of this procedure is to eliminate the cells with a low probability of being real cells. The major steps of this procedure are listed below (Procedure III).
Briefly, to eliminate spurious cells, we computed the 90 th quantile of streak height and 75 th quantile of the streak mean intensity and we eliminated all the streaks in the frame with height or mean intensity below these values ( Figure 12).
Step 5: In order to reduce the background noise generated in the previous steps, a 2D order statistic filtering is performed. Briefly, a 23 × 1 pixels rectangle is slid across the frame, replacing the pixel value at the rectangle origins with the 3 rd smallest of the pixel contained in the rectangle. The result of this operation will eliminate pixels with low values around the foreground object, but will preserve pixels with small values inside the foreground object (Figures 9a and 9b).
Step 6: The three level threshold image is converted to a binary image (Figure 9c).

Steps 7, 8:
This steps are necessary to clean the image from small artifacts and eliminate extrusions from the boundary of foreground objects. All objects (potential streaks) smaller than 50 pixels are eliminated (Figures 10a and 10b), then, a morphological open, with an 81 × 1 rectangle, is performed to remove non-vertical short objects, this objects most probably represent artifacts. The final result of this procedure is a binary mask with the location of all the streaks that potential represent cells (Figure 10c).

Identifying candidate cells:
The goal of this step is to identify all potential cells (candidate cells) using the streak location from the binary mask. This is achieved through procedure II "ComputeStreakAndCandidateCells".  Figure 10: All the foreground objects smaller than 50 pixels a long with non-vertical short objects are discarded. In addition, all the extrusion of the boundary of foreground objects are eliminated providing clean, well defined streaks mask(c). Notice, the binary mask (c) includes three streaks, but only the streak in the middle corresponds to a real cell.   Figure 12: Eliminating Spurious cells. Only one object was classified as nonspurious cell(b) from the three objects identified in the previous steps (a).