Fully-automatic deep learning-based analysis for determination of the invasiveness of breast cancer cells in an acoustic trap

A single-beam acoustic trapping technique has been shown to be very useful for determining the invasiveness of suspended breast cancer cells in an acoustic trap with a manual calcium analysis method. However, for the rapid translation of the technology into the clinic, an efficient and accurate analytical method is needed. We, therefore, develop a fully-automatic deep learning-based calcium image analysis algorithm for determining the invasiveness of suspended breast cancer cells using a single-beam acoustic trapping system. The algorithm segments cells, finds trapped cells, and quantifies their calcium changes over time. For better segmentation of calcium fluorescent cells even with vague boundaries, a novel deep learning architecture with multi-scale/multi-channel convolution operations (MM-Net) is devised and constructed by a target inversion training method. The MM-Net outperforms other deep learning models in cell segmentation. Also, a detection/quantification algorithm is developed and implemented to automatically determine the invasiveness of a trapped cell. To evaluate the algorithm, it is applied to quantify the invasiveness of breast cancer cells. The results show that the algorithm offers similar performance to the manual calcium analysis method for determining the invasiveness of cancer cells, suggesting that it may serve as a novel tool to automatically determine the invasiveness of cancer cells with high efficiency.

were applied to the cancer cells. This observation clearly suggested that cytoplasmic calcium responses of cancer cells to external stimuli were related to the invasiveness of breast cancer cells. This finding provided the impetus for the development of cancer cell stimulators that can increase the concentration of intracellular calcium levels by applying mechanical stimuli such as hydrodynamic, electrical, and magnetic forces in order to identify characteristics of cancer cells [7][8][9].
Recently, Hwang et al. successfully evaluated a single-beam acoustic trapping system for determining the invasion potential of suspended breast cancer cells [10,11]. The single-beam acoustic trapping system made it possible to trap a target cell in suspension and subsequently modulate its cytoplasmic calcium signaling. The report showed that the single-beam acoustic trapping system has the potential to determine the invasion potential of suspended breast cancer cells [12]. In these studies, to determine the invasion potential of suspended breast cancer cells in an acoustic trap, the changes in the fluorescence intensity of calcium indicators loaded into the cells were manually or semi-automatically determined by comparing consecutive fluorescence images using a conventional image processing tool [10,12]. However, the image analysis was excessively laborious and time-consuming, constraining the use of the single-beam acoustic trapping system as a biophysical tool for medical and biological applications.
In this study, we, therefore, develop a novel fully-automatic deep learning-based image analysis algorithm that is capable of determining the invasion potential of breast cancer cells in an acoustic trap. We built a single-beam acoustic trapping system with a high-frequency ultrasound transducer for trapping a target cell. Also, we developed a novel deep learning-based model for cell segmentation, identification of a trapped cell, and quantification of the calcium responses of the cell to the applied trapping force. The system and deep learning model are applied to determine the invasion potentials of breast cancer cells in suspension. The outcomes obtained using the fully-automatic deep learning-based algorithm are quantitatively compared to those obtained using a manual calcium analysis method to evaluate the performance of the developed algorithm for determining invasion potentials of breast cancer cells in an acoustic trap. The results demonstrate that a single-beam acoustic trapping system equipped with the novel fully-automatic deep learning-based image analysis algorithm has the potential to be a novel biophysical tool for determining the invasion potential of cancer cells in suspension.

Contributions
The contributions of this work are as follows: 1) We have developed a fully-automatic analysis system for determining the invasiveness of suspended breast cancer cells in an acoustic trap. The system may therefore serve as a novel tool to determine the invasion potential of breast cancer cells and, with further refinement, may offer a rapid test for invasiveness of tumor biopsies in situ; 2) A deep learning-based model, denoted here as a multi-scale and multi-channel deep learning network (MM-Net), uses a novel deep learning architecture, based on the combined structure of U-Net and InceptionNet via a bridge connection, to achieve better segmentation of fluorescent cells with vague boundaries [13,14]; 3) MM-Net can maintain spatial information as well as extract multiple features with different scales by exploiting the advantages of U-Net and InceptionNet; 4) To enhance the performance of our deep learning networks in the segmentation of fluorescent objects with vague boundaries, we propose a novel training methodology, which is denoted as a target inversion training method; 5) MM-Net outperforms other state-of-the-art deep learning models: the IoU value obtained by using MM-Net is higher than the IoUs obtained by using other deep learning models in the segmentation of fluorescent cells; 6) The deep learning-based image analysis algorithm has multiple functions, such as automatic segmentation of fluorescent cells with high accuracy, identification of a cell trapped in an acoustic beam, tracking of calcium changes of the trapped cell moving into an acoustic trap over time, and quantitative determination of the invasiveness of the trapped cell.

Related works
1) Conventional Image Segmentation Methods: Various automatic segmentation methods for fluorescent cell images, based on conventional image processing or deep learning techniques, have been extensively studied to extract important features of individual cells in fluorescence images. Otsu, Maximum Entropy, and isodata-based segmentation methods have been conventionally used for cell segmentation. These methods accurately segment cells in fluorescence images with a high signal-to-noise ratio (SNR). On the other hand, k-means clustering and Canny edge detection-based segmentation methods showed better performance in cell segmentation than the aforementioned methods. However, their performance was also sensitive to the SNR of fluorescence images and the threshold values to be selected [15].
2) Deep learning-based Image Segmentation Methods: Recently, several automatic cell segmentation algorithms based on deep learning techniques have been developed to overcome the problems of conventional image processing methods [16][17][18]. Deep Convolutional Neural Networks (DCNNs) were applied to segment a live cell. The method showed similar performance to the manual annotation method in live-cell segmentation [16]. A fully-convolutional semantic segmentation network based on SegNet was also shown to provide good performance in the segmentation of live fluorescent cells in the presence of high noise induced by low illumination [18]. On the other hand, Conv-Nets not only accurately segmented the cytoplasm and nuclei of mammalian cells but also classified cell types [17].
3) Automatic Cell Tracking Methods: Automatic cell tracking techniques offer a high-throughput quantitative analysis of cell proliferation, motility, and division, which can provide valuable insights into biological processes. Various methods have been developed for the elucidation of cell signaling, cell phenotyping, etc. [19][20][21]. An automatic cell tracking algorithm, based on a scoring system for choosing the best morphological and intensity match, was successfully applied to find the lineage of Caenorhabditis elegans in time-lapse fluorescence images [19]. On the other hand, a cell tracking algorithm based on sequential and joint optimization techniques has been developed to track cell signals in images. The sequential and joint optimization techniques improved the accuracy in the tracking of cell signals [20]. Besides these methods, model fitting algorithms or inter-frame assignment algorithms have been developed to track a target cell in time-lapse images. However, these methods required manual post-processing to achieve better outcomes [22][23][24].

Materials and methods
2.1. High-frequency ultrasound single beam acoustic trapping system with deep learning-based calcium image analysis
A high-frequency ultrasound single beam acoustic trapping system has been developed for quantification of the calcium response of breast cancer cells to the given trapping force. As shown in Fig. 1, the single-beam acoustic trapping system consists of two functional modules: an acoustic trapping module and a fluorescence imaging module. The acoustic trapping module in the system was designed to adjust the position of a single-element focused ultrasound transducer using three-axis motorized stages. The ultrasound transducer generates a highly focused ultrasound microbeam for trapping and manipulating suspended cells in a petri-dish. A function generator (SG382, Stanford Research Systems, CA, USA) was used to produce sinusoidal bursts with a frequency of 30 MHz, which were amplified in a power amplifier (525LA, E&I ltd., Rochester, NY, USA). To identify the position of the focal point of an acoustic microbeam, a pulser-receiver (UTEX340, UTEX Scientific Instruments Inc., Ontario, Canada) was utilized.
In the acoustic trapping module, the acoustic force, generated by the module, trapped and stimulated periodically a breast cancer cell in suspension. The sequential concentration changes of intracellular calcium ions of the trapped cell with respect to the acoustic trapping force were visualized by using an inverted fluorescence microscope (IX71, Olympus, Center Valley, PA, USA). The light path in the fluorescence imaging module is shown in Fig. 1. The incident light from a mercury lamp passed through an excitation filter (488nm ± 20nm), was reflected by a dichroic mirror, and then passed through the microscope's objectives to excite the calcium fluorescence indicators loaded into cells. The fluorescence from the indicators was then collected by the objectives and passed back through the dichroic mirror and an emission filter (530nm ± 20nm). The fluorescence was recorded by a CMOS camera attached to the fluorescence microscope to form a calcium fluorescence image over time.
After time-lapse calcium fluorescence images were acquired before and after acoustic trapping, they were analyzed using the developed deep learning-based calcium image analysis model. The model automatically segments cells, identifies an acoustically trapped cell among the cells, and analyzes intracellular calcium changes of the trapped cell to determine its invasion potential.

High-frequency single element focused ultrasound transducer
A highly focused high-frequency ultrasound transducer was fabricated to trap a suspended cell. A multi-layered transducer was designed and fabricated to generate a highly focused ultrasound microbeam. The transducer has a multi-layered structure including a LiNbO3 single-crystal layer, double matching layers, and a backing layer. A LiNbO3 single-crystal layer with a thickness of 71 µm was coated on both top and bottom sides with chrome/gold (Cr/Au, Nano-Master, TX, USA). The first matching layer of 12 µm was constructed with a mixture of Epo-Tek 301 epoxy (Epoxy Technology Inc, MA, USA) and 2-3 µm silver powder (Sigma-Aldrich Chemical Co., MO, USA) on the bottom side of the LiNbO3 plate. The conductive epoxy layer (E-solder 3022, Von Roll Isola Inc, Solothurn, Switzerland) was attached to the other side of the LiNbO3 plate to construct the backing layer. This acoustic stack was placed in a brass housing, and the space between the acoustic stack and housing was filled with Epo-Tek 301 epoxy for electrical insulation. The bottom side of the acoustic stack was formed into a concave shape by using a conventional press-focusing method. Parylene C was finally deposited over the concave surface as the second acoustic matching layer. A pulse-echo test of the ultrasound transducer was performed to evaluate the transducer using a pulser-receiver. The center frequency and bandwidth of the transducer were 22 MHz and 48% [Fig. 2(c)]. The diameter and the focal depth of the transducer were 5 mm and 3.8 mm, respectively. The lateral and axial width of the focused ultrasound microbeam, generated from the fabricated transducer, and its negative-peak pressure at different input voltages were characterized using a needle-type hydrophone (AIMS II, Onda Corporation, CA, USA) [Fig. 2].
From the measured intensity profiles of an acoustic microbeam, the lateral and the axial width were determined as the domain where the intensity of the induced acoustic microbeam was higher than the critical value (-3 dB). The lateral and the axial width were around 30 µm and 200 µm, respectively [Fig. 2(b)]. The negative-peak pressures at the acoustic beam focus were also measured to be 1.
To automatically segment fluorescent cells, we constructed a novel multi-scale/multi-channel deep learning model based on U-Net and InceptionNet, which is denoted as MM-Net [13,14].
To combine U-Net with InceptionNet, we implemented multi-channel convolution filters with various sizes, used in InceptionNet, into the "U shaped" architecture of U-Net. Also, we added a bridge connection between the encoder and decoder to combine all the features extracted from each stream of multi-channel convolution layers with various sizes [Fig. 3(a)]. Here the bridge connection is based on the addition operation, like the residual block utilized in FusionNet. By combining U-Net and InceptionNet, we could exploit the main advantages of both models, such as maintaining location information on an image and extracting multiple features via multi-scale and multi-channel convolution operations. The MM-Net consists of two parts, as shown in Fig. 3(b) and Table 1: upsampling and downsampling layers that have multi-scale convolution filters. The upsampling and downsampling structures were inspired by U-Net. The U-Net architecture is one of the underlying deep neural network architectures for the segmentation of biological images. Moreover, U-Net is a robust algorithm for the segmentation of regions of interest in biomedical applications with limited training samples. On the other hand, the multi-scale convolution filters are the notable features used in InceptionNet. They are applied to the upsampling and downsampling layers to increase performance. In the InceptionNet architecture, applying the multi-scale convolution channels improves the explicit recognition of a target. The architecture of MM-Net is described in detail in the following. To make the output image size equal to the size of the input image, convolutional padding is applied to convolution operations. In downsampling layers, convolution layers with ReLU activation and max-pooling layers are utilized as in the U-Net model. Unlike the U-Net model, however, four convolution layers with different sizes of 1 × 1, 3 × 3, 5 × 5, and 9 × 9 are applied to each stream, which allows better detection of feature-pools. Each stream has a series of blocks that are the outputs of convolution max-pooling layers, but a convolution operation with a different filter size is applied to every block in each stream. Furthermore, the next block is generated as the result of the convolution operation on the previous block. At the bottom of the architecture, the outputs of each stream are not concatenated. Instead, the outputs are added, as in the residual connection of FusionNet, to reduce computational complexity as well as increase the accuracy in segmentation [25]. The concatenation operation duplicates the channels of a feature map.
In contrast, the addition operation adds each channel of images. Therefore, the number of channels of a feature map can be significantly reduced by the addition operation. With the reduced number of the channels of the feature map, the addition operation has reduced the computational complexity in the proposed model. Moreover, in the previous study, the addition operation showed increased accuracy in the segmentation compared to the concatenation operation [25] (Table 1).
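The channel-count argument can be illustrated with a small NumPy sketch. The shapes used here (four streams of 32 × 32 feature maps with 64 channels each) and the function names are illustrative assumptions, not values taken from MM-Net or Table 1.

```python
import numpy as np

def merge_by_concatenation(streams):
    """Stack the feature maps of all streams along the channel axis."""
    return np.concatenate(streams, axis=-1)

def merge_by_addition(streams):
    """Sum the feature maps of all streams channel by channel."""
    return np.sum(streams, axis=0)

# Four hypothetical streams (for example, one per filter size), each H x W x C.
rng = np.random.default_rng(0)
streams = [rng.random((32, 32, 64)) for _ in range(4)]

concatenated = merge_by_concatenation(streams)
added = merge_by_addition(streams)

print(concatenated.shape)  # (32, 32, 256): channels grow with the number of streams
print(added.shape)         # (32, 32, 64): channel count is unchanged
```

Under these assumptions, any layer that consumes the merged map sees four times fewer input channels after addition than after concatenation, which is the source of the reduced computation described above.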
A multi-scale convolution operation is also applied to the deconvolution layers and increases the probability-pools, thereby resulting in a more detailed prediction. Moreover, in the upsampling layers, multi-scale deconvolution layers are applied to the added layers. With the residual connection, the output blocks of the multi-scale deconvolution layers are added and merged into a single added block. A multi-scale deconvolution operation is then applied to the added block so that the next four different blocks, indicated by different colors, are produced. These steps are repeated to the end of the structure (Fig. 3).
In the developed model, the accuracy of the segmentation can be increased by performing convolution and deconvolution with the various filter sizes of the multi-scale filter layers. This leads to an increase in the probability pools and therefore makes it possible to distinguish boundaries more clearly. After the bridge connection, the addition of all the outputs from the downsampling layers allows us to manage many features in raw data simultaneously with lower computational complexity, resulting in fast computation, lower memory use, and high accuracy. In particular, to enhance the performance of MM-Net in the segmentation of fluorescent cells with vague boundaries, we have also adopted a novel training methodology, a target inversion training method. In the target inversion training method, the background, rather than the cells, is considered the main training target, since the fluorescence of the cells varies significantly whereas the fluorescence change in the background is not severe.
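The text does not spell out how the inverted target is constructed. One plausible reading, sketched below, is that the binary cell mask is inverted so that the background becomes the positive class during training, and the prediction is inverted back to recover the cells. The function names and the 0.5 threshold are assumptions made for illustration only.

```python
import numpy as np

def invert_target(cell_mask):
    """Build a background target (1 = background, 0 = cell) from a binary cell mask."""
    cell_mask = np.asarray(cell_mask, dtype=np.uint8)
    return 1 - cell_mask

def cells_from_background_prediction(background_prob, threshold=0.5):
    """Recover a binary cell mask from a predicted background probability map."""
    return (np.asarray(background_prob) < threshold).astype(np.uint8)

# Toy 3 x 3 annotation with one cell in the lower-right corner.
cell_mask = np.array([[0, 0, 0],
                      [0, 1, 1],
                      [0, 1, 1]], dtype=np.uint8)

background_target = invert_target(cell_mask)
recovered = cells_from_background_prediction(background_target.astype(float))
```

With a perfect background prediction, `recovered` equals the original `cell_mask`, so the inversion changes what the network is asked to learn without changing the final segmentation output.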
The predicted mask from a fluorescence image delineates the specific boundary of each cell and localizes the cell at each time point. The size of the cells can, therefore, be measured from the boundary of the cell. The segmentation performance of the developed model was quantitatively estimated by calculating the IoU and Dice coefficient. The IoU value and Dice coefficient of MM-Net were compared with those of selected conventional algorithms such as U-Net, FusionNet, FCN, and Mask R-CNN. Here, the same deep learning parameters were utilized for optimization of U-Net [14], FusionNet [25], FCN [26], Mask R-CNN [27], and MM-Net. Four-fold cross-validation was applied. The optimizer was Adam. The dropout rate was 0.000, and the batch size was four. In the parameters of the Adam optimizer, the learning rate was set to 0.001, the momentum to 0.900, and the weight decay to 0.999. The ReLU function was adopted as the activation function on every output of a (de)convolution filter. Binary cross-entropy was utilized as the loss function. In the segmentation, the output classes were cell and background. Input images were gray-scale images with a size of 512 × 512 pixels. In this experiment, TensorFlow was utilized and run on a server that has two CPUs (E5-2640v4), four GPUs (Nvidia GeForce Titan Xp, 12 GB), and a total of 128 GB of memory [28].
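For reference, the two overlap metrics named above can be computed from a predicted and a ground-truth binary mask as in the following sketch. It uses the standard definitions (IoU = |P ∩ T| / |P ∪ T|, Dice = 2|P ∩ T| / (|P| + |T|)); the function name and the convention of returning 1.0 for two empty masks are assumptions, not details from the paper.

```python
import numpy as np

def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    # Two empty masks agree perfectly; treat that case as a score of 1.0.
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return float(iou), float(dice)

# Toy masks: 3 predicted pixels, 3 true pixels, 2 of them shared.
pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
truth = np.array([[1, 1, 0],
                  [0, 0, 0],
                  [0, 0, 1]])

iou, dice = iou_and_dice(pred, truth)
print(iou, dice)  # IoU = 2/4, Dice = 4/6
```

The two scores are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank models identically on a single image but can differ once averaged over a test set.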
Two different individuals manually constructed two sets of ground truth data under the supervision of an image processing expert. The number of ground truth data was 2,528 per set. With the two different sets of ground truth data, we trained, validated, and tested MM-Net, respectively. To build a dataset for the construction of MM-Net, we carried out 32 experiments with different cells. In the experiments, a total of 32 videos with 250 frames each were acquired. Since no significant movements or changes of cells were observed between adjacent frames, one frame per 12 frames was selected for the dataset. The size of the selected image was 2048 × 2048. After the image was resized to 1024 × 1024, it was cropped into four images with a size of 512 × 512. Images containing only a few cells were removed. The 32 videos were divided into a training set (n=16), a validation set (n=8), and a test set (n=8) at a ratio of 2:1:1, respectively. As a result, the number of images in the training set was 1,264, the number of images in the validation set was 624, and the number of images in the test set was 640. The 4-fold cross-validation method was applied to the evaluation of MM-Net.
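The frame selection and tiling steps can be sketched as follows. The sketch assumes that sampling starts at the first frame, that each 2048 × 2048 frame is halved to 1024 × 1024 before cropping, and that the resize is done by 2 × 2 block averaging; the paper does not state the start index or the interpolation method, and all function names are hypothetical.

```python
import numpy as np

def select_frames(n_frames, step=12):
    """Indices of the frames kept when one frame per `step` frames is sampled."""
    return list(range(0, n_frames, step))

def downsample_by_two(image):
    """Halve each image dimension by averaging non-overlapping 2 x 2 blocks."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def crop_into_tiles(image, tile=512):
    """Split an image into non-overlapping tile x tile crops, row by row."""
    h, w = image.shape
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]

kept = select_frames(250)                       # one 250-frame video
frame = np.zeros((2048, 2048), dtype=np.float32)  # placeholder frame
tiles = crop_into_tiles(downsample_by_two(frame))

print(len(kept), len(tiles), tiles[0].shape)
```

Under these assumptions each video yields at most 21 × 4 = 84 tiles, or 2,688 across the 32 videos, before sparse images are discarded; the reported 2,528 images fall within that bound.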

Automatic calcium analysis algorithm for a trapped single cell in time-lapse fluorescence images
To monitor calcium elevations in trapped cells, an automatic calcium analysis algorithm was developed and then combined with the novel deep learning model. This algorithm includes the following procedures: 1) recognizing the boundary and location of cells; 2) tracing movements of the cells at consecutive frames; 3) detecting a trapped cell in an acoustic beam among cells within the field of view; 4) measuring the calcium variations in the trapped cell over time; 5) determining the invasion potential of a trapped cell (Fig. 4). All procedures are performed automatically after segmenting cells in the series of fluorescence images. In all the frames of the segmented images, to detect the trapped cell within the field of view, cells are pseudo-colored in white, and others (background, dumps, among others) are pseudo-colored in black. In each frame, to index individual cells within the field of view of an image, the contours of cells are individually determined by a contouring algorithm [29], and each cell is then labeled. We here applied a conventional labeling and contour algorithm to the segmented cells [29]. These procedures are repeated for every frame. To trace cells in every frame, the weighted key parameters such as distance, intensity, and Hu's moments [30] are extracted from the segmented cells between two consecutive frames. To find the identical cells in the consecutive frames, an association index value, denoted as $A$, between cells in two consecutive frames is determined with Eq. (1), where $A(i, j)$ is the association index value between the $i$th cell of the $t$th frame and the $j$th cell of the $(t+1)$th frame, $D(i, j)$ represents the normalized distance between the centroids of the two cells, and $I(i, j)$ and $H(i, j)$ represent the intensity rate and the rate of Hu's moments between the $i$th cell of the $t$th frame and the $j$th cell of the $(t+1)$th frame, respectively.
The α, β, and γ represent parameter-optimized constants for the distance, the intensity rate, and the Hu's moment rate, respectively. The parameter-optimized constants were here determined by a heuristic grid search algorithm. α, β, and γ were 0.013, 1.100, and 0.800, respectively. $i$ indicates the $i$th cell of the index set $\mathrm{Img}(C_t)$ of cells in the $t$th frame, whereas $j$ indicates the $j$th cell of the index set $\mathrm{Img}(C_{t+1})$ of cells of the $(t+1)$th frame. The index set of cells at the $t$th frame is here denoted as $C_t$. Note that a lower association index value ($A$) indicates a higher association between the $i$th cell of the $t$th frame and the $j$th cell of the $(t+1)$th frame. By comparing the association index values between a cell in the current frame and all cells in the next frame, the index of the cell in the $(t+1)$th frame that has the highest association with the $i$th cell of the $t$th frame is determined by Eq. (2), where $\mathrm{Cell\_Index}_{t+1}$ represents the index of the cell in the $(t+1)$th frame that has the lowest association index value with the $i$th cell of the $t$th frame. However, to exclude a cell that has disappeared from the field of view of an image, the following constraint is applied to the determination of the associated index value with Eq. (3): if the calculated association index value of a cell is higher than a threshold value, the cell in the $t$th frame is considered to have disappeared in the $(t+1)$th frame. The cell is then excluded from the set of cells. In contrast, if a new cell appears in the $(t+1)$th frame, the association index values between the newly appeared cell in the $(t+1)$th frame and the cells that disappeared in the previous ten frames are compared to determine whether a disappeared cell has re-entered the field of view of the image.
If the association index value between the newly appeared cell in the $(t+1)$th frame and the disappeared cells in the previous ten frames is greater than a threshold value of 0.5, the appeared cell is considered a new cell entering the field of view of the image. After tracing cells of all the frames, to find a trapped cell, the set of a sequence of traced cells for the $k$th cell in the first frame is defined as Eq. (4), where $S_n$ represents the sequence of the $n$th cells traced from the first frame to the end, and $S_k$ represents the set of the sequences of the traced cells ($S_n$) associated with the $k$th cell in the first frame. Finally, a trapped cell is detected by Eqs. (5) and (6):
$$T^{k}_{\mathrm{cell}}(S_k) \le \mathrm{threshold} \;\Rightarrow\; \text{the cell for } S_k \text{ is trapped} \tag{6}$$
where $T^{k}_{\mathrm{cell}}$ is the index of the $k$th trapped cell, $D$ is the sequence of distances among the $k$th cells in all frames, and $G$ denotes the average of the center of gravity of individual cells of $S_k$. The std and avg represent the standard deviation and mean, respectively. The trapped cell is here determined with the constraint shown in Eq. (6). σ, ε, , and the threshold value were determined to be 1.400, 1.100, 1.100, and 1.650 by a grid search algorithm, respectively. Table 2 demonstrates the pseudo-code of the algorithm for identifying a trapped cell.
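The frame-to-frame matching step can be sketched in a few lines of Python. Because Eqs. (1) to (3) are not reproduced here, the sketch assumes that the association index is a weighted sum, A = α·D + β·I + γ·H, with I and H taken as relative differences and D as the plain centroid distance in pixels. The dictionary layout, the single scalar standing in for Hu's moments, and the matching threshold of 0.5 are all hypothetical choices; only α, β, and γ come from the text.

```python
import math

ALPHA, BETA, GAMMA = 0.013, 1.100, 0.800  # constants reported in the text
THRESHOLD = 0.5  # assumed here; the text states 0.5 only for the re-entry test

def association_index(cell_a, cell_b):
    """Assumed form A = alpha*D + beta*I + gamma*H (lower means more similar)."""
    d = math.dist(cell_a["centroid"], cell_b["centroid"])
    i = abs(cell_a["intensity"] - cell_b["intensity"]) / cell_a["intensity"]
    h = abs(cell_a["hu"] - cell_b["hu"]) / abs(cell_a["hu"])
    return ALPHA * d + BETA * i + GAMMA * h

def match_cells(cells_t, cells_t1, threshold=THRESHOLD):
    """Map each cell index in frame t to its best match in frame t+1, or None."""
    matches = {}
    for i, cell in enumerate(cells_t):
        scores = [association_index(cell, other) for other in cells_t1]
        if not scores:
            matches[i] = None
            continue
        j = min(range(len(scores)), key=scores.__getitem__)
        # A best score above the threshold means the cell left the field of view.
        matches[i] = j if scores[j] <= threshold else None
    return matches

cells_t = [
    {"centroid": (10.0, 10.0), "intensity": 100.0, "hu": 0.20},
    {"centroid": (200.0, 50.0), "intensity": 80.0, "hu": 0.25},
    {"centroid": (400.0, 400.0), "intensity": 90.0, "hu": 0.22},
]
cells_t1 = [
    {"centroid": (203.0, 54.0), "intensity": 82.0, "hu": 0.25},
    {"centroid": (11.0, 10.0), "intensity": 99.0, "hu": 0.20},
]

matches = match_cells(cells_t, cells_t1)
print(matches)  # {0: 1, 1: 0, 2: None}
```

In this toy example the first two cells are matched to their slightly displaced counterparts, while the third cell has no counterpart within the threshold and is treated as having disappeared.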

For each frame t do
    For each ith cell C_t(i) in frame t do
        For each jth cell C_{t+1}(j) in frame t + 1 do
            Calculate A(i, j)
        End for
        Detect the cell matched with the cell C_t(i) at frame t + 1
        Do not detect the cell Cell_index_{t+1}(C_t(i)) when A(i, j) is over the threshold
    End for
End for
Construct the set S_k, which has all sequences of the traced cells
For each S_n in S_k do
    Determine the cell k as a trapped cell when the constraint in Eq. (6) is satisfied
End for
After identifying a trapped cell, the mean fluorescent intensity of the segmented regions of the trapped cell is measured in all the frames. The mean fluorescent intensity of the trapped cell is here normalized by the mean fluorescent intensity of a non-trapped cell in each frame to remove the photobleaching effects on the quantification of calcium changes in the trapped cell due to acoustic trapping force. After quantifying the calcium changes in the trapped cell, the invasion potential of the cell is determined with the method shown previously [10,11].
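The per-frame normalization against a non-trapped reference cell can be sketched as below. The function name and the sample intensity values are illustrative assumptions; the only step taken from the text is the frame-by-frame division of the trapped-cell mean intensity by the reference-cell mean intensity.

```python
def normalize_by_reference(trapped, reference):
    """Divide the trapped-cell mean intensity by the reference-cell mean, per frame."""
    if len(trapped) != len(reference):
        raise ValueError("both traces must cover the same frames")
    return [t / r for t, r in zip(trapped, reference)]

# Hypothetical traces in which both cells bleach by 5% per frame.
reference = [50.0, 47.5, 45.125]

# A trapped cell with no calcium response: the ratio stays flat.
flat = normalize_by_reference([100.0, 95.0, 90.25], reference)

# A trapped cell whose raw intensity doubles in the last frame.
responding = normalize_by_reference([100.0, 95.0, 180.5], reference)

print(flat)        # constant ratio despite photobleaching
print(responding)  # the elevation shows up only in the last frame
```

Because photobleaching scales both cells by roughly the same factor in each frame, it cancels in the ratio, and what remains reflects the calcium change attributable to the trapping force.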

Cell preparation and experimental setup
To evaluate the deep learning-based analysis algorithm for the determination of invasion potentials of breast cancer cells trapped in an acoustic beam, highly invasive MDA-MB-231 and weakly invasive MCF-7 human breast cancer cells were obtained from ATCC (Manassas, VA, USA) and cultured in Dulbecco's modified Eagle medium (DMEM) containing fetal bovine serum (10%). For the experiment, the cultured MDA-MB-231 and MCF-7 cells were detached from the bottom of a flask and suspended using Trypsin-EDTA (Thermo Fisher Scientific, Waltham, MA, USA). The suspended MDA-MB-231 and MCF-7 cells were then added to two different Petri dishes, which were filled with 2 mL of Hank's balanced salt solution (HBSS), including mixtures of Pluronic F-127, Fluo-4 AM, and intracellular calcium indicators (Invitrogen, Grand Island, NY, USA), as reported previously [12].
To monitor the calcium response of breast cancer cells to acoustic trapping force, time-lapse fluorescence imaging was performed. The intracellular calcium variations in breast cancer cells were monitored before and after the application of acoustic trapping force. The acoustic pressure was sequentially increased to 1.01 MPa, 1.31 MPa, 1.75 MPa, and 2.12 MPa every 25 seconds during the time-lapse fluorescence imaging of cells to monitor intracellular calcium variations in the trapped cells. Also, the calcium response of the MDA-MB-231 and MCF-7 cells to acoustic trapping force at 1.01 MPa was quantified and compared. Furthermore, to assess the deep learning-based analysis algorithm for determining the invasiveness of suspended breast cancer cells in an acoustic trap, the algorithm was applied to quantify the calcium response of highly invasive (MDA-MB-231) and weakly invasive (MCF-7) cells. The outcomes obtained using the algorithm were then compared to the outcomes obtained using a manual calcium analysis method. The calcium response index (CRI) was utilized for the quantification of the calcium response of the cells. The CRI value was used to assess the invasion potential of the cells in the previous study [10].
The manual calcium analysis method proceeded as follows: 1) Visually identify a cell trapped by acoustic trapping force and non-trapped cells, which are not affected by acoustic trapping force, in the time-lapse images; 2) Draw the boundaries of the trapped cell and one of the non-trapped cells in every frame; 3) Measure the mean of fluorescence intensities within the boundaries of the trapped and non-trapped cell in every frame; 4) Repeat steps 1 to 3 for all the frames to quantify fluorescent changes in the cells over time; 5) Normalize the mean fluorescence intensity of the trapped cell by that of the non-trapped cell to reduce photobleaching artifacts; 6) Quantify the calcium response level of the trapped cell; 7) Determine the invasion potential of the trapped cell.

Cell segmentation by MM-Net
To assess the MM-Net developed in this study, we compared the IoU values of the segmented images obtained by our developed model (MM-Net) and other conventional deep learning models including U-Net, FusionNet, FCN, and Mask R-CNN. Figure 5 illustrates the segmented images.  of individual cells than other models. The image obtained using FusionNet showed more salt and pepper noise. For more quantitative analysis, a test set of 640 images was applied to the models, and the IoU values and dice coefficients of the deep learning models were obtained (Table 3)

Calcium analysis of a trapped cell in an acoustic beam using the fully-automatic deep learning-based analysis method
To monitor intracellular calcium changes of an acoustically trapped cell, the trapped cells among the cells within the field of view were identified. The fluorescence intensities of the trapped cells were then measured across different time points using the developed algorithm. Figure 6 shows the segmented cell images, which are obtained using the MM-Net. Finally, to examine the calcium response of a trapped cell to the acoustic trapping force, the mean fluorescence intensity of the trapped cell was calculated using the algorithm. To assess the performance of the algorithm, we compared the outcomes obtained by our method and those obtained by the manual calcium analysis method, which can be considered here as the ground truth. Figure 7(a) illustrates the resultant images obtained at different steps in the algorithm. When an original image [Fig. 7(a), left] was input to the algorithm, a segmented image [Fig. 7(a), middle] was obtained from the MM-Net. After the segmented image was input to the automatic calcium analysis algorithm, the algorithm identified an acoustically trapped cell and a non-trapped cell [Fig. 7(a), right], and the mean calcium fluorescence intensity profile of the trapped cell was calculated over time [Fig. 7(b)]. In Fig. 7(a) (right), the acoustically trapped cell is represented in red, whereas the reference cell, which is used for the reduction of photobleaching artifacts, is represented in blue. Figure 7(b) illustrates the mean calcium fluorescence intensity changes of the trapped MDA-MB-231 and MCF-7 cells at different acoustic pressures. The MDA-MB-231 cell exhibited significant calcium elevations at 2.12 MPa, whereas the MCF-7 cell did not show notable calcium elevations at the indicated acoustic pressures.
When the calcium fluorescence intensity profiles of the cells obtained using the developed algorithm were compared with those obtained using the manual calcium analysis method, the profile obtained using the algorithm differed slightly from the manual profile, but both showed a similar trend for each cell type. These results demonstrate that our developed algorithm has the potential to be a novel algorithm for automatically quantifying the calcium response of the trapped cells.  These results showed that the outcomes obtained using the developed algorithm were in good agreement with the outcomes obtained using the manual calcium analysis method. In the above results, it was found that the calcium elevations in MDA-MB-231 cells were significantly different from those in MCF-7 cells at 1.01 MPa (Fig. 8). Therefore, quantitative analysis of the calcium response of MDA-MB-231 cells (n = 22) and MCF-7 cells (n = 22) to the acoustic trapping force at 1.01 MPa was performed using the developed algorithm, and the outcomes were further compared with the results of the manual calcium analysis method (Fig. 9). The means of the normalized maximum calcium elevations of MDA-MB-231 cells obtained using the developed algorithm and the manual calcium analysis method were 1.2638 (std: ± 0.2225) and 1.2420 (std: ± 0.2140), whereas those of MCF-7 cells were 1.0017 (std: ± 0.0033) and 1.0013 (std: ± 0.0027), respectively. In the outcomes obtained by the developed algorithm, fourteen MDA-MB-231 cells showed significant calcium elevations due to the acoustic trapping force, whereas MCF-7 cells did not show any significant calcium elevations. As a result, the CRI values (the number of cells showing significant calcium elevation × mean of the normalized maximum intensity) of the trapped MDA-MB-231 cells obtained by our developed algorithm and the manual calcium analysis method were 0.8043 and 0.7339, respectively. In contrast, the CRI values of MCF-7 cells obtained using both methods were 0. These results were in good agreement with the outcomes reported in the previous study.
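As a worked check of the CRI arithmetic, the sketch below assumes that the count of responding cells enters the index as a fraction of the cells analyzed (14 of 22). This reading is an inference from the reported numbers, since 14/22 × 1.2638 ≈ 0.804 reproduces the reported 0.8043 to within rounding, whereas the raw count would not. The function name `calcium_response_index` is hypothetical.

```python
def calcium_response_index(n_responding, n_total, mean_norm_max):
    """CRI taken as (responding fraction) x (mean normalized maximum intensity).

    Assumption: the responding-cell count is expressed as a fraction of n_total.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_responding / n_total) * mean_norm_max

# Values reported for the developed algorithm at 1.01 MPa.
cri_mda = calcium_response_index(14, 22, 1.2638)  # MDA-MB-231
cri_mcf = calcium_response_index(0, 22, 1.0017)   # MCF-7, no responding cells
print(f"MDA-MB-231 CRI ~ {cri_mda:.3f}, MCF-7 CRI = {cri_mcf:.1f}")
```

Under the same reading, the manual value of 0.7339 with a mean of 1.2420 would correspond to 13 responding cells out of 22. That count is inferred here and is not stated in the text.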

Discussion
A novel fully-automatic deep learning-based analysis technique for a single-beam acoustic trapping system was developed for determining the invasiveness of suspended breast cancer cells in an acoustic trap. The constructed model was capable of segmenting cells with a high IoU value, identifying a trapped cell in an acoustic beam, and quantifying the calcium response of the trapped cell. The proposed system has notable advantages over the conventional acoustic trapping and analysis methods for monitoring fluorescent calcium responses of cancer cells in an acoustic trap. In the conventional single-beam acoustic trapping system, trapped cells are manually selected and delineated to measure their fluorescence intensities. In contrast, when the system was combined with the fully-automatic deep learning-based analysis model, it allowed the automatic segmentation of breast cancer cells in time-lapse calcium fluorescence images, detection of a trapped cell, and quantification of the increase in calcium concentration in the trapped cell. This may enable rapid and accurate determination of the invasion potential of breast cancer cells in an acoustic trap when it is combined with other high-throughput cell analysis techniques such as microfluidics.
For the acoustic trapping of a single cell, a 22.2 MHz highly focused LiNbO3 ultrasound transducer with an f-number of 0.76 was fabricated. The f-number of the transducer utilized here was similar to that of the transducer described previously [12]. Although the center frequency of our ultrasound transducer was somewhat different from that of the ultrasound transducer utilized in the previous study, the transducer generated a sufficient acoustic pressure of 2.21 MPa at 29.3 V, which was the same acoustic pressure as that utilized in the previous study. Furthermore, the beam width of the ultrasound generated by the developed transducer was around 30 µm at the focus. Using this high-frequency ultrasound transducer, MDA-MB-231 and MCF-7 breast cancer cells were successfully trapped in an acoustic beam at the focus.
For determination of the invasion potentials of suspended breast cancer cells labeled with fluorescent calcium indicators, the intracellular calcium changes of the cells were measured by the following procedure: 1) fluorescent cells are segmented with high accuracy; 2) a trapped cell in a fluorescence image is identified; 3) the intracellular calcium changes of the trapped cell are quantitatively monitored over time; 4) its invasion potential is then determined by quantification of the calcium response of the trapped cell to the applied trapping force. Note that it is highly important to precisely segment the acoustically trapped fluorescent cells with vague boundaries for better quantification of the invasion potential of breast cancer cells in an acoustic trap. A novel deep learning model based on U-Net and InceptionNet was thus developed for the segmentation of fluorescent cells. The deep learning model (MM-Net) offered higher average IoU and Dice coefficient values than the other deep learning models in the segmentation of fluorescent cells. Owing to the multiple convolution layers with different sizes and the target inversion training method, the developed model enabled better segmentation of fluorescent cells with ambiguous boundaries. In the previous study, fluorescent cells showing a calcium response were segmented using the Otsu method [31]. The Otsu method provided good results when the fluorescence of the target cells was intense and the illumination for excitation of the cells was uniform within the field of view. However, when the Otsu method was applied to images of trapped cells obtained using the system, it performed poorly in the segmentation of cells exhibiting relatively low fluorescence intensity. In another previous study, the calcium intensities of cells were measured over time within the cell area segmented at the first time point, since the cultured cells did not move significantly over time. However, this method was not appropriate for the present study because the suspended cells moved over time due to the acoustic trapping force applied to the cells. In contrast, in this study, the deep learning-based model automatically segmented the cells, detected the trapped cells, and measured the fluorescence intensity of the moving cells at different time points. It is noted that the proposed model is fully automatic for the determination of invasion potentials of breast cancer cells in an acoustic trap.
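The limitation of a single global threshold for dim cells can be illustrated with a small, self-contained sketch of Otsu thresholding. It is a generic histogram-based implementation written with NumPy, not the code used in the previous study. The function name `otsu_threshold` and the toy image, with one bright and one dim cell on a dark background, are hypothetical.

```python
import numpy as np

def otsu_threshold(image, n_bins=256):
    """Global threshold that maximizes the between-class variance."""
    values = np.asarray(image, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    weights = hist / hist.sum()

    w0 = np.cumsum(weights)              # probability of the lower class
    w1 = 1.0 - w0                        # probability of the upper class
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]

    between = np.zeros(n_bins)
    valid = (w0 > 0) & (w1 > 0)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (
        w0[valid] * w1[valid]
    )
    # Use the upper edge of the best bin so that "image > threshold"
    # selects exactly the upper class.
    return edges[np.argmax(between) + 1]

# Toy image: dark background, one bright cell, one dim cell (illustrative only).
image = np.full((20, 20), 10.0)
image[2:6, 2:6] = 200.0      # bright cell
image[12:16, 12:16] = 30.0   # dim cell
threshold = otsu_threshold(image)
mask = image > threshold
print(threshold, mask[3, 3], mask[13, 13])
```

In this toy case the threshold falls between the dim and the bright cell, so the dim cell is assigned to the background. This is the kind of failure that motivates a learned segmentation model for low-fluorescence cells.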
In MM-Net, we utilized an addition operation as the bridge connection to further enhance the performance in the semantic segmentation of fluorescent cells with vague boundaries. In the FusionNet model, the skip connection introduced in U-Net is replaced with a residual block. The residual block merges the skipped feature maps using the addition operation rather than the concatenation operation used in U-Net. The concatenation operation stacks the channels of the two feature maps, which increases the number of channels, whereas the addition operation sums the feature maps channel by channel, so that the number of channels of the merged feature map is considerably smaller. With the reduced number of channels of the feature map, the addition operation reduces the computational complexity of the proposed model. Moreover, the addition operation can increase the segmentation accuracy compared to the concatenation operation [25]. With the devised architecture, MM-Net offered enhanced performance in the semantic segmentation of fluorescent cells with vague boundaries by exploiting the advantages of these models, including maintaining the location information, operating multi-scale and multi-channel convolutions, and reducing the complexity of the operations.
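The effect of the two merge operations on the channel count can be shown with a short NumPy sketch. The feature-map shapes, the variable names, and the helper `conv_weight_count` are illustrative assumptions and do not describe the actual MM-Net layers.

```python
import numpy as np

# Two feature maps in (channels, height, width) layout; shapes are illustrative.
encoder_features = np.ones((64, 32, 32), dtype=np.float32)
decoder_features = np.full((64, 32, 32), 2.0, dtype=np.float32)

# Concatenation (U-Net style skip connection) stacks the channels.
concatenated = np.concatenate([encoder_features, decoder_features], axis=0)
# Addition (FusionNet style bridge) sums the maps element-wise.
added = encoder_features + decoder_features

print(concatenated.shape)  # (128, 32, 32)
print(added.shape)         # (64, 32, 32)

def conv_weight_count(in_channels, out_channels, kernel_size=3):
    """Number of weights in a 2D convolution layer, ignoring the bias."""
    return in_channels * out_channels * kernel_size * kernel_size

# A following 3x3 convolution with 64 output channels needs half as many
# weights after addition as after concatenation.
weights_after_concat = conv_weight_count(concatenated.shape[0], 64)
weights_after_add = conv_weight_count(added.shape[0], 64)
print(weights_after_concat, weights_after_add)
```

The saving comes from the layer that consumes the merged feature map: its input width is halved, which reduces both its parameters and its multiply-accumulate operations.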
Using the fully-automatic deep learning-based algorithm, we performed the segmentation of all cells within the field of view, instead of segmenting only the cell trapped in the acoustic beam. There were two crucial reasons to segment all the cells within the field of view: 1) Because of photobleaching of the calcium fluorescent dyes, the fluorescence levels of trapped cells in acoustic beams are expected to decrease during the acoustic trapping. Therefore, to accurately quantify the calcium response of the trapped cells, the effects of photobleaching on the trapped cells should be considered. For that, non-trapped cells not affected by the acoustic trapping force should be identified and segmented automatically. After the temporal fluorescence changes in the non-trapped cells due to photobleaching were quantified, the fluorescence changes of the trapped cells were normalized by the temporal fluorescence changes in the non-trapped cells to minimize the photobleaching effects. 2) In our previous study, it was found that the acoustic beam location within the field of view could be changed by disturbances caused by the experimenters. Also, the acoustic beam location within the field of view changed when the objective lens was switched to one with a different magnification, owing to the misalignment of the optics of a commercial microscope. Therefore, the acoustic beam location had to be found before performing each experiment, a procedure that is time-consuming and laborious. To make our system robust to such conditions, we developed the fully-automatic deep learning-based analysis algorithm for segmenting all the cells within the field of view, identifying trapped cells among the cells, and quantifying calcium changes in the trapped cells while minimizing the photobleaching effects.
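The photobleaching correction described above can be sketched as a ratio of two baseline-normalized intensity traces. This is a minimal illustration that assumes the normalization is a simple division by the reference-cell trace and that the first frame serves as the baseline. The function name `photobleach_corrected` and the synthetic traces are hypothetical.

```python
import numpy as np

def photobleach_corrected(trapped, reference, n_baseline=1):
    """Normalize a trapped-cell trace by a non-trapped reference-cell trace.

    Each trace is first divided by its own baseline (mean of the first
    n_baseline frames); the trapped trace is then divided by the reference
    trace, which cancels a decay common to both cells.
    """
    trapped = np.asarray(trapped, dtype=float)
    reference = np.asarray(reference, dtype=float)
    trapped_rel = trapped / trapped[:n_baseline].mean()
    reference_rel = reference / reference[:n_baseline].mean()
    return trapped_rel / reference_rel

# Synthetic traces (illustrative only): both cells bleach exponentially,
# and only the trapped cell shows a calcium elevation.
t = np.arange(5)
bleach = np.exp(-0.05 * t)
response = np.array([1.0, 1.0, 1.3, 1.2, 1.0])
reference = 50.0 * bleach
trapped = 100.0 * bleach * response
corrected = photobleach_corrected(trapped, reference)
print(corrected.round(3), corrected.max().round(3))
```

The maximum of the corrected trace corresponds to a normalized maximum calcium elevation, here recovered without the downward drift caused by bleaching.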
After developing the MM-Net model, we developed and implemented an algorithm that identifies the trapped cells among the cells moving under acoustic trapping forces, so that the calcium response of the trapped cells to the acoustic trapping force could be quantified automatically, rather than relying on existing methods. The existing methods [32][33][34] are optimized to trace cells. However, they are not appropriate for identifying a trapped cell and then quantifying the fluorescence intensity changes of the cell over time due to the acoustic trapping force. Hence, in this study, we further developed a sophisticated algorithm that takes the MM-Net output as its input. It was capable of identifying the trapped cell and quantifying the calcium response of the trapped cell to acoustic trapping forces over time to determine the invasiveness of breast cancer cells in an acoustic trap with high efficiency.
To assess the performance of the deep learning-based algorithm for the determination of the invasion potential of breast cancer cells, the calcium responding frequencies of MDA-MB-231 and MCF-7 cells obtained by the deep learning-based algorithm and the manual calcium analysis method were compared at different acoustic pressures. The mean maximum fluorescence elevation and CRI values of those cells were also compared. The values obtained using our developed model were in agreement with those obtained using the manual calcium analysis method. In a previous study, MDA-MB-231 cells exhibited a higher calcium response value than MCF-7 cells under the given acoustic pressure in both cultured and suspended states [10]. Here, MDA-MB-231 cells also exhibited a higher calcium response value than MCF-7 cells, as shown in the previous study, suggesting that the deep learning-based algorithm provides good performance in determining the invasion potential of breast cancer cells in an acoustic trap.
In the experiment, the analysis time for the determination of the invasion potentials of trapped cells in an acoustic trap with 250 frames using the manual calcium analysis method was almost one hour, depending on the number of cells within the field of view. The manual analysis was very time-consuming and laborious. In contrast, the analysis time using the developed algorithm was less than six minutes. The processing time of MM-Net for the segmentation of cells in a single image was around 200 ms. The processing time of the sophisticated algorithm for the identification of trapped cells and the quantification of their calcium response was around five minutes. The processing time of the developed algorithm also depended on the number of cells in the images. Note that the fully-automatic analysis method was approximately ten times faster than the manual calcium analysis method for the determination of invasion potentials of cancer cells with high accuracy. The processing time of the sophisticated algorithm was considerably longer than that of MM-Net. This processing time could be reduced by replacing the sophisticated algorithm with deep learning architectures, which remains future work.
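The timing figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the two stages run one after the other and that the approximate values in the text (200 ms per image, five minutes, one hour) are used as exact inputs. The variable names are hypothetical.

```python
n_frames = 250
segmentation_s_per_frame = 0.2   # "around 200 ms" per image (MM-Net)
post_processing_s = 5 * 60       # "around five minutes" (identification + quantification)
manual_s = 60 * 60               # "almost one hour" (manual analysis)

segmentation_s = n_frames * segmentation_s_per_frame
automatic_s = segmentation_s + post_processing_s
speedup = manual_s / automatic_s

print(f"segmentation: {segmentation_s:.0f} s")
print(f"automatic total: {automatic_s / 60:.1f} min")
print(f"speed-up: about {speedup:.0f}x")
```

With these inputs the segmentation stage takes under one minute in total, so the post-processing stage accounts for most of the automatic analysis time, consistent with the observation that it is the part worth accelerating.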

Conclusion
In this study, we demonstrated a fully-automatic deep learning-based calcium image analysis algorithm for a single-beam acoustic trapping system capable of quantifying the invasion potential of breast cancer cells in an acoustic trap with high accuracy. The developed algorithm showed multiple functionalities in the determination of the invasion potential of breast cancer cells. It allowed the segmentation of cells, detection of a trapped cell, measurement of calcium variations in the trapped cell over time, and quantification of the calcium response of the cell to the applied trapping force with high accuracy.
In particular, for better quantification of the invasion potential of breast cancer cells in an acoustic trap, it is highly important to precisely segment the acoustically trapped fluorescent cells with vague boundaries. The MM-Net developed here outperformed other state-of-the-art models in the segmentation of such fluorescent cells. It offered higher average IoU and Dice coefficient values in the segmentation of fluorescent cells than U-Net, FusionNet, FCN, and Mask R-CNN.
In conclusion, the results obtained using the deep learning-based calcium analysis algorithm were consistent with those obtained using the manual calcium analysis method for the determination of invasion potentials of breast cancer cells in an acoustic trap. These results suggest that the fully-automatic deep learning-based calcium analysis using the single-beam acoustic system has the potential to be a novel tool for the rapid determination of invasion potential of breast cancer cells in suspension.