Quantifying the deformability of malaria-infected red blood cells using deep learning trained on synthetic cells

Summary
Several hematologic diseases, including malaria, diabetes, and sickle cell anemia, result in reduced red blood cell deformability. This deformability can be measured using a microfluidic device with channels of varying width. Nevertheless, it is challenging to algorithmically recognize large numbers of red blood cells and quantify their deformability from image data. Deep learning has become the method of choice for handling noisy and complex image data, but it requires a significant amount of labeled data to train the neural networks. By creating images of cells and mimicking noise and plasticity in those images, we generate synthetic data to train a network to detect and segment red blood cells from video recordings, without the need for manually annotated labels. Using this new method, we uncover significant differences between the deformability of RBCs infected with different strains of Plasmodium falciparum, providing clues to the variation in virulence of these strains.


INTRODUCTION
The malaria parasite, which in 2020 caused an estimated 627,000 deaths worldwide, 1 spends part of its life cycle in human red blood cells (RBCs). In the most deadly of malaria parasites, Plasmodium falciparum, the intraerythrocytic developmental cycle lasts approximately 48 hours, comprising the ring, trophozoite, and schizont stages. 3,4 As a result, the infected RBC (iRBC) becomes sticky and hyper-rigid. 3,4 To study variations in rigidity, e.g., during the intraerythrocytic developmental cycle or between different P. falciparum strains, hundreds to thousands of cells must be analyzed to obtain statistically meaningful data. Progress in studying these RBC morphological changes, particularly in membrane mechanics, has been hampered by the absence of a suitable assay to accurately measure the changes in rigidity in a high-throughput manner. 5,6 However, recent advances in microfluidic devices allow measuring RBC mechanics in large quantities. 7 Microfluidic devices (MDs) are tools to manipulate fluid on a scale from a few microns to a few hundred microns. Feature size in MDs is commensurate with individual cell size, making MDs useful for cell analysis. 8 MDs provide a low-cost, high-throughput method to study RBCs under varying conditions. 7,9 For example, the impact of small blood vessels on RBCs, or the RBC-filtering function of the spleen, 10 can be emulated with narrow constriction channels. Comparing the deformity index (DI) (i.e., the shape deformity) of an RBC before and after the tight passage then yields a deformity index difference (DDI) value per RBC, indicating cell deformability. High-speed microscope imaging records the events in such MDs, producing movies that can easily contain thousands of cells. Manually evaluating such a large number of cells, though, is a daunting task. In some cases, researchers have partly automated the MD-video processing for these studies by utilizing conventional image processing techniques. [10][11][12] In general, however, these techniques are limited to pre-programmed solutions [11] or rely on the use of cell-specific fluorescent dyes. 13 They often struggle to handle unexpected events such as motion blur, translucent cell edges, colliding cells, air bubbles, unusual cell shapes, background noise, MD artifacts, floating cell debris, cell clusters, and fluctuations in lighting. Figure 1 provides an example of a typical video frame that illustrates some of these problems. As a result, MD-video processing still requires time-consuming human labor to predefine all the exceptions and design algorithms to handle them. An example of this is the work of Saadat et al., 14 who, despite reporting tremendous success in determining the mechanical properties of RBCs at high throughput, confirmed that image noise was detrimental for cell tracking and shape determination and were unable to handle abnormally shaped cells. Since all these factors are prevalent in iRBC MD videos, a different approach should be considered. Machine learning, in particular deep learning (DL), may be a more effective strategy to process the image data, including handling the many technical and biological variations.
The success of many recent DL projects can be attributed to the combination of clever neural architecture design and automated feature extraction from large datasets. 15 DL has been applied with great success across a range of scientific imaging tasks. [17][18][19] Most relevant here is that DL has shown great results in image segmentation (the per-pixel labeling of images) under noisy conditions, 20,21 which in principle makes it an ideal candidate for handling MD video data.
DL is a data-driven approach, and its performance directly depends on the quality and quantity of the data it is trained on. However, many scientific disciplines lack the amount of labeled data required to power DL algorithms. One way to remedy this is transfer learning, which reuses large deep learning models pretrained on large corpora of data. Rizzuta et al., 22 for example, used the features of a deep learning model trained on everyday objects to classify RBCs for hemolytic anemia. Although transfer learning can be a powerful tool for addressing the challenge of limited labeled data, its effectiveness is not always guaranteed, 23 particularly when the source domain is vastly different from the target domain. Moreover, deep learning models trained on these large datasets tend to be computationally expensive and require significant resources, making it impractical to analyze videos with many frames.
The use of synthetic data, i.e., data created by computer algorithms instead of collected from the real world, is another potential solution for combating the lack of large sets of labeled data that is gaining traction in the DL community. 24 Synthetic data resemble real-world data and have the benefits of consistent labels and complete control over all characteristics. They enable the generation of large datasets without the need for manual labeling, and a small dedicated network can be trained to efficiently process large sets of frames.
To investigate whether synthetic data can be a viable option, we created a robust method to automatically track and calculate DIs of RBCs from noisy MD videos and applied it to the deformability differences between RBCs infected with the geographically distinct Plasmodium falciparum strains K1 25 and NF135. 26 We designed a CNN architecture with two output layers to translate the noisy video frames into cell shapes and cell locations. We avoided the need for annotating a diverse, carefully labeled dataset of RBCs by training the CNN on synthetic data instead. We also created a Python program that accepts the CNN outputs to extract the individual RBC journeys through the MD, remove colliding cells, and calculate cell DDIs. The stages of RBC infection were classified by a domain expert. We confirm that there is a highly significant difference in RBC deformability between uninfected and early-stage infected RBCs, as well as a general trend of decreasing deformability across consecutive infection stages. Unexpectedly, we discovered a significant difference in deformability between the K1 and NF135 strains at all stages of infection.

Generating synthetic red blood cell images
We generated a synthetic training dataset of 10,000 image patches of 100 × 100 pixels that each could include RBCs, MD walls, and/or various artifacts. We here describe conceptually how this was done; see the STAR Methods section for a more detailed description. RBCs in our experimental setting adopted either a round (Figure 2A) or a bullet-like shape (Figure 2C). Although the real-world data of RBCs and their background appear intricate (Figures 2A, 2C, and 2E), we were able to approximate them reasonably well with just a limited set of simple rules (Figures 2B, 2D, and 2F). For each of the synthetic columns B, D, and F, we used the same procedure to generate the diversity shown, i.e., all generated images started from circles, lines, or both. We warped these basic forms in a variety of ways using elastic deformation, a popular technique for augmenting images. 27 To generate natural-looking textures, we used the Simplex noise function, which produces gradient-noise patterns commonly used in the game and film industries. 28 In this way, we generated the background (Figure 2F) and inner cell textures (Figure 2B). The noise patterns were also utilized as interpolation-ratio matrices, which enabled us to interpolate between simple textures to create more complex ones, to make cells partially translucent, and to add a color gradient to the background. In addition to the cell textures, we added dark spot(s) to a fraction of the cells to simulate the presence of parasites (Figure 2B). Finally, we used a blur technique to reduce harsh edges, thereby making the images appear more natural. The backgrounds were created in duplicate: one containing all objects, and one without any transient objects but with altered color intensities, to make the neural network robust to differences in lighting.
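As an illustration of the general recipe (a basic shape, elastic deformation, a gradient-noise texture, and a final blur), the following is a minimal Python sketch; it is not the generator used in the study. The warp strength, noise smoothness, and cell radius are assumed values, a smoothed random displacement field stands in for the elastic-deformation package, and smoothed white noise stands in for Simplex noise.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def make_round_cell(size=100, radius=18, seed=None):
    """Draw a filled circle, elastically deform it, and texture it."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[:size, :size]
    cy = cx = size / 2
    cell = ((yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2).astype(float)

    # Elastic deformation: random displacement fields, smoothed so the
    # warp is locally coherent (strength/smoothness values are assumed).
    alpha, sigma = 8.0, 6.0
    dx = gaussian_filter(rng.uniform(-1, 1, (size, size)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (size, size)), sigma) * alpha
    warped = map_coordinates(cell, [yy + dy, xx + dx], order=1)

    # Gradient-noise stand-in for Simplex noise: smoothed white noise,
    # rescaled to [0, 1] and used as the inner-cell texture.
    texture = gaussian_filter(rng.uniform(0, 1, (size, size)), 3.0)
    texture = (texture - texture.min()) / (np.ptp(texture) + 1e-9)

    image = gaussian_filter(warped * (0.4 + 0.6 * texture), 1.0)  # soften edges
    mask = (warped > 0.5).astype(np.uint8)  # perfect segmentation label
    return image, mask
```

Because the mask is derived from the same warped shape as the image, the label is exact by construction, which is the central advantage of the synthetic approach.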
All images were perfectly annotated by virtue of the synthetic nature of the data. Each image had labels describing the cells' forms (Figure 2G) and labels describing the cells' positions (Figure 2H). The form labels contained pixel-per-pixel binary segmentation values indicating the probability that a pixel belonged to an RBC. For the cell-location labels, each pixel was assigned the probability of being an RBC center, modeled as a 2D Gaussian distribution with its peak at the cell centers (illustrated as a heatmap in Figure 2H). The synthetic images were significantly smaller than the video frames, as can be appreciated by comparing Figure 1 with Figure 2. This allowed a much smaller and more efficient CNN to be trained, while the patches still contained enough context for the CNN to perform well. An interesting finding during testing was that exaggerating the objects' features helped the neural network generalize better. Making objects brighter or darker than they appeared in the videos, or giving objects more deformity than seen in the real data, for example, helped make the neural network more robust.
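The cell-location label described above is straightforward to generate alongside the images. The sketch below builds such a label: a heatmap whose value at each pixel is the probability of being a cell centre, modeled as a 2D Gaussian peaking at 1.0 at each centre. The σ value is an assumption; the study does not state the one it used.

```python
import numpy as np

def center_heatmap(size, centers, sigma=5.0):
    """Per-pixel probability of being a cell centre: a 2D Gaussian with
    its peak (value 1.0) at each centre in `centers`, combined by max."""
    yy, xx = np.mgrid[:size, :size]
    heat = np.zeros((size, size))
    for cy, cx in centers:
        g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)  # max keeps each peak at exactly 1.0
    return heat
```

Combining overlapping Gaussians with a maximum rather than a sum keeps every peak at 1.0, so a fixed detection threshold behaves the same for isolated and adjacent cells.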

Red blood cell tracking and segmentation results
Training the CNN on the synthetic data allowed us to recognize ''real'' RBCs in the MD image data. The CNN produced two kinds of output: one showing where the cells were and the other showing what shape they had. Using the CNN's location outputs, a Python script tracked the cells throughout the video. Figure 3 depicts some of the resulting cell journeys, which were created by combining crops from each frame in which the cell was present. The accurate segmentation, which entails marking each pixel as part of the cell or not, appeared to be more difficult than the recognition of RBCs. As intended, cells produced a strong signal, whereas most floating artifacts were ignored unless they came into direct contact with the cells (Figure 3B, eighth frame, bottom row). Such temporary associations, however, have minimal effects on the DDI calculations (STAR Methods). The tracking script was tasked with ignoring cells that traveled in clusters or collided with other cells during the journey. This was to ensure that the collision effects would not interfere with later DI calculations. As discussed in the synthetic data section, we trained the CNN on round and bullet-shaped cells, and, as expected, it was able to correctly identify and segment similar real-world cells (Figure 3A). Moreover, the CNN was able to generalize to cells with abnormal shapes and textures, which are common for malaria iRBCs, as can be appreciated from Figure 3C, for which we selected some striking ''edge'' cases of cells with abnormal shapes. Since the CNN output contained the position of cells as peaks of probability values, we could correct for false positives and false negatives by setting a threshold on the model's output. We observed that a threshold for recognizing a cell that was set too high led to missing cells that were almost transparent in some of the frames. A threshold that was set too low, however, returned an increasing number of false positives, typically large chunks of cell debris or parasite clusters (e.g., Figure 1D, bottom row). We preferred false positives over false negatives, since we did not want to overlook any true cells, and a visual inspection of the cell journeys allowed us to remove the obvious non-cells afterward. The inspection also revealed unexpected ''double-cell'' journeys overlooked by the tracking algorithm, caused by the CNN recognizing two overlapping individual cells as one single cell (e.g., the top row of Figure 3D).

Quantitative comparison of annotations
Although we visually confirmed that the neural network performed adequately in segmenting the cells, we also wanted to assess the results quantitatively. To assess the CNN's performance, we compared the RBC surface area as determined by three human curators with experience in analyzing RBCs (GT, JK, and DR) with that of the CNN. The curators were each given samples from the 81,034 unique cell pictures segmented by the CNN (cropped to 54 × 54 pixels), without its output prediction, and were instructed to segment them by drawing polygons around the cells to indicate cell form. Agreement between shapes was quantified as the fraction of overlapping pixels. As ambiguous or ''edge'' cases are relatively hard to segment, we made sure they were well represented in the set (~20%). To get an indication of the overlap between the human and CNN segmentations, we also compared the segmentations among the curators themselves. We found that, collectively, the human curators had a median overlapping fraction of 0.950 with the CNN's output, and a median fraction of 0.961 amongst themselves (Table 1). Given that the cells are centered in the images, we expected large portions of the labels to always overlap. As a sanity check, we therefore also evaluated what these fractions would be if the CNN outputs were compared to randomly selected curated segmentations of the cells. When randomly paired, the median overlapping fraction drops to ~0.90 (Table 1, column 4). The small difference between human and CNN assessment, as well as the visual inspection of the CNN's outputs, gave us confidence in the CNN's ability to segment cells. Note that for the final DDI determination (Figure 4), we only used the median of the DI values between the shapes during and after the constriction (STAR Methods). This was done to ensure that occasional mis-segmentations (e.g., caused by cells being too transparent or touching floating cell debris) did not play a large role in the DDI calculations. These potential mis-segmentations were however not excluded
from the samples on which Table 1 is based, allowing
for an unbiased assessment of the CNN's capabilities. As a result, we anticipate that the final segmentations used for the DDI calculations will be even more similar to the human curator results than Table 1 indicates. To offer more quantitative support, we have included ROC plots in the supplementary file (Figure S1) that display the F1, AUC, and IoU scores for all three curators.
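The agreement metric, a 'fraction of overlapping pixels', can be computed directly from two binary masks. A plausible implementation is intersection over union, one common reading of the metric that also matches the IoU scores reported for the curators; this is an illustrative sketch, not the study's exact definition.

```python
import numpy as np

def overlap_fraction(mask_a, mask_b):
    """Fraction of overlapping pixels between two binary segmentation
    masks, computed as intersection over union (an assumed reading)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty masks agree perfectly
    return np.logical_and(a, b).sum() / union
```

Per-pair fractions like this can then be aggregated as medians, as in Table 1.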

The effect of developmental stage and strain on deformability
Having validated our synthetic data approach for training the CNN, we next focused on studying the effect of malaria strain and developmental stage on the DDI of iRBCs. Here we compared two P. falciparum strains of different geographical origins: K1 from Thailand and NF135 from Cambodia. For both strains, we studied the effect of the parasite on the DDI in the ring, trophozoite, and schizont developmental stages, and used uninfected RBCs as controls for maximally deformable cells. In total, 1,716 cells were evaluated.
Infected RBCs have a lower DDI than non-infected ones, and the DDI is lowest in the largest and most mature developmental stage. Interestingly, we also find a significant and consistent difference between the two strains across the developmental stages, with NF135-infected RBCs having a lower deformability than K1-infected RBCs, while there is no difference between the DDIs of the uninfected RBCs.

DISCUSSION
We described a deep learning approach that enabled us to investigate the deformability of malaria-infected RBCs. To accomplish this, we used synthetic data to train a CNN to predict RBC surface areas and locations and processed that information with a Python script. We demonstrated that the neural network performed well in noisy environments and could generalise not only to real-world data, but also to cells with abnormal shapes and textures. We found a significant RBC deformability difference between the K1 and NF135 strains throughout asexual blood-stage development. To the best of our knowledge, we are the first to use deep learning for automatic RBC deformity quantification under noisy conditions and the first to use non-simulation synthetic data to solve this problem.
The observed difference in deformability between the strains is particularly interesting, as differences in the membrane rigidity of iRBCs have been linked to virulence in the asexual blood stages of the parasite and to the transmission of the sexual blood-stage parasites to the mosquito vector. 3-5 While disease severity has been reported for the K1 strain, 29,30 there are no data on the disease severity of NF135. Although controlled human malaria infections (CHMI) with the NF135 strain in naive individuals did not report severe malaria during the relatively short infection period, 26 direct patient comparisons would be required to confirm the hypothesis that the differences in the DDI observed between the strains correlate with differences in virulence. However, the increased membrane rigidity combined with the drug-resistance profile of NF135 26 could make this particular parasite strain vital to uncovering important molecular mechanisms relevant to disease progression and parasite survival within the human host. Regardless of the interest surrounding NF135, the observed differences in membrane rigidity between these two wild-type malaria strains underline the power of this technique to uncover discrete differences in membrane mechanics. This will undoubtedly aid in the comparison of strains as well as genetically modified parasites in the quest to unravel the molecular mechanisms of virulence. Furthermore, by applying the same strategy to studying sexual blood-stage deformability, the importance of the relative levels of RBC rigidity for transmission can be explored. Finally, this technique can be used to study existing compounds and discover new ones aimed at reducing rheological changes in the iRBC that the current filtration-based screen would miss. 31
It is well established that more diverse training data leads to more accurate DL models, because deep learning can only interpolate between what it has observed during training. 32,33 As a consequence, DL often fails on edge cases: rare occurrences that are unlikely to be captured in a training dataset. With our synthetic data, we were able to purposefully add rare but critical edge situations at any desired quantity and with consistent labeling. We found that by exaggerating the data features, the neural network learned to generalize better to new data. Also, by training on synthetic data while testing on real-world data, we did not have to be concerned with data leakage between training and test sets, something that is becoming a major part of the DL replication crisis in many science fields. 34 However, there are also valid arguments against the use of synthetic data to answer research questions. The most important one is that synthetic data will always be based on a rough approximation of the real world and thus also be limited by it. This could potentially result in false insights and, as a result, incorrect decision-making. After all, you can only get as much out of the data as you put in. What distinguishes this case is that the employed deep learning model is not attempting to learn something novel from the data but rather learns to take over the mundane task of cell segmentation under noisy conditions. The synthetic data used in this study are more than adequate for that task. The CNN is only used for what it is good at, namely pattern recognition, and statistical analysis is used to detect meaningful patterns in the data. This approach allows researchers to focus on their own strengths, such as hypothesis generation and testing, instead of spending time labeling data by hand. Beyond this study's objective, the technique discussed here is relevant to more than just studying malaria-induced RBC effects. In the future, this analysis pipeline could be used to study RBC deformability in diseases such as sickle cell anemia, thalassemia, and hereditary spherocytosis, to name a few. This tool can also be used to study RBC dynamics during
other physiological processes, such as hypoxia, oxidative stress, lipid peroxidation, and the effects of drug binding to the red blood cell membrane. And these are only a few RBC-specific examples. We expect that by modifying the synthetic data, the method can be extended beyond analyzing RBCs to other single cells or droplets, but this would require further research to confirm. Thus, the proposed pipeline has the potential to open up a whole new avenue of study.
To conclude, by showcasing the success of DL in one of the more difficult MD cases, we demonstrated that DL has the potential to significantly accelerate the analysis of single-cell studies that use microfluidic devices. We found that, in some cases, such as in this study, synthetic data can effectively complement DL. We anticipate that our approach will make the benefits of DL, previously restricted to settings with manually labeled data, available to a larger audience. However, we also believe that DL should be employed conservatively to promote research transparency, which can only benefit science. The newly discovered effect of malaria strain on the deformability of iRBCs suggests that our combination of CNNs and synthetic data can be used to uncover factors underlying the reduced deformability of iRBCs in malaria.

Limitations of the study
The cell detection and segmentation procedures took care of the majority of the labor involved in analyzing MD videos. Yet, the extracted cells still had to be categorized into the malaria stages manually. A natural next step could be to generate synthetic data representing all malaria stages so that a neural network could also learn to classify those. However, given the significant variance introduced by the random generators during data generation, which makes it difficult to replicate the small distinctions between malaria stages, as well as the method's simplicity, we do not consider this strategy viable. Instead, we propose that the synthetic data be limited to cell tracking and extraction alone.

EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS
Plasmodium falciparum strains K1 and NF135 were cultured under standard culture conditions 35 in RPMI medium supplemented with 10% human serum and maintained at 5% haematocrit of human RBCs. The parasites were maintained in an automated shaking culture system 35 as asynchronous cultures and diluted to between 0.5 and 1% parasitaemia every 2 days through the addition of human RBCs. On the day of sampling, parasite cultures were first analysed by Giemsa staining 35 to ensure at least 5% parasitaemia and mixed life-cycle stages. Samples for microfluidic analysis were taken directly from the cultures and kept at 37°C until they were added to the microfluidic device. The parasite life-cycle stage, and whether the RBC was infected or not, was manually annotated as described below.

Microfluidic devices and experimental protocol
A microfluidic device consists of an inlet, 30 parallel channels (6 regions with 5 channels each), and an outlet (see figure below). A narrow channel was 7 μm wide and 1 mm long; two adjacent channels were 13 μm apart. The microfluidic pattern was drawn in AutoCAD (Autodesk) and transferred to glass (JD Photo Data). A microfluidic master was fabricated by patterning SU-8 2007 photoresist (Kayaku Advanced Materials) on a silicon wafer (50 mm dia.; Si-Mat); the photoresist was processed according to the manufacturer's guidelines. Briefly, photoresist was spun (20 seconds at 500 rpm; 30 s at 3000 rpm; acceleration 300 rpm/s) onto a wafer and baked at 65°C for 1 minute and at 95°C for 2.5 minutes. To define the pattern, the wafer was exposed through a glass mask (dose 100 mJ/cm²; mask aligner MJB3, Suss Microtec) and baked at 65°C for 1 minute and at 95°C for 3 minutes. The pattern was developed in SU-8 developer (Kayaku Advanced Materials), and the obtained master was baked at 175°C for 2 minutes. The height of the fabricated features was checked with a Dektak 6M stylus profiler (Bruker) and was between 8.5 and 9 μm. The surface of the master was treated with 1H,1H,2H,2H-perfluorooctyltrichlorosilane (Thermo Scientific) to promote the removal of the elastomer; the master and 50 μL of silane were left in a desiccator for 1 h under vacuum, followed by 2 h in a 95°C oven. Microfluidic devices were made from a Sylgard 184 Silicone Elastomer kit (Dow). Base and cross-linker were mixed in a ratio of 10:1 (w/w), poured over the master to create a 5-7 mm thick layer, and degassed in the desiccator. PDMS was cured for at least 2 h at 65°C in an oven. The PDMS was separated from the master, and a biopsy punch (1 mm dia., Kai Medical) was used to bore 1 mm holes for the inlet and outlet. The PDMS piece was cleaned with Scotch tape, rinsed with isopropanol, and blow-dried with nitrogen gas. Finally, a glass coverslip (50 mm dia.)
and the PDMS piece were treated with oxygen plasma (25 s, 65 W, Femto 1A, Diener Electronic), and sealed together.
An experimental setup consisted of an IX71 microscope (Olympus) equipped with a 100× oil-immersion objective (UPlanFLN, Olympus) and a Miro eX4 camera (Vision Research). Sample flow was controlled by a Nemesys low-pressure syringe pump (Cetoni).
Ethanol was used to fill the narrow and shallow channels of a device and was then replaced with PBS before introducing a sample. To make a PDMS connector, a 5 mm biopsy punch (Kai Medical) was used to cut a PDMS cylinder out of a 5-7 mm thick PDMS sheet, and a 1 mm hole was bored in the centre of the cylinder along its rotational symmetry axis.
A 0.5 mL Gastight syringe (Hamilton) was filled with HFE7500 oil and connected to tubing (0.56 mm ID, 1.07 mm OD, Adtech Polymer Engineering) using a 23G (blue) needle. A PDMS connector was used to connect the tubing to a 200 μL pipet tip. Prior to loading the sample, the pipet tip was filled with HFE7500 oil from the syringe. 50 μL of a sample was loaded into the tip, which was then connected to the microfluidic chip; the outlet was connected to a piece of tubing and a waste vial.
A sample aliquot was pushed through a device at 50 to 100 μL/h. Movies were acquired at a rate of 1000 fps.

Synthetic RBCs
These predictions were then compared to the CNN's cell-location coordinates of the next frame to check whether they were similar. A match was defined as a Euclidean distance that was less than twice the cell velocity and no larger than 30 pixels. Cell objects that did not have a match with the location peaks were removed from memory. For cells that did match, the crops of the frame and the segmentation results were stored in memory alongside the earlier crops. The distances between all cell objects were then calculated, and cells with a distance of less than 35 pixels were erased from memory to ensure colliding cells were not part of the study. The process of predicting the next location of cell objects and matching them with CNN detections was repeated until the cell objects met the 'completion threshold', which was defined as the frame width minus 40 pixels (see above figure). Following that, all cell crops were stored as cell-journey images (e.g., Figure 3).
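The matching rule described above (a match requires a Euclidean distance below twice the cell velocity and no larger than 30 pixels) can be sketched as follows. This is an illustrative reimplementation, not the study's tracking script, and the greedy choice of the nearest detection is an assumption.

```python
import numpy as np

def match_next(prev_pos, velocity, detections, max_jump=30.0):
    """Match one tracked cell to a detection in the next frame.

    prev_pos   -- predicted (x, y) position of the cell
    velocity   -- the cell's speed in pixels per frame
    detections -- list of (x, y) CNN location peaks in the next frame

    Returns the matched detection, or None if no detection satisfies
    the rule: distance < 2 * velocity and distance <= max_jump pixels.
    """
    if not detections:
        return None
    dists = [np.hypot(x - prev_pos[0], y - prev_pos[1]) for x, y in detections]
    i = int(np.argmin(dists))  # greedy nearest-detection choice (assumed)
    if dists[i] < 2 * velocity and dists[i] <= max_jump:
        return detections[i]
    return None  # unmatched: the cell object would be dropped from memory
```

Repeating this per frame, and dropping any pair of tracked cells closer than 35 pixels, reproduces the collision-exclusion behaviour described above.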

Shape and deformity index determination
The deformity index difference of an RBC is expressed as the difference between its width-to-height ratios after and during the constriction, as shown in Equation 3. Specifically, Wa and Ha represent the width and height of the RBC after the constriction, while Wd and Hd represent the width and height of the RBC during the constriction.

DDI = Wa/Ha − Wd/Hd (Equation 3)

Each RBC, however, has many frames during the constriction and multiple frames after the constriction. Therefore, we calculated the DI for each frame and used the median DI value during the constriction and the median DI value outside the constriction to calculate the DDI. The rationale behind this was that it results in a more robust DDI, since noise factors, like those illustrated in Figure 3B, could sometimes hinder making proper segmentations.
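Applied per cell journey, Equation 3 combined with the median rule amounts to the following; the function name and (width, height) input format are illustrative.

```python
import numpy as np

def ddi(wh_after, wh_during):
    """DDI = median(Wa/Ha) - median(Wd/Hd) over a cell journey's frames.

    wh_after and wh_during are lists of (width, height) bounding-box
    pairs, one per frame after and during the constriction respectively.
    """
    di_after = [w / h for w, h in wh_after]    # per-frame DI after
    di_during = [w / h for w, h in wh_during]  # per-frame DI during
    return float(np.median(di_after) - np.median(di_during))
```

Taking medians rather than means keeps single badly segmented frames from skewing a journey's DDI.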
Before calculating the DIs, the raw CNN segmentation outputs of the cell journeys were multiplied by 255 and saved as grayscale images (grayscale = 1 colour dimension per pixel instead of the 3 used in colour images). For DI calculation, the cell-journey segmentation outputs were loaded, first smoothed with a Gaussian function (σ value of 1), and then binarised using a threshold of 150. Some frames of the journeys still contained parts of nearby cells. Therefore, a 'Flood Fill' 38 function was applied at all frame centres to isolate the cell segmentations from other cells. Afterwards, rectangles were drawn around all cell segmentations to create bounding boxes. The widths and heights of the boxes were used to compute the cell ratios.
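The post-processing steps above can be sketched as follows. As an assumption, the flood fill from the frame centre is replaced here by keeping the connected component under the centre pixel via scipy's labeling, which has the same isolating effect for this purpose.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

def cell_bounding_box(raw_output):
    """Post-process one frame of raw CNN segmentation output (floats in
    [0, 1]): scale to grayscale, smooth, binarise at 150, keep only the
    connected component under the frame centre (a stand-in for the flood
    fill), and return the bounding-box (width, height) in pixels."""
    gray = np.asarray(raw_output, dtype=float) * 255.0
    smooth = gaussian_filter(gray, sigma=1.0)
    binary = smooth > 150
    labels, _ = label(binary)  # 4-connected components
    centre = labels[binary.shape[0] // 2, binary.shape[1] // 2]
    if centre == 0:
        return None  # no cell under the frame centre
    ys, xs = np.where(labels == centre)
    return int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1)
```

The width/height pair returned per frame feeds directly into the per-frame DI and the median-based DDI described above.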

QUANTIFICATION AND STATISTICAL ANALYSIS Analysis of malaria's effect on red blood cell deformability
In our study examining the impact of malaria on red blood cell deformability, we adopted an especially stringent statistical approach to analyze differences in DDI distributions across infection stages for each strain.Using the Student's t-test, our significance thresholds were set at P > 0.01 for non-significance, 0.01 > P > 0.001 for significance (which is stricter than the conventional P < 0.05 typically used), and P < 0.001 for high significance.
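The thresholds above map directly onto a two-sample t-test. A minimal sketch using scipy follows; the function name and list inputs are illustrative, and the labels mirror those used in Figure 4.

```python
from scipy.stats import ttest_ind

def significance_label(sample_a, sample_b):
    """Two-sample Student's t-test with the thresholds described above:
    'n.s.' for p > 0.01, '*' for 0.01 > p > 0.001, '**' for p < 0.001."""
    p = ttest_ind(sample_a, sample_b).pvalue
    if p > 0.01:
        return "n.s."
    return "*" if p > 0.001 else "**"
```

For example, comparing the DDI samples of two infection stages returns the significance label that would annotate the corresponding boxplot pair.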

Neural network performance analysis
We employed a neural network for cell segmentation. For validation, three expert curators (GT, JK, and DR) annotated samples from the 81,034 unique cell images. They marked cell boundaries using polygons without prior knowledge of the neural network's predictions. Overlap accuracy between the human annotations and the network's segmentations was determined by calculating the fraction of overlapping pixels. As a control, we also assessed the overlap when neural network outputs were randomly paired with curated annotations. Additionally, we incorporated ROC plots in a supplementary file, detailing the F1, AUC, and IoU scores for each curator.

Figure 1 .
Figure 1. Topography of a typical microfluidic video frame A typical video frame from a microfluidic device contains many objects other than the RBCs of interest. These non-cell objects, such as cell debris, microfluidic device walls, or other artifacts, in combination with varying focus and lighting conditions, interfere with the segmentation and tracking processes. The channels in the video frames have a width of 7 μm and are spaced 13 μm apart.

Figure 2 .
Figure 2. Synthetic and real-world example images (A) Variation of real-world ''round''-shaped RBCs; (C) same as (A), but with ''bullet''-shaped RBCs; (E) same as (A) and (C), but with focus on background variations. (B), (D), and (F) are manually selected images from the synthetic data that are similar to the corresponding images in (A), (C), and (E), respectively. Labels are given in two ways: 1) the segmentation, indicating the probability per pixel that it belongs to an RBC, as illustrated in (G); and 2) a Gaussian distribution indicating the probability of pixels belonging to the center of an RBC, as illustrated in (H).

Figure 3. Cell tracking results
Original images are on top; segmentation results are at the bottom. (A) Examples of cell-journeys of cells with shapes similar to those in the training dataset. (B) Examples of cell-journeys with noise artifacts floating around the cell, which could potentially hinder correct cell-segmentation. Note that in the 8th frame of the bottom journey, the artifact is in contact with the cell and becomes part of the cell-segmentation. (C) Cell-journeys with unusual shapes not present in the training data, demonstrating the generalization abilities of the neural network. (D) Some "failure" cases. The top journey contains two cells traveling together, which should have been excluded; this was probably caused by the cells overlapping for the whole journey, making them look like a single cell to the neural network. The bottom journey is debris that the neural network mistook for a cell due to its large size.

Figure 4. DDI distributions of the various infection stages, per strain
Student's t-test between each infection stage of both strains. Significance is indicated as follows: n.s. = p > 0.01; * = 0.01 > p > 0.001; ** = p < 0.001. The boxplots contain a middle line indicating the median; the lower and upper ends of the box indicate the 25th and 75th percentiles; and the whiskers indicate the minimum and maximum values. The dotted red lines indicate the zero-deformity and the median deformity of uninfected cells.
An experimental setup to capture RBC deformation
(A) Overview of the setup: a syringe with sacrificial fluid (HFE7500) and a 200 μL pipet tip is connected to the inlet of a microfluidic device. The pipet tip is loaded with 50 μL of RBC sample. The outlet is connected to a waste vial. Deformation of RBCs is observed on an IX71 inverted microscope via a 100× oil immersion objective. (B) Architecture of the microfluidic device: the inlet (circle at the top) splits into 6 sets of channels; each set consists of five channels, 7 μm wide and 1000 μm long. Neighboring channels are 13 μm apart. The outlet is located at the bottom. The inset at the bottom-right of the panel shows the approximate position where experimental videos were acquired.
As illustrated in the figure below, the synthetic data consisted of images of 100×100 pixels with a single (grayscale) color channel. Each image consisted of a noisy background (e.g., panel 2 of the figure below) with some or all of the following objects: a single wall (panel 3), static artefacts (panels 4 and 6), floating objects (panel 5), and a round or bullet-like shaped cell.

Besides drawing circles and lines, the main algorithms used are: elastic deformation, 27 a Gaussian distribution function, and simplex noise generation. 28 Elastic deformation is a function that randomly warps images; the range and intensity of the warping are determined by the parameters α and σ. The Gaussian function creates a two-dimensional bell curve and has a single parameter, the standard deviation (σ). The σ can either be used directly or be calculated from a given box size (e.g., the examples below used box sizes of 3 and 16, with an appropriate σ calculated for each size). The OpenSimplex function, which generates gradient-noise patterns, has three parameters: the zoom, which determines the 'size' of the noise (for example, the left image below has a smaller zoom value than the right image); the random seed, which serves as a starting point for the algorithm's random number generator; and the range, which normalises the noise to lie within a certain range of values. The seeds were based on the local time.

Because we employ a large number of randomly sampled integers to construct the synthetic data, these will be designated as rdm_range(n,m) in the remainder of the article, signifying any integer randomly chosen between n and (inclusively) m. Often, multiple images were interpolated (combined) into a new image. Interpolation requires two images (P1 and P2) and a third interpolation image (I), which contains information on how much each pixel of either image should contribute to the new combined image (P3) and has values ranging only between 0 and 1. Interpolation is then done using the formula:

P3 = I × P1 + (1 − I) × P2 (Equation 1)

Creation of synthetic RBCs
(A) Steps in the creation of synthetic backgrounds. (B) Steps in the creation of synthetic round cells. (C) Same as (B), but for bullet-like cells. (D) Illustration of elastic warping, which was used to introduce variation to object shapes. (E) Gaussian distribution examples, which were used to smooth out rough edges (left), but also to generate the cell's centre-probability labels (right). (F) Some examples from the OpenSimplex noise function. These gradient-noise patterns can be utilized to synthesize textures (as seen in the second panel of A, the third panel of B, or the fourth panel of C), but they can also be used to interpolate between two images, with the noise value determining how much each of the two images should contribute to the combined image per pixel (e.g., a noise value of 0.7 for a given pixel corresponds to the combined pixel being 70% image 1 and 30% image 2).
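Equation 1 and the rdm_range(n,m) notation can be sketched directly. This is a pure-Python illustration on nested lists; the actual pipeline presumably uses NumPy arrays for the pixel-wise operations.

```python
import random

def rdm_range(n, m):
    """The paper's notation: a random integer between n and m, inclusive of m."""
    return random.randint(n, m)

def interpolate(I, P1, P2):
    """Equation 1, pixel-wise: P3 = I*P1 + (1 - I)*P2, with I in [0, 1]."""
    return [[i * p1 + (1 - i) * p2
             for i, p1, p2 in zip(ri, r1, r2)]
            for ri, r1, r2 in zip(I, P1, P2)]

P1 = [[1.0, 1.0], [1.0, 1.0]]   # e.g. a cell texture
P2 = [[0.0, 0.0], [0.0, 0.0]]   # e.g. a background
I  = [[0.7, 0.5], [1.0, 0.0]]   # e.g. an OpenSimplex noise pattern

print(interpolate(I, P1, P2))   # -> [[0.7, 0.5], [1.0, 0.0]]
```

With P1 all-ones and P2 all-zeros, the result simply reproduces the weight image I, which makes the role of the interpolation image easy to see: where I is 0.7, the combined pixel is 70% image 1 and 30% image 2.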

A schematic of the tracking algorithm
A cell initialization threshold indicates the start of a cell-journey. With every consecutive frame, the cell's new coordinates are predicted from the cell's momentum. Cells are excluded from analysis if they come within 35 pixels of another cell. After a cell object passes the cell-journey completion threshold, it is marked as complete.
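The momentum prediction and 35-pixel exclusion rule can be sketched as follows. The class and field names are our own, and this is a simplified single-step sketch of the described behavior, not the authors' tracker.

```python
import math

MIN_SEPARATION = 35  # pixels, the exclusion distance from the text

class Track:
    """One cell-journey: position plus momentum (the last displacement)."""
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.vx, self.vy = 0, 0
        self.valid = True

    def predict(self):
        """Expected position in the next frame, given the current momentum."""
        return self.x + self.vx, self.y + self.vy

    def update(self, x, y):
        """Record the matched detection and refresh the momentum."""
        self.vx, self.vy = x - self.x, y - self.y
        self.x, self.y = x, y

def drop_close_pairs(tracks):
    """Invalidate any pair of tracks closer than MIN_SEPARATION pixels."""
    for i, a in enumerate(tracks):
        for b in tracks[i + 1:]:
            if math.hypot(a.x - b.x, a.y - b.y) < MIN_SEPARATION:
                a.valid = b.valid = False

t1, t2 = Track(0, 50), Track(100, 50)
t1.update(10, 50)        # moved 10 px right -> momentum (10, 0)
print(t1.predict())      # -> (20, 50)
drop_close_pairs([t1, t2])
print(t1.valid)          # -> True: the cells are 90 px apart, above the threshold
```

Matching each predicted position against the detections in the next frame (e.g., nearest-neighbor within a search radius) would complete the loop; that matching step is omitted here.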

Table 1. Trained CNN assessment