Big Data in Transfusion Medicine and Artificial Intelligence Analysis for Red Blood Cell Quality Control

Background: “Artificial intelligence” and “big data” are increasingly moving from being interesting concepts to being relevant to, or even part of, our lives. This also holds true for transfusion medicine. Despite all advancements in transfusion medicine, there is not yet an established, generally applied measure of red blood cell quality. Summary: We highlight the usefulness of big data in transfusion medicine. Furthermore, we use the quality control of red blood cell units as an example to illustrate the application of artificial intelligence. Key Messages: A variety of concepts making use of big data and artificial intelligence are readily available but still await implementation in clinical routine. For the quality control of red blood cell units, clinical validation is still required.


Introduction
Blood donations, the processing of blood products (plasma, platelet units, and red blood cell [RBC] units), and their use to treat patients have been success stories at least since the discovery of the ABO blood group system by Landsteiner [1]. The technology and knowledge developed since then have improved almost all aspects of transfusion medicine. Today, more than 100 million RBC units are transfused each year worldwide [2,3], and applying big data approaches to the associated transfusion data has the potential to further enhance the safety, efficiency, and effectiveness of transfusion medicine [4]. One way to conceptualize the application of big data to transfusion medicine is to combine various types of health data, including electronic health records [5,6], electronic medical records [7], personal health records [8], laboratory information systems [9], medical practice management software [10], and hemovigilance data [11]. This combination creates a large database that healthcare professionals can use to identify patterns and trends, leading to improved practices in blood product usage, inventory management, and more, and ultimately to improved patient outcomes [4,12]. Including hemovigilance data in the discussion of big data in transfusion medicine highlights the importance of monitoring and ensuring the safety and quality of blood products, which is a critical aspect of transfusion medicine. The infection risk associated with blood units is very low in high-income countries. For example, in Germany, it was estimated to be as low as 1 in 10.9 million for hepatitis C virus, 1 in 4.3 million for human immunodeficiency virus (HIV)-1, and 1 in 360,000 for hepatitis B virus [13]. The general infection risk is higher (up to 2%) in low-income countries [14]. In particular, the measures implemented after the HIV crisis in the 1980s [15] made blood products very safe in terms of contamination.
In addition to the risk of infection, the storage conditions of blood units are another important aspect to consider. In many countries, blood units can be stored for up to 42 days under controlled conditions before transfusion. However, studies have shown that after the first 14 days of storage [16], RBCs can accumulate biochemical and morphological lesions that can irreversibly alter their biological functions [17]. These lesions, known as RBC storage lesions, have been widely discussed in the literature, but it remains unclear which parameters are useful in assessing the quality of blood units for transfusion. Such parameters could be crucial to establish whether RBCs should be transfused and in what number. Moreover, a possible donor dependence of RBC quality is discussed [18], opening the door to single-donor blood quality tests and donor-dependent shelf lives to improve RBC transfusion decision-making and outcomes. This is of particular importance since there is a shortage of blood products all over the world, e.g., [19]. Although quality control per se does not increase the total amount of blood products, a need that has even increased with the COVID-19 pandemic [20,21], transfusion product management concepts and a putatively increased shelf life (at least for a fraction of the donors) may ensure the most effective application of each blood donation (including the best donor-patient match). Artificial intelligence (AI) has emerged as a promising method to access and analyze such parameters, providing valuable insights into single RBC quality and function [22]. Moreover, AI can establish reliable quality databases for blood products, enabling efficient management of the manufacturing and clinical application of blood [23]. Computer simulation and algorithmic predictions will facilitate this process, aided by advanced robotics and information technology to standardize production processes.
The incorporation of AI as a tool for quality control analysis in big data represents a new and promising approach to single-cell quality control in transfusion medicine, as illustrated in Figure 1. Its ultimate goal is personalized transfusion therapies [24] while maintaining high-quality standards. Here, we review the current situation and perspective for the application of big data and AI analysis in transfusion medicine in general and for RBC quality control in particular.

Big Data in Transfusion Medicine
The concept of big data, although popular, is still not completely clear. However, it is guided by its four essential features: data attributes, technological requirements, analysis, and limitations [25]. Taken together, big data can be defined as a dataset that increases exponentially in variety, volume, and velocity [26] and that requires specific technology and analytical methods for storage, processing, and extracting information [27].
In the healthcare system, the use and management of electronic health records have expanded the scope for big data and its approaches since the early 2000s. In transfusion medicine, big data has been extensively used at various levels [4], such as in the REDS-III program [28], which utilizes big data techniques and infrastructure to establish a comprehensive research database that connects information from blood donors, their donations, the components derived from these donations, and the recipients of these components. The interconnected database allows for numerous analyses that characterize blood component utilization patterns in diverse settings, inform the design of future clinical trials, and potentially determine the effects of blood donors/donations on the clinical outcomes of recipients.
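The linkage described above (donor, donation, component, recipient) can be illustrated with a minimal sketch. All record layouts, field names, and values below are hypothetical and chosen for illustration only; they do not reflect the actual REDS-III schema.

```python
from collections import defaultdict

# Hypothetical toy records; field names are assumptions, not the REDS-III schema.
donors = {"D1": {"sex": "F", "age": 34}, "D2": {"sex": "M", "age": 51}}
donations = [
    {"donation_id": "N1", "donor_id": "D1"},
    {"donation_id": "N2", "donor_id": "D2"},
]
components = [
    {"component_id": "C1", "donation_id": "N1", "type": "RBC", "storage_days": 12},
    {"component_id": "C2", "donation_id": "N2", "type": "RBC", "storage_days": 35},
    {"component_id": "C3", "donation_id": "N2", "type": "plasma", "storage_days": 3},
]
transfusions = [
    {"component_id": "C1", "recipient_id": "R1"},
    {"component_id": "C2", "recipient_id": "R1"},
    {"component_id": "C3", "recipient_id": "R2"},
]


def link_records(donors, donations, components, transfusions):
    """Join the four tables into one row per transfused component."""
    donation_to_donor = {d["donation_id"]: d["donor_id"] for d in donations}
    component_by_id = {c["component_id"]: c for c in components}
    rows = []
    for t in transfusions:
        comp = component_by_id[t["component_id"]]
        donor_id = donation_to_donor[comp["donation_id"]]
        rows.append({
            "recipient_id": t["recipient_id"],
            "component_id": comp["component_id"],
            "type": comp["type"],
            "storage_days": comp["storage_days"],
            "donor_id": donor_id,
            "donor_sex": donors[donor_id]["sex"],
        })
    return rows


def mean_storage_by_recipient(rows, component_type="RBC"):
    """Example utilization query: mean storage age of units per recipient."""
    ages = defaultdict(list)
    for r in rows:
        if r["type"] == component_type:
            ages[r["recipient_id"]].append(r["storage_days"])
    return {rid: sum(v) / len(v) for rid, v in ages.items()}


linked = link_records(donors, donations, components, transfusions)
print(mean_storage_by_recipient(linked))
```

Once every transfused component carries its donor and recipient identifiers, utilization or outcome queries reduce to simple aggregations over the linked rows. A production system would rely on a relational database and on privacy-preserving record linkage, which this sketch does not attempt.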
REDS-III also accesses the SCANDAT [29] database, a large linked electronic donor-recipient database consisting of blood donation records and transfusion activities dating back to the 1960s-1980s in Sweden and Denmark. The upgrade of SCANDAT to SCANDAT2 [30] aims to facilitate systematic surveillance programs for blood safety by enabling complete tracking of subjects regarding cancer, hospital care, and causes of death up to 2012. SCANDAT2 can be integrated with big data through various methods, including linking it with other large-scale health databases like national cancer registries to examine the long-term health outcomes of blood donors and transfusion recipients. Additionally, big data analysis techniques like machine learning algorithms can be utilized to identify patterns in the vast amount of data contained in SCANDAT2, which may indicate transfusion-transmitted infections or adverse health outcomes related to blood transfusions.
Large databases offer additional insights beyond the influence of donor factors on recipient outcomes, as demonstrated by various studies [31,32,33], e.g., the importance of standardized procedures in blood processing and storage [32]. Using data and samples collected as part of the REDS-III program, the study [32] investigated how the metabolism of stored RBCs is impacted by the heterogeneity of blood processing and storage additives at different centers, emphasizing the crucial role of standardization in blood processing and storage to minimize the variability of blood products across different centers. The relative abundance of metabolites (energy metabolism, amino acids, and nucleotides) in RBCs stored for varying durations (1, 7, 14, 21, 28, and 42 days) from three different centers is shown in Figure 2. The figure highlights the importance of standardizing blood processing and storage procedures to reduce heterogeneity between centers. It also indicates that the heterogeneity of blood processing and storage additives at different centers significantly impacted RBC metabolism and that this impact was as significant as the effect of storage time.
Other studies utilize extensive data on each participant, including demographic information, medical history, and blood donation history. For instance, the INTERVAL [34] study investigated whether varying the frequency of whole blood donation affected the safety and efficiency of blood collection, while the PROTON [35] study utilized a large dataset for cost-effectiveness analyses of blood safety interventions. The PROTON study in the Netherlands created the first national dataset of blood product recipients allowing the analysis of the cost-effectiveness of blood safety interventions such as nucleic acid amplification testing, leucocyte depletion, and pathogen reduction. The dataset can be used for economic evaluations and cost-effectiveness analyses to assess the effects of blood safety measures on different age groups, sex, and diagnoses of recipients.
Additional information can also be integrated into such databases, such as the effects of nonmatching patient and donor sex after transfusion [36,37] and the implementation of various blood preservation methods [38] that may contribute to post-transfusion complications. Moreover, the impact of a blood transfusion monitoring program [39] on financial and allogeneic blood product costs, as well as the optimization of current tools in managing blood orders for specific surgical procedures [40], can be examined to conserve financial and allogeneic resources without adverse effects on patient health. When integrated in a large-scale health database, machine learning algorithms can be applied to identify patterns in the data that may lead researchers to a deeper understanding of the factors that influence blood safety and identify potential areas for improvement in blood screening and transfusion practices [30].

RBC Unit Quality
Investigating RBC quality is an example where new technology is used to generate a large amount of (annotated) imaging data, too large to be analyzed by manual inspection (a definition of big data). Therefore, AI can be used for data analysis, and in the following, we will describe (and review) the entire process in detail.
We need to discriminate between RBC unit quality and safety. The safety of RBC units was improved considerably by screening for HIV, hepatitis B, hepatitis C, syphilis, and (in some countries) malaria [14,41]. In contrast, until now there is no clinically established graded measure for RBC quality. In some instances, the approaches to improve RBC safety may even compromise RBC quality [42].
An easy-to-measure and straightforward-to-judge parameter is the extracellular hemoglobin indicating the number of hemolyzed RBCs. However, this is an "all or nothing" response just indicating the lysed cells without any chance to judge the remaining cells for putative damage.
There are changes in RBC properties occurring during the time course of the RBC storage, which are usually referred to as "storage lesions" [43,44]. Additionally, there are indications that the risk for post-transfusion complications increases with storage time [45,46].
Furthermore, there are indications that the efficiency of the transfusion is indeed related to RBC properties. The seminal work of Barshtein and colleagues showed correlations between single RBC properties (for assays to measure such properties, see below) and parameters that are influenced by transfusion efficiency [47,48]. To this end and to judge these results, we extracted and reprinted some of these correlations in Figure 3.
[…] microscopic live-cell imaging [49], patch clamp [50], flow cytometry but also biochemical methods such as single-cell PCR [51]. This trend in principle also applies to human RBCs and RBC analysis in research and diagnostics [52,53]. Prominent examples are the application of automated patch clamp in identifying channelopathies [54] or the use of single-cell Ca2+ imaging to decipher general principles in disease mechanisms [55]. However, all these methods require extensive (pre)processing and/or contain time-consuming procedures. Therefore, methods to be applied in clinical routine would benefit from label-free operation and decent throughput.

RBC Characteristics: From Shape to Deformability
To assess the quality of RBCs, traditional methods often characterize the RBC morphology. This ranges from high-resolution scanning electron microscopy [56] to the judgment of blood smears. The latter can be exploited in 3D fluorescence microscopy in stasis [57] and flow [58]. As a result, many RBC-related diseases have derived their names from the shape of the RBCs [59], ranging from sickle cell disease [60] to neuroacanthocytosis syndrome [61]. Furthermore, it is known from rheological and ektacytometric cell population measurements that a crucial parameter (also influenced by RBC storage time) is RBC deformability. Such ensemble-average methods showed that the overall RBC deformability decreases as storage progresses [62,63,64,65]. However, these traditional bulk flow techniques require a large sample volume of up to a few milliliters. Moreover, these methods neither consider the heterogeneity within the sample population nor detect fractions of abnormal RBCs in a sample that contains primarily normal cells. Therefore, single-cell techniques, such as micropipette aspiration, optical tweezers, atomic force microscopy/spectroscopy, and microfluidics, have emerged to assess RBC deformability as a biomarker for numerous pathologies and to evaluate storage lesions [66,67]. However, measuring RBC properties with traditional micropipette aspiration, optical tweezers, and atomic force microscopy is tedious, highly skill-dependent, and often low throughput [68,69].

Microfluidic Approaches to Investigate RBCs
Microfluidic devices provide straightforward single-cell measurements. Thus, microfluidic measurements can be performed with higher throughput, which is paramount for point-of-care applications [52]. Significant progress has been achieved in measuring RBC deformability in microfluidics using various sophisticated designs and mechanisms, including wedging in tapered constrictions, transit and deformation through constricted channels, and microcapillary flow (see, e.g., [67,70,71]). Such microfluidic techniques are often coupled with high-speed imaging and automated image analysis, enabling the testing of large numbers of cells.
Different parameters have been utilized to assess storage-induced changes in RBC deformability. One property is the clogging of narrow constrictions in microfluidic chips [72,73,74] or the perfusion in capillary networks of microfluidic chips [75,76]. As RBC storage progresses, the degradation of RBC deformability increases the number of cells that are unable to transit through tapered constrictions smaller than the RBC diameter [77]. Consequently, the perfusion rate in artificial microcapillary networks declines while the number of plugging events increases during storage [65].

Microfluidic Assays to Measure Deformability
In straight microcapillaries, RBC deformability is often evaluated by the elongation of single cells in the flow field, which is described by the deformability index. The deformability index is defined as the ratio between the length L and the diameter D of the deformed RBC in the image plane, as shown in Figure 4a. Therefore, measuring the deformability index requires optical measurements with submicrometer accuracy using high-speed imaging in combination with optical microscopy. Consistent with data on storage-associated changes in deformability obtained by ektacytometry [62,64], a decrease in the deformability index during storage has been reported using microfluidic approaches [78]. In contrast, other microfluidic studies did not observe a significant change in the deformability index, measured in human-capillary-like microchannels, during storage [79,80]. Remarkably, significant changes in the time constants for cell relaxation and cell circularity during storage have been reported [80]. Cell elongation in microchannels and, thus, the deformability index depend on the exact cell velocity [81], which generally depends on the applied pressure drop, cell size, channel geometry, and the medium used for cell suspension. Hence, assessing RBC quality based on geometrical cell parameters alone, such as the deformability index, is prone to experimental and postprocessing errors. Therefore, alternative parameters, such as storage-induced morphological changes, have been investigated. The number of pathological RBC morphologies increases during storage [22,77,82,83].
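As a minimal illustration of the deformability index defined above, DI = L/D, the following sketch computes L and D from the bounding box of a segmented cell. The binary mask, the pixel size, and the assumption that the flow runs along the x-axis of the image are hypothetical; real pipelines would add segmentation and subpixel edge detection.

```python
def deformability_index(mask, pixel_size_um=1.0):
    """Return (L, D, DI) for a binary cell mask.

    mask: list of rows (y) of 0/1 values (x); flow is assumed along x.
    L is the bounding-box extent along the flow direction, D the extent
    perpendicular to it, both in micrometers; DI = L / D.
    """
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        raise ValueError("mask contains no cell pixels")
    length = (max(xs) - min(xs) + 1) * pixel_size_um
    diameter = (max(ys) - min(ys) + 1) * pixel_size_um
    return length, diameter, length / diameter


# Toy 4 x 8 pixel mask of an elongated cell (hypothetical data).
mask = [
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
]
L, D, DI = deformability_index(mask, pixel_size_um=0.5)
print(f"L = {L} um, D = {D} um, DI = {DI}")
```

Because both L and D are only a few micrometers, an error of a single pixel changes DI noticeably, which illustrates why the measurement requires submicrometer optical accuracy.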

AI to Judge RBCs
To determine the RBC morphology quickly and without bias, machine learning approaches are used nowadays, similar to other aspects in hematology and transfusion medicine [24,84,85]. In the context of RBCs, artificial neural networks and deep learning-based techniques have been used to assess cell phenotypes both in stasis [57,82,86,87,88] and during deformation [71,83,89,90,91]. Kim et al. [82] employed a generative adversarial network to evaluate RBC phenotypes based on phase images obtained by digital holographic microscopy at rest. During storage, they observed a transition from discocytes to echinocytes and, finally, sphero-echinocytes [82]. Recently, Lamoureux et al. [87] proposed a technique to assess the deformability of stored RBCs from brightfield microscopy images. To this end, cells were first sorted in a microfluidic device and subsequently used to train a convolutional neural network (CNN) to classify cell-based image features related to cell deformability. The CNN correctly predicted the deformability of individual RBCs with 81 ± 11% accuracy averaged across ten donors [87]. However, approaches to assess storage-induced changes in RBCs at rest usually require the sample to be dropped or fixed on a glass slide, which often renders the techniques unsuitable for high throughput. Additionally, blood in vivo is under permanent flow, while blood in stasis is not compatible with life. Therefore, recent AI-based approaches aimed at assessing RBC morphologies in microfluidic devices, mimicking in vivo flow conditions. Recently, imaging flow cytometry was combined with a deep-learning-based morphological assessment to provide an objective evaluation of stored RBC quality [83,91]. In these studies, images of single RBCs flowing through a cytometer were recorded. Subsequently, a CNN was trained to categorize cells into six different morphological classes.
This fully supervised model was able to provide a fast and automated morphological assessment with accuracy values comparable to the accuracy between different human experts. However, it was noted that the classification of RBC shapes into discrete bins, by either human experts or AI, could be an inadequate description of a continuous biological process [91]. Therefore, Doan et al. [91] proposed a weakly supervised network that learned the morphological properties of RBCs independently from visual categories previously defined by experts. This network revealed a chronological progression of RBC morphological changes that better predicted RBC quality than the previously employed shape classification system [91].

RBC Quality Measurements in Microcapillaries Using AI
Despite these advances in RBC shape classification under flow using imaging flow cytometers, the dimensions of the flow channel's cross-section in such instruments are generally larger than the RBC size to achieve high throughput. However, this results in a large degree of freedom regarding the cell orientation, rotation, and resulting shapes. To overcome these challenges, microfluidic channels with cross sections on the order of 10 μm were previously used to achieve stable RBC shapes that primarily depend on the cell velocity or channel shear rate [58,92]. In such microcapillaries, healthy RBCs predominantly deform into centered croissant-like shapes (shown in Fig. 4a, top) at low velocities and off-centered slipper-like shapes (shown in Fig. 4a, bottom) at high velocities. Combining these microfluidic measurements with AI-based shape assessment provides new opportunities in RBC flow characterization. Recent advances have been reported for the fast and unbiased classification of healthy RBCs [89], for studying changes during infection (e.g., COVID-19) [93], and for detecting donor-dependent RBC aging curves [22].
Fig. 4. […] Based on these images, standard flow analysis can determine the RBC velocity, projection area, y-position, and deformability index DI. b Histogram and probability density functions (pdf) of the normalized RBC y-position for two donors at week one and week six after storage. c Characteristic healthy, pathological, and other RBC shapes during capillary flow. d AI-based assessment of RBCs during flow. RBC shape phase diagrams, i.e., the fraction of the RBC shape classes shown in c as a function of the RBC velocity for the same donors and points in time as for b. e Comparison of the standard (top) and AI-based (bottom) evaluation of donor-dependent changes in the RBC microcapillary flow behavior. Dashed black lines correspond to linear fits. Gray areas indicate an estimate of a 95% prediction interval. In the standard approach, the y-peak ratio describes the ratio between centered and off-centered RBCs concerning the channel center in the y-direction based on probability density functions (pdfs) at velocities between 7 mm/s and 10 mm/s. For the AI approach, the y-axis of e shows the logarithm of the fraction of pathological RBC shapes based on the shape phase diagrams of d. f Comparison of the coefficient of determination for the linear fits shown in e. Dashed lines connect data points for the same donor analyzed with the standard and the AI approach. This figure was reproduced from data presented by Recktenwald et al. [22].
To highlight the practicability of artificial neural networks and deep-learning-based techniques, we reanalyzed the data presented in [22]. We further compare the possible advantages of this AI-based evaluation with the standard approach reported in [22] that relies on geometric or analytic cell parameters. Figure 4 illustrates the results using standard flow analysis versus AI-based shape classification techniques, evaluating RBC changes of six donors over 10 weeks. In the standard approach, multiple parameters can be extracted from the captured images shown in Figure 4a to assess changes in the RBC capillary flow behavior. These include the cell velocity, which depends on the applied pressure drop in the microfluidic chip, the projection area of the cell, the equilibrium lateral position of the RBC across the channel width, and the deformability index DI, which is calculated based on the geometrical size of the RBC. One standard parameter that has been shown to capture dynamical RBC transitions in capillary flow is the equilibrium cell position across the channel width. Figure 4b showcases the evolution of the normalized cell position distribution for two donors and 2 weeks at cell velocities between 8 mm/s and 10 mm/s. The off-centered peak position is characteristic of the occurrence of slipper-shaped RBCs at such velocities, while other cell shapes preferentially flow at the channel center. Consequently, the ratio between the central and off-centered peak positions was previously used as an analytical parameter in [22] to assess changes in the flow behavior of stored RBCs. However, it was shown that the microcapillary flow behavior is crucially governed by RBC morphology [94]. Hence, RBCs that flow at a similar equilibrium y-position yet exhibit different shapes can have different effects on hemorheology. Therefore, characterizing the cell shape in flow could potentially improve the assessment of RBC quality and flow behavior. 
Such an automated RBC shape classification, using a CNN, is used as a comparison to the standard analytical approach. The CNN can discriminate between various RBC shapes in flow, including specific cell shapes that are characteristic of a specific pathology, e.g., sickle cells or acanthocytes. To assess storage-induced RBC changes, cells are classified as either healthy or pathological shapes, as shown in Figure 4c. The so-called shape phase diagram, i.e., the fraction of RBC shapes as a function of the velocity, is one straightforward parameter that can be used in the AI-based approach to depict the shape dynamics of RBCs in capillary flow [95]. Figure 4d shows representative shape phase diagrams for the same data as in Figure 4b. To assess donor-dependent RBC in vitro aging and predict metadata and prospective changes in a stored sample, data reduction is crucial. Consequently, Figure 4e shows one parameter derived from the standard method (top) and one obtained from the AI-based approach (bottom). In the standard approach, the above-described y-peak ratio based on the probability density functions in Figure 4b is shown in Figure 4e. This quantity overall increases over time for the different donors yet exhibits pronounced scattering, indicated by the broad prediction intervals. For the AI approach, the fraction of pathological RBC shapes based on the shape phase diagrams is employed to evaluate donor-dependent RBC changes (Fig. 4e, bottom). Comparing the standard and the AI approaches, as shown in Figure 4f, we observe an improved estimation of time-dependent change for the AI method, expressed by the R-squared values from the linear regression models in Figure 4e. Given the continuous advancement in the field of microfluidics and AI, new opportunities for assessing RBC quality arise.
Simple, rapid, and unbiased approaches that require only a single instrument, such as the recently reported Erysense [22], instead of a sophisticated cascade of elaborate equipment, will be critical for the future of RBC research and quality control.
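The data reduction used for the AI approach (fraction of pathological shapes per time point, logarithm, linear fit, coefficient of determination) can be sketched as follows. The cell counts are invented for illustration, the base-10 logarithm is an assumption, and the classifier output is replaced by a plain list of labels; this is not the analysis code of [22].

```python
import math


def linear_fit_r2(x, y):
    """Ordinary least-squares line y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def pathological_fraction(labels):
    """Fraction of cells whose predicted shape class is 'pathological'."""
    return sum(1 for s in labels if s == "pathological") / len(labels)


weeks = [1, 2, 3, 4, 5, 6]
# Hypothetical classifier output: pathological shapes per 100 cells and week.
n_pathological = [5, 8, 12, 20, 30, 45]

log_fraction = []
for n in n_pathological:
    labels = ["pathological"] * n + ["healthy"] * (100 - n)
    log_fraction.append(math.log10(pathological_fraction(labels)))

intercept, slope, r2 = linear_fit_r2(weeks, log_fraction)
print(f"slope = {slope:.3f} per week, R^2 = {r2:.3f}")
```

Any scalar parameter of the standard approach, such as the y-peak ratio, could be passed through the same `linear_fit_r2` function, so that the resulting R-squared values of both approaches can be compared donor by donor.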

Future Activities for RBC Quality Control in Transfusion Medicine
We can measure the integrity of individual cells. However, we do need proof of correlation between the parameters we measure in vitro and the behavior of the RBCs in vivo. To this end, transfused RBCs need to be labeled before transfusion. Historically, 51Cr or other radioactive labels have been used, which are nowadays ethically no longer justifiable. However, biotin labeling is readily available as an alternative labeling technology with various advantages [96].
If the relation between in vitro parameters and in vivo RBC survival could be proven, it has the potential to revolutionize the practice of transfusion medicine: the composition of RBC units can be investigated before transfusion or in intervals over the entire storage period. Similar to those depicted in Figure 4, storage lesions could be quantified and RBC units be selected according to the intended treatment (e.g., acute vs. chronic transfusion need). The judgment of RBC quality over time, which is donor-dependent (compare Fig. 4), may also influence the storage time of particular units, which should be defined by the RBC quality and may, in particular cases, be beyond the current administrative regulations. Future developments based on AI may even predict storage quality over time improving the process even further.
Finally, similar to the complementary blood group determination and bedside tests today, the flow properties of RBCs of particular units could be pretested in the plasma of the patients, and such best-match transfusions could be realized without knowing (or determining) all RBC properties. This holds especially since not all transfusion-related properties of RBCs are currently known, as demonstrated by the recent discovery that the ion channel Piezo1 is the determinant of the Er blood group [97,98]. This RBC-plasma matching on minimal sample volumes could be a stairway to personalized transfusion medicine.

Further Applications for AI in Transfusion Medicine
Although intensively discussed, the microfluidic-driven, image-based judgment of RBC quality is not the only possible application of AI in transfusion medicine (e.g., [24,99,100]). However, so far, all these approaches have in common that they are only used at the academic level or in clinical trials without routine application in clinical laboratories. To achieve the latter, great progress needs to be made in teaching the concepts of AI not only to the next generation of scientists but also to the decision-makers of today [101].
In addition to aiding decision-making, AI has been researched for other applications in certain studies such as [102,103]. Sibinga [102] has demonstrated the potential of AI in the prediction of donor behavior and preferences, allowing blood banks (1) to more effectively recruit and retain donors, (2) to match donor blood with recipient blood more efficiently and accurately, (3) to predict blood demand and ensure that the appropriate blood components are available for transfusion, and (4) to predict whether a patient will require a blood transfusion during a surgical procedure, allowing for proactive management of blood product. On the other hand, Meier and Tschoellitsch [103] have explored the use of AI in patient blood management, including its potential in the blood manufacturing processes.

Conflict of Interest Statement
M.L. and S.Q. are employees of Cysmic GmbH, the manufacturer of Erysense, used for data generation presented in this study. In addition, S.R., G.S., S.Q., and L.K. are shareholders of Cysmic GmbH. H.E. and C.W. have no conflicts of interest to declare.

Funding Sources
This work was supported by the European Framework Horizon 2020 under grant agreement number 860436 (EVIDENCE), the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, Grant No. WA 1336/13) and the Saarland University through the program "Anschubfinanzierung". Furthermore, we acknowledge support by the DFG and Saarland University within the funding programme Open Access Publishing.