Key Summary Points

Machine learning algorithms, especially deep learning, will expand the boundaries of echocardiography. They will automate multiple tasks, which will help reduce the burden on the sonographer.

Machine learning algorithms will seamlessly integrate several approaches, including speckle tracking, color Doppler echocardiography, 3D echocardiography, and vector flow mapping. Many of these approaches have been time-consuming or have required extensive expertise. This integration will lead to automated clinical pipelines that can improve cardiovascular diagnosis and management.

Many challenges still exist for the growth of machine learning algorithms. Databases need to be publicly available and data or code sharing needs to be encouraged. Future studies need to incorporate external validation to assess the performance of the algorithm in various cohorts and to observe the reproducibility of values.

Introduction

Artificial intelligence (AI) refers to the utilization of machines to simulate human behavior and execute different actions with minimal involvement or supervision [1]. Machine learning (ML), a branch of AI, can examine the information and lead to data-driven revelations or discoveries [2]. With every passing year, ML is emerging as a powerful tool in multiple industries and is establishing a firm foothold in information technology [3]. ML is also growing in the field of healthcare. AI can simplify workflow and improve the capacity for image interpretation [4].

ML has made significant strides in the fields of radiology and pathology [5]. ML has affected every cardiovascular imaging modality, from image acquisition through to the presentation of findings [6]. In cardiology, echocardiography frequently serves as the first-line modality and diagnostic pillar of cardiovascular imaging [7]. Echocardiography demands not only proper image acquisition but also appropriate interpretation [8]. When physicians reach different conclusions from the same set of images, the interpretation is more subjective than objective. A standardized and reproducible approach is essential in patient care. The application of ML could improve the consistency and reach of echocardiography. Over the last few years, ML has made significant progress in cardiovascular imaging. In this review, we explore the impact of ML on echocardiography.

Potential of Machine Learning in Echocardiography

As pocket ultrasound, wearable devices, and smartphone applications expand the capabilities of cardiovascular imaging, AI and ML algorithms will become strongly intertwined with the future of echocardiography [9]. The data generated by each echocardiographic study are growing rapidly in volume and complexity and could easily overwhelm current statistical software [10]. Integrating ML algorithms into clinical pipelines can provide additional insight and automate multiple clinical tasks (Table 1). ML can improve the user interface, standardize interpretation, connect a variety of parameters in a meaningful way, and facilitate the growth of mobile health (mHealth) [11, 12]. ML algorithms can enable automatic and rapid interpretation of ejection fraction (EF). Besides automated EF, ML algorithms can integrate several diagnostic approaches whose adoption has previously been hindered by multiple factors [8]. These include global longitudinal strain, color Doppler echocardiography, 3D echocardiography, and others, which open new pathways in echocardiography (Fig. 1). Furthermore, ML can help bridge the gap between experienced sonographers and physicians in training, and it can serve as a complementary assistant [13]. In addition, it can identify novel phenotypes in heterogeneous conditions such as aortic stenosis and diastolic heart failure [14], and it can integrate information from genomics and radiomics into large cardiovascular imaging databases [15]. This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Table 1 Application of machine learning in echocardiography
Fig. 1 Role of machine learning in echocardiography

Types of Machine Learning

ML is a broad term that can be subdivided into supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and deep learning [16]. Supervised learning uses labeled data to train a model to perform a task [16]. Unsupervised learning does not rely on predefined labels and finds structure in the data on its own [17]; it is often described as agnostic. Semi-supervised learning is a hybrid approach that incorporates attributes of both supervised and unsupervised learning [18]. Reinforcement learning uses reward criteria to guide its actions [18] and is used less often than the other approaches.

Among all ML algorithms, deep learning is heralded as the most advanced [19]. It is used in voice assistants such as Siri or Alexa and in the self-driving cars developed by Google [5]. Architecturally, a deep learning network is loosely modeled on biological neurons arranged in multiple layers [20]. This layered design is the key difference between deep learning and other algorithms, and it improves object identification and visual recognition. Information is passed from each layer to the next [20]. Recently, several advances in network architecture have expanded the capabilities of deep learning [21, 22], allowing it to analyze information in many different ways.

AI-Augmented Assessment of Ejection Fraction and Cardiac Function

AI-driven automation can expedite and expand various processes in echocardiography. In a landmark study, Ouyang et al. applied a video-based deep learning algorithm that segmented the left ventricle, estimated ejection fraction, and assessed cardiomyopathy, and compared it with human interpretation [23]. The model segmented the left ventricle accurately, with a Dice coefficient of 0.92 (the Dice coefficient measures the level of similarity between two samples). It also predicted ejection fraction (mean absolute error of 6.0%) and classified heart failure with reduced ejection fraction (HFrEF) (area under the curve (AUC) = 0.96) in external databases. The variance of the EF predicted by the algorithm was similar to or smaller than that of human experts. More impressively, the algorithm had higher reproducibility than human experts in real time. More accurate and reproducible values can support better decision-making in critical conditions and improve patient care. This study shows the potential of automated EF and supports the growth of precision medicine.
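The two metrics quoted above, the Dice coefficient for segmentation overlap and the mean absolute error for ejection fraction, can be illustrated with a minimal sketch. This is not the pipeline used in the cited study; the function names and the toy masks and EF values are hypothetical and chosen only to show how the numbers are computed.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks given as flat sequences of 0/1."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired measurements."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical 3x3 left-ventricle masks, flattened row by row.
model_mask = [0, 1, 1, 0, 1, 1, 0, 0, 0]
expert_mask = [0, 1, 1, 0, 1, 0, 0, 0, 0]
print(dice_coefficient(model_mask, expert_mask))

# Hypothetical EF values in percent: model prediction vs. expert reference.
print(mean_absolute_error([55.0, 38.0, 62.0], [60.0, 35.0, 61.0]))
```

A Dice value of 1 means the two masks coincide exactly and 0 means they do not overlap at all, so a reported value of 0.92 indicates close agreement between the predicted and reference contours.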

Similarly, Tromp et al. developed a deep learning workflow that could classify and segment two-dimensional echocardiograms during systolic and diastolic phases in multiple data sets [24]. For the ATTRaCT test set, the deep learning algorithm segmented the left ventricle and left atrium with a mean Dice coefficient greater than 93%. Furthermore, it effectively classified systolic dysfunction (left ventricular ejection fraction < 40%, with an area under the receiver operating characteristic curve (AUC) of 0.9–0.92) and diastolic dysfunction (early diastolic mitral inflow velocity to early diastolic mitral annular tissue velocity (E/e′) ratio > 13, AUC of 0.91–0.91) in the remaining datasets. Notably, the values measured by the deep learning architecture varied less than the values recorded by human experts. As in the study by Ouyang et al., the algorithm performed consistently across multiple external datasets with patients from different countries. Besides automated ejection fraction, their algorithm was able to rapidly assess systolic dysfunction from multiple cardiac Doppler parameters. In practice, color Doppler echocardiography generally requires extensive training and time.
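To make the AUC figures above concrete, the sketch below computes the area under the receiver operating characteristic curve through its pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is an illustration under assumed inputs, not the evaluation code of the cited study; the scores and the labels (1 = reference LVEF < 40%) are made up.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical model scores for systolic dysfunction and reference labels.
model_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
reference_labels = [1, 1, 0, 1, 0]
print(auc(model_scores, reference_labels))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values around 0.9 are read as strong discrimination.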

Howard et al. compared the performance of a deep learning network with expert consensus for assessing key characteristics in the echocardiogram [25]. The intraclass correlation coefficient (ICC) between AI and the expert consensus was 0.926. The algorithm was also compared with human interpretation for interventricular septum thickness (ICC = 0.809) and posterior wall thickness (ICC = 0.535). This is one of the first studies to create an online expert consensus panel drawn from across an entire country (United Kingdom) to serve as a reference standard for training and validating an ML algorithm. Furthermore, the spread of the expert measurements can define a realistic performance range to guide the development of the ML algorithm. Unlike many studies in which the algorithm focused on segmentation of various areas, this algorithm focused on identifying key points as the primary network target. By choosing key points, the algorithm mirrors the behavior of a typical sonographer.

Role of AI in Other Aspects of Echocardiography

We next explore notable studies that apply ML algorithms to other aspects of echocardiography. Hughes et al. evaluated whether deep learning interpretation of echocardiograms could estimate biomarkers such as hemoglobin, B-type natriuretic peptide (BNP), troponin I, and blood urea nitrogen (BUN), and thereby assess certain physiological states [26]. The algorithm was derived from the publicly available EchoNet database. In the Cedars-Sinai data set, the deep learning framework detected anemia (AUC = 0.80), elevated BNP (AUC = 0.75), and elevated troponin I (AUC = 0.69). This study emphasizes the capability of deep learning algorithms to provide additional clinical insight and phenotypic information. The findings are in accordance with a growing consensus that deep learning assessment of cardiovascular imaging can correlate with systemic physiology.

Duffy et al. assessed the accuracy of deep learning for quantifying ventricular hypertrophy in patients with hypertrophic cardiomyopathy and cardiac amyloidosis [27]. The algorithm accurately measured intraventricular wall thickness and left ventricular diameter, and classified cardiac amyloidosis (AUC = 0.83) and hypertrophic cardiomyopathy (AUC = 0.98). In external validation, the algorithm had a similar performance in detecting cardiac amyloidosis (AUC = 0.79) and hypertrophic cardiomyopathy (AUC = 0.89). The algorithm measured left ventricular wall thickness and diameter within the variance of human experts. In addition, it was capable of detecting subtle ventricular changes or phenotypes that could prove difficult for human readers. If this approach is adopted into clinical workflow, the algorithm can raise the sonographer's index of suspicion for several underrecognized conditions.

Salte et al. explored the role of deep learning and ML algorithms for fully automated measurement of global longitudinal strain (GLS) in 200 patients [28]. The algorithm performed GLS measurement, automatic segmentation, and motion estimation across a wide variety of cardiac pathologies rapidly and with good accuracy (GLS was −12.0 ± 4.1% for the AI method and −13.5 ± 5.3% for the reference method). This is the first study to describe a fully automated deep learning pipeline that calculates GLS in real time. The algorithm performed the calculations within seconds, which is substantially faster than current clinical software. If implemented in ultrasound hardware, GLS could be regularly integrated into clinical practice.
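For readers less familiar with strain, GLS is conventionally expressed as the percentage change in myocardial length between end-diastole and end-systole, so a shortening ventricle yields a negative value. The sketch below applies that conventional definition to hypothetical contour lengths; it is not the automated pipeline of the cited study, and averaging over three apical views is a simplifying assumption made here for illustration.

```python
def longitudinal_strain_percent(length_ed_mm, length_es_mm):
    """Strain (%) = 100 * (L_es - L_ed) / L_ed; negative when the wall shortens."""
    if length_ed_mm <= 0:
        raise ValueError("end-diastolic length must be positive")
    return 100.0 * (length_es_mm - length_ed_mm) / length_ed_mm


def global_longitudinal_strain(views):
    """Average strain over views; `views` maps a view name to (L_ed, L_es) in mm."""
    if not views:
        raise ValueError("at least one view is required")
    strains = [longitudinal_strain_percent(ed, es) for ed, es in views.values()]
    return sum(strains) / len(strains)


# Hypothetical myocardial contour lengths for three apical views.
example_views = {
    "A4C": (100.0, 88.0),
    "A2C": (100.0, 86.0),
    "A3C": (100.0, 87.0),
}
print(global_longitudinal_strain(example_views))
```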

AI-Driven Identification of New Phenotypes in Aortic Stenosis

ML algorithms can isolate new phenotypes in cardiac conditions, which helps characterize heterogeneous diseases. Sengupta et al. explored the potential of a supervised ML algorithm to stratify patients with aortic stenosis (AS) into high-severity and low-severity phenotypes based on echocardiographic parameters [29]. The phenotypes were then compared with markers of disease severity on computed tomography (CT) and cardiac magnetic resonance (CMR) imaging and with major outcomes, including aortic valve replacement (AVR) and mortality. The ML model classified 1117 (57%) patients as having high-severity AS and 847 (43%) as having low-severity AS. In addition, the ML-derived classification improved discrimination (integrated discrimination improvement: 0.07, confidence interval (CI) 0.02–0.12) and reclassification (net reclassification improvement: 0.17, CI 0.11–0.23) for AVR at 5 years. Tokodi et al. assessed an unsupervised learning framework in 866 patients, using echocardiographic attributes of the left ventricle to predict major adverse cardiac events (MACE) [30]. A loop subdivided patients into four unique categories, and the Kaplan–Meier curves demonstrated significant differences in mortality and MACE (both p < 0.001).

AI-Driven Identification of New Phenotypes in Congestive Heart Failure and Other Conditions

Similarly, AI algorithms can be used to further differentiate heart failure. Pandey et al. developed a deep learning model that incorporated multidimensional echocardiographic parameters to identify subtypes in patients with heart failure with preserved ejection fraction (HFpEF) [31]. The algorithm isolated high- and low-risk phenotypes. The performance of the ML-derived model was tested against two external cohorts. For predicting elevated left ventricular filling pressures in patients with HFpEF, it achieved a higher area under the receiver operating characteristic (ROC) curve than the 2016 American Society of Echocardiography (ASE) guidelines (0.88 vs. 0.67; p = 0.01). The high-risk phenotype had significantly higher rates of hospitalization and cardiac death (hazard ratio, HR = 1.92, p = 0.01) in the TOPCAT cohort. Cho et al. examined an ML algorithm for phenotyping 297 patients using multidimensional left ventricular parameters derived from left ventricular speckle tracking, vector flow mapping, and left ventricular measurements [32]. The algorithm isolated four unique clusters. Cluster IV had a higher prevalence of stage C or D heart failure (78%, p < 0.001) and an elevated incidence of MACE (p < 0.001). Mishra et al. applied unsupervised clustering with 15 echocardiographic variables to 1000 heart failure patients with coronary artery disease (CAD) and identified four phenogroups [33]. Phenogroup 1 was associated with an elevated risk (HR = 4.8) of heart failure hospitalization. Similarly, Segar et al. applied unsupervised clustering to 654 patients with echocardiographic information in the TOPCAT cohort and identified three phenogroups [34]. Phenogroup 1 had a substantially higher incidence of adverse events, including all-cause mortality and heart failure hospitalization.
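As a rough illustration of how unsupervised clustering groups patients without outcome labels, the sketch below runs a plain k-means on two hypothetical, already standardized echocardiographic features. The studies above may have used different clustering methods and many more variables; k-means, the seeding with the first k points, and the toy data are all assumptions made here for clarity.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assign every patient to the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Move each centroid to the mean of the patients assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Hypothetical standardized feature pairs for six patients.
patients = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
phenogroups, centers = kmeans(patients, k=2)
print(phenogroups)
```

In a phenotyping study the resulting groups would then be compared on outcomes such as hospitalization or mortality, which is the step that gives the clusters clinical meaning.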

External Validation

There is considerable variance in human expert assessment of the ejection fraction [35]. This can be attributed to irregularities in the heart rate and to the time-consuming nature of manually tracing the ventricle on each beat [36]. If beat-to-beat variation is present, the American Society of Echocardiography and the European Association of Cardiovascular Imaging recommend averaging tracings from five consecutive beats [36, 37]. In practice, the ejection fraction is often measured from a single beat, which explains the high inter-observer variation and limited precision in day-to-day practice [37]. Choosing one beat is subjective and does not follow the guidelines. This becomes a particular issue in patients with borderline ejection fraction, where less accurate reporting is linked with more complications and morbidity [38, 39]. Deep learning algorithms provide an opportunity for rapid segmentation and repeated evaluation across multiple beats.

For ML algorithms to have a meaningful role in the clinical environment, they must be tested with the same rigor that is applied to human experts [40]. To date, there have been multiple ML studies assessing ejection fraction, systolic function, and view classification. Many were tested on private databases, their results were not replicable, and they lacked interpretability. Most of these algorithms were never validated in an external cohort [41]. Testing algorithms in different populations provides insight into how they would perform if adopted in clinical practice. Moreover, medical teams can adjust or improve the algorithms if the results are unsatisfactory. With more nationwide collaboration between experts, a uniform reference standard can be created to serve as a benchmark for training ML algorithms. More importantly, such a standard provides the foundation for automated workflows in medical management.
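The effect of averaging over several beats can be shown with the standard volumetric definition of ejection fraction, EF = (EDV − ESV) / EDV × 100. The sketch below is illustrative only: the per-beat volumes are hypothetical, and the function names are not taken from any guideline or software package.

```python
from statistics import mean


def ejection_fraction(edv_ml, esv_ml):
    """EF (%) from end-diastolic and end-systolic volumes in millilitres."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("volumes must satisfy 0 <= ESV <= EDV and EDV > 0")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def averaged_ef(beats, n_beats=5):
    """Mean EF over the first `n_beats` consecutive beats of (EDV, ESV) pairs."""
    if len(beats) < n_beats:
        raise ValueError("not enough beats to average")
    return mean(ejection_fraction(edv, esv) for edv, esv in beats[:n_beats])


# Hypothetical (EDV, ESV) pairs for five consecutive beats with an irregular rhythm.
beats = [(120.0, 60.0), (100.0, 45.0), (110.0, 55.0), (125.0, 50.0), (100.0, 40.0)]
print([ejection_fraction(edv, esv) for edv, esv in beats])  # single-beat values
print(averaged_ef(beats))                                   # five-beat average
```

In this toy example the single-beat values span ten percentage points, so the reported EF depends heavily on which beat is chosen, whereas the five-beat average is a single, more stable figure.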

Contemporary Views on Machine Learning

Echocardiography serves as the primary diagnostic modality and is fundamental in cardiovascular diagnosis and management. Several important metrics that reflect myocardial function include left ventricular ejection fraction, left ventricular end-systolic volume, and left ventricular end-diastolic volume [42]. In a single echocardiogram, there are numerous acquisition angles and views that could offer multiple interpretations of the underlying anatomy of the heart [8], and interobserver variability is substantial. Because quantification and interpretation differ significantly between readers, ML approaches can help standardize these values and improve their reproducibility. However, initial deep learning studies based on manually selected images of systole and diastole differed considerably from human expert evaluation [43, 44]. With the emergence of the landmark EchoNet database, deep learning algorithms trained on labeled echocardiogram videos can predict ejection fraction with accuracy comparable or superior to that of human experts and with high reproducibility. Because the database is publicly available, other academic centers have been able to apply transfer learning from it to train their own deep learning algorithms with smaller sample sizes and still produce accurate results. Previously, ML algorithms focused primarily on the automation of tasks. Now these algorithms are beginning to obtain specific cardiac measurements during systole or diastole. Most patients with heart failure have a mildly reduced or preserved ejection fraction, and proper evaluation is necessary across the spectrum of cardiac disease. Furthermore, heart failure with preserved ejection fraction can be particularly difficult to identify, and structural changes associated with elevated left ventricular pressure can be difficult to detect on echocardiograms [45]. These advances will eventually lead to fully automated ML pipelines in the clinical setting, which will facilitate rapid and accurate assessment of ejection fraction and other parameters in real time and thereby improve patient care.

Multiple prominent echocardiographic approaches, such as speckle tracking, tissue Doppler echocardiography, vector flow mapping, 3D echocardiography, and stress echocardiography, can be increasingly integrated into clinical practice with the application of ML algorithms, especially deep learning. Many of the difficulties associated with these procedures can be overcome through ML frameworks. The time and expertise required for 3D echocardiography have been a barrier to widespread acceptance. Philips HeartModel, which utilizes artificial intelligence, enables chamber quantification, including 3D ejection fraction and left ventricular and atrial volumes [46]. After proper training, these algorithms can automate a number of repetitive tasks and perform them faster than their human counterparts. A few commercially available applications use AI to measure strain, such as GE Automated Functional Imaging (AFI) or EchoPAC™, which allow quantitative analysis of global and longitudinal strain throughout the entire cardiac cycle [47]. However, these approaches are semi-automatic, require several steps of operator input, and generally take 5–10 min. Based on positive results for speckle tracking, deep learning can remove the need for manual tracing and could soon be implemented in ultrasound machines [28]. Deep learning algorithms can perform these measurements in less than a minute and do not require human input to classify cardiac views correctly or to time cardiac events. This will enable automated pipelines that can calculate global longitudinal strain (GLS) in real time. Furthermore, automated workflows can expedite and broaden access worldwide, augment the quality of care, and reduce costs for multiple cardiac conditions.

Current guidelines do not place significant weight on the heterogeneity of cardiac conditions such as CAD, congestive heart failure, and aortic stenosis [10]. These conditions arise from a complex interplay of genetic, molecular, and pathological factors [10], which underlines the importance of unsupervised clustering and deep learning approaches. With these approaches, we can uncover new phenotypes or variants within these intricate conditions [48]. Genomics is being incorporated into clinical databases to form pan-genomic databases [49]. By combining echocardiographic and genomic variables, ML algorithms can identify new patterns or improve current clinical risk stratification and classification. In addition, there are prospects for new biomarkers or drugs. The correlation between medical imaging and biomarkers is vastly underexplored [50]. It remains unclear whether common cardiovascular imaging approaches such as echocardiography can broadly approximate biomarker values and provide greater insight into a patient’s underlying condition [25]. Deep learning analysis of echocardiograms adds neither radiation exposure nor substantial cost, and such pipelines may reduce the need for invasive testing. Growing advances in radiomics allow greater extraction of features from imaging, which can help further differentiate clinically similar conditions [49]. Together, these approaches allow us to direct resources towards innovative, phenotype-driven or individualized management and to improve our understanding of complex heterogeneous entities in cardiology.

Though significant strides are being made in clinical phenotyping and genetic sequencing, several cardiac conditions, such as hypertrophic cardiomyopathy or cardiac sarcoidosis, are frequently misdiagnosed or underdiagnosed [12]. This can be attributed to similar morphological attributes, which can be difficult to distinguish on echocardiography [51]. Though the degree of left ventricular thickness is often a prominent prognostic sign, it can be difficult to quantify because of high inter-observer variability [27]. Deep learning algorithms have advanced to the point where they can identify phenotypes and characteristics not perceived by human experts. ML algorithms can accurately delineate subtle variations in left ventricular wall geometry and thickness. This will lead to the development of automated workflows that enable precise diagnosis of multiple cardiac conditions.

Smart devices and mobile applications are deeply ingrained in our daily lifestyles [9]. As deep learning algorithms continue to evolve alongside handheld echocardiographic devices, they will cause a paradigm shift in cardiovascular diagnosis and management. These algorithms will reduce repetitive tasks and interobserver and intraobserver variability. AI software embedded in ultrasound machines will improve access to cardiac imaging in underserved areas and in regions that lack clinical expertise [1, 11]. Deep learning algorithms will expedite the growth of smart clinics in the near future, which will be especially important in resource-deficient areas [52].

Challenges of ML Algorithms in Echocardiography

Although ML frameworks offer many opportunities, their performance depends on the size and complexity of the underlying data. If ML algorithms are trained on small datasets, they can produce “false discoveries” that mislead clinicians or affect patient care [16]. Medical teams that are new to ML need to be aware of this risk and develop their algorithms responsibly. Except for a few databases, such as EchoNet, most databases in echocardiography are not publicly available [19]. Building or obtaining datasets can therefore pose significant difficulties for smaller institutions. Multiple institutional review board approvals are required, which can be both time-consuming and labor-intensive [53]. Creating large datasets may also be limited by financial constraints. Data-sharing and code-sharing need to be developed across institutions to promote the widespread growth of ML.

Although ML and AI are promising and enticing for providers, they are far from perfect. The “black box” nature of AI and ML is one of the potential pitfalls of these algorithms and needs to be addressed carefully by medical teams. AI is not programmed with ethics and can be susceptible to various biases [4, 17]. Medical providers need to have a clear understanding of the algorithm and the project goal. This requires frequent meetings between the investigators and the engineering team to develop and train ML algorithms effectively [54] (Fig. 2). Furthermore, for ML to prosper in the medical field, the fundamentals of ML need to be introduced during medical education, residency, and fellowship [4]. This will prepare future investigators to fully utilize the capabilities of these algorithms.

Fig. 2 Steps in developing a proper ML algorithm for a medical team. CV cross-validation, GPL General Public License, LOOCV leave-one-out cross-validation, ML machine learning, S/W software. (Permission obtained for the figure for use in our publication)

Conclusions

As we move forward in this digital era, ML algorithms will create new pathways and expand the boundaries of echocardiography. These algorithms will serve as valuable digital companions that offer additional diagnostic input and automate several processes. The future of precision medicine is becoming increasingly evident with the growth of ML algorithms.