Deep learning applications in myocardial perfusion imaging, a systematic review and meta-analysis

Background Coronary artery disease (CAD) is a leading cause of death worldwide, and the diagnostic process comprises of invasive testing with coronary angiography and non-invasive imaging, in addition to history, clinical examination, and electrocardiography (ECG). A highly accurate assessment of CAD lies in perfusion imaging which is performed by myocardial perfusion scintigraphy (MPS) and magnetic resonance imaging (stress CMR). Recently deep learning has been increasingly applied on perfusion imaging for better understanding of the diagnosis, safety, and outcome of CAD. The aim of this review is to summarise the evidence behind deep learning applications in myocardial perfusion imaging. Methods A systematic search was performed on MEDLINE and EMBASE databases, from database inception until September 29, 2020. This included all clinical studies focusing on deep learning applications and myocardial perfusion imaging, and excluded competition conference papers, simulation and animal studies, and studies which used perfusion imaging as a variable with different focus. This was followed by review of abstracts and full texts. A meta-analysis was performed on a subgroup of studies which looked at perfusion images classification. A summary receiver-operating curve (SROC) was used to compare the performance of different models, and area under the curve (AUC) was reported. Effect size, risk of bias and heterogeneity were tested. Results 46 studies in total were identified, the majority were MPS studies (76%). The most common neural network was convolutional neural network (CNN) (41%). 13 studies (28%) looked at perfusion imaging classification using MPS, the pooled diagnostic accuracy showed AUC = 0.859. The summary receiver operating curve (SROC) comparison showed superior performance of CNN (AUC = 0.894) compared to MLP (AUC = 0.848). The funnel plot was asymmetrical, and the effect size was significantly different with p value < 0.001, indicating small studies effect and possible publication bias. There was no significant heterogeneity amongst studies according to Q test (p = 0.2184). Conclusion Deep learning has shown promise to improve myocardial perfusion imaging diagnostic accuracy, prediction of patients’ events and safety. More research is required in clinical applications, to achieve better care for patients with known or suspected CAD.


Background
Coronary artery disease (CAD) continues to be a major cause of death and hospitalisation worldwide including in high-income countries [1]. The main underlying pathology lies in the progressive nature of coronary atherosclerotic process. Therefore, timely diagnosis to aid management of patients with CAD has significant impact on both morbidity and mortality.
There have been significant advancements in CAD imaging in the last two decades, from anatomical imaging of the coronary tree by means of invasive x-ray coronary angiography and cardiac computed tomography (CCTA), to functional assessment of coronary stenoses and their impact on the myocardium both at rest and stress (physical or pharmacological), using stress echocardiography, nuclear myocardial perfusion scanning (MPS), and stress perfusion cardiac magnetic resonance (CMR). Myocardial perfusion abnormalities are one of the early stages in the ischaemic cascade and ischaemic constellation, which also includes angina symptoms, electrocardiographic (ECG) changes and ventricular wall motion abnormalities [2].
Another exciting advancement has been made in computer vision technology following the revolution of neural networks and artificial intelligence (AI) algorithms. Deep learning is the main subfield of AI which has been the focus of computing in medical imaging, with cardiovascular imaging being one of the common arenas for such novel applications. Cardiac perfusion imaging is one of the main applications which has been studied by many deep learning practitioners and computer vision experts.
One of the key aspects of deep learning is that it allows automation of clinical tasks, and thus reduces dependence on users. This has significant advantage in perfusion imaging interpretation given that the diagnostic accuracy of visual assessment by users is highly dependent on level of training, and previously it has been demonstrated that automated quantitative analysis performed similar to highly trained users (level 3) in interpreting perfusion CMR imaging [3].

Rationale and objectives
There is mounting evidence of the successful applications of deep learning in cardiac perfusion imaging, as demonstrated by the increasing number of publications. Moreover, data derived from medical imaging can be integrated into specific machine learning approaches to provide valuable information for the prediction of different outcomes by exploring new correlations between variables and clinical data to build predictive models.
As a result, it is becoming increasingly important that the current literature and evidence behind deep learning applications in myocardial perfusion imaging needs further evaluation, as well as recommendations of how to fine-tune the research towards more meaningful results for patients.
Therefore, the objective of this review is to determine the diagnostic accuracy of cardiac perfusion imaging using deep learning algorithms, the impact of deep learning on image quality, image safety, and the assessment of its prognostic value.

Design
The umbrella protocol for this systematic review is registered in the International Prospective Register of Systematic Reviews (PROSPERO, CRD42020204164), and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. This review follows the Cochrane Review structure of Diagnostic Test Accuracy (DTA) [4]. All searching activities were performed by two independent authors (EA and UD), with divergences solved after consensus.
The main review question was determined using the PICO approach: • Population: patients with suspected or known coronary artery disease (CAD) • Intervention: deep learning applications in CAD perfusion imaging • Comparison: comparison with conventional CAD imaging • Outcome: improve test accuracy and patient care

Selection criteria
Selection criteria decision was made by one author (EA) and overread by a senior author (AC), with disagreement resolved after consensus. Both prospective and retrospective studies were included with no restrictions based on minimal sample sizes or recruitment process. The analysis focused on participants with known or suspected CAD who had a perfusion imaging modality with the application of deep learning. Comparison was made with the standard imaging tests used in clinical practice to identify the functional significance of coronary artery lesions (index test). A clinical reference standard is used for both techniques (reference test) which is considered the gold standard.
Medical imaging techniques presented in conferences as part of challenges, such as Medical Image Computing and Computer Assisted Intervention (MICCAI), simulation studies and animal studies were not included due to the ambiguity in their direct relation to patient care. Given that the main scope of this review is on the direct application of deep learning on myocardial perfusion imaging, studies which used perfusion data as an input variable for prediction without deep learning image applications were excluded. As there are numerous studies of left ventricular segmentation using deep learning, these studies were not included unless they form the basis for perfusion quantification. Finally, studies of automated perfusion quantification which relied mainly on hand crafted algorithms or non-deep learning algorithms such as principle component analysis (PCA) were not included.  September 29, 2020 with no language constraints. Full Ovid search strategy and output is shown in Appendix [1]. No routine use of methodology search filters has been used due to reports of missing relevant studies and inconsistency [4].

Search procedure
To avoid publication bias and give currency to this systematic review with upcoming research, the grey literature also has been searched. This includes: • Web of Science Conference Proceedings.
• Open Grey database.
• Manual searching of references

Data extraction
The extracted summary estimates included imaging modality performance after the application of deep learning (sensitivity, specificity, and area under the curve (AUC)). The sample size of each study with the imaging modality used and deep learning techniques were all reported.
The following is a summary of input data which were reported from each study in the review:

Statistical analysis
The diagnostic accuracy of the imaging modalities was measured mainly with specificity and sensitivity analyses and presented as forest plots. Data were reported as count or percentages.
Given that most studies did not report the values for true positive (TP), false positive (FP), false negative (FN), and true negative (TN), a confusion matrix was generated for each of the included studies in the meta-analysis by taking sample size (S) to calculate FN using sensitivity and FP using specificity. This was followed by subtracting FN from S to calculate TN and FP from S to calculate TP.
Although different studies reported different perfusion interpretation scales in MPS imaging, there were 2 common scales: a binary scale of normal vs abnormal, and an ordinal scale from 0 to 4. Both scaling methods are considered similar given that the ordinal scale would group 0 and 1 as normal, and 2,3 and 4 as abnormal. The ground truth finding from the reference test was considered the threshold for the summary receiver-operating curve (SROC) with bivariate diagnostic random effects meta-analysis with logit-transformed pairs of sensitivities and false positive rates method. Two SROC plots were performed using linear mixed model to compare convolutional neural network (CNN) performance against multi-layer perceptron (MLP). Publication bias and effect size derived from each study accuracy compared to mean accuracy was tested using funnel plot and Egger's test. P value of less than 0.05 was considered significant. Heterogeneity was examined using tau 2 , I 2 and Q tests.
All statistical analysis was performed using RStudio software Version April 1, 1106 using R programming language version 4.0.4, "mada" and "meta" packages were used for meta-analysis.

Search results
715 study entries from the published literature Ovid search, and 432 entries from grey literature were identified. After the screening of titles and duplicate selection, 320 studies were included in the initial analysis for which titles indicated that the study might be of relevance. Following full text review, 46 studies were included in the systematic review, of  which 13 studies were included in the meta-analysis. The selection procedure and results with reasons for exclusion in the full text assessment is illustrated in Fig. 1.

Characteristics of studies
The final number of studies included in this systematic review was 46, details of first author, year of publication, model output, sample size, machine learning and deep learning techniques, index test (comparator), and reference test (gold standard) are all given in Table 1.
The majority of the studies were performed on MPS (76%). However, the number of studies in CMR has increased in the last 2 years, as shown in Fig. 2.
The most common neural network architecture in early years was MLP (35%), which has been dominated by CNN (41%) in recent years, as shown in Fig. 3.

Meta-analysis of perfusion classification
There were several studies which applied deep learning directly to segment and classify perfusion imaging maps with various classes, most of those studies were based on MPS imaging.
A meta-analysis was performed on 13 studies where the output of the classifier was based on perfusion maps segmentation and referenced to the presence or absence of significant CAD based on invasive coronary angiography or consensus of expert readers of MPS, a summary of their corresponding sensitivity and specificity is depicted in the coupled forest plot Fig. 4. The plot shows good performance of the neural networks with most studies reporting sensitivity and specificity of over 65%.
When comparing the performance of MLP with CNN across these studies using the summary receiver operating curve (SROC), CNN showed a higher value of SROC (higher sensitivity, lower false positive rate) with area under the curve (AUC) of 0.894, compared to MLP (AUC = 0.848), as showed in Fig. 5. The overall pooled AUC including all 13 studies was averaged at 0.859, showing good performance.

Assessment of heterogeneity
Quantifying heterogeneity showed τ 2 = 0.0037 with confidence interval [0.0000, 0.0295], which contains zero, indicating no significant between-study heterogeneity exists in our data. I 2 was found to be 22.3%, meaning that less than quarter of the variation in our data is estimated to stem from true effect size differences. Using literature "rule of thumb", we can characterize this amount of heterogeneity as mild.
Predictive interval was found to be ranging from [0.8569 to 1.1645], meaning that it is possible that some future studies will likely find positive effect based on the present evidence.
Finally, the reported p value for Q test was found to be above significance level (p = 0.2184), meaning there is no significant heterogeneity.

Assessment of risk of bias
The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool. A modified version was adapted and five main fields were assessed: Taking all the above into consideration, a table of the included studies with their associated risk of bias is shown in Appendix [2]. There were 13 studies (28%) which did not include an index test to compare with the machine learning or deep learning model before comparing with the reference test (ground truth). All studies defined a ground truth test against which they tested the model performance, and the majority of the studies blinded the model reporters from the ground truth results. This indicates the high reliability of the reported results.
Funnel plot in Fig. 6 shows asymmetrical pattern indicating smallstudy effects. Egger's test was significant with p value < 0.001, indicates that the data in the funnel plot are indeed asymmetrical, and possibly related to publication bias.

Deep learning techniques
The application of deep learning models on myocardial perfusion imaging started in the early 1990s, where all the studies were focused on the use of MLP architecture and applied on MPS [5][6][7][8][9][10]. MLPs are composed of three types of layers: input layers taking the raw image data, hidden layers which are connected via weight vectors, and an output layer which takes the weighted sum, applies an output function and returns a prediction [51]. The main output prediction of interest in early studies was perfusion map classification, this has continued in early 2000s, when the performance of MLP was also compared to other traditional linear and non-linear machine learning algorithms, such as K-nearest neighbours (KNN) [13] and support vector machines (SVM) [17]. Most of the networks achieved high performance metrics when There has been a substantial increase in the number of publications on deep learning in general with more focus on CNN over the last few years, as shown in Fig. 3. Due to the high dimensionality of imaging data, the fully connected layers which MLPs are based on put a significant limitation on the size of the model available to learn image features. The CNN overcomes this challenge by using convolutional layers which have significantly fewer parameters and make use of extensive weight sharing. The process of several convolutional layers can be thought in the following steps: detect low level features and edges  from raw pixel data in the early layers, use these edges to detect shapes in the later layers, and use these shapes to detect higher-level features for prediction. An additional useful property of CNNs is that they lend themselves well to be used with transfer learning where the majority of the network is kept with its high-level feature extraction ability and only the last output layer is exchanged with a new layer to fit with the purpose of the study [51]. As a result, the majority of deep learning studies on perfusion imaging in the last few years have used CNNs as the main architecture, as shown in Fig. 3. Furthermore, the power and flexibility of CNNs has opened the window for deep learning applications in more challenging image analysis domains such as stress perfusion CMR [23,27,30,31], resting CT perfusion (rCTP), and myocardial contrast echocardiography (MCE) [22].

Summary of main results
The performance of neural networks for the identification of perfusion defects has proven to have a comparative performance to human expert reading, and had a strong overall accuracy in MPS studies (AUC 0.859) regardless of the comparator or reference tests. The metaanalysis presented in this review also shows the superior performance of CNNs compared to MLPs in reading and classification of perfusion maps.
The applications of deep learning on stress perfusion CMR has been increasing in recent years. There are some promising data on the effectiveness of using deep learning with CNNs to the pre-processing stage of perfusion quantification in CMR by automated identification of anatomical landmarks, such as the right ventricle (RV) insertion point into the septum and left ventricle (LV) centre on peak contrast enhancement [23,31,43]. Furthermore, CNN algorithms have been successfully applied to the segmentation of CMR perfusion images [27,30] with high performance. These applications in CMR still require further research. Another exciting application of CNN is on k-space acceleration and reconstruction for faster perfusion images acquisition [33] which is another attractive research application for deep learning. Another novel application on the horizon is using deep learning for predicting myocardial blood flow in perfusion CMR using physics-informed neural networks (PINNs) [52,53].
Other successful applications of deep learning include prediction, whether looking at prediction of death or myocardial infarction [39,47], prediction of revascularisation events [46], or prediction of high-quality full radiation dose MPS images from low dose or short time scans which has significant impact on the radiation dose delivered to the patients [36,40,41], and the identification of acquisition sequence type and image plane in CMR [54].
Furthermore, there are newer deep learning techniques using generative adversarial network (GAN) which have some promising application for image reconstruction, but this is still an active area of research.

Applicability of findings to review question
Given the evidence of multiple successful applications of deep learning on perfusion imaging presented in this review, the value of this evidence, although significant, remains in research applications with limited clinical use. A wider use of such applications based on the evidence presented could have significant impact on patients with known or suspected CAD. Reducing scan time, radiation dose, human resources and increasing diagnostic accuracy can save patients time, and result in better management of their coronary artery disease which has significant mortality and morbidity benefits.

Limitations
There are more applications and techniques which have been used without full publications of clinical studies and were not reported in this review, given that the main scope is clinical applications of deep learning in perfusion imaging.
The published articles included in this review did not report the same performance metrics, which was a challenge on the meta-analysis process, one of the main observations was that the performance metrics were reported in some studies for both stress and rest images, but not in others. As a result, only the highest performance score of the models for the stress images was reported in this meta-analysis.

Implications for practice
In this review the evidence of successful deep learning applications in myocardial perfusion imaging has been presented. Most of the early studies used the standard MLP perceptron architecture on MPS imaging, but more recently CNN architectures gained in popularity given its superior performance in image analysis, and deep learning applications have expanded to other perfusion imaging modalities, mainly stress perfusion CMR. The accuracy of deep learning has proven to be high in perfusion image classification to diagnose CAD compared to human readers and conventional diagnostic procedures performed in routine clinical practice, based on our meta-analysis of the relevant studies.

Implications for research
The successful preliminary applications of deep learning in stress perfusion CMR have opened a wide spectrum of potential applications to improve accuracy, accelerate scan times, and predict outcomes. Despite the high performance of deep learning in MPS image classification, which have shown promise for more than two decades, there is still a lack of wide use in clinical practice.
As a result, the findings of this review would encourage more clinical studies and trials to assess the performance and accuracy of deep learning in cardiac perfusion imaging using the latest techniques, in order to obtain clinical validation and to start to use this technology as clinical applications in perfusion imaging. Furthermore, other perfusion imaging modalities which are still in their infancy, such as rCTP and MCE, can also benefit from deep learning applications.

Funding
This research received no grant from any funding agency in the public, commercial or not-for-profit sectors. This research has received grant from Wellcome Trust [222678/Z/21/Z].

Authors contributions
EA and UD have performed the systematic search, data extraction and writing the manuscript. EA and CS have contributed to data analysis and statistics. EA and AC have contributed to discussion and conclusion. AC has contributed to final proof-reading as a senior author. All authors have reviewed and approved the submission.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.