The effect of initial teaching on evaluation of left ventricular volumes by cardiovascular magnetic resonance imaging: comparison between complete and intermediate beginners and experienced observers

Background: High reproducibility and low intra- and interobserver variability are important strengths of cardiac magnetic resonance (CMR). In clinical practice, however, a significant learning curve may be observed. Basic CMR courses offer an average of 1.4 h dedicated to lecturing and demonstrating left ventricular (LV) function analysis. The purpose of this study was to evaluate the effect of initial teaching on complete and intermediate beginners’ quantitative measurements of LV volumes and function by CMR.
Methods: Standard clinical cine CMR sequences were acquired in 15 patients. Five observers (two complete beginners, one intermediate, two experienced) measured LV volumes. Before initial evaluation, beginners read the SCMR guidelines on CMR analysis. After initial evaluation, beginners participated in a two-hour teaching session including cases and hands-on training, representative for most basic CMR courses, after which it is uncertain to what extent different centres provide continued teaching and feedback in-house. Delineations were assessed by the Dice Similarity Coefficient (DSC). Agreement, accuracy, precision, repeatability and reliability were assessed by Bland-Altman, coefficient of variation, and intraclass correlation coefficient methods.
Results: Endocardial DSC improved after teaching (+0.14 ± 0.17; p < 0.001) for complete beginners. Low intraobserver variability was found before and after teaching, however with wide limits of agreement. Beginners underestimated volumes by up to 44 ml (EDV), 27 ml (ESV) and overestimated LVM by up to 53 g before teaching, improving to an underestimation of up to 9 ml (EDV), 7 ml (ESV) and an overestimation of up to 30 g (LVM) after teaching. For the intermediate beginner, however, accuracy was quite high already before teaching.
Conclusions: Initial teaching to complete beginners increases accuracy for assessment of LV volumes, however with high bias and low precision even after standardised teaching as offered in most basic CMR courses. Even though the intermediate beginner showed quite high accuracy already before teaching, precision did generally not improve after standardised teaching. To maintain CMR as a technique known for high accuracy and reproducibility and low intra- and inter-observer variability for quantitative measurements, internationally standardised training should be encouraged including high-quality feedback mechanisms. Objective measurements of training methods, training duration and, above all, quality of assessments are required.


Keywords: CMR, Teaching, Cine, Function, Beginners

Background
High reproducibility and low intra- and interobserver variability of cardiac magnetic resonance (CMR) assessments of quantitative data are important strengths of this technique in clinical practice and in its role as an endpoint in research studies. Although CMR has been demonstrated to be a highly accurate imaging technique for measuring left ventricular (LV) volumes [1], it has been noted that a significant learning curve may be observed in clinical practice. In basic CMR courses, based on nine international course schedules available online, an average of 1.4 h is dedicated to lecturing on and demonstrating LV function analysis. It is however unclear to what extent different centres provide continued teaching and feedback in-house to beginners to further improve accuracy and precision in CMR measurements after initial teaching. There are currently no international criteria to measure the quality of "sufficient" training for the individual based on accuracy and precision, but rather a time frame and number of cases performed, as presented in the guidelines for training and accreditation of CMR in Europe [2]. It is the senior authors' experience that whereas some centres have a reference population that should be accurately and precisely measured by beginners with feedback from experienced observers as part of in-house training, other centres may have no particular in-house training or validation of CMR beginners' measurements before they are allowed to report clinically or actively participate in research studies.
Previous teaching studies have shown improvement in measuring LV parameters in healthy volunteers and patients after teaching for observers with up to 2 years of previous training [3,4], indicating the importance of continued teaching and feedback. The impact of short standardised teaching on CMR measurements of LV volumes and function in patients has however not been assessed for complete beginners, although, given the complexity of CMR, the limited teaching time offered in basic courses may be hypothesised not to lead to a major change in measurements. Further, an increasing number of physicians are seeking CMR training, leading to a potential move from training in high-volume centres to lower-volume centres. It is thus important to have knowledge of the impact of basic training also on complete beginners' measurements, to maintain CMR as a technique known for its high accuracy and reproducibility and low intra- and inter-observer variability.
The aim of the current study was therefore to evaluate the effect of initial teaching on complete and intermediate beginners' quantitative measurements of LV volumes and function in patients, and to compare these with experienced observers' measurements.

Study population
The local ethics board approved the study and written informed consent was obtained from all participants. Fifteen patients (13 male, median age 68 years, range 52-82 years) were included. The patients were prospectively and randomly selected from the clinical population, with a history of symptoms representative of stable angina and known or suspected coronary artery disease (CAD). Patients with an estimated glomerular filtration rate < 30 ml/min/1.73 m² or with contraindications to CMR were excluded, according to clinical routine.

Image acquisition
Cardiac magnetic resonance imaging was performed using standard clinical techniques according to recognised SCMR international guidelines [5]. All CMR examinations were performed on a 3 T MR scanner (Achieva, Philips, Best, the Netherlands), using a 32-channel phased-array receiver coil. The study examination included routinely acquired cine short- and long-axis balanced steady state free precession (bSSFP) images after administration of a gadolinium-based contrast agent (gadobutrol, Berlin-Wedding, Schering, Germany).

Image analysis
All analyses were performed in accordance with the SCMR guidelines [6]. The cine images were analysed using the freely available software Segment (version 2.0; Medviso AB, Lund, Sweden) for quantitative measurements [7]. Observers were blinded to the results of their own previous assessments and to those of the other observers. In total, five observers (two complete beginners, one intermediate beginner, and two experienced observers) independently evaluated the images. The two complete beginners (B1, B2) were medical students without previous experience in LV CMR delineation. The intermediate beginner (IB) had experience from foetal cardiac ultrasound as an obstetrician, and had delineated 20 CMR cases without formal training before participating in this study. The two experienced CMR level III-certified observers (E1, E2) with 15 years of CMR experience had trained at the same centre. All observers independently measured LV volumes including LV mass twice, at least one week apart. In the case of beginners, assessments were also repeated twice after the teaching sessions.
Before the initial assessment, the beginners read the standardised SCMR description on how to analyse LV volumes and function [6]. After the first assessment, the three beginners attended a one-hour lecture including cases and discussions on how to delineate LV endo- and epicardial borders in bSSFP cine images, with a total of seven attendees. The beginners also attended a one-hour hands-on teaching session on how to delineate endo- and epicardial borders for LV volumes and function. As only the three beginners participating in the current study attended this hands-on session, much attention was given to interaction for direct feedback from the teacher. The CMR level III-accredited observer E1 gave both teaching sessions.

Statistical analyses
The software R (version 3.0.2) was used for all statistical analyses [8]. Inter-observer variability between beginner and expert observers was based on each observer's first measurement. For beginners, the first measurements before and after teaching, respectively, were used to avoid impact of the repeat measurements on precision and accuracy calculations after teaching. Beginners' accuracy and precision were calculated as bias and 95% limits of agreement (i.e. ±1.96 standard deviations) between each beginner and E1 [9]. The coefficient of variation (CV) was calculated for further assessment of agreement and was defined as the SD of differences between the respective measurements divided by their mean and expressed in %. For reliability, the intraclass correlation coefficient (ICC) was computed using the R irr v0.84 package as a one-way single consistency score as an index of intra- and inter-rater reliability for quantitative data [10]. Inter-rater reliability was based on each observer's first measurement. Negative ICC values may occur when within-groups variance exceeds the between-groups variance, and were marked * in Table 1.
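The bias, limits of agreement and CV described above can be illustrated with a minimal Python sketch. This is not the authors' R code. The function names and the EDV values are hypothetical, and the CV denominator is taken here as the mean of all paired measurements, which is one reading of "divided by their mean".

```python
from statistics import mean, stdev


def bland_altman(observer, reference):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [o - r for o, r in zip(observer, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def coefficient_of_variation(observer, reference):
    """SD of paired differences divided by the mean of all measurements, in %."""
    diffs = [o - r for o, r in zip(observer, reference)]
    overall_mean = mean(list(observer) + list(reference))  # assumed denominator
    return 100.0 * stdev(diffs) / overall_mean


# Hypothetical EDV values in ml for five patients (not study data)
beginner_edv = [150.0, 132.0, 171.0, 118.0, 160.0]
expert_edv = [182.0, 160.0, 195.0, 150.0, 188.0]

bias, lower, upper = bland_altman(beginner_edv, expert_edv)
cv = coefficient_of_variation(beginner_edv, expert_edv)
print(f"bias = {bias:.1f} ml, limits of agreement = {lower:.1f} to {upper:.1f} ml")
print(f"CV = {cv:.1f} %")
```

A negative bias in this sketch corresponds to the beginner underestimating the volume relative to the reference observer.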
The Dice Similarity Coefficient (DSC) was calculated between beginners and E1 for endo- and epicardial delineations, respectively, on a slice-by-slice basis for both end-diastole and end-systole. Only slices in which both the respective beginner and E1 had performed delineations were included in the averages presented in Table 2, which for readability are grouped for complete cases and for basal, midventricular and apical thirds of the left ventricle. This matched slice-by-slice DSC was chosen because observers may differ in opinion on which basal or apical slices to include, and such unmatched slices would risk lowering the overall score despite perfect matches in other slices. The number of cases where the beginners and expert chose to define basal and apical slices differently is instead presented separately. The DSC formula was based on two times the volume of the intersection of the respective two regions divided by the sum of the regional volumes, with DSC = 0 if the regions did not overlap at all and DSC = 1 if the regions overlapped perfectly [11]. As the D'Agostino & Pearson omnibus normality test showed non-Gaussian distribution, the Mann-Whitney test was performed to assess differences before and after teaching, with a p value < 0.05 indicating statistically significant differences.
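As an illustration of the matched slice-by-slice DSC, the following Python sketch computes the coefficient on small binary masks and averages only over slices delineated by both observers. The mask representation (nested lists, with None for a slice that was not delineated) and the function names are assumptions made for this example, not the format used by the analysis software.

```python
def dice(mask_a, mask_b):
    """DSC = 2 * |A and B| / (|A| + |B|) for two binary masks (nested lists)."""
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    if not a and not b:
        raise ValueError("both masks are empty; DSC is undefined")
    return 2 * len(a & b) / (len(a) + len(b))


def matched_slice_dsc(observer_slices, reference_slices):
    """Average DSC over slices that both observers delineated (None = skipped)."""
    scores = [dice(o, r) for o, r in zip(observer_slices, reference_slices)
              if o is not None and r is not None]
    if not scores:
        raise ValueError("no slice was delineated by both observers")
    return sum(scores) / len(scores)


# Toy 4x4 masks: the expert region has 12 pixels, the beginner region 4 pixels
expert = [[0, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 0]]
beginner = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]

single = dice(beginner, expert)
# The beginner skipped the second slice, so it is left out of the average
average = matched_slice_dsc([beginner, None, expert], [expert, expert, expert])
print(single, average)
```

Excluding the unmatched slice keeps a missed basal or apical slice from lowering the average, so such slices have to be counted separately.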

Results
The set of CMR images was representative of a general clinical population, including infarcted regions, thinning of myocardium and hypertrophy, and a range of hypo-, a- and dyskinesia. Images ranged from good overall and diagnostic quality to images affected by artefacts. Intra- and inter-observer variability and agreement, accuracy, precision, repeatability and reliability for beginners and experts are displayed in Table 1. Compared with both intra- and inter-observer data for experienced observers, beginners generally showed higher variability, lower agreement and lower repeatability, as could be expected.

Endo- and epicardial delineations
The DSC showed an improvement in overall delineations for complete beginners for both end-diastolic and end-systolic endocardial delineations, whereas epicardial delineations did not improve significantly after teaching, except for basal end-systole for B1 (Table 2). When comparing the DSC values between basal, midventricular and apical thirds of the longitudinal length of the LV, it should be noted that basal, and even more so apical, DSC values are expected to be lower than midventricular values as the generally smaller areas yield inherently lower DSC. Values should therefore be compared between observers before and after teaching for the separate LV thirds, rather than between separate thirds. The beginners generally improved their definition of which basal and apical slices to include after teaching, and correspondingly missed fewer basal and apical slices after teaching (Table 3). Notably, the beginners did not include more basal or apical slices than the expert for any case. Representative delineations before (Fig. 1 top row) and after (Fig. 1 bottom row) teaching indicate typical "delineation errors" before teaching, such as exclusion of papillary muscle from blood pool (albeit not incorrect per se, but generally not performed for clinical delineation) by the complete beginners and lack of apical endocardial delineation by all beginners, but also the more challenging case of the basal delineation in end-systole where the experts followed the guidelines and extended the delineation to the aortic valve leaflets. It may be argued that variability in delineation of the basal slice may decrease if only ventricular muscle and corresponding blood pool were to be delineated instead of extending the delineations to the leaflets. This was however not specifically investigated in the current study, as the standardised SCMR guidelines were the chosen basis for delineations.
Further, it was noted that these beginners' errors were corrected after teaching, and also that the general circular shape of the ventricle was more closely followed (Fig. 1 bottom row).

Intra-observer measures
For EDV, intra-observer variability for beginners showed improvement after teaching for B1 as indicated by both

For a perfect match in delineations DSC = 1 whereas 0 indicates no overlap in delineations. Basal, midventricular and apical denote slices comprising one third of the longitudinal left ventricle for end-diastole and end-systole, respectively. ns non-significant, *p < 0.05, **p < 0.01, ***p < 0.001 after teaching vs. before teaching

For ESV, a different pattern emerged in which B2 improved in intra-observer variability, whereas B1 rather showed an increased variability and IB again showed no major difference after teaching. The mean values for volume measurements for each beginner's repeated measures before and after teaching were similar, showing quite low bias and overall high ICC values (Table 1). The limits of agreement were however overall comparably wide also after teaching.
For LVM, bias did not show a clear improvement after teaching and again limits of agreement were wide and ICC showed a sharp decrease for intra-observer reliability after teaching for both B1 and B2 (Table 1).

Inter-observer, accuracy and precision measures
Accuracy and precision before and after teaching are shown in Fig. 2. Precision as quantified by CV for beginners' measurements of EDV and ESV showed improvement after teaching, but to a generally small extent, which is supported by a small change also in precision measured as limits of agreement. The limits of agreement were quite wide also after teaching. The beginners underestimated measured volumes by up to 44 ml (EDV), 27 ml (ESV) and overestimated LVM by up to 53 g before teaching, improving to an underestimation of up to 9 ml (EDV), 7 ml (ESV) and an overestimation of up to 30 g (LVM) after teaching. It is important to note that ejection fraction (EF) showed a generally high accuracy for all beginners also before teaching, indicating the imperative in stating measured volumes, i.e. EDV and ESV, and not EF alone as it hides potential delineation errors and incorrect volume measurements. Thus, the measure of EF cannot be used as a proxy for accurate and precise measurement of volumes.
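The reason EF can mask volume errors follows from its definition as a ratio, EF = (EDV − ESV) / EDV: when EDV and ESV are both underestimated, the errors partly cancel. The short Python sketch below illustrates this. The reference volumes are hypothetical values chosen for illustration, and only the magnitudes of the underestimation (44 ml and 27 ml) are taken from the text above.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in % from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Hypothetical reference volumes in ml (assumed for illustration, not study data)
reference_edv, reference_esv = 180.0, 90.0

# Apply the largest pre-teaching underestimations reported in the text
beginner_edv = reference_edv - 44.0
beginner_esv = reference_esv - 27.0

ef_reference = ejection_fraction(reference_edv, reference_esv)
ef_beginner = ejection_fraction(beginner_edv, beginner_esv)
print(f"reference EF = {ef_reference:.1f} %, beginner EF = {ef_beginner:.1f} %")
```

In this constructed case EDV is about a quarter too low, yet EF differs by less than five percentage points.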
The ICC showed generally lower accuracy compared with intra-observer reliability for the two complete beginners, and showed negative values in some cases, indicating within-groups variance exceeding between-groups variance (marked * in Table 1). More importantly, accuracy as measured by ICC increased strongly after teaching. For the intermediate beginner, however, accuracy measured as bias and as ICC for EDV and ESV was high already before teaching, particularly so for ESV. The intermediate beginner on the other hand showed an increase in accuracy for LVM after teaching.
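How a negative ICC can arise may be seen from the textbook one-way, single-measure form, ICC = (MSB − MSW) / (MSB + (k − 1) MSW), where MSB and MSW are the between-subject and within-subject mean squares and k is the number of measurements per subject. The Python sketch below implements this form on hypothetical data. It is meant as an illustration only and is not guaranteed to reproduce the output of the R irr package used in the study.

```python
def icc_oneway(ratings):
    """One-way single-measure ICC; ratings is a list of per-subject lists of length k."""
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Between-subject mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, subject_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical paired volumes in ml (observer, reference); not study data
agreeing = [[100, 102], [150, 149], [200, 203], [120, 118]]
disagreeing = [[100, 160], [150, 95], [120, 170], [160, 100]]

icc_high = icc_oneway(agreeing)
icc_negative = icc_oneway(disagreeing)
print(f"agreeing: {icc_high:.3f}, disagreeing: {icc_negative:.3f}")
```

In the second data set the two measurements of each subject differ more than the subjects differ from each other, so MSW exceeds MSB and the coefficient becomes negative.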
Trends of improvement in intra-observer reliability after teaching did not necessarily indicate improved accuracy after teaching, and trends of decrease in intra-observer reliability were also found in cases with increased accuracy (Table 1). This indicates that the beginners improved in assessment compared with the expert in some cases, but at a cost of decreased reliability in repeated assessments, which is important to acknowledge in continued training.

LVM at end-diastole and end-systole
Left ventricular mass measured in diastole and systole can be used as an internal control for measurement errors as it should be similar throughout the cardiac cycle with minimum theoretical changes related to myocardial blood volume. For the experienced observer this difference was −0.1 ± 0.8 g, whereas the beginners generally showed higher bias and variability, both before (IB: −3.1 ± 6.9 g; B1: −0.1 ± 18.1 g; B2: −8.2 ± 10.6 g) and after teaching (IB: −5.2 ± 6.4 g; B1: 2.2 ± 13.5 g; B2: −17.9 ± 12.6 g).
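The internal control described above amounts to summarising the per-patient difference between the two mass measurements as mean ± SD. A minimal Python sketch follows. The data are hypothetical, the sign convention (end-diastole minus end-systole) is an assumption as the text does not state it, and the 10 g tolerance used for flagging cases is an arbitrary illustrative threshold.

```python
from statistics import mean, stdev


def mass_difference_summary(lvm_ed_g, lvm_es_g):
    """Mean and SD of per-patient LVM differences (end-diastole minus end-systole)."""
    diffs = [ed - es for ed, es in zip(lvm_ed_g, lvm_es_g)]
    return mean(diffs), stdev(diffs)


def flag_cases(lvm_ed_g, lvm_es_g, tolerance_g=10.0):
    """Indices of patients whose ED and ES mass differ by more than the tolerance."""
    return [i for i, (ed, es) in enumerate(zip(lvm_ed_g, lvm_es_g))
            if abs(ed - es) > tolerance_g]


# Hypothetical LVM values in g for four patients (not study data)
lvm_ed = [120.0, 135.0, 98.0, 150.0]
lvm_es = [121.0, 134.0, 110.0, 149.0]

mean_diff, sd_diff = mass_difference_summary(lvm_ed, lvm_es)
flagged = flag_cases(lvm_ed, lvm_es)
print(f"difference = {mean_diff:.1f} ± {sd_diff:.1f} g, cases to review: {flagged}")
```

Flagged cases would be candidates for re-checking the endo- and epicardial contours in both time frames.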

Discussion
This CMR study shows that initial teaching to complete beginners increases accuracy for assessment of left ventricular volume measurements, however with high bias and low precision even after teaching. Even though the intermediate beginner showed quite high accuracy already before teaching, precision did generally not improve after standardised teaching as offered in most basic CMR courses. Thus, a two-hour teaching session is insufficient to allow complete and intermediate beginners to adequately evaluate basic CMR studies for left ventricular volumes and function.
When comparing accuracy and precision in the current study with previous studies, some important points on how to plan teaching sessions need to be considered. In the previous teaching study by Karamitsos et al. [3], the beginner observers had up to two years of previous training and had also been taught how to delineate CMR images before their two-month study training period. We showed generally lower accuracy and precision for complete beginners, as expected, but in parts similar results for the intermediate beginner in the current study. In their study, however, the most basal and apical slices were excluded from analysis. Excluding these more challenging slices may inherently increase accuracy and precision as compared with experienced observers, although at a cost of accuracy and precision vs. true volumes. Compared with the study by Groth et al. [4], accuracy was lower but precision similar for EDV and ESV for the complete beginners in the current study, even though the beginners in the study by Groth et al. all had a minimum of six months of training before participating in the teaching study. This is also indicated by the intermediate beginner in the current study, who showed similar accuracy and precision for EDV but measured ESV more accurately than the beginners in the previous study. Both the current study and the study by Groth et al. show improvement in accuracy but not precision after teaching. Altogether, this shows the importance of not focusing only on time and number of cases when deciding whether a beginner is sufficiently trained.
In general, beginners' underestimation of EDV and ESV and overestimation of LVM, as in the current study, can be related to systematic errors in defining which basal slice to include for measurements and incorrect inclusion of papillary muscle and trabeculation in midventricular slices. As an example, both B1 and B2 in the current study assumed before teaching that papillary muscle and trabeculation should be included as myocardium, which is consistent with the more pronounced observed underestimation of EDV and ESV, and corresponding overestimation of LVM for these observers. This also stresses the importance of reporting measured EDV and ESV, and not EF alone as EF showed low bias (5%) despite the large underestimation of EDV (−44 and −37 ml) and ESV (−27 and −23 ml) before teaching, and reporting only on EF would thereby be misleading.
Previous studies showing improvement in CMR assessment after teaching included observers with six months to two years previous CMR experience [4], or previous cardiovascular or imaging knowledge [3]. In the current study, the intermediate observer functioned as a bridge between previous studies and the complete beginners in the current study. Together with the previous studies, the current study indicates both the value of previous experience, regardless of whether this is in CMR, other imaging or cardiovascular anatomy knowledge, and the need for intensive training. Most importantly, it is obvious that physicians may not receive appropriate initial training as improvement is found also two years after initial training as shown previously [4]. An initial short teaching session as given in most basic CMR courses is thus not sufficient and the individual differences between complete CMR beginners, the intermediate beginner and observers already with up to two years previous CMR experience are large. Therefore both adequate initial teaching and continuous experience including systematic training and above all adequate feedback in-house is needed for development of satisfactory skills in CMR evaluation. Even though CMR level II and III definitions include a time frame/number of cases for training; 3 months/150 cases and 12 months/300 cases, respectively, there are currently no criteria to measure the quality of "sufficient" training for the individual trainee [2].
In the current study both accuracy and precision of LV volumes were low for complete beginners without previous theoretical or clinical experience in CMR, irrespective of acceptable intra-observer variability. As shown by the intermediate beginner before teaching, however, even a basic background in cardiac physiology and CMR analysis resulted in high reliability for both EDV (ICC: 0.177 and 0.357 vs. 0.931 for B1, B2 and IB, respectively) and ESV (ICC: 0.601 and 0.692 vs. 0.972 for B1, B2 and IB, respectively) as compared with expert observers. Whereas complete beginners demonstrated a clear improvement in volume measurements of both EDV and ESV after teaching, the intermediate beginner showed only minor changes in accuracy, which was high already before teaching. However, the intermediate beginner did not improve in precision for EDV and ESV after teaching, indicating reproducibility errors. These reproducibility errors may be related to a combination of excluded basal slices, mismatch of delineation vs. trabeculation and, for ESV, difficulties in defining the proper lumen area considering that papillary muscle is included. All of these can be handled by focussed training after high-quality feedback with specific advice and recommendations on how to use the adjacent slices and different time frames for guidance.
For LVM, measurements rely on delineation of both endo- and epicardial borders. In the current study accuracy for LVM was low compared with accuracy for EDV and ESV, indicating challenges in delineating the epicardial border in particular. This is also supported by the higher reliability ICC values for EDV and ESV between beginners and experts. The DSC results may seem counterintuitive in this respect, as no large differences in epicardial delineations between beginners and expert were shown. This can be related to an inherent weakness of the DSC method: for the relatively large epicardial delineations, even relatively large area differences between observers have only a small impact on DSC values. Epicardial DSC values similar to those of the (smaller) endocardial delineations may thus conceal a larger variability, and myocardial volumes and LVM may differ even though this is not indicated by DSC as such. Even though teaching improved beginners' assessments of LVM, the low accuracy and precision also after teaching are unacceptable for clinical reports and research studies. Further teaching with adequate feedback from experienced observers in-house is required to maintain CMR as an accurate and precise method. The epicardial borders may be challenging because of, for instance, chemical shift artefacts, which if not recognised will lead to inconsistent delineations with impact on variability, as noted in the delineations performed by the complete beginners. Teaching led the beginners to understand the chemical shift artefacts and how to delineate the epicardial border in these cases. Together with properly clinically delineated papillary muscle and trabeculation after teaching, the LVM accuracy improved, although it remained insufficient for clinical reporting or research studies.
The decreased precision for LVM by B1, however, may be related to this observer finding the endocardial border more difficult to define, particularly in end-systole where papillary muscle and trabeculation may be challenging to differentiate from the actual myocardial wall. As this observer's challenge was identified, focussed continued training may quickly help this observer to improve. This was also indicated by the fact that the last measurement rendered an accuracy and precision of 21 ± 22 g, i.e. an improvement compared with previous measurements, without the interaction of the expert or continued teaching.
Altogether, the beginners in the current study showed trends of both increased and decreased reliability after teaching. This may indicate individual differences between observers, and points to the fact that individualised feedback is crucial when structuring training.
For training purposes it may also be useful to perform delineations in patients without shunts or valvular disease and provide the differences in stroke volumes (SV) between the left and right ventricles as a feedback mechanism. Similarly, utilising the diastolic and systolic LVM measurements as internal validation is considered to lower variability and increase accuracy in LVM determination. As this is generally not part of basic courses, the teaching did not include this recommendation. The results also show that the beginners did not consider LVM as an internal control and this single addition may further improve measurements.
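The stroke volume feedback mechanism mentioned above can be sketched in a few lines of Python. In the absence of shunts and valvular disease the left and right ventricular stroke volumes should be close to equal, so a large mismatch points to a delineation error in at least one ventricle. The function names and volumes below are hypothetical.

```python
def stroke_volume(edv_ml, esv_ml):
    """Stroke volume in ml as end-diastolic minus end-systolic volume."""
    return edv_ml - esv_ml


def sv_mismatch(lv_edv, lv_esv, rv_edv, rv_esv):
    """Left minus right ventricular stroke volume in ml."""
    return stroke_volume(lv_edv, lv_esv) - stroke_volume(rv_edv, rv_esv)


# Hypothetical volumes in ml for one patient without shunt or valvular disease
mismatch = sv_mismatch(lv_edv=160.0, lv_esv=70.0, rv_edv=170.0, rv_esv=82.0)
print(f"LV minus RV stroke volume = {mismatch:.0f} ml")
```

What size of mismatch should prompt a re-check is left open here, as the text does not specify a threshold.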
The use of medical students as complete beginners may seem exaggerated, as most physicians may have had previous cardiac imaging experience before turning towards CMR. However, considering that CMR research and cardiac delineations are sometimes being performed also by engineers with potentially little background in cardiac anatomy and imaging experience, and by students without formal CMR training, we considered it valuable to assess delineations of complete beginners also representative for these groups. Finally, a higher number of observers may increase the power of the study. Study of the outcome of continued training of the beginners in the current study, or lack of training, would indicate the accuracy over time in centres with and without formalised continued training, and most importantly presence or absence of high-quality feedback mechanisms.
It has recently been shown that also experienced observers from different centres may vary in measured volumes, particularly for basal and apical slices but also dependent on small and consistent disparities throughout the short-axis stack [12]. A strategy to decrease interobserver variability could be to move towards more automated delineation by employing computer algorithms, as routinely performed in nuclear medicine [13]. This has however been challenging in CMR even though recent improvements of automatic algorithms show promise for the future [14]. The use of automatic algorithms without previous training in delineation however leads to lower overall delineation competence. The observer may not be sufficiently knowledgeable to delineate particularly challenging cases where the automatic algorithm may fail. Basic and thorough training is thus necessary before applying the use of automatic delineation software to maintain CMR as a technique known for its high accuracy and reproducibility and low intra- and inter-observer variability for quantitative measurements.

Limitations
Image contrast between blood pool and myocardium was slightly reduced as contrast agent had been administered before acquisition of cine images. This may have impact on delineation of endocardial borders for volume measurements. Also, the anatomical changes related to presence of pathology found in some of the included patients may render delineation more complex. The study population is however also because of these limitations representative of consecutive clinical patients and the results may therefore be considered adequate for clinical imaging and research studies. The number of observers may be considered small, however equivalent to previous teaching and observer variability studies, although the current study does not provide the highest number of observers among these studies. As the observers in previous studies represented more heterogeneous groups it may however be argued that the number in the current study is satisfactory.

Conclusions
This CMR study shows that initial teaching to complete beginners increases accuracy for assessment of left ventricular volume measurements, however with high bias and low precision even after teaching. Even though the intermediate beginner showed quite high accuracy already before teaching, precision did generally not improve after standardised teaching as offered in most basic CMR courses. To maintain CMR as a technique known for its high accuracy and reproducibility and low intra- and inter-observer variability for quantitative measurements, internationally standardised training should be encouraged including high-quality feedback mechanisms. Objective measurements of training methods, training duration and, above all, quality of assessments are required.

Availability of data and materials
The datasets analysed during the current study are available from the corresponding author on reasonable request.

Authors' contributions
EH made substantial contributions to conception and design, analysed and interpreted data and drafted the manuscript. MI made substantial contributions to conception and design, acquired data and revised the manuscript for important intellectual content. HE, ASM, JS and DR analysed and interpreted data and revised the manuscript critically for important intellectual content. EN made substantial contributions to conception and design and revised the manuscript critically for important intellectual content. All authors gave final approval of the version to be published and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Competing interests
Eike Nagel has received research grants from Philips Healthcare, Bayer Healthcare and Siemens Healthcare as well as contributions in kind from MEDIS, CVI42 and TomTec. All other authors declare no competing interest.

Consent for publication
Consent to publish was obtained from all individual participants included in the study.

Ethics approval and consent to participate
All procedures were in accordance with the ethical standards of the institutional research committee (King's College London; REC Study No: 09/H0802/78) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained before participation in the study.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.