Potential of E-Learning Interventions and Artificial Intelligence–Assisted Contouring Skills in Radiotherapy: The ELAISA Study

PURPOSE Most research on artificial intelligence–based auto-contouring as a template (AI-assisted contouring) for organs-at-risk (OARs) stems from high-income countries. The effect and safety are, however, likely to depend on local factors. This study aimed to investigate the effects of AI-assisted contouring and teaching on contouring time and contour quality among radiation oncologists (ROs) working in low- and middle-income countries (LMICs).

MATERIALS AND METHODS Ninety-seven ROs were randomly assigned to either manual or AI-assisted contouring of eight OARs for two head-and-neck cancer cases with an in-between teaching session on contouring guidelines. Thereby, the effect of teaching (yes/no) and AI-assisted contouring (yes/no) was quantified. Second, ROs completed short-term and long-term follow-up cases all using AI assistance. Contour quality was quantified with Dice Similarity Coefficient (DSC) between ROs' contours and expert consensus contours. Groups were compared using absolute differences in medians with 95% CIs.

RESULTS AI-assisted contouring without previous teaching increased absolute DSC for optic nerve (by 0.05 [0.01; 0.10]), oral cavity (0.10 [0.06; 0.13]), parotid (0.07 [0.05; 0.12]), spinal cord (0.04 [0.01; 0.06]), and mandible (0.02 [0.01; 0.03]). Contouring time decreased for brain stem (–1.41 [–2.44; –0.25]), mandible (–6.60 [–8.09; –3.35]), optic nerve (–0.19 [–0.47; –0.02]), parotid (–1.80 [–2.66; –0.32]), and thyroid (–1.03 [–2.18; –0.05]). Without AI-assisted contouring, teaching increased DSC for oral cavity (0.05 [0.01; 0.09]) and thyroid (0.04 [0.02; 0.07]), and contouring time increased for mandible (2.36 [–0.51; 5.14]), oral cavity (1.42 [–0.08; 4.14]), and thyroid (1.60 [–0.04; 2.22]).

CONCLUSION The study suggested that AI-assisted contouring is safe and beneficial to ROs working in LMICs. Prospective clinical trials on AI-assisted contouring should, however, be conducted upon clinical implementation to confirm the effects.

specific workload. If the development of radiotherapy does not keep pace with the increasing demands, it may lead to further underutilization globally, which inevitably will harm patients.
Contouring is a time-consuming task for clinical staff [10] and is prone to observer variability [11,12], but it is essential in modern radiotherapy. Auto-contouring has been studied intensely [11] and is known to reduce contouring time [13,14] and interobserver variation [15-19] across a variety of cancer sites, although manual editing is still required [20]. Today, auto-contouring is often artificial intelligence (AI)-based and is usually integrated into the clinical workflow as templates for contouring (AI-assisted contouring). AI-assisted contouring may serve as a part of the solution to the underutilization of radiotherapy by reducing the manual workload for clinical staff and thereby reducing the diagnosis-to-treatment times, which may improve patient outcome [21]. Most research on AI-assisted contouring, however, stems from high-income countries. This is a challenge in low- and middle-income countries (LMICs) since local factors such as patient abundance, clinical and financial resources, technical expertise, and mindsets of clinicians possibly influence how AI-assisted contouring is used. Hence, AI-assisted contouring should be evaluated in the context of LMICs to secure a safe and beneficial implementation worldwide.
The purpose of this study was to investigate how contouring quality and contouring time were affected among radiation oncologists (ROs) working in LMICs by (1) a single teaching session on contouring guidelines and AI-assisted contouring and (2) having AI-assisted contouring available. Subsequently, the effect of teaching (1) combined with AI-assisted contouring (2) was investigated after 2-week and 6-month follow-up periods. The study was a collaboration between the International Atomic Energy Agency (IAEA) and Aarhus University Hospital, Denmark.

Institutions and Participants
Radiotherapy institutions were selected and enrolled by the IAEA according to the following criteria:

Case Selection
The aim was to have as many patient cases as possible while having at least seven contour sets made per case. On the basis of the experience from previous studies by the IAEA, the expected participant dropout rate was 50%. Therefore, optimally, 14 participants should be assigned to each case. Given the study design (see below) and the number of enrolled institutions and participants, this required 16 head-and-neck cancer cases that were provided by Aarhus University Hospital (Data Supplement, Table S1).

CONTEXT
Key Objective
How do teaching and artificial intelligence (AI)-assisted contouring affect contouring quality and time in a global cohort of radiation oncologists working in low- and middle-income countries (LMICs) contouring organs-at-risk (OARs) for head-and-neck cancer?
Knowledge Generated
AI-assisted contouring increased contouring quality, measured against expert consensus contours, regardless of whether teaching was received or not. Teaching increased contouring quality for only two OARs, but increased the time-saving effect of AI-assisted contouring.
Relevance
AI-assisted contouring in combination with teaching of contouring guidelines is an effective strategy to reduce contouring time and conform contouring practices within and between radiotherapy departments located in LMICs.

Study Design and Random Assignment
Institutions (including their participants) were randomly assigned to either the control group or the intervention group. Random assignment was balanced on (1) institutions' annual number of patients with head-and-neck cancer and (2) whether any form of auto-contouring was available at the institution.
With 2 weeks to complete each round, participants were asked to contour eight organs-at-risk (OARs) on one case in each of the four sequential rounds (Fig 1):
1. Before the teaching session (baseline)
2. Immediately after the teaching session (after teaching)
3. Two weeks after the teaching session (short-term follow-up)
4. Six months after short-term follow-up (long-term follow-up)
The control group contoured manually at baseline and after teaching and did AI-assisted contouring in the short-term and long-term follow-ups. The intervention group used AI-assisted contouring in all four rounds. In each round, four new patient cases were used (same cases in both groups). The cases were assigned randomly institution-wise, which resulted in 2-3 institutions (7-15 ROs) in each study group per case.

The effects of teaching and AI-assisted contouring were quantified in a two-by-two fashion with results at baseline and after teaching. To investigate whether the effects of teaching combined with AI-assisted contouring persisted over time, the short-term and long-term follow-ups were compared with the round after teaching within each group.

Contouring
Contouring took place in EduCase (RadOnc eLearning Center, Inc, Jackson, WY), and the AI contours were generated by EduCase professionals using Contour+, Guideline-Based Segmentation Solution (MVision AI Oy, Helsinki, Finland), which is based on well-defined contouring guidelines for brain stem [22] and for the spinal cord, oral cavity, mandible, right submandibular gland, right parotid gland, right optic nerve, and thyroid [23]. To imitate the clinical reality of many LMICs, only CT scans were available to the participants. Participants were instructed to generate clinically acceptable contours in accordance with the contouring guidelines. If deemed necessary, participants were allowed to delete AI contours and start over with manual contouring. The participants did not have access to the contours of others. Contours were handed in individually; collaboration between participants was, however, not explicitly disallowed.

Preprocessing of Contours
Digital Imaging and Communications in Medicine (DICOM) Structure Sets were exported from EduCase and converted to Neuroimaging Informatics Technology Initiative (NIfTI) files with a voxel spacing of x = 0.39 mm, y = 0.39 mm, and z = 2 mm (a scaling factor in X and Y of 3 times the CT grid) [24,25].

Contour Quality
Expert consensus contours were generated in the following steps (Fig 1). First, three sets of contours were independently made by three head-and-neck expert oncologists (J.G.E., H.P. and P.L.) without access to the AI contours used in this study. These were merged using Simultaneous Truth and Performance Evaluation (STAPLE) [26]. The STAPLE maps were binarized with a threshold of 0.8. The binary STAPLE structures were then reviewed and corrected for artifacts by an external head-and-neck expert oncologist (J.C.) in consensus with J.G.E. Contouring quality was quantified using the Dice Similarity Coefficient (DSC) and Hausdorff Distance 95th percentile (HD95) between participants' contours and expert consensus contours. Increasing DSC and decreasing HD95 indicate increasing agreement with the expert consensus contours and thus higher contouring quality.
The participants were blinded toward the expert consensus contours throughout the entire study.
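As an illustration of the overlap metric described above, the following is a minimal sketch in plain Python rather than the MedPy routines used in the study. It represents a contour as a set of voxel indices, binarizes a toy consensus probability map at 0.8, and computes the DSC as twice the overlap divided by the summed sizes. The function names, the toy voxel data, the inclusive comparison at the threshold, and the convention for two empty contours are assumptions made for this example only.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) index of one voxel


def binarize(prob_map: Dict[Voxel, float], threshold: float = 0.8) -> Set[Voxel]:
    """Keep voxels whose consensus probability reaches the threshold.

    Whether the study's cut-off was inclusive (>=) is an assumption.
    """
    return {voxel for voxel, p in prob_map.items() if p >= threshold}


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice Similarity Coefficient: 2 * |A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch when both are empty
    return 2 * len(a & b) / (len(a) + len(b))


# Toy consensus probability map (made-up values, not study data).
staple_map = {
    (0, 0, 0): 1.0,
    (1, 0, 0): 0.9,
    (2, 0, 0): 0.8,
    (3, 0, 0): 0.5,  # below the threshold, excluded from the consensus
}
expert = binarize(staple_map)

# Hypothetical participant contour that is shifted by one voxel.
participant = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}

print(f"DSC = {dice(expert, participant):.3f}")

# A complete geographical miss shares no voxels with the consensus.
print(f"DSC for a geographical miss = {dice(expert, {(9, 9, 9)}):.3f}")
```

A DSC of 1 means identical voxel sets and a DSC of 0 means no overlap at all, which is why a geographical miss appears as an extreme outlier on this metric.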

Contouring Time
Contouring time was defined as the time of active contouring (mouse click-and-hold) with any contouring tool. For all contouring interactions, duration in milliseconds, type of interaction, active structure, and participant name were automatically recorded by EduCase. The durations of interactions were summed over structures and are reported in minutes.
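The aggregation described above can be sketched as follows. This is an illustrative example, not the EduCase export format: the record keys (`participant`, `structure`, `tool`, `duration_ms`) and the sample log are hypothetical, and grouping by participant and structure is an assumption about how the sums were formed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def contouring_minutes(interactions: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Sum active-contouring durations per (participant, structure) in minutes."""
    totals_ms: Dict[Tuple[str, str], int] = defaultdict(int)
    for record in interactions:
        key = (record["participant"], record["structure"])
        totals_ms[key] += record["duration_ms"]
    # 60,000 milliseconds per minute
    return {key: ms / 60_000 for key, ms in totals_ms.items()}


# Hypothetical interaction log (made-up values, not study data).
log = [
    {"participant": "RO_01", "structure": "mandible", "tool": "brush", "duration_ms": 90_000},
    {"participant": "RO_01", "structure": "mandible", "tool": "eraser", "duration_ms": 30_000},
    {"participant": "RO_01", "structure": "thyroid", "tool": "brush", "duration_ms": 45_000},
]

times = contouring_minutes(log)
print(times[("RO_01", "mandible")])  # 2.0
print(times[("RO_01", "thyroid")])   # 0.75
```

Only click-and-hold time enters the sum, so time spent scrolling or inspecting the scan without an active tool would not be counted under this definition.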

Statistics and Software
No parametric distribution was assumed for the data. Therefore, effect sizes were quantified as absolute differences of medians with 95% CIs, formatted as Absolute Difference [CI low; CI high]. CIs were estimated with percentile bootstrapping in 9,999 iterations. Absolute differences are positive when the reference value is the smaller number and vice versa. Data handling was performed in Python 3.9, and DSC and HD95 were calculated using MedPy [27]. Statistics and bootstrapping were performed with SciPy [28].
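A percentile bootstrap of a difference in medians can be sketched with the Python standard library alone. This is not the study's SciPy code: resampling the two groups independently, the nearest-rank choice of percentile indices, the fixed seed, and the sample DSC values are all assumptions for illustration.

```python
import random
import statistics
from typing import Sequence, Tuple


def median_difference_ci(
    reference: Sequence[float],
    comparison: Sequence[float],
    n_boot: int = 9999,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (difference in medians, CI low, CI high).

    The difference is comparison minus reference, so it is positive when
    the reference median is the smaller number.
    """
    rng = random.Random(seed)
    observed = statistics.median(comparison) - statistics.median(reference)

    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        ref_sample = rng.choices(reference, k=len(reference))
        cmp_sample = rng.choices(comparison, k=len(comparison))
        diffs.append(statistics.median(cmp_sample) - statistics.median(ref_sample))

    diffs.sort()
    # Nearest-rank approximation of the 2.5th and 97.5th percentiles.
    low = diffs[int((alpha / 2) * (n_boot - 1))]
    high = diffs[int((1 - alpha / 2) * (n_boot - 1))]
    return observed, low, high


# Made-up DSC values for two groups (not study data).
manual = [0.70, 0.72, 0.75, 0.78, 0.80]
ai_assisted = [0.80, 0.82, 0.85, 0.88, 0.90]

diff, ci_low, ci_high = median_difference_ci(manual, ai_assisted)
print(f"{diff:.2f} [{ci_low:.2f}; {ci_high:.2f}]")
```

The printed string follows the Absolute Difference [CI low; CI high] format used in the text. An interval that excludes zero corresponds to a difference that would be read as an effect under this approach.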

RESULTS
Of the 97 participating ROs, 94 completed the questionnaire on professional background. The random assignment resulted in 11 institutions for the control group and 12 for the intervention group. The characteristics of institutions and participants are found in Table 1. The four rounds of contouring were completed by 89 (92%), 91 (94%), 93 (96%), and 80 (82%) ROs, respectively (Fig 2).
The raw results of contouring quality and contouring time are shown for each group at baseline in Figure 3 and for all contouring rounds in Figure 4. Below is a walk-through of the estimated effect sizes of the four combinations of exposures along with the effect of combining teaching and AI-assisted contouring in the follow-up rounds (Fig 5). Visual comparisons between manual contouring and AI-assisted contouring are provided in the Data Supplement (Fig S2).

AI-Assisted Contouring Without Teaching
By comparing the two study groups at baseline (Fig 3), the effect of AI-assisted contouring without teaching was quantified (Fig

AI-Assisted Contouring With Teaching
To address the combined effect of teaching and AI-assisted contouring, the two study groups were compared after teaching (Fig

Teaching and AI-Assisted Contouring Over Time
To address whether the effect of teaching combined with AI-assisted contouring persisted over time, the short-term and long-term follow-ups were compared within the study groups with the round after teaching (Fig 5, bottom). For most OARs, the effect sizes were similar in short-term and long-term follow-ups, corresponding to a persistent effect over time. However, median DSC was substantially lower for thyroid at long-term follow-up in both study groups.

Acceptance of AI Contours
Across all submissions with AI-assisted contouring, 335 (24%) AI contours were accepted without editing by a total of 66 (68%) participants. Among these participants, the median number of accepted contours was three per case. Out of the accepted-as-is contours, 100 (30%) contours had a higher DSC than the group average for the specific case and study arm.

DISCUSSION
This study investigated the effects of teaching and AI-assisted contouring in a large randomized study on ROs from LMICs. The dropout rate was much lower than expected, and therefore, the study was well powered for the research questions. Regardless of contouring method, teaching improved contouring quality for two OARs. Regardless of teaching, AI-assisted contouring increased contouring quality and reduced contouring time for most investigated OARs. The combined effect of teaching and AI-assisted contouring persisted throughout the two follow-up rounds. The study thereby confirms previous research on time savings [29-32] and reduction in interobserver variability [33] obtained by auto-contouring.
Previous research has shown that teaching improves contour consistency and quality [34,35]. It was therefore somewhat surprising that teaching did not affect contour quality more. This is probably due to participants already contouring well at baseline, and thus there was little room for improvement with the applied metrics. This explanation is supported by the fact that oral cavity was one of the structures that improved with teaching in both study groups. Contouring of oral cavity heavily relies on the guideline definitions. Hence, the increase in contouring quality and contouring time in the control group may indicate that teaching in fact was effective, as the participants spent more time contouring higher quality structures. For the intervention group, this manifested itself with an increase in contouring quality and a drop in contouring time, which could be due to AI contours being in high accordance with the contouring guidelines. Therefore, fewer adaptations were required with the participants' updated knowledge. It cannot, however, be ruled out that (1) a single teaching session is not enough to change contouring practice of the participants, (2) the rotation to new cases after the teaching session could make contouring easier/harder, and (3) the metrics were not sensitive to subtle differences in contours. The results are, however, good news as the high baseline quality may be attributed to the recent years' efforts in teaching programs and implementation of contouring guidelines.
Although teaching did not increase contouring quality as much as expected, teaching modified the time savings of AI-assisted contouring; for most OARs, AI-assisted contouring alone reduced contouring time (crosses, Fig 5), but even larger time savings were observed when AI-assisted contouring was combined with teaching (lines, Fig 5). In effect, this means that similar levels of contouring quality were obtained faster with AI-assisted contouring when accompanied by teaching.
Given the inevitable interobserver variability that also exists between experts, the consensus structures were considered to be of the highest quality obtainable. It was therefore quite extraordinary that AI-assisted contouring enabled participants to make structures with higher similarity to the expert consensus contours compared with manual contouring. Although the consensus contours were generated independently of the AI contours, they were in high agreement (Data Supplement, Fig S3). This suggests that AI-assisted contouring may be an effective strategy not only to conform contouring practices between individuals but even to facilitate the implementation of, and adherence to, contouring guidelines across countries and continents.
An additional finding was that outliers were effectively eliminated with AI-assisted contouring. With an (arbitrary) threshold in DSC at 0.4, 2.5% of manual contours and only 0.2% of AI-assisted contours were outliers. Although it is unlikely that all outliers would affect radiotherapy treatment, there were five complete geographical misses observed with manual contouring (Data Supplement, Fig S2). These would likely have affected treatment planning, had they been used clinically. The rate of geographical misses with manual contouring in this study hypothetically translates into one organ missed for every 18 patients contoured. This serves as an imperative reminder that peer review of all contours should be routine clinical practice.
From this study, it is clear that AI-assisted contouring is beneficial when the AI-contouring model locates the right structures and provides decent contours. It is, however, known that these models sometimes fail to do so [20,36]. It remains unknown what effect erroneous AI contours would have on final contours or, in other words, how wrong AI contours would have to be before clinicians realize it and fall back on manual contouring. The failure rate is unknown for the commercial model used in this study. Therefore, it cannot be determined whether the benefit of avoided outliers outweighs the risk that may come with erroneous AI contours.
A major limitation to the study was that contours were not used clinically. Most participants completed their cases alongside their regular clinical duties, which theoretically increased the risk of automation bias due to the risk of time pressure, lack of interest, and lack of accountability for treatment [37]. The fact that two thirds of participants handed in all the accepted-as-is contours suggests either that automation bias was at play for these participants or that these participants might have cognitively processed the review of AI contours differently. The underlying mechanisms of reviewing and editing AI contours are an important topic for future research but are beyond the scope of this work. To confirm the findings of this study, prospective clinical studies on AI-assisted contouring in LMICs should be carried out upon clinical implementation.
ROs who worked in low- and middle-income countries contoured most OARs for head-and-neck cancer with higher similarity to expert consensus contours with AI-assisted contouring than with manual contouring. Furthermore, teaching combined with AI-assisted contouring was the most effective strategy to reduce contouring time. The benefits of teaching combined with AI-assisted contouring persisted after a 6-month follow-up period. Therefore, AI-assisted contouring, especially when combined with teaching, is a promising contribution to reaching optimal utilization of radiotherapy in the present and future. A global transition toward AI-assisted contouring with appropriate clinical monitoring is therefore encouraged.

1. Located in an LMIC
2. Treating at least 20 patients with head-and-neck cancer per year
3. Performing computed tomography (CT)-based intensity-modulated radiotherapy
4. Able to enroll at least three ROs
5. Access to stable internet connection
Each institution appointed one chief scientific investigator who was in charge of recruiting ROs from the institution. There were no requirements for the ROs except that they should have undergone training in head-and-neck contouring. The ROs completed a questionnaire regarding professional background and knowledge about AI-assisted contouring (Data Supplement, Fig S1). As a result, 23 radiotherapy institutions with 97 ROs were enrolled.

FIG 3. The results at baseline of the two groups. In red is the control group contouring manually, and in blue is the intervention group doing AI-assisted contouring. Each circle represents a single contour from a single participant, and boxplots are based on these. The top and middle rows show DSC and HD95 between participants' contours and expert consensus structures, respectively. The bottom row shows the contouring time recorded by the contouring platform. AI, artificial intelligence; DSC, Dice Similarity Coefficient; HD95, Hausdorff Distance 95th percentile.

FIG 1 (flowchart labels recovered from the figure). RT centers were randomly assigned to two groups, with a teaching session in guidelines and AI-assisted contouring between the baseline and after-teaching rounds and 2 weeks allotted to each round. Baseline (cases 1-4) and after teaching (cases 5-8): manual contouring in one group and AI-assisted contouring in the other. Short-term follow-up (cases 9-12) and long-term follow-up (cases 13-16): AI-assisted contouring in both groups.
[...] STAPLE algorithm and binary thresholded at 0.8, and (3) the final STAPLEd structures were reviewed and edited for artifact by one of the initial contourers in consensus with an external head-and-neck expert oncologist. AI, artificial intelligence; RT, radiotherapy; STAPLE, Simultaneous Truth and Performance Evaluation.

TABLE 1. Baseline Characteristics of Enrolled Institutions and Participants