We present a roadmap for integrating artificial intelligence (AI)-based image analysis algorithms into existing radiology workflows such that (1) radiologists can significantly benefit from enhanced automation in various imaging tasks due to AI, and (2) radiologists’ feedback is utilized to further improve the AI application. This is achieved by establishing three maturity levels where (1) research enables the visualization of AI-based results/annotations by radiologists without generating new patient records; (2) production allows the AI-based system to generate results stored in an institution’s picture-archiving and communication system; and (3) feedback equips radiologists with tools for editing the AI inference results for periodic retraining of the deployed AI systems, thereby allowing continuous organic improvement of AI-based radiology-workflow solutions. A case study (i.e., detection of brain metastases with T1-weighted contrast-enhanced three-dimensional MRI) illustrates the deployment details of a particular AI-based application according to the aforementioned maturity levels. It is shown that the given AI application significantly improves with feedback coming from radiologists; the number of incorrectly detected brain metastases (false positives) decreases from 14.2 to 9.12 per patient as the number of annotated datasets increases from 93 to 217 as a result of radiologist adjudication.
1. Introduction

Artificial intelligence (AI) (decision trees, regression algorithms, support vector machines, Bayesian methods, neural networks, etc.) has been utilized for decades to address a variety of medical imaging problems, such as image segmentation1 (i.e., finding the borders of a target object), registration2 (i.e., visually aligning anatomical parts in single- or multimodality images), detection (i.e., detecting formations/structures), and classification3 (i.e., grouping medical information into subgroups). It can also facilitate information feeds in radiology workflows4,5 (e.g., natural language processing in dictation systems). “Machine learning” (ML) is an application of AI in which the computer (machine) is given data access, and models are used for extracting relevant information from the data. The recent usage of neural networks, a well-established ML approach, has gained significant momentum with the generalization of these techniques to deeper network architectures, referred to as deep neural networks (DNNs); the complete concept is known as “deep learning” (DL).6 Research on deeper architectures shows that the accuracy of the deployed models depends heavily on the amount of relevant information; therefore, access to both past data and the ongoing inflow of new information is critical. Accordingly, DL-based solutions are commonly built on vast amounts of data.7 Medical imaging presents multiple challenges for researchers attempting to adopt DL.
They include the following: (1) circulation of data between institutions, or even between departments within the same institution, is complicated by various legal barriers largely related to privacy issues; (2) the high resolution and high dimensionality (e.g., 3-D + time) of the data commonly translate into AI models with very large numbers of parameters, so large amounts of data are needed for model convergence; and (3) most medical-imaging applications require image annotations (i.e., segmentation and detection results) by medical experts to train the AI algorithms;8,9 medical images without relevant and accessible annotations might not be useful for a variety of supervised learning scenarios in which the machine learns how to map an input image to output results by processing example input–output pairs. Thus, researchers pursuing the latest developments in ML must restructure their workflows to enable a flow of high-quality annotated information to both train and continuously update medical-imaging models. Accordingly, this report introduces architectural modifications to a given radiology workflow in multiple stages, delineating research, production, and feedback maturity levels. The ultimate goal of this work is to promote the integration of imaging AI into the radiology workflow, in which inference-generating models grow organically with the continuous inflow of both new medical data and radiologist feedback for ongoing model learning. Therefore, the feedback maturity level is the final goal, with research and production serving as prior stages to achieving this end. To further solidify the understanding of these maturity levels, a case study representing the deployment of an example AI application, brain metastases (BM) detection with T1-weighted contrast-enhanced three-dimensional (3-D) MRI, is provided.
The results section presents an evaluation of the accuracy of that AI application at three incremental quantities of feedback data added through radiologist adjudication of inference results in a complete simulated deployment [i.e., with 93, 155, and 217 annotated datasets, using fivefold cross-validation (CV)10]. The report concludes with a discussion of the results, system limitations, and future directions.

2. Radiology Workflow and Its Adaptations

2.1. Example Radiology Workflow and Definitions

The implementation and maintenance of radiology workflows have been investigated in numerous earlier studies.11,12 The workflow example highlighted in this study is as follows (see Fig. 1):
2.2. Architectural Adaptations to Achieve Levels of Maturity

2.2.1. Research maturity level

To present the inference results from an ML algorithm to a dedicated group of medical experts, the workflow must be adapted as shown in Fig. 2. At this maturity level, imaging modalities (e.g., CT, MRI, etc.) send acquired images to a DICOM router, which then distributes received images to pertinent storage locations, such as the PACS or VNA. The images archived in PACS can then be accessed by a radiologist via dedicated workstations, just as in the standard workflow. Next, the following occur sequentially:
2.2.2. Production maturity level

The production maturity level aims to implement existing AI models as conceived, without allowing further modifications. It enables verified AI models that have been deployed, optimized, and validated within the research workflow to be placed in a production mode (see Fig. 3). At this level of maturity,
The production maturity level permits triaging of studies based on the results of AI model inference; it allows a study to be flagged, or accordingly prioritized, in a radiologist’s reading worklist.18 While this setup also allows viewing of AI results in connection with their target images, receipt of radiologist feedback on these results is not facilitated. Note that the AI results submitted to production PACS have the potential to consume large amounts of archival space on production systems.

2.2.3. Feedback maturity level

As mentioned earlier, the accuracy of an AI model utilizing DL commonly depends on the amount of data available during the initial training. Accordingly, the feedback maturity level aims to place the AI model at a location where it can benefit from the constant stream of annotated data resulting from radiologist adjudication of inference results; the AI model is continuously updated/modified (see Fig. 4). The feedback maturity level can be achieved by adding a dedicated AI-model training server, medical-data annotation storage, and a medical-imaging viewer that allows adding, editing, and removal of annotations from corresponding medical images. In this workflow,
AI results are tagged with the version of the AI model used, where the version information is linked with the model creation date. Review of the generated AI results is not mandatory, as the results are kept in separate annotation storage. However, if the institution chooses to utilize the full potential of the feedback architecture, then the continuous inflow of radiologist feedback becomes critical.

3. Case Study: Brain Metastases Detection Beyond CAD

Computer-aided detection (CAD) technology allows computational procedures to assist radiologists in the diagnosis and characterization of disease by obtaining quantitative measurements from medical images along with clinical information.19 A classical CAD system is trained with a collection of medical-imaging datasets before deployment, either as stand-alone software or as a tool in PACS and/or medical imaging viewers. The concept of integrating CAD into PACS (i.e., to allow the execution of CAD procedures on images stored in PACS) has been investigated in previous studies.20,21 The development of a traditional CAD model is complete after the initial training procedure. However, the model may be kept up to date with future batch data updates and training(s), executed as an additional procedure that is not part of a routine radiology workflow. In this case study, an example AI application for detecting BM with T1-weighted contrast-enhanced 3-D MRI22 is deployed in a radiology workflow that evolves through all three aforementioned maturity levels: research, production, and feedback. At the feedback maturity level, this AI application goes beyond traditional CAD; its accuracy constantly improves due to feedback from the interpreting neuroradiologists, who receive algorithm inference results while performing clinical interpretations; feedback to the AI system is provided almost seamlessly using tools integrated into the radiology workflow.
3.1. Brain Metastases Detection: Research Maturity Level

For the given case study, the technologist acquires the T1-weighted contrast-enhanced 3-D MRI data of a patient using an MRI scanner. The images are sent through DICOM-transfer to a DICOM router. After the images are received by this router, they are forwarded to two different storage locations: the institution’s PACS and VNA. The images routed to PACS are immediately accessible by neuroradiologists for their interpretation, whereas the images routed to the VNA are accessible to nonradiologist physicians via an enterprise viewer. During the medical-image interpretation, if the neuroradiologist decides to receive inference results from the AI model, the images can be sent to the AI system via DICOM-transfer, which transmits the series to the DICOM node where the AI system is located (see Fig. 5). This image transfer can be initiated at any time by the neuroradiologist, but it should ideally be done at the beginning of an interpretation to minimize the time spent waiting on inference results; if images are sent as soon as the examination is opened for interpretation, the neuroradiologist can continue viewing images while the AI model is processing a result. After processing the 3-D MRI data, the AI model generates a GSPS object to register its results, which in this case are the 3-D coordinates of each detected BM center. It then uses DICOM-transfer to send the GSPS object to the research PACS; in the current infrastructure, the results are sent to an advanced imaging-analysis workstation server. In the research architecture, this is a critical phase for keeping the results (1) separated from the patient’s EMR and (2) inaccessible to nonradiology personnel. Next, the neuroradiologist can switch to the advanced viewer with the click of a button located in the user toolbar of the PACS. The advanced viewer overlays the resulting GSPS objects on the medical images (see Fig. 6).
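Because the AI results are registered as 3-D coordinates of detected BM centers, overlaying them on the images requires mapping from patient space (millimeters) to voxel indices. The sketch below illustrates that bookkeeping under a simplifying assumption: the function names are ours, and an axis-aligned volume (identity direction cosines) is assumed; a real implementation would also apply the DICOM ImageOrientationPatient matrix of the series.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]

def mm_to_voxel(center_mm: Point, origin_mm: Point, spacing_mm: Point) -> Tuple[int, int, int]:
    """Map a detected metastasis center from patient space (mm) to voxel indices.

    Assumes an axis-aligned volume; subtract the volume origin and divide by
    the per-axis voxel spacing, rounding to the nearest voxel.
    """
    return tuple(round((c - o) / s)
                 for c, o, s in zip(center_mm, origin_mm, spacing_mm))

def centers_to_overlay_points(centers_mm: Iterable[Point],
                              origin_mm: Point,
                              spacing_mm: Point) -> List[Tuple[int, int, int]]:
    """Collect voxel-space points, e.g., for a GSPS-style graphic overlay."""
    return [mm_to_voxel(c, origin_mm, spacing_mm) for c in centers_mm]
```

A viewer could then draw a marker at each returned index triple on the corresponding slice.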
The visualization of the AI results concludes the capabilities of the research maturity level; the neuroradiologists and medical experts may visually inspect and analyze AI system outputs.

3.2. Brain Metastases Detection: Production Maturity Level

After an AI model is approved for clinical usage by the Food and Drug Administration (FDA), the research workflow can be altered slightly to achieve the production maturity level. First, the DICOM router is configured to concurrently send the acquired medical images directly to the institution’s PACS, VNA, and AI systems based on procedure code. The routing rules of the router can be set so that it forwards only the necessary set of images to the AI system; in this case study, the pertinent images are the series for axial T1-weighted 3-D MRI with contrast. For a given patient, sending a selected subset of image series, rather than a complete study, is critical; a complete study may take significantly longer to transmit while consuming more storage space at the target destination. After the AI system processes the images received from the router, it sends the results in GSPS format to the PACS server; hence, the results are available for the neuroradiologist to view in their standard PACS workspace. The neuroradiologist can simply load the results by selecting the AI inference presentation state, which displays the GSPS overlays on the corresponding MRI data (see Fig. 7). The production workflow saves the neuroradiologist time by having only the appropriate images automatically routed to the AI model, as well as by sending the results to the system where the neuroradiologist will be utilizing them to enhance examination interpretation. The results become part of the patient’s EMR in this architecture; therefore, the deployed AI model must be validated properly during the research architecture deployment.
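The routing rule just described (forward only the axial T1-weighted post-contrast series, selected by procedure code) can be sketched as a simple filter. The procedure codes and series-description keywords below are hypothetical placeholders; an actual DICOM router would be configured with site-specific values and match on the corresponding DICOM attributes.

```python
from typing import Dict, List

# Hypothetical site configuration; a real router matches DICOM attributes.
TARGET_PROCEDURES = ("MRI BRAIN W CONTRAST",)
REQUIRED_KEYWORDS = ("AX", "T1", "POST")

def is_routable_series(procedure_code: str, series_description: str) -> bool:
    """Decide whether one series should be forwarded to the AI DICOM node."""
    if procedure_code not in TARGET_PROCEDURES:
        return False
    desc = series_description.upper()
    return all(keyword in desc for keyword in REQUIRED_KEYWORDS)

def select_series_for_ai(study: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """From a study (list of series records), keep only AI-pertinent series."""
    return [s for s in study
            if is_routable_series(s["procedure_code"], s["description"])]
```

Filtering at the router keeps transmission time and storage at the AI node proportional to the single pertinent series rather than the whole study.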
3.3. Brain Metastases Detection: Feedback Maturity Level

The production architecture can be further modified to integrate feedback processes into the workflow. This is achieved by first incorporating a viewing tool that enables the neuroradiologist to see the AI results/annotations on their corresponding images; the viewing tool must also enable the editing of these annotations. For this purpose, a ZFP medical-image viewer23,24 can be modified for (1) accessing the AI results, representing the AI-detected metastases centers stored in an annotation database, and (2) editing/removing these results (see Fig. 8). To enable the continuous updating of the AI model, a dedicated training server can be added to the workflow, where (1) a direct connection between PACS and the training server is established, allowing newly acquired 3-D MRI datasets to be sent from PACS to the training server, and (2) the training server is also given direct access to the annotation database. An updated AI model is automatically used for studies acquired after the model update, or for previously acquired studies on demand. By connecting the training server to both PACS and the annotation database, the training server is able to extract labeled data in any desirable format (e.g., GSPS, DICOM SEG, DICOM mask, or DICOM SR).

4. Results

The accuracy of the AI model used in the case study22 is measured for the feedback maturity level by simulating three incremental quantities of added feedback data from radiologist adjudication of inference results in a simulated complete deployment. The data-selection criteria for these increments were as follows: (1) datasets included 93 (acquired from 85 patients), 155 (from a total of 120 patients, 35 additional patients), and 217 (from a total of 158 patients, 38 additional patients) postgadolinium T1-weighted 3-D MRI exams, respectively.
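The evaluation in this section reports average false positives (AFP) per patient at fixed sensitivity levels. As a simplified illustration of how such an operating point can be computed, the sketch below assumes a hypothetical data layout (per-patient detections as score/ground-truth-match pairs, with matching precomputed); it is our own sketch, not the cited study's actual pipeline, and it assumes at least one true lesion in the data.

```python
import math
from typing import List, Tuple

Detection = Tuple[float, bool]  # (confidence score, matches a true lesion?)

def afp_at_sensitivity(detections: List[List[Detection]],
                       target_sensitivity: float) -> float:
    """Mean false positives per patient at a given sensitivity level.

    Picks the loosest score threshold that still detects at least
    `target_sensitivity` of all true lesions, then averages the number of
    false positives per patient surviving that threshold.
    """
    all_dets = [d for patient in detections for d in patient]
    total_tp = sum(1 for _, is_tp in all_dets if is_tp)
    # Thresholds of interest are the true-positive scores, sorted descending.
    tp_scores = sorted((s for s, is_tp in all_dets if is_tp), reverse=True)
    needed = max(1, math.ceil(target_sensitivity * total_tp))
    threshold = tp_scores[needed - 1]  # keep the top `needed` true positives
    fp_counts = [sum(1 for s, is_tp in patient if not is_tp and s >= threshold)
                 for patient in detections]
    return sum(fp_counts) / len(fp_counts)
```

Sweeping `target_sensitivity` over a grid and plotting AFP against it yields an FROC-style curve of the kind summarized in Figs. 9 and 10.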
The major components of this investigation, including (1) the algorithmic details (e.g., DNN architecture, data augmentation steps, training methodology, etc.), (2) analysis of the statistical properties of the BM included in the study (e.g., lesion diameter, volume, location, etc.), and (3) adherence to data-acquisition criteria, have been comprehensively described in a previous report.22 This retrospective study was conducted under Institutional Review Board approval with a waiver of informed consent. The metric of average false positives (AFP) per patient, representing the incorrectly AI-detected BM lesions for each patient in relation to the sensitivity, was used during the validation of the algorithm for the three datasets;21 the AFP values were computed using fivefold CV (see Fig. 9). At a 90% sensitivity level (i.e., 90% of true BM are detected for a given test exam), the algorithm produced 14.2, 9.78, and 9.12 false positives per patient for the first [see Fig. 9(a)], second [see Fig. 9(b)], and third [see Fig. 9(c)] datasets, respectively. The reduction of false positives from 14.2 to 9.12 with the addition of 124 exams (i.e., Dataset01 had 93 and Dataset03 had 217 exams) is a significant improvement for a BM detection system. The AFP of the system is summarized for 80%, 85%, and 90% sensitivity levels in Fig. 10.

5. Discussion and Conclusion

It has been shown in multiple previous studies that, to train more complex models, it is commonly better to have more expert-annotated data.25,26 The results of this work show that the amount of data is also a determining factor for the accuracy of the AI approach used in the case study. However, the amount of data is not the only element that might benefit an AI-based system; the ML algorithm and its parameters, as well as the properties of the added data (e.g., data labels, image quality, the novelty of the added data, etc.), have a significant impact on the final accuracy.
While the feedback maturity level facilitates the gathering of annotated data and the feeding of additional data into deployed models, the selection of proper AI models is still the responsibility of the researchers/data scientists. Accordingly, it would be misleading to assume that the deployment of the feedback architecture alone would ensure a successful AI system integration. The architectures described in this paper should be feasible to implement in most medical institutions with relatively modern radiology systems. Variations in institutional setups are common. For example, some institutions may not have a DICOM router between their modalities and PACS. In these cases, the flow of information should be facilitated by other means: either technologists or radiologists could be responsible for transferring the data between systems, or more automated processes could be set up within PACS itself (e.g., configured to send data to preset DICOM nodes within the institution). Modern PACS systems are commonly able to load and display GSPS objects; hence, the visualization of basic AI results should not be a limitation. More recent PACS implementations can also accommodate DICOM SEG and/or DICOM SR file types, opening new opportunities for more advanced data display and complex workflows. As ML gains applicability and significance in the medical-imaging domain, radiology workflows enabling AI models to access medical data will become increasingly critical.
Accordingly, this report delineates three maturity levels for AI integration into a given radiology workflow: (1) research, presenting the results of investigational AI models to radiologists without generating new patient records; (2) production, processing data stored in PACS with a previously validated deployed AI model; and (3) feedback, updating a deployed AI model organically via radiologist interactions with images and their annotations, which allows constant evolution of an AI model from the inference adjudication process. The case study gave implementation directions for these architectures by providing descriptive figures.

References

1. N. Sharma and L. M. Aggarwal, “Automated medical image segmentation techniques,” J. Med. Phys./Assoc. Med. Phys. India 35(1), 3 (2010). https://doi.org/10.4103/0971-6203.58777
2. M. V. Wyawahare et al., “Image registration techniques: an overview,” Int. J. Signal Process. Image Process. Pattern Recognit. 2(3), 11–28 (2009).
3. S. N. Deepa and B. A. Devi, “A survey on artificial intelligence approaches for medical image classification,” Indian J. Sci. Technol. 4(11), 1583–1595 (2011). https://doi.org/10.17485/ijst/2011/v4i11/30291
4. F. Jiang et al., “Artificial intelligence in healthcare: past, present and future,” Stroke Vasc. Neurol. 2(4), 230–243 (2017). https://doi.org/10.1136/svn-2017-000101
5. A. N. Ramesh et al., “Artificial intelligence in medicine,” Ann. R. Coll. Surg. Engl. 86(5), 334–338 (2004). https://doi.org/10.1308/147870804290
6. W. Liu et al., “A survey of deep neural network architectures and their applications,” Neurocomputing 234, 11–26 (2017). https://doi.org/10.1016/j.neucom.2016.12.038
7. C. Sun et al., “Revisiting unreasonable effectiveness of data in deep learning era,” in Proc. IEEE Int. Conf. Comput. Vision, 843–852 (2017). https://doi.org/10.1109/ICCV.2017.97
8. G. Litjens et al., “A survey on deep learning in medical image analysis,” Med. Image Anal. 42, 60–88 (2017). https://doi.org/10.1016/j.media.2017.07.005
9. M. I. Razzak, S. Naz, and A. Zaib, “Deep learning for medical image processing: overview, challenges and the future,” Lect. Notes Comput. Vision Biomech. 26, 323–350 (2018). https://doi.org/10.1007/978-3-319-65981-7
10. T. Hastie, R. Tibshirani, and J. Friedman, “Model assessment and selection,” in The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 219–257, Springer-Verlag, New York (2009).
11. B. I. Reiner et al., “Multi-institutional analysis of computed and direct radiography: part I. Technologist productivity,” Radiology 236(2), 413–419 (2005). https://doi.org/10.1148/radiol.2362040671
12. K. P. Andriole et al., “Addressing the coming radiology crisis: the Society for Computer Applications in Radiology transforming the radiological interpretation process (TRIP) initiative,” J. Digital Imaging 17(4), 235–243 (2004). https://doi.org/10.1007/s10278-004-1027-1
13. T. Hackländer et al., “DICOM router: an open source toolbox for communication and correction of DICOM objects,” Acad. Radiol. 12(3), 385–392 (2005). https://doi.org/10.1016/j.acra.2004.11.015
14. R. R. M. Aparna and P. Shanmugavadivu, “A survey of medical imaging, storage and transfer techniques,” in Int. Conf. ISMAC Comput. Vision and Bio-Eng., 17–29 (2018).
15. Digital Imaging and Communications in Medicine (DICOM)—Supplement 33: Grayscale Softcopy Presentation State (GSPS) Storage, National Electrical Manufacturers Association (NEMA), Rosslyn, Virginia (2016).
16. A.50 Segmentation IOD in DICOM PS3.3—Information Object Definitions, National Electrical Manufacturers Association (NEMA), Rosslyn, Virginia (2016).
17. R. Noumeir, “DICOM structured report document type definition,” IEEE Trans. Inf. Technol. Biomed. 7(4), 318–328 (2003). https://doi.org/10.1109/TITB.2003.821334
18. L. M. Prevedello et al., “Automated critical test findings identification and online notification system using artificial intelligence in imaging,” Radiology 285(3), 923–931 (2017). https://doi.org/10.1148/radiol.2017162664
19. H. Fujita et al., “An introduction and survey of computer-aided detection/diagnosis (CAD),” in Proc. IEEE Int. Conf. Future Comput. Control Commun., 200–205 (2010).
20. A. H. T. Le, B. Liu, and H. K. Huang, “Integration of computer-aided diagnosis/detection (CAD) results in a PACS environment using CAD–PACS toolkit and DICOM SR,” Int. J. Comput. Assist. Radiol. Surg. 4(4), 317–329 (2009). https://doi.org/10.1007/s11548-009-0297-y
21. L. Bogoni et al., “Impact of a computer-aided detection (CAD) system integrated into a picture archiving and communication system (PACS) on reader sensitivity and efficiency for the detection of lung nodules in thoracic CT exams,” J. Digital Imaging 25(6), 771–781 (2012). https://doi.org/10.1007/s10278-012-9496-0
22. E. Dikici et al., “Automated brain metastases detection framework for T1-weighted contrast-enhanced 3D MRI” (2019).
23. Y. H. Chang et al., “Primer for image informatics in personalized medicine,” Procedia Eng. 159, 58–65 (2016). https://doi.org/10.1016/j.proeng.2016.08.064
24. T. Urban et al., “LesionTracker: extensible open-source zero-footprint web viewer for cancer imaging research and clinical trials,” Cancer Res. 77(21), e119–e122 (2017). https://doi.org/10.1158/0008-5472.CAN-17-0334
25. C. D. Naylor, “On the prospects for a (deep) learning health care system,” JAMA 320(11), 1099–1100 (2018). https://doi.org/10.1001/jama.2018.11103
26. M. de Bruijne, “Machine learning approaches in medical image analysis: from detection to diagnosis,” Med. Image Anal. 33, 94–97 (2016). https://doi.org/10.1016/j.media.2016.06.032
Biography

Engin Dikici is a research scientist in the Laboratory for Augmented Intelligence in Imaging of the Department of Radiology in the Ohio State University College of Medicine. He received his MSc degree from the Computer and Information Science Department at the University of Pennsylvania in 2006, and his PhD in biomedical engineering from the College of Medicine of the Norwegian University of Science and Technology in 2012. His research interests include segmentation, registration, real-time tracking, and synthesis of medical images.

Matthew Bigelow is a biomedical informatics consultant for the Department of Radiology in the Ohio State University College of Medicine. He received his BS and MBA degrees from Ohio State University in 2012 and 2019, respectively. His research interests are nuclear medicine, computed tomography, and imaging informatics.

Luciano M. Prevedello, associate professor of radiology, is vice-chair for medical informatics and augmented intelligence in imaging at Ohio State University. He is a member of the Board of Directors of the Society for Imaging Informatics in Medicine. He chairs the Machine Learning Steering Committee at the Radiological Society of North America and is an associate editor of the Radiology: Artificial Intelligence journal.

Richard D. White, MD, MS, professor of radiology, has been the chairman of the Department of Radiology at Ohio State University since 2010, succeeding a chairmanship at the University of Florida-Jacksonville, 2006 to 2010. These followed service as clinical director in the Center for Integrated Non-Invasive Cardiovascular Imaging at Cleveland Clinic, 1989 to 2006. He received his MD degree from Duke University, 1978 to 1981, before becoming a fellow of the Sarnoff Foundation for Cardio-Vascular Research (Duke: 1981 to 1982). He completed residency at the University of California-San Francisco with ABR certification, 1982 to 1986, followed by an NIH Cardiovascular Imaging fellowship (UCSF: 1985 to 1987). After training, he held leadership positions at Georgetown University, 1987 to 1988, and University Hospitals, Cleveland, 1988 to 1989. Throughout his career, he has focused on cardiovascular MR/CT, more recently pursuing imaging informatics with an MS in Health Informatics at Northwestern University, 2015 to 2018.

Barbaros S. Erdal, associate professor of radiology and biomedical informatics, is assistant chief of the Division of Medical Imaging Informatics for the Ohio State University Wexner Medical Center, as well as the director of the Laboratory for Augmented Intelligence in Imaging and director of scholarly activities for the Department of Radiology in the Ohio State University College of Medicine. He received his PhD in electrical and computer engineering from Ohio State University.