Quantitative Imaging Informatics for Cancer Research

PURPOSE We summarize Quantitative Imaging Informatics for Cancer Research (QIICR; U24 CA180918), one of the first projects funded by the National Cancer Institute (NCI) Informatics Technology for Cancer Research program. METHODS QIICR was motivated by the 3 use cases from the NCI Quantitative Imaging Network. 3D Slicer was selected as the platform for implementation of open-source quantitative imaging (QI) tools. Digital Imaging and Communications in Medicine (DICOM) was chosen for standardization of QI analysis outputs. Support of improved integration with community repositories focused on The Cancer Imaging Archive (TCIA). Priorities included improved capabilities of the standard, toolkits and tools, reference datasets, collaborations, and training and outreach. RESULTS Fourteen new tools to support head and neck cancer, glioblastoma, and prostate cancer QI research were introduced and downloaded over 100,000 times. DICOM was amended, with over 40 correction proposals addressing QI needs. Reference implementations of the standard in a popular toolkit and standalone tools were introduced. Eight datasets exemplifying the application of the standard and tools were contributed. An open demonstration/connectathon was organized, attracting the participation of academic groups and commercial vendors. Integration of tools with TCIA was improved by implementing a programmatic communication interface and by refining best practices for curation of QI analysis results. CONCLUSION The tools, capabilities of the DICOM standard, and datasets we introduced found adoption and utility within the cancer imaging community. A collaborative approach is critical to addressing challenges in imaging informatics at the national and international levels. Numerous challenges remain in establishing and maintaining the infrastructure of analysis tools and standardized datasets for the imaging community.
Ideas and technology developed by the QIICR project are contributing to the NCI Imaging Data Commons currently being developed.


INTRODUCTION
Medical imaging is increasingly important in cancer applications. 1,2 Few existing imaging biomarkers are used to guide clinical decisions. 3 Major efforts are underway to identify, validate, and deploy new imaging tools in the clinic. These efforts rely on the discovery of novel quantitative imaging (QI) biomarkers, which promise to support objective and reproducible characterization of disease and allow for more personalized approaches to diagnosis and therapy. One such effort is led by the National Cancer Institute (NCI) via its Quantitative Imaging Network (QIN) initiative, [4][5][6] with primary goals including collecting data from ongoing imaging clinical trials, developing innovative methods for data collection and analysis, and establishing consensus on QI methods. 4 Practical QIN experience in striving toward those goals highlighted the importance of imaging informatics as applied to QI biomarker development. In the context of radiology, imaging informatics is defined as a subspecialty of radiology concerned with "the study of how information about medical images is exchanged within radiology and throughout the medical enterprise." 7(p657) Communication of health care information and integration of data from various institutions and sources is critical for development and validation of innovative QI research tools. Research tools need to be harmonized to support data interoperability, reuse, and aggregation in community repositories and enable federated learning and eventual translation of the matured tools into clinical trials. The Informatics Technology for Cancer Research (ITCR) program (https://itcr.cancer.gov/) was established by NCI to support development and sustainment of open-source cancer informatics tools spanning all areas of cancer research.
Quantitative Imaging Informatics for Cancer Research (QIICR; U24 CA180918) was one of the first projects funded by ITCR under PAR-12-287. Motivated by challenges encountered by the QIN community, and with several investigators actively or formerly funded by the QIN program, QIICR embarked on a mission to close some of the informatics gaps that were perceived as limitations to the success of QIN.
The QIICR project developed a collection of reusable open-source components to support a range of informatics tasks for QI. In parallel, we worked on improving standardization of QI analysis outputs by refining and improving the existing standard, developing reference tools and datasets implementing the standard, and performing various outreach activities to demonstrate and promote best practices for sharing QI analysis results. We summarize the scope, collaborative approach, vision, and results of the QIICR project over the funded period; discuss the legacy of the project in the context of various ongoing activities; and outline remaining challenges and opportunities that we identified.

METHODS
The QIICR approach was defined by three aims: 1. establishing open-source tools and workflows to support analysis of longitudinal and derived image data; 2. supporting standardized representations for communicating QI analysis results; and 3. improving interoperability with community image data repositories.
Three collaborating QIN groups joined the QIICR consortium and defined clinical applications, imaging modalities, analysis approaches, and generated data types. They also contributed to the development of tools and supplied representative datasets. We contributed scholarly publications, training materials, and tutorials explaining the best practices for using DICOM and the tools we developed. We prioritized collaborations with other groups (academic and commercial) developing imaging analysis tools and workstations to encourage, support, and evaluate adoption of DICOM.
For the third aim, we directed our effort to improved integration with The Cancer Imaging Archive (TCIA), 10 the recommended repository for the QIN investigators. TCIA implements procedures that ensure DICOM imaging data are free from protected health information and maintains resources to support sharing of de-identified data. 11 Increasingly, TCIA is also distributing analysis results alongside the image data. We aimed to simplify the process of interaction of tool users with the repository and worked on DICOM conversion of the QI analysis results in other formats already deposited in TCIA.

Quantitative Image Analysis Tools
A range of batch-processing and interactive tools were developed to support QI workflows for each of the use cases. Batch-processing C++ Slicer Execution Model (SEM) plugins were added for 3D Slicer. 8 These modules can be used by other applications adopting SEM. 12,13 The plugins can also be used as standalone command-line tools outside of 3D Slicer. Interactive tools were implemented as 3D Slicer Python scripted modules. Modules were disseminated using the 3D Slicer extensions infrastructure, which supports automatic testing and packaging of extensions for Windows, macOS, and Linux. The tools are summarized in Table 1 and Figure 1 and on the QIICR Web site. 14 We engaged in a number of efforts led by QIN to evaluate the developed tools using common reference datasets and to compare the results produced by the various tools available within the network (Table 1). We also conducted a number of studies designed to develop best practices in corresponding QI methodologies. [15][16][17][18][19]

Standardized Representation of QI Analysis Results
To enhance the DICOM standard based on the needs of the use cases, we contributed numerous improvements to the standard (some of those are discussed in detail in Fedorov et al 20 ; see the Data Supplement for the complete list). Per the DICOM process, those updates passed the rigor of public review, revision, and approval by the DICOM standard committee.
Our DICOM implementation efforts improved support of relevant DICOM components in a variety of tools. We added new capabilities to the DICOM Toolkit (DCMTK) C++ library, 21 improving read/write support and the application programming interface (API) for DICOM Segmentation and Parametric Map objects and the DICOM TID 1500 Structured Report template. The API introduced an extra level of abstraction to simplify both reading and writing of those objects. We developed dcmqi, 22 a standalone library and set of command-line converters (implementing the SEM interface) between standard and research-oriented representations of the aforementioned objects. dcmqi has subsequently been integrated into a number of independent platforms, such as MITK 12 and ePAD, 23 and has supported data conversion tasks for several other projects. 24,25 The Quantitative Reporting extension of 3D Slicer 22 was developed to hide (behind the interactive user interface) the peculiarities of reading and writing QI DICOM objects.
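As an illustration of how the dcmqi command-line converters can be driven from a script, the following is a minimal sketch that assembles a call to the `itkimage2segimage` converter, which writes a DICOM Segmentation object from a research-format label map. The file names are hypothetical, and the argument names reflect our understanding of the dcmqi command line; they should be checked against the documentation of the installed dcmqi version.

```python
import shutil
import subprocess
from pathlib import Path


def build_seg_command(label_map, dicom_dir, metadata_json, output_seg):
    """Assemble the argument list for the dcmqi itkimage2segimage converter.

    Argument names are assumed from the dcmqi documentation; verify them
    against the installed version (itkimage2segimage --help).
    """
    return [
        "itkimage2segimage",
        "--inputImageList", str(label_map),       # segmentation in a research format
        "--inputDICOMDirectory", str(dicom_dir),  # source image series being segmented
        "--inputMetadata", str(metadata_json),    # JSON file describing the segments
        "--outputDICOM", str(output_seg),         # resulting DICOM Segmentation object
    ]


def convert(label_map, dicom_dir, metadata_json, output_seg):
    """Run the converter if it is on PATH; return True only on success."""
    cmd = build_seg_command(label_map, dicom_dir, metadata_json, output_seg)
    if shutil.which(cmd[0]) is None:
        return False  # dcmqi is not installed in this environment
    return subprocess.run(cmd, check=False).returncode == 0


# Hypothetical file names, used only to show the resulting command line.
cmd = build_seg_command(
    Path("tumor.nrrd"), Path("ct_series"), Path("meta.json"), Path("tumor.seg.dcm")
)
print(" ".join(cmd))
```

Separating command construction from execution keeps the sketch usable on systems where dcmqi is not installed, and makes the call easy to log alongside the converted data.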
We worked with the CommonTK, ePAD, and OHIF communities to create JavaScript implementations of key DICOM processing code to use in Web browsers and other JavaScript environments. An early experiment was to cross-compile DCMTK C++ code to JavaScript, using Emscripten. 26,27 More recently, we have been developing a pure-JavaScript suite of libraries to natively implement critical DICOM functionality, including a new version of dcmjs that manages conversion of traditional DICOM binary format to and from JavaScript objects and JavaScript object notation. 28 The new dcmjs is also used as the core library for JavaScript implementation of client and server portions [...] 30
We published a number of datasets accompanied by peer-reviewed articles demonstrating best practices for QI analysis results sharing (Table 2).
Table 2 NOTE. Role of the project is indicated as C (contributed the original dataset), T (tools developed by the project were used for data harmonization by a third party), or H (project contributed harmonized representations of the analysis results submitted earlier by an independent entity).
[...] use suitable DICOM objects for communicating analysis results implemented that capability and used DICOM4QI as a venue for testing interoperability with other platforms. Platforms included the open-source 3D Slicer, 8,37 MITK, 12 ePAD, 23 and OHIF Viewer, 38 and commercial products such as those from Brainlab and TeraRecon (Fig 2).
We organized a half-day tutorial on the use of DICOM for QI research at the Medical Image Computing and Computer Assisted Intervention (MICCAI) conferences in 2017 and 2018. The tutorial included presentations from QIICR investigators and collaborators, and a hands-on component demonstrating the capabilities of the DICOM standard and tools. Materials for those tutorials (presentation slides, pointers to the datasets, Jupyter notebooks) were archived and are publicly available. 39
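To make concrete the conversion between DICOM attributes and JavaScript object notation mentioned earlier in this section, the following is a minimal sketch of the DICOM JSON Model defined in DICOM PS3.18, in which each attribute is keyed by its eight-digit hexadecimal tag and carries a value representation and an optional list of values. It is written in Python for illustration and is not the dcmjs API; we assume, rather than know from the text above, that the dcmjs JSON form follows this model. The Patient ID is hypothetical.

```python
import json


def to_dicom_json(elements):
    """Map (group, element, vr, values) tuples to the DICOM JSON Model.

    Each attribute becomes {"vr": ..., "Value": [...]}, keyed by the
    eight-digit uppercase hexadecimal tag. Empty attributes omit "Value".
    """
    model = {}
    for group, element, vr, values in elements:
        tag = f"{group:04X}{element:04X}"
        entry = {"vr": vr}
        if values:
            entry["Value"] = list(values)
        model[tag] = entry
    return model


header = [
    # SOP Class UID: Segmentation Storage
    (0x0008, 0x0016, "UI", ["1.2.840.10008.5.1.4.1.1.66.4"]),
    # Accession Number, left empty
    (0x0008, 0x0050, "SH", []),
    # Modality
    (0x0008, 0x0060, "CS", ["SEG"]),
    # Patient ID (hypothetical value)
    (0x0010, 0x0020, "LO", ["QIICR-0001"]),
]

model = to_dicom_json(header)
print(json.dumps(model, indent=2))
```

The sketch covers only simple string-valued attributes; person names, sequences, and bulk data have dedicated encodings in the model that are omitted here.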

Interoperability With Data Repositories
We developed TCIA Browser, a 3D Slicer module that uses the TCIA API to explore, download, and visualize the content of public TCIA collections. It streamlines 3D Slicer user interaction with the repository, and makes it easier to evaluate various visualization and analysis tools available within the application.
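The kind of programmatic query that such a module issues can be sketched as follows. This is not the TCIA Browser source code: the base URL and the endpoint and parameter names (`getCollectionValues`, `getSeries`, `Collection`, `Modality`) are assumptions based on the public TCIA REST API and should be verified against the current TCIA documentation. The sketch only builds request URLs and does not contact the archive.

```python
from urllib.parse import urlencode

# Assumed base URL of the public TCIA REST API; verify before use.
BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1"


def build_query(endpoint, base_url=BASE_URL, **params):
    """Build a query URL for an (assumed) TCIA API endpoint.

    Parameters set to None are dropped; the rest are sorted by name so
    that the same query always yields the same URL.
    """
    kept = {key: value for key, value in sorted(params.items()) if value is not None}
    url = f"{base_url}/{endpoint}"
    return f"{url}?{urlencode(kept)}" if kept else url


# List the available collections, then the PET series of one collection.
collections_url = build_query("getCollectionValues")
series_url = build_query("getSeries", Collection="QIN-HEADNECK", Modality="PT")
print(collections_url)
print(series_url)
```

Keeping URL construction separate from the network call makes the query logic testable offline; the resulting URL can then be passed to any HTTP client.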
Coordinating with the TCIA team, we converted some of the existing third-party-contributed analysis results to a standard representation. These datasets can now be accessed from the usual search interface of TCIA or directly via a digital object identifier (as opposed to being linked as compressed archives from collection wiki pages) and are interoperable with a number of popular open-source tools. Furthermore, several collections were harmonized using dcmqi without direct involvement of the QIICR team. Table 2 lists all such collections directly or indirectly supported by QIICR. Currently, dcmqi is the tool recommended by TCIA for harmonizing analysis results submitted to the archive, and 3D Slicer is listed as the only visualization and analysis desktop application integrated with TCIA via its API.

DISCUSSION
The QIICR project aimed to address major perceived needs of the QI community and the QIN, and proposed solutions for specific QIN projects. There is no clear convention on evaluating contributions such as software tools or datasets that QIICR produced. Based on the evidence we presented, at least some of the tools we developed have been used in projects external to QIICR. The MRI modeling tools and PET segmentation tools we contributed scored favorably based on the results of the challenges organized by QIN. [40][41][42][43] Unlike most of the other tools evaluated in those challenges, QIICR-developed tools are readily available as 3D Slicer extensions; source code is public under a commercially friendly, nonrestrictive, open-source license; and use of the tools does not require establishing material transfer agreements.
Our investigation of DICOM to represent QI analysis results had limited precedent in the research community. By the end of the project, we were encouraged to see increasing adoption of our ideas in both academic and commercial tools. Our contributions to the DICOM standard passed the rigor of the community review process and are now being reused in new contexts and areas of application. Through one of the QIICR collaborations, we have been investigating the use of the DICOM standard to support digital pathology applications. 44 DICOM Working Group 23 is actively exploring the use of the DICOM TID 1500 Structured Reporting template, as well as DICOM Segmentation objects and Parametric Maps to support a host of artificial intelligence (AI) use cases, which subsume but are not limited to QI applications. We collaborated with the Imaging Biomarker Standardization Initiative (IBSI), which, among other objectives, is establishing a consensus nomenclature and definitions for radiomics features. 45 We worked with IBSI to augment the feature definitions with codes and added corresponding coded terms to the DICOM standard (see CP-1705 and CP-1764; Data Supplement). Furthermore, we augmented pyradiomics, 46 an open-source tool for extracting radiomics features, with experimental features that accept input in DICOM format and store calculated features as DICOM Structured Reports, which use IBSI-defined codes. We consider this collaboration to be one example of successful translation of domain experts' consensus into the standard and its implementation in tools for broader deployment and adoption.
The importance of metadata and the use of standards are becoming prominent with the increasing emphasis on quality of radiomics studies 47 and distributed learning, 48,49 where data cannot be shared outside the institution because of privacy, size, or regulatory reasons. Documentation of the acquisition protocol and potential confounding factors related to analysis, provenance of the individual annotations and analysis results, and use of unique identifiers to prevent cross-contamination of testing and training data are important for ensuring quality of the analysis. The use of standards is critical to scale production deployment of federated learning systems. They provide improved consistency of data representation and harmonization of imaging with the analysis results.
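The role of unique identifiers in keeping testing and training data separate can be illustrated with a minimal sketch. It assumes each image series is described by a patient identifier and a series identifier (both hypothetical here) and assigns every patient, rather than every series, to exactly one partition using a deterministic hash. This is one possible scheme, not a method prescribed by the project.

```python
import hashlib


def assign_split(patient_id, test_fraction=0.2, salt="qi-study"):
    """Deterministically assign a patient to 'train' or 'test'.

    The patient identifier is hashed with a study-specific salt, so the
    assignment is reproducible and does not depend on the order of the data.
    """
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1]
    return "test" if bucket < test_fraction else "train"


def split_series(series, test_fraction=0.2):
    """Partition (patient_id, series_uid) pairs at the patient level."""
    splits = {"train": [], "test": []}
    for patient_id, series_uid in series:
        split = assign_split(patient_id, test_fraction)
        splits[split].append((patient_id, series_uid))
    return splits


# Hypothetical identifiers: three patients, some with repeat scans.
records = [
    ("P001", "1.2.3.1"), ("P001", "1.2.3.2"),
    ("P002", "1.2.3.3"),
    ("P003", "1.2.3.4"), ("P003", "1.2.3.5"),
]
splits = split_series(records)
train_patients = {patient for patient, _ in splits["train"]}
test_patients = {patient for patient, _ in splits["test"]}
print(sorted(train_patients), sorted(test_patients))
```

Because the assignment depends only on the patient identifier, longitudinal scans of the same patient always land in the same partition, which is the property that prevents cross-contamination.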
Overall, we emphasize the importance of a collaborative approach for addressing challenges in imaging informatics at the national and international levels. Topics such as data standardization, interoperability, and sustainable open-source software development cannot be addressed by an individual group. An open, coordinated effort involving a broad range of stakeholders, including clinical researchers, engineers, and industry collaborators, is essential. Joint development of the standard, reference implementations and datasets, and reusable components is ultimately beneficial to both academic and commercial efforts.
Most recently, the Brigham and Women's Hospital (BWH) team led a consortium of collaborators, including many of the QIICR investigators, who were awarded the contract to establish the NCI Imaging Data Commons (IDC), 50 a new component of the NCI Cancer Research Data Commons (CRDC). Our IDC approach is motivated by our QIN and QIICR experience, and will use DICOM to harmonize analysis results that IDC will share.
Challenges remain in establishing and maintaining the infrastructure of analysis tools and standardized imaging datasets. Sustainability of open-source tools developed in academia is often problematic, because continuous maintenance effort is required for changes in dependencies and operating systems, user support, and implementation of new features. Our use of 3D Slicer as a delivery platform partially mitigates this issue by leveraging infrastructure, community effort, and contributions from numerous other supporting projects. Development and dissemination of most of the QIICR tools would be extremely challenging without leveraging the 3D Slicer infrastructure. Sustained support for extensible open-source platforms that provide generic reusable components and a marketplace-like infrastructure for new contributions is critical to reduce redundancy, build community, and support tool dissemination and outreach. Despite significant progress in community adoption of DICOM for QI results, substantial effort is still required to integrate it into academic and commercial tools and workflows. Although improving, support in commonly used tools is limited. Adoption of FAIR (Findable, Accessible, Interoperable, Reusable) data stewardship principles 51 requires time and sustained community effort to improve the standards, tools, documentation, and outreach. We envision the CRDC IDC as a mechanism and laboratory to refine and demonstrate best practices for curating and interoperating with standardized image and image-derived data, and improve access to QI and AI analysis tools.