Acquisition and Management of Data for Translational Science in Oncology

Oncology clinical trials provide an opportunity to advance care for patients with cancer. Bridging basic science with bedside care, cancer clinical trials have brought new and updated scientific knowledge at a rapid pace. Managing subject data in translational science requires a sophisticated informatics infrastructure that enables harmonized datasets across all areas that could influence outcomes. Successful translational science requires that all relevant information be made readily available in a digital format that can be queried in a facile manner. Through a translational science prism, we look at past issues in cancer clinical trials and the new National Institutes of Health/National Cancer Institute initiative to address the need for database availability at an enterprise level.


Introduction
Translational science provides the opportunity to apply advances in science directly to patient care. In oncology, new and important scientific information is moving forward at an enterprise level. Managing data for translational science requires a sophisticated informatics infrastructure that can harmonize multiple datasets on subjects in all areas that influence outcome. Medical history, stage of disease, disease location, surgery, pathology, genomics/proteomics, imaging, radiation therapy, and medical oncology are equally important in determining outcome, both in tumor control and in normal tissue function post-therapy. In this chapter, we describe opportunities moving forward to enhance our knowledge and application of science to patient care. [...] and the American College of Radiology (ACR) core quality assurance centers in imaging and radiation oncology [1,3].
The integration of the cooperative groups established an economy of scale for the NCI clinical trials program. The five NCTN groups house a significant amount of data. These data include outcome data and tissue bank data, with IROC hosting invaluable information important for clinical trial outcome analysis. These data libraries house the largest volume of oncology-related information in the world. Because the data are acquired on clinical trials, the datasets and outcome information are the best in the world for consistency in data acquisition and management and for completeness of the information. They are among the best resources in the world for performing outstanding translational science research and for comparing institutional translational science against validated datasets. Currently, the data are fragmented and siloed within the multiple remaining statistical centers, tissue banks, and IROC. Moving salient trial information into a single or synergistic data system would be an important objective promoting translational science. Hundreds of thousands of complete datasets are readily available in these systems, which can be used to promote individual work and to validate the work of translational scientists for the next generation of clinical trials. This is the goal of big data acquisition and data management of this information. Each subject on study has pathology, imaging, therapy, and data for outcome analysis. Validated datasets with consistent and uniform acquisition of information will permit accurate assessment of trial outcome and provide quantitative significance to the work. The potential for successful application of this effort is within our reach; the challenge is to define a pathway to achieve the goals of the work.
Problem solving in oncology is challenging. To move the field forward with strong translational science and apply balanced judgments for disease management, the information acquired for review must be robust enough to appropriately power the study question and must have the quality needed to be certain that the conclusions are accurate and can be validated. Oncology data management can be challenging if the information under review is incomplete and unvalidated, resulting in inaccurate conclusions in outcome analysis. The database must also undergo self-renewal as process improvements become standard for evaluating outcome: imaging to validate patterns of failure, and pathology to review change in biomarker status relative to disease progression. Tools for biomarker identification, imaging, radiation oncology, and applied medical oncology are under constant revision, and databases for translational science need constant maintenance to ensure accuracy and applicability. Future strategic translational science objectives are clear and unambiguous. The more complete and accurate the information acquired, the more successfully science can be applied at the bedside. In the next section, we will describe challenges in clinical trial outcome interpretation when information is incomplete.

Hodgkin lymphoma
Hodgkin lymphoma is a unique disease that can affect children, adolescents, and adults of all ages. Chemotherapy has become the initial and primary therapy for this disease, with choice of agents and duration of therapy based on stage at presentation, subject medical status, and response to induction therapy. The use of radiation therapy remains under continued refinement and is influenced by both response to chemotherapy and volume of tumor at presentation. Pediatric Oncology Group protocol 8725 evaluated what would today be called intermediate and advanced stage subjects. In this study, subjects were treated with eight cycles of hybrid chemotherapy (ABVD-MOPP). After completion of chemotherapy, subjects were randomized to observation or to receive radiation therapy to all sites of disease identified on imaging at presentation. In the original Journal of Clinical Oncology publication, there was no difference in outcome in subjects who received radiation therapy [4]. Retrospective analysis of the study required imaging at presentation and before radiation therapy. The results revealed that subjects who received study-compliant radiation therapy had a survival rate that was 10% superior to those who received chemotherapy only. For the subjects treated in a manner not compliant with study guidelines (deviation), survival was identical to that of subjects who received chemotherapy alone. In other words, subject outcome with chemotherapy was not improved by consolidation radiotherapy delivered in a non-study-compliant manner. Most study deviations were secondary to excluding original disease from the treatment field [4,5].
Because of the study deviation rate, the next series of clinical trials in Hodgkin lymphoma required that the radiation therapy fields be reviewed pre-radiation therapy at the Quality Assurance Review Center (QARC/now IROC). The trials evaluated early stage subjects, with therapy titration secondary to response after two cycles of chemotherapy, and subjects with intermediate risk disease. With pretreatment review of radiation therapy objects, compliance with the study was outstanding. However, the images at presentation and response were collected by QARC to (1) confirm response and (2) define radiation therapy treatment fields. The images were reviewed retrospectively for response confirmation, and the central review agreed with the site assessment only 50% of the time, demonstrating the need to define response in a consistent manner if clinical trials were going to be designed to either titrate or augment therapy based on response to treatment [6].
COG AHOD0031 was the first clinical trial in the world to use central review of imaging objects to assess response to treatment and to review radiation therapy treatment objects in real time. The real-time review of objects was embedded into the trial design structure. The image review and the review of radiation therapy treatment objects were completed at QARC with immediate feedback to the site and the COG statistical center, which in turn triggered both secondary and tertiary points of randomization, with therapy titration based on good to complete response and therapy augmentation if response was incomplete (Figure 1). The trial accrued more than 1700 subjects. The completeness of the database, including outcome images, has generated many secondary projects, including response in bone with PET, radiomics of response to chemotherapy in pulmonary parenchyma and pleural effusions, and patterns of failure on protocol therapy [8][9][10]. The data management process involved in COG trials has established the infrastructure for translational science. Pathology objects are housed in tissue banks and outcome information is housed at the statistical center; nevertheless, this information could be made available as needed for secondary projects to be completed. One of the goals moving forward is to have all this information available in a single format. This will be discussed at greater length in The Cancer Imaging Archive (TCIA) section.
The advantage of the data management system housed at IROC for COG is the current ability to manage clinical trials for many subsets of subjects with Hodgkin lymphoma in a nimble manner with international involvement. With digital data transfer tools, information can be reviewed in real time to effectively manage adaptive trials in a facile manner. Very young subjects, older subjects with medical co-morbidities, CD30 therapies, and immunotherapy are now under study with real-time review of response coupled with biomarker analysis [11]. Expanding this strategy to a global and quantitative function is how translational science can be applied on an enterprise basis. Creating a central resource of all cancer research information and objects for investigators would enhance next generation science.
DOI: http://dx.doi.org/10.5772/intechopen.89700

Breast
Clinical trials have been the infrastructure for process improvements in care for breast cancer patients. The NSABP (National Surgical Adjuvant Breast and Bowel Project) clinical trial B-06 confirmed the use of radiation therapy in the definitive management of breast cancer. Clinical trials have optimized chemotherapy in this disease, confirmed the utility of hormonal therapy, and provided us the platform to titrate therapies including surgery. The advantages of clinical trials in breast cancer are self-evident [12,13]. Likewise, clinical trials in breast cancer also reflect the problems associated with limited data acquisition and how unintentional omission of data acquisition negatively influences translational science.
The former CALGB conducted a series of well-designed clinical trials intended to evaluate the role of chemotherapy dose escalation with Adriamycin and Cytoxan with the sequential addition of Taxol in node-positive subjects [13]. These trials, performed in sequence, predated the routine use of Her-2-Neu-directed therapy in this disease. Likewise, these trials in development predated the confirmation that radiation therapy provided a survival advantage for node-positive breast cancer subjects treated with chemotherapy. The principal investigator decided not to collect or inquire about how local care to the breast and lymph nodes was applied and accomplished. This decision was made because there was no unambiguous information available at the time of trial development that local care or local control affected survival; therefore, data collection was not thought to be of consequence and did not merit the cost or effort. Evidence to the contrary was made public in 1997 when two separate clinical trials were published together, both demonstrating a survival advantage when node-positive subjects were treated with radiation therapy on an adjuvant basis [14,15]. This made clinical trial interpretation challenging because subject outcome relative to type, duration, and specific chemotherapy could not be solely assigned to the chemotherapy delivered. Many of the subjects on this series of studies also received radiation therapy to doses and volumes that were non-uniform and site/investigator specific.
To address this situation, Sartor and colleagues from QARC (now IROC Rhode Island) attempted to collect radiation therapy treatment information in a retrospective manner by contacting institutions about specific subjects. This was partially effective but limited in execution for several reasons. This effort was not part of the intended data management of the studies; therefore, site investigator enthusiasm to support the effort was non-uniform. The second issue is one that is now more visible in modern clinical trials: to collect this information, institutional review boards (IRB) began to insist that subjects on study be re-consented for this effort. Accordingly, this became a significant barrier to project completion [16].
These trials accrued more than 3000 subjects and were well powered to achieve the objectives of the study. However, less than one-third of the radiation therapy information could be collected, and often the information received was incomplete. The information received clearly indicated that, without protocol guidelines, subjects who were treated received therapy that was significantly heterogeneous relative to dose and volume, and conclusions would have been difficult to validate. There was a trend for subjects who received Taxol to also have received radiation therapy, raising the question as to which therapy, or both, provided a care advantage.
A second issue foreshadowed a problem now seen in current breast cancer clinical trials. There were more than 300 local and/or local-regional failures in this study. Had radiation therapy objects been collected along with detailed information concerning the location and nature of the local failure, the application of the radiation oncology treatment fields, technique, and dose could have been reviewed to better ascertain how radiation therapy can and should be applied along the chest wall and regional draining lymph nodes. The lack of information has led to a paralysis in our understanding of issues that remain today.
ACOSOG Z0011 (Z11) was a clinical trial designed to evaluate limited axillary surgery in subjects with breast cancer, including those with limited nodal involvement. The objective was to validate that sentinel lymph node staging coupled with radiation therapy would be non-inferior to more comprehensive axillary surgery [17]. The trial asked for what is referred to as "tangential radiation therapy." This would imply that the axilla was treated to a partial volume. In studies evaluating this point using anatomical guidelines superimposed on radiation therapy treatment fields, approximately 60% of the axilla would be included in a traditional field that would not extend superiorly to the axillary vein nor posteriorly to the latissimus muscle [18]. The study proved to be positive relative to limited axillary surgery. However, radiation therapy treatment objects were not collected as part of the study, giving the impression that partial volume RT to the axilla could be considered a new standard of care. Jolie and colleagues decided to review this point and, with several colleagues and support from QARC (IROC), attempted to gather specific information as to how radiation therapy was delivered to subjects on study. The investigators encountered similar barriers to data acquisition relative to site enthusiasm and IRB approval. However, the investigators were able to determine that a significant number of subjects were treated with more comprehensive radiation therapy to extended volumes, including regional lymph nodes, that was not study directed. Therefore, conclusions concerning the application of radiation therapy to limited volumes of the axilla are premature [19].
Had radiation therapy digital datasets and radiation dose been collected as part of the study, interventional review pre-therapy could have been performed and the dataset could have been more uniform. In this situation, outcome images could be applied to patterns of failure analysis and define outcome relative to axillary therapy with more security. This issue continues to haunt even the most current breast cancer trials attempting to discern the appropriate radiation therapy target volume in subjects with limited and more extended axillary surgery. This has influence on normal tissue outcome and, despite nearly 50 years of clinical trials, it remains unanswered. If the datasets were complete, it is possible we might be closer to understanding how to apply therapy for both tumor control and optimal normal tissue function. This is of increasing importance as radiation therapy is now being asked to treat nodal regions with more limited surgery. If information acquired on study were more complete, we might be further along in our understanding of field placement. This also limits our ability to perform translational science, including pathology biomarkers for local failure and other potential areas of interest to the oncology community. These are problems of omission of data acquisition, and they limit our ability to use these datasets to review information in retrospect and apply knowledge to the next generation of clinical trials.

Head/neck
There is much to be learned as issues of head and neck management are influenced by biomarkers and relationship to viral origin of disease. To be accurate in data assessment, pathology and imaging objects need to be complete to intercompare both staging and therapy execution/outcome. Non-uniform treatment execution can limit study success and utility of the dataset for translational science.
Tirapazamine gained prominence as a hypoxic cell sensitizer to radiation therapy with favorable phase 2 results. In the HeadSTART clinical trial, tirapazamine was randomized against traditional chemoradiotherapy for locally advanced squamous cell carcinoma of the head/neck. In this study, both imaging and radiation therapy quality assurance objects underwent on-treatment review at QARC for study compliance. Pre-treatment review was not utilized because the trial included many international partners, and most of the involved sites credentialed for participation had primitive digital transfer tools at the time of clinical trial participation. The trial management committee determined that pre-therapy review would potentially be a barrier to accrual. Nearly 25% of the cases required radiation field adjustments to ensure that full dose was received to gross tumor as seen on central review. Of the 211 requests for field alteration, sites chose not to adjust the fields in 116 cases and chose to make the adjustment in 95. In all cases, the potential for deviation was due to potential exclusion of gross tumor from full dose and protocol-compliant radiation. If the adjustment was made on treatment, subjects had reasonable survival; however, their survival was less than that of subjects who had a compliant plan de novo. Where adjustments were not made, the trial management committee agreed with the assessment of the on-treatment reviewer in all but two cases. The trial committee asked the primary on-treatment reviewer to score all study deviations into two categories, clinically meaningful or not. Clinically meaningful deviations excluded gross tumor from full-dose radiation therapy. Subjects with clinically meaningful deviations had a significant decrease in survival, while those whose deviations were not considered significant had an outcome identical to those whose plans were adjusted for compliance on treatment. Both groups had survival less than the subjects who had a compliant plan upfront. The most unfortunate aspect of this study is that the deviation rate on trial overrode the point of randomization, rendering the experimental arm uncertain secondary to the quality of the radiation therapy. Therefore, the cases become less helpful for translational science endeavors, including pattern of failure analysis, as the outcome was influenced by the quality of the data [20,21].
Current studies in head and neck cancer place emphasis on the role of immunotherapy, both in comparison with traditional platinum-based therapy and with non-inferiority objectives to evaluate toxicity. In these trials, investigators have matured in the coverage of gross tumor. However, there is new disparity in coverage of targets deemed of intermediate and low risk, as investigators are empirically titrating the volume treated to ameliorate mucosal discomfort and late effects to multiple normal tissue volumes. Therapeutic titration will influence toxicity profiles. If investigators are titrating volumes, modern protocols may unfairly favor toxicity profiles generated by new regimens in comparison to the historical standard. Complete datasets, including outcome imaging with patterns of failure, will be important to compare both tumor control and toxicity moving forward.
These are important datasets, as strategies in artificial intelligence (AI) and machine learning in radiology and radiation oncology need to be developed from accurate and complete datasets, especially as the origin of disease becomes multifactorial. If objects are not targeted correctly or per protocol, it will be impossible to develop accurate AI tools that can be applied for translational science moving forward.

Lung
Treatment of patients with lung malignancy is challenging, as metrics for normal tissue function extend beyond the anatomical constraints that can be defined in radiation oncology planning images. Both tumor control and normal tissue function are influenced by dose to target and radiation dose to normal tissue, including volumes of parenchyma receiving low, intermediate, and high doses. One of the challenges moving forward is to ensure that volumes are drawn accurately to assess tumor control and cardio-pulmonary function. The RTOG 0617 study evaluated the role of targeted therapy coupled with both low- and high-dose radiation therapy to the tumor target. Radiation therapy treatment objects were submitted for review without diagnostic images to validate that what was drawn for treatment reflected all the tumor anatomy. It was assumed, perhaps accurately, that the planning CT would be sufficient to confirm that all disease was included in the intended target. Although the study did not show a benefit to high-dose radiation, the paradox was that in the early phase of the trial, the higher dose arm had statistically worse local control. One explanation is that gross tumor may not have been fully contoured in selected cases, possibly influencing trial outcome. As such, without primary imaging and outcome imaging, it is difficult to confirm the pattern of failure, possibly rendering these datasets less useful for secondary analysis. Complete datasets would be helpful to validate trial outcome and increase the utility of the data for secondary analysis and translational science [22].

Analysis
Successful translational science requires that all relevant information be made readily available in an informatics format that can be queried in a facile manner. Incomplete datasets can lead to conclusions that may be inaccurate. Complete datasets will permit the facts to drive study outcome, and investigators will feel the analysis is accurate. A senior clinical trial investigator once stated that the time to write your protocol is when you analyze your data, as you finally begin to see the questions you should have both asked and anticipated in the design phase of the trial. This is why informatics platforms must have enterprise-level query function to help answer questions not anticipated in study design. Datasets from clinical trials potentially make the best tools for moving translational science forward because (1) subjects are entered on trial for a specific purpose, (2) the stage and appropriateness for subject entry are confirmed with uniform standards, (3) information including pathology and imaging is collected and collated in a uniform manner, (4) subjects are treated in a uniform manner with treatment data available for review, and (5) outcome information is available, including imaging to validate outcome as appropriate. These datasets permit evaluation of a potential new biomarker because (1) the information may be available in situ as a digital object and (2) additional tissue may be available as part of an exhaustible resource housed in a tissue bank. Nimble query function can permit intercomparison between subjects on the same trial or other subjects on different studies with similar demographics for analysis. This can potentially evaluate and compare those with similar biomarkers on different studies. The datasets become a rich resource for science. At the moment, within both industry and the NCTN, the information is housed in disparate locations, including statistical data centers that house subject demographics and outcome, IROC, which houses images and radiation therapy treatment objects, and tissue banks, which house both digital maps of known biomarkers and tissue held for additional studies when needed. As a result, efforts to perform secondary analysis on objects become challenging and are often lost in process and sequential approvals for use. This can take considerable time and effort, and often the scientific question is muted and defeated by processes designed for data protection.
To make science move forward at an enterprise level, this information needs to be housed in a single platform designed to promote and support the modern scientist with nimble and comprehensive query function for robust data review. The next section will discuss how these objects are currently managed and present a strategy for modern science moving forward.
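The idea of bringing siloed information into one queryable view can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the domain names, the subject identifiers, the toy values, and the helper names (`SubjectRecord`, `harmonize`, `query`) are hypothetical and do not describe any actual NCTN, IROC, or TCIA system or schema. The sketch merges records from three separate sources by subject identifier, flags which subjects have a complete dataset, and runs a simple query that was not anticipated when the data were collected.

```python
from dataclasses import dataclass, field

# Hypothetical data domains; the text says each subject on study has
# pathology, imaging, therapy, and outcome information.
REQUIRED_DOMAINS = ("outcome", "imaging", "radiation_therapy", "pathology")


@dataclass
class SubjectRecord:
    subject_id: str
    domains: dict = field(default_factory=dict)

    def missing(self):
        """Return the required domains not yet present for this subject."""
        return [d for d in REQUIRED_DOMAINS if d not in self.domains]

    def is_complete(self):
        return not self.missing()


def harmonize(*sources):
    """Merge per-silo mappings {subject_id: {domain: payload}} into one record per subject."""
    merged = {}
    for source in sources:
        for subject_id, domains in source.items():
            record = merged.setdefault(subject_id, SubjectRecord(subject_id))
            record.domains.update(domains)
    return merged


def query(records, predicate):
    """Return sorted subject IDs whose merged record satisfies the predicate."""
    return sorted(r.subject_id for r in records.values() if predicate(r))


# Toy, made-up data standing in for three disparate locations.
statistical_center = {
    "S001": {"outcome": {"relapse": False}},
    "S002": {"outcome": {"relapse": True}},
    "S003": {"outcome": {"relapse": True}},
}
imaging_rt_core = {
    "S001": {"imaging": ["baseline", "response"], "radiation_therapy": {"plan": "compliant"}},
    "S002": {"imaging": ["baseline"]},
    "S003": {"imaging": ["baseline", "failure"], "radiation_therapy": {"plan": "deviation"}},
}
tissue_bank = {
    "S001": {"pathology": {"biomarker_x": "positive"}},
    "S003": {"pathology": {"biomarker_x": "negative"}},
}

records = harmonize(statistical_center, imaging_rt_core, tissue_bank)
complete = query(records, SubjectRecord.is_complete)
relapsed_complete = query(
    records, lambda r: r.is_complete() and r.domains["outcome"]["relapse"]
)

print("Complete datasets:", complete)
print("Missing for S002:", records["S002"].missing())
print("Relapsed with complete data:", relapsed_complete)
```

In this toy example only subjects with all four domains can support a patterns of failure question, which mirrors the point made above: a subject missing radiation therapy or pathology objects drops out of any secondary analysis that depends on them.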

The Cancer Imaging Archive (TCIA)
TCIA is an initiative of the National Institutes of Health (NIH) and the National Cancer Institute (NCI) to address the need for enterprise database availability to promote translational science and to validate concepts and ideas against a strong database. The archive resides in the public domain and the objects are fully de-identified per government standards. The principal investigator of TCIA is Fred Prior, PhD, who is the chair of quantitative science at the University of Arkansas. Dr. Prior has long-standing expertise in database management and development of image datasets. He has applied his expertise to this effort together with co-investigators Joel Saltz, MD, PhD, and Ashish Sharma, PhD. Dr. Saltz is the chair of bioinformatics at Stony Brook University and an international expert in digital pathology and integrated database function. Dr. Sharma is at Emory University and is an informatics expert who helps move TCIA to enterprise function.
The digital archive houses imaging and outcome information that can be used for translational science. The current portfolio includes datasets from all cancer subtypes in adult oncology and will soon be updated with datasets from pediatric oncology. The information includes clinical information, imaging, radiation therapy datasets and treatment plans, pathology objects including genomics, and other supplemental information required for analysis. The current portfolio can be accessed through www.thecancerimagingarchive.net. TCIA is an important initiative as it makes information available to all interested scientists who can apply information directly to their projects of interest.
As TCIA moves forward, it will be essential to find mechanisms for enterprise-level data capture and uniform data formatting for all trials, both within the NCTN and industry. Clinical trials within the NCTN will likely need to be designed in the future with data capture as part of the protocol, and processes embedded in the trial will need to be made uniform in the trial design so that data can be uploaded to TCIA once the trial is complete and objects are fully de-identified for the public archive. Investigators will want to be able to look at specific subject subsets in clinical trials and evaluate both image and biomarker expression for subsets, both within individual trials and in intercomparison with other trials, for outcome analysis. This will be true in all adult and pediatric oncology subtypes. We are learning more each day about response-directed therapies and the relationship of response and disease progression relative to biomarkers. The informatics tools will need to be robust to accommodate this element of data exchange and evaluation. Digital pathology will play an increasingly important role, including capture of profiles that are identified to date in a manner similar to oncotype for breast cancer. Tissue will need to be stored for the biomarker not yet identified. Digital capture makes identified objects inexhaustible while tissue is exhaustible, hence the reason we need to store and protect this invaluable resource. Radiation therapy treatment plans will need to be fused with images that define targets and correlated with images that review the pattern of failure. It will be through these mechanisms that we will all mature in our understanding of disease processes and the success/failure of our applied therapies [23,24].

Conclusions
Enterprise-level function of comprehensive clinical trial datasets is closer to reality than it has been in the past. The quality of the dataset will significantly influence the quality of our understanding of the applied information and how we use clinical information moving forward. Efforts need to be made to optimize existing datasets in the NCTN and industry to help move knowledge forward in a manner we can validate and trust.

Figure 1. Real-time imaging and radiation therapy object reviews on AHOD0031. This image is under the Creative Commons Attribution License [7].