The Expanding Potential for Cohort Studies to Inform Priorities for Cancer Prevention

Abstract
While the contributions of cohorts to informing cancer prevention have grown, areas that have been omitted from recent discussions are identified and implications for training are discussed. In particular, greater familiarity with and collaboration in modeling of disease can inform prevention, building from cohort data to a synthesis of behavior, cohort, and population evidence.

Evaluation and appreciation of the contributions of cohorts and their role in cancer prevention continue to be debated as the rate of growth in resources for cancer research slows and future return on investment becomes an explicit evaluation criterion, applied by assessing the impact of proposed studies on public health or clinical programs.

The discovery, development, delivery paradigm of research used by the National Institutes of Health (NIH) has been applied to summarize the impact of large initiatives such as the funding of cohort studies [1]. In previous writing, the approach to evaluation of large cohorts has focused on these three summary measures:
• Discovery: to explain the etiology of disease and health conditions
• Development: to provide a basis for developing control measures and prevention procedures for groups and populations at risk (e.g., determination of causes, public health guidelines, risk models)
• Delivery: implementation, use of findings, evidence-based public health policy and clinical guidelines (e.g., public awareness, policy applications).
Examples from the Nurses' Health Study and the Childhood Cancer Survivors Study show that these metrics can summarize the contributions of cohorts and that measures of output grow over time, but the return on investment is poorly summarized [1][2][3][4]. We might do well to expand the classification further by better understanding how data from cohorts are used and how these data form an essential basis for the growing demand for modeling disease at the population level [5][6][7][8].
A leading series of cohorts has been run by the American Cancer Society (ACS), which has led U.S. prospective studies documenting the link between smoking cigarettes and lung cancer, from the first study of over 188,000 men [9,10] to the Cancer Prevention Study 1, a follow-up of 1 million men and women [11]. These studies also documented the benefits of stopping smoking: after more than one year of cessation the risk was lower than that of current smokers, and it took more than 10 years to return to the risk of never smokers [12]. Subsequent follow-up data informed the estimates of the contribution of tobacco smoking to cancer mortality in the USA, providing essential input to the report by Doll and Peto on the potential to prevent cancer [13]. Further updates of the ACS cohorts refined our understanding of the burden of tobacco across decades [14]. The Cancer Prevention Study cohorts have also led in documenting the burden due to overweight and obesity [15], setting the stage for the International Agency for Research on Cancer report on this topic and for global estimates [16]. Like other cohorts studying lifestyle and diet data [17], the ACS cohorts also contributed major data on mortality due to alcohol [18].
Beyond the reporting of these associations, investigators at ACS have collaborated with the Centers for Disease Control and Prevention (CDC) to provide necessary inputs to their programs for estimating the population burden of tobacco, summarized as morbidity, mortality, and economic costs [19].
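To make concrete how cohort relative risks can feed a population burden estimate, the sketch below applies the standard Levin-type population attributable fraction. It is an illustration only, not the CDC's implementation: the function names are hypothetical, and the prevalence, relative risk, and death counts in the usage note are invented for the example.

```python
def attributable_fraction(prevalences, relative_risks):
    """Levin-type population attributable fraction (PAF).

    prevalences    -- proportion of the population in each exposed category
                      (e.g., current and former smokers); never-exposed is
                      the implicit reference group
    relative_risks -- relative risk for each exposed category versus the
                      reference group, as estimated from cohort data
    """
    if len(prevalences) != len(relative_risks):
        raise ValueError("need one relative risk per exposure category")
    # Excess risk contributed by each exposure category, summed.
    excess = sum(p * (rr - 1.0) for p, rr in zip(prevalences, relative_risks))
    return excess / (1.0 + excess)


def attributable_deaths(total_deaths, prevalences, relative_risks):
    """Deaths attributable to the exposure: total deaths times the PAF."""
    return total_deaths * attributable_fraction(prevalences, relative_risks)
```

As a usage example with hypothetical inputs, `attributable_deaths(1000, [0.5], [3.0])` attributes half of 1,000 deaths to an exposure carried by 50% of the population at a relative risk of 3. The design point is that the relative risks come from prospective cohort data, while prevalence and death counts come from population surveillance.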
CDC applications build from ACS prospective data on tobacco to generate estimates of cancer burden for the nation and for states. Thus, the ACS cohorts, like other cohorts, have contributed substantially to the discovery of public health and biologically important relations between smoking and cancer [20,21]. Likewise, the early report by Thun on the relation of aspirin to reduced colon cancer mortality [22] opened a field of study that has included subsequent randomized trials [23,24] and further observational data [25,26] documenting the substantial reduction in risk with regular long-term use of aspirin.

Refining Assessment of Public Health Impact
In the report by Colditz and Winn [1] applying these measures of the impact of large initiatives to the Nurses' Health Study, a final consideration of the overall scientific context was the development and expansion of collaborations. The early collaborations among cohorts included the Oxford combined analyses of hormones (oral contraceptives [27] and postmenopausal hormones [28]) in the early 1990s and the diet and cancer collaborative individual patient data analysis (beginning in 1991) [17], whose outcomes have matured over time. Leadership of these collaborative efforts varies, and many are now led by investigators outside of the National Cancer Institute. Some address factors like weight change, BMI, and mortality [33].
In some cohorts, collaboration has taken the form of links to local or national laboratory investigators with expertise in particular assays, or to pathologists with expertise in a particular malignancy (breast, colon). More recently, expanding links to laboratory scientists developing assays [34] or tissue markers [35] exemplify the broadening role of cohorts in addressing public health issues in cancer, from etiology to estimation of population burden and evaluation of the impact of dietary and lifestyle guidelines [36]. These approaches for epidemiologic data are summarized in a recent discussion of transforming epidemiology for the 21st century [37].
However, beyond these broader applications by epidemiologists, cohort data can inform policy through incorporation into cost of illness studies [38,39], cost-effectiveness models that require natural history models for disease and outcomes [40][41][42][43][44], and disease models to evaluate screening and other prevention interventions [8,45]. Thus, to fully quantify the contribution of prospective data from cohort studies, we will need improved tracking of the use of such epidemiologic data as well as broader training of epidemiologists to engage in these collaborative studies.
The return on investment in prospective data grows when these data are used for development and delivery, not just for discovery and the generation of research papers explaining the etiology of cancer. The expanding fields of disease modeling, estimation of the population burden of exposures from tobacco to obesity and physical activity, evaluation of the benefits of screening, and mathematical modeling of disease development and progression after diagnosis all require grounding in exposure assessment, disease biology, and modeling techniques. For example, to evaluate the benefits of obesity treatment, a statistical model that appropriately accounts for intention-to-treat (or any potential confounding) can be constructed to analyze the data and to estimate the effect of the treatment [46]. Alternatively, agent-based modeling or microsimulation can be used to compare different treatment regimens when the data do not support direct comparisons. The data, however, can still be analyzed to inform the parameters of the agent-based models.
To draw more usefully on cohort data, stronger collaborations between those generating prospective data and those incorporating such data into population modeling will be needed. Thus, expanded transdisciplinary training programs and funding for collaborative projects must be a high priority. Identifying new ways to share cohort data and to bring evolving quantitative modeling methods to bear on evaluating the population burden of diseases or the societal gains from interventions should be given top priority. These modeling techniques quantify both public health burden and benefits in terms of life years, relative risks, or economic costs. For example, regression models are used to measure the association between key variables and the outcome variable while keeping all other variables fixed [47]. The measured association is either a marginal effect or an average difference ceteris paribus [47]. Another example is agent-based models. These models can be used to simulate the life history of individuals [48]. Under reasonable assumptions, any disease impact or policy or treatment intervention can be entered into the model, with the individual loss or gain as the output. The societal loss or gain can then be derived by aggregating the individual-level losses or gains.
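The aggregation from individual to societal gain described above can be sketched as a minimal microsimulation. This is an illustration under strong simplifying assumptions, not a model from the cited literature: each simulated person faces a constant annual probability of death, the intervention simply lowers that probability, and all names and parameter values are hypothetical. In practice, such probabilities would be informed by cohort data.

```python
import random


def simulate_life_years(n_people, annual_death_prob,
                        start_age=50, max_age=100, seed=0):
    """Simulate each person's life history year by year.

    Returns a list with the number of years each person survives
    beyond start_age (capped at max_age - start_age).
    """
    life_years = []
    for i in range(n_people):
        # One random stream per person, so the same person can be
        # replayed under a different scenario (common random numbers).
        rng = random.Random(seed * 1_000_003 + i)
        age = start_age
        # Survive another year whenever the draw is at or above the
        # annual death probability.
        while age < max_age and rng.random() >= annual_death_prob:
            age += 1
        life_years.append(age - start_age)
    return life_years


def individual_gains(n_people, baseline_prob, intervention_prob, seed=0):
    """Life years gained by each person under the intervention scenario."""
    baseline = simulate_life_years(n_people, baseline_prob, seed=seed)
    treated = simulate_life_years(n_people, intervention_prob, seed=seed)
    return [t - b for t, b in zip(treated, baseline)]


def societal_gain(n_people, baseline_prob, intervention_prob, seed=0):
    """Aggregate the individual-level gains into a societal total."""
    return sum(individual_gains(n_people, baseline_prob,
                                intervention_prob, seed=seed))
```

For example, `societal_gain(10_000, 0.03, 0.02)` would return the total life years gained in a hypothetical population of 10,000 when the annual death probability falls from 3% to 2%. Replaying each person with the same random stream in both scenarios keeps the comparison at the individual level, which mirrors the individual loss or gain output described in the text.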

Conclusion
The growing sophistication of modeling and the integration of biologic and population-level data will call for new skills among epidemiologists and disease modelers. These skills are needed to maximize our use of existing data resources and to better inform communities and policy makers about how changes at multiple levels, from policy to individual behaviors and therapies, can reduce the cancer burden across all sectors of society, now and in the future.