The case for continued funding of cohorts has been made in part in previous commentaries [1, 2]; in contrast, Potter has called for the creation of a new “last cohort” [3, 4]. Such a last cohort, a large-scale human observational study, would include genetics, environment, early-detection markers, molecular classification of outcome, and treatment data to inform clinical practice [3]. In effect, it would replicate the existing cohort studies that together follow 4 million men and women in 39 cohorts, over 2 million of whom have blood samples stored as a source of germline DNA [5]. Potter indicated that a new cohort focused on collecting fresh tissue from incident cancer cases should take precedence over continued funding of existing cohorts and over the inefficient, piecemeal addition of tumor samples to these cohorts [4]. Such fresh tissue would help reduce outcome heterogeneity. Furthermore, in 2005, he argued that cases should be followed to evaluate outcomes and to address survivorship issues, an approach we adopted in the Nurses’ Health Study in 1992 by adding quality-of-life assessment and follow-up of breast cancer cases, with subsequent funding to evaluate outcomes (R01AG014742, GAC PI). Here, I build the case for continued follow-up, focusing on key attributes of cohorts that already meet the goals Potter proposed for a new cohort and that provide a greater return on investment than starting a new “last cohort”. These attributes include the basic nature of ongoing cohorts: they are hypothesis-driven by design and must adapt to emerging technologies over time. Importantly, cohort investigators must identify and address gaps in knowledge that will inform public health strategies and clinical practices. Above all, cohorts must capitalize on their unique features to address public health priorities and inform our prevention strategies.

Hypothesis-driven cohorts

While it is a given that cohorts, like other NIH-funded research endeavors, must be hypothesis-driven, it is instructive to briefly review the origins of several major cohorts that have helped shape public health in the United States and internationally. Take, for example, the American Cancer Society cohort studies, which have documented the major impact of smoking on cancer and mortality and serve as the basis for CDC projects and global estimates of the burden of tobacco [6]. The British Doctors Study likewise confirmed, with a prospective design, the original findings of Doll [7, 8] relating smoking to excess lung cancer and other conditions.

The Nurses’ Health Study, on the other hand, was established to address the potential relation between use of oral contraceptives and breast cancer and to evaluate relations between smoking and cancer, extending findings to women [9]. The Health Professionals Follow-up Study was first funded to address dietary etiologies of coronary heart disease [10] but then moved to NCI-based funding with a focus on cancer [11]. Other cohorts have been established to address cancer in specific racial and ethnic groups, such as the Multiethnic Cohort [12], the Black Women’s Health Study [13], and the Southern Cohort [14], and in specific occupations, such as the Agricultural Health Study [15].

Adapting technology

Numerous examples attest to the adaptive design of cohorts and to their development and application of new technologies to further understanding of disease etiology and the potential for prevention. Examples range from the development and validation of dietary assessment methods for large population studies [16] to the collection of blood samples from tens of thousands of study participants [17, 18], the collection of tissue blocks to provide more detailed classification of endpoints [19], and the assessment of lifestyle after diagnosis to document relations to cancer survival [20]. Technological adaptation has also refined approaches to conducting cohort studies [21] and expanded the uses that can be made of formalin-fixed tissue to address biologic mechanisms for exposure–disease relations [22].

Blood samples collected to address hypotheses relating hormones to cancer risk [23] also provided DNA for the evaluation of molecular markers [24], then contributed, across numerous cohorts, to studies of pathways of potential importance in the etiology of breast and prostate cancer in the Breast and Prostate Cancer Cohort Consortium (BPC3) [25], and finally to genome-wide association studies of SNPs and cancer [26].

Identifying gaps in knowledge

To fully address the role of lifestyle in cancer etiology and prevention, we must understand when in the course of disease an exposure is most important. Breast cancer findings from the follow-up of atomic bomb survivors in Japan attest to the value of cohort studies in defining the exposure–disease relation over the life course and in the course of etiology [27]. Findings such as these also point to the need for studies of lifestyle during childhood and adolescence in relation to breast cancer risk [28]. Cohorts addressing diet, growth, and cancer risk work to fill this gap in knowledge [29, 30]. Follow-up of childhood cancer survivors offers one important example of cohort findings informing clinical practice guidelines [31]. In addition, cohorts have been able to supplement existing data resources to fill in gaps in exposure over the life course. For example, validated measures of self-reported body shape and adiposity [32], added to several cohorts, have proven valuable in relation to cancer risk [33]. Likewise, we added recall of high school diet to the Nurses’ Health Study and then to the Nurses’ Health Study II [34], along with a more detailed assessment of physical activity [35], to further refine our ability to address adolescent exposure and breast cancer risk [36, 37]. Validation studies show that these measures perform adequately to detect important relations. The California Teachers Study took this approach from its beginning, assessing key exposures over the life course and relating them to cancer risk [38–40].

Clearly, in the setting of established data collection, storage, and participant follow-up, adding data collection to fill in periods of the life course that may be particularly important is a valuable and cost-efficient strategy for uncovering details of exposure–disease relations.

Understanding temporal details within exposure–disease relations is essential to inform the timing and population characteristics for prevention strategies. Cohort studies have developed and validated measures of intermediate endpoints such as colon polyps and benign breast lesions, adding these endpoints to the existing cancer endpoints that were of primary importance for initiating the cohort studies [41]. With these endpoints, we can evaluate the timing of diet and activity in relation to precursor lesions, progression to invasive disease, and progression from invasive disease to recurrence and mortality.

Another approach to filling gaps in knowledge has been the establishment of additional cohorts to address specific populations. As noted above, these have often been defined by race, ethnicity, or occupation. Another example relates to adolescent exposure and the creation of the Growing Up Today Study. Initial attempts to expand our assessment of diet, physical activity, and growth in relation to subsequent cancer risk were turned down as not innovative or as lacking validated measures of diet. With non-federal funds, I led the creation and validation of an adolescent diet assessment tool [42–44] before being awarded NICHD funds to study diet, adiposity, and excess weight gain relative to gain in height among adolescent children of participants in the Nurses’ Health Study II [45]. Again, because the mothers were already participating in the study, additional data sources (and long-term tracking of participants) were available at little marginal cost.

Based on these examples, I conclude we should not abandon existing cohorts to reallocate funds to the creation of a new final cohort.

Return on investment grows over time

Let me explain. The return on the investment (the sunk costs) in established cohorts is substantial, and this return grows over time. Numerous cohorts have added biologic sample collection to extensive exposure histories and have added endpoints, both to maximize the use of extensive exposure data and to allow consideration of trade-offs between the risks and benefits of components of lifestyle and other exposures. Furthermore, to address the key questions for informing prevention, we must clearly document when in the time course of disease an exposure is important for etiology, at what level of exposure the risk of cancer changes, and, importantly, how much exposure must change to reduce risk, how long that change must be sustained to achieve a reduction in incidence, and whether protection persists after cessation of exposure. (See Table 1.)

Table 1 Questions to frame cancer prevention

As an example, consider calcium intake and risk of colon cancer. As Martinez and colleagues have elegantly summarized, pooled data from prospective studies of diet and cancer show that risk of colon cancer is elevated at very low levels of intake. Risk declines as intake increases up to about 1,000 mg per day and then flattens, with little further reduction in risk at greater intakes (see Fig. 2 in Martinez et al. [48]). Consistent with this relation, the Women’s Health Initiative randomized women whose baseline intakes were already approximately 1,151 mg per day and observed minimal change in risk over 5 years of supplementation [49]. Similar issues with the level of intake among study participants also apply to the observed lack of benefit for vitamin D in this trial [49] (see Fig. 1). Although not all reporters interpreted them as consistent with the underlying epidemiology, these findings nevertheless show the importance of key prevention questions in defining prevention strategies and, in this case, the value of combining cohort data across many studies to define the levels of exposure necessary to reduce risk.
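The logic of the calcium example can be made concrete with a small numerical sketch. The Python code below assumes a hypothetical threshold (hockey-stick) dose-response in which risk falls as intake rises to 1,000 mg per day and is flat above it, then computes the rate ratio a trial would expect when supplementing populations with different baseline intakes. The per-100-mg risk gradient, the 1,000 mg supplement dose, and the function names are illustrative assumptions, not estimates from the pooled cohorts or the Women’s Health Initiative.

```python
# Hypothetical threshold (hockey-stick) dose-response, shaped like the
# pooled-cohort curve described in the text: risk falls as intake rises up to
# a threshold, then flattens. THRESHOLD_MG and RR_PER_100MG_BELOW are
# illustrative assumptions, not estimates from any study.
THRESHOLD_MG = 1000.0
RR_PER_100MG_BELOW = 1.05  # assumed: each 100 mg/day below threshold raises risk 5%


def relative_risk(intake_mg: float) -> float:
    """Risk relative to someone at or above the threshold (RR = 1.0 there)."""
    shortfall = max(0.0, THRESHOLD_MG - intake_mg)
    return RR_PER_100MG_BELOW ** (shortfall / 100.0)


def trial_effect(baseline_mg: float, supplement_mg: float) -> float:
    """Expected rate ratio (supplemented vs. placebo) if all start at baseline_mg."""
    return relative_risk(baseline_mg + supplement_mg) / relative_risk(baseline_mg)


if __name__ == "__main__":
    # A population already above the plateau gains nothing from supplementation,
    # whereas low-intake populations would be expected to benefit.
    for baseline in (400.0, 700.0, 1151.0):
        ratio = trial_effect(baseline, 1000.0)
        print(f"baseline {baseline:>6.0f} mg/day -> rate ratio {ratio:.3f}")
```

Under these assumptions, a population starting at approximately 1,151 mg per day shows a rate ratio of exactly 1.0, while a population starting at 400 mg per day shows an expected reduction of roughly 25%: the same supplement, tested in different baseline populations, yields different answers.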

Fig. 1

Vitamin D and colon cancer. Cohort data from Feskanich et al. [64]

Filling in exposure over the life course, as described above, clearly informs our understanding of exposure and disease etiology. Expanding on this, we can consider the role of cohorts in identifying exposures that affect precursor lesions to a greater (or lesser) extent than invasive disease, and in relating lifestyle after disease onset to subsequent outcomes. Numerous examples demonstrate these points. For example, the availability of repeated measures of exposure allows us to relate change in exposure to change in risk, or sustained high exposure to achievable risk reduction. For each of these questions, the cohort offers a broad set of data that, with further refinement, greatly adds to our understanding and ultimately informs strategies for prevention. Killing (or terminating) existing cohorts would be an incredible waste of existing resources as we continue to address the key questions that will translate the population-level potential for cancer prevention into focused strategies that achieve the goal of preventing more than half of all cancers in Western societies.

Repeated measures inform prevention

Numerous components of lifestyle have been studied by capitalizing on the availability of repeated measures in cohort studies. Examples include smoking cessation, continued non-smoking, and the decline in cancer incidence [50]; aspirin and colon cancer [51]; weight loss and breast cancer [52]; and methods development [53], to name just a few.

Consider aspirin and prevention of colon cancer. Clearly, the time course matters. Epidemiologic data highlighted the potential for prevention of colon cancer [51, 54], but refining this understanding required repeated measures and intermediate endpoints. For example, in the Health Professionals Follow-up Study, Chan and colleagues showed that both dose and duration of exposure matter, observing a 25% reduction in risk after 10 years of use and a greater reduction with higher doses [54]. Finally, a combined analysis of randomized trials confirmed these epidemiologic associations and demonstrated unequivocally that benefit is not observed until 10 or more years of use [55].
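To illustrate why repeated measures are needed to estimate duration of use, the Python sketch below derives years of uninterrupted regular aspirin use from a participant’s sequence of questionnaire reports. The biennial cycle, the treatment of a missing questionnaire as an interruption, and the 10-year cut point used for binning are assumptions chosen for illustration; the function names are hypothetical and do not describe how any particular cohort codes exposure.

```python
from typing import List, Optional

# Hypothetical sketch: duration of regular aspirin use from repeated reports.
# Assumes biennial questionnaires and that a report of regular use covers the
# 2 years since the previous cycle (simplifying assumptions, not any cohort's
# actual coding rules).
CYCLE_YEARS = 2


def consecutive_use_years(reports: List[Optional[bool]]) -> List[int]:
    """Years of uninterrupted regular use as of each questionnaire cycle.

    True = regular use reported, False = no regular use,
    None = missing questionnaire (treated as interrupting the run,
    a conservative assumption).
    """
    durations = []
    run = 0
    for report in reports:
        # Extend the current run on a report of use; otherwise reset to zero.
        run = run + CYCLE_YEARS if report else 0
        durations.append(run)
    return durations


def duration_category(years: int) -> str:
    """Bin duration using an illustrative 10-year cut point."""
    if years == 0:
        return "non-user"
    if years < 10:
        return "<10 years"
    return ">=10 years"


if __name__ == "__main__":
    steady = consecutive_use_years([True] * 6)
    interrupted = consecutive_use_years([True, True, False, True, True, True])
    print(steady, [duration_category(y) for y in steady])
    print(interrupted, [duration_category(y) for y in interrupted])
```

Under these assumptions, a participant reporting regular use on six consecutive questionnaires is classified as a 10-or-more-year user only at the fifth and sixth cycles, while a participant whose use lapsed at the third cycle never reaches that category; a single baseline report could not make either distinction.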

Additional endpoints

Even greater return on investment is achieved through the addition of endpoints for the onset of other diseases and for total mortality [15, 56–58]. In each case, one gains the added value of placing exposure–disease relations in a broader public health context by potentially weighing risks and benefits across multiple endpoints. While such approaches pose challenges in some research settings, such as cancer-specific research institutes and organizations, adding other chronic diseases to cohorts has yielded substantial gains in understanding, with the marginal cost limited to documenting the endpoints, because questionnaire-based exposures are already collected and available.

An innovative resource complementing the California Teachers Study is linkage to the California hospital discharge databases (Office of Statewide Health Planning and Development), which has been validated as a resource for multiple endpoints beyond cancer [59]. The investigators have put in place permissions and systems to use this resource, capitalizing on the cohort to address a broader range of health issues, placing exposures and outcomes for cancer in the context of other major health events, and enriching the self-reported data with the history of gynecologic and other surgeries that may modify cancer risk. This added resource has supported discovery, including publication of the first paper on the epidemiology and correlates of adenomyosis (a condition that, although similar to endometriosis, affects the muscular wall of the uterus) [60]. The investigators have also published on predictors of surgery for endometriosis and fibroids in the cohort [61]. As the value of additional endpoints becomes more widely recognized, other cohorts are linking to records to identify non-cancer outcomes [62].

In summary, return on investment can be considered through the gains as outlined above:

  • Understanding the time course of disease

  • Evaluating change in behavior and change in disease risk

  • Adding endpoints to evaluate the exposures in relation to multiple diseases at low marginal cost and balancing risks and benefits of exposures/behaviors.

As I have emphasized in previous writing on cohorts [1], maintaining high follow-up rates is key to internal validity. Achieving such rates reflects, in part, the adaptive nature of cohort studies: the use of web-based approaches to follow participants, e-mail in addition to postal addresses, and the like attests to the adaptability of this design. In addition, ongoing quality control is needed in cohorts that rely on tumor registry linkages to confirm incident cancers, because linkage performance can vary over time.

Evaluating cohorts

Within the NCI Division of Cancer Control and Population Sciences (DCCPS), increasing emphasis on the evaluation of major initiatives has led to new strategies to quantify collaborative research [63]. Despite the overall contribution of cohorts to cancer prevention, how does one evaluate specific cohorts and potentially choose among them for continued funding and follow-up? With Debbie Winn, we proposed a set of metrics to evaluate large initiatives such as cohorts within DCCPS [64]. These are summarized below:

  • Discovery

    • To explain the etiology of diseases and health conditions (e.g., journal articles, impact factors)

  • Development

    • To provide a basis for developing control measures and prevention procedures for groups and populations at risk (e.g., determination of causes, public health guidelines, risk models)

  • Delivery

    • Implementation, use of findings, evidence-based public health policy and clinical guidelines (e.g., public awareness, policy applications)

While we applied these metrics to the Nurses’ Health Study as an example in our original writing, applying them to other cohorts confirms their utility. For example, under discovery metrics, including journal publications and the impact ranking of the journals, we showed that publications during the first 10 years of a cohort are typically modest in number. In the Nurses’ Health Study, these were largely cross-sectional studies and amounted to a total of 25 papers in the first 10 years of follow-up [64]. The Childhood Cancer Survivor Study (CCSS) cohort [65], which draws on a population of 14,000 five-year childhood cancer survivors, accrues endpoints far more rapidly than an etiology or primary prevention cohort, yet it still shows the slow take-off in publications that reflects the time and investment required to establish a cohort and the accompanying methods for follow-up (see Fig. 2). During these first 10 years, papers tend to appear in discipline-specific journals; high-impact publications tend to follow as the cohort matures. Of note, 44 percent of CCSS publications through June 2009 appeared in journals with an impact factor of 10 or higher.

Fig. 2

Annual publications from Childhood Cancer Survivor Study, 1995–2008

Development metrics ask that findings from cohorts provide a basis for developing cancer control measures and prevention procedures for groups and populations at risk (e.g., determination of causes, public health guidelines, and disease risk classification models). Again, contributions from the Nurses’ Health Study (NHS) have been summarized previously [64]. Here, the CCSS provides several examples that translate ongoing findings from the follow-up of childhood cancer survivors into prevention and intervention strategies [65].

Delivery metrics include implementation, or use of findings, to guide evidence-based public health policy and clinical guidelines or policy applications. The CCSS shows the direct translation of its findings to clinical management guidelines: the cohort has facilitated identification of childhood cancer survivor populations at high risk for specific organ toxicity and secondary carcinogenesis, which has directly informed clinical screening practices [66, 67].

Other applications of these metrics to cohorts support their use in evaluating these studies. Publications from the Growing Up Today Study (GUTS), for example, follow the pattern of the NHS and CCSS, but with a stronger push for early publications to sustain funding; accordingly, a number of papers using prospective data, as well as methods papers drawing on the cohort, were published in leading discipline-specific journals within the first 5 years.

Closing cohorts

The lifespan of a cohort is not clear a priori: the British Doctors Study has followed participants from the baseline “questionary” through 50 years [68]. Other cohorts are continuing to mature, and some have ceased follow-up [6]. While individual cohorts may be evaluated against the metrics of discovery, development, and delivery, we might also consider the issues addressed under return on investment. Changing patterns of exposure, new populations defined by birth cohort, and immigration all suggest that, from a public health perspective, NIH must balance a portfolio of cohorts to adequately address lifestyle, occupation, and other factors that are best evaluated through long-term prospective cohort studies. Perhaps cohorts that are not adapting well to changing technologies and understanding of disease processes, and are not updating exposure information after diagnosis (or addressing similar factors outlined under return on investment), should receive lower priority for continued funding. Thus, in addition to evaluating specific hypotheses for competitive renewal of a cohort, we might also evaluate and score its contribution to date to discovery, development, and delivery, its return on investment to date, and the potential for return in the next and 10 years should renewal funding be implemented.

Conclusion

In sum, cohorts are complex and are most informative when exposures are updated or multiple endpoints are addressed. Most importantly for cancer etiology and prevention, they can place exposure in the time course of disease development to address key questions of timing and of the magnitude of change necessary to prevent cancer. Furthermore, when multiple endpoints are addressed, they can place etiology and prevention in a broad public health context, giving insights into the trade-offs between the risks and benefits of lifestyle and lifestyle changes. Established modern cohorts clearly support numerous scientific investigations beyond the primary etiologic hypotheses defined to justify cohort follow-up. Based on these considerations, it is clear that the return on investment of established cohorts can increase over time, and careful consideration should be given to the allocation of resources to maximize the pay-off for informing prevention strategies.