Biomarkers in occupational cancer epidemiology: considerations in study design.

Epidemiologic studies of occupational groups have been central to the identification of human carcinogens. The incorporation of a biochemical component into occupational studies of cancer can expand the possibilities for identifying human carcinogens and for understanding the disease process. Two epidemiologic studies of occupation and cancer which include evaluation of biomarkers are described. The association of acetylator phenotype with bladder cancer risk was studied in benzidine-exposed workers. The association of benzene-related leukopenia with leukemia is being studied in benzene-exposed workers. These investigations illustrate issues in the use of biomarkers in epidemiologic studies of cancer risk. Such studies require the identification and characterization of the population at risk. Disease susceptibility factors are amenable for inclusion in these studies and can be statistically modeled as exposure-effect modifiers. Biomarkers of exposure are mainly of importance in short-term longitudinal and cross-sectional studies of exposure and intermediate outcomes and for validation of other data sources. Several sources of error can affect the results of molecular epidemiologic studies. Aside from minimizing laboratory error, consideration must be given in the design and execution of these studies to potential problems in subject selection and field collection of biologic samples and other relevant data.


Introduction
Epidemiologic studies of occupational groups have been central to the identification of human carcinogens. Of the 50 agents classified by the International Agency for Research on Cancer as carcinogenic to humans, 27 are occupation related (1). The incorporation of a biochemical component into occupational studies of cancer can expand possibilities for identifying human carcinogens and for understanding the disease process. The use of biomarkers in epidemiologic studies of cancer has been suggested for measuring the magnitude of exposure to carcinogens, or body burden; for characterizing the biologic response to such exposures; and for identifying markers of disease susceptibility (2-4). I shall discuss the use of biomarkers in two of our ongoing epidemiologic studies of occupational exposure and cancer risk and describe general considerations in carrying out and interpreting such studies. The focus of this paper is the use of biomarkers in epidemiologic studies on the relation between occupational exposure and the development of cancer. First, a brief description of the epidemiologic method is given.
Environmental Epidemiology Branch, National Cancer Institute, Bethesda, MD 20895.

Epidemiologic Methods
Epidemiology is the study of the frequency of disease in populations, with disease frequency serving as a measure of risk. Risk is considered in terms of environmental factors, such as occupation, diet, tobacco use, and other behaviors; possible genetic factors, such as family history of cancer; and other factors, such as age and race. For example, the increased frequency of bladder cancer in workers exposed to benzidine is evidence for the role of benzidine in human bladder cancer development.

The epidemiologic study of cancer requires consideration of exposure and disease development over extended periods of time, typically several decades. In the cohort study design, a statistical sample of the population of interest is identified (e.g., workers in a chemical plant), and subjects are classified into groups on the basis of occupational exposure and personal factors of interest, such as year of birth and sex. The classification of occupational exposure generally accounts for the level and period of exposure, although other parameters such as peak exposure may also be of interest. The subsequent development of disease in the exposed groups is then determined and the frequency of disease compared as the relative risk of disease.
In the case-control study design, statistical samples of cases and controls are selected from the study population, and historical exposure and personal factors of interest are compared between the two study groups. As in the cohort study design, exposure classification takes level and periods of exposure into account. The case-control study can be seen as a theoretical derivative of the cohort study, in which the cases occurring during a period are compared to a sample of the study population, the controls, at risk during that time. Because of this similarity, case-control studies provide the same information as cohort studies about the relative risk of disease.
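As an illustrative aside, the correspondence between the two designs can be sketched numerically in Python. The function names and all counts below are hypothetical and are not drawn from the studies described in this paper. The sketch assumes that controls are sampled from the whole population at risk, so that their exposure distribution mirrors that of the cohort.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Ratio of disease frequency in exposed vs. unexposed cohort members."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical cohort: 1000 exposed workers (30 cases), 4000 unexposed (20 cases).
rr = relative_risk(30, 1000, 20, 4000)

# Hypothetical case-control sample from the same cohort: all 50 cases, plus
# 200 controls whose exposure split (40 exposed, 160 unexposed) matches the
# 1000:4000 split of the population at risk.
or_sampled = odds_ratio(30, 20, 40, 160)

print(f"cohort relative risk:    {rr:.2f}")
print(f"case-control odds ratio: {or_sampled:.2f}")
```

Under these assumptions the two estimates coincide. With real data the control sample reflects the population's exposure distribution only approximately, so the odds ratio carries sampling error.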
These and related study designs are also used in biomarker research for end points other than disease. Short-term longitudinal studies, for example, comparing pre-exposure and post-exposure measurements, are used to determine changes in biomarkers over a short period. Cross-sectional studies examine the association of various factors, including biomarkers, at one point in time. These and similar study designs are of great importance for assessing the short-term biochemical and biologic effects of exposure and for studying the interrelationship of these factors with disease susceptibility. Short-term longitudinal and cross-sectional studies are the respective design equivalents of cohort and case-control studies, except for the major difference that the full temporal aspect of exposure, generally over many years, and subsequent cancer development are not considered.
Two ongoing studies of the association between occupational exposure and cancer risk that include biomarkers are briefly described below. The description and following discussion of these studies serve to illustrate issues in the design and execution of such studies.

N-Acetylation and Bladder Cancer
When examined in general population groups, slow acetylation status and bladder cancer are only weakly associated (5), while studies of bladder cancer cases with nonspecific occupational exposures show about 2- to 3-fold risks (6,7). Cartwright et al. (8) found about a 17-fold risk for bladder cancer associated with slow acetylation in dye workers, and Hanke et al. (9) found an 8-fold risk in a group of workers exposed to aromatic amines. These strong findings in industrially exposed populations suggest that acetylation rate may be important only when exposures to aromatic amines are substantial, as would be found in the occupational setting.
In 1988, we began studies in an Asian population exposed to benzidine to expand upon the previous epidemiologic studies. In this study we collaborated with W. Bi of the Chinese Academy of Preventive Medicine and F. Kadlubar of the National Center for Toxicologic Research.
In the first phase of the study, we identified a cohort of 2030 males alive in 1972 who had worked for at least one year between 1945 and 1977 in benzidine production and use in three cities in China. A comparison of the incidence rates for bladder cancer in the general population with that in the benzidine-exposed group gave a 25-fold increase. Producers of benzidine [relative risk (RR) = 45.7] were at higher risk than users (RR = 20.9). Exposure to benzidine was crudely categorized as low, medium, and high exposure, on the basis of the occupational title of the longest job held. The risk for bladder cancer was about 5-fold in the low-exposure group, increasing to 158-fold in the high-exposure group (10). Although the "high-risk" slow-acetylator phenotype is reported to be about half as frequent in Asians (11), suggesting that they may be at lower risk for aromatic amine-associated bladder cancer, the RR for bladder cancer in our study cohort seemed to be as high or higher than that identified in studies in Europe (12,13) and the United States (14).
To follow up on these findings, we carried out a case-control study to examine the relationship between acetylation phenotype, benzidine exposure, and bladder cancer risk. Thirty-nine surviving, pathologically confirmed, benzidine-exposed bladder cancer cases and 39 benzidine-exposed controls with negative urine cytology were selected for study. Acetylation phenotyping with a caffeine dose (15) and urinary cotinine determination (16) were carried out. Information was collected on occupational history and smoking habits.
As compared to the controls, cases had a higher historical level of exposure to benzidine (p < 0.01) and greater tobacco use (p > 0.05). About 38% of the controls were determined to be slow acetylators (5-acetylamino-6-amino-3-methyluracil/1-methylxanthine [AFMU/1X] < 0.6), a prevalence higher than expected for this population. On comparison with fast acetylators, the risks (odds ratio, OR) for bladder cancer among intermediate and slow acetylators were 1.4 [95% confidence interval (CI): 0.4-5.1] and 1.2 (95% CI: 0.4-4.0), respectively. When corrected for differences in the level of benzidine exposure, the respective risks were OR = 1.8 (95% CI: 0.4-7.9) and OR = 1.5 (95% CI: 0.4-5.5). Neither of these results is statistically significant, and there is no evidence of a trend in risk. Similar analyses were carried out, adjusting for age, city, and history of tobacco use, with no important change in the findings. Urinary cotinine levels and questionnaire-derived current tobacco use were correlated, but neither influenced acetylator phenotype.
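For readers unfamiliar with how an odds ratio and its confidence interval are obtained from a 2 x 2 table, the following Python sketch uses the logit (Woolf) approximation. This is one common method and is assumed here for illustration only; it is not necessarily the method used in our analysis. The cell counts are hypothetical and are not the study data.

```python
import math


def odds_ratio_with_ci(index_cases, ref_cases, index_controls, ref_controls, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf logit method) for a 2 x 2 table.

    'index' is the category of interest (e.g., slow acetylators) and
    'ref' is the reference category (e.g., fast acetylators).
    """
    estimate = (index_cases * ref_controls) / (ref_cases * index_controls)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / index_cases + 1 / ref_cases + 1 / index_controls + 1 / ref_controls
    )
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper


# Hypothetical counts: 15 slow and 10 fast acetylators among cases,
# 12 slow and 12 fast acetylators among controls.
estimate, lower, upper = odds_ratio_with_ci(15, 10, 12, 12)
print(f"OR = {estimate:.1f} (95% CI: {lower:.1f}-{upper:.1f})")
```

With cell counts this small the interval is wide and includes 1.0, which illustrates why a study of this size can exclude only large effects.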
No association was found between acetylator phenotype and bladder cancer risk. Because of the high prevalence of the slow acetylator phenotype in the control series and because of our concern that bladder cancer itself may affect the phenotyping procedure, we are currently attempting to verify this finding in the same population, using dapsone and another phenotyping procedure (17), as well as acetylation genotyping (18). Recent analyses of the genetic mutations associated with the slow acetylator phenotype point to a different pattern of mutations in Europeans and Asians (18).

Benzene and Leukemia Risk
A number of epidemiologic studies have shown an increased risk for leukemia in occupational groups exposed to benzene (19,20). In a collaborative project between the National Cancer Institute and Y. Songnian of the Chinese Academy of Preventive Medicine, more than 70,000 subjects occupationally exposed to benzene between 1971 and 1985 in 12 cities in China were identified. Occupational histories were abstracted for study members for the period of employment in the study factories. Available data on measurements of exposure to benzene and information on industrial processes and industrial hygiene practices were obtained. Data on occupational history and on industrial exposures were used to estimate job- and period-specific exposures.
As determined from routine examinations of peripheral blood at the factory, almost 1000 subjects had benzene-related leukopenia, defined by Chinese national criteria (Y. Songnian, personal communication) as occupational benzene poisoning. Subjects with and without benzene poisoning were followed for the development of leukemia. In preliminary analyses, the leukemia rate is about 20-fold higher in subjects with benzene poisoning. Is benzene poisoning a marker for high-level benzene exposure or is it an intermediate end point in the benzene-leukemia relationship (21), which identifies subjects who have undergone a biologic change leading to disease susceptibility?
A detailed exposure assessment is now being carried out to determine the history of benzene exposure of the study subjects. The risk for benzene poisoning will be studied with respect to period and level of benzene exposure. Following this, the relationship between benzene exposure, benzene poisoning, and leukemia will be studied. We will explore the extent to which benzene poisoning affects leukemia risk, independent of the amount of benzene exposure: that is, to what extent is benzene poisoning a simple surrogate for benzene exposure or an independent risk factor for leukemia? The resolution of this issue is of central importance in assessing the relationship between benzene dose and leukemia response. Further work to characterize individual susceptibility to the effects of benzene and to identify biomarkers more specific than leukopenia would also be of value but falls outside the possibilities of the present study.

Discussion
We have carried out a study of genetic polymorphisms associated with bladder cancer risk in workers exposed to aromatic amines and are proceeding with a study of benzene poisoning as an intermediate end point in leukemia development. In the following sections I describe a number of methodologic issues related to the execution of these and similar studies.

Population Ascertainment
Studies of occupation and the risk of cancer require ascertainment of populations which have experienced substantial occupational exposure to substances of interest over extended periods of time, generally several decades. Thanks to gains made in occupational health and safety, the number of workers with high exposures has substantially decreased. The study populations for our work on benzidine and benzene exposure were recruited in collaboration with Chinese investigators. Benzidine use in China was discontinued in 1972, and benzene exposure has markedly decreased over the past decades. While this trend is clearly to be applauded, the unique remaining opportunities to study the effects of these and other substances of interest at substantial levels of exposure must be identified soon.

Disease Frequency
One major difference that distinguishes the epidemiologic approach from other approaches to human disease is the concept of disease frequency in a defined population. For example, in a prospective study to compare the rate of leukemia in subjects with and without benzene poisoning, it is necessary to determine both the number of subjects (denominator) and the number of leukemia cases (numerator) in each group. In a case-control study, such as the biochemical component of the benzidine study, cases should represent a sample of all cases of interest and the controls should represent a sample of the population from which the cases arose.
As part of a largely observational science, epidemiologic investigations are subject to a number of pitfalls, most of which relate to errors in the estimation of relative disease frequency between comparison groups. For example, if knowledge about leukemia occurrence had influenced the categorization of subjects as having had benzene poisoning, a biased estimate of the frequency of leukemia in the benzene-poisoned group would have resulted. In the case-control study of benzidine-exposed workers, only prevalent, surviving cases were included for analysis. If, for example, bladder cancer cases who were slow acetylators had poorer survival, the risk of bladder cancer in the slow acetylator group would be underestimated.

Disease Susceptibility
Disease susceptibility has been assessed in epidemiologic studies of cancer in the past by analysis of familial association (22) and other possible genetic indicators, such as race. The goals of these studies have been to identify populations susceptible to disease and to model the relationship between exposure, disease susceptibility, and occurrence of disease. Observed familial associations could be due in part to clustering of exposure risk factors in the presumed genetic susceptibility risk groups, but genetic factors probably play an important role in causing familial aggregation (23).
The ability to detect environmental risks may be dramatically limited by unmeasured genetically determined heterogeneity in susceptibility (24). The power of epidemiologic studies to identify human carcinogens will be greatly increased as markers of disease susceptibility are established, allowing for the separation of people at risk of cancer from those not at risk. The biologic basis for heterogeneity in the metabolism of xenobiotics is becoming better understood, and phenotypic and genotypic markers for metabolic susceptibility to cancer in association with specific exposures are currently being assessed, as described in this volume. Susceptibility to cancer may also be due to the inheritance of mutated cancer suppressor genes (25,26). This model may have general significance but has not yet been substantiated for the common tumors.
Epidemiologists are accustomed to analyzing the relationship between multiple factors in disease development. For example, methods have been developed to assess the interrelationship of occupation and smoking in their effect on cancer risk (27). Molecular susceptibility factors can be modeled in a similar way in epidemiologic studies as modifiers of the effect of exposure.
In the study of benzidine-exposed workers, we were able to classify subjects by historical exposure to benzidine and by their ability to N-acetylate a test compound. This allowed statistical modeling of the effect of these factors on disease risk. The results do not support the hypothesis that N-acetylation is an important metabolic pathway for benzidine-associated bladder cancer, at least in this population.
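One simple way to examine a susceptibility factor jointly with exposure is to stratify by exposure level, compare the stratum-specific odds ratios, and summarize them with the Mantel-Haenszel estimator. The Python sketch below illustrates this under stated assumptions: the strata and counts are hypothetical, and the Mantel-Haenszel method is offered as a standard textbook technique, not as the specific model fitted in our study.

```python
def stratum_odds_ratio(a, b, c, d):
    """OR within one stratum: a, b = index/reference cases; c, d = index/reference controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of (a, b, c, d) counts."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts of (slow cases, fast cases, slow controls, fast controls)
# within two levels of historical exposure.
strata = {
    "low exposure": (4, 6, 8, 12),
    "high exposure": (12, 8, 6, 8),
}

stratum_ors = {level: stratum_odds_ratio(*counts) for level, counts in strata.items()}
mh = mantel_haenszel_or(list(strata.values()))

for level, value in stratum_ors.items():
    print(f"{level}: OR = {value:.2f}")
print(f"Mantel-Haenszel summary OR = {mh:.2f}")
```

Stratum-specific odds ratios that differ materially, as in this constructed example, would suggest that exposure modifies the effect of the susceptibility factor. In that case a single summary estimate can obscure the pattern of interest, and the stratum-specific estimates should be reported.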

Occupational and Other Exposures
In some investigations, the most difficult area of epidemiologic study is the assessment of exposure because of the complex, and often poorly understood, relationship of disease development to dose, as expressed by level and period of exposure. Information on job title and industry forms the basis for estimates of occupational exposure. Historical information, based upon industrial process and industrial hygiene measurements, can provide further detail on occupational exposures. Contemporary industrial hygiene studies can also be carried out to validate historical exposure estimations.
Epidemiologic studies of occupation and cancer risk have relied upon retrospective estimation of occupational exposure. Information on level and period of exposure has been used to assess dose-response relationships and to evaluate the temporal relationship between exposure and disease response. As in our studies of benzidine- and benzene-exposed workers, these efforts are often limited by the availability of historical information on occupation and the related periods and levels of exposure (28). No molecular marker has been proposed, however, as a reasonable substitute for such historical information. At best, such a marker would provide an indication of cumulative exposure but would not provide information about the temporal sequence of exposure. Molecular markers could be used in prospective cohort studies to characterize exposure as it occurs, or biologic material could be stored for later analysis in a "nested" case-control study. These designs are being used for large-scale population studies but have not yet been used in the occupational setting.
Some exogenous exposures may have mutational specificity, causing specific "signature" changes in DNA (29,30). If this approach proves sufficiently sensitive and specific, exposures could be determined qualitatively by examination of tumor DNA for these changes. This could be particularly useful for suggesting new carcinogens or new tumor sites for known carcinogens. Also, the ability to discriminate exposure-related from "spontaneous" tumors would in principle allow for greater specificity in examining dose-response relationships.
The principal use of molecular markers of occupational exposure is in cross-sectional or short-term longitudinal studies, which can provide important insight by relating external exposure to internal exposure and to intermediate markers of effect (2-4). Further, molecular markers may allow validation of information from questionnaires and other sources; for example, serum cotinine levels can be used to assess the validity of questionnaire data on current tobacco use.
In molecular epidemiologic studies, other instruments such as questionnaires for the collection of historical information on diet, tobacco use, and other factors may need to be developed. Such auxiliary information may be crucial to the interpretation of a study and deserves the same attention to detail as the development of laboratory procedures. The data collected should, as much as possible, reflect the amount and period of exposure. This information should then be used appropriately in the analysis. For example, pack-years, or cumulative use of tobacco, is an appropriate measure of historical tobacco use but not of current tobacco use.

Validity of Findings
In studies on the etiology of cancer, error in the dependent variable, disease, is generally not substantial; errors in the independent variables are of greater importance. Error in estimating biologic parameters may be due to limitations of biologic assay methods. In epidemiologic field studies, further errors in biologic determinations may be due to limitations, under field conditions, in overall study design and in collection, shipment, and storage of samples. Other types of misclassification can derive from errors in determination of important factors such as occupation-related exposure, tobacco use, and diet.
It is useful to consider measurement error as either differential or nondifferential. Differential measurement errors occur when the act of measurement is influenced by study group status, resulting in biased findings. For example, in the case-control study of bladder cancer, the determination of the acetylator phenotype of the cases could have been biased because of the effect of elevated urinary pH in cystectomy cases on caffeine metabolites. A phenotyping procedure in serum, for example with dapsone, would address this potential problem. Nondifferential error can be random, related to precision, or systematic, related to accuracy. Nondifferential error in precision, while not to be ignored, may be less problematic in epidemiologic studies than in clinical research, because the absolute value of the measurement is less important than the ability to identify group differences. Random error directly affects study power and biases the result toward the null value (31). In more complex situations where, for example, exposure status and disease susceptibility are being examined as independent variables determining disease status, errors of measurement can result in unpredictable results (32). Clearly, attention to reduction in measurement error is valuable in both the field and laboratory phases of molecular epidemiologic studies. Repeating analytic runs is an established technique for reducing laboratory variability. Only recently has attention been given in epidemiologic investigations to the use of multiple measures (33) and validation substudies (34) to reduce misclassification.
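The bias toward the null produced by nondifferential error can be illustrated with a short Python calculation for a dichotomous marker. The sketch assumes that the marker is misclassified with the same sensitivity and specificity in cases and controls; the prevalences and error rates are hypothetical values chosen only for illustration.

```python
def observed_odds_ratio(prev_cases, prev_controls, sensitivity, specificity):
    """Expected OR when a binary marker is measured with nondifferential error.

    prev_cases and prev_controls are the true marker prevalences; the same
    sensitivity and specificity apply to both groups.
    """
    def observed_prevalence(true_prevalence):
        # True positives correctly detected plus true negatives misread as positive.
        return sensitivity * true_prevalence + (1 - specificity) * (1 - true_prevalence)

    q_cases = observed_prevalence(prev_cases)
    q_controls = observed_prevalence(prev_controls)
    return (q_cases / (1 - q_cases)) / (q_controls / (1 - q_controls))


# Hypothetical true prevalences of the marker: 60% in cases, 40% in controls.
true_or = observed_odds_ratio(0.6, 0.4, sensitivity=1.0, specificity=1.0)
observed = observed_odds_ratio(0.6, 0.4, sensitivity=0.8, specificity=0.8)

print(f"true OR:     {true_or:.2f}")
print(f"observed OR: {observed:.2f}")
```

Under these assumptions the observed odds ratio lies between 1.0 and the true value, so a real association is weakened but not reversed. This simple behavior does not necessarily carry over to markers with more than two categories or to analyses of several mismeasured variables.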
In the benzidine study, we are attempting to validate the acetylation phenotyping carried out with caffeine by retesting the study subjects with another phenotype test drug. Planned genotype assays may provide further insight into the study findings. Although the criteria for benzene poisoning are established in China, their implementation may vary. We are planning further studies to assess the presence of benzene poisoning in leukemia cases and controls. Independent verification of findings in another laboratory is generally considered strong confirmation of an experimental finding. Because of the observational nature of epidemiologic investigations, it is critical to repeat studies in several settings. Often, findings will not be entirely consistent between studies and will thus require considerable judgment in deriving overall conclusions. The inconsistency in studies of N-acetylation and bladder cancer risk in aromatic amine-exposed populations illustrates this point. We plan to pursue these issues in studies in other populations.