The Environmental Genome Project: functional analysis of polymorphisms.

The ultimate goal of the Environmental Genome Project is the improvement of human health on the basis of information about variations in certain genes. The first phase of the project will involve the selection of human genes and characterization of the alleles occurring in the American population. However, intelligent use of this information will require analysis of the relevance of the allelic differences. Epidemiology alone will not solve the problem, and mechanistic studies will be required. Factors to be weighed in the design of functional analyses are discussed in this commentary.

The Environmental Genome Project is an effort by the NIEHS to characterize the variations in a number of important human genes and to relate these differences to susceptibility of humans to chemical and physical agents in the environment. The long-term goal of this work is to improve human health, primarily through improved prevention, but also through treatment when appropriate.
Why is there strong interest in this project? At least four major reasons can be identified. 1) The Human Genome Project is not finished, but there is broad support in the biomedical science community for the program and a general consensus that the work will be completed on schedule, even in the absence of major new developments in technology. Related government efforts are also under way, such as the Centers for Disease Control National Health and Nutrition Examination Survey (NHANES) program and a National Cancer Institute effort to identify genes that show altered expression levels in tumors. All of this information will provide a knowledge base for use in disease prevention and treatment. 2) There is a long history of interest in inherited diseases, which resulted in the establishment of medical genetics and genetic epidemiology as disciplines (1). Many of these debilitating diseases were found to be disorders of primary metabolism. The reasoning is that if genetics contributes to rare diseases, then it should also contribute to more common ones. 3) Many studies with experimental animals show genetic influences on susceptibility to chemical and physical agents. For instance, mice deficient in the Ah locus show differences in cancer incidence and acute toxicity from a variety of chemicals (2). A relevant example in humans is the differential susceptibility to ultraviolet (UV) light due to functional differences in DNA repair genes (3,4). 4) Studies on human drug metabolism over the past 20 years have provided convincing evidence of wide variability in the function of the enzymes of xenobiotic metabolism [e.g., cytochromes P450, N-acetyltransferases, and glutathione (GSH) transferases] and of the in vivo significance of these variations for drug disposition in humans (5)(6)(7). In some cases, fatalities can be attributed to these differences (8).
Further, many of these same enzymes are known to be involved in the bioactivation and detoxication of potentially health-damaging chemicals found in the environment (i.e., other than drugs).

Background
The goal of the Environmental Genome Project is to characterize allelic variants qualitatively and quantitatively. Variations from a predominant allele are often referred to as genetic polymorphisms, a term used to describe variants occurring at a frequency of >1% in the population. Polymorphisms are common among humans. Some have no functional significance, and distinguishing those that do from those that do not will be necessary in order to interpret the information obtained in the Environmental Genome Project.
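The >1% criterion for a polymorphism can be made concrete by computing the variant allele frequency from genotype counts at a single site. The sketch below assumes a biallelic site in a diploid population; the class name, function names, and genotype counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Threshold from the text: a variant counts as a polymorphism when it
# occurs at a frequency above 1%.
POLYMORPHISM_THRESHOLD = 0.01


@dataclass
class GenotypeCounts:
    """Counts of individuals by genotype at one biallelic site (hypothetical)."""
    hom_ref: int  # two copies of the predominant allele
    het: int      # one copy of each allele
    hom_alt: int  # two copies of the variant allele


def variant_allele_frequency(c: GenotypeCounts) -> float:
    """Fraction of chromosomes carrying the variant allele (diploid organism)."""
    n_individuals = c.hom_ref + c.het + c.hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    return (c.het + 2 * c.hom_alt) / (2 * n_individuals)


def is_polymorphism(c: GenotypeCounts, threshold: float = POLYMORPHISM_THRESHOLD) -> bool:
    """True if the variant allele frequency exceeds the threshold."""
    return variant_allele_frequency(c) > threshold


# 1,000 hypothetical individuals: 31 variant alleles out of 2,000 chromosomes.
example = GenotypeCounts(hom_ref=970, het=29, hom_alt=1)
print(variant_allele_frequency(example))  # 0.0155
print(is_polymorphism(example))           # True
```

Frequency is counted per chromosome rather than per person, which is why the heterozygote and homozygote counts are weighted by 1 and 2.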
The significance of a polymorphism depends on the phenotype. A change in phenotype can be expected if 1) the altered codon produces an amino acid substitution that changes the catalytic or stability properties of the coded protein; 2) a stop codon is formed, causing premature termination of protein synthesis; 3) the rate of transcription to form RNA is decreased or increased; 4) RNA maturation is prevented (splice site variants); or 5) mRNA stability is modified. Only case 1 produces a protein with different properties, but any of these cases can raise or lower the amount of a particular protein produced. Two other points should be made. First, emerging technologies can be used to assay mRNA levels rapidly. At the recent NIEHS Symposium on the Environmental Genome (9), P.O. Brown presented information on the use of chip technology for this purpose (10). Second, nongenetic effects on mRNA and protein expression levels cannot be distinguished from genetic effects in the absence of further information. However, if a genetic polymorphism is identified, it may be possible to use mRNA expression to establish its functionality.

Use of Epidemiology: Strengths and Weaknesses
Considerable effort has been expended on the use of epidemiology in the analysis of the effects of physical and chemical agents in the environment on human health. Recently, the limitations of epidemiology have been discussed (11,12). Inherent problems in most epidemiological analyses include racial patterns, small sample sizes, and confounding variables. For these reasons, it is unlikely that epidemiology alone will be sufficient to establish the significance of polymorphisms. An analogy, although not perfect, is drug development, in which clinical trials would not be done in the absence of at least some basic mechanistic information about the action of a new drug.
The environmental health epidemiology literature contains many controversies. The effects of low-level electromagnetic radiation on cancer and of pesticides on breast cancer remain controversial at best (11,13-16). The field of molecular epidemiology, in which biomarkers and intermediate end points are incorporated, is also complicated. Many associations between disease and genetic polymorphisms of enzymes involved in xenobiotic metabolism have been considered (17), and meta-analyses are presented in a recent article by d'Errico et al. (18).
For some of the genes, odds ratios vary considerably among studies. Further, in many cases there is no established mechanistic basis for the associations. Although the Ile/Val 462 polymorphism of P450 1A1 has been studied extensively, two studies have shown little if any catalytic difference between the gene products (19,20). A reported association of an epoxide hydrolase polymorphism with aflatoxin-related hepatocellular carcinoma (21) has also been questioned on a mechanistic basis (22).
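To make the point about odds ratios varying among studies more concrete, the sketch below computes a case-control odds ratio with an approximate 95% confidence interval (Woolf's log method), pools several studies with a fixed-effect inverse-variance average, and reports Cochran's Q as a measure of between-study heterogeneity. These are standard textbook methods, not necessarily those used in the meta-analyses cited above; the counts are hypothetical, and the code assumes no zero cells.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Case-control counts for one study (all numbers hypothetical)."""
    exposed_cases: int       # cases carrying the variant genotype
    unexposed_cases: int     # cases without it
    exposed_controls: int    # controls carrying the variant genotype
    unexposed_controls: int  # controls without it


def log_odds_ratio(t: TwoByTwo) -> tuple[float, float]:
    """Return (log OR, variance of log OR) by Woolf's method; assumes no zero cells."""
    a, b = t.exposed_cases, t.unexposed_cases
    c, d = t.exposed_controls, t.unexposed_controls
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def odds_ratio_ci(t: TwoByTwo, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with an approximate 95% confidence interval."""
    lor, var = log_odds_ratio(t)
    se = math.sqrt(var)
    return math.exp(lor), math.exp(lor - z * se), math.exp(lor + z * se)


def fixed_effect_pool(studies: list[TwoByTwo]) -> tuple[float, float]:
    """Inverse-variance pooled OR and Cochran's Q heterogeneity statistic."""
    estimates = [log_odds_ratio(s) for s in studies]
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * lor for w, (lor, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (lor - pooled) ** 2 for w, (lor, _) in zip(weights, estimates))
    return math.exp(pooled), q


studies = [TwoByTwo(40, 60, 30, 70), TwoByTwo(25, 75, 25, 75)]
print(odds_ratio_ci(studies[0]))   # OR about 1.56 with its interval
print(fixed_effect_pool(studies))  # pooled OR and Q
```

A Q value that is large relative to a chi-square distribution with (number of studies - 1) degrees of freedom suggests that the studies disagree by more than sampling error, which is one way of quantifying inconsistent associations.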
Although these are serious concerns about the discipline of classical epidemiology, mechanism-based molecular epidemiology will ultimately be needed to establish the importance of paradigms developed in the laboratory. Returning to the analogy of drug development, many candidate compounds appear promising but do not deliver efficacy in clinical trials.

Problems in Functional Analysis
As mentioned above, there are serious caveats in relying on epidemiology alone as a means of establishing the significance of genetic variations. One reasonable approach would be to use experimental studies to establish mechanisms, which can then serve as a guide to more enlightened epidemiology. However, there are serious issues to consider in designing meaningful mechanistic studies of the functional significance of polymorphisms.
Many of the environmental exposures are complex mixtures, e.g., tobacco smoke, smog, and mixtures of solvents in waste dumps. The etiology is often not precisely known. Many of these mixtures are not readily adaptable to laboratory experiments (23)(24)(25). Another issue is the choice of an appropriate dose of a single compound or a mixture in a cellular system. What dose is really relevant to the human disease under consideration? Will this dose be enough to observe a difference in the laboratory experiment? Is the end point under observation reflective of the disease? For instance, DNA adducts and mutations are often used as intermediate end points in cancer studies but may not always be predictive in the absence of other factors such as cell proliferation. We also need to know whether a certain assay will be predictive for the target organ under consideration.
Another potential problem is the unexpected gain of an unpredicted function. This appears to be the case for superoxide dismutase variants associated with familial amyotrophic lateral sclerosis, in which the variant form appears to enhance H2O2-dependent oxidations that may be responsible for the neurological defect (26,27).
Similarly, a P450 variant might show decreased catalytic activity toward a model substrate but increased activation of a substrate more relevant to toxicity, and such an effect might not be predicted.
Finally, another complication is the interaction of gene products, an issue related to haplotypes. This situation can be readily appreciated with some of the proteins involved in gene regulation, in which various heterodimers have differing activities and can be derived from members of two different gene families, e.g., PPAR/RXR and fos/jun (28,29). Such situations can also be seen with enzymes involved in xenobiotic metabolism, in which one enzyme makes a product that is used as a substrate by another. Such paradigms apply to pathways of both activation and detoxication.
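Why coupling between an activating and a detoxifying enzyme complicates functional analysis can be illustrated with a deliberately simplified model that is not taken from this commentary: enzyme 1 converts a parent compound A into a reactive intermediate B, and enzyme 2 detoxifies B to C, both by first-order kinetics with hypothetical rate constants `k_act` and `k_detox`, and with no competing elimination of A. Under these assumptions the total exposure to B (the area under its concentration curve) equals A0/k_detox and depends only on the detoxifying enzyme, whereas the height and timing of the peak depend on both enzymes.

```python
import math


def intermediate_conc(t: float, a0: float, k_act: float, k_detox: float) -> float:
    """Concentration of intermediate B at time t for A -> B -> C.

    Assumes first-order kinetics for both steps and k_act != k_detox.
    """
    return a0 * k_act / (k_detox - k_act) * (math.exp(-k_act * t) - math.exp(-k_detox * t))


def intermediate_auc(a0: float, k_detox: float) -> float:
    """Area under the B curve from 0 to infinity; independent of k_act."""
    return a0 / k_detox


def time_of_peak(k_act: float, k_detox: float) -> float:
    """Time at which B is maximal, from setting dB/dt = 0."""
    return math.log(k_detox / k_act) / (k_detox - k_act)


def peak_concentration(a0: float, k_act: float, k_detox: float) -> float:
    """Maximum concentration of B: a0 * (k_act/k_detox)**(k_detox/(k_detox - k_act))."""
    return a0 * (k_act / k_detox) ** (k_detox / (k_detox - k_act))


# Hypothetical comparison: a detoxifying variant with half the reference activity.
for label, k_detox in [("reference", 2.0), ("half-activity variant", 1.0)]:
    print(label, intermediate_auc(1.0, k_detox), peak_concentration(1.0, 0.5, k_detox))
```

In this toy model, halving detoxifying activity doubles exposure to the intermediate regardless of the activating enzyme's genotype, while a variant of the activating enzyme changes only the shape of the time course; real pathways with competing routes will behave less simply.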

Approaches to the NIEHS Environmental Genome Project
Although the outline of the Environmental Genome Project was still being developed at the time of the October 1997 workshop, the general strategy is as follows. The first stage is defining the precise goals, the genes under investigation, the numbers and source(s) of samples, and the technologies to be employed in the sequence analysis. The first real experimental stage is the sequence analysis itself. A rough current estimate is that this will involve 200 genes and 1,000 human samples; if we assume an average of 10 kb/gene, the prospect is approximately 2 billion bases of sequence analysis (9). The next stages are less clear (9). One aspect involves extending the analysis to more individual samples, which can readily be accomplished using chip technology designed around all of the identified polymorphisms. The other aspect involves functional analysis. Knowledge of the polymorphic variants will be of some immediate use in epidemiological settings, although concerns about establishing the significance of polymorphisms through epidemiological studies alone have already been expressed. Functional analysis must involve mechanistic studies on the roles of the polymorphisms. However, exactly how these studies will be designed has not yet been considered in detail.
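The sequencing estimate (200 genes, 1,000 samples, an average of 10 kb/gene) is a simple product, sketched below so that other assumptions can be substituted. The function name is hypothetical, and, like the estimate in the text, it counts one gene-length sequence per gene per sample.

```python
def sequencing_burden(genes: int, samples: int, kb_per_gene: float) -> int:
    """Total bases to sequence: genes x samples x average gene length in bases."""
    return round(genes * samples * kb_per_gene * 1_000)


total = sequencing_burden(genes=200, samples=1_000, kb_per_gene=10)
print(f"{total:,} bases")  # 2,000,000,000 bases
```

Because the result scales linearly with each factor, doubling the number of samples or the average gene length doubles the sequencing effort.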

Strategies for Functional Analysis
In some cases assays can be done directly with humans. For instance, drugs are oxidized by some of the P450 enzymes under consideration here (5). Function can often be determined in in vitro assays in tissues obtained from organ donors or surgical samples, and comparisons with genotype can be made. Such work can be extended to in vivo drug studies with humans (30). Thus, information about polymorphisms that will be relevant to drug metabolism itself can be rapidly obtained. The question is how to relate this to diseases with complex etiologies (e.g., chemical mixtures, unknown causes).
Another approach, which will probably be more general, involves heterologous expression and examination of function. There are basically two different end points. One is a specific assay (e.g., for an enzyme such as glutathione peroxidase); the other is examination of the pathology of the cells following expression of the polymorphic variants of each gene product. A key question is how appropriate these model systems are for humans. A number of model systems exist for heterologous expression of genes (31). They include bacteria, insect cells, yeast, mammalian cells, human cells, and (transgenic) mice. Human cell lines are usually of tumor origin, and expression in them may be either transient or stable. A number of issues need to be considered when choosing a system. They include ease of expression, yields of expressed product, the endogenous background (i.e., proteins similar to the expressed gene product), the availability of discernible parameters to measure as end points, and relevance of the system to the human situation. Similarity to humans must be considered with regard to transcriptional regulation, protein stability, and interactions with other components.

Approaches to Functional Analysis: A Universal Approach
There are two approaches to doing functional mechanistic analysis of polymorphisms. One strategy is to choose a common cellular system and express all of the genes under investigation. This has the advantage of a common base for comparison and could also make use of the cheapest host system. A possible proposal is to use yeast to express all of the genes and their variants. Yeast are eukaryotes and have an advantage in that many, but not all, of the gene products involved in yeast gene regulation have been found to have human counterparts. Following expression of genes, the enzyme activity could be measured, if an assay exists. The phenotypic changes could be observed and quantified, if possible. These results could then be extended to (human) epidemiology studies.
There are a number of deficiencies in this approach. The main one is that the assays of function will be quite different for the various genes, and some may not be appropriate for yeast. For instance, the enzymes involved in xenobiotic metabolism tend to act in concerted pathways, and yeast do not normally contain many of these systems.

Multiple Approaches to Functional Analysis: Examples
An approach that will probably be more realistic is one that involves multiple strategies depending upon the gene under consideration. A few examples are considered.
In many respects, the enzymes involved in the metabolism of xenobiotic chemicals are the easiest to deal with. After establishing the polymorphisms and their frequency distribution, it may be expedient first to look for a correlation between the polymorphism and in vitro catalytic activity in human tissue samples. Various gene products can also be expressed readily in simple heterologous vector systems, and assays of catalytic activity can be done. These enzyme systems have the advantage that, in many cases, probe drug substrates are known that can be used in humans (5), and studies on in vivo phenotyping/genotyping may be possible, as has already been done for enzymes such as P450s 2D6 and 2C19 (32). If differences can be identified, work can be extended to epidemiological settings. A caveat in all of this is that the test (and probe) substrates may not be relevant to human health issues, so aspects such as the chemical etiology of disease must also be considered.
Another case involves enzymes related to oxidative stress. As with the case above, it is possible to look for correlation of the polymorphism and in vitro catalytic activity in human tissue samples, if relevant samples and an appropriate assay are available. The gene products can be expressed in simple systems, perhaps even microbial ones, and the abilities of the expressed gene products can be compared with regard to their abilities to protect from an induced oxidative stress. If microorganisms are used, some of the experiments might be repeated in mammalian cells. In the extension of the work to molecular epidemiology, one approach might be to look for biomarkers of oxidative stress [e.g., F2 isoprostanes (33)] in humans having various genotypes.
Enzymes that repair chemical and physical damage to DNA are certainly of interest in a venture such as the Environmental Genome Project. One approach might be to use blood cells from individuals of defined genotypes and assay the repair of transfected plasmids containing UV- or carcinogen-damaged DNA (34). The work could then be extended to cell systems in which the individual gene products are overexpressed and then treated with chemicals or physical agents known to produce the particular kind of DNA damage under consideration. The results could then be extended to epidemiology. Some of the systems under consideration are complex, and more creativity may be required. For instance, how will we deal with polymerases, tumor suppressors, cell cycle checkpoint-associated proteins, and signal transduction pathways? In some cases, coupled processes may be studied in yeast experimental systems. A general problem with both yeast and mammalian cells is that wild-type homologs already exist in the cells and may make interpretation of expression studies more difficult.

Conclusions
A few more points should be made. In the above considerations of possible expression systems, one might ask why not use the most complex and relevant. However, if we work with 200 genes and each has 5-10 polymorphisms (a rough but not unreasonable estimate), we face the prospect of 1,000-2,000 expressions, each requiring the necessary assays. Therefore, some thought must be given to simple systems and rapid assays.
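The workload estimate is again a simple product. In the sketch below, `assays_per_construct` is a hypothetical multiplier not given in the text, included to show how quickly the total grows if each expressed variant must be tested in more than one assay.

```python
def expression_workload(genes: int, variants_low: int, variants_high: int,
                        assays_per_construct: int = 1) -> tuple[int, int]:
    """Range of expressed variants (times assays per variant) implied by the estimate."""
    if genes < 0 or variants_low > variants_high:
        raise ValueError("invalid range")
    low = genes * variants_low * assays_per_construct
    high = genes * variants_high * assays_per_construct
    return low, high


print(expression_workload(200, 5, 10))     # (1000, 2000)
print(expression_workload(200, 5, 10, 3))  # (3000, 6000)
```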
Another problem already mentioned is that many of the effects may be small in themselves, and synergism of multiple phenotypes may be needed for a particular disease. For instance, in epidemiological work on colon cancer, the combination of a high level of P450 1A2 and the rapid N-acetyltransferase 2 phenotype showed a synergism (along with consumption of well-done beef) (35). Further, we do not really know all of the functions of the genes we are exploring, because of various aspects of coupling. For instance, some proteins appear to be involved in both DNA repair and transcription (36)(37)(38).
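Synergism between two factors is commonly quantified by comparing the odds ratio for people with both factors against the odds ratios for each factor alone, all relative to the group with neither. The sketch below shows two standard measures: the ratio of odds ratios (departure from a multiplicative joint effect) and the relative excess risk due to interaction, RERI (departure from an additive joint effect, using odds ratios as approximations of relative risks). The numbers are invented for illustration and are not the estimates reported in the colon cancer study cited above.

```python
def multiplicative_interaction(or_a: float, or_b: float, or_ab: float) -> float:
    """Joint OR divided by the product of the single-factor ORs.

    All ORs are relative to the group with neither factor; 1 means an
    exactly multiplicative joint effect, > 1 suggests synergy on that scale.
    """
    return or_ab / (or_a * or_b)


def reri(or_a: float, or_b: float, or_ab: float) -> float:
    """Relative excess risk due to interaction (additive scale); 0 means no interaction."""
    return or_ab - or_a - or_b + 1.0


# Hypothetical: factor A alone OR 1.5, factor B alone OR 2.0, both together OR 6.0.
print(multiplicative_interaction(1.5, 2.0, 6.0))  # 2.0
print(reri(1.5, 2.0, 6.0))                        # 3.5
```

Which scale is appropriate is itself a modeling choice; a joint effect can look synergistic on the additive scale while being consistent with a purely multiplicative model.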
Finally, there are issues of inducibility and various nongenetic differences. In many cases the regulatory elements involved have not been identified, and these should be targets for polymorphism studies. We should also remember that unknown dietary factors play an important role in these issues of human health, and the genotyping studies may or may not bear on them.
Despite all the caveats presented about how to proceed with functional analysis, there is still considerable optimism that the Environmental Genome Project will be successful. Accumulation of genotype data is clearly the first step. Planning is critical for the functional analysis stage.