The challenge of risk characterization: current practice and future directions.

Risk characterization is perhaps the most important part of risk assessment. As currently practiced, risk characterizations do not convey the degree of uncertainty in a risk estimate to risk managers, Congress, the press, and the public. Here, we use a framework put forth by an ad hoc study group of industry and government scientists and academics to critique the risk characterizations contained in two risk assessments of gasoline vapor. After discussing the strengths and weaknesses of each assessment's risk characterization, we detail an alternative approach that conveys estimates in the form of a probability distribution. The distributional approach can make use of all relevant scientific data and knowledge, including alternative data sets and all plausible mechanistic theories of carcinogenesis. As a result, this approach facilitates better public health decisions than current risk characterization procedures. We discuss methodological issues, as well as strengths and weaknesses of the distributional approach.


Introduction
Risk characterization, the process of integrating the parts of a risk assessment into a form useful for communicating to risk managers and the public, is perhaps the most important, but most overlooked, portion of risk assessment. This paper critiques current approaches to risk characterization and discusses a distributional method that might better convey the mix of both knowledge and uncertainty. The paper begins with a discussion of the risk characterization portions of two risk assessments of gasoline vapor, primarily within a "risk presentation" framework advanced by an ad hoc committee of experts from academia, industry, government agencies, and public policy think tanks (1). Several key comparisons are made to illustrate the uneven and cursory manner in which risks are often characterized. Building on these examples, an alternative approach to risk characterization is described. The new approach, which entails construction of a probability distribution on risk, is discussed from both the scientific and decision-making points of view. Strengths and weaknesses of the distributional approach are discussed as well as future directions for work in this area.

Gasoline Vapor Risk Assessments and the Ad Hoc Study Group Framework
Only two risk assessments that attempt to estimate cancer risk from gasoline vapors were identified, one from the U.S. Environmental Protection Agency (EPA) (2) and one from the Northeast States for Coordinated Air Use Management (NESCAUM) (3). A qualitative review of the carcinogenicity of gasoline vapors was undertaken by the Health Effects Institute (HEI) (4), but because it made no quantitative estimates of risk, it was not included in this analysis.
The risk presentation framework in which the risk assessments were analyzed was developed by an ad hoc study group sponsored by the EPA, the American Industrial Health Council, the U.S. Department of Health and Human Services, and the Society for Risk Analysis (1). The group's recommendations stress two main themes: a) the need to be explicit when characterizing risks, stating the uncertainties and assumptions that underlie the numbers, as well as the rationale for the data and models chosen in estimating the risk and b) use of all relevant data in assessing risk based on a weight-of-the-evidence approach, critically reviewing the data and examining the impact on the risk assessment of alternative data sets or models. Table 1 outlines the full set of recommendations, breaking them into five general areas: general attributes, hazard identification, dose-response evaluation, exposure assessment, and risk characterization.

Table 1. Recommendations of the ad hoc study group.

General attributes
- Scope and objectives of the report are explicitly stated
- Content is laid out impartially with a balanced treatment of the evidence bearing on the conclusions
- Risk assessment presentation includes a description of any review process that was employed, acknowledging specific review commentary
- Key findings of the report are highlighted in a concise executive summary
- Explains clearly how and why its findings differ from other risk management reports on the same topic
- Explicitly and fairly conveys scientific uncertainty, including a discussion of research that might clarify the degree of uncertainty

Hazard identification
- All relevant information is presented and reviewed
- Highlights critical aspects of data quality
- A weight-of-the-evidence approach is presented for judgment as to the likelihood of human carcinogenic hazard and includes a clear articulation of the rationale for the position taken
- Identifies research that would permit a more confident statement about human hazard

Dose-response evaluation
- Valid data sets and plausible models for high-to-low dose and interspecies extrapolation are presented in dose-response modeling
- Presentation of dose-response evaluation includes both an upper and lower bound of potency estimates and, wherever possible, some measure of the central tendency
- Offers an explicit rationale for any preferred data set(s) and model(s) used in dose-response evaluation; strengths and weaknesses of the preferred data sets are discussed, and scientific consensus or lack thereof is indicated for critical issues or assumptions
- Reveals how dose-response relationships change with alternative data sets, assumptions, and models

Exposure assessment
- Purpose and scope of the exposure assessment and the underlying methodologies are clearly described
- Specific populations and subpopulations that are subjects of the assessment are clearly identified, and the reasons for their selections and any exclusions are given
- Available data are considered and critically evaluated, and the degree of confidence in the data expressed (reasons for any data exclusion are presented)
- If models are used, their bases are described, along with their validation status
- Potential sources, pathways, and routes of human exposure are identified and quantified; the reasons why any are not included in the assessment are presented
- Central estimates and upper and lower bounds on exposures or, if possible, the full population distribution of exposures are described, and preferred estimates are noted together with supporting documentation
- Uncertainties in the estimates are described, and the relative importance of key assumptions and data is highlighted
- Research or data necessary to improve the exposure assessment are described

Figure 1 details our subjective evaluation of the EPA and NESCAUM risk assessments. Generally speaking, the EPA (2) report did an admirable job of evaluating the impact of alternative data sets, using either the female mouse or male rat carcinogenicity data, as well as alternative models for low-dose extrapolation.

Key Features of the Gas Vapor Risk Characterizations
We have chosen three of the study group recommendations outlined in Table 1 for further discussion because they illustrate some of the key challenges in risk characterization. These recommendations are a) hazard identification: a weight-of-the-evidence approach should be presented for judgment as to the likelihood of human carcinogenic hazard that includes clear articulation of the rationale for the position taken; b) dose-response evaluation: all valid data sets and plausible models for high-to-low dose and interspecies extrapolation are presented in dose-response modeling; and c) dose-response evaluation: the presentation of dose-response evaluation includes both an upper and lower bound of potency estimates and, wherever possible, some measure of the central tendency. In the areas addressed by each of these recommendations, we discuss the approach taken by the EPA and NESCAUM in their gasoline vapor risk assessment. We then present the distributional approach as an alternative and illustrate how it facilitates superior decisions.
Weight-of-the-Evidence Approach to Characterize Human Carcinogenicity
When extrapolating results of carcinogenicity tests in animals to humans, risk assessors must evaluate the statistical, biological, and epidemiological evidence to determine the relevance of the animal response to human risk (5,6). In the case of gasoline vapors, some of the important issues in determining their potential hazard to humans include the carcinogenicity of wholly vaporized gasoline versus the low boiling point fraction, the relevance to humans of male rat kidney tumors (which may be caused by a species-specific mechanism), how to interpret increases in mouse liver tumors in light of the high background rate, and the potential role of benzene in gasoline-induced carcinogenicity (2,4).
EPA did a good job discussing both the relevant animal and epidemiological data. In addition, EPA detailed uncertainties in the hazard identification process, explaining why gasoline was classified as a B2 carcinogen. This classification is interesting in light of the fact that gasoline vapor contains benzene, a group A carcinogen. EPA also, very admirably, stressed that the conclusions reached in other parts of the risk assessment, including potency estimation and risk characterization, are based on the assumption that gasoline is a human carcinogen, an uncertain prospect.
In contrast to the EPA's assessment, the NESCAUM document (3) focused only on evidence supporting the carcinogenicity of gasoline and benzene. NESCAUM gave little credence to the relevance of mechanistic research on the male rat kidney tumor response or the toxicological significance of whole gasoline versus the low-molecular-weight fraction.
Neither EPA nor NESCAUM attempted to provide a quantitative risk characterization based on the weight of the evidence. Current risk characterization procedures limit the use of weight of the evidence to the hazard identification stage (i.e., the carcinogen classification), reporting the classification along with a separately assessed quantitative risk estimate (5). Because it is not incorporated into the numerical estimate of risk, uncertainty information from the hazard identification stage is frequently unclear and ignored. As a result, risk managers may place the same priority on known human carcinogens, such as radiation, as they do on animal carcinogens of uncertain human relevance, such as gasoline vapor (7).

Valid Data Sets and Plausible Models for Extrapolation
For some substances, there are multiple animal bioassay data sets with different findings. Moreover, it is often not clear which laboratory animal strain, species, or sex is the best surrogate for humans. Despite these ambiguities, risk assessment procedures usually call for estimates to be based on the most pessimistic data set; that is, on data from the animal species, sex, and tumor site that give rise to the highest potency estimate (8).
Numerous mathematical models can be used to extrapolate results from the high doses used in animal bioassays to the much lower doses commonly encountered by humans. Many risk assessment procedures make use of only one extrapolation procedure. For example, the EPA as a rule uses a technique which, roughly speaking, estimates the highest potency that does not produce a "very poor" fit to the data (9). EPA also uses one of a range of plausible interspecies scaling factors to convert animal doses to their human equivalents (10).
In the case of their gasoline risk assessment, EPA went beyond their usual practice, comparing risk estimates based on both rat and mouse data using three different statistics (the confidence interval upper and lower bounds and the maximum likelihood estimate) to extrapolate from high to low doses. Moreover, EPA explains their choice of one of these estimates over the others, although their primary rationale was an attempt to build conservatism into their estimate rather than a belief that their selected data set and extrapolation technique yielded the best representation of reality. NESCAUM made use of two data sets (mouse and rat) but relied on a single statistical model to derive upper bounds on risk, values which are very similar for the two data sets.

Presentation of Upper and Lower Bound Values
Few risk assessments report both an upper and lower bound on risk as well as an estimate of a central tendency. Moreover, there is little agreement on which statistics are appropriate for each of these three values (7,11). For example, the expected value, median, or mode (most likely value) may be reported as the central estimate. Any percentile in the distribution may be reported as the upper bound. Neither NESCAUM nor the EPA attempted to report all three of these values. Instead, each reported only an upper bound for their estimate of risk.
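The disagreement among candidate statistics can be made concrete with a small sketch. For a hypothetical right-skewed (lognormal) potency distribution, the mode, median, and mean differ by more than an order of magnitude, and the reported "upper bound" depends entirely on which percentile is chosen. All parameters below are illustrative, not taken from either assessment.

```python
# Sketch: summary statistics of a hypothetical lognormal potency
# distribution. The location and spread parameters are invented.
import math
from statistics import NormalDist

mu, sigma = math.log(1e-6), 1.5  # hypothetical log-potency location and spread

mode   = math.exp(mu - sigma**2)            # most likely value
median = math.exp(mu)                       # 50th percentile
mean   = math.exp(mu + sigma**2 / 2)        # expected value
p95    = math.exp(mu + NormalDist().inv_cdf(0.95) * sigma)  # one "upper bound"
p99    = math.exp(mu + NormalDist().inv_cdf(0.99) * sigma)  # another

# For a skewed distribution these orderings always hold:
# mode < median < mean, and higher percentiles give larger "bounds".
print(f"mode {mode:.2e} < median {median:.2e} < mean {mean:.2e}")
print(f"95th percentile {p95:.2e}, 99th percentile {p99:.2e}")
```

The point is not the particular numbers but that each of these values is a defensible "central estimate" or "upper bound," so an assessment reporting only one of them has silently made a value choice.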

Distributional Approach
In contrast to the procedures used by EPA and NESCAUM, the "distributional approach" to risk assessment portrays the full range of conceivable risks, all weighted by their relative likelihood in light of current evidence and scientific judgment. In other words, the distributional approach describes the risk associated with a substance as a probability distribution over a range of possible risk values. Though generating this distribution involves implementation problems (which we discuss later), current risk assessment methods avoid these problems only at the cost of leaving out information that may be important to the risk manager (12,13). As the goal of risk assessment is to provide any risk manager with all insights that may be relevant to his or her decision, these other methods do not facilitate optimal decision making.
We advocate the distributional approach for estimating potency values, showing that failure to use this method leads to the exclusion of information relevant to the risk manager. That is, judging outcomes by the risk manager's own criteria, the distributional approach leads to better decisions on average than do other techniques. We argue specifically that an optimal potency assessment methodology must include the following characteristics: a) it must consider all relevant information, including the relative importance of different data sets and the plausibility of competing mechanistic theories; b) it must portray the full range of possible potency values given all relevant knowledge (and ignorance) regarding the substance; and c) it must assign to each plausible potency value its relative likelihood. In other words, it must portray the probability distribution of potency values.
To illustrate the importance of each of these items, we show that they can each affect choices made by the risk manager. Specifically, we imagine a risk manager setting regulatory priorities for a collection of substances akin to EPA's new "risk-based priority setting procedure" (14). We demonstrate that these factors can affect the risk manager's priorities, and hence that failure to include these items can result in suboptimal choices.
By focusing on a limited number of data sets, or even just a single data set, and implicitly assuming an underlying carcinogenic mechanism, currently used potency assessment methods do not take into account the impact that these default procedures have on potency estimates. Formaldehyde risk assessment demonstrates the impact data set selection can have on the potency estimate. The Kerns et al. study (15), the largest and most extensively peer-reviewed formaldehyde animal bioassay, serves as the basis for virtually all formaldehyde risk assessments. That study found an increase in both benign and malignant tumors in the rat nasal cavity. Although most potency estimates are based on only the malignant tumors, some scientists recommend the use of all tumors in the nasal cavity (12). Suppose a risk manager were comparing formaldehyde and another substance that generated the same number of malignant tumors but did not produce any benign tumors. Without knowledge of the benign tumors caused by formaldehyde, the risk manager might place the same priority on both substances. However, knowledge of the benign tumors, along with the fact that at least some scientists believe them to be relevant to human cancer risk, might lead the risk manager to place a higher regulatory priority on formaldehyde.
Assumptions about underlying mechanisms are also critical. Science often draws inferences from an understanding of the underlying mechanism even in the absence of data. For example, our mechanistic understanding of planetary motion leads us to suspect (very strongly) that the sun rises on Jupiter even though we have no empirical evidence to this effect (16). Our assumptions regarding the mechanism underlying cancer are particularly crucial because we have no low-dose human data for the vast majority of chemicals. In the best of cases, we have only high-dose animal experiment data. The critical inferences drawn regarding the effect of low-dose human exposure, and hence the regulatory priority assigned by the risk manager, depend on assumptions such as whether we believe the chemical exhibits threshold or nonthreshold behavior and whether we believe the carcinogenic mechanism in animals is relevant to humans. In fact, judgments about the underlying mechanism may lead the risk manager to assign different priorities to two substances whose empirical data sets are similar. Because these judgments are so important, risk assessment procedures that do not consider all plausible mechanisms do not facilitate optimal decisions.
Given all this information, it is also important to represent the full range of possible potency values. Failure to express the full range of potential values may cause the risk manager to make inconsistent decisions. For example, current risk assessment approaches often report conservative or upper-bound potency estimates. These estimates often portray substances about which we know little as more dangerous than equally hazardous substances about which we know much more. This is because with limited knowledge, the plausible upper bound on potency can be much greater than it is if we know much more about a substance. Defenders of this approach argue that caution is warranted when knowledge is limited. We believe that the proper degree of caution is a choice to be made by the risk manager, not the potency assessor. A conservative potency estimate may lead a "risk-neutral" risk manager (that is, someone interested in minimizing expected lives lost) to place a high regulatory priority on a substance that will statistically generate fewer deaths than another substance about which more is known.
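This reversal can be sketched numerically. The two hypothetical substances and all probabilities below are invented for illustration: substance A is well studied and expected to cause more deaths, while poorly studied substance B carries a much higher plausible upper bound.

```python
# Sketch: ranking by conservative upper bound versus expected deaths.
# All numbers are hypothetical.
substances = {
    # lists of (possible annual deaths, probability) pairs
    "A (well studied)":   [(4.0, 0.5), (6.0, 0.5)],
    "B (poorly studied)": [(0.1, 0.9), (20.0, 0.1)],
}

def expected(dist):
    """Expected annual deaths: what a risk-neutral manager minimizes."""
    return sum(x * p for x, p in dist)

def upper_bound(dist):
    """Plausible worst case: what a conservative estimate reports."""
    return max(x for x, _ in dist)

for name, dist in substances.items():
    print(f"{name}: expected {expected(dist):.2f}, upper bound {upper_bound(dist):.2f}")

# Ranking by upper bound puts B first (20 > 6), yet A is expected to
# kill more people (5.00 versus 2.09), so the conservative statistic
# alone would misdirect a risk-neutral manager's priorities.
```

Reporting both distributions lets the manager apply his or her own degree of caution rather than inheriting the assessor's.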
Central estimates, such as the mean, maximum likelihood estimate, or mode (most likely value) can likewise mislead risk managers, depending on their values. For example, reporting the expected value does not distinguish between substances that are certain to kill a small number of people and substances that have a small probability of killing many people. Some risk managers may be willing to accept a relatively small number of deaths to avoid the small possibility of disaster. This type of trade-off may underlie society's current trend toward favoring coal-powered electric plants over nuclear plants. Pollution and occupational hazards associated with the former are virtually certain to kill people every year. However, this risk may be more acceptable to society than even the small possibility of another Chernobyl-type incident wiping out entire regions. For "risk-averse" risk managers reflecting societal wishes, reporting the expected number of annual deaths from nuclear power and coal-fired electric plants does not capture all the relevant information. For these risk managers, the full range of possibilities must be reported.
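A minimal sketch makes the limitation of the expected value explicit: two hypothetical hazards with identical expected annual deaths but very different tails. The numbers are invented and not tied to any actual coal or nuclear statistics.

```python
# Sketch: identical expectations can hide very different catastrophe risks.
# All numbers are hypothetical.
import math

certain_small = [(50.0, 1.0)]                       # e.g., routine annual toll
rare_disaster = [(0.0, 0.999), (50_000.0, 0.001)]   # small chance of catastrophe

def ev(dist):
    """Expected annual deaths."""
    return sum(x * p for x, p in dist)

def p_at_least(dist, threshold):
    """Probability of a year with at least `threshold` deaths."""
    return sum(p for x, p in dist if x >= threshold)

# Both hazards have the same expected toll (~50 deaths per year) ...
assert math.isclose(ev(certain_small), ev(rare_disaster))
# ... but only one carries any probability of catastrophe.
print("P(>= 10,000 deaths):",
      p_at_least(certain_small, 10_000), "versus",
      p_at_least(rare_disaster, 10_000))
```

A risk-averse manager comparing only the expected values would see no difference between the two; the tail probabilities carry the information that matters to that manager.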
Finally, in addition to the full range of potencies, the relative likelihood of various values is also important. Clearly the probability of extreme values will influence the regulatory priority a risk manager places on a substance. Even if the range of values for two substances is the same, most risk managers will place a higher priority on the substance for which higher potencies are considered more probable. The relative likelihood of potency values is also important in determining which substances should be further researched. If a broad range of potency values is possible, but only a narrow range is likely, then further research is unlikely to yield new insights. Often, research efforts are better spent on substances for which a broad range of values is similarly plausible. In other words, we can learn more by studying substances we do not understand than we can from studying substances whose properties are already well known.

Implementation Challenges for Distributional Analysis
Skeptics of the distributional approach might argue against it on two grounds: a) the dependence of the distributional approach on subjective judgments undermines its reliability and scientific objectivity, and b) it is more difficult to understand assessments presented in the form of a distribution than it is to understand one or a handful of summary statistics. The first of these problems stems from the effort to incorporate all relevant knowledge. Here, relevant knowledge includes both the methodology and results of specific experiments and the scientific judgment regarding the relative quality and applicability of different data sets and the plausibility of underlying biological mechanisms. Currently used risk assessment approaches often specify which data sets are relevant and how they should be interpreted. Only uncertainty arising from the stochastic nature of the experiments is considered in the assessment of different potency values.
The relative likelihood of different theoretical carcinogenic mechanisms, along with the relative importance of different data sets, cannot be resolved by well-defined procedures. Instead, the distributional approach must rely on the subjective judgment of qualified experts. Techniques for eliciting expert judgment are subject to a number of difficulties (13,17,18). Experts tend to express too much confidence in their opinions, and hence tend to underestimate uncertainty. Moreover, it is not always clear how to combine conflicting views expressed by different experts. Finally, there is no universally accepted procedure to determine who the experts are. All of these problems are active areas of study in the decision analysis and expert judgment fields (13,19). Critics might argue that the lack of objective procedures for making these subjective judgments leaves the process vulnerable to manipulation by interested parties. But these subjective judgments cannot be avoided. Alternative risk assessment procedures make the same choices. The difference is that while the distributional approach makes difficult decisions about the relevance of various data sets and the plausibility of competing mechanisms explicitly, other risk assessment approaches make these decisions implicitly. Because the decisions are implicit, currently used risk assessment methodologies are subject to a far more insidious form of manipulation than is the distributional approach. These implicit, uniform decisions are not only wrong in many circumstances but cannot even be debated because of their hidden nature.
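One simple and widely used device for combining conflicting expert views is a linear opinion pool, a weighted average of the experts' probabilities. The sketch below is illustrative only: the experts, their probabilities, and the weights assigned to them are all hypothetical, and choosing those weights is itself one of the unresolved judgment problems discussed above.

```python
# Sketch: linear opinion pool over hypothetical expert judgments of
# P(the animal carcinogenic mechanism is relevant to humans).
experts = {
    # name: (expert's probability, weight given to that expert)
    "expert 1": (0.9, 0.5),
    "expert 2": (0.4, 0.3),
    "expert 3": (0.1, 0.2),
}

total_weight = sum(w for _, w in experts.values())
assert abs(total_weight - 1.0) < 1e-9   # weights must sum to one

# Pooled probability is the weight-averaged judgment.
pooled = sum(p * w for p, w in experts.values())
print(f"pooled probability: {pooled:.2f}")   # 0.45 + 0.12 + 0.02 = 0.59
```

The pool makes the disagreement and its resolution explicit and auditable, which is precisely the transparency advantage claimed for the distributional approach.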
The second set of problems associated with the distributional approach stems from the presentation of the potency values in the form of a probability distribution. Other approaches present either a single number or perhaps a range of values accompanied by a best or central estimate of potency. But the complexity of a probability distribution can be avoided only at the cost of information that may be important to the risk manager. Choosing summary statistics to simplify the presentation of the potency assessment necessarily involves value judgments, choices that are not appropriate for the risk assessor.
Before concluding, we note that in addition to providing the risk manager with needed information, the distributional approach encourages the scientific community to conduct more helpful research. Scientists are more likely to investigate issues if they believe the assessment process does not have built-in assumptions dictating a certain point of view. They will also be more likely to support the risk assessment process if they do not see it as failing to consider all relevant data or applying inflexible approaches that are incompatible with empirical results. This advantage has important implications for the EPA, which is currently attempting to increase the input of scientists from outside the Agency into the risk assessment and regulatory process (20).
In summary, both the distributional approach and currently used assessment methods face the same difficult choices. The distributional method is superior to the standard approach because it makes use of all available scientific knowledge and deals with difficult judgments explicitly. By doing so, it permits the risk manager to make an optimal choice based on his or her values.

Next Steps
Because the theoretical superiority of the distributional approach is clear, its feasibility must now be demonstrated. We have begun pilot projects to illustrate the method of distributional analysis of carcinogenic potency using formaldehyde and chloroform as demonstration compounds. The formaldehyde analysis consists of the construction of a probability tree and calculation of a risk distribution using our own judgmental probabilities (21). The chloroform work goes beyond this, constructing the probability tree and eliciting probabilities with experts in risk assessment and chloroform science. We urge others to work to demonstrate the distributional approach.
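The mechanics of such a probability tree can be sketched in a few lines. Each branch point carries subjective probabilities for an unresolved judgment (which data set to credit, whether the mechanism is relevant to humans, which interspecies scaling to use), and each leaf yields a potency value; the leaves together form the risk distribution. Every branch probability and potency factor below is invented for illustration and does not reflect our formaldehyde or chloroform analyses.

```python
# Sketch: enumerate a small probability tree over unresolved judgments.
# All branch probabilities and potency factors are hypothetical.
from itertools import product

data_sets  = [("mouse data", 0.6, 2.0),            # (branch, P, potency factor)
              ("rat data",   0.4, 0.5)]
mechanisms = [("relevant to humans", 0.7, 1.0),
              ("species-specific",   0.3, 0.0)]    # implies zero human potency
scalings   = [("body weight",  0.5, 1.0),
              ("surface area", 0.5, 3.0)]

# Each leaf: joint probability of the branch choices and the resulting potency.
leaves = [(pd * pm * ps, fd * fm * fs)
          for (_, pd, fd), (_, pm, fm), (_, ps, fs)
          in product(data_sets, mechanisms, scalings)]

assert abs(sum(p for p, _ in leaves) - 1.0) < 1e-9  # probabilities sum to one
expected_potency = sum(p * v for p, v in leaves)
print(f"{len(leaves)} leaves; expected potency {expected_potency:.2f}")
```

The full set of (probability, potency) leaves, not just the expectation printed here, is what the distributional approach would hand to the risk manager.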