Health Risks from Dioxin and Related Compounds: Evaluation of the EPA Reassessment

2
General Considerations of Uncertainty and Variability, Selection of Dose Metric, and Dose-Response Modeling

Health risk assessments now typically include discussion of variability (real differences) and uncertainty (fundamental lack of knowledge) and often use probabilistic risk assessment methods to characterize variability and uncertainty in the estimates of risks. Prior National Research Council (NRC) reports and U.S. Environmental Protection Agency (EPA) documents make clear the need for these characterizations; for example, they emphasize that

uncertainty forces decision-makers to judge how probable it is that risks will be overestimated or underestimated for every member of the exposed population, whereas variability forces them to cope with the certainty that different individuals will be subjected to risks both above and below any reference point one chooses (NRC 1994, p. 237)

and that

[i]n successive versions of its cancer guidelines, EPA expressed increasing emphasis on a full examination of uncertainties, with the recognition that both qualitative and quantitative approaches to uncertainty assessment are important and can (applied appropriately) help clarify the nature of assessment findings. The use of sophisticated uncertainty tools also involves substantial issues of science and mathematics, as well as specialized issues such as the appropriate presentation and characterization of probabilistic estimates in the decision making context where appropriate. (EPA 2004a, p. 49)

Significant uncertainties remain in understanding human health risks from 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), other dioxins, and dioxin-like compounds (DLCs), in spite of very large investments in data collection and research. Variability among members of the population is an important consideration in understanding risks. Variability results from the wide range of environmental sources and human interactions with them, as well as from physiological and genetic differences that might influence the relative susceptibility of humans and other species to adverse health effects from exposure. For example, sources of variability associated with human health outcomes include the inherent genetic diversity of human populations, which currently remains difficult to address quantitatively. Abundant evidence demonstrates complex gene-environment interactions for many complex human diseases, immune system dysfunction, and other disorders in which TCDD, other dioxins, and DLCs might be implicated. Adding more complexity, the risks from TCDD, other dioxins, and DLCs continue to change over time because of changing exposures, and understanding of the risks continues to evolve with the collection of more data. Any assessment reflects a snapshot of the information available at that time, and analysts should recognize that additional information might later reveal evidence that differs from prior assumptions.

One of the charges to the committee emphasized reviewing the Reassessment¹ “to assess whether EPA’s risk estimates are scientifically robust and whether there is a clear delineation of all substantial uncertainties and variability.” Risk assessment in the case of TCDD, other dioxins, and DLCs represents a formidable task because of the size of the available database and the complexity of numerous issues. EPA collated and presented a massive database on TCDD, other dioxins, and DLCs, on which the committee commented specifically in the chapters that follow.
¹ The Exposure and Human Health Reassessment of 2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) and Related Compounds (EPA 2003a, Part I; 2003b, Part II; 2003c, Part III) is collectively referred to as the Reassessment.

This chapter identifies the major categories of decisions that analysts generally make when developing risk estimates in the context of the four traditional steps of risk assessment: hazard identification and classification, exposure assessment, dose-response assessment, and risk characterization (NRC 1983). The Reassessment deals with complexities in the risk assessment of TCDD, other dioxins, and DLCs by making specific choices as described in this chapter, but EPA could alternatively use a probabilistic approach. Typically, risk assessments should address uncertainties that derive from conceptualizations and fundamental choices among competing options in a way that clearly identifies the quantitative impacts of alternatives. When there are two or more plausible interpretations, a risk assessment should make clear that such alternatives give rise to uncertainty. To this end, a risk assessment should identify the key uncertainties (those that drive the risk estimates) and make clear how selection of specific alternative assumptions influences the risk assessment results. In general, the choice of individual or population risk metric that is modeled influences the appropriate characterization of variability and uncertainty in risk (Thompson and Graham 1996).

The Reassessment strives to present a comprehensive baseline risk assessment intended to cover all potential sources. This generic approach results in limited discussions of variability and uncertainty. The committee found that the lack of a specific context and absence of a focused exposure assessment that would link sources to potential health effects in individuals, or in the population, severely limited both EPA’s and the committee’s abilities to appropriately characterize variability and uncertainty in risk estimates related to exposure to TCDD, other dioxins, and DLCs.

HAZARD CLASSIFICATION

In the context of the Reassessment, EPA faced decisions on assigning a hazard classification for TCDD and for other dioxins and DLCs, including mixtures. Hazard classification typically focuses on characterizing the weight of the evidence with respect to potential health effects.
For cancer risk, the cancer guidelines (EPA 2005a, also see Appendix B) outline specific criteria for classifying substances into the following categories:

Carcinogenic to humans
Likely to be carcinogenic to humans
Suggestive evidence of carcinogenic potential
Inadequate evidence to assess carcinogenic potential
Not likely to be carcinogenic to humans

The charge to the committee stated that it should address “the scientific evidence for classifying dioxin as a human carcinogen.” The committee believes that the scientific evidence on cancer causation usually falls within a continuum, and classification often artificially places apparent bright lines (e.g., in distinguishing a “known human carcinogen” from a “likely human carcinogen”). In Chapter 5, the committee reviews and comments on EPA’s decisions with respect to its determinations of cancer classification.

With respect to noncancer end points, the committee notes that EPA does not use a rigorous approach for evaluating evidence from studies and the weight of their evidence in the Reassessment. The committee finds that EPA’s lack of systematic evaluation and classification of the noncancer evidence left significant ambiguity about the basis for some of EPA’s decisions implied in the report (e.g., the decision not to identify a critical effect or to develop a reference dose [RfD]). The Reassessment provides an extensive catalog of studies but does not synthesize the significant insights or provide clear assessments of the key uncertainties in a way that allows the reader to determine the impact of various choices made. In general, the use of a rigorous evaluation process for noncancer hazards would lead to improved characterization of noncancer risks. In the context of the Reassessment and any future iterations of this analysis, the committee suggests that EPA focus its efforts on improving its quantitative characterization of the risks, including noncancer risks, and not devote substantial effort to further carcinogen classification for TCDD, other dioxins, and DLCs, as discussed in Chapter 5.

EXPOSURE ASSESSMENT

EPA provided the committee with an updated exposure inventory (EPA 2005b), which provides an extensive review of the existing database of exposure data for TCDD, other dioxins, and DLCs. The inventory also provides a useful qualitative review of the level of confidence in the data for various sources, although the Reassessment does not quantitatively characterize the uncertainty associated with low-confidence data. Although the Reassessment (Part III, p. 4-6) specifically mentions the possibility of unknown sources causing underestimation of releases from contemporary sources, it does not attempt to correct the incomplete accounting of sources in historical data or adjust current data to address anticipated discoveries of other sources. Thus, EPA implicitly assumed that the exposure assessment sufficiently captures the exposure sources so that any additional new sources identified would not significantly alter its estimates.
The committee discusses this choice in more detail in Chapter 4 and suggests additional analyses that might further explore the impacts of this assumption.

The updated exposure inventory devotes considerable attention to documenting how the nature and magnitude of dominant exposure sources changed over time. The substantial amount of new evidence of significant declines in measured concentrations of TCDD, other dioxins, and DLCs over the past several decades reflects EPA’s specific management efforts targeted at reducing exposure from some sources (e.g., pulp and paper mills, medical and municipal waste incineration, and ball clay²). Referring specifically to TCDD, EPA notes that “dioxin levels in the environment have been declining since the 1970s…, and it is reasonable to expect that levels in food, human intake, and ultimately, human tissue have also declined over this period. The changes in tissue levels are likely to lag the decline seen in environmental levels, and the changes in tissue levels cannot be assumed to occur proportionally with declines in environmental levels” (Reassessment, Part III, p. 4-16). Changing concentrations in the environment over time provide another substantial uncertainty in risk assessment, because EPA must decide whether to use specific “snapshot-in-time” concentrations for risk assessment or whether to extrapolate or average such changing concentrations over time. Given the timing of the updated exposure inventory, it was not clear to the committee how EPA intends to use the updated inventory information in the context of estimating current exposures.

² The term ball clay originated from an early English mining practice of rolling the highly plastic clay into balls weighing 30 to 50 lb. Ball-clay uses historically included serving as a supplement in animal feeds (as in chicken feed). In 1996, as a result of investigations into the source of contamination with TCDD and other dioxins in chicken fat, investigators measured relatively high levels of TCDD and related compounds in ball clay (FDA 1997).

Another area of uncertainty lies in determining what constitutes background exposures in the general population. EPA carefully defines “background” in a prominent footnote (Reassessment, Part III, p. 1-1), and the committee concurs that this approach is appropriate and is clearly presented in the Reassessment. However, the uncertainty associated with potential discoveries of “new sources” will remain an issue that EPA may need to analyze further. For example, the Reassessment added a chapter on ball clays in the latest iteration.

Yet another area of uncertainty is determination of background levels when many samples lie below the analytical limit of detection. This issue arises in any exposure assessment, and several widely used options address it (e.g., assume all nondetects are true zeroes, assign a value of either 1/2 or 1 times the detection limit, or fit a distribution to the data).
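The sensitivity to the nondetect strategy described above can be illustrated with a minimal sketch. It compares the sample mean under three simple substitution rules (zero, one-half the detection limit, and the full detection limit). The sample values, the detection limit, and the function name `substitute_nondetects` are hypothetical and are not taken from the Reassessment; fitting a distribution to the data, the other option mentioned, is not shown.

```python
from statistics import mean

def substitute_nondetects(values, detection_limit, fraction):
    """Replace each nondetect (None) with fraction * detection_limit."""
    return [fraction * detection_limit if v is None else v for v in values]

# Hypothetical concentrations (arbitrary units); None marks a nondetect.
samples = [None, 0.8, None, 1.5, 2.2, None, 0.9, 3.1]
DETECTION_LIMIT = 0.5  # hypothetical analytical limit of detection

# Substitution rules: true zero, half the detection limit, full detection limit.
strategies = {"zero": 0.0, "half_dl": 0.5, "full_dl": 1.0}
means = {
    name: mean(substitute_nondetects(samples, DETECTION_LIMIT, fraction))
    for name, fraction in strategies.items()
}

for name, value in means.items():
    print(f"{name}: mean = {value:.4f}")
```

With three of eight samples below the detection limit, the estimated mean in this made-up data set shifts by roughly 18% between the zero and full-detection-limit rules; the larger the share of nondetects, the more the choice matters.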
The committee noted that EPA did not pick a single consistent approach (see the note to the summary table at the bottom of Part III, p. 4-32) or provide a clear quantitative indication of the importance of the choice of strategy for dealing with nondetects, which creates inconsistencies in the Reassessment. The committee recommends that EPA clearly and quantitatively explore how different strategies for dealing with nondetects affect exposure assessment results, as discussed in Chapter 4. If these alternative approaches produce very different results, then EPA should further consider the implications of specific options.

Another major source of uncertainty stems from the selection of a dose metric. The Reassessment could provide exposure estimates for a wide range of dose metrics and averaging times to support the spectrum of possible dose-response assessment choices. This important issue is discussed in more detail below. The Reassessment also provides little insight about bioavailability, an issue that frequently falls between the domains of the exposure assessment and dose-response assessment.

Finally, the Reassessment provides very little information about the amount of individual variability in exposure. EPA describes how average daily toxic equivalent quotient (TEQ) varies as a function of age (Reassessment, Part III, pp. 4-16, 4-17, and 4-35), although it does not provide a measure of the variability around these estimates (that is, the population distribution of exposures within each age group). EPA’s description may implicitly give the impression of very limited variability within the population, which may not be the case. However, the Reassessment provides some good examples of other parameters that may influence interindividual variability. For example, considering the variability in total fat consumption, the Reassessment suggests that TCDD intakes in the general population could extend to levels at least three times higher than the mean (Reassessment, Part III, p. 4-19). The exposure assessment also demonstrates that TCDD intake for children based on age-specific food consumption and average food concentrations exceeds adult intake estimates on a body-weight basis (although their intake on a mass basis is lower) (Reassessment, Part III, p. 4-35). These examples also illustrate the difficulties that arise in choosing an appropriate overall averaging time for exposure.

ASSESSMENT OF OTHER DIOXINS AND DLCS

The challenge of characterizing the risks from complex mixtures also leads to important choices. EPA’s use of a TEQ approach represents the prevailing strategy (in the United States and internationally). In Chapter 3, the committee provides an in-depth evaluation of EPA’s use of toxic equivalency factors (TEFs) and TEQs. This issue also represents an important area of uncertainty in the overall risk assessment. The Reassessment states that “despite the uncertainties in the TEF methodology, the use of this methodology decreases the overall uncertainty of the risk assessment” (Reassessment, Part III, p. 1-10).
Although that may be true, EPA should quantitatively support the argument with some comparisons or data. The Reassessment also notes that “TEFs are the result of scientific judgment of a panel of experts who used all of the available data, and they are selected to account for uncertainties in the available data and to avoid underestimating risk. In this sense, they can be described as public-health conservative values” (Reassessment, Part III, p. 1-5). The committee recommends that EPA quantify the extent to which the TEF estimation process may be health protective. In addition, because TEFs continue to evolve (see Chapter 3), EPA must continue to choose which TEF values to use and which congeners to include. Such choices will influence exposure estimates as well as the uncertainties associated with those estimates. The Reassessment acknowledges the difficulty of comparing different human-exposure data sets because some do not include coplanar polychlorinated biphenyls in the estimation of TEQ values.

The Reassessment clearly states that TCDD per se is not the main contributor to TEQ levels in human lipids (Part III, Table 4-5). The Reassessment uses the calculation of body burden at steady state, its associated assumptions given in the Reassessment (Part III, section 1.3), best estimates of current adult intakes, and the assumption of 25% body fat to calculate the TEQ concentration in human lipids. The resulting estimate is about one-half the level currently measured in human lipids. The Reassessment suggests that this discrepancy arises from the presence of an historical body burden and lipid concentration, but it does not consider other possibilities.

GENERAL ISSUES RELATED TO VARIABILITY AND UNCERTAINTY ASSOCIATED WITH SELECTION OF DOSE METRIC AND DOSE-RESPONSE MODELING

EPA makes a number of assumptions about the appropriate dose metric and mathematical functions to use in the Reassessment’s dose-response analysis (see “Selection of Dose Metric” and “Dose-Response Modeling” in this chapter for specific issues related to dose metric and dose-response modeling). The Reassessment does not adequately comment on the extent to which each of these assumptions could affect the resulting risk estimates. EPA discussed various dose metrics and selected one particular metric based on its judgment. However, EPA did not quantitatively describe how this particular selection affected its estimates of exposure and therefore provided no overall quantitative perspective on the relative importance of the selection. EPA faced numerous choices with respect to developing quantitative models for characterizing cancer risk from exposure to TCDD, other dioxins, and DLCs (summarized in Table 2-1) and for characterizing noncancer effects (summarized in Table 2-2).
TABLE 2-1 Categories of Key Decisions EPA Faced in Characterizing Cancer Risk

Basis for Quantification: Epidemiological and bioassay data; Epidemiological data; Bioassay data; Other
Epidemiological Data Set: Choose from individual studies; Use multiple studies
Bioassay Data Set: Choose from individual studies; Use multiple studies
Dose-Response Model: Low-dose linear; Nonlinear; Multiple; Other
Dose Metric: Average daily dose; Area under the curve; Lifetime average body burden; Peak; Other
Point of Departure: ED01; ED05; ED10; LED01; Other

Abbreviations: ED, effective dose; LED, lower confidence limit on ED.

TABLE 2-2 Categories of Key Decisions EPA Faced in Characterizing Noncancer Risk

Basis for Quantification: Epidemiological and bioassay data; Epidemiological data; Bioassay data; Other
Epidemiological Data Set: Choose from individual studies
Bioassay Data Set: Choose from individual studies
POD: LOAEL; NOAEL; ED01; ED05; ED10; BMD; Other
Dose Metric: BB; ADD; AUC; Peak; Other
Critical Effect Choice: Reproductive and developmental; Immunotoxicity; Neurotoxicity; Central nervous system; Diabetes; Enzymatic change; Other

Additional Categories

Exposure Route: Ingestion; Inhalation; Multiple; Other
Exposure Time: Depends on individual studies
Type of Dosing: Single; Multiple
DRD: Yes; No
U.F. (Database): 10; 3; 1; Chemical-specific adjustment factor; Other
U.F. (Interspecies): 10; 3; 1; Chemical-specific adjustment factor; Other
U.F. (Intraspecies): 10; 3; 1; Chemical-specific adjustment factor; Other

Abbreviations: POD, point of departure; LOAEL, lowest-observed-adverse-effect level; NOAEL, no-observed-adverse-effect level; ED, effective dose; BMD, benchmark dose; BB, body burden; ADD, average daily dose; AUC, area under curve; DRD, develop reference dose; U.F., uncertainty factor.

The Reassessment characterizes the risk of cancer at background and incremental intakes by using a cancer slope factor (CSF), and it recommends the use of a margin of exposure (MOE) for both noncancer and cancer end points (Reassessment, Part III, p. 6-12). The committee did not find EPA’s justification sufficient for why it used different methods to characterize risk for end points that have the same basic underlying mode of action. The committee noted that the Reassessment should also quantitatively characterize the impact of this choice.

The Reassessment concludes that setting an RfD is not appropriate because of the relatively high background levels compared with effect levels and suggests that setting an RfD provides little value for evaluating possible risk management options if average background exposure exceeds the RfD (Reassessment, Part III, p. 6-14). As discussed in Chapter 7, this decision conflicts with the choices made by other international regulatory bodies (e.g., European Scientific Committee on Food, Food and Agriculture Organization of the United Nations [FAO]/World Health Organization [WHO], and the Joint Expert Committee on Food Additives [JECFA]). EPA’s decision not to specify an RfD in the Reassessment may have depended on the set of specific assumptions it selected, such as use of the 1% effective dose (ED01) as the point of departure (POD) for this calculation and the magnitude of the applicable uncertainty factors. The Reassessment provides a thorough statement of the potential sources of uncertainty for consideration in noncancer risk assessment, many of which also apply in the context of cancer risk assessment:

Consideration should be given to a number of difficulties and uncertainties associated with comparing the same or different endpoints across species, such as differences in sensitivity of endpoints, times of exposure, exposure routes, and species and strains; the use of multiple or single doses; and variability between studies even for the same response. The estimated ED01s may be influenced by experimental design, suggesting caution should be used when comparing values from different designs. Caution should also be used when comparing studies that extrapolate ED01s outside the experimental range. Furthermore, it may be difficult to compare values across endpoints. For example, the human health risk for a 1% change of body weight may not be equivalent to a 1% change in enzyme activity. Similarly, a 1% change in response in a population for a dichotomous endpoint is different from a 1% change in a continuous endpoint, where the upper bound of possible values may be very large, leading to a proportional increase in what constitutes the 1% effect level. Finally, background exposures are often not considered in these calculations simply because they were not known. (Reassessment, Part III, p.
5-24)

The Reassessment used empirical, full dose-response modeling to estimate PODs, specifically an ED for cancer and noncancer. Historically, a POD for a noncancer end point was based on a no-observed-adverse-effect level (NOAEL) or a lowest-observed-adverse-effect level (LOAEL), a practice inconsistent with cancer risk assessment. EPA now recommends the use of a benchmark dose (BMD) approach to derive a POD for noncancer end points. Although a lower confidence bound on an ED was cited in the literature to define a BMD, EPA’s BMD guidance document (EPA 2000b) defines the ED, BMD, and the lower one-sided confidence limit on the BMD (BMDL).³ This definition unified the determination of PODs for cancer and noncancer end points. The modeling process generally involves two steps:

The first step is an analysis of dose and response in the range of observation of the experimental or epidemiologic studies. The modeling yields a POD near the lower end of the observed range, without significant extrapolation to lower doses. The second step is extrapolation to lower doses. The extrapolation approach considers what is known about the agent’s mode of action. Both linear and nonlinear approaches are available. (EPA 2005a)

This analysis requires making several key decisions, including primarily (1) determining appropriate types of studies (epidemiological, animal, both, and other), (2) choosing specific studies and subsets of data (e.g., species and gender), (3) choosing specific end points for dose-response modeling, (4) choosing a specific dose metric, (5) choosing model type and form, (6) selecting the benchmark response (BMR) and POD, and (7) characterizing uncertainty. Current EPA practice generally relies on choosing to model a single data set, specifically the one that tends to show the most significant potential adverse effect. This choice can introduce substantial uncertainty into the risk estimation process, particularly in cases in which different data sets yield very different results. One way to avoid the uncertainty introduced by the selection of a single data set is to use multiple data sets. In particular, EPA could place some weight on each of a number of data sets. Chapters 5 and 6 review EPA’s data set choices made in the Reassessment.

³ “BMD is used generically to refer to the benchmark dose approach; in the more specific cases, BMD … refer[s] to the central estimates, for example the EDx … for dichotomous endpoints (with x referring to some level of response above background, e.g., 5% or 10%). BMDL … refers to the corresponding lower limit of a one-sided 95% confidence interval on the BMD….” (EPA 2000b, Executive Summary)

GENERAL ISSUES RELATED TO RISK CHARACTERIZATION

Critical issues related to risk characterization (see Chapter 7) include the impact of decisions on the information communicated to risk managers about the magnitude of uncertainties associated with the data used to generate risk estimates.
The impact of choices made in the risk assessment process can be characterized by quantifying the impact of plausible alternative assumptions at critical steps. The risk estimates can be most fully characterized by performing probabilistic analyses when possible and by presenting the range of possible risk estimates rather than by reporting the single point estimates. Risk characterization should provide useful information to risk managers to help them understand the variability and uncertainty in the risk estimates. As further discussed in Chapter 5, the committee understands that quantitatively addressing all sources of uncertainty in a risk assessment can impose an analytical burden, which may result in addressing some sources of uncertainty qualitatively. Quantifying the contribution of various assumptions to the overall uncertainty often proceeds

response relationship, called the “quantal dose-response relationship,” for a population of exposed individuals. The quantal dose response describes the relationship between exposure and the proportion of the population that will exhibit a health effect (that is, a separate relationship for each adverse end point). In the case of TCDD, other dioxins, and DLCs, it is important to assess the population-based dose-response relationship for cancer, birth defects, immunotoxic effects, and so forth. For each end point of interest, individuals in a population (e.g., rats and mice in laboratory studies and humans in epidemiological studies) are identified as either responders or nonresponders at defined doses (quantal responses). The cumulative quantal dose-response relationship for the population is then determined from the distribution of responses in the population across a defined range of doses.

The term threshold is often used to describe the dose below which no response occurs for the graded (continuous) dose-response relationship or the dose below which the probability of anyone in the population responding approaches zero for the cumulative quantal dose-response relationship. A common but scientifically unachievable goal in risk assessment is to identify a threshold dose that protects everyone in the population. The term offers some value in recognizing that for the vast majority of dose-response relationships (either individual or population) some doses may exist below which no measurable responses occur (in an individual or a population). However, the term threshold remains subject to many vagaries of interpretation, and the committee prefers to express ranges of dose in terms of MOEs.
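The cumulative quantal dose-response relationship described above can be sketched in a few lines. The sketch assumes a simple tolerance picture in which each individual responds once the dose reaches that individual's own tolerance; this picture, the tolerance values, and the function name `fraction_responding` are hypothetical illustrations and do not come from the Reassessment.

```python
def fraction_responding(tolerances, dose):
    """Proportion of individuals whose tolerance dose is at or below the given dose."""
    responders = sum(1 for t in tolerances if t <= dose)
    return responders / len(tolerances)

# Hypothetical individual tolerance doses (arbitrary units) for ten individuals.
tolerances = [2.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 8.5, 12.0]

# Cumulative quantal dose-response: proportion of responders at each test dose.
doses = [1.0, 4.0, 6.0, 12.0]
curve = [(d, fraction_responding(tolerances, d)) for d in doses]

for dose, fraction in curve:
    print(f"dose {dose:5.1f}: {fraction:.0%} responding")
```

In this made-up population no one responds at a dose of 1.0, but that observation by itself cannot establish a population threshold: a larger sample could include individuals with lower tolerances, which is one reason a threshold that protects everyone is hard to demonstrate.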
MOEs are usually defined as the ratio of the highest dose (daily exposure) to an agent presumably without adverse impact on the human population (the so-called reference dose; Faustman and Omenn 2001) to the estimated daily human dose that might occur, determined from analysis of actual exposure scenarios. Because of inherent biological differences between individuals, as well as the probabilistic nature of many toxic responses, distributions in responses in a population will always exist (that is, not everyone responds the same way to the same dose). In human populations, differences arise from genetic diversity, differences in age, gender, nutritional status, diseases, and other concomitant exposures, which can modify the response of an individual to a toxic substance. However, such contributors to human variability are presumably represented in the data sets obtained in human population-based studies (epidemiological studies), although any one study generally cannot capture the full range of possible individual variability in response. A second major challenge in establishing population-based dose-response relationships in epidemiological studies arises from the frequently poor quality of exposure (dose) information. Although well-designed occupational and environmental epidemiological studies can yield useful information on human population variability, relatively little quantitative information is available about the potential impact of genetic polymorphisms in the human population that might give rise to differences in susceptibility to the toxic effects of TCDD, other dioxins, and DLCs. Chapters 5 and 7 provide more discussion about genetic, molecular, and biochemical mechanisms that might contribute to interindividual variation in response to TCDD, other dioxins, and DLCs.

With these caveats noted, risk assessors commonly take existing data sets (both animal and human) and attempt to develop mathematical models to characterize the shape of the dose-response relationships from the observed data. Dose-response modeling is a process to formally quantify dose-related changes in the incidence or severity of an adverse effect. The scale of the response can be quantal (e.g., cancer incidence) or continuous (e.g., AHR-binding immune response). Analysts use mathematical functions (preferably with mechanistic parameters) to describe the dose-response relationship observed in the data. In the case of cancer or any quantal outcome, the dose-response model, R(dose), is the same as the probabilistic risk of the adverse outcome. With this dose-response model, or risk, R(dose), the EDα, at which there is a prespecified, small amount (typically 1% to 10%) of risk increase α above the background, can be estimated by the following equation of excess risk:

\[ \frac{R(\mathrm{ED}_\alpha) - R(0)}{1 - R(0)} = \alpha \]

The risk increase α is called the effective dose level. Because R(dose) is a statistically estimated quantity (function), the resultant EDα is subject to data variation.
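The excess-risk definition of EDα can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the excess risk to be the extra-risk form (R(d) − R(0)) / (1 − R(0)), which matches the 5% × 97.5% arithmetic used in the IQ example, and it uses a hypothetical Weibull-type model whose parameter values (`background`, `slope`, `shape`) are invented for the example and are not taken from the Reassessment.

```python
import math


def weibull_risk(dose, background=0.05, slope=0.02, shape=1.5):
    """Hypothetical quantal model R(d) = g + (1 - g) * (1 - exp(-b * d**k))."""
    return background + (1.0 - background) * (1.0 - math.exp(-slope * dose ** shape))


def extra_risk(risk, dose):
    """Extra risk over background: (R(d) - R(0)) / (1 - R(0))."""
    r0 = risk(0.0)
    return (risk(dose) - r0) / (1.0 - r0)


def effective_dose(risk, alpha, upper=1000.0, iterations=200):
    """Bisection for the dose whose extra risk equals alpha.

    Assumes R(d) is nondecreasing in dose and that the answer lies in [0, upper].
    """
    if extra_risk(risk, upper) < alpha:
        raise ValueError("upper bound does not reach the requested extra risk")
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if extra_risk(risk, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ed10 = effective_dose(weibull_risk, 0.10)
ed01 = effective_dose(weibull_risk, 0.01)
print(f"ED10 = {ed10:.3f}, ED01 = {ed01:.3f} (arbitrary dose units)")
```

Because the model parameters are themselves estimated from data, any EDα obtained this way inherits their sampling variability; the point value alone says nothing about that uncertainty.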
In the case of a continuous response (or more generally, a nonquantal response), EPA guidance documents discuss how the type of data and biological knowledge will determine appropriate methods using general approaches, but no single approach or model can be universally the “best.” Analysts first fit a dose-response model R(d) to the response data. They then take additional steps to formulate a measure of risk based on the model. Here, R(d) describes the mean response level of the toxicological outcome (e.g., cognitive function as measured in terms of IQ test score in the case of exposure to a neurotoxin). The Reassessment discusses several proposed approaches (Part II, pp. 15-16), all of which identify a dose associated with a specified level of response change relative to the control. For continuous responses, this task is complicated by the ambiguous separation between a “normal response” and an “adverse response.” In lieu of an obvious dividing line, EPA used the “dynamic range” approach (Murrell
et al. 1998), which defines EDα (EPA assigned an α value of 1%) as the dose satisfying the relationship

\[ \frac{R(\mathrm{ED}_\alpha) - R(0)}{R_{\max}} = \alpha \]

where Rmax is the maximum range of total response, either theoretical or estimated under the maximum exposure condition. The main drawback of the approach used by EPA is that the response level associated with the EDα may not be clinically or toxicologically important.

The NRC (2000) described an alternative approach in the context of its review of methylmercury toxicity, based on work by Crump (1984) and Gaylor and Slikker (1992). That approach first identifies an adverse response level, which demarcates normal and abnormal (or adverse) responses. For example, in the case of a neurotoxin, an IQ score of 70 points (two standard deviations [SDs] below the population mean of 100 points) could be designated the adverse response level because individuals with IQ scores below this level often require community support to live (WHO 1992, as cited in EPA 2005c). The EDα is then defined to be the neurotoxin dose that increases the background probability of an adverse response by α. Continuing the IQ example, the ED05 is the level of neurotoxin exposure that raises the risk of having an IQ below 70 from its background level of 2.5% by an extra 5% (5% × 97.5% = 4.875%), to a total of 7.375%. The Reassessment (Part II, p. 8-16) identifies difficulties with this approach. Although such an adverse response level might not always identify toxicologically meaningful events, it can identify unusual outcomes outside the normal range. The committee recognizes this challenge and understands that for some end points this may emerge as an insurmountable challenge. Nonetheless, because the EDα definition used by EPA is difficult to interpret toxicologically, EPA should strive to use the alternative approach described here whenever possible.
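The arithmetic of the IQ example can be checked with a few lines of code. The sketch below is illustrative only. The first function reproduces the 2.5% to 7.375% calculation from the text. The second rests on assumptions that are not in the text: that IQ is normally distributed with mean 100 and SD 15, and that exposure shifts the mean downward without changing the SD. Under an exact normal model the background tail below 70 is about 2.3% rather than the rounded 2.5%.

```python
from statistics import NormalDist


def total_risk(background, alpha):
    """Total risk after adding an extra risk alpha on top of the background."""
    return background + alpha * (1.0 - background)


def mean_shift_for_extra_risk(alpha, mean=100.0, sd=15.0, cutoff=70.0):
    """Downward shift in the mean that produces extra risk alpha below the cutoff.

    Assumes a normal distribution whose SD is unchanged by exposure.
    """
    p0 = NormalDist(mean, sd).cdf(cutoff)       # background P(IQ < cutoff)
    target = total_risk(p0, alpha)              # required P(IQ < cutoff) at the ED
    z = NormalDist().inv_cdf(target)            # standard-normal quantile of target
    shifted_mean = cutoff - z * sd              # mean giving that tail probability
    return mean - shifted_mean


print(f"Total risk from the text's example: {total_risk(0.025, 0.05):.5f}")
print(f"Mean IQ shift at the ED05 (normal model): {mean_shift_for_extra_risk(0.05):.2f}")
```

A fitted dose-response model for the mean response would then be inverted to find the dose that produces this shift; that step depends on the data set and is not shown.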
Historically, risk assessment of noncancer effects used a NOAEL or a LOAEL as the POD. The BMD approach (Crump 1984) eliminates some of the limitations of the NOAEL and LOAEL approach and makes the analysis of noncancer effects more consistent with that of cancer. The primary objective of dose-response modeling is to define an ED toward the lower end of the experimental dose range where the model remains supported by adequate data. The ED can then be used as a POD for extrapolation toward an environmental background level or for safety assessment using the MOE approach. The choice of model for dose-response assessment, choice of the POD, and extrapolation below the POD thus represent other key areas of uncertainty. The Reassessment quantified the cancer dose-response relationship
relying primarily on occupational cohort data. EPA also used selected animal bioassay data to confirm the plausibility of the resulting estimates. Specific issues related to choice of data set for cancer risk assessment are discussed in more detail in Chapter 5.

Data Set Selection

Full dose-response modeling requires adequate dose-response data, and adequate selection criteria must be applied. EPA’s guidance document (EPA 2000b, p. 14) states:

In general, studies with more dose groups and a graded monotonic response with dose will be more useful for BMD analysis.… Studies in which responses are only at the same level as background or at or near the maximal response level are not considered adequate for BMD analysis. It is preferable to have studies with one or more doses near the level of the BMR to give a better estimate of the BMD and, thus, a shorter confidence interval. Studies in which all dose levels show changes compared with control values (i.e., no NOAEL) are readily useable in BMD analyses, unless the lowest response level is much higher than that at the BMR.

Depending on whether the scale of the selected end point is quantal (dichotomous), continuous, or categorical, different statistical procedures and models are required for dose-response modeling. EPA’s Reassessment selected a large body of published data sets, using the criteria of (1) a positive dose trend and (2) at least three dose groups in addition to a control (more specifically for noncancer data). In dose-response modeling of human cancer data, EPA further used cancer death incidence (time-to-event) data as the end point, which generally provides more information than mortality data by considering when a death occurred. (These studies are discussed in more detail in Chapter 5.)
Statistical Power and Precision

Although meeting those minimal selection criteria (discussed above) is critical, it does not guarantee adequate statistical power to ascertain the shape of the dose-response curve, and it does not account for the associated uncertainty. In the present context, statistical power refers to the general ability of an experiment, and its associated data set, to provide information needed to make a reliable inference, including testing positive dose effects and ascertaining a fitted dose-response model. The Reassessment did not discuss the issue of statistical power, although the cancer guidelines (EPA 2005a, see also Appendix B) recommend assessing the statistical power of the studies used for dose-response assessment when possible. Even if a study possesses adequate statistical power to
confirm a positive overall dose response within the observed data range, the power might be inadequate to ascertain the shape of the dose-response curve below the POD level. The lack of statistical power at the lower end also represents a problem for both cancer and many of the noncancer data sets, contributing additional uncertainty to the POD.

Choice of the Dose-Response Model

The goal of mathematical modeling in determining a POD is to fit a model that describes the data set well, especially at the lower end of the observable dose-response range. Fitting such a model involves first selecting models for consideration, based on the characteristics of the data and experimental design, and then fitting the models using one of a few established methods. Then, an ED, along with its upper and lower confidence bounds, is calculated at the POD level. In the process, the analysis should evaluate model fitting, determine goodness-of-fit, and compare models to decide which one to use for obtaining the POD. For example, the BMD guidance document (EPA 2000b) recommends use of a P value of 0.1 as the reference critical value for goodness-of-fit (instead of the more conventional values of 0.05 and 0.01), examination of a graphical display of the model fit, and use of Akaike’s information criterion for comparison of models and selection of the model to use. In the case of human cancer data, the Reassessment included fits of linear and nonlinear models to the data (see Chapter 5). With the rodent cancer data, EPA used a simple multistage model fitted with the BMD software program. For noncancer data, EPA used the Hill model as the default for continuous responses, with a power model as the alternative when the Hill model failed to fit the data computationally. (See Chapter 6 for additional discussion about specific noncancer end-point modeling.)
EPA used the Weibull model as the default for quantal noncancer data. The committee commends EPA for using flexible mathematical models (e.g., the Hill and Weibull models) to account for both nonlinear and linear shapes of the dose response for noncancer effects. However, the committee recommends that EPA apply similar efforts in dose-response modeling of human cancer data (see Chapter 5). The Reassessment did not conduct or report statistical tests of goodness-of-fit of the cancer risk models. Two reasons might explain the absence of these test results. First, EPA relied on the models reported in the original publications. For example, Steenland et al. (2001) fitted several models to the risk ratio for cancer death incidence, including a power and a piecewise linear model. The likelihood ratio test showed a statistically significant, positive dose response, but the graphical display clearly showed a potential lack of fit. It is important to note that a higher statistical significance does not correspond to a higher degree of goodness-of-fit of the model to the data. The Reassessment did not distinguish statistical tests of significance from tests of goodness-of-fit. Second, EPA had access only to summary data taken from the published literature for dose-response modeling, not the raw data, and consequently may not have been able to conduct statistical tests for goodness-of-fit. Nonetheless, the committee recognizes that the critical choice of the dose-response model would benefit from as much information as possible.

FIGURE 2-1 Vmax. As used in the BMD software for modeling dose-response data, the term Vmax refers to the modeled maximum percent response seen in the observed data set. SOURCE: N. Walker, NIEHS.

In contrast, EPA adopted an ad hoc method to assess goodness-of-fit in dose-response modeling of noncancer end points.

Specifically, the model fits were evaluated with regard to the observed data. The goodness of the model fit was determined as ‘good’ if the model curve included nearly all of the data point means, ‘marginal’ if the model curve was within one standard deviation of the data point means, or ‘poor’ if model fit was not within one standard deviation of the means. Furthermore, for the Hill model fits, the Vmax [see Figure 2-1] estimates from ‘good’ and ‘marginal’ model fits were subjectively evaluated for stability and biological plausibility with regard to the observed data. This evaluation identified some potential problems with some of the Vmax estimates. In
some cases the error associated with the Vmax could not be calculated by the BMD software. In these cases if the Vmax model estimate was similar to the ‘observed Vmax’ (i.e. the difference between the highest dose response level and the control response level) then the Vmax estimate was considered biologically plausible and was used for the calculation of an ED01. Otherwise the ‘observed Vmax’ was used for calculation of the ED01. (Part II, p. 8-32)

This subjective approach to goodness-of-fit did not identify whether the lack of fit occurs at the higher or lower end of the observed dose-response range. Alternatively, the Reassessment could judge goodness-of-fit of an empirical dose-response model on mechanistic grounds. Finally, a statistically well-fit model alone does not guarantee that the model approximates the true but unknown shape of the dose response, especially below the observed dose-response range. With limited data (e.g., about three dose groups for noncancer data) and limited statistical power, many of the data sets (including epidemiological studies) analyzed in the Reassessment do not provide sufficient information to confirm the true shape of the dose-response curve at the ED01 level. The committee emphasizes that this critical uncertainty about low-dose extrapolation remains one of the most significant uncertainties; at the same time, it represents an uncertainty that EPA probably will not resolve in the short term. When feasible, mechanistic and statistical information should be used to ascertain the shape of the dose-response curve at lower doses. Minimally, EPA should use rigorous statistical methods to assess model fitting to control and reduce the uncertainty of the POD caused by a poorly fitted model.
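The model-comparison step discussed in this section, using Akaike's information criterion (AIC), can be sketched for quantal data as follows. This is a minimal illustration, not EPA's BMD software: the dose groups, responder counts, and parameter values are all hypothetical, and the parameters are simply assumed to be maximum-likelihood estimates obtained elsewhere, since no fitting routine is included here.

```python
import math


def binomial_log_likelihood(model, doses, n_subjects, n_responders):
    """Log-likelihood of quantal data under a model that returns P(response | dose).

    The binomial coefficients are omitted; they are identical for every model
    fitted to the same data and therefore cancel in AIC comparisons.
    """
    total = 0.0
    for d, n, x in zip(doses, n_subjects, n_responders):
        p = model(d)
        total += x * math.log(p) + (n - x) * math.log(1.0 - p)
    return total


def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln(L); lower is preferred."""
    return 2.0 * n_params - 2.0 * log_likelihood


def one_stage(d, g=0.04, b=0.09):
    """Hypothetical two-parameter model with assumed (not fitted) values."""
    return g + (1.0 - g) * (1.0 - math.exp(-b * d))


def weibull(d, g=0.04, b=0.06, k=1.2):
    """Hypothetical three-parameter model with assumed (not fitted) values."""
    return g + (1.0 - g) * (1.0 - math.exp(-b * d ** k))


# Hypothetical quantal data: dose, animals per group, responders per group.
doses = [0.0, 1.0, 3.0, 10.0]
n_subjects = [50, 50, 50, 50]
n_responders = [2, 5, 12, 30]

scores = {
    "one-stage": aic(binomial_log_likelihood(one_stage, doses, n_subjects, n_responders), 2),
    "Weibull": aic(binomial_log_likelihood(weibull, doses, n_subjects, n_responders), 3),
}
preferred = min(scores, key=scores.get)
print(scores, "preferred:", preferred)
```

AIC only ranks the candidate models against one another. It does not show that the preferred model fits adequately, which is why a separate goodness-of-fit test and a visual check of the fit near the low-dose end remain necessary.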
Choice of the POD Value

Selection of the ED (BMR) level is critically important in the calculation of an ED (BMD), and therefore, in the determination of a POD or calculation of a MOE. The current cancer guidelines (EPA 2005a, see also Appendix B) and the draft BMD guidance document (EPA 2000b) give detailed recommendations. For quantal data, an excess risk of 10% was chosen as the default level because 10% response is at or near the limit of sensitivity in most cancer bioassays and in some noncancer studies as well. If a study offers greater than usual sensitivity, then a lower level (e.g., 1%) can be used. EPA recommends the 1% BMR level for epidemiological studies primarily because the 1% level is typically within the observed range. In any case, according to the guidance document, the ED10 should be reported along with any other possible POD options. EPA’s BMD guidance document further recommends:

For continuous data, if there is an accepted level of change in the endpoint that is considered to be biologically significant then that amount of change is the BMR. Otherwise, if individual data are available and a decision can be made about what individual levels should be considered adverse, the data can be ‘dichotomized’ based on that cutoff value, and the BMR set as above for quantal data. Alternatively, in the absence of any other idea of what level of response to consider adverse, a change in the mean equal to one control standard deviation (SD) from the control mean can be used. The control SD can be computed including historical control data, but the control mean must be from data concurrent with the treatments being considered. Regardless of which method of defining the BMR is used for a continuous dataset, the effective dose corresponding to one control SD from the control mean response, as would be calculated for the latter definition, should always be presented for comparison purposes. (EPA 2000b, p. vii)

In EPA’s computation of ED01 for noncancer continuous end points, the 1% BMR level is defined as the change of response from the background level of the control group that was 1% of the maximum possible total response range. The choice of a 1% BMR level ignored EPA’s own guidance that “if there is an accepted level of change in the end point that is considered to be biologically significant then that amount of change is the BMR” (EPA 2000b, p. vii). The Reassessment also did not consider an alternative approach to dichotomize a continuous outcome into normal and extreme outcomes below a lower or above an upper percentile (Gaylor and Slikker 1990), an approach recommended in the BMD guidance document (EPA 2000b) and implemented in EPA’s BMD software program.
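For a Hill model, the 1%-of-range definition of the ED01 has a closed form, which makes its sensitivity to the Hill coefficient easy to see. The sketch below is illustrative and rests on assumptions: it uses the standard Hill form R(d) = R0 + Vmax · d^n / (K^n + d^n), treats the fitted Vmax as the "maximum possible total response range", and uses hypothetical parameter values. Setting the change from background equal to α · Vmax gives ED = K · (α / (1 − α))^(1/n).

```python
def hill_response(dose, background, vmax, k, n):
    """Hill model: R(d) = background + vmax * d**n / (k**n + d**n)."""
    return background + vmax * dose ** n / (k ** n + dose ** n)


def hill_effective_dose(alpha, k, n):
    """Dose at which the change from background equals alpha * vmax."""
    return k * (alpha / (1.0 - alpha)) ** (1.0 / n)


# Hypothetical parameters: background 5, Vmax 40 (response units), K = 10 (dose units).
BACKGROUND, VMAX, K = 5.0, 40.0, 10.0

for n in (1.0, 1.5, 2.0):
    ed01 = hill_effective_dose(0.01, K, n)
    change = hill_response(ed01, BACKGROUND, VMAX, K, n) - BACKGROUND
    print(f"n = {n:.1f}: ED01 = {ed01:.4f}, change = {change:.4f} (1% of Vmax = {0.01 * VMAX})")
```

In this parameterization the ED01 rises with the Hill coefficient, so a poorly determined coefficient translates directly into a poorly determined ED01. The calculation also says nothing about whether a change of 1% of Vmax is biologically meaningful.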
Because the shape of the dose-response is less certain at the lower end of the experimental range, the consequent uncertainty for the ED chosen in this range is important. This uncertainty is likely to be greater for the lower confidence bound of ED01 than for the central estimate of ED01 itself. The Reassessment appears to have largely ignored this issue. As the starting point of extrapolation of risk to environmental exposure levels, the POD directly influences the risk estimate. The lack of fit of the model at the lower end of the dose-response curve leads to substantial extrapolation of the model toward the POD, and that can bias the ED or BMD estimates and widen their confidence intervals, adding substantially to the uncertainty of the estimate.4

4 The accuracy of any experimental measurement is limited by the ability to measure the phenomenon, by any methodological errors introduced through sampling (e.g., limitations in sample size or selection), and by assumptions made in fitting a model to the data. As such, any result obtained provides an estimate of the “true value” with some associated uncertainty. A confidence interval represents the likelihood that the “true value” will occur within the range of the lower and upper confidence bound. For example, statisticians often choose to report a 95% confidence interval, which implies a 95% chance that the true value will fall within the stated range, but this represents a subjective choice and other choices (e.g., 90% confidence interval) are equally valid. The confidence interval depends on the underlying variability of the quantity being measured or modeled and the number of samples collected and/or available to fit the data. For any given result, collecting more samples tends to narrow the confidence interval.

Despite the Reassessment’s consideration of multiple options and the use of flexible model forms (such as the Hill model and the Weibull model) to test for nonlinear dose response, mechanistic knowledge gaps, data gaps, and model gaps remain. For example, many of the data sets of noncancer effects yielded a Hill coefficient greater than 1.5, indicating a plausible nonlinear dose response. However, those studies lacked adequate statistical power to estimate the Hill coefficient reliably, rendering the estimate statistically nonsignificant (that is, the confidence interval includes unity). This result represents a general data gap because the dose-response data required to establish a nonlinear dose-response form do not exist, a problem that becomes magnified in extending nonlinear models to the low-dose range. At present, mechanistic knowledge of both cancer and noncancer effects supports the plausibility of a nonlinear dose response at the lower range (see also Chapter 5), but no adequate data or widely accepted dose-response models describe the shape below a chosen POD at or below the 1% level. It is useful to differentiate the lack of data to confirm the shape of the dose-response curve below the POD from the lack of qualitative evidence of nonlinearity. On the whole, the committee concluded that the empirical evidence supports a nonlinear dose response below the ED01, while acknowledging that the possibility of a linear response cannot be completely ruled out. The Reassessment emphasizes the lack of such nonlinear models, hence its adoption of the approach of linear extrapolation below the POD level.
Although this approach remains consistent with the cancer guidelines (EPA 2005a, see also Appendix B), EPA should acknowledge the qualitative evidence of a nonlinear dose response in a more balanced way, continue to fill in the quantitative data gaps, and look for opportunities to incorporate mechanistic information as it becomes available. The committee recommends adopting both linear and nonlinear methods of risk characterization to account for the uncertainty of dose-response relationship shape below ED01. With respect to dose-response modeling, the committee recommends that the Reassessment explicitly acknowledge the lack of statistical power (precision) of the data to estimate the ED01 or test nonlinearity of the dose response below the POD level of choice (e.g., ED01). The committee notes that the choice of the 1% response level as the POD substantially affects both the cancer and the noncancer analyses,
perhaps driving EPA’s decision not to develop an RfD. The committee recommends that the Reassessment use levels of change that represent clinical adverse effects to define the BMR level for noncancer continuous end points as the basis for an appropriate POD in the assessment of noncancer effects. The Reassessment should also explicitly address the importance of statistical assessment of model fit at the lower end and the difficulties in such assessments, particularly when using summary data from the literature instead of the raw data, although estimates of the impacts of different choices of models would provide valuable information about the role of this uncertainty in driving the risk estimates.

CONCLUSIONS AND RECOMMENDATIONS

Although EPA qualitatively addressed many sources of uncertainty and variability, the Reassessment does not adequately address uncertainty and variability that result from the numerous decisions EPA made in deriving point estimates of cancer risk in the comprehensive risk assessment. In contrast, EPA used concerns about uncertainties and uncertainty factors as part of the justification for not setting an RfD for noncancer effects (see Chapter 7 for further discussion). The Reassessment does not provide details about the magnitudes of the various uncertainties surrounding the decisions EPA makes in relation to dose metrics (e.g., the impact of species differences in percentage of body fat on the steady-state concentrations present in nonadipose tissues). The committee recommends that EPA use simple PBPK models to define the magnitude of any differences between humans and rodents in the relationship between total body burden at steady-state concentrations (as calculated from the intake, half-life, bioavailability) and tissue concentrations. The same model could be used to explore human variability in kinetics in relation to elimination half-life.
EPA should modify the estimated human equivalent intakes when necessary. Many opportunities exist to further characterize sources of uncertainty and variability related to the dose metric choices, and the committee recommends that EPA provide a clear evaluation of the impacts of possible choices on the risk estimates. The committee recommends that EPA make greater use of mechanistic information to assess the biological plausibility of different mathematical models, use more rigorous criteria (e.g., goodness-of-fit tests) and follow its own guidance (EPA 2000b) in deriving a POD, and clearly identify the BMR level of toxicological significance for noncancer end points. Many opportunities exist to further characterize sources of uncertainty and variability related to the POD and extrapolation choices, and the committee recommends that EPA provide a clear evaluation of the impacts of possible choices on the risk estimates.
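The steady-state body burden mentioned in the recommendation on PBPK models, "calculated from the intake, half-life, bioavailability", can be sketched with a one-compartment, first-order elimination model. This is a simplifying assumption and not a PBPK model: at steady state, body burden = intake × fraction absorbed × half-life / ln 2. All numerical values below are illustrative assumptions rather than values taken from the text.

```python
import math


def steady_state_body_burden(daily_intake, half_life_days, fraction_absorbed):
    """Steady-state body burden (ng/kg) for a daily intake in ng/kg-day.

    Assumes one compartment with first-order elimination, so the elimination
    rate constant is ln(2) / half-life.
    """
    return daily_intake * fraction_absorbed * half_life_days / math.log(2.0)


def intake_for_body_burden(body_burden, half_life_days, fraction_absorbed):
    """Daily intake (ng/kg-day) that maintains a given steady-state body burden."""
    return body_burden * math.log(2.0) / (fraction_absorbed * half_life_days)


# Illustrative assumptions only: a 7.1-year human half-life and 80% absorption.
HUMAN_HALF_LIFE_DAYS = 7.1 * 365.0
FRACTION_ABSORBED = 0.8

target_burden = 10.0  # ng/kg, hypothetical body burden at a rodent POD
human_intake = intake_for_body_burden(target_burden, HUMAN_HALF_LIFE_DAYS, FRACTION_ABSORBED)
print(f"Human-equivalent intake: {human_intake:.5f} ng/kg-day")

# Variability in kinetics: the same intake under shorter or longer half-lives.
for years in (5.0, 7.1, 10.0):
    burden = steady_state_body_burden(human_intake, years * 365.0, FRACTION_ABSORBED)
    print(f"half-life {years:4.1f} y -> body burden {burden:.2f} ng/kg")
```

Because body burden scales linearly with half-life in this simple model, interindividual differences in elimination half-life carry through directly to the dose metric, which is one reason the choice of dose metric deserves an explicit uncertainty analysis.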

The committee notes that EPA would substantially improve its transparency and management of the complexity of the risk assessment of TCDD, other dioxins, and DLCs by creating an ongoing process for clearly identifying and updating the key assumptions that support the quantitative risk assessment. This process would essentially require viewing the risk assessment as an ongoing and iterative effort in which EPA continues to create incentives to obtain and use better information when possible and appropriate.