Instruments for Assessing Risk of Bias and Other Methodological Criteria of Animal Studies: Omission of Well-Established Methods

In response to the systematic review by Krauth et al. (2013) of instruments for assessing animal toxicology studies for risk of bias and other aspects of quality, we propose the need for a broader perspective when appraising—and hopefully improving—such studies. 
 
Krauth et al. (2013) reviewed 30 instruments, 4 of which were designed for environmental toxicology studies used to evaluate human and ecological health hazards. The authors noted that these instruments were derived from preclinical pharmaceutical research in animal models. Many of these instruments focus on efficacy and not toxicity, and—as acknowledged by the authors—they may have limited potential application in environmental health research because they often have criteria that are not relevant to hazard and risk assessments. 
 
Based on these 30 instruments, Krauth et al. concluded that a limited number of risk of bias assessment criteria have been empirically tested for animal research, including randomization, concealment of allocation, blinding, and accounting for all animals. However, the authors did not discuss which elements of risk of bias criteria have been empirically tested, nor did they discuss how they were tested, leaving the reader with no information on their reliability or usefulness. 
 
We would like to bring the readers’ attention to several other important publications in environmental chemical health hazard assessment that are pertinent to this topic (Ågerstrand et al. 2011; Hulzebos et al. 2010; Schneider et al. 2009), along with a U.S. Environmental Protection Agency (EPA) approach developed under the High Production Volume Challenge (U.S. EPA 1999b) as well as relevant and potentially eligible guidance developed by the U.S. EPA (1999a) and the Food and Drug Administration (FDA 2003). In addition, the majority of the procedures specified in Good Laboratory Practices and regulatory in vivo toxicity test guidelines (e.g., U.S. EPA 2013; Organisation for Economic Co-operation and Development 1999) were specifically developed to minimize systematic errors, assure high-quality data, and produce scientifically reliable studies. 
 
These additional publications describe design, conduct, and reporting criteria that form the basis of the methodologies employed globally to assure quality and reliability of in vivo toxicological investigations for regulatory assessment of human and ecological health hazards. Because the application of systematic review and related evidence-based approaches in toxicology is still in its infancy, it is especially important at this time to recognize the contributions of these publications. 
 
The omission of these publications by Krauth et al. could have major science policy implications. The National Toxicology Program (NTP) (whose parent organization, the National Institute of Environmental Health Sciences, funded the research of Krauth et al.) has begun relying on Krauth et al. (2013) to identify elements of risk of bias in evaluating animal studies of environmental agents as part of its systematic reviews for assessing health effects (NTP 2013a, 2013b). The reliance on criteria that have not been transparently empirically tested instead of well-established methodological criteria developed by authoritative national and international organizations could result in biased systematic reviews that ultimately lead to regulations or classifications not supported by the science. 
 
We suggest that further work is warranted in pulling together published perspectives on how to evaluate study quality in animal toxicology studies. Issues in appraising such studies for evaluating environmental hazards to humans and wildlife go well beyond those of human clinical trials, and would benefit from collaboration of experts in animal toxicology with experts in human clinical trials of medical interventions and human epidemiology.


http://dx.doi.org/10.1289/ehp.1307727
The authors had complete control over the design, conduct, interpretation, and reporting of the analyses included in this letter. The contents are solely the responsibility of the authors and do not necessarily reflect the official opinions or policies of the authors' employers or clients.

Our review (Krauth et al. 2013) lists the criteria found in most instruments we identified. In the "Discussion," we described the empirical evidence supporting the use of some of these criteria and cited the relevant references with the empirical data. By empirical evidence, we mean that a criterion (e.g., randomization) has been shown to be associated with overestimation or underestimation of effect (this could be an efficacy or harm outcome).
Beck et al. note several publications in environmental chemical health hazard assessment [Ågerstrand et al. 2011; Food and Drug Administration (FDA) 2003; Hulzebos et al. 2010; Organisation for Economic Co-operation and Development (OECD) 1998; Schneider et al. 2009; U.S. Environmental Protection Agency (EPA) 1999a, 1999b]. All of these publications, except OECD (1998), were identified in our search; however, they did not meet the a priori inclusion criteria for our systematic review. As noted in our "Methods" (Krauth et al. 2013), we included the earliest publication of an instrument when it was used in subsequent reports. The article by Ågerstrand et al. (2011) was based on four earlier published papers (i.e., Durda and Preziosi 2000; Hobbs et al. 2005; Klimisch et al. 1997; Schneider et al. 2009). We cited three of these in our review, but excluded Schneider et al. (2009) because it appeared to be a description of software that could be used to operationalize the Klimisch criteria. After reviewing the criteria described by Schneider et al. (2009) in their supplemental file, we found no unique additional criteria that were not already included in our Table 1 and Supplemental Material, Table S1. The reports from the U.S. EPA (1999a, 1999b) and FDA (2003) were neither indexed in Medline nor found in screening of bibliographies. In addition, U.S. EPA (2013) was published after we ended our study. Because we did not find the OECD document (OECD 1998), we cannot conclude whether or not it should have been included in our study.
The comment by Beck et al. that the National Toxicology Program is relying on criteria that have not been "transparently empirically tested" is not correct. In our paper (Krauth et al. 2013), we recommended the use of empirically tested criteria and we pointed out criteria that have been shown to be a risk of bias.
We caution against gathering judgments on how to assess study quality and propose that evidence should guide such evaluations. We propose an empirically based approach, as opposed to consensus-based opinion of experts, as this would provide a more unbiased evaluation of the data.