Neuroscience and Biobehavioral Reviews

Concordance and Incongruence in Preclinical Anxiety Models: Systematic Review and Meta-analyses

Rodent defense behavior assays have been widely used as preclinical models of anxiety to study potentially therapeutic anxiety-reducing interventions. However, some proposed anxiety-modulating factors (genes, drugs and stressors) have had discordant effects across different studies. To reconcile the effect sizes of purported anxiety factors, we conducted a systematic review and meta-analyses of the literature on ten anxiety-linked interventions, as examined in the elevated plus maze, open field and light-dark box assays. Diazepam, 5-HT1A receptor (Htr1a) gene knockout and overexpression, serotonin transporter (SERT/Htt) gene knockout and overexpression, pain, restraint, social isolation, corticotropin-releasing hormone (Crh) gene knockout and Crhr1 gene knockout were selected for review. Eight interventions had statistically significant effects on rodent anxiety, while Htr1a overexpression and Crh knockout did not. Evidence for publication bias was found in the diazepam, Htt knockout, and social isolation literatures. The Htr1a and Crhr1 results indicate a disconnect between preclinical science and clinical research. Furthermore, the meta-analytic data confirmed that the anxiety effects of genetic SERT manipulation are paradoxical in the context of the clinical use of SERT inhibitors to reduce anxiety.


Introduction
The anxiety disorders are among the costliest classes of mental disorders, in terms of both morbidity and economic burden (Baldwin et al., 2014; DiLuca and Olesen, 2014). Development of anxiety-reducing (anxiolytic) drugs has been a major focus of the pharmaceutical industry and of academic neuropsychiatric research, though no new drug types have been adopted since the introduction of selective serotonin reuptake inhibitors (SSRIs) and other antidepressants for the treatment of anxiety disorders (Griebel and Holmes, 2013; Tone, 2009). Anxiety research relies on similarities between human emotional behavior and the behavior of animals (Darwin, 1998), specifically rats and mice (Prut and Belzung, 2003). While many rodent behavioral paradigms aim to model anxious behavior, three anxiety-related defense behavior (ARDEB) assays, also referred to as 'approach-avoidance conflict tests', have been widely adopted specifically to measure rodent anxiety: the elevated plus maze (EPM), the light-dark box (LD) and the open field (OF), respectively the first, second and fifth most widely used rodent anxiety assays (Griebel and Holmes, 2013). All three assays use an arena that contains a sheltered region (e.g., the closed arms of the EPM) and an exposed region. An animal's avoidance of the exposed portions of the arena is believed to report on anxiety-like brain states. The ARDEB assays are accepted as preclinical models of anxiety disorders on the basis of classic studies that tested their predictive validity with panels of drugs known to modulate anxiety in humans (Crawley and Goodwin, 1980; Pellow et al., 1985; Simon et al., 1994).
Rodent research has been implicated in the largely frustrated efforts to develop new types of anxiolytics (Griebel and Holmes, 2013). The defense behavior literature is contradictory about the size, and even the direction, of the effects of many interventions proposed to be anxiolytic or anxiogenic (together, 'anxiotropic') (Griebel and Holmes, 2013; Prut and Belzung, 2003). This is true even for some anxiety-related factors of major clinical relevance, such as the serotonin transporter (SERT/Htt), the target of the SSRIs. As with the assessment of clinical anxiety interventions (Baldwin et al., 2014), a solid preclinical evidence base is necessary to guide decisions about further research and therapeutic development (Vesterinen et al., 2014). To better understand the widespread discordance in rodent anxiety studies, we conducted a quantitative review of the effects of purported anxiety factors on rodent ARDEB. The primary aim was to examine the relevance of these factors and to estimate the magnitude of their effects on rodent anxiety. A secondary aim was to examine patterns in the ARDEB evidence: gaps in the literature, the extent of standardization and heterogeneity, and publication bias. Synthesizing the data on anxiety-targeted interventions might also help explain why these assays have not led to new therapies. Once confirmed by meta-analysis, effective anxiotropic interventions can serve as benchmarks against which to validate new rodent assays and/or more tractable model species (e.g., Drosophila and zebrafish).

Literature review
We identified genes, drugs and environmental interventions proposed to be involved in anxiety by searching anxiety review articles. From two histories of anxiety research (Griebel and Holmes, 2013; Tone, 2009), ten anxiotropic interventions were chosen for the systematic review, owing to their clinical relevance (e.g., diazepam, Htt), their role as an example of a class of proposed anxiety-related factors (e.g., isolation), or their connection to possible forthcoming therapeutics (e.g., Crh). The systematic review identified published articles reporting experimental outcomes for these interventions in rodents tested in the EPM, OF or LD assays (Fig. 1). The literature for each genetic, pharmacological or environmental intervention was identified by searching PubMed and EMBASE with specific search phrases (Table 1). The SSRIs, which are clinically important (Baldwin et al., 2014), have been tested in a very large number of studies (Griebel and Holmes, 2013) and have controversial efficacy (Kirsch et al., 2008), are the subject of a separate meta-analytic study, currently in preparation.

Eligibility criteria and study selection
The search phrases in Table 1 were used to identify lists of studies. We exported the articles' bibliographic data (including study ID, date of publication, title and abstract) to a spreadsheet. Each article on this list was then reviewed at one or more of four levels of detail (title, abstract, full text and a detailed review of experimental design) to determine its eligibility. Studies were required to be written in English and to report ARDEB in adult rats or mice. We required that each included study contain (1) primary behavior data from an OF, EPM or LD experiment for at least one of the interventions of interest, (2) suitable control data and (3) the relevant statistics (mean, standard error or standard deviation, and sample sizes of both control and intervention groups). Experiments that used combination treatments were excluded. For gene knockout and overexpression interventions, we included only experiments in which the genetic manipulation was lifelong and present throughout the entire animal. All eligible experiments from all eligible studies were included in the ten meta-analyses (Table 1).

Table 1. Summary of systematic reviews of anxiety-related interventions in mouse and rat. The PubMed and Embase query phrases used to identify articles that might contain data relevant to the interventions and assays of interest are detailed. Title, abstract and full-text searches were performed to identify articles meeting the selected criteria.

Data items and extraction
The following data were collected from each of the included studies: authors, year of publication, figure and panel numbers, species, genotype, and mean, standard error of the mean and sample size (N) of each intervention and its related control group. Graphically presented data were extracted from Portable Document Format (PDF) files with the Measuring Tool in Adobe Acrobat Pro. All extracted data were checked by a second researcher. For values extracted from tables, the check consisted of ensuring the values were identical. For values extracted from graphical data (e.g. bar plots), the check consisted of a visual inspection to ensure that the extracted value matched the graphical data. Extraction discrepancies were reconciled by conference between the primary extractor and the researcher who identified the discrepancy.
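A measuring tool reports on-screen distances, not data values, so each reading has to be mapped through the plot's axis. The sketch below shows that conversion under stated assumptions: a linear y-axis, and two labelled reference ticks measured in the same units as the bar. The function name and the pixel values are hypothetical illustrations, not part of Acrobat or of the authors' workflow.

```python
def pixel_to_value(y_px, tick0_px, tick0_val, tick1_px, tick1_val):
    """Map a measured position on a LINEAR axis to a data value (hypothetical helper).

    tick0/tick1 are two labelled axis ticks whose measured positions
    (any consistent unit, e.g. points or pixels) and data values are known.
    """
    if tick1_px == tick0_px:
        raise ValueError("reference ticks must be at different positions")
    # Data units per measured unit; negative when screen y grows downward.
    scale = (tick1_val - tick0_val) / (tick1_px - tick0_px)
    return tick0_val + (y_px - tick0_px) * scale


# Hypothetical bar plot: the 0% tick sits at 400 px and the 50% tick at 150 px.
bar_top = pixel_to_value(250, 400, 0.0, 150, 50.0)   # bar height (% time exposed)
whisker = pixel_to_value(230, 400, 0.0, 150, 50.0)   # top of the error bar
sem = whisker - bar_top                              # error-bar half-length
```

Here the bar reads as 30% and the error bar as 4 percentage points; a second extractor repeating the measurement provides the check described above.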

Summary measures
The following behavioral metrics were extracted from the articles: in OF studies, percent or total time spent in the center; in EPM studies, percent or total time spent on the open arms; in LD studies, percent or total time spent in the bright area. To synthesize these time-based metrics across the three assays, all estimates were standardized to Hedges' g, a variant of Cohen's d that uses the pooled standard deviation and is corrected for small-sample bias (Borenstein et al., 2011; Cumming, 2012). The conventional adjectives trivial, small, moderate and large are used for effect sizes |g| < 0.2, 0.2 ≤ |g| < 0.5, 0.5 ≤ |g| < 0.8 and |g| ≥ 0.8, respectively (Cumming, 2012).
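A minimal sketch of the standardization step, using the textbook formulas for Hedges' g and its variance (Borenstein et al., 2011). It assumes the reported error bars are standard errors of the mean (converted back with SD = SEM·√N) and, as an inference from how effects are reported here (anxiogenic positive, anxiolytic negative), that g is signed so that less time in the exposed region in the treated group gives a positive value. The function name is illustrative; the published analysis used metafor in R.

```python
import math


def hedges_g(mean_c, sem_c, n_c, mean_t, sem_t, n_t):
    """Hedges' g and its sampling variance from group means, SEMs and sizes.

    Assumed sign convention: positive g = treated animals spend LESS time in
    the exposed region than controls, i.e. an anxiogenic effect.
    """
    # Convert standard errors of the mean back to standard deviations.
    sd_c = sem_c * math.sqrt(n_c)
    sd_t = sem_t * math.sqrt(n_t)
    df = n_c + n_t - 2
    # Pooled standard deviation across control and treated groups.
    sd_pooled = math.sqrt(((n_c - 1) * sd_c**2 + (n_t - 1) * sd_t**2) / df)
    d = (mean_c - mean_t) / sd_pooled        # Cohen's d, anxiogenic-positive
    j = 1 - 3 / (4 * df - 1)                 # Hedges' small-sample correction
    var_d = (n_c + n_t) / (n_c * n_t) + d**2 / (2 * (n_c + n_t))
    return j * d, j**2 * var_d
```

For example, `hedges_g(30.0, 2.0, 10, 20.0, 2.0, 10)` (controls 30% open-arm time, treated 20%, SEM 2, N = 10 per group) gives g ≈ 1.51, a large anxiogenic effect under the adjectives above.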

Synthesis of results
Meta-analyses of experimental outcomes, including calculation of weighted mean effect sizes (Hedges' g), 95% confidence intervals, I² heterogeneity values and P values under a random-effects model, were performed with the metafor package in R (http://CRAN.R-project.org/package=metafor) (Viechtbauer, 2010). All error bars in forest plots are 95% confidence intervals; forest plots were generated with custom R scripts.
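To make the random-effects quantities concrete, the sketch below pools per-experiment g values with the DerSimonian-Laird estimator of between-study variance (τ²) and reports I². This is a swapped-in, closed-form estimator chosen for brevity: metafor's `rma()` uses REML by default, so its τ² and summary estimate can differ somewhat. The function name is illustrative.

```python
import math


def random_effects(y, v):
    """DerSimonian-Laird random-effects pooling (illustrative, not metafor's REML).

    y: per-experiment effect sizes (Hedges' g); v: their sampling variances.
    Returns pooled g, its 95% CI, a two-sided P value, tau^2 and I^2 (%).
    """
    k = len(y)
    w = [1 / vi for vi in v]                                   # fixed-effect weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    df = k - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0            # between-study variance
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0        # % variance from heterogeneity
    w_re = [1 / (vi + tau2) for vi in v]                       # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(mu / se) / math.sqrt(2))                 # two-sided normal P value
    return mu, (mu - 1.96 * se, mu + 1.96 * se), p, tau2, i2
```

With two experiments of g = 0 and g = 1 (variance 0.1 each), Q = 5 on 1 degree of freedom, so I² = 80%, which would count as high heterogeneity under the thresholds used in this review.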

Assessment of bias across studies
Publication bias was assessed with funnel plots and Egger's linear regression test of funnel plot asymmetry (Egger et al., 1997). The standard normal deviate (Hedges' g / standard error) of each experiment was regressed against its precision (1/standard error) using the lm function in R (http://www.R-project.org/). For meta-analyses that showed publication bias (P ≤ 0.05), the trim-and-fill method (Duval and Tweedie, 2000) was used to estimate the effect of publication bias on the summary effect size. Funnel plots and trim-and-fill adjustments were produced with the metafor package in R (Viechtbauer, 2010).
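The regression described above can be written out as an ordinary least-squares fit. The sketch below mirrors the R `lm` call (standard normal deviate on precision) and returns the intercept, which Egger's test examines: in an unbiased literature the intercept should be near zero. It stops at the t statistic because the Python standard library has no t distribution; with SciPy available, the two-sided P value would be `2 * scipy.stats.t.sf(abs(t), df)`. The function name is illustrative.

```python
import math


def egger_test(g, se):
    """Egger's regression: standard normal deviate (g/se) on precision (1/se).

    Returns (intercept, slope, intercept standard error, t statistic, df).
    An intercept far from zero indicates funnel-plot asymmetry.
    """
    n = len(g)
    x = [1 / s for s in se]                      # precision
    y = [gi / s for gi, s in zip(g, se)]         # standard normal deviate
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = n - 2
    s2 = sum(r**2 for r in resid) / df           # residual variance
    se_int = math.sqrt(s2 * (1 / n + mx**2 / sxx))
    t = intercept / se_int if se_int > 0 else math.copysign(math.inf, intercept)
    return intercept, slope, se_int, t, df
```

The slope approximates the underlying effect size, while the intercept captures how much small, imprecise experiments report systematically larger effects than large ones.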

Review selection criteria identified 306 eligible articles
The flow chart in Fig. 1 summarizes the study selection process. In total, 1169 articles were identified by the initial search of the PubMed and EMBASE databases. Under the selection criteria described above, 498 studies were excluded on the basis of their titles and a further 150 on the basis of their abstracts. The full texts of the remaining 521 articles were screened for criteria related to experimental paradigm, methods and relevant variables, resulting in the exclusion of a further 215 studies. A total of 306 articles were considered eligible for inclusion in the review.
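The screening counts reported above are internally consistent; as a quick arithmetic check of the selection flow:

```latex
\begin{aligned}
1169 - 498 - 150 &= 521 \quad \text{(full texts screened)}\\
521 - 215 &= 306 \quad \text{(eligible articles)}
\end{aligned}
```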

Characteristics of included experiments
The characteristics of all included studies are given in Supplementary … experiments conducted on male animals, 29 on female animals, 35 on mixed-sex groups and 3 with no sex information reported. ARDEB studies of diazepam used a median dose of 1 mg/kg (range 0.01 to 20 mg/kg), a dose range similar to or higher than that commonly used by patients.

Heterogeneity
Statistically significant heterogeneity was found in eight of the ten meta-analyses. Only two meta-analyses, Htr1a overexpression and physical restraint, had high heterogeneity (I² > 75%; Higgins et al., 2003). Three (acute pain, Htt knockout and diazepam) had moderate heterogeneity (50% < I² < 75%), and the remaining five had low heterogeneity (I² < 50%). As most of these syntheses contained data from more than one assay type, it is encouraging that half had low heterogeneity and only two had high heterogeneity, an outcome compatible with the idea that the three ARDEB assays test similar aspects of rodent anxiety.

Substantial publication bias in four anxiety factors
Censorship of statistically non-significant results and selective publication of statistically significant ('positive') results can cause a literature, and any meta-analysis of it, to overstate effect sizes. This effect, termed publication bias, has a profound influence on the literature on rodent models of stroke and may affect other animal models (Sena et al., 2010). Publication bias in the ARDEB literature was assessed for the six meta-analyses that had at least 20 experiments (Table 2; Sterne et al., 2011). Funnel plots of these data showed pronounced asymmetry (Fig. 2), pointing to publication bias in these literatures (Sterne et al., 2011). Egger's test indicated statistically significant asymmetry in four of these literatures (Table 2). For the biased data sets, we applied trim-and-fill adjustment to estimate the number of hypothetically missing studies and to correct for the bias (Duval and Tweedie, 2000) (Fig. 2). These data support the idea that the diazepam, Htt knockout, social isolation and restraint literatures on ARDEB are strongly affected by publication bias.
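For readers unfamiliar with trim-and-fill, the sketch below shows the core of the Duval and Tweedie procedure with the L0 estimator (the estimator metafor's `trimfill()` also uses by default). It is a simplification under stated assumptions: the suspected missing studies lie on the left (small or negative effect) side of the funnel, the centre is re-estimated with fixed-effect pooling rather than the random-effects model used in this review, and tied ranks are broken arbitrarily instead of averaged. Function names are illustrative.

```python
def _fixed_effect(y, v):
    """Inverse-variance weighted mean (fixed-effect pooling)."""
    w = [1 / vi for vi in v]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def trim_and_fill(y, v, max_iter=100):
    """Simplified trim-and-fill (L0 estimator), assuming missing studies on the LEFT.

    Returns (k0 = number of imputed studies, imputed effects, adjusted estimate).
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])     # indices by ascending effect
    k0 = 0
    for _ in range(max_iter):
        kept = order[: n - k0]                       # trim the k0 largest effects
        theta = _fixed_effect([y[i] for i in kept], [v[i] for i in kept])
        dev = [yi - theta for yi in y]               # deviations of ALL studies
        by_abs = sorted(range(n), key=lambda i: abs(dev[i]))
        # Wilcoxon-type sum of the ranks of |deviation| for right-side studies.
        t_n = sum(rank for rank, i in enumerate(by_abs, start=1) if dev[i] > 0)
        new_k0 = round((4 * t_n - n * (n + 1)) / (2 * n - 1))
        new_k0 = min(max(new_k0, 0), n - 1)          # keep at least one study
        if new_k0 == k0:
            break
        k0 = new_k0
    kept = order[: n - k0]
    theta = _fixed_effect([y[i] for i in kept], [v[i] for i in kept])
    trimmed = order[n - k0:]
    # Fill: mirror the trimmed studies around the trimmed centre.
    filled_y = [2 * theta - y[i] for i in trimmed]
    filled_v = [v[i] for i in trimmed]
    adjusted = _fixed_effect(list(y) + filled_y, list(v) + filled_v)
    return k0, filled_y, adjusted
```

A symmetric funnel yields k0 = 0 and leaves the estimate unchanged; an excess of large effects on the right yields mirrored "missing" studies on the left, which pulls the summary effect toward zero, as in the Htt knockout adjustment.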

Anxiotropic effects of the serotonin transporter
The serotonin transporter (SERT) is the target of the selective serotonin reuptake inhibitors (SSRIs), a class of drugs used to treat depression and anxiety (Baldwin et al., 2014). Meta-analysis of thirteen knockout studies (Carroll et al., 2007; Holmes et al., 2003a,b; Kalueff et al., 2007a,b; Li et al., 2004; Line et al., 2011; Lira et al., 2003; Moya et al., 2011; Olivier et al., 2008; Pang et al., 2011; Schipper et al., 2011; Zhao et al., 2006) revealed a large anxiogenic effect of knocking out the SERT gene, Htt (g = 0.88 [95CI 0.23, 1.26], P = 5.2 × 10⁻¹⁴; Fig. 5A). However, a funnel plot and Egger's regression revealed pronounced bias in the reported effect sizes (Egger's test P = 6.7 × 10⁻⁶; Table 2). Trim-and-fill adjustment added ten imputed data points to the left side of the funnel plot, lowering the effect size to a moderate g = 0.57 [95CI 0.29, 0.86]. Only two articles studying the effect of Htt overexpression on ARDEB were found (Jennings et al., 2006; Line et al., 2011). Meta-analysis revealed a large anxiolytic effect (g = −0.94 [95CI −1.69, −0.20], P = 0.013; Fig. 5B) in EPM and OF assays (no LD articles were found). The transporter knockout and overexpression effects clearly connect Htt function to rodent anxiety. However, the direction of the effects is the opposite of what would be expected from the clinical application of SERT inhibitors, given that SSRI reduction of SERT function is believed to have a therapeutic, anxiety-reducing effect.
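Summary statistics of the form g [95CI lower, upper], P can be cross-checked against each other, which is useful when numbers are transcribed by hand. The sketch below assumes a symmetric, normal-theory (Wald) interval, as produced by a standard random-effects summary, and recovers the implied standard error and two-sided P value. The function name is illustrative.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Standard error and two-sided P value implied by a symmetric 95% Wald CI."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    return se, math.erfc(abs(z) / math.sqrt(2))


# Htt overexpression summary reported above: g = -0.94 [95CI -1.69, -0.20], P = 0.013
se, p = p_from_ci(-0.94, -1.69, -0.20)
```

The recovered P (about 0.013) agrees with the reported value, and the interval's midpoint (−0.945) sits on the estimate. An interval whose midpoint lies far from its point estimate, or whose implied P disagrees with the reported one, points to a value worth rechecking against the source.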

The effect of acute pain on rodent anxiety
Environmental stressors have physiological effects on animals that promote the anxiety-like state (van Praag, 2003). To survey a range of stress modalities, we selected acute pain, bodily restraint and social isolation for review; all three have been found to promote anxiety in humans (Sherif and Oreland, 1995). The systematic review identified seven papers measuring the effect of acute pain on ARDEB (Benbouzid et al., 2008; Leite-Almeida et al., 2012; Liu et al., 2015a,b; Matsuzawa-Yanagida et al., 2008; Parent et al., 2012; Schellinck et al., 2003; Shang et al., 2014). Meta-analysis of the 21 experiments therein indicated a moderate anxiogenic effect (g = 0.56 [95CI 0.19, 0.93], P = 2.9 × 10⁻³; Fig. 6A).

Crh gene knockout has a modest effect on rodent ARDEB
Several neuropeptide-related genes involved in stress signaling have been linked to anxiety, notably the peptide corticotropin-releasing hormone (CRH; also known as corticotropin-releasing factor) (Kormos and Gaszner, 2013) and its receptor, CRHR1. Two studies that examined the effects of Crh knockouts on ARDEB were found (Weninger et al., 1999); meta-analysis revealed only a small, statistically non-significant effect (g = 0.30 [95CI −0.32, 0.92], P = 0.34; Fig. 7A). This supports the idea that CRH has only a modest effect on ARDEB, although the meta-analytic result may lack precision, as the cumulative sample size was small (N = 20, 21). As publication bias appears to affect the literature, this small effect could also be an overestimate.
Crhr1 gene knockout has a large effect on rodent anxiety
CRH exerts its biological action via two receptors, CRHR1 and CRHR2. The two receptors are pharmacologically distinct, and only the former has been widely studied in the context of anxiety (Owens and Nemeroff, 1991; Paez-Pereda et al., 2011). Meta-analysis (Gammie and Stevenson, 2006; Liebsch et al., 1995, 1999; Müller et al., 2003; Smith et al., 1998; Trimble et al., 2007) found that, in contrast to the Crh knockout, deletion of Crhr1 had a large anxiolytic effect on ARDEB (g = −1.0 [95CI −1.30, −0.70], P = 6.64 × 10⁻¹¹; Fig. 7B). The discordance between Crh and Crhr1 knockout effects has previously been attributed to the action of other peptide ligand(s) of Crhr1, either urocortin or another, unidentified ligand.

Species and sex differences in ARDEB
Rats and mice differ in their defensive behavior when exposed to predators or predator cues (Blanchard et al., 2001). We used the pooled data to examine species differences in baseline ARDEB, i.e., in naive animals before any anxiotropic manipulation. The LD box showed the largest difference between species (mean difference in percent time in the exposed region −6.88% [95CI −13.3, −0.46], P = 0.04), but in general, inter-species differences in naive ARDEB were minor compared with the overall variance (Fig. 8A).
Where rat data were available, inter-species differences between the effects of anxiety-related interventions were investigated. There were only minor inter-species differences in the meta-analytic effect sizes of the interventions (Fig. 8B). A striking exception to this trend was a large rat-mouse difference in the response to restraint: this treatment appeared strongly anxiogenic for rats, but modestly anxiolytic for mice.
We investigated sex differences in ARDEB, but the anxiety studies contained about 18 times fewer experiments on female rodents than on males, rendering any meta-analytic estimates of female ARDEB imprecise.

Validity of the ARDEB assays
Of the interventions analyzed above, only diazepam has extensive clinical evidence supporting its ability to alter human anxiety. The large anxiolytic effect of diazepam observed in the ARDEB assays supports their validity (Fig. 9A). The stressors (isolation, acute pain and restraint) would all be expected to increase human anxiety, and all show anxiogenic effects in the ARDEB assays, further supporting their validity (Fig. 9A), although the social isolation effect was surprisingly small (0.21 g). Establishing the validity of animal models relies partly on showing concordance between models (Campbell and Fiske, 1959; van der Staay, 2006). To explore the concordance between the three assays, we conducted regression analyses for all possible pairwise comparisons of ARDEB effects (Fig. 9B-D). The LD-EPM and OF-LD comparisons showed 59% and 60% concordance, respectively, supporting the idea that these assays measure similar aspects of rodent anxiety. Surprisingly, however, the OF-EPM comparison revealed that these two assays were discordant (R²adj = −0.01, Fig. 9D).
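The pairwise comparisons can be illustrated with an adjusted R² from a one-predictor least-squares fit. This sketch assumes that each point is one intervention's effect size (Hedges' g) in two assays, and that the reported "concordance" is the adjusted R², as the OF-EPM value (R²adj = −0.01) suggests; neither assumption is confirmed by the text. The function name is illustrative.

```python
def adjusted_r2(x, y):
    """Adjusted R^2 of an ordinary least-squares fit of y on a single predictor x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r2 = sxy**2 / (sxx * syy)               # R^2 of simple linear regression
    return 1 - (1 - r2) * (n - 1) / (n - 2)  # penalize for the fitted slope
```

Because the adjustment penalizes the fitted slope, uncorrelated data give a negative value: with four uncorrelated points (R² = 0) the result is −0.5. This is how a slightly negative figure such as the OF-EPM R²adj can arise.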

Summary of evidence
Inspection of the forest plots reveals that every primary publication set includes experimental effect sizes that are discordant in direction (anxiolytic versus anxiogenic), in magnitude, or both. This general discordance in the literature emphasizes the utility of meta-analysis in behavioral neuroscience for giving a quantitative overview and synthesizing the best available evidence. Of the ten analyses of putative anxiotropic interventions, eight yielded at least moderate meta-analytic effect sizes and two produced small effect sizes (Fig. 9). The synthesized data strongly confirm that diazepam, the serotonergic system, environmental stressors and Crhr1 influence an anxiety-like process in the rodent brain.

Limitations
This study is limited by its exclusive use of English-language published data. Some studies had to be excluded from the meta-analysis during the full-text screen because they did not report measures of variance. Only studies that reported time or percent time spent in the exposed region of the arena could be selected for meta-analysis. We found no knowledge gaps per se, as all ten proposed anxiety-related factors had at least two studies. Nevertheless, Htt overexpression, Crh knockouts and the non-anxiety genes had limited cumulative sample sizes (N cumulative < 64, 64). Of the six factors for which publication bias was examined, three were affected. The presence of publication bias in the larger data sets suggests that adding further data to the smaller meta-analyses would, on average, be expected to lower their effect sizes as well. Heterogeneity was at least moderate (I² > 50%) in five of the meta-analyses, indicating substantial between-study variance that the random-effects model accommodates but does not explain. Laboratory, strain, assay type and other protocol variations may have played different roles across factors. Heterogeneity could in principle be reduced by increased standardization (Crabbe et al., 1999). Multilevel regression models of these data may be able to account for the unexplained variance (Yildizoglu et al., 2015).

Assay validity
The validity of each ARDEB assay was originally tested with a panel of anxiotropic agents (Crawley and Goodwin, 1980; Pellow et al., 1985; Simon et al., 1994). In the decades since, the variability of assay results and the disappointing clinical outcomes of compounds identified with these preclinical assays have raised new questions about their validity (Griebel and Holmes, 2013). The synthesized diazepam, restraint and acute pain data shown here support the validity of the ARDEB assays, though two other results raise doubts: (1) the social isolation effect on ARDEB is weaker than expected (Fig. 9A); and (2) EPM and OF fail to reproduce each other's outcomes (Fig. 9D). The EPM-OF meta-regression discordance is an exploratory observation that could be verified with a formal method comparison in which animals are run through all three assays (Bland and Altman, 1999).
How might the validity of the ARDEB assays be tested further? First, meta-analyses of additional known anxiotropic agents will help assess the assays' strengths and weaknesses. Second, assay validity assessment would be helped by researchers making their video or tracking data available (ideally with experimental metadata in a standard file format), similar to data-sharing efforts currently underway in neurophysiology. Anxiety assay validity may also be tested with new instrumentation that allows the estimation of animal pose (Nanjappa et al., 2015; Wiltschko et al., 2015), which will make complex, ethologically relevant anxiety assays (Blanchard et al., 2001) increasingly accessible for routine analysis (Schaefer and Claridge-Chang, 2012). Looking backward (meta-analysis and data sharing) and forward (more refined anxiety assays) are both valuable to rodent anxiety research. It has been suggested that research consortia should form to overcome the cost of large rodent sample sizes (Button et al., 2013); perhaps a consortium could form around the problem of anxiety assay validation.

Disconnect between Htr1a & Crhr1 preclinical results and clinical efforts
Meta-analysis of Htr1a overexpression revealed a moderate anxiotropic effect (−0.6 g), smaller than the bias-corrected diazepam effect (−0.85 g), suggesting that increasing 5-HT1A function may be a poor strategy for reducing anxiety. This view is supported by clinical meta-analyses concluding that drugs targeting 5-HT1A, the azapirones, appear inferior to benzodiazepines for generalized anxiety disorder (Chessick et al., 2006) and that there is insufficient evidence to support azapirone use in panic disorder (Imai et al., 2014). It appears that clinical adoption of the azapirones has not been informed by the preclinical genetic evidence base. A second preclinical-clinical disconnect is observed with the Crhr1 knockouts. The synthesized preclinical data indicate that Crhr1 knockout produces a very large reduction of rodent anxiety (g = −1.0 [95CI −1.3, −0.7], I² = 13%, N cumulative = 105, 99). However, at least one clinical trial of a CRHR1 antagonist for generalized anxiety disorder showed no benefit over placebo (Coric et al., 2010). The discrepancy between the efficacy of Crhr1 knockouts and the inefficacy of CRHR1 antagonists in patients remains unexplained.

A paradox in Htt-SSRI anxiety effects
Drugs that inhibit SERT, the SSRIs, are recommended as the first line of pharmacological treatment for anxiety (Baldwin et al., 2014). Blocking SERT-mediated reuptake of serotonin from the synaptic cleft is the proposed mechanism of SSRI anxiety reduction, although rodent studies of chronic SSRI effects on ARDEB have been inconclusive (Griebel and Holmes, 2013; Perez-Caballero et al., 2014). Given the inhibitors' clinical use, it is paradoxical that Htt knockouts have elevated anxiety relative to controls (0.57 g) and that Htt overexpression dramatically reduces rodent anxiety (−0.94 g). … discussed it (Carroll et al., 2007; Kalueff et al., 2007a; Moya et al., 2011; Schipper et al., 2011). Other authors have remarked that the underlying reason remains unclear (Holmes et al., 2003b; Lira et al., 2003) or have called the validity of ARDEB assays into doubt (Pang et al., 2011).

Fig. 8. Species differences in ARDEB. (A) Contrasts of the naive defense behaviors of mice and rats in three different assays. The upper panel shows the mean proportion of time spent in the exposed region, categorized by assay type and species; each point is the mean value of an experiment. The lower panel shows the contrast means and confidence intervals (mean difference in the percent time spent in the exposed region). (B) The weighted mean effect sizes of six interventions, subgrouped into mice and rats. Color indicates species (green = mice, orange = rats). Each mean effect size is represented by the central vertices of a diamond; the outer vertices indicate the 95% confidence interval. The horizontal axis is Hedges' g, the standard deviation change relative to control animals. NC and NT indicate control and treatment sample sizes, respectively.
As both genetic knockouts and SSRIs are expected to produce monotonic, systemic reductions of SERT function, this incongruence is not easily explained by models of conflicting serotonin action that invoke distinct 5-HT circuits with opposing effects on defense (Deakin and Graeff, 1991). Two other explanatory hypotheses have been proposed. The first is that the increased anxiety arises from developmental alterations that are present in Htt knockouts but absent in chronically drug-treated animals (Holmes et al., 2003b; Olivier et al., 2008; Zhao et al., 2006). This hypothesis could be tested with conditional knockout models, i.e., animals in which Htt is deleted only in adulthood. While our systematic review of PubMed and EMBASE did not identify any published reports of post-developmental Htt knockout experiments (e.g., using floxed Htt), researchers have analyzed the anxiety-related effects of conditionally ablating the Pet-1 gene, which encodes a transcription factor whose expression closely overlaps that of Htt.
In mice with Pet-1 removed in adulthood, Htt mRNA levels are substantially reduced. Like Htt knockouts, these mice show increased anxiety-like behaviors in multiple ARDEB assays, eroding confidence in the developmental alteration hypothesis. A second hypothesis to explain the Htt/SSRI paradox is that there is a J-shaped relationship between Htt function and anxiety, i.e., both wild-type and knockout animals would have higher anxiety than animals with intermediate function (Olivier et al., 2008). The SSRI/knockout paradox is also observed in depression assays, though interfering RNA knockdown of Htt in adult mice reduced the forced swim test measure of depression (N = 10, 10) (Thakker et al., 2005).

Fig. 9. … Color indicates direction (green = anxiolytic, red = anxiogenic) and statistical significance (grey = statistically non-significant). The diamonds for the diazepam, social isolation and Htt KO meta-analyses represent the summary effect sizes after trim-and-fill bias correction. Each mean effect size is represented by the central vertices of a diamond; the outer vertices indicate the 95% confidence intervals. The horizontal axis is Hedges' g, the standard deviation change relative to control animals. (B) Method comparison of LD and EPM shows that the two methods report ARDEB changes with 59% concordance. (C) Method comparison of OF and LD shows that the two methods report ARDEB changes with 60% concordance. (D) Method comparison of OF and EPM shows no concordance between the two methods.

Conclusions
This study confirms that diazepam, two environmental stressors and three genes influence rodent anxiety as measured by defense behavior assays. These anxiety-related interventions (diazepam, Htr1a gene knockout, Htt gene knockout, Htt gene overexpression, acute pain, restraint and Crhr1 gene knockout) can be used as reference manipulations when establishing other anxiety models. The rodent anxiety literature is affected by publication bias that inflates effect sizes. For the panel of ten interventions, there is strong EPM-LD and OF-LD assay concordance, but EPM and OF did not reproduce each other. The meta-analytic results bring several preclinical-clinical incongruencies into sharp relief: the weakness of Htr1a overexpression contrasting with the clinical use of azapirones, the potently anxiolytic Crhr1 knockout contrasting with the clinical failure of CRHR1 antagonists, and the anxiogenic SERT knockout contrasting with the clinical use of SSRIs as anxiolytic drugs. Meta-analysis can aggregate information and resolve discordance in the primary literature, which is particularly useful in behavioral neuroscience, where most primary articles describe experiments with poor precision (Button et al., 2013). Precise estimation of effect magnitudes (Claridge-Chang and Assam, 2016) is important both to better understand animal model strengths and weaknesses and to improve the ability of preclinical studies to guide clinical investigation. The formation of multi-lab consortia to coordinate the examination of important hypothesized anxiety factors would be one promising way to increase the reliability of rodent anxiety data (Button et al., 2013). New, automated methods of behavioral imaging will also play a role in better preclinical models (Schaefer and Claridge-Chang, 2012).
Another possibility would be to use small animal models (worms, flies, and zebrafish) that allow large sample sizes and powerful genetic tools to complement rodent experiments (Mohammad et al., 2016).

Funding
The authors were supported by Biomedical Research Council block grants to the Neuroscience Research Partnership and the Institute of Molecular and Cell Biology. FM and ACC also received support from Duke-NUS Graduate Medical School. JH received support from the A*STAR Graduate Academy. ACC received additional support from a Nuffield Department of Medicine Fellowship, a Wellcome Trust block grant to the University of Oxford, A*STAR Joint Council Office grant 1431AFG120 and NARSAD Young Investigator Award 17741. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.