First Genome-wide Association Study on Anxiety-Related Behaviours in Childhood

Background: Twin studies have shown that anxiety in a general population sample of children involves both domain-general and trait-specific genetic effects. To identify genes responsible for these effects, we investigated domain-general and trait-specific genetic associations in the first genome-wide association (GWA) study of anxiety-related behaviours (ARBs) in childhood.


Introduction
Anxiety disorders are among the most common psychiatric disorders [1]. They often begin in childhood [2] and continue into adulthood [3], when they often become comorbid with other psychiatric disorders, especially depression [4], and they entail significant costs both to society and to the individual [5]. Quantitative anxiety-related traits, assessed as clinical symptoms (e.g. [6]) or as personality/temperament traits [7,8], are strong predictors of diagnosed anxiety disorders [7].
Twin studies have shown that childhood anxiety in representative samples, like other complex traits, is influenced genetically, e.g. [9]. Multivariate genetic studies indicate genetic overlap as well as specificity between different aspects of anxiety and from age to age as early as the preschool years [10] and into middle childhood [11] and adolescence [12,13]. At age 7, the age of the twins in the present study, parent ratings of anxiety-related traits have been shown to be moderately heritable with both domain-general and trait-specific genetic effects [11]. Similar results were found at age 9 and for continuity from age 7 to age 9 [14]. Although these quantitative genetic findings are important, the next step is to identify specific genes responsible for these effects.
Until recently, molecular genetic investigation into the aetiology of anxiety relied on linkage and candidate-gene designs. Linkage, which looks for co-inheritance between DNA variants and a disorder within families, is a systematic strategy for detecting genes of large effect size throughout the genome. However, linkage found few such large effects for common disorders like anxiety and lacks power to detect more modest effects [15].
In contrast, allelic association, which looks for correlations between an allele and a trait among unrelated individuals, is much more powerful than linkage, but until recently, association has been limited to the exploration of a few candidate genes and could not be used to conduct a systematic search of the genome. Candidate-gene association studies of anxiety-related traits reported many associations but few of these associations have stood the test of replication, similar to candidate-gene studies in other domains in the life sciences [16].
Association studies became systematic with the advent of genome-wide DNA arrays that genotype hundreds of thousands of DNA variants throughout the genome, resulting in a plethora of genome-wide association (GWA) studies [17]. Although the first major GWA studies were reported only in 2007 [18], significant results have since been reported for more than 200 traits in 1500 GWA studies [19]. The only previous GWA studies of anxiety-related traits have focused on the personality trait of neuroticism in adults and reported possible associations with several genes [20,21,22]. However, no GWA studies of anxiety-related traits in children have previously been reported.
The current study presents the first GWA study of anxiety-related traits in children. The multivariate genetic results mentioned earlier led us to consider trait-specific as well as domain-general measures. Despite the success of GWA, reported associations are of small effect size and together account for only a modest proportion of the heritability of traits, which is known as the 'missing heritability' problem [23,24]. One of many possible reasons for missing heritability is that the common SNPs included in extant DNA arrays miss potential associations. A new technique described by Yang et al. [25], implemented in a software package called Genome-wide Complex Trait Analysis (GCTA), makes it possible to test this hypothesis: it estimates the total genetic variance captured by the SNPs on a genome-wide DNA array, even though it does not identify which SNPs are responsible for the genetic influence [26]. We therefore also report GCTA results for anxiety-related traits in childhood and compare them with our twin-study estimates of heritability from the same sample at the same age, using the same measures.

Ethics Statement
Written parental consent was obtained prior to data collection and the project received approval from the Institute of Psychiatry ethics committee (05/Q0706/228).

Sample
The sample was drawn from the Twins Early Development Study (TEDS), a multivariate longitudinal study that recruited over 11,000 twin pairs born in England and Wales in 1994, 1995 and 1996 [27], whose families are representative of the UK population [28]. Twins with severe medical problems or severe birth complications, or whose zygosity could not be determined, were excluded from the sample. To reduce ancestral heterogeneity, the sample was restricted to families who identified themselves as white and whose first language was English. After exclusions, 7834 twin pairs had anxiety data available at age 7 (mean age = 7.06, SD = 0.25). Although anxiety data were also available at age 9, we did not use these data in our GWA analyses because only half the sample was contacted at age 9 to provide phenotypic data. In total, 3747 DNA samples from unrelated children in TEDS were sent for DNA array genotyping at the Wellcome Trust Sanger Institute, Hinxton, UK, as part of the Wellcome Trust Case Control Consortium 2.
Of these, 3665 samples were successfully hybridized to Affymetrix GeneChip 6.0 SNP genotyping arrays using standard experimental protocols (see Text S1), and 3152 samples (1446 males and 1706 females) passed stringent quality control procedures (see Text S1), of whom 2810 also had anxiety data.
The replication sample was also drawn from TEDS children for whom DNA and anxiety data were available but for whom genome-wide genotyping was not available. After quality control, both anxiety data and SNP genotyping were available for 4804 additional individuals. Of these, 2625 were unrelated children who were also unrelated to children in the discovery sample; for 1742 children, their fraternal co-twin was in the discovery sample, and for 437 children their fraternal co-twin was also in the replication sample.

Anxiety-Related Behaviours Questionnaire (ARBQ)
Anxiety was rated by parents using the Anxiety-Related Behaviours Questionnaire (ARBQ) [10]. The ARBQ is a quantitative parent-rating instrument for children in the general population rather than a diagnostic tool. It includes items that assess anxiety symptoms as well as aspects of anxiety-related personality. In childhood, the items are best structured as four latent variables: negative affect, negative cognition, fear, and social anxiety [11]. To investigate domain-general genetic associations, we also constructed a general anxiety composite by summing the standardised scores for these four variables. The composite was intended to provide a phenotypic measure less affected by scale-specific error, and combining standardised scores ensured that no single scale dominated the composite. The ARBQ has been shown to have good construct validity and high internal consistency [10]. To reduce the skew typical of behaviour problem measures, the five anxiety scores (the four scales and the composite) were quantile normalised (van der Waerden; ranks averaged for tied data) [29]. The transformed scores had better distributional properties, but they correlated .80 to .98 with the raw scores, and results were highly similar for raw and transformed scores.
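The two score transformations described above (a composite of standardised scales and van der Waerden quantile normalisation with tie-averaged ranks) can be sketched with the Python standard library. This is an illustration, not the study's code: it assumes standardisation with the population standard deviation and uses the usual van der Waerden scores Φ⁻¹(r/(n + 1)), where r is the tie-averaged rank and n the sample size; all function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def standardise(scores):
    """Return z-scores (mean 0, population SD 1) for a list of raw scores."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def composite(*scales):
    """Sum standardised scores across scales so each scale is weighted equally."""
    z_scales = [standardise(s) for s in scales]
    return [sum(values) for values in zip(*z_scales)]


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def van_der_waerden(values):
    """Map each score to the standard normal quantile of rank / (n + 1)."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf(r / (n + 1)) for r in average_ranks(values)]
```

For example, `average_ranks([3, 1, 1, 7])` gives `[3.0, 1.5, 1.5, 4.0]`, so the two tied scores receive the same normalised value. Because the transform depends only on ranks, it preserves the ordering of children, which is consistent with the high correlations between raw and transformed scores reported above.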

Genotyping
Genome-wide genotyping was performed on the Affymetrix GeneChip 6.0 SNP genotyping array, with an additional ~2.5 million SNPs imputed from HapMap 2 and 3 and WTCCC controls. Details about genotyping and quality control are included in Text S1. Thirteen SNPs representing the top hits for the five anxiety-related scales in the discovery sample were genotyped in the replication sample of 4804 individuals using the Sequenom MassARRAY iPLEX Gold system (Sequenom, San Diego, USA). Three SNPs failed to meet quality control criteria, leaving 10 SNPs available for the replication stage.

Statistical Analyses
Genome-wide association (GWA) analysis. Linear regression analyses were conducted using SNPTEST v2.0 [18] under an additive model, using a frequentist method that accounts for uncertainty in genotype information [30]. We included age, sex, cohort and eight eigenvectors representing population ancestry as covariates. Consolidation and summary of the GWA results were performed in R (www.r-project.org) [31].
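A minimal sketch of a per-SNP additive test with covariates follows. It regresses the phenotype on the expected allele dosage, P(AB) + 2·P(BB), plus covariates (in the study, age, sex, cohort and eight ancestry eigenvectors) by ordinary least squares. Using expected dosage is a common approximation for imputed genotypes and is an assumption here, not a description of how SNPTEST handles genotype uncertainty; the function names are hypothetical.

```python
import numpy as np


def expected_dosage(probs):
    """Expected count of the coded allele from genotype probabilities.

    probs: array of shape (n, 3) holding P(AA), P(AB), P(BB) per individual.
    """
    probs = np.asarray(probs, dtype=float)
    return probs[:, 1] + 2.0 * probs[:, 2]


def additive_snp_test(y, dosage, covariates):
    """OLS of phenotype on allele dosage plus covariates (additive model).

    covariates: array of shape (n, k). Returns (beta, standard_error)
    for the dosage term.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    # design matrix: intercept, dosage, then covariate columns
    X = np.column_stack([np.ones(n), dosage, covariates])
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - rank)  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # sampling covariance of coefficients
    return coef[1], float(np.sqrt(cov[1, 1]))
```

Under the additive coding, each extra copy of the coded allele shifts the expected phenotype by the same amount, `beta`; with a standardised phenotype and dosage, `beta` squared approximates the proportion of variance explained by the SNP.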
The strongest association results from the GWA were selected for genotyping in the replication sample, choosing SNPs that were not in linkage disequilibrium (LD) with each other. Where imputed SNPs were in LD with genotyped SNPs, the genotyped SNPs were preferred; however, one especially promising imputed SNP (rs1113313) was also selected.
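The selection rule above (strongest signals first, no two selected SNPs in LD) resembles greedy LD clumping, sketched below. The sketch assumes a caller-supplied function returning r² between two SNPs and a hypothetical r² threshold of 0.2; the paper does not state the LD threshold it used, and the preference for genotyped over imputed SNPs is not modelled.

```python
def select_independent_snps(results, r2, r2_max=0.2):
    """Greedy selection of top SNPs that are not in LD with each other.

    results: list of (snp_id, p_value) pairs
    r2: function (snp_a, snp_b) -> squared genotype correlation (assumed given)
    r2_max: hypothetical LD threshold; SNPs at or above it are skipped
    """
    chosen = []
    # visit SNPs from strongest to weakest association
    for snp, _p in sorted(results, key=lambda item: item[1]):
        if all(r2(snp, other) < r2_max for other in chosen):
            chosen.append(snp)
    return chosen
```

Processing SNPs in order of p value guarantees that, within any LD block, the SNP kept is the one with the strongest association.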
Sequenom genotyping results for the replication sample were analysed using the same protocols and software as the GWA analysis. We conducted analyses using the total replication sample as well as the subsample of individuals genetically unrelated to each other and to individuals in the discovery sample. Using the total sample, which includes fraternal co-twins of individuals in the discovery and replication samples, is somewhat unorthodox, but power is crucial for replication, and the total sample provides maximum power because it maximises sample size. If replication is found in the total sample, it may be biased because the sample is not completely independent of the discovery sample, and further replication would be required for definitive proof. However, if the results from the discovery sample do not replicate in the total sample, this is the strongest possible evidence of failure to replicate, because our replication sample is highly similar to the discovery sample and was tested at exactly the same age using exactly the same measures.
Genome-wide Complex Trait Analysis (GCTA). GCTA does not attempt to identify specific variants associated with traits. Instead, it uses chance genetic similarity among unrelated individuals across hundreds of thousands of SNPs to predict phenotypic similarity. We used the GCTA software package [25] to evaluate the proportion of phenotypic variance explained by the genetic information available from the Affymetrix 6.0 DNA array. A detailed explanation of the methodology and procedure is available in Yang et al. [26]. To remain consistent with the procedure outlined by the developers of the software, we initially used all ~700,000 genotyped SNPs to calculate a genetic relatedness matrix (GRM). However, GCTA results reported previously for height, weight and intelligence used the Illumina microarray, which was designed with a specific focus on European ancestry, whereas the Affymetrix microarray is less ancestry-specific. We found that adding high-quality imputed SNPs (see 'Genotyping' section), which increased the number of SNPs to ~1.7 million, brought our GCTA estimates for height, weight and intelligence in line with previously published estimates. We therefore used the 1.7 million SNPs to estimate how much of the heritability estimated by the classical twin method could be accounted for by the available genetic information.
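The idea of a genetic relatedness matrix can be illustrated with a simplified calculation: each SNP is centred on 2p and scaled by √(2p(1 − p)), where p is the sample allele frequency, and the relatedness of two individuals is the average product of their standardised genotypes across SNPs. This sketch assumes hard-called 0/1/2 genotypes and applies the same formula to every entry; GCTA itself computes the diagonal with a slightly different estimator, so this is not its implementation.

```python
import numpy as np


def genetic_relationship_matrix(genotypes):
    """Simplified GRM from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0  # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)  # drop monomorphic SNPs (zero variance)
    X, p = X[:, keep], p[keep]
    Z = (X - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # standardised genotypes
    return Z @ Z.T / Z.shape[1]  # average cross-product over SNPs
```

Off-diagonal entries near zero indicate unrelated pairs, and diagonal entries near one reflect each individual's similarity with themselves; the GREML step then relates this matrix to phenotypic covariance.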

Results
Genome-wide Association (GWA)
Figure 2 presents 'Manhattan' plots for the same traits, showing −log₁₀ p values on the Y axis for the ~1.7 million SNPs across the 22 autosomes on the X axis, so that the highest points in the plot represent the strongest SNP associations. The dotted horizontal line marks suggestive significance (5 × 10⁻⁷), not genome-wide significance (5 × 10⁻⁸). Regions with the strongest associations were chosen for replication; for example, regions of chromosomes 6 and 12 reached suggestive significance (5 × 10⁻⁷) for the anxiety composite and the negative cognition scale, respectively. The association of the SNP on chromosome 6 with negative affect (Figure 2) was not proposed for replication because of the SNP's low minor allele frequency (MAF = 0.03). Table 1 shows results in the discovery sample for the 10 SNPs that were also successfully genotyped in the replication sample. Two of the lowest p values in the discovery sample were for SNP rs16879771, associated with the anxiety composite (p = 6.27 × 10⁻⁷), and rs1952500, associated with negative cognition (p = 4.12 × 10⁻⁷). The p values of the remaining SNPs ranged from 10⁻⁴ to 8 × 10⁻⁷. The variance explained in the discovery sample, as indicated by the squared beta values, ranged from 0.09% to 1.0%. Visual inspection of the genotype-specific means suggested that none of the selected SNPs deviated from additivity. Table 1 also includes results for the 10 SNPs in the replication sample. None of the SNPs reached significance, and the direction of the associations in the replication sample was close to chance (6 in the same direction as in the GWA analysis and 4 in the opposite direction).
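Two of the quantities above can be checked directly: the height of a point in a Manhattan plot is −log₁₀ p, and the 6-of-10 direction agreement can be compared with chance using a one-sided binomial sign test. The sign test is our illustration of "close to chance", not an analysis reported in the study, and the function names are hypothetical.

```python
from math import comb, log10


def neg_log10(p):
    """Height of a SNP in a Manhattan plot: -log10 of its p value."""
    return -log10(p)


def sign_test_p(same_direction, total):
    """One-sided binomial sign test.

    Probability that at least `same_direction` of `total` SNPs match the
    discovery direction when each matches independently with probability 0.5.
    """
    hits = sum(comb(total, k) for k in range(same_direction, total + 1))
    return hits / 2 ** total
```

The suggestive and genome-wide thresholds correspond to heights of about 6.3 and 7.3, and `sign_test_p(6, 10)` is 386/1024, about 0.38, so six concordant directions out of ten is entirely compatible with chance.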
These replication analyses were based on our total replication sample of 4804, which gave the greatest power; similarly negative results were found for our subsample of 2625 individuals, which constituted a more independent but less powerful replication sample.

Genome-wide Complex Trait Analysis (GCTA)
As described earlier, we used ~1.7 million SNPs to estimate the GCTA genetic relatedness matrix for our sample of 2810 individuals. Our sample included no known pairs related in the traditional sense, which was confirmed by finding that no pairs reached the standard GCTA relatedness cut-off of 0.025. Table 2 summarises the GCTA estimates obtained for the five anxiety-related traits and compares them with twin-study heritability estimates from the same sample at the same age using the same measures. Table 2 also includes GCTA estimates for height and weight in our sample, for comparison with previously reported results for these traits. As indicated in Table 2, our twin-study heritability estimates are 0.80 and 0.84 for height and weight, respectively, and our GCTA estimates are 0.35 and 0.42, all of which are comparable to results reported in the literature [32]. Also consistent with the literature reviewed in the Introduction, our twin-study heritabilities for anxiety-related traits are substantial, ranging from 0.50 to 0.61. However, the GCTA estimates for anxiety-related traits were much lower, ranging from only 0.01 to 0.19. None of the GCTA estimates reached statistical significance (p < .05) because of the large standard errors of the estimates.
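The relatedness check can be sketched as a scan of the upper triangle of the GRM for entries at or above the cut-off. This assumes the GRM is available as a square NumPy array; GCTA applies its own cut-off internally, so this stand-alone function (a hypothetical name) is only illustrative.

```python
import numpy as np


def pairs_at_or_above(grm, cutoff=0.025):
    """List (i, j) pairs, i < j, whose GRM relatedness is >= cutoff."""
    A = np.asarray(grm, dtype=float)
    rows, cols = np.triu_indices_from(A, k=1)  # upper triangle, diagonal excluded
    hit = A[rows, cols] >= cutoff
    return list(zip(rows[hit].tolist(), cols[hit].tolist()))
```

An empty result corresponds to the finding above that no pair reached the threshold. For the anchor traits, the ratios of GCTA to twin estimates reported above are 0.35/0.80 ≈ 0.44 for height and 0.42/0.84 = 0.50 for weight.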

Discussion
This first genome-wide association study of anxiety-related traits in childhood indicates that no common genetic variants of large effect contribute to the heritability of these traits. Our sample of 2810 had 80% power to detect causal variants accounting for more than 1.4% of the variance, and none were detected. Power was calculated with the Genetic Power Calculator [33] using an additive model with a genome-wide significance threshold of p < 5 × 10⁻⁸. As seen in Table 1, the largest effect size from the GWA analysis accounted for only 1% of the variance. Our power calculations indicated that we had less than 80% power to detect a signal of this magnitude, so this result should be treated with caution until replicated. That said, these results are similar to those found for other quantitative traits, for which the strongest associations account for about 1% of the variance, such as height [34], weight (GIANT Consortium) [35], and cognitive traits including reading [36], mathematics [37], and general cognitive ability [38,39]. Our GWA results for anxiety-related traits in childhood are compatible with a growing consensus from GWA studies of complex traits that the largest effect sizes are very small and that all known associations explain only a small portion of the heritability of complex traits and common disorders, a gap known as the missing heritability problem [23]. The missing heritability problem can be seen by comparing Tables 1 and 2: our twin-study estimates of heritability for the five anxiety-related scales are 50% or higher (Table 2), whereas the summed effect sizes of the 10 SNPs shown in Table 1 are less than 5% in the discovery sample and negligible in the replication sample.
PLOS ONE | www.plosone.org
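The power statement can be approximately reproduced with the standard non-centrality calculation for a quantitative trait. This is a sketch, not the Genetic Power Calculator's code: it assumes a non-centrality parameter NCP = N·q²/(1 − q²) for a variant explaining a proportion q² of the variance, and approximates the 1-df test with a normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def qtl_power(n, q2, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    n: number of unrelated individuals
    q2: proportion of phenotypic variance explained by the variant
    alpha: two-sided significance threshold
    """
    ncp = n * q2 / (1.0 - q2)  # non-centrality of the 1-df chi-square statistic
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Normal approximation; the chance of crossing the opposite tail is ignored.
    return NormalDist().cdf(sqrt(ncp) - z_crit)
```

With n = 2810, `qtl_power(2810, 0.014)` is roughly 0.8 and `qtl_power(2810, 0.01)` is roughly 0.45, in line with the 1.4% detection limit and the caution about the 1% effect stated above.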
Dozens of papers have been published about possible solutions to the missing heritability problem [40]. One possibility is that heritability is overestimated in twin studies; another is that the common SNPs on commercially available DNA arrays miss associations, either because effect sizes are very small or because heritability is due to rare polymorphisms of larger effect [41]. GCTA analysis addresses some of these issues in part. GCTA estimates overall genetic influence directly from pairwise overall SNP similarity in a large population of unrelated individuals; in this sense it is independent of the effect sizes of individual polymorphisms, although it is limited to detecting the additive effects of the DNA array's common SNPs and the variants they tag. The large standard errors of our GCTA estimates (Table 2), based on a sample of 2810, indicate the daunting power demands of detecting a tiny genetic signal against the noise of 1.7 million SNPs: for most pairs of individuals in the population, overall SNP similarity across more than a million SNPs differs by less than 1% [42]. Nonetheless, for height and weight, our GCTA estimates are similar to those reported in the literature, which account for about half the heritability of these 'anchor' variables [32]. In contrast, across the five anxiety-related traits, the average GCTA estimate of 10% (Table 2) is less than one-fifth of the average twin-study heritability estimate of 55%. The total composite showed the highest, albeit non-significant, GCTA estimate, but even this estimate was only about 30% of its twin-study heritability estimate, below the roughly 50% seen for the anchor variables. Importantly, the standard errors show that if the SNPs had accounted for 50% of the twin-study heritability, as found for the 'anchor' variables height and weight, the GCTA results would have been significant in our study.
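A back-of-the-envelope check of the final point uses an approximation derived by Visscher and colleagues (2014) for GREML in unrelated samples: SE(ĥ²) ≈ √(2/var(A_ij)) / N, where var(A_ij) is the variance of the off-diagonal GRM entries. The default var(A_ij) = 2 × 10⁻⁵ below is an assumed value typical of common-SNP arrays, not a figure from this study; the standard errors actually reported in Table 2 take precedence.

```python
def approx_greml_se(n, var_offdiag=2e-5):
    """Approximate standard error of a SNP-heritability (GREML) estimate.

    Uses SE ~ sqrt(2 / var(A_ij)) / n for n unrelated individuals, where
    var(A_ij) is the variance of off-diagonal GRM entries (assumed default).
    """
    return (2.0 / var_offdiag) ** 0.5 / n
```

With N = 2810 this gives an SE of roughly 0.11, so a GCTA estimate of 0.275 (half the average twin-study heritability of 55%) would lie about 2.4 standard errors from zero, consistent with the claim that such a result would have been significant.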
Two hypotheses for explaining the gap between these anxiety-related GCTA estimates and twin-study estimates are that GCTA underestimates genetic influence and that twin studies overestimate it; these hypotheses are not mutually exclusive. We know that GCTA underestimates genetic influence to some extent because it captures only causal variants in linkage disequilibrium with the common SNPs used in the analysis; it misses the effects of rarer DNA variants not tagged by these SNPs. In addition, GCTA assesses only additive genetic effects. One possibility, therefore, is that anxiety is influenced by rarer DNA variants or nonadditive genetic effects to a greater extent than height and weight. On the other hand, our twin-study heritability estimates for parental ratings may be inflated: most estimates of the heritability of anxiety traits in childhood and adolescence using other assessment techniques are around 30% [43], which would bring our GCTA estimates closer to accounting for half the heritability. Another possibility is that nonadditive genetic variance, which GCTA does not capture, inflates estimates of additive genetic variance in twin studies, because nonadditive effects are generally difficult to estimate without extended family data [44].
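A standard textbook calculation illustrates how nonadditive variance can inflate twin-based estimates of additive variance. Assume, for illustration only, no shared environmental variance and standardised additive (a²) and dominance (d²) variance components. The expected twin correlations and Falconer's estimator h²_F, used here as a simple stand-in for twin model fitting, are then:

```latex
\begin{aligned}
r_{MZ} &= a^2 + d^2, \qquad r_{DZ} = \tfrac{1}{2}a^2 + \tfrac{1}{4}d^2,\\
h^2_{F} &= 2\,(r_{MZ} - r_{DZ}) = 2\left(\tfrac{1}{2}a^2 + \tfrac{3}{4}d^2\right) = a^2 + \tfrac{3}{2}d^2 .
\end{aligned}
```

Whenever d² > 0, the twin-based estimate exceeds the additive component a² by 1.5 d², whereas GCTA targets only (part of) a²; any dominance variance therefore widens the apparent gap between the two estimates.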
It is important to resolve the gap between GCTA and twin-study estimates of heritability, both in general and specifically in terms of whether the gap is larger for anxiety-related traits than for other complex traits. To the extent that GCTA estimates account for heritability, it should be possible to identify genes responsible for the heritability of anxiety using common SNPs alone if samples are sufficiently large. Larger samples could help close this gap by yielding more significant SNP associations in GWA and GCTA estimates with smaller standard errors. That said, a recent study reported a GCTA estimate of 0.06 (0.03) for neuroticism in a sample of nearly 12,000 adults [40], suggesting that this gap might remain open until data from exome microarrays (which tag rarer variants) are available, or until whole-genome sequencing identifies all variants of any kind [45].

Conclusion
Our GWA results for anxiety-related traits suggest that, as for other quantitative traits and common disorders, heritability is due to many genes of small effect. Our GCTA results suggest that the genetic architecture of parent-rated anxiety-related traits may differ from previously published results in showing a greater gap between GCTA estimates of genetic influence and twin-study estimates of heritability. One implication of finding no common variants of large effect, and of at least some of the genetic variance being captured by the common SNPs on current DNA arrays, is that larger samples are needed to detect associations of small effect size. Eventually, polygenic prediction, using composites of hundreds or thousands of DNA markers, may reach levels of predictive power useful at least for research, if not for clinical practice.
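In its simplest form, the polygenic prediction mentioned above is a weighted sum of allele dosages across many markers. The sketch assumes per-allele weights taken from GWA effect estimates in an independent sample; the function name and the SNP identifiers in the usage note are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages over SNPs present in both inputs.

    dosages: dict mapping SNP id -> allele count (0, 1, 2) or expected dosage
    weights: dict mapping SNP id -> per-allele effect estimate
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)
```

For example, `polygenic_score({"snp_a": 2, "snp_b": 0, "snp_c": 1}, {"snp_a": 0.1, "snp_c": -0.05, "snp_d": 0.2})` gives 0.15 up to floating-point rounding; `snp_d` is skipped because this individual has no dosage for it. Because each marker contributes only a tiny amount, useful prediction depends on accurate weights from very large discovery samples.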

Supporting Information
Text S1 Genotyping protocol, quality control and statistical analysis.