Introduction

Obsessive–compulsive disorder (OCD) is a neuropsychiatric disorder characterized by recurrent, persistent and intrusive anxiety-provoking thoughts or images (obsessions) and subsequent repetitive behaviors (compulsions).1 The lifetime prevalence of OCD has been estimated between 0.5 and 2.0%,1 and among all anxiety disorders, it is known as a major cause of social impairment and a leading cause of non-fatal disease burden worldwide.2

It is clear that genetic factors are important in the etiology of obsessive–compulsive symptoms (OCS), with heritability estimated ~40%.3, 4 These results did not vary with sex or symptom severity.3 Consistent with what is expected for individual single nucleotide polymorphism (SNP) effect sizes in highly polygenic traits, the first molecular association studies for OCD have not identified large effect variants.5, 6, 7 Taylor et al.7 performed a comprehensive meta-analysis of genetic association studies in OCD, including 113 relevant studies. Their main meta-analysis showed that OCD was associated with serotonin-related polymorphisms (5-HTTLPR and HTR2A) and, in males only, with polymorphisms involved in catecholamine modulation (COMT and MAOA). A secondary meta-analysis by the same group, which targeted polymorphisms that were investigated in fewer than 5 datasets, identified another 18 polymorphisms with significant odds ratios. These polymorphisms might be useful candidates for further investigation, although most results were based on candidate gene studies and must be treated with care due to the possible confounding effects of population stratification.

The international OCD foundation Genetics Collaborative (IOCDF-GC; Stewart et al.),5, 7 conducted an ancestry-stratified case–control genome-wide association (GWA) analysis, in 1465 cases, 5557 ancestry-matched controls and 400 trios, consisting of 1 affected offspring with 2 parents. The trio analysis revealed a significant SNP near the BTBD3 gene (rs6131295). However, this variant lost genome-wide significance when meta-analyzed with the case–control data. The meta-analysis showed a significant enrichment of methylation quantitative trait loci (P<0.001) and frontal lobe expression for the top-ranked SNPs (P<0.01).5 Recently, a second multinational collaboration (OCD Collaborative Genetics Association Study (OCGAS); Mattheisen et al.).8 performed a GWA study (GWAS) in 1598 patients and 3473 controls. The smallest P-value was observed for a marker near the PTPRD gene on chromosome 9 with P=4.13 × 107.8 Gene-based testing for associations at the gene instead of SNP level, revealed a significant association of IQCK, C16orf88 and OFC11.8 These findings await replication.

With this study, we aim to gain further insights into the genetic basis of OCS by performing a series of analyses in a homogeneous, population-based sample of twin families, registered with the Netherlands Twin Registry (NTR). Well-phenotyped population cohorts may contribute to the understanding of the architecture of common complex disorders. We first performed genetic structural equation modeling (SEM)9 to estimate twin–twin and twin–sibling correlations and heritability for OCS in adults as measured with the Padua Inventory Revised Abbreviated (PI-R-ABBR).10 Information on the PI-R-ABBR was available for two time points (2002 and 2008), which allowed calculation of the stability of OCS across a 6-year period. The long-term stability of the phenotype puts an upper limit to heritability—that is, reveals the proportion of total variation across time that is due to differences among individuals, and puts genetic association studies into perspective. Next, we investigated whether polygenic risk scores (PRS) based on a GWA analysis of clinical OCD cases5 significantly predict OCS in our population-based sample. If so, this indicates genetic overlap between the two types of samples, and suggests that future GWA studies can benefit from combining both population-based and case–control samples. We estimated the proportion of phenotypic variance explained by all autosomal SNPs using the Genome-wide Complex Trait Analysis (GCTA) software (Brisbane, QLD, Australia) in a sample of related and unrelated individuals. Further, we performed an explorative GWAS on OC symptom scores from 6931 subjects and entered the GWAS output into a gene-based analysis, to test for associations at the gene rather than the single SNP level.11

Materials and methods

Participants and measures

Study data were collected in participants of the NTR.12, 13 We analyzed data collected in 2002 and 2008 using the PI-R-ABBR.14 The total sample with phenotype data from either the 2002 and/or 2008 data collection contained 20 376 individuals from 7812 families. All individuals with information from at least one survey were included in the analysis. A total of 10 134 individuals had sum scores available for the PI-R-ABBR collected in 2002 and 15 720 individuals had sum scores for the PI-R-ABBR collected in 2008. Longitudinal data were available for 5478 individuals. The distribution of the OC symptom scores, from the PI-R-ABBR collected in 2002 and 2008, is provided in Supplementary Figure 1.

For more information on the PI-R-ABBR OCS measures, please see Supplementary Methods ‘participants and measures’.

On 6931 subjects (twins, their siblings, parents and spouses), phenotypic and genetic data were available. These subjects were entered into the genetic analyses (PRS, GWAS). For GCTA analysis, from these 6931, 6881 subjects were included in the Genetic Relationship Matrix. If a subject filled out both surveys we choose to enter the score of the last survey into GWAS and GCTA. Table 1 gives an overview of subjects per analysis and their demographics (see also Supplementary Methods ‘Subjects entered into different genetic analyses’ for more details and the study by Zilhão et al.).15

Table 1 Subjects (including age and sex) entered per analysis

This study has been approved by the Central Ethics Committee on Research involving human subjects of the VU University Amsterdam. Informed Consent was obtained from all subjects.

Genotyping and imputation

DNA was extracted from either blood or buccal cell samples that were collected for various projects done by the NTR.12, 13 A total of 31,265,038 SNPs were included in the analysis using the 1000 Genome phase 1 version 3 as a reference panel. For further details on genotyping, quality control and imputation methods see Supplementary Methods ‘genotyping and imputation’.

Heritability estimates from SEM

To estimate the total contribution of genetic factors to trait variance and to the longitudinal covariance, the resemblance among twins and siblings was compared. Monozygotic (MZ) twins share (nearly) all their genes, whereas dizygotic (DZ) twins, just like non-twin siblings, share on average half of their genetic variation. In quantitative genetics, this information is used to decompose the total variance of a trait into additive genetic (A), non-additive genetic or dominance (D), and unshared environmental variance (E). The greater the phenotypic similarity between MZ twins, when compared with DZ twins and non-twin siblings, the more of the variance of the trait is attributed to genetic factors. Genetic analyses were carried out by SEM as implemented in OpenMx.9 Further details on SEM analysis may be found in Supplementary Methods ‘SEM methods’.

Polygenic risk scores

To examine shared polygenic risk at an aggregate level between two independent GWAS samples, we used genetic risk score profiling as described by Purcell et al.16 GWAS results (SNP, effect allele, effect size as represented by a Beta-value and P-value) obtained from the analysis in the European case–control sample by Stewart et al.5 were used as a discovery dataset for calculating PRS within our NTR target sample in Plink. PRS were then regressed against the OCS scores from the NTR dataset (n=6931) to calculate the proportion of variance in this target set explained by PRS obtained from the discovery set, with 15 statistical cutoffs for SNP inclusion in the score (cutoffs: P<0.00001, P<0.0001, P<0.001, P<0.01, P<0.05, P<0.1, P<0.2, P<0.3, P<0.4, P<0.5, P<0.6, P<0.7, P<0.8, P<0.9, P1). To correct for family relatedness in the NTR sample, generalized estimating equations in SPSS with the exchangeable and robust function were used.17 Sex, age, principal components to correct for the population substructure18 and genotyping platform were included as covariates. As an additional test, we regressed the PRS on a non-psychiatric, and OCS-uncorrelated trait (height), from the NTR (n=6715); we sought to rule out a possible spurious association consequent of incomplete correction for population stratification and/or cryptic relatedness between the discovery and target datasets. Finally, to test for concordance of effect directions across both datasets, we performed Fisher’s exact statistical test using the online-based application SECA (SNP effect concordance analysis)—http://neurogenetics.qimrberghofer.edu.au/SECA/.19 For 12 subsets of SNPs with P<(0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1.0) in both datasets, Fisher’s exact tests were performed to evaluate if there is an excess of SNPs in the first dataset (European case–control study) with same direction of effect in the NTR dataset across the total 144 SNP subsets. An empirical P-value was generated by permutations (1000) for observing the number of SNP subsets with nominally significant concordance.

Estimations of variance explained by common SNPs (GCTA)

The variance of OCS explained by measured and imputed SNPs was estimated with GCTA using the Restricted Maximum Likelihood procedure.20, 21 This method gives insight into the contribution to the additive genetic variance of all genotyped and imputed SNPs. This provides an upper bound on the variance that can be explained by the set of SNPs. To analyze all available data we followed the method proposed by Zaitlen et al,22 in which the SNP-based and kinship-based heritability can be estimated simultaneously. For further details on the method followed and quality control, refer to Supplementary Methods ‘estimations of variance explained by common SNPs (GCTA)’.

GWAS

GWA analysis was conducted with linear regression under an additive model with adjustment for age, age-squared, principal components of genetic ancestry, genotyping platform and sex. SNPs with values of P<5.00E−08 were declared genome-wide significant. For further details on this analysis, refer to Supplementary Methods ‘GWAS’.

Gene-based analysis

GATES, as implemented in the open-source tool Knowledge-Based Mining System for Genome-Wide Genetic Studies (KGG, version 3.0, Hong Kong, China), was used to perform a gene-based genome-wide analysis.11 GATES employs an extension of the Simes procedure23 to assess the significance of a statistical association at the gene level, by combining the individual genotype–phenotype association tests applied at each single marker. In short, it sums all the individual SNP P-values, available from GWAS summary data, within a gene to output a gene-based P-value. The effective number of independent P-values is given by appropriately controlling for the Linkage Disequilibrium structure on the specified SNPs. SNPs were allocated to genes including gene boundaries of ±5 kb from the 5′and 3′ untranslated region. To correct for multiple testing false discovery rate was set at Q<0.05.

Results

Demographics

For subjects who completed the PI-R-ABBR in 2002 and in 2008 the average age was 33.0 (s.d.=11.5), and 34.7 (s.d.=14.6). Retest stability for OCS scores over a time span of ~6 years was 0.63. The effect of age on OCS scores was a drop of 0.03 per year for both PI-R-ABBR collected in 2002 and 2008.

Genetic modeling (SEM)

MZ, DZ/sibling, test–retest and cross-correlations are summarized in Table 2. Twin correlations for MZ males and MZ females were equal, as were twin/sibling correlations for DZ males, DZ females, DZ opposite-sex twins and siblings. This indicates that there is no evidence for qualitative sex differences in the heritability of OCS and that to a large extent the same genes influence these symptoms in males and females. Twin correlations were more than twice as large in MZ as compared with DZ/sib pairs, indicating that phenotypic similarity is predominantly accounted for by genetic effects rather than shared environment. The same pattern was observed for cross-twin–cross-time correlations, indicating that the observed stability is also mainly caused by genetic factors. SEM showed a significant heritability (P<1.0 × 10−10) for OCS at both time points of 0.42 (95% confidence interval (CI) PI-R-ABBR 2002=0.371–0.467 for, and 95% CI PI-R-ABBR 2008 =0.379–0.456). The estimation of the bivariate (broad sense) heritability found that 56% (95% CI=0.497–0.619) of the stability of OC symptom was due to genetic factors (both additive and dominant components), and the longitudinal additive genetic correlation, that is, the degree of overlap between additive genetic influences at both time points was 0.58.

Table 2 Familial correlations estimated from maximum likelihood for OCS measured over two points in time

PRS regressed on PI-R-ABBR sum scores

The proportions of variance of OCS in the NTR sample explained by PRS obtained from the discovery set (European case control sample Stewart et al.),5 using the 15 statistical cutoffs for SNP inclusion in the score, are summarized in Figure 1. Results show that when including more SNPs in the analyses an increasing amount of variance in OCS is explained, reaching a plateau of 0.2% explained variance (P<0.001) at PRS12 (includes all SNPs from Stewart et al.5 with a P<0.7 (N=4,288,152)). Supplementary Figure 2 provides the results obtained when using height as the outcome phenotype for the target set for the same PRS. For the same 15 statistical cutoffs, no significant result was observed (P<0.01), that is, the PRS did not explain any variance in height, hence excluding the confounding effect due to cryptic relatedness across sample sets and possible residual genetic stratification effects present in both populations. Finally, Supplementary Figure 3 presents the results in a heat map plot from analysis of concordance using SECA. The permuted P-value for the number of SNP subsets nominally significant was P=0.002 thus indicating significant concordance of genetic risk across the datasets.

Figure 1
figure 1

Proportion of variance in OC symptom scores, as measured in the NTR sample, explained by polygenic scores (PRS) obtained from European case–control sample by Stewart et al.5, with a range of 15 statistical cutoffs for SNP inclusion in the score (PRS1; P<0.00001, PRS2; P<0.0001, PRS3; P<0.001, PRS4; P<0.01, PRS5; P<0.05, PRS6; P<0.1, PRS7; P<0.2, PRS8; P<0.3, PRS9; P<0.4, PRS10; P<0.5, PRS11; P<0.6, PRS12; P<0.7, PRS13; P<0.8, PRS14; P<0.9, PRS15; P1). *P<0.05; **P<0.01. NTR, Netherlands Twin Registry; OC, obsessive–compulsive.

SNP-based heritability

GCTA results showed a significant SNP-based heritability estimate of 0.14 (s.e.=0.05, P=0.003), and a total narrow-sense heritability of 0.34 (s.e.=0.02, P<0.001). Thus 14% of the total phenotypic variance in OCS can be accounted for by the SNPs in the genotyping platform, and the total set of SNPs included in an additive genetic model account for 34% of the total heritability.

GWAs analysis

Top associated variants in GWAS analysis (top 20 SNPs) are summarized in Table 3 (a more comprehensive overview of these results is present in Supplementary Table 1). The Manhattan plot, showing the −log(P) plotted against genomic location, and QQ plot of observed versus expected −log(P) statistics for the OC symptom GWAS, are illustrated in Figure 2 and Supplementary Figure 4, respectively. Of the top associated variants, one SNP, rs8100480 (19299079, 19p13.11 (hg19), P=2.56 × 10−8), exceeded the threshold for genome-wide significance and showed a positive association with OCS. This SNP is located within the MEF2BNB gene. A more detailed look into this region is provided by the regional association plot in Supplementary Figure 5.

Table 3 Top associated variants in NTR-GWAS analysis
Figure 2
figure 2

Manhattan plots of all genotyped single nucleotide polymorphisms (SNPs). Red and blue lines indicate significance thresholds of 5.00E−08 and 1.00E−05, respectively.

Further, we searched in our results whether they replicate top SNPs reported by Stewart et al.5 and Mattheisen et al.8 Supplementary Table 2 summarizes results for the strongest associated GWAS variants from Stewart et al.5 From their table of 43 SNPs (with P<1.0 × 10−5), 16 were found to be independent.8 None of these SNPs were significantly replicated in our sample when correcting for multiple comparison (0.05/16=0.003). However, a trend (P=0.0049) for replication was found for rs4868342, located on chromosome 5. This SNP is located within the HMP19 gene.

Supplementary Table 3 summarizes results for the strongest associated GWAS variants from Mattheisen et al.8 None of the 32 suggestive associations (with P<1.0 × 10−4) were replicated in our sample after correcting for multiple comparison (0.05/32=0.001). Neither were the significant results obtained in our study replicated in the Stewart sample (data not shown).

Gene-based analysis

A total of 2,644,694 SNPs were mapped to 22 759 genes. The QQ plots with the observed versus expected –log(P) of the association tests are presented in Supplementary Figure 6. Table 4 depicts all the genes with a significant association. Although these are all nominal significant, the Benjamin–Hochberg procedure was set to control for the q<0.05.23 After correction, four genes remained significant, the regulatory factor X-associated ankyrin-containing protein (RFXANK), the myocyte enhancer factor 2B (MEF2B), the MEF2B neighbor (MEF2BNB) and the MEF2BNB-MEF2B read through (MEF2BNB-MEF2B). All these genes are located in the same chromosomal region (19p13.11), and share the SNP with the lowest P-value annotated to the gene (rs8100480, P=2.56E−08). The Manhattan plot for the gene association P-values is present in Supplementary Figure 7.

Table 4 List of all nominal significant genes (α=0.05)

Discussion

This study aimed at getting a better insight into the genetic basis of OCS in a population-based sample.

First, in line with previous studies, we estimated the heritability for OCS at 0.42.4 Stability of OCS over a 6-year time period was 0.63, and cross-twin–cross-time correlations were found to be twice as high in MZ compared with DZ/sib pairs, indicating that the observed stability is mainly caused by genetic factors. Bivariate analyses showed a longitudinal genetic correlation of 0.58. Second, polygenic scores based on a GWA analysis of clinical OCD cases significantly predicted OCS in the independent population-based sample. Analysis of concordance of results indicate that the genetic risk between the two datasets are concordant. These results indicate a genetic overlap between OCD assessed as a categorical disorder and OCS assessed as a continuous trait. Furthermore, the fact that the proportion of variance explained increases as more SNPs are included in the analysis indicates that many small effect variants contribute to the trait. Therefore, future GWA studies can benefit from the inclusion of both population-based and case–control studies, and by analyzing OCD as a quantitative rather than a categorical trait.

Third, we estimated the SNP-based heritability. We were able to find significant explained variance at 0.34. More importantly we were able to partition the heritability into two components, revealing a SNP heritability of 0.14 and the heritability not accounted for by SNPs at ~0.20. These results corroborate well with our findings from our twin model (broad heritability at 0.42). In a previous study by Davis et al.24 using a case–control design, the heritability of OCD was estimated at 0.37. Although our results do not completely agree, different study-designs can render somewhat different estimates, as was shown by Golan et al.25

Fourth, the GWAS performed on the continuous OC symptom scores resulted in one significant SNP (rs8100480), which is located in the MEF2BNB-MEF2B gene. In addition, we sought to calculate gene-based P-values, to determine whether there are genes with associated SNPs, which can collectively achieve statistical significance. Implementing a gene-based analysis as a follow-up complementary analysis is of additional value over traditional GWAS results. Gene-based analysis takes the number of SNPs per gene and gene size into account, considering genes as functional units and informing on the underlying genetic architecture of the phenotype. Furthermore, since for most psychiatric disorders we do not expect large effect causal variants, replication analysis from underpowered GWAS would not reflect these findings. Gene-based tests, by providing an aggregated analysis, may successfully capture and replicate those aggregate effects. We found significant associations for four genes. RFXANK encodes a protein that belongs to the MHC class II molecules, which has an important role in the development and control of the immune system. Bare lymphocyte syndrome 2, an immunodeficiency disorder, has been linked with mutations in this gene. Although RFXANK has not been identified in OCD previously, several lines of research have indicated a role of immune system alterations and of immune system genes, including TNFα, and SLC1A1, in combination with cerebral immunopathological reactions to among others group A ß-hemolytic streptococcus infections26 and Bornea virus infections27, 28 in the onset (and expression) of OCD. This has led to the concept of ‘PANDAS’ and ‘non-PANDAS’ OCD.29, 30, 31 However, very little research has been performed on direct gene–gene interactions/pathways or on gene–environment interactions to better understand immunopathological pathways related to OCS. MEF2B is a protein-coding gene belonging to the DNA-binding protein family MADS/MEF2, that regulates gene expression, specifically in the smooth muscle tissue. Both MEF2BNB and MEF2BNB-MEF2B are closely related to MEF2B, and have mostly regulatory functions. Additional support for a relation between MEF2BNB and OCD comes from a gene-based analysis in a recently published GWAS of OCD8 where the gene was ranked 21st of the 21 567 genes tested (gene-based P=8.09E−04, not significant after correction for multiple testing). All four genes are overlapping, and span a region of 56302 bp located in the 19p13.11 cytogenetic band, on the short arm of chromosome 19. Perhaps the best interpretation for these results, therefore, is in the implication of this genomic location (19p13.11) as a susceptibility locus for OCS. Several SNP–trait associations have been linked to this locus. For example, rs1064395 (NCAN) has been reported as a susceptibility factor for bipolar disorder in a GWAS. The NCAN gene is located just 10095 bp at 5′ from RFXANK, and is one of the few genetic variants that has been genome-wide replicated as a risk factor in both bipolar disorder and schizophrenia.32, 33 In addition, a recent study focusing on the cortical thickness and folding in schizophrenia patients found evidence for association of the NCAN genetic variant in the occipital and prefrontal cortex.34 The SNP rs874628 (MPV17L2 gene), located in this locus, was implicated in multiple sclerosis, an inflammatory disease with disruptions in the nervous system.35

Overall, our results, combined with previous genetic studies in OCD, suggest a possible role for the 19p13.11 region (MEF2BNB gene) in OCS. It might be of interest for future genetic studies to investigate this area in association with OCD into depth. Further, our data shows that well-phenotyped population cohorts could contribute to the understanding of the underlying architecture of common complex disorders such as OCS, and that these partly overlap with results from case–control studies. Therefore, future studies could benefit from combining case–control and population-based samples.