The genetic epidemiology of obsessive-compulsive disorder: a systematic review and meta-analysis

The first systematic review and meta-analysis of obsessive-compulsive disorder (OCD) genetic epidemiology was published approximately 20 years ago. Considering the relevance of all the studies published since 2001, the current study aimed to update the state-of-the-art knowledge in the field. All published data concerning the genetic epidemiology of OCD in the CENTRAL, MEDLINE, EMBASE, BVS, and OpenGrey databases were searched by two independent researchers up to September 30, 2021. To be included, articles had to fulfill the following criteria: OCD diagnosis established by standardized and validated instruments or by medical records; inclusion of a control group for comparison; and a case-control, cohort, or twin study design. The analysis units were the first-degree relatives (FDRs) of OCD or control probands and the co-twins in twin pairs. The outcomes of interest were the familial recurrence rates of OCD and the correlations of obsessive-compulsive symptoms (OCS) in monozygotic (MZ) compared with dizygotic (DZ) twins. Nineteen family, twenty-nine twin, and six population-based studies were included. The main findings were that OCD is a prevalent and highly familial disorder, especially among the relatives of child and adolescent probands; that OCD has a phenotypic heritability of around 50%; and that the higher OCS correlations between MZ twins were mainly due to additive genetic or non-shared environmental components.


INTRODUCTION
Obsessive-compulsive disorder (OCD) is a prevalent and highly heterogeneous disorder of unknown etiology. Similar to other psychiatric disorders, OCD probably originates from a complex interaction of genetic and environmental risk factors [1][2][3].
Since the beginning of the twentieth century, family studies have consistently reported that OCD is a familial disorder. However, the sample sizes and methodological rigor of these studies have been mixed [4,5]. Consequently, these studies have limited external validity [6]. More recently, population-based studies have significantly increased the sample sizes to several thousand probands and relatives but are limited by less precise diagnostic procedures [7][8][9][10][11][12].
Since OCD is a heterogeneous disorder, certain OCD phenotypes (e.g., early-onset or tic-related OCD) may be more familial/heritable than others [13][14][15][16][17][18]. However, not all studies had sufficient statistical power to confirm this familial pattern.
Considering the relevance of all the studies published since 2001, the current study aimed to update the state-of-the-art knowledge in the field by conducting a systematic review and meta-analysis of OCD family and twin studies.

Search strategy and selection criteria
The present meta-analysis was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [57], and the study protocol was registered in PROSPERO (registration number: CRD42019118317).
The PECO strategy [58] was used to frame the systematic review procedure criteria, as follows. Participants: family members and/or twins of OCD probands; Exposure: family history of OCD; Controls: family members and/or twins of probands without OCD; and Outcome: OCD rates in relatives and/or twins of OCD probands.
For this systematic review and meta-analysis, we considered all studies published up to September 30, 2021 that examined familial loading through family aggregation rates and/or twin resemblance. The following databases were searched: CENTRAL (Cochrane Library); MEDLINE by PubMed (US National Library of Medicine); EMBASE; BVS (Biblioteca Virtual em Saúde); and OpenGrey (for gray literature). The search strategy was designed using DeCS headings and adapted to the indexing vocabulary of each database (e.g., Medical Subject Headings [MeSH] for MEDLINE). No restrictions were placed on language or date of publication. Specific details of the search strategy for each database, including the search terms used, are provided in Box S1.
A specific search strategy was used to locate previous reviews. The reference lists of all previously published reviews and meta-analyses, and of the articles selected for full-text review, were carefully screened for additional relevant studies.
Studies were included if the following criteria were met: OCD diagnosis was assessed using standardized and validated instruments, or by a population database record (based on the Diagnostic and Statistical Manual of Mental Disorders [DSM-III, DSM-III-R, DSM-IV, or DSM-5], or the International Classification of Diseases, Eighth, Ninth, or 10th Revision [ICD-8, ICD-9, or ICD-10] diagnostic criteria); the probands were diagnosed with OCD by direct interviews; relatives were directly interviewed; the study had a cohort or case-control design; and a control group was included.
Review studies, case reports, expert consensus, letters to editor, opinion papers, segregation analysis studies, molecular genetic studies, studies reporting obsessive-compulsive personality disorder as the outcome, and animal model studies were excluded.
A flowchart illustrating the study search and selection process is presented in Fig. 1.
Two independent OCD experts (T.B.V. and L.M.) screened the database results to determine which studies met the eligibility criteria. First, duplicate publications were identified and excluded (n = 101). Next, titles and abstracts were screened (n = 4022), and potentially eligible articles were selected (n = 95) for full-text review (T.B.V. and L.M.). Disagreements were discussed and resolved by consensus, or by consulting a third expert (M.C.R.). Agreement between the two reviewers was excellent (Cohen's kappa = 0.90). The screening process was performed using the Rayyan software [59].
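As an illustration of the agreement statistic reported above, the following minimal sketch computes Cohen's kappa for two raters. The function name `cohens_kappa` and the screening decisions are hypothetical and are not data from this review; the source does not describe how its kappa value was computed.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical screening decisions (illustration only).
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))  # 0.571
```

Kappa corrects the raw agreement (5 of 6 decisions here) for the agreement expected by chance given how often each reviewer chose each label.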
During the study selection procedure, the reviewers concluded that all twin studies were based on community samples in which obsessive-compulsive symptoms (OCS) were not systematically assessed by standardized instruments and/or direct interviews. This means that twin and population-based studies would have been excluded from the analysis, despite presenting extremely relevant information. Instead, we made the post hoc decision to waive this inclusion criterion for those studies (i.e., the requirement that probands and FDRs be directly interviewed). Of the 95 full-text records evaluated, 58 were excluded. A list of all the studies excluded after full-text review, with the reasons for exclusion, appears in the Supplement (Table S1). Of the remaining articles, 14 were family studies and 23 were twin studies. From the reference list search, five additional family and six additional twin studies were found and added to the meta-analysis. No studies were found in the gray literature.
The Newcastle-Ottawa Scale (NOS) [60] was used to assess the risk of bias of the observational studies. The NOS contains eight items, categorized into three domains: selection, comparability, and outcome/exposure. For each study type (i.e., family or twin), the items were adapted, and a series of response options were provided. A star scoring system was used, ranging from zero to nine stars. Details of the NOS risk-of-bias assessment for each included study are provided in Table S2.

Data analyses
For the coding process, a standardized data extraction form was used by the reviewers consisting of the following items: author(s) and year of publication; study location; sample recruitment procedure; inclusion and exclusion criteria applied for sample selection; OCD diagnostic criteria coding resource; and assessment tools used.
For family studies, it also included: case definition methods, including whether the best estimate method was adopted; matching procedures for proband comparison group selection; blinding of the interviewer to proband status; number of probands, controls, and relatives with OCD (definite and subthreshold/probable); and mean age of the proband and relative samples.
For family studies, the analysis unit was the FDR(s) of the OCD patients or controls, and the outcomes of interest were: (a) the familial recurrence rates of definite and probable/subthreshold OCD in FDRs of OCD probands compared with FDRs of control probands; and (b) the familial recurrence rates of definite and probable/subthreshold OCD in FDRs of early-onset (before 18 years old) OCD probands compared with FDRs of late-onset (after 18 years old) OCD probands. We conducted these analyses three times: once for all studies, once for studies involving children/adolescents, and once for studies involving adults.
For each outcome of interest reported by ≥2 family studies, we performed a random-effects Mantel-Haenszel meta-analysis to derive pooled effect estimates. Because all data under analysis were dichotomous outcome variables, we summarized the effects using odds ratios (ORs) and their 95% confidence intervals (CIs).
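To make the effect measure concrete, the sketch below computes a study-level OR with a Woolf (log-scale) 95% CI and the classical Mantel-Haenszel pooled OR. It is a simplified illustration, not the authors' pipeline: the random-effects weighting step is not reproduced, the function names are hypothetical, and the 2x2 counts are invented.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) 95% CI for one 2x2 table.

    a, b: affected / unaffected relatives of OCD probands
    c, d: affected / unaffected relatives of control probands
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction keeps the log odds ratio finite.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across 2x2 tables (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts (illustration only, not taken from the included studies).
studies = [(20, 180, 4, 196), (15, 135, 3, 147)]
for table in studies:
    print(odds_ratio_ci(*table))
print(mantel_haenszel_or(studies))
```

The Mantel-Haenszel estimator weights each table by its size, which keeps it stable when individual studies have few affected relatives.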
For twin studies, the number of MZ and DZ pairs and the twin resemblance correlations according to zygosity were systematically extracted. When the studies only reported separate correlations for males and females, we transformed the correlation coefficients to Fisher z-values (see below), averaged them, and back-transformed the resulting z-value to a correlation coefficient.
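The averaging step described above can be sketched as follows. The function name `average_correlations` and the sex-specific values are hypothetical, and an unweighted mean is assumed because the text does not mention weighting.

```python
import math


def average_correlations(r_values):
    """Average correlations on the Fisher z scale, then back-transform."""
    z_values = [math.atanh(r) for r in r_values]  # r -> z
    mean_z = sum(z_values) / len(z_values)        # unweighted mean (assumption)
    return math.tanh(mean_z)                      # z -> r


# Hypothetical sex-specific MZ correlations (illustration only).
r_male, r_female = 0.40, 0.60
r_combined = average_correlations([r_male, r_female])
print(r_combined)
```

Averaging on the z scale rather than on the raw correlations avoids the bias introduced by the skewed sampling distribution of r.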
For the twin studies, the analysis unit was the twin pair. The outcome of interest was the correlation of OCS in monozygotic compared with dizygotic twins. We tested two hypotheses: (a) that the OCS correlation in monozygotic twins is equal to the OCS correlation in dizygotic twins; and (b) that the OCS correlation in monozygotic twins is twice the OCS correlation in dizygotic twins. We conducted these analyses considering all studies, only the child/adolescent studies, and only the adult studies.
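For a single study, the two hypotheses could be examined with large-sample z-tests along the following lines. This is only one possible approach and is not confirmed by the source: hypothesis (a) is tested on the Fisher z scale, hypothesis (b) uses a delta-method variance on the correlation scale, and all names and numbers are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_test_equal(r_mz, n_mz, r_dz, n_dz):
    """H0: rho_MZ = rho_DZ, tested on the Fisher z scale."""
    diff = math.atanh(r_mz) - math.atanh(r_dz)
    se = math.sqrt(1 / (n_mz - 3) + 1 / (n_dz - 3))
    z = diff / se
    return z, two_sided_p(z)


def z_test_double(r_mz, n_mz, r_dz, n_dz):
    """H0: rho_MZ = 2 * rho_DZ, with an approximate (delta-method) variance."""
    diff = r_mz - 2 * r_dz
    var_mz = (1 - r_mz ** 2) ** 2 / (n_mz - 1)
    var_dz = (1 - r_dz ** 2) ** 2 / (n_dz - 1)
    z = diff / math.sqrt(var_mz + 4 * var_dz)
    return z, two_sided_p(z)


# Hypothetical correlations and pair counts (illustration only).
z_equal, p_equal = z_test_equal(0.50, 400, 0.25, 400)
z_double, p_double = z_test_double(0.50, 400, 0.25, 400)
print(z_equal, p_equal)
print(z_double, p_double)
```

Rejecting (a) while failing to reject (b) is the pattern expected when twin resemblance is driven mainly by additive genetic effects.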
For each outcome of interest reported by ≥2 twin studies, we transformed the correlation coefficients to Fisher z-values and performed a random-effects meta-analysis to derive pooled effect estimates. This transformation was beneficial because the standard error of a correlation depends on the correlation itself, making larger correlations appear more precise and thus receive more weight. In contrast, the standard error of the Fisher z-value depends only on the sample size. We conducted these analyses twice: once assuming that the standard error was that of the Pearson correlation and once assuming that it was that of the tetrachoric correlation. The latter assumes that the presence of OCS represents latent variables that follow a bivariate normal distribution. To use the standard error of the tetrachoric correlation, we first estimated the number of concordant and discordant twin pairs for the presence of OCS according to a 2.3% lifetime prevalence of OCD [61], then used the "polychor" function to derive the standard error [62], and finally calculated the "effective" sample size to perform the meta-analysis, as described in Polderman et al. [63].
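A minimal sketch of random-effects pooling on the Fisher z scale is given below. It assumes the DerSimonian-Laird estimator of between-study variance and the Pearson-type variance 1/(n - 3); the source does not name the estimator, and the tetrachoric "effective" sample size step is not reproduced. The function name and the input values are hypothetical.

```python
import math


def pool_correlations(correlations, n_pairs):
    """DerSimonian-Laird random-effects pooling of correlations via Fisher z."""
    z = [math.atanh(r) for r in correlations]
    v = [1.0 / (n - 3) for n in n_pairs]  # variance of z depends only on n
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (zi - fixed) ** 2 for wi, zi in zip(w, z))
    df = len(z) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each study.
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled_z = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "r": math.tanh(pooled_z),
        "ci": (math.tanh(pooled_z - 1.96 * se), math.tanh(pooled_z + 1.96 * se)),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical MZ correlations and pair counts (illustration only).
result = pool_correlations([0.45, 0.55, 0.50], [200, 350, 500])
print(result)
```

Pooling and the confidence interval are computed on the z scale and only back-transformed at the end, so the reported interval is asymmetric around the pooled correlation.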
Finally, we used tetrachoric correlations to estimate the A (additive genetics), the C (shared environmental) and the E (non-shared environmental) components based on the following definitions:
We measured the heterogeneity between studies with the I² statistic, which describes the percentage of the variability in effect estimates attributable to heterogeneity. We accepted I² values <50%. When the I² value exceeded this threshold, the studies were excluded one-by-one from the analyses to identify and analyze the outlier.
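The definitions of the A, C and E components are not reproduced in this text. A common choice, assumed here rather than confirmed by the source, is Falconer's formulas based on the MZ and DZ twin correlations, shown together with the usual definition of I² from Cochran's Q. The symbols r_MZ, r_DZ, Q, k, w_i and θ are introduced for illustration only.

```latex
% Falconer's formulas (assumed definitions of the ACE components)
\begin{align}
A &= 2\,(r_{MZ} - r_{DZ}), \\
C &= 2\,r_{DZ} - r_{MZ}, \\
E &= 1 - r_{MZ}, \qquad A + C + E = 1 .
\end{align}

% Heterogeneity across k studies with effects \theta_i and weights w_i
\begin{align}
Q &= \sum_{i=1}^{k} w_i \,(\theta_i - \hat{\theta})^2, \\
I^2 &= \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\% .
\end{align}
```

Under these definitions, an MZ correlation close to twice the DZ correlation yields a C component near zero, and negative estimates of C are conventionally set to zero.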
Data from 5053 directly interviewed FDRs of 176 child and adolescent and 899 adult OCD probands and 522 control probands (95 from children/adolescents and 427 from adult samples) were pooled for meta-analysis. The analyses combining pediatric and adult probands showed that FDRs of OCD probands had higher risks for definite OCD (OR = 7.18, 95% CI 4.13-12.47, p < 0.00001) than the FDRs of controls (Fig. 2).
The comparison between pediatric and adult studies indicated that OCD was significantly more familial in children/adolescents than in adults. First-degree relatives of the child and adolescent OCD probands had a 16 times higher risk of definite OCD compared to the control FDRs (OR = 16.44, 95% CI 4.57-59.17, p < 0.00001). The FDRs of adult OCD probands had an approximately 6 times higher risk of definite OCD compared to the control FDRs (OR = 6.02, 95% CI 3.16-11.46, p < 0.00001) (Fig. 3).
For the above analyses, the degree of heterogeneity among the studies was acceptable (overall I² = 37%; adult probands, I² = 48%; child and adolescent probands, I² = 0%).
Regarding definite and subthreshold OCD, the pooled data among adult samples revealed slightly lower familial loading (OR = 4.06, 95% CI 2.91-5.66, p < 0.00001) than the analyses including only definite OCD. There was no heterogeneity between the studies pooled for this analysis (I² = 0%).
Data on the occurrence of definite or definite/subthreshold OCD in FDRs of probands considering the age of symptom onset were reported for eight different samples in 12 publications [5, 16, 18, 23, 25-29, 66, 68]. However, eight studies did not report the number of relatives included, leaving only four samples for the statistical analyses [16,18,26,68]. Together, the studies showed high heterogeneity for the analyzed outcomes (definite OCD, I² = 71%; definite/subthreshold OCD, I² = 95%) (Fig. 4).
The tic-related OCD family aggregation analyses were not reported because of the insufficient number of studies included or the high heterogeneity of the extracted data.

Risk of bias
The Newcastle-Ottawa Scale (NOS) scores of the included family studies were generally high, indicating a low risk of bias (Table S2). The only exceptions were the studies by [65] and [20], which lacked an interviewer blinding procedure regarding proband/relative status, and [13,69] and [68], which did not use the best estimate diagnosis method and lacked control samples.
Of note, there was a high heterogeneity index across the twin analyses. However, it is important to mention that despite the high heterogeneity, almost all results were statistically significant in the same direction.

Population-based cohorts (post hoc)
Six large nationwide register-based cohorts [7][8][9][10][11][12] did not meet our initial inclusion criteria but, given their superior statistical power and relevance, they were included in the current paper and their results are narratively described below.
From a cohort of more than 13.5 million people who were born or lived in Sweden between 1969 and 2009, Mataix-Cols et al. [7] found that the FDRs of the 24,768 individuals with OCD were more likely to also have OCD (OR = 5.03, 95% CI = 4.49-5.64 for siblings; OR = 4.70, 95% CI = 4.09-5.40 for parents; and OR = 4.56, 95% CI = 3.97-5.24 for the offspring); that this risk decreased proportionally to the degree of genetic relatedness; and that the risk tended to be higher among the FDRs of early-onset OCD individuals.

DISCUSSION
The current meta-analysis included all OCD family and twin studies published until September 2021. The results update and extend the findings of the previous meta-analysis published more than 20 years ago [19]. The main findings were that OCD is highly familial, particularly in children and adolescents; that the heritability of OCS in twin samples is approximately 0.5; and that the higher OCS correlations between MZ twins were mainly due to additive genetic or non-shared environmental components. These results are relevant for future genetic and clinical studies and reinforce the need for the development of specific guidelines for the screening of OCS in the FDRs of OCD subjects and the early referral for treatment when needed.
Fig. 4 Age of onset. Only four samples remained for the statistical analysis of OCD family recurrence rates. The studies showed a high heterogeneity for the analyzed outcome.
T. Blanco-Vieira et al.
Table 2. OCD twin studies. The * shows the study group to which the study was related.
According to the 18 OCD family studies included in the analyses, OCD was 7.2 times more frequent in OCD families than in control families. These estimates are almost twice as high as those from the last meta-analysis published in 2001 [19]. One conceivable explanation for this difference is that the current study included studies with larger samples, interviewed with validated assessment tools and based on reliable diagnostic criteria. Furthermore, considering the secretive nature of OCD, the higher rates may be due to the fact that the current analyses included subjects who were directly interviewed.
Of note, the OCD rates among control relatives (2.3%) were very similar to the lifetime prevalence rates in the general population, ranging from 0.7% to 3% [61]. These findings suggest that the current results may be generalized to other samples and reinforce the robustness of the current estimates.
Additional analyses of very large population-based studies, primarily conducted in Scandinavian countries and Taiwan, support the estimates from the family studies [7][8][9][10][11][12]. In line with the studies that did meet our inclusion criteria, the risk estimates for OCD in these population-based studies varied from 4.7 [7] to 7.64 [12] for parents, 4.82 [11] to 8.95 [12] for full siblings, and 4.54 [8] to 8.95 [12] for the offspring. Because these population studies had superior statistical power and a lower risk of selection bias, we conclude that the familial risk estimates are generalizable to the general population.
The twin studies demonstrated that OCD, or at least its dimensional representation, is not only familial but also heritable, with twin correlations of 0.52 and 0.43 in MZ twins compared with 0.27 and 0.20 in DZ twins (in child and adult samples, respectively). These findings are in line with previous reports [1,71,72], and indicate that both genetic and environmental characteristics are important in the etiology of OCS. The analyses of the specific roles of the additive genetic (A) and non-shared environment (E) components of the ACE model in the etiology of OCD revealed that our findings are in line with previous results [1], with A and E accounting for 46% and 54% of the variance, respectively. Interestingly, the single-nucleotide polymorphism-based heritability of OCD is still considerably lower, in the region of 30% [73], which indicates that further research is needed to understand the "missing heritability". It is plausible to assume that while the majority of inherited liability for OCD is due to common genetic variation, rare variation may also contribute to some extent. Thus, future genetic studies should focus on common as well as rare genetic variants as a way to capture more of the unexplained phenotypic heritability.
Notably, the shared environment component (C) did not contribute to the etiology of OCS in this study. Taylor [1] has also previously reported that the shared environment makes a weak contribution to the OCS phenotypic variance. This finding is particularly relevant to the clinical field because it suggests that family environment (e.g., learning) is unlikely to have a major role in the etiology of the disorder. Instead, future studies should focus on the impact of specific environmental factors that are not shared between siblings or twins. Discordant sibling and twin designs are particularly suited to move the field forward because they effectively adjust for shared genetic factors and unmeasured confounders [3]. Using such designs, researchers have recently confirmed a dose-response relationship between perinatal complications and risk of OCD in the offspring [74].
Some limitations of the present study should be highlighted, such as the fact that the data were not analyzed according to the gender of the probands or the FDRs, specific OCS subtypes or dimensions, symptom severity, or treatment response rates. These analyses could not be performed due to insufficient detail in many of the studies. For example, it seems likely that the tic-related subtype of OCD is particularly familial and heritable, but limited data exist [75]. In addition, it would have been important to have more studies describing the recurrence risks for OCD according to the age of onset of OCS. Furthermore, twin studies were based on self-reported questionnaires rather than on directly interviewed individuals, but their results were largely compatible with those of the controlled family and population-based studies.
Despite these limitations, the current systematic review and meta-analysis represents a much-needed update on the genetic epidemiology of OCD. The familial and heritable nature of OCD is now indisputable. In addition to large-scale gene-searching efforts, more needs to be done to understand environmental risk factors that are potentially modifiable, and how this newly gained knowledge can be used to improve the health of individuals with OCD and their relatives.