Developmental language disorder – heritability and genetic correlations with other disorders affecting language

Abstract
Developmental language disorder (DLD) is a neurodevelopmental disorder primarily affecting language in the absence of a known biomedical condition, which may have a large impact on a person's life and mental health. Family-based studies indicate a strong genetic component in DLD, but genetic studies of DLD are scarce. In this study we estimated the heritability of DLD and its genetic correlations with related disorders and traits in a sample of >25,000 individuals from the Danish Blood Donor Study for whom we had both genotype data and questionnaire data on language disorder and language support. We estimated SNP-based heritabilities for DLD and genetic correlations with disorders which may involve spoken language deficits and traits related to spoken language. We found significant heritability estimates for DLD ranging from ~27 % to ~52 %, depending on the method used. We found no significant evidence for genetic correlation with the investigated disorders or traits, although the strongest effect was observed for a negative genetic correlation between DLD and nonword repetition ability. To our knowledge, this study reports the first significant heritability estimate for DLD from molecular genetic data.

Introduction
Developmental language disorder (DLD) is a neurodevelopmental disorder characterized by low language ability, significantly affecting daily functioning, in the absence of biomedical conditions (Bishop et al., 2017). DLD is known to have negative consequences for the individuals who have it, which may affect their social lives, their education and their employment, as well as their mental health (Dubois et al., 2020; Elbro et al., 2011). Individuals with DLD have also been reported to have a higher prevalence of severe psychiatric disorders such as schizophrenia (Clegg et al., 2005). Despite these negative outcomes, DLD is understudied compared to related neurodevelopmental conditions (Bishop, 2010b; McGregor, 2020), including some in which language may also be impaired. With regard to the latter point, the similarities and differences between DLD and autism and how they relate to the etiologies of the disorders have been debated in the literature (Bishop, 2003, 2010a; Kjelgaard and Tager-Flusberg, 2001). Studying the molecular etiology of DLD and its relationships with other neurodevelopmental disorders is therefore of high importance. We have previously shown, in a large Danish cohort, that DLD is significantly associated with poorer mental health, learning difficulties and reading difficulties, while also being under-diagnosed in the healthcare system (Nudel et al., 2023). While it has long been known from family studies that (developmental) disorders of spoken language are heritable (Stromswold, 2001), studies into the heritability of DLD are scarce, in particular studies using molecular genetic data. Two recent studies using twin data reported very different heritabilities for DLD and parental-reported speech and language difficulties: 21-22 % and 70 %, respectively (Keijser et al., 2024; Toseeb et al., 2022). In this study, we used an updated genetic data freeze with an increased sample size to expand the genetic analysis from our previous study and perform further genetic analyses which were not possible to perform in the original study sample. In the updated genetic dataset, the control sample size increased by ~40 % and the case sample size increased by ~53 %. We estimated the heritability of DLD in unrelated individuals using two different approaches. We also estimated genetic correlation between DLD and relevant disorders and traits related to spoken language and verbal communication.

Participants and phenotypes
A total of 46,547 individuals from the Danish Blood Donor Study (DBDS) cohort completed the DBDS digital questionnaire with questions on language disorder and language support, as detailed previously (Nudel et al., 2023). In terms of the demographics of Danish blood donors, in the full cohort, women were more likely to donate blood around age 25 and men between the ages of 25 and 55, compared with the general Danish population (Burgdorf et al., 2017). The probability of blood donation increased as the percentile of income increased; in terms of education, it was highest among women with high-school education and men with a short/middle length education; lastly, individuals in urbanized areas were more likely to donate blood (Burgdorf et al., 2017). As described in our previous paper (Nudel et al., 2023), DLD cases were defined as i) having or having had a self-reported language disorder or ii) having started talking late, and, additionally, having a history of at least one of the following types of intervention: iii) speech and language therapy; iv) language group intervention; v) school language unit; vi) another form of language support. Additionally, DLD cases were subject to exclusion criteria based on primary and secondary diagnoses from hospital registers for the following conditions: intellectual disability, autism spectrum disorder, hearing impairment and deafness, Down syndrome, epilepsy, Turner syndrome, and Klinefelter syndrome (see Supplementary Table S1 for a list of International Classification of Diseases (ICD) codes). DLD controls were defined as i) not having or having had a self-reported language disorder, and ii) not reporting having started talking late, and iii) not having indicated that others had difficulty understanding them at age five. See our first paper for further details on the DLD phenotype (Nudel et al., 2023). Descriptive statistics for the DLD affection status, age and sex of the participants in this study are shown in Table 1. The proportion of cases in the present study (3.6 %) was similar to that in the previous study (Nudel et al., 2023), in both the total sample (3.7 %) and the genetic analyses (3.3 %).
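The case and control definitions above can be read as a small decision rule. The sketch below is one possible reading, assuming that a case requires (i or ii) together with at least one of the interventions iii) to vi) and no excluded diagnosis; the field names are hypothetical and do not correspond to actual DBDS questionnaire variables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Responses:
    # Hypothetical field names; the real questionnaire variable names are not given in the text.
    language_disorder: bool        # i) has or has had a self-reported language disorder
    late_talker: bool              # ii) started talking late
    any_language_support: bool     # at least one of interventions iii)-vi)
    excluded_diagnosis: bool       # any register diagnosis on the exclusion list
    hard_to_understand_at_5: bool  # others had difficulty understanding them at age five


def classify(r: Responses) -> Optional[str]:
    """Return "case", "control", or None (neither definition is met)."""
    reported_problem = r.language_disorder or r.late_talker
    if reported_problem and r.any_language_support and not r.excluded_diagnosis:
        return "case"
    if not reported_problem and not r.hard_to_understand_at_5:
        return "control"
    return None
```

Under this reading, a participant who reports late talking but no intervention is neither a case nor a control and would not enter the case-control analysis.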

Genetic data and quality control steps
This new study uses data from the DBDS genetic data freeze from July 2023. The genetic cohort has been described previously (Hansen et al., 2019). The genotyping and imputation were performed by deCODE Genetics (Reykjavík, Iceland).

Genotyping and related quality control
Samples were genotyped on the Illumina GSA chip. Quality control for genotype data included: removing duplicate samples, samples failing a sex check (comparing sex determined from the genetic data with sample information), samples with >5 % missingness, markers with >10 % missingness, and markers with a Hardy-Weinberg equilibrium p-value <0.00001.
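To illustrate the Hardy-Weinberg filter, the sketch below tests one biallelic marker from its genotype counts. It uses the 1-degree-of-freedom chi-square approximation purely for illustration; the text does not say which test was used in the actual pipeline, and exact tests are common in practice. The function name and example counts are hypothetical.

```python
import math


def hwe_chisq_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples for this marker")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic markers).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


# A marker would be removed if its p-value falls below the threshold in the text.
HWE_THRESHOLD = 1e-5
fails_hwe = hwe_chisq_pvalue(500, 0, 500) < HWE_THRESHOLD  # no heterozygotes at all
```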

Imputation and related quality control
Phasing was performed with SHAPEIT4 (Delaneau et al., 2019), and the dataset was imputed using a reference set of whole-genome sequenced samples of various ethnicities called using GraphTyper (Eggertsson et al., 2017) and an in-house pipeline (Gudbjartsson et al., 2015). Following the imputation, markers with a minor allele frequency (MAF) <1 % were removed. Study-specific quality control steps included: removing markers with duplicate positions, non-autosomal markers, multiallelic markers, markers with INFO score <0.9, and markers with MAF <1 % within the final DLD case-control dataset used in this study.

Analyses of relatedness and ancestry
Using directly-genotyped markers, a relatedness check was performed using PLINK (Chang et al., 2015; Purcell et al., 2007) v2.00alpha20230109 with the --king-cutoff flag. We used a cutoff value for the kinship coefficient corresponding to second-degree relatives, 0.0884, such that one individual from any pair with a higher value was excluded. Also using this genotype dataset (with samples with >2.5 % missingness and markers with >5 % missingness or MAF <5 % removed, and with LD pruning, excluding regions of high LD), individuals were excluded if they were of divergent ancestry, as determined by a principal component analysis (PCA) with FlashPCA 2.0, retaining only unrelated individuals and using individuals from the 1000 Genomes project as a reference sample. A first set of Europeans was determined based on Mahalanobis distance metrics based on the first 10 PCs (5 SDs from the mean of the 1000 Genomes project EUR individuals). In a subsequent PCA done on the latter group, information on parental birthplace was used to select individuals whose parents were born in Denmark, identifying individuals within 5 standard deviations (SDs) from the mean of individuals with this information. Lastly, in another PCA, individuals within 5 SDs from the mean of this last group were defined as being of (narrow) European ancestry and retained for downstream analysis. An additional PCA within the group of unrelated Europeans was performed to generate PCs to be used as covariates in downstream analysis. After all steps of quality control and a merge with the phenotype data, the final sample included 931 DLD cases and 24,920 DLD controls, with data for 7,295,662 autosomal markers (genome build hg38). Please note that the above imputation and quality control procedures were applied to all individuals in the new data freeze (or the final dataset of cases and controls, where stated explicitly); furthermore, some participants might have withdrawn their consent since the publication of the first study and would have been subsequently removed from genetic analyses. Therefore, the genetic dataset used in this study includes more individuals than, but not necessarily all individuals within, the genetic dataset used in our previous study.
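The idea behind the relatedness filter, removing individuals until no remaining pair exceeds the kinship cutoff, can be sketched as follows. This is a simple greedy heuristic written for illustration and is not claimed to reproduce PLINK's own pruning algorithm; the function name and sample identifiers are hypothetical, and every identifier in the pair list is assumed to appear in the sample list.

```python
def prune_related(samples, kinship_pairs, cutoff=0.0884):
    """Greedily drop samples until no retained pair has kinship above the cutoff.

    samples: list of sample IDs.
    kinship_pairs: iterable of (id_a, id_b, kinship) tuples.
    """
    related = {s: set() for s in samples}
    for a, b, kinship in kinship_pairs:
        if kinship > cutoff:
            related[a].add(b)
            related[b].add(a)

    removed = set()
    while True:
        # Pick the sample with the most remaining close relatives.
        worst = max(related, key=lambda s: len(related[s]), default=None)
        if worst is None or not related[worst]:
            break  # no related pairs are left
        for other in related.pop(worst):
            related[other].discard(worst)
        removed.add(worst)
    return [s for s in samples if s not in removed]


kept = prune_related(
    ["A", "B", "C", "D"],
    [("A", "B", 0.25), ("B", "C", 0.12), ("C", "D", 0.01)],
)
```

In this toy example, removing the single individual related to two others is enough to leave an unrelated set, which is why dropping the most connected sample first tends to retain more individuals than removing an arbitrary member of each pair.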

Genome-wide association study
A genome-wide association study (GWAS) for DLD was performed with PLINK v1.90b7 using a logistic regression model (--logistic). The GWAS included covariates for sex, age at submission of the questionnaire and the first ten PCs. Manhattan and QQ plots for the GWAS results were generated with the "qqman" R script written by Stephen Turner.

Estimation of SNP-based heritability and genetic correlation
GCTA (Yang et al., 2011) v1.93.3 beta was used to create a genetic relationship matrix (GRM) from the imputed markers in the final dataset using --make-grm. The heritability of DLD was estimated using a restricted maximum likelihood model (--reml) using the GRM and the same covariates as in the GWAS. The heritability was converted from the observed scale to the liability scale for two possible prevalences: 7 % and 10 %, with the former being the prevalence reported for specific language impairment (SLI, a more restrictive diagnosis than DLD) (Tomblin et al., 1997) and the latter being reported for total language disorder in a population study (Norbury et al., 2016). LDSC (Bulik-Sullivan et al., 2015a, 2015b) v1.0.1 was used to calculate the heritability and genetic correlation with other phenotypes (using the --h2 and the --rg functions, respectively). The summary statistics from the GWAS were processed with munge_sumstats.py (after adding a column with the non-effect allele) and using a reference list of high-quality markers (https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2) with --merge-alleles, following the conversion of marker IDs to rsIDs based on a map of chromosome and position (hg38) downloaded from the Ensembl BioMart. The default filters for the processing of markers were used. The LD score dataset used was the European dataset from the LDSC website (https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2), which was used as both the reference and weight LD score dataset. In this part of the study we focused on traits and disorders related to spoken language, i.e. disorders which may involve deficits in verbal communication and traits that are related to the production, perception and processing of spoken language (for which GWAS summary statistics were available).
Genetic correlations with attention deficit/hyperactivity disorder (ADHD), autism spectrum disorder (ASD), and schizophrenia used summary statistics from large meta-analyses for these disorders in Europeans (Demontis et al., 2023; Grove et al., 2019; Trubetskoy et al., 2022). Summary statistics from GWAS of speech-in-noise (Liu et al., 2021) and voice pitch (Gisladottir et al., 2023) were also used, as were summary statistics for nonword repetition and phoneme awareness (in Europeans, without genomic control correction) (Eising et al., 2022). The summary statistics were processed with LDSC as described above, with the total sample size either taken from the summary statistics themselves or, if this information was not available, from the documentation. The effect allele was identified based on the documentation; otherwise, "A1" was assumed to be the effect allele. For the quantitative traits nonword repetition and phoneme awareness, the phenotypes represent the numbers of correct responses (Eising et al., 2022); voice pitch was measured as the median fundamental frequency (in Hz) during a reading task (Gisladottir et al., 2023); speech-in-noise was measured as the mean value for both ears of the signal-to-noise ratio (in dB) at which the participant is able to comprehend half of the spoken information (Liu et al., 2021). Any data transformations are described in the relevant papers. Genetic correlations were plotted in R v4.1.1 (R Core Team, 2014) using the ggplot2 package v3.3.6 (Wickham, 2016).

Results
The GCTA heritability estimate for DLD was h² = 0.040 (SE = 0.014, P = 1.27 × 10⁻³) on the observed scale, corresponding to h² = 0.272 on the liability scale for a population prevalence of 7 % and h² = 0.304 for a population prevalence of 10 %. With LDSC, the heritability was h² = 0.068 (SE = 0.017) on the observed scale, corresponding to 0.463 and 0.518 on the liability scale for the above two prevalence values, respectively. In all conversions the proportion of cases used was 3.6 %, based on the numbers of DLD cases and controls in this study.
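The conversion from the observed scale to the liability scale can be illustrated with the transformation commonly used for case-control data, in which the observed-scale estimate is multiplied by K²(1 − K)² / (z² P(1 − P)), where K is the population prevalence, P the proportion of cases in the sample and z the height of the standard normal density at the liability threshold. The sketch below assumes this is the transformation applied by the software; the function name is hypothetical, and the inputs are the rounded values reported above, so the outputs will only approximately match the reported liability-scale estimates.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, prevalence: float, case_fraction: float) -> float:
    """Convert an observed-scale heritability to the liability scale."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold for prevalence K
    z = std_normal.pdf(threshold)                     # normal density at the threshold
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Proportion of cases in this study: 931 cases and 24,920 controls.
CASE_FRACTION = 931 / (931 + 24920)

gcta_k07 = observed_to_liability(0.040, 0.07, CASE_FRACTION)
gcta_k10 = observed_to_liability(0.040, 0.10, CASE_FRACTION)
ldsc_k07 = observed_to_liability(0.068, 0.07, CASE_FRACTION)
ldsc_k10 = observed_to_liability(0.068, 0.10, CASE_FRACTION)
```

Because cases make up a smaller share of the sample (3.6 %) than of the assumed population (7 % or 10 %), the scaling factor is large, which is why a modest observed-scale estimate corresponds to a substantially higher liability-scale heritability.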
The GWAS did not yield genome-wide significant hits at the conventional level of P = 5 × 10⁻⁸. The top hit in this GWAS was rs4689973 at chr4:5,013,282 (OR = 1.368, SE(lnOR) = 0.058, P = 6.41 × 10⁻⁸, effect allele: T, other allele: C). The top hit in our previous GWAS, rs183826546, was not genome-wide significant in this GWAS (P = 4.95 × 10⁻⁵), but showed an association in the same direction as before (OR = 1.902). Manhattan and QQ plots can be found in Supplementary Figure S1.
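The relationship between an odds ratio, its standard error on the log scale and the association P-value can be sketched with a two-sided Wald test, assuming the test statistic is ln(OR) divided by its standard error and is compared against a standard normal distribution. The function name is hypothetical, and because the reported OR and SE are rounded, the recomputed P-value will be close to, but not identical to, the reported one.

```python
import math


def wald_test_from_or(odds_ratio: float, se_log_or: float):
    """Return the Wald z statistic and two-sided P-value for a log odds ratio."""
    z = math.log(odds_ratio) / se_log_or
    # Two-sided tail probability of a standard normal; erfc stays accurate for large |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


GENOME_WIDE_THRESHOLD = 5e-8

z_top, p_top = wald_test_from_or(1.368, 0.058)
is_genome_wide_significant = p_top < GENOME_WIDE_THRESHOLD
```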

The heritability of DLD
Estimating the heritability of DLD is important for understanding the contribution of genetic differences to differences in risk of DLD in the studied population. Estimating the heritability from molecular genetic data can give us an idea as to the involvement of common genetic variants in DLD. This study reports, to the best of our knowledge, the first significant heritability estimate for DLD based on molecular genetic data. Our GCTA estimates for the heritability of DLD of 27-30 % are in line with a recent twin study of DLD (Toseeb et al., 2022) but much lower than the estimate for parental-reported speech and language difficulties (excluding autism and intellectual disability) (Keijser et al., 2024); although the LDSC estimates were higher, they did not reach the estimates reported in the Keijser et al. study. A meta-analysis and review of previous studies of SLI, which has more restrictive criteria than DLD, pointed to a strong genetic component using twin concordances, with some language-related traits measured in children with language disorder even showing maximum heritability (Stromswold, 2001). However, a later study found that non-specific language impairment had a higher heritability (Hayiou-Thomas et al., 2005). Given that the SNP-based heritability estimates obtained with GCTA and LDSC are bounded above by the true heritability, the heritability of DLD could be higher. With increased sample sizes and a more precise phenotype (as our study was based only on self-reports), higher estimates could potentially be obtained. However, it should be emphasized that heritabilities estimated based on common genetic variants, as reported in our study, only include contributions from additive effects of common genetic variants that were included in the given study, and contributions from other types of variants are therefore not captured by them (Yang et al., 2017). Furthermore, twin studies may overestimate the heritability due to gene-gene interactions, gene-environment interactions, and/or violations of some of their assumptions (Young, 2019).

Genetic correlations between DLD and spoken language traits and related disorders
Our study did not identify significant genetic correlation between DLD and ASD, ADHD or schizophrenia, disorders in which spoken language and communication may be affected (Baird et al., 2000; DeLisi, 2001; Dover and Le Couteur, 2007). In two previous studies, which used polygenic scores, no significant genetic overlap (after correction for multiple testing) was found between SLI and ASD, ADHD or schizophrenia, although a very small significant overlap was found between SLI and childhood autism (with R² < 1 %) (Nudel et al., 2020, 2021). Our results with DLD are in line with the previous results, showing that DLD is likely to be etiologically distinct from these disorders, although some overlap may still be possible. In the case of ASD, it should be noted that, as DLD cases did not have a diagnosis of ASD as per the diagnostic guidelines (Bishop et al., 2017), this could have an effect on the observed genetic correlation between DLD and ASD (or lack thereof). As noted in the introduction, there is a debate in the literature as to the extent of the phenotypic overlap between DLD (SLI in early studies) and ASD, but there are reports of specific genetic overlaps between the disorders, such as the case of the CNTNAP2 gene (Vernes et al., 2008). It is important to note that the methods estimating genetic correlation as employed in this study take into account contributions only from common variants and rely on additive effects. Therefore, it is possible that other types of genetic overlap between DLD and ASD, such as gene-gene interactions, exist, a view that has been supported by simulation analyses (Bishop, 2010a), and that the same rare variants may yet be implicated in both disorders.
With regard to the quantitative traits included in our study, early accounts of SLI postulated a deficit in auditory processing (Tallal and Piercy, 1973a, b, 1974), and newer studies found differences in pitch in children with SLI compared to controls (Kiss et al., 2012; Sharma and Singh, 2020). However, no significant genetic correlations were found between DLD (which, by definition, includes SLI) and voice pitch or a measure of speech discrimination, suggesting that other factors may explain the phenotypic associations. Similarly, no significant genetic correlations were found between DLD and traits related to spoken language, namely nonword repetition and phoneme awareness. Nonword repetition (the ability to correctly repeat nonsense words that sound like words in the native language of the person being tested), which has been associated with SLI in early studies, is considered a measure of phonological short-term memory (Coady and Evans, 2008; Newbury et al., 2009; The SLI Consortium, 2002, 2004). This trait has also been suggested as a marker for SLI (Bishop et al., 1996). Even though the genetic correlation between DLD and nonword repetition was not significant in our study, it was the largest (in terms of absolute value) among all genetic correlations we obtained. The correlation estimate was negative, suggesting that DLD was associated with poorer performance on the nonword repetition task, as would be expected. The large standard error for the correlation is likely due, at least in part, to the small sample size for this trait in the discovery GWAS meta-analysis (around 12,800 individuals) (Eising et al., 2022). Phoneme awareness (awareness of the sound structure of the language, or of the small sound units of the language (phonemes)) has also been shown to be impaired in children with SLI in kindergarten and lower school grades (Catts et al., 2005). In this case, however, the genetic correlation with DLD was very close to zero. One possible explanation for these results could be related to the fact that the majority of the individuals in the discovery GWASs for these traits were not ascertained for DLD (or SLI); it has been shown, in the case of intellectual disability, that, while genetic factors influencing intelligence in the normal range could lead to mild intellectual disability (due to a particular combination of genetic variants), they are not likely to be responsible for severe intellectual disability (Reichenberg et al., 2016). In SLI, a similar phenomenon has been reported for variants in a specific gene, ATP2C2, where associations with nonword repetition were found only in children with SLI but not in the general population (Newbury et al., 2009).
Past investigations into the heritability of DLD and related conditions such as SLI have used different methods and obtained very different estimates, and, as some authors have put it, the picture for DLD, in the context of genetics, is likely more complex than for other neurodevelopmental disorders (Mountford et al., 2022). This study reported, to the best of our knowledge, the first significant SNP-based heritability estimate for DLD, providing further insight into the genetic basis of DLD and the genetic relationships between DLD and related disorders and traits.

Declarations
Ethics
DBDS has secured necessary permissions and approval from the Danish Data Protection Agency (P-2019-99), the Scientific Ethical Committee system in Central Denmark Region (1-10-72-95-13) and the Danish National Committee on Health Research Ethics (NVK-1700407). Blood donors are asked to participate and sign an informed consent when they visit the blood bank to donate blood.

Fig. 1. Genetic correlations (rg) between DLD and other traits/disorders. Error bars represent the standard errors of the rg estimates. ADHD: attention deficit/hyperactivity disorder; ASD: autism spectrum disorder; DLD: developmental language disorder; NWR: nonword repetition; PA: phoneme awareness; SCZ: schizophrenia; SIN: speech-in-noise; VP: voice pitch.

Table 1
Descriptive statistics for the sample included in this study.
DLD: developmental language disorder; SD: standard deviation. *Age at submission of the questionnaire.