Major Genetic Risk Factors for Dupuytren's Disease Are Inherited From Neandertals

Abstract Dupuytren's disease is characterized by fingers becoming permanently bent in a flexed position. Whereas people of African ancestry are rarely afflicted by Dupuytren's disease, up to ∼30% of men over 60 years suffer from this condition in northern Europe. Here, we meta-analyze 3 biobanks comprising 7,871 cases and 645,880 controls and find 61 genome-wide significant variants associated with Dupuytren's disease. We show that 3 of the 61 loci harbor alleles of Neandertal origin, including the second and third most strongly associated ones (P = 6.4 × 10−132 and P = 9.2 × 10−69, respectively). For the most strongly associated Neandertal variant, we identify EPDR1 as the causal gene. Dupuytren's disease is an example of how admixture with Neandertals has shaped regional differences in disease prevalence.


Introduction
Dupuytren's disease is a fibroproliferative disorder affecting the hand. Initially, nodules develop in the palmar fascia and thicken to form cords, leading to contractures causing fingers to become permanently bent in a flexed position. Although the condition can affect any finger, the ring and middle fingers are most often afflicted (Dutta et al. 2020). Several risk factors have been identified, including age, alcohol consumption, diabetes mellitus, and genetic predisposition (Rydberg et al. 2021). A Danish twin study reported 80% heritability (Larsen et al. 2015), indicating a strong genetic influence. One study estimated the prevalence of Dupuytren's disease among Norwegian individuals over 60 years to be ∼30% (Burge 1999). In contrast, Dupuytren's disease is less prevalent among individuals of primarily African descent. The geographic distribution has given Dupuytren's disease the nickname "Viking disease," although this designation has been opposed (Ng et al. 2019).
In addition to case reports of Dupuytren's disease in Africa, one study measured the prevalence in a comparable fashion between different ancestries as a part of the Million Veteran Program (Saboeiro et al. 2000). In this study, the prevalence was 0.73% (95% confidence interval [CI]: 0.72-0.75%) among individuals of primarily European descent and 0.13% (95% CI: 0.12-0.14%) among individuals of primarily African descent. We note that a recent study (Salari et al. 2020), that suggested a high African prevalence, erroneously assigned diabetic patients from Jordan (Mustafa et al. 2016) and a small cohort of admixed individuals inhabiting the remote island Tristan da Cunha (Beighton and Valkenburg 1974), as individuals of African ancestry.
Genome-wide association studies (GWASs) have identified 36 risk variants for Dupuytren's disease, several of which are located in the proximity of genes encoding members of the profibrotic Wnt signaling pathway (Dolmans et al. 2011, ten Dam et al. 2016. The very high prevalence of Dupuytren's disease in certain populations, as noted above, indicates that common genetic variants may be powerful modulators of disease risk. Here, we perform a meta-analysis incorporating data from 3 large-scale biobanks, including 7,871 individuals with Dupuytren's disease and 645,880 controls, and identify 21 novel genetic risk variants. The rarity of the disease in African populations led us to investigate if any of the common variants are inherited from Neandertals, given that gene flow from Neandertals is limited in individuals of African ancestry (Chen et al. 2020). We find 3 risk factors for MBE Dupuytren's disease inherited from Neandertals, 2 of which are the second and third major risk factors.

A Meta-analysis of Dupuytren's Disease Among 653,741 Europeans
We used 3 biobanks with individuals of primarily European descent, the UK Biobank (3,488 cases, 377,173 controls), FinnGen R7 (4,051 cases and 231,460 controls), and Michigan Genomics Initiative (MGI) freeze 3 (332 cases and 37,247 controls), to identify genetic risk variants for Dupuytren's disease. Analyzing the biobanks separately, we find 37 (UK Biobank), 19 (FinnGen), and 2 (MGI) haplotypes with genome-wide significance. To increase the statistical power, we performed a meta-analysis of the 3 biobanks based on single nucleotide variants present in all biobanks (N = 12,940,999 variants). To control for false positives, we calculated the genomic inflation factor which was found to be reasonable (λ = 1.10; supplementary fig. S1, Supplementary Material online). In total, we find 61 genetic loci of genome-wide significance ( fig. 1; supplementary table S1, Supplementary Material online), out of which 24 are not significant in any biobank singly. The meta-analysis identifies 21 novel risk loci in addition to the ones previously identified by 2 GWASs (Dolmans et al. 2011, Ng et al. 2017).

Neandertal Variants Increase the Risk for Dupuytren's Disease
First, we investigate the prevalence of Dupuytren's disease in individuals of European and African descent. In the MGI biobank, we find a Dupuytren's disease prevalence of 0.49% (95% CI: 0.26-0.83%) among individuals of primarily African ancestry, whereas the corresponding prevalence among individuals of primarily European ancestry is 1.37% (95% CI: 1.25-1.48%; t-test, 2-tailed, P = 0.00012). Similarly, a lower prevalence of Dupuytren's disease has been reported in the Million Veteran Program (Saboeiro et al. 2000) among individuals of primarily African ancestry (0.13%, 95% CI: 0.12-0.14%) compared with individuals of primarily European descent (0.73%, 95% CI: 0.72-0.75%). Taken together, European ancestry is associated with a higher risk for Dupuytren's disease than African ancestry (supplementary fig. S2, Supplementary Material online).
Given the rarity of Dupuytren's contracture among individuals of primarily African ancestry, we analyzed if any of the 61 risk loci harbor alleles inherited from Neandertals. We devised a novel algorithm, building on a recently described method (Chen et al. 2020), to identify archaic tracts in modern humans (see Materials and Methods). In brief, we identified haplotypes similar to the genomes of Neandertals and Denisovans (collectively referred to as "archaic humans"), now extinct human forms that lived in Western and Eastern Eurasia, respectively, until at least approximately ∼42,000 years ago (Devièse et al. 2021). We investigated whether such haplotypes overlapped with the most significant variant in each locus associated with Dupuytren's contracture and whether the most significant variant cosegregated with archaic haplotypes. Present-day genomes contain haplotypes that are similar to archaic genomes both because they inherited haplotypes from populations ancestral to both modern humans and archaic humans and because modern humans interbred with archaic groups when they met less than 100,000 years ago (Sankararaman et al. 2012). In order to restrict the analyses to recently introgressed archaic variants, we only considered haplotypes longer than 0.023 cM, which corresponds to a significance threshold of P < 0.05 for being derived from the ancestral population (adjusted for multiple comparisons; see Materials and Methods).
Next, we evaluate if these Neandertal-derived haplotypes are associated with an increased risk, or MBE protection against Dupuytren's disease. We find that all 3 Neandertal haplotypes are associated with an increased risk for Dupuytren's disease (table 1). Note, however, that for the chromosome 8 locus the G allele tagging the introgressed haplotypes in this locus is not observed in the sequenced Neandertal genomes. This variant is likely to represent an haplotype not yet sequenced from Neandertals. Another less likely possibility is that it is the result of a mutation early after introgression. Assuming a simple additive model, the combined odds ratio of carrying all 3 Neandertal risk alleles is 2.83 (95% CI = 2.62-3.05).
In a previous genome-wide scan (Racimo et al. 2017) the Neandertal haplotype associated with Dupuytren's disease on chromosome 8 has been identified as positively selected. It remains to be elucidated, however, what putatively positively selected trait this haplotypes might influence. In FinnGen and UK Biobank, this polymorphism is not associated with any trait other than Dupuytren's disease.
A rough calculation suggests that 8.4% of the heritability (h 2 ) explained by the 61 genome-wide hits in our meta-analysis can be attributed to the 3 genetic risk factors of Neandertal origin (see Materials and Methods). In line with this, we find that the difference in polygenic risk score between cases and controls for Dupuytren's disease (see Materials and Methods) is reduced if the 3 loci harboring Neandertal haplotypes are removed from the analysis (supplementary fig. S3, Supplementary Material online).

Functional Features of the Major Neandertal Risk Factor
In an attempt to identify putative causal genetic variants of the strongest risk factor inherited from Neandertals ( fig. 2), we perform a fine-mapping analysis of the risk locus. Plotting the variant association strengths, corrected for error sizes (i.e., Z-scores), for Dupuytren's disease against linkage disequilibrium with the strongest associated variant in the locus, we observe a straight line ( fig. 3A). This suggests that the weakly associated variants gain their association with the phenotype by cosegregating with more strongly associated variants. Thus, the assumption of one causal variant is parsimonious. Based on this assumption, a fine-mapping analysis identifies 9 single-nucleotide polymorphisms (SNPs) in the 95% credible set ( fig. 3B; table 2; see Materials and Methods), of which the 2 most significant variants are most impactful based on in silico effect prediction (Rentzsch et al. 2019; Combined Annotation Dependent Depletion scores of 7.3 and 10.0, respectively). We note that both of these variants fall within an open chromatin region in dermal fibroblasts (chr7:37,984,714-37,985,157, hg38;ENSR00000211077, Zerbino et al. 2015), whereas the other 7 variants do not. Of these 2 putative causal SNPs, the risk allele is ancestral for one of the SNPs (rs17171240) and for the other SNP it is derived among Neandertals and Denisovans (rs2044830, chr7:37,984,802G/A, hg38). These 2 SNPs are in perfect linkage disequilibrium among Europeans in the 1000 Genomes Project (Auton et al. 2015).
None of the alleles in the 95% credible set fall within an exon. We therefore explored possible effects on mRNA splicing and expression using the genotype-tissue expression project (GTEx; Lonsdale et al. 2013). We find that the strongest risk factor inherited from Neandertals is associated with altered splicing of EPDR1 transcripts in 3 tissues relevant for Dupuytren's disease; muscle (P = 1.7 × 10 −23 ), adipose tissue (P = 1.1 × 10 −11 ), and cultured fibroblasts (P = 5.9 × 10 −8 ; fig. 4A).
For the canonical transcript of EPDR1 (NM_017549.5), the region chr7:37,948,840-37,949,048 (hg38) encodes the second exon. In the GTEx data, we find that the strongest risk factor inherited from Neandertals is associated with the region chr7:37,921,208-37,950,200 (hg38) being  Archaic-like haplotypes cosegregating with risk variants (r 2 > 0.5 among Europeans) with a length of > 0.023 cM (see Materials and Methods). Odds ratios (OR) for association with Dupuytren's disease are derived from the meta-analysis of UK Biobank, FinnGen, and MGI. Coordinates shown in hg38.

Discussion
There are geographical differences in the amount and type of archaic ancestry. People from Africa south of Sahara have little, or minimal, ancestry from Neandertals or Denisovans, whereas people with roots outside of Africa have inherited 1-2% of their genome from Neandertals (Chen et al. 2020). Denisovan ancestry is found in East Asian and Oceanian populations, where some populations in Oceania have up to ∼5% of Denisovan ancestry (Larena et al. 2021). Given these regional differences, it is likely that archaic gene variants contribute to some phenotypes that are primarily seen only in certain populations. Here, we focus on one such regional phenotype, Dupuytren's disease.
The present meta-analysis identifies 61 independent genetic risk variants associated with Dupuytren's disease. We find that the second most important genetic risk factor located on chromosome 7, which confers an odds ratio of ∼1.8 (P = 6.4 × 10 −132 ; fig. 3), is of Neandertal origin. By analyzing mRNA data from relevant tissues (muscle, adipose tissue, and fibroblasts), we find that the genetic risk factor is associated with a splicing variant of EPDR1 resulting in an inferred truncation of the protein. Previous work has suggested the gene SFRP4 as the causal agent in this locus (Ng et al. 2017, Jin et al. 2022, although EPDR1 has also been implicated in Dupuytren's disease susceptibility (Staats et al. 2016). The present data suggest EPDR1 as a causal gene. EPDR1 is a lysosomal protein and recent structural determination using X-ray crystallography suggests that the functional protein complex is a dimer and contains a lipid binding site (Wei et al. 2019), which are disrupted by the alternative splicing caused by the Neandertal haplotype. Although the exact pathophysiological mechanism for how the truncation of EPDR1 might contribute to the development of Dupuytren's disease deserves further investigation, it is worth noting that this protein is suggested to be involved in myofibroblast contractility (Staats et al. 2016).
Fine-mapping of the Neandertal haplotype suggests 2 equally likely causal variants located in an open chromatin region separated by only 170 base pairs (rs17171240, chr7:37,984,972G/A and rs2044830, chr7:37,984,802G/A, hg38). These variants cosegregate in modern humans and the Neandertal state of them are associated with an increase in risk. Given that one of the SNPs harbors an ancestral risk allele and the other a Neandertal-derived risk allele, we can only speculate as to whether the risk for Dupuytren's disease for this locus partly represents an ancestral state or a phenotype that emerged among Neandertals.
In addition to the risk locus on chromosome 7 (rs17171240), which is the second-most strongly associated variant in our meta-analysis, we find 2 additional loci where the risk allele is inherited from Neandertals,  Assuming that Neandertal and modern human haplotypes have equal effects and frequencies, the expected Neandertal heritability to a genetic trait is reflected by the average Neandertal ancestry among Europeans (∼2%). Under such a scenario, Dupuytren's disease is approximately 4 times more influenced by Neandertal gene flow than expected (8.4% vs. ∼2%). However, Neandertal haplotypes typically have substantially lower frequencies than averages modern human haplotypes. The average minor allele frequency in the GWAS catalogue is 26.6% (Auton et al. 2015), similar to the 24.6% minor allele frequency for the loci not harboring Neandertal haplotypes in our meta-analysis. Assuming that half of the genome contains gene flow from Neandertals, Neandertal haplotypes have an average frequency of ∼4%. Using this allele frequency for the loci harboring Neandertal haplotypes and the average catalogue frequency for the rest of loci, we find that the expected heritability explained by Neandertal admixture is ∼0.4% (see Materials and Methods). Under these assumptions, the observed influence of Neandertal ancestry on Dupuytren's disease is ∼20-fold larger than expected. Taken together, these calculations suggest that admixture with Neandertals has a substantial impact on the prevalence of Dupuytren's disease in Europe.

Meta-analysis
For the meta-analysis, we intersected SNPs present in the UK Biobank (N = 56,936,751),FinnGen (N = 16,383,194),and MGI (N = 39,981,535), resulting in 12,940,999 unique variants present in all 3 biobanks. We meta-analyzed the effects in each of the studies using the inverse-variance weighted method and fixed-effects (Balduzzi et al. 2019). To avoid floating point errors for P < 10 −300 , numerical approximation was performed based on Z-values using NIntegrate in Mathematica 12 (Wolfram Research, Inc., Champaign, IL). Loci with genome-wide significant (P < 5 × 10 −8 ) association with Dupuytren's disease were identified using LocusZoom (Pruim et al. 2010, Boughton et al. 2021).

Gauging Neandertal Ancestry in the Genetic Predisposition to Dupuytren's Disease
Given that only about half of any archaic genome is mapped, we did not require the most significant variant in each locus to be present in an archaic genome and missing in some modern human reference panel with minimal archaic ancestry, as we have done previously (Zeberg et al. 2020a(Zeberg et al. , 2020b. Instead, we investigated to what extent the most significant polymorphism in each locus cosegregated with overlapping archaic tracts. We used the software IBDmix (Chen et al. 2020) to identify archaic tracts, using 1 of the 4 high-coverage archaic genomes available (Meyer et  Genetic Risk Factors for Dupuytren's Disease · https://doi.org/10.1093/molbev/msad130 MBE included in the 1000 Genomes Project (Auton et al. 2015). IBDmix was run with default parameters, except for a more conservative LOD-score threshold of 4, as has been suggested (Chen et al. 2020). To determine if archaic tracts were homozygously present, we used bcftools v1.14 to call runs of homozygosity. PLINK v1.9 was used to

MBE
determine the linkage disequilibrium (unphased r 2 ) between a given variant and overlapping archaic haplotypes. To exclude that an archaic-like sequence originates from the common ancestor, we calculated the minimal genetic length that would exclude incomplete lineage sorting. We used a mathematical framework based on previous work (Huerta-Sánchez et al. 2014), but in contrast to performing the calculations on the physical length and using years, we directly used genetic length and branch lengths in generations, as follows. Let L be the expected genetic length of a shared ancestral sequence given by the inverse of the total branch length. The number of generations since the common ancestor of Neandertals and modern humans has been estimated to ∼21,500 (Sjödin et al. 2021) and the archaic admixture took place ∼2,000 generations ago (Sankararaman et al. 2012). Thus, the total branch length for a recent gene-flow model is 2 × 21,500-2,000 = 41,000 generations, and therefore the expected genetic length of a shared sequence is L = 1/(41,000 generations) = 1/410 cM. Conditioning on observing the archaic like tract on both branches, the probability of a length of at least m follows a Gamma distribution with shape parameter 2, and rate parameter = 1/L. We required Bonferroni correction for the number of loci tested, giving a significance threshold of P = 0.05/61. Finally, we solved numerically the equation 1-GammaCDF(m, shape = 2, rate = 410) = 0.05/61, where GammaCDF is the cumulative gamma distribution, yielding m = 0.023 cM. Hence, only tracts longer than 0.023 cM were considered significant for gene flow.
A limitation with using genetic distances is that the historical recombination map is largely unknown, particularly for archaic populations. A recombination map estimated from African populations, however, may be more accurate given that the vast majority of the modern human branch predates the major out-of-Africa exodus. Moreover, out-of-Africa populations differ in recombination rates due to drift that occurred in the out-of-Africa bottleneck (e.g., due to alleles in PRDM9; Paigen and Petkov 2018). We decided to use a recent map based on 2,046 unrelated individuals of African ancestry from the Jackson Heart Study (Zhou et al. 2020). For control purposes, we iterated the analysis using recombination maps from the Framingham Heart Study (Americans; Zhou et al. 2020) and deCODE (Icelandic;Halldorsson et al. 2019). At the end of the chromosomes where genetic distances were missing, we assumed a recombination rate of 1 cM/Mb (i.e., a cutoff of 23 kb).

Fine-mapping Analysis
We performed a Bayesian fine-mapping using LocusZoom (Pruim et al. 2010, Boughton et al. 2021. For this analysis, we restricted the haplotype-based recombination hotspots derived from the 1000 Genomes Project (hg19) which were lifted over to hg38 coordinates using the University of California, Santa Cruz, Genome Browser (https://genome.ucsc.edu/). To investigate the ancestral state among the putative causal SNPs, we obtained the ancestral alleles from Ensembl (Cunningham et al. 2022).

MBE
Scotland, Iberian in Spain, Toscani in Italia, and Utah Residents with Northern and Western European Ancestry) were used as references to reflect the included biobank participants.

Heritability Calculations and Polygenic Risk Scoring
When estimating heritability for genetic variants for casecontrol studies, a linear transformation exists such that h 2 ≈ k × f × (1−f) × β, where β is the effect estimate, f the allele frequency and k a scaling factor dependent on the population, sample prevalence and the liability threshold (Lee et al. 2011). We calculated the sum of h 2 for the most significant variant in all 61 genome-wide loci (h 2 total ) and the sum for the 3 variants inherited from Neandertals, h 2 Neandertals . Dividing h 2 Neandertals with h 2 total (thereby canceling k) results in that 8.4% of the heritability explained by the 61 genome variants could be attributed to Neandertal gene flow. For the expected Neandertal contribution to any genetic trait, we used the same formulae as above assuming 2% of the loci harbor Neandertal haplotypes and using various assumptions regarding the allele frequencies (see Discussion).
Polygenic risk score was calculated per group or individual as a sum of effect sizes per risk allele multiplied by the number of risk alleles. First, we calculated the polygenic risk score for all loci associated with Dupuytren's disease by using allele frequency data for risk alleles based on cases and controls in the FinnGen biobank. Next, we calculated polygenic risk scores per population in the 1000 Genomes Project (Auton et al. 2015) based on African, European, East Asian, and South Asian populations. The polygenic risk score probability distribution was determined for all 61 associated risk variants and compared with the corresponding distribution in the absence of the 3 Neandertal-derived risk loci.

Biobank Ethics Statements and Biobank Materials and Methods (FinnGen and MGI)
Patients and control subjects in FinnGen provided informed consent for biobank research, based on the Finnish Biobank Act. Alternatively, separate research cohorts, collected prior the Finnish Biobank Act came into effect (in September 2013) and start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by Fimea (Finnish Medicines Agency), the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea.

MGI Cohort
The MGI comprises genome-wide data and electronic health information for patients recruited via Michigan Medicine (Zawistowski et al. 2023). Informed written consent was obtained from each participant, who donated a blood sample for genetic analysis and underwent a comprehensive medical assessment. Data were collected according to Declaration of Helsinki principles. Written consent forms and study protocols were reviewed and approved by the University of Michigan Medical School Institutional Review Board.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.