Microdeletions and microduplications linked to severe congenital disorders in infertile men

Data on the clinical validity of DNA copy number variants (CNVs) in spermatogenic failure (SPGF) is limited. This study analyzed the genome-wide CNV profile in 215 men with idiopathic SPGF and 62 normozoospermic fertile men, recruited at the Andrology Clinic, Tartu University Hospital, Estonia. A two-fold higher representation of > 1 Mb CNVs was observed in men with SPGF (13%, n = 28) compared to controls (6.5%, n = 4). Seven patients with SPGF were identified as carriers of microdeletions (1q21.1; 2.4 Mb) or microduplications (3p26.3, 1.1 Mb; 7p22.3-p22.2, 1.56 Mb; 10q11.22, 1.42 Mb, three cases; Xp22.33; 2.3 Mb) linked to severe congenital conditions. Large autosomal CNV carriers had oligozoospermia, reduced or low-normal bitesticular volume (22–28 ml). The 7p22.3-p22.2 microduplication carrier presented mild intellectual disability, neuropsychiatric problems, and short stature. The Xp22.33 duplication at the PAR1/non-PAR boundary, previously linked to uterine agenesis, was detected in a patient with non-obstructive azoospermia. A novel recurrent intragenic deletion in testis-specific LRRC69 was significantly overrepresented in patients with SPGF compared to the general population (3.3% vs. 0.85%; χ2 test, OR = 3.9 [95% CI 1.8–8.4], P = 0.0001). Assessment of clinically valid CNVs in patients with SPGF will improve their management and counselling for general and reproductive health, including risk of miscarriage and congenital disorders in future offspring.

Male factor infertility is a prevalent health condition (5-10% of men) with broad etiologies, clinical and social consequences 1 . Known genetic causes (e.g., 47, XXY karyotype) explain < 10% of male infertility cases, whereas every second patient remains idiopathic 2 . Copy number variants (CNVs) refer to locus deletions or duplications that span from single genes to large genomic segments. CNVs may modulate the process of spermatogenesis through altered gene dosage effect, impaired homologous recombination, and/or genomic instability leading to errors in chromosomal segregation. So far, data on the role and clinical validity of CNVs predisposing to male infertility is limited 3 . The only CNVs included in the diagnostic workup of men with spermatogenic failure (SPGF) are sporadic microdeletions encompassing three Azoospermia Factor regions (AZFa, AZFb or AZFc) located in the Y chromosome and mostly leading to complete lack of mature sperm 2 . Recently, a novel Y-haplogroup specific inversion was described that predisposes to recurrent AZFc deletions and consequently, to SPGF 4 . Also some rare autosomal and X-linked CNVs have been confidently shown to cause male infertility phenotypes, such as heterozygous loss of WT1 and congenital genitourinary disorders 5 , hemizygous deletions of TEX11 and meiotic arrest 6 , biallelic loss of DPY19L2 and globozoospermia 7 . A multi-center study of patients with azoospermia has reported rare non-recurrent deletions in the DMRT1 genomic region involved in sex development 8 .
A decade ago, two seminal studies profiling genome-wide CNVs in patients with SPGF reached concordant results that the average number and total load of DNA gain/loss per genome was similar to fertile men and the general population 8,9 . The few reports published on this topic across 10 years have not identified any additional replicated candidate infertility-linked CNVs [8][9][10][11][12][13] . Therefore, more investigations are needed to fill the gaps in this underexplored field.
Calling and analysis of CNVs. Blood genomic DNA was genotyped using Illumina HumanOmniExpress-24-v1.0/v1.1 BeadChips at the institutional genotyping core facility (https://genomics.ut.ee/en/genomics-corefacility). CNVs were called based on 705,754 SNPs present at the same genomic locations on both genotyping array versions. The established pipeline for autosomal CNV calling was used 17,18 , based on parallel implementation of three CNV prediction algorithms: QuantiSNP v2.3 19 , GADA (Genome Alteration Detection Analysis), and CNstream 20 (Supplementary Methods). The HD-CNV algorithm 21 was implemented to identify overlapping CNVs called by alternative prediction tools. A criterion of 40% reciprocal overlap in the predicted region (minimal length of 100 bp) for the same type of event (deletion or duplication) was used to define confident CNVs. All CNVs called by at least two algorithms for the same individual were included in the final list (Supplementary Table S1). A large microduplication at 7p22.3-p22.2 (Case SO1) was validated by array-CGH and the carriership of an LRRC69 intragenic deletion among 215 SPGF cases was confirmed by TaqMan qPCR (Supplementary Methods).
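The reciprocal-overlap criterion described above can be illustrated with a minimal Python sketch. It is not the HD-CNV implementation or the authors' pipeline. The `Call` record, the function names, and the example coordinates are hypothetical, and the 100 bp minimum is interpreted here as the length of the shared region, which is an assumption.

```python
from itertools import combinations
from typing import NamedTuple


class Call(NamedTuple):
    """One predicted CNV (hypothetical record layout, for illustration only)."""
    tool: str   # name of the prediction algorithm
    chrom: str
    start: int  # inclusive start position (bp)
    end: int    # inclusive end position (bp)
    kind: str   # "del" or "dup"


def shared_length(a: Call, b: Call) -> int:
    """Number of base pairs covered by both calls (0 if they do not overlap)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Smaller of the two overlap fractions; 0.0 for different event types."""
    if a.kind != b.kind:
        return 0.0
    shared = shared_length(a, b)
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def confident_pairs(calls, min_overlap=0.40, min_len=100):
    """Pairs of calls from different tools that satisfy both thresholds.

    Assumption: min_len applies to the shared region, not to each call.
    """
    return [
        (a, b)
        for a, b in combinations(calls, 2)
        if a.tool != b.tool
        and shared_length(a, b) >= min_len
        and reciprocal_overlap(a, b) >= min_overlap
    ]


# Made-up coordinates for one individual.
calls = [
    Call("QuantiSNP", "chr8", 1000, 2000, "del"),
    Call("GADA", "chr8", 1500, 2400, "del"),
    Call("CNstream", "chr8", 1900, 5000, "del"),
]
for a, b in confident_pairs(calls):
    print(a.tool, b.tool, round(reciprocal_overlap(a, b), 3))
```

Taking the minimum of the two fractions means a short call nested inside a much longer one does not count as concordant, which is the point of a reciprocal rather than one-sided overlap rule.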
X chromosome CNVs were detected using the Illumina GenomeStudio 2.0.5 built-in CNV analysis tool with default settings (cnvPartition CNV Analysis Plugin v3.2.0), as algorithms used for autosomal CNV calling tend 24 . Recent literature reviews were used to assemble the list of genes implicated in monogenic forms of male infertility 25,26 .
Statistical analysis. Statistical analyses were performed using the R Statistical Software version [3.5.4] (http://www.r-project.org/). A two-tailed Student t-test was used to assess the differences between clinical subgroups in the size and count of CNVs, and the χ2 test to compare the distribution of deletions and duplications. A nominal P value < 0.05 was considered statistically significant.
Enrichment of the LRRC69 deletion in SPGF was tested against the CNV dataset of 45,390 population-based subjects 27,28 recruited to the Estonian Biobank (Supplementary Methods).
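As a rough check on the arithmetic behind such an enrichment test, the sketch below computes an odds ratio with a Wald (Woolf) 95% confidence interval from a 2 × 2 table. The function name is hypothetical, and the interval method is an assumption because the text does not state which one was used. The number of population carriers is not given in the text; it is back-calculated here from the reported 0.85% of 45,390 subjects and should be treated as an approximation.

```python
import math


def odds_ratio_ci(case_carriers, case_total, pop_carriers, pop_total, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table."""
    a = case_carriers
    b = case_total - case_carriers
    c = pop_carriers
    d = pop_total - pop_carriers
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# 7 of 215 SPGF cases carried the deletion (from the text).
# ASSUMPTION: population carrier count reconstructed from 0.85% of 45,390.
POP_TOTAL = 45_390
POP_CARRIERS_ASSUMED = round(0.0085 * POP_TOTAL)

or_, lo, hi = odds_ratio_ci(7, 215, POP_CARRIERS_ASSUMED, POP_TOTAL)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Under this reconstructed count, the estimate and interval should land close to the reported OR of 3.9 with a 95% CI of 1.8–8.4. Small differences are expected because the exact carrier count is unknown.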

Results
Profile of autosomal deletions and duplications identified in the study group. Across 277 study subjects (Table 1), 2026 autosomal CNVs were identified in total (median size 33 kb, range 185 bp-2.4 Mb; Table 2, Supplementary Table S1). Losses were mapped to 697 and gains to 512 unique loci (Fig. S1-2). The median length of a duplication exceeded the median span of a deletion more than fourfold (86.2 vs. 19.6 kb; Student t-test, P = 2.4 × 10⁻¹⁷). The most extensive rearrangements were detected on chromosome 16 (mean 911 bp/Mb per subject, Fig. S3). No significant differences were observed in the median number and load of CNVs per subject among subgroups stratified by spermatogenic output (Fig. 1, Table 2, Fig. S4). Trends for a longer cumulative span of deletions (median 191 vs. 160 kb; P < 0.05) and a shorter span of duplications (292 vs. 399 kb; P < 0.05) were observed in patients with NOA compared to normozoospermic men. The summary data of genome-wide CNVs were consistent with previous reports [8][9][10][11] .
A NOA patient with a 2.3 Mb X chromosome duplication encompassing the PAR1/non-PAR boundary. X chromosome CNVs (median 128.8 kb; range 14-2298 kb) were mapped to 31 unique regions (Supplementary Tables S1, S3). No enrichment was identified in men with SPGF compared to NORM subjects.
Patient NOA9 carried a 2.3 Mb duplication at Xp22.33 (Fig. 1, Tables 3, 4, Supplementary Table S2). Interestingly, he presented with tall stature (191 cm) and a high FSH level (21.1 IU/L, aged 28 years), both of which are also typical of patients with Klinefelter syndrome (mostly presenting NOA) caused by extra copies of the X chromosome 38 . However, his bitesticular volume (45 ml) and testosterone level (17.4 nmol/l) were normal. The identified DNA gain encompasses the boundary between the pseudoautosomal (PAR1, ~ 1 Mb) and non-PAR regions (~ 1.3 Mb) and represents a rearrangement-prone locus affecting recombination between the X and Y chromosomes 39 . This variant has been reported in a female patient with vaginal and uterine agenesis 36 .
One recurrent X chromosome variant, a ~ 90 kb deletion encompassing ZNF630 at Xp11.23, was detected only in cases with SPGF (one NOA, two SO). Although its allele frequency in this study and in a large multicenter dataset of cases with NOA 8 was similar (1.4% and 1.6%, respectively), this deletion has been reported with a 1.2% prevalence in the general population (gnomAD SVs v2.1: variant: DEL_X_186083) and is therefore unlikely to be a major contributor to SPGF.
Table S2). None of the men carrying large CNVs in pericentromeric regions presented severe congenital or chronic health conditions apart from SPGF.
A recurrent deletion in the pericentromeric region of 16p11.2 (2 Mb) was detected in 12 of 215 SPGF patients (median sperm count 2.9 × 10⁶/ejaculate) but was not identified in NORM subjects (Fisher exact test, P = 0.07). The frequency observed in cases with SPGF exceeded the reported population prevalence fourfold (5.6% vs. 1.4%; Fig. 2a, Supplementary Table S4). The 16p11.2 deletion encompasses the hominid-specific TP53TG3 cluster, which comprises six duplicated genes with enriched expression in the epididymis, spermatocytes, and early and late spermatids.
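The case-control comparison for the 16p11.2 deletion can be reproduced approximately with a two-sided Fisher exact test. The sketch below is a plain-Python illustration rather than the R routine used in the study. The function name is hypothetical, and the two-sided P value is defined as the sum of all table probabilities that do not exceed the observed one, which is one common convention. The counts assume 12 carriers among 215 SPGF cases and none among the 62 NORM subjects.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 carriers fall in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = prob(a)
    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(x_min, x_max + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: SPGF cases, NORM subjects; columns: carriers, non-carriers.
p_value = fisher_exact_two_sided(12, 215 - 12, 0, 62)
print(f"P = {p_value:.3f}")
```

With these counts the result should be close to the reported P = 0.07, which is above the nominal 0.05 threshold used in the study.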
A recurrent partial deletion of the testis-specific LRRC69 gene is significantly associated with SPGF. Six smaller autosomal deletions (median 41.0 kb, range 1.9-191.6 kb) and six duplications (median 50.3 kb, range 16.5-341.3 kb) (all heterozygous) were identified that were carried by four or more patients with SPGF but not by any NORM subjects (Fig. 2a, Supplementary Table S4). No recurrent CNVs were overrepresented in a specific SPGF subgroup.
A heterozygous partial LRRC69 gene deletion (52-69 kb) that removes four exons of its alternative transcript (ENST00000448384.2) was detected and validated using a locus-specific assay in seven patients with SPGF, including four cases with NOA (Methods, Fig. 2b, Supplementary Table S5). This previously undescribed testis-enhanced gene is expressed in early and late spermatids 41 , but there are no data on the phenotypic consequences of its reduced dosage. A highly significant enrichment of the LRRC69 deletion was observed among cases with SPGF compared to population-based participants in the Estonian Biobank (EstBB) 27 (3.3% vs. 0.85%; χ2 test, OR = 3.9 [95% CI 1.8–8.4], P = 0.0001). Other gene deletions were ranked as low priority due to their high population prevalence (AADAC, CSMD1, QRFPR loci; 1.5-6.4%) or unlikely major involvement of lost gene copies in SPGF (TMTC2, PSG cluster). All recurrent duplications but one (ABCC4) were mapped to subtelomeric regions, with no present evidence that increased dosage of the encompassed genes would affect spermatogenesis.
CNVs involving genes linked to monogenic male infertility are not enriched in cases with SPGF. Nine heterozygous deletions (three independent loci) and two duplications (two loci) were identified encompassing genes linked to monogenic infertility (Supplementary Table S7). Most involved genes were linked to disorders with autosomal recessive inheritance. One patient with NOA carried four copies of PLXNA1,

Discussion
This study analyzed the genome-wide CNV profile in 215 patients with idiopathic SPGF and 62 normozoospermic fertile men. A considerable number of patients with SPGF (n = 28; 13%) carried large deletions and duplications spanning over 1 Mb, whereas the respective carrier frequency in the normozoospermic fertile men was two times lower (n = 4, 6.5%). The proportion of large CNVs in infertile men is in a similar range as reported for patients with autism 43 or DD 44 . In comparison, a study of 7,877 Estonian Biobank participants identified only 2% of subjects as carriers of > 1 Mb CNVs 27 .
As the main outcome of this study, seven patients with SPGF (~ 3.3%) were identified as undiagnosed carriers of microdeletions (1q21.1) or microduplications (3p26.3, 7p22.3-p22.2, 10q11.22, Xp22.33) linked to severe congenital developmental conditions (Table 3). This is comparable to the prevalence of Y chromosome AZFa-c microdeletions (2-10%) currently included in the diagnostic pipeline for SPGF 2,45,46 . Clinical data of patients with large autosomal DNA gains/losses support congenital testicular maldevelopment (bitesticular volume < 30 ml, FSH > 12 IU/l) as their primary cause of SPGF (
Intronic CSMD1 deletions (indicated by #) have been proposed as candidate contributors to male infertility 8 , but in this study the detected prevalence in cases with SPGF was lower than reported in the general population. Details are presented in Supplementary Table S4. (B) The major protein-coding transcripts of the LRRC69 gene expressed in the human testis according to the GTEx database 40 . The identified LRRC69 partial deletion is shown with a red box. Genomic coordinates of the minimal deleted region (52,375 bp) are chr8:92,128,840-92,181,214 (hg19). TPM, transcripts per million. (C) Statistically significant association between LRRC69 intragenic deletions among cases with SPGF compared to population-based participants in the Estonian Biobank (EstBB) cohort 27,28 . A χ2 test was used to test the difference between the two groups.
www.nature.com/scientificreports/
(total sperm count 0.2 × 10⁶) with the non-recurrent 7p22.3-p22.2 microduplication also presented mild ID, neuropsychiatric problems, and short stature. To our knowledge, this is the first clinical description of an adult subject with this variant, so far reported in pediatric patients with a severe syndromic DD phenotype 31 . One patient with SPGF was identified as a carrier of the recurrent 1q21.1 microdeletion spanning 2.4 Mb (Fig. 1).
The prevalence of undiagnosed cases carrying this large CNV has been reported as 1/2,626 in the Estonian population-based study 27 . The 1q21.1 microdeletion is listed in the DECIPHER 22 database of developmental disorders and characterized by incomplete penetrance and variable expressivity, including infertility and cryptorchidism in some male carriers 29,30 . Unfortunately, general health data of the case in our study were unavailable. Inheritance from healthy parents and incomplete penetrance have also been evidenced for the 3p26.3 microduplication 34 , whereas other identified non-recurrent microduplications have been reported mostly as de novo events with no penetrance estimates (Table 3). Non-recurrent microduplications at 10q11.22 have been described in singleton cases with pediatric-onset epilepsy and ID 32,33 , but this study identified three carriers among SPGF cases. Their andrological data resembled the phenotype of the 1q21.1 deletion carrier: oligozoospermia, lower bitesticular volume compared to controls, and increased FSH and BMI (Table 4). The variant spans two candidate genes with testis-specific (ANTXRLI, restricted to spermatids) or testis-enhanced (PTPN20, spermatocytes and spermatids) expression. The case with NOA carrying the Xp22.33 duplication spanning the PAR1/non-PAR boundary had normal testes size, suggesting a primary defect in the process of spermatogenesis per se. As the amplified region involved no apparent candidate genes, a structural effect disturbing recombination between X and Y is a likely scenario.
CNVs encompassing large genomic regions may impair critical processes in spermatogenesis: mitosis, required for the efficient proliferation and differentiation of spermatogonia, and meiosis, which yields haploid spermatids. There is an abundance of literature showing that large duplications affect meiotic chromosome pairing and lead to non-allelic homologous recombination events, generating genomic rearrangements 47,48 . Meiosis may be further impaired due to translocated duplications. Acrocentric segmental duplications promoting interchromosomal duplications between acrocentric and non-acrocentric chromosomes 1, 3, 4, 7, 9, 16, and 20 are particularly prone to double-strand breaks 49 . Large CNVs in pericentromeric regions identified on chromosomes 14, 15, 16, and 19 may modulate the stability of long stretches of constitutive heterochromatin (Table 3, Supplementary Table S2), potentially affecting meiotic chromosome pairing and assembly of the kinetochore complex for chromosomal segregation 50,51 . Other reports on SPGF cases have also described CNVs in pericentromeric regions on chromosomes 13-16 52 . In our study, extensive deletions at 16p11.2 were detected in 12 patients with SPGF but not in fertile men. Chromosome 16 is susceptible to heteromorphisms and includes a hotspot for inversions leading to subsequent deletions or duplications that may predispose to errors in chromosomal segregation 53 . Reduced dosage of six testis-enriched TP53TG3 genes in the deleted region may also contribute to SPGF. The role of structural variation at 16p11.2 in SPGF should be studied further using targeted long-read sequencing.
It has been discussed that there is an abundance of genomic rearrangement hotspots that harbor genes important for spermatogenesis 54 . Also in this study, novel candidate genes for male infertility were identified that were disrupted by CNVs (Table 1, Fig. 2). As a secondary outcome, a recurrent deletion within the testis-specific LRRC69 gene was identified as a novel candidate risk factor for male infertility. It was carried by seven cases with SPGF (and no NORM) and presented significantly higher prevalence among infertile men than in the general Estonian population (3.3% vs. 0.9%). A possible link to spermatogenic parameters might be through carnitine levels 55 , which are lowered by LRRC69 loss-of-function variants 56 . Further investigation of the functional link between LRRC69 and spermatogenesis is warranted.
Several large microdeletions and microduplications identified in this study have been mainly or only described in pediatric patients presenting ID/DD (Table 3). Alternative scenarios need to be considered to understand the pleiotropic effect of large CNVs in causing developmental disorders and/or potentially leading to SPGF. Whereas microdeletion/microduplication syndromes are mainly considered to be caused by altered dosage of critical genes, most highlighted CNVs did not encompass any apparent candidate genes for SPGF. As discussed above, the presence of a large genomic rearrangement per se may affect the complex process of spermatogenesis. In other cases, large CNVs may involve genes implicated in brain development and function, as well as those required for spermatogenesis. Altered dosage of those genes will have differential effects, depending on the tissue of action. Notably, according to the Human Protein Atlas, the human brain and testis share the highest number of group-enriched genes, indicating potential shared pathways in spermatogenesis and brain development and function 41 . One example is PTPN20 (10q11.22 microduplication), which has its highest expression in both testicular and brain tissues. It has been proposed that the role of these two tissues in the speciation process could explain the high similarity of their proteomic profiles 57 . However, at present the clinical relevance of this topic (a shared cause for neurological disorders and impaired spermatogenesis) is underexplored.
As a practical implication, the study data provided supportive evidence that introducing chromosomal microarray analysis into the routine andrological workup of certain subgroups of cases with SPGF will improve their molecular diagnostics and clinical management. This additional genomic analysis will be relevant for idiopathic NOA and SO patients after standard genetic evaluation. Infertile men with sperm concentration less than 5 × 10⁶/ml could be tested, concordant with current recommendations for the analysis of Y chromosome AZF microdeletions 58 . Before genetic testing, counselling by a clinical geneticist could be considered for extended patient phenotyping and compilation of family health history. This advanced genetic assessment will facilitate not only evidence-based counselling of the couple about their reproductive choices and predisposition to pregnancy failure 18,59 but also identification of congenital risks for health conditions of the patient and future offspring.
Limitations. The limitations of the study have to be acknowledged. Standard chromosome analysis was not performed in 48 (22.3%) patients with SPGF and in all individuals in the NORM group. As microarray-based analysis cannot detect inversions, translocations, and complex genomic rearrangements, these variant types may have been missed. Additionally, the exact breakpoints of deletions and duplications cannot be precisely determined with the resolution of the SNP microarray dataset. Also, this study did not analyze CNVs on the Y chromosome due to its high level of genomic complexity and low number of unique SNPs. Calling of Y-linked CNVs requires targeted genotyping and/or sequencing approaches 4 . Due to different approaches for CNV calling, only deletions/duplications of 10 kb or more were evaluated for the X chromosome, and the load of CNVs on autosomes and the X chromosome could not be directly compared.
Recurrent disease-associated rare genetic variants identified in one population, such as the partial deletion of LRRC69 in this study, may not be present in subjects with other ancestries. In the current retrospective study, health data and blood samples of family members had not been collected at patient recruitment. Therefore, the origin of the identified CNVs (de novo or inherited) could not be determined and the phenotypes of other possible CNV carriers in the family were unavailable. Going forward, collecting parental informed consent for carrier testing, together with a family history of health conditions, should be recommended upon recruitment of patients with SPGF for genetic testing and research.

Conclusions
The diagnostic pipeline of SPGF cases will benefit from chromosomal microarray analysis by identifying undiagnosed carriers of clinically relevant microdeletions and microduplications. This will add significant value in the routine management and counselling of infertile men for their general and reproductive health.

Data availability
All data generated or analyzed during this study are included in this published article. Data of the identified CNVs have also been submitted to NCBI dbVar (https://www.ncbi.nlm.nih.gov/dbvar/) under the study accession number nstd227.