Developmental genomics of limb malformations: Allelic series in association with gene dosage effects contribute to the clinical variability

Summary Genetic heterogeneity, reduced penetrance, and variable expressivity, the latter including asymmetric body axis plane presentations, have all been described in families with congenital limb malformations (CLMs). Interfamilial and intrafamilial heterogeneity highlight the complexity of the underlying genetic pathogenesis of these developmental anomalies. Family-based genomics by exome sequencing (ES) and rare variant analyses combined with whole-genome array-based comparative genomic hybridization were implemented to investigate 18 families with limb birth defects. Eleven of 18 (61%) families revealed explanatory variants, including 7 single-nucleotide variant alleles and 3 copy number variants (CNVs), at previously reported “disease trait associated loci”: BHLHA9, GLI3, HOXD cluster, HOXD13, NPR2, and WNT10B. Breakpoint junction analyses for all three CNV alleles revealed mutational signatures consistent with microhomology-mediated break-induced replication, a mechanism facilitated by Alu/Alu-mediated rearrangement. Homozygous duplication of BHLHA9 was observed in one Turkish kindred and represents a novel contributory genetic mechanism to Gollop-Wolfgang Complex (MIM: 228250), where triplication of the locus has been reported in one family from Japan (i.e., 4n = 2n + 2n versus 4n = 3n + 1n allelic configurations). Genes acting on limb patterning are sensitive to a gene dosage effect and are often associated with an allelic series. We extend an allele-specific gene dosage model to potentially assist, in an adjuvant way, interpretations of interconnections among an allelic series, clinical severity, and reduced penetrance of the BHLHA9-related CLM spectrum.


Introduction
Vertebrate limb development is a sophisticated, patterned process with the limbs formed in the early embryonic stages because of underlying mesodermal-ectodermal cellular interactions. Orchestrated by complex ''finetuned'' gene dosage expression, tissue/cell gradients of developmental factors or signal molecules are formed through inductive interactions between mesoderm/ectoderm tissues and oscillation of several signal transduction pathways. 1 CLM represents a broad spectrum of prenatal developmental defects, including limb reduction, appendicular skeletal anomalies, and/or abnormalities in the number, length, anatomical positioning, and anatomy of the digits. 4 CLMs affect approximately 1 in 500 newborns, with occurrence as a deformation, disruption, or isolated developmental event (i.e., malformation), or it can be a manifestation of an endophenotype of a rare Mendelian disease trait. 5,6 In addition to extensive interfamilial and intrafamilial clinical and genetic heterogeneity, incomplete penetrance and variable expressivity have been frequently highlighted in CLM studies; these genetic features are among some of the inherent challenges of concluding molecular diagnosis(es) for children and families manifesting CLM and for genetic diagnoses from unphased genomic information.
During the past decade, elucidation of the molecular etiology of CLM has been facilitated by advances in highthroughput family-based genomics, rare variant allele studies, computational tools, and population variation studies, driving the molecular diagnostic ''solved rate'' of a random CLM-related cohort to be reported with a range of 10%-35% in different populations. [7][8][9][10][11] To date, more than 150 ''disease trait genes or loci'' have been attributed to CLM in affected individuals and families. A diverse mutational landscape comprised of single-nucleotide variant (SNV) alleles, small insertion or deletion (indel) events, simple sequence repeat expansions, and locus copy number variant (CNV) alleles involving coding or non-coding regulatory regions has been elucidated. 8,12,13 Despite the continuing efforts of genome-wide screening and novel molecular etiological insights provided for CLM, many candidate genes/loci remain uncharacterized or under-characterized, or allelic series at a locus are not explored. Some clinically well-described CLMs with numerous case reports published through decades of research still lack a clear underlying contributing genetic variant explanation: examples include Gollop-Wolfgang complex (GWC; MIM: 228250) and fibular aplasia, tibial campomelia, and oligosyndactyly (FATCO) syndrome (MIM: 246570). Recurrent CLM-associated CNVs and other variant types are often poorly elucidated and mechanistically enigmatic from a mutational mechanism standpoint, whether as de novo allelic variants associated with a sporadic congenital anomaly or contributing as a recently arising mutation in clan ancestors. 14 Contributions to a biallelic variant state are not always genetically clear as to when homozygosity results from identity by state (IBS; identical alleles/genomic segments do not share a common ancestor) versus identity by descent (IBD; identical alleles/genomic segments inherited from a common ancestor without intervening recombination). In this study, we implemented family-based genomics and rare variant analyses as a molecular ''entry point'' to decipher the biology, mutational features, and mutational mechanisms underlying the genetic and developmental basis of CLM.

Methods and materials
Sample preparation Prior to enrollment for genomic studies, the phenotype of each affected individual was initially investigated by local clinical geneticists and re-evaluated by physicians, including pediatricians, pediatric neurologists, and medical geneticists, in our research genomics group. To reduce the clinical heterogeneity, the recruited cases were primarily those with phenotypic findings involving the appendicular skeleton, focusing on limb anomalies as their major structural morphogenesis defects. Peripheral blood from each registered individual and associated family members was collected locally in EDTA tubes and shipped to our lab for genomic DNA extraction using standard protocols.

Exome sequencing analysis and rare variant SNV prioritization
Family-based exome sequencing (ES) of all subjects was implemented at the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) under previously described protocols through the Baylor-Hopkins Center for Mendelian Genomics (BHCMG) initiative. [15][16][17] In brief, pre-capture library preparation was performed using KAPA HyperPrep Kits, followed by the HGSCdesigned core capture protocol (52 Mb, Roche NimbleGen). 18 Figure 1. A schematic of embryonic upper limb development An illustrative drawing describes a human embryo at Carnegie stage 12 (top left; approximately 4 weeks after conception), when the limb bud forms as a ventrolateral bulge and begins to emerge outwardly. Also shown is a magnified illustration depicting the 3D patterning of the upper limb bud and a dissected view of the anatomical position between ectoderm and mesoderm. The P/D, A/P, and D/V axes are labeled with different colors. The AER is labeled at the brown epithelial region, ZPA is shown as a gradient-based yellow zone. Bottom right: illustration of a mutual genetic antagonism between the short isoform of GLI3 (GLI3R) and HAND2 that is required to establish a ''pre-patterning'' of the A/P axis prior to formation of the SHH-ZPA gradient. Top right: P/D elongation is regulated by many signaling elements, including FGF8, TP63, and BHLHA9.
Paired-end sequencing with multiplex pools was executed using an Illumina Hiseq2000 or NovaSeq6000, generating an average depth of coverage of $1003. Subsequently, post-sequencing data were mapped to the haploid human genome reference sequence assembly (hg19) together with variant calling, post-processing, annotation, and quality control under a standardized HGSC-Mercury pipeline as implemented previously and deployed in the cloud. 19 An unbiased, stepwise computational analysis workflow was employed for SNV filtering, parsing, and prioritization. 20 Rare variants were extracted comprehensively based on their minor allele frequency (MAF), calculated collectively from an internal BCM-HGSC exome variant database consisting of personal genomes from $13,000 individuals and a variety of public genomic databases, including the Genome Aggregation Database (gnomAD), 1000 Genomes Project (TGP), and the Atherosclerosis Risk in Communities (ARIC) study database. Filtered variants were annotated and prioritized for further study based on MAFs (<0.5% as threshold) and pathogenicity prediction algorithms, including combined annotation-dependent depletion (CADD; >20 as deleteriousness) 21

Variant validation and segregation analysis
All candidate SNVs and indels were orthogonally validated by Sanger dideoxy sequencing of the target region amplified using conventional PCR (HotStar Taq, QIAGEN). Familial segregation analysis was examined in accordance with Mendelian expectations on all identified candidate variants alleles, with potential penetrance evaluated by physical examination and radiographs. 23
The genomic architecture of each potential pathogenic CNV was experimentally investigated in affected individuals and selected family members by customized high-definition arraybased comparative genomic hybridization (HD-aCGH), using arrays designed by implementing the eArray system (https:// earray.chem.agilent.com/suredesign/) and array platforms manufactured by Agilent (AMAID: 086724). Microarray protocols, including DNA digestion, probe labeling, gender-matched hybridization, and post-washing with minor modifications, were performed as described previously. 26 Agilent SureScan and Feature Extraction software was utilized to achieve the image-to-digital transition, with further data analysis and visualization on the Agilent Genomic Workbench.
Genes mapping within genomic intervals potentially susceptible to genomic instability or Alu/Alu-mediated rearrangement (AAMR) were investigated by AluAluCNVpredictor for a relative risk genomic instability index or score, implementing a recently described informatics tool. 27,28 Approximate genomic intervals of CNVs were determined at single W-C base-pair resolution via breakpoint junction analysis using Sanger dideoxynucleotide sequencing.

Absence of heterozygosity (AOH) and co-efficient of consanguinity analysis
Genomic data evidence for potential consanguinity was investigated quantitatively based on AOH, which can serve as a surrogate measure of runs of homozygosity (ROHs), to profile potential IBD genomic intervals and a coefficient of consanguinity (i.e., the fraction of the assayable human genome haploid reference), as described previously. 29,30 For an individual genome, ROH genomic regional analyses often provide additional evidence for variant prioritization, serving as an adjuvant tool to probe the genetic and allelic genomic architecture and autozygous intervals (IBD). In this study, AOH data visualization and analysis for individual personal genomes were achieved through BafCalculator, an in-house-developed bioinformatics method for B allele frequency computation from unphased ES-variant data. 31

Study approval
Eighteen families with clinical diagnoses of CLM were recruited into the BHCMG under an institutional review board (IRB; H-29697)-approved study. 15 All participating subjects signed informed consent regarding data sharing and publication of medical information and photographs.

Results
Characteristics of study subjects and family-based genomic analyses From 2012-2020, 18 families with rare congenital defects involving limbs or the appendicular skeleton were enrolled into the BHCMG, now the BCM-Genomics Research to Elucidate the Genetics of Rare (BCM-GREGoR) disease consortium, with a mandate to ''solve the unsolved.'' Using family-based genomics and rare variant analyses, 7 SNV and 3 CNV alleles were identified as plausible molecular diagnostic etiologies for 11 of the 18 studied families, interpreted by American College of Medical Genetics and Genomics (ACMG)-based variant classification criteria (Table 1  and Table 2). 32,33 Most gene products involve the P/D and A/P axial formation of the limbs.
Intrafamilial phenotypic variability was commonly observed in the studied families and associated with reduced penetrance and variable expressivity (Table 2). Sometimes, variability in presentation manifests as unilateral or asymmetric anatomical patterns (e.g., left/right, upper/lower limbs) for an individual, with malformation observed specifically in an asymmetric pattern on only the left/or right side or upper/lower limbs.
Novel variant identification involving genes contributing to A/P axis patterning Polydactyly preaxial type IV (PPD4; MIM: 174700), alternatively known as polysyndactyly, represents manifestations    characterized by a bilateral polydactyly presenting preaxially in the thumb/hallux, with variable syndactyly seen in other digits (fingers/toes). 34,35 In our cohort, PPD4 was observed in multiple members of a Turkish family (HOU2084) as an apparent autosomal dominant (AD) trait (Figures S1A-S1B). ES analyses of three affected individuals revealed a novel heterozygous nonsense variant in the Glioma-associated oncogene (GLI)-Kruppel family member 3 gene, GLI3 (NM_000168:c.1673C>A [p.Ser558*]). Segregation studies fulfilled Mendelian expectations for a dominant allele with complete penetrance ( Figure S1A). This variant allele is predicted by conceptual translation to result in a premature termination codon (PTC) within the N-terminal zinc-finger domain (ZFD) of GLI3; this PTC is predicted to trigger nonsense-mediated decay (NMD), generating a loss-of-function (LoF) variant allele. This interpretation is further supported here by the evidence of a genotype-phenotype observation in the GLI3-nonsense variant carriers in family HOU2084, showing only the essential clinical features of PDD4 but no additional findings ( Figure S1B). During limb A/P axial development, a mutual genetic antagonism between GLI3 and heart-and neural crest derivative-expressed protein 2 (HAND2) is required to establish a ''pre-patterning'' of the A/P axis prior to ZPA formation of the Sonic hedgehog (SHH) gradient ( Figure S1C). In contrast to GLI3, HAND2 is expressed in a cell-restricted manner in the posterior mesenchymal region and activated by a direct upstream regulator, Homeobox D13 (HOXD13) ( Figure S1C).
Synpolydactyly (SPD1; MIM: 186000) is another ''phenotypic combination'' with a typical clinical presentation characterized by mesoaxial polydactyly and syndactyly, particularly involving fingers 3/4 and toes 4/5. 36,37 Trio-ES rare variant analyses of a sporadic proband, BAB8289, who manifested a typical SPD1 phenotype, found a maternally inherited and novel heterozygous missense mutation of HOXD13 (NM_000523:c.623A>T [p.Asp208Val]) (Figures S1D and S1E). This variant allele is absent in gnomAD and BCM-GREGoR internal databases comprising exome data from $13,000 individuals. The mother of the proband seems non-penetrant because variant reads are not supportive of mosaicism (variant read to total read ratio as 82/158). This finding in BAB8289 showed that a dominant cause of the disease could be identified in a family from an admixed population with a high coefficient of consanguinity.
Multiple mutation types for pathogenic variant alleles of HOXD13 segregate with SPD1. 38 Aggregated evidence implicates LoF variants of the HOXD13 functional domain driving the typical features of SPD1 and anticipated hap-loinsufficiency as the most plausible mechanism for the molecular pathogenesis of SPD1. 39,40 The p.Asp208Val substitution altered an evolutionarily conserved residue outside of, but adjacent to, the homeodomain of HOXD13 ( Figure S1D); this variant allele mapped 12 amino acids from a previously reported pathogenic missense variant, p.Gly220Val. As measured by dosage changes, the latter variant allele has been shown experimentally to jeopardize protein stability through aberrant cytosolic accumulation. 41 Biallelic disruption of NPR2 dysregulating longitudinal skeletal growth Natriuretic peptide receptor B/guanylyl cyclase B, encoded by NPR2, is one of the homodimeric transmembrane hormonal receptors regulating longitudinal skeletal growth during the process of endochondral ossification. Biallelic LoF variants in NPR2 are known to cause autosomal recessive (AR) acromesomelic dysplasia, Maroteaux type (AMDM; MIM: 602875), an endochondral bone growth disorder characterized by significant short stature, predominantly affecting the forearms and hands. 42,43 Here we identified two novel and two known pathogenic variants of NPR2 from four unrelated consanguineous Turkish families (HOU2130, HOU1409, HOU3212, and HOU1410) (Figure 2A). Short stature and acromesomelia, characterized by short and bowed forearms and symmetrically short and broad metacarpals and phalanges with fingernail hypoplasia, are observed in most of the affected individuals ( Figures 2B and S2; Table 1B).
In family HOU2130, a novel homozygous missense variant (NPR2: c.277C>A [p.Pro93Thr]) was identified through ES analysis of two affected siblings (BAB5498 and BAB5499) with AMDM. This missense variant results in alteration of a conserved residue (p.Pro93Thr) in the C-type natriuretic peptide-binding domain (CNP) of NPR2 and potentially disrupts the CNP binding affinity. AOH studies mapped the variant to overlapping 2-Mb genomic intervals of AOH/ROH in BAB5948 and BAB5949 encompassing NPR2, whereas the total AOH was 295 Mb and 239.5 Mb, respectively, consistent with a clan genomics-derived allele versus a Turkish founder allele (Figures 2A-2D). 29 Affected individuals of family HOU1409 and family HOU1410 manifest similar AMDM phenotypes and were directly tested by NPR2-targeted Sanger dideoxy sequencing. A novel homozygous nonsense variant of NPR2 (c.1087C>T [p.Arg363*]) was found in two affected male siblings (BAB3607 and BAB3608) in family HOU1409 segregating with the AMDM phenotype. This  Cys247*]) in family HOU2346. Proband BAB6262 (black arrow) was born from a first-cousin mating. He has two male siblings. One brother, BAB8610, is affected by same SHFM phenotype; the other brother, BAB6263 (shaded green on the pedigree), was diagnosed with cerebral atrophy but no limb malformation. novel stop-gain variant allele resulted in a PTC in exon 4 (total 21 exons) of NPR2 and was predicted to be subject to NMD ( Figure 2C). In family HOU1410, we identified a known pathogenic homozygous missense variant allele (NPR2, c.2720C>T [p.Thr907Met]) in two affected siblings (BAB3611 and BAB3612) and their affected paternal aunt (BAB3614). This missense variant altered a conserved residue mapping to the guanylate cyclase domain, potentially jeopardizing the process that produces cytoplasmic cyclic guanosine monophosphate (cGMP) from GTP ( Figure 2E).
Proband BAB8908 was born to Turkish kindred (HOU3212) with reported parental consanguinity, and the clinical phenotype was manifested by marked bilateral hallux macrodactyly with brachydactyly of the other toes (Figures 2A and 2B). Trio-ES analysis identified a homozygous missense variant allele (c.1673T>C [p.Ile558Thr]) in the kinase homology domain (KHD) of NPR2 ( Figures 2B-2D). Segregation analysis confirmed homozygosity for this variant allele in the proband; homozygosity was also found in an affected sister and affected maternal aunt of the proband. Both parents and the unaffected brother were heterozygous carriers consistent with Mendelian expectations (Figure 2A). The same allele has been reported previously in two affected siblings with shortlimbed short stature in association with brachycephaly and a dysplastic middle phalanx of the fifth finger. 42 AOH studies map the allelic variant to a 5.2 Mb AOH block, whereas the total AOH was 281.6 Mb, consistent with a clan genomics-derived IBD allele versus a Turkish IBS founder allele ( Figure 2D). 14,29,44 WNT10B dysfunction in limb development and evidence of a potential founder allele Failure of AER ( Figure 1) stratification presumably results in split-hand/foot malformations (SHFM), a subgroup of CLM characterized by maldevelopment of central rays of the distal extremities. SHFM type 6 (SHFM6; MIM: 225300) is a non-syndromic AR trait associated with bial-lelic frameshift, nonsense, or missense variants of WNT10B. 45,46 Studies in mouse embryonic limb bud development reveal Wnt10b as an endogenous ectodermal Wnt ligand specifically expressed in the AER at embryonic day 11.5 (E11.5), whereas its regulatory mechanism remains largely unclear. In this cohort, we identified a novel indel variant that leads to a nonsense allele of WNT10B (c.741delC [p.Cys247*]); autozygosity of this novel frameshift allele causes a similar SHFM6 phenotype in two unrelated consanguineous Turkish families in our cohort because of IBD (Figures 3A and 3C).
In family HOU2346, two male siblings, BAB6262 and BAB8610, manifest an isolated bilateral split-foot malformation characterized by absence of the two central rays.
In the upper distal limb, syndactyly is present between the third and fourth digit of the right hand of the proband (BAB6262), whereas his affected brother (BAB8610) has a flexion contracture of his left index finger ( Figure 3B). The same WNT10B frameshift variant allele (c.741delC [p.Cys247*]) was identified in another consanguineous Turkish family in our cohort, HOU3360; homozygosity segregated with SHFM6 phenotypes among three generations ( Figure 3C). Similar to the clinical features in family HOU2346, bilateral SHFM with more severe foot manifestations appeared among three affected individuals in family HOU3360 (proband BAB9281 and two siblings of her maternal grandfather, BAB9286 and BAB9316), in association with variable hand abnormalities, including syndactyly, polydactyly, and phalangeal dysplasia.
This frameshift variant allele (WNT10B, c.741delC) results in a PTC in situ (p.Cys247*) within the last exon; it is predicted by NMDescPredictor to escape NMD surveillance ( Figure 3F). Compared with the WNT10B wild-type protein structure, the linker region and C-terminal domain (CTD) of WNT10B that is essential for formation of the Wnt-Frizzled-LRP6 ternary complex, is completely removed in the modeled WNT10B p.Cys247* truncation ( Figure 3F). These observations suggest the underlying (B) Clinical images and radiographs showing the hands and feet of two affected individuals (BAB6262 and BAB8610) in association with SHFM6. Note the right-hand 3-4 syndactyly in BAB6262 and hypoplasia with medial deviation of the index finger of the left hand of BAB6263. There is split foot malformation in both, with oligodactyly and syndactyly of lateral toes. (C) Multigeneration pedigrees of family HOU3360, labeled with allelic information profiling segregation of the same frameshift variant (WNT10B: c.741delC [p.Cys247*]) and recessive inheritance pattern (i.e., AR trait) in comparison with family HOU2346. *BAB9287 is affected by an unknown type of CLM characterized by bilateral hand brachydactyly, clinodactyly, and phalangeal dysplasia. **BAB9284 has a unilateral polysyndactyly on the second toe of the right foot; its etiology remains unknown). (D) Clinical photos of the three affected individuals in family HOU3360. From top to bottom: BAB9286, BAB9316, and BAB9281 manifest the SHFM6 phenotype, demonstrating varying degrees of SHFM, oligodactyly, and syndactyly. (E) AOH genomic intervals mapping to chromosome 12 from ES personal genomes, visualized by B allele frequency computational studies from four affected individuals in families HOU2346 and HOU3360. The top two panels for family HOU2346 describe a 9.6 Mb (for BAB6262) and 21.6 Mb (for BAB8610) interval of AOH genomic interval haplotype block surrounding the causative variant allele of WNT10B (red vertical line), marked with thick gray rectangles. The two bottom panels for HOU3360 denote a shared 19.5 Mb AOH interval encompassing WNT10B variant alleles between individual BAB9281 (proband) and the affected brother of her maternal grandfather, BAB9286. (F) An illustration from NMDEscPredictor shows the WNT10B cDNA architecture and the position of the frameshift variant located in the NMD escape region (denoted as a green bar). (G) The top illustration shows a comparison of linear structures or map positions and domains between the WNT10B wild-type (WT) and truncated version (p.Cys247*, with missing residues denoted by dashed lines). Also shown is a comparison (bottom) of two 3D protein structural models (WNT10B: WT and truncated version) obtained using the AlphaFold platform and visualized on UCSF Chimera (https://www.cgl.ucsf.edu/chimera/). Truncated WNT10B was predicted to lose the entire linker region and CTD, including the amino acids that map to the interface that interacts with the WNT classic receptor (Frizzled protein) and canonical coreceptor LRP5. (legend continued on next page) molecular mechanism and that the NMD-escaping frameshift variant allele results in a truncated form of WNT10B that is unable to activate canonical Wnt/b-catenin signaling pathways during AER stratification.
Segregation analysis and ROH/AOH profiling in these two multigeneration families revealed genomic evidence of consanguinity and autozygosity of disease-causing alleles ( Figure 3E). In family HOU2346, the WNT10B variant maps within 9.6 Mb and 21.6 Mb genomic intervals of AOH/ROH in BAB6262 and BAB8610, with total AOH/ ROH of 251.2 Mb and 217.2 Mb, respectively. In BAB9281 and BAB9286, a similar 19.5 Mb genomic interval of AOH/ROH encompasses WNT10B, which tracked with transmission of the disease-causing allele and provided evidence of intergeneration autozygosity through IBD. This genetic pattern, resulting from a homozygote marrying a recessive carrier, is referred to as a pseudodominant trait.
The youngest child, BAB6263 of family HOU2346, was found to have cerebral atrophy but no limb anomaly (Supplemental Note). Carrier and segregation analysis of BAB6263 identified a homozygous variant in general transcription factor IIIC subunit 1 (GTF3C1: c.4096G>A [p.Glu1366Lys]), and this particular GTF3C1 variant allele has been reported previously by our group, causing cerebellar atrophy in another unrelated Turkish kindred ( Figure S3). 17 This family illustrates multi-locus pathogenic variation (MPV) within the personal genome of family members rather than the MPV of an individual with different genomic interval autozygosity in different siblings.
Haploinsufficiency of HOXD cluster deletion as a major molecular etiology for SHFM5 Large deletion CNVs encompassing the HOXD cluster have been proposed previously to cause syndromic SHFM5 (MIM: 606708), characterized by SHFM, syndactyly and skeletal hypoplasia in association with ectodermal and craniofacial findings. [47][48][49] We identified an individual (BAB4812) born to a Turkish family (HOU1397) with no report of parental consanguinity (Supplemental Note). His phenotype resembles a syndromic spectrum of SHFM5: bilateral hand SHFM and variable syndactyly; multiple congenital anomalies, including global developmental delay (GDD) and intellectual disability (ID); hypotonia; and facial dysmorphism ( Figures S4A and S4B).
BHLHA9 CNV and semidominant inheritance underlie GWC Copy number gains of BHLHA9 in humans have been described in association with the triplosensitivity trait (defined as a phenotype produced by an additional copy of the gene) potentially causing a gain-of-function (GoF) hypermorphic allele of SHFM with long bone deficiency (SHFLD3; MIM: 612576) as an AD trait with less than 50% penetrance and variable phenotypic expression. [50][51][52] Family HOU3586 was referred from Brazil and has been published previously. 53 Three individuals manifest a paternally inherited CLM, including SHFM and bilateral tibial aplasia ( Figures S5A-S5C). The whole-genome aCGH of each family member revealed a shared 61.8 kb duplication at chr17p13.3 in three affected and three unaffected individuals. The duplication encompasses BHLHA9 driving the onset of SHFLD3. [50][51][52] GWC (MIM: 228250) is characterized by unilateral bifid femur and tibial bone absence with monodactyly. 54,55 The presence of heterozygous triplication involving BHLHA9 (gene copy number ¼ 4) has been reported only once in a single family associated with GWC in the Japanese population, whereas there remains a lack of additional cases reported. 56 Here we found a novel allelic contribution by autozygosity-induced homozygous duplication of chr17p13.3, including the BHLHA9 locus. HOU3358 is a consanguineous family from Syria in which proband BAB9273 was clinically diagnosed as having GWC (C) B allele frequency of chromosome 17 for the proband showing a 9.8 Mb interval of AOH surrounding the homozygous BHLHA9 duplication, marked with a thick gray line. (D) A comparison of log10 reads per thousand base pairs per million reads (RPKM) plots of the duplication region from XHMM raw data. RPKM calculations of the proband (red line) and brother (yellow line) show different amplification levels on the reads mapping to the interrogated region (shaded in pink) in contrast to the gray lines delineated as individual ES samples from an internal database with similar experimental conditions. (E) An illustration showing the proposed model of BHLHA9 tandem duplication formed by AAMR. A single-strand break that occurred in the 3 0 AluSx4þ element (yellow mark) introduced microhomology (marked by a blue bar)-mediated TS into another AluSzþ element located in the 74.9 kb upstream region. A pair of primers was designed to capture the recombinant joint of the tandem duplicated allele by producing an amplicon size of $2.5 kb encompassing the approximate breakpoint location. A control primer pair was applied on an $500 bp unrelated genomic region. The diagram shown below is the 1% agarose electrophoresis of the PCR product; the specific band of $2.5 kb is the tandem duplicated allele captured in all members (parents and brother, with the lighter bands denoting heterozygous duplication; while proband with a thick band consistent with homozygous duplication). Sanger sequencing (bottom) of the $2.5 kb PCR product of the proband revealed an 8 bp microhomology. Three sequences above the Sanger sequencing trace from top to bottom denote the 3' AluSx4þ reference (colored in green), the proband ''chimeric'' sequence, and the 5 0 AluSzþ reference.
( Figures 4A and 4B). The proband has two unaffected brothers. The parents also had five pregnancies with fetuses that were prenatally diagnosed by ultrasound with bilateral femur bifurcations.
Initially, CNV analysis using XHMM from family-based exome data detected a potential copy number gain on chr17p13.3 involving BHLHA9 in proband BAB9273 and her unaffected brother, BAB9276 ( Figure 4D). Validation studies of each family member via customized chr17p HD-aCGH resolved a 74.9 kb duplication at chr17p13.3 covering the full length of BHLHA9 and tumor suppressor candidate gene 5 (TUCS5) in all four family members ( Figure 4A). This duplicated allele was brought into homozygosity in the proband with GWC through IBD, not IBS, resulting in four genomic copies of the gene and genomic duplication interval ( Figures 4A and 4C), a gene copy number identical to the heterozygous triplication mentioned above. 57 Semidominant inheritance, with a more severe phenotype in the homozygote versus heterozygote, indicates that the individuals who carried heterozygous variant alleles manifest an intermediate phenotype in comparison with the homozygous individual with the same duplication CNV affecting both alleles. Semidominant inheritance has been documented in other human disease traits, such as achondroplasia (ACH; point mutation of FGFR3; MIM: 100800) and Huntington disease (HTT triplet-repat polyglutamine expansion; MIM: 143100), and in gene duplication traits such as that resulting from CMT1A duplication. 58,59 Here, homozygous duplication for BHLHA9 results in GWC, a more severe type of limb malformation than SHFLD3; the latter is a rare disease trait caused by heterozygous duplication.

Structural variant (SV) mutagenesis by AAMR
To understand mutagenesis mechanisms underlying the observed CNV events, we applied breakpoint junction analysis on all SV cases, including deletion or duplication CNV alleles. For the de novo HOXD cluster deletion of proband BAB4812 in family HOU1397, analysis of this deleted CNV allele unveiled two flanking Alu elements (AluSxþ and AluSc8À), in the opposite orientation with respect to the haploid reference genome, that generated a complex genomic rearrangement best described as a ''deletionnormal/inversion-deletion (DEL-INV-DEL)'' sequence involving a two-step template-switching (TS) mechanism ( Figures S4D-S4F). 60 A 5 bp sequence microhomology and a 6 bp microhomology were found at two breakpoint junctions, respectively, indicating microhomology-mediated break-induced replication (MMBIR) as the likely underlying DNA-replicative repair mutational molecular mechanism (Figures S4D-S4F).
It has been established that chr17p13 is a ''genomic instability hotspot'' with high susceptibility for genomic rearrangement that frequently involves Alu/Alu recombination. 61 AluAluCNVPredictor revealed that BHLHA9 showed an elevated genomic instability index with a high relative risk score of 0.798 for OMIM genes and 0.782 for RefSeq genes ( Figure S5E). 27 Breakpoint junction analysis of family HOU3586 showed a chimeric repetitive element fused by an AluSx1 and a few upstream nucleotides adjacent to a long terminal repeat (LTR), potentially derived by MMBIR ( Figure S5D). Strikingly, another duplicated allele, including BHLHA9 in family HOU3358, was also demonstrated to be driven by an AAMR mechanism, resulting in a chimeric Alu element resulting from recombination of AluSx4 and AluSz in the same orientation, with an 8 bp microhomology at the breakpoint junction ( Figure 4E). Thus, de novo mutagenesis in an antecedent generation of the clan derived this allele by SV mutagenesis via AAMR, with IBD bringing it to homozygosity.

Discussion
Novel variant identification facilitating discovery of an allelic series Family-based genomics studies and rare variant analyses in human populations throughout the world provide an opportunity to decipher a diverse and complex variant spectrum of disease trait-associated loci potentially contributing to CLM; such allelic diversity would not be sufficiently seen/or easily engineered in model organisms. 14,44 Identifying complex variation (such as NMDescaping variant alleles, gene triplication, etc.) provides new insights into the underlying mutational mechanisms enabling extensive allelic series and genotype-phenotype correlation analysis of a given disease-causing locus.
Allelic series is historically a genetic term referring to different ''mutant alleles'' of a gene or genomic locus that can generate a range of phenotypic trait manifestations. Each phenotype of this range is ascribed to variant alleles within different regions of the same gene/locus. 62 For instance, GLI3 and HOXD13 are two genes regulating limb A/P axial development, and their disruption can cause AD-inherited digit abnormalities, as we showed in the families HOU2084 and HOU3022. However, allelic series in both genes are rather intriguing; extensive variant analysis on genotype-phenotype correlation of GLI3 discovered a unique dichotomous relationship between the typical Greig cephalopolysyndactyly syndrome (GCPS) spectrum (MIM: 175700) and a more severe AD disorder called Pallister-Hall syndrome (PHS; MIM: 146510), depending on the location (i.e., genomic map position) of the causative variant allele. 63 In contrast to GLI3, similar observational studies of HOXD13 revealed that its disruption was linked to many different digital malformations, including synpolydactyly (SPD1; MIM: 186000), brachydactyly type D (BDD; MIM: 113200), and brachydactyly type E (BDE; MIM: 113300). Although heterozygous polyalanine expansions, frameshift, and nonsense variant alleles of HOXD13 predominantly associate with SPD1, missense variants have been reported to cause SPD1 or BDD/BDE with no positional differentiation.
''Mirror trait'' conditions are special allelic series that are defined as resulting from reciprocal genetic/genomic variants (e.g., duplication versus deletion CNV, LoF versus GoF) of the same gene/loci and can cause the observed phenotypes in individuals to appear at the opposite ends of a phenotypic spectrum (e.g., short stature versus tall stature). 57,64 As we addressed previously, biallelic LoF variants of NPR2 were frequently reported, causing AMDM characterized by short-limbed short stature. However, a few of the heterozygous missense variant and in-frame deletion alleles of NPR2 were also reported to cause AD epiphyseal chondrodysplasia, Miura type (ECDM; MIM: 615923), characterized by tall stature, arachnodactyly, and long/broad halluces and long metatarsals because of a GoF (potentially a hypermorphic allele) that results in overactivity of guanylate cyclase ( Figure S6). [65][66][67] Genotype-phenotype correlation of these reciprocal traits highlighted their potential underlying etiologies related to abnormal CNP-induced cGMP production. In contrast to AMDM, ECDM-causing alleles were seldomly described ( Figure S6).
WNT signaling perturbation affecting P/D axes of limb development Wnt ligands associated with canonical and non-canonical signaling cascades are essential for limb and skeletal development, particularly required for AER formation and maintenance during P/D and D/V limb bud axial patterning. [68][69][70][71] Perturbation of seven genes (DVL1, DVL2, DVL3, FZD2, NXN, ROR2, and WNT5A) acting on non-canonical WNT/planar cell polarity (PCP) signaling pathways are known causes of Robinow syndrome (RS), a congenital skeletal dysplasia with mesomelic limb shortening, distinct craniofacial findings enabling a recognizable pattern of human malformation, and other characteristics. [72][73][74] In association with clinical and locus heterogeneity, limb shortening has been described as a common endophenotype in RS, and mesomelia (disproportionately short middle portions of the limb) is the most often observed consistent associated clinical finding in individuals with RS. 74,75 Although biallelic variants of WNT10B have been described to cause SHFM6 because of LoF alleles, heterozygous missense variant alleles of WNT10B, like WNT10A, can result in tooth agenesis, selective type 8 (STHAG8; MIM: 617073) with no distinct mutational location or nature. 76,77 Homozygous mutations in WLS, encoding the Wnt ligand secretion mediator essential for secretion of all Wnt proteins, cause human syndromic structural defects with bilateral split foot and syndactyly. 78 For human WNT signaling, we also reported limb findings of bilateral foot oligodactyly and left-hand syndactyly in the individual with a de novo, likely damaging variant allele of Nemo-like kinase (NLK), an inhibitor of the WNT/b-catenin signaling pathway ( Figure S7; Supplemental Note). 79,80 These clinical observations may not be readily synthesized nor mutational bases discovered and assessed in most standard model organism approaches; thus, the pioneering organism Homo sapiens must be utilized for the allelic series.
CNV of BHLHA9, evidence of extension of an allelespecific gene dosage effect model Gene dosage is a measurement of the number of copies of a gene in a species genome, whose alteration can simply be made by a CNV encompassing the given locus or genetranscriptional processes, such as TAD disruptions, that may affect gene action or expression potentially related to a position effect. Rare CNVs are prominently associated with disease in humans and defined the entire field of genomic disorders. [81][82][83][84] Disease-causing CNVs are often large (hundreds of kb to Mb sizes), can include more than 1 gene, and are often significantly enriched for genes for transcription factors and/or signaling molecules contributing to developmental processes. Such genes may be subject to a gene dosage effect and may be dosage sensitive, and this aspect may serve as the major determinant of its pathogenicity resulting in haploinsufficient and triplosensitive traits and even mirror traits. 57 CNV studies of dosage-sensitive genes have contributed significantly to the progress of disease gene discovery and genomic instability/genome integrity mutagenesis; i.e., SV mutagenesis of the human genome.
BHLHA9 regulates apoptosis during mammalian autopod skeletogenesis and patterning. 85,86 In the Bhlha9 knockout mouse, interdigital programmed apoptosis is jeopardized, resulting in various severities of syndactyly of the fingers and hindlimbs/forelimbs. 85,86 Abnormal expression patterns of Trp63 and other AER morphogens have been observed in Bhlha9 knockout mice. 86 Functional depletion of BHLHA9 has also been observed in individuals with biallelic missense variants involving the DNA-binding domain of BHLHA9, causing mesoaxial synostotic syndactyly with phalangeal reduction (MSSD; MIM: 609432). 87 Complex camptosynpolydactyly (CCSPD; MIM: 6097539), characterized by grossly malformed hands with digits arising from the dorsum of the hand and synpolydactyly, has also been reported in one Indian family. 88 The BHLHA9 variant allele we describe here was homozygous, with the position of amino acid change mapping adjacent to the residue substitution observed in association with MSSD.
BHLHA9 is essential for interdigital apoptosis in central limb mesenchyme cells, and biallelic LoF variants ablate this regulatory ability. We hypothesized that copy number gains involving BHLHA9 might enhance its regulatory potential; thus, the apoptosis activity may be overly or ectopically increased in central limb mesenchymal cells during limb development, causing SHFM with or without long bone deficiency. These phenomena are somewhat analogous to a ''mirror trait''; 64 this, together with its strict spatiotemporal expression pattern, suggested BHLHA9 as a dosage-sensitive gene. Further support for that contention comes from disrupted TAD findings in association with duplication CNV. 89 Therefore, individually subtle differences of perturbations in gene expression caused by CNVs of BHLHA9 may explain the phenotypic variability frequently described in the literature. [90][91][92] Evidence shows that such dominant disease-causing CNVs with less than 50% penetrance, despite having a large effect size, could arise ancestrally and form as ''founder pathogenic alleles'' frequently transmitted in the Japanese population. 56 Mirror traits at the Smith-Magenis syndrome (MIM: 182290)/Potocki-Lupski syndrome (MIM: 610883) locus and the schizophrenia and autism chr1q21.1 loci associated with microcephaly/macrocephaly are mirror trait loci for loci manifesting gene dosage effects. 57,64,93 Triplication and quadruplication involving this locus have also been described in different families associated with SHFLD3 and GWC phenotypes in different populations, with the common observation that the copy number gain value (i.e., the number of gene copies gained) enhances the likelihood of penetrance and severity of the phenotype. 56,88 We observed that homozygous duplication of BHLHA9 can cause GWC, in which SHFLD3 can be seen as an intermediate phenotype. Here we propose that BHLHA9-associated limb malformations serve as a disease model with the association of penetrance and phenotype severity determined by its magnitude of effect on gene function ( Figure 5). 44,[94][95][96] The resultant potential distortions of biological homeostasis could parsimoniously explain how such a ''gene dosage expression model'' significantly contributes to limb anomalies and other developmental birth defects. 44,57

Likelihood of penetrance
or LoF SNV/indel Figure 5. Allele-specific gene dosage model for perturbation of BHLHA9 action This illustration uses BHLHA9 as a ''molecular entry point'' describing the potential genotype-phenotype correlation, penetrance, and expressivity. These observations can be parsimoniously explained by the perturbation of gene dosage underlying different genomic events, e.g., duplication versus triplication or quadruplication of the locus. Copy number gain of BHLHA9 can abnormally enhance apoptosis during limb P/D development, causing the SHFLD phenotype. For individuals, an increased gene copy number increases the gene dosage, reinforcing the phenotype severity, likelihood, and penetrance, with homozygosity resulting in the most intense magnitude of effect (GWC). In contrast, biallelic LoF variant of BHLHA9 result in an attenuated gene dosage, driving a distinct MSSD phenotype characterized by insufficient apoptosis activity during AER stratification.

Challenges and prospects
We provided further support for the contention that the etiological elements of developmental anomalies are rather heterogeneous. With current genome-wide methods (ES and aCGH in this study), we were incapable of revealing other types of genomic events, such as non-coding variant alleles (SNVs or CNVs), copy-number-neutral structural variants, and even some types of indels and repeat expansions (e.g., di-, tri-, and tetranucleotide repeats). Other factors known to cause limb defects, such as postzygotic/somatic mutations, mosaicism, fetal deformation, or exposure to environmental mutagens, are challenging to investigate and positively control.
Prospective short-read and long-read whole-genome sequencing (WGS), and RNA sequencing (RNA-seq) will be employed to detect additional causes in unsolved cases. Concerning overall phenotypic heterogeneity, human phenotype ontology (HPO)-based quantitative phenotypic analysis will be adapted in the future to quantify the limb anomaly trait and investigate the allele-specific differences (or to uncover novel allelic series) in similar CLM spectrums as well as to dissect MPV in AR or AD blended traits. 74,97 Autozygosity mapping and AOH/ROH analyses from unphased exome data in consanguineous families could assist with additional novel disease gene/locus discovery in consanguineous families. 98 These strategies extend the effort to test the hypothesis proposed by Coban-Akdemir et al. that rare, deleterious variants embedded within long-sized AOH/ROH tend to evolve into diseasecontributory haplotypes and whether the intrafamilial clinical heterogeneity could be the resultant from distributive AOH/ROH and MPV. 30 Interpretation of interconnections among allelic series, clinical severity, and reduced penetrance remains an important and challenging genotype/phenotype interpretive biology issue. Future explorations will compare the allelic-specific gene dosage (AsGD) model proposed in this study with the established compound inheritance gene dosage model (CIGD) model to test this hypothesis and provide evidence as to whether such gene dosage expression models significantly contribute to limb anomalies and other developmental malformations. 95,97 Data and code availability This study did not generate any codes or analyze any public datasets. The identified variants in this paper were submitted to the ClinVar with sequential identifiers SCV002546515-SCV002546524. All exome sequences, consent, and phenotypic data reported in this paper can be requested from controlled-access databases on dbGaP: phs000711.v7.p2.