Introduction

Fanconi anemia (FA) is a rare Mendelian disease with variable penetrance and great heterogeneity in clinical manifestations and genetics. Clinical features include bone marrow failure, congenital malformations involving the skeletal system, gastrointestinal tract, heart, and kidneys, and predisposition to hematologic and solid malignancies [1]. As the most prevalent inherited bone marrow failure syndrome, the incidence of FA is about 1/200,000 worldwide [2]. Twenty-two causative genes have been identified to date, among which FANCA gene harbors causal variants contributing to over 60% of all cases. Previous studies have revealed high heterogeneity in variations of FA-related genes, including single-nucleotide variations (SNVs), small insertion/deletions (InDels), and frequently occurring large fragment deletions, especially in FANCA gene. The proportions of large deletions in FANCA pathogenic variants in different races range from 23% to 40% [1, 3,4,5,6], while in Afrikaner the rate surges to 80% because of funder effect [7]. The composition of genetic abnormalities in Chinese FA patients has not been investigated yet, leaving the epidemiology blank and making it extremely tricky for detection strategy design and evaluation. In this case, we found the compound heterozygous FANCA overlapping novel submicroscopic deletions of 1 kb and 42 kb in length, respectively, resulting in two different FANCA–VPS9D1 fusion transcripts in a FA pedigree via integrative approaches.

Materials and methods

Clinical report

A 7-year-old boy presented with pale, recurrent fever, and cough. He was the first child of a nonconsanguineous couple, and both of his parents were healthy. The proband was born full-term by normal delivery with the birth weight of 3.2 kg. He accepted extra finger resection on the right hand 1 year ago. No family members (both the parents and a younger brother) experienced symptoms or manifestations similar to those of the proband. Physical examination revealed short stature, systematic skin pigmentation, bilateral ptosis, hypertelorism, flat nose bridge, and bilateral ulnar deviation malformation of the thumbs. The proband had already been to two local medical centers before he came to our hospital. His complete blood cell count (CBC) results revealed pancytopenia, and bone marrow smear displayed hypoplasia. He was diagnosed with aplastic anemia and treated with antibiotics and blood transfusion in local hospitals.

His first CBC results in our hospital were as follows: white blood cell, 1.59 × 109/L; hemoglobin, 84.60 g/L; red blood cell, 2.77 × 1012/L; platelets, 106 × 109/L (3 days after platelet transfusion in a local hospital). CBC 10 days later showed a drastic drop of platelets to 11.1 × 109/L. Bone marrow smear indicated marked hypoplasia and the absence of megakaryocytes. Echocardiogram found a patent foramen ovale 3 mm in diameter. The patient was transfusion dependent and highly suspicious of FA. We then performed genetic tests on the proband and his parent to identify the genetic basis and for donor selection. Written and informed consent was obtained from the proband’s parents and the healthy controls in accordance with the Declaration of Helsinki. This study was approved by the ethics committee of Hebei Yanda Lu Daopei hospital.

Genetic test

Chromosome breakage test

Eight milliliters peripheral blood sample from the patient and 20 health controls was collected and anticoagulated with sodium heparin. Lymphocyte was obtained using Ficoll–Paque density gradient media and cultured in medium added with phytohaemagglutinin (PHA) and mitomycin C (MMC) for 72 h. Concentrations of MMC were 0, 50, and 100 ng/mL, respectively. Cultures were exposed to Colcemid before harvest. In the harvest process, the cells were transferred into hypotonic solution and fixed with glacial acetic acid and methanol. Metaphases were stained with Giemsa’s, and 100 metaphases of the proband and 20 metaphases of the controls in each culture with different MMC concentration were analyzed [8].

Nucleic acid extraction

Blood samples were collected from the proband, his parents, and 100 healthy controls. Genomic DNA was extracted from nucleate cells by silica gel column methods. Total RNA of the patient and his parents was isolated by employing the guanidinium thiocyanate-phenol chloroform method. Complementary DNA (cDNA) was synthesized using Maxima First Strand cDNA Synthesis Kit (Thermo Fisher, USA) according to the manufacturer’s protocol.

Targeted high-throughput sequencing (THS)

THS was carried out on the trio samples and normal reference samples using a panel containing BRCA2 (#MIM 600185), FANCA (#MIM 607139), FANCC (#MIM 613899), FANCD2 (#MIM 613984), and FANCG (#MIM 602956). Variants in the five genes account for more than 90% FA cases according to the literature reported [1]. Two hundred twelve amplicons were employed to cover the entire coding exons and flanking sequences of these genes. The target sequences were amplified and captured by an Ion Ampliseq Library kit 2.0–96V. Amplicons were amplified by an Ion PGM Hi-Q View OT2 Kit and sequenced with Ion PGM Hi-Q view Sequencing kit on Ion Torrent PGM. (All the kits and the sequencer were manufactured by Thermo Fisher, USA) Average sequencing depth was 300X. Analysis workflow was based on the plugins provided in Ion torrent suite. Variant calling was performed with TVC v5.0–13 from Ion torrent suite and VCFs were annotated with ANNOVAR update on 01 Jun 2017.

SNP array

Human OmniZhonghua8 V1.3 chip (Illumina, USA) containing 878,299 markers was used on the trio samples. Each sample underwent quality control, including staining, extension, target removal, hybridization, stringency, non-specific binding, and non-polymorphic control. Raw data were processed by Illumina Genomestudio. CNV Analysis workbench was performed and displayed by cnvPartition CNV Analysis Plug-in 3.2.0 in the Genomestudio Genotyping Module adopting the B allele frequency and log R ratio.

Low-WGS

The trio samples underwent low-coverage WGS at a mean coverage of 3× and 10×. The libraries were constructed with NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, USA) according to the manufacturer's instructions, followed by sequencing on Illumina HiSeq X Ten platform (Illumina, USA) using HiSeq X Ten Reagent Kit v2.5 (Illumina, USA) running on paired-end 150 bp mode. Reads were aligned to the Genome Reference Consortium Human Genome Build 37 (GRCh37/hg19) assembly of the human genome with bwa mem.

CNV analysis based on THS

THS data of normal reference samples and trio samples in the form of bam format were aligned with TMAP v 5.0.4 [9] software as output. ONCOCNV v6.8 [10] was adopted for CNV calling. The analysis process was carried out using the default parameters. We adopted the read-depth-based tool CNVnator v0.3.3 [11], combinatorial tools of Delly2 v0.8.1 [12], lumpy v0.2.13 [13] and GROM-RD v1.0 [14] to verify CNVs in this pedigree. All the tools needed bam format file as input data. Before further analysis, clean fastq files produced by Illumina NovaSeq went through quality control by fastqc v0.11.3, and mapping to hg19 by bwa mem v0.7.10-r789.

CNV visualization

NGS data in bam format was visualized with Integrative Genomics Viewer (IGV, v2.4.10). With the option of “show soft-clipped bases” in alignment preferences and viewed as pairs (to visualize discordant reads) in sample tracks, the CNVs and exact breakpoints (if captured in reads) were manual inspected for more confidence.

Validation by Sanger sequencing

The putative breakpoints and transcripts were validated by polymerase chain reaction (PCR) and Sanger sequencing (Supplementary Table 1). PCR products were Sanger sequenced forwardly and reversely, which were performed on the ABI 3500xL Genetic Analyzer (Thermo Fisher, USA).

Breakpoint sequence context analysis

Censor Server (https://www.girinst.org/censor/) and RepeatMasker (https://www.repeatmasker.org) were adopted to search for interspersed repeats in all the breakpoint regions. All alignments were performed using forward strands and reverse complements to match the Alu canonical sequence.

Results

Increased percentages of aberrant cells and chromosome breakages rates

Chromosome breakage test showed significant increase of both the percentage of aberrant cells and the mean chromosome breakage rate of the proband in every culture of different MMC concentrations compared with those of the controls (Supplementary Table 2; Supplementary Fig. 1).

THS found two deletions in the pedigree

The initial use of THS in our lab was to analyze SNPs and InDels. We failed to detect any pathologic variant of the aforementioned five genes in the proband’s sample when the annotated bam data were manually inspected in IGV. When we went over the low-coverage region, however, the homozygous loss of exon 37 in FANCA gene was observed and further proved by PCR using primers E37F/E37R and agarose gel electrophoresis (AGE) (Fig. 1a). We adopted the software ONCOCNV including a multifactor normalization and annotation technique to detect copy number aberrations, which is developed especially for targeted amplicon sequencing with a precision comparable to arrayCGH. After normalizing the read counts with a normal reference samples pool, the ONCOCNV further segmented the copy number changes by CBS algorithm.

Fig. 1
figure 1

CNV analysis based on THS and SNP array. a THS uncovered the deletion of the region containing FANCA exon 37 in the proband’s sample and PCR and AGE verified the deletion. b Log R ratio of THS data. Each dot represents the log R ratio of the reads of an amplicon. The light green dots are outlier. The purple horizontal lines represent the mean value. The log R ratio of the proband reveals a CNV (the blue shade) in the same region of his mother located in the locus of FANCA (Supplementary Tables 3 and  4). The light green dots indicated by black arrows in the proband’s plot and the father’s plot can be conferred to be a deletion in the region of FANCA exon 37 in Supplementary Table 3 and Supplementary Table 5. c CNV analysis of SNP array. From top to bottom, three bars in each plot is the B Allele frequencies, Log R ratio, and CNV values of SNP markers interspersing in chromosome 16, respectively. A CNV is captured and highlighted by pink vertical in the proband’s plot. In the mother’s plot, the signal is depressed in the same region (the blue shade) of the pink vertical but CNV cannot be conferred. In the father’s plot, no abnormality is captured (color figure online)

Ultimately, log R ratio of THS data revealed suspected compound heterozygous exons deletions of FANCA gene in the proband, with a larger heterozygous deletion of exon 32–43 extending to the end of the gene coding sequence and a suspicious homozygous deletion of exon 37. Further, with the same software and the THS data of the parents, a heterozygous deletion of exon 37 was confirmed in his father; meanwhile a larger heterozygous deletion of exon 32–43 was confirmed in his mother (Fig. 1b, Supplementary Tables 35).

SNP array detected one submicroscopic deletion in the proband

SNP array detected a heterozygous submicroscopic deletion in the proband in the region of 16q24.3, the exact location of the FANCA gene. The coordinates of the breakpoints were unspecific. In the same region of his mother, a tendency of CNV was caught in sight, but the evidence was not robust enough to confirm it. Within the father’s sample, we failed to discover any CNV via this method (Fig. 1c).

low-WGS located the breakpoints accurately

3× WGS found both deletions in the trio’s samples, and located the 1014 bp paternal deleted fragment (del1K) in chr16: 89808940-89809954 accurately referring to GRCh37/hg19 in the proband and his father containing FANCA exon 37 (chr16:89809559-89809697). When it came to the maternal deletion, the stdev of the reads depths and p values of the proband (st. dev. 6, p = 1.35 × 10−7) and his mother (st. dev. 6, p = 7.63 × 10−11) indicated the deletion with copy number called by GROM-RD [14] but failed to locate the coordinate (Fig. 2a).

Fig. 2
figure 2

CNV analysis based on Low-WGS. Low-WGS in the depth of 3× and 10× are visualized in IGV and displayed in (a) and (b), respectively. The upper two interfaces display the paternal deletion and the lower two display the maternal deletion. Discordant reads are represented in red color and the deletion is indicated with red lines. In the lower interface of (a), the coverage in the region of maternal deletion is obviously lower (the blue shade) but the coordinate cannot be identified. In accordance with the 3× WGS, 10× WGS reveals and locates the paternal deletion in the upper interface of (b). The lower interface of Fig. 2B shows the maternal deletion and its precise coordinate (color figure online)

10× WGS was then performed and showed the capacity of locating the breakpoints of both paternal and maternal fragments precisely (Fig. 2b). In accordance with the 3× WGS, the 10× WGS confirmed the del1K deletion and its exact coordinate, and further revealed the 42 kb maternal deletion (del42K) located in chr16:89780001-89822000, which encompasses exon 32–43 of FANCA, the whole range of the adjacent ZNF276, and exon 1–4 of the downstream VPS9D1 gene (Fig. 2b). Besides, 10× WGS identified a 107 bp insertion between the breakpoints of the maternal deletion. Except three continuous bases at the 3′ end, the reverse complement of the rest 104 bp insertion sequence mapped to the intron 2 of MRPS23 gene with the identity of 98.1% (102/104).

Genomic breakpoints and aberrant transcripts validated by Sanger sequencing

Regions containing genomic breakpoints of the del1K were successfully amplified and sequenced using primers 1KF/1KR in both samples of the proband and his father (Fig. 3a, b) and regions containing the breakpoints of the del42K were amplified and sequenced using primers 42KF/42KR in the samples of the proband and his mother (Fig. 3c). The putative del exon 37 mRNA transcript of paternal allele was amplified and verified using primers R1KF/R1KR in both cDNA samples (Fig. 4a, b). Two distinct FANCA–VPS9D1 fusion transcripts’ maternal allele were revealed with primers R42KF/R42KR by PCR in the cDNA of the proband and his mother, and further conferred by Sanger sequencing (Fig. 4c).

Fig. 3
figure 3

Breakpoints validation using genomic DNA. a AGE of PCR products of the trio DNA and one control using primer pairs 42K and 1K. b Diagram of the breakpoints in FANCA gene and DNA Sanger sequencing result of the paternal FANCA allele. c Diagram of the breakpoints in involved genes and Sanger sequencing result of the maternal allele

Fig. 4
figure 4

Transcripts validation using total cDNA. a AGE reveals one transcript of the paternal deletion (lane Proband-R1K and lane Father-R1K) and two distinct transcripts of the maternal deletion (lane Proband-R1K and lane Father-R1K). b Diagram of the transcript of paternal FANCA allele and Sanger sequencing validation. c Diagrams and Sanger sequencing results of the two transcripts of maternal allele

Alu elements identification and homology analysis

Four Alu elements were confirmed and presumed to participate in the two deletion events. The reverse complementary strands of AluSp in FANCA intron 36 and 7SL in intron 37 were presumed to mediate the paternal deletion (Fig. 5a). AluYk4 in intron 4 of VPS9D1 gene, which is on the downstream of FANCA, and AluYk12 in MRPS23 intron 2 were presumed to participate in the 3′ terminal genomic recombination of the maternal allele (Fig. 5b).

Fig. 5
figure 5

The rearrangement events and homology alignment. a An AluSp and a 7SL sharing a microhomology sequence induce the rearrangement within FANCA gene and form a novel chimeric Alu element in the paternal allele. Alu elements are represented by block arrows and the direction of the block arrows indicates the Alus orientation. The microhomology region is represented by an orange vertical bar, gray block arrows represent Alus irrelevant with the recombination event. b The maternal rearrangement involves four genes and one participated Alu-pair, AluYk4 and AluYk12. The microhomology region is represented by the brown vertical bar. Two genomic rearrangement events happened in the maternal allele, followed by the formation of FANCA–VPS9D1 fusion gene with a chimeric fragment of the reverse complement of MRPS23 intron 2. c The homology alignment of the two pairs of microhomology sequence. The identities of 7SL-AluSp and AluYk12-AluYk4 are 85% and 92%, respectively (color figure online)

Retrieval and functional analysis of the CNVs

None of the deletions discovered in this report have been included in the Database of Genomic Variants Archive (www.ebi.ac.uk/dgva/), the Database of Genomic Structural Variation (www.ncbi.nlm.nih.gov/dbvar/), or ClinVar (www.ncbi.nlm.nih.gov/clinvar/). Since the records are mostly submitted by Western countries and Japanese research institutions, we cannot conclude that the two deletions are rare variants or not. Moreover, we confirmed the del1K with the same genomic coordinates as those of the pedigree together with a heterozygous FANCA c.2611C>G/p.L871V variant in a 16-year-old girl diagnosed as aplastic anemia-paroxysmal nocturnal hemoglobinuria syndrome without malformation or increased chromosome breakages rate. It provides a trace that the del1K might not be a private one. Nationwide study of the genetic basis of FA is needed and case reports are also encouraged to be submitted to these databases to enrich our knowledge of the FA genetic epidemiology in Chinese population.

The normal FANCA gene encodes a 1455 aa protein (NP_000126.2), which contains a nuclear localization signal (NLS) motif, a Fanconi anemia group A protein N terminus region (Fanconi_A_N), and a Fanconi anemia group A protein region (Fanconi_A) (Fig. 6). In this case, the transcripts generated by the two deletions putatively encoded three different aberrant proteins. The C termini were truncated and the Fanconi_A regions were lost in all the three mutated FA proteins (Fig. 6). Due to the lack of the integrity of the FANCA gene, particularly the loss of Fanconi_A, we conclude that these two deletions are clearly pathogenic.

Fig. 6
figure 6

Motifs and domains of normal FANCA protein and putative aberrant proteins. The normal FANCA protein (NP_000126.2) contains 1455 aa residues, which constitute three currently known domains. The 18th–34th amino acid residues form the nuclear localization signal (NLS) motif represented by the blue box. The 172nd–522th amino acid residues constitute the Fanconi anemia group A protein N terminus region (Fanconi_A_N, pfam15865) represented by the green box. The purple box is Fanconi anemia group A protein region (Fanconi_A, pfam03511), consisting of the 1266th–1312nd amino acid residues. The second diagram depicts the putative mutant protein encoded by the paternal FANCA allele. The third and the fourth diagrams delineate two speculated FANCA–VPS9D1 fusion proteins of the maternal transcripts. In these three mutant proteins, the NSLs and the Fanconi_A_Ns are retained, but the Fanconi_A domains are lost (color figure online)

Discussion

Large deletions are a kind of notable pathogenic variants of FANCA, 75% of which are attributed to Alu-mediated non-allelic homologous recombination (NAHR) [5, 6, 15]. Song et al. [16] analyzed 219 CNV-Alu pairs from 58 articles and concluded that the sizes of deletions mediated by Alu retrotransposition vary from ~800 bp to ~4 Mb. Moreover, Alu-mediated recombination is more likely to be induced by younger Alu pairs, such as AluS and AluY subfamilies [16]. In this case, there are three fragments that participated in recombination; besides one intragenic deletion, they also generated one large deletion combined with one 107 bp inserted sequence of MRPS23 intron 2 and formed a FANCA–VPS9D1 fusion gene (Fig. 3). The sizes of the two deletions are 1014 bp and 42 kbp and are mediated by 7SL-AluSp and AluYk4-AluYk12, respectively, in accordance with the previous reports.

PCR and AGE of cDNA of the proband and his mother discovered two distinct FANCA–VPS9D1 fusion transcripts, and Sanger sequencing revealed the FANCA exon 30 skipping of transcript 2 (Fig. 4c). Exon skipping is a common phenomenon in alternative splicing and pathogenetic variants. Many SNVs including splice site mutations and missense mutations and InDels disrupting splicing donor or acceptor sites have been uncovered to lead to exon skipping [3]. Alu elements can also induce exon skipping and this effect can happen both upstream and downstream nearby alternative exons [17, 18]. This splicing change may be attributed to the over-suppression of Alu sequence exonization. To inhibit the exonization, hnRNPC, a kind of RNA binding protein that interferes with the mRNA synthesis process, is recruited to the splice sites in Alu sequences competing with the splicing factor U2AF65 [19]. The suppression effect may hinder mRNA splicing of the adjacent exons and eventually result in exon skipping. Although we found a nearby Alu element 338 bp away upstream of exon 30, there is no exon 30 skipping observed in healthy control or recorded in the CCDS database. To sum up, this exon skipping is more likely induced by the maternal genomic recombination.

About half of the large deletions extend beyond the region of FANCA gene, affecting the integrity of neighboring genes, which may exert influence on the patients’ phenotype [15]. In this case, the downstream boundary of the maternal deletion extends beyond FANCA, resulting in the loss of the whole sequence of ZNF267 gene and the deletion of VPS9D1 gene exon1–4. VPS9D1 encodes a GTPase activator and transporter protein, but has not been included in Online Mendelian Inheritance in Man database (OMIM). ZNF276 (#MIM 608460) encodes a DNA binding protein, but its particular function is still unclear, and no phenotype corresponding to its aberration is recorded in OMIM. The proband in this case manifested classically with complicated FA-associated features, but without any other symptoms irrelevant to FA. In addition, the mother harboring the 42 kb deletion is apparently healthy and without of any obvious abnormality, which demonstrates that monoallelic dysfunction of VPS9D1 or ZNF276 will not induce any hereditary diseases.

Among the applications of NGS, THS is still widely used for the best cost performance in clinical molecular diagnosis. With the growing awareness that the various genome variants (from SNPs and InDels to structural and numeric variants) may contribute to disease risks, the ability to reveal the comprehensive molecular profile should be emphasized. Although high coverage WGS can identify all kinds of variants in a single assay, the high cost makes it hard to be put into clinical practice at least by now. Hence, the integrated approaches may be actually practical in most labs.

In this case, in-depth analysis of THS data revealed a homozygous deletion of a region including exon 37 with the surrounding exons heterozygous deleted. To further locate the precise breakpoint at base level, we cross validated the variants via orthogonal methods. Sanger sequencing was the first one thought to capture the CNVs. It is easy to locate the breakpoints of short copy number deletion. But for longer ones, it is time-consuming and laborious to design primers to capture the obscure target. In this situation, genome-wide CNV analysis should be applied. Currently, in clinical routine practice, chromosomal microarray analysis (CMA), including array CGH and SNP array, has been considered as the first-tier approache. But the SNP array we performed could not clearly figure out the two submicroscopic fragment deletions.

WGS has improved the detection sensitivity of genetic abnormalities with unprecedented resolution to base level when the breakpoint can be captured into some split-reads. According to Bo Zhou et al., even just 1× coverage with short insert paired-end WGS outperforms most arrays [20]. In our study, we took the advantage of low WGS to identify the compound heterozygous deletions of FANCA gene quickly and accurately. We adopted both 3× and 10× WGS to cope with the longer CNV. As expected, both sequencing coverages could identify the focal FANCA deletions with similar resolution, though the higher coverage could capture the exact breakpoints through split reads. From the perspective of CNVs and balanced structural variants, the shallow sequencing has more potential to be translated into clinical applications [21].

Although it is prevalent to integrate multiple technologies to reveal comprehensive molecular profiles, there is also a trend to obtain the comprehensive profile of the entire genome in one single assay with the benefit of both time and cost, which attaches most importance to the clinical accessibility. With the decreasing cost and algorithm development of NGS, some comprehensive solutions are emerging [22, 23]. Theoretically, WGS possesses the ability to resolve all variant types, but the current bioinformatics algorithms and sequencing technology should be further improved to resolve more complex variants besides SNPs and InDels, such as interchromosomal translocation and intrachromosomal segmental deletion with insertion of other arbitrary genome fragments. We envisage a perfect molecular test capable of obtaining patients’ comprehensive molecular profiles speedily and accurately in a single assay with great financial benefit.