A strategy for molecular diagnostics of Fanconi anemia in Brazilian patients

Abstract
Background: Fanconi anemia (FA) is a predominantly autosomal recessive disease with wide genetic heterogeneity resulting from mutations in several DNA repair pathway genes. To date, 21 genetic subtypes have been identified. We aimed to identify the FA genetic subtypes in the Brazilian population and to develop a strategy for molecular diagnosis applicable to routine clinical use.
Methods: We screened 255 patients from Hospital de Clínicas, Universidade Federal do Paraná for 11 common FA gene mutations. Further analysis by multiplex ligation-dependent probe amplification (MLPA) for FANCA and Sanger sequencing of all coding exons of FANCA, FANCC, and FANCG was performed in patients who harbored a single mutation.
Results: We identified at least one mutation in 128/255 patients (50.2%): 89, 11, and 28 carried FANCA, FANCC, and FANCG mutations, respectively. Of these, 71 harbored homozygous mutations, whereas 57 were compound heterozygous. In 4/57 heterozygous patients, both mutations were identified by the initial screening; in 51/57, additional analyses were required for classification; and in 2/57, the second mutation remained unidentified. We found 52 different mutations, of which 22 were novel.
Conclusion: The proposed method allowed genetic subtyping of 126/255 (49.4%) patients at a significantly reduced time and cost, which makes molecular diagnosis of Brazilian FA patients feasible.


Introduction
Fanconi anemia (FA) is a genome instability syndrome that affects multiple organs and is characterized by a range of physical abnormalities, a predisposition to neoplasias, and stem cell loss leading to progressive bone marrow failure (Crossan and Patel 2012; Kee and D'Andrea 2012; Schneider et al. 2015).
FA can be caused by biallelic mutations in most FA genes, hemizygous mutations in FANCB, or dominant negative mutations in RAD51. Over the last 2 years, five new FA genes were discovered: FANCR/RAD51, FANCS/BRCA1, FANCT/UBE2T, FANCU/XRCC2, and FANCV/MAD2L2/REV7. The rapid advancement in the identification of new FA genes improves the understanding of the role of FA proteins in DNA repair and has important implications for the molecular characterization of FA patients (Vaz et al. 2010; Ameziane et al. 2015; Hira et al. 2015; Sawyer et al. 2015; Bluteau et al. 2016; Mamrak et al. 2016; Park et al. 2016).
The high phenotypic variability in patients with FA and the overlap of symptoms with other syndromes make diagnosis difficult on the basis of clinical manifestations alone, so confirmation of clinical findings by laboratory methods becomes necessary. The classic FA diagnostic test is the detection of cellular hypersensitivity to DNA interstrand crosslinking agents such as diepoxybutane (DEB) and mitomycin C (Oostra et al. 2012; Auerbach 2015). However, a rapid and accurate diagnosis of FA is of great importance, as it significantly affects patient follow-up and treatment decisions. These requirements led to the development of diagnostic techniques based on molecular analysis (Castella et al. 2011; Ameziane et al. 2012; Gille et al. 2012; Aslan et al. 2015).
FA molecular diagnosis has been demanding and onerous due to the genetic heterogeneity associated with the disease. In addition to the number of genes involved, hundreds of unique causative mutations have been reported throughout the FA genes (Wijker et al. 1999; Gille et al. 2012; De Rocco et al. 2014). Founder mutations have been identified in various populations, so information on the ethnic background of the patient might point to a pathogenic mutation that is likely to be causal (Faivre et al. 2000; Tipping et al. 2001; Kutler and Auerbach 2004; Castella et al. 2011).
This study proposes a strategy for the molecular investigation of Brazilian FA patients based on an initial screening for common mutations in the most frequently affected genes FANCA (OMIM 607139), FANCC (OMIM 613899), and FANCG (OMIM 602956). In patients in whom both mutations are not identified by the initial screening approach, further investigation is performed using multiplex ligation-dependent probe amplification (MLPA) and Sanger sequencing of the entire coding region of the genes. The molecular characterization of patients with FA is of major importance because it permits the exclusion of diseases with overlapping clinical symptoms, allows families to receive accurate genetic counseling, and facilitates the development of targeted prenatal genetic testing. In addition, accurate molecular stratification of patients is essential for participation in forthcoming gene therapy trials (Ameziane et al. 2008; Gille et al. 2012; Knies et al. 2012).

Ethical compliance
This study was approved by the HC/UFPR Ethical Committee on Human Research, and informed consent was obtained from subjects or their legal guardians.

Patients
Our cohort included 255 Brazilian probands with FA diagnoses confirmed by chromosomal breakage (DEB) test (Auerbach 2015). Patients were followed at the Fanconi Anemia Outpatient Clinic, Hospital de Clínicas, Universidade Federal do Paraná (HC/UFPR), between 1995 and 2012. All 255 patients were investigated by the proposed screening test, and the investigation proceeded with 128/255 patients in whom at least one FA mutation was identified.

DNA extraction
Genomic DNA was isolated from peripheral blood samples according to Miller et al. (1988) using a modified salting out procedure.

Strategy for molecular investigation of Brazilian patients with FA
Patients were initially screened for common mutations in the FANCA, FANCC, and FANCG genes. MLPA was used to detect FANCA large deletions, and Sanger sequencing of these genes was utilized when the second mutation was not identified either by common mutation screening or by MLPA. Both MLPA and Sanger sequencing were performed at the Department of Clinical Genetics, VU University Medical Center, Amsterdam, the Netherlands, as part of a training program that allowed these methodologies to be implemented at the Laboratory of Immunogenetics of HC/UFPR in Brazil.
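
The decision flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the laboratory's actual software: the names `NextStep` and `triage`, and the representation of screening results as a list of (gene, mutation) alleles, are assumptions introduced here.

```python
from enum import Enum


class NextStep(Enum):
    DONE = "both alleles identified by screening"
    MLPA_THEN_SANGER = "MLPA for FANCA large deletions, then Sanger sequencing of FANCA if needed"
    SANGER = "Sanger sequencing of the implicated gene"
    NOT_PURSUED = "no panel mutation found; outside this strategy"


def triage(screen_hits):
    """Pick the next step from the alleles found by the common-mutation panel.

    screen_hits is a hypothetical list of (gene, mutation) tuples, one per
    mutated allele detected (a homozygous patient contributes two entries).
    """
    if not screen_hits:
        return NextStep.NOT_PURSUED
    if len(screen_hits) > 2:
        raise ValueError("more than two alleles is not handled by this sketch")
    genes = {gene for gene, _ in screen_hits}
    if len(genes) > 1:
        raise ValueError("hits in different genes are not handled by this sketch")
    if len(screen_hits) == 2:
        return NextStep.DONE
    # Exactly one allele found: look for the second one in the same gene.
    if genes == {"FANCA"}:
        return NextStep.MLPA_THEN_SANGER
    return NextStep.SANGER
```

For example, a patient heterozygous for a single FANCA panel mutation is routed to MLPA first, whereas a patient with a single FANCC or FANCG panel mutation goes directly to Sanger sequencing of that gene.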

Screening of FA common mutations
The 11 commonly occurring mutations in the FANCA, FANCC, and FANCG genes were selected to comprise the initial screening panel (Table 1). The methods used to identify each of these mutations were polymerase chain reaction (PCR), amplification-refractory mutation system PCR (ARMS-PCR), and PCR-restriction fragment length polymorphism (RFLP) as shown in Tables 2 and 3.
The PCR and ARMS-PCR methods and the primers for c.1115_1118delTTGG and c.2853-19_2853-1del19 were adapted from Levran et al. (1997). Primers for the mutations c.2535_2536delCT and c.987_990delTCAC were designed using Primer3Plus software (Untergasser et al. 2007). All PCR and ARMS-PCR primers are listed in Table 2. The restriction assays and primers for c.67delG, c.65G>A, and c.456+4A>T were adapted from the original publications (Gibson et al. 1996; Auerbach et al. 2003; Yates et al. 2006). Primers and suitable enzymes for mutations c.1393C>T and c.1480+1G>C were selected using the PCR designer program (http://primer1.soton.ac.uk) (Ke et al. 2002). The mutation c.3788_3790delTCT was detected by a mismatch PCR assay followed by a restriction assay developed by Levran et al. (1997), and detection of c.1077-2A>G was according to Auerbach et al. (2003). All PCR-RFLP primers are listed in Table 3.

Detection of FANCA large deletions by MLPA
MLPA was used to detect deletions and duplications of entire exons in the FANCA gene (Schouten et al. 2002). The Salsa MLPA kit with the probe mixes P031 and P032 for FANCA (MRC Holland, Amsterdam, the Netherlands) was used according to the manufacturer's instructions (www.mlpa.com). Separation and quantification of MLPA products were done on an ABI 3730 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). MLPA data were analyzed using the GeneScan™ 500 TAMRA™ size standard (Applied Biosystems) and GeneMarker software (SoftGenetics, State College, PA, USA) as described in Ameziane et al. (2008).

Sanger sequencing of FANCA, FANCC, and FANCG
In patients for whom only one mutation was identified by the screening test, the DNA sequence of the respective gene was further investigated using Sanger sequencing, except for those with large deletions detected by MLPA. Sanger sequencing was performed with Big Dye Terminator (Applied Biosystems) followed by electrophoresis on an ABI 3730 Genetic Analyzer. Sequence Pilot software (JSI Medical Systems, Kippenheim, Germany) and reference sequences from the RefSeq database (FANCA NM_000135.2; FANCC NM_000136.2; FANCG NM_004629.1) were used for data analysis. The reaction mixtures for the 25 µL PCR reactions were prepared as follows: 0.5 U Platinum Taq polymerase (Invitrogen, Carlsbad, CA, USA), 1.5 mM MgCl2, 0.2 mM dNTPs (Invitrogen), and 10 pmol primers. For the majority of amplicons, standard PCR conditions were used (initial denaturation at 95°C for 5 min, followed by 33 cycles of denaturation at 95°C for 30 sec, annealing at 60°C for 30 sec, and elongation at 72°C for 1 min). Some fragments required special conditions for PCR amplification: exons 5, 7, 13, 21, 26, 31, and 38 of FANCA and exon 7 of FANCC were amplified with an annealing temperature of 55°C, and exon 1 of FANCA with annealing at 64°C and PCR mix supplementation with 10% DMSO. The procedure for sequencing FA genes and the primer sequences are described in Gille et al. (2012). The pathogenic state of new mutations was investigated using the in silico prediction algorithms SIFT, PolyPhen-2, and Align GVGD (Tavtigian et al. 2008; Kumar et al. 2009; Adzhubei et al. 2010), which are integrated in the Alamut software (Interactive Biosoftware, Rouen, France). All known and novel mutations identified in this study were reported to the Fanconi Anemia Mutation Database (http://www.rockefeller.edu/fanconi), a Leiden Open Source Variation Database (LOVD v.3.0) hosted by the Leiden University Medical Center, the Netherlands.
Table 3 footnote: Adapted from Gibson et al. (1996), Levran et al. (1997), Auerbach et al. (2003), and Yates et al. (2006). PCR for c.1077-2A>G was according to Auerbach et al. (2003), and for c.3788_3790delTCT followed Levran et al. (1997). Restriction assays were performed according to the manufacturers' instructions for each enzyme. PCR, polymerase chain reaction; RFLP, restriction fragment length polymorphism.

Mutations identified by the screening panel
Mutations were identified in 128 of the 255 patients of the initial cohort. Twenty-nine of these 128 patients had been previously investigated. Twenty of these 29 patients had the FANCA mutation c.3788_3790delTCT identified in an earlier study by our group (Magdalena et al. 2005). This mutation was homozygous in 12/20 and heterozygous in 8/20 patients, the latter being further investigated in this study for other alterations in FANCA to complete their genetic subtype. Nine of these 29 Brazilian patients (four homozygous and five compound heterozygous) harbored mutations that had been previously identified by Ameziane et al. (2008). The remaining 127/255 patients did not have any of their mutations detected in the screening test.

Investigation of the second mutation using MLPA and Sanger sequencing
Using MLPA and Sanger sequencing, the second mutation was successfully identified in 51/53 heterozygous patients (40 in FANCA, 4 in FANCC, 7 in FANCG). In two of the 42 FANCA heterozygous patients, the second mutation was not identified. Thus, a total of 126 patients had their mutations identified using the proposed strategy (Fig. 1).
The other 30/40 patients had the second FANCA mutation identified by Sanger sequencing. Twenty-four different mutations were identified, of which 13 were novel. The fact that mutation c.2535_2536delCT was detected in compound heterozygosity in two patients, and that mutations c.987_990delTCAC and c.2853-19_2853-1del19 were simultaneously present in one patient, explains the discrepancy between the number of alleles and the number of patients. Mutations c.67delG (FANCC) and c.1480+1G>C (FANCG) were not identified among the 255 patients with the methods used in the screening panel.
In all four FANCC heterozygotes, the second pathogenic mutation was identified by Sanger sequencing, and two of them were new variants (c.338G>A, c.388delGinsAAAA).
Seven different mutations were found among the seven FANCG heterozygous patients; five are novel, and three of the seven are located in exon 10 (Table 5).
Overall, a total of 52 mutations were found in this cohort, 30 of which had already been described in the literature and 22 (42.3%) of which were novel (Tables 5, 6a,b). Thus, screening for the 11 common mutations, with further testing by MLPA and Sanger sequencing when necessary, led to the molecular characterization of 126/255 Brazilian FA patients.
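
The patient counts reported in this section can be cross-checked with a short tally. The sketch below uses only figures stated in the text (71 homozygous patients, 4 compound heterozygotes resolved by screening, 53 patients with a single screening hit, of whom 51 were resolved); the variable names are illustrative.

```python
# Cohort figures as reported in the text.
screened = 255
homozygous = 71                 # both alleles found by the screening panel
compound_het_by_screening = 4   # two different panel mutations found
single_hit = 53                 # only one allele found by the panel
resolved_by_mlpa_or_sanger = 51

with_hit = homozygous + compound_het_by_screening + single_hit
characterized = homozygous + compound_het_by_screening + resolved_by_mlpa_or_sanger
no_hit = screened - with_hit

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

print(with_hit, pct(with_hit, screened))
print(characterized, pct(characterized, screened))
print(no_hit)
print(pct(22, 52))  # share of novel mutations among the 52 found
```

The tally reproduces the 128 patients with at least one mutation (50.2%), the 126 fully characterized patients (49.4%), the 127 patients without a panel mutation, and the 42.3% share of novel mutations.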

Segregation of mutations in Fanconi anemia families
In 95 of 126 patients (75.4%), the segregation of at least one allele was confirmed by the analysis of paternal and/or maternal samples. Segregation analysis was not possible in 31 patients due to the unavailability of maternal and/or paternal samples.

Novel mutations
The 22 novel mutations detected among the 126 patients are outlined in Table 6a,b. Fifteen correspond to changes in the FANCA gene, two in FANCC, and five in FANCG. Of these, 14 were considered to be deleterious because of their functional consequences (frameshift, nonsense, large deletions, and splicing-affecting mutations at positions +1, +2, and −2). For the other eight mutations, comprising missense, in-frame, or splicing-affecting mutations at position +3, the in silico analysis suggests that they are potentially pathogenic; however, confirmation of their effects on the normal function of the gene products is required.

Costs and time estimate
The cost estimate per patient in each step of the proposed algorithm was US$ 60.15 for screening of common mutations, US$ 75.30 for MLPA of FANCA, US$ 643.15 for FANCA sequencing, and US$ 245.20 for FANCC as well as for FANCG sequencing. The estimated cost per patient using this strategy ranged from US$ 60.15 to US$ 1269.00 with an average cost of US$ 703.70. These estimates were based on the number of patients investigated in each phase of the strategy (Fig. 1) and the frequency of FA-A, FA-C, and FA-G complementation groups in the literature. If the screening is not utilized and the search for FA mutations starts straight from MLPA and Sanger sequencing of FANCA, proceeding to FANCC and FANCG when needed, the estimated cost could range from US$ 718.50 to US$ 1209.00 with an average of US$ 905.00 considering the literature information about frequency of FA complementation groups.
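
The reported cost ranges can be reproduced by summing the per-step estimates along the cheapest and the most expensive path through each strategy. The sketch below is a worked check only; which steps make up each path is an assumption inferred from the reported totals, and the names `STEP_COST_USD` and `path_cost` are illustrative.

```python
# Per-step cost estimates in US$, taken from the paragraph above.
STEP_COST_USD = {
    "screening": 60.15,
    "mlpa_fanca": 75.30,
    "seq_fanca": 643.15,
    "seq_fancc": 245.20,
    "seq_fancg": 245.20,
}


def path_cost(steps):
    """Total cost in US$ for one patient going through the given steps."""
    return round(sum(STEP_COST_USD[step] for step in steps), 2)


# Assumed paths: screening alone is the cheapest route with the panel;
# running every step is the most expensive route in either strategy.
with_screening = {
    "min": path_cost(["screening"]),
    "max": path_cost(["screening", "mlpa_fanca", "seq_fanca", "seq_fancc", "seq_fancg"]),
}
without_screening = {
    "min": path_cost(["mlpa_fanca", "seq_fanca"]),
    "max": path_cost(["mlpa_fanca", "seq_fanca", "seq_fancc", "seq_fancg"]),
}

print(with_screening)
print(without_screening)
```

Under these assumptions the sums are US$ 60.15 and US$ 1269.00 with screening, and US$ 718.45 and US$ 1208.85 without it, which agree with the reported ranges after rounding. The averages cannot be recomputed this way because they also depend on the number of patients in each phase and on the literature frequencies of the complementation groups.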
The screening of common mutations requires 5 days, MLPA 3 days, sequencing of FANCA 12 days, and sequencing of FANCC and FANCG 3.5 days each; therefore the turnaround time per patient ranged from 5 to 23.5 days with an average of 11 days. This estimate considered the number of patients identified in each phase of the algorithm (Fig. 1) and the frequency of FA complementation groups in the literature. However, the turnaround time could range from 15 to 22 days with an average time of 13 days if the search for FA mutations starts by MLPA and FANCA sequencing without the initial screening for common mutations.

Discussion
The molecular investigation of FA in Brazilian patients, starting with a panel of common mutations, was proposed to determine the number of patients that could have their mutations identified with this strategy in order to make FA molecular subtyping cost and time efficient in routine clinical diagnostics. FANCA, FANCC, and FANCG were selected to compose the screening panel because they carry deleterious mutations in FA patients more frequently than do other FA genes. The frequency of genetic subtypes in different populations has shown that 85 to 90% of patients are included in the FA-A, FA-C, and FA-G subtypes, whereas the remainder are distributed among the other 18 genetic subtypes, and a few cases are still awaiting genetic classification (Levitus et al. 2006; Auerbach 2009; Ameziane et al. 2012; Gille et al. 2012; Schneider et al. 2015). This screening test, despite including only three genes and 11 mutations, allowed the identification of at least one FA mutation in 50.2% (128/255) of this cohort (Table 4). Considering the high genetic heterogeneity of FA, this strategy demonstrated its relevance to the molecular characterization of Brazilian FA patients. Such knowledge of the recurrent mutations in certain ethnic groups or geographical locations allows for dedicated screening approaches in the Brazilian population, which could also be applied to other populations.
Table footnotes: In 2/128 patients, only the c.3788_3790delTCT mutation was detected, which explains why only 254 out of 256 potential alleles were identified. (5) Deletion starting somewhere upstream from the 5' end of the gene, located at coding DNA nucleotide -42. (6) Probably no protein is produced.
Mutations c.3788_3790delTCT and c.1115_1118delTTGG, included in the screening panel, are the most frequent in FANCA (Levran et al. 1997; Castella et al. 2011). In this study, 69 out of 255 patients (27.1%) carried the c.3788_3790delTCT mutation (Table 4), in agreement with our previous report in which this genetic alteration was found in 24/80 (30%) of Brazilian FA patients (Magdalena et al. 2005). In contrast, c.1115_1118delTTGG was detected in 1.2% of this cohort (Table 4), whereas Levran et al. (2005) had reported a frequency of 2.2% in Brazilian patients and 5.5% in other ethnic backgrounds. In a Spanish FA-A subtype cohort, frequencies of 20.7% for c.3788_3790delTCT and 9.4% for c.1115_1118delTTGG were observed (Castella et al. 2011). The c.456+4A>T FANCC mutation, highly frequent among Ashkenazi Jews, was detected in only four of 255 patients (1.6%), and c.67delG, common among Europeans, was not identified in this Brazilian cohort. These findings reinforce previous observations about differences in the prevalence of some mutations in certain ethnic groups (Faivre et al. 2000; Futaki et al. 2000; Kutler et al. 2003; Kutler and Auerbach 2004; Neveling et al. 2009). FANCC mutations c.65G>A and c.1393C>T were also identified at low frequencies, 1.6% and 1.2%, respectively (Table 4), but higher than those reported in the FA database. This higher frequency was expected for c.1393C>T since it was first identified in Brazilian patients by Yates et al. (2006). Finally, FANCG c.1077-2A>G, initially described by Demuth et al. (2000), was found to be the second most frequent mutation in this study. It was detected in 28 of 255 (11%) patients (Table 4), supporting the previous data of Auerbach et al. (2003), in which all seven FA-G Brazilian patients presented this mutation.
The screening panel developed for this study allowed the identification of homozygous mutations in 71 out of the 255 patients and compound heterozygous mutations in 4/255 patients. Among 53/255 heterozygous patients, two were excluded because only one pathogenic mutation was detected, probably owing to somatic mosaicism that can lead to reversion from the mutant to the wild-type allele (Waisfisz et al. 1999;Ameziane et al. 2008;Neveling et al. 2009;Castella et al. 2011;Gille et al. 2012). To identify the second mutation in the remaining 51 heterozygous patients (40 in FANCA, 4 in FANCC, and 7 in FANCG), additional analysis was required using MLPA and/or Sanger sequencing of the entire respective gene (Fig. 1).
Among the 40 FANCA compound heterozygous patients, 25% had large deletions. Identification of this type of genetic alteration is essential for the molecular diagnosis of FA, as 15-40% of pathogenic mutations in FANCA are caused by large deletions (Gille et al. 2012).
The strategy of starting with the screening for common mutations reduced the need for MLPA and/or sequencing analysis in 29.4% of the cases, as 75/255 had the two mutations detected by the initial screening. Furthermore, this approach allowed the sequencing of a single gene in 20.0% of the heterozygous patients, as the second mutation is likely to be detected in the same gene previously implicated in the disorder.
The sequencing data of FA genes in those patients whose second mutation was not identified by the screening panel showed that some FANCA exons were more frequently mutated among the Brazilian patients. This information led to the proposal of sequencing stratification into two sets according to the observed mutation frequencies. The first set would include exons 2, 8, 28, 32, 37, and 41 and the second set exons 4, 10, 16, 29, 36, 40, and 42. In most cases, this strategy would avoid the need to sequence the other 30 FANCA exons.
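
The two-set proposal amounts to a tiered sequencing order. The sketch below illustrates it under two assumptions that are not part of the study's methods: that FANCA has 43 exons (consistent with the 13 listed exons plus "the other 30"), and that sequencing stops after the first tier in which the second mutation is found. The function names are illustrative.

```python
FANCA_EXON_COUNT = 43  # assumed total, consistent with 13 tiered exons + 30 others

TIER_1 = (2, 8, 28, 32, 37, 41)
TIER_2 = (4, 10, 16, 29, 36, 40, 42)


def sequencing_tiers():
    """Return the exon tiers in the order they would be sequenced."""
    tiered = set(TIER_1) | set(TIER_2)
    remaining = tuple(
        exon for exon in range(1, FANCA_EXON_COUNT + 1) if exon not in tiered
    )
    return [TIER_1, TIER_2, remaining]


def exons_sequenced_until_hit(mutated_exon):
    """List the exons sequenced, tier by tier, until the mutated exon is covered."""
    sequenced = []
    for tier in sequencing_tiers():
        sequenced.extend(tier)
        if mutated_exon in tier:
            break
    return sequenced


print(len(exons_sequenced_until_hit(32)))  # mutation in a first-set exon
print(len(exons_sequenced_until_hit(29)))  # mutation in a second-set exon
```

A second mutation located in a first-set exon would be reached after sequencing 6 exons, and one in a second-set exon after 13, compared with all 43 exons when sequencing is not stratified.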
In total, 52 deleterious mutations were found in 126 molecularly characterized patients, including 17 point mutations, 18 small insertions and deletions, nine changes in splicing sites, and eight large deletions (Table 5). Of these variants, 22 (42.3%) are novel mutations not previously described (Table 6a,b). All novel mutations were heterozygous with another previously identified deleterious mutation in the same gene, which suggests their pathogenicity. Nevertheless, this might be a consequence of the initial strategy where only the compound heterozygous patients were sequenced. It is also very likely that patients homozygous for one of the novel mutations would exist as well. Most novel mutations were identified in only one patient, but three of the FANCA variants (c.190-2A>T, c.3163C>G, and c.4082A>C) were found in two nonconsanguineous patients.
No functional studies or cDNA analyses have been performed to confirm the pathogenicity of these newly identified genetic changes. However, their pathogenicity is very likely according to the in silico evaluation using predictive algorithms such as SIFT, PolyPhen-2, and Align GVGD (Tavtigian et al. 2008; Kumar et al. 2009; Adzhubei et al. 2010).
The segregation of the mutant alleles in the families is another finding suggesting a pathogenic role for these novel mutations. It was possible to investigate the segregation of maternal and paternal alleles for 12 of the 22 novel variants. The large deletions were not investigated in parental samples because of their evident functional consequences. Furthermore, the definition of an inherited mutation enables the investigation of heterozygous carriers as well as genetic counseling. All FA patients with identified mutations should be referred for genetic counseling, along with their parents and siblings, whose mutation status should be characterized regardless of the manifestation of congenital malformations or clinical symptoms.
Genetic subtyping in this study showed that 69.0% of the patients whose molecular investigation was completed belonged to FA-A, 8.7% to FA-C, and 22.2% to FA-G. Comparatively, Gille et al. (2012) evaluated patients from 11 nationalities and found that 57% were FA-A, 11% FA-C, and 9% FA-G, whereas only one family was identified in each of the other groups (FA-E, FA-F, and FA-B). Their results were similar to those found by other groups (Faivre et al. 2000; Kutler et al. 2003; Auerbach 2009). In the present cohort of Brazilian patients, 22.2% belonged to the FA-G subtype compared to the approximately 10% reported in the literature. This higher frequency might be due to the composition of the screening panel, which included the FANCG mutation c.1077-2A>G and may therefore introduce a bias, as investigations were only pursued for patients showing one of the 11 mutations in the initial panel.
All patients who had none of their mutations identified by the screening panel will be further investigated in subsequent studies. More comprehensive strategies, such as complete sequencing of FANCA, FANCC, and FANCG as well as next-generation sequencing, which allows for the simultaneous investigation of all FA genes, will be utilized (Knies et al. 2012; De Rocco et al. 2014). The accurate frequency of FA subtypes among Brazilian patients will be known only after the molecular characterization of all 255 patients included in the initial cohort of this study is concluded.
In general, patients with FA require special clinical management, and molecular diagnosis is necessary to rule out other diseases with overlapping phenotypes. A well-established FA diagnosis requires a positive result from the chromosomal breakage test as well as the detection of pathogenic mutations in FA genes (Gille et al. 2012). The screening for the most frequent mutations together with MLPA and Sanger sequencing of the FANCA, FANCC, and FANCG genes enables the genetic subtyping of 50% of Brazilian patients.
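
The subtype percentages follow from the patient counts given in the Results. In the sketch below, the FA-A count of 87 is derived rather than quoted: it assumes the 89 patients with FANCA mutations minus the two FANCA heterozygotes whose second mutation was not identified. The counts for FA-C (11) and FA-G (28) are taken from the text.

```python
# Patients with completed molecular investigation, per subtype.
# FA-A is derived as 89 - 2 (an assumption explained in the lead-in).
subtyped = {"FA-A": 89 - 2, "FA-C": 11, "FA-G": 28}

total = sum(subtyped.values())
shares = {subtype: round(100 * n / total, 1) for subtype, n in subtyped.items()}

print(total)
print(shares)
```

With these counts the total is 126 patients, and the shares come out at 69.0% FA-A, 8.7% FA-C, and 22.2% FA-G, matching the figures above.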
The proposed approach led to reductions of 22% in cost and 16% in turnaround time, when compared to the cost of identifying mutations utilizing only MLPA and Sanger sequencing, thus enhancing the feasibility of the molecular characterization of FA patients in a routine diagnostic setting. Next-generation sequencing has been presented as an alternative for sequencing genes with lower cost, but these methods are not feasible for all clinical laboratories worldwide and are not always suitable for the analysis of small numbers of samples at a time. The assignment of Brazilian patients to their genetic FA subtypes and the identification of their respective mutations provided data that were still underexplored in Brazil.
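
As a rough check, the relative reductions can be recomputed from the rounded averages given in the Costs and time estimate section (US$ 703.70 versus US$ 905.00, and 11 versus 13 days). The helper name `relative_reduction` is illustrative, and the inputs are the published rounded averages rather than the underlying per-patient data.

```python
def relative_reduction(with_screening, without_screening):
    """Relative saving of the screening-first strategy, in percent."""
    return 100 * (without_screening - with_screening) / without_screening


cost_reduction = relative_reduction(703.70, 905.00)  # average cost, US$
time_reduction = relative_reduction(11, 13)          # average turnaround, days

print(round(cost_reduction, 1), round(time_reduction, 1))
```

The rounded averages give a cost reduction of about 22% and a time reduction of about 15%; the small gap to the 16% quoted above may simply reflect rounding of the published averages.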
The knowledge of the genetic profiles of the Brazilian patients with Fanconi anemia will enable future studies to investigate if there is any influence of the distinct FA genotypes on the clinical course of this rare disease.