Next-Generation Sequencing for Molecular Diagnosis of Cystic Fibrosis in a Brazilian Cohort

Cystic fibrosis (CF), an autosomal recessive genetic disease, is recognized as one of the most prevalent diseases in Caucasian populations. Epidemiological data show that the incidence of CF varies between countries and ethnic groups in the same region. CF occurs due to pathogenic variants in the gene encoding cystic fibrosis transmembrane conductance regulator (CFTR), located on chromosome 7q31.2. To date, more than 2,000 variants have been registered in the CFTR database. The study of these variants leads to the diagnosis and the possibility of a specific treatment for each patient through precision medicine. In this study, complete screening of CFTR was performed through next-generation sequencing (NGS) to gain insight into the variants circulating in the population of Rio de Janeiro and to provide patient access to treatment through genotype-specific therapies. Samples from 93 patients with an inconclusive molecular diagnosis were subjected to full-length screening of CFTR using an Illumina HiSeq NGS platform. Among these patients, 46 had two pathogenic variants, whereas 12 had only one CFTR variant. Twenty-four variants were not part of our routine screening. Of these 24 variants, V938Gfs*37 had not been described in the CF databases previously. This research achieved a molecular diagnosis of the patients with CF and identification of possible molecular candidates for genotype-specific treatments.


Introduction
The cystic fibrosis transmembrane conductance regulator gene (CFTR; OMIM #602421) encodes a chloride channel that is located in the apical membrane of epithelial cells [1]. Variants in this gene cause a reduction or complete absence of channel activity, leading to the development of a life-threatening illness known as cystic fibrosis (CF; OMIM #219700) or mucoviscidosis [2]. CF is characterized as a multisystem disease with an autosomal recessive inheritance pattern. Patients exhibit progressive manifestations of obstructive pulmonary disease, pancreatic insufficiency, and high concentrations of chloride in the sweat [3][4][5].
With the identification of CFTR in 1989 [6], genetic analysis to identify disease-causing variants in this gene began, improving the diagnosis of CF and identification of pathogenic variant carriers. The most prevalent pathogenic variant, discovered 30 years ago, is a deletion of a phenylalanine at position 508 of the protein (F508del; c.1521_1523delCTT; p.Phe508del), present in one or both alleles in approximately 90% of cases in some populations [7,8]. Additionally, genetic studies helped clarify the correlation between CFTR dysfunction and the clinical characteristics, revealing that defects in CFTR can create other phenotypes besides CF [9][10][11].
Currently, more than 2,000 variants have been described over all 27 exons of CFTR (http://www.genet.sickkids.on.ca/cftr/StatisticsPage.html), although only some of them are pathogenic [12]. Pathogenic variants are grouped into six classes according to their primary biological defects [13].
Understanding the process of CFTR synthesis up to its targeting to the plasma membrane is essential for the development of specific treatments. These treatments could be used to correct defective CFTRs according to the pathogenic variant of each patient. The genotype-specific therapeutic approach focuses on the detection of small modulatory molecules capable of correcting deficient subcellular trafficking of CFTR ("correctors") or on the defective gating ("potentiators") [14,15]. The identification of pathogenic variants is important for early diagnosis, allowing a more effective treatment and a longer life expectancy for the patients [12,16].
The heterogeneous distribution of CFTR variants worldwide and the size of the gene represent major challenges for the molecular diagnosis of CF. Thus, establishing population-specific mutation panels is extremely important [17,18]. With new sequencing technologies becoming easily available, it is possible to rapidly generate a large amount of sequencing data, expanding the analysis of CFTR and uncovering population-specific mutation panels, increasing the sensitivity and specificity of available diagnostic strategies for various populations or ethnic groups [19,20].
The aim of this study was to perform a complete screening of CFTR through next-generation sequencing (NGS) to investigate the variants prevalent in the population in Rio de Janeiro, Brazil.

[21], were invited to participate in this study. In total, 217 patients agreed to participate in the study (Figure 1). Most of these patients (198 individuals) were initially screened for 27 known CF variants (Table 1)

Figure 1: Flowchart of the studied sample. Of 198 probands with clinical suspicion of cystic fibrosis (CF), 74 candidates for next-generation sequencing (NGS) were part of an initial genetic screening of 27 variants. After the first stage, these patients were found to have one or no mutated CFTR allele. Additionally, 19 patients had not undergone any initial screening, totaling 93 patients.

Sanger Sequencing. Twenty-four variants detected through NGS were confirmed by Sanger sequencing. Fifteen variants were previously observed in our diagnostic routine panel either by Sanger sequencing performed previously or through restriction fragment length polymorphism (RFLP). As a result, all variants observed through NGS were confirmed by at least one additional method. Sanger sequencing was performed using a BigDye Terminator v3.1 kit (Applied Biosystems, Austin, TX, USA) in an ABI PRISM 3130xl DNA analyzer (Applied Biosystems). The CFTR DNA amplification was achieved with the set of primers listed in Supplementary Table 1. PCR products were visualized on 1.5% agarose gels and purified using a Sweep Clean up kit (Applied Biosystems, Vilnius, Lithuania). The obtained sequences were aligned with the reference sequence of CFTR in Ensembl (ENST00000003084.10). Sequence analysis was performed with Chromas Lite 2.0 software (Technelysium) and BioEdit Sequence Alignment Editor v6.0.6 (Ibis Therapeutics).

Results
Of the 309,361,463 pairs of reads generated in our experiment, 97.52% passed the quality control, of which 99.73% were assigned individually to each of the 95 subjects, with an average of 6,364,353 reads per individual. The lowest number and highest number of reads in the samples were 1,593,382 and 17,301,187, respectively. On average, 88.73% of the reads were successfully mapped with the
Analysis of the NGS data allowed us to identify 39 variants (Table 2); 24 were not part of our routine screening. Of these 24, 22 were exonic and 2 were intronic; 14 already had confirmed pathogenicity in the CFTR2 database (https://www.cftr2.org/). The 24 variants were found in exons 3, 4, 6, 8, 10, 12, 13, 14, 15, 17, 19, 20, and 22 and in introns 5 and 19, totaling 13 missense, 6 nonsense, 2 frameshift, 1 deletion, and 2 splicing variants.
We identified a new frameshift variant, V938Gfs*37 (c.2812_2813insG; p.Val938GlyfsX37), in exon 17 of CFTR: a guanine is inserted between positions 2812 and 2813 of the cDNA, which changes valine 938 to glycine and shifts the reading frame so that a stop codon arises at the 37th codon, counting the changed amino acid as the first. This variant was identified in heterozygosity with the F508del mutation in a 27-year-old male patient with a positive sweat test (>60 mEq/L) and typical CF respiratory manifestations. In silico analysis with MutationTaster predicted V938Gfs*37 to be pathogenic and to affect the structure of CFTR. The variant was submitted to the CFMDB.
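The "fs*N" notation used for this variant can be illustrated with a minimal Python sketch. The cDNA below is a short invented toy sequence, not CFTR, and the function names are hypothetical; the sketch only shows how a single-base insertion changes an amino acid and how the position of the new stop codon is counted from that first changed residue.

```python
from itertools import product

# Standard genetic code, built in TCAG order (one-letter amino acids, * = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna: str) -> str:
    """Translate from the first base; stop at (and include) the first stop codon."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE[cdna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def insert_base(cdna: str, after_pos: int, base: str) -> str:
    """Insert `base` after the 1-based cDNA position `after_pos`."""
    return cdna[:after_pos] + base + cdna[after_pos:]


def frameshift_name(ref_cdna: str, mut_cdna: str) -> str:
    """Short protein-level name, e.g. 'V2Gfs*5' (one-letter codes)."""
    ref_p, mut_p = translate(ref_cdna), translate(mut_cdna)
    first = next(
        (i for i, (a, b) in enumerate(zip(ref_p, mut_p)) if a != b), None
    )
    if first is None:
        return "no change in the compared region"
    # The stop position is counted with the first changed amino acid as 1.
    stop = str(len(mut_p) - first) if mut_p.endswith("*") else "?"
    return f"{ref_p[first]}{first + 1}{mut_p[first]}fs*{stop}"


# Toy reference (hypothetical): M V Q L D N stop
toy_ref = "ATGGTTCAACTGGATAACTGA"
toy_mut = insert_base(toy_ref, after_pos=4, base="G")
print(translate(toy_ref), translate(toy_mut), frameshift_name(toy_ref, toy_mut))
```

In this toy case the insertion turns the second codon from valine into glycine, and the shifted frame reaches a stop at the fifth codon counted from that glycine, so the sketch names the change V2Gfs*5.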
Based on these results, both mutated alleles were identified in 46 individuals, 12 individuals presented only one CF variant, and 35 presented no genetic variant related to CF. Among the patients with two variants identified, 37 had the genotype CF-causing/CF-causing, 9 patients presented the combination CF-causing/unknown clinical significance, and 1 had CF-causing/novel variant, according to the CFTR2 database.
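As a quick check of the cohort breakdown, the three outcome groups reported above can be tallied and expressed as fractions of the 93 sequenced patients. This is a minimal sketch; the dictionary labels and the function name are illustrative, and only the counts come from the text.

```python
# Counts taken from the text; the labels are illustrative.
ngs_outcomes = {
    "two variants identified": 46,
    "one variant identified": 12,
    "no variant identified": 35,
}


def outcome_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Return each group's share of the whole cohort."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cohort is empty")
    return {label: n / total for label, n in counts.items()}


total_patients = sum(ngs_outcomes.values())
fractions = outcome_fractions(ngs_outcomes)
for label, share in fractions.items():
    print(f"{label}: {ngs_outcomes[label]}/{total_patients} ({share:.1%})")
```

The three groups add up to the 93 patients submitted to NGS, with both alleles identified in just under half of them.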

Discussion
New variants causing CF continue to be identified almost 30 years after the identification of CFTR. Currently, more than 2,000 variants have been registered in the CFMDB; however, only 442 are annotated in the CFTR2 database, of which 360 are considered pathogenic. These variants vary in frequency and distribution in different populations. Historically, CF has been regarded as a disease limited to people of European descent. However, research has shown that CF is not ancestry linked. Therefore, in order to obtain a high detection rate, diagnosis through population-specific mutation panels should consider the molecular heterogeneity of the population and the variants to be included [11,19]. For example, panels designed for European populations, when used to diagnose patients of African descent, eventually lead to inconclusive results [28]. In countries with heterogeneous populations, such as those in Latin America, the use of these panels also leads to misdiagnosis, which can compromise the patient's health and treatment [29][30][31][32]. In Brazil, with its highly mixed population, mutation panels designed for other populations have proved ineffective for diagnosis, leading to a low detection rate [33]. This shows the importance of NGS for diagnostics in these populations.
Genetic testing of CF in Brazil is not performed with uniformity, since there are no epidemiological studies or a comprehensive neonatal screening to estimate the incidence of the disease in different regions of the country. Raskin [34] estimated that only 10% of the patients are diagnosed, leading to a false impression of low incidence in the Brazilian population. According to the latest report of the Brazilian Registry of Cystic Fibrosis (REBRAFC), a large increase was observed in the percentage of patients with genotype investigation. In 2013, 40.6% of the patients in a total of 2,942 individuals had their genotyping performed, and in 2017, the number of patients genotyped reached almost 80% of the 5,128 individuals analyzed. This improvement is due to advances in molecular diagnostic techniques [35].
Here, we used NGS in a cohort of 93 patients to conclude their molecular diagnosis and search for new or rare CF pathogenic variants in the Brazilian population. Thus, 74 patients from a sample of 198 individuals were tested. These individuals had already been screened for 27 common CF pathogenic variants. This means that patients with both alleles identified from this panel were not used in this study, causing the frequency of these common pathogenic variants to be underestimated by our NGS results.
In our sample of 217 patients, the most frequent pathogenic variant was F508del, found in 42% of alleles; 72 patients were heterozygous and 55 were homozygous for this variant. The 3120+1G>A, G542X, and G85E variants were observed in 5.8%, 4.1%, and 3.2% of alleles, respectively. All four variants were part of the routine testing for CF molecular diagnosis performed in our laboratory. Nunes et al. [36] published the first Brazilian study using an NGS methodology, on Ion Torrent PGM (Life Technologies), with pediatric patients from the Children's Institute at Hospital das Clínicas of the University of São Paulo Medical School (HCFMUSP). The three most frequent pathogenic variants described in their study were F508del (59.1%), G542X (7.3%), and 3120+1G>A (5.3%). Our findings corroborate the observations presented by Nunes et al. [36], who attribute the high frequency of the G542X variant to the migration flow of Spanish, Portuguese, and Italians to Brazil between the 19th and 20th centuries.
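The F508del allele frequency follows directly from the genotype counts given above, since each patient contributes two alleles. A minimal sketch of that arithmetic (the function name is illustrative, not part of the study's pipeline):

```python
def allele_frequency(heterozygous: int, homozygous: int, n_patients: int) -> float:
    """Fraction of alleles carrying a variant in a diploid sample."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    variant_alleles = heterozygous + 2 * homozygous  # homozygotes carry two copies
    total_alleles = 2 * n_patients
    return variant_alleles / total_alleles


# Genotype counts for F508del reported in the text (217 patients).
f508del = allele_frequency(heterozygous=72, homozygous=55, n_patients=217)
print(f"F508del allele frequency: {f508del:.1%}")
```

With 72 heterozygous and 55 homozygous patients, 182 of 434 alleles carry F508del, which rounds to the 42% reported.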
The Brazilian population has a significant genetic heterogeneity, mainly resulting from a trihybrid ethnic mixture of Europeans, Africans, and indigenous populations, which varies proportionally between the different Brazilian regions [37]. Thus, it is clear that CF genetic tests preestablished for populations defined as Caucasian may present limitations when employed in a scenario as heterogeneous as the Brazilian one. For example, if we had used a panel of 23 variants recommended by the American College of Medical Genetics and the American College of Obstetricians and Gynecologists [38] that detected 88% of non-Hispanic Caucasians, we would have reduced our rate of pathogenic variant recognition. Furthermore, of the 39 variants found in our study, only 10 would be part of this panel. In addition, five patients

Disease Markers

A total of 46 out of 93 patients who participated in the NGS had two pathogenic variants. Among these variants, we detected a new one, i.e., V938Gfs*37. Predictive analysis of the possible effect of this insertion in the MutationTaster program was positive for pathogenicity. As a class I mutation, it leads to complete or near-complete loss of CFTR activity [12,40]. Even with complete sequencing of CFTR, 12 patients were identified with only one CFTR variant. Among these, seven patients presented a defined pathogenic allele (F508del/unknown, G542X/unknown, 3120+1G>A/unknown, R334W/unknown, and 2183delAA/unknown). Notably, our designed method of sequencing is not capable of detecting large exonic and intronic deletions and duplications or copy number variations, a type of variant that is known to cause CF in some cases. We believe that such pathogenic variants may be responsible for some of these cases with only one defined allele.
Five individuals had one CFTR variant classified as "non-CF causing" according to CFTR2: R668C (c.2002C>T; p.Arg668Cys), G576A (c.1727G>C; p.Gly576Ala), R75Q (c.224G>A; p.Arg75Gln), and L997F (c.2991G>C; p.Leu997Phe). Two of the five patients presented the R668C and G576A variants. In 1992, Fanen et al. [41] considered the R668C variant as a polymorphism, and in 2003, Pagani et al. [42] described G576A as a variant that likely induced the skipping of exon 12 in splicing, leading to reduced levels of normal CFTR transcripts [43]. In a study by Ziętkiewicz et al. [44], the R668C variant was considered pathogenic and G576A a compound allele element. Based on previous studies, these variants when combined with a CF pathogenic variant are associated with a moderate phenotype (CFTR-related disorders; CFTR-RDs), in particular with congenital bilateral absence of the vas deferens (CBAVD). However, we cannot affirm that both variants R668C and G576A form a complex allele in our patients, since a segregation study could not be performed. According to El-Seedy et al., the variants G576A and R668C affect the chloride channel activity [45]. The R75Q variant leads to the exchange of arginine to glutamine. Zielenski et al. [46] initially reported R75Q as a neutral variant that was not involved in CF. Gené et al. [47] evaluated the impact of this variant on the functioning of the CFTR channel and found a pattern of glycosylation and subcellular distribution similar to that of wild-type CFTR. The variant L997F was initially considered as a polymorphism and was subsequently reported to cause CFTR-RDs, such as lung diseases, disseminated bronchiectasis, idiopathic pancreatitis, CBAVD, and neonatal hypertrypsinemia with normal sweat test [48].
It is now known that the severity of CF is influenced not only by CFTR variants but also by modifier genes, intragenic polymorphisms, environmental factors, and lifestyle, which explains individuals with the same variant having different clinical manifestations [19]. To this end, great efforts have been made to develop therapeutics for correcting the consequences of CFTR variants on the function of the protein [49], some of which are already available for the treatment of patients with certain genotypes [15].

Conclusions
Through the NGS-based study of CFTR, we expanded our knowledge of the variants that circulate in the population of Rio de Janeiro, allowing us to offer genetic support for patients seeking specific treatments. In addition, NGS made it possible to increase our previous panel of variants from 27 to 51 (41 CFTR pathogenic variants). This study highlights the importance of considering the distribution of pathogenic variants specific for admixed populations for choosing the right molecular diagnostic method. The use of NGS for the entire gene has an advantage over the mutation-specific panels available, allowing the discovery of disease-causing variants that are population specific. Moreover, this method provides an opportunity for patients from countries with heterogeneous populations, which are not well covered by commercial diagnostic panels, to have a molecular diagnosis for receiving genotype-specific therapy and creates the scope for providing genetic counseling to the family.