A Genotypic-Oriented View of CFTR Genetics Highlights Specific Mutational Patterns Underlying Clinical Macrocategories of Cystic Fibrosis

Cystic fibrosis (CF) is a monogenic disease caused by mutations of the cystic fibrosis transmembrane conductance regulator (CFTR) gene. The genotype–phenotype relationship in this disease is still unclear, and diagnostic, prognostic and therapeutic challenges persist. We enrolled 610 patients with different forms of CF and studied them from a clinical, biochemical, microbiological and genetic point of view. Overall, there were 125 different mutated alleles (11 with novel mutations and 10 with complex mutations) and 225 genotypes. A strong correlation between mutational patterns at the genotypic level and phenotypic macrocategories emerged. This specificity appears to largely depend on rare and individual mutations, as well as on the varying prevalence of common alleles in different clinical macrocategories. However, 19 genotypes appeared to underlie different clinical forms of the disease. The dissection of the pathway from the CFTR mutated genotype to the clinical phenotype made it possible to identify at least two components of the variability usually found in the genotype–phenotype relationship. One component seems to depend on the genetic variation of CFTR, the other component on the cumulative effect of variations in other genes and cellular pathways independent from CFTR. The experimental dissection of the overall biological CFTR pathway appears to be a powerful approach for a better comprehension of the genotype–phenotype relationship. However, a change from an allele-oriented to a genotypic-oriented view of CFTR genetics is mandatory, as well as a better assessment of sources of variability within the CFTR pathway.
Three alleles with stop mutations were, however, found to give rise to milder clinical forms than expected. S1455X (p.Ser1455*) was found in 1 CFTR-RD patient in compound heterozygosity with F508del (p.Phe508del) and in 1 CF-PS patient in compound heterozygosity with 621+1G>T (c.489+1G>T).
the patient the only fertile man in the case series, with spermatozoa in the seminal fluid. The Q1476X (p.Gln1476*) allele was found in 3 CFTR-RD patients (2 in compound heterozygosity with F508del (p.Phe508del)). The E831X (p.Glu831*) allele was found in 1 CF-PS patient in compound heterozygosity with F508del (p.Phe508del). This stop codon is positioned in a central zone of CFTR, in an in-frame exon. It has been reported to cause anomalous splicing that skips both the exon and, consequently, the premature protein termination (see Alternative Splicing at a NAGNAG Acceptor Site as a Novel Phenotype Modifier). Our case further confirms the clinical output, which is milder than expected, of this mutated allele. These data highlight that the in silico assignment of a heavy functional role to stop codons should be undertaken with caution.

Although the model of an inverse correlation between protein residual function and phenotype severity is generally accepted, the poor functional characterization of most of the CFTR sequence variations that have been identified, especially with regard to their quantitative effect, hampers the practical application of this concept. In addition to these critical issues, the problems encountered in recognizing the different clinical forms of CF, as well as the influence of modifier genes, further complicate the framework.
The aim of the present work is to improve our understanding of the genotype-phenotype relationship in CF by means of a genotypic-oriented approach based on the selection of specific mutational patterns underlying different clinical forms of the disease. A general approach in which a specific pathogenic role is assigned to each CFTR sequence variation, also taking into account disease severity, is proposed. The results of this study also shed light on the sources of variability, acting at different levels, involved in the path from genotype to protein residual function and, eventually, to clinical phenotype.

Case Series: Characterization and Diagnostic Criteria
We evaluated all patients already diagnosed and enrolled at the CF Reference Center of the Lazio Region up to 1996 and all subsequent new diagnoses from 1996 to 2012. This step yielded a consecutive case series comprising 692 patients. A total of 82 patients were excluded from this study because of incomplete genetic, biochemical, microbiological, clinical and/or family data. The remaining 610 patients (1,220 alleles) with complete data, mainly from central Italy, were enrolled in the study. According to generally accepted procedures, clinical (15,16), instrumental, laboratory (17) (Supplementary Table S1), microbiological (18,19) and biochemical and genetic (see below) evaluations were performed. Depending on these characterizations, the patients were classified according to recent CF guidelines and recommendations (7,8,17,20) in the following four clinical macrocategories (also called "populations" in the text): (a) CF with pancreatic insufficiency (CF-PI) (354 patients, 708 alleles); (b) CF with pancreatic sufficiency (CF-PS) (138 patients, 276 alleles); (c) mono- or oligosymptomatic forms of CF, which for the purposes of this work included both CFTR-related disorders and atypical CF forms (here called CFTR-RD, 71 patients, 142 alleles); and (d) congenital bilateral absence of vas deferens (CBAVD), which for the purposes of this work was selected as the only clinical manifestation (with no other CF symptoms) (CBAVD, 47 patients, 94 alleles). When the diagnosis according to CF guidelines and recommendations was in contrast to clinical evidence, the latter prevailed.

Ethics Statement
Informed consent was obtained from every patient (or parents) before enrollment. The study was approved by the institutional ethics committee and carried out according to the Helsinki Declaration.

Biochemical Characterization
All patients underwent a sweat test, at least twice, performed by means of a quantitative pilocarpine iontophoresis method (21) by using the Macroduct device (Delcon, Milan, Italy) for sweat collection and the PCL M3 chloride analyzer Jenway (VWR International, Milan, Italy) for measurement. In accordance with recent guidelines (17), the sweat test in subjects up to 6 months of age was considered negative if [Cl -] was <30 mEq/L, pathologic if ≥60 mEq/L and borderline if in the 30-59 mEq/L range; for all other subjects, the sweat test was considered negative if <40 mEq/L, pathologic if ≥60 mEq/L and borderline if in the 40-59 mEq/L range.
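The age-dependent thresholds above can be written as a small decision function. The following is only an illustrative sketch: the function name and the handling of non-integer values between 59 and 60 mEq/L (treated here as borderline) are assumptions, not part of the published protocol.

```python
def classify_sweat_chloride(chloride_meq_l: float, age_months: float) -> str:
    """Classify a sweat chloride value with the thresholds quoted in the text.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # The lower cutoff depends on age: 30 mEq/L up to 6 months, 40 mEq/L afterwards.
    lower_cutoff = 30.0 if age_months <= 6 else 40.0
    # The pathologic cutoff (>= 60 mEq/L) is the same for all ages.
    if chloride_meq_l >= 60.0:
        return "pathologic"
    if chloride_meq_l >= lower_cutoff:
        return "borderline"
    return "negative"


if __name__ == "__main__":
    for value, age in [(29, 3), (35, 3), (35, 24), (82, 2)]:
        print(value, age, classify_sweat_chloride(value, age))
```

Note that a value such as 35 mEq/L is borderline in a 3-month-old but negative in an older subject, which is the only point at which the two age groups differ.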
Exocrine pancreatic function was evaluated by the dosage of fecal elastase 1 (22) by using the immunometric pancreatic fecal elastase test (Meridian Bioscience, Milan, Italy). The status of pancreatic sufficiency for all CF-PS, CFTR-RD and CBAVD patients was ascertained from the nonpathological levels of fecal elastase 1 (>200 μg/g) in at least two independent dosages as well as from the absence of steatorrhea. All the patients with elastase 1 level well below this threshold were characterized by reduced growth and were classified as CF-PI.
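As a rough illustration of the pancreatic sufficiency criterion, the sketch below encodes one possible reading of it: at least two fecal elastase 1 measurements above 200 μg/g and no steatorrhea. The function name and the exact reading of "at least two independent dosages" are assumptions made for the example.

```python
ELASTASE_THRESHOLD_UG_PER_G = 200.0  # nonpathological if strictly above this value


def is_pancreatic_sufficient(elastase_values, steatorrhea: bool) -> bool:
    """Return True if the criterion for pancreatic sufficiency is met.

    Hypothetical helper: requires at least two measurements above the
    threshold and the absence of steatorrhea (one reading of the text).
    """
    if steatorrhea:
        return False
    normal = [v for v in elastase_values if v > ELASTASE_THRESHOLD_UG_PER_G]
    return len(normal) >= 2


if __name__ == "__main__":
    print(is_pancreatic_sufficient([350.0, 410.0], steatorrhea=False))
    print(is_pancreatic_sufficient([90.0, 120.0], steatorrhea=False))
```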

Mutational Search Strategy
DNA was extracted from peripheral blood by using the QIAamp DNA blood midi kit (Qiagen, Hilden, Germany). The mutational search on the CFTR gene (RefSeq NM_000492.3, NG_016465.3) was initially conducted by using a multistep approach, with the progressive application of five sequential steps for the analysis of the following: [1210-14TG [12]; 1210-12T [5]]) and the (TG)11T5 (c. [1210-14TG [11];1210-12T [5]]), by means of our assay (23) based on DNA sequencing; (d) the proximal 5′-flanking, all exons and adjacent intronic zones, by means of our assay based on DNA sequencing (24), always applied to completion when included (this step is referred to as "SEQ" in this article); and (e) the seven most frequent macrodeletions worldwide, by means of the FC del assay (Nuclear Laser Medicine, Milan, Italy) (this step is referred to as "DEL" in this article). The mutational search was usually interrupted when the first two CFTR mutations already characterized as disease-causing were found on different alleles. Those genotypes reported to have at least one unknown allele (see Results) underwent all the steps, including the DEL step.
The mutational search from step (a) to step (d) [CF-OLA, CF-SNAP+20, (TG)mTn and SEQ] was performed in a 96-well format, using a semi-automated platform made up of a robotic system (Microlab Starlet; Hamilton) for the reaction setup and two genetic analyzers (ABI PRISM 3100 Avant and ABI PRISM 3130 xl; Applied Biosystems [Thermo Fisher Scientific Inc., Waltham, MA, USA]) for the development of the electropherograms. For data analysis, the specific CF-OLA template (Abbott) and our specific CF-SNAP+20 template, based on the Genotyper and GeneMapper software (Applied Biosystems [Thermo Fisher Scientific]), were used for the CF-OLA and CF-SNAP+20 steps, respectively. The results of the (TG)mTn tracts were analyzed as previously described (23). Sequences obtained in the SEQ step were analyzed by using our specific template based on Seqscape software (Applied Biosystems [Thermo Fisher Scientific]) (25). The segregation of all mutated alleles was ascertained by analysis of parents.
Mutations are reported with both the old (legacy name) and the new nomenclature (HGVS name) in all of the tables and in the text; for practical purposes, the legacy name alone is used in the figures.

Pathogenic Classification of Mutated Alleles
The general principle applied for the clinical classification of alleles was that the clinical effect is determined by the overall residual functionality of the CFTR protein and thus, ultimately, by the allele with the highest functionality. A set of three rules was established to assign a phenotypic effect to each mutated allele found in patients and to determine its ability to induce clinical manifestations belonging to one (or more) of the four clinical macrocategories identified. It was possible to apply this procedure because the diagnosis in patients had previously been made on the basis not only of guidelines and recommendations, but also of a conclusive clinical assessment. The rules were subsequently applied and their performance was experimentally validated according to the clinical classification of patients. Upon the application of each rule, a suitability control was applied to the previously classified alleles, which in some cases led to alleles being reclassified. The first rule consisted in the assignment of all alleles found in homozygosity to the specific macrocategory in which they had been identified. The second rule consisted in the classification, as CF-PI-causing alleles, of all alleles found in compound heterozygosity in the CF-PI macrocategory. The third rule consisted in the classification of all alleles found in CF-PS, CFTR-RD and/or CBAVD in compound heterozygosity with a previously classified allele (see the Supplementary Materials for the algorithm, Supplementary Figure S1 and examples). By applying these rules, it proved possible to assign each allele to (a) a single macrocategory and label it as an allele with a unique phenotypic effect; (b) more than one macrocategory and label it as an allele with a variable effect; and (c) no category and label it with an uncertain classification.
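The three rules can be sketched in code as follows. This is an interpretation for illustration only, not the algorithm of Supplementary Figure S1: the severity ordering, the way the third rule is resolved (an unclassified allele takes the patient's macrocategory when its partner allele is already classified as strictly more severe), the omission of the suitability control, and the allele labels A to F are all assumptions.

```python
from collections import defaultdict

# Macrocategories ranked from most severe (3) to mildest (0); assumed ordering.
SEVERITY = {"CF-PI": 3, "CF-PS": 2, "CFTR-RD": 1, "CBAVD": 0}


def classify_alleles(patients):
    """patients: list of (allele_a, allele_b, macrocategory) tuples."""
    assigned = defaultdict(set)

    # Rule 1: an allele found in homozygosity takes the patient's macrocategory.
    for a, b, cat in patients:
        if a == b:
            assigned[a].add(cat)

    # Rule 2: both alleles of a compound heterozygous CF-PI patient are CF-PI-causing.
    for a, b, cat in patients:
        if a != b and cat == "CF-PI":
            assigned[a].add("CF-PI")
            assigned[b].add("CF-PI")

    # Rule 3 (interpretation): in milder forms, if the partner allele is already
    # classified as strictly more severe than the patient's macrocategory, the
    # other allele is taken to determine the phenotype. Repeat until stable.
    changed = True
    while changed:
        changed = False
        for a, b, cat in patients:
            if a == b or cat == "CF-PI":
                continue
            for known, other in ((a, b), (b, a)):
                cats = assigned.get(known)
                if not cats:
                    continue
                partner_is_more_severe = min(SEVERITY[c] for c in cats) > SEVERITY[cat]
                if partner_is_more_severe and cat not in assigned[other]:
                    assigned[other].add(cat)
                    changed = True
    return dict(assigned)


def label(allele, assigned):
    """Label an allele as having a unique, variable or uncertain effect."""
    n = len(assigned.get(allele, set()))
    return "unique" if n == 1 else "variable" if n > 1 else "uncertain"


if __name__ == "__main__":
    # Hypothetical alleles A-F; not data from the case series.
    patients = [("A", "A", "CF-PI"), ("A", "B", "CF-PI"), ("A", "C", "CF-PS"),
                ("C", "D", "CBAVD"), ("E", "F", "CFTR-RD"), ("A", "C", "CFTR-RD")]
    result = classify_alleles(patients)
    for allele in "ABCDEF":
        print(allele, sorted(result.get(allele, set())), label(allele, result))
```

In this toy input, allele C is seen with the severe allele A in both a CF-PS and a CFTR-RD patient and is therefore labeled as variable, whereas E and F, which never occur with a classified partner, remain uncertain.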

Statistical Analysis
Contingency tables, analysis of variance (ANOVA), Student t test and Bonferroni multiple comparison test were used for the statistical analysis of experimental data, by using the SPSS software (SPSS [IBM, Armonk, NY, USA]).
All supplementary materials are available online at www.molmed.org.

Allele Frequencies Reveal Genetic Heterogeneity between Clinical Macrocategories
The results on allele frequencies are reported in Figure 1A and Supplementary Table S3. We identified 125 different CFTR mutated alleles. Eleven mutations were novel (described below). We also identified 10 complex alleles (with two or more mutations in cis on the same allele, described below), two of which included two of the novel mutations. The different mutated alleles found were 69 in CF-PI, 60 in CF-PS, 37 in CFTR-RD and 24 in CBAVD. Forty-three (34.4%) of the 125 mutated alleles were found in at least two different macrocategories (4 alleles in all 4 populations, 14 alleles in 3 different populations, 25 alleles in 2 different populations) (Figure 1B), whereas 82 alleles (65.6%) were exclusive to a single population (Supplementary Table S3). Among the latter, 39 were found exclusively in CF-PI (56.5% of CF-PI alleles), 21 exclusively in CF-PS (35.0% of CF-PS alleles), 16 exclusively in CFTR-RD (43.2% of CFTR-RD alleles) and 6 exclusively in CBAVD (25.0% of CBAVD alleles). By summing all the populations (Supplementary Table S3), the frequency of the F508del (p.Phe508del) mutation was 0.400; the number of additional moderately frequent mutated alleles, with a prevalence ≥0.008 in CF (PI + PS), was 16 of 125 (12.8%), with an overall prevalence of 0.343; the number of rare nonindividual (found in at least two unrelated patients) mutations was 48 of 125 (38.4%), with an overall prevalence of 0.152; lastly, the number of individual mutations (found in only one patient or in siblings from one family) was 60 of 125 (48.0%), with an overall prevalence of 0.056. Each population displayed a distinctive mutational pattern (overall χ² p < 0.0001; for each population pair χ² p < 0.0001, with the exception of the CFTR-RD versus CBAVD comparison, which was χ² p < 0.05). When the CF (PI + PS) population alone was considered (Figure 1A, Supplementary Table S3), 101 mutations were found.
The most frequent mutation was F508del (p.Phe508del), with an overall frequency of 0.447, and a well-differentiated frequency of 0.534 in CF-PI and 0.225 in CF-PS. Only 28 of the 101 mutations were found in both CF-PI and CF-PS, including 14 mutations also found in other populations (Figure 1B). Another 41 mutations were found in CF-PI but not in CF-PS, including 39 that were found exclusively in CF-PI; the other two were found also in other populations. Another 32 mutations were found in CF-PS, although not in CF-PI, including 21 that were found exclusively in CF-PS and 11 found also in other populations.
A total of 37 mutations were found in the CFTR-RD population (Figure 1A, Supplementary Table S3). The most frequent mutation in this population was again F508del (p.Phe508del), with a frequency of 0.254, which is similar to that of CF-PS. Sixteen of the 37 mutations found were exclusive to CFTR-RD. By contrast, the other 21 mutations were also found in other populations (Figure 1B).
Only 24 mutations were identified in the CBAVD population (Figure 1A, Supplementary Table S3). The most frequent mutation was the (TG)12T5 (c.[1210-14TG[12];1210-12T[5]]) variant allele, with a frequency of 0.170; F508del (p.Phe508del) was the second most prevalent mutation, with a frequency of 0.128. Six of the 24 mutations found were exclusive to the CBAVD population, whereas the other 18 were also found in other populations (Figure 1B).
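The allele counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the figures quoted in the text; the variable names are illustrative.

```python
# (distinct mutated alleles, alleles exclusive to that population), as quoted in the text.
allele_counts = {
    "CF-PI": (69, 39),
    "CF-PS": (60, 21),
    "CFTR-RD": (37, 16),
    "CBAVD": (24, 6),
}
TOTAL_ALLELES = 125
# Number of populations an allele occurs in -> number of such alleles.
shared_alleles = {4: 4, 3: 14, 2: 25}

exclusive_total = sum(excl for _, excl in allele_counts.values())
shared_total = sum(shared_alleles.values())

# Percentage of each population's alleles that are exclusive to it.
exclusive_pct = {
    pop: round(100 * excl / total, 1) for pop, (total, excl) in allele_counts.items()
}

# Each allele is counted once per population in which it occurs, so the
# per-population totals must equal exclusive alleles plus weighted shared alleles.
occurrences = exclusive_total + sum(k * n for k, n in shared_alleles.items())

if __name__ == "__main__":
    print(exclusive_total, shared_total, exclusive_pct)
    print(occurrences, sum(total for total, _ in allele_counts.values()))
```

The last check is the most informative one: the per-population totals sum to the same number of allele occurrences as the exclusive and shared alleles weighted by the number of populations in which they appear.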

Genetic Heterogeneity between Populations Is Amplified at a Genotypic Level
The results on genotype frequencies are reported in Figure 2A and Supplementary Table S4. A total of 225 different CFTR mutated genotypes were identified, 11 and 20 of which include, respectively, the novel mutations and the complex alleles found. The different mutated genotypes found were 115 in CF-PI, 77 in CF-PS, 44 in CFTR-RD and 12 in CBAVD. Nineteen (8.4%) of the 225 different genotypes were found in at least two different populations (Figure 2B), whereas the remaining 206 (91.6%) were exclusive to a single population (Supplementary Table S4). In particular, 105 were exclusively found in CF-PI (91.3% of CF-PI genotypes), 58 exclusively in CF-PS (75.3% of CF-PS genotypes), 35 exclusively in CFTR-RD (79.5% of CFTR-RD genotypes) and 8 exclusively in CBAVD (66.7% of CBAVD genotypes). Fifty-nine (26.2%) of the 225 genotypes were found in at least two unrelated individuals. These 59 nonindividual genotypes included 4 and 15 (in total, accounting for 32.2%) that were found in three and two different populations, respectively (Figure 2B), and that consequently underwent a specific extensive genetic analysis (see Materials and Methods); the other 40 genotypes (67.8%) proved to be associated with a single population. The remaining 166 genotypes (73.8%) were found to be individual genotypes found only once in single patients (146 genotypes) or only in siblings from a single family (20 genotypes), associated with a single population.
Taking all the populations together (Supplementary Table S4), the frequency of the homozygous F508del/F508del (p.[Phe508del];[Phe508del]) genotype was 0.180. The number of additional moderately frequent genotypes, with a prevalence ≥0.008 in CF (PI + PS), was 15 out of 225 (6.7%), with an overall prevalence of 0.259. The number of rare nonindividual (found in at least two unrelated patients) genotypes was 43 out of 225 (19.1%), with an overall prevalence of 0.185. Lastly, the number of individual genotypes (found in only one patient or in siblings from one family) was 166 out of 225 (73.8%), with an overall prevalence of 0.305. As for mutated alleles, each population displayed a distinctive pattern of genotypes (overall χ² p < 0.0001; for each population pair χ² p < 0.0001).
If we consider the CF (PI + PS) population alone (Figure 2A, Supplementary Table S4), 182 different genotypes were found. The homozygous F508del/F508del (p.[Phe508del];[Phe508del]) genotype was the most frequent, with an overall frequency of 0.224. However, although this genotype was also the most frequent in the CF-PI population, with a frequency of 0.311, it was never found in the CF-PS population, in which the most frequent genotype was F508del/2789+5G>A (c.[1521_1523delCTT];[2657+5G>A]), with a frequency of 0.101. Only 10 of the 182 genotypes found in this mixed population were found in both CF-PI and CF-PS (Figure 2B), but were never found in either CFTR-RD or CBAVD. An additional 105 genotypes were found exclusively in CF-PI but were never found in CF-PS or the other populations. Another 67 genotypes were found in CF-PS but not in CF-PI, with 58 of them being found exclusively in CF-PS and nine also found in CFTR-RD and/or CBAVD.
A total of 44 genotypes were found in the CFTR-RD population (Figure 2A, Supplementary Table S4). The F508del/F508del (p.

Phenotypic Description of the 11 Novel Mutations
The characteristics of the 11 novel mutations found are reported in Tables 1-3 and summarized below. Their absence in at least 100 subjects (200 alleles) from the general population was verified.
The K442X (p.Lys442*) mutation was found in a CF-PI male patient with a F508del/K442X (p.[Phe508del];[Lys442*]) genotype. The average sweat test was 82 ± 8 mEq/L. The patient was enrolled at 2 months of age on the basis of neonatal screening (26,27) with no symptoms or bacterial pulmonary isolates. He is now 5 years old and displays respiratory manifestations, although with no bacterial isolates.
The D529N (p.Asp529Asn) mutation was found in a CF-PI female patient with a F508del/D529N (p.[Phe508del];[Asp529Asn]) genotype. The average sweat test was 42 ± 5 mEq/L. A late diagnosis was performed at 32 years of age, when severe respiratory manifestations as well as pulmonary bacterial isolates were already present. The patient is now 39 years old and has been displaying persistent severe pulmonary manifestations with chronic bacterial colonization.
The T465N (p.Thr465Asn) mutation was found in a CF-PI male patient with a W1282X/T465N (p.[Trp1282*];[Thr465Asn]) genotype. The average sweat test was 83 ± 7 mEq/L. Meconium ileus was present at diagnosis, which was made at 3 months of age. No other symptoms, respiratory manifestations or pulmonary bacterial isolates were present. He died at 33 years of age, with severe pulmonary manifestations, chronic bacterial colonization, liver disease and cholelithiasis.
The W19X(TAG) (p.Trp19*) mutation was found in a CF-PI male patient with a G542X/W19X(TAG) (p.[Gly542*];[Trp19*]) genotype. The average sweat test was 58 ± 5 mEq/L. Diagnosis was performed at birth, when the patient exhibited meconium ileus. No other symptoms, respiratory manifestations or pulmonary bacterial isolates were present. The patient is now 3 years old and displays cholelithiasis and mild respiratory manifestations with intermittent bacterial colonization.
The H1375P (p.His1375Pro) mutation was found in 3 CF-PS patients (a brother and sister and a third unrelated male patient) with the same 2789+5G>A/H1375P (c.[2657+5G>A];[4124A>C]) genotype. The average sweat tests ranged from 63 ± 2 to 91 ± 8 mEq/L. The diagnosis of the male sibling was made at 32 years of age on the basis of symptoms, when some pulmonary manifestations were present, although with no bacterial isolates, and dehydration occurred. He is now 41 years old, no longer displays pulmonary symptoms but has intermittent bacterial colonization. The female sibling was enrolled at 33 years of age because of family history and displayed stronger pulmonary symptoms at diagnosis than her brother, with bacterial isolates. She is now 48 years old and has worsened pulmonary symptoms with chronic bacterial colonization. The unrelated male patient was diagnosed at 33 years of age. He displayed pulmonary symptoms, pancreatitis and cholelithiasis with no bacterial isolates. He is now 45 years old and exhibits the same pulmonary conditions as those present at enrollment. He no longer has pancreatitis and underwent a cholecystectomy.
Table 1. Genetic, biochemical, microbiological and clinical characterization of the patients with the 11 novel mutations found: position and nomenclature of novel mutations found. Columns: old nomenclature (legacy name) and new nomenclature (HGVS name), each with position, nucleotidic notation and aminoacidic notation.
The Q779X (p.Gln779*) mutation was found in a CF-PS brother and sister with a [(TG)11T5; V562I; A1006E]/Q779X (c.[1210-14TG[11];1210-12T[5];1684G>A;3017C>A];[2335C>T]) genotype. Their average sweat tests were, respectively, 70 ± 15 and 62 ± 17 mEq/L (both very variable). The male sibling was enrolled on the basis of neonatal screening (26,27) at 2 months of age, with no clinical symptoms and no pulmonary bacterial isolates. The female sibling was enrolled because of family history and neonatal screening at 2 months of age, with pulmonary symptoms and bacterial isolates already present. At follow-up, which was respectively up to 11 and 5 years, both patients showed intermittent bacterial colonization. Pulmonary symptoms appeared in the male sibling. The female sibling exhibited worsened pulmonary symptoms as well as pancreatitis and liver disease.
The G1247R(G>C) (p.Gly1247Arg) mutation was found in a CF-PS female patient with a W1282X/G1247R(G>C) (p.[Trp1282*];[Gly1247Arg]) genotype. The average sweat test was 78 ± 20 mEq/L (very variable). The diagnosis was made at 6 months of age on the basis of neonatal screening (26,27), with no other symptoms. The patient is now 21 years old and displays pulmonary symptoms with chronic bacterial colonization, as well as rhinosinusitis and nasal polyposis.
The G1244R (p.Gly1244Arg) mutation was already published by us (24) when the patient was 7 years old; here we provide a further 7-year follow-up report following that description. The G1244R (p.Gly1244Arg) mutation was found in a CF-PS male patient, diagnosed at 14 months of age on the basis of symptoms, with a 3849+10kbC>T/G1244R

Phenotypic Description of the 10 Complex Alleles
Although the protocol used for the mutational search was not specifically aimed at the selection of complex alleles, 10 such alleles were found. The following five complex alleles encompassed mutations found only within respective complex alleles (never found separately).
The [E479X;V754M] (p.[Glu479*;Val754Met]) novel complex allele was found once in a CF-PI patient with a F508del (p.Phe508del) mutation on the other allele and an average sweat test of 106 ± 13 mEq/L. The E479X (p.Glu479*) is a novel mutation (described above and in Tables 1-3). Sweat test values were 46 ± 2 mEq/L for the CF-PS patient and 31 ± 1 mEq/L for the CBAVD patient.
As the mutations described above were only found in cis in the five complex alleles, it was impossible to evaluate the specific clinical effects due to the presence of more than one mutation on the same allele. By contrast, at least one of the mutations found in cis for the other five complex alleles was also found separately from the complex allele, thereby allowing the following speculation about their cis-acting effect.
The sweat test values ranged from 15 ± 2 to 77 ± 5 mEq/L, with an average value of 32 ± 18 mEq/L. Part of this case series has been described previously (28). Here we confirm that patients with the complex allele had more severe diagnoses and significantly (Student t test, p < 0.0001) higher average sweat test values than the patients without the complex allele.
The Arg668Cys]) (without the first mutation in cis) was found in two CFTR-RD patients with F508del (p.Phe508del) and S1235R (p.Ser1235Arg) on the other allele, with sweat tests, respectively, of 19 ± 2 and 17 ± 1 mEq/L (average 18 ± 1 mEq/L). These three mutations were never found alone. The complex allele with the three mutations in cis in the CF-PS patient was found in a more severe form of CF than the complex allele with only two mutations in cis. Accordingly, the average sweat test value was significantly (Student t test, p < 0.05) higher in the complex allele with three mutations than in the other allele.

Clinical Classification of Mutated Alleles and Genotypes
The three rules described in Materials and Methods led to the classification of 109 alleles (Table 4); the remaining 16 alleles, despite being identified as disease-causing, could not be univocally assigned to one (or more) macrocategories on the basis of our experimental data. It was also possible to assign 87 of the 109 classified alleles to a single macrocategory (56 to CF-PI, 15 to CF-PS, 14 to CFTR-RD and 2 to CBAVD). The remaining 22 alleles were classified as causing variable phenotypes (11 CF-PI and CF-PS; 4 CF-PS and CFTR-RD; 2 CFTR-RD and CBAVD; 2 CF-PI, CF-PS and CFTR-RD; 3 CF-PS, CFTR-RD and CBAVD). No allele was classified as causing all four phenotypes, nor was any allele found to cause very different phenotypes (for example, CF-PI and CBAVD). According to the principle that the prevalent clinical effect depends on the allele with the highest residual functionality, the adherence of our model regarding the clinical effect of the combination of classified alleles was not only verified on experimentally available allele combinations but also inferred from allele combinations that are not experimentally available (Supplementary Table S5).

Relationship between Genotype, Residual Functionality and Clinical Presentation
The sweat test is an in vivo measurement of CFTR residual functionality. A general significant correlation between the sweat test and clinical manifestations emerged (ANOVA, p < 0.001; Figure 3A). The average values of the sweat test were 87 ± 19 mEq/L for CF-PI, 73 ± 22 mEq/L for CF-PS, 47 ± 24 mEq/L for CFTR-RD and 27 ± 13 mEq/L for CBAVD. However, a considerable degree of biological variability within each population and a wide overlap of values between different populations were observed. Consequently, a genotype-specific analysis of sweat test values in the 59 nonindividual genotypes (found in at least two unrelated individuals) was performed (Figures 3B, C). The 19 genotypes found in at least two different populations yielded significantly different overall sweat test values (Figure 3B; ANOVA, p < 0.0001). However, the Bonferroni multiple comparison test revealed that this overall significant difference is due to only 13 pairs of genotypes out of a total of 171 possible comparisons. In addition, a marked interindividual biological variability emerged for the four genotypes found in three different populations (Figure 3B, the four leftmost genotypes) as well as for the 15 genotypes found in two different populations (Figure 3B, the 15 rightmost genotypes). The general effect observed is that the same genotype can give rise to a wide range of sweat test values. Furthermore, no evident correlation was detected between the different sweat test values (obtained from the same genotype in these 19 nonindividual genotypes from different populations) and the severity of the clinical presentation. Moreover, even the 40 nonindividual genotypes found only in one population yielded significantly different overall sweat test values (Figure 3C; ANOVA, p < 0.0002). However, the Bonferroni multiple comparison test did not detect any statistically significant difference when the 780 possible comparisons between each genotype pair were performed.
To sum up, it is clear that highly similar sweat test values may be observed in different populations.
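The numbers of pairwise comparisons quoted above (171 and 780) follow directly from the number of genotypes compared, n(n - 1)/2. The short sketch below reproduces them and shows the corresponding Bonferroni-adjusted per-comparison threshold; the family-wise significance level of 0.05 is an assumption, since the text does not state the level used.

```python
from math import comb


def bonferroni_pairs(n_groups: int, alpha: float = 0.05):
    """Return (number of pairwise comparisons, Bonferroni per-comparison alpha).

    alpha=0.05 is an assumed family-wise level, not taken from the study.
    """
    n_pairs = comb(n_groups, 2)  # n * (n - 1) / 2 unordered pairs
    return n_pairs, alpha / n_pairs


if __name__ == "__main__":
    for n in (19, 40):
        pairs, threshold = bonferroni_pairs(n)
        print(f"{n} genotypes: {pairs} comparisons, per-comparison alpha = {threshold:.2e}")
```

With 780 comparisons the per-comparison threshold becomes very small, which is consistent with an overall significant ANOVA that yields no individually significant genotype pair.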

DISCUSSION
A high degree of heterogeneity was observed between the mutational patterns of the clinical macrocategories analyzed. The mutational patterns appeared to be specific to each population, with only 34.4% of mutated alleles shared by at least two populations and 65.6% of populationspecific mutated alleles (Figure 1, Supplementary Table S3). This specificity may be quantified as 56.5% of mutated alleles that were exclusive to CF-PI, 35.0% to CF-PS, 43.2% to CFTR-RD and 25.0% to CBAVD. This heterogeneity and mutational pattern specificity was enhanced at the genotypic level, with only 8.4% of genotypes shared by at least two populations and 91.6% of population-specific mutated genotypes (Figure 2, Supplementary Table S4). This genotype specificity may be quantified as 91.3% of genotypes that were exclusive to CF-PI, 75.3% to CF-PS, 79.5% to CFTR-RD and 66.7% to CBAVD. In addition, alleles and genotypes found in different populations also displayed well-differentiated frequencies that were specific to each population. Most of the 125 different mutated alleles identified were individual, found in only one patient or in siblings from a single family (48.0%), or were rare (38.4%), with a prevalence <0.008. Among the 225 genotypes, 73.8% were individual genotypes found in single patients or in siblings from a single family, and 19.1% were rare (frequency <0.008). It is noteworthy that 8.8% of the mutated alleles identified were novel (with an overall prevalence of 0.011), giving rise to 4.9% of the genotypes (with an overall prevalence  Supplementary Table S4 for genotype HGVS name. See text for further explanations.   Continued on next page of 0.021). When taken together, these results revealed a peculiarity of the mutational pattern within each clinical macrocategory that appeared to be largely dependent on rare and individual muta-tions and genotypes, as well as on the varying prevalence of common alleles and genotypes.
After the extended search in the CF-PI, CF-PS and CFTR-RD macrocategories, a low frequency of unknown alleles (0.007, 0.033 and 0.035, respectively) and of patients with at least one unknown allele (0.014, 0.064 and 0.042, respectively) was left. This highlights the fact that CFTR  molecular lesions underlie the vast majority of these clinical forms. By contrast, after the extended search in CBAVD, 0.436 of the alleles and 0.551 of the patients remained uncharacterized. The role of CFTR in the reproductive apparatus (29)(30)(31)(32)(33)(34)(35)(36) and the involvement of CFTR mutations in reduced male (37)(38)(39)(40)(41) and female (42)(43)(44) fertility are strongly debated issues. It is noteworthy that 31.9% of CBAVD patients had a genotype with two unknown alleles. We may conclude that the strictly mono-symptomatic form of CBAVD, which has no other CF manifestations, is frequently caused by molecular lesions other than those in the CFTR.
Although systematic studies have not yet been performed, over 40 complex alleles (with two or more mutations in cis on the same allele) of CFTR have so far been described. By using our approach, which was only partially aimed at the selection of complex alleles, we were able to identify 10 such alleles, two of which are novel. The fact that widely used protocols designed for mutational searches are usually interrupted after the first two mutations on different alleles have been found may greatly limit the interpretation of genetic data. The true functional significance of a sequence variation in cis with another variation (undetected and with some functional consequence) may be biased, as may the relationship between genotype and phenotype. For 4 of the 10 complex alleles found in our case series, an evaluation of the cis-acting effect was possible by comparing alleles carrying the mutations in cis with alleles carrying the same mutations separately. The ability of complex alleles to give rise to more severe forms of CF and higher sweat test values was highlighted for three of these complex alleles. These data, together with those in the literature (see Lucarelli et al. (1) for a review), indicate that complex alleles may account for a greater degree of variability than is usually acknowledged. To be meaningful, mutational search protocols should be designed to search for complex alleles, at least in cases in which the clinical presentation varies even when the genotype is apparently identical. At the very least, they should be planned in such a way as to complete the characterization of known complex alleles when one of the mutations already known to be in cis is found.
Some unusual results on the clinical outcome of (TG)mTn tracts and of some stop mutations are described in the Supplementary Materials.
A crucial issue in CF is the assignment of a possible pathological role to sequence variations. The algorithm we applied in this work (described in Materials and Methods and in Supplementary Figure S1 with examples) allowed 87.2% of the mutated alleles identified (109 of 125) to be assigned to clinical macrocategories. These mutated alleles comprised 79.8% (87 of 109) that could be considered to cause restricted clinical manifestations (only one specific clinical macrocategory) and the remaining 20.2% (22 of 109) that could be considered to have a varying effect (more than one clinical macrocategory) (Table 4). Our approach left some uncertainty with regard to which clinical form(s) the remaining 12.8% of alleles (16 of 125) cause, although their ability to induce disease is unequivocal. For a comment on the clinical classification of these 16 alleles, see the Supplementary Materials.
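The distinction between restricted and varying effects can be expressed as a small labeling step. The sketch below is not the classification algorithm of Supplementary Figure S1; it only labels an allele once its macrocategories have been assigned. The function name `classify_effect` is hypothetical, and the three example assignments are those reported for these alleles in this article.

```python
def classify_effect(macrocategories):
    """Label an allele by the number of macrocategories assigned to it."""
    n = len(set(macrocategories))
    if n == 0:
        return "unassigned"
    return "restricted" if n == 1 else "variable"

# Assignments as reported in this article for three alleles.
examples = {
    "L206W": {"CFTR-RD"},
    "T338I": {"CF-PS", "CFTR-RD", "CBAVD"},
    "2789+5G>A": {"CF-PS", "CF-PI"},
}
labels = {allele: classify_effect(cats) for allele, cats in examples.items()}
print(labels["L206W"])  # restricted
print(labels["T338I"])  # variable

# Consistency check of the counts quoted in the text.
assigned, restricted, variable, total = 109, 87, 22, 125
assert restricted + variable == assigned
print(round(100 * assigned / total, 1))       # 87.2
print(round(100 * restricted / assigned, 1))  # 79.8
```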
The best recent approach to characterizing CFTR mutations is the CFTR2 study (11) (http://www.cftr2.org). That approach and ours have both common and distinctive features. The main common characteristic is that both are phenotype-driven. The main distinctive characteristics are that CFTR2 is focused on the most common CFTR mutations worldwide and on classic forms of CF (with a positive sweat test), whereas our study also includes nonclassic CF forms (including those with a borderline sweat test) and rare mutations. Consequently, a greater mutational heterogeneity in the CFTR gene was observed in our study. A direct consequence is that 43.2% of the alleles we identified (54 of 125, also taking into account complex alleles) were not included in the CFTR2 study. Another three alleles we classified had been recognized in the CFTR2 study as being of unknown significance. If the two alleles with an uncertain significance in our study were also excluded, 66 alleles remained that were classified in both studies and could consequently be compared. For these alleles, the level of agreement between the two characterization approaches was excellent, with 95.5% of them (63 of 66) being classified similarly. In particular, 54 alleles were classified as causing CF-PI and/or CF-PS in our study, in perfect agreement with their classification as CF-causing in the CFTR2 study. Another four alleles were classified as belonging to two or more different macrocategories (including at least one of the nonclassic ones, namely CFTR-RD and/or CBAVD) in our study and as having varying clinical consequences in the CFTR2 study, again an excellent match.
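The number of comparable alleles and the agreement rate follow from simple arithmetic on the counts given above. The short check below uses only those counts; the variable names are illustrative.

```python
total_alleles = 125
not_in_cftr2 = 54      # alleles absent from the CFTR2 study
unknown_in_cftr2 = 3   # of unknown significance in CFTR2
uncertain_here = 2     # of uncertain significance in this study

# Alleles classified in both studies and therefore comparable.
comparable = total_alleles - not_in_cftr2 - unknown_in_cftr2 - uncertain_here
concordant = 63

print(comparable)                                   # 66
print(round(100 * not_in_cftr2 / total_alleles, 1)) # 43.2
print(round(100 * concordant / comparable, 1))      # 95.5
```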
A good level of agreement may also be recognized for the mutations R117H (p.Arg117His) and S977F (p.Ser977Phe), classified in our study as CFTR-RD-causing, and the mutation D579G (p.Asp579Gly), classified in our study as CF-PS-causing; all three are recognized in the CFTR2 study as having varying clinical consequences, which therefore include our phenotypic findings. A similarly good match may also be assumed for two mutations classified as non-CF-causing in CFTR2 [S1235R (p.Ser1235Arg) and R31C (p.Arg31Cys)] but as CFTR-RD-causing in our study, because CFTR2 is mainly aimed at classic CF and is more prone to classify as noncausing those mutations that give rise to nonclassic clinical and biochemical phenotypes. The three truly discrepant alleles were L997F (p.Leu997Phe) without R117L (p.Arg117Leu) in cis; L206W (p.Leu206Trp); and T338I (p.Thr338Ile). According to the findings that emerge both from this work and from previous studies (28), the L997F (p.Leu997Phe) allele can also give rise to CF-PS, whereas in the CFTR2 study it was classified as non-CF-causing. The L206W (p.Leu206Trp) allele, which in our study was classified as a CFTR-RD-causing mutation, was classified as CF-causing in the CFTR2 study.
The T338I (p.Thr338Ile) allele is classified in the CFTR2 study as CF-causing. In our study, it gave rise not only to CF-PS (in agreement with the CFTR2 study) but also to CFTR-RD and CBAVD. This result extends the phenotypic consequences of this mutation, in accord with previous findings (45,46). These discrepancies may be linked to the degree of variability that is independent of CFTR and that accounts for about 4.5% (3 of 66) of the mutations considered in both studies. We found 52 individual mutated alleles that, in combination with nonindividual alleles, gave rise to 146 (64.9%) genotypes found only once in a single patient. The assignment of a pathological role to these individual alleles was straightforward, although there were no other patients in our case series to confirm the assignment. However, for 19 (of 52) individual alleles, a comparison with the CFTR2 study was possible, and for these alleles we found perfect agreement between our classification and that of the CFTR2 study. Overall, the excellent agreement on common, rare and individual alleles included in both studies lends further support to our conclusions regarding the other mutations not included in the CFTR2 study. Our classification method represents a good example of how it is possible to deduce the phenotypic consequence of a mutated allele on the basis of a well-defined clinical classification and a reasonably extended mutational analysis, even in the absence of experimental functional studies. Obviously, the limitations of any attempt at pathogenicity inference by a phenotype-driven approach that considers a limited number of cases should be taken into account. The final goal should be to achieve a large case series as a starting point for the experimental functional analysis of CFTR sequence variations. This approach would be more powerful but also more complex and time-consuming.
The approaches designed to assign a phenotypic outcome to CFTR sequence variations are generally based on an allele-oriented view. However, it is widely accepted that overall residual functionality depends on both alleles and, ultimately, on the allele with the higher residual functionality. To address this issue, we developed a two-allele combinatorial view of the clinical outcome that is to be expected, starting from previously classified alleles (Supplementary Table S5). This approach may be considered a genotype-oriented tool for predicting clinical outcome, in part experimentally validated in this work and in part inferred (to be experimentally verified when the specific genotypes are identified).
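The principle that the allele with the higher residual functionality determines the outcome can be sketched as a combinatorial rule. This is an illustration only and does not reproduce Supplementary Table S5. It assumes a severity ordering from CF-PI (most severe) to CBAVD (mildest), which is a simplification introduced here, and the function name `predict_genotype` is hypothetical.

```python
from itertools import product

# ASSUMED ordering from most to least severe; illustrative only.
SEVERITY = ["CF-PI", "CF-PS", "CFTR-RD", "CBAVD"]
RANK = {name: i for i, name in enumerate(SEVERITY)}

def predict_genotype(allele_a, allele_b):
    """Predict the possible macrocategories of a two-allele genotype.

    Each argument is the set of macrocategories assigned to one allele.
    For every pairing, the milder category (higher residual function)
    is taken as the expected outcome.
    """
    outcomes = {
        max(a, b, key=RANK.__getitem__)
        for a, b in product(allele_a, allele_b)
    }
    return sorted(outcomes, key=RANK.__getitem__)

# F508del (CF-PI causing) combined with T338I (variable effect).
f508del = {"CF-PI"}
t338i = {"CF-PS", "CFTR-RD", "CBAVD"}
print(predict_genotype(f508del, t338i))    # ['CF-PS', 'CFTR-RD', 'CBAVD']
print(predict_genotype(f508del, f508del))  # ['CF-PI']
```

Under this rule, a genotype combining a CF-PI-causing allele with an allele of varying effect inherits the full range of the milder allele, which is consistent with the F508del / T338I genotype being observed in more than one macrocategory.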
The molecular mechanisms underlying the variability between genotype and phenotype are as yet unclear. At least two steps may be involved. The first step may be defined as the transition from the CFTR-mutated genotype to CFTR protein residual function (genotype → residual functionality step). It is reasonable to assume that this transition is more likely to be influenced by intragenic (CFTR-dependent) variability. This variability may originate from the large number of sequence variations and be markedly enhanced by their combination in trans and in cis as complex alleles (1,28), as well as by a regulatory posttranscriptional and posttranslational impairment that often escapes recognition. The second step may be defined as the transition from the CFTR protein residual function to clinical phenotype (residual functionality → clinical step). This transition is more likely to be influenced by extragenic variability due to genes other than CFTR, such as the so-called modifier genes (9,47) and the CFTR interactome (48,49). Significant differences between the populations analyzed emerged in the average sweat test values, which may be considered an in vivo measurement of the CFTR protein residual function. This result highlights that at least a general correlation between residual functionality and clinical macrocategories exists. However, the marked variability within each population results in a considerable overlap between the sweat test values. This intra-population variability seems to arise from the presence of several different genotypes, each giving rise to its own range of residual functionality, although narrower than that observed in the overall populations. It may be argued that at least some of the variability arises in the genotype → residual functionality step.
On the other hand, it is to be expected that the different functionalities that arise from the same genotype are correlated with different clinical manifestations (the higher the sweat test value, the more severe the clinical manifestations). However, no such correlation was detected between different sweat test values obtained from the same genotype and the clinical presentation. Furthermore, similar sweat test values associated with different genotypes may be observed in different populations. It may therefore be argued that another part of the variability arises in the residual functionality → clinical step. The relative contribution of the two steps to the overall variability is still largely unknown and deserves further quantitative studies.

CONCLUSION
The full clinical and mutational characterization of CF patients reveals a genetic heterogeneity that underlies a strong correlation between genotypic patterns and phenotypic macrocategories. This specificity appears to largely depend on rare and individual mutations, as well as on the varying prevalence of common alleles in the populations analyzed. A pathogenic classification of sequence variations on the basis of rigorous clinical studies and an extended mutational search may be a rapid and meaningful way to initially characterize sequence variations for which an experimental functional characterization is still lacking, as well as a starting point for subsequent experimental quantitative functional characterizations. The experimental dissection of the overall biological CFTR pathway appears to be a powerful approach for a better comprehension of the sources of variability in the genotype–phenotype relationship. Overall, our findings call
For more information on methods and strategies, see references (15–17) mentioned in the Materials and Methods in the article.
Supplementary Table S2. List of mutations with a previously described controversial effect that underwent supplemental study. The genotypes including these mutations underwent the mutational search up to the SEQ step even when 2 mutations on different alleles were already found, and up to the DEL step if classified as CF-PI. The alleles are ordered according to their nucleotidic position. See text and Table S3
)) were only found in the CF-PI macro-category, and were consequently classified as CF-PI causing. This led to the ex17a-18del (c.2988+1173_3468+2111del8600), ex2del (c.54-1161_164+1603del2875), E585X (p.Glu585*) and L1077P (p.Leu1077Pro) alleles, which had not been classified under the first rule because they were never found in homozygosity, also being classified as CF-PI causing.
When this rule was explored, the 2789+5G>A (c.2657+5G>A) and R347P (p.Arg347Pro) alleles were also found in the CF-PI macro-category when combined in compound heterozygosity with alleles already classified as CF-PI causing; consequently they were re-classified not only as CF-PS causing, but also as CF-PI causing. Some examples of the third rule. The S945L (p.Ser945Leu) allele, found in the CF-PS macro-category in compound heterozygosity with an already classified CF-PI causing allele (genotype F508del / S945L (p.
[Arg117Cys];[Tyr569Asp])), was classified as CBAVD causing. The T338I (p.Thr338Ile), found in both the CF-PS and CFTR-RD macro-categories in compound heterozygosity with an already classified CF-PI causing allele (genotype F508del / T338I (p. [Phe508del];[Thr338Ile])), was initially classified as a variable CF-PS and CFTR-RD causing allele; when this rule was further explored, it was also found in the CBAVD macro-category when combined in compound heterozygosity with an allele already classified as variable CF-PI and CF-PS causing; consequently it was re-classified not only as CF-PS and CFTR-RD causing, but also as CBAVD causing.
Supplementary Table S3. Allele frequencies. Data are ordered by decreasing frequency in the CF (PI + PS) population. The column "suitable mutational search" indicates the specific step of mutational analysis that highlights the corresponding mutated allele. Definitions in the column "familiarity": individual = found only in 1 patient; siblings = found only in siblings of a single family. N = number of alleles found.