Comparative In Vitro and In Silico Analyses of Variants in Splicing Regions of BRCA1 and BRCA2 Genes and Characterization of Novel Pathogenic Mutations

Several unclassified variants (UVs) have been identified in splicing regions of disease-associated genes, and their characterization as pathogenic mutations or benign polymorphisms is crucial for understanding their role in disease development. In this study, 24 UVs located at BRCA1 and BRCA2 splice sites were characterized by transcript analysis. These results were used to evaluate the ability of nine bioinformatics programs to predict genetic variants causing aberrant splicing (spliceogenic variants) and the nature of the aberrant transcripts. Eleven variants in BRCA1 and 8 in BRCA2, including 8 not previously characterized at transcript level, were ascertained to affect mRNA splicing. Of these, 16 led to the synthesis of aberrant transcripts containing premature termination codons (PTCs), 2 to the up-regulation of naturally occurring alternative transcripts containing PTCs, and one to an in-frame deletion within the region coding for the DNA binding domain of BRCA2, causing the loss of the ability to bind the partner protein DSS1 and ssDNA. For each computational program, we evaluated the rate of non-informative analyses, i.e., those that did not recognize the natural splice sites in the wild-type sequence, and the rate of false positive predictions, i.e., variants incorrectly classified as spliceogenic, as a measure of their specificity, under conditions setting sensitivity of predictions to 100%. The programs that performed best were Human Splicing Finder and Automated Splice Site Analyses, both exhibiting 100% informativeness and specificity. For 10 mutations the activation of cryptic splice sites was observed, but we were unable to derive simple criteria to select, among the different cryptic sites predicted by the bioinformatics analyses, those actually used. Consistent with previous reports, our study provides evidence that in silico tools can be used for selecting splice site variants for in vitro analyses. However, the latter remain mandatory for the characterization of the nature of aberrant transcripts.


Introduction
It is estimated that approximately 5% to 10% of all breast cancers occur in women with a positive family history, and that approximately 15% to 25% of familial aggregations are due to deleterious germline mutations affecting either the BRCA1 (MIM# 113705) or BRCA2 (MIM# 600185) genes [1,2]. Carriers of these mutations have a 40% to 80% probability of developing breast cancer in their life [3] and show an increased risk of other cancers, particularly ovarian carcinoma. As a consequence, BRCA1 and BRCA2 genetic testing has become a widely used procedure in the clinical management of families with genetic predisposition to breast/ovarian cancer, thus allowing discrimination of at-risk mutation carriers from non-carriers, whose cancer risk can be assumed comparable to that of the general population. However, the usefulness of these molecular analyses depends on the ability to correctly distinguish truly pathogenic mutations, i.e. responsible for the increased risk of cancer, from genetic variants without clinical relevance. Most clinically relevant alterations detected in BRCA1 and BRCA2 are nonsense or frameshift mutations that, by introducing a premature termination codon (PTC), lead to non functional proteins. Moreover, transcripts containing PTCs are mostly subject to nonsense mediated mRNA decay (NMD) [4]. Conversely, the interpretation of other genetic variants, including missense and silent substitutions, and alterations in intronic and regulatory regions, cumulatively referred to as unclassified variants (UVs), or variants of unknown significance (VUS), is not so straightforward. As a consequence, counseling of families in which only UVs are detected is difficult, since the genetic analyses fail to unambiguously identify at-risk individuals. To increase the informativeness of genetic testing in breast/ovarian cancer families, multifactorial likelihood models for the classification of UVs have been developed and applied (reviewed in [5,6]). 
These models take into account several factors. At present, these include the co-segregation of the variant with the disease in families and its co-occurrence in trans with a deleterious mutation in the same gene, personal and family history of cancer, histopathological tumor features, and, limited to missense mutations, the conservation across species of the affected amino acid and the nature and position of the substitution. The usefulness of integrated models is limited by the amount of data necessary to reach the required odds ratios, in favor or against causality, for reliable classification of UVs. Indeed, multifactorial likelihood methods are usually unable to classify BRCA1 and BRCA2 UVs detected in few families only [7]. This provides a strong rationale for the use of functional assays for the characterization of UVs under the assumption that they are highly sensitive and specific in detecting deleterious mutations.
A subgroup of UVs is represented by intronic and exonic alterations located in consensus splicing regions that are potentially pathogenic since they may lead to aberrant transcript(s), either lacking one or more exons, or even part of them, or retaining intronic sequences. Several UVs in the BRCA1 and BRCA2 genes with a potential consequence on mRNA splicing have been studied by cDNA analysis or reporter minigene assay. These studies show that transcript characterization is a powerful approach to correctly classify these UVs [8-21]. However, in some instances, different mRNA transcript patterns have been reported in association with the same mutation by diverse studies [18,22]. This inconsistency of results between laboratories is possibly due to the different experimental protocols adopted.
Moreover, several computational programs available online have been developed to recognize the natural acceptor and donor splice sites [23]. Numerous studies have shown that these tools may be used to predict whether BRCA1 and BRCA2 mutations located at splice sites and adjacent regions are expected to have an effect on mRNA splicing [13-18,20-22,24-27]. Therefore, they have been proposed to be instrumental in UV classification.
In this study, we characterized by transcript analysis 24 UVs located at donor and acceptor consensus splice sites of BRCA1 and BRCA2, including the nearly invariant dinucleotides at the 5′ and 3′ intron ends and adjacent nucleotides. Of the examined variants, 11 had not been previously analyzed at mRNA level, whereas 13 variants had been already examined in earlier studies. Transcript profiles observed in the latter group were compared with those previously described. In addition, we compared the experimental results with the outcome of computational analyses to evaluate the ability of different bioinformatics tools to identify deleterious splice site mutations and the nature of aberrant transcripts.

Materials and Methods
The UVs analyzed in this study were detected following direct sequencing of all coding exons and adjacent intronic regions of BRCA1 and BRCA2 (GenBank no. U14680 and U43746, respectively) in index cases from families complying with the previously reported eligibility criteria for BRCA gene testing [28]. A total of 24 UVs were investigated, including 11 not previously characterized at mRNA level (10 in BRCA1 and 1 in BRCA2). The variants consisted of 2 groups: the first (Group A) included 11 alterations (6 in BRCA1 and 5 in BRCA2) located at nearly invariant GT/AG dinucleotides at the 5′ and 3′ intron ends, and the second (Group B) 13 alterations (9 in BRCA1 and 4 in BRCA2) in the adjacent less conserved splicing regions, including the first 2 and the last 3 exonic nucleotides and the intronic regions ranging from IVS±3 to IVS+8 and IVS-12 [29].

Ethics Statement
All subjects included in the study received genetic counseling and provided a written informed consent for BRCA gene mutation testing and for the use of their biological samples for research purposes, approved by the ethical committees of Fondazione IRCCS Istituto Nazionale Tumori and Istituto Europeo di Oncologia in Milan, and IRCCS San Martino IST-Istituto Nazionale per la Ricerca sul Cancro, Genoa.

Cell Cultures
Epstein-Barr virus (EBV)-immortalized human lymphoblastoid cell lines (LCLs) were established from peripheral blood of UV carriers. LCLs were maintained in RPMI 1640 medium supplemented with 15% fetal bovine serum plus 1% penicillin-streptomycin. Potential degradation of unstable transcripts via NMD was prevented by growing LCLs for 6 hours in the presence of 100 µg/ml puromycin prior to RNA extraction [4]. MCF7 human breast cancer cells were cultured in DMEM medium supplemented with 10% fetal calf serum plus 1% penicillin-streptomycin. LCLs and MCF7 cells were cultured at 37°C in a humidified 5% CO2 atmosphere.

RNA Extraction and Reverse Transcriptase-PCR (RT-PCR) Product Analysis
Total RNA was purified from LCLs using the Nucleospin RNA II (Macherey-Nagel). cDNA was synthesized using random primers and the ImProm-II™ Reverse Transcriptase (Promega), or gene-specific primers and SuperScript III™ Reverse Transcriptase (Invitrogen), according to the manufacturers' protocols. For each UV studied, a specific PCR experiment was developed. Forward and reverse primers (Table S1) were designed to anneal to cDNA sequences flanking the gene region addressed by the alteration. The cDNA from a human LCL previously tested negative for BRCA1 and BRCA2 mutations was used as wild-type control. RT-PCR products were separated on agarose gel and visualized by ethidium bromide staining. Each UV examined was categorized as 'normal' or 'spliceogenic' (i.e., causing aberrant splicing) by comparison of the corresponding electrophoretic pattern with that of the wild-type cDNA. Altered transcript patterns were subsequently confirmed by comparison with the transcript patterns observed in 10 healthy controls. Unfractionated PCR products were cleaned using ExoSAP-IT® (USB Corporation) and characterized by direct sequencing. When the exact nature of each amplicon could not be assessed by the direct sequencing of PCR products, normal and aberrant bands were excised from the agarose gel, purified using the Wizard SV Gel and PCR Clean-Up System (Promega) and individually sequenced. Alternatively, the amplicons were separated by cloning into the pGEM-T vector (Promega). Recombinant plasmids were transformed into E. coli (SoloPack Gold, Agilent Technologies) and the inserts of individual clones were sequenced. All sequence reactions were performed using the ABI PRISM® BigDye™ Terminator Cycle Sequencing kit (Applied Biosystems) and examined on an ABI 3130 Genetic Analyzer (Applied Biosystems), using the Sequencing Analysis software (Applied Biosystems).

Assessment of Allelic Expression of Normal Transcripts
The ability of analyzed variants to synthesize normal transcripts was investigated by variant-specific PCR assays. In each assay, the primers were designed to anneal to sequences exclusive of the normal cDNA and to generate amplicons that included either the site of the exonic variant, or, if the variant was intronic, a polymorphic site for which the corresponding carrier had been previously found to be constitutionally heterozygous.
The amplification products were sequenced as previously described. In the presence of bi-allelic expression, the PCR products were cloned into the pGEM-T vector. Recombinant plasmids were transformed into E. coli (SoloPack Gold, Agilent Technologies) and the inserts of individual clones were sequenced to quantify the relative amount of normal transcripts expressed by the wild-type and the mutant alleles.

Pull-down Assays
Full-length DSS1 cDNA and BRCA2 cDNA fragments, encoding the DSS1/DNA Binding Domain (DBD) and the N-terminal region, were obtained by RT-PCR of RNA purified from wild-type and BRCA2-mutated LCLs, and cloned into pEGFP-C1 (DSS1) or pGEX-4T1 (BRCA2). The BRCA2 c.8850G>T (p.Lys2950Asn) variant was inserted by direct mutagenesis into wild-type cDNA using the QuikChange XL Site-directed Mutagenesis Kit (Stratagene). Recombinant clones were verified by DNA sequencing. pGEX-4T1/BRCA2 clones were transformed into E. coli strain BL21 (DE3) by electroporation. MCF7 cells were transfected with pEGFP-C1/DSS1 using FuGENE 6 Reagents (Roche Applied Science) and stable transfectants expressing green fluorescent protein (GFP)-DSS1 were obtained by selection in the presence of G418 (500 µg/ml). Single clones were checked by RT-PCR and Western blotting.
The glutathione-S-transferase (GST) tagged recombinant proteins, generated from the pGEX-4T1/BRCA2 constructs, were expressed and purified from the soluble fraction using Glutathione (GSH) Sepharose 4B beads according to the manufacturer's protocol (Amersham Biosciences).
For DSS1 binding assays, the wild-type and mutated resin-bound GST-BRCA2 recombinant polypeptides were incubated with lysates from MCF7 GFP-DSS1 transfectants in binding buffer for 3 hours at 4°C on a rocker as described [30]. Complexes recovered from the beads were resolved on 8% SDS-PAGE gels and visualized by Coomassie blue staining or by immunoblotting with an anti-GFP antibody.
For single-stranded DNA (ssDNA) binding assays, the mutant and wild-type BRCA2 polypeptides were removed from GSH-Sepharose beads by thrombin digestion (1 U/100 µg) for 1 hour at room temperature in elution buffer (10 mM GSH in 50 mM Tris-HCl pH 8.0). Free proteins were mixed with 50 µl of ssDNA agarose beads (Amersham Biosciences) and 100 µl of binding buffer (25 mM Tris-HCl pH 7.5, 10% glycerol, 0.01% Triton X-100, 0.25 mM PMSF, 1 mM EDTA, 150 mM NaCl) for 2 hours at 4°C on a rocker. The supernatants were recovered and the beads washed 4 times with 300 µl of binding buffer. Equivalent amounts of supernatants (free fraction, F) and ssDNA agarose beads (bound-fraction, B) were resolved on 10% SDS-PAGE gels and visualized by Coomassie staining.
Gene regions addressed by the variants under analysis were submitted to bioinformatics analyses using the human default parameter settings of the different programs. For all programs except ASSA, the splice site prediction scores (SSPSs) in the wild-type and the mutated sequences were compared and the relative percent difference was calculated as follows: $[(\mathrm{SSPS}_{mut} - \mathrm{SSPS}_{wt}) / \mathrm{SSPS}_{wt}] \times 100$. For ASSA, which measures the binding affinity of the spliceosome to wild-type and mutated splice sites using information theory-based values ($R_i$) measured in bits (where a 1 bit change represents a 2-fold change [42]), the percent difference of binding affinity in the mutated compared to the wild-type sequences was calculated as follows: $[2^{(R_{i,mut} - R_{i,wt})} - 1] \times 100$.
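The two percent-difference calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the example scores are hypothetical and are not taken from any of the programs or tables in this study.

```python
def ssps_percent_difference(ssps_wt: float, ssps_mut: float) -> float:
    """Relative percent difference of a splice site prediction score (SSPS).

    Implements [(SSPS_mut - SSPS_wt) / SSPS_wt] x 100.
    A negative result means the mutated site scores lower than the wild type.
    """
    if ssps_wt == 0:
        # The ratio is undefined when the wild-type site has a zero score.
        raise ValueError("wild-type SSPS must be non-zero")
    return (ssps_mut - ssps_wt) / ssps_wt * 100.0


def ri_percent_difference(ri_wt: float, ri_mut: float) -> float:
    """Percent difference of binding affinity from information content (Ri, bits).

    Implements [2^(Ri_mut - Ri_wt) - 1] x 100, so that a change of 1 bit
    corresponds to a 2-fold change in affinity.
    """
    return (2.0 ** (ri_mut - ri_wt) - 1.0) * 100.0


# Hypothetical example values (not from the study):
print(ssps_percent_difference(80.0, 60.0))  # score drops from 80 to 60
print(ri_percent_difference(8.0, 7.0))      # loss of exactly 1 bit
```

With these hypothetical inputs, a drop from 80 to 60 is a 25% decrease, and the loss of one bit halves the predicted affinity (a 50% decrease), which matches the 2-fold-per-bit interpretation given in the text.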
In addition, we verified the ability of bioinformatics programs to identify the alternative splice sites that were observed in in vitro analyses to be activated following the destruction of the natural splice sites. For programs that were able to identify all such alternative splice sites, the sequence encompassing 500 bp upstream and downstream of the natural splice site affected by the alteration was submitted to bioinformatics analyses and the SSPS and Ri patterns in the mutated sequences were analyzed.

mRNA Transcript Analysis
The occurrence of aberrant transcripts was observed for 19 variants, including all 11 mutations of group A (Table 1), and 8 out of 13 variants of group B (Table 2).
To verify whether the identified spliceogenic alleles maintained the ability to synthesize wild-type mRNAs, normal transcripts were selectively amplified from the cDNAs of carriers of the investigated mutations, using variant-specific PCR assays, and sequenced. The location of PCR primers and the nucleotide changes analyzed to verify allelic expression are reported in Table S2. In 2 cases, BRCA2 c.476-2A>G and c.8755-1G>A, cDNA sequence analyses revealed maintenance of the constitutional heterozygosity for the c.1114C>A SNP (exon 10; rs144848) and the c.9876G>A synonymous change (exon 27), respectively (data not shown), indicating expression of the normal mRNA from both the wild-type and mutated alleles. The corresponding PCR products containing the sites of heterozygosity were cloned into plasmid vectors and single recombinant clones were sequenced. A total of 23 clones were analyzed for the c.476-2A>G mutation. Of these, 20 (87%) carried the rs144848 C allele and 3 (13%) the A allele. Of the 52 clones analyzed for the c.8755-1G>A mutation, 47 (90%) carried the G allele and 5 (10%) the A allele of the synonymous change.
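As a quick check of the clone-count arithmetic above, the allele percentages can be recomputed from the raw counts. The sketch below is illustrative only; the helper name `allele_percentages` is hypothetical, while the counts are those reported in the text.

```python
from typing import Dict


def allele_percentages(clone_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-allele clone counts into percentages of all sequenced clones."""
    total = sum(clone_counts.values())
    if total == 0:
        raise ValueError("no clones were counted")
    return {allele: 100.0 * n / total for allele, n in clone_counts.items()}


# Clone counts reported in the text for the two sites of heterozygosity.
rs144848 = allele_percentages({"C": 20, "A": 3})      # 23 clones in total
c9876 = allele_percentages({"G": 47, "A": 5})         # 52 clones in total

for name, result in (("rs144848", rs144848), ("c.9876G>A", c9876)):
    print(name, {allele: round(pct) for allele, pct in result.items()})
```

Rounded to whole percentages, the counts reproduce the 87%/13% and 90%/10% splits reported for the two sites.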
For 11 of the 13 remaining intronic mutations cDNA sequence analyses detected hemizygosity at polymorphic sites for which the corresponding carriers were heterozygous at the genomic level (Tables 1 and 2). The occurrence of mono- or bi-allelic expression of normal transcripts could not be assessed for 2 intronic mutations (BRCA1 c.4986+1G>T and c.134+3_134+6delAAGT) due to the lack of informative exonic polymorphisms. Finally, for all 4 spliceogenic mutations located in exons, cDNA sequencing revealed the presence of only the nucleotide corresponding to the wild-type allele.
Normal mRNA splicing was observed for the remaining 5 variants of group B, including BRCA1 c.5333A>G and BRCA2 c.9116C>T, already analyzed [20-22], and BRCA1 c.548-3delT, c.594-4A>G and c.4097G>A, not previously characterized. To account for the possible occurrence of NMD, LCLs carrying these variants were analyzed following treatment with puromycin. No aberrant transcripts were found. In addition, sequence analyses of cDNAs, investigating the presence of the exonic variants or of constitutionally heterozygous polymorphisms, revealed bi-allelic expression in all cases.

Functional Analysis of BRCA2 p.Val2985_Thr3001del
All 19 identified spliceogenic UVs led to PTCs, except the BRCA2 c.8954-1_8955delGTTinsAA, which resulted in the in-frame deletion of 51 nucleotides at the 5′-end of exon 23, with the consequent loss of 17 amino acids (p.Val2985_Thr3001del) in the DBD of the protein. In addition to ssDNA, BRCA2 DBD interacts with several proteins, including DSS1 whose binding is crucial for DNA double-strand break repair [51]. Furthermore, many BRCA2 missense mutations, classified as deleterious by multifactorial likelihood model analysis [7], lie within this domain, emphasizing its functional role.
The functional consequences of the BRCA2 p.Val2985_Thr3001del mutation were assessed by testing its effect on DBD binding to DSS1 and ssDNA. Wild-type and mutant resin-bound GST-BRCA2 DBD polypeptides (Fig. 3A) were used as bait in pull-down experiments against extracts from MCF7 GFP-DSS1 transfectants, and the extent of DSS1 binding was evaluated by Western blotting using an anti-GFP antibody. DSS1 protein was found to interact efficiently with wild-type BRCA2 DBD and with BRCA2 DBD carrying a variant (p.Lys2950Asn) classified as clinically neutral [7], while the BRCA2 DBD p.Val2985_Thr3001del mutant failed to interact with DSS1 (Fig. 3B).
To evaluate the affinity of the p.Val2985_Thr3001del mutant for ssDNA, both wild-type and mutated BRCA2 DBD polypeptides, along with a BRCA2 200-aa N-terminal polypeptide, as negative control, were cleaved from the GST-agarose beads by thrombin digestion and chromatographed on ssDNA agarose beads. The pellets, representing the ssDNA-bound fraction, and the accompanying supernatants were analyzed separately by gel electrophoresis and stained with Coomassie dye. The BRCA2 wild-type and p.Lys2950Asn polypeptides were both recovered in the ssDNA agarose bead fraction (bound fraction, B), whereas the N-terminal fragment and the p.Val2985_Thr3001del mutant were completely recovered in the supernatant fraction (free fraction, F) (Fig. 3C). These results indicate that the c.8954-1_8955delGTTinsAA mutation, causing an in-frame 17-aa deletion in the DBD, abrogates the ability of BRCA2 to bind the DSS1 protein and its affinity for ssDNA.

In silico Splicing Analysis
We next examined the ability of in silico tools to discriminate between spliceogenic and non-spliceogenic variants. Since all group A variants were found to be spliceogenic in vitro, this analysis was restricted to group B variants. Computed values (SSPSs for all programs except ASSA, and Ri for ASSA) in wild-type and mutated sequences are reported in Table S3. We observed that most programs failed to recognize the presence of one or more natural splice sites in the wild-type sequences of BRCA1 and BRCA2, using default settings. Therefore, these programs could not be used to evaluate the effect of variants located in these unrecognized sites (non-informative analysis). Only 3 programs (MES, HSF and ASSA) were found to identify all investigated splice sites.
Then, for each program we determined the smallest SSPS/Ri percent decrease observed for a spliceogenic mutation. This value (varying from 4.1% for HSF to 100% for GS and SP) was assumed as the minimal SSPS/Ri difference predictive of a spliceogenic mutation (Table 3). This was done in order to set to 100% the sensitivity of in silico analyses in identifying variants affecting mRNA splicing. Finally, for each program we calculated the rate of false positive analyses (i.e., the number of variants incorrectly classified as spliceogenic over the total number of true non-spliceogenic variants) as a measure of the specificity of their predictions. Considering informative analyses only, the rate of false positive analyses ranged from 0% for SSF, GS, HSF, SV and ASSA to 50% for NNSPLICE and NG2 (Table 3).
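The evaluation scheme described above (fix sensitivity at 100% by taking the smallest decrease seen for a verified spliceogenic variant as the cutoff, then count false positives among informative analyses) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the data layout, and all numeric values are hypothetical and do not reproduce Table 3. A decrease of `None` stands for a non-informative analysis, i.e., the program did not recognize the natural splice site in the wild-type sequence.

```python
from typing import List, Optional, Tuple

# One entry per variant for a single program:
# (percent decrease of SSPS/Ri, or None if non-informative; spliceogenic in vitro?)
Variant = Tuple[Optional[float], bool]


def evaluate_program(variants: List[Variant]) -> Tuple[float, float, float]:
    """Return (cutoff, non-informative rate, false positive rate)."""
    informative = [(d, s) for d, s in variants if d is not None]
    non_informative_rate = 1.0 - len(informative) / len(variants)

    spliceogenic = [d for d, s in informative if s]
    if not spliceogenic:
        raise ValueError("no informative spliceogenic variant to set the cutoff")
    # Smallest decrease seen for a verified spliceogenic variant: using it as
    # the cutoff makes every informative spliceogenic variant a predicted positive.
    cutoff = min(spliceogenic)

    negatives = [d for d, s in informative if not s]
    false_positives = sum(1 for d in negatives if d >= cutoff)
    fp_rate = false_positives / len(negatives) if negatives else 0.0
    return cutoff, non_informative_rate, fp_rate


# Hypothetical data for one program (not from the study).
example = [
    (100.0, True), (5.0, True), (40.0, True), (None, True),
    (0.0, False), (8.0, False), (2.0, False), (None, False),
]
cutoff, non_inf, fp = evaluate_program(example)
print(f"cutoff={cutoff}%, non-informative={non_inf:.0%}, false positives={fp:.0%}")
```

In this made-up example the cutoff is 5%, a quarter of the analyses are non-informative, and one of the three informative non-spliceogenic variants (the one with an 8% decrease) is a false positive. The cutoff is derived from the same variants it is then applied to, so the resulting specificity describes this data set rather than performance on new variants.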
The SSPS/Ri values in the wild-type and mutated sequences of the alternative splice sites that were observed in vitro to be used following the inactivation of the natural splice sites are reported in Table S4. Only 3 programs (MES, HSF and ASSA) were found to recognize all such alternative sites in the mutated sequences, either as newly created or cryptic sites (i.e., either not predicted or already predicted in the wild-type sequence, respectively). Limited to these programs, we examined the SSPS/Ri patterns in the mutated sequences spanning ±500 bp from the natural splice site. We found that in most cases the alternative splice sites actually used were not those with the highest SSPS/Ri in the considered region, or those closest to the abrogated natural splice site. Moreover, the  (Table S5).

Discussion
In this study we molecularly characterized 24 UVs in the BRCA1 and BRCA2 genes with potential effect at mRNA level. A total of 19 spliceogenic mutations were identified. These included all 11 variants located at invariant dinucleotides at the 5′ and 3′ intron ends, as expected, and 8 out of 13 UVs in less conserved positions of splicing regions. Sixteen mutations led to the synthesis of aberrant transcripts containing PTCs, 2 (BRCA1 c.134+3_134+6delAAGT and c.212G>A) to the up-regulation of naturally occurring PTC-containing isoforms, and one (BRCA2 c.8954-1_8955delGTTinsAA) to the in-frame deletion of 51 nucleotides at the 5′-end of exon 23, within the region coding for the DBD, a critical functional domain of the BRCA2 protein.
Functional analyses revealed that the latter alteration caused the loss in the mutant protein of the ability to bind the partner protein DSS1 and ssDNA. Based on these observations, all spliceogenic mutations were classified as pathogenic or likely pathogenic, according to current guidelines for the interpretation of the results of in vitro splicing analyses [23]. These guidelines adopt the 5-class classification criteria proposed by Plon et al. [52], and classify spliceogenic mutations as of class 5 (probability of being pathogenic >99%) or of class 4 (probability of being pathogenic = 95%-99%), depending on the relative amount of aberrant transcripts.
Following this scheme, 15 mutations for which only expression of aberrant transcripts was observed were considered of class 5, whereas the 2 mutations that maintained the ability to express normal in addition to aberrant transcripts were provisionally categorized as of class 4. To assess the relative amount of normal and aberrant transcripts expressed by these alleles, additional quantitative analyses are required. For the remaining 2 spliceogenic mutations the distinction between class 4 and class 5 could not be made due to the inability to assess allele-specific expression of the normal mRNA (Tables 1 and 2). It must be remarked that a recent study, based on the analysis of LCL mRNA, reported 4 spliceogenic BRCA gene mutations introducing PTCs that were classified as uncertain or likely neutral by multifactorial likelihood analyses [17]. Although it is likely, as the authors of the study reported, that this discrepancy depended on a reduced performance of the multifactorial analyses, due to a paucity of information and/or the use of a non-specific prior probability of pathogenicity for the variants analyzed, these data suggest that the mutation effect detected in blood cells may not necessarily reflect that occurring in at-risk tissues, such as breast and ovarian epithelium. Another possible explanation for the inconsistency between the outcome of in vitro splicing analyses and that of multifactorial models is the occurrence of spliceogenic mutations that maintain the ability to synthesize a normal in addition to an aberrant mRNA [13,14,16,19,25,26,53]. These mutations may have an impact on cancer risk different from that of fully inactivating alterations. As mentioned above, we detected 2 such mutations (BRCA2 c.476-2A>G and c.8755-1G>A). However, quantitative analyses indicated that in both cases the contribution of the mutated allele to the total amount of normal mRNA was small.
Assuming that most normal mRNA transcripts derive from the wild-type allele, we found that only approximately 10% originated from the mutated allele. Both the above mutations were detected in a single family each, and insufficient data were available for a reliable classification using multifactorial models. It is interesting to note that, although splice site mutations producing both normal and aberrant transcripts would be expected to be prevalently, if not exclusively, located in less conserved regions, both identified 'leaky' mutations were localized at the nearly invariant dinucleotides at the 5′ and 3′ intron ends. However, we could not formally rule out that expression of normal transcripts occurred also for other examined spliceogenic mutations, due to the relatively limited sensitivity of sequencing analyses in assessing allele-specific expression.
Figure 3 (partial caption). (B) Interaction of wild-type and mutated BRCA2 DBD polypeptides with DSS1. Equivalent amounts of GST-tagged wild-type or mutated BRCA2 fusion proteins were immobilized on GSH-Sepharose beads and challenged with MCF7 lysates as a source of GFP-DSS1. Input (top panel) and pulled down (middle panel) GFP-DSS1 protein were visualized by Western blotting with anti-GFP antibody. GSH-Sepharose beads and GST protein were used as negative controls. GST-tagged recombinant proteins were visualized by Coomassie staining of the SDS-PAGE gel used in the pull-down experiment (bottom panel). (C) Interaction of wild-type and mutated BRCA2 polypeptides with ssDNA. The mutated and wild-type peptides, removed from glutathione-agarose beads by thrombin digestion, were chromatographed on ssDNA agarose beads. A 200-amino-acid N-terminal peptide was used as negative control. The free (F) and bound (B) fractions were separated, submitted to gel electrophoresis and visualized by Coomassie staining. Immunoblots were scanned using HP Scanjet G3010 Photo Scanner (Hewlett Packard). doi:10.1371/journal.pone.0057173.g003
Table 3. In silico predicted effect of group B variants and comparison with experimental results.
Of the 5 non-spliceogenic variants, 2 were intronic and 3 introduced missense changes (p.Gly1366Asp and p.Asp1778Gly in BRCA1 and p.Pro3039Leu in BRCA2). For all the latter substitutions, the Align-GVGD algorithm [54] predicted a prior probability of pathogenicity of 1%. Therefore, following current guidelines [23], all non-spliceogenic variants were classified as likely non-pathogenic (class 2, probability of pathogenicity = 0.1%-4.9%). This classification was in agreement with additional evidence from previous studies. In particular, BRCA1 p.Asp1778Gly, located in the C-terminal transcriptional activation BRCT domain of the protein, was predicted as neutral by 3 computational supervised learning algorithms based on features describing evolutionary conservation, impact of mutation on protein structure, and amino acid residue [55]. This prediction has been recently confirmed by a comprehensive analysis using biochemical and cell-based transcriptional assays [56]. In addition, the variant was not detected in the proband's affected mother. Finally, the BRCA2 p.Pro3039Leu has been classified as neutral using a bioinformatics approach integrating information about protein sequence, conservation and structure in a likelihood ratio [57].
For 8 of the 13 variants that had been already investigated at the cDNA level, our findings were consistent with those of earlier reports, while for the remaining 5 variants (all spliceogenic) the observed transcript patterns differed from those described by previous studies (Table S6). This was possibly due to the different experimental protocols that were used, suggesting that differences may occur in the ability of in vitro analyses to detect mRNA transcripts, particularly those expressed at low level. Another potential source of inconsistency might be the use of different types of biological samples. Although no discrepancies emerged in the classification of the examined variants as spliceogenic or non-spliceogenic when comparing our data with those of previous studies, our observations emphasize the need to develop standardized methods for in vitro characterization of UVs through gene transcript analyses, particularly when the outcomes of these analyses are used to counsel carriers of variants at splice sites.
In previous studies, bioinformatics analyses have been proposed as a first step to select variants predicted to affect mRNA splicing and, in particular, those located outside the nearly invariant dinucleotides at the 5′ and 3′ intron ends [14-16,20,22]. To further verify the reliability and the usefulness of these programs for a priori selection of spliceogenic UVs, we compared the computational splice-site predictions obtained from 9 commonly used programs with the experimental results derived from cDNA analyses. Consistent with previous reports [14,16,17,20-22,25,26], we found that most tested programs showed an incomplete informativeness, i.e., were not able to recognize all natural splice sites affected by the variants under analysis. Thus, the effect of nucleotide substitutions at these sites could not be subsequently computed, limiting the usefulness of these programs. In our analysis only 3 programs (MES, HSF and ASSA) exhibited 100% informativeness.
While the performance of a selective process is usually measured in terms of accuracy, i.e., the optimal compromise between sensitivity and specificity, it must be considered that UV classification in cancer predisposing genes is mainly carried out for clinical purposes, i.e., to define risk estimates in carriers of such variants [52]. Along this line, we reasoned that a mandatory prerequisite of the procedures for BRCA1 and BRCA2 variant selection for transcript characterization is 100% sensitivity. Therefore, in our study, we considered that a spliceogenic effect was predicted when an in silico analysis measured a relative decrease of the SSPS/Ri values (of the natural splice site in the mutated compared to the wild-type sequence) at least as large as the lowest detected in the presence of an in vitro verified spliceogenic mutation. Based on this assumption, we then verified the specificity, measured as the rate of false positive predictions, of each program. In our hands, this was found to be equal to 100%, i.e., no false positive prediction, for 5 programs: SSF, GS, HSF, SV and ASSA. In a general evaluation, the programs that performed best were HSF and ASSA, the only ones exhibiting 100% informativeness and 100% specificity.
The knowledge of the precise nature of aberrant transcripts is crucial for the assessment of the pathogenicity of spliceogenic mutations. For example, variant alleles producing transcripts carrying in-frame deletions not disrupting known functional domains are currently classified as of unknown clinical significance [23] and some of them might actually be clinically neutral. This is supported by the observation that the BRCA2 c.6853A>G variant, resulting in increased exclusion of exon 12, is phenotypically indistinguishable from an allele with exon 12 deleted and wild-type BRCA2 in functional analyses using allelic complementation in Brca2-null mouse embryonic stem cells [58]. Therefore, it is important to ascertain whether a spliceogenic mutation, in addition to abolishing the recognition of a natural splice site, leads to the creation of novel splice sites or the activation of cryptic ones. As already discussed, in this study the usage of alternative splice sites was observed in a relevant fraction of ascertained spliceogenic variants (10/19 = 53%). We sought to verify to what extent computational programs are able to predict such occurrences. We found that only 3 programs (MES, HSF and ASSA) recognized all experimentally ascertained alternative splice sites. However, these programs also detected other putative cryptic splice sites in the vicinity of the abolished naturally occurring splice sites and, consistent with a previous report [21], we were unable to derive simple criteria, based on the outcomes of the in silico analyses, for the prediction of the specific alternatively used splice sites. On the other hand, it is also possible that some of the cryptic sites predicted in silico could be activated in mutant samples, but the corresponding aberrant transcripts were not observed in vitro due to a limited sensitivity of the detection method we used.

Conclusions
Our study provides further evidence that in silico tools may be used for the ascertainment of splice site variants to be submitted to in vitro analyses. We performed a comparative analysis of 9 freely available computational programs, and found that those that performed best in identifying variants affecting RNA splicing, under our analytical scheme, were HSF and ASSA. However, in vitro analyses remain mandatory for the characterization of the exact nature of aberrant transcripts. Wider surveys within the frame of large collaborative consortia, such as the recently established 'Evidence-based Network for the Interpretation of Germline Mutant Alleles' (ENIGMA) [59], are needed in order to define the most effective protocols for the use of bioinformatics analyses in the ascertainment of spliceogenic mutations.