L-Type Calcium Channel: Predicting Pathogenic/Likely Pathogenic Status for Variants of Uncertain Clinical Significance

(1) Background: Defects in gene CACNA1C, which encodes the pore-forming subunit of the human Cav1.2 channel (hCav1.2), are associated with cardiac disorders such as atrial fibrillation, long QT syndrome, conduction disorders, cardiomyopathies, and congenital heart defects. Clinical manifestations are known only for 12% of CACNA1C missense variants, which are listed in public databases. Bioinformatics approaches can be used to predict the pathogenic/likely pathogenic status for variants of uncertain clinical significance. Choosing a bioinformatics tool and pathogenicity threshold that are optimal for specific protein families increases the reliability of such predictions. (2) Methods and Results: We used databases ClinVar, Humsavar, gnomAD, and Ensembl to compose a dataset of pathogenic/likely pathogenic and benign variants of hCav1.2 and its 20 paralogues: voltage-gated sodium and calcium channels. We further tested the performance of sixteen in silico tools in predicting pathogenic variants. ClinPred demonstrated the best performance, followed by REVEL and MCap. In the subset of 309 uncharacterized variants of hCav1.2, ClinPred predicted the pathogenicity for 188 variants. Among these, 36 variants were also categorized as pathogenic/likely pathogenic in at least one paralogue of hCav1.2. (3) Conclusions: The bioinformatics tool ClinPred and the paralogue annotation method consensually predicted the pathogenic/likely pathogenic status for 36 uncharacterized variants of hCav1.2. An analogous approach can be used to classify missense variants of other calcium channels and novel variants of hCav1.2.


Introduction
L-type Cav1.2 calcium channels are expressed in various excitable cells including cardiomyocytes [1]. Defects in gene CACNA1C, which encodes the pore-forming α1 subunit of the hCav1.2 channel, underlie cardiac disorders such as atrial fibrillation, long QT syndrome, conduction disorders, cardiomyopathies, and congenital heart defects [2]. With the advent of whole-exome sequencing data, public databases are rapidly replenished with new gene variants. Over 350 CACNA1C missense variants are listed in public databases but clinical significance is known for only 12% of the variants. The guideline of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) recommends the employment of computational tools to predict damaging variants [3]. Numerous computational tools, which are based on different principles, have been developed to predict the pathogenicity and tolerance of genetic variants [4]. The 2 of 9 success rate of these tools varies from 60% to 80% [5]. The ACMG/AMP guideline recommends the employment of multiple software programs for variants' interpretation because individual programs and underlying algorithms have their own strengths and weaknesses. Therefore, the choice of bioinformatics tools is critical for reliable variant interpretation.
The performance of in silico tools is highly variable and depends on the disease phenotype [6]. For instance, tools MCap, MetaSVM, and MetaLR demonstrated the top performance in predicting the pathogenicity for variants associated with abnormalities in the cardiovascular system [6]. However, some methods yielded many false-positive and false-negative predictions of pathogenicity in individual protein families [7][8][9][10]. For example, MetaSVM, which reportedly has a high accuracy in general [11] and in particular for variants associated with cardiovascular diseases, predicted a deleterious effect for 75% of benign variants of the cardiac sodium channel Nav1.5 [7]. Thus, choosing a tool with a high rate of correct predictions for specific protein families allows to improve predictions [7,12].
Many statistical and machine learning tools, which use only protein sequences, have been proposed to predict variant pathogenicity. An alternative approach is the paralogous annotation method [7,13] that is based on analysis of variants in the multiple sequence alignment of functionally and structurally related proteins. The method assumes that if one protein is known to have a damaging mutation in a certain position, analogous mutation in the sequentially matching position of another protein is likely a damaging one. Consensus predictions of pathogenicity by both sequence-based methods and the paralogous annotation within the family of structurally related voltage-gated sodium and calcium channels is expected to provide more reliable pathogenicity prediction for hCav1.2 than individual approaches alone.
A comprehensive predictor assessment requires a benchmark with both positive (pathogenic/likely pathogenic) and negative (benign) variants. Here, we composed a dataset to test the performance of various predictors in identifying known pathogenic/likely pathogenic (P/LP) and benign variants in the families of voltage-gated sodium and calcium channels. We collected common (benign) missense variants from the gnomAD database and P/LP missense variants from three databases that are referred to in the next section. We evaluated the performance of 16 popular prediction tools and identified top-performing tools for the hCav1.2 channel and its paralogs. The best-performing bioinformatics tool, ClinPred, and the paralogue annotation method consensually predicted that 36 hCav1.2 variants of uncertain clinical significance are putative pathogenic/likely pathogenic variants.

Data Collection and Preprocessing
Paralogues of the hCav1.2 channel were previously identified [7]. Sequences of hCav1.2 and its paralogues were obtained from the UniProt database [14]. Pathogenic/likely pathogenic variants of hCav1.2 and its paralogues were collected from three databases: Humsavar (https://www.uniprot.org/docs/humsavar, last visited 26 January 2021), Ensembl Variation [15], and ClinVar [16]. Only 'disease' variants were extracted from the databases Ensembl and Humsavar. From the ClinVar database, we selected variants that are characterized as 'pathogenic' or 'likely pathogenic' and are associated with specific clinical conditions. Benign (neutral) variants along with their minor allele frequencies (AF) were obtained from the population database gnomAD (data release 2.1.1, October 2018) [17]. Variants with AF > 0.0001 were considered as benign [17,18]. Variants of uncertain clinical significance were extracted from ClinVar and Ensembl. All variants were combined into one broad dataset (Table S1).

hCav1.2 Topology
Region borders of the hCav1.2 channel were determined according to Uniprot entry Q13936. The pore-forming α1 subunit of the Cav1.2 channel folds from a single polypeptide chain of four homologous repeat domains (DI-DIV), which are connected by intracellular linkers. Each domain has six transmembrane helices (S1-S6) and a large extracellular membrane-reentering P-loop ( Figure 1). In each repeat, helices S1-S4 constitute a voltage sensing domain, whereas helices S5, S6, and the P-loop contribute a quarter to the pore domain.

hCav1.2 Topology
Region borders of the hCav1.2 channel were determined according to Uniprot entry Q13936. The pore-forming α1 subunit of the Cav1.2 channel folds from a single polypeptide chain of four homologous repeat domains (DI-DIV), which are connected by intracellular linkers. Each domain has six transmembrane helices (S1-S6) and a large extracellular membrane-reentering P-loop ( Figure 1). In each repeat, helices S1-S4 constitute a voltage sensing domain, whereas helices S5, S6, and the P-loop contribute a quarter to the pore domain. Each domain contains six transmembrane segments (S1-S6). Variants of uncertain clinical significance, which are reclassified here as P/LP variants, are marked by green triangles; known P/LP variants are marked by red triangles; and benign variants are marked by blue triangles.

Multiple Sequence Alignment and Paralogue Annotation
The paralogue annotation method specifically identifies P/LP missense variants by transferring annotations across families of related proteins [13,19]. Recently a modified method of paralogue annotation [13] was used to predict the pathogenicity of variants in the cardiac sodium channel hNav1.5 [7]. This approach is applied here to select potential P/LP variants from the large set of VUS in the Cav1.2 channel.
We have chosen 20 paralogues of hCav1.2: ten voltage-gated sodium channels, nine voltage-gated calcium channels, and a voltage-independent non-selective sodium leak channel (Table 1). For each paralogue, P/LP variants were collected (section 2.1). Amino Each domain contains six transmembrane segments (S1-S6). Variants of uncertain clinical significance, which are reclassified here as P/LP variants, are marked by green triangles; known P/LP variants are marked by red triangles; and benign variants are marked by blue triangles.

Multiple Sequence Alignment and Paralogue Annotation
The paralogue annotation method specifically identifies P/LP missense variants by transferring annotations across families of related proteins [13,19]. Recently a modified method of paralogue annotation [13] was used to predict the pathogenicity of variants in the cardiac sodium channel hNav1.5 [7]. This approach is applied here to select potential P/LP variants from the large set of VUS in the Cav1.2 channel.
We have chosen 20 paralogues of hCav1.2: ten voltage-gated sodium channels, nine voltage-gated calcium channels, and a voltage-independent non-selective sodium leak channel (Table 1). For each paralogue, P/LP variants were collected (Section 2.1). Amino acid sequences of Cav1.2 and its paralogues were aligned by Clustal Omega [20]. Each paralogue mutation was mapped on the Cav1.2 sequence according to the alignment (Table S2). Position-specific conservation scores (Cs) were calculated using the Zvelebil method [21] as implemented in the Amino Acid Conservation Calculation Service [22]. Variants in positions with conservation scores > 0.3 were considered as P/LP variants according to References [7,19].

Annotation of Missense Variants
We used the dbNSFPv4 database [23] to evaluate the performance of popular tools in predicting P/LP and benign variants. The database contains pre-computed scores for all potential amino acid substitutions that are taken from the following tools: SIFT, Polyphen HVAR, Polyphen HDIV, Mutation Taster, Mutation Assessor, PROVEAN, FATHMM, FATHMM_XF, MetaSVM, MetaLR, CADD, ClinPred, REVEL, PrimateAI, Eigen, and MCap [4]. The performance of algorithms was evaluated using the broad dataset, which includes P/LP and benign variants from hCav1.2 and its 20 paralogues. Variant prioritization tools do not always provide scores for every variant reported in dbNSFP. We selected only those tools in cases where scores are missing for less than 30% of the variants in our dataset. Altogether, 16 different scores were considered. We calculated the area under the ROC (receiver operating characteristic) curve (AUC) using the library pROC in R [24]. The higher the AUC score, the better is the tool performance in the dataset. ROC curves were obtained by plotting sensitivity against (1-specificity) at each threshold for each tool.
To increase the reliability of the analysis, we determined the optimal pathogenicity threshold using the R package pROC ( Table 2). All variants were divided into two categories (P/LP or benign) according to the calculated thresholds. To evaluate the performance of these tools, we calculated the following standard characteristics: where TP (true positive) is the number of P/LP variants correctly predicted as pathogenic; FN (false negative) is the number of P/LP variants incorrectly predicted as benign; TN (true negative) is the number of benign variants correctly predicted as benign; and FP (false positive) is the number of benign variants incorrectly predicted as pathogenic. Custom threshold is the custom pathogenicity threshold that divides variants in two categories: pathogenic or benign. The larger or smaller the score than the threshold, the more likely the variant is damaging. Sensitivity characterizes the number of P/LP variants, which were predicted as P/LP by the tool, while specificity characterizes the number of benign variants, which were predicted as benign by the tool. Accuracy indicates the predictive accuracy of the tool [25].

Distribution of Missense Variants in Topological Regions of hCav1.2
To identify hCav1.2 regions with pathogenic or benign variants, we explored the occurrence of mutation in the regions. About 70% of hCav1.2 missense variants are localized in cytoplasmic linkers DI-DII and DII-DIII, and in N and C-terminal parts. Most of the P/LP variants appear in linkers DI-DII and DII-DII. Most of the benign variants are localized in the C-terminal part of the channel protein ( Figure 1). Highly conserved transmembrane domains contain relatively few P/LP variants.
Most of the P/LP variants of hCav1.2 are associated with long QT syndrome (56%). Other variants are associated with Timothy syndrome, Brugada syndrome, inborn genetic diseases, and other CACNA1C-related disorders (Table S1). Variants causing long QT syndrome and Timothy syndrome most often are localized in interdomain linkers DI-DII and DII-DIII. Some variants are associated with two or more diseases.

Amino Acid Substitutions in P/LP and Benign Variants
We analyzed statistics of amino acid substitutions in the 21 channels. Arg and Leu have high mutation rates in the P/LP variants, whereas Arg and Ala are highly mutable in benign variants. The most frequent event, irrespective of disease relevance, is the substitution of a hydrophobic residue by another hydrophobic residue. Hydrophobic substitutions of Leu by Pro are frequently associated with diseases, whereas in neutral variants, substitutions of Pro by Leu are most frequent (Table 3). Table 3. Occurrence of top five residue types in P/LP and benign variants.

Type of Data WT Residue Mutant Residue Preferred Substitutions
All The human genome has 20 paralogues of the Cav1.2 channel: ten voltage-gated sodium channels, nine voltage-gated calcium channels, and a voltage-independent nonselective sodium leak channel (Table 1). Using the multiple sequence alignment of hCav1.2 and its paralogues (Section 2.3), we mapped each paralogue protein residue with a known P/LP variant onto the sequentially matching amino acid of hCav1.2 (Table S2).
A total of 146 known P/LP variants in paralogues were mapped to 89 variants in Cav1.2 (Table S2). Among these, 11 variants correspond to known P/LP variants of Cav1.2, two correspond to benign variants, and 76 correspond to VUS. Most of the paralogue variants were mapped to linkers DI-DII and DII-DIII, N and C-terminal parts, and S5/S6 loops of repeat domains DI and DIII (Table S2).

Comparing Performance of the Computational Tools
We compared the performance of 16 in-silico tools that predict the pathogenicity of variants (Table 2) [23]. Pre-computed algorithm scores were retrieved from the dbNSFPv4 database. ROC curves and AUC are shown in Figure 2. For each tool, we determined the optimal pathogenicity threshold and calculated sensitivity, specificity, and accuracy ( Table 2).
We used AUC as the main measure of performance. ClinPred demonstrated the best predictive performance (AUC = 0.97) in the broad dataset (Figure 2), followed by REVEL (AUC = 0.93) and MCap (AUC = 0.91). Moreover, ClinPred is the only tool that performed with accuracy as (AUC) > 0.90. It correctly classified 95% of the P/LP variants as pathogenic variants and 94% of the benign variants as benign variants (Table 2). MutationAssessor, Polyphen, FATHMM, and MutationTaster performed with an accuracy of < 0.80. The accuracy of other algorithms ranged from 0.80 to 0.88. The lowest accuracy across all methods was found for MutationTaster (accuracy = 0.62, AUC = 0.66). For the hCav1.2 channel, ClinPred, CADD, MetaSVM, Polyphen HDIV, and MutationTaster correctly classified all 22 P/LP variants as pathogenic. However, these methods also assigned pathogenic status for some benign variants. ClinPred predicted 4/21 (19%) benign variants as pathogenic. Polyphen HDIV and MutationTaster classified less than 60% of benign variants as benign.
The results indicate that ClinPred is the best-performing pathogenicity predictor for variants in the family of voltage-gated sodium and calcium channels.

Reclassifying VUS Variants of hCav1.2 with ClinPred and Paralogue Annotation
Most of the Cav1.2 variants in our broad dataset are currently classified as VUS. We used the best-performing tool, ClinPred, to predict the pathogenicity of VUS'. Clin-Pred predicted 188 VUS variants as pathogenic/likely pathogenic (pathogenicity threshold > 0.66). Among the 188 VUS, we further selected only those variants that are annotated as pathogenic or likely pathogenic in at least one of 20 paralogs of Cav1.2 (conservation score across paralogues Cs > 0.3). Both methods consensually predicted 36 of 309 VUS as P/LP variants. We reclassified these as putative P/LP variants ( Table 4). The variants were localized mainly in the extracellular loops DI-S5/S6 and DIII-S5/S6 (Figure 1).
The results indicate that ClinPred is the best-performing pathogenicity predictor for variants in the family of voltage-gated sodium and calcium channels.

Discussion
Numerous bioinformatics methods are used to predict the deleteriousness of missense variants. These methods rely on supervised machine learning models that are trained on a collection of manually annotated variants to predict the probable pathogenicity for each amino acid substitution. Our recent analysis revealed that in the cardiac sodium channel hNav1.5, the MetaSVM method, which reportedly has a high accuracy in general [11] and in particular for variants associated with cardiovascular diseases [6], predicted a deleterious effect for 75% of variants that are annotated as benign [7]. It is recommended to find a specific predictor with the optimal threshold value for each family of proteins [12].
In the present study, we have shown that various popular bioinformatics tools yield different predictions of pathogenicity for known P/LP and benign variants in the family of the hCav1.2 channel and its paralogues. In our broad dataset, ClinPred performed best with an area of 0.97 under the receiver operating characteristic curve. ClinPred is a metapredictor that combines commonly used and recently developed individual prediction tool scores, as well as the allele frequency of variants in different populations from the gnomAD database [26]. Its high AUC and prediction accuracy values in our dataset suggest that ClinPred was trained on large newer datasets, which overlap with the dataset used in this study.

Conclusions
Here, we created a broad dataset that includes all known missense variants in the hCav1.2 channel and 20 paralogues voltage-gated calcium and sodium channels. Most of the known pathogenic/likely pathogenic variants are found in intracellular linkers DI-DII and DII-DIII. Among sixteen pathogenicity prediction tools, which were tested using our broad dataset, ClinPred demonstrated the best performance in distinguishing pathogenic/likely pathogenic variants from benign ones. The best-performing tool is expected to improve in silico assessment of clinically relevant variants of hCav1.2 and its paralogues. ClinPred and the paralogue annotation method consensually predicted that 36 variants of hCav1.2, which are currently classified as variants of unknown significance, are pathogenic/likely pathogenic variants. Most of these variants are located in the extracellular loops DI-S5/S6 and DIII-S5/S6. The reclassified variants can be used for diagnostics of cardiac diseases. They are also promising targets for further experimental and theoretical studies.