Performance of mutation pathogenicity prediction tools on missense variants associated with 46,XY differences of sex development

OBJECTIVES: Single nucleotide variants (SNVs) are the most common type of genetic variation among humans. High-throughput sequencing methods have recently characterized millions of SNVs in several thousand individuals from various populations, most of which are benign polymorphisms. Identifying rare disease-causing SNVs remains challenging, and often requires functional in vitro studies. Prioritizing the most likely pathogenic SNVs is of utmost importance, and several computational methods have been developed for this purpose. However, these methods are based on different assumptions, and often produce discordant results. The aim of the present study was to evaluate the performance of 11 widely used, freely available pathogenicity prediction tools in identifying known pathogenic SNVs: FATHMM, Mutation Assessor, Protein Analysis Through Evolutionary Relationships (PANTHER), Sorting Intolerant From Tolerant (SIFT), Mutation Taster, Polymorphism Phenotyping v2 (PolyPhen-2), Align Grantham Variation Grantham Deviation (Align-GVGD), CADD, PROVEAN, SNPs&GO, and MutPred.
METHODS: We analyzed 40 functionally proven pathogenic SNVs in four different genes associated with differences in sex development (DSD): 17β-hydroxysteroid dehydrogenase 3 (HSD17B3), steroidogenic factor 1 (NR5A1), androgen receptor (AR), and luteinizing hormone/chorionic gonadotropin receptor (LHCGR). To evaluate the false discovery rate of each tool, we analyzed 36 frequent (MAF>0.01) benign SNVs found in the same four DSD genes. The quality of the predictions was analyzed using six parameters: accuracy, precision, negative predictive value (NPV), sensitivity, specificity, and Matthews correlation coefficient (MCC). Overall performance was assessed using a receiver operating characteristic (ROC) curve.
RESULTS: None of the tools was 100% precise in identifying pathogenic SNVs. The highest specificity, precision, and accuracy were observed for Mutation Assessor, MutPred, and SNPs&GO. These tools also performed best in the ROC curve analysis. Of the 11 tools evaluated, 6 (Mutation Assessor, PANTHER, SIFT, Mutation Taster, PolyPhen-2, and CADD) exhibited sensitivity >0.90, but with lower specificity (0.42-0.67). Performance, based on MCC, ranged from poor (FATHMM=0.04) to reasonably good (MutPred=0.66).
CONCLUSION: Computational algorithms are important tools for SNV analysis, but their correlation with functional studies is not consistent. In the present analysis, the best performing tools (based on accuracy, precision, and specificity) were Mutation Assessor, MutPred, and SNPs&GO, which presented the best concordance with functional studies.


INTRODUCTION
The term "differences in sex development" (DSD) refers to congenital conditions in which chromosomal, gonadal, or anatomical sex development is atypical (1). They can be classified into three major categories: sex chromosome DSDs, 46,XX DSDs, and 46,XY DSDs (2). Most causes of DSDs are genetically determined, and several genes have been found to be associated with the DSD phenotype (3). Recent studies in individuals with DSDs have characterized numerous single nucleotide variants (SNVs) in several genes, most of which are benign polymorphisms. However, distinguishing rare disease-causing SNVs from rare polymorphisms remains challenging. Functional studies of disease-associated variants are often used, but are laborious and time-consuming (4,5).
Many methods have been developed for the computational prediction of the pathogenicity of SNVs, which are based on evolutionary conservation, protein structure/function, or assembly parameters, such as allelic diversity, pathogenicity, and association with genome-wide association studies (6). Studies analyzing the performance of prediction programs have been completed using a large number of missense variants (7). In the present study, we compared the performance of 11 widely used pathogenicity prediction tools in the analysis of proven pathogenic DSD-causing SNVs in four different genes.

Dataset
We analyzed 40 disease-causing SNVs in four different genes associated with DSD: 17β-hydroxysteroid dehydrogenase 3 (HSD17B3), steroidogenic factor 1 (NR5A1), androgen receptor (AR), and luteinizing hormone/chorionic gonadotropin receptor (LHCGR). All pathogenic allelic variants have been published with functional studies showing loss of function activity (Table 1). To evaluate the false discovery rate of each tool, we selected 36 frequent benign SNVs (MAF>0.01) found in the same DSD genes (Table 1).

Statistical Analysis
The quality of the predictions was analyzed using six parameters: accuracy, precision, negative predictive value (NPV), sensitivity, specificity, and Matthews correlation coefficient (MCC). In the equations below, tp, tn, fp, and fn refer to true positive, true negative, false positive, and false negative, respectively.
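The standard definitions of these six parameters, which are presumably the forms intended here, can be written as follows. The symbols tp, tn, fp, and fn are those defined above; the layout itself is an assumption, since only the parameter names are given in the text.

```latex
\begin{align*}
\text{Accuracy}    &= \frac{tp + tn}{tp + tn + fp + fn} \\
\text{Precision}   &= \frac{tp}{tp + fp} \\
\text{NPV}         &= \frac{tn}{tn + fn} \\
\text{Sensitivity} &= \frac{tp}{tp + fn} \\
\text{Specificity} &= \frac{tn}{tn + fp} \\
\text{MCC}         &= \frac{tp \cdot tn - fp \cdot fn}
                           {\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}}
\end{align*}
```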
The MCC (43) is a statistical measure that is widely used in bioinformatics as a performance metric, as it is not affected by the differing proportions of neutral and pathogenic datasets predicted by the different programs. We also assessed the overall performance of deleterious prediction with the receiver operating characteristic (ROC) curve and area under the curve (AUC), using MedCalc for Windows, version 15.0 (MedCalc Software, Ostend, Belgium). ROC curves show classification performance at various threshold settings, and the AUC measures the degree of separability between classes. Together, they indicate how capable a model is of distinguishing between classes. The higher the AUC, the better the model is at predicting an outcome (44).
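To illustrate what the AUC measures, the following minimal Python sketch computes it as the proportion of pathogenic/benign pairs in which the pathogenic variant receives the higher score, with ties counted as one half. This is an illustration only and is not the procedure implemented in MedCalc; the function name `auc_pairwise` and the example scores are hypothetical and are not data from this study.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (pathogenic, benign) pairs ranked correctly.

    scores: prediction scores, higher meaning "more likely pathogenic".
    labels: 1 for pathogenic, 0 for benign, in the same order as scores.
    """
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("both classes must be present to compute an AUC")

    correctly_ranked = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                correctly_ranked += 1.0   # pathogenic variant scored higher
            elif p == b:
                correctly_ranked += 0.5   # a tie counts as half a pair
    return correctly_ranked / (len(pathogenic) * len(benign))


# Hypothetical scores for 4 pathogenic and 4 benign variants (not study data).
example_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]
example_auc = auc_pairwise(example_scores, example_labels)
print(f"AUC = {example_auc:.4f}")
```

In this hypothetical example, 15 of the 16 pathogenic/benign pairs are ranked correctly, so the AUC is 15/16. An AUC of 0.5 corresponds to no separation between classes, and an AUC of 1.0 to perfect separation.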

RESULTS
Based on the results for each program, none of the tools was 100% precise in identifying pathogenic SNVs. The values for the parameters measured are listed in Table 3, and include all pathogenic and benign variants. PANTHER had the highest precision in the classification of pathogenic variants (38 of the 40 known pathogenic SNVs), followed by Mutation Taster and PolyPhen-2 (37 of 40 each). Align-GVGD correctly classified fewer known pathogenic SNVs than any other tool (33 of 40). PANTHER and Mutation Taster both classified a high number of known benign SNVs as pathogenic (21 and 17 of 36, respectively).
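As a worked check of how such counts translate into the six parameters, the sketch below uses the counts given above for the tool that classified 38 of 40 pathogenic and 21 of 36 benign SNVs as pathogenic. It assumes that every variant received a call, so that fn = 40 - 38 = 2 and tn = 36 - 21 = 15; this is an assumption, as missing predictions are not reported in the text. The function name `confusion_metrics` is hypothetical.

```python
from math import sqrt


def confusion_metrics(tp, tn, fp, fn):
    """Return the six summary parameters for one 2x2 confusion matrix.

    Assumes every denominator except the MCC one is non-zero.
    """
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # MCC is conventionally set to 0 when a marginal total is zero.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Counts taken from the text; fn and tn are derived under the assumption
# that all 40 pathogenic and all 36 benign variants received a prediction.
metrics = confusion_metrics(tp=38, tn=36 - 21, fp=21, fn=40 - 38)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed counts, the sensitivity is 38/40 = 0.95 and the specificity is 15/36, or approximately 0.42, which agrees with the lower end of the specificity range reported for the high-sensitivity tools.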
Mutation Assessor, MutPred, and SNPs&GO presented more consistent results regarding the nature of the SNVs (pathogenic or benign) (40-42). MutPred had the highest accuracy, precision, and specificity (0.83, 0.85, and 0.83, respectively), as seen in Table 4. Mutation Assessor had the highest sensitivity of all the tools evaluated. Five other tools (PANTHER, SIFT, Mutation Taster, PolyPhen-2, and CADD) also exhibited sensitivity >0.90, but with lower specificity (0.42-0.67). Based on MCC, performance ranged from poor (FATHMM=0.04) to reasonably good (MutPred=0.66). FATHMM and Align-GVGD exhibited the worst performance, with a high number of false positive results (MCC=0.04 and 0.06, respectively). The comparative predictive performance of each tool was evaluated using the AUC scores from ROC plots and the true negative rate (TNR, or specificity) as measurements. The analysis was separated into random groups, since the program analyzed a maximum of six samples at a time (Figure 1

DISCUSSION
In the present study, we analyzed and compared the abilities of 11 widely available tools for predicting the pathogenicity of SNVs. Although some algorithms are based on the same data sets, they differ in the database for conservation analysis and structural attributes. They also differ in the information required to run the predictions, as some programs request the accession number of the gene, others the protein change, nucleotide change, or chromosomal position.
Overall, we found that Mutation Assessor, MutPred, and SNPs&GO were the most reliable predictors for SNV classifications. They also exhibited the best AUC results. The performance of all tools evaluated ranged from poor to reasonably good (MCC=0.04-0.66). These results are consistent with previous studies (7,9), which suggests that the number of samples used in the analysis did not strongly influence the statistical result.
In conclusion, computational algorithms are important screening tools for prioritizing and identifying disease-causing SNVs, but their correlation with functional studies is not consistent. In the present analysis, the highest-performing tools were Mutation Assessor, MutPred, and SNPs&GO.

AUTHOR CONTRIBUTIONS
Montenegro LR contributed to the acquisition, analysis, interpretation of data, and drafting of the article. Lerario AM contributed to the interpretation of data and revising the article. Nishi MY contributed to the interpretation of data, drafting and revising the article. Jorge AA contributed to the analysis and interpretation of data. Mendonca BB contributed to the conception and design of the study, drafting and revising the article.