Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors

Background: Variant interpretation is essential for identifying patients’ disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb).
Results: The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods.
Conclusions: VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Supplementary Information: The online version contains supplementary material available at 10.1186/s40246-024-00663-z.

This need prompted the development of Variant Impact Predictors (VIPs), tools or databases designed to predict the consequences of genetic variants. The first VIP (known to us) was developed in 1993 to predict different types of collagen variants involved in osteogenesis imperfecta, using decision trees [7]. Since then, hundreds of genetic VIPs have been developed, with a variety of methodologies and goals [8]. Some overlapping categories of variants considered by different tools are single nucleotide variations (SNVs), insertions and deletions (indels), structural variations (SVs), nonsynonymous variants, synonymous variants, splicing variants, and regulatory variants. VIPs are designed for different contexts, such as germline variants, somatic variants, or specific diseases or genes. While most provide pathogenicity scores, some provide valuable information about molecular mechanisms and other details [9]. The variety of VIPs underscores the complex nature of variant interpretation and makes it challenging for users to identify the most suitable VIPs for their specific needs. VIPdb aims to support transparency to inform these decisions.
Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation [10]. Recognizing the need for an organized approach to explore available VIPs, several research entities have constructed resources facilitating the informed use of VIPs. Initiatives like the Critical Assessment of Genome Interpretation (CAGI) conduct community experiments to assess VIPs across different variant types and contexts (https://genomeinterpretation.org) [10][11][12]. The dbNSFP (database for Nonsynonymous Single-nucleotide polymorphisms' Functional Predictions) hosts precomputed results from several VIPs [13]. OpenCRAVAT integrates hundreds of VIP analyses of cancer-related variants in one platform, enhancing accessibility for users [14]. These resources have played an important role in introducing users to VIP options. To complement them, we developed VIPdb to serve as a comprehensive resource for exploring VIPs.
To systematically evaluate the pathogenicity of a variant in a clinical laboratory, ACMG/AMP has established guidelines for interpreting genetic variants that integrate several lines of evidence, including population data, functional data, segregation data, and computational prediction [15]. ClinGen, CGC, and VICC have also developed standards for classifying the pathogenicity of somatic variants in cancer [16]. Historically, VIPs provided only supporting evidence in determining the pathogenicity or benignity of variants in clinical settings. However, recent ClinGen clinical recommendations allow VIPs to provide stronger evidence [17]. This greater role for VIPs in providing evidence for clinical decisions could improve genetic disease diagnosis.
The Variant Impact Predictor database (VIPdb) offers a curation of available computational tools for predicting variant impact. Initially established in 2007 and 2010 [18], the database was last updated in 2019 [8]. VIPdb version 2 is a comprehensive update through January 2, 2024, with select additional methods added through July 2024 (Supplementary Table S1).

Implementation
Our identification of VIPs involved searching for potential VIPs and examining their articles to determine whether they should be included in VIPdb. In the initial step, we searched the literature on PubMed using the query "(((tool[Title]) OR (pipeline[Title])) AND (variant[Title/Abstract]))" and collected potential VIPs citing pioneering VIPs (SIFT, PolyPhen, ANNOVAR, and SnpEff) [19][20][21][22][23][24][25][26][27][28][29][30]. Additionally, we gathered potential VIPs from existing databases such as OpenCRAVAT and dbNSFP, as well as from submissions by VIP developers. Subsequently, we examined the literature and included only programs capable of handling variant data, such as VCF files, rsIDs, or genomic locations, and providing evidence or predictions of the variant impacts. Overall, this resulted in the identification of 190 additional VIPs, augmenting VIPdb to a total of 407 VIPs (Supplementary Table S1) [7,13,.
To facilitate users' exploration of available VIPs, we described key features of each VIP. VIPs primarily designed for variant impact prediction were labeled as such. VIPs not originally designed for variant impact prediction but nonetheless used for this purpose, such as those estimating conservation scores and population allele frequencies, were categorized as non-primary. VIPs that consist of data collected from elsewhere, such as clinical classifications and functional data, were categorized as databases. Conversely, VIPs that compute variant impact predictions were classified as computational tools (labeled as non-databases) even if the data available are precomputed by the tool. Furthermore, as VIPs are designed for different types of genetic variants, we classified the VIPs according to the following overlapping categories of input: single nucleotide variant (SNV), insertion and deletion (indel) variant, structural variant (SV), nonsynonymous/nonsense variant, synonymous variant, splicing variant, and regulatory region variant. Licensing information, including whether the VIP is free for academic or commercial use, was also included. In addition, we provided details about accessing VIPs, such as homepage links and source code availability.
In VIPdb version 2, we have made enhancements to inform clinical decision-making. We incorporated calibrated threshold scores recommended by ClinGen for clinical use [17] with ACMG/AMP guidelines for variant classification [15]. Additionally, we included community assessment results from the CAGI 6 Annotate All Missense / Missense Marathon challenge [422] to enable users to compare the overall performance of methods and the performance on subsets with high specificity or high sensitivity.
To understand the trends of genetic VIPs over the past three decades, we conducted a citation analysis. We utilized the Entrez module in Biopython to retrieve citation information from the PubMed database. Specifically, the elink function was employed to collect the number of articles citing each VIP, and the esummary function allowed for the collection of publication years for these citations. These functions facilitated the automatic collection of citation numbers by year for each VIP.
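The citation-collection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used for VIPdb: the function names, the `pubmed_pubmed_citedin` link name, the use of the `PubDate` summary field, and the year-parsing rule are all choices made for this example. For VIPs with very many citing articles, the ID list would likely need to be sent to esummary in batches.

```python
from collections import Counter


def fetch_citing_pubdates(pmid, email):
    """Return PubDate strings of PubMed articles citing `pmid`.

    Requires network access and Biopython; illustrative only.
    """
    from Bio import Entrez  # imported lazily so the helper below works offline

    Entrez.email = email  # NCBI asks callers to identify themselves
    # elink: look up the PubMed IDs of articles that cite the given PMID
    handle = Entrez.elink(dbfrom="pubmed", linkname="pubmed_pubmed_citedin", id=pmid)
    linksets = Entrez.read(handle)
    handle.close()
    citing_ids = [
        link["Id"]
        for linkset_db in linksets[0]["LinkSetDb"]
        for link in linkset_db["Link"]
    ]
    if not citing_ids:
        return []
    # esummary: fetch document summaries, which carry a PubDate field
    handle = Entrez.esummary(db="pubmed", id=",".join(citing_ids))
    summaries = Entrez.read(handle)
    handle.close()
    return [str(summary["PubDate"]) for summary in summaries]


def citations_by_year(pubdates):
    """Count citations per year from strings such as '2019 Jan 8'."""
    years = Counter()
    for date in pubdates:
        token = date.strip()[:4]
        if token.isdigit():  # skip entries without a leading four-digit year
            years[int(token)] += 1
    return dict(sorted(years.items()))
```

Calling `citations_by_year(fetch_citing_pubdates(pmid, email))` for the PubMed ID of each VIP publication would then give a per-year citation table for that VIP.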
In summary, VIPdb version 2 presents a collection of 407 VIPs developed over the past three decades, with their characteristics, citation patterns, publication details, and access information (Supplementary Table S1). VIPdb version 2 is publicly accessible at https://genomeinterpretation.org/vipdb and can be downloaded as a comma-separated values table (Supplementary Table S1).
Word clouds representing core VIPs over a specific time period, using cumulative citations for core VIPs with multiple publications. Font sizes in the word clouds correspond to the logarithm of citation counts for each period, and cloud heights are scaled by the logarithm of the annual citation averages. The top 10 most cited core VIPs during the period are listed. Note: Core VIPs are methods primarily designed for variant impact prediction and are not classified as databases

Results
We incorporated 190 additional VIPs into VIPdb version 2, alongside the existing 217 VIPs in the previous version of VIPdb. The characteristics of the 407 VIPs are listed in Supplementary Table S1. Among the 407 VIPs in VIPdb version 2, 278 are core VIPs, defined as VIPs that are primarily designed for variant impact prediction and are not classified as databases.
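The core-VIP definition amounts to a two-condition filter over the curated table. The sketch below illustrates it on made-up records; the field names `primary` and `database` and the tool names are hypothetical and may not match the columns of Supplementary Table S1.

```python
def is_core(vip):
    """A core VIP is primarily designed for impact prediction and is not a database."""
    return vip["primary"] and not vip["database"]


# Hypothetical records for illustration only (not real VIPdb entries)
vips = [
    {"name": "ToolA", "primary": True, "database": False},   # core
    {"name": "ToolB", "primary": False, "database": False},  # non-primary tool
    {"name": "ToolC", "primary": True, "database": True},    # database
]

core_names = [vip["name"] for vip in vips if is_core(vip)]
non_core_names = [vip["name"] for vip in vips if not is_core(vip)]
```

Applied to the full table, the same filter would be expected to separate the 278 core VIPs from the 129 non-core VIPs reported here.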
An analysis of the variant types handled by VIPs showed a predominant focus on predicting the impacts of single nucleotide variants (SNVs) and nonsynonymous variants (Fig. 1). Since the 2010s, there has been a notable surge in the development of VIPs tailored for insertions and deletions (indels), while VIPs dedicated to predicting the impacts of splicing, structural, synonymous, and regulatory variants have grown more modestly (Fig. 1). These observations highlight the field's current focus and identify variant types that have been less explored, suggesting potential directions for future research.
The citation rate of VIPs continues to rise, while the annual number of VIP publications has reached a plateau (Fig. 2). The increasing citation rates for both the 278 core VIPs and the 129 non-core VIPs reflect the ongoing growth of VIP usage (Fig. 2A). The median total citation count per VIP from 1993 to 2023 is 41, with a 95% quantile of 2559 citations (Fig. 2B). The annual number of VIP publications has stabilized, and some of these are follow-up publications of earlier work (Fig. 2C).
The citation trend of the 278 core VIPs from 1993 to 2023 is shown in Figs. 3 and 4. The citation analysis revealed that SIFT and PolyPhen, among the earliest genome-wide predictors, are the most cited core VIPs (Figs. 3 and 4).
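Summary statistics of this kind (a median and a 95% quantile of total citations per VIP) can be computed with the Python standard library. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the choice of the "inclusive" interpolation method is ours, since the text does not state which quantile convention was used.

```python
import statistics


def summarize_citations(totals):
    """Return (median, 95% quantile) of per-VIP total citation counts."""
    median = statistics.median(totals)
    # n=20 yields cut points at 5%, 10%, ..., 95%; the last one is the 95% quantile
    p95 = statistics.quantiles(totals, n=20, method="inclusive")[-1]
    return median, p95
```

A large gap between the two values, as reported above, indicates a strongly right-skewed distribution in which a small number of VIPs account for most citations.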

Discussion and conclusions
VIPdb version 2 provides a comprehensive view of VIPs. To identify the most appropriate VIPs for their specific needs, users are advised to thoroughly assess the strengths and weaknesses of candidate VIPs before use. For example, initiatives like the Critical Assessment of Genome Interpretation (CAGI) conduct community experiments to assess VIPs across different variant types and contexts [10][11][12].
Beyond adding new methods as they become available, we plan to enhance VIPdb by adding new fields that increase transparency, such as reporting of molecular mechanisms [9]. Additionally, we will incorporate some model information, such as details about the training data, training date, and training method used. New CAGI results and ClinGen calibrations will also be added. We welcome suggestions for additional feature fields to be curated in future updates.
With 407 curated VIPs, VIPdb version 2 provides a comprehensive overview of programs designed for variant impact prediction, along with their characteristics, citation patterns, publication details, and access information. VIPdb version 2 is available on the CAGI website (https://genomeinterpretation.org/vipdb) and is also available as Supplementary Table S1. We invite submissions of new VIPs for the next version of VIPdb.

Fig. 4 Citation trend of the top 15 most cited core VIPs in the year 2023. Note: Core VIPs are methods primarily designed for variant impact prediction and are not classified as databases