
Phylogenetic Profiling: How Much Input Data Is Enough?

Figure 1

Predictive accuracy of phylogenetic profiling, measured as the Area Under the Precision-Recall Curve (AUPRC), as the amount of data available for training the functional-annotation model changes.

For each year from 2005 to 2013 on the x-axis, the corresponding dataset includes the genomes that were available in both the OMA database and the NCBI taxonomy database in that year; the phylogenetic profiles are annotated using the UniProt-GOA file available in January of the same year. Each violin plot summarizes the distribution of GO terms by AUPRC value: the area of the plot corresponds to the probability density of GO terms at different AUPRC values, and the black dot denotes the mean AUPRC. A) We consider 1093 GO terms in total, namely those that had sufficient annotation information in the most recent database releases. If the model does not have enough data to infer annotations for one of these 1093 GO terms, its AUPRC score is set to zero; with the data from 2005, for example, this is the case for 846 of them. B) We consider only the GO terms that had sufficient annotation information throughout all analysed releases. C) We consider all the GO terms from the prokaryotic GO set.
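The caption does not say exactly how AUPRC is computed, so the sketch below is only an illustration. It uses step-wise average precision as a stand-in for the area under the precision-recall curve and applies the panel A convention: every GO term in the fixed evaluation set receives a score, and terms the model could not be trained for count as zero. The function names and the `term_A`/`term_B` identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Ranks proteins by descending predicted score and averages the precision
    observed at the rank of each true positive. Ties keep input order
    (sorted() is stable), which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined without positive examples")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_pos


def per_term_auprc(
    predictions: Dict[str, Tuple[Sequence[float], Sequence[int]]],
    go_terms: List[str],
) -> Dict[str, float]:
    """Score every term in the evaluation set.

    Terms missing from `predictions` (no model could be trained for them)
    get AUPRC 0.0, mirroring the convention described for panel A.
    """
    return {
        term: (average_precision(*predictions[term]) if term in predictions else 0.0)
        for term in go_terms
    }


def mean_auprc(per_term: Dict[str, float]) -> float:
    """Mean AUPRC over all terms (the black dot in each violin plot)."""
    return sum(per_term.values()) / len(per_term)


if __name__ == "__main__":
    # Hypothetical toy data: term_A has predictions, term_B had too little data.
    preds = {"term_A": ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])}
    scores = per_term_auprc(preds, ["term_A", "term_B"])
    print(scores)
    print(mean_auprc(scores))
```

Because unscorable terms are included as zeros, the mean in this setting is pulled down in early years, when many terms (846 of 1093 in 2005) lack enough data; restricting the set to consistently annotated terms, as in panel B, avoids that effect.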


doi: https://doi.org/10.1371/journal.pone.0114701.g001