
The influence of dataset homology and a rigorous evaluation strategy on protein secondary structure prediction

Fig 9

Advanced feasibility tests for the proposed development and evaluation strategy for SSP methods.

In these two homology-reduction test groups, the test layout of Fig 7A was used with the seven advanced SSP methods developed before 2016 (4 repeats). In each test, the PSSM reference dataset was fixed at 5 million proteins by random sampling, the dataset size at which SSP accuracy saturates according to [50]. TS115 and CASP12 consist of novel protein structures determined after January 2016; hence, they had few homologs in either UniRef90-2015 or UniRef30-2015. As more sequences were deposited in UniRef over time, these query sets accumulated homologs, and their overall homology with UniRef-2017 and UniRef-2019 gradually increased. When the reference dataset was adequately homology-reduced against the query sets (e.g., the NR30 group), the accuracy remained stable regardless of which year's UniRef release was used. Conversely, when homology reduction was insufficient, the observed accuracy was unreliable, even if the query set was advertised as an “independent dataset”.
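The two preparation steps described above, homology-reducing the reference dataset against the query sets and then fixing its size by random sampling, can be illustrated with the minimal Python sketch below. It is not the authors' pipeline. Several details are assumptions: that "NR30" corresponds to a 30% sequence-identity cutoff, that identity is taken from a simple Needleman-Wunsch global alignment normalized by the shorter sequence, and that reduction happens before sampling. The names `global_identity`, `homology_reduce`, and `sample_reference` are hypothetical. Production pipelines typically rely on dedicated clustering or search tools such as MMseqs2 or CD-HIT, because the all-against-all alignment shown here does not scale to millions of proteins.

```python
import random


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Fraction of identical aligned residues from a Needleman-Wunsch global alignment.

    Assumption: identity is normalized by the shorter sequence length.
    """
    n, m = len(a), len(b)
    if min(n, m) == 0:
        return 0.0
    # Fill the dynamic-programming score matrix with linear gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal alignment and count identical aligned pairs.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            if a[i - 1] == b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return identical / min(n, m)


def homology_reduce(reference: list[str], queries: list[str], threshold: float = 0.30) -> list[str]:
    """Keep only reference sequences whose identity to every query is below the threshold."""
    return [r for r in reference if all(global_identity(r, q) < threshold for q in queries)]


def sample_reference(reference: list[str], size: int, seed: int = 0) -> list[str]:
    """Fix the reference dataset size by seeded random sampling without replacement."""
    if size >= len(reference):
        return list(reference)
    return random.Random(seed).sample(reference, size)


if __name__ == "__main__":
    queries = ["MKTAYIAKQR"]
    reference = ["MKTAYIAKQR", "MKTAYIAKQL", "GGGGGGGGGG", "PWPWPWPWPW"]
    reduced = homology_reduce(reference, queries)
    print(reduced)  # the two near-identical sequences are removed
    print(sample_reference(reduced, 1, seed=42))
```

Fixing the seed keeps the sampled reference set identical across repeated runs, so that accuracy differences between UniRef releases reflect homology rather than sampling noise.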


doi: https://doi.org/10.1371/journal.pone.0254555.g009