Effects of Sample Size on Differential Gene Expression, Rank Order and Prediction Accuracy of a Gene Signature

doi:10.1371/journal.pone.0065380

Figure 2

Box-and-whiskers plot showing the mean internal cross-validation accuracy of sex prediction for different sample sizes.

Sample sizes tested ranged from n = 10 (5♀, 5♂) to n = 110 (55♀, 55♂). To calculate the mean 10-fold cross-validation prediction accuracy for each n (= 10…110), we built classification models using a randomly selected size-n subsample of our full dataset of 134 samples. This was repeated 50 times and the median prediction accuracy for each sample was calculated. As sample size increased, so did prediction accuracy.
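The resampling scheme described in this caption can be sketched roughly as follows. This is a minimal illustration only, not the authors' pipeline: the data are synthetic, the nearest-centroid classifier is a stand-in (the caption does not name the classifier used), and the function names, the number of "genes", the effect size, and the step of 20 between sample sizes are all assumptions made for the example.

```python
import random
import statistics


def make_toy_data(n_per_class=67, n_genes=20, shift=1.0, seed=0):
    """Synthetic stand-in for expression data: 2 classes x 67 = 134 samples (assumed split)."""
    rng = random.Random(seed)
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            # Only the first 3 hypothetical "genes" differ between the classes.
            x = [rng.gauss(shift * label if g < 3 else 0.0, 1.0) for g in range(n_genes)]
            data.append((x, label))
    return data


def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def cv_accuracy(sample, k, rng):
    """k-fold cross-validation accuracy of the stand-in classifier on one subsample."""
    sample = sample[:]
    rng.shuffle(sample)
    folds = [sample[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        centroids = {lab: centroid([x for x, y in train if y == lab]) for lab in (0, 1)}
        correct += sum(predict(centroids, x) == y for x, y in test)
    return correct / len(sample)


def subsample_curve(data, sizes, repeats=50, k=10, seed=1):
    """For each n, draw `repeats` balanced size-n subsamples and record CV accuracy."""
    rng = random.Random(seed)
    by_class = {lab: [s for s in data if s[1] == lab] for lab in (0, 1)}
    curve = {}
    for n in sizes:
        accs = []
        for _ in range(repeats):
            # Balanced draw: n/2 samples from each class, without replacement.
            sample = rng.sample(by_class[0], n // 2) + rng.sample(by_class[1], n // 2)
            accs.append(cv_accuracy(sample, k, rng))
        curve[n] = accs
    return curve


data = make_toy_data()
curve = subsample_curve(data, sizes=range(10, 111, 20))  # step of 20 is an assumption
for n, accs in curve.items():
    print(n, round(statistics.mean(accs), 3), round(statistics.median(accs), 3))
```

Each entry of `curve` holds the 50 accuracies that one box in a box-and-whisker plot would summarize; the script prints both the mean and the median per sample size, since the caption mentions both.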

doi: https://doi.org/10.1371/journal.pone.0065380.g002