Effects of Sample Size on Differential Gene Expression, Rank Order and Prediction Accuracy of a Gene Signature
Figure 2
Box-and-whiskers plot showing the mean internal cross-validation accuracy of sex prediction for different sample sizes.
Sample sizes tested ranged from n = 10 (5♀, 5♂) to n = 110 (55♀, 55♂). To calculate the mean 10-fold cross-validation prediction accuracy for each n (= 10…110), we built classification models using randomly selected size-n subsamples of our full dataset of 134 samples. This was repeated 50 times and the median prediction accuracy for each sample was calculated. As sample size increased, so did prediction accuracy.
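The resampling procedure described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the caption does not name the classifier, so a nearest-centroid rule is swapped in as a stand-in, and the expression data are replaced by synthetic Gaussian features. The function names, the 67/67 class split of the 134 samples, the number of features and the step between sample sizes are all assumptions made for illustration.

```python
import random
from statistics import median


def make_synthetic_dataset(n_per_class=67, n_genes=20, shift=1.0, seed=0):
    """Hypothetical stand-in for the expression data (assumed 67 + 67 = 134 samples)."""
    rng = random.Random(seed)
    return [([rng.gauss(shift * label, 1.0) for _ in range(n_genes)], label)
            for label in (0, 1) for _ in range(n_per_class)]


def fit_centroids(train):
    """Per-class feature means; assumes both classes are present in `train`."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=sq_dist)


def cv_accuracy(sample, k, rng):
    """Mean accuracy over k cross-validation folds; requires len(sample) >= k."""
    shuffled = sample[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    fold_acc = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        centroids = fit_centroids(train)
        hits = sum(predict(centroids, x) == y for x, y in test)
        fold_acc.append(hits / len(test))
    return sum(fold_acc) / k


def accuracy_by_sample_size(data, sizes, repeats=50, k=10, seed=0):
    """For each n, draw `repeats` sex-balanced subsamples and record the CV accuracy of each."""
    rng = random.Random(seed)
    by_class = {label: [row for row in data if row[1] == label] for label in (0, 1)}
    results = {}
    for n in sizes:
        accs = []
        for _ in range(repeats):
            # Balanced draw without replacement: n/2 samples from each class.
            sample = rng.sample(by_class[0], n // 2) + rng.sample(by_class[1], n // 2)
            accs.append(cv_accuracy(sample, k, rng))
        results[n] = accs
    return results


if __name__ == "__main__":
    data = make_synthetic_dataset()
    # The step of 20 between sample sizes is an assumption; the caption gives only the range.
    results = accuracy_by_sample_size(data, sizes=range(10, 111, 20))
    for n, accs in results.items():
        print(n, round(median(accs), 3))
```

Each list in `results` holds the 50 per-subsample accuracies for one sample size, which is the kind of distribution a box-and-whiskers plot would summarise. At n = 10 every fold contains a single test sample, so individual fold accuracies are either 0 or 1 and the spread across repeats is expected to be wide.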