Optimal distribution-preserving downsampling of large biomedical data sets (opdisDownsampling)

Table 3

Mean squared errors of PCA and autoencoder-based data reproduction of the remaining data from the sampled data subset.

Samples of 0.001 and 0.01% of the data (1% and 10% for the smaller iris and miRNA data sets) were drawn either once using uniform sampling, or 1,000 times using uniform sampling with different seeds followed by selection of the sample that best matched the original distribution of the variables, as judged by statistical comparisons of probability density functions. The sampled data were projected using either PCA or a single-layer autoencoder, and the projection parameters were then used to reproduce the remaining data of the original data set that had not been sampled. The experiments were performed in 20 replicates, each starting with a different, non-redundant seed; the means and standard deviations of the mean squared errors of the data reproduction across these replicates are shown.
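The PCA arm of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the opdisDownsampling implementation: the function names are hypothetical, a two-sample Kolmogorov-Smirnov statistic stands in for the statistical comparison of probability density functions (the caption does not say which comparison is used), trial seeds are simply taken as `seed + trial`, the data are synthetic, and the autoencoder arm is omitted.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic between 1-D arrays a and b."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))


def draw_sample(data, size, n_trials, seed):
    """Draw n_trials uniform samples and keep the one closest to the full data.

    Closeness is the largest per-variable KS statistic (an assumed stand-in
    for the distribution comparison). n_trials=1 is plain uniform sampling.
    Returns the selected row indices and their score.
    """
    best_idx, best_score = None, np.inf
    for trial in range(n_trials):
        rng = np.random.default_rng(seed + trial)  # assumed seeding scheme
        idx = rng.choice(data.shape[0], size=size, replace=False)
        score = max(
            ks_statistic(data[idx, j], data[:, j]) for j in range(data.shape[1])
        )
        if score < best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


def pca_reproduction_mse(data, idx, n_components):
    """Fit PCA on the sampled rows, then reconstruct the unsampled rows."""
    mask = np.zeros(data.shape[0], dtype=bool)
    mask[idx] = True
    sample, rest = data[mask], data[~mask]
    mean = sample.mean(axis=0)
    # Rows of vt are the principal axes of the centered sample.
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    components = vt[:n_components]
    # Project the unsampled rows onto the axes and map them back.
    reconstructed = (rest - mean) @ components.T @ components + mean
    return float(np.mean((rest - reconstructed) ** 2))


# Synthetic stand-in data: 2,000 rows, 5 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 2))
data = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(2000, 5))

single_idx, _ = draw_sample(data, size=20, n_trials=1, seed=1)
best_idx, _ = draw_sample(data, size=20, n_trials=100, seed=1)
print("single uniform sample:", pca_reproduction_mse(data, single_idx, 2))
print("best of 100 samples:  ", pca_reproduction_mse(data, best_idx, 2))
```

Repeating the last four lines with 20 different starting seeds and reporting the mean and standard deviation of the resulting errors would mirror the replicate design described in the caption.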

doi: https://doi.org/10.1371/journal.pone.0255838.t003