Systematic benchmarking of omics computational tools

Computational omics methods packaged as software have become essential to modern biological research. The increasing dependence of scientists on these powerful software tools creates a need for systematic assessment of these methods, known as benchmarking. Adopting a standardized benchmarking practice could help researchers who use omics data to better leverage recent technological innovations. Our review summarizes benchmarking practices from 25 recent studies and discusses the challenges, advantages, and limitations of benchmarking across various domains of biology. We also propose principles that can make computational biology benchmarking studies more sustainable and reproducible, ultimately increasing the transparency of biomedical data and results.

read alignments. In fact, RNA biology is an area of study where one can realistically argue that simulated data is the only alternative available when preparing a gold standard 51.
The second step of differential expression analysis is the transcriptome quantification problem: to identify the gene and isoform from which each read originated, and to use those reads to quantify the expression levels of genes and RNA isoforms. True expression levels of isoforms and genes are impossible to measure even in a simple bacterial organism, where RNA transcripts are not subject to alternative splicing. Human RNA transcripts undergo alternative splicing, which presents an even more substantial challenge to obtaining a gold standard. The lack of a gold standard for gene and isoform expression levels forces the biomedical community to adopt alternative technologies to approximate one. Measurements of gene and isoform expression levels obtained by an alternative technology should not be considered a true set, as they have their own inherent biases and limitations. For example, qPCR, widely considered the gold standard for gene expression profiling, has been shown to exhibit strong deviations of ~5-10% across various targets 17.
The third step of differential expression analysis is the expression normalization problem: to remove the biases and the variance introduced by experimental issues, while preserving the true biological variation. Currently, we lack experimental techniques capable of estimating true biological variation and differentiating it from technical noise. Current RNA-seq analysis methods typically standardize data between samples by scaling the number of reads in a given library to a common value across all sequenced libraries, which is an oversimplification for many biological applications 64. The lack of a gold standard prevents the biomedical research community from assessing the performance of the tools that measure biological and technical variance 65.
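The library-size scaling described above can be illustrated with a minimal sketch. The function name `scale_to_common_depth`, the dictionary layout, and the choice of the mean library size as the default common value are assumptions made for illustration only; they are not taken from any specific RNA-seq tool, and, as noted above, this kind of scaling is an oversimplification for many applications.

```python
from typing import Dict, List, Optional


def scale_to_common_depth(
    counts: Dict[str, List[float]],
    target: Optional[float] = None,
) -> Dict[str, List[float]]:
    """Scale each library so that its total read count equals a common value.

    counts: hypothetical layout mapping a sample name to per-gene read counts.
    target: the common library size; defaults to the mean library size
            (an assumption made for this sketch).
    """
    totals = {sample: sum(values) for sample, values in counts.items()}
    if not totals:
        raise ValueError("no libraries supplied")
    if any(total == 0 for total in totals.values()):
        raise ValueError("cannot scale a library with zero reads")
    if target is None:
        target = sum(totals.values()) / len(totals)
    # Each count is multiplied by target / library_total.
    return {
        sample: [value * target / totals[sample] for value in values]
        for sample, values in counts.items()
    }


# Two toy libraries with identical composition but different depth.
example = {"sample_a": [10, 30, 60], "sample_b": [20, 60, 120]}
scaled = scale_to_common_depth(example)
```

After scaling, both toy libraries have the same total, so the remaining per-gene differences reflect composition and not sequencing depth. The sketch does not attempt to separate biological from technical variation.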
The final step of differential expression analysis is to determine differentially expressed (DE) genes. This problem involves running a large number of hypothesis tests in parallel, one for each gene or isoform. To properly benchmark this problem, one needs to vary multiple parameters, including the number of replicates, the number of DE genes, and the effect sizes. Nonetheless, an accurate gold standard cannot be obtained by current experimental procedures. The complexity of the differential expression analysis problem prevents a benchmarking study from achieving the comprehensiveness needed to evaluate all steps of RNA-Seq analysis at once. Instead, benchmarking studies evaluate each step of the problem separately 45.
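As a rough illustration of how the parameters named above might be varied in a simulation-based benchmark, the sketch below enumerates a parameter grid, draws a simulated set of truly DE genes, and scores a tool's calls against that set. All function names, parameter values, and the choice of sensitivity and false discovery proportion as metrics are assumptions made for this example; they are not prescribed by the studies reviewed here.

```python
import itertools
import random
from typing import Dict, List, Set


def parameter_grid(
    replicates: List[int], n_de_genes: List[int], effect_sizes: List[float]
) -> List[Dict[str, float]]:
    """Enumerate every combination of the benchmark parameters."""
    return [
        {"replicates": r, "n_de": d, "effect_size": e}
        for r, d, e in itertools.product(replicates, n_de_genes, effect_sizes)
    ]


def simulate_truth(n_genes: int, n_de: int, seed: int = 0) -> Set[int]:
    """Pick which gene indices are truly DE in the simulated data set."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_genes), n_de))


def score_calls(called: Set[int], truth: Set[int]) -> Dict[str, float]:
    """Compare the genes a tool called DE against the simulated truth."""
    tp = len(called & truth)   # true positives
    fp = len(called - truth)   # false positives
    fn = len(truth - called)   # false negatives
    sensitivity = tp / (tp + fn) if truth else 0.0
    false_discovery = fp / (tp + fp) if called else 0.0
    return {"sensitivity": sensitivity, "false_discovery": false_discovery}


# Hypothetical grid: 2 replicate settings x 2 DE-gene counts x 3 effect sizes.
grid = parameter_grid([3, 5], [100, 500], [1.5, 2.0, 4.0])
truth = simulate_truth(n_genes=1000, n_de=100, seed=1)
```

In a full benchmark, each grid entry would drive one simulated data set, each tool would be run on it, and `score_calls` would be applied to the tool's output. The sketch omits the read simulation itself.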
The lack of an accurate gold standard imposes a significant limitation on benchmarking studies. Researchers planning to perform a benchmarking study face a dilemma: on the one hand, they do not have access to experimental techniques that generate an accurate gold standard; on the other hand, it is known that simulated data cannot capture the extreme complexity of the problem. One compromise is to enhance the simulated data with real data, or to adjust the real data to the needs of the benchmarking study using computational techniques.

Supplementary Note 2: An example of a log file for installing and running a software tool
The log file lists any necessary dependencies and documents the process of installing the software tool and its dependencies. It should include any errors that occurred while installing dependencies and the commands used to overcome these installation problems.
The log file documents the types of files needed as input to the tool and the format of the output file. This is a possible structure of the log file:
· Input
· Output
· Dependencies
· Commands used to install the tool
· Commands used to run the tool
· Reason the tool is impossible to install. This should include the exact error message and document the steps (if any) that were performed to resolve the problem. If the software developers were contacted, their suggestions should be listed here.
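The log file structure described above could be captured in a small record type so that every benchmarked tool is documented with the same fields. The sketch below is one possible layout; the class name `ToolLog`, its field names, the tool name `example-aligner`, and all commands and messages in the usage example are hypothetical placeholders, not a format defined by this note.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolLog:
    """One log entry per benchmarked tool (hypothetical field names)."""

    tool: str
    input_format: str
    output_format: str
    dependencies: List[str] = field(default_factory=list)
    install_commands: List[str] = field(default_factory=list)
    run_commands: List[str] = field(default_factory=list)
    # Exact error message, steps tried, and developer suggestions, if any.
    install_failure: Optional[str] = None

    def render(self) -> str:
        """Return the log as plain text, one section per field."""

        def section(title: str, items: List[str]) -> List[str]:
            lines = [title + ":"]
            lines += ["  " + item for item in items] or ["  (none)"]
            return lines

        out = ["Tool: " + self.tool]
        out.append("Input: " + self.input_format)
        out.append("Output: " + self.output_format)
        out += section("Dependencies", self.dependencies)
        out += section("Commands used to install the tool", self.install_commands)
        out += section("Commands used to run the tool", self.run_commands)
        if self.install_failure is not None:
            out.append("Reason the tool is impossible to install: " + self.install_failure)
        return "\n".join(out)


# Usage with placeholder values only.
log = ToolLog(
    tool="example-aligner",
    input_format="FASTQ",
    output_format="SAM",
    dependencies=["zlib"],
    install_commands=["make"],
    run_commands=["example-aligner reads.fastq > out.sam"],
)
print(log.render())
```

Keeping the failure reason as an optional field means that tools that installed cleanly and tools that could not be installed share one log format.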