Exploring the Optimization of Autoencoder Design for Imputing Single-Cell RNA Sequencing Data

Nan Miles Xi; Jingyi Jessica Li

doi:10.1101/2023.02.16.528866

Abstract

Autoencoders are the backbone of many imputation methods that aim to alleviate the sparsity of single-cell RNA sequencing (scRNA-seq) data. The imputation performance of an autoencoder depends on both its neural network architecture and its hyperparameter choices. So far, the single-cell literature lacks a formal discussion of how to design the neural network and choose the hyperparameters. Here, we conducted an empirical study to fill this gap. Using many real and simulated scRNA-seq datasets, we examined how the neural network architecture, the activation function, and the regularization strategy affect imputation accuracy and downstream analyses. Our results show that (i) deeper and narrower autoencoders generally lead to better imputation performance; (ii) the sigmoid and tanh activation functions consistently outperform other commonly used functions, including ReLU; and (iii) regularization improves the accuracy of imputation and of downstream cell clustering and differential expression (DE) gene analyses. Notably, our findings on the activation function and the regularization strategy differ from common practices in the computer vision field. Overall, our study offers practical guidance on optimizing autoencoder design for scRNA-seq data imputation.
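The abstract names three design factors: the architecture (depth and width), the activation function, and the regularization strategy. As a rough illustration of where these factors sit in an autoencoder, the NumPy sketch below trains a small fully connected autoencoder on a log-transformed cells × genes matrix and uses its reconstruction to fill zero entries. It is not the authors' implementation. The masked loss (fitting only nonzero entries), L2 weight decay as the regularizer, plain full-batch gradient descent, the layer widths, and all function names (`init_autoencoder`, `forward`, `loss_and_grads`, `train`, `impute`) are hypothetical choices made for this example.

```python
import numpy as np

# Activation functions and their derivatives, written in terms of the
# activation output a (hypothetical helper table for this sketch).
ACTIVATIONS = {
    "sigmoid": (lambda z: 1.0 / (1.0 + np.exp(-z)), lambda a: a * (1.0 - a)),
    "tanh": (np.tanh, lambda a: 1.0 - a ** 2),
    "relu": (lambda z: np.maximum(z, 0.0), lambda a: (a > 0).astype(float)),
}

def init_autoencoder(n_genes, hidden, seed=0):
    """hidden lists the hidden-layer widths, e.g. [32, 16, 32] (assumed)."""
    rng = np.random.default_rng(seed)
    sizes = [n_genes] + list(hidden) + [n_genes]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))
        params.append([W, np.zeros(fan_out)])
    return params

def forward(params, X, act):
    """Return the input plus every layer's output; the last layer is linear."""
    f, _ = ACTIVATIONS[act]
    outs = [X]
    for i, (W, b) in enumerate(params):
        Z = outs[-1] @ W + b
        outs.append(Z if i == len(params) - 1 else f(Z))
    return outs

def loss_and_grads(params, X, act, l2):
    """Masked mean squared error on nonzero entries plus an L2 weight penalty."""
    _, df = ACTIVATIONS[act]
    outs = forward(params, X, act)
    mask = (X > 0).astype(float)  # assumption: zeros are treated as missing
    n_obs = max(mask.sum(), 1.0)
    resid = (outs[-1] - X) * mask
    loss = (resid ** 2).sum() / n_obs + l2 * sum((W ** 2).sum() for W, _ in params)
    grads = [None] * len(params)
    delta = 2.0 * resid / n_obs  # gradient w.r.t. the linear output layer
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads[i] = (outs[i].T @ delta + 2.0 * l2 * W, delta.sum(axis=0))
        if i > 0:
            # Backpropagate through the hidden activation of layer i - 1.
            delta = (delta @ W.T) * df(outs[i])
    return loss, grads

def train(X, hidden, act="tanh", l2=1e-4, lr=0.05, epochs=200, seed=0):
    params = init_autoencoder(X.shape[1], hidden, seed)
    history = []
    for _ in range(epochs):
        loss, grads = loss_and_grads(params, X, act, l2)
        history.append(loss)
        for p, (gW, gb) in zip(params, grads):
            p[0] -= lr * gW  # plain gradient descent step
            p[1] -= lr * gb
    return params, history

def impute(params, X, act):
    """Keep observed (nonzero) values; fill zeros with the reconstruction."""
    recon = forward(params, X, act)[-1]
    return np.where(X > 0, X, recon)

# Toy usage on random Poisson counts (synthetic, not scRNA-seq data).
rng = np.random.default_rng(1)
X = np.log1p(rng.poisson(1.0, size=(200, 50)).astype(float))
params, history = train(X, hidden=[32, 16, 32], act="tanh", l2=1e-4)
X_imp = impute(params, X, "tanh")
```

Changing the `hidden` list, switching `act` among `"sigmoid"`, `"tanh"`, and `"relu"`, or setting `l2=0.0` varies the three design factors the abstract discusses. The random toy matrix says nothing about which setting imputes better; that question is what the study addresses with real and simulated data.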

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

I. We have added a subsection, “Sensitivity analysis with varying numbers of highly variable genes,” to the Methods section. This subsection shows that our major findings remain consistent across different numbers of highly variable genes.
II. We have added a new paragraph to the Discussion section addressing the variation in imputation performance across datasets with different characteristics.
III. We have added another new paragraph to the Discussion section discussing the tradeoff between imputation performance and computational time in autoencoder design.
IV. We have changed the title of our manuscript to “Exploring the Optimization of Autoencoder Design for Imputing Single-Cell RNA Sequencing Data.” The revised title better aligns with the objectives and findings of our study.
V. We have thoroughly revised the writing of our manuscript.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.