Abstract
A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, a question with important implications for biological discovery and clinical application. To address this challenge, we present GAIN-GTEx, a method for gene expression imputation based on Generative Adversarial Imputation Networks. To increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We compare our model to several standard and state-of-the-art imputation methods and show that GAIN-GTEx significantly outperforms them in both predictive performance and runtime. Furthermore, our results indicate strong generalisation on RNA-Seq data from three cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Updated ORCID