Abstract
Non-negative matrix factorization (NMF) is an unsupervised learning method well suited to high-throughput biology. However, the features learned by existing NMF software packages require additional post hoc statistics and annotation before they can be interpreted as biological processes. Here, we introduce a suite of computational tools that implement NMF and provide methods for accurate, clear biological interpretation and analysis. A general discussion of NMF, covering its benefits, limitations, and open questions in the field, is followed by three vignettes for the Bayesian NMF algorithm CoGAPS (Coordinated Gene Activity across Pattern Subsets). Each vignette demonstrates NMF analysis to quantify cell state transitions in public-domain single-cell RNA-sequencing (scRNA-seq) data of malignant epithelial cells from 25 pancreatic ductal adenocarcinoma (PDAC) tumors and 11 control samples. The first vignette uses PyCoGAPS, our new Python interface for CoGAPS, which we developed to improve the runtime of Bayesian NMF on large datasets. The second steps through the same analysis using our R CoGAPS interface, and the third introduces two new cloud-based, plug-and-play options for running CoGAPS using GenePattern Notebook and Docker. By providing Python support, cloud-based computing options, and relevant example workflows, we facilitate user-friendly interpretation and implementation of NMF for single-cell analyses.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
jjohn450{at}jhmi.edu
atsang5{at}jhu.edu
jmitch81{at}jhmi.edu
edavis71{at}jhu.edu
tomsherman159{at}gmail.com
jliefeld{at}cloud.ucsd.edu
mloth1{at}jhmi.edu, Melanie.Loth{at}Pennmedicine.upenn.edu
loyalgoff{at}jhmi.edu
jzimme27{at}jhmi.edu
bkinnyk1{at}jhmi.edu
ejaffee{at}jhmi.edu
ptamayo{at}ucsd.edu
jmesirov{at}health.ucsd.edu, https://mesirovlab.org/
mmreich{at}cloud.ucsd.edu
ejfertig{at}jhmi.edu, https://fertiglab.com/
gsteinobrien{at}jhmi.edu