Widespread Traces of Lytic Kaposi Sarcoma-Associated Herpesvirus in Primary Effusion Lymphoma at Single-Cell Resolution

Cancer cells of primary effusion lymphoma (PEL) often contain both Kaposi sarcoma-associated herpesvirus (KSHV) and Epstein-Barr virus (EBV). We measured the interplay of human, KSHV, and EBV transcription in a cell culture model of PEL using single-cell RNA sequencing. The data detect widespread trace expression of lytic KSHV genes.

Immortalized cell lines derived from primary effusion lymphoma (PEL) positive for both Kaposi sarcoma-associated herpesvirus (KSHV) and Epstein-Barr virus (EBV) serve as tractable systems for studying transcriptional host-pathogen interactions. We grew BC-1 cells (1) in culture (2) and processed log-phase samples for single-cell RNA deep sequencing (scRNA-seq) with Gel beads in EMulsion microfluidic (GEM) technology (3) using a Chromium Single Cell 3′ Library & Gel Bead Kit v2 (10X Genomics, Pleasanton, CA). The experiment was performed twice, as two independent biological replicates, each of which yielded ~400 to 700 million paired-end reads of 28 to 101 nucleotides.
We built a bioinformatics pipeline to simultaneously measure human and viral transcription. Reads were demultiplexed and aligned using Cell Ranger (10X Genomics). A custom genome reference was created by appending the viral genomes of EBV (4) (GenBank accession number NC_007605.1) and KSHV (5) (GenBank accession number U75698.1) to the human genome GRCh38. Reads aligning to viral genomes were assigned to features using a heuristic algorithm implemented in Python (https://github.com/jjmirandalab/scrnaseq). Feature definitions were obtained from a published EBV annotation (6) (https://github.com/flemingtonlab/public/blob/master/annotation/chrEBV_B95_8_Raji.ann) and downloaded from the KSHV reference (5) (GenBank accession number U75698.1). First, each virus-mapping read was assigned the union of all RNA genome features overlapped by its alignment. Second, each molecule was assigned the intersection of the feature assignments of all reads sharing its unique molecular identifier (UMI). Any molecule that mapped to multiple genomes was excluded. Finally, for each unique set of features assigned to a molecule, counts were recorded for each cell. Cells were filtered using R scripts (https://github.com/jjmirandalab/scrnaseq). Distributions of cell quality metrics were modeled as log-normal for the total UMI count, log-normal for the proportion of expression from mitochondrial genes, and normal for the number of genes detected. Cells with extreme metric values were then identified according to the fitted distributions. Data were normalized between samples with sctransform (7), with "Sample" supplied as the "batch_var" parameter.
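The union-then-intersection feature assignment described above can be illustrated with a minimal Python sketch. This is not the code from the repository cited in the text. The function names, the tuple layout of reads and features, the half-open coordinate convention, the omission of strand, and the decision to drop molecules whose intersection is empty are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def overlapping_features(start, end, features):
    """Union of feature names whose interval overlaps the alignment.

    Assumption: coordinates are half-open [start, end) and strand is ignored.
    `features` is a list of (name, feature_start, feature_end) tuples.
    """
    return frozenset(
        name for name, f_start, f_end in features
        if start < f_end and f_start < end
    )


def count_feature_sets(reads, features_by_genome):
    """Count molecules per cell for each unique set of features.

    `reads` is an iterable of (cell_barcode, umi, genome, start, end) tuples
    (a hypothetical layout, not the Cell Ranger output format).
    """
    # Step 1: assign each read the union of features it overlaps,
    # grouped by molecule (cell barcode + UMI).
    per_molecule = defaultdict(list)
    for cell, umi, genome, start, end in reads:
        feats = overlapping_features(
            start, end, features_by_genome.get(genome, [])
        )
        per_molecule[(cell, umi)].append((genome, feats))

    counts = defaultdict(Counter)
    for (cell, _umi), assignments in per_molecule.items():
        # Exclude molecules whose reads map to more than one genome.
        if len({genome for genome, _ in assignments}) > 1:
            continue
        # Step 2: intersect the feature sets of all reads in the molecule.
        feature_set = frozenset.intersection(*(f for _, f in assignments))
        # Assumption: molecules with no shared feature are dropped.
        if not feature_set:
            continue
        # Step 3: record one count per molecule for this feature set.
        counts[cell][feature_set] += 1
    return counts
```

In this sketch, a molecule whose reads overlap {LANA, K9} and {K9} is counted once under {K9}, whereas a molecule whose only read overlaps both genes is counted under the set {LANA, K9}. Keeping such sets as countable units, rather than discarding them, is what allows overlapping transcripts to contribute to the totals.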
Earlier RNA-seq methods only measured expression of exon regions that did not overlap another transcript (8,9); our approach discards less data. Approximately 20% of the feature sets we count correspond to potentially overlapping transcripts that would have previously been ignored. Comprehensive profiling instills more confidence in our interpretations.
Preliminary analysis of our scRNA-seq data reveals an unexpected plethora of lytic KSHV transcripts. Herpesviruses switch between a transcriptionally quiescent latent state and a reactivated lytic state. Genes were classified as latent or lytic based on expression timing (10,11). For EBV, we detected only latent genes. For KSHV, we detected both latent and lytic genes (Table 1). Transcription of the lytic open reading frames 16, K9, K4, 54, 45, and 75, as well as the lytic polyadenylated nuclear RNA PAN gene, can each be found in ~5 to 50% of cells. While we usually detected more than 1 count in each cell for the latent genes LANA and vIL6, lytic genes appear with only 1 count most of the time. Greater than 50% of cells contain these trace levels of lytic RNA. In contrast, ~98% of BC-1 cells are thought to harbor latent KSHV because lytic protein is not detected (12). Although more detailed analysis is required, our observations prompt a thoughtful redefinition of a "latent" transcriptome.
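The two quantities discussed above, the fraction of cells in which a gene is detected and the fraction of those cells that carry only a single count, can be computed from per-cell UMI counts as in the following sketch. The function name and the example counts are hypothetical and are not taken from the deposited data.

```python
def detection_summary(counts_per_cell):
    """Summarize detection of one gene across cells.

    `counts_per_cell` holds one UMI count per cell for a single gene.
    Returns (fraction of cells with at least 1 count,
             fraction of detecting cells with exactly 1 count).
    """
    n_cells = len(counts_per_cell)
    detected = [c for c in counts_per_cell if c > 0]
    frac_detected = len(detected) / n_cells if n_cells else 0.0
    frac_single = (
        sum(1 for c in detected if c == 1) / len(detected) if detected else 0.0
    )
    return frac_detected, frac_single


# Hypothetical counts for one lytic gene in ten cells.
example_counts = [0, 1, 0, 1, 2, 0, 0, 1, 0, 0]
frac_detected, frac_single = detection_summary(example_counts)
```

For these made-up counts, the gene is detected in 4 of 10 cells, and 3 of those 4 cells carry exactly 1 count.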
Data availability. Our data have been deposited in the NCBI Gene Expression Omnibus (13,14) under accession number GSE154900.

ACKNOWLEDGMENTS
We are grateful to Ethel Cesarman (Weill Cornell Medicine) for the BC-1 cell line and Erin C. Bush (Columbia University Irving Medical Center) for contributing to the scRNA-seq experiment.
Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under award number R15AI145652 to J.L.M. This research was funded in part through the NIH/NCI Cancer

Table 1 footnotes:
a Only transcripts present in at least 1% of cells in at least one replicate are shown.
b The union of transcripts could be assigned to multiple genes, but all components are latent.
c The union of transcripts contains one component of unknown expression timing.
d The expression timing of this transcript is not known.