Functional characterization of RebL1 highlights the evolutionary conservation of oncogenic activities of the RBBP4/7 orthologue in Tetrahymena thermophila

Abstract Retinoblastoma-binding proteins 4 and 7 (RBBP4 and RBBP7) are two highly homologous human histone chaperones. They function in epigenetic regulation as subunits of multiple chromatin-related complexes and have been implicated in numerous cancers. Due to their overlapping functions, our understanding of RBBP4 and 7, particularly outside of Opisthokonts, has remained limited. Here, we report that in the ciliate protozoan Tetrahymena thermophila a single orthologue of human RBBP4 and 7 proteins, RebL1, physically interacts with histone H4 and functions in multiple epigenetic regulatory pathways. Functional proteomics identified conserved functional links for Tetrahymena RebL1 protein as well as human RBBP4 and 7. We found that putative subunits of multiple chromatin-related complexes including CAF1, Hat1, Rpd3, and MuvB, co-purified with RebL1 during Tetrahymena growth and conjugation. Iterative proteomics analyses revealed that the cell cycle regulatory MuvB-complex in Tetrahymena is composed of at least five subunits including evolutionarily conserved Lin54, Lin9 and RebL1 proteins. Genome-wide analyses indicated that RebL1 and Lin54 (Anqa1) bind within genic and intergenic regions. Moreover, Anqa1 targets primarily promoter regions suggesting a role for Tetrahymena MuvB in transcription regulation. RebL1 depletion inhibited cellular growth and reduced the expression levels of Anqa1 and Lin9. Consistent with observations in glioblastoma tumors, RebL1 depletion suppressed DNA repair protein Rad51 in Tetrahymena, thus underscoring the evolutionarily conserved functions of RBBP4/7 proteins. Our results suggest the essentiality of RebL1 functions in multiple epigenetic regulatory complexes in which it impacts transcription regulation and cellular viability.


INTRODUCTION
Eukaryotic chromatin provides a means to compact the genome as well as contributing to regulating DNAmediated processes, such as replication, recombination, repair, and transcription (1). Histone chaperones and chromatin remodelers, as well as histone posttranslational modifications (PTMs), play pivotal roles in chromatin-related processes (2). For example, multiple histone chaperones, such as nuclear autoantigenic sperm protein (NASP) and anti-silencing factor 1 (Asf1), have been reported to function in the transport pathway of newly synthesized H3/H4 (3,4). Furthermore, chromatin assembly factor-1 (CAF1) and histone-regulator-A (HIRA) function in DNA replication-dependent (RD) and replication-independent (RI) chromatin assembly processes to deposit either H3-H4 or H3.3-H4, respectively (5,6). ATP-dependent chromatin remodelers, on the other hand, mobilize DNA around nucleosomes and function in transcription regulation (7). Notably, the 'Nucleosome Remodeling and Deacetylase' (NuRD) complex is widely present at gene promoter regions and enhancer elements and plays an important role in regulating transcription and genome integrity, as well as cell cycle progression (8,9). Finally, histone PTMs are important for chromatin assembly, as well as regulation of gene expression (10). Newly synthesized H4 histones are di-acetylated at lysine (K) 5 and K12 residues, which has been shown to be important for chromatin assembly (11). These deposition-related H4K5/12ac marks are installed by the HAT1-complex, which is composed of at least two subunits, including catalytic Hat1 and histone-binding WD40 repeat protein Hat2 (RBBP4 in humans) (12,13). In Saccharomyces cerevisiae an additional nuclear subunit called Hif1 (human NASP) also interacts with the HAT1-complex at roughly stoichiometric levels (14). Whereas the depositionrelated H4 PTMs are conserved across species (15), the corresponding enzymatic complex has not been well described in evolutionarily distant protozoan lineages.
RBBP4 also functions as a component of transcriptional regulator 'multi-vulva class B' (MuvB)-complex (33)(34)(35). The mammalian core MuvB complex, composed of five subunits including Lin9, Lin37, Lin52, Lin54 and RBBP4, is a master regulator of cell-cycle-dependent gene expression (33). The composition of the core MuvB remains unaltered during different cell cycle stages, whereas its interaction partners change, which reverses its function from a transcriptional repressor to an activator (33,36,37). The core MuvB interacts with E2F4/5, DP1/2 and p130/p107 proteins to form the repressor 'dimerization partner (DP), RB-like, E2F and MuvB' (DREAM) complex, which represses the G1/S and G2/M genes during G0 and early G1 (35,36). The DREAM-specific proteins dissociate during late G1, and the MuvB core then forms an interaction with the B-MYB transcription factor during S phase (35,38). The interaction of B-MYB with MuvB is necessary for FOXM1 recruitment during G2 (38). These B-MYB-MuvB (MMB) and FOXM1-MuvB complexes stimulate the expression of genes that peak during G2 and M phases (34). Among the MuvB core subunits, Lin54 is a DNA-binding protein and has two tandem cysteine-rich (CXC) domains that share sequence similarity with Tesmin, a testis-specific metallothionein-like protein (39,40). Lin54 directly binds to DNA by recognizing the 'cell cycle genes homology region' (CHR) element (41)(42)(43). The CHR is considered to be the central promoter element required for the cell cycle-dependent regulation of G2/M phase-expressed genes (43). Several studies have shown that the DREAM-complex localizes to its target promoters via E2F/DP and LIN-54 DNA sequence motifs (44)(45)(46). In addition to Lin54, DREAM also likely comes in contact with nucleosomes by utilizing RBBP4 via its interaction with histones (33). Furthermore, Drosophila p55 (RBBP4) has been shown to be important for DREAM-mediated gene repression (47).
It is increasingly recognized that epigenetic aberrations contribute to tumor initiation and development (48). As such, it is not surprising that RBBP4 and RBBP7 have been extensively implicated in many cancers and are now being pursued as valuable drug targets (reviewd by 24). For example, RBBP7 is upregulated in many cancers, including non-small cell lung cancer, renal cell carcinoma, and breast cancer (49)(50)(51). Similarly, RBBP4 has been found to be overexpressed in hepatocellular carcinoma and acute myeloid leukemia (52,53). In human glioblastoma (GBM) tumor cells, knockdown of RBBP4 causes sensitization of tumor cells to temozolomide (TMZ) chemotherapy and supresses the expression of several DNA repair genes, including RAD51, a key enzyme of the homologous recombination repair pathway (54).
Despite their critical role in multiple epigenetic pathways, the function of RBBP4 and RBBP7 and their associated protein complexes has not been well documented outside of the Opisthokonta. Tetrahymena thermophila, a unicellular ciliate protozoan, is an excellent experimental system to study evolutionarily conserved chromatin-related processes, including gene expression and cell cycle regulation (55)(56)(57)(58)(59). Tetrahymena has two structurally and functionally distinct nuclei, a germ-line diploid micronucleus (MIC) and a somatic polyploid macronucleus (MAC), that are physically separated within the same cell (60). The MAC essentially regulates all gene expression during vegetative growth, whereas the MIC ensures stable genetic inheritance (61). Both the MAC and MIC arise from the same zygotic nu-cleus during sexual reproduction (conjugation) (61). During conjugation, two developing nuclei undergo substantial chromatin alterations, including DNA rearrangements and removal of 'internally eliminated sequences' (IES), resulting in functionally and structurally different MAC and MIC with distinct epigenetic states (62)(63)(64). Many of the epigenetic regulatory pathways in Tetrahymena that ensure the establishment of distinct chromatin states within the two nuclei are widely conserved across eukaryotes (65).
Here, we utilized functional proteomics and genomics approaches to characterize the single orthologue of human RBBP4 and RBBP7, RebL1, in Tetrahymena. Iterative proteomic analyses uncovered the composition of multiple Tetrahymena epigenetic regulatory complexes, such as the CAF1-, Sin3A/Rpd3-and MuvB-complexes, which copurified with RebL1 during vegetative growth and conjugation. Furthermore, ChIP-seq analysis of Lin54 (Anqa1), combined with previously reported RNA-seq data (66), implicated Tetrahymena MuvB in transcription regulation. Consistently, the knockdown (KD) of REBL1 significantly decreased cellular viability and altered the expression profile of some functionally critical genes, including RAD51. Overall, our results revealed a general role of RebL1 in multiple transcriptional regulatory pathways important for cell cycle and chromatin regulation in Tetrahymena.  (67). HEK293 cells (Flp-In 293 T-REx cell lines) were obtained from Life Technologies (R780-07) (Invitrogen) and cells were cultured in Dulbecco's modified Eagle's medium with 10% FBS and antibiotics as described (68).

Macronuclear gene replacement
Epitope tagging vectors for histone HHF1, HHF2, REBL1, ANQA1 and LIN9 Tt were constructed as previously described (67). Two separate ∼1 kb fragments up and downstream of the predicted stop codons were amplified using wild-type T. thermophila genomic DNA as template and primers as indicated in Supplemental Table S13. The resulting PCR products were digested with KpnI and XhoI (upstream product) or NotI and SacI (downstream product). The digested products were cloned into the tagging vector pBKS-FZZ kindly provided by Dr. Kathleen Collins (University of California, Berkeley, CA). Plasmids were amplified in Escherichia coli and the final plasmid was digested with KpnI and SacI prior to transformation. One micrometer gold particles (60 mg/mL; Bio-Rad) were coated with at least 5g of the digested plasmid DNA. The DNA coated gold particles were introduced into the T. thermophila MAC using biolistic transformation with a PDS-1000/He Biolistic particle delivery system (Bio-Rad) (69). The transformants were selected using paromomycin (60 g/mL).
MAC homozygosity was achieved through phenotypic assortment by growing the cells in increasing concentrations of paromomycin to a final concentration of 1 mg/mL.

Epitope tagging in HEK293 cells
We cloned gateway-compatible entry clones for RBBP4 and RBBP7 into the pDEST pcDNA5/FRT/TO-eGFP vector according to the manufacturer's instructions. The cloned vectors were co-transfected into Flp-In T-REx 293 cells together with the pOG44 Flp recombinase expression plasmid. Cells were selected for FRT site-specific recombination into the genome with hygromycin (Life Technologies, 10687010) at 200 ug/mL. Expression of the gene of interest was induced by addition of doxycycline (1ug/mL) to the culture medium 24 hours before harvesting.

Experimental design for mass spectrometry
At least two biological replicates of each bait were processed independently along with negative controls in each batch of sample. Tetrahymena cells without tagged bait (i.e. empty cells) and HEK293 cells expressing GFP alone were used as controls in our analysis. Extensive washes between each sample (see details for each instrument type) were performed to minimize carry-over. The order of sample acquisition on the mass spectrometer was reversed for the second replicate to avoid systematic bias. On the LTQ mass spectrometer, a freshly made column was used for each sample as previously described (67).

Affinity purification and Mass Spectrometry sample preparation
For Tetrahymena samples, affinity purification was carried out essentially as described (67,70). Briefly, T. thermophila was grown in ∼500 mL of 1× SPP to mid-log phase to a final concentration of 3 × 10 5 cells/mL for vegetative samples. For conjugation, Tetrahymena cells of different mating types were grown to mid-log phase and starved in 10 mM Tris-HCl pH 7.4 for 16 to 24 hours. Starved cells were mixed in equal numbers to initiate conjugation and pelleted at 5-hours post mixing. Vegetative and conjugating cells were pelleted and frozen at −80 • C. The frozen pellets were thawed on ice and suspended in lysis buffer (10 mM Tris-HCl pH 7.5, 1 mM MgCl 2 , 300 mM NaCl and 0.2% NP40 plus yeast protease inhibitors (Sigma)). For nuclease treatment, 500 units of Benzonase (Sigma E8263) was added and extracts were rotated on a Nutator for 30 min at 4 • C. Whole cell extracts (WCE) were clarified by centrifugation at 16 000 × g for 30 min. The resulting soluble material was incubated with 50 L of packed M2-agarose (Sigma) at 4 • C for at least 2 hours. The M2-agarose was washed once with 10 mL IPP300 (10 mM Tris-HCl pH 8.0, 300 mM NaCl, 0.1% NP40), twice with 5 mL of IP100 (10 mM Tris-HCl pH 8.0, 100 mM NaCl, 0.1% NP40), and twice with 5 mL of IP100 without detergent (10 mM Tris-HCl pH 8.0, 100 mM NaCl). 500 uL of 0.5 M NH 4 OH was added to elute the proteins by rotating for 20 min at room temperature. Preparation of protein eluates for mass spectrometry acquisition is detailed in supplementary methods.

AP-MS procedure in HEK293 cells
The AP-MS procedure for HEK293 cells was performed essentially as previously described (68,71,72). In brief, ∼20 × 10 6 cells were grown independently in two batches representing two biological replicates. Cells were harvested after 24-hour induction of protein expression using doxycycline. HEK293 cell pellets were lysed in high-salt NP-40 lysis buffer (10 mM Tris-HCl pH 8.0, 420 mM NaCl, 0.1% NP-40, plus protease/phosphatase inhibitors) with three freeze-thaw cycles. The lysate was sonicated as described (68) and treated with Benzonase for 30 min at 4 • C with endto-end rotation. The WCE was centrifuged to pellet any cellular debris. We immunoprecipitated GFP-tagged RBBP4 and RBBP7 proteins with anti-GFP antibody (G10362, Life Technologies) overnight followed by a 2-hour incubation with Protein G Dynabeads (Invitrogen). The beads were washed three times with buffer (10 mM Tris-HCl, pH 7.9, 420 mM NaCl, 0.1% NP-40) and twice with buffer without detergent (10 mM Tris-HCl, pH 7.9, 420 mM NaCl). The immunoprecipitated proteins were eluted with NH 4 OH and lyophilized. Proteins for MS analysis were prepared by in-solution trypsin digestion. The protein pellet was suspended in 44 uL of 50 mM NH 4 HCO 3 , the sample was reduced with 100 mM TCEP-HCl, alkylated with 500 mM iodoacetamide, and digested with 1 ug of trypsin overnight at 37 • C. Samples were desalted using ZipTip Pipette tips (EMD Millipore) via standard procedures. The desalted samples were analyzed with an LTQ-Orbitrap Velos mass spectrometer (ThermoFisher Scientific). Raw MS data were searched with Maxquant (v.1.6.6.0) (73) as described previously (72), yielding spectral counts and MS intensities for each identified protein for individual human proteins for each experiment. The resulting data were filtered using SAINTexpress to obtain confidence values utilizing the two biological replicates. The AP-MS data generated from HEK293 cells expressing GFP alone were used as negative control for SAINT analysis.

MS data visualization and archiving
Cytoscape (V3.4.0; (74)) was used to generate proteinprotein interaction networks. For better illustration, individual nodes were manually arranged in physical complexes. Dot plots and heatmaps were generated using ProHitsviz (75). The annotation of the co-purifying partners was carried out using BLAST (https://blast.ncbi.nlm.nih.gov/ Blast.cgi). SMART domain analysis (http://smart.emblheidelberg.de/) of the predicted proteins was carried out. All MS files used in this study were deposited at MassIVE (http: //massive.ucsd.edu). Additional details (including MassIVE accession numbers and FTP download links) can be found in Supplemental Table S14.

ChIP-Seq analysis
The ChIP-seq experiments and related analyses were performed in vegetatively growing Tetrahymena, as described previously (76)(77)(78), see supplemental methods for complete experimental details. Illumina adaptor sequences were removed from the 3 ends of 51-nt reads, and the remaining reads were mapped to the Tetrahymena genome (2014 annotation) using Bowtie 2 with default settings (79). After removal of duplicate reads, peaks were called jointly on immunoprecipitated and input samples with MACS2 (version 2.1.2) where the peak's q-value must be lower than 0.05 (80). The ChIP inputs were used as the control. The metagene analysis was performed using ChIP-Seq reads normalized over the inputs and by BPM (bins per million). The plots were generated using deeptools (81). The identified summits (apex of the peak) with 50 bp extension in both directions were analyzed using MEME-ChIP software for de novo motif discovery (http://meme-suite.org/) (82). The occurrence of Anqa1 motif along the peaks/genome was identified using FIMO with default settings (83). The closest TSS sites to the motifs were found using HOMER software's 'anno-tatePeaks' program (84).

Gene expression data
For gene expression analysis, we used microarray data (accession number GSE11300) (http://tfgd.ihb.ac.cn/) (85) and the expression values were represented in the heatmap format. To assess the similarities in gene expression profiles hierarchical clustering was performed. RNA-seq data corresponding to the DPL2 knockout was acquired from GSE104524 and analyzed for differential expression as described previously (86).

RNAi vector construction and RT-qPCR experiments
RNAi vectors were constructed essentially as described (87). RebL1 target sequence used in hairpin RNA constructs was amplified from CU428 genomic DNA using PrimeSTAR Max DNA polymerase with primers as listed in Supplemental Table S13. To create the hairpin cassette, amplified target fragments were cloned into the BamHI-BamHI and PstI-PstI sites, respectively, of pAkRNAi-NEO5, which was kindly provided by Dr Takahiko Akematsu, with the NEBuilder HF DNA Assembly kit. In this system, the NEO5 cassette of the backbone plasmid, which confers paromomycin resistance, has been replaced by a puromycin resistance marker (PAC) under the MTT2 copper-inducible promoter (87). The resulting plasmid was linearized with SacI and KpnI before biolistic transformation. The PAC cassette was activated by adding 630 M CuSO4 to the cells, with the addition of 200 g/mL puromycin dihydrochloride (Cayman Chemical, Ann Arbor, MI, USA, Cat. CAYM13884-500) to select transformants. RNAi was induced in cells carrying the hairpin construct by the addition of 0.5 g/mL CdCl 2 during vegetative growth. The expression of the target gene was examined using Rt-qPCR analysis. To this end, we isolated total RNA from the RNAi-treated or WT Tetrahymena cells using TRIzol (Life Technologies) as per the supplier's instructions. The isolated RNA was treated with Deoxyribonuclease I (RNase-free, Thermo). cDNA was prepared using iScript™ Reverse Transcription Supermix for RT-qPCR. qPCR was performed in technical triplicates using the cDNA prepared from three individual KD cell lines. The data were normalized to the expression levels of HTA3. Primers are indicated in Supplemental Table S13.

Viability assays
Viability assays were carried out essentially as described (88). We initiated our analysis by isolating single cells of wild-type or REBL1-RNAi treated lines in drops of media with and without 0.5 g/mL CdCl 2 . After 2 days of growth, the approximate number of cells in each drop was determined and drops containing >1000 cells were considered as viable. The experiment was conducted in five independent biological replicates with 50 cells isolated in each replicate. Finally, the number of viable drops from each replicate for each condition was represented as a bean plot. We used Student's t-test to assess statistically significant difference in the averages of viable drops of examined conditions.

RebL1 physically interacts with histone H4
We sought to identify the Tetrahymena RBBP4 and RBBP7 histone chaperones by first identifying the interaction partners for the core histone H4, one of their predicted partners. The Tetrahymena genome contains two core histone H4s encoded by HHF1 and HHF2 (89). We generated Tetrahymena cell lines stably expressing HHF1 and HHF2 with a Cterminal FZZ epitope tag from their endogenous MAC loci. The FZZ epitope tag comprises 3 × FLAG and two protein A moieties separated by a TEV cleavage site, which can be utilized in affinity purification experiments and indirect immunofluorescence (IF), as well as genome-wide studies. The successful expression of the endogenously tagged proteins was confirmed by western blotting analysis in whole cell extracts (WCEs) prepared either from the HHF1-and HHF2-FZZ expressing cells or untagged wild-type Tetrahymena (Supplemental Figure S1A). Similar to the core histone H3 (90), our IF analysis indicated that during vegetative growth Hhf1-and Hhf2-FZZ localize to both MAC and MIC (Supplemental Figure S1B). This observation indicates that the FZZ-tagged histones remain functionally competent, as previously reported (58,91). Since HHF1 and HHF2 encode identical proteins, we used only the HHF1-FZZ cells (designated as H4) for our downstream analyses.
We identified TTHERM 00688660 as one of the highconfidence (FDR ≤ 0.01) H4 co-purifying proteins which shared sequence similarity with human RBBP4 and RBBP7. The co-purification with H4 of an RBBP4/7like protein (RebL1, retinoblastoma binding protein 4/7like 1) is consistent with previous reports describing human and budding yeast orthologues to function as histone chaperones (95). We subsequently investigated TTHERM 00688660 as a putative RBBP4/7 Tetrahymena orthologue in greater detail.

RBBP4/7 is conserved in Tetrahymena
WD40-repeat family proteins are encoded by multiple genes in most organisms (19,96). We examined the Tetrahymena genome for the existence of additional RBBP4/7-like proteins with predicted WD40-repeats. Using the human RBBP4 and RBBP7 proteins to search against the Tetrahymena genome, we identified two proteins, including TTHERM 000709589 and the H4 copurifying TTHERM 00688660 (RebL1). Both identified proteins have conserved domain architecture with an N-terminal H4-binding domain followed by six WD40 repeats ( Figure 1B; Supplemental Figure S1C). Multiple sequence alignment analysis, however, indicated that TTHERM 000709589, which we named as Wdc1 (WD40 domain containing 1), shares only weak sequence identity (∼18%) with its human and yeast counterparts; in comparison, RebL1 had an overall ∼46% sequence identity ( Figure 1B). It is worth noting that Tetrahymena also encodes another WD40 repeat protein, TTHERM 00467910, which appears to be an orthologue of S. cerevisiae Rrb1 (known as GRWD1 in humans) (Supplemental Figure  S1C), as assessed by reciprocal BLAST searches. Rrb1 is known to function in ribosome biogenesis (97), making TTHERM 00467910 an unlikely candidate to be an RBBP4/7 orthologue.
To further categorize the identified WD40 proteins in Tetrahymena, we carried out a phylogenetic analysis and observed that RebL1 and Wdc1 are related to the RBBP4/7family of histone chaperones (Supplemental Figure S1D). Wdc1 and Arabidopsis thaliana Msi4 clustered together, distinctly within the RBBP4/7 group on the phylogenetic tree, suggesting that these proteins might be functionally divergent (Supplemental Figure S1D). In contrast, TTHERM 00467910 was found within a well differentiated clade representing the Rrb1 protein family (Supplemental Figure S1D). We conclude that RebL1 and Wdc1 are related to the RBBP4/7-family of histone chaperones in Tetrahymena, whereas TTHERM 00467910 belongs to a phylogenetically distinct WD40 sub-family.

RebL1 interacts with diverse chromatin-associated protein complexes
To test the hypothesis that TTHERM 00688660 is a bona fide RBBP4/7 orthologue, we generated endogenously tagged Tetrahymena cell lines of two different mating types stably expressing REBL1 with a C-terminal FZZ epitope tag from its native MAC locus ( Figure 1C). Next, we performed AP-MS analysis in vegetative Tetrahymena cells using RebL1-FZZ as a bait. Application of SAINTexpress to the raw AP-MS data identified at least 14 highconfidence (FDR ≤ 0.01) interaction partners of RebL1  Table S2 for complete AP-MS results.
during vegetative growth ( Figure 1D; Supplemental Table  S2). These interaction partners include the putative orthologues of S. cerevisiae Hat1, as well as TTHERM 00309890 and TTHERM 00219420, which shared sequence similarity with the human p150 and p60 (S. cerevisiae Cac1 and Cac2) subunits of the CAF1 complex, respectively (Figure 1D). These observations highlight the potentially conserved heterotrimeric and heterodimeric subunit composition of the CAF1-and HAT1-complexes, respectively. Next, we identified seven RebL1 co-purifying proteins, including Sin3, Thd1, Rxt3, Pho23, Sap30 and two PHDcontaining proteins that, based on similarity with the S. cerevisiae orthologues, define a putative Tetrahymena Rpd3 histone deacetylase (HDAC) complex ( Figure 1D). The high-confidence RebL1 interactions also included putative orthologues of the Lin54 (TTHERM 00766460) and Lin9 (TTHERM 00591560) subunits of the mammalian MuvB-complex ( Figure 1D). An additional two unanno-tated or hypothetical proteins, TTHERM 00227070 and TTHERM 00147500 (named as Jinn1 and Jinn2, respectively), co-purified with RebL1 and lacked any recognizable domains and/or sequence similarity to any known protein.
As RebL1 expression is highest after meiosis, a state that is marked by a series of rapid post-zygotic nuclear divisions (Supplemental figures S1E and S2A), we investigated whether its interactome was remodeled during sexual development of Tetrahymena. To address this, we performed AP-MS in conjugating cells harvested 5-hour post-mixing. After scoring the individual PPIs using SAINTexpress, we detected a total of 30 reproducible, high-confidence interaction partners of RebL1 during conjugation (Supplemental Table S3). All the RebL1 interaction partners identified during vegetative growth were also detected as highconfidence hits in the conjugating Tetrahymena (Figure 2A). chromatin organization, transcription regulation and histone modifications ( Figure 2B). Among the RebL1 interaction partners identified exclusively during conjugation were four putative MYBlike transcription factors (TFs) ( Figure 2C). Additionally, we identified TTHERM 00841280, which appears to be ciliate-specific and lacks sequence similarity to any known metazoan protein. The expression analysis of RebL1 copurifying proteins using publicly available micro-array data revealed that TTHERM 00841280 is exclusively expressed during conjugation with a peak at early stages correlating with the onset of MIC meiosis (Supplemental Figure S2B) (85). We named TTHERM 00841280 as 'Friend of RebL1 in conjugation' (Forc1). Among the remaining conjugationspecific RebL1 co-purifying partners were a Tor1-like, Asf1interacting protein 2 (Aip2) and several unannotated proteins, including Recp1-4 (RebL1 co-purifying protein 1-4) and Ipra1 (see below), without any recognizable domains ( Figure 2C). We have previously shown that Aip1 and Aip2 are two highly similar ciliate-specific proteins that interact with Asf1 and likely function in chromatin assembly pathways in Tetrahymena (67). We also detected Aip1 as a conjugation-specific interaction partner of RebL1, albeit at a slightly relaxed threshold (FDR≤0.03) (Supplemental Table S3). The co-purification of Aip1/2 proteins that function with ImpB6 and Asf1 (67) reinforces the idea that RebL1 functions in multiple chromatin-related pathways in Tetrahymena, consistent with our GO enrichment analysis of the RebL1 interacting partners ( Figure 2B).

RebL1 protein interaction profile is analogous to human RBBP4 and RBBP7
To examine the functional conservation of distantly related orthologues, we directly compared the protein interaction profiles of RebL1 and human RBBP4 and RBBP7. We generated inducible HEK293 cell lines expressing EGFPtagged RBBP4 and RBBP7 and used them for anti-GFP AP-MS experiments. Using SAINTexpress, we found that RBBP4 and RBBP7 physically interact with multiple chromatin-related, as well as transcriptional regulatory, protein complexes (Supplemental Table S4,5), including the CAF1-, Hat1-, MuvB-and Sin3A-complexes (FDR≤0.01; Figure 3A), as reported previously (98). However, 37.5% (18 proteins out of 48) of the high-confidence protein-protein interactions for RBBP4 and 29% (9 proteins out of 31) for RBBP7 had not been previously reported in the Bi-oGRID database (99). While there are many shared interaction partners between the RBBP4 and RBBP7 homologues, non-overlapping distinct PPIs were also observed ( Figure 3B). For example, consistent with previous studies (35), MuvB subunits co-purified with only RBBP4 and not with RBBP7 ( Figure 3A). Furthermore, CAF1-complex subunits Chaf1A and Chaf1B co-purified with RBBP4 whereas Hat1 was identified exclusively in RBBP7 interaction partners. Importantly, we observed that many of the transcription-and chromatin-related complexes that we found for human RBBP4 and RBBP7 had their putative orthologous complexes in Tetrahymena. Examples of these conserved interaction partners include the Hat1-, CAF1-, and MuvB complexes ( Figure 3C; Supplemental Figure  S2C; Supplemental Table S3). While PRC2 subunits, including Suz12, Ezh2 and EED, were co-purified with RBBP4 and RBBP7 ( Figure 3A), we did not identify any Tetrahymena orthologous subunits in our RebL1 AP-MS data. We conclude that Tetrahymena RebL1 physically interacts with multiple epigenetic regulatory complexes, and these functional links are conserved in humans.

RebL1, Lin9 Tt and Lin54 Tt are the conserved MuvB subunits
The identification of putative MuvB subunits in Tetrahymena piqued our interest due to the described role of mammalian MuvB as the master regulator of cell-cycledependent gene expression (33). To identify all the putative MuvB components, we used the human MuvB subunits as queries to search the Tetrahymena genome. Our analysis indicated that the Tetrahymena genome does not appear to encode any orthologues of Lin37 and Lin52. Lin9 appears to be present as a single gene copy in Tetrahymena, similar to humans. We identified at least 14 genes that appeared to encode the Lin54-like proteins in Tetrahymena. All the identified Lin54-like proteins contained two tandem TESMIN/TSO1-like CXC domains (Pfam: PF03638), consistent with the architecture of human Lin54 (39,40) (Supplemental Figure S3A). We next examined the expression profiles of the Tetrahymena Lin54-like genes during growth and conjugation using publicly available microarray data (85). Most of the Lin54-like genes were weakly expressed during the growth and late conjugation stages (6-hour and onwards after mixing the cells) (Supplemental Figure S3B). At least 9 out of the 14 genes, including the RebL1 interaction partner TTHERM 00766460, exhibited an expression peak at early conjugation stages (2-6 hours after mixing the cells). For example, the expression of TTHERM 00120820, TTHERM 00577120 and TTHERM 00600840 peaked at 2 hours, whereas TTHERM 00766460 had its highest expression at 4 hours after mixing the cells (Supplemental Figure S3B). These observations suggest that the expression of Lin54-like proteins is tightly regulated during early conjugation, a period when the MIC becomes transcriptionally active and undergoes meiosis (Supplemental Figure S1E).
We named Lin54-like proteins as 'Anqa-1 to 14' (after the 'ANQA' mythological creature thought to appear once in ages), considering that several of them exhibited an expression peak only once at a distinct conjugation stage.
We generated endogenously tagged Lin9 Tt -FZZ and Anqa1-FZZ (TTHERM 00766460) cell lines and employed them for AP-MS (Supplemental Figure S3C). These analyses showed reciprocal purification of RebL1 and Anqa1 when Lin9 Tt -FZZ was employed as the bait protein ( Figure  4A; Supplemental Table S6). Similarly, Lin9 Tt and RebL1 were identified as Anqa1-FZZ high-confidence co-purifying partners (Supplemental Table S7). These results are consistent with the idea that RebL1, Anqa1 and Lin9 Tt represent the conserved subunits of a putative MuvB complex in Tetrahymena. Two unannotated proteins, which we named as Jinn1 and Jinn2 (see above), co-purified with both Lin9 Tt -FZZ and Anqa1-FZZ as high-confidence hits (Figure 4A). Based on their expression profiles (Supplemental Figure S2A) and co-purification with all three conserved subunits (RebL1, Lin9 Tt and Anqa1) ( Figure 4B), we suggest that Jinn1/2 represent Tetrahymena-specific putative MuvB components.
In addition to functionally diverse proteins such as kinases and helicases, TTHERM 00317260 and TTHERM 00842380 also co-purified with Anqa1-FZZ during vegetative growth (FDR≤0.01). TTHERM 00317260, which interacted with RebL1 during conjugation, is a Tetrahymena-specific protein and lacks any recognizable domains. We named it Ipra1 (interaction partner of RebL1 and Anqa1). TTHERM 00842380 is the putative MYB-like TF (MybL1) that was also detected as a high-confidence RebL1 interaction partner during both growth and conjugation (FDR≤0.01) ( Figure 2C). This observation is consistent with the reported interaction between mammalian MuvB and B-MYB TF to form an activator MMB complex (33).

Genome-wide analysis reveals distinct RebL1 and Anqa1 DNA-binding profiles
Among the human MuvB subunits, RBBP4 and Lin54 are thought to bind DNA, with Lin54 exhibiting sequence specificity by recognising the CHR element (TTTGAA)  Tables S4 and S5 for complete AP-MS results for RBBP4 and RBBP7, respectively. (C) Schematic illustration of evolutionary conservation for the RebL1 co-purifying putative chromatin-related complexes that are also known to interact with human RBBP4 and RBBP7 proteins. The legend for the colors is provided. (33). To investigate their genome-wide localization in Tetrahymena, we utilized our endogenously tagged RebL1 and Anqa1-FZZ cell lines and performed chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) experiments in vegetatively growing cells (Supplemental Figure S4A). ChIP-seq analysis revealed that RebL1 predominantly localizes to genic regions with 64% of the peaks being found within exons (Supplemental Figure S4B, left; Supplemental Table S9; 5531 total peaks; q value < 0.05). Consistent with the role of RebL1 in multiple chromatin-related processes, metagene analysis indicated that RebL1 binds throughout the gene body of its targets without any specific sequence preferences (Supplemental Figure S4B, right).
Unlike RebL1, Anqa1 metagene analysis indicated a clear enrichment near the transcription start sites (TSS) and transcript end sites (TES) of the targeted genes ( Figure 5A). Peak distribution analysis revealed that roughly half of the Anqa1 ChIP peaks are found within 1 kilobase of the annotated TSS ( Figure 5A; Supplemental Table S10). As an enrichment near the promoter regions suggests a role in transcription regulation, we utilized publicly available RNA-seq data that has been used to rank Tetrahymena genes based on their expression patterns during vegetative growth (66) and correlated it with the Anqa1 ChIP data. We found that the majority of the Anqa1-FZZ bound genes (290 of 422 genes) were moderately to very weakly expressed and only 12% of the targets were classified as highly expressed during Tetrahymena growth ( Figure 5B). These results suggest that Anqa1 may function to regulate the expression of its target genes.
We carried out gene ontology (GO) enrichment analysis related to molecular functions and/or biological processes. While RebL1-binding genes were not enriched in any GO terms, in keeping with its role in multiple protein complexes, Anqa1 targets were however enriched for terms asso-  Table S8 for complete AP-MS results for Anqa1 during conjugation. ciated with gene expression regulation, chromatin assembly and organization, translation regulation, ribosome biogenesis and cellular metabolic processes, suggesting a specific function in regulation of cellular processes ( Figure 5C).

A hybrid E2F/DP and Anqa1 DNA-binding motif
The ChIP-seq analysis of Lin54 in metazoans has revealed the existence of a hybrid DREAM-recruiting motif representing the E2F-binding sequence upstream of the CHR element (44)(45)(46). To identify a putative DNA-binding motif, we performed a de novo motif discovery analysis using the Anqa1 ChIP peaks. The identified Anqa1 motif appeared to share similarity with the characteristic DREAM-recruiting E2F/DP-CHR hybrid motif ( Figure 5D). By scanning across the Tetrahymena genome, we identified ∼3000 loci, with roughly half (1345 loci) being found within ±1 kb of TSS, that contain this putative DREAM-recruiting motif (Supplemental Table S11). We cannot rule out the possibility of additional DREAM-recruiting loci given that the Tetrahymena genome encodes multiple Lin54-and E2F-like factors which could potentially recognize sequences with variations in the identified motif. The Tetrahymena genome Each node depicts a GO term whereas each edge represents the degree of gene overlap that is found between two gene sets corresponding to the two GO terms. GO term significance 0.01 (white) to 0 (orange). (D) An overrepresented motif in Anqa1 bound Tetrahymena genes identified by MEME (E value: 8.0e-223). Below are shown previously identified motifs for the C. elegans EFL-1, an extended motif enriched among E2F2, LIN-9 and LIN-54 co-regulated genes in Drosophila and the CDE/CHR motif from the human Cdc2 promoter. Regions bound by human E2F4 and LIN54 at the Cdc2 promoter as well as homologous motif sequences in other organisms are highlighted by the dotted lines. (E) Metagene analyses indicating the binding profile of Anqa1 around TSS of those genes that 1) contain the E2FL1 motif in their promoters (left), 2) those genes whose expression is affected during conjugation upon DPL2 depletion (right). encodes at least seven E2F family members (E2FL1-4 and DPL1-3) and two p107/p130 (Retinoblastoma-like1/2)-like proteins (86). Previous in silico analysis has identified 714 Tetrahymena genes to contain some version of human E2F DNA-binding motif (TTTSSCGC where S is either a G or a C) in their promoters (100). Although we did not detect any DREAM-specific proteins in our AP-MS (FDR ≤ 0.01), we found that Anqa1 exhibited enriched binding in the promoter regions of putative E2F motif containing genes ( Figure 5E, left), consistent with the identification of an E2F/DP-CHR-like hybrid motif.
E2FL1 and Dpl2 heterodimer has been described to function in gene expression regulation during conjugation (86). To further explore the connection between Anqa1 and E2Fs, we utilized previously generated RNA-seq data in dpl2Δ cells during conjugation (86), and examined the Anqa1 binding profile around the TSS of affected genes. Our analyses indicated that Anqa1 binds relatively evenly across the gene bodies of both up-and down-regulated genes at 3-, 4-, 6-and 7-hour time points after mixing the dpl2Δ cells during conjugation (Supplemental Figure S4C). Remarkably, however, we observed a strong binding enrichment for Anqa1 specifically within the promoter regions of genes that are upregulated in dpl2Δ cells 5-hours post mixing cells ( Figure 5E, right), a period in conjugation immediately prior to the observed meiotic defect in dpl2Δ cells (86). In contrast, the downregulated genes had wide-spread binding within their bodies without any enrichment near the TSS (Figure 5E, right). These dpl2Δ-dependent upregulated genes were enriched in pathways such as DNA replication and DNA repair (Supplemental Figure S4C). We suggest that the putative MuvB subunit Anqa1 binds in the promoter region, possibly in conjunction with DREAMspecific E2F/DP proteins, to regulate the expression of some of its target genes.

RebL1 depletion results in growth defects
Previous studies have correlated the overexpression of RBBP4 and RBBP7 with tumor cell proliferation in certain cancer types (49,54). To analyze the RBBP4 and RBBP7 expression levels and correlation with cancer prognosis, we utilized archived patient RNA-seq data from 'The Cancer Genome Atlas' (TCGA), focusing on multiple tumor types for which the corresponding normal tissue data were available. The expression levels of both RBBP4 and RBBP7 were significantly higher in several cancers, including stomach adenocarcinoma, breast cancer, glioblastoma multiforme, cholangio carcinoma and liver hepatocellular carcinoma, compared with the normal tissues (P-value cutoff ≤ 0.01) (Supplemental Figure S5A; Supplemental Table S12 for TCGA abbreviations). Furthermore, these high expression levels of RBBP4 and RBBP7 significantly correlated with unfavorable clinical outcomes (P ≤ 0.05) in many tumor types (Supplemental Figure S5B). For example, Kaplan-Meier curves indicated that hepatocellular carcinoma (LIHC) high expression groups for both RBBP4 and RBBP7 had significantly worse overall survival (P < 0.01) compared with low expression groups ( Figure  6A). The observed overall poor survival of LIHC highexpression group was further enhanced when the RBBP4 and RBBP7 expression was combined in our analyses, suggesting an additive effect ( Figure 6B).
We examined the DepMap database, which encompasses data from genome-wide RNAi and CRISPR lossof-function screens across hundreds of cancer cell lines, and observed that RBBP4 behaves as a common essential gene (Supplemental Figure S5C) (101,102). RBBP7, on the other hand, exhibited high selectivity in terms of cancer cell line dependency, reinforcing the concept of orthologuespecific functions for RBBP4 and RBBP7. Considering the sequence/structural similarity of RebL1 with its human orthologues and formation of similar functional links (Supplemental Figure S5D), we explored the role of RBBP4 and seven in cellular proliferation and/or gene expression using Tetrahymena as a model system. To this end, a cadmium inducible RebL1 RNA interference (RNAi) construct was introduced into vegetative Tetrahymena cells. Upon CdCl 2 treatment, the RebL1 expression levels were significantly reduced in RebL1-RNAi cells in comparison with the WT and/or uninduced Tetrahymena ( Figure 6C), as assessed by reverse-transcription quantitative real-time PCR (RT-qPCR) analysis. We used growth assays to examine the functional consequences of RebL1 depletion in growing Tetrahymena cells. We found that RebL1-RNAi cells gave ∼19% viable clones, significantly fewer compared to 90% viable cells (n = 5) in wildtype Tetrahymena when treated with CdCl 2 (0.5 g/mL; see methods) ( Figure 6D), consistent with the role of its human orthologues in cell proliferation.

RebL1 depletion causes gene expression changes
The REBL1-RNAi dependent reduction in cellular viability could be related to indirect changes in expression levels of some functionally important genes. In human glioblastoma cells (GBMs), RBBP4 depletion results in sensitization of tumor cells to temozolomide (TMZ) in conjunction with the downregulation of DNA repair protein Rad51 (54). We examined Rad51 Tt expression levels in REBL1-RNAi cells. Rad51 Tt was significantly supressed in Tetrahymena depleted of RebL1 in comparison to the WT and/or uninduced cells ( Figure 6E).
Previous studies have identified an auto-regulatory feedback loop for the DREAM complex (103). To test the possibility of an auto-regulatory feedback loop for MuvB, we examined the expression levels of Anqa1 and Lin9 Tt using the RNAs derived from either the REBL1-RNAi or WT cells. We found that REBL1 knockdown suppressed the expression of both Anqa1 and Lin9 Tt (Figure 6E), consistent with the disruption of a putative Tetrahymena MuvB. Consistent with these data, we observed that expression levels of Lin54 and Lin9 positively correlate with RBBP4 in multiple tumor types (Supplemental Figure S6). Furthermore, the high expression of MuvB subunits correlate with significantly worse overall survival in many different cancer types (P≤0.05), particularly in LIHC patients where increased levels of 3 out of the 5 MuvB subunits were associated with unfavourable outcomes (Figure 6F). These results are consistent with the highly conserved physical interactions of MuvB subunits with each other. RNAi treated lines isolated in drops of media with and without CdCl 2 (0.5 g/mL). Cell growth was observed after at least 48 hours to a maximum of 72 hours. The number of viable drops were counted from each replicate (n = 5; 50 cells in each replicate) and plotted. * * * indicates t-test P≤ 0.001 and n.s.: non-significant. (E) Analysis of expression levels of target genes in REBL1-RNAi cells treated with CdCl 2 (0.5 g/mL) or untreated. The qRT-PCR data were normalized to HTA3 expression. Statistical significance was assessed using Student's t-test ( *** P ≤ 0.001, ** P ≤ 0.01, * P ≤ 0.05, n.s.: non-significant). Error bars represent standard deviation. (F) Heat map showing the hazard ratios in logarithmic scale (log 10 ) for the indicated genes in multiple cancer types. The red blocks denote higher whereas blue blocks indicate lower clinical risk, with an increase in the gene expression. The blocks with darkened frames denote statistical significance in prognostic analyses (P ≤ 0.05; Mantel-Cox test). The TCGA cancer abbreviations are provided in Supplemental  Table S12.

DISCUSSION
Effective transcriptional regulation requires the ordered assembly of multiple protein complexes on chromatin in response to specific cellular cues. In this study, we report that Tetrahymena contains at least three WD40 repeat family proteins with putative scaffolding roles. Specifically, we determined TTHERM 00688660 (RebL1) to be the major structurally and functionally conserved orthologue of human RBBP4 and 7 in Tetrahymena. We demonstrated that RebL1 is a component of multiple chromatin-related complexes and is important for gene expression regulation and cell growth.
Previous studies have indicated that the structure/function of various MuvB-derived complexes can differ across different species (33). For example, structurally and functionally distinct MuvB complexes in Drosophila salivary glands and testis have been identified (104)(105)(106). It has remained an open question, however, whether tissue-specific or development-specific MuvB complexes also exist in other species. In Tetrahymena, the presence of multiple Lin54 paralogues and their distinct expression pattern suggest the existence of multiple MuvBlike complexes. Although further studies will explore this interesting possibility, a previous report has shown the interaction between E2FL1/Dpl2 and Anqa14 (86) supporting the existence of multiple structurally/functionally distinct MuvB-containing DREAM-like complexes in Tetrahymena. Several observations reported in our study indirectly support the formation of a DREAM-like complex in Tetrahymena: (i) the identification of a putative E2F/DP-CHR-like DNA motif; (ii) the enrichment of Anqa1 ChIP peaks in the promoter regions of putative E2F target genes in Tetrahymena and (iii) the targeting of dpl2Δ-dependent upregulated genes by Anqa1. We propose that Tetrahymena MuvB functions in collaboration with E2F/DPs to form a DREAM-like complex that negatively regulates the expression of target genes. Although E2F DNA-binding motifs are largely conserved during evolution (107), CHR elements have not yet been studied in Tetrahymena. Further studies are required to thoroughly examine the role of CHR and other cis elements in Tetrahymena cell cycle regulation.
The role of RBBP4 and RBBP7 in cellular proliferation has been widely reported, although the mechanistic details have remained unclear. Our results suggest that one of the mechanisms through which RBBP4 and RBBP7 might function in cellular growth is via regulating the cell-cycle regulatory complex MuvB. Consistent with this, we found that the expression profiles of MuvB subunits are interconnected in tumor cells. In previous reports, autoregulatory feedback loops have been reported involving B-Myb and the DREAM complex (103). It will be interesting to examine whether B-Myb also functions in the MuvB autoregulation pathway. Considering that MuvB subunits were identified as biomarkers of poor prognosis for multiple cancers, our mechanistic findings have direct implications for better understanding tumor cell proliferation.
RBBP4 and RBBP7 function in multiple epigenetic complexes which might have a role in cell proliferation independent of MuvB. For example, knockdown of RBBP4 in GBM tumor cells results in suppression of the DNA-repair protein Rad51 due to disruption of the RBBP4/p300/CBP chromatin-modifying complex (54). Loss of RBBP4 in GBMs was found to be associated with suppression of tumor growth (54). Consistently, we observed that RebL1 depletion resulted in suppression of the Rad51 Tt orthologue in Tetrahymena. Disruption of Tetrahymena RAD51 Tt has been reported to lead to severe abnormalities during vegetative growth, as well as in conjugation (108,109). It remains to be seen whether Rad51 Tt also has a role in the reduction of cellular viability observed in REBL1-RNAi treated Tetrahymena cells. In summary, our research has extended the current understanding of transcription regulation in ciliates and, more broadly, on the multifaceted role (s) of RBBP4 and RBBP7 proteins in eukaryotes.

DATA AVAILABILITY
The mass spectrometry raw data reported in this paper are deposited at MassIVE (https://massive.ucsd.edu/). Additional files include the complete SAINTexpress outputs for each dataset as well as a "README" file that describes the details of mass spectrometry files deposition to the MassIVE repository (Supplemental Table S14 for accession numbers). ChIP-seq data generated can be found online at Gene Expression Omnibus (GEO, https://www.ncbi. nlm.nih.gov/geo/). NGS and reads files produced in this study were deposited at GEO database with unique identifier GSE156091.
HEK293 cell lines. Z.Z. supervised H.L. J.F.G. edited the manuscript, participated in analyses, and provided reagents. R.P. edited the manuscript, supervised trainees, and provided reagents. J.-P.L. performed all Tetrahymena MS analyses, contributed to data analysis, partly supervised the project and edited the manuscript. J.F. was responsible for study design, manuscript editing, data analyses, project supervision and provided reagents. All authors have approved the final manuscript.