Comprehensive characterization of the embryonic factor LEUTX

Summary
The paired-like homeobox transcription factor LEUTX is expressed in human preimplantation embryos between the 4- and 8-cell stages, and then silenced in somatic tissues. To characterize the function of LEUTX, we performed a multiomic analysis using two proteomics methods and three genome-wide sequencing approaches. Our results show that LEUTX stably interacts with the EP300 and CBP histone acetyltransferases through its 9 amino acid transactivation domain (9aaTAD), as mutation of this domain abolishes the interactions. LEUTX targets genomic cis-regulatory sequences that overlap with repetitive elements, and our data suggest that it regulates the expression of its downstream genes through these elements. We find LEUTX to be a transcriptional activator, upregulating several genes linked to preimplantation development as well as 8-cell-like markers, such as DPPA3 and ZNF280A. Our results support a role for LEUTX in preimplantation development as an enhancer binding protein and as a potent transcriptional activator.


INTRODUCTION
Human embryonic genome activation (EGA) is characterized by upregulation of a set of specific transcription factors and genomic repeat elements. [1][2][3][4] One of the key EGA factors, DUX4, is expressed briefly in the zygote [5][6][7][8] and drives the expression of non-coding repeat elements and its downstream genes, including LEUTX. 9,10 LEUTX is expressed at the 4- and 8-cell stages, and it is downregulated by the morula stage. 1,3,4,8 However, there is mounting evidence that Dux or its regulators Dppa2 and Dppa4 are not essential for mouse preimplantation development, and thus some authors have questioned whether LEUTX, as a target of human DUX4, is essential for EGA. [11][12][13] On the other hand, we found that mutation frequencies in LEUTX are lower than the average of all human protein coding genes, suggesting that LEUTX is relatively constrained in human individuals. In 7 large human genotype resources, not a single individual with two deleterious variants of LEUTX was discovered. 14 Recently Zou et al. 15 found LEUTX knockdown to have a minor effect on EGA. These results motivate further study of the potential role of LEUTX in embryonic development.
LEUTX is a paired-like (PRDL) transcription factor (TF) 16 with a complete functional K50 homeodomain. 1,14,17 It is thought to have arisen by tandem gene duplication and subsequent asymmetric sequence evolution from the cone-rod homeobox gene CRX (OTX5) from the Otx gene family. 18,19 Additional genes in this family such as ARGFX, DPRX, TPRX1, and TPRX2 are all expressed during human preimplantation development. 1 In this study we present a comprehensive characterization of LEUTX using two different proteomics methods and three different genome-wide approaches. We performed affinity purification (AP-MS) and BioID-MS using stable Flip-In T-REx 293 cell lines expressing MAC-tagged LEUTX, 20 and RNA sequencing (STRT-Seq on bulk-RNA, modified from single-cell tagged reverse transcription sequencing protocol), 21,22 native elongating transcript-cap analysis of gene expression (NET-CAGE), 23 and LEUTX targeted chromatin immunoprecipitation sequencing (ChIP-Seq) 24 [...]
[Figure legend fragment] 178 SSLNQYLFP 186 9aaTAD deletion is marked as purple in the far C-terminal region. Another computationally predicted, but less conserved, 9aaTAD is highlighted in teal. Figure shows the amino acid sequences modified in this study. See also Figure S1 and Table S13.
[Figure legend fragment] [...] Table S1 for complete results and Table S2 for CORUM Complex enrichment analysis results. See Table S3 for expression of identified interactors in embryonic transcriptomics dataset.
[...] interaction with EP300 and CBP and that the K57A homeodomain mutant would lose binding affinity to DNA. To confirm, we performed a full interaction analysis on the two functional mutants: the K57A homeodomain mutant and the 9aaTAD deletion mutant. For the LEUTX 9aaTAD mutant the stable interaction with EP300, CBP and RB was lost (Figure 1C). RB does not contain a KIX-domain and in previous research, an interaction between EP300/CBP and RB has been shown. 34 Because the affinity purification cannot reveal if RB was bound to EP300, CBP or LEUTX we cannot confirm the direct interaction between LEUTX and RB. The interactions with EP300 and CBP are still detected through BioID-MS for the 9aaTAD deletion mutant, but significantly weakened compared to the wild type (Student's t-test, EP300 p-value = 3E-5, CBP p-value = 1E-6) (Figure S2D). The K57A mutant still maintained direct interaction with EP300, CBP and RB (Figure 1C). The 'Nuclear Receptor Binding' GO-term was lost for the K57A mutant but was maintained for the 9aaTAD deletion mutant (Figure 1D). The K57A mutant also lost interactions with other TFs and chromatin modifiers, and based on its interactome, it appeared displaced from the nuclear matrix (Figure S2B).
Altogether, we detected 149 unique high-confidence interactors for LEUTX and the domain mutants, of which the majority (98; 66%) are general factors detected on RNA level in all tissues in The Human Protein Atlas (Figure S2E). 35 Of the unique interactors, 129 (88%) were found expressed in the Yan et al. (2013) 4 embryonic sequencing dataset between the 2-cell and morula stages with RPKM > 1 in at least one timepoint (Table S3). Most of the interactions detected for LEUTX and the mutants are shared by all three (65; 44%) or detected in LEUTX and only one of the mutants (LEUTX and K57A: 16; 11%, LEUTX and 9aaTADdel: 23; 15%) (Figure S2F).

LEUTX binds close to the interactors' binding sites and regulates both enhancers and promoters
To study how LEUTX acts as a transcription factor, we performed three different types of complementary genome-wide analyses using the LEUTX-TetOn hPSCs (overviewed in Figures 2A and S3). Although hPSCs do not mimic the molecular context of the cleavage stage embryo, but rather that of the epiblast several days later, they represent a feasible model to study EGA-associated genes using methods that require millions of cells. Recently developed methods to detect and enrich human 8-cell-like cell populations among hESC or naive hPSC cultures provide a new tool for further characterization of human EGA-associated factors. [36][37][38] The 8-cell-like cells, however, represent a minor subpopulation among hPSC or naive hPSC cultures and are unstable, with a tendency to convert back to later developmental stages; as such, they are not feasible for the production of large numbers of cells expressing a transgene of interest.
First, we applied the NET-CAGE method to detect 5′ ends of newly synthesized promoter RNAs and bidirectionally expressed enhancer RNAs (eRNA) that indicate active enhancer positions. 23 Second, the modified STRT-Seq analysis, here performed on bulk-RNA, yielded Transcript Far 5′ Ends (TFEs) that were used to quantify gene expression. 21 Finally, we performed LEUTX-targeted (HA) ChIP-Seq, which produces genomic coordinate peaks reflecting genomic binding sites of LEUTX (Figure 2A). Together these methods provide global insight into where LEUTX binds in the genome and how it regulates the expression of not only protein coding genes but also the regulatory genome, such as enhancers and promoters.
NET-CAGE sequencing of HEL24.3 iPSCs expressing doxycycline inducible transgenic LEUTX identified 3282 differentially expressed (FDR < 0.05) enhancers and 4203 differentially expressed (FDR < 0.05) promoters, out of which 1990 and 2664 were upregulated, respectively (logFC > 0) (Figures S4A and S4B, Tables S4 and S5). Next, we annotated the upregulated enhancers and promoters against known GENCODE TSS regions and showed that the NET-CAGE promoters primarily annotated to proximal promoter regions, whereas the NET-CAGE enhancers mostly annotated to distal intergenic and intronic regions (Figure 2B).
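The thresholds stated above (FDR < 0.05 for differential expression, logFC > 0 for upregulation) can be illustrated with a minimal Python sketch. The record layout, the function name and the example values are hypothetical and are not taken from Tables S4 or S5; this is not the pipeline used in the study.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str      # hypothetical region identifier
    log_fc: float  # log fold change, induced vs. uninduced
    fdr: float     # false discovery rate (adjusted p-value)


def split_differential(regions, fdr_cutoff=0.05):
    """Return (upregulated, downregulated) regions with FDR below the cutoff."""
    significant = [r for r in regions if r.fdr < fdr_cutoff]
    up = [r for r in significant if r.log_fc > 0]
    down = [r for r in significant if r.log_fc < 0]
    return up, down


# Made-up values for illustration only.
toy_regions = [
    Region("region_1", 2.1, 0.001),
    Region("region_2", -1.4, 0.020),
    Region("region_3", 3.0, 0.200),  # fails the FDR cutoff
]
toy_up, toy_down = split_differential(toy_regions)
```

In this toy input only `region_1` counts as upregulated, because `region_3` has a large fold change but does not pass the FDR cutoff.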
The FANTOM5 consortium has identified ~65 000 human transcribed enhancers by sequencing 1829 human samples. 40,41 We compared our 1990 upregulated LEUTX-induced enhancers to previously published enhancers and found that only 657 were included in the FANTOM5 project, with 1333 thus being novel. 42 We also compared the LEUTX-induced enhancers to those upregulated by DUX4, 9 and found 269 overlapping upregulated enhancers. We further compared the identified genomic regions to publicly available regulatory region datasets to annotate their function. Of the upregulated enhancers, 160 overlapped with known super-enhancer locations in the dbSuper H1 dataset (23% of H1 super enhancers). 43 To study the effects of LEUTX expression at a physiological level, we generated an hESC line conditionally expressing the dCas9-VP192 activator together with guides targeting the LEUTX promoter and enhancers identified by Vuoristo et al. (2022) 9 (Figure S5). The activation of the enhancers that results in induction of LEUTX may, however, also affect the genomic region surrounding the LEUTX locus. We analyzed transcriptome effects by STRT at 24 h, 48 h, and 72 h after LEUTX induction, comparing them to no-induction controls. We found differential expression (FDR < 0.05) of 1050 genes in at least one time point. Similar to the NET-CAGE promoter dataset, STRT primarily detects the 5′ ends of the transcripts, characterizing promoter-level expression, and the GENCODE annotations were comparable to the NET-CAGE promoter annotations (Figure 2B).
Next, we identified 4861 differential (FDR < 0.05) ChIP-Seq peaks using HEL24.3 iPSCs expressing doxycycline inducible transgenic HA-tagged LEUTX. The ChIP-Seq peaks mapped mostly to distal intergenic and intronic regions ( Figure 2B, Table S6). We performed region overlap analysis to identify whether ChIP-Seq regions overlap with the NET-CAGE promoter and enhancer data.
The data showed that LEUTX binding sites overlap with upregulated promoters and enhancers rather than downregulated ones, and more often with upregulated enhancers than with upregulated promoters (82 ChIP-Seq peaks directly overlap upregulated NET-CAGE promoters, 308 ChIP-Seq peaks directly overlap upregulated NET-CAGE enhancers) (Figures 2C, S4C, and S4D). Genomic Regions Enrichment of Annotations Tool (GREAT) analysis of ChIP-Seq peaks showed enrichment of the terms apical junction assembly, regulation of stem cell population maintenance and nucleobase/RNA transport (Figure 2D).
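The notion of a peak "directly overlapping" a regulatory region can be sketched as follows. This passage does not name the overlap tool that was used, so the code is only an illustration under assumed conventions: intervals are `(chromosome, start, end)` tuples with half-open coordinates, and all names and coordinates are hypothetical.

```python
from collections import defaultdict


def count_overlapping_peaks(peaks, regions):
    """Count peaks that overlap at least one region on the same chromosome.

    Intervals are (chrom, start, end) with half-open coordinates, so two
    intervals that merely touch end-to-start do not overlap.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    n_overlapping = 0
    for chrom, p_start, p_end in peaks:
        candidates = by_chrom.get(chrom, [])
        if any(p_start < r_end and r_start < p_end for r_start, r_end in candidates):
            n_overlapping += 1
    return n_overlapping


# Made-up coordinates for illustration only.
toy_peaks = [("chr19", 100, 200), ("chr19", 500, 600), ("chr1", 100, 200)]
toy_enhancers = [("chr19", 150, 300), ("chr1", 200, 250)]
toy_count = count_overlapping_peaks(toy_peaks, toy_enhancers)
```

The quadratic scan is kept for readability; for genome-scale peak sets a sorted sweep or an interval tree would be the usual choice.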
To explore the function of the detected binding sites, we then compared differential LEUTX ChIP-Seq peaks with preimplantation embryo Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data, 44 and found that LEUTX preferentially binds accessible chromatin regions identified in the 8-cell stage, as compared to 2-cell, 4-cell, and ICM (Figure S6A). These comparisons suggest that LEUTX regulates a set of genomic regions that are accessible during embryonic development.
[Figure legend fragment] [...] and ChIP-Seq (pink) produce complementary, but functionally different genomic coordinates. STRT-Seq also leads to traditional gene lists to analyze up- and downregulated terms and enrichment of biological functions. Multiple analyses are listed in hexagonal boxes and the motif finding tools in yellow circles. See also Figure S3 and Table S4 for statistically significantly upregulated NET-CAGE enhancer locations, Table S5 for statistically significantly upregulated NET-CAGE promoter locations, Table S6 for differential ChIP-Seq peaks. (F) Spatial Motif finding results. MEMESuite SpaMo motif finding tool was applied to search for motifs enriched proximal to the EEA-motif in our datasets. In total, we found 145 motifs that were significantly enriched proximally to the EEA-motif in all datasets (differentially upregulated NET-CAGE enhancers, NET-CAGE promoters and STRT TFEs, and ChIP-Seq peaks), out of which 12 were also detected through BioID-MS proteomics. Highlighted here are the number of total binding sites detected proximally to the EEA-motif (SpaMo output total) in key factors also detected through BioID-MS and protein-protein interaction complex enrichment analysis, in ChIP-Seq (teal) and NETCAGE-Enhancer (purple) sequence data. See Table S7 for complete results.
Furthermore, we compared our data to publicly available ENCODE TF ChIP-Seq datasets (Figures S6B and S6C). We found that, even with differences in cell lines, batch effects, and other experimental differences, LEUTX ChIP-Seq peaks were often proximal to known EP300 binding sites, particularly in H1 cell line data as compared to cancer cell lines (Figure S6B). Of interest, binding sites for RAD21 and SMC, components of the cohesin complex identified through our BioID-MS, were also often proximal to LEUTX binding sites (Figure S6C).
In our previous study, LEUTX was found to bind a 36 bp motif enriched in promoters of genes involved in EGA (EEA-motif). 1,14,45 Motif analysis of all genomic datasets included in the current study showed strong enrichment of this motif, with the whole or partial EEA-motif found in every dataset and as one of the top three highest-confidence motifs ( Figure 2E). In ChIP-Seq (E-value = 2.1E-931), upregulated NET-CAGE enhancer (E-value = 8.9E-867), and STRT TFE (E-value = 6.6E-727) data it was the top hit, and in upregulated NET-CAGE promoters (E-value = 3.2E-258) it was the third motif hit sorted by E-value ( Figure 2E). Further, using the MEMESuite tool SpaMo we analyzed which motifs were enriched proximal to the EEA-motif in the genomic coordinates implicated by our data (NET-CAGE Enhancers, NET-CAGE Promoters, ChIP-Seq peaks, and STRT TFEs). We found 144 motifs that were significantly enriched in all datasets, out of which 12 were detected through proteomics (Table S7). Most notably, E2F6 (E2F6 Complex), TYY1 (Polycomb repressive complex 1), ZEB1 (CtBP complex), and SMARCA5 (BAF-complex) binding sites were enriched proximal to LEUTX binding sites and detected as protein-protein interactors of LEUTX ( Figure 2F).

LEUTX binds to repetitive elements and non-coding RNA transcription start sites
Because many of the identified regulatory regions (STRT-Seq TFEs, NET-CAGE identified promoters and enhancers) or those bound by LEUTX (ChIP-Seq peaks) were far away from annotated promoter or TSS regions and as the EEA-motif was enriched among all datasets, we investigated whether the genomic coordinates from our different datasets overlapped with repetitive elements. In the STRT-Seq data 614 unique TFEs (45% of all TFEs) overlapped with repetitive elements. 1334 upregulated NET-CAGE promoters overlapped 1733 repetitive elements (50% of all promoters), and 1299 uniquely upregulated NET-CAGE enhancers overlapped with 1732 repetitive elements (65% of all enhancers). In the ChIP-Seq data, 3160 differentially expressed unique peaks directly overlapped with 4359 known repetitive elements (65% of peaks). Next, we compared the repetitive element overlap frequencies in STRT-Seq TFEs and LEUTX-driven NET-CAGE promoters to that of FANTOM5 promoters (Figure 3A). The results show that there was more repetitive element overlap in LEUTX-driven NET-CAGE promoters than in FANTOM5 promoters (Chi-squared test p < 2.2E-16) and microsatellites and simple repeats were overrepresented particularly in upregulated STRT TFEs and NET-CAGE promoters (Figure S7A) while common LINE-L1 elements were underrepresented.
[Figure legend fragment] [...] Table S8. (B) Most common single repetitive elements identified overlapping LEUTX ChIP-Seq binding sites. HOMER repetitive element enrichment analysis for the ChIP-Seq peaks is compared to genomic frequency to produce estimates of under- or over-enrichment. Overrepresentation is shown as red bars growing in the negative direction (Log PValue Underrepresented). Also shown is a multiple analysis corrected p-value under the FDR column. See also Table S9.
We also compared the observed overlap frequencies of upregulated LEUTX-driven NET-CAGE enhancers to FANTOM5 enhancers and found that LEUTX-driven enhancers had relatively more overlap with repetitive elements (Chi-squared test p < 2.2E-16) and particularly more ERV1 and MaLR elements (Chi-squared test, ERV1 p = 2.29E-206, MaLR p = 7.18E-39, Figure 3A, Table S8). HERVH (ERV1) elements were particularly overrepresented in upregulated NET-CAGE enhancers (Figure S7A). In all cases, the most common LINE-L1 elements were underrepresented (Figure S7A, Table S8).
LEUTX binding sites revealed through ChIP-Seq showed notable binding to Alu elements (24% of all identified binding sites and 36.9% of all overlapping repetitive elements); the most enriched overlapping Alu element compared to genomic frequency was AluJb (Figures S7B and S7C and Table S9). This enrichment is in agreement with the earlier finding of the 36 bp EEA motif in Alu elements. 1 However, we do not detect significant LEUTX binding to MLT2A1 or LTR12C, which have recently been described as highly accessible in both human 8-cell embryos and the 8CLC model 36 (Table S9). To conclude, our data suggest that many LEUTX-associated regulatory regions overlap with repetitive elements. However, the data provide only indirect evidence that LEUTX itself regulates transcription through binding to the repetitive elements.
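The Chi-squared comparisons of repeat overlap reported in this section can be illustrated with a 2×2 contingency table (regions overlapping a repeat vs. not, in a LEUTX-driven set vs. a reference set). The sketch below is a plain Pearson chi-squared test without continuity correction, written with the Python standard library only; the exact test settings and software used in the study are not stated here, and the counts in the example are invented.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table

        [[a, b],    e.g. set 1: overlapping repeats, not overlapping
         [c, d]]         set 2: overlapping repeats, not overlapping

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented counts for illustration only.
toy_stat, toy_p = chi_squared_2x2(10, 20, 20, 10)
```

With identical proportions in both rows the statistic is 0 and the p-value is 1, which is a quick sanity check for the formula.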

LEUTX expression leads to a cascade of transcriptional activation
Next, we performed deeper analysis to understand the transcriptional effects of endogenous LEUTX activation in an hPSC model. We analyzed transcriptome effects by STRT at 24 h, 48 h, and 72 h after LEUTX induction, comparing them to non-induction controls. To address the validity of our hPSC model, we compared the genes differentially expressed upon LEUTX activation to those expressed by human cleavage stage embryos. We found that out of 1048 genes regulated by LEUTX in at least one time point [...] The number of differentially expressed genes increased notably from 24 h to 48 and 72 h (Figure 3D, Table S10). The 48- and 72-h timepoints are expected to include both primary and secondary targets of LEUTX. In total, 83 genes were differentially expressed at all time points, of which the most upregulated genes were CA4, VMO1, NCR3, CST4, TLE2, AIPL1, and CST1 (all with average log2FC > 10) whereas the most downregulated genes were C9orf135 and SIX6 (average log2FC < -2) (Table S10). Overall, LEUTX induction caused notable upregulation (average log2FC > 1) of 342 genes and downregulation (average log2FC < -1) of 162 genes, emphasizing its role as a transcriptional activator (Figure S8B, Table S10). This is in line with previous findings characterizing LEUTX as a transcriptional activator. 17 [...] (Table S10). Of interest, LEUTX induction led to differential regulation of the expression of at least 29 known pluripotency factors. 50,51 Out of these, we found upregulation of FGF13, the naive pluripotency markers DPPA3 and NODAL, and the NODAL antagonists LEFTY1 and LEFTY2, and downregulation of OTX2, CRABP1, PRDM14, C9orf135, NTS, and TDGF1 (Table S10). 52 LEUTX induction led to differential expression of 33 different cell signaling and receptor genes (Table S10).
Further, we compared the list of upregulated genes in this study to the upregulated LEUTX targets in our previous study, 17 and found 205 LEUTX targets upregulated in both datasets (Table S11). The common targets included CA4, VMO1, CST1, DPPA3, SGK1 and NODAL which were among the most upregulated in this study.
Finally, our STRT-Seq data revealed strong upregulation of CRX (log2FC 48h: 12.7, 72h: 10.9) and downregulation of CRX's ancestral family member OTX2 (log2FC 48h: -2.3, 72h: -1.6) (Figures 3E, S8C, and S8D). We found several LEUTX-driven genomic locations in the CRX genomic locus: one TFE, one NET-CAGE promoter, one putative enhancer, and two ChIP-Seq peaks directly on the CRX promoter, and we found one putative intergenic enhancer (Figure S9A). Further, at the adjacent genomic locations, we detected two distal upstream enhancers that were upstream of TPRX1, two downstream enhancers upstream of TPRX2, and three ChIP-Seq peaks downstream of CRX. CRX has been shown to be upregulated at the 8-cell stage of human development. 8 We validated the CRX upregulation upon LEUTX expression by qRT-PCR in an independent transgenic doxycycline inducible cell line (Figure 4B). To test the functionality of the putative CRX enhancer-like region, we used CRISPR activation by dCas9-VP192 45 in combination with guide RNA (gRNA) pools to target the promoter and putative enhancer-like regions in HEK293 cells (Figures S9B and S9C). Activation of the CRX enhancer region upstream of the promoter, but not the intergenic one, led to upregulation of CRX expression level compared to the non-transfected control (Figure S9B). Furthermore, co-transfection of the pool of CRX enhancer targeting guides together with the CRX promoter targeting guides led to increased expression level compared to promoter activation only. This finding supports the functionality of the LEUTX-activated putative CRX enhancer.

LEUTX contributes to the expression of 8-cell like expression markers
Recently developed 8-cell-like cell (8CLC) models represent hESCs or human naive PSCs guided to transcriptionally resemble the human 8-cell embryo. Three recent studies identify a number of 8CLC signature and marker genes. [36][37][38] We compared the differentially expressed genes from our STRT-Seq to the identified 8CLC signature genes from these papers. Combined, all 1048 differentially expressed genes from LEUTX STRT-Seq match 377 genes identified in at least one of these papers (Table S10). Altogether, 34 genes were identified upregulated (logFC > 1) in our STRT-Seq and in at least two of the studied 8CLC datasets ( Figure 4A), and 8 genes, DPPA3, CA2, CLK1, ARL4D, HK2, HSPA1B, SERTAD1, and PDCL3, are upregulated after LEUTX expression in our data and are listed in all four datasets ( Figure S9D, Table S10).
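The comparison described above (genes upregulated in one dataset that also appear in at least two reference marker lists) amounts to a counting step followed by a set intersection. A minimal Python sketch follows; the function names are hypothetical and the placeholder gene identifiers are not real data from Table S10 or from the cited 8CLC studies.

```python
from collections import Counter


def genes_in_at_least(gene_sets, k):
    """Return genes that occur in at least k of the given gene sets."""
    counts = Counter(gene for genes in gene_sets for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= k}


def shared_markers(upregulated, reference_sets, k=2):
    """Upregulated genes that are also listed in at least k reference sets."""
    return set(upregulated) & genes_in_at_least(reference_sets, k)


# Placeholder identifiers for illustration only.
toy_upregulated = {"GENE_A", "GENE_B", "GENE_C"}
toy_references = [
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_D"},
    {"GENE_B", "GENE_A"},
]
toy_in_two = shared_markers(toy_upregulated, toy_references, k=2)
toy_in_all = shared_markers(toy_upregulated, toy_references, k=3)
```

Raising `k` to the number of reference sets gives the stricter "listed in all datasets" criterion.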
In recent 8CLC research, DPPA3, TPRX1, and ZNF280A have been linked to key regulatory roles relevant to generating 8CLCs. 36,37 We find that LEUTX induction leads to the upregulation of DPPA3 and ZNF280A in more than one experiment produced for this paper (Figures 4B and 4C). To address the validity of the STRT-Seq data, we confirmed the upregulation of DPPA3, DNMT3L, ZNF280A, DUXB, SGK1, CRX, NODAL, GNB3 and TPRX2, which shares high sequence similarity with TPRX1 from the same gene family, 19 by RT-qPCR in an independent inducible cell line with transgenic LEUTX (Figures 4B and S9E).
Furthermore, comparison of the recent 8CLC datasets with our LEUTX cell models highlights several potentially relevant genes. For example, CA2, CLK1, and SGK1 are listed as 8CLC markers. 37 VMO1 is undetectable in primed PSCs and 4CL naive PSCs and upregulated to moderate expression in 8CLCs in the dataset by Mazid et al. (2022) 36 (Figure 4C). The function of these genes in human preimplantation development is unknown. Analysis of the embryonic ATAC-Seq data 43 supports that their expression peaks at the 8-cell stage, similarly to the proposed markers DPPA3 and ZNF280A (Figure 4D).
Since we detected three key components of the cohesin complex interacting with LEUTX, and cohesin is bound at topologically associating domain (TAD) boundaries, we cross-examined our data with CCCTC-binding factor (CTCF) binding site data and embryonic ATAC-Seq data from Wu et al. (2018). 44 We found that LEUTX binds two sites proximal to CRX that coincide with CTCF binding sites. A few of the CTCF binding sites overlap LEUTX NET-CAGE enhancer peaks, indicating these binding sites were also found active in the LEUTX NET-CAGE dataset (Figure S9A). TPRX2 is found downstream on the same strand as CRX, while TPRX1 is upstream of CRX on the opposite strand (Figure S9A). LEUTX is bound in regions that peak in activity in the 8-cell stage, for example proximal to TPRX2, annotated as the TPRX2P pseudogene. We confirmed by RT-qPCR that LEUTX induction leads to significant TPRX2 expression (Figure 4B). [...] 37 as a key marker of 8CLC expression, neither paper discussed TPRX2, which we have found to be an upregulated target of LEUTX (Figure 4B). TPRX2 is commonly thought to be a pseudogene, but has been shown to produce an mRNA product during preimplantation development. 1 Recently, Zou et al. (2022) 15 found that combined knockdown of the TPRX genes TPRXL, TPRX1, and TPRX2 leads to a delay in development and defects in EGA.

DISCUSSION
LEUTX is a primate specific gene, and one of the first genes expressed in human preimplantation embryos, its expression being restricted to the 4-cell to 8-cell stage of the preimplantation embryo. 1,17 Of interest, in our previous studies, LEUTX appeared to be the strongest transcriptional activator among the transcription factors belonging to the same PRDL family. 17,53 In this study, we set out to thoroughly characterize the functions of LEUTX using proteomics, transcriptomics and genomics approaches.
Unstable protein-protein interactions are difficult to capture, either because of being rare or transient in nature, or not strong enough to withstand cell lysis and affinity purification. 54 However, through proximity labeling we could detect multiple possible chromatin-modifying complexes that are in very close contact with LEUTX. The identification of stable interactions with EP300 and CBP, together with a notable number of dynamic chromatin modifying complex interactions, provided strong evidence that LEUTX is involved in transcriptional regulation through chromatin modification, in particular histone acetylation. ChIP-Seq further confirmed that LEUTX binds close to known EP300 binding sites.
We hypothesized that the LEUTX interaction with the histone acetyltransferases EP300 and CBP is mediated by the C-terminal 9aaTAD of LEUTX, which directly interacts with KIX-domains. EP300 and CBP together with MED15 are the most well-known coactivators having KIX-domains, highly conserved globular domains with three α-helices. 26,27 KIX-domains have been found in various proteins involved in transcriptional assembly, regulation and coactivation. Currently, in the UniProtKB protein database, 41 human proteins are listed as having a 9aaTAD, including embryonic transcription factors SOX9, KLF3 and ELF3 as well as all Yamanaka factors and tumor protein p53. 55 Furthermore, p53 has previously been shown to stably interact with CBP and EP300, which is critical for its transcriptional activation potential. 56,57 Other transcription factors with 9aaTADs and established interaction with EP300 include STAT1, STAT2 58 and FOXO3a. 59 In this study, the removal of the 9aaTAD of LEUTX eliminated the stable interactions with EP300 and CBP, thus confirming our hypothesis that the 9aaTAD is responsible for the direct interaction with these kinase-inducible [...]
Using extensive genome-wide sequencing approaches, we found that LEUTX binding sites and differentially expressed regulatory regions overlapped with a large number of repetitive elements. We found that a large number of Alu, MaLR (ERV3), and MIR (L2-end) elements overlapped LEUTX binding sites. Alu elements have previously been shown to be enriched upstream of developmental factors. 1,60 Further, recent research shows that Alu elements are often enriched in topologically associating domain (TAD) boundaries. 61 We detected three key members of the cohesin complex through BioID-MS and found proximity of binding sites of cohesin complex members SMC and RAD21 (ENCODE TF ChIP-Seq datasets) to LEUTX binding sites. The cohesin complex is bound at TAD boundaries, maintaining boundary formation. 62 LEUTX was detected to interact with the PRC1 complex, which together with the cohesin complex has been suggested to form TAD-like chromatin conformations at a smaller scale, called Polycomb-repressed domains (PRD). 63,64 These PRDs form between Polycomb binding regions to repress transcription. 63,64 We examined the CRX genomic locus that contains TPRX1 and TPRX2 and as such is linked to 8-cell-like expression. Cross-examination of CTCF binding sites and LEUTX binding sites in this locus shows that LEUTX is bound in two CTCF binding site regions. LEUTX-induced NET-CAGE enhancers also overlap with these CTCF binding sites. Many of these binding sites or enhancer regions are active in the 8-cell stage in the embryonic ATAC-Seq dataset. 44 These findings suggest that LEUTX possibly binds at chromatin loop boundaries, which warrants further study.
LEUTX and many other members of the PRD-LIKE homeobox gene family, including ARGFX, DPRX, TPRX1 and TPRX2, are all evolutionarily descended from the CRX gene. 65 The CRX gene is flanked by TPRX1 and TPRX2 on chromosome 19, while LEUTX and DPRX have been transposed to a different location on the same chromosome, and ARGFX has been transposed to a different chromosome. 19 Previous research has suggested close co-regulation or counter-regulation within the PRD-LIKE family. 17,19,53 Maeso et al. 19 found that human LEUTX, TPRX1 and ARGFX coregulated a largely overlapping set of genes, and Royall et al. 65 found that mCrx and mObox genes similarly coregulated an overlapping set of genes, suggesting an evolved system controlling preimplantation development through the same binding site with high redundancy in at least placental mammals. In the analyses of human cells, overlapping expression and regulation profiles have been found between ARGFX, LEUTX, TPRX1 and DPRX, suggesting a role for LEUTX as a pulse-control activator, later repressed by DPRX. 17,19,53 We also found that LEUTX upregulated its ancestral parent CRX and downregulated its ancestral family member OTX2. All three share the same canonical DNA binding site, together with SIX6, another LEUTX downregulation target. We found that the CRX genomic locus, also containing TPRX1 and TPRX2, was under close regulation of LEUTX. Of interest, GSC, CRX, and PITX1 become upregulated at the 8-cell stage of human development. 8 All three share the same canonical binding site with LEUTX and follow it in temporal progression during preimplantation development. This binding site and the multitude of factors that bind it might be of key interest for preimplantation development.
We further focused our analyses on all known conserved consensus sequences for repetitive elements. [...] MaLRs). We further show that LEUTX preferentially binds enhancer sequences, and based on protein-protein interactions, LEUTX together with CBP and EP300 likely facilitates histone acetylation. LEUTX induction leads to differential expression of several developmental transcription factors, 8-cell-like markers and epigenetic modifiers that together take part in downstream embryonic development events. Our data provide a resource on LEUTX functions in human cells, as well as for researchers working with genes belonging to the same family or on preimplantation development.

Limitations of the study
We note that there are a few limitations to our study. It is not possible to do functional studies that require a high number of cells in human embryos; therefore, we used several different cell lines during data collection for this study. We acknowledge that none of the cell lines exactly capture the state of the cleavage stage embryo. The questions whether LEUTX is an essential transcription factor in early human development, whether LEUTX is necessary for the pluripotent-to-totipotent transition or whether it induces a distinct early-embryonic-like state in hPSC remain to be resolved. In addition, our study had technological limitations. It is currently not feasible to perform NET-CAGE or mass spectrometry-based interactome analyses in 8CLC cell models in which only a small number of cells are converted to 8-cell-like cells. The methods require a large number of cells for the library preparation or data collection. 23 Therefore, further studies are needed to model the function of LEUTX in human preimplantation development.
The experiments detailed in this paper cannot address the exact molecular function of LEUTX during the 4- and 8-cell stages, nor can they address how LEUTX affects its transcriptional regulation. How LEUTX regulates transcription on a biochemical level, the in vivo function of LEUTX, and LEUTX function in 8CLCs merit further study.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

with 0.5 mM EDTA (all from Thermo Fisher Scientific). Cultured at 37 °C, 5% CO2 in a humidified atmosphere.

Human samples
No human samples were used in this study.

Animal models
No animal experiments were performed in this study.

Cloning of vectors for LEUTX overexpression
To overexpress LEUTX in human pluripotent cells, the ORF was cloned into a modified piggyBac vector. The LEUTX ORF was amplified from a TOPO vector containing the full-length clone (European Nucleotide Archive accession number LN651090). The PCR product was ligated into the piggyBac vector. The final vector was called pB-tetON-bgi-LEUTX-ires-GFP-PGK-Puro. The LEUTX ORF was further modified by removing the C-terminal 9 amino acid TAD. The ORF was amplified from a TOPO vector containing the full-length clone LN651090. The PCR product was digested using AgeI and NotI and ligated into the piggyBac vector. The final vector was called pB-tetON-bgi-LEUTXw/o9aaTAD-ires-GFP-PGK-Puro. For ChIP-seq, C-terminal V5 and HA tags were added to wild type LEUTX. The ORF was amplified in a two-step PCR using pB-tetON-bgi-LEUTX-ires-GFP-PGK-Puro as a template. The PCR product was digested using AgeI and NotI and ligated into the piggyBac vector. The final vector was called pB-tetON-bgi-LEUTX-V5-HA-IRES-GFP-PGK-Puro. Primers are reported in Table S12.

Cloning of LEUTX to MAC-tag Gateway destination vector for mass spectrometry
The wild type LEUTX and mutants were first amplified in a two-step PCR reaction from the vectors above and cloned into a Gateway-compatible entry clone using Gateway BP Clonase II (Invitrogen) according to the manufacturer's instructions (primers in Table S12). The entry clone was further cloned into a Gateway-compatible destination vector containing the C-terminal MAC-tag (Addgene #108077). 20,29

Cell culture for mass spectrometry
To produce cell lines stably expressing MAC-tagged LEUTX, Flp-In T-REx 293 cell lines (Invitrogen, Life Technologies, R78007, cultured in the manufacturer's recommended conditions) were co-transfected with the expression vector and the pOG44 vector (Invitrogen) using Fugene6 transfection reagent (Roche Applied Science). One day after transfection, cells were selected in 1% Streptomycin and 100 μg/ml Hygromycin for two weeks, after which positive clones were pooled and amplified. Green fluorescent protein (GFP) tagged with MAC-tag was used as a negative control and processed in parallel with the bait proteins. The stable cell line was expanded to 80% confluence in 20 × 150 mm cell culture plates. Ten plates were used for AP-MS, in which 2 μg/ml tetracycline was added for 24 h induction, and ten plates for BioID, in which 50 μM biotin, in addition to tetracycline, was added for 24 h before harvesting. Cells from five fully confluent dishes were pelleted as one biological sample. In total, two biological replicates were produced for each of the two approaches. Samples were snap frozen and stored at −80 °C.

Affinity purification mass spectrometry
For AP-MS sample purification, the sample was lysed in 3 ml ice-cold Lysis Buffer I (1% n-dodecyl β-D-maltoside, 50 mM HEPES, pH 8.0, 150 mM NaCl, 50 mM NaF, 1.5 mM NaVO3, 5 mM EDTA, 0.5 mM PMSF and Sigma Proteinase Inhibitor). For BioID-MS, the cell sample was lysed in 3 ml ice-cold Lysis Buffer I supplemented with 1 μl Benzonase per sample and sonicated in a water bath in cycles with 3× continuous sonication and a 5 min break. Lysed samples were centrifuged at 16,000 × g for 15 min, and again for 10 min, to produce cleared lysate, which was loaded on Bio-Rad spin columns that had 400 μl Strep-Tactin beads

Liquid chromatography-mass spectrometry (LC-MS)
Analysis was performed on a Q Exactive Hybrid Quadrupole-Orbitrap mass spectrometer coupled to an EASY-nLC 1000 liquid chromatography system via an electrospray ionization sprayer (Thermo Fisher Scientific), using Xcalibur version 3.0.63 as described in Liu et al. (2018). 20 Database search was performed with Proteome Discoverer 1.4 (Thermo Scientific) using the SEQUEST search engine on the reviewed human proteome in the UniProtKB/SwissProt database (http://www.uniprot.org, downloaded Nov. 2020). Trypsin was selected as the cleavage enzyme and a maximum of 2 missed cleavages was permitted, with precursor mass tolerance at ±15 ppm and fragment mass tolerance at 0.05 Da. Carbamidomethylation of cysteine was defined as a static modification. Oxidation of methionine and, for BioID samples, biotinylation of lysine and N-termini were set as variable modifications. All reported data were based on high-confidence peptides assigned in Proteome Discoverer (FDR < 0.01).

Validation of promoters and enhancers using CRISPRa
Putative LEUTX enhancer regions 1 and 2 were predicted from the Tet-On DUX4 hESC NET-CAGE dataset. 9 Putative CRX enhancer and promoter regions were predicted from NET-CAGE data introduced in this study. The guide RNAs targeting each of the putative enhancers or promoters were designed using the Benchling CRISPR tool (https://benchling.com), targeting them to the proximal promoters (−400 to −50 base pairs from the transcription start site) or to ±200 base pairs of the putative enhancer midpoint. Guide sequences were selected according to their on- and off-target scores and position. Guide RNA transcriptional units (gRNA-PCR) were prepared by PCR amplification with Phusion polymerase (Thermo Fisher), using as template U6 promoter and terminator PCR products amplified from pX335 together with a guide RNA sequence-containing oligo to bridge the gap. The oligos for guide RNA transcriptional units are as in (Balboa et al., 2015). 67 The PCR reaction contained 50 pmol forward and reverse primers, 2 pmol guide oligo, 5 ng U6 promoter and 5 ng terminator PCR products in a total reaction volume of 100 μl. The PCR reaction program was 98 °C/10 s, 56 °C/30 s, 72 °C/12 s for 35 cycles. Amplified gRNA-PCRs were purified and transfected into HEK293 cells.
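The two guide design windows described above can be written out as a short coordinate calculation. The following is a minimal sketch, not the procedure used in Benchling: the function names, the 1-based inclusive coordinate convention, and the strand handling (upstream meaning lower coordinates on the plus strand and higher coordinates on the minus strand) are assumptions made for illustration, and the coordinates in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Genomic interval, 1-based and inclusive (an assumed convention)."""
    chrom: str
    start: int
    end: int


def promoter_window(chrom, tss, strand, far=400, near=50):
    """Proximal promoter window, -400 to -50 bp relative to the TSS.

    "Upstream" depends on the strand: lower coordinates for "+",
    higher coordinates for "-".
    """
    if strand == "+":
        return Window(chrom, tss - far, tss - near)
    if strand == "-":
        return Window(chrom, tss + near, tss + far)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def enhancer_window(chrom, midpoint, flank=200):
    """Window of +/-200 bp around a putative enhancer midpoint."""
    return Window(chrom, max(1, midpoint - flank), midpoint + flank)
```

For a hypothetical plus-strand TSS at position 1000, `promoter_window("chr19", 1000, "+")` spans positions 600 to 950, and `enhancer_window("chr19", 5000)` spans 4800 to 5200.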
HEK 293 cells were seeded on tissue culture treated 24-well plates one day prior to transfection (5 × 10^4 cells/well). Cells were transfected using FuGENE HD transfection reagent (Promega) in fibroblast culture medium with 500 ng of dCas9VP192 transactivator encoding plasmid and 200 ng of guide RNA-PCR product or TdTomato guide RNA plasmid. Cells were cultured for 72 h post-transfection, after which samples were collected for qRT-PCR. Successful activation of LEUTX and CRX was confirmed by qPCR.
To introduce LEUTX guides into the DD-dCas9 activator cell line, guide cassettes containing either the four guide oligos targeting the LEUTX promoter or the five guide oligos targeting enhancers 1 or 2 were assembled in a GoldenGate reaction as described in (Balboa et al., 2015). 67 Guide cassettes containing both promoter and enhancer guides were further cloned together. Finally, the guide cassettes were cloned into a piggyBac vector. Primer sequences for promoter and enhancer guide oligos are provided in Table S12. See Figure S5 for LEUTX enhancer validation.

(https://github.com/hkawaji/dpi1/blob/master/identify_tss_peaks.sh) was used to identify tag clusters with default parameters but without decomposition. Peaks with at least three supporting CAGE tags were retained and used as input to identify bidirectional enhancers (https://github.com/anderssonrobin/enhancers/blob/master/scripts/bidir_enhancers).

NET-CAGE statistical analysis
To find differentially expressed promoters and enhancers, we normalized to library size and kept peaks that were detected in at least two samples and had log2CPM > -2.5 (enhancers) or log2CPM > -2 (promoters). Differentially expressed peaks are those with FDR < 0.05 in the edgeR generalized linear model likelihood ratio test. 74 Upregulated and downregulated differentially expressed promoters and enhancers were defined as logFC > 0 and logFC < 0, respectively.
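The filtering and classification rules above can be illustrated with a small calculation. This is a minimal sketch under stated assumptions, not the edgeR code used in the study: it reads "detected in at least two samples" as exceeding the log2CPM threshold in at least two samples, it uses plain log2 counts per million without the prior count that edgeR can add, and all function names are hypothetical.

```python
import math


def log2_cpm(count, library_size):
    """Plain log2 counts per million; zero counts map to -inf."""
    if count == 0:
        return float("-inf")
    return math.log2(count / library_size * 1_000_000)


def keep_peak(counts, library_sizes, min_log2_cpm, min_samples=2):
    """Keep a peak if enough samples exceed the log2CPM threshold.

    Use min_log2_cpm=-2.5 for enhancers and -2.0 for promoters.
    """
    passing = sum(
        log2_cpm(c, n) > min_log2_cpm for c, n in zip(counts, library_sizes)
    )
    return passing >= min_samples


def classify(log_fc, fdr, alpha=0.05):
    """Label a tested peak by FDR and the sign of its log fold change."""
    if fdr >= alpha or log_fc == 0:
        return "not significant"
    return "up" if log_fc > 0 else "down"
```

With hypothetical libraries of 10 million tags each, a peak with 2 tags in two samples has a log2CPM of about -2.32, so it passes the enhancer threshold (-2.5) but not the promoter threshold (-2).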

STRT differential expression analysis
Data were normalized to RNA spike-ins with the R package RUVSeq. 77 During initial analysis and normalization, we found that the first row of the PCR plate (first 8 samples) was notably different from the rest of the samples.
To keep the number of samples per sample type equal (promoter, promoter + enhancer 1, promoter + enhancer 2), we excluded the first 12 samples from the analysis, and for the TFE tables the samples were realigned with the first 12 samples removed (Figure S10). Very lowly expressed genes were filtered out by requiring more than 5 reads in at least two samples. We used a model accounting for the RNA spike-ins, the pipetting set (set/time of pipetting), and the sample type (promoter only, promoter + enhancer 1, promoter + enhancer 2). Differential expression was tested with edgeR genewise negative binomial generalized linear models and quasi-likelihood tests. Differentially expressed genes and TFEs are defined as those with FDR < 0.05.
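The low-expression filter stated above (more than 5 reads in at least two samples) is simple enough to express directly. The sketch below is an illustration only: the function names and the dictionary-based count table are assumptions, and the gene labels in the usage note are placeholders rather than data from the study.

```python
def passes_expression_filter(counts, min_reads=5, min_samples=2):
    """True if strictly more than min_reads reads occur in >= min_samples samples."""
    return sum(c > min_reads for c in counts) >= min_samples


def filter_genes(count_table, min_reads=5, min_samples=2):
    """Drop very lowly expressed genes from a {gene: [counts per sample]} table."""
    return {
        gene: counts
        for gene, counts in count_table.items()
        if passes_expression_filter(counts, min_reads, min_samples)
    }
```

For a placeholder table `{"GENE_A": [6, 7, 0], "GENE_B": [5, 5, 5], "GENE_C": [10, 0, 0]}`, only `GENE_A` is retained: `GENE_B` never exceeds 5 reads, and `GENE_C` exceeds it in a single sample.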

ChIP-seq alignment and statistical analysis
Sequence alignment was done with Bowtie 2 77 using GRCh38 as the human reference genome, and ChIP-seq peak calling was carried out using MACS2 86 (Figure S11). MACS2 peaks with FDR < 0.05 were considered significant. MACS2 peaks were transferred to hg19 using LiftOver for comparison with the other genomic data sets.
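As an illustration of the FDR cutoff, the sketch below filters peaks from MACS2 narrowPeak output, in which the ninth tab-separated column holds -log10(q-value). It is a hedged example rather than the authors' pipeline: the function name is hypothetical, and the assumption is that the q-value column is what "FDR" refers to here.

```python
import math


def significant_peaks(lines, fdr=0.05):
    """Yield (chrom, start, end, neg_log10_q) for narrowPeak rows with q < fdr.

    Assumes the standard narrowPeak layout, where column 9 is -log10(q-value).
    """
    min_neg_log10_q = -math.log10(fdr)  # about 1.301 for FDR 0.05
    for line in lines:
        # Skip blank lines and optional header or track lines.
        if not line.strip() or line.startswith(("#", "track")):
            continue
        fields = line.rstrip("\n").split("\t")
        neg_log10_q = float(fields[8])
        if neg_log10_q > min_neg_log10_q:
            yield fields[0], int(fields[1]), int(fields[2]), neg_log10_q
```

The function accepts any iterable of lines, so an open file handle for a narrowPeak file can be passed in directly.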

Annotation on genomic regions
Annotation plots for genomic regions were made with the ChIPSeeker R package, 39 with promoter regions defined as 3000 bp up- or downstream from known GENCODE TSSs. Plotting of genomic regions was done using the Gviz R package 79 and the Integrative Genomics Viewer. 66

Motif finding: MEME suite
To identify motifs in the genomic regions of interest, we used the MEME Suite. 93 TFEs and promoters were extended by 2500 bp upstream and 500 bp downstream of peak coordinates, enhancer peaks were extended by 500 bp up- and downstream, whereas ChIP-seq peaks were not extended. MEME 94