A cis-acting structural variation at the ZNF558 locus controls a gene regulatory network in human brain development

The human forebrain has expanded in size and complexity compared to chimpanzees despite limited changes in protein-coding genes, suggesting that gene expression regulation is an important driver of brain evolution. Here, we identify a KRAB-ZFP transcription factor, ZNF558, that is expressed in human but not chimpanzee forebrain neural progenitor cells. ZNF558 evolved as a suppressor of LINE-1 transposons but has been co-opted to regulate a single target, the mitophagy gene SPATA18. ZNF558 plays a role in mitochondrial homeostasis, and loss-of-function experiments in cerebral organoids suggests that ZNF558 influences developmental timing during early human brain development. Expression of ZNF558 is controlled by the size of a variable number tandem repeat that is longer in chimpanzees compared to humans, and variable in the human population. Thus, this work provides mechanistic insight into how a cis-acting structural variation establishes a regulatory network that affects human brain evolution.


In brief
Johansson et al. identify ZNF558, a KRAB-ZFP expressed in human but not chimpanzee forebrain progenitors, where it regulates its target SPATA18. The expression of ZNF558 is controlled by the size of a VNTR that is longer in chimpanzees compared to humans, demonstrating a role for structural variations in human brain evolution.

INTRODUCTION
The human forebrain has increased in size and complexity after the split between the human and chimpanzee lineages, giving rise to a new level of cognitive functions during hominid evolution (Hill and Walsh, 2005;Lui et al., 2011;Rakic, 2009;Sousa et al., 2017). Cellular and anatomical adaptations have been driven by genetic changes in the human lineage (Enard, 2016), but the actual genetic modifications responsible for this evolutionary process are mostly not understood. Protein-coding genes are highly conserved between human and chimpanzees (Kronenberg et al., 2018) and aside from the well-studied transcription factor FOXP2 (Lai et al., 2001), there remains limited evidence for a wider impact of amino acid substitutions on human brain evolution. Recently, larger structural variations resulting in gene duplication were implicated in human forebrain function and evolution. NOTCH2NL, a human-specific paralog of NOTCH2, contributes to cortical development (Fiddes et al., 2018;Suzuki et al., 2018), and duplication of TBC1D3 and the mitochondrial protein ARHGAP11B affected cortical expansion via the basal progenitor populations (Dennis and Eichler, 2016;Florio et al., 2015;Ju et al., 2016;Namba et al., 2020). Changes in cis-regulatory regions have also long been thought to contribute to species-specific differences (King and Wilson, 1975), and several studies have revealed divergent gene expression patterns in developing primate brains, although their evolutionary impact is unclear (Johnson et al., 2009;Khaitovich et al., 2006;Mora-Bermú dez et al., 2016;Prescott et al., 2015).
One gene family of particular interest in human brain evolution is the Kr€ uppel-associated box (KRAB) domain-containing zinc finger proteins (KZFPs), the largest individual family of transcription factors in mammalian genomes. KZFPs have undergone a rapid expansion during mammalian and primate evolution, and the human genome encodes for at least 350 KZFPs, $170 of which are primate specific (Imbeault et al., 2017;Jacobs et al., 2014). Many KZFPs are expressed in the human brain and integrated into neuronal gene regulatory networks (Farmiloe et al., 2020;Imbeault et al., 2017). Notably, these expression patterns are different between human and chimpanzee (Nowick et al., 2009). The majority of KZFPs are thought to be transcriptional repressors. Their conserved N-terminal KRAB domain interacts with the epigenetic co-repressor TRIM28, which induces heterochromatin formation and transcriptional repression of targets (Ayyanathan et al., 2003;Emerson and Thomas, 2009;Matsui et al., 2010;Rowe et al., 2010;Sripathy et al., 2006). KZFPs differ mainly in the number and sequence of the DNA-binding ZF domains, with the number of ZFs ranging between 2 and 40 in humans (Imbeault et al., 2017). The KZFP family has expanded and diversified through repeated cycles of segmental duplications, giving rise to novel KZFP genes with new targets and biological functions (Nowick et al., 2010).
Several studies have demonstrated an important role for KZFPs in the repression of transposable elements (TEs) in a variety of cell types, including embryonic stem cells (ESCs) and neural progenitor cells (Brattå s et al., 2017;Ecco et al., 2016;Fasching et al., 2015;Najafabadi et al., 2015;Pontis et al., 2019;Rowe et al., 2010;Rowe and Trono, 2011;Turelli et al., 2014;Wolf et al., 2015;Zhang et al., 2019). The rapid expansion of KZFPs in mammalian genomes is correlated with the expansion of TEs, where KZFPs are thought to evolve to target new TE insertions and sequences (Jacobs et al., 2014;Thomas and Schneider, 2011). The majority of KZFPs bind to specific families of TEs in human cells (Imbeault et al., 2017;Najafabadi et al., 2015), but it has also been proposed that both KZFPs and TEs have been co-opted by host genomes for the broader regulation of transcriptional networks (Brattå s et al., 2017;Ecco et al., 2016;Friedli and Trono, 2015;Imbeault et al., 2017).
These observations make KZFPs promising candidates to mediate evolutionary differences between the human and the chimpanzee brain, but there is a lack of experimental data directly addressing this hypothesis. In this study, we investigate the co-option of KZFPs in transcriptional regulation during human and chimpanzee brain development. To this end, we established an in vitro differentiation protocol allowing for quantitative comparisons between human and chimpanzee forebrain neural progenitor cells (fbNPCs). We discovered several KZFP transcription factors that are highly expressed in human but not chimpanzee fbNPCs. One of these, ZNF558, is a conserved gene that originally evolved to control the expression of long interspersed element-1 (LINE-1) elements $100 million years ago (mya). Our data show that ZNF558 no longer suppresses TEs but has been co-opted in fbNPCs to regulate a single gene, the mitophagy regulator SPATA18 (Kitamura et al., 2011). CRISPR inhibition (CRISPRi)-mediated silencing of ZNF558 in human cerebral organoids suggests a role for ZNF558 in the control of developmental timing in early human brain development. Mechanistically, we provide evidence that ZNF558 expression is controlled by a downstream variable number tandem repeat (VNTR) that is extended in chimpanzees relative to humans and variable in the human population. Epigenetic manipulation of the human VNTR was sufficient to switch off ZNF558 expression and thereby increase SPATA18 expression. Our data reveal the co-option of a TE-controlling KZFP to regulate a protein-coding gene and how this regulatory network is controlled between species and human individuals by a cisacting structural variation.

Derivation of human and chimpanzee fbNPCs
Comparative transcriptomic and epigenetic analyses of chimpanzee and human brain development have been limited by both the availability of material from these species and tissue heterogeneity. As induced pluripotent stem cells (iPSCs) from chimpanzees and other primates have become available, it is now possible to establish in vitro cell culture models (Gallego Romero et al., 2015;Marchetto et al., 2013;Mora-Bermú dez et al., 2016;Wunderlich et al., 2014). To directly compare human and chimpanzee fbNPCs, we optimized a defined, feeder-free, 2-dimensional (2D) differentiation protocol based on dual-SMAD inhibition (Grassi et al., 2019). Chimpanzee and human iPSCs could be maintained in vitro under identical conditions ( Figure 1A), and upon differentiation, we observed a rapid switch to a fbNPC-like morphology in cells from both species (Figure 1B). After 2 weeks of differentiation, we found that both human and chimpanzee fbNPCs expressed high levels of FOXG1, a key forebrain marker, while the pluripotency marker NANOG was not expressed ( Figures 1B and S1A).
Both human and chimpanzee fbNPCs expressed appropriate neuronal and forebrain markers, while genes related to other brain regions or other tissues were close to undetectable (Figure 1C). To investigate whether human and chimpanzee iPSCs differentiate into fbNPCs with different temporal trajectories, we performed bulk RNA sequencing (RNA-seq) at 13, 14, 15, and 16 days of differentiation and analyzed the covariance of gene expression between these time points and the 2 species. At the selected time points, the fbNPCs corresponded to a differentiation stage just before neuronal commitment, demonstrated by the gradual increase in basal progenitor marker EOMES and neuronal markers DCX and TBR1 from days 13 to 16 ( Figure 1C). We found no upregulation of glial markers, in line with the stepwise generation of neurons and glia during human brain development ( Figure 1C). Globally, a similar set of genes were up-and downregulated between days 13 and 16 in human and chimpanzee fbNPCs, indicating that the temporal dynamics of the protocol were closely matched ( Figure 1D). Overall, human and chimpanzee fbNPCs display a very similar transcriptome (Figure S1B). We confirmed a limited batch-to-batch variation in the differentiation protocol, and the results were consistent in cell lines from different individuals ( Figure S1C).
To investigate the heterogeneity of human and chimpanzee fbNPC cultures, we performed single-cell RNA-seq analysis at day 14 for 4,355 human and 3,620 chimpanzee cells. Principalcomponent analysis (PCA) showed that 95% of the cells clustered into 1 major population in both species (Figures S1D-S1F). Transcriptional variation within the major population was mainly explained by differences in cell-cycle state rather than cell identity, as these cells clustered into a dense population inseparable on PC1 and PC2 after regressing out cell-cycle effects ( Figure S1E). t-Distributed stochastic neighbor embedding (tSNE) confirmed the presence of a large major population of cells homogeneously expressing the forebrain progenitor markers FOXG1 and PAX6 ( Figure 1E). The tSNE analysis also A C E D B Figure 1. Differentiation of human and chimpanzee iPSCs to fbNPCs (A) Schematics of the dual-SMAD inhibition-based differentiation protocol from seeding iPSCs at day 0 to harvesting the fbNPCs at days 13-16 of differentiation. (B) Bright-field images of human and chimpanzee cells during the first week of differentiation, and FOXG1 immunocytochemistry at day 13, scale bar: 100 mm. (C) Heatmap showing marker expression at days 13-16 of differentiation (n = 1 human, n = 1 chimpanzee, 2 differentiation batches at each time point). (D) Heatmaps (left) displaying genes that are significantly (p adjusted < 0.01) up-and downregulated over time (day 16/day 13, respectively) in humans, the same set of genes mapped for both species. Boxplots (right) showing the same set of genes. The lower and upper hinges correspond to the first and third quartiles. (E) tSNE-analysis of single-cell RNA-seq data of the forebrain markers PAX6 and FOXG1 in human and chimpanzee fbNPCs (n = 1 human, n = 1 chimpanzee, 2 different differentiation batches at each time point). See also Figure S1. revealed 2 minor populations (<5%), 1 of which expressed markers associated with early-committed neurons such as NEUROG1 and NEUROD1, while the other related to the endothelial lineage (e.g., ANKRD1, CTGF) ( Figure S1F). Taken together, immunocytochemistry, bulk, and single-cell RNA-seq demonstrated that our 2D differentiation protocol was reproducible and gave rise to temporally and phenotypically matched homogeneous cultures of human and chimpanzee fbNPCs, making it a suitable model system for direct comparative analysis.
Human-specific expression of KZFP transcription factors Next, we queried our transcriptomic datasets for differentially expressed genes, choosing to focus on KZFP transcription factors because their evolutionary and biochemical characteristics make them prime candidates for governing species-specific differences. We identified 312 KZFPs that were expressed in at least 1 sample and 35 that were significantly different between species (27 higher in human, 8 higher in chimp; Wald's test, p < 1.0 3 10 À15 , with Benjamini-Hochberg correction and log 2 fold change [fc] > 1) ( Figure 2A). Seven candidates (ZNF138, ZNF248, ZNF439, ZNF557, ZNF558, ZNF596, and ZNF626) were highly expressed in human fbNPCs, with nearly no expression in chimpanzee fbNPCs ( Figures 2B and S2A).
The massive expansion of the KZFP family in mammals has arisen through rounds of segmental duplication events, making them challenging to study with comparative genomic analyses. To confirm that human-specific expression was not caused by biases in mapping, reference genome builds, or gene annotation, we mapped all of the samples to both human (GRCh38) and chimpanzee (PanTro6) assemblies. ZNF138 expression was only detected when mapping human samples to GRCh38, but not when mapping human samples to PanTro6, nor in chimp samples mapping to either assembly. Pairwise alignment of human and chimpanzee ZNF138 coding sequences revealed only 34% sequence identity, with point mutations and several deletions in the human sequence, explaining why reads from human samples do not map to the chimpanzee genome. We conclude that ZNF138 has diverged in both sequence and expression pattern in forebrain progenitors. ZNF557 and ZNF626 were detected in chimpanzee samples only when mapping to GRCh38. This suggested that issues with the PanTro6 assembly prevented mapping of these transcripts, invalidating attempts to infer human-specific expression of these genes. Four candidate genes (ZNF248, ZNF439, ZNF558, and ZNF596) were exclusively expressed in human samples, despite high mapability for both human and chimpanzee orthologs ( Figure 2B).
To further dissect the expression divergence for the 5 candidate KZFPs (ZNF138, ZNF248, ZNF439, ZNF558, and ZNF596) with human-specific expression in the developing forebrain, we used our single-cell RNA-seq analysis of human and chimpanzee fbNPCs. This analysis confirmed that all 5 candidates were highly expressed in human samples ( Figure 2C). ZNF138, ZNF248, and ZNF558 were exclusively expressed in human cells, while ZNF439 and ZNF596 were also detected in a rare number of chimpanzee cells ( Figures 2C and S2B). The candidates were expressed throughout the cell population, excluding the possibility that the human-specific expression was due to a specific subpopulation in the human samples. We also analyzed publicly available expression patterns in human and chimpanzee cerebral organoids, which confirmed that ZNF138, ZNF248, and ZNF558 also display human-specific expression in 3D models of neural development (Field et al., 2019;Kanton et al., 2019;Mora-Bermú dez et al., 2016) (Figures 2D, S2C, and S2D). CUT&RUN epigenomic profiling in human and chimpanzee fbNPCs demonstrated a striking human-specific enrichment of the activating epigenetic mark H3K4me3 over the promoters of the candidate genes, confirming that the observed differences in RNA levels result from differences in transcriptional activity ( Figure 2F).
qRT-PCR analysis of iPSCs demonstrated that ZNF138, ZNF248, and ZNF558 were expressed at low levels in human iPSCs and that their expression increased during the course of fbNPC differentiation ( Figure S2E). RNA-seq data from human fetal forebrain samples confirmed their expression during human forebrain development ( Figure 2E). We further verified that our results were not limited to the 2 chimpanzee individuals used to derive the iPSCs used in this study; a dataset with 8 other individuals confirmed human-specific expression of ZNF558, the candidate with the most robust expression in iPSCs ( Figure S2F).
We next asked whether the differences in ZNF138, ZNF248, and ZNF558 expression were limited to brain development or also found in other tissues. Publicly available transcriptome data (GTEx) indicate that ZNF138, ZNF248, and ZNF558 are widely expressed in human tissues ( Figure S3A). However, we found no apparent difference in expression between humans and chimpanzees in the adult brain, heart, or fibroblasts (Figure S3B). Thus, the difference in the expression of ZNF138, ZNF248, and ZNF558 in humans and chimpanzees appears to be limited to specific organs and developmental stages, including brain development.
Our analysis identified 3 KZFPs expressed during human but not chimpanzee brain development (ZNF138, ZNF248, and ZNF558). We selected ZNF558 for further analysis as it was exclusively and highly expressed in human samples and because this difference was detected among multiple individuals.
ZNF558 has evolved under constraint in the mammalian lineage Many KZFPs have undergone rapid evolution in primate evolution, in which new genes arise and diversify through segmental duplications (Nowick et al., 2010). However, ZNF558 is estimated to have originated before the common ancestor of afrotherian mammals, $105 mya, with orthologs in a range of placental mammals ( Figure 3A). A duplication event occurred in the common ancestor with new world monkeys, giving rise to a paralog (ZNF557), which was expressed in both human and chimpanzee fbNPCs. ZNF557 encodes 1 more ZF domain (10 in total) than ZNF558, and 8 of the 9 common ZF domains have R1non-synonymous mutations in DNA binding residues relative to ZNF558, suggesting that positive selection acted to diversify the function of these paralogs, resulting in divergent DNA targets.
Pairwise alignments revealed high conservation among ZNF558 orthologs ( Figure 3B). Analysis of the ratio between non-synonymous and synonymous mutations (dN:dS) suggests that ZNF558 has evolved under constraint in the mammalian lineage, with a median dN:dS ratio of 0.159 ( Figure 3C). KZFP

ll OPEN ACCESS
Article DNA binding is mediated by tandem C2H2 ZF domains, each of which interacts with contiguous stretches of 2-4 nt per finger (Patel et al., 2018;Persikov and Singh, 2014). In particular, the identities of 4 residues in each ZF are responsible for sequence specificity, with multiple ZF domains combining to encode a particular ''zinc fingerprint'' (Imbeault et al., 2017). To test whether the DNA-contacting residues of its 9 ZFs have evolved during recent mammalian evolution, we aligned sequences of the ZNF558 fingerprint among 8 mammals and found that these residues are almost completely conserved. All primates display identical binding residues, while single substitutions in mouse, whale, and panda proteins are conservative (Ile to Val; Ser to Thr) and thus are unlikely to have a major effect on DNA binding ( Figure 3D). We next investigated the presence of ZNF558 gene variants in the human population using a database (gnomAD version 2) of 125,748 exome and 15,708 whole-genome sequences from unrelated individuals (Karczewski et al., 2020). Heterozygous predicted loss of function (pLoF) alleles were found at slightly below the expected frequency (depth-corrected observed:expected [oe] ratio = 0.66; 90% confidence interval [CI] 0.48-1.12), and no individuals harboring homozygous pLoF variants were identified ( Figure 3E). Analysis of missense variants showed that the KRAB domain is strictly conserved in putative TRIM28-interacting residues (Friedman et al., 1996;Murphy et al., 2016). We also found low variant counts in the zinc fingerprints and zinc-coordinating residues among the human population, with 1 exception, a H370P substitution in ZF8 found in $1 in 1,000 Finnish individuals ( Figure 3E). Tthis evolutionary and population-wide analysis of ZNF558 conservation and variation indicates that the protein has been under stringent evolutionary constraint for $100 million years, which is in line with an important role in mammalian physiology. ZNF558 binds evolutionarily old LINE-1 elements and protein-coding genes The expansion of KZFPs in mammalian genomes is thought to be driven by adaptation to rapidly evolving TEs (Jacobs et al., 2014;Thomas and Schneider, 2011). It was therefore striking that our candidate gene for human/chimpanzee divergence, ZNF558, is highly conserved. This suggests that it may have been co-opted for regulating non-TE targets early in mammalian evolution. To investigate the function of ZNF558, we considered the DNA binding preferences of its 9-membered ZF array. Chromatin immunoprecipitation with exonuclease digestion (ChIP-exo) data on hemagglutinin (HA)-tagged ZNF558-expressing 293T cells showed robust signal enrichment and no overlap in target specificity between ZNF558 and its paralog ZNF557 (Imbeault et al., 2017) ( Figure 4A). CUT&RUN profiling in human and chimpanzee fbNPCs demonstrated a human-specific enrichment of the repressive epigenetic mark H3K9me3, which is associated with KZFP binding, over the ZNF558 binding sites, suggesting that the sites identified in 293T cells are also occupied in human fbNPCs ( Figures 4B and S4A). Computational prediction of the ZNF558 binding motif with overlapping 4-nt subsites suggested a T-rich binding sequence of 28 nt. The experimental binding motif partly matched the prediction, with some gaps ( Figure 4C). Longer arrays are known to deviate more from the canonical binding model and A/T-rich sites are less predictable, so such discrepancies were not surprising (Patel et al., 2018). We next investigated the features of genomic regions bound by ZNF558. We found that 54% of the 465 high-confidence ZNF558 binding sites were located in TEs, which were enriched for LINE-1 elements ( Figures 4D-4F). Of these ZNF558-bound LINE-1 elements, the majority of the sites were in evolutionary-old LINE-1 families such as L1MEg, L1M4, and L1Med ( Figure 4F). These LINE-1 families are remnants of old transposition events and have since degenerated and lost transposition capacity. Estimation of the evolutionary age of the ZNF558-bound elements revealed that the majority were dated at $100 million years old, which correlates well with the age of ZNF558 ( Figure 4F). This observation is in line with a model in which ZNF558 originally evolved to repress thenactive LINE-1 elements. These LINE-1s would then have degenerated and ZNF558 could have been co-opted to control other non-TE genomic targets. In agreement with this model, we noted that several of the ZNF558 target sites overlap with protein-coding genes. For example, the fourth highest-scoring ChIP-exo peak was located just downstream of the first exon of SPATA18, a gene involved in mitochondrial homeostasis ( Figure 4G). These co-opted ZNF558 targets may be important for mammalian physiology, since ZNF558 has been under stringent positive selection for a long time. If this scenario is true, then the LINE-1 binding that remains today likely represents a genomic fossil.
CRISPRi-mediated transcriptional silencing reveals SPATA18 as the sole functional target of ZNF558 To investigate the functional relevance of ZNF558 in human fbNPCs, we designed a CRISPRi strategy to silence ZNF558 expression ( Figure 5A). We targeted 2 distinct guide RNAs (gRNA) to a genomic region located next to the ZNF558 transcription start site (TSS) and co-expressed these with a transcriptional repressor domain fused to catalytically dead Cas9 (dCas9). The transduction of human iPSCs resulted in the efficient silencing of ZNF558 upon differentiation to fbNPCs (Figures 5A, 5B, and 5E), but no difference in differentiation capacity or expression of cell fate markers compared to controls ( Figures  S5A and S5B). Silencing of ZNF558 also resulted in an almost complete loss of H3K9me3 at ZNF558 binding sites ( Figure S4B), confirming the requirement for maintained ZNF558 binding to retain local heterochromatin.
Next, we performed RNA-seq on ZNF558-silenced fbNPCs and analyzed the transcriptome for alterations in TE and gene expression. Remarkably we found a single protein-coding gene, SPATA18, to be upregulated upon ZNF558 silencing (Figures 5C, 5D, and S5D), but no alterations in TE expression either when querying expression of unique elements or entire families. In accordance with this, H3K4me3 CUT&RUN in ZNF558-CRISPRi fbNPCs revealed a total loss of H3K4me3 over the promoter of ZNF558 and that the sole ZNF558 binding site to acquire this histone mark was the binding site at SPATA18 ( Figures  5E and S4C). These data demonstrate that ZNF558 has been co-opted in human fbNPCs to regulate SPATA18, and that its binding sites in old LINE-1 elements do not have detectable consequences on transcription of those TEs.
ZNF558 regulates SPATA18 expression in human and chimpanzee fbNPCs Next, we considered the species specificity of SPATA18 repression by ZNF558. Since transcription factors and their DNA target sites co-evolve, we analyzed the conservation of the ZNF558 binding site in a range of mammalian SPATA18 genes and found that it is conserved only in primates ( Figure S4D). The conservation of the binding site in chimpanzee SPATA18 suggests that the lack of ZNF558 expression in chimpanzee fbNPCs should result in the increased expression of SPATA18 in these cells. In accordance with this model, epigenomic profiles over the SPATA18 promoter in chimpanzee fbNPCs revealed enrichment of H3K4me3 but not H3K9me3 with concomitant expression of SPATA18 mRNA ( Figure 5F). RNA-seq analysis showed a 2.4-fold increase in chimpanzee compared to human fNPCs, and increased SPATA18 expression in chimpanzee was also observed in cerebral organoids (Field et al., 2019) (Figure S4E). To further test that ZNF558 has the capacity to regulate SPATA18 in chimpanzee fbNPCS, we transduced chimpanzee iPSCs with a ZNF558 overexpression vector ( Figures 5G-5I). Following differentiation to fbNPCs, SPATA18 was significantly downregulated upon ZNF558 overexpression ( Figures 5H, 5I,  Figure S4. and S5E). These results demonstrate that the functional consequence of differential ZNF558 expression in human and chimpanzee brain development is the repression of a single gene, SPATA18.
ZNF558 contributes to human-specific developmental timing in organoids The product of the ZNF558 target gene SPATA18 is mitochondrial eating protein (MIEAP), which contributes to the degradation of mitochondria via mitophagy (Kitamura et al., 2011). In line with this, we found that ZNF558 silencing resulted in a small but significant decrease in mitochondrial content, as measured with qPCR for mitochondrial DNA, in human fbNPCs ( Figure 6A). In addition, we found an increase in the levels of ATPase5A in respiratory chain complex V ( Figure 6B), suggesting an adaptive change in ATPase levels in response to the decreased number of mitochondria. Interestingly, recent observations directly link mitochondrial function to speciation and the expansion of the human cerebral cortex (Namba et al., 2020). Thus, to investigate the functional role of ZNF558 in human brain development, we generated ZNF558-deficient cerebral organoids ( Figure 6C), since this model system allows the study of human-specific developmental process in a 3D setting (Benito-Kwiecinski et al., 2021;Kanton et al., 2019;Pollen et al., 2019). We found that CRISPRi-mediated ZNF558 silencing using 2 distinct gRNAs did not impair cerebral organoid formation and that the resulting organoids displayed characteristic neural rosettes after 1 month of differentiation, as visualized with Pax6/ ZO1 staining, in a similar pattern when compared to organoids generated from control iPSCs (lacZ-gRNA; Figure 6D). However, we noted that ZNF558-CRISPRi organoids appeared smaller than control organoids during the first month of differentiation ( Figure 6E), which corresponds to a period when the neuroepithelial buds have formed and are undergoing expansion.
To further assess the long-term consequences of ZNF558-inhibition on human brain development, we analyzed organoids at later stages of maturation (2 and 4 months) using single-nuclei RNA-seq. High-quality data were generated from a total of 22,923 cells, including 15,768 from ZNF558-CRISPRi organoids (2 gRNAs) and 7,155 from control organoids (lacZ-gRNA). We performed an unbiased clustering analysis to identify and quantify the different cell types present in the organoids. Eight separate clusters (Figures 6F and 6G) were identified, including cerebral cells of different stages of maturation, such as neural progenitor cells, newborn neurons, and mature neurons ( Figures  6F, 6G, and S6A). All of the clusters contained cells from both the 2-and 4-month time points, and we found no apparent difference in the contribution to the different clusters by ZNF558-CRISPRi organoids (Figures 6F, 6G, and S6B), suggesting that ZNF558 does not influence developmental fate during early human brain development.
Next, we analyzed the transcriptional difference between control and ZNF558-KD organoids. We confirmed the transcriptional silencing of ZNF558 in all cell populations at both time points ( Figure S6C) and the upregulation of SPATA18 after ZNF558 inhibition in NPCs ( Figure S6D). We found that in the neuron cluster, many genes linked to neuronal differentiation and maturation, such as MYT1L, NCAM1, or RBFOX1, were upregulated in ZNF558-CRISPRi organoids ( Figure 6H). In addition, several genes linked to the establishment of neuronal projections, such as DCC and ROBO1, were upregulated in ZNF558-CRISPRi organoids ( Figure 6H). An unbiased Gene Ontology (GO) analysis of genes upregulated in organoid neurons lacking ZNF558 confirmed a significant enrichment of genes in categories linked to neuronal differentiation and maturation as well as neuron projection and synapse formation ( Figure 6I). These results indicate that neurons present in organoids that lack ZNF558 display a more mature transcriptional profile than those found in control organoids.
These results demonstrate that silencing of ZNF558 results in cerebral organoids that contain the same cell types as control organoids, suggesting that ZNF558 does not influence developmental fate. However, we noted that ZNF558 organoids were smaller during early differentiation and displayed a transcriptome indicative of the presence of more mature neurons at later stages of differentiation. These observations are in line with a role for ZNF558 in developmental timing during early brain development.
Regulation of ZNF558 is controlled by a cis-acting VNTR Given that ZNF558 is conserved between human and chimpanzee, we next explored the underlying genetic basis for differential expression between the 2 species. We found limited variation in the human and chimpanzee genomic context upstream of ZNF558, suggesting that alterations of promoter structure are unlikely to be responsible. However, downstream of ZNF558 we noted a large repetitive region in the human genome assembly (chr19:8735220-8794192, GRCh38; Figure 7A). We investigated copy-number variation at this locus by generating read-depth profiles for a collection of 1,112 high-coverage human and non-human great ape genomes from publicly available datasets ( Figure 7B). Our analysis identified a VNTR motif with a unit length of 7,460 bp, which showed striking copy-number variation across populations and species (Figures 7B and 7C). In particular, we observed a significant difference in copy number at this VNTR locus between the human and non-human primate (C) RNA-seq analysis of ZNF558-CRISPRi fbNPCs for expression of TEs (left) and protein coding genes (right), p adjusted < 0.05, |log2(fc) > 1|. (D) qRT-PCR analysis of SPATA18 expression in fbNPCs after ZNF558 inhibition (n = 4). (E) H3K4me3 CUT&RUN after ZNF558 inhibition. (F) Epigenomic (H3K4me3, H3K9me3), transcriptomic (RNA-seq), and ZNF558 binding (ChIP-exo) analysis of the SPATA18 locus in human and chimp fbNPCs. Apart from the ChIP-exo analysis in HEK293T (Imbeault et al., 2017), data were obtained in at least biological duplicate (i.e., cells derived from 2 individuals for each species) with similar results. CUT&RUN data were normalized based on spike-in DNA. Scales for RNA-seq and ChIP-exo data are RPKM. (G) Schematics of the experimental strategy to overexpress ZNF558 in chimpanzee fbNPCs. (H and I) qRT-PCR analysis of ZNF558 and SPATA18 in chimpanzee fbNPCs transduced with a lentiviral vector overexpressing ZNF558 (n = 4). *p < 0.05 (Student's t test); error bars are mean ± SEM. See also Figures S4 and S5. (NHP) samples; all non-human great ape samples, except for orangutans, have significantly higher copy numbers (>70) of this VNTR compared to humans (means: 24-43 copies) (p = 8.61 3 10 À21 , Mann-Whitney U test; Figure 7C). This result is consistent with qPCR data using genomic DNA obtained from the human and chimpanzee iPSC lines used in the present study ( Figure 7D). One interesting outlier in this analysis is the orangutan, which has a VNTR copy number similar to that of humans (mean 36, standard deviation 8.1) compared to that of other non-human great apes. Published transcriptome data showed high ZNF558 transcript levels in orangutan forebrain organoids (Field et al., 2019) (Figure S2D). We noted that the VNTR copy number is variable in the human population. To investigate whether this variation influences ZNF558 expression, we analyzed publicly available genomic and expression data from 409 human lymphoblastoid cell lines (Geuvadis consortium, Lappalainen et al., 2013). In line with our hypothesis, we found that ZNF558 expression was negatively correlated with VNTR copy number (R = À0.24, p = 0.0000013; Figure 7E). This effect was specific to ZNF558 as the expression of other genes located within 250 kb of the VNTR was not correlated with copy number ( Figure S7A). These results demonstrate that a lower VNTR copy number correlates with higher ZNF558 expression, both across hominoid species and within the human population and suggest that the repetitive genomic region downstream of ZNF558 is responsible for its differential expression.
Epigenetic manipulation of the human VNTR switches the ZNF558-SPATA18 regulatory network Finally, we investigated the mechanism of cis-regulation by the VNTR. We considered that the longer VNTR in chimpanzees could have established a repressive hub leading to ZNF558 silencing. Profiling of H3K27me3 and H3K9me3 levels over the human and chimpanzee VNTRs is challenging due to the low mapability of this genomic region. However, exploiting a spikein normalization strategy for CUT&RUN data, we confirmed that relative read depths for control immunoglobulin G (IgG) tracks matched the relative VNTR copy numbers ( Figure S7C), thereby validating our mapping strategy. H3K9me3 but not H3K27me3 enrichment was detected over the VNTR in both human and chimpanzee samples using this approach (Figures S7B  and S7D). These data are consistent with a model in which the VNTR attracts H3K9me3, the functional consequence of which is controlled by the size of the VNTR. RNA-seq analysis showed evidence of transcription within the VNTR in human but not chimpanzee fbNPCs ( Figure S7B), and we found evidence of a human-specific long non-coding RNA (lncRNA) originating in the VNTR transcribed in the antisense orientation relative to ZNF558 ( Figure S7E). These observations suggest that the shorter human VNTR is not fully silenced.
To experimentally test the model of VNTR-mediated cisrepression, we used a multi-guide CRISPRi strategy to impose silencing over the shorter human VNTR. We designed 3 different gRNAs to target different parts of the VNTR unit to co-express with the dCas9 repressor fusion (Figures 7F and 7G). Upon iPSC transduction, we observed reduced expression of the VNTR-associated lncRNA ( Figure 7H), in line with targeted silencing. Strikingly, we observed robust inhibition of ZNF558 transcription upon VNTR silencing, which in turn led to increased levels of SPATA18 transcripts ( Figures 7I and 7J). These data support the model of a brain-specific gene-regulatory network in which a downstream VNTR controls ZNF558 expression in cis, which in turn affects the repression of its target gene SPATA18.

DISCUSSION
In this study, we show how genetic variations in non-coding regions of the genome can control the activity of conserved protein-coding genes, resulting in the establishment of species-specific transcriptional networks. To date, there is limited evidence that changes in cis-acting regulatory elements are important for human brain evolution. Classically, non-coding genetic changes are thought to result in gene expression differences along a gradient, leading to slightly more or less mRNA product of a nearby gene. It remains debatable whether such differences in mRNA levels correspond to changes in protein levels, since there is evidence of compensatory buffering at the translational level (Khan et al., 2013). It also remains unclear how differences in the level of a protein affects species fitness. In contrast, our results demonstrate that non-coding regions have the capacity to mediate an on/off switch of a conserved protein-coding gene.
KZFPs are the largest family of mammalian transcription factors and have rapidly expanded and evolved in the primate lineage. It has been proposed that KZFPs are engaged in an ''arms race'' with TEs, in which KZFPs evolve to bind specific TEs and silence their activity (Jacobs et al., 2014). In this model, TEs mutate their ZFP-targeted sites, thereby escaping from silencing and regaining activity. The KZFP would then further evolve to once again silence the TE, resulting in a dynamic competition between KZFPs and TEs that drive their evolution. This evolutionary mechanism has been proposed to have been used by host genomes to drive evolutionary mechanisms (Trono, 2015). In this study, we provide evidence of human-specific expression of several KZFPs in brain development. One example, ZNF558, arose >100 mya to repress then-active LINE-1 transposons. Although its TE targets have degenerated, ZNF558 is highly conserved among mammals, in line with functional biological roles independent of TE suppression. ZNF558 now binds to both ancient L1s-likely representing a genomic fossil-and several gene targets in human cells (Imbeault et al., 2017). During human brain development, we found that ZNF558 regulates a single protein-coding gene, SPATA18.
The genetic basis for the unusual on/off switch of ZNF558 expression between human and chimpanzee fbNPCs resides in a downstream VNTR. The length of the VNTR correlates with the repression of ZNF558; in humans, in whom the repeat is 30-40 units long, the locus is linked to active transcription, while in NHPs, the VNTR is much larger, and this is associated with transcriptional repression. We do not fully understand the molecular mechanisms that underlie how differences in VNTR unit copy number affects ZNF558 expression. The VNTR is decorated by the repressive histone mark H3K9me3 both in human and chimpanzee fbNPCs. Since the chimpanzee VNTR is longer, a VNTR-based heterochromatin region is substantially larger. We also found evidence that the short human VNTR unit is transcribed, giving rise to a lncRNA, which may influence in cis the expression of nearby genes such as ZNF558. The most likely explanation for the observed phenomenon is that the extended VNTR is a heterochromatic region and that its shortening reveals a brain-specific enhancer and/or lncRNA in the repeat. Such a copy-number-dependent VNTR-based epigenetic regulatory mechanism of nearby protein coding gene expression is reminiscent of what has been previously observed at the FSHD locus (Cabianca et al., 2012).
Our observations are in line with a model in which the common ancestor of humans and NHPs carried a long VNTR downstream of ZNF558. This VNTR contracted after the human/chimpanzee evolutionary split, ultimately resulting in differences in SPATA18 levels and altered mitochondria homeostasis in the developing human brain. However, this is a simplified model, and the functional role of ZNF558 is likely multifaceted. We note that both orangutans and macaques express ZNF558 during brain development (Field et al., 2019;Kanton et al., 2019) and have a conserved ZNF588 binding site in SPATA18. Thus, the ZNF558-mediated repression of SPATA18 during brain development is not only a human-specific mechanism but is also present in other primate species. In addition, ZNF558 is expressed in most adult tissues in humans, and we find no evidence of a divergence of expression between human and chimpanzee in most of these tissues. Thus, ZNF558 is likely to perform additional roles in adult tissues. While ZNF558 is a highly conserved gene, the ZNF558-binding site in SPATA18 is conserved only in primates, suggesting that ZNF558 plays a different role in other species, to regulate alternative co-opted targets. It will be interesting to chart the functional roles of ZNF558 in other more distantly related mammals, such as mice, in which innovation in target sequences may have led to co-option of this transcription factor for other functional roles.
The downstream consequences of a shortened VNTR and activated ZNF558 expression during human brain development is transcriptional repression of SPATA18. The product of SPATA18 is MIEAP, which clears mitochondria via mitophagy (Kitamura et al., 2011). The emerging literature indicates that mitochondria are central in neural stem cell fate decisions and the process of neurogenesis partly through the control of a metabolic switch from glycolysis to OXPHOS occurring when NPCs differentiate to neurons (Arrá zola et al., 2019;Khacho et al., 2016Khacho et al., , 2019. ARHGAP11B, a human-specific gene (Florio et al., 2015), is localized to the mitochondria, where it induces a cancer-like metabolism to promote the proliferation of basal neural progenitors (Namba et al., 2020). This recent observation directly links mitochondrial function to speciation and the expansion of the human cerebral cortex. Exactly how SPATA18, regulated by ZNF558 uniquely in humans, feeds into this metabolic pathway, remains to be determined, but our results using cerebral organoids indicate that ZNF558 plays a role in developmental timing during brain development. Organoids in which ZNF558 expression was silenced appeared smaller in size during the early phase of differentiation and displayed neurons with a (B) Copy-number trajectories illustrating an increase in copy number for a 7.46-kbp VNTR motif (chr19:8,759,103À8,766,563, GRCh38) in chimpanzee, bonobo, and gorilla relative to humans. Each line is the inferred copy-number trajectory of a given sample over this region. Black arrows at the top indicate a unit of the VNTR motif. The dashed lines indicate diploid copy number 2. (C) Copy-number variation for the 7.46-kbp VNTR motif among archaic and modern humans and non-human great ape samples. Each dot is the overall (diploid) copy number for a given sample. Black diamonds and bars represent the mean and 1 standard deviation of the VNTR copy numbers in individual populations and species (Mann-Whitney U test). (D) qPCR analysis of the VNTR copy number in the human and chimp iPSC lines used in this study. (E) Significantly negative correlation between the normalized expression of ZNF558 and the copy number of the 7.46-kbp VNTR motif (chr19: 8,759,103À8,766,563) near ZNF558 in humans. RNA-seq data generated by the Geuvadis consortium, which includes samples from the 1000 Genomes project. Each dot is 1 of the 409 samples that have data for both ZNF558 gene expression and copy-number estimates for the VNTR motif. The blue line represents linear regression, and the shaded area indicates the 95% confidence interval. (F and G) Schematics of the CRISPRi-based multi-guide strategy to epigenetically silence the human VNTR. pVNTR lncRNA, human-specific putative lncRNA originating from within in the VNTR extending toward ZNF558. (H-J) qRT-PCR analysis of VNTR-lncRNA, ZNF558, and SPATA18 in human iPSCs (HS1 and HS2) transduced with VNTR-CRISPRi lentiviral vectors (n = 5-6). *p < 0.05, (Student's t test); data represented as mean ± SEM in (H). See also Figure S7.

OPEN ACCESS
Article more mature transcriptome during later growth stages. These findings are reminiscent of previously observed differences when comparing human cerebral organoids to those derived from non-human great apes (Benito-Kwiecinski et al., 2021;Kanton et al., 2019;Marchetto et al., 2019).
Our data also demonstrate that the length of the VNTR at the ZNF558 locus is variable in the human population and that there are individuals with a repeat length similar to that of chimpanzees. We found that the length of the VNTR negatively correlates with ZNF558 expression in a panel of human lymphoblastoid cell lines. Unfortunately, it was not possible to resolve the actual composition of VNTR alleles for each individual from this dataset. However, the molecular and phenotypical analysis of human individuals carrying bi-allelic long VNTRs would be interesting and may uncover a role for VNTRs in human phenotypical variation. We also note that DNA duplications of the SPATA18 locus can lead to intellectual disability and delayed language and speech development (https:// www.deciphergenomics.org), suggesting that SPATA18 and its partner transcription factor ZNF558 may be linked to neurodevelopmental disorders.
In summary, our results illustrate how one KRAB-ZFP, ZNF558, was co-opted and subsequently contributed to human brain evolution. Future studies of KRAB-ZFPs and VNTRs in human evolution and human disease will be interesting and rewarding.

Limitations of the study
The fbNPC and organoid model systems used in this project recapitulates some aspects of human brain development. However, there are limitations to the accuracy of the cell models in comparison to actual human brain development, and they are associated with experimental variation which limits the strength of some conclusions. To validate the key findings in our study, we have relied on human fetal tissue. However, such material is not available from chimpanzees, and it was therefore not possible to validate the lack of ZNF558 expression in actual chimpanzee brain development.
Furthermore, our iPSC-modeling relies on a limited number of different chimpanzee iPSC lines. Where possible, we have cross-validated our results with published data from other iPSC lines, but the number of available iPSCs from different chimpanzee individuals are still very few ($10) and associated with shipment restrictions. This represents a limitation for our understanding of how ZNF558 expression is variable in the chimpanzee population.
Finally, our data demonstrate a cis-acting role for a downstream VNTR in controlling ZNF558 gene expression. Additional molecular investigations are needed to fully resolve the mechanism of how this non-coding region exerts its effect.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

ACKNOWLEDGMENTS
We would like to thank Fred H. Gage, Wieland Huttner, Svante P€ a€ abo, Hindrik Mulder, Barbara Treutlein, Steve Henikoff, He Zhisong, Mitchell Vollger, and Yafei Mao for their comments and support. We also thank M. Persson Vejgå rden and A. Hammarberg for technical assistance. We are grateful to all of the members of the Jakobsson lab. The work was supported by grants from the Swedish Research Council (2018-02694, to J.J.), the Swedish Brain Foundation (FO2019-0098, to J.J.), Cancerfonden (190326, to J.J.), Barncancerfonden (PR2017-0053, to

DECLARATIONS OF INTEREST
The authors declare no competing interests.

RESOURCE AVAILABILITY
Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Johan Jakobsson (johan.jakobsson@med.lu.se).

Materials availability
Plasmids generated in this study are available upon request.

Data and code availability
There are no restrictions in data availability. The RNA and DNA sequencing data presented in this study have been deposited at GEO: GSE182224 and are publicly available. Accession numbers are listed in the Key resources table. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
This paper analyzes existing, publicly available data. These accession numbers for the datasets are listed in the Key resources table.
All original code has been deposited at GitHub and is publicly available as of the date of publication. DOIs are listed in the Key resources table.

Immunocytochemistry
The cells were washed once with DPBS and fixed for 15 minutes with 4% paraformaldehyde (Merck Millipore), followed by three rinses with DPBS. The fixed cells were then pre-blocked for a minimum of 30 minutes in a blocking solution of KPBS with 0.25% Triton X-100 (Fisher Scientific) and 5% donkey serum. The primary antibody (rabbit anti-FOXG1, 1:50 dilution, Abcam, RRID: AB_732415 and anti-NANOG, 1:100 dilution, Abcam, RRID: AB_446437) was added and incubated overnight. On the following day, the cells were washed twice with KPBS. The secondary antibody (donkey anti-rabbit Cy3; 1:200; Jackson Lab) was added with DAPI (1:1000; Sigma-Aldrich) as a nuclear counterstain and incubated at room temperature for one hour, followed by 2-3 rinses with KPBS. The cells were visualized on a Leica microscope (model DMI6000 B), and images were cropped and adjusted in Adobe Photoshop CC. 2012; Li et al., 2009). Normalized bigwig coverage tracks were made with bamCoverage (deepTools) (Ramírez et al., 2014), with a scaling factor accounting for the number of reads arising from the spike-in yeast DNA (10^4/aligned yeast read number). Tracks were displayed in IGV.

Evolutionary analysis of ZNF558
Phylogenetic tree, pairwise sequence alignment scores and dN/dS ratios of ZNF558 orthologs was downloaded from the Ensembl (Zerbino et al., 2018). Orthologs section for ZNF558 in nexus format and FigTree was used for visualization (http://tree.bio.ed.ac.uk/ software/figtree/). For analysis of DNA binding residues of the zinc finger domain of ZNF558 and orthologs, the ClustalW multiple sequence alignment from Ensembl orthologs was visualized in JalView. DNA contacting residues were defined the four amino acids in position À1, À4, À5 and À7 relative to the first histidine residue of the two zinc-coordinating histidine residues. For analysis of ZNF558 TE binding, TE coordinates were downloaded from RepeatMasker.
To check the conservation of the ZNF558 binding site in SPATA18, we used a set of 46 eutherian mammals at the comparative genomics section of Ensembl's webtool. Visualization was done using ggmsa (https://github.com/YuLab-SMU/ggmsa) and geo-m_logo (Wagih, 2017).
Using the resulting peaks from the ChIP-exo of ZNF558 with a score higher than 20 (see ChIP-exo analysis section), we classified ZNF558 binding sites between genes and TEs using HOMER's annotatePeaks.pl. Only intronic and intragenic TEs are reported (retrotransposons as well as simple repeats and DNA transposons). A peak was registered to be at a gene when the detailed annotation would be reported as 3 0 , 5 0 , exon, non-coding, promoter-TSS, TTS, or CpG.
To further explore the significance of ZNF558 binding sites at the different classes, families, and subfamilies of TEs, we used a script implemented by Kapusta et al. (2013),  (https://github.com/4ureliek/TEanalysis/blob/master/TE-analysis_Shuffle_bed.pl).This script uses bedtools (-shuffle bed) (version 2.27.1) (Quinlan and Hall, 2010) to shuffle the genomic locations of TEs from a given annotation file (-query) (intronic and intergenic TEs) and intersects it with the provided peaks (-feat) (ZNF558 ChIP-exo peaks with score > 20). It repeats this operation to calculate the number of binding sites of these peaks binding to a particular class, family, or subfamily of TEs as expected by chance.
To obtain the annotation file for intronic or intragenic TEs, we used bedtools intersect to perform an inner join between introns or intergenic regions as annotated at HOMER's hg38.basic.annotation, and repeatmasker original annotation file for hg38 (version 4.0.5).
For the maximum values of chromosomes (-range), we used UCSC hg38.chrom.sizes and for the excluded areas (-excl) we ran TE-analysis_Shuffle_bed.pl setting up the flag-dogaps and providing GRCh38.p13 fasta file.
ChIP-exo analysis ZNF558 and ZNF557 ChIP-exo data was downloaded along with input control from GSE78099 (Imbeault et al., 2017). Raw ChIP-exo reads were quality controlled with FastQC (Babraham) and aligned to the human reference genome (GRCh38) using bowtie2 withsensitive-local (Langmead and Salzberg, 2012). Unique reads were filtered by retaining only alignments with MAPQ > 10 (samtools view -q 10) (Li et al., 2009). Peaks were called with HOMER (Heinz et al., 2010) and those with score > 20 retained. Visualization of ChIP signals was done in deepTools using the computeMatrix and plotHeatmap modules (Ramírez et al., 2014). ZNF558 motif analysis was done with MEME (search for one motif, length 18-30 nt) (Bailey et al., 2015). Prediction of the ZNF558 DNA binding sequence was done as detailed in Persikov and Singh (2014).

CRISPRi
In order to silence the transcription of ZNF558 we used the catalytically inactive Cas9 (deadCas9) fused to the transcriptional repressor KRAB. Single guide sequences were designed to recognize DNA regions just down-stream of the transcription start site (TSS) according to the GPP Portal (Broad Institute). See Key resources table for guide RNA sequences. For the induction of repressive marks on the human VNTR we chose three guide RNAs from the UCSC Gene Browser CRISPR target track with high on-target specificity and only found within the VNTR region and avoiding transposable elements (RepeatMasker). See Key resources table for guide RNA sequences. The guides were inserted into a deadCas9-KRAB-T2A-GFP lentiviral backbone containing both the guide RNA under the U6 promoter and dead-Cas9-KRAB and GFP under the Ubiquitin C promoter (pLV hU6-sgRNA hUbC-dCas9-KRAB-T2a-GFP, a gift from Charles Gersbach, Addgene plasmid #71237 RRID:Addgene_71237). The guides were inserted into the backbone using annealed oligos and the BsmBI cloning site. Lentiviruses were produced as described below yielding titers between 4.9E+08 and 9.3E+09. Control virus with a gRNA sequence not present in the human genome (LacZ) was also produced and used in all experiments. All lentiviral vectors were used with an MOI between 5 and 20. Cells were FACS sorted as described above and knock-down efficiency was validated using standard quantitative real-time RT-PCR techniques.
ZNF558-g2 GCCAAAAGCGCCGACTCGCG; ZNF558-g3 AGTCGGCGCTTTTGGCCCCG; LacZ TGCGAATACGCCCACGCGAT; VNTR-g1 GCTGCCCTGAGATATGTGTG; VNTR-g2 TACTGGAATGGGTAGGAATG; VNTR-g3 CCAGGAGCTGCACATTGAGG GFP+ cell isolation; At day 14, cells were detached with Accutase, resuspended in differentiation media with Rock inhibitor (10mM, Miltenyi) and Draq7 (BD Bioscience), and strained (70mm, BD Bioscience). Gating parameters were determined by side and forward scatter to eliminate debris and aggregated cells. The GFP-positive gates were set using untransduced cells. The sorting gates and knockdown samples was carried out using Seurat function FindMarkers (Wilcox test, p adjusted < 0.01). For SPATA18 expression, only cells with greater than zero expression for SPATA18 were included for the analysis. Cell cycle scores were computed using Seurat function CellCycleScoring. Gene ontology overrepresentation analysis was performed using the enrichGO function in the clusterProfiler package using genes that are expressed in this dataset as background (padj < 0.05) (Yu et al., 2012).
Copy number estimation for the VNTR locus near ZNF558 in human and non-human great ape samples To investigate the copy number variation for the large VNTR downstream of ZNF558 in human and non-human great ape lineages, we applied a read-depth based copy number genotyper (Sudmant et al., 2010) to a collection of 1,112 high-coverage genomes from several publicly-available resources (Bergströ m et al., 2020;Mafessoni et al., 2020;Mallick et al., 2016;Meyer et al., 2012;Prado-Martinez et al., 2013;Pr€ ufer et al., 2017). In short, sequencing reads were divided into multiples of 36-mer, which were then mapped to a repeat-masked human reference genome (GRCh38) using mrsFAST (Hach et al., 2010). Up to two mismatches per 36-mer were allowed in order to increase our mapping sensitivity and read depth of mappable sequences in our analysis was corrected for underlying GC content. Finally, copy number estimate for the locus of interest was computed by summarizing over all mappable bases for each sample.

QUANTIFICATION AND STATISTICAL ANALYSIS
Statistical details of the experiments can be found in the figure legend of related figure panels. For western blot, qPCR. qRT-PCR and size measurements Student's test was used and p < 0.05 was considered significant. Quantification of qPCR and qRT-PCR was done using the delta delta Ct method. For bioinformatical statistical analysis see each individual method section and in relevant figure legends.