CNEReg Interprets Ruminant-specific Conserved Non-coding Elements by Developmental Gene Regulatory Network

The genetic information coded in DNA leads to trait innovation via a gene regulatory network (GRN) in development. Here, we developed a conserved non-coding element interpretation method to integrate multi-omics data into gene regulatory network (CNEReg) to investigate the ruminant multi-chambered stomach innovation. We generated paired expression and chromatin accessibility data during rumen and esophagus development in sheep, and revealed 1601 active ruminant-specific conserved non-coding elements (active-RSCNEs). To interpret the function of these active-RSCNEs, we defined toolkit transcription factors (TTFs) and modeled their regulation on rumen-specific genes via batteries of active-RSCNEs during development. Our developmental GRN revealed 18 TTFs and 313 active-RSCNEs regulating 7 rumen functional modules. Notably, 6 TTFs (OTX1, SOX21, HOXC8, SOX2, TP63, and PPARG), as well as 16 active-RSCNEs, functionally distinguished the rumen from the esophagus. Our study provides a systematic approach to understanding how gene regulation evolves and shapes complex traits by putting evo-devo concepts into practice with developmental multi-omics data.


Introduction
To answer the key question of how new traits arise during the macroevolutionary process, biologists have long realized the necessity to understand the gene regulation in development responsible for morphological diversity, i.e., which genes are expressed, what regulatory element changes are involved, and how regulatory element changes affect development [1].Only recently have the field of large-scale omics and the accumulation of data matured sufficiently to explore these theoretical concepts in detail.Here, we investigate the ruminant multichambered stomach, a key mammalian organ innovation and a cornerstone of evolutionary theory, as an example to illustrate a novel framework for integrating multi-omics data to address the fundamental question of organ innovation.
The rumen hosts a diverse ecosystem of microorganisms and facilitates efficient plant fiber digestion and short-chain fatty acid uptake, which significantly promotes the expansion and diversification of ruminant animals by providing a unique evolutionary advantage relative to non-ruminants [2].This remarkable morphological innovation raises the fundamental question of how the genetic toolkit generates functional complexity through development and evolution [1,3,4].By comparing 51 ruminants with 12 mammalian outgroup species genomes, we previously identified 221,166 ruminant-specific conserved non-coding elements (RSCNEs), which span approximately 0.61% of the genome (16.5 Mb in total) [5].These RSCNEs are potential regulatory elements of proximal or distal genes for transcriptional regulation in the development of morphological and physiological traits [6].In addition, we previously sequenced two representative ruminants (sheep and roe deer) for gene expression across 50 tissues.Comparative transcriptome analysis revealed 656 rumen-specific expressed genes (RSEGs) and implied that the anatomical predecessor of the rumen is the esophagus by the most similar expression profile [5,7].There is a pressing need to understand how RSCNEs lead to changes in the expression of RSEGs.
One major bottleneck is that the cellular context, target genes (TGs), and mode of gene regulation of RSCNEs are largely unknown.First, the regulatory role of RSCNEs could be spatiotemporally dynamic and highly context-specific.Second, some RSCNEs were located distant (e.g., more than 500 kb) from any gene and therefore could not be associated with any TGs using standard the closest transcription start site (TSS) approaches, such as GREAT [8].This problem is emphasized by a recent finding that a non-coding region associated with a human craniofacial disorder causally affects the expression of SOX9 at a distance of up to 1.45 Mb during a restricted time window of facial progenitor development [9].This example motivated us to interpret the function of RSCNEs by uncovering gene regulatory networks (GRNs) with distal regulations from multi-omics data integration at different developmental time points and in different tissue types.
To tackle the aforementioned challenges, we generated time series of paired gene expression and chromatin accessibility data during rumen and esophagus development in sheep to reconstruct a developmental GRN.Our previous efforts showed that jointly modeling multi-omics data allows us to infer high-quality tissue-specific regulatory networks [10], which can be used to identify key transcription factors (TFs) during differentiation [11], reveal causal regulations [12], and interpret functionally important genetic variants [13].Taken together, we aim to integrate multi-omics data to reconstruct a genome-wide GRN during different stages of development in an apomorphic organ.Specifically, this allows us to understand how TFs bind to functional RSCNEs to coordinate cell type-specific gene expression of RSEGs and hence to gain further insights into the evolutionary development of new traits.

The landscapes of accessible chromatin regions and gene expression during rumen development
We resolved high-resolution chromatin accessibility and gene expression landscapes during rumen development by collecting ruminal epithelial cells, esophageal epithelial cells, and hepatocyte cells at five stages [embryonic day 60 (E60), postnatal day 1 (D1), postnatal day 7 (D7), postnatal day 28 (D28), and adult 1 year (Y1)] from 14 sheep (Figure 1A).Our experimental design covers the major stages of ruminal epithelium differentiation and development [14,15] and ensures an exact matching of tissues used for RNA sequencing (RNA-seq) and Assay for Transposase-Accessible Chromatin with highthroughput sequencing (ATAC-seq) libraries.In total, 37 ATAC-seq and 34 RNA-seq datasets, including biological and technical replicates, showed high quality (see Materials and methods; Tables S1 and S2).The ATAC-seq samples have an average of 115 Mb post-quality control (QC) uniquely mapped fragments to the sheep Oar_4.0 genome (Table S1; Figure S1A), which are highly enriched at TSSs (Figure S1B) and show a nucleosome structure consistent distribution (Figure S1C).We obtained 178,651 open chromatin regions (OCRs) across all samples (in average 46,872 peaks per sample) (Table S1).
Pan X et al / CNEReg Reconstructs Gene Regulatory Network for Trait Innovation Hierarchical clustering of gene expression and chromatin accessibility showed that rumen development is a multi-stage biological process (Figure 1B and C).Stages E60 and D1 clustered in one group, and D7, D28, and Y1 clustered in another group by gene expression.Chromatin accessibility patterns further distinguished stages E60 and D1.Principal component analysis (PCA) for 14,637 expressed genes and 178,651 OCRs corroborated this multi-stage pattern (Figure 1D and E).Early developmental stages E60 and D1 showed larger replicate variation than D7, D28, and Y1 at both chromatin accessibility and gene expression levels (Figure 1D and E).In addition, chromatin accessibility showed a smoother trajectory than gene expression during rumen development (Figure 1C).
The esophagus showed very similar multi-stage development (Figure S2A and B).PCA indicated a larger variance in developmental stages (PC1 32%) and a smaller variance among tissue types (PC2 25%) (Figure S2C and D).This pattern is consistent with a previous study showing that gene expression divergence between tissues/cell types increases as development progresses [16].Importantly, our chromatin accessibility data mirror this pattern, i.e., the similarity in chromatin accessibility distribution between the two tissues declines as development progresses.

Active-RSCNEs serve as enhancers in the process of rumen development and evolution
We obtained 159,837 reproducible OCRs by intersecting peaks from three replicates for the rumen and esophagus at four developmental stages.The number of reproducible OCRs was the largest at stage E60 (approximately 40%) and decreased along the developmental stages (Figure 2A), which is consistent with the observation of higher amounts of accessible chromatin at the embryonic stage [17].Most reproducible OCRs were located at distal intergenic (39.42%), intron (32.61%), and promoter (±3 kb from TSS; 21.46%) regions (Figure 2B).After overlapping the OCRs with 221,166 RSCNEs from ruminant comparative genomics analysis [5], we identified 1601 active-RSCNEs with an average length of 82 bp (Table S3).Again, the number of active-RSCNEs decreased during the developmental stages in both the rumen and esophagus (Figure 2C).They were mainly located in distal intergenic (48.95%), intron (42.40%), and promoter (4.96%) regions (Figure 2D).Compared with all reproducible OCRs, active-RSCNEs were less abundant in promoter regions by 15% (Figure S3A), and the esophagus showed a consistent trend (Figure S3B).This suggests that active-RSCNEs tend to function as distal elements during development.In addition, our observation that most active-RSCNEs are found in early developmental stages (> 90% in E60, D1, and D7) emphasizes the importance of early developmental cellular context for interpreting the regulatory role of conserved non-coding elements (CNEs).
The 1601 active-RSCNEs are composed of 414 type I and 1187 type II RSCNEs (Figure 2G; Table S3).Type I RSCNEs have no known orthologs in non-ruminant outgroups, and type II orthologs exhibit significantly higher substitution rates among outgroups [5].The ratio between type I and type II active-RSCNEs is $ 0.35, which is 4-fold lower than that of all RSCNEs (a type I/type II ratio of $ 1.77) (Figure 2G).This surprising fact suggests that type II RSCNEs tend to be more activated in the developmental stage than type I RSCNEs.Because of the deeper evolutionary origin of type II RSCNEs, they are more likely to function by altering existing regulatory elements.Furthermore, we found that active-RSCNEs are enriched for binding motifs of transcriptional regulators known to play a vital role in rumen development (AP-1, PITX1, TP63, KLF, GRHL, TEAD, OTX, and HOX; 128 motifs with Benjamini Q value < 1EÀ3 are listed in Table S4), suggesting that some active-RSCNEs may act as rumen developmental enhancers.
To assess whether the RSCNEs are likely to play an enhancer role, we next compared our 1601 active-RSCNEs with the 523,159 developmental regions of transposaseaccessible chromatin (d-TACs) from mice [18] and 926,535 human enhancers from Encyclopedia of DNA Elements (ENCODE) phase III [19].Approximately 24% of the active-RSCNEs can be found in these datasets (Figure 2H), and 11 active-RSCNEs show in vivo reporter activity according to the VISTA database [20] (Figure 2H).To validate the potential regulatory activity, 10 active-RSCNEs of length $ 300 bp were randomly selected and assessed for enhancer activity detection in both sheep and goat fibroblasts in vitro.Nine of them showed significantly higher luciferase transcriptional activation than the pGL3-promoter control (ttest, P < 0.05; Figure 2I).Collectively, these results suggest that active-RSCNEs potentially serve as enhancers in the process of rumen development and evolution.

CNE interpretation method by GRN
After finding that active-RSCNEs may function as enhancers and hence have significant impacts on morphological evolution [21], we next developed the CNE interpretation method to integrate multi-omics data into gene regulatory network (CNEReg) as an evolutionarily conserved non-coding element interpretation method.The method works by modeling the paired gene expression and chromatin accessibility data during rumen and esophagus development and consolidating them into a GRN.A GRN helps to understand in detail the process of TF binding to active-RSCNEs and how this leads to the cell type-specific activation of RSEGs during different stages of development.CNEReg takes as input a set of paired timeseries gene expression and chromatin accessibility data, ruminant comparative genomes, and comparative transcriptomes and outputs the developmental regulatory network of the active-RSCNEs.The three major steps of CNEReg include multi-omics data integration, model component identification, and developmental regulatory network inference (Figure 3A and B; see Materials and methods).The major steps and results of developmental regulatory network reconstruction are illustrated in the following sections.

Identifying TTFs during rumen development and evolution
We proposed toolkit transcription factors (TTFs) as the core concept of CNEReg and developed a computational pipeline to define and discover the developmental genetic TTFs in evo-devo that may control development, pattern formulation and identity of body parts, and recruit novel function (details in Materials and methods).We first separated 37 TFs from 619 non-TF TGs in 656 RSEGs.These 37 TFs were further filtered by a more stringent expression specificity Jensen-Shannon Median expression Score (JMS) and were required to have nearby active-RSCNEs 1 Mb upstream or downstream of the TSS (see Materials and methods).Finally, 18 TTFs were defined (Table S5).Their expression profile phylogeny well recovered the tissue lineage system (Figure 4A).Rumen was clustered the closest to the reticulum, omasum, esophagus, Luciferase activity assay of 10 active-RSCNEs randomly chosen from 1601 active-RSCNEs.RSCNE, ruminant-specific conserved noncoding element; RSEG, rumen-specific expressed gene; GO, Gene Ontology; UTR, untranslated region; d-TAC, developmental region of transposase-accessible chromatin; cCRE, candidate cis-regulatory element.and then skin and other keratin tissues, which is consistent with the basic stratified epithelium shared in the rumen with skin.These 18 TTFs also well represented the major functions of the rumen associated with other tissue systems, including the gastrointestinal system, integumentary system, reproductive system, muscular system, nervous system, and endocrine system (Figure 4B).
We observed that the rumen recruited TTFs from multiple tissues to drive gene expression.More TTFs were expressed from the gastrointestinal system than from other systems.For example, paired box protein 9 (PAX9) is a known key TF during esophagus differentiation that may play an important role in the origin of rumen from the esophagus [22].The homeobox family TFs HOXC8 and HOXC4, together with PITX1, are key developmental regulators of specific positional identities on the anterior-posterior axis [23,24].The other four TTFs, OVOL1, SOX21, TFAP2A, and TP63, are from the integumentary system and serve as master regulators in the regulation of epithelial development and differentiation [25][26][27][28].
We classified 18 TTFs into two types according to their dynamic gene expression pattern during rumen development.PITX1, BARX2, SOX2, GRHL1, GRHL3, TFAP2A, OTX1, DMRT2, and TWIST2 are early-development TTFs showing the highest expression at E60 or D1 (Figure 4C).In contrast, PAX9, TP63, HOXC4, SOX21, HOXC8, OVOL1, PPARG, POU2F3, and TEAD4 were late-development TTFs and were highly expressed at D7, D28, or Y1 (Figure 4C).We further associated those TTFs with 6 cell types by scRNA-seq data in the skin organoid culture system [29], and they showed specific expression levels at single-cell resolution in a complex skin organ model by reprogramming pluripotent stem cells.For example, TWIST2 is specifically expressed in fibroblast (Figure 4C).TWIST2 remodels chromatin accessibility to regulate the maturation of fibroblasts [30] and is required for epithelial-mesenchymal transition [31].Its high expression at the early developmental stage of the rumen may relate to the ruminal epithelial development.Totally, our results indicated these identified TTFs act important regulatory roles in diverse cell types of the rumen.

Constructing upstream and downstream regulations of rumen TTFs
To explore how TTFs are regulated and recruited, we scanned the active-RSCNEs near TTFs for sequence-specific TF motif binding by HOMER [32], retained those TFs correlating well with TTFs (Spearman's correlation coefficient > 0.6 across RNA-seq samples), and fitted a linear regression model integrating our paired expression and chromatin accessibility data to reveal upstream regulators of 18 rumen TTFs (Figure 3B; see Materials and methods).The resulting upstream regulatory network of TTFs (Figure 4D) identified 39 active-RSCNEs (15 type I and 24 type II) bound by 113 TFs for 18 TTFs (Table S6).GRHL1, an important regulator of keratin expression [33], is regulated by 31 TFs via 6 active-RSCNEs, suggesting its potential roles in rumen development.
To explore the regulatory roles of these 18 TTFs, we first scanned 1440 active-RSCNEs located 1 Mb upstream or downstream around 512 RSEGs [fragments per kilobase per million mapped reads (FPKM) > 1 in at least one development stage] by HOMER [32] for binding sites of the 18 rumen TTFs.Then, a linear regression model quantitatively associated the accessibility of active-RSCNEs with the expression of TTFs and RSEGs (Figure 3B; see Materials and methods).The resulting downstream regulatory network of TTFs linked 139 active-RSCNEs (26 type I and 113 type II) with 14 TTFs and 93 RSEGs (Figure 5A; Table S7).RSEGs were categorized into seven different tissue systems by their expression specificity [5,7].The gastrointestinal and integumentary systems both have 28 RSEGs that are functionally enriched in the hair/molting cycle process (Fisher's exact test, adjusted P = 1.50EÀ2) and regulation of antimicrobial peptide production (Fisher's exact test, adjusted P = 3.58EÀ6).This is con-sistent with our previous finding that the rumen evolved several important antibacterial functions specifically managing the microbiome composition [2].The SLC14A1 gene was specifically highly expressed in the rumen and hypothesized to be recruited from the urinary system (Figure 5A).CNEReg identified four active-RSCNEs bound by three TTFs, OTX1, PPARG, and SOX21, to regulate SLC14A1 (Figure 5B).
CNEReg designed a functional influence score by integrating regulation and conservation in evolution (Figure 3B; see Materials and methods) and ranked the active-RSCNEs in TTF upstream (Figure S4; Table S6) and downstream networks (Figure S5; Table S7).Then, we selected the top 10 active-RSCNEs for enhancer activity detection in sheep fibroblasts in vitro.Nine of ten showed significantly higher luciferase transcriptional activation than the pGL3-promoter control (ttest, P < 0.05) (Figure S6).Collectively, CNEReg provides a high-quality developmental regulatory network to study rumen evolution.

Regulatory sub-network underlying the rumen and esophagus divergence
We previously hypothesized that the anatomical predecessor of the rumen is the esophagus based on their similar expression profile compared with 49 other tissues [5,7].It is therefore of interest to identify the gene regulatory network underlying the differentiation between the rumen and esophagus.We first identified differentially expressed genes (4, 258, 577, and 2372 for E60, D1, D7, and Y1, respectively, in Figure S7A) and differentially accessible regions (9436, 10,004, 3984, 3566, and 26 for E60, D1, D7, D28, and Y1, respectively, in Figure S7B) between the rumen and esophagus at each developmental stage.Then, we identified six TTFs (PPARG, SOX21, TP63, OTX1, SOX2, and HOXC8) showing both significant differences in expression and in motifs enriched within the rumen OCRs (Figure 6A; see Materials and methods).HOXC8 showed the largest difference at the earliest developmental stage, both in expression level and motif enrichment, and SOX21, SOX2, OTX1, and PPARG showed similar trends.TP63 differentiates from D7, in which the gene expression level and motif enrichment decline quickly in the esophagus but not in the rumen.
We extracted the six differential TTFs from the TTF downstream regulatory network to form a regulatory sub-network that also included 24 differentially expressed RSEGs and 38 active-RSCNEs (Figure 6B; Table S8).The 24 differentially expressed RSEGs were classified into gastrointestinal, integumentary, reproductive, nervous, muscular, immune, and urinary systems, and 10 of 24 non-TF RSEGs were classified into integumentary systems.Seven non-TF RSEGs (KRT17, KRT36, LOC101118712, ATP6V1C2, KLK10, SPINK9, and IRX) were regulated by SOX21.A previous study revealed that SOX21 could determine the fate of ectodermal organs and control epithelial differentiation [28].We observed that SOX21 binds to RSCNE with genomic coordinates ''chr11: 40325877-150" to regulate the expression of KRT17, KRT36, and LOC101118712.The functional influence of RSCNE with genomic coordinates ''chr11:40325877-150" was ranked at the top of all type II active-RSCNEs in the differentially regulatory sub-network (Table S8).These RSEGs Pan X et al / CNEReg Reconstructs Gene Regulatory Network for Trait Innovation were enriched in epidermal development, formation of anatomical boundaries, and urea transmembrane transport biological processes (Table S9), which are consistent with the functional differences between the rumen and esophagus.The 38 active-RSCNEs may imply the potential genetic basis of rumen origin and evolution from the esophagus.

Transposable elements may rewire the GRN through active-RSCNEs
After interpreting active-RSCNEs as important regulators of TTFs and RSEGs in rumen development, we next addressed the genomic origin of the active-RSCNEs.Transposable elements (TEs) are known to constitute a high proportion of taxonomy-specific CNEs, play a central role in rewiring gene regulatory networks, and facilitate the novel or rapid evolution of ecologically relevant traits [34,35].Hence, we estimated the percentage of active-RSCNEs may be derived from TEs.Among 39 and 139 active-RSCNEs in the TTF upstream and downstream networks, we identified 6 (15.38%) and 12 (8.6%)TEs, respectively.This gives a 1.8-fold enrichment of TEs in active-RSCNEs associated with TTFs relative to non-TTF RSEGs.At the gene level, 6 of 18 TTFs (33.33%) and 12 of 93 RSEGs (12.90%) are regulated by TEs via active-RSCNEs.This gives a 2.58-fold enrichment of TEs associated with TTFs relative to RSEGs.If we associated the TEs with RSEGs by their proximity in genome coordinates, there were 85 TEs around all 656 RSEGs (±200 kb), i.e., 13% of RSEGs are associated with TEs in average.However, 33.3% TTFs are associated with TEs, and this is 1.56-fold higher than RSEGs.Together, our data suggest that TEs may recruit TTFs and rewire the regulatory network to give rise to trait novelties.

Discussion
The evolution of new traits is driven by several types of genetic reprogramming, including mutations in protein-coding genes and post-transcriptional mechanisms, the transformation of regulatory elements, such as promoters and enhancers, and the recruitment of gene expression from other organs [36,37].Mutations in non-coding regulatory regions are believed to Pan X et al / CNEReg Reconstructs Gene Regulatory Network for Trait Innovation selectively perturb TG expression in a specific tissue context and thereby circumvent any pleiotropic effects from proteincoding mutations [38].Recent advances in comparative genomics, along with the increased availability of whole genome sequences, have led to the identification of many CNEs, which are assumed to have regulatory functions [1,6,39].Therefore, the time is ripe for an analytical framework to investigate the regulatory role of such CNEs.
Biologically, we propose a model of gene expression recruitment by CNEs.Our results show how CNEs can regulate gene expression as either trans-regulatory elements (TTFs in this study) or cis-regulatory elements (CREs; active-RSCNEs in this study) of TGs (RSEGs).Methodologically, CNEReg provides a framework to integrate comparative genomics, comparative transcriptomic, and multi-omics data to interpret CNEs by GRN.On the one hand, GRN identifies TTFs and active-RSCNEs as hypotheses, which need to be pursued by in vitro and in vivo functional studies.On the other hand, GRN presents the global picture of how the rumen recruits gene expression from other tissues by activating RSCNEs to achieve many traits.This allows us to explore many biological hypotheses and rank candidates for further functional study.As an example, we reconstructed a sub-network underlying rumen and esophagus divergence, which could further interpret the differences between the rumen and its ancestral organ.The identified TTFs and active-RSCNEs in the sub-network may account for the origin and evolution of the rumen.For example, as one of the genes in Hox gene family, HOXC8 has been implicated in the divergence of axial morphology [40].The proper expression of Hox genes is essential for the precise patterning of the body plan.HOXC8 shows differences in gene expression and motif enrichment between the rumen and esophagus during development.These results indicate that the rumen employed a different gene regulatory program when differentiating from the esophagus.
Our method for systematically interpreting conserved cisregulatory sequences in non-coding regions by integrating developmental multi-omics data will have a broad interest in other applications.For example, the Zoonomia project describes a whole-genome alignment of 240 species comprising representatives from more than 80% of mammalian families [41].The bird 10,000 Genomes Project provides a comparative genome dataset for 363 genomes from 92.4% of bird families [42].Recently, 6.9 million CNEs from many vertebrate genomes have been collected into the dbCNS and await interpretation [43].
Our work is limited in several aspects.CNEReg infers gene regulation as the interaction of TFs with accessible DNA regions in development and relies on the correlation of gene expression and chromatin accessibility across samples.A much deeper understanding can be revealed by ChIP-seq data and 3D chromatin interaction data to provide physical enhancerpromoter interactions.In addition, time course regulatory analysis of omics data measured at shorter and closer developmental stages will help us to infer more accurate regulatory network [12].Furthermore, developmental samples are known as a heterogeneous mixture of many cell types, and it will be fruitful to infer the GRNs of the underlying cell types based on scATAC-seq and scRNA-seq data [10].
In conclusion, CNEReg is demonstrated as a systematic approach to understanding the large-scale maps of CNEs by modeling omics data over development for its act on gene regulation.We see the potential that CNEReg can be generalized to understand the complex traits or the origin and evolution of vertebrate organs with multi-omics data generated in proper time and space.Our method allows evo-devo thinking in how gene regulation could evolve and shape animal evolution.

CNEReg infers developmental regulatory network to interpret CNEs
CNEReg aims to systematically fill the gap between CNEs and their significantly impacted morphology in evolution.This is done by reconstructing a developmental regulatory network by paired time series of paired gene expression and chromatin accessibility data.Particularly in sheep, CNEs are RSCNEs, and morphology is the innovation of the rumen organ, which is further denoted by the set of rumen-specific genes.We reconstructed a gene regulatory network during rumen development to systematically understand how TFs regulate genes via batteries of RSCNEs, which over development led to the cell type-specific activation of RSEGs.
The main idea of CNEReg is to define those TTFs as major players in evo-devo and to study how those TFs are regulated  Note: CNEReg, conserved non-coding element interpretation method to integrate multi-omics data into gene regulatory network; TTF, toolkit transcription factor; TF, transcription factor; RSEG, rumen-specific expressed gene; RSCNE, ruminant-specific conserved non-coding element.
by RSCNEs and how they utilize RSCNEs to regulate RSEGs.CNEReg models the expression of TGs conditional on the chromatin accessibility of RSCNEs and the expression of TFs.CNEReg is composed of three steps, as shown in Figure 3, and uses three equations to model (1) the expression of TTFs, (2) the expression of RSEGs, and (3) the functional influence of RSCNEs (Figure 3; Table 1).

Defining and identifying TTFs
We identified TTFs by their nearby evolutionally conserved CREs in the genome, expression patterns across tissues, and expression levels in developmental stages.TTFs should satisfy-four conditions: (1) TFs should be rumen-specifically expressed genes (37 TFs in the 656 RSEGs); (2) there should be active-RSCNEs around TFs (±1 Mb, 35 TFs remain); (3) TFs should be expressed (FPKM > 1) at least one time point during rumen development (30 TFs remain); and (4) these TFs should have additional tissue specificity.TFs were ranked by our tissue specificity JMS (see ''Defining tissue specificity score" section), and only the TFs for the top 50 specificities in at least one tissue were selected (18 TFs remain).Finally, 18 TFs were identified as TTFs and are listed in Table S5.These TFs played a leading role in rumen development (Table S5) and served as the main component to construct the rumen developmental regulatory network.

Modeling expression of TTFs
We modeled how TTFs are regulated from paired gene expression and chromatin accessibility data to reconstruct the upstream regulatory network of TTFs.We established a linear regression model as follows to reveal the upstream regulators of the 18 TTFs (schematic illustration in Figure 3 and mathematical notations in Table 1).
where TTF l is the expression of the l-th TTF; MB i is the set of TFs with significant motif match in the i-th active-RSCNE; and TF m is the expression of the m-th candidate TF with a binding motif to regulate the l-th TTF.The Spearman correlation coefficient between TF m and TTF l is greater than 0.6 [false discovery rate (FDR) Q value < 0.01] to ensure the potential regulatory relationship; O i represents the chromatin accessibility score of the i-th active-RSCNE within 2 Mb around the l-th TTF.b is the parameter to be estimated.If b l;i is statistically significant non-zero in the regression analysis, the i-th active-RSCNE and its TFs in MB i will be contained in the upstream regulatory network of the l-th TTF.

Modeling expression of RSEGs
We modeled how the RSEGs are regulated by TTFs and their active-RSCNEs from paired gene expression and chromatin accessibility data, i.e., to reconstruct the downstream network regulated by TTFs.We established the linear regression model as follows (schematic illustration in Figure 3 and mathematical notations in Table 1): where TTF l is the expression of the l-th TTF; O k represents the chromatin accessibility score of the k-th active-RSCNE with binding sites of the l-th TTF; and RSEG n is the expression of the n-th RSEG with the k-th active-RSCNE within approximately 2 Mb.In practice, we determine the downstream regulation relationship with Spearman correlation that can eliminate the outlier values to simplify the calculation.When the Spearman correlation coefficient c l;n;k between RSEG n and ðTTF l Á O k Þ 1 2 is greater than 0.7 (FDR Q value < 0.01), the n-th RSEG is likely to be regulated by the l-th TTF through binding to the k-th active-RSCNE.The extracted TTF, active-RSCNEs, and RSEG triplets are formed the TTF's downstream regulatory network.

Quantifying the functional influence of active-RSCNEs
We quantified the functional influence of active-RSCNEs, ranked the active-RSCNEs, and selected the top active-RSCNEs as experimental candidates.This task can be done by integrating the RSCNE's conservation score in comparative genomics analysis with its regulatory potential in our developmental regulatory network.
We first collected conservation scores of active-RSCNEs from a comparative genomics study [5].RSCNEs were classified into two types by their conservation patterns across species.Type I RSCNEs had no outgroup sequence aligned, and type II RSCNEs had orthologous sequences in one or more outgroups but were only conserved in ruminants.For the k-th active-RSCNE, the conservation score C k was calculated by the PhastCons score (type I) or PhyloP score (type II).
We then estimated the regulatory strength of active-RSCNEs in the upstream and downstream regulatory networks of TTFs.An active-RSCNE played a regulatory role in the regulatory network if four conditions were satisfied: (1) this active-RSCNE should be a chromatin-accessible peak; (2) TTFs should bind to this active-RSCNE; (3) RSEGs regulated by this active-RSCNE with TTF binding should be expressed; and (4) the expression of binding TTFs and the accessibility of this active-RSCNE should be correlated with the expression of regulated RSEGs.By combining these four factors, we defined the regulatory strength R k;t of the k-th active-RSCNE at time point t in the regulatory network as follows: where O k;t is the chromatin accessibility score of the k-th active-RSCNE at time point t in the rumen; B k;l is the motif binding strength of the l-th TTF on the k-th active-RSCNE (computed by HOMER); TTF l;t is the expression of the l-th TTF at time point t in the rumen; RSEG n;t is the expression of the n-th RSEG at time point t in the rumen; and c l;n;k is the Spearman correlation coefficient between RSEG n and ðTTF l Á O k Þ 1 2 from the regulatory network.Then, the regulatory strength R k of the k-th active-RSCNE was defined as the maximum value across all time points in rumen samples as follows: The regulatory strength R k is from the multi-omics data in development, and the conservation score C k is from multigenome data across species.The two measures are at the regulation level and genome sequence level, respectively.They can be naturally assumed to be independent of each other.In practice, we found that the regulatory strength and the conservation score were quite complementary to each other (Figures S4 and S5) for active-RSCNEs.Hence, we defined the functional influence W k of the k-th active-RSCNE as the geometric mean of the regulatory strength R k and the conservation score C k as follows: This functional influence score allows us to prioritize active-RSCNEs by approximating their importance in rumen innovation.
Defining tissue specificity score Specificity illustrates the property that genes are functional in one particular biological context compared with other contexts.For our transcriptomics data across 50 tissues in sheep, genes highly expressed in only one or several tissues but not expressed in other tissues were defined as tissue specific.Our gene expression matrix had 23,126 rows (the number of expressed genes) and 830 columns (the number of samples sequenced in 50 sheep tissues with each tissue having several biological replicates; Table S10).
To quantify the tissue specificity, we proposed a Jensen-Shannon Median expression Score (JMS) for a gene in certain tissues to combine the gene expression level with a Jensen-Shannon divergence (JSD) value as follows: where medðGÞ represents the gene's median expression in a certain tissue across biological replicates.ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi medðGÞ 3 p can guarantee that the numerator and denominator are of the same magnitude.JSD is the Jensen-Shannon divergence to evaluate the gene's expression specificity introduced in [44].It adopts an entropy-based measure to assess the similarity between two probability distribution statistics as follows: where P ¼ ðx 1 ; x 2 ; Á Á Á ; x n Þ and Q ¼ ðy 1 ; y 2 ; Á Á Á ; y n Þ are two probability distributions constructed from our gene expression values across tissues.n is the number of samples.Given each row of our gene expression matrix, we normalized the gene expression vector, i.e., each element in this vector was divided by the sum of all elements.For a given gene, Q ¼ ðy 1 ; y 2 ; Á Á Á ; y n Þ is its corresponding normalized row vector.Given the tissue we are interested, P ¼ ðx 1 ; x 2 ; Á Á Á ; x n Þ is constructed as a control vector whose components are 1  m in the given tissue with m replicates and 0 in other tissues.Finally, the JSD will be calculated as the divergence between P and Q for a certain gene in certain tissue.The smaller the JSD value was, the more specific this gene was in this tissue.
In summary, our JMS provided a relative specificity score by a nonlinear measure of divergence by emphasizing significantly highly expressed genes in certain tissues to enhance specificity.This JMS allows us to better explore the TTF expression patterns and recruitment of genes based on tissue specificity.

Differential regulatory network construction between rumen and esophagus
We constructed a differential regulatory network between rumen and esophagus by extracting differential RSEGs, differential TTFs, and active-RSCNEs associated sub-network from the regulatory network of TTFs.The differential RSEGs and differential TTFs are defined as follows.

Differential RSEGs between rumen and esophagus
We used the R packages ''limma" and ''edgeR" to extract differential genes at four developmental time points (E60/D1/D7/ D28) with thresholds of FDR < 0.05 and log 2 fold change (FC) > 1 (FC of FPKM in the rumen relative to that in the esophagus).It was noted that at time point Y1, we had only one biological replicate for RNA-seq data in the rumen and esophagus separately, and we could not perform an F test on these two samples.Instead, we identified genes with FPKM > 2 in the rumen and FC > 2 as differential genes.Then, we combined differential genes at five time points to obtain differential gene sets between rumen and esophagus.Differential RSEGs between rumen and esophagus were the intersection of the differential gene set and the RSEG set in the regulatory network of TTFs.

Differential TTFs between rumen and esophagus
We first collected 1027 TFs of sheep from animalTFDB3.0(http://bioinfo.life.hust.edu.cn/AnimalTFDB/#!/).The 15,835 expressed genes in the rumen and esophagus were intersected with these 1027 TFs to obtain 768 TFs for the following analysis.We used HOMER to find TFs binding to the differential accessible peaks with threshold of Àlog 10 P value > 6 at each time point.Then, we used the R packages ''limma" and ''edgeR" to obtain differentially expressed TFs at four time points (E60/D1/D7/D28) with thresholds of FDR < 0.05 and log 2 FC > 1.We identified differentially expressed TFs at time point Y1 with threshold FPKM > 2 in the rumen and FC > 2. The differential TF set was defined as the intersection of TFs binding to differential accessible peaks and differentially expressed TFs. Differential TTFs between the rumen and esophagus were the intersection of the differential TF set and TTF set in the regulatory network of TTFs.

Hierarchical clustering and PCA
We performed hierarchical clustering on the gene expression and peak chromatin accessibility profiles in 14 rumen samples at five time points (E60/D1/D7/D28/Y1).Heatmap was mega) after 24-h transfection.Each assay was monitored at least five times, independently.The t-test was applied to calculate the significance of the regulatory activity using GraphPad Prism 7.0 software (Prism, San Diego, CA).Statistically significant differences need to meet the criterion of P < 0.05.

Figure 1
Figure 1 Time-series data of paired expression and chromatin accessibility reveal the regulatory landscape for rumen development A. Experimental design diagram for multi-replicate, multi-tissue, and multi-level omics data profiling during sheep development from E60 to postnatal stages (D1, D7, and D28) to Y1. Hierarchical clustering of gene expression for 14,637 genes (B) and chromatin accessibility for 178,651 OCRs (C).Unsupervised PCA of rumen gene expression (D) and chromatin accessibility (E).ATAC-seq, Assay for Transposase-Accessible Chromatin with high-throughput sequencing; RNA-seq, RNA sequencing; E60, embryonic day 60; D1, postnatal day 1; D7, postnatal day 7; D28, postnatal day 28; Y1, adult 1 year; OCR, open chromatin region; PCA, principal component analysis; PC, principal component.

Figure 2
Figure 2 Characterization of active-RSCNEs as developmental enhancers A. Number of reproducible peaks at each developmental stage in the rumen and esophagus.B. Annotating reproducible peaks by location in different genomic regions.C. Number of active-RSCNEs at each developmental stage in the rumen and esophagus.D. Annotating active-RSCNEs by location in different genomic regions.E. The genes nearest to active-RSCNEs are enriched in RSEGs.P value is calculated by Fisher's exact test.F. GO enrichment analysis for genes near the active-RSCNEs.G. Number of type I and type II RSCNEs in total RSCNEs and active-RSCNEs.H.The intersections among active-RSCNEs with enhancers from d-TACs, cCREs, and VISTA.I.Luciferase activity assay of 10 active-RSCNEs randomly chosen from 1601 active-RSCNEs.RSCNE, ruminant-specific conserved noncoding element; RSEG, rumen-specific expressed gene; GO, Gene Ontology; UTR, untranslated region; d-TAC, developmental region of transposase-accessible chromatin; cCRE, candidate cis-regulatory element.

Figure 3
Figure 3 CNEReg interprets RSCNEs by reconstructing the developmental regulatory network A. CNEReg inputs paired time-series gene expression and chromatin accessibility data, ruminant comparative genomes, and comparative transcriptomes, and outputs the developmental regulatory network of active-RSCNEs.Three major steps of CNEReg include multi-omics data integration, model component identification, and developmental regulatory network inference.B. The developmental regulatory network reconstruction is further illustrated in three steps.Step 1: inferring the upstream regulation of rumen TTFs.Step 2: inferring the downstream regulation of TTFs to TGs via active-RSCNEs.Step 3: deriving active-RSCNE's functional influence score by integrating regulatory strength in the network and evolutionary conservation score.The model components and notations of CNEReg are detailed in Table1.TG, target gene; TF, transcription factor; TTF, toolkit transcription factor; CNEReg, conserved non-coding element interpretation method to integrate multi-omics data into gene regulatory network; RSCNE, ruminant-specific conserved non-coding element.

Figure 5
Figure 5 Downstream regulatory network of rumen TTFs A. Downstream regulatory network with 14 rumen TTFs regulating 93 TGs via 139 active-RSCNEs.TTFs are colored by the tissue they are highly expressed, and TGs are annotated and colored by their biological system.B. An example from the regulatory network shows that SLC14A1 is regulated by four active-RSCNEs with TTF motifs.The expression and chromatin accessibility tracks are derived from rumen ATAC-seq (D1 or D7) and RNA-seq data (Y1).

Figure 6
Figure 6 Regulatory network sheds light on the difference between the rumen and esophagus in the development A. Dynamics across stages for the six differential TTFs between the rumen and esophagus by integrating motif enrichment in differential ATAC-seq peaks and gene expression levels.B. Downstream regulatory sub-network of six differential rumen TTFs.FPKM, fragments per kilobase of exon per million mapped fragments.
TTF l;t = expression of the l-th TTF on t-th time point TTF HOXC8 = 25.48 on D7 in rumen Expression of TFs TF m = expression of the m-th TF TF JUN = 1035.79on D7 in rumen Expression of RSEGs RSEG n;t = expression of the n-th RSEG on t-th time point RSEG SLC14A1 = 42.34 on D7 in rumen Accessibility of active-RSCNEs O k;t = openness of the k-th active-RSCNE on t-th time point O Chr1:196579342À242 = 18.83 on D7 in rumen TFs with motif match in an active-RSCNE MB i;l = the set of TFs with significant motif match in i-th active-RSCNE HOXC8 has motif match at active-RSCNE Chr1:196579342-242 Motif matching strength of TFs on RSCNEs B i;l = sum of Àlog P value of l-th TF's motif on i-th active-RSCNE B Chr1:196579342À242 = 4.28486

Table 1
Components and notations of the CNEReg model