Impact of 3D genome organization, guided by cohesin and CTCF looping, on sex-biased chromatin interactions and gene expression in mouse liver

Several thousand sex-differential distal enhancers have been identified in mouse liver; however, their links to sex-biased genes and the impact of any sex-differences in nuclear organization and chromatin interactions are unknown. To address these issues, we first characterized 1847 mouse liver genomic regions showing significant sex differential occupancy by cohesin and CTCF, two key 3D nuclear organizing factors. These sex-differential binding sites were primarily distal to sex-biased genes but rarely generated sex-differential TAD (topologically associating domain) or intra-TAD loop anchors, and were sometimes found in TADs without sex-biased genes. A substantial subset of sex-biased cohesin-non-CTCF binding sites, but not sex-biased cohesin-and-CTCF binding sites, overlapped sex-biased enhancers. Cohesin depletion reduced the expression of male-biased genes with distal, but not proximal, sex-biased enhancers by >10-fold, implicating cohesin in long-range enhancer interactions regulating sex-biased genes. Using circularized chromosome conformation capture-based sequencing (4C-seq), we showed that sex differences in distal sex-biased enhancer–promoter interactions are common. Intra-TAD loops with sex-independent cohesin-and-CTCF anchors conferred sex specificity to chromatin interactions indirectly, by insulating sex-biased enhancer–promoter contacts and by bringing sex-biased genes into closer proximity to sex-biased enhancers. Furthermore, sex-differential chromatin interactions involving sex-biased gene promoters, enhancers, and lncRNAs were associated with sex-biased binding of cohesin and/or CTCF. These studies elucidate how 3D genome organization impacts sex-biased gene expression in a non-reproductive tissue through both direct and indirect effects of cohesin and CTCF looping on distal enhancer interactions with sex-differentially expressed genes.


Introduction
Sex differences in gene expression are found in several non-reproductive tissues, including the brain [1], immune system [2], kidney [3] and liver [4]. Sex differences in the liver are associated with a higher incidence of aggressive liver cancer in males [5], increased susceptibility to autoimmune hepatitis in females [6], and sex differences in metabolism of diverse pharmaceuticals and environmental chemicals [7]. Transcriptomic and epigenetic sex differences in the transcriptome are best characterized in mouse liver, where more than 1000 genes [8], including many lncRNA genes [9,10] and miR-NAs [11], exhibit sex-biased expression regulated by the sex-differential temporal patterns of pituitary growth hormone secretion [12]. Sex differences in the epigenome are widespread, and frequently are associated with sex differences in gene distal, but not gene proximal, regulatory elements, which show characteristic sex-differential patterns of histone marks and chromatin accessibility (DNase hypersensitive sites, DHS) [13,14]. Three-dimensional looping is one mechanism that could potentially link the few thousand mostly distal sex-biased enhancers identified to individual sex-biased genes.
CCCTC-binding factor (CTCF) and the multi-protein complex cohesin are two major transcription factors regulating 3D genomic architecture. CTCF has primarily been studied for its role in DNA looping and insulation [15,16], while cohesin is a molecular motor powering DNA looping via a loop extrusion mechanism [17,18]. Loss of CTCF or cohesin is lethal in developing mouse embryos [19,20]. However, when degradation of the cohesin loading factor Nipbl is induced in adult mouse liver, a dose-dependent loss of both cohesin binding and virtually all focal DNA looping is seen without major hepatocyte toxicity [21]. Loss of DNA looping also occurs in other systems following depletion of either cohesin [22] or CTCF [23]. Thus, CTCF and cohesin are both required for DNA looping.
The functional role of cohesin at a given genomic site is largely dependent on its binding partners. Cohesin lacks sequence-specific DNA binding activity, but is loaded and unloaded from chromatin by specific protein complexes [24]. Cohesin can participate in shorter-range looping between enhancers and promoters ('enhancerpromoter loops') in association with Mediator and tissue-specific transcriptional regulators [25][26][27]. Genomic sites bound by cohesin but not CTCF, i.e., cohesin-non-CTCF sites (CNC), tend to be highly tissue specific, but frequently show weaker binding than sites where cohesin and CTCF are both bound, i.e., cohesin-and-CTCF (CAC) sites [25,28,29]. Cohesin forms insulating loops at CAC sites, which have been variously characterized as Topologically Associating Domains (TADs) [30] and intra-TAD loops [31], loop domains [32] and insulated neighborhoods [26]. A majority of CAC-mediated insulated loops are conserved across tissues and cell types [30][31][32][33] and many are even conserved across mammalian species at syntenic regions [34]. A distinct subset of CAC sites is important for enhancer-promoter interactions [35,36], and may specifically help structure superenhancers around key constituents enhancers, known as hubs [37]. The role of CTCF binding in the absence of cohesin, i.e., at Lone CTCF sites, is less clear. CTCF binds specific DNA motifs via its 11 zinc fingers [15], yet the function of Lone CTCF sites is likely also dependent on additional interacting proteins, such as the transcription factors YY1 [38] and STAT5 [39].
CAC-mediated insulating loops are typically anchored by a pair of tightly bound CTCF sites oriented toward each other by the non-palindromic CTCF motif [32]. Supporting this finding, inversion or deletion of CACbound anchors results in a loss of specific loops and subsequent rewiring to form alternative loops [17,40]. This stereotypical pattern of convergently oriented CTCF binding enables the accurate prediction of CAC-mediated loops based on CTCF and cohesin binding activity and motif orientation alone [17,31,41]. Computational prediction of loops linking enhancers and promoters, i.e., gene loops, is a more elusive goal, although some recent progress has been made [42,43]. Chromatin conformation capture technology using formaldehyde crosslinking and restriction enzyme digestion followed by proximity ligation can be employed to identify such loops experimentally, and thereby determine interaction frequencies between different genomic regions [44]. Circularized chromatin conformation capture with sequencing (4Cseq) is one such method that interrogates all potential interactions between a single site of interest and the rest of the genome [45].
Here we take a multi-pronged approach to elucidate the role of architectural proteins and 3D genome organization in regulating the widespread sex differences in gene expression seen in mouse liver. First, we identify sex-biased binding sites for cohesin and CTCF in mouse liver chromatin, a majority of which were found at intergenic sites distal to sex-biased genes. Further, we investigate the effects of a deficiency in cohesin loading in male mouse liver [21], and find that cohesin is specifically required for expression of male-biased genes with distal sex-biased regulatory elements. Finally, we use 4C-seq to directly evaluate sex differences in chromatin interactions involving sex-biased enhancers from five different genomic regions, and demonstrate the importance of loop domains for insulation of enhancer-promoter contacts at sex-biased genes. Overall, our findings highlight how 3-dimensional genome organization contributes to sex differences in liver gene expression in both direct and indirect ways.
Immunoprecipitation of sonicated mouse liver chromatin was performed as described [14]. Specifically, 5 µl of rabbit polyclonal antibody to either CTCF (Millipore, cat. # 07-729) or to the cohesin subunit Rad21 (Abcam, cat. # ab992) was mixed with 30 µl of Protein A Dynabeads (Invitrogen, cat. # 1002D) and incubated in blocking solution (0.5% bovine serum albumin in PBS) for 3 h at 4 °C. Beads were then washed with blocking solution, followed by incubation overnight with 70 µg of sonicated liver chromatin in 1X RIPA buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl, 1% IPEGAL, 0.5% deoxycholic acid) containing 0.1% SDS. After washing with 1X RIPA (containing 0.1% SDS), formaldehyde crosslinks were reversed for 6 h at 65 °C, followed by RNase A digestion (Novagen, cat. # 70856) at 37 °C for 30 min and then proteinase K digestion (Bioline, cat. #37084) for 2 h at 56 °C. The resulting crude DNA extract was purified using a QIAquick Gel Extraction Kit (Qiagen, cat. # 28706) and quantified with a Qubit HS DNA kit (Invitrogen, cat. # Q32854). All samples were processed using the same protocol and conditions. Sequencing was performed for a total of eight CTCF ChIP-seq samples (n = 4 individual male and n = 4 individual female livers) and a total of six Rad21 ChIP-seq samples (n = 3 male, n = 3 female livers). The male liver ChIP-seq samples were those reported previously [31] and are available at GSE102997. Female liver ChIP samples are available at GSE130908.

4C-seq methods
Isolation of liver nuclei and crosslinking were performed as described [31] through the step where crosslinked nuclei were pelleted by centrifugation. Digestion, proximity ligation, and inverse PCR were then carried out as described [31]. Briefly, frozen nuclei were resuspended in Buffer A (15 mM Tris-HCl pH 8.0, 15 mM NaCl, 60 mM KCl, 1 mM EDTA pH 8.0, 0.5 mM EGTA pH 8.0, 0.5 mM spermidine, 0.3 mM spermine) and quantified using a Countess Automated Cell Counter (Invitrogen, cat. # C10227). An aliquot of 10 million nuclei was used for each individual 4C experiment. Nuclei were pelleted at 1000×g for 5 min at 4 °C then resuspended in 450 μl of 1X NEBuffer 3 (NEB, cat. # B7003S). Primary restriction enzyme digestion of intact nuclei was carried out overnight at 37 °C using 50,000 U of DpnII (NEB: #R0543) with agitation at 900 RPM. DpnII was inactivated by adding SDS to a final concentration of 2%. Samples were then diluted fivefold in 1X ligation buffer (Enzymatics, cat. # B6030). Proximity ligation was performed overnight at 16 °C with 200 U of T4 DNA ligase (Enzymatics, cat. # L6030). DNA was then reverse-crosslinked and purified using a standard phenol/chloroform cleanup method following the manufacturer's protocol (VWR, cat. # VWRV0883). Secondary digestion of purified DNA was performed overnight at 37 °C with 50 U of Csp6I (Thermo Scientific, cat. # ER0211) in 500 µl of 1X Buffer B (Fermentas, cat. # BB5; 10 mM Tris-HCl (pH 7.5), 10 mM MgCl 2 , 0.1 mg/ml BSA), followed by heat inactivation at 65 °C for 30 min. Samples were diluted tenfold and secondary ligation was carried out overnight at 16 °C, as described above. The effectiveness of primary digestion, proximity ligation, secondary digestion, and secondary ligation was verified at each step by reverse crosslinking and gel electrophoresis analysis (see gel image in Additional file 1: Figure S5A). The final PCR template was purified by phenol/chloroform clean up, followed by QiaPrep 2.0 column cleanup (Qiagen, cat. # 27115) to yield a standard, circularized 4C inverse PCR template, which was amplified using specific viewpoint primers, as described below.
PCR reactions were performed using inversely oriented primer pair sequences for valid 4C-seq viewpoints, obtained from the 4CSeqpipe primer database [http://www.wisdo m.weizm ann.ac.il/~atana y/4cseq _pipe/], with the addition of 5′ dangling truncated Illumina adapters (Additional file 2: Table S3A). Candidate viewpoints were selected based on the following criteria. First, we only considered viewpoints that are in the same TAD as at least one protein-coding or lncRNA gene showing > 3-fold sex bias in its expression. Second, the viewpoint must be within 1 kb of the transcription start site (TSS) of a sex-biased gene, or it must overlap a sexbiased enhancer (minimum twofold sex-bias in normalized DHS opening or H3K27ac mark intensity). Third, the non-reading primer (Additional file 2: Table S3A) was required to map to the genome uniquely, while the reading primer was more stringently required to have > 89% unique sequence identity (i.e., no 18-mer within a 20 nt primer sequence that maps elsewhere in the genome). Inverse PCR amplification of 1 µg of each 4C template was performed using Platinum Taq DNA polymerase (Invitrogen, cat. #10966026), as follows: 94 °C for 2 min, 25 amplification cycles (94 °C for 30 s, 55 °C for 30 s, 72 °C for 3 min), then 4 °C hold. To minimize the impact of PCR artifacts, a total of 6 identical inverse PCR reactions were performed for each sample, except as noted. These replicate reactions were processed in parallel and pooled prior to library preparation. For 4C-seq analysis of the Nox4 genomic region viewpoints, single  Figure S5B. 4C-seq samples were amplified using barcoded primers, such that each biological replicate had a unique barcode (NEB, cat. # E7335). To minimize over-amplification, each sample was amplified with 5 additional cycles of PCR to fill in the full Illumina adapter sequence needed for sequencing. Due to the low sequence complexity in the first 20 bases of each sequence read, 4C-seq libraries were multiplexed by combining with high complexity sequencing libraries for unrelated samples (e.g., RNA-seq or ChIP-seq libraries), which were sequenced in the same Illumina sequencing lane. In practice, 4C-seq libraries constituted no more than 15% of the total library pool, by molarity. Samples were sequenced on an Illumina HiSeq2500 instrument for 50, 125, or 150 bp paired end reads.

Computational analysis of ChIP-seq datasets
Sequence reads were split by barcode and mapped to mouse genome assembly mm9 using Bowtie2 (v2.2.9). All reads not uniquely mapped to the genome were excluded from downstream analyses. Peak calling was performed using MACS2 (v2.1.1) with default parameters, and peaks that overlapped blacklisted genomic regions (http:// www.sites .googl e.com/site/anshu lkund aje/proje cts/black lists ) were filtered out. Additionally, we removed spurious peaks that exclusively contained PCR duplicated reads, defined as 5 or more identical sequence reads that do not overlap any other reads. All BigWig tracks used to visualize sequencing data in a genome browser were normalized for both sequencing depth and sample quality, expressed as reads in peaks per million mapped reads (RIPM). In practice, the browser y-axis displays the read count from a given sample divided by the total number of reads in peaks (reads that overlap a peak identified in any sample, or the union peak list), per million. Normalization was performed separately for the CTCF and cohesin datasets. This approach is functionally similar to the quality control metric known as Fraction of Reads in Peaks (FRiP) used by the ENCODE consortium [46]. All samples used in this study were judged to be of good quality, with a mean FRiP value of 0.217 and ranging from 0.103 to 0.344. A full listing of samples sequenced and sequencing statistics is provided in Additional file 2: Table S3B.
To identify sex-differential ChIP-seq peaks, diffReps (v1.55.4) [47] was used with default parameters and a window size of 200 bp to identify in-peak differential sites, i.e., diffReps sites that overlap a MACS2 peak, defined below. The diffReps output list of sites was filtered to remove diffReps-identified sites that did not meet the following conditions: overlap with at least one of the peaks from the union peak list for the relevant factor (Additional file 3: Table S1F, G for CTCF; Additional file 3: Table S1H, I for cohesin), contains at least 10 sequence reads, shows > 2-fold sex difference, and has an FDR < 0.05. The resultant sets of ChIP-seq peaks were defined as standard stringency sex-biased peaks, and were used for all analysis, except as noted. A set of lenient stringency sex-biased CTCF and cohesin peaks was defined, as follows. Sequence reads for biological ChIPseq replicates were combined (merged) to give a single merged sample for each sex and each factor (Male CTCF, Male cohesin, Female CTCF, and Female cohesin). For each transcription factor, the male and female merged samples were then compared, and the merged sample with the higher FRiP was down-sampled, so that the mapped read files for each sex contained the same, normalized number of reads in peaks. In practice, the combined mapped reads for the merged female cohesin samples, and for the merged male CTCF samples, were down-sampled by a factor of 0.979869 and 0.745955, respectively. MAnorm [48] was then used to compare the FRiP-normalized male and female samples to identify a set of sex-differential binding sites for each factor (minimum twofold sex bias; p-adj < 0.01; read count > 15 for the upregulated peak). Binding sites identified by MAnorm that were not on the standard sex-biased peak list were designated lenient stringency sex-biased peak lists. Bedtools (v2.26.0) was used for overlap analysis to determine the distance of ChIP-seq peaks to other genomic features. Genomic coordinates (mm9) for TAD and intra-TAD boundaries were downloaded from [31], where TAD definitions are based on experimental Hi-C analysis for male mouse liver [34]. Given the low resolution of this dataset (20-40 kb bin size), the TAD definitions used in this study are most similar to early Hi-C studies, and may underestimate the total number of domains in liver.
Sex-differential ChIP-seq peaks were analyzed separately for the sets of male and female biological replicates and then divided into four groups, CAC(∆Coh), CAC(∆CTCF), CNC(∆Coh), and Lone CTCF(∆CTCF), based on the following criteria. Sex-differential cohesin peaks (∆Coh) were designated CAC(∆Coh) peaks if they overlapped a CTCF peak identified in at least 3 of the 4 CTCF ChIP-seq biological replicates in the sex showing higher cohesin binding (e.g., male-biased cohesin peaks were compared to male liver CTCF ChIP-seq replicates). Alternatively, they were designated CNC(∆Coh) peaks if they overlapped a CTCF peak found in either 0 or 1 of the four CTCF-seq biological replicates. Those ∆Coh peaks that overlapped 2 of the 4 CTCF replicates were excluded from downstream analyses. Similarly, sex-differential CTCF peaks (∆CTCF) were designated CAC(∆CTCF) peaks if they overlapped a cohesin peak found in 2 or 3 of the three available cohesin ChIP-seq biological replicates. Sex-differential CTCF peaks were designated Lone CTCF(∆CTCF) peaks if they overlapped peaks in either 0 or 1 of the three cohesin ChIPseq biological replicates. 137 sex-differential CTCF peaks overlapped sex-differential cohesin peaks, and were thus CAC(∆Coh/∆CTCF); 50 of these 137 CAC peaks were autosomal (Additional file 3: Table S1C). Sex-independent cohesin peaks, and sex-independent CTCF peaks, were, respectively, defined by ranking each peak based on the following ratio: (RIPM-normalized ChIP signal for the merged male sample)/(RIPM-normalized ChIP signal for the merged female sample), performed separately for CTCF and for cohesin. The 1000 peaks whose ratios were closest to 1 were defined as the set of sex-independent CTCF, and cohesin, peaks.

Discovery of intra-TAD loops
CTCF motif discovery was performed using the FIMO option from MEME Suite (v4.10.0), and presence of a motif was defined as a motif score > 10. Intra-TAD loops for female mouse liver were identified using the computational method described previously for male liver [31]. The analysis pipeline was run for all CAC sites and with an initial loop count of 20,000, using the set of default parameters reported for male liver [31]. This analysis yielded 9724 intra-TAD loops with 10,273 loop anchors in female liver; this compares to 9543 intra-TAD loops and 9052 loop anchors identified in male liver [31]. The redundancy in loop anchors is a reflection of nested CAC-mediated loop structures, as was described in other studies using experimentally measured loop identification compared to the computational approach used here; these studies include ChIA-PET analysis of the cohesin subunit SMC1A in mouse embryonic stem cells [26] and Hi-C analysis in human GM12878 cells, where 9448 loops were associated with 12,903 loop anchors [17,32]. Reciprocal overlap between loops was analyzed using bedtools (bedtools intersect −wa −u −r −f 0.8), as described [31]. A total of 2527 intra-TAD loops were unique to either male or female liver; however, very few had anchors that overlapped a sex-differential CAC site, suggesting that most are not biologically relevant. Supporting this, the loops that were unique to either male or female liver were weaker than the loops shared between male and female livers, and in many cases the loops narrowly met the significance cutoff in one sex but not the other. This finding is similar to our earlier finding that tissue-specific loops are often weaker than those predicted in multiple tissue types [31]. Intra-TAD loops for male and female mouse liver are listed in Additional file 3: Table S1J, and female intra-TAD loop anchors are listed in Additional file 3: Table S1K; a comparable listing for male liver is available in [31].

Computational analysis of 4C-seq datasets
Biological replicates were demultiplexed by index read barcode. As the fastq files for each biological replicate contained sequence reads from multiple viewpoints, the reads in each file were further split based on matches to the reading primer for each viewpoint (Additional file 2: Table S3A). Then, prior to mapping, we used FASTX-Toolkit (v0.0.14) to remove the first 20 nt of sequence from the 5′ end of each read, as this represents the reading primer. For read length consistency, we trimmed 25 nt from the 3′ end of 150 nt read libraries, making them identical in length to the 125 nt libraries (i.e., both were 105 bases long after 5′ and 3′ trimming). Bowtie2 (v2.2.2) was then used to iteratively map the reads to a reduced mouse genome, which comprised all genomic sequences 105 bp upstream and 105 bp downstream from each DpnII cut site in the genome (recognition sequence: GATC), as in [45,49]). To implement this step, we scanned the genome for all occurrences of the DpnII cut site GATC using the UCSCutils tool oligoMatch (exactmatch; default parameters). The resulting set of coordinates was then expanded to include sequence 105 bp upstream and 105 bp downstream of each DpnII site (bedtools flank − l 105 − r 105), which was extracted from the mm9 genome using bedtools getfasta. This reduced mouse genome sequence was indexed using bowtie2 prior to mapping. An iterative mapping strategy was implemented because some reads contained multiple ligation junctions, rendering the full-length read unmappable, as described in some Hi-C pipelines [50]. First, non-uniquely mapped reads were trimmed by ~ 10% of their total length (starting from the 3′ end of the read), and mapping was reattempted as above. This process was repeated until a minimum read length of 20 nt was reached, with a step size of 2 nt (for 50-nt-long reads) or a step size of 10 bp (for 125 or 150-nt-long reads) per iteration. These iterative trimming and remapping steps increased the overall percentage of uniquely mapping reads by 2-6% for shorter read lengths libraries, and by 8-22% for longer read length libraries.
After mapping, bedGraph files were smoothed using ucscutils (v. 20130327) with a sliding window of 11 restriction fragments, taking the median value in the window. Smoothed BigWig tracks were normalized by total reads per million to account for differences in sequencing depth. Merged replicates shown in the main figure panels were generated by taking the median signal at a given restriction fragment per viewpoint and sex. The genomic region overlapping the viewpoint fragment was removed prior to BigWig generation for merged replicates. Tracks were visualized in the WashU genome browser with replicates overlaid using the Matplot feature. Additional file 2: Table S3C shows the sequences used for demultiplexing 4C libraries as well as basic statistics for read mapping and quality control. 4C-seq data were also analyzed by using the FourCSeq Bioconductor package [51] to identify interacting regions for each viewpoint. Interacting regions were identified using the 'addpeak' command and are reported in Additional file 4: Table S4 (FDR < 0.05 and Z score > 3). Significance of annotated interacting regions shown in individual figures was calculated using the "subjecthits" command (4C-seq signal peak ± 2.5 kb).

Impact of cohesin depletion sex-biased gene expression
RNA-seq data for wild-type and cohesin-depleted (Nipbldeficient) mouse liver (GSE93431) [21] was analyzed for n = 4 wild-type (control) and n = 4 Nipbl-depleted male mouse liver replicates. Data was RPKM-normalized, and reads were expressed relative to the mean of the wild-type group, which was set = 1 on a per gene basis by dividing the expression value for each individual replicate by the mean of the wild-type group plus a small pseudo count (1e−6) to avoid dividing by zero. Data is presented as mean relative expression ± SD for all plots. To determine the global effects of cohesin depletion on male-biased genes, the set of all expressed, strongly male-biased genes (FPKM > 1, and male/female (M/F) expression ratio > 3) was divided into two groups based on distance from their TSS to the nearest male-biased DHS or male-biased H3K27ac-marked region. Male-biased genes with proximal sex-biased enhancers (n = 29) were defined as having their TSS < 20 kb from the nearest male-biased DHS and from the nearest male-biased H3K27ac peak; and malebiased genes with distal sex-biased enhancers (n = 32) were those with TSS > 20 kb from both such regions. The underlying expression values for all genes are provided in Additional file 5: Table S2A, B.

Analysis of other datasets
Pan-cancer human expression data for Nox4 was obtained for tissues with matched Normal and Tumor expression values and analyzed using the web interface TIMER using default parameters (https ://cistr ome. shiny apps.io/timer /) [52]. Nox4 expression for male C3H mouse liver and neoplasms was obtained from [53] as processed, normalized RNA-seq counts (Accession # E-MTAB-6972). The following male and female mouse liver datasets were used to generate browser screenshots: DNase-seq [13], male liver CTCF and cohesin ChIP-seq data [31], sex-biased lncRNAs [9], M/F expression ratios [54], and H3K27ac ChIP-seq data [55]. Unless otherwise noted, FPKM and M/F expression ratios for protein coding genes are based on ribosomal RNA-depleted total liver RNA, while lncRNA expression ratios are based on ribosomal RNA-depleted liver nuclear RNA. Bigwig files were RIPM-normalized separately for each experiment type as described above. Genomic regions with significant sex bias are marked by short horizontal bars below each DHS and H3K27ac BigWig signal track. The bars were colored blue to indicate male-biased DHS or H3K27ac regions, and pink to indicate female-biased DHS or H3K27ac regions. A darker shade of color was used to indicate the stringency for the feature: DHS were shaded from dark to light to indicate high, standard, and low stringency for male-biased (blue) and female-biased (pink) DHS, as defined previously [13]. The high and low stringency designations used in that earlier study approximate the stringent and lenient definitions used here. For the sexbiased H3K27ac tracks, a total of four non-overlapping groups were defined based on the magnitude of the fold-change difference in H3K27ac peaks between male and female liver samples (M/F or F/M > 2.5 was defined as strict; and M/F or F/M > 1.5 but < 2.5 were defined as lenient). Both strict and lenient sex-biased H3K27ac peaks were defined using MAnorm with cutoffs of padj < 0.05 and read count > 15 for the upregulated peak [48]. These cutoffs resulted in 1583 female-biased H3K27ac peaks (380 strict, 1203 lenient) and 2241 male-biased H3K27ac peaks (604 strict, 1637 lenient).

Statistical analysis
Boxplots, cumulative distribution plots, and statistical analyses were implemented using GraphPad Prism 7. All boxplots are displayed using the Tukey convention, where interquartile range (IQR) was calculated as the difference between the 25th and 75th percentile. Outliers were considered as values falling above the 75th percentile value + 1.5*IQR, or below the 25th percentile value − 1.5*IQR. Whiskers indicate the maximum and minimum values in a set (if no outliers) or from the 75th percentile value + 1.5*IQR to the 25th percentile value − 1.5*IQR (if outliers exist). Boxes indicate the IQR with a horizontal line indicating the median value. Bar graphs show the mean and standard deviation. Unless otherwise indicated, all pairwise comparisons used twotailed, nonparametric t tests. Comparisons of distributions were performed using a Kolmogorov-Smirnov (KS) test, while comparisons of values used a Mann-Whitney (M-W) test. The results are annotated in individual figures as follows: **** indicates p ≤ 0.0001; *** p ≤ 0.001; ** p ≤ 0.01; * p ≤ 0.05; and not significant (ns) for p > 0.05.

Sex differences in CTCF and cohesin binding in mouse liver
We used ChIP-seq to identify binding sites for CTCF and the cohesin complex protein Rad21 in both male and female mouse liver. We observed significant sexdifference in factor binding (> 2-fold at FDR < 0.05) at 975 CTCF binding sites and at 1011 cohesin binding sites ( Fig. 1a; sites are listed in Additional file 3: Table S1A and B). We applied three criteria to insure the robustness of each sex-differential site: fold-change > 2 and consistency across biological replicates (Additional file 1: Figure  S1A-D) at FDR < 0.05 using a negative binomial model implemented in diffReps [47]; differential regions must overlap a genomic region called as a peak in at least one sample; and a minimum of 10 sequence reads in each peak region. Regions showing significant sex differential binding for both CTCF and cohesin represent 137 of all sex-differential sites ( Fig. 1b; also see Additional file 1: Figure S1E for alternative filters to define sex-differential binding). An overall trend of sex-biased binding by both factors was seen for an even larger fraction of the sex-differential ChIP-seq peaks, as indicated by aggregate plots and heat maps (Fig. 1c).
Next, we classified the sex-differential CTCF and cohesin-binding sites, as follows: sites where both factors are bound (CAC sites) and show sex-differential binding of either cohesin (∆Coh) or CTCF (∆CTCF), or both factors; cohesin-only binding sites (CNC sites) with sexdifferential cohesin binding; and CTCF-only sites (Lone CTCF sites) with sex-differential CTCF binding. CAC sites comprised 45-66% of all sex-differential sites (Additional file 1: Figure S1F) and generally showed stronger factor binding than the sex-differential CNC and Lone CTCF sites (Fig. 2a, Additional file 1: Figure S2A). The strength of factor binding (Fig. 2a), the CTCF motif score (Additional file 1: Figure S2B), and the percentage of sites with a CTCF motif (Additional file 1: Figure S2C) were generally higher for the female-biased sites than the male-biased sites. In contrast, a higher fraction of malebiased than female-biased Lone CTCF sites contained a CTCF motif (66% vs 48%, Additional file 1: Figure S2C), but there was no significant sex difference in normalized ChIP signal or motif score (Fig. 2a, Additional file 1: Figure S2C). The latter sex differences may be driven by additional factors, such as the inhibitory effect of DNA methylation on CTCF binding [56,57], where the same sequence motif in male and female liver could be preferentially bound in males due to the hypermethylation of DNA seen in female compared to male mouse liver [58]. A majority (51.7%) of sex-biased CTCF peaks with CTCF motifs contain at least one CpG within the core CTCF motif, and therefore could be subject to regulation via sex differences in DNA methylation (Additional file 1: Figure  S2D).

Sex-biased binding sites mapping to TAD and intra-TAD boundaries, genes and regulatory elements
We investigated whether the sex-differential binding of CTCF and cohesin is associated with sex-differential segmentation of the genome at the level of TAD and (See figure on next page.) Fig. 1 Sex differences in cohesin and CTCF binding to mouse liver chromatin. a Distribution of male/female ratios for all diffReps-identified sex-differential sites that overlap a MACS2 peak for binding of cohesin (left) and CTCF (right). The y-axis shows the number of binding sites per bin, and the x-axis shows the sex difference in binding, expressed as log2(Male/Female) fold-change. Gray bars represent binding sites below the 10 read minimum count threshold, which were filtered out, and black bars represent sites that were statistically significant, but showed a |fold change| < 2 (values between -1 and 1 on the graph). Pink and blue bars, respectively, represent female-biased and male-biased sites above these thresholds. b Venn diagram indicating 137 sex-differential peaks are common between cohesin and CTCF. Overlap is based on all sex-biased peaks, including male-biased and female-biased peaks on sex chromosomes (autosomal sex-biased peak numbers are shown in parenthesis, at the right). This pattern of limited overlap was also seen when the full set of unfiltered diffReps regions was examined (Additional file 1: Figure S1E). In total, 1847 unique peaks exhibited significant sex bias in liver chromatin binding of CTCF and/or cohesin. c Heat maps and aggregate plots for four sets of sex-biased cohesin ('Coh') or CTCF peaks. The peak set showing significant sex bias is highlighted in red at the top of each subpanel. For the heat map, read-in-peak normalized ChIP signals are shown for male and female cohesin binding (in blue) followed by male and female CTCF binding (in purple) within a 5 kb window centered around the differential peak summit. The aggregate profiles (top) represent the average signal of the heat map below for the same 5 kb window. Within each heat map, peaks are ranked based on the magnitude of sex bias from the most sex-biased (top) to the least sex-biased (bottom) intra-TAD loops (DNA loop domains). TAD positions were derived from low resolution Hi-C experiments performed in mouse liver [34], while intra-TAD loops were based on a validated computational approach that identifies loop positions within TADs [31] of the 137 CAC peaks with significant sex differences in both CTCF and cohesin binding (Fig. 1b), 53 are on autosomes (Additional file 3: Table S1C). 17 of the 137 sex-differential CAC peaks overlap a TAD or intra-TAD loop anchor [31] in either male or female liver, of which 9 are on autosomes (Additional file 3: Table S1C). Ten of the 17 sites are associated with an intra-TAD loop predicted to be present in one sex only, of which 5 contained one of more sex-biased genes. One of these sex-based intra-TAD loop domains contained 6 sex-biased genes from the Cyp2c gene family and 2 sex-biased lncRNA genes (Additional file 1: Figure S2E). Consistent with the low frequency of sex-biased CAC sites at TAD or intra-TAD loop anchors, 88-93% of intra-TAD loops and anchors predicted in male liver were also predicted in female liver (Additional file 1: Figure S2F, Additional file 3: Table S1J). We conclude that CAC-mediated intra-TAD loops are largely conserved between the sexes in mouse liver, and thus do not play a major role in regulating sex-biased gene expression. Rather, TAD and intra-TAD loop domains are likely to indirectly guide enhancer-promoter looping in a sex-biased manner, as described below. Next, we considered whether the sex-biased binding sites for CTCF and cohesin might help link sex-biased regulatory elements to sex-biased gene promoters. We examined the locations of these sites relative to enhancer DHS, defined as open chromatin regions (i.e., DHS) with a high ratio of H3K4me1 to H3K4me3 ChIP-seq signal [31]. We found that sex-biased CNC sites are much closer to enhancer DHS [median distance of 202 bp (in males) or 119 bp (in females)] than were sex-biased CAC sites (median distance = 8.8 kb (male-biased CAC) and 17.5 kb (female-biased CAC) (Fig. 2b, left). Thus, although sex-biased CNC sites are weaker binding than CAC sites (Fig. 2a), a majority are found at enhancers and may be functional. Of note, female-biased CTCF and cohesin binding sites tended to be closer to TSS than male-biased sites (Fig. 2c), despite equivalent sample quality between the sexes (17% reads in peaks for both male and female samples; Additional file 2: Table S3B). The factors underlying this apparent difference in TSS proximity are unclear.
All classes of sex-biased CTCF and cohesin sites tend to be distal from sex-biased genes: < 20% are within 20 kb of a sex-biased gene, and only 35 to 53% are within the same TAD and could therefore be considered potential Cohesin and CTCF ChIP-seq binding strength and proximity to genes. a Box plots of normalized ChIP-seq signal for the peak sets indicated on the x-axis. Peaks with sex differential binding for cohesin (top graph) and CTCF (bottom graph) are shown. Each pair of boxplots represents the male and female ChIP-seq signal for the same set of peaks, defined by their sex bias and peak type (CAC or CNC, for ΔCohesin peaks; and CAC or Lone CTCF, for ΔCTCF peaks), as indicated below the x-axis. Peak scores were calculated by average intra-peak ChIP signal, normalized by total sequence reads per million in peak (RIPM; see "Materials and methods"). Female-biased peaks were, on average, stronger than male-biased peaks by M-W test: p ≤ 0.001 for female vs male CAC(ΔCoh), CAC(ΔCTCF), and for CNC, but not for Lone CTCF peaks. b Distance from each indicated set of cohesin and CTCF peaks to the nearest enhancer DHS. Cumulative frequency curves indicate the fraction of each group on the y-axis, within the distance in kb to the nearest enhancer DHS indicated on the x-axis. Enhancer DHS were defined based on their high ratio of the enhancer histone mark H3K4me1 over the promoter mark H3K4me3 at DHS [31]. Sex-biased CNC peaks are closer to enhancer DHS (median distance to eDHS of 0.22 kb for male-biased CNCs and 0.12 kb for female-biased CNCs; KS pval < 0.0001 for all comparisons) than the other CTCF and cohesin peak classes (M CAC(ΔCTCF): 14.98 kb; F CAC(ΔCTCF) 13.76 kb; M Lone ΔCTCF: 13.88 kb; F Lone ΔCTCF: 7.17 kb). Female-biased CNC peaks are significantly closer to enhancer DHS than are male-biased CNC peaks (p = 0.0351; KS t-test). Male-biased CAC(ΔCohesin) peaks were closer to enhancers than female-biased CAC(ΔCohesin) peaks (p = 0.002; KS t-test), however, the reverse was found for CAC(ΔCTCF) peaks (p = 0.0052; KS t-test). Distance to nearest enhancer was not significantly different between male-biased and female-biased Lone CTCF peaks (p = 0.1068; KS t-test). P values for comparisons between male-biased and female-biased peaks of the same class are shown for each plot (KS t-test). c Distance from each indicated set of cohesin and CTCF peaks to the nearest TSS. Cumulative frequency curves indicate the fraction of each group on the y-axis within the distance in kb to the nearest TSS indicated on the x-axis. TSS for protein coding (RefSeq) and liver lncRNA genes were considered [80]. Female-biased cohesin and CTCF peaks are closer to TSS than male-biased CTCF and cohesin peaks of the same class (significance by KS t-test is indicated at top left of each plot). Distance to the TSS was not significantly different for male-biased versus female-biased CNC peaks (p = 0.1458; KS t-test). d Proximity of sex-biased cohesin and CTCF binding sites to sex-biased genes. Peak designations were as follows: Proximal, peaks < 20 kb from a sex-biased gene TSS; Intra-TAD, peaks within the same intra-TAD loop as a sex-biased gene; or TAD, peaks in the same TAD as a sex-biased gene. Each of these groups is mutually exclusive. TAD loop [34] and intra-TAD loop [31] coordinates were from the indicated references. A set of 983 sex-biased biased protein-coding genes was used in this analysis (see Additional file 3: Table S1 of [54]). e Cumulative frequency curves show the fraction of each group (y-axis) within the distance in kb to the nearest sex-biased DHS or H3K27ac genomic region (x-axis), based on a merged list of published sex-biased DHS [13] and sex-biased H3K27ac ChIP-seq peaks [55] for male and female mouse liver. For this analysis, CAC peaks with sex-biased binding of CTCF and cohesin were combined and presented as a single group [CAC (Both)]. Male-biased and female-biased CNC peaks are significantly closer to sex-biased DHS/H3K27ac than the four other peak classes (p < 0.001; KS t-test). Female-biased CNC peaks were significantly closer to sex-biased DHS/H3K27ac than male-biased CNC peaks (p = 0.  cis regulators of sex-biased genes (Fig. 2d). Consistent with the association of cohesin with enhancers [25,59], 25-29% of sex-biased CNC sites overlap a sex-biased enhancer (either sex-biased DHS or sex-biased H3K27ac peaks), as compared to only 7-11% of sex-biased CACs and Lone CTCF sites (Fig. 2e). Overall, however, a majority of all classes of sex-biased CTCF and cohesin sites are quite distant from sex-biased regulatory elements (Fig. 2e), consistent with their being distal regulators of sex-biased gene expression. Sex-biased liver CNC sites and Lone CTCF sites showed much more tissue-specific binding of CTCF across mouse tissues [60,61] than did sex-biased liver CAC sites (Additional file 1: Figure S3A, lower vs upper panels). Significant differences in the tissue-specificities of CTCF binding were also seen at male-biased compared to female-biased CAC sites and Lone CTCF sites (Additional file 1: Figure S3A, SB). In addition, liver-expressed genes that mapped to sex-biased CNC sites showed a more liver-specific expression pattern than genes mapping to sex-biased CAC sites, or liver-expressed genes overall (Additional file 1: Figure S3C). This suggests that sex-biased CNCs participate in tissue-specific transcriptional regulation, as was described for CNC peaks generally [25,28,29].

Impact of cohesin depletion on distally regulated male-biased genes
A substantial fraction (35-53%) of sex-biased CTCF and cohesin binding sites are found within the same TAD as at least one sex-biased gene (Fig. 2d) and could play a role in DNA looping between sex-biased enhancers and sex-biased gene promoters. Examples of sex-biased CTCF and/or cohesin binding sites that were either proximal (< 20 kb) or distal to sex-biased genes are shown in Fig. 3a-c. Figure 3a shows a femalebiased enhancer (F-biased DHS and F-biased H3K27ac mark) with an overlapping female-biased CAC site (green arrows) located 33 kb upstream of the femalebiased gene Slc22a29 (F/M ratio = 8.7), the closest TSS. Figure 3b shows Cml5, a male-specific gene (M/F expression ratio = 20.2) with a male-biased CNC site that overlaps a male-biased DHS ~ 3 kb upstream of its TSS (Fig. 3b, red arrow). The adjacent gene, Nat8 (M/F = 4.2), has a male-biased CAC site that overlaps a male-biased DHS positioned ~ 12 kb upstream of the TSS (Fig. 3b, green arrow). Conceivably, the sex-biased binding of cohesin and CTCF at these sites could contribute to looping of the associated sex-biased enhancers to their correspondingly sex-biased gene targets. Finally, Fig. 3c shows two male-biased complement ChIP-seq and DNase-seq data in each track is shown superimposed for male (blue) and female (red) liver after normalization to total reads per million in the union of peaks for that factor (see "Materials and methods"). Below each ChIP-seq track, a horizontal bar identifies genomic regions that show significant male-bias (blue bar) or female-bias (pink bar), with darker and lighter shades indicating strict and lenient cutoffs for sex-bias, respectively (see "Materials and methods"). Green arrows indicate CTCF sex-differential CAC, and red arrows indicate CTCF sex-differential CNC and Lone peaks. Male/Female stranded polyA + RNA-seq gene expression ratios [9] are indicated above each panel. a Slc22a29 is a female-biased gene with a distal female-biased non-anchor CAC peak overlapping a robust female-biased DHS with female-biased H3K27ac histone mark accumulation ~ 34 kb upstream (green arrow). The region shown spans chr19:8290981-8333632. b Two male-biased genes, Cml5 and Nat8, with proximal male-biased cohesin and CTCF peaks overlapping male-biased DHS (red, green arrows). Cml5 has a ~ 3 kb upstream male-biased CNC peak overlapping a strongly male-biased DHS. The Cml5 promoter shows strong male-biased H3K27ac marks and a weaker male-biased DHS. Nat8 has a weakly male-biased DHS at its promoter and a stronger male-biased DHS ~ 12 kb upstream that overlaps a male-biased CAC peak. The region shown spans chr6:85766132-85794443. c Male-biased genes C8a and C8b have distal male-biased CAC and CNC peaks overlapping male-biased DHS and H3K27ac marks (green and red arrows, respectively). The linear distance between the upstream male-biased CAC peaks and downstream male-biased genes is > 1.5 Mb. C8a and C8b reside on opposite ends of the same TAD. Oma1 is close in linear distance to these sex-biased regulatory elements, but shows no sex differences in expression; based on TAD structure it is not be predicted to interact with the highlighted male-biased enhancers. See Additional file 1: Figure S4A -test). Bars represent the mean expression of the group for a given gene relative to the mean of the WT group (equal to 1), and error bars show the standard deviation based on n = 4 per group. e Loss of cohesin binding has a tenfold greater suppressive effect on male-biased genes with distal sex-biased enhancers than those with proximal sex-biased enhancers. Shown is the mean expression for cohesin-depleted versus wild-type liver, such that a value of 0.1 represents a tenfold reduction in expression after cohesin loss. The median relative expression for DHS/H3K27ac-proximal genes is 0.69 (representing a modest suppressive effect of cohesin loss) and the corresponding median for DHS/H3K27ac-distal genes is 0.07, indicating a > 10-fold greater reduction in gene expression (p = 0.0087; M-W). Similar results were obtained when the definition of proximally regulated genes was relaxed to include genes with a TSS < 20 kb from either a male-biased DHS or a male-biased H3K27ac peak (median relative expression of 0.45 versus 0.042 for distal genes). Also see Additional file 5:  ) from a cluster of strongly male-biased enhancers near the 5′ end of the same TAD. The known TAD structure of this genomic region suggests these enhancers are spatially more proximal to the C8 genes than they are to than the linearly much closer Oma1 gene, located just inside the adjacent TAD (also see Additional file 1: Figure S4A).
Loss of cohesin binding in male mouse liver, achieved by depletion of the cohesin loading factor, Nipbl, leads to a loss of distal enhancer-promoter contacts and an increase in local ectopic contacts, which can activate proximal genes [21]. Using this public RNA-seq data for cohesin-depleted male mouse liver, we compared the effects of cohesin loss on the expression of malebiased genes with proximal sex-biased enhancers versus a b c d e those that have only distal (> 20 kb) sex-biased enhancers. Figure 3d shows the relative changes in expression in cohesin-depleted compared to wild-type male mouse liver for the three genes shown in Fig. 3c. Expression of C8a and C8b decreased, by 98% and 82%, respectively, upon loss of chromatin-bound cohesin in male liver, while expression of Oma1 increased modestly (+ 22%), perhaps by an enhancer hijacking mechanism [62]. Further work in this mouse model is needed to characterize the effects of cohesin depletion in female mouse liver, and specifically to determine if Oma1 expression becomes sex biased due to de novo enhancer-promoter interactions. In contrast, the male-biased genes with proximal sex-biased enhancers, Cml5 and Nat8 (Fig. 3a), showed no significant change in expression following cohesin loss (Additional file 1: Figure S4B). We verified this requirement of cohesin for expression of distally regulated but not proximally regulated male-biased genes. Thus, malebiased genes with distal (> 20 kb) sex-biased regulatory elements were significantly more sensitive to loss of cohesin than male-biased genes with proximal sex-biased enhancers (median decrease in expression upon cohesin loss: 14.3-vs. 1.4-fold; Fig. 3e). This finding likely results from a requirement for cohesin for distal interactions, via either a direct or an indirect looping mechanism. Conceivably, for sex-biased genes with nearby sex-biased regulatory elements, enhancer-promoter loops required for gene expression can be maintained over short genomic distances by transcription factors such as Mediator [25] or YY1 [63], and without a need for cohesin.

4C-seq analysis of sex-biased chromatin interactions in mouse liver
We performed 4C-seq analysis centered on six viewpoints at sex-biased enhancers in five distinct genomic regions to determine whether a sex-bias in enhancerpromoter loops is associated with sex-biased gene expression in mouse liver. Our findings (Fig. 4) are based on n = 3 individual biological replicates per sex, whose 4C-seq interaction profiles are displayed in Additional file 1: Figure S6. We also indicate locations of computationally predicted CAC-mediated intra-TAD loops (purple arcs) in mouse liver [31], which are similar in size (median 151 kb) and abundance (9543 loops total) to loop domains [32] and Insulated Neighborhoods [26,64]. The 4C-seq track is based on merged data from three biological replicates for each sex, calculated from the median value for a sliding window of 11 restriction fragments, in reads per million normalized 4C-seq signal per sex. These values are overlaid for visualization using the Matplot functionality built into the WashU genome browser with default parameters. The next four tracks show normalized DNase-seq or ChIP-seq signal for the indicated factors, and correspond to those described in Fig. 3. Sex-biased lncRNAs are shown below the Refseq gene track in pink (female-specific lncRNAs) or blue (male-specific lncRNAs). c, e also show locations of intra-TAD loops (pink arcs) below the Refseq gene track. 4C-seq data for individual biological replicates is presented in Additional file 1: Figure S6. a Distal enhancer viewpoint near A1bg and female-biased lncRNAs. The region shown includes 12 female-biased and nuclear-enriched mono-exonic lncRNAs, which fall into three clusters. The lncRNAs in each cluster are all transcribed from the same strand, as indicated by the arrow marking the TSS and direction of transcription of the most upstream lncRNA in each cluster. LncRNAs 12590-12593 show the strongest 4C-seq interactions with the viewpoint and also the most consistent female bias (Additional file 1: Figure S5C

A1bg region
We anchored a 4C-seq viewpoint at a female-biased enhancer near the strongly female-biased gene A1bg (F/M = 155) (Fig. 4a, vertical pink bar; Additional file 1: Figure S6A). Nearby are 12 female-biased, mono-exonic nuclear-enriched lncRNAs [10] in three clusters across the genomic region displayed. Robust interactions were observed in female but not male livers between the viewpoint enhancer and three genomic regions (red arrows): a strong female-biased enhancer (right arrow), the promoter of A1bg (middle arrow), and a region downstream of A1bg that contains a cluster of four female-specific lncRNAs (lncRNAs 12590-93; left arrow), where we observed the strongest interactions. The lncRNAs in this cluster are more highly expressed (Additional file 1: Figure S5C) and are more consistently female-biased across various RNA-seq datasets than the other two lncRNA clusters (Additional file 1: Figure S5D). The maximum expression of these 12 lncRNAs ranged from 0.31 to 2.71 FPKM in female liver compared to 0 to 0.02 FPKM in male liver (Additional file 1: Figure S5C). The precise relationship between the female-biased expression of these lncRNAs and the female-bias in 3D interactions with the distal enhancer is not known. The interaction may be regulatory in nature (e.g., an enhancer-promoter interaction, as with any gene) or it could be facilitated by one or more of the 12 nuclearenriched, female-biased lncRNAs, as was described for the lncRNAs Xist [65], Firre [66], and Haunt [67]. Alternatively, the female-specific interactions shown may be primarily those of regulatory enhancers driving expression of several female-specific genes-including A1bg and multiple lncRNAs. The female-biased CTCF binding seen at both interacting regions (right and left arrows) lends mechanistic support for the latter proposal, with CTCF mediating enhancer-promoter and enhancer-enhancer interactions. As CTCF is known to interact with lncRNAs in a functional manner, and with high affinity [68], these two mechanisms are not mutually exclusive; one or more of these highly female-specific lncRNAs (Additional file 1: Figure S5D) could function in a cis-acting manner to selectively guide CTCF binding and interactions unique to female liver.

Gm4794 and Sult3a1 region
We used 4C-seq to interrogate an enhancer viewpoint proximal to the highly female-biased gene Gm4794 (F/M = 704; also known as Sult3a2) and also two femalebiased mono-exonic lncRNAs, lnc8820 and lnc8821 (F/M = 34 and 90) (Fig. 4b, Additional file 1: Figure S6B). The enhancer viewpoint is distal to two other femalebiased protein coding genes, Sult3a1 (F/M = 288) and Rsph4a (F/M = 34). We observed female-biased 4C-seq interactions with the proximal promoter of Gm4794 (left red arrow) and with strong female-biased enhancers overlapping lnc8820 and lnc8821. Unlike the enhancer neighboring A1bg, these interactions are associated with female-biased CNC peaks present at both the viewpoint enhancer and the downstream interacting enhancer. Despite the absence of CAC insulator elements across this genomic region (and consequently, the absence of intra-TAD loops), all focal interactions are local, within ~ 35 kb of the viewpoint. However, the viewpoint enhancer also made weak interactions to a broad, ~ 45 kb region extending from ~ 15 kb upstream of the female-biased promoter of Sult3a1 to ~ 10 kb beyond the gene body (Fig. 4b, red bracket). This 45-kb region contains several robust female-biased DHS and H3K27ac peaks, but lacks cohesin binding, which may account for the lack of strong, focal interactions with the viewpoint enhancer. Gm4794 and Sult3a1 may both interact with the enhancer that overlaps lnc8820 and lnc8821, but this weak (and perhaps indirect) association cannot be captured by proximity ligation under standard 4C-seq conditions. Reciprocal 4C-seq experiments anchored at these ncRNA TSS or alternative 4C methods with increased sensitivity [69] may be needed to validate these weaker interactions.

C9 region
We examined a 4C-seq promoter viewpoint placed at the complement factor C9 gene (M/F = 3.5) to investigate whether intra-TAD loops can indirectly coordinate enhancer-promoter contacts preferential to one sex in a genomic region without sex-biased CTCF or cohesin binding. C9 overlaps an antisense lncRNA that shows a sevenfold greater male-bias in expression (lnc12340; M/F = 26) (Fig. 4c, Additional file 1: Figure S6C). A strongly male-biased enhancer region lies ~ 230 kb upstream of the TSS of C9, and is characterized by malebiased DHS and H3K27ac peaks, whereas the TSS of C9 has only a male-biased DHS. The far upstream enhancer and the TSS of C9 both fall just outside of (< 10 kb from) a nested pair of intra-TAD loops that encompass both TSS of Dab2 (Fig. 4c, bottom track) and at least partially insulate Dab2, whose expression in male liver is 87-fold lower than C9 (FPKM = 0.7 vs 61). The promoter region of C9 interacts with the cluster of far upstream enhancers (Fig. 4c, red arrow), with stronger interactions seen in male liver. Weaker, mostly non-focal interactions were seen between the C9 promoter and several sites within the nested intra-TAD loops. This insulation-by-looping mechanism allows the strong male-biased enhancer to bypass a more proximal gene, Dab2, to drive expression of C9; it also helps explain the 87-fold higher expression of C9 compared to Dab2 in male liver. The shorter isoform of Dab2 shows weak, male-specific interaction, despite a lack of sex-bias in its expression. Given the interactions with the strong, far upstream enhancer region, we hypothesized that the expression of C9 and the antisense lnc12340 would be sensitive to the loss cohesin of binding. Indeed, both genes showed a three to fivefold decrease in expression in cohesin-depleted mouse liver (p < 0.01 for both), while the insulated gene Dab2 showed no significant change in expression (Fig. 4d). While there was an apparent increase in Dab2 expression, that increase did not reach statistical significance (Fig. 4d). Of note, the loss of CAC loop insulation upon cohesin depletion would be expected to result in such an increase, which would allow the upstream enhancer to more freely interact with the promoters of both Dab2 isoforms. Together, these findings support a model whereby an active, male-biased enhancer preferentially interacts with the highly expressed male-biased gene C9, while bypassing the lowly expressed intervening gene Dab2 (also see Discussion and model in Fig. 6). Nudt7 is a highly expressed, male-biased gene with distal male-biased enhancers within the same intra-TAD loop (M/F = 2.5; FPKM = 132 in male liver). Although there are some weak male-biased DHS within the gene body of Nudt7 (Fig. 4e), there is no apparent sex bias at its shared promoter with the sex-biased lncRNA gene lnc7430 (M/F = 4.0; FPKM = 2.9 in male liver). Approximately 22 kb and 39 kb upstream of the TSS of Nudt7 are two male-biased enhancers with male-biased DHS and H3K27ac marks; the latter also is proximal (~ 2.5 kb upstream) to the TSS of the malebiased lnc7423 (M/F = 7.0; FPKM = 2.8 in male liver). This enhancer cluster is one of 503 super-enhancers found in both male and female liver [31]. We observed male-biased 4C-seq interactions between the enhancer viewpoint and a neighboring male-biased enhancer, and also with the Nudt7 and lnc7430 promoter(s) (Fig. 4e, red arrows; Additional file 1: Figure S6D). In contrast, we did not observe focal interactions in either sex between the enhancer viewpoint and a sex-independent enhancer 15.7 kb upstream of Nudt7. Both Nudt7 and lnc7430 were strongly down regulated in male liver upon cohesin depletion (Fig. 4f ), suggesting their expression is dependent on interactions facilitated by the intra-TAD loop encompassing this genomic region. Expression of lnc7423 was not significantly reduced, perhaps due to its closer proximity to strong male-biased enhancers.

Sex-independent, nested intra-TAD loops restrict Nox4 to proximal enhancer-promoter interactions
NADPH oxidase 4 (Nox4) exhibits male-biased expression in mouse liver (M/F = 7.7) and may contribute to a number of liver pathologies whose incidence or severity is male-biased [70,71]. Nox4 is highly upregulated in tumor compared to healthy liver tissue of mice that spontaneously develop liver tumors (Additional file 1: Figure  S7A), and in humans, Nox4 is upregulated in hepatocellular carcinoma and other cancers (Additional file 1: Figure S7B). Mouse Nox4 is located within a pair of nested intra-TAD loops (Fig. 5a, bottom), and has a strong male-biased enhancer 11.5 kb upstream of the TSS and a strong male-biased DHS 125 kb downstream of the TSS. However, only the upstream region has H3K27ac (active enhancer) marks (Fig. 5a). We placed 4C-seq viewpoints at both the upstream region (viewpoint VP1, at − 11.5 kb; red vertical highlight in Fig. 5a) and the downstream region (viewpoint VP2, at + 125 kb; green vertical highlight) to investigate chromatin interactions with each putative regulatory region. Interactions with the downstream DHS at VP2 were limited to the domain defined by the pair of 3′ anchors of the nested intra-TAD loops, consistent with these loops insulating from distal interactions (Fig. 5a, green bracket at bottom). As a result, VP2 did not interact with the promoter of Nox4 or with the -11.5 kb enhancer in either male or female liver. Furthermore, VP2 did not show any consistent male-biased interactions, despite its location at a strong male-biased DHS.
In contrast, the − 11.5 kb enhancer at VP1 showed male-biased interactions with several genomic regions, including the Nox4 promoter and two genomic regions that are in a transcribed-like chromatin state [14] in male liver, but are in an inactive state in female liver (Fig. 5a, regions Y1 and Y2; purple and red in chromatin state maps at top, respectively). Further, both VP1-interacting regions include a short sequence in an enhancer state (Fig. 5a, green band within purple region; see Additional file 1: Figure S7C). We also observed 4C-seq interactions between VP1 and a region just upstream of the shorter intra-TAD loop 3′ anchor 31 kb downstream of the Nox4 TSS in both male and female liver (Fig. 5a, red arrow; Additional file 1: Figure S7D). The high intra-domain interactions that we observed for VP1 and the shorter nested intra-TAD loop, and the low frequency of interdomain interactions, are indicative of the insulation activity of these loops [31]. Immediately downstream of the 3′ loop anchor, in region Y2, we observed 4C-seq interactions specific to male liver, which may be due to increased movement of cohesin beyond this loop anchor as a result of the increased transcription of Nox4 in male liver. Movement of cohesin has been directly linked to transcription both in vitro and in vivo [18,72,73], and may result in a more dynamic loop structure and weaker local insulation at the 3′ loop anchor of the shorter nested loop. The VP1-interacting regions Y1 and Y2 are both in a transcribed chromatin state only in male liver,  Figure S7C for further details). Regions Y1 and Y2 are in chromatin state E13 in male liver but in inactive state E2 in female liver. Y1 and Y2 both include a short enhancer state region in male liver (E11 within region Y1; E10 in region Y2). The absence of focal interactions between VP1 and VP2 supports the model of two nested and insulated intra-TAD loops shown at the bottom. All tracks were normalized and are presented as described in Fig. 4. Interactions between VP1 and the Y1 region were significantly sex-biased by FourCSeq (FDR = 0.020), but interactions between VP1 and the Y2 region were not significant in the same analysis (FDR = 0.137). Within the shown window, no interactions originating from enh2 were found to be significantly sex-biased. Thick blue arrows, CTCF anchor orientation. b Expression of Nox4, but not the neighboring gene Tyr, is cohesin-dependent. Although Nox4 is primarily regulated by proximal enhancers within the shorter intra-TAD loop, its full expression is nevertheless dependent on cohesin. This may be due to the need for the intra-TAD loop structure; however, loss of this insulation did not increase expression of Tyr. Expression of Nox4 was reduced by 62% (p < 0.0001). Data presentation are as described in Fig. 4d Page 18 of 25 Matthews and Waxman Epigenetics & Chromatin (2020) 13:30 consistent with the male-biased 4C-seq interactions of VP1 with both regions. These findings support the predicted model of two nested intra-TAD loops, with the smaller enclosed loop insulated from the larger enclosing loop. Domain predictions for other mouse tissues, based on computational methods and experimentally observed looping in mouse embryonic stem cells [26,60], support the conclusion that the genomic regions defined by VP1 and VP2 are in separate domains (Fig. 5a, bottom). Accordingly, only the -11.5 kb enhancer at VP1 would be predicted to interact with the Nox4 promoter. Generally, the cis regulatory elements relevant for the regulation of Nox4 appear to be contained within the smaller intra-TAD loop. It is less clear what regulatory function the male-biased DHS at VP2 plays, as it does not interact with Nox4 or with the downstream gene, Tyr, which is not expressed in liver (FPKM < 0.01 in both sexes).

Discussion
We investigated sex differences in autosomal 3D genome organization in the mouse liver model, focusing on sexbased differences in chromatin binding and interactions involving cohesin and CTCF, which mediate long-range DNA looping interactions that segment mammalian genomes into megabase-scale TAD domains and their shorter intra-TAD domains. We identified 1847 binding sites for cohesin and/or CTCF that show significant differential occupancy between male and female mouse liver; however, very few of these sites were associated with sex differences in TAD or intra-TAD loop anchors. Given the low resolution of Hi-C datasets available for mouse liver [34], it is possible that we are underestimating the number of TADs present, which limits our interpretation of TAD loop anchors to only a subset of the true total. Furthermore, high resolution Hi-C would also increase our confidence in determining which specific CTCF sites anchor long-range loops. A majority of the sex-biased binding sites classified as cohesin-non-CTCF (CNC) sites (but only a minority of cohesin-and-CTCF (CAC) and Lone CTCF sites) mapped to distal enhancers, and a b Fig. 6 Model of the direct and indirect contribution of cohesin and CTCF to sex-biased liver gene expression. A1bg exemplifies direct regulation, where female-biased CTCF binding can explain the observed female-bias in looping interactions. C9 is an example of intra-TAD loops present in male and female liver that contribute indirectly to sex-biased gene expression. Not only does the intra-TAD loop insulate the lowly expressed sex-independent gene Dab2; it also brings the distal male-biased enhancer into close proximity. Likely additional factors, such as Mediator, YY1, or eRNAs, can contribute directly to the interactions observed in male liver. SI, Sex-independent Matthews and Waxman Epigenetics & Chromatin (2020) 13:30 a major subset of these overlapped sex-biased enhancers. This finding indicates a role for cohesin in sex-biased enhancer activity, and is consistent with the extensive sex bias in chromatin state seen at sex-biased enhancers but not promoters [14]. We also found that 77% (72 of 96) of sex-biased CNCs that overlap sex-biased enhancers are > 20 kb from a TSS of sex-biased gene (median distance 238 kb), consistent with the well-characterized role of cohesin in mediating distal enhancer-promoter interactions [25,27]. Furthermore, male-biased genes with distal but not proximal sex-biased enhancers were much more sensitive to cohesin depletion than genes with proximal sex-biased enhancers, implicating cohesin in long-range enhancer interactions regulating these sexbiased genes. Finally, by applying circularized chromosome conformation capture with sequencing (4C-seq) to sex-biased enhancer viewpoints in five genomic regions, we established that sex differences in chromatin interactions are a common feature of sex-biased gene expression in the liver, and we elucidated how chromatin interactions link sex-biased genes to distal sex-biased enhancers, guided both directly and indirectly by cohesin and/ or CTCF looping. Although TADs and intra-TADs are largely conserved across tissues, 20-30% of all such CAC-mediated loops are cell type-specific [31]. Nevertheless, when comparing male and female mouse liver, which show extensive hormone-determined differences in epigenetic state [14,54,74], we did not find evidence for widespread formation of sex-specific intra-TAD loops. Rather, we found that intra-TAD loops in mouse liver are largely sex-independent and devoid of sex-biased CTCF or cohesin binding at their CAC anchors. These loops do have the ability, however, to indirectly facilitate sex-dependent chromatin interactions. Thus, a sex-independent intra-TAD loop was shown to insulate the super-enhancer-associated male-biased gene Nudt7, and nested intra-TAD loops insulating Nox4 restricted the promoter of this male-biased gene from an intronic enhancer while enabling interactions with a cluster of upstream enhancers. Although the currently available Hi-C data for mouse liver is low resolution, our pair of 4C-seq viewpoints in the Nox4 region support a nested loop model that is supported by computational predictions for liver and by SMC1 ChIA-PET experimental data in embryonic stem cells. Furthermore, our analysis of female-biased gene regions revealed female-biased proximal enhancerpromoter interactions in the Sult3a gene region associated with female-biased cohesin binding, as well as female-biased interactions between the A1bg promoter, a far distal (> 100 kb) enhancer, and distal female-biased CTCF binding sites. Together, these findings support the proposal that CTCF and cohesin contribute in both direct and indirect ways to the formation of sex-biased enhancer-promoter contacts in mouse liver (Fig. 6).
Our analysis of the male-biased complement factor gene C9 provides an interesting example of sexindependent CAC-looping that indirectly facilitates sex-biased enhancer-promoter contacts. C9 interacts strongly with a distal (> 200 kb upstream) male-biased enhancer while bypassing the weakly expressed (and sex-independent) Dab2 gene region, which is insulated by a nested pair of intra-TAD loops. These nested loops, in turn, bring the TSS of C9 into much closer proximity to a cluster of far upstream male-biased enhancers than would be achieved based on linear genomic distance alone (Fig. 6b). Furthermore, we observed more frequent contacts between C9 and the far upstream enhancers in male compared to female liver, despite the absence of any male-biased binding of CTCF or cohesin to help explain sex differences in contact frequency. Conceivably, the male-biased DHS located in the upstream region facilitate direct enhancer-promoter interactions specifically in male liver by working in the context of other known looping factors. Mechanistically, these interactions could be driven by proteins, such as YY1 [63] or Mediator [25], or by non-protein factors such as enhancer-RNAs (eRNAs) [75,76]. Of note, 3 of the 4 male-biased enhancers far upstream of C9 are actively transcribed to produce bidirectional eRNA transcripts in male mouse liver [77], and all four regions are bound by the protein YY1 [78] (Additional file 1: Figure S6E).
A majority of the 1847 sex-biased binding sites for CTCF and cohesin identified here are intergenic and distal to sex-biased genes and enhancers. Furthermore, many of these sex-biased binding sites are located in TADs without any known sex-biased genes. Although those sites lack any obvious link to sex-biased gene expression in untreated liver, they could have a priming effect and contribute to sex-specific responses reported for hepatic stressors, such as high fat diet [79] and xenobiotic exposure [80], which we recently showed can induce a sex-biased gene response in TADs whose genes do not show a sex-bias in expression in the unstressed state [81]. Just as short-term feeding of a high fat diet can leave a lasting epigenetic memory in the form of epigenetic modifications [82], sex-biased binding of CTCF and/or cohesin may differentially prime each sex for distinct looping patterns that enable the observed sex-biased responses to chemical exposure or dietary stressors.
While CAC sites at TAD and intra-TAD boundaries have a well-established role as anchors that enable loop domain-level nuclear organization [17,26,27,31,32], non-anchor CAC sites may directly link enhancers to promoters or to other enhancers, and thereby contribute to interactions governing tissue-specific gene expression [35,40]. A majority of sex-differential CAC binding occurs at non-anchor CAC sites (Additional file 3: Table S1), a subset of which may mediate longdistance interactions involving sex-biased enhancers and gene promoters. Specific examples described here include the enhancer-enhancer and enhancer-promoter contacts that we identified by 4C-seq for A1bg in the context of female-biased CTCF binding. Similarly, more than half of male-biased liver CTCF binding occurred at Lone CTCF sites, which we found are closer than CAC sites to gene TSS, and can also play a non-canonical role in looping between enhancers and promoters [35].
4C-seq interactions between sex-biased enhancer viewpoints and distal sex-biased lncRNAs [9,10] were found in three of the five sex-biased genomic regions we investigated. In one example, the highly female-biased gene A1bg is nearby several strongly female-biased, nuclearenriched mono-exonic lncRNAs, several of which are transcribed from genomic loci that show female-specific interactions with the distal female-specific enhancer viewpoint that we interrogated. Enhancer-associated lncRNAs have been defined as intergenic transcripts with enhancer chromatin marks whose expression is tissuerestricted and is associated with increased expression of nearby expressed protein coding genes [83,84]. Based on our findings, we suggest that functional enhancerassociated lncRNAs might be identified by their looping interactions with enhancer sequences, which can be determined globally using high throughput interaction methods, such as Hi-C [85].
The sex-biased cohesin and CTCF binding sites described here were discovered using livers from adult (8 week) mice, and likely encompass only a subset of all sex-differential cohesin and CTCF binding sites across the lifespan of a mouse, given the dramatic changes in sex-biased gene expression that occur during prenatal and especially postnatal liver development [86,87]. The sex-biased binding of cohesin and CTCF to liver chromatin in adult liver is expected to be regulated by pituitary growth hormone secretion, which is sex-dependent and produces the sex-dependent plasma growth hormone profiles that regulate the vast majority of sex differences in the adult liver, including differences in gene expression [74,88], transcription factor binding [89,90], and chromatin states [14,54]. Given that CTCF binding to DNA can be inhibited by DNA methylation [56,57,91], the hypomethylation of enhancer sequences seen in male compared to female liver [58] could contribute to male-specific CTCF binding at such sites. Such an effect is expected to become more pronounced with age, given the increased male hypomethylation reported in older mice [58,92].
The sex-specific patterns of pituitary growth hormone secretion regulating sex-biased liver gene expression emerge at puberty, and have been implicated in the dynamic regulation of liver chromatin states in both male and female adult mouse liver [54,74]. We do not know when during mouse liver development the sexdifferential chromatin interactions described here are first established, whether they constitute a relatively fixed (static) 3D framework governing transcription in male and female nuclei, or alternatively, whether they respond dynamically to the temporal changes in plasma growth hormone profiles that regulate sex differences in liver chromatin states. The potential for dynamic, reversible changes in DNA looping was demonstrated in a study where auxin-induced cohesin degradation led to a loss of virtually all DNA loops after 40 min, followed by their reestablishment within one hour of auxin withdrawal and cohesin reintroduction [22]. Extensive changes in chromatin interactions also occur during circadian oscillations in mouse liver [93][94][95] and during macrophage differentiation [96]. Our finding that the expression of distally regulated male-biased genes is highly sensitive to cohesin depletion illustrates the importance of distal enhancer-promoter interactions in maintaining malebiased gene expression. Of note, however, this response is largely not a direct result of a loss of sex differential cohesin and/or CTCF binding at distal male-biased enhancers, as only 5 of 32 distally regulated male-biased genes have male-biased CTCF/cohesin binding at their distal male-biased enhancers. Alternatively, the sexbiased enhancer-promoter and enhancer-enhancer loops that we describe here might be determined by intrinsic sex differences [88], or might be established by early postnatal hormone exposures that program liver gene expression [97]. Studies to detect dynamic changes in DNA looping and quantify changes in chromatin interaction strength, e.g., during the course of a male plasma growth hormone pulse [74], will likely require improvements to the 4C-seq protocol, including the use of unique molecular identifiers for more accurate quantification of interactions [69] and the elimination of any PCR artifacts associated with over-amplification, which are difficult to address using conventional 4C-seq methods [45] and may have decreased the magnitude of the apparent sex differences in chromatin interactions seen in our work.
In conclusion, we employed the mouse liver model with its extensive sex differences in gene expression to study sex differences in nuclear organization and DNA looping interactions in a non-reproductive tissue exposed to sexunique patterns of hormone stimulation. We determined that male-biased genes with distal but not proximal sexbiased enhancers are particularly sensitive to the loss of cohesin binding. Furthermore, while most sex-biased binding sites for CTCF and cohesin were found to be distal from sex-biased genes, a subset likely contributes to sex-biased looping between regulatory elements in cis, as exemplified by the female-biased DNA looping interactions observed for A1bg. In addition, sex-independent CAC-looping may indirectly provide sex specificity to chromatin interactions by insulating male-biased genes such as Nudt7, or by bringing a sex-biased gene into closer proximity to a cluster of sex-biased enhancers, as demonstrated for C9. Together, these findings illustrate the direct and indirect contributions that cohesin and CTCF can make to sex-biased gene expression in the liver, and may be broadly applicable to other biological systems where distal regulation of gene expression is of interest.
Additional file 1: Figure S1. Consistency of biological replicates and additional features of sex-biased differential CTCF/cohesin peaks. A-D. Consistency of biological replicate ChIP-seq samples, with merged tracks and annotations as in Figs. 2 and 3. Here, we additionally show all cohesin (n = 3 per sex) and CTCF ChIP-seq replicates (n = 4 per sex). Blue, samples from male mouse liver; red, samples from female mouse liver are shown in red. A. Biological replicates for a male-biased "Lone CTCF" site nearby Mir2136. B. Biological replicates for two female-biased CAC(ΔCoh) peaks nearby 5530601H04Rik. Higher cohesin signal is observed across replicates for both the weaker upstream peak and a strong downstream peak. CTCF signal is not significantly different at either peak. C. Biological replicates for a female-biased CAC peak (ΔCoh AND ΔCTCF) upstream of strong sex-independent peak in an intron of Moh9. D. Biological replicates for several sex-independent CAC peaks of various peak strengths. Replicates are consistent at all sites regardless of peak strength. E. The limited overlap of sex-biased CTCF and sex-biased cohesin binding, seen in Fig. 1B, is not an artifact of the thresholds used for peak filtering. In the data shown here, similar to Fig. 1B, the overlap of sex-biased CTCF and sex-biased cohesin binding sites is limited. Shown are Venn diagrams with the number of overlapping sex-biased cohesin (Left; blue) and CTCF sites (Right; purple) for the following groups (top to bottom): (1) All sex-differential sites output by diffReps without filtering for overlap of a MACS2 ChIP-seq peak; (2) All sex-differential diffReps sites (rather than peaks, which may contain multiple sites) from diffReps that overlap a MACS2 peak for a given factor; (3) All sex-differential hotspots identified by diffReps, which is an alternate method in the diffReps software package to identify differential sites. Specifically, this last approach looks for clusters of differentially co-regulated sites that might be missed by simple overlap analysis. Overlap for all Venn diagrams is defined as ≥ 1 bp overlapping using bedtools. In some cases, two Rad21 peaks overlap a CTCF peak, or vice versa, and therefore, the number of overlapping cohesin (Rad21) sites does not necessarily equal the number of CTCF sites (hence, two numbers in the Venn overlap). F. Pie charts showing the class distribution of each sex-biased CTCF/cohesin peak set (from top to bottom): male-biased ΔCohesin peaks, female-biased ΔCohesin peaks, male-biased ΔCTCF peaks, and female-biased ΔCTCF peaks. For each of these four groups, the fraction of peaks at CAC sites is shown in purple while the fraction of peaks at either CNC (for ΔCohesin) or Lone CTCF (for ΔCTCF) is shown in blue. The total number of differential peaks in each group is indicated below each chart. Overall, female-biased sites comprised a higher percentage of CAC sites than male-biased sites. Consequently, a larger percentage of male-biased peaks are CNC peaks (for ΔCohesin peaks) and Lone CTCF peak (for ΔCTCF peaks). Peak numbers here differ slightly from Fig. 1B for cohesin differential peaks, but not CTCF, because of our approach to categorizing peaks as CNC or CAC for cohesin peaks (see "Materials and methods"). For CTCF we defined CAC peaks as genomic regions bound by CTCF that were also bound by cohesin in a majority of individual cohesin replicates (2 or 3 out of a total n = 3 per sex). Using the same approach for cohesin, we defined CAC peaks as genomic regions bound by cohesin that were also bound by CTCF in a majority of individual CTCF replicates (3 or 4 out of a total n = 4 per sex). If a peak was bound by none or only a minority of replicates for the opposite factor then it was considered Lone CTCF (in the case of CTCF; 0 or 1 cohesin replicates overlapping) or CNC (in the case of cohesin; 0 or 1 CTCF replicates overlapping). As CTCF has n = 4 replicates, if a cohesin peak is bound by exactly 2 individual CTCF replicates (of the same sex) then it is not classified and is excluded from downstream analyses. 54 male-biased cohesin peaks overlap 2 male CTCF replicates and 36 female-biased cohesin peaks overlap 2 female CTCF replicates (value of 2 in column H of Additional file 3: Table S1B). All overlaps were performed using bedtools with a minimum overlap of 1 bp, and all comparisons were made separately for males and females. Figure S2. Comparison of sex-biased CTCF/cohesin peaks. A. Female-biased CTCF and cohesin peaks tend to be stronger than male-biased peaks. Shown here are box plots for ChIP-seq signal for CTCF and cohesin for both ΔCohesin and ΔCTCF peaks. These plots differ from those presented in Fig. 2A, which present normalized ChIP signal for the factor with differential signal (i.e., male and female cohesin ChIP-seq signal for ΔCohesin peaks). In aggregate, CAC peaks with significant sex-biased cohesin binding show the same directionality of sex-bias for CTCF (and vice versa), albeit at a reduced magnitude (see also Fig. 1C). The y-axis shows normalized ChIP-seq signal for the groups indicated along the x-axis. Peaks with male-biased and female-biased cohesin binding (Left) and CTCF binding (Right) are presented separately. Each group of 4 box plots represents the male and female ChIP-seq signal for cohesin, followed by the corresponding ChIP-seq signals for CTCF for the same set of peaks. Each plot represents all differential peaks for a given sex (male or female) and factor (CTCF or cohesin). These four datasets are further divided by peak type (CAC or CNC for ΔCohesin peaks, and CAC or Lone CTCF for ΔCTCF peaks), as indicated below the x-axis. Peak scores are calculated by average intra-peak ChIP signal, normalized by the total sequence reads per million in peak (RIPM; see "Materials and methods"). B. Female-biased CAC peaks contain higher quality CTCF motifs than male-biased CAC peaks [p = 0.0212 for CAC(ΔCoh) and p = 0.0023 for CAC(ΔCTCF) peaks; M-W t test], as reflected by the FIMO motif score. This log-likelihood ratio score is a reflection of how close the best intra-peak motif matches the canonical core CTCF motif MA0139.1. There is no significant difference between motif scores for male-biased and female-biased Lone CTCF, or for male-biased and female-biased CNC peaks (p = 0.7671 and p = 0.1329; M-W t-test). The dashed line at FIMO score = 10 reflects the cutoff used to define the presence or absence of a motif in Additional file 1: Figure S2C. C. CTCF Motif frequency, based on presence of CTCF motif (MA0139.1) as identified by FIMO, with a minimum score of 10. The y-axis shows the percent of peaks in each group (separately for male-biased, female-biased, and sex-independent) found to have a CTCF motif within the peak region. A larger fraction of female-biased than male-biased CAC peaks was found to contain a CTCF binding motif. In contrast, a larger fraction of male-biased Lone CTCF peaks contain a CTCF motif, despite no significant difference in peak strength between male-biased and female-biased Lone CTCF peaks. A larger fraction of female-biased CNC peaks contain a CTCF motif, however, the vast majority do not contain CTCF motifs, as expected (< 20% for all groups). In all cases, the percent for each group is comparable to a matched set of sex-independent peaks. D. Proportion of male-biased and femalebiased CTCF peaks that have: no CTCF motif (gray), a CTCF motif lacking a CpG dinucleotide (orange), or a CTCF motif containing a CpG dinucleotide (orange). "All" indicates any male-biased or female-biased CTCF peak. E. Female-biased intra-TAD on Chr19 that contains 12 sex-biased genes, shown in blue and red boxes, some of which are lncRNA genes (ncRNA gene designations, in green). Inset at bottom left of the Figure shows CTCF and cohesin (Rad21) ChIP-seq tracks for male and female mouse liver surrounding the 3′ boundary of the femalebiased intra-TAD. F. Intra-TAD loops and loop anchors are mostly shared between male and female mouse liver. Using a computational intra-TAD loop prediction algorithm [31], we used the cohesin and CTCF ChIP-seq datasets for male and female mouse liver to identify 9543 intra-TAD loops in male liver [31] and 9724 loops in female liver, respectively. 87.9% of the intra-TAD loops in male liver were also identified in female liver (left), and 93.4% of the male intra-TAD loop anchors are also predicted to be loop anchors in female liver. This finding is consistent with there being a limited number of autosomal CAC peaks with sex differences in CTCF and cohesin binding (53 total) (Additional file 3: Table S1C) To account for nested loop structures, shared loops were defined as loops with a reciprocal overlap of 80% or greater between the loops, as implemented in prior studies of CAC-mediated insulating loops [26,31]. Figure S3.
Tissue conservation of liver sex-differential CTCF and cohesin peaks in ENCODE mouse consortium datasets. A. The x-axis indicates the number of male mouse tissues other than liver where CTCF is bound, out of 15 tissues examined by the ENCODE Consortium. A value of 15 indicates tissue-ubiquitous CTCF binding, and a value of 0 indicates liver-specific CTCF binding. The y-axis shows the proportion of male-biased peaks (blue) or female-biased peaks (red) that fall into a given bin. P-values comparing the distribution of tissue-specificity values for CTCF binding between males and females are indicated in the upper left corner of each plot (KS t-test). Results show that CAC sites (upper panels) are much less tissue-specific than Lone CTCF and CNC sites (lower panels). Further, female-biased CAC peaks are less tissue-specific than male-biased CAC peaks, while male-biased Lone CTCF peaks are less liver-specific than female-biased peaks of the same class. The greater tissue ubiquity of CTCF binding for female-biased CAC peaks could be due to the fact that female-biased CTCF peaks are stronger and contain higher quality CTCF motifs, insofar as stronger peaks show greater conservation for both sex-differential and all CTCF peaks (see panel B, below); however, male-biased Lone CTCF peaks are not significantly stronger, nor do they contain higher quality motifs than the female-biased Lone CTCF peaks. The apparent difference could be due to the fact that the CTCF ChIP-seq data from the non-liver, non-reproductive tissues examined here was obtained by the ENCODE consortium from male mice [60]. Very few male-biased and female-biased CNC peaks were bound by CTCF in other any other mouse tissues (< 20% of the total sex-biased liver CNC sites). This finding provides additional evidence that CNC sites are found at liver-specific cis regulatory elements, and that these sites do not act as insulators in other non-liver tissues (i.e., CTCF binding is lost in liver or gained in some other tissue). B. There is a strong association between CTCF peak strength and tissue conservation of CTCF binding, which likely explains the modestly higher tissue conservation of female-biased CTCF and cohesin peaks seen in panel A. Shown on the y-axis are reads-in-peaks normalized ChIP-seq data for all CTCF peaks, male-biased CTCF peaks, and female-biased CTCF peaks. These are grouped according to the number of non-liver ENCODE tissues with a CTCF peak, where 0 indicates a peak is liver-specific and 15 means all male mouse tissues with ENCODE datasets have CTCF bound at that position, as in panel A. C. The tissue specificity of neighboring genes varies significantly with the class of CTCF/cohesin binding site. Shown are Tau values, where a value close to 1 indicates the pattern of expression across mouse ENCODE RNA-seq datasets is highly tissue-specific, and where Tau values less than ~ 0.3 indicate housekeeping genes. Nearest genes (within 20 kb) were defined based on distance to the TSS, and only liverexpressed genes were considered (FPKM > 1). Tau values were calculated based on the average of two replicates from all tissues except testis, using expression data generated by the ENCODE consortium. Both female-biased and male-biased CNC sites are near genes that generally are more tissue-specific than liver-expressed genes. In addition, genes near female-biased CNC sites are significantly different than genes near similarly sex-biased CACs (p = 0.007; M-W test). This difference is not a reflection of the male-biased or female-biased CAC group used in the comparison, insofar as genes near female-biased CNCs are significantly more liver-specific than genes near male-biased CAC sites (p = 0.0171; M-W test), while the opposite comparison for male-biased CNCs vs female-biased CACs is still not significant (p = 0.07; M-W test). For these analyses, liver-expressed genes are defined by a liver expression value of FPKM > 1 (8810 genes in total) and mapping was based on the closest TSS within 20 kb of a peak. Figure S4. Screenshot of TAD containing C8a/C8b, and cohesin depletion effects. A. Shown is a screenshot with proposed model linking the distal male-biased enhancers and component complement genes C8a and C8b within a single TAD on mouse chromosome 4. This screenshot spans chr4:102960671-104603975. The tracks, normalization, and annotations are as described in Fig. 3. B. For the proximally regulated male-biased genes shown in Fig. 3B (Nat8 and Cml5), depletion of cohesin does not significantly impact gene expression. Figure S5. Quality control of 4C-seq library. A. Agarose gel analysis for quality control of ligated, digested, and re-ligated 4C samples. Lane (i) analyzes a sample after proximity ligation, lane (ii) shows the sample after digestion with the restriction enzyme Csp6i, and lane (iii) shows the sample after self-circularization ligation. Lane (iii) represents the final material used as input for inverse PCR with viewpoint-specific primers (Additional file 2: Table S3A). DNA fragment sizes (in kb) are marked on the left of the gel. B. Agarose gel analysis for quality control of final 4C-seq libraries after the inverse PCR step. A diverse mixture of PCR products is present, as indicated by a smear on the gel, with sizes primarily below ~ 1 kb, which indicate a high-quality library and which allows for efficient sequencing. DNA fragment sizes (in