Poly(ADP-ribosyl)ation associated changes in CTCF-chromatin binding and gene expression in breast cells

CTCF is an evolutionarily conserved and ubiquitously expressed architectural protein regulating a plethora of cellular functions via different molecular mechanisms. CTCF can undergo a number of post-translational modifications which change its properties and functions. One such modification linked to cancer is poly(ADP-ribosyl)ation (PARylation). The highly PARylated CTCF form has an apparent molecular mass of 180 kDa (referred to as CTCF180), which can be distinguished from hypo- and non-PARylated CTCF with an apparent molecular mass of 130 kDa (referred to as CTCF130). The data accumulated so far relate mainly to CTCF130, whereas the properties of CTCF180 are not well understood despite its abundance in a number of primary tissues. In this study we performed ChIP-seq and RNA-seq analyses in human breast cells 226LDM, which display predominantly CTCF130 when proliferating, but CTCF180 upon cell cycle arrest. We observed that in the arrested cells the majority of sites lost CTCF, whereas fewer sites gained CTCF or remained bound (i.e. common sites). The classical CTCF binding motif was found in the lost and common, but not in the gained sites. The changes in CTCF occupancies in the lost and common sites were associated with increased chromatin densities and altered expression from the neighboring genes. Based on these results we propose a model integrating the CTCF130/180 transition with CTCF-DNA binding and gene expression changes. This study also issues an important cautionary note concerning the design and interpretation of any experiments using cells and tissues where CTCF180 may be present.


INTRODUCTION
The CCCTC-binding factor (CTCF), an evolutionarily conserved and ubiquitous transcription factor, is a key regulator of chromatin architecture and multiple cellular functions including transcriptional activation, silencing, insulation, mediation of long range chromatin interactions and others (1-6). Great efforts are currently being made to integrate CTCF ChIP-sequencing (ChIP-seq) data with other types of high-throughput data such as RNA-sequencing (RNA-seq) and chromatin signatures to better understand gene regulatory mechanisms and the interplay between CTCF genetic and epigenetic regulation (7,8). This is particularly important because CTCF binds to numerous sites of unclear function in the human genome, and some of these binding sites differ between cell types (9). Post-translational modifications of chromatin proteins (histones, transcription factors and others) play a significant role in the regulation of epigenetic processes. Poly(ADP-ribosyl)ation (PARylation) is one such modification, performed by poly(ADP-ribose) (PAR) polymerases (PARPs) (10,11).
A growing body of evidence demonstrates the link between CTCF and PARylation; for example, the insulator and transcription factor functions of CTCF have been found to be regulated by PARylation (16,17). The link between PARylation and CTCF is important in DNA damage response (18). Direct interaction between CTCF and poly(ADP-ribose) polymerase 1 (PARP1) and their co-localization in the genome have been reported (19-21). Furthermore, PARP1 and CTCF have been found to regulate the transition between active and repressed chromatin at the lamina (22). A highly PARylated form of CTCF is represented by a protein with molecular mass 180 kDa (CTCF180), whereas the commonly observed CTCF130 is hypo- or non-PARylated and appears in many immortalized cell lines and in cancer tissues (19,23-25). Interestingly, only CTCF180 was detected in normal breast tissues, whereas both CTCF130 and CTCF180 were present in breast tumours (25). Generally, CTCF130 is associated with cell proliferation whereas CTCF180 is characteristic of non-proliferating cells of different types. Among those are cells from healthy breast tissues with very low proliferative index (25), cells with induced cell cycle arrest, DNA damage (25), senescence (26) or apoptosis (24,25). Currently, all existing information regarding the binding characteristics of CTCF has been mined from experimental data obtained for CTCF130, but not CTCF180. It is not known whether the sets of targets for CTCF130 and CTCF180 are the same, completely different or overlapping, and how binding of different forms of CTCF may be associated with alterations in gene expression.
In this study we utilised the immortalized human luminal breast cell line, 226LDM, in which the CTCF130 to CTCF180 transition could be induced following growth arrest (25), with the aim to analyse the genomic targets for CTCF130 and CTCF180, together with the corresponding transcriptomes, in two functional states of 226LDM cells. The first state consists of control, proliferating cells predominantly containing CTCF130, while the second state represents cells in which proliferation blockade is chemically induced, leading to the presence of CTCF180 only. The 226LDM cell model therefore provides us with a unique opportunity to study both CTCF forms, thereby overcoming the problem of the absence of a specific antibody against CTCF180.

Cell Culture
226LDM cells, derived from human luminal breast cells, were propagated and treated as previously described (25). In brief, cells were seeded in flasks and grown in DMEM/F-12 (PAA) supplemented with 5 μg/ml insulin, 1 μg/ml hydrocortisone, 20 ng/ml epidermal growth factor, 20 ng/ml cholera toxin (all from Sigma), 10% fetal bovine serum (FBS) (Biosera), and 50 μg/ml gentamicin (Life Technologies-Invitrogen) at 37 °C and 5% CO2. To induce cell cycle arrest, cells were treated with 100 mM hydroxyurea for 24 h followed by 1 h of complete medium and a further 24 h with 500 ng/ml nocodazole (Sigma), and cells in suspension were then harvested for further analyses. Untreated adherent proliferating 226LDM cells were used as control.

Immunoblotting
The endogenous protein levels of CTCF were observed by SDS-PAGE/western blot analysis (27,28) in whole cell lysates of 226LDM cells from the control and treated populations using a polyclonal anti-CTCF antibody (Millipore, 07-729, lot # JBC1903613). Anti-tubulin specific antibody (Sigma, T5168) was used as a loading control. Chemiluminescence detection was performed with the Fusion FX7 gel documentation system (PeqLab) and the UptiLight (Interchim) reagents according to the manufacturer's instructions.

Protein Immunoprecipitation (IP)
This method was used to detect and immunoprecipitate CTCF out of a solution containing thousands of proteins present in 226LDM cells, using an antibody that specifically recognizes this protein (29).
226LDM cells cultured in a T75 flask were trypsinized, washed twice with PBS and then lysed by vortexing in BF2 (25 mM Tris/Hepes pH 8.0, 2 mM EDTA, 0.5% Tween20, 0.5 M NaCl, 1:100 Halt protease inhibitor cocktail). The lysate was incubated on ice for 15 min and then an equal volume of BF1 (25 mM Tris/Hepes pH 8.0, 2 mM EDTA, 0.5% Tween20 and 1:100 Halt protease inhibitor cocktail) was added. For immunoprecipitation, the cell lysate was pre-cleared by incubating 500 μl of the lysate with 50 μl of pre-blocked Protein A/Sepharose beads for 30 min at 4 °C on a rotor shaker. The sample was then centrifuged at 200 × g for 1 min at RT and the pre-cleared supernatant was transferred into a fresh centrifuge tube. 50 μl of the sepharose beads were added to the pre-cleared lysate along with the anti-CTCF antibody (Millipore, 07-729, lot # JBC1903613) and the samples were incubated overnight at 4 °C on a rotating wheel. On the following day, the immune complexes were recovered by centrifugation at low speed and the supernatant was removed. The pellet was washed three times with immunoprecipitation buffer (BF1+BF2) and each time the beads were collected by centrifugation at low speed. The sepharose was then lysed in SDS-lysis buffer and analysed by SDS-PAGE and western blot analysis as described in the previous section.

Chromatin Immunoprecipitation (ChIP)
ChIP was performed using the ZymoSpin kit (Zymo Research USA) following the manufacturer's instructions. In brief, 5 × 10^6 226LDM cells from the control and the treated populations were crosslinked with formaldehyde. The cross-linking was quenched with glycine and the cells were washed twice with PBS with the addition of a protease inhibitor cocktail before pelleting at 1000 g for 1 min at 4 °C. The pellet was lysed in Chromatin Shearing Buffer and sonicated using Bioruptor Plus (Diagenode) on high power to obtain fragments of 250-300 bp. ChIP reaction mixes containing sheared chromatin, Chromatin Dilution Buffer, anti-CTCF antibody (Millipore, 07-729, lot # JBC1903613; or no antibody for the negative control) and protease inhibitor cocktail were incubated rotating overnight at 4 °C. The next day, ZymoMag Protein A beads were added to the mix and incubated for 1 h at 4 °C. The complexes were washed with Washing Buffers I, II and III and then the beads were re-suspended in DNA Elution Buffer. Following de-crosslinking with Proteinase K at 65 °C, the ChIP DNA was purified using the ZymoSpin IC columns. The samples were stored at -80 °C. The concentration of DNA in the ChIP samples was measured using the NanoDrop 3300 fluorospectrophotometer (Thermo Scientific) along with the Quant-iT™ PicoGreen dsDNA assay kit according to the manufacturer's instructions.

RNA extraction
Total RNA from 226LDM cells (three biological replicates from the control and three from the treated population) was extracted using the TRIsure reagent (Bioline) according to the manufacturer's guidelines. Briefly, cells grown in a T75 flask were washed twice with PBS, then scraped off and pelleted at 300 g for 5 min. Following incubation with TRIsure for 5 min at RT, chloroform was added and the sample was incubated for 15 min at RT. After centrifugation at 9,500 g for 15 min at 4 °C, the top aqueous layer was carefully extracted and the RNA was precipitated with isopropanol for 20 min on ice. After centrifugation (9,500 g, 15 min, 4 °C) the pellet was washed twice in 75% ethanol and then air-dried. The RNA was solubilized in sterile water (40-50 μl) and heated for 10 min at 55 °C. The RNA was stored at -80 °C. The RNA quality was tested using the Agilent Bioanalyzer system; the electropherograms are shown in Supplemental Figure S1.

Library Preparation and Sequencing
The library preparation and sequencing using the Illumina platforms were performed at the University College London (UCL) Genomics Centre.

ChIP-seq analysis
Uniquely mappable reads were aligned to the human hg19 genome with the help of Bowtie (30) allowing up to 1 mismatch. Peak calling was done with MACS (31) using standard parameters, and taking into account the corresponding Input. Replicate experiments were analysed separately, and then obtained peaks were merged. The aggregate profiles were calculated using NucTools (32) and visualised using OriginPro (Origin Lab). Sequence motif analysis was performed using HOMER (33).

RNA-seq analysis
Reads were aligned using Novoalign 3.2 to the reference genome (hg19) and the raw counts were normalized to RPKM values using the Bam2rpkm tool from Galaxy. Differential expression was determined using DESeq. Genes whose expression change was less than 1.5-fold were included in the "unchanged" gene expression category. Gene Ontology (GO) analysis was also performed. ChIP-seq and RNA-seq data have been deposited to the GEO archive (accession number GSE102237).
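The normalization and the 1.5-fold cut-off described above can be illustrated with a minimal Python sketch. It is not the Bam2rpkm or DESeq implementation and does not reproduce any statistical testing; the function names, the pseudocount and the example numbers are assumptions introduced only for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def classify_change(control_rpkm, treated_rpkm, threshold=1.5, pseudocount=1e-3):
    """Label a gene 'up', 'down' or 'unchanged' from its fold change.

    The fold change is treated / control. The pseudocount (an assumption,
    not taken from the paper) avoids division by zero for silent genes.
    """
    fold = (treated_rpkm + pseudocount) / (control_rpkm + pseudocount)
    if fold >= threshold:
        return "up"
    if fold <= 1.0 / threshold:
        return "down"
    return "unchanged"


# Hypothetical gene: 500 reads, 2 kb long, 10 million mapped reads in the library.
example_rpkm = rpkm(500, 2000, 10_000_000)
```

Under these assumptions, a gene changing less than 1.5-fold in either direction falls into the "unchanged" category, matching the definition used in the text.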

Analysis of CTCF binding and gene expression profiles in proliferating (control) and arrested (treated) 226LDM cells
The 226LDM cell line was chosen as a model to investigate binding patterns of CTCF130 and CTCF180 in the genome and correlate them with transcription activity. The main challenge was the absence of an antibody specifically recognising CTCF180; i.e. all existing anti-CTCF antibodies detect either just CTCF130 or both forms. The utilisation of 226LDM cells presents a unique opportunity to discriminate between these forms, because proliferating 226LDM cells predominantly contain CTCF130, whereas cells treated with hydroxyurea (HU) and nocodazole (NO), blocking the cell cycle in the S and G2-M phase, respectively, display only CTCF180 (25). The latter cells display clear morphological changes, becoming rounded and suspended in the medium. Importantly, due to batch-to-batch variations, screening procedures are required to select the appropriate antibodies (25). Such tests were conducted in the current investigation and the antibodies which could recognize both CTCF130 and CTCF180 (Millipore, 07-729, lot # JBC1903613) were selected from a panel of several anti-CTCF antibodies (data not shown). Using these antibodies we confirmed the transition from CTCF130 to CTCF180 following treatment of 226LDM cells (Supplemental Figure S2) and also their ability to immunoprecipitate both CTCF forms (Supplemental Figure S3). These antibodies were then used for ChIP-sequencing (ChIP-seq) analysis of CTCF binding in control (proliferating) and arrested (treated) cells. RNA from these cells was also purified and RNA sequencing (RNA-seq) performed.
The analysis of total transcriptomes of control and treated cells revealed that 11,180 coding mRNAs were significantly differentially expressed in treated cells. The analysis of CTCF ChIP-seq revealed that the number of CTCF target sites was considerably higher in control cells (n=9986) compared to treated cells (n=2271). Three groups of CTCF sites with different binding patterns in control and treated cells were observed, which were termed "common", "lost" and "gained". In the common group, CTCF was bound to the same sites in both cell states (as identified by ChIP-seq peaks); however, the CTCF occupancies varied. In the lost group CTCF binding was observed in control cells and not in treated cells, and in the gained group CTCF binding was only observed in treated cells (Figure 1A). The majority of the sites (9729) were lost after treatment; 2014 sites were gained and 257 sites were common (Figure 1B).
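The grouping of sites into common, lost and gained can be sketched as a simple interval-overlap comparison between the two peak sets. The Python sketch below is only an illustration under the assumption that any overlap between a control peak and a treated peak counts as a common site; the exact overlap criterion is not stated in the text, and the function names and toy coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_sites(control_peaks, treated_peaks):
    """Split peaks into common, lost and gained groups.

    common: control peaks overlapped by at least one treated peak
    lost:   control peaks with no overlapping treated peak
    gained: treated peaks with no overlapping control peak
    """
    common, lost = [], []
    matched_treated = set()
    for c in control_peaks:
        hits = [i for i, t in enumerate(treated_peaks) if overlaps(c, t)]
        if hits:
            common.append(c)
            matched_treated.update(hits)
        else:
            lost.append(c)
    gained = [t for i, t in enumerate(treated_peaks) if i not in matched_treated]
    return common, lost, gained


# Toy peak lists (hypothetical coordinates).
control = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
treated = [("chr1", 150, 250), ("chr2", 900, 1000)]
common, lost, gained = classify_sites(control, treated)
```

With this definition the group sizes add up as in the reported counts: lost plus common gives the control total (9729 + 257 = 9986) and gained plus common gives the treated total (2014 + 257 = 2271).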
The analysis of expression of genes containing CTCF in their promoter regions, within +/-10,000 bp from the Transcription Start Sites (TSS), showed that, collectively for all three groups, most of these genes were down-regulated (1169 or 49.6%); 443 (18.8%) were up-regulated and 744 (31.6%) unchanged ( Figure 1C, left panel). The calculations of the relative number of genes showed similar patterns in individual groups (common, lost and gained) whereby the majority of the genes (~50-55%) were down-regulated and ~10-20% were up-regulated ( Figure 1C, middle panel). The absolute numbers of genes associated with changes in their expression were considerably higher in the lost group ( Figure 1C, right panel). This is not surprising since the majority of the CTCF binding sites belonged to this group ( Figure 1B). Expression did not change in a considerable percentage of genes in these groups (~30-35%). A large proportion of genes associated with the presence of CTCF belongs to housekeeping genes (35), with 39%, 42% and 20% up-regulated, and 14%, 15% and 30% down-regulated in the lost, gained and common groups, respectively. No changes were found in 47% of lost and gained, and 50% of common groups ( Figure 1D).
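The assignment of CTCF sites to promoters within a fixed window around the TSS can be sketched as follows. This is a minimal illustration, assuming that a site is represented by its peak summit and that distance is measured from the summit to the TSS; the gene names, coordinates and function name are hypothetical and not taken from the study.

```python
def peaks_near_tss(peaks, genes, window=10_000):
    """Map each gene to the peaks whose summit lies within +/-window bp of its TSS.

    peaks: iterable of (chrom, summit_position)
    genes: iterable of (gene_name, chrom, tss_position)
    Genes without any peak in the window are omitted from the result.
    """
    result = {}
    for name, chrom, tss in genes:
        hits = [p for p in peaks if p[0] == chrom and abs(p[1] - tss) <= window]
        if hits:
            result[name] = hits
    return result


# Hypothetical genes and peak summits.
genes = [("GENE_A", "chr1", 50_000), ("GENE_B", "chr1", 200_000)]
peaks = [("chr1", 45_000), ("chr1", 61_000), ("chr2", 50_000)]
wide = peaks_near_tss(peaks, genes)                 # +/-10,000 bp window
narrow = peaks_near_tss(peaks, genes, window=1000)  # narrower window
```

The `window` parameter makes it easy to compare a broad promoter definition with a narrower one.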
Gene ontology analysis of transcriptional changes revealed genes highly up- or down-regulated in the three different groups (Figure 1E). In the common group, highly up-regulated genes were associated with differentiation and down-regulated genes with cell migration and apoptosis. In the lost group, the largest one, highly up-regulated genes were enriched in categories associated with anti-apoptotic processes, whereas most of the highly down-regulated genes were associated with cell cycle and cell migration processes. In the gained group, both highly up-regulated and down-regulated genes were enriched in categories regulating RNA Pol II transcription.
Next, we assessed the relative enrichment of gene ontology terms of the genes with promoters containing CTCF (Supplementary Figure S6). In the common group, up-regulated genes were enriched in developmental processes and down-regulated genes were enriched in response to metal ions. In the lost group, up-regulated genes were enriched in ion binding and homeostasis processes, whereas most of the down-regulated genes were associated with signal transduction and adhesion processes. In the gained group, up-regulated genes were enriched in those involved in the regulation of macromolecular complexes and membrane transporter activities, whereas down-regulated genes were enriched in genes involved in nucleotide binding and biosynthetic processes. Taken together, these experiments demonstrate characteristic CTCF binding and the corresponding gene expression patterns for cells containing predominantly CTCF130 (control) and CTCF180 only (treated).

Relationship between CTCF occupancy and gene expression in control and treated cells
Next, we investigated the relationship between changes in CTCF binding and gene expression. By stratifying all CTCF-containing promoters (within +/-10,000 bp from TSS) according to the level of expression from the corresponding gene, we observed that it was more likely to find CTCF at a promoter of a higher expressed gene in both control and treated cells (Figures 2A and 2B).
Furthermore, due to the loss of CTCF from promoters of many low-expressed genes upon treatment, this effect is more pronounced in treated cells (Figure 2B). Interestingly, when we stratified genes by their expression fold change upon treatment (Figure 2C), it appeared that CTCF retained at common sites at promoters was preferentially associated with genes whose expression did not change or changed minimally. Genes considerably up- or down-regulated upon treatment have lost CTCF at their promoters (see the leftmost and rightmost parts of Figure 2C).
We also correlated changes in gene expression with the changes in CTCF occupancies for the three groups of CTCF sites within a narrower window (+/-1000 bp from TSS), as is typically done in this type of study (36). In agreement with the observations above (Figure 2C), for the group of genes containing common CTCF sites at their promoters the changes in gene expression were relatively small (Figure 3A), while promoters which lost or gained CTCF were associated with a much broader range of gene expression levels (Figure 3, panels B and C, respectively). No correlation was observed between CTCF occupancy and gene expression in the common and lost groups, although a small positive correlation (r=0.15) was seen in the gained group.
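A correlation coefficient of the kind quoted above can be computed as sketched below. The text does not state which coefficient was used; the sketch assumes a Pearson correlation between per-gene changes in CTCF occupancy and changes in expression, and the function name and input vectors are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene values: change in CTCF occupancy vs. change in expression.
occupancy_change = [1.0, 2.0, 3.0, 4.0]
expression_change = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(occupancy_change, expression_change)
```

A value near 0, as reported for the common and lost groups, indicates no linear relationship, whereas r=0.15 corresponds to a weak positive trend.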

Common and lost, but not gained, CTCF sites contain classical CTCF binding motifs
CTCF employs a combination of its eleven zinc fingers to bind to diverse DNA sequences with a central ~20 bp core DNA motif critical for CTCF binding (37). Most CTCF sites contain the classical consensus sequence; however, CTCF sites with different consensus motifs and also sites which do not match any consensus motif have been identified previously (37-40). Since the composition of the motif may be linked to CTCF functions, we compared the CTCF binding sites identified in the three groups. We calculated the nucleotide frequencies as a function of distance from the summit of the CTCF ChIP-seq peak for the subsets of sites from the common, lost and gained groups. As shown in Figure 4, CTCF sites in the common and lost but not the gained groups contain the classical CTCF recognition motif, enriched with guanine and cytosine residues at the summit, although this pattern is more pronounced for the common sites. Interestingly, the nucleotide distribution in the 3' and 5' flanking regions of the motifs differs significantly between these groups, demonstrating higher GC content in the common group. This is in line with our previous observation that common but not lost/gained sites were enriched inside CpG islands in the system of mouse embryonic stem cell differentiation (41). We have also assessed the strength of CTCF binding for different classes of CTCF sites, calculated from the heights of the ChIP-seq peaks (Figure 4D). The strongest binding was observed in the control cells in the common group, and on average it changed little after treatment. The initial CTCF signal in the lost group was smaller than in the common sites before treatment, and it significantly decreased after treatment. The lowest signal was in the gained group, which most likely reflected the nature of CTCF180 binding (very weak or DNA-independent).
From these analyses we conclude that common and lost sites are characterised by the presence of the CTCF consensus motif. The strongest CTCF binding is observed for common sites, whereas it is weaker for the lost sites. The gained sites have no classical CTCF motif and this may be a reason why their CTCF binding is the lowest.
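The calculation of nucleotide frequencies as a function of distance from the peak summit can be sketched as a per-position count over sequences of equal length centred on the summits. The following Python sketch is an illustration only, not the NucTools or HOMER implementation; the function names and toy sequences are hypothetical.

```python
def nucleotide_profile(sequences):
    """Per-position A/C/G/T fractions for equal-length, summit-centred sequences.

    Returns a list with one dict per position; characters other than
    A, C, G, T (e.g. N) are ignored when computing the fractions.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    profile = []
    for i in range(length):
        column = [s[i].upper() for s in sequences]
        counts = {base: column.count(base) for base in "ACGT"}
        total = sum(counts.values())
        profile.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return profile


def gc_fraction(profile):
    """GC content at each position of a profile from nucleotide_profile()."""
    return [p["G"] + p["C"] for p in profile]


# Toy 5-bp windows centred on three hypothetical summits (index 2 is the summit).
toy_sequences = ["ACGTA", "AGGTT", "TCGAA"]
toy_profile = nucleotide_profile(toy_sequences)
toy_gc = gc_fraction(toy_profile)
```

Plotting such a GC profile separately for the common, lost and gained groups would give curves of the kind compared in Figure 4, with GC enrichment at the summit indicating the presence of the classical motif.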

Loss of CTCF binding upon cell treatment is associated with chromatin redistributions
The importance of CTCF in the regulation of chromatin structure has been widely recognized. For example, CTCF binding often demarcates distinct chromatin states and protects DNA from methylation (2,5,42). Using our model system, we investigated whether the presence of CTCF130 and CTCF180 at the sites in the three different groups was associated with changes in chromatin density. In our first analysis, the accessibility of chromatin to shearing in the input samples was considered, as in several previous studies (43,44). Figure 5 shows that CTCF130 bound regions in the common control group are associated with less dense chromatin than the same regions in treated cells. This also correlated with the reduced strength of CTCF180 binding at these sites after treatment (Figure 4D). In the lost group, the density of chromatin increased after treatment and release of CTCF180 (Figure 5B), whereas in the gained group the density of chromatin at CTCF180 binding sites did not change following CTCF recruitment (Figure 5C). Taken together, these findings indicate that, in comparison with CTCF180 in treated cells, CTCF130 in control cells is associated with less dense chromatin and stronger binding to its sites. These data are in line with our previous reports on the competition of CTCF with nucleosomes (45-47). Chromatin redistributions reported in Figure 5 can be associated with different covalent modifications of histones. It was beyond the scope of this work to test all histone modifications systematically; therefore, as a test case, we have profiled only the H3K9me3 modification, which is known to be associated with higher nucleosome density (46). Figure 6 (panels A and B) shows the results of the H3K9me3 ChIP-seq analysis demonstrating significant H3K9me3 redistributions around common and lost CTCF sites.
Interestingly, no such rearrangements around gained CTCF sites were observed, which is in line with our previous findings that the mechanism of CTCF binding at gained sites is different from classical CTCF/nucleosome competition and CTCF-DNA recognition.
Since PARylation can change physical CTCF interactions with chromatin proteins, we have also looked at the chromatin profiles near individual CTCF sites (not at the CTCF peak summits, but in the neighboring regions). Examples of specific gene promoter regions where CTCF-associated chromatin rearrangements take place following treatment, together with changes in gene expression patterns, are given in Supplementary Figure S7. We noted that CTCF binding in control cells was associated with sharp chromatin peaks in physical proximity to CTCF, which disappeared upon cell treatment.
In addition, in some cases the chromatin peaks and CTCF occur at different ends of the gene (e.g. a chromatin peak at the transcription start site (TSS), and CTCF at the transcription end site (TES) in the case of E2F4). The latter suggests possible TSS-TES bridging by CTCF in control cells, which disappears after treatment. The effect of CTCF-dependent chromatin reorganization was observed for all three groups of sites (common, lost and gained), and was not correlated with changes of gene expression (gene expression could go either up or down following treatment). The fact that CTCF-dependent chromatin peaks were next to CTCF but did not coincide with it provides an argument that the chromatin peak is not formed by CTCF itself. The depletion of chromatin peaks near CTCF sites at the PARP3 and TP53 promoters was also confirmed by ChIP experiments with the primers residing inside the corresponding chromatin peaks (panels D and E in Supplementary Figure S7). We also analyzed profiles of H3K9me3 chromatin marks in the same promoters. As shown in Supplemental Figure

DISCUSSION
This study aimed to analyse DNA targets for CTCF130 and CTCF180, together with the transcriptomes, in 226LDM cells successfully used in our previous studies (25). In the absence of a specific anti-CTCF180 antibody, it was rational to use a cell model in which a switch from CTCF130 to Figures S4 and S5).
The analysis of the ChIP-seq data showed for the first time that CTCF180 has well-defined genomic targets, paving the way for further research into the specifics of this binding in different conditions, cell lines or tissues. The number of sites occupied by CTCF180 was found to be much smaller than in control cells (n=2271 vs n=9986, respectively). This may be due to reduction of total CTCF and/or relocalization of at least some of the CTCF180 molecules into the cytoplasm after cell cycle arrest (Supplemental Figures S2 and S9). Such a CTCF distribution pattern has previously been reported in normal breast tissues where only CTCF180 is detected (25,55). The presence of a smaller number of CTCF sites in the genome of the treated cells may indicate that, individually, these sites organize and regulate larger chromatin domains. Moreover, the molecular basis of these networks is likely to be different because of the particular nature of the binding sites of CTCF180. These aspects will need to be explored in the future.
This study provides new insights on DNA-binding and gene regulatory properties of CTCF180 (summarized in Figure 7). Common and lost sites contain the classical CTCF motif, although the former are more GC-rich at the summit and in the background around the motif, whereas the latter are embedded into more AT-rich sequences. The effect of common CTCF sites residing in more GC-rich areas has been previously reported in our study of mouse embryonic stem cell differentiation (41) and it seems to be a general effect. Such properties of these sequences may be linked to the strength of CTCF binding, which is higher in the common group than in the lost group ( Figure 4D Figures S7 and S8).
No CTCF binding motif was observed in the gained group, suggesting that CTCF180 may be binding to rare non-canonical CTCF sites. It is also possible that CTCF180 interacts with these regions in a DNA-independent manner, directly or through recruitment by other proteins. At least one of the histone modifications redistributing at common and lost (but not gained) CTCF binding sites was found to be H3K9me3; however, it is important that a wider range of chromatin marks is tested in future studies.
The lost CTCF sites localized in promoter regions are associated with the highest numbers of genes whose expression is affected (80.1%), followed by the gained (15.3%) and common (4.6%) sites.
Unchanged expression of a significant number of genes in the lost and gained groups indicates that the regulation of these genes does not depend on CTCF binding. This may not be the case for the common group in which CTCF may be important to sustain the optimal level of expression of certain genes needed for survival of cells in different functional states. According to our study, this effect is especially relevant for housekeeping genes (Figures 1D and 3).
The importance of CTCF modification in biological processes is supported by changes in expression profiles of genes associated with CTCF (Figures 1E and 7, far right). These changes involve down-regulation of genes involved in cell cycle and cell migration, and up-regulation of genes involved in differentiation, thereby adequately reflecting the biological situation, i.e. the transition from proliferating to arrested cells. Furthermore, some of the affected genes appear to be characteristic of particular groups of CTCF sites; for example, genes responsible for cell cycle regulation are down-regulated in the group of genes where CTCF is lost. It is tempting to speculate that such preference may be due to the change of behaviour of PARylated CTCF at the particular type of CTCF sites.
It should be noted that in this report we suggest how CTCF PARylation may control its DNA binding properties and, subsequently, changes in local chromatin and gene expression. It was beyond our scope to consider in this experimental model global effects of CTCF on higher order chromatin structures, which can be expected from CTCF as an architectural protein (9,57-61).