A crucial role for the ubiquitously expressed transcription factor Sp1 at early stages of hematopoietic specification

Mammalian development is regulated by the interplay of tissue-specific and ubiquitously expressed transcription factors, such as Sp1. Sp1 knockout mice die in utero with multiple phenotypic aberrations, but the underlying molecular mechanism of this differentiation failure has been elusive. Here, we have used conditional knockout mice as well as the differentiation of mouse ES cells as a model with which to address this issue. To this end, we examined differentiation potential, global gene expression patterns and Sp1 target regions in Sp1 wild-type and Sp1-deficient cells representing different stages of hematopoiesis. Sp1−/− cells progress through most embryonic stages of blood cell development but cannot complete terminal differentiation. This failure to fully differentiate is not seen when Sp1 is knocked out at later developmental stages. For most Sp1 target and non-target genes, gene expression is unaffected by Sp1 inactivation. However, Cdx genes and multiple Hox genes are stage-specific targets of Sp1 and are downregulated at an early stage. As a consequence, expression of genes involved in hematopoietic specification is progressively deregulated. Our work demonstrates that the early absence of active Sp1 sets a cascade in motion that culminates in a failure of terminal hematopoietic differentiation and emphasizes the role of ubiquitously expressed transcription factors for tissue-specific gene regulation. In addition, our global side-by-side analysis of the response of the transcriptional network to perturbation sheds new light on the regulatory hierarchy of hematopoietic specification.

EB were harvested at a range of time-points for CFU-C hematopoietic colony assays using Methocult® M3434 complete methylcellulose (Stem Cell Technologies) or for RNA isolation using TRIzol®. Assays were set up by washing out the methylcellulose with PBS, dispersing the EB with collagenase and plating cells at 10^5/ml in M3434 methylcellulose in duplicate 3 cm bacteriological grade dishes. Dishes were scored from Day 3 for CFU-E and from Day 8 for BFU-E, CFU-M and CFU-GM.
Ery-P assays were based on those performed by Sturgeon et al. (2012) and Sroczynska et al. (2009). Assays were set up using EB at a range of time-points during differentiation as described above, and dispersed cells were plated into methylcellulose supplemented with 10% Platelet Derived Serum (Antech), 5% PFHM (Invitrogen), 100 units/ml Penicillin and 100 µg/ml Streptomycin, 2 mM glutamine, 0.18 mg/ml Transferrin, 50 µg/ml ascorbic acid, 4.5 x 10^-4 M MTG and 2 U/ml Erythropoietin (R&D Systems) at 10^5 cells/ml in duplicate 3 cm bacteriological grade dishes. Ery-P colonies were counted on Day 5 after plating. Staining of nucleated erythroblasts: Ery-P colonies were washed off the dishes with PBS and dispersed. Single cells were cyto-spun onto glass slides, fixed with methanol and stained with Accustain Wright-Giemsa stain (Sigma).

Chromatin immunoprecipitation:
Flk1+ve sorted cells and KIT+ve sorted floating progenitors from Day 3 blast culture were used for ChIP-seq analysis. Cells were crosslinked for 10 min at room temperature with 1% formaldehyde (Thermo Scientific) and quenched with 1/10th volume 2 M glycine. Nuclei were prepared essentially as described in Lefevre et al. (2003) and sonicated using a Bioruptor water bath in immunoprecipitation buffer I (25 mM Tris 1 M pH 8.0, 150 mM NaCl, 2 mM EDTA pH 8.0, 1% TritonX-100 and 0.25% SDS). After centrifugation, the sheared 0.5-2 kb chromatin fragments (1-2 x 10^6 cells) were diluted with 2 volumes immunoprecipitation buffer II (25 mM Tris pH 8.0, 150 mM NaCl, 2 mM EDTA pH 8.0, 1% TritonX-100, 7.5% glycerol) and precipitation was carried out for 2-3 hours at 4°C using 2 µg anti-Sp1 antibody (Santa Cruz sc-17824X) coupled to 15 µl Protein-G dynabeads (Invitrogen). Beads were washed with low salt buffer (20 mM Tris pH 8.0, 150 mM NaCl, 2 mM EDTA pH 8.0, 1% TritonX-100, 0.1% SDS), high salt buffer (20 mM Tris pH 8.0, 500 mM NaCl, 2 mM EDTA pH 8.0, 1% TritonX-100, 0.1% SDS), LiCl buffer (10 mM Tris pH 8.0, 250 mM lithium chloride, 1 mM EDTA pH 8.0, 0.5% NP40, 0.5% sodium-deoxycholate) and TE pH 8.0 containing 50 mM sodium chloride. The immune complexes were eluted in 100 µl elution buffer (100 mM NaHCO3, 1% SDS) and, after adding 4 µl 5 M sodium chloride and proteinase K, the crosslinks were reversed at 65°C overnight. DNA was extracted using the Ampure PCR purification kit (Beckman Coulter) according to manufacturer's instructions and analysed by qRT-PCR. Libraries of DNA fragments from chromatin immunoprecipitation were prepared from approximately 10 ng of DNA. Firstly, overhangs were repaired by treatment of sample material with T4 DNA polymerase, T4 PNK and Klenow DNA polymerase (all enzymes obtained from New England Biolabs, UK) in a reaction also containing 50 mM Tris-HCl, 10 mM MgCl2, 10 mM Dithiothreitol, 0.4 mM dNTPs and 1 mM ATP.
Samples were purified after each step using the Ampure PCR purification kit (Beckman Coulter). 'A' bases were added to 3' ends of fragments using Klenow Fragment (3'-5' exo-), allowing for subsequent ligation of adapter oligonucleotides (Illumina part #1000521) using Quick T4 DNA ligase. After a further Ampure clean up to remove excess adapters, fragments were amplified in a PCR reaction using adapter-specific primers (5'-CAAGCAGAAGACGGCATACGAGCTCTTCCGATC*T and 5'-AATGATACGGCGACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T). The libraries were purified and adapter dimers removed by running the PCR products on 2% agarose gels and excising gel slices corresponding to fragments approximately 200-300 bp in size, which were then extracted using the Qiagen gel extraction kit. Libraries were validated using quantitative PCR for known targets, and quality was assessed by running 1 µl of each sample on an Agilent Technologies 2100 Bioanalyser. Once prepared, DNA libraries were subjected to massively parallel DNA sequencing on an Illumina Genome Analyzer.

SDS-PAGE and Western Blotting
Protein extracts were separated on 10 % SDS-PAGE gels and western blots prepared by wet

CFSE Assay
Single cell suspensions of ES cells at a concentration of 2 x 10^6 cells/ml were labelled with 1 µM CFSE in PBS for 10 min at room temperature. The CFSE staining was quenched with media and cells were washed thoroughly to remove excess CFSE. Cells were seeded on gelatinised 12 well TC plates at a concentration of 4 x 10^4 cells per well in ES cell maintenance media.
FACS was performed every 12 h and unstained cells were used as a negative control.

Analysis of ChIP-sequencing data
The raw sequence data in FASTQ format returned by the Illumina Pipeline were aligned to the mm10 mouse genome build using BWA (Li and Durbin, 2010). The reads in the resulting alignment files in SAM format were used to generate density maps using BEDTools (Quinlan and Hall, 2010) and data were displayed using the UCSC Genome Browser (Kent et al., 2002).
Regions of enrichment (peaks) of ChIP data were identified using MACS (Zhang et al., 2008) and cisGenome (Ji et al., 2008) software. Only peaks common to the two peak calling methods were considered for further analysis. Peak overlaps, gene annotations and CpG island measurements were performed using in-house scripts. Peaks were allocated to genes as promoter peaks if located within the region from 2000 bp upstream to 500 bp downstream of the transcription start site (TSS), as intragenic peaks if not in the promoter but within the gene body region, or, if intergenic, to the nearest gene located within 100 kb. CpG island coordinates were downloaded from the UCSC genome browser, and a peak was counted as lying in a CpG island if its summit fell between the start and end coordinates of the island.
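The allocation rules above can be illustrated with a minimal Python sketch. This is not the in-house script used in the study; the `Gene` class, the function names, the use of the peak summit as the peak position, and measuring intergenic distance to the nearest gene boundary are all assumptions made for illustration. Only the window sizes (2000 bp upstream, 500 bp downstream, 100 kb) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PROMOTER_UPSTREAM = 2000            # bp upstream of the TSS (from the text)
PROMOTER_DOWNSTREAM = 500           # bp downstream of the TSS (from the text)
MAX_INTERGENIC_DISTANCE = 100_000   # 100 kb limit for nearest-gene allocation


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are leftmost/rightmost positions."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end of the gene on its own strand.
        return self.start if self.strand == "+" else self.end


def in_promoter(gene: Gene, chrom: str, summit: int) -> bool:
    """True if the summit lies within -2000/+500 bp of the TSS, strand-aware."""
    if gene.chrom != chrom:
        return False
    offset = summit - gene.tss
    if gene.strand == "-":
        offset = -offset  # positive offset now always means downstream
    return -PROMOTER_UPSTREAM <= offset <= PROMOTER_DOWNSTREAM


def annotate_peak(chrom: str, summit: int,
                  genes: List[Gene]) -> Tuple[str, Optional[str]]:
    """Return (category, gene name); the gene is None if nothing is within 100 kb."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    # 1. Promoter peaks take priority.
    for g in same_chrom:
        if in_promoter(g, chrom, summit):
            return ("promoter", g.name)
    # 2. Otherwise, peaks inside a gene body are intragenic.
    for g in same_chrom:
        if g.start <= summit <= g.end:
            return ("intragenic", g.name)
    # 3. Otherwise intergenic: nearest gene boundary within 100 kb (assumption).
    best: Optional[Tuple[int, str]] = None
    for g in same_chrom:
        dist = min(abs(summit - g.start), abs(summit - g.end))
        if dist <= MAX_INTERGENIC_DISTANCE and (best is None or dist < best[0]):
            best = (dist, g.name)
    return ("intergenic", best[1] if best else None)


def summit_in_cpg_island(summit: int, island_start: int, island_end: int) -> bool:
    """A peak counts as a CpG island peak if its summit lies inside the island."""
    return island_start <= summit <= island_end
```

Checking promoters before gene bodies mirrors the order in which the categories are described; a peak that matches neither is treated as intergenic and may remain unassigned if no gene lies within 100 kb.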

Development | Supplementary Material
A number of tools are designed for testing for differential binding sites; here we used MAnorm, a Bioconductor R package (Shao et al., 2012). We found that the 10577 peaks unique for FLK1+ cells were statistically significant differential binding sites at a cut-off of a p-value of >= 0.5 and that 8136 (77%) unique peaks were statistically significant at a cut-off of a p-value of >= 0.1. We also found that the 3099 peaks unique for progenitors were statistically significant differential binding sites at a cut-off of a p-value of >= 0.5 and that 2368 (76%) progenitor unique peaks were statistically significant at a cut-off of a p-value of >= 0.1. Moreover, 87% of the FLK1+ peaks were differential binding sites when comparing the 10577 unique FLK1+ peaks to the FLK1+ total peaks generated by MACS when using progenitors as a control sample. 77% of unique FLK1+ peaks were statistically significant differential binding sites at a MACS FDR cut-off of 7.
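As a quick arithmetic check, the percentages quoted above follow directly from the peak counts given in the text. The helper name `percent_of` is hypothetical; only the four counts come from the source.

```python
def percent_of(subset: int, total: int) -> int:
    """Percentage of `total` represented by `subset`, rounded to a whole number."""
    return round(100 * subset / total)


# Counts taken from the text: significant unique peaks / all unique peaks.
flk1_percent = percent_of(8136, 10577)
progenitor_percent = percent_of(2368, 3099)

print(f"FLK1+ unique peaks passing the stricter cut-off: {flk1_percent}%")
print(f"Progenitor unique peaks passing the stricter cut-off: {progenitor_percent}%")
```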
De novo motif analysis was performed on promoter and non-promoter (distal) peaks separately using HOMER (Heinz et al., 2010). Motifs of length 6, 8, 10, 12 and 14 bp were searched for within ±100 bp of the peak summit, and a random background sequence option was used. The motif matrices generated by HOMER were scanned against JASPAR with the use of STAMP to identify similarity to known transcription factor binding sites (Mahony and Benos, 2007). The top enriched motifs with a significant log p value score were recorded. The annotatePeaks function in HOMER was used to find occurrences of motifs in peaks and the distribution of motif density around the peak summit. In this case we used the discovered motif position weight matrices (PWM) with the most significant log p value.

Analysis of microarray data
The microarray gene expression scanned images were analysed with Feature Extraction Software 10.7.1.1 (Agilent) (protocol GE1_107_Sep09, Grid: 028005_D_F_20100614 and platform Agilent SurePrint G3 Mouse GE 8x60K). The raw data output by Feature Extraction Software was analysed using the LIMMA R package (Smyth et al., 2005) with quantile normalisation and background subtraction with the normexp method (Ritchie et al., 2007) using an offset value of 16. A contrast matrix and the eBayes function were used and a p value cut-off of <= 0.01 was applied. Only genes with a minimum log2 intensity value equal to or greater than 6 in at least one array were selected as expressed genes. Genes that changed expression at least two-fold up or down with respect to Sp1−/− were considered as differentially expressed.
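The selection criteria above can be summarised in a short Python sketch. This is an illustration only and not the LIMMA analysis itself: the function names are hypothetical, the p value and log2 fold change are assumed to have been computed beforehand, and requiring all three thresholds together is one reading of the text.

```python
from typing import Sequence

MIN_LOG2_INTENSITY = 6.0   # expressed if >= 6 in at least one array (from the text)
MIN_ABS_LOG2_FC = 1.0      # two-fold change corresponds to |log2 FC| >= 1
MAX_P_VALUE = 0.01         # p value cut-off (from the text)


def is_expressed(log2_intensities: Sequence[float]) -> bool:
    """A gene is expressed if any array reaches the minimum log2 intensity."""
    return max(log2_intensities) >= MIN_LOG2_INTENSITY


def is_differentially_expressed(log2_intensities: Sequence[float],
                                log2_fold_change: float,
                                p_value: float) -> bool:
    """Apply the expression, fold-change and p value thresholds together."""
    return (is_expressed(log2_intensities)
            and abs(log2_fold_change) >= MIN_ABS_LOG2_FC
            and p_value <= MAX_P_VALUE)
```

Working on the log2 scale makes the two-fold criterion symmetric: a log2 fold change of +1 is a doubling and -1 is a halving.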
The Principal Component Analysis (PCA) was carried out on the average value of the probe set intensity within each experiment and was calculated based on a pair-wise Pearson correlation coefficient matrix using R (R Core Team, 2013).
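The pair-wise Pearson correlation matrix on which the PCA is based can be sketched as follows. The original analysis was done in R; this Python version, its function names and its toy input are assumptions for illustration, and it covers only the correlation matrix, not the subsequent decomposition into principal components.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(experiments: List[Sequence[float]]) -> List[List[float]]:
    """Pair-wise Pearson matrix; each row of `experiments` is one averaged profile."""
    return [[pearson(a, b) for b in experiments] for a in experiments]


# Toy profiles (hypothetical values): the second is a scaled copy of the first,
# the third runs in the opposite direction.
profiles = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
matrix = correlation_matrix(profiles)
```

Principal components would then be obtained from the eigenvectors of this matrix, a step that is left to a numerical library.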
Clustering of gene expression was carried out on signal intensity for all expressed genes and on fold-changes for genes associated with at least a two-fold change. Hierarchical clustering was used with Euclidean distance and average linkage clustering. Heatmaps were generated using MeV from the TM4 microarray software suite (Saeed et al., 2006). We then clustered/grouped gene expression fold changes according to patterns of expression throughout differentiation (Fig. 4A). This analysis yielded 23 clusters and identified a large number of genes whose expression was unchanged. Signal intensity and fold changes of each cluster/group individually were hierarchically clustered and box plotted, respectively (Figs S4A, B).
Gene ontology (GO) analysis was performed using BiNGO (Maere et al., 2005)

term networks using kappa statistics implemented by ClueGO to link the terms in the network. The right-sided enrichment (depletion) test based on the hyper-geometric distribution is used for terms and groups. The groups are created by iterative merging of initially defined groups based on the kappa score threshold. The relationship between the selected terms is defined based on their shared genes, and the final groups are randomly coloured, with functional groups represented by their most significant term. One, two or more colours indicate that a gene/term is a member of one, two or more groups, respectively. The size of the nodes reflects the enrichment significance of the terms. The network is automatically laid out using the layout algorithm supported by Cytoscape.
A: KEGG pathway analysis of selected deregulated genes in Sp1−/− Flk1+ cells and progenitors that are shown in Figure 4C