Amplification of pico-scale DNA mediated by bacterial carrier DNA for small-cell-number transcription factor ChIP-seq

Chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) is used to map transcription factor occupancy and generate epigenetic profiles genome-wide. The requirement for nanogram amounts of ChIP DNA to generate sequencing libraries has impeded ChIP-seq on in vivo tissues with low cell numbers. We describe a robust, simple and scalable methodology for ChIP-seq of low-abundance cell populations, verified down to 10,000 cells. By employing bacterial carrier DNA, which does not map to mammalian genomes, during amplification, we reliably amplify down to 50 pg of ChIP DNA from transcription factor (CEBPA) and histone mark (H3K4me3) ChIP. We further demonstrate that genomic profiles are highly resilient to changes in the ratio of carrier DNA to ChIP DNA. This represents a significant advance over existing technologies, which involve either complex steps of pre-selection for nucleosome-containing chromatin or pre-amplification of precipitated DNA, making them prone to introducing experimental biases.


Background
Genomic mapping of histone modifications, their writers, readers and erasers, as well as transcription factors (TFs), has become a routine approach to study the genome-wide regulation of gene expression programs [1,2]. The most widely applied method to generate such global mapping data is chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq), which, however, generally requires millions of cells as input material (e.g. [3]). The scarcity of cells in distinct, isolated in vivo populations, such as phenotypically defined hematopoietic stem and progenitor cells, has thus hampered direct experimentation on these populations. Examining such sub-populations is of intense interest for elucidating the mechanisms governing lineage choice and commitment, as well as transcriptional deregulation in disease. Some researchers have cultured harvested cells to achieve sufficient cell numbers and performed ChIP-seq on cells undergoing differentiation in vitro, but this approach can give rise to biases due to culture conditions (e.g. [4]). Recent methodological advances have demonstrated successful ChIP-seq on very limited cell numbers [5,6]. However, these techniques are hampered by the need for additional rounds of pre-amplification, potentially making them prone to introducing artifacts. One of these methods was established on cultured cells [6], while the other was demonstrated only for ChIP-seq with antibodies against histone modifications [5], which yields significantly more immunoprecipitated DNA than ChIP with antibodies against transcription factors. Furthermore, the complexity and cost of pre-amplification deter implementation of the existing methods in many laboratories. Another recent method uses an elegant step of pre-ChIP indexing of histone H3-containing chromatin fragments, allowing downstream distinction of input material [7]. The mixed inputs may thus function as mutual carriers during single-tube, small-cell-number ChIP.
Importantly, the indexing step selects against nucleosome-poor genomic regions, making this approach less useful for unbiased investigation of genome-wide TF occupancy.
Here, we describe a straightforward and versatile workflow for both transcription factor and histone mark ChIP-seq on low-abundance cell populations isolated directly from the in vivo setting. A key element is the introduction of bacterial carrier DNA at the amplification step. This eliminates the need for pre-amplification and makes possible the robust generation of sequencing libraries from picogram amounts of ChIP DNA.

Histone mark ChIP-seq of hematopoietic cell populations
The scarcity of biologically relevant in vivo material often bars global-level investigations into normal development as well as into the aberrant regulation behind cancer and other complex diseases. Of particular interest are the genome-wide binding patterns of transcription factors and the associated epigenetic profiles, which may pinpoint aberrant molecular mechanisms underlying transcriptional dysregulation and development of disease. Here, we use a standard FACS regimen (Additional file 1: Figure S1) to isolate a specific hematopoietic GMP-blast population from Cebpa p30/p30 mice expressing a truncated variant of the myeloid transcription factor CEBPA [8]. These mice develop acute myeloid leukemia with complete penetrance and have been studied in detail [9-12]. However, the precise molecular dysregulation driving leukemogenesis remains obscure. We therefore developed a ChIP-seq assay compatible with the numbers of leukemic cells that can be isolated from the in vivo context. First, we optimized our ChIP protocol for small cell numbers; for clarity, the protocol is described in detail here. Immediately after sorting, isolated cells were exposed to formaldehyde to cross-link chromatin-associated proteins to the DNA, washed and snap-frozen in liquid nitrogen. Next, they were sonicated to break the chromatin into suitably sized fragments (Figure 1 and Methods). We found that careful inspection of the DNA size distribution of each batch of chromatin was useful to prevent further processing of low-quality samples. This was achieved either by processing a parallel sample of c-Kit-enriched BM cells, providing a sufficient cell number for standard gel electrophoresis, or by direct inspection of each sample using the Bioanalyzer DNA1000 assay (Methods and Additional file 2: Figure S2).
Chromatin from roughly 125,000 cells, equivalent to 250-300 ng of naked DNA, was used as input for each ChIP experiment with antibodies against the histone marks H3 Lys27 trimethylation (H3K27me3) or H3 Lys4 trimethylation (H3K4me3). ChIPs were performed in siliconized tubes with optimized washing conditions and titrated amounts of antibody and antibody-binding beads (Methods). Extended protein degradation and de-crosslinking steps, combined with phenol-chloroform extraction for retrieving ChIP DNA, ensured robust, high recovery. This approach allowed us to effectively enrich for genomic sequences associated with either H3K27me3 or H3K4me3, as assessed by quantitative PCR (qPCR) (Additional file 3: Figure S3). The H3K27me3 ChIP produced ca. 2 ng of DNA per sample. By making minor but important changes to the standard Illumina protocol, we were able to consistently amplify the 2 ng of ChIP DNA to generate libraries for high-throughput sequencing (Methods). The H3K4me3 ChIP yielded an amount of DNA below the effective range of standard absorbance or fluorescence assays. We circumvented this obstacle by taking advantage of the fluorescence Nanodrop instrument, which allows reliable detection of DNA down to 5 pg/μl in a 1 μl sample volume (Additional file 4: Figure S4). With this approach, H3K4me3 ChIP DNA was measured at ca. 700 pg, and samples were pooled to obtain the 2 ng sufficient for robust amplification (Methods). Using the Illumina HiSeq platform, we deep sequenced two libraries derived from two biologically independent samples for each of the two histone marks (Additional file 5: Table S1). We processed the aligned reads into genomic coverage profiles using standard procedures (Methods). Visual analysis of the profiles suggested good concordance with previous findings [5,13], showing enrichment of the H3K27me3 mark in gene bodies, intergenic regions and promoters, and of H3K4me3 in gene promoter regions (Figure 2A).
A quantitative analysis assigned 6% of H3K27me3 reads to promoter (5' proximal) and 56% to gene body (intronic/exonic) locations, while 21% of H3K4me3 reads resided in promoters (Figure 2B). Promoter H3K4me3 modifications were positively, and H3K27me3 negatively, correlated with activity of the associated genes, as observed previously (e.g. [5,13-17]) (Figure 2C). Finally, we assessed the reproducibility of our ChIP-seq approach by comparing coverage in promoter regions from two biologically independent replicates and found a high degree of correspondence, both by visual inspection (examples in Figure 2A and E) and by quantitative measures (Figure 2D). In the examined cell type, we observed a nearly mutually exclusive pattern of H3K4me3 and H3K27me3 marks in promoters (Figure 2E), as opposed to, e.g., embryonic stem cells, which display a set of double-marked promoters at poised genes, a hallmark of the undifferentiated state [18,19].
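The read-distribution percentages above come from assigning each aligned read to a genomic category. The sketch below illustrates such an assignment under stated assumptions: a toy gene model, a hypothetical ±1 kb promoter window around the transcription start site (TSS), and promoter-over-gene-body precedence. The exact category definitions used for Figure 2B are not stated in this section, so the window size and precedence rule are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Sequence, Tuple

PROMOTER_WINDOW = 1000  # hypothetical: reads within +/- 1 kb of a TSS count as promoter


class Gene(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def classify(chrom: str, pos: int, genes: Sequence[Gene]) -> str:
    """Assign one read position to 'promoter', 'gene_body' or 'intergenic'.

    Promoter wins when a read qualifies for both (e.g. near the TSS of one
    gene but inside the body of another). Linear scan: fine for a sketch,
    too slow for genome-wide data.
    """
    in_body = False
    for g in genes:
        if g.chrom != chrom:
            continue
        tss = g.start if g.strand == "+" else g.end - 1
        if abs(pos - tss) <= PROMOTER_WINDOW:
            return "promoter"
        if g.start <= pos < g.end:
            in_body = True
    return "gene_body" if in_body else "intergenic"


def read_distribution(reads: Iterable[Tuple[str, int]],
                      genes: Sequence[Gene]) -> Dict[str, float]:
    """Percentage of reads assigned to each category."""
    counts = Counter(classify(chrom, pos, genes) for chrom, pos in reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {cat: 100.0 * counts[cat] / total
            for cat in ("promoter", "gene_body", "intergenic")}


# Toy gene models and read positions for illustration only.
genes = [Gene("chr1", 10_000, 20_000, "+"), Gene("chr1", 50_000, 60_000, "-")]
reads = [("chr1", 10_500), ("chr1", 15_000), ("chr1", 59_500),
         ("chr1", 30_000), ("chr2", 10_500)]
print(read_distribution(reads, genes))
```

Whole-genome annotation would normally use an interval index or a tool such as BEDTools instead of the linear scan shown here.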

Transcription factor ChIP-seq of specific, isolated cell populations
ChIP against factors with focal, peak-like binding is generally expected to yield less DNA than ChIP against chromatin components showing broader distributions. This is in line with our observations for H3K4me3 versus H3K27me3 modifications, with characteristic peak-like and broad distributions, respectively (e.g. [13]). Accordingly, we expected transcription factor ChIP to yield even less output DNA. To address this issue, we used a larger input cell number. We performed ChIP with antibodies against CEBPA, a transcription factor known to be expressed in the GMP-blast population [10], using chromatin from 250,000 or 500,000 cells, corresponding to 500-1000 ng of input DNA per sample. By performing qPCR for amplicons covering known CEBPA target sequences, we verified the quality of the immunoprecipitation step (Additional file 3: Figure S3). Using the fluorescence Nanodrop, we measured ChIP DNA yields as low as 250 pg (250,000 cells), an amount prohibitive for standard use as a starting point for Illumina amplification.
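Enrichment measured by qPCR, as used here to verify the CEBPA and histone immunoprecipitations, is commonly reported as percent of input. The sketch below shows that standard arithmetic. It assumes 100% primer efficiency (one doubling per cycle) and an input sample representing a known fraction of the chromatin; the Ct values are hypothetical, not data from this study, and the normalization behind Additional file 3: Figure S3 may differ.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Express a ChIP qPCR signal as a percentage of total input chromatin.

    ct_ip          -- Ct of the amplicon measured on ChIP DNA
    ct_input       -- Ct of the same amplicon measured on the input sample
    input_fraction -- fraction of chromatin saved as input (e.g. 0.01 for 1%)

    Assumes perfect amplification efficiency (2-fold per cycle).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Shift the input Ct so that it represents 100% of the chromatin.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def fold_enrichment(pct_target: float, pct_negative: float) -> float:
    """Enrichment of a target amplicon over a negative-control amplicon."""
    return pct_target / pct_negative


# Hypothetical Ct values for illustration only.
target = percent_input(ct_ip=28.0, ct_input=27.0, input_fraction=0.01)
negative = percent_input(ct_ip=32.0, ct_input=27.0, input_fraction=0.01)
print(f"target: {target:.3f}% input, negative: {negative:.4f}% input")
print(f"fold enrichment over negative region: {fold_enrichment(target, negative):.1f}")
```

Reporting enrichment relative to a negative-control region, in addition to percent input, helps separate specific signal from background carry-over.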
We reasoned that absolute losses during the amplification procedure could be ameliorated by adding a DNA carrier. First, we tested a synthetic 120-mer oligonucleotide lacking the terminal 3'-OH group, which should prevent ligation to the amplification adaptors and hence preclude amplification during the PCR step. This gave rise to unacceptable biases, as shown by non-linear amplification of CEBPA-enriched regions (data not shown), probably due to the low complexity of the DNA pool during PCR amplification. We next hypothesized that complex carrier DNA that could itself be amplified would be ideally suited to overcome this problem. Hence, we performed an in silico mappability test of 10 million randomly extracted 50-bp E. coli sequences, of which none mapped to the mouse genome and 14,457 (<0.15%) mapped to the human genome (Additional file 6: Additional Methods). Adding fragmented bacterial DNA (1700 pg) to the CEBPA ChIP DNA (ca. 300 pg) produced the 2 ng total established to perform robustly in amplification. By deep sequencing the resulting compound library, we were able to map up to 20% of the obtained sequences to the mouse genome (Additional file 5: Table S1). This was consistent with the ratio between ChIP DNA and bacterial carrier combined with typical ChIP-seq mapping frequencies. CEBPA peaks were observed in promoter-proximal positions characteristic of many transcription factors (e.g. [20]) (Figure 3A). Quantitative assessment of CEBPA peak positions revealed a genomic distribution analogous to that of a thoroughly validated CEBPA ChIP-seq dataset obtained with liver cells (Figure 3B) [21]. Further visual inspection identified several examples of CEBPA peaks shared between the hepatocyte and myeloid progenitor data sets (Figure 3C). Strikingly, two archetypical homeostatic liver genes (Albumin (Alb) and Carbamoyl-Phosphate Synthase 1 (Cps1)) displayed CEBPA peaks in hepatocytes, but not in myeloid cells.
Conversely, many genes characteristic of the myeloid lineage displayed CEBPA peaks only in the myeloid ChIP-seq dataset (e.g. Myeloperoxidase (Mpo), Colony Stimulating Factor 3 Receptor (Granulocyte) (Csf3r), Matrix Metallopeptidase 8 (Neutrophil Collagenase) (Mmp8), Selectin L (Sell), Colony Stimulating Factor 2 (Granulocyte-Macrophage) (Csf2) and Fc Fragment of IgG, Low Affinity IIIb, Receptor (Fcgr3)) (Figure 3C). Several of these are known CEBPA targets. Two additional pieces of evidence supported the validity and specificity of our CEBPA genomic occupancy data. First, the top hit of a de novo motif search in the enriched sequences matched the known CEBPA binding logo (Figure 3D), and this motif was strongly enriched at peak centers (Figure 3E). Second, strong sequence conservation in these regions was centered on CEBPA motifs, implying functional evolutionary constraint (Figure 3F). Visual and quantitative comparisons of coverage from two biologically independent repeats indicated a high degree of reproducibility (Figure 3A, G, H). Collectively, these data provide evidence that transcription factor ChIP-seq on small-cell-number populations is possible using bacterial DNA as a carrier during the amplification step.
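Motif enrichment at peak centers, as in Figure 3E, can be checked by recording where motif occurrences fall relative to the middle of fixed-width, summit-centered peak sequences. The sketch below assumes the palindromic C/EBP consensus core TTGCGCAA as an exact-match motif; it does not reproduce the de novo motif or scanning method used in the study. Because the site equals its own reverse complement, scanning one strand finds both orientations; a non-palindromic motif would also need its reverse complement scanned.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

MOTIF = "TTGCGCAA"  # assumed C/EBP consensus core (palindromic)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def motif_offsets(sequences: Iterable[str], motif: str = MOTIF) -> Iterator[int]:
    """Yield offsets of motif centres from the centre of each peak sequence.

    Overlapping matches are reported. Only the forward strand is scanned,
    which suffices here because MOTIF equals its reverse complement.
    """
    k = len(motif)
    for seq in sequences:
        seq = seq.upper()
        centre = len(seq) // 2
        pos = seq.find(motif)
        while pos != -1:
            yield pos + k // 2 - centre
            pos = seq.find(motif, pos + 1)


def positional_histogram(sequences: Iterable[str], bin_size: int = 20,
                         motif: str = MOTIF) -> Dict[int, int]:
    """Count motif centres per offset bin; keys are bin lower edges."""
    counts = Counter(off // bin_size * bin_size  # floor division also bins negatives
                     for off in motif_offsets(sequences, motif))
    return dict(sorted(counts.items()))


assert reverse_complement(MOTIF) == MOTIF  # palindromic site
```

A sharp peak of counts in the bins around offset 0, compared with the same profile on shuffled or flanking background sequences, is what "enriched at peak centers" would look like in this representation.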

Amplification from picogram amounts of ChIP DNA
Many immunophenotypically defined hematopoietic compartments, e.g. the hematopoietic stem cells, consist of a very limited number of cells. ChIP from small cell populations, as performed previously by several groups [3,5,6,22-24], consistently yields very limited quantities of ChIP DNA. Hence, we wanted to investigate whether our carrier DNA amplification approach could be applied to picogram amounts of DNA. To test this, we generated a series of Illumina libraries amplified from compound DNA with varying ratios of bacterial carrier DNA to ChIP DNA, using ChIP DNA from three pooled CEBPA ChIP samples (performed on 250,000 cells each) to allow direct comparisons between libraries. Aiming to minimize the dilution ratio, we generated some of these libraries using total input amounts of 1000 or 500 pg. The resulting four libraries (CEBPA-3, 100 pg ChIP DNA and 1900 pg carrier; CEBPA-4, 100 and 900 pg; CEBPA-5, 100 and 400 pg; CEBPA-6, 50 and 450 pg) all displayed amplification output yields and size distributions comparable to those of libraries generated from 2 ng DNA (Additional file 7: Figure S5). High-throughput sequencing of these libraries resulted in mapping frequencies close to those expected from standard ChIP-seq mapping efficiency and the ratios of ChIP DNA to bacterial carrier DNA. For example, of roughly 85 million reads for CEBPA-5, 8.4 million mapped uniquely to the mouse genome (Additional file 5: Table S1). Visually, the genomic coverage profiles for each of the four new libraries closely matched our previous CEBPA tracks (Figure 4A). An analysis of the degree of correlation between the diluted libraries and CEBPA-1 indicated consistently high reproducibility (Figure 4B). Strikingly, the library containing the least ChIP DNA (CEBPA-6, 50 pg) displayed a Pearson correlation of 0.85 with the 300 pg CEBPA-1 library and overall recapitulated the genomic coverage of this library (Figure 4B, C).
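Correlations between libraries, such as the Pearson correlation of 0.85 between CEBPA-6 and CEBPA-1, are typically computed on read counts in genomic bins. The sketch below assumes a hypothetical 1 kb bin size and read start positions on a single chromosome; the binning and any transformation used for Figure 4B are not specified in this section.

```python
import math
from typing import Iterable, List, Sequence


def bin_coverage(read_starts: Iterable[int], chrom_length: int,
                 bin_size: int = 1000) -> List[int]:
    """Count read starts in consecutive fixed-size bins along one chromosome."""
    bins = [0] * (chrom_length // bin_size + 1)
    for pos in read_starts:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"read start {pos} outside chromosome")
        bins[pos // bin_size] += 1
    return bins


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant vector: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


# Toy example: two hypothetical libraries on a 3 kb "chromosome".
lib_a = bin_coverage([120, 150, 900, 1500, 2600, 2650, 2700], chrom_length=3000)
lib_b = bin_coverage([100, 980, 1700, 2500, 2550], chrom_length=3000)
print(f"Pearson r = {pearson(lib_a, lib_b):.2f}")
```

Because Pearson's r is unchanged when one vector is multiplied by a constant, scaling libraries to equal sequencing depth does not alter it, whereas a log transform of counts does. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.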
Next, we examined the diluted libraries for the presence of CEBPA peaks at known CEBPA targets (e.g. NUF2, NDC80 kinetochore complex component (Nuf2); Smg-7 homolog, nonsense mediated mRNA decay factor (Smg7); CCAAT/enhancer binding protein (C/EBP), alpha and beta (Cebpa, Cebpb); prostaglandin-endoperoxide synthase 2 (Ptgs2); and interleukin 6 receptor, alpha (Il6ra)) and found these peaks in all coverage profiles (Figure 5A) [21].
To test our panel of libraries further, we performed qPCR to quantify amplicons corresponding to the CEBPA-enriched regions mentioned above, which indicated reproducible and practically uniform amplification across dilution ratios and input amounts (Figure 5B).

ChIP-seq using 10,000 cells
To investigate whether our methodology allows ChIP-seq on limited cell numbers, we transplanted a new cohort of mice with the leukemic strain used above and FACS-isolated batches of 10,000 GMP-blasts. We optimized the chromatin preparation procedure to ensure maximal input material for the small-cell-number ChIP (Additional file 8: Additional Protocols and Additional file 9: Figure S6). Thorough optimization of ChIP conditions produced robust enrichment for histone mark (H3K4me3) as well as TF (CEBPA) ChIP, and both compared favorably to ChIPs performed according to a previously published 10,000-cell ChIP methodology by Zwart et al. [24] (Figure 6A, B). The major advance in that study was the addition of a combined mRNA/histone carrier/blocker during the ChIP step. We amended our protocol to include such a carrier, resulting in even better enrichment as assessed by qPCR (Figure 6A, B). A small amount of the 10,000-cell ChIP DNA was used for examining enrichment and establishing quantity, leaving ca. 60 pg of DNA. Parallel generation of sequencing libraries with and without bacterial carrier demonstrated that, in the 60 pg ChIP DNA range, carrier is required for robust production of high-quality Illumina libraries (Additional file 10: Figure S7). Both visual inspection and quantitative analyses of genomic coverage tracks from 10,000-cell ChIPs revealed good correspondence with libraries generated from biologically independent higher-cell-number ChIPs (Figure 6C, D, E). CEBPA-1 peaks were to a high degree shared with a biological repeat (CEBPA-2), the diluted library (CEBPA-6) and the low-cell-number ChIP (CEBPA-10K) (Additional file 6: Additional Methods). Importantly, the H3K4me3-10K profile generated from 60 pg ChIP DNA, which displays substantially fewer mapped reads (Additional file 5: Table S1), is highly similar to the H3K4me3-1 profile, which was generated from 2 ng of ChIP DNA without bacterial carrier during amplification (Figure 6C, D).
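The extent to which peak sets are shared, as reported above for CEBPA-1 against CEBPA-2, CEBPA-6 and CEBPA-10K, can be quantified as the fraction of peaks in one set that overlap at least one peak in another. The sketch below assumes half-open (start, end) intervals as in BED files and counts any overlap of at least 1 bp as shared; the criteria actually used are described in Additional file 6 and may differ.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def _build_index(peaks: Sequence[Peak]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Per chromosome: sorted peak starts and running maximum of peak ends."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_end = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_end)
    return index


def shared_fraction(peaks_a: Sequence[Peak], peaks_b: Sequence[Peak]) -> float:
    """Fraction of peaks in A that overlap at least one peak in B by >= 1 bp."""
    if not peaks_a:
        return 0.0
    index = _build_index(peaks_b)
    shared = 0
    for chrom, start, end in peaks_a:
        if chrom not in index:
            continue
        starts, max_end = index[chrom]
        i = bisect_left(starts, end)  # B peaks with start < end are candidates
        if i > 0 and max_end[i - 1] > start:  # some candidate ends after start
            shared += 1
    return shared / len(peaks_a)


# Hypothetical peak sets for illustration only.
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200), ("chr1", 1000, 1100)]
peaks_b = [("chr1", 150, 160), ("chr1", 50, 120), ("chr1", 590, 700), ("chr3", 100, 200)]
print(f"shared: {shared_fraction(peaks_a, peaks_b):.0%}")
```

The measure is asymmetric: the fraction of A shared with B generally differs from the fraction of B shared with A, so the reference set should be stated.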
The recent establishment of the online Next Generation Sequencing Quality Control (NGS-QC) Generator, a useful cross-platform quality assessment tool [25], allowed a quantitative comparison of our ChIP-seq approach with previously published methods (Additional file 6: Additional Methods). Our methodology robustly produced NGS-QC scores matching or exceeding those from previous approaches (Figure 6F and Additional file 11: Table S2). Specifically, Shankaranarayanan (LinDA) amplification [6] resulted in considerably lower NGS-QC scores in the 10,000-cell range, while Adli amplification [5] and the Zwart methodology [24] produced data sets with comparable scores for histone modification and transcription factor ChIP-seq profiles, respectively (Figure 6F).

Discussion and conclusions
Here, we present a complete workflow for ChIP-seq on the limited cell populations of distinct in vivo cell compartments. By isolating a defined cell population with a standard immunophenotypic sorting regimen and using a rigorous and reliable ChIP approach, we were able to enrich regions marked by both broadly (H3K27me3) and more narrowly (H3K4me3) distributed histone modifications, as well as the precise regions defined by occupancy of the central hematopoietic transcription factor CEBPA. Importantly, the fluorescence Nanodrop instrument allowed us to reliably measure the obtained picogram amounts of ChIP DNA. We demonstrate that these amounts of ChIP DNA could be faithfully amplified to generate libraries for Illumina sequencing by the addition of fragmented E. coli carrier DNA. Further, we show that the resulting coverage profiles match previous data for the histone marks associated with actively transcribed (H3K4me3) and repressed (H3K27me3) genes, and that they specifically recover known myeloid-specific CEBPA target regions. We provide evidence that CEBPA-bound regions from a pool of ChIP DNA down to 50 pg are amplified in a linear manner with our technique. Finally, we demonstrate that our straightforward methodology can produce high-quality ChIP-seq coverage tracks from as few as 10,000 isolated in vivo cells using antibodies against a histone mark (H3K4me3) as well as a transcription factor (CEBPA). Importantly, the 10,000-cell H3K4me3 ChIP genomic coverage profile is very similar to the profiles generated without bacterial carrier (H3K4me3-1/-2), indicating that the addition of bacterial carrier does not impede amplification or Illumina sequencing. Our methodology can be used to elucidate important biological circuitries at a global level. For instance, we can clearly detect cell-type-specific differences in direct CEBPA targets, as illustrated in Figure 3.
Importantly, two recently published studies demonstrate the usefulness of our approach by revealing molecular mechanisms behind the initiation of acute myeloid leukemia and the regulation of hematopoietic stem cells [26,27]. Furthermore, our technology should combine easily with other genomics approaches, for instance ChIA-PET [28], to permit generation of libraries from otherwise prohibitively small amounts of DNA.
Other available methodologies that allow generation of ChIP-seq data from limited amounts of input material rely on an extra amplification step, based either on custom-designed random primers or on T7 RNA polymerase technology [5,6]. Although these methods are very useful, introducing extra and complex steps in the amplification procedure inevitably increases the risk of errors and is costly and time-consuming. Our method is based on the conceptually simple addition of non-mapping DNA as carrier and should therefore be easy to implement in any laboratory that already performs Illumina-platform ChIP-seq. A disadvantage of our approach is the added cost of sequencing E. coli DNA, generating data that will be discarded. This drawback increases with the ratio of bacterial to ChIP DNA, i.e. as fewer cells are used or less material is recovered, for example as a result of poor antibodies or low expression of the ChIPed factor. We have addressed this issue by demonstrating the feasibility of using just 500 pg of total input material for the amplification procedure. The steady decline in sequencing prices should also help reduce this disadvantage.
Several groups have optimized ChIP from very small cell numbers (e.g. [3]), whereas our study focuses on refining the amplification step of ChIP-seq. Some reports have shown that adding carrier chromatin during the ChIP stage facilitates application to small cell numbers, but these methodologies are either untested with or incompatible with high-throughput sequencing [29]. Recently, Zwart et al. demonstrated an increase in the ChIP-seq signal-to-noise ratio by adding a carrier consisting of RNA and histones [24]. While ChIP protocols generally include bovine serum albumin as a carrier (e.g. [30]), Zwart and coworkers speculate that combined oligonucleotide RNA and histones mimic chromatin better and hence offer improved blocking of spurious binding. Both components are degraded prior to the amplification step, making the modification suitable for use in ChIP-seq. We compared a low-cell-number-optimized version of our methodology with the Zwart approach and found that ours performed better for the tested antibodies (Figure 6A, B). Nevertheless, adopting the mRNA/histone ChIP-level carrier considerably improved our protocol (Figure 6A, B), clearly demonstrating the value of the Zwart et al. contribution to the development of ChIP technology.
In summary, our study, in combination with the progress of sequencing technology, makes relatively straightforward ChIP-seq possible even on very small cell numbers, e.g. from stem cell populations. This opens the door for extensive genome-wide investigation of the regulatory circuitries at all differentiation levels of in vivo tissues.

Methods
Additional methods and protocols are provided in the additional files (Additional file 6: Additional Methods and Additional file 8: Additional Protocols and Buffer Recipes).

Mouse work
Blast-GMP populations were generated as described previously [31]. Briefly, fetal livers were isolated from E14.5 Cebpa p30/p30 (CD45.2) female embryos, homogenized and filtered through a 70 μm filter. Each liver was transplanted into four lethally irradiated recipients (CD45.1) by tail vein injection. The recipients developed acute myeloid leukemia (AML) within 9-11 months after transplantation and were sacrificed when moribund. The bone marrow (BM) was isolated and retransplanted into sublethally irradiated recipients (1.5 × 10^6 whole BM cells per recipient). The procedure was repeated to generate a tertiary leukemia, from which whole BM was retrieved for isolation of GMP-blasts. Genotyping was performed as previously described [10], using genomic tail DNA as template. All mouse work was performed according to national and international guidelines and approved by the Danish Animal Ethical Committee under license #2012-15-2934-00725. For sorting, cells were c-Kit-enriched using CD117 microbeads and MACS LS separation columns (Miltenyi Biotec) prior to staining. GMP-blasts used in this study were defined as previously described [8,10] (Additional file 1: Figure S1).

Chromatin preparation
GMP-blasts were sorted on ice into pre-wetted siliconized microcentrifuge tubes (Biozym, cat#1267-2970) containing 300 μl PBS with 3% fetal calf serum (FCS) (Saveen & Werner, cat#FB-1090/500). The volume was adjusted to 1.4 ml with cold PBS-3% FCS, and samples were cross-linked in 1% formaldehyde for 10 minutes at room temperature on a rotator. Cross-linking was quenched with 0.125 M glycine, and samples were washed twice in cold PBS-3% FCS using a swinging-bucket centrifuge (4 minutes, 600 g) with soft spin settings to minimize material loss. Cells were lysed as described previously in 300 μl lysis buffer [21], pipetting up and down 10 times with a 100 μl low-retention tip. Samples were sonicated using a Bioruptor Plus sonicator for 30 cycles (15 s on/30 s off, high setting), and debris was pelleted by cold centrifugation at 14,000 g for 10 minutes. Fragmentation of chromatin was evaluated on extracted DNA using either 1) a c-Kit-enriched 500,000-cell sample (see above) processed in parallel (one of six tubes sonicated simultaneously) and agarose gel electrophoresis, or 2) direct size inspection of a 20 μl aliquot of each sample using a Bioanalyzer (Agilent DNA1000 kit, cat#5067-1504) (Additional file 2: Figure S2). Output quantity was determined for each sample using the Qubit instrument (Invitrogen, HS assay, cat#Q32851).
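Reaching the stated final concentrations of formaldehyde (1%) and glycine (0.125 M) requires a simple dilution calculation in which the added stock itself contributes to the final volume. The sketch below assumes, hypothetically, a 37% formaldehyde stock and a 2.5 M glycine stock added to the 1.4 ml sample; the protocol does not state the stock concentrations, so the numbers are illustrative only.

```python
def volume_to_add(current_volume: float, stock_conc: float,
                  final_conc: float) -> float:
    """Volume of stock to add so that the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (current_volume + v) for v, i.e. the
    added volume itself counts toward the final volume. Both concentrations
    must use the same unit (% or mM); the result is in the unit of
    current_volume. Assumes the sample starts without any of the reagent.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("need 0 < final_conc < stock_conc")
    return final_conc * current_volume / (stock_conc - final_conc)


sample_ul = 1400.0  # sample volume after adjustment to 1.4 ml
# Hypothetical stocks: 37% formaldehyde and 2.5 M (2500 mM) glycine.
formaldehyde_ul = volume_to_add(sample_ul, stock_conc=37.0, final_conc=1.0)
glycine_ul = volume_to_add(sample_ul + formaldehyde_ul,
                           stock_conc=2500.0, final_conc=125.0)
print(f"add {formaldehyde_ul:.1f} ul formaldehyde, then {glycine_ul:.1f} ul glycine")
```

Ignoring the added volume (the simpler C1V1 = C2V2 on the starting volume) slightly underestimates the stock needed; the difference matters little at these dilutions but is easy to avoid.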

Preparation of libraries from nano-and picogram input DNA
Amplification of 2 ng ChIP DNA was performed essentially as described by the manufacturer (NEB, cat#E6240S), using precast 2% SYBR agarose gels (Invitrogen, cat#G5218-02) and excising bands of 175-400 bp. Key modifications consisted of a 30-minute ligation step, 30-minute solubilization of excised gel fragments at 37°C, and a prolonged, double run-through elution step (three minutes each) with preheated (55°C) elution buffer for all column purifications (Qiagen, cat#28104, 28704, 28004) to ensure robust recovery. Amplification of picogram amounts of precipitated DNA was performed by adding carrier up to a total of 500 pg, 1000 pg or 2 ng, as indicated, using purified chromosomal E. coli DNA sonicated to a size distribution of 200-500 bp (Diagenode current protocols). All steps were performed in LoBind tubes (Eppendorf, cat#525-0130). Libraries were generated for duplex or triplex sequencing using a NEB kit (cat#E7335S), and size distributions were assessed by Bioanalyzer (Agilent, High Sensitivity kit, cat#5067-4626) (Additional file 7: Figure S5 and Additional file 10: Figure S7). The full protocol is included in the additional files (Additional file 8: Additional Protocols and Buffer Recipes).
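The carrier top-up and its expected effect on sequencing yield reduce to simple arithmetic. The sketch below assumes, as a simplification, that reads are partitioned between ChIP and carrier DNA in proportion to their input mass (ignoring any difference in amplification efficiency). Under that assumption, the CEBPA-5 figures reported in Results (100 pg ChIP DNA in 500 pg total, roughly 85 million reads, 8.4 million uniquely mapped) imply that about half of the ChIP-derived reads mapped uniquely; this is an illustrative calculation, not a measured quantity.

```python
def carrier_to_add(chip_pg: float, total_pg: float) -> float:
    """Picograms of E. coli carrier needed to bring ChIP DNA up to total_pg."""
    if not 0 < chip_pg <= total_pg:
        raise ValueError("need 0 < chip_pg <= total_pg")
    return total_pg - chip_pg


def chip_fraction(chip_pg: float, total_pg: float) -> float:
    """Mass fraction of the library input that is ChIP (mouse) DNA."""
    return chip_pg / total_pg


def expected_unique_reads(total_reads: float, chip_pg: float, total_pg: float,
                          unique_rate: float) -> float:
    """Expected uniquely mapped mouse reads, assuming reads split by input mass.

    unique_rate is the assumed fraction of ChIP-derived reads that map uniquely.
    """
    return total_reads * chip_fraction(chip_pg, total_pg) * unique_rate


def implied_unique_rate(total_reads: float, unique_reads: float,
                        chip_pg: float, total_pg: float) -> float:
    """Invert expected_unique_reads for an observed library."""
    return unique_reads / (total_reads * chip_fraction(chip_pg, total_pg))


# Library designs described in Results: (ChIP pg, total pg).
designs = {"CEBPA-3": (100, 2000), "CEBPA-4": (100, 1000),
           "CEBPA-5": (100, 500), "CEBPA-6": (50, 500)}
for name, (chip, total) in designs.items():
    print(name, carrier_to_add(chip, total), "pg carrier,",
          f"{chip_fraction(chip, total):.0%} ChIP DNA")

rate = implied_unique_rate(85e6, 8.4e6, chip_pg=100, total_pg=500)
print(f"CEBPA-5 implied unique mapping of ChIP-derived reads: {rate:.2f}")
```

The same functions show how sequencing cost scales: halving the ChIP fraction at a fixed total input halves the expected number of informative mouse reads per sequencing run.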

Sequencing and mapping
All libraries were single-end sequenced on the Illumina HiSeq2000 platform at the Danish National High-throughput DNA Sequencing Centre. The resulting 50-mer reads were mapped to the NCBI37/mm9 (Mus musculus) genome assembly using Bowtie v. 0.12.8 with standard settings for unique mapping [32]. An overview of sequencing and mapping statistics is presented in Additional file 5: Table S1. See the additional files for mapping of bacterial carrier sequences (Additional file 6: Additional Methods).
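The in silico mappability test of the carrier DNA, and the mapping statistics of compound libraries, can be sketched as three steps: draw random 50-bp windows from the carrier genome, align them externally to the mouse or human genome, and tally mapped versus unmapped records. The sketch covers steps 1 and 3 only. It assumes alignments are written in SAM format and that unaligned reads (and reads whose alignments were suppressed as non-unique) appear as records with the unmapped flag bit 0x4 set; the toy genome, read names and placeholder quality scores are illustrative, and this is not the study's actual pipeline.

```python
import random
from typing import Iterable, List, Tuple

ACGT = set("ACGT")


def random_kmers(genome: str, k: int = 50, n: int = 10, seed: int = 0,
                 max_attempts_per_read: int = 1000) -> List[Tuple[int, str]]:
    """Draw n random k-bp windows (start, sequence) from a genome string.

    Windows containing N or other ambiguity codes are skipped. Reads come from
    the forward strand only; the aligner searches both strands anyway.
    """
    genome = genome.upper()
    max_start = len(genome) - k
    if max_start < 0:
        raise ValueError("genome shorter than k")
    rng = random.Random(seed)
    reads: List[Tuple[int, str]] = []
    attempts = 0
    while len(reads) < n:
        attempts += 1
        if attempts > n * max_attempts_per_read:
            raise RuntimeError("too few unambiguous windows in genome")
        start = rng.randint(0, max_start)  # randint includes max_start
        kmer = genome[start:start + k]
        if set(kmer) <= ACGT:
            reads.append((start, kmer))
    return reads


def to_fastq(reads: List[Tuple[int, str]], prefix: str = "ecoli") -> str:
    """Format reads as FASTQ with a constant placeholder quality string."""
    records = [f"@{prefix}_{i}_pos{start}\n{seq}\n+\n{'I' * len(seq)}"
               for i, (start, seq) in enumerate(reads)]
    return "\n".join(records) + "\n"


def count_sam_reads(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped, unmapped) read counts from SAM text lines.

    Header lines are skipped, as are secondary (0x100) and supplementary
    (0x800) records, so each read is counted once.
    """
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


def percent_mapped(n_mapped: int, n_total: int) -> float:
    return 100.0 * n_mapped / n_total


# Step 1: simulated reads (a toy random "genome" stands in for E. coli).
toy = "".join(random.Random(1).choice("ACGT") for _ in range(5000))
fastq_text = to_fastq(random_kmers(toy, k=50, n=3, seed=7))
# Step 2 (external): align fastq_text to the mouse or human genome, SAM output.
# Step 3: tally with count_sam_reads; the figure below is the reported human result.
print(f"{percent_mapped(14_457, 10_000_000):.3f}% mapped")
```

For a compound ChIP/carrier library, the unmapped count from step 3 includes the carrier-derived reads, so the mapped fraction should be read against the ChIP DNA fraction of the input rather than against conventional ChIP-seq mapping rates.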