Redesigning chromosomes to optimize conformation capture (Hi-C) assays

In all chromosome conformation capture-based experiments, the accuracy with which contacts are detected varies considerably because of the uneven distribution of restriction sites along genomes. Here, we redesigned and reassembled in yeast a 145 kb region with regularly spaced restriction sites for various enzymes. Thanks to this design, we enhanced the signal-to-noise ratio and improved the visibility of the entire region as well as our understanding of Hi-C data, while opening new perspectives for future studies.

Results

Genomic derivatives of the chromosome conformation capture assay (3C, Hi-C, Capture-C) (Lieberman-Aiden et al, 2009; Dekker et al, 2002; Hughes et al, 2014) are widely applied to decipher the average intra- and inter-chromosomal organization of eukaryotes and prokaryotes (Sexton et al, 2012; Le et al, 2013; Dekker et al, 2013; Marbouty et al, 2014a). Formaldehyde cross-linking followed by segmentation of the genome by a restriction enzyme (RE) are the first steps of the experimental protocol. The basic unit of “C” experiments therefore consists of restriction fragments (RFs) that are subsequently religated and captured to identify long-range contacts. The best resolution that can be obtained is directly imposed by the positions of the RE sites along the genome. Both 6-cutter and 4-cutter REs have been used (Marie-Nelly et al, 2014; Sexton et al, 2012; Rao et al, 2014; Le et al, 2013), the latter with the expectation that the resolution increases with the number of sites. However, this approach suffers from a major caveat: restriction sites (RSs) are not regularly spaced along genomes. The distribution of RF lengths follows a geometric distribution, with important variations along the genome that depend on the local GC content and the specific sequence recognized by the RE.
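The dependence of RF lengths on the recognition sequence can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy sequence are hypothetical, the cut is placed at the first base of the recognition site (a simplification, since real enzymes cut at enzyme-specific offsets), and the expected spacing assumes a random sequence with equal base frequencies.

```python
import re

def fragment_lengths(sequence, site):
    """Lengths of the fragments obtained by cutting `sequence` at each `site`.

    Simplification: the cut is placed at the first base of the site.
    """
    # A lookahead pattern also finds overlapping occurrences of the site.
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length pieces (e.g. a site located at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

def expected_spacing(site):
    """Mean distance between sites in a random sequence with equal base usage."""
    return 4 ** len(site)

toy = "AAGATCTTTTGATCAAAAAAAAGATCC"  # hypothetical toy sequence
print(fragment_lengths(toy, "GATC"))
print(expected_spacing("GATC"), expected_spacing("AAGCTT"))
```

Under these assumptions a 4-cutter site such as GATC (DpnII) is expected every 256 bp and a 6-cutter site such as AAGCTT (HindIII) every 4,096 bp on average, with fragment lengths scattered widely around these means.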
Given that the likelihood for an RF to be crosslinked by formaldehyde during the first step of the procedure depends on its length (Cournac et al, 2012), the probability of detecting a given fragment in any 3C experiment is in turn strongly affected by this parameter (Fig 1A). Normalization procedures have been developed to correct the signal (Cournac et al, 2012; Imakaev et al, 2012), but these methods involve filtering out fragments with unusually low or high signal and aggregating the contact data over several consecutive fragments into longer bins of fixed genomic length, at the expense of actual resolution (Lajoie et al, 2015). Overall, the definition of Hi-C resolution has remained empirical because of the lack of a control sequence in which RF biases are alleviated. To investigate and increase the resolution of 3C-based experiments, we designed and assembled a dedicated “synthetic” genomic region. As a proof of concept of this strategy, we describe here a redesigned ~150 kb region (called here synIV-3C) of budding yeast chromosome 4. This designer chromosome closely resembles the native chromosome with respect to genetic elements (see Supplementary Note 1 and Fig S1), but was “designed” to yield high resolution and high visibility in 3C experiments by providing nearly equally spaced restriction sites. The RSs of four different enzymes were removed from the native sequence with point mutations and subsequently reintroduced within the sequence at regularly spaced positions (every 400 bp, 1,500 bp, 2,000 bp and 6,000 bp for DpnII, XbaI, HindIII and NdeI, respectively; Fig 1B and Fig S2). As shown in Fig 1C, the DpnII and HindIII RF sizes in the redesigned synIV-3C region are normally distributed, in contrast with the highly skewed native genome-wide distributions.
Besides providing a way to increase the resolution of the 3C experiment, the design can also be used to focus on specific functional contacts, for instance between promoter and terminator regions of genes (Fig S2). When possible, coding sequences were targeted preferentially and modified using synonymous mutations (Fig S1). We identified a 150 kb window on chromosome 4 for which the uniformity of RF lengths was maximized while the number of potentially deleterious base changes was minimized (the final choice of region can also take into account sequence annotation and be guided primarily by the specific interests of the end-user). From this design, DNA building blocks were purchased and assembled as described (Annaluru et al, 2014; Muller et al, 2012) (Supplementary Note 2). Sequencing confirmed that 144 kb within the targeted region were replaced by the redesigned sequence and that 100% of the mutations were introduced at the correct positions, corresponding to a total of 2% divergence from the reference genome. No significant growth defects were detected in the synthetic strain (Fig S3). We then performed Hi-C experiments on the strain carrying the synIV-3C redesigned chromosome as well as on a wild-type (WT) strain, using DpnII and HindIII (Supplementary Note 3). The raw DpnII contact map of chromosome 4 exhibited a remarkably “smooth” pattern within the redesigned region compared to the native flanking regions (Fig D). The read coverage over the region also changed markedly, with a more homogeneous and regular distribution in the synthetic region for both enzymes compared to a highly heterogeneous distribution in the native sequence (Figs 2A, B). Interestingly, careful examination of this distribution indicates that, besides its own length, the capture frequency of a given fragment is also influenced by the length of its neighbors.
To quantify the improvement in the synIV-3C region, we compared its signal with the signal obtained over the same region in the WT strain, using the same number of aligned read pairs and identical bins of various sizes (Figs 2C, D). At the smallest bin sizes tested (600 bp for DpnII and 2,400 bp for HindIII), the WT contact map exhibited numerous blind regions with no detectable contacts (empty bins), in sharp contrast with its synthetic counterpart (Figs 2C, D). When fragments were aggregated in bins of increasing sizes (hence resulting in a loss of resolution), these blind regions gradually disappeared, although the heterogeneity of the data remained consistently higher in the WT than in the synIV-3C strain, as shown by the larger span of the color scales of the WT maps.

To further quantify this heterogeneity, we computed the cumulative distributions of the number of contacts between bins separated by a given genomic distance s (bp) in the synIV-3C region and in its native counterpart for DpnII and HindIII (Figs 2C and 2D, respectively). The redesigned region systematically exhibited more homogeneous contact counts and narrower distributions than the WT region, both at short distances (s = 2 × bin size; Figs 2C and D, middle panels) and at longer distances (Supplementary Note 5 and Figs S4, S5). Some of the bins in the native region remain almost invisible to the assay as a result of the heterogeneity in RF distribution (blue squares on Figs 2C and D, middle panels). We computed the coefficient of variation (CV, i.e. standard deviation/mean) of these distributions for multiple values of s, and used this value as an indication of the signal-to-noise ratio (Figs 2C and D, right panels).

Interestingly, we found that even for large bins the CV is significantly and consistently smaller in the synthetic region, again indicating improved resolution. These results also clearly illustrate the advantage, with respect to resolution, of using a frequent cutter (DpnII vs. HindIII), since the distribution of contact counts between bins remains much more spread out with HindIII than with DpnII, even for native sequences (Fig 2B).

Chromosome conformation capture is a dynamic field: two approaches using modified restriction patterns have recently been used to improve the resolution, DNase Hi-C and Micro-C (Hsieh et al, 2015) (note also that enrichment steps for regions of interest do not alleviate the limitations associated with the natural restriction pattern described above, and thus have no effect on the resolution per se). DNase Hi-C captures contacts between open chromatin sites. DNase Hi-C has not been performed in yeast and therefore we did not compare Syn-3C with this approach. However, given that DNase-sensitive sites are found approximately every 3 kb along the yeast genome (Ma et al, 2015), DNase Hi-C is expected to give results comparable to HindIII Hi-C. Micro-C, on the other hand, exploits micrococcal nuclease (MNase) rather than a restriction enzyme to digest DNA. This approach generates nonspecific cuts between nucleosomes (every ~160 bp), resulting in a relatively regular restriction pattern. Micro-C reads were reprocessed and the outcome compared to the Syn-3C redesigned region along chromosome 4. Although the Micro-C read density is overall more regular than for a WT Hi-C experiment, nucleosome-free regions generate some inhomogeneity in the distribution. At short distances (600 bp) Micro-C and Syn-3C compared well, but the signal-to-noise ratio quickly drops for Micro-C at larger distances. In this respect, the two approaches aim at different objectives: whereas Micro-C captures small domains well, Syn-3C appears as an approach of choice to concomitantly i) improve the visibility of any given region from ~500 bp and above, and most importantly ii) track trans interactions as well as iii) homologs.
The yeast genome presents a relatively homogeneous GC content and few repeated sequences. The gain in resolution achieved by redesigning RSs along the genome should therefore be even higher in organisms with more heterogeneous genomic content, and will enable unbiased tracking of entire regions that are otherwise inaccessible to the experiment.

One could envision, for instance, assembling the redesigned chromosome in yeast (Benders et al, 2010) before targeting the sequence to replace its native counterpart in the organism of interest (such as a bacterium, or eventually mammalian cells). Other advantages of the approach include the modularity of the assembly step (Supplementary Note 2), which allows the introduction of building blocks carrying genetic elements of interest within the redesigned region. For instance, one could introduce highly expressed promoters in the middle of "gene desert" areas to investigate the effect of gene expression on the local chromatin structure. One can also "shuffle" some of these building blocks to look at the influence of specific DNA binding proteins on the contact networks. In addition, an interesting follow-up to this study is to cross our synIV-3C strain with a WT strain (or a strain with a different design) in order to resolve both homologs in a single experiment. Finally, combining the synthetic region with Capture-C-like approaches (Hughes et al, 2014), which enrich for regions of interest (though without alleviating the inherent biases), will further increase the depth of the analysis. To our knowledge, this 3C-friendly design is the first instance in which a large (>100 kb) chromosomal region has been specifically redesigned and assembled for the purpose of improving an assay, so that specific questions related to the biology of the cell can be addressed more precisely and accurately. It paves the way to more studies exploiting the power of synthetic biology to boost, refine, and maybe reshape traditional molecular biology approaches through orthogonal ones.

Acknowledgements
We thank Jef Boeke for the Sc2.0 PCRTags sequences for chromosome 4 and for fruitful discussions and comments on the manuscript. We thank Elodie Pirayre and Ivan Moszer for contributing to the initial steps of the design of the algorithm, and Axel Cournac, Martial

Redesigning chromosomes to optimize conformation capture (Hi-C) assays
Muller Heloise 1,2*, Scolari F. Vittore 1,2*, Mercy Guillaume 1,2, Agier Nicolas 3,4, Descorps-Declere Stephane 5, Fischer Gilles 3,4, Mozziconacci Julien 6,7, and Koszul Romain 1,2

We aimed at modifying the native sequence of a budding yeast chromosome according to our design principles while introducing as few modifications as possible. Because we were planning to reassemble only a 150 kb window within the genome, we scanned the overall sequence with a quality scoring function to look for candidate regions qualifying as the ideal target, i.e. where our principles would introduce a minimal number of mutations.
The starting material was the S. cerevisiae SK1 strain genome sequence and annotations (Liti et al, 2009) and a list of 9 restriction enzymes (EcoRI, HindIII, NdeI, PstI, SacI, SacII, SalI, XbaI, XhoI and DpnII). REs were selected based on their low cost and restriction efficiency.
• Whether the position belongs to a restriction site.
• If it belongs to an intergenic or coding region, and in the latter case, the codon it belongs to and its position.
Sliding windows of 150 kb moving with 10kb steps were then generated over the entire genome.
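The window scan can be sketched as follows. This is only an illustration: the scoring function actually used is not detailed in this note, so `edit_cost` below is a hypothetical stand-in that counts the native sites to delete plus the sites to create for one enzyme, with the grid of target positions anchored at the window start (the real design also searches for the best frame and weights forbidden positions).

```python
def sliding_windows(chrom_length, size=150_000, step=10_000):
    """Yield (start, end) for every full window along a chromosome."""
    for start in range(0, chrom_length - size + 1, step):
        yield start, start + size

def edit_cost(site_positions, start, end, interval):
    """Hypothetical cost of imposing one site every `interval` bp in [start, end).

    Counts native sites that are off the grid (to delete) plus grid
    positions lacking a native site (to create).
    """
    targets = set(range(start, end, interval))
    native = {p for p in site_positions if start <= p < end}
    return len(native - targets) + len(targets - native)

# Toy usage with made-up site coordinates on a 200 kb chromosome.
windows = list(sliding_windows(200_000))
best = min(windows, key=lambda w: edit_cost([0, 400, 950, 1200], *w, 400))
print(len(windows), best)
```

A lower cost means fewer base changes for that enzyme; in practice one cost per enzyme and interval would have to be combined into a single window score.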
In parallel, we defined the restriction pattern we wanted to generate:
• Regularly spaced intervals for 400, 1,500, 2,000 and 6,000 bp
• Gene promoter/terminator (substitutions within a coding sequence strongly
Overall, we selected the 10 "best" windows located at least 150 kb from either a centromere or a telomere. The quality score was weighted by the presence of "forbidden positions" within the window, for instance when a start codon overlaps a restriction site to be deleted. Finally, the genome windows presenting the best quality scores were manually curated to fix potential conflicts (such as 2 RSs overlapping the same bases, or the accidental re-creation of an RS of one enzyme when processing a second one).
We chose the final window based also on our research interests, i.e. it had to contain at least two early-replicating origins (Siow et al, 2012; Raghuraman et al, 2001) and several hotspots of meiotic DNA double-strand breaks (Pan et al, 2011). We also attempted to avoid too many retrotransposable elements or other DNA repeats. The final window was positioned on chromosome IV::700,000-850,000, with restriction patterns as follows: DpnII ↔ 400 bp window; XbaI ↔ 1,500 bp window; HindIII ↔ 2,000 bp window; NdeI ↔ 6,000 bp window; HhaI ↔ promoter/terminator (see summary on Fig S2). Overall, 1037 mutations were introduced, corresponding to 0.7% divergence, the vast majority being the modifications necessary to reorganize the DpnII RSs (Table S1). PCRTags (Dymond et al, 2011) specific to either the native or the synthetic sequence were also introduced within the window. Performing PCR using these primers allows testing for the presence of the synthetic sequence and the absence of the native sequence. PCRTags were manually curated to adapt them to the restriction design, and overall 59 PCRTags out of 154 needed to be modified accordingly.
Overall, a total of 3229 bp were modified (2% of the 150,000 bp window). 743 codons were modified, but no change was introduced in the sequences of the corresponding proteins.
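How a restriction site can be removed from a coding sequence without changing the protein can be sketched as below. This is a hedged illustration rather than the algorithm used for the design: `remove_site_synonymously` is a hypothetical helper that handles an in-frame coding sequence, tries single-codon synonymous swaps over the first occurrence of the site, uses the standard genetic code, and ignores codon usage as well as the accidental creation of sites for other enzymes.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate an in-frame coding sequence (incomplete last codon ignored)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

def remove_site_synonymously(cds, site):
    """Return a variant of `cds` free of `site` encoding the same protein.

    Returns `cds` unchanged if the site is absent, or None if no single
    synonymous codon swap removes every occurrence of the site.
    """
    pos = cds.find(site)
    if pos == -1:
        return cds
    first, last = pos // 3, (pos + len(site) - 1) // 3
    for codon_index in range(first, last + 1):
        start = codon_index * 3
        original = cds[start:start + 3]
        if len(original) < 3:
            continue  # incomplete trailing codon
        for alt, amino_acid in CODON_TABLE.items():
            if alt != original and amino_acid == CODON_TABLE[original]:
                candidate = cds[:start] + alt + cds[start + 3:]
                if site not in candidate:
                    return candidate
    return None

cds = "ATGAAGCTTTAA"  # toy CDS containing a HindIII site (AAGCTT)
fixed = remove_site_synonymously(cds, "AAGCTT")
print(fixed, translate(cds) == translate(fixed))
```

Reintroducing a site at a chosen position would follow the mirror logic: search the codons overlapping the target position for synonymous alternatives that create the recognition sequence.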
Although we took great care in the design of the sequence and algorithm, our ongoing experiments nevertheless suggest slight modifications to the design principles that have the potential to facilitate both the design and the analysis of the experiment. First, windows of 400 and 1,000 bp are probably sufficient to assay the structure at a high resolution. Second, during curation one could manually remove the extremely small RFs that are generated throughout the process and nevertheless pass all the quality filters (such as the tiny RFs still visible in the DpnII restriction pattern of the Syn-3C region in Fig 1C). Third, SNPs have to be introduced in repeats or low-complexity DNA to facilitate mapping of the reads. Fourth, extra silent mutations can be introduced to ensure that the WT and Syn-3C sequences are sufficiently divergent to be analyzed using cheaper short-read technologies (such as paired-end 50 bp Illumina sequencing). Finally, one possibility for finding the best frame for each interval/pattern is to start with the positions of RSs overlapping forbidden sites as seeds.

Figure S1. Diagram of the workflow. Only silent mutations were introduced within coding regions.
6 583 x 150 kb windows with 10 kb overlaps were generated over the entire genome, excluding telomeres and 75 kb from each side of centromeres.
7 Here, 400 bp, 1,500 bp, 2,000 bp and 6,000 bp.
8 For each 150 kb window and each interval the following steps were performed:

Assembly of the redesigned chromosome.
The redesigned sequence was split into 52 fragments of ~3,000 bp (i.e., blocks), with 200 bp overlaps between them. In addition, sequences corresponding to either the URA3 or the LEU2 auxotrophic marker gene were added to blocks 20, 37, 52 (URA3) and blocks 11, 28, 47 (LEU2), followed by 200 bp sequences of the WT neighboring chromosomal region. The replacement of the native sequence of strain BY4742 with the redesigned blocks was performed through a succession of six transformations, up to 11 blocks at a time.
After each transformation, independent colonies were sampled and PCRs performed at the PCR tags positions to identify the transformants that have replaced all of the native sequence with the redesigned one (Fig S3). Upon the last transformation, the selected transformant genome was sequenced and the region 707,556-852,114 (144,558 bp) was found to be replaced by the synthetic blocks.
In parallel, growth assays were performed to test whether the transformants exhibited small losses in fitness. Little to no growth defect could be identified when blocks 1 to 47 replaced the native sequence. Interestingly, the last transformation, using blocks 48 to 52, repeatedly led to the recovery of transformants exhibiting a slow-growth, petite phenotype (Slonimski, 1949), reflecting a block in the aerobic respiratory chain pathway and a decrease in ATP. Since the region concerned by the 6th transformation only involved a few kb, we decided to carry out the analysis on the 145 kb already successfully reassembled. We also observed that crossing with a WT strain gave diploids without growth defects.
Sporulation of these diploids gave offspring with growth rates also similar to WT, suggesting stable complementation of mitochondrial genomes. Each curve is computed out of 8 independent cultures.

RNA Isolation from Yeast for RNA Sequencing
Total RNA isolation and analysis of the BY4742 and Syn3C strains were performed on three biological replicates. Single yeast colonies were grown overnight in a 2 mL YPD culture at 30°C. The next morning, 10 mL YPD cultures were started at 10^6 cells/mL and grown until they reached 2 × 10^7 cells/mL. The cells were pelleted by spinning at 5000 rpm at 4°C for 5 min. The pellet was resuspended in 0.5 mL of Tris-HCl (10 mM, pH 7.5) and transferred to a microfuge tube. The cells were pelleted again by spinning briefly and the supernatant was discarded. The cells were resuspended in 400 µL of RNA TES buffer (10 mM Tris-HCl pH 7.5, 10 mM EDTA, 0.5% SDS). 400 µL of acid phenol/chloroform was then added and the cells were vortexed for 1 min, then heated at 65°C for 30 min with brief vortexing from time to time. The cells were placed on ice for 5 min and centrifuged at 13000 rpm at 4°C for 5 min. The aqueous layer (~400 µL) was then transferred into another microfuge tube; an equal volume of acid phenol/chloroform was added a second time, mixed well and centrifuged at 13000 rpm at 4°C for 5 min. The RNA (~400 µL) was precipitated by adding 40 µL of sodium acetate (3 M) and 1.1 mL of absolute ethanol and incubating the tube at −80°C for at least 30 min. The RNA was pelleted by centrifuging at 13000 rpm at 4°C for 20 min. The RNA pellet was then washed with 500 µL of 70% ethanol, air-dried and resuspended in 50 µL of water. 15 µg were treated with 2 U of TURBO DNase (Invitrogen) and cleaned up by phenol extraction and ethanol precipitation before being prepared for sequencing.

RNA-Seq Analysis of Syn3C
Single-end, non-strand-specific RNA-seq of the Syn3C and BY4742 strains was performed using Illumina NextSeq and standard TruSeq preparation kits, after depletion of ribosomal RNA.
Reads were mapped using Bowtie2 to the reference S. cerevisiae BY4742 and Syn3C genomes. For each gene, reads were counted if their mapping quality was higher than 30 and analyzed for differential expression using DESeq2 with standard parameters.

Hi-C experiments and contact maps generation
S. cerevisiae G1 daughter cells of the redesigned strain were recovered from an exponentially growing population through an elutriation procedure.
Hi-C libraries were generated as described (Cournac et al, 2015;

Processing of the reads and contact map generation
The raw data from each 3C experiment were processed as follows: first, PCR duplicates were collapsed using the 6 Ns present on each of the custom-made adapters and trimmed. Reads were then aligned using Bowtie 2 in its most sensitive mode against the S. cerevisiae reference genome (native genome) or against the S. cerevisiae reference adapted for the Syn-3C region on chromosome 4 (SynIV-3C genome). An iterative alignment procedure was used: for each read, the length of the mapped sequence was increased gradually from 20 bp until the mapping became unambiguous (mapping quality > 30). Paired reads were aligned independently and each mapped read was assigned to a restriction fragment. Religation events were filtered out using the information on the orientation of the sequences, as described previously. The distribution of the reads along the synthetic region and its native counterpart is represented in Fig 2. Contact matrices were built for the wild type and the mutant by binning the aligned reads into units of single restriction fragments. DpnII and HindIII contact maps for the SynIV-3C region and its native counterpart were randomly resampled in order to present the same number of contacts. The raw contact maps were subsequently binned into units (i.e. bins) of 600, 1,200, 2,400, 4,800 and 9,600 base pairs. Contact maps were generated using the levelplot function of the R lattice package. Outliers were removed from the matrices if their number of contacts exceeded 20 times the top 5‰ threshold of the number of contacts between restriction fragment pairs.
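The resampling and binning steps can be sketched in a few lines of Python. This is an illustrative simplification, not the pipeline used here: the function names are hypothetical, read pairs are assumed to be given as base-pair coordinates inside the region, and they are binned directly by position rather than first being assigned to restriction fragments.

```python
import random

def downsample(pairs, n_pairs, seed=0):
    """Randomly keep `n_pairs` contacts so that two maps have equal depth."""
    return random.Random(seed).sample(pairs, n_pairs)

def bin_contacts(pairs, region_start, region_end, bin_size):
    """Build a symmetric contact matrix from (pos_a, pos_b) pairs in bp."""
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i = (pos_a - region_start) // bin_size
        j = (pos_b - region_start) // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix

# Toy usage with made-up coordinates in a 2,400 bp region.
pairs = [(100, 1300), (700, 800), (650, 2000)]
for row in bin_contacts(pairs, 0, 2400, 600):
    print(row)
```

Running the same function with bin sizes of 600, 1,200, 2,400, 4,800 and 9,600 bp gives the series of maps compared between the synthetic and native regions.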
The accession number for the sequences reported in this paper is BioProject: XXX. http://www.ncbi.nlm.nih.gov/bioproject/XXX

Statistical analysis
Construction of contact histograms. Cumulative histograms were generated from contact maps for the different bin sizes (Fig S4). For small bin sizes (s = 600 bp), the distribution of contacts of the redesigned region appeared systematically narrower than for the native region, with most bins being "visible", i.e. containing at least one read. The gain in resolution fades away for the frequent cutter when the bin size increases but, interestingly, the visibility of the bins nevertheless remains systematically better. For HindIII, the resolution is always considerably better in the redesigned than in the native sequence.
Quality improvement. The CV is defined as the ratio between the standard deviation and the mean of the contact histograms at fixed distance s; to take into account the finite-size effect, along chromosome 4. The quality improvement was assessed by computing the logarithm of the ratio of the CVs of the SynIV-3C and native regions (Fig S5).
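The CV and the log ratio can be computed from two binned contact matrices as in the sketch below. Several choices are assumptions not stated in the text: the population standard deviation is used, the logarithm is natural, the distance s is expressed as an offset in bins, and no finite-size correction is applied. The function names are hypothetical.

```python
import math
import statistics

def contacts_at_distance(matrix, offset):
    """Contact counts between all bin pairs separated by `offset` bins."""
    return [matrix[i][i + offset] for i in range(len(matrix) - offset)]

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD assumed)."""
    return statistics.pstdev(values) / statistics.fmean(values)

def log_cv_ratio(synthetic, native, offset):
    """log(CV_synthetic / CV_native); negative values favor the synthetic region."""
    cv_syn = coefficient_of_variation(contacts_at_distance(synthetic, offset))
    cv_nat = coefficient_of_variation(contacts_at_distance(native, offset))
    return math.log(cv_syn / cv_nat)

# Toy matrices: homogeneous counts (synthetic) vs. heterogeneous counts (native).
syn = [[0, 9, 0, 0], [9, 0, 10, 0], [0, 10, 0, 11], [0, 0, 11, 0]]
nat = [[0, 0, 0, 0], [0, 0, 10, 0], [0, 10, 0, 20], [0, 0, 20, 0]]
print(log_cv_ratio(syn, nat, 1))
```

With both maps resampled to the same number of contacts, a negative value at a given s indicates a narrower distribution of contacts, hence a better signal-to-noise ratio, in the redesigned region.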