A novel approach for metabolic pathway optimization: Oligo-linker mediated assembly (OLMA) method

Imbalances in gene expression within a metabolic pathway can result in lower yields of the desired products. Several targets have been intensively investigated to balance gene expression, such as the promoter, the ribosome binding site (RBS), the order of genes and the species of origin of the enzymes. However, the capability to manipulate multiple targets simultaneously still needs to be explored. We report a new DNA assembly method, named the oligo-linker mediated assembly (OLMA) method, that can vary all of the above types of regulatory targets simultaneously and can incorporate up to 8 targets in a single assembly step. Two experimental cases were used to demonstrate the capability of the method: (1) assembly of multiple pieces of a lacZ expression cassette; (2) optimization of four enzymes in the lycopene biosynthetic pathway. Our results indicated that the OLMA method not only exploited a larger combinatorial space, but also reduced the number of inefficient mutants. The unique feature of the OLMA method is the inclusion of a library of chemically synthesized double-stranded DNA oligos, which can be designed as promoters and RBSs, or designed with different overhangs to bridge the genes in different orders. The inclusion of these oligos makes the OLMA assembly reaction PCR-free and zipcode-free.

a co-transcribed operon, and the first gene in the operon is expressed at a much higher level than the last one [13]. Thus, the order of genes in an operon can be modulated as a third level to balance enzyme expression [14]. Moreover, the same enzyme from different organisms can differ in solubility, stability, kinetic properties and substrate specificity; thus, the source of the enzyme can be optimized as a fourth level by choosing coding sequences from various species [15].
Based on their length, the above targets can be classified into two groups: short-targets (<50 bp) and long-targets (>500 bp). The short-targets include promoters and RBSs, while the long-targets include the replication origin of the vector and the coding sequences of the enzymes. When constructing combinatorial libraries, the short-targets can easily be designed into chemically synthesized DNA oligos, but the long-targets must be cloned into vectors or amplified by PCR. For optimizing a metabolic pathway, most previous work focused on varying one of the above targets, and few studies have attempted to manipulate more than one target simultaneously [16].
Recently, advanced DNA assembly methods have been harnessed to construct combinatorial libraries for optimizing metabolic pathways. These approaches include the Gibson assembly method [17], Golden Gate assembly [18], Serine integrase recombination assembly (SIRA) [19], the Cross-Lapping in Vitro Assembly (CLIVA) method [20], the single strand assembly (SSA) method [6], Paperclip [21], VEGAS [22], YeastFab [23], Randomized BioBrick Assembly [24] and others. Most of these approaches can modulate one or more of the above targets by introducing the short-targets in PCR primers and acquiring the long-targets by PCR amplification, with Gibson assembly or homologous recombination used for the final assembly. However, PCR amplification of large DNA fragments (>2 kb) can introduce undesired mutations into the DNA sequence of the pathway [7,25]. In these methods, short-targets are fused to long-targets, so PCR amplification is always needed for a different assembly order or a different short-target, such as an RBS of different strength. As a PCR-free method, Golden Gate assembly not only requires an additional sub-cloning step for all the DNA fragments, but also introduces zipcodes that connect the fragments in a pre-defined order. To change the order of the DNA fragments, one must repeat the laborious sub-cloning process. Therefore, a PCR-free and zipcode-free DNA assembly method is still needed to modulate multiple targets for pathway optimization.
Here, we report a PCR-free and zipcode-free DNA assembly method, named the oligo-linker mediated assembly (OLMA) method, which can simultaneously incorporate multiple targets, both short-targets (promoter and RBS) and long-targets (coding sequences and order of genes), to generate an efficient combinatorial library. The libraries of short-targets were designed as chemically synthesized double-stranded oligos, while the variants of long-targets were released from a standard vector. A unique feature of the method is the use of double-stranded DNA oligos as both linkers and zipcodes; this separation of short-targets and long-targets avoids multiple rounds of PCR amplification. To change the order of genes in an operon, one only needs to synthesize a new set of double-stranded DNA oligos whose overhanging ends serve as new zipcodes. Two experimental cases were chosen to evaluate the efficiency and reliability of the OLMA method. The first case is the assembly of multiple fragments of a lacZ expression cassette, and the second is the optimization of a lycopene synthesis pathway by balancing the expression of four enzymes. Our results indicated that the OLMA method not only can effectively and reliably exploit a much larger combinatorial space, but also reduces the number of inefficient mutants.

Results and discussion
Design and validation of the oligo-linker mediated assembly (OLMA) method
By introducing double-stranded oligo-linkers, we developed the oligo-linker mediated DNA assembly (OLMA) method based on the Golden Gate cloning strategy [26] (Figure 1). The unique feature of the OLMA method is the use of double-stranded bridging oligos (<50 bp), which can join any existing modular DNA parts in a pre-defined order. Here, the double-stranded oligos can be designed either as native sequences of the modular parts or as additional regulatory elements (such as RBSs or promoters). To test the method, we chose a lacZ reporter expression cassette as a case study and evaluated the efficiency for different numbers of pieces (i.e. 1, 3, 4 and 5 pieces). The split lacZ fragments, named lacZ1, lacZ3.1, lacZ3.2, lacZ3.3, and so on, were first cloned into a standard vector by Gibson assembly and confirmed by sequencing; no further sequencing was needed in the subsequent assembly steps. The double-stranded oligonucleotides (ds-oligos) function as linkers that bridge the lacZ pieces; the sequence of the lacZ module is shown in Additional file 2. All the oligos were obtained by annealing two complementary single-stranded oligonucleotides and were then phosphorylated to facilitate the subsequent ligation, as described in the "Methods" section. After the ligation products were transformed into competent cells, 10 to 10,000 colonies were obtained per plate (Table 1 and Additional file 1: Fig. S1). The ratio of correctly ligated lacZ cassettes decreased from 99.9 % to 43 % as the number of fragments increased from one to four, and was only 10 % for five lacZ fragments. These results indicate that both the colony number and the positive ratio decreased dramatically when the number of pieces reached five. Thus, further optimization is needed to improve the efficiency of the OLMA method for assemblies of more than four fragments.
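The overhang-matching logic that lets ds-oligo linkers fix the order of parts can be illustrated with a small Python sketch. It is not the authors' software: it assumes each part (a vector-released fragment or a ds-oligo linker) is described only by the top-strand sequences of its two 4-nt overhangs, and that part A ligates to part B when A's right overhang equals B's left overhang. The part names reuse the lacZ labels above, but the overhang sequences are hypothetical. Two common Golden Gate design checks (no palindromic overhangs, no overhang reused at two junctions) are included as assumptions.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Part:
    name: str
    left: str   # top-strand sequence of the left 4-nt overhang
    right: str  # top-strand sequence of the right 4-nt overhang


def check_overhangs(parts: list[Part]) -> None:
    """Reject palindromic overhangs and left overhangs shared by two parts."""
    for part in parts:
        for oh in (part.left, part.right):
            if oh == revcomp(oh):
                raise ValueError(f"{part.name}: palindromic overhang {oh}")
    lefts = [p.left for p in parts]
    if len(set(lefts)) != len(lefts):
        raise ValueError("two parts share a left overhang; order is ambiguous")


def assemble(parts: list[Part], start: str, end: str) -> list[str]:
    """Chain parts from the vector's start overhang to its end overhang."""
    check_overhangs(parts)
    by_left = {p.left: p for p in parts}
    order: list[str] = []
    current = start
    while current != end:
        part = by_left.get(current)
        if part is None or part.name in order:
            raise ValueError(f"no unused part continues from overhang {current}")
        order.append(part.name)
        current = part.right
    if len(order) != len(parts):
        raise ValueError("some parts were not incorporated")
    return order


if __name__ == "__main__":
    # Hypothetical assembly of two lacZ pieces bridged by two ds-oligo linkers,
    # listed out of order to show that the overhangs alone fix the order.
    parts = [
        Part("lacZ3.2", "ACTA", "CGCT"),
        Part("linker1", "GGAG", "AATG"),
        Part("linker2", "TTCG", "ACTA"),
        Part("lacZ3.1", "AATG", "TTCG"),
    ]
    print(assemble(parts, start="GGAG", end="CGCT"))
    # prints ['linker1', 'lacZ3.1', 'linker2', 'lacZ3.2']
```

In this model, changing the order only requires linkers with different overhangs; the vector-released parts are left untouched.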
Optimization of the lycopene synthetic pathway by the OLMA method
As mentioned above, the unique feature of OLMA is that the short promoter or RBS libraries are included as chemically synthesized double-stranded DNA, while the long coding sequences and the replication origin are released from a standard vector rather than amplified by PCR. To demonstrate the advantage of the OLMA method, a four-gene lycopene biosynthetic pathway was optimized in E. coli (Fig. 2). Lycopene has a variety of biological functions and is widely used in the pharmaceutical, food and cosmetic industries. Lycopene can be produced by heterologously expressing three genes (crtEBI), but the expression of the idi gene in E. coli usually needs to be strengthened to balance the lycopene precursors IPP and DMAPP. Here, we demonstrate how to simultaneously vary multiple targets with the OLMA method to balance the expression of the four enzymes. These targets include four RBSs, three coding sequences and the order of the genes (Fig. 2). For the coding-sequence targets, four native variants of each of the crtE, crtB and crtI genes were chosen from Pantoea ananatis (Pan), Pantoea agglomerans (Pag), Pantoea vagans (Pva) and Rhodobacter sphaeroides (Rsp). For the RBS targets of the crtE, crtB, crtI and idi genes, a small set of rationally designed RBSs, rather than a large random RBS library, was designed with the RBS Calculator to cover a wide range of theoretical strengths (~100-10000 units) (Additional file 1: Table S1). Additionally, the order of crtE, crtB and crtI can be swapped as a target by introducing appropriate synthesized double-stranded oligos with different overhangs. We fixed an additional copy of the idi gene as the last gene in the crtE-crtI-crtB-idi operon and varied only its RBS to tune its expression. In contrast, for the other three genes (crtEBI) we varied not only the RBS strength, but also the coding sequence and the gene order (Fig. 2).
As a first step, we individually varied the RBS strength, coding sequence and gene order targets to optimize the lycopene pathway (Fig. 2). When varying RBS strength, the crtE, crtB and crtI genes from P. ananatis were assembled in the crtE-I-B order with 10-20 rationally designed RBSs covering a wide range of theoretical strengths (Additional file 1: Table S1). Ninety red colonies were randomly chosen and their lycopene production was measured as described in the "Methods" section. The tested clones exhibited a wide range of lycopene yields, from 1.15 to 11.24 mg/g DCW in liquid culture, with a significant coverage (c-factor = 36.6) (Fig. 3a). Here, the c-factor was defined as the dynamic range of lycopene yields multiplied by their variation, and describes the degree to which the samples cover the production landscape. Notably, the c-factor is independent of the number of measured samples. When varying the coding sequences of the crtE-crtI-crtB genes, we collected native variants of the corresponding enzymes from Pantoea ananatis (Pan), Pantoea agglomerans (Pag), Pantoea vagans (Pva) and Rhodobacter sphaeroides (Rsp), generating 48 variants with the same RBSs and the crtEIB order. The lycopene yields of these variants spanned from 2.06 to 7.06 mg/g DCW, with a small coverage (c-factor = 7.8) (Fig. 3b). Finally, when varying the order of crtE-crtI-crtB in the operon, a series of RBSs with the same strength but different overhangs was used; for example, RBSs E4, E-B13, B-I15 and I-id16 in Table 2 were used for the order crtE-B-I, and RBSs B13, B-I15, I-E4 and E-id16 were used for the order crtB-I-E. All six variants (i.e. crtEIB, crtEBI, crtIBE, crtIEB, crtBEI and crtBIE) were measured and produced lycopene at levels ranging from 0.17 to 6.2 mg/g DCW (Fig. 3c); crtEBI was the best-producing order among the six variants. The crtIEB order produced only 0.17 mg/g DCW lycopene, suggesting that this gene order may cause a severe imbalance in the pathway. Taken together, each single type of target was confirmed to be effective in varying gene expression and affecting the lycopene yield with the OLMA method. Meanwhile, the rationally designed RBS library dramatically reduced the number of inefficient variants while generating the best dynamic range and coverage among the three types of targets.
Fig. 3 (caption fragment): The gene order library was constructed using the OLMA method; the PanE, PanB and PanI genes were assembled in all six orders. The dynamic range and c-factor of each library are shown in the figure panels.
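The linker names quoted for the gene-order library follow a regular pattern: the first linker carries the RBS of the first gene, each later linker is named upstream-downstream plus the downstream gene's RBS number, and the last one bridges to the fixed idi gene. The Python sketch below reproduces that pattern for all six orders. The RBS numbers (E 4, B 13, I 15, idi 16) are read off the two examples in the text; treating them as fixed for every order, and the names `RBS_ID`, `ABBREV` and `linker_labels`, are our own illustrative assumptions rather than the contents of Table 2.

```python
from itertools import permutations

# RBS number per gene, read off the two examples in the text (hypothetical table)
RBS_ID = {"E": 4, "B": 13, "I": 15, "idi": 16}
ABBREV = {"E": "E", "B": "B", "I": "I", "idi": "id"}


def linker_labels(order: tuple[str, ...], rbs_id: dict[str, int] = RBS_ID) -> list[str]:
    """Name the ds-oligo linkers for crt genes in `order`, with idi fixed last.

    The first linker carries only the RBS of the first gene; every later linker
    bridges upstream -> downstream and carries the downstream gene's RBS.
    """
    genes = list(order) + ["idi"]
    labels = [f"{ABBREV[genes[0]]}{rbs_id[genes[0]]}"]
    for up, down in zip(genes, genes[1:]):
        labels.append(f"{ABBREV[up]}-{ABBREV[down]}{rbs_id[down]}")
    return labels


if __name__ == "__main__":
    all_linkers = set()
    for order in permutations(("E", "I", "B")):
        labels = linker_labels(order)
        all_linkers.update(labels)
        print("crt" + "".join(order), labels)
    print(len(all_linkers), "distinct linkers for all six orders")  # 12
```

Under these assumptions, all six orders need only 12 distinct ds-oligo linkers (3 leading, 6 internal, 3 trailing), while the donor vectors carrying the coding sequences are reused unchanged.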
To further increase the dynamic range and coverage of the combinatorial library, more than one type of target needed to be manipulated. Fortunately, the OLMA method can incorporate multiple types of targets in a single assembly step. First, we combined two types of targets (i.e. RBSs with gene order, and RBSs with coding sequences) into a combinatorial library. When combining the 4 RBS targets with the gene-order target, the dynamic range and coverage of lycopene yield increased to 0.221 2.06 mg/g DCW and c-factor = 17.8, respectively (Fig. 4a). When combining the 4 RBS targets with the 3 coding-sequence targets, the dynamic range and coverage of lycopene yield increased to 0.32~13.86 mg/g DCW and c-factor = 79.3, respectively (Fig. 4b). More ambitiously, the 4 RBS targets, 3 coding-sequence targets and the gene-order target were combined into a single assembly step by the OLMA method to explore an even larger dynamic range and coverage of the lycopene yield landscape. Although the number of possible combinations in the library is 3.8016 × 10⁶, only 1080 colonies were randomly chosen and measured for lycopene yield. The dynamic range and coverage of lycopene production reached 0.14~15.17 mg/g DCW and c-factor = 83.1, respectively (Fig. 4c). The more targets were used, the higher the dynamic range and coverage, which may be attributed to the additive effect of each target. Of the 1080 measured variants, the ten top lycopene producers were sequenced (Table 3 and Additional file 2). The sequencing results showed that these ten variants were highly diverse in their RBSs, gene orders and coding sequences, indicating that the lycopene yield landscape is rugged and has multiple peaks (i.e. local maxima) rather than a single one.
These results support the conclusion that the OLMA method can explore a larger combinatorial space and increase the probability of finding better flux-balanced variants of a multi-enzyme pathway.
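The scale of the three-target library relative to the screen can be checked with simple arithmetic. The Python sketch below assumes that targets vary independently, so the library size is the product of the number of variants per target. The example [4, 4, 4, 6] (four source organisms for each of crtE, crtB and crtI, times six gene orders) is illustrative only; the per-target RBS counts behind 3.8016 × 10⁶ are not given here and are not reconstructed.

```python
from math import prod


def library_size(variants_per_target: list[int]) -> int:
    """Combinations when every target varies independently of the others."""
    return prod(variants_per_target)


# Illustrative only: 4 source organisms for each of crtE, crtB, crtI x 6 gene orders
coding_and_order = library_size([4, 4, 4, 6])

# Figures quoted in the text for the three-target library
N_POSSIBLE = 3_801_600  # 3.8016 x 10^6 possible combinations
N_SCREENED = 1080       # colonies measured
fraction = N_SCREENED / N_POSSIBLE

if __name__ == "__main__":
    print(coding_and_order)                    # 384
    print(f"{fraction:.2e}")                   # 2.84e-04
    print(f"1 in {N_POSSIBLE // N_SCREENED}")  # 1 in 3520
```

With 3,801,600 possible combinations and 1080 measured colonies, roughly one combination in 3,520 was sampled, so the screen samples the yield landscape rather than exhausting it.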

Conclusion
As a DNA assembly approach, one advantage of the OLMA method is that its assembly process is PCR-free and scar-free. This is because the short fragments are introduced as chemically synthesized double-stranded DNA oligos and the long fragments are released from a standard vector. By separating the short fragments from the long fragments, the OLMA method also dramatically increases its flexibility and capacity to incorporate more targets. For instance, here we incorporated 8 manipulated targets (4 RBSs, 3 coding sequences and the gene order) in a single assembly step. The four RBSs were designed as short fragments, whereas the three coding-sequence targets were treated as long fragments and released from a standard vector. In the OLMA method, the ability to swap the gene order is entirely attributable to the flexible overhangs of the chemically synthesized double-stranded DNA oligos. More generally, the OLMA method can incorporate additional regulatory elements, such as promoters, terminators and replication origins. Therefore, we expect that the OLMA method can become a powerful tool for balancing enzyme expression in more complex biosynthetic pathways.

Methods
Strains and plasmids used in this study are listed in Table 4. Bacteria were routinely grown in LB medium, with antibiotics (tetracycline 10 μg/ml, kanamycin 50 μg/ml, streptomycin 50 μg/ml) added as required. Plasmid pHD was used to construct donor vectors. For the assembly of the lacZ cassette, the receptor vector pFUS was constructed with a ccdB operon. For the assembly of the lycopene pathway, the receptor vector pYC1k-ccdB-idi (Additional file 1: Fig. S2), derived from pYC1k (p15A origin, Tac promoter, KanR), was constructed with a ccdB operon and the idi gene from Escherichia coli.

Construction of donor vectors
The assembled lacZ cassette comprised the constitutive promoter pJ23001, the lacZ gene, the rrnB terminator (3.7 kb) and a backbone vector. To test the efficiency of three-, four- and five-fragment assemblies, we divided the lacZ gene into three, four and five parts. Fragments of the lacZ gene from E. coli MG1655 were amplified by PCR and cloned into the standard plasmid pHD using the Gibson assembly method [17]. Each fragment was flanked by two BsaI (type IIS) restriction sites, one on each side, generating different overhangs.
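Because each donor fragment must carry exactly two flanking BsaI sites and no internal ones, a quick in silico check is useful before cloning. The Python sketch below assumes a linear donor sequence with the forward site (GGTCTC) upstream of the insert and the reverse-complement site (GAGACC) downstream. BsaI cuts GGTCTC(1/5), i.e. 1 nt past the site on the top strand and 5 nt past it on the bottom strand, leaving a 4-nt 5' overhang. The example sequences and the function names are hypothetical.

```python
import re

SITE_FWD = "GGTCTC"
SITE_REV = "GAGACC"  # reverse complement of the BsaI site


def bsai_sites(seq: str) -> list[tuple[int, str]]:
    """(0-based start, strand) of every BsaI site in a linear sequence."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in re.finditer(f"(?={SITE_FWD})", seq)]
    hits += [(m.start(), "-") for m in re.finditer(f"(?={SITE_REV})", seq)]
    return sorted(hits)


def overhang(seq: str, start: int, strand: str) -> str:
    """Top-strand 4-nt overhang left by the site starting at `start`."""
    seq = seq.upper()
    if strand == "+":
        # Cuts 1 nt (top) and 5 nt (bottom) past GGTCTC; the 4 nt between are single-stranded
        return seq[start + 7:start + 11]
    # Mirror image for GAGACC: the overhang ends 1 nt before the site
    return seq[start - 5:start - 1]


def donor_overhangs(seq: str) -> tuple[str, str]:
    """Overhangs of the insert released from a donor; rejects extra sites."""
    sites = bsai_sites(seq)
    if [strand for _, strand in sites] != ["+", "-"]:
        raise ValueError(f"expected one '+' site then one '-' site, found {sites}")
    (fwd, _), (rev, _) = sites
    return overhang(seq, fwd, "+"), overhang(seq, rev, "-")


if __name__ == "__main__":
    donor = "GGTCTC" + "A" + "AATG" + "CCCCCCCCCC" + "TTCG" + "A" + "GAGACC"
    print(donor_overhangs(donor))  # ('AATG', 'TTCG')
    bad = donor.replace("CCCCCCCCCC", "CCGGTCTCCC")  # internal site in the insert
    try:
        donor_overhangs(bad)
    except ValueError as err:
        print("rejected:", err)
```

An extra internal site makes `donor_overhangs` raise; in the text, such internal sites in the genes are removed by silent mutation.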
PanE from Pantoea ananatis (Pan), Pantoea agglomerans (Pag), Pantoea vagans (Pva) and Rhodobacter sphaeroides (Rsp) were each amplified by PCR with the same junctions and cloned into the standard plasmid pHD using the Gibson assembly method [17]. The primers are listed in Table S2 in Additional file 1. All internal BsaI restriction sites in the genes were removed by silent mutation. PanB

Preparation of double-stranded oligonucleotides
Double-stranded oligonucleotides (ds-oligos) used for assembly were obtained by annealing two complementary single-stranded oligonucleotides (the forward strand and its reverse complement). Single-stranded oligonucleotides were synthesized by BGI Tech and dissolved in nuclease-free water to a concentration of 10 μM according to the manufacturer's instructions; the complementary oligos (final concentration 1 μM) were then annealed at 95°C for 5 min and cooled to 4°C at 0.1°C/s. The double-stranded oligonucleotides were diluted to 100 nM for phosphorylation in a 20 μl reaction containing 10 μl ddH2O, 6 μl double-stranded oligos, 2 μl 10 × T4 DNA ligase buffer (New England BioLabs) and 2 μl T4 Polynucleotide Kinase (10 U, New England BioLabs, M0201). The reaction was incubated at 37°C for 30 min. The single-stranded oligos for the lacZ cassette assembly are shown in Table 5. RBSs for fine-tuning the lycopene metabolic pathway were designed using the RBS Calculator (Salis et al.), and 10-20 RBS sequences covering a wide range of theoretical strengths (~100-1000) were selected as oligos (Additional file 1: Table S1). The oligos with the highest theoretical strength are shown in Table 2. These synthetic oligos are composed of the RBS core, the ATG start codon and 4-nt sticky ends at the 5' ends. Three [26,28]. In all, the assembly reaction needs only a few hours to complete.
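The concentrations in the phosphorylation step follow from C1V1 = C2V2, and the annealing ramp time follows from the stated cooling rate. The Python sketch below only restates the numbers given in the protocol; it assumes the ds-oligo is at 100 nM when added and that the 10 × buffer is diluted into the full 20 μl. The helper name `diluted` is our own.

```python
def diluted(c_stock: float, v_added: float, v_total: float) -> float:
    """Final concentration from C1*V1 = C2*V2 (same units in, same units out)."""
    return c_stock * v_added / v_total


# Phosphorylation mix from the protocol (20 uL total)
oligo_nM = diluted(100, 6, 20)   # 6 uL of 100 nM ds-oligo
buffer_x = diluted(10, 2, 20)    # 2 uL of 10x T4 DNA ligase buffer
anneal_fold = 1000 / 100         # 1 uM annealed duplex -> 100 nM working stock

# Annealing ramp: 95 C -> 4 C at 0.1 C/s
ramp_s = (95 - 4) / 0.1

if __name__ == "__main__":
    print(oligo_nM, buffer_x, anneal_fold)  # 30.0 1.0 10.0
    print(round(ramp_s), "s, about", round(ramp_s / 60, 1), "min")  # 910 s, about 15.2 min
```

Under these assumptions, each ds-oligo is present at about 30 nM in the kinase reaction, the ligase buffer is at 1 ×, and the cooling step takes roughly a quarter of an hour.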

Screening of positive clones
For the assembly of multiple pieces of the lacZ gene, the transformed cells were plated on LB agar plates supplemented with X-Gal for blue-white selection.