An efficient method for the construction of artificial, concatemeric DNA, RNA and proteins with genetically programmed functions, using a novel, vector-enzymatic DNA fragment amplification-expression technology

De novo designed bioactive molecules, such as DNA, RNA and peptides, are utilized in increasingly diverse scientific, industrial and biomedical applications. Concatemerization of designed DNA, RNA and peptides may improve their stability and bioactivity and allow for gradual release of the bioactive molecule at the intended destination. In this context, we developed a new method enabling the formation of DNA concatemers for the production of artificial, repetitive genes encoding concatemeric RNAs and proteins of any nucleotide and amino acid sequence. The technology recruits the Type IIS SapI restriction endonuclease (REase) for assembling DNA fragments in an ordered head-to-tail orientation. Alternatively, other commercially available SapI isoschizomers can be used: LguI and the thermostable BspQI. Four series of DNA vectors, dedicated to the expression of newly formed, concatemeric open reading frames (ORFs), were designed and constructed to meet the needs of the technology.
• Vector-enzymatic DNA fragment amplification technology.
• Construction of DNA concatemers many times longer than those available with the use of current de novo gene synthesis methods.
• Biosynthesis of protein tandem repeats with programmable function never seen in nature.


Introduction
Various biomaterials, including those based on peptides and polypeptides, are being increasingly developed and used in medicine, as biosensors or in biotechnology and material engineering, among others [1][2][3]. Some of these applications rely on specific peptide/polypeptide ligands and can be enhanced by increasing the local ligand concentration, e.g. at the molecular level, by generating joined ligands and/or the DNA or RNA that encodes them. Example benefits include higher ligand expression/biosynthesis, increased sensitivity (biosensors) and increased immunogenicity (new generation vaccines composed exclusively of epitopes). Some of these methods are based on the construction of plasmids carrying multiple joined genes. Strategies enabling an ordered, head-to-tail arrangement of monomeric peptides are preferred over a head-to-head or random arrangement. A head-to-tail arrangement of cloned multimers stabilizes the recombinant DNA plasmid containing concatemeric DNA. Previous methods used for the construction of concatemeric coding DNA suffered from either the inability or great difficulty in joining DNA segments while maintaining continuity of the formed open reading frame (ORF). This prevented the formation of the final desired product: concatemeric polypeptides containing multiple bioactive monomers of the same amino acid sequence with a pre-programmed function, without off-frame segments. This paper presents a novel genetic engineering method for the construction of concatemeric DNA, RNA and, finally, a protein containing up to 500 copies of bioactive peptides, which are joined in perfect fusion for in-frame translation of the artificial protein (Fig. 1). This system is based on specific amplification-expression DNA vectors containing a universal DNA fragment amplification module, the Type IIS REase SapI, and T4 DNA ligase [2][3][4][5].
The presented method has numerous potential medical and scientific applications and can also be used in bioprocessing. The technology described is or has been protected by the original Polish application and by PCT, EU, USA, Indian, Japanese, Israeli and Chinese patent applications (2014-15), followed by granted patents: Polish patent no. 228341 (2018) [4,5], EU (application no. EP 15738474.4) (2020), USA (application no. US2017/0095553 A1) (2020), Indian (application no. 201647039411) (2020) and Japanese (application no. 2017-507091) (2020).

Bacterial strains
Used for plasmid DNA cloning and purification: The strains were from Thermo Fisher Scientific (Waltham, MA, USA).

Software
The genetic maps of the DNA vectors and recombinant constructs were prepared using SnapGene software version 4.1 (http://www.snapgene.com).

DNA amplification-expression pAMP vectors for temperature-regulated concatemeric protein biosynthesis in E. coli cytoplasm
The pAMP vectors were designed on the basis of the p15A origin vector pACYC184 [1] and its derivative pRZ4737 (W. S. Reznikoff) [2,3] (GenBank: MK606505, MK606506, MK606507, MK606519, MK606520, MK651654). All pAMP DNA vectors contain: (i) a strong, temperature-regulated bacteriophage lambda pR transcription promoter, (ii) a bacteriophage lambda cI857ts repressor gene for host-independence of the vector, (iii) a DNA fragment amplification module, with two convergent SapI sites, separated with a SmaI site, for ordered, in-frame, head-to-tail amplification of DNA fragments, resulting in the assembly of an artificial, continuous, multimeric ORF and (iv) a chloramphenicol resistance gene. The amplifying modules from pAMP vectors are presented in Fig. 2. DNA sequences of the pAMP vectors are provided in Supplemental data.

DNA amplification-expression pET21AMP-HisA vector for IPTG-regulated concatemeric protein biosynthesis in E. coli cytoplasm
The pET21AMP-HisA vector was constructed using the pET-21d(+) expression vector (Novagen, EMD Millipore Corporation) as a template for plasmid modification. The pET21AMP-HisA DNA vector (GenBank MK606521) contains: (i) a strong, IPTG-regulated T7-lac transcription promoter, (ii) colE1 ori, (iii) a DNA fragment amplification module HisA, with two convergent SapI sites for in-frame, head-to-tail amplification of DNA fragments, resulting in assembly of an artificial, continuous, multimeric ORF and (iv) an ampicillin resistance gene (Fig. 3). The DNA sequence of the pET21AMP-HisA DNA vector is provided in Supplemental data.

Method details
The rapidly developing field of synthetic biology is generating insatiable demands for synthetic genes. Depending on their application, newly designed genes may contain repetitive DNA fragments, which significantly impair their chemical synthesis. The capability of generating DNA molecules of any sequence or size is important, especially for biomedical research. Here we present a significant improvement over an earlier strategy [7]. It enables the formation of artificial, continuous, multimeric ORFs and of concatemeric proteins of a desired length and monomer copy number, using four series of specialized amplification-expression DNA vectors equipped with a universal DNA amplification module. The module contains two convergent DNA recognition sequences of the Type IIS REase SapI, separated with a SmaI site for the insertion of any DNA fragment. It may be easily modified and introduced, as desired, into various DNA vectors containing alternative origins of replication, antibiotic resistance genes, transcriptional promoters and translation initiation signals.
The presented method has numerous potential applications, especially in the pharmaceutical industry and tissue engineering, including vaccines and drug delivery systems production, as well as mass-production of peptide-derived biomaterials. The technology enables easy and efficient construction of artificial, concatemeric genes, greatly exceeding current chemical gene synthesis capabilities.

General scheme for directional DNA fragment amplification and concatemers construction
Four types of DNA vectors were designed for the needs of the DNA fragment amplification methodology: (i) the pAMP series and the pET21AMP DNA vector for concatemeric protein biosynthesis in E. coli cytoplasm, (ii) the pET28AMP_SapI-Ubq vector for cytoplasmic biosynthesis of concatemeric fusion proteins with an N-terminal ubiquitin and (iii) the pET28AMP_PhoA or pET28AMP_MalE vectors for biosynthesis of concatemeric proteins secreted to the E. coli periplasm. The same general protocol applies to all the designed DNA vectors. Protocol stages are as follows: (i) design or selection of the DNA fragment (monomer) to be amplified; (ii) chemical synthesis of DNA, PCR amplification or excision of the monomer using restriction endonucleases; (iii) addition of asymmetric SapI 3-nt cohesive ends, preferably 5′-CCC-3′/5′-GGG-3′, at the 5′ and 3′ termini of the monomer. This can be achieved by introducing SapI recognition sequences during chemical synthesis of the monomer, during PCR amplification or with the use of the vector's built-in SapI sites after cloning the restriction fragment; (iv) purification of the DNA monomer equipped with SapI 3-nt cohesive ends; (v) self-ligation of DNA monomers in directional, head-to-tail orientation; (vi) ligation of a mixture of the formed concatemers or a selected gel-purified concatemer back into a SapI-cleaved amplification vector; (vii) selection of bacterial clones containing a concatemer with the desired number of monomers; (viii) direct biosynthesis of the protein encoded by the obtained DNA concatemer using the vector's strong promoters, or excision of the concatemer equipped with SapI cohesive ends and repetition of steps (iv)-(viii) until the desired number of monomer copies within a concatemer is obtained (Fig. 6).
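The role of the asymmetric 3-nt cohesive ends in stages (iii) and (v) can be illustrated with a minimal Python sketch. It assumes that every monomer carries a 5′-CCC-3′ overhang at its head and a 5′-GGG-3′ overhang at its tail, and that the 3-nt junction is counted as part of the repeat unit. The core sequence and all function names are hypothetical and serve for illustration only; they are not the sequences used in this work.

```python
# Sketch: why asymmetric 3-nt overhangs enforce head-to-tail ligation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

HEAD = "CCC"  # 5' overhang on the top strand (head of the monomer)
TAIL = "GGG"  # 5' overhang on the bottom strand (tail of the monomer)

def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs (each written 5'->3') base-pair only if one is
    the reverse complement of the other."""
    return overhang_a == revcomp(overhang_b)

def concatemer_top_strand(core: str, copies: int, head: str = HEAD) -> str:
    """Top strand of a head-to-tail concatemer. Each repeat unit is the
    head overhang followed by the double-stranded core of the monomer."""
    unit = head + core
    if len(unit) % 3 != 0:
        raise ValueError("repeat unit is not a multiple of 3 nt; the frame would shift")
    return unit * copies

# Hypothetical 24-nt core (assumption, for illustration only).
CORE = "ACCAAACCGACCGATGGTAACGGT"

head_to_tail = can_anneal(HEAD, TAIL)   # head of one monomer meets tail of the next
head_to_head = can_anneal(HEAD, HEAD)   # two heads meet
tail_to_tail = can_anneal(TAIL, TAIL)   # two tails meet
palindromic = can_anneal("AATT", "AATT")  # contrast: a self-complementary overhang

tetramer = concatemer_top_strand(CORE, 4)
```

In this model only the head-to-tail pairing is permitted, whereas a palindromic overhang such as AATT would pair with itself and allow random orientation. The length check reflects the requirement that each repeat unit, junction included, spans whole codons so that the ORF stays continuous.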
Experimental procedure for directional head-to-tail amplification of a short DNA fragment
1. Design a synthetic dsDNA fragment encoding a bioactive peptide to be concatemerized.

Note 2: The designed dsDNA should have at least three base pairs between each end and the adjacent SapI recognition sequence. We recommend adding GC clamps: the presence of G or C bases within the last three bases from the 3′ end of the oligonucleotides (GC clamp) helps promote specific binding at the 3′ end and improves SapI cleavage. Moreover, a GC clamp aids the specificity of priming and therefore contributes to the overall efficiency of the PCR reaction.
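The recommendations of this note can be checked in silico before ordering oligonucleotides. The following Python sketch builds primers with SapI-containing 5′ tails, tests them for a GC clamp and simulates SapI excision of the monomer. It relies on the known SapI specificity, GCTCTTC(1/4), i.e. cleavage 1 nt downstream of the site on the top strand and 4 nt downstream on the bottom strand. The 3-nt pad, the 1-nt spacer, the 18-nt annealing length, the core sequence and all function names are assumptions made for illustration.

```python
# Sketch: primer tails with SapI sites, GC clamp check, simulated SapI excision.
SAPI_SITE = "GCTCTTC"  # SapI cuts 1 nt (top) / 4 nt (bottom) downstream of this site
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def has_gc_clamp(oligo: str) -> bool:
    """True if at least one of the last three 3'-terminal bases is G or C."""
    return any(base in "GC" for base in oligo[-3:])

def sapi_tail(overhang: str, pad: str = "GCA", spacer: str = "A") -> str:
    """5' primer tail: pad (>= 3 nt), SapI site, 1-nt spacer, 3-nt overhang."""
    if len(pad) < 3 or len(spacer) != 1 or len(overhang) != 3:
        raise ValueError("need pad >= 3 nt, spacer = 1 nt, overhang = 3 nt")
    return pad + SAPI_SITE + spacer + overhang

def design_primers(core: str, anneal_len: int = 18):
    """Forward and reverse primers that add convergent SapI sites to the core."""
    fwd = sapi_tail("CCC") + core[:anneal_len]
    rev = sapi_tail("GGG") + revcomp(core)[:anneal_len]
    return fwd, rev

def amplicon_top(core: str) -> str:
    """Top strand of the PCR product obtained with the primers above."""
    return sapi_tail("CCC") + core + revcomp(sapi_tail("GGG"))

def sapi_excise(top: str):
    """Return (top, bottom) strands, both 5'->3', of the fragment released
    by two convergent SapI sites. Assumes no SapI site inside the insert."""
    i = top.find(SAPI_SITE)            # left site, read on the top strand
    j = top.rfind(revcomp(SAPI_SITE))  # right site, read on the bottom strand
    if i < 0 or j < 0 or j <= i:
        raise ValueError("two convergent SapI sites are required")
    top_frag = top[i + 8 : j - 4]               # top-strand cuts
    bottom_frag = revcomp(top[i + 11 : j - 1])  # bottom-strand cuts
    return top_frag, bottom_frag

# Hypothetical 24-nt core (assumption, for illustration only).
CORE = "ACCAAACCGACCGATGGTAACGGT"
fwd_primer, rev_primer = design_primers(CORE)
top_frag, bottom_frag = sapi_excise(amplicon_top(CORE))
```

Under these assumptions the excised monomer carries a 5′-CCC-3′ overhang on the top strand and a 5′-GGG-3′ overhang on the bottom strand, and both primers end in a GC clamp. A real design should additionally confirm that the monomer itself contains no internal SapI site.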
2. Optimize the peptide-coding DNA sequence for efficient gene expression in E. coli.
3. Chemically synthesize the designed DNA fragment de novo.
4. Prepare a suitable amount of the DNA monomer for concatemerization using:
(a) PCR with properly designed forward and reverse primers (e.g. containing ssDNA overhangs with SapI sites), followed by SapI cleavage and DNA purification.
(b) SapI excision of the cloned DNA monomer from the recombinant plasmid DNA and purification of the desired restriction fragment.
6. Subject the obtained recombinant constructs directly to expression of the artificial gene coding for a concatemeric protein, or alternatively perform another amplification cycle to obtain longer concatemeric genes.

Note 7: Alternative amplification.
In order to further boost DNA synthesis capability, one can design and try to chemically synthesize an artificial gene, encoding from several to several dozen copies of the selected peptide (possibilities for chemical DNA synthesis of a given repetitive DNA sequence are the only limit in this case). Such a synthetic, repetitive gene can be cloned into the selected, SapI-linearized and dephosphorylated amplification-expression vector and used as a 'monomer' for further amplification as described in Note 3 .
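Because each amplification cycle uses the product of the previous one as its 'monomer', copy numbers multiply from cycle to cycle. The following Python sketch shows this arithmetic. It assumes that every repeat unit, including its 3-nt junction, has a fixed length that is a multiple of 3 nt. The function names and the numbers in the example are hypothetical and are not results reported here.

```python
# Sketch: copy number and ORF length after successive amplification cycles.
def copies_after_cycles(start_copies: int, multimers_per_cycle: list) -> int:
    """Total monomer copies when a block of `start_copies` monomers is
    concatemerized into an n-mer in each successive cycle."""
    total = start_copies
    for n in multimers_per_cycle:
        if n < 1:
            raise ValueError("each cycle must join at least one block")
        total *= n
    return total

def orf_length_nt(unit_nt: int, copies: int) -> int:
    """Length of the repetitive coding region in nucleotides."""
    if unit_nt % 3 != 0:
        raise ValueError("repeat unit must be a multiple of 3 nt to stay in frame")
    return unit_nt * copies

# Hypothetical example: a 27-nt unit joined into a 20-mer, then that 20-mer
# joined into a 25-mer in a second cycle.
two_cycle_copies = copies_after_cycles(1, [20, 25])
two_cycle_length = orf_length_nt(27, two_cycle_copies)

# Hypothetical example for a synthetic 10-copy gene used as the 'monomer'
# and joined into a 12-mer in a single cycle.
synthetic_start_copies = copies_after_cycles(10, [12])
```

Starting from a chemically synthesized multi-copy gene therefore lowers the number of ligation events needed to reach a given copy number, provided the repetitive sequence can be synthesized at all.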

Experimental procedure for directional head-to-tail amplification of a long DNA fragment
For directional, head-to-tail ligation of longer DNA fragments, a slight modification is required: PEG4000 is added to the ligation buffer to decrease circularization of the obtained ligation products.

Method validation
Validation of the presented method is described by Skowron et al. [2] and the corresponding Data in Brief article [3]. The technology has been validated so far by amplification of four DNA fragments, encoding the following peptides: (i) TKPTDGNGP (MSEC_2019_1496), (ii) TSRGDHELLGGGAAPVGG (MSEC_2019_1496; patent application P.427146), (iii) RGD and RGDGG (patent application P.425131) and (iv) RLIDRTNANFLGGGAAPVGGG (patent application P.427146). We obtained up to 500 copies of our test model (the TKPTDGNGP peptide) using two rounds of amplification [2]. One should note, however, that the maximum obtainable monomer copy number within a concatemer may be lower than for this test model [2] and strongly depends on the DNA sequence and the size of the DNA fragment to be concatemerized. Moreover, some of the constructed concatemeric genes may not be efficiently expressed in E. coli. This may require the development of new strategies: improving concatemeric gene transcription and mRNA stability, reducing toxicity or increasing solubility of the resulting proteins, as well as using alternative prokaryotic or eukaryotic expression systems. Examples of such strategies are: (i) N-terminal fusion with ubiquitin, described herein, or (ii) secretion of concatemeric proteins into the E. coli periplasm [2]. It is worth mentioning that the DNA amplification-expression modules presented herein (Figs. 2-5) can be easily transferred to other DNA vectors, if necessary.

1. A protocol for a new DNA fragment amplification-expression technology, devised for the production of artificial genes encoding concatemeric RNAs and proteins with a pre-programmed nucleotide and amino acid sequence, is provided. The technology directs the formation of ordered polymers or co-polymers containing 500 or more copies of repeated monomeric units of DNA, RNA or peptides within a concatemer.
2. The ordered polymerisation of a DNA fragment encoding a peptide with a pre-programmed biological or chemical function can improve the desirable function of the resulting artificial polypeptide/protein. Such concatemeric proteins may serve in a number of constructs, exemplified by: (i) enhanced antigens as a new generation of vaccines; (ii) concatemeric proteins containing modules binding rare and/or toxic metal ions for their industrial recovery, environmental remediation or the removal of toxins from the body; (iii) novel reservoirs for enzyme cofactors that can modulate a particular enzymatic activity; (iv) reservoirs for peptide hormones; (v) protective, therapeutic concatemeric proteins containing peptide activators or inhibitors for tissue regeneration or treatment of molecular, microbial and viral diseases; (vi) reservoirs for polymerised micro RNA and antisense nucleic acids against genetic, molecular, microbial and viral diseases.

Declaration of Competing Interest
The authors declare that there is no conflict of interest regarding the publication of this article.