A Python script to design primers for overlap extension PCR to ligate two DNA fragments

Ligating two or more DNA fragments is a regular operation for the subcloning and the engineering of vectors. The overlap extension PCR serves as a straightforward method to solve this issue. However, it takes a relatively long time to design the appropriate overlapping primers and the primers for the full-length sequence, and there has not been a professional offline software for such kind of primer design. Here, we propose a Python script to search, calculate and sort thousands of combinations of primers for users according to the predefined parameters. The results of script running and experimental validation show that this script is capable of generating the optimal pairs of primers based on the proper melting temperatures and lengths of the primers, which facilitates gene modification in research.


INTRODUCTION
It is a general experiment to ligate several DNA fragments during the construction of plasmids for exploring gene function. The familiar scenarios include adding specific promoters or terminators to the genes of interest, joining tandem gene duplicates, and fusing reporter genes such as GFP, RFP, or YFP to the target genes (Loehlin & Carroll, 2016;O'Halloran, Uriagereka-Herburger & Bode, 2017). In terms of assembling multiple overlapping DNA sequences, the Gibson method is highly efficient and widely applied in gene synthesis (Gibson et al., 2009). In contrast, in less complicated cases, such as ligation of two DNA fragments, the approach of overlap extension PCR is enough and can be easily carried out.
The method of overlap extension PCR takes the first and second DNA fragments as a whole sequence in the design of overlapping primers, consisting of nucleotides from both fragments. Amplifications are carried out to produce the upstream fragment and the downstream fragment separately with the corresponding 5 and 3 primers. Then, the upstream and downstream fragments are used as the template DNA of overlap extension Figure 1 The procedure of ligating two DNA fragments via overlap extension PCR. The first fragment is shown with a long blue arrow (sense strand) and a long orange arrow (antisense strand). The second fragment is shown with a long cyan arrow (sense strand) and a long purple arrow (antisense strand). The arrows show the 5 -3 direction of DNA. The PCR#1 shows the amplification of the upstream fragment with primer gene_5 and primer overlap_3. The PCR#2 shows the amplification of the downstream fragment with primer overlap_5 and primer gene_3. The upstream and downstream fragments bridge within the overlapping region during the overlap PCR of the full-length chimera sequence.
Full-size DOI: 10.7717/peerj.14283/ fig-1 PCR to obtain the full-length chimera fragment that can be applied in the following steps, e.g., ligation, sequencing ( Fig. 1). Although this method is simple, low-cost and highly efficient, the design of the overlapping primers and the primers for the full-length sequence is time-consuming and tedious. To make the two pairs of primers work well in three PCR reactions, the annealing temperatures of all the primers should be carefully considered. Though there has been an online tool for designing such primers (O'Halloran, Uriagereka-Herburger & Bode, 2017), there is no convenient software that can do the primer design offline by far. In this study, we put forth a method to design the two pairs of primers for the overlap extension PCR using Python program. The algorithm is explicit, and the source code is provided with detailed annotation, which is convenient for users to modify based on personal needs. We also performed experiments to verify the script by using the recommended primers to ligate the two example genes. It is our hope that the script will support the gene cloning process.

Materials
The computer system used in this study is macOS Monterey 12.3.1 or higher (Apple Inc., CA, USA). Python 3.10.4 (http://www.python.org) and Biopython 1.79 are required for running the script 'Overlap_ligation_primer_v1.py' (Cock et al., 2009). Users can modify the source code of the script using the IDLE (Integrated Development and Learning Environment) (http://www.python.org). The sequences of the two DNA fragments are stored in fasta files. The first DNA sequence file ECdnaQ.fasta contains the sequence of the dnaQ from Escherichia coli str. K-12 substr. MG1655 (ECdnaQ) with the gene ID 946441, and the second DNA file TBdnaN.fasta contains the sequence of the dnaN from Mycobacterium tuberculosis H37Rv (TBdnaN) with the gene ID 887092. The above script, fasta files, readme file and the zipped output file can be downloaded via the link at https://github.com/shiqiang-lin/PCR-ligate-2_DNAs.

Algorithm
In this script, the following rules are applied in the design of the overlapping primers and the primers for the full-length sequence. (1) The two overlapping primers ( Fig. 1), each containing the nucleotides from both the two fragments, are partially reverse complemented, with 3 overhangs (Fig. 2). (2) The Tm values of the four primers should be similar, which facilitates the PCR reactions. (3) The Tm value of the overlapping region between the two primers ( Fig. 2) should be moderate. In our program, we set the favorite Tm value 63 • C. However, users can modify this value by changing the variable in the line 90 of the source code (which is 'Tm_favorite = 63'). (4) The combinations of the four primers are sorted according to the K value, which is defined with the formulae below.
The 'Tm' represents the value of melting temperature of primer. The 'primer5', 'overlap3' are the primers for the first DNA fragment to be joined and 'overlap5', 'primer3' are the primers for the second DNA fragment to be joined. The 'overlap' is the overlapping region between the two primers. The 'Tm favorite ' is the set optimal Tm value of the overlapping region between the two primers ( Fig. 2). A lower K value of the combination indicates that the difference in melting temperature among the four primers is smaller.

Running method
The script is run with the following steps.
(1) Make a new directory first, copy the script, ECdnaQ.fasta and TBdnaN.fasta to the directory.
(2) Open the Terminal and change the current working directory to the path of the new directory, and run the following command. python3.10 Overlap_ligation_primer_v1.py ECdnaQ.fasta TBdnaN.fasta (3) Once the script finishes the task, a new file named 'ECdnaQ_TBdnaN_ligation_primers.txt' is generated, which lists the resulting primers (Fig. 3). The process takes less than one minute for a MacBook computer.
The ECdnaQ and TBdnaN were then gel recycled and their concentrations were measured with NanoDrop (Thermo Fisher Scientific Inc., Denver, CO, USA). The concentration of ECdnaQ was 294.4 ng/µL and that of TBdnaN was 245.5 ng/µL. The two genes were mixed with equal molar ratio, according to gene length and concentration and neglecting the GC difference. TE buffer was used to dilute the mixed gene to a concentration of 24 ng/µL, which was then used as the template DNA for producing the full-length fragment. The PCR reaction system (50 uL) contained 20 uL ddw, 2 uL 10 uM gene_5, 2 uL 10 uM gene_3, 1 uL mixed genes and 25 uL 2XPhusion HF PCR Master Mix. The PCR program was 98 • C for 30s; 30 cycles of 95 • C for 30s, 60 • C for 30s, 72 • C for 40s; 72 • C for 5min; 16 • C forever. The full-length fragment was gel recycled and ligated to T vector with pEASY R -Blunt Zero Cloning Kit (Catalog Number CB501-01, TransGen Biotech Co., Ltd., Haidian, Beijing, China). The sequence of the ligated fragment was confirmed by Sanger sequencing (Supplemental Information S3).
Moreover, we obtained the full-length fragment based on overlap extension (without adding primers gene_5/gene_3). The PCR reaction system (50 uL) contained 24 uL ddw, 1 ul equal molar mixed genes (262 ng/uL) and 25 uL 2X Phusion HF PCR Master Mix. The PCR program was 98 • C for 30s; 30 cycles of 95 • C for 30s, 60 • C for 30s, 72 • C for 40s; 72 • C for 5min; 16 • C forever. The target band was gel recycled and ligated to T vector with pEASY R -Blunt Zero Cloning Kit (Catalog Number CB501-01, TransGen Biotech Co., Ltd., Haidian, Beijing, China). The sequence of the ligated fragment was confirmed by Sanger sequencing (Supplemental Information S4).

Script running
As shown in Fig. 2, the overlapping primers vary with the values of i, j, k and l. Since i and j both can be one of the four numbers (0, 1, 2, and 3), there are 4*4 = 16 possibilities for the overlapping forward primer. Similarly, there are also 16 possibilities for the overlapping reverse primer. Meanwhile, the search interval of the primer for the full-length sequence is from 14 to 30, therefore, there are 17 possibilities for each primer of full-length sequence. In total, the number of combinations for the overlapping primers and the primers of the full-length sequence is 73,984, equaling 16*16*17*17. Although the number is remarkable, the program can generate all of these combinations within one minute.
An example of the output file 'ECdnaQ_TBdnaN_ligation_primers.txt' is shown in Fig. 3. There are seven columns in the output file: the left five columns are the name, sequence, length, GC percentage, Tm value of each primer, and the right two columns are Overlap_Tm and K value. The Overlap_Tm indicates the Tm value of the overlapping region of the overlapping primers in the combination. The K value represents the difference of Tm value among these four primers. Lower K value suggests the combination of the four primers is theoretically preferable. In the example, the K value of the first combination is 583, which is the lowest among the combinations. Although the GC percentage of the gene_3 primer is a little high, this combination is still the best choice since the GC percentage in the 3 end of the second DNA fragment is intrinsically high.

Experimental validation
The experimental validation process was divided into three steps, i.e., obtaining gel recycled ECdnaQ and TBdnaN fragments, amplifying the full-length fragment via overlap PCR, ligating the full-length fragment to T vector for sequencing confirmation. In Fig. 4, lane 1 and lane 2 showed that the primers picked up by the script (gene_5 and overlap_3 for ECdnaQ, overlap_5 and gene_3 for TBdnaN) could amplify the target genes, which were then gel recycled (lane 3 and lane 4) for template DNA to produce the full-length fragment. Since there were two methods to obtain the full-length fragment, i.e., with primers gene_5/gene_3, without primers gene_5/gene_3, we performed PCR reaction for the two situations respectively. Lane 5 and lane 6 showed that both methods were capable of yielding full-length fragments. The target bands were gel recycled and ligated to T vectors respectively. The sequencing results showed that the sequences of the full-length fragment were correct (Supplemental Information S3-S4). These results demonstrated that the primers recommended by the script could be used to ligate the ECdnaQ and TBdnaN with overlap PCR.

DISCUSSION
In this study, the design of the overlapping primers and the primers of the full-length sequence for combining the two DNA fragments using overlap extension PCR is accomplished by a Python program. As the source code is provided, it is convenient for users to adjust the relevant parameters, such as the searching space for each primer, the optimal Tm value for the Tm_overlap, and so on. The script can also be used to design the deletion primers for overlap extension PCR, in which the input sequences are the two fragments from the target gene with the deleted region removed (Forloni, Liu & Wajapeyee, 2018;Lee et al., 2010). However, both fragments should be no less than 200 bps to ensure an efficient gel extraction. For amplifying full-length fragment with mixed ECdnaQ and TBdnaN as template DNA, two approaches were used, which were with primers gene_5/gene_3 and without primers gene_5/gene_3. In terms of approach without primers gene_5/gene_3, the double strands of ECdnaQ and TBdnaN could form two kinds of overlaps during the annealing stage, i.e., overlap with two 3 ends (equivalent to primers) that could elongate by incorporating dNTPs to form the full-length fragment and overlap with two 5 ends that could not grow in length via DNA polymerase. Due to the addition of plenty of mixed ECdnaQ and TBdnaN, we could obtain enough full-length fragment for T vector ligation.
For the approach with primers gene_5/gene_3, the reaction was more complex. What happened to the reaction without primers gene_5/gene_3 could happen to that with primers gene_5/gene_3. Besides, the overlap with two 5 ends could be used as template DNA to produce the strands that could form overlap with two 3 ends in the next round of annealing, which then yielded the double-stranded full-length fragment. These full-length fragments then could be used as template DNA for exponential amplification with primers gene_5/gene_3.
The approach without primers gene_5/gene_3 was simpler as there was no exponential amplification. Therefore, lots of template DNA was needed to guarantee enough full-length fragment for the next step of experiment. A possible benefit of this approach is that the possibility of nucleotide mismatch might be reduced as the reaction was simpler. In comparison, the advantage of the approach with primers gene_5/gene_3 was that only a small amount of mixed template DNA of ECdnaQ and TBdnaN was needed. Due to the complexity of the reaction, this approach might have a slightly lower fidelity than the approach without primers gene_5/gene_3. However, the high-fidelity DNA polymerase used in this experiment ensured that the correct full-length fragments were obtained.
Currently, to the best of our knowledge, there is no other program like ours that is standalone, open-source and can automatically design thousands of primer combinations for performing overlap extension PCR to ligate two DNA fragments. The script is free, annotated in detail so that users can read the code easily and know the algorithm instead of dealing with a black box, which is important for tailoring the script according to the specific, if any, experimental needs. Also, the script exhibits the results in a friendly style for users to pick up the preferred tetrad of primers, which may save time for users and avoid mistakes that might lead to undesired mutation. Furthermore, we provided two approaches for yielding the full-length fragment and showed that both worked, demonstrating the flexibility of our script in the application.

CONCLUSION
To sum up, we offer a simple and efficient script to design the four primers to ligate the two DNA fragments and the script is verified by experiments. It is our hope that the script may help for doing gene cloning in the lab.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work was supported by the Project of Fujian Province Department of Science & Technology (2020N5013). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors: Project of Fujian Province Department of Science & Technology: 2020N5013.