An approach to delineate primers for a group of poorly conserved sequences incorporating the common motif region

Glutathione synthetase (gshB) has previously been reported to confer tolerance to acidic soil condition in Rhizobium species. Cloning the gene coding for this enzyme necessitates the designing of proper primer sets which in turn depends on the identification of high quality sequence similarity in multiple global alignments. In this experiment, a group of homologous gene sequences related to gshB gene (accession no: gi-86355669:327589-328536) of Rhizobium etli CFN 42, were extracted from NCBI nucleotide sequence databases using BLASTN and were analyzed for designing degenerate primers. However, the T-coffee multiple global alignment results did not show any block of conserved region for the above sequence set to design the primers. Therefore, we attempted to identify the location of common motif region based on multiple local alignments employing the MEME algorithm supported with MAST and Primer3. The results revealed some common motif regions that enabled us to design the primer sets for related gshB gene sequences. The result will be validated in wet lab.


Background:
Classical methods for degenerate primer design include software applications such as Gene Fisher [1], CODEHOP [2] or PrimaClade [3].These methods usually rely on the identification of clear blocks of conserved regions in multiple global alignments.Therefore, alignment quality should be very high among the sequences in order to utilize the software.However, in many cases, primer design using multiple global alignments is unsuccessful [4], due to poor conservation among the sequences under study.Here a method is analyzed which suggests the common motif region could be used to design degenerate of primers for sequences with poor global similarity or without well-conserved blocks in multiple global alignments.The technique is based on multiple local alignments, using the Multiple Expectation -Maximization for Motif Elicitation (MEME) algorithm [5], in order to search for conserved regions long enough to serve as primers.The results of MEME were then compared with Primer3 [6] result.Using MAST [7] two well conserved motifs sites were generated from the common motif regions.The potential primer properties of these motifs will be verified further in wet lab.
We employed this technique to design primer from the common motif regions for the sequences related to glutathione synthetase gene (gshB) of Rhizobium etli CFN 42.We demonstrate that the amplification of sequences with similarity to genes reported in other species (possibly homologues) is possible, even when the species in question is poorly characterized at the molecular level.
Original query (gshB gene) and its group of related sequences were aligned using T-COFFEE [10] to determine global similarity.A lack of block conservation was observed in all cases in the multiple global alignments which discarded this technique to design degenerate primers for considered sequences.Then multiple local alignments were performed using MEME to generate common motif regions for the sequences.The conditions of these alignments included a minimum motif length of 18 and a maximum of 50, and minimum sites to find was two.Then two motifs were generated using MAST by considering and combining evidences of all p-values associated with motifs resulted by MEME run.Primer3 was employed to generate primers for every individual sequence, which were then compared with the common motif portions of that sequence Table 2 (see supplementary material) suggested by the first MEME result.Again another MEME run was conducted to detect motifs of minimum width 18 and maximum width 22 (criteria of good primer) with all the previous parameters remaining unchanged for the similar sequence set and two motifs were generated employing MAST.

Discussion:
To design degenerate primer for this group of sequences, it was necessary to predict block of conserved regions present in the sequences, which indeed needed sequence alignment study of the sequence set.Therefore, a multiple global alignment algorithm of the T-coffee software was performed for the set of gshB gene homologous sequences.From the result of T-coffee no significant block conservation was detected.So the idea of designing of degenerate primer from a simple multiple global alignments of these sequences was dropped.In order to predict common motif regions for the sequences, an attempt was made for a multiple local alignment using MEME.From the MEME result common arrangement of motifs among the analyzed sequences were detected which proved the existence of two well-conserved regions among these sequences.Then combing evidence of all the p-value of the motifs, two motifs were generated using MAST server.The predicted common motifs were (1) AAGATCTTCGTCACCGAATTTCCCGATCTGA TGCCGAAGAC (+), GTCTTCGGCATCAGATCGGGAAATTCGGTGACGAAGATCTT (-) and ( 2

(see supplementary material).
The resulted motifs of the first MEME run were too long to consider or analyze as primers.Therefore, another MEME run was conducted to detect motifs of minimum width 18 and maximum width 22 (criteria of primer) with all the previous parameters remaining unchanged in this test.In this study, more than half of the results exhibited a common arrangement of motifs between the analyzed sequences, indicating a global similarity that could not be well resolved with common algorithms of multiple global alignments.This indicates the existence of well-conserved regions and similarity in the analyzed sequences, which can be useful for primer design.ACGAAGATCTTTT (-).The pair will be further verified in wet lab and with other in-silico primer analysis techniques.If these resulted primers will fail to be applied for any sequence, the motifs of individual sequence obtained in second MEME run can be extracted and ordered according to the score given in the program.Then by pairing those in all possible ways, pair of primers may be generated and analyzed.

Conclusion:
Our analysis attempts to design degenerate primers from common motif regions of a group of less conserved homologous gene sequences related to gshB gene of Rhizobium etli CFN 42.These sequences have a low sequence similarity in multiple global alignments.So the applied technique in this present study can be used for sequences those have very few known homologues, or to confirm them, and to design degenerate primers when the classical methods do not work.However, the experimented method in this study needs to be improved, for instance to reduce the time-consuming steps and avoiding sequences with low similarity patterns.It is also needed further investigation to apply it in the gene coding analysis study as this technique may deform the functional region of the gene in order to predict a common motif region.Therefore the predicted technique is needed to be validated in wet lab.

Figure 1 :
Figure 1: Second MEME run showing combined block diagram for all motifs with corresponding GENE ID and combined Pvalue The second MEME run predicted two distinct similar regions of motifs (Figure 1).Then two specific motifs were generated employing MAST algorithm from the resulted motifs of the second MEME run.The motifs were (1) TTCGACATGGCCTATATCACCT (+), AGGTGAT ATAGGCCATGTCGAA (-) and (2) AAAAGATCTTCGTCACCGAATT (+), AATTCGGTGACGAAGATCTTTT (-).The pair will be further verified in wet lab and with other in-silico primer analysis techniques.If these resulted primers will fail to be applied for any sequence, the motifs of individual sequence obtained in second MEME run can be extracted and ordered according to the score given in the program.Then by pairing those in all possible ways, pair of primers may be generated and analyzed.

Table 1 :
Showing homologous sequences of gshB gene of Rhizobium etli CFN 42 with respective accession number, organism name, query coverage and E value.

Table 2 :
Showing sequence accession number and associated primers at common motif sites resulted by first MEME run