Effects of mismatches and insertions on discrimination accuracy

Effective discrimination of non-complementary nucleotides is an important factor in ensuring the accuracy of hybridization-based nucleic acid analyses. The current study investigates the effects of the chemical nature, the positions, the numbers, and the cooperative behavior of mismatches as well as insertions on 20-mer and 30-mer duplexes. We observed the following trend in hybridization stability for mismatches: G:T ≈ G:G > G:A > A:A ≈ T:T > A:C ≈ T:C > C:C. The experimental data show that mismatches at the center of the oligonucleotide probes have a more profound destabilizing effect on hybridization stability than those at either end. Insertions show a destabilizing effect similar to that of mismatches. These results provide useful information for designing DNA microarray nucleotide probes and for improving the discrimination accuracy of hybridization-based detection.


INTRODUCTION
DNA microarrays allow thousands of DNA or RNA samples to be analyzed in one experiment (Lockhart et al., 1996; Blohm et al., 2001). This high-throughput analytical platform has greatly advanced gene expression profiling and SNP identification (Debouck & Goodfellow, 1999; Berns, 2000; Marx, 2000). To achieve the desired detection accuracy, a DNA microarray needs to accomplish two fundamental functions: recognition and discrimination. Each oligonucleotide probe on a microarray chip must recognize its nucleotide target with high specificity through complementary hybridization (also referred to as perfect match, PM) and form a stable duplex. At the same time, the probe must effectively exclude nucleotide strands that are not fully complementary because of mismatches (MM) or insertions (Ins). The melting temperature (Tm) is used to characterize hybridization stability. To ensure detection specificity, the melting temperature difference between PM and MM duplexes should be large enough that, at a given hybridization temperature, all PM duplexes are relatively stable while all non-complementary nucleotide fragments remain single-stranded. In reality, however, the sample solution contains many DNA or RNA fragments whose sequences differ from the PM target by only one or two nucleotides, and these may have Tm values very similar to that of the PM. Such fragments interfere with the formation of probe:target duplexes and create false positive signals. To improve detection accuracy, data-correction algorithms have been developed. One solution is to place a true probe and a mismatched probe side by side on the same microarray chip. After hybridization and chip scanning, the fluorescence intensity of the true probe is corrected by subtracting that of the intentionally designed mismatched probe (Cheng & Wang, 2001). This approach, however, raises another question regarding the mismatched probe design: at which position, and with which nucleotide, the mismatch should be placed in the mismatched spot. This question motivated us to conduct the current study on how non-complementary hybridization affects duplex stability.

2008 X. Piao and others
In this paper, we report a study of duplex stability as affected by mismatches and insertions. The effects of the chemical nature, the positions, the numbers, and the cooperative behavior of mismatches as well as insertions on 20-mer and 30-mer duplexes were examined. An empirical hybridization stability trend was summarized. We used documented thermodynamic data and structural information to interpret our observations. The results provide valuable information for the design of high-quality oligonucleotide probes used in hybridization-based analyses such as SNP identification, mutation detection, PCR amplification and in situ hybridization.

MATERIALS AND METHODS
Preparation of oligonucleotides. The 16S rRNA gene sequence of Escherichia coli (accession No. J01859) was obtained from GenBank at the National Center for Biotechnology Information. The nt99-nt128 region was taken as the detection target, and two oligonucleotide probes were designed accordingly. The first is a 30-mer with the sequence 5' GAGTGGCGGACGGGTGAGTAATGTCTGGGA 3', and the second is a 20-mer with the sequence 5' GAGTGGCGGACGGGTGAGTA 3'. To form oligonucleotide duplexes, 20-mer and 30-mer oligonucleotide targets were also prepared.
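The relation between the two probes and their perfect-match targets can be sketched in Python. This is only an illustration under stated assumptions: the helper name reverse_complement is ours, the 30-mer is taken as 30 contiguous nucleotides, and the PM targets are assumed to be the exact reverse complements of the probes (the sequences actually synthesized are listed in the supplementary tables).

```python
# Illustrative sketch: derive perfect-match (PM) targets from the probe sequences.
# Assumption: a PM target is the exact reverse complement of its probe.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


PROBE_30 = "GAGTGGCGGACGGGTGAGTAATGTCTGGGA"  # nt99-nt128 region
PROBE_20 = "GAGTGGCGGACGGGTGAGTA"

TARGET_30 = reverse_complement(PROBE_30)
TARGET_20 = reverse_complement(PROBE_20)

print(len(PROBE_30), len(PROBE_20))   # probe lengths
print(PROBE_30.startswith(PROBE_20))  # the 20-mer is the 5' part of the 30-mer
print(TARGET_20)                      # PM target of the 20-mer, 5'->3'
```

Because the 20-mer probe is the 5' portion of the 30-mer probe, the 20-mer target corresponds to the 3' portion of the 30-mer target.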
To determine the effect of non-complementary hybridization on duplex stability, single mismatches (1-MM), double mismatches (2-MM), single insertions (1-Ins), double insertions (2-Ins) and triple insertions (3-Ins) were introduced at selected positions on these oligonucleotide targets (Tables 1-7 in Supplementary Materials at www.actabp.pl). All synthetic oligonucleotides, of OPC grade, were purchased from TAKARA Ltd. (Dalian, China). Stock solutions were prepared in Tris/HCl buffer (0.01 mol/L Tris, 0.1 mol/L NaCl, pH 7.4) to a final concentration of 0.05 mmol/L and stored at -20°C. Equal volumes of the probe stock solution and a target stock solution (perfect match, mismatch or insertion) were mixed to obtain a duplex solution at a final concentration of 1 µmol/L. Each duplex solution was freshly prepared prior to the melting curve measurements.
Melting temperature measurement. The melting of oligonucleotide duplexes was monitored by measuring the absorbance at 260 nm on a UV-VIS spectrophotometer equipped with a temperature control accessory (Cary 100, Varian, USA). Eighty microliters of duplex solution in a capped quartz cuvette of 1 cm pathlength was used. To ensure the comparability of the experimental data, a 6×6 multicell block was used so that up to 6 samples could be measured simultaneously. The multicell block has eight Peltier heat pumps that efficiently exchange heat between the sample cuvette cells and the circulating water to maintain the desired sample temperature. The melting curve was recorded over the range 30-90°C in 1°C increments at a rate of 1°C/min. To ensure complementary duplex formation before the measurements, a denaturation-annealing cycle under the same conditions was performed prior to the temperature ramping. Three denaturation-annealing cycles were performed for data acquisition in each measurement. The melting curves were highly reproducible and were averaged to form one final curve for data analysis. The melting temperatures were determined by the instrument software as the peak position of the first derivative of the melting curves. In general, the experimental error of the Tm values on repeated measurements was ±0.1°C for perfect matches and ±0.3°C for non-complementary duplexes.
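The derivative-peak definition of the melting temperature can be illustrated with a short Python sketch. It is not the instrument software's algorithm, which is not described here; the function names, the central-difference derivative and the synthetic sigmoidal curve (with hypothetical absorbance values) are assumptions made purely for illustration.

```python
import math


def first_derivative(temps, absorbances):
    """Central-difference dA/dT at the interior points of a melting curve."""
    d_temps, d_abs = [], []
    for i in range(1, len(temps) - 1):
        slope = (absorbances[i + 1] - absorbances[i - 1]) / (temps[i + 1] - temps[i - 1])
        d_temps.append(temps[i])
        d_abs.append(slope)
    return d_temps, d_abs


def melting_temperature(temps, absorbances):
    """Temperature at which the first derivative of the melting curve peaks."""
    d_temps, d_abs = first_derivative(temps, absorbances)
    peak_index = max(range(len(d_abs)), key=d_abs.__getitem__)
    return d_temps[peak_index]


# Synthetic curve sampled every 1 degree from 30 to 90 (hypothetical, not measured data).
temps = [float(t) for t in range(30, 91)]
TRUE_TM = 68.0
absorbances = [0.20 + 0.05 / (1.0 + math.exp(-(t - TRUE_TM) / 2.0)) for t in temps]

print(melting_temperature(temps, absorbances))
```

With 1°C sampling, a peak pick of this kind resolves Tm only to the nearest degree; reaching the sub-degree errors quoted above would require interpolation or curve fitting around the peak, which this sketch omits.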

Duplex thermal stability affected by single mismatches
Figure 1a shows melting curves of 20-mer duplexes containing PM, 1-MM or 2-MM. When 1-MM and 2-MM are included in the duplex, the melting curves shift leftward with respect to the PM curve, indicating a reduced melting temperature. Figure 1b shows the melting temperatures of 20-mer and 30-mer duplexes containing different MMs in comparison with those of the PM duplexes. Single mismatches destabilize the 30-mer duplex by 3-6°C, whereas the same 1-MMs at the same position destabilize the 20-mer duplex by 3-10°C, illustrating that shorter oligonucleotide duplexes are more sensitive to MMs (see Tables 1 and 2 in Supplementary Materials at www.actabp.pl). This result is consistent with the common observation that shorter oligonucleotide probes are preferred in hybridization-based analyses since they are more sensitive to mismatched hybridization (Ke & Wartell, 1993). Figure 1c shows how duplex stability is affected by eight different types of mismatched basepairs. The curves show that the pyrimidine:pyrimidine MMs (C:C, C:T and T:T) have a greater destabilizing effect on the duplex than the purine:purine MMs (A:A, G:G and A:G), except that the A:A mismatch is slightly less stable than the T:T mismatch in the 30-mer duplexes. The purine:pyrimidine mismatches have a mixed effect: the G:T mismatch has the least destabilizing impact among the eight types of MMs in both 20-mer and 30-mer duplexes, and the A:C mismatch is more stable than the T:C and C:C mismatches (see Tables 1 and 2 in Supplementary Materials at www.actabp.pl).
Our results are highly consistent with observations reported by other researchers, even though the lengths and nucleotide sequences of the duplexes differ. A study of the dissociation kinetics of 19-mer oligonucleotide-DNA duplexes containing various single mismatched basepairs on dried agarose gels showed that the destabilizing effect depended on the nature of the mismatch. The G:T and G:A mismatches slightly destabilized the duplex, while the A:A, T:T, C:T and C:A mismatches significantly decreased the duplex stability (Ikuta et al., 1987). Temperature-gradient gel electrophoresis (TGGE) has been used to determine the thermal stabilities of 48 DNA fragments that differ by a single mismatched basepair. The order of stability was determined for all mismatched basepairs in four different nearest-neighbor environments: d(GXT)/d(AYC), d(GXG)/d(CYC), d(CXA)/d(TYG), and d(TXT)/d(AYA), with X, Y = A, T, C, or G. Since the DNA fragments were PCR amplicons 373 bp long, the single mismatches destabilized them by only 1 to 5°C with respect to the homologous PM DNAs. The G:T, G:G and G:A mismatches were always among the most stable. Purine:purine mismatches were generally more stable than pyrimidine:pyrimidine mismatches. These results demonstrated that both the bases at the mismatch site and the neighboring stacking interactions influence the destabilization (Ke & Wartell, 1993). Tikhomirova et al. (2006) employed salt-dependent differential scanning calorimetry to characterize the stability of six oligomeric DNA duplexes containing a G:C, A:T, G:G, C:C, A:A or T:T basepair. Within the central AXT/TYA triplet, the duplex stabilities change in the order GC > AT > GG > AA ≈ TT > CC. It was also estimated that the thermodynamic impact of the GG, AA, and TT mismatches is confined within the central triplet, whereas the destabilizing effect of the CC mismatch extends over 7-9 bp.
We interpret the observed duplex stability trend using the structural parameters of helical width, base stacking interaction, glycosidic orientation and hydrogen-bonding capability obtained from X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.
When two purine bases are mismatched, they are tightly confined in the helical interior. As a result, the mismatch widens the diameter of the double helix, although the overall helical structure remains close to that of normal B-DNA. In addition, to relieve the constraint on the two confined purine moieties, one base of the G:G mismatch retains the normal anti glycosidic conformation while the other adopts the unusual syn conformation, as evidenced by the solution structure of d(GAGGAGGCACG)/d(CGTGCGTCCTC) (Cognet et al., 1991) and the single-crystal structure of d(CGCGAATTGGCG)2 (Skelly et al., 1993).
Duplexes containing a single A:G mismatch show a similar syn-anti arrangement in d(CGCAAATTGGCG)2, d(CGCGAATTAGCG)2 and the alternating d(GA)n (n = 14, 15) sequence (Brown et al., 1986; Lane et al., 1991; Huertas et al., 1993). These studies concluded that two hydrogen bonds are formed between the mismatched G and A in all cases despite the anti-syn conformation. Interestingly, we have not found a reported structure of the A:A mismatch. We speculate that the A:A mismatch probably adopts an anti-syn conformation like that of the G:G mismatch.
When two pyrimidine bases are mismatched, the restriction imposed by the phosphate backbone leaves a relatively large unfilled space in the helical interior in comparison with normal purine:pyrimidine basepairs. Studies of d(CT)4, d(TC)4 and d(TC)15 indicated that they form antiparallel duplexes stabilized by two H-bonds between C and T (Jaishree & Wang, 1993; Boulard et al., 1997). Similarly, the T:T mismatch is stabilized by two imino-carbonyl hydrogen bonds (Trotta & Paci, 1998). Under neutral conditions such as those used here, the C:C mismatch has only one H-bond. A molecular dynamics analysis also showed that the C:C mismatch has very pronounced mobility (Boulard et al., 1997). It is worth noting that the hydrogen bonds of T:T and T:C are relatively weak in comparison with those of normal purine:pyrimidine basepairs, since the two pyrimidine moieties are separated by a relatively large distance. Thus, it is reasonable that all three purine:purine mismatches (G:G, G:A and A:A) are more stable than the T:T, T:C and C:C mismatches.
In the case of purine:pyrimidine mismatches, the G:T mismatch is accommodated in the normal helical structure by small adjustments in the sugar-phosphate backbone conformation. The G:T mismatch still has two hydrogen bonds, so it behaves much like a perfectly matched A:T pair (Hunter et al., 1987). Consequently, the G:T mismatch is the most stable of the eight types of mismatches. The A:C basepair, however, has only one hydrogen bond at neutral pH, as evidenced by the single-crystal structure of d(CGCAAATTCGCG)2 and the NMR structure of d(CGCGAATTCACG)2 (Hunter et al., 1987; Gao & Patel, 1987). Nevertheless, the A:C basepair does not impose any constraint on the sugar-phosphate backbone or the glycosidic orientation; thus, it should be more stable than the C:C and C:T basepairs, which have one hydrogen bond, but less stable than the T:T basepair, which has two.
Combining our current study with other reported experimental results, we propose the following duplex stability trend for single mismatches: G:T > G:G > G:A > A:A > or ≈ T:T > or ≈ A:C > T:C > C:C, where the stability order of A:A, T:T and A:C may depend on the nucleotides flanking the mismatched site.
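For probe design, the proposed trend can be encoded as a simple ranking, sketched here in Python. The names STABILITY_ORDER, normalize and hardest_to_discriminate are illustrative only, and the strict ordering of A:A, T:T and A:C in the list is a simplification, since the trend states that their relative order may depend on the flanking nucleotides.

```python
# Proposed single-mismatch stability trend, most stable first.
# Simplification: A:A, T:T and A:C are ranked strictly here, although their
# relative order is only approximate ("> or ≈") and context dependent.
STABILITY_ORDER = ["G:T", "G:G", "G:A", "A:A", "T:T", "A:C", "T:C", "C:C"]


def normalize(pair: str) -> tuple:
    """Treat 'T:G' and 'G:T' as the same mismatch."""
    first, second = pair.upper().split(":")
    return tuple(sorted((first, second)))


RANK = {normalize(pair): rank for rank, pair in enumerate(STABILITY_ORDER)}


def hardest_to_discriminate(pairs):
    """Sort mismatches from most stable (hardest to discriminate) to least stable."""
    return sorted(pairs, key=lambda pair: RANK[normalize(pair)])


print(hardest_to_discriminate(["C:C", "T:G", "A:G", "C:T"]))
```

Mismatches near the front of the returned list lower the melting temperature least, so they are the ones a probe is least able to discriminate from the perfect match.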
Our empirical mismatch stability trend bears on the use of hybridization-based platforms for SNP detection. Single nucleotide polymorphisms represent natural genetic variation in the human genome. It has been found that about 65% of SNPs are replacements of C with T, particularly in CpG dinucleotide islands (Brookes, 1999). Thus, G:T mismatches occur more frequently than other types of mismatches. Here, we show that the G:T, G:G and A:G mismatches cause only a 2-4°C decrease in the melting temperature relative to that of PMs, making it a great challenge for probes to discriminate G:T, G:G or A:G basepairs accurately. In conjunction with real-time PCR, a genotyping method based on melting analysis has been developed (Liew et al., 2007). Successful allele discrimination by this approach also depends on the Tm difference between SNP alleles, which is determined by the length of the amplicons and the mismatches. A high resolution melting (HRM) technique had to be used to resolve temperature differences as small as 1°C in the successful SNP identification of the F8 and GJB1 genes (Laurie et al., 2007; Kennerson et al., 2007). In addition, the repair efficiency of mismatches in DNA replication is strongly dependent on the type of mismatch (Fersht et al., 1982; Kramer et al., 1984; Lu et al., 1984). This may be the reason why the mismatch repair enzymes fail to recognize and remove the m6G:T mutagenic lesion (Leonard et al., 1990).

Duplex thermal stability affected by double mismatches
In comparison with the melting curves of PM and 1-MM, a second MM shifts the melting curve further leftward, indicating a further decrease in melting temperature (Fig. 1a). Thus, hybridization involving multiple MMs can be easily excluded in mixed samples regardless of the chemical nature of these MMs. Although duplexes containing different combinations of two MMs exhibit different Tm values, the values of ΔT show the same trend for both 20-mer and 30-mer duplexes (Fig. 2a).
It has been speculated that each mismatch in a duplex acts as an initiation point from which the duplex begins to denature toward the 5'-end and the 3'-end as the temperature is raised, causing so-called cooperative melting. Such a cooperative melting process is expected to show two distinct features: the melting takes place in a very narrow temperature range, and the melting temperature shows a large drop, that is, ΔT´ is larger than ΔT (= ΔT1 + ΔT2), where ΔT´ is the melting temperature decrease caused by the 2-MM, and ΔT1 and ΔT2 are the melting temperature decreases when a single MM occurs at position #1 or #2 independently.
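The additivity criterion can be written out as a small Python check. This is a sketch only: the function names are ours, the tolerance of 0.6°C is a hypothetical margin (twice the ±0.3°C error quoted for non-complementary duplexes), and the numerical values in the example are invented for illustration, not taken from the supplementary tables.

```python
def additive_prediction(delta_t1: float, delta_t2: float) -> float:
    """Expected Tm decrease if the two mismatches act independently."""
    return delta_t1 + delta_t2


def is_cooperative(delta_t_double: float, delta_t1: float, delta_t2: float,
                   tolerance: float = 0.6) -> bool:
    """True if the 2-MM decrease exceeds the additive sum by more than `tolerance`.

    `tolerance` (in degrees C) is a hypothetical allowance for measurement error.
    """
    return delta_t_double - additive_prediction(delta_t1, delta_t2) > tolerance


# Hypothetical values in degrees C, for illustration only.
print(is_cooperative(9.2, 4.0, 5.0))   # 2-MM decrease close to the additive sum
print(is_cooperative(12.0, 4.0, 5.0))  # 2-MM decrease well above the additive sum
```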

We assessed whether melting is cooperative when two MMs are present in a duplex. Figures 2b and 2c show that the ΔT´ caused by 2-MM is almost equal to ΔT (Tables 3 and 4 in Supplementary Materials). In addition, we examined the width of the first derivative of the melting curves. The FWHH (full width at half height) was 6.2°C for the PM duplex, about 7.4°C for the 1-MM duplexes and more than 8.5°C for the 2-MM duplexes (not shown). This broadening of the melting range can be interpreted as a result of sequential melting; that is, the duplex begins to melt first at the MM site of lower Tm and then at the MM site of higher Tm.
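The width measurement can be sketched in Python as follows. This is not the procedure used to obtain the values above, which is not described; it assumes a single, baseline-corrected derivative peak that falls below half its maximum on both sides, and the Gaussian test peak with SIGMA = 3.0 is synthetic.

```python
import math


def full_width_at_half_height(xs, ys):
    """Width of a single peak at half its maximum, using linear interpolation."""
    peak = max(range(len(ys)), key=ys.__getitem__)
    half = ys[peak] / 2.0

    # Walk outward from the peak to the first samples at or below half height.
    left = peak
    while left > 0 and ys[left] > half:
        left -= 1
    right = peak
    while right < len(ys) - 1 and ys[right] > half:
        right += 1
    if ys[left] > half or ys[right] > half:
        raise ValueError("peak does not fall below half height on both sides")

    def crossing(i, j):
        # x at which the straight line through samples i and j equals `half`.
        return xs[i] + (half - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])

    return crossing(right - 1, right) - crossing(left, left + 1)


# Synthetic Gaussian derivative peak sampled every 1 degree (not measured data).
SIGMA = 3.0
xs = [float(t) for t in range(30, 91)]
ys = [math.exp(-((x - 60.0) ** 2) / (2.0 * SIGMA ** 2)) for x in xs]

print(round(full_width_at_half_height(xs, ys), 2))
```

For a Gaussian peak the exact width is 2·sqrt(2·ln 2)·SIGMA, so the interpolated value can be compared against that figure to judge the error introduced by 1°C sampling.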
It has been reported previously that a duplex is more unstable when two MMs are separated by ten nucleotides (Zhen et al., 1997). This was interpreted mainly on the basis of the helical structure: B-form DNA contains about ten basepairs per helical turn, so two MMs separated by ten nucleotides become nearest neighbors across the major groove of the helix and exert a cooperative influence on helical stability. However, no such phenomenon was observed in the current study.

Duplex thermal stability affected by mismatched positions
The group of Dr. Smith used 3-nitropyrrole to examine the effect of mismatch position on duplex stability, taking advantage of the fact that 3-nitropyrrole cannot form hydrogen bonds with any base. When 3-nitropyrrole was substituted for a base at successive positions from the 5'-end to the 3'-end of a 15-mer oligonucleotide, the plot of Tm against substitution position showed a U-shaped profile, indicating that mismatches in the central portion have the most profound destabilizing effect (Zhen et al., 1997). To mimic real hybridizations, we used cytosine to substitute for nucleotides on the target strand in the current study, based on the fact that a mispaired cytosine causes an observable temperature decrease. Cytosine substitutions on the target strand were made sequentially from the 5'-end to the 3'-end. When the original nucleotide on the target strand was C, meaning that the complementary site on the probe strand was G, we made a cytosine replacement on the probe strand to construct a C:C mismatch. (The nucleotide sequences of the duplexes are given in Table 5 in Supplementary Materials at www.actabp.pl.) Thus, we created a series of probe:target duplexes having C:X mismatches, where X can be C, T, or A. The plot of melting temperature against mismatch position shows a U-like curve (Fig. 3) similar to that reported by Zhen et al. (1997). The deviation of Fig. 3 from a perfect U-curve is attributed to the unequal destabilizing effects of the C:A, C:T and C:C mismatches.
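The substitution scheme can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' design script: positions are counted from the 5'-end of the target, the PM target is taken to be the reverse complement of the 20-mer probe, and the names cytosine_scan and reverse_complement are ours. The sequences actually used are those in Table 5 of the Supplementary Materials.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PROBE_20 = "GAGTGGCGGACGGGTGAGTA"  # 5'->3', from Materials and Methods


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def cytosine_scan(probe: str):
    """Yield (target_position, probe_seq, target_seq, mismatch) for each position.

    Positions are 1-based and counted from the target 5'-end (an assumption).
    """
    target = reverse_complement(probe)
    n = len(target)
    for i, base in enumerate(target):
        if base == "C":
            # Target already carries C: place C on the probe opposite it instead.
            j = n - 1 - i  # probe index that pairs with target index i
            yield i + 1, probe[:j] + "C" + probe[j + 1:], target, "C:C"
        else:
            # Substitute C on the target; it now faces the original probe base.
            new_target = target[:i] + "C" + target[i + 1:]
            yield i + 1, probe, new_target, "C:" + COMPLEMENT[base]


for position, probe_seq, target_seq, mismatch in cytosine_scan(PROBE_20):
    print(position, mismatch, probe_seq, target_seq)
```

Each duplex in the series differs from the perfect match at exactly one nucleotide, on either the target or the probe, and only C:C, C:T and C:A mismatches arise.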

Duplex thermal stability affected by insertions
Sequence comparisons have also identified genomic regions that differ from the intended target by having one extra nucleotide or one fewer nucleotide. When such regions hybridize with the oligonucleotide probe, a bulge structure forms in the duplex, imposing a constraint on the neighboring nucleotides and affecting the duplex stability.
A single insertion of each of the four types of nucleotides was made at several selected positions in 30-mer (Fig. 4a) and 20-mer (Fig. 4b) duplexes (see Tables 6 and 7 in Supplementary Materials at www.actabp.pl). The results indicate that single insertions are almost as destabilizing as single mismatches for both 20-mer and 30-mer duplexes. (In these comparisons, the Tm values of 1-MMs are the averages of the melting temperatures of three different MMs at positions #6, #11 and #16.) No obvious duplex stability trend, however, was observed with respect to the chemical nature or the position of the inserted nucleotides. Interestingly, Figs. 4a and 4b show that duplexes with insertions between positions #5 and #6 are more stable than those with a 1-MM at position #6. Figure 4c shows that duplexes with two or three insertions are less stable than those with single insertions or single mismatches.
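The construction of insertion-containing targets can be sketched in Python. The sketch rests on several assumptions: the insertion is placed on the target strand, positions are counted from the target 5'-end, the PM target is the reverse complement of the 20-mer probe, and the inserted "AA" and "AAA" sequences are hypothetical placeholders. The sequences actually used are given in Tables 6 and 7 of the Supplementary Materials.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PROBE_20 = "GAGTGGCGGACGGGTGAGTA"  # 5'->3', from Materials and Methods


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def insert_after(target: str, position: int, inserted: str) -> str:
    """Insert `inserted` between 1-based positions `position` and `position + 1`."""
    if not 1 <= position < len(target):
        raise ValueError("insertion must fall between two existing positions")
    return target[:position] + inserted + target[position:]


TARGET_20 = reverse_complement(PROBE_20)

# Single insertions of each nucleotide between positions #10 and #11.
single_insertions = {base: insert_after(TARGET_20, 10, base) for base in "ATGC"}

# Hypothetical double and triple insertions at the same site.
double_insertion = insert_after(TARGET_20, 10, "AA")
triple_insertion = insert_after(TARGET_20, 10, "AAA")

for base, seq in single_insertions.items():
    print(base, seq)
```

On hybridization with the unmodified probe, the inserted nucleotides have no pairing partner and form the bulge described above.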
The current data send a clear message that insertions, just like mismatches, reduce duplex stability and should be considered when designing nucleotide probes. Interpreting the stability of duplexes containing insertions or deletions remains a challenge, since there is very limited information about the thermodynamics and structures of such duplexes.

CONCLUSION
To ensure accurate gene profiling and unambiguous identification of SNPs, we need to design high-quality oligonucleotide probes and select proper experimental parameters, which requires a comprehensive understanding of hybridization properties. As the initial phase of a series of studies, we investigated the effect of mismatches and insertions on DNA duplex stability. In this study, 20-mer and 30-mer oligonucleotide probes recognizing an E. coli genomic DNA sequence were designed. Target strands containing all eight types of mismatches were constructed. The effects of the chemical nature, the positions, the numbers, and the cooperative behavior of mismatches as well as insertions were examined. The experimental results demonstrate the following duplex stability trend for the mismatches: G:T > G:G > G:A > A:A > or ≈ T:T > or ≈ A:C > T:C > C:C. Mismatches at the center of the duplexes have a more profound destabilizing effect than those at the ends. Insertions reduce duplex stability to a similar extent as mismatches. Although the current experimental results are empirical and need further confirmation, this work takes a first step toward understanding duplex stability and the prospects for mismatch discrimination. It is our hope that, in conjunction with thermodynamic data and structural information, our study will support the use of melting profiles for designing high-quality DNA microarray probes and achieving high accuracy in hybridization-based nucleic acid detection and analysis.

Figure 2. Hybridization stability of 30-mer and 20-mer duplexes affected by double mismatches. (a) Melting temperature decreases of 30-mer and 20-mer duplexes affected by 2-MMs; (b) Comparison of melting temperature decreases of 30-mer duplexes affected by 2-MMs and two 1-MMs (see nucleotide sequences in Table 3 of Supplementary Materials); (c) Comparison of melting temperature decreases of 20-mer duplexes affected by 2-MMs and two 1-MMs (see nucleotide sequences in Table 4 of Supplementary Materials at www.actabp.pl).

Figure 3. Hybridization stability of 20-mer duplexes affected by mismatch positions. Numbers on the x-axis are the mismatched positions and letters are the mismatched basepairs. Open squares and filled diamonds are melting temperatures of PM and MM duplexes, respectively (see nucleotide sequences in Table 5 of Supplementary Materials at www.actabp.pl).

Figure 4. Hybridization stability of 30-mer and 20-mer duplexes affected by insertions. See Tables 6 and 7 in Supplementary Materials at www.actabp.pl. (a) Melting temperatures of 30-mer duplexes with a single insertion at three different positions and a single mismatch at three different positions. Ins1, Ins2 and Ins3 represent insertions between positions #5 and #6, #10 and #11, and #15 and #16, respectively; (b) Melting temperatures of 20-mer duplexes with a single insertion at two different positions and a single mismatch at two different positions; (c) Melting temperatures of 20-mer duplexes with multiple insertions between positions #10 and #11. 1-Ins, 2-Ins and 3-Ins indicate the single insertion, double insertion and triple insertion, respectively.