THE USE OF MOLECULAR GENETIC MARKERS AND PCR FOR DNA DIAGNOSTICS IN RAW MATERIALS DERIVED FROM FRUIT AND BERRIES

A general description of molecular genetic markers is provided in the present article, and the classification of DNA markers used for identification of raw materials of plant origin is presented. The most appropriate method for identifying raw materials derived from fruit and berries is selected on the basis of published studies. The use of PCR for determining the quantitative and qualitative composition of the raw materials is considered, and the DNA regions used for PCR diagnostics of raw materials derived from fruit and berries are characterized. The significance of amplification for a PCR test is outlined. The optimal PCR conditions have been selected, and the advantages of this method are described. Amplification profiles of DNA from the samples obtained with different primers have been analyzed, and experiments with primers allowing identification of the raw material were carried out. The possibility of using a complex consisting of a conserved gene and a variable region for the identification of raw materials derived from fruit and berries has been considered, and the sequences of these DNA regions have been analyzed. The possibility of using a ribosomal RNA gene for genus- and species-level differentiation of DNA samples has been demonstrated. The importance of the choice of oligonucleotide primers and of PCR product length for the reliability of the whole genotyping system has been elucidated. A scheme of PCR-based DNA profiling has been developed. Two variants of the procedure were compared, and the more cost-efficient variant was chosen.


INTRODUCTION
Identification of the plant species from which a raw material was derived is among the main areas of application of molecular genetic markers. DNA fragments associated with specific nucleotide sequences constitute a new class of molecular markers. Such markers are severalfold more numerous than the markers characterized previously (isozymes, storage proteins, and morphological features). In addition, DNA markers do not depend on the phenotype, are not tissue-specific, and can be detected at any stage of plant development. The use of DNA markers has changed the methods used to evaluate the genetic diversity of plants, to certify and classify plant varieties, and to carry out genetic monitoring and plant breeding [4].
All studies related to identification of plant samples rely on the assumption that DNA fragments with equal molecular weight and the same electrophoretic mobility represent the same genomic fragments within one family of plants. Individual resolved DNA bands are used to assess the level of similarity between multiband DNA profiles. The probability that two randomly selected individuals have identical multiband DNA profiles is about 2•10^-9 for avocado, 1.5•10^-9 for papaya varieties, 4.2•10^-5 for apple varieties, and 2.4•10^-3 for raspberry and blackberry varieties [12,13].
Identification of plants proceeds stepwise: 1. identification of individuals; 2. detection of hybrid forms; 3. investigation of the pedigrees of plant specimens. The purpose of identifying individuals is to assign a plant specimen to a species, subspecies, variety, etc., or to resolve a taxonomic problem.
The main areas of use of molecular genetic markers are the following:
- identification of species, varieties, and other forms of plants;
- assessment of genealogical relationships between plants;
- search for molecular genetic markers associated with desirable traits.
DNA markers must meet several requirements, such as:
- phenotypic distinguishability of allelic variants, allowing different individuals to be identified;
- the possibility of distinguishing allelic substitutions at one locus from those at other loci;
- detectability of a substantial proportion of allelic substitutions at the target locus;
- random sampling of the investigated loci with respect to their physiological effects and degree of variability;
- uniform distribution in the genome;
- relative selective neutrality.
No type of marker meets all these requirements [4,9].
Molecular genetic markers most frequently used in practice can be divided into the following classes:
- markers expressed as visible morphological characteristics;
- markers constituted by structural portions of genes encoding the amino acid sequences of proteins;
- markers constituted by non-coding regions of structural genes;
- markers constituted by various DNA sequences whose relation to structural genes is usually unknown, such as sequences dispersed throughout the genome and detected by RAPD (randomly amplified polymorphic DNA), ISSR (inter-simple sequence repeats), and RFLP (restriction fragment length polymorphism), microsatellite loci (tandem repeats of a unit consisting of 2-6 nucleotides), and others.
The RFLP (restriction fragment length polymorphism)-based approach was first proposed for studies of the human genome. This method detects changes in the location of restriction sites in the DNA sequence and therefore allows mutations, deletions, insertions, inversions, and translocations to be identified; with methylation-sensitive restriction enzymes, it can also reveal variability in cytosine methylation patterns.
Isolated DNA is treated with appropriate restriction enzymes, endonucleases that recognize and cleave specific DNA sequences. The resulting restriction fragments of different lengths are separated by agarose or polyacrylamide gel electrophoresis. Because the gel contains a very large number of DNA fragments, detection of a specific fragment requires transfer to a membrane and hybridization with a radioactive or fluorescent probe (a labeled oligonucleotide) (Southern blotting). The nucleotide sequence of the probe is complementary to the sequence of the fragment of interest.
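The principle of RFLP detection can be illustrated with a minimal Python sketch of an in silico digest. It assumes the enzyme EcoRI (recognition site GAATTC, cut between G and A) and two made-up allele sequences, one of which carries a point mutation that destroys the second site; the function name `digest` and both sequences are hypothetical, and only fragment lengths are modeled, not the blotting and hybridization steps.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting `seq` at every occurrence of `site`.

    `cut_offset` is the cut position inside the recognition site
    (1 for EcoRI, which cuts G^AATTC). Because GAATTC is palindromic,
    scanning the top strand finds the sites on both strands.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele_2 differs by one substitution (GAATTC -> GAGTTC)
# that abolishes the second EcoRI site.
allele_1 = "ATGC" * 10 + "GAATTC" + "CGTA" * 15 + "GAATTC" + "TTAA" * 5
allele_2 = "ATGC" * 10 + "GAATTC" + "CGTA" * 15 + "GAGTTC" + "TTAA" * 5

print(digest(allele_1, "GAATTC", 1))  # [41, 66, 25]
print(digest(allele_2, "GAATTC", 1))  # [41, 91]
```

Loss of a single site merges two fragments into one longer band, which is the kind of length polymorphism that RFLP analysis detects.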
SNPs (single nucleotide polymorphisms) usually have two allelic variants at a single nucleotide position. Genotyping with these markers usually gives an unambiguous (+/-) result. More than 10 different approaches to SNP identification are currently available [11].
Detection methods that do not require electrophoretic analysis are the most convenient. One example is the TaqMan assay, which uses an oligonucleotide probe carrying a fluorescent reporter at the 5' end and a quencher at the 3' end. While the probe is intact, the quencher suppresses the fluorescence of the reporter. When the probe hybridizes to the template, it is degraded during primer extension by the 5'-3' exonuclease activity of Taq polymerase, the reporter is separated from the quencher, and the fluorescent signal increases; with allele-specific probes, this enables SNP detection.
PCR requires smaller amounts of plant DNA than RFLP analysis, but contamination of the samples with fungal or bacterial DNA is unacceptable when PCR is used. PCR also eliminates the need to maintain a library of cloned hybridization probes, to perform Southern blotting, and to use radioactive isotopes.
According to the type of primers used, the amplicons produced by PCR are divided into two groups: 1. STS (sequence-tagged site) markers obtained with specific primers designed to amplify particular sequences; 2. markers amplified with random primers. Amplification of some markers employs primers that combine the properties of specific and random primers [10].
Amplification with specific primers is a controlled process that requires knowledge of the template DNA sequence; the primers target a unique site or several sites on both DNA strands.
Amplification which does not require prior knowledge of the DNA sequence is termed random. It involves one or several random primers and results in amplification of a single or multiple genomic sites; most of these sites are polymorphic.
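The difference between specific and random priming can be sketched by counting exact-match annealing sites on both strands of a template. In this Python sketch the template is a randomly generated 100-kb sequence, the 20-nt "specific" primer is copied from a known position, and the 6-nt arbitrary primer is hypothetical (RAPD primers are typically about 10 nt long; 6 nt is used here only so that hits appear in a short template). Real primers also anneal with some mismatches, which this exact-match search ignores.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """Start positions of all (possibly overlapping) exact matches."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def annealing_sites(template: str, primer: str) -> list[tuple[int, str]]:
    """Exact-match sites for `primer` on both strands of `template`.

    '+': the primer sequence occurs on the top strand (the primer anneals to
         the bottom strand and is extended left to right);
    '-': its reverse complement occurs on the top strand (the primer anneals
         to the top strand and is extended right to left).
    """
    plus = [(i, "+") for i in find_all(template, primer)]
    minus = [(i, "-") for i in find_all(template, revcomp(primer))]
    return sorted(plus + minus)


random.seed(1)
genome = "".join(random.choices("ACGT", k=100_000))  # hypothetical template

specific_primer = genome[5000:5020]  # designed from known sequence
short_primer = "GACGTT"              # hypothetical short arbitrary primer

specific_hits = annealing_sites(genome, specific_primer)
random_hits = annealing_sites(genome, short_primer)
print(len(specific_hits), len(random_hits))
```

The specific primer is expected to find only the site it was designed from, whereas the short primer matches dozens of scattered sites on both strands.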
Successful implementation of methods for DNA diagnostics stimulated further development and implementation of highly sensitive techniques based on polymerase chain reaction (PCR). Specific DNA amplification is currently in wide use for the development of reliable procedures for identification of the origins of multicomponent products and raw material derived from various fruit and berries [8,12]. The advantages of species identification based on PCR over the traditional (physicochemical) methods of identification of genera and species serving as sources of plant-derived raw material are versatility, more precise differentiation of species, high reproducibility, and the possibility of quantitative analysis. Furthermore, plant DNA is more stable during food processing than other chemical compounds or biomolecules present in food.
The chloroplast genome, which is unique to plants, is present in a large number of copies in plant cells and contains large variable and non-coding regions (introns and intergenic spacers); it can therefore be used for highly sensitive species identification of plant-derived raw materials in foods [5]. The size of the amplified fragment differs between species. This feature was used in a study reported in 2006 to identify tangerine, orange, and grapefruit juice by heteroduplex PCR [7].
The aim of the present study was to investigate the possibility of using PCR for DNA diagnostics of raw materials derived from fruit and berries and to analyze target DNA regions, profiles generated by amplification, and PCR products.

OBJECTS AND METHODS OF THE STUDY
Raw materials derived from fruit and berries, namely, a mixture of orange and tangerine juice, apple, strawberry, pomegranate, blueberry, blackcurrant, and pineapple were the objects of the present study. DNA regions and their sequences were investigated.

RESULTS AND DISCUSSION
Polymerase chain reaction (PCR) is a method of in vitro amplification (repeated copying) of DNA that can increase the amount of target DNA more than 10^8-fold within a few hours. The significance of this method for molecular biology and genetics is so great and obvious that its inventor, Kary Mullis, was awarded the Nobel Prize in Chemistry in 1993, within about a decade of the discovery [2,9].
The method is based on repeated copying of specific DNA fragments; the specificity is determined by complementarity of DNA sequences and the copying is mediated by the enzyme DNA polymerase.
Amplification (controlled replication of a DNA fragment) is the main stage of PCR. Each amplification cycle consists of three steps: 1. DNA denaturation; 2. primer annealing (binding to the target DNA); 3. primer elongation (synthesis).
Cyclic repetition of these three steps leads to enrichment of the reaction mixture with target DNA molecules, since both the target DNA initially present in the sample and the newly synthesized DNA serve as templates in each subsequent cycle. The course of PCR, i.e. the transition from one stage to another and from cycle to cycle, is regulated by changing the temperature of the reaction mixture [3,7].
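Because every product molecule serves as a template in the next cycle, the number of copies grows geometrically. The sketch below uses the idealized relation N = N0(1 + E)^n, where E is the per-cycle efficiency (E = 1 means perfect doubling); it ignores the plateau effect and is meant only to show how many cycles a given fold amplification requires.

```python
import math


def copies_after(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number N = N0 * (1 + E) ** n (no plateau)."""
    return n0 * (1 + efficiency) ** cycles


def cycles_for_fold(fold: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles that gives at least `fold`-fold amplification."""
    return math.ceil(math.log(fold) / math.log(1 + efficiency))


print(copies_after(1, 10))        # 1024.0
print(cycles_for_fold(1e8))       # 27, since 2**27 is about 1.3e8
print(cycles_for_fold(1e8, 0.9))  # 29 at 90% efficiency
```

Under these assumptions a 10^8-fold amplification takes fewer than 30 cycles, which at a few minutes per cycle fits within a few hours.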
The optimal temperature-time profile is determined by the length and specificity of the fragment amplified. The first PCR cycle is usually preceded by pre-incubation of the reaction mixture at 92-96°C for 0.5-10 min, which inactivates contaminating proteins that may be present in the sample and improves the initial denaturation of template DNA; this step increases the efficiency of PCR. After the last cycle, the sample is typically incubated at 72°C for 5-10 min to complete synthesis of all polynucleotide chains in the PCR product. This final elongation is needed because a large amount of product has accumulated by the last cycle, and chains left truncated in that cycle would make the reaction mixture markedly heterogeneous, whereas "incomplete" chains formed in intermediate cycles are completed in subsequent cycles and do not affect the overall result of the reaction [1,2].
The duration of primer elongation. The rate of primer elongation in PCR is typically around 50 nucleotides per second at 72°C; therefore, 15 seconds are sufficient to amplify a DNA fragment shorter than 400 nt (nucleotides). This time can be shortened further, because primer elongation already begins during the annealing step. Amplification of very long sequences requires a longer elongation time to compensate for the inhibitory effect of the increasing viscosity of the reaction mixture.
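The timing rules above can be turned into a simple program calculator. In this Python sketch the 50 nt/s elongation rate, the 15-s minimum elongation, and the 5-min pre-incubation and final elongation come from the text, while the denaturation (94°C, 30 s) and annealing (55°C, 30 s) settings and the number of cycles are hypothetical placeholders; heating and cooling ramp times are ignored, so real run times are longer.

```python
import math
from dataclasses import dataclass

ELONGATION_RATE_NT_PER_S = 50  # typical rate at 72 °C, as stated in the text
MIN_ELONGATION_S = 15          # text: 15 s suffices for fragments < 400 nt


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


def elongation_time_s(amplicon_nt: int) -> int:
    """Length / rate, rounded up, but never below the 15-s minimum."""
    return max(MIN_ELONGATION_S, math.ceil(amplicon_nt / ELONGATION_RATE_NT_PER_S))


def build_program(amplicon_nt: int, cycles: int = 30):
    """Return (initial steps, steps of one cycle, number of cycles, final steps).

    Denaturation and annealing settings are hypothetical placeholders."""
    initial = [Step("pre-incubation", 95, 5 * 60)]
    cycle = [
        Step("denaturation", 94, 30),
        Step("annealing", 55, 30),
        Step("elongation", 72, elongation_time_s(amplicon_nt)),
    ]
    final = [Step("final elongation", 72, 5 * 60)]
    return initial, cycle, cycles, final


def hold_time_s(amplicon_nt: int, cycles: int = 30) -> int:
    """Total time spent at set temperatures (ramping not included)."""
    initial, cycle, n, final = build_program(amplicon_nt, cycles)
    return (sum(s.seconds for s in initial)
            + n * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))


print(elongation_time_s(400), hold_time_s(400) / 60)    # 15 47.5
print(elongation_time_s(2000), hold_time_s(2000) / 60)  # 40 60.0
```

The 15-s floor is a safety margin: at 50 nt/s a 400-nt product needs only 8 s, which is consistent with the remark that the time can be shortened further.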
After the elongation step of the first cycle is complete, the reaction mixture is heated again for the denaturation step of the second cycle, and so forth. Products of a defined length first appear in the third cycle; from then on, the amplicon length is fixed and equals the number of base pairs between the 5' ends of the two primers (forward and reverse) on the template DNA [2,3].
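How the primers define the product length can be shown with a minimal in silico PCR in Python. Both primers are written 5' to 3'; the forward primer matches the top strand, and the reverse complement of the reverse primer must occur downstream of it, so the predicted product runs from the 5' end of one primer to the 5' end of the other. The sequences and the function name `predict_amplicon` are hypothetical, and only exact matches at the first forward site are considered.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def predict_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Predicted PCR product on the top strand of `template`, or None.

    The product spans from the 5' end of the forward primer to the 5' end
    of the reverse primer, so it includes both primer sequences."""
    i = template.find(fwd)
    if i == -1:
        return None
    # The reverse primer anneals to the top strand downstream of `fwd`.
    j = template.find(revcomp(rev), i + len(fwd))
    if j == -1:
        return None
    return template[i:j + len(rev)]


fwd = "ATGGCTAGCA"  # hypothetical forward primer
rev = "GGATCCTTAG"  # hypothetical reverse primer (its target reads CTAAGGATCC)
template = "T" * 10 + fwd + "CAGT" * 20 + revcomp(rev) + "A" * 10

product = predict_amplicon(template, fwd, rev)
print(len(product))  # 100
```

The flanking sequence outside the primers (the T and A runs) never appears in the product, which is why the defined-length amplicon is the same in every cycle after the third.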
The number of cycles required decreases as the initial number of template copies in the reaction mixture increases. The amount of template is usually chosen so that a detectable amount of product forms after 25-50 cycles. The number of DNA molecules synthesized during the exponential phase of PCR increases until it reaches about 10^12, after which the rate of product accumulation drops sharply. Several major factors prevent the theoretically predicted exponential accumulation of the product: exhaustion of reaction substrates (deoxyribonucleotides or primers), insufficient stability of the components of the reaction mixture, inhibition of PCR by the end product, competition for substrates from nonspecific PCR products and primer dimers, re-annealing of PCR products preventing primer elongation, incomplete denaturation in the presence of large amounts of PCR product, and others. These factors cause the "plateau effect", which makes it impossible to improve PCR specificity or to increase the yield of the specific product by increasing the number of cycles. PCR specificity increases if the number of cycles and the duration of the individual steps (denaturation, annealing, and elongation of primers) are reduced. However, efficient amplification of long template fragments (>1 kb) requires longer cycle steps. The required duration of the individual cycle steps is also affected by the shape of the reaction mixture droplet, the thickness of the tube walls, and the design of the heating block of the thermal cycler [2,4].
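The plateau can be illustrated with a simple logistic model in which the per-cycle efficiency falls from its initial value toward zero as the product approaches a ceiling of about 10^12 molecules (the figure given above). The model is an assumption made for illustration, not a mechanistic description of the inhibiting factors listed in the text, and the detection threshold of 10^10 copies is hypothetical.

```python
def simulate(n0: float, cycles: int, e0: float = 1.0,
             capacity: float = 1e12) -> list[float]:
    """Copy number after each cycle under an assumed logistic model:
    efficiency is e0 * (1 - N / capacity), so growth is exponential while
    N << capacity and stalls as N approaches it (the plateau)."""
    n, history = n0, []
    for _ in range(cycles):
        n += n * e0 * (1 - n / capacity)
        history.append(n)
    return history


def cycles_to_threshold(n0: float, threshold: float = 1e10,
                        max_cycles: int = 60):
    """First cycle at which the product reaches `threshold` (None if never)."""
    for cycle, n in enumerate(simulate(n0, max_cycles), start=1):
        if n >= threshold:
            return cycle
    return None


for start in (1e3, 1e4, 1e5, 1e6):
    print(f"{start:.0e} copies -> threshold at cycle {cycles_to_threshold(start)}")
```

In this model each tenfold increase in starting template saves about 3.3 cycles (log2 10), which is why fewer cycles are needed when more template copies are present, while extra cycles beyond the plateau add almost no product.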
Volume of the reaction mixture. Standard sample volume for PCR is usually 20-100 µl; however, modern PCR devices allow for the use of 5 µl samples in analytical PCR. Heating and cooling of large samples is inefficient, while PCR in small reaction volumes yields small amounts of the product.
PCR is performed automatically in special instruments (thermal cyclers), which run the required number of cycles and allow the time and temperature of each step to be set over a wide, often continuous, range.
The final stage of PCR-based DNA analysis is detection of the amplification products that usually involves electrophoretic separation of PCR mixture on agarose or polyacrylamide gels and staining with ethidium bromide [9].
The specificity of DNA amplification is inferred from the position (size) of the amplicon band relative to the DNA size marker fragments and a reference DNA standard.
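Band sizes are usually read off a standard curve, because over the resolving range of a gel the logarithm of fragment length is approximately linear in migration distance. The Python sketch below fits that semi-logarithmic curve by least squares; the ladder sizes and migration distances are hypothetical values, not data from this study.

```python
import math

# Hypothetical size marker: (fragment size in bp, migration distance in mm).
LADDER = [(1000, 20.0), (700, 25.0), (500, 30.0),
          (300, 37.0), (200, 43.0), (100, 53.0)]


def fit_standard_curve(ladder):
    """Least-squares fit of log10(size) = intercept + slope * distance."""
    xs = [d for _, d in ladder]
    ys = [math.log10(bp) for bp, _ in ladder]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def estimate_size(distance_mm: float, curve) -> float:
    """Fragment size (bp) predicted from the fitted standard curve."""
    intercept, slope = curve
    return 10 ** (intercept + slope * distance_mm)


curve = fit_standard_curve(LADDER)
print(round(estimate_size(33.0, curve)))  # roughly 400 bp
```

Estimates are reliable only for bands that fall within the range spanned by the ladder; extrapolating beyond it is not advisable.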
The main advantages of PCR are high sensitivity and specificity, simplicity of the procedure, the possibility of using a DNA template without time-consuming isolation or purification, and the ability to analyze almost any biological material [2].
A scheme illustrating all stages of the polymerase chain reaction is shown in Fig. 1.

Fig. 1. Scheme of the polymerase chain reaction. A1, A2: oligonucleotide primers.
In case of absolute specificity the PCR product is a copy of the locus which is amplified, with no other nucleotide sequences present in the final product.
The precision of amplification is of crucial importance if the nucleotide sequence of the PCR product is critical for further experiments. Amplification errors are unacceptable if the primary structure of the amplified locus is to be determined or the PCR product is to be used for recombinant expression of a protein encoded by the DNA sequence in question.
The final yield of PCR product after the completion of the reaction is used to assess PCR efficiency quantitatively. Theoretically, the amount of product in the reaction mixture is doubled after each PCR cycle; however, this never occurs in real reactions.
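Rearranging the idealized amplification relation N_final = N0(1 + E)^n gives an average per-cycle efficiency from the starting and final amounts. This sketch assumes the starting copy number is known and that efficiency is constant; if the reaction has entered the plateau phase, the result is only an averaged, lower estimate. The numbers in the example are hypothetical.

```python
def mean_efficiency(n0: float, n_final: float, cycles: int) -> float:
    """Average per-cycle efficiency E from N_final = N0 * (1 + E) ** cycles."""
    return (n_final / n0) ** (1 / cycles) - 1


# Hypothetical reaction: 1e3 starting copies, 1e11 copies after 30 cycles.
print(round(mean_efficiency(1e3, 1e11, 30), 3))  # 0.848
```

A value of 1.0 would correspond to perfect doubling in every cycle; the example corresponds to about 85% efficiency.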
PCR amplification of a fragment of the trnT-trnL intergenic spacer of the chloroplast genome resulted in heteroduplex formation when a mixture of DNA extracted from orange and tangerine juice was used as the template. Authentic juices were analyzed by PCR in parallel. The amplicon obtained from tangerine juice was 18% shorter than that obtained from orange juice. Thus, a simple and reliable method for detecting tangerine juice adulterated with orange juice was developed.
An international group of molecular systematists recently formed the "Barcode of Life" consortium to create a database of genetic variation within a target gene (cytochrome c oxidase I, COI) for fast and accurate species identification. However, genetic variation within the plant COI gene proved insufficient for species identification, and the psbA-trnH intergenic region was therefore suggested as the DNA target. The ultimate goal of this study was to create a database of PCR-RFLP profiles of the psbA-trnH intergenic region. The primers used for PCR are shown in Table 1. However, PCR with the primers designed by Kress yielded no amplification products when heat-treated foodstuffs were tested. Analysis of the PCR amplicons obtained with Taberlet's primers showed that elderberry (560 bp) could be distinguished from the other fruits and berries; apple (392 bp) and pear (411 bp) differed from the other species but not from each other; and blueberry (465 bp), grape (459 bp), and pomegranate (463 bp) could be distinguished from the other species but not from one another. This suggests that in some cases a single primer set can be used to identify several fruits and berries present in a sample.
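Whether amplicons can be told apart by size alone depends on how closely bands can be resolved. The Python sketch below groups the amplicon lengths reported above for Taberlet's primers, treating neighboring bands as indistinguishable when they differ by no more than a relative tolerance; the 5% tolerance is an assumption about gel resolution, not a value from the study.

```python
# Amplicon lengths (bp) obtained with Taberlet's primers, as reported above.
AMPLICON_BP = {"elderberry": 560, "apple": 392, "pear": 411,
               "blueberry": 465, "grape": 459, "pomegranate": 463}


def resolvable_groups(sizes: dict[str, int], rel_tol: float = 0.05) -> list[list[str]]:
    """Sort by size and start a new group whenever the relative gap to the
    previous band exceeds `rel_tol` (bands within the tolerance co-migrate)."""
    items = sorted(sizes.items(), key=lambda kv: kv[1])
    groups = [[items[0]]]
    for name, bp in items[1:]:
        prev_bp = groups[-1][-1][1]
        if (bp - prev_bp) / prev_bp <= rel_tol:
            groups[-1].append((name, bp))
        else:
            groups.append([(name, bp)])
    return [[name for name, _ in group] for group in groups]


print(resolvable_groups(AMPLICON_BP))
print(resolvable_groups(AMPLICON_BP, rel_tol=0.01))
```

With a 5% tolerance the grouping reproduces the three indistinguishable sets described in the text (apple and pear; grape, pomegranate, and blueberry; elderberry alone); a finer resolution of 1% would separate apple from pear but not the three amplicons of 459-465 bp.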
For more precise differentiation, the PCR products were digested with the restriction enzymes ApoI and DdeI, and the resulting RFLP profiles were analyzed.
Restriction enzymes cut double-stranded DNA at specific sites. Each restriction enzyme has its own recognition sequence, which usually consists of 4-6 base pairs, and cuts the DNA wherever this sequence occurs; different enzymes recognize different sequences. The results of the study are shown in Table 2 [2]. The high probability that restriction sites are lost through point mutations (nucleotide substitutions), inversions, deletions, and insertions in the genomic DNA sequence can be a problem for plant species identification by this method [1].
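The PCR-RFLP step with ApoI and DdeI can be sketched in Python by expanding the degenerate (IUPAC) recognition sequences of the two enzymes, R^AATTY for ApoI and C^TNAG for DdeI, into regular expressions. The two 100-bp "amplicons" are hypothetical sequences of identical length, standing in for products that cannot be distinguished on a gel before digestion; the helper names are also hypothetical.

```python
import re

# IUPAC codes needed for the two recognition sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Recognition sequence and cut offset on the top strand (^ marks the cut):
# ApoI R^AATTY, DdeI C^TNAG. Both sites are palindromic, so scanning the top
# strand also finds every site on the bottom strand.
ENZYMES = {"ApoI": ("RAATTY", 1), "DdeI": ("CTNAG", 1)}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Top-strand cut positions of `enzyme` in `seq`."""
    site, offset = ENZYMES[enzyme]
    pattern = "".join(IUPAC[base] for base in site)
    # A zero-width lookahead also reports overlapping sites.
    return [m.start() + offset for m in re.finditer(f"(?={pattern})", seq)]


def rflp_profile(seq: str, *enzymes: str) -> list[int]:
    """Fragment lengths, in order along the amplicon, after digestion."""
    cuts = sorted({c for e in enzymes for c in cut_positions(seq, e)})
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Two hypothetical 100-bp amplicons that would co-migrate before digestion.
amp_a = "CCGG" * 5 + "GAATTC" + "CCGG" * 10 + "CTGAG" + "CCGG" * 7 + "C"
amp_b = "CCGG" * 12 + "AAATTT" + "CCGG" * 11 + "CC"

for name, amp in (("species A", amp_a), ("species B", amp_b)):
    print(name, len(amp), rflp_profile(amp, "ApoI"), rflp_profile(amp, "DdeI"))
```

Identical undigested lengths but different fragment patterns are what make PCR-RFLP more discriminating than amplicon size alone; conversely, a single substitution inside a site would remove a cut and change the pattern.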
The 5S rDNA sequence, proposed as an alternative, is a convenient target, since it is highly conserved among species and 5S rRNA genes are present in high copy numbers. Species-specific spacer sequences are used for phylogenetic studies and species identification [6].
5S rRNA is a component of all ribosomes except those in the mitochondria of certain plant species, and in all higher eukaryotes it is transcribed from hundreds to thousands of genes. The genes encoding 5S rRNA are located separately from the 18S-26S rRNA gene clusters and are arranged in tandem repeats, in which 5S rRNA-coding sequences alternate with non-transcribed spacers (NTSs), at one or more sites in the genome. Such clusters may be linked on the same chromosome or located independently in the genome. In higher eukaryotes, 5S rRNA genes are organized in tandem repeats of a basic unit 200-900 bp long, with a copy number ranging from 1000 to 50000.
The gene is about 120 bp long and is flanked by spacers of different sizes. The 120-bp 5S rRNA sequence is highly conserved, whereas the structure of the NTS, as well as its length (100-800 bp), varies from species to species because selection pressure on it is weaker than on the coding region. The high degree of conservation of the 5S rRNA gene reflects the function of 5S rRNA as a part of the large ribosomal subunit in all eukaryotic organisms. Some regions of the gene are more conserved than others because of their role in regulating 5S rRNA transcription [11].
Sequence conservation in the coding regions and considerable variability in the spacer regions provide a good model for investigating the organization and evolution of multigene families in different plant species. On this basis, variation in the NTS region has been used to study and visualize 5S rDNA arrays in the genome, to study evolution, and to reconstruct phylogenies in some plant species [5].
Experiments on the qualitative and quantitative identification of raw materials derived from fruit and berries by real-time PCR with 5S rRNA and ANS (anthocyanidin synthase) as target DNA sequences were conducted by American researchers in 2009.
Thus, the use of 5S rRNA and anthocyanidin synthase as target DNA sequences allowed for intrageneric and intraspecific differentiation of amplification products obtained from samples of fruits and berries. Practical application of this technique for laboratory testing of incoming raw materials and product quality control is difficult because a large database must be created to enable fast analysis of the data produced. The behavior of the primers in question during the analysis of GM products is hard to predict.
Chinese researchers have proposed using a complex consisting of the non-coding, variable ITS (internal transcribed spacer) region and the gene encoding 5.8S rRNA to identify raw materials in food. Profiles obtained with this region are much more varied because, compared with the 18S and 25S rRNA regions, it shows lower intraspecific polymorphism and considerably higher interspecific variability.
About 20 common food allergens, including scallops, squid, shrimp, crab, salmon, mackerel, chicken, pork, red caviar, meat, gelatin, orange, kiwi, walnuts, soybeans, matsutake, peach, sweet potato, apple, and banana, were studied. DNA from raw materials derived from other fruits was not analyzed. The primers designed for this system are shown in Table 4 [7]. Further studies focused on finding a DNA sequence allowing more sensitive genus- and species-level differentiation. The chloroplast ribosomal protein gene rpl16 was suggested as the DNA target. New primers, shown in Table 5, were designed on the basis of an analysis of the rpl16 nucleotide sequences of plants of the genus Prunus (peach) listed in the database [9].
The suggested method allowed unambiguous identification of members of the genus Prunus, but differentiation between species within the genus (for example, peach and apricot) remained impossible. Analysis of data published in Russia and abroad in recent years shows that a method for qualitative identification and quantification of fruit-derived raw materials in foodstuffs has not yet been developed. However, the number of PCR procedures for species identification of fruit-derived raw materials in food is growing, because the increasing volume of data on the genomes of fruit and berry plants makes new genes and nucleotide sequences available as DNA targets.
Short oligonucleotide primers allow differentiation between various samples of fruit materials, because large genomes contain multiple annealing sites (and, consequently, PCR initiation sites) for such primers. The shorter the primers, the higher the number of putative annealing sites. A limitation on the use of such primers in PCR is the distance between two annealing sites of oppositely oriented primers: a product forms only if the sites are close enough to be amplified efficiently. The reliability of the entire genotyping system therefore decreases if long PCR products are expected with such primers.
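The effect of primer length can be estimated under a deliberately crude assumption: a random genome with equal base frequencies and exact-match annealing only. An L-nt primer then has about 2G/4^L sites in a genome of G bp (both strands), and a single primer yields about G·d/4^(2L) products no longer than d bp. The 1-Gb genome and the 2-kb maximum span used below are hypothetical; real reactions with arbitrary primers tolerate mismatches, so observed band numbers are higher.

```python
def expected_sites(genome_bp: float, primer_len: int) -> float:
    """Expected exact-match annealing sites on both strands of a random genome."""
    return 2 * genome_bp / 4 ** primer_len


def expected_products(genome_bp: float, primer_len: int, max_span_bp: float) -> float:
    """Expected single-primer products no longer than `max_span_bp`:
    (sites on one strand) x (chance of an opposite-strand site within the span)."""
    p = 1 / 4 ** primer_len
    return (genome_bp * p) * (max_span_bp * p)


GENOME_BP = 1e9  # hypothetical plant genome size
MAX_SPAN = 2000  # upper end of the 0.2-2.0 kb product range cited in the text
for length in (8, 10, 12):
    print(length, round(expected_sites(GENOME_BP, length)),
          round(expected_products(GENOME_BP, length, MAX_SPAN), 3))
```

Each nucleotide removed from the primer multiplies the expected number of sites by 4 and the expected number of amplifiable products by 16, which is why primer length has to be balanced against the maximum product length that can be amplified reliably.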
The length of a PCR product obtained with random primers usually ranges from 0.2 to 2.0 kb. Two types of PCR products can be formed in this case; the type depends on the localization of primer annealing sites on the target DNA (Fig. 4).

Fig. 4. Scheme of microorganism genotyping based on PCR with random primers: a: strain-specific differences related to different localization of primer annealing sites on DNA; a DNA fragment present in the genome of strain 1 is missing from the genomes of strains 2 and 3, and therefore the band corresponding to the PCR product is absent from their electrophoregrams; b: the results of amplification of DNA samples containing the same set of primer annealing sites; the DNA of strains 2 and 3 contains deletions of different lengths between the primer binding sites or, alternatively, the DNA of strains 1 and 2 contains insertions missing from the DNA of strain 3, so PCR products of different lengths are formed, as seen after electrophoresis. A, B: primer annealing sites on DNA.
In the first case, the sets of PCR products differ because the template DNA contains different numbers of annealing sites for one or both primers (Fig. 4-a); in the second case, the number of annealing sites in the corresponding genetic loci is the same, and the differences are due to the different lengths of the DNA fragments flanked by the annealing sites (Fig. 4-b). Both phenomena can be observed simultaneously in real samples [4,9].
Genotyping of eukaryotes sometimes results in the simultaneous formation of up to 100 amplicons. Despite the use of random primers, the resulting amplification pattern is species- and strain-specific. The number of primer annealing sites on a given DNA template can vary considerably for primers of the same length with different sequences.
The use of PCR products of the first type requires creating a huge database of reference profiles ("passports") for all the objects under study, which is extremely challenging given the large number of varieties of many of the plant species investigated. Therefore, we attempted to design primers yielding products of the second type. The first step of the design procedure was calculation of the length of the representative DNA segments.

CONCLUSIONS
Molecular genetic markers hold a leading place among the modern molecular genetic tools that help plant breeders solve practical problems, making the selection process more sophisticated and efficient.
Molecular markers are important in biotechnology, since they allow loci that determine quantitative and qualitative traits of organisms to be detected and controlled. These markers have been used to identify a range of important plant traits, such as grain and seed weight, plant height, starch content, and the diameter, length, and weight of seeds or fruit [4,9].
Methods of molecular genetics provide a powerful solution to many fundamental problems of evolution, genetics, selection breeding, preservation of genetic diversity, genotype identification, and precise determination of genotype origin. The development of new technology based on the use of PCR-generated markers for investigation of molecular genetic polymorphism has been extremely rapid.
In Russia, PCR is currently used only to a limited extent to identify the source of plant DNA in foodstuffs, because the regulatory documents, test systems, and, most importantly, the protocols themselves are mostly still under development, whereas in other countries research in this field of the food industry has been under way for over 10 years.