Molecular distinction of C × R hybrid (Coffea congensis × Coffea canephora) from morphologically resembling male parent using rbcL and matK gene sequences

Interspecific C × R hybrid (Coffea congensis × Coffea canephora) in India is cultivated as a mixed population with the male parent C. canephora, as this species is an efficient pollen donor for enhanced yield. However, distinguishing C × R hybrid from C. canephora in old plantations is difficult because the plant size of C × R hybrid varies and it often resembles C. canephora. C × R hybrid cultivated under different agroclimatic conditions shows distinct vegetative growth patterns with varying yields. Thus, development of a DNA marker for identification of C × R hybrid is important for clonal propagation and seed preparation from selected individuals. In this study, two DNA barcoding loci of the chloroplast genome (rbcL and matK) of parents, F1 hybrid and its backcross progeny were partially sequenced to identify SNPs as DNA markers for distinction of C × R hybrid from C. canephora. Seven SNPs in the matK gene sequence and three nucleotides in the rbcL gene sequence were identified as DNA markers for the genetic identity of C. congensis. These SNPs were found in F1 and advanced progenies of C × R hybrid due to maternal inheritance. A large number of samples of C × R hybrids with varying morphological features revealed no polymorphism between C × R hybrid and C. congensis. Thus, the SNPs in C. congensis can be used as DNA markers for precise identification of C × R hybrid for production of clones, besides tagging the chloroplast inheritance in advanced progenies.


Introduction
Hybridization is an important event for developing a new hybrid with desirable agronomic features and plays a major role in evolution and genetic diversity of crop plants (Rieseberg et al., 1995). Intergeneric hybridization rarely produces fertile offspring (Stebbins, 1985). Nevertheless, many interspecific hybrids are generated due to a high degree of sexual compatibility between species of the same genus (Taylor and Kummer, 1982). Although conventional breeding approaches have led to the development of many varieties of agricultural crops, heterogeneity of F1 hybrids needs to be ensured before developing advanced progenies (Tsukazaki et al., 2006). The genus Coffea comprises over 100 species (Bridson and Verdcourt, 1988; Stoffelen, 1998; Davis et al., 2006), but only two species, Coffea arabica (2n = 4x = 44) and Coffea canephora (2n = 2x = 22), are commercially important (Vidal et al., 2010). C. arabica, a self-compatible species, accounts for 70% of the world coffee trade due to its superior quality (Berthaud and Charrier, 1988), while C. canephora, a self-incompatible species, contributes only 30% of the world coffee market due to its inferior coffee quality (Ruas et al., 2003). Thus, development of new varieties of C. canephora with superior coffee quality has been pursued in India through conventional breeding with C. congensis as the female parent. The F1 hybrid of C. congensis and C. canephora was backcrossed to C. canephora and exploited as the C × R hybrid. Presently, C. canephora is being replaced with C × R hybrid due to its improved coffee bean quality.
C. congensis is morphologically distinguishable from C. canephora. However, its hybrid derivatives are highly polymorphic, leading to a variable bush size, and often resemble C. canephora upon maturity (Jamsheed et al., 1996). In addition, C × R hybrid cultivated under various agro-climatic conditions shows morphological variation, and thus it is difficult to precisely distinguish C × R hybrid from C. canephora (Jamsheed et al., 1996; Jamsheed and Srinivasan, 1988). As C. canephora is an efficient pollen donor to C × R hybrid for improving productivity, high yielding clones of C. canephora are inevitably cultivated with it as a mixed population. Chloroplast DNA based markers have been increasingly used to determine chloroplast inheritance in many interspecific hybrids (Mckinnon et al., 2001), since chloroplast DNA shows maternal inheritance in most plant species with a slow rate of evolutionary change (Mets, 1980; Birky, 1995; Setohigashi et al., 2011). The objective of this study was to develop a simple and cost-effective DNA marker for distinguishing C × R hybrid from C. canephora utilizing single nucleotide polymorphisms (SNPs) in two chloroplast loci (matK and rbcL). In this study, SNPs between C. congensis and C. canephora were identified in rbcL and matK as species-specific DNA markers. Multiple sequence alignments of these genes among the parents, hybrid and backcross progeny revealed maternal inheritance of the chloroplast genome into F1 and its backcross progeny. Utilization of these findings in genetic improvement of C × R hybrid is discussed.

Plant materials
Fully expanded young leaves of C. congensis, C. canephora, C. congensis × C. canephora (F1 hybrid) and backcross progeny of C × R to C. canephora were collected from the gene bank of Central Coffee Research Institute, Chikmagalur District, Karnataka, India (Supplementary data Table S1). Leaf samples from backcross progeny of C × R hybrid were also collected from different provinces for the analysis. Whole leaves were stored in a −80°C freezer (Sanyo, Japan) and used for extraction of DNA.

Field observation
C. congensis (♀), C. canephora (♂), 30 individuals of F1 progenies (C × R hybrid) and a sizeable population of backcrossed individuals (C × R backcrossed to C. canephora) were observed for vegetative (bush spread, leaf length, leaf width, stem girth and internodes) and reproductive (number of nodes/fruit bearing branch, number of fruits/node, yield of fruit/plant) parameters to distinguish them (Table 1). A similar observation was extended to C × R hybrid cultivated in different provinces (Mudigere, Koppa, Coorg, Wayanad and Palney) covering a wide range of geographical locations (Table 2). Data were analyzed using analysis of variance, and the level of significance was measured by Tukey's HSD method at p < 0.05.
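As a rough illustration of the analysis of variance step, the sketch below computes a one-way ANOVA F statistic in plain Python. The bush-spread values are made-up placeholders rather than data from Table 1, and the function name `one_way_anova_f` is hypothetical. The Tukey HSD comparison used in the study is not reproduced here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of measurements, one list per genotype.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical bush-spread values (cm) for three genotypes; illustration only.
toy = [[76.0, 78.0, 80.0], [139.0, 141.0, 143.0], [90.0, 91.0, 92.0]]
f_stat, df_b, df_w = one_way_anova_f(toy)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F relative to the critical value for the given degrees of freedom indicates that at least one genotype mean differs; a post hoc test such as Tukey's HSD is then needed to say which pairs differ.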

Isolation of genomic DNA
About 0.1 g of frozen coffee leaf tissue was ground into a fine powder under liquid N2 using a sterile, chilled mortar and pestle. The powder was added to 1 ml of extraction buffer (100 mM Tris, pH 8.0, 1.4 M NaCl, 20 mM EDTA, pH 8.0, 2% CTAB, 0.3% β-mercaptoethanol and 1% PVP; Sigma Aldrich, Mumbai, India) in a centrifuge tube and incubated at 60°C for 60 min in a heating block (Thermo Fisher Scientific, Mumbai, India). Samples were allowed to reach room temperature (RT), and an equal volume of chloroform:isoamyl alcohol (24:1) (HiMedia Laboratories, Mumbai, India) was added and gently mixed to form an emulsion. Samples were centrifuged at 12,000 rpm for 10 min (Kubota, Japan). After centrifugation, the supernatant was gently removed without disturbing the debris in the bottom layer. The supernatant was extracted once more with chloroform:isoamyl alcohol and subjected to another round of centrifugation at 12,000 rpm for 10 min. The supernatant was collected, 2/3 volume of isopropanol (HiMedia Laboratories, Mumbai, India) was added, and the mixture was incubated at −80°C (Cryo Scientific Systems Private Limited, Chennai, India) for 60 min. Samples were centrifuged at 12,000 rpm for 12 min, the supernatant was carefully removed without disturbing the DNA pellet, and the pellet was washed with 20 μl of 70% ethanol before centrifugation at 12,000 rpm for 5 min. Ethanol was removed by pipetting and the final DNA pellet was vacuum dried for 15 min. The DNA pellet was resuspended in 50 μl of 0.1 × TE buffer, pH 8 (10 mM Tris, 1 mM EDTA) and stored at −20°C. DNA was run on a 0.8% agarose gel (Sigma Aldrich, Mumbai, India) to determine its quality.

PCR amplification of rbcL and matK loci of chloroplast genome
PCR amplification of the rbcL and matK genes was performed (MJ Minicycler, USA) with DNA samples of parents (C. congensis and C. canephora), F1 hybrid and backcross progenies using universal primers (ATGTCACCACAAACACAGACTAAAGC-F and GAAACGGTCTCTCCAACGCAT-R for rbcL; CGATCTATTCATTCAATATTTC-F and TCTAGCACACGAAAGTCGAAGT-R for matK). In addition, backcross progenies showing varying bush sizes (C. congensis type, intermediate type and C. canephora type) were also included. Genomic DNA isolated from 25 F1 individuals was subjected to PCR amplification of the rbcL and matK loci, and these sequences were aligned to verify whether the hybrids carried the chloroplast of the female (C. congensis) or male (C. canephora) parent. The 25 μl PCR mixture consisted of 10× Taq buffer, 25 mM MgCl2, 10 mM dNTP mix, 1 U of Taq DNA polymerase, 10 pmol/μl of each primer (forward and reverse) and 25 ng of template DNA, made up to the final volume with nuclease-free water. The conditions for PCR amplification of rbcL were 94°C for 4 min (initial denaturation), followed by 30 cycles of 94°C for 30 s (denaturation), 56°C for 30 s (annealing) and 72°C for 30 s (extension), and a final extension at 72°C for 5 min. Amplified PCR products were run on a 1% agarose gel. PCR amplification of chloroplast matK was performed under the same conditions except that the annealing temperature was 52°C.
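The cycling conditions can be written down as a small data structure, which makes the difference between the two loci explicit. This is only a sketch: the `Step` class and the function names are hypothetical, and it assumes the program is read as an initial denaturation, then 30 cycles of denaturation, annealing and extension, then a final extension.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int

def build_program(annealing_c, cycles=30):
    """Return the list of steps, in run order, for one PCR program."""
    cycle = [
        Step("denaturation", 94, 30),
        Step("annealing", annealing_c, 30),
        Step("extension", 72, 30),
    ]
    return (
        [Step("initial denaturation", 94, 240)]
        + cycle * cycles
        + [Step("final extension", 72, 300)]
    )

def hold_time_minutes(program):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return sum(step.seconds for step in program) / 60

rbcl_program = build_program(annealing_c=56)  # rbcL annealing temperature
matk_program = build_program(annealing_c=52)  # matK annealing temperature
print(hold_time_minutes(rbcl_program))
```

Under this reading, both programs have the same programmed hold time and differ only in the annealing step, so the two loci could not share a block run unless the cycler supports a temperature gradient.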

Purification of PCR products
PCR products of the rbcL and matK genes of experimental samples were cut from the agarose gel; about 600 μl of 6 M NaI was added to each gel slice, which was melted at 70°C for 10 min in a heating block. The samples were gently mixed with 60 μl of silica matrix (Sigma Aldrich, Mumbai, India) and incubated for 5 min at RT. The matrix was spun at 16,000 g for 10 s to obtain the pellet. The supernatant was gently discarded and the pellet was washed with 500 μl of wash buffer (10 mM Tris-HCl, pH 7.5, 100 mM NaCl, 1 mM EDTA, 50% ethanol) by vortexing. This wash step was repeated twice to recover the pellets. The pellets with traces of wash buffer were centrifuged once more and air dried for a few minutes before being resuspended in sterile water. Suspended DNA samples were dissolved by heating at 70°C for 2 min followed by centrifugation at 16,000 g for 2 min. The DNA sample was eluted into a fresh tube and the purity of the PCR product was determined by agarose gel electrophoresis.

DNA sequencing and bioinformatics analysis
Purified PCR products were sequenced as per the manufacturer's protocols (Applied Biosystems 3730xl DNA Analyzer). Sequence similarity was confirmed using a BLAST similarity search, and sequences were edited using NCBI ORF finder followed by BLASTX to remove internal stop codons and other non-coding regions. Multiple sequence alignment was performed using hierarchical clustering (Corpet, 1988) in the MultAlin interface. Sequences obtained from the parental samples (C. congensis and C. canephora) as well as from their derivatives (F1 hybrid and its backcross to C. canephora) were edited manually by removing the introns and submitted to NCBI. Evolutionary trees were constructed using the neighbor-joining method (Saitou and Nei, 1987) with Kimura 2-parameter distance correction (Kimura, 1980) using MEGA 5.05 (Tamura et al., 2011).
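For readers unfamiliar with the Kimura 2-parameter correction, the sketch below computes the pairwise distance d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is a plain Python illustration, not the MEGA implementation; the function name `k2p_distance` and the rule of skipping gapped or ambiguous columns are assumptions of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned, equal-length sequences.

    Columns containing a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Toy sequences (not Coffea data): one transition in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm clusters into a tree.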

Characterization of parents and progenies
Data on vegetative features revealed high variation between C. congensis and C. canephora (Supplementary data Fig. S1) and their F1 and backcross progeny (Supplementary data Fig. S2). C. congensis was a smaller bush of 77.7 cm diameter compared to C. canephora, which had a broader bush (140.7 cm). The F1 hybrid of these two species was intermediate in bush size with a mean diameter of 91.3 cm. However, F1 backcrossed to C. canephora showed improved vegetative vigor with a mean bush size of 130.3 cm, closer to C. canephora (Table 1). Other morphological features such as leaf size, stem girth and internodes were reduced significantly in the F1 hybrid, but these quantitative characters in backcross progenies were comparable with C. canephora. F1 hybrid and backcross progenies showed improved yield due to an increase in the number of nodes/fruit bearing branch and the number of fruits per node. The yield of C × R hybrid was significantly improved over C. congensis. The F1 and backcross progeny of C × R hybrid were smaller in bush size than C. canephora and showed improved yield over   (Table 2).

Single nucleotide polymorphism
PCR amplification yielded products of ≈680 bp for rbcL and ≈950 bp for matK in parents, hybrids and backcross progenies (Fig. 1). The population of backcross progenies with varying bush sizes also yielded PCR products of similar size (Fig. 2). Sequencing of the rbcL gene from F1 hybrid and backcross progenies produced 647 bp in C × R F1 hybrid and 636 bp in backcross progeny of C × R hybrid (Supplementary data Table S2). However, the coding regions of these products were 534 bp in C × R F1 hybrid and 579 bp in backcross progeny of C × R hybrid. Sequencing of the matK gene from F1 hybrid and backcross progenies produced 950 bp and 960 bp, respectively, but the coding regions were 582 bp in C × R F1 hybrid and 618 bp in C × R backcross progeny.
rbcL showed single nucleotide polymorphisms at only three sites of the sequenced coding region, at 100, 101 and 453 bp out of 530 bp in C. congensis (Table 3). A BLAST comparison of the partial rbcL gene sequences of C. congensis, C. canephora and their F1 hybrid with the complete rbcL gene of C. arabica as reference (EF044213) revealed that the 530 bp partial rbcL sequences of all four study samples spanned positions 103-632 of the 1446 bp rbcL ORF of C. arabica (Fig. 5). In contrast, matK revealed single nucleotide polymorphisms at a higher frequency, at seven sites of the sequenced region (113, 121, 123, 178, 286, 337 and 339) in C. canephora (Table 3). A BLAST comparison of the partial matK gene sequences of parents, F1 hybrid and backcross progenies with the complete matK gene of C. arabica (EF044213) showed that the 582 bp partial matK sequences of all four samples spanned positions 482-1062 of the 1518 bp matK ORF of C. arabica.
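The two operations described above, listing mismatched sites between aligned sequences and placing them on the reference ORF, can be sketched as follows. The function names are hypothetical, the toy sequences are not Coffea data, and the coordinate mapping assumes a gapless alignment in which the reported SNP positions are counted from the first base of the partial sequence.

```python
def find_snps(seq_a, seq_b):
    """Return [(position, base_a, base_b)] for mismatches, 1-based positions.

    Assumes the two sequences are already aligned without gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1)
        if a != b
    ]

def to_reference_position(partial_pos, reference_start):
    """Map a 1-based position in a partial sequence onto the reference ORF."""
    return reference_start + partial_pos - 1

# Toy sequences differing at positions 3 and 7.
print(find_snps("ACGTACGTAC", "ACTTACATAC"))

# The partial rbcL sequence starts at position 103 of the C. arabica ORF,
# so its last base (530) should land on position 632.
print(to_reference_position(530, 103))
print([to_reference_position(p, 103) for p in (100, 101, 453)])
```

Under these assumptions, the three rbcL sites would fall at positions 202, 203 and 555 of the C. arabica rbcL ORF; if the alignment contained indels, a gap-aware mapping would be needed instead.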

Genetic distance of C. congensis, C. canephora and its progenies
Dendrogram constructed using rbcL and matK gene sequences of parents, hybrid and backcross progeny of C × R hybrid revealed a close resemblance within a single clade while other species of Coffea and a closely related Psilanthus species were relatively distant from the parents and hybrid derivatives (Figs. 3 and 4). The present finding revealed that the partial gene sequences of rbcL and matK locus of C. congensis can be used as DNA marker to identify C × R hybrid as these sequences resemble C × R hybrids due to maternal inheritance. However, one individual plant of F1 hybrid had shown paternal inheritance, indicating a very rare situation of chloroplast inheritance from male parent (Table 4; Fig. 4).
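One way to turn the diagnostic sites into a routine check is to compare the base at each site with the parental alleles. The sketch below uses the rbcL positions reported in the text, but the alleles are hypothetical placeholders and not the bases listed in Table 3; the function names are also illustrative.

```python
# rbcL SNP positions reported in the text (1-based, within the partial
# sequence). The alleles are HYPOTHETICAL placeholders, not Table 3 values.
CONGENSIS_RBCL = {100: "A", 101: "C", 453: "T"}
CANEPHORA_RBCL = {100: "G", 101: "T", 453: "C"}

def classify_chloroplast(sequence, maternal, paternal):
    """Label a sequence by the parent whose alleles it carries at every site."""
    calls = set()
    for pos, maternal_base in maternal.items():
        base = sequence[pos - 1].upper()  # convert 1-based position to index
        if base == maternal_base:
            calls.add("maternal")
        elif base == paternal[pos]:
            calls.add("paternal")
        else:
            calls.add("unknown")
    if len(calls) == 1 and "unknown" not in calls:
        return calls.pop()
    return "ambiguous"

def toy_sequence(length, alleles, filler="N"):
    """Build a placeholder sequence carrying the given alleles."""
    bases = [filler] * length
    for pos, base in alleles.items():
        bases[pos - 1] = base
    return "".join(bases)

maternal_like = toy_sequence(530, CONGENSIS_RBCL)
paternal_like = toy_sequence(530, CANEPHORA_RBCL)
mixed = toy_sequence(530, {100: "A", 101: "T", 453: "T"})

for label, seq in [("P-x", maternal_like), ("P-y", paternal_like), ("P-z", mixed)]:
    print(label, classify_chloroplast(seq, CONGENSIS_RBCL, CANEPHORA_RBCL))
```

Returning "ambiguous" for mixed or unexpected bases is a deliberate choice here: such a sample is more likely to reflect a sequencing or alignment problem than a recombinant chloroplast, and is better rechecked than forced into either class.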

Discussion
C. congensis is a compact bush with a poor yield, but it is known for its superior coffee flavor. In contrast, C. canephora is highly productive, but its beans are bitter in taste and produce coffee of inferior quality (Koshiro et al., 2007). To combine flavor and productivity in a hybrid, crosses were effected during 1942 and F1 progenies were evaluated for 17 years (Anonymous, 1998). Field observations clearly showed that the F1 progeny was intermediate in plant stature with improved flavor but poor productivity (Anonymous, 1998). Thus, a backcross was performed with C. canephora to develop the C × R variety with an improved yield. Although this hybrid is distinguishable at an early stage, it shows high variability after 20-25 years of cultivation and often resembles C. canephora. Commercial plantations of C × R hybrid were inevitably mixed with C. canephora, as the latter proved to be a potent pollen donor for higher fruit set and productivity. Varying vegetative growth behavior of C × R hybrid in different provinces leads to difficulties in distinguishing C × R hybrid from C. canephora.
Fig. 3. Evolutionary dendrogram constructed using rbcL sequences of C. congensis, C. canephora and its hybrid derivatives by neighbor-joining method with Kimura 2-parameter distance correction. Number at branch points indicates 1000× bootstrap value. Dark circles indicate the Coffea sp. and its hybrid derivatives analyzed in this study.
C × R hybrid often shows three different bush sizes (Jamsheed et al., 1996), namely C. congensis type (compact bush with smaller leaves), intermediate type (bush size intermediate between C. congensis and C. canephora) and C. canephora type (bushes are larger in size with broader leaves). In addition, seedlings from unauthenticated sources, derived from other diploid species of Coffea that closely resemble C × R hybrid and C. canephora, make it even harder for farming communities to precisely identify genotypes for clonal propagation.
Interspecific hybridization and production of fertile hybrids utilizing agronomically important germplasm of coffee have been emphasized (Louarn, 1993; Charrier, 1978). In coffee, genetic introgression of nuclear genes from diploid species of Coffea, surpassing the ploidy barrier, was reported in Hibrido de Timor (a natural hybrid of C. arabica and C. canephora) and S-26 (a hybrid of C. liberica × C. arabica). These hybrids are widely used in the breeding of coffee varieties for resistance to leaf rust disease (Charrier and Eskes, 1997). However, genetic improvement of diploid species of Coffea has received less attention than that of tetraploid species, since diploid species of Coffea are highly tolerant to major pests and diseases (Bettencourt, 1973; Srinivasan and Narasimhaswamy, 1975; Filho et al., 1999; Anonymous, 2000). Chloroplast DNA variation has been applied to phylogenetic reconstruction (Clegg and Zurawski, 1992; Olmstead and Palmer, 1994; Razafimandimbison et al., 2009) and determination of inheritance of cpDNA (Harris and Ingram, 1991). Although maternal inheritance of the chloroplast genome is the most common mode of inheritance in angiosperms (Sears, 1980; Ferris et al., 1997; Raspe, 2001), there is evidence that the plastid genome is inherited paternally at relatively lower frequencies (Shore and Triassi, 1998). In coffee, cpDNA was inherited maternally in C. arabica and C. canephora (Lashermes et al., 1996). A similar finding was reported in Indian coffee varieties (Suresh et al., 2012).
Fig. 4. Evolutionary dendrogram constructed using matK sequences of C. congensis, C. canephora and its hybrid derivatives by neighbor-joining method with Kimura 2-parameter distance correction. Number at branch points indicates 1000× bootstrap value. Dark circles indicate Coffea sp. and its derivatives analyzed in this study.
The above findings support earlier observation made by Charrier (1978) and Louarn (1993), who stated that Coffea species hybridize freely with one another and produce fertile hybrids due to their high degree of sexual compatibility.
Previously, inheritance of cpDNA in coffee hybrids was determined by RFLP of atpB-rbcL and ndhC-trnV intergenic spacers (Lashermes et al., 1996). More recently, inheritance of the rrn23-trnR (ACG) region in cpDNA of a few Indian coffee varieties was determined without any restriction analysis, as this region was duplicated in the inverted repeat and showed enough polymorphism between parents (Suresh et al., 2012). Both studies depend on the length polymorphism of amplified regions. Our findings, in contrast, are based on single nucleotide polymorphisms, which precisely identify the sequence variation in the target genes between the parents and the further flow of cpDNA into F1 and its advanced progenies. Of the various F1 individuals, only one plant showed paternal inheritance, as revealed by the rbcL sequences. Amplification of rbcL and matK genes within the F1 population of varying plant sizes as well as from different provinces did not reveal any additional SNPs, suggesting that these SNPs are highly conserved and suitable for use as DNA markers.
Molecular characterization of chloroplast DNA has wide application in the identification of closely related species (Lee et al., 2009; Sangeetha et al., 2010; Balasubramani and Venkatasubramanian, 2011; Lee et al., 2011; Sui et al., 2011) and determination of interspecific hybrids (Stine et al., 1989; Achere et al., 2004; Khew and Chia, 2011). More recently, SNPs in various DNA barcoding loci have been used as DNA markers for distinguishing closely resembling medicinal plants possessing distinct medicinal properties (Sangeetha et al., 2010; Nair et al., 2012) and for authentication of herbarium specimens (Khew and Chia, 2011). With the complete characterization of the chloroplast genome of C. arabica and its sequence information (Samson et al., 2007), inheritance of the chloroplast genome can be determined in coffee hybrids of any interspecific origin. The present study reaffirms the utility of cpDNA variation in identifying an interspecific hybrid coffee variety in India. As the genus Coffea was reported to contain low cpDNA variation (Lashermes et al., 1996), the most variable regions, such as introns and intergenic spacers, need to be included to enhance the resolution of cpDNA variation. The use of PCR ensures high sensitivity and allows the analysis of a large number of samples for determining the flow of chloroplast DNA in interspecific hybrids as well as in natural populations. In India, twelve arabica and three robusta varieties have been developed, and several other intra- and interspecific hybrids are under evaluation. The present finding forms the basis for undertaking a similar study in other commercial coffee varieties for precise identification of cultivars based on SNPs of the chloroplast genome, besides determining the pattern of chloroplast inheritance.
In India, C × R is a unique artificial hybrid that combines flavor with compact plant size. This variety fetches a premium price over other robusta varieties in the international market for its aroma. A similar hybrid involving C. canephora and C. congensis, with unique liquor quality, was also developed in Madagascar and Ivory Coast for commercial cultivation (Leroy et al., 2006). Recently, deterioration of C × R coffee bean quality has been reported due to indiscriminate sourcing of planting materials in India. We have identified specific SNPs in the rbcL and matK loci of C. congensis (the female parent of C × R hybrid) and confirmed that these SNPs are detectable in F1 hybrid and backcross progenies, since the chloroplast genome is inherited maternally. The SNPs identified in C. congensis can be used to trace the hybrids of this variety in a mixed population with any other diploid species of Coffea. Owing to a high degree of genetic variation in advanced progenies of many diploid species, clonal propagation has been recommended for maintaining genetic vigor and productivity. Thus, the SNPs identified in the present study can be used as a potential marker for segregating C × R hybrids, which often resemble other diploid varieties of Coffea. These SNPs could be utilized for the precise selection of elite C × R individuals in genetic improvement of this hybrid.
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.sajb.2013.08.011.
Table 4. F1 hybrid of C × R showing the maternal (P-01 to P-24) and paternal inheritance (P-25) based on the SNPs of rbcL and matK gene sequences.