Transcriptome Mining for EST-InDels and Development of EST-SSR Markers in Turmeric (Curcuma longa L.)

Curcuma longa L., commonly known as turmeric, is used as a culinary spice in India and many other Asian countries. Turmeric has anti-fungal, anti-bacterial, anti-malarial and anti-cancer properties, to mention a few. We analyzed the transcriptome of C. longa retrieved from the NCBI SRA database (SRR495223) to develop and validate EST-SSR markers and to identify EST-InDels for use in C. longa and related genera. A total of 337 primers were developed, and 20 primers with a NetPrimer rating of 100% were selected for PCR validation in C. longa, Amomum subulatum Roxb. (large cardamom) and Elettaria cardamomum (L.) Maton (small cardamom). About 85% of the primers generated PCR products in C. longa and around 50% in both types of cardamom. The developed primers thus worked with turmeric, large cardamom and small cardamom plants at varying levels. In addition, the transcriptome analysis detected a high number of deletions and 18 additions of bases, which could be screened by developing CAPS markers with the tool SNP2CAPS; 93 restriction enzymes were found to be usable for screening these InDels.

Curcuma longa L., commonly known as turmeric, is a small rhizomatous plant of the ginger family (Zingiberaceae). For centuries, in India and many East Asian countries, turmeric has been used as an additive in daily cooking for colouring and flavoring; it is also used as a household remedy for many curable conditions. Another important use of turmeric is in making kumkum: at alkaline pH turmeric turns red, and it is widely used as vermilion (kumkum), an important component of Hindu religious ceremonies. The chemical constituents of turmeric include a group of compounds known as curcuminoids, comprising curcumin (diferuloylmethane), demethoxycurcumin and bisdemethoxycurcumin. Curcumin represents 3.14% (on average) of powdered turmeric. Turmeric also contains important volatile oils such as turmerone, zingiberene and atlantone, together with common constituents such as sugars, resins and proteins (Nagpal and Sood 2013).
Turmeric rhizomes are widely known for their characteristic bright yellow-orange color and offer a less expensive yellow coloring than saffron; turmeric is therefore also known as poor man's saffron, although it gives food a different taste than saffron does. Turmeric provides a bright yellow dye that can be used as a coloring for food, paints, textiles (fabric dye) and even cosmetics. In some parts of north India (Madhya Pradesh), turmeric is also used to make yellow paints for the folk painting tradition. Turmeric rhizomes have also been used in traditional remedies, mixed with other plants, to treat a range of conditions including headache, tonsillitis, wounds, snake bites, sprains and fractures (Ravindran 2007).
Curcumin is appreciated worldwide as an antimalarial, anti-inflammatory and anti-tumor compound, and it is also known to modulate lipid metabolism, which has been implicated in obesity (Hamaguchi et al. 2010). Extracts of turmeric oil/oleoresin, which contain curcuminoids and essential oils, are used for flavoring and coloring (Haddad et al. 2011).
The genus Curcuma L. (Zingiberaceae) contains many taxa of economic, medicinal, ornamental and cultural importance. Some species are thought to hybridize in the wild, and the resulting crosses could become naturalized (Skornickova and Sabu 2005; Skornickova et al. 2007a, 2007b). These studies also revealed intraspecific variation in C. longa. Such variation can be analyzed using molecular markers, which can be developed through bioinformatics approaches. Bioinformatics tools help in sequence analysis, gene finding, sequence comparison, transcriptome analysis and evolutionary genomics (Hogeweg 2011; Wong 2016). Although the advancement of next-generation sequencing (NGS) has made it easy to develop genome-wide single nucleotide polymorphisms (SNPs), simple sequence repeats (SSRs), or microsatellites, remain highly useful owing to their easy scoring, high polymorphism and the relatively high information content they provide in genetic analysis (Mayer et al. 2017). Many NGS projects have made transcriptomes available, which makes the development of SSRs relatively easy (Zhang et al. 2017). In addition, the availability of transcriptomes aids the development of genic SSRs (or EST-SSRs), which are more transferable among related species (Mathi Thumilan et al. 2016). Against this background, a study was conducted to search for EST-InDels and develop EST-SSR markers from the transcriptome sequence of C. longa.

MATERIALS AND METHODS
The transcriptome of C. longa was retrieved from the NCBI SRA database (SRR495223). The quality of the downloaded sequences was checked using the FASTQC toolkit, followed by de novo assembly using the tools available on the DDBJ public server. Microsatellites were then searched for using the MISA tool. A minimum repeat-number cut-off of 6 was used to report a dinucleotide repeat, 4 for a trinucleotide repeat and 3 for SSRs with motif sizes of 4, 5 and 6. A maximum distance of 100 nucleotides was allowed between two SSRs. To specify the search criteria, an additional file containing the microsatellite search parameters was saved as 'misa.ini'. WebSat was used for microsatellite primer prediction. The primer sequences were then submitted to NetPrimer, which analyzed secondary structures including hairpins, self-dimers and cross-dimers in primer pairs. A rating of 100% indicates the absence of self- and cross-dimerization of primers. Twenty primers that satisfied the above criteria were selected for wet-lab validation and were custom synthesized at Sigma-Aldrich, Bengaluru.
[...] unit Taq polymerase enzyme. The PCR conditions were: 94°C for 2 min, followed by 35 cycles of 94°C for 30 s, the primer-specific annealing temperature for 1 min and 72°C for 2 min, with a final extension at 72°C for 7 min. The amplified products were separated on 3% agarose gel in a horizontal electrophoresis unit at 70 V and visualized with ethidium bromide under UV light using a gel documentation system. The bands obtained were scored using the expected product size as reference (Table 1).
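The SSR search thresholds stated above (at least 6 repeats for dinucleotide motifs, 4 for trinucleotide motifs and 3 for motifs of 4 to 6 bases) can be illustrated with a minimal Python sketch. This is not the MISA implementation: the function names `find_ssrs` and `is_redundant` are hypothetical, only perfect repeats are reported, mononucleotide repeats are skipped because their threshold is not given in the text, and the 100-nucleotide rule for joining neighbouring SSRs is not modelled.

```python
# Minimum repeat counts per motif length, as stated in the text.
# Mononucleotide repeats are omitted: their cut-off is not given in the text.
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        i = 0
        # Stop once the remaining sequence is too short to hold a qualifying SSR.
        while i + unit * min_rep <= len(seq):
            motif = seq[i:i + unit]
            if set(motif) <= set("ACGT") and not is_redundant(motif):
                reps = 1
                # Count consecutive copies of the motif.
                while seq[i + reps * unit:i + (reps + 1) * unit] == motif:
                    reps += 1
                if reps >= min_rep:
                    hits.append((i, motif, reps))
                    i += reps * unit  # jump past the reported SSR
                    continue
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (hypothetical, not from the C. longa assembly).
    example = "GGC" + "AT" * 6 + "CCG" + "GAA" * 4 + "TT" + "ACGT" * 2
    for start, motif, reps in find_ssrs(example):
        print(start, motif, reps)
```

In the toy sequence, the (AT)6 and (GAA)4 tracts meet the thresholds, while (ACGT)2 falls below the minimum of 3 repeats and is not reported.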
For the identification of EST-InDels, EST sequences were retrieved from NCBI-EST under accession numbers HO002100.1, HO002099.1, HO002097.1 and HO002094.1, downloaded in FASTA format and subjected to multiple sequence alignment using Clustal W. Subsequently, the SNP2CAPS tool was used to identify InDels (insertions/deletions), where a single base has been deleted from or inserted into one genome relative to another. This is a symmetrical relationship, as a deletion in one sequence corresponds to an insertion in the other. SNP2CAPS facilitates the computational conversion of SNPs or InDels into CAPS markers.
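The idea behind converting an InDel into a CAPS marker can be sketched in a few lines of Python: locate gap columns in an alignment, then ask which restriction enzymes cut one allele but not the other. This is only an illustration of the principle, not the SNP2CAPS algorithm. The function names are hypothetical, the two aligned sequences are toy data, and the three enzymes (EcoRI, BamHI, HindIII, all with palindromic recognition sites) are an arbitrary illustrative subset rather than the 93 enzymes reported in this study.

```python
# Recognition sites of three common enzymes (illustrative subset only).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_indels(aligned_a, aligned_b):
    """Return (column, kind) for every column where exactly one sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    for col, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        if a == "-" and b != "-":
            indels.append((col, "gap_in_a"))
        elif b == "-" and a != "-":
            indels.append((col, "gap_in_b"))
    return indels


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of a recognition site."""
    count, pos = 0, seq.find(site)
    while pos != -1:
        count += 1
        pos = seq.find(site, pos + 1)
    return count


def caps_candidates(aligned_a, aligned_b, enzymes=ENZYMES):
    """Enzymes whose site count differs between the two ungapped alleles."""
    a = aligned_a.replace("-", "").upper()
    b = aligned_b.replace("-", "").upper()
    return sorted(
        name for name, site in enzymes.items()
        if count_sites(a, site) != count_sites(b, site)
    )


if __name__ == "__main__":
    # Toy alignment: a single-base deletion in allele B removes an EcoRI site.
    allele_a = "ACGGAATTCGT"
    allele_b = "ACGGAA-TCGT"
    print(find_indels(allele_a, allele_b))
    print(caps_candidates(allele_a, allele_b))
```

An enzyme returned by `caps_candidates` would give different digestion patterns for the two alleles, which is what makes the InDel scorable as a CAPS marker on a gel.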

RESULTS AND DISCUSSION
The raw transcriptome sequence (SRA id: SRR495223; size: 3.2 Gb) of Curcuma longa, checked using FASTQC, was of satisfactory quality for further analysis. After assembly using the Trinity tool in DDBJ, the sequence size was reduced to 99 Mb. The MISA tool identified 6206 SSRs, of which mononucleotide repeats represented 40%, dinucleotides 18%, trinucleotides 38%, tetranucleotides 1.7%, pentanucleotides 0.43% and hexanucleotides 0.5%. A total of 337 primers were designed using WebSat and then checked for primer efficiency using NetPrimer. This yielded 30 primers with a 100% rating, of which 20 were selected for wet-lab validation with C. longa, A. subulatum and E. cardamomum. About 85% of the primers generated PCR products with C. longa and 50% worked in both cardamom species (Fig. 1).
These primers can be used to study population genetics including phylogenetic analyses and estimation of genetic diversity of C. longa and other related genera.
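For orientation, the reported percentages can be converted into approximate absolute numbers. The short Python sketch below does this arithmetic; the derived counts are rounded estimates computed from the figures given above, not values reported by the authors, and the variable and function names are hypothetical.

```python
# Figures taken from the text; counts derived below are rounded estimates only.
TOTAL_SSRS = 6206
CLASS_PERCENT = {
    "mono": 40.0, "di": 18.0, "tri": 38.0,
    "tetra": 1.7, "penta": 0.43, "hexa": 0.5,
}
PRIMERS_TESTED = 20


def approx_counts(total, percents):
    """Convert class percentages into rounded approximate counts."""
    return {name: round(total * pct / 100) for name, pct in percents.items()}


if __name__ == "__main__":
    for name, count in approx_counts(TOTAL_SSRS, CLASS_PERCENT).items():
        print(f"{name}: ~{count}")
    # The stated class percentages do not sum to exactly 100%.
    print("sum of percentages:", round(sum(CLASS_PERCENT.values()), 2))
    # "About 85%" and "50%" of 20 primers correspond to roughly 17 and 10 primers.
    print("C. longa:", round(PRIMERS_TESTED * 0.85), "of", PRIMERS_TESTED)
    print("cardamom:", round(PRIMERS_TESTED * 0.50), "of", PRIMERS_TESTED)
```

Because the class percentages are rounded and sum to slightly less than 100%, the estimated counts will not add up exactly to 6206.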
In order to identify EST-InDels, the sequences were retrieved from NCBI-EST under accession numbers HO002100.1, HO002099.1, HO002097.1 and HO002094.1. The sequences were aligned together and saved in FASTA format for further analysis. The InDels identified could be screened through the development of CAPS markers using the tool SNP2CAPS, and it was found that 93 restriction enzymes can be used for screening them.
Herbal remedies are increasingly recognized in primary health care all over the world. They are becoming extremely important for preventing viral infection during outbreaks of contagious diseases such as dengue fever. Curcuma longa, commonly known as turmeric, has been used as a culinary spice and medicine in India and many other Asian countries for several centuries. Curcumin, the major active principle of the species, has significant anti-cancer, anti-malarial and anti-oxidant effects. Intraspecific variations have been reported in C. longa with respect to chromosome number and genome size. Such variation, analyzed at the DNA level using genetic markers, would be useful for genetic mapping, molecular breeding, phylogenetic analysis, etc. In the present study, an attempt was made to develop and validate EST-SSR markers and to identify EST-InDels in C. longa and its related genera.
The SSRs identified in the study comprised repeated blocks of mono-, di-, tri-, tetra-, penta- and hexanucleotides at varying percentages. DNA polymerase slippage is the main mutational mechanism leading to changes in microsatellite length (Schlötterer and Tautz 1992). Differences in the abundance of the SSR repeat classes might be due to variation in the search criteria, the size of the EST database and the software tools used for EST-SSR development (Gupta et al. 2009; Jain et al. 2014; Kumpatla and Mukhopadhyay 2005; Varshney et al. 2005).

CONCLUSION
The developed SSR primers worked with turmeric, large cardamom and small cardamom plants at varying levels. The InDel markers identified in the study can be experimentally validated with a set of common restriction enzymes. Both EST-SSRs and EST-InDels are useful for genetic mapping, marker-assisted selection in breeding programmes and population genetics, including phylogenetic analyses and estimation of genetic diversity in C. longa and related genera.

ACKNOWLEDGMENTS
The authors thank the Director, JNTBGRI for the facilities extended to carry out this research programme. SMD and AJ express their gratitude to the members of the Plant Molecular Biology Laboratory of the Biotechnology and Bioinformatics Division, JNTBGRI for all academic and technical help. The authors are deeply indebted to Jinu Thomas (JNTBGRI) for her assistance in developing the bioinformatics pipeline required for the marker development.