In-Silico Identification of EST-Based Microsatellite Markers and SNPs, and Comparative Genomic Analysis of ESTs in Barnyard Millet for Their Omics Applications

Barnyard millet belongs to the family Poaceae. It has good nutritional properties and is also effective for diabetic patients because of its ability to reduce blood glucose levels. Genomics research in barnyard millet lags behind that in other millets and cereals, and more focus is needed on the identification of microsatellite markers. The availability of EST sequences makes it possible to develop and explore EST-based SSRs and SNPs. Hence, the present study was conducted at ICAR-Vivekananda Parvateeya Krishi Anusanthan Sansthan, Almora, Uttarakhand during 2014-2015. In this study, the 41 available barnyard millet EST sequences were downloaded in FASTA format to determine the type, distribution and frequency of microsatellites, and a total of 22 primer pairs were developed from the ESTs. The most frequent SSRs were tetranucleotide repeats (50 percent), followed by penta- and hexanucleotide repeats. Among the dimeric SSRs, GT was the most common repeat motif, and AGG was the most common trimeric repeat motif. The most common tetra-, penta- and hexanucleotide repeat motifs were AGA, CAAA, TGTTT and AGACGA, respectively. SNP mining of barnyard millet ESTs identified 1 potential SNP, 1 reliable SNP and two haplotypes. Comparative analysis of barnyard millet EST sequences against the rice genome database showed homology to regions of rice chromosomes 2, 5, 6, 8, 9 and 12, whereas comparison with the maize genome showed homology to the Zea mays Waxy gene. Thus, the twenty-two microsatellite markers and the SNPs identified here can be used effectively in barnyard millet genomics applications such as diversity studies and mapping.
Article history: Received 29 July 2017; Accepted 17 October 2017


Introduction
Barnyard millet (Echinochloa sp.) belongs to the subfamily Panicoideae of the family Poaceae and comprises Japanese barnyard millet (E. esculenta (A. Braun) H. Scholz) and Indian barnyard millet (E. frumentacea Link). It is cultivated for food and feed in Japan, Korea, the north-eastern parts of China and India, Pakistan and Nepal 1. It has good nutritional properties, being a good source of highly digestible protein and insoluble fibre fractions 2. It is also effective for diabetic patients because of its ability to reduce blood glucose levels 3. Although considerable progress has been made at the molecular level in other major cereals and millets, very few nucleotide sequences of barnyard millet (41) were available in the NCBI database, even in the present next-generation sequencing era. No genomic microsatellites are available for barnyard millet.
The discovery of conserved synteny between minor cereals and major cereals such as rice, maize and wheat is very important for identifying useful alleles of important agro-morphological traits. The availability of molecular markers such as microsatellite or simple sequence repeat (SSR) markers and single nucleotide polymorphism (SNP) markers is essential for barnyard millet crop improvement programmes. Although SSRs are highly polymorphic and preferred, their development has limitations, as it often requires tedious and costly cloning and enrichment procedures 4,5,6. Microsatellite (SSR) markers have been useful to molecular breeders and geneticists for linking phenotypic and genotypic variation, and they are also popular because of their abundance and amenability to high-throughput screening. Genomic microsatellites offer a few advantages over EST-based SSRs, such as a higher percentage of polymorphism 7. In this context, microsatellites developed from expressed sequence tags (ESTs), popularly known as EST-SSRs or genic microsatellites, have become an attractive resource, and ESTs are increasingly used to identify SSRs and SNPs for omics applications in different crops. In-silico mining of EST-SSRs has been reported in several crops, such as finger millet 8, citrus 9, coffee 10 and other cereals 11, whereas in-silico mining of SNPs has been reported in crops such as sorghum 12,13.

Materials and Methods
EST Data Mining
A total of 41 EST sequences of barnyard millet publicly available in the NCBI (GenBank) database were downloaded in FASTA format for SSR and SNP analysis. The present study was conducted at ICAR-Vivekananda Parvateeya Krishi Anusanthan Sansthan, Almora, Uttarakhand in 2014-15. SSRs in the EST sequences were identified using the WebSat software tool 14, which uses Primer3 as its underlying primer design program. The criteria for searching simple sequence repeats were as follows: a minimum of 6, 5, 4, 3, 3 and 3 repeat units for mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats, respectively. The main parameters for primer design were a GC content of 50-60 % and an annealing temperature (Tm) of 50-60 °C.
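The minimum repeat thresholds above can be illustrated with a short search for perfect repeats. The sketch below is a hypothetical regex-based scan written for illustration, not WebSat's own code; it assumes perfect (uninterrupted) repeats only, ignores compound and imperfect SSRs, and treats the stated repeat counts as inclusive minima.

```python
import re

# Minimum repeat units per motif length, as stated in the text:
# mono 6, di 5, tri 4, tetra 3, penta 3, hexa 3 (assumed inclusive).
MIN_REPEATS = {1: 6, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    covered = [False] * len(seq)
    hits = []
    # Shorter motifs are scanned first, so a poly-A run is reported as (A)n
    # rather than as (AA)n; later, longer motifs may not overlap earlier hits.
    for k in sorted(min_repeats):
        n = min_repeats[k]
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            start, end = m.start(), m.end()
            if is_compound(motif) or any(covered[start:end]):
                continue
            hits.append((start, motif, (end - start) // k))
            for i in range(start, end):
                covered[i] = True
    return sorted(hits)
```

For example, `find_ssrs("CCGTGTGTGTGTCC")` reports a single (GT)5 repeat starting at position 2, while a sequence with only three AAG units is not reported because the trinucleotide threshold is four.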

Strategy for Comparative Analysis of EST Sequences of Barnyard Millet
The EST sequences of barnyard millet were retrieved from the NCBI website. To identify homologous and orthologous genomic regions between barnyard millet and other crop genomes, these EST sequences were used for BLASTn analysis with an E-value threshold of 4e-13.
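The E-value filtering step can be sketched as follows, assuming BLASTn was run with tabular output, for example `blastn -query ests.fasta -db rice_genome -outfmt 6 -evalue 4e-13 -out hits.tsv` (the file and database names are hypothetical). The sketch assumes the default 12-column `-outfmt 6` layout and treats 4e-13 as the maximum accepted E-value; it keeps the highest-scoring hit per EST and tallies those best hits per subject sequence, such as a chromosome.

```python
from collections import Counter

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_COLS = {"pident", "evalue", "bitscore"}
TEXT_COLS = {"qseqid", "sseqid"}


def parse_outfmt6(lines):
    """Yield one dict per hit from BLAST+ -outfmt 6 lines (comments skipped)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rec = {}
        for name, value in zip(COLUMNS, line.split("\t")):
            if name in FLOAT_COLS:
                rec[name] = float(value)
            elif name in TEXT_COLS:
                rec[name] = value
            else:
                rec[name] = int(value)
        yield rec


def best_hits(lines, max_evalue=4e-13):
    """For each EST (query), keep the highest bit-score hit with E-value <= max_evalue."""
    best = {}
    for rec in parse_outfmt6(lines):
        if rec["evalue"] > max_evalue:
            continue
        query = rec["qseqid"]
        if query not in best or rec["bitscore"] > best[query]["bitscore"]:
            best[query] = rec
    return best


def hits_per_subject(best):
    """Count how many ESTs have their best hit on each subject sequence."""
    return Counter(rec["sseqid"] for rec in best.values())
```

Keeping only the best hit per EST is one design choice among several; retaining every hit below the threshold would instead reveal ESTs that match several chromosomal regions.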

Mining of SNPs
SNPs were identified using the online software tool HaploSNPer 15. The strategy for in-silico identification of SNPs was as follows. The barnyard millet EST sequences were retrieved from the NCBI database, and vector sequences were trimmed to remove contamination from the EST sequences. Redundancy among the retrieved ESTs was assessed by clustering with CodonCode Aligner (http://www.codoncode.com/aligner/). All barnyard millet ESTs were included in one batch, and one of the sequences was included in another batch as input. Likewise, SNPs between barnyard millet and other crop genomes such as rice, maize and barley were identified. To minimize the inclusion of ESTs from paralogous genes, stringent clustering criteria were applied: an overlap of at least 50 bases and at least 95 % identity between the end of one sequence and the other. Variations were then identified in the assembled contigs. A quality screen was performed to distinguish genuine SNPs from false ones, using the method adapted from Picoult-Newberg et al., 16 and Hawken et al., 17.
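The end-overlap clustering criterion (at least 50 bases of overlap and 95 % identity) can be checked with a simple ungapped comparison. This is a hypothetical sketch of the criterion only: CodonCode Aligner performs gapped assembly with its own algorithm, and reverse-complement overlaps are not considered here.

```python
def end_overlap(a, b, min_overlap=50, min_identity=0.95):
    """Length of the longest ungapped overlap between the end of `a` and the
    start of `b` whose identity is >= min_identity; 0 if none is >= min_overlap."""
    a, b = a.upper(), b.upper()
    # Try the longest possible overlap first and shrink towards min_overlap.
    for olen in range(min(len(a), len(b)), min_overlap - 1, -1):
        suffix, prefix = a[-olen:], b[:olen]
        matches = sum(x == y for x, y in zip(suffix, prefix))
        if matches >= min_identity * olen:
            return olen
    return 0


def passes_clustering(a, b, min_overlap=50, min_identity=0.95):
    """True if the two ESTs overlap end-to-end in either order (a then b, or b then a)."""
    return (end_overlap(a, b, min_overlap, min_identity) > 0
            or end_overlap(b, a, min_overlap, min_identity) > 0)
```

Two ESTs sharing a 60-base end region with two mismatches (about 96.7 % identity) pass, whereas a 40-base shared region fails the 50-base minimum regardless of identity.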

Mining, Frequency and Distribution of EST-SSRs in Barnyard Millet
Until August 2014, a total of 41 EST sequences of barnyard millet were available on the NCBI website; these were downloaded in FASTA format. In-silico mining of EST-based SSRs was performed using the online software tool WebSat 14, and forward and reverse primers were designed using the Primer3 software embedded in WebSat. SSR mining used the following thresholds for repeat units: > 6, > 4, > 3, > 3 and > 3 for di-, tri-, tetra-, penta- and hexanucleotide repeats, respectively. With these parameters, seven ESTs were found to contain microsatellite repeats. The remaining 34 sequences were very short and did not contain microsatellite repeats. A total of 22 primer pairs could be designed from the seven EST sequences. Recently, Gimode et al., 18 identified 10,327 SSR primers in finger millet by next-generation sequencing. Among the EST sequences in our study, five yielded two primer pairs each, while the remaining two sequences (AB668984.1 and AB668983.1) yielded six primer pairs each (5 × 2 + 2 × 6 = 22). The SSR repeat motifs, primer pair sequences, primer lengths and expected product sizes are given in table 1. The number of primers identified is similar to earlier reports in finger millet 8. Babu et al., 8 identified a total of 545 primer pairs, and 32 of their EST sequences had more than two microsatellites.
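The primer design window stated in Materials and Methods (GC content 50-60 %, Tm 50-60 °C) can be applied to candidate primers as a post-filter. In this sketch Tm is approximated with the basic formula Tm = 64.9 + 41 × (nG + nC − 16.4) / N; this is a swapped-in simplification, since Primer3 uses a nearest-neighbour thermodynamic model, so the values will differ from those reported by WebSat/Primer3. The function names and example primers are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G + C bases in a primer sequence."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def basic_tm(primer):
    """Approximate Tm (deg C) using Tm = 64.9 + 41 * (nG + nC - 16.4) / N.
    This is NOT the nearest-neighbour model that Primer3 uses."""
    p = primer.upper()
    n_gc = p.count("G") + p.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(p)


def meets_design_criteria(primer, gc_range=(0.50, 0.60), tm_range=(50.0, 60.0)):
    """Check one primer against the GC and Tm windows (both bounds inclusive)."""
    gc = gc_fraction(primer)
    tm = basic_tm(primer)
    return gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]


def filter_pairs(pairs, **criteria):
    """Keep (forward, reverse) pairs in which both primers meet the criteria."""
    return [(f, r) for f, r in pairs
            if meets_design_criteria(f, **criteria)
            and meets_design_criteria(r, **criteria)]
```

A 20-mer with 10 G/C bases has 50 % GC and an approximate Tm of about 51.8 °C, so it falls inside both windows; an all-A or all-GC primer is rejected on GC content.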
However, di- or trinucleotide repeats have been the most frequent repeat motifs in most crop species in earlier findings 16. Gimode et al., 18 also found that in finger millet dinucleotide repeats were the most frequent, followed by tri- and tetranucleotide repeat motifs. Babu et al., 8 found GA to be the most common dinucleotide repeat motif and CGG the most common trinucleotide motif in finger millet. Reddy et al., 16 also found GA to be the most common motif, followed by the AG repeat motif, in finger millet. In the present study, by contrast, tetranucleotide repeats were the most frequent, followed by penta- and hexanucleotide repeats. This variation may be due to the very small number of barnyard millet EST sequences available in the public NCBI database. Guanmei et al., 18 mined EST-based SSRs from Torreya grandis and found that dinucleotide repeat motifs were the most frequent, and that the AAG/CCT trinucleotide motif was the most prominent repeat motif.
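The class frequencies compared here (for example, tetranucleotide repeats making up 50 percent of SSRs) are simple proportions over the detected motifs. A minimal sketch, assuming each SSR is represented by its motif string; the motif list in the usage is illustrative and is not the data of this study.

```python
from collections import Counter

CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def class_percentages(motifs):
    """Percentage of SSRs falling in each repeat class (by motif length)."""
    counts = Counter(CLASS_NAMES[len(m)] for m in motifs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


def most_common_motifs(motifs):
    """Most frequent motif within each repeat class (ties: first seen wins)."""
    per_class = {}
    for m in motifs:
        per_class.setdefault(CLASS_NAMES[len(m)], Counter())[m.upper()] += 1
    return {name: c.most_common(1)[0][0] for name, c in per_class.items()}


# Illustrative motifs only: four of eight are tetranucleotides.
example = ["AG", "CCG", "AAAT", "AAAT", "GATC", "AAAT", "AAAAT", "AAGGAG"]
print(class_percentages(example))
print(most_common_motifs(example))
```

With very few SSRs, as here, a single extra motif shifts a class by several percentage points, which is consistent with the caution above about the small number of available ESTs.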
Among the dimeric SSRs, GT was the most common repeat motif. In finger millet, by contrast, GA/AG was the most common dinucleotide repeat 8,19, and in Poaceae crops AG/CT and GA/TC were the most frequent 20. However, the GT repeat motif was found to be the most frequent in algal species 21, similar to the present results. Among the trinucleotide SSR motifs, AGG was the most common repeat motif, whereas in finger millet and rice CGG was the common trinucleotide repeat motif. In rice, CCG/CGG was the predominant trinucleotide motif 22. The most common tetra-, penta- and hexanucleotide repeat motifs were AGA, CAAA, TGTTT and AGACGA, respectively. The frequency of SSR motifs in barnyard millet ESTs is shown graphically in figure 1. A summary of the in-silico analysis of EST-based SSRs in finger millet is given in table 3.
After the barnyard millet EST sequences were retrieved in FASTA format, the HaploSNPer software tool was used to identify SNPs. The SNPs found in the EST sequences comprised insertions/deletions (indels), transitions and transversions. The present study is highly relevant in the current scenario, where effective molecular markers are needed for various omics applications in crops such as barnyard millet. Computational strategies for SNP discovery make use of the large number of sequences available in public databases, in most cases expressed sequence tags (ESTs), and are considered faster and more cost-effective than experimental procedures 15.
In the present study, SNP mining was performed by comparing barnyard millet EST sequences with those of rice, maize and sorghum.

Identification of SNPs Among Barnyard Millet Sequences
All the EST sequences were grouped into 9 clusters. Of these, cluster 9 was found to contain 1 potential SNP and 1 reliable SNP, and this cluster had 2 haplotypes. Analysis of the potential and reliable SNPs revealed no indels (insertions or deletions). The potential SNP was not a transition but a transversion (A/T); likewise, the reliable SNP was an A/T transversion, and no transition was found. In cluster 9, the SNP was located at nucleotide position 320. The major allele was T, while the minor allele was A. Gimode et al., 18 found 23,285 non-homeologous SNPs in finger millet by next-generation sequencing. The most abundant homeologous SNP was CT/AG (~62 %), while CG was the rarest at about 7.5 %.
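Once ESTs are clustered and aligned, candidate SNPs are the columns where the aligned sequences disagree, and each can be classed as a transition (A/G or C/T), a transversion (a purine/pyrimidine exchange) or an indel. The sketch below assumes a gapped alignment given as equal-length strings ('-' for a gap, 'N' for unknown or unaligned bases) and reports the major and minor allele at each variable column. It is not HaploSNPer's algorithm and applies none of the quality screening that separates potential from reliable SNPs.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(allele1, allele2):
    """Label a two-allele difference as indel, transition or transversion."""
    if "-" in (allele1, allele2):
        return "indel"
    pair = {allele1, allele2}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def call_variants(aligned):
    """Report every alignment column (1-based) that carries at least two alleles."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for i in range(length):
        # 'N' and any other symbol are ignored; only bases and gaps count.
        column = [s[i].upper() for s in aligned if s[i].upper() in "ACGT-"]
        counts = Counter(column)
        if len(counts) < 2:
            continue
        (major, n_major), (minor, n_minor) = counts.most_common(2)
        variants.append({"position": i + 1, "major": major, "minor": minor,
                         "major_count": n_major, "minor_count": n_minor,
                         "type": classify(major, minor)})
    return variants
```

For a three-sequence alignment in which one sequence carries A where the other two carry T, `call_variants` reports a single T/A transversion at that column with T as the major allele.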

Identification of SNPs Between Barnyard Millet and Rice Genome
When barnyard millet EST sequences were used as input against the rice genome database, no SNPs were found with the default parameters described in Materials and Methods. The parameter settings were therefore changed for SNP identification: the E-value was set to 1e-35 and the alignment similarity to 90 %. With these parameters, 16 potential SNPs and 5 reliable SNPs were identified. Seven indels were present among the potential SNPs, and none among the reliable SNPs. The potential SNPs contained one transition (A/G) and six transversions (A/T).
Among the reliable SNPs, one transition (A/G) and 3 transversions (A/C) were found. Nisha et al., 23 developed a single-copy-gene-based 50K SNP chip in rice. Such chips are important not only in rice but also in related crops for genomics applications.

Identification of SNPs Between Barnyard Millet and Maize Genome
When barnyard millet EST sequences were used as input against the maize genome database, no SNPs were found with the default parameters described in Materials and Methods. When the alignment similarity was changed to 90 %, a total of six potential SNPs and one reliable SNP were found between barnyard millet and maize genome sequences. There were 3 indels among the potential SNPs and none among the reliable SNP. The potential SNPs contained 3 transitions (C/T and A/G) and no transversions. The reliable SNP was a single transition (A/G). In cluster 1, the identified SNP was at nucleotide position 548. The major allele A was found in 7 sequences, while the minor allele was found in 2 sequences. One haplotype contained the major allele, and no haplotype contained the minor allele. In cluster 3, the identified SNP was at nucleotide position 175. Similarly Girma et al., 24

Comparative Analysis of Barnyard Millet Est Sequences
Comparative genomics has facilitated the understanding of orthologues and of many fast-evolving 'orphan' genes of unknown function and evolutionary history. In barnyard millet, comparative analysis provides an opportunity to study rapid genome changes and the function of newly isolated gene sequences, and to use them in molecular-breeding-assisted crop improvement programmes for developing high-yielding, stress-resistant genotypes.
In the present study, we compared the barnyard millet EST sequences with the rice genome database. Most of the barnyard millet sequences showed homology to regions of rice chromosomes 2, 5, 6, 8, 9 and 12. A similar study was conducted in finger millet, where high synteny was found between rice and finger millet 25,26.

Conclusion
Genomics research in barnyard millet lags behind that in other millets and cereals, and more focus is needed on the identification of microsatellite markers. In the present work, a total of 22 primer pairs were developed from the ESTs. The most frequent SSRs were tetranucleotide repeats (50 percent), followed by penta- and hexanucleotide repeats. SNP mining of barnyard millet ESTs identified 1 potential SNP, 1 reliable SNP and two haplotypes. Thus, the twenty-two microsatellite markers and the SNPs identified here can be used effectively in barnyard millet genomics applications such as diversity studies and mapping.

Fig. 1: The graphical representation of the frequency of SSR motifs of ESTs of barnyard millet
Babu et al., 8 performed in-silico analysis of finger millet EST sequences available at NCBI and developed a total of 545 primer pairs from the ESTs of finger millet. Girma et al., 13 reported a total of 12,421 putative SNPs in sorghum from 2,921 contiguous transcripts, an average of one putative SNP every 275.26 bp.

Table 3: Summary of the in-silico analysis of EST-based SSRs in finger millet
When the barnyard millet ESTs were compared with the maize genome, the highest homology was observed for the Zea mays Waxy gene. BLASTn analysis of barnyard millet EST sequences against the whole NCBI database found them to be most similar to the GBSSI-S gene of Panicum repens and Setaria italica, the Panicum miliaceum Waxy gene, and the Colletotrichum eremochloae putative superoxide dismutase (SOD) gene. Many more EST sequences are needed to identify orthologous and homologous genes between barnyard millet and other cereal and millet species.